\data\
ngram 1=8761
ngram 2=38900
ngram 3=61516
ngram 4=70186
ngram 5=73187

\1-grams:
-1.830750	the	-0.749890
-1.890649	is	-0.691540
-2.038036	a	-0.642477
-1.951081	of	-0.643167
-1.901191	to	-0.669007
-1.750232	and	-0.422371
-1.862912	in	-0.848516
-1.843019	The	-0.513064
-1.968378	for	-0.526971
-2.098055	that	-0.698830
-2.973153	be	-0.510802
-2.246776	are	-0.492916
-2.237033	can	-0.817175
-2.116637	//	-0.571937
-2.279510	=	-0.435165
-2.095266	or	-0.359022
-2.427443	it	-0.772358
-2.455197	function	-0.652972
-2.176012	if	-0.790051
-2.269013	by	-0.536274
-2.220028	with	-0.516397
-2.326596	on	-0.587337
-2.615067	code	-0.611605
-2.301297	as	-0.460931
-2.823681	not	-0.412019
-2.186176	This	-0.681501
-2.887686	-	-0.860005
-2.498703	an	-0.534308
-2.464857	int	-0.520755
-2.638706	than	-0.478208
-2.905278	compiler	-0.634245
-2.973153	x	-1.047738
-2.571330	may	-0.638738
-2.679429	you	-0.828708
-2.597040	{	-0.674916
-2.668885	have	-0.511026
-2.619693	this	-0.425969
-2.742497	time	-0.609421
-2.742497	use	-0.559656
-2.668885	more	-0.536482
-2.412758	when	-0.590631
-2.345968	A	-0.338819
-2.583995	will	-0.519802
-2.571330	}	-0.670057
-2.384798	then	-0.748610
-2.350949	It	-1.087867
-3.666674	Example	-0.144935
-2.563087	from	-0.484906
-2.736369	memory	-0.490694
-2.539258	at	-0.582165
-2.663707	data	-0.492916
-2.994667	program	-0.459953
-2.615067	has	-0.609239
-2.774501	vector	-0.527678
-3.185448	make	-0.560667
-2.879150	different	-0.366130
-2.491719	because	-0.828988
-3.948416	same	-0.343434
-2.724368	functions	-0.550907
-2.648535	only	-0.532133
-2.823681	CPU	-0.520239
-2.914349	other	-0.294460
-2.718490	instruction	-0.848421
-3.402976	point	-0.486667
-3.053636	loop	-0.550907
-2.471418	If	-0.738280
-2.559024	which	-0.482762
-2.767908	all	-0.362300
-2.488269	but	-0.579640
-3.017303	used	-0.638272
-2.761415	one	-0.428933
-2.870779	cache	-0.478675
-2.633874	should	-0.711651
-2.787993	integer	-0.404631
-3.121889	no	-0.404014
-3.432173	page	-0.216129
-3.121889	set	-0.482085
-2.914349	class	-0.388180
-2.809052	floating	-1.464059
-2.879150	each	-0.412807
-3.041184	do	-0.504467
-3.107353	example	-0.309664
-2.923613	compilers	-0.595274
-2.994667	most	-0.521172
-3.005838	using	-0.408935
-2.736369	double	-0.388180
-2.973153	size	-0.508754
-2.870779	Intel	-0.541362
-3.005838	pointer	-0.559656
-2.905278	b	-0.594572
-2.706967	into	-0.680487
-2.809052	+	-0.345508
-3.573727	n.a.	-0.831734
-2.994667	library	-0.438319
-3.017303	i	-0.731155
-2.730327	float	-0.386097
-2.923613	multiple	-0.430206
-2.952654	two	-0.331596
-3.107353	object	-0.475018
-3.066455	number	-0.791116
-2.887686	static	-0.480326
-3.066455	64-bit	-0.495792
-3.066455	there	-1.226170
-2.933079	C++	-0.413004
-3.029080	also	-0.372386
-2.761415	such	-0.634589
-3.325590	efficient	-0.620349
-2.668885	In	-0.687689
-2.862566	*	-0.305675
-2.629096	There	-1.399097
-3.041184	array	-0.470172
-2.794900	where	-0.537119
-2.933079	many	-0.329994
-3.107353	possible	-0.537119
-2.973153	clock	-0.858278
-2.887686	version	-0.597365
-3.185448	value	-0.636822
-3.053636	objects	-0.539912
-3.017303	takes	-0.535113
-3.221094	variable	-0.361851
-2.846592	any	-0.353418
-2.905278	we	-0.525353
-3.093288	some	-0.351867
-2.787993	so	-0.586820
-2.952654	variables	-0.475815
-3.029080	return	-0.258687
-3.093288	2	-0.282878
-2.712690	You	-1.079181
-3.152507	table	-0.370666
-3.005838	performance	-0.477121
-3.107353	very	-0.365271
-2.973153	software	-0.303753
-3.432173	order	-0.981173
-2.994667	long	-0.560168
-2.887686	between	-0.431501
-3.079665	32-bit	-0.554635
-3.041184	branch	-0.485554
-3.221094	<	-0.448832
-3.066455	member	-0.786584
-3.053636	way	-0.671951
-3.168665	elements	-0.531772
-3.136928	faster	-0.763722
-2.862566	const	-0.715417
-3.029080	makes	-0.626534
-3.029080	cannot	-0.573288
-2.823681	before	-0.567298
-3.325590	stored	-0.806180
-3.240078	called	-0.380211
-3.136928	address	-0.550907
-3.005838	4	-0.363413
-2.774501	See	-0.897016
-3.349885	critical	-0.653978
-3.202905	call	-0.464887
-3.859119	0;	-0.691283
-2.962782	8	-0.367977
-3.041184	less	-0.492916
-2.816305	For	-0.890855
-3.666674	example,	-0.492916
-3.280733	bit	-0.419720
-3.152507	operating	-0.963788
-2.952654	unsigned	-0.602060
-3.432173	first	-0.401145
-3.121889	register	-0.355388
-2.973153	64	-0.459652
-3.107353	take	-0.519650
-3.202905	often	-0.272385
-2.831185	rather	-1.636822
-3.121889	optimization	-0.351335
-3.136928	libraries	-0.374816
-3.121889	how	-0.630089
-3.066455	code.	-0.340054
-3.185448	time.	-0.380211
-3.053636	template	-0.432918
-3.349885	registers	-0.411728
-3.259930	need	-0.560667
-3.053636	pointers	-0.455932
-3.152507	test	-0.346788
-3.280733	new	-0.351723
-3.259930	systems	-0.351723
-3.302582	user	-0.448633
-3.121889	these	-0.262322
-3.202905	they	-0.640518
-2.914349	without	-0.383217
-3.463474	useful	-0.734686
-2.887686	even	-0.482874
-3.573727	sure	-0.873127
-3.240078	method	-0.511399
-3.221094	always	-0.157123
-3.107353	access	-0.396006
-3.053636	void	-0.316824
-3.053636	16	-0.333823
-3.259930	SSE2	-0.543082
-3.136928	out	-0.610029
-4.060956	following	-0.418143
-3.349885	system	-0.252518
-2.962782	32	-0.467361
-3.202905	file	-0.285518
-3.136928	programming	-0.418143
-3.121889	dynamic	-0.743389
-3.121889	part	-1.158363
-3.185448	bits	-0.352183
-3.221094	operations	-0.365971
-3.202905	0	-0.479214
-3.152507	type	-0.409578
-3.463474	case	-0.425969
-3.280733	cases	-0.479214
-3.107353	short	-0.995844
-3.185448	&	-0.276206
-3.259930	simple	-0.264307
-3.017303	instructions	-0.327359
-3.107353	processors	-0.510290
-3.240078	available	-0.522879
-3.280733	constant	-0.332547
-3.079665	up	-0.545155
-3.240078	error	-0.338014
-2.994667	I	-0.611015
-3.432173	making	-0.453407
-3.121889	times	-0.417195
-3.375620	stack	-0.374816
-3.573727	want	-0.851937
-2.983777	Example:	-1.806180
-3.402976	Gnu	-0.505150
-2.942756	Some	-0.425969
-3.185448	its	-0.274701
-3.168665	about	-0.287666
-3.375620	important	-0.483961
-3.533784	accessed	-0.474817
-3.168665	CPUs	-0.416825
-3.185448	function.	-0.365673
-3.325590	extra	-0.365673
-3.152507	does	-0.683093
-3.280733	assembly	-0.532424
-3.259930	large	-0.282547
-2.994667	must	-0.465477
-2.962782	while	-0.282547
-3.029080	;	-0.296335
-3.152507	arrays	-0.416423
-3.168665	work	-0.346788
-2.973153	(see	-1.079181
-3.325590	Windows	-0.330993
-3.221094	calls	-0.435729
-3.221094	calculations	-0.301030
-3.240078	versions	-0.574031
-3.136928	execution	-0.435729
-3.617719	avoid	-0.435729
-3.402976	result	-0.522879
-3.121889	processor	-0.370451
-3.302582	compiled	-0.489638
-2.952654	An	-0.353418
-3.017303	Use	-0.263241
-3.402976	bytes	-0.370451
-3.202905	big	-0.277030
-3.280733	doesn't	-0.370451
-3.325590	threads	-0.378196
-3.617719	best	-0.327043
-3.432173	necessary	-0.479654
-3.259930	element	-0.378196
-3.432173	language	-0.296009
-2.973153	But	-0.457377
-3.349885	speed	-0.436188
-3.325590	specific	-0.214339
-3.136928	c	-0.634245
-3.221094	much	-0.447158
-3.349885	single	-0.316824
-3.859119	i;	-0.572097
-2.973153	These	-0.256826
-3.259930	virtual	-0.572097
-3.152507	several	-0.301030
-3.185448	through	-0.706795
-3.302582	common	-0.229674
-3.402976	a,	-0.748188
-3.349885	thread	-0.301030
-2.983777	etc.	-0.232573
-3.259930	AMD	-0.482450
-3.859119	compile	-0.737723
-3.185448	exception	-0.522879
-3.349885	allocated	-0.425969
-3.259930	small	-0.279841
-3.240078	overflow	-0.404779
-3.221094	+=	-0.312025
-3.325590	integers	-0.404779
-3.375620	option	-0.365271
-3.497208	good	-0.425969
-3.617719	power	-1.028029
-3.432173	matrix	-0.335792
-3.325590	Linux	-0.373581
-3.948416	been	-0.171935
-3.533784	cause	-0.354276
-3.349885	AVX	-0.373581
-3.432173	classes	-0.253605
-3.533784	done	-0.511883
-3.721858	therefore	-0.335792
-3.497208	precision	-0.284640
-3.325590	line	-0.268845
-3.185448	works	-0.362300
-3.259930	optimized	-0.324511
-3.168665	inside	-0.704722
-3.463474	manual	-0.579197
-3.107353	/	-0.201779
-3.721858	explained	-0.848042
-3.617719	calculated	-0.370921
-3.533784	calculation	-0.517049
-3.259930	};	-0.505150
-3.202905	128	-0.319513
-3.202905	uses	-0.319513
-3.259930	four	-0.189880
-3.259930	functions.	-0.425969
-3.240078	another	-0.266268
-3.375620	parameters	-0.477121
-3.349885	get	-0.319513
-3.463474	b;	-0.477121
-3.280733	check	-0.681241
-3.617719	advantageous	-0.567298
-3.573727	implemented	-0.681241
-3.402976	problem	-0.413734
-3.432173	known	-0.555063
-4.060956	(i	-1.066947
-3.221094	solution	-0.522879
-3.302582	container	-0.464887
-3.432173	advantage	-0.823909
-3.280733	Function	-0.492916
-3.240078	support	-0.390253
-3.402976	supported	-0.656418
-3.325590	eight	-0.510290
-3.402976	operators	-0.276206
-3.497208	few	-0.225054
-3.240078	contains	-0.334198
-3.185448	whether	-0.542474
-3.432173	i++)	-0.702175
-3.402976	list	-0.377664
-3.168665	would	-0.413004
-3.533784	likely	-0.865301
-3.375620	structure	-0.439333
-3.349885	doing	-0.301030
-3.375620	run	-0.388180
-3.666674	calculate	-0.413004
-3.432173	inline	-0.467361
-3.221094	every	-0.453997
-3.349885	standard	-0.287666
-3.402976	hardware	-0.374816
-3.152507	1	-0.268361
-3.240078	:	-0.307869
-3.259930	add	-0.232149
-3.617719	mode	-0.351335
-3.573727	store	-0.351335
-3.349885	values	-0.412180
-3.107353	All	-0.201327
-3.463474	sign	-0.616300
-3.349885	copy	-0.537119
-3.302582	optimizing	-0.385852
-3.497208	memory.	-0.361028
-3.402976	well	-0.385852
-3.168665	information	-0.662058
-3.302582	simply	-0.385852
-3.666674	able	-1.315270
-3.240078	certain	-0.315270
-4.449881	cycles	-0.301030
-3.325590	...	-0.397940
-3.280733	addresses	-0.488117
-3.573727	counter	-0.301030
-3.375620	shared	-0.698970
-3.573727	count	-0.425969
-3.463474	program.	-0.323306
-3.349885	quite	-0.204120
-3.573727	used.	-0.301030
-3.302582	files	-0.371611
-3.859119	recommended	-1.000000
-3.432173	intermediate	-0.488117
-3.349885	fast	-0.383217
-3.666674	allocation	-0.332064
-3.617719	(int	-0.633094
-3.432173	write	-0.308583
-3.497208	optimize	-0.286307
-3.573727	above	-0.411245
-3.121889	However,	-0.441209
-3.221094	was	-0.155973
-3.240078	both	-0.265117
-3.202905	programs	-0.229674
-3.402976	problems	-0.341648
-3.185448	unless	-0.873127
-3.721858	optimal	-0.341648
-3.463474	space	-0.293343
-3.573727	cases,	-0.425969
-3.859119	else	-0.873127
-4.213263	lot	-1.094975
-3.202905	Integer	-0.367977
-3.785088	dispatching	-0.316824
-3.859119	particular	-0.210369
-3.721858	microprocessor	-0.301030
-3.617719	replace	-0.514910
-3.859119	next	-0.158363
-3.259930	branches	-0.255272
-3.349885	typically	-0.234083
-3.280733	operator	-0.277549
-3.573727	preferably	-0.556302
-3.721858	1;	-0.514910
-3.185448	Therefore,	-0.711204
-3.463474	Mac	-0.325854
-3.533784	multiplication	-0.213880
-3.432173	application	-0.255272
-3.573727	x)	-1.556303
-3.240078	automatically	-0.261158
-3.533784	see	-0.393784
-3.463474	caching	-0.498519
-3.432173	allows	-0.284640
-3.497208	sets	-0.335792
-3.463474	expression	-0.393784
-3.325590	implementation	-0.393784
-3.168665	Most	-0.425969
-3.432173	complicated	-0.217693
-3.859119	handling	-0.261158
-3.325590	like	-0.309463
-3.463474	dependency	-0.761761
-3.533784	members	-0.335792
-3.349885	their	-0.197489
-3.533784	__m128i	-0.443697
-3.280733	Using	-0.376751
-3.349885	Boolean	-0.376751
-3.533784	cache.	-0.376751
-3.497208	don't	-0.346788
-3.259930	256	-0.244125
-3.302582	intrinsic	-0.568636
-3.302582	methods	-0.267606
-3.349885	signed	-0.443697
-3.497208	model	-0.244125
-3.375620	development	-0.267606
-3.302582	mathematical	-0.443697
-3.533784	never	-0.244125
-3.617719	separate	-0.267606
-3.497208	block	-0.292430
-3.432173	name	-0.244125
-3.463474	systems.	-0.318759
-3.617719	put	-0.408935
-3.375620	needs	-0.522879
-3.375620	y	-0.744727
-3.432173	conversion	-0.346788
-3.573727	c;	-0.376751
-3.240078	#include	-0.249877
-3.497208	various	-0.301030
-3.721858	disadvantage	-0.726999
-3.617719	high	-0.274701
-3.573727	zero	-0.226396
-3.375620	Microsoft	-0.359022
-3.349885	what	-0.391207
-3.573727	parameter	-0.391207
-3.573727	division	-0.359022
-3.666674	reference	-0.463757
-3.349885	source	-0.359022
-3.573727	cost	-0.660052
-3.432173	running	-0.391207
-3.463474	automatic	-0.425969
-3.432173	resources	-0.391207
-3.402976	induction	-0.787697
-3.785088	reason	-0.708515
-3.859119	dispatcher	-0.310575
-3.280733	n	-0.231394
-3.375620	string	-0.282547
-3.948416	programmer	-0.372723
-3.302582	three	-0.164447
-3.463474	better	-0.207913
-3.666674	keyword	-0.282547
-3.859119	efficient.	-0.282547
-3.785088	lookup	-0.256218
-3.666674	end	-0.532424
-3.302582	applications	-0.185637
-3.375620	below.	-0.231394
-3.375620	&&	-0.164447
-3.497208	|	-0.207913
-3.463474	Make	-0.445274
-3.221094	We	-0.583577
-3.432173	examples	-0.372723
-3.325590	char	-0.372723
-3.666674	difference	-0.641569
-3.533784	addition	-0.310575
-3.463474	data.	-0.486667
-3.463474	too	-0.207913
-3.402976	mechanism	-0.282547
-3.302582	Table	-0.185637
-3.375620	runtime	-0.256218
-3.497208	needed	-0.372723
-3.402976	means	-0.583577
-3.497208	last	-0.372723
-3.721858	byte	-0.865301
-3.463474	parts	-1.166331
-3.432173	||	-0.212089
-3.302582	>	-0.263241
-3.533784	types	-0.388180
-3.463474	expressions	-0.188608
-3.533784	difficult	-0.990240
-4.213263	set.	-0.291270
-3.259930	instead	-1.467361
-3.463474	compilers.	-0.212089
-3.785088	transferred	-0.622263
-3.497208	longer	-0.513119
-3.349885	after	-0.467361
-3.402976	read	-0.388180
-3.402976	give	-0.301030
-3.259930	Each	-0.271067
-3.533784	becomes	-0.243038
-3.463474	aligned	-0.447158
-3.463474	directives	-0.271067
-3.432173	requires	-0.301030
-3.402976	optimizations	-0.367977
-3.463474	graphics	-0.271067
-3.573727	public	-0.271067
-3.948416	public:	-0.492916
-3.533784	framework	-0.168404
-3.402976	look	-0.669007
-3.785088	linking	-0.333215
-3.259930	Many	-0.191886
-3.463474	processors.	-0.216709
-3.463474	actually	-0.146128
-3.617719	Intel,	-0.669007
-3.573727	linked	-0.367977
-3.533784	x;	-0.216709
-3.497208	microprocessors	-0.367977
-3.617719	load	-0.367977
-3.785088	control	-0.312025
-3.533784	assume	-0.823909
-4.213263	100;	-0.580871
-3.617719	numbers	-0.312025
-3.463474	platform	-0.195520
-3.721858	later	-0.346788
-3.666674	together	-0.346788
-3.785088	dispatch	-0.279841
-3.463474	calling	-0.312025
-3.402976	your	-0.425969
-3.859119	own	-0.170696
-3.666674	declared	-0.384576
-3.533784	XMM	-0.471726
-3.859119	second	-0.279841
-3.666674	shows	-0.425969
-3.948416	interface	-0.279841
-3.785088	improve	-0.425969
-3.432173	higher	-0.312025
-3.375620	bigger	-0.471726
-3.375620	vectors	-0.221849
-3.325590	Floating	-1.425969
-3.533784	AVX2	-0.147215
-3.666674	piece	-0.823909
-3.785088	divisible	-1.425969
-3.573727	<<	-0.289749
-3.432173	Here,	-0.403692
-3.573727	x86	-0.403692
-3.432173	process	-0.257564
-3.497208	binary	-0.227601
-3.617719	know	-0.449450
-3.533784	512	-0.403692
-3.785088	generate	-0.362300
-3.666674	advantages	-0.704722
-3.463474	r	-0.403692
-3.721858	usually	-0.148420
-3.463474	results	-0.362300
-3.948416	b,	-0.704722
-3.533784	storage	-0.227601
-3.617719	old	-0.227601
-3.721858	reduce	-0.257564
-3.402976	goes	-0.289749
-3.375620	union	-0.625541
-3.785088	0,	-0.257564
-3.859119	called.	-0.289749
-3.375620	10	-0.227601
-3.573727	based	-1.102662
-3.666674	choose	-0.403692
-3.573727	options	-0.257564
-3.497208	feature	-0.403692
-3.666674	ways	-0.704722
-3.432173	were	-0.289749
-3.573727	link	-0.403692
-3.573727	made	-0.257564
-3.666674	appropriate	-0.380211
-4.060956	i,	-0.301030
-3.666674	constructor	-0.266268
-3.497208	CPUs.	-0.301030
-3.785088	2;	-0.535113
-3.432173	just	-0.338819
-3.573727	a[i]	-0.681241
-3.375620	function,	-0.266268
-3.666674	operands	-0.266268
-4.060956	innermost	-0.681241
-3.402976	require	-0.425969
-3.617719	compiler.	-0.204120
-3.533784	advanced	-0.149762
-3.349885	#define	-0.301030
-3.533784	points	-0.602060
-3.533784	switch	-0.380211
-3.533784	range	-0.425969
-3.617719	start	-0.380211
-3.463474	modules	-0.176091
-3.432173	smaller	-0.266268
-3.375620	here	-0.380211
-3.497208	core	-0.313995
-3.533784	relevant	-0.510290
-3.402976	are:	-0.179296
-3.573727	around	-0.401145
-3.463474	5	-0.241444
-3.721858	replaced	-0.878266
-3.721858	a;	-0.401145
-3.432173	things	-0.401145
-3.497208	negative	-0.241444
-3.785088	section	-0.179296
-3.573727	reductions	-0.241444
-3.497208	go	-0.241444
-3.432173	depends	-0.878266
-3.785088	example:	-0.878266
-3.721858	tested	-0.209260
-3.785088	contentions	-0.355388
-3.785088	predicted	-0.276206
-3.785088	main	-0.241444
-3.497208	references	-0.276206
-3.785088	loaded	-0.313995
-3.573727	positive	-0.241444
-3.859119	loop.	-0.313995
-3.666674	computer	-0.276206
-3.721858	overhead	-0.577236
-4.060956	VIA	-0.355388
-3.666674	pointer.	-0.313995
-3.573727	supports	-0.179296
-3.463474	C	-0.313995
-3.617719	compatible	-0.656418
-3.573727	change	-0.355388
-3.533784	global	-0.401145
-3.533784	my	-0.209260
-3.573727	conversions	-0.249877
-3.573727	statement	-0.215115
-3.497208	errors	-0.374816
-3.721858	off	-0.425969
-3.617719	unused	-0.483961
-3.497208	relative	-0.425969
-3.573727	columns	-0.425969
-3.432173	p	-0.483961
-3.617719	platforms.	-0.182931
-3.573727	languages	-0.287666
-3.573727	installation	-0.329059
-3.497208	depending	-1.329059
-3.573727	syntax	-0.287666
-3.617719	cases.	-0.249877
-3.375620	Supports	-0.287666
-3.859119	choice	-0.851937
-3.573727	1.	-0.249877
-3.666674	STL	-0.215115
-3.785088	intended	-0.550907
-3.497208	dynamically	-0.425969
-3.859119	consecutive	-0.630089
-3.785088	profiler	-0.249877
-3.721858	become	-0.215115
-3.533784	Windows,	-0.329059
-3.617719	index	-0.249877
-3.573727	modern	-0.287666
-3.432173	gives	-0.425969
-3.573727	Loop	-0.425969
-4.213263	avoided	-0.483961
-3.666674	turn	-0.630089
-3.721858	inlining	-0.249877
-3.617719	size.	-0.152967
-3.573727	network	-0.301030
-3.617719	slow	-0.301030
-3.573727	b)	-0.259637
-3.721858	>=	-0.187087
-3.948416	desired	-0.187087
-3.432173	Such	-0.346788
-3.533784	#pragma	-0.522879
-3.497208	Dynamic	-0.698970
-3.533784	functions,	-0.346788
-3.721858	whole	-0.455932
-3.721858	inefficient	-0.346788
-3.859119	level-2	-1.000000
-3.721858	response	-0.455932
-3.573727	described	-0.522879
-3.666674	2.	-0.259637
-3.497208	variables.	-0.124939
-3.721858	lines	-0.259637
-3.617719	hot	-0.602060
-3.402976	Unfortunately,	-0.346788
-3.533784	v.	-0.124939
-3.463474	operation	-0.187087
-3.573727	code,	-0.259637
-3.666674	instance	-0.602060
-3.463474	comes	-0.301030
-3.948416	fact	-0.259637
-3.948416	find	-0.522879
-3.617719	rely	-1.000000
-3.463474	No	-0.124939
-3.617719	produce	-0.259637
-3.497208	position-independent	-0.698970
-3.721858	vectorization	-0.154902
-3.432173	including	-0.346788
-3.721858	checking	-0.259637
-3.666674	out-of-order	-0.572097
-3.533784	platforms	-0.316824
-3.721858	particularly	-0.191886
-3.721858	given	-0.316824
-3.666674	output	-0.229674
-3.859119	level-1	-0.669007
-3.463474	resources.	-0.157123
-3.463474	outside	-0.572097
-3.497208	task	-0.191886
-3.785088	limited	-0.229674
-3.617719	vectorized	-0.367977
-3.721858	sometimes	-0.271067
-3.617719	local	-0.229674
-3.785088	costs	-0.492916
-3.721858	S1	-0.316824
-3.617719	math	-0.316824
-3.617719	temp	-0.316824
-3.617719	inlined	-0.191886
-3.617719	still	-0.229674
-3.617719	class.	-0.316824
-3.497208	database	-0.157123
-3.573727	constants	-0.271067
-3.533784	bool	-0.425969
-3.432173	Do	-0.970037
-3.721858	frame	-0.367977
-3.666674	==	-0.367977
-3.859119	d;	-0.316824
-3.573727	special	-0.157123
-3.666674	prevent	-0.271067
-3.721858	shift	-0.229674
-3.859119	destructor	-0.316824
-3.721858	save	-0.124939
-3.666674	prevents	-0.492916
-4.060956	preceding	-0.271067
-3.573727	safe	-0.367977
-3.666674	d	-0.669007
-3.721858	Choice	-1.271067
-4.060956	tell	-0.492916
-3.617719	Pentium	-0.492916
-3.666674	further	-0.229674
-3.432173	Assume	-0.492916
-3.573727	efficiency	-0.492916
-3.617719	repeat	-0.669007
-3.721858	unroll	-0.425969
-3.948416	calls.	-0.316824
-3.533784	algorithm	-0.191886
-3.533784	sum	-0.492916
-3.666674	strings	-0.393784
-3.463474	On	-0.460731
-3.859119	exponent	-0.539912
-3.721858	Linux,	-0.335792
-3.948416	possibility	-0.636822
-3.666674	discussion	-0.761761
-3.573727	conditions	-0.393784
-3.785088	non-Intel	-0.335792
-3.497208	it.	-0.197489
-3.533784	(See	-0.539912
-3.666674	registers.	-0.197489
-3.785088	maximum	-0.284640
-3.948416	mode.	-0.238882
-3.859119	per	-0.335792
-3.617719	testing	-0.197489
-3.533784	alignment	-0.197489
-4.060956	right	-0.197489
-3.785088	offset	-0.284640
-3.666674	compatibility	-0.539912
-3.617719	macro	-0.159701
-3.666674	bytes.	-0.460731
-3.721858	object.	-0.197489
-3.617719	100	-0.124939
-3.463474	Note	-0.761761
-3.463474	them	-0.238882
-3.785088	writing	-0.335792
-3.617719	library.	-0.197489
-3.497208	struct	-0.335792
-3.617719	calculations.	-0.393784
-3.785088	operand	-0.539912
-3.666674	reduced	-0.284640
-4.449881	cycles.	-0.238882
-4.213263	final	-0.197489
-4.449881	sake	-1.238882
-3.617719	operations.	-0.197489
-3.463474	When	-0.335792
-3.573727	tasks	-0.284640
-3.463474	Avoid	-0.197489
-3.617719	effect	-0.539912
-3.721858	amount	-1.238882
-3.666674	variable.	-0.284640
-3.463474	time,	-0.284640
-3.533784	Variables	-0.539912
-3.666674	copying	-0.197489
-3.497208	optimization.	-0.159701
-3.666674	accessing	-0.284640
-3.533784	until	-0.539912
-3.573727	performance.	-0.197489
-3.721858	adding	-0.197489
-4.213263	Define	-0.335792
-3.497208	causes	-0.238882
-3.573727	processing	-0.159701
-3.859119	divide	-0.539912
-3.948416	so-called	-0.124939
-3.666674	clear	-0.284640
-3.948416	total	-0.238882
-3.948416	mix	-0.162727
-3.721858	16-bit	-0.301030
-3.785088	child	-0.505150
-3.533784	containers	-0.301030
-3.948416	fit	-0.505150
-4.060956	predict	-0.301030
-3.666674	priority	-0.249877
-3.785088	disk	-0.162727
-4.449881	frequency	-0.301030
-3.666674	unknown	-0.425969
-4.060956	obtained	-0.505150
-3.617719	libraries.	-0.204120
-3.721858	iteration	-0.249877
-3.785088	counters	-0.204120
-3.573727	Optimizing	-0.301030
-3.721858	128-bit	-0.425969
-3.721858	possibly	-0.249877
-3.666674	x,	-0.359022
-3.948416	stack.	-0.249877
-3.666674	2,	-0.249877
-3.666674	full	-0.124939
-3.497208	Another	-0.249877
-3.666674	overloaded	-0.359022
-3.948416	possible.	-0.162727
-3.948416	efficiently	-0.359022
-3.666674	models	-0.249877
-4.060956	OS	-0.425969
-3.785088	needed.	-0.162727
-3.617719	classes.	-0.204120
-3.948416	changed	-0.301030
-3.617719	true	-0.204120
-3.721858	thread.	-0.249877
-3.785088	names	-0.359022
-3.721858	though	-0.359022
-4.060956	execute	-0.204120
-3.617719	%	-0.204120
-3.573727	mov	-0.204120
-3.666674	N	-0.301030
-3.721858	kinds	-1.204120
-3.617719	details	-0.301030
-3.666674	RAM	-0.425969
-3.859119	rows	-0.425969
-3.497208	square	-0.249877
-3.666674	fail	-0.602060
-3.785088	purposes.	-0.301030
-3.497208	(e.g.	-0.124939
-3.948416	compiling	-0.425969
-3.785088	convert	-0.301030
-3.666674	thing	-0.359022
-4.213263	least	-0.301030
-3.533784	containing	-0.204120
-3.948416	0)	-0.726999
-3.785088	precision.	-0.204120
-3.533784	algebraic	-0.505150
-3.948416	structures	-0.162727
-3.666674	little	-0.204120
-3.497208	Any	-0.124939
-3.666674	logical	-0.301030
-3.666674	level	-0.301030
-3.617719	access.	-0.204120
-3.785088	bitwise	-0.726999
-3.785088	handle	-0.162727
-4.060956	heap	-0.359022
-3.617719	DWORD	-0.903090
-3.617719	Other	-0.301030
-3.497208	during	-0.359022
-3.785088	initialized	-0.321233
-3.721858	occur	-0.321233
-4.060956	target	-0.321233
-3.533784	especially	-0.467361
-3.785088	smart	-0.564271
-3.666674	includes	-0.321233
-4.213263	entire	-0.166331
-3.859119	executable	-0.388180
-3.785088	subexpression	-0.321233
-3.948416	insert	-0.321233
-3.666674	nontemporal	-0.388180
-3.785088	bounds	-0.212089
-4.213263	improved	-0.564271
-3.666674	SSE	-0.321233
-3.948416	discussed	-0.467361
-3.666674	updates	-0.166331
-3.785088	consider	-0.321233
-3.666674	loading	-0.263241
-3.617719	below	-0.388180
-3.666674	reading	-0.321233
-3.533784	directly	-0.321233
-4.213263	simplest	-0.212089
-3.666674	situation	-0.689210
-3.948416	message	-0.263241
-3.666674	delay	-0.321233
-3.721858	condition	-0.263241
-4.449881	monitor	-0.689210
-3.573727	resource	-0.263241
-3.785088	cores	-0.263241
-3.573727	parallel	-0.166331
-3.573727	either	-0.263241
-3.666674	implementations	-0.564271
-3.859119	calculating	-0.564271
-3.533784	ebx	-0.212089
-3.721858	generation	-0.689210
-3.785088	enable	-0.467361
-3.617719	instructions.	-0.212089
-3.785088	copied	-0.321233
-3.533784	e.g.	-0.166331
-3.948416	keep	-0.263241
-4.449881	PTR	-0.263241
-3.617719	Automatic	-0.388180
-3.666674	Library	-0.124939
-3.785088	?	-0.263241
-3.721858	defined	-0.388180
-3.785088	Visual	-0.467361
-3.859119	align	-0.212089
-3.666674	sizes	-0.212089
-3.721858	temp;	-0.388180
-3.617719	allow	-0.212089
-3.721858	PathScale	-0.388180
-3.785088	BSD	-0.263241
-4.060956	f;	-0.467361
-4.213263	previous	-0.263241
-4.449881	size;	-0.865301
-3.785088	rarely	-0.124939
-3.721858	way.	-0.212089
-3.666674	vector.	-0.212089
-3.721858	easier	-0.564271
-3.721858	identical	-0.263241
-3.573727	20	-0.321233
-3.721858	well.	-0.263241
-3.785088	program,	-0.321233
-3.666674	list[i]	-0.388180
-3.617719	under	-0.321233
-3.785088	expect	-0.467361
-3.573727	except	-0.467361
-3.617719	loops	-0.263241
-3.948416	why	-0.166331
-4.449881	dispatching.	-0.170696
-4.060956	cout	-1.124939
-3.785088	references.	-0.124939
-3.666674	come	-0.346788
-3.785088	statements	-0.124939
-4.060956	u;	-0.425969
-3.617719	SSE4.1	-0.279841
-3.721858	chapter	-0.170696
-3.666674	similar	-0.170696
-4.060956	course	-0.170696
-3.721858	back	-0.522879
-3.859119	risk	-0.647817
-3.721858	garbage	-0.522879
-3.721858	templates	-0.170696
-3.785088	buffer	-0.170696
-3.721858	header	-0.522879
-3.721858	future	-0.221849
-3.573727	whenever	-0.425969
-4.060956	unrolling	-0.221849
-3.617719	CriticalFunction	-0.425969
-3.859119	swap	-0.425969
-3.666674	newer	-0.221849
-3.785088	fraction	-0.425969
-3.721858	modify	-0.221849
-3.617719	seconds	-0.170696
-3.859119	unaligned	-0.522879
-3.617719	address.	-0.170696
-4.060956	Store	-0.425969
-3.859119	sequence	-0.425969
-3.785088	compiler,	-0.124939
-3.721858	significant	-0.124939
-3.617719	might	-0.221849
-3.721858	CPU.	-0.221849
-3.573727	Vector	-0.522879
-3.948416	length	-0.647817
-3.948416	sets.	-0.170696
-4.060956	linear	-0.279841
-3.666674	something	-0.346788
-3.666674	f	-0.522879
-3.785088	penalty	-0.346788
-3.617719	F1	-0.279841
-3.785088	invalid	-0.221849
-3.859119	reasons	-0.346788
-3.785088	setting	-0.346788
-3.721858	module	-0.221849
-4.449881	beginning	-0.823909
-3.573727	within	-0.647817
-3.948416	used,	-0.221849
-3.666674	checks	-0.346788
-3.785088	input	-0.170696
-3.948416	not.	-0.279841
-3.617719	programmers	-0.279841
-3.617719	alternative	-0.279841
-3.573727	My	-0.279841
-3.859119	organized	-0.425969
-4.449881	stride	-0.221849
-4.449881	set,	-0.221849
-3.785088	current	-0.170696
-3.666674	'this'	-0.425969
-3.785088	problem.	-0.279841
-3.617719	3	-0.346788
-3.721858	counts	-0.170696
-3.859119	gain	-0.522879
-3.573727	processors,	-0.346788
-3.721858	happen	-0.346788
-3.721858	enough	-0.522879
-3.785088	apply	-0.425969
-3.573727	Obviously,	-0.279841
-3.573727	version.	-0.346788
-3.617719	row	-0.170696
-3.785088	Compiler	-0.279841
-4.213263	matter	-0.647817
-3.617719	declaration	-0.221849
-3.948416	allocate	-0.425969
-3.859119	series	-0.823909
-3.617719	features	-0.221849
-3.859119	added	-0.425969
-4.213263	user.	-0.124939
-3.859119	to:	-1.124939
-3.859119	waste	-0.823909
-3.721858	metaprogramming	-0.124939
-3.721858	map	-0.279841
-3.948416	define	-0.221849
-4.213263	returns.	-0.170696
-3.948416	Windows.	-0.221849
-3.859119	style	-0.124939
-4.213263	Load	-0.778151
-3.785088	3;	-0.477121
-3.721858	approximately	-0.234083
-3.859119	order.	-0.234083
-4.060956	3:	-0.778151
-4.213263	microarchitecture	-1.079181
-3.859119	easy	-0.301030
-3.785088	situations	-0.602060
-4.060956	implement	-0.380211
-3.666674	65	-0.234083
-3.948416	chosen	-0.176091
-3.617719	256-bit	-0.234083
-3.859119	slightly	-0.602060
-3.859119	scattered	-0.602060
-3.617719	contain	-0.124939
-3.721858	writes	-0.234083
-3.666674	device	-0.301030
-3.785088	independent	-0.234083
-4.213263	allocation.	-0.124939
-3.859119	non-static	-0.477121
-4.060956	subsequent	-0.176091
-3.859119	applies	-0.778151
-4.213263	applied	-1.079181
-3.948416	destructors	-0.234083
-3.721858	integers.	-0.301030
-4.213263	terms	-0.778151
-3.721858	help	-0.380211
-3.948416	transfer	-0.234083
-3.785088	blocks	-0.234083
-3.859119	away	-0.234083
-4.060956	15.1b	-0.602060
-3.785088	low	-0.234083
-3.785088	multiply	-0.234083
-3.721858	share	-1.079181
-4.449881	enabled.	-0.176091
-3.948416	explanation	-0.477121
-3.785088	near	-0.602060
-3.859119	provided	-0.380211
-4.213263	latter	-0.301030
-3.617719	6	-0.234083
-3.785088	stores	-0.477121
-3.785088	to.	-0.301030
-3.785088	default	-0.234083
-3.666674	Instruction	-0.602060
-4.060956	finding	-0.380211
-3.666674	inefficient.	-0.124939
-4.213263	c,	-0.602060
-3.666674	search	-0.301030
-3.617719	Modern	-0.602060
-3.785088	block.	-0.234083
-3.859119	critical.	-0.234083
-4.213263	chains	-0.176091
-3.785088	time-consuming	-0.176091
-3.785088	brands	-0.602060
-3.721858	available.	-0.301030
-3.666674	Don't	-0.301030
-3.785088	brand	-0.124939
-4.060956	executed.	-0.124939
-3.721858	faster.	-0.301030
-4.213263	diagonal	-0.176091
-3.948416	n;	-0.176091
-3.859119	*p	-0.477121
-3.859119	logic	-0.234083
-3.666674	Microsoft,	-0.602060
-3.859119	hard	-0.602060
-3.666674	purposes	-0.234083
-3.785088	typical	-0.124939
-3.666674	usability	-0.234083
-3.721858	pure	-0.380211
-3.859119	vectorize	-0.477121
-3.666674	problems.	-0.234083
-3.617719	could	-0.301030
-4.060956	parameter.	-0.124939
-4.060956	derived	-0.477121
-3.617719	mentioned	-0.380211
-3.721858	Time	-0.301030
-3.785088	Optimization	-0.301030
-3.859119	expressions.	-0.176091
-3.666674	include	-0.234083
-3.859119	y;	-0.301030
-3.785088	overflow.	-0.176091
-3.785088	element.	-0.301030
-4.060956	oriented	-0.602060
-3.666674	fully	-0.176091
-3.785088	storage.	-0.124939
-3.721858	addition,	-0.301030
-3.721858	everything	-0.380211
-3.785088	involves	-0.380211
-3.617719	Here	-0.234083
-3.785088	factorial	-0.380211
-3.721858	OpenMP	-0.477121
-3.721858	eax	-0.176091
-4.449881	bb[],	-1.079181
-4.213263	mispredicted	-0.301030
-3.617719	standardized	-0.176091
-3.617719	(or	-0.124939
-3.785088	across	-0.234083
-4.449881	cycle	-0.477121
-3.948416	aliasing	-0.234083
-4.449881	aa[],	-1.079181
-4.060956	tool	-0.477121
-3.785088	parent	-0.477121
-3.948416	care	-0.602060
-3.859119	systems,	-0.182931
-3.859119	parm1,	-1.028029
-3.785088	included	-0.425969
-3.721858	false	-0.182931
-3.721858	value.	-0.249877
-3.859119	file.	-0.182931
-3.721858	*=	-0.425969
-3.859119	temporary	-0.124939
-3.666674	12	-0.182931
-3.721858	memcpy	-0.329059
-3.785088	procedure	-0.550907
-3.948416	PC	-0.249877
-3.785088	frequent	-0.182931
-3.666674	unlimited	-0.726999
-3.859119	parallelism	-0.249877
-4.449881	detection	-0.425969
-3.721858	c2	-0.329059
-4.449881	"The	-1.028029
-3.948416	throw()	-0.249877
-3.859119	prediction	-0.124939
-3.948416	polymorphic	-0.182931
-3.666674	#if	-0.329059
-3.666674	now	-0.182931
-3.785088	unit	-0.249877
-4.060956	conventions	-0.726999
-3.785088	register.	-0.182931
-3.859119	kind	-1.028029
-3.785088	graphical	-0.726999
-3.859119	lower	-0.182931
-3.859119	label	-0.249877
-3.721858	iterations	-0.124939
-3.948416	misprediction	-0.182931
-3.948416	integer,	-0.182931
-4.213263	binding	-0.182931
-3.859119	just-in-time	-0.425969
-3.666674	try	-0.329059
-3.785088	background	-0.124939
-4.060956	converted	-0.726999
-3.948416	pointed	-0.726999
-3.859119	CPUs,	-0.425969
-4.213263	account	-0.425969
-4.449881	p)	-0.726999
-4.213263	chain	-0.124939
-3.721858	algorithms	-0.182931
-3.948416	PLT	-0.329059
-3.721858	heavy	-0.182931
-3.666674	once	-0.182931
-3.785088	additions	-0.329059
-4.060956	hash	-0.425969
-3.721858	ecx	-0.249877
-4.060956	system.	-0.249877
-3.785088	variables,	-0.124939
-4.060956	equally	-0.249877
-3.785088	however,	-0.182931
-3.785088	designed	-0.329059
-3.666674	profiling	-0.124939
-3.785088	fragmented	-0.425969
-3.721858	inputs	-0.182931
-3.948416	fast.	-0.182931
-3.859119	family	-0.249877
-3.785088	4,	-0.124939
-3.785088	Virtual	-0.425969
-3.721858	j	-0.249877
-3.785088	interrupt	-0.249877
-3.948416	-1	-0.425969
-3.721858	8,	-0.249877
-4.213263	units.	-0.182931
-3.666674	who	-0.249877
-3.948416	fastest	-0.182931
-3.721858	__restrict	-0.249877
-3.721858	arithmetic	-0.249877
-3.948416	DLL	-0.124939
-3.785088	factors	-0.329059
-3.948416	Gnu,	-0.550907
-3.785088	arrays.	-0.124939
-3.721858	devices	-0.249877
-3.721858	branch.	-0.124939
-3.785088	required	-0.329059
-3.948416	(unsigned	-0.329059
-3.948416	almost	-0.249877
-3.785088	GOT	-0.329059
-3.721858	array.	-0.249877
-3.948416	listed	-0.550907
-3.859119	general	-0.124939
-3.859119	preferred	-0.249877
-4.449881	cycles,	-0.550907
-3.666674	explicitly	-0.182931
-3.948416	space.	-0.329059
-4.060956	fixed	-0.249877
-3.721858	Memory	-0.182931
-3.785088	zero.	-0.124939
-4.060956	non-sequential	-0.249877
-3.785088	multiplying	-0.425969
-3.721858	Conversion	-0.550907
-3.785088	down	-0.249877
-3.721858	software.	-0.124939
-3.859119	interpreted	-0.182931
-3.785088	exactly	-0.425969
-3.666674	jump	-0.182931
-4.060956	determined	-0.425969
-4.449881	cc[])	-1.028029
-4.060956	line.	-0.182931
-3.948416	easily	-0.249877
-4.213263	identification	-0.329059
-3.666674	vectors.	-0.182931
-3.859119	2)	-0.425969
-3.666674	applications.	-0.182931
-3.721858	volatile	-0.249877
-4.060956	misses	-0.249877
-3.948416	tables	-0.249877
-3.721858	random	-0.182931
-4.060956	X	-0.182931
-3.721858	Conversions	-0.550907
-3.859119	YMM	-0.329059
-3.948416	resolved	-0.492916
-3.948416	purpose	-0.367977
-3.948416	-fpic	-0.191886
-3.785088	D	-0.191886
-3.721858	had	-0.124939
-3.721858	parameters.	-0.124939
-3.859119	ebx,	-0.367977
-3.948416	measure	-0.191886
-3.859119	poorly	-0.271067
-4.213263	this:	-0.669007
-3.785088	sections	-0.191886
-3.785088	Software	-0.271067
-3.721858	Even	-0.124939
-3.721858	19	-0.191886
-4.060956	important.	-0.271067
-4.213263	carry	-0.492916
-3.859119	lazy	-0.669007
-3.721858	xn	-0.191886
-4.449881	stamp	-0.669007
-3.785088	debugging	-0.271067
-4.060956	10;	-0.367977
-3.785088	table.	-0.191886
-3.785088	1,	-0.367977
-3.948416	vector,	-0.191886
-4.449881	(b)	-0.970037
-3.785088	object,	-0.271067
-3.859119	allowed	-0.271067
-4.213263	delete	-0.191886
-3.721858	Likewise,	-0.124939
-4.449881	follows:	-0.191886
-3.785088	simultaneously.	-0.271067
-3.785088	itself	-0.271067
-3.859119	solution.	-0.191886
-3.785088	algebra	-0.367977
-3.948416	suitable	-0.124939
-3.785088	Template	-0.367977
-3.721858	spend	-0.271067
-4.060956	switches	-0.124939
-3.948416	disk.	-0.191886
-3.785088	serious	-0.191886
-3.948416	c);	-0.492916
-4.213263	Studio	-0.124939
-4.060956	a[100];	-0.367977
-4.060956	trick	-0.191886
-3.785088	disadvantages	-0.271067
-3.785088	eax,	-0.191886
-3.948416	distributed	-0.367977
-3.948416	generally	-0.191886
-3.948416	mode,	-0.191886
-3.948416	Linux.	-0.124939
-4.213263	C1	-0.191886
-3.785088	instances	-0.367977
-4.213263	called,	-0.271067
-3.785088	update	-0.124939
-3.785088	<=	-0.191886
-3.948416	integer.	-0.271067
-3.948416	body	-0.271067
-4.449881	definition	-0.367977
-3.859119	Java	-0.191886
-3.859119	Math	-0.492916
-3.948416	generates	-0.191886
-3.859119	executing	-0.367977
-3.721858	Open	-0.191886
-4.213263	256;	-0.492916
-3.785088	optimizations.	-0.191886
-3.859119	Cache	-0.367977
-3.785088	slower	-0.669007
-3.721858	free	-0.191886
-4.060956	consuming	-0.191886
-4.213263	hold	-0.191886
-3.948416	memory,	-0.124939
-4.060956	p.	-0.124939
-4.449881	SIZE;	-0.492916
-3.948416	case.	-0.191886
-3.859119	(	-0.492916
-3.785088	expensive	-0.124939
-3.785088	rounding	-0.271067
-3.948416	130	-0.367977
-3.721858	far	-0.271067
-3.721858	They	-0.367977
-3.721858	exceptions	-0.124939
-4.213263	system,	-0.191886
-3.859119	absolute	-0.271067
-4.060956	(a	-0.191886
-3.721858	machine	-0.367977
-3.721858	Induction	-0.492916
-3.721858	120	-0.191886
-3.859119	hardly	-0.367977
-4.213263	CPUID	-0.367977
-3.859119	saved	-0.271067
-3.721858	changes	-0.191886
-4.060956	integers,	-0.124939
-3.948416	collection	-0.191886
-3.785088	manuals	-0.271067
-3.859119	processor.	-0.191886
-3.785088	Shared	-0.970037
-4.060956	storing	-0.124939
-3.859119	developers	-0.191886
-4.449881	parm2)	-0.669007
-3.785088	T	-0.124939
-4.060956	eliminate	-0.191886
-3.948416	2:	-0.271067
-3.948416	composite	-0.492916
-3.721858	profilers	-0.271067
-3.948416	highly	-0.367977
-3.721858	again	-0.191886
-3.721858	127	-0.124939
-4.060956	language.	-0.124939
-4.213263	aware	-0.492916
-3.721858	Alternatively,	-0.492916
-3.948416	capabilities	-0.124939
-3.948416	4)	-0.492916
-4.060956	linker	-0.271067
-3.785088	int64_t	-0.124939
-3.859119	bits.	-0.492916
-3.785088	measurements	-0.191886
-3.859119	representation	-0.367977
-4.060956	SomeFunction	-0.970037
-3.721858	size,	-0.124939
-3.721858	is.	-0.191886
-4.213263	reductions:	-0.124939
-3.859119	waiting	-0.970037
-3.721858	available,	-0.124939
-3.721858	automatically.	-0.271067
-3.859119	powers	-0.970037
-3.785088	debug	-0.367977
-3.785088	polymorphism	-0.191886
-3.948416	Clang	-0.271067
-3.948416	measured	-0.124939
-4.213263	details.	-0.191886
-3.859119	factor	-0.124939
-3.948416	x);	-0.271067
-3.859119	core.	-0.191886
-3.785088	rules	-0.367977
-3.721858	speed.	-0.191886
-4.060956	vectorization.	-0.367977
-3.785088	anyway.	-0.124939
-4.449881	smallest	-0.271067
-4.449881	responsibility	-0.970037
-3.859119	Mathematical	-0.669007
-3.948416	MMX	-0.124939
-3.948416	reliable	-0.124939
-3.859119	Borland	-0.191886
-4.213263	sense	-0.669007
-4.213263	latest	-0.367977
-3.785088	Now	-0.191886
-4.060956	units	-0.124939
-3.859119	do.	-0.271067
-3.948416	reciprocal	-0.191886
-4.213263	d,	-0.492916
-4.060956	threads.	-0.271067
-3.721858	log	-0.271067
-3.948416	thousand	-0.191886
-3.721858	compile-time	-0.124939
-3.948416	remove	-0.271067
-3.785088	Intel's	-0.271067
-4.060956	16.	-0.191886
-3.948416	registers,	-0.191886
-4.213263	transpose	-0.271067
-3.948416	wait	-0.669007
-3.859119	number.	-0.191886
-3.948416	break	-0.204120
-3.785088	constant.	-0.124939
-4.449881	linkage	-0.602060
-4.213263	possible,	-0.425969
-4.213263	scan	-0.301030
-4.213263	systems:	-0.204120
-3.948416	predictable	-0.204120
-4.213263	"Hello	-0.425969
-3.948416	equal	-0.602060
-3.859119	CodeGear	-0.124939
-4.213263	compact	-0.204120
-3.859119	polynomial	-0.124939
-3.785088	Common	-0.425969
-3.785088	reads	-0.301030
-3.785088	plus	-0.124939
-4.449881	5:	-0.602060
-3.859119	increase	-0.602060
-4.060956	casting	-0.425969
-4.213263	course,	-0.124939
-4.213263	scope	-0.602060
-3.859119	principle	-0.301030
-4.213263	throughput	-0.301030
-3.948416	spent	-0.425969
-3.948416	16;	-0.301030
-3.948416	Func	-0.204120
-4.449881	identify	-0.204120
-3.785088	15	-0.204120
-3.785088	14	-0.204120
-3.859119	this.	-0.124939
-3.785088	Register	-0.204120
-4.060956	complex	-0.124939
-3.785088	Intrinsic	-0.602060
-3.948416	call.	-0.124939
-4.060956	notice	-0.301030
-3.859119	Add	-0.425969
-4.213263	prediction.	-0.204120
-3.948416	expected	-0.301030
-3.948416	declare	-0.204120
-4.060956	application.	-0.301030
-3.785088	here.	-0.301030
-4.213263	largest	-0.301030
-3.859119	dispatched	-0.602060
-4.213263	members.	-0.124939
-3.948416	fits	-0.425969
-3.785088	x-xxxx--x	-0.124939
-3.859119	giving	-0.124939
-4.213263	comparisons	-0.301030
-3.859119	Performance	-0.204120
-3.948416	above,	-0.124939
-4.060956	above.	-0.124939
-3.785088	Pointer	-0.301030
-3.859119	detect	-0.124939
-3.785088	normal	-0.124939
-3.785088	Several	-0.124939
-3.948416	convenient	-0.425969
-3.859119	show	-0.204120
-3.859119	column	-0.204120
-4.449881	{...}	-0.903090
-3.859119	Test	-0.204120
-3.859119	c1	-0.124939
-4.060956	x-	-0.602060
-3.859119	Number	-0.425969
-3.785088	portability	-0.204120
-3.859119	SSE3	-0.124939
-4.213263	evaluate	-0.124939
-3.948416	embedded	-0.301030
-3.859119	Agner	-0.204120
-4.213263	availability	-0.903090
-3.948416	13.1	-0.204120
-4.060956	reference,	-0.204120
-3.948416	.NET	-0.425969
-3.785088	!=	-0.425969
-3.948416	files,	-0.204120
-3.948416	Pointers	-0.301030
-4.060956	half	-0.301030
-3.859119	converting	-0.425969
-3.859119	occurs	-0.204120
-4.060956	Set	-0.301030
-3.859119	costly	-0.204120
-4.213263	newest	-0.425969
-3.859119	specifying	-0.301030
-3.859119	follows	-0.301030
-3.948416	comparing	-0.124939
-3.785088	efficient,	-0.425969
-3.859119	computers	-0.301030
-3.859119	B	-0.204120
-3.948416	System	-0.425969
-3.948416	five	-0.124939
-3.859119	step	-0.204120
-3.785088	poor	-0.124939
-3.785088	prefetch	-0.301030
-3.785088	9	-0.301030
-4.213263	deciding	-0.602060
-3.859119	self-relative	-0.204120
-3.948416	(float	-0.301030
-4.060956	Core	-0.301030
-3.948416	debugger	-0.124939
-4.060956	^	-0.124939
-3.785088	regardless	-0.903090
-3.785088	truncation	-0.204120
-3.859119	base	-0.124939
-3.785088	result.	-0.204120
-3.948416	How	-0.425969
-4.449881	chain.	-0.124939
-3.785088	Reading	-0.425969
-3.948416	compilation	-0.204120
-4.449881	spots	-0.204120
-3.948416	behavior	-0.425969
-3.785088	happens	-0.124939
-3.785088	7	-0.204120
-3.859119	87	-0.124939
-3.859119	Type	-0.204120
-3.859119	places	-0.124939
-4.449881	unwinding	-0.204120
-4.060956	static,	-0.425969
-4.449881	am	-0.301030
-4.060956	leaf	-0.425969
-3.948416	evaluated	-0.204120
-3.859119	completely	-0.124939
-4.060956	again.	-0.204120
-3.948416	powerful	-0.204120
-3.948416	form	-0.425969
-3.859119	deallocated	-0.602060
-3.859119	times.	-0.204120
-3.859119	32-	-0.602060
-3.948416	edx	-0.204120
-4.060956	rule	-0.602060
-3.948416	one.	-0.204120
-3.948416	permissible	-0.204120
-4.213263	worst	-0.425969
-4.060956	job	-0.124939
-3.785088	due	-0.903090
-4.213263	1.0;	-0.124939
-3.785088	depend	-0.903090
-4.060956	biggest	-0.204120
-3.948416	?Func@@YAXQAHAAH@Z	-0.204120
-3.859119	defines	-0.301030
-4.060956	overlap.	-0.204120
-3.859119	processing,	-0.204120
-4.449881	SelectAddMul(short	-0.903090
-3.859119	users	-0.124939
-3.859119	soon	-0.204120
-3.859119	six	-0.204120
-3.859119	Testing	-0.204120
-4.449881	general,	-0.301030
-4.213263	roll	-0.903090
-3.785088	(i.e.	-0.124939
-3.785088	edx,	-0.204120
-3.859119	C++,	-0.204120
-4.449881	i);	-0.903090
-3.859119	mixed	-0.124939
-4.060956	protection	-0.301030
-4.060956	counter.	-0.204120
-3.948416	structure.	-0.204120
-3.948416	4.	-0.204120
-3.785088	security	-0.124939
-3.785088	branches.	-0.204120
-3.785088	Is16vec8	-0.124939
-4.060956	cores.	-0.204120
-3.859119	communication	-0.425969
-4.060956	avoiding	-0.204120
-3.785088	anything	-0.301030
-3.948416	INSTRSET	-0.602060
-3.785088	Accessing	-0.301030
-3.948416	internal	-0.301030
-3.948416	type-casting	-0.204120
-3.948416	requirements	-0.425969
-3.948416	profiler.	-0.124939
-3.859119	__fastcall	-0.124939
-3.859119	loss	-0.823909
-3.859119	cleanup	-0.221849
-4.060956	Functions	-0.221849
-4.213263	handling.	-0.124939
-3.948416	Fortran	-0.124939
-3.948416	increment	-0.124939
-4.213263	drivers	-0.221849
-4.213263	economize	-0.522879
-3.859119	Templates	-0.221849
-4.060956	28	-0.124939
-3.948416	seven	-0.221849
-4.060956	turned	-0.221849
-4.060956	inheritance	-0.221849
-4.213263	overcome	-0.221849
-4.213263	maintain.	-0.124939
-4.213263	fourteen	-0.221849
-3.948416	122	-0.221849
-4.213263	consuming.	-0.221849
-3.859119	method.	-0.124939
-3.948416	backwards	-0.221849
-3.859119	remote	-0.124939
-3.948416	int,	-0.221849
-3.948416	bc	-0.346788
-4.213263	tools.	-0.124939
-4.060956	operation.	-0.221849
-4.213263	future.	-0.124939
-3.859119	swapping	-0.124939
-4.213263	AVX512	-0.124939
-4.213263	considerable	-0.124939
-3.859119	memset	-0.346788
-4.449881	rest	-0.823909
-4.060956	on,	-0.221849
-3.859119	Agner's	-0.823909
-3.859119	Digital	-0.823909
-3.859119	third	-0.124939
-4.449881	Roll	-0.823909
-3.948416	Critical	-0.124939
-4.449881	"Calling	-0.823909
-3.948416	CISC	-0.522879
-3.859119	22	-0.124939
-3.948416	AND	-0.221849
-4.213263	effort	-0.346788
-3.948416	numbers.	-0.124939
-3.859119	popular	-0.124939
-4.060956	SIZE	-0.346788
-3.948416	Runtime	-0.522879
-3.859119	principles	-0.124939
-3.948416	context	-0.522879
-3.859119	names.	-0.124939
-3.859119	reducing	-0.346788
-4.060956	benefit	-0.823909
-3.948416	worth	-0.522879
-3.948416	manual.	-0.124939
-3.948416	specifies	-0.221849
-3.948416	searching	-0.346788
-3.948416	versus	-0.221849
-4.213263	propagation	-0.221849
-4.060956	reduction	-0.221849
-3.948416	effects	-0.221849
-4.449881	1.;	-0.346788
-4.213263	live	-0.522879
-4.060956	multidimensional	-0.522879
-4.060956	install	-0.346788
-3.859119	development,	-0.221849
-3.948416	strict	-0.221849
-4.213263	(c	-0.522879
-3.948416	Position-independent	-0.346788
-3.948416	obvious	-0.221849
-3.948416	swapped	-0.346788
-3.859119	21	-0.221849
-4.213263	vectors:	-0.823909
-3.859119	OR	-0.221849
-4.060956	Array	-0.346788
-3.948416	processes	-0.124939
-3.859119	portable	-0.346788
-4.060956	consume	-0.346788
-4.213263	schemes	-0.522879
-3.948416	80	-0.221849
-3.948416	Arrays	-0.221849
-3.859119	lists	-0.124939
-4.060956	event	-0.221849
-3.948416	computer.	-0.346788
-3.948416	Static	-0.346788
-4.060956	becoming	-0.346788
-3.948416	select	-0.221849
-3.859119	list,	-0.124939
-4.213263	executed	-0.124939
-4.213263	actual	-0.124939
-3.948416	case,	-0.221849
-3.948416	over	-0.221849
-4.060956	realistic	-0.221849
-4.060956	abc	-0.221849
-4.213263	finished.	-0.124939
-4.449881	hand,	-0.124939
-3.948416	_WIN64	-0.221849
-4.449881	recover	-0.522879
-4.060956	console	-0.522879
-4.060956	advice	-0.221849
-3.948416	ways.	-0.221849
-3.948416	16.2	-0.221849
-3.859119	pow	-0.221849
-3.948416	split	-0.221849
-3.948416	generated	-0.522879
-3.948416	created	-0.221849
-4.213263	hundred	-0.221849
-3.859119	250	-0.346788
-3.948416	computing	-0.124939
-3.859119	pointers,	-0.124939
-4.060956	limit	-0.346788
-3.859119	90	-0.221849
-3.859119	follow	-0.522879
-3.859119	loop-carried	-0.823909
-3.948416	library,	-0.221849
-4.060956	recommendation	-0.124939
-3.859119	Objects	-0.124939
-3.948416	compromise	-0.221849
-4.449881	Mars	-0.124939
-4.060956	already	-0.221849
-3.948416	nothing	-0.221849
-3.859119	(a&&b)	-0.823909
-3.859119	physical	-0.221849
-4.449881	((unsigned	-0.346788
-3.859119	xxxxxxxxx	-0.124939
-4.060956	constructors	-0.522879
-3.948416	increased	-0.124939
-3.859119	programming,	-0.124939
-4.213263	factor.	-0.221849
-3.948416	i.e.	-0.221849
-3.948416	nonzero	-0.221849
-3.859119	unacceptably	-0.522879
-3.948416	process.	-0.124939
-3.859119	Calculate	-0.346788
-3.859119	Only	-0.221849
-3.859119	adds	-0.124939
-4.060956	()	-0.823909
-3.859119	Division	-0.346788
-3.948416	pitfalls	-0.346788
-4.213263	package	-0.124939
-3.859119	equivalent	-0.221849
-4.060956	understand	-0.124939
-3.859119	Fortunately,	-0.124939
-4.060956	command	-0.346788
-4.060956	a[i];	-0.124939
-3.948416	relatively	-0.124939
-4.060956	priority.	-0.124939
-3.859119	files.	-0.124939
-4.060956	inefficient,	-0.124939
-3.859119	guidelines	-0.124939
-4.449881	Kernel	-0.221849
-4.213263	necessarily	-0.124939
-3.859119	returns	-0.221849
-3.948416	jobs	-0.124939
-3.859119	Data	-0.221849
-3.859119	frameworks	-0.346788
-4.060956	excessive	-0.346788
-3.948416	safer	-0.522879
-4.060956	Aligning	-0.346788
-4.213263	execution.	-0.124939
-4.213263	a[size],	-0.522879
-4.213263	latency	-0.221849
-3.948416	specify	-0.346788
-4.449881	for(i=0;	-0.346788
-3.948416	larger	-0.124939
-3.859119	-(-a)	-0.346788
-3.859119	Multiple	-0.124939
-4.213263	unfortunately	-0.124939
-3.948416	n!	-0.124939
-3.948416	pieces	-0.823909
-4.060956	Basic	-0.124939
-3.859119	(In	-0.124939
-3.948416	microprocessors.	-0.124939
-4.060956	modules.	-0.346788
-3.948416	s	-0.522879
-4.060956	project	-0.124939
-3.948416	divided	-0.823909
-4.060956	www.agner.org/optimize/asmlib.zip.	-0.221849
-3.948416	Wednesday	-0.221849
-4.213263	mispredictions.	-0.124939
-3.859119	relies	-0.823909
-3.859119	And	-0.221849
-3.948416	platforms,	-0.221849
-4.060956	compare	-0.124939
-3.948416	valid	-0.124939
-3.948416	CPU-intensive	-0.124939
-3.948416	Is	-0.346788
-4.060956	so.	-0.124939
-3.859119	seen	-0.346788
-3.859119	Typically,	-0.346788
-3.859119	107	-0.124939
-3.948416	contiguous	-0.221849
-3.859119	gets	-0.346788
-3.859119	manuals.	-0.221849
-3.948416	tells	-0.522879
-4.060956	wrap	-0.221849
-3.859119	separately	-0.124939
-3.948416	__attribute((	-0.346788
-3.859119	necessary.	-0.124939
-3.948416	increasing	-0.221849
-4.060956	16,	-0.221849
-4.060956	threads,	-0.346788
-3.948416	Development	-0.124939
-4.213263	AND'ed	-0.522879
-4.213263	elimination	-0.124939
-4.060956	all.	-0.221849
-3.859119	..........................................................................................	-0.124939
-3.948416	upper	-0.522879
-3.859119	addresses.	-0.124939
-4.060956	loop-invariant	-0.522879
-3.859119	sum1	-0.221849
-3.948416	~a	-0.346788
-3.859119	Compilers	-0.124939
-3.859119	);	-0.221849
-3.859119	18	-0.221849
-3.859119	them.	-0.124939
-4.060956	point.	-0.124939
-4.213263	consumption	-0.221849
-4.213263	8.	-0.221849
-4.060956	key	-0.124939
-4.060956	explanation.	-0.221849
-3.859119	itself.	-0.124939
-4.060956	updated	-0.124939
-3.948416	appear	-0.346788
-3.859119	Codeplay	-0.221849
-3.859119	(except	-0.522879
-3.948416	combined	-0.221849
-4.060956	definitely	-0.346788
-3.948416	jumps	-0.124939
-3.948416	elements.	-0.221849
-4.060956	.cpp	-0.346788
-3.948416	features,	-0.221849
-4.060956	flag	-0.124939
-4.213263	8)	-0.522879
-3.859119	ever	-0.124939
-4.213263	Writes	-0.522879
-3.859119	13	-0.221849
-3.859119	b[i]	-0.346788
-4.060956	doubled.	-0.221849
-3.859119	written	-0.221849
-4.213263	languages,	-0.124939
-4.060956	malloc	-0.522879
-4.060956	runs	-0.124939
-4.213263	true,	-0.221849
-3.859119	division.	-0.124939
-4.060956	C;	-0.124939
-3.948416	0.18	-0.346788
-3.948416	MS	-0.522879
-3.859119	#endif	-0.221849
-3.948416	present	-0.221849
-4.060956	15.1c	-0.124939
-4.213263	1000;	-0.221849
-3.948416	strlen	-0.346788
-3.948416	__asm	-0.124939
-4.449881	cycle.	-0.346788
-3.859119	11	-0.221849
-3.859119	belong	-0.823909
-3.859119	50	-0.124939
-3.948416	facilities	-0.221849
-3.859119	5.	-0.124939
-3.948416	currently	-0.124939
-3.859119	here:	-0.124939
-3.948416	Does	-0.522879
-3.859119	macros	-0.124939
-3.948416	prefer	-0.346788
-4.060956	divisor	-0.522879
-4.213263	Program	-0.425969
-4.060956	better.	-0.124939
-4.060956	BSD,	-0.124939
-4.060956	bit-mask:	-0.249877
-3.948416	two.	-0.249877
-3.948416	up,	-0.249877
-3.948416	up.	-0.124939
-3.948416	reasons.	-0.124939
-3.948416	103	-0.124939
-4.213263	Choosing	-0.726999
-4.449881	slices	-0.124939
-4.213263	exception.	-0.124939
-3.948416	enum	-0.249877
-4.060956	repeats	-0.249877
-4.213263	highest	-0.124939
-3.948416	96	-0.124939
-4.213263	recommend	-0.249877
-4.213263	lead	-0.726999
-4.060956	additional	-0.124939
-3.948416	51	-0.124939
-3.948416	56	-0.124939
-3.948416	type.	-0.124939
-4.060956	place	-0.124939
-4.060956	preferable	-0.425969
-4.060956	overlap	-0.425969
-4.449881	eight-element	-0.726999
-4.060956	40	-0.124939
-4.060956	43	-0.124939
-4.060956	sixteen	-0.249877
-4.213263	turning	-0.425969
-4.060956	initialization.	-0.249877
-4.060956	Graphics	-0.124939
-3.948416	obstacles	-0.425969
-4.213263	asmlib	-0.425969
-3.948416	Furthermore,	-0.124939
-4.213263	obtain	-0.249877
-4.060956	ebx.	-0.249877
-3.948416	estimate	-0.124939
-4.060956	enabled	-0.124939
-4.213263	enables	-0.425969
-4.213263	Obstacles	-0.726999
-4.449881	r)	-0.425969
-3.948416	regular	-0.124939
-4.060956	m	-0.425969
-4.060956	Metaprogramming	-0.124939
-4.060956	explain	-0.249877
-3.948416	Dispatch	-0.425969
-3.948416	well,	-0.249877
-4.060956	sufficiently	-0.249877
-3.948416	126	-0.124939
-3.948416	bad	-0.124939
-4.449881	p(double	-0.726999
-4.060956	said	-0.249877
-3.948416	modulo	-0.124939
-4.060956	databases	-0.124939
-4.213263	_EM_OVERFLOW);	-0.726999
-3.948416	against	-0.124939
-3.948416	Vectorized	-0.249877
-3.948416	break;	-0.425969
-4.213263	loader	-0.124939
-4.060956	Failure	-0.726999
-4.449881	declared.	-0.124939
-3.948416	resources,	-0.124939
-3.948416	true.	-0.249877
-3.948416	objects.	-0.124939
-4.449881	parallel.	-0.124939
-3.948416	one,	-0.124939
-4.449881	list[300];	-0.726999
-4.449881	r++)	-0.726999
-4.213263	parabola	-0.425969
-4.213263	x^4	-0.425969
-4.060956	mouse	-0.124939
-4.449881	specialization	-0.425969
-3.948416	index.	-0.249877
-3.948416	options.	-0.124939
-4.213263	c++)	-0.726999
-4.060956	are.	-0.124939
-4.213263	needed,	-0.124939
-4.213263	declaring	-0.249877
-4.213263	SVML	-0.124939
-4.213263	*.so).	-0.425969
-4.449881	(u.i	-0.249877
-3.948416	support.	-0.124939
-4.213263	subtraction	-0.425969
-4.060956	Multiply	-0.249877
-4.060956	|=	-0.425969
-4.449881	pool.	-0.249877
-4.060956	performs	-0.124939
-4.060956	"Intel	-0.249877
-3.948416	Are	-0.726999
-3.948416	pre-increment	-0.124939
-3.948416	ownership	-0.425969
-3.948416	88	-0.124939
-4.213263	0x80000000;	-0.425969
-4.060956	move	-0.124939
-3.948416	Can	-0.124939
-4.060956	defining	-0.124939
-4.213263	produces	-0.425969
-4.213263	precision,	-0.124939
-4.213263	non-inlined	-0.726999
-4.213263	drawbacks	-0.726999
-3.948416	__declspec(align(16))	-0.124939
-3.948416	u.f	-0.124939
-3.948416	commercial	-0.124939
-3.948416	configuration	-0.425969
-4.060956	134	-0.124939
-4.060956	lines.	-0.249877
-4.060956	restrictions	-0.726999
-3.948416	Constant	-0.425969
-4.449881	manager	-0.124939
-4.213263	pattern	-0.249877
-4.060956	x86-64	-0.249877
-4.213263	*p+2	-0.249877
-4.060956	Watcom	-0.124939
-4.060956	round	-0.124939
-3.948416	cores,	-0.249877
-4.060956	chooses	-0.425969
-4.060956	running.	-0.124939
-4.213263	serial	-0.249877
-4.449881	cc	-0.726999
-4.060956	Header	-0.425969
-3.948416	150	-0.124939
-3.948416	thanks	-0.726999
-4.449881	2.0;	-0.249877
-4.213263	pipeline	-0.124939
-4.213263	n)	-0.726999
-4.213263	input.	-0.124939
-3.948416	8.1	-0.249877
-3.948416	conditions.	-0.124939
-4.060956	choosing	-0.425969
-4.060956	146	-0.249877
-3.948416	..............................................................................................	-0.124939
-4.213263	_mm256_zeroupper()	-0.726999
-4.060956	Making	-0.249877
-4.213263	flush-to-zero	-0.425969
-3.948416	Taylor	-0.124939
-3.948416	SelectAddMul_pointer	-0.726999
-4.060956	dispatcher.	-0.249877
-4.449881	Clang,	-0.425969
-3.948416	14.9	-0.249877
-4.060956	n,	-0.249877
-3.948416	14.8	-0.249877
-3.948416	overflow,	-0.249877
-4.060956	x++)	-0.425969
-4.060956	optimal.	-0.124939
-4.213263	*)d,	-0.425969
-4.060956	class,	-0.124939
-3.948416	z	-0.425969
-4.449881	advance	-0.249877
-4.449881	c:	-0.249877
-4.060956	guaranteed	-0.726999
-3.948416	think	-0.726999
-4.213263	example.	-0.124939
-3.948416	older	-0.124939
-4.060956	commonly	-0.425969
-3.948416	queue	-0.249877
-3.948416	{}	-0.124939
-4.213263	1.0f;	-0.249877
-4.213263	ALIGN	-0.249877
-4.060956	modification	-0.124939
-4.060956	solutions	-0.124939
-4.449881	guide	-0.726999
-4.060956	appendix	-0.425969
-4.060956	17	-0.249877
-4.060956	empty	-0.425969
-4.449881	maintenance	-0.124939
-4.060956	1:	-0.124939
-4.060956	Out	-0.249877
-4.213263	protected	-0.425969
-4.060956	Container	-0.726999
-3.948416	alternatives	-0.425969
-3.948416	modifications	-0.124939
-4.213263	i_div_3;	-0.124939
-4.060956	s;	-0.249877
-4.213263	case"	-0.124939
-4.449881	distinguish	-0.249877
-4.060956	missing	-0.249877
-4.213263	subroutines	-0.726999
-4.060956	tools	-0.124939
-4.213263	0x2710	-0.425969
-4.449881	spot	-0.124939
-4.060956	powN	-0.249877
-4.060956	C-style	-0.124939
-3.948416	While	-0.124939
-4.060956	Bitfield	-0.425969
-4.060956	clean	-0.425969
-3.948416	according	-0.726999
-4.060956	Bounds	-0.726999
-4.060956	u.i	-0.124939
-3.948416	dramatic	-0.124939
-4.060956	IDE.	-0.124939
-3.948416	lengths	-0.124939
-4.060956	expensive.	-0.249877
-4.060956	efficiency.	-0.124939
-4.060956	Copyright	-0.249877
-3.948416	extended	-0.425969
-4.060956	size)	-0.124939
-3.948416	(Gnu)	-0.124939
-3.948416	contained	-0.249877
-4.060956	transferring	-0.124939
-4.060956	Access	-0.425969
-4.213263	saving	-0.124939
-3.948416	years	-0.124939
-4.213263	y,	-0.425969
-4.060956	structured	-0.249877
-4.060956	documentation	-0.249877
-4.060956	CChild1	-0.249877
-3.948416	PGI	-0.249877
-3.948416	As	-0.124939
-4.449881	(RTTI)	-0.124939
-4.449881	default,	-0.124939
-4.449881	xpow10(double	-0.726999
-4.449881	a2,	-0.726999
-4.213263	inconvenient	-0.249877
-4.449881	expressed	-0.726999
-4.060956	bottleneck	-0.249877
-3.948416	directive	-0.124939
-4.060956	not,	-0.249877
-4.213263	scarce	-0.124939
-4.213263	12.4b	-0.124939
-4.213263	lrint	-0.249877
-3.948416	versions.	-0.124939
-3.948416	.............................................................................................	-0.124939
-4.060956	Alignment	-0.726999
-3.948416	going	-0.249877
-4.060956	underflow	-0.124939
-4.449881	ranges	-0.425969
-4.060956	splitting	-0.249877
-4.213263	user's	-0.249877
-4.060956	__INTEL_COMPILER	-0.249877
-4.060956	cleaned	-0.249877
-4.060956	cached.	-0.124939
-4.060956	video	-0.249877
-4.449881	aa:	-0.425969
-3.948416	information.	-0.124939
-3.948416	Whenever	-0.249877
-4.060956	area	-0.249877
-4.060956	consequence	-0.249877
-4.213263	a1,	-0.726999
-3.948416	unsigned.	-0.249877
-4.060956	pointers.	-0.249877
-3.948416	26	-0.124939
-3.948416	Smaller	-0.425969
-3.948416	29	-0.124939
-3.948416	sum2	-0.124939
-4.060956	(n	-0.249877
-4.060956	(b	-0.249877
-4.060956	2n	-0.249877
-4.213263	idea	-0.425969
-3.948416	......................................................................................................	-0.124939
-4.060956	C,	-0.249877
-3.948416	Same	-0.425969
-4.060956	disable	-0.249877
-4.060956	assumption	-0.249877
-4.060956	treated	-0.425969
-3.948416	Fastcall	-0.249877
-4.060956	RGB	-0.249877
-3.948416	avoids	-0.249877
-4.213263	prevented	-0.249877
-3.948416	.......................................................................................	-0.124939
-3.948416	seldom	-0.249877
-4.060956	mixing	-0.249877
-4.060956	Branches	-0.425969
-3.948416	double.	-0.124939
-3.948416	16.1	-0.249877
-4.449881	(r	-0.726999
-3.948416	50%	-0.425969
-4.213263	suboptimal	-0.249877
-4.449881	16kB	-0.425969
-3.948416	tasks.	-0.124939
-3.948416	image	-0.124939
-4.060956	worst-case	-0.124939
-3.948416	(1)	-0.124939
-3.948416	float,	-0.249877
-4.060956	9.5	-0.249877
-4.449881	Induction;	-0.249877
-3.948416	uncached	-0.124939
-3.948416	individual	-0.124939
-3.948416	begin	-0.124939
-4.449881	interface.	-0.124939
-3.948416	9.3	-0.249877
-4.213263	option.	-0.124939
-4.449881	diagonal.	-0.425969
-4.060956	interfaces	-0.249877
-4.060956	floats	-0.124939
-4.060956	another.	-0.249877
-4.213263	N>	-0.425969
-4.213263	Class	-0.425969
-3.948416	Small	-0.124939
-3.948416	N1	-0.124939
-3.948416	alloca	-0.124939
-4.213263	aliasing.	-0.124939
-4.449881	eliminated	-0.425969
-3.948416	detailed	-0.124939
-4.060956	F32vec4	-0.124939
-4.060956	mask	-0.249877
-4.213263	original	-0.124939
-3.948416	caches	-0.124939
-4.213263	recognize	-0.425969
-4.060956	513	-0.249877
-4.060956	Threads	-0.249877
-4.213263	Overloaded	-0.425969
-3.948416	Contentions	-0.726999
-4.060956	illustrated	-0.425969
-4.449881	words,	-0.249877
-4.060956	returned	-0.249877
-4.060956	existing	-0.249877
-3.948416	Let's	-0.249877
-4.060956	is,	-0.124939
-4.449881	illustrates	-0.124939
-4.060956	unit-testing	-0.249877
-4.213263	i<300;	-0.249877
-3.948416	{return	-0.124939
-4.060956	b2;	-0.249877
-3.948416	Remember	-0.249877
-4.060956	explicit	-0.124939
-3.948416	mirror	-0.249877
-4.449881	dedicated	-0.249877
-4.449881	Disp()	-0.726999
-4.449881	r,	-0.726999
-4.213263	8.26a	-0.124939
-4.060956	breakpoint	-0.249877
-4.449881	b1,	-0.425969
-3.948416	appears	-0.249877
-3.948416	functionality	-0.124939
-4.060956	languages.	-0.124939
-3.948416	sequential	-0.124939
-4.213263	www.agner.org/optimize/cppexamples.zip	-0.249877
-4.449881	style.	-0.249877
-4.060956	MOVNTQ	-0.249877
-3.948416	optimized.	-0.124939
-3.948416	.........................................................................................	-0.124939
-4.060956	CHello	-0.425969
-4.213263	found	-0.425969
-4.060956	me	-0.249877
-3.948416	counts.	-0.124939
-3.948416	measurement	-0.249877
-3.948416	layers	-0.425969
-4.213263	handler	-0.124939
-4.060956	coded	-0.425969
-4.060956	changing	-0.124939
-4.060956	unit-test	-0.124939
-4.060956	implicit	-0.249877
-3.948416	smaller.	-0.124939
-4.213263	interval	-0.124939
-3.948416	33	-0.124939
-3.948416	31	-0.124939
-4.449881	int)b	-0.425969
-3.948416	3.	-0.425969
-4.060956	μs	-0.249877
-4.213263	module.	-0.425969
-3.948416	cast	-0.726999
-4.060956	8-bit	-0.425969
-4.060956	Integers	-0.249877
-3.948416	Calling	-0.249877
-4.060956	(double	-0.249877
-4.213263	Weekdays	-0.425969
-3.948416	application-specific	-0.249877
-3.948416	first.	-0.249877
-3.948416	considerations	-0.124939
-3.948416	63	-0.124939
-4.060956	represented	-0.425969
-3.948416	force	-0.249877
-3.948416	manually.	-0.124939
-4.060956	identified	-0.726999
-4.060956	www.agner.org/optimize/cppexamples.zip.	-0.124939
-3.948416	virus	-0.249877
-4.060956	structures.	-0.124939
-3.948416	exp	-0.249877
-4.060956	(in	-0.425969
-4.060956	pointer,	-0.249877
-4.060956	kept	-0.249877
-3.948416	Y	-0.124939
-4.060956	interprocedural	-0.249877
-4.060956	incompatible	-0.425969
-4.213263	bytes)	-0.249877
-4.060956	selected	-0.124939
-4.060956	multiplied	-0.726999
-4.060956	reproducible	-0.249877
-3.948416	normally	-0.249877
-3.948416	constants.	-0.249877
-4.060956	cache,	-0.124939
-4.060956	entry	-0.124939
-3.948416	inferior	-0.425969
-4.213263	obsolete.	-0.124939
-3.948416	simultaneously	-0.249877
-4.213263	routine	-0.249877
-3.948416	auto_ptr	-0.124939
-4.213263	tree	-0.425969
-4.213263	unable	-0.726999
-4.060956	Optimizes	-0.249877
-4.213263	constants,	-0.249877
-4.060956	techniques	-0.124939
-3.948416	otherwise	-0.124939
-4.060956	Smart	-0.726999
-3.948416	opens	-0.249877
-4.060956	modified	-0.425969
-4.213263	15.1a	-0.726999
-4.060956	Comparison	-0.726999
-4.060956	finished	-0.425969
-4.213263	run.	-0.124939
-4.060956	sequentially	-0.249877
-3.948416	Intel:	-0.249877
-3.948416	format.	-0.249877
-3.948416	programs.	-0.124939
-3.948416	manner	-0.124939
-3.948416	work.	-0.425969
-3.948416	uint64_t	-0.124939
-4.449881	(level	-0.726999
-3.948416	tests	-0.124939
-3.948416	Then	-0.249877
-4.060956	soft	-0.249877
-4.449881	100,	-0.124939
-4.060956	results.	-0.249877
-3.948416	hyperthreading	-0.124939
-4.060956	operators.	-0.124939
-4.060956	simpler	-0.124939
-4.060956	format	-0.249877
-4.060956	reasonable	-0.249877
-3.948416	resolution	-0.124939
-3.948416	units,	-0.124939
-3.948416	12.2	-0.249877
-4.449881	b:	-0.249877
-3.948416	processing.	-0.124939
-4.213263	well-defined	-0.249877
-4.449881	Still	-0.726999
-4.060956	45	-0.249877
-4.449881	bb	-0.726999
-4.213263	detail	-0.301030
-4.060956	advices	-0.124939
-4.449881	conclusion	-0.301030
-4.213263	deleted	-0.124939
-4.213263	49	-0.124939
-4.213263	parameters,	-0.124939
-4.060956	Storing	-0.124939
-4.060956	(set)	-0.602060
-4.060956	fine-grained	-0.301030
-4.213263	Execution	-0.301030
-4.449881	LoadVector(cc	-0.602060
-4.449881	arbitrary	-0.124939
-4.060956	Which	-0.124939
-4.060956	behave	-0.301030
-4.060956	bits,	-0.124939
-4.060956	compact.	-0.124939
-4.213263	behaves	-0.301030
-4.213263	FPGA	-0.124939
-4.060956	earlier	-0.124939
-4.213263	5)	-0.124939
-4.449881	&,	-0.602060
-4.060956	101	-0.124939
-4.449881	reasons:	-0.301030
-4.060956	consecutively	-0.301030
-4.060956	Extra	-0.124939
-4.060956	error.	-0.301030
-4.060956	carried	-0.301030
-4.060956	reordering	-0.124939
-4.060956	platform.	-0.124939
-4.213263	satisfied	-0.602060
-4.060956	catch	-0.124939
-4.060956	/arch:AVX	-0.124939
-4.060956	93	-0.124939
-4.060956	53	-0.124939
-4.060956	#else	-0.301030
-4.213263	addition.	-0.124939
-4.060956	Text	-0.602060
-4.449881	big-endian	-0.602060
-4.060956	54	-0.124939
-4.060956	119	-0.124939
-4.060956	!a	-0.301030
-4.449881	abstraction	-0.124939
-4.449881	each,	-0.124939
-4.213263	11)	-0.301030
-4.060956	code).	-0.124939
-4.060956	......................................................................................	-0.124939
-4.213263	wrong	-0.124939
-4.449881	LoadVector(bb	-0.602060
-4.213263	alias	-0.124939
-4.449881	blocks,	-0.124939
-4.060956	feedback	-0.124939
-4.213263	pure_function	-0.124939
-4.060956	a-a	-0.602060
-4.449881	chains.	-0.124939
-4.060956	prefetching	-0.124939
-4.060956	compiler-generated	-0.301030
-4.213263	redesign	-0.301030
-4.213263	differently	-0.301030
-4.213263	B,	-0.301030
-4.213263	reasonably	-0.301030
-4.060956	Two	-0.124939
-4.060956	55	-0.124939
-4.449881	libraries:	-0.301030
-4.060956	projects	-0.124939
-4.449881	1)sign	-0.602060
-4.060956	14.6	-0.301030
-4.060956	combination	-0.602060
-4.213263	libraries,	-0.301030
-4.060956	mean	-0.124939
-4.060956	inserts	-0.124939
-4.213263	const*)p);	-0.602060
-4.060956	hidden	-0.124939
-4.213263	x*x*x*x*x*x*x*x	-0.602060
-4.213263	errors.	-0.124939
-4.060956	One	-0.124939
-4.213263	blocking	-0.124939
-4.213263	Faster	-0.301030
-4.213263	sources	-0.602060
-4.060956	well-tested	-0.124939
-4.213263	devices,	-0.124939
-4.060956	multiplication,	-0.124939
-4.213263	part.	-0.301030
-4.060956	API	-0.124939
-4.213263	starts	-0.124939
-4.060956	only.	-0.124939
-4.213263	counters,	-0.124939
-4.213263	execution,	-0.124939
-4.060956	list[i+1]	-0.301030
-4.060956	distance	-0.124939
-4.213263	14.28	-0.124939
-4.060956	zero,	-0.124939
-4.060956	r1	-0.124939
-4.060956	r2	-0.301030
-4.060956	(MS)	-0.124939
-4.060956	aligning	-0.124939
-4.060956	assuming	-0.301030
-4.213263	r;	-0.301030
-4.213263	analysis	-0.124939
-4.213263	seem	-0.124939
-4.213263	perhaps	-0.124939
-4.213263	service	-0.124939
-4.060956	Comes	-0.602060
-4.449881	esp	-0.124939
-4.213263	features.	-0.124939
-4.213263	---xx----	-0.124939
-4.060956	...............................................................................................	-0.124939
-4.060956	C/C++	-0.301030
-4.213263	100.	-0.124939
-4.213263	b;};	-0.301030
-4.060956	consumes	-0.124939
-4.060956	numbers,	-0.124939
-4.060956	129	-0.124939
-4.449881	reload	-0.301030
-4.060956	124	-0.124939
-4.449881	motion	-0.124939
-4.213263	speed-critical	-0.124939
-4.060956	numbers:	-0.602060
-4.060956	(page	-0.124939
-4.213263	12.	-0.124939
-4.213263	1000	-0.301030
-4.060956	long.	-0.124939
-4.213263	organization	-0.124939
-4.449881	slow,	-0.124939
-4.213263	performed	-0.301030
-4.060956	high-level	-0.301030
-4.449881	advance.	-0.301030
-4.060956	anyway	-0.124939
-4.213263	(*.dll	-0.602060
-4.060956	Intel-based	-0.602060
-4.213263	main()	-0.301030
-4.060956	x2	-0.301030
-4.060956	database,	-0.301030
-4.060956	Works	-0.301030
-4.060956	calculations:	-0.301030
-4.060956	basis	-0.301030
-4.060956	updating	-0.124939
-4.060956	manipulation	-0.124939
-4.213263	28.	-0.124939
-4.060956	C0	-0.301030
-4.060956	access,	-0.124939
-4.060956	calculations,	-0.124939
-4.060956	multi-core	-0.301030
-4.060956	level,	-0.301030
-4.213263	Exception	-0.301030
-4.060956	optimization,	-0.124939
-4.060956	Whether	-0.301030
-4.213263	contents	-0.301030
-4.060956	books	-0.124939
-4.060956	removed	-0.124939
-4.060956	164	-0.301030
-4.449881	matrix[rows][columns];	-0.602060
-4.449881	list.	-0.124939
-4.213263	exponential	-0.301030
-4.060956	generality	-0.301030
-4.213263	takes.	-0.124939
-4.060956	multithreaded	-0.124939
-4.213263	list[i+2]	-0.301030
-4.213263	Total	-0.301030
-4.060956	explicitly.	-0.124939
-4.060956	programs,	-0.124939
-4.060956	optimizes	-0.124939
-4.213263	instruments	-0.124939
-4.213263	After	-0.301030
-4.213263	involving	-0.124939
-4.449881	(vector)	-0.602060
-4.060956	unfortunate	-0.124939
-4.449881	"Optimizing	-0.602060
-4.060956	parameter,	-0.124939
-4.060956	exceptions.	-0.301030
-4.060956	time-	-0.301030
-4.449881	K8	-0.124939
-4.060956	loaded.	-0.301030
-4.449881	(i=0;	-0.301030
-4.213263	14.30	-0.124939
-4.060956	b;}	-0.301030
-4.060956	wasteful	-0.124939
-4.449881	Return	-0.124939
-4.449881	StoreVector(void	-0.602060
-4.449881	microcontrollers	-0.602060
-4.060956	character	-0.301030
-4.213263	implemented.	-0.301030
-4.449881	fact,	-0.301030
-4.449881	runtime.	-0.124939
-4.060956	manually	-0.124939
-4.060956	xxn	-0.124939
-4.449881	|,	-0.301030
-4.060956	7.2	-0.301030
-4.060956	Thread-local	-0.602060
-4.060956	81	-0.124939
-4.060956	7.1	-0.301030
-4.060956	signal	-0.124939
-4.449881	circular	-0.602060
-4.060956	7.4	-0.301030
-4.060956	ignore	-0.124939
-4.060956	keywords	-0.124939
-4.060956	7.8	-0.301030
-4.213263	once.	-0.124939
-4.213263	89	-0.301030
-4.060956	list[100];	-0.124939
-4.213263	considered	-0.301030
-4.213263	Windows).	-0.124939
-4.213263	for.	-0.124939
-4.060956	divisions	-0.124939
-4.213263	8;	-0.301030
-4.060956	reflects	-0.602060
-4.060956	.....................................................................................................	-0.124939
-4.060956	lies	-0.301030
-4.213263	trigonometric	-0.301030
-4.213263	manipulate	-0.301030
-4.449881	fractional	-0.602060
-4.213263	-128	-0.124939
-4.213263	spaced	-0.301030
-4.213263	approximate	-0.124939
-4.060956	comparisons,	-0.124939
-4.060956	User	-0.124939
-4.449881	dividend	-0.301030
-4.213263	unpredictable	-0.124939
-4.449881	LoadVector(void	-0.602060
-4.060956	step.	-0.124939
-4.060956	Z	-0.124939
-4.213263	separated	-0.301030
-4.060956	64,	-0.124939
-4.060956	copies	-0.301030
-4.213263	brand.	-0.124939
-4.060956	annoying	-0.301030
-4.213263	CodeAnalyst.	-0.124939
-4.213263	Literature	-0.124939
-4.213263	study	-0.301030
-4.449881	stack,	-0.301030
-4.449881	collection.	-0.124939
-4.213263	occurs,	-0.124939
-4.449881	-fno-pic	-0.124939
-4.213263	_M_IX86	-0.124939
-4.060956	elsewhere	-0.124939
-4.060956	bypassing	-0.301030
-4.449881	0x273F	-0.124939
-4.060956	135	-0.124939
-4.060956	looks	-0.602060
-4.213263	{double	-0.301030
-4.213263	implementing	-0.301030
-4.060956	int.	-0.124939
-4.060956	space,	-0.124939
-4.060956	skip	-0.124939
-4.060956	137	-0.124939
-4.060956	132	-0.124939
-4.060956	position-	-0.602060
-4.213263	Index	-0.602060
-4.060956	Specifies	-0.124939
-4.449881	residual	-0.602060
-4.213263	operations,	-0.124939
-4.060956	C++.	-0.124939
-4.060956	input/output	-0.124939
-4.213263	packages	-0.124939
-4.060956	operations:	-0.301030
-4.213263	Explicit	-0.301030
-4.060956	purpose.	-0.301030
-4.213263	reciprocal_divisor;	-0.124939
-4.213263	compilation.	-0.301030
-4.060956	(number	-0.602060
-4.449881	endian	-0.124939
-4.060956	allocates	-0.124939
-4.060956	136	-0.124939
-4.060956	reveals	-0.124939
-4.213263	filled	-0.124939
-4.060956	(requires	-0.124939
-4.449881	offer	-0.301030
-4.213263	Bitfields	-0.124939
-4.060956	At	-0.124939
-4.213263	up-to-date	-0.124939
-4.449881	leaving	-0.301030
-4.213263	Inheritance	-0.124939
-4.060956	153	-0.124939
-4.060956	degree	-0.301030
-4.449881	_mm_storeu_si128((__m128i	-0.602060
-4.060956	automatically,	-0.124939
-4.213263	sequentially.	-0.124939
-4.213263	Enums	-0.124939
-4.060956	Algebraic	-0.301030
-4.213263	A,	-0.301030
-4.060956	operands.	-0.124939
-4.060956	i<100;	-0.124939
-4.213263	0.11	-0.124939
-4.060956	0.12	-0.124939
-4.213263	nearest	-0.124939
-4.060956	To	-0.124939
-4.213263	x--	-0.602060
-4.060956	language,	-0.124939
-4.060956	145	-0.124939
-4.060956	140	-0.124939
-4.060956	141	-0.124939
-4.060956	RISC	-0.124939
-4.060956	Consider	-0.301030
-4.060956	text	-0.301030
-4.060956	Object	-0.124939
-4.060956	14.10	-0.301030
-4.060956	14.11	-0.301030
-4.449881	<int	-0.301030
-4.060956	back.	-0.301030
-4.060956	8.4	-0.301030
-4.060956	8.7	-0.301030
-4.213263	listing	-0.124939
-4.060956	twice	-0.301030
-4.060956	Pascal	-0.124939
-4.060956	expected.	-0.124939
-4.060956	14.4	-0.301030
-4.060956	Vec16s	-0.124939
-4.060956	Simple	-0.124939
-4.213263	Manual",	-0.602060
-4.213263	leave	-0.124939
-4.213263	solved	-0.301030
-4.060956	supplied	-0.301030
-4.060956	Available	-0.301030
-4.213263	translated	-0.301030
-4.213263	Linux:	-0.124939
-4.060956	With	-0.301030
-4.060956	Has	-0.124939
-4.213263	overriding	-0.301030
-4.449881	Opteron	-0.602060
-4.449881	systems".	-0.124939
-4.213263	correct	-0.124939
-4.060956	caching.	-0.124939
-4.060956	overflow:	-0.124939
-4.213263	scans	-0.301030
-4.449881	way:	-0.124939
-4.060956	Sometimes	-0.301030
-4.060956	-fno-builtin	-0.301030
-4.213263	justify	-0.124939
-4.449881	contrary,	-0.124939
-4.449881	conventions.	-0.124939
-4.213263	initialization	-0.602060
-4.213263	Internet	-0.124939
-4.213263	cover	-0.301030
-4.213263	Constructors	-0.301030
-4.060956	PC's	-0.124939
-4.060956	7.21	-0.301030
-4.060956	delays	-0.124939
-4.449881	a);	-0.602060
-4.213263	c[i]	-0.301030
-4.060956	cleaning	-0.602060
-4.060956	way,	-0.124939
-4.060956	Big	-0.124939
-4.213263	ZMM	-0.602060
-4.213263	coefficients	-0.124939
-4.060956	DOS	-0.301030
-4.213263	-fpie	-0.124939
-4.060956	labels	-0.124939
-4.213263	6,	-0.301030
-4.060956	ret	-0.301030
-4.060956	Signed	-0.124939
-4.060956	logarithms	-0.124939
-4.060956	stored.	-0.124939
-4.060956	manner.	-0.124939
-4.060956	Today,	-0.301030
-4.213263	easiest	-0.301030
-4.060956	pop	-0.124939
-4.060956	3.5	-0.301030
-4.213263	-S	-0.124939
-4.213263	inlined.	-0.124939
-4.213263	cmp	-0.124939
-4.213263	flow	-0.301030
-4.060956	directives.	-0.124939
-4.213263	deallocated.	-0.301030
-4.213263	(128	-0.124939
-4.060956	Programmers	-0.124939
-4.060956	focus	-0.124939
-4.213263	definition.	-0.301030
-4.213263	track	-0.301030
-4.213263	condition.	-0.124939
-4.060956	s3	-0.124939
-4.060956	s2	-0.124939
-4.213263	contemporary	-0.124939
-4.060956	66	-0.124939
-4.060956	probably	-0.124939
-4.060956	longjmp	-0.124939
-4.449881	2exponent	-0.124939
-4.060956	leads	-0.602060
-4.060956	Alignd	-0.602060
-4.213263	improving	-0.124939
-4.060956	sizes.	-0.124939
-4.060956	.......................................................................................................	-0.124939
-4.060956	holds	-0.301030
-4.060956	competing	-0.124939
-4.213263	questions	-0.124939
-4.213263	register,	-0.124939
-4.060956	etc.,	-0.124939
-4.060956	ReadTSC	-0.124939
-4.213263	with:	-0.602060
-4.060956	kernel	-0.124939
-4.449881	CPUs").	-0.301030
-4.449881	j;	-0.124939
-4.213263	natural	-0.124939
-4.060956	Examples	-0.301030
-4.060956	(iset	-0.602060
-4.060956	F2	-0.124939
-4.213263	moving	-0.301030
-4.213263	9.6b.	-0.301030
-4.060956	-O3	-0.301030
-4.213263	unusual	-0.301030
-4.449881	misses,	-0.602060
-4.060956	Divide	-0.602060
-4.449881	sorted	-0.602060
-4.060956	efficiency,	-0.124939
-4.449881	same.	-0.124939
-4.213263	(STL)	-0.124939
-4.449881	rid	-0.602060
-4.060956	ms	-0.301030
-4.060956	arrays,	-0.124939
-4.213263	matrix[r][c]	-0.301030
-4.060956	issue	-0.124939
-4.213263	solve	-0.301030
-4.060956	since	-0.124939
-4.449881	beyond	-0.602060
-4.213263	readable	-0.124939
-4.060956	infinity	-0.602060
-4.060956	bookkeeping	-0.124939
-4.060956	formula	-0.124939
-4.060956	technical	-0.124939
-4.060956	instr.	-0.602060
-4.213263	specified	-0.124939
-4.060956	organizing	-0.301030
-4.213263	9.5a	-0.124939
-4.213263	false,	-0.301030
-4.060956	open	-0.124939
-4.060956	decomposition	-0.124939
-4.060956	measuring	-0.301030
-4.213263	File	-0.124939
-4.213263	negligible	-0.124939
-4.060956	took	-0.124939
-4.060956	on.	-0.124939
-4.213263	Hyperthreading	-0.124939
-4.060956	30	-0.124939
-4.060956	initially	-0.602060
-4.213263	occur.	-0.124939
-4.213263	Strings	-0.124939
-4.213263	Preprocessing	-0.602060
-4.213263	utilize	-0.301030
-4.449881	(0,0,0,0,0,0,0,0)	-0.301030
-4.060956	38	-0.124939
-4.060956	reference.	-0.124939
-4.449881	FUNCNAME	-0.124939
-4.060956	history	-0.602060
-4.060956	CChild2	-0.124939
-4.449881	bit:	-0.301030
-4.060956	forums	-0.124939
-4.213263	addressing	-0.602060
-4.449881	1024;	-0.301030
-4.060956	C#,	-0.301030
-4.213263	allocating	-0.124939
-4.060956	a+b	-0.301030
-4.449881	taken	-0.602060
-4.213263	microprocessor.	-0.124939
-4.060956	argument	-0.124939
-4.060956	Func1	-0.124939
-4.213263	Unix-like	-0.124939
-4.060956	-----	-0.124939
-4.060956	2.5	-0.301030
-4.213263	read-only	-0.124939
-4.060956	well-structured	-0.124939
-4.060956	represent	-0.301030
-4.060956	elsewhere.	-0.124939
-4.060956	micro-op	-0.301030
-4.213263	best.	-0.124939
-4.060956	returning	-0.301030
-4.060956	Long	-0.301030
-4.213263	r1;	-0.124939
-4.060956	CPU-	-0.602060
-4.213263	top	-0.602060
-4.060956	decide	-0.124939
-4.449881	other.	-0.301030
-4.213263	brackets	-0.124939
-4.060956	2004.	-0.124939
-4.060956	odd	-0.124939
-4.060956	7.7	-0.301030
-4.060956	Documentation	-0.124939
-4.449881	prone.	-0.124939
-4.060956	compile-	-0.301030
-4.060956	Global	-0.301030
-4.060956	lookups	-0.124939
-4.060956	Whole	-0.602060
-4.060956	a*b	-0.301030
-4.213263	linker.	-0.301030
-4.213263	security.	-0.124939
-4.449881	lookup.	-0.124939
-4.213263	78	-0.301030
-4.213263	handled	-0.124939
-4.213263	a[],	-0.602060
-4.060956	implicitly	-0.301030
-4.213263	terminating	-0.301030
-4.060956	_WIN32	-0.124939
-4.060956	(2n	-0.602060
-4.213263	measures	-0.301030
-4.060956	multiplications.	-0.124939
-4.213263	intensive	-0.124939
-4.213263	moved	-0.301030
-4.060956	ReadTSC()	-0.124939
-4.449881	valid.	-0.301030
-4.449881	b[size];	-0.602060
-4.060956	Not	-0.301030
-4.213263	none	-0.602060
-4.060956	"what	-0.301030
-4.060956	Comparing	-0.124939
-4.060956	instructions,	-0.124939
-4.060956	Microprocessor	-0.124939
-4.213263	metaprogramming.	-0.124939
-4.060956	Size	-0.602060
-4.213263	metaprogramming,	-0.124939
-4.213263	bypass	-0.602060
-4.060956	output.	-0.301030
-4.060956	...........................................................................................	-0.124939
-4.213263	numerically	-0.602060
-4.060956	expression.	-0.124939
-4.060956	..........................................................................................................	-0.124939
-4.213263	InstructionSet()	-0.301030
-4.060956	Eliminate	-0.301030
-4.060956	backup	-0.124939
-4.060956	13.6	-0.301030
-4.449881	Get	-0.301030
-4.060956	throw	-0.124939
-4.060956	More	-0.124939
-4.060956	---x-----	-0.124939
-4.449881	_mm_loadu_si128((__m128i	-0.301030
-4.060956	Before	-0.301030
-4.060956	Applications	-0.602060
-4.060956	25	-0.124939
-4.213263	AVX-512	-0.602060
-4.060956	23	-0.124939
-4.449881	evict	-0.301030
-4.060956	Copying	-0.124939
-4.449881	(x	-0.602060
-4.060956	being	-0.124939
-4.060956	sum,	-0.301030
-4.060956	unrolled	-0.124939
-4.213263	slow.	-0.124939
-4.449881	StoreVector(aa	-0.602060
-4.060956	7.11	-0.301030
-4.060956	market	-0.124939
-4.060956	vectors,	-0.124939
-4.060956	resource.	-0.301030
-4.213263	Architecture	-0.301030
-4.060956	7.12	-0.301030
-4.213263	limited.	-0.124939
-4.213263	11.3	-0.124939
-4.060956	typedef	-0.124939
-4.213263	0x2700	-0.602060
-4.060956	Replace	-0.124939
-4.060956	Instead,	-0.124939
-4.060956	frameworks,	-0.124939
-4.060956	*(p++)	-0.301030
-4.060956	(Intel	-0.301030
-4.213263	nearby	-0.124939
-4.213263	fragmented.	-0.301030
-4.213263	truncation.	-0.124939
-4.213263	Different	-0.301030
-4.449881	logarithm	-0.124939
-4.213263	Day	-0.301030
-4.060956	ported	-0.602060
-4.213263	inline.	-0.301030
-4.060956	big.	-0.124939
-4.213263	deallocation	-0.301030
-4.449881	(PLT)	-0.124939
-4.060956	7.22	-0.301030
-4.213263	Context	-0.301030
-4.060956	7.23	-0.301030
-4.060956	services	-0.124939
-4.060956	7.20	-0.301030
-4.449881	extremely	-0.124939
-4.213263	kb	-0.124939
-4.449881	joining	-0.124939
-4.213263	decrement	-0.301030
-4.060956	0x1C.	-0.124939
-4.213263	free.	-0.124939
-4.213263	Check	-0.301030
-4.060956	double,	-0.301030
-4.449881	periodic	-0.602060
-4.060956	7.27	-0.301030
-4.060956	7.24	-0.301030
-4.060956	product	-0.301030
-4.060956	7.25	-0.301030
-4.060956	7.28	-0.124939
-4.060956	references,	-0.124939
-4.449881	CPUs"	-0.301030
-4.060956	experience	-0.124939
-4.449881	determine	-0.124939
-4.449881	<typename	-0.124939
-4.060956	Generate	-0.124939
-4.449881	certainly	-0.124939
-4.060956	Devirtualization	-0.124939
-4.060956	pivot	-0.124939
-4.060956	__declspec(	-0.301030
-4.213263	mispredictions	-0.124939
-4.449881	a[100],	-0.124939
-4.060956	allocations	-0.301030
-4.449881	necessary,	-0.124939
-4.060956	9.4	-0.301030
-4.060956	float.	-0.124939
-4.449881	1.0f;}	-0.301030
-4.213263	indeed	-0.124939
-4.060956	9.1	-0.301030
-4.060956	(not	-0.124939
-4.060956	built-in	-0.301030
-4.060956	n.	-0.124939
-4.060956	complete	-0.301030
-4.449881	(x)	-0.301030
-4.060956	ecx,	-0.124939
-4.213263	modified.	-0.301030
-4.449881	folding	-0.301030
-4.213263	expects	-0.124939
-4.449881	Call	-0.301030
-4.449881	joined	-0.301030
-4.060956	classes,	-0.124939
-4.060956	compute	-0.124939
-4.213263	interpreting	-0.301030
-4.213263	LIBM	-0.124939
-4.213263	accesses	-0.124939
-4.060956	$B1$2	-0.124939
-4.060956	copying.	-0.124939
-4.449881	Volume	-0.124939
-4.449881	placed	-0.301030
-4.449881	spot.	-0.124939
-4.213263	variable,	-0.124939
-4.060956	jumping	-0.301030
-4.060956	compared	-0.301030
-4.213263	3-dimensional	-0.301030
-4.060956	x.	-0.124939
-4.060956	a.	-0.124939
-4.060956	post-increment.	-0.124939
-4.213263	sufficient	-0.301030
-4.213263	evicted	-0.124939
-4.060956	flags	-0.124939
-4.060956	Sum2	-0.602060
-4.449881	(2,2,2,2,2,2,2,2)	-0.301030
-4.449881	[edx]	-0.301030
-4.060956	7.14	-0.301030
-4.060956	7.16	-0.301030
-4.060956	7.17	-0.301030
-4.213263	templates.	-0.124939
-4.060956	7.13	-0.124939
-4.060956	7.19	-0.301030
-4.449881	beware	-0.301030
-4.060956	7.18	-0.301030
-4.213263	card	-0.124939
-4.213263	(4)	-0.124939
-4.213263	elements:	-0.425969
-4.449881	viable	-0.124939
-4.213263	7.10	-0.425969
-4.449881	_mm_set1_epi16(2);	-0.425969
-4.213263	templates,	-0.124939
-4.213263	complexity	-0.124939
-4.213263	511	-0.124939
-4.449881	Member	-0.124939
-4.213263	modifying	-0.124939
-4.213263	undesired	-0.124939
-4.213263	7.15	-0.425969
-4.213263	symbol	-0.425969
-4.213263	problematic	-0.425969
-4.449881	invest	-0.124939
-4.213263	memcpy,	-0.124939
-4.449881	Sum3	-0.124939
-4.213263	Pure	-0.124939
-4.213263	impossible	-0.124939
-4.449881	B1;	-0.425969
-4.449881	forwarding	-0.425969
-4.213263	storage,	-0.124939
-4.449881	107).	-0.124939
-4.213263	64)	-0.124939
-4.213263	static.	-0.124939
-4.213263	_M_X64	-0.124939
-4.213263	CriticalFunction();	-0.124939
-4.213263	x-xxx---x	-0.124939
-4.449881	shown	-0.425969
-4.213263	lack	-0.124939
-4.449881	a2	-0.124939
-4.449881	a1	-0.124939
-4.213263	16)	-0.124939
-4.449881	debugger.	-0.124939
-4.213263	mostly	-0.124939
-4.213263	a)	-0.425969
-4.213263	ptr	-0.124939
-4.213263	fastcall	-0.124939
-4.213263	accumulators	-0.124939
-4.449881	aliasing"	-0.124939
-4.449881	Implementation	-0.124939
-4.213263	significantly	-0.124939
-4.213263	parallelism.	-0.425969
-4.449881	479001600};	-0.124939
-4.213263	"Performance	-0.425969
-4.213263	type-casted	-0.425969
-4.213263	x4	-0.425969
-4.449881	frequency.	-0.124939
-4.213263	interpretation	-0.124939
-4.213263	(chapter	-0.124939
-4.449881	research,	-0.124939
-4.213263	SIAM	-0.425969
-4.213263	a[i+1]	-0.425969
-4.213263	x-xxx----	-0.124939
-4.213263	send	-0.124939
-4.449881	string[100],	-0.425969
-4.213263	SSSE3	-0.124939
-4.213263	expressions,	-0.124939
-4.213263	CriticalFunctionType	-0.124939
-4.449881	71).	-0.124939
-4.213263	ASCII	-0.124939
-4.213263	overlapping	-0.124939
-4.213263	computationally	-0.425969
-4.213263	executes	-0.124939
-4.213263	window	-0.124939
-4.213263	ten	-0.124939
-4.213263	Structure	-0.124939
-4.213263	jobs.	-0.124939
-4.213263	please	-0.124939
-4.449881	Unions	-0.124939
-4.213263	AMD:	-0.124939
-4.213263	((a*x+b)*x+c)*x+d	-0.425969
-4.213263	9.2	-0.425969
-4.213263	1024	-0.124939
-4.449881	programmed.	-0.124939
-4.213263	Aligned	-0.124939
-4.213263	past	-0.124939
-4.213263	9.6	-0.425969
-4.449881	object's	-0.124939
-4.449881	b2,	-0.425969
-4.213263	partial	-0.124939
-4.449881	(a+b)+(c+d)	-0.425969
-4.213263	(16	-0.124939
-4.213263	xor	-0.124939
-4.213263	again,	-0.124939
-4.213263	9.9	-0.425969
-4.449881	resolve	-0.124939
-4.213263	context.	-0.124939
-4.449881	(May	-0.425969
-4.449881	131.	-0.124939
-4.213263	goal	-0.124939
-4.213263	discovered	-0.425969
-4.449881	Exp(float	-0.425969
-4.213263	9.8	-0.425969
-4.213263	_MSC_VER	-0.124939
-4.449881	16.3	-0.425969
-4.213263	chance	-0.124939
-4.213263	manipulated	-0.124939
-4.213263	c+b	-0.124939
-4.449881	override	-0.124939
-4.213263	branches,	-0.124939
-4.213263	applications,	-0.425969
-4.213263	developed	-0.124939
-4.213263	7.29	-0.425969
-4.213263	CriticalFunction.	-0.124939
-4.213263	discusses	-0.425969
-4.213263	#elif	-0.425969
-4.213263	7.26	-0.425969
-4.449881	"Instruction	-0.425969
-4.213263	400	-0.124939
-4.449881	invariant	-0.425969
-4.449881	c*x	-0.425969
-4.213263	carefully	-0.124939
-4.213263	CriticalInnerFunction	-0.124939
-4.213263	a/1=a	-0.124939
-4.213263	__m128	-0.124939
-4.213263	operator;	-0.124939
-4.213263	subexpression.	-0.124939
-4.213263	freed	-0.124939
-4.213263	operator,	-0.124939
-4.213263	p->Hello();	-0.124939
-4.213263	CPUs:	-0.124939
-4.449881	0's	-0.124939
-4.213263	chip	-0.124939
-4.213263	operator.	-0.425969
-4.213263	proceed	-0.124939
-4.449881	CriticalFunction_386(int	-0.425969
-4.213263	scientific	-0.124939
-4.213263	biased	-0.124939
-4.213263	minor	-0.124939
-4.449881	screen.	-0.124939
-4.449881	market.	-0.124939
-4.449881	aligned(16)))	-0.124939
-4.449881	justified	-0.124939
-4.213263	exit	-0.124939
-4.449881	cos(x);	-0.124939
-4.213263	having	-0.124939
-4.449881	for-loop	-0.124939
-4.213263	char,	-0.124939
-4.213263	a/a=1	-0.124939
-4.449881	legal	-0.124939
-4.213263	resource,	-0.124939
-4.449881	parallelization.	-0.124939
-4.449881	keeping	-0.124939
-4.213263	sampling:	-0.425969
-4.213263	12.5.	-0.124939
-4.213263	reduces	-0.124939
-4.213263	non-member	-0.124939
-4.449881	vectorized,	-0.124939
-4.449881	y=temp;}	-0.124939
-4.449881	influences	-0.124939
-4.213263	explains	-0.124939
-4.213263	emulate	-0.124939
-4.213263	four,	-0.124939
-4.449881	believe	-0.425969
-4.213263	stdint.h	-0.124939
-4.449881	elimin.,	-0.124939
-4.449881	instance.	-0.124939
-4.449881	TransposeCopy(double	-0.425969
-4.213263	insufficient	-0.124939
-4.213263	dangers	-0.124939
-4.213263	aligned.	-0.124939
-4.213263	external	-0.124939
-4.449881	"Error:	-0.425969
-4.213263	smmintrin.h	-0.124939
-4.213263	frameworks.	-0.124939
-4.449881	Monday,	-0.124939
-4.449881	X,	-0.124939
-4.213263	GNU	-0.124939
-4.213263	127.	-0.425969
-4.449881	FuncType	-0.124939
-4.213263	C1::f	-0.124939
-4.213263	Splitting	-0.425969
-4.213263	Algorithms	-0.425969
-4.213263	-mAVX	-0.124939
-4.449881	worry	-0.124939
-4.213263	instruction.	-0.124939
-4.449881	x^2	-0.124939
-4.213263	disabled	-0.124939
-4.213263	CPU-specific	-0.124939
-4.449881	8.26b	-0.124939
-4.213263	preprocessing	-0.124939
-4.213263	strides.	-0.124939
-4.213263	15.	-0.124939
-4.449881	develop	-0.425969
-4.449881	Full	-0.425969
-4.213263	(N	-0.425969
-4.213263	Enable	-0.124939
-4.213263	cases:	-0.124939
-4.449881	non-AVX	-0.124939
-4.449881	a+(b+c)	-0.425969
-4.449881	mouse.	-0.124939
-4.213263	www.amd.com.	-0.124939
-4.213263	-56	-0.124939
-4.449881	difficult.	-0.124939
-4.213263	-msse3	-0.124939
-4.213263	(32-bit	-0.124939
-4.213263	values.	-0.124939
-4.213263	/arch:SSE2	-0.425969
-4.213263	Vec8s	-0.124939
-4.213263	CParent	-0.124939
-4.213263	Adding	-0.124939
-4.213263	developer.intel.com.	-0.124939
-4.449881	frame-	-0.425969
-4.449881	<stdio.h>	-0.124939
-4.213263	9.11	-0.425969
-4.213263	relocations	-0.425969
-4.449881	NAN	-0.124939
-4.213263	9.10	-0.425969
-4.449881	2"	-0.124939
-4.213263	sum;	-0.124939
-4.213263	12.10	-0.425969
-4.213263	semaphores,	-0.124939
-4.213263	Multiplication	-0.124939
-4.213263	N&(N-1)	-0.124939
-4.449881	assembly:	-0.425969
-4.449881	regularly.	-0.124939
-4.213263	vacant	-0.124939
-4.213263	early	-0.124939
-4.213263	non-polymorphic	-0.124939
-4.449881	3.3;	-0.124939
-4.213263	users.	-0.124939
-4.213263	extern	-0.425969
-4.213263	heap.	-0.425969
-4.213263	formats	-0.124939
-4.449881	ruled	-0.425969
-4.213263	reused	-0.124939
-4.213263	Organize	-0.425969
-4.449881	CParent<CChild1>	-0.425969
-4.213263	14.12b	-0.124939
-4.449881	types:	-0.124939
-4.213263	FDIV	-0.425969
-4.213263	decimal	-0.425969
-4.213263	nfac	-0.124939
-4.213263	connections.	-0.124939
-4.449881	PC.	-0.124939
-4.213263	hacks	-0.124939
-4.213263	24	-0.124939
-4.213263	suffer	-0.425969
-4.213263	throw.	-0.124939
-4.213263	differently.	-0.124939
-4.449881	align(16))	-0.425969
-4.449881	element,	-0.425969
-4.213263	Often,	-0.425969
-4.213263	Replacing	-0.124939
-4.449881	b*x*x	-0.425969
-4.449881	language".	-0.124939
-4.213263	that's	-0.124939
-4.213263	13.3	-0.425969
-4.213263	inherent	-0.124939
-4.213263	-static	-0.124939
-4.449881	ArrayOfStructures[100];	-0.124939
-4.213263	13.2	-0.425969
-4.213263	comparison	-0.124939
-4.213263	hint	-0.124939
-4.213263	13.5	-0.425969
-4.213263	13.4	-0.425969
-4.213263	13.7	-0.425969
-4.213263	code"	-0.124939
-4.449881	course.	-0.124939
-4.449881	<dvec.h>	-0.425969
-4.213263	loop,	-0.124939
-4.213263	number,	-0.124939
-4.213263	case:	-0.425969
-4.449881	rolling	-0.425969
-4.213263	loop:	-0.124939
-4.213263	2.0f;	-0.124939
-4.449881	52.	-0.124939
-4.449881	supporting	-0.124939
-4.213263	thrown	-0.124939
-4.213263	invoking	-0.425969
-4.449881	{1,	-0.425969
-4.213263	construct	-0.124939
-4.449881	compiled.	-0.124939
-4.449881	transpose(double	-0.425969
-4.449881	Checking	-0.425969
-4.449881	elimination,	-0.425969
-4.213263	StringLength;	-0.124939
-4.213263	integration,	-0.124939
-4.213263	kilobytes	-0.124939
-4.449881	got	-0.124939
-4.213263	locally	-0.124939
-4.213263	it,	-0.124939
-4.213263	32.	-0.124939
-4.213263	minimum	-0.124939
-4.213263	thread-specific	-0.425969
-4.213263	a[2];	-0.124939
-4.213263	can't	-0.124939
-4.213263	Vec4f	-0.124939
-4.213263	(2)	-0.124939
-4.449881	paragraph	-0.124939
-4.213263	remain	-0.124939
-4.449881	scanners	-0.124939
-4.213263	calculates	-0.124939
-4.213263	databases,	-0.124939
-4.213263	C++".	-0.124939
-4.213263	2008	-0.124939
-4.213263	/Gy	-0.425969
-4.213263	Assuming	-0.425969
-4.213263	search,	-0.124939
-4.449881	.........................................................................	-0.124939
-4.213263	constructs	-0.124939
-4.449881	screen	-0.425969
-4.213263	Loops	-0.124939
-4.213263	Plus2	-0.124939
-4.213263	a*0	-0.425969
-4.213263	a*1	-0.425969
-4.449881	"vectorclass.h"	-0.425969
-4.449881	b[size],	-0.124939
-4.449881	Table[100];	-0.425969
-4.449881	describe	-0.124939
-4.213263	GOT.	-0.124939
-4.213263	dynamic_cast	-0.124939
-4.449881	Finding	-0.425969
-4.449881	120,	-0.425969
-4.213263	uninitialized	-0.124939
-4.213263	77	-0.124939
-4.213263	string.	-0.124939
-4.213263	74	-0.124939
-4.213263	73	-0.124939
-4.213263	Object2;	-0.124939
-4.213263	destination	-0.124939
-4.449881	lookup:	-0.425969
-4.213263	72	-0.124939
-4.213263	(Tuesday	-0.425969
-4.449881	b:2;	-0.425969
-4.449881	string;	-0.124939
-4.213263	putting	-0.425969
-4.213263	14.14b	-0.124939
-4.213263	non-	-0.124939
-4.213263	15.1c.	-0.124939
-4.449881	73).	-0.124939
-4.213263	.......................................................	-0.124939
-4.213263	mutexes,	-0.124939
-4.213263	volatile.	-0.124939
-4.213263	relocation.	-0.124939
-4.449881	(~a&c)	-0.124939
-4.213263	90%	-0.124939
-4.213263	MultiplyBy	-0.124939
-4.213263	suited	-0.425969
-4.449881	Developer’s	-0.425969
-4.449881	y1,	-0.124939
-4.213263	14.14a	-0.124939
-4.213263	settings	-0.124939
-4.213263	Compile	-0.124939
-4.213263	Provoke	-0.124939
-4.449881	import	-0.425969
-4.213263	turns	-0.425969
-4.213263	0x4700.	-0.124939
-4.213263	x----	-0.124939
-4.449881	Overcoming	-0.425969
-4.213263	a*1=a	-0.124939
-4.213263	Take	-0.124939
-4.213263	shuffling,	-0.124939
-4.449881	priorities	-0.124939
-4.449881	range";	-0.124939
-4.213263	corrections	-0.124939
-4.213263	safe.	-0.124939
-4.213263	supported.	-0.124939
-4.449881	anonymous	-0.124939
-4.213263	pure.	-0.124939
-4.213263	SSE4.2	-0.124939
-4.213263	clock;	-0.124939
-4.449881	12.4a	-0.124939
-4.449881	matrices,	-0.425969
-4.213263	inconsistent	-0.124939
-4.213263	bc);	-0.425969
-4.449881	join	-0.124939
-4.449881	range.	-0.124939
-4.449881	c:2;	-0.425969
-4.213263	Value	-0.425969
-4.213263	CGrandParent	-0.425969
-4.449881	two);	-0.425969
-4.213263	--	-0.425969
-4.213263	bypassed	-0.124939
-4.213263	drivers,	-0.124939
-4.213263	-0	-0.124939
-4.449881	accelerator	-0.124939
-4.213263	3.10	-0.425969
-4.213263	3.11	-0.425969
-4.213263	increases	-0.425969
-4.449881	3.13	-0.425969
-4.213263	3.14	-0.425969
-4.213263	3.15	-0.425969
-4.213263	3.16	-0.425969
-4.213263	Sum1	-0.124939
-4.449881	a*x*x*x	-0.425969
-4.449881	0x20;	-0.124939
-4.213263	TILESIZE	-0.124939
-4.213263	expression,	-0.124939
-4.449881	consumers	-0.124939
-4.213263	CPU,	-0.124939
-4.449881	&Object1;	-0.124939
-4.213263	.............................................................................	-0.124939
-4.449881	exp(x)	-0.425969
-4.449881	programming.	-0.124939
-4.213263	time1;	-0.124939
-4.449881	events,	-0.124939
-4.213263	achieved	-0.124939
-4.449881	<emmintrin.h>	-0.124939
-4.213263	2.8	-0.425969
-4.213263	answers	-0.124939
-4.213263	starting	-0.124939
-4.449881	disadvantages:	-0.124939
-4.213263	2.3	-0.425969
-4.213263	ahead	-0.425969
-4.213263	inserted	-0.124939
-4.213263	2.2	-0.425969
-4.213263	/arch:SSE	-0.425969
-4.213263	2.1	-0.425969
-4.213263	2.0	-0.124939
-4.449881	40320,	-0.425969
-4.449881	Func(int);	-0.425969
-4.213263	invalidate	-0.124939
-4.449881	opposite	-0.124939
-4.213263	itself,	-0.124939
-4.213263	2.7	-0.425969
-4.449881	a[1000];	-0.124939
-4.213263	2.6	-0.425969
-4.213263	S.	-0.124939
-4.213263	environment	-0.124939
-4.213263	F2(b);	-0.425969
-4.213263	handles	-0.124939
-4.213263	2.4	-0.425969
-4.213263	note	-0.425969
-4.213263	others	-0.124939
-4.213263	needs.	-0.124939
-4.213263	sar	-0.124939
-4.449881	("internal")))	-0.124939
-4.213263	8.15a	-0.124939
-4.449881	footprint	-0.124939
-4.213263	14.13b	-0.124939
-4.213263	namespaces.	-0.124939
-4.213263	preventing	-0.124939
-4.449881	Lowest	-0.425969
-4.213263	Saturday	-0.124939
-4.449881	a*b+a*c=a*(b+c)	-0.425969
-4.449881	resolutions,	-0.124939
-4.213263	So	-0.124939
-4.449881	9.6a	-0.124939
-4.449881	a*(b+c)	-0.425969
-4.213263	events	-0.124939
-4.213263	__fastcall.	-0.124939
-4.213263	a+0	-0.425969
-4.213263	Delays	-0.124939
-4.213263	Report	-0.425969
-4.213263	(bb[i]	-0.124939
-4.213263	specified.	-0.124939
-4.213263	prototype	-0.124939
-4.213263	39	-0.124939
-4.213263	let's	-0.124939
-4.449881	(Day	-0.124939
-4.213263	visible	-0.124939
-4.213263	Kbytes	-0.124939
-4.213263	proxy	-0.124939
-4.213263	Microprocessors	-0.425969
-4.449881	105.	-0.124939
-4.213263	recently	-0.124939
-4.213263	creating	-0.425969
-4.449881	order(i);	-0.124939
-4.213263	refers	-0.124939
-4.213263	floppy	-0.124939
-4.213263	underflow.	-0.425969
-4.213263	37	-0.124939
-4.213263	36	-0.124939
-4.449881	3)	-0.124939
-4.213263	contrived	-0.124939
-4.213263	Manual	-0.124939
-4.213263	Excessive	-0.124939
-4.213263	FuncA	-0.124939
-4.213263	web	-0.124939
-4.213263	occur,	-0.425969
-4.213263	reorder	-0.425969
-4.213263	microseconds	-0.124939
-4.213263	Standard	-0.124939
-4.213263	const_cast	-0.425969
-4.213263	though.	-0.124939
-4.449881	compiler:	-0.124939
-4.449881	y)	-0.425969
-4.213263	/MT	-0.124939
-4.213263	overwritten,	-0.124939
-4.449881	[esp+8]	-0.124939
-4.213263	annoyingly	-0.124939
-4.213263	list[size];	-0.124939
-4.213263	license	-0.124939
-4.213263	y1	-0.425969
-4.213263	y2	-0.425969
-4.449881	swapd(x,y)	-0.425969
-4.449881	int)i	-0.124939
-4.449881	CriticalFunction_SSE2(int	-0.425969
-4.449881	GHz	-0.124939
-4.213263	false.	-0.124939
-4.213263	Multithreading	-0.124939
-4.449881	New	-0.425969
-4.449881	CriticalFunctionDispatch(void)	-0.124939
-4.213263	methods.	-0.124939
-4.213263	became	-0.124939
-4.213263	if,	-0.124939
-4.213263	computers.	-0.124939
-4.213263	x.abc	-0.425969
-4.449881	Worst-case	-0.425969
-4.213263	Operations	-0.425969
-4.213263	named	-0.124939
-4.213263	a*0=0	-0.124939
-4.213263	p2	-0.124939
-4.213263	p1	-0.124939
-4.449881	major	-0.425969
-4.213263	internet	-0.124939
-4.449881	p;	-0.124939
-4.213263	lrintf	-0.124939
-4.213263	resulting	-0.124939
-4.449881	a[c][r]);	-0.124939
-4.449881	math.	-0.124939
-4.213263	2048	-0.124939
-4.213263	3.5;	-0.124939
-4.213263	DLLs	-0.124939
-4.213263	Unix	-0.124939
-4.213263	Lookup	-0.425969
-4.213263	differ	-0.124939
-4.449881	InstructionSet();	-0.425969
-4.449881	F1()	-0.124939
-4.213263	safety	-0.124939
-4.213263	predefined	-0.425969
-4.213263	variable-size	-0.124939
-4.213263	obj1;	-0.124939
-4.449881	Codes",	-0.124939
-4.449881	summarized	-0.124939
-4.213263	small.	-0.124939
-4.213263	................................................................................	-0.124939
-4.213263	buffer.	-0.124939
-4.449881	list[size],	-0.124939
-4.449881	"asmlib.h"	-0.425969
-4.213263	ArraySize	-0.124939
-4.213263	Live	-0.425969
-4.213263	mask);	-0.124939
-4.213263	suffixes	-0.124939
-4.213263	programmer.	-0.124939
-4.213263	x-xxxxxx-	-0.124939
-4.213263	name.	-0.124939
-4.213263	third-party	-0.124939
-4.213263	(a+b)+c=a+(b+c)	-0.124939
-4.213263	audio	-0.124939
-4.213263	arguments	-0.124939
-4.213263	infinite	-0.124939
-4.449881	flow.	-0.124939
-4.213263	worse,	-0.124939
-4.449881	miss	-0.124939
-4.213263	unsafe	-0.124939
-4.449881	away.	-0.124939
-4.213263	movements	-0.425969
-4.449881	((x2)	-0.425969
-4.213263	windows,	-0.124939
-4.213263	pressing	-0.425969
-4.213263	Factors	-0.425969
-4.213263	price,	-0.124939
-4.213263	Jumps	-0.124939
-4.449881	maintaining	-0.124939
-4.213263	Nevertheless,	-0.124939
-4.213263	sound	-0.124939
-4.213263	servers	-0.124939
-4.449881	utility.	-0.124939
-4.213263	executable.	-0.124939
-4.449881	controlled.	-0.124939
-4.213263	literature	-0.124939
-4.449881	512;	-0.425969
-4.213263	precautions	-0.124939
-4.449881	smarter	-0.425969
-4.449881	2001.	-0.124939
-4.213263	Current	-0.124939
-4.449881	concentrated	-0.425969
-4.449881	aa[i]	-0.425969
-4.449881	null	-0.124939
-4.213263	capable	-0.425969
-4.213263	FuncC(i);	-0.124939
-4.213263	updating.	-0.425969
-4.213263	MOVNTDQ	-0.124939
-4.213263	renaming	-0.124939
-4.213263	considering	-0.124939
-4.213263	worthwhile	-0.425969
-4.213263	a-(-b)=a+b	-0.124939
-4.449881	f();	-0.425969
-4.213263	separately.	-0.425969
-4.213263	patterns	-0.124939
-4.449881	93.	-0.124939
-4.213263	lowest	-0.124939
-4.213263	EXCEPTION_FLT_OVERFLOW	-0.124939
-4.213263	constructor,	-0.124939
-4.213263	syntax:	-0.425969
-4.449881	1.2;	-0.425969
-4.449881	26.	-0.124939
-4.213263	parentheses	-0.124939
-4.213263	check.	-0.124939
-4.213263	experiments	-0.124939
-4.213263	.................................................................................................	-0.124939
-4.449881	tables".	-0.124939
-4.213263	jl	-0.124939
-4.213263	computation	-0.124939
-4.213263	thread-local	-0.425969
-4.449881	date.	-0.124939
-4.213263	physics	-0.124939
-4.213263	first,	-0.425969
-4.449881	below)	-0.124939
-4.213263	eliminated.	-0.124939
-4.213263	c.load(cc+i);	-0.124939
-4.213263	x-xxxxxxx	-0.124939
-4.213263	Unrolling	-0.124939
-4.213263	hyperthreading.	-0.124939
-4.213263	bytes,	-0.124939
-4.449881	(*.lib,	-0.425969
-4.213263	irrelevant	-0.124939
-4.213263	careful	-0.124939
-4.449881	compression	-0.124939
-4.213263	intervals	-0.124939
-4.449881	(c2	-0.425969
-4.213263	immediate	-0.124939
-4.213263	Object1;	-0.124939
-4.213263	etc.)	-0.124939
-4.449881	64-bit.	-0.124939
-4.213263	indirect	-0.124939
-4.213263	(-a==-b)=(a==b)	-0.425969
-4.213263	hyperthreading,	-0.124939
-4.449881	exponent,	-0.425969
-4.449881	FuncA(i);	-0.124939
-4.213263	cross-platform	-0.124939
-4.449881	{temp=x;	-0.425969
-4.449881	decomposition.	-0.124939
-4.449881	parameter:	-0.124939
-4.213263	determines	-0.124939
-4.449881	(properties)	-0.124939
-4.213263	ABC	-0.124939
-4.213263	comments	-0.124939
-4.213263	.....................................................................................................................	-0.124939
-4.213263	profitable	-0.124939
-4.213263	behind	-0.425969
-4.213263	Technical	-0.124939
-4.213263	Neither	-0.124939
-4.213263	calculation.	-0.124939
-4.449881	const))	-0.124939
-4.213263	test,	-0.124939
-4.449881	-ffunction-	-0.425969
-4.449881	-msse	-0.124939
-4.449881	Initialize	-0.124939
-4.213263	indices	-0.124939
-4.213263	s0	-0.124939
-4.213263	/FA	-0.425969
-4.213263	marketing	-0.124939
-4.213263	parm2);	-0.124939
-4.213263	--xx-----	-0.124939
-4.449881	20,	-0.425969
-4.213263	conflicting	-0.124939
-4.449881	20;	-0.124939
-4.213263	61	-0.124939
-4.213263	looking	-0.124939
-4.213263	coarse-grained	-0.425969
-4.213263	matrix[c][r]	-0.124939
-4.213263	-mssse3	-0.124939
-4.213263	isolate	-0.425969
-4.213263	Let	-0.425969
-4.449881	question.	-0.124939
-4.213263	x^n	-0.124939
-4.213263	treats	-0.124939
-4.449881	topics	-0.124939
-4.213263	(3)	-0.124939
-4.449881	table:	-0.425969
-4.213263	unstable	-0.124939
-4.213263	60	-0.124939
-4.213263	iterations.	-0.124939
-4.213263	Join	-0.425969
-4.213263	bb[i]	-0.124939
-4.213263	sampling	-0.124939
-4.213263	(memory	-0.124939
-4.213263	verifying	-0.124939
-4.213263	3.6	-0.425969
-4.213263	3.4	-0.425969
-4.449881	log(b[i])	-0.425969
-4.213263	Now,	-0.124939
-4.213263	a+a+a+a=a*4	-0.425969
-4.213263	doubt	-0.124939
-4.449881	78).	-0.124939
-4.213263	v.f	-0.124939
-4.213263	manner?	-0.425969
-4.213263	generating	-0.124939
-4.213263	Especially	-0.425969
-4.213263	....................................................................................................	-0.124939
-4.213263	3.2	-0.425969
-4.213263	F1(a);	-0.425969
-4.213263	Switch	-0.124939
-4.449881	_mm_set1_epi16(0);	-0.425969
-4.213263	everywhere	-0.124939
-4.449881	a&&(b||c)	-0.124939
-4.213263	(a&b)	-0.425969
-4.213263	3.3	-0.425969
-4.213263	3.1	-0.425969
-4.213263	randomly	-0.124939
-4.213263	Useful	-0.124939
-4.213263	3.8	-0.425969
-4.449881	3.9	-0.425969
-4.213263	............................................................................................	-0.124939
-4.449881	mainframe	-0.124939
-4.213263	(time	-0.124939
-4.213263	Everything	-0.425969
-4.449881	required.	-0.124939
-4.213263	theoretical	-0.124939
-4.213263	12.1a.	-0.124939
-4.213263	file,	-0.124939
-4.213263	working	-0.124939
-4.213263	pragmas	-0.124939
-4.213263	use.	-0.124939
-4.213263	use,	-0.124939
-4.213263	favorable:	-0.124939
-4.213263	color	-0.124939
-4.213263	8192	-0.124939
-4.449881	_mm_cmpgt_epi16(b,	-0.425969
-4.213263	lost.	-0.124939
-4.449881	Exceptions	-0.425969
-4.213263	question	-0.124939
-4.213263	afterwards	-0.124939
-4.213263	Every	-0.124939
-4.213263	denormals-are-zero	-0.425969
-4.213263	declaration.	-0.124939
-4.213263	exceptions:	-0.124939
-4.449881	read.	-0.124939
-4.213263	Non-static	-0.124939
-4.213263	re-	-0.124939
-4.213263	requiring	-0.124939
-4.213263	{int	-0.425969
-4.213263	branching	-0.124939
-4.213263	changed.	-0.124939
-4.213263	belongs	-0.425969
-4.449881	(a&&c)	-0.124939
-4.213263	NumberOfTests	-0.124939
-4.213263	obviously	-0.124939
-4.449881	undetected.	-0.124939
-4.213263	alone	-0.124939
-4.449881	caller	-0.124939
-4.213263	understanding	-0.425969
-4.213263	influence	-0.124939
-4.213263	x-xx-----	-0.124939
-4.213263	Lazy	-0.425969
-4.213263	Volatile	-0.124939
-4.213263	lock	-0.425969
-4.213263	allowed.	-0.124939
-4.213263	today	-0.124939
-4.449881	(double)(signed	-0.425969
-4.213263	programmable	-0.425969
-4.213263	checks.	-0.124939
-4.213263	mechanisms	-0.124939
-4.213263	8.1.	-0.124939
-4.213263	G	-0.124939
-4.213263	Align	-0.124939
-4.213263	c)	-0.124939
-4.213263	&CriticalFunction_386;	-0.425969
-4.213263	(three	-0.124939
-4.449881	goto	-0.124939
-4.213263	testing.	-0.124939
-4.449881	feature.	-0.124939
-4.449881	interposition	-0.124939
-4.213263	loads	-0.425969
-4.449881	*.a)	-0.425969
-4.449881	dispatchers	-0.124939
-4.213263	Registers	-0.425969
-4.213263	Event-based	-0.124939
-4.213263	associated	-0.425969
-4.213263	time-consumer	-0.124939
-4.213263	mechanism.	-0.124939
-4.449881	Optimizations	-0.425969
-4.213263	machines	-0.124939
-4.449881	problem:	-0.124939
-4.213263	constructors,	-0.124939
-4.213263	References	-0.425969
-4.449881	mutually	-0.425969
-4.213263	_mm_empty()	-0.124939
-4.213263	report	-0.124939
-4.213263	disturbing	-0.425969
-4.213263	develop-	-0.425969
-4.449881	negative.	-0.425969
-4.213263	facilities,	-0.124939
-4.213263	creation	-0.425969
-4.213263	warning	-0.124939
-4.213263	min	-0.124939
-4.213263	14.2	-0.425969
-4.213263	14.3	-0.425969
-4.213263	14.1	-0.425969
-4.213263	vectorclass	-0.124939
-4.213263	14.7	-0.425969
-4.213263	debugging.	-0.425969
-4.449881	Func(int	-0.425969
-4.213263	(columns	-0.425969
-4.213263	14.5	-0.425969
-4.213263	defined.	-0.124939
-4.213263	-msse4.1	-0.124939
-4.213263	x^10	-0.425969
-4.449881	branches):	-0.425969
-4.449881	DoThisThreeTimesAWeek();	-0.425969
-4.449881	-msse2	-0.124939
-4.449881	logarithms,	-0.425969
-4.449881	default.	-0.124939
-4.213263	WTL	-0.124939
-4.449881	_controlfp(0,	-0.425969
-4.213263	load.	-0.124939
-4.449881	select(b	-0.425969
-4.213263	framework.	-0.124939
-4.213263	8.6	-0.425969
-4.449881	synchronization	-0.425969
-4.213263	Without	-0.124939
-4.213263	millisecond	-0.124939
-4.213263	sizeof(S1)	-0.124939
-4.213263	high-priority	-0.124939
-4.213263	development.	-0.124939
-4.213263	push	-0.124939
-4.449881	Numerically	-0.425969
-4.213263	verify	-0.124939
-4.449881	us	-0.425969
-4.213263	searching,	-0.124939
-4.449881	known.	-0.124939
-4.213263	14.13	-0.425969
-4.213263	14.12	-0.425969
-4.449881	area.	-0.124939
-4.213263	14.19	-0.124939
-4.213263	rounding.	-0.124939
-4.213263	column;	-0.124939
-4.213263	dramatically	-0.124939
-4.213263	148	-0.124939
-4.213263	8.5	-0.425969
-4.449881	Polynomial	-0.425969
-4.213263	temporarily.	-0.124939
-4.213263	obscure	-0.124939
-4.213263	14.1c	-0.124939
-4.213263	142	-0.124939
-4.213263	"assume	-0.425969
-4.213263	properly	-0.124939
-4.213263	issue.	-0.124939
-4.213263	restarted	-0.425969
-4.213263	Func2()	-0.425969
-4.213263	x--x-----	-0.124939
-4.213263	PCs.	-0.124939
-4.213263	8.2	-0.425969
-4.213263	Possible	-0.425969
-4.213263	IDE's	-0.124939
-4.213263	8.3	-0.425969
-4.449881	obtained.	-0.124939
-4.449881	C++:	-0.124939
-4.449881	sin(x);	-0.124939
-4.213263	delay.	-0.124939
-4.449881	1.f;	-0.124939
-4.213263	Divisions	-0.124939
-4.213263	7.32	-0.425969
-4.213263	largest_index	-0.425969
-4.449881	wide,	-0.124939
-4.213263	list[i].a	-0.124939
-4.213263	model.	-0.124939
-4.213263	7.33	-0.124939
-4.449881	manipulating	-0.425969
-4.449881	Dependency	-0.425969
-4.449881	additions.	-0.124939
-4.213263	MOVNTPD	-0.124939
-4.213263	158	-0.124939
-4.213263	A;	-0.124939
-4.449881	cycle?	-0.124939
-4.213263	server.	-0.124939
-4.213263	156	-0.425969
-4.213263	fine-tuned	-0.425969
-4.213263	157	-0.425969
-4.449881	draw	-0.124939
-4.449881	examples.	-0.124939
-4.213263	class:	-0.124939
-4.213263	sharing	-0.425969
-4.213263	155	-0.124939
-4.213263	7.30	-0.425969
-4.213263	ways,	-0.124939
-4.449881	predicted.	-0.124939
-4.449881	i<n;	-0.124939
-4.213263	dividing	-0.124939
-4.213263	complications	-0.124939
-4.213263	AVX,	-0.124939
-4.213263	7.31	-0.425969
-4.449881	redo	-0.425969
-4.213263	parallelization	-0.124939
-4.213263	Matrix	-0.425969
-4.213263	..................................................................................	-0.124939
-4.213263	a.store(aa+i);	-0.425969
-4.449881	0x7FFFFFFF)	-0.425969
-4.213263	add,	-0.124939
-4.213263	fed	-0.124939
-4.213263	array,	-0.124939
-4.449881	AVX.	-0.124939
-4.449881	Alignd(X)	-0.124939
-4.449881	_mm_add_epi16(c,	-0.425969
-4.213263	waits	-0.124939
-4.213263	loops,	-0.124939
-4.213263	Tuesday,	-0.124939
-4.213263	loops.	-0.124939
-4.213263	Increment	-0.124939
-4.213263	cached	-0.124939
-4.449881	propagation,	-0.124939
-4.213263	exceeds	-0.124939
-4.213263	Wikipedia	-0.124939
-4.213263	................................................................................................	-0.124939
-4.213263	mask.	-0.124939
-4.213263	conclude	-0.124939
-4.449881	shared.	-0.124939
-4.213263	relocated	-0.124939
-4.449881	else.	-0.124939
-4.449881	FIFO	-0.124939
-4.449881	Library"	-0.124939
-4.213263	dealing	-0.425969
-4.449881	Hello()	-0.425969
-4.213263	Various	-0.124939
-4.213263	software,	-0.124939
-4.213263	Overflow	-0.124939
-4.213263	multiplications	-0.124939
-4.213263	differences	-0.124939
-4.449881	2.2,	-0.425969
-4.213263	machine.	-0.425969
-4.213263	containers.	-0.124939
-4.213263	tree.	-0.124939
-4.213263	approach	-0.425969
-4.449881	dynamically.	-0.124939
-4.213263	__asm__	-0.124939
-4.449881	difficulties	-0.425969
-4.449881	deleting	-0.124939
-4.213263	discussion.	-0.124939
-4.213263	......................................................................................................................	-0.124939
-4.213263	int)	-0.124939
-4.449881	(y)	-0.425969
-4.213263	purpose,	-0.124939
-4.213263	algorithms,	-0.124939
-4.213263	&CriticalFunction_SSE2;	-0.425969
-4.213263	Disp();	-0.124939
-4.213263	consistent	-0.124939
-4.213263	1.23456.	-0.124939
-4.449881	v;	-0.425969
-4.449881	(b,	-0.425969
-4.213263	reveal	-0.124939
-4.213263	created.	-0.124939
-4.213263	denormal	-0.124939
-4.213263	optional	-0.124939
-4.449881	op.	-0.124939
-4.213263	experiment	-0.124939
-4.213263	Induction++;	-0.124939
-4.213263	precisions	-0.124939
-4.449881	Difficult	-0.124939
-4.449881	interpreter	-0.124939
-4.449881	(PLT).	-0.124939
-4.449881	order(int	-0.425969
-4.213263	digital	-0.124939
-4.449881	i++){	-0.425969
-4.213263	0.40	-0.124939
-4.449881	z;	-0.124939
-4.213263	139	-0.124939
-4.213263	0.44	-0.124939
-4.449881	cc);	-0.425969
-4.213263	__GNUC__	-0.124939
-4.213263	covered	-0.124939
-4.449881	fashioned	-0.425969
-4.213263	alloca.	-0.124939
-4.213263	Efficient	-0.124939
-4.213263	spaces	-0.124939
-4.213263	$B1$1:	-0.124939
-4.449881	362880,	-0.425969
-4.213263	(line	-0.425969
-4.213263	column.	-0.124939
-4.449881	complex,	-0.124939
-4.213263	3.7	-0.425969
-4.213263	cheap	-0.124939
-4.449881	purity.	-0.124939
-4.213263	0.f,	-0.124939
-4.449881	14.23b	-0.124939
-4.449881	below).	-0.124939
-4.213263	division,	-0.124939
-4.213263	received	-0.124939
-4.213263	degradation	-0.124939
-4.213263	"override"	-0.425969
-4.213263	11.2b	-0.124939
-4.213263	matrix[j][0]	-0.124939
-4.213263	x--xx----	-0.124939
-4.213263	Sometimes,	-0.124939
-4.449881	distribute	-0.124939
-4.449881	"worst	-0.425969
-4.449881	overdetermined	-0.124939
-4.449881	ms.	-0.124939
-4.449881	thing.	-0.124939
-4.213263	squares:	-0.124939
-4.213263	MemberPointer	-0.425969
-4.449881	www.intel.com.	-0.124939
-4.449881	TILESIZE)	-0.425969
-4.213263	sources.	-0.124939
-4.213263	first-in-last-out	-0.124939
-4.449881	3628800,	-0.425969
-4.449881	uncommon	-0.425969
-4.213263	coprocessor	-0.124939
-4.213263	23;	-0.425969
-4.213263	82	-0.124939
-4.213263	relates	-0.425969
-4.213263	7.9	-0.425969
-4.213263	alignment.	-0.124939
-4.213263	7.5	-0.124939
-4.449881	deal	-0.425969
-4.213263	binutils	-0.425969
-4.213263	7.6	-0.425969
-4.449881	Specific	-0.425969
-4.213263	__attribute__((const))	-0.124939
-4.213263	unnecessary	-0.124939
-4.449881	b[1000];	-0.124939
-4.213263	7.3	-0.425969
-4.213263	performing	-0.124939
-4.213263	once,	-0.124939
-4.213263	s1	-0.124939
-4.213263	(called	-0.124939
-4.213263	84	-0.124939
-4.449881	"C"	-0.124939
-4.213263	respects	-0.124939
-4.213263	License	-0.124939
-4.213263	ISO/IEC	-0.124939
-4.213263	command-line	-0.124939
-4.213263	i++	-0.124939
-4.449881	137).	-0.124939
-4.213263	template.	-0.124939
-4.213263	General	-0.124939
-4.213263	container,	-0.124939
-4.213263	integrated	-0.124939
-4.213263	started.	-0.124939
-4.213263	transposes	-0.425969
-4.449881	container.	-0.124939
-4.449881	(*SelectAddMul_pointer)(aa,	-0.425969
-4.449881	crash	-0.124939
-4.449881	1.1,	-0.425969
-4.449881	reserve	-0.124939
-4.213263	Older	-0.124939
-4.213263	deleted.	-0.124939
-4.213263	Asmlib	-0.124939
-4.213263	Both	-0.124939
-4.449881	&Object2;	-0.124939
-4.449881	a:4;	-0.425969
-4.213263	(-a)*(-b)=a*b	-0.124939
-4.213263	left	-0.124939
-4.213263	c2++)	-0.425969
-4.449881	Nested	-0.425969
-4.213263	ipow	-0.124939
-4.213263	comparison,	-0.124939
-4.449881	produced	-0.425969
-4.449881	NotPolymorphic();	-0.124939
-4.213263	a*b+a*c	-0.425969
-4.213263	11.1a	-0.124939
-4.213263	support,	-0.124939
-4.449881	forget	-0.425969
-4.449881	inverted	-0.124939
-4.213263	11.1b	-0.124939
-4.213263	80.	-0.425969
-4.213263	i.	-0.124939
-4.213263	worked	-0.124939
-4.449881	needed:	-0.425969
-4.449881	Introduction	-0.124939
-4.213263	i+=3){	-0.425969
-4.449881	PUBLIC	-0.425969
-4.449881	-(-a)=a	-0.124939
-4.449881	"IA-32	-0.425969
-4.449881	loaded,	-0.124939
-4.449881	(a&&b&&c)	-0.425969
-4.213263	a+0=a	-0.124939
-4.213263	7.15b	-0.124939
-4.213263	grows	-0.124939
-4.213263	0.	-0.425969
-4.449881	r2++)	-0.425969
-4.213263	index,	-0.124939
-4.213263	time-consumers	-0.124939
-4.449881	enough.	-0.124939
-4.449881	720,	-0.425969
-4.449881	tedious	-0.124939
-4.213263	correctly.	-0.124939
-4.213263	flexible,	-0.124939
-4.213263	ARRAYSIZE	-0.124939
-4.449881	annotation	-0.124939
-4.213263	respectively.	-0.124939
-4.213263	1.1	-0.425969
-4.213263	43).	-0.124939
-4.213263	^=	-0.425969
-4.449881	1's	-0.124939
-4.213263	hexadecimal	-0.124939
-4.449881	u,	-0.425969
-4.449881	extending	-0.124939
-4.449881	^,	-0.124939
-4.449881	systematic	-0.124939
-4.213263	forces	-0.425969
-4.213263	rolled	-0.425969
-4.213263	__attribute__	-0.425969
-4.213263	Java,	-0.124939
-4.449881	ivdep	-0.124939
-4.213263	Gnu.	-0.124939
-4.213263	recursion	-0.124939
-4.449881	^a	-0.425969
-4.213263	algebra,	-0.124939
-4.213263	management	-0.425969
-4.213263	lots	-0.425969
-4.213263	163	-0.124939
-4.213263	projects,	-0.425969
-4.213263	160	-0.124939
-4.213263	(This	-0.124939
-4.449881	Mac.	-0.124939
-4.213263	chip.	-0.425969
-4.213263	wrapped	-0.425969
-4.449881	perfectly	-0.124939
-4.449881	zero);	-0.425969
-4.449881	added?	-0.425969
-4.213263	Addison-Wesley,	-0.124939
-4.449881	counter,	-0.124939
-4.213263	modifier	-0.124939
-4.213263	-Ofast	-0.124939
-4.213263	test.	-0.124939
-4.213263	respond	-0.124939
-4.449881	Intensive	-0.425969
-4.213263	planning	-0.124939
-4.213263	C2	-0.124939
-4.213263	R	-0.124939
-4.449881	release	-0.425969
-4.449881	fraction.	-0.124939
-4.213263	Friday	-0.124939
-4.449881	textbooks	-0.425969
-4.213263	slower.	-0.124939
-4.213263	9.7	-0.425969
-4.213263	c1;	-0.124939
-4.449881	compilers,	-0.124939
-4.213263	subtracting	-0.124939
-4.449881	Returns	-0.124939
-4.213263	insight	-0.124939
-4.213263	a*b=b*a	-0.124939
-4.449881	r1+TILESIZE;	-0.425969
-4.213263	0/a	-0.425969
-4.449881	hope	-0.425969
-4.213263	Portability	-0.124939
-4.213263	compilers).	-0.425969
-4.449881	Catch	-0.124939
-4.213263	99%	-0.425969
-4.449881	";	-0.124939
-4.213263	Sab	-0.124939
-4.213263	contribution	-0.425969
-4.213263	distinction	-0.425969
-4.449881	Update	-0.425969
-4.213263	Supported	-0.124939
-4.449881	ment	-0.124939
-4.213263	typeof(CriticalFunction)	-0.425969
-4.449881	ones	-0.124939
-4.449881	busy	-0.124939
-4.213263	optimally	-0.124939
-4.213263	Fast	-0.124939
-4.213263	funny	-0.124939
-4.213263	queries	-0.124939
-4.213263	saying	-0.124939
-4.449881	normal.	-0.124939
-4.449881	1"	-0.425969
-4.213263	updated.	-0.124939
-4.213263	clumsy	-0.124939
-4.449881	int)u;	-0.124939
-4.213263	trivial	-0.124939
-4.449881	a[SIZE][SIZE])	-0.425969
-4.213263	x64	-0.124939
-4.213263	systems).	-0.124939
-4.449881	wasted	-0.425969
-4.213263	Public	-0.124939
-4.213263	dummy	-0.124939
-4.213263	symbolic	-0.124939
-4.213263	fetched	-0.124939
-4.213263	0x40	-0.124939
-4.213263	(a&b)|(a&c)	-0.425969
-4.213263	relocation,	-0.124939
-4.213263	(The	-0.124939
-4.213263	Mixing	-0.124939
-4.213263	decides	-0.124939
-4.449881	<xmmintrin.h>	-0.124939
-4.213263	Func1(x)	-0.124939
-4.449881	5040,	-0.425969
-4.449881	ability	-0.425969
-4.213263	3,	-0.124939
-4.213263	3.12	-0.425969
-4.213263	123	-0.124939
-4.213263	provoke	-0.124939
-4.449881	VectorC	-0.124939
-4.213263	processes.	-0.425969
-4.449881	((visibility	-0.425969
-4.213263	stronger	-0.124939
-4.213263	accessible	-0.425969
-4.213263	reusable	-0.124939
-4.213263	procedures	-0.124939
-4.213263	!(a	-0.124939
-4.449881	Overview	-0.425969
-4.213263	ammintrin.h	-0.124939
-4.213263	sequences	-0.425969
-4.449881	stop	-0.425969
-4.213263	expansions	-0.124939
-4.449881	bit.	-0.124939
-4.449881	Conclusion	-0.124939
-4.213263	linking.	-0.124939
-4.213263	Free	-0.124939
-4.213263	bottlenecks	-0.124939
-4.213263	interrupts	-0.124939
-4.213263	legacy	-0.124939
-4.213263	segment	-0.124939
-4.213263	__unix__	-0.425969
-4.213263	attempts	-0.425969
-4.213263	Code	-0.124939
-4.449881	Day;	-0.425969
-4.449881	swapd(a[r2][c2],a[c2][r2]);	-0.425969
-4.449881	bb,	-0.425969
-4.213263	power.	-0.124939
-4.449881	consumer	-0.124939
-4.449881	parenthesis	-0.425969
-4.213263	Security	-0.124939
-4.213263	limit,	-0.124939
-4.213263	~	-0.124939
-4.213263	swapd(a[r][c],	-0.425969
-4.213263	Single	-0.425969
-4.213263	0.28	-0.124939
-4.213263	Booleans	-0.124939
-4.213263	0.24	-0.124939
-4.449881	Caching	-0.425969
-4.213263	interfere	-0.425969
-4.213263	-fomit-	-0.425969
-4.213263	0.25	-0.124939
-4.449881	_mm_mullo_epi16	-0.425969
-4.213263	FMA4	-0.124939
-4.213263	distribution	-0.124939
-4.213263	>>	-0.124939
-4.213263	do,	-0.124939
-4.213263	appropriate.	-0.124939
-4.213263	Please	-0.124939
-4.449881	Network	-0.425969
-4.213263	__int64	-0.124939
-4.449881	mispredictions,	-0.124939
-4.449881	rows;	-0.425969
-4.213263	together.	-0.124939
-4.449881	organize	-0.124939
-4.449881	bitfield	-0.124939
-4.213263	15.1b.	-0.124939
-4.213263	sees	-0.124939
-4.213263	Interpreted	-0.124939
-4.213263	treat	-0.124939
-4.449881	a[SIZE][SIZE],	-0.425969
-4.213263	(a<b	-0.425969
-4.449881	processor).	-0.124939
-4.213263	occurred.	-0.124939
-4.449881	corresponding	-0.124939
-4.213263	memset(a,	-0.425969
-4.449881	m;}	-0.124939
-4.213263	recovering	-0.124939
-4.213263	Sunday,	-0.425969
-4.449881	incompatible.	-0.124939
-4.213263	2;}	-0.124939
-4.213263	fast,	-0.124939
-4.213263	University	-0.124939
-4.213263	log,	-0.124939
-4.213263	begins	-0.124939
-4.449881	14.26	-0.124939
-4.449881	14.27	-0.124939
-4.213263	somewhat	-0.124939
-4.449881	predictable.	-0.124939
-4.449881	subtraction,	-0.124939
-4.213263	14.23	-0.124939
-4.213263	slight	-0.124939
-4.213263	unit.	-0.124939
-4.213263	direct	-0.124939
-4.213263	generic	-0.124939
-4.213263	unit,	-0.124939
-4.213263	invalid.	-0.124939
-4.213263	heavily	-0.124939
-4.213263	self-	-0.425969
-4.213263	log2	-0.124939
-4.449881	counters.	-0.124939
-4.449881	speed,	-0.425969
-4.449881	(!a&&c)	-0.124939
-4.213263	[]	-0.124939
-4.449881	122.	-0.425969
-4.213263	messages	-0.124939
-4.213263	x[]);	-0.425969
-4.449881	only)	-0.124939
-4.213263	although	-0.124939
-4.213263	primitive	-0.124939
-4.213263	Goedecker	-0.425969
-4.449881	Model-specific	-0.425969
-4.213263	-fno-rtti	-0.124939
-4.213263	initialization,	-0.124939
-4.213263	largest_abs	-0.425969
-4.213263	implementations.	-0.124939
-4.449881	24,	-0.425969
-4.213263	studying	-0.124939
-4.213263	87).	-0.124939
-4.449881	contentions.	-0.124939
-4.213263	SafeArray	-0.124939
-4.213263	58	-0.124939
-4.449881	increasingly	-0.124939
-4.213263	accurate	-0.124939
-4.213263	efforts	-0.124939
-4.449881	starts.	-0.425969
-4.213263	i/2+r.	-0.425969
-4.449881	reader	-0.124939
-4.213263	usability,	-0.124939
-4.213263	template<>	-0.425969
-4.213263	low-level	-0.124939
-4.449881	available:	-0.425969
-4.213263	_controlfp_s(&dummy,	-0.425969
-4.449881	mangled	-0.425969
-4.449881	Transforming	-0.425969
-4.213263	contend	-0.425969
-4.449881	collector	-0.124939
-4.449881	int)n	-0.425969
-4.213263	Far	-0.124939
-4.449881	factorials:	-0.124939
-4.449881	........................................................................................	-0.124939
-4.213263	compares	-0.124939
-4.449881	shows.	-0.124939
-4.213263	By	-0.124939
-4.449881	certainty	-0.124939
-4.213263	1-bit	-0.124939
-4.213263	Or	-0.124939
-4.213263	Programs	-0.124939
-4.213263	operation,	-0.425969
-4.449881	closed.	-0.425969
-4.213263	source.	-0.124939
-4.449881	:1;//signbit	-0.425969
-4.449881	avoided,	-0.124939
-4.213263	book	-0.124939
-4.213263	avoided.	-0.124939
-4.213263	EMMS	-0.124939
-4.213263	immediately	-0.425969
-4.449881	sizeof(a));	-0.124939
-4.449881	105).	-0.124939
-4.213263	usage	-0.124939
-4.449881	fix	-0.425969
-4.449881	b[SIZE][SIZE])	-0.425969
-4.213263	press	-0.124939
-4.213263	sorting	-0.124939
-4.449881	(*.dll,	-0.425969
-4.449881	1.00	-0.124939
-4.213263	,	-0.124939
-4.213263	efficiently.	-0.124939
-4.213263	word	-0.124939
-4.213263	B2	-0.124939
-4.213263	10.1	-0.425969
-4.213263	Converting	-0.124939
-4.213263	matrix.	-0.124939
-4.213263	B;	-0.124939
-4.213263	divisions.	-0.124939
-4.449881	chains,	-0.124939
-4.213263	coprocessors	-0.124939
-4.213263	x.a	-0.425969
-4.213263	effect.	-0.124939
-4.449881	keyboard	-0.124939
-4.449881	39916800,	-0.425969
-4.449881	Approximate	-0.425969
-4.449881	Table[x]	-0.425969
-4.449881	restriction	-0.124939
-4.213263	x.c	-0.425969
-4.213263	misleading	-0.124939
-4.213263	#ifdef	-0.124939
-4.213263	..................................................................................................	-0.124939
-4.213263	Sum	-0.425969
-4.213263	x.b	-0.425969
-4.449881	CPUs".	-0.124939
-4.213263	shr	-0.124939
-4.213263	each.	-0.425969
-4.213263	0/a=0	-0.124939
-4.449881	replacing	-0.124939
-4.213263	1.0f	-0.124939
-4.449881	manual,	-0.124939
-4.213263	involve	-0.124939
-4.213263	Optimize	-0.124939
-4.449881	list[i];	-0.124939
-4.213263	initialize	-0.124939
-4.213263	117	-0.124939
-4.213263	previously	-0.124939
-4.213263	113	-0.124939
-4.449881	root	-0.425969
-4.213263	statistics,	-0.124939
-4.213263	times,	-0.124939
-4.213263	obstacle	-0.124939
-4.449881	Fog.	-0.124939
-4.213263	12.8	-0.425969
-4.213263	IDE	-0.124939
-4.213263	12.9	-0.425969
-4.213263	removing	-0.124939
-4.449881	setup	-0.124939
-4.449881	_LP64	-0.124939
-4.213263	type,	-0.124939
-4.213263	b.load(bb+i);	-0.124939
-4.213263	factorials	-0.124939
-4.213263	subroutine	-0.124939
-4.213263	later.	-0.124939
-4.213263	12.4	-0.425969
-4.213263	12.5	-0.425969
-4.449881	manipulations	-0.124939
-4.449881	leaks	-0.124939
-4.213263	says	-0.425969
-4.213263	12.6	-0.425969
-4.449881	140).	-0.124939
-4.213263	12.7	-0.425969
-4.213263	52	-0.124939
-4.449881	obey	-0.124939
-4.449881	72.	-0.124939
-4.449881	b*a	-0.124939
-4.213263	95	-0.124939
-4.213263	12.1	-0.425969
-4.213263	time-critical	-0.124939
-4.213263	12.3	-0.425969
-4.213263	universal	-0.124939
-4.213263	Suppl.	-0.425969
-4.213263	crash.	-0.124939
-4.213263	a+b+c+d	-0.425969
-4.213263	99	-0.124939
-4.213263	Remove	-0.124939
-4.213263	interpreters,	-0.425969
-4.213263	9.	-0.124939
-4.449881	any,	-0.124939
-4.213263	thread-safe	-0.124939
-4.449881	F3(bool	-0.425969
-4.213263	exclusive	-0.124939
-4.213263	9.2.	-0.124939
-4.449881	NumberOfTests;	-0.425969
-4.213263	stay	-0.124939
-4.213263	extension	-0.124939
-4.213263	"best	-0.425969
-4.213263	functions)	-0.124939
-4.213263	repeated	-0.124939
-4.449881	FactorialTable[13]	-0.425969
-4.213263	Friday)	-0.124939
-4.213263	function:	-0.124939
-4.449881	8.21	-0.124939
-4.213263	&CriticalFunction_AVX;	-0.425969
-4.213263	105	-0.124939
-4.213263	interpret	-0.124939
-4.213263	Writing	-0.124939
-4.449881	bug	-0.124939
-4.449881	Compare	-0.425969
-4.449881	x=y;	-0.425969
-4.213263	better,	-0.124939
-4.213263	flip	-0.124939
-4.449881	float's	-0.124939
-4.449881	minutes	-0.425969
-4.213263	109	-0.124939
-4.449881	(a+1)	-0.124939
-4.213263	reasons,	-0.124939
-4.213263	better:	-0.124939
-4.213263	method,	-0.124939
-4.213263	whose	-0.124939
-4.449881	delete,	-0.124939
-4.449881	accessed,	-0.124939
-4.449881	a&(b|c)	-0.124939
-4.213263	assigning	-0.124939
-4.213263	10%	-0.425969
-4.213263	Mostly	-0.425969
-4.449881	4:	-0.425969
-4.449881	4;	-0.124939
-4.213263	have.	-0.124939
-4.213263	relieving	-0.124939
-4.213263	10,	-0.124939
-4.213263	ENDP	-0.124939
-4.213263	incremented	-0.124939
-4.213263	48	-0.124939
-4.213263	Tuesday	-0.124939
-4.213263	internally	-0.425969
-4.213263	fourth	-0.124939
-4.213263	tips	-0.425969
-4.213263	shared_ptr	-0.124939
-4.449881	BSD.	-0.124939
-4.449881	effort.	-0.124939
-4.213263	today.	-0.124939
-4.213263	Put	-0.124939
-4.449881	CriticalFunction_AVX(int	-0.425969
-4.449881	(r2	-0.425969
-4.449881	writing:	-0.124939
-4.449881	B2;	-0.124939
-4.449881	overall	-0.124939
-4.449881	Structures	-0.425969
-4.213263	executables.	-0.124939
-4.213263	inherently	-0.124939
-4.213263	me.	-0.124939
-4.213263	non-virtual	-0.124939
-4.213263	safer.	-0.124939
-4.213263	b+c	-0.124939
-4.449881	__linux__	-0.124939
-4.449881	-32768	-0.124939
-4.449881	b+a	-0.124939
-4.449881	factorials,	-0.124939
-4.449881	convenience	-0.124939
-4.449881	231.	-0.124939
-4.449881	pow(x,10);	-0.124939
-4.449881	companies	-0.124939
-4.449881	dominate	-0.124939
-4.449881	preferences	-0.124939
-4.449881	optimize("a",on).	-0.124939
-4.449881	optimize(...)	-0.124939
-4.449881	x---x---x	-0.124939
-4.449881	16383	-0.124939
-4.449881	extensions.	-0.124939
-4.449881	today,	-0.124939
-4.449881	14.5b	-0.124939
-4.449881	14.5a	-0.124939
-4.449881	otherwise.	-0.124939
-4.449881	//Loopby4	-0.124939
-4.449881	b2);	-0.124939
-4.449881	1./3628800.,	-0.124939
-4.449881	niche	-0.124939
-4.449881	10)	-0.124939
-4.449881	fistp	-0.124939
-4.449881	SetThreadAffinityMask,	-0.124939
-4.449881	common,	-0.124939
-4.449881	(low	-0.124939
-4.449881	common.	-0.124939
-4.449881	segments	-0.124939
-4.449881	Darwin8	-0.124939
-4.449881	108	-0.124939
-4.449881	compact,	-0.124939
-4.449881	102	-0.124939
-4.449881	namespace.	-0.124939
-4.449881	update,	-0.124939
-4.449881	similarity	-0.124939
-4.449881	106	-0.124939
-4.449881	function)	-0.124939
-4.449881	"move	-0.124939
-4.449881	function"	-0.124939
-4.449881	Friday,	-0.124939
-4.449881	bits),	-0.124939
-4.449881	Correction	-0.124939
-4.449881	vulnerability	-0.124939
-4.449881	reinstallation	-0.124939
-4.449881	ignoring	-0.124939
-4.449881	sums	-0.124939
-4.449881	N-1)==0	-0.124939
-4.449881	1./6.,	-0.124939
-4.449881	{x	-0.124939
-4.449881	Rather	-0.124939
-4.449881	code-based	-0.124939
-4.449881	N+1	-0.124939
-4.449881	512-bit	-0.124939
-4.449881	7.6.	-0.124939
-4.449881	circumvent	-0.124939
-4.449881	---x---xx	-0.124939
-4.449881	blog.	-0.124939
-4.449881	OpenMP.	-0.124939
-4.449881	Includes	-0.124939
-4.449881	$B2$2	-0.124939
-4.449881	effects.	-0.124939
-4.449881	Delight".	-0.124939
-4.449881	AND-operations	-0.124939
-4.449881	precompiled	-0.124939
-4.449881	typo	-0.124939
-4.449881	51).	-0.124939
-4.449881	Included	-0.124939
-4.449881	Updates	-0.124939
-4.449881	'@'	-0.124939
-4.449881	(methods)	-0.124939
-4.449881	count.	-0.124939
-4.449881	masm=intel	-0.124939
-4.449881	warn	-0.124939
-4.449881	warm	-0.124939
-4.449881	noticed	-0.124939
-4.449881	constructs........................................................................	-0.124939
-4.449881	a<<b<<c	-0.124939
-4.449881	leak.	-0.124939
-4.449881	utility	-0.124939
-4.449881	clumsy,	-0.124939
-4.449881	16-byte	-0.124939
-4.449881	zeroes.	-0.124939
-4.449881	caveats.	-0.124939
-4.449881	$B1$2:.	-0.124939
-4.449881	repagination	-0.124939
-4.449881	a[i+2]	-0.124939
-4.449881	recognizes	-0.124939
-4.449881	recognized	-0.124939
-4.449881	cleans	-0.124939
-4.449881	(bitwise	-0.124939
-4.449881	portability.	-0.124939
-4.449881	independently.	-0.124939
-4.449881	9.5b	-0.124939
-4.449881	answer	-0.124939
-4.449881	(Standard	-0.124939
-4.449881	13.2.	-0.124939
-4.449881	adds,	-0.124939
-4.449881	objconv	-0.124939
-4.449881	purpose:	-0.124939
-4.449881	limitations	-0.124939
-4.449881	x8*x2;	-0.124939
-4.449881	limitation).	-0.124939
-4.449881	speculatively	-0.124939
-4.449881	disassembly	-0.124939
-4.449881	(en.wikipedia.org/wiki/L2_cache).	-0.124939
-4.449881	attempt	-0.124939
-4.449881	0+1.23456	-0.124939
-4.449881	before)	-0.124939
-4.449881	blocks.	-0.124939
-4.449881	timingtest.h	-0.124939
-4.449881	C1::Disp()	-0.124939
-4.449881	predictor.	-0.124939
-4.449881	128.	-0.124939
-4.449881	accurate,	-0.124939
-4.449881	Misaligned	-0.124939
-4.449881	FAQ	-0.124939
-4.449881	before.	-0.124939
-4.449881	Xnu	-0.124939
-4.449881	initialized.	-0.124939
-4.449881	so).	-0.124939
-4.449881	Remember,	-0.124939
-4.449881	/fp:fast=2	-0.124939
-4.449881	/MT).	-0.124939
-4.449881	MKL).	-0.124939
-4.449881	B1	-0.124939
-4.449881	CParent::Hello()	-0.124939
-4.449881	user-defined	-0.124939
-4.449881	spots,	-0.124939
-4.449881	B.	-0.124939
-4.449881	exploiting	-0.124939
-4.449881	sets).	-0.124939
-4.449881	lost	-0.124939
-4.449881	i+1;	-0.124939
-4.449881	terminated.	-0.124939
-4.449881	burden	-0.124939
-4.449881	(a&&!b)	-0.124939
-4.449881	SafeArray:	-0.124939
-4.449881	Is8vec16	-0.124939
-4.449881	weakness	-0.124939
-4.449881	imple-	-0.124939
-4.449881	Yeppp.	-0.124939
-4.449881	nested	-0.124939
-4.449881	source)	-0.124939
-4.449881	1994.	-0.124939
-4.449881	side	-0.124939
-4.449881	ample	-0.124939
-4.449881	(MMX),	-0.124939
-4.449881	1/50	-0.124939
-4.449881	/openmp	-0.124939
-4.449881	forgot	-0.124939
-4.449881	clauses	-0.124939
-4.449881	reproducibility.	-0.124939
-4.449881	-abs(x);.	-0.124939
-4.449881	x-xxx	-0.124939
-4.449881	timediff[i]	-0.124939
-4.449881	Be	-0.124939
-4.449881	for(inti=0;i<16;i+=4){	-0.124939
-4.449881	message.	-0.124939
-4.449881	coordination	-0.124939
-4.449881	distant	-0.124939
-4.449881	Sunday	-0.124939
-4.449881	restore	-0.124939
-4.449881	47	-0.124939
-4.449881	1996.	-0.124939
-4.449881	www.agner.org/optimize.	-0.124939
-4.449881	row++)	-0.124939
-4.449881	resume	-0.124939
-4.449881	x-xxxxx--	-0.124939
-4.449881	areas.	-0.124939
-4.449881	qword	-0.124939
-4.449881	Repeat	-0.124939
-4.449881	/EHs-	-0.124939
-4.449881	union:	-0.124939
-4.449881	(12.4e)	-0.124939
-4.449881	steals	-0.124939
-4.449881	(add	-0.124939
-4.449881	moved,	-0.124939
-4.449881	moved.	-0.124939
-4.449881	areas,	-0.124939
-4.449881	longdoublevalue	-0.124939
-4.449881	unconventional	-0.124939
-4.449881	*const_cast<int*>(&x)	-0.124939
-4.449881	(option	-0.124939
-4.449881	x-xxxxx-x	-0.124939
-4.449881	Handles	-0.124939
-4.449881	strange	-0.124939
-4.449881	invalid,	-0.124939
-4.449881	_mm_stream_si128	-0.124939
-4.449881	/Gy,	-0.124939
-4.449881	(there	-0.124939
-4.449881	sqrt	-0.124939
-4.449881	b.y	-0.124939
-4.449881	SSE.	-0.124939
-4.449881	bits:	-0.124939
-4.449881	264-1	-0.124939
-4.449881	generality.	-0.124939
-4.449881	sin(0.8);	-0.124939
-4.449881	esp+12	-0.124939
-4.449881	bounds-checking	-0.124939
-4.449881	owns	-0.124939
-4.449881	i--,	-0.124939
-4.449881	versa.	-0.124939
-4.449881	body.	-0.124939
-4.449881	CString	-0.124939
-4.449881	58.7	-0.124939
-4.449881	Prevent	-0.124939
-4.449881	"express"	-0.124939
-4.449881	(en.wikipedia.org/wiki/Standard_Template_Library).	-0.124939
-4.449881	zero:	-0.124939
-4.449881	*(int*)&x	-0.124939
-4.449881	-fno-pic).	-0.124939
-4.449881	tread	-0.124939
-4.449881	rows.	-0.124939
-4.449881	saturated.	-0.124939
-4.449881	rows,	-0.124939
-4.449881	uncaught	-0.124939
-4.449881	macro,	-0.124939
-4.449881	virtualization.	-0.124939
-4.449881	seek	-0.124939
-4.449881	macro.	-0.124939
-4.449881	constructed.	-0.124939
-4.449881	data,	-0.124939
-4.449881	FuncB	-0.124939
-4.449881	limits	-0.124939
-4.449881	xx4(x4);	-0.124939
-4.449881	precious	-0.124939
-4.449881	15h	-0.124939
-4.449881	installed,	-0.124939
-4.449881	installed.	-0.124939
-4.449881	point:	-0.124939
-4.449881	IA-32/Intel64,	-0.124939
-4.449881	_mm_perm_epi8	-0.124939
-4.449881	min)	-0.124939
-4.449881	LLVM	-0.124939
-4.449881	7.40a	-0.124939
-4.449881	7.40b	-0.124939
-4.449881	7.40c	-0.124939
-4.449881	(FuncRow(i)*columns	-0.124939
-4.449881	(Darwin)	-0.124939
-4.449881	see,	-0.124939
-4.449881	importance	-0.124939
-4.449881	comparable	-0.124939
-4.449881	interpretation.	-0.124939
-4.449881	x∙xn-1,	-0.124939
-4.449881	rarely.	-0.124939
-4.449881	optimize("a",	-0.124939
-4.449881	neither	-0.124939
-4.449881	correctly	-0.124939
-4.449881	sets)	-0.124939
-4.449881	libraries........................................................................................	-0.124939
-4.449881	mind.	-0.124939
-4.449881	sets,	-0.124939
-4.449881	dot	-0.124939
-4.449881	a2*b1)	-0.124939
-4.449881	mispredicted.	-0.124939
-4.449881	Day.	-0.124939
-4.449881	mispredicted,	-0.124939
-4.449881	Good	-0.124939
-4.449881	after)	-0.124939
-4.449881	OneOrTwo5[2]	-0.124939
-4.449881	intermediates,	-0.124939
-4.449881	incremented.	-0.124939
-4.449881	incremented,	-0.124939
-4.449881	here's	-0.124939
-4.449881	insufficient.	-0.124939
-4.449881	she	-0.124939
-4.449881	not-too-big	-0.124939
-4.449881	evictions	-0.124939
-4.449881	bit,	-0.124939
-4.449881	Prefetch	-0.124939
-4.449881	core).	-0.124939
-4.449881	noalias)	-0.124939
-4.449881	transposition	-0.124939
-4.449881	other's	-0.124939
-4.449881	below,	-0.124939
-4.449881	illegitimate	-0.124939
-4.449881	legitimate	-0.124939
-4.449881	121	-0.124939
-4.449881	architecture	-0.124939
-4.449881	renamed	-0.124939
-4.449881	unacceptable	-0.124939
-4.449881	hardware-related	-0.124939
-4.449881	12,	-0.124939
-4.449881	12)	-0.124939
-4.449881	"More	-0.124939
-4.449881	"Effective	-0.124939
-4.449881	Dispatcher	-0.124939
-4.449881	www.gnu.org/copyleft/fdl.html.	-0.124939
-4.449881	Boost	-0.124939
-4.449881	ebx,31	-0.124939
-4.449881	matters.	-0.124939
-4.449881	<ia32intrin.h>	-0.124939
-4.449881	finish.	-0.124939
-4.449881	inappropriate	-0.124939
-4.449881	0:	-0.124939
-4.449881	IA-32	-0.124939
-4.449881	x87	-0.124939
-4.449881	a+b+c=a+(b+c)	-0.124939
-4.449881	6.0f;	-0.124939
-4.449881	brackets.	-0.124939
-4.449881	NULL.	-0.124939
-4.449881	b*(2.0/3.0)	-0.124939
-4.449881	.a),	-0.124939
-4.449881	tolerance	-0.124939
-4.449881	Processor	-0.124939
-4.449881	Files	-0.124939
-4.449881	compactness,	-0.124939
-4.449881	(partial)	-0.124939
-4.449881	Abrash:	-0.124939
-4.449881	T+1	-0.124939
-4.449881	0]	-0.124939
-4.449881	Edition,	-0.124939
-4.449881	14.7b	-0.124939
-4.449881	Otherwise	-0.124939
-4.449881	suggested	-0.124939
-4.449881	14.3a	-0.124939
-4.449881	14.3b	-0.124939
-4.449881	(DLL)	-0.124939
-4.449881	SelectAddMul_SSE41	-0.124939
-4.449881	thenaandbcannot	-0.124939
-4.449881	Hat).	-0.124939
-4.449881	int32_t	-0.124939
-4.449881	issuing	-0.124939
-4.449881	unchanged	-0.124939
-4.449881	Deallocation	-0.124939
-4.449881	initiative	-0.124939
-4.449881	bloat	-0.124939
-4.449881	Division,	-0.124939
-4.449881	Gives	-0.124939
-4.449881	C-	-0.124939
-4.449881	boolb=0;	-0.124939
-4.449881	C#	-0.124939
-4.449881	(0);	-0.124939
-4.449881	acceptable	-0.124939
-4.449881	FuncB,	-0.124939
-4.449881	removed.	-0.124939
-4.449881	trigger	-0.124939
-4.449881	basic	-0.124939
-4.449881	j++)	-0.124939
-4.449881	once...................................	-0.124939
-4.449881	statements,	-0.124939
-4.449881	compiling.	-0.124939
-4.449881	frustration	-0.124939
-4.449881	2-3	-0.124939
-4.449881	prone	-0.124939
-4.449881	switches.....................................................................................................	-0.124939
-4.449881	deeper	-0.124939
-4.449881	b))	-0.124939
-4.449881	less.	-0.124939
-4.449881	"function".	-0.124939
-4.449881	Max.	-0.124939
-4.449881	operator[]	-0.124939
-4.449881	enabled:	-0.124939
-4.449881	owns.	-0.124939
-4.449881	wrapper	-0.124939
-4.449881	161	-0.124939
-4.449881	162	-0.124939
-4.449881	be,	-0.124939
-4.449881	algebra)	-0.124939
-4.449881	be.	-0.124939
-4.449881	Re-do	-0.124939
-4.449881	algebra.	-0.124939
-4.449881	slices.	-0.124939
-4.449881	transformation	-0.124939
-4.449881	x^8	-0.124939
-4.449881	2.11	-0.124939
-4.449881	power-save	-0.124939
-4.449881	situations:	-0.124939
-4.449881	balance	-0.124939
-4.449881	0.35	-0.124939
-4.449881	over.	-0.124939
-4.449881	over-	-0.124939
-4.449881	a<c)	-0.124939
-4.449881	a1/b1	-0.124939
-4.449881	"memory"	-0.124939
-4.449881	2009.	-0.124939
-4.449881	situations,	-0.124939
-4.449881	vendor	-0.124939
-4.449881	list;	-0.124939
-4.449881	Non-polymorphic	-0.124939
-4.449881	investment.	-0.124939
-4.449881	MOVNTPS,	-0.124939
-4.449881	(80	-0.124939
-4.449881	matrix[NUMROWS][NUMCOLUMNS];	-0.124939
-4.449881	149	-0.124939
-4.449881	x);}	-0.124939
-4.449881	cc[i]+2	-0.124939
-4.449881	Iu32vec2	-0.124939
-4.449881	Iu32vec4	-0.124939
-4.449881	time1	-0.124939
-4.449881	plug-ins	-0.124939
-4.449881	Vec32c	-0.124939
-4.449881	interval,	-0.124939
-4.449881	time?	-0.124939
-4.449881	&SelectAddMul_AVX2;	-0.124939
-4.449881	i7	-0.124939
-4.449881	kit	-0.124939
-4.449881	i)	-0.124939
-4.449881	(Vec4f	-0.124939
-4.449881	transitions	-0.124939
-4.449881	Usually	-0.124939
-4.449881	a[size];	-0.124939
-4.449881	...................................................................................................	-0.124939
-4.449881	x?"	-0.124939
-4.449881	separating	-0.124939
-4.449881	(the	-0.124939
-4.449881	g()	-0.124939
-4.449881	technology,	-0.124939
-4.449881	800	-0.124939
-4.449881	tolerated.	-0.124939
-4.449881	Gbytes.	-0.124939
-4.449881	p1;	-0.124939
-4.449881	CriticalFunctionType(int	-0.124939
-4.449881	deleted,	-0.124939
-4.449881	a/1	-0.124939
-4.449881	Yet,	-0.124939
-4.449881	-263	-0.124939
-4.449881	assigned	-0.124939
-4.449881	functional	-0.124939
-4.449881	2eee	-0.124939
-4.449881	human	-0.124939
-4.449881	plug-in	-0.124939
-4.449881	_mm_setcsr(_mm_getcsr()	-0.124939
-4.449881	2-20,	-0.124939
-4.449881	yet	-0.124939
-4.449881	Newton-Raphson	-0.124939
-4.449881	regarded	-0.124939
-4.449881	pixel	-0.124939
-4.449881	affinity	-0.124939
-4.449881	PROCNEAR	-0.124939
-4.449881	119).	-0.124939
-4.449881	stack).	-0.124939
-4.449881	ago,	-0.124939
-4.449881	reuse	-0.124939
-4.449881	Beginners	-0.124939
-4.449881	Error:	-0.124939
-4.449881	pool,	-0.124939
-4.449881	reordered,	-0.124939
-4.449881	doublevalue	-0.124939
-4.449881	(IPP).	-0.124939
-4.449881	hand.	-0.124939
-4.449881	hand-	-0.124939
-4.449881	|)	-0.124939
-4.449881	standardization	-0.124939
-4.449881	Patches	-0.124939
-4.449881	object-oriented	-0.124939
-4.449881	WriteFile	-0.124939
-4.449881	link.	-0.124939
-4.449881	strategies........................................................................................	-0.124939
-4.449881	monitoring	-0.124939
-4.449881	printf("Beta");	-0.124939
-4.449881	re-use	-0.124939
-4.449881	dead	-0.124939
-4.449881	Core2	-0.124939
-4.449881	meaning.	-0.124939
-4.449881	meaning,	-0.124939
-4.449881	2014-08-07.	-0.124939
-4.449881	(int)&matrix[0][0]	-0.124939
-4.449881	Safe	-0.124939
-4.449881	i*12,	-0.124939
-4.449881	__thread	-0.124939
-4.449881	(BTB).	-0.124939
-4.449881	meanings	-0.124939
-4.449881	capabilities.	-0.124939
-4.449881	paragraph.	-0.124939
-4.449881	Sum2(S3	-0.124939
-4.449881	18015,	-0.124939
-4.449881	alignments	-0.124939
-4.449881	status:	-0.124939
-4.449881	Including	-0.124939
-4.449881	millisecond.	-0.124939
-4.449881	list[100],	-0.124939
-4.449881	benchmark	-0.124939
-4.449881	ifbit=1	-0.124939
-4.449881	f(x)	-0.124939
-4.449881	floatvalue	-0.124939
-4.449881	Inlining	-0.124939
-4.449881	remains	-0.124939
-4.449881	worst-	-0.124939
-4.449881	titles.	-0.124939
-4.449881	11.2a	-0.124939
-4.449881	/GR–	-0.124939
-4.449881	~C1();	-0.124939
-4.449881	(vector	-0.124939
-4.449881	initial	-0.124939
-4.449881	optimize/#vectorclass	-0.124939
-4.449881	powN<true,N/2>::p(x);	-0.124939
-4.449881	accumulators.	-0.124939
-4.449881	22.	-0.124939
-4.449881	addressed	-0.124939
-4.449881	F0()	-0.124939
-4.449881	OS,	-0.124939
-4.449881	vmlsExp4	-0.124939
-4.449881	-mcmodel=large,	-0.124939
-4.449881	+127.	-0.124939
-4.449881	precision:	-0.124939
-4.449881	223	-0.124939
-4.449881	x[1]	-0.124939
-4.449881	Adolfy	-0.124939
-4.449881	64;	-0.124939
-4.449881	products	-0.124939
-4.449881	dvec.h	-0.124939
-4.449881	(.dll	-0.124939
-4.449881	stages	-0.124939
-4.449881	InstructionSet().The	-0.124939
-4.449881	64.	-0.124939
-4.449881	sticks	-0.124939
-4.449881	Parallelization	-0.124939
-4.449881	covers	-0.124939
-4.449881	die.	-0.124939
-4.449881	153.	-0.124939
-4.449881	65535	-0.124939
-4.449881	/Qopt-report	-0.124939
-4.449881	low-priority	-0.124939
-4.449881	Vectorization	-0.124939
-4.449881	Profile-guided	-0.124939
-4.449881	destructors.	-0.124939
-4.449881	Locked	-0.124939
-4.449881	independence,	-0.124939
-4.449881	libraries............................................................................	-0.124939
-4.449881	fine-tuning,	-0.124939
-4.449881	exits.	-0.124939
-4.449881	exits,	-0.124939
-4.449881	mangling	-0.124939
-4.449881	utilities	-0.124939
-4.449881	cases........................................................................................................	-0.124939
-4.449881	alleviated	-0.124939
-4.449881	decimals,	-0.124939
-4.449881	cell	-0.124939
-4.449881	transfers	-0.124939
-4.449881	list[j].b	-0.124939
-4.449881	list[j].a	-0.124939
-4.449881	12.4a.	-0.124939
-4.449881	12.4a,	-0.124939
-4.449881	example,a	-0.124939
-4.449881	obeyed.	-0.124939
-4.449881	sufficient,	-0.124939
-4.449881	product.	-0.124939
-4.449881	restores	-0.124939
-4.449881	2011).	-0.124939
-4.449881	serial,	-0.124939
-4.449881	restored	-0.124939
-4.449881	managed	-0.124939
-4.449881	Disadvantages	-0.124939
-4.449881	real-time	-0.124939
-4.449881	1./40320.,	-0.124939
-4.449881	exceeding	-0.124939
-4.449881	(a+c==b+c)=(a==b)	-0.124939
-4.449881	18.2.	-0.124939
-4.449881	click	-0.124939
-4.449881	1.23456,	-0.124939
-4.449881	download	-0.124939
-4.449881	/Qopenmp	-0.124939
-4.449881	technique	-0.124939
-4.449881	de-allocation	-0.124939
-4.449881	x<<3,	-0.124939
-4.449881	optimizer.	-0.124939
-4.449881	division).	-0.124939
-4.449881	monotonically	-0.124939
-4.449881	top-of-stack	-0.124939
-4.449881	configurations	-0.124939
-4.449881	off.	-0.124939
-4.449881	Vec8ui	-0.124939
-4.449881	2.20	-0.124939
-4.449881	2.23	-0.124939
-4.449881	std.org/jtc1/sc22/wg21/docs/TR18015.pdf.	-0.124939
-4.449881	relocate,	-0.124939
-4.449881	advised	-0.124939
-4.449881	button	-0.124939
-4.449881	positions	-0.124939
-4.449881	did	-0.124939
-4.449881	Vec8us	-0.124939
-4.449881	<excpt.h>	-0.124939
-4.449881	minimal	-0.124939
-4.449881	Library,	-0.124939
-4.449881	Systems	-0.124939
-4.449881	more.	-0.124939
-4.449881	telling	-0.124939
-4.449881	unexpected	-0.124939
-4.449881	feasible.	-0.124939
-4.449881	x2*x2;	-0.124939
-4.449881	Mathcad	-0.124939
-4.449881	-mveclibabi=acml.	-0.124939
-4.449881	friendly	-0.124939
-4.449881	running,	-0.124939
-4.449881	targets	-0.124939
-4.449881	exit.	-0.124939
-4.449881	switching.	-0.124939
-4.449881	alternatives:	-0.124939
-4.449881	operations...............................................................................................	-0.124939
-4.449881	27).	-0.124939
-4.449881	...................................................................................	-0.124939
-4.449881	log(c[i]);	-0.124939
-4.449881	assignment,	-0.124939
-4.449881	assignment.	-0.124939
-4.449881	diagnose.	-0.124939
-4.449881	8.9b	-0.124939
-4.449881	15.1c).	-0.124939
-4.449881	vector(float	-0.124939
-4.449881	c;};	-0.124939
-4.449881	8.9a	-0.124939
-4.449881	sets...........................	-0.124939
-4.449881	Integrates	-0.124939
-4.449881	AddTwo(int	-0.124939
-4.449881	Violation	-0.124939
-4.449881	evenly	-0.124939
-4.449881	identified,	-0.124939
-4.449881	simultaneous	-0.124939
-4.449881	incrementing	-0.124939
-4.449881	imprecise	-0.124939
-4.449881	Technology	-0.124939
-4.449881	-100,	-0.124939
-4.449881	p->f()	-0.124939
-4.449881	DontSkip	-0.124939
-4.449881	1./2.,	-0.124939
-4.449881	blend	-0.124939
-4.449881	e	-0.124939
-4.449881	flexibility	-0.124939
-4.449881	SelectAddMul_SSE2,	-0.124939
-4.449881	Greek[4]	-0.124939
-4.449881	Windows:	-0.124939
-4.449881	wealth	-0.124939
-4.449881	correlated	-0.124939
-4.449881	independently	-0.124939
-4.449881	Output	-0.124939
-4.449881	T,	-0.124939
-4.449881	T>	-0.124939
-4.449881	SelectAddMul,	-0.124939
-4.449881	throw()specification	-0.124939
-4.449881	asa	-0.124939
-4.449881	themselves.	-0.124939
-4.449881	ARRAYSIZE.	-0.124939
-4.449881	electrical	-0.124939
-4.449881	A2	-0.124939
-4.449881	"=m"(n)	-0.124939
-4.449881	A.	-0.124939
-4.449881	(IDE)	-0.124939
-4.449881	TR	-0.124939
-4.449881	Numbers	-0.124939
-4.449881	amounts	-0.124939
-4.449881	fld	-0.124939
-4.449881	Calculating	-0.124939
-4.449881	(v.	-0.124939
-4.449881	polygon	-0.124939
-4.449881	~(~a)=a	-0.124939
-4.449881	/vms	-0.124939
-4.449881	sample	-0.124939
-4.449881	everybody.	-0.124939
-4.449881	unwise	-0.124939
-4.449881	systems").	-0.124939
-4.449881	looses	-0.124939
-4.449881	p2->Hello();	-0.124939
-4.449881	8.23b.	-0.124939
-4.449881	d);	-0.124939
-4.449881	144	-0.124939
-4.449881	circuits	-0.124939
-4.449881	14.1b	-0.124939
-4.449881	143	-0.124939
-4.449881	14.1a	-0.124939
-4.449881	ecx+eax*4.	-0.124939
-4.449881	Reinterpret	-0.124939
-4.449881	max	-0.124939
-4.449881	77)	-0.124939
-4.449881	theory.	-0.124939
-4.449881	constructor"	-0.124939
-4.449881	14,	-0.124939
-4.449881	Number)	-0.124939
-4.449881	grow	-0.124939
-4.449881	Pointers,	-0.124939
-4.449881	Unpredictable	-0.124939
-4.449881	QueryPerformanceCounter	-0.124939
-4.449881	workload	-0.124939
-4.449881	u[0].	-0.124939
-4.449881	class).	-0.124939
-4.449881	seeing	-0.124939
-4.449881	distributors	-0.124939
-4.449881	2B,	-0.124939
-4.449881	(Red	-0.124939
-4.449881	optimally.	-0.124939
-4.449881	optimally,	-0.124939
-4.449881	view.	-0.124939
-4.449881	competition.	-0.124939
-4.449881	8.42n,	-0.124939
-4.449881	group	-0.124939
-4.449881	thank	-0.124939
-4.449881	www.openmp.org.	-0.124939
-4.449881	interesting	-0.124939
-4.449881	~a&~b=~(a|b)	-0.124939
-4.449881	;eax=addressofa	-0.124939
-4.449881	"static"	-0.124939
-4.449881	0x800	-0.124939
-4.449881	subtract	-0.124939
-4.449881	works,	-0.124939
-4.449881	@gnu_indirect_function");	-0.124939
-4.449881	optimizations,	-0.124939
-4.449881	matrixes.	-0.124939
-4.449881	speeds.	-0.124939
-4.449881	tmmintrin.h	-0.124939
-4.449881	{};	-0.124939
-4.449881	forgets	-0.124939
-4.449881	7.	-0.124939
-4.449881	integer:	-0.124939
-4.449881	module2.cpp.	-0.124939
-4.449881	last:	-0.124939
-4.449881	14.0	-0.124939
-4.449881	SelectAddMul_AVX2,	-0.124939
-4.449881	Choose	-0.124939
-4.449881	list[j].c;	-0.124939
-4.449881	r.a	-0.124939
-4.449881	places).	-0.124939
-4.449881	Linux)	-0.124939
-4.449881	decrementing	-0.124939
-4.449881	Accessibility	-0.124939
-4.449881	constructors.	-0.124939
-4.449881	x10	-0.124939
-4.449881	Generic	-0.124939
-4.449881	F64vec4	-0.124939
-4.449881	system-independent,	-0.124939
-4.449881	DynamicArray[i]	-0.124939
-4.449881	cross-module	-0.124939
-4.449881	therefore,	-0.124939
-4.449881	FuncCol(i))	-0.124939
-4.449881	requesting	-0.124939
-4.449881	locally.	-0.124939
-4.449881	8.3a	-0.124939
-4.449881	12.4c.	-0.124939
-4.449881	a+b=0,	-0.124939
-4.449881	"Software	-0.124939
-4.449881	listing.	-0.124939
-4.449881	eax.	-0.124939
-4.449881	game	-0.124939
-4.449881	polynomial(x)	-0.124939
-4.449881	cc[i]);	-0.124939
-4.449881	Time-based	-0.124939
-4.449881	uninstallation	-0.124939
-4.449881	a+b=b+a	-0.124939
-4.449881	remaining	-0.124939
-4.449881	u.d	-0.124939
-4.449881	branch).	-0.124939
-4.449881	conversions....................................................................................................	-0.124939
-4.449881	forbids	-0.124939
-4.449881	107.	-0.124939
-4.449881	Walking	-0.124939
-4.449881	(1985).	-0.124939
-4.449881	de-allocated.	-0.124939
-4.449881	..................................................................................................................	-0.124939
-4.449881	/O3	-0.124939
-4.449881	dispatching:	-0.124939
-4.449881	dispatching,	-0.124939
-4.449881	C++0x	-0.124939
-4.449881	(b1*b2);	-0.124939
-4.449881	1./1.30767E12,	-0.124939
-4.449881	*temp;	-0.124939
-4.449881	segmented	-0.124939
-4.449881	integral	-0.124939
-4.449881	compiler).	-0.124939
-4.449881	Weighing	-0.124939
-4.449881	workaround	-0.124939
-4.449881	Users	-0.124939
-4.449881	Problems	-0.124939
-4.449881	-m32	-0.124939
-4.449881	bool,	-0.124939
-4.449881	branch,	-0.124939
-4.449881	/arch:SSE4.1	-0.124939
-4.449881	performance:	-0.124939
-4.449881	editions).	-0.124939
-4.449881	[ecx+eax*4].	-0.124939
-4.449881	Borland/CodeGear/Embarcadero	-0.124939
-4.449881	easier.	-0.124939
-4.449881	1000.	-0.124939
-4.449881	ready	-0.124939
-4.449881	studied	-0.124939
-4.449881	0.666666666666666666667;	-0.124939
-4.449881	A2;	-0.124939
-4.449881	commpage.	-0.124939
-4.449881	uses.	-0.124939
-4.449881	powN<true,0>	-0.124939
-4.449881	0x3700,	-0.124939
-4.449881	matrix[row][column]	-0.124939
-4.449881	performance).	-0.124939
-4.449881	describes	-0.124939
-4.449881	{2.6f,	-0.124939
-4.449881	non-sequentially	-0.124939
-4.449881	1%	-0.124939
-4.449881	errors;	-0.124939
-4.449881	1)	-0.124939
-4.449881	(typically	-0.124939
-4.449881	hackers	-0.124939
-4.449881	errors,	-0.124939
-4.449881	formula:	-0.124939
-4.449881	Certainly	-0.124939
-4.449881	occupies	-0.124939
-4.449881	multi-threading,	-0.124939
-4.449881	"xmmintrin.h"	-0.124939
-4.449881	occupied	-0.124939
-4.449881	experimental	-0.124939
-4.449881	condition:	-0.124939
-4.449881	runtime).	-0.124939
-4.449881	c1,	-0.124939
-4.449881	Dr	-0.124939
-4.449881	switching	-0.124939
-4.449881	formulas	-0.124939
-4.449881	trace	-0.124939
-4.449881	Single-Instruction-Multiple-Data	-0.124939
-4.449881	condition,	-0.124939
-4.449881	protected:	-0.124939
-4.449881	zigzag	-0.124939
-4.449881	topic,	-0.124939
-4.449881	complaints	-0.124939
-4.449881	(XMM),	-0.124939
-4.449881	local:	-0.124939
-4.449881	20.	-0.124939
-4.449881	D,	-0.124939
-4.449881	throw(A,B,C)	-0.124939
-4.449881	fills	-0.124939
-4.449881	local.	-0.124939
-4.449881	precise	-0.124939
-4.449881	Predefined	-0.124939
-4.449881	local,	-0.124939
-4.449881	column-wise.	-0.124939
-4.449881	__intel_cpu_features_init_x()	-0.124939
-4.449881	stall	-0.124939
-4.449881	=0;	-0.124939
-4.449881	accelerators	-0.124939
-4.449881	Addison-Wesley.	-0.124939
-4.449881	absvalue	-0.124939
-4.449881	eax,0.	-0.124939
-4.449881	dword	-0.124939
-4.449881	ger	-0.124939
-4.449881	recommendations	-0.124939
-4.449881	decomposition,	-0.124939
-4.449881	targets.	-0.124939
-4.449881	undesired.	-0.124939
-4.449881	bytes).	-0.124939
-4.449881	Turn	-0.124939
-4.449881	12.6.	-0.124939
-4.449881	7.32b.	-0.124939
-4.449881	VTune,	-0.124939
-4.449881	[1.0,	-0.124939
-4.449881	if),	-0.124939
-4.449881	referencing	-0.124939
-4.449881	VTune;	-0.124939
-4.449881	stupid	-0.124939
-4.449881	worrying	-0.124939
-4.449881	storing.	-0.124939
-4.449881	......................................................................	-0.124939
-4.449881	7.29b	-0.124939
-4.449881	"Hacker's	-0.124939
-4.449881	Storage	-0.124939
-4.449881	Unsigned	-0.124939
-4.449881	blurred	-0.124939
-4.449881	7.29a	-0.124939
-4.449881	dummy;	-0.124939
-4.449881	eliminating	-0.124939
-4.449881	FatalAppExitA(0,"Array	-0.124939
-4.449881	emulating	-0.124939
-4.449881	satisfies	-0.124939
-4.449881	(int)n	-0.124939
-4.449881	with,	-0.124939
-4.449881	appropriately.	-0.124939
-4.449881	168.5	-0.124939
-4.449881	x10;	-0.124939
-4.449881	168.3	-0.124939
-4.449881	streaming	-0.124939
-4.449881	(2.5f	-0.124939
-4.449881	commas	-0.124939
-4.449881	reciprocal_divisor	-0.124939
-4.449881	usual	-0.124939
-4.449881	?Func2@@YAXQAHAAH@Z	-0.124939
-4.449881	<float.h>	-0.124939
-4.449881	temp++	-0.124939
-4.449881	doing.	-0.124939
-4.449881	unreasonably	-0.124939
-4.449881	popularity	-0.124939
-4.449881	(rebased)	-0.124939
-4.449881	ja	-0.124939
-4.449881	(YMM),	-0.124939
-4.449881	2.0)	-0.124939
-4.449881	dates	-0.124939
-4.449881	Called	-0.124939
-4.449881	(-a)*(-b)	-0.124939
-4.449881	row,	-0.124939
-4.449881	position	-0.124939
-4.449881	constructor.	-0.124939
-4.449881	2B.	-0.124939
-4.449881	number).	-0.124939
-4.449881	(0,	-0.124939
-4.449881	78.	-0.124939
-4.449881	decimals	-0.124939
-4.449881	executables	-0.124939
-4.449881	among	-0.124939
-4.449881	Installing	-0.124939
-4.449881	largest_abs)	-0.124939
-4.449881	8.15b.	-0.124939
-4.449881	towards	-0.124939
-4.449881	b[i]*c[i],	-0.124939
-4.449881	Compiler-specific	-0.124939
-4.449881	maintained	-0.124939
-4.449881	efficient:	-0.124939
-4.449881	LoadVectorA(void	-0.124939
-4.449881	c[arraysize];	-0.124939
-4.449881	performance,	-0.124939
-4.449881	sensible	-0.124939
-4.449881	Interprocedural	-0.124939
-4.449881	Assembly	-0.124939
-4.449881	powN<true,N>	-0.124939
-4.449881	a+b+c	-0.124939
-4.449881	compelling	-0.124939
-4.449881	profile.	-0.124939
-4.449881	cumbersome	-0.124939
-4.449881	403	-0.124939
-4.449881	pmmintrin.h	-0.124939
-4.449881	experience.	-0.124939
-4.449881	executable:	-0.124939
-4.449881	vector).	-0.124939
-4.449881	Environments)	-0.124939
-4.449881	machines?	-0.124939
-4.449881	those	-0.124939
-4.449881	a[i].u[1]	-0.124939
-4.449881	scheme	-0.124939
-4.449881	IDE,	-0.124939
-4.449881	(N-1))	-0.124939
-4.449881	investigating	-0.124939
-4.449881	novector	-0.124939
-4.449881	110;	-0.124939
-4.449881	spaces.	-0.124939
-4.449881	signaling	-0.124939
-4.449881	passed	-0.124939
-4.449881	<pmmintrin.h>	-0.124939
-4.449881	sub-vector.	-0.124939
-4.449881	IEEE	-0.124939
-4.449881	(|)	-0.124939
-4.449881	expensive,	-0.124939
-4.449881	N-1	-0.124939
-4.449881	dominating	-0.124939
-4.449881	fastcall))	-0.124939
-4.449881	interrupted.	-0.124939
-4.449881	17.4	-0.124939
-4.449881	script	-0.124939
-4.449881	lookup[b];	-0.124939
-4.449881	const*)p);}	-0.124939
-4.449881	Vec16uc	-0.124939
-4.449881	xxn(x4,	-0.124939
-4.449881	i/2;	-0.124939
-4.449881	According	-0.124939
-4.449881	microarchitecture.	-0.124939
-4.449881	team	-0.124939
-4.449881	abs(v.f)	-0.124939
-4.449881	(".type	-0.124939
-4.449881	one-man	-0.124939
-4.449881	vectorization.............................................................	-0.124939
-4.449881	None	-0.124939
-4.449881	MAX(a,b)	-0.124939
-4.449881	buffer,	-0.124939
-4.449881	roughly	-0.124939
-4.449881	pow(x,n)	-0.124939
-4.449881	<<6	-0.124939
-4.449881	arrays:	-0.124939
-4.449881	Processors".	-0.124939
-4.449881	Instrumentation:	-0.124939
-4.449881	i&15	-0.124939
-4.449881	("hidden")))".	-0.124939
-4.449881	buffers	-0.124939
-4.449881	_mm256_i64gather_pd	-0.124939
-4.449881	twice.	-0.124939
-4.449881	(2013)	-0.124939
-4.449881	capability	-0.124939
-4.449881	_endthread()	-0.124939
-4.449881	2.5*x^2	-0.124939
-4.449881	__debugbreak();.	-0.124939
-4.449881	system-	-0.124939
-4.449881	calls,	-0.124939
-4.449881	fine-	-0.124939
-4.449881	asmlib,	-0.124939
-4.449881	aiming	-0.124939
-4.449881	found,	-0.124939
-4.449881	subtask	-0.124939
-4.449881	lrint.	-0.124939
-4.449881	c[i]);	-0.124939
-4.449881	231-1	-0.124939
-4.449881	serves	-0.124939
-4.449881	latencies	-0.124939
-4.449881	audience	-0.124939
-4.449881	ced	-0.124939
-4.449881	full.	-0.124939
-4.449881	if.	-0.124939
-4.449881	bloat.	-0.124939
-4.449881	radical	-0.124939
-4.449881	Putting	-0.124939
-4.449881	absence	-0.124939
-4.449881	FuncB(i);	-0.124939
-4.449881	solving	-0.124939
-4.449881	programmers'	-0.124939
-4.449881	Four	-0.124939
-4.449881	(a+b).	-0.124939
-4.449881	objects?	-0.124939
-4.449881	y.	-0.124939
-4.449881	violations,	-0.124939
-4.449881	processor)	-0.124939
-4.449881	sourcebook	-0.124939
-4.449881	horizontal	-0.124939
-4.449881	comparison.	-0.124939
-4.449881	broken	-0.124939
-4.449881	*)alloca(n	-0.124939
-4.449881	spell-checking	-0.124939
-4.449881	int)size)	-0.124939
-4.449881	7.34a.	-0.124939
-4.449881	SSE).	-0.124939
-4.449881	CPU-type	-0.124939
-4.449881	communicating	-0.124939
-4.449881	even-numbered	-0.124939
-4.449881	meaning	-0.124939
-4.449881	a<<(b+c)	-0.124939
-4.449881	mixes	-0.124939
-4.449881	encryption	-0.124939
-4.449881	tried	-0.124939
-4.449881	coef[16]	-0.124939
-4.449881	fallacy	-0.124939
-4.449881	referenced	-0.124939
-4.449881	0xC0000091L	-0.124939
-4.449881	formalism.	-0.124939
-4.449881	underflow:	-0.124939
-4.449881	mark	-0.124939
-4.449881	sub-vectors	-0.124939
-4.449881	polymorphism.	-0.124939
-4.449881	__assume_aligned	-0.124939
-4.449881	polymorphism,	-0.124939
-4.449881	polymorphism:	-0.124939
-4.449881	menus	-0.124939
-4.449881	(Microsoft,	-0.124939
-4.449881	f,	-0.124939
-4.449881	date):	-0.124939
-4.449881	Examples:	-0.124939
-4.449881	MOVNTPS	-0.124939
-4.449881	lacks	-0.124939
-4.449881	arranged	-0.124939
-4.449881	(c+d)	-0.124939
-4.449881	pushed	-0.124939
-4.449881	USB	-0.124939
-4.449881	brand,	-0.124939
-4.449881	(b+c)	-0.124939
-4.449881	"frame	-0.124939
-4.449881	Sdouble	-0.124939
-4.449881	tested,	-0.124939
-4.449881	Sum3(S3	-0.124939
-4.449881	duration.	-0.124939
-4.449881	lifetime	-0.124939
-4.449881	temp++)	-0.124939
-4.449881	14.13c	-0.124939
-4.449881	consecutively?	-0.124939
-4.449881	14.13a	-0.124939
-4.449881	fill	-0.124939
-4.449881	8.15b	-0.124939
-4.449881	_mm_i64gather_pd	-0.124939
-4.449881	finally	-0.124939
-4.449881	hybrid	-0.124939
-4.449881	Func1(list,	-0.124939
-4.449881	workday	-0.124939
-4.449881	instantiated	-0.124939
-4.449881	flaws	-0.124939
-4.449881	pointers).	-0.124939
-4.449881	file"	-0.124939
-4.449881	IntegerPower<10>(x);	-0.124939
-4.449881	tested:	-0.124939
-4.449881	analyzing	-0.124939
-4.449881	S2	-0.124939
-4.449881	S3	-0.124939
-4.449881	(a+b)+c	-0.124939
-4.449881	{1.1,	-0.124939
-4.449881	elements,	-0.124939
-4.449881	(static_cast<MyChild*>(this))->Disp();	-0.124939
-4.449881	cross-	-0.124939
-4.449881	GetLogicalProcessorInformation	-0.124939
-4.449881	it).	-0.124939
-4.449881	substantial.	-0.124939
-4.449881	A*x*x	-0.124939
-4.449881	AES,	-0.124939
-4.449881	u	-0.124939
-4.449881	driver.	-0.124939
-4.449881	combined.	-0.124939
-4.449881	disadvantages.	-0.124939
-4.449881	loose	-0.124939
-4.449881	Processors	-0.124939
-4.449881	investing	-0.124939
-4.449881	arguments.	-0.124939
-4.449881	www.agner.org/optimize/asmlib.zip	-0.124939
-4.449881	Round	-0.124939
-4.449881	reductions.	-0.124939
-4.449881	kludgy.	-0.124939
-4.449881	sched_setaffinity).	-0.124939
-4.449881	answer.	-0.124939
-4.449881	out-of-	-0.124939
-4.449881	r1,	-0.124939
-4.449881	GUI	-0.124939
-4.449881	fence	-0.124939
-4.449881	eee	-0.124939
-4.449881	(Scalar	-0.124939
-4.449881	Omitting	-0.124939
-4.449881	nine,	-0.124939
-4.449881	$B2$2:	-0.124939
-4.449881	7.10b	-0.124939
-4.449881	d.y;	-0.124939
-4.449881	7.10a	-0.124939
-4.449881	randomness	-0.124939
-4.449881	thread,	-0.124939
-4.449881	development",	-0.124939
-4.449881	selected.	-0.124939
-4.449881	BigArray[1024]	-0.124939
-4.449881	shortly.	-0.124939
-4.449881	latencies,	-0.124939
-4.449881	perspective	-0.124939
-4.449881	latencies.	-0.124939
-4.449881	<<,	-0.124939
-4.449881	FuncRow(int);	-0.124939
-4.449881	versions:	-0.124939
-4.449881	1.0E8,	-0.124939
-4.449881	12.4c	-0.124939
-4.449881	"Intel®	-0.124939
-4.449881	illogical	-0.124939
-4.449881	Programmer’s	-0.124939
-4.449881	AND-OR	-0.124939
-4.449881	(short	-0.124939
-4.449881	!b)	-0.124939
-4.449881	versions,	-0.124939
-4.449881	microcontrollers.	-0.124939
-4.449881	-b	-0.124939
-4.449881	-a	-0.124939
-4.449881	_mm_prefetch	-0.124939
-4.449881	12.4.	-0.124939
-4.449881	MyChild>	-0.124939
-4.449881	bb[size]	-0.124939
-4.449881	building	-0.124939
-4.449881	universal,	-0.124939
-4.449881	pool	-0.124939
-4.449881	&list[100]	-0.124939
-4.449881	Entry	-0.124939
-4.449881	add_elements(__m128	-0.124939
-4.449881	Returning	-0.124939
-4.449881	++i).	-0.124939
-4.449881	NUMROWS	-0.124939
-4.449881	(s0+s1)+(s2+s3);	-0.124939
-4.449881	Primitives	-0.124939
-4.449881	narrow	-0.124939
-4.449881	12.4e.	-0.124939
-4.449881	DLL's	-0.124939
-4.449881	"Moving	-0.124939
-4.449881	sqaure:	-0.124939
-4.449881	&SelectAddMul_dispatch;	-0.124939
-4.449881	mutexes.	-0.124939
-4.449881	move.	-0.124939
-4.449881	nfac;	-0.124939
-4.449881	detecting	-0.124939
-4.449881	move,	-0.124939
-4.449881	"Inner	-0.124939
-4.449881	distinct	-0.124939
-4.449881	(a&b)&(c&d)	-0.124939
-4.449881	License,	-0.124939
-4.449881	core,	-0.124939
-4.449881	string,	-0.124939
-4.449881	0.82	-0.124939
-4.449881	heading	-0.124939
-4.449881	60.	-0.124939
-4.449881	Prefetching	-0.124939
-4.449881	vectorization,	-0.124939
-4.449881	position.	-0.124939
-4.449881	0.89	-0.124939
-4.449881	ahead.	-0.124939
-4.449881	15]	-0.124939
-4.449881	scanner	-0.124939
-4.449881	logic.	-0.124939
-4.449881	ifunc	-0.124939
-4.449881	/Gr	-0.124939
-4.449881	n∙(n-1)!.	-0.124939
-4.449881	Transposing	-0.124939
-4.449881	controlling	-0.124939
-4.449881	shared_ptr.	-0.124939
-4.449881	databases.	-0.124939
-4.449881	trying	-0.124939
-4.449881	multitasking	-0.124939
-4.449881	(27	-0.124939
-4.449881	(20	-0.124939
-4.449881	(total	-0.124939
-4.449881	8.5b	-0.124939
-4.449881	/GL	-0.124939
-4.449881	8.5a	-0.124939
-4.449881	it)	-0.124939
-4.449881	strategies	-0.124939
-4.449881	requested.	-0.124939
-4.449881	delayed	-0.124939
-4.449881	List[i]++;	-0.124939
-4.449881	you.	-0.124939
-4.449881	assumes	-0.124939
-4.449881	variable:	-0.124939
-4.449881	2003.	-0.124939
-4.449881	assumed	-0.124939
-4.449881	builder.	-0.124939
-4.449881	!	-0.124939
-4.449881	nowadays	-0.124939
-4.449881	Templates...............................................................................................................57	-0.124939
-4.449881	OneOrTwo5[b!=0]	-0.124939
-4.449881	maps	-0.124939
-4.449881	9.10,	-0.124939
-4.449881	steps.	-0.124939
-4.449881	Updating	-0.124939
-4.449881	-axAVX.	-0.124939
-4.449881	inverting	-0.124939
-4.449881	(int)(&list[100])	-0.124939
-4.449881	_mm_or_si128(c2,	-0.124939
-4.449881	ebx,eax	-0.124939
-4.449881	specialization.	-0.124939
-4.449881	specialization,	-0.124939
-4.449881	improved.	-0.124939
-4.449881	Fortran.	-0.124939
-4.449881	Meyers:	-0.124939
-4.449881	(a1*b2	-0.124939
-4.449881	evaluated,	-0.124939
-4.449881	OneOrTwo5[b!=0];	-0.124939
-4.449881	x(0)	-0.124939
-4.449881	vmldExp2	-0.124939
-4.449881	rounded	-0.124939
-4.449881	Truncation	-0.124939
-4.449881	1.2345;	-0.124939
-4.449881	duration	-0.124939
-4.449881	pointers:	-0.124939
-4.449881	_mm.	-0.124939
-4.449881	Wikibooks.	-0.124939
-4.449881	switches;	-0.124939
-4.449881	interleave	-0.124939
-4.449881	27	-0.124939
-4.449881	0.57	-0.124939
-4.449881	0);	-0.124939
-4.449881	Eclipse	-0.124939
-4.449881	real	-0.124939
-4.449881	B*x	-0.124939
-4.449881	2002).	-0.124939
-4.449881	"generate	-0.124939
-4.449881	true/false	-0.124939
-4.449881	detected	-0.124939
-4.449881	supercomputers	-0.124939
-4.449881	initializing	-0.124939
-4.449881	mirrored	-0.124939
-4.449881	matrix[FuncRow(i)][FuncCol(i)]	-0.124939
-4.449881	Implicit	-0.124939
-4.449881	underestimate	-0.124939
-4.449881	rule.	-0.124939
-4.449881	Installation	-0.124939
-4.449881	undocumented.	-0.124939
-4.449881	bitmap	-0.124939
-4.449881	0x7FFFFF)	-0.124939
-4.449881	2A	-0.124939
-4.449881	valuable	-0.124939
-4.449881	type-casting.	-0.124939
-4.449881	bases,	-0.124939
-4.449881	redirects	-0.124939
-4.449881	(SSE2):	-0.124939
-4.449881	severe	-0.124939
-4.449881	member.	-0.124939
-4.449881	vectors)	-0.124939
-4.449881	mentally	-0.124939
-4.449881	connect	-0.124939
-4.449881	bility	-0.124939
-4.449881	n+1;	-0.124939
-4.449881	Sutter:	-0.124939
-4.449881	primitive,	-0.124939
-4.449881	transposing	-0.124939
-4.449881	computer,	-0.124939
-4.449881	closest	-0.124939
-4.449881	print	-0.124939
-4.449881	evaluation	-0.124939
-4.449881	foreground	-0.124939
-4.449881	only).	-0.124939
-4.449881	obtain,	-0.124939
-4.449881	investigated	-0.124939
-4.449881	power,	-0.124939
-4.449881	(handle	-0.124939
-4.449881	11.1	-0.124939
-4.449881	Constructor	-0.124939
-4.449881	11.6	-0.124939
-4.449881	media	-0.124939
-4.449881	benefits	-0.124939
-4.449881	11.8	-0.124939
-4.449881	powN<(N	-0.124939
-4.449881	bias	-0.124939
-4.449881	utilized	-0.124939
-4.449881	MKL	-0.124939
-4.449881	263-1	-0.124939
-4.449881	F64vec2	-0.124939
-4.449881	language",	-0.124939
-4.449881	X.	-0.124939
-4.449881	(remove	-0.124939
-4.449881	X"	-0.124939
-4.449881	developing	-0.124939
-4.449881	aligned,	-0.124939
-4.449881	2.0/3.0	-0.124939
-4.449881	7.31b	-0.124939
-4.449881	7.31a	-0.124939
-4.449881	powN<true,N-N1>::p(x);	-0.124939
-4.449881	Language	-0.124939
-4.449881	noticeable.	-0.124939
-4.449881	gigabytes	-0.124939
-4.449881	>>=	-0.124939
-4.449881	103)	-0.124939
-4.449881	learning	-0.124939
-4.449881	7.43b.	-0.124939
-4.449881	links.	-0.124939
-4.449881	UnusedFiller;	-0.124939
-4.449881	50-50	-0.124939
-4.449881	know).	-0.124939
-4.449881	overview	-0.124939
-4.449881	supposed	-0.124939
-4.449881	14.4b	-0.124939
-4.449881	15.1a.	-0.124939
-4.449881	Mbytes.	-0.124939
-4.449881	interactive	-0.124939
-4.449881	version).	-0.124939
-4.449881	bitofn	-0.124939
-4.449881	happens.	-0.124939
-4.449881	error-prone.	-0.124939
-4.449881	article	-0.124939
-4.449881	entries.	-0.124939
-4.449881	risky.	-0.124939
-4.449881	built	-0.124939
-4.449881	Members	-0.124939
-4.449881	majority	-0.124939
-4.449881	build	-0.124939
-4.449881	(multithreaded)	-0.124939
-4.449881	int)(i	-0.124939
-4.449881	justifies	-0.124939
-4.449881	8.13a	-0.124939
-4.449881	8.13b	-0.124939
-4.449881	~,	-0.124939
-4.449881	wheel.	-0.124939
-4.449881	come.	-0.124939
-4.449881	(gcc	-0.124939
-4.449881	self-explaining	-0.124939
-4.449881	actively	-0.124939
-4.449881	2.5};	-0.124939
-4.449881	a.x,	-0.124939
-4.449881	weigh	-0.124939
-4.449881	resource-hungry	-0.124939
-4.449881	zero-bits	-0.124939
-4.449881	reorganized	-0.124939
-4.449881	(a&~b)|(~a&b)=a^b	-0.124939
-4.449881	__try	-0.124939
-4.449881	minimizing	-0.124939
-4.449881	relation	-0.124939
-4.449881	b[r][c];	-0.124939
-4.449881	enum,	-0.124939
-4.449881	8.21,	-0.124939
-4.449881	14.15b	-0.124939
-4.449881	fine	-0.124939
-4.449881	Usability	-0.124939
-4.449881	CPU-dispatching	-0.124939
-4.449881	double)	-0.124939
-4.449881	main,	-0.124939
-4.449881	truly	-0.124939
-4.449881	allocation,	-0.124939
-4.449881	seen,	-0.124939
-4.449881	(MFC).	-0.124939
-4.449881	sprintf,	-0.124939
-4.449881	double:	-0.124939
-4.449881	CriticalFunction,	-0.124939
-4.449881	reorganize:	-0.124939
-4.449881	convoluted	-0.124939
-4.449881	b[i];	-0.124939
-4.449881	.R.	-0.124939
-4.449881	restrictions.	-0.124939
-4.449881	copyrighted	-0.124939
-4.449881	network.	-0.124939
-4.449881	arraysize;	-0.124939
-4.449881	express	-0.124939
-4.449881	cheaper	-0.124939
-4.449881	re-allocation	-0.124939
-4.449881	de-referenced	-0.124939
-4.449881	Web	-0.124939
-4.449881	restart	-0.124939
-4.449881	(dynamically	-0.124939
-4.449881	(j	-0.124939
-4.449881	clearing	-0.124939
-4.449881	auto_ptr.	-0.124939
-4.449881	Non-strict	-0.124939
-4.449881	List[ArraySize];	-0.124939
-4.449881	experiments.	-0.124939
-4.449881	acceptable.	-0.124939
-4.449881	*(++p)	-0.124939
-4.449881	Vec4uq	-0.124939
-4.449881	Modulo	-0.124939
-4.449881	improvements).	-0.124939
-4.449881	vectorize,	-0.124939
-4.449881	doubles	-0.124939
-4.449881	so,	-0.124939
-4.449881	Compile-time	-0.124939
-4.449881	weekdays.	-0.124939
-4.449881	price	-0.124939
-4.449881	PSDK).	-0.124939
-4.449881	Noncached	-0.124939
-4.449881	printf(Greek[n]);	-0.124939
-4.449881	Unlike	-0.124939
-4.449881	interface,	-0.124939
-4.449881	(MKL	-0.124939
-4.449881	response.	-0.124939
-4.449881	MFC	-0.124939
-4.449881	supported");	-0.124939
-4.449881	misprediction,	-0.124939
-4.449881	loader.	-0.124939
-4.449881	(1./1.2345)	-0.124939
-4.449881	array[++i]	-0.124939
-4.449881	Func1(int	-0.124939
-4.449881	responses	-0.124939
-4.449881	offering	-0.124939
-4.449881	Programming	-0.124939
-4.449881	First-In-Last-	-0.124939
-4.449881	Inserting	-0.124939
-4.449881	(10000	-0.124939
-4.449881	cpuid	-0.124939
-4.449881	16.2.	-0.124939
-4.449881	Advice	-0.124939
-4.449881	a+a+a+a	-0.124939
-4.449881	PTR[ecx+eax*4],ebx	-0.124939
-4.449881	mode):	-0.124939
-4.449881	{1.0f,	-0.124939
-4.449881	kludgy	-0.124939
-4.449881	clauses:	-0.124939
-4.449881	throughout	-0.124939
-4.449881	x8	-0.124939
-4.449881	eliminates	-0.124939
-4.449881	straightforward.	-0.124939
-4.449881	create	-0.124939
-4.449881	non-const	-0.124939
-4.449881	dropping	-0.124939
-4.449881	Studio.	-0.124939
-4.449881	SSE4A	-0.124939
-4.449881	friend	-0.124939
-4.449881	inlining,	-0.124939
-4.449881	linking,	-0.124939
-4.449881	12.2,	-0.124939
-4.449881	unnecessarily	-0.124939
-4.449881	caught	-0.124939
-4.449881	structure),	-0.124939
-4.449881	checked	-0.124939
-4.449881	a[i+1];	-0.124939
-4.449881	(true)	-0.124939
-4.449881	107),	-0.124939
-4.449881	122)	-0.124939
-4.449881	62.	-0.124939
-4.449881	source,	-0.124939
-4.449881	96.	-0.124939
-4.449881	coded.	-0.124939
-4.449881	further.	-0.124939
-4.449881	Branch/loop	-0.124939
-4.449881	key?	-0.124939
-4.449881	sizeof(float));	-0.124939
-4.449881	x-xxx-x--	-0.124939
-4.449881	xx	-0.124939
-4.449881	key.	-0.124939
-4.449881	Menus,	-0.124939
-4.449881	conform	-0.124939
-4.449881	sizeof(float)).	-0.124939
-4.449881	password.	-0.124939
-4.449881	Linked	-0.124939
-4.449881	classes):	-0.124939
-4.449881	compilers.............................................................................	-0.124939
-4.449881	transition	-0.124939
-4.449881	(3	-0.124939
-4.449881	Copy	-0.124939
-4.449881	10.1.020.	-0.124939
-4.449881	TR18015	-0.124939
-4.449881	happening.	-0.124939
-4.449881	keys	-0.124939
-4.449881	assignment	-0.124939
-4.449881	Namespaces...........................................................................................................	-0.124939
-4.449881	alternatingly	-0.124939
-4.449881	e,	-0.124939
-4.449881	2005.	-0.124939
-4.449881	_mm_malloc	-0.124939
-4.449881	exit(),	-0.124939
-4.449881	sizeof(list));	-0.124939
-4.449881	hand-held	-0.124939
-4.449881	a;}	-0.124939
-4.449881	5.82	-0.124939
-4.449881	year	-0.124939
-4.449881	ex	-0.124939
-4.449881	1./720.,	-0.124939
-4.449881	throughputs	-0.124939
-4.449881	for(i=i_div_3=0;	-0.124939
-4.449881	meta-	-0.124939
-4.449881	CFALSE;	-0.124939
-4.449881	CFALSE:	-0.124939
-4.449881	Except	-0.124939
-4.449881	Include	-0.124939
-4.449881	entirely	-0.124939
-4.449881	(&ArraySize)	-0.124939
-4.449881	form.	-0.124939
-4.449881	cut	-0.124939
-4.449881	developed.	-0.124939
-4.449881	frequency,	-0.124939
-4.449881	lineage	-0.124939
-4.449881	nor	-0.124939
-4.449881	x*x	-0.124939
-4.449881	error-handling	-0.124939
-4.449881	correspondence	-0.124939
-4.449881	areas	-0.124939
-4.449881	occasionally	-0.124939
-4.449881	forms	-0.124939
-4.449881	reusability	-0.124939
-4.449881	141.	-0.124939
-4.449881	First-In-First-Out	-0.124939
-4.449881	old-fashioned.	-0.124939
-4.449881	n;}	-0.124939
-4.449881	u[1]	-0.124939
-4.449881	indication	-0.124939
-4.449881	x*8	-0.124939
-4.449881	complicated.	-0.124939
-4.449881	block,	-0.124939
-4.449881	ended	-0.124939
-4.449881	-msse2,	-0.124939
-4.449881	strongest	-0.124939
-4.449881	remember	-0.124939
-4.449881	Keep	-0.124939
-4.449881	Memory-hungry	-0.124939
-4.449881	sizeof	-0.124939
-4.449881	server	-0.124939
-4.449881	affects	-0.124939
-4.449881	decision	-0.124939
-4.449881	0.75	-0.124939
-4.449881	0.77	-0.124939
-4.449881	Nothing	-0.124939
-4.449881	<bool	-0.124939
-4.449881	staircase	-0.124939
-4.449881	?:	-0.124939
-4.449881	economy	-0.124939
-4.449881	1./8.71782E10,	-0.124939
-4.449881	ipow(x,10);	-0.124939
-4.449881	packing,	-0.124939
-4.449881	Contains	-0.124939
-4.449881	attribute	-0.124939
-4.449881	E-book	-0.124939
-4.449881	(int)d;	-0.124939
-4.449881	7.2.	-0.124939
-4.449881	s1,	-0.124939
-4.449881	_intel_fast_memcpy	-0.124939
-4.449881	graphic	-0.124939
-4.449881	vectors........................................................................	-0.124939
-4.449881	9.1a	-0.124939
-4.449881	9.1b	-0.124939
-4.449881	absvalue,	-0.124939
-4.449881	mathimf.h	-0.124939
-4.449881	GetTickCount	-0.124939
-4.449881	registers;	-0.124939
-4.449881	(.lib	-0.124939
-4.449881	2'nd	-0.124939
-4.449881	verifying,	-0.124939
-4.449881	lesson	-0.124939
-4.449881	forward	-0.124939
-4.449881	original,	-0.124939
-4.449881	translate	-0.124939
-4.449881	Occasionally,	-0.124939
-4.449881	improvements.	-0.124939
-4.449881	absvalue;	-0.124939
-4.449881	position-independent,	-0.124939
-4.449881	conditional	-0.124939
-4.449881	9.1.	-0.124939
-4.449881	frame,	-0.124939
-4.449881	frame"	-0.124939
-4.449881	int8_t	-0.124939
-4.449881	104).	-0.124939
-4.449881	Vec8f	-0.124939
-4.449881	Vec8i	-0.124939
-4.449881	x^3,	-0.124939
-4.449881	....................................................................................	-0.124939
-4.449881	1.5f	-0.124939
-4.449881	--combine	-0.124939
-4.449881	_mm256_i32gather_epi32	-0.124939
-4.449881	__declspec(noalias)	-0.124939
-4.449881	unequally	-0.124939
-4.449881	(b&c)	-0.124939
-4.449881	.exe	-0.124939
-4.449881	printf("Gamma");	-0.124939
-4.449881	polynomial:	-0.124939
-4.449881	commas.	-0.124939
-4.449881	popped	-0.124939
-4.449881	engineering	-0.124939
-4.449881	polynomial.	-0.124939
-4.449881	burdensome	-0.124939
-4.449881	trial	-0.124939
-4.449881	contiguous.	-0.124939
-4.449881	.................................................................	-0.124939
-4.449881	thinks	-0.124939
-4.449881	_mm_load_ps(coef+i);	-0.124939
-4.449881	maintenance.	-0.124939
-4.449881	downloaded	-0.124939
-4.449881	(*CriticalFunction)(parm1,	-0.124939
-4.449881	b1;	-0.124939
-4.449881	explaining	-0.124939
-4.449881	(ArraySize)	-0.124939
-4.449881	prepared	-0.124939
-4.449881	iterators	-0.124939
-4.449881	implies	-0.124939
-4.449881	200.	-0.124939
-4.449881	safety,	-0.124939
-4.449881	(a|b)&(a|c)	-0.124939
-4.449881	2006	-0.124939
-4.449881	2007	-0.124939
-4.449881	2004	-0.124939
-4.449881	53).	-0.124939
-4.449881	ready-made	-0.124939
-4.449881	Detect	-0.124939
-4.449881	GetPrivateProfileString	-0.124939
-4.449881	vector::reserve	-0.124939
-4.449881	CParent<CChild2>	-0.124939
-4.449881	storage.............................................................................	-0.124939
-4.449881	query	-0.124939
-4.449881	7.33b	-0.124939
-4.449881	delete).	-0.124939
-4.449881	compose	-0.124939
-4.449881	prototype:	-0.124939
-4.449881	fragmentation.	-0.124939
-4.449881	OK,	-0.124939
-4.449881	sorting,	-0.124939
-4.449881	y2;	-0.124939
-4.449881	fatal	-0.124939
-4.449881	scarcity	-0.124939
-4.449881	Rick	-0.124939
-4.449881	MOVNTI	-0.124939
-4.449881	-100+100+100	-0.124939
-4.449881	classes............................................................................................	-0.124939
-4.449881	initializer	-0.124939
-4.449881	(www.intel.com).	-0.124939
-4.449881	1./362880.,	-0.124939
-4.449881	non-zero,	-0.124939
-4.449881	initializes	-0.124939
-4.449881	g(x)	-0.124939
-4.449881	discover	-0.124939
-4.449881	-2.0	-0.124939
-4.449881	93).	-0.124939
-4.449881	(release	-0.124939
-4.449881	publicly	-0.124939
-4.449881	optimized,	-0.124939
-4.449881	discrete	-0.124939
-4.449881	-opt-report	-0.124939
-4.449881	unsatisfied	-0.124939
-4.449881	safe,	-0.124939
-4.449881	entries	-0.124939
-4.449881	other,	-0.124939
-4.449881	seemingly	-0.124939
-4.449881	imported	-0.124939
-4.449881	standardized.	-0.124939
-4.449881	compensate	-0.124939
-4.449881	maintain	-0.124939
-4.449881	sin.	-0.124939
-4.449881	sin,	-0.124939
-4.449881	rightmost	-0.124939
-4.449881	...............................................................................	-0.124939
-4.449881	insertion	-0.124939
-4.449881	object:	-0.124939
-4.449881	instead.	-0.124939
-4.449881	saved.	-0.124939
-4.449881	_mm_andnot_si128(mask,	-0.124939
-4.449881	8.11b	-0.124939
-4.449881	8.11a	-0.124939
-4.449881	(Integrated	-0.124939
-4.449881	opposite).	-0.124939
-4.449881	"AMD64	-0.124939
-4.449881	0x20,	-0.124939
-4.449881	DTRUE:	-0.124939
-4.449881	DTRUE;	-0.124939
-4.449881	exchange	-0.124939
-4.449881	supposedly	-0.124939
-4.449881	digits.	-0.124939
-4.449881	44.	-0.124939
-4.449881	digits,	-0.124939
-4.449881	www.agner.org/optimize/#vectorclass	-0.124939
-4.449881	(absvalue	-0.124939
-4.449881	8.23b	-0.124939
-4.449881	defined(__GNUC__)	-0.124939
-4.449881	excuse	-0.124939
-4.449881	SelectAddMul_SSE2	-0.124939
-4.449881	system-specific.	-0.124939
-4.449881	Predictable	-0.124939
-4.449881	Later	-0.124939
-4.449881	www.agner.org/	-0.124939
-4.449881	factors.	-0.124939
-4.449881	operators).	-0.124939
-4.449881	(b&&c)	-0.124939
-4.449881	35	-0.124939
-4.449881	34	-0.124939
-4.449881	'?',	-0.124939
-4.449881	-2.0,	-0.124939
-4.449881	int)a	-0.124939
-4.449881	holding	-0.124939
-4.449881	module,	-0.124939
-4.449881	Running	-0.124939
-4.449881	hint,	-0.124939
-4.449881	system-specific	-0.124939
-4.449881	instrset_detect();	-0.124939
-4.449881	remotely.	-0.124939
-4.449881	distributions	-0.124939
-4.449881	Loops:	-0.124939
-4.449881	"we	-0.124939
-4.449881	little-known	-0.124939
-4.449881	Hoisie:	-0.124939
-4.449881	InstructionSet():	-0.124939
-4.449881	decryption,	-0.124939
-4.449881	crystal	-0.124939
-4.449881	resolution.	-0.124939
-4.449881	absent	-0.124939
-4.449881	Similarly,	-0.124939
-4.449881	name,	-0.124939
-4.449881	(approximately):	-0.124939
-4.449881	reset	-0.124939
-4.449881	disassembly,	-0.124939
-4.449881	ordering?	-0.124939
-4.449881	arranging	-0.124939
-4.449881	hints	-0.124939
-4.449881	functionality.	-0.124939
-4.449881	decoded	-0.124939
-4.449881	a<<b<<c=a<<(b+c)	-0.124939
-4.449881	__intel_cpu_feature_indicator	-0.124939
-4.449881	Vol.	-0.124939
-4.449881	unchanged.	-0.124939
-4.449881	unchanged,	-0.124939
-4.449881	tag	-0.124939
-4.449881	Or,	-0.124939
-4.449881	Combining	-0.124939
-4.449881	fetch	-0.124939
-4.449881	(c1	-0.124939
-4.449881	reserved	-0.124939
-4.449881	balanced	-0.124939
-4.449881	convenient.	-0.124939
-4.449881	(when	-0.124939
-4.449881	effectively	-0.124939
-4.449881	specification.	-0.124939
-4.449881	high.	-0.124939
-4.449881	native	-0.124939
-4.449881	rendering	-0.124939
-4.449881	First-In-First-	-0.124939
-4.449881	line:	-0.124939
-4.449881	dilemma	-0.124939
-4.449881	(b*c)/d,	-0.124939
-4.449881	propagate	-0.124939
-4.449881	varies	-0.124939
-4.449881	line,	-0.124939
-4.449881	framework...........................................................................	-0.124939
-4.449881	floats:	-0.124939
-4.449881	timediff[NumberOfTests];	-0.124939
-4.449881	floats.	-0.124939
-4.449881	22).	-0.124939
-4.449881	Place	-0.124939
-4.449881	abc;	-0.124939
-4.449881	ambiguous	-0.124939
-4.449881	models.	-0.124939
-4.449881	correspond	-0.124939
-4.449881	_mm_permutevar_ps	-0.124939
-4.449881	7.38b.	-0.124939
-4.449881	Y;	-0.124939
-4.449881	BigArray[1024];	-0.124939
-4.449881	3B.	-0.124939
-4.449881	steps	-0.124939
-4.449881	looping	-0.124939
-4.449881	Func1(2);	-0.124939
-4.449881	flawed	-0.124939
-4.449881	about.	-0.124939
-4.449881	pop-up	-0.124939
-4.449881	signifying	-0.124939
-4.449881	renaming.	-0.124939
-4.449881	manually,	-0.124939
-4.449881	<malloc.h>	-0.124939
-4.449881	spell	-0.124939
-4.449881	sequential,	-0.124939
-4.449881	&list[0];	-0.124939
-4.449881	(Not	-0.124939
-4.449881	day	-0.124939
-4.449881	complications.	-0.124939
-4.449881	least,	-0.124939
-4.449881	VTune	-0.124939
-4.449881	__vrs4_expf	-0.124939
-4.449881	x-xx----x	-0.124939
-4.449881	identifies	-0.124939
-4.449881	--------x	-0.124939
-4.449881	a|(b&c)	-0.124939
-4.449881	double's	-0.124939
-4.449881	lists.	-0.124939
-4.449881	lists,	-0.124939
-4.449881	programmed	-0.124939
-4.449881	most.	-0.124939
-4.449881	Print	-0.124939
-4.449881	designers	-0.124939
-4.449881	Try	-0.124939
-4.449881	contiguously	-0.124939
-4.449881	8.1b	-0.124939
-4.449881	x,y	-0.124939
-4.449881	8.1a	-0.124939
-4.449881	-fno-strict-overflow.	-0.124939
-4.449881	usable	-0.124939
-4.449881	Reset	-0.124939
-4.449881	likelihood	-0.124939
-4.449881	fixed-size	-0.124939
-4.449881	dependent.	-0.124939
-4.449881	right-most	-0.124939
-4.449881	---------	-0.124939
-4.449881	143.	-0.124939
-4.449881	||).	-0.124939
-4.449881	facilitate	-0.124939
-4.449881	2007.	-0.124939
-4.449881	12.9b.	-0.124939
-4.449881	esp+8	-0.124939
-4.449881	......................................	-0.124939
-4.449881	economy,	-0.124939
-4.449881	lately.	-0.124939
-4.449881	structures:	-0.124939
-4.449881	row-wise,	-0.124939
-4.449881	lookup-table	-0.124939
-4.449881	reached	-0.124939
-4.449881	8.16	-0.124939
-4.449881	__rdtsc()).	-0.124939
-4.449881	(GOT).	-0.124939
-4.449881	8.17	-0.124939
-4.449881	8.18	-0.124939
-4.449881	Sequential	-0.124939
-4.449881	may,	-0.124939
-4.449881	multithreading.	-0.124939
-4.449881	7.43	-0.124939
-4.449881	7.42	-0.124939
-4.449881	renewed.	-0.124939
-4.449881	7.45	-0.124939
-4.449881	7.44	-0.124939
-4.449881	7.4.	-0.124939
-4.449881	two(2,2,2,2,2,2,2,2);	-0.124939
-4.449881	while-loop	-0.124939
-4.449881	Compiled	-0.124939
-4.449881	1/n!	-0.124939
-4.449881	378.7	-0.124939
-4.449881	counting	-0.124939
-4.449881	"\nError:	-0.124939
-4.449881	neutralize	-0.124939
-4.449881	x.i	-0.124939
-4.449881	38.1	-0.124939
-4.449881	38.7	-0.124939
-4.449881	"__attribute__((visibility("hidden")))".	-0.124939
-4.449881	animations	-0.124939
-4.449881	Manual".	-0.124939
-4.449881	demonstration	-0.124939
-4.449881	cards,	-0.124939
-4.449881	1./6.22702E9,	-0.124939
-4.449881	hand	-0.124939
-4.449881	caller,	-0.124939
-4.449881	32,	-0.124939
-4.449881	provokes	-0.124939
-4.449881	operand.	-0.124939
-4.449881	memory-hungry	-0.124939
-4.449881	provoked	-0.124939
-4.449881	32;	-0.124939
-4.449881	WhateverFunction(i);	-0.124939
-4.449881	unavoidable.	-0.124939
-4.449881	apparently	-0.124939
-4.449881	DLL.	-0.124939
-4.449881	_mm_stream_ps	-0.124939
-4.449881	perform	-0.124939
-4.449881	mark_end;	-0.124939
-4.449881	96).	-0.124939
-4.449881	x^1,	-0.124939
-4.449881	-ftrapv,	-0.124939
-4.449881	libircmt.lib.	-0.124939
-4.449881	1.2345);	-0.124939
-4.449881	repetitive.	-0.124939
-4.449881	(n)	-0.124939
-4.449881	consumers.	-0.124939
-4.449881	request	-0.124939
-4.449881	N:	-0.124939
-4.449881	"Alpha",	-0.124939
-4.449881	(be	-0.124939
-4.449881	artificially	-0.124939
-4.449881	animation.	-0.124939
-4.449881	cons	-0.124939
-4.449881	tolerance.	-0.124939
-4.449881	0x3F00	-0.124939
-4.449881	17is	-0.124939
-4.449881	reputation.	-0.124939
-4.449881	Iu8vec8	-0.124939
-4.449881	(/Oa).	-0.124939
-4.449881	saturated	-0.124939
-4.449881	offsets).	-0.124939
-4.449881	$B1$2:	-0.124939
-4.449881	knowledge	-0.124939
-4.449881	list[i].b	-0.124939
-4.449881	stub.	-0.124939
-4.449881	Espresso)	-0.124939
-4.449881	Guide	-0.124939
-4.449881	"Integrated	-0.124939
-4.449881	T+6,	-0.124939
-4.449881	examples:	-0.124939
-4.449881	bear	-0.124939
-4.449881	log(2.0)	-0.124939
-4.449881	andnot(a,a)	-0.124939
-4.449881	12.8a.	-0.124939
-4.449881	0x7FFFFFFF;	-0.124939
-4.449881	-mavx,	-0.124939
-4.449881	"__attribute__((visibility	-0.124939
-4.449881	2:8+esp	-0.124939
-4.449881	c2,	-0.124939
-4.449881	anything,	-0.124939
-4.449881	pulses	-0.124939
-4.449881	disappears	-0.124939
-4.449881	requests	-0.124939
-4.449881	reliable.	-0.124939
-4.449881	Typically	-0.124939
-4.449881	constructing	-0.124939
-4.449881	Non-public	-0.124939
-4.449881	GB,	-0.124939
-4.449881	writable	-0.124939
-4.449881	c2;	-0.124939
-4.449881	inttypes.h	-0.124939
-4.449881	attempting	-0.124939
-4.449881	Debugging.	-0.124939
-4.449881	indexes,	-0.124939
-4.449881	--xxxxxx-	-0.124939
-4.449881	preprocessor	-0.124939
-4.449881	enters	-0.124939
-4.449881	July	-0.124939
-4.449881	lookup[2]	-0.124939
-4.449881	Surprisingly,	-0.124939
-4.449881	because,	-0.124939
-4.449881	jeopardizing	-0.124939
-4.449881	draws	-0.124939
-4.449881	UNIX	-0.124939
-4.449881	_mm256_permutevar_ps	-0.124939
-4.449881	'$'	-0.124939
-4.449881	exist.	-0.124939
-4.449881	9.3.	-0.124939
-4.449881	freely	-0.124939
-4.449881	taking	-0.124939
-4.449881	values:	-0.124939
-4.449881	zation	-0.124939
-4.449881	(4096).	-0.124939
-4.449881	process......................................................................................................	-0.124939
-4.449881	values,	-0.124939
-4.449881	maximum,	-0.124939
-4.449881	discussions	-0.124939
-4.449881	#define,	-0.124939
-4.449881	_fpreset();	-0.124939
-4.449881	observed	-0.124939
-4.449881	15.1d	-0.124939
-4.449881	package,	-0.124939
-4.449881	piecewise	-0.124939
-4.449881	division:	-0.124939
-4.449881	unused.	-0.124939
-4.449881	ab[size];	-0.124939
-4.449881	steal	-0.124939
-4.449881	array[i++]	-0.124939
-4.449881	FPGAs.	-0.124939
-4.449881	x^n/n!	-0.124939
-4.449881	access................................................................................................................	-0.124939
-4.449881	OMF	-0.124939
-4.449881	nicely	-0.124939
-4.449881	finishes	-0.124939
-4.449881	keyword:	-0.124939
-4.449881	keyword,	-0.124939
-4.449881	2GHz	-0.124939
-4.449881	cryptography	-0.124939
-4.449881	Professional	-0.124939
-4.449881	keyword.	-0.124939
-4.449881	strcpy,	-0.124939
-4.449881	7.35b	-0.124939
-4.449881	7.35a	-0.124939
-4.449881	plain	-0.124939
-4.449881	_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);	-0.124939
-4.449881	_mm_stream_si32	-0.124939
-4.449881	GB.	-0.124939
-4.449881	advisable	-0.124939
-4.449881	mentations	-0.124939
-4.449881	_mm_i32gather_ps	-0.124939
-4.449881	100000001.23456.	-0.124939
-4.449881	9.6b	-0.124939
-4.449881	processors).	-0.124939
-4.449881	weighed	-0.124939
-4.449881	/arch:SSE2.	-0.124939
-4.449881	2009).	-0.124939
-4.449881	UnusedFiller	-0.124939
-4.449881	flaws:	-0.124939
-4.449881	Booleans...................................................................................................................	-0.124939
-4.449881	notion	-0.124939
-4.449881	Atom).	-0.124939
-4.449881	kbytes.	-0.124939
-4.449881	Intel)	-0.124939
-4.449881	comparison:	-0.124939
-4.449881	Intel.	-0.124939
-4.449881	FreeBSD	-0.124939
-4.449881	somewhat.	-0.124939
-4.449881	set:	-0.124939
-4.449881	SelectAddMul_dispatch(short	-0.124939
-4.449881	<intrin.h>	-0.124939
-4.449881	reduced.	-0.124939
-4.449881	Avoiding	-0.124939
-4.449881	reporting	-0.124939
-4.449881	profiling,	-0.124939
-4.449881	profiling.	-0.124939
-4.449881	numbered	-0.124939
-4.449881	phase	-0.124939
-4.449881	(b1	-0.124939
-4.449881	builder	-0.124939
-4.449881	Uncached	-0.124939
-4.449881	responsi-	-0.124939
-4.449881	thought	-0.124939
-4.449881	(SVML).	-0.124939
-4.449881	xx(-)x-	-0.124939
-4.449881	8.23a.	-0.124939
-4.449881	indexed	-0.124939
-4.449881	event-counters	-0.124939
-4.449881	often.	-0.124939
-4.449881	Microcontrollers	-0.124939
-4.449881	genuine	-0.124939
-4.449881	calculate.	-0.124939
-4.449881	modularity,	-0.124939
-4.449881	modularity.	-0.124939
-4.449881	localize	-0.124939
-4.449881	optimize,	-0.124939
-4.449881	Instead	-0.124939
-4.449881	Repeating	-0.124939
-4.449881	Fine-grained	-0.124939
-4.449881	script.	-0.124939
-4.449881	28)	-0.124939
-4.449881	work,	-0.124939
-4.449881	28,	-0.124939
-4.449881	calculated.	-0.124939
-4.449881	"FDIV	-0.124939
-4.449881	3.1,	-0.124939
-4.449881	slower,	-0.124939
-4.449881	reciprocal:	-0.124939
-4.449881	__attribute__((fastcall)).	-0.124939
-4.449881	cc[size]	-0.124939
-4.449881	fistpl	-0.124939
-4.449881	VML	-0.124939
-4.449881	Bridge)	-0.124939
-4.449881	mmintrin.h	-0.124939
-4.449881	long,	-0.124939
-4.449881	wastes	-0.124939
-4.449881	inexact	-0.124939
-4.449881	odd-sized	-0.124939
-4.449881	1./4.790016E8,	-0.124939
-4.449881	thread-like	-0.124939
-4.449881	T+5,	-0.124939
-4.449881	Itanium	-0.124939
-4.449881	Relocation.	-0.124939
-4.449881	brands,	-0.124939
-4.449881	2056	-0.124939
-4.449881	NUMROWS;	-0.124939
-4.449881	Lists	-0.124939
-4.449881	module2.cpp	-0.124939
-4.449881	12.8b.	-0.124939
-4.449881	0.29	-0.124939
-4.449881	14.18c	-0.124939
-4.449881	xmmintrin.h	-0.124939
-4.449881	blocking:	-0.124939
-4.449881	0.27	-0.124939
-4.449881	0.22	-0.124939
-4.449881	((C	-0.124939
-4.449881	((B	-0.124939
-4.449881	Func(a[i]);	-0.124939
-4.449881	creates	-0.124939
-4.449881	work-around	-0.124939
-4.449881	results,	-0.124939
-4.449881	log)	-0.124939
-4.449881	a*b*c=a*(b*c)	-0.124939
-4.449881	non-vector	-0.124939
-4.449881	recycled?	-0.124939
-4.449881	carry)	-0.124939
-4.449881	OS.	-0.124939
-4.449881	FactorialTable	-0.124939
-4.449881	timediff[i]);	-0.124939
-4.449881	can.	-0.124939
-4.449881	JNZ).	-0.124939
-4.449881	chain,	-0.124939
-4.449881	to)	-0.124939
-4.449881	!(a<b)=(a>=b)	-0.124939
-4.449881	134.	-0.124939
-4.449881	shows,	-0.124939
-4.449881	feeds	-0.124939
-4.449881	1.0)	-0.124939
-4.449881	improvements	-0.124939
-4.449881	introduced	-0.124939
-4.449881	Hence,	-0.124939
-4.449881	x86)	-0.124939
-4.449881	8.2a	-0.124939
-4.449881	8.2b	-0.124939
-4.449881	alternately	-0.124939
-4.449881	routines	-0.124939
-4.449881	billions	-0.124939
-4.449881	1.09	-0.124939
-4.449881	x4∙xn-4.	-0.124939
-4.449881	103),	-0.124939
-4.449881	1980	-0.124939
-4.449881	14.7b,	-0.124939
-4.449881	14.7b.	-0.124939
-4.449881	Ignoring	-0.124939
-4.449881	affected	-0.124939
-4.449881	x.d	-0.124939
-4.449881	x.f	-0.124939
-4.449881	7.9b	-0.124939
-4.449881	7.9a	-0.124939
-4.449881	identical.	-0.124939
-4.449881	somewhere	-0.124939
-4.449881	!b	-0.124939
-4.449881	bus	-0.124939
-4.449881	(FIFO)	-0.124939
-4.449881	practice,	-0.124939
-4.449881	8.24	-0.124939
-4.449881	8.25	-0.124939
-4.449881	disassembler.	-0.124939
-4.449881	&&,	-0.124939
-4.449881	8.20	-0.124939
-4.449881	(critical	-0.124939
-4.449881	8.22	-0.124939
-4.449881	smart.	-0.124939
-4.449881	12.9a.	-0.124939
-4.449881	SSE2,	-0.124939
-4.449881	r.b;}	-0.124939
-4.449881	Multiplying	-0.124939
-4.449881	post-increment	-0.124939
-4.449881	bugs,	-0.124939
-4.449881	C1::f.	-0.124939
-4.449881	F32vec8	-0.124939
-4.449881	Will	-0.124939
-4.449881	overkill.	-0.124939
-4.449881	8.3b	-0.124939
-4.449881	ordinary	-0.124939
-4.449881	paying	-0.124939
-4.449881	Journal	-0.124939
-4.449881	-1.0E8,	-0.124939
-4.449881	imprecision	-0.124939
-4.449881	provide	-0.124939
-4.449881	in-between	-0.124939
-4.449881	__intel_cpu_feature_indicator_x.	-0.124939
-4.449881	recovery	-0.124939
-4.449881	(WTL).	-0.124939
-4.449881	2.8.	-0.124939
-4.449881	indicates	-0.124939
-4.449881	46	-0.124939
-4.449881	(WTL):	-0.124939
-4.449881	44	-0.124939
-4.449881	decision.	-0.124939
-4.449881	42	-0.124939
-4.449881	indicated	-0.124939
-4.449881	workstations	-0.124939
-4.449881	41	-0.124939
-4.449881	230.7	-0.124939
-4.449881	faster,	-0.124939
-4.449881	(cc[i]	-0.124939
-4.449881	85	-0.124939
-4.449881	/Qipo	-0.124939
-4.449881	indeed.	-0.124939
-4.449881	a[1]	-0.124939
-4.449881	6);	-0.124939
-4.449881	hackers.	-0.124939
-4.449881	exact.	-0.124939
-4.449881	(row	-0.124939
-4.449881	friendly.	-0.124939
-4.449881	predictable,	-0.124939
-4.449881	(OnIdle	-0.124939
-4.449881	abusing	-0.124939
-4.449881	Leaf	-0.124939
-4.449881	"Delta"	-0.124939
-4.449881	0x10,	-0.124939
-4.449881	................................................................................................................	-0.124939
-4.449881	C2::Disp()	-0.124939
-4.449881	sleep	-0.124939
-4.449881	(SIMD)	-0.124939
-4.449881	if-else	-0.124939
-4.449881	();	-0.124939
-4.449881	feeding	-0.124939
-4.449881	----x---x	-0.124939
-4.449881	violate	-0.124939
-4.449881	34.	-0.124939
-4.449881	1.19	-0.124939
-4.449881	(a&&b)||(a&&!b)=a	-0.124939
-4.449881	mathe-	-0.124939
-4.449881	VHDL	-0.124939
-4.449881	postponed	-0.124939
-4.449881	rise	-0.124939
-4.449881	u[2]}	-0.124939
-4.449881	browsing	-0.124939
-4.449881	counter:	-0.124939
-4.449881	0.3,	-0.124939
-4.449881	Henry	-0.124939
-4.449881	encounter	-0.124939
-4.449881	Bit-fields	-0.124939
-4.449881	unacceptable.	-0.124939
-4.449881	live-ranges	-0.124939
-4.449881	0.30	-0.124939
-4.449881	Serialize	-0.124939
-4.449881	Reference	-0.124939
-4.449881	consuming,	-0.124939
-4.449881	problems,	-0.124939
-4.449881	(GOT)	-0.124939
-4.449881	0.38	-0.124939
-4.449881	----x----	-0.124939
-4.449881	created,	-0.124939
-4.449881	transpose(matrix);	-0.124939
-4.449881	contention.	-0.124939
-4.449881	powN<(N1&(N1-1))==0,N1>::p(x)	-0.124939
-4.449881	(r1	-0.124939
-4.449881	wrapping	-0.124939
-4.449881	omitted,	-0.124939
-4.449881	p->a	-0.124939
-4.449881	newer.	-0.124939
-4.449881	consisting	-0.124939
-4.449881	estimated	-0.124939
-4.449881	Underestimating	-0.124939
-4.449881	edx.	-0.124939
-4.449881	Z.	-0.124939
-4.449881	Architectures	-0.124939
-4.449881	hide	-0.124939
-4.449881	specification	-0.124939
-4.449881	decoding	-0.124939
-4.449881	abort(),	-0.124939
-4.449881	others.	-0.124939
-4.449881	Z;	-0.124939
-4.449881	memory.................................................................	-0.124939
-4.449881	considered.	-0.124939
-4.449881	general.	-0.124939
-4.449881	7.38a.	-0.124939
-4.449881	hundreds	-0.124939
-4.449881	Func2(double	-0.124939
-4.449881	studio	-0.124939
-4.449881	reinvent	-0.124939
-4.449881	yesterday's	-0.124939
-4.449881	4.5.2,	-0.124939
-4.449881	NEAR	-0.124939
-4.449881	CodeGear,	-0.124939
-4.449881	http://www.agner.org/optimize/asmlib.zip	-0.124939
-4.449881	compiler)	-0.124939
-4.449881	2040	-0.124939
-4.449881	7.43a.	-0.124939
-4.449881	recursive	-0.124939
-4.449881	Trying	-0.124939
-4.449881	29.	-0.124939
-4.449881	returning.	-0.124939
-4.449881	sizeof(b));	-0.124939
-4.449881	scheduler.	-0.124939
-4.449881	summarizes	-0.124939
-4.449881	(b*c)	-0.124939
-4.449881	prints	-0.124939
-4.449881	optimization",	-0.124939
-4.449881	(2.0f);	-0.124939
-4.449881	list[i	-0.124939
-4.449881	pow(x,N)	-0.124939
-4.449881	[eax+400]	-0.124939
-4.449881	-156.	-0.124939
-4.449881	speeded	-0.124939
-4.449881	y.d	-0.124939
-4.449881	used:	-0.124939
-4.449881	m.	-0.124939
-4.449881	m)	-0.124939
-4.449881	y.a	-0.124939
-4.449881	y.b	-0.124939
-4.449881	y.c	-0.124939
-4.449881	Coriolis	-0.124939
-4.449881	...).	-0.124939
-4.449881	m>	-0.124939
-4.449881	...))	-0.124939
-4.449881	oldest	-0.124939
-4.449881	correspondingly	-0.124939
-4.449881	excellent	-0.124939
-4.449881	(float)i;	-0.124939
-4.449881	grandparent	-0.124939
-4.449881	Vec4q	-0.124939
-4.449881	ebx,1	-0.124939
-4.449881	(u.i[1]	-0.124939
-4.449881	simplicity.	-0.124939
-4.449881	Actually,	-0.124939
-4.449881	geometry	-0.124939
-4.449881	Vec4i	-0.124939
-4.449881	saves	-0.124939
-4.449881	aliased	-0.124939
-4.449881	Vec4d	-0.124939
-4.449881	(MS	-0.124939
-4.449881	scheduling	-0.124939
-4.449881	PowerPC).	-0.124939
-4.449881	guarantee	-0.124939
-4.449881	Virtualization	-0.124939
-4.449881	algorithm.	-0.124939
-4.449881	compete	-0.124939
-4.449881	searches	-0.124939
-4.449881	both,	-0.124939
-4.449881	writes.	-0.124939
-4.449881	p->member	-0.124939
-4.449881	confirmed	-0.124939
-4.449881	hence	-0.124939
-4.449881	EXCEPTION_CONTINUE_SEARCH)	-0.124939
-4.449881	14.21.	-0.124939
-4.449881	versatile.	-0.124939
-4.449881	systematization	-0.124939
-4.449881	footprint.	-0.124939
-4.449881	memcpy:	-0.124939
-4.449881	summing	-0.124939
-4.449881	ports,	-0.124939
-4.449881	Assume,	-0.124939
-4.449881	ultimate	-0.124939
-4.449881	organized.	-0.124939
-4.449881	parsing	-0.124939
-4.449881	©	-0.124939
-4.449881	official	-0.124939
-4.449881	fragmentation	-0.124939
-4.449881	construction	-0.124939
-4.449881	modified,	-0.124939
-4.449881	40%	-0.124939
-4.449881	9.0	-0.124939
-4.449881	suppress.	-0.124939
-4.449881	reflected,	-0.124939
-4.449881	137,	-0.124939
-4.449881	terminates	-0.124939
-4.449881	26).	-0.124939
-4.449881	perfectly.	-0.124939
-4.449881	16.4	-0.124939
-4.449881	-fwhole-	-0.124939
-4.449881	terminated	-0.124939
-4.449881	.	-0.124939
-4.449881	backwards.	-0.124939
-4.449881	"Gamma",	-0.124939
-4.449881	often,	-0.124939
-4.449881	history,	-0.124939
-4.449881	c1()	-0.124939
-4.449881	integer-to-float	-0.124939
-4.449881	arraysize	-0.124939
-4.449881	Vec16us	-0.124939
-4.449881	---xxx-x-	-0.124939
-4.449881	subexpressions,	-0.124939
-4.449881	unreferenced	-0.124939
-4.449881	non-recursing	-0.124939
-4.449881	puts	-0.124939
-4.449881	15.0)	-0.124939
-4.449881	identifier	-0.124939
-4.449881	gained	-0.124939
-4.449881	stage	-0.124939
-4.449881	i++,i2+=2.0f)a[i]=i2;	-0.124939
-4.449881	(byte	-0.124939
-4.449881	asmlib..	-0.124939
-4.449881	combining	-0.124939
-4.449881	scope.	-0.124939
-4.449881	selecting	-0.124939
-4.449881	.............................................................	-0.124939
-4.449881	connections,	-0.124939
-4.449881	(temp	-0.124939
-4.449881	added.	-0.124939
-4.449881	Namespaces	-0.124939
-4.449881	merge	-0.124939
-4.449881	142).	-0.124939
-4.449881	flush	-0.124939
-4.449881	ASP	-0.124939
-4.449881	documented.	-0.124939
-4.449881	standards.	-0.124939
-4.449881	pointers.......................................................................................................37	-0.124939
-4.449881	correctness	-0.124939
-4.449881	(rather	-0.124939
-4.449881	(zero	-0.124939
-4.449881	CriticalFunction_Dispatch(int	-0.124939
-4.449881	BTB	-0.124939
-4.449881	profile-guided	-0.124939
-4.449881	PROC	-0.124939
-4.449881	0x3FF	-0.124939
-4.449881	7.32a	-0.124939
-4.449881	(*p	-0.124939
-4.449881	issues,	-0.124939
-4.449881	coding	-0.124939
-4.449881	F2(float	-0.124939
-4.449881	options.......................................................................................	-0.124939
-4.449881	0x3FFF	-0.124939
-4.449881	$B1$3:	-0.124939
-4.449881	0.0;	-0.124939
-4.449881	rewritten	-0.124939
-4.449881	mutexes	-0.124939
-4.449881	_mm256_i64gather_epi32	-0.124939
-4.449881	attack	-0.124939
-4.449881	clock.	-0.124939
-4.449881	exp,	-0.124939
-4.449881	rights.	-0.124939
-4.449881	fluctuating	-0.124939
-4.449881	combine	-0.124939
-4.449881	b<c	-0.124939
-4.449881	incur	-0.124939
-4.449881	included.	-0.124939
-4.449881	compiler-specific.	-0.124939
-4.449881	security,	-0.124939
-4.449881	reinterpret_cast	-0.124939
-4.449881	CPU-dispatcher	-0.124939
-4.449881	1.5f;	-0.124939
-4.449881	Sum1,	-0.124939
-4.449881	(*.ini	-0.124939
-4.449881	returned.	-0.124939
-4.449881	1.25	-0.124939
-4.449881	1.21	-0.124939
-4.449881	(also	-0.124939
-4.449881	mechanisms.	-0.124939
-4.449881	Warren,	-0.124939
-4.449881	matrix[i][j]	-0.124939
-4.449881	pass	-0.124939
-4.449881	mechanisms,	-0.124939
-4.449881	Constantfolding	-0.124939
-4.449881	&CriticalFunction_Dispatch;	-0.124939
-4.449881	traditionally	-0.124939
-4.449881	Database	-0.124939
-4.449881	upon	-0.124939
-4.449881	isolated	-0.124939
-4.449881	thorough	-0.124939
-4.449881	EXCEPTION_EXECUTE_HANDLER	-0.124939
-4.449881	isolates	-0.124939
-4.449881	ball	-0.124939
-4.449881	intrin.h	-0.124939
-4.449881	Convert	-0.124939
-4.449881	pow,	-0.124939
-4.449881	denominator	-0.124939
-4.449881	int)(max	-0.124939
-4.449881	pattern,	-0.124939
-4.449881	of.	-0.124939
-4.449881	Float	-0.124939
-4.449881	Sort	-0.124939
-4.449881	excessively	-0.124939
-4.449881	inlined,	-0.124939
-4.449881	1024/4	-0.124939
-4.449881	max)	-0.124939
-4.449881	/QaxAVX	-0.124939
-4.449881	delaying	-0.124939
-4.449881	/Ox	-0.124939
-4.449881	/Oy	-0.124939
-4.449881	reorganize	-0.124939
-4.449881	intranet	-0.124939
-4.449881	v.i	-0.124939
-4.449881	(2,2,2,2),	-0.124939
-4.449881	directly:	-0.124939
-4.449881	INVALID_HANDLE_VALUE	-0.124939
-4.449881	(Embarcadero/CodeGear/Borland	-0.124939
-4.449881	/Oa	-0.124939
-4.449881	/Og	-0.124939
-4.449881	54.	-0.124939
-4.449881	micro-operation	-0.124939
-4.449881	||,	-0.124939
-4.449881	Bit	-0.124939
-4.449881	inputs.	-0.124939
-4.449881	infinity,	-0.124939
-4.449881	infinity.	-0.124939
-4.449881	utilizing	-0.124939
-4.449881	solution,	-0.124939
-4.449881	Larger	-0.124939
-4.449881	criticized	-0.124939
-4.449881	PathScale.	-0.124939
-4.449881	reduction.	-0.124939
-4.449881	label.	-0.124939
-4.449881	despite	-0.124939
-4.449881	/O2	-0.124939
-4.449881	reinstall	-0.124939
-4.449881	framework,	-0.124939
-4.449881	SSE3.	-0.124939
-4.449881	situation,	-0.124939
-4.449881	modulo.	-0.124939
-4.449881	5.0f;	-0.124939
-4.449881	a[0]	-0.124939
-4.449881	hard-to-find	-0.124939
-4.449881	new.	-0.124939
-4.449881	/fp:fast	-0.124939
-4.449881	adjusted	-0.124939
-4.449881	dominating.	-0.124939
-4.449881	discriminating	-0.124939
-4.449881	36.	-0.124939
-4.449881	Contents	-0.124939
-4.449881	Asmlib:	-0.124939
-4.449881	zero-terminated	-0.124939
-4.449881	pipelined,	-0.124939
-4.449881	polymorphous	-0.124939
-4.449881	_mm_free.	-0.124939
-4.449881	discovers	-0.124939
-4.449881	_mm_stream_pd	-0.124939
-4.449881	streams	-0.124939
-4.449881	_mm_stream_pi	-0.124939
-4.449881	2.20,	-0.124939
-4.449881	intervals.	-0.124939
-4.449881	81).	-0.124939
-4.449881	cycles).	-0.124939
-4.449881	16is	-0.124939
-4.449881	non-constant	-0.124939
-4.449881	taken.	-0.124939
-4.449881	irregular	-0.124939
-4.449881	extracts	-0.124939
-4.449881	taken,	-0.124939
-4.449881	games	-0.124939
-4.449881	lrint(d);	-0.124939
-4.449881	Is32vec4	-0.124939
-4.449881	Is32vec2	-0.124939
-4.449881	microcontrollers:	-0.124939
-4.449881	(other	-0.124939
-4.449881	-ipo	-0.124939
-4.449881	x[])	-0.124939
-4.449881	a[arraysize],	-0.124939
-4.449881	printf("Delta");	-0.124939
-4.449881	1.2	-0.124939
-4.449881	ends	-0.124939
-4.449881	quickly	-0.124939
-4.449881	restriction,	-0.124939
-4.449881	email	-0.124939
-4.449881	(Windows,	-0.124939
-4.449881	i*sizeof(S1).	-0.124939
-4.449881	end.	-0.124939
-4.449881	147	-0.124939
-4.449881	packed	-0.124939
-4.449881	0x2F00,	-0.124939
-4.449881	(Windows:	-0.124939
-4.449881	(!a&&b)	-0.124939
-4.449881	temp2.	-0.124939
-4.449881	edition	-0.124939
-4.449881	stride,	-0.124939
-4.449881	stride.	-0.124939
-4.449881	stride)	-0.124939
-4.449881	temporarily	-0.124939
-4.449881	teachers	-0.124939
-4.449881	stopping	-0.124939
-4.449881	0x2C	-0.124939
-4.449881	(CGrandParent)	-0.124939
-4.449881	caches.	-0.124939
-4.449881	positive.	-0.124939
-4.449881	Booth:	-0.124939
-4.449881	Libraries	-0.124939
-4.449881	Because	-0.124939
-4.449881	Find	-0.124939
-4.449881	2016.	-0.124939
-4.449881	!(!a)=a	-0.124939
-4.449881	NAN.	-0.124939
-4.449881	Interrupt	-0.124939
-4.449881	platform-independent	-0.124939
-4.449881	strcat,	-0.124939
-4.449881	(bit	-0.124939
-4.449881	required,	-0.124939
-4.449881	numerical	-0.124939
-4.449881	characters	-0.124939
-4.449881	made)	-0.124939
-4.449881	matrices.	-0.124939
-4.449881	Thursday,	-0.124939
-4.449881	counterparts.	-0.124939
-4.449881	90.	-0.124939
-4.449881	double.....................................................................................	-0.124939
-4.449881	(Of	-0.124939
-4.449881	language......................................................	-0.124939
-4.449881	64).	-0.124939
-4.449881	STL.	-0.124939
-4.449881	matters:	-0.124939
-4.449881	0=	-0.124939
-4.449881	advance,	-0.124939
-4.449881	operators...............................................................................	-0.124939
-4.449881	84).	-0.124939
-4.449881	(Visual	-0.124939
-4.449881	13.1.	-0.124939
-4.449881	new/delete	-0.124939
-4.449881	14.22b	-0.124939
-4.449881	14.22a	-0.124939
-4.449881	technological	-0.124939
-4.449881	matters,	-0.124939
-4.449881	interprets	-0.124939
-4.449881	Run	-0.124939
-4.449881	vectorizing	-0.124939
-4.449881	400,	-0.124939
-4.449881	www.open-	-0.124939
-4.449881	15.1d.	-0.124939
-4.449881	Taking	-0.124939
-4.449881	reloaded	-0.124939
-4.449881	features:	-0.124939
-4.449881	Nerds	-0.124939
-4.449881	user-written	-0.124939
-4.449881	59	-0.124939
-4.449881	amd_vrd2_exp	-0.124939
-4.449881	5;	-0.124939
-4.449881	clearly	-0.124939
-4.449881	-fno-alias	-0.124939
-4.449881	ingenious	-0.124939
-4.449881	57	-0.124939
-4.449881	denominator:	-0.124939
-4.449881	Func1(double)	-0.124939
-4.449881	later)	-0.124939
-4.449881	Denmark.	-0.124939
-4.449881	StoreNTD(double	-0.124939
-4.449881	(Both	-0.124939
-4.449881	5,	-0.124939
-4.449881	two:	-0.124939
-4.449881	14.18a	-0.124939
-4.449881	14.18b	-0.124939
-4.449881	53.	-0.124939
-4.449881	two,	-0.124939
-4.449881	Slongdouble	-0.124939
-4.449881	9.2b	-0.124939
-4.449881	9.2a	-0.124939
-4.449881	Much	-0.124939
-4.449881	evicted.	-0.124939
-4.449881	advertise	-0.124939
-4.449881	(-a>-b)=(a<b)	-0.124939
-4.449881	short.	-0.124939
-4.449881	45.	-0.124939
-4.449881	1./39916800.,	-0.124939
-4.449881	occurrences	-0.124939
-4.449881	Connecting	-0.124939
-4.449881	134)	-0.124939
-4.449881	12.1a,	-0.124939
-4.449881	memset,	-0.124939
-4.449881	destroyed.	-0.124939
-4.449881	have:	-0.124939
-4.449881	memset:	-0.124939
-4.449881	etc.).	-0.124939
-4.449881	9.2,	-0.124939
-4.449881	things.	-0.124939
-4.449881	limitation	-0.124939
-4.449881	8.24.	-0.124939
-4.449881	attacks	-0.124939
-4.449881	7.32b	-0.124939
-4.449881	undocumented	-0.124939
-4.449881	surely	-0.124939
-4.449881	-1.	-0.124939
-4.449881	met:	-0.124939
-4.449881	shut	-0.124939
-4.449881	sections.	-0.124939
-4.449881	behaviors.	-0.124939
-4.449881	Very	-0.124939
-4.449881	assembly-like	-0.124939
-4.449881	Addison-	-0.124939
-4.449881	following:	-0.124939
-4.449881	difference,	-0.124939
-4.449881	bottleneck.	-0.124939
-4.449881	sequence.	-0.124939
-4.449881	__declspec(__align(64))	-0.124939
-4.449881	rounds	-0.124939
-4.449881	expansions.	-0.124939
-4.449881	temp1	-0.124939
-4.449881	0x40)	-0.124939
-4.449881	temp.	-0.124939
-4.449881	printf("\nResults:");	-0.124939
-4.449881	Low-level	-0.124939
-4.449881	0x0F)	-0.124939
-4.449881	...........................................................................	-0.124939
-4.449881	powN<true,N/2>::p(x)	-0.124939
-4.449881	version)	-0.124939
-4.449881	memcpy(b,	-0.124939
-4.449881	b[arraysize],	-0.124939
-4.449881	truth	-0.124939
-4.449881	ADX	-0.124939
-4.449881	ADC	-0.124939
-4.449881	realize	-0.124939
-4.449881	Contain	-0.124939
-4.449881	objects,	-0.124939
-4.449881	I64vec2	-0.124939
-4.449881	mitigated	-0.124939
-4.449881	-axSSE3,	-0.124939
-4.449881	objects)	-0.124939
-4.449881	2015	-0.124939
-4.449881	r+i/2	-0.124939
-4.449881	lag.	-0.124939
-4.449881	tedious.	-0.124939
-4.449881	design.	-0.124939
-4.449881	design,	-0.124939
-4.449881	scratch.	-0.124939
-4.449881	0.63	-0.124939
-4.449881	---xxx---	-0.124939
-4.449881	Programmable	-0.124939
-4.449881	producer	-0.124939
-4.449881	says.	-0.124939
-4.449881	-ffunction-sections)	-0.124939
-4.449881	re-allocated	-0.124939
-4.449881	nn	-0.124939
-4.449881	DEC,	-0.124939
-4.449881	whereas	-0.124939
-4.449881	Faster,	-0.124939
-4.449881	(Linux	-0.124939
-4.449881	ns	-0.124939
-4.449881	Effective	-0.124939
-4.449881	XOP	-0.124939
-4.449881	XOR	-0.124939
-4.449881	stand	-0.124939
-4.449881	-fp-model	-0.124939
-4.449881	class?	-0.124939
-4.449881	free)	-0.124939
-4.449881	1./2.09227E13};	-0.124939
-4.449881	135).	-0.124939
-4.449881	caching,	-0.124939
-4.449881	fprintf	-0.124939
-4.449881	Sum3.	-0.124939
-4.449881	block:	-0.124939
-4.449881	clients	-0.124939
-4.449881	1.4,	-0.124939
-4.449881	column++)	-0.124939
-4.449881	occurred	-0.124939
-4.449881	tool.	-0.124939
-4.449881	iterator	-0.124939
-4.449881	illegal	-0.124939
-4.449881	8.6a	-0.124939
-4.449881	8.6b	-0.124939
-4.449881	zip	-0.124939
-4.449881	7.15a.	-0.124939
-4.449881	33%	-0.124939
-4.449881	signed.	-0.124939
-4.449881	signed,	-0.124939
-4.449881	7.5.	-0.124939
-4.449881	s0,	-0.124939
-4.449881	remarkably	-0.124939
-4.449881	Today	-0.124939
-4.449881	great	-0.124939
-4.449881	files).	-0.124939
-4.449881	/QaxSSE3,	-0.124939
-4.449881	(www.agner.org/optimize/testp.zip).	-0.124939
-4.449881	everything,	-0.124939
-4.449881	Security.	-0.124939
-4.449881	standard.	-0.124939
-4.449881	methods:	-0.124939
-4.449881	respectively	-0.124939
-4.449881	v.10.3	-0.124939
-4.449881	v.10.2	-0.124939
-4.449881	tiling.	-0.124939
-4.449881	doubles:	-0.124939
-4.449881	Next,	-0.124939
-4.449881	alternative.	-0.124939
-4.449881	Library)	-0.124939
-4.449881	lea	-0.124939
-4.449881	StoreVectorA(void	-0.124939
-4.449881	emphasized	-0.124939
-4.449881	*.so)	-0.124939
-4.449881	en.wikipedia.org/wiki/Compiler_optimization.	-0.124939
-4.449881	wmmintrin.h	-0.124939
-4.449881	flip-flops,	-0.124939
-4.449881	Microsoft's	-0.124939
-4.449881	(/FAs	-0.124939
-4.449881	standards	-0.124939
-4.449881	140.	-0.124939
-4.449881	for(i=0,i2=0;	-0.124939
-4.449881	(Division	-0.124939
-4.449881	i<301;	-0.124939
-4.449881	valid)	-0.124939
-4.449881	CPU’s.	-0.124939
-4.449881	ab[i].b	-0.124939
-4.449881	forwards,	-0.124939
-4.449881	elimination.	-0.124939
-4.449881	unreliable.	-0.124939
-4.449881	port	-0.124939
-4.449881	Currently	-0.124939
-4.449881	cheap,	-0.124939
-4.449881	Library.	-0.124939
-4.449881	(www.intel.com/technology/itj/).	-0.124939
-4.449881	timing,	-0.124939
-4.449881	520	-0.124939
-4.449881	properties)	-0.124939
-4.449881	result,	-0.124939
-4.449881	reply	-0.124939
-4.449881	14.17b	-0.124939
-4.449881	52;	-0.124939
-4.449881	costless	-0.124939
-4.449881	penalty.	-0.124939
-4.449881	fetching,	-0.124939
-4.449881	afterwards.	-0.124939
-4.449881	123;	-0.124939
-4.449881	groups	-0.124939
-4.449881	_mm_load_si128((__m128i	-0.124939
-4.449881	sent	-0.124939
-4.449881	(less	-0.124939
-4.449881	_mm_exp_ps	-0.124939
-4.449881	noticeable	-0.124939
-4.449881	_mm_exp_pd	-0.124939
-4.449881	Step	-0.124939
-4.449881	__declspec(cpu_dispatch(...)).	-0.124939
-4.449881	c[size];	-0.124939
-4.449881	bb[i]*cc[i]	-0.124939
-4.449881	memset(list,	-0.124939
-4.449881	broader	-0.124939
-4.449881	NUMCOLUMNS;	-0.124939
-4.449881	semicolons	-0.124939
-4.449881	toggle	-0.124939
-4.449881	14.7a.	-0.124939
-4.449881	Today's	-0.124939
-4.449881	holes	-0.124939
-4.449881	12.1.	-0.124939
-4.449881	Therefore	-0.124939
-4.449881	output,	-0.124939
-4.449881	i=0;	-0.124939
-4.449881	session.	-0.124939
-4.449881	algorithm,	-0.124939
-4.449881	2014.	-0.124939
-4.449881	a[i+2];	-0.124939
-4.449881	anda	-0.124939
-4.449881	deviate	-0.124939
-4.449881	Background	-0.124939
-4.449881	"instrset_detect.cpp"	-0.124939
-4.449881	12.1b	-0.124939
-4.449881	_mm_add_epi16(a,b).	-0.124939
-4.449881	EXCLUSIVE	-0.124939
-4.449881	8.26b:	-0.124939
-4.449881	list[16];	-0.124939
-4.449881	formalism	-0.124939
-4.449881	full-size	-0.124939
-4.449881	graceful	-0.124939
-4.449881	changes.	-0.124939
-4.449881	(ATL)	-0.124939
-4.449881	-parallel	-0.124939
-4.449881	n++)	-0.124939
-4.449881	(Examples	-0.124939
-4.449881	Friday))	-0.124939
-4.449881	relaxed	-0.124939
-4.449881	and)	-0.124939
-4.449881	Family	-0.124939
-4.449881	fprintf(stderr,	-0.124939
-4.449881	16.1.	-0.124939
-4.449881	handle.	-0.124939
-4.449881	Uses	-0.124939
-4.449881	modifies	-0.124939
-4.449881	specifically	-0.124939
-4.449881	controversies	-0.124939
-4.449881	F1(int	-0.124939
-4.449881	representation,	-0.124939
-4.449881	Thus,	-0.124939
-4.449881	http://www.agner.org/optimize/	-0.124939
-4.449881	80.9	-0.124939
-4.449881	80.8	-0.124939
-4.449881	intelligible	-0.124939
-4.449881	computational	-0.124939
-4.449881	__attribute__((aligned(16))).	-0.124939
-4.449881	measurement.	-0.124939
-4.449881	obscured	-0.124939
-4.449881	considerations.	-0.124939
-4.449881	dictates	-0.124939
-4.449881	crashes	-0.124939
-4.449881	83	-0.124939
-4.449881	Look	-0.124939
-4.449881	17.9:	-0.124939
-4.449881	multiplexers,	-0.124939
-4.449881	next.	-0.124939
-4.449881	representations	-0.124939
-4.449881	deallocate	-0.124939
-4.449881	_mm_store_si128((__m128i	-0.124939
-4.449881	completely.	-0.124939
-4.449881	different.	-0.124939
-4.449881	succeeded	-0.124939
-4.449881	little-endian	-0.124939
-4.449881	chapter.	-0.124939
-4.449881	Vec16c	-0.124939
-4.449881	125	-0.124939
-4.449881	14.16a	-0.124939
-4.449881	if-branch	-0.124939
-4.449881	mainly	-0.124939
-4.449881	132.	-0.124939
-4.449881	size_t	-0.124939
-4.449881	FILO	-0.124939
-4.449881	Higher	-0.124939
-4.449881	86	-0.124939
-4.449881	coordinates	-0.124939
-4.449881	b2	-0.124939
-4.449881	b1	-0.124939
-4.449881	"Macro	-0.124939
-4.449881	b.	-0.124939
-4.449881	VIA.	-0.124939
-4.449881	happened	-0.124939
-4.449881	Polymorphism	-0.124939
-4.449881	32-62.	-0.124939
-4.449881	motion.	-0.124939
-4.449881	(www.boost.org).	-0.124939
-4.449881	(seconds	-0.124939
-4.449881	continue	-0.124939
-4.449881	Intel/x86-compatible	-0.124939
-4.449881	7.26b	-0.124939
-4.449881	Journal,	-0.124939
-4.449881	(parallel	-0.124939
-4.449881	7.26a	-0.124939
-4.449881	possibilities	-0.124939
-4.449881	subtasks	-0.124939
-4.449881	bottleneck,	-0.124939
-4.449881	analysis.	-0.124939
-4.449881	speed..............................................................................................................	-0.124939
-4.449881	difference.	-0.124939
-4.449881	nmmintrin.h	-0.124939
-4.449881	const,	-0.124939
-4.449881	SVML.	-0.124939
-4.449881	non-object	-0.124939
-4.449881	catching	-0.124939
-4.449881	F1();	-0.124939
-4.449881	used).	-0.124939
-4.449881	2.6.30	-0.124939
-4.449881	contentions,	-0.124939
-4.449881	causing	-0.124939
-4.449881	StoreNTD(&a[c][r],	-0.124939
-4.449881	F1.	-0.124939
-4.449881	("fldl	-0.124939
-4.449881	8.19.	-0.124939
-4.449881	micro-	-0.124939
-4.449881	15.1b,	-0.124939
-4.449881	module1.cpp	-0.124939
-4.449881	memory-intensive	-0.124939
-4.449881	F1?	-0.124939
-4.449881	account.	-0.124939
-4.449881	project.	-0.124939
-4.449881	project,	-0.124939
-4.449881	recoverable	-0.124939
-4.449881	3.x.	-0.124939
-4.449881	checking,	-0.124939
-4.449881	checking.	-0.124939
-4.449881	8.10b	-0.124939
-4.449881	8.10a	-0.124939
-4.449881	yet.	-0.124939
-4.449881	0.59	-0.124939
-4.449881	DontSkip;	-0.124939
-4.449881	(&a);	-0.124939
-4.449881	During	-0.124939
-4.449881	---xx---x	-0.124939
-4.449881	view	-0.124939
-4.449881	discontinued	-0.124939
-4.449881	Address	-0.124939
-4.449881	virtually	-0.124939
-4.449881	satisfactory.	-0.124939
-4.449881	87.	-0.124939
-4.449881	-fp-	-0.124939
-4.449881	....................................................................	-0.124939
-4.449881	fetching	-0.124939
-4.449881	i2;	-0.124939
-4.449881	zero(0,0,0,0,0,0,0,0);	-0.124939
-4.449881	card.	-0.124939
-4.449881	Func1,	-0.124939
-4.449881	convention	-0.124939
-4.449881	(OWL).	-0.124939
-4.449881	<,	-0.124939
-4.449881	<.	-0.124939
-4.449881	PHP,	-0.124939
-4.449881	non-reduced	-0.124939
-4.449881	level.	-0.124939
-4.449881	released	-0.124939
-4.449881	address:	-0.124939
-4.449881	Enterprise	-0.124939
-4.449881	Vec2uq	-0.124939
-4.449881	statements.............................................................................	-0.124939
-4.449881	Mac,	-0.124939
-4.449881	www.agner.org/optimize/testp.zip.	-0.124939
-4.449881	CString.	-0.124939
-4.449881	receive	-0.124939
-4.449881	Mac:	-0.124939
-4.449881	expanded	-0.124939
-4.449881	solutions.	-0.124939
-4.449881	solutions,	-0.124939
-4.449881	(a+1);	-0.124939
-4.449881	7.30b	-0.124939
-4.449881	7.30a	-0.124939
-4.449881	return;	-0.124939
-4.449881	published	-0.124939
-4.449881	GetProcessAffinityMask	-0.124939
-4.449881	reinstalled	-0.124939
-4.449881	emulated	-0.124939
-4.449881	map.	-0.124939
-4.449881	/Qparallel	-0.124939
-4.449881	fffff	-0.124939
-4.449881	Basic,	-0.124939
-4.449881	OneOrTwo5[b	-0.124939
-4.449881	Basic.	-0.124939
-4.449881	fastest.	-0.124939
-4.449881	square(x)	-0.124939
-4.449881	fastest:	-0.124939
-4.449881	max(T	-0.124939
-4.449881	late.	-0.124939
-4.449881	<float,	-0.124939
-4.449881	closer	-0.124939
-4.449881	static_cast	-0.124939
-4.449881	performance/price	-0.124939
-4.449881	improvement	-0.124939
-4.449881	runtime,	-0.124939
-4.449881	x^2,	-0.124939
-4.449881	set).	-0.124939
-4.449881	corresponds	-0.124939
-4.449881	icon	-0.124939
-4.449881	3");	-0.124939
-4.449881	1.61	-0.124939
-4.449881	38).	-0.124939
-4.449881	__attribute__((aligned(16)))	-0.124939
-4.449881	Documentation".	-0.124939
-4.449881	connection	-0.124939
-4.449881	satisfactorily	-0.124939
-4.449881	(SDK	-0.124939
-4.449881	alloca:	-0.124939
-4.449881	Linear	-0.124939
-4.449881	list[301];	-0.124939
-4.449881	code..................................................................................	-0.124939
-4.449881	Server	-0.124939
-4.449881	Comments	-0.124939
-4.449881	synchronizing	-0.124939
-4.449881	redesigning	-0.124939
-4.449881	alloca,	-0.124939
-4.449881	brushes,	-0.124939
-4.449881	_mm_cvtss_si32(_mm_load_ss(&x));}	-0.124939
-4.449881	mind,	-0.124939
-4.449881	addressing.	-0.124939
-4.449881	Optimized	-0.124939
-4.449881	multi-threaded	-0.124939
-4.449881	7.3.	-0.124939
-4.449881	Fog	-0.124939
-4.449881	;edx=addressinr	-0.124939
-4.449881	//=DeltaY	-0.124939
-4.449881	Similar	-0.124939
-4.449881	Alignment?	-0.124939
-4.449881	startup	-0.124939
-4.449881	1.5f};	-0.124939
-4.449881	doesn’t.	-0.124939
-4.449881	7.39	-0.124939
-4.449881	12.8a	-0.124939
-4.449881	12.8b	-0.124939
-4.449881	"how	-0.124939
-4.449881	7.35	-0.124939
-4.449881	7.37	-0.124939
-4.449881	#)	-0.124939
-4.449881	connections	-0.124939
-4.449881	7.36	-0.124939
-4.449881	MAX(f(x),	-0.124939
-4.449881	add_horizontal)	-0.124939
-4.449881	violates	-0.124939
-4.449881	8*x	-0.124939
-4.449881	www.agner.org/optimize	-0.124939
-4.449881	write:	-0.124939
-4.449881	exploited.	-0.124939
-4.449881	closely	-0.124939
-4.449881	foremost,	-0.124939
-4.449881	remedy	-0.124939
-4.449881	dependent	-0.124939
-4.449881	c1::*MemberPointer;	-0.124939
-4.449881	i--)	-0.124939
-4.449881	list[i].b.	-0.124939
-4.449881	removable	-0.124939
-4.449881	ignore,	-0.124939
-4.449881	atomic.	-0.124939
-4.449881	5).	-0.124939
-4.449881	87)	-0.124939
-4.449881	distributed.	-0.124939
-4.449881	degrades	-0.124939
-4.449881	Explain	-0.124939
-4.449881	(VML,	-0.124939
-4.449881	IsPowerOf2,	-0.124939
-4.449881	Func(ab[i].a);	-0.124939
-4.449881	(NetBurst)	-0.124939
-4.449881	understands	-0.124939
-4.449881	6!	-0.124939
-4.449881	ArraySize;	-0.124939
-4.449881	poorly.	-0.124939
-4.449881	habit,	-0.124939
-4.449881	log(2.0);	-0.124939
-4.449881	M	-0.124939
-4.449881	period	-0.124939
-4.449881	flexibility,	-0.124939
-4.449881	fashion.	-0.124939
-4.449881	3A	-0.124939
-4.449881	CPU-time	-0.124939
-4.449881	62	-0.124939
-4.449881	CLR,	-0.124939
-4.449881	0.95	-0.124939
-4.449881	exponent:	-0.124939
-4.449881	operands:	-0.124939
-4.449881	67	-0.124939
-4.449881	68	-0.124939
-4.449881	69	-0.124939
-4.449881	details).	-0.124939
-4.449881	4.0.1.	-0.124939
-4.449881	DLLs,	-0.124939
-4.449881	Incrementing	-0.124939
-4.449881	conversion.	-0.124939
-4.449881	browsers,	-0.124939
-4.449881	50;	-0.124939
-4.449881	unrealistic	-0.124939
-4.449881	identification.	-0.124939
-4.449881	500	-0.124939
-4.449881	between.	-0.124939
-4.449881	consult	-0.124939
-4.449881	iteration.	-0.124939
-4.449881	event,	-0.124939
-4.449881	helpful	-0.124939
-4.449881	costs.	-0.124939
-4.449881	protocols	-0.124939
-4.449881	instead:	-0.124939
-4.449881	(char,	-0.124939
-4.449881	moderately	-0.124939
-4.449881	aa,	-0.124939
-4.449881	overflows,	-0.124939
-4.449881	12.3.	-0.124939
-4.449881	brutally	-0.124939
-4.449881	sign(i)	-0.124939
-4.449881	occur:	-0.124939
-4.449881	bcc,	-0.124939
-4.449881	#undef	-0.124939
-4.449881	1.fffff,	-0.124939
-4.449881	"Technical	-0.124939
-4.449881	converts	-0.124939
-4.449881	sizeof(float))	-0.124939
-4.449881	Sfloat	-0.124939
-4.449881	erroneously	-0.124939
-4.449881	Subtractions	-0.124939
-4.449881	Builder	-0.124939
-4.449881	as(a	-0.124939
-4.449881	shifts	-0.124939
-4.449881	intrinsics	-0.124939
-4.449881	$B2$3:	-0.124939
-4.449881	Vec2d	-0.124939
-4.449881	breakdown.	-0.124939
-4.449881	subtasks,	-0.124939
-4.449881	y?"	-0.124939
-4.449881	driver	-0.124939
-4.449881	producers	-0.124939
-4.449881	Vec2q	-0.124939
-4.449881	parts:	-0.124939
-4.449881	Is8vec8	-0.124939
-4.449881	prototypes	-0.124939
-4.449881	manuals:	-0.124939
-4.449881	(int)(&list[0])	-0.124939
-4.449881	If,	-0.124939
-4.449881	breakdowns	-0.124939
-4.449881	minimize	-0.124939
-4.449881	uint16_t	-0.124939
-4.449881	32767	-0.124939
-4.449881	parts,	-0.124939
-4.449881	IPP	-0.124939
-4.449881	courses	-0.124939
-4.449881	fma4intrin.h	-0.124939
-4.449881	all,	-0.124939
-4.449881	bodies	-0.124939
-4.449881	eax,1	-0.124939
-4.449881	Loading	-0.124939
-4.449881	(SIZE	-0.124939
-4.449881	Michael	-0.124939
-4.449881	actions	-0.124939
-4.449881	risking	-0.124939
-4.449881	sign,	-0.124939
-4.449881	MathLoop()	-0.124939
-4.449881	heuristic	-0.124939
-4.449881	(n!)	-0.124939
-4.449881	root,	-0.124939
-4.449881	GOT,	-0.124939
-4.449881	7.2).	-0.124939
-4.449881	130.	-0.124939
-4.449881	ameliorated	-0.124939
-4.449881	installed	-0.124939
-4.449881	Gauss	-0.124939
-4.449881	blog	-0.124939
-4.449881	Dobbs	-0.124939
-4.449881	fundamental	-0.124939
-4.449881	[eax+4],	-0.124939
-4.449881	dest,	-0.124939
-4.449881	("CriticalFunction");	-0.124939
-4.449881	Change	-0.124939
-4.449881	scheduled	-0.124939
-4.449881	-fwrapv	-0.124939
-4.449881	conversions.	-0.124939
-4.449881	educational	-0.124939
-4.449881	state.	-0.124939
-4.449881	i/2	-0.124939
-4.449881	__vrd2_exp	-0.124939
-4.449881	traffic	-0.124939
-4.449881	address)	-0.124939
-4.449881	Compatibility	-0.124939
-4.449881	conversions:	-0.124939
-4.449881	confined	-0.124939
-4.449881	glitches	-0.124939
-4.449881	funda-	-0.124939
-4.449881	matrix[SIZE][SIZE];	-0.124939
-4.449881	FUNCNAME(short	-0.124939
-4.449881	matrix[c][r].	-0.124939
-4.449881	__declspec(align(64))	-0.124939
-4.449881	\n	-0.124939
-4.449881	MultiplyBy<8>(10);	-0.124939
-4.449881	accept	-0.124939
-4.449881	suggestions	-0.124939
-4.449881	Since	-0.124939
-4.449881	impacts	-0.124939
-4.449881	(with	-0.124939
-4.449881	Your	-0.124939
-4.449881	learn	-0.124939
-4.449881	compatibility,	-0.124939
-4.449881	c1+TILESIZE;	-0.124939
-4.449881	slice	-0.124939
-4.449881	division......................................................................................................	-0.124939
-4.449881	_mm_shuffle_epi8	-0.124939
-4.449881	__svml_expf4	-0.124939
-4.449881	Complicated	-0.124939
-4.449881	temp->b	-0.124939
-4.449881	list[x];	-0.124939
-4.449881	temp->a	-0.124939
-4.449881	subexpressions	-0.124939
-4.449881	(addition,	-0.124939
-4.449881	guess,	-0.124939
-4.449881	12.1b.	-0.124939
-4.449881	12.1b,	-0.124939
-4.449881	&SelectAddMul_SSE2;	-0.124939
-4.449881	imprecisions	-0.124939
-4.449881	(Intel)	-0.124939
-4.449881	__svml_exp2	-0.124939
-4.449881	hardware.	-0.124939
-4.449881	followed	-0.124939
-4.449881	SelectAddMul_dispatch;	-0.124939
-4.449881	branches:	-0.124939
-4.449881	prior	-0.124939
-4.449881	a+1;.	-0.124939
-4.449881	(but	-0.124939
-4.449881	__intel_cpu_features_init()	-0.124939
-4.449881	low-power	-0.124939
-4.449881	list[i+1];}	-0.124939
-4.449881	IsProcessorFeaturePresent	-0.124939
-4.449881	names,	-0.124939
-4.449881	satisfied.	-0.124939
-4.449881	distinctions	-0.124939
-4.449881	i+=3,i_div_3++){	-0.124939
-4.449881	relocation	-0.124939
-4.449881	hours	-0.124939
-4.449881	b+a,	-0.124939
-4.449881	satisfied:	-0.124939
-4.449881	neverthe-	-0.124939
-4.449881	list[]	-0.124939
-4.449881	clause.	-0.124939
-4.449881	expandable,	-0.124939
-4.449881	754	-0.124939
-4.449881	x2*x,	-0.124939
-4.449881	Should	-0.124939
-4.449881	deprecated.	-0.124939
-4.449881	porting	-0.124939
-4.449881	amd_vrs4_expf	-0.124939
-4.449881	hour.	-0.124939
-4.449881	discriminates	-0.124939
-4.449881	applying	-0.124939
-4.449881	has.	-0.124939
-4.449881	1023	-0.124939
-4.449881	Waiting	-0.124939
-4.449881	classes:	-0.124939
-4.449881	p1->Hello();	-0.124939
-4.449881	_mm_blendv_epi8(bc,	-0.124939
-4.449881	8.12a	-0.124939
-4.449881	8.12b	-0.124939
-4.449881	CriticalFunction_Dispatch;	-0.124939
-4.449881	(Compile	-0.124939
-4.449881	Frequent	-0.124939
-4.449881	isolating	-0.124939
-4.449881	strongly	-0.124939
-4.449881	manageable	-0.124939
-4.449881	include:	-0.124939
-4.449881	totaling	-0.124939
-4.449881	(5)	-0.124939
-4.449881	(MOVNT)	-0.124939
-4.449881	.NET,	-0.124939
-4.449881	column-wise	-0.124939
-4.449881	exception-safe	-0.124939
-4.449881	spends	-0.124939
-4.449881	4.5	-0.124939
-4.449881	Size()	-0.124939
-4.449881	Sizes	-0.124939
-4.449881	4.;	-0.124939
-4.449881	website.	-0.124939
-4.449881	buttons,	-0.124939
-4.449881	9.5a:	-0.124939
-4.449881	call,	-0.124939
-4.449881	.......................................................................	-0.124939
-4.449881	1./5040.,	-0.124939
-4.449881	ReadB()	-0.124939
-4.449881	printf("\n%2i	-0.124939
-4.449881	7.1.	-0.124939
-4.449881	elements?	-0.124939
-4.449881	(if	-0.124939
-4.449881	caused	-0.124939
-4.449881	x.f;	-0.124939
-4.449881	---xx--xx	-0.124939
-4.449881	XOP,	-0.124939
-4.449881	First-In-Last-Out	-0.124939
-4.449881	sizeof(float)	-0.124939
-4.449881	Hardware	-0.124939
-4.449881	tricks	-0.124939
-4.449881	a=	-0.124939
-4.449881	a:	-0.124939
-4.449881	directory	-0.124939
-4.449881	"Beta",	-0.124939
-4.449881	(b*2.0)/3.0	-0.124939
-4.449881	(doubly	-0.124939
-4.449881	CriticalFunction(b,	-0.124939
-4.449881	unreferen-	-0.124939
-4.449881	scalar	-0.124939
-4.449881	57).	-0.124939
-4.449881	63;	-0.124939
-4.449881	Scott	-0.124939
-4.449881	delays.	-0.124939
-4.449881	Read	-0.124939
-4.449881	paralleli-	-0.124939
-4.449881	uint8_t	-0.124939
-4.449881	opportunities	-0.124939
-4.449881	Thursday	-0.124939
-4.449881	_mm_i32gather_epi32	-0.124939
-4.449881	gone	-0.124939
-4.449881	aliasing,	-0.124939
-4.449881	add_elements(s);	-0.124939
-4.449881	Dispatcher.	-0.124939
-4.449881	void.	-0.124939
-4.449881	worse	-0.124939
-4.449881	((a+b)+c)+d.	-0.124939
-4.449881	Classes	-0.124939
-4.449881	strlen,	-0.124939
-4.449881	Constructor-style	-0.124939
-4.449881	correctness.	-0.124939
-4.449881	miss.	-0.124939
-4.449881	higher-priority	-0.124939
-4.449881	services.	-0.124939
-4.449881	library).	-0.124939
-4.449881	[esp+4]	-0.124939
-4.449881	N;	-0.124939
-4.449881	DynamicArray	-0.124939
-4.449881	_mm_hadd_ps(x,	-0.124939
-4.449881	N)	-0.124939
-4.449881	design	-0.124939
-4.449881	(depending	-0.124939
-4.449881	obvious.	-0.124939
-4.449881	obvious,	-0.124939
-4.449881	FuncType(short	-0.124939
-4.449881	(1.	-0.124939
-4.449881	freely.	-0.124939
-4.449881	x-xx--xx-	-0.124939
-4.449881	b[r][c]);	-0.124939
-4.449881	option)	-0.124939
-4.449881	replacements	-0.124939
-4.449881	Performance".	-0.124939
-4.449881	handler,	-0.124939
-4.449881	increment.	-0.124939
-4.449881	segmentation	-0.124939
-4.449881	integers:	-0.124939
-4.449881	-openmp	-0.124939
-4.449881	behaviour	-0.124939
-4.449881	XOR'ing	-0.124939
-4.449881	RTTI	-0.124939
-4.449881	divisor.	-0.124939
-4.449881	a.y);}	-0.124939
-4.449881	r1+1;	-0.124939
-4.449881	increments	-0.124939
-4.449881	handlers	-0.124939
-4.449881	references:	-0.124939
-4.449881	composer)	-0.124939
-4.449881	2-dimensional	-0.124939
-4.449881	-128,	-0.124939
-4.449881	//=A*x*x+B*x+C	-0.124939
-4.449881	applications:	-0.124939
-4.449881	/Fa	-0.124939
-4.449881	unrecoverable	-0.124939
-4.449881	Things	-0.124939
-4.449881	/Fm	-0.124939
-4.449881	texts	-0.124939
-4.449881	redesign.	-0.124939
-4.449881	0x3F800000;	-0.124939
-4.449881	-fpic.	-0.124939
-4.449881	research	-0.124939
-4.449881	servicing.	-0.124939
-4.449881	__cpuid(dummy,	-0.124939
-4.449881	rows/columns	-0.124939
-4.449881	p->NotPolymorphic();	-0.124939
-4.449881	Hoisie,	-0.124939
-4.449881	e.g.:	-0.124939
-4.449881	destroys	-0.124939
-4.449881	deque	-0.124939
-4.449881	knows	-0.124939
-4.449881	2010.	-0.124939
-4.449881	80x86	-0.124939
-4.449881	restoring	-0.124939
-4.449881	a2/b2;	-0.124939
-4.449881	_mm_cvtss_f32(s);	-0.124939
-4.449881	-read_only_relocs	-0.124939
-4.449881	newsgroups	-0.124939
-4.449881	12.4b,	-0.124939
-4.449881	12.4b.	-0.124939
-4.449881	(rarely	-0.124939
-4.449881	10000,	-0.124939
-4.449881	b[0],	-0.124939
-4.449881	72).	-0.124939
-4.449881	dimensions	-0.124939
-4.449881	v	-0.124939
-4.449881	y2,	-0.124939
-4.449881	lightweight	-0.124939
-4.449881	;alignby4	-0.124939
-4.449881	Coarse	-0.124939
-4.449881	vectorized:	-0.124939
-4.449881	influenced	-0.124939
-4.449881	5.5	-0.124939
-4.449881	swapping.	-0.124939
-4.449881	breaking	-0.124939
-4.449881	Cannot	-0.124939
-4.449881	libmmt.lib	-0.124939
-4.449881	speeding	-0.124939
-4.449881	;a	-0.124939
-4.449881	_finite())	-0.124939
-4.449881	Functional	-0.124939
-4.449881	ratio.	-0.124939
-4.449881	unattended.	-0.124939
-4.449881	reflect	-0.124939
-4.449881	;r	-0.124939
-4.449881	2.5f};	-0.124939
-4.449881	61.	-0.124939
-4.449881	-m64	-0.124939
-4.449881	concentrating	-0.124939
-4.449881	(&	-0.124939
-4.449881	preference	-0.124939
-4.449881	&list[8]);	-0.124939
-4.449881	(6	-0.124939
-4.449881	(0	-0.124939
-4.449881	(YMM)	-0.124939
-4.449881	12.4d.	-0.124939
-4.449881	process,	-0.124939
-4.449881	best-case	-0.124939
-4.449881	d.x;	-0.124939
-4.449881	a+b=b+a,	-0.124939
-4.449881	(8	-0.124939
-4.449881	(A	-0.124939
-4.449881	(C	-0.124939
-4.449881	developer	-0.124939
-4.449881	(B	-0.124939
-4.449881	PREFETCH	-0.124939
-4.449881	mangling.	-0.124939
-4.449881	susceptible	-0.124939
-4.449881	Stefan	-0.124939
-4.449881	(SSE):	-0.124939
-4.449881	inte-	-0.124939
-4.449881	/arch:SSE3	-0.124939
-4.449881	sum.	-0.124939
-4.449881	ReadB	-0.124939
-4.449881	Verilog.	-0.124939
-4.449881	inferior.	-0.124939
-4.449881	dimension	-0.124939
-4.449881	party	-0.124939
-4.449881	Advices	-0.124939
-4.449881	reporting.	-0.124939
-4.449881	wired	-0.124939
-4.449881	14.12a	-0.124939
-4.449881	refresh	-0.124939
-4.449881	inequality	-0.124939
-4.449881	Ready	-0.124939
-4.449881	types.	-0.124939
-4.449881	!a;	-0.124939
-4.449881	defined(__unix__)	-0.124939
-4.449881	destructor,	-0.124939
-4.449881	destructor.	-0.124939
-4.449881	wires	-0.124939
-4.449881	_mm_clflush	-0.124939
-4.449881	sizes?	-0.124939
-4.449881	SelectAddMul	-0.124939
-4.449881	--xxxx-xx	-0.124939
-4.449881	Loops......................................................................................................................	-0.124939
-4.449881	iset	-0.124939
-4.449881	beginning.	-0.124939
-4.449881	adhere	-0.124939
-4.449881	__rdtsc();	-0.124939
-4.449881	3.0;	-0.124939
-4.449881	Calculations	-0.124939
-4.449881	loop-	-0.124939
-4.449881	i++)a[i]=2*i;	-0.124939
-4.449881	non-recoverable	-0.124939
-4.449881	side-effects	-0.124939
-4.449881	Looking	-0.124939
-4.449881	loop?	-0.124939
-4.449881	Glibc	-0.124939
-4.449881	C1,	-0.124939
-4.449881	written.	-0.124939
-4.449881	Partial	-0.124939
-4.449881	a[]	-0.124939
-4.449881	modularity	-0.124939
-4.449881	properly.	-0.124939
-4.449881	fast=2	-0.124939
-4.449881	aa[size]	-0.124939
-4.449881	throws	-0.124939
-4.449881	feed	-0.124939
-4.449881	former	-0.124939
-4.449881	bug".	-0.124939
-4.449881	considerably.	-0.124939
-4.449881	feel	-0.124939
-4.449881	relate	-0.124939
-4.449881	b++;	-0.124939
-4.449881	well-known	-0.124939
-4.449881	6.	-0.124939
-4.449881	antivirus	-0.124939
-4.449881	sizes,	-0.124939
-4.449881	pipeline.	-0.124939
-4.449881	"	-0.124939
-4.449881	gates,	-0.124939
-4.449881	games.	-0.124939
-4.449881	FactorialTable[n];	-0.124939
-4.449881	Newer	-0.124939
-4.449881	IntegerPower	-0.124939
-4.449881	158.	-0.124939
-4.449881	'>')	-0.124939
-4.449881	18.1.	-0.124939
-4.449881	obsolete	-0.124939
-4.449881	15.1c,	-0.124939
-4.449881	prediction).	-0.124939
-4.449881	-231	-0.124939
-4.449881	information,	-0.124939
-4.449881	a*b*c*2.	-0.124939
-4.449881	nagging	-0.124939
-4.449881	threads?	-0.124939
-4.449881	7.22.	-0.124939
-4.449881	www.yeppp.info	-0.124939
-4.449881	&list[100];	-0.124939
-4.449881	Intel/MASM	-0.124939
-4.449881	Keywords	-0.124939
-4.449881	requires,	-0.124939
-4.449881	workplace	-0.124939
-4.449881	risky	-0.124939
-4.449881	"standard	-0.124939
-4.449881	cached,	-0.124939
-4.449881	p->f();	-0.124939
-4.449881	checking).	-0.124939
-4.449881	search:	-0.124939
-4.449881	instrset_detect	-0.124939
-4.449881	measure.	-0.124939
-4.449881	concentrate	-0.124939
-4.449881	double's.	-0.124939
-4.449881	Inlined	-0.124939
-4.449881	a*4	-0.124939
-4.449881	this).	-0.124939
-4.449881	x2;	-0.124939
-4.449881	Runtime,	-0.124939
-4.449881	multiplications,	-0.124939
-4.449881	That	-0.124939
-4.449881	reach	-0.124939
-4.449881	x2,	-0.124939
-4.449881	76	-0.124939
-4.449881	75	-0.124939
-4.449881	832	-0.124939
-4.449881	job,	-0.124939
-4.449881	job.	-0.124939
-4.449881	71	-0.124939
-4.449881	70	-0.124939
-4.449881	www.openmp.org	-0.124939
-4.449881	down.	-0.124939
-4.449881	79	-0.124939
-4.449881	CPLDs	-0.124939
-4.449881	coincides	-0.124939
-4.449881	8.14b	-0.124939
-4.449881	press.	-0.124939
-4.449881	8.14a	-0.124939
-4.449881	bit-mask	-0.124939
-4.449881	worried	-0.124939
-4.449881	"m"(x)	-0.124939
-4.449881	sign-bit	-0.124939
-4.449881	7.33a	-0.124939
-4.449881	stupid.	-0.124939
-4.449881	pooling.	-0.124939
-4.449881	pooling)	-0.124939
-4.449881	shares	-0.124939
-4.449881	modular.	-0.124939
-4.449881	forward)	-0.124939
-4.449881	advantages:	-0.124939
-4.449881	WritePrivateProfileString	-0.124939
-4.449881	2005).	-0.124939
-4.449881	try,	-0.124939
-4.449881	multiply-and-add	-0.124939
-4.449881	only,	-0.124939
-4.449881	triangle	-0.124939
-4.449881	x-xxxx-x-	-0.124939
-4.449881	Rounding	-0.124939
-4.449881	Quine–McCluskey	-0.124939
-4.449881	n-1	-0.124939
-4.449881	laws	-0.124939
-4.449881	x^0/0!	-0.124939
-4.449881	semicolons,	-0.124939
-4.449881	(/arch:SSE2,	-0.124939
-4.449881	float(i);	-0.124939
-4.449881	decimals.	-0.124939
-4.449881	not!	-0.124939
-4.449881	together......................................	-0.124939
-4.449881	(live	-0.124939
-4.449881	-ffast-math	-0.124939
-4.449881	managing	-0.124939
-4.449881	Sum1()	-0.124939
-4.449881	rare.	-0.124939
-4.449881	responded	-0.124939
-4.449881	-100	-0.124939
-4.449881	2.;	-0.124939
-4.449881	pre-calculated	-0.124939
-4.449881	a+b+c=c+b+a	-0.124939
-4.449881	devirtualization	-0.124939
-4.449881	invoked	-0.124939
-4.449881	128-	-0.124939
-4.449881	9.5b.	-0.124939
-4.449881	printer	-0.124939
-4.449881	Uninstallation	-0.124939
-4.449881	Half	-0.124939
-4.449881	Important	-0.124939
-4.449881	Func2	-0.124939
-4.449881	say	-0.124939
-4.449881	12.3a,	-0.124939
-4.449881	.........................	-0.124939
-4.449881	reads.	-0.124939
-4.449881	1./120.,	-0.124939
-4.449881	unions	-0.124939
-4.449881	Pascal,	-0.124939
-4.449881	namely	-0.124939
-4.449881	structure,	-0.124939
-4.449881	Multidimensional	-0.124939
-4.449881	'this'.	-0.124939
-4.449881	AVX2,	-0.124939
-4.449881	<asmlib.h>	-0.124939
-4.449881	16-bit,	-0.124939
-4.449881	u.i[1]	-0.124939
-4.449881	much.	-0.124939
-4.449881	(eax)	-0.124939
-4.449881	on)	-0.124939
-4.449881	attention	-0.124939
-4.449881	level-	-0.124939
-4.449881	maintainability	-0.124939
-4.449881	f=i;	-0.124939
-4.449881	0x7F	-0.124939
-4.449881	relying	-0.124939
-4.449881	setup.	-0.124939
-4.449881	Monday	-0.124939
-4.449881	n'th	-0.124939
-4.449881	"Register	-0.124939
-4.449881	a&b&c&d	-0.124939
-4.449881	clause	-0.124939
-4.449881	ignored	-0.124939
-4.449881	union,	-0.124939
-4.449881	i<20	-0.124939
-4.449881	<=,	-0.124939
-4.449881	relational	-0.124939
-4.449881	*x;	-0.124939
-4.449881	Primitives"	-0.124939
-4.449881	finds	-0.124939
-4.449881	false:	-0.124939
-4.449881	square:	-0.124939
-4.449881	protection.	-0.124939
-4.449881	iterative	-0.124939
-4.449881	objects),	-0.124939
-4.449881	tricky.	-0.124939
-4.449881	opposite:	-0.124939
-4.449881	for-loop:	-0.124939
-4.449881	complication	-0.124939
-4.449881	++b;	-0.124939
-4.449881	square.	-0.124939
-4.449881	SelectAddMul_SSE41,	-0.124939
-4.449881	strategy	-0.124939
-4.449881	AND'ing	-0.124939
-4.449881	xplus2()	-0.124939
-4.449881	X?"	-0.124939
-4.449881	Specifications,	-0.124939
-4.449881	million	-0.124939
-4.449881	(new	-0.124939
-4.449881	reason.	-0.124939
-4.449881	reason,	-0.124939
-4.449881	-fopenmp	-0.124939
-4.449881	squares	-0.124939
-4.449881	Best-case	-0.124939
-4.449881	investigation	-0.124939
-4.449881	disguise.	-0.124939
-4.449881	route.	-0.124939
-4.449881	small,	-0.124939
-4.449881	compactness	-0.124939
-4.449881	overhead.	-0.124939
-4.449881	//=2*A	-0.124939
-4.449881	2.6f;	-0.124939
-4.449881	%10I64i",	-0.124939
-4.449881	point-to-integer	-0.124939
-4.449881	set?".	-0.124939
-4.449881	loop-branch	-0.124939
-4.449881	vector(x	-0.124939
-4.449881	8.8b	-0.124939
-4.449881	Encryption,	-0.124939
-4.449881	8.8a	-0.124939
-4.449881	~b	-0.124939
-4.449881	Firewalls,	-0.124939
-4.449881	x-xxx--xx	-0.124939
-4.449881	100*16,	-0.124939
-4.449881	Newest	-0.124939
-4.449881	normalized,	-0.124939
-4.449881	MFC).	-0.124939
-4.449881	..............................................................................	-0.124939
-4.449881	widely	-0.124939
-4.449881	re-calculated	-0.124939
-4.449881	advise	-0.124939
-4.449881	Multithreaded	-0.124939
-4.449881	year.	-0.124939
-4.449881	Insert	-0.124939
-4.449881	Reducible	-0.124939
-4.449881	vector()	-0.124939
-4.449881	decades	-0.124939
-4.449881	decreased	-0.124939
-4.449881	CPU.............................................................................81	-0.124939
-4.449881	(*CriticalFunction)(b,	-0.124939
-4.449881	12.7.	-0.124939
-4.449881	patterns.	-0.124939
-4.449881	publish	-0.124939
-4.449881	definitions	-0.124939
-4.449881	Multiply(10,8);	-0.124939
-4.449881	"undefined".	-0.124939
-4.449881	separately:	-0.124939
-4.449881	[ecx+eax*4],ebx	-0.124939
-4.449881	std::unexpected()	-0.124939
-4.449881	b.x	-0.124939
-4.449881	people.	-0.124939
-4.449881	(GetExceptionCode()	-0.124939
-4.449881	-mveclibabi=svml.	-0.124939
-4.449881	a[c][r]	-0.124939
-4.449881	uninitialized,	-0.124939
-4.449881	jumps,	-0.124939
-4.449881	places.	-0.124939
-4.449881	computing,	-0.124939
-4.449881	stress	-0.124939
-4.449881	Iu8vec16	-0.124939
-4.449881	b[1],	-0.124939
-4.449881	strictness	-0.124939
-4.449881	2.1.	-0.124939
-4.449881	alignment,	-0.124939
-4.449881	2.00.	-0.124939
-4.449881	around.	-0.124939
-4.449881	Why	-0.124939
-4.449881	access.......................................................................................................	-0.124939
-4.449881	arraysize)	-0.124939
-4.449881	casting.	-0.124939
-4.449881	casting,	-0.124939
-4.449881	violations	-0.124939
-4.449881	-mveclibabi	-0.124939
-4.449881	malloc.	-0.124939
-4.449881	Last	-0.124939
-4.449881	malloc)	-0.124939
-4.449881	*(__m64*)&source);	-0.124939
-4.449881	231	-0.124939
-4.449881	interval.	-0.124939
-4.449881	removed,	-0.124939
-4.449881	Func()	-0.124939
-4.449881	interval:	-0.124939
-4.449881	question:	-0.124939
-4.449881	r2;	-0.124939
-4.449881	violation,	-0.124939
-4.449881	incremental	-0.124939
-4.449881	admittedly	-0.124939
-4.449881	application-	-0.124939
-4.449881	collect	-0.124939
-4.449881	PC's,	-0.124939
-4.449881	"position-independent	-0.124939
-4.449881	application,	-0.124939
-4.449881	spots.	-0.124939
-4.449881	list.Size();	-0.124939
-4.449881	essential	-0.124939
-4.449881	r2,	-0.124939
-4.449881	xx-xx--x-	-0.124939
-4.449881	mirroring	-0.124939
-4.449881	"function	-0.124939
-4.449881	identified.	-0.124939
-4.449881	g(x));	-0.124939
-4.449881	Time-	-0.124939
-4.449881	originally	-0.124939
-4.449881	8*1024/64	-0.124939
-4.449881	--xxxx---	-0.124939
-4.449881	(using	-0.124939
-4.449881	series:	-0.124939
-4.449881	3.;	-0.124939
-4.449881	v.i)	-0.124939
-4.449881	select_gt(b,	-0.124939
-4.449881	Prototype	-0.124939
-4.449881	String	-0.124939
-4.449881	IsPowerOf2	-0.124939
-4.449881	glibc	-0.124939
-4.449881	pros	-0.124939
-4.449881	matical	-0.124939
-4.449881	............................................................................	-0.124939
-4.449881	stored?	-0.124939
-4.449881	tends	-0.124939
-4.449881	leftmost	-0.124939
-4.449881	Gnu).	-0.124939
-4.449881	series,	-0.124939
-4.449881	series.	-0.124939
-4.449881	(ZMM).	-0.124939
-4.449881	Command	-0.124939
-4.449881	2.1.7,	-0.124939
-4.449881	Square	-0.124939
-4.449881	DelayFiveSeconds	-0.124939
-4.449881	tortuous	-0.124939
-4.449881	..........................................................	-0.124939
-4.449881	boxes,	-0.124939
-4.449881	s);	-0.124939
-4.449881	yes	-0.124939
-4.449881	dialog	-0.124939
-4.449881	integer).	-0.124939
-4.449881	~.	-0.124939
-4.449881	~(~a)	-0.124939
-4.449881	matrix,	-0.124939
-4.449881	xxxxxxx-x	-0.124939
-4.449881	Those	-0.124939
-4.449881	file)	-0.124939
-4.449881	predictions	-0.124939
-4.449881	value,	-0.124939
-4.449881	x[0]	-0.124939
-4.449881	_mm_stream_pi((__m64*)dest,	-0.124939
-4.449881	OneOrTwo5[(b!=0)	-0.124939
-4.449881	fake	-0.124939
-4.449881	(XMM)	-0.124939
-4.449881	logically	-0.124939
-4.449881	vendors	-0.124939
-4.449881	constant,	-0.124939
-4.449881	i[2];	-0.124939
-4.449881	analyze	-0.124939
-4.449881	accomplished	-0.124939
-4.449881	catch,	-0.124939
-4.449881	2005;	-0.124939
-4.449881	PCLMUL	-0.124939
-4.449881	wherever	-0.124939
-4.449881	criteria	-0.124939
-4.449881	..............................................................................................................	-0.124939
-4.449881	++i	-0.124939
-4.449881	repeatedly	-0.124939
-4.449881	sqrt,	-0.124939
-4.449881	__restrict__,	-0.124939
-4.449881	raising	-0.124939
-4.449881	vectorclass.h	-0.124939
-4.449881	section.	-0.124939
-4.449881	section,	-0.124939
-4.449881	unfavorable,	-0.124939
-4.449881	114	-0.124939
-4.449881	...................................................................................................................	-0.124939
-4.449881	hasn't	-0.124939
-4.449881	1.f);	-0.124939
-4.449881	emmintrin.h	-0.124939
-4.449881	optimal,	-0.124939
-4.449881	occurrence	-0.124939
-4.449881	testing,	-0.124939
-4.449881	row.	-0.124939
-4.449881	(methods).........................................................................	-0.124939
-4.449881	self-styled	-0.124939
-4.449881	ia32intrin.h	-0.124939
-4.449881	N-1)==0,N>::p(x);	-0.124939
-4.449881	minute	-0.124939
-4.449881	develop.	-0.124939
-4.449881	Declare	-0.124939
-4.449881	exact	-0.124939
-4.449881	RAM,	-0.124939
-4.449881	circumstances	-0.124939
-4.449881	powN<true,1>	-0.124939
-4.449881	NOT.	-0.124939
-4.449881	(0x2710	-0.124939
-4.449881	_mm_i64gather_epi32	-0.124939
-4.449881	Embarcadero	-0.124939
-4.449881	lower;	-0.124939
-4.449881	table-based	-0.124939
-4.449881	(a*b*c)+(c*b*a)	-0.124939
-4.449881	97	-0.124939
-4.449881	pow(x,10)	-0.124939
-4.449881	AMD's	-0.124939
-4.449881	data",	-0.124939
-4.449881	92	-0.124939
-4.449881	allowing	-0.124939
-4.449881	Tips	-0.124939
-4.449881	vice	-0.124939
-4.449881	generators.	-0.124939
-4.449881	x4*x4;	-0.124939
-4.449881	tested.	-0.124939
-4.449881	c.x	-0.124939
-4.449881	c.y	-0.124939
-4.449881	activates	-0.124939
-4.449881	back,	-0.124939
-4.449881	activated	-0.124939
-4.449881	Scheduling	-0.124939
-4.449881	7.34b.	-0.124939
-4.449881	Vectorize	-0.124939
-4.449881	comments,	-0.124939
-4.449881	variables:	-0.124939
-4.449881	80386	-0.124939
-4.449881	14.16b	-0.124939
-4.449881	note:	-0.124939
-4.449881	Requires	-0.124939
-4.449881	Pentium-II	-0.124939
-4.449881	Borland's	-0.124939
-4.449881	xx4;	-0.124939
-4.449881	__except	-0.124939
-4.449881	rounding,	-0.124939
-4.449881	issue,	-0.124939
-4.449881	superfluous	-0.124939
-4.449881	b*2.0/3.0	-0.124939
-4.449881	mainstream	-0.124939
-4.449881	_mm_hadd_ps(s,	-0.124939
-4.449881	Perl.	-0.124939
-4.449881	14.17a	-0.124939
-4.449881	15.1c?	-0.124939
-4.449881	discussions.	-0.124939
-4.449881	unwinding.	-0.124939
-4.449881	("int	-0.124939
-4.449881	relax	-0.124939
-4.449881	higher)	-0.124939
-4.449881	;checkifi<100	-0.124939
-4.449881	dealt	-0.124939
-4.449881	((x2)2)2	-0.124939
-4.449881	language:	-0.124939
-4.449881	dummy[4];	-0.124939
-4.449881	model,	-0.124939
-4.449881	exist	-0.124939
-4.449881	14.15a	-0.124939
-4.449881	(without	-0.124939
-4.449881	Loopunrolling	-0.124939
-4.449881	wrong,	-0.124939
-4.449881	makers	-0.124939
-4.449881	textbook	-0.124939
-4.449881	15;	-0.124939
-4.449881	leaks.	-0.124939
-4.449881	large.	-0.124939
-4.449881	Poor	-0.124939
-4.449881	159	-0.124939
-4.449881	154	-0.124939
-4.449881	/GR-	-0.124939
-4.449881	152	-0.124939
-4.449881	entry.	-0.124939
-4.449881	inlining.	-0.124939
-4.449881	Object1.Hello(),	-0.124939
-4.449881	151	-0.124939
-4.449881	thought-through	-0.124939
-4.449881	vectorize.	-0.124939
-4.449881	suggests	-0.124939
-4.449881	normally.	-0.124939
-4.449881	_mm256_i32gather_ps	-0.124939
-4.449881	Multithreading..............................................................................................................	-0.124939
-4.449881	_endthread(),	-0.124939
-4.449881	ranges)	-0.124939
-4.449881	analogous	-0.124939
-4.449881	summation	-0.124939
-4.449881	system.........................................................................................	-0.124939
-4.449881	statement:	-0.124939
-4.449881	happy	-0.124939
-4.449881	systems"	-0.124939
-4.449881	Gnu/AT&T	-0.124939
-4.449881	Device	-0.124939
-4.449881	mainframes,	-0.124939
-4.449881	device.	-0.124939
-4.449881	loops"	-0.124939
-4.449881	www.agner.org/optimize/#vectorclass.	-0.124939
-4.449881	%1	-0.124939
-4.449881	sort	-0.124939
-4.449881	floata;	-0.124939
-4.449881	updates.	-0.124939
-4.449881	updates,	-0.124939
-4.449881	(b[i]	-0.124939
-4.449881	level-3	-0.124939
-4.449881	14.00	-0.124939
-4.449881	Few	-0.124939
-4.449881	detects	-0.124939
-4.449881	_alloca)	-0.124939
-4.449881	anywhere	-0.124939
-4.449881	B1,	-0.124939
-4.449881	importantly,	-0.124939
-4.449881	.................................................................................................................	-0.124939
-4.449881	a[1],	-0.124939
-4.449881	0.5	-0.124939
-4.449881	Feature	-0.124939
-4.449881	0.6	-0.124939
-4.449881	mask,	-0.124939
-4.449881	list[ARRAYSIZE];	-0.124939
-4.449881	Is16vec4	-0.124939
-4.449881	Details	-0.124939
-4.449881	JavaScript,	-0.124939
-4.449881	NUMCOLUMNS	-0.124939
-4.449881	eight)	-0.124939
-4.449881	First	-0.124939
-4.449881	losing	-0.124939
-4.449881	ReadTSC();	-0.124939
-4.449881	cores:	-0.124939
-4.449881	makers.	-0.124939
-4.449881	panic	-0.124939
-4.449881	sin	-0.124939
-4.449881	rebooted.	-0.124939
-4.449881	WritePrivateProfileString,	-0.124939
-4.449881	(...)	-0.124939
-4.449881	abuse	-0.124939
-4.449881	place.	-0.124939
-4.449881	FuncCol(int);	-0.124939
-4.449881	%.	-0.124939
-4.449881	Const	-0.124939
-4.449881	abs(u.f)	-0.124939
-4.449881	tables:	-0.124939
-4.449881	13)	-0.124939
-4.449881	Interference	-0.124939
-4.449881	times:	-0.124939
-4.449881	min))	-0.124939
-4.449881	conflicts.	-0.124939
-4.449881	133	-0.124939
-4.449881	complicated?	-0.124939
-4.449881	131	-0.124939
-4.449881	<math.h>	-0.124939
-4.449881	Application	-0.124939
-4.449881	people	-0.124939
-4.449881	disks	-0.124939
-4.449881	Further	-0.124939
-4.449881	single-thread	-0.124939
-4.449881	collection,	-0.124939
-4.449881	138	-0.124939
-4.449881	tables.	-0.124939
-4.449881	%0	-0.124939
-4.449881	tables,	-0.124939
-4.449881	superior	-0.124939
-4.449881	powerful.	-0.124939
-4.449881	newsgroup	-0.124939
-4.449881	activate	-0.124939
-4.449881	C2,	-0.124939
-4.449881	massively	-0.124939
-4.449881	unique	-0.124939
-4.449881	multithreading	-0.124939
-4.449881	unlikely	-0.124939
-4.449881	pending	-0.124939
-4.449881	a[0],	-0.124939
-4.449881	kb.	-0.124939
-4.449881	kb,	-0.124939
-4.449881	practice	-0.124939
-4.449881	(RTTI).	-0.124939
-4.449881	(RTTI),	-0.124939
-4.449881	www.agner.org/optimize/testp.zip	-0.124939
-4.449881	....................................................................................................................	-0.124939
-4.449881	ease	-0.124939
-4.449881	resized	-0.124939
-4.449881	Pro	-0.124939
-4.449881	unpredictably	-0.124939
-4.449881	...................................	-0.124939
-4.449881	Internal	-0.124939
-4.449881	comparisons.	-0.124939
-4.449881	trap	-0.124939
-4.449881	overridden	-0.124939
-4.449881	Vec4ui	-0.124939
-4.449881	sub-expressions.	-0.124939
-4.449881	addresses,	-0.124939
-4.449881	opinions	-0.124939
-4.449881	microprocessors,	-0.124939
-4.449881	individually.	-0.124939
-4.449881	p->b;}	-0.124939
-4.449881	initialisation	-0.124939
-4.449881	matters	-0.124939
-4.449881	tune	-0.124939
-4.449881	18.3.	-0.124939
-4.449881	doubled	-0.124939
-4.449881	categories:	-0.124939
-4.449881	routines,	-0.124939
-4.449881	usability.	-0.124939
-4.449881	0x8040);	-0.124939
-4.449881	couple	-0.124939
-4.449881	Parameter	-0.124939
-4.449881	huge).	-0.124939
-4.449881	13.1,	-0.124939
-4.449881	Of	-0.124939
-4.449881	131)	-0.124939
-4.449881	recompile	-0.124939
-4.449881	Main	-0.124939
-4.449881	MKL,	-0.124939
-4.449881	log(c[i]);.	-0.124939
-4.449881	said,	-0.124939
-4.449881	ways).	-0.124939
-4.449881	tests,	-0.124939
-4.449881	trees,	-0.124939
-4.449881	template:	-0.124939
-4.449881	bits).	-0.124939
-4.449881	FactorialTable[b];	-0.124939
-4.449881	Thin	-0.124939
-4.449881	_mm_cvtsd_si32(_mm_load_sd(&x));}	-0.124939
-4.449881	int)i;	-0.124939
-4.449881	tried.	-0.124939
-4.449881	Windows)	-0.124939
-4.449881	needed?	-0.124939
-4.449881	2A,	-0.124939
-4.449881	follows.	-0.124939
-4.449881	static_cast<float>(i);	-0.124939
-4.449881	respect.	-0.124939
-4.449881	bulky	-0.124939
-4.449881	light-weight	-0.124939
-4.449881	kilobyte	-0.124939
-4.449881	correction	-0.124939
-4.449881	supports.	-0.124939
-4.449881	supports,	-0.124939
-4.449881	104	-0.124939
-4.449881	5-10%	-0.124939
-4.449881	0.5ns.	-0.124939
-4.449881	profitable.	-0.124939
-4.449881	255	-0.124939
-4.449881	MASM	-0.124939
-4.449881	150.	-0.124939
-4.449881	directly,	-0.124939
-4.449881	directly.	-0.124939
-4.449881	argue	-0.124939
-4.449881	planned	-0.124939
-4.449881	capability:	-0.124939
-4.449881	destination,	-0.124939
-4.449881	shell	-0.124939
-4.449881	.so).	-0.124939
-4.449881	range");	-0.124939
-4.449881	reversed	-0.124939
-4.449881	4.4,	-0.124939
-4.449881	reflecting	-0.124939
-4.449881	scanf.	-0.124939
-4.449881	isolation	-0.124939
-4.449881	limiting	-0.124939
-4.449881	(signed)	-0.124939
-4.449881	exceptions,	-0.124939
-4.449881	measurements:	-0.124939
-4.449881	1.0	-0.124939
-4.449881	YMM)	-0.124939
-4.449881	Covers	-0.124939
-4.449881	Wednesday,	-0.124939
-4.449881	(XMM	-0.124939
-4.449881	Iu16vec4	-0.124939
-4.449881	patch.	-0.124939
-4.449881	1.,	-0.124939
-4.449881	met	-0.124939
-4.449881	xopintrin.h	-0.124939
-4.449881	FuncC(i+1);	-0.124939
-4.449881	^0	-0.124939
-4.449881	C99	-0.124939
-4.449881	Iu16vec8	-0.124939
-4.449881	x86intrin.h	-0.124939
-4.449881	unpacking	-0.124939
-4.449881	-fsource-asm).	-0.124939
-4.449881	seriously.	-0.124939
-4.449881	__attribute__((aligned(64)));	-0.124939
-4.449881	clash	-0.124939
-4.449881	nature	-0.124939
-4.449881	if(!(a	-0.124939
-4.449881	requirement.	-0.124939
-4.449881	Advanced	-0.124939
-4.449881	propagated	-0.124939
-4.449881	C0::f	-0.124939
-4.449881	I64vec1	-0.124939
-4.449881	(CParent<>)	-0.124939
-4.449881	dilemma.	-0.124939
-4.449881	High	-0.124939
-4.449881	Gnu:	-0.124939
-4.449881	API's.	-0.124939
-4.449881	workaround.	-0.124939
-4.449881	4.1.0,	-0.124939
-4.449881	_mm_empty();	-0.124939
-4.449881	parallel:	-0.124939
-4.449881	our	-0.124939
-4.449881	8.0f)	-0.124939
-4.449881	considerable.	-0.124939
-4.449881	pointer".	-0.124939
-4.449881	"Zen	-0.124939
-4.449881	parameters).	-0.124939
-4.449881	debate	-0.124939
-4.449881	Pragmatic	-0.124939
-4.449881	Multiplications	-0.124939
-4.449881	(1,2,3,4),	-0.124939
-4.449881	146).	-0.124939
-4.449881	green.	-0.124939
-4.449881	candidates	-0.124939
-4.449881	Preprocessor	-0.124939
-4.449881	Out-of-order	-0.124939
-4.449881	he	-0.124939
-4.449881	thousands	-0.124939
-4.449881	menu	-0.124939
-4.449881	chapter,	-0.124939
-4.449881	shuffling	-0.124939
-4.449881	a&&b	-0.124939
-4.449881	arithmetics	-0.124939
-4.449881	Vec32uc	-0.124939
-4.449881	Move	-0.124939
-4.449881	if(!a	-0.124939
-4.449881	(column	-0.124939
-4.449881	gives:	-0.124939
-4.449881	100000000.	-0.124939
-4.449881	formats.	-0.124939
-4.449881	Iss.	-0.124939
-4.449881	incomplete	-0.124939
-4.449881	printf("Alpha");	-0.124939
-4.449881	generations	-0.124939
-4.449881	costless.	-0.124939
-4.449881	guidelines.	-0.124939
-4.449881	knowing	-0.124939
-4.449881	7.43b	-0.124939
-4.449881	18,	-0.124939
-4.449881	Thinking	-0.124939
-4.449881	Relocation	-0.124939
-4.449881	Typical	-0.124939
-4.449881	int16_t	-0.124939
-4.449881	shall	-0.124939
-4.449881	p2;	-0.124939
-4.449881	SafeArray()	-0.124939
-4.449881	closes	-0.124939
-4.449881	uint32_t	-0.124939
-4.449881	2.5f;	-0.124939
-4.449881	breakpoints	-0.124939
-4.449881	favor	-0.124939
-4.449881	g++	-0.124939
-4.449881	.......................................................................................................................	-0.124939
-4.449881	frustrated	-0.124939
-4.449881	2.7,	-0.124939
-4.449881	a[i+3];	-0.124939
-4.449881	columns.	-0.124939
-4.449881	a[N];	-0.124939
-4.449881	Overriding	-0.124939
-4.449881	columns;	-0.124939
-4.449881	a.x	-0.124939
-4.449881	a.y	-0.124939
-4.449881	conversion,	-0.124939
-4.449881	responsible	-0.124939
-4.449881	i++;	-0.124939
-4.449881	throw();	-0.124939
-4.449881	i++.	-0.124939
-4.449881	tools,	-0.124939
-4.449881	precedence,	-0.124939
-4.449881	__intel_cpu_features_init_x().	-0.124939
-4.449881	(Some	-0.124939
-4.449881	well-	-0.124939
-4.449881	a=a*2;	-0.124939
-4.449881	interrupt,	-0.124939
-4.449881	7.1-4,	-0.124939
-4.449881	truncation,	-0.124939
-4.449881	old.	-0.124939
-4.449881	232-1	-0.124939
-4.449881	API.	-0.124939
-4.449881	__m128d	-0.124939
-4.449881	Wesley	-0.124939
-4.449881	73)	-0.124939
-4.449881	73.	-0.124939
-4.449881	s(0.f,	-0.124939
-4.449881	Active	-0.124939
-4.449881	subset,	-0.124939
-4.449881	remedies	-0.124939
-4.449881	resultant	-0.124939
-4.449881	sequence,	-0.124939
-4.449881	safely	-0.124939
-4.449881	1];	-0.124939
-4.449881	kind:	-0.124939
-4.449881	environment,	-0.124939
-4.449881	point).	-0.124939
-4.449881	(everything	-0.124939
-4.449881	2008,	-0.124939
-4.449881	bb)	-0.124939
-4.449881	2008.	-0.124939
-4.449881	intrinsics,	-0.124939
-4.449881	intrinsics.	-0.124939
-4.449881	layer	-0.124939
-4.449881	(FILO)	-0.124939
-4.449881	linking"	-0.124939
-4.449881	14.2a	-0.124939
-4.449881	14.2b	-0.124939
-4.449881	FuncC.	-0.124939
-4.449881	similarly	-0.124939
-4.449881	R2	-0.124939
-4.449881	apart.	-0.124939
-4.449881	FMA3	-0.124939
-4.449881	clarity	-0.124939
-4.449881	these,	-0.124939
-4.449881	these.	-0.124939
-4.449881	Porting	-0.124939
-4.449881	wasted.	-0.124939
-4.449881	memmove,	-0.124939
-4.449881	[esp+12]	-0.124939
-4.449881	reserving	-0.124939
-4.449881	tempting	-0.124939
-4.449881	levels	-0.124939
-4.449881	DelayFiveSeconds()	-0.124939
-4.449881	mimic	-0.124939
-4.449881	Third	-0.124939
-4.449881	identifying	-0.124939
-4.449881	disagree	-0.124939
-4.449881	AQtime,	-0.124939
-4.449881	14.29	-0.124939
-4.449881	14.24	-0.124939
-4.449881	14.25	-0.124939
-4.449881	Vectors	-0.124939
-4.449881	reciprocal,	-0.124939
-4.449881	14.20	-0.124939
-4.449881	14.21	-0.124939
-4.449881	adapt	-0.124939
-4.449881	unrelated	-0.124939
-4.449881	unit-	-0.124939
-4.449881	SelectAddMul_AVX2	-0.124939
-4.449881	1.6;	-0.124939
-4.449881	fighting	-0.124939
-4.449881	__intel_new_strlen	-0.124939
-4.449881	ARM	-0.124939
-4.449881	a[i].	-0.124939
-4.449881	at,	-0.124939
-4.449881	overwrite	-0.124939
-4.449881	seconds.	-0.124939
-4.449881	399	-0.124939
-4.449881	immintrin.h	-0.124939
-4.449881	BSF	-0.124939
-4.449881	seconds;	-0.124939
-4.449881	1.2f;	-0.124939
-4.449881	intended,	-0.124939
-4.449881	makefile.	-0.124939
-4.449881	strings.	-0.124939
-4.449881	supply	-0.124939
-4.449881	queue.	-0.124939
-4.449881	queue,	-0.124939
-4.449881	from),	-0.124939
-4.449881	distinguishing	-0.124939
-4.449881	queue)	-0.124939
-4.449881	comp.lang.asm.x86	-0.124939
-4.449881	9.1b.	-0.124939
-4.449881	cc[]);	-0.124939
-4.449881	Coarse-grained	-0.124939
-4.449881	term	-0.124939
-4.449881	wstring	-0.124939
-4.449881	symbols,	-0.124939
-4.449881	consequences.	-0.124939
-4.449881	activating	-0.124939
-4.449881	sum2;	-0.124939
-4.449881	minimized.	-0.124939
-4.449881	inserted,	-0.124939
-4.449881	*(T*)0;	-0.124939
-4.449881	7.30b.	-0.124939
-4.449881	vary	-0.124939
-4.449881	14.4a	-0.124939
-4.449881	exceed	-0.124939
-4.449881	WriteFile(handle,	-0.124939
-4.449881	&SelectAddMul_SSE41;	-0.124939
-4.449881	11,	-0.124939
-4.449881	allocated.	-0.124939
-4.449881	11.	-0.124939
-4.449881	minimized	-0.124939
-4.449881	alias,	-0.124939
-4.449881	115	-0.124939
-4.449881	Atom	-0.124939
-4.449881	100>	-0.124939
-4.449881	116	-0.124939
-4.449881	111	-0.124939
-4.449881	110	-0.124939
-4.449881	around,	-0.124939
-4.449881	112	-0.124939
-4.449881	-Wstrict-overflow=2,	-0.124939
-4.449881	FuncB(i+1);	-0.124939
-4.449881	minimum,	-0.124939
-4.449881	118	-0.124939
-4.449881	11;	-0.124939
-4.449881	constant:	-0.124939
-4.449881	Jr.:	-0.124939
-4.449881	70).	-0.124939
-4.449881	(&)	-0.124939
-4.449881	list[300]	-0.124939
-4.449881	(&&	-0.124939
-4.449881	Default	-0.124939
-4.449881	locked	-0.124939
-4.449881	1./24.,	-0.124939
-4.449881	executed,	-0.124939
-4.449881	annoying.	-0.124939
-4.449881	94	-0.124939
-4.449881	91	-0.124939
-4.449881	9;	-0.124939
-4.449881	98	-0.124939
-4.449881	platform,	-0.124939
-4.449881	exiting	-0.124939
-4.449881	rare	-0.124939
-4.449881	occupying	-0.124939
-4.449881	_mm_and_si128(c2,	-0.124939
-4.449881	getting	-0.124939
-4.449881	Linux).	-0.124939
-4.449881	7.41a	-0.124939
-4.449881	7.41b	-0.124939
-4.449881	Zero	-0.124939
-4.449881	endl;	-0.124939
-4.449881	0x20	-0.124939
-4.449881	[eax],	-0.124939
-4.449881	Alternative	-0.124939
-4.449881	NOT	-0.124939
-4.449881	reports	-0.124939
-4.449881	registration	-0.124939
-4.449881	error;	-0.124939
-4.449881	calling.	-0.124939
-4.449881	(arrays	-0.124939
-4.449881	__declspec(thread).	-0.124939
-4.449881	2.5;	-0.124939
-4.449881	granularity	-0.124939
-4.449881	&=	-0.124939
-4.449881	accessed.	-0.124939
-4.449881	BIOS	-0.124939
-4.449881	quadratic	-0.124939
-4.449881	d+e,	-0.124939
-4.449881	re-loaded	-0.124939
-4.449881	2.5,	-0.124939
-4.449881	writeable	-0.124939
-4.449881	malloc/free	-0.124939
-4.449881	(single	-0.124939
-4.449881	"Gnu	-0.124939
-4.449881	eax,eax.	-0.124939
-4.449881	nature,	-0.124939
-4.449881	/arch:SSSE2	-0.124939
-4.449881	27.	-0.124939
-4.449881	1997.	-0.124939
-4.449881	Nowadays,	-0.124939
-4.449881	Sandy	-0.124939
-4.449881	Primitives".	-0.124939
-4.449881	;startofFunc	-0.124939
-4.449881	meaningless	-0.124939
-4.449881	---	-0.124939
-4.449881	disturb	-0.124939
-4.449881	task-specific	-0.124939
-4.449881	Temporary	-0.124939
-4.449881	'1'	-0.124939
-4.449881	dummy[0];	-0.124939
-4.449881	replaces	-0.124939
-4.449881	a+1;	-0.124939
-4.449881	reproducible.	-0.124939
-4.449881	incredibly	-0.124939
-4.449881	Efficiency	-0.124939
-4.449881	Re-interpreting	-0.124939
-4.449881	inheritance.	-0.124939
-4.449881	inheritance,	-0.124939
-4.449881	Future	-0.124939
-4.449881	sub-vector	-0.124939
-4.449881	(^)	-0.124939
-4.449881	incurred	-0.124939
-4.449881	b<c)	-0.124939
-4.449881	Foundation	-0.124939
-4.449881	volumes	-0.124939
-4.449881	12.2.	-0.124939

\2-grams:
-1.439966	is the	-0.333215
-0.591930	of the	-0.437412
-1.031236	to the	-0.326717
-1.080074	and the	-0.234744
-0.585141	in the	-0.405119
-0.844614	for the	-0.313995
-0.742054	that the	-0.323471
-2.044553	be the	-0.124939
-1.965873	are the	-0.346788
-1.605319	or the	-0.124939
-2.521360	it the	-0.124939
-0.476203	if the	-0.345810
-0.761694	by the	-0.280691
-0.761847	with the	-0.267252
-0.595476	on the	-0.276588
-2.367493	code the	-0.124939
-1.255526	as the	-0.124939
-1.671693	not the	-0.124939
-0.763439	than the	-0.199040
-1.386016	have the	-0.124939
-1.837523	this the	-0.602060
-1.143231	time the	-0.510290
-0.651526	use the	-0.261158
-0.482348	when the	-0.437868
-0.662661	then the	-0.246672
-0.701154	from the	-0.284640
-0.681328	at the	-0.384576
-1.060659	has the	-0.287666
-0.750382	make the	-0.249877
-0.467434	because the	-0.205595
-1.107174	only the	-0.159701
-0.417363	If the	-0.259637
-1.345045	which the	-0.271067
-0.763136	all the	-0.238882
-0.927657	but the	-0.204120
-2.295614	used the	-0.124939
-1.659330	set the	-0.124939
-0.932460	do the	-0.124939
-0.666032	using the	-0.234083
-1.785924	double the	-0.124939
-0.706505	into the	-0.207913
-1.723758	also the	-0.124939
-2.213498	efficient the	-0.124939
-0.879498	In the	-0.284640
-0.496478	where the	-0.204120
-2.149697	takes the	-0.124939
-1.371471	so the	-0.124939
-1.952690	return the	-0.124939
-0.794443	between the	-0.238882
-2.250362	member the	-0.124939
-1.474332	way the	-0.124939
-2.227499	faster the	-0.124939
-0.645715	makes the	-0.313995
-0.417401	before the	-0.316824
-1.427101	called the	-0.301030
-0.878922	See the	-0.221849
-0.780476	call the	-0.204120
-0.863632	example, the	-0.170696
-2.020485	first the	-0.124939
-1.991796	register the	-0.124939
-1.081860	take the	-0.204120
-1.932779	often the	-0.124939
-1.413222	how the	-0.124939
-1.234874	need the	-0.124939
-1.216017	test the	-0.124939
-0.737943	without the	-0.212089
-1.572077	even the	-0.124939
-0.732797	sure the	-0.166331
-1.831283	always the	-0.124939
-1.194035	access the	-0.124939
-0.721274	out the	-0.467361
-1.167254	case the	-0.124939
-1.317247	cases the	-0.124939
-0.679421	up the	-0.166331
-0.668812	making the	-0.124939
-1.142809	times the	-0.249877
-1.042785	want the	-0.124939
-0.616205	about the	-0.162727
-0.940190	does the	-0.602060
-0.680289	while the	-0.221849
-1.913638	work the	-0.124939
-1.000615	calls the	-0.221849
-0.675780	avoid the	-0.124939
-1.921762	processor the	-0.124939
-0.980282	Use the	-0.221849
-0.895363	But the	-0.124939
-0.893128	through the	-0.124939
-1.086349	compile the	-0.249877
-0.850330	cause the	-0.124939
-1.960447	done the	-0.124939
-1.878600	therefore the	-0.124939
-0.230297	inside the	-0.372723
-1.397190	calculated the	-0.124939
-1.157546	uses the	-0.124939
-0.902636	get the	-0.221849
-1.183254	check the	-0.124939
-1.960846	advantageous the	-0.124939
-1.153061	support the	-0.124939
-0.989581	contains the	-0.726999
-0.474804	whether the	-0.162727
-0.706026	doing the	-0.271067
-0.980779	run the	-0.124939
-0.390171	calculate the	-0.316824
-1.135397	inline the	-0.301030
-0.849353	add the	-0.124939
-0.577759	store the	-0.124939
-1.304426	All the	-0.124939
-1.114258	copy the	-0.124939
-1.333524	optimizing the	-0.124939
-1.333524	well the	-0.124939
-1.849629	simply the	-0.124939
-0.923271	write the	-0.124939
-1.069440	optimize the	-0.301030
-0.928866	above the	-0.425969
-0.592871	However, the	-0.124939
-1.722823	was the	-0.124939
-0.810872	both the	-0.221849
-0.291659	unless the	-0.259637
-0.714359	cases, the	-0.204120
-0.701243	replace the	-0.204120
-0.704839	Therefore, the	-0.204120
-0.882851	see the	-0.124939
-0.678630	allows the	-0.301030
-0.880036	sets the	-0.249877
-1.773241	like the	-0.124939
-0.753990	Using the	-0.124939
-1.734796	model the	-0.124939
-1.240102	block the	-0.425969
-0.529063	put the	-0.249877
-1.840744	needs the	-0.124939
-0.737355	what the	-0.346788
-1.235960	running the	-0.124939
-0.721066	Make the	-0.522879
-1.766271	last the	-0.124939
-0.475501	after the	-0.124939
-0.813135	read the	-0.249877
-0.515698	give the	-0.191886
-1.691601	becomes the	-0.124939
-1.714723	requires the	-0.124939
-0.792931	load the	-0.124939
-1.156778	control the	-0.124939
-1.832245	assume the	-0.124939
-0.770323	calling the	-0.249877
-1.167254	shows the	-0.124939
-0.285639	improve the	-0.388180
-0.475246	Here, the	-0.191886
-1.735098	know the	-0.124939
-1.708426	generate the	-0.124939
-1.626183	usually the	-0.124939
-0.746632	reduce the	-0.124939
-1.134502	goes the	-0.124939
-0.475246	choose the	-0.124939
-1.131065	made the	-0.124939
-1.659816	function, the	-0.124939
-1.121497	start the	-0.124939
-0.876155	smaller the	-0.124939
-1.100222	around the	-0.124939
-1.634992	reductions the	-0.124939
-1.634992	go the	-0.124939
-1.082761	tested the	-0.124939
-1.611238	supports the	-0.124939
-0.426941	change the	-0.124939
-0.402127	off the	-0.191886
-1.633792	Supports the	-0.124939
-1.646924	Windows, the	-0.124939
-0.476693	gives the	-0.204120
-0.827016	inlining the	-0.124939
-0.373340	Unfortunately, the	-0.191886
-0.311791	find the	-0.124939
-0.801010	produce the	-0.301030
-0.373340	including the	-0.191886
-0.177394	outside the	-0.346788
-1.008832	still the	-0.124939
-0.773080	prevent the	-0.301030
-1.602473	destructor the	-0.124939
-0.226190	prevents the	-0.477121
-0.226190	tell the	-1.079181
-1.677343	repeat the	-0.124939
-1.023087	unroll the	-0.425969
-0.249643	On the	-0.550907
-1.584265	Linux, the	-0.124939
-1.005641	Note the	-0.425969
-0.476594	When the	-0.124939
-0.738862	Avoid the	-0.124939
-0.738862	copying the	-0.124939
-1.570288	accessing the	-0.124939
-0.194584	until the	-0.176091
-0.976648	adding the	-0.124939
-0.980168	causes the	-0.124939
-1.530869	processing the	-0.124939
-0.314233	divide the	-0.271067
-1.508853	mix the	-0.124939
-0.279471	fit the	-0.367977
-0.952533	predict the	-0.124939
-0.442942	though the	-0.124939
-0.945406	execute the	-0.124939
-0.959779	compiling the	-0.124939
-0.952533	convert the	-0.124939
-0.952533	least the	-0.124939
-0.706133	containing the	-0.124939
-0.941885	handle the	-0.425969
-0.557061	during the	-0.124939
-0.405154	includes the	-0.124939
-0.674499	insert the	-0.124939
-1.526156	consider the	-0.124939
-1.511714	loading the	-0.124939
-0.520720	below the	-0.425969
-0.918352	reading the	-0.124939
-0.918352	delay the	-0.124939
-0.242445	calculating the	-0.191886
-0.241682	enable the	-0.492916
-1.484196	e.g. the	-0.124939
-0.914744	keep the	-0.124939
-1.497737	align the	-0.124939
-1.497737	allow the	-0.124939
-1.471065	rarely the	-0.124939
-0.674499	under the	-0.124939
-0.678652	expect the	-0.124939
-0.925659	except the	-0.425969
-1.484196	why the	-0.124939
-1.515172	whenever the	-0.124939
-0.873351	unrolling the	-0.425969
-0.480780	swap the	-0.425969
-0.631045	modify the	-0.124939
-0.480780	Store the	-0.726999
-1.442804	compiler, the	-0.124939
-0.479327	setting the	-0.124939
-0.368229	within the	-0.221849
-1.515172	apply the	-0.124939
-0.633107	Obviously, the	-0.124939
-1.515172	allocate the	-0.124939
-0.591502	implement the	-0.124939
-1.424564	chosen the	-0.124939
-1.410587	contain the	-0.124939
-0.591502	help the	-0.124939
-0.831202	away the	-0.124939
-0.037206	share the	-1.079181
-1.502115	near the	-0.124939
-0.321350	stores the	-0.124939
-0.320232	finding the	-0.221849
-1.439006	purposes the	-0.124939
-0.230434	vectorize the	-0.602060
-1.439006	include the	-0.124939
-0.435022	involves the	-0.249877
-1.439006	Here the	-0.124939
-1.439006	across the	-0.124939
-1.387853	once the	-0.124939
-1.402792	interrupt the	-0.124939
-1.402792	almost the	-0.124939
-1.434305	multiplying the	-0.124939
-0.783687	down the	-0.425969
-0.542440	exactly the	-0.301030
-1.329861	had the	-0.124939
-1.344800	measure the	-0.124939
-1.344800	vector, the	-0.124939
-1.344800	delete the	-0.124939
-1.329861	Likewise, the	-0.124939
-1.344800	mode, the	-0.124939
-1.329861	update the	-0.124939
-0.725695	generates the	-0.425969
-0.733064	executing the	-0.124939
-0.725695	free the	-0.425969
-0.725695	hold the	-0.124939
-1.344800	system, the	-0.124939
-0.725695	changes the	-0.425969
-1.329861	storing the	-0.124939
-0.725695	Now the	-0.124939
-0.482357	remove the	-0.124939
-1.360270	transpose the	-0.124939
-1.293324	predictable the	-0.124939
-1.277853	plus the	-0.124939
-0.147505	increase the	-0.124939
-0.662417	identify the	-0.124939
-1.326023	Add the	-0.124939
-0.662417	declare the	-0.124939
-0.669849	fits the	-0.124939
-1.277853	giving the	-0.124939
-1.277853	above, the	-0.124939
-1.277853	detect the	-0.124939
-1.293324	show the	-0.124939
-1.293324	Test the	-0.124939
-1.277853	evaluate the	-0.124939
-1.293324	reference, the	-0.124939
-0.666117	half the	-0.124939
-1.326023	converting the	-0.124939
-0.666117	specifying the	-0.124939
-1.309366	follows the	-0.124939
-1.277853	comparing the	-0.124939
-1.309366	prefetch the	-0.124939
-1.326023	static, the	-0.124939
-1.293324	Testing the	-0.124939
-1.309366	general, the	-0.124939
-1.277853	(i.e. the	-0.124939
-0.662417	avoiding the	-0.124939
-1.214142	increment the	-0.124939
-0.184137	economize the	-0.249877
-0.586936	overcome the	-0.124939
-1.214142	swapping the	-0.124939
-1.230185	on, the	-0.124939
-0.340422	reducing the	-0.124939
-0.184137	worth the	-0.425969
-1.230185	specifies the	-0.124939
-1.230185	OR the	-0.124939
-1.214142	lists the	-0.124939
-0.586936	select the	-0.124939
-1.214142	list, the	-0.124939
-0.586936	case, the	-0.124939
-0.586936	over the	-0.124939
-1.214142	hand, the	-0.124939
-1.230185	split the	-0.124939
-1.246842	limit the	-0.124939
-0.184137	follow the	-0.124939
-1.214142	increased the	-0.124939
-0.586936	Only the	-0.124939
-1.214142	adds the	-0.124939
-1.214142	Fortunately, the	-0.124939
-0.340422	specify the	-0.124939
-1.214142	unfortunately the	-0.124939
-1.214142	(In the	-0.124939
-1.214142	compare the	-0.124939
-0.590668	Is the	-0.425969
-0.590668	Typically, the	-0.124939
-0.590668	gets the	-0.124939
-0.184137	tells the	-0.124939
-0.586936	wrap the	-0.425969
-1.230185	increasing the	-0.124939
-1.246842	definitely the	-0.124939
-1.133275	BSD, the	-0.124939
-0.088699	Choosing the	-0.726999
-1.133275	place the	-0.124939
-0.245623	overlap the	-0.124939
-1.167254	turning the	-0.124939
-0.493758	obtain the	-0.425969
-0.497522	enables the	-0.425969
-0.493758	explain the	-0.124939
-1.133275	against the	-0.124939
-0.493758	declaring the	-0.124939
-1.133275	move the	-0.124939
-1.133275	Can the	-0.124939
-1.167254	chooses the	-0.124939
-0.497522	choosing the	-0.124939
-1.167254	commonly the	-0.124939
-1.133275	transferring the	-0.124939
-1.149932	splitting the	-0.124939
-1.149932	Whenever the	-0.124939
-0.493758	avoids the	-0.124939
-1.133275	begin the	-0.124939
-1.149932	words, the	-0.124939
-1.133275	illustrates the	-0.124939
-0.493758	mirror the	-0.124939
-1.133275	changing the	-0.124939
-0.493758	force the	-0.124939
-0.493758	opens the	-0.425969
-0.245623	finished the	-0.124939
-1.024993	Storing the	-0.124939
-1.024993	reordering the	-0.124939
-1.024993	prefetching the	-0.124939
-1.024993	distance the	-0.124939
-1.024993	aligning the	-0.124939
-0.372583	reload the	-0.124939
-1.024993	optimization, the	-0.124939
-1.042316	Whether the	-0.124939
-1.024993	removed the	-0.124939
-1.024993	optimizes the	-0.124939
-1.024993	Return the	-0.124939
-0.372583	fact, the	-0.124939
-1.024993	ignore the	-0.124939
-0.122806	reflects the	-0.124939
-0.372583	manipulate the	-0.124939
-0.372583	copies the	-0.124939
-0.372583	study the	-0.124939
-0.372583	bypassing the	-0.124939
-1.024993	skip the	-0.124939
-1.024993	allocates the	-0.124939
-0.372583	offer the	-0.124939
-1.024993	At the	-0.124939
-0.372583	leaving the	-0.425969
-0.372583	Consider the	-0.124939
-1.024993	leave the	-0.124939
-1.042316	With the	-0.124939
-1.042316	Sometimes the	-0.124939
-1.024993	justify the	-0.124939
-1.024993	contrary, the	-0.124939
-0.372583	cover the	-0.425969
-1.024993	way, the	-0.124939
-0.372583	Today, the	-0.124939
-1.024993	focus the	-0.124939
-1.024993	probably the	-0.124939
-1.024993	improving the	-0.124939
-1.042316	holds the	-0.124939
-0.372583	moving the	-0.425969
-1.024993	since the	-0.124939
-0.122806	beyond the	-0.602060
-0.372583	organizing the	-0.124939
-1.024993	open the	-0.124939
-0.372583	measuring the	-0.124939
-0.372583	utilize the	-0.124939
-1.042316	represent the	-0.124939
-0.372583	measures the	-0.124939
-0.122806	bypass the	-0.124939
-1.042316	evict the	-0.124939
-1.024993	Copying the	-0.124939
-1.024993	market the	-0.124939
-1.024993	Instead, the	-0.124939
-1.024993	joining the	-0.124939
-1.024993	determine the	-0.124939
-0.372583	interpreting the	-0.425969
-0.866224	modifying the	-0.124939
-0.866224	lack the	-0.124939
-0.866224	past the	-0.124939
-0.866224	override the	-0.124939
-0.866224	exit the	-0.124939
-0.866224	having the	-0.124939
-0.866224	reduces the	-0.124939
-0.866224	explains the	-0.124939
-0.866224	emulate the	-0.124939
-0.866224	Enable the	-0.124939
-0.866224	Adding the	-0.124939
-0.200289	Organize the	-0.425969
-0.866224	loop, the	-0.124939
-0.200289	invoking the	-0.124939
-0.866224	calculates the	-0.124939
-0.200289	Finding the	-0.425969
-0.200289	putting the	-0.124939
-0.200289	Overcoming the	-0.425969
-0.866224	Take the	-0.124939
-0.200289	increases the	-0.124939
-0.866224	invalidate the	-0.124939
-0.866224	So the	-0.124939
-0.866224	Nevertheless, the	-0.124939
-0.866224	Unrolling the	-0.124939
-0.866224	determines the	-0.124939
-0.200289	behind the	-0.124939
-0.200289	isolate the	-0.124939
-0.866224	verifying the	-0.124939
-0.866224	Now, the	-0.124939
-0.200289	Especially the	-0.124939
-0.866224	requiring the	-0.124939
-0.866224	influence the	-0.124939
-0.200289	loads the	-0.124939
-0.866224	draw the	-0.124939
-0.200289	sharing the	-0.425969
-0.200289	redo the	-0.124939
-0.866224	deleting the	-0.124939
-0.866224	covered the	-0.124939
-0.866224	Sometimes, the	-0.124939
-0.866224	crash the	-0.124939
-0.866224	reserve the	-0.124939
-0.866224	Both the	-0.124939
-0.866224	loaded, the	-0.124939
-0.866224	extending the	-0.124939
-0.200289	forces the	-0.124939
-0.200289	stop the	-0.124939
-0.866224	organize the	-0.124939
-0.866224	sees the	-0.124939
-0.866224	studying the	-0.124939
-0.866224	compares the	-0.124939
-0.200289	fix the	-0.124939
-0.866224	involve the	-0.124939
-0.866224	removing the	-0.124939
-0.866224	interpret the	-0.124939
-0.866224	flip the	-0.124939
-0.866224	reasons, the	-0.124939
-0.866224	relieving the	-0.124939
-0.866224	Put the	-0.124939
-0.583236	ignoring the	-0.124939
-0.583236	owns the	-0.124939
-0.583236	limits the	-0.124939
-0.583236	bit, the	-0.124939
-0.583236	Otherwise the	-0.124939
-0.583236	trigger the	-0.124939
-0.583236	Re-do the	-0.124939
-0.583236	separating the	-0.124939
-0.583236	ago, the	-0.124939
-0.583236	reuse the	-0.124939
-0.583236	Including the	-0.124939
-0.583236	restores the	-0.124939
-0.583236	telling the	-0.124939
-0.583236	Calculating the	-0.124939
-0.583236	thank the	-0.124939
-0.583236	Choose the	-0.124939
-0.583236	forbids the	-0.124939
-0.583236	Weighing the	-0.124939
-0.583236	eliminating the	-0.124939
-0.583236	emulating the	-0.124939
-0.583236	satisfies the	-0.124939
-0.583236	among the	-0.124939
-0.583236	signaling the	-0.124939
-0.583236	solving the	-0.124939
-0.583236	lacks the	-0.124939
-0.583236	loose the	-0.124939
-0.583236	Omitting the	-0.124939
-0.583236	controlling the	-0.124939
-0.583236	trying the	-0.124939
-0.583236	inverting the	-0.124939
-0.583236	interleave the	-0.124939
-0.583236	justifies the	-0.124939
-0.583236	weigh the	-0.124939
-0.583236	restart the	-0.124939
-0.583236	throughout the	-0.124939
-0.583236	eliminates the	-0.124939
-0.583236	dropping the	-0.124939
-0.583236	12.2, the	-0.124939
-0.583236	structure), the	-0.124939
-0.583236	Occasionally, the	-0.124939
-0.583236	explaining the	-0.124939
-0.583236	holding the	-0.124939
-0.583236	Combining the	-0.124939
-0.583236	fetch the	-0.124939
-0.583236	constructing the	-0.124939
-0.583236	enters the	-0.124939
-0.583236	steal the	-0.124939
-0.583236	Avoiding the	-0.124939
-0.583236	localize the	-0.124939
-0.583236	Repeating the	-0.124939
-0.583236	28, the	-0.124939
-0.583236	to) the	-0.124939
-0.583236	shows, the	-0.124939
-0.583236	paying the	-0.124939
-0.583236	provide the	-0.124939
-0.583236	in-between the	-0.124939
-0.583236	abusing the	-0.124939
-0.583236	wrapping the	-0.124939
-0.583236	Underestimating the	-0.124939
-0.583236	reinvent the	-0.124939
-0.583236	summarizes the	-0.124939
-0.583236	terminates the	-0.124939
-0.583236	puts the	-0.124939
-0.583236	merge the	-0.124939
-0.583236	combine the	-0.124939
-0.583236	upon the	-0.124939
-0.583236	isolates the	-0.124939
-0.583236	Sort the	-0.124939
-0.583236	reorganize the	-0.124939
-0.583236	despite the	-0.124939
-0.583236	extracts the	-0.124939
-0.583236	ends the	-0.124939
-0.583236	Because the	-0.124939
-0.583236	interprets the	-0.124939
-0.583236	Taking the	-0.124939
-0.583236	12.1a, the	-0.124939
-0.583236	met: the	-0.124939
-0.583236	Therefore the	-0.124939
-0.583236	crashes the	-0.124939
-0.583236	deallocate the	-0.124939
-0.583236	During the	-0.124939
-0.583236	view the	-0.124939
-0.583236	violates the	-0.124939
-0.583236	consult the	-0.124939
-0.583236	event, the	-0.124939
-0.583236	minimize the	-0.124939
-0.583236	12.1b, the	-0.124939
-0.583236	applying the	-0.124939
-0.583236	refresh the	-0.124939
-0.583236	concentrate the	-0.124939
-0.583236	shares the	-0.124939
-0.583236	namely the	-0.124939
-0.583236	finds the	-0.124939
-0.583236	++b; the	-0.124939
-0.583236	stress the	-0.124939
-0.583236	collect the	-0.124939
-0.583236	Declare the	-0.124939
-0.583236	tune the	-0.124939
-0.583236	tests, the	-0.124939
-0.583236	Move the	-0.124939
-0.583236	closes the	-0.124939
-0.583236	Overriding the	-0.124939
-0.583236	mimic the	-0.124939
-0.583236	overwrite the	-0.124939
-0.583236	activating the	-0.124939
-0.583236	exiting the	-0.124939
-0.583236	disturb the	-0.124939
-0.583236	replaces the	-0.124939
-0.583236	Re-interpreting the	-0.124939
-2.322581	a is	-0.124939
-2.324172	to is	-0.124939
-2.034724	and is	-0.124939
-2.385539	for is	-0.124939
-1.139762	that is	-0.274237
-0.436133	it is	-0.448633
-0.838614	function is	-0.381550
-2.576919	if is	-0.124939
-2.363926	by is	-0.124939
-2.399045	on is	-0.124939
-0.854820	code is	-0.276206
-2.102172	as is	-0.124939
-0.560563	This is	-0.455258
-1.949588	int is	-0.124939
-1.164548	compiler is	-0.279841
-2.266110	x is	-0.124939
-1.027687	this is	-0.372723
-1.230371	time is	-0.191886
-2.335921	use is	-0.124939
-1.787484	A is	-0.124939
-0.157792	It is	-0.620945
-1.681467	memory is	-0.124939
-1.671001	data is	-0.124939
-0.805923	program is	-0.207126
-1.638971	functions is	-0.124939
-1.939134	CPU is	-0.124939
-1.849194	other is	-0.124939
-2.475216	instruction is	-0.124939
-1.612951	point is	-0.249877
-1.369773	loop is	-0.124939
-0.611623	which is	-0.227601
-2.140856	all is	-0.124939
-1.603637	but is	-0.249877
-2.334868	used is	-0.124939
-1.570252	one is	-0.124939
-1.199276	cache is	-0.234083
-1.448036	integer is	-0.124939
-0.591535	set is	-0.425969
-1.122986	class is	-0.170696
-1.857967	do is	-0.425969
-1.321995	example is	-0.301030
-2.281087	compilers is	-0.124939
-1.622379	double is	-0.124939
-1.052090	size is	-0.212089
-1.001307	pointer is	-0.162727
-1.034318	b is	-0.263241
-1.269590	library is	-0.204120
-1.477018	i is	-0.124939
-0.790282	object is	-0.209260
-1.843580	number is	-0.124939
-2.173458	static is	-0.124939
-0.269304	there is	-0.751358
-1.106781	C++ is	-0.124939
-0.401440	There is	-1.005752
-1.084527	array is	-0.124939
-2.192285	where is	-0.124939
-1.305865	version is	-0.221849
-1.144785	value is	-0.191886
-1.404230	objects is	-0.124939
-0.955834	variable is	-0.279841
-1.739652	so is	-0.124939
-1.502945	variables is	-0.124939
-2.010776	2 is	-0.124939
-1.236962	table is	-0.124939
-1.027825	performance is	-0.249877
-1.657253	software is	-0.124939
-1.765328	order is	-0.124939
-1.145615	branch is	-0.124939
-1.741643	member is	-0.124939
-1.059497	way is	-0.367977
-1.466150	elements is	-0.301030
-1.203110	address is	-0.124939
-2.097905	call is	-0.124939
-2.069723	bit is	-0.124939
-1.411580	register is	-0.124939
-1.141857	optimization is	-0.221849
-2.029982	libraries is	-0.124939
-1.392959	template is	-0.124939
-1.120229	registers is	-0.124939
-2.062041	pointers is	-0.124939
-1.369429	user is	-0.124939
-0.505848	method is	-0.234083
-1.199256	access is	-0.124939
-1.542252	16 is	-0.124939
-1.350737	SSE2 is	-0.124939
-1.317848	system is	-0.124939
-1.322625	file is	-0.301030
-2.024252	programming is	-0.124939
-1.985201	bits is	-0.124939
-1.992766	operations is	-0.124939
-2.042162	0 is	-0.124939
-1.536888	type is	-0.124939
-1.171832	case is	-0.249877
-1.317944	processors is	-0.124939
-0.882790	constant is	-0.669007
-1.141264	error is	-0.124939
-1.027018	stack is	-0.221849
-1.501385	CPUs is	-0.124939
-1.979371	arrays is	-0.124939
-1.988342	calls is	-0.124939
-1.988342	execution is	-0.124939
-0.845430	result is	-0.271067
-0.991092	processor is	-0.221849
-1.952207	bytes is	-0.124939
-1.462755	threads is	-0.124939
-1.238747	element is	-0.124939
-0.756686	language is	-0.124939
-0.823734	speed is	-0.271067
-2.048736	c is	-0.124939
-1.463998	much is	-0.425969
-1.221827	thread is	-0.124939
-1.420016	etc. is	-0.124939
-1.454582	exception is	-0.124939
-1.442809	allocated is	-0.425969
-1.212434	overflow is	-0.124939
-1.439915	integers is	-0.124939
-1.434184	option is	-0.124939
-0.852162	matrix is	-0.301030
-1.426045	Linux is	-0.124939
-1.426045	AVX is	-0.425969
-1.866973	classes is	-0.124939
-1.191308	precision is	-0.124939
-1.874954	line is	-0.124939
-1.914913	works is	-0.124939
-1.897153	optimized is	-0.124939
-0.927182	manual is	-0.221849
-1.971922	calculation is	-0.124939
-2.016631	check is	-0.124939
-0.736286	problem is	-0.367977
-0.417448	solution is	-0.316824
-1.396590	container is	-0.124939
-1.848848	operators is	-0.124939
-2.004072	i++) is	-0.124939
-0.884752	list is	-0.124939
-2.030126	likely is	-0.124939
-1.137515	structure is	-0.124939
-1.840279	standard is	-0.124939
-1.878139	hardware is	-0.124939
-1.337274	1 is	-0.124939
-1.868360	mode is	-0.124939
-1.868360	store is	-0.124939
-1.884595	values is	-0.124939
-1.951691	sign is	-0.124939
-1.928154	copy is	-0.124939
-1.365286	information is	-0.124939
-1.338687	addresses is	-0.124939
-0.829450	counter is	-0.124939
-0.460309	count is	-0.212089
-0.816617	allocation is	-0.124939
-1.825608	write is	-0.124939
-1.830151	problems is	-0.124939
-0.911446	space is	-0.124939
-1.277959	microprocessor is	-0.124939
-1.785010	branches is	-0.124939
-0.622930	operator is	-0.271067
-1.266344	multiplication is	-0.124939
-1.272113	application is	-0.124939
-0.686123	caching is	-0.204120
-1.267495	sets is	-0.124939
-1.037473	expression is	-0.124939
-0.772997	implementation is	-0.221849
-0.879261	handling is	-0.249877
-1.808206	members is	-0.124939
-0.862228	model is	-0.124939
-1.247489	block is	-0.124939
-1.761150	name is	-0.124939
-1.253456	conversion is	-0.124939
-0.745947	disadvantage is	-0.823909
-1.743421	zero is	-0.124939
-1.004471	what is	-0.124939
-0.649799	parameter is	-0.124939
-1.238741	division is	-0.124939
-1.247913	reference is	-0.124939
-1.794678	source is	-0.124939
-1.260450	cost is	-0.124939
-0.842177	reason is	-0.726999
-0.827238	n is	-0.124939
-0.829692	string is	-0.249877
-0.980729	keyword is	-0.124939
-1.211276	lookup is	-0.124939
-1.706017	&& is	-0.124939
-1.241967	difference is	-0.425969
-1.765448	addition is	-0.124939
-0.980729	mechanism is	-0.301030
-1.715411	|| is	-0.124939
-1.759012	optimizations is	-0.124939
-1.685429	framework is	-0.124939
-0.681200	linking is	-0.221849
-0.948247	microprocessors is	-0.124939
-1.186842	load is	-0.124939
-1.840749	assume is	-0.124939
-1.162596	numbers is	-0.124939
-1.150579	platform is	-0.124939
-1.715497	dispatch is	-0.124939
-1.159560	interface is	-0.124939
-1.664240	AVX2 is	-0.124939
-0.901254	process is	-0.301030
-0.908338	r is	-0.124939
-0.899500	storage is	-0.124939
-0.915540	union is	-0.301030
-1.682474	10 is	-0.124939
-0.640616	feature is	-0.221849
-0.879533	constructor is	-0.301030
-1.780711	a[i] is	-0.124939
-1.692065	#define is	-0.124939
-1.766884	points is	-0.124939
-1.715601	switch is	-0.124939
-1.727866	range is	-0.124939
-0.452740	here is	-0.271067
-1.678850	core is	-0.124939
-1.085965	section is	-0.124939
-1.101251	contentions is	-0.425969
-0.703365	computer is	-0.124939
-1.640913	conversions is	-0.124939
-1.676713	errors is	-0.124939
-1.689334	columns is	-0.124939
-0.478410	p is	-0.204120
-0.678282	syntax is	-0.124939
-1.065686	STL is	-0.124939
-0.830148	profiler is	-0.301030
-0.677036	index is	-0.425969
-1.068743	inlining is	-0.124939
-1.046893	network is	-0.124939
-1.624493	b) is	-0.124939
-1.674304	response is	-0.124939
-1.624493	lines is	-0.124939
-1.037657	operation is	-0.425969
-1.043793	checking is	-0.124939
-1.010751	task is	-0.124939
-1.594529	limited is	-0.124939
-1.618721	math is	-0.124939
-1.571614	database is	-0.124939
-1.606457	constants is	-0.124939
-1.644340	bool is	-0.124939
-1.631342	frame is	-0.124939
-0.777501	destructor is	-0.301030
-0.782914	efficiency is	-0.124939
-1.010751	algorithm is	-0.124939
-1.612156	strings is	-0.124939
-0.480653	exponent is	-0.124939
-1.003829	possibility is	-0.425969
-1.612156	conditions is	-0.124939
-1.562345	testing is	-0.124939
-1.562345	alignment is	-0.124939
-1.639382	compatibility is	-0.124939
-1.550736	macro is	-0.124939
-0.314724	operand is	-0.191886
-0.752549	effect is	-0.124939
-0.712351	containers is	-0.124939
-1.551774	priority is	-0.124939
-0.557105	frequency is	-0.124939
-0.710555	iteration is	-0.124939
-0.557105	N is	-0.425969
-1.577394	thing is	-0.124939
-1.527583	handle is	-0.124939
-1.577394	heap is	-0.124939
-1.553005	nontemporal is	-0.124939
-1.513986	bounds is	-0.124939
-1.581112	improved is	-0.124939
-0.934542	situation is	-0.124939
-1.526607	message is	-0.124939
-0.676367	delay is	-0.124939
-0.674563	condition is	-0.124939
-0.674563	cores is	-0.301030
-0.915317	ebx is	-0.124939
-0.924823	list[i] is	-0.124939
-1.460329	statements is	-0.124939
-1.472593	chapter is	-0.124939
-1.472593	buffer is	-0.124939
-1.485214	unrolling is	-0.124939
-0.886646	CriticalFunction is	-0.425969
-1.525438	fraction is	-0.124939
-1.554486	length is	-0.124939
-0.482990	f is	-0.124939
-1.511612	penalty is	-0.124939
-0.880239	F1 is	-0.124939
-0.634974	alternative is	-0.301030
-0.633170	stride is	-0.301030
-1.525438	'this' is	-0.124939
-1.472593	row is	-0.124939
-1.460329	metaprogramming is	-0.124939
-1.498212	map is	-0.124939
-1.426836	contain is	-0.124939
-1.465854	device is	-0.124939
-0.589217	transfer is	-0.124939
-0.834481	blocks is	-0.124939
-0.847392	15.1b is	-0.124939
-0.591028	latter is	-0.124939
-1.439456	chains is	-0.124939
-1.426836	brand is	-0.124939
-1.439456	diagonal is	-0.124939
-0.834481	purposes is	-0.124939
-1.465854	Time is	-0.124939
-0.840888	everything is	-0.124939
-1.452455	Here is	-0.124939
-1.493962	OpenMP is	-0.124939
-0.594675	cycle is	-0.301030
-1.452455	aliasing is	-0.124939
-0.594675	tool is	-0.124939
-1.428528	memcpy is	-0.124939
-0.539876	parallelism is	-0.124939
-0.789736	#if is	-0.124939
-0.786521	unit is	-0.124939
-1.414702	label is	-0.124939
-1.388304	iterations is	-0.124939
-1.401302	misprediction is	-0.124939
-1.401302	binding is	-0.124939
-1.388304	background is	-0.124939
-1.388304	chain is	-0.124939
-1.401302	algorithms is	-0.124939
-1.401302	inputs is	-0.124939
-1.414702	who is	-0.124939
-1.388304	DLL is	-0.124939
-1.428528	required is	-0.124939
-1.414702	volatile is	-0.124939
-1.414702	misses is	-0.124939
-1.384817	purpose is	-0.124939
-1.356710	-fpic is	-0.124939
-1.356710	D is	-0.124939
-1.356710	xn is	-0.124939
-1.356710	delete is	-0.124939
-0.731744	itself is	-0.425969
-1.343310	switches is	-0.124939
-1.356710	trick is	-0.124939
-0.731744	body is	-0.124939
-1.356710	generates is	-0.124939
-1.343310	exceptions is	-0.124939
-1.384817	CPUID is	-0.124939
-1.370536	manuals is	-0.124939
-1.343310	T is	-0.124939
-1.384817	representation is	-0.124939
-1.356710	polymorphism is	-0.124939
-1.343310	factor is	-0.124939
-1.370536	log is	-0.124939
-1.317870	principle is	-0.124939
-0.664797	Func is	-0.425969
-0.668037	notice is	-0.425969
-1.303589	portability is	-0.124939
-1.289763	debugger is	-0.124939
-1.289763	base is	-0.124939
-1.303589	compilation is	-0.124939
-1.303589	?Func@@YAXQAHAAH@Z is	-0.124939
-1.347923	INSTRSET is	-0.124939
-1.224408	Fortran is	-0.124939
-1.238689	inheritance is	-0.124939
-1.224408	swapping is	-0.124939
-1.253456	memset is	-0.124939
-0.592119	effort is	-0.425969
-1.238689	propagation is	-0.124939
-1.238689	reduction is	-0.124939
-0.588855	abc is	-0.124939
-1.224408	recommendation is	-0.124939
-1.224408	package is	-0.124939
-1.224408	jobs is	-0.124939
-1.224408	n! is	-0.124939
-1.224408	Basic is	-0.124939
-1.268742	malloc is	-0.124939
-1.224408	15.1c is	-0.124939
-1.224408	macros is	-0.124939
-1.253456	prefer is	-0.124939
-0.184516	divisor is	-0.249877
-1.141779	slices is	-0.124939
-1.156546	enum is	-0.124939
-1.141779	estimate is	-0.124939
-0.246169	m is	-0.301030
-1.171832	specialization is	-0.124939
-1.141779	pre-increment is	-0.124939
-1.171832	ownership is	-0.124939
-0.495209	*p+2 is	-0.425969
-1.156546	14.9 is	-0.124939
-1.141779	modification is	-0.124939
-1.156546	powN is	-0.124939
-0.495209	bottleneck is	-0.124939
-1.156546	area is	-0.124939
-1.156546	consequence is	-0.124939
-1.156546	assumption is	-0.124939
-1.156546	Fastcall is	-0.124939
-1.141779	(1) is	-0.124939
-1.141779	alloca is	-0.124939
-1.141779	original is	-0.124939
-0.495209	unit-testing is	-0.124939
-1.141779	interval is	-0.124939
-1.156546	μs is	-0.124939
-1.156546	bytes) is	-0.124939
-1.141779	hyperthreading is	-0.124939
-0.495209	format is	-0.124939
-0.373559	conclusion is	-0.425969
-1.031607	abstraction is	-0.124939
-1.031607	manipulation is	-0.124939
-0.373559	dividend is	-0.124939
-1.031607	coefficients is	-0.124939
-1.031607	labels is	-0.124939
-1.031607	focus is	-0.124939
-1.031607	longjmp is	-0.124939
-1.031607	(STL) is	-0.124939
-1.046893	matrix[r][c] is	-0.124939
-1.031607	bookkeeping is	-0.124939
-1.031607	Hyperthreading is	-0.124939
-1.046893	a+b is	-0.124939
-1.031607	argument is	-0.124939
-1.046893	"what is	-0.124939
-1.031607	market is	-0.124939
-1.031607	11.3 is	-0.124939
-1.046893	*(p++) is	-0.124939
-0.373559	product is	-0.124939
-1.046893	allocations is	-0.124939
-1.031607	$B1$2 is	-0.124939
-0.870802	goal is	-0.124939
-0.870802	CriticalInnerFunction is	-0.124939
-0.870802	CParent is	-0.124939
-0.870802	N&(N-1) is	-0.124939
-0.870802	comparison is	-0.124939
-0.870802	search, is	-0.124939
-0.870802	footprint is	-0.124939
-0.870802	proxy is	-0.124939
-0.870802	safety is	-0.124939
-0.870802	considering is	-0.124939
-0.870802	Neither is	-0.124939
-0.870802	file, is	-0.124939
-0.870802	branching is	-0.124939
-0.870802	interposition is	-0.124939
-0.870802	14.1c is	-0.124939
-0.870802	matrix[j][0] is	-0.124939
-0.200781	MemberPointer is	-0.425969
-0.870802	1's is	-0.124939
-0.870802	(This is	-0.124939
-0.870802	Friday is	-0.124939
-0.870802	queries is	-0.124939
-0.870802	bottlenecks is	-0.124939
-0.870802	bitfield is	-0.124939
-0.870802	coprocessors is	-0.124939
-0.870802	any, is	-0.124939
-0.870802	8.21 is	-0.124939
-0.870802	bug is	-0.124939
-0.585616	attempt is	-0.124939
-0.585616	burden is	-0.124939
-0.585616	(there is	-0.124939
-0.585616	LLVM is	-0.124939
-0.585616	she is	-0.124939
-0.585616	14.7b is	-0.124939
-0.585616	cc[i]+2 is	-0.124939
-0.585616	technique is	-0.124939
-0.585616	targets is	-0.124939
-0.585616	throw()specification is	-0.124939
-0.585616	u.d is	-0.124939
-0.585616	Unsigned is	-0.124939
-0.585616	N-1 is	-0.124939
-0.585616	i&15 is	-0.124939
-0.585616	CPU-type is	-0.124939
-0.585616	eee is	-0.124939
-0.585616	12.4c is	-0.124939
-0.585616	&list[100] is	-0.124939
-0.585616	Truncation is	-0.124939
-0.585616	X" is	-0.124939
-0.585616	seen, is	-0.124939
-0.585616	re-allocation is	-0.124939
-0.585616	(&ArraySize) is	-0.124939
-0.585616	x*8 is	-0.124939
-0.585616	(ArraySize) is	-0.124939
-0.585616	g(x) is	-0.124939
-0.585616	supposedly is	-0.124939
-0.585616	while-loop is	-0.124939
-0.585616	animations is	-0.124939
-0.585616	log(2.0) is	-0.124939
-0.585616	inttypes.h is	-0.124939
-0.585616	array[i++] is	-0.124939
-0.585616	bus is	-0.124939
-0.585616	C2::Disp() is	-0.124939
-0.585616	Virtualization is	-0.124939
-0.585616	p->member is	-0.124939
-0.585616	15.0) is	-0.124939
-0.585616	1.5f; is	-0.124939
-0.585616	edition is	-0.124939
-0.585616	(Division is	-0.124939
-0.585616	bb[i]*cc[i] is	-0.124939
-0.585616	size_t is	-0.124939
-0.585616	Polymorphism is	-0.124939
-0.585616	subtasks is	-0.124939
-0.585616	fffff is	-0.124939
-0.585616	remedy is	-0.124939
-0.585616	87) is	-0.124939
-0.585616	eax,1 is	-0.124939
-0.585616	behaviour is	-0.124939
-0.585616	preference is	-0.124939
-0.585616	triangle is	-0.124939
-0.585616	Rounding is	-0.124939
-0.585616	loop-branch is	-0.124939
-0.585616	strictness is	-0.124939
-0.585616	Why is	-0.124939
-0.585616	malloc) is	-0.124939
-0.585616	mirroring is	-0.124939
-0.585616	occurrence is	-0.124939
-0.585616	higher) is	-0.124939
-0.585616	abuse is	-0.124939
-0.585616	kilobyte is	-0.124939
-0.585616	7.43b is	-0.124939
-0.585616	Relocation is	-0.124939
-0.585616	14.21 is	-0.124939
-0.585616	granularity is	-0.124939
-0.585616	'1' is	-0.124939
-1.129428	is a	-0.344664
-2.407298	a a	-0.124939
-1.145020	of a	-0.354276
-1.360216	to a	-0.273592
-1.692268	and a	-0.146128
-0.882161	in a	-0.377961
-1.404966	for a	-0.287666
-1.447816	that a	-0.255272
-1.352272	be a	-0.236089
-1.876086	are a	-0.425969
-1.096746	= a	-0.623779
-1.374661	or a	-0.195520
-2.384044	it a	-0.124939
-1.941587	function a	-0.221849
-1.306748	if a	-0.263241
-1.047219	by a	-0.382503
-0.849978	with a	-0.191886
-1.069252	on a	-0.232149
-0.815219	as a	-0.263241
-1.601001	not a	-0.124939
-1.760227	- a	-0.425969
-2.446250	int a	-0.124939
-1.112310	than a	-0.231394
-1.424224	{ a	-1.124939
-1.005796	have a	-0.180456
-1.485266	time a	-0.425969
-0.866230	use a	-0.232149
-1.281071	when a	-0.124939
-1.759174	then a	-0.249877
-1.094686	from a	-0.152967
-2.028255	memory a	-0.124939
-1.011219	at a	-0.449450
-0.926747	has a	-0.212089
-0.754165	make a	-0.167691
-1.693498	because a	-0.124939
-1.333808	only a	-0.182931
-0.975349	If a	-0.313995
-1.955502	which a	-0.124939
-1.697570	set a	-0.301030
-1.439710	do a	-0.346788
-0.901104	using a	-0.124939
-2.235526	double a	-0.124939
-2.309325	size a	-0.124939
-2.331425	pointer a	-0.124939
-0.672066	into a	-0.522879
-2.209734	float a	-0.124939
-1.244758	also a	-0.124939
-1.011058	such a	-0.170696
-2.345622	In a	-0.124939
-2.237228	array a	-0.124939
-0.947403	where a	-0.166331
-1.301949	takes a	-0.221849
-2.271780	so a	-0.124939
-0.890949	return a	-0.467361
-1.496576	between a	-0.124939
-1.724878	way a	-0.124939
-1.219078	makes a	-0.124939
-1.455098	called a	-0.124939
-2.218325	address a	-0.124939
-1.194803	call a	-0.221849
-0.825727	example, a	-0.124939
-1.179130	take a	-0.346788
-1.409114	often a	-0.124939
-1.663971	how a	-0.124939
-1.246524	need a	-0.124939
-2.081481	test a	-0.124939
-1.371741	even a	-0.124939
-1.099230	access a	-0.124939
-1.602477	out a	-0.124939
-1.567390	0 a	-0.124939
-2.083902	case a	-0.124939
-2.010900	& a	-0.124939
-2.033385	constant a	-0.124939
-1.169983	up a	-0.249877
-0.820493	making a	-0.124939
-2.187310	want a	-0.124939
-2.002954	about a	-0.124939
-1.495477	while a	-0.124939
-1.997173	; a	-0.124939
-1.506724	calls a	-0.124939
-0.993157	Use a	-0.221849
-1.977868	big a	-0.124939
-1.255539	But a	-0.124939
-0.337117	through a	-0.384576
-1.102280	a, a	-0.249877
-2.109387	compile a	-0.124939
-1.440337	matrix a	-0.124939
-1.900436	been a	-0.124939
-1.058561	cause a	-0.124939
-2.034838	done a	-0.124939
-1.973115	therefore a	-0.124939
-0.779155	inside a	-0.492916
-0.749578	uses a	-0.191886
-2.000075	parameters a	-0.124939
-0.686599	get a	-0.182931
-0.540945	b; a	-0.865301
-1.436153	implemented a	-0.124939
-2.004747	solution a	-0.124939
-1.963655	support a	-0.124939
-1.000730	contains a	-0.124939
-2.000864	whether a	-0.124939
-1.915192	doing a	-0.124939
-1.380628	run a	-0.124939
-1.954074	calculate a	-0.124939
-1.970658	inline a	-0.124939
-1.352318	add a	-0.124939
-1.366710	copy a	-0.425969
-1.926922	optimizing a	-0.124939
-0.851198	simply a	-0.823909
-1.920894	... a	-0.124939
-1.092987	quite a	-0.301030
-1.888937	used. a	-0.124939
-1.322346	write a	-0.124939
-1.874214	optimize a	-0.124939
-1.923077	However, a	-0.124939
-1.907837	cases, a	-0.124939
-0.704997	replace a	-0.124939
-1.842474	allows a	-0.124939
-1.281454	sets a	-0.425969
-1.044609	expression a	-0.301030
-1.834561	handling a	-0.124939
-0.774809	like a	-0.124939
-1.273288	__m128i a	-0.425969
-1.858619	Using a	-0.124939
-0.760518	put a	-0.221849
-1.277790	needs a	-0.124939
-0.759831	c; a	-0.823909
-1.253326	what a	-0.124939
-1.849596	running a	-0.124939
-1.217374	&& a	-0.124939
-1.781315	| a	-0.124939
-0.496886	Make a	-0.550907
-0.992637	char a	-0.602060
-1.831113	needed a	-0.124939
-1.193111	give a	-0.124939
-1.188700	becomes a	-0.124939
-0.797668	requires a	-0.124939
-1.197567	load a	-0.124939
-1.770415	calling a	-0.124939
-1.180880	shows a	-0.124939
-0.642709	generate a	-0.221849
-1.739432	reduce a	-0.124939
-1.158604	choose a	-0.124939
-1.149646	made a	-0.124939
-0.888741	just a	-0.124939
-0.620610	require a	-0.124939
-1.135123	start a	-0.425969
-1.682593	supports a	-0.124939
-1.719865	columns a	-0.124939
-1.077234	become a	-0.425969
-1.719865	gives a	-0.124939
-1.682390	inlining a	-0.124939
-1.663431	b) a	-0.124939
-0.541429	Such a	-0.124939
-1.711860	described a	-0.124939
-1.663431	produce a	-0.124939
-1.682160	including a	-0.124939
-1.028247	given a	-0.124939
-1.652196	temp a	-0.124939
-1.652196	d; a	-0.124939
-1.606807	save a	-0.124939
-1.681897	prevents a	-0.124939
-1.681897	tell a	-0.124939
-0.627865	unroll a	-0.425969
-1.601283	testing a	-0.124939
-0.751278	writing a	-0.124939
-1.629688	When a	-0.124939
-1.601283	copying a	-0.124939
-0.593874	accessing a	-0.124939
-1.660082	until a	-0.124939
-0.991537	adding a	-0.425969
-1.610546	causes a	-0.124939
-1.594926	predict a	-0.124939
-0.959032	true a	-0.425969
-1.575784	execute a	-0.124939
-1.594926	N a	-0.124939
-1.594926	least a	-0.124939
-0.680024	insert a	-0.124939
-0.925793	loading a	-0.425969
-0.935037	calculating a	-0.124939
-1.537996	e.g. a	-0.124939
-0.925793	? a	-0.425969
-1.577162	defined a	-0.124939
-0.524943	expect a	-0.249877
-1.506068	course a	-0.124939
-0.641236	whenever a	-0.124939
-1.515745	modify a	-0.124939
-0.888998	setting a	-0.124939
-0.485371	within a	-0.124939
-1.506068	counts a	-0.124939
-0.888998	processors, a	-0.124939
-1.525641	Obviously, a	-0.124939
-0.483550	allocate a	-0.425969
-1.546138	added a	-0.124939
-1.578820	waste a	-0.124939
-0.884400	define a	-0.124939
-0.595479	implement a	-0.124939
-1.460311	contain a	-0.124939
-1.479884	writes a	-0.124939
-1.479884	transfer a	-0.124939
-1.479884	away a	-0.124939
-1.479884	multiply a	-0.124939
-0.847887	stores a	-0.124939
-1.500381	finding a	-0.124939
-1.511004	vectorize a	-0.124939
-0.592874	include a	-0.124939
-1.490012	addition, a	-0.124939
-1.479884	across a	-0.124939
-1.449228	required a	-0.124939
-1.438859	down a	-0.124939
-1.370740	had a	-0.124939
-0.329557	10; a	-0.726999
-1.370740	Likewise, a	-0.124939
-1.391236	spend a	-0.124939
-1.391236	called, a	-0.124939
-1.401860	executing a	-0.124939
-1.370740	exceptions a	-0.124939
-0.736413	transpose a	-0.425969
-0.669466	break a	-0.124939
-1.313920	plus a	-0.124939
-0.671796	16; a	-0.425969
-1.324290	identify a	-0.124939
-0.669466	show a	-0.124939
-1.313920	evaluate a	-0.124939
-1.324290	reference, a	-0.124939
-1.334913	half a	-0.124939
-0.422008	converting a	-0.301030
-0.420696	follows a	-0.602060
-1.313920	base a	-0.124939
-1.345802	Reading a	-0.124939
-1.345802	form a	-0.124939
-0.420696	defines a	-0.602060
-1.313920	Is16vec8 a	-0.124939
-0.420696	Accessing a	-0.124939
-0.342827	install a	-0.124939
-0.342827	consume a	-0.301030
-1.245108	hand, a	-0.124939
-1.255731	created a	-0.124939
-1.277790	follow a	-0.124939
-0.592614	returns a	-0.124939
-1.266621	Is a	-0.124939
-1.266621	Typically, a	-0.124939
-1.266621	~a a	-0.124939
-1.266621	prefer a	-0.124939
-1.169711	repeats a	-0.124939
-1.158821	defining a	-0.124939
-0.247233	produces a	-0.124939
-1.169711	*p+2 a	-0.124939
-0.500402	choosing a	-0.124939
-1.158821	saving a	-0.124939
-0.498047	Whenever a	-0.124939
-1.158821	{return a	-0.124939
-0.498047	Calling a	-0.124939
-1.169711	force a	-0.124939
-1.169711	pointer, a	-0.124939
-1.169711	opens a	-0.124939
-1.044772	seem a	-0.124939
-1.044772	consumes a	-0.124939
-1.044772	optimizes a	-0.124939
-1.044772	Return a	-0.124939
-1.055942	7.2 a	-0.124939
-1.044772	ignore a	-0.124939
-0.375463	considered a	-0.124939
-1.055942	spaced a	-0.124939
-0.375463	implementing a	-0.124939
-1.044772	Specifies a	-0.124939
-1.044772	reveals a	-0.124939
-1.044772	(requires a	-0.124939
-1.044772	140 a	-0.124939
-1.044772	leave a	-0.124939
-0.375463	With a	-0.124939
-1.055942	scans a	-0.124939
-1.044772	justify a	-0.124939
-0.375463	holds a	-0.124939
-1.044772	arrays, a	-0.124939
-0.375463	false, a	-0.425969
-0.375463	represent a	-0.124939
-0.375463	returning a	-0.124939
-1.055942	terminating a	-0.124939
-1.044772	joining a	-0.124939
-1.044772	certainly a	-0.124939
-1.044772	indeed a	-0.124939
-1.044772	(not a	-0.124939
-1.044772	expects a	-0.124939
-1.044772	compute a	-0.124939
-0.879850	SSSE3 a	-0.124939
-0.879850	executes a	-0.124939
-0.879850	developed a	-0.124939
-0.879850	keeping a	-0.124939
-0.879850	emulate a	-0.124939
-0.879850	Replacing a	-0.124939
-0.879850	starting a	-0.124939
-0.879850	if, a	-0.124939
-0.879850	differ a	-0.124939
-0.201739	pressing a	-0.124939
-0.879850	maintaining a	-0.124939
-0.879850	c.load(cc+i); a	-0.124939
-0.879850	Unrolling a	-0.124939
-0.879850	afterwards a	-0.124939
-0.201739	lock a	-0.124939
-0.879850	reveal a	-0.124939
-0.879850	z; a	-0.124939
-0.201739	transposes a	-0.124939
-0.879850	Returns a	-0.124939
-0.879850	sees a	-0.124939
-0.879850	treat a	-0.124939
-0.879850	log2 a	-0.124939
-0.879850	studying a	-0.124939
-0.879850	replacing a	-0.124939
-0.879850	involve a	-0.124939
-0.879850	type, a	-0.124939
-0.879850	Writing a	-0.124939
-0.879850	assigning a	-0.124939
-0.879850	relieving a	-0.124939
-0.590285	installed, a	-0.124939
-0.590285	Gives a	-0.124939
-0.590285	re-use a	-0.124939
-0.590285	Inlining a	-0.124939
-0.590285	incrementing a	-0.124939
-0.590285	decrementing a	-0.124939
-0.590285	requesting a	-0.124939
-0.590285	occupies a	-0.124939
-0.590285	Installing a	-0.124939
-0.590285	executable: a	-0.124939
-0.590285	(a&b)&(c&d) a	-0.124939
-0.590285	Transposing a	-0.124939
-0.590285	8.5b a	-0.124939
-0.590285	Unlike a	-0.124939
-0.590285	create a	-0.124939
-0.590285	forms a	-0.124939
-0.590285	compose a	-0.124939
-0.590285	draws a	-0.124939
-0.590285	feeds a	-0.124939
-0.590285	8.2b a	-0.124939
-0.590285	8.3b a	-0.124939
-0.590285	indicates a	-0.124939
-0.590285	incur a	-0.124939
-0.590285	pass a	-0.124939
-0.590285	reinstall a	-0.124939
-0.590285	rounds a	-0.124939
-0.590285	8.10b a	-0.124939
-0.590285	fetching a	-0.124939
-0.590285	redesigning a	-0.124939
-0.590285	1.5f}; a	-0.124939
-0.590285	converts a	-0.124939
-0.590285	MultiplyBy<8>(10); a	-0.124939
-0.590285	isolating a	-0.124939
-0.590285	a= a	-0.124939
-0.590285	2.5f}; a	-0.124939
-0.590285	managing a	-0.124939
-0.590285	x-xxx--xx a	-0.124939
-0.590285	publish a	-0.124939
-0.590285	--xxxx--- a	-0.124939
-0.590285	RAM, a	-0.124939
-0.590285	activate a	-0.124939
-0.590285	occupying a	-0.124939
-2.396208	is of	-0.726999
-2.591532	that of	-0.124939
-2.286361	be of	-0.124939
-2.402890	are of	-0.124939
-2.275637	// of	-0.425969
-1.732163	function of	-0.329059
-2.466406	code of	-0.124939
-2.461960	may of	-0.124939
-1.784984	time of	-0.249877
-1.027348	use of	-0.166331
-2.360965	more of	-0.124939
-2.281890	program of	-0.124939
-1.224466	vector of	-0.321233
-1.073101	because of	-0.346788
-1.781446	functions of	-0.301030
-1.958603	CPU of	-0.124939
-1.942576	point of	-0.124939
-1.629666	loop of	-0.425969
-1.930068	which of	-0.425969
-1.883276	all of	-0.124939
-1.093131	one of	-0.903090
-1.703579	cache of	-0.301030
-2.400925	should of	-0.124939
-2.207115	integer of	-0.124939
-1.051022	set of	-0.249877
-1.852374	class of	-0.124939
-1.855686	each of	-0.124939
-1.507567	example of	-0.425969
-1.000749	most of	-0.761761
-0.530486	size of	-0.505150
-2.282179	pointer of	-0.124939
-2.192743	library of	-0.124939
-1.074412	multiple of	-0.778151
-0.986053	object of	-0.564271
-0.136825	number of	-0.346788
-1.305484	array of	-0.124939
-1.292992	many of	-0.522879
-0.413434	version of	-0.778151
-0.385183	value of	-0.425969
-1.301622	objects of	-0.346788
-1.058754	any of	-0.726999
-1.049244	some of	-0.726999
-1.025611	table of	-0.249877
-1.030412	performance of	-0.249877
-0.927172	order of	-0.425969
-1.488047	branch of	-0.301030
-0.968193	member of	-0.602060
-0.774891	way of	-0.197489
-1.216653	elements of	-0.221849
-0.486696	address of	-0.505150
-2.130253	call of	-0.124939
-0.859128	bit of	-0.346788
-1.253636	optimization of	-0.425969
-2.064090	libraries of	-0.124939
-1.606414	pointers of	-0.425969
-2.093253	even of	-0.124939
-1.012344	method of	-0.301030
-0.452536	out of	-0.471726
-1.985265	file of	-0.124939
-0.048519	part of	-1.116165
-0.753820	bits of	-0.425969
-2.024635	operations of	-0.124939
-0.695781	type of	-0.124939
-0.526562	case of	-0.329059
-1.179058	cases of	-0.249877
-2.026621	Some of	-0.124939
-1.265277	arrays of	-0.124939
-1.476593	calculations of	-0.124939
-0.323117	versions of	-0.467361
-1.266772	execution of	-0.301030
-0.596266	result of	-0.425969
-1.252543	bytes of	-0.124939
-1.978848	element of	-0.124939
-0.825509	speed of	-0.124939
-2.002193	much of	-0.124939
-1.897939	etc. of	-0.124939
-1.067237	overflow of	-0.249877
-0.867558	integers of	-0.124939
-0.066868	power of	-0.765917
-1.203129	cause of	-0.124939
-1.912870	precision of	-0.124939
-0.328254	calculation of	-0.338819
-1.397016	uses of	-0.124939
-1.971623	parameters of	-0.124939
-1.940820	problem of	-0.124939
-1.408135	solution of	-0.425969
-0.215510	advantage of	-0.425969
-1.931824	support of	-0.124939
-1.853285	few of	-0.124939
-0.514049	list of	-0.263241
-1.924451	would of	-0.124939
-1.141152	structure of	-0.124939
-0.760232	values of	-0.204120
-1.324562	All of	-0.425969
-1.366225	sign of	-0.425969
-0.852799	copy of	-0.346788
-0.950895	addresses of	-0.249877
-1.315930	allocation of	-0.425969
-1.303300	problems of	-0.124939
-1.834914	space of	-0.124939
-0.080265	lot of	-0.204120
-1.793201	multiplication of	-0.124939
-0.685200	implementation of	-0.425969
-1.042222	Most of	-0.602060
-0.491506	members of	-0.477121
-1.794692	methods of	-0.124939
-1.251472	development of	-0.124939
-1.019051	block of	-0.124939
-1.248878	name of	-0.124939
-1.874269	needs of	-0.124939
-1.822852	conversion of	-0.124939
-0.327119	disadvantage of	-0.204120
-1.824966	parameter of	-0.124939
-0.852901	source of	-0.124939
-0.367005	cost of	-0.124939
-1.824966	resources of	-0.124939
-1.777049	string of	-0.124939
-0.496888	end of	-0.249877
-0.494902	examples of	-0.249877
-1.786640	addition of	-0.124939
-1.777049	mechanism of	-0.124939
-1.210087	Table of	-0.425969
-1.767665	runtime of	-0.124939
-0.996773	means of	-0.301030
-0.983807	byte of	-0.124939
-0.035638	parts of	-1.146128
-0.420698	types of	-0.234083
-0.014892	instead of	-0.188608
-1.747131	Each of	-0.124939
-1.191651	optimizations of	-0.124939
-1.175751	Many of	-0.124939
-1.167771	numbers of	-0.425969
-0.659129	vectors of	-0.124939
-0.089280	piece of	-0.510290
-1.713474	process of	-0.124939
-0.225984	advantages of	-0.359022
-1.150892	results of	-0.124939
-1.140162	storage of	-0.124939
-0.415909	ways of	-0.249877
-1.700027	operands of	-0.124939
-0.619231	range of	-0.124939
-0.887182	start of	-0.301030
-1.670593	modules of	-0.124939
-1.695996	core of	-0.124939
-0.261949	overhead of	-0.279841
-1.706778	change of	-0.124939
-0.836030	installation of	-0.301030
-0.236798	choice of	-0.221849
-1.659146	index of	-0.124939
-0.377305	instance of	-0.492916
-1.018267	output of	-0.124939
-1.680788	outside of	-0.124939
-1.601154	task of	-0.124939
-0.421822	costs of	-0.204120
-1.633513	destructor of	-0.124939
-0.023637	Choice of	-0.425969
-0.346762	efficiency of	-0.367977
-1.601154	algorithm of	-0.124939
-1.032063	sum of	-0.124939
-1.624323	strings of	-0.124939
-0.597596	possibility of	-0.124939
-0.102154	discussion of	-0.263241
-1.601328	maximum of	-0.124939
-0.986082	alignment of	-0.124939
-0.592080	offset of	-0.249877
-1.002691	operand of	-0.124939
-0.025510	sake of	-0.159701
-0.596487	effect of	-0.249877
-0.025510	amount of	-0.284640
-1.601328	time, of	-0.124939
-1.579490	copying of	-0.124939
-1.590272	causes of	-0.124939
-1.544728	mix of	-0.124939
-0.956786	priority of	-0.124939
-1.577911	frequency of	-0.124939
-0.956786	iteration of	-0.425969
-1.566566	models of	-0.124939
-0.962321	names of	-0.124939
-0.027705	kinds of	-0.249877
-0.959545	details of	-0.124939
-0.714226	level of	-0.301030
-1.551772	target of	-0.124939
-1.528778	bounds of	-0.124939
-0.676438	loading of	-0.124939
-0.924533	reading of	-0.124939
-1.601772	situation of	-0.124939
-0.179048	implementations of	-0.182931
-0.319155	generation of	-0.301030
-0.918997	sizes of	-0.124939
-0.202374	risk of	-0.124939
-0.888747	fraction of	-0.425969
-0.276383	sequence of	-0.301030
-0.138156	length of	-0.550907
-0.885935	penalty of	-0.124939
-0.885935	reasons of	-0.425969
-0.082663	beginning of	-0.602060
-0.138156	matter of	-0.425969
-0.880364	declaration of	-0.124939
-0.082663	series of	-0.234083
-0.635045	features of	-0.124939
-0.082663	waste of	-0.380211
-0.037347	microarchitecture of	-1.079181
-0.590864	independent of	-0.301030
-1.464622	destructors of	-0.124939
-0.092900	terms of	-0.182931
-1.488903	help of	-0.124939
-1.464622	transfer of	-0.124939
-0.837383	blocks of	-0.124939
-0.231314	explanation of	-0.204120
-0.232005	brands of	-0.204120
-1.441627	brand of	-0.124939
-1.464622	logic of	-0.124939
-0.840177	Optimization of	-0.425969
-0.156616	care of	-0.271067
-1.425440	unit of	-0.124939
-0.042249	kind of	-0.124939
-1.401820	iterations of	-0.124939
-1.413470	misprediction of	-0.124939
-1.413470	binding of	-0.124939
-1.401820	chain of	-0.124939
-1.425440	family of	-0.124939
-0.272135	Conversion of	-0.221849
-1.425440	tables of	-0.124939
-1.463469	Conversions of	-0.124939
-0.328751	purpose of	-0.124939
-0.731033	trick of	-0.425969
-0.733845	disadvantages of	-0.124939
-0.328751	instances of	-0.249877
-0.733845	body of	-0.124939
-1.367448	changes of	-0.124939
-0.731033	collection of	-0.124939
-0.329863	aware of	-0.124939
-1.355478	capabilities of	-0.124939
-0.328751	representation of	-0.124939
-0.048635	powers of	-0.492916
-1.355478	factor of	-0.124939
-0.486482	rules of	-0.301030
-0.048635	responsibility of	-0.970037
-0.731033	reciprocal of	-0.425969
-1.300501	polynomial of	-0.124939
-0.672579	casting of	-0.425969
-0.148050	scope of	-0.346788
-0.669729	principle of	-0.124939
-0.419536	throughput of	-0.124939
-0.262916	Number of	-0.249877
-0.057298	availability of	-0.124939
-1.325480	half of	-0.124939
-0.666898	step of	-0.124939
-0.057298	regardless of	-0.301030
-0.666898	compilation of	-0.124939
-0.262916	behavior of	-0.249877
-0.666898	Type of	-0.425969
-0.262916	form of	-0.249877
-0.675447	rule of	-0.425969
-1.300501	job of	-0.124939
-0.421135	requirements of	-0.124939
-0.069724	loss of	-0.221849
-1.246299	cleanup of	-0.124939
-1.233630	swapping of	-0.124939
-0.069724	rest of	-0.823909
-1.233630	principles of	-0.124939
-0.590548	effects of	-0.425969
-1.259349	Array of	-0.124939
-1.233630	lists of	-0.124939
-1.246299	event of	-0.124939
-1.246299	advice of	-0.124939
-1.233630	recommendation of	-0.124939
-1.233630	Objects of	-0.124939
-1.259349	Division of	-0.124939
-0.341954	pitfalls of	-0.301030
-1.233630	inefficient, of	-0.124939
-0.590548	latency of	-0.124939
-0.069724	pieces of	-0.221849
-0.590548	consumption of	-0.124939
-1.246299	facilities of	-0.124939
-1.149389	slices of	-0.124939
-1.149389	estimate of	-0.124939
-1.162439	well, of	-0.124939
-0.246649	ownership of	-0.602060
-0.089056	drawbacks of	-0.249877
-1.162439	queue of	-0.124939
-1.149389	modification of	-0.124939
-0.496488	Out of	-0.425969
-1.149389	modifications of	-0.124939
-1.149389	lengths of	-0.124939
-0.089056	Alignment of	-0.726999
-0.496488	splitting of	-0.124939
-1.162439	area of	-0.124939
-0.496488	consequence of	-0.124939
-0.246649	50% of	-0.602060
-1.149389	functionality of	-0.124939
-0.499356	layers of	-0.425969
-1.162439	Integers of	-0.124939
-1.149389	considerations of	-0.124939
-0.496488	bytes) of	-0.124939
-1.149389	techniques of	-0.124939
-0.089056	Comparison of	-0.249877
-1.149389	resolution of	-0.124939
-1.037500	Which of	-0.124939
-0.374417	redesign of	-0.124939
-0.123321	combination of	-0.124939
-0.123321	sources of	-0.124939
-1.037500	analysis of	-0.124939
-1.037500	updating of	-0.124939
-0.374417	contents of	-0.124939
-1.050954	generality of	-0.124939
-1.050954	study of	-0.124939
-0.123321	(number of	-0.124939
-0.374417	degree of	-0.124939
-1.050954	overriding of	-0.124939
-0.374417	track of	-0.124939
-0.123321	rid of	-0.301030
-1.037500	decomposition of	-0.124939
-0.123321	history of	-0.124939
-0.123321	addressing of	-0.602060
-0.123321	top of	-0.602060
-1.037500	Documentation of	-0.124939
-0.123321	none of	-0.602060
-0.123321	Size of	-0.602060
-1.037500	logarithm of	-0.124939
-0.374417	deallocation of	-0.124939
-0.374417	allocations of	-0.124939
-1.037500	indeed of	-0.124939
-1.050954	beware of	-0.124939
-0.874863	complexity of	-0.124939
-0.874863	lack of	-0.124939
-0.874863	window of	-0.124939
-0.874863	Structure of	-0.124939
-0.874863	goal of	-0.124939
-0.874863	chance of	-0.124939
-0.874863	dangers of	-0.124939
-0.874863	comparison of	-0.124939
-0.874863	90% of	-0.124939
-0.201213	Value of	-0.124939
-0.201213	ahead of	-0.425969
-0.874863	opposite of	-0.124939
-0.201213	movements of	-0.425969
-0.201213	capable of	-0.124939
-0.874863	lowest of	-0.124939
-0.874863	marketing of	-0.124939
-0.201213	understanding of	-0.124939
-0.201213	creation of	-0.124939
-0.874863	parallelization of	-0.124939
-0.874863	degradation of	-0.124939
-0.201213	deal of	-0.124939
-0.201213	lots of	-0.124939
-0.201213	99% of	-0.425969
-0.201213	Overview of	-0.425969
-0.201213	sequences of	-0.124939
-0.874863	expansions of	-0.124939
-0.201213	Caching of	-0.425969
-0.874863	University of	-0.124939
-0.201213	Sum of	-0.425969
-0.874863	obstacle of	-0.124939
-0.874863	manipulations of	-0.124939
-0.874863	extension of	-0.124939
-0.201213	10% of	-0.425969
-0.874863	fourth of	-0.124939
-0.587717	vulnerability of	-0.124939
-0.587717	1/50 of	-0.124939
-0.587717	importance of	-0.124939
-0.587717	transposition of	-0.124939
-0.587717	architecture of	-0.124939
-0.587717	transformation of	-0.124939
-0.587717	standardization of	-0.124939
-0.587717	de-allocation of	-0.124939
-0.587717	Violation of	-0.124939
-0.587717	flexibility of	-0.124939
-0.587717	wealth of	-0.124939
-0.587717	independently of	-0.124939
-0.587717	amounts of	-0.124939
-0.587717	uninstallation of	-0.124939
-0.587717	decimals of	-0.124939
-0.587717	None of	-0.124939
-0.587717	absence of	-0.124939
-0.587717	fallacy of	-0.124939
-0.587717	menus of	-0.124939
-0.587717	lifetime of	-0.124939
-0.587717	perspective of	-0.124939
-0.587717	bility of	-0.124939
-0.587717	evaluation of	-0.124939
-0.587717	benefits of	-0.124939
-0.587717	bias of	-0.124939
-0.587717	gigabytes of	-0.124939
-0.587717	overview of	-0.124939
-0.587717	Members of	-0.124939
-0.587717	majority of	-0.124939
-0.587717	lineage of	-0.124939
-0.587717	indication of	-0.124939
-0.587717	scarcity of	-0.124939
-0.587717	safe, of	-0.124939
-0.587717	insertion of	-0.124939
-0.587717	distributions of	-0.124939
-0.587717	double's of	-0.124939
-0.587717	cons of	-0.124939
-0.587717	knowledge of	-0.124939
-0.587717	notion of	-0.124939
-0.587717	Instead of	-0.124939
-0.587717	Lists of	-0.124939
-0.587717	x86) of	-0.124939
-0.587717	billions of	-0.124939
-0.587717	practice, of	-0.124939
-0.587717	Bit-fields of	-0.124939
-0.587717	omitted, of	-0.124939
-0.587717	consisting of	-0.124939
-0.587717	hundreds of	-0.124939
-0.587717	searches of	-0.124939
-0.587717	systematization of	-0.124939
-0.587717	fragmentation of	-0.124939
-0.587717	Much of	-0.124939
-0.587717	occurrences of	-0.124939
-0.587717	groups of	-0.124939
-0.587717	holes of	-0.124939
-0.587717	Sizes of	-0.124939
-0.587717	design of	-0.124939
-0.587717	segmentation of	-0.124939
-0.587717	dimensions of	-0.124939
-0.587717	requires, of	-0.124939
-0.587717	laws of	-0.124939
-0.587717	attention of	-0.124939
-0.587717	maintainability of	-0.124939
-0.587717	investigation of	-0.124939
-0.587717	compactness of	-0.124939
-0.587717	advise of	-0.124939
-0.587717	ease of	-0.124939
-0.587717	couple of	-0.124939
-0.587717	nature of	-0.124939
-0.587717	"Zen of	-0.124939
-0.587717	thousands of	-0.124939
-0.587717	favor of	-0.124939
-0.587717	layer of	-0.124939
-0.587717	clarity of	-0.124939
-0.587717	levels of	-0.124939
-0.587717	Vectors of	-0.124939
-0.587717	reports of	-0.124939
-1.523236	is to	-0.296874
-2.124949	a to	-0.234083
-2.002313	and to	-0.191886
-2.375046	be to	-0.124939
-2.098385	or to	-0.124939
-1.822932	it to	-0.191886
-1.633463	function to	-0.279841
-1.408091	code to	-0.157123
-2.108077	as to	-0.124939
-2.066973	not to	-0.124939
-2.602079	- to	-0.124939
-1.380130	than to	-0.162727
-1.068555	compiler to	-0.346788
-1.913038	x to	-0.425969
-1.758695	you to	-0.124939
-0.757411	have to	-0.277030
-1.863025	this to	-0.301030
-1.200936	time to	-0.154902
-1.384727	memory to	-0.329059
-2.338386	at to	-0.124939
-1.972682	data to	-0.124939
-1.661010	program to	-0.124939
-0.987948	has to	-0.289749
-1.950118	only to	-0.124939
-1.626969	CPU to	-0.124939
-2.480926	instruction to	-0.124939
-1.130946	point to	-0.301030
-2.148977	all to	-0.124939
-1.609873	used to	-0.124939
-2.225068	cache to	-0.124939
-1.301820	integer to	-0.367977
-1.866236	set to	-0.124939
-2.153046	class to	-0.124939
-1.861001	do to	-0.124939
-2.090320	example to	-0.124939
-1.885034	compilers to	-0.124939
-1.624682	double to	-0.301030
-2.224968	size to	-0.124939
-0.753163	pointer to	-0.602060
-1.851927	b to	-0.425969
-1.477765	i to	-0.124939
-2.121215	float to	-0.124939
-1.584194	object to	-0.124939
-1.469881	number to	-0.124939
-1.784916	static to	-0.425969
-0.855092	efficient to	-0.229674
-1.545979	array to	-0.124939
-0.255926	possible to	-0.363821
-2.229870	version to	-0.124939
-2.248374	value to	-0.124939
-2.195941	objects to	-0.124939
-0.581707	takes to	-0.359022
-2.081029	variable to	-0.124939
-1.362969	variables to	-0.249877
-2.002950	return to	-0.124939
-1.468721	2 to	-0.301030
-1.479842	table to	-0.124939
-1.660287	software to	-0.124939
-0.204308	order to	-0.361511
-2.136108	branch to	-0.124939
-0.480993	way to	-0.244125
-1.686513	elements to	-0.124939
-0.944432	faster to	-0.234083
-0.626266	call to	-0.355388
-1.441552	example, to	-0.124939
-1.427303	bit to	-0.124939
-1.413108	register to	-0.124939
-0.337267	how to	-0.425969
-2.065423	template to	-0.124939
-2.044233	registers to	-0.124939
-0.372341	need to	-0.284640
-0.756398	pointers to	-0.263241
-1.590826	user to	-0.124939
-0.783588	useful to	-0.221849
-0.945814	sure to	-0.124939
-2.079271	method to	-0.124939
-1.507730	always to	-0.124939
-0.725657	access to	-0.263241
-1.544422	16 to	-0.124939
-1.588231	out to	-0.425969
-1.319408	system to	-0.124939
-1.956594	file to	-0.124939
-1.990994	bits to	-0.124939
-1.998476	operations to	-0.124939
-1.549701	0 to	-0.124939
-2.013371	type to	-0.124939
-2.047087	cases to	-0.124939
-1.930582	simple to	-0.124939
-1.966171	instructions to	-0.124939
-1.541114	available to	-0.124939
-1.510914	constant to	-0.425969
-0.640965	up to	-0.249877
-1.294799	times to	-0.124939
-0.158344	want to	-0.225054
-0.370596	important to	-0.216709
-1.994504	CPUs to	-0.124939
-1.472058	large to	-0.425969
-1.475897	work to	-0.124939
-0.780639	calls to	-0.182931
-1.926907	calculations to	-0.124939
-1.993150	execution to	-0.124939
-2.030492	result to	-0.124939
-1.957351	processor to	-0.124939
-1.108297	compiled to	-0.249877
-1.957351	bytes to	-0.124939
-1.909535	big to	-0.124939
-0.302468	necessary to	-0.212089
-1.464386	element to	-0.124939
-1.472779	speed to	-0.425969
-1.870368	specific to	-0.124939
-1.214885	common to	-0.301030
-1.443224	thread to	-0.425969
-1.960267	allocated to	-0.124939
-1.891393	small to	-0.124939
-1.064526	integers to	-0.249877
-1.444204	good to	-0.124939
-1.988292	done to	-0.124939
-1.888432	precision to	-0.124939
-1.880397	line to	-0.124939
-0.912785	parameters to	-0.522879
-0.292136	advantageous to	-0.449450
-0.903347	known to	-0.522879
-1.014917	solution to	-0.249877
-1.023465	advantage to	-0.249877
-0.675351	Function to	-0.726999
-1.002328	eight to	-0.726999
-1.003539	whether to	-0.124939
-0.115282	likely to	-0.393784
-1.915741	structure to	-0.124939
-1.836233	1 to	-0.124939
-1.819048	add to	-0.124939
-1.365963	information to	-0.124939
-1.878500	simply to	-0.124939
-0.024956	able to	-0.301030
-0.755897	certain to	-0.301030
-1.090030	cycles to	-0.124939
-1.339638	addresses to	-0.124939
-1.884677	count to	-0.124939
-1.864260	files to	-0.124939
-0.057369	recommended to	-0.410174
-1.830036	write to	-0.124939
-1.787096	programs to	-0.124939
-0.802712	optimal to	-0.221849
-1.292155	space to	-0.124939
-1.089557	lot to	-0.602060
-1.300826	Integer to	-0.425969
-1.295026	dispatching to	-0.124939
-1.789571	branches to	-0.124939
-1.273508	application to	-0.124939
-1.833109	expression to	-0.124939
-1.763950	complicated to	-0.124939
-1.802112	like to	-0.124939
-1.812200	members to	-0.124939
-1.775219	methods to	-0.124939
-1.785078	block to	-0.124939
-0.272649	needs to	-0.316824
-1.254538	conversion to	-0.124939
-1.242728	parameter to	-0.425969
-1.798347	division to	-0.124939
-0.325079	reference to	-0.359022
-0.745139	cost to	-0.221849
-0.727604	reason to	-0.346788
-1.769283	dispatcher to	-0.124939
-1.209596	n to	-0.124939
-1.758955	string to	-0.124939
-0.389877	programmer to	-0.647817
-1.710704	three to	-0.124939
-1.206724	better to	-0.124939
-1.758955	keyword to	-0.124939
-1.719935	applications to	-0.124939
-1.710704	&& to	-0.124939
-0.831369	addition to	-0.249877
-1.758955	mechanism to	-0.124939
-1.849406	means to	-0.124939
-1.782525	types to	-0.124939
-0.130815	difficult to	-0.313995
-1.842856	transferred to	-0.124939
-1.785458	aligned to	-0.124939
-1.751199	linking to	-0.124939
-1.730010	numbers to	-0.124939
-1.719166	dispatch to	-0.124939
-1.160577	interface to	-0.425969
-1.696889	process to	-0.124939
-0.638033	goes to	-0.221849
-0.908766	choose to	-0.124939
-1.138301	options to	-0.124939
-0.415604	ways to	-0.329059
-1.150302	link to	-0.124939
-1.138301	made to	-0.124939
-1.718511	appropriate to	-0.124939
-0.286486	points to	-0.279841
-1.718511	switch to	-0.124939
-0.730985	start to	-0.124939
-1.718511	here to	-0.124939
-0.504871	relevant to	-0.204120
-1.105051	things to	-0.124939
-1.092965	go to	-0.124939
-1.670551	predicted to	-0.124939
-1.095955	references to	-0.124939
-1.114341	overhead to	-0.124939
-0.477774	relative to	-0.903090
-1.691798	columns to	-0.124939
-0.841167	intended to	-0.124939
-1.069626	profiler to	-0.425969
-1.081796	Loop to	-0.124939
-1.050693	inefficient to	-0.425969
-1.676524	response to	-0.124939
-1.627607	lines to	-0.124939
-1.047640	comes to	-0.124939
-1.597644	limited to	-0.124939
-0.783175	costs to	-0.124939
-1.020730	destructor to	-0.124939
-0.779644	safe to	-0.124939
-1.565460	alignment to	-0.124939
-0.979450	macro to	-0.425969
-1.577182	them to	-0.124939
-0.592109	writing to	-0.249877
-0.988545	reduced to	-0.124939
-1.589230	clear to	-0.124939
-1.554468	priority to	-0.124939
-1.554468	iteration to	-0.124939
-0.953783	models to	-0.124939
-0.557347	changed to	-0.124939
-0.355917	fail to	-0.124939
-0.959954	thing to	-0.124939
-1.530697	structures to	-0.124939
-1.579614	heap to	-0.124939
-0.676671	initialized to	-0.124939
-0.925284	executable to	-0.124939
-1.541826	subexpression to	-0.124939
-1.504631	updates to	-0.124939
-0.676671	directly to	-0.301030
-0.520790	copied to	-0.425969
-1.516679	sizes to	-0.124939
-0.318129	easier to	-0.204120
-0.674908	identical to	-0.124939
-1.541826	20 to	-0.124939
-0.928426	expect to	-0.124939
-0.874602	similar to	-0.124939
-0.483113	back to	-0.124939
-0.874602	seconds to	-0.124939
-1.527126	sequence to	-0.124939
-1.513575	something to	-0.124939
-1.513575	penalty to	-0.124939
-1.513575	reasons to	-0.124939
-1.500433	programmers to	-0.124939
-1.500433	alternative to	-0.124939
-0.637047	happen to	-0.602060
-0.276736	enough to	-0.301030
-0.275963	apply to	-0.204120
-1.475286	row to	-0.124939
-1.487678	declaration to	-0.124939
-1.487678	features to	-0.124939
-0.275963	added to	-0.425969
-0.591290	easy to	-0.124939
-1.509810	situations to	-0.124939
-0.589520	writes to	-0.124939
-0.092778	applies to	-0.124939
-0.037293	applied to	-0.477121
-0.835015	destructors to	-0.124939
-0.322977	15.1b to	-0.124939
-1.441920	eax to	-0.124939
-1.509810	care to	-0.124939
-1.458658	procedure to	-0.124939
-1.416665	throw() to	-0.124939
-0.384960	try to	-0.124939
-0.105903	converted to	-0.367977
-0.105903	pointed to	-0.492916
-1.403523	algorithms to	-0.124939
-1.403523	however, to	-0.124939
-0.790123	designed to	-0.124939
-1.403523	inputs to	-0.124939
-0.540137	preferred to	-0.301030
-1.458658	Conversion to	-0.124939
-0.786982	down to	-0.124939
-0.783863	jump to	-0.124939
-0.732131	allowed to	-0.124939
-1.358673	delete to	-0.124939
-0.735295	distributed to	-0.425969
-1.358673	generates to	-0.124939
-1.345531	T to	-0.124939
-0.732131	linker to	-0.124939
-0.728990	measurements to	-0.124939
-1.345531	factor to	-0.124939
-1.345531	MMX to	-0.124939
-1.415617	sense to	-0.124939
-0.147842	equal to	-0.823909
-1.319265	reads to	-0.124939
-0.420550	Add to	-0.301030
-0.418759	expected to	-0.301030
-0.262509	convenient to	-0.124939
-1.305277	column to	-0.124939
-1.305277	portability to	-0.124939
-1.319265	Pointers to	-0.124939
-0.671536	converting to	-0.124939
-1.305277	costly to	-0.124939
-1.319265	computers to	-0.124939
-1.291726	debugger to	-0.124939
-1.305277	permissible to	-0.124939
-0.057214	due to	-0.204120
-1.305277	edx, to	-0.124939
-0.592355	effort to	-0.425969
-1.226096	principles to	-0.124939
-0.589167	obvious to	-0.124939
-0.592355	swapped to	-0.124939
-0.592355	portable to	-0.124939
-0.592355	limit to	-0.425969
-0.589167	nothing to	-0.124939
-1.226096	increased to	-0.124939
-0.589167	equivalent to	-0.124939
-1.226096	jobs to	-0.124939
-0.184577	safer to	-0.249877
-1.254538	-(-a) to	-0.124939
-1.226096	updated to	-0.124939
-1.254538	appear to	-0.124939
-1.269489	Writes to	-0.124939
-0.069620	belong to	-0.221849
-0.341369	prefer to	-0.124939
-1.143174	slices to	-0.124939
-0.088920	lead to	-0.726999
-1.143174	place to	-0.124939
-0.246258	preferable to	-0.124939
-0.246258	obstacles to	-0.301030
-0.088920	Obstacles to	-0.726999
-1.143174	loader to	-0.124939
-0.088920	Failure to	-0.425969
-1.143174	pre-increment to	-0.124939
-0.088920	thanks to	-0.249877
-0.088920	guaranteed to	-0.425969
-1.143174	modification to	-0.124939
-1.143174	solutions to	-0.124939
-0.246258	appendix to	-0.602060
-0.246258	alternatives to	-0.124939
-1.143174	modifications to	-0.124939
-1.143174	tools to	-0.124939
-0.088920	according to	-0.726999
-1.143174	lengths to	-0.124939
-0.246258	extended to	-0.124939
-0.498656	Access to	-0.124939
-1.143174	years to	-0.124939
-1.157628	inconvenient to	-0.124939
-1.143174	directive to	-0.124939
-0.495445	going to	-0.124939
-0.246258	idea to	-0.124939
-1.157628	interfaces to	-0.124939
-1.157628	mask to	-0.124939
-0.495445	Remember to	-0.124939
-0.495445	appears to	-0.425969
-1.143174	functionality to	-0.124939
-1.143174	handler to	-0.124939
-0.498656	inferior to	-0.124939
-1.143174	auto_ptr to	-0.124939
-0.088920	unable to	-0.124939
-0.088920	15.1a to	-0.124939
-1.143174	manner to	-0.124939
-1.047640	conclusion to	-0.124939
-1.032689	multiplication, to	-0.124939
-1.032689	seem to	-0.124939
-1.032689	-128 to	-0.124939
-1.047640	dividend to	-0.124939
-0.373717	annoying to	-0.425969
-1.032689	listing to	-0.124939
-0.373717	translated to	-0.124939
-1.047640	-fno-builtin to	-0.124939
-0.123125	leads to	-0.124939
-1.032689	questions to	-0.124939
-1.032689	issue to	-0.124939
-1.032689	argument to	-0.124939
-1.032689	decide to	-0.124939
-1.032689	unrolled to	-0.124939
-0.123125	0x2700 to	-0.602060
-0.123125	ported to	-0.124939
-1.032689	experience to	-0.124939
-1.032689	necessary, to	-0.124939
-1.047640	Call to	-0.124939
-1.032689	accesses to	-0.124939
-0.373717	compared to	-0.124939
-0.373717	sufficient to	-0.124939
-0.871549	impossible to	-0.124939
-0.200861	type-casted to	-0.425969
-0.871549	manipulated to	-0.124939
-0.871549	carefully to	-0.124939
-0.871549	dangers to	-0.124939
-0.871549	scanners to	-0.124939
-0.871549	priorities to	-0.124939
-0.871549	answers to	-0.124939
-0.871549	prototype to	-0.124939
-0.871549	Kbytes to	-0.124939
-0.871549	refers to	-0.124939
-0.871549	microseconds to	-0.124939
-0.871549	precautions to	-0.124939
-0.200861	worthwhile to	-0.124939
-0.871549	profitable to	-0.124939
-0.871549	Initialize to	-0.124939
-0.200861	belongs to	-0.124939
-0.871549	caller to	-0.124939
-0.871549	Volatile to	-0.124939
-0.200861	us to	-0.124939
-0.200861	approach to	-0.124939
-0.200861	relates to	-0.425969
-0.871549	11.1a to	-0.124939
-0.200861	forget to	-0.124939
-0.871549	respond to	-0.124939
-0.200861	contribution to	-0.425969
-0.200861	ability to	-0.124939
-0.200861	attempts to	-0.124939
-0.871549	consumer to	-0.124939
-0.871549	distribution to	-0.124939
-0.871549	messages to	-0.124939
-0.871549	coprocessors to	-0.124939
-0.871549	initialize to	-0.124939
-0.871549	obstacle to	-0.124939
-0.871549	extension to	-0.124939
-0.200861	minutes to	-0.124939
-0.871549	incremented to	-0.124939
-0.586003	Updates to	-0.124939
-0.586003	limitations to	-0.124939
-0.586003	forgot to	-0.124939
-0.586003	Handles to	-0.124939
-0.586003	bounds-checking to	-0.124939
-0.586003	comparable to	-0.124939
-0.586003	unacceptable to	-0.124939
-0.586003	T+1 to	-0.124939
-0.586003	prone to	-0.124939
-0.586003	plug-in to	-0.124939
-0.586003	223 to	-0.124939
-0.586003	advised to	-0.124939
-0.586003	unwise to	-0.124939
-0.586003	constructor" to	-0.124939
-0.586003	switching to	-0.124939
-0.586003	throw(A,B,C) to	-0.124939
-0.586003	cumbersome to	-0.124939
-0.586003	novector to	-0.124939
-0.586003	According to	-0.124939
-0.586003	capability to	-0.124939
-0.586003	tried to	-0.124939
-0.586003	Round to	-0.124939
-0.586003	-b to	-0.124939
-0.586003	Entry to	-0.124939
-0.586003	rounded to	-0.124939
-0.586003	closest to	-0.124939
-0.586003	supposed to	-0.124939
-0.586003	relation to	-0.124939
-0.586003	responses to	-0.124939
-0.586003	conform to	-0.124939
-0.586003	correspond to	-0.124939
-0.586003	steps to	-0.124939
-0.586003	Try to	-0.124939
-0.586003	attempting to	-0.124939
-0.586003	15.1d to	-0.124939
-0.586003	advisable to	-0.124939
-0.586003	rise to	-0.124939
-0.586003	specification to	-0.124939
-0.586003	Trying to	-0.124939
-0.586003	Convert to	-0.124939
-0.586003	Float to	-0.124939
-0.586003	quickly to	-0.124939
-0.586003	teachers to	-0.124939
-0.586003	5; to	-0.124939
-0.586003	port to	-0.124939
-0.586003	12.1b to	-0.124939
-0.586003	happened to	-0.124939
-0.586003	closer to	-0.124939
-0.586003	corresponds to	-0.124939
-0.586003	12.8a to	-0.124939
-0.586003	CLR, to	-0.124939
-0.586003	risking to	-0.124939
-0.586003	confined to	-0.124939
-0.586003	prior to	-0.124939
-0.586003	hours to	-0.124939
-0.586003	gone to	-0.124939
-0.586003	susceptible to	-0.124939
-0.586003	adhere to	-0.124939
-0.586003	relate to	-0.124939
-0.586003	WritePrivateProfileString to	-0.124939
-0.586003	responded to	-0.124939
-0.586003	-100 to	-0.124939
-0.586003	tends to	-0.124939
-0.586003	(a*b*c)+(c*b*a) to	-0.124939
-0.586003	analogous to	-0.124939
-0.586003	happy to	-0.124939
-0.586003	practice to	-0.124939
-0.586003	Windows) to	-0.124939
-0.586003	a=a*2; to	-0.124939
-0.586003	tempting to	-0.124939
-0.586003	adapt to	-0.124939
-0.586003	unrelated to	-0.124939
-0.586003	Alternative to	-0.124939
-2.425997	is and	-0.124939
-1.844275	a and	-0.704722
-2.402175	to and	-0.124939
-2.453639	it and	-0.124939
-1.699152	function and	-0.124939
-2.462783	if and	-0.124939
-2.068288	with and	-0.425969
-2.279042	on and	-0.124939
-1.206969	code and	-0.291270
-1.928892	compiler and	-0.124939
-2.611622	x and	-0.124939
-1.408574	time and	-0.124939
-1.989661	use and	-0.124939
-2.202810	more and	-0.124939
-2.031354	A and	-0.124939
-1.367324	memory and	-0.182931
-1.640001	data and	-0.124939
-1.889958	program and	-0.124939
-2.202280	make and	-0.124939
-1.180386	functions and	-0.166331
-1.504745	CPU and	-0.522879
-1.993892	instruction and	-0.124939
-2.133805	point and	-0.124939
-1.500846	loop and	-0.124939
-1.719243	used and	-0.124939
-2.079860	one and	-0.124939
-1.362478	cache and	-0.204120
-1.517385	integer and	-0.124939
-1.896221	page and	-0.124939
-1.274043	set and	-0.271067
-1.614102	class and	-0.124939
-1.812327	do and	-0.124939
-1.031069	compilers and	-0.425969
-2.049072	using and	-0.124939
-1.754451	double and	-0.425969
-1.800277	size and	-0.124939
-1.183583	Intel and	-0.249877
-2.152687	pointer and	-0.124939
-1.382191	b and	-0.823909
-1.560570	library and	-0.124939
-1.817999	i and	-0.124939
-1.317212	float and	-0.522879
-1.970215	two and	-0.124939
-1.820161	number and	-0.124939
-1.553778	static and	-0.301030
-1.401262	C++ and	-0.249877
-1.765820	efficient and	-0.124939
-2.060074	array and	-0.124939
-2.105232	possible and	-0.124939
-2.142124	version and	-0.124939
-2.164387	value and	-0.124939
-1.388902	objects and	-0.124939
-0.887944	variables and	-0.467361
-1.892493	return and	-0.124939
-1.610659	2 and	-0.124939
-1.972385	table and	-0.124939
-1.670469	performance and	-0.124939
-1.752546	order and	-0.124939
-1.476588	long and	-0.124939
-0.790465	32-bit and	-0.937852
-2.045004	branch and	-0.124939
-2.145014	way and	-0.124939
-2.065386	elements and	-0.124939
-1.328076	faster and	-0.124939
-1.301443	before and	-0.726999
-1.279473	called and	-0.249877
-1.440707	address and	-0.124939
-1.950057	4 and	-0.124939
-1.620679	call and	-0.425969
-1.951226	8 and	-0.124939
-1.983603	bit and	-0.124939
-1.969258	first and	-0.124939
-1.938637	register and	-0.124939
-2.004411	64 and	-0.124939
-1.131330	libraries and	-0.124939
-1.956488	registers and	-0.124939
-0.842481	pointers and	-0.176091
-1.914901	test and	-0.124939
-0.830507	new and	-0.477121
-0.830507	systems and	-0.234083
-1.560470	user and	-0.124939
-1.855353	these and	-0.124939
-1.185907	access and	-0.124939
-2.016145	SSE2 and	-0.124939
-1.485929	system and	-0.124939
-1.540813	32 and	-0.124939
-1.063072	file and	-0.124939
-1.306985	operations and	-0.124939
-0.971195	0 and	-0.301030
-1.509202	type and	-0.124939
-1.940335	case and	-0.124939
-1.970469	cases and	-0.124939
-0.964851	processors and	-0.124939
-1.478018	constant and	-0.124939
-1.300462	up and	-0.124939
-1.877350	error and	-0.124939
-0.871756	times and	-0.191886
-1.474440	stack and	-0.124939
-1.280427	Gnu and	-0.301030
-1.957023	important and	-0.124939
-1.476221	CPUs and	-0.425969
-0.908404	arrays and	-0.124939
-0.626956	Windows and	-0.212089
-1.103810	calls and	-0.249877
-1.840367	calculations and	-0.124939
-1.987370	versions and	-0.124939
-1.918205	execution and	-0.124939
-1.231286	processor and	-0.301030
-1.941652	compiled and	-0.124939
-1.422501	big and	-0.124939
-1.877655	threads and	-0.124939
-1.848203	best and	-0.124939
-0.967272	language and	-0.124939
-1.909251	speed and	-0.124939
-1.996612	c and	-0.124939
-1.206624	single and	-0.602060
-1.782478	etc. and	-0.124939
-0.395565	AMD and	-0.878266
-1.201246	allocated and	-0.301030
-1.183222	small and	-0.124939
-0.725269	overflow and	-0.182931
-0.945969	integers and	-0.346788
-1.835251	matrix and	-0.124939
-0.492654	Linux and	-0.284640
-1.856058	AVX and	-0.124939
-1.030373	classes and	-0.124939
-1.844777	works and	-0.124939
-1.823970	optimized and	-0.124939
-1.915038	calculation and	-0.124939
-1.890460	parameters and	-0.124939
-1.370899	problem and	-0.124939
-1.367031	support and	-0.124939
-1.775666	operators and	-0.124939
-1.829679	list and	-0.124939
-1.127000	structure and	-0.124939
-1.865350	inline and	-0.124939
-1.759615	1 and	-0.124939
-0.960676	mode and	-0.249877
-1.907823	sign and	-0.124939
-1.296250	counter and	-0.124939
-0.939309	count and	-0.249877
-0.738609	files and	-0.124939
-1.297071	fast and	-0.124939
-0.810158	allocation and	-0.221849
-1.712151	programs and	-0.124939
-0.635810	problems and	-0.191886
-1.745358	space and	-0.124939
-1.050969	dispatching and	-0.124939
-1.258195	microprocessor and	-0.124939
-0.885241	branches and	-0.249877
-1.696357	multiplication and	-0.124939
-1.713173	automatically and	-0.124939
-1.816588	caching and	-0.124939
-1.249646	sets and	-0.124939
-1.011865	complicated and	-0.124939
-1.713173	handling and	-0.124939
-1.736811	like and	-0.124939
-1.707798	methods and	-0.124939
-0.753505	signed and	-0.823909
-1.696140	model and	-0.124939
-1.707798	development and	-0.124939
-1.228674	block and	-0.124939
-1.220903	name and	-0.124939
-1.744776	conversion and	-0.124939
-1.702049	high and	-0.124939
-0.981704	zero and	-0.301030
-1.222870	Microsoft and	-0.124939
-1.753574	parameter and	-0.124939
-1.740108	division and	-0.124939
-1.740108	source and	-0.124939
-1.753574	running and	-0.124939
-1.753574	resources and	-0.124939
-1.671586	n and	-0.124939
-1.695885	string and	-0.124939
-1.659927	better and	-0.124939
-1.648574	applications and	-0.124939
-1.637510	&& and	-0.124939
-1.200375	addition and	-0.124939
-1.676580	> and	-0.124939
-1.165386	expressions and	-0.124939
-0.960413	read and	-0.124939
-1.160866	directives and	-0.124939
-1.669056	public and	-0.124939
-1.145183	framework and	-0.124939
-0.937814	linking and	-0.301030
-1.709478	microprocessors and	-0.124939
-1.147737	numbers and	-0.425969
-1.622867	platform and	-0.124939
-1.688289	later and	-0.124939
-1.688289	together and	-0.124939
-1.732864	XMM and	-0.124939
-1.143688	interface and	-0.124939
-1.732864	bigger and	-0.124939
-1.635187	vectors and	-0.124939
-1.599230	AVX2 and	-0.124939
-1.652116	<< and	-0.124939
-0.901575	x86 and	-0.301030
-1.638651	process and	-0.124939
-1.159283	advantages and	-0.124939
-1.778174	b, and	-0.124939
-1.625590	storage and	-0.124939
-1.638651	options and	-0.124939
-1.101979	constructor and	-0.124939
-1.754693	a[i] and	-0.124939
-0.870867	function, and	-0.301030
-1.628635	operands and	-0.124939
-1.577110	advanced and	-0.124939
-1.687106	range and	-0.124939
-1.671734	start and	-0.124939
-1.589430	modules and	-0.124939
-0.870867	smaller and	-0.124939
-1.085368	core and	-0.124939
-1.662283	around and	-0.124939
-1.077156	5 and	-0.124939
-1.577285	section and	-0.124939
-1.590346	tested and	-0.124939
-1.646911	contentions and	-0.124939
-1.077156	positive and	-0.124939
-1.632064	C and	-0.124939
-1.093738	global and	-0.124939
-1.591379	conversions and	-0.124939
-1.577483	statement and	-0.124939
-1.651890	off and	-0.124939
-1.075944	p and	-0.124939
-1.605735	languages and	-0.124939
-1.620582	installation and	-0.124939
-1.651890	dynamically and	-0.124939
-1.591379	inlining and	-0.124939
-1.592553	network and	-0.124939
-1.592553	slow and	-0.124939
-1.039380	functions, and	-0.124939
-1.577706	lines and	-0.124939
-1.657604	find and	-0.124939
-1.031010	checking and	-0.124939
-0.618502	platforms and	-0.124939
-1.664227	level-1 and	-0.124939
-1.547743	limited and	-0.124939
-1.577962	math and	-0.124939
-1.562590	constants and	-0.124939
-1.547743	shift and	-0.124939
-1.593898	safe and	-0.124939
-1.627641	efficiency and	-0.124939
-1.578257	strings and	-0.124939
-0.734342	testing and	-0.301030
-0.968862	alignment and	-0.124939
-1.487306	100 and	-0.124939
-0.749083	Variables and	-0.602060
-1.501202	processing and	-0.124939
-0.586317	clear and	-0.124939
-1.543495	x, and	-0.124939
-1.560694	OS and	-0.124939
-0.555005	names and	-0.249877
-1.560694	RAM and	-0.124939
-0.955336	rows and	-0.425969
-0.709352	thing and	-0.124939
-1.480796	structures and	-0.124939
-1.505707	occur and	-0.124939
-1.559492	smart and	-0.124939
-0.913217	SSE and	-0.124939
-0.913217	reading and	-0.425969
-1.473227	simplest and	-0.124939
-0.908928	message and	-0.124939
-1.489163	cores and	-0.124939
-1.473227	sizes and	-0.124939
-0.917548	PathScale and	-0.124939
-0.515488	BSD and	-0.726999
-0.671563	program, and	-0.124939
-0.871824	SSE4.1 and	-0.124939
-1.431834	buffer and	-0.124939
-1.431834	seconds and	-0.124939
-1.416462	compiler, and	-0.124939
-1.447770	invalid and	-0.124939
-1.431834	input and	-0.124939
-0.630170	programmers and	-0.124939
-1.447770	stride and	-0.124939
-1.447770	set, and	-0.124939
-0.632648	processors, and	-0.124939
-1.537617	matter and	-0.124939
-0.867535	declaration and	-0.425969
-1.447770	features and	-0.124939
-1.499421	added and	-0.124939
-1.418556	independent and	-0.124939
-1.418556	away and	-0.124939
-0.843656	15.1b and	-0.124939
-1.418556	multiply and	-0.124939
-1.472342	explanation and	-0.124939
-1.491860	brands and	-0.124939
-1.402013	diagonal and	-0.124939
-1.472342	*p and	-0.124939
-1.435755	addition, and	-0.124939
-0.435292	OpenMP and	-0.726999
-0.821778	standardized and	-0.124939
-0.591888	parent and	-0.602060
-1.367404	systems, and	-0.124939
-1.367404	false and	-0.124939
-1.384603	PC and	-0.124939
-1.384603	parallelism and	-0.124939
-1.402511	c2 and	-0.124939
-1.350860	prediction and	-0.124939
-1.350860	iterations and	-0.124939
-0.774914	integer, and	-0.124939
-1.367404	algorithms and	-0.124939
-1.402511	PLT and	-0.124939
-0.538230	additions and	-0.124939
-0.779245	ecx and	-0.124939
-1.350860	variables, and	-0.124939
-1.367404	however, and	-0.124939
-1.350860	profiling and	-0.124939
-0.540736	fragmented and	-0.124939
-0.535738	family and	-0.602060
-0.779245	devices and	-0.124939
-0.382390	GOT and	-0.726999
-1.367404	Memory and	-0.124939
-1.384603	down and	-0.124939
-0.779245	misses and	-0.425969
-1.326611	-fpic and	-0.124939
-1.382715	carry and	-0.124939
-0.480238	debugging and	-0.124939
-1.326611	vector, and	-0.124939
-0.725628	object, and	-0.124939
-1.344519	allowed and	-0.124939
-0.725628	itself and	-0.124939
-1.363197	algebra and	-0.124939
-1.309412	switches and	-0.124939
-1.363197	distributed and	-0.124939
-0.721253	mode, and	-0.124939
-1.326611	Java and	-0.124939
-1.326611	free and	-0.124939
-1.309412	expensive and	-0.124939
-0.725628	rounding and	-0.425969
-0.721253	system, and	-0.124939
-1.309412	integers, and	-0.124939
-0.721253	again and	-0.124939
-0.725628	linker and	-0.124939
-1.363197	debug and	-0.124939
-0.725628	Clang and	-0.425969
-1.309412	reliable and	-0.124939
-1.326611	Borland and	-0.124939
-1.309412	units and	-0.124939
-1.326611	registers, and	-0.124939
-0.725628	transpose and	-0.425969
-0.260957	possible, and	-0.124939
-0.658681	compact and	-0.124939
-1.296250	reads and	-0.124939
-1.259664	course, and	-0.124939
-1.277572	identify and	-0.124939
-1.259664	complex and	-0.124939
-0.658681	Performance and	-0.425969
-0.658681	Test and	-0.425969
-0.658681	portability and	-0.124939
-1.259664	evaluate and	-0.124939
-1.315768	.NET and	-0.124939
-0.415797	Pointers and	-0.602060
-1.277572	costly and	-0.124939
-0.667565	efficient, and	-0.124939
-1.296250	computers and	-0.124939
-1.277572	B and	-0.124939
-0.663100	9 and	-0.124939
-1.296250	Core and	-0.124939
-1.259664	debugger and	-0.124939
-0.658681	truncation and	-0.124939
-1.277572	spots and	-0.124939
-1.277572	7 and	-0.124939
-1.277572	powerful and	-0.124939
-0.420853	32- and	-0.602060
-1.277572	processing, and	-0.124939
-1.259664	users and	-0.124939
-0.658681	C++, and	-0.124939
-0.667565	communication and	-0.425969
-1.198391	Fortran and	-0.124939
-1.198391	increment and	-0.124939
-1.217069	backwards and	-0.124939
-1.198391	swapping and	-0.124939
-0.339136	memset and	-0.124939
-1.198391	popular and	-0.124939
-1.236587	searching and	-0.124939
-0.583919	propagation and	-0.124939
-0.583919	development, and	-0.124939
-1.217069	obvious and	-0.124939
-1.198391	lists and	-0.124939
-1.217069	pow and	-0.124939
-1.198391	pointers, and	-0.124939
-1.198391	Objects and	-0.124939
-0.183539	constructors and	-0.249877
-1.217069	nonzero and	-0.124939
-1.198391	package and	-0.124939
-1.198391	understand and	-0.124939
-1.198391	inefficient, and	-0.124939
-1.198391	jobs and	-0.124939
-1.217069	latency and	-0.124939
-1.217069	platforms, and	-0.124939
-1.198391	separately and	-0.124939
-1.198391	elimination and	-0.124939
-1.217069	sum1 and	-0.124939
-1.198391	Compilers and	-0.124939
-1.217069	Codeplay and	-0.124939
-0.583919	features, and	-0.124939
-1.198391	flag and	-0.124939
-0.588384	b[i] and	-0.425969
-0.183539	malloc and	-0.249877
-1.217069	true, and	-0.124939
-1.120159	Graphics and	-0.124939
-1.160114	obstacles and	-0.124939
-1.160114	m and	-0.124939
-1.120159	resources, and	-0.124939
-1.120159	one, and	-0.124939
-1.120159	needed, and	-0.124939
-1.120159	SVML and	-0.124939
-0.244762	subtraction and	-0.602060
-1.120159	precision, and	-0.124939
-1.120159	u.f and	-0.124939
-1.120159	134 and	-0.124939
-1.139677	*p+2 and	-0.124939
-0.491474	cores, and	-0.124939
-1.120159	pipeline and	-0.124939
-1.160114	flush-to-zero and	-0.124939
-1.139677	14.8 and	-0.124939
-0.491474	overflow, and	-0.124939
-0.491474	advance and	-0.124939
-1.120159	case" and	-0.124939
-0.495985	0x2710 and	-0.124939
-1.120159	spot and	-0.124939
-1.120159	saving and	-0.124939
-1.139677	structured and	-0.124939
-1.139677	documentation and	-0.124939
-1.139677	not, and	-0.124939
-1.120159	12.4b and	-0.124939
-1.120159	underflow and	-0.124939
-1.139677	2n and	-0.124939
-0.495985	Branches and	-0.425969
-0.491474	interfaces and	-0.124939
-1.120159	caches and	-0.124939
-1.120159	is, and	-0.124939
-1.139677	breakpoint and	-0.124939
-1.120159	functionality and	-0.124939
-0.495985	layers and	-0.124939
-1.120159	Y and	-0.124939
-1.120159	auto_ptr and	-0.124939
-1.139677	constants, and	-0.124939
-1.139677	opens and	-0.124939
-1.139677	format and	-0.124939
-1.120159	resolution and	-0.124939
-1.120159	units, and	-0.124939
-1.014738	49 and	-0.124939
-1.014738	bits, and	-0.124939
-1.035175	consecutively and	-0.124939
-1.035175	11) and	-0.124939
-1.035175	B, and	-0.124939
-1.014738	blocking and	-0.124939
-1.014738	API and	-0.124939
-1.014738	r1 and	-0.124939
-1.035175	r2 and	-0.124939
-1.014738	anyway and	-0.124939
-0.371046	database, and	-0.124939
-1.014738	calculations, and	-0.124939
-0.371046	level, and	-0.124939
-1.014738	books and	-0.124939
-0.371046	generality and	-0.124939
-1.014738	parameter, and	-0.124939
-1.014738	keywords and	-0.124939
-1.014738	-fno-pic and	-0.124939
-1.014738	_M_IX86 and	-0.124939
-1.014738	elsewhere and	-0.124939
-1.014738	operations, and	-0.124939
-1.014738	packages and	-0.124939
-1.014738	136 and	-0.124939
-1.014738	automatically, and	-0.124939
-1.014738	145 and	-0.124939
-1.014738	RISC and	-0.124939
-1.014738	Pascal and	-0.124939
-0.371046	Constructors and	-0.425969
-1.014738	PC's and	-0.124939
-0.371046	DOS and	-0.124939
-1.014738	Signed and	-0.124939
-1.014738	logarithms and	-0.124939
-1.035175	easiest and	-0.124939
-0.371046	flow and	-0.124939
-1.014738	s2 and	-0.124939
-1.014738	etc., and	-0.124939
-1.014738	F2 and	-0.124939
-1.014738	readable and	-0.124939
-1.014738	decomposition and	-0.124939
-1.014738	forums and	-0.124939
-1.014738	Func1 and	-0.124939
-1.014738	odd and	-0.124939
-1.014738	vectors, and	-0.124939
-1.035175	deallocation and	-0.124939
-1.014738	(PLT) and	-0.124939
-1.014738	references, and	-0.124939
-0.371046	folding and	-0.425969
-0.122373	Sum2 and	-0.301030
-0.859084	Structure and	-0.124939
-0.859084	_MSC_VER and	-0.124939
-0.859084	operator; and	-0.124939
-0.859084	CPU-specific and	-0.124939
-0.199513	develop and	-0.124939
-0.859084	Multiplication and	-0.124939
-0.859084	14.12b and	-0.124939
-0.859084	hacks and	-0.124939
-0.859084	hint and	-0.124939
-0.859084	paragraph and	-0.124939
-0.859084	scanners and	-0.124939
-0.859084	73 and	-0.124939
-0.859084	settings and	-0.124939
-0.859084	corrections and	-0.124939
-0.859084	inconsistent and	-0.124939
-0.859084	starting and	-0.124939
-0.859084	itself, and	-0.124939
-0.859084	Kbytes and	-0.124939
-0.199513	creating and	-0.425969
-0.859084	FuncA and	-0.124939
-0.859084	overwritten, and	-0.124939
-0.859084	if, and	-0.124939
-0.859084	p1 and	-0.124939
-0.859084	lrintf and	-0.124939
-0.859084	audio and	-0.124939
-0.859084	price, and	-0.124939
-0.859084	renaming and	-0.124939
-0.859084	compression and	-0.124939
-0.199513	exponent, and	-0.124939
-0.859084	verifying and	-0.124939
-0.199513	Exceptions and	-0.425969
-0.859084	constructors, and	-0.124939
-0.859084	push and	-0.124939
-0.859084	searching, and	-0.124939
-0.859084	properly and	-0.124939
-0.859084	list[i].a and	-0.124939
-0.859084	MOVNTPD and	-0.124939
-0.859084	Increment and	-0.124939
-0.859084	propagation, and	-0.124939
-0.859084	Library" and	-0.124939
-0.859084	multiplications and	-0.124939
-0.859084	optional and	-0.124939
-0.859084	__GNUC__ and	-0.124939
-0.859084	14.23b and	-0.124939
-0.859084	respects and	-0.124939
-0.859084	support, and	-0.124939
-0.859084	tedious and	-0.124939
-0.859084	systematic and	-0.124939
-0.199513	management and	-0.425969
-0.859084	clumsy and	-0.124939
-0.859084	fetched and	-0.124939
-0.859084	123 and	-0.124939
-0.859084	reusable and	-0.124939
-0.859084	expansions and	-0.124939
-0.859084	interrupts and	-0.124939
-0.859084	distribution and	-0.124939
-0.859084	log, and	-0.124939
-0.199513	Goedecker and	-0.124939
-0.859084	accurate and	-0.124939
-0.859084	sorting and	-0.124939
-0.859084	keyboard and	-0.124939
-0.199513	root and	-0.425969
-0.859084	statistics, and	-0.124939
-0.859084	leaks and	-0.124939
-0.859084	95 and	-0.124939
-0.859084	delete, and	-0.124939
-0.859084	accessed, and	-0.124939
-0.199513	Structures and	-0.124939
-0.579500	common, and	-0.124939
-0.579500	compact, and	-0.124939
-0.579500	'@' and	-0.124939
-0.579500	areas, and	-0.124939
-0.579500	strange and	-0.124939
-0.579500	invalid, and	-0.124939
-0.579500	sqrt and	-0.124939
-0.579500	esp+12 and	-0.124939
-0.579500	x∙xn-1, and	-0.124939
-0.579500	evictions and	-0.124939
-0.579500	compactness, and	-0.124939
-0.579500	bloat and	-0.124939
-0.579500	C# and	-0.124939
-0.579500	(0); and	-0.124939
-0.579500	frustration and	-0.124939
-0.579500	situations, and	-0.124939
-0.579500	technology, and	-0.124939
-0.579500	alignments and	-0.124939
-0.579500	independence, and	-0.124939
-0.579500	sufficient, and	-0.124939
-0.579500	running, and	-0.124939
-0.579500	2B, and	-0.124939
-0.579500	a+b=0, and	-0.124939
-0.579500	errors, and	-0.124939
-0.579500	condition, and	-0.124939
-0.579500	local, and	-0.124939
-0.579500	commas and	-0.124939
-0.579500	(YMM), and	-0.124939
-0.579500	dominating and	-0.124939
-0.579500	spell-checking and	-0.124939
-0.579500	tested, and	-0.124939
-0.579500	thread, and	-0.124939
-0.579500	! and	-0.124939
-0.579500	2A and	-0.124939
-0.579500	transposing and	-0.124939
-0.579500	aligned, and	-0.124939
-0.579500	cheaper and	-0.124939
-0.579500	source, and	-0.124939
-0.579500	_mm_malloc and	-0.124939
-0.579500	throughputs and	-0.124939
-0.579500	reusability and	-0.124939
-0.579500	economy and	-0.124939
-0.579500	_intel_fast_memcpy and	-0.124939
-0.579500	GetPrivateProfileString and	-0.124939
-0.579500	non-zero, and	-0.124939
-0.579500	maintain and	-0.124939
-0.579500	module, and	-0.124939
-0.579500	ambiguous and	-0.124939
-0.579500	sequential, and	-0.124939
-0.579500	VTune and	-0.124939
-0.579500	esp+8 and	-0.124939
-0.579500	hand and	-0.124939
-0.579500	caller, and	-0.124939
-0.579500	0x3F00 and	-0.124939
-0.579500	T+6, and	-0.124939
-0.579500	values, and	-0.124939
-0.579500	Professional and	-0.124939
-0.579500	FreeBSD and	-0.124939
-0.579500	optimize, and	-0.124939
-0.579500	VML and	-0.124939
-0.579500	brands, and	-0.124939
-0.579500	routines and	-0.124939
-0.579500	workstations and	-0.124939
-0.579500	decoding and	-0.124939
-0.579500	geometry and	-0.124939
-0.579500	terminated and	-0.124939
-0.579500	subexpressions, and	-0.124939
-0.579500	flush and	-0.124939
-0.579500	ASP and	-0.124939
-0.579500	issues, and	-0.124939
-0.579500	mutexes and	-0.124939
-0.579500	fluctuating and	-0.124939
-0.579500	mechanisms, and	-0.124939
-0.579500	(2,2,2,2), and	-0.124939
-0.579500	infinity, and	-0.124939
-0.579500	games and	-0.124939
-0.579500	email and	-0.124939
-0.579500	platform-independent and	-0.124939
-0.579500	limitation and	-0.124939
-0.579500	attacks and	-0.124939
-0.579500	temp1 and	-0.124939
-0.579500	Library) and	-0.124939
-0.579500	520 and	-0.124939
-0.579500	(ATL) and	-0.124939
-0.579500	2.6.30 and	-0.124939
-0.579500	15.1b, and	-0.124939
-0.579500	recoverable and	-0.124939
-0.579500	reinstalled and	-0.124939
-0.579500	synchronizing and	-0.124939
-0.579500	www.agner.org/optimize and	-0.124939
-0.579500	dependent and	-0.124939
-0.579500	period and	-0.124939
-0.579500	3A and	-0.124939
-0.579500	protocols and	-0.124939
-0.579500	intrinsics and	-0.124939
-0.579500	GOT, and	-0.124939
-0.579500	traffic and	-0.124939
-0.579500	manageable and	-0.124939
-0.579500	call, and	-0.124939
-0.579500	-128, and	-0.124939
-0.579500	libmmt.lib and	-0.124939
-0.579500	_finite()) and	-0.124939
-0.579500	(& and	-0.124939
-0.579500	process, and	-0.124939
-0.579500	side-effects and	-0.124939
-0.579500	modularity and	-0.124939
-0.579500	sizes, and	-0.124939
-0.579500	workplace and	-0.124939
-0.579500	www.openmp.org and	-0.124939
-0.579500	CPLDs and	-0.124939
-0.579500	(new and	-0.124939
-0.579500	squares and	-0.124939
-0.579500	100*16, and	-0.124939
-0.579500	violations and	-0.124939
-0.579500	pros and	-0.124939
-0.579500	tortuous and	-0.124939
-0.579500	catch, and	-0.124939
-0.579500	2005; and	-0.124939
-0.579500	++i and	-0.124939
-0.579500	lower; and	-0.124939
-0.579500	mainframes, and	-0.124939
-0.579500	sort and	-0.124939
-0.579500	mask, and	-0.124939
-0.579500	disks and	-0.124939
-0.579500	tables, and	-0.124939
-0.579500	bulky and	-0.124939
-0.579500	(1,2,3,4), and	-0.124939
-0.579500	arithmetics and	-0.124939
-0.579500	truncation, and	-0.124939
-0.579500	(&) and	-0.124939
-0.579500	(&& and	-0.124939
-0.579500	error; and	-0.124939
-2.398648	is in	-0.124939
-2.486265	a in	-0.124939
-2.059464	and in	-0.221849
-2.135229	be in	-0.602060
-1.924187	are in	-0.602060
-2.598100	can in	-0.124939
-1.904927	or in	-0.425969
-2.044272	it in	-0.124939
-1.408935	function in	-0.452298
-2.136192	with in	-0.124939
-1.213973	code in	-0.425969
-1.657393	as in	-0.492916
-1.622408	not in	-0.271067
-2.575188	- in	-0.124939
-1.671814	int in	-0.425969
-1.312241	than in	-0.367977
-1.965127	compiler in	-0.124939
-2.681588	x in	-0.124939
-1.741960	may in	-0.522879
-1.993883	this in	-0.425969
-1.594669	time in	-0.124939
-1.873062	use in	-0.124939
-2.334380	when in	-0.124939
-1.964965	memory in	-0.124939
-1.272873	data in	-0.279841
-1.773359	program in	-0.124939
-1.659052	vector in	-0.726999
-2.135377	different in	-0.124939
-1.820614	because in	-0.301030
-1.870133	same in	-0.124939
-1.386594	functions in	-0.271067
-1.754957	only in	-0.602060
-1.572232	other in	-0.249877
-2.220118	point in	-0.124939
-1.256342	loop in	-0.477121
-2.212932	which in	-0.124939
-1.498185	but in	-0.124939
-0.800921	used in	-0.391207
-2.167009	one in	-0.124939
-1.870221	cache in	-0.124939
-1.443983	integer in	-0.221849
-1.666270	set in	-0.301030
-1.821628	class in	-0.124939
-2.061013	example in	-0.124939
-1.641218	size in	-0.124939
-1.500901	pointer in	-0.249877
-1.842934	b in	-0.124939
-2.307383	i in	-0.124939
-2.094739	float in	-0.124939
-1.577782	object in	-0.124939
-2.330773	number in	-0.124939
-1.326970	efficient in	-0.522879
-1.758911	possible in	-0.124939
-1.768222	version in	-0.124939
-1.215693	value in	-0.204120
-0.858329	objects in	-0.284640
-0.873541	variable in	-0.249877
-2.185319	so in	-0.124939
-0.892525	variables in	-0.388180
-1.461082	2 in	-0.301030
-1.473429	table in	-0.301030
-2.115944	performance in	-0.124939
-1.323874	software in	-0.124939
-1.030565	order in	-0.726999
-2.114200	branch in	-0.124939
-1.703110	way in	-0.124939
-0.624998	elements in	-0.338819
-1.484552	faster in	-0.124939
-2.220491	const in	-0.124939
-0.320720	stored in	-0.413734
-1.432231	called in	-0.301030
-1.673306	address in	-0.124939
-2.024127	4 in	-0.124939
-2.024668	8 in	-0.124939
-1.186275	example, in	-0.221849
-1.422220	bit in	-0.124939
-2.151744	unsigned in	-0.124939
-1.622937	first in	-0.124939
-2.013639	libraries in	-0.124939
-1.368422	registers in	-0.301030
-2.047415	pointers in	-0.124939
-1.985611	test in	-0.124939
-0.831907	useful in	-0.234083
-2.050788	even in	-0.124939
-1.576016	method in	-0.124939
-1.555478	access in	-0.124939
-1.962468	16 in	-0.124939
-1.318630	file in	-0.124939
-0.914229	bits in	-0.271067
-1.536828	operations in	-0.124939
-2.028898	0 in	-0.124939
-1.993650	type in	-0.124939
-2.002197	case in	-0.124939
-2.208167	short in	-0.124939
-1.907888	simple in	-0.124939
-1.944833	instructions in	-0.124939
-0.679618	available in	-0.263241
-1.310622	up in	-0.301030
-1.943165	error in	-0.124939
-1.145249	times in	-0.249877
-1.278523	stack in	-0.301030
-1.910859	about in	-0.124939
-0.652319	accessed in	-0.467361
-1.496932	CPUs in	-0.425969
-1.900021	while in	-0.124939
-1.481040	arrays in	-0.124939
-1.469186	work in	-0.124939
-1.922476	Windows in	-0.124939
-1.112103	calls in	-0.726999
-1.101944	calculations in	-0.124939
-1.006308	result in	-0.221849
-1.994712	compiled in	-0.124939
-1.098505	bytes in	-0.124939
-1.888497	big in	-0.124939
-1.089797	threads in	-0.249877
-1.984728	necessary in	-0.124939
-0.536669	element in	-0.393784
-1.894508	language in	-0.124939
-1.470404	But in	-0.124939
-1.965122	speed in	-0.124939
-1.436309	thread in	-0.124939
-1.850047	etc. in	-0.124939
-1.451445	exception in	-0.425969
-1.943406	allocated in	-0.124939
-1.871324	small in	-0.124939
-1.933714	overflow in	-0.124939
-1.062237	integers in	-0.124939
-1.429888	option in	-0.425969
-1.415898	matrix in	-0.124939
-1.913238	Linux in	-0.124939
-1.184827	classes in	-0.602060
-1.057733	done in	-0.249877
-1.188346	precision in	-0.124939
-1.860328	line in	-0.124939
-1.410626	works in	-0.425969
-0.647975	explained in	-0.380211
-1.176216	calculated in	-0.301030
-1.187221	calculation in	-0.301030
-1.415264	advantageous in	-0.124939
-0.541243	implemented in	-0.321233
-1.905473	problem in	-0.124939
-1.960090	known in	-0.124939
-1.164949	solution in	-0.124939
-1.022670	advantage in	-0.124939
-1.895328	support in	-0.124939
-1.159887	supported in	-0.124939
-1.936014	eight in	-0.124939
-1.882738	list in	-0.124939
-2.024774	likely in	-0.124939
-1.367899	structure in	-0.124939
-0.871610	run in	-0.124939
-1.866555	hardware in	-0.124939
-1.108527	values in	-0.301030
-1.777556	All in	-0.124939
-1.863154	well in	-0.124939
-1.121672	information in	-0.124939
-1.818467	cycles in	-0.124939
-0.835282	addresses in	-0.346788
-1.314057	counter in	-0.124939
-1.311783	fast in	-0.124939
-1.823803	allocation in	-0.124939
-1.866933	However, in	-0.124939
-1.293397	optimal in	-0.124939
-0.909920	space in	-0.726999
-1.998682	lot in	-0.124939
-0.711950	dispatching in	-0.425969
-1.792768	microprocessor in	-0.124939
-1.772710	branches in	-0.124939
-1.763018	typically in	-0.124939
-1.883790	preferably in	-0.124939
-0.877735	automatically in	-0.249877
-1.819508	see in	-0.124939
-1.270719	implementation in	-0.425969
-1.746627	complicated in	-0.124939
-1.254988	handling in	-0.124939
-1.797408	members in	-0.124939
-1.759345	methods in	-0.124939
-1.749199	name in	-0.124939
-1.731471	zero in	-0.124939
-1.784746	division in	-0.124939
-1.014352	cost in	-0.301030
-0.575140	running in	-0.271067
-1.214282	dispatcher in	-0.124939
-1.777750	programmer in	-0.124939
-1.733520	lookup in	-0.124939
-1.827000	end in	-0.124939
-0.984478	examples in	-0.124939
-0.560874	difference in	-0.367977
-0.978894	mechanism in	-0.301030
-0.719888	needed in	-0.221849
-1.220690	last in	-0.124939
-0.288500	transferred in	-0.726999
-1.807695	longer in	-0.124939
-0.946848	optimizations in	-0.124939
-1.673479	framework in	-0.124939
-1.828480	look in	-0.124939
-1.184421	microprocessors in	-0.425969
-0.923781	numbers in	-0.124939
-1.728851	later in	-0.124939
-0.660210	together in	-0.221849
-0.774288	declared in	-0.425969
-1.836936	piece in	-0.124939
-1.744026	know in	-0.124939
-1.731180	r in	-0.124939
-0.638956	results in	-0.124939
-1.694775	goes in	-0.124939
-1.683289	options in	-0.124939
-1.694775	were in	-0.124939
-1.671294	operands in	-0.124939
-1.761533	points in	-0.124939
-1.707699	switch in	-0.124939
-1.671294	smaller in	-0.124939
-1.707699	here in	-0.124939
-0.592704	around in	-0.221849
-1.102517	things in	-0.124939
-1.646471	reductions in	-0.124939
-1.086195	tested in	-0.124939
-0.591677	contentions in	-0.522879
-0.702381	references in	-0.124939
-1.736709	overhead in	-0.124939
-1.682875	change in	-0.124939
-1.631941	conversions in	-0.124939
-1.063082	statement in	-0.124939
-0.566375	errors in	-0.221849
-0.477208	columns in	-0.425969
-1.644069	languages in	-0.124939
-0.830634	syntax in	-0.301030
-1.696283	avoided in	-0.124939
-1.641363	inefficient in	-0.124939
-0.256401	described in	-0.380211
-0.536294	lines in	-0.124939
-1.592113	operation in	-0.124939
-1.696921	instance in	-0.124939
-0.508383	given in	-0.124939
-1.573949	task in	-0.124939
-1.586077	limited in	-0.124939
-1.028293	costs in	-0.425969
-1.611400	S1 in	-0.124939
-1.018197	temp in	-0.124939
-1.005090	database in	-0.124939
-1.598554	constants in	-0.124939
-1.638291	bool in	-0.124939
-1.586077	shift in	-0.124939
-1.682034	d in	-0.124939
-1.008330	algorithm in	-0.124939
-0.387870	strings in	-0.204120
-0.748084	conditions in	-0.301030
-1.553893	right in	-0.124939
-1.541764	macro in	-0.124939
-1.529965	100 in	-0.124939
-1.579215	tasks in	-0.124939
-1.541764	processing in	-0.124939
-0.954589	containers in	-0.124939
-1.544453	priority in	-0.124939
-1.600011	obtained in	-0.124939
-0.947936	counters in	-0.124939
-1.544453	possibly in	-0.124939
-1.571345	names in	-0.124939
-1.557691	details in	-0.124939
-0.961346	rows in	-0.124939
-1.615087	fail in	-0.124939
-1.507002	(e.g. in	-0.124939
-1.585441	compiling in	-0.124939
-0.954589	least in	-0.124939
-1.519130	structures in	-0.124939
-0.405709	occur in	-0.346788
-0.522692	especially in	-0.425969
-1.577299	improved in	-0.124939
-0.926976	discussed in	-0.124939
-1.547653	below in	-0.124939
-0.916801	message in	-0.124939
-0.920166	delay in	-0.425969
-1.519902	resource in	-0.124939
-1.519902	cores in	-0.124939
-0.916801	either in	-0.425969
-0.406744	defined in	-0.221849
-1.481342	rarely in	-0.124939
-0.679397	except in	-0.301030
-1.465272	chapter in	-0.124939
-1.535906	back in	-0.124939
-1.465272	templates in	-0.124939
-1.478510	unrolling in	-0.124939
-0.885583	CriticalFunction in	-0.425969
-1.520830	sequence in	-0.124939
-0.882165	something in	-0.124939
-1.478510	invalid in	-0.124939
-1.465272	input in	-0.124939
-1.520830	organized in	-0.124939
-1.520830	'this' in	-0.124939
-0.367428	gain in	-0.346788
-1.506260	happen in	-0.124939
-1.452426	metaprogramming in	-0.124939
-1.478510	define in	-0.124939
-1.475073	implement in	-0.124939
-1.521967	terms in	-0.124939
-1.446406	blocks in	-0.124939
-0.833016	away in	-0.124939
-1.446406	low in	-0.124939
-0.592247	provided in	-0.124939
-1.446406	default in	-0.124939
-1.432752	chains in	-0.124939
-1.446406	purposes in	-0.124939
-0.592247	mentioned in	-0.124939
-0.836407	Optimization in	-0.425969
-1.475073	everything in	-0.124939
-1.419514	(or in	-0.124939
-0.270518	included in	-0.346788
-1.381600	iterations in	-0.124939
-1.438996	account in	-0.124939
-1.381600	chain in	-0.124939
-0.781863	algorithms in	-0.124939
-0.788673	additions in	-0.124939
-1.423920	factors in	-0.124939
-0.180386	listed in	-0.425969
-1.395253	explicitly in	-0.124939
-1.395253	interpreted in	-0.124939
-1.438996	determined in	-0.124939
-0.785255	misses in	-0.425969
-1.423920	YMM in	-0.124939
-1.381004	purpose in	-0.124939
-0.727263	-fpic in	-0.124939
-1.337261	had in	-0.124939
-1.351358	19 in	-0.124939
-0.730681	allowed in	-0.124939
-1.365928	itself in	-0.124939
-1.381004	algebra in	-0.124939
-1.351358	free in	-0.124939
-1.337261	expensive in	-0.124939
-1.337261	exceptions in	-0.124939
-0.483102	saved in	-0.124939
-1.351358	changes in	-0.124939
-1.337261	measured in	-0.124939
-1.337261	factor in	-0.124939
-1.337261	units in	-0.124939
-1.351358	reciprocal in	-0.124939
-1.345876	increase in	-0.124939
-0.670653	spent in	-0.425969
-0.663734	occurs in	-0.124939
-1.314057	follows in	-0.124939
-1.298981	step in	-0.124939
-0.663734	spots in	-0.425969
-1.284411	places in	-0.124939
-1.298981	evaluated in	-0.124939
-0.422017	deallocated in	-0.301030
-1.298981	permissible in	-0.124939
-1.284411	users in	-0.124939
-1.298981	six in	-0.124939
-1.234876	fourteen in	-0.124939
-1.219800	principles in	-0.124939
-1.234876	reduction in	-0.124939
-0.591471	portable in	-0.425969
-1.219800	lists in	-0.124939
-1.266695	recover in	-0.124939
-1.234876	advice in	-0.124939
-1.234876	already in	-0.124939
-1.234876	i.e. in	-0.124939
-1.219800	package in	-0.124939
-0.340873	seen in	-0.301030
-1.234876	contiguous in	-0.124939
-1.219800	separately in	-0.124939
-1.219800	Development in	-0.124939
-1.219800	key in	-0.124939
-0.340873	appear in	-0.602060
-0.594972	(except in	-0.124939
-1.219800	flag in	-0.124939
-0.587998	written in	-0.124939
-1.234876	present in	-0.124939
-1.137966	place in	-0.124939
-0.494561	sixteen in	-0.425969
-1.137966	enabled in	-0.124939
-1.153584	serial in	-0.124939
-1.137966	modifications in	-0.124939
-0.494561	missing in	-0.124939
-0.088805	subroutines in	-0.726999
-1.137966	lengths in	-0.124939
-0.494561	contained in	-0.124939
-1.137966	underflow in	-0.124939
-1.153584	prevented in	-0.124939
-0.088805	Contentions in	-0.726999
-0.245926	illustrated in	-0.301030
-0.494561	returned in	-0.124939
-1.137966	is, in	-0.124939
-0.494561	breakpoint in	-0.425969
-1.153584	appears in	-0.124939
-0.245926	found in	-0.124939
-1.137966	handler in	-0.124939
-0.498062	coded in	-0.425969
-1.137966	changing in	-0.124939
-0.494561	kept in	-0.124939
-1.137966	techniques in	-0.124939
-0.494561	sequentially in	-0.124939
-1.137966	simpler in	-0.124939
-0.373124	detail in	-0.425969
-1.028645	advices in	-0.124939
-1.028645	FPGA in	-0.124939
-0.373124	consecutively in	-0.124939
-1.028645	abstraction in	-0.124939
-1.028645	distance in	-0.124939
-1.028645	anyway in	-0.124939
-1.028645	updating in	-0.124939
-1.028645	instruments in	-0.124939
-1.028645	wasteful in	-0.124939
-0.373124	lies in	-0.425969
-1.028645	elsewhere in	-0.124939
-1.028645	RISC in	-0.124939
-0.373124	supplied in	-0.124939
-1.028645	PC's in	-0.124939
-1.028645	delays in	-0.124939
-1.028645	logarithms in	-0.124939
-1.028645	longjmp in	-0.124939
-1.028645	kernel in	-0.124939
-1.028645	bookkeeping in	-0.124939
-1.028645	formula in	-0.124939
-1.028645	brackets in	-0.124939
-1.028645	handled in	-0.124939
-1.028645	(PLT) in	-0.124939
-1.028645	pivot in	-0.124939
-0.373124	placed in	-0.124939
-0.868755	invest in	-0.124939
-0.868755	Sum3 in	-0.124939
-0.200562	shown in	-0.124939
-0.868755	proceed in	-0.124939
-0.868755	justified in	-0.124939
-0.868755	influences in	-0.124939
-0.868755	disabled in	-0.124939
-0.200562	relocations in	-0.425969
-0.868755	code" in	-0.124939
-0.868755	locally in	-0.124939
-0.868755	MultiplyBy in	-0.124939
-0.868755	answers in	-0.124939
-0.868755	inserted in	-0.124939
-0.868755	Delays in	-0.124939
-0.868755	visible in	-0.124939
-0.868755	summarized in	-0.124939
-0.868755	experiments in	-0.124939
-0.868755	everywhere in	-0.124939
-0.868755	pragmas in	-0.124939
-0.868755	alone in	-0.124939
-0.868755	time-consumer in	-0.124939
-0.200562	Optimizations in	-0.425969
-0.868755	parallelization in	-0.124939
-0.868755	covered in	-0.124939
-0.868755	degradation in	-0.124939
-0.868755	overdetermined in	-0.124939
-0.868755	integrated in	-0.124939
-0.868755	annotation in	-0.124939
-0.868755	ment in	-0.124939
-0.868755	efforts in	-0.124939
-0.868755	1-bit in	-0.124939
-0.868755	usage in	-0.124939
-0.868755	previously in	-0.124939
-0.868755	stay in	-0.124939
-0.868755	Friday) in	-0.124939
-0.868755	Put in	-0.124939
-0.584553	dominate in	-0.124939
-0.584553	niche in	-0.124939
-0.584553	SetThreadAffinityMask, in	-0.124939
-0.584553	AND-operations in	-0.124939
-0.584553	typo in	-0.124939
-0.584553	recognized in	-0.124939
-0.584553	dot in	-0.124939
-0.584553	0] in	-0.124939
-0.584553	utilities in	-0.124939
-0.584553	alleviated in	-0.124939
-0.584553	positions in	-0.124939
-0.584553	Numbers in	-0.124939
-0.584553	grow in	-0.124939
-0.584553	system-independent, in	-0.124939
-0.584553	formulas in	-0.124939
-0.584553	arranged in	-0.124939
-0.584553	flaws in	-0.124939
-0.584553	GetLogicalProcessorInformation in	-0.124939
-0.584553	investing in	-0.124939
-0.584553	randomness in	-0.124939
-0.584553	mirrored in	-0.124939
-0.584553	reorganized in	-0.124939
-0.584553	de-referenced in	-0.124939
-0.584553	Programming in	-0.124939
-0.584553	server in	-0.124939
-0.584553	Nothing in	-0.124939
-0.584553	absent in	-0.124939
-0.584553	decoded in	-0.124939
-0.584553	programmed in	-0.124939
-0.584553	contiguously in	-0.124939
-0.584553	may, in	-0.124939
-0.584553	bear in	-0.124939
-0.584553	because, in	-0.124939
-0.584553	finishes in	-0.124939
-0.584553	UnusedFiller in	-0.124939
-0.584553	phase in	-0.124939
-0.584553	indexed in	-0.124939
-0.584553	FactorialTable in	-0.124939
-0.584553	improvements in	-0.124939
-0.584553	introduced in	-0.124939
-0.584553	somewhere in	-0.124939
-0.584553	imprecision in	-0.124939
-0.584553	(OnIdle in	-0.124939
-0.584553	(GOT) in	-0.124939
-0.584553	scheduling in	-0.124939
-0.584553	construction in	-0.124939
-0.584553	1.2 in	-0.124939
-0.584553	version) in	-0.124939
-0.584553	iterator in	-0.124939
-0.584553	remarkably in	-0.124939
-0.584553	cheap, in	-0.124939
-0.584553	costless in	-0.124939
-0.584553	semicolons in	-0.124939
-0.584553	obscured in	-0.124939
-0.584553	representations in	-0.124939
-0.584553	succeeded in	-0.124939
-0.584553	if-branch in	-0.124939
-0.584553	continue in	-0.124939
-0.584553	GetProcessAffinityMask in	-0.124939
-0.584553	improvement in	-0.124939
-0.584553	foremost, in	-0.124939
-0.584553	CPU-time in	-0.124939
-0.584553	courses in	-0.124939
-0.584553	scheduled in	-0.124939
-0.584553	i/2 in	-0.124939
-0.584553	glitches in	-0.124939
-0.584553	IsProcessorFeaturePresent in	-0.124939
-0.584553	sizeof(float) in	-0.124939
-0.584553	rows/columns in	-0.124939
-0.584553	Calculations in	-0.124939
-0.584553	iterative in	-0.124939
-0.584553	predictions in	-0.124939
-0.584553	comments, in	-0.124939
-0.584553	anywhere in	-0.124939
-0.584553	resized in	-0.124939
-0.584553	overridden in	-0.124939
-0.584553	Thinking in	-0.124939
-0.584553	__intel_new_strlen in	-0.124939
-0.584553	volumes in	-0.124939
-2.211243	// The	-0.124939
-2.669739	x The	-0.124939
-2.397984	{ The	-0.124939
-0.920030	} The	-0.221849
-2.226461	data The	-0.124939
-1.758703	functions The	-0.124939
-1.870065	compilers The	-0.124939
-2.206896	Intel The	-0.124939
-2.107282	variables The	-0.124939
-2.036199	table The	-0.124939
-2.141959	unsigned The	-0.124939
-0.820760	code. The	-0.124939
-0.863491	time. The	-0.176091
-1.578468	registers The	-0.124939
-0.573777	function. The	-0.159701
-1.057145	etc. The	-0.124939
-1.903674	Linux The	-0.124939
-1.406626	}; The	-0.124939
-0.581157	functions. The	-0.170696
-1.825293	operators The	-0.124939
-1.103299	memory. The	-0.124939
-0.604700	program. The	-0.124939
-0.937379	used. The	-0.249877
-1.784194	microprocessor The	-0.124939
-1.763618	branches The	-0.124939
-1.794859	Mac The	-0.124939
-0.424779	cache. The	-0.124939
-0.590253	systems. The	-0.124939
-1.795099	c; The	-0.124939
-0.716084	efficient. The	-0.221849
-0.713975	below. The	-0.124939
-0.494988	data. The	-0.124939
-1.763568	types The	-0.124939
-0.609106	set. The	-0.124939
-0.954360	compilers. The	-0.124939
-0.786543	processors. The	-0.124939
-1.664525	platform The	-0.124939
-1.722175	together The	-0.124939
-0.636294	called. The	-0.124939
-0.613879	CPUs. The	-0.124939
-1.664259	operands The	-0.124939
-1.640918	compiler. The	-0.124939
-1.629702	modules The	-0.124939
-0.848164	are: The	-0.124939
-0.590124	loop. The	-0.124939
-1.094229	pointer. The	-0.124939
-1.625265	conversions The	-0.124939
-1.057769	platforms. The	-0.124939
-1.650653	installation The	-0.124939
-0.675308	cases. The	-0.124939
-1.064496	1. The	-0.124939
-1.625265	inlining The	-0.124939
-1.589766	size. The	-0.124939
-0.648665	2. The	-0.124939
-1.561737	variables. The	-0.124939
-1.555115	resources. The	-0.124939
-0.621486	class. The	-0.124939
-0.418027	calls. The	-0.124939
-1.567273	algorithm The	-0.124939
-0.977723	it. The	-0.124939
-0.739484	registers. The	-0.124939
-0.587907	mode. The	-0.124939
-1.616193	bytes. The	-0.124939
-0.739484	object. The	-0.124939
-0.977723	library. The	-0.124939
-0.592105	calculations. The	-0.124939
-0.741463	cycles. The	-0.124939
-0.977723	operations. The	-0.124939
-0.589302	variable. The	-0.124939
-1.535088	optimization. The	-0.124939
-0.977723	performance. The	-0.124939
-0.706701	libraries. The	-0.124939
-1.538988	stack. The	-0.124939
-0.942961	possible. The	-0.124939
-0.942961	needed. The	-0.124939
-0.706701	classes. The	-0.124939
-1.538988	thread. The	-0.124939
-0.710685	purposes. The	-0.124939
-0.706701	precision. The	-0.124939
-0.706701	access. The	-0.124939
-1.514891	condition The	-0.124939
-0.670900	instructions. The	-0.124939
-1.558764	f; The	-0.124939
-0.670900	way. The	-0.124939
-0.670900	vector. The	-0.124939
-0.915545	well. The	-0.124939
-1.459807	dispatching. The	-0.124939
-1.446533	statements The	-0.124939
-0.870667	address. The	-0.124939
-0.870667	sets. The	-0.124939
-0.478161	not. The	-0.124939
-0.633510	problem. The	-0.124939
-1.502249	3 The	-0.124939
-0.479570	version. The	-0.124939
-1.446533	user. The	-0.124939
-0.870667	returns. The	-0.124939
-0.874153	Windows. The	-0.124939
-0.587752	order. The	-0.124939
-1.414049	allocation. The	-0.124939
-0.835451	integers. The	-0.124939
-0.828395	enabled. The	-0.124939
-1.441879	6 The	-0.124939
-1.414049	inefficient. The	-0.124939
-0.587752	critical. The	-0.124939
-0.433812	available. The	-0.124939
-1.414049	executed. The	-0.124939
-0.589768	faster. The	-0.124939
-0.831909	problems. The	-0.124939
-1.414049	parameter. The	-0.124939
-1.427741	overflow. The	-0.124939
-0.433812	element. The	-0.124939
-1.414049	storage. The	-0.124939
-0.538615	value. The	-0.124939
-1.390726	file. The	-0.124939
-0.780756	register. The	-0.124939
-1.390726	once The	-0.124939
-0.784298	system. The	-0.124939
-0.780756	fast. The	-0.124939
-0.780756	units. The	-0.124939
-1.376588	branch. The	-0.124939
-0.538615	array. The	-0.124939
-1.420462	space. The	-0.124939
-1.376588	zero. The	-0.124939
-0.780756	line. The	-0.124939
-0.780756	vectors. The	-0.124939
-0.780756	applications. The	-0.124939
-1.390726	X The	-0.124939
-1.347347	table. The	-0.124939
-1.347347	solution. The	-0.124939
-1.332734	Linux. The	-0.124939
-0.729878	integer. The	-0.124939
-0.726306	optimizations. The	-0.124939
-0.726306	case. The	-0.124939
-1.347347	processor. The	-0.124939
-0.486726	bits. The	-0.124939
-0.726306	is. The	-0.124939
-0.482648	automatically. The	-0.124939
-1.362470	Clang The	-0.124939
-0.726306	details. The	-0.124939
-0.484682	vectorization. The	-0.124939
-1.332734	anyway. The	-0.124939
-0.729878	do. The	-0.124939
-0.482648	threads. The	-0.124939
-1.347347	number. The	-0.124939
-1.280401	constant. The	-0.124939
-1.295523	systems: The	-0.124939
-1.280401	polynomial The	-0.124939
-1.280401	this. The	-0.124939
-1.280401	call. The	-0.124939
-1.295523	prediction. The	-0.124939
-0.417735	application. The	-0.124939
-0.666532	here. The	-0.124939
-1.280401	members. The	-0.124939
-1.280401	above. The	-0.124939
-1.295523	result. The	-0.124939
-0.662931	7 The	-0.425969
-1.295523	unwinding The	-0.124939
-1.295523	again. The	-0.124939
-0.662931	one. The	-0.124939
-1.295523	counter. The	-0.124939
-0.662931	structure. The	-0.124939
-0.662931	4. The	-0.124939
-0.662931	branches. The	-0.124939
-1.216342	profiler. The	-0.124939
-1.216342	maintain. The	-0.124939
-1.216342	tools. The	-0.124939
-1.216342	numbers. The	-0.124939
-1.216342	names. The	-0.124939
-1.216342	manual. The	-0.124939
-0.340598	computer. The	-0.124939
-1.216342	finished. The	-0.124939
-0.587350	ways. The	-0.124939
-0.587350	16.2 The	-0.425969
-1.232009	pow The	-0.124939
-1.216342	a[i]; The	-0.124939
-1.216342	priority. The	-0.124939
-1.216342	execution. The	-0.124939
-1.216342	microprocessors. The	-0.124939
-0.587350	www.agner.org/optimize/asmlib.zip. The	-0.124939
-1.216342	mispredictions. The	-0.124939
-1.216342	so. The	-0.124939
-1.216342	addresses. The	-0.124939
-1.216342	them. The	-0.124939
-1.216342	point. The	-0.124939
-0.587350	8. The	-0.124939
-0.587350	explanation. The	-0.124939
-0.587350	elements. The	-0.124939
-0.587350	doubled. The	-0.124939
-1.216342	division. The	-0.124939
-0.340598	cycle. The	-0.124939
-1.216342	5. The	-0.124939
-1.216342	here: The	-0.124939
-1.135099	better. The	-0.124939
-1.135099	up. The	-0.124939
-1.135099	reasons. The	-0.124939
-1.135099	exception. The	-0.124939
-1.135099	type. The	-0.124939
-0.494071	initialization. The	-0.124939
-0.494071	ebx. The	-0.124939
-1.135099	bad The	-0.124939
-0.494071	true. The	-0.124939
-1.135099	objects. The	-0.124939
-0.494071	index. The	-0.124939
-0.245741	*.so). The	-0.124939
-0.494071	lines. The	-0.124939
-1.135099	running. The	-0.124939
-1.135099	input. The	-0.124939
-1.151354	dispatcher. The	-0.124939
-1.135099	optimal. The	-0.124939
-1.135099	example. The	-0.124939
-1.151354	1.0f; The	-0.124939
-1.135099	efficiency. The	-0.124939
-1.135099	versions. The	-0.124939
-1.135099	cached. The	-0.124939
-0.494071	unsigned. The	-0.425969
-0.494071	pointers. The	-0.124939
-1.135099	double. The	-0.124939
-0.245741	diagonal. The	-0.124939
-1.151354	another. The	-0.124939
-1.135099	aliasing. The	-0.124939
-0.494071	style. The	-0.124939
-1.135099	counts. The	-0.124939
-1.135099	smaller. The	-0.124939
-0.245741	3. The	-0.124939
-0.245741	module. The	-0.124939
-0.088740	cast The	-0.124939
-1.135099	manually. The	-0.124939
-1.135099	run. The	-0.124939
-0.494071	format. The	-0.124939
-1.135099	programs. The	-0.124939
-0.245741	work. The	-0.124939
-1.026415	compact. The	-0.124939
-0.372794	reasons: The	-0.124939
-0.372794	error. The	-0.124939
-1.026415	119 The	-0.124939
-1.026415	code). The	-0.124939
-1.026415	errors. The	-0.124939
-1.026415	only. The	-0.124939
-1.026415	analysis The	-0.124939
-1.026415	features. The	-0.124939
-0.372794	advance. The	-0.124939
-1.026415	28. The	-0.124939
-1.026415	takes. The	-0.124939
-0.372794	exceptions. The	-0.124939
-0.372794	implemented. The	-0.124939
-1.026415	once. The	-0.124939
-1.026415	Windows). The	-0.124939
-1.026415	for. The	-0.124939
-1.026415	step. The	-0.124939
-1.026415	brand. The	-0.124939
-1.026415	collection. The	-0.124939
-1.026415	135 The	-0.124939
-1.043301	purpose. The	-0.124939
-0.372794	compilation. The	-0.124939
-1.026415	sequentially. The	-0.124939
-1.026415	operands. The	-0.124939
-0.372794	back. The	-0.124939
-1.026415	expected. The	-0.124939
-1.026415	systems". The	-0.124939
-1.026415	conventions. The	-0.124939
-1.026415	stored. The	-0.124939
-1.043301	deallocated. The	-0.124939
-1.026415	sizes. The	-0.124939
-0.372794	9.6b. The	-0.425969
-1.026415	same. The	-0.124939
-1.026415	occur. The	-0.124939
-1.026415	prone. The	-0.124939
-0.372794	linker. The	-0.124939
-1.026415	multiplications. The	-0.124939
-1.026415	metaprogramming. The	-0.124939
-0.372794	output. The	-0.124939
-1.026415	expression. The	-0.124939
-0.372794	resource. The	-0.124939
-1.026415	truncation. The	-0.124939
-1.026415	big. The	-0.124939
-1.026415	float. The	-0.124939
-0.372794	1.0f;} The	-0.124939
-1.026415	n. The	-0.124939
-1.026415	copying. The	-0.124939
-1.026415	x. The	-0.124939
-1.026415	post-increment. The	-0.124939
-1.026415	templates. The	-0.124939
-0.867210	107). The	-0.124939
-0.867210	Implementation The	-0.124939
-0.200396	parallelism. The	-0.124939
-0.867210	frequency. The	-0.124939
-0.867210	71). The	-0.124939
-0.867210	jobs. The	-0.124939
-0.867210	context. The	-0.124939
-0.200396	operator. The	-0.124939
-0.867210	parallelization. The	-0.124939
-0.200396	sampling: The	-0.425969
-0.867210	instance. The	-0.124939
-0.867210	frameworks. The	-0.124939
-0.200396	127. The	-0.124939
-0.867210	instruction. The	-0.124939
-0.867210	cases: The	-0.124939
-0.867210	mouse. The	-0.124939
-0.867210	difficult. The	-0.124939
-0.867210	values. The	-0.124939
-0.867210	2" The	-0.124939
-0.200396	heap. The	-0.124939
-0.867210	differently. The	-0.124939
-0.867210	language". The	-0.124939
-0.867210	52. The	-0.124939
-0.867210	Loops The	-0.124939
-0.867210	GOT. The	-0.124939
-0.867210	string. The	-0.124939
-0.867210	volatile. The	-0.124939
-0.867210	relocation. The	-0.124939
-0.867210	supported. The	-0.124939
-0.867210	range. The	-0.124939
-0.867210	programming. The	-0.124939
-0.867210	disadvantages: The	-0.124939
-0.867210	needs. The	-0.124939
-0.867210	__fastcall. The	-0.124939
-0.867210	specified. The	-0.124939
-0.200396	underflow. The	-0.124939
-0.867210	false. The	-0.124939
-0.867210	Multithreading The	-0.124939
-0.867210	methods. The	-0.124939
-0.867210	small. The	-0.124939
-0.867210	utility. The	-0.124939
-0.867210	controlled. The	-0.124939
-0.200396	updating. The	-0.124939
-0.200396	separately. The	-0.124939
-0.867210	26. The	-0.124939
-0.867210	(properties) The	-0.124939
-0.867210	60 The	-0.124939
-0.867210	iterations. The	-0.124939
-0.867210	required. The	-0.124939
-0.867210	use. The	-0.124939
-0.867210	declaration. The	-0.124939
-0.867210	undetected. The	-0.124939
-0.867210	Volatile The	-0.124939
-0.867210	allowed. The	-0.124939
-0.867210	8.1. The	-0.124939
-0.867210	c) The	-0.124939
-0.867210	mechanism. The	-0.124939
-0.200396	negative. The	-0.124939
-0.867210	defined. The	-0.124939
-0.867210	load. The	-0.124939
-0.867210	framework. The	-0.124939
-0.867210	area. The	-0.124939
-0.867210	PCs. The	-0.124939
-0.867210	examples. The	-0.124939
-0.867210	predicted. The	-0.124939
-0.867210	mask. The	-0.124939
-0.200396	machine. The	-0.124939
-0.867210	dynamically. The	-0.124939
-0.867210	1.23456. The	-0.124939
-0.867210	(PLT). The	-0.124939
-0.867210	column. The	-0.124939
-0.867210	below). The	-0.124939
-0.867210	sources. The	-0.124939
-0.867210	137). The	-0.124939
-0.867210	template. The	-0.124939
-0.867210	started. The	-0.124939
-0.200396	80. The	-0.124939
-0.867210	i. The	-0.124939
-0.200396	0. The	-0.124939
-0.867210	correctly. The	-0.124939
-0.200396	1.1 The	-0.425969
-0.867210	43). The	-0.124939
-0.867210	Mac. The	-0.124939
-0.867210	fraction. The	-0.124939
-0.200396	compilers). The	-0.124939
-0.200396	processes. The	-0.124939
-0.867210	bit. The	-0.124939
-0.867210	linking. The	-0.124939
-0.867210	Security The	-0.124939
-0.867210	Booleans The	-0.124939
-0.867210	together. The	-0.124939
-0.867210	invalid. The	-0.124939
-0.200396	122. The	-0.124939
-0.200396	starts. The	-0.124939
-0.200396	i/2+r. The	-0.124939
-0.867210	shows. The	-0.124939
-0.200396	closed. The	-0.124939
-0.867210	avoided. The	-0.124939
-0.200396	each. The	-0.124939
-0.867210	later. The	-0.124939
-0.867210	140). The	-0.124939
-0.867210	105 The	-0.124939
-0.867210	4; The	-0.124939
-0.867210	have. The	-0.124939
-0.867210	BSD. The	-0.124939
-0.583750	51). The	-0.124939
-0.583750	count. The	-0.124939
-0.583750	independently. The	-0.124939
-0.583750	(en.wikipedia.org/wiki/L2_cache). The	-0.124939
-0.583750	initialized. The	-0.124939
-0.583750	i+1; The	-0.124939
-0.583750	terminated. The	-0.124939
-0.583750	generality. The	-0.124939
-0.583750	sin(0.8); The	-0.124939
-0.583750	(en.wikipedia.org/wiki/Standard_Template_Library). The	-0.124939
-0.583750	virtualization. The	-0.124939
-0.583750	installed. The	-0.124939
-0.583750	interpretation. The	-0.124939
-0.583750	rarely. The	-0.124939
-0.583750	insufficient. The	-0.124939
-0.583750	core). The	-0.124939
-0.583750	be. The	-0.124939
-0.583750	situations: The	-0.124939
-0.583750	119). The	-0.124939
-0.583750	paragraph. The	-0.124939
-0.583750	millisecond. The	-0.124939
-0.583750	destructors. The	-0.124939
-0.583750	division). The	-0.124939
-0.583750	27). The	-0.124939
-0.583750	144 The	-0.124939
-0.583750	ecx+eax*4. The	-0.124939
-0.583750	optimally. The	-0.124939
-0.583750	module2.cpp. The	-0.124939
-0.583750	eax. The	-0.124939
-0.583750	(b1*b2); The	-0.124939
-0.583750	performance: The	-0.124939
-0.583750	1000. The	-0.124939
-0.583750	runtime). The	-0.124939
-0.583750	20. The	-0.124939
-0.583750	accelerators The	-0.124939
-0.583750	eax,0. The	-0.124939
-0.583750	storing. The	-0.124939
-0.583750	8.15b. The	-0.124939
-0.583750	vector). The	-0.124939
-0.583750	Instrumentation: The	-0.124939
-0.583750	y. The	-0.124939
-0.583750	SSE). The	-0.124939
-0.583750	formalism. The	-0.124939
-0.583750	duration. The	-0.124939
-0.583750	disadvantages. The	-0.124939
-0.583750	kludgy. The	-0.124939
-0.583750	sched_setaffinity). The	-0.124939
-0.583750	shortly. The	-0.124939
-0.583750	databases. The	-0.124939
-0.583750	pointers: The	-0.124939
-0.583750	Wikibooks. The	-0.124939
-0.583750	27 The	-0.124939
-0.583750	noticeable. The	-0.124939
-0.583750	know). The	-0.124939
-0.583750	error-prone. The	-0.124939
-0.583750	risky. The	-0.124939
-0.583750	wheel. The	-0.124939
-0.583750	weekdays. The	-0.124939
-0.583750	16.2. The	-0.124939
-0.583750	straightforward. The	-0.124939
-0.583750	further. The	-0.124939
-0.583750	password. The	-0.124939
-0.583750	104). The	-0.124939
-0.583750	contiguous. The	-0.124939
-0.583750	instead. The	-0.124939
-0.583750	digits. The	-0.124939
-0.583750	44. The	-0.124939
-0.583750	factors. The	-0.124939
-0.583750	operators). The	-0.124939
-0.583750	unchanged. The	-0.124939
-0.583750	specification. The	-0.124939
-0.583750	floats. The	-0.124939
-0.583750	renaming. The	-0.124939
-0.583750	most. The	-0.124939
-0.583750	dependent. The	-0.124939
-0.583750	143. The	-0.124939
-0.583750	||). The	-0.124939
-0.583750	__rdtsc()). The	-0.124939
-0.583750	1.2345); The	-0.124939
-0.583750	repetitive. The	-0.124939
-0.583750	tolerance. The	-0.124939
-0.583750	reputation. The	-0.124939
-0.583750	(/Oa). The	-0.124939
-0.583750	Debugging. The	-0.124939
-0.583750	FPGAs. The	-0.124939
-0.583750	keyword. The	-0.124939
-0.583750	100000001.23456. The	-0.124939
-0.583750	/arch:SSE2. The	-0.124939
-0.583750	flaws: The	-0.124939
-0.583750	Atom). The	-0.124939
-0.583750	somewhat. The	-0.124939
-0.583750	28) The	-0.124939
-0.583750	__attribute__((fastcall)). The	-0.124939
-0.583750	134. The	-0.124939
-0.583750	r.b;} The	-0.124939
-0.583750	newer. The	-0.124939
-0.583750	m. The	-0.124939
-0.583750	algorithm. The	-0.124939
-0.583750	. The	-0.124939
-0.583750	documented. The	-0.124939
-0.583750	new. The	-0.124939
-0.583750	end. The	-0.124939
-0.583750	2016. The	-0.124939
-0.583750	84). The	-0.124939
-0.583750	features: The	-0.124939
-0.583750	-1. The	-0.124939
-0.583750	temp. The	-0.124939
-0.583750	tedious. The	-0.124939
-0.583750	design. The	-0.124939
-0.583750	Security. The	-0.124939
-0.583750	alternative. The	-0.124939
-0.583750	elimination. The	-0.124939
-0.583750	Library. The	-0.124939
-0.583750	afterwards. The	-0.124939
-0.583750	next. The	-0.124939
-0.583750	VIA. The	-0.124939
-0.583750	(www.boost.org). The	-0.124939
-0.583750	SVML. The	-0.124939
-0.583750	(&a); The	-0.124939
-0.583750	satisfactory. The	-0.124939
-0.583750	<. The	-0.124939
-0.583750	fastest. The	-0.124939
-0.583750	Fog The	-0.124939
-0.583750	doesn’t. The	-0.124939
-0.583750	distributed. The	-0.124939
-0.583750	6! The	-0.124939
-0.583750	67 The	-0.124939
-0.583750	details). The	-0.124939
-0.583750	conversion. The	-0.124939
-0.583750	costs. The	-0.124939
-0.583750	a+1;. The	-0.124939
-0.583750	satisfied. The	-0.124939
-0.583750	delays. The	-0.124939
-0.583750	library). The	-0.124939
-0.583750	freely. The	-0.124939
-0.583750	increment. The	-0.124939
-0.583750	applications: The	-0.124939
-0.583750	72). The	-0.124939
-0.583750	ratio. The	-0.124939
-0.583750	mangling. The	-0.124939
-0.583750	sum. The	-0.124939
-0.583750	3.0; The	-0.124939
-0.583750	i++)a[i]=2*i; The	-0.124939
-0.583750	bug". The	-0.124939
-0.583750	71 The	-0.124939
-0.583750	modular. The	-0.124939
-0.583750	advantages: The	-0.124939
-0.583750	reads. The	-0.124939
-0.583750	2.6f; The	-0.124939
-0.583750	spots. The	-0.124939
-0.583750	series. The	-0.124939
-0.583750	~. The	-0.124939
-0.583750	row. The	-0.124939
-0.583750	tested. The	-0.124939
-0.583750	vectorize. The	-0.124939
-0.583750	www.agner.org/optimize/#vectorclass. The	-0.124939
-0.583750	tables. The	-0.124939
-0.583750	powerful. The	-0.124939
-0.583750	comparisons. The	-0.124939
-0.583750	bits). The	-0.124939
-0.583750	_mm_cvtsd_si32(_mm_load_sd(&x));} The	-0.124939
-0.583750	tried. The	-0.124939
-0.583750	follows. The	-0.124939
-0.583750	directly. The	-0.124939
-0.583750	pointer". The	-0.124939
-0.583750	parameters). The	-0.124939
-0.583750	old. The	-0.124939
-0.583750	these. The	-0.124939
-0.583750	wasted. The	-0.124939
-0.583750	7.30b. The	-0.124939
-0.583750	70). The	-0.124939
-2.469541	is for	-0.124939
-2.073940	and for	-0.124939
-2.606615	that for	-0.124939
-2.418597	are for	-0.124939
-2.037644	or for	-0.124939
-2.356620	it for	-0.124939
-1.734446	function for	-0.182931
-1.478849	code for	-0.249877
-2.144220	as for	-0.124939
-2.309209	not for	-0.124939
-1.955612	than for	-0.124939
-1.354937	compiler for	-0.284640
-2.740690	x for	-0.124939
-1.519047	{ for	-0.425969
-2.298153	this for	-0.124939
-1.787837	time for	-0.249877
-1.773310	use for	-0.124939
-2.005022	memory for	-0.124939
-1.685021	data for	-0.124939
-2.294672	program for	-0.124939
-1.759123	different for	-0.301030
-2.198536	same for	-0.124939
-1.119520	functions for	-0.124939
-1.230022	only for	-0.124939
-2.038735	instruction for	-0.124939
-1.767323	loop for	-0.124939
-1.954309	but for	-0.124939
-0.567340	used for	-0.181420
-1.332334	one for	-0.191886
-1.903060	cache for	-0.425969
-1.885587	set for	-0.425969
-2.201085	class for	-0.124939
-1.900537	compilers for	-0.124939
-2.283346	most for	-0.124939
-2.267657	size for	-0.124939
-2.308963	b for	-0.124939
-1.806809	library for	-0.124939
-1.803039	object for	-0.124939
-1.775164	C++ for	-0.124939
-1.810663	efficient for	-0.124939
-1.767416	array for	-0.124939
-1.419432	possible for	-0.425969
-1.310569	version for	-0.124939
-2.232864	objects for	-0.124939
-1.724011	variable for	-0.124939
-1.174906	variables for	-0.124939
-2.115353	table for	-0.124939
-1.251392	performance for	-0.124939
-2.072166	software for	-0.124939
-1.709036	branch for	-0.124939
-2.099437	called for	-0.124939
-1.464297	0; for	-0.124939
-2.153138	example, for	-0.124939
-2.199832	unsigned for	-0.124939
-1.279517	register for	-0.124939
-1.149087	libraries for	-0.346788
-2.100318	template for	-0.124939
-1.602043	registers for	-0.124939
-1.392999	need for	-0.124939
-2.045442	test for	-0.124939
-0.426214	useful for	-0.188608
-1.217847	even for	-0.124939
-1.591176	method for	-0.124939
-1.922466	always for	-0.124939
-1.557304	16 for	-0.124939
-1.976392	system for	-0.124939
-1.578497	32 for	-0.124939
-1.081729	file for	-0.221849
-1.553956	bits for	-0.124939
-1.334443	operations for	-0.301030
-1.558991	0 for	-0.124939
-2.076987	cases for	-0.124939
-0.894114	instructions for	-0.124939
-0.725319	available for	-0.170696
-1.998236	error for	-0.124939
-1.300736	times for	-0.124939
-2.011186	stack for	-0.124939
-1.529138	important for	-0.124939
-2.025111	CPUs for	-0.124939
-1.956786	large for	-0.124939
-1.486547	work for	-0.124939
-0.922506	versions for	-0.903090
-1.988640	processor for	-0.124939
-0.774030	compiled for	-0.329059
-1.944430	big for	-0.124939
-1.963413	best for	-0.124939
-1.486329	necessary for	-0.124939
-1.986482	element for	-0.124939
-1.948688	language for	-0.124939
-2.010844	speed for	-0.124939
-0.893194	i; for	-0.903090
-1.910437	common for	-0.124939
-1.434346	etc. for	-0.124939
-2.093396	compile for	-0.124939
-2.023348	exception for	-0.124939
-1.987870	allocated for	-0.124939
-0.627294	option for	-0.124939
-1.069136	good for	-0.249877
-1.944078	matrix for	-0.124939
-1.921009	precision for	-0.124939
-1.948886	works for	-0.124939
-0.708001	optimized for	-0.124939
-0.929227	manual for	-0.221849
-1.977591	b; for	-0.124939
-0.370742	check for	-0.374816
-2.006241	advantageous for	-0.124939
-1.984245	solution for	-0.124939
-1.167662	container for	-0.124939
-0.416858	support for	-0.191886
-1.143450	operators for	-0.301030
-2.022817	i++) for	-0.124939
-1.874252	standard for	-0.124939
-1.908337	hardware for	-0.124939
-1.118557	1 for	-0.124939
-1.349245	optimizing for	-0.124939
-1.369934	information for	-0.124939
-1.862932	cycles for	-0.124939
-0.746415	... for	-0.602060
-1.345226	addresses for	-0.124939
-1.335004	files for	-0.124939
-2.026463	recommended for	-0.124939
-1.865585	allocation for	-0.124939
-1.848208	optimize for	-0.124939
-1.893028	above for	-0.124939
-1.859301	problems for	-0.124939
-1.859301	optimal for	-0.124939
-1.841570	space for	-0.124939
-1.887335	cases, for	-0.124939
-1.817174	branches for	-0.124939
-1.304670	1; for	-0.425969
-1.885118	caching for	-0.124939
-1.855150	implementation for	-0.124939
-1.267834	handling for	-0.124939
-0.867075	methods for	-0.124939
-1.801127	separate for	-0.124939
-1.255823	block for	-0.124939
-1.017139	name for	-0.124939
-1.263466	c; for	-0.425969
-1.905414	disadvantage for	-0.124939
-1.792355	high for	-0.124939
-1.774623	zero for	-0.124939
-1.830149	resources for	-0.124939
-0.843425	reason for	-0.249877
-1.773871	lookup for	-0.124939
-1.811666	examples for	-0.124939
-1.245630	difference for	-0.124939
-0.722753	needed for	-0.124939
-1.736835	expressions for	-0.124939
-0.827318	difficult for	-0.726999
-1.185183	directives for	-0.124939
-1.716631	framework for	-0.124939
-1.772157	linking for	-0.124939
-0.944343	x; for	-0.301030
-1.713173	platform for	-0.124939
-1.169134	higher for	-0.124939
-1.770092	know for	-0.124939
-1.748899	results for	-0.124939
-1.144280	options for	-0.124939
-1.154685	feature for	-0.124939
-1.144280	made for	-0.124939
-1.735885	appropriate for	-0.124939
-1.705211	constructor for	-0.124939
-1.744068	relevant for	-0.124939
-1.661079	section for	-0.124939
-1.690373	computer for	-0.124939
-0.573265	choice for	-0.221849
-1.654058	STL for	-0.124939
-0.340892	intended for	-0.124939
-1.717739	avoided for	-0.124939
-1.646236	lines for	-0.124939
-0.543600	instance for	-0.823909
-1.049398	checking for	-0.124939
-1.606052	inlined for	-0.124939
-1.596066	database for	-0.124939
-1.637466	destructor for	-0.124939
-0.755577	possibility for	-0.124939
-1.573867	macro for	-0.124939
-1.594556	them for	-0.124939
-1.581517	containers for	-0.124939
-0.714718	Optimizing for	-0.124939
-1.604384	rows for	-0.124939
-0.355037	compiling for	-0.124939
-1.549326	structures for	-0.124939
-1.543728	resource for	-0.124939
-0.679967	temp; for	-0.301030
-0.683026	easier for	-0.602060
-1.543728	identical for	-0.124939
-1.555011	program, for	-0.124939
-0.524153	except for	-0.124939
-0.878572	templates for	-0.124939
-1.549342	header for	-0.124939
-0.481697	penalty for	-0.249877
-1.525203	reasons for	-0.124939
-1.502336	module for	-0.124939
-1.502336	used, for	-0.124939
-0.886592	checks for	-0.124939
-1.502336	stride for	-0.124939
-1.525203	3 for	-0.124939
-1.491338	counts for	-0.124939
-0.892022	enough for	-0.124939
-1.502336	features for	-0.124939
-1.503584	3; for	-0.124939
-0.835471	chosen for	-0.425969
-1.467861	destructors for	-0.124939
-1.467861	transfer for	-0.124939
-0.592817	search for	-0.301030
-0.840834	Time for	-0.124939
-1.479445	mispredicted for	-0.124939
-0.846264	tool for	-0.124939
-1.452432	included for	-0.124939
-1.440195	c2 for	-0.124939
-1.428293	unit for	-0.124939
-0.106090	conventions for	-0.669007
-1.452432	account for	-0.124939
-1.416709	algorithms for	-0.124939
-0.792388	PLT for	-0.124939
-0.786992	once for	-0.425969
-0.543191	designed for	-0.124939
-1.416709	inputs for	-0.124939
-1.440195	factors for	-0.124939
-0.543191	required for	-0.124939
-1.440195	GOT for	-0.124939
-1.382203	poorly for	-0.124939
-1.358717	suitable for	-0.124939
-1.394440	Template for	-0.124939
-1.394440	a[100]; for	-0.124939
-1.370301	mode, for	-0.124939
-0.328924	130 for	-0.249877
-0.731690	120 for	-0.124939
-1.370301	changes for	-0.124939
-1.370301	again for	-0.124939
-1.358717	capabilities for	-0.124939
-0.048658	waiting for	-0.191886
-1.394440	rules for	-0.124939
-0.331064	wait for	-0.249877
-0.670173	principle for	-0.124939
-1.327493	expected for	-0.124939
-1.315256	Performance for	-0.124939
-1.340085	convenient for	-0.124939
-1.303354	c1 for	-0.124939
-1.303354	87 for	-0.124939
-0.667450	permissible for	-0.425969
-1.303354	1.0; for	-0.124939
-1.315256	Testing for	-0.124939
-1.340085	requirements for	-0.124939
-0.590992	drivers for	-0.124939
-0.590992	122 for	-0.425969
-1.260904	bc for	-0.124939
-0.593733	searching for	-0.124939
-0.069758	vectors: for	-0.823909
-1.248312	80 for	-0.124939
-0.590992	90 for	-0.124939
-1.236075	recommendation for	-0.124939
-1.248312	Only for	-0.124939
-1.236075	107 for	-0.124939
-1.236075	Compilers for	-0.124939
-0.343685	(except for	-0.301030
-1.248312	facilities for	-0.124939
-1.151402	103 for	-0.124939
-1.151402	51 for	-0.124939
-1.176961	preferable for	-0.124939
-1.151402	43 for	-0.124939
-0.246775	specialization for	-0.602060
-1.151402	88 for	-0.124939
-1.151402	manager for	-0.124939
-1.151402	150 for	-0.124939
-0.089100	guide for	-0.249877
-1.151402	tools for	-0.124939
-0.496823	documentation for	-0.124939
-1.151402	directive for	-0.124939
-0.496823	area for	-0.124939
-1.151402	29 for	-0.124939
-1.151402	floats for	-0.124939
-0.496823	www.agner.org/optimize/cppexamples.zip for	-0.124939
-1.151402	31 for	-0.124939
-1.163994	45 for	-0.124939
-1.039055	49 for	-0.124939
-1.039055	101 for	-0.124939
-1.039055	93 for	-0.124939
-1.039055	119 for	-0.124939
-1.039055	blocks, for	-0.124939
-1.039055	blocking for	-0.124939
-1.052023	r; for	-0.124939
-1.039055	perhaps for	-0.124939
-1.039055	organization for	-0.124939
-1.052023	calculations: for	-0.124939
-1.052023	basis for	-0.124939
-1.039055	81 for	-0.124939
-0.374642	89 for	-0.425969
-1.039055	153 for	-0.124939
-1.039055	140 for	-0.124939
-1.039055	141 for	-0.124939
-1.052023	twice for	-0.124939
-1.039055	competing for	-0.124939
-0.374642	unusual for	-0.425969
-0.374642	ms for	-0.124939
-1.039055	Documentation for	-0.124939
-1.039055	lookups for	-0.124939
-0.374642	78 for	-0.425969
-1.039055	market for	-0.124939
-1.052023	Day for	-0.124939
-0.374642	CPUs" for	-0.425969
-1.039055	variable, for	-0.124939
-1.052023	sufficient for	-0.124939
-0.875931	card for	-0.124939
-0.875931	accumulators for	-0.124939
-0.875931	justified for	-0.124939
-0.875931	sum; for	-0.124939
-0.875931	loop, for	-0.124939
-0.875931	loop: for	-0.124939
-0.875931	StringLength; for	-0.124939
-0.875931	it, for	-0.124939
-0.875931	a[2]; for	-0.124939
-0.875931	72 for	-0.124939
-0.201326	suited for	-0.124939
-0.875931	Compile for	-0.124939
-0.875931	corrections for	-0.124939
-0.201326	exp(x) for	-0.425969
-0.875931	events, for	-0.124939
-0.875931	proxy for	-0.124939
-0.875931	literature for	-0.124939
-0.875931	precautions for	-0.124939
-0.875931	Useful for	-0.124939
-0.875931	warning for	-0.124939
-0.875931	column; for	-0.124939
-0.875931	dramatically for	-0.124939
-0.875931	IDE's for	-0.124939
-0.875931	1.f; for	-0.124939
-0.201326	fine-tuned for	-0.124939
-0.875931	waits for	-0.124939
-0.875931	consistent for	-0.124939
-0.875931	interpreter for	-0.124939
-0.875931	spaces for	-0.124939
-0.875931	squares: for	-0.124939
-0.201326	uncommon for	-0.124939
-0.875931	unnecessary for	-0.124939
-0.875931	84 for	-0.124939
-0.875931	left for	-0.124939
-0.875931	stronger for	-0.124939
-0.875931	procedures for	-0.124939
-0.875931	~ for	-0.124939
-0.875931	accurate for	-0.124939
-0.201326	contend for	-0.425969
-0.875931	B; for	-0.124939
-0.875931	Optimize for	-0.124939
-0.875931	subroutine for	-0.124939
-0.588268	preferences for	-0.124939
-0.588268	Correction for	-0.124939
-0.588268	utility for	-0.124939
-0.588268	FAQ for	-0.124939
-0.588268	row++) for	-0.124939
-0.588268	be, for	-0.124939
-0.588268	interval, for	-0.124939
-0.588268	decimals, for	-0.124939
-0.588268	cell for	-0.124939
-0.588268	*temp; for	-0.124939
-0.588268	VTune, for	-0.124939
-0.588268	executables for	-0.124939
-0.588268	maintained for	-0.124939
-0.588268	IDE, for	-0.124939
-0.588268	buffers for	-0.124939
-0.588268	audience for	-0.124939
-0.588268	sourcebook for	-0.124939
-0.588268	meaning for	-0.124939
-0.588268	sqaure: for	-0.124939
-0.588268	delayed for	-0.124939
-0.588268	11.1 for	-0.124939
-0.588268	Usability for	-0.124939
-0.588268	.R. for	-0.124939
-0.588268	122) for	-0.124939
-0.588268	Except for	-0.124939
-0.588268	prepared for	-0.124939
-0.588268	compensate for	-0.124939
-0.588268	reserved for	-0.124939
-0.588268	timediff[NumberOfTests]; for	-0.124939
-0.588268	request for	-0.124939
-0.588268	Guide for	-0.124939
-0.588268	requests for	-0.124939
-0.588268	keyword, for	-0.124939
-0.588268	compete for	-0.124939
-0.588268	Assume, for	-0.124939
-0.588268	attack for	-0.124939
-0.588268	intranet for	-0.124939
-0.588268	criticized for	-0.124939
-0.588268	Libraries for	-0.124939
-0.588268	printf("\nResults:"); for	-0.124939
-0.588268	standards for	-0.124939
-0.588268	specifically for	-0.124939
-0.588268	125 for	-0.124939
-0.588268	possibilities for	-0.124939
-0.588268	helpful for	-0.124939
-0.588268	prototypes for	-0.124939
-0.588268	If, for	-0.124939
-0.588268	breakdowns for	-0.124939
-0.588268	parts, for	-0.124939
-0.588268	blog for	-0.124939
-0.588268	suggestions for	-0.124939
-0.588268	Waiting for	-0.124939
-0.588268	opportunities for	-0.124939
-0.588268	replacements for	-0.124939
-0.588268	handlers for	-0.124939
-0.588268	/Fa for	-0.124939
-0.588268	wired for	-0.124939
-0.588268	12.3a, for	-0.124939
-0.588268	strategy for	-0.124939
-0.588268	separately: for	-0.124939
-0.588268	Prototype for	-0.124939
-0.588268	exist for	-0.124939
-0.588268	systems" for	-0.124939
-0.588268	14.00 for	-0.124939
-0.588268	_alloca) for	-0.124939
-0.588268	doubled for	-0.124939
-0.588268	correction for	-0.124939
-0.588268	5-10% for	-0.124939
-0.588268	candidates for	-0.124939
-0.588268	responsible for	-0.124939
-0.588268	comp.lang.asm.x86 for	-0.124939
-0.588268	term for	-0.124939
-1.553241	is that	-0.677781
-2.706834	of that	-0.124939
-2.183575	and that	-0.124939
-2.534349	are that	-0.124939
-1.375201	function that	-0.324511
-2.514812	with that	-0.124939
-2.269285	on that	-0.124939
-1.151408	code that	-0.261158
-1.576556	compiler that	-0.182931
-2.141865	time that	-0.124939
-2.485758	use that	-0.124939
-2.413428	memory that	-0.124939
-1.839549	data that	-0.301030
-1.237513	program that	-0.166331
-1.004495	functions that	-0.510290
-2.405313	CPU that	-0.124939
-2.057838	instruction that	-0.124939
-1.646727	loop that	-0.124939
-2.455706	used that	-0.124939
-1.228455	one that	-0.234083
-2.353022	cache that	-0.124939
-2.306155	integer that	-0.124939
-1.917246	set that	-0.124939
-1.554624	class that	-0.124939
-1.449885	compilers that	-0.124939
-1.424849	size that	-0.346788
-1.370307	library that	-0.124939
-1.610529	object that	-0.124939
-1.316844	version that	-0.124939
-1.225238	value that	-0.602060
-1.794853	objects that	-0.124939
-1.537059	variable that	-0.124939
-0.317645	so that	-0.737723
-1.180978	variables that	-0.425969
-2.188100	table that	-0.124939
-2.240629	performance that	-0.124939
-1.350767	software that	-0.249877
-2.270731	long that	-0.124939
-0.965060	branch that	-0.301030
-1.340872	way that	-0.124939
-2.242335	elements that	-0.124939
-1.707629	address that	-0.425969
-2.212109	example, that	-0.124939
-2.142873	register that	-0.124939
-1.414287	libraries that	-0.301030
-2.139153	registers that	-0.124939
-2.157435	pointers that	-0.124939
-2.110296	test that	-0.124939
-1.380015	systems that	-0.124939
-0.328070	sure that	-0.477121
-1.370220	method that	-0.124939
-2.061481	file that	-0.124939
-2.127579	0 that	-0.124939
-2.101556	type that	-0.124939
-1.568145	case that	-0.425969
-0.898584	instructions that	-0.124939
-0.784551	processors that	-0.477121
-2.059972	constant that	-0.124939
-2.057207	error that	-0.124939
-1.305455	important that	-0.301030
-1.527953	CPUs that	-0.124939
-2.017833	large that	-0.124939
-1.510937	arrays that	-0.124939
-1.503509	work that	-0.124939
-1.512814	avoid that	-0.124939
-2.041822	processor that	-0.124939
-1.486446	big that	-0.124939
-1.489324	threads that	-0.124939
-1.249018	language that	-0.124939
-1.471709	thread that	-0.124939
-1.450519	small that	-0.124939
-1.224092	option that	-0.124939
-1.964168	classes that	-0.124939
-1.207688	line that	-0.124939
-2.017544	parameters that	-0.124939
-1.439084	check that	-0.124939
-1.411259	problem that	-0.124939
-2.020609	solution that	-0.124939
-1.415105	container that	-0.124939
-1.432843	advantage that	-0.124939
-1.152424	operators that	-0.301030
-1.159456	likely that	-0.124939
-1.979755	structure that	-0.124939
-1.972303	calculate that	-0.124939
-1.983762	copy that	-0.124939
-2.008691	information that	-0.124939
-0.850912	certain that	-0.221849
-1.909291	cycles that	-0.124939
-1.353990	addresses that	-0.425969
-1.909291	counter that	-0.124939
-1.350093	count that	-0.425969
-1.930910	files that	-0.124939
-2.039471	recommended that	-0.124939
-1.333435	fast that	-0.124939
-1.327679	write that	-0.425969
-1.070280	programs that	-0.301030
-0.921889	problems that	-0.124939
-1.877826	microprocessor that	-0.124939
-0.903775	branches that	-0.124939
-1.296645	operator that	-0.124939
-1.863533	application that	-0.124939
-1.046914	see that	-0.301030
-0.613310	expression that	-0.492916
-1.840169	complicated that	-0.124939
-1.876215	members that	-0.124939
-1.837197	model that	-0.124939
-1.267050	block that	-0.124939
-1.837197	name that	-0.124939
-1.021153	disadvantage that	-0.602060
-1.008824	high that	-0.602060
-1.819468	zero that	-0.124939
-1.864609	resources that	-0.124939
-1.905506	reason that	-0.124939
-1.830552	dispatcher that	-0.124939
-1.846126	programmer that	-0.124939
-1.225107	applications that	-0.425969
-1.822969	mechanism that	-0.124939
-0.237946	means that	-0.425969
-1.781680	expressions that	-0.124939
-1.791043	directives that	-0.124939
-0.799283	requires that	-0.249877
-0.956393	optimizations that	-0.124939
-1.187501	framework that	-0.124939
-1.814619	microprocessors that	-0.124939
-0.089537	assume that	-0.313995
-1.183811	shows that	-0.124939
-0.554162	know that	-0.204120
-1.171570	advantages that	-0.124939
-1.755295	options that	-0.124939
-0.915171	feature that	-0.124939
-1.739671	constructor that	-0.124939
-0.530681	require that	-0.425969
-1.126309	modules that	-0.124939
-0.867993	things that	-0.301030
-1.714847	reductions that	-0.124939
-1.688518	statement that	-0.124939
-1.088890	errors that	-0.124939
-1.084921	languages that	-0.425969
-1.696520	profiler that	-0.124939
-1.660489	operation that	-0.124939
-0.541027	fact that	-0.522879
-1.663458	platforms that	-0.124939
-1.024959	task that	-0.124939
-1.654988	constants that	-0.124939
-0.783672	destructor that	-0.124939
-0.514085	Assume that	-0.346788
-1.638528	algorithm that	-0.124939
-0.598873	possibility that	-0.249877
-1.685885	discussion that	-0.124939
-1.648725	conditions that	-0.124939
-1.631273	offset that	-0.124939
-0.146960	Note that	-0.522879
-1.006760	operand that	-0.425969
-1.631273	tasks that	-0.124939
-0.391316	Variables that	-0.903090
-0.998713	clear that	-0.124939
-1.605149	predict that	-0.124939
-1.605149	frequency that	-0.124939
-1.596511	iteration that	-0.124939
-0.963951	models that	-0.425969
-1.588041	true that	-0.124939
-1.613963	names that	-0.124939
-1.605149	details that	-0.124939
-0.967956	thing that	-0.124939
-1.579733	structures that	-0.124939
-1.576174	consider that	-0.124939
-1.576174	delay that	-0.124939
-1.567361	cores that	-0.124939
-1.558722	ebx that	-0.124939
-1.508860	statements that	-0.124939
-1.517330	course that	-0.124939
-0.896897	risk that	-0.124939
-1.517330	buffer that	-0.124939
-0.890791	something that	-0.425969
-0.884770	counts that	-0.425969
-0.890791	happen that	-0.124939
-1.471572	style that	-0.124939
-0.847059	provided that	-0.425969
-0.596323	everything that	-0.124939
-1.437872	now that	-0.124939
-1.465440	account that	-0.124939
-0.545170	factors that	-0.301030
-1.437872	explicitly that	-0.124939
-1.388876	measure that	-0.124939
-1.398063	Software that	-0.124939
-1.388876	trick that	-0.124939
-1.398063	disadvantages that	-0.124939
-1.407448	instances that	-0.124939
-1.379880	expensive that	-0.124939
-0.741995	aware that	-0.425969
-1.388876	polymorphism that	-0.124939
-0.123950	sense that	-0.301030
-1.321929	course, that	-0.124939
-0.673003	notice that	-0.124939
-1.340501	expected that	-0.124939
-1.321929	detect that	-0.124939
-1.331116	show that	-0.124939
-0.673003	specifying that	-0.124939
-1.331116	unwinding that	-0.124939
-1.261320	cleanup that	-0.124939
-0.593822	Functions that	-0.425969
-0.593822	specifies that	-0.124939
-1.261320	Arrays that	-0.124939
-1.251935	lists that	-0.124939
-0.593822	event that	-0.425969
-1.251935	Objects that	-0.124939
-1.261320	Data that	-0.124939
-1.270912	frameworks that	-0.124939
-1.280721	tells that	-0.124939
-1.261320	facilities that	-0.124939
-1.280721	divisor that	-0.124939
-0.498957	recommend that	-0.124939
-1.164410	estimate that	-0.124939
-1.174002	said that	-0.124939
-0.089378	think that	-0.124939
-1.183811	alternatives that	-0.124939
-1.164410	tools that	-0.124939
-1.164410	spot that	-0.124939
-1.164410	lengths that	-0.124939
-1.174002	consequence that	-0.124939
-0.498957	assumption that	-0.124939
-0.247573	recognize that	-0.301030
-1.174002	Remember that	-0.124939
-1.164410	considerations that	-0.124939
-0.498957	routine that	-0.124939
-1.164410	auto_ptr that	-0.124939
-0.376073	assuming that	-0.124939
-1.049063	Specifies that	-0.124939
-1.049063	reveals that	-0.124939
-1.049063	labels that	-0.124939
-1.049063	Programmers that	-0.124939
-1.049063	F2 that	-0.124939
-1.058872	unusual that	-0.124939
-0.123785	Applications that	-0.124939
-1.049063	(PLT) that	-0.124939
-1.049063	services that	-0.124939
-0.376073	Check that	-0.124939
-0.376073	beware that	-0.124939
-0.882781	again, that	-0.124939
-0.202046	discovered that	-0.124939
-0.882781	chance that	-0.124939
-0.882781	chip that	-0.124939
-0.202046	believe that	-0.124939
-0.202046	Algorithms that	-0.124939
-0.882781	hacks that	-0.124939
-0.202046	Assuming that	-0.124939
-0.202046	note that	-0.124939
-0.882781	events that	-0.124939
-0.202046	Operations that	-0.425969
-0.202046	Factors that	-0.425969
-0.882781	servers that	-0.124939
-0.202046	Everything that	-0.425969
-0.882781	report that	-0.124939
-0.882781	verify that	-0.124939
-0.882781	complications that	-0.124939
-0.882781	conclude that	-0.124939
-0.882781	spaces that	-0.124939
-0.882781	complex, that	-0.124939
-0.202046	hope that	-0.124939
-0.882781	ones that	-0.124939
-0.882781	saying that	-0.124939
-0.882781	Code that	-0.124939
-0.882781	certainty that	-0.124939
-0.882781	Programs that	-0.124939
-0.202046	says that	-0.124939
-0.882781	interpret that	-0.124939
-0.591787	noticed that	-0.124939
-0.591787	plug-ins that	-0.124939
-0.591787	exceeding that	-0.124939
-0.591787	forgets that	-0.124939
-0.591787	sub-vectors that	-0.124939
-0.591787	illogical that	-0.124939
-0.591787	scanner that	-0.124939
-0.591787	assumes that	-0.124939
-0.591787	assumed that	-0.124939
-0.591787	kludgy that	-0.124939
-0.591787	remember that	-0.124939
-0.591787	mathimf.h that	-0.124939
-0.591787	iterators that	-0.124939
-0.591787	discover that	-0.124939
-0.591787	excuse that	-0.124939
-0.591787	likelihood that	-0.124939
-0.591787	Espresso) that	-0.124939
-0.591787	browsing that	-0.124939
-0.591787	guarantee that	-0.124939
-0.591787	stage that	-0.124939
-0.591787	CPU-dispatcher that	-0.124939
-0.591787	discovers that	-0.124939
-0.591787	realize that	-0.124939
-0.591787	clients that	-0.124939
-0.591787	emphasized that	-0.124939
-0.591787	*.so) that	-0.124939
-0.591787	formalism that	-0.124939
-0.591787	dictates that	-0.124939
-0.591787	mind, that	-0.124939
-0.591787	unrealistic that	-0.124939
-0.591787	subexpressions that	-0.124939
-0.591787	guess, that	-0.124939
-0.591787	Things that	-0.124939
-0.591787	knows that	-0.124939
-0.591787	wires that	-0.124939
-0.591787	feel that	-0.124939
-0.591787	Keywords that	-0.124939
-0.591787	say that	-0.124939
-0.591787	complication that	-0.124939
-0.591787	multithreading that	-0.124939
-0.591787	unlikely that	-0.124939
-0.591787	argue that	-0.124939
-0.591787	knowing that	-0.124939
-0.591787	(everything that	-0.124939
-1.252308	to be	-0.243629
-0.356713	can be	-0.461587
-1.197443	not be	-0.291270
-0.492755	may be	-0.310575
-0.697759	will be	-0.202105
-2.203583	then be	-0.124939
-1.586452	only be	-0.346788
-2.676410	all be	-0.124939
-0.334948	should be	-0.256611
-0.811095	also be	-0.287666
-2.437284	software be	-0.124939
-0.460705	cannot be	-0.197489
-2.407451	less be	-0.124939
-1.302678	often be	-0.124939
-1.652090	even be	-0.124939
-2.283569	always be	-0.124939
-1.360293	cases be	-0.124939
-0.355974	must be	-0.146128
-0.960723	therefore be	-0.221849
-2.124500	container be	-0.124939
-0.364286	would be	-0.154902
-2.112337	likely be	-0.124939
-0.220297	preferably be	-0.355388
-1.042841	never be	-0.124939
-1.897409	actually be	-0.124939
-1.763167	fact be	-0.124939
-0.633180	sometimes be	-0.124939
-0.792255	still be	-0.124939
-0.566339	possibly be	-0.124939
-0.900602	course be	-0.124939
-0.646587	might be	-0.301030
-0.601137	could be	-0.124939
-1.495555	now be	-0.124939
-0.549984	easily be	-0.124939
-1.373336	soon be	-0.124939
-0.346326	definitely be	-0.301030
-1.198611	Can be	-0.124939
-1.075043	probably be	-0.124939
-0.900327	can't be	-0.124939
-0.600676	day be	-0.124939
-0.600676	Will be	-0.124939
-2.864352	to are	-0.124939
-2.396418	and are	-0.124939
-1.172778	that are	-0.335792
-2.073147	function are	-0.249877
-1.990674	code are	-0.124939
-0.952156	you are	-0.440209
-1.143109	data are	-0.367977
-1.711355	program are	-0.124939
-0.892514	functions are	-0.145142
-1.978902	other are	-0.425969
-2.500881	loop are	-0.124939
-1.374825	which are	-0.124939
-2.502104	but are	-0.124939
-1.745860	cache are	-0.124939
-2.424706	set are	-0.124939
-1.461997	class are	-0.346788
-1.009238	compilers are	-0.393784
-2.417721	size are	-0.124939
-1.677653	b are	-0.124939
-2.364035	object are	-0.124939
-0.444606	there are	-0.331413
-0.307817	There are	-0.518283
-0.831978	objects are	-0.271067
-0.817420	we are	-0.157123
-1.049423	variables are	-0.329059
-2.259683	table are	-0.124939
-2.226217	software are	-0.124939
-1.064873	elements are	-0.367977
-2.360770	stored are	-0.124939
-2.238187	bit are	-0.124939
-2.191851	optimization are	-0.124939
-0.712655	libraries are	-0.284640
-0.685370	registers are	-0.460731
-1.250654	pointers are	-0.124939
-1.127526	systems are	-0.221849
-1.614695	these are	-0.124939
-0.261294	they are	-0.212089
-2.171306	access are	-0.124939
-2.173370	programming are	-0.124939
-2.145843	bits are	-0.124939
-0.760111	operations are	-0.124939
-2.174765	cases are	-0.124939
-0.902343	instructions are	-0.191886
-0.905534	processors are	-0.124939
-1.300665	CPUs are	-0.124939
-0.682507	arrays are	-0.346788
-2.086220	Windows are	-0.124939
-2.117314	calls are	-0.124939
-1.011989	calculations are	-0.346788
-1.288102	versions are	-0.301030
-0.994782	threads are	-0.124939
-2.138612	c are	-0.124939
-1.482830	These are	-0.124939
-2.056256	thread are	-0.124939
-1.471031	etc. are	-0.124939
-2.071567	integers are	-0.124939
-1.215727	classes are	-0.124939
-0.400421	parameters are	-0.455932
-2.030209	problem are	-0.124939
-2.041663	container are	-0.124939
-1.160046	operators are	-0.124939
-2.016109	structure are	-0.124939
-1.126251	values are	-0.124939
-0.957919	addresses are	-0.124939
-0.955691	files are	-0.124939
-1.931905	both are	-0.124939
-1.938988	problems are	-0.124939
-1.305889	branches are	-0.124939
-1.895465	multiplication are	-0.124939
-1.295063	sets are	-0.425969
-1.912568	members are	-0.124939
-0.761805	methods are	-0.124939
-1.884081	development are	-0.124939
-0.746214	resources are	-0.124939
-1.836709	applications are	-0.124939
-1.246039	examples are	-0.124939
-1.908422	means are	-0.124939
-1.828564	|| are	-0.124939
-1.822948	expressions are	-0.124939
-0.801902	directives are	-0.124939
-1.802745	framework are	-0.124939
-0.597862	microprocessors are	-0.204120
-0.938314	numbers are	-0.124939
-1.186757	together are	-0.124939
-1.798625	vectors are	-0.124939
-1.812643	r are	-0.124939
-0.760669	results are	-0.124939
-1.782191	storage are	-0.124939
-1.161653	options are	-0.124939
-0.892557	operands are	-0.124939
-1.752868	modules are	-0.124939
-1.745812	reductions are	-0.124939
-1.751900	references are	-0.124939
-1.758075	C are	-0.124939
-1.725571	conversions are	-0.124939
-0.686035	languages are	-0.249877
-1.719483	STL are	-0.124939
-1.703717	lines are	-0.124939
-1.673754	output are	-0.124939
-1.706016	costs are	-0.124939
-0.513419	constants are	-0.124939
-1.667184	strings are	-0.124939
-0.482526	conditions are	-0.124939
-1.654188	tasks are	-0.124939
-1.645820	child are	-0.124939
-1.613070	counters are	-0.124939
-0.972059	names are	-0.124939
-1.625875	details are	-0.124939
-0.722106	rows are	-0.301030
-1.606807	structures are	-0.124939
-1.575282	updates are	-0.124939
-1.588087	cores are	-0.124939
-1.614888	implementations are	-0.124939
-0.931396	sizes are	-0.425969
-1.588087	BSD are	-0.124939
-0.682698	loops are	-0.301030
-1.533889	statements are	-0.124939
-1.540244	templates are	-0.124939
-1.566638	sequence are	-0.124939
-1.540244	counts are	-0.124939
-1.553241	map are	-0.124939
-1.494487	style are	-0.124939
-0.847120	destructors are	-0.124939
-1.507483	transfer are	-0.124939
-0.845680	diagonal are	-0.425969
-1.507483	purposes are	-0.124939
-1.507483	Here are	-0.124939
-1.449784	prediction are	-0.124939
-1.490633	conventions are	-0.124939
-1.449784	background are	-0.124939
-1.456331	algorithms are	-0.124939
-1.469728	additions are	-0.124939
-1.456331	inputs are	-0.124939
-1.462978	who are	-0.124939
-0.798861	factors are	-0.124939
-0.797412	devices are	-0.124939
-1.462978	misses are	-0.124939
-0.546014	tables are	-0.301030
-1.404986	D are	-0.124939
-1.404986	measure are	-0.124939
-1.404986	sections are	-0.124939
-1.418593	algebra are	-0.124939
-1.398339	switches are	-0.124939
-1.404986	Java are	-0.124939
-1.398339	exceptions are	-0.124939
-1.418593	machine are	-0.124939
-0.488834	manuals are	-0.124939
-0.488834	profilers are	-0.124939
-1.398339	capabilities are	-0.124939
-1.404986	measurements are	-0.124939
-1.398339	units are	-0.124939
-1.411736	log are	-0.124939
-0.422701	comparisons are	-0.124939
-0.676835	requirements are	-0.124939
-1.265608	Fortran are	-0.124939
-1.272465	drivers are	-0.124939
-0.596195	Templates are	-0.124939
-1.265608	principles are	-0.124939
-0.185953	schemes are	-0.249877
-0.596195	Arrays are	-0.124939
-1.286513	constructors are	-0.124939
-1.265608	guidelines are	-0.124939
-0.344334	frameworks are	-0.124939
-1.272465	consumption are	-0.124939
-0.596195	facilities are	-0.425969
-1.265608	macros are	-0.124939
-1.175555	solutions are	-0.124939
-1.175555	sum2 are	-0.124939
-0.502207	Branches are	-0.124939
-1.175555	caches are	-0.124939
-0.500744	Threads are	-0.124939
-1.175555	tests are	-0.124939
-1.064664	main() are	-0.124939
-1.057584	divisions are	-0.124939
-1.057584	Enums are	-0.124939
-1.064664	Constructors are	-0.124939
-0.377269	c[i] are	-0.124939
-0.377269	Examples are	-0.124939
-1.057584	lookups are	-0.124939
-0.888573	Sum3 are	-0.124939
-0.888573	influences are	-0.124939
-0.888573	constructs are	-0.124939
-0.888573	settings are	-0.124939
-0.888573	others are	-0.124939
-0.888573	DLLs are	-0.124939
-0.888573	suffixes are	-0.124939
-0.888573	arguments are	-0.124939
-0.888573	intervals are	-0.124939
-0.888573	v.f are	-0.124939
-0.888573	dispatchers are	-0.124939
-0.202646	Registers are	-0.425969
-0.202646	References are	-0.124939
-0.888573	int) are	-0.124939
-0.888573	algorithms, are	-0.124939
-0.888573	experiment are	-0.124939
-0.888573	i++ are	-0.124939
-0.888573	time-consumers are	-0.124939
-0.888573	procedures are	-0.124939
-0.888573	~ are	-0.124939
-0.594741	repagination are	-0.124939
-0.594741	clauses are	-0.124939
-0.594741	(Darwin) are	-0.124939
-0.594741	12) are	-0.124939
-0.594741	Beginners are	-0.124939
-0.594741	mangling are	-0.124939
-0.594741	distributors are	-0.124939
-0.594741	recommendations are	-0.124939
-0.594741	latencies are	-0.124939
-0.594741	'$' are	-0.124939
-0.594741	parsing are	-0.124939
-0.594741	objects) are	-0.124939
-0.594741	properties) are	-0.124939
-0.594741	123; are	-0.124939
-0.594741	#) are	-0.124939
-0.594741	slice are	-0.124939
-0.594741	(MOVNT) are	-0.124939
-0.594741	'>') are	-0.124939
-0.594741	clause are	-0.124939
-0.594741	vendors are	-0.124939
-0.594741	Multiplications are	-0.124939
-2.697958	to can	-0.425969
-2.490703	and can	-0.124939
-1.410582	that can	-0.455932
-1.156911	it can	-0.377664
-1.820573	function can	-0.367977
-1.426868	code can	-0.425969
-1.024615	This can	-0.784991
-1.042725	compiler can	-0.325854
-2.862546	x can	-0.124939
-0.797827	you can	-0.406664
-1.951842	this can	-0.124939
-2.606506	time can	-0.124939
-2.574238	use can	-0.124939
-1.334744	It can	-0.346788
-2.081859	memory can	-0.124939
-2.071806	data can	-0.124939
-2.059006	program can	-0.124939
-2.066149	vector can	-0.124939
-2.004589	same can	-0.124939
-1.823261	functions can	-0.301030
-2.028228	CPU can	-0.124939
-2.629714	instruction can	-0.124939
-1.804321	loop can	-0.124939
-1.258595	which can	-0.602060
-1.947887	integer can	-0.425969
-1.722281	set can	-0.124939
-2.370168	class can	-0.124939
-1.896798	example can	-0.425969
-1.085234	compilers can	-0.263241
-1.916276	size can	-0.425969
-1.681357	pointer can	-0.602060
-2.437997	b can	-0.124939
-1.290924	library can	-0.425969
-2.461307	i can	-0.124939
-1.623691	object can	-0.602060
-1.432872	array can	-0.425969
-1.315383	objects can	-0.522879
-1.551502	variable can	-0.602060
-0.817310	we can	-0.124939
-1.536921	variables can	-0.301030
-2.216858	2 can	-0.124939
-0.580328	You can	-0.291270
-1.747151	table can	-0.124939
-1.760366	performance can	-0.124939
-2.221573	software can	-0.124939
-1.512292	branch can	-0.301030
-2.358773	stored can	-0.124939
-1.721461	address can	-0.425969
-2.234527	bit can	-0.124939
-1.679055	register can	-0.425969
-2.188001	optimization can	-0.124939
-1.658061	libraries can	-0.425969
-1.636793	registers can	-0.124939
-1.640901	pointers can	-0.425969
-1.624077	systems can	-0.124939
-2.200694	user can	-0.124939
-1.647583	they can	-0.124939
-1.223151	method can	-0.726999
-2.167991	access can	-0.124939
-1.585586	system can	-0.425969
-2.122184	file can	-0.124939
-2.170204	programming can	-0.124939
-2.298261	part can	-0.124939
-1.591036	operations can	-0.425969
-2.150952	type can	-0.124939
-2.117562	instructions can	-0.124939
-1.338122	processors can	-0.124939
-2.171459	available can	-0.124939
-1.557612	constant can	-0.425969
-2.109999	error can	-0.124939
-2.116190	stack can	-0.124939
-1.146497	CPUs can	-0.124939
-2.109214	arrays can	-0.124939
-2.088162	work can	-0.124939
-2.114641	calls can	-0.124939
-2.073016	calculations can	-0.124939
-2.137052	result can	-0.124939
-2.088800	processor can	-0.124939
-2.088800	bytes can	-0.124939
-1.501757	threads can	-0.124939
-2.136833	c can	-0.124939
-0.982602	thread can	-0.221849
-1.232612	overflow can	-0.124939
-2.015858	classes can	-0.124939
-2.020906	line can	-0.124939
-2.112244	inside can	-0.124939
-1.420799	problem can	-0.425969
-1.426520	solution can	-0.124939
-2.009606	list can	-0.124939
-1.988989	hardware can	-0.124939
-2.030301	information can	-0.124939
-1.972615	... can	-0.124939
-0.953777	counter can	-0.726999
-1.339129	allocation can	-0.124939
-1.963838	above can	-0.124939
-1.929321	both can	-0.124939
-1.908586	programs can	-0.124939
-1.925285	space can	-0.124939
-1.930997	dispatching can	-0.124939
-1.064340	microprocessor can	-0.124939
-1.903853	branches can	-0.124939
-1.892792	multiplication can	-0.124939
-1.903853	application can	-0.124939
-1.294555	sets can	-0.425969
-1.922443	implementation can	-0.124939
-1.893100	handling can	-0.124939
-1.910467	members can	-0.124939
-1.893795	parameter can	-0.124939
-1.906288	reference can	-0.124939
-1.245622	programmer can	-0.124939
-1.857221	keyword can	-0.124939
-1.851355	lookup can	-0.124939
-1.834217	applications can	-0.124939
-0.206855	We can	-0.221849
-0.996329	mechanism can	-0.301030
-1.800346	framework can	-0.124939
-1.842004	microprocessors can	-0.124939
-1.814614	numbers can	-0.124939
-1.808500	interface can	-0.124939
-1.161205	process can	-0.124939
-0.921568	union can	-0.124939
-1.768856	constructor can	-0.124939
-1.731889	section can	-0.124939
-1.737918	tested can	-0.124939
-1.762909	contentions can	-0.124939
-1.089461	conversions can	-0.124939
-1.088014	statement can	-0.124939
-1.093831	errors can	-0.425969
-1.730196	languages can	-0.124939
-1.723905	inlining can	-0.124939
-1.689675	operation can	-0.124939
-1.032921	output can	-0.124939
-1.705105	costs can	-0.124939
-1.659712	database can	-0.124939
-1.678588	constants can	-0.124939
-1.665913	algorithm can	-0.124939
-1.640020	alignment can	-0.124939
-1.652883	offset can	-0.124939
-1.679811	effect can	-0.124939
-1.611641	counters can	-0.124939
-1.631376	heap can	-0.124939
-1.586909	loading can	-0.124939
-0.932570	condition can	-0.425969
-1.586909	cores can	-0.124939
-1.621377	generation can	-0.124939
-1.538939	buffer can	-0.124939
-1.545516	stride can	-0.124939
-1.532460	metaprogramming can	-0.124939
-1.552195	map can	-0.124939
-0.845419	chains can	-0.425969
-0.598718	tool can	-0.124939
-1.455285	binding can	-0.124939
-1.462067	family can	-0.124939
-1.448606	DLL can	-0.124939
-1.462067	tables can	-0.124939
-1.404075	sections can	-0.124939
-1.397293	switches can	-0.124939
-1.397293	Studio can	-0.124939
-0.489573	They can	-0.301030
-1.397293	exceptions can	-0.124939
-1.404075	collection can	-0.124939
-1.410965	manuals can	-0.124939
-1.397293	capabilities can	-0.124939
-1.404075	measurements can	-0.124939
-1.397293	units can	-0.124939
-1.337128	polynomial can	-0.124939
-1.344019	13.1 can	-0.124939
-1.351020	Pointers can	-0.124939
-1.337128	debugger can	-0.124939
-1.358136	behavior can	-0.124939
-1.344019	edx can	-0.124939
-1.337128	job can	-0.124939
-1.271839	abc can	-0.124939
-0.597554	limit can	-0.425969
-1.264837	guidelines can	-0.124939
-1.278955	seen can	-0.124939
-1.174929	estimate can	-0.124939
-1.174929	Metaprogramming can	-0.124939
-1.174929	manager can	-0.124939
-0.500644	pattern can	-0.124939
-1.174929	12.4b can	-0.124939
-1.182045	Integers can	-0.124939
-1.182045	simultaneously can	-0.124939
-1.174929	techniques can	-0.124939
-1.174929	otherwise can	-0.124939
-1.174929	resolution can	-0.124939
-1.064340	redesign can	-0.124939
-1.057106	projects can	-0.124939
-1.057106	14.28 can	-0.124939
-1.057106	divisions can	-0.124939
-1.057106	s3 can	-0.124939
-1.057106	etc., can	-0.124939
-1.057106	Strings can	-0.124939
-1.057106	read-only can	-0.124939
-0.888249	64) can	-0.124939
-0.888249	c+b can	-0.124939
-0.888249	chip can	-0.124939
-0.888249	formats can	-0.124939
-0.888249	miss can	-0.124939
-0.888249	Jumps can	-0.124939
-0.888249	parentheses can	-0.124939
-0.888249	Neither can	-0.124939
-0.888249	Divisions can	-0.124939
-0.888249	139 can	-0.124939
-0.888249	modifier can	-0.124939
-0.888249	insight can	-0.124939
-0.888249	queries can	-0.124939
-0.888249	bottlenecks can	-0.124939
-0.888249	>> can	-0.124939
-0.594576	tread can	-0.124939
-0.594576	(b+c) can	-0.124939
-0.594576	unequally can	-0.124939
-0.594576	dilemma can	-0.124939
-0.594576	preprocessor can	-0.124939
-0.594576	work-around can	-0.124939
-0.594576	8.24 can	-0.124939
-0.594576	BTB can	-0.124939
-0.594576	denominator can	-0.124939
-0.594576	valid) can	-0.124939
-0.594576	(Examples can	-0.124939
-0.594576	installed can	-0.124939
-0.594576	!a; can	-0.124939
-0.594576	shuffling can	-0.124939
-0.594576	Zero can	-0.124939
-0.594576	(arrays can	-0.124939
-2.845360	the //	-0.124939
-2.772346	is //	-0.124939
-2.590913	for //	-0.124939
-2.350153	function //	-0.124939
-1.838928	by //	-0.903090
-2.580869	compiler //	-0.124939
-2.077582	x //	-0.124939
-0.691748	{ //	-0.281013
-1.096169	} //	-0.179296
-2.422509	data //	-0.124939
-2.018156	functions //	-0.124939
-2.353072	size //	-0.124939
-2.278508	multiple //	-0.124939
-2.332608	version //	-0.124939
-2.305178	objects //	-0.124939
-1.502731	2 //	-0.301030
-1.728433	table //	-0.425969
-2.243792	branch //	-0.124939
-2.249853	elements //	-0.124939
-1.739712	faster //	-0.425969
-2.208135	call //	-0.124939
-1.711899	0; //	-0.124939
-2.184986	bit //	-0.124939
-1.698381	unsigned //	-0.425969
-1.442208	first //	-0.301030
-2.130760	code. //	-0.124939
-2.145477	time. //	-0.124939
-1.239526	test //	-0.124939
-2.171686	SSE2 //	-0.124939
-1.575536	0 //	-0.124939
-2.064877	error //	-0.124939
-2.095429	times //	-0.124939
-0.006787	Example: //	-1.806180
-2.071295	arrays //	-0.124939
-2.045965	work //	-0.124939
-2.105187	result //	-0.124939
-2.048686	bytes //	-0.124939
-1.975821	etc. //	-0.124939
-1.061172	matrix //	-0.249877
-1.971686	classes //	-0.124939
-0.586142	}; //	-0.170696
-1.032330	b; //	-0.124939
-1.950444	optimizing //	-0.124939
-1.107583	... //	-0.301030
-1.342093	counter //	-0.124939
-1.876345	operator //	-0.124939
-1.313076	1; //	-0.124939
-1.029467	conversion //	-0.301030
-1.274000	c; //	-0.124939
-1.825192	zero //	-0.124939
-1.226700	Table //	-0.425969
-1.211059	compilers. //	-0.124939
-1.834606	aligned //	-0.124939
-1.781301	x; //	-0.124939
-1.838430	100; //	-0.124939
-1.797508	later //	-0.124939
-1.746011	AVX2 //	-0.124939
-1.133209	constructor //	-0.124939
-1.792673	2; //	-0.124939
-1.767660	here //	-0.124939
-1.751015	a; //	-0.124939
-0.076110	example: //	-1.301030
-1.059704	slow //	-0.124939
-0.784098	d; //	-0.124939
-1.625250	rows //	-0.124939
-1.578793	directly //	-0.124939
-1.587461	temp; //	-0.124939
-1.570294	loops //	-0.124939
-0.889365	SSE4.1 //	-0.124939
-1.520566	templates //	-0.124939
-0.033589	to: //	-1.124939
-1.512387	metaprogramming //	-0.124939
-1.491643	multiply //	-0.124939
-1.483144	diagonal //	-0.124939
-1.500311	Time //	-0.124939
-1.500311	y; //	-0.124939
-1.431991	arrays. //	-0.124939
-1.458003	required //	-0.124939
-1.449158	array. //	-0.124939
-1.391166	measure //	-0.124939
-0.123991	this: //	-0.903090
-1.409040	10; //	-0.124939
-1.391166	follows: //	-0.124939
-0.330922	c); //	-0.425969
-1.409040	a[100]; //	-0.124939
-1.418260	256; //	-0.124939
-1.382498	T //	-0.124939
-1.400011	2: //	-0.124939
-1.391166	is. //	-0.124939
-1.391166	details. //	-0.124939
-1.400011	x); //	-0.124939
-1.324220	constant. //	-0.124939
-1.324220	polynomial //	-0.124939
-0.675306	casting //	-0.124939
-0.673345	16; //	-0.425969
-0.057518	{...} //	-0.301030
-1.333064	13.1 //	-0.124939
-0.057518	i); //	-0.425969
-1.253883	method. //	-0.124939
-1.253883	a[i]; //	-0.124939
-1.262912	returns //	-0.124939
-1.253883	n! //	-0.124939
-1.262912	www.agner.org/optimize/asmlib.zip. //	-0.124939
-0.594164	); //	-0.124939
-1.253883	point. //	-0.124939
-1.262912	13 //	-0.124939
-0.594164	#endif //	-0.124939
-1.166002	103 //	-0.124939
-0.089411	_EM_OVERFLOW); //	-0.425969
-0.501184	x^4 //	-0.124939
-0.247670	0x80000000; //	-0.124939
-1.175222	dispatcher. //	-0.124939
-1.166002	1: //	-0.124939
-1.175222	unsigned. //	-0.124939
-1.166002	Y //	-0.124939
-1.059704	error. //	-0.124939
-0.376245	#else //	-0.124939
-0.123833	numbers: //	-0.602060
-0.376245	calculations: //	-0.425969
-0.376245	8; //	-0.124939
-0.376245	operations: //	-0.425969
-1.050284	overflow: //	-0.124939
-1.050284	way: //	-0.124939
-1.050284	coefficients //	-0.124939
-0.123833	with: //	-0.602060
-1.050284	30 //	-0.124939
-1.050284	38 //	-0.124939
-0.376245	bit: //	-0.425969
-0.123833	b[size]; //	-0.301030
-0.202132	_mm_set1_epi16(2); //	-0.425969
-0.883613	_MSC_VER //	-0.124939
-0.883613	y=temp;} //	-0.124939
-0.883613	x^2 //	-0.124939
-0.883613	<stdio.h> //	-0.124939
-0.883613	3.3; //	-0.124939
-0.202132	<dvec.h> //	-0.425969
-0.202132	case: //	-0.425969
-0.883613	loop: //	-0.124939
-0.202132	"vectorclass.h" //	-0.124939
-0.202132	lookup: //	-0.425969
-0.202132	bc); //	-0.124939
-0.202132	two); //	-0.425969
-0.883613	TILESIZE //	-0.124939
-0.883613	time1; //	-0.124939
-0.883613	<emmintrin.h> //	-0.124939
-0.883613	a[c][r]); //	-0.124939
-0.202132	InstructionSet(); //	-0.425969
-0.202132	"asmlib.h" //	-0.124939
-0.883613	mask); //	-0.124939
-0.202132	512; //	-0.425969
-0.202132	1.2; //	-0.124939
-0.883613	c.load(cc+i); //	-0.124939
-0.883613	parameter: //	-0.124939
-0.883613	parm2); //	-0.124939
-0.883613	x^n //	-0.124939
-0.202132	table: //	-0.425969
-0.202132	_mm_set1_epi16(0); //	-0.425969
-0.202132	x^10 //	-0.124939
-0.883613	class: //	-0.124939
-0.202132	23; //	-0.124939
-0.202132	needed: //	-0.425969
-0.202132	zero); //	-0.124939
-0.883613	"; //	-0.124939
-0.202132	1" //	-0.425969
-0.883613	int)u; //	-0.124939
-0.883613	occurred. //	-0.124939
-0.883613	2;} //	-0.124939
-0.202132	available: //	-0.425969
-0.883613	sizeof(a)); //	-0.124939
-0.883613	b.load(bb+i); //	-0.124939
-0.883613	function: //	-0.124939
-0.592212	SafeArray: //	-0.124939
-0.592212	union: //	-0.124939
-0.592212	bits: //	-0.124939
-0.592212	zero: //	-0.124939
-0.592212	xx4(x4); //	-0.124939
-0.592212	point: //	-0.124939
-0.592212	enabled: //	-0.124939
-0.592212	x^8 //	-0.124939
-0.592212	list; //	-0.124939
-0.592212	precision: //	-0.124939
-0.592212	64; //	-0.124939
-0.592212	log(c[i]); //	-0.124939
-0.592212	p2->Hello(); //	-0.124939
-0.592212	@gnu_indirect_function"); //	-0.124939
-0.592212	integer: //	-0.124939
-0.592212	last: //	-0.124939
-0.592212	A2; //	-0.124939
-0.592212	"xmmintrin.h" //	-0.124939
-0.592212	condition: //	-0.124939
-0.592212	efficient: //	-0.124939
-0.592212	c[arraysize]; //	-0.124939
-0.592212	<pmmintrin.h> //	-0.124939
-0.592212	const*)p);} //	-0.124939
-0.592212	arrays: //	-0.124939
-0.592212	underflow: //	-0.124939
-0.592212	polymorphism: //	-0.124939
-0.592212	Examples: //	-0.124939
-0.592212	&SelectAddMul_dispatch; //	-0.124939
-0.592212	variable: //	-0.124939
-0.592212	n+1; //	-0.124939
-0.592212	Constructor //	-0.124939
-0.592212	bitofn //	-0.124939
-0.592212	double: //	-0.124939
-0.592212	reorganize: //	-0.124939
-0.592212	improvements). //	-0.124939
-0.592212	cpuid //	-0.124939
-0.592212	sizeof(float)); //	-0.124939
-0.592212	classes): //	-0.124939
-0.592212	ipow(x,10); //	-0.124939
-0.592212	(int)d; //	-0.124939
-0.592212	polynomial: //	-0.124939
-0.592212	_mm_load_ps(coef+i); //	-0.124939
-0.592212	defined(__GNUC__) //	-0.124939
-0.592212	InstructionSet(): //	-0.124939
-0.592212	BigArray[1024]; //	-0.124939
-0.592212	structures: //	-0.124939
-0.592212	7.45 //	-0.124939
-0.592212	two(2,2,2,2,2,2,2,2); //	-0.124939
-0.592212	WhateverFunction(i); //	-0.124939
-0.592212	0x7FFFFFFF; //	-0.124939
-0.592212	values: //	-0.124939
-0.592212	keyword: //	-0.124939
-0.592212	comparison: //	-0.124939
-0.592212	set: //	-0.124939
-0.592212	reciprocal: //	-0.124939
-0.592212	(WTL): //	-0.124939
-0.592212	counter: //	-0.124939
-0.592212	Serialize //	-0.124939
-0.592212	used: //	-0.124939
-0.592212	14.21. //	-0.124939
-0.592212	memcpy: //	-0.124939
-0.592212	asmlib.. //	-0.124939
-0.592212	0.0; //	-0.124939
-0.592212	&CriticalFunction_Dispatch; //	-0.124939
-0.592212	SSE3. //	-0.124939
-0.592212	lrint(d); //	-0.124939
-0.592212	denominator: //	-0.124939
-0.592212	two: //	-0.124939
-0.592212	have: //	-0.124939
-0.592212	memset: //	-0.124939
-0.592212	fprintf //	-0.124939
-0.592212	52; //	-0.124939
-0.592212	"instrset_detect.cpp" //	-0.124939
-0.592212	coordinates //	-0.124939
-0.592212	zero(0,0,0,0,0,0,0,0); //	-0.124939
-0.592212	address: //	-0.124939
-0.592212	fastest: //	-0.124939
-0.592212	alloca: //	-0.124939
-0.592212	//=DeltaY //	-0.124939
-0.592212	exponent: //	-0.124939
-0.592212	instead: //	-0.124939
-0.592212	conversions: //	-0.124939
-0.592212	matrix[c][r]. //	-0.124939
-0.592212	&SelectAddMul_SSE2; //	-0.124939
-0.592212	SelectAddMul_dispatch; //	-0.124939
-0.592212	classes: //	-0.124939
-0.592212	CriticalFunction_Dispatch; //	-0.124939
-0.592212	x.f; //	-0.124939
-0.592212	a: //	-0.124939
-0.592212	63; //	-0.124939
-0.592212	add_elements(s); //	-0.124939
-0.592212	integers: //	-0.124939
-0.592212	0x3F800000; //	-0.124939
-0.592212	e.g.: //	-0.124939
-0.592212	FactorialTable[n]; //	-0.124939
-0.592212	7.22. //	-0.124939
-0.592212	p->f(); //	-0.124939
-0.592212	search: //	-0.124939
-0.592212	x2; //	-0.124939
-0.592212	x^0/0! //	-0.124939
-0.592212	9.5b. //	-0.124939
-0.592212	false: //	-0.124939
-0.592212	square: //	-0.124939
-0.592212	square. //	-0.124939
-0.592212	*(__m64*)&source); //	-0.124939
-0.592212	interval: //	-0.124939
-0.592212	1.f); //	-0.124939
-0.592212	variables: //	-0.124939
-0.592212	xx4; //	-0.124939
-0.592212	15; //	-0.124939
-0.592212	154 //	-0.124939
-0.592212	statement: //	-0.124939
-0.592212	template: //	-0.124939
-0.592212	static_cast<float>(i); //	-0.124939
-0.592212	capability: //	-0.124939
-0.592212	__attribute__((aligned(64))); //	-0.124939
-0.592212	_mm_empty(); //	-0.124939
-0.592212	gives: //	-0.124939
-0.592212	seconds; //	-0.124939
-0.592212	1.2f; //	-0.124939
-0.592212	cc[]); //	-0.124939
-0.592212	116 //	-0.124939
-0.592212	110 //	-0.124939
-0.592212	11; //	-0.124939
-0.592212	endl; //	-0.124939
-0.592212	2.5; //	-0.124939
-1.427908	a =	-0.419338
-1.543213	x =	-0.124939
-2.086445	A =	-0.425969
-1.024093	size =	-0.505150
-0.873073	b =	-0.249877
-0.807606	i =	-0.753328
-1.836112	two =	-0.425969
-2.462744	clock =	-0.124939
-1.701559	4 =	-0.124939
-1.696613	8 =	-0.124939
-2.198144	32 =	-0.124939
-1.346670	0 =	-0.301030
-1.171062	constant =	-0.425969
-1.532333	result =	-0.425969
-0.628575	bytes =	-0.467361
-0.438523	c =	-0.329059
-0.092807	(i =	-0.985277
-0.186455	y =	-0.276206
-1.258038	zero =	-0.425969
-1.240848	n =	-0.124939
-1.240071	byte =	-0.425969
-1.168364	r =	-0.124939
-0.132411	a[i] =	-0.229674
-1.118734	C =	-0.425969
-0.845895	columns =	-0.124939
-0.688566	p =	-0.124939
-1.708712	b) =	-0.124939
-0.629531	temp =	-0.124939
-0.179247	d =	-0.221849
-0.515330	sum =	-0.346788
-1.646564	right =	-0.124939
-0.968792	true =	-0.124939
-0.971455	N =	-0.124939
-0.722459	rows =	-0.301030
-0.971455	level =	-0.425969
-0.936345	list[i] =	-0.124939
-0.485492	CriticalFunction =	-0.124939
-1.544448	seconds =	-0.124939
-0.486015	f =	-0.124939
-0.440258	*p =	-0.425969
-0.850541	factorial =	-0.425969
-1.504727	eax =	-0.124939
-0.796701	false =	-0.124939
-0.547121	c2 =	-0.301030
-0.798042	ecx =	-0.124939
-0.798042	j =	-0.425969
-0.273093	-1 =	-0.346788
-0.740050	xn =	-0.124939
-1.427093	Induction =	-0.124939
-0.674449	B =	-0.425969
-0.674449	edx =	-0.124939
-0.344512	bc =	-0.301030
-0.344512	SIZE =	-0.301030
-0.186035	(c =	-0.726999
-0.344512	-(-a) =	-0.301030
-1.268085	n! =	-0.124939
-0.345269	s =	-0.124939
-1.274477	Wednesday =	-0.124939
-1.274477	sum1 =	-0.124939
-0.344512	~a =	-0.301030
-0.597972	b[i] =	-0.124939
-0.089650	SelectAddMul_pointer =	-0.124939
-0.248359	z =	-0.602060
-1.177567	u.i =	-0.124939
-1.177567	size) =	-0.124939
-1.177567	sum2 =	-0.124939
-0.089650	(r =	-0.425969
-1.177567	N1 =	-0.124939
-0.501062	mask =	-0.425969
-1.177567	Y =	-0.124939
-0.124179	(set) =	-0.124939
-0.377482	!a =	-0.124939
-0.124179	a-a =	-0.602060
-0.124179	x*x*x*x*x*x*x*x =	-0.301030
-0.377482	list[i+1] =	-0.425969
-0.377482	x2 =	-0.425969
-0.377482	list[i+2] =	-0.425969
-1.059116	Z =	-0.124939
-1.065702	c[i] =	-0.124939
-1.059116	s3 =	-0.124939
-1.059116	s2 =	-0.124939
-0.377482	a+b =	-0.124939
-0.377482	a*b =	-0.425969
-0.124179	(x =	-0.301030
-1.059116	kb =	-0.124939
-0.202753	x4 =	-0.124939
-0.202753	a[i+1] =	-0.425969
-0.889611	nfac =	-0.124939
-0.202753	a*0 =	-0.425969
-0.202753	a*1 =	-0.425969
-0.889611	TILESIZE =	-0.124939
-0.889611	Saturday =	-0.124939
-0.202753	a+0 =	-0.425969
-0.202753	y1 =	-0.425969
-0.202753	y2 =	-0.425969
-0.202753	x.abc =	-0.124939
-0.889611	p2 =	-0.124939
-0.889611	p1 =	-0.124939
-0.889611	ArraySize =	-0.124939
-0.202753	aa[i] =	-0.124939
-0.202753	(c2 =	-0.124939
-0.889611	ABC =	-0.124939
-0.889611	s0 =	-0.124939
-0.889611	(a&&c) =	-0.124939
-0.889611	NumberOfTests =	-0.124939
-0.889611	min =	-0.124939
-0.889611	sizeof(S1) =	-0.124939
-0.202753	largest_index =	-0.124939
-0.889611	list[i].a =	-0.124939
-0.889611	matrix[j][0] =	-0.124939
-0.889611	s1 =	-0.124939
-0.202753	a*b+a*c =	-0.425969
-0.202753	(a&&b&&c) =	-0.124939
-0.889611	ARRAYSIZE =	-0.124939
-0.202753	^a =	-0.425969
-0.889611	Friday =	-0.124939
-0.202753	0/a =	-0.425969
-0.202753	(a&b)|(a&c) =	-0.425969
-0.889611	log2 =	-0.124939
-0.889611	(!a&&c) =	-0.124939
-0.202753	largest_abs =	-0.124939
-0.202753	x.a =	-0.124939
-0.202753	Table[x] =	-0.124939
-0.202753	x.c =	-0.124939
-0.202753	x.b =	-0.124939
-0.202753	a+b+c+d =	-0.425969
-0.202753	FactorialTable[13] =	-0.425969
-0.889611	Tuesday =	-0.124939
-0.202753	(r2 =	-0.124939
-0.889611	b+c =	-0.124939
-0.595268	{x =	-0.124939
-0.595268	a<<b<<c =	-0.124939
-0.595268	0+1.23456 =	-0.124939
-0.595268	timediff[i] =	-0.124939
-0.595268	Sunday =	-0.124939
-0.595268	OneOrTwo5[2] =	-0.124939
-0.595268	a<c) =	-0.124939
-0.595268	time1 =	-0.124939
-0.595268	a/1 =	-0.124939
-0.595268	x[1] =	-0.124939
-0.595268	list[j].a =	-0.124939
-0.595268	DontSkip =	-0.124939
-0.595268	Greek[4] =	-0.124939
-0.595268	A2 =	-0.124939
-0.595268	max =	-0.124939
-0.595268	x10 =	-0.124939
-0.595268	DynamicArray[i] =	-0.124939
-0.595268	polynomial(x) =	-0.124939
-0.595268	matrix[row][column] =	-0.124939
-0.595268	absvalue =	-0.124939
-0.595268	reciprocal_divisor =	-0.124939
-0.595268	(-a)*(-b) =	-0.124939
-0.595268	a+b+c =	-0.124939
-0.595268	coef[16] =	-0.124939
-0.595268	(a+b)+c =	-0.124939
-0.595268	NUMROWS =	-0.124939
-0.595268	(int)(&list[100]) =	-0.124939
-0.595268	(j =	-0.124939
-0.595268	a+a+a+a =	-0.124939
-0.595268	x8 =	-0.124939
-0.595268	(b&c) =	-0.124939
-0.595268	(a|b)&(a|c) =	-0.124939
-0.595268	-100+100+100 =	-0.124939
-0.595268	(b&&c) =	-0.124939
-0.595268	(c1 =	-0.124939
-0.595268	list[i].b =	-0.124939
-0.595268	andnot(a,a) =	-0.124939
-0.595268	lookup[2] =	-0.124939
-0.595268	x.d =	-0.124939
-0.595268	x.f =	-0.124939
-0.595268	!b =	-0.124939
-0.595268	a[1] =	-0.124939
-0.595268	(row =	-0.124939
-0.595268	(r1 =	-0.124939
-0.595268	arraysize =	-0.124939
-0.595268	(temp =	-0.124939
-0.595268	1024/4 =	-0.124939
-0.595268	a[0] =	-0.124939
-0.595268	(!a&&b) =	-0.124939
-0.595268	stride) =	-0.124939
-0.595268	ns =	-0.124939
-0.595268	ab[i].b =	-0.124939
-0.595268	sizeof(float)) =	-0.124939
-0.595268	temp->b =	-0.124939
-0.595268	temp->a =	-0.124939
-0.595268	list[] =	-0.124939
-0.595268	Thursday =	-0.124939
-0.595268	DynamicArray =	-0.124939
-0.595268	iset =	-0.124939
-0.595268	Monday =	-0.124939
-0.595268	a&b&c&d =	-0.124939
-0.595268	~b =	-0.124939
-0.595268	a[c][r] =	-0.124939
-0.595268	8*1024/64 =	-0.124939
-0.595268	IsPowerOf2 =	-0.124939
-0.595268	~(~a) =	-0.124939
-0.595268	x[0] =	-0.124939
-0.595268	NUMCOLUMNS =	-0.124939
-0.595268	^0 =	-0.124939
-0.595268	(column =	-0.124939
-0.595268	a.x =	-0.124939
-0.595268	a.y =	-0.124939
-0.595268	list[300] =	-0.124939
-0.595268	0x20 =	-0.124939
-2.582996	// or	-0.124939
-1.507244	function or	-0.271067
-2.512593	with or	-0.124939
-2.092692	code or	-0.124939
-2.489736	int or	-0.124939
-2.396706	this or	-0.124939
-2.141105	time or	-0.425969
-1.849769	memory or	-0.124939
-2.408497	data or	-0.124939
-1.827354	program or	-0.124939
-2.425245	vector or	-0.124939
-2.292842	same or	-0.124939
-1.665470	functions or	-0.249877
-2.403603	CPU or	-0.124939
-1.787252	loop or	-0.124939
-2.454242	used or	-0.124939
-1.181009	one or	-0.522879
-2.351353	cache or	-0.124939
-2.342915	set or	-0.124939
-1.235007	class or	-0.550907
-2.397104	compilers or	-0.124939
-2.341525	size or	-0.124939
-1.676568	Intel or	-0.124939
-0.735566	pointer or	-0.689210
-1.835374	library or	-0.124939
-1.136603	float or	-0.726999
-1.076552	two or	-0.778151
-1.828985	object or	-0.124939
-1.461931	static or	-0.249877
-1.803302	C++ or	-0.425969
-1.314985	array or	-0.221849
-2.299334	possible or	-0.124939
-1.393379	variable or	-0.425969
-2.244362	variables or	-0.124939
-2.140541	2 or	-0.124939
-2.186636	table or	-0.124939
-2.397596	order or	-0.124939
-1.743916	long or	-0.124939
-2.264053	32-bit or	-0.124939
-2.345075	member or	-0.124939
-1.493298	way or	-0.124939
-2.307615	const or	-0.124939
-2.154960	4 or	-0.124939
-2.198961	call or	-0.124939
-1.671646	8 or	-0.425969
-2.185886	64 or	-0.124939
-2.125967	optimization or	-0.124939
-2.137031	libraries or	-0.124939
-1.244298	pointers or	-0.124939
-2.109002	test or	-0.124939
-1.379776	new or	-0.124939
-2.107343	systems or	-0.124939
-2.114030	access or	-0.124939
-2.082567	16 or	-0.124939
-1.009865	SSE2 or	-0.425969
-2.185983	out or	-0.124939
-2.044065	system or	-0.124939
-0.847446	0 or	-0.550907
-2.250384	short or	-0.124939
-2.061433	instructions or	-0.124939
-2.112505	Gnu or	-0.124939
-2.105502	important or	-0.124939
-1.292875	CPUs or	-0.124939
-2.108378	assembly or	-0.124939
-1.503573	large or	-0.124939
-2.063795	arrays or	-0.124939
-2.019055	calculations or	-0.124939
-2.040775	bytes or	-0.124939
-0.992344	speed or	-0.124939
-2.007705	single or	-0.124939
-2.058404	AMD or	-0.124939
-2.062527	exception or	-0.124939
-1.068207	small or	-0.425969
-1.226126	overflow or	-0.124939
-1.463291	integers or	-0.124939
-1.446632	matrix or	-0.425969
-1.450399	AVX or	-0.425969
-1.437353	classes or	-0.124939
-1.057977	precision or	-0.124939
-1.439193	line or	-0.124939
-2.051936	manual or	-0.124939
-2.040037	advantageous or	-0.124939
-2.004535	container or	-0.124939
-2.007318	eight or	-0.124939
-1.915934	few or	-0.124939
-1.396527	list or	-0.124939
-0.503726	structure or	-0.564271
-1.986599	inline or	-0.124939
-1.902473	add or	-0.124939
-0.978238	mode or	-0.425969
-1.951829	values or	-0.124939
-1.930101	files or	-0.124939
-1.900138	problems or	-0.124939
-1.885541	space or	-0.124939
-1.892778	dispatching or	-0.124939
-1.862628	branches or	-0.124939
-1.848731	multiplication or	-0.124939
-1.853356	automatically or	-0.124939
-1.289714	expression or	-0.124939
-1.285811	members or	-0.124939
-1.843560	methods or	-0.124939
-0.762228	signed or	-0.522879
-1.850920	block or	-0.124939
-1.245259	zero or	-0.124939
-1.856051	Microsoft or	-0.124939
-1.260873	reference or	-0.124939
-1.822195	string or	-0.124939
-1.785988	three or	-0.124939
-1.814708	lookup or	-0.124939
-1.221102	types or	-0.124939
-1.780806	expressions or	-0.124939
-1.221102	read or	-0.425969
-0.958565	aligned or	-0.301030
-1.800999	declared or	-0.124939
-1.153494	process or	-0.124939
-1.778723	results or	-0.124939
-1.739007	constructor or	-0.124939
-1.715740	modules or	-0.124939
-1.107154	negative or	-0.124939
-1.722225	predicted or	-0.124939
-1.730418	loaded or	-0.124939
-1.714183	positive or	-0.124939
-0.709802	C or	-0.425969
-1.090784	off or	-0.124939
-1.704089	syntax or	-0.124939
-1.695896	index or	-0.124939
-1.684411	network or	-0.124939
-1.684411	slow or	-0.124939
-1.692925	functions, or	-0.124939
-1.662962	platforms or	-0.124939
-1.637904	task or	-0.124939
-1.637904	inlined or	-0.124939
-1.708233	repeat or	-0.124939
-1.630777	clear or	-0.124939
-0.959843	disk or	-0.124939
-0.967865	overloaded or	-0.124939
-1.587501	true or	-0.124939
-0.961834	little or	-0.425969
-1.575773	initialized or	-0.124939
-0.930077	SSE or	-0.124939
-0.930077	reading or	-0.425969
-0.928057	cores or	-0.425969
-0.930077	copied or	-0.124939
-1.566911	BSD or	-0.124939
-1.575773	program, or	-0.124939
-1.566911	loops or	-0.124939
-1.516834	templates or	-0.124939
-1.516834	buffer or	-0.124939
-1.516834	seconds or	-0.124939
-1.508320	compiler, or	-0.124939
-1.525518	module or	-0.124939
-0.884654	input or	-0.124939
-1.516834	row or	-0.124939
-1.534380	map or	-0.124939
-1.516348	3; or	-0.124939
-1.488623	writes or	-0.124939
-0.851100	brands or	-0.124939
-1.471076	brand or	-0.124939
-1.516348	*p or	-0.124939
-1.437470	12 or	-0.124939
-1.428608	prediction or	-0.124939
-1.437470	integer, or	-0.124939
-1.437470	once or	-0.124939
-0.793803	__restrict or	-0.124939
-1.428608	DLL or	-0.124939
-0.735811	delete or	-0.124939
-1.388525	C1 or	-0.124939
-1.397764	called, or	-0.124939
-1.379478	update or	-0.124939
-1.426721	slower or	-0.124939
-1.388525	polymorphism or	-0.124939
-1.397764	remove or	-0.124939
-1.349906	possible, or	-0.124939
-0.672951	reads or	-0.425969
-1.359775	scope or	-0.124939
-0.670903	reference, or	-0.124939
-1.321578	five or	-0.124939
-0.263860	Reading or	-0.726999
-1.330817	compilation or	-0.124939
-1.251635	__fastcall or	-0.124939
-1.251635	remote or	-0.124939
-1.261075	effects or	-0.124939
-1.251635	processes or	-0.124939
-1.280593	console or	-0.124939
-0.593770	created or	-0.425969
-1.261075	hundred or	-0.124939
-1.270725	command or	-0.124939
-1.261075	latency or	-0.124939
-1.261075	Wednesday or	-0.124939
-1.251635	key or	-0.124939
-1.251635	flag or	-0.124939
-1.183683	overlap or	-0.124939
-1.164165	needed, or	-0.124939
-1.164165	pre-increment or	-0.124939
-1.164165	move or	-0.124939
-1.164165	__declspec(align(16)) or	-0.124939
-1.173815	pattern or	-0.124939
-1.173815	bottleneck or	-0.124939
-0.498917	video or	-0.425969
-1.183683	50% or	-0.124939
-1.164165	(1) or	-0.124939
-1.164165	uncached or	-0.124939
-1.183683	incompatible or	-0.124939
-0.498917	simultaneously or	-0.124939
-0.247559	tree or	-0.301030
-1.164165	hyperthreading or	-0.124939
-1.048876	each, or	-0.124939
-1.048876	blocks, or	-0.124939
-1.048876	Two or	-0.124939
-1.048876	blocking or	-0.124939
-0.123778	(*.dll or	-0.301030
-1.058745	database, or	-0.124939
-1.048876	parameter, or	-0.124939
-1.058745	text or	-0.124939
-1.048876	correct or	-0.124939
-1.048876	-S or	-0.124939
-1.048876	(128 or	-0.124939
-0.376046	-O3 or	-0.124939
-0.123778	infinity or	-0.301030
-1.058745	Global or	-0.124939
-1.048876	Copying or	-0.124939
-1.048876	Replace or	-0.124939
-1.058745	interpreting or	-0.124939
-0.882653	card or	-0.124939
-0.882653	interpretation or	-0.124939
-0.882653	overlapping or	-0.124939
-0.882653	window or	-0.124939
-0.882653	(16 or	-0.124939
-0.882653	operator, or	-0.124939
-0.882653	for-loop or	-0.124939
-0.882653	stdint.h or	-0.124939
-0.882653	(32-bit or	-0.124939
-0.882653	number, or	-0.124939
-0.882653	locally or	-0.124939
-0.882653	search, or	-0.124939
-0.882653	uninitialized or	-0.124939
-0.882653	expression, or	-0.124939
-0.882653	Delays or	-0.124939
-0.882653	internet or	-0.124939
-0.882653	audio or	-0.124939
-0.882653	indices or	-0.124939
-0.882653	unstable or	-0.124939
-0.202032	*.a) or	-0.425969
-0.882653	searching, or	-0.124939
-0.882653	array, or	-0.124939
-0.882653	purpose, or	-0.124939
-0.882653	coprocessor or	-0.124939
-0.882653	recovering or	-0.124939
-0.882653	initialization, or	-0.124939
-0.882653	press or	-0.124939
-0.882653	keyboard or	-0.124939
-0.882653	52 or	-0.124939
-0.882653	delete, or	-0.124939
-0.591721	update, or	-0.124939
-0.591721	objconv or	-0.124939
-0.591721	C1::Disp() or	-0.124939
-0.591721	weakness or	-0.124939
-0.591721	x?" or	-0.124939
-0.591721	pixel or	-0.124939
-0.591721	__thread or	-0.124939
-0.591721	f(x) or	-0.124939
-0.591721	(.dll or	-0.124939
-0.591721	2.20 or	-0.124939
-0.591721	button or	-0.124939
-0.591721	imprecise or	-0.124939
-0.591721	polygon or	-0.124939
-0.591721	optimally, or	-0.124939
-0.591721	"static" or	-0.124939
-0.591721	game or	-0.124939
-0.591721	ger or	-0.124939
-0.591721	workday or	-0.124939
-0.591721	vectorize, or	-0.124939
-0.591721	misprediction, or	-0.124939
-0.591721	year or	-0.124939
-0.591721	First-In-First-Out or	-0.124939
-0.591721	GetTickCount or	-0.124939
-0.591721	(.lib or	-0.124939
-0.591721	frame" or	-0.124939
-0.591721	__declspec(noalias) or	-0.124939
-0.591721	reset or	-0.124939
-0.591721	piecewise or	-0.124939
-0.591721	creates or	-0.124939
-0.591721	violate or	-0.124939
-0.591721	VHDL or	-0.124939
-0.591721	inlined, or	-0.124939
-0.591721	/QaxAVX or	-0.124939
-0.591721	/O2 or	-0.124939
-0.591721	new/delete or	-0.124939
-0.591721	memset, or	-0.124939
-0.591721	2015 or	-0.124939
-0.591721	signed, or	-0.124939
-0.591721	(/FAs or	-0.124939
-0.591721	const, or	-0.124939
-0.591721	3"); or	-0.124939
-0.591721	(SDK or	-0.124939
-0.591721	Incrementing or	-0.124939
-0.591721	-fwrapv or	-0.124939
-0.591721	Quine–McCluskey or	-0.124939
-0.591721	printer or	-0.124939
-0.591721	AVX2, or	-0.124939
-0.591721	incremental or	-0.124939
-0.591721	criteria or	-0.124939
-0.591721	Pentium-II or	-0.124939
-0.591721	www.agner.org/optimize/testp.zip or	-0.124939
-0.591721	addresses, or	-0.124939
-0.591721	directly, or	-0.124939
-0.591721	range"); or	-0.124939
-0.591721	(XMM or	-0.124939
-0.591721	C0::f or	-0.124939
-0.591721	he or	-0.124939
-0.591721	wstring or	-0.124939
-0.591721	-Wstrict-overflow=2, or	-0.124939
-3.048606	is it	-0.124939
-3.000233	of it	-0.124939
-1.850771	and it	-0.329059
-1.158836	that it	-0.578579
-1.096921	if it	-0.726999
-1.988388	as it	-0.425969
-1.656080	than it	-0.191886
-0.892724	time it	-0.917330
-2.195722	use it	-0.425969
-1.162809	when it	-0.287666
-0.718511	then it	-0.881901
-1.520073	make it	-0.301030
-0.687197	because it	-0.402488
-2.576359	CPU it	-0.124939
-1.820855	If it	-0.602060
-2.031788	which it	-0.124939
-0.809788	but it	-0.425969
-2.509672	one it	-0.124939
-2.505074	set it	-0.124939
-2.503313	do it	-0.124939
-2.499365	pointer it	-0.124939
-2.434271	object it	-0.124939
-0.997629	where it	-0.346788
-2.442528	value it	-0.124939
-2.404076	so it	-0.124939
-0.894165	makes it	-0.346788
-0.937059	before it	-0.380211
-1.475939	call it	-0.124939
-1.719913	example, it	-0.425969
-2.254105	optimization it	-0.124939
-2.260988	libraries it	-0.124939
-1.390629	sure it	-0.301030
-2.224326	access it	-0.124939
-1.349977	case it	-0.124939
-0.915667	cases it	-0.492916
-2.192619	making it	-0.124939
-1.575654	want it	-0.124939
-2.191476	important it	-0.124939
-2.131669	while it	-0.124939
-2.140421	work it	-0.124939
-0.999388	But it	-0.346788
-2.089634	therefore it	-0.124939
-0.808316	whether it	-0.903090
-1.405356	calculate it	-0.124939
-2.023340	store it	-0.124939
-2.017655	well it	-0.124939
-1.347368	write it	-0.124939
-2.001137	However, it	-0.124939
-1.949607	was it	-0.124939
-1.336824	cases, it	-0.425969
-1.953587	microprocessor it	-0.124939
-1.323871	replace it	-0.124939
-0.419809	Therefore, it	-0.467361
-1.937196	allows it	-0.124939
-1.270822	what it	-0.124939
-1.875922	applications it	-0.124939
-1.235882	after it	-0.425969
-1.860959	give it	-0.124939
-1.843998	control it	-0.124939
-1.834662	Here, it	-0.124939
-1.790758	around it	-0.124939
-1.782492	turn it	-0.124939
-1.727644	fact it	-0.124939
-1.702037	sometimes it	-0.124939
-1.702037	prevent it	-0.124939
-1.719913	prevents it	-0.124939
-1.719913	tell it	-0.124939
-1.674253	time, it	-0.124939
-1.665496	copying it	-0.124939
-1.674253	accessing it	-0.124939
-1.692315	divide it	-0.124939
-1.639491	iteration it	-0.124939
-0.975507	though it	-0.124939
-0.974549	convert it	-0.425969
-1.610639	consider it	-0.124939
-0.937719	program, it	-0.425969
-1.597302	why it	-0.124939
-0.645121	whenever it	-0.124939
-1.564755	used, it	-0.124939
-1.569247	Obviously, it	-0.124939
-1.564755	features it	-0.124939
-1.523489	multiply it	-0.124939
-1.523489	Here it	-0.124939
-1.418883	delete it	-0.124939
-1.414345	Likewise, it	-0.124939
-0.490219	called, it	-0.124939
-1.418883	Now it	-0.124939
-1.356523	declare it	-0.124939
-1.351936	giving it	-0.124939
-1.351936	above, it	-0.124939
-1.351936	comparing it	-0.124939
-1.361158	general, it	-0.124939
-1.356523	C++, it	-0.124939
-1.361158	anything it	-0.124939
-1.281977	event it	-0.124939
-1.277342	hand, it	-0.124939
-1.281977	created it	-0.124939
-1.277342	Fortunately, it	-0.124939
-1.277342	unfortunately it	-0.124939
-0.598182	And it	-0.425969
-1.281977	platforms, it	-0.124939
-1.277342	compare it	-0.124939
-1.277342	languages, it	-0.124939
-1.185067	Furthermore, it	-0.124939
-1.189753	declaring it	-0.124939
-1.185067	class, it	-0.124939
-1.189753	disable it	-0.124939
-1.189753	words, it	-0.124939
-1.064814	optimization, it	-0.124939
-0.378268	Sometimes it	-0.124939
-1.069550	Today, it	-0.124939
-1.064814	arrays, it	-0.124939
-1.064814	variable, it	-0.124939
-0.203147	Often, it	-0.124939
-0.893459	it, it	-0.124939
-0.893459	worse, it	-0.124939
-0.893459	Nevertheless, it	-0.124939
-0.893459	software, it	-0.124939
-0.893459	algebra, it	-0.124939
-0.203147	projects, it	-0.425969
-0.893459	although it	-0.124939
-0.893459	Or it	-0.124939
-0.893459	method, it	-0.124939
-0.893459	accessed, it	-0.124939
-0.597218	recognizes it	-0.124939
-0.597218	see, it	-0.124939
-0.597218	Usually it	-0.124939
-0.597218	referencing it	-0.124939
-0.597218	performance, it	-0.124939
-0.597218	redirects it	-0.124939
-0.597218	(b*c)/d, it	-0.124939
-0.597218	least, it	-0.124939
-0.597218	Typically it	-0.124939
-0.597218	Hence, it	-0.124939
-0.597218	design, it	-0.124939
-0.597218	bottleneck, it	-0.124939
-0.597218	project, it	-0.124939
-0.597218	habit, it	-0.124939
-0.597218	all, it	-0.124939
-0.597218	XOR'ing it	-0.124939
-0.597218	AND'ing it	-0.124939
-0.597218	importantly, it	-0.124939
-0.597218	reflecting it	-0.124939
-0.597218	these, it	-0.124939
-0.597218	nature, it	-0.124939
-1.585756	the function	-0.416825
-1.413165	a function	-0.466086
-2.198622	of function	-0.346788
-3.043882	to function	-0.124939
-2.244630	and function	-0.301030
-1.933777	The function	-0.170696
-2.600788	for function	-0.124939
-2.159599	// function	-0.425969
-2.221984	or function	-0.124939
-2.804146	on function	-0.124939
-2.129099	as function	-0.124939
-1.785114	This function	-0.301030
-1.737200	this function	-0.124939
-1.671971	A function	-0.346788
-2.617290	vector function	-0.124939
-2.101902	make function	-0.124939
-2.064936	different function	-0.124939
-1.688559	same function	-0.124939
-2.480950	other function	-0.124939
-2.558075	which function	-0.124939
-1.779581	one function	-0.124939
-2.490160	no function	-0.124939
-1.379450	each function	-0.204120
-2.513452	do function	-0.124939
-2.519357	most function	-0.124939
-2.472717	using function	-0.124939
-1.551428	Intel function	-0.249877
-1.298058	library function	-0.124939
-2.428391	multiple function	-0.124939
-1.587818	many function	-0.124939
-1.802490	any function	-0.124939
-2.350408	between function	-0.124939
-0.629769	member function	-0.362300
-2.392169	const function	-0.124939
-1.745893	makes function	-0.425969
-0.987625	critical function	-0.182931
-1.674995	template function	-0.124939
-1.578109	simple function	-0.124939
-2.200388	Gnu function	-0.124939
-2.153906	about function	-0.124939
-2.165600	extra function	-0.124939
-2.119521	Use function	-0.124939
-2.126970	best function	-0.124939
-1.257575	single function	-0.124939
-1.498904	These function	-0.425969
-0.704724	virtual function	-0.234083
-1.516405	through function	-0.124939
-2.095914	common function	-0.124939
-2.113208	thread function	-0.124939
-2.181861	power function	-0.124939
-1.461484	optimized function	-0.124939
-2.064273	128 function	-0.124939
-2.035802	four function	-0.124939
-0.920983	another function	-0.124939
-1.158784	inline function	-0.124939
-1.394748	every function	-0.124939
-1.388593	standard function	-0.124939
-1.292885	intrinsic function	-0.124939
-1.284890	separate function	-0.124939
-1.914723	various function	-0.124939
-0.844606	dispatcher function	-0.124939
-1.860658	Each function	-0.124939
-0.962490	graphics function	-0.124939
-1.848981	Many function	-0.124939
-1.872659	linked function	-0.124939
-1.192403	calling function	-0.124939
-1.827791	own function	-0.124939
-1.813900	appropriate function	-0.124939
-1.784963	C function	-0.124939
-1.722493	desired function	-0.124939
-1.714530	No function	-0.124939
-1.041073	math function	-0.124939
-1.038391	inlined function	-0.124939
-0.515595	frame function	-0.124939
-0.790911	Assume function	-0.124939
-1.668458	right function	-0.124939
-1.009786	Define function	-0.124939
-1.641962	Another function	-0.124939
-1.650389	overloaded function	-0.124939
-1.629621	Any function	-0.124939
-1.588568	length function	-0.124939
-0.896742	linear function	-0.425969
-0.895842	define function	-0.124939
-1.529726	latter function	-0.124939
-1.521216	time-consuming function	-0.124939
-0.324512	pure function	-0.221849
-0.852789	factorial function	-0.124939
-0.850984	across function	-0.425969
-1.482891	memcpy function	-0.124939
-0.273628	detection function	-0.124939
-1.474298	polymorphic function	-0.124939
-1.487253	Virtual function	-0.124939
-1.470064	general function	-0.124939
-1.416306	Even function	-0.124939
-1.416306	storing function	-0.124939
-1.424899	transpose function	-0.124939
-0.679418	Intrinsic function	-0.124939
-0.265566	dispatched function	-0.124939
-1.353634	Several function	-0.124939
-1.362314	Set function	-0.124939
-0.265215	leaf function	-0.425969
-1.278771	Critical function	-0.124939
-0.598421	pow function	-0.124939
-1.283133	increasing function	-0.124939
-0.599328	strlen function	-0.124939
-1.195079	asmlib function	-0.124939
-1.186223	round function	-0.124939
-1.190628	lrint function	-0.124939
-1.190628	Fastcall function	-0.124939
-0.502418	exp function	-0.425969
-1.065690	53 function	-0.124939
-1.065690	API function	-0.124939
-1.065690	counters, function	-0.124939
-1.070140	exponential function	-0.124939
-1.065690	up-to-date function	-0.124939
-1.065690	Simple function	-0.124939
-1.070140	InstructionSet() function	-0.124939
-0.894049	indirect function	-0.124939
-0.894049	61 function	-0.124939
-0.894049	distribute function	-0.124939
-0.894049	Fast function	-0.124939
-0.203207	mangled function	-0.425969
-0.894049	Optimize function	-0.124939
-0.894049	thread-safe function	-0.124939
-0.597516	user-defined function	-0.124939
-0.597516	nested function	-0.124939
-0.597516	friend function	-0.124939
-0.597516	Branch/loop function	-0.124939
-0.597516	error-handling function	-0.124939
-0.597516	staircase function	-0.124939
-0.597516	Optimized function	-0.124939
-0.597516	instrset_detect function	-0.124939
-0.597516	std::unexpected() function	-0.124939
-0.597516	DelayFiveSeconds function	-0.124939
-0.597516	sin function	-0.124939
-0.597516	from), function	-0.124939
-2.745384	the if	-0.124939
-2.448405	and if	-0.124939
-2.459088	The if	-0.124939
-2.257892	that if	-0.124939
-2.388412	// if	-0.425969
-1.430118	or if	-0.425969
-2.371810	function if	-0.124939
-2.110986	code if	-0.124939
-2.239571	as if	-0.124939
-1.912716	not if	-0.124939
-2.548649	int if	-0.124939
-2.012465	than if	-0.301030
-2.620107	compiler if	-0.124939
-1.430347	{ if	-0.425969
-2.161714	time if	-0.124939
-1.910273	} if	-0.124939
-1.863511	memory if	-0.124939
-2.436954	program if	-0.124939
-2.030779	functions if	-0.124939
-1.286871	only if	-0.301030
-2.607359	instruction if	-0.124939
-2.428628	point if	-0.124939
-1.326170	loop if	-0.182931
-2.467129	but if	-0.124939
-1.524809	used if	-0.221849
-2.378395	one if	-0.124939
-2.397896	cache if	-0.124939
-1.725532	integer if	-0.124939
-1.933038	set if	-0.425969
-1.882370	example if	-0.425969
-2.319312	double if	-0.124939
-2.383776	size if	-0.124939
-2.400696	pointer if	-0.124939
-1.900819	b if	-0.425969
-2.321773	library if	-0.124939
-2.331869	static if	-0.124939
-1.342823	efficient if	-0.346788
-1.582871	possible if	-0.301030
-2.356303	version if	-0.124939
-2.330620	objects if	-0.124939
-2.241151	variable if	-0.124939
-2.280555	variables if	-0.124939
-1.170692	2 if	-0.425969
-2.227206	table if	-0.124939
-1.753139	performance if	-0.425969
-1.739797	branch if	-0.124939
-2.318980	way if	-0.124939
-0.899117	faster if	-0.647817
-2.232283	call if	-0.124939
-0.920817	example, if	-0.301030
-2.276531	unsigned if	-0.124939
-2.179261	register if	-0.124939
-1.399749	pointers if	-0.301030
-2.142430	systems if	-0.124939
-2.179759	user if	-0.124939
-1.635297	useful if	-0.425969
-0.739886	even if	-0.467361
-2.185870	method if	-0.124939
-2.208863	out if	-0.124939
-2.082316	system if	-0.124939
-2.153686	0 if	-0.124939
-2.136193	case if	-0.124939
-1.568630	available if	-0.124939
-2.161129	up if	-0.124939
-2.088106	error if	-0.124939
-2.130989	important if	-0.124939
-1.535370	CPUs if	-0.124939
-2.049943	while if	-0.124939
-2.090895	arrays if	-0.124939
-2.062119	Windows if	-0.124939
-2.121713	result if	-0.124939
-2.047980	best if	-0.124939
-1.504741	necessary if	-0.124939
-2.065241	element if	-0.124939
-0.905113	But if	-0.301030
-2.083217	speed if	-0.124939
-2.111348	i; if	-0.124939
-1.244847	thread if	-0.124939
-2.051808	integers if	-0.124939
-2.039742	option if	-0.124939
-2.057969	good if	-0.124939
-2.005576	precision if	-0.124939
-1.999972	line if	-0.124939
-2.011556	optimized if	-0.124939
-1.434082	}; if	-0.124939
-1.432446	b; if	-0.425969
-0.692979	check if	-0.182931
-0.921321	advantageous if	-0.522879
-1.416958	problem if	-0.425969
-2.089213	advantage if	-0.124939
-1.942832	1 if	-0.124939
-1.376422	mode if	-0.124939
-1.972829	values if	-0.124939
-1.365875	well if	-0.124939
-1.346782	cycles if	-0.425969
-1.353265	... if	-0.425969
-2.045752	recommended if	-0.124939
-1.338542	fast if	-0.124939
-1.957231	However, if	-0.124939
-1.890831	programs if	-0.124939
-1.321672	problems if	-0.425969
-1.090176	else if	-0.301030
-1.887273	application if	-0.124939
-1.877132	automatically if	-0.124939
-0.893834	see if	-0.124939
-1.294395	implementation if	-0.124939
-1.864722	complicated if	-0.124939
-1.866440	methods if	-0.124939
-1.272999	disadvantage if	-0.124939
-1.842370	zero if	-0.124939
-1.881884	what if	-0.124939
-1.895897	reference if	-0.124939
-1.836663	lookup if	-0.124939
-1.836663	runtime if	-0.124939
-1.242798	needed if	-0.425969
-1.809653	together if	-0.124939
-1.831196	bigger if	-0.124939
-1.782497	vectors if	-0.124939
-1.808920	know if	-0.124939
-1.794439	results if	-0.124939
-1.756946	function, if	-0.124939
-1.756946	operands if	-0.124939
-1.736739	modules if	-0.124939
-1.756946	smaller if	-0.124939
-1.778138	here if	-0.124939
-1.718547	section if	-0.124939
-1.116503	contentions if	-0.425969
-1.739072	predicted if	-0.124939
-1.746135	C if	-0.124939
-1.118176	global if	-0.124939
-1.705793	statement if	-0.124939
-1.091847	errors if	-0.124939
-1.063819	inefficient if	-0.124939
-1.691777	checking if	-0.124939
-1.676294	platforms if	-0.124939
-1.035536	vectorized if	-0.124939
-1.698962	costs if	-0.124939
-1.654751	inlined if	-0.124939
-1.676294	d; if	-0.124939
-1.676294	destructor if	-0.124939
-0.628931	safe if	-0.249877
-1.661814	further if	-0.124939
-1.654751	algorithm if	-0.124939
-1.674603	exponent if	-0.124939
-1.594867	disk if	-0.124939
-1.639841	obtained if	-0.124939
-0.562645	efficiently if	-0.124939
-1.609347	models if	-0.124939
-0.564633	fail if	-0.249877
-1.586538	occur if	-0.124939
-1.586538	target if	-0.124939
-0.526181	especially if	-0.425969
-1.564258	updates if	-0.124939
-0.524856	consider if	-0.124939
-1.586538	directly if	-0.124939
-1.578984	message if	-0.124939
-1.564258	parallel if	-0.124939
-1.610023	easier if	-0.124939
-0.930801	loops if	-0.124939
-0.894488	u; if	-0.124939
-1.522866	significant if	-0.124939
-0.889408	invalid if	-0.425969
-1.560660	organized if	-0.124939
-1.568630	gain if	-0.124939
-0.892788	happen if	-0.425969
-1.576750	matter if	-0.124939
-1.484408	style if	-0.124939
-1.514903	help if	-0.124939
-1.522873	explanation if	-0.124939
-1.514903	pure if	-0.124939
-1.484408	(or if	-0.124939
-1.522873	cycle if	-0.124939
-1.448235	frequent if	-0.124939
-1.455923	label if	-0.124939
-1.440681	variables, if	-0.124939
-1.448235	however, if	-0.124939
-1.455923	devices if	-0.124939
-1.448235	explicitly if	-0.124939
-1.455923	tables if	-0.124939
-1.405758	debugging if	-0.124939
-1.390243	Likewise, if	-0.124939
-1.390243	expensive if	-0.124939
-1.390243	compile-time if	-0.124939
-1.338811	compact if	-0.124939
-1.330985	course, if	-0.124939
-1.330985	complex if	-0.124939
-1.330985	detect if	-0.124939
-1.338811	Test if	-0.124939
-1.338811	costly if	-0.124939
-1.330985	poor if	-0.124939
-1.330985	happens if	-0.124939
-0.672640	evaluated if	-0.425969
-1.338811	permissible if	-0.124939
-1.330985	(i.e. if	-0.124939
-1.259630	hand, if	-0.124939
-1.267600	i.e. if	-0.124939
-1.259630	separately if	-0.124939
-1.267600	true, if	-0.124939
-1.170690	modification if	-0.124939
-0.247951	eliminated if	-0.602060
-1.170690	selected if	-0.124939
-1.170690	resolution if	-0.124939
-0.376750	Faster if	-0.425969
-1.053871	anyway if	-0.124939
-1.062145	7.8 if	-0.124939
-1.053871	space, if	-0.124939
-1.053871	delays if	-0.124939
-1.053871	longjmp if	-0.124939
-1.053871	questions if	-0.124939
-1.053871	(STL) if	-0.124939
-1.062145	Check if	-0.124939
-1.053871	determine if	-0.124939
-1.053871	mispredictions if	-0.124939
-0.886054	accumulators if	-0.124939
-0.886054	aliasing" if	-0.124939
-0.886054	479001600}; if	-0.124939
-0.886054	constructor, if	-0.124939
-0.886054	v.f if	-0.124939
-0.202386	branches): if	-0.425969
-0.202386	v; if	-0.425969
-0.886054	cheap if	-0.124939
-0.202386	Day; if	-0.425969
-0.886054	consumer if	-0.124939
-0.886054	avoided, if	-0.124939
-0.886054	subroutine if	-0.124939
-0.886054	leaks if	-0.124939
-0.593458	14.5b if	-0.124939
-0.593458	N-1)==0 if	-0.124939
-0.593458	WriteFile if	-0.124939
-0.593458	143 if	-0.124939
-0.593458	Number) if	-0.124939
-0.593458	calls, if	-0.124939
-0.593458	14.4b if	-0.124939
-0.593458	zero-bits if	-0.124939
-0.593458	14.15b if	-0.124939
-0.593458	(approximately): if	-0.124939
-0.593458	Or, if	-0.124939
-0.593458	inexact if	-0.124939
-0.593458	modified, if	-0.124939
-0.593458	adjusted if	-0.124939
-0.593458	8.10a if	-0.124939
-0.593458	runtime, if	-0.124939
-0.593458	occur: if	-0.124939
-0.593458	(YMM) if	-0.124939
-0.593458	destructor, if	-0.124939
-0.593458	sign-bit if	-0.124939
-0.593458	ignored if	-0.124939
-0.593458	normalized, if	-0.124939
-0.593458	uninitialized, if	-0.124939
-0.593458	(XMM) if	-0.124939
-0.593458	__restrict__, if	-0.124939
-0.593458	minute if	-0.124939
-0.593458	14.15a if	-0.124939
-0.593458	list[ARRAYSIZE]; if	-0.124939
-0.593458	panic if	-0.124939
-0.593458	reversed if	-0.124939
-0.593458	linking" if	-0.124939
-0.593458	minimized if	-0.124939
-0.593458	alias, if	-0.124939
-2.910503	is by	-0.124939
-2.717485	to by	-0.124939
-2.324307	and by	-0.124939
-1.776250	or by	-0.124939
-2.456953	it by	-0.124939
-2.209387	function by	-0.301030
-2.128981	code by	-0.124939
-2.548801	not by	-0.124939
-1.533285	than by	-0.234083
-2.677810	compiler by	-0.124939
-2.874367	x by	-0.124939
-1.327732	this by	-0.301030
-2.150484	more by	-0.124939
-2.519846	memory by	-0.124939
-2.066161	program by	-0.124939
-2.522599	functions by	-0.124939
-2.501828	CPU by	-0.124939
-1.143716	loop by	-0.301030
-1.249970	used by	-0.124939
-1.761682	one by	-0.301030
-2.546057	should by	-0.124939
-2.435603	set by	-0.124939
-2.387652	class by	-0.124939
-2.369769	double by	-0.124939
-2.427796	size by	-0.124939
-1.649206	i by	-0.301030
-2.373652	object by	-0.124939
-2.478642	number by	-0.124939
-2.460505	clock by	-0.124939
-2.399095	value by	-0.124939
-2.285630	variable by	-0.124939
-2.317708	variables by	-0.124939
-1.743522	2 by	-0.124939
-2.269300	table by	-0.124939
-1.529120	performance by	-0.124939
-1.514196	branch by	-0.602060
-2.388812	member by	-0.124939
-2.344479	way by	-0.124939
-2.365949	faster by	-0.124939
-2.365196	stored by	-0.124939
-1.475050	called by	-0.301030
-1.724114	address by	-0.124939
-2.266252	call by	-0.124939
-1.162392	optimization by	-0.221849
-1.639803	registers by	-0.124939
-2.178352	systems by	-0.124939
-2.178689	access by	-0.124939
-1.374376	out by	-0.124939
-2.134624	file by	-0.124939
-2.160913	type by	-0.124939
-2.120697	error by	-0.124939
-1.303853	accessed by	-0.301030
-1.525185	arrays by	-0.124939
-2.093266	Windows by	-0.124939
-2.123254	execution by	-0.124939
-2.144462	result by	-0.124939
-2.098241	bytes by	-0.124939
-1.265223	speed by	-0.602060
-2.077307	overflow by	-0.124939
-0.671224	done by	-0.234083
-2.035975	precision by	-0.124939
-1.456558	line by	-0.124939
-2.049825	works by	-0.124939
-2.039597	optimized by	-0.124939
-0.841086	calculated by	-0.204120
-2.021200	uses by	-0.124939
-2.006124	another by	-0.124939
-2.075606	advantageous by	-0.124939
-1.445589	implemented by	-0.124939
-1.178082	problem by	-0.124939
-0.441914	supported by	-0.460731
-1.970047	1 by	-0.124939
-1.993616	values by	-0.124939
-0.571310	simply by	-0.124939
-1.957637	counter by	-0.124939
-1.324385	space by	-0.124939
-0.907772	multiplication by	-0.124939
-1.900823	automatically by	-0.124939
-1.257438	zero by	-0.124939
-0.516501	division by	-0.726999
-1.881032	needed by	-0.124939
-1.897975	transferred by	-0.124939
-0.598733	aligned by	-0.204120
-1.814576	dispatch by	-0.124939
-1.832083	declared by	-0.124939
-1.814576	second by	-0.124939
-1.875907	piece by	-0.124939
-0.016508	divisible by	-0.580871
-1.786326	just by	-0.124939
-1.140502	smaller by	-0.124939
-1.761502	core by	-0.124939
-1.749753	5 by	-0.124939
-0.107836	replaced by	-0.316824
-1.749753	negative by	-0.124939
-1.738312	section by	-0.124939
-1.755588	predicted by	-0.124939
-1.729259	conversions by	-0.124939
-1.753416	off by	-0.124939
-1.729259	index by	-0.124939
-0.286181	avoided by	-0.176091
-1.707145	fact by	-0.124939
-0.785458	limited by	-0.301030
-1.671267	inlined by	-0.124939
-1.665432	database by	-0.124939
-1.689258	destructor by	-0.124939
-1.659674	save by	-0.124939
-1.677182	further by	-0.124939
-1.708027	efficiency by	-0.124939
-1.701680	unroll by	-0.124939
-1.644997	alignment by	-0.124939
-1.639082	macro by	-0.124939
-1.010077	divide by	-0.124939
-0.564609	obtained by	-0.124939
-1.634733	efficiently by	-0.124939
-1.628478	changed by	-0.124939
-1.622311	square by	-0.124939
-1.610235	structures by	-0.124939
-1.596945	initialized by	-0.124939
-0.180088	improved by	-0.249877
-1.590689	either by	-0.124939
-1.596945	copied by	-0.124939
-0.932040	align by	-0.124939
-1.590689	loops by	-0.124939
-1.549296	module by	-0.124939
-0.644712	gain by	-0.124939
-1.543130	row by	-0.124939
-0.596640	multiply by	-0.124939
-0.796471	binding by	-0.425969
-1.491346	converted by	-0.124939
-1.471430	additions by	-0.124939
-1.471430	designed by	-0.124939
-1.464989	j by	-0.124939
-0.796471	explicitly by	-0.124939
-0.800606	multiplying by	-0.124939
-1.458642	jump by	-0.124939
-0.273054	determined by	-0.346788
-1.464989	misses by	-0.124939
-1.400650	switches by	-0.124939
-1.413438	manuals by	-0.124939
-1.346491	compact by	-0.124939
-1.353029	comparisons by	-0.124939
-1.346491	step by	-0.124939
-1.353029	anything by	-0.124939
-1.273847	inheritance by	-0.124939
-1.273847	overcome by	-0.124939
-0.186009	generated by	-0.726999
-1.273847	created by	-0.124939
-1.267310	pointers, by	-0.124939
-1.267310	increased by	-0.124939
-0.344456	Division by	-0.602060
-1.267310	guidelines by	-0.124939
-1.273847	combined by	-0.124939
-1.183575	Multiply by	-0.124939
-1.183575	not, by	-0.124939
-0.500963	2n by	-0.124939
-1.183575	prevented by	-0.124939
-1.190316	illustrated by	-0.124939
-1.183575	returned by	-0.124939
-1.176937	8.26a by	-0.124939
-0.089638	identified by	-0.249877
-0.089638	multiplied by	-0.425969
-0.248322	modified by	-0.602060
-1.176937	manner by	-0.124939
-1.176937	hyperthreading by	-0.124939
-1.058636	deleted by	-0.124939
-1.058636	hidden by	-0.124939
-1.058636	zero, by	-0.124939
-1.058636	manually by	-0.124939
-0.377415	spaced by	-0.425969
-0.377415	separated by	-0.124939
-0.377415	solved by	-0.425969
-0.124160	Divide by	-0.602060
-1.065377	ms by	-0.124939
-1.058636	mispredictions by	-0.124939
-1.058636	necessary, by	-0.124939
-0.889286	thrown by	-0.124939
-0.889286	bypassed by	-0.124939
-0.889286	Codes", by	-0.124939
-0.889286	ArraySize by	-0.124939
-0.889286	everywhere by	-0.124939
-0.889286	Align by	-0.124939
-0.889286	dramatically by	-0.124939
-0.889286	dividing by	-0.124939
-0.889286	relocated by	-0.124939
-0.889286	received by	-0.124939
-0.889286	grows by	-0.124939
-0.889286	segment by	-0.124939
-0.889286	bitfield by	-0.124939
-0.595103	Parallelization by	-0.124939
-0.595103	2.0) by	-0.124939
-0.595103	investigated by	-0.124939
-0.595103	copyrighted by	-0.124939
-0.595103	Modulo by	-0.124939
-0.595103	doubles by	-0.124939
-0.595103	caught by	-0.124939
-0.595103	u[1] by	-0.124939
-0.595103	zation by	-0.124939
-0.595103	affected by	-0.124939
-0.595103	Multiplying by	-0.124939
-0.595103	indicated by	-0.124939
-0.595103	mitigated by	-0.124939
-0.595103	published by	-0.124939
-0.595103	ameliorated by	-0.124939
-0.595103	followed by	-0.124939
-0.595103	caused by	-0.124939
-0.595103	influenced by	-0.124939
-0.595103	accomplished by	-0.124939
-0.595103	activated by	-0.124939
-0.595103	frustrated by	-0.124939
-2.148722	or with	-0.124939
-2.111677	it with	-0.124939
-2.069054	function with	-0.425969
-2.653378	on with	-0.124939
-1.885307	code with	-0.221849
-2.218790	not with	-0.124939
-2.209711	than with	-0.124939
-1.904721	compiler with	-0.124939
-2.139096	this with	-0.124939
-2.490978	memory with	-0.124939
-2.486859	data with	-0.124939
-2.055033	program with	-0.124939
-2.497206	functions with	-0.124939
-2.484726	only with	-0.124939
-1.808602	CPU with	-0.124939
-2.338858	other with	-0.124939
-1.657437	loop with	-0.249877
-2.012373	used with	-0.425969
-1.944391	integer with	-0.124939
-1.709736	class with	-0.301030
-2.414049	do with	-0.124939
-1.892932	example with	-0.124939
-2.405228	size with	-0.124939
-1.675765	b with	-0.301030
-1.377623	library with	-0.346788
-2.455942	i with	-0.124939
-2.352095	object with	-0.124939
-1.581130	array with	-0.124939
-1.820746	version with	-0.124939
-2.348209	objects with	-0.124939
-2.262822	variable with	-0.124939
-2.198379	return with	-0.124939
-2.247743	table with	-0.124939
-2.213335	software with	-0.124939
-1.729023	elements with	-0.124939
-2.354921	faster with	-0.124939
-2.355201	stored with	-0.124939
-1.701767	called with	-0.124939
-2.212494	4 with	-0.124939
-2.248935	call with	-0.124939
-2.190566	libraries with	-0.124939
-2.208180	template with	-0.124939
-1.238332	systems with	-0.249877
-2.199265	method with	-0.124939
-2.220088	out with	-0.124939
-2.101599	system with	-0.124939
-2.180489	32 with	-0.124939
-1.587943	bits with	-0.425969
-1.353980	operations with	-0.124939
-2.145448	type with	-0.124939
-2.150756	case with	-0.124939
-1.069591	processors with	-0.124939
-2.167008	available with	-0.124939
-1.320556	constant with	-0.301030
-1.330523	up with	-0.602060
-1.555709	times with	-0.124939
-1.301974	accessed with	-0.124939
-0.943058	CPUs with	-0.204120
-1.281418	arrays with	-0.124939
-1.278096	work with	-0.124939
-2.067112	calculations with	-0.124939
-2.144949	versions with	-0.124939
-2.083579	processor with	-0.124939
-0.841647	compiled with	-0.271067
-0.904429	threads with	-0.204120
-2.052488	language with	-0.124939
-2.047450	thread with	-0.124939
-1.251965	compile with	-0.301030
-1.232774	allocated with	-0.301030
-2.064371	overflow with	-0.124939
-1.231936	integers with	-0.124939
-2.047722	Linux with	-0.124939
-2.010085	classes with	-0.124939
-0.670821	done with	-0.301030
-2.015266	line with	-0.124939
-2.036441	works with	-0.124939
-1.439513	calculated with	-0.124939
-1.038422	implemented with	-0.124939
-0.907446	problem with	-0.124939
-2.054467	known with	-0.124939
-2.041937	Function with	-0.124939
-2.005322	list with	-0.124939
-1.394225	run with	-0.124939
-1.977051	well with	-0.124939
-1.987520	addresses with	-0.124939
-1.945160	counter with	-0.124939
-1.942124	allocation with	-0.124939
-1.966487	However, with	-0.124939
-1.903815	programs with	-0.124939
-1.932848	problems with	-0.124939
-1.926884	dispatching with	-0.124939
-1.307075	microprocessor with	-0.425969
-1.961264	preferably with	-0.124939
-1.899402	application with	-0.124939
-1.919062	expression with	-0.124939
-1.906710	members with	-0.124939
-1.877666	methods with	-0.124939
-1.914742	signed with	-0.124939
-1.871783	model with	-0.124939
-1.884300	division with	-0.124939
-1.841454	n with	-0.124939
-1.898327	end with	-0.124939
-1.234501	applications with	-0.124939
-1.241884	addition with	-0.124939
-1.859225	types with	-0.124939
-1.839022	optimizations with	-0.124939
-1.178218	platform with	-0.124939
-1.817832	later with	-0.124939
-1.817832	together with	-0.124939
-1.187184	declared with	-0.124939
-1.166420	link with	-0.124939
-1.160401	made with	-0.124939
-1.812615	points with	-0.124939
-1.133942	modules with	-0.425969
-1.753751	core with	-0.124939
-1.119634	things with	-0.124939
-1.734537	tested with	-0.124939
-0.867924	computer with	-0.124939
-0.141513	compatible with	-0.159701
-1.087265	statement with	-0.124939
-1.747524	dynamically with	-0.124939
-1.094828	Loop with	-0.425969
-1.705991	network with	-0.124939
-0.658211	comes with	-0.249877
-1.035313	platforms with	-0.124939
-1.036836	vectorized with	-0.124939
-1.662930	algorithm with	-0.124939
-0.391919	compatibility with	-0.124939
-1.678426	effect with	-0.124939
-1.622586	predict with	-0.124939
-0.448490	obtained with	-0.346788
-1.629499	efficiently with	-0.124939
-1.629499	names with	-0.124939
-1.622586	N with	-0.124939
-1.595984	(e.g. with	-0.124939
-0.965335	structures with	-0.124939
-1.591710	directly with	-0.124939
-1.598735	defined with	-0.124939
-0.642615	come with	-0.301030
-0.889185	buffer with	-0.425969
-1.557342	happen with	-0.124939
-1.490842	style with	-0.124939
-1.504560	writes with	-0.124939
-1.497647	chains with	-0.124939
-1.497647	eax with	-0.124939
-1.474833	included with	-0.124939
-1.467573	c2 with	-0.124939
-1.467573	additions with	-0.124939
-1.446494	DLL with	-0.124939
-1.460432	devices with	-0.124939
-0.388798	multiplying with	-0.425969
-1.474833	determined with	-0.124939
-1.395416	Even with	-0.124939
-0.738869	polymorphism with	-0.124939
-1.395416	measured with	-0.124939
-1.335494	polynomial with	-0.124939
-1.342634	Func with	-0.124939
-1.342634	Test with	-0.124939
-1.349894	computers with	-0.124939
-1.335494	users with	-0.124939
-1.335494	mixed with	-0.124939
-1.357277	communication with	-0.124939
-1.342634	type-casting with	-0.124939
-1.278096	bc with	-0.124939
-0.597375	swapped with	-0.425969
-0.597375	Array with	-0.124939
-1.263453	compare with	-0.124939
-0.595825	contiguous with	-0.425969
-1.263453	separately with	-0.124939
-0.185881	AND'ed with	-0.249877
-1.270713	combined with	-0.124939
-1.263453	macros with	-0.124939
-1.173803	databases with	-0.124939
-1.181186	Vectorized with	-0.124939
-1.173803	29 with	-0.124939
-1.173803	begin with	-0.124939
-0.502021	represented with	-0.124939
-0.248136	incompatible with	-0.124939
-1.173803	entry with	-0.124939
-1.173803	tests with	-0.124939
-1.181186	well-defined with	-0.124939
-0.124067	satisfied with	-0.124939
-0.124067	Comes with	-0.301030
-1.063758	performed with	-0.124939
-0.377082	Works with	-0.425969
-1.063758	supplied with	-0.124939
-1.063758	moved with	-0.124939
-1.063758	compared with	-0.124939
-0.887667	impossible with	-0.124939
-0.887667	manipulated with	-0.124939
-0.887667	Loops with	-0.124939
-0.887667	14.14a with	-0.124939
-0.202553	Microprocessors with	-0.124939
-0.887667	patterns with	-0.124939
-0.887667	conflicting with	-0.124939
-0.887667	working with	-0.124939
-0.202553	associated with	-0.124939
-0.887667	machines with	-0.124939
-0.887667	ways, with	-0.124939
-0.887667	complications with	-0.124939
-0.202553	dealing with	-0.124939
-0.887667	extending with	-0.124939
-0.202553	interfere with	-0.124939
-0.887667	begins with	-0.124939
-0.887667	IDE with	-0.124939
-0.594280	Included with	-0.124939
-0.594280	coordination with	-0.124939
-0.594280	(12.4e) with	-0.124939
-0.594280	(add with	-0.124939
-0.594280	Vectorization with	-0.124939
-0.594280	configurations with	-0.124939
-0.594280	Systems with	-0.124939
-0.594280	correlated with	-0.124939
-0.594280	14, with	-0.124939
-0.594280	Problems with	-0.124939
-0.594280	trace with	-0.124939
-0.594280	Processors with	-0.124939
-0.594280	supercomputers with	-0.124939
-0.594280	built with	-0.124939
-0.594280	vector::reserve with	-0.124939
-0.594280	unsatisfied with	-0.124939
-0.594280	reached with	-0.124939
-0.594280	(zero with	-0.124939
-0.594280	rewritten with	-0.124939
-0.594280	streams with	-0.124939
-0.594280	connection with	-0.124939
-0.594280	coincides with	-0.124939
-0.594280	invoked with	-0.124939
-0.594280	repeatedly with	-0.124939
-0.594280	pow(x,10) with	-0.124939
-0.594280	dealt with	-0.124939
-0.594280	clash with	-0.124939
-0.594280	disagree with	-0.124939
-0.594280	fighting with	-0.124939
-2.788371	is on	-0.124939
-2.701037	and on	-0.124939
-2.418658	function on	-0.124939
-1.940458	not on	-0.124939
-1.649737	than on	-0.669007
-2.711148	compiler on	-0.124939
-1.720884	time on	-0.221849
-2.625657	use on	-0.124939
-2.606959	more on	-0.124939
-2.552054	memory on	-0.124939
-2.525954	program on	-0.124939
-1.346217	only on	-0.182931
-2.432777	all on	-0.124939
-2.538110	but on	-0.124939
-2.026060	used on	-0.124939
-2.368639	example on	-0.124939
-2.452581	size on	-0.124939
-2.397268	object on	-0.124939
-2.391817	possible on	-0.124939
-1.830375	version on	-0.124939
-2.386464	objects on	-0.124939
-1.267280	performance on	-0.346788
-2.350427	long on	-0.124939
-1.132808	stored on	-0.903090
-2.264684	called on	-0.124939
-2.201485	test on	-0.124939
-2.279480	useful on	-0.124939
-2.226457	even on	-0.124939
-2.155632	file on	-0.124939
-1.206106	operations on	-0.249877
-2.195757	cases on	-0.124939
-1.341288	processors on	-0.124939
-1.559272	important on	-0.124939
-1.550129	accessed on	-0.425969
-2.166325	assembly on	-0.124939
-0.850749	work on	-0.271067
-1.014118	calculations on	-0.346788
-2.142151	compiled on	-0.124939
-2.114035	bytes on	-0.124939
-2.108612	threads on	-0.124939
-0.995433	best on	-0.124939
-2.122591	speed on	-0.124939
-2.117134	much on	-0.124939
-1.480091	overflow on	-0.124939
-2.066208	matrix on	-0.124939
-2.043865	classes on	-0.124939
-2.104535	done on	-0.124939
-2.052665	precision on	-0.124939
-1.456656	works on	-0.124939
-2.096964	manual on	-0.124939
-0.330790	explained on	-1.380211
-1.440170	parameters on	-0.124939
-1.447279	check on	-0.124939
-1.447279	implemented on	-0.124939
-2.067673	solution on	-0.124939
-1.172525	supported on	-0.301030
-2.006622	operators on	-0.124939
-1.154218	run on	-0.124939
-0.971520	well on	-0.249877
-1.356313	cycles on	-0.124939
-1.995630	count on	-0.124939
-1.359815	files on	-0.425969
-1.099431	fast on	-0.124939
-1.955649	optimal on	-0.124939
-1.945857	space on	-0.124939
-2.019545	else on	-0.124939
-1.328681	dispatching on	-0.124939
-0.747151	running on	-0.346788
-1.860427	better on	-0.124939
-1.890490	examples on	-0.124939
-1.880236	addition on	-0.124939
-1.841121	expressions on	-0.124939
-1.236998	transferred on	-0.425969
-1.210813	optimizations on	-0.124939
-1.840728	graphics on	-0.124939
-1.835012	together on	-0.124939
-1.187254	dispatch on	-0.124939
-1.797262	storage on	-0.124939
-0.041787	based on	-0.234083
-1.823366	feature on	-0.124939
-1.769714	core on	-0.124939
-1.122622	around on	-0.124939
-0.712467	reductions on	-0.425969
-0.107886	depends on	-0.572097
-1.116673	tested on	-0.124939
-1.802836	compatible on	-0.124939
-0.020749	depending on	-0.726999
-1.765183	avoided on	-0.124939
-0.573928	turn on	-0.221849
-1.742780	described on	-0.124939
-1.704854	operation on	-0.124939
-1.720704	comes on	-0.124939
-0.053644	rely on	-0.271067
-1.696155	given on	-0.124939
-1.663971	tasks on	-0.124939
-0.599704	effect on	-0.249877
-1.640245	efficiently on	-0.124939
-0.971354	models on	-0.425969
-0.972554	details on	-0.124939
-1.613780	especially on	-0.124939
-0.938384	discussed on	-0.425969
-1.608081	below on	-0.124939
-1.602456	delay on	-0.124939
-1.596903	either on	-0.124939
-1.591420	ebx on	-0.124939
-1.566688	something on	-0.124939
-0.599339	cycle on	-0.124939
-1.464153	fastest on	-0.124939
-1.487104	listed on	-0.124939
-1.417485	spend on	-0.124939
-1.411786	measurements on	-0.124939
-1.406161	measured on	-0.124939
-0.489517	log on	-0.301030
-0.423926	spent on	-0.124939
-1.350538	15 on	-0.124939
-1.344840	normal on	-0.124939
-0.057700	depend on	-0.301030
-1.282984	effort on	-0.124939
-1.271357	list, on	-0.124939
-1.277132	compromise on	-0.124939
-1.271357	package on	-0.124939
-0.070220	relies on	-0.124939
-0.502700	Dispatch on	-0.124939
-1.180222	bad on	-0.124939
-1.180222	134 on	-0.124939
-0.089704	restrictions on	-0.249877
-1.186074	appears on	-0.124939
-0.501480	μs on	-0.124939
-1.186074	bytes) on	-0.124939
-1.180222	tests on	-0.124939
-1.067068	detail on	-0.124939
-1.061135	advices on	-0.124939
-0.377761	differently on	-0.124939
-0.377761	performed on	-0.124939
-1.061135	Literature on	-0.124939
-1.061135	focus on	-0.124939
-1.061135	specified on	-0.124939
-1.061135	9.5a on	-0.124939
-1.061135	forums on	-0.124939
-1.061135	flags on	-0.124939
-0.890977	interpretation on	-0.124939
-0.202893	Report on	-0.425969
-0.890977	Manual on	-0.124939
-0.890977	miss on	-0.124939
-0.890977	literature on	-0.124939
-0.202893	concentrated on	-0.124939
-0.890977	experiments on	-0.124939
-0.890977	influence on	-0.124939
-0.890977	(three on	-0.124939
-0.890977	crash on	-0.124939
-0.890977	perfectly on	-0.124939
-0.890977	optimally on	-0.124939
-0.202893	wasted on	-0.124939
-0.890977	heavily on	-0.124939
-0.890977	efforts on	-0.124939
-0.890977	book on	-0.124939
-0.890977	restriction on	-0.124939
-0.890977	IDE on	-0.124939
-0.890977	manipulations on	-0.124939
-0.890977	stay on	-0.124939
-0.202893	tips on	-0.124939
-0.595961	below, on	-0.124939
-0.595961	Files on	-0.124939
-0.595961	Turn on	-0.124939
-0.595961	Storage on	-0.124939
-0.595961	pushed on	-0.124939
-0.595961	article on	-0.124939
-0.595961	Advice on	-0.124939
-0.595961	tag on	-0.124939
-0.595961	7.43 on	-0.124939
-0.595961	mainly on	-0.124939
-0.595961	satisfactorily on	-0.124939
-0.595961	impacts on	-0.124939
-0.595961	(depending on	-0.124939
-0.595961	research on	-0.124939
-0.595961	concentrating on	-0.124939
-0.595961	Advices on	-0.124939
-0.595961	relying on	-0.124939
-0.595961	setup. on	-0.124939
-0.595961	textbook on	-0.124939
-0.595961	opinions on	-0.124939
-0.595961	NOT on	-0.124939
-0.595961	incurred on	-0.124939
-1.462714	the code	-0.469434
-2.531212	a code	-0.221849
-1.822309	of code	-0.346788
-3.172286	to code	-0.124939
-2.708333	and code	-0.124939
-3.255119	in code	-0.124939
-1.563584	The code	-0.425969
-2.473817	for code	-0.301030
-3.092821	that code	-0.124939
-2.807569	or code	-0.124939
-2.961555	function code	-0.124939
-2.864746	with code	-0.124939
-2.889207	on code	-0.124939
-1.991116	This code	-0.124939
-2.785141	than code	-0.124939
-1.856678	this code	-0.124939
-1.973764	when code	-0.301030
-1.938220	A code	-0.301030
-1.625420	program code	-0.346788
-2.686955	make code	-0.124939
-2.608236	different code	-0.124939
-1.699065	same code	-0.124939
-2.559721	other code	-0.124939
-1.678246	point code	-0.124939
-2.617764	which code	-0.124939
-1.649072	all code	-0.249877
-2.539630	class code	-0.124939
-2.477792	multiple code	-0.124939
-2.488122	64-bit code	-0.124939
-2.503056	such code	-0.124939
-2.500469	efficient code	-0.124939
-2.469472	where code	-0.124939
-2.407138	any code	-0.124939
-2.398159	makes code	-0.124939
-1.218982	critical code	-0.221849
-1.720357	bit code	-0.124939
-1.374013	system code	-0.301030
-1.327602	error code	-0.124939
-2.189810	about code	-0.124939
-1.038717	extra code	-0.221849
-0.864608	assembly code	-0.367977
-1.284647	compiled code	-0.124939
-2.123564	small code	-0.124939
-2.145097	good code	-0.124939
-1.480464	AVX code	-0.425969
-0.946955	optimized code	-0.124939
-2.062208	every code	-0.124939
-2.018150	All code	-0.124939
-0.465396	intermediate code	-0.321233
-2.005435	optimize code	-0.124939
-0.831229	above code	-0.221849
-1.339631	optimal code	-0.124939
-1.335945	particular code	-0.124939
-1.982759	Mac code	-0.124939
-1.955259	complicated code	-0.124939
-1.274539	source code	-0.124939
-1.215928	Each code	-0.425969
-1.197843	your code	-0.425969
-1.835811	binary code	-0.124939
-1.806667	advanced code	-0.124939
-0.165059	position-independent code	-0.212089
-0.791145	vectorized code	-0.124939
-1.644874	Any code	-0.124939
-1.618873	identical code	-0.124939
-1.534720	independent code	-0.124939
-1.543839	vectorize code	-0.124939
-1.434694	definition code	-0.124939
-0.332243	machine code	-0.124939
-0.679349	System code	-0.124939
-0.345735	Position-independent code	-0.301030
-0.186601	loop-invariant code	-0.124939
-0.503257	Vectorized code	-0.124939
-0.503257	serial code	-0.425969
-0.503257	mixing code	-0.425969
-1.194739	measurement code	-0.124939
-1.191656	cache, code	-0.124939
-0.378949	compiler-generated code	-0.124939
-1.069800	improving code	-0.124939
-1.069800	well-structured code	-0.124939
-0.378949	built-in code	-0.124939
-1.072905	complete code	-0.124939
-0.203488	invariant code	-0.425969
-0.896813	non-AVX code	-0.124939
-0.896813	resulting code	-0.124939
-0.896813	unsafe code	-0.124939
-0.896813	Both code	-0.124939
-0.896813	Interpreted code	-0.124939
-0.598910	dead code	-0.124939
-0.598910	build code	-0.124939
-0.598910	user-written code	-0.124939
-0.598910	startup code	-0.124939
-0.598910	Complicated code	-0.124939
-0.598910	exception-safe code	-0.124939
-0.598910	resultant code	-0.124939
-2.482728	is as	-0.346788
-2.486683	be as	-0.124939
-2.469561	are as	-0.124939
-2.571511	or as	-0.124939
-2.866290	it as	-0.124939
-2.778810	function as	-0.124939
-2.720921	code as	-0.124939
-2.572375	not as	-0.124939
-2.234470	than as	-0.124939
-2.885750	x as	-0.124939
-2.255748	may as	-0.425969
-2.597103	have as	-0.124939
-2.188402	time as	-0.124939
-2.611591	use as	-0.124939
-2.085133	data as	-0.124939
-2.072977	program as	-0.124939
-2.544206	vector as	-0.124939
-1.251462	same as	-0.221849
-2.538579	functions as	-0.124939
-2.527091	but as	-0.124939
-1.250656	used as	-0.234083
-2.462759	cache as	-0.124939
-1.942847	do as	-0.124939
-1.925212	size as	-0.124939
-2.462827	b as	-0.124939
-2.387119	object as	-0.124939
-2.351066	C++ as	-0.124939
-0.267752	such as	-0.159019
-1.257107	efficient as	-0.301030
-1.827123	value as	-0.124939
-2.377923	objects as	-0.124939
-1.556891	variable as	-0.124939
-2.369568	so as	-0.124939
-1.540923	variables as	-0.301030
-1.094260	long as	-0.271067
-1.502118	way as	-0.124939
-1.132548	stored as	-0.301030
-2.189990	often as	-0.124939
-2.099945	always as	-0.124939
-2.190213	programming as	-0.124939
-2.187148	available as	-0.124939
-2.154026	times as	-0.124939
-2.228337	want as	-0.124939
-2.126619	arrays as	-0.124939
-1.522497	work as	-0.124939
-2.093987	calculations as	-0.124939
-1.522482	compiled as	-0.124939
-2.078757	language as	-0.124939
-2.111427	much as	-0.124939
-2.073100	thread as	-0.124939
-2.051910	small as	-0.124939
-2.085258	integers as	-0.124939
-1.080274	good as	-0.249877
-2.069339	Linux as	-0.124939
-2.099755	done as	-0.124939
-1.463978	therefore as	-0.124939
-2.045525	precision as	-0.124939
-2.048375	optimized as	-0.124939
-1.045927	calculated as	-0.124939
-2.029709	get as	-0.124939
-2.081038	advantageous as	-0.124939
-0.588405	implemented as	-0.425969
-1.430552	known as	-0.425969
-0.627224	well as	-0.124939
-1.991094	count as	-0.124939
-1.940951	quite as	-0.124939
-0.737306	fast as	-0.124939
-1.950576	optimize as	-0.124939
-1.919542	branches as	-0.124939
-1.032226	name as	-0.301030
-1.870418	string as	-0.124939
-1.835639	expressions as	-0.124939
-0.985206	transferred as	-0.124939
-1.815435	framework as	-0.124939
-1.825766	numbers as	-0.124939
-1.836802	declared as	-0.124939
-1.814525	results as	-0.124939
-1.803489	were as	-0.124939
-0.737915	just as	-0.425969
-1.780008	smaller as	-0.124939
-1.755185	reductions as	-0.124939
-1.739892	languages as	-0.124939
-1.728856	STL as	-0.124939
-1.768775	intended as	-0.124939
-0.816024	code, as	-0.124939
-1.037655	platforms as	-0.124939
-1.693224	given as	-0.124939
-1.038924	vectorized as	-0.124939
-1.661039	offset as	-0.124939
-1.644162	macro as	-0.124939
-1.655340	them as	-0.124939
-1.637904	thing as	-0.124939
-1.600116	occur as	-0.124939
-1.600116	reading as	-0.124939
-0.683461	either as	-0.301030
-1.588489	ebx as	-0.124939
-1.606048	defined as	-0.124939
-1.541397	significant as	-0.124939
-1.552871	invalid as	-0.124939
-0.485620	organized as	-0.249877
-1.552871	set, as	-0.124939
-1.564656	processors, as	-0.124939
-1.570670	apply as	-0.124939
-1.552871	features as	-0.124939
-1.501338	style as	-0.124939
-1.507113	chosen as	-0.124939
-0.850870	provided as	-0.124939
-1.507113	standardized as	-0.124939
-1.479859	included as	-0.124939
-1.461813	now as	-0.124939
-1.467745	unit as	-0.124939
-0.548024	CPUs, as	-0.602060
-1.467745	j as	-0.124939
-1.473760	factors as	-0.124939
-1.461813	explicitly as	-0.124939
-0.797160	interpreted as	-0.124939
-0.801003	exactly as	-0.425969
-1.409754	xn as	-0.124939
-0.490032	distributed as	-0.124939
-1.409754	mode, as	-0.124939
-1.403821	memory, as	-0.124939
-1.409754	system, as	-0.124939
-1.403821	integers, as	-0.124939
-1.409754	measurements as	-0.124939
-1.354920	principle as	-0.124939
-1.361106	static, as	-0.124939
-1.348821	edx as	-0.124939
-1.342807	users as	-0.124939
-0.674779	soon as	-0.124939
-1.269640	executed as	-0.124939
-1.275739	consumption as	-0.124939
-1.281925	appear as	-0.124939
-1.275739	written as	-0.124939
-1.269640	15.1c as	-0.124939
-1.185015	up, as	-0.124939
-1.185015	enum as	-0.124939
-1.178829	precision, as	-0.124939
-1.185015	queue as	-0.124939
-0.089676	expressed as	-0.249877
-1.191291	Same as	-0.124939
-0.248433	treated as	-0.124939
-0.502554	coded as	-0.124939
-0.502554	represented as	-0.124939
-1.185015	reproducible as	-0.124939
-1.060076	parameters, as	-0.124939
-1.060076	FPGA as	-0.124939
-1.060076	devices, as	-0.124939
-1.060076	counters, as	-0.124939
-1.060076	execution, as	-0.124939
-1.060076	access, as	-0.124939
-1.060076	space, as	-0.124939
-1.060076	operations, as	-0.124939
-1.060076	etc., as	-0.124939
-1.060076	ReadTSC as	-0.124939
-1.060076	metaprogramming, as	-0.124939
-1.060076	vectors, as	-0.124939
-1.060076	classes, as	-0.124939
-0.890261	templates, as	-0.124939
-0.890261	branches, as	-0.124939
-0.890261	developed as	-0.124939
-0.890261	events as	-0.124939
-0.890261	microseconds as	-0.124939
-0.890261	use, as	-0.124939
-0.890261	_mm_empty() as	-0.124939
-0.890261	ways, as	-0.124939
-0.890261	AVX, as	-0.124939
-0.890261	cached as	-0.124939
-0.890261	Booleans as	-0.124939
-0.202820	internally as	-0.124939
-0.595598	clumsy, as	-0.124939
-0.595598	statements, as	-0.124939
-0.595598	yet as	-0.124939
-0.595598	regarded as	-0.124939
-0.595598	pool, as	-0.124939
-0.595598	assignment, as	-0.124939
-0.595598	optimizations, as	-0.124939
-0.595598	blurred as	-0.124939
-0.595598	passed as	-0.124939
-0.595598	serves as	-0.124939
-0.595598	elements, as	-0.124939
-0.595598	OneOrTwo5[b!=0] as	-0.124939
-0.595598	linking, as	-0.124939
-0.595598	frequency, as	-0.124939
-0.595598	hints as	-0.124939
-0.595598	pipelined, as	-0.124939
-0.595598	stride, as	-0.124939
-0.595598	contentions, as	-0.124939
-0.595598	checking, as	-0.124939
-0.595598	(n!) as	-0.124939
-0.595598	directory as	-0.124939
-0.595598	union, as	-0.124939
-0.595598	issue, as	-0.124939
-0.595598	collection, as	-0.124939
-0.595598	R2 as	-0.124939
-1.223087	is not	-0.374816
-2.485279	and not	-0.124939
-1.226133	are not	-0.204120
-2.620011	can not	-0.124939
-2.503520	or not	-0.124939
-2.991113	by not	-0.124939
-2.878439	not not	-0.124939
-2.921276	compiler not	-0.124939
-1.189731	may not	-0.625541
-2.838382	have not	-0.124939
-2.824978	when not	-0.124939
-1.309524	will not	-0.388180
-1.897425	has not	-0.301030
-1.054810	but not	-0.271067
-1.510505	should not	-0.221849
-2.621872	set not	-0.124939
-0.651869	do not	-0.332064
-2.593387	pointer not	-0.124939
-2.339193	need not	-0.124939
-2.331122	sure not	-0.124939
-2.303187	SSE2 not	-0.124939
-0.236498	does not	-0.249877
-2.198083	But not	-0.124939
-1.075586	therefore not	-0.124939
-2.088544	would not	-0.124939
-2.061391	simply not	-0.124939
-2.014634	was not	-0.124939
-1.948420	means not	-0.124939
-1.871060	platform not	-0.124939
-1.846971	usually not	-0.124939
-1.176789	were not	-0.124939
-0.094889	Do not	-0.425969
-1.663990	possibly not	-0.124939
-1.667765	though not	-0.124939
-1.586692	might not	-0.124939
-1.542826	include not	-0.124939
-1.497398	CPUs, not	-0.124939
-1.433682	had not	-0.124939
-1.435582	generally not	-0.124939
-1.435582	system, not	-0.124939
-1.433682	size, not	-0.124939
-0.745848	registers, not	-0.124939
-0.679680	am not	-0.124939
-1.293278	_WIN64 not	-0.124939
-1.291362	currently not	-0.124939
-0.186786	Does not	-0.249877
-1.073354	Has not	-0.124939
-1.073354	register, not	-0.124939
-1.075288	measures not	-0.124939
-0.899197	research, not	-0.124939
-0.899197	95 not	-0.124939
-0.600109	adds, not	-0.124939
-0.600109	rows, not	-0.124939
-0.600109	did not	-0.124939
-0.600109	specialization, not	-0.124939
-0.600109	forwards, not	-0.124939
-0.600109	(but not	-0.124939
-0.600109	precedence, not	-0.124939
-1.821106	// This	-0.550907
-1.253831	} This	-0.301030
-1.269555	code. This	-0.249877
-1.265190	time. This	-0.726999
-2.140031	Gnu This	-0.124939
-1.030435	function. This	-0.221849
-2.003908	etc. This	-0.124939
-1.187470	functions. This	-0.124939
-2.040212	b; This	-0.124939
-0.854079	memory. This	-0.124939
-0.839349	program. This	-0.124939
-1.277895	cache. This	-0.425969
-1.845637	efficient. This	-0.124939
-1.832896	below. This	-0.124939
-1.000217	data. This	-0.301030
-1.832846	set. This	-0.124939
-1.215988	compilers. This	-0.124939
-1.197373	x; This	-0.124939
-0.914483	called. This	-0.124939
-1.138458	CPUs. This	-0.124939
-1.133621	compiler. This	-0.124939
-1.115259	loop. This	-0.124939
-1.115259	pointer. This	-0.124939
-1.084075	platforms. This	-0.425969
-1.087306	cases. This	-0.124939
-0.840767	1. This	-0.124939
-1.694543	size. This	-0.124939
-1.666515	variables. This	-0.124939
-1.649879	resources. This	-0.124939
-0.785541	class. This	-0.124939
-1.677829	d; This	-0.124939
-1.631445	it. This	-0.124939
-1.631445	registers. This	-0.124939
-1.667855	bytes. This	-0.124939
-1.631445	object. This	-0.124939
-1.631445	library. This	-0.124939
-1.660324	calculations. This	-0.124939
-1.631445	operations. This	-0.124939
-1.002020	variable. This	-0.124939
-0.997129	optimization. This	-0.425969
-0.561462	stack. This	-0.249877
-1.596683	possible. This	-0.124939
-1.596683	needed. This	-0.124939
-0.718594	thread. This	-0.124939
-0.968901	purposes. This	-0.124939
-1.573094	instructions. This	-0.124939
-1.573094	vector. This	-0.124939
-0.931112	well. This	-0.124939
-1.531701	dispatching. This	-0.124939
-1.546381	problem. This	-0.124939
-1.531701	returns. This	-0.124939
-1.500624	order. This	-0.124939
-1.485944	allocation. This	-0.124939
-0.845611	block. This	-0.124939
-1.485944	executed. This	-0.124939
-1.493222	overflow. This	-0.124939
-1.457001	value. This	-0.124939
-0.794458	file. This	-0.425969
-1.449471	register. This	-0.124939
-1.457001	system. This	-0.124939
-1.449471	fast. This	-0.124939
-1.449471	units. This	-0.124939
-1.457001	array. This	-0.124939
-1.442069	software. This	-0.124939
-1.449471	line. This	-0.124939
-1.449471	vectors. This	-0.124939
-1.391479	parameters. This	-0.124939
-0.739783	important. This	-0.124939
-0.739783	simultaneously. This	-0.124939
-1.391479	Studio This	-0.124939
-1.399009	processor. This	-0.124939
-0.490096	bits. This	-0.301030
-1.399009	is. This	-0.124939
-1.399009	speed. This	-0.124939
-0.739783	do. This	-0.124939
-0.738122	16. This	-0.124939
-1.347527	notice This	-0.124939
-1.332063	members. This	-0.124939
-1.339726	again. This	-0.124939
-0.672837	times. This	-0.124939
-1.339726	one. This	-0.124939
-1.339726	counter. This	-0.124939
-1.339726	structure. This	-0.124939
-1.260545	profiler. This	-0.124939
-1.260545	manual. This	-0.124939
-1.260545	finished. This	-0.124939
-1.268345	ways. This	-0.124939
-1.260545	Mars This	-0.124939
-1.260545	process. This	-0.124939
-1.260545	files. This	-0.124939
-1.260545	execution. This	-0.124939
-0.343968	modules. This	-0.124939
-0.595323	all. This	-0.124939
-1.260545	addresses. This	-0.124939
-1.260545	them. This	-0.124939
-1.260545	point. This	-0.124939
-1.268345	doubled. This	-0.124939
-1.276289	cycle. This	-0.124939
-1.171435	up. This	-0.124939
-1.171435	reasons. This	-0.124939
-1.171435	objects. This	-0.124939
-1.179379	lines. This	-0.124939
-1.179379	1.0f; This	-0.124939
-1.171435	versions. This	-0.124939
-1.171435	cached. This	-0.124939
-1.179379	unsigned. This	-0.124939
-1.179379	pointers. This	-0.124939
-1.171435	option. This	-0.124939
-1.179379	b2; This	-0.124939
-1.171435	languages. This	-0.124939
-1.171435	counts. This	-0.124939
-1.171435	smaller. This	-0.124939
-1.187470	module. This	-0.124939
-1.171435	manually. This	-0.124939
-1.179379	results. This	-0.124939
-1.054440	addition. This	-0.124939
-1.054440	only. This	-0.124939
-1.054440	long. This	-0.124939
-1.062531	advance. This	-0.124939
-1.054440	28. This	-0.124939
-0.376830	loaded. This	-0.124939
-1.054440	C++. This	-0.124939
-1.054440	caching. This	-0.124939
-1.054440	stored. This	-0.124939
-1.054440	manner. This	-0.124939
-1.054440	directives. This	-0.124939
-0.376830	definition. This	-0.124939
-0.376830	CPUs"). This	-0.124939
-1.054440	same. This	-0.124939
-1.054440	reference. This	-0.124939
-1.062531	other. This	-0.124939
-0.376830	fragmented. This	-0.124939
-1.054440	truncation. This	-0.124939
-0.376830	inline. This	-0.124939
-1.054440	free. This	-0.124939
-0.376830	modified. This	-0.425969
-1.054440	x. This	-0.124939
-1.054440	a. This	-0.124939
-0.886440	static. This	-0.124939
-0.886440	frameworks. This	-0.124939
-0.886440	mouse. This	-0.124939
-0.886440	ArrayOfStructures[100]; This	-0.124939
-0.886440	compiled. This	-0.124939
-0.886440	32. This	-0.124939
-0.886440	72 This	-0.124939
-0.886440	volatile. This	-0.124939
-0.886440	safe. This	-0.124939
-0.886440	pure. This	-0.124939
-0.886440	range. This	-0.124939
-0.886440	2.0 This	-0.124939
-0.886440	lost. This	-0.124939
-0.886440	declaration. This	-0.124939
-0.886440	changed. This	-0.124939
-0.886440	feature. This	-0.124939
-0.886440	defined. This	-0.124939
-0.886440	default. This	-0.124939
-0.886440	development. This	-0.124939
-0.886440	known. This	-0.124939
-0.886440	rounding. This	-0.124939
-0.886440	temporarily. This	-0.124939
-0.886440	predicted. This	-0.124939
-0.886440	else. This	-0.124939
-0.886440	alloca. This	-0.124939
-0.886440	ms. This	-0.124939
-0.886440	137). This	-0.124939
-0.886440	i. This	-0.124939
-0.886440	Introduction This	-0.124939
-0.886440	normal. This	-0.124939
-0.886440	occurred. This	-0.124939
-0.886440	efficiently. This	-0.124939
-0.886440	list[i]; This	-0.124939
-0.886440	executables. This	-0.124939
-0.593655	231. This	-0.124939
-0.593655	effects. This	-0.124939
-0.593655	$B1$2:. This	-0.124939
-0.593655	before. This	-0.124939
-0.593655	saturated. This	-0.124939
-0.593655	compiling. This	-0.124939
-0.593655	slices. This	-0.124939
-0.593655	Gbytes. This	-0.124939
-0.593655	switching. This	-0.124939
-0.593655	view. This	-0.124939
-0.593655	branch). This	-0.124939
-0.593655	(1985). This	-0.124939
-0.593655	de-allocated. This	-0.124939
-0.593655	[ecx+eax*4]. This	-0.124939
-0.593655	0.666666666666666666667; This	-0.124939
-0.593655	local. This	-0.124939
-0.593655	bytes). This	-0.124939
-0.593655	spaces. This	-0.124939
-0.593655	full. This	-0.124939
-0.593655	if. This	-0.124939
-0.593655	(a+b). This	-0.124939
-0.593655	it). This	-0.124939
-0.593655	substantial. This	-0.124939
-0.593655	arguments. This	-0.124939
-0.593655	-axAVX. This	-0.124939
-0.593655	specialization. This	-0.124939
-0.593655	member. This	-0.124939
-0.593655	version). This	-0.124939
-0.593655	happens. This	-0.124939
-0.593655	entries. This	-0.124939
-0.593655	Studio. This	-0.124939
-0.593655	n;} This	-0.124939
-0.593655	35 This	-0.124939
-0.593655	functionality. This	-0.124939
-0.593655	mark_end; This	-0.124939
-0.593655	(4096). This	-0.124939
-0.593655	unused. This	-0.124939
-0.593655	kbytes. This	-0.124939
-0.593655	reduced. This	-0.124939
-0.593655	(SVML). This	-0.124939
-0.593655	often. This	-0.124939
-0.593655	JNZ). This	-0.124939
-0.593655	scheduler. This	-0.124939
-0.593655	added. This	-0.124939
-0.593655	clock. This	-0.124939
-0.593655	i*sizeof(S1). This	-0.124939
-0.593655	45. This	-0.124939
-0.593655	scratch. This	-0.124939
-0.593655	class? This	-0.124939
-0.593655	135). This	-0.124939
-0.593655	tiling. This	-0.124939
-0.593655	16.1. This	-0.124939
-0.593655	32-62. This	-0.124939
-0.593655	87. This	-0.124939
-0.593655	www.agner.org/optimize/testp.zip. This	-0.124939
-0.593655	CString. This	-0.124939
-0.593655	iteration. This	-0.124939
-0.593655	7.2). This	-0.124939
-0.593655	state. This	-0.124939
-0.593655	deprecated. This	-0.124939
-0.593655	((a+b)+c)+d. This	-0.124939
-0.593655	composer) This	-0.124939
-0.593655	-fpic. This	-0.124939
-0.593655	2010. This	-0.124939
-0.593655	written. This	-0.124939
-0.593655	158. This	-0.124939
-0.593655	measure. This	-0.124939
-0.593655	route. This	-0.124939
-0.593655	MFC). This	-0.124939
-0.593655	patterns. This	-0.124939
-0.593655	"undefined". This	-0.124939
-0.593655	-mveclibabi=svml. This	-0.124939
-0.593655	note: This	-0.124939
-0.593655	place. This	-0.124939
-0.593655	kb. This	-0.124939
-0.593655	usability. This	-0.124939
-0.593655	log(c[i]);. This	-0.124939
-0.593655	int)i; This	-0.124939
-0.593655	-fsource-asm). This	-0.124939
-0.593655	throw(); This	-0.124939
-0.593655	eax,eax. This	-0.124939
-2.178988	a -	-0.425969
-3.017344	by -	-0.124939
-0.398246	- -	-1.397112
-0.607825	x -	-1.386202
-2.746370	program -	-0.124939
-0.383906	n.a. -	-0.842794
-2.428992	2 -	-0.124939
-1.740727	4 -	-0.124939
-2.384420	8 -	-0.124939
-0.704334	0 -	-0.388180
-2.170056	integers -	-0.124939
-2.070037	1 -	-0.124939
-1.176811	10 -	-0.425969
-1.759975	b) -	-0.124939
-1.728397	inlined -	-0.124939
-0.646694	3 -	-0.124939
-1.493496	12 -	-0.124939
-1.498452	-1 -	-0.124939
-1.435504	expensive -	-0.124939
-1.371855	14 -	-0.124939
-1.292674	50 -	-0.124939
-1.197422	40 -	-0.124939
-1.197422	maintenance -	-0.124939
-1.074148	-S -	-0.124939
-1.074148	----- -	-0.124939
-1.075819	a*b -	-0.124939
-1.074148	ReadTSC() -	-0.124939
-1.075819	folding -	-0.124939
-0.203783	a+(b+c) -	-0.124939
-0.203783	-- -	-0.124939
-0.203783	a*(b+c) -	-0.124939
-0.899728	-(-a)=a -	-0.124939
-0.899728	b*a -	-0.124939
-0.899728	a&(b|c) -	-0.124939
-0.600376	convenience -	-0.124939
-0.600376	x-xxx -	-0.124939
-0.600376	after) -	-0.124939
-0.600376	(int)n -	-0.124939
-0.600376	2.5*x^2 -	-0.124939
-0.600376	a<<(b+c) -	-0.124939
-0.600376	(27 -	-0.124939
-0.600376	(20 -	-0.124939
-0.600376	int)(i -	-0.124939
-0.600376	(3 -	-0.124939
-0.600376	2004 -	-0.124939
-0.600376	xx(-)x- -	-0.124939
-0.600376	int)(max -	-0.124939
-0.600376	http://www.agner.org/optimize/ -	-0.124939
-0.600376	a*4 -	-0.124939
-0.600376	--- -	-0.124939
-1.960787	is an	-0.279841
-1.871903	of an	-0.291270
-2.122180	to an	-0.249877
-2.512695	and an	-0.124939
-1.902194	in an	-0.154902
-1.815204	for an	-0.460731
-2.648711	that an	-0.124939
-1.863220	be an	-0.279841
-2.007881	or an	-0.124939
-1.996263	if an	-0.221849
-1.964469	by an	-0.221849
-1.718143	with an	-0.124939
-2.024984	on an	-0.249877
-1.392952	as an	-0.301030
-1.963685	not an	-0.124939
-2.067800	than an	-0.124939
-1.382039	have an	-0.263241
-2.737542	time an	-0.124939
-1.720409	use an	-0.221849
-2.715526	when an	-0.124939
-2.749721	then an	-0.124939
-1.550154	at an	-0.602060
-1.530643	has an	-0.301030
-1.873796	make an	-0.124939
-2.715859	because an	-0.124939
-1.839061	only an	-0.124939
-2.661142	If an	-0.124939
-2.620483	used an	-0.124939
-2.531775	set an	-0.124939
-1.965161	do an	-0.124939
-1.453001	using an	-0.346788
-1.424599	into an	-0.124939
-2.527245	i an	-0.124939
-1.621028	such an	-0.124939
-2.322363	return an	-0.124939
-1.747808	makes an	-0.124939
-2.261071	often an	-0.124939
-2.297665	need an	-0.124939
-1.640845	without an	-0.124939
-1.627166	access an	-0.124939
-1.171062	making an	-0.124939
-2.199577	times an	-0.124939
-2.164705	about an	-0.124939
-2.149378	while an	-0.124939
-2.172726	avoid an	-0.124939
-1.520869	Use an	-0.124939
-2.159870	But an	-0.124939
-1.517342	through an	-0.124939
-1.440788	uses an	-0.124939
-2.072412	get an	-0.124939
-2.082889	whether an	-0.124939
-2.038027	doing an	-0.124939
-2.051915	run an	-0.124939
-2.014531	add an	-0.124939
-2.028306	simply an	-0.124939
-1.965154	was an	-0.124939
-1.994906	cases, an	-0.124939
-1.990192	replace an	-0.124939
-1.951917	like an	-0.124939
-1.945688	Using an	-0.124939
-1.949350	put an	-0.124939
-1.960526	needs an	-0.124939
-1.869968	requires an	-0.124939
-1.863616	shows an	-0.124939
-0.647699	generate an	-0.124939
-1.841339	choose an	-0.124939
-1.814101	just an	-0.124939
-1.100648	gives an	-0.124939
-1.742467	Such an	-0.124939
-1.734920	fact an	-0.124939
-1.742467	including an	-0.124939
-1.684142	When an	-0.124939
-1.672772	Avoid an	-0.124939
-1.672772	copying an	-0.124939
-1.009658	accessing an	-0.425969
-1.672772	adding an	-0.124939
-1.676529	causes an	-0.124939
-1.695818	divide an	-0.124939
-1.634285	(e.g. an	-0.124939
-1.649380	convert an	-0.124939
-1.638010	handle an	-0.124939
-1.615448	insert an	-0.124939
-0.898977	whenever an	-0.124939
-1.570199	modify an	-0.124939
-1.577948	setting an	-0.124939
-1.528298	away an	-0.124939
-1.419154	had an	-0.124939
-1.356099	plus an	-0.124939
-1.360026	declare an	-0.124939
-1.356099	detect an	-0.124939
-1.363988	defines an	-0.124939
-1.363988	Accessing an	-0.124939
-1.280844	increment an	-0.124939
-1.280844	adds an	-0.124939
-1.288806	specify an	-0.124939
-1.191896	declaring an	-0.124939
-1.187897	While an	-0.124939
-1.066958	catch an	-0.124939
-1.066958	signal an	-0.124939
-1.066958	Has an	-0.124939
-1.066958	issue an	-0.124939
-1.066958	Comparing an	-0.124939
-1.066958	throw an	-0.124939
-1.066958	expects an	-0.124939
-0.894903	construct an	-0.124939
-0.894903	CPU, an	-0.124939
-0.894903	constructor, an	-0.124939
-0.894903	exceeds an	-0.124939
-0.894903	performing an	-0.124939
-0.894903	provoke an	-0.124939
-0.894903	Converting an	-0.124939
-0.894903	replacing an	-0.124939
-0.597947	here's an	-0.124939
-0.597947	issuing an	-0.124939
-0.597947	seeing an	-0.124939
-0.597947	provokes an	-0.124939
-0.597947	feeding an	-0.124939
-0.597947	prints an	-0.124939
-0.597947	throws an	-0.124939
-0.597947	Insert an	-0.124939
-0.597947	fake an	-0.124939
-0.597947	raising an	-0.124939
-0.597947	detects an	-0.124939
-2.824526	to int	-0.124939
-2.438921	= int	-0.124939
-2.400088	or int	-0.124939
-2.080550	an int	-0.301030
-2.732669	int int	-0.124939
-1.253801	{ int	-0.301030
-2.722800	time int	-0.124939
-1.935447	} int	-0.301030
-2.501811	integer int	-0.124939
-2.519434	set int	-0.124939
-1.888334	library int	-0.425969
-1.327328	version int	-0.346788
-2.315481	2 int	-0.124939
-1.531283	long int	-0.124939
-0.402381	const int	-0.455932
-2.300497	4 int	-0.124939
-1.483585	0; int	-0.301030
-0.435572	unsigned int	-0.309463
-2.257484	SSE2 int	-0.124939
-0.111043	short int	-0.482450
-2.148832	work int	-0.124939
-2.162992	i; int	-0.124939
-2.182787	a, int	-0.124939
-2.119181	integers int	-0.124939
-2.104527	AVX int	-0.124939
-1.446193	}; int	-0.124939
-1.038955	b; int	-0.124939
-1.158949	inline int	-0.301030
-2.013011	... int	-0.124939
-1.922471	256 int	-0.124939
-1.941570	c; int	-0.124939
-0.965592	public: int	-0.124939
-1.210380	x; int	-0.425969
-1.197910	100; int	-0.124939
-1.825561	AVX2 int	-0.124939
-1.822384	reduce int	-0.124939
-1.770192	are: int	-0.124939
-0.508379	a; int	-0.602060
-1.709736	d; int	-0.124939
-1.651044	x, int	-0.124939
-0.244620	f; int	-0.970037
-0.898645	u; int	-0.425969
-0.042598	parm1, int	-0.726999
-1.483369	(unsigned int	-0.124939
-0.800852	volatile int	-0.124939
-1.429647	10; int	-0.124939
-0.490935	a[100]; int	-0.301030
-1.416961	127 int	-0.124939
-1.416961	MMX int	-0.124939
-1.358430	systems: int	-0.124939
-1.362700	16; int	-0.124939
-1.358430	7 int	-0.124939
-1.354202	1.0; int	-0.124939
-0.057779	SelectAddMul(short int	-0.903090
-1.279249	n! int	-0.124939
-0.598501	1000; int	-0.124939
-1.279249	__asm int	-0.124939
-0.089833	list[300]; int	-0.425969
-1.190921	b2; int	-0.124939
-0.124443	matrix[rows][columns]; int	-0.301030
-1.070337	89 int	-0.124939
-1.065982	list[100]; int	-0.124939
-1.070337	14.10 int	-0.124939
-1.070337	14.11 int	-0.124939
-1.070337	8.7 int	-0.124939
-1.070337	7.21 int	-0.124939
-1.065982	j; int	-0.124939
-0.378428	1024; int	-0.425969
-0.124443	a[], int	-0.301030
-1.065982	typedef int	-0.124939
-1.070337	7.23 int	-0.124939
-1.070337	7.20 int	-0.124939
-1.070337	7.19 int	-0.124939
-1.070337	7.18 int	-0.124939
-0.894246	y=temp;} int	-0.124939
-0.894246	14.12b int	-0.124939
-0.203227	Table[100]; int	-0.425969
-0.203227	b:2; int	-0.425969
-0.894246	string; int	-0.124939
-0.894246	14.13b int	-0.124939
-0.894246	list[size]; int	-0.124939
-0.894246	p; int	-0.124939
-0.894246	"C" int	-0.124939
-0.203227	a:4; int	-0.425969
-0.894246	c1; int	-0.124939
-0.894246	m;} int	-0.124939
-0.894246	2;} int	-0.124939
-0.597615	14.3a int	-0.124939
-0.597615	14.3b int	-0.124939
-0.597615	matrix[NUMROWS][NUMCOLUMNS]; int	-0.124939
-0.597615	8.9b int	-0.124939
-0.597615	8.9a int	-0.124939
-0.597615	14.1b int	-0.124939
-0.597615	14.1a int	-0.124939
-0.597615	403 int	-0.124939
-0.597615	110; int	-0.124939
-0.597615	14.13c int	-0.124939
-0.597615	14.13a int	-0.124939
-0.597615	FuncRow(int); int	-0.124939
-0.597615	8.13a int	-0.124939
-0.597615	8.13b int	-0.124939
-0.597615	9.1a int	-0.124939
-0.597615	9.1b int	-0.124939
-0.597615	8.11b int	-0.124939
-0.597615	8.11a int	-0.124939
-0.597615	7.42 int	-0.124939
-0.597615	ab[size]; int	-0.124939
-0.597615	SelectAddMul_dispatch(short int	-0.124939
-0.597615	module2.cpp int	-0.124939
-0.597615	blocking: int	-0.124939
-0.597615	m> int	-0.124939
-0.597615	8.6a int	-0.124939
-0.597615	8.6b int	-0.124939
-0.597615	list[16]; int	-0.124939
-0.597615	module1.cpp int	-0.124939
-0.597615	7.30b int	-0.124939
-0.597615	7.30a int	-0.124939
-0.597615	list[301]; int	-0.124939
-0.597615	IsPowerOf2, int	-0.124939
-0.597615	aa, int	-0.124939
-0.597615	FUNCNAME(short int	-0.124939
-0.597615	__declspec(align(64)) int	-0.124939
-0.597615	8.12a int	-0.124939
-0.597615	8.12b int	-0.124939
-0.597615	FuncType(short int	-0.124939
-0.597615	14.12a int	-0.124939
-0.597615	8.14b int	-0.124939
-0.597615	8.14a int	-0.124939
-0.597615	p->b;} int	-0.124939
-0.597615	int16_t int	-0.124939
-0.597615	1.6; int	-0.124939
-0.597615	399 int	-0.124939
-0.597615	98 int	-0.124939
-1.249978	time than	-0.191886
-2.223646	use than	-0.124939
-1.005746	more than	-0.310575
-2.685859	data than	-0.124939
-2.123943	program than	-0.425969
-2.670740	functions than	-0.124939
-2.654057	CPU than	-0.124939
-1.678123	other than	-0.124939
-2.576568	set than	-0.124939
-2.559354	b than	-0.124939
-2.498403	library than	-0.124939
-0.679203	efficient than	-0.191886
-1.440758	value than	-0.425969
-2.422479	variables than	-0.124939
-2.412579	way than	-0.124939
-0.373975	faster than	-0.287666
-0.721191	less than	-0.229674
-0.011829	rather than	-0.340054
-2.307202	optimization than	-0.124939
-2.279050	systems than	-0.124939
-2.241385	file than	-0.124939
-2.247888	bits than	-0.124939
-2.250347	operations than	-0.124939
-2.223064	instructions than	-0.124939
-1.322129	important than	-0.124939
-1.509890	thread than	-0.124939
-2.189404	power than	-0.124939
-1.231625	Linux than	-0.602060
-2.113980	classes than	-0.124939
-2.118998	precision than	-0.124939
-2.100692	container than	-0.124939
-2.072450	calculate than	-0.124939
-1.397011	mode than	-0.425969
-0.975809	values than	-0.726999
-2.023092	cycles than	-0.124939
-1.995767	space than	-0.124939
-2.034502	else than	-0.124939
-1.965485	signed than	-0.124939
-1.951876	block than	-0.124939
-1.928821	zero than	-0.124939
-0.518533	resources than	-0.249877
-1.003271	better than	-0.124939
-1.234283	expressions than	-0.124939
-0.987257	longer than	-0.301030
-1.863080	interface than	-0.124939
-1.196405	higher than	-0.124939
-0.334161	bigger than	-0.425969
-1.866098	ways than	-0.124939
-1.811896	modules than	-0.124939
-1.820061	smaller than	-0.124939
-1.803559	contentions than	-0.124939
-1.771665	index than	-0.124939
-1.724876	safe than	-0.124939
-1.713673	algorithm than	-0.124939
-0.724015	priority than	-0.301030
-1.657929	frequency than	-0.124939
-1.660776	efficiently than	-0.124939
-0.978694	RAM than	-0.124939
-1.575920	buffer than	-0.124939
-1.538702	device than	-0.124939
-1.535837	blocks than	-0.124939
-1.532990	time-consuming than	-0.124939
-0.853161	purposes than	-0.124939
-1.487550	parallelism than	-0.124939
-1.484684	lower than	-0.124939
-1.484684	random than	-0.124939
-0.124654	slower than	-0.204120
-1.426692	expensive than	-0.124939
-1.426692	reliable than	-0.124939
-0.678258	predictable than	-0.124939
-1.365495	compact than	-0.124939
-1.371322	form than	-0.124939
-1.286314	larger than	-0.124939
-1.195231	said than	-0.124939
-1.195231	bottleneck than	-0.124939
-1.192308	simpler than	-0.124939
-1.073235	Faster than	-0.124939
-1.070292	input/output than	-0.124939
-0.897144	footprint than	-0.124939
-0.897144	recently than	-0.124939
-0.897144	verify than	-0.124939
-0.897144	shared_ptr than	-0.124939
-0.599077	Rather than	-0.124939
-0.599077	bitmap than	-0.124939
-0.599077	2.0/3.0 than	-0.124939
-0.599077	(rather than	-0.124939
-0.599077	(other than	-0.124939
-0.599077	(less than	-0.124939
-0.599077	pooling) than	-0.124939
-1.497541	the compiler	-0.549672
-2.260762	a compiler	-0.221849
-2.549025	of compiler	-0.124939
-3.168603	and compiler	-0.124939
-3.411005	in compiler	-0.124939
-1.323082	The compiler	-0.572097
-3.243373	that compiler	-0.124939
-2.483816	by compiler	-0.124939
-2.439221	on compiler	-0.124939
-2.398995	This compiler	-0.124939
-1.693662	A compiler	-0.124939
-2.708687	different compiler	-0.124939
-2.695187	same compiler	-0.124939
-2.694192	which compiler	-0.124939
-2.630334	no compiler	-0.124939
-2.619098	each compiler	-0.124939
-0.643131	Intel compiler	-0.316824
-1.362412	C++ compiler	-0.346788
-2.319607	new compiler	-0.124939
-2.299268	following compiler	-0.124939
-0.374416	Gnu compiler	-0.243038
-2.212118	Windows compiler	-0.124939
-2.193843	best compiler	-0.124939
-1.087691	good compiler	-0.425969
-0.524476	optimizing compiler	-0.346788
-2.013462	particular compiler	-0.124939
-1.996463	Most compiler	-0.124939
-0.866586	Microsoft compiler	-0.124939
-1.960158	source compiler	-0.124939
-1.900629	Each compiler	-0.124939
-1.887181	your compiler	-0.124939
-1.841423	appropriate compiler	-0.124939
-1.633837	PathScale compiler	-0.124939
-1.543529	chosen compiler	-0.124939
-1.498716	just-in-time compiler	-0.124939
-0.746571	Clang compiler	-0.124939
-1.437543	Borland compiler	-0.124939
-1.370596	CodeGear compiler	-0.124939
-1.293003	Mars compiler	-0.124939
-1.293003	programming, compiler	-0.124939
-1.294596	Codeplay compiler	-0.124939
-0.346428	MS compiler	-0.602060
-1.197686	commercial compiler	-0.124939
-1.199286	PGI compiler	-0.124939
-0.899861	alone compiler	-0.124939
-0.899861	cheap compiler	-0.124939
-0.600443	friendly compiler	-0.124939
-0.600443	genuine compiler	-0.124939
-3.416895	a x	-0.124939
-2.555363	of x	-0.124939
-3.419559	to x	-0.124939
-3.215225	for x	-0.124939
-3.271657	that x	-0.124939
-2.330100	= x	-0.124939
-3.027685	on x	-0.124939
-0.705021	- x	-1.460731
-2.355989	int x	-0.425969
-2.104926	than x	-0.602060
-0.275758	x x	-1.472965
-1.131841	n.a. x	-0.602060
-2.555410	object x	-0.124939
-1.454684	* x	-0.425969
-1.400740	return x	-0.726999
-2.448743	between x	-0.124939
-1.742953	0; x	-0.425969
-1.739951	example, x	-0.425969
-2.310541	access x	-0.124939
-2.283445	case x	-0.124939
-1.497457	small x	-0.425969
-2.127601	get x	-0.124939
-2.081644	store x	-0.124939
-1.901335	x; x	-0.124939
-1.668943	2, x	-0.124939
-1.668943	square x	-0.124939
-1.591111	modify x	-0.124939
-1.548065	y; x	-0.124939
-1.499641	-1 x	-0.124939
-0.149599	x- x	-0.346788
-1.296892	Calculate x	-0.124939
-1.294155	elimination x	-0.124939
-1.199982	2.0; x	-0.124939
-0.124785	x-- x	-0.301030
-1.075043	---x----- x	-0.124939
-0.379657	(x) x	-0.124939
-0.900327	x-xxx---x x	-0.124939
-0.900327	74 x	-0.124939
-0.900327	a*b=b*a x	-0.124939
-0.600676	xx x	-0.124939
-0.600676	initializes x	-0.124939
-2.687284	and may	-0.124939
-2.136540	that may	-0.425969
-1.245997	it may	-0.726999
-2.253794	function may	-0.301030
-2.392225	code may	-0.124939
-1.506462	This may	-0.321233
-1.117457	compiler may	-0.372723
-0.968426	you may	-0.323306
-2.221397	this may	-0.124939
-2.767011	time may	-0.124939
-1.067449	It may	-0.477121
-2.138037	memory may	-0.124939
-1.735097	program may	-0.124939
-2.649214	functions may	-0.124939
-2.631845	CPU may	-0.124939
-1.660536	which may	-0.124939
-2.626721	but may	-0.124939
-2.572359	cache may	-0.124939
-2.544322	integer may	-0.124939
-2.556235	set may	-0.124939
-2.483630	example may	-0.124939
-1.735231	compilers may	-0.124939
-2.537944	size may	-0.124939
-2.541154	pointer may	-0.124939
-2.479585	library may	-0.124939
-1.490292	there may	-0.726999
-1.252137	There may	-0.602060
-2.444055	array may	-0.124939
-1.308955	we may	-0.124939
-2.407915	variables may	-0.124939
-0.399711	You may	-0.301030
-1.780547	table may	-0.124939
-1.662092	pointers may	-0.124939
-2.265110	systems may	-0.124939
-2.284635	user may	-0.124939
-2.314145	they may	-0.124939
-1.635487	method may	-0.425969
-2.256302	access may	-0.124939
-1.217591	system may	-0.124939
-2.212623	times may	-0.124939
-2.166995	Windows may	-0.124939
-1.541186	calls may	-0.124939
-2.161327	calculations may	-0.124939
-2.184460	execution may	-0.124939
-1.281666	processor may	-0.301030
-2.128557	These may	-0.124939
-2.137032	thread may	-0.124939
-2.115302	etc. may	-0.124939
-2.120753	calculation may	-0.124939
-2.099750	solution may	-0.124939
-2.093512	container may	-0.124939
-2.029673	count may	-0.124939
-2.005691	allocation may	-0.124939
-1.968592	branches may	-0.124939
-1.962611	multiplication may	-0.124939
-1.970656	implementation may	-0.124939
-1.964417	members may	-0.124939
-1.941233	methods may	-0.124939
-1.945423	reference may	-0.124939
-1.920564	programmer may	-0.124939
-1.936683	We may	-0.124939
-1.911171	mechanism may	-0.124939
-1.232427	expressions may	-0.124939
-1.862469	framework may	-0.124939
-1.834436	process may	-0.124939
-1.149367	constructor may	-0.425969
-1.804716	modules may	-0.124939
-1.823708	here may	-0.124939
-1.783001	section may	-0.124939
-1.769332	syntax may	-0.124939
-1.766132	profiler may	-0.124939
-1.744527	network may	-0.124939
-0.976713	frequency may	-0.124939
-1.609828	updates may	-0.124939
-1.574955	declaration may	-0.124939
-1.578252	map may	-0.124939
-1.532495	writes may	-0.124939
-1.532495	logic may	-0.124939
-1.532495	usability may	-0.124939
-1.478045	chain may	-0.124939
-1.433394	They may	-0.124939
-1.426673	collection may	-0.124939
-1.426673	developers may	-0.124939
-1.426673	measurements may	-0.124939
-1.363074	compilation may	-0.124939
-1.287266	drivers may	-0.124939
-1.287266	Templates may	-0.124939
-1.190356	solutions may	-0.124939
-1.190356	alloca may	-0.124939
-1.190356	unit-test may	-0.124939
-1.197184	tree may	-0.124939
-1.068818	advices may	-0.124939
-1.068818	One may	-0.124939
-1.068818	Bitfields may	-0.124939
-1.072245	2.5 may	-0.124939
-1.068818	intensive may	-0.124939
-0.896154	exit may	-0.124939
-0.896154	Overflow may	-0.124939
-0.896154	setup may	-0.124939
-0.598578	tolerance may	-0.124939
-0.598578	sticks may	-0.124939
-0.598578	developer may	-0.124939
-0.598578	dimension may	-0.124939
-0.598578	(^) may	-0.124939
-1.959303	and you	-0.539912
-1.576526	that you	-0.405765
-3.123356	can you	-0.124939
-2.850380	or you	-0.124939
-1.150445	if you	-0.425969
-2.408003	code you	-0.124939
-1.817397	as you	-0.124939
-2.324464	compiler you	-0.124939
-2.001392	time you	-0.124939
-1.484667	when you	-0.329059
-0.791073	then you	-0.656418
-2.684123	program you	-0.124939
-1.873169	because you	-0.124939
-2.673581	only you	-0.124939
-0.773010	If you	-0.352183
-1.656121	but you	-0.124939
-2.602103	compilers you	-0.124939
-1.605532	where you	-0.301030
-1.563402	so you	-0.124939
-2.393746	before you	-0.124939
-1.211926	example, you	-0.124939
-1.698502	how you	-0.425969
-2.286671	systems you	-0.124939
-1.392987	sure you	-0.124939
-2.291689	method you	-0.124939
-2.252417	case you	-0.124939
-1.606733	cases you	-0.124939
-2.185334	while you	-0.124939
-2.184964	Windows you	-0.124939
-2.173955	much you	-0.124939
-2.109553	solution you	-0.124939
-1.427539	whether you	-0.124939
-2.027775	All you	-0.124939
-2.027883	However, you	-0.124939
-2.005190	problems you	-0.124939
-0.586350	unless you	-0.329059
-1.342682	cases, you	-0.425969
-0.710297	Therefore, you	-0.425969
-1.057373	allows you	-0.602060
-1.024109	what you	-0.301030
-1.887704	give you	-0.124939
-1.218816	optimizations you	-0.124939
-0.923553	Here, you	-0.124939
-1.128660	things you	-0.425969
-1.101793	Windows, you	-0.124939
-1.749158	code, you	-0.124939
-1.694728	When you	-0.124939
-1.702584	until you	-0.124939
-0.939604	allow you	-0.425969
-0.940683	program, you	-0.124939
-1.575624	compiler, you	-0.124939
-1.583387	Obviously, you	-0.124939
-0.853533	Here you	-0.124939
-0.802381	systems, you	-0.124939
-1.433739	object, you	-0.124939
-1.428485	Likewise, you	-0.124939
-0.216228	Alternatively, you	-0.522879
-0.424759	general, you	-0.301030
-1.290262	case, you	-0.124939
-0.599885	library, you	-0.124939
-1.193352	Furthermore, you	-0.124939
-1.193352	150 you	-0.124939
-0.503518	words, you	-0.124939
-1.196019	Then you	-0.124939
-1.071080	devices, you	-0.124939
-1.071080	execution, you	-0.124939
-1.071080	slow, you	-0.124939
-0.379122	Whether you	-0.124939
-1.073764	fact, you	-0.124939
-1.071080	contrary, you	-0.124939
-0.379122	Before you	-0.425969
-1.071080	Instead, you	-0.124939
-0.897673	operator; you	-0.124939
-0.897673	purpose, you	-0.124939
-0.897673	insight you	-0.124939
-0.897673	better, you	-0.124939
-0.599343	8.21, you	-0.124939
-0.599343	safety, you	-0.124939
-0.599343	anything, you	-0.124939
-0.599343	reason, you	-0.124939
-0.599343	First you	-0.124939
-2.480240	= {	-0.124939
-2.672844	vector {	-0.124939
-0.183816	i++) {	-0.407485
-0.119605	else {	-0.388180
-0.012203	x) {	-0.602060
-0.312078	union {	-0.647817
-0.545575	b) {	-0.522879
-0.631969	S1 {	-0.124939
-1.690412	struct {	-0.124939
-0.162428	0) {	-0.380211
-1.488979	try {	-0.124939
-0.106946	p) {	-0.492916
-0.042638	cc[]) {	-0.550907
-0.390194	2) {	-0.124939
-0.049086	(b) {	-0.492916
-1.427824	C1 {	-0.124939
-1.440617	parm2) {	-0.124939
-0.332455	4) {	-0.425969
-1.360878	c1 {	-0.124939
-0.070387	() {	-0.124939
-0.186580	8) {	-0.726999
-0.249146	r) {	-0.301030
-0.089923	r++) {	-0.425969
-0.089923	c++) {	-0.249877
-0.089923	n) {	-0.249877
-0.503834	x++) {	-0.425969
-1.194346	powN {	-0.124939
-0.503834	Bitfield {	-0.124939
-1.191136	size) {	-0.124939
-0.089923	Disp() {	-0.726999
-0.249146	CHello {	-0.602060
-0.503834	Weekdays {	-0.124939
-1.069407	5) {	-0.124939
-0.378895	11) {	-0.425969
-0.378895	main() {	-0.425969
-0.378895	C0 {	-0.425969
-1.069407	ReadTSC() {	-0.124939
-0.896549	16) {	-0.124939
-0.203461	a) {	-0.425969
-0.203461	CParent<CChild1> {	-0.124939
-0.203461	CGrandParent {	-0.425969
-0.203461	y) {	-0.124939
-0.896549	CriticalFunctionDispatch(void) {	-0.124939
-0.896549	F1() {	-0.124939
-0.203461	Func2() {	-0.425969
-0.203461	0x7FFFFFFF) {	-0.425969
-0.203461	Hello() {	-0.425969
-0.203461	(y) {	-0.124939
-0.203461	TILESIZE) {	-0.124939
-0.203461	c2++) {	-0.425969
-0.203461	r2++) {	-0.425969
-0.203461	a[SIZE][SIZE]) {	-0.425969
-0.896549	SafeArray {	-0.124939
-0.203461	b[SIZE][SIZE]) {	-0.425969
-0.896549	B2 {	-0.124939
-0.896549	Friday) {	-0.124939
-0.598777	10) {	-0.124939
-0.598777	B1 {	-0.124939
-0.598777	source) {	-0.124939
-0.598777	i) {	-0.124939
-0.598777	g() {	-0.124939
-0.598777	F0() {	-0.124939
-0.598777	powN<true,0> {	-0.124939
-0.598777	largest_abs) {	-0.124939
-0.598777	powN<true,N> {	-0.124939
-0.598777	int)size) {	-0.124939
-0.598777	Sdouble {	-0.124939
-0.598777	temp++) {	-0.124939
-0.598777	S2 {	-0.124939
-0.598777	S3 {	-0.124939
-0.598777	__try {	-0.124939
-0.598777	(true) {	-0.124939
-0.598777	CParent<CChild2> {	-0.124939
-0.598777	(n) {	-0.124939
-0.598777	1.0) {	-0.124939
-0.598777	m) {	-0.124939
-0.598777	...)) {	-0.124939
-0.598777	EXCEPTION_CONTINUE_SEARCH) {	-0.124939
-0.598777	max) {	-0.124939
-0.598777	x[]) {	-0.124939
-0.598777	Slongdouble {	-0.124939
-0.598777	n++) {	-0.124939
-0.598777	Friday)) {	-0.124939
-0.598777	Sfloat {	-0.124939
-0.598777	MathLoop() {	-0.124939
-0.598777	Size() {	-0.124939
-0.598777	N) {	-0.124939
-0.598777	xplus2() {	-0.124939
-0.598777	arraysize) {	-0.124939
-0.598777	Func() {	-0.124939
-0.598777	v.i) {	-0.124939
-0.598777	powN<true,1> {	-0.124939
-0.598777	(...) {	-0.124939
-0.598777	13) {	-0.124939
-0.598777	min)) {	-0.124939
-0.598777	SafeArray() {	-0.124939
-0.598777	bb) {	-0.124939
-0.598777	DelayFiveSeconds() {	-0.124939
-2.214212	to have	-0.221849
-2.732730	and have	-0.124939
-1.826049	that have	-0.162727
-2.113227	can have	-0.124939
-2.919531	code have	-0.124939
-1.650851	not have	-0.249877
-2.870333	compiler have	-0.124939
-1.529564	may have	-0.477121
-1.197193	you have	-0.535113
-1.592964	will have	-0.204120
-2.136515	data have	-0.124939
-2.680398	program have	-0.124939
-1.303556	functions have	-0.301030
-2.670500	only have	-0.124939
-1.779803	should have	-0.124939
-1.978257	do have	-0.124939
-0.979328	compilers have	-0.367977
-2.563772	size have	-0.124939
-2.568704	Intel have	-0.124939
-1.942534	b have	-0.124939
-2.506164	library have	-0.124939
-1.471885	also have	-0.249877
-1.324941	objects have	-0.346788
-1.310382	we have	-0.124939
-2.428444	variables have	-0.124939
-1.187130	You have	-0.903090
-1.508143	elements have	-0.124939
-1.697146	often have	-0.124939
-1.286631	libraries have	-0.124939
-2.301268	registers have	-0.124939
-1.407943	systems have	-0.124939
-2.325346	they have	-0.124939
-1.645221	even have	-0.124939
-2.228365	instructions have	-0.124939
-2.254620	processors have	-0.124939
-0.323293	I have	-0.249877
-1.039784	CPUs have	-0.124939
-2.239291	does have	-0.124939
-1.304107	must have	-0.301030
-2.178664	calculations have	-0.124939
-2.212742	versions have	-0.124939
-0.844789	doesn't have	-0.367977
-1.273902	threads have	-0.124939
-2.153409	thread have	-0.124939
-2.135661	Linux have	-0.124939
-1.412211	would have	-0.124939
-2.050891	values have	-0.124939
-2.009448	both have	-0.124939
-1.072713	typically have	-0.124939
-1.328904	preferably have	-0.124939
-1.309217	sets have	-0.124939
-0.600388	don't have	-0.669007
-1.952434	methods have	-0.124939
-1.911245	applications have	-0.124939
-1.928921	examples have	-0.124939
-1.218658	microprocessors have	-0.124939
-0.899075	operands have	-0.124939
-1.776498	languages have	-0.124939
-1.721122	sometimes have	-0.124939
-1.718506	still have	-0.124939
-1.656808	models have	-0.124939
-1.580274	might have	-0.124939
-0.899198	programmers have	-0.124939
-1.534517	diagonal have	-0.124939
-1.539862	could have	-0.124939
-1.486028	inputs have	-0.124939
-1.488709	family have	-0.124939
-1.488709	who have	-0.124939
-1.488709	misses have	-0.124939
-0.745960	They have	-0.124939
-0.424729	computers have	-0.124939
-1.366468	spots have	-0.124939
-1.193091	tools have	-0.124939
-1.193091	caches have	-0.124939
-1.070883	projects have	-0.124939
-0.124629	microcontrollers have	-0.602060
-0.897540	can't have	-0.124939
-0.897540	others have	-0.124939
-0.897540	etc.) have	-0.124939
-0.599276	Environments) have	-0.124939
-0.599276	designers have	-0.124939
-0.599276	mentations have	-0.124939
-0.599276	isolation have	-0.124939
-1.940777	of this	-0.173243
-2.210311	to this	-0.425969
-2.352279	and this	-0.221849
-1.614735	in this	-0.411245
-2.990407	The this	-0.124939
-1.937137	for this	-0.124939
-2.471116	that this	-0.124939
-2.957172	// this	-0.124939
-3.035274	it this	-0.124939
-2.001682	if this	-0.221849
-1.943758	with this	-0.124939
-2.034257	on this	-0.249877
-2.817820	as this	-0.124939
-2.245591	have this	-0.124939
-1.499701	use this	-0.182931
-1.944982	then this	-0.124939
-2.146841	from this	-0.124939
-2.714365	at this	-0.124939
-1.727794	make this	-0.124939
-1.435416	because this	-0.271067
-2.073072	If this	-0.124939
-2.619291	which this	-0.124939
-1.087471	but this	-0.393784
-1.375822	do this	-0.204120
-2.532213	using this	-0.124939
-0.887931	In this	-0.539912
-2.443968	so this	-0.124939
-2.357094	call this	-0.124939
-2.411161	For this	-0.124939
-2.362207	example, this	-0.124939
-1.447102	how this	-0.124939
-2.280097	test this	-0.124939
-2.230064	system this	-0.124939
-2.251500	cases this	-0.124939
-1.578738	want this	-0.124939
-2.190703	about this	-0.124939
-2.235291	does this	-0.124939
-0.929789	avoid this	-0.124939
-1.524225	But this	-0.124939
-2.192200	through this	-0.124939
-2.090432	support this	-0.124939
-2.076025	inline this	-0.124939
-1.103320	optimize this	-0.124939
-0.947159	However, this	-0.124939
-0.709569	replace this	-0.602060
-1.966875	like this	-0.124939
-1.943405	running this	-0.124939
-1.914281	after this	-0.124939
-1.908485	read this	-0.124939
-1.872889	shows this	-0.124939
-1.872889	improve this	-0.124939
-0.921733	reduce this	-0.301030
-1.850612	choose this	-0.124939
-1.127945	around this	-0.124939
-1.124256	supports this	-0.124939
-1.802308	change this	-0.124939
-1.750877	Unfortunately, this	-0.124939
-1.732825	outside this	-0.124939
-1.717987	prevent this	-0.124939
-1.659882	though this	-0.124939
-1.659882	during this	-0.124939
-1.622093	under this	-0.124939
-1.628090	expect this	-0.124939
-0.938272	why this	-0.124939
-0.898733	Obviously, this	-0.425969
-1.540939	implement this	-0.124939
-1.543969	stores this	-0.124939
-1.483791	systems, this	-0.124939
-1.428786	system, this	-0.124939
-1.428786	eliminate this	-0.124939
-1.361840	course, this	-0.124939
-1.361840	giving this	-0.124939
-1.288696	overcome this	-0.124939
-1.288696	122 this	-0.124939
-1.291747	install this	-0.124939
-1.285667	adds this	-0.124939
-1.285667	unfortunately this	-0.124939
-1.191786	Furthermore, this	-0.124939
-1.194837	explain this	-0.124939
-1.191786	against this	-0.124939
-1.194837	overflow, this	-0.124939
-1.191786	changing this	-0.124939
-1.069898	skip this	-0.124939
-1.069898	At this	-0.124939
-1.072971	solved this	-0.124939
-0.378962	solve this	-0.124939
-0.896879	handles this	-0.124939
-0.896879	conclude this	-0.124939
-0.598944	subtract this	-0.124939
-0.598944	underestimate this	-0.124939
-0.598944	confirmed this	-0.124939
-0.598944	Change this	-0.124939
-0.598944	reflect this	-0.124939
-1.880810	the time	-0.527426
-3.299418	is time	-0.124939
-2.347779	a time	-0.249877
-2.449598	of time	-0.124939
-3.049583	and time	-0.124939
-1.718362	The time	-0.401145
-3.044810	be time	-0.124939
-3.027476	are time	-0.124939
-3.067294	if time	-0.124939
-2.949630	This time	-0.124939
-2.245768	this time	-0.124939
-2.804907	use time	-0.124939
-1.198587	more time	-0.698970
-2.158794	from time	-0.124939
-1.592953	same time	-0.221849
-2.087882	CPU time	-0.124939
-2.713745	If time	-0.124939
-2.624420	one time	-0.124939
-1.386328	each time	-0.425969
-2.482432	also time	-0.124939
-2.480765	takes time	-0.124939
-2.408351	very time	-0.124939
-1.034938	long time	-0.249877
-2.406439	critical time	-0.124939
-1.108327	first time	-0.204120
-2.281810	these time	-0.124939
-2.301868	short time	-0.124939
-1.320244	its time	-0.124939
-1.155346	extra time	-0.124939
-1.137358	execution time	-0.249877
-1.107754	much time	-0.249877
-0.561108	compile time	-0.204120
-1.212108	calculation time	-0.301030
-2.107272	get time	-0.124939
-0.350715	every time	-0.522879
-1.322825	next time	-0.124939
-1.968043	their time	-0.124939
-1.292626	development time	-0.124939
-1.966007	conversion time	-0.124939
-1.258732	last time	-0.124939
-0.481354	longer time	-0.425969
-1.890286	Each time	-0.124939
-1.896875	load time	-0.124939
-1.783225	installation time	-0.124939
-0.546733	response time	-0.346788
-1.744158	No time	-0.124939
-1.720784	particularly time	-0.124939
-1.716380	save time	-0.124939
-1.686392	so-called time	-0.124939
-1.665046	during time	-0.124939
-0.745837	spend time	-0.124939
-1.430963	measured time	-0.124939
-0.678890	biggest time	-0.425969
-1.294015	consume time	-0.124939
-1.289401	Development time	-0.124939
-1.194792	regular time	-0.124939
-1.197105	reproducible time	-0.124939
-1.074492	Execution time	-0.124939
-1.072166	Extra time	-0.124939
-1.074492	annoying time	-0.124939
-0.379269	compile- time	-0.124939
-0.898401	computation time	-0.124939
-0.898401	Every time	-0.124939
-0.898401	Returns time	-0.124939
-0.898401	ment time	-0.124939
-0.599709	real time	-0.124939
-0.599709	saves time	-0.124939
-0.599709	Read time	-0.124939
-0.599709	Coarse time	-0.124939
-0.599709	exact time	-0.124939
-2.471333	the use	-1.204120
-3.257487	a use	-0.124939
-1.351675	to use	-0.483112
-2.374698	and use	-0.221849
-2.369626	The use	-0.726999
-3.080517	for use	-0.124939
-2.147067	that use	-0.204120
-1.495398	can use	-0.403692
-2.485225	or use	-0.124939
-1.507016	not use	-0.263241
-1.530694	may use	-0.234083
-2.314512	you use	-0.124939
-1.684264	will use	-0.221849
-1.449564	then use	-0.249877
-2.724357	make use	-0.124939
-2.683929	CPU use	-0.124939
-2.614333	all use	-0.124939
-2.020930	cache use	-0.124939
-1.781504	should use	-0.124939
-2.596058	do use	-0.124939
-2.544344	example use	-0.124939
-1.987080	compilers use	-0.425969
-2.482432	also use	-0.124939
-2.521291	efficient use	-0.124939
-2.438571	any use	-0.124939
-2.471005	we use	-0.124939
-2.441657	variables use	-0.124939
-1.754325	cannot use	-0.124939
-2.381640	example, use	-0.124939
-2.322923	often use	-0.124939
-2.330726	libraries use	-0.124939
-1.657747	systems use	-0.124939
-1.382595	always use	-0.124939
-1.621109	operations use	-0.124939
-2.256272	available use	-0.124939
-1.564018	CPUs use	-0.124939
-1.556058	must use	-0.124939
-2.182347	threads use	-0.124939
-2.157241	integers use	-0.124939
-1.480693	classes use	-0.425969
-2.055091	well use	-0.124939
-0.931304	programs use	-0.124939
-1.324664	typically use	-0.124939
-1.957374	never use	-0.124939
-1.921162	better use	-0.124939
-1.255490	applications use	-0.124939
-1.780995	languages use	-0.124939
-1.662781	containers use	-0.124939
-1.653837	full use	-0.124939
-0.940727	resource use	-0.124939
-1.634125	implementations use	-0.124939
-1.585865	programmers use	-0.124939
-1.542384	Don't use	-0.124939
-1.486690	DLL use	-0.124939
-1.440143	Alternatively, use	-0.124939
-1.194792	explicit use	-0.124939
-0.503738	normally use	-0.124939
-1.072166	mean use	-0.124939
-1.072166	To use	-0.124939
-0.203649	(May use	-0.425969
-0.898401	CPUs: use	-0.124939
-0.898401	(2) use	-0.124939
-0.898401	Excessive use	-0.124939
-0.898401	DLLs use	-0.124939
-0.898401	machines use	-0.124939
-0.898401	Java, use	-0.124939
-0.599709	thenaandbcannot use	-0.124939
-0.599709	entries use	-0.124939
-0.599709	(Both use	-0.124939
-0.599709	Subtractions use	-0.124939
-3.142158	the more	-0.124939
-1.658137	is more	-0.458969
-2.540823	a more	-0.221849
-2.164811	and more	-0.182931
-2.655492	in more	-0.124939
-2.250308	for more	-0.124939
-2.289747	be more	-0.249877
-2.086771	are more	-0.204120
-1.534445	or more	-0.187087
-3.056224	it more	-0.124939
-2.450498	by more	-0.124939
-2.421872	with more	-0.124939
-1.756766	code more	-0.191886
-2.019810	have more	-0.124939
-2.226747	use more	-0.124939
-1.597086	A more	-0.124939
-2.729772	at more	-0.124939
-2.136515	data more	-0.124939
-2.680398	program more	-0.124939
-2.703699	make more	-0.124939
-2.665614	used more	-0.124939
-2.602232	one more	-0.124939
-1.603460	no more	-0.249877
-1.735479	do more	-0.124939
-1.320142	takes more	-0.522879
-2.413377	some more	-0.124939
-2.377783	software more	-0.124939
-2.397031	elements more	-0.124939
-2.414876	For more	-0.124939
-0.963144	take more	-0.550907
-1.697146	often more	-0.124939
-2.313296	optimization more	-0.124939
-1.645221	even more	-0.124939
-2.250969	up more	-0.124939
-2.197809	calls more	-0.124939
-0.655601	much more	-0.279841
-1.480952	therefore more	-0.124939
-2.124380	works more	-0.124939
-2.115251	calculated more	-0.124939
-2.130271	calculation more	-0.124939
-1.039506	uses more	-0.124939
-2.098459	get more	-0.124939
-2.064011	few more	-0.124939
-1.369182	cycles more	-0.425969
-1.994930	was more	-0.124939
-1.059391	caching more	-0.301030
-1.952434	development more	-0.124939
-0.965100	becomes more	-0.124939
-1.871736	actually more	-0.124939
-1.891997	load more	-0.124939
-1.868223	calling more	-0.124939
-1.843377	made more	-0.124939
-1.832899	require more	-0.124939
-1.797642	go more	-0.124939
-1.771313	become more	-0.124939
-0.849462	gives more	-0.301030
-1.773898	inlining more	-0.124939
-1.761710	find more	-0.124939
-1.718506	output more	-0.124939
-1.043668	sometimes more	-0.425969
-1.656808	possibly more	-0.124939
-0.976721	little more	-0.124939
-0.646269	allocate more	-0.124939
-0.325361	slightly more	-0.346788
-1.486028	once more	-0.124939
-1.433414	spend more	-0.124939
-1.369182	comparisons more	-0.124939
-1.366468	occurs more	-0.124939
-1.369182	prefetch more	-0.124939
-1.292732	consume more	-0.124939
-0.600388	becoming more	-0.124939
-1.287286	ever more	-0.124939
-1.070883	programs, more	-0.124939
-1.070883	allocating more	-0.124939
-1.070883	certainly more	-0.124939
-0.897540	invest more	-0.124939
-0.897540	dynamic_cast more	-0.124939
-0.897540	achieved more	-0.124939
-0.897540	cached more	-0.124939
-0.897540	somewhat more	-0.124939
-0.599276	sample more	-0.124939
-0.599276	implies more	-0.124939
-0.599276	40% more	-0.124939
-2.788671	of when	-0.124939
-1.994655	or when	-0.346788
-2.440688	function when	-0.425969
-2.779131	on when	-0.124939
-2.362292	code when	-0.124939
-2.692897	as when	-0.124939
-2.056239	than when	-0.124939
-2.757517	compiler when	-0.124939
-2.920970	x when	-0.124939
-1.725672	time when	-0.346788
-2.115816	memory when	-0.124939
-2.571112	program when	-0.124939
-1.091759	only when	-0.191886
-1.798900	used when	-0.124939
-1.968848	set when	-0.124939
-2.497781	do when	-0.124939
-1.930429	example when	-0.124939
-2.486223	size when	-0.124939
-2.500701	b when	-0.124939
-1.890076	number when	-0.124939
-2.567849	there when	-0.124939
-2.380354	also when	-0.124939
-1.463041	efficient when	-0.124939
-2.419017	possible when	-0.124939
-2.295490	2 when	-0.124939
-1.755526	faster when	-0.124939
-1.721890	called when	-0.425969
-2.358656	critical when	-0.124939
-2.317493	example, when	-0.124939
-2.283043	first when	-0.124939
-2.256714	libraries when	-0.124939
-2.246825	registers when	-0.124939
-2.228685	test when	-0.124939
-2.224954	systems when	-0.124939
-1.238601	useful when	-0.249877
-0.891216	even when	-0.249877
-2.183848	file when	-0.124939
-2.197379	bits when	-0.124939
-1.605328	operations when	-0.124939
-2.215375	cases when	-0.124939
-2.181688	times when	-0.124939
-2.164993	stack when	-0.124939
-2.240184	want when	-0.124939
-2.136964	work when	-0.124939
-2.159028	compiled when	-0.124939
-2.117220	best when	-0.124939
-2.149043	necessary when	-0.124939
-2.109616	language when	-0.124939
-2.144936	But when	-0.124939
-1.470627	matrix when	-0.124939
-1.467774	precision when	-0.425969
-2.070997	line when	-0.124939
-2.097375	advantageous when	-0.124939
-1.429394	problem when	-0.124939
-2.042423	calculate when	-0.124939
-1.988588	counter when	-0.124939
-2.001030	files when	-0.124939
-1.982120	allocation when	-0.124939
-1.950524	programs when	-0.124939
-1.971067	problems when	-0.124939
-1.299334	automatically when	-0.425969
-1.951691	implementation when	-0.124939
-1.289107	signed when	-0.124939
-1.953056	disadvantage when	-0.124939
-1.921284	running when	-0.124939
-1.920670	end when	-0.124939
-1.857998	expressions when	-0.124939
-1.854625	directives when	-0.124939
-1.867691	optimizations when	-0.124939
-1.867691	microprocessors when	-0.124939
-1.815471	process when	-0.124939
-1.856333	advantages when	-0.124939
-1.828669	results when	-0.124939
-1.783409	modules when	-0.124939
-0.874367	relevant when	-0.124939
-1.099225	dynamically when	-0.425969
-1.772281	avoided when	-0.124939
-1.735034	inefficient when	-0.124939
-1.730497	comes when	-0.124939
-1.691600	task when	-0.124939
-1.656761	obtained when	-0.124939
-0.972313	counters when	-0.124939
-0.723099	efficiently when	-0.301030
-0.685310	initialized when	-0.602060
-1.618972	especially when	-0.124939
-1.604920	message when	-0.124939
-1.618972	except when	-0.124939
-1.577580	CriticalFunction when	-0.124939
-1.572845	penalty when	-0.124939
-1.563527	invalid when	-0.124939
-1.475935	parallelism when	-0.124939
-0.273518	account when	-0.346788
-1.471251	however, when	-0.124939
-0.548675	fragmented when	-0.301030
-1.471251	inputs when	-0.124939
-1.475935	preferred when	-0.124939
-1.471251	explicitly when	-0.124939
-0.745176	resolved when	-0.425969
-1.413259	Likewise, when	-0.124939
-1.422678	itself when	-0.124939
-1.417943	serious when	-0.124939
-1.413259	Studio when	-0.124939
-0.743180	disadvantages when	-0.124939
-1.413259	update when	-0.124939
-1.417943	collection when	-0.124939
-1.355731	costly when	-0.124939
-1.355731	truncation when	-0.124939
-1.350996	happens when	-0.124939
-1.350996	places when	-0.124939
-0.265107	static, when	-0.726999
-0.424851	deallocated when	-0.301030
-1.355731	permissible when	-0.124939
-1.281336	strict when	-0.124939
-1.281336	compromise when	-0.124939
-1.276550	increased when	-0.124939
-1.276550	understand when	-0.124939
-1.184426	dramatic when	-0.124939
-1.189267	force when	-0.124939
-1.184426	simpler when	-0.124939
-1.064328	deleted when	-0.124939
-1.064328	manually when	-0.124939
-1.064328	-fno-pic when	-0.124939
-1.064328	Vec16s when	-0.124939
-1.064328	readable when	-0.124939
-1.064328	negligible when	-0.124939
-1.064328	allocating when	-0.124939
-1.064328	Func1 when	-0.124939
-0.378201	implicitly when	-0.124939
-1.064328	evicted when	-0.124939
-0.893132	freed when	-0.124939
-0.893132	0's when	-0.124939
-0.893132	bypassed when	-0.124939
-0.893132	achieved when	-0.124939
-0.893132	careful when	-0.124939
-0.893132	indices when	-0.124939
-0.893132	Useful when	-0.124939
-0.893132	question when	-0.124939
-0.893132	precisions when	-0.124939
-0.893132	1's when	-0.124939
-0.893132	stronger when	-0.124939
-0.893132	float's when	-0.124939
-0.597052	exits, when	-0.124939
-0.597052	popularity when	-0.124939
-0.597052	processor) when	-0.124939
-0.597052	Eclipse when	-0.124939
-0.597052	disappears when	-0.124939
-0.597052	33% when	-0.124939
-0.597052	released when	-0.124939
-0.597052	decreased when	-0.124939
-0.597052	definitions when	-0.124939
-2.527373	of A	-0.425969
-2.226142	= A	-0.301030
-2.721943	compiler A	-0.124939
-1.778214	} A	-0.124939
-2.093508	data A	-0.124939
-2.060057	functions A	-0.124939
-2.548134	loop A	-0.124939
-1.914532	double A	-0.425969
-2.224260	code. A	-0.124939
-0.931970	time. A	-0.124939
-2.240737	pointers A	-0.124939
-1.544573	function. A	-0.425969
-0.831137	functions. A	-0.204120
-2.073283	b; A	-0.124939
-1.998790	memory. A	-0.124939
-1.112078	used. A	-0.124939
-1.931954	sets A	-0.124939
-1.282628	systems. A	-0.124939
-1.919777	conversion A	-0.124939
-0.845349	data. A	-0.124939
-1.864259	set. A	-0.124939
-1.834398	processors. A	-0.124939
-1.800590	storage A	-0.124939
-1.810469	called. A	-0.124939
-1.772272	pointer. A	-0.124939
-1.697928	variables. A	-0.124939
-1.687951	local A	-0.124939
-1.655766	it. A	-0.124939
-1.655766	registers. A	-0.124939
-1.660910	mode. A	-0.124939
-1.655766	object. A	-0.124939
-1.655766	library. A	-0.124939
-1.655766	operations. A	-0.124939
-1.650683	optimization. A	-0.124939
-1.004299	performance. A	-0.124939
-1.626148	libraries. A	-0.124939
-1.631353	stack. A	-0.124939
-1.621004	possible. A	-0.124939
-1.631353	thread. A	-0.124939
-1.593565	instructions. A	-0.124939
-1.593565	way. A	-0.124939
-0.935184	well. A	-0.124939
-1.552172	address. A	-0.124939
-1.517016	destructors A	-0.124939
-1.511683	enabled. A	-0.124939
-0.850340	to. A	-0.124939
-0.849186	block. A	-0.124939
-1.517016	critical. A	-0.124939
-1.517016	usability A	-0.124939
-1.465863	file. A	-0.124939
-1.471263	arithmetic A	-0.124939
-1.471263	devices A	-0.124939
-0.547656	space. A	-0.124939
-1.460530	zero. A	-0.124939
-1.460530	software. A	-0.124939
-1.465863	vectors. A	-0.124939
-1.407871	parameters. A	-0.124939
-0.742353	important. A	-0.124939
-1.407871	switches A	-0.124939
-0.741196	disk. A	-0.124939
-1.413271	case. A	-0.124939
-1.413271	polymorphism A	-0.124939
-1.413271	speed. A	-0.124939
-1.346324	call. A	-0.124939
-0.675407	prediction. A	-0.124939
-1.346324	members. A	-0.124939
-1.346324	above. A	-0.124939
-1.351791	result. A	-0.124939
-1.346324	chain. A	-0.124939
-1.351791	times. A	-0.124939
-1.351791	counter. A	-0.124939
-1.351791	branches. A	-0.124939
-1.351791	cores. A	-0.124939
-1.272610	profiler. A	-0.124939
-1.278147	Templates A	-0.124939
-1.278147	consuming. A	-0.124939
-1.272610	method. A	-0.124939
-1.272610	tools. A	-0.124939
-1.278147	operation. A	-0.124939
-1.278147	factor. A	-0.124939
-1.272610	process. A	-0.124939
-1.272610	necessary. A	-0.124939
-1.272610	elimination A	-0.124939
-1.278147	elements. A	-0.124939
-1.278147	doubled. A	-0.124939
-1.272610	here: A	-0.124939
-1.181237	exception. A	-0.124939
-1.186846	initialization. A	-0.124939
-1.181237	Graphics A	-0.124939
-1.186846	index. A	-0.124939
-1.186846	lines. A	-0.124939
-1.181237	conditions. A	-0.124939
-1.181237	example. A	-0.124939
-0.501640	expensive. A	-0.124939
-1.181237	versions. A	-0.124939
-1.181237	tasks. A	-0.124939
-1.181237	interface. A	-0.124939
-1.181237	floats A	-0.124939
-1.186846	sequentially A	-0.124939
-1.061907	chains. A	-0.124939
-1.061907	motion A	-0.124939
-1.061907	collection. A	-0.124939
-1.061907	int. A	-0.124939
-1.061907	reference. A	-0.124939
-1.061907	prone. A	-0.124939
-0.891498	Unions A	-0.124939
-0.891498	subexpression. A	-0.124939
-0.891498	differently. A	-0.124939
-0.891498	course. A	-0.124939
-0.891498	37 A	-0.124939
-0.891498	date. A	-0.124939
-0.891498	78). A	-0.124939
-0.202947	debugging. A	-0.124939
-0.891498	load. A	-0.124939
-0.891498	below). A	-0.124939
-0.891498	enough. A	-0.124939
-0.891498	correctly. A	-0.124939
-0.891498	VectorC A	-0.124939
-0.891498	linking. A	-0.124939
-0.891498	incompatible. A	-0.124939
-0.891498	counters. A	-0.124939
-0.891498	CPUs". A	-0.124939
-0.891498	117 A	-0.124939
-0.596225	blocks. A	-0.124939
-0.596225	so). A	-0.124939
-0.596225	moved. A	-0.124939
-0.596225	body. A	-0.124939
-0.596225	mispredicted. A	-0.124939
-0.596225	owns. A	-0.124939
-0.596225	investment. A	-0.124939
-0.596225	153. A	-0.124939
-0.596225	constructors. A	-0.124939
-0.596225	107. A	-0.124939
-0.596225	targets. A	-0.124939
-0.596225	7.32b. A	-0.124939
-0.596225	constructor. A	-0.124939
-0.596225	sub-vector. A	-0.124939
-0.596225	microarchitecture. A	-0.124939
-0.596225	("hidden")))". A	-0.124939
-0.596225	driver. A	-0.124939
-0.596225	Sutter: A	-0.124939
-0.596225	(MFC). A	-0.124939
-0.596225	restrictions. A	-0.124939
-0.596225	sizeof(list)); A	-0.124939
-0.596225	form. A	-0.124939
-0.596225	developed. A	-0.124939
-0.596225	Loops: A	-0.124939
-0.596225	resolution. A	-0.124939
-0.596225	(Not A	-0.124939
-0.596225	complications. A	-0.124939
-0.596225	lists. A	-0.124939
-0.596225	2GHz A	-0.124939
-0.596225	(WTL). A	-0.124939
-0.596225	46 A	-0.124939
-0.596225	considered. A	-0.124939
-0.596225	scope. A	-0.124939
-0.596225	infinity. A	-0.124939
-0.596225	_mm_free. A	-0.124939
-0.596225	taken. A	-0.124939
-0.596225	says. A	-0.124939
-0.596225	(www.agner.org/optimize/testp.zip). A	-0.124939
-0.596225	changes. A	-0.124939
-0.596225	Basic. A	-0.124939
-0.596225	exploited. A	-0.124939
-0.596225	servicing. A	-0.124939
-0.596225	inferior. A	-0.124939
-0.596225	types. A	-0.124939
-0.596225	destructor. A	-0.124939
-0.596225	reason. A	-0.124939
-0.596225	set?". A	-0.124939
-0.596225	interval. A	-0.124939
-0.596225	Scheduling A	-0.124939
-0.596225	138 A	-0.124939
-0.596225	needed? A	-0.124939
-0.596225	seconds. A	-0.124939
-2.887166	a will	-0.124939
-2.923201	and will	-0.124939
-2.936011	// will	-0.124939
-1.555702	it will	-0.316824
-2.109594	function will	-0.124939
-1.588831	code will	-0.221849
-1.506686	This will	-0.263241
-1.252766	compiler will	-0.209260
-1.514188	you will	-0.234083
-2.000986	this will	-0.124939
-1.787759	It will	-0.124939
-2.139527	memory will	-0.124939
-1.624278	program will	-0.346788
-2.636198	CPU will	-0.124939
-1.676841	loop will	-0.425969
-1.661041	which will	-0.124939
-2.630470	but will	-0.124939
-2.009031	cache will	-0.124939
-2.548966	integer will	-0.124939
-2.528693	class will	-0.124939
-0.864904	compilers will	-0.338819
-1.938079	b will	-0.124939
-2.483284	library will	-0.124939
-2.542882	i will	-0.124939
-2.580033	there will	-0.124939
-2.558534	There will	-0.124939
-2.447221	array will	-0.124939
-2.475126	value will	-0.124939
-2.456051	objects will	-0.124939
-1.423552	we will	-0.124939
-2.437587	so will	-0.124939
-2.410789	variables will	-0.124939
-2.481012	You will	-0.124939
-1.772688	branch will	-0.124939
-2.382540	elements will	-0.124939
-2.267862	systems will	-0.124939
-1.655331	user will	-0.124939
-2.240047	16 will	-0.124939
-1.618384	file will	-0.124939
-2.256448	programming will	-0.124939
-2.243606	processors will	-0.124939
-1.332019	I will	-0.124939
-2.163761	calculations will	-0.124939
-1.294291	result will	-0.602060
-2.167976	processor will	-0.124939
-2.160842	threads will	-0.124939
-2.146777	language will	-0.124939
-2.169505	speed will	-0.124939
-1.260629	thread will	-0.124939
-2.138096	overflow will	-0.124939
-2.107150	line will	-0.124939
-2.128222	manual will	-0.124939
-2.107173	b; will	-0.124939
-2.058852	operators will	-0.124939
-1.973273	operator will	-0.124939
-1.964493	multiplication will	-0.124939
-1.981202	caching will	-0.124939
-1.939849	model will	-0.124939
-1.980090	y will	-0.124939
-1.921748	examples will	-0.124939
-1.848120	feature will	-0.124939
-1.796677	core will	-0.124939
-1.784348	section will	-0.124939
-1.799815	contentions will	-0.124939
-1.124799	main will	-0.124939
-1.070441	operation will	-0.124939
-1.733040	vectorization will	-0.124939
-1.715495	constants will	-0.124939
-1.677056	macro will	-0.124939
-1.648548	counters will	-0.124939
-1.617104	condition will	-0.124939
-1.617104	cores will	-0.124939
-1.578919	F1 will	-0.124939
-1.575711	stride will	-0.124939
-1.427248	trick will	-0.124939
-1.433784	instances will	-0.124939
-1.424017	127 will	-0.124939
-1.430504	linker will	-0.124939
-1.363557	break will	-0.124939
-1.360301	users will	-0.124939
-1.284376	Compilers will	-0.124939
-1.287656	18 will	-0.124939
-1.190746	loader will	-0.124939
-1.190746	manager will	-0.124939
-1.194051	17 will	-0.124939
-0.503794	0x2710 will	-0.425969
-1.069112	14.28 will	-0.124939
-1.069112	14.30 will	-0.124939
-1.069112	0x273F will	-0.124939
-1.072443	3.5 will	-0.124939
-0.896351	c+b will	-0.124939
-0.896351	disabled will	-0.124939
-0.896351	today will	-0.124939
-0.896351	modifier will	-0.124939
-0.896351	b+c will	-0.124939
-0.598678	OneOrTwo5[b!=0]; will	-0.124939
-0.598678	103) will	-0.124939
-0.598678	producer will	-0.124939
-0.598678	and) will	-0.124939
-0.598678	b++; will	-0.124939
-0.598678	b*2.0/3.0 will	-0.124939
-2.955027	The }	-0.124939
-2.939409	function }	-0.124939
-0.720819	} }	-0.496550
-2.380171	elements }	-0.124939
-1.328135	0; }	-0.249877
-2.158816	element }	-0.124939
-0.655693	i; }	-0.425969
-1.448347	b; }	-0.124939
-0.754306	... }	-0.301030
-1.971614	operator }	-0.124939
-0.342444	1; }	-0.393784
-1.962611	multiplication }	-0.124939
-1.291350	c; }	-0.124939
-1.920461	zero }	-0.124939
-1.908085	lookup }	-0.124939
-1.868577	x; }	-0.124939
-0.243952	2; }	-0.263241
-1.826956	range }	-0.124939
-1.789285	5 }	-0.124939
-1.789285	positive }	-0.124939
-1.698866	exponent }	-0.124939
-1.622967	temp; }	-0.124939
-1.626315	f; }	-0.124939
-0.325032	3; }	-0.221849
-1.535817	y; }	-0.124939
-1.539165	factorial }	-0.124939
-0.490978	x); }	-0.124939
-1.359726	1.0; }	-0.124939
-0.345624	1.; }	-0.124939
-1.287266	nonzero }	-0.124939
-1.283892	C; }	-0.124939
-1.197184	break; }	-0.124939
-1.190356	134 }	-0.124939
-0.503057	2.0; }	-0.124939
-0.503057	1.0f; }	-0.124939
-1.190356	i_div_3; }	-0.124939
-1.190356	N1 }	-0.124939
-0.124550	const*)p); }	-0.602060
-1.068818	Z }	-0.124939
-0.124550	a); }	-0.301030
-0.896154	p->Hello(); }	-0.124939
-0.896154	cos(x); }	-0.124939
-0.896154	C1::f }	-0.124939
-0.896154	sum; }	-0.124939
-0.896154	2.0f; }	-0.124939
-0.896154	range"; }	-0.124939
-0.896154	clock; }	-0.124939
-0.896154	-0 }	-0.124939
-0.203421	F2(b); }	-0.425969
-0.896154	FuncC(i); }	-0.124939
-0.896154	FuncA(i); }	-0.124939
-0.896154	parm2); }	-0.124939
-0.896154	x^n }	-0.124939
-0.203421	F1(a); }	-0.425969
-0.203421	&CriticalFunction_386; }	-0.124939
-0.203421	DoThisThreeTimesAWeek(); }	-0.124939
-0.896154	sin(x); }	-0.124939
-0.203421	a.store(aa+i); }	-0.425969
-0.203421	&CriticalFunction_SSE2; }	-0.124939
-0.896154	Induction++; }	-0.124939
-0.203421	cc); }	-0.124939
-0.203421	swapd(a[r2][c2],a[c2][r2]); }	-0.425969
-0.896154	EMMS }	-0.124939
-0.896154	sizeof(a)); }	-0.124939
-0.203421	&CriticalFunction_AVX; }	-0.124939
-0.896154	109 }	-0.124939
-0.598578	pow(x,10); }	-0.124939
-0.598578	sums }	-0.124939
-0.598578	before) }	-0.124939
-0.598578	powN<true,N/2>::p(x); }	-0.124939
-0.598578	list[j].c; }	-0.124939
-0.598578	cc[i]); }	-0.124939
-0.598578	x10; }	-0.124939
-0.598578	i/2; }	-0.124939
-0.598578	abs(v.f) }	-0.124939
-0.598578	c[i]); }	-0.124939
-0.598578	FuncB(i); }	-0.124939
-0.598578	IntegerPower<10>(x); }	-0.124939
-0.598578	(static_cast<MyChild*>(this))->Disp(); }	-0.124939
-0.598578	b[r][c]; }	-0.124939
-0.598578	printf(Greek[n]); }	-0.124939
-0.598578	CFALSE; }	-0.124939
-0.598578	DTRUE; }	-0.124939
-0.598578	Func(a[i]); }	-0.124939
-0.598578	timediff[i]); }	-0.124939
-0.598578	transpose(matrix); }	-0.124939
-0.598578	F1(); }	-0.124939
-0.598578	return; }	-0.124939
-0.598578	Func(ab[i].a); }	-0.124939
-0.598578	69 }	-0.124939
-0.598578	list[x]; }	-0.124939
-0.598578	N; }	-0.124939
-0.598578	b[r][c]); }	-0.124939
-0.598578	_mm_cvtss_f32(s); }	-0.124939
-0.598578	&list[8]); }	-0.124939
-0.598578	i[2]; }	-0.124939
-0.598578	N-1)==0,N>::p(x); }	-0.124939
-0.598578	104 }	-0.124939
-0.598578	FuncC(i+1); }	-0.124939
-0.598578	a[i+3]; }	-0.124939
-0.598578	i++; }	-0.124939
-0.598578	*(T*)0; }	-0.124939
-0.598578	111 }	-0.124939
-0.598578	9; }	-0.124939
-2.821710	is then	-0.124939
-1.996264	and then	-0.166331
-2.342992	are then	-0.124939
-2.343485	can then	-0.124939
-2.832741	function then	-0.124939
-2.147371	code then	-0.301030
-1.980170	time then	-0.301030
-2.638263	more then	-0.124939
-2.614305	will then	-0.124939
-2.582684	memory then	-0.124939
-2.088835	program then	-0.124939
-1.688144	functions then	-0.124939
-2.441620	other then	-0.124939
-2.565102	loop then	-0.124939
-2.527321	which then	-0.124939
-2.500986	cache then	-0.124939
-2.585125	should then	-0.124939
-1.965437	set then	-0.124939
-2.520647	compilers then	-0.124939
-2.475750	size then	-0.124939
-1.927375	pointer then	-0.124939
-2.415761	library then	-0.124939
-2.361657	two then	-0.124939
-2.419285	object then	-0.124939
-2.506135	number then	-0.124939
-2.418486	static then	-0.124939
-1.528878	2 then	-0.301030
-2.350595	performance then	-0.124939
-2.338171	elements then	-0.124939
-2.310101	example, then	-0.124939
-2.239254	registers then	-0.124939
-2.196968	case then	-0.124939
-1.581088	available then	-0.124939
-2.209361	up then	-0.124939
-2.155199	error then	-0.124939
-1.567226	times then	-0.124939
-2.119995	large then	-0.124939
-2.164536	must then	-0.124939
-2.118216	calculations then	-0.124939
-2.150747	execution then	-0.124939
-2.167975	result then	-0.124939
-2.128501	bytes then	-0.124939
-2.143860	necessary then	-0.124939
-2.122639	element then	-0.124939
-2.070021	etc. then	-0.124939
-2.125700	exception then	-0.124939
-2.074971	small then	-0.124939
-2.095393	option then	-0.124939
-2.063975	line then	-0.124939
-2.077319	works then	-0.124939
-1.442535	parameters then	-0.124939
-2.093608	advantageous then	-0.124939
-2.058831	problem then	-0.124939
-2.081373	known then	-0.124939
-2.054459	support then	-0.124939
-2.042154	structure then	-0.124939
-2.015002	values then	-0.124939
-1.983185	cycles then	-0.124939
-2.000761	... then	-0.124939
-1.983185	counter then	-0.124939
-1.961921	dispatching then	-0.124939
-1.937427	application then	-0.124939
-1.925365	automatically then	-0.124939
-1.925365	handling then	-0.124939
-1.912703	methods then	-0.124939
-1.917119	block then	-0.124939
-1.899390	high then	-0.124939
-1.889875	dispatcher then	-0.124939
-1.889875	addition then	-0.124939
-1.824670	vectors then	-0.124939
-1.792578	function, then	-0.124939
-0.897751	range then	-0.301030
-1.777109	core then	-0.124939
-1.772407	references then	-0.124939
-1.758596	supports then	-0.124939
-1.094506	index then	-0.124939
-1.722752	code, then	-0.124939
-1.752091	instance then	-0.124939
-1.708793	vectorization then	-0.124939
-1.717098	efficiency then	-0.124939
-1.007470	time, then	-0.425969
-1.635403	models then	-0.124939
-1.640264	changed then	-0.124939
-0.971659	execute then	-0.124939
-1.602475	resource then	-0.124939
-1.551415	compiler, then	-0.124939
-0.643069	module then	-0.124939
-0.894579	used, then	-0.124939
-1.561082	stride then	-0.124939
-1.561082	set, then	-0.124939
-1.520240	independent then	-0.124939
-1.540479	near then	-0.124939
-1.515325	chains then	-0.124939
-1.469088	integer, then	-0.124939
-1.469088	once then	-0.124939
-0.802969	cycles, then	-0.124939
-1.474060	volatile then	-0.124939
-0.742849	object, then	-0.425969
-1.416068	changes then	-0.124939
-1.411096	integers, then	-0.124939
-1.354151	predictable then	-0.124939
-1.349121	debugger then	-0.124939
-0.597784	on, then	-0.425969
-1.285207	swapped then	-0.124939
-1.274970	flag then	-0.124939
-1.280058	true, then	-0.124939
-1.183148	pipeline then	-0.124939
-1.188297	n, then	-0.124939
-0.501939	not, then	-0.124939
-1.183148	changing then	-0.124939
-1.183148	manner then	-0.124939
-1.063358	numbers, then	-0.124939
-1.063358	slow, then	-0.124939
-0.378068	basis then	-0.425969
-1.063358	elsewhere then	-0.124939
-1.063358	way, then	-0.124939
-1.063358	efficiency, then	-0.124939
-1.063358	arrays, then	-0.124939
-1.068569	false, then	-0.124939
-0.378068	sum, then	-0.124939
-0.378068	double, then	-0.124939
-0.892477	vacant then	-0.124939
-0.892477	priorities then	-0.124939
-0.892477	GHz then	-0.124939
-0.892477	differ then	-0.124939
-0.203047	first, then	-0.124939
-0.892477	below) then	-0.124939
-0.892477	hyperthreading, then	-0.124939
-0.892477	loops, then	-0.124939
-0.892477	container, then	-0.124939
-0.892477	support, then	-0.124939
-0.892477	segment then	-0.124939
-0.892477	limit, then	-0.124939
-0.596721	today, then	-0.124939
-0.596721	FuncB, then	-0.124939
-0.596721	meaning, then	-0.124939
-0.596721	identified, then	-0.124939
-0.596721	dispatching, then	-0.124939
-0.596721	found, then	-0.124939
-0.596721	9.10, then	-0.124939
-0.596721	fine then	-0.124939
-0.596721	so, then	-0.124939
-0.596721	other, then	-0.124939
-0.596721	row-wise, then	-0.124939
-0.596721	T+5, then	-0.124939
-0.596721	predictable, then	-0.124939
-0.596721	made) then	-0.124939
-0.596721	algorithm, then	-0.124939
-0.596721	ignore, then	-0.124939
-0.596721	y?" then	-0.124939
-0.596721	obvious, then	-0.124939
-0.596721	RTTI then	-0.124939
-0.596721	10000, then	-0.124939
-0.596721	only, then	-0.124939
-0.596721	small, then	-0.124939
-0.596721	231 then	-0.124939
-0.596721	C2, then	-0.124939
-0.596721	met then	-0.124939
-0.596721	18, then	-0.124939
-0.596721	d+e, then	-0.124939
-2.781279	// It	-0.124939
-2.155434	} It	-0.425969
-2.561988	functions It	-0.124939
-2.406704	object It	-0.124939
-2.237578	libraries It	-0.124939
-1.277216	code. It	-0.726999
-1.159158	time. It	-0.346788
-2.242100	pointers It	-0.124939
-2.204554	does It	-0.124939
-2.138818	arrays It	-0.124939
-2.060152	etc. It	-0.124939
-1.192658	functions. It	-0.301030
-1.127718	memory. It	-0.301030
-0.841954	used. It	-0.522879
-1.915803	systems. It	-0.124939
-1.909285	data. It	-0.124939
-1.865096	set. It	-0.124939
-1.835325	processors. It	-0.124939
-1.811210	called. It	-0.124939
-1.792704	CPUs. It	-0.124939
-1.777945	compiler. It	-0.124939
-1.753122	are: It	-0.124939
-1.772914	loop. It	-0.124939
-0.598319	pointer. It	-0.346788
-1.741552	cases. It	-0.124939
-1.698764	variables. It	-0.124939
-1.698838	class. It	-0.124939
-1.678584	database It	-0.124939
-1.698838	calls. It	-0.124939
-1.656408	registers. It	-0.124939
-1.656408	object. It	-0.124939
-1.656408	library. It	-0.124939
-1.008979	calculations. It	-0.425969
-1.661500	cycles. It	-0.124939
-1.656408	operations. It	-0.124939
-1.651375	optimization. It	-0.124939
-1.656408	performance. It	-0.124939
-1.631891	thread. It	-0.124939
-1.621646	structures It	-0.124939
-1.594102	vector. It	-0.124939
-1.547557	references. It	-0.124939
-1.557924	CPU. It	-0.124939
-1.557924	Windows. It	-0.124939
-0.598206	integers. It	-0.301030
-0.850419	to. It	-0.425969
-1.517444	critical. It	-0.124939
-1.522787	available. It	-0.124939
-1.506952	executed. It	-0.124939
-1.522787	faster. It	-0.124939
-0.849278	problems. It	-0.425969
-1.506952	parameter. It	-0.124939
-1.512167	expressions. It	-0.124939
-1.471635	value. It	-0.124939
-1.471635	system. It	-0.124939
-1.461014	arrays. It	-0.124939
-1.461014	branch. It	-0.124939
-0.800411	space. It	-0.124939
-1.461014	zero. It	-0.124939
-1.461014	software. It	-0.124939
-1.413643	solution. It	-0.124939
-1.408300	Linux. It	-0.124939
-1.408300	language. It	-0.124939
-1.413643	is. It	-0.124939
-1.419052	automatically. It	-0.124939
-0.741275	core. It	-0.425969
-1.424530	vectorization. It	-0.124939
-1.408300	anyway. It	-0.124939
-1.419052	do. It	-0.124939
-1.346696	constant. It	-0.124939
-1.346696	this. It	-0.124939
-0.676620	here. It	-0.124939
-1.346696	members. It	-0.124939
-1.352105	times. It	-0.124939
-1.352105	one. It	-0.124939
-1.352105	structure. It	-0.124939
-1.272924	profiler. It	-0.124939
-1.272924	handling. It	-0.124939
-1.272924	tools. It	-0.124939
-1.272924	numbers. It	-0.124939
-1.272924	a[i]; It	-0.124939
-1.272924	execution. It	-0.124939
-1.272924	so. It	-0.124939
-1.181492	input. It	-0.124939
-1.181492	IDE. It	-0.124939
-1.181492	versions. It	-0.124939
-1.181492	information. It	-0.124939
-1.181492	interface. It	-0.124939
-1.187039	unit-testing It	-0.124939
-1.187039	style. It	-0.124939
-1.181492	counts. It	-0.124939
-1.181492	smaller. It	-0.124939
-1.181492	manually. It	-0.124939
-1.181492	programs. It	-0.124939
-1.067720	part. It	-0.124939
-1.062100	100. It	-0.124939
-1.062100	organization It	-0.124939
-0.377895	purpose. It	-0.124939
-1.062100	sequentially. It	-0.124939
-1.062100	manner. It	-0.124939
-1.062100	x. It	-0.124939
-0.891628	context. It	-0.124939
-0.891628	aligned. It	-0.124939
-0.891628	throw. It	-0.124939
-0.891628	course. It	-0.124939
-0.891628	73). It	-0.124939
-0.891628	disadvantages: It	-0.124939
-0.891628	buffer. It	-0.124939
-0.891628	away. It	-0.124939
-0.891628	utility. It	-0.124939
-0.891628	check. It	-0.124939
-0.891628	decomposition. It	-0.124939
-0.891628	lost. It	-0.124939
-0.891628	read. It	-0.124939
-0.891628	148 It	-0.124939
-0.891628	started. It	-0.124939
-0.891628	Gnu. It	-0.124939
-0.891628	updated. It	-0.124939
-0.891628	predictable. It	-0.124939
-0.891628	shows. It	-0.124939
-0.891628	source. It	-0.124939
-0.891628	efficiently. It	-0.124939
-0.891628	divisions. It	-0.124939
-0.891628	72. It	-0.124939
-0.891628	safer. It	-0.124939
-0.596291	message. It	-0.124939
-0.596291	product. It	-0.124939
-0.596291	off. It	-0.124939
-0.596291	diagnose. It	-0.124939
-0.596291	profile. It	-0.124939
-0.596291	bloat. It	-0.124939
-0.596291	objects? It	-0.124939
-0.596291	polymorphism. It	-0.124939
-0.596291	move. It	-0.124939
-0.596291	ahead. It	-0.124939
-0.596291	strategies It	-0.124939
-0.596291	type-casting. It	-0.124939
-0.596291	response. It	-0.124939
-0.596291	happening. It	-0.124939
-0.596291	standardized. It	-0.124939
-0.596291	convenient. It	-0.124939
-0.596291	high. It	-0.124939
-0.596291	unavoidable. It	-0.124939
-0.596291	animation. It	-0.124939
-0.596291	_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON); It	-0.124939
-0.596291	processors). It	-0.124939
-0.596291	can. It	-0.124939
-0.596291	indeed. It	-0.124939
-0.596291	hackers. It	-0.124939
-0.596291	friendly. It	-0.124939
-0.596291	54. It	-0.124939
-0.596291	label. It	-0.124939
-0.596291	positive. It	-0.124939
-0.596291	considerations. It	-0.124939
-0.596291	list[i].b. It	-0.124939
-0.596291	atomic. It	-0.124939
-0.596291	poorly. It	-0.124939
-0.596291	fashion. It	-0.124939
-0.596291	between. It	-0.124939
-0.596291	130. It	-0.124939
-0.596291	conversions. It	-0.124939
-0.596291	57). It	-0.124939
-0.596291	correctness. It	-0.124939
-0.596291	61. It	-0.124939
-0.596291	sizes? It	-0.124939
-0.596291	a*b*c*2. It	-0.124939
-0.596291	double's. It	-0.124939
-0.596291	pooling. It	-0.124939
-0.596291	decimals. It	-0.124939
-0.596291	develop. It	-0.124939
-0.596291	leaks. It	-0.124939
-0.596291	green. It	-0.124939
-0.596291	costless. It	-0.124939
-0.596291	queue. It	-0.124939
-0.462305	// Example	-0.124939
-1.961952	} Example	-0.124939
-1.564111	; Example	-0.124939
-1.535213	element Example	-0.124939
-1.132922	loop. Example	-0.124939
-1.553784	storage. Example	-0.124939
-1.078340	runtime. Example	-0.124939
-1.078340	expression. Example	-0.124939
-2.434554	or from	-0.124939
-2.291522	it from	-0.124939
-2.471914	function from	-0.124939
-2.168582	code from	-0.602060
-2.294482	than from	-0.124939
-1.596970	compiler from	-0.425969
-1.893341	data from	-0.124939
-1.728890	vector from	-0.726999
-1.842926	only from	-0.301030
-1.837288	CPU from	-0.301030
-2.021274	one from	-0.124939
-2.007142	cache from	-0.124939
-2.574637	compilers from	-0.124939
-2.542196	b from	-0.124939
-1.324823	value from	-0.346788
-1.812671	variable from	-0.124939
-1.543394	return from	-0.124939
-2.371488	table from	-0.124939
-1.069238	elements from	-0.492916
-0.843146	called from	-0.263241
-2.345915	call from	-0.124939
-2.224569	file from	-0.124939
-0.836182	available from	-0.249877
-1.154497	accessed from	-0.124939
-2.164488	bytes from	-0.124939
-2.157471	threads from	-0.124939
-2.135086	integers from	-0.124939
-1.050601	calculated from	-0.425969
-2.102116	known from	-0.124939
-2.083216	support from	-0.124939
-2.070627	list from	-0.124939
-2.032139	1 from	-0.124939
-1.368263	files from	-0.124939
-1.992515	optimal from	-0.124939
-1.963468	sets from	-0.124939
-1.940181	separate from	-0.124939
-1.943297	block from	-0.124939
-1.039859	conversion from	-0.124939
-1.938259	resources from	-0.124939
-1.903969	n from	-0.124939
-1.907085	runtime from	-0.124939
-1.919776	needed from	-0.124939
-1.240553	transferred from	-0.425969
-0.711526	read from	-0.124939
-1.883499	linked from	-0.124939
-1.883499	microprocessors from	-0.124939
-1.859078	calling from	-0.124939
-1.836802	goes from	-0.124939
-0.899542	range from	-0.301030
-1.794985	loaded from	-0.124939
-0.847213	conversions from	-0.124939
-1.753900	response from	-0.124939
-1.740627	lines from	-0.124939
-1.072113	comes from	-0.124939
-1.678479	right from	-0.124939
-1.688396	writing from	-0.124939
-1.685065	reduced from	-0.124939
-1.685065	clear from	-0.124939
-1.656990	efficiently from	-0.124939
-1.656990	names from	-0.124939
-1.612514	ebx from	-0.124939
-1.619201	copied from	-0.124939
-0.645449	come from	-0.124939
-0.804047	Conversion from	-0.124939
-1.480899	jump from	-0.124939
-0.490941	far from	-0.124939
-1.429699	saved from	-0.124939
-1.429699	manuals from	-0.124939
-1.426289	again from	-0.124939
-0.678400	reads from	-0.124939
-0.070372	benefit from	-0.124939
-0.186539	recover from	-0.249877
-1.293962	generated from	-0.124939
-1.283571	increased from	-0.124939
-1.287007	returns from	-0.124939
-1.290470	gets from	-0.124939
-1.287007	sum1 from	-0.124939
-1.193560	going from	-0.124939
-1.190097	sum2 from	-0.124939
-0.503017	prevented from	-0.124939
-1.193560	interfaces from	-0.124939
-1.190097	interval from	-0.124939
-1.193560	b: from	-0.124939
-1.068622	removed from	-0.124939
-1.072113	separated from	-0.124939
-1.068622	Inheritance from	-0.124939
-0.378788	Available from	-0.124939
-1.068622	questions from	-0.124939
-1.072113	returning from	-0.124939
-1.068622	ReadTSC() from	-0.124939
-1.068622	evicted from	-0.124939
-0.203408	suffer from	-0.124939
-0.896022	warning from	-0.124939
-0.896022	fetched from	-0.124939
-0.203408	accessible from	-0.124939
-0.896022	recovering from	-0.124939
-0.896022	restriction from	-0.124939
-0.598512	timingtest.h from	-0.124939
-0.598512	referenced from	-0.124939
-0.598512	transition from	-0.124939
-0.598512	popped from	-0.124939
-0.598512	deviate from	-0.124939
-0.598512	learn from	-0.124939
-0.598512	Interference from	-0.124939
-0.598512	115 from	-0.124939
-0.598512	re-loaded from	-0.124939
-2.349762	the memory	-0.182931
-3.294916	is memory	-0.124939
-2.208809	a memory	-0.263241
-1.945679	of memory	-0.148420
-2.936916	to memory	-0.124939
-3.044876	and memory	-0.124939
-2.139566	in memory	-0.234083
-2.503831	The memory	-0.301030
-3.161605	that memory	-0.124939
-2.886446	or memory	-0.124939
-3.065185	if memory	-0.124939
-2.951603	by memory	-0.124939
-2.926158	with memory	-0.124939
-2.388496	as memory	-0.124939
-2.947565	This memory	-0.124939
-2.840594	than memory	-0.124939
-2.808586	have memory	-0.124939
-2.772598	this memory	-0.124939
-2.218954	more memory	-0.124939
-1.767048	from memory	-0.124939
-2.718384	data memory	-0.124939
-1.867297	different memory	-0.124939
-1.366700	same memory	-0.249877
-2.036834	one memory	-0.124939
-1.948750	into memory	-0.124939
-1.892201	multiple memory	-0.124939
-1.144568	static memory	-0.425969
-2.520404	efficient memory	-0.124939
-2.492022	possible memory	-0.124939
-1.840912	takes memory	-0.124939
-2.437215	any memory	-0.124939
-2.380777	less memory	-0.124939
-2.367953	take memory	-0.124939
-1.657521	new memory	-0.425969
-0.470705	dynamic memory	-1.102662
-2.259968	case memory	-0.124939
-1.321945	stack memory	-0.124939
-2.210808	about memory	-0.124939
-1.302157	large memory	-0.301030
-1.531967	big memory	-0.124939
-2.179948	much memory	-0.124939
-2.152816	common memory	-0.124939
-0.970867	allocated memory	-0.124939
-2.100268	another memory	-0.124939
-1.339387	particular memory	-0.425969
-1.862057	own memory	-0.124939
-0.946463	bigger memory	-0.301030
-1.846356	old memory	-0.124939
-1.827314	smaller memory	-0.124939
-1.127367	main memory	-0.124939
-0.259348	Dynamic memory	-1.079181
-1.743694	No memory	-0.124939
-1.724915	prevent memory	-0.124939
-0.978463	Optimizing memory	-0.425969
-0.566196	RAM memory	-0.124939
-1.590268	swap memory	-0.124939
-1.291571	seven memory	-0.124939
-1.293916	excessive memory	-0.124939
-1.289238	larger memory	-0.124939
-1.291571	contiguous memory	-0.124939
-1.194661	round memory	-0.124939
-1.194661	saving memory	-0.124939
-1.194661	uncached memory	-0.124939
-1.194661	units, memory	-0.124939
-1.072068	arbitrary memory	-0.124939
-1.072068	Extra memory	-0.124939
-1.072068	allocates memory	-0.124939
-0.203642	speed, memory	-0.124939
-0.599676	minimizing memory	-0.124939
-0.599676	Uncached memory	-0.124939
-0.599676	reserving memory	-0.124939
-3.112558	to at	-0.124939
-2.908331	be at	-0.124939
-2.754499	or at	-0.124939
-2.512776	it at	-0.124939
-2.922471	function at	-0.124939
-2.846345	by at	-0.124939
-2.386869	code at	-0.124939
-2.730356	not at	-0.124939
-2.290116	than at	-0.124939
-2.817873	compiler at	-0.124939
-2.754359	time at	-0.124939
-2.654546	memory at	-0.124939
-2.682664	has at	-0.124939
-1.805085	used at	-0.301030
-2.568296	compilers at	-0.124939
-2.532678	pointer at	-0.124939
-1.502700	library at	-0.249877
-2.452607	possible at	-0.124939
-2.466745	value at	-0.124939
-2.387574	variable at	-0.124939
-2.364828	table at	-0.124939
-1.349032	elements at	-0.425969
-2.411601	faster at	-0.124939
-1.134493	stored at	-0.425969
-2.368075	address at	-0.124939
-2.325851	bit at	-0.124939
-2.228147	bits at	-0.124939
-2.203324	instructions at	-0.124939
-2.231607	available at	-0.124939
-2.191984	stack at	-0.124939
-2.179481	calls at	-0.124939
-2.154900	calculations at	-0.124939
-2.160339	bytes at	-0.124939
-2.144211	best at	-0.124939
-2.108537	etc. at	-0.124939
-2.134697	good at	-0.124939
-1.074573	done at	-0.124939
-1.473572	line at	-0.425969
-1.210616	manual at	-0.301030
-2.144682	explained at	-0.124939
-1.455764	calculated at	-0.425969
-0.332463	known at	-0.878266
-2.097442	supported at	-0.124939
-1.408261	run at	-0.124939
-1.382571	values at	-0.124939
-2.009758	cycles at	-0.124939
-2.032815	addresses at	-0.124939
-1.964001	branches at	-0.124939
-1.957632	multiplication at	-0.124939
-1.933797	name at	-0.124939
-1.916068	zero at	-0.124939
-1.250745	better at	-0.124939
-1.904097	lookup at	-0.124939
-0.158906	byte at	-0.215115
-1.918903	transferred at	-0.124939
-1.888182	aligned at	-0.124939
-0.355468	look at	-0.522879
-1.856724	numbers at	-0.124939
-1.888289	piece at	-0.124939
-1.831078	options at	-0.124939
-1.150899	start at	-0.124939
-1.126816	around at	-0.425969
-1.799889	things at	-0.124939
-1.786143	reductions at	-0.124939
-0.715392	loaded at	-0.249877
-1.779429	supports at	-0.124939
-1.742054	comes at	-0.124939
-1.683384	offset at	-0.124939
-0.357948	unknown at	-0.903090
-1.648622	square at	-0.124939
-1.655662	thing at	-0.124939
-1.652127	least at	-0.124939
-1.617873	occur at	-0.124939
-1.569440	counts at	-0.124939
-1.583636	added at	-0.124939
-1.523683	(or at	-0.124939
-1.476036	DLL at	-0.124939
-0.332347	resolved at	-0.425969
-1.421578	memory, at	-0.124939
-1.361788	break at	-0.124939
-1.358195	happens at	-0.124939
-1.361788	evaluated at	-0.124939
-1.282606	popular at	-0.124939
-1.286229	pow at	-0.124939
-1.282606	project at	-0.124939
-0.503647	Dispatch at	-0.124939
-1.196657	appendix at	-0.124939
-1.189319	begin at	-0.124939
-1.189319	cache, at	-0.124939
-1.068034	Internet at	-0.124939
-1.071718	flow at	-0.124939
-1.068034	handled at	-0.124939
-0.895627	memcpy, at	-0.124939
-0.895627	kilobytes at	-0.124939
-0.895627	visible at	-0.124939
-0.895627	looking at	-0.124939
-0.895627	matrix[c][r] at	-0.124939
-0.895627	interrupts at	-0.124939
-0.895627	do, at	-0.124939
-0.895627	begins at	-0.124939
-0.895627	collector at	-0.124939
-0.598312	lost at	-0.124939
-0.598312	aiming at	-0.124939
-0.598312	instantiated at	-0.124939
-0.598312	(1./1.2345) at	-0.124939
-0.598312	decision at	-0.124939
-0.598312	Nerds at	-0.124939
-0.598312	Look at	-0.124939
-0.598312	relocation at	-0.124939
-0.598312	Looking at	-0.124939
-0.598312	unpredictably at	-0.124939
-0.598312	breakpoints at	-0.124939
-2.026009	the data	-0.355388
-3.240271	is data	-0.124939
-2.634138	a data	-0.726999
-2.106800	of data	-0.124939
-2.735549	to data	-0.124939
-1.992269	and data	-0.204120
-2.257843	The data	-0.124939
-3.121399	that data	-0.124939
-2.464811	or data	-0.124939
-2.499342	if data	-0.124939
-2.921576	This data	-0.124939
-2.761451	more data	-0.124939
-2.213795	when data	-0.425969
-2.613112	same data	-0.124939
-2.059331	other data	-0.124939
-2.633273	which data	-0.124939
-1.651239	all data	-0.249877
-2.664427	used data	-0.124939
-1.984763	class data	-0.124939
-1.888017	multiple data	-0.124939
-2.463687	two data	-0.124939
-1.642512	static data	-0.124939
-2.479133	where data	-0.124939
-1.000749	makes data	-1.028029
-2.343159	first data	-0.124939
-1.414809	test data	-0.124939
-1.584882	constant data	-0.124939
-2.231682	making data	-0.124939
-2.196495	its data	-0.124939
-2.198815	about data	-0.124939
-0.863275	large data	-0.271067
-2.182665	while data	-0.124939
-1.281643	big data	-0.301030
-1.515179	much data	-0.124939
-2.056582	store data	-0.124939
-2.043913	intermediate data	-0.124939
-1.216912	public data	-0.124939
-1.854984	own data	-0.124939
-1.840365	binary data	-0.124939
-1.840365	old data	-0.124939
-1.811764	advanced data	-0.124939
-0.633376	level-1 data	-0.425969
-1.718162	local data	-0.124939
-1.685978	right data	-0.124939
-1.693963	writing data	-0.124939
-1.653861	little data	-0.124939
-1.618734	align data	-0.124939
-1.580019	modify data	-0.124939
-1.577341	input data	-0.124939
-0.855077	non-static data	-0.425969
-1.534262	time-consuming data	-0.124939
-1.433252	far data	-0.124939
-1.427812	storing data	-0.124939
-1.433252	smallest data	-0.124939
-1.366305	files, data	-0.124939
-0.424714	prefetch data	-0.124939
-1.366305	processing, data	-0.124939
-1.369051	Accessing data	-0.124939
-1.287124	remote data	-0.124939
-1.292633	Aligning data	-0.124939
-0.504021	Access data	-0.425969
-1.192960	image data	-0.124939
-0.504021	Class data	-0.425969
-1.192960	Small data	-0.124939
-1.070784	Extra data	-0.124939
-1.070784	prefetching data	-0.124939
-1.070784	aligning data	-0.124939
-1.073565	organizing data	-0.124939
-1.070784	read-only data	-0.124939
-1.070784	accesses data	-0.124939
-0.897474	send data	-0.124939
-0.897474	keeping data	-0.124939
-0.203555	thread-specific data	-0.124939
-0.897474	received data	-0.124939
-0.897474	organize data	-0.124939
-0.599243	Prefetching data	-0.124939
-0.599243	exchange data	-0.124939
-0.599243	decryption, data	-0.124939
-0.599243	arranging data	-0.124939
-0.599243	writable data	-0.124939
-0.599243	Larger data	-0.124939
-0.599243	numerical data	-0.124939
-0.599243	Loading data	-0.124939
-0.599243	structure, data	-0.124939
-1.668818	the program	-0.389097
-1.792656	a program	-0.286307
-2.557191	of program	-0.124939
-2.688189	in program	-0.301030
-1.844341	The program	-0.238882
-3.027364	or program	-0.124939
-3.033884	on program	-0.124939
-2.790109	A program	-0.124939
-1.636042	C++ program	-0.124939
-2.436059	makes program	-0.124939
-1.262631	test program	-0.249877
-2.219828	Windows program	-0.124939
-2.206436	big program	-0.124939
-2.209919	But program	-0.124939
-2.151289	optimized program	-0.124939
-1.150192	mode program	-0.301030
-1.076260	application program	-0.124939
-1.886273	calling program	-0.124939
-0.788133	your program	-0.124939
-1.793186	installation program	-0.124939
-1.761334	desired program	-0.124939
-0.315435	whole program	-0.249877
-1.758804	No program	-0.124939
-1.701732	final program	-0.124939
-1.704292	clear program	-0.124939
-0.980429	during program	-0.124939
-0.941853	entire program	-0.124939
-1.629181	rarely program	-0.124939
-1.373666	7 program	-0.124939
-1.075242	speed-critical program	-0.124939
-1.075242	well-structured program	-0.124939
-0.124792	Whole program	-0.602060
-1.075242	intensive program	-0.124939
-0.900460	preventing program	-0.124939
-0.900460	usability, program	-0.124939
-0.600743	analyzing program	-0.124939
-0.600743	downloaded program	-0.124939
-0.600743	-fwhole- program	-0.124939
-0.600743	antivirus program	-0.124939
-2.139610	that has	-0.204120
-1.443091	it has	-0.266268
-2.480835	function has	-0.425969
-1.827991	code has	-0.425969
-1.551029	This has	-0.425969
-1.450663	compiler has	-0.263241
-2.830286	{ has	-0.124939
-2.783363	time has	-0.124939
-1.945510	It has	-0.124939
-1.344898	program has	-0.176091
-1.840077	CPU has	-0.301030
-2.639345	but has	-0.124939
-2.002244	integer has	-0.124939
-1.748483	set has	-0.124939
-1.980375	class has	-0.124939
-2.500623	example has	-0.124939
-2.587604	compilers has	-0.124939
-2.550052	size has	-0.124939
-1.431874	pointer has	-0.221849
-2.554237	b has	-0.124939
-1.107977	library has	-0.234083
-1.644791	object has	-0.301030
-2.487612	static has	-0.124939
-2.460590	C++ has	-0.124939
-1.624216	also has	-0.301030
-2.480546	value has	-0.124939
-2.385122	table has	-0.124939
-2.409474	performance has	-0.124939
-2.409518	way has	-0.124939
-2.313038	template has	-0.124939
-1.662072	registers has	-0.425969
-1.408945	user has	-0.301030
-2.210843	always has	-0.124939
-2.229005	system has	-0.124939
-2.236371	file has	-0.124939
-2.240141	type has	-0.124939
-2.206480	error has	-0.124939
-1.282577	processor has	-0.124939
-2.165606	element has	-0.124939
-2.152355	language has	-0.124939
-2.144754	thread has	-0.124939
-2.142346	overflow has	-0.124939
-2.112569	line has	-0.124939
-2.092659	problem has	-0.124939
-2.077284	list has	-0.124939
-2.072731	structure has	-0.124939
-2.050955	mode has	-0.124939
-2.034179	count has	-0.124939
-1.992963	space has	-0.124939
-0.915125	microprocessor has	-0.124939
-1.974401	application has	-0.124939
-1.943745	model has	-0.124939
-1.943007	parameter has	-0.124939
-1.924523	programmer has	-0.124939
-1.254818	keyword has	-0.124939
-1.918786	addition has	-0.124939
-1.865256	actually has	-0.124939
-1.852425	platform has	-0.124939
-1.818068	operands has	-0.124939
-1.793244	main has	-0.124939
-1.796142	computer has	-0.124939
-1.784598	p has	-0.124939
-1.772730	syntax has	-0.124939
-1.766915	STL has	-0.124939
-1.769813	inlining has	-0.124939
-1.762627	instance has	-0.124939
-1.765687	position-independent has	-0.124939
-1.688446	offset has	-0.124939
-1.659659	heap has	-0.124939
-1.621870	occur has	-0.124939
-1.624889	executable has	-0.124939
-1.574503	seconds has	-0.124939
-0.898687	F1 has	-0.124939
-1.580478	Compiler has	-0.124939
-1.528746	style has	-0.124939
-1.537738	latter has	-0.124939
-1.480570	chain has	-0.124939
-1.486586	who has	-0.124939
-1.428594	D has	-0.124939
-1.191656	manager has	-0.124939
-1.191656	spot has	-0.124939
-1.191656	auto_ptr has	-0.124939
-1.069800	reordering has	-0.124939
-1.069800	Pascal has	-0.124939
-0.896813	for-loop has	-0.124939
-0.896813	Sum1 has	-0.124939
-0.896813	reader has	-0.124939
-0.896813	functions) has	-0.124939
-0.598910	function" has	-0.124939
-0.598910	CParent::Hello() has	-0.124939
-0.598910	Deallocation has	-0.124939
-0.598910	8.23b has	-0.124939
-0.598910	apparently has	-0.124939
-0.598910	i=0; has	-0.124939
-2.379844	the vector	-0.397940
-2.021257	a vector	-0.401145
-2.534007	of vector	-0.124939
-2.775868	and vector	-0.124939
-2.415678	in vector	-0.221849
-2.510439	The vector	-0.124939
-2.503452	for vector	-0.124939
-3.040592	// vector	-0.124939
-2.492754	or vector	-0.425969
-2.969393	by vector	-0.124939
-1.864241	with vector	-0.124939
-2.394203	as vector	-0.124939
-2.822085	have vector	-0.124939
-1.640180	use vector	-0.301030
-2.798605	more vector	-0.124939
-1.172574	integer vector	-0.522879
-2.588126	class vector	-0.124939
-1.590918	each vector	-0.249877
-1.972427	using vector	-0.124939
-1.354252	Intel vector	-0.204120
-1.197950	into vector	-0.726999
-2.521157	64-bit vector	-0.124939
-2.525758	efficient vector	-0.124939
-2.498102	possible vector	-0.124939
-1.380886	long vector	-0.124939
-1.321110	bit vector	-0.249877
-2.302328	new vector	-0.124939
-1.086714	short vector	-0.221849
-2.259304	available vector	-0.124939
-2.237956	constant vector	-0.124939
-1.138308	result vector	-0.726999
-2.104863	another vector	-0.124939
-0.883705	Using vector	-0.249877
-1.294941	Boolean vector	-0.124939
-1.980726	intrinsic vector	-0.124939
-1.883816	XMM vector	-0.124939
-1.883816	bigger vector	-0.124939
-1.800845	supports vector	-0.124939
-1.802885	my vector	-0.124939
-1.782736	Supports vector	-0.124939
-1.101553	STL vector	-0.124939
-0.259239	#pragma vector	-0.602060
-1.720615	special vector	-0.124939
-1.692560	right vector	-0.124939
-0.600715	Define vector	-0.425969
-0.725541	128-bit vector	-0.124939
-1.622089	parallel vector	-0.124939
-1.624179	allow vector	-0.124939
-1.586996	My vector	-0.124939
-0.854278	256-bit vector	-0.124939
-1.442800	Mathematical vector	-0.124939
-1.371539	largest vector	-0.124939
-1.369398	Agner vector	-0.124939
-0.070452	Agner's vector	-0.522879
-1.290217	larger vector	-0.124939
-1.197600	sixteen vector	-0.124939
-1.197600	cores, vector	-0.124939
-1.074824	b;} vector	-0.124939
-0.898732	scientific vector	-0.124939
-0.203682	predefined vector	-0.425969
-0.599876	(when vector	-0.124939
-0.599876	odd-sized vector	-0.124939
-0.599876	Bit vector	-0.124939
-0.599876	a.y);} vector	-0.124939
-0.599876	2-dimensional vector	-0.124939
-3.069736	a make	-0.425969
-1.229881	to make	-0.593197
-2.017453	and make	-0.249877
-2.023498	that make	-0.249877
-1.707065	can make	-0.162727
-3.197066	// make	-0.124939
-3.107081	or make	-0.124939
-1.890626	not make	-0.124939
-1.935783	may make	-0.124939
-2.078909	you make	-0.124939
-1.234674	will make	-0.197489
-1.798791	then make	-0.249877
-2.670768	compilers make	-0.124939
-1.349743	cannot make	-0.249877
-1.562292	must make	-0.124939
-2.224764	doesn't make	-0.124939
-1.418723	would make	-0.425969
-2.027514	Therefore, make	-0.124939
-1.594476	course make	-0.124939
-1.499240	X make	-0.124939
-1.444615	Alternatively, make	-0.124939
-1.297643	Templates make	-0.124939
-0.901393	better, make	-0.124939
-0.601211	errors; make	-0.124939
-0.601211	(5) make	-0.124939
-2.482649	the different	-0.204120
-3.395248	is different	-0.124939
-1.976122	a different	-0.124939
-1.950183	of different	-0.289749
-2.780766	to different	-0.124939
-2.812657	and different	-0.124939
-2.198508	in different	-0.124939
-2.385146	The different	-0.249877
-1.617402	for different	-0.367977
-3.231793	that different	-0.124939
-2.672269	be different	-0.124939
-2.439037	are different	-0.124939
-3.109438	if different	-0.124939
-1.731126	with different	-0.124939
-2.199987	on different	-0.124939
-2.940206	as different	-0.124939
-2.000893	use different	-0.301030
-2.168922	from different	-0.124939
-1.918493	at different	-0.124939
-2.756436	make different	-0.124939
-2.734269	If different	-0.124939
-2.613745	each different	-0.124939
-2.623074	do different	-0.124939
-2.599440	using different	-0.124939
-2.600350	b different	-0.124939
-1.212574	two different	-0.271067
-1.168629	many different	-0.191886
-2.432431	very different	-0.124939
-1.099071	between different	-0.191886
-2.193666	Use different	-0.124939
-2.174992	These different	-0.124939
-0.705338	several different	-0.176091
-2.114603	support different	-0.124939
-2.110091	eight different	-0.124939
-2.084274	doing different	-0.124939
-1.929074	three different	-0.124939
-1.914075	look different	-0.124939
-1.697474	copying different	-0.124939
-1.662712	mix different	-0.124939
-1.496630	try different	-0.124939
-1.498320	CPUs, different	-0.124939
-0.600685	seven different	-0.124939
-0.600685	platforms, different	-0.425969
-1.198988	mixing different	-0.124939
-0.899662	having different	-0.124939
-0.899662	resolutions, different	-0.124939
-0.899662	treats different	-0.124939
-0.899662	assigning different	-0.124939
-0.600342	browsers, different	-0.124939
-0.600342	widely different	-0.124939
-0.600342	microprocessors, different	-0.124939
-2.343439	is because	-0.249877
-2.873515	be because	-0.124939
-2.719841	or because	-0.124939
-1.994849	function because	-0.346788
-2.964276	if because	-0.124939
-2.161000	code because	-0.124939
-2.796268	compiler because	-0.124939
-1.729335	time because	-0.221849
-2.627315	data because	-0.124939
-2.620766	functions because	-0.124939
-2.607477	loop because	-0.124939
-2.516518	all because	-0.124939
-2.544202	cache because	-0.124939
-2.513129	integer because	-0.124939
-2.526174	do because	-0.124939
-2.470483	double because	-0.124939
-1.690689	b because	-0.124939
-2.454539	library because	-0.124939
-2.455091	object because	-0.124939
-1.350792	efficient because	-0.221849
-2.440821	possible because	-0.124939
-2.451987	version because	-0.124939
-1.806722	variable because	-0.124939
-2.388312	variables because	-0.124939
-1.539822	performance because	-0.301030
-2.325460	software because	-0.124939
-2.392337	long because	-0.124939
-1.509189	faster because	-0.124939
-2.373032	critical because	-0.124939
-1.703240	register because	-0.425969
-2.258901	often because	-0.124939
-2.287540	template because	-0.124939
-1.413097	pointers because	-0.124939
-2.250489	test because	-0.124939
-2.246316	systems because	-0.124939
-2.302797	useful because	-0.124939
-2.230851	0 because	-0.124939
-2.229343	processors because	-0.124939
-2.224248	available because	-0.124939
-2.227781	up because	-0.124939
-1.326502	times because	-0.124939
-2.147738	large because	-0.124939
-2.171509	calls because	-0.124939
-2.185586	result because	-0.124939
-2.162280	necessary because	-0.124939
-2.128178	language because	-0.124939
-2.155242	speed because	-0.124939
-2.071150	128 because	-0.124939
-2.095889	parameters because	-0.124939
-2.106939	advantageous because	-0.124939
-1.435556	solution because	-0.124939
-2.117684	advantage because	-0.124939
-2.042905	operators because	-0.124939
-1.144191	mode because	-0.602060
-2.030948	values because	-0.124939
-1.965455	programs because	-0.124939
-1.983123	problems because	-0.124939
-1.336200	optimal because	-0.425969
-1.963737	microprocessor because	-0.124939
-1.936747	complicated because	-0.124939
-1.953828	cost because	-0.124939
-1.890539	better because	-0.124939
-1.887006	applications because	-0.124939
-1.901314	mechanism because	-0.124939
-1.912364	needed because	-0.124939
-1.896805	types because	-0.124939
-1.896805	read because	-0.124939
-1.851666	numbers because	-0.124939
-1.825675	process because	-0.124939
-1.813434	just because	-0.124939
-1.147579	operands because	-0.124939
-1.805909	smaller because	-0.124939
-1.150044	here because	-0.124939
-1.781687	intended because	-0.124939
-1.101370	avoided because	-0.124939
-0.661310	inefficient because	-0.249877
-1.761670	position-independent because	-0.124939
-1.711948	platforms because	-0.124939
-1.708102	constants because	-0.124939
-1.679764	tasks because	-0.124939
-1.637343	disk because	-0.124939
-1.618960	executable because	-0.124939
-1.603367	parallel because	-0.124939
-1.615009	copied because	-0.124939
-1.561974	statements because	-0.124939
-1.565820	seconds because	-0.124939
-1.480657	parallelism because	-0.124939
-1.476706	fastest because	-0.124939
-1.480657	preferred because	-0.124939
-1.422665	-fpic because	-0.124939
-0.743178	consuming because	-0.425969
-1.418714	size, because	-0.124939
-1.355719	course, because	-0.124939
-1.359706	occurs because	-0.124939
-0.677062	costly because	-0.124939
-1.355719	poor because	-0.124939
-1.355719	completely because	-0.124939
-1.280525	28 because	-0.124939
-1.280525	processes because	-0.124939
-1.187639	one, because	-0.124939
-1.195801	0x80000000; because	-0.124939
-1.191701	serial because	-0.124939
-1.191701	9.5 because	-0.124939
-1.187639	simpler because	-0.124939
-1.070863	differently because	-0.124939
-1.066762	unfortunate because	-0.124939
-0.378535	twice because	-0.124939
-1.066762	-fpie because	-0.124939
-1.066762	issue because	-0.124939
-1.066762	negligible because	-0.124939
-0.203281	problematic because	-0.124939
-0.894771	vectorized, because	-0.124939
-0.894771	unsafe because	-0.124939
-0.597881	i*12, because	-0.124939
-0.597881	interesting because	-0.124939
-0.597881	non-sequentially because	-0.124939
-0.597881	stall because	-0.124939
-0.597881	evaluated, because	-0.124939
-0.597881	*(++p) because	-0.124939
-0.597881	array[++i] because	-0.124939
-0.597881	line, because	-0.124939
-0.597881	Bridge) because	-0.124939
-0.597881	advance, because	-0.124939
-0.597881	alloca, because	-0.124939
-0.597881	risky because	-0.124939
-1.369172	the same	-0.345872
-1.930005	The same	-0.212089
-2.195040	from same	-0.425969
-1.446132	units same	-0.124939
-2.436111	the functions	-0.539912
-2.624741	of functions	-0.124939
-3.254260	to functions	-0.124939
-3.035614	and functions	-0.124939
-2.697361	The functions	-0.124939
-3.068698	for functions	-0.124939
-3.155172	that functions	-0.124939
-2.505676	if functions	-0.124939
-2.804179	have functions	-0.124939
-2.724306	from functions	-0.124939
-2.713695	vector functions	-0.124939
-2.101595	different functions	-0.124939
-1.833360	other functions	-0.124939
-2.650997	which functions	-0.124939
-1.806271	all functions	-0.124939
-2.677670	used functions	-0.124939
-2.562933	using functions	-0.124939
-1.109074	library functions	-0.124939
-1.885555	two functions	-0.425969
-2.518633	efficient functions	-0.124939
-1.445770	many functions	-0.425969
-0.654700	member functions	-0.204120
-1.493379	critical functions	-0.124939
-1.407083	these functions	-0.124939
-2.251803	system functions	-0.124939
-2.227992	Some functions	-0.124939
-2.208941	about functions	-0.124939
-2.234532	important functions	-0.124939
-2.191175	necessary functions	-0.124939
-2.157046	specific functions	-0.124939
-1.511362	These functions	-0.425969
-0.992697	virtual functions	-0.124939
-2.161429	several functions	-0.124939
-2.071786	few functions	-0.124939
-2.085189	inline functions	-0.124939
-2.033806	All functions	-0.124939
-2.015689	both functions	-0.124939
-1.968264	complicated functions	-0.124939
-0.346296	intrinsic functions	-0.204120
-0.536586	mathematical functions	-0.425969
-1.944627	various functions	-0.124939
-1.005198	string functions	-0.124939
-1.915019	three functions	-0.124939
-1.937561	Make functions	-0.124939
-1.217998	public functions	-0.425969
-1.826502	smaller functions	-0.124939
-1.806282	C functions	-0.124939
-1.044816	math functions	-0.124939
-1.719653	inlined functions	-0.124939
-1.045303	frame functions	-0.124939
-1.617226	Library functions	-0.124939
-0.549817	Virtual functions	-0.124939
-1.430286	suitable functions	-0.124939
-0.124705	Mathematical functions	-0.204120
-0.265890	Intrinsic functions	-0.249877
-1.372900	leaf functions	-0.124939
-1.370490	internal functions	-0.124939
-1.196809	missing functions	-0.124939
-0.503678	Fastcall functions	-0.124939
-1.194399	individual functions	-0.124939
-1.194399	Small functions	-0.124939
-0.504168	Overloaded functions	-0.124939
-1.071870	speed-critical functions	-0.124939
-0.898202	Pure functions	-0.124939
-0.898202	fastcall functions	-0.124939
-0.898202	non-polymorphic functions	-0.124939
-0.898202	Sometimes, functions	-0.124939
-0.898202	unnecessary functions	-0.124939
-0.599609	Non-polymorphic functions	-0.124939
-0.599609	QueryPerformanceCounter functions	-0.124939
-0.599609	Leaf functions	-0.124939
-0.599609	memory-intensive functions	-0.124939
-0.599609	Inlined functions	-0.124939
-2.891954	the only	-0.124939
-2.122262	is only	-0.271067
-3.185569	a only	-0.124939
-3.180208	of only	-0.124939
-2.358017	and only	-0.124939
-3.275522	in only	-0.124939
-2.486687	The only	-0.124939
-3.112626	that only	-0.124939
-2.617572	be only	-0.425969
-2.270440	are only	-0.249877
-2.112344	can only	-0.346788
-3.047722	it only	-0.124939
-3.032768	if only	-0.124939
-2.907444	by only	-0.124939
-2.054070	with only	-0.249877
-2.905156	on only	-0.124939
-1.783874	not only	-0.204120
-2.903441	you only	-0.124939
-2.249242	have only	-0.425969
-2.734626	this only	-0.124939
-1.989931	use only	-0.124939
-2.697995	from only	-0.124939
-1.737266	has only	-0.425969
-2.697535	make only	-0.124939
-2.673692	functions only	-0.124939
-1.541558	but only	-0.124939
-1.539854	used only	-0.346788
-2.650744	should only	-0.124939
-2.512927	example only	-0.124939
-2.541203	using only	-0.124939
-2.558732	size only	-0.124939
-2.476212	where only	-0.124939
-2.476212	possible only	-0.124939
-1.434550	takes only	-0.249877
-2.404560	branch only	-0.124939
-1.336536	called only	-0.425969
-2.367194	example, only	-0.124939
-1.308155	take only	-0.249877
-2.297819	registers only	-0.124939
-2.320173	need only	-0.124939
-2.287668	method only	-0.124939
-2.182674	work only	-0.124939
-2.165122	AMD only	-0.124939
-2.141540	option only	-0.124939
-2.133086	AVX only	-0.124939
-1.484889	done only	-0.124939
-0.717231	works only	-0.425969
-2.096406	problem only	-0.124939
-1.172521	contains only	-0.124939
-2.073470	would only	-0.124939
-2.070852	run only	-0.124939
-1.384613	well only	-0.425969
-2.002114	optimal only	-0.124939
-1.339838	dispatching only	-0.124939
-1.967312	allows only	-0.124939
-1.950278	methods only	-0.124939
-1.971677	needs only	-0.124939
-1.927317	needed only	-0.124939
-1.217191	requires only	-0.124939
-1.806937	things only	-0.124939
-1.131786	depends only	-0.425969
-1.793355	tested only	-0.124939
-1.801453	loaded only	-0.124939
-1.100819	Supports only	-0.425969
-1.749829	comes only	-0.124939
-1.747095	fact only	-0.124939
-1.652919	containing only	-0.124939
-1.650185	handle only	-0.124939
-1.623435	initialized only	-0.124939
-1.623435	includes only	-0.124939
-1.623435	insert only	-0.124939
-1.582042	F1 only	-0.124939
-1.584846	processors, only	-0.124939
-1.533499	chosen only	-0.124939
-1.550488	applies only	-0.124939
-0.441010	mispredicted only	-0.425969
-1.432766	allowed only	-0.124939
-1.429944	hold only	-0.124939
-1.365819	evaluated only	-0.124939
-1.286638	executed only	-0.124939
-1.286638	valid only	-0.124939
-1.286638	currently only	-0.124939
-1.192568	Can only	-0.124939
-1.070489	services only	-0.124939
-0.897276	modifying only	-0.124939
-0.599143	Actually, only	-0.124939
-0.599143	understands only	-0.124939
-1.831705	the CPU	-0.365971
-2.413746	a CPU	-0.492916
-2.540128	of CPU	-0.124939
-3.324018	to CPU	-0.124939
-3.110438	and CPU	-0.124939
-1.875572	The CPU	-0.359022
-2.718642	for CPU	-0.124939
-3.064216	// CPU	-0.124939
-2.938312	or CPU	-0.124939
-2.474033	by CPU	-0.124939
-2.444925	with CPU	-0.124939
-2.431354	on CPU	-0.124939
-2.875383	than CPU	-0.124939
-2.838382	have CPU	-0.124939
-2.813136	more CPU	-0.124939
-2.824978	when CPU	-0.124939
-2.188767	A CPU	-0.124939
-2.770864	at CPU	-0.124939
-2.684788	different CPU	-0.124939
-2.042758	one CPU	-0.124939
-1.991592	each CPU	-0.124939
-1.975297	using CPU	-0.124939
-1.964645	Intel CPU	-0.124939
-1.378622	multiple CPU	-0.346788
-1.537053	between CPU	-0.124939
-2.387476	called CPU	-0.124939
-2.307033	without CPU	-0.124939
-0.834860	specific CPU	-0.191886
-2.115569	uses CPU	-0.124939
-2.121425	known CPU	-0.124939
-2.061391	optimizing CPU	-0.124939
-1.341345	particular CPU	-0.124939
-1.976015	their CPU	-0.124939
-0.750444	automatic CPU	-0.522879
-1.890437	Many CPU	-0.124939
-1.869247	own CPU	-0.124939
-1.804127	supports CPU	-0.124939
-1.669665	unknown CPU	-0.124939
-1.631876	Automatic CPU	-0.124939
-1.629976	under CPU	-0.124939
-1.584809	similar CPU	-0.124939
-1.592392	apply CPU	-0.124939
-0.746237	Intel's CPU	-0.425969
-0.679290	13.1 CPU	-0.425969
-0.680070	newest CPU	-0.124939
-1.368635	poor CPU	-0.124939
-1.196368	bad CPU	-0.124939
-1.196368	options. CPU	-0.124939
-1.198293	c: CPU	-0.124939
-1.196368	explicit CPU	-0.124939
-1.073354	consumes CPU	-0.124939
-1.075288	Explicit CPU	-0.124939
-0.379430	13.6 CPU	-0.425969
-0.379430	(Intel CPU	-0.425969
-0.203729	13.7 CPU	-0.425969
-0.600109	13.2. CPU	-0.124939
-0.600109	inappropriate CPU	-0.124939
-0.600109	(NetBurst) CPU	-0.124939
-2.334305	the other	-0.313995
-3.037814	is other	-0.124939
-2.989441	of other	-0.124939
-2.786581	to other	-0.301030
-1.721901	and other	-0.164447
-2.262072	in other	-0.124939
-3.180805	The other	-0.124939
-1.998538	for other	-0.234083
-2.662098	are other	-0.124939
-1.960604	or other	-0.204120
-3.118856	if other	-0.124939
-3.027604	by other	-0.124939
-1.731859	with other	-0.182931
-3.009602	on other	-0.124939
-1.836491	than other	-0.124939
-2.032977	have other	-0.602060
-1.927378	from other	-0.124939
-2.057865	all other	-0.124939
-2.704155	but other	-0.124939
-2.666688	one other	-0.124939
-1.266128	no other	-0.249877
-1.249954	each other	-0.425969
-2.628982	do other	-0.124939
-1.991001	most other	-0.124939
-2.611970	size other	-0.124939
-2.555893	library other	-0.124939
-2.523632	two other	-0.124939
-1.881306	also other	-0.124939
-1.255604	In other	-0.425969
-0.759862	any other	-0.152967
-1.821696	some other	-0.124939
-2.246793	Some other	-0.124939
-1.304969	while other	-0.301030
-2.221692	calls other	-0.124939
-2.183004	several other	-0.124939
-2.158078	cause other	-0.124939
-1.957595	various other	-0.124939
-1.859146	reduce other	-0.124939
-1.865227	choose other	-0.124939
-1.843280	require other	-0.124939
-1.706585	On other	-0.124939
-1.662604	Any other	-0.124939
-1.629400	sizes other	-0.124939
-1.370793	Several other	-0.124939
-1.370793	c1 other	-0.124939
-1.294728	over other	-0.124939
-0.600476	affects other	-0.124939
-2.636447	the instruction	-0.329059
-2.927213	of instruction	-0.124939
-3.031056	and instruction	-0.124939
-2.695658	The instruction	-0.425969
-3.058920	if instruction	-0.124939
-2.936427	on instruction	-0.124939
-2.147854	This instruction	-0.602060
-2.335624	an instruction	-0.124939
-2.012466	this instruction	-0.301030
-2.795369	when instruction	-0.124939
-1.866236	different instruction	-0.301030
-2.634399	same instruction	-0.124939
-1.819723	which instruction	-0.301030
-2.584014	no instruction	-0.124939
-2.574763	each instruction	-0.124939
-2.510633	64-bit instruction	-0.124939
-2.489013	possible instruction	-0.124939
-2.416774	branch instruction	-0.124939
-2.361449	bit instruction	-0.124939
-2.293452	new instruction	-0.124939
-1.653029	these instruction	-0.425969
-0.373378	SSE2 instruction	-0.903090
-2.284222	32 instruction	-0.124939
-1.592208	available instruction	-0.425969
-2.208011	about instruction	-0.124939
-2.222143	assembly instruction	-0.124939
-2.190601	necessary instruction	-0.124939
-1.519461	specific instruction	-0.124939
-0.672953	AVX instruction	-0.602060
-0.734604	supported instruction	-0.271067
-2.019531	write instruction	-0.124939
-1.338937	particular instruction	-0.124939
-1.975223	next instruction	-0.124939
-1.944136	various instruction	-0.124939
-1.951036	what instruction	-0.124939
-0.383671	later instruction	-0.602060
-0.670740	higher instruction	-0.346788
-1.194755	AVX2 instruction	-0.425969
-0.923812	x86 instruction	-0.602060
-1.833108	appropriate instruction	-0.124939
-1.820228	compatible instruction	-0.124939
-1.753927	slow instruction	-0.124939
-1.072752	desired instruction	-0.124939
-1.726326	given instruction	-0.124939
-1.626355	SSE instruction	-0.124939
-0.899616	SSE4.1 instruction	-0.425969
-0.899122	newer instruction	-0.124939
-1.580198	current instruction	-0.124939
-1.539205	low instruction	-0.124939
-1.488052	lower instruction	-0.124939
-0.491810	CPUID instruction	-0.124939
-0.746200	latest instruction	-0.425969
-0.424863	scan instruction	-0.301030
-1.365515	SSE3 instruction	-0.124939
-0.425139	newest instruction	-0.124939
-1.370359	prefetch instruction	-0.124939
-1.288749	AVX512 instruction	-0.124939
-0.186704	CISC instruction	-0.425969
-1.194268	highest instruction	-0.124939
-1.196710	x86-64 instruction	-0.124939
-0.503658	MOVNTQ instruction	-0.124939
-1.194268	selected instruction	-0.124939
-1.071771	specified instruction	-0.124939
-0.124662	AVX-512 instruction	-0.602060
-0.898136	lowest instruction	-0.124939
-0.898136	FMA4 instruction	-0.124939
-0.898136	corresponding instruction	-0.124939
-0.898136	EMMS instruction	-0.124939
-0.599576	(the instruction	-0.124939
-0.599576	blend instruction	-0.124939
-0.599576	later) instruction	-0.124939
-0.599576	forward) instruction	-0.124939
-0.599576	Newest instruction	-0.124939
-0.599576	Pro instruction	-0.124939
-3.387311	the point	-0.124939
-3.624609	to point	-0.124939
-3.206661	= point	-0.124939
-3.234099	it point	-0.124939
-2.884625	will point	-0.124939
-0.026478	floating point	-0.412180
-2.552383	possible point	-0.124939
-1.763742	cannot point	-0.425969
-1.672091	they point	-0.425969
-2.249135	; point	-0.124939
-0.016593	Floating point	-0.312025
-1.202064	26 point	-0.124939
-1.202064	entry point	-0.124939
-0.204017	decimal point	-0.124939
-0.601545	technological point	-0.124939
-1.789056	the loop	-0.471726
-1.808898	a loop	-0.229674
-2.668617	of loop	-0.425969
-1.845539	The loop	-0.197489
-2.093465	// loop	-0.346788
-3.001564	as loop	-0.124939
-2.407228	This loop	-0.124939
-2.273156	this loop	-0.124939
-1.965702	A loop	-0.301030
-2.657485	no loop	-0.124939
-2.629915	using loop	-0.124939
-2.554526	efficient loop	-0.124939
-1.110756	out loop	-0.823909
-1.306500	while loop	-0.124939
-2.211207	big loop	-0.124939
-2.210592	c loop	-0.124939
-2.129410	another loop	-0.124939
-0.288975	innermost loop	-0.170696
-1.770165	whole loop	-0.124939
-1.733471	special loop	-0.124939
-1.743606	repeat loop	-0.124939
-1.014966	maximum loop	-0.425969
-0.942644	message loop	-0.124939
-1.496034	variables, loop	-0.124939
-1.297589	excessive loop	-0.124939
-1.075740	unrolled loop	-0.124939
-0.900793	Excessive loop	-0.124939
-0.900793	infinite loop	-0.124939
-0.900793	Initialize loop	-0.124939
-0.900793	Increment loop	-0.124939
-0.600910	intermediates, loop	-0.124939
-0.600910	power, loop	-0.124939
-0.600910	i<20 loop	-0.124939
-0.600910	Main loop	-0.124939
-2.864741	// If	-0.124939
-2.714508	} If	-0.124939
-2.309507	64 If	-0.124939
-1.281292	code. If	-0.425969
-2.169135	function. If	-0.124939
-1.130466	memory. If	-0.124939
-1.999044	used. If	-0.124939
-1.288881	cache. If	-0.425969
-1.287172	systems. If	-0.124939
-1.250960	efficient. If	-0.124939
-0.982967	set. If	-0.301030
-1.871726	compilers. If	-0.124939
-1.827075	called. If	-0.124939
-1.786617	loop. If	-0.124939
-1.786617	pointer. If	-0.124939
-1.744734	size. If	-0.124939
-1.710288	calls. If	-0.124939
-1.674089	mode. If	-0.124939
-1.670112	object. If	-0.124939
-1.007459	library. If	-0.124939
-1.674089	cycles. If	-0.124939
-0.974422	thread. If	-0.124939
-1.647392	purposes. If	-0.124939
-1.605552	way. If	-0.124939
-1.605552	vector. If	-0.124939
-1.560146	references. If	-0.124939
-1.580597	u; If	-0.124939
-1.564160	address. If	-0.124939
-0.896106	CPU. If	-0.124939
-0.896973	problem. If	-0.124939
-1.526543	order. If	-0.124939
-1.518402	inefficient. If	-0.124939
-1.518402	executed. If	-0.124939
-1.518402	parameter. If	-0.124939
-1.518402	storage. If	-0.124939
-1.475391	file. If	-0.124939
-1.475391	register. If	-0.124939
-1.475391	units. If	-0.124939
-1.471301	branch. If	-0.124939
-0.742939	table. If	-0.124939
-0.743810	simultaneously. If	-0.425969
-0.743810	integer. If	-0.124939
-1.417399	anyway. If	-0.124939
-1.421527	16. If	-0.124939
-0.742939	number. If	-0.124939
-1.354581	constant. If	-0.124939
-1.358749	prediction. If	-0.124939
-1.362957	application. If	-0.124939
-1.354581	members. If	-0.124939
-1.354581	chain. If	-0.124939
-0.676863	again. If	-0.425969
-1.358749	overlap. If	-0.124939
-1.358749	branches. If	-0.124939
-1.279567	maintain. If	-0.124939
-1.279567	future. If	-0.124939
-0.598554	factor. If	-0.124939
-1.279567	priority. If	-0.124939
-1.283776	www.agner.org/optimize/asmlib.zip. If	-0.124939
-1.279567	necessary. If	-0.124939
-1.279567	elimination If	-0.124939
-1.279567	addresses. If	-0.124939
-1.279567	5. If	-0.124939
-1.186866	better. If	-0.124939
-1.186866	up. If	-0.124939
-1.186866	declared. If	-0.124939
-1.186866	running. If	-0.124939
-1.186866	(RTTI) If	-0.124939
-0.502518	first. If	-0.124939
-1.186866	www.agner.org/optimize/cppexamples.zip. If	-0.124939
-1.186866	programs. If	-0.124939
-0.502518	results. If	-0.425969
-1.066177	addition. If	-0.124939
-1.066177	code). If	-0.124939
-1.066177	errors. If	-0.124939
-0.378455	part. If	-0.425969
-1.066177	12. If	-0.124939
-1.066177	long. If	-0.124939
-1.066177	same. If	-0.124939
-1.066177	slow. If	-0.124939
-1.066177	0x1C. If	-0.124939
-0.894377	CriticalFunction. If	-0.124939
-0.894377	15. If	-0.124939
-0.894377	cases: If	-0.124939
-0.894377	0x20; If	-0.124939
-0.894377	methods. If	-0.124939
-0.894377	hyperthreading. If	-0.124939
-0.203241	manner? If	-0.425969
-0.894377	read. If	-0.124939
-0.894377	obtained. If	-0.124939
-0.894377	containers. If	-0.124939
-0.894377	ms. If	-0.124939
-0.203241	added? If	-0.425969
-0.894377	58 If	-0.124939
-0.894377	105). If	-0.124939
-0.894377	BSD. If	-0.124939
-0.597682	extensions. If	-0.124939
-0.597682	macro. If	-0.124939
-0.597682	removed. If	-0.124939
-0.597682	time? If	-0.124939
-0.597682	class). If	-0.124939
-0.597682	speeds. If	-0.124939
-0.597682	7. If	-0.124939
-0.597682	lookup[b]; If	-0.124939
-0.597682	__debugbreak();. If	-0.124939
-0.597682	consecutively? If	-0.124939
-0.597682	n∙(n-1)!. If	-0.124939
-0.597682	62. If	-0.124939
-0.597682	coded. If	-0.124939
-0.597682	key? If	-0.124939
-0.597682	complicated. If	-0.124939
-0.597682	(www.intel.com). If	-0.124939
-0.597682	remotely. If	-0.124939
-0.597682	ordering? If	-0.124939
-0.597682	stub. If	-0.124939
-0.597682	calculate. If	-0.124939
-0.597682	42 If	-0.124939
-0.597682	writes. If	-0.124939
-0.597682	sequence. If	-0.124939
-0.597682	measurement. If	-0.124939
-0.597682	analysis. If	-0.124939
-0.597682	elements? If	-0.124939
-0.597682	references: If	-0.124939
-0.597682	6. If	-0.124939
-0.597682	pipeline. If	-0.124939
-0.597682	stored? If	-0.124939
-0.597682	152 If	-0.124939
-0.597682	ways). If	-0.124939
-0.597682	considerable. If	-0.124939
-0.597682	2.5f; If	-0.124939
-0.597682	sum2; If	-0.124939
-0.597682	allocated. If	-0.124939
-2.359934	of which	-0.191886
-2.426247	and which	-0.124939
-1.934791	in which	-0.669007
-1.912616	function which	-0.124939
-1.921285	on which	-0.124939
-2.868236	code which	-0.124939
-2.826478	compiler which	-0.124939
-2.762223	time which	-0.124939
-2.662764	memory which	-0.124939
-2.693674	at which	-0.124939
-2.645034	functions which	-0.124939
-2.627536	CPU which	-0.124939
-2.519532	class which	-0.124939
-2.495477	double which	-0.124939
-1.699151	pointer which	-0.124939
-2.475917	library which	-0.124939
-1.905969	i which	-0.124939
-2.474723	object which	-0.124939
-2.393538	variable which	-0.124939
-1.741423	address which	-0.124939
-2.350791	example, which	-0.124939
-2.330372	bit which	-0.124939
-2.310309	register which	-0.124939
-2.282079	out which	-0.124939
-2.215469	system which	-0.124939
-2.207539	instructions which	-0.124939
-2.234471	available which	-0.124939
-1.316068	about which	-0.124939
-2.195429	CPUs which	-0.124939
-2.134486	integers which	-0.124939
-2.101723	known which	-0.124939
-2.082665	support which	-0.124939
-2.063371	calculate which	-0.124939
-1.969961	operator which	-0.124939
-1.057232	see which	-0.124939
-1.873428	directives which	-0.124939
-1.868537	shows which	-0.124939
-1.833174	process which	-0.124939
-1.811449	overhead which	-0.124939
-1.765034	profiler which	-0.124939
-1.733746	operation which	-0.124939
-1.071340	code, which	-0.124939
-1.691530	conditions which	-0.124939
-1.678141	testing which	-0.124939
-0.565090	predict which	-0.124939
-0.940901	discussed which	-0.124939
-1.618980	consider which	-0.124939
-1.567507	compiler, which	-0.124939
-1.581000	checks which	-0.124939
-1.477291	chain which	-0.124939
-1.484090	non-sequential which	-0.124939
-1.426098	trick which	-0.124939
-1.422685	integers, which	-0.124939
-1.422685	size, which	-0.124939
-1.426098	registers, which	-0.124939
-1.359151	detect which	-0.124939
-1.373077	deciding which	-0.124939
-1.286877	latency which	-0.124939
-0.599192	true, which	-0.425969
-0.502997	up, which	-0.124939
-1.193462	advance which	-0.124939
-1.189967	{} which	-0.124939
-1.189967	default, which	-0.124939
-1.068524	abstraction which	-0.124939
-1.068524	optimization, which	-0.124939
-1.068524	comparisons, which	-0.124939
-0.378775	stack, which	-0.124939
-1.068524	(STL) which	-0.124939
-1.068524	decide which	-0.124939
-1.068524	references, which	-0.124939
-0.895956	operator, which	-0.124939
-0.895956	-56 which	-0.124939
-0.895956	number, which	-0.124939
-0.895956	CPU, which	-0.124939
-0.895956	intervals which	-0.124939
-0.895956	array, which	-0.124939
-0.895956	interpreter which	-0.124939
-0.895956	division, which	-0.124939
-0.895956	comparison, which	-0.124939
-0.895956	counter, which	-0.124939
-0.895956	decides which	-0.124939
-0.895956	collector which	-0.124939
-0.895956	certainty which	-0.124939
-0.203401	operation, which	-0.425969
-0.598478	moved, which	-0.124939
-0.598478	mispredicted, which	-0.124939
-0.598478	(DLL) which	-0.124939
-0.598478	x<<3, which	-0.124939
-0.598478	branch, which	-0.124939
-0.598478	asmlib, which	-0.124939
-0.598478	polymorphism, which	-0.124939
-0.598478	attribute which	-0.124939
-0.598478	results, which	-0.124939
-0.598478	matters, which	-0.124939
-0.598478	-ffunction-sections) which	-0.124939
-0.598478	everything, which	-0.124939
-0.598478	output, which	-0.124939
-0.598478	.NET, which	-0.124939
-0.598478	a[] which	-0.124939
-0.598478	multiplications, which	-0.124939
-0.598478	bit-mask which	-0.124939
-0.598478	(eax) which	-0.124939
-0.598478	model, which	-0.124939
-0.598478	WritePrivateProfileString, which	-0.124939
-0.598478	(RTTI), which	-0.124939
-0.598478	YMM) which	-0.124939
-0.598478	2.5, which	-0.124939
-2.993113	is all	-0.124939
-2.533140	of all	-0.124939
-2.447235	to all	-0.124939
-2.475076	and all	-0.249877
-2.415266	in all	-0.124939
-1.764147	for all	-0.154902
-1.868056	that all	-0.263241
-1.917177	if all	-0.301030
-2.240076	by all	-0.124939
-1.581552	with all	-0.321233
-1.482346	on all	-0.124939
-2.853053	not all	-0.124939
-2.809921	when all	-0.124939
-2.195698	then all	-0.124939
-1.760108	at all	-0.124939
-2.730917	make all	-0.124939
-1.875027	because all	-0.301030
-2.405123	before all	-0.124939
-2.381273	call all	-0.124939
-2.385110	example, all	-0.124939
-2.306751	test all	-0.124939
-2.198933	while all	-0.124939
-2.144782	cause all	-0.124939
-2.084333	would all	-0.124939
-1.399873	store all	-0.425969
-2.051200	addresses all	-0.124939
-2.009631	replace all	-0.124939
-1.311006	sets all	-0.124939
-1.978317	needs all	-0.124939
-1.939995	Make all	-0.124939
-1.935806	examples all	-0.124939
-1.935806	last all	-0.124939
-0.987777	after all	-0.301030
-1.898387	load all	-0.124939
-1.788748	off all	-0.124939
-0.849229	Supports all	-0.602060
-1.101907	inlining all	-0.425969
-1.758589	including all	-0.124939
-1.754359	checking all	-0.124939
-1.735049	prevents all	-0.124939
-1.692211	testing all	-0.124939
-1.692211	copying all	-0.124939
-1.012609	causes all	-0.124939
-1.621770	why all	-0.124939
-1.586769	Obviously, all	-0.124939
-1.536740	contain all	-0.124939
-1.547499	stores all	-0.124939
-1.541012	across all	-0.124939
-0.803520	almost all	-0.124939
-1.431867	Likewise, all	-0.124939
-1.436181	saved all	-0.124939
-1.436181	remove all	-0.124939
-1.369235	declare all	-0.124939
-1.292227	select all	-0.124939
-1.290053	Fortunately, all	-0.124939
-1.195317	(Gnu) all	-0.124939
-1.074757	manipulate all	-0.124939
-1.072562	language, all	-0.124939
-0.379323	scans all	-0.425969
-1.074757	solve all	-0.124939
-1.074757	Not all	-0.124939
-0.898666	join all	-0.124939
-0.898666	distribute all	-0.124939
-0.599842	pool all	-0.124939
-0.599842	removed, all	-0.124939
-0.599842	analyze all	-0.124939
-2.894331	function but	-0.124939
-2.733067	time but	-0.124939
-2.514928	all but	-0.124939
-2.520370	Intel but	-0.124939
-2.525156	i but	-0.124939
-2.422452	C++ but	-0.124939
-2.455456	order but	-0.124939
-2.335762	example, but	-0.124939
-2.249596	test but	-0.124939
-2.268116	user but	-0.124939
-2.228759	processors but	-0.124939
-2.184918	a, but	-0.124939
-2.123249	overflow but	-0.124939
-1.964847	programs but	-0.124939
-1.088440	cases, but	-0.124939
-1.949053	multiplication but	-0.124939
-1.302269	automatically but	-0.124939
-0.897018	function, but	-0.301030
-0.661287	functions, but	-0.124939
-1.069967	code, but	-0.124939
-0.599139	time, but	-0.124939
-0.896370	used, but	-0.124939
-0.896370	set, but	-0.124939
-0.898040	processors, but	-0.124939
-1.476487	systems, but	-0.124939
-0.549030	CPUs, but	-0.124939
-1.472542	variables, but	-0.124939
-1.480468	8, but	-0.124939
-1.492633	cycles, but	-0.124939
-1.430548	1, but	-0.124939
-1.418495	memory, but	-0.124939
-1.422476	system, but	-0.124939
-1.418495	integers, but	-0.124939
-1.355529	course, but	-0.124939
-1.355529	above, but	-0.124939
-0.424560	efficient, but	-0.124939
-1.359546	edx but	-0.124939
-1.284420	case, but	-0.124939
-1.284420	library, but	-0.124939
-1.280365	programming, but	-0.124939
-0.345379	threads, but	-0.124939
-1.284420	features, but	-0.124939
-1.280365	languages, but	-0.124939
-1.187510	BSD, but	-0.124939
-0.502618	well, but	-0.124939
-1.187510	needed, but	-0.124939
-1.187510	precision, but	-0.124939
-1.187510	spot but	-0.124939
-1.191603	float, but	-0.124939
-1.187510	is, but	-0.124939
-1.187510	unit-test but	-0.124939
-0.502618	pointer, but	-0.124939
-1.066665	bits, but	-0.124939
-1.070797	libraries, but	-0.124939
-1.066665	devices, but	-0.124939
-1.066665	numbers, but	-0.124939
-1.066665	64, but	-0.124939
-1.066665	occurs, but	-0.124939
-1.066665	automatically, but	-0.124939
-1.066665	readable but	-0.124939
-1.066665	instructions, but	-0.124939
-1.066665	metaprogramming, but	-0.124939
-1.066665	vectors, but	-0.124939
-0.894706	storage, but	-0.124939
-0.894706	expressions, but	-0.124939
-0.894706	again, but	-0.124939
-0.203274	applications, but	-0.124939
-0.894706	vectorized, but	-0.124939
-0.894706	expression, but	-0.124939
-0.203274	occur, but	-0.124939
-0.894706	hyperthreading, but	-0.124939
-0.894706	test, but	-0.124939
-0.894706	software, but	-0.124939
-0.894706	complex, but	-0.124939
-0.894706	loaded, but	-0.124939
-0.894706	flexible, but	-0.124939
-0.894706	relocation, but	-0.124939
-0.894706	unit, but	-0.124939
-0.894706	usability, but	-0.124939
-0.894706	manual, but	-0.124939
-0.894706	setup but	-0.124939
-0.894706	type, but	-0.124939
-0.894706	reasons, but	-0.124939
-0.894706	method, but	-0.124939
-0.597848	factorials, but	-0.124939
-0.597848	spots, but	-0.124939
-0.597848	macro, but	-0.124939
-0.597848	.a), but	-0.124939
-0.597848	2-20, but	-0.124939
-0.597848	-mcmodel=large, but	-0.124939
-0.597848	relocate, but	-0.124939
-0.597848	if), but	-0.124939
-0.597848	core, but	-0.124939
-0.597848	bases, but	-0.124939
-0.597848	primitive, but	-0.124939
-0.597848	main, but	-0.124939
-0.597848	block, but	-0.124939
-0.597848	hint, but	-0.124939
-0.597848	manually, but	-0.124939
-0.597848	-ftrapv, but	-0.124939
-0.597848	GB, but	-0.124939
-0.597848	profiling, but	-0.124939
-0.597848	103), but	-0.124939
-0.597848	often, but	-0.124939
-0.597848	security, but	-0.124939
-0.597848	solution, but	-0.124939
-0.597848	situation, but	-0.124939
-0.597848	restriction, but	-0.124939
-0.597848	required, but	-0.124939
-0.597848	Faster, but	-0.124939
-0.597848	caching, but	-0.124939
-0.597848	noticeable but	-0.124939
-0.597848	subtasks, but	-0.124939
-0.597848	expandable, but	-0.124939
-0.597848	aliasing, but	-0.124939
-0.597848	15.1c, but	-0.124939
-0.597848	cached, but	-0.124939
-0.597848	job, but	-0.124939
-0.597848	computing, but	-0.124939
-0.597848	casting, but	-0.124939
-0.597848	section, but	-0.124939
-0.597848	rounding, but	-0.124939
-0.597848	wrong, but	-0.124939
-0.597848	destination, but	-0.124939
-0.597848	symbols, but	-0.124939
-0.597848	platform, but	-0.124939
-1.758187	is used	-0.564271
-3.243107	and used	-0.124939
-1.287903	be used	-0.590779
-1.439134	are used	-0.498519
-3.143939	// used	-0.124939
-3.168235	it used	-0.124939
-2.389200	not used	-0.425969
-2.280920	have used	-0.124939
-2.260562	time used	-0.124939
-2.236408	when used	-0.124939
-2.172585	memory used	-0.425969
-2.787380	data used	-0.124939
-2.747577	only used	-0.124939
-2.741260	CPU used	-0.124939
-1.994308	most used	-0.124939
-1.635732	also used	-0.124939
-2.504795	we used	-0.124939
-1.021317	often used	-0.124939
-1.645557	method used	-0.124939
-2.130481	get used	-0.124939
-2.031905	was used	-0.124939
-2.026059	space used	-0.124939
-2.007897	typically used	-0.124939
-1.976841	model used	-0.124939
-1.296298	never used	-0.425969
-1.933358	longer used	-0.124939
-1.706101	When used	-0.124939
-1.496473	now used	-0.124939
-1.496473	algorithms used	-0.124939
-1.438481	had used	-0.124939
-1.439710	generally used	-0.124939
-1.372764	87 used	-0.124939
-1.294815	currently used	-0.124939
-0.249599	commonly used	-0.124939
-0.504399	seldom used	-0.124939
-1.075441	Pascal used	-0.124939
-0.600810	Microcontrollers used	-0.124939
-2.834851	the one	-0.522879
-2.580647	is one	-0.346788
-3.271514	a one	-0.124939
-2.942510	of one	-0.425969
-2.527858	to one	-0.346788
-2.232501	and one	-0.271067
-2.009687	in one	-0.204120
-2.703294	for one	-0.124939
-2.708328	that one	-0.124939
-3.039099	are one	-0.124939
-2.901755	or one	-0.124939
-3.073682	if one	-0.124939
-2.465413	by one	-0.124939
-2.954247	on one	-0.124939
-2.957436	code one	-0.124939
-2.873159	int one	-0.124939
-1.742899	than one	-0.124939
-2.025613	have one	-0.124939
-2.235127	use one	-0.124939
-1.654808	from one	-0.124939
-1.895478	has one	-0.124939
-2.129550	make one	-0.124939
-0.933776	only one	-0.221849
-2.079393	If one	-0.425969
-1.821424	which one	-0.301030
-1.971610	using one	-0.124939
-1.337113	into one	-0.301030
-2.563650	number one	-0.124939
-2.496066	where one	-0.124939
-2.483731	takes one	-0.124939
-2.384240	example, one	-0.124939
-2.260138	up one	-0.124939
-2.239212	times one	-0.124939
-2.159843	inside one	-0.124939
-2.109331	get one	-0.124939
-1.296540	needs one	-0.425969
-1.918212	read one	-0.124939
-1.852424	goes one	-0.124939
-1.858813	choose one	-0.124939
-1.153426	just one	-0.124939
-1.828943	function, one	-0.124939
-1.804119	go one	-0.124939
-1.717689	save one	-0.124939
-1.044728	preceding one	-0.124939
-1.691862	adding one	-0.124939
-0.978675	least one	-0.124939
-1.657100	handle one	-0.124939
-1.632313	enable one	-0.124939
-1.627936	program, one	-0.124939
-1.584371	set, one	-0.124939
-1.590920	allocate one	-0.124939
-1.433824	eliminate one	-0.124939
-1.431641	available, one	-0.124939
-1.289890	22 one	-0.124939
-1.292095	Only one	-0.124939
-1.195185	units, one	-0.124939
-1.072463	allocates one	-0.124939
-0.898600	randomly one	-0.124939
-0.898600	times, one	-0.124939
-0.599809	16383 one	-0.124939
-0.599809	signifying one	-0.124939
-0.599809	Contain one	-0.124939
-0.599809	shifts one	-0.124939
-0.599809	parts: one	-0.124939
-0.599809	branches: one	-0.124939
-0.599809	names, one	-0.124939
-0.599809	inserted, one	-0.124939
-2.262086	the cache	-0.346788
-2.258912	a cache	-0.279841
-2.191342	of cache	-0.263241
-3.355394	to cache	-0.124939
-2.809924	and cache	-0.124939
-2.525110	The cache	-0.301030
-3.119625	be cache	-0.124939
-2.964512	or cache	-0.124939
-3.010637	by cache	-0.124939
-1.643061	code cache	-0.301030
-2.936853	as cache	-0.124939
-2.825990	more cache	-0.124939
-2.193535	A cache	-0.124939
-1.474934	data cache	-0.191886
-2.698969	different cache	-0.124939
-1.506766	same cache	-0.301030
-2.714259	CPU cache	-0.124939
-2.656015	other cache	-0.124939
-2.704131	used cache	-0.124939
-2.009200	no cache	-0.124939
-2.624509	most cache	-0.124939
-2.511576	where cache	-0.124939
-2.463735	any cache	-0.124939
-2.315479	new cache	-0.124939
-2.303353	these cache	-0.124939
-1.131749	without cache	-0.124939
-1.595444	up cache	-0.124939
-2.229199	extra cache	-0.124939
-1.486158	cause cache	-0.124939
-1.198119	four cache	-0.602060
-1.941559	last cache	-0.124939
-1.220078	Each cache	-0.425969
-1.885891	improve cache	-0.124939
-0.124861	level-2 cache	-0.204120
-1.759269	code, cache	-0.124939
-1.752602	No cache	-0.124939
-0.425635	level-1 cache	-0.124939
-1.725960	special cache	-0.124939
-1.730989	prevent cache	-0.124939
-1.724296	save cache	-0.124939
-1.626254	entire cache	-0.124939
-1.435048	expensive cache	-0.124939
-1.436757	thousand cache	-0.124939
-1.073950	arbitrary cache	-0.124939
-0.379510	Explicit cache	-0.425969
-1.075687	micro-op cache	-0.124939
-0.899595	Provoke cache	-0.124939
-0.600309	sets, cache	-0.124939
-0.600309	(total cache	-0.124939
-0.600309	economy, cache	-0.124939
-0.600309	taking cache	-0.124939
-0.600309	executed, cache	-0.124939
-2.682845	that should	-0.425969
-2.526501	it should	-0.124939
-2.113122	function should	-0.249877
-2.175070	code should	-0.124939
-2.910087	This should	-0.124939
-2.857078	compiler should	-0.124939
-1.310873	you should	-0.271067
-1.788375	It should	-0.249877
-2.684122	data should	-0.124939
-1.889327	program should	-0.124939
-2.669271	functions should	-0.124939
-2.654438	loop should	-0.124939
-2.546006	class should	-0.124939
-2.507611	example should	-0.124939
-2.554991	size should	-0.124939
-2.558326	b should	-0.124939
-2.494116	object should	-0.124939
-2.465602	C++ should	-0.124939
-1.873201	There should	-0.124939
-1.846862	array should	-0.425969
-1.595243	objects should	-0.602060
-2.451737	we should	-0.124939
-1.187030	You should	-0.301030
-2.389764	table should	-0.124939
-2.413313	performance should	-0.124939
-2.368613	software should	-0.124939
-2.401789	branch should	-0.124939
-1.414026	test should	-0.124939
-2.278106	systems should	-0.124939
-2.285672	method should	-0.124939
-2.253487	cases should	-0.124939
-2.216069	constant should	-0.124939
-1.542596	arrays should	-0.425969
-2.172808	calculations should	-0.124939
-1.546722	versions should	-0.425969
-2.175750	bytes should	-0.124939
-2.168351	threads should	-0.124939
-2.147881	thread should	-0.124939
-2.127401	etc. should	-0.124939
-2.079526	list should	-0.124939
-2.022503	counter should	-0.124939
-2.035995	count should	-0.124939
-1.987259	programs should	-0.124939
-2.000584	problems should	-0.124939
-1.997886	dispatching should	-0.124939
-1.951366	block should	-0.124939
-1.944601	parameter should	-0.124939
-1.944601	resources should	-0.124939
-1.004529	dispatcher should	-0.124939
-1.917869	mechanism should	-0.124939
-1.870266	framework should	-0.124939
-0.786361	together should	-0.726999
-1.173449	process should	-0.124939
-1.848730	results should	-0.124939
-1.837627	storage should	-0.124939
-1.746068	lines should	-0.124939
-1.716104	output should	-0.124939
-0.977455	containers should	-0.124939
-1.654817	iteration should	-0.124939
-0.938470	updates should	-0.425969
-1.572797	statements should	-0.124939
-0.898273	unrolling should	-0.124939
-1.584267	penalty should	-0.124939
-1.575636	counts should	-0.124939
-1.538509	device should	-0.124939
-1.484461	binding should	-0.124939
-1.487357	interrupt should	-0.124939
-1.432280	Software should	-0.124939
-0.744571	developers should	-0.124939
-1.286152	guidelines should	-0.124939
-0.503337	queue should	-0.425969
-1.195132	video should	-0.124939
-1.195132	measurement should	-0.124939
-1.192177	considerations should	-0.124939
-1.195132	routine should	-0.124939
-1.198108	modified should	-0.124939
-1.070193	feedback should	-0.124939
-1.070193	calculations, should	-0.124939
-0.897078	formats should	-0.124939
-0.897078	servers should	-0.124939
-0.897078	wide, should	-0.124939
-0.599043	function) should	-0.124939
-0.599043	Patches should	-0.124939
-0.599043	Users should	-0.124939
-0.599043	complaints should	-0.124939
-0.599043	scheme should	-0.124939
-0.599043	imprecisions should	-0.124939
-0.599043	Uninstallation should	-0.124939
-0.599043	file) should	-0.124939
-0.599043	malloc/free should	-0.124939
-2.512242	the integer	-0.166331
-2.951499	of integer	-0.124939
-2.531096	to integer	-0.124939
-2.780936	and integer	-0.124939
-3.353792	in integer	-0.124939
-2.375279	The integer	-0.124939
-2.367108	for integer	-0.124939
-3.055096	are integer	-0.124939
-2.949731	with integer	-0.124939
-2.043732	on integer	-0.249877
-2.901583	as integer	-0.124939
-1.119397	an integer	-0.226396
-2.861671	than integer	-0.124939
-2.818876	use integer	-0.124939
-1.983568	more integer	-0.124939
-2.162498	from integer	-0.425969
-2.731176	vector integer	-0.124939
-2.107709	different integer	-0.425969
-2.770819	because integer	-0.124939
-2.628661	other integer	-0.124939
-2.592971	each integer	-0.124939
-2.605817	do integer	-0.124939
-2.496689	two integer	-0.124939
-1.889528	64-bit integer	-0.124939
-2.527558	efficient integer	-0.124939
-1.783044	32-bit integer	-0.124939
-2.410882	critical integer	-0.124939
-1.727328	bit integer	-0.124939
-1.323226	unsigned integer	-0.124939
-2.290061	these integer	-0.124939
-2.307865	even integer	-0.124939
-1.346747	simple integer	-0.124939
-1.285180	An integer	-0.124939
-2.091102	contains integer	-0.124939
-2.002948	particular integer	-0.124939
-2.010679	replace integer	-0.124939
-1.295126	Using integer	-0.425969
-0.884102	signed integer	-0.124939
-1.947129	means integer	-0.124939
-1.221084	aligned integer	-0.425969
-1.805754	negative integer	-0.124939
-1.805754	positive integer	-0.124939
-1.658496	mix integer	-0.124939
-0.279437	unaligned integer	-0.903090
-1.541691	256-bit integer	-0.124939
-1.541691	default integer	-0.124939
-1.488491	variables, integer	-0.124939
-0.491738	smallest integer	-0.602060
-1.367658	complex integer	-0.124939
-0.679124	six integer	-0.124939
-1.292621	fourteen integer	-0.124939
-1.294708	reducing integer	-0.124939
-0.600788	Calculate integer	-0.124939
-1.195711	additional integer	-0.124939
-1.195711	defining integer	-0.124939
-1.072859	involving integer	-0.124939
-1.072859	nearest integer	-0.124939
-1.072859	Simple integer	-0.124939
-0.898865	elimin., integer	-0.124939
-0.599942	compiler) integer	-0.124939
-0.599942	(6 integer	-0.124939
-0.599942	violation, integer	-0.124939
-0.599942	trap integer	-0.124939
-1.559489	is no	-0.363821
-2.524414	and no	-0.124939
-2.362585	that no	-0.124939
-2.709438	be no	-0.124939
-1.919789	are no	-0.176091
-2.168844	or no	-0.124939
-2.284829	if no	-0.301030
-3.059918	- no	-0.124939
-1.278978	have no	-0.271067
-2.240377	when no	-0.124939
-1.294717	has no	-0.124939
-1.821038	but no	-0.124939
-1.323832	takes no	-0.522879
-1.761910	makes no	-0.425969
-1.470950	take no	-0.301030
-2.251984	about no	-0.124939
-2.136299	get no	-0.124939
-1.430455	contains no	-0.124939
-2.077007	simply no	-0.124939
-1.910717	requires no	-0.124939
-1.890482	control no	-0.124939
-1.899172	assume no	-0.124939
-1.767459	produce no	-0.124939
-1.048039	Assume no	-0.425969
-1.441294	generates no	-0.124939
-1.077218	assuming no	-0.124939
-1.076239	(requires no	-0.124939
-0.203923	"assume no	-0.425969
-0.601077	virtually no	-0.124939
-3.458707	and page	-0.124939
-1.109681	on page	-0.170696
-2.853411	memory page	-0.124939
-2.854233	at page	-0.124939
-2.571527	also page	-0.124939
-0.150094	See page	-0.179296
-0.131403	(see page	-0.194020
-1.316717	see page	-0.124939
-1.873648	10 page	-0.124939
-0.393669	(See page	-0.124939
-1.377331	above, page	-0.124939
-1.377810	13.1 page	-0.124939
-0.902128	14.23 page	-0.124939
-0.601579	7.35 page	-0.124939
-3.694088	the set	-0.124939
-3.084880	is set	-0.124939
-3.057953	a set	-0.425969
-2.472760	to set	-0.204120
-3.496323	in set	-0.124939
-2.709438	be set	-0.124939
-3.221912	are set	-0.124939
-2.641075	can set	-0.124939
-2.208474	// set	-0.425969
-2.819208	from set	-0.124939
-2.805341	data set	-0.124939
-2.758073	different set	-0.124939
-2.745693	same set	-0.124939
-0.299615	instruction set	-0.499949
-2.730156	which set	-0.124939
-2.735092	used set	-0.124939
-2.702900	one set	-0.124939
-2.654622	each set	-0.124939
-2.629868	pointer set	-0.124939
-2.440523	cannot set	-0.124939
-2.027155	particular set	-0.124939
-1.885729	own set	-0.124939
-0.158244	Instruction set	-0.124939
-1.547535	typical set	-0.124939
-1.550438	addition, set	-0.124939
-1.440324	suitable set	-0.124939
-1.296137	list, set	-0.124939
-0.601273	realistic set	-0.425969
-0.124829	instr. set	-0.124939
-2.522702	the class	-0.212089
-2.000898	a class	-0.301030
-3.370056	of class	-0.124939
-3.383391	to class	-0.124939
-2.823768	and class	-0.124939
-3.183719	for class	-0.124939
-1.641032	or class	-0.204120
-2.197548	A class	-0.124939
-1.246570	vector class	-0.564271
-2.106805	same class	-0.124939
-2.737392	functions class	-0.124939
-2.662818	all class	-0.124939
-2.666688	one class	-0.124939
-1.283689	template class	-0.249877
-2.254558	simple class	-0.124939
-1.201483	}; class	-0.124939
-0.822701	container class	-0.204120
-1.962116	what class	-0.124939
-0.282984	child class	-0.191886
-1.668728	containers class	-0.124939
-0.529143	generation class	-0.124939
-0.647113	Vector class	-0.124939
-1.589547	declaration class	-0.124939
-0.233781	derived class	-0.204120
-0.441871	parent class	-0.249877
-1.494181	polymorphic class	-0.124939
-1.370793	base class	-0.124939
-0.600792	inheritance class	-0.425969
-0.249525	N> class	-0.124939
-1.074446	54 class	-0.124939
-1.074446	involving class	-0.124939
-1.074446	7.28 class	-0.124939
-1.074446	Devirtualization class	-0.124939
-1.076019	7.14 class	-0.124939
-0.203803	B1; class	-0.425969
-0.899928	object's class	-0.124939
-0.899928	class: class	-0.124939
-0.899928	Disp(); class	-0.124939
-0.203803	template<> class	-0.124939
-0.899928	Converting class	-0.124939
-0.899928	B2; class	-0.124939
-0.600476	versions: class	-0.124939
-0.600476	MyChild> class	-0.124939
-0.600476	7.44 class	-0.124939
-0.600476	7.37 class	-0.124939
-0.600476	7.36 class	-0.124939
-0.600476	7.41a class	-0.124939
-2.443216	the floating	-1.238882
-3.347214	is floating	-0.124939
-2.140062	a floating	-1.238882
-2.276773	of floating	-1.079181
-2.322508	to floating	-0.550907
-2.384819	and floating	-0.823909
-2.893019	in floating	-0.425969
-3.118035	The floating	-0.124939
-2.264105	for floating	-0.522879
-3.198799	that floating	-0.124939
-3.067494	are floating	-0.124939
-2.929918	or floating	-0.124939
-2.213834	with floating	-0.602060
-2.195888	on floating	-0.602060
-2.910922	as floating	-0.124939
-2.329497	than floating	-0.425969
-2.833663	have floating	-0.124939
-2.728430	A floating	-0.124939
-1.769406	from floating	-0.726999
-1.888507	make floating	-0.602060
-2.680162	different floating	-0.124939
-2.634752	all floating	-0.124939
-2.642295	one floating	-0.124939
-2.598072	each floating	-0.124939
-1.646507	two floating	-0.602060
-1.827034	any floating	-0.425969
-1.785223	between floating	-0.425969
-1.506003	makes floating	-0.602060
-2.372977	8 floating	-0.124939
-2.307339	new floating	-0.124939
-1.584082	making floating	-0.425969
-1.569961	does floating	-0.425969
-2.186952	big floating	-0.124939
-1.176162	eight floating	-0.602060
-2.093020	contains floating	-0.124939
-1.412796	doing floating	-0.425969
-2.033249	fast floating	-0.124939
-1.859119	generate floating	-0.124939
-1.806984	positive floating	-0.124939
-1.690478	100 floating	-0.124939
-1.696237	causes floating	-0.124939
-1.659547	mix floating	-0.124939
-1.657627	Any floating	-0.124939
-1.623686	entire floating	-0.124939
-1.538473	style floating	-0.124939
-1.489265	variables, floating	-0.124939
-1.293015	strict floating	-0.124939
-0.600445	nonzero floating	-0.425969
-1.291034	larger floating	-0.124939
-1.196105	additional floating	-0.124939
-0.203716	manipulating floating	-0.425969
-0.899064	Catch floating	-0.124939
-0.899064	mispredictions, floating	-0.124939
-0.600042	precise floating	-0.124939
-0.600042	Non-strict floating	-0.124939
-0.600042	native floating	-0.124939
-0.600042	Reset floating	-0.124939
-0.600042	relaxed floating	-0.124939
-0.600042	relax floating	-0.124939
-0.600042	FMA3 floating	-0.124939
-1.973983	of each	-0.234083
-2.457403	to each	-0.204120
-2.812657	and each	-0.124939
-2.333822	in each	-0.124939
-1.540282	for each	-0.200659
-2.153062	that each	-0.204120
-2.969037	or each	-0.124939
-2.518628	if each	-0.124939
-3.013978	by each	-0.124939
-2.451605	with each	-0.124939
-2.336691	than each	-0.425969
-2.855315	have each	-0.124939
-2.861672	time each	-0.124939
-2.200412	then each	-0.124939
-2.769170	from each	-0.124939
-2.767313	memory each	-0.124939
-2.163923	at each	-0.124939
-2.127071	because each	-0.124939
-2.734269	If each	-0.124939
-2.599440	using each	-0.124939
-2.610911	into each	-0.124939
-1.336465	where each	-0.124939
-2.515407	value each	-0.124939
-2.437279	between each	-0.124939
-2.398376	less each	-0.124939
-2.322298	test each	-0.124939
-1.333259	times each	-0.301030
-2.202406	But each	-0.124939
-1.415792	calculate each	-0.124939
-2.075685	store each	-0.124939
-1.992401	next each	-0.124939
-1.242084	after each	-0.124939
-1.900831	give each	-0.124939
-1.863937	Here, each	-0.124939
-1.835518	function, each	-0.124939
-1.626576	updates each	-0.124939
-1.596928	within each	-0.124939
-0.158156	near each	-0.669007
-1.370007	giving each	-0.124939
-0.600685	AND each	-0.425969
-1.294200	development, each	-0.124939
-1.294200	i.e. each	-0.124939
-1.075753	After each	-0.124939
-1.074049	contrary, each	-0.124939
-1.075753	moving each	-0.124939
-1.074049	necessary, each	-0.124939
-0.899662	invalidate each	-0.124939
-0.899662	draw each	-0.124939
-0.203776	Compare each	-0.425969
-0.600342	versions, each	-0.124939
-0.600342	neutralize each	-0.124939
-1.461734	to do	-0.349335
-2.158176	that do	-0.425969
-1.452289	can do	-0.216709
-3.152354	// do	-0.124939
-2.540898	or do	-0.124939
-2.391324	not do	-0.124939
-2.978529	you do	-0.124939
-1.961050	will do	-0.124939
-2.780026	program do	-0.124939
-2.704382	should do	-0.124939
-1.996820	compilers do	-0.425969
-1.587716	we do	-0.301030
-2.479368	variables do	-0.124939
-1.760294	cannot do	-0.124939
-1.700821	libraries do	-0.425969
-2.348237	pointers do	-0.124939
-2.333299	systems do	-0.124939
-2.352089	they do	-0.124939
-2.299190	operations do	-0.124939
-2.240620	must do	-0.124939
-2.091058	able do	-0.124939
-1.906858	directives do	-0.124939
-1.884532	vectors do	-0.124939
-1.820818	contentions do	-0.124939
-1.818517	references do	-0.124939
-1.792188	conversions do	-0.124939
-1.671863	containers do	-0.124939
-1.593843	programmers do	-0.124939
-1.295145	Compilers do	-0.124939
-1.199406	Can do	-0.124939
-0.249614	ranges do	-0.602060
-1.075641	register, do	-0.124939
-0.600877	studied do	-0.124939
-0.600877	live-ranges do	-0.124939
-0.600877	ranges) do	-0.124939
-3.320119	the example	-0.124939
-2.817212	of example	-0.124939
-3.494876	to example	-0.124939
-1.444715	in example	-0.202105
-3.279537	The example	-0.124939
-2.004196	for example	-0.234083
-2.188044	as example	-0.124939
-3.041494	This example	-0.124939
-1.967375	an example	-0.425969
-2.961375	than example	-0.124939
-1.533238	this example	-0.249877
-1.570160	from example	-0.204120
-2.742883	same example	-0.124939
-2.637407	using example	-0.124939
-1.347781	In example	-0.124939
-2.435116	For example	-0.124939
-2.331224	these example	-0.124939
-0.503161	following example	-0.681241
-2.219831	An example	-0.124939
-2.044570	optimize example	-0.124939
-0.949342	above example	-0.249877
-2.008702	next example	-0.124939
-2.000071	like example	-0.124939
-1.866797	reduce example	-0.124939
-0.726095	convert example	-0.124939
-1.593995	modify example	-0.124939
-0.901668	My example	-0.425969
-1.297988	reducing example	-0.124939
-0.901060	reduces example	-0.124939
-0.601044	SelectAddMul example	-0.124939
-2.523364	the compilers	-0.321233
-2.282079	The compilers	-0.221849
-3.188814	for compilers	-0.124939
-3.004524	with compilers	-0.124939
-3.012564	on compilers	-0.124939
-2.780047	from compilers	-0.124939
-1.721725	different compilers	-0.124939
-2.731381	only compilers	-0.124939
-1.486550	other compilers	-0.124939
-2.665054	all compilers	-0.124939
-1.742906	most compilers	-0.124939
-1.355632	Intel compilers	-0.204120
-2.544248	64-bit compilers	-0.124939
-0.811069	C++ compilers	-0.425969
-1.822026	some compilers	-0.124939
-2.402890	example, compilers	-0.124939
-1.703100	how compilers	-0.124939
-1.660784	these compilers	-0.124939
-1.578162	Gnu compilers	-0.124939
-0.497361	Some compilers	-0.550907
-2.195489	best compilers	-0.124939
-2.176853	common compilers	-0.124939
-1.498980	good compilers	-0.124939
-2.093500	few compilers	-0.124939
-1.136255	optimizing compilers	-0.124939
-0.403757	Most compilers	-0.467361
-1.897181	Many compilers	-0.124939
-1.668988	Optimizing compilers	-0.124939
-0.979793	Other compilers	-0.124939
-0.942623	PathScale compilers	-0.425969
-1.628190	why compilers	-0.124939
-0.900612	future compilers	-0.124939
-0.900303	current compilers	-0.124939
-1.551667	Modern compilers	-0.124939
-1.440988	latest compilers	-0.124939
-1.439460	Intel's compilers	-0.124939
-0.680310	How compilers	-0.425969
-1.293331	Mars compilers	-0.124939
-1.197950	commercial compilers	-0.124939
-1.197950	Watcom compilers	-0.124939
-1.076085	Different compilers	-0.124939
-0.899994	Current compilers	-0.124939
-0.899994	Supported compilers	-0.124939
-0.600509	Good compilers	-0.124939
-0.600509	Few compilers	-0.124939
-0.600509	(Some compilers	-0.124939
-1.975873	the most	-0.569875
-2.711111	is most	-0.124939
-3.420498	of most	-0.124939
-2.846879	and most	-0.124939
-1.944152	in most	-0.572097
-1.751065	The most	-0.483961
-3.226256	for most	-0.124939
-2.508355	that most	-0.124939
-2.451035	are most	-0.301030
-3.056212	by most	-0.124939
-2.463302	with most	-0.124939
-2.051465	on most	-0.124939
-2.982601	as most	-0.124939
-2.931836	than most	-0.124939
-2.826610	will most	-0.124939
-2.791811	has most	-0.124939
-2.721359	used most	-0.124939
-1.116205	In most	-0.550907
-2.525480	where most	-0.124939
-2.444441	way most	-0.124939
-2.397366	8 most	-0.124939
-2.393907	take most	-0.124939
-2.247307	accessed most	-0.124939
-2.224538	while most	-0.124939
-2.209919	But most	-0.124939
-1.224389	works most	-0.602060
-2.129039	uses most	-0.124939
-2.097459	run most	-0.124939
-2.046374	However, most	-0.124939
-1.816963	predicted most	-0.124939
-1.708161	On most	-0.124939
-0.746871	spend most	-0.425969
-1.294485	Fortunately, most	-0.124939
-1.294485	runs most	-0.124939
-1.198876	Furthermore, most	-0.124939
-1.200181	obtain most	-0.124939
-1.075242	consumes most	-0.124939
-0.600743	Since most	-0.124939
-0.600743	spends most	-0.124939
-3.060710	is using	-0.124939
-1.888277	of using	-0.353418
-2.280002	to using	-0.176091
-2.648793	and using	-0.124939
-3.453816	in using	-0.124939
-2.532989	for using	-0.124939
-1.785459	are using	-0.204120
-3.112116	function using	-0.124939
-1.108051	by using	-0.401145
-2.205977	on using	-0.301030
-2.986328	as using	-0.124939
-2.949269	not using	-0.124939
-2.935021	than using	-0.124939
-2.865074	when using	-0.124939
-2.798036	from using	-0.124939
-2.610459	example using	-0.124939
-2.519654	array using	-0.124939
-2.452242	between using	-0.124939
-1.740543	example, using	-0.124939
-1.248392	without using	-0.249877
-2.321645	method using	-0.124939
-1.503866	power using	-0.124939
-2.163319	matrix using	-0.124939
-2.165727	AVX using	-0.124939
-2.144072	calculated using	-0.124939
-2.103734	operators using	-0.124939
-2.041915	allocation using	-0.124939
-2.020755	preferably using	-0.124939
-1.920721	expressions using	-0.124939
-1.761753	operation using	-0.124939
-1.764236	fact using	-0.124939
-1.707097	conditions using	-0.124939
-0.901038	set, using	-0.425969
-1.438252	memory, using	-0.124939
-0.680214	am using	-0.124939
-1.201557	finished using	-0.124939
-0.900527	Programs using	-0.124939
-0.600776	optimized, using	-0.124939
-3.404163	the double	-0.124939
-3.294916	is double	-0.124939
-2.472275	a double	-0.124939
-2.752804	to double	-0.124939
-2.172278	and double	-0.249877
-2.696643	for double	-0.425969
-3.161605	that double	-0.124939
-3.023670	are double	-0.124939
-3.146560	can double	-0.124939
-2.515038	= double	-0.124939
-1.818507	or double	-0.329059
-2.320863	than double	-0.124939
-1.888195	{ double	-0.124939
-2.802948	use double	-0.124939
-2.699309	A double	-0.124939
-2.795883	} double	-0.124939
-2.682232	loop double	-0.124939
-2.578292	b double	-0.124939
-2.483819	two double	-0.124939
-1.488519	static double	-0.726999
-1.887323	64-bit double	-0.124939
-2.397351	2 double	-0.124939
-1.098983	long double	-0.191886
-1.353868	const double	-0.249877
-2.364933	4 double	-0.124939
-2.196390	; double	-0.124939
-2.143477	AVX double	-0.124939
-2.106588	128 double	-0.124939
-2.089934	four double	-0.124939
-2.121702	b; double	-0.124939
-2.081722	would double	-0.124939
-2.086116	inline double	-0.124939
-2.016974	cases, double	-0.124939
-1.967756	Using double	-0.124939
-1.956797	256 double	-0.124939
-1.294387	c; double	-0.425969
-1.174730	10 double	-0.425969
-0.716914	a; double	-0.425969
-1.627032	SSE double	-0.124939
-0.646491	u; double	-0.602060
-1.289238	AVX512 double	-0.124939
-1.289238	C; double	-0.124939
-1.291571	#endif double	-0.124939
-0.503718	float, double	-0.425969
-1.074426	8.4 double	-0.124939
-1.072068	coefficients double	-0.124939
-0.379256	Long double	-0.425969
-1.072068	unrolled double	-0.124939
-0.898335	3.3; double	-0.124939
-0.898335	14.14b double	-0.124939
-0.898335	14.14a double	-0.124939
-0.898335	A; double	-0.124939
-0.203642	a[SIZE][SIZE], double	-0.425969
-0.599676	x2*x2; double	-0.124939
-0.599676	dummy; double	-0.124939
-0.599676	c2; double	-0.124939
-0.599676	14.18c double	-0.124939
-0.599676	8.2a double	-0.124939
-0.599676	7.32a double	-0.124939
-0.599676	7.32b double	-0.124939
-0.599676	__declspec(__align(64)) double	-0.124939
-0.599676	14.17b double	-0.124939
-0.599676	14.16a double	-0.124939
-0.599676	dest, double	-0.124939
-0.599676	*x; double	-0.124939
-0.599676	8.8b double	-0.124939
-0.599676	8.8a double	-0.124939
-0.599676	x4*x4; double	-0.124939
-0.599676	14.16b double	-0.124939
-0.599676	14.17a double	-0.124939
-0.599676	14.20 double	-0.124939
-2.119638	the size	-0.726999
-3.214514	and size	-0.124939
-1.962659	The size	-0.522879
-2.749201	for size	-0.124939
-2.195568	code size	-0.124939
-1.430797	int size	-1.204120
-2.778670	data size	-0.124939
-1.740579	vector size	-0.124939
-2.121472	different size	-0.425969
-2.713118	same size	-0.124939
-2.669782	cache size	-0.124939
-1.223506	integer size	-0.380211
-2.595053	page size	-0.124939
-1.860084	array size	-0.124939
-2.481054	variable size	-0.124939
-2.479864	any size	-0.124939
-1.316382	register size	-0.124939
-2.238564	its size	-0.124939
-2.189543	specific size	-0.124939
-1.487552	matrix size	-0.425969
-0.793635	line size	-0.191886
-2.027590	Integer size	-0.124939
-1.977055	block size	-0.124939
-1.932185	longer size	-0.124939
-1.839679	smaller size	-0.124939
-1.760497	>= size	-0.124939
-1.703705	maximum size	-0.124939
-1.013995	final size	-0.124939
-1.705055	Define size	-0.124939
-1.702360	total size	-0.124939
-1.664920	full size	-0.124939
-1.673004	RAM size	-0.124939
-1.546708	256-bit size	-0.124939
-1.546708	default size	-0.124939
-0.549984	fixed size	-0.124939
-0.601229	Array size	-0.124939
-0.600953	combined size	-0.425969
-0.379657	Total size	-0.425969
-1.075043	kb size	-0.124939
-0.203843	Matrix size	-0.124939
-0.600676	Half size	-0.124939
-2.201759	the Intel	-0.445274
-2.647815	of Intel	-0.124939
-2.393089	and Intel	-0.346788
-2.333481	in Intel	-0.204120
-1.959215	The Intel	-0.647817
-2.729184	for Intel	-0.124939
-2.964512	or Intel	-0.124939
-2.479876	by Intel	-0.124939
-2.450645	with Intel	-0.124939
-1.845455	on Intel	-0.204120
-1.758480	an Intel	-0.124939
-2.340627	compiler Intel	-0.124939
-2.841777	use Intel	-0.124939
-2.836623	when Intel	-0.124939
-2.168383	from Intel	-0.425969
-2.698969	different Intel	-0.124939
-2.597712	using Intel	-0.124939
-2.582200	double Intel	-0.124939
-2.548636	library Intel	-0.124939
-2.432760	See Intel	-0.124939
-2.426600	For Intel	-0.124939
-2.249559	Gnu Intel	-0.124939
-1.546587	Windows Intel	-0.425969
-2.203729	bytes Intel	-0.124939
-1.234615	Linux Intel	-0.602060
-2.141656	optimized Intel	-0.124939
-2.059679	certain Intel	-0.124939
-2.004579	Mac Intel	-0.124939
-1.913999	compilers. Intel	-0.124939
-1.893796	Many Intel	-0.124939
-1.882532	later Intel	-0.124939
-1.835104	operands Intel	-0.124939
-1.626254	e.g. Intel	-0.124939
-1.588247	newer Intel	-0.124939
-1.586551	current Intel	-0.124939
-0.441954	Microsoft, Intel	-0.726999
-1.499918	Gnu, Intel	-0.124939
-0.249488	Clang, Intel	-0.602060
-1.197159	class, Intel	-0.124939
-1.073950	earlier Intel	-0.124939
-1.073950	platform. Intel	-0.124939
-0.379510	libraries: Intel	-0.124939
-0.899595	131. Intel	-0.124939
-0.899595	op. Intel	-0.124939
-0.203769	"IA-32 Intel	-0.425969
-0.600309	vmldExp2 Intel	-0.124939
-0.600309	2009). Intel	-0.124939
-0.600309	undocumented Intel	-0.124939
-0.600309	__svml_exp2 Intel	-0.124939
-0.600309	2.00. Intel	-0.124939
-0.600309	(using Intel	-0.124939
-0.600309	AQtime, Intel	-0.124939
-2.616771	the pointer	-0.301030
-1.710234	a pointer	-0.669007
-3.235781	and pointer	-0.124939
-3.230273	The pointer	-0.124939
-3.231879	for pointer	-0.124939
-3.284366	that pointer	-0.124939
-3.032597	or pointer	-0.124939
-1.563501	function pointer	-0.238882
-3.140826	if pointer	-0.124939
-3.021501	This pointer	-0.124939
-2.270026	this pointer	-0.124939
-1.963815	A pointer	-0.124939
-1.266860	no pointer	-0.550907
-2.519654	array pointer	-0.124939
-2.526569	possible pointer	-0.124939
-1.834405	any pointer	-0.425969
-1.790836	member pointer	-0.124939
-2.447530	const pointer	-0.124939
-2.403574	4 pointer	-0.124939
-2.398562	8 pointer	-0.124939
-2.264496	simple pointer	-0.124939
-2.241657	its pointer	-0.124939
-2.242826	about pointer	-0.124939
-2.192718	specific pointer	-0.124939
-1.443319	Function pointer	-0.124939
-1.949873	Make pointer	-0.124939
-1.179280	link pointer	-0.124939
-1.047678	Assume pointer	-0.124939
-0.412701	smart pointer	-0.221849
-0.371233	'this' pointer	-0.221849
-1.496244	integer, pointer	-0.124939
-0.680214	Set pointer	-0.425969
-1.373831	avoiding pointer	-0.124939
-1.199008	original pointer	-0.124939
-1.200281	returned pointer	-0.124939
-1.200281	implicit pointer	-0.124939
-1.075342	variable, pointer	-0.124939
-0.203863	frame- pointer	-0.124939
-2.652352	of b	-0.249877
-2.987316	to b	-0.124939
-1.880828	and b	-0.287666
-2.908121	in b	-0.425969
-3.243373	that b	-0.124939
-1.349281	= b	-0.726999
-2.123695	if b	-0.425969
-3.006660	on b	-0.124939
-1.830806	when b	-0.726999
-2.785719	because b	-0.124939
-1.252813	+ b	-0.492916
-1.856576	* b	-0.425969
-2.437992	< b	-0.124939
-2.254894	& b	-0.124939
-2.231432	its b	-0.124939
-2.209400	a, b	-0.124939
-2.122728	/ b	-0.124939
-1.454238	b; b	-0.425969
-2.071454	1 b	-0.124939
-1.401487	: b	-0.425969
-2.068479	add b	-0.124939
-1.313550	expression b	-0.124939
-1.297158	__m128i b	-0.425969
-1.296516	c; b	-0.425969
-1.258065	&& b	-0.124939
-1.934042	| b	-0.124939
-1.916253	|| b	-0.124939
-0.987369	> b	-0.301030
-1.858700	0, b	-0.124939
-1.818165	a; b	-0.124939
-0.979364	2, b	-0.425969
-1.668468	convert b	-0.124939
-0.941898	? b	-0.425969
-1.370596	evaluate b	-0.124939
-1.370596	^ b	-0.124939
-1.370596	Is16vec8 b	-0.124939
-1.297801	AND'ed b	-0.124939
-0.504179	Multiply b	-0.425969
-1.074347	security. b	-0.124939
-1.074347	accesses b	-0.124939
-0.600443	-100, b	-0.124939
-0.600443	-1.0E8, b	-0.124939
-0.600443	(2.0f); b	-0.124939
-0.600443	5.0f; b	-0.124939
-0.600443	two, b	-0.124939
-0.600443	XOR b	-0.124939
-0.600443	Multiply(10,8); b	-0.124939
-0.600443	a+1; b	-0.124939
-3.071514	it into	-0.124939
-2.494570	function into	-0.124939
-2.935709	code into	-0.124939
-2.717510	memory into	-0.124939
-1.636052	data into	-0.124939
-1.734246	vector into	-0.726999
-2.594994	set into	-0.124939
-2.569088	class into	-0.124939
-2.572948	b into	-0.124939
-2.515398	library into	-0.124939
-2.561362	i into	-0.124939
-2.474536	array into	-0.124939
-2.487019	possible into	-0.124939
-2.435509	variables into	-0.124939
-2.387150	software into	-0.124939
-2.414873	branch into	-0.124939
-2.342683	register into	-0.124939
-1.715566	take into	-0.425969
-2.261613	operations into	-0.124939
-2.254956	up into	-0.124939
-1.293422	work into	-0.301030
-2.184601	calculations into	-0.124939
-2.199439	compiled into	-0.124939
-1.522959	best into	-0.425969
-2.135697	matrix into	-0.124939
-1.120133	files into	-0.301030
-2.007770	problems into	-0.124939
-1.958552	block into	-0.124939
-1.967955	put into	-0.124939
-1.984916	y into	-0.124939
-1.914820	read into	-0.124939
-1.219213	linked into	-0.124939
-1.894617	load into	-0.124939
-1.873428	together into	-0.124939
-1.863974	vectors into	-0.124939
-1.848769	goes into	-0.124939
-1.855956	feature into	-0.124939
-1.818217	modules into	-0.124939
-1.800464	go into	-0.124939
-0.716406	loaded into	-0.124939
-1.718526	task into	-0.124939
-0.600097	them into	-0.249877
-1.012453	tasks into	-0.124939
-0.979721	fit into	-0.124939
-1.661243	N into	-0.124939
-1.625904	directly into	-0.124939
-1.625904	copied into	-0.124939
-1.586976	come into	-0.124939
-1.591946	back into	-0.124939
-0.900539	organized into	-0.124939
-1.485152	prediction into	-0.124939
-0.679709	fits into	-0.425969
-1.365127	job into	-0.124939
-1.290916	turned into	-0.124939
-1.290916	effects into	-0.124939
-1.290916	80 into	-0.124939
-1.290916	split into	-0.124939
-0.070431	divided into	-0.124939
-1.290916	combined into	-0.124939
-1.194006	one, into	-0.124939
-0.089980	cc into	-0.726999
-0.089980	bb into	-0.726999
-1.071574	instruments into	-0.124939
-1.071574	0x273F into	-0.124939
-1.074095	translated into	-0.124939
-1.071574	formula into	-0.124939
-0.124655	taken into	-0.602060
-0.379189	joined into	-0.124939
-0.898003	fed into	-0.124939
-0.203609	wrapped into	-0.425969
-0.599509	deeper into	-0.124939
-0.599509	Integrates into	-0.124939
-0.599509	nicely into	-0.124939
-0.599509	isolated into	-0.124939
-0.599509	packed into	-0.124939
-0.599509	feed into	-0.124939
-2.107585	a +	-0.572097
-2.106891	x +	-0.301030
-2.187189	A +	-0.124939
-1.007796	b +	-0.359022
-2.573616	i +	-0.124939
-2.377436	4 +	-0.124939
-2.372977	8 +	-0.124939
-0.762006	c +	-0.329059
-1.996605	operator +	-0.124939
-0.885124	y +	-0.425969
-1.177818	r +	-0.124939
-1.847337	a[i] +	-0.124939
-1.792249	p +	-0.124939
-1.756457	b) +	-0.124939
-1.047603	d +	-0.124939
-0.760639	exponent +	-0.124939
-1.584230	row +	-0.124939
-0.601162	*p +	-0.301030
-1.435190	(a +	-0.124939
-1.441133	4) +	-0.124939
-1.372196	9 +	-0.124939
-1.297004	(c +	-0.124939
-1.295005	b[i] +	-0.124939
-1.196105	SVML +	-0.124939
-1.198095	(b +	-0.124939
-0.124714	LoadVector(cc +	-0.602060
-0.124714	LoadVector(bb +	-0.602060
-0.124714	StoreVector(aa +	-0.602060
-0.203716	c*x +	-0.425969
-0.203716	b*x*x +	-0.425969
-0.203716	a*x*x*x +	-0.425969
-0.899064	bb[i] +	-0.124939
-0.203716	log(b[i]) +	-0.124939
-0.899064	Func1(x) +	-0.124939
-0.899064	1.0f +	-0.124939
-0.899064	ENDP +	-0.124939
-0.600042	b.y +	-0.124939
-0.600042	(FuncRow(i)*columns +	-0.124939
-0.600042	a1/b1 +	-0.124939
-0.600042	(int)&matrix[0][0] +	-0.124939
-0.600042	list[j].b +	-0.124939
-0.600042	e +	-0.124939
-0.600042	r.a +	-0.124939
-0.600042	A*x*x +	-0.124939
-0.600042	(a1*b2 +	-0.124939
-0.600042	B*x +	-0.124939
-0.600042	x*x +	-0.124939
-0.600042	(cc[i] +	-0.124939
-0.600042	p->a +	-0.124939
-0.600042	y.d +	-0.124939
-0.600042	y.a +	-0.124939
-0.600042	y.b +	-0.124939
-0.600042	y.c +	-0.124939
-0.600042	square(x) +	-0.124939
-0.600042	8*x +	-0.124939
-0.600042	(int)(&list[0]) +	-0.124939
-0.600042	vector(x +	-0.124939
-0.600042	b.x +	-0.124939
-0.600042	c.x +	-0.124939
-0.600042	c.y +	-0.124939
-0.712075	- n.a.	-0.726999
-1.837763	x n.a.	-0.346788
-0.502005	n.a. n.a.	-1.204120
-1.897219	platform n.a.	-0.124939
-1.445077	reciprocal n.a.	-0.124939
-1.203078	__INTEL_COMPILER n.a.	-0.124939
-1.078139	_WIN32 n.a.	-0.124939
-0.902395	0.40 n.a.	-0.124939
-0.902395	0.24 n.a.	-0.124939
-0.601712	1.61 n.a.	-0.124939
-2.337749	the library	-0.241444
-2.686450	a library	-0.249877
-3.420498	of library	-0.124939
-2.688189	in library	-0.124939
-2.759646	The library	-0.124939
-3.226256	for library	-0.124939
-2.534162	or library	-0.124939
-1.498234	function library	-0.346788
-1.889500	This library	-0.522879
-2.269247	this library	-0.425969
-2.795747	from library	-0.124939
-2.776246	vector library	-0.124939
-2.725170	point library	-0.124939
-1.201252	class library	-0.176091
-2.643323	most library	-0.124939
-1.970303	Intel library	-0.124939
-2.549755	efficient library	-0.124939
-2.482861	any library	-0.124939
-1.694329	template library	-0.425969
-0.871465	dynamic library	-0.425969
-2.211164	necessary library	-0.124939
-2.129039	get library	-0.124939
-2.079121	standard library	-0.124939
-2.071559	optimizing library	-0.124939
-1.904932	graphics library	-0.124939
-1.908733	linked library	-0.124939
-0.946801	interface library	-0.124939
-0.765857	link library	-0.249877
-1.130647	core library	-0.425969
-1.814425	tested library	-0.124939
-1.046850	math library	-0.124939
-1.630459	entire library	-0.124939
-1.553673	Load library	-0.124939
-1.441914	executing library	-0.124939
-1.439315	consuming library	-0.124939
-0.249584	asmlib library	-0.301030
-0.600743	usable library	-0.124939
-0.600743	IPP library	-0.124939
-0.600743	Primitives" library	-0.124939
-2.287961	of i	-0.234083
-3.445461	to i	-0.124939
-3.243107	and i	-0.124939
-3.288686	that i	-0.124939
-2.333621	= i	-0.124939
-3.143337	if i	-0.124939
-2.990087	as i	-0.124939
-2.953132	not i	-0.124939
-2.955974	int i	-0.124939
-2.236408	when i	-0.425969
-2.795224	has i	-0.124939
-2.750039	If i	-0.124939
-2.612696	example i	-0.124939
-0.262311	0; i	-1.716003
-2.264375	making i	-0.124939
-1.558154	; i	-0.124939
-2.172730	+= i	-0.124939
-2.076987	add i	-0.124939
-2.054837	counter i	-0.124939
-0.374032	(int i	-1.238882
-1.938261	&& i	-0.124939
-1.922512	|| i	-0.124939
-1.202371	100; i	-0.425969
-1.848656	2; i	-0.124939
-1.828769	replaced i	-0.124939
-1.015783	divide i	-0.425969
-1.633550	condition i	-0.124939
-1.640956	size; i	-0.124939
-0.216488	256; i	-0.823909
-1.439710	eliminate i	-0.124939
-1.375232	comparisons i	-0.124939
-1.372764	comparing i	-0.124939
-1.373996	type-casting i	-0.124939
-1.199141	40 i	-0.124939
-1.200380	2.0; i	-0.124939
-0.900593	StringLength; i	-0.124939
-0.900593	20; i	-0.124939
-3.290459	is float	-0.124939
-2.647484	a float	-0.249877
-3.242086	of float	-0.124939
-2.751452	to float	-0.124939
-3.067182	The float	-0.124939
-3.072602	for float	-0.124939
-2.162815	= float	-0.249877
-1.398281	{ float	-0.564271
-2.800998	use float	-0.124939
-2.157741	from float	-0.425969
-2.513336	static float	-0.124939
-1.511961	const float	-0.124939
-2.363814	4 float	-0.124939
-2.359631	8 float	-0.124939
-1.477340	bit float	-0.124939
-2.266952	16 float	-0.124939
-2.294283	SSE2 float	-0.124939
-0.901362	i; float	-0.204120
-2.201719	a, float	-0.124939
-2.105904	128 float	-0.124939
-2.089048	four float	-0.124939
-2.089761	list float	-0.124939
-2.085653	inline float	-0.124939
-1.956222	256 float	-0.124939
-1.902991	public: float	-0.124939
-1.884907	x; float	-0.124939
-0.946966	100; float	-0.301030
-1.859312	AVX2 float	-0.124939
-1.850389	were float	-0.124939
-1.811221	a; float	-0.124939
-0.976970	mix float	-0.425969
-1.662268	convert float	-0.124939
-1.597238	series float	-0.124939
-1.486177	variables, float	-0.124939
-0.746253	a[100]; float	-0.124939
-1.289075	AVX512 float	-0.124939
-1.291440	1000; float	-0.124939
-1.074360	14.6 float	-0.124939
-1.074360	7.1 float	-0.124939
-1.074360	8; float	-0.124939
-1.071969	66 float	-0.124939
-1.071969	j; float	-0.124939
-1.074360	7.27 float	-0.124939
-1.074360	7.24 float	-0.124939
-1.074360	7.16 float	-0.124939
-0.898268	elimin., float	-0.124939
-0.898268	x^2 float	-0.124939
-0.898268	a[1000]; float	-0.124939
-0.898268	1.f; float	-0.124939
-0.898268	11.1a float	-0.124939
-0.898268	11.1b float	-0.124939
-0.898268	Mixing float	-0.124939
-0.599642	8.3a float	-0.124939
-0.599642	7.29a float	-0.124939
-0.599642	mixes float	-0.124939
-0.599642	a;} float	-0.124939
-0.599642	floats: float	-0.124939
-0.599642	8.1b float	-0.124939
-0.599642	8.1a float	-0.124939
-0.599642	8.16 float	-0.124939
-0.599642	8.18 float	-0.124939
-0.599642	32; float	-0.124939
-0.599642	14.18a float	-0.124939
-0.599642	14.18b float	-0.124939
-0.599642	1./2.09227E13}; float	-0.124939
-0.599642	c[size]; float	-0.124939
-0.599642	7.26b float	-0.124939
-0.599642	7.26a float	-0.124939
-0.599642	50; float	-0.124939
-0.599642	(8 float	-0.124939
-0.599642	14.2a float	-0.124939
-0.599642	14.2b float	-0.124939
-3.084429	the multiple	-0.124939
-2.308774	a multiple	-1.079181
-2.991988	to multiple	-0.124939
-2.262362	in multiple	-0.492916
-3.186032	The multiple	-0.124939
-2.524026	for multiple	-0.301030
-2.302255	or multiple	-0.124939
-2.521627	if multiple	-0.425969
-2.485799	by multiple	-0.425969
-1.795142	with multiple	-0.124939
-3.012564	on multiple	-0.124939
-2.957370	as multiple	-0.124939
-2.275045	have multiple	-0.124939
-2.002437	use multiple	-0.124939
-2.780047	from multiple	-0.124939
-2.780073	has multiple	-0.124939
-1.891845	make multiple	-0.301030
-2.640761	set multiple	-0.124939
-2.630471	do multiple	-0.124939
-1.093215	into multiple	-0.346788
-1.035205	between multiple	-0.249877
-2.315324	out multiple	-0.124939
-2.257921	making multiple	-0.124939
-2.217753	while multiple	-0.124939
-1.549521	avoid multiple	-0.124939
-1.522935	through multiple	-0.124939
-1.004010	doing multiple	-0.425969
-1.989935	allows multiple	-0.124939
-1.978779	Using multiple	-0.124939
-1.962532	running multiple	-0.124939
-1.864053	generate multiple	-0.124939
-1.809811	supports multiple	-0.124939
-1.075777	checking multiple	-0.425969
-1.699243	testing multiple	-0.124939
-1.699243	Avoid multiple	-0.124939
-1.703750	Define multiple	-0.124939
-1.672019	compiling multiple	-0.124939
-1.665978	containing multiple	-0.124939
-1.631200	keep multiple	-0.124939
-0.679691	Testing multiple	-0.425969
-1.074546	instructions, multiple	-0.124939
-0.899994	supporting multiple	-0.124939
-0.600509	Running multiple	-0.124939
-0.600509	combining multiple	-0.124939
-0.600509	Run multiple	-0.124939
-0.600509	toggle multiple	-0.124939
-2.420453	the two	-0.157123
-3.442918	is two	-0.124939
-2.658091	of two	-0.124939
-2.336561	in two	-0.124939
-2.751837	The two	-0.124939
-3.263386	that two	-0.124939
-3.162495	be two	-0.124939
-2.032123	are two	-0.191886
-2.158872	or two	-0.249877
-2.253257	by two	-0.124939
-2.459368	with two	-0.124939
-2.415795	as two	-0.124939
-2.343187	than two	-0.124939
-1.879766	have two	-0.425969
-2.148447	has two	-0.124939
-2.139595	make two	-0.124939
-2.085310	If two	-0.124939
-1.991759	do two	-0.124939
-1.954587	into two	-0.124939
-2.478126	variable two	-0.124939
-1.265918	between two	-0.124939
-2.441804	way two	-0.124939
-1.474553	first two	-0.124939
-1.254189	these two	-0.124939
-2.260062	making two	-0.124939
-1.517423	These two	-0.124939
-1.415258	doing two	-0.124939
-1.416405	run two	-0.425969
-1.998539	next two	-0.124939
-1.297557	__m128i two	-0.425969
-1.963782	running two	-0.124939
-1.948092	Make two	-0.124939
-1.155178	just two	-0.124939
-1.844449	require two	-0.124939
-1.733896	prevent two	-0.124939
-0.901639	swap two	-0.124939
-0.855303	approximately two	-0.425969
-1.438527	again two	-0.124939
-1.293825	compare two	-0.124939
-1.074844	Comparing two	-0.124939
-0.900194	Replacing two	-0.124939
-0.600609	correspondingly two	-0.124939
-0.600609	allowing two	-0.124939
-2.138458	the object	-0.277549
-2.479180	of object	-0.301030
-2.551127	The object	-0.124939
-2.168124	or object	-0.124939
-3.017355	as object	-0.124939
-1.225618	an object	-0.324511
-2.163373	data object	-0.124939
-2.755329	different object	-0.124939
-2.742883	same object	-0.124939
-2.700807	one object	-0.124939
-2.665564	no object	-0.124939
-1.596847	each object	-0.124939
-2.567127	static object	-0.124939
-2.399401	first object	-0.124939
-1.666877	new object	-0.124939
-1.288340	An object	-0.301030
-1.520879	single object	-0.425969
-2.104591	structure object	-0.124939
-0.510760	shared object	-0.221849
-2.066151	intermediate object	-0.124939
-1.909278	Each object	-0.124939
-1.046985	local object	-0.124939
-1.633395	why object	-0.124939
-1.497085	temporary object	-0.124939
-1.201078	recommend object	-0.124939
-1.201078	contained object	-0.124939
-1.200069	original object	-0.124939
-0.504540	existing object	-0.124939
-0.901060	Mixing object	-0.124939
-0.601044	usual object	-0.124939
-1.977001	the number	-1.414973
-3.510963	is number	-0.124939
-2.587407	a number	-0.522879
-2.130406	The number	-0.970037
-2.359163	// number	-0.602060
-2.273942	this number	-0.124939
-1.482956	point number	-0.204120
-2.010885	set number	-0.124939
-1.836316	variable number	-0.425969
-2.462885	32-bit number	-0.124939
-2.230439	large number	-0.124939
-1.531225	element number	-0.124939
-2.162897	line number	-0.124939
-2.030616	optimal number	-0.124939
-1.979261	model number	-0.124939
-1.888794	higher number	-0.124939
-1.130642	positive number	-0.425969
-1.046765	limited number	-0.425969
-0.760644	maximum number	-0.602060
-1.706056	reduced number	-0.124939
-0.760520	total number	-0.602060
-1.498493	family number	-0.124939
-1.497394	random number	-0.124939
-1.296581	realistic number	-0.124939
-0.346415	excessive number	-0.602060
-1.295476	107 number	-0.124939
-0.601166	increasing number	-0.425969
-1.201890	extended number	-0.124939
-1.199671	63 number	-0.124939
-1.075840	odd number	-0.124939
-0.379764	evict number	-0.124939
-0.600943	Max. number	-0.124939
-0.600943	integral number	-0.124939
-2.780779	the static	-0.204120
-2.488415	a static	-0.124939
-2.650078	of static	-0.124939
-3.366377	to static	-0.124939
-2.626795	and static	-0.124939
-2.421904	in static	-0.522879
-2.279463	The static	-0.221849
-3.235619	that static	-0.124939
-2.516159	or static	-0.124939
-3.111773	if static	-0.124939
-3.000836	on static	-0.124939
-2.176293	as static	-0.301030
-2.337497	than static	-0.124939
-2.292427	{ static	-0.124939
-2.244211	use static	-0.124939
-2.840575	when static	-0.124939
-1.958202	A static	-0.602060
-1.657624	from static	-0.522879
-2.783702	because static	-0.124939
-2.107642	functions static	-0.124939
-2.656179	all static	-0.124939
-1.732709	using static	-0.301030
-2.543484	object static	-0.124939
-1.648062	static static	-0.301030
-1.105708	array static	-1.028029
-2.513686	where static	-0.124939
-2.213923	large static	-0.124939
-2.132170	b; static	-0.124939
-2.115196	support static	-0.124939
-1.358458	both static	-0.425969
-1.259451	keyword static	-0.425969
-1.901276	requires static	-0.124939
-0.809169	public: static	-0.726999
-1.699449	them static	-0.124939
-1.631799	includes static	-0.124939
-1.588766	module static	-0.124939
-1.543009	n; static	-0.124939
-1.295997	specify static	-0.124939
-1.197422	__declspec(align(16)) static	-0.124939
-1.200758	N> static	-0.124939
-0.899728	<emmintrin.h> static	-0.124939
-0.899728	14.19 static	-0.124939
-0.899728	(called static	-0.124939
-0.899728	factorials: static	-0.124939
-0.899728	word static	-0.124939
-0.600376	boolb=0; static	-0.124939
-0.600376	T> static	-0.124939
-0.600376	line: static	-0.124939
-0.600376	_mm_cvtss_si32(_mm_load_ss(&x));} static	-0.124939
-0.600376	add_horizontal) static	-0.124939
-2.732701	the 64-bit	-0.124939
-2.501935	a 64-bit	-0.204120
-2.669803	of 64-bit	-0.124939
-3.473006	to 64-bit	-0.124939
-1.914363	and 64-bit	-0.259637
-1.650385	in 64-bit	-0.556302
-2.771628	The 64-bit	-0.124939
-2.390929	for 64-bit	-0.124939
-2.315181	or 64-bit	-0.124939
-2.951303	than 64-bit	-0.124939
-2.006479	use 64-bit	-0.301030
-2.809671	from 64-bit	-0.124939
-2.754974	only 64-bit	-0.124939
-2.695221	all 64-bit	-0.124939
-2.548662	two 64-bit	-0.124939
-1.256182	In 64-bit	-0.204120
-2.409741	4 64-bit	-0.124939
-2.280291	up 64-bit	-0.124939
-2.257265	Some 64-bit	-0.124939
-1.539446	Use 64-bit	-0.425969
-2.025667	Therefore, 64-bit	-0.124939
-1.946258	efficient. 64-bit	-0.124939
-1.703876	registers. 64-bit	-0.124939
-1.668028	full 64-bit	-0.124939
-1.637898	expect 64-bit	-0.124939
-1.591021	references. 64-bit	-0.124939
-1.593207	define 64-bit	-0.124939
-1.374657	reference, 64-bit	-0.124939
-1.295476	(In 64-bit	-0.124939
-1.199671	29 64-bit	-0.124939
-0.900860	__int64 64-bit	-0.124939
-0.600943	whereas 64-bit	-0.124939
-0.600943	different. 64-bit	-0.124939
-2.328624	and there	-0.425969
-1.833597	that there	-0.726999
-3.198620	are there	-0.124939
-1.361561	if there	-1.124939
-3.052899	- there	-0.124939
-2.351446	than there	-0.124939
-2.238388	when there	-0.425969
-1.346247	then there	-0.823909
-1.879143	because there	-0.124939
-1.562244	If there	-0.522879
-1.662559	but there	-0.425969
-1.863597	where there	-0.124939
-2.495070	so there	-0.124939
-2.289926	case there	-0.124939
-1.278809	But there	-0.301030
-2.119243	whether there	-0.124939
-0.833089	However, there	-0.522879
-1.094355	unless there	-0.602060
-0.934072	cases, there	-0.425969
-1.985704	put there	-0.124939
-1.869775	Here, there	-0.124939
-1.632414	why there	-0.124939
-1.592113	course there	-0.124939
-1.593207	used, there	-0.124939
-1.547449	diagonal there	-0.124939
-1.497394	systems, there	-0.124939
-1.497394	however, there	-0.124939
-1.375762	general, there	-0.124939
-1.295476	Fortunately, there	-0.124939
-1.295476	unfortunately there	-0.124939
-0.601390	Typically, there	-0.124939
-1.199671	enabled there	-0.124939
-0.900860	avoided, there	-0.124939
-2.611057	the C++	-0.234083
-2.814968	a C++	-0.124939
-2.337107	of C++	-0.249877
-2.829431	and C++	-0.124939
-2.262651	in C++	-0.124939
-2.282736	The C++	-0.221849
-3.193969	for C++	-0.124939
-3.255269	that C++	-0.124939
-3.111820	// C++	-0.124939
-2.303236	or C++	-0.124939
-2.441609	on C++	-0.124939
-2.413787	as C++	-0.124939
-2.850615	when C++	-0.124939
-2.771575	A C++	-0.124939
-1.231061	different C++	-0.564271
-2.667302	all C++	-0.124939
-2.634538	most C++	-0.124939
-1.280514	Intel C++	-0.271067
-2.616656	into C++	-0.124939
-2.541961	In C++	-0.124939
-1.578294	Gnu C++	-0.124939
-2.217553	compiled C++	-0.124939
-2.120540	another C++	-0.124939
-2.055599	All C++	-0.124939
-2.043684	However, C++	-0.124939
-1.314129	Most C++	-0.425969
-1.279066	Microsoft C++	-0.124939
-1.832233	advanced C++	-0.124939
-1.789774	modern C++	-0.124939
-1.666302	libraries. C++	-0.124939
-0.942663	PathScale C++	-0.124939
-1.436647	language. C++	-0.124939
-0.746368	Borland C++	-0.124939
-1.198082	reasons. C++	-0.124939
-1.198082	While C++	-0.124939
-0.504239	PGI C++	-0.124939
-1.199584	C, C++	-0.124939
-1.074645	well-structured C++	-0.124939
-0.900061	15. C++	-0.124939
-0.900061	36 C++	-0.124939
-0.900061	Standard C++	-0.124939
-0.900061	Portability C++	-0.124939
-0.600543	Borland/CodeGear/Embarcadero C++	-0.124939
-0.600543	"Intel® C++	-0.124939
-0.600543	(Embarcadero/CodeGear/Borland C++	-0.124939
-1.917393	is also	-0.340539
-3.250560	and also	-0.124939
-3.293050	that also	-0.124939
-1.749565	are also	-0.284640
-1.635525	can also	-0.669007
-3.171022	it also	-0.124939
-3.118591	function also	-0.124939
-2.406042	This also	-0.124939
-1.933356	may also	-0.249877
-2.833452	will also	-0.124939
-2.203113	It also	-0.124939
-1.316418	but also	-0.182931
-1.785985	should also	-0.301030
-2.657155	set also	-0.124939
-2.656396	compilers also	-0.124939
-1.665495	systems also	-0.425969
-2.323444	these also	-0.124939
-2.323090	method also	-0.124939
-2.251588	stack also	-0.124939
-2.201556	language also	-0.124939
-2.167112	Linux also	-0.124939
-2.105237	operators also	-0.124939
-2.043061	allocation also	-0.124939
-1.978603	methods also	-0.124939
-1.944717	keyword also	-0.124939
-1.921927	expressions also	-0.124939
-1.790622	STL also	-0.124939
-1.709953	(See also	-0.124939
-1.670411	possibly also	-0.124939
-1.591230	course also	-0.124939
-1.592420	unrolling also	-0.124939
-1.592420	might also	-0.124939
-1.593613	F1 also	-0.124939
-1.374161	soon also	-0.124939
-0.379724	libraries, also	-0.425969
-0.600843	102 also	-0.124939
-2.759545	of such	-0.124939
-2.945272	to such	-0.124939
-2.667099	in such	-0.124939
-2.501342	for such	-0.124939
-2.488648	that such	-0.124939
-3.027690	function such	-0.124939
-2.509170	if such	-0.124939
-2.042457	on such	-0.249877
-2.261637	have such	-0.124939
-2.810839	use such	-0.124939
-2.729268	make such	-0.124939
-1.360096	functions such	-1.028029
-2.675387	but such	-0.124939
-2.596007	no such	-0.124939
-1.738531	do such	-0.124939
-2.618278	compilers such	-0.124939
-2.572589	using such	-0.124939
-2.524747	In such	-0.124939
-2.456134	many such	-0.124939
-2.269588	operations such	-0.124939
-2.261043	type such	-0.124939
-2.269032	cases such	-0.124939
-2.223115	CPUs such	-0.124939
-2.033959	However, such	-0.124939
-2.009283	replace such	-0.124939
-1.990483	branches such	-0.124939
-1.920847	applications such	-0.124939
-1.918212	types such	-0.124939
-1.219926	optimizations such	-0.425969
-1.812658	around such	-0.124939
-1.804119	reductions such	-0.124939
-1.102273	languages such	-0.425969
-0.791445	prevent such	-0.301030
-0.600500	tasks such	-0.726999
-1.696153	time, such	-0.124939
-1.621452	why such	-0.124939
-1.540786	blocks such	-0.124939
-1.540786	purposes such	-0.124939
-1.487461	iterations such	-0.124939
-1.433824	vector, such	-0.124939
-1.431641	memory, such	-0.124939
-1.436018	profilers such	-0.124939
-1.431641	available, such	-0.124939
-1.294312	threads, such	-0.124939
-1.289890	languages, such	-0.124939
-1.195185	resources, such	-0.124939
-1.197402	overflow, such	-0.124939
-1.195185	illustrates such	-0.124939
-1.195185	considerations such	-0.124939
-1.072463	comparisons, such	-0.124939
-1.072463	language, such	-0.124939
-1.072463	justify such	-0.124939
-1.072463	classes, such	-0.124939
-0.898600	templates, such	-0.124939
-0.898600	resource, such	-0.124939
-0.898600	shuffling, such	-0.124939
-0.898600	events, such	-0.124939
-0.898600	suffixes such	-0.124939
-0.898600	maintaining such	-0.124939
-0.599809	serial, such	-0.124939
-0.599809	vectorization, such	-0.124939
-0.599809	obtain, such	-0.124939
-0.599809	media such	-0.124939
-0.599809	9.2, such	-0.124939
-0.599809	information, such	-0.124939
-0.599809	Porting such	-0.124939
-0.599809	supply such	-0.124939
-3.116373	is efficient	-0.124939
-3.590100	of efficient	-0.124939
-3.595376	to efficient	-0.124939
-3.412680	and efficient	-0.124939
-3.310133	be efficient	-0.124939
-3.293265	are efficient	-0.124939
-1.758736	as efficient	-0.669007
-2.128236	an efficient	-0.301030
-2.945457	have efficient	-0.124939
-0.729879	more efficient	-0.644464
-1.015027	most efficient	-0.238882
-1.547772	very efficient	-0.124939
-0.722534	less efficient	-0.367977
-2.396099	how efficient	-0.124939
-2.229794	An efficient	-0.124939
-2.064057	quite efficient	-0.124939
-1.972531	various efficient	-0.124939
-0.805442	equally efficient	-0.124939
-2.987295	function In	-0.124939
-1.945856	} In	-0.301030
-2.531212	double In	-0.124939
-2.380254	2 In	-0.124939
-1.688016	code. In	-0.124939
-2.308311	pointers In	-0.124939
-2.258160	cases In	-0.124939
-1.560363	function. In	-0.124939
-1.250784	etc. In	-0.124939
-2.149110	integers In	-0.124939
-2.115832	b; In	-0.124939
-2.029120	program. In	-0.124939
-1.255972	efficient. In	-0.124939
-1.879224	processors. In	-0.124939
-1.802827	loop. In	-0.124939
-1.098912	size. In	-0.124939
-1.738160	variables. In	-0.124939
-1.748470	checking In	-0.124939
-1.713321	resources. In	-0.124939
-1.686322	it. In	-0.124939
-0.759594	calculations. In	-0.124939
-1.688938	cycles. In	-0.124939
-1.580274	unrolling In	-0.124939
-1.580274	Windows. In	-0.124939
-0.853994	faster. In	-0.425969
-1.531869	parameter. In	-0.124939
-1.539862	element. In	-0.124939
-1.486028	register. In	-0.124939
-1.486028	fast. In	-0.124939
-1.428036	parameters. In	-0.124939
-1.433414	simultaneously. In	-0.124939
-1.430717	optimizations. In	-0.124939
-1.428036	language. In	-0.124939
-0.744850	speed. In	-0.124939
-1.430717	16. In	-0.124939
-1.369182	application. In	-0.124939
-1.287286	priority. In	-0.124939
-1.290001	all. In	-0.124939
-1.292732	cycle. In	-0.124939
-1.195822	two. In	-0.124939
-1.193091	counts. In	-0.124939
-1.070883	chains. In	-0.124939
-1.070883	55 In	-0.124939
-1.070883	explicitly. In	-0.124939
-1.070883	for. In	-0.124939
-1.070883	step. In	-0.124939
-1.070883	condition. In	-0.124939
-1.070883	occur. In	-0.124939
-1.070883	elsewhere. In	-0.124939
-1.070883	big. In	-0.124939
-0.897540	71). In	-0.124939
-0.897540	users. In	-0.124939
-0.897540	throw. In	-0.124939
-0.897540	32. In	-0.124939
-0.897540	string. In	-0.124939
-0.897540	safe. In	-0.124939
-0.897540	though. In	-0.124939
-0.897540	name. In	-0.124939
-0.897540	calculation. In	-0.124939
-0.897540	obtained. In	-0.124939
-0.897540	cycle? In	-0.124939
-0.897540	purity. In	-0.124939
-0.897540	have. In	-0.124939
-0.599276	otherwise. In	-0.124939
-0.599276	/MT). In	-0.124939
-0.599276	B. In	-0.124939
-0.599276	mind. In	-0.124939
-0.599276	60. In	-0.124939
-0.599276	2002). In	-0.124939
-0.599276	system-specific. In	-0.124939
-0.599276	44 In	-0.124939
-0.599276	34. In	-0.124939
-0.599276	counterparts. In	-0.124939
-0.599276	short. In	-0.124939
-0.599276	destroyed. In	-0.124939
-0.599276	addressing. In	-0.124939
-0.599276	divisor. In	-0.124939
-0.599276	g(x)); In	-0.124939
-0.599276	generators. In	-0.124939
-0.599276	146). In	-0.124939
-0.599276	__intel_cpu_features_init_x(). In	-0.124939
-0.599276	API. In	-0.124939
-0.599276	strings. In	-0.124939
-2.216276	a *	-0.467361
-2.346917	int *	-0.124939
-1.834813	x *	-0.346788
-0.876286	b *	-0.287666
-2.579088	i *	-0.124939
-2.527548	float *	-0.124939
-2.424328	2 *	-0.124939
-1.238849	const *	-0.522879
-2.380955	8 *	-0.124939
-1.362282	(int *	-0.425969
-1.854791	10 *	-0.124939
-1.809867	5 *	-0.124939
-1.046012	temp *	-0.425969
-1.693361	100 *	-0.124939
-0.804040	j *	-0.124939
-1.436561	(a *	-0.124939
-1.293937	abc *	-0.124939
-1.198789	(u.i *	-0.124939
-1.198789	CChild1 *	-0.124939
-1.200559	CHello *	-0.124939
-1.075620	1000 *	-0.124939
-1.075620	x2 *	-0.124939
-1.075620	C0 *	-0.124939
-0.124740	StoreVector(void *	-0.602060
-1.073850	xxn *	-0.124939
-1.073850	CChild2 *	-0.124939
-0.899529	a2 *	-0.124939
-0.899529	a1 *	-0.124939
-0.899529	CriticalFunctionType *	-0.124939
-0.899529	FuncType *	-0.124939
-0.899529	(bb[i] *	-0.124939
-0.203763	(columns *	-0.124939
-0.203763	typeof(CriticalFunction) *	-0.425969
-0.899529	Func1(x) *	-0.124939
-0.899529	(a+1) *	-0.124939
-0.600276	Sum2(S3 *	-0.124939
-0.600276	example,a *	-0.124939
-0.600276	AddTwo(int *	-0.124939
-0.600276	FuncCol(i)) *	-0.124939
-0.600276	(2.5f *	-0.124939
-0.600276	a[i].u[1] *	-0.124939
-0.600276	*)alloca(n *	-0.124939
-0.600276	(b1 *	-0.124939
-0.600276	powN<(N1&(N1-1))==0,N1>::p(x) *	-0.124939
-0.600276	v.i *	-0.124939
-0.600276	StoreNTD(double *	-0.124939
-0.600276	powN<true,N/2>::p(x) *	-0.124939
-0.600276	StoreVectorA(void *	-0.124939
-0.600276	anda *	-0.124939
-0.600276	b2 *	-0.124939
-0.600276	b1 *	-0.124939
-0.600276	(b[i] *	-0.124939
-0.600276	8.0f) *	-0.124939
-2.319027	compiler There	-0.425969
-2.767030	} There	-0.124939
-2.518971	double There	-0.124939
-2.302782	code. There	-0.124939
-1.681518	time. There	-0.425969
-2.198509	function. There	-0.124939
-2.126525	etc. There	-0.124939
-2.106496	functions. There	-0.124939
-2.032781	... There	-0.124939
-1.997351	dispatching There	-0.124939
-1.953599	systems. There	-0.124939
-1.917387	efficient. There	-0.124939
-1.911921	below. There	-0.124939
-1.215026	processors. There	-0.124939
-1.856689	vectors There	-0.124939
-1.839949	process There	-0.124939
-1.842744	called. There	-0.124939
-1.788868	are: There	-0.124939
-1.762539	size. There	-0.124939
-1.710118	resources. There	-0.124939
-1.721481	calls. There	-0.124939
-1.683578	it. There	-0.124939
-1.683578	registers. There	-0.124939
-1.683578	object. There	-0.124939
-1.683578	performance. There	-0.124939
-1.616745	instructions. There	-0.124939
-1.616745	way. There	-0.124939
-1.572484	references. There	-0.124939
-1.575353	address. There	-0.124939
-1.581147	not. There	-0.124939
-1.572484	user. There	-0.124939
-1.575353	returns. There	-0.124939
-1.529595	allocation. There	-0.124939
-1.532483	enabled. There	-0.124939
-1.529595	inefficient. There	-0.124939
-1.535390	block. There	-0.124939
-1.538316	faster. There	-0.124939
-1.529595	parameter. There	-0.124939
-1.532483	expressions. There	-0.124939
-1.487164	value. There	-0.124939
-1.481330	arrays. There	-0.124939
-1.481330	branch. There	-0.124939
-1.484237	vectors. There	-0.124939
-1.426245	parameters. There	-0.124939
-1.438072	bits. There	-0.124939
-1.432118	automatically. There	-0.124939
-1.429172	core. There	-0.124939
-1.432118	threads. There	-0.124939
-1.368138	throughput There	-0.124939
-1.288957	consuming. There	-0.124939
-1.285990	manual. There	-0.124939
-1.285990	execution. There	-0.124939
-1.288957	8. There	-0.124939
-1.198041	*.so). There	-0.124939
-1.192047	support. There	-0.124939
-1.192047	optimal. There	-0.124939
-1.192047	maintenance There	-0.124939
-1.070095	explicitly. There	-0.124939
-1.070095	Windows). There	-0.124939
-1.070095	CodeAnalyst. There	-0.124939
-1.070095	way: There	-0.124939
-1.070095	security. There	-0.124939
-1.070095	limited. There	-0.124939
-1.070095	0x1C. There	-0.124939
-1.070095	copying. There	-0.124939
-1.070095	post-increment. There	-0.124939
-0.897011	screen. There	-0.124939
-0.897011	programmer. There	-0.124939
-0.897011	check. There	-0.124939
-0.897011	tables". There	-0.124939
-0.897011	created. There	-0.124939
-0.897011	43). There	-0.124939
-0.897011	Gnu. There	-0.124939
-0.897011	Conclusion There	-0.124939
-0.897011	power. There	-0.124939
-0.897011	87). There	-0.124939
-0.599010	-abs(x);. There	-0.124939
-0.599010	NULL. There	-0.124939
-0.599010	uses. There	-0.124939
-0.599010	2B. There	-0.124939
-0.599010	Mbytes. There	-0.124939
-0.599010	commas. There	-0.124939
-0.599010	recycled? There	-0.124939
-0.599010	x4∙xn-4. There	-0.124939
-0.599010	Namespaces There	-0.124939
-0.599010	returned. There	-0.124939
-0.599010	36. There	-0.124939
-0.599010	normally. There	-0.124939
-0.599010	.so). There	-0.124939
-0.599010	point). There	-0.124939
-0.599010	inheritance. There	-0.124939
-2.339482	the array	-0.209260
-2.560871	of array	-0.124939
-3.459015	to array	-0.124939
-3.258142	and array	-0.124939
-3.467528	in array	-0.124939
-2.279616	for array	-0.346788
-3.048687	or array	-0.124939
-1.275515	an array	-0.313995
-1.774813	from array	-0.726999
-2.729100	same array	-0.124939
-2.690491	one array	-0.124939
-2.001438	each array	-0.124939
-2.629303	size array	-0.124939
-1.545907	into array	-0.726999
-2.544996	two array	-0.124939
-2.319316	dynamic array	-0.124939
-1.601978	simple array	-0.124939
-2.228463	large array	-0.124939
-1.013255	An array	-0.124939
-2.212466	through array	-0.124939
-1.500436	allocated array	-0.124939
-1.950945	Make array	-0.124939
-1.706625	per array	-0.124939
-1.703160	final array	-0.124939
-1.667249	Any array	-0.124939
-0.646897	linear array	-0.124939
-1.591524	current array	-0.124939
-1.495772	temporary array	-0.124939
-0.186904	multidimensional array	-0.249877
-1.199406	individual array	-0.124939
-1.200579	constants, array	-0.124939
-0.900726	variable-size array	-0.124939
-0.900726	[] array	-0.124939
-0.600877	Output array	-0.124939
-0.600877	fixed-size array	-0.124939
-3.089192	and where	-0.124939
-3.041124	function where	-0.124939
-2.969993	code where	-0.124939
-2.721206	program where	-0.124939
-2.682833	point where	-0.124939
-2.082835	loop where	-0.124939
-2.691329	used where	-0.124939
-2.615754	set where	-0.124939
-2.528075	object where	-0.124939
-2.525552	static where	-0.124939
-2.451250	variable where	-0.124939
-2.338871	libraries where	-0.124939
-2.274082	operations where	-0.124939
-2.266879	case where	-0.124939
-0.531946	cases where	-0.550907
-2.247444	instructions where	-0.124939
-2.188074	threads where	-0.124939
-2.117820	solution where	-0.124939
-2.088369	structure where	-0.124939
-2.069224	mode where	-0.124939
-2.005526	programs where	-0.124939
-2.011232	space where	-0.124939
-1.984828	sets where	-0.124939
-1.962014	model where	-0.124939
-1.937442	examples where	-0.124939
-1.906496	expressions where	-0.124939
-1.852506	process where	-0.124939
-1.808134	computer where	-0.124939
-1.783785	languages where	-0.124939
-1.759743	functions, where	-0.124939
-1.664840	predict where	-0.124939
-0.180839	situation where	-0.550907
-1.583652	templates where	-0.124939
-1.591737	sequence where	-0.124939
-0.900590	checks where	-0.124939
-0.158112	situations where	-0.271067
-1.490766	false where	-0.124939
-1.488749	chain where	-0.124939
-0.803265	however, where	-0.124939
-1.496872	determined where	-0.124939
-1.434800	mode, where	-0.124939
-1.369888	step where	-0.124939
-1.367853	(i.e. where	-0.124939
-1.290707	Fortran where	-0.124939
-1.292752	inheritance where	-0.124939
-1.195842	pipeline where	-0.124939
-1.195842	cache, where	-0.124939
-1.195842	manner where	-0.124939
-1.072958	calculations, where	-0.124939
-1.072958	Internet where	-0.124939
-1.072958	instructions, where	-0.124939
-0.898931	12.4a where	-0.124939
-0.898931	today where	-0.124939
-0.898931	experiment where	-0.124939
-0.599976	areas where	-0.124939
-0.599976	__intel_cpu_feature_indicator where	-0.124939
-0.599976	1980 where	-0.124939
-0.599976	pow(x,N) where	-0.124939
-0.599976	1.fffff, where	-0.124939
-0.599976	data", where	-0.124939
-0.599976	back, where	-0.124939
-0.599976	sequence, where	-0.124939
-3.261943	the many	-0.124939
-3.430501	is many	-0.124939
-2.994343	to many	-0.124939
-2.423999	in many	-0.221849
-1.951140	for many	-0.170696
-2.734187	that many	-0.124939
-2.103751	are many	-0.204120
-3.090185	function many	-0.124939
-3.034581	by many	-0.124939
-1.584285	with many	-0.166331
-2.960886	as many	-0.124939
-2.033720	have many	-0.124939
-2.847189	then many	-0.124939
-2.782255	from many	-0.124939
-1.462938	has many	-0.191886
-2.713323	used many	-0.124939
-2.616656	into many	-0.124939
-1.621172	In many	-0.301030
-1.818642	so many	-0.124939
-2.403798	example, many	-0.124939
-1.291965	how many	-0.726999
-2.234474	its many	-0.124939
-2.218716	while many	-0.124939
-2.206146	But many	-0.124939
-2.124740	uses many	-0.124939
-1.017147	contains many	-0.249877
-2.094042	run many	-0.124939
-2.079251	store many	-0.124939
-1.935829	too many	-0.124939
-1.864408	generate many	-0.124939
-1.861491	goes many	-0.124939
-1.764683	Unfortunately, many	-0.124939
-1.706978	On many	-0.124939
-1.666302	containing many	-0.124939
-1.542834	contain many	-0.124939
-1.438133	hold many	-0.124939
-1.296494	seen many	-0.124939
-1.199584	avoids many	-0.124939
-1.074645	Has many	-0.124939
-0.900061	worse, many	-0.124939
-0.900061	requiring many	-0.124939
-0.600543	Includes many	-0.124939
-0.600543	Contains many	-0.124939
-0.600543	modifies many	-0.124939
-0.600543	"how many	-0.124939
-3.684379	the possible	-0.124939
-1.786641	is possible	-1.315270
-3.055056	a possible	-0.124939
-2.565514	of possible	-0.124939
-3.298177	and possible	-0.124939
-2.127124	be possible	-0.903090
-3.215971	are possible	-0.124939
-1.820700	it possible	-0.726999
-3.161333	if possible	-0.124939
-2.033947	as possible	-0.124939
-1.799120	not possible	-0.425969
-2.819484	A possible	-0.124939
-2.110167	only possible	-0.425969
-2.716509	other possible	-0.124939
-2.702491	all possible	-0.124939
-1.887476	also possible	-0.425969
-1.713173	often possible	-0.425969
-2.300319	always possible	-0.124939
-2.250014	its possible	-0.124939
-1.119477	best possible	-0.124939
-2.169366	therefore possible	-0.124939
-1.912237	optimizations possible	-0.124939
-1.738130	sometimes possible	-0.124939
-1.706941	maximum possible	-0.124939
-0.942655	simplest possible	-0.124939
-1.632402	rarely possible	-0.124939
-1.498085	fastest possible	-0.124939
-1.441095	generally possible	-0.124939
-0.680631	worst possible	-0.124939
-1.375153	biggest possible	-0.124939
-2.488725	the clock	-0.359022
-3.455701	is clock	-0.124939
-2.495520	a clock	-0.602060
-3.407328	of clock	-0.124939
-2.393542	The clock	-0.425969
-2.850593	more clock	-0.124939
-2.783843	A clock	-0.124939
-1.695980	CPU clock	-0.249877
-1.533017	one clock	-0.522879
-1.650909	two clock	-0.124939
-2.443291	2 clock	-0.124939
-1.743130	4 clock	-0.124939
-2.394983	8 clock	-0.124939
-2.297685	16 clock	-0.124939
-1.518646	several clock	-0.124939
-0.734105	few clock	-0.669007
-1.404102	every clock	-0.425969
-1.987565	their clock	-0.124939
-1.974434	256 clock	-0.124939
-1.935616	three clock	-0.124939
-1.885436	higher clock	-0.124939
-0.924267	10 clock	-0.301030
-0.601164	core clock	-0.346788
-0.876269	5 clock	-0.301030
-1.698349	100 clock	-0.124939
-0.688134	20 clock	-0.301030
-1.546708	6 clock	-0.124939
-1.373336	15 clock	-0.124939
-0.600953	80 clock	-0.124939
-1.294155	actual clock	-0.124939
-0.600953	hundred clock	-0.425969
-1.295521	11 clock	-0.124939
-1.294155	50 clock	-0.124939
-1.198611	40 clock	-0.124939
-0.504319	45 clock	-0.124939
-1.075043	25 clock	-0.124939
-0.203843	matrices, clock	-0.425969
-0.600676	2-3 clock	-0.124939
-0.600676	counting clock	-0.124939
-0.600676	500 clock	-0.124939
-0.600676	dummy[0]; clock	-0.124939
-3.074886	the version	-0.124939
-3.360895	a version	-0.124939
-2.738501	The version	-0.425969
-2.515209	function version	-0.124939
-1.925708	code version	-0.346788
-2.690196	same version	-0.124939
-1.555245	which version	-0.522879
-2.660936	one version	-0.124939
-1.749377	each version	-0.124939
-2.540680	static version	-0.124939
-1.893137	64-bit version	-0.124939
-2.513686	possible version	-0.124939
-2.385535	bit version	-0.124939
-1.254449	new version	-0.124939
-1.384261	SSE2 version	-0.301030
-2.210423	Windows version	-0.124939
-2.214580	compiled version	-0.124939
-2.180154	specific version	-0.124939
-1.486793	AVX version	-0.124939
-1.474849	optimized version	-0.124939
-2.116567	another version	-0.124939
-2.021440	optimal version	-0.124939
-1.970642	separate version	-0.124939
-1.932855	better version	-0.124939
-1.856213	old version	-0.124939
-0.458718	appropriate version	-0.669007
-1.152663	advanced version	-0.124939
-1.074816	desired version	-0.124939
-0.759703	right version	-0.602060
-1.697828	final version	-0.124939
-1.588766	future version	-0.124939
-1.588766	newer version	-0.124939
-1.587133	current version	-0.124939
-1.543009	chosen version	-0.124939
-1.493496	interpreted version	-0.124939
-0.332696	debug version	-0.249877
-0.492166	latest version	-0.602060
-0.680567	dispatched version	-0.425969
-1.292674	popular version	-0.124939
-1.197422	selected version	-0.124939
-0.504476	inferior version	-0.124939
-1.074148	kernel version	-0.124939
-0.203783	Lowest version	-0.425969
-0.203783	binutils version	-0.124939
-0.899728	command-line version	-0.124939
-0.203783	release version	-0.124939
-0.899728	generic version	-0.124939
-0.600376	Generic version	-0.124939
-0.600376	glibc version	-0.124939
-0.600376	Default version	-0.124939
-1.896861	the value	-0.541764
-2.704100	a value	-0.249877
-2.407324	The value	-0.425969
-3.111313	by value	-0.124939
-2.280282	this value	-0.124939
-2.769227	different value	-0.124939
-1.852302	other value	-0.602060
-2.753119	point value	-0.124939
-2.687071	integer value	-0.124939
-1.391080	each value	-0.204120
-2.472153	return value	-0.124939
-1.668032	new value	-0.425969
-1.578589	its value	-0.124939
-1.868246	binary value	-0.124939
-1.821596	negative value	-0.124939
-1.739769	preceding value	-0.124939
-1.708419	maximum value	-0.124939
-1.706752	final value	-0.124939
-0.688455	previous value	-0.301030
-0.492444	absolute value	-0.301030
-1.375982	B value	-0.124939
-0.901393	minimum value	-0.124939
-0.901393	R value	-0.124939
-0.901393	fourth value	-0.124939
-0.601211	initial value	-0.124939
-2.672277	the objects	-0.182931
-2.343219	of objects	-0.425969
-2.196147	and objects	-0.329059
-2.769608	The objects	-0.124939
-2.762437	for objects	-0.425969
-3.192988	are objects	-0.124939
-1.990258	when objects	-0.301030
-1.898773	vector objects	-0.124939
-2.125944	different objects	-0.124939
-2.704865	other objects	-0.124939
-1.836107	If objects	-0.602060
-1.818545	all objects	-0.602060
-1.602359	class objects	-0.124939
-1.858417	many objects	-0.124939
-1.835759	any objects	-0.124939
-2.334371	new objects	-0.124939
-2.266547	making objects	-0.124939
-2.229449	large objects	-0.124939
-1.539461	big objects	-0.124939
-1.088409	allocated objects	-0.249877
-2.085864	store objects	-0.124939
-0.756524	shared objects	-0.301030
-1.907341	graphics objects	-0.124939
-1.735703	local objects	-0.124939
-1.048758	Do objects	-0.425969
-1.701286	so-called objects	-0.124939
-1.591818	similar objects	-0.124939
-1.592944	modify objects	-0.124939
-1.496034	temporary objects	-0.124939
-0.049172	Shared objects	-0.669007
-0.747498	composite objects	-0.124939
-1.374491	declare objects	-0.124939
-0.090088	Are objects	-0.249877
-0.600910	Returning objects	-0.124939
-3.243107	and takes	-0.124939
-2.157533	that takes	-0.204120
-1.202994	it takes	-0.713210
-3.038883	code takes	-0.124939
-2.349229	compiler takes	-0.124939
-1.950119	It takes	-0.124939
-2.172585	memory takes	-0.124939
-2.775385	program takes	-0.124939
-2.779729	instruction takes	-0.124939
-2.736893	loop takes	-0.124939
-1.780794	integer takes	-0.301030
-1.969656	double takes	-0.124939
-2.619494	pointer takes	-0.124939
-2.560817	object takes	-0.124939
-2.538123	C++ takes	-0.124939
-2.456465	table takes	-0.124939
-1.711022	often takes	-0.124939
-2.289585	always takes	-0.124939
-2.160572	precision takes	-0.124939
-2.110380	list takes	-0.124939
-1.076270	typically takes	-0.124939
-1.328980	multiplication takes	-0.124939
-1.993874	handling takes	-0.124939
-1.976841	never takes	-0.124939
-0.884614	conversion takes	-0.124939
-1.026086	division takes	-0.301030
-1.261078	addition takes	-0.124939
-1.762173	operation takes	-0.124939
-0.901590	something takes	-0.425969
-1.495248	DLL takes	-0.124939
-1.439710	collection takes	-0.124939
-1.439710	again takes	-0.124939
-1.373996	truncation takes	-0.124939
-1.297290	Division takes	-0.124939
-0.900593	Multiplication takes	-0.124939
-0.900593	branching takes	-0.124939
-0.900593	obviously takes	-0.124939
-2.230638	the variable	-0.263241
-1.960044	a variable	-0.249877
-2.572099	of variable	-0.221849
-2.897155	and variable	-0.124939
-2.561756	or variable	-0.124939
-2.930538	have variable	-0.124939
-1.609400	A variable	-0.124939
-2.737669	other variable	-0.124939
-2.757263	point variable	-0.124939
-2.715673	one variable	-0.124939
-2.691371	integer variable	-0.124939
-2.680076	no variable	-0.124939
-2.481042	member variable	-0.124939
-1.769356	const variable	-0.124939
-2.423261	unsigned variable	-0.124939
-1.729379	register variable	-0.425969
-2.072961	shared variable	-0.124939
-1.990703	signed variable	-0.124939
-0.395480	induction variable	-0.124939
-1.222966	public variable	-0.124939
-0.877484	global variable	-0.301030
-1.498930	temporary variable	-0.124939
-1.443260	saved variable	-0.124939
-2.970056	of any	-0.124939
-2.775028	to any	-0.301030
-3.127085	and any	-0.124939
-2.899248	in any	-0.124939
-2.514159	for any	-0.124939
-2.293526	or any	-0.124939
-2.516244	if any	-0.124939
-2.096498	by any	-0.425969
-2.447776	with any	-0.124939
-2.986607	on any	-0.124939
-2.926946	as any	-0.124939
-2.888354	not any	-0.124939
-2.883823	than any	-0.124939
-2.269248	have any	-0.124939
-2.835410	use any	-0.124939
-1.924912	from any	-0.301030
-2.776067	at any	-0.124939
-2.134543	make any	-0.124939
-2.082342	If any	-0.425969
-2.692418	but any	-0.124939
-2.617245	do any	-0.124939
-2.534052	In any	-0.124939
-2.420585	return any	-0.124939
-1.341755	before any	-0.249877
-1.736476	call any	-0.425969
-2.380736	take any	-0.124939
-1.673156	need any	-0.124939
-1.404700	without any	-0.124939
-1.640289	access any	-0.425969
-1.584836	making any	-0.124939
-2.216270	avoid any	-0.124939
-2.117668	get any	-0.124939
-1.426847	contains any	-0.124939
-2.088407	run any	-0.124939
-1.879622	calling any	-0.124939
-1.860875	generate any	-0.124939
-1.855592	reduce any	-0.124939
-0.821648	produce any	-0.602060
-1.739011	outside any	-0.124939
-1.699622	time, any	-0.124939
-1.696065	adding any	-0.124939
-1.630659	insert any	-0.124939
-1.628862	loading any	-0.124939
-1.543509	include any	-0.124939
-0.332644	hardly any	-0.124939
-1.437981	remove any	-0.124939
-1.371035	avoiding any	-0.124939
-1.198591	recommend any	-0.124939
-1.073652	alias any	-0.124939
-1.073652	throw any	-0.124939
-0.899396	resolve any	-0.124939
-0.899396	obey any	-0.124939
-0.600209	express any	-0.124939
-0.600209	identifies any	-0.124939
-0.600209	destroys any	-0.124939
-2.820963	and we	-0.124939
-1.794934	that we	-0.284640
-3.081115	function we	-0.124939
-2.009939	if we	-0.221849
-2.867701	time we	-0.124939
-2.231009	when we	-0.124939
-1.514823	then we	-0.271067
-1.719428	because we	-0.249877
-1.676530	If we	-0.124939
-2.694192	which we	-0.124939
-2.576399	number we	-0.124939
-2.515807	where we	-0.124939
-1.408796	so we	-0.425969
-2.419182	before we	-0.124939
-2.401079	example, we	-0.124939
-2.319607	systems we	-0.124939
-1.610748	case we	-0.124939
-1.530125	But we	-0.124939
-1.360974	However, we	-0.124939
-1.943217	examples we	-0.124939
-1.864904	Here, we	-0.124939
-1.760683	lines we	-0.124939
-1.732279	constants we	-0.124939
-1.014448	When we	-0.124939
-0.980009	thing we	-0.425969
-1.589286	future we	-0.124939
-1.590863	Obviously, we	-0.124939
-0.855070	Here we	-0.124939
-1.492376	4, we	-0.124939
-1.437543	mode, we	-0.124939
-1.435961	available, we	-0.124939
-1.199286	cores, we	-0.124939
-1.197686	While we	-0.124939
-1.197686	As we	-0.124939
-0.504179	Then we	-0.124939
-1.074347	numbers, we	-0.124939
-1.075952	7.4 we	-0.124939
-1.074347	since we	-0.124939
-0.899861	four, we	-0.124939
-0.899861	algebra, we	-0.124939
-0.600443	decomposition, we	-0.124939
-0.600443	lesson we	-0.124939
-0.600443	Similarly, we	-0.124939
-0.600443	Surprisingly, we	-0.124939
-0.600443	14.7b, we	-0.124939
-0.600443	Next, we	-0.124939
-0.600443	Thus, we	-0.124939
-0.600443	Should we	-0.124939
-2.815548	of some	-0.124939
-2.666152	to some	-0.124939
-2.521599	and some	-0.124939
-1.784418	in some	-0.471726
-2.392564	for some	-0.124939
-2.361557	that some	-0.124939
-2.689304	are some	-0.124939
-3.185230	it some	-0.124939
-2.500970	by some	-0.124939
-2.230272	with some	-0.124939
-2.054073	on some	-0.124939
-2.972396	may some	-0.124939
-2.907684	have some	-0.124939
-2.153108	has some	-0.124939
-2.792977	make some	-0.124939
-2.653453	do some	-0.124939
-0.854992	In some	-0.793946
-2.520991	takes some	-0.124939
-2.416719	example, some	-0.124939
-2.323905	out some	-0.124939
-1.572464	does some	-0.124939
-2.098046	doing some	-0.124939
-1.909809	give some	-0.124939
-1.866343	reduce some	-0.124939
-1.771890	described some	-0.124939
-1.768793	Unfortunately, some	-0.124939
-1.733717	save some	-0.124939
-1.594764	SSE4.1 some	-0.124939
-1.439862	Even some	-0.124939
-1.199936	While some	-0.124939
-0.900993	describe some	-0.124939
-2.277312	is so	-0.170696
-3.084038	and so	-0.124939
-3.353792	in so	-0.124939
-2.653261	be so	-0.124939
-2.180776	are so	-0.124939
-3.101423	it so	-0.124939
-3.038404	function so	-0.124939
-2.420764	code so	-0.425969
-3.000342	x so	-0.124939
-2.247753	time so	-0.425969
-2.731176	vector so	-0.124939
-2.673314	different so	-0.124939
-2.633226	cache so	-0.124939
-1.468847	do so	-0.221849
-2.557971	example so	-0.124939
-2.501033	C++ so	-0.124939
-2.405556	address so	-0.124939
-2.384044	call so	-0.124939
-2.387731	less so	-0.124939
-2.372323	bit so	-0.124939
-2.324503	pointers so	-0.124939
-2.273180	operations so	-0.124939
-2.266106	case so	-0.124939
-2.247403	does so	-0.124939
-2.195846	calculations so	-0.124939
-2.187354	threads so	-0.124939
-2.171418	exception so	-0.124939
-2.068641	mode so	-0.124939
-1.953525	source so	-0.124939
-1.836607	start so	-0.124939
-1.805754	negative so	-0.124939
-1.801780	section so	-0.124939
-1.779425	statement so	-0.124939
-1.755406	code, so	-0.124939
-1.723433	inlined so	-0.124939
-1.691249	macro so	-0.124939
-1.689248	100 so	-0.124939
-1.662544	2, so	-0.124939
-1.664582	changed so	-0.124939
-1.626794	identical so	-0.124939
-1.585401	unrolling so	-0.124939
-0.900972	organized so	-0.425969
-1.581334	metaprogramming so	-0.124939
-1.490539	integer, so	-0.124939
-1.494664	designed so	-0.124939
-1.434604	thousand so	-0.124939
-1.373889	possible, so	-0.124939
-1.369725	compact so	-0.124939
-1.367658	above, so	-0.124939
-1.369725	truncation so	-0.124939
-1.195711	default, so	-0.124939
-1.197798	9.5 so	-0.124939
-1.072859	bits, so	-0.124939
-1.072859	prefetching so	-0.124939
-1.072859	parameter, so	-0.124939
-0.898865	12.4a so	-0.124939
-0.898865	bytes, so	-0.124939
-0.898865	factorials so	-0.124939
-0.599942	switches; so	-0.124939
-0.599942	developing so	-0.124939
-0.599942	digits, so	-0.124939
-0.599942	0x2C so	-0.124939
-0.599942	C1, so	-0.124939
-3.269264	the variables	-0.124939
-3.394545	of variables	-0.124939
-2.745492	for variables	-0.124939
-3.263386	that variables	-0.124939
-3.145333	are variables	-0.124939
-3.021573	on variables	-0.124939
-2.770686	make variables	-0.124939
-2.679754	other variables	-0.124939
-1.342995	point variables	-0.182931
-2.703371	which variables	-0.124939
-2.671832	all variables	-0.124939
-2.066016	used variables	-0.124939
-2.637447	most variables	-0.124939
-2.546480	multiple variables	-0.124939
-2.549755	static variables	-0.124939
-2.493523	many variables	-0.124939
-1.109591	register variables	-0.301030
-2.380112	how variables	-0.124939
-2.314540	these variables	-0.124939
-2.258947	simple variables	-0.124939
-2.236514	its variables	-0.124939
-2.186421	several variables	-0.124939
-1.486379	precision variables	-0.425969
-2.051053	counter variables	-0.124939
-2.026572	Integer variables	-0.124939
-0.768356	Boolean variables	-0.522879
-0.445054	induction variables	-0.234083
-1.220972	public variables	-0.124939
-0.876816	global variables	-0.124939
-1.076574	Such variables	-0.124939
-1.046033	local variables	-0.124939
-1.633399	initialized variables	-0.124939
-1.630569	allow variables	-0.124939
-0.856172	non-static variables	-0.425969
-0.216453	Induction variables	-0.346788
-1.373007	Register variables	-0.124939
-1.371580	(i.e. variables	-0.124939
-1.374438	internal variables	-0.124939
-0.504279	Integers variables	-0.425969
-1.074844	Storing variables	-0.124939
-0.379630	Global variables	-0.124939
-0.900194	uninitialized variables	-0.124939
-0.600609	summation variables	-0.124939
-3.295906	the return	-0.124939
-3.441034	of return	-0.124939
-2.802986	to return	-0.301030
-2.652570	and return	-0.124939
-3.242075	The return	-0.124939
-3.239197	can return	-0.124939
-2.357115	// return	-0.124939
-2.285182	function return	-0.124939
-2.960524	may return	-0.124939
-1.010159	{ return	-0.284640
-2.833452	will return	-0.124939
-1.798592	} return	-0.124939
-1.449008	version return	-0.249877
-1.803230	2 return	-0.124939
-2.257158	error return	-0.124939
-2.243731	its return	-0.124939
-2.239955	must return	-0.124939
-1.558358	; return	-0.124939
-2.207406	i; return	-0.124939
-1.443518	Function return	-0.425969
-1.432139	supported return	-0.124939
-1.376329	... return	-0.425969
-2.020223	1; return	-0.124939
-1.977444	never return	-0.124939
-1.156660	2; return	-0.425969
-0.856359	3; return	-0.425969
-1.372961	normal return	-0.124939
-1.296183	#endif return	-0.124939
-1.294980	here: return	-0.124939
-0.600843	x8*x2; return	-0.124939
-0.600843	(N-1)) return	-0.124939
-0.600843	2.5}; return	-0.124939
-0.600843	134) return	-0.124939
-0.600843	causing return	-0.124939
-0.600843	__rdtsc(); return	-0.124939
-0.600843	s); return	-0.124939
-3.525951	is 2	-0.124939
-3.488972	a 2	-0.124939
-1.783008	of 2	-0.316824
-3.028729	to 2	-0.124939
-3.227156	be 2	-0.124939
-2.360533	// 2	-0.301030
-3.112178	= 2	-0.124939
-2.260296	by 2	-0.124939
-3.056394	- 2	-0.124939
-2.353117	than 2	-0.124939
-2.620490	double 2	-0.124939
-1.535732	+ 2	-0.249877
-2.514473	* 2	-0.124939
-1.805084	2 2	-0.124939
-2.460517	between 2	-0.124939
-1.745814	4 2	-0.425969
-2.417384	unsigned 2	-0.124939
-1.195773	64 2	-0.221849
-1.638229	32 2	-0.124939
-2.138844	/ 2	-0.124939
-2.081698	add 2	-0.124939
-0.793012	== 2	-0.124939
-1.670856	% 2	-0.124939
-1.637192	below 2	-0.124939
-0.647234	fraction 2	-0.124939
-0.805340	2) 2	-0.124939
-1.439862	int64_t 2	-0.124939
-0.680611	Add 2	-0.425969
-0.425502	Core 2	-0.124939
-1.076040	124 2	-0.124939
-0.601010	exceed 2	-0.124939
-3.008929	// You	-0.124939
-2.816238	time You	-0.124939
-2.789957	} You	-0.124939
-2.691841	functions You	-0.124939
-2.386408	unsigned You	-0.124939
-2.319493	code. You	-0.124939
-1.685108	time. You	-0.124939
-2.115141	functions. You	-0.124939
-2.031413	used. You	-0.124939
-2.006504	1; You	-0.124939
-1.256813	efficient. You	-0.124939
-1.920566	below. You	-0.124939
-1.849173	called. You	-0.124939
-1.821023	compiler. You	-0.124939
-1.805589	pointer. You	-0.124939
-1.689083	registers. You	-0.124939
-1.012013	cycles. You	-0.124939
-1.011514	operations. You	-0.124939
-1.654321	needed. You	-0.124939
-1.656701	classes. You	-0.124939
-1.659093	thread. You	-0.124939
-1.656701	precision. You	-0.124939
-1.621305	way. You	-0.124939
-1.621305	vector. You	-0.124939
-1.577519	references. You	-0.124939
-1.584737	not. You	-0.124939
-1.534154	allocation. You	-0.124939
-1.485408	zero. You	-0.124939
-1.485408	software. You	-0.124939
-1.432268	case. You	-0.124939
-1.429835	anyway. You	-0.124939
-1.432268	16. You	-0.124939
-1.370228	application. You	-0.124939
-1.370228	here. You	-0.124939
-1.367768	result. You	-0.124939
-1.367768	one. You	-0.124939
-0.678724	overlap. You	-0.124939
-0.678724	cores. You	-0.425969
-1.288586	handling. You	-0.124939
-1.288586	numbers. You	-0.124939
-1.288586	manual. You	-0.124939
-1.293521	modules. You	-0.124939
-1.288586	them. You	-0.124939
-1.288586	itself. You	-0.124939
-1.196611	expensive. You	-0.124939
-1.071672	operands. You	-0.124939
-1.071672	n. You	-0.124939
-1.071672	a. You	-0.124939
-0.898070	debugger. You	-0.124939
-0.898070	CriticalFunction. You	-0.124939
-0.898070	52. You	-0.124939
-0.898070	question. You	-0.124939
-0.898070	model. You	-0.124939
-0.898070	examples. You	-0.124939
-0.898070	shared. You	-0.124939
-0.898070	test. You	-0.124939
-0.898070	incompatible. You	-0.124939
-0.898070	72. You	-0.124939
-0.898070	today. You	-0.124939
-0.898070	me. You	-0.124939
-0.599543	108 You	-0.124939
-0.599543	twice. You	-0.124939
-0.599543	heading You	-0.124939
-0.599543	-fno-strict-overflow. You	-0.124939
-0.599543	__intel_cpu_feature_indicator_x. You	-0.124939
-0.599543	contention. You	-0.124939
-0.599543	compiler-specific. You	-0.124939
-0.599543	64). You	-0.124939
-0.599543	used). You	-0.124939
-0.599543	account. You	-0.124939
-0.599543	late. You	-0.124939
-0.599543	job. You	-0.124939
-0.599543	entry. You	-0.124939
-0.599543	dilemma. You	-0.124939
-0.599543	makefile. You	-0.124939
-2.342967	the table	-0.241444
-2.151805	a table	-0.393784
-3.509202	of table	-0.124939
-2.884030	and table	-0.425969
-2.267898	in table	-0.124939
-2.294724	The table	-0.221849
-3.187748	// table	-0.124939
-3.073058	on table	-0.124939
-2.898269	this table	-0.124939
-2.800669	make table	-0.124939
-2.658536	each table	-0.124939
-2.335167	these table	-0.124939
-1.638347	following table	-0.124939
-2.197772	These table	-0.124939
-1.110309	virtual table	-0.124939
-1.007817	lookup table	-0.124939
-1.769975	Unfortunately, table	-0.124939
-1.047753	vectorized table	-0.124939
-1.015386	offset table	-0.124939
-1.706031	Avoid table	-0.124939
-1.635277	align table	-0.124939
-0.550500	hash table	-0.124939
-0.149654	linkage table	-0.522879
-1.297377	written table	-0.124939
-1.201377	Vectorized table	-0.124939
-1.200467	As table	-0.124939
-0.203937	import table	-0.124939
-2.187640	the performance	-0.329059
-3.032555	a performance	-0.124939
-2.804078	of performance	-0.124939
-2.426950	in performance	-0.221849
-2.287358	The performance	-0.124939
-3.231879	for performance	-0.124939
-3.032597	or performance	-0.124939
-3.035907	code performance	-0.124939
-2.237061	more performance	-0.425969
-2.865074	when performance	-0.124939
-2.793276	A performance	-0.124939
-2.151037	program performance	-0.124939
-2.015543	no performance	-0.124939
-2.484368	any performance	-0.124939
-2.441863	software performance	-0.124939
-2.410796	called performance	-0.124939
-2.340301	useful performance	-0.124939
-2.292679	system performance	-0.124939
-1.277446	best performance	-0.602060
-1.500038	good performance	-0.124939
-2.039480	optimize performance	-0.124939
-2.000636	Most performance	-0.124939
-1.297445	Using performance	-0.425969
-1.890422	improve performance	-0.124939
-0.760477	reduced performance	-0.602060
-1.633289	identical performance	-0.124939
-1.373831	identify performance	-0.124939
-1.372566	poor performance	-0.124939
-1.295918	realistic performance	-0.124939
-1.199008	highest performance	-0.124939
-1.199008	51 performance	-0.124939
-1.076618	measuring performance	-0.124939
-1.075342	Comparing performance	-0.124939
-0.900527	inherent performance	-0.124939
-0.900527	overall performance	-0.124939
-0.600776	benchmark performance	-0.124939
-0.600776	investigating performance	-0.124939
-0.600776	degrades performance	-0.124939
-3.684379	the very	-0.124939
-2.052891	is very	-0.209260
-2.005251	a very	-0.266268
-3.494876	to very	-0.124939
-2.874440	and very	-0.124939
-2.542141	for very	-0.301030
-2.053652	be very	-0.271067
-2.199114	are very	-0.124939
-3.062928	on very	-0.124939
-3.017355	as very	-0.124939
-2.981187	not very	-0.124939
-2.974810	may very	-0.124939
-2.039336	have very	-0.124939
-2.819484	A very	-0.124939
-2.544153	also very	-0.124939
-1.827332	some very	-0.124939
-2.253299	accessed very	-0.124939
-2.232478	arrays very	-0.124939
-2.135568	get very	-0.124939
-2.005982	caching very	-0.124939
-1.866797	made very	-0.124939
-1.131711	things very	-0.124939
-1.829467	depends very	-0.124939
-1.793149	become very	-0.124939
-1.441095	generally very	-0.124939
-1.297988	-(-a) very	-0.124939
-1.295972	unfortunately very	-0.124939
-1.201078	Optimizes very	-0.124939
-1.076139	Programmers very	-0.124939
-0.601044	admittedly very	-0.124939
-2.488725	the software	-0.301030
-2.311371	a software	-0.301030
-2.339320	of software	-0.124939
-3.419559	to software	-0.124939
-2.840985	and software	-0.124939
-2.917180	in software	-0.124939
-2.529606	for software	-0.124939
-2.506918	that software	-0.124939
-3.027685	on software	-0.124939
-2.858818	when software	-0.124939
-2.783843	A software	-0.124939
-2.140443	make software	-0.124939
-2.707098	which software	-0.124939
-2.676410	all software	-0.124939
-2.640375	most software	-0.124939
-2.496789	many software	-0.124939
-1.787219	32-bit software	-0.124939
-2.326926	new software	-0.124939
-1.586957	making software	-0.124939
-2.251250	Some software	-0.124939
-2.238262	extra software	-0.124939
-2.204542	big software	-0.124939
-2.149793	optimized software	-0.124939
-2.058804	All software	-0.124939
-2.006553	application software	-0.124939
-1.987565	their software	-0.124939
-0.967274	Many software	-0.301030
-1.880145	platform software	-0.124939
-1.890792	bigger software	-0.124939
-1.077245	whole software	-0.124939
-0.980059	Optimizing software	-0.425969
-1.544005	typical software	-0.124939
-1.294155	CPU-intensive software	-0.124939
-1.295521	18 software	-0.124939
-1.199982	structured software	-0.124939
-0.900327	irrelevant software	-0.124939
-0.900327	working software	-0.124939
-0.900327	performing software	-0.124939
-0.900327	Security software	-0.124939
-0.600676	memory-hungry software	-0.124939
-0.600676	multi-threaded software	-0.124939
-2.435888	the order	-0.793946
-2.844735	of order	-0.301030
-1.406190	in order	-1.797037
-2.571077	The order	-0.301030
-1.624152	In order	-0.602060
-2.218990	specific order	-0.124939
-1.873648	storage order	-0.124939
-1.876024	link order	-0.124939
-1.502270	non-sequential order	-0.124939
-1.202198	sequential order	-0.124939
-1.077739	natural order	-0.124939
-0.902128	opposite order	-0.124939
-0.601579	out-of- order	-0.124939
-0.601579	2'nd order	-0.124939
-3.605571	the long	-0.124939
-3.058105	is long	-0.124939
-2.147498	a long	-0.335792
-2.846879	and long	-0.425969
-3.027364	or long	-0.124939
-2.463302	with long	-0.124939
-1.690841	as long	-0.726999
-2.945439	not long	-0.124939
-2.885973	have long	-0.124939
-2.862979	when long	-0.124939
-2.790109	A long	-0.124939
-2.647592	no long	-0.124939
-2.474018	some long	-0.124939
-2.489680	so long	-0.124939
-1.386564	very long	-0.249877
-0.930453	long long	-0.346788
-2.397366	8 long	-0.124939
-1.483309	unsigned long	-0.124939
-2.382630	how long	-0.124939
-2.315572	SSE2 long	-0.124939
-1.550537	avoid long	-0.425969
-2.205962	i; long	-0.124939
-1.939424	too long	-0.124939
-1.878726	AVX2 long	-0.124939
-1.843061	just long	-0.124939
-1.701732	Avoid long	-0.124939
-1.496014	misprediction long	-0.124939
-1.438022	MMX long	-0.124939
-1.373666	systems: long	-0.124939
-1.294485	AVX512 long	-0.124939
-0.186884	unacceptably long	-0.726999
-1.076551	libraries: long	-0.124939
-1.075242	Linux: long	-0.124939
-0.900460	types: long	-0.124939
-0.900460	time1; long	-0.124939
-0.900460	annoyingly long	-0.124939
-0.600743	int32_t long	-0.124939
-0.600743	<intrin.h> long	-0.124939
-0.600743	DontSkip; long	-0.124939
-3.156330	and between	-0.124939
-3.402927	in between	-0.124939
-2.973609	or between	-0.124939
-2.654453	cache between	-0.124939
-2.593088	there between	-0.124939
-2.500928	takes between	-0.124939
-1.799593	performance between	-0.124939
-2.205253	bytes between	-0.124939
-1.277366	speed between	-0.124939
-0.847939	shared between	-0.823909
-1.999360	typically between	-0.124939
-1.975399	conversion between	-0.124939
-0.444916	difference between	-0.301030
-1.893347	framework between	-0.124939
-1.864259	choose between	-0.124939
-1.840778	switch between	-0.124939
-1.103179	conversions between	-0.425969
-1.801147	choice between	-0.124939
-1.592052	processors, between	-0.124939
-1.493496	jump between	-0.124939
-0.182690	Conversions between	-0.204120
-1.440460	distributed between	-0.124939
-0.425407	communication between	-0.301030
-1.294332	select between	-0.124939
-1.294332	split between	-0.124939
-0.600712	compromise between	-0.124939
-1.294332	nothing between	-0.124939
-1.292674	jumps between	-0.124939
-0.249503	chooses between	-0.602060
-0.504139	distinguish between	-0.124939
-1.074148	distance between	-0.124939
-0.899728	Jumps between	-0.124939
-0.899728	Switch between	-0.124939
-0.203783	synchronization between	-0.124939
-0.203783	distinction between	-0.124939
-0.600376	similarity between	-0.124939
-0.600376	balance between	-0.124939
-0.600376	transitions between	-0.124939
-0.600376	evenly between	-0.124939
-0.600376	workload between	-0.124939
-0.600376	communicating between	-0.124939
-0.600376	correspondence between	-0.124939
-0.600376	varies between	-0.124939
-0.600376	observed between	-0.124939
-0.600376	discriminating between	-0.124939
-0.600376	distinctions between	-0.124939
-0.600376	porting between	-0.124939
-0.600376	discriminates between	-0.124939
-0.600376	environment, between	-0.124939
-0.600376	distinguishing between	-0.124939
-3.119581	the 32-bit	-0.124939
-2.695185	a 32-bit	-0.124939
-3.470019	of 32-bit	-0.124939
-3.480175	to 32-bit	-0.124939
-2.660225	and 32-bit	-0.124939
-1.684684	in 32-bit	-0.481486
-1.955011	for 32-bit	-0.279841
-3.204327	are 32-bit	-0.124939
-2.597726	// 32-bit	-0.124939
-3.009388	as 32-bit	-0.124939
-2.954635	than 32-bit	-0.124939
-2.886785	use 32-bit	-0.124939
-2.109323	only 32-bit	-0.425969
-2.727055	but 32-bit	-0.124939
-1.902262	two 32-bit	-0.124939
-2.552463	In 32-bit	-0.124939
-1.792339	between 32-bit	-0.124939
-1.580015	Gnu 32-bit	-0.124939
-2.134108	uses 32-bit	-0.124939
-2.125996	support 32-bit	-0.124939
-2.047443	fast 32-bit	-0.124939
-2.042260	both 32-bit	-0.124939
-1.993806	their 32-bit	-0.124939
-1.905184	Many 32-bit	-0.124939
-1.816538	supports 32-bit	-0.124939
-1.104831	Supports 32-bit	-0.425969
-1.768499	including 32-bit	-0.124939
-1.707413	Linux, 32-bit	-0.124939
-1.439632	Linux. 32-bit	-0.124939
-1.374822	reference, 32-bit	-0.124939
-0.900926	X, 32-bit	-0.124939
-0.600977	16-bit, 32-bit	-0.124939
-2.619243	the branch	-0.301030
-3.496474	is branch	-0.124939
-2.223874	a branch	-0.263241
-2.808957	of branch	-0.301030
-2.654471	and branch	-0.301030
-2.767597	The branch	-0.124939
-2.760522	for branch	-0.124939
-3.297458	that branch	-0.124939
-2.528297	if branch	-0.124939
-2.467272	with branch	-0.124939
-3.046553	on branch	-0.124939
-2.198146	code branch	-0.301030
-1.468755	A branch	-0.425969
-2.093218	loop branch	-0.124939
-2.655488	no branch	-0.124939
-2.506738	many branch	-0.124939
-2.529850	possible branch	-0.124939
-2.488920	any branch	-0.124939
-2.397263	take branch	-0.124939
-1.665725	new branch	-0.124939
-2.245858	about branch	-0.124939
-2.194438	single branch	-0.124939
-2.166690	cause branch	-0.124939
-2.029526	optimal branch	-0.124939
-2.022784	particular branch	-0.124939
-0.671736	control branch	-0.346788
-1.886809	dispatch branch	-0.124939
-1.374326	predictable branch	-0.124939
-1.373159	poor branch	-0.124939
-1.199406	cache, branch	-0.124939
-1.075641	wrong branch	-0.124939
-0.124807	misses, branch	-0.301030
-0.900726	Provoke branch	-0.124939
-0.900726	Remove branch	-0.124939
-0.600877	buffer, branch	-0.124939
-3.556713	a <	-0.124939
-2.364510	x <	-0.425969
-0.395242	i <	-0.612279
-0.994262	c <	-0.346788
-1.191538	(i <	-0.124939
-1.949916	n <	-0.124939
-0.766268	r <	-0.726999
-1.741197	temp <	-0.124939
-1.595069	row <	-0.124939
-0.805398	c2 <	-0.124939
-1.500477	j <	-0.124939
-1.376313	column <	-0.124939
-1.375538	c1 <	-0.124939
-1.200999	u.f <	-0.124939
-1.076838	r1 <	-0.124939
-0.379898	r2 <	-0.425969
-0.901527	int)i <	-0.124939
-0.901527	!(a <	-0.124939
-0.203963	int)n <	-0.124939
-0.601278	u <	-0.124939
-0.601278	(u.i[1] <	-0.124939
-0.601278	(seconds <	-0.124939
-0.601278	(0 <	-0.124939
-2.732701	the member	-0.191886
-3.510963	is member	-0.124939
-2.149646	a member	-0.761761
-3.462590	of member	-0.124939
-3.273716	and member	-0.124939
-2.927769	in member	-0.425969
-3.260402	The member	-0.124939
-3.261138	for member	-0.124939
-2.544305	or member	-0.124939
-2.469271	with member	-0.124939
-1.552011	data member	-0.204120
-1.894968	make member	-0.301030
-2.734561	same member	-0.124939
-2.707747	other member	-0.124939
-1.602542	class member	-0.124939
-2.646898	each member	-0.124939
-2.621208	b member	-0.124939
-1.650863	static member	-0.602060
-2.491981	any member	-0.124939
-2.448426	way member	-0.124939
-2.450552	const member	-0.124939
-0.902292	virtual member	-0.204120
-1.741531	Assume member	-0.124939
-0.442015	non-static member	-0.425969
-0.804618	polymorphic member	-0.124939
-0.550411	Virtual member	-0.602060
-0.504703	Class member	-0.425969
-1.075840	Simple member	-0.124939
-1.075840	(not member	-0.124939
-0.900860	non-polymorphic member	-0.124939
-0.900860	Non-static member	-0.124939
-0.900860	non-virtual member	-0.124939
-0.600943	(without member	-0.124939
-2.672277	the way	-0.249877
-2.429865	a way	-0.492916
-2.769608	The way	-0.425969
-2.273156	this way	-0.124939
-2.744523	different way	-0.124939
-1.714529	same way	-0.425969
-2.108481	only way	-0.425969
-1.848687	other way	-0.301030
-2.720400	which way	-0.124939
-1.534012	one way	-0.221849
-1.769161	no way	-0.301030
-2.542613	C++ way	-0.124939
-1.630641	efficient way	-0.301030
-2.452144	faster way	-0.124939
-1.727386	first way	-0.124939
-1.655753	useful way	-0.124939
-2.268987	simple way	-0.124939
-1.003254	best way	-0.823909
-2.188881	common way	-0.124939
-1.500568	good way	-0.124939
-2.129410	another way	-0.124939
-1.887261	second way	-0.124939
-1.826789	compatible way	-0.124939
-1.047379	safe way	-0.124939
-1.633211	simplest way	-0.124939
-0.855953	easy way	-0.425969
-1.546061	typical way	-0.124939
-0.804571	fastest way	-0.425969
-1.376770	convenient way	-0.124939
-1.376770	efficient, way	-0.124939
-1.297589	portable way	-0.124939
-1.200679	suboptimal way	-0.124939
-0.379751	easiest way	-0.425969
-0.600910	intelligible way	-0.124939
-2.626745	the elements	-0.301030
-1.980003	of elements	-0.602060
-3.306437	The elements	-0.124939
-2.778075	for elements	-0.124939
-2.533831	if elements	-0.425969
-2.891038	when elements	-0.124939
-2.165469	data elements	-0.124939
-1.821107	all elements	-0.124939
-1.453754	array elements	-0.124939
-2.522101	many elements	-0.124939
-2.502869	any elements	-0.124939
-2.336489	these elements	-0.124939
-2.210632	language elements	-0.124939
-2.131801	four elements	-0.124939
-2.132216	container elements	-0.124939
-1.018645	eight elements	-0.249877
-2.085664	add elements	-0.124939
-1.890893	interface elements	-0.124939
-1.867758	10 elements	-0.124939
-0.152869	consecutive elements	-0.602060
-1.674230	N elements	-0.124939
-0.902207	swap elements	-0.124939
-1.549291	subsequent elements	-0.124939
-1.201477	distinguish elements	-0.124939
-1.201477	mirror elements	-0.124939
-0.901326	dummy elements	-0.124939
-1.772921	is faster	-0.726999
-3.513154	a faster	-0.124939
-2.328321	be faster	-0.249877
-2.694601	are faster	-0.425969
-3.066625	code faster	-0.124939
-2.771269	functions faster	-0.124939
-2.551737	C++ faster	-0.124939
-2.422943	called faster	-0.124939
-1.713789	often faster	-0.124939
-1.654238	even faster	-0.124939
-1.335381	times faster	-0.301030
-1.552140	calls faster	-0.124939
-1.522554	much faster	-0.124939
-2.150745	calculated faster	-0.124939
-1.164434	run faster	-0.124939
-1.909331	becomes faster	-0.124939
-1.864032	usually faster	-0.124939
-1.868629	goes faster	-0.124939
-1.671838	execute faster	-0.124939
-1.671838	little faster	-0.124939
-1.553460	slightly faster	-0.124939
-1.441492	generally faster	-0.124939
-1.296303	executed faster	-0.124939
-1.297244	increasing faster	-0.124939
-0.090104	Still faster	-0.726999
-1.076339	packages faster	-0.124939
-0.901193	ipow faster	-0.124939
-0.601110	neither faster	-0.124939
-2.948077	the const	-0.249877
-2.669490	a const	-0.249877
-3.335732	of const	-0.124939
-3.350004	to const	-0.124939
-3.150704	The const	-0.124939
-2.960034	or const	-0.124939
-2.291041	{ const	-0.425969
-1.956810	A const	-0.124939
-2.823598	} const	-0.124939
-2.768643	has const	-0.124939
-2.580459	double const	-0.124939
-1.284626	static const	-0.425969
-2.476461	* const	-0.124939
-2.459565	variables const	-0.124939
-2.248867	constant const	-0.124939
-2.199286	i; const	-0.124939
-0.884409	__m128i const	-0.726999
-1.941146	char const	-0.124939
-1.894902	x; const	-0.124939
-1.883856	declared const	-0.124939
-1.816709	global const	-0.124939
-1.753816	vectorization const	-0.124939
-1.728954	local const	-0.124939
-1.434820	T const	-0.124939
-1.438309	x); const	-0.124939
-1.373118	(float const	-0.124939
-1.293937	#endif const	-0.124939
-1.198789	14.8 const	-0.124939
-1.198789	16.1 const	-0.124939
-0.504079	(double const	-0.425969
-1.073850	14.30 const	-0.124939
-0.124740	LoadVector(void const	-0.602060
-1.073850	9.5a const	-0.124939
-1.073850	11.3 const	-0.124939
-1.075620	9.4 const	-0.124939
-1.075620	7.17 const	-0.124939
-0.203763	Func(int); const	-0.425969
-0.899529	9.6a const	-0.124939
-0.899529	11.2b const	-0.124939
-0.899529	squares: const	-0.124939
-0.899529	factorials: const	-0.124939
-0.600276	14.5a const	-0.124939
-0.600276	(Vec4f const	-0.124939
-0.600276	11.2a const	-0.124939
-0.600276	(vector const	-0.124939
-0.600276	LoadVectorA(void const	-0.124939
-0.600276	add_elements(__m128 const	-0.124939
-0.600276	7.33b const	-0.124939
-0.600276	#define, const	-0.124939
-0.600276	max(T const	-0.124939
-0.600276	7.33a const	-0.124939
-0.600276	FuncCol(int); const	-0.124939
-0.600276	14.4a const	-0.124939
-2.855873	and makes	-0.124939
-2.745227	that makes	-0.124939
-3.148126	// makes	-0.124939
-1.883685	it makes	-0.124939
-2.440636	code makes	-0.425969
-1.217223	This makes	-0.447158
-2.971653	compiler makes	-0.124939
-2.869437	this makes	-0.124939
-2.881399	It makes	-0.124939
-2.777700	program makes	-0.124939
-2.749414	only makes	-0.124939
-1.672121	which makes	-0.124939
-2.688457	one makes	-0.124939
-2.009754	set makes	-0.425969
-2.572298	library makes	-0.124939
-1.635936	also makes	-0.124939
-2.409818	call makes	-0.124939
-2.347367	pointers makes	-0.124939
-2.295135	system makes	-0.124939
-2.216071	processor makes	-0.124939
-1.499581	option makes	-0.425969
-2.146345	check makes	-0.124939
-2.073186	simply makes	-0.124939
-1.969070	reference makes	-0.124939
-1.944717	keyword makes	-0.124939
-1.222153	linking makes	-0.124939
-1.842951	#define makes	-0.124939
-1.738548	frame makes	-0.124939
-1.591230	templates makes	-0.124939
-1.594810	checks makes	-0.124939
-1.592420	declaration makes	-0.124939
-1.547856	blocks makes	-0.124939
-1.442311	instances makes	-0.124939
-1.441108	linker makes	-0.124939
-1.076751	product makes	-0.124939
-0.600843	position-independent, makes	-0.124939
-3.043257	or cannot	-0.124939
-2.048713	it cannot	-0.221849
-2.130873	function cannot	-0.124939
-3.041879	code cannot	-0.124939
-1.833484	compiler cannot	-0.346788
-1.383550	you cannot	-0.162727
-2.780684	instruction cannot	-0.124939
-2.716557	which cannot	-0.124939
-2.678537	cache cannot	-0.124939
-1.996539	compilers cannot	-0.124939
-2.627832	size cannot	-0.124939
-2.562179	object cannot	-0.124939
-2.488462	variable cannot	-0.124939
-0.942047	You cannot	-0.221849
-2.427680	address cannot	-0.124939
-2.364243	optimization cannot	-0.124939
-1.141175	they cannot	-0.221849
-2.290905	cases cannot	-0.124939
-2.272290	instructions cannot	-0.124939
-2.262814	times cannot	-0.124939
-2.245856	CPUs cannot	-0.124939
-2.223560	work cannot	-0.124939
-2.164823	therefore cannot	-0.124939
-2.124731	problem cannot	-0.124939
-1.977444	name cannot	-0.124939
-1.969070	reference cannot	-0.124939
-1.966713	resources cannot	-0.124939
-1.943552	lookup cannot	-0.124939
-1.954148	We cannot	-0.124939
-1.242598	types cannot	-0.425969
-1.908721	linking cannot	-0.124939
-1.841774	operands cannot	-0.124939
-1.819308	loaded cannot	-0.124939
-1.372961	debugger cannot	-0.124939
-1.294980	Compilers cannot	-0.124939
-0.900660	algorithms, cannot	-0.124939
-2.793873	and before	-0.425969
-2.896389	int before	-0.124939
-2.250187	time before	-0.124939
-2.140210	program before	-0.124939
-2.760162	instruction before	-0.124939
-2.571853	double before	-0.124939
-2.599495	Intel before	-0.124939
-2.495325	array before	-0.124939
-2.509538	value before	-0.124939
-2.492751	takes before	-0.124939
-2.428799	table before	-0.124939
-2.446390	long before	-0.124939
-1.496545	called before	-0.602060
-2.245933	times before	-0.124939
-1.573899	stack before	-0.124939
-2.188770	big before	-0.124939
-2.164885	overflow before	-0.124939
-2.164885	integers before	-0.124939
-2.143301	precision before	-0.124939
-2.139236	check before	-0.124939
-1.442104	known before	-0.425969
-1.135717	values before	-0.124939
-2.061391	well before	-0.124939
-2.041736	cycles before	-0.124939
-2.041736	counter before	-0.124939
-2.050799	count before	-0.124939
-1.977148	signed before	-0.124939
-1.935406	addition before	-0.124939
-1.259763	needed before	-0.124939
-1.921630	read before	-0.124939
-1.759025	comes before	-0.124939
-1.730937	temp before	-0.124939
-1.725336	algorithm before	-0.124939
-1.663990	priority before	-0.124939
-1.663990	iteration before	-0.124939
-1.662115	counters before	-0.124939
-1.590484	reasons before	-0.124939
-1.544726	Time before	-0.124939
-1.491674	misprediction before	-0.124939
-1.441331	resolved before	-0.124939
-1.435582	again before	-0.124939
-1.368635	c1 before	-0.124939
-1.370543	B before	-0.124939
-1.370543	compilation before	-0.124939
-1.368635	job before	-0.124939
-1.293278	cleanup before	-0.124939
-0.090026	_mm256_zeroupper() before	-0.249877
-1.196368	years before	-0.124939
-1.073354	experience before	-0.124939
-1.073354	evicted before	-0.124939
-0.899197	freed before	-0.124939
-0.203729	immediately before	-0.425969
-0.600109	stages before	-0.124939
-0.600109	restored before	-0.124939
-0.600109	subtask before	-0.124939
-0.600109	(c+d) before	-0.124939
-0.600109	checked before	-0.124939
-0.600109	sub-vector before	-0.124939
-2.177150	is stored	-0.539912
-2.914139	and stored	-0.124939
-1.622086	be stored	-0.602060
-1.605641	are stored	-0.535113
-3.033945	not stored	-0.124939
-2.886763	then stored	-0.124939
-2.644549	pointer stored	-0.124939
-1.481108	also stored	-0.425969
-1.445504	objects stored	-0.425969
-2.319360	always stored	-0.124939
-2.172095	been stored	-0.124939
-2.087916	information stored	-0.124939
-2.020684	typically stored	-0.124939
-1.988458	never stored	-0.124939
-1.869871	usually stored	-0.124939
-1.712936	Variables stored	-0.124939
-0.601568	i.e. stored	-0.425969
-1.297963	necessarily stored	-0.124939
-3.354658	the called	-0.425969
-1.817241	is called	-0.225609
-2.056780	be called	-0.271067
-2.114360	are called	-0.124939
-2.538590	function called	-0.425969
-1.993634	when called	-0.602060
-2.833295	memory called	-0.124939
-2.119056	functions called	-0.425969
-2.775989	only called	-0.124939
-2.704035	cache called	-0.124939
-1.480535	also called	-0.249877
-2.494253	variables called	-0.124939
-2.167651	been called	-0.124939
-2.044676	was called	-0.124939
-1.951957	mechanism called	-0.124939
-1.909540	actually called	-0.124939
-1.179153	usually called	-0.124939
-0.925739	feature called	-0.124939
-1.771457	functions, called	-0.124939
-0.601311	(also called	-0.124939
-0.601311	83 called	-0.124939
-0.601311	erroneously called	-0.124939
-2.155565	the address	-0.937852
-2.781872	The address	-0.425969
-3.292512	for address	-0.124939
-3.145502	function address	-0.124939
-3.066625	code address	-0.124939
-1.762993	an address	-0.602060
-2.894970	this address	-0.124939
-1.776340	from address	-0.425969
-1.426952	memory address	-0.182931
-2.173837	at address	-0.124939
-2.748521	same address	-0.124939
-2.722450	other address	-0.124939
-2.656574	each address	-0.124939
-2.532349	array address	-0.124939
-2.466712	return address	-0.124939
-2.333849	these address	-0.124939
-2.252128	its address	-0.124939
-1.997517	complicated address	-0.124939
-1.996609	their address	-0.124939
-1.947922	runtime address	-0.124939
-1.886308	own address	-0.124939
-1.890905	higher address	-0.124939
-0.688471	target address	-0.301030
-1.499484	fixed address	-0.124939
-1.374545	base address	-0.124939
-1.296303	larger address	-0.124939
-1.076339	nearby address	-0.124939
-0.901193	whose address	-0.124939
-3.613654	the 4	-0.124939
-3.475609	is 4	-0.124939
-3.012337	of 4	-0.425969
-3.438839	to 4	-0.124939
-2.204145	// 4	-0.425969
-3.076533	= 4	-0.124939
-2.493825	by 4	-0.124939
-3.044281	- 4	-0.124939
-2.358287	int 4	-0.124939
-2.607348	double 4	-0.124939
-1.905711	float 4	-0.124939
-2.531087	also 4	-0.124939
-2.502014	* 4	-0.124939
-1.848426	takes 4	-0.425969
-1.219780	4 4	-0.221849
-2.412306	unsigned 4	-0.124939
-1.195348	64 4	-0.221849
-1.693524	time. 4	-0.124939
-1.383351	16 4	-0.124939
-1.110087	32 4	-0.221849
-2.132135	/ 4	-0.124939
-2.083448	mode 4	-0.124939
-1.996938	sets 4	-0.124939
-0.349940	Pentium 4	-0.124939
-1.669823	2, 4	-0.124939
-1.595680	Store 4	-0.124939
-1.501311	procedure 4	-0.124939
-0.391009	unlimited 4	-0.726999
-1.438252	int64_t 4	-0.124939
-1.438252	factor 4	-0.124939
-1.294650	22 4	-0.124939
-0.504379	ALIGN 4	-0.425969
-1.199008	1: 4	-0.124939
-1.075342	............................................................................................... 4	-0.124939
-0.900527	recently 4	-0.124939
-0.900527	bytes, 4	-0.124939
-0.600776	_mm_permutevar_ps 4	-0.124939
-0.600776	_mm256_permutevar_ps 4	-0.124939
-2.909617	or See	-0.124939
-2.330274	code. See	-0.124939
-2.218726	function. See	-0.124939
-2.120633	functions. See	-0.124939
-1.386639	memory. See	-0.124939
-2.039449	program. See	-0.124939
-1.371539	used. See	-0.425969
-1.906754	compilers. See	-0.124939
-1.888562	processors. See	-0.124939
-1.776556	platforms. See	-0.124939
-1.780667	cases. See	-0.124939
-1.780667	1. See	-0.124939
-1.746487	variables. See	-0.124939
-1.694640	mode. See	-0.124939
-1.690490	optimization. See	-0.124939
-1.657798	possible. See	-0.124939
-1.624179	way. See	-0.124939
-1.584886	CPU. See	-0.124939
-1.586996	not. See	-0.124939
-1.589116	version. See	-0.124939
-1.541238	order. See	-0.124939
-1.537028	allocation. See	-0.124939
-1.543359	available. See	-0.124939
-1.539128	expressions. See	-0.124939
-1.537028	storage. See	-0.124939
-0.803560	system. See	-0.425969
-1.487976	branch. See	-0.124939
-1.434214	optimizations. See	-0.124939
-1.432094	language. See	-0.124939
-1.367267	this. See	-0.124939
-1.369398	overlap. See	-0.124939
-1.290217	handling. See	-0.124939
-1.290217	files. See	-0.124939
-1.290217	so. See	-0.124939
-1.292358	manuals. See	-0.124939
-0.503838	pool. See	-0.124939
-0.503838	dispatcher. See	-0.425969
-1.195448	cached. See	-0.124939
-1.195448	aliasing. See	-0.124939
-1.072661	compact. See	-0.124939
-1.072661	errors. See	-0.124939
-1.072661	takes. See	-0.124939
-1.074824	exceptions. See	-0.124939
-1.072661	occur. See	-0.124939
-0.379336	other. See	-0.425969
-0.898732	aligned. See	-0.124939
-0.898732	required. See	-0.124939
-0.898732	mechanism. See	-0.124939
-0.898732	issue. See	-0.124939
-0.898732	delay. See	-0.124939
-0.898732	containers. See	-0.124939
-0.898732	alignment. See	-0.124939
-0.898732	contentions. See	-0.124939
-0.898732	crash. See	-0.124939
-0.599876	incremented. See	-0.124939
-0.599876	die. See	-0.124939
-0.599876	doing. See	-0.124939
-0.599876	requested. See	-0.124939
-0.599876	Intel. See	-0.124939
-0.599876	OS. See	-0.124939
-0.599876	__declspec(cpu_dispatch(...)). See	-0.124939
-0.599876	motion. See	-0.124939
-0.599876	obvious. See	-0.124939
-0.599876	(RTTI). See	-0.124939
-0.599876	0x8040); See	-0.124939
-2.015217	the critical	-0.639849
-3.648467	is critical	-0.124939
-2.603434	a critical	-0.221849
-3.560363	in critical	-0.124939
-2.567266	The critical	-0.602060
-3.370726	for critical	-0.124939
-2.714594	are critical	-0.124939
-3.160203	or critical	-0.124939
-2.865762	A critical	-0.124939
-2.780911	same critical	-0.124939
-1.092356	most critical	-0.564271
-2.430036	less critical	-0.124939
-2.279069	making critical	-0.124939
-1.741258	particularly critical	-0.124939
-0.504801	Making critical	-0.425969
-0.379978	Call critical	-0.425969
-0.601478	activates critical	-0.124939
-2.813121	the call	-0.301030
-3.072732	a call	-0.425969
-3.534530	of call	-0.124939
-2.118448	to call	-0.761761
-2.893836	and call	-0.124939
-2.397752	can call	-0.124939
-2.607309	// call	-0.124939
-1.601561	function call	-0.204120
-3.006765	not call	-0.124939
-2.989581	may call	-0.124939
-2.216504	A call	-0.124939
-2.861951	will call	-0.124939
-1.798917	then call	-0.425969
-1.808561	one call	-0.602060
-2.664474	each call	-0.124939
-2.506030	any call	-0.124939
-1.202394	first call	-0.221849
-2.310171	system call	-0.124939
-1.542479	doesn't call	-0.425969
-1.521952	single call	-0.425969
-1.405797	every call	-0.124939
-1.844460	modules call	-0.124939
-1.501896	Virtual call	-0.124939
-1.442286	Now call	-0.124939
-0.918916	= 0;	-0.726999
-1.565676	return 0;	-0.301030
-1.941609	> 0;	-0.124939
-1.775815	>= 0;	-0.124939
-1.379671	!= 0;	-0.124939
-3.582189	the 8	-0.124939
-3.449263	is 8	-0.124939
-2.659248	of 8	-0.124939
-2.838068	and 8	-0.124939
-3.167531	be 8	-0.124939
-2.586037	// 8	-0.425969
-3.057405	= 8	-0.124939
-2.489793	by 8	-0.124939
-3.037508	- 8	-0.124939
-2.355226	int 8	-0.425969
-2.592390	page 8	-0.124939
-2.600013	double 8	-0.124939
-2.544899	float 8	-0.124939
-2.495051	* 8	-0.124939
-2.509263	takes 8	-0.124939
-2.447583	between 8	-0.124939
-1.122243	8 8	-0.124939
-2.409430	unsigned 8	-0.124939
-1.720452	64 8	-0.425969
-1.224783	16 8	-0.124939
-1.109891	32 8	-0.346788
-2.259116	constant 8	-0.124939
-2.128348	/ 8	-0.124939
-2.098436	structure 8	-0.124939
-2.081045	mode 8	-0.124939
-0.732255	char 8	-0.124939
-1.945716	last 8	-0.124939
-1.736978	== 8	-0.124939
-1.595020	Store 8	-0.124939
-0.855350	65 8	-0.425969
-0.390988	unlimited 8	-0.726999
-1.495326	lower 8	-0.124939
-1.438724	eax, 8	-0.124939
-1.438724	hold 8	-0.124939
-1.373171	Agner 8	-0.124939
-1.198479	1: 8	-0.124939
-0.600643	............................................................................... 8	-0.124939
-0.600643	Vec16c 8	-0.124939
-0.600643	Vec2uq 8	-0.124939
-0.600643	Is8vec8 8	-0.124939
-0.600643	kb, 8	-0.124939
-0.600643	I64vec1 8	-0.124939
-2.051610	is less	-0.510290
-3.258142	and less	-0.124939
-3.249197	for less	-0.124939
-2.212788	be less	-0.522879
-1.972671	are less	-0.249877
-3.048687	or less	-0.124939
-3.173826	it less	-0.124939
-3.044896	code less	-0.124939
-2.960965	not less	-0.124939
-2.896692	have less	-0.124939
-2.812444	at less	-0.124939
-2.780026	program less	-0.124939
-2.702002	other less	-0.124939
-2.722378	but less	-0.124939
-2.535941	also less	-0.124939
-2.389581	register less	-0.124939
-2.348237	pointers less	-0.124939
-2.263598	times less	-0.124939
-2.228463	while less	-0.124939
-1.521625	much less	-0.124939
-2.156526	works less	-0.124939
-2.042507	write less	-0.124939
-2.033586	was less	-0.124939
-0.694194	caching less	-0.903090
-1.996210	allows less	-0.124939
-1.955571	difference less	-0.124939
-1.791042	become less	-0.124939
-1.765308	produce less	-0.124939
-1.761870	vectorization less	-0.124939
-1.671863	Optimizing less	-0.124939
-1.673024	though less	-0.124939
-1.591524	input less	-0.124939
-0.601665	slightly less	-0.124939
-0.900726	somewhat less	-0.124939
-0.600877	neverthe- less	-0.124939
-3.060761	// For	-0.124939
-1.445464	code. For	-0.301030
-1.688727	time. For	-0.124939
-2.155453	etc. For	-0.124939
-2.041122	used. For	-0.124939
-2.046622	files For	-0.124939
-1.782947	cases. For	-0.124939
-1.044078	resources. For	-0.124939
-1.698463	variable. For	-0.124939
-1.692770	optimization. For	-0.124939
-1.625912	vector. For	-0.124939
-0.899525	dispatching. For	-0.425969
-0.900710	version. For	-0.124939
-0.854952	to. For	-0.425969
-0.854162	expressions. For	-0.425969
-1.540676	overflow. For	-0.124939
-1.491447	units. For	-0.124939
-1.489524	software. For	-0.124939
-1.437326	automatically. For	-0.124939
-1.435386	core. For	-0.124939
-1.370379	structure. For	-0.124939
-1.291198	profiler. For	-0.124939
-0.600472	operation. For	-0.124939
-1.293147	factor. For	-0.124939
-1.196237	are. For	-0.124939
-1.196237	conditions. For	-0.124939
-1.196237	efficiency. For	-0.124939
-1.196237	tasks. For	-0.124939
-1.196237	structures. For	-0.124939
-0.503958	constants. For	-0.425969
-1.073255	systems". For	-0.124939
-1.073255	directives. For	-0.124939
-1.073255	sizes. For	-0.124939
-1.073255	lookup. For	-0.124939
-0.379416	valid. For	-0.124939
-1.073255	post-increment. For	-0.124939
-0.899130	frequency. For	-0.124939
-0.899130	jobs. For	-0.124939
-0.899130	subexpression. For	-0.124939
-0.899130	supported. For	-0.124939
-0.899130	question. For	-0.124939
-0.899130	development. For	-0.124939
-0.899130	purity. For	-0.124939
-0.899130	sources. For	-0.124939
-0.899130	enough. For	-0.124939
-0.899130	fraction. For	-0.124939
-0.899130	predictable. For	-0.124939
-0.899130	unit. For	-0.124939
-0.899130	matrix. For	-0.124939
-0.600076	algebra. For	-0.124939
-0.600076	exits. For	-0.124939
-0.600076	combined. For	-0.124939
-0.600076	modularity. For	-0.124939
-0.600076	identical. For	-0.124939
-0.600076	reduction. For	-0.124939
-0.600076	modulo. For	-0.124939
-0.600076	completely. For	-0.124939
-0.600076	reporting. For	-0.124939
-0.600076	minimized. For	-0.124939
-2.206295	for example,	-0.425969
-1.676113	this example,	-0.301030
-0.170678	For example,	-0.505150
-2.334634	following example,	-0.124939
-0.950191	above example,	-0.425969
-1.745390	preceding example,	-0.124939
-0.249815	Same example,	-0.301030
-0.902529	contrived example,	-0.124939
-3.363740	the bit	-0.425969
-2.922098	this bit	-0.124939
-1.598556	each bit	-0.249877
-2.532653	many bit	-0.124939
-2.420678	8 bit	-0.124939
-0.774452	64 bit	-0.204120
-2.319813	16 bit	-0.124939
-0.823392	32 bit	-0.234083
-2.207259	single bit	-0.124939
-2.185416	small bit	-0.124939
-0.926991	128 bit	-0.346788
-0.280403	sign bit	-0.452298
-1.298444	256 bit	-0.124939
-1.077818	slow bit	-0.425969
-1.595287	significant bit	-0.124939
-0.492688	carry bit	-0.124939
-0.425817	32- bit	-0.124939
-1.077138	(128 bit	-0.124939
-0.901727	comparison, bit	-0.124939
-0.601378	128- bit	-0.124939
-2.209732	the operating	-0.884607
-3.042065	of operating	-0.124939
-2.055833	and operating	-0.388180
-2.554798	The operating	-0.602060
-2.370171	an operating	-0.124939
-2.763614	different operating	-0.124939
-2.671724	no operating	-0.124939
-2.570478	multiple operating	-0.124939
-2.559847	two operating	-0.124939
-1.647779	64-bit operating	-0.301030
-2.492139	some operating	-0.124939
-1.789898	32-bit operating	-0.425969
-2.335167	these operating	-0.124939
-2.230325	Windows operating	-0.124939
-2.173404	Linux operating	-0.124939
-2.075457	certain operating	-0.124939
-2.018107	Mac operating	-0.124939
-1.179101	old operating	-0.425969
-1.592985	compiler, operating	-0.124939
-1.593884	current operating	-0.124939
-0.804898	X operating	-0.124939
-1.296469	languages, operating	-0.124939
-0.504783	protected operating	-0.124939
-1.076439	contemporary operating	-0.124939
-0.901260	Older operating	-0.124939
-0.901260	Supported operating	-0.124939
-0.601144	circumvent operating	-0.124939
-3.574666	the unsigned	-0.124939
-3.442918	is unsigned	-0.124939
-3.394545	of unsigned	-0.124939
-2.653042	to unsigned	-0.124939
-2.319720	and unsigned	-0.425969
-3.202098	The unsigned	-0.124939
-2.158872	or unsigned	-0.124939
-2.124901	if unsigned	-0.249877
-2.459368	with unsigned	-0.425969
-1.963458	an unsigned	-0.249877
-2.354464	int unsigned	-0.124939
-2.049489	{ unsigned	-0.602060
-2.861455	use unsigned	-0.124939
-1.492507	4 unsigned	-0.124939
-1.736717	8 unsigned	-0.124939
-2.295636	16 unsigned	-0.124939
-1.634033	part unsigned	-0.425969
-2.127406	/ unsigned	-0.124939
-1.973236	256 unsigned	-0.124939
-1.735304	d; unsigned	-0.124939
-1.671188	x, unsigned	-0.124939
-1.669770	convert unsigned	-0.124939
-1.636248	f; unsigned	-0.124939
-0.679791	systems: unsigned	-0.124939
-1.371580	normal unsigned	-0.124939
-1.295257	versus unsigned	-0.124939
-1.295257	1000; unsigned	-0.124939
-1.074844	Linux: unsigned	-0.124939
-1.076285	7.7 unsigned	-0.124939
-1.076285	7.25 unsigned	-0.124939
-0.900194	compiler: unsigned	-0.124939
-0.900194	142 unsigned	-0.124939
-0.600609	Vec32c unsigned	-0.124939
-0.600609	a[size]; unsigned	-0.124939
-0.600609	T, unsigned	-0.124939
-0.600609	0x3FF unsigned	-0.124939
-0.600609	0x3FFF unsigned	-0.124939
-0.600609	14.22b unsigned	-0.124939
-0.600609	14.22a unsigned	-0.124939
-0.600609	uint16_t unsigned	-0.124939
-0.600609	uint8_t unsigned	-0.124939
-0.600609	0x7F unsigned	-0.124939
-0.600609	uint32_t unsigned	-0.124939
-2.054189	the first	-0.281725
-3.074511	to first	-0.124939
-1.887041	The first	-0.301030
-3.322421	are first	-0.124939
-3.181928	or first	-0.124939
-2.217257	on first	-0.301030
-2.943617	this first	-0.124939
-2.791938	only first	-0.124939
-1.898300	shows first	-0.124939
-1.773361	comes first	-0.124939
-0.253506	bytes. first	-0.726999
-1.077739	49 first	-0.124939
-0.380019	After first	-0.425969
-0.601579	reflected, first	-0.124939
-2.889773	the register	-0.221849
-2.226437	a register	-0.212089
-2.566449	of register	-0.124939
-2.552347	The register	-0.301030
-2.543299	for register	-0.124939
-2.822873	A register	-0.124939
-2.819208	from register	-0.124939
-1.461367	vector register	-0.124939
-2.796806	make register	-0.124939
-1.601161	same register	-0.346788
-1.483399	point register	-0.301030
-2.702900	one register	-0.124939
-2.678597	integer register	-0.124939
-2.566340	float register	-0.124939
-2.421713	called register	-0.124939
-1.667108	new register	-0.425969
-2.281782	available register	-0.124939
-2.251984	about register	-0.124939
-1.569906	extra register	-0.124939
-2.199521	single register	-0.124939
-2.045210	optimize register	-0.124939
-1.894323	XMM register	-0.124939
-1.673439	logical register	-0.124939
-1.497348	temporary register	-0.124939
-1.500257	YMM register	-0.124939
-1.441294	free register	-0.124939
-1.297112	fourteen register	-0.124939
-1.297112	physical register	-0.124939
-1.076239	flags register	-0.124939
-3.416895	a 64	-0.124939
-2.799254	of 64	-0.301030
-3.419559	to 64	-0.124939
-2.840985	and 64	-0.425969
-2.337249	in 64	-0.903090
-2.755724	The 64	-0.124939
-3.172625	be 64	-0.124939
-3.155480	are 64	-0.124939
-3.062108	= 64	-0.124939
-3.022050	with 64	-0.124939
-3.027099	code 64	-0.124939
-3.039191	- 64	-0.124939
-1.960919	int 64	-0.124939
-2.865952	use 64	-0.124939
-1.720556	double 64	-0.124939
-2.443291	2 64	-0.124939
-1.382663	long 64	-0.124939
-1.492807	4 64	-0.124939
-1.486799	8 64	-0.301030
-1.310899	64 64	-0.249877
-2.309086	32 64	-0.124939
-2.256431	Gnu 64	-0.124939
-2.127601	uses 64	-0.124939
-2.076448	1 64	-0.124939
-1.328671	typically 64	-0.124939
-1.942160	efficient. 64	-0.124939
-1.261229	char 64	-0.124939
-1.629810	entire 64	-0.124939
-1.437563	int64_t 64	-0.124939
-1.295521	_WIN64 64	-0.124939
-1.198611	exception. 64	-0.124939
-1.199982	"Intel 64	-0.124939
-0.900327	exceeds 64	-0.124939
-0.900327	__int64 64	-0.124939
-0.600676	-fno-pic). 64	-0.124939
-0.600676	Iu32vec2 64	-0.124939
-0.600676	covers 64	-0.124939
-0.600676	11.6 64	-0.124939
-0.600676	9.6b 64	-0.124939
-0.600676	Vec2q 64	-0.124939
-0.600676	Vec4ui 64	-0.124939
-2.152706	to take	-0.204120
-3.298177	and take	-0.124939
-2.362071	that take	-0.249877
-1.635966	can take	-0.492916
-2.981187	not take	-0.124939
-1.728964	may take	-0.204120
-2.986483	you take	-0.124939
-1.805796	will take	-0.124939
-1.866062	functions take	-0.124939
-2.709906	should take	-0.124939
-2.622400	double take	-0.124939
-2.624783	b take	-0.124939
-2.548674	C++ take	-0.124939
-2.376234	often take	-0.124939
-2.304003	operations take	-0.124939
-2.295269	cases take	-0.124939
-2.225771	calculations take	-0.124939
-2.220791	doesn't take	-0.124939
-2.103611	would take	-0.124939
-2.012564	typically take	-0.124939
-1.968254	division take	-0.124939
-1.955723	We take	-0.124939
-1.862874	usually take	-0.124939
-1.794138	conversions take	-0.124939
-1.738130	sometimes take	-0.124939
-1.737137	still take	-0.124939
-1.201078	Let's take	-0.124939
-1.076139	logarithms take	-0.124939
-0.901060	Divisions take	-0.124939
-0.901060	precisions take	-0.124939
-1.981835	is often	-0.279841
-2.893836	and often	-0.124939
-1.828440	are often	-0.212089
-2.035737	can often	-0.301030
-2.570489	it often	-0.124939
-3.079548	code often	-0.124939
-3.001812	compiler often	-0.124939
-1.964455	will often	-0.124939
-2.778750	functions often	-0.124939
-1.588729	most often	-0.425969
-2.645823	size often	-0.124939
-2.469124	very often	-0.124939
-2.392208	how often	-0.124939
-2.345233	systems often	-0.124939
-2.092755	hardware often	-0.124939
-1.376147	quite often	-0.124939
-1.987921	conversion often	-0.124939
-1.672350	disk often	-0.124939
-1.593970	statements often	-0.124939
-1.499471	however, often	-0.124939
-0.901460	mechanisms often	-0.124939
-0.601244	companies often	-0.124939
-0.601244	hackers often	-0.124939
-0.601244	Keep often	-0.124939
-3.321840	a rather	-0.124939
-2.425102	code rather	-0.425969
-2.250676	time rather	-0.425969
-2.831217	use rather	-0.124939
-2.161981	memory rather	-0.425969
-2.623501	integer rather	-0.124939
-2.521406	float rather	-0.124939
-2.534429	object rather	-0.124939
-2.496510	array rather	-0.124939
-1.733079	8 rather	-0.425969
-2.363789	register rather	-0.124939
-2.345570	template rather	-0.124939
-1.146272	registers rather	-0.823909
-2.329481	pointers rather	-0.124939
-2.296111	access rather	-0.124939
-2.270013	system rather	-0.124939
-2.276937	bits rather	-0.124939
-2.275968	0 rather	-0.124939
-2.252113	instructions rather	-0.124939
-2.271038	processors rather	-0.124939
-2.246686	times rather	-0.124939
-2.234284	stack rather	-0.124939
-2.214925	calls rather	-0.124939
-2.116418	container rather	-0.124939
-2.072149	mode rather	-0.124939
-2.042351	cycles rather	-0.124939
-1.987323	sets rather	-0.124939
-1.990928	implementation rather	-0.124939
-1.957979	parameter rather	-0.124939
-1.909422	expressions rather	-0.124939
-1.899987	linking rather	-0.124939
-1.810038	references rather	-0.124939
-1.811867	loaded rather	-0.124939
-1.753859	operation rather	-0.124939
-1.691711	100 rather	-0.124939
-1.664280	models rather	-0.124939
-1.598233	beginning rather	-0.124939
-1.543054	blocks rather	-0.124939
-1.495645	memcpy rather	-0.124939
-1.433909	factor rather	-0.124939
-1.433909	units rather	-0.124939
-1.370707	step rather	-0.124939
-1.198392	advance rather	-0.124939
-1.073454	zero, rather	-0.124939
-1.073454	xxn rather	-0.124939
-1.073454	frameworks, rather	-0.124939
-0.899263	-56 rather	-0.124939
-0.899263	once, rather	-0.124939
-0.600142	!b) rather	-0.124939
-0.600142	connections rather	-0.124939
-0.600142	(b*2.0)/3.0 rather	-0.124939
-0.600142	unions rather	-0.124939
-0.600142	X?" rather	-0.124939
-0.600142	matters rather	-0.124939
-0.600142	supports, rather	-0.124939
-0.600142	tools, rather	-0.124939
-0.600142	at, rather	-0.124939
-2.737015	the optimization	-0.124939
-2.674577	of optimization	-0.249877
-2.338379	to optimization	-0.425969
-2.210172	on optimization	-0.602060
-3.063454	code optimization	-0.124939
-2.988991	compiler optimization	-0.124939
-2.277100	this optimization	-0.124939
-1.636174	program optimization	-0.124939
-2.516919	many optimization	-0.124939
-1.795872	software optimization	-0.124939
-1.129606	An optimization	-0.726999
-2.209740	best optimization	-0.124939
-2.202384	specific optimization	-0.124939
-2.184872	good optimization	-0.124939
-1.280164	various optimization	-0.124939
-1.906918	Many optimization	-0.124939
-1.202156	your optimization	-0.425969
-0.718036	relevant optimization	-0.726999
-1.130792	my optimization	-0.124939
-1.636618	insert optimization	-0.124939
-0.901714	Compiler optimization	-0.425969
-1.200202	detailed optimization	-0.124939
-1.201178	interprocedural optimization	-0.124939
-1.076239	81 optimization	-0.124939
-1.076239	Generate optimization	-0.124939
-0.203923	Specific optimization	-0.425969
-0.601077	Profile-guided optimization	-0.124939
-0.601077	Interprocedural optimization	-0.124939
-0.601077	strongest optimization	-0.124939
-3.704019	the libraries	-0.124939
-3.292779	The libraries	-0.124939
-3.088724	or libraries	-0.124939
-1.287348	function libraries	-0.329059
-2.798577	vector libraries	-0.124939
-2.760835	different libraries	-0.124939
-2.722450	other libraries	-0.124939
-2.707406	all libraries	-0.124939
-2.010187	class libraries	-0.124939
-2.659905	most libraries	-0.124939
-1.973613	Intel libraries	-0.124939
-2.557962	two libraries	-0.124939
-2.569863	static libraries	-0.124939
-2.333849	these libraries	-0.124939
-0.871630	dynamic libraries	-0.182931
-2.264694	Gnu libraries	-0.124939
-2.235418	large libraries	-0.124939
-1.444316	Function libraries	-0.425969
-1.404078	standard libraries	-0.124939
-1.947922	runtime libraries	-0.124939
-1.907498	Many libraries	-0.124939
-1.913020	linked libraries	-0.124939
-0.925516	link libraries	-0.301030
-1.774366	Dynamic libraries	-0.124939
-1.443372	purpose libraries	-0.124939
-1.076339	Two libraries	-0.124939
-1.076339	well-tested libraries	-0.124939
-1.076339	LIBM libraries	-0.124939
-3.541476	is how	-0.124939
-2.346026	of how	-0.726999
-2.524414	and how	-0.249877
-2.121964	for how	-0.970037
-2.054727	on how	-0.124939
-1.325465	about how	-0.124939
-2.104159	calculate how	-0.124939
-1.377269	count how	-0.425969
-2.003471	see how	-0.124939
-0.440353	shows how	-0.550907
-0.766190	know how	-0.249877
-1.767459	checking how	-0.124939
-1.742324	tell how	-0.124939
-1.713061	Note how	-0.124939
-0.943500	discussed how	-0.425969
-1.633722	e.g. how	-0.124939
-1.593293	counts how	-0.124939
-0.747009	measure how	-0.124939
-1.375319	show how	-0.124939
-1.297112	specifies how	-0.124939
-1.296137	programming, how	-0.124939
-1.296137	understand how	-0.124939
-1.201178	explain how	-0.124939
-1.202156	idea how	-0.124939
-1.200202	illustrates how	-0.124939
-1.076239	decide how	-0.124939
-0.203923	discusses how	-0.425969
-0.901126	doubt how	-0.124939
-0.601077	describes how	-0.124939
-2.395357	the code.	-0.187087
-2.669803	of code.	-0.124939
-3.005458	as code.	-0.124939
-2.878837	this code.	-0.124939
-2.783559	instruction code.	-0.124939
-1.846034	point code.	-0.301030
-2.670285	integer code.	-0.124939
-2.560786	64-bit code.	-0.124939
-2.544121	C++ code.	-0.124939
-2.440401	makes code.	-0.124939
-2.430453	critical code.	-0.124939
-1.383117	system code.	-0.124939
-2.259948	error code.	-0.124939
-1.316574	extra code.	-0.124939
-1.561841	assembly code.	-0.124939
-1.288916	compiled code.	-0.124939
-1.377314	intermediate code.	-0.124939
-2.011620	application code.	-0.124939
-1.986787	mathematical code.	-0.124939
-1.026293	source code.	-0.124939
-1.078288	position-independent code.	-0.124939
-1.739335	vectorized code.	-0.124939
-1.636796	executable code.	-0.124939
-1.633505	simplest code.	-0.124939
-1.548546	independent code.	-0.124939
-1.295476	Fortran code.	-0.124939
-1.297689	Position-independent code.	-0.124939
-1.295476	CPU-intensive code.	-0.124939
-1.200779	suboptimal code.	-0.124939
-0.504480	application-specific code.	-0.124939
-0.900860	non-AVX code.	-0.124939
-0.900860	time-critical code.	-0.124939
-0.600943	precompiled code.	-0.124939
-3.138286	the time.	-0.124939
-2.704100	a time.	-0.124939
-2.825627	of time.	-0.124939
-2.889066	more time.	-0.124939
-1.601970	same time.	-0.221849
-2.106855	CPU time.	-0.124939
-2.662485	each time.	-0.124939
-2.664540	most time.	-0.124939
-2.527525	takes time.	-0.124939
-2.475709	long time.	-0.124939
-2.404996	first time.	-0.124939
-1.042361	extra time.	-0.124939
-1.298957	execution time.	-0.124939
-0.429375	compile time.	-0.215115
-2.157694	calculation time.	-0.124939
-2.105536	run time.	-0.124939
-1.984962	development time.	-0.124939
-1.952875	last time.	-0.124939
-0.829951	longer time.	-0.425969
-1.223274	load time.	-0.124939
-1.797761	installation time.	-0.124939
-1.736447	save time.	-0.124939
-1.707585	total time.	-0.124939
-0.504640	user's time.	-0.124939
-0.901393	computation time.	-0.124939
-2.573471	the template	-0.425969
-3.503658	is template	-0.124939
-2.267301	a template	-0.425969
-3.455285	of template	-0.124939
-3.254207	The template	-0.124939
-3.125164	function template	-0.124939
-3.075098	by template	-0.124939
-2.468270	with template	-0.124939
-2.185668	as template	-0.124939
-2.867007	more template	-0.124939
-1.697214	A template	-0.124939
-2.646757	class template	-0.124939
-2.629915	using template	-0.124939
-1.887136	C++ template	-0.124939
-2.530949	where template	-0.124939
-2.454748	2 template	-0.124939
-2.367095	template template	-0.124939
-2.210144	Use template	-0.124939
-2.129190	Function template	-0.124939
-1.403203	standard template	-0.425969
-2.047520	above template	-0.124939
-1.993501	complicated template	-0.124939
-1.765666	checking template	-0.124939
-1.672125	N template	-0.124939
-1.441438	2: template	-0.124939
-0.504460	powN template	-0.425969
-0.900793	partial template	-0.124939
-0.203890	Full template	-0.425969
-0.900793	parameter: template	-0.124939
-0.900793	m;} template	-0.124939
-0.600910	(partial) template	-0.124939
-0.600910	convoluted template	-0.124939
-0.600910	non-recursing template	-0.124939
-0.600910	Partial template	-0.124939
-3.831531	the registers	-0.124939
-2.689223	of registers	-0.249877
-1.947224	in registers	-0.271067
-1.398290	vector registers	-0.124939
-2.818234	because registers	-0.124939
-1.691819	point registers	-0.124939
-1.786136	integer registers	-0.124939
-2.662690	using registers	-0.124939
-2.289541	available registers	-0.124939
-1.327316	stack registers	-0.602060
-2.207628	These registers	-0.124939
-0.334747	XMM registers	-0.425969
-1.599738	enough registers	-0.124939
-1.552247	256-bit registers	-0.124939
-0.550584	YMM registers	-0.124939
-1.201798	saving registers	-0.124939
-0.124874	ZMM registers	-0.124939
-3.008474	the need	-0.425969
-3.342588	The need	-0.124939
-2.252840	that need	-0.346788
-1.891973	not need	-0.124939
-1.654953	may need	-0.271067
-1.805505	you need	-0.522879
-2.777951	only need	-0.124939
-1.268246	no need	-0.726999
-2.673913	class need	-0.124939
-2.676114	compilers need	-0.124939
-1.430676	we need	-0.249877
-2.504147	You need	-0.124939
-2.383196	libraries need	-0.124939
-2.348545	systems need	-0.124939
-1.014012	doesn't need	-0.221849
-2.218673	threads need	-0.124939
-2.215243	language need	-0.124939
-2.176269	therefore need	-0.124939
-2.067175	files need	-0.124939
-0.768920	don't need	-0.124939
-1.949697	applications need	-0.124939
-3.647578	the pointers	-0.124939
-2.561795	of pointers	-0.346788
-2.511971	that pointers	-0.124939
-3.054185	or pointers	-0.124939
-2.286141	function pointers	-0.301030
-3.150958	if pointers	-0.124939
-3.075098	by pointers	-0.124939
-3.047838	with pointers	-0.124939
-2.424947	as pointers	-0.124939
-2.108247	than pointers	-0.124939
-2.882068	use pointers	-0.124939
-2.787296	make pointers	-0.124939
-2.692825	all pointers	-0.124939
-1.985283	using pointers	-0.124939
-2.559815	multiple pointers	-0.124939
-1.652542	two pointers	-0.301030
-1.262556	member pointers	-0.124939
-2.229449	while pointers	-0.124939
-1.110393	through pointers	-0.249877
-2.132654	uses pointers	-0.124939
-1.443717	Function pointers	-0.124939
-2.064469	All pointers	-0.124939
-1.984171	Using pointers	-0.124939
-1.869448	link pointers	-0.124939
-1.667638	Any pointers	-0.124939
-0.943562	smart pointers	-0.124939
-1.635466	includes pointers	-0.124939
-1.634337	keep pointers	-0.124939
-1.592944	invalid pointers	-0.124939
-1.595205	setting pointers	-0.124939
-1.546061	contain pointers	-0.124939
-0.090088	Smart pointers	-0.124939
-0.900793	Member pointers	-0.124939
-0.600910	initializing pointers	-0.124939
-2.996491	the test	-0.249877
-2.701534	a test	-0.124939
-2.676984	of test	-0.425969
-2.053034	to test	-0.221849
-3.324098	and test	-0.124939
-3.506362	in test	-0.124939
-2.554798	The test	-0.124939
-2.776090	for test	-0.425969
-3.266596	can test	-0.124939
-2.603690	// test	-0.124939
-3.073058	on test	-0.124939
-2.829734	A test	-0.124939
-2.763614	different test	-0.124939
-2.713255	should test	-0.124939
-2.658536	each test	-0.124939
-1.805022	performance test	-0.124939
-2.436180	before test	-0.124939
-1.392353	void test	-0.602060
-2.276959	simple test	-0.124939
-2.216685	speed test	-0.124939
-2.179173	small test	-0.124939
-1.130979	my test	-0.124939
-0.943201	under test	-0.124939
-0.901808	My test	-0.425969
-1.201377	dedicated test	-0.124939
-1.077351	built-in test	-0.124939
-0.601144	unit- test	-0.124939
-2.685550	the new	-0.182931
-1.827306	a new	-0.255272
-3.391375	and new	-0.124939
-1.875579	with new	-0.602060
-3.067854	This new	-0.124939
-2.007981	each new	-0.124939
-2.656723	using new	-0.124939
-2.269198	important new	-0.124939
-2.204648	These new	-0.124939
-2.142940	uses new	-0.124939
-2.117458	operators new	-0.124939
-2.090471	add new	-0.124939
-2.016685	next new	-0.124939
-1.769367	desired new	-0.124939
-1.708559	adding new	-0.124939
-1.550202	brand new	-0.124939
-1.298308	over new	-0.124939
-0.601378	advertise new	-0.124939
-0.601378	receive new	-0.124939
-0.601378	(with new	-0.124939
-3.055072	to systems	-0.425969
-3.537950	in systems	-0.124939
-2.743909	other systems	-0.124939
-2.725059	all systems	-0.124939
-1.282698	64-bit systems	-0.124939
-2.567863	such systems	-0.124939
-2.501492	some systems	-0.124939
-0.975012	32-bit systems	-0.301030
-1.736955	bit systems	-0.425969
-0.577488	operating systems	-0.324511
-1.168038	Some systems	-0.249877
-2.203659	These systems	-0.124939
-1.332026	Mac systems	-0.124939
-1.675550	16-bit systems	-0.124939
-0.680669	embedded systems	-0.124939
-1.201977	existing systems	-0.124939
-1.077038	endian systems	-0.124939
-1.077751	utilize systems	-0.124939
-1.077038	Unix-like systems	-0.124939
-0.901660	Unix systems	-0.124939
-0.601344	Web systems	-0.124939
-2.231991	the user	-0.263241
-3.088032	a user	-0.124939
-2.575907	of user	-0.346788
-3.586054	to user	-0.124939
-3.401897	and user	-0.124939
-2.564744	The user	-0.124939
-2.289842	for user	-0.522879
-2.858313	A user	-0.124939
-2.786513	different user	-0.124939
-2.547782	possible user	-0.124939
-2.476272	very user	-0.124939
-2.428108	less user	-0.124939
-2.093159	standard user	-0.124939
-0.849395	end user	-0.124939
-1.772348	including user	-0.124939
-0.182869	graphical user	-0.425969
-1.442638	storing user	-0.124939
-1.297796	popular user	-0.124939
-0.901794	Take user	-0.124939
-1.979276	of these	-0.204120
-3.502418	to these	-0.124939
-2.877613	and these	-0.124939
-3.496323	in these	-0.124939
-2.543299	for these	-0.124939
-2.362585	that these	-0.124939
-3.066279	on these	-0.124939
-2.893958	use these	-0.124939
-2.131978	because these	-0.124939
-2.067951	all these	-0.124939
-2.071759	but these	-0.124939
-2.554924	In these	-0.124939
-2.462911	between these	-0.124939
-2.435507	For these	-0.124939
-2.321687	access these	-0.124939
-1.551994	avoid these	-0.124939
-2.215112	Use these	-0.124939
-2.216280	But these	-0.124939
-0.977515	All these	-0.124939
-2.050895	However, these	-0.124939
-1.769384	Unfortunately, these	-0.124939
-1.742324	tell these	-0.124939
-1.674407	though these	-0.124939
-1.673439	convert these	-0.124939
-1.597167	swap these	-0.124939
-1.596195	setting these	-0.124939
-1.297112	overcome these	-0.124939
-1.201178	distinguish these	-0.124939
-0.601077	translate these	-0.124939
-2.423341	and they	-0.221849
-2.251642	that they	-0.221849
-3.113377	or they	-0.124939
-3.159610	function they	-0.124939
-1.679128	if they	-0.522879
-3.068855	- they	-0.124939
-2.919202	time they	-0.124939
-1.835489	when they	-0.726999
-1.438539	because they	-0.191886
-1.467536	which they	-0.602060
-2.073496	but they	-0.425969
-1.865733	where they	-0.124939
-1.856453	objects they	-0.124939
-2.438664	before they	-0.124939
-2.392208	how they	-0.124939
-2.299677	cases they	-0.124939
-1.432451	whether they	-0.124939
-2.031615	programs they	-0.124939
-2.045154	unless they	-0.124939
-1.971783	what they	-0.124939
-1.936393	after they	-0.124939
-1.822021	reductions they	-0.124939
-1.597996	whenever they	-0.124939
-0.601244	texts they	-0.124939
-2.632191	and without	-0.301030
-2.987622	or without	-0.124939
-2.775839	memory without	-0.124939
-2.765925	data without	-0.124939
-2.752896	program without	-0.124939
-2.697703	same without	-0.124939
-2.737392	functions without	-0.124939
-2.720090	loop without	-0.124939
-2.710677	used without	-0.124939
-2.642386	integer without	-0.124939
-2.642484	compilers without	-0.124939
-2.591015	double without	-0.124939
-1.901568	object without	-0.124939
-2.519245	version without	-0.124939
-2.508834	objects without	-0.124939
-2.354568	libraries without	-0.124939
-2.319970	even without	-0.124939
-2.300118	programming without	-0.124939
-2.287851	operations without	-0.124939
-2.261605	instructions without	-0.124939
-2.277521	processors without	-0.124939
-2.247081	error without	-0.124939
-2.237650	CPUs without	-0.124939
-2.210098	calculations without	-0.124939
-2.230596	versions without	-0.124939
-1.013294	compiled without	-0.221849
-1.012736	bytes without	-0.823909
-2.203420	speed without	-0.124939
-2.179444	exception without	-0.124939
-2.152262	precision without	-0.124939
-2.121452	container without	-0.124939
-1.933155	applications without	-0.124939
-1.905642	microprocessors without	-0.124939
-1.792127	errors without	-0.124939
-1.698889	copying without	-0.124939
-1.668728	changed without	-0.124939
-1.671823	compiling without	-0.124939
-1.632484	directly without	-0.124939
-1.591091	F1 without	-0.124939
-1.372348	type-casting without	-0.124939
-1.294728	int, without	-0.124939
-1.197818	functionality without	-0.124939
-1.197818	unit-test without	-0.124939
-1.074446	probably without	-0.124939
-0.899928	question without	-0.124939
-0.600476	freely without	-0.124939
-0.600476	(Compile without	-0.124939
-2.178861	is useful	-0.335792
-3.107133	a useful	-0.124939
-1.598440	be useful	-0.704722
-1.980148	are useful	-0.726999
-2.920337	more useful	-0.124939
-2.683589	most useful	-0.124939
-1.640673	also useful	-0.124939
-2.545298	many useful	-0.124939
-1.273490	very useful	-0.221849
-2.433917	less useful	-0.124939
-2.401035	often useful	-0.124939
-1.742843	particularly useful	-0.124939
-1.552288	contain useful	-0.124939
-3.525369	the even	-0.124939
-3.400928	is even	-0.124939
-3.366377	to even	-0.124939
-3.156330	and even	-0.124939
-3.168784	for even	-0.124939
-3.128792	be even	-0.124939
-3.199742	can even	-0.124939
-2.153970	or even	-0.249877
-2.375643	not even	-0.124939
-2.924404	an even	-0.124939
-2.085123	may even	-0.124939
-2.857788	have even	-0.124939
-2.769429	memory even	-0.124939
-2.505727	objects even	-0.124939
-2.468030	variable even	-0.124939
-2.453688	performance even	-0.124939
-2.280890	cases even	-0.124939
-2.218099	arrays even	-0.124939
-2.229080	versions even	-0.124939
-1.537856	An even	-0.124939
-2.146222	works even	-0.124939
-2.046680	cycles even	-0.124939
-2.026214	cases, even	-0.124939
-1.986101	handling even	-0.124939
-1.975399	don't even	-0.124939
-1.931287	applications even	-0.124939
-1.937595	mechanism even	-0.124939
-1.942388	needed even	-0.124939
-1.914271	Intel, even	-0.124939
-1.733261	temp even	-0.124939
-1.728397	inlined even	-0.124939
-1.588766	used, even	-0.124939
-1.546294	mispredicted even	-0.124939
-1.438802	called, even	-0.124939
-1.292674	executed even	-0.124939
-1.294332	returns even	-0.124939
-1.199087	up, even	-0.124939
-1.197422	resources, even	-0.124939
-1.197422	default, even	-0.124939
-1.074148	execution, even	-0.124939
-1.074148	occurs, even	-0.124939
-1.074148	space, even	-0.124939
-1.074148	11.3 even	-0.124939
-0.899728	expressions, even	-0.124939
-0.899728	time-consumer even	-0.124939
-0.899728	times, even	-0.124939
-0.600376	b)) even	-0.124939
-0.600376	nine, even	-0.124939
-0.600376	overflows, even	-0.124939
-0.600376	handler, even	-0.124939
-3.140803	is sure	-0.425969
-3.429004	for sure	-0.124939
-2.744316	be sure	-0.425969
-1.981178	are sure	-0.425969
-3.073053	not sure	-0.124939
-0.835925	make sure	-0.711204
-0.946627	makes sure	-0.778151
-1.337126	making sure	-0.602060
-1.264054	Make sure	-0.124939
-0.601712	Be sure	-0.124939
-3.146555	the method	-0.124939
-2.072514	The method	-0.249877
-3.126248	or method	-0.124939
-1.313120	This method	-0.577236
-1.534383	this method	-0.249877
-2.847374	A method	-0.124939
-2.811260	vector method	-0.124939
-2.765889	same method	-0.124939
-1.832522	which method	-0.301030
-2.509903	variable method	-0.124939
-2.423810	call method	-0.124939
-2.267838	important method	-0.124939
-1.893453	calling method	-0.124939
-1.595365	similar method	-0.124939
-1.596104	newer method	-0.124939
-1.551828	Optimization method	-0.124939
-1.499194	general method	-0.124939
-1.500676	preferred method	-0.124939
-1.201132	C-style method	-0.124939
-1.201132	original method	-0.124939
-1.076938	Which method	-0.124939
-1.076938	unfortunate method	-0.124939
-2.397792	is always	-0.124939
-2.823208	to always	-0.124939
-2.897155	and always	-0.124939
-3.354173	that always	-0.124939
-2.203105	are always	-0.124939
-3.206588	// always	-0.124939
-3.119765	or always	-0.124939
-1.402851	not always	-0.124939
-1.964766	will always	-0.124939
-1.901669	vector always	-0.124939
-2.702164	cache always	-0.124939
-2.040503	should always	-0.124939
-2.508336	variable always	-0.124939
-2.445485	cannot always	-0.124939
-1.671132	they always	-0.124939
-2.277408	constant always	-0.124939
-2.262656	stack always	-0.124939
-2.248677	must always	-0.124939
-1.796117	statement always	-0.124939
-1.800733	p always	-0.124939
-1.500477	almost always	-0.124939
-1.377090	am always	-0.124939
-0.601278	therefore, always	-0.124939
-3.320119	the access	-0.124939
-2.192128	to access	-0.388180
-3.279537	The access	-0.124939
-2.752745	that access	-0.124939
-3.257269	can access	-0.124939
-2.077917	you access	-0.124939
-2.910475	have access	-0.124939
-1.565824	memory access	-0.204120
-2.803055	data access	-0.124939
-2.754459	CPU access	-0.124939
-2.716509	other access	-0.124939
-2.689282	cache access	-0.124939
-2.535375	possible access	-0.124939
-1.761154	cannot access	-0.425969
-2.345316	user access	-0.124939
-1.225515	file access	-0.124939
-2.135568	get access	-0.124939
-2.048475	fast access	-0.124939
-1.798115	gives access	-0.124939
-0.822780	network access	-0.124939
-1.498085	Memory access	-0.124939
-1.499087	non-sequential access	-0.124939
-1.374149	giving access	-0.124939
-1.200069	regular access	-0.124939
-1.076139	File access	-0.124939
-0.901060	(4) access	-0.124939
-0.203917	Network access	-0.124939
-0.901060	direct access	-0.124939
-0.901060	exclusive access	-0.124939
-0.601044	forward access	-0.124939
-1.798892	} void	-0.124939
-2.531542	version void	-0.124939
-2.456668	branch void	-0.124939
-0.993846	virtual void	-0.346788
-2.194208	thread void	-0.124939
-1.488662	matrix void	-0.425969
-2.160952	classes void	-0.124939
-0.926899	}; void	-0.221849
-2.125880	problem void	-0.124939
-0.797533	inline void	-0.301030
-0.693221	public: void	-0.346788
-1.199538	96 void	-0.124939
-1.199538	8.26a void	-0.124939
-1.076884	7.12 void	-0.124939
-1.075740	typedef void	-0.124939
-0.900793	8.26b void	-0.124939
-0.900793	a[c][r]); void	-0.124939
-0.900793	14.1c void	-0.124939
-0.900793	Disp(); void	-0.124939
-0.203890	x[]); void	-0.124939
-0.900793	8.21 void	-0.124939
-0.600910	9.5b void	-0.124939
-0.600910	Dispatcher void	-0.124939
-0.600910	x);} void	-0.124939
-0.600910	{}; void	-0.124939
-0.600910	0xC0000091L void	-0.124939
-0.600910	8.5a void	-0.124939
-0.600910	prototype: void	-0.124939
-0.600910	<malloc.h> void	-0.124939
-0.600910	8.25 void	-0.124939
-0.600910	9.2b void	-0.124939
-0.600910	9.2a void	-0.124939
-0.600910	vectorized: void	-0.124939
-0.600910	<asmlib.h> void	-0.124939
-3.503658	is 16	-0.124939
-3.022913	of 16	-0.124939
-3.021128	to 16	-0.124939
-3.096542	= 16	-0.124939
-2.314173	or 16	-0.124939
-1.902276	by 16	-0.124939
-3.047935	code 16	-0.124939
-3.051161	- 16	-0.124939
-1.849080	int 16	-0.124939
-2.947998	than 16	-0.124939
-2.614168	page 16	-0.124939
-2.586038	number 16	-0.124939
-1.487828	8 16	-0.124939
-2.340618	test 16	-0.124939
-1.383852	16 16	-0.301030
-1.637733	32 16	-0.425969
-1.978654	256 16	-0.124939
-1.261834	char 16	-0.124939
-0.647178	Store 16	-0.602060
-1.497163	lower 16	-0.124939
-1.498295	8, 16	-0.124939
-1.499430	identification 16	-0.124939
-1.295310	adds 16	-0.124939
-1.199538	150 16	-0.124939
-1.075740	...................................................................................... 16	-0.124939
-0.900793	Vec8s 16	-0.124939
-0.900793	................................................................................ 16	-0.124939
-0.900793	.................................................................................. 16	-0.124939
-0.600910	15.1c). 16	-0.124939
-0.600910	Vec16uc 16	-0.124939
-0.600910	Iu8vec8 16	-0.124939
-0.600910	Vec4d 16	-0.124939
-0.600910	_mm_shuffle_epi8 16	-0.124939
-0.600910	Is16vec4 16	-0.124939
-2.211668	the SSE2	-0.884607
-3.381102	and SSE2	-0.124939
-2.562236	The SSE2	-0.602060
-2.788143	for SSE2	-0.124939
-2.098315	// SSE2	-0.346788
-3.132830	or SSE2	-0.124939
-3.185624	if SSE2	-0.124939
-3.100215	with SSE2	-0.124939
-2.015436	set SSE2	-0.124939
-2.343157	without SSE2	-0.124939
-0.926946	128 SSE2	-0.124939
-1.891781	vectors SSE2	-0.124939
-1.710312	Define SSE2	-0.124939
-1.673435	possible. SSE2	-0.124939
-1.077038	145 SSE2	-0.124939
-0.901660	executable. SSE2	-0.124939
-0.901660	-msse SSE2	-0.124939
-0.601344	_mm_stream_si128 SSE2	-0.124939
-0.601344	_mm_stream_si32 SSE2	-0.124939
-0.601344	xmmintrin.h SSE2	-0.124939
-0.601344	_mm_stream_pd SSE2	-0.124939
-2.868932	is out	-0.602060
-3.227935	are out	-0.124939
-3.088724	or out	-0.124939
-3.166615	if out	-0.124939
-2.989547	not out	-0.124939
-1.604619	instructions out	-0.425969
-1.851666	points out	-0.124939
-1.794921	conversions out	-0.124939
-1.794921	index out	-0.124939
-1.078042	find out	-0.124939
-0.792785	shift out	-0.301030
-0.266122	rule out	-0.425969
-0.057947	roll out	-0.602060
-0.070525	Roll out	-0.823909
-1.200334	move out	-0.124939
-1.201278	mask out	-0.124939
-0.379831	carried out	-0.124939
-0.124833	Index out	-0.602060
-0.379831	moved out	-0.425969
-1.076339	being out	-0.124939
-0.379831	jumping out	-0.425969
-0.203930	ruled out	-0.124939
-0.203930	rolling out	-0.425969
-0.203930	turns out	-0.425969
-0.901193	left out	-0.124939
-0.203930	rolled out	-0.425969
-0.601110	print out	-0.124939
-0.601110	breaking out	-0.124939
-2.162727	the following	-0.284640
-1.506270	The following	-0.530704
-0.601947	InstructionSet().The following	-0.124939
-2.908667	the system	-0.221849
-3.094306	a system	-0.124939
-3.070980	of system	-0.124939
-2.541699	and system	-0.124939
-2.949755	in system	-0.124939
-3.373846	The system	-0.124939
-3.370726	for system	-0.124939
-3.117690	with system	-0.124939
-2.792431	different system	-0.124939
-0.419664	operating system	-0.255272
-1.315857	handling system	-0.124939
-1.847547	advanced system	-0.124939
-0.981341	Other system	-0.425969
-1.444836	highly system	-0.124939
-1.377889	Accessing system	-0.124939
-0.901927	compilers, system	-0.124939
-0.601478	routines, system	-0.124939
-3.582189	the 32	-0.124939
-3.050385	is 32	-0.124939
-3.400890	of 32	-0.124939
-3.413318	to 32	-0.124939
-3.207651	and 32	-0.124939
-2.685880	in 32	-0.602060
-2.529177	or 32	-0.124939
-3.045262	by 32	-0.124939
-2.971608	as 32	-0.124939
-1.847182	int 32	-0.124939
-2.922420	than 32	-0.124939
-2.863698	use 32	-0.124939
-2.772500	make 32	-0.124939
-2.592390	page 32	-0.124939
-2.601623	example 32	-0.124939
-2.600013	double 32	-0.124939
-1.655953	float 32	-0.124939
-2.495051	* 32	-0.124939
-2.441679	2 32	-0.124939
-2.460358	long 32	-0.124939
-2.398703	4 32	-0.124939
-1.486652	8 32	-0.124939
-2.386221	64 32	-0.124939
-2.296659	16 32	-0.124939
-2.162968	AVX 32	-0.124939
-2.126884	uses 32	-0.124939
-2.019456	preferably 32	-0.124939
-1.903493	directives 32	-0.124939
-1.762811	produce 32	-0.124939
-1.703412	accessing 32	-0.124939
-1.667274	% 32	-0.124939
-1.295389	over 32	-0.124939
-1.295389	16, 32	-0.124939
-0.601493	upper 32	-0.425969
-0.600643	_mm_perm_epi8 32	-0.124939
-0.600643	161 32	-0.124939
-0.600643	Vec8us 32	-0.124939
-0.600643	...................................................................... 32	-0.124939
-0.600643	Vec4i 32	-0.124939
-0.600643	Is32vec2 32	-0.124939
-0.600643	80386 32	-0.124939
-0.600643	Iu16vec4 32	-0.124939
-2.813121	the file	-0.204120
-3.582903	is file	-0.124939
-2.509266	a file	-0.124939
-2.790245	The file	-0.124939
-3.319352	for file	-0.124939
-2.816956	data file	-0.124939
-2.590931	library file	-0.124939
-1.382244	object file	-0.124939
-2.557927	C++ file	-0.124939
-2.525590	many file	-0.124939
-2.220909	big file	-0.124939
-1.378093	intermediate file	-0.425969
-1.985545	separate file	-0.124939
-1.298912	put file	-0.425969
-1.970987	source file	-0.124939
-1.674757	Optimizing file	-0.124939
-1.635363	entire file	-0.124939
-0.412716	executable file	-0.221849
-0.279685	header file	-0.204120
-0.487677	map file	-0.124939
-1.549818	standardized file	-0.124939
-0.249696	Header file	-0.301030
-0.601244	Include file	-0.124939
-0.601244	zip file	-0.124939
-3.328500	the programming	-0.124939
-3.513154	a programming	-0.124939
-2.161504	of programming	-0.425969
-3.501313	in programming	-0.124939
-2.821625	from programming	-0.124939
-1.694919	other programming	-0.249877
-2.732134	which programming	-0.124939
-2.568939	multiple programming	-0.124939
-2.551737	C++ programming	-0.124939
-1.796219	software programming	-0.124939
-1.579792	Some programming	-0.124939
-2.227818	compiled programming	-0.124939
-1.267595	common programming	-0.124939
-2.074815	certain programming	-0.124939
-2.027888	particular programming	-0.124939
-1.967323	various programming	-0.124939
-1.202223	your programming	-0.425969
-1.841466	advanced programming	-0.124939
-1.795848	modern programming	-0.124939
-1.740649	safe programming	-0.124939
-0.158248	oriented programming	-0.191886
-1.499484	preferred programming	-0.124939
-0.680671	System programming	-0.124939
-1.076339	catch programming	-0.124939
-0.901193	trivial programming	-0.124939
-0.901193	primitive programming	-0.124939
-0.601110	meta- programming	-0.124939
-0.601110	Nowadays, programming	-0.124939
-3.127499	the dynamic	-0.301030
-2.698983	a dynamic	-0.425969
-2.407845	of dynamic	-0.492916
-2.877613	and dynamic	-0.124939
-3.286107	The dynamic	-0.124939
-3.286053	for dynamic	-0.124939
-2.551202	or dynamic	-0.425969
-3.163966	if dynamic	-0.124939
-3.067243	with dynamic	-0.124939
-2.985346	not dynamic	-0.124939
-2.964784	than dynamic	-0.124939
-1.645045	use dynamic	-0.903090
-2.879123	more dynamic	-0.124939
-1.968073	A dynamic	-0.602060
-2.796806	make dynamic	-0.124939
-2.118618	same dynamic	-0.124939
-2.704941	all dynamic	-0.124939
-2.639300	using dynamic	-0.124939
-2.567405	multiple dynamic	-0.124939
-2.536488	where dynamic	-0.124939
-2.335088	without dynamic	-0.124939
-2.234418	while dynamic	-0.124939
-1.298631	avoid dynamic	-0.602060
-2.133503	another dynamic	-0.124939
-2.068562	All dynamic	-0.124939
-1.297502	separate dynamic	-0.425969
-1.953096	Make dynamic	-0.124939
-0.601273	versus dynamic	-0.124939
-1.201178	Whenever dynamic	-0.124939
-3.694088	the part	-0.124939
-3.084880	is part	-0.425969
-3.057953	a part	-0.425969
-3.221912	are part	-0.124939
-3.021394	as part	-0.124939
-2.985346	not part	-0.124939
-2.277100	this part	-0.425969
-2.822873	A part	-0.124939
-1.872121	same part	-0.602060
-2.759314	If part	-0.124939
-2.730156	which part	-0.124939
-2.731784	but part	-0.124939
-1.754496	each part	-0.301030
-2.568493	static part	-0.124939
-2.498169	any part	-0.124939
-0.640087	critical part	-1.054358
-2.321687	access part	-0.124939
-2.263113	important part	-0.124939
-2.234418	large part	-0.124939
-2.177405	small part	-0.124939
-2.158846	optimized part	-0.124939
-2.133503	another part	-0.124939
-2.027155	particular part	-0.124939
-1.592329	significant part	-0.124939
-1.548501	time-consuming part	-0.124939
-1.547535	(or part	-0.124939
-0.124829	fractional part	-0.301030
-0.901126	time-critical part	-0.124939
-0.601077	task-specific part	-0.124939
-3.138286	the bits	-0.301030
-3.525922	of bits	-0.124939
-2.889066	more bits	-0.124939
-2.121287	same bits	-0.124939
-2.731517	other bits	-0.124939
-2.070224	all bits	-0.425969
-1.655864	multiple bits	-0.301030
-2.414420	8 bits	-0.124939
-1.029142	64 bits	-0.124939
-2.350639	test bits	-0.124939
-1.226262	16 bits	-0.249877
-0.773647	32 bits	-0.346788
-2.180947	small bits	-0.124939
-1.455500	128 bits	-0.124939
-1.297812	256 bits	-0.124939
-1.948750	n bits	-0.124939
-1.180147	512 bits	-0.124939
-1.598673	enough bits	-0.124939
-0.747169	vector, bits	-0.124939
-1.441248	size, bits	-0.124939
-1.375141	comparing bits	-0.124939
-1.200733	individual bits	-0.124939
-0.901393	1024 bits	-0.124939
-0.203950	element, bits	-0.425969
-0.601211	remaining bits	-0.124939
-3.053401	of operations	-0.124939
-1.105268	vector operations	-0.154902
-1.484064	point operations	-0.204120
-1.280710	integer operations	-0.249877
-2.666225	do operations	-0.124939
-2.573950	64-bit operations	-0.124939
-2.475819	return operations	-0.124939
-2.447735	makes operations	-0.124939
-2.414331	bit operations	-0.124939
-2.340481	these operations	-0.124939
-2.253513	extra operations	-0.124939
-1.347749	Integer operations	-0.425969
-1.989173	Boolean operations	-0.124939
-1.990703	mathematical operations	-0.124939
-1.950675	lookup operations	-0.124939
-1.949158	| operations	-0.124939
-1.935186	read operations	-0.124939
-1.739658	shift operations	-0.124939
-1.672712	disk operations	-0.124939
-0.647470	Vector operations	-0.602060
-0.805241	arithmetic operations	-0.124939
-0.901527	primitive operations	-0.124939
-0.601278	Similar operations	-0.124939
-3.098912	is 0	-0.124939
-2.821619	to 0	-0.301030
-2.718348	be 0	-0.425969
-1.594002	= 0	-0.492916
-1.676919	than 0	-0.669007
-2.183733	from 0	-0.425969
-2.538819	value 0	-0.124939
-2.473982	return 0	-0.124939
-1.789184	< 0	-0.124939
-2.415665	8 0	-0.124939
-2.403531	64 0	-0.124939
-2.309735	always 0	-0.124939
-2.315510	16 0	-0.124939
-2.322473	32 0	-0.124939
-2.309079	bits 0	-0.124939
-2.281185	& 0	-0.124939
-2.216360	element 0	-0.124939
-2.139976	get 0	-0.124939
-2.016605	typically 0	-0.124939
-1.949333	n 0	-0.124939
-1.261726	| 0	-0.425969
-0.988791	> 0	-0.301030
-1.200866	interval 0	-0.124939
-0.601244	0= 0	-0.124939
-2.578708	the type	-0.425969
-2.822241	of type	-0.124939
-3.517906	to type	-0.124939
-3.324098	and type	-0.124939
-2.405577	The type	-0.124939
-2.289512	function type	-0.124939
-2.971685	than type	-0.124939
-2.130462	different type	-0.124939
-2.751368	same type	-0.124939
-2.682813	integer type	-0.124939
-2.658536	each type	-0.124939
-2.632501	pointer type	-0.124939
-2.569735	float type	-0.124939
-2.501296	any type	-0.124939
-2.468518	return type	-0.124939
-2.276959	simple type	-0.124939
-2.100853	doing type	-0.124939
-0.848647	runtime type	-0.726999
-1.910737	Each type	-0.124939
-1.848257	appropriate type	-0.124939
-1.822537	loaded type	-0.124939
-0.333003	composite type	-0.249877
-1.376558	Pointer type	-0.124939
-0.346584	Runtime type	-0.602060
-1.200467	C-style type	-0.124939
-0.601144	Implicit type	-0.124939
-0.601144	Constructor-style type	-0.124939
-2.506205	the case	-0.249877
-1.859494	in case	-0.753328
-2.983818	{ case	-0.124939
-1.479792	this case	-0.301030
-2.554702	possible case	-0.124939
-2.117893	likely case	-0.124939
-1.639438	simplest case	-0.124939
-1.553622	latter case	-0.124939
-1.501579	general case	-0.124939
-0.425819	worst case	-0.301030
-0.249778	break; case	-0.124939
-0.601612	worst- case	-0.124939
-0.601612	former case	-0.124939
-3.363740	the cases	-0.124939
-2.434630	in cases	-0.823909
-2.725610	be cases	-0.425969
-3.279389	are cases	-0.124939
-2.783583	different cases	-0.124939
-1.306533	most cases	-0.124939
-2.562389	In cases	-0.124939
-1.863726	many cases	-0.124939
-2.546639	possible cases	-0.124939
-0.750658	some cases	-0.287666
-2.285081	simple cases	-0.124939
-2.204648	These cases	-0.124939
-1.430898	few cases	-0.425969
-1.315206	complicated cases	-0.124939
-1.942440	difficult cases	-0.124939
-1.047447	special cases	-0.124939
-0.981180	Other cases	-0.425969
-1.376135	complex cases	-0.124939
-0.901727	Difficult cases	-0.124939
-0.601378	rare cases	-0.124939
-3.684379	the short	-0.124939
-3.533644	is short	-0.124939
-3.055056	a short	-0.124939
-3.485271	of short	-0.124939
-3.298177	and short	-0.124939
-3.063292	with short	-0.124939
-2.961375	than short	-0.124939
-2.946789	{ short	-0.124939
-2.819484	A short	-0.124939
-2.637407	using short	-0.124939
-2.636789	Intel short	-0.124939
-2.413483	4 short	-0.124939
-2.408252	8 short	-0.124939
-1.484074	unsigned short	-0.602060
-2.321563	SSE2 short	-0.124939
-2.291424	type short	-0.124939
-1.523297	i; short	-0.425969
-2.084415	1 short	-0.124939
-1.981085	256 short	-0.124939
-1.950757	char short	-0.124939
-1.884175	AVX2 short	-0.124939
-0.037758	bb[], short	-1.079181
-0.037758	aa[], short	-1.079181
-0.492576	( short	-0.602060
-1.440093	MMX short	-0.124939
-1.296979	11 short	-0.124939
-1.077151	7.22 short	-0.124939
-0.901060	char, short	-0.124939
-0.601044	int8_t short	-0.124939
-0.601044	(char, short	-0.124939
-3.000449	the &	-0.249877
-2.228154	a &	-0.263241
-3.533967	to &	-0.124939
-3.313430	The &	-0.124939
-3.145247	= &	-0.124939
-2.368387	int &	-0.425969
-2.646958	using &	-0.124939
-2.630809	b &	-0.124939
-0.902654	const &	-0.425969
-2.202943	single &	-0.124939
-2.056874	(int &	-0.124939
-1.441248	T &	-0.124939
-0.504640	(u.i &	-0.425969
-0.504640	(n &	-0.124939
-0.203950	(N &	-0.124939
-0.901393	(Day &	-0.124939
-0.601211	Sum3(S3 &	-0.124939
-0.601211	powN<(N &	-0.124939
-0.601211	((C &	-0.124939
-0.601211	((B &	-0.124939
-0.601211	list[i &	-0.124939
-0.601211	v.10.3 &	-0.124939
-0.601211	v.10.2 &	-0.124939
-0.601211	OneOrTwo5[b &	-0.124939
-0.601211	(A &	-0.124939
-3.149347	the simple	-0.301030
-2.033170	a simple	-0.209260
-3.561427	of simple	-0.124939
-3.055072	to simple	-0.124939
-3.381102	and simple	-0.124939
-2.547885	in simple	-0.425969
-2.400835	for simple	-0.124939
-3.055134	as simple	-0.124939
-1.700540	A simple	-0.124939
-2.777951	only simple	-0.124939
-1.999369	do simple	-0.124939
-2.670797	most simple	-0.124939
-2.571327	two simple	-0.124939
-2.561553	In simple	-0.124939
-2.472620	between simple	-0.124939
-2.223181	Use simple	-0.124939
-2.061638	quite simple	-0.124939
-1.870902	reduce simple	-0.124939
-1.673435	mix simple	-0.124939
-1.297464	50 simple	-0.124939
-0.601344	Putting simple	-0.124939
-3.621890	the instructions	-0.124939
-3.434080	of instructions	-0.124939
-3.236134	The instructions	-0.124939
-3.040172	on instructions	-0.124939
-1.626412	vector instructions	-0.124939
-2.736591	different instructions	-0.124939
-2.651522	no instructions	-0.124939
-2.541362	two instructions	-0.124939
-2.527660	where instructions	-0.124939
-2.363094	optimization instructions	-0.124939
-1.665265	new instructions	-0.124939
-2.322161	these instructions	-0.124939
-2.254247	Some instructions	-0.124939
-1.568479	extra instructions	-0.425969
-2.242847	assembly instructions	-0.124939
-2.193781	specific instructions	-0.124939
-2.192757	single instructions	-0.124939
-1.266776	These instructions	-0.301030
-2.166419	AVX instructions	-0.124939
-2.100986	few instructions	-0.124939
-2.069077	certain instructions	-0.124939
-0.948689	write instructions	-0.249877
-1.988876	intrinsic instructions	-0.124939
-1.981615	conversion instructions	-0.124939
-1.887112	control instructions	-0.124939
-1.668898	execute instructions	-0.124939
-1.547626	256-bit instructions	-0.124939
-1.548855	search instructions	-0.124939
-0.492359	executing instructions	-0.124939
-1.442179	machine instructions	-0.124939
-1.373996	six instructions	-0.124939
-1.200380	application-specific instructions	-0.124939
-0.203870	reorder instructions	-0.124939
-0.600810	16-byte instructions	-0.124939
-0.600810	carry) instructions	-0.124939
-0.600810	ADX instructions	-0.124939
-0.600810	pending instructions	-0.124939
-3.320119	the processors	-0.124939
-2.673378	of processors	-0.249877
-2.054400	on processors	-0.425969
-2.794430	vector processors	-0.124939
-2.128520	different processors	-0.124939
-2.656843	most processors	-0.124939
-1.973009	Intel processors	-0.124939
-2.559331	such processors	-0.124939
-2.487538	some processors	-0.124939
-1.476275	first processors	-0.301030
-2.273524	simple processors	-0.124939
-1.523297	virtual processors	-0.124939
-1.512226	AMD processors	-0.124939
-1.906339	Many processors	-0.124939
-1.870756	x86 processors	-0.124939
-1.865813	old processors	-0.124939
-1.822451	VIA processors	-0.124939
-1.795129	modern processors	-0.124939
-1.015408	non-Intel processors	-0.124939
-0.981052	unknown processors	-0.124939
-0.566672	logical processors	-0.249877
-0.550230	PC processors	-0.124939
-0.601247	physical processors	-0.124939
-1.296979	present processors	-0.124939
-1.200069	older processors	-0.124939
-0.601044	micro- processors	-0.124939
-0.601044	emulated processors	-0.124939
-0.601044	lightweight processors	-0.124939
-0.601044	Newer processors	-0.124939
-0.601044	Future processors	-0.124939
-3.006454	the available	-0.249877
-2.295316	is available	-0.522879
-3.552275	of available	-0.124939
-3.282873	be available	-0.124939
-1.752397	are available	-0.393784
-2.113561	only available	-0.124939
-1.890410	also available	-0.124939
-2.405575	register available	-0.124939
-2.382060	libraries available	-0.124939
-1.676618	registers available	-0.124939
-2.347438	systems available	-0.124939
-2.312920	always available	-0.124939
-1.607523	processors available	-0.124939
-1.870444	made available	-0.124939
-1.796543	become available	-0.124939
-1.500676	easily available	-0.124939
-1.443426	profilers available	-0.124939
-1.377223	largest available	-0.124939
-1.298042	Only available	-0.124939
-0.901593	became available	-0.124939
-0.601311	Library, available	-0.124939
-0.601311	publicly available	-0.124939
-2.685550	the constant	-0.124939
-3.619149	is constant	-0.124939
-2.033444	a constant	-0.313995
-2.684033	and constant	-0.602060
-2.798783	The constant	-0.124939
-3.221275	// constant	-0.124939
-2.110930	by constant	-0.726999
-3.059542	as constant	-0.124939
-2.854636	A constant	-0.124939
-2.763554	point constant	-0.124939
-2.722204	one constant	-0.124939
-2.697902	integer constant	-0.124939
-2.672520	each constant	-0.124939
-2.207259	single constant	-0.124939
-2.175074	precision constant	-0.124939
-2.038431	Integer constant	-0.124939
-1.640053	enable constant	-0.124939
-1.442406	compile-time constant	-0.124939
-1.077138	Copying constant	-0.124939
-0.203983	elimination, constant	-0.425969
-3.204327	are up	-0.124939
-3.054076	code up	-0.124939
-2.972984	not up	-0.124939
-2.791075	make up	-0.124939
-1.761458	set up	-0.124939
-1.849806	takes up	-0.124939
-1.470651	take up	-0.301030
-1.532027	speed up	-0.124939
-2.063211	count up	-0.124939
-1.954135	end up	-0.124939
-0.809836	look up	-0.425969
-1.866939	goes up	-0.124939
-0.942751	keep up	-0.425969
-0.942535	allow up	-0.425969
-1.595601	setting up	-0.124939
-1.296713	turned up	-0.124939
-0.601193	split up	-0.124939
-0.249637	clean up	-0.124939
-0.504500	cleaned up	-0.124939
-1.075940	filled up	-0.124939
-0.124818	cleaning up	-0.124939
-1.075940	(not up	-0.124939
-0.203903	Splitting up	-0.425969
-0.900926	dispatchers up	-0.124939
-0.600977	warm up	-0.124939
-0.600977	cleans up	-0.124939
-0.600977	fills up	-0.124939
-0.600977	fill up	-0.124939
-0.600977	speeded up	-0.124939
-0.600977	summing up	-0.124939
-0.600977	totaling up	-0.124939
-0.600977	speeding up	-0.124939
-2.900694	the error	-0.124939
-2.830755	of error	-0.124939
-3.559230	to error	-0.124939
-2.900499	and error	-0.425969
-3.126248	or error	-0.124939
-3.050771	as error	-0.124939
-1.364641	an error	-0.316824
-2.282683	this error	-0.124939
-2.246119	more error	-0.425969
-2.836419	from error	-0.124939
-2.101604	other error	-0.124939
-2.321925	programming error	-0.124939
-2.226448	An error	-0.124939
-2.201251	common error	-0.124939
-2.139298	another error	-0.124939
-1.201430	own error	-0.124939
-1.156866	appropriate error	-0.124939
-1.767050	No error	-0.124939
-0.124855	residual error	-0.124939
-0.901593	minor error	-0.124939
-0.901593	provoke error	-0.124939
-0.601311	unrecoverable error	-0.124939
-3.228576	and I	-0.124939
-2.741516	that I	-0.124939
-3.138329	if I	-0.124939
-2.964430	compiler I	-0.124939
-2.862979	when I	-0.124939
-2.747751	If I	-0.124939
-2.716218	but I	-0.124939
-1.305661	compilers I	-0.970037
-2.135247	functions. I	-0.124939
-1.946971	examples I	-0.124939
-1.179214	Here, I	-0.425969
-1.863997	called. I	-0.124939
-1.786833	size. I	-0.124939
-1.014141	it. I	-0.425969
-1.701732	performance. I	-0.124939
-1.592924	not. I	-0.124939
-1.548460	element. I	-0.124939
-1.494724	arrays. I	-0.124939
-1.439315	number. I	-0.124939
-1.372369	call. I	-0.124939
-1.373666	one. I	-0.124939
-0.601006	manuals. I	-0.124939
-1.198876	options. I	-0.124939
-1.198876	is, I	-0.124939
-1.198876	manually. I	-0.124939
-1.075242	expected. I	-0.124939
-1.075242	Instead, I	-0.124939
-1.075242	a. I	-0.124939
-0.900460	research, I	-0.124939
-0.900460	use. I	-0.124939
-0.900460	manual, I	-0.124939
-0.600743	easier. I	-0.124939
-0.600743	microcontrollers. I	-0.124939
-0.600743	tricky. I	-0.124939
-0.600743	people. I	-0.124939
-0.600743	said, I	-0.124939
-0.600743	chapter, I	-0.124939
-0.600743	apart. I	-0.124939
-0.600743	consequences. I	-0.124939
-2.354558	of making	-0.425969
-3.458707	and making	-0.124939
-2.203106	for making	-0.124939
-2.477150	are making	-0.124939
-1.476228	by making	-0.301030
-3.131274	with making	-0.124939
-3.053059	not making	-0.124939
-2.117879	than making	-0.301030
-1.663648	from making	-0.346788
-2.244803	avoid making	-0.124939
-1.914752	actually making	-0.124939
-1.640094	consider making	-0.124939
-1.377331	places making	-0.124939
-0.204024	difficulties making	-0.425969
-2.674577	of times	-0.124939
-2.567405	multiple times	-0.124939
-1.903511	two times	-0.124939
-1.050303	many times	-0.234083
-2.321687	access times	-0.124939
-2.234146	execution times	-0.124939
-1.520863	several times	-0.425969
-2.119976	eight times	-0.124939
-2.107751	few times	-0.124939
-1.981694	256 times	-0.124939
-1.260704	three times	-0.124939
-1.866298	10 times	-0.124939
-1.819901	5 times	-0.124939
-0.547297	response times	-0.522879
-1.636618	20 times	-0.124939
-1.548501	subsequent times	-0.124939
-1.550438	search times	-0.124939
-1.498316	random times	-0.124939
-0.747009	thousand times	-0.124939
-1.375319	six times	-0.124939
-1.297112	seven times	-0.124939
-1.297112	hundred times	-0.124939
-1.298088	250 times	-0.124939
-1.201178	inconvenient times	-0.124939
-0.379818	1000 times	-0.124939
-1.076239	unpredictable times	-0.124939
-0.901126	ten times	-0.124939
-0.901126	NumberOfTests times	-0.124939
-0.601077	million times	-0.124939
-2.322290	the stack	-0.204120
-3.097477	a stack	-0.124939
-3.073980	of stack	-0.425969
-2.834500	to stack	-0.301030
-3.435084	and stack	-0.124939
-3.566151	in stack	-0.124939
-2.415273	The stack	-0.425969
-3.112364	on stack	-0.124939
-2.851735	from stack	-0.124939
-1.850397	point stack	-0.602060
-1.756055	called stack	-0.124939
-2.429947	call stack	-0.124939
-1.202831	register stack	-0.221849
-1.405833	standard stack	-0.425969
-1.769999	No stack	-0.124939
-0.601512	"standard stack	-0.124939
-3.510197	and want	-0.124939
-2.350400	may want	-0.425969
-0.941162	you want	-0.675846
-1.315379	we want	-0.522879
-2.286729	I want	-0.124939
-1.994816	don't want	-0.124939
-1.264262	We want	-0.124939
-1.853525	just want	-0.124939
-1.744380	still want	-0.124939
-0.805763	who want	-0.425969
-1.290258	code. Example:	-0.726999
-2.356470	time. Example:	-0.124939
-1.567945	function. Example:	-0.425969
-1.389610	memory. Example:	-0.425969
-2.052941	used. Example:	-0.124939
-1.863578	called. Example:	-0.124939
-1.817881	loop. Example:	-0.124939
-1.076216	2. Example:	-0.425969
-1.758324	variables. Example:	-0.124939
-1.736183	calls. Example:	-0.124939
-1.701375	registers. Example:	-0.124939
-1.703999	variable. Example:	-0.124939
-1.666613	needed. Example:	-0.124939
-1.631448	instructions. Example:	-0.124939
-1.546937	order. Example:	-0.124939
-1.548263	to. Example:	-0.124939
-0.855174	overflow. Example:	-0.425969
-1.497110	value. Example:	-0.124939
-1.494463	branch. Example:	-0.124939
-1.372171	constant. Example:	-0.124939
-1.373501	prediction. Example:	-0.124939
-1.373501	result. Example:	-0.124939
-0.679891	counter. Example:	-0.425969
-1.295654	operation. Example:	-0.124939
-1.294320	finished. Example:	-0.124939
-1.295654	ways. Example:	-0.124939
-1.294320	execution. Example:	-0.124939
-1.295654	elements. Example:	-0.124939
-1.075143	once. Example:	-0.124939
-1.075143	limited. Example:	-0.124939
-0.900393	static. Example:	-0.124939
-0.900393	known. Example:	-0.124939
-0.900393	thing. Example:	-0.124939
-0.900393	divisions. Example:	-0.124939
-0.900393	later. Example:	-0.124939
-0.600710	zeroes. Example:	-0.124939
-0.600710	undesired. Example:	-0.124939
-0.600710	offsets). Example:	-0.124939
-0.600710	overhead. Example:	-0.124939
-0.600710	individually. Example:	-0.124939
-2.348248	the Gnu	-0.452298
-2.694351	and Gnu	-0.301030
-2.707119	in Gnu	-0.602060
-2.020001	The Gnu	-0.477121
-3.246914	// Gnu	-0.124939
-3.174565	or Gnu	-0.124939
-1.299430	Windows Gnu	-0.602060
-2.229333	Use Gnu	-0.124939
-0.969693	Intel, Gnu	-0.602060
-1.873154	10 Gnu	-0.124939
-1.640373	PathScale Gnu	-0.124939
-1.597958	Windows. Gnu	-0.124939
-0.380005	-fno-builtin Gnu	-0.425969
-0.902061	Asmlib Gnu	-0.124939
-0.601545	2009. Gnu	-0.124939
-2.875871	time Some	-0.124939
-2.742569	functions Some	-0.124939
-2.355138	optimization Some	-0.124939
-2.357777	libraries Some	-0.124939
-2.353825	code. Some	-0.124939
-2.352263	time. Some	-0.124939
-2.307798	access Some	-0.124939
-1.296299	systems. Some	-0.425969
-1.918518	compilers. Some	-0.124939
-1.902536	directives Some	-0.124939
-0.900620	compiler. Some	-0.301030
-1.816458	loop. Some	-0.124939
-1.701388	mode. Some	-0.124939
-1.707175	bytes. Some	-0.124939
-1.590328	unrolling Some	-0.124939
-1.546020	order. Some	-0.124939
-1.543126	allocation. Some	-0.124939
-1.547474	available. Some	-0.124939
-1.546020	problems. Some	-0.124939
-1.494868	line. Some	-0.124939
-1.494868	applications. Some	-0.124939
-1.439789	important. Some	-0.124939
-1.293661	them. Some	-0.124939
-1.293661	division. Some	-0.124939
-0.504259	two. Some	-0.124939
-1.198215	up. Some	-0.124939
-1.199683	style. Some	-0.124939
-1.198215	run. Some	-0.124939
-1.074745	for. Some	-0.124939
-1.076218	compilation. Some	-0.124939
-1.074745	sequentially. Some	-0.124939
-1.074745	Hyperthreading Some	-0.124939
-1.074745	best. Some	-0.124939
-0.900127	though. Some	-0.124939
-0.600576	places). Some	-0.124939
-0.600576	logic. Some	-0.124939
-0.600576	intervals. Some	-0.124939
-0.600576	STL. Some	-0.124939
-0.600576	project. Some	-0.124939
-0.600576	card. Some	-0.124939
-0.600576	Alignment? Some	-0.124939
-0.600576	redesign. Some	-0.124939
-0.600576	stupid. Some	-0.124939
-0.600576	protection. Some	-0.124939
-2.348285	of its	-0.249877
-2.672858	to its	-0.124939
-2.890543	and its	-0.124939
-2.938623	in its	-0.124939
-2.534451	if its	-0.124939
-3.111313	by its	-0.124939
-2.477359	with its	-0.124939
-2.212050	on its	-0.124939
-2.112555	than its	-0.124939
-2.288879	have its	-0.124939
-2.876152	then its	-0.124939
-2.828959	from its	-0.124939
-2.831843	at its	-0.124939
-1.746662	has its	-0.425969
-2.738170	but its	-0.124939
-2.341829	sure its	-0.124939
-2.256117	about its	-0.124939
-2.202137	thread its	-0.124939
-2.139238	get its	-0.124939
-2.106358	calculate its	-0.124939
-1.824090	change its	-0.124939
-1.671990	handle its	-0.124939
-1.635869	align its	-0.124939
-0.680392	type-casting its	-0.124939
-0.601211	utilizing its	-0.124939
-1.522819	much about	-0.124939
-0.254457	information about	-0.338819
-1.934007	read about	-0.124939
-1.868617	made about	-0.124939
-1.156599	here about	-0.124939
-0.726214	details about	-0.124939
-0.902031	something about	-0.124939
-1.553660	care about	-0.124939
-1.443638	rules about	-0.124939
-1.374942	87 about	-0.124939
-1.296634	recommendation about	-0.124939
-1.200600	43 about	-0.124939
-1.200600	26 about	-0.124939
-1.201477	assumption about	-0.124939
-1.076539	137 about	-0.124939
-0.901326	worry about	-0.124939
-0.901326	that's about	-0.124939
-0.901326	hint about	-0.124939
-0.901326	comments about	-0.124939
-0.601177	discussions about	-0.124939
-0.601177	thought about	-0.124939
-0.601177	reply about	-0.124939
-0.601177	worried about	-0.124939
-0.601177	Tips about	-0.124939
-0.601177	Details about	-0.124939
-0.601177	debate about	-0.124939
-1.983581	is important	-0.948847
-3.324433	be important	-0.124939
-3.112364	on important	-0.124939
-3.077636	as important	-0.124939
-2.378669	an important	-0.124939
-2.936325	this important	-0.124939
-1.840444	more important	-0.249877
-1.306806	most important	-0.124939
-2.510713	so important	-0.124939
-1.548112	very important	-0.301030
-1.744909	less important	-0.124939
-2.270330	Some important	-0.124939
-2.231477	An important	-0.124939
-2.180153	therefore important	-0.124939
-1.953487	too important	-0.124939
-1.741654	particularly important	-0.124939
-2.349338	is accessed	-0.124939
-3.496741	and accessed	-0.124939
-2.135982	be accessed	-0.301030
-1.516396	are accessed	-0.467361
-3.204797	or accessed	-0.124939
-2.417654	not accessed	-0.425969
-2.925841	when accessed	-0.124939
-1.859471	objects accessed	-0.425969
-2.179984	been accessed	-0.124939
-1.773978	fact accessed	-0.124939
-1.299128	necessarily accessed	-0.124939
-2.678193	of CPUs	-0.124939
-2.778075	for CPUs	-0.425969
-3.079316	with CPUs	-0.124939
-2.211580	on CPUs	-0.301030
-2.131112	different CPUs	-0.124939
-2.728474	other CPUs	-0.124939
-2.712377	all CPUs	-0.124939
-2.662989	most CPUs	-0.124939
-1.564183	Intel CPUs	-0.124939
-1.655676	multiple CPUs	-0.602060
-2.569959	64-bit CPUs	-0.124939
-2.262597	Some CPUs	-0.124939
-2.218121	Use CPUs	-0.124939
-1.512705	AMD CPUs	-0.124939
-1.908659	Many CPUs	-0.124939
-1.872067	x86 CPUs	-0.124939
-1.867758	old CPUs	-0.124939
-1.105270	modern CPUs	-0.124939
-1.742919	Pentium CPUs	-0.124939
-0.485139	non-Intel CPUs	-0.346788
-1.594180	current CPUs	-0.124939
-0.442155	Modern CPUs	-0.425969
-0.901326	Current CPUs	-0.124939
-0.901326	Older CPUs	-0.124939
-0.601177	106 CPUs	-0.124939
-0.601177	low-power CPUs	-0.124939
-2.537509	the function.	-0.467361
-3.069736	a function.	-0.124939
-2.099461	other function.	-0.425969
-2.589348	library function.	-0.124939
-2.504446	any function.	-0.124939
-1.262856	member function.	-0.124939
-1.753675	called function.	-0.124939
-1.497097	critical function.	-0.301030
-2.344134	new function.	-0.124939
-2.202943	single function.	-0.124939
-2.212742	virtual function.	-0.124939
-2.012675	next function.	-0.124939
-1.992416	intrinsic function.	-0.124939
-1.984962	separate function.	-0.124939
-1.008131	dispatcher function.	-0.124939
-1.767239	desired function.	-0.124939
-1.047183	inlined function.	-0.124939
-1.636705	message function.	-0.124939
-1.552073	pure function.	-0.124939
-1.500920	memcpy function.	-0.124939
-1.499240	polymorphic function.	-0.124939
-1.377668	leaf function.	-0.124939
-1.298487	strlen function.	-0.124939
-1.076638	ReadTSC function.	-0.124939
-0.901393	Pure function.	-0.124939
-2.907060	the extra	-0.124939
-3.068000	of extra	-0.124939
-3.365818	The extra	-0.124939
-2.416833	This extra	-0.124939
-1.520417	an extra	-0.170696
-2.929153	this extra	-0.124939
-2.753441	other extra	-0.124939
-1.212557	no extra	-0.380211
-1.853044	takes extra	-0.124939
-1.222802	any extra	-0.425969
-2.506245	some extra	-0.124939
-2.411823	take extra	-0.124939
-2.366874	need extra	-0.124939
-2.117227	few extra	-0.124939
-2.092086	add extra	-0.124939
-1.377756	9 extra	-0.124939
-1.297963	adds extra	-0.124939
-1.077338	inserts extra	-0.124939
-2.363616	that does	-0.425969
-3.094757	or does	-0.124939
-2.050616	it does	-0.124939
-3.148987	function does	-0.124939
-2.411404	This does	-0.425969
-1.605141	compiler does	-0.249877
-2.898269	this does	-0.124939
-1.950933	It does	-0.124939
-2.754373	loop does	-0.124939
-2.734121	which does	-0.124939
-1.711644	pointer does	-0.602060
-2.586198	library does	-0.124939
-2.574637	object does	-0.124939
-1.806573	2 does	-0.425969
-2.473875	long does	-0.124939
-2.200362	thread does	-0.124939
-2.158501	manual does	-0.124939
-2.116455	list does	-0.124939
-2.016340	operator does	-0.124939
-1.950245	dispatcher does	-0.124939
-1.952027	programmer does	-0.124939
-1.549930	aliasing does	-0.124939
-1.296469	hand, does	-0.124939
-1.200467	unit-test does	-0.124939
-1.076439	argument does	-0.124939
-0.901260	14.26 does	-0.124939
-0.601144	__intel_cpu_features_init_x() does	-0.124939
-3.010504	the assembly	-0.726999
-3.570776	of assembly	-0.124939
-3.576927	to assembly	-0.124939
-2.907266	and assembly	-0.425969
-2.206326	in assembly	-0.182931
-2.798783	The assembly	-0.124939
-2.553859	for assembly	-0.124939
-2.567131	or assembly	-0.124939
-2.127361	an assembly	-0.124939
-2.260721	use assembly	-0.124939
-1.739825	using assembly	-0.301030
-1.678248	need assembly	-0.124939
-1.639647	following assembly	-0.425969
-2.224201	Use assembly	-0.124939
-2.207259	single assembly	-0.124939
-0.889523	inline assembly	-0.124939
-1.077818	compiler-generated assembly	-0.124939
-1.077138	Generate assembly	-0.124939
-0.601378	timing, assembly	-0.124939
-0.601378	MASM assembly	-0.124939
-3.359175	the large	-0.124939
-2.880851	is large	-0.124939
-2.321915	a large	-0.124939
-3.059182	of large	-0.124939
-2.547885	in large	-0.249877
-2.788143	for large	-0.124939
-3.100215	with large	-0.124939
-3.094055	on large	-0.124939
-3.065145	This large	-0.124939
-2.913684	use large	-0.124939
-2.219039	A large	-0.124939
-2.561553	In large	-0.124939
-2.506054	so large	-0.124939
-1.105894	very large	-0.124939
-2.223181	Use large	-0.124939
-2.205707	several large	-0.124939
-2.176959	cause large	-0.124939
-1.950391	too large	-0.124939
-0.943196	align large	-0.124939
-0.504721	sufficiently large	-0.425969
-1.077038	skip large	-0.124939
-3.430385	a must	-0.124939
-3.280088	that must	-0.124939
-3.162714	it must	-0.124939
-2.524245	function must	-0.124939
-3.032951	code must	-0.124939
-2.348072	compiler must	-0.124939
-3.025056	x must	-0.124939
-1.803604	you must	-0.221849
-2.879322	It must	-0.124939
-2.770793	program must	-0.124939
-2.112090	functions must	-0.124939
-2.777827	instruction must	-0.124939
-2.716218	but must	-0.124939
-2.636748	class must	-0.124939
-2.641045	do must	-0.124939
-2.590242	i must	-0.124939
-2.558105	object must	-0.124939
-2.518405	array must	-0.124939
-2.502671	we must	-0.124939
-2.451477	branch must	-0.124939
-2.397037	bit must	-0.124939
-2.337467	user must	-0.124939
-2.349797	they must	-0.124939
-2.271649	I must	-0.124939
-2.204973	threads must	-0.124939
-2.080428	sign must	-0.124939
-2.021129	programs must	-0.124939
-1.953362	We must	-0.124939
-1.899915	framework must	-0.124939
-1.882483	vectors must	-0.124939
-1.840516	constructor must	-0.124939
-1.794468	errors must	-0.124939
-1.790634	index must	-0.124939
-1.732642	task must	-0.124939
-1.297091	SIZE must	-0.124939
-0.900460	pragmas must	-0.124939
-0.900460	recursion must	-0.124939
-0.900460	any, must	-0.124939
-0.600743	correctness must	-0.124939
-2.967905	the while	-0.425969
-3.207651	and while	-0.124939
-3.167531	be while	-0.124939
-2.880015	time while	-0.124939
-2.636482	do while	-0.124939
-2.420993	0; while	-0.124939
-2.291253	bits while	-0.124939
-1.547868	calculations while	-0.124939
-2.055697	files while	-0.124939
-2.029787	cases, while	-0.124939
-1.839261	function, while	-0.124939
-1.815801	computer while	-0.124939
-1.824075	overhead while	-0.124939
-1.792212	Windows, while	-0.124939
-1.671417	x, while	-0.124939
-1.590850	used, while	-0.124939
-1.493940	4, while	-0.124939
-1.438724	vector, while	-0.124939
-1.440118	called, while	-0.124939
-1.437334	integers, while	-0.124939
-1.437334	size, while	-0.124939
-1.437334	compile-time while	-0.124939
-1.373171	break while	-0.124939
-1.371777	1.0; while	-0.124939
-1.295389	on, while	-0.124939
-1.295389	nothing while	-0.124939
-1.296792	threads, while	-0.124939
-0.900260	string; while	-0.124939
-0.900260	arguments while	-0.124939
-0.900260	exceptions: while	-0.124939
-0.900260	wide, while	-0.124939
-0.900260	once, while	-0.124939
-0.600643	incremented, while	-0.124939
-0.600643	expensive, while	-0.124939
-0.600643	unchanged, while	-0.124939
-0.600643	both, while	-0.124939
-0.600643	pattern, while	-0.124939
-0.600643	Func1, while	-0.124939
-0.600643	flexibility, while	-0.124939
-0.600643	semicolons, while	-0.124939
-0.600643	application, while	-0.124939
-0.600643	intended, while	-0.124939
-2.830179	a ;	-0.124939
-3.452185	to ;	-0.124939
-1.843889	loop ;	-0.124939
-2.592671	i ;	-0.124939
-1.861456	array ;	-0.124939
-2.452526	return ;	-0.124939
-2.451444	2 ;	-0.124939
-2.406030	4 ;	-0.124939
-1.577801	stack ;	-0.124939
-1.977444	name ;	-0.124939
-1.868796	r ;	-0.124939
-1.669224	true ;	-0.124939
-0.942295	ebx ;	-0.124939
-1.632622	align ;	-0.124939
-0.550096	label ;	-0.124939
-1.443518	( ;	-0.124939
-1.443518	Induction ;	-0.124939
-1.200480	ALIGN ;	-0.124939
-0.504419	Induction; ;	-0.124939
-1.075541	pure_function ;	-0.124939
-1.075541	esp ;	-0.124939
-0.900660	Induction++; ;	-0.124939
-0.600843	$B2$2 ;	-0.124939
-0.600843	a[i+2] ;	-0.124939
-0.600843	PROCNEAR ;	-0.124939
-0.600843	mode): ;	-0.124939
-0.600843	85 ;	-0.124939
-0.600843	NEAR ;	-0.124939
-0.600843	8.26b: ;	-0.124939
-0.600843	;edx=addressinr ;	-0.124939
-0.600843	sign(i) ;	-0.124939
-0.600843	;alignby4 ;	-0.124939
-0.600843	;r ;	-0.124939
-0.600843	;checkifi<100 ;	-0.124939
-0.600843	[esp+12] ;	-0.124939
-0.600843	;startofFunc ;	-0.124939
-2.578708	the arrays	-0.522879
-2.676984	of arrays	-0.124939
-3.039075	to arrays	-0.124939
-2.884030	and arrays	-0.124939
-3.299068	for arrays	-0.124939
-2.888814	when arrays	-0.124939
-2.800669	make arrays	-0.124939
-2.763614	different arrays	-0.124939
-2.761664	If arrays	-0.124939
-2.641255	size arrays	-0.124939
-1.903514	static arrays	-0.124939
-1.560018	large arrays	-0.124939
-1.540894	big arrays	-0.425969
-2.109458	few arrays	-0.124939
-2.023451	replace arrays	-0.124939
-1.223479	aligned arrays	-0.124939
-1.824333	global arrays	-0.124939
-1.707827	accessing arrays	-0.124939
-1.497875	variables, arrays	-0.124939
-1.077351	character arrays	-0.124939
-1.076439	Big arrays	-0.124939
-0.901260	Aligned arrays	-0.124939
-0.901260	variable-size arrays	-0.124939
-0.901260	Align arrays	-0.124939
-0.601144	clearing arrays	-0.124939
-0.601144	Linear arrays	-0.124939
-0.601144	Multidimensional arrays	-0.124939
-2.740278	the work	-0.271067
-3.517482	of work	-0.124939
-2.402201	to work	-0.191886
-3.339285	that work	-0.124939
-3.199920	it work	-0.124939
-2.401011	not work	-0.124939
-2.984602	may work	-0.124939
-2.901595	this work	-0.124939
-2.212606	will work	-0.124939
-2.728474	other work	-0.124939
-2.714377	should work	-0.124939
-2.550837	also work	-0.124939
-2.306574	always work	-0.124939
-2.318362	programming work	-0.124939
-2.250934	extra work	-0.124939
-2.241361	versions work	-0.124939
-1.013766	doesn't work	-0.124939
-1.983529	model work	-0.124939
-1.984380	development work	-0.124939
-1.222667	directives work	-0.425969
-1.672494	little work	-0.124939
-1.636441	BSD work	-0.124939
-1.499009	heavy work	-0.124939
-1.200600	caches work	-0.124939
-1.076539	User work	-0.124939
-0.601177	reinstallation work	-0.124939
-3.419559	to (see	-0.124939
-2.959680	compiler (see	-0.124939
-2.678426	one (see	-0.124939
-2.032363	cache (see	-0.124939
-2.632809	class (see	-0.124939
-2.601835	double (see	-0.124939
-2.614399	pointer (see	-0.124939
-2.547862	efficient (see	-0.124939
-2.472666	variables (see	-0.124939
-2.382394	register (see	-0.124939
-1.672732	registers (see	-0.425969
-2.297685	16 (see	-0.124939
-2.289020	system (see	-0.124939
-2.267401	instructions (see	-0.124939
-2.281459	processors (see	-0.124939
-2.260060	constant (see	-0.124939
-2.247405	stack (see	-0.124939
-2.209963	necessary (see	-0.124939
-2.175948	integers (see	-0.124939
-2.157229	precision (see	-0.124939
-2.107973	list (see	-0.124939
-2.076448	1 (see	-0.124939
-2.052311	counter (see	-0.124939
-1.918917	expressions (see	-0.124939
-1.845035	range (see	-0.124939
-1.797943	intended (see	-0.124939
-1.075318	vectorization (see	-0.425969
-1.763167	checking (see	-0.124939
-1.589762	templates (see	-0.124939
-1.591111	stride (see	-0.124939
-1.545354	chains (see	-0.124939
-1.545354	time-consuming (see	-0.124939
-0.855396	aliasing (see	-0.425969
-1.494201	prediction (see	-0.124939
-1.494201	profiling (see	-0.124939
-1.437563	capabilities (see	-0.124939
-1.374702	throughput (see	-0.124939
-1.075043	mispredictions (see	-0.124939
-0.900327	profitable (see	-0.124939
-0.600676	CPU-dispatching (see	-0.124939
-0.600676	devirtualization (see	-0.124939
-3.157832	the Windows	-0.124939
-3.091157	a Windows	-0.124939
-2.430107	and Windows	-0.221849
-2.948348	in Windows	-0.124939
-2.803116	The Windows	-0.124939
-2.290489	for Windows	-0.346788
-3.231350	// Windows	-0.124939
-3.104947	on Windows	-0.124939
-1.745528	compiler Windows	-0.425969
-1.207384	64-bit Windows	-0.271067
-1.537933	32-bit Windows	-0.124939
-2.419880	bit Windows	-0.124939
-1.362910	both Windows	-0.425969
-1.297963	(In Windows	-0.124939
-1.201664	BSD, Windows	-0.124939
-1.077338	Object Windows	-0.124939
-0.901860	_LP64 Windows	-0.124939
-0.601445	position. Windows	-0.124939
-3.350187	the calls	-0.425969
-3.543312	of calls	-0.124939
-3.361258	and calls	-0.124939
-2.252041	that calls	-0.221849
-1.470152	function calls	-0.287666
-3.119788	by calls	-0.124939
-2.209350	then calls	-0.124939
-2.680076	no calls	-0.124939
-2.527345	many calls	-0.124939
-2.311325	always calls	-0.124939
-2.311447	system calls	-0.124939
-2.134538	Function calls	-0.124939
-2.131499	support calls	-0.124939
-2.117398	contains calls	-0.124939
-1.955258	Make calls	-0.124939
-1.802282	turn calls	-0.124939
-0.901995	F1 calls	-0.425969
-1.297909	16.2 calls	-0.124939
-1.297132	Multiple calls	-0.124939
-1.200999	loader calls	-0.124939
-1.200999	handler calls	-0.124939
-1.076838	API calls	-0.124939
-0.601278	jumps, calls	-0.124939
-2.814415	the calculations	-0.124939
-2.829039	of calculations	-0.124939
-3.327765	The calculations	-0.124939
-3.086943	on calculations	-0.124939
-2.737669	other calculations	-0.124939
-1.288714	point calculations	-0.124939
-2.691371	integer calculations	-0.124939
-1.747034	do calculations	-0.301030
-1.656240	multiple calculations	-0.301030
-2.498352	some calculations	-0.124939
-1.756214	address calculations	-0.124939
-2.220899	necessary calculations	-0.124939
-1.489946	precision calculations	-0.124939
-2.103679	doing calculations	-0.124939
-2.073524	All calculations	-0.124939
-2.078036	certain calculations	-0.124939
-2.069119	intermediate calculations	-0.124939
-1.299155	mathematical calculations	-0.124939
-1.849571	start calculations	-0.124939
-1.635692	parallel calculations	-0.124939
-1.498930	background calculations	-0.124939
-1.500477	arithmetic calculations	-0.124939
-1.442485	consuming calculations	-0.124939
-2.451310	code versions	-0.124939
-2.358005	compiler versions	-0.124939
-2.246119	more versions	-0.425969
-1.277330	different versions	-0.522879
-2.594116	library versions	-0.124939
-1.095226	multiple versions	-0.380211
-1.655357	two versions	-0.301030
-2.347438	new versions	-0.124939
-2.265674	Some versions	-0.124939
-2.231499	compiled versions	-0.124939
-2.204812	several versions	-0.124939
-2.164216	optimized versions	-0.124939
-1.948325	three versions	-0.124939
-1.738551	special versions	-0.124939
-1.635284	Library versions	-0.124939
-1.596104	newer versions	-0.124939
-1.444170	latest versions	-0.124939
-0.901593	CPU-specific versions	-0.124939
-0.203970	New versions	-0.425969
-0.901593	command-line versions	-0.124939
-0.901593	Fast versions	-0.124939
-0.601311	trial versions	-0.124939
-2.677916	the execution	-0.249877
-3.039277	of execution	-0.425969
-3.510093	to execution	-0.124939
-2.880810	and execution	-0.124939
-2.781872	The execution	-0.124939
-3.292512	for execution	-0.124939
-2.474308	with execution	-0.425969
-2.989328	an execution	-0.124939
-2.916114	have execution	-0.124939
-2.156554	program execution	-0.124939
-2.129814	different execution	-0.124939
-2.746977	point execution	-0.124939
-2.639743	size execution	-0.124939
-2.567318	64-bit execution	-0.124939
-2.537604	where execution	-0.124939
-1.798631	order execution	-0.124939
-2.235418	while execution	-0.124939
-2.199477	several execution	-0.124939
-2.077556	optimizing execution	-0.124939
-1.996609	their execution	-0.124939
-0.793418	out-of-order execution	-0.301030
-1.015136	total execution	-0.124939
-1.675575	128-bit execution	-0.124939
-1.674638	during execution	-0.124939
-1.498547	fastest execution	-0.124939
-1.076339	delays execution	-0.124939
-0.601110	full-size execution	-0.124939
-0.601110	Out-of-order execution	-0.124939
-1.865335	to avoid	-0.372723
-3.524083	and avoid	-0.124939
-1.898819	can avoid	-0.329059
-2.098109	may avoid	-0.124939
-2.335873	you avoid	-0.124939
-1.630262	should avoid	-0.124939
-1.764782	cannot avoid	-0.124939
-2.030293	preferably avoid	-0.124939
-1.961282	means avoid	-0.124939
-2.233348	the result	-0.513119
-2.867891	a result	-0.301030
-2.302904	The result	-0.221849
-2.618349	// result	-0.425969
-2.939956	this result	-0.124939
-2.787069	same result	-0.124939
-2.416409	first result	-0.124939
-2.097531	store result	-0.124939
-1.378874	intermediate result	-0.124939
-1.954109	better result	-0.124939
-1.895937	second result	-0.124939
-1.015901	final result	-0.124939
-1.599492	Store result	-0.124939
-1.202064	33 result	-0.124939
-1.077639	correct result	-0.124939
-2.737015	the processor	-0.367977
-3.057953	a processor	-0.124939
-3.036507	of processor	-0.425969
-2.754011	that processor	-0.425969
-3.066279	on processor	-0.124939
-3.044059	This processor	-0.124939
-2.964784	than processor	-0.124939
-1.715843	same processor	-0.425969
-2.080241	which processor	-0.425969
-2.654622	each processor	-0.124939
-2.567405	multiple processor	-0.124939
-2.498169	any processor	-0.124939
-2.339768	new processor	-0.124939
-2.202384	specific processor	-0.124939
-1.523397	virtual processor	-0.425969
-2.127822	support processor	-0.124939
-2.027155	particular processor	-0.124939
-2.009494	next processor	-0.124939
-1.945482	better processor	-0.124939
-1.822778	VIA processor	-0.124939
-1.708201	non-Intel processor	-0.124939
-1.673439	logical processor	-0.124939
-0.504560	soft processor	-0.124939
-1.200202	hyperthreading processor	-0.124939
-0.901126	physics processor	-0.124939
-0.901126	word processor	-0.124939
-0.601077	i7 processor	-0.124939
-0.601077	Core2 processor	-0.124939
-0.601077	M processor	-0.124939
-3.012543	the compiled	-0.249877
-2.252932	is compiled	-0.212089
-3.401897	and compiled	-0.124939
-2.703906	in compiled	-0.124939
-2.488305	be compiled	-0.124939
-2.710889	are compiled	-0.124939
-2.048014	code compiled	-0.249877
-2.245389	when compiled	-0.124939
-2.750240	other compiled	-0.124939
-2.674555	each compiled	-0.124939
-2.586020	object compiled	-0.124939
-2.411808	first compiled	-0.124939
-1.347401	programs compiled	-0.425969
-0.688706	directly compiled	-0.124939
-1.551140	fully compiled	-0.124939
-1.201531	8.26a compiled	-0.124939
-1.202177	normally compiled	-0.124939
-0.901794	8.26b compiled	-0.124939
-0.601411	cross- compiled	-0.124939
-2.839796	} An	-0.124939
-2.744308	functions An	-0.124939
-2.529278	C++ An	-0.124939
-2.470455	variables An	-0.124939
-2.354978	code. An	-0.124939
-2.353311	time. An	-0.124939
-2.291598	operations An	-0.124939
-2.099997	operators An	-0.124939
-2.080446	store An	-0.124939
-2.052417	program. An	-0.124939
-2.051053	used. An	-0.124939
-1.789085	cases. An	-0.124939
-1.666949	classes. An	-0.124939
-1.543419	inefficient. An	-0.124939
-1.543419	executed. An	-0.124939
-1.493679	arrays. An	-0.124939
-1.493679	zero. An	-0.124939
-1.439953	important. An	-0.124939
-1.371580	above. An	-0.124939
-0.679791	result. An	-0.124939
-1.295257	propagation An	-0.124939
-1.295257	Arrays An	-0.124939
-1.198347	declared. An	-0.124939
-1.199783	s; An	-0.124939
-1.198347	www.agner.org/optimize/cppexamples.zip. An	-0.124939
-1.074844	Inheritance An	-0.124939
-1.074844	Enums An	-0.124939
-1.074844	inlined. An	-0.124939
-1.076285	fragmented. An	-0.124939
-1.074844	Devirtualization An	-0.124939
-0.900194	CPUs: An	-0.124939
-0.900194	C++: An	-0.124939
-0.900194	thing. An	-0.124939
-0.600609	leak. An	-0.124939
-0.600609	+127. An	-0.124939
-0.600609	systems"). An	-0.124939
-0.600609	pointers). An	-0.124939
-0.600609	matrices. An	-0.124939
-0.600609	standard. An	-0.124939
-0.600609	checking). An	-0.124939
-0.600609	language: An	-0.124939
-0.600609	supports. An	-0.124939
-0.600609	27. An	-0.124939
-2.204624	// Use	-0.124939
-3.168235	it Use	-0.124939
-2.754892	functions Use	-0.124939
-2.200659	language Use	-0.124939
-2.176522	etc. Use	-0.124939
-1.262321	data. Use	-0.124939
-1.922512	compilers. Use	-0.124939
-1.838947	compiler. Use	-0.124939
-1.828769	example: Use	-0.124939
-1.791410	1. Use	-0.124939
-1.764593	2. Use	-0.124939
-1.668898	libraries. Use	-0.124939
-1.667683	possible. Use	-0.124939
-1.496473	file. Use	-0.124939
-1.444661	this: Use	-0.124939
-1.439710	details. Use	-0.124939
-1.294815	names. Use	-0.124939
-1.294815	files. Use	-0.124939
-1.294815	point. Use	-0.124939
-1.199141	reasons. Use	-0.124939
-1.199141	optimal. Use	-0.124939
-1.199141	option. Use	-0.124939
-1.201623	3. Use	-0.124939
-1.076684	implemented. Use	-0.124939
-1.075441	expected. Use	-0.124939
-1.075441	spot. Use	-0.124939
-0.203870	3.2 Use	-0.425969
-0.203870	14.3 Use	-0.425969
-0.203870	14.1 Use	-0.425969
-0.900593	server. Use	-0.124939
-0.900593	contentions. Use	-0.124939
-0.900593	48 Use	-0.124939
-0.600810	link. Use	-0.124939
-0.600810	listing. Use	-0.124939
-0.600810	7.34a. Use	-0.124939
-0.600810	"__attribute__((visibility("hidden")))". Use	-0.124939
-0.600810	this). Use	-0.124939
-3.620800	of bytes	-0.124939
-0.990193	4 bytes	-0.550907
-0.928020	8 bytes	-0.301030
-2.412452	64 bytes	-0.124939
-1.019370	16 bytes	-0.301030
-1.043509	128 bytes	-0.124939
-0.408096	unused bytes	-0.191886
-1.803612	consecutive bytes	-0.124939
-1.552711	65 bytes	-0.124939
-1.443567	127 bytes	-0.124939
-1.443567	size, bytes	-0.124939
-0.902061	2048 bytes	-0.124939
-0.902061	0x40 bytes	-0.124939
-0.601545	800 bytes	-0.124939
-0.601545	alignment, bytes	-0.124939
-3.746182	the big	-0.124939
-2.728993	is big	-0.425969
-2.436762	a big	-0.124939
-3.534530	of big	-0.124939
-3.351666	and big	-0.124939
-3.521869	in big	-0.124939
-3.319352	for big	-0.124939
-3.349153	that big	-0.124939
-3.083430	on big	-0.124939
-3.079548	code big	-0.124939
-1.884753	have big	-0.249877
-2.906181	use big	-0.124939
-2.840232	A big	-0.124939
-2.734582	other big	-0.124939
-1.808561	one big	-0.124939
-2.021980	no big	-0.124939
-1.822659	so big	-0.425969
-1.388327	very big	-0.124939
-2.392208	how big	-0.124939
-1.261726	too big	-0.124939
-1.711131	On big	-0.124939
-1.709519	writing big	-0.124939
-1.441479	Even big	-0.124939
-0.601244	yesterday's big	-0.124939
-3.391375	and doesn't	-0.124939
-2.088366	that doesn't	-0.191886
-1.672311	it doesn't	-0.321233
-1.550015	compiler doesn't	-0.380211
-2.892648	It doesn't	-0.124939
-2.774038	CPU doesn't	-0.124939
-2.796242	instruction doesn't	-0.124939
-2.697902	integer doesn't	-0.124939
-2.676073	class doesn't	-0.124939
-2.651990	size doesn't	-0.124939
-2.584580	object doesn't	-0.124939
-2.334825	method doesn't	-0.124939
-2.272246	error doesn't	-0.124939
-2.190017	overflow doesn't	-0.124939
-2.174420	line doesn't	-0.124939
-2.054338	above doesn't	-0.124939
-2.021307	microprocessor doesn't	-0.124939
-1.769367	operation doesn't	-0.124939
-1.501074	volatile doesn't	-0.124939
-1.297630	currently doesn't	-0.124939
-2.907060	the threads	-0.124939
-3.068000	of threads	-0.124939
-3.365818	The threads	-0.124939
-3.363004	for threads	-0.124939
-3.380179	that threads	-0.124939
-3.153196	or threads	-0.124939
-2.248414	more threads	-0.124939
-1.886476	different threads	-0.124939
-2.104478	other threads	-0.124939
-2.732851	all threads	-0.124939
-2.643492	into threads	-0.124939
-1.045848	multiple threads	-0.124939
-1.290806	two threads	-0.204120
-1.795942	between threads	-0.124939
-2.125005	eight threads	-0.124939
-1.298819	separate threads	-0.124939
-1.077338	Two threads	-0.124939
-0.901860	high-priority threads	-0.124939
-2.300962	the best	-0.257564
-2.902079	is best	-0.124939
-1.972837	The best	-0.279841
-3.438006	for best	-0.124939
-3.361856	are best	-0.124939
-2.246871	work best	-0.124939
-0.950988	works best	-0.522879
-0.681053	fits best	-0.425969
-1.202865	performs best	-0.124939
-2.753584	the necessary	-0.124939
-2.178518	is necessary	-0.636822
-3.631536	of necessary	-0.124939
-3.458707	and necessary	-0.124939
-2.338712	be necessary	-0.425969
-3.322421	are necessary	-0.124939
-3.237344	it necessary	-0.124939
-1.803498	not necessary	-0.204120
-2.553541	where necessary	-0.124939
-2.522192	any necessary	-0.124939
-1.718130	often necessary	-0.425969
-1.237600	therefore necessary	-0.602060
-1.638187	rarely necessary	-0.124939
-1.077739	124 necessary	-0.124939
-3.561427	of element	-0.124939
-3.128431	by element	-0.124939
-2.481460	with element	-0.124939
-3.012202	an element	-0.124939
-2.152970	vector element	-0.425969
-2.720016	one element	-0.124939
-1.251954	each element	-0.550907
-1.246894	array element	-0.124939
-2.478794	table element	-0.124939
-1.477471	first element	-0.301030
-2.348545	new element	-0.124939
-2.255241	extra element	-0.124939
-2.108568	calculate element	-0.124939
-2.096599	every element	-0.124939
-1.954577	last element	-0.124939
-1.223166	Each element	-0.124939
-0.601620	per element	-0.249877
-0.425651	largest element	-0.301030
-1.077038	nearest element	-0.124939
-0.901660	dummy element	-0.124939
-0.601344	reach element	-0.124939
-3.648175	a language	-0.124939
-2.943617	this language	-0.124939
-2.877180	A language	-0.124939
-1.274308	C++ language	-0.204120
-0.729310	programming language	-0.321233
-0.651823	assembly language	-0.166331
-2.209699	common language	-0.124939
-1.827243	C language	-0.124939
-1.675500	Any language	-0.124939
-1.502270	preferred language	-0.124939
-1.444278	D language	-0.124939
-0.333067	definition language	-0.249877
-1.377331	mixed language	-0.124939
-0.380019	high-level language	-0.124939
-2.357296	code. But	-0.124939
-2.355415	time. But	-0.124939
-1.567767	function. But	-0.124939
-1.507093	etc. But	-0.124939
-1.817525	pointer. But	-0.124939
-1.763167	b) But	-0.124939
-1.730534	resources. But	-0.124939
-1.702360	cycles. But	-0.124939
-1.667598	precision. But	-0.124939
-1.591111	CPU. But	-0.124939
-1.589762	returns. But	-0.124939
-1.548065	integers. But	-0.124939
-1.544005	parameter. But	-0.124939
-1.496913	array. But	-0.124939
-1.495555	line. But	-0.124939
-1.495555	applications. But	-0.124939
-1.440283	integer. But	-0.124939
-1.371974	members. But	-0.124939
-1.294155	names. But	-0.124939
-1.294155	itself. But	-0.124939
-1.294155	5. But	-0.124939
-1.198611	languages. But	-0.124939
-1.198611	obsolete. But	-0.124939
-1.075043	inlined. But	-0.124939
-1.075043	n. But	-0.124939
-0.900327	programmed. But	-0.124939
-0.900327	market. But	-0.124939
-0.900327	checks. But	-0.124939
-0.900327	issue. But	-0.124939
-0.900327	delay. But	-0.124939
-0.900327	1.23456. But	-0.124939
-0.900327	today. But	-0.124939
-0.600676	versa. But	-0.124939
-0.600676	themselves. But	-0.124939
-0.600676	++i). But	-0.124939
-0.600676	C1::f. But	-0.124939
-0.600676	simplicity. But	-0.124939
-0.600676	session. But	-0.124939
-0.600676	b. But	-0.124939
-0.600676	miss. But	-0.124939
-0.600676	conflicts. But	-0.124939
-2.634379	the speed	-0.380211
-3.066073	to speed	-0.425969
-2.435923	in speed	-0.522879
-2.301530	The speed	-0.124939
-2.557437	for speed	-0.124939
-3.196873	if speed	-0.124939
-2.246399	when speed	-0.425969
-1.867402	where speed	-0.425969
-2.517280	any speed	-0.124939
-1.024071	execution speed	-0.221849
-1.972486	high speed	-0.124939
-1.897307	improve speed	-0.124939
-1.872739	reduce speed	-0.124939
-1.710796	reduced speed	-0.124939
-1.709073	processing speed	-0.124939
-0.680776	half speed	-0.124939
-1.377310	Testing speed	-0.124939
-3.014592	the specific	-0.124939
-3.638473	is specific	-0.124939
-2.090323	a specific	-0.221849
-3.068000	of specific	-0.124939
-2.794298	for specific	-0.425969
-3.293265	are specific	-0.124939
-3.231350	// specific	-0.124939
-3.153196	or specific	-0.124939
-2.845957	at specific	-0.124939
-1.773322	no specific	-0.124939
-1.588758	any specific	-0.124939
-1.771432	code, specific	-0.124939
-1.678174	fit specific	-0.124939
-1.673916	Any specific	-0.124939
-1.376533	giving specific	-0.124939
-0.124870	CPU- specific	-0.124939
-0.601445	system- specific	-0.124939
-0.601445	application- specific	-0.124939
-3.510093	to c	-0.124939
-2.418888	and c	-0.221849
-3.292779	The c	-0.124939
-3.128398	= c	-0.124939
-3.166615	if c	-0.124939
-2.950985	{ c	-0.124939
-2.871682	then c	-0.124939
-2.149685	vector c	-0.425969
-2.760487	If c	-0.124939
-1.943430	+ c	-0.425969
-2.519924	* c	-0.124939
-1.331828	0; c	-0.726999
-2.143414	b; c	-0.124939
-1.404266	: c	-0.124939
-1.298755	__m128i c	-0.425969
-1.969163	division c	-0.124939
-1.876087	b, c	-0.124939
-1.179190	0, c	-0.425969
-1.047510	d; c	-0.425969
-0.942964	? c	-0.425969
-1.637786	temp; c	-0.124939
-1.374545	Is16vec8 c	-0.124939
-1.200334	100, c	-0.124939
-1.076339	zero, c	-0.124939
-0.901193	3.5; c	-0.124939
-0.601110	1.0E8, c	-0.124939
-0.601110	CFALSE: c	-0.124939
-0.601110	(a+1); c	-0.124939
-2.460132	is much	-0.271067
-3.556713	a much	-0.124939
-2.897155	and much	-0.124939
-3.259372	are much	-0.124939
-2.436401	as much	-0.124939
-2.930538	have much	-0.124939
-2.843788	A much	-0.124939
-2.666225	do much	-0.124939
-1.324244	takes much	-0.522879
-2.504204	so much	-0.124939
-2.470544	very much	-0.124939
-2.407490	take much	-0.124939
-2.386275	often much	-0.124939
-1.293094	how much	-0.425969
-2.258017	accessed much	-0.124939
-2.154121	calculated much	-0.124939
-2.140715	uses much	-0.124939
-1.007895	too much	-0.124939
-1.866942	usually much	-0.124939
-1.869986	made much	-0.124939
-0.425708	How much	-0.301030
-1.201777	obtain much	-0.124939
-0.901527	worry much	-0.124939
-1.861991	a single	-0.200659
-3.604904	to single	-0.124939
-3.370726	for single	-0.124939
-3.300372	are single	-0.124939
-3.117690	with single	-0.124939
-3.073041	as single	-0.124939
-3.007935	than single	-0.124939
-2.262407	use single	-0.124939
-2.849144	from single	-0.124939
-2.662690	using single	-0.124939
-2.477557	between single	-0.124939
-2.283349	constant single	-0.124939
-2.140681	four single	-0.124939
-2.125465	eight single	-0.124939
-1.709647	testing single	-0.124939
-1.674885	mix single	-0.124939
-1.202377	mixing single	-0.124939
-2.614084	= i;	-0.425969
-0.926261	int i;	-0.579197
-2.643054	+ i;	-0.124939
-1.504430	*= i;	-0.124939
-0.902730	b[size], i;	-0.124939
-2.843109	} These	-0.124939
-2.357296	code. These	-0.124939
-2.355415	time. These	-0.124939
-2.172614	etc. These	-0.124939
-2.069165	memory. These	-0.124939
-2.007858	operator These	-0.124939
-1.981017	cache. These	-0.124939
-1.924175	set. These	-0.124939
-1.841011	CPUs. These	-0.124939
-1.817525	pointer. These	-0.124939
-1.735890	calls. These	-0.124939
-1.667598	libraries. These	-0.124939
-1.668943	stack. These	-0.124939
-1.666257	needed. These	-0.124939
-1.667598	precision. These	-0.124939
-1.631155	vector. These	-0.124939
-0.900878	CPU. These	-0.124939
-1.592465	problem. These	-0.124939
-1.495555	vectors. These	-0.124939
-1.294155	process. These	-0.124939
-1.294155	so. These	-0.124939
-1.198611	efficiency. These	-0.124939
-1.199982	another. These	-0.124939
-1.198611	www.agner.org/optimize/cppexamples.zip. These	-0.124939
-1.075043	CodeAnalyst. These	-0.124939
-1.075043	microprocessor. These	-0.124939
-1.075043	best. These	-0.124939
-1.075043	lookup. These	-0.124939
-1.075043	free. These	-0.124939
-0.900327	0x4700. These	-0.124939
-0.900327	checks. These	-0.124939
-0.900327	server. These	-0.124939
-0.900327	AVX. These	-0.124939
-0.600676	128. These	-0.124939
-0.600676	1996. These	-0.124939
-0.600676	commpage. These	-0.124939
-0.600676	Fortran. These	-0.124939
-0.600676	_mm. These	-0.124939
-0.600676	(GOT). These	-0.124939
-0.600676	3.x. These	-0.124939
-0.600676	Primitives". These	-0.124939
-2.684589	the virtual	-0.249877
-2.511738	a virtual	-0.425969
-3.059182	of virtual	-0.124939
-3.055072	to virtual	-0.124939
-2.903869	and virtual	-0.124939
-3.342588	The virtual	-0.124939
-2.481460	with virtual	-0.124939
-2.219039	A virtual	-0.124939
-2.743909	other virtual	-0.124939
-2.768791	If virtual	-0.124939
-2.720016	one virtual	-0.124939
-2.684313	no virtual	-0.124939
-2.239797	avoid virtual	-0.124939
-2.203659	These virtual	-0.124939
-0.809840	public: virtual	-0.726999
-1.771754	inefficient virtual	-0.124939
-1.708197	Avoid virtual	-0.124939
-1.706792	so-called virtual	-0.124939
-0.747330	Java virtual	-0.124939
-0.901660	NotPolymorphic(); virtual	-0.124939
-0.601344	inheritance, virtual	-0.124939
-3.509202	of several	-0.124939
-3.324098	and several	-0.124939
-3.506362	in several	-0.124939
-2.284699	for several	-0.221849
-3.334433	that several	-0.124939
-2.112032	are several	-0.425969
-3.103000	by several	-0.124939
-3.073058	on several	-0.124939
-2.287543	have several	-0.124939
-2.888814	when several	-0.124939
-1.746356	has several	-0.124939
-2.761664	If several	-0.124939
-2.734965	but several	-0.124939
-2.465318	between several	-0.124939
-1.471150	take several	-0.124939
-2.348392	test several	-0.124939
-2.114695	contains several	-0.124939
-1.911627	requires several	-0.124939
-1.913412	load several	-0.124939
-1.794419	statement several	-0.124939
-1.735535	save several	-0.124939
-1.551741	provided several	-0.124939
-1.296469	package several	-0.124939
-1.076439	took several	-0.124939
-0.601144	alternatingly several	-0.124939
-0.601144	wastes several	-0.124939
-0.601144	Connecting several	-0.124939
-2.290480	function through	-0.301030
-2.814608	data through	-0.124939
-2.097127	loop through	-0.425969
-1.761643	class through	-0.602060
-2.630809	b through	-0.124939
-2.589348	library through	-0.124939
-1.656331	object through	-0.301030
-2.505219	variable through	-0.124939
-2.426655	called through	-0.124939
-2.437026	address through	-0.124939
-0.811145	accessed through	-0.425969
-1.179639	goes through	-0.124939
-0.877040	go through	-0.124939
-1.821596	main through	-0.124939
-1.799432	Loop through	-0.124939
-1.635034	updates through	-0.124939
-1.500920	GOT through	-0.124939
-1.499240	jump through	-0.124939
-1.375982	7 through	-0.124939
-1.296800	separately through	-0.124939
-0.901393	caller through	-0.124939
-0.601211	Walking through	-0.124939
-0.601211	propagate through	-0.124939
-0.601211	looping through	-0.124939
-0.601211	propagated through	-0.124939
-3.368353	the common	-0.124939
-2.735121	is common	-0.249877
-2.377672	a common	-0.182931
-3.580330	of common	-0.124939
-3.357936	The common	-0.124939
-3.355417	for common	-0.124939
-3.146301	or common	-0.124939
-3.063995	as common	-0.124939
-2.904420	more common	-0.124939
-2.858313	A common	-0.124939
-2.750240	other common	-0.124939
-1.186348	most common	-0.124939
-1.864108	many common	-0.124939
-1.327331	Some common	-0.124939
-2.076864	All common	-0.124939
-2.005504	allows common	-0.124939
-0.747410	eliminate common	-0.124939
-1.202177	pointer, common	-0.124939
-0.601411	inlining, common	-0.124939
-2.084052	= a,	-0.522879
-3.202608	if a,	-0.124939
-1.567352	int a,	-0.778151
-2.826540	vector a,	-0.124939
-1.568295	double a,	-0.726999
-1.219980	float a,	-0.669007
-2.431972	example, a,	-0.124939
-2.291470	& a,	-0.124939
-0.833736	(int a,	-0.522879
-1.851194	i, a,	-0.124939
-0.793545	bool a,	-0.301030
-1.077639	Vec16s a,	-0.124939
-0.902061	Vec8s a,	-0.124939
-0.601545	vector(float a,	-0.124939
-0.601545	memcpy(b, a,	-0.124939
-2.908667	the thread	-0.124939
-2.515056	a thread	-0.124939
-2.572573	or thread	-0.124939
-3.038646	not thread	-0.124939
-2.865762	A thread	-0.124939
-2.780911	same thread	-0.124939
-2.756665	other thread	-0.124939
-1.810439	one thread	-0.124939
-1.599070	each thread	-0.124939
-2.319188	system thread	-0.124939
-1.043256	another thread	-0.124939
-0.768935	separate thread	-0.346788
-0.809665	Each thread	-0.124939
-1.298129	third thread	-0.124939
-0.901927	high-priority thread	-0.124939
-0.601478	low-priority thread	-0.124939
-0.601478	higher-priority thread	-0.124939
-2.056777	files etc.	-0.124939
-1.763523	b) etc.	-0.124939
-1.076754	functions, etc.	-0.124939
-1.669236	2, etc.	-0.124939
-1.588742	compiler, etc.	-0.124939
-1.437792	size, etc.	-0.124939
-1.294320	pointers, etc.	-0.124939
-1.294320	programming, etc.	-0.124939
-1.198744	units, etc.	-0.124939
-1.075143	/arch:AVX etc.	-0.124939
-1.075143	multiplication, etc.	-0.124939
-1.075143	counters, etc.	-0.124939
-1.075143	access, etc.	-0.124939
-1.075143	comparisons, etc.	-0.124939
-1.075143	way, etc.	-0.124939
-0.900393	Monday, etc.	-0.124939
-0.900393	databases, etc.	-0.124939
-0.900393	mutexes, etc.	-0.124939
-0.900393	resolutions, etc.	-0.124939
-0.900393	add, etc.	-0.124939
-0.900393	loops, etc.	-0.124939
-0.900393	propagation, etc.	-0.124939
-0.900393	limit, etc.	-0.124939
-0.900393	mispredictions, etc.	-0.124939
-0.600710	<ia32intrin.h> etc.	-0.124939
-0.600710	maps etc.	-0.124939
-0.600710	sprintf, etc.	-0.124939
-0.600710	sin, etc.	-0.124939
-0.600710	cards, etc.	-0.124939
-0.600710	-mavx, etc.	-0.124939
-0.600710	ports, etc.	-0.124939
-0.600710	history, etc.	-0.124939
-0.600710	connections, etc.	-0.124939
-0.600710	-axSSE3, etc.	-0.124939
-0.600710	/QaxSSE3, etc.	-0.124939
-0.600710	Basic, etc.	-0.124939
-0.600710	brushes, etc.	-0.124939
-0.600710	boxes, etc.	-0.124939
-0.600710	_endthread(), etc.	-0.124939
-0.600710	exceptions, etc.	-0.124939
-2.903869	and AMD	-0.124939
-3.342588	The AMD	-0.124939
-2.552673	for AMD	-0.124939
-1.851771	on AMD	-0.301030
-3.055134	as AMD	-0.124939
-2.913684	use AMD	-0.124939
-2.227969	bytes AMD	-0.124939
-2.202142	AMD AMD	-0.124939
-2.049674	both AMD	-0.124939
-1.912273	processors. AMD	-0.124939
-0.233996	Intel, AMD	-0.937852
-1.848122	operands AMD	-0.124939
-1.796268	platforms. AMD	-0.124939
-1.201265	(Gnu) AMD	-0.124939
-0.901660	131. AMD	-0.124939
-0.901660	op. AMD	-0.124939
-0.901660	ammintrin.h AMD	-0.124939
-0.601344	_mm_exp_pd AMD	-0.124939
-0.601344	__vrd2_exp AMD	-0.124939
-0.601344	XOP, AMD	-0.124939
-0.601344	immintrin.h AMD	-0.124939
-2.580992	to compile	-0.221849
-2.961684	and compile	-0.124939
-2.082900	you compile	-0.301030
-0.817072	at compile	-1.000000
-2.540281	we compile	-0.124939
-2.811832	the exception	-0.425969
-3.574295	is exception	-0.124939
-3.525922	of exception	-0.124939
-3.533967	to exception	-0.124939
-3.342281	and exception	-0.124939
-2.788137	The exception	-0.425969
-3.312485	for exception	-0.124939
-3.344191	that exception	-0.124939
-3.111313	by exception	-0.124939
-3.079945	on exception	-0.124939
-1.688609	an exception	-0.271067
-2.903709	use exception	-0.124939
-2.804006	program exception	-0.124939
-2.764027	If exception	-0.124939
-2.021517	no exception	-0.124939
-2.646958	using exception	-0.124939
-2.556371	C++ exception	-0.124939
-2.540970	possible exception	-0.124939
-2.504446	any exception	-0.124939
-1.765584	No exception	-0.124939
-1.736447	save exception	-0.124939
-1.635034	why exception	-0.124939
-0.504640	structured exception	-0.124939
-0.504640	disable exception	-0.425969
-0.901393	Enable exception	-0.124939
-3.160697	the allocated	-0.124939
-2.737596	is allocated	-0.124939
-3.600094	of allocated	-0.124939
-2.805298	The allocated	-0.425969
-2.133165	be allocated	-0.425969
-2.319984	are allocated	-0.249877
-2.830917	has allocated	-0.124939
-2.756665	other allocated	-0.124939
-2.074807	all allocated	-0.425969
-2.678654	each allocated	-0.124939
-2.344465	sure allocated	-0.124939
-2.230635	An allocated	-0.124939
-2.173213	been allocated	-0.124939
-1.892736	own allocated	-0.124939
-0.343621	dynamically allocated	-0.329059
-0.805366	Memory allocated	-0.425969
-1.201798	slices allocated	-0.124939
-2.534496	is small	-0.301030
-2.376460	a small	-0.182931
-3.561427	of small	-0.124939
-3.567989	to small	-0.124939
-3.381102	and small	-0.124939
-3.537950	in small	-0.124939
-2.552673	for small	-0.301030
-3.132830	or small	-0.124939
-3.094055	on small	-0.124939
-3.055134	as small	-0.124939
-2.640427	into small	-0.124939
-1.885586	such small	-0.124939
-1.863345	many small	-0.124939
-2.501492	some small	-0.124939
-1.823236	so small	-0.425969
-1.799641	very small	-0.124939
-2.018640	typically small	-0.124939
-1.262105	too small	-0.124939
-1.710312	writing small	-0.124939
-1.297464	relatively small	-0.124939
-1.201977	kept small	-0.124939
-3.146555	the overflow	-0.124939
-2.683061	of overflow	-0.249877
-2.198889	for overflow	-0.204120
-2.762977	that overflow	-0.124939
-3.090485	on overflow	-0.124939
-2.126487	an overflow	-0.124939
-2.148579	make overflow	-0.425969
-2.099546	point overflow	-0.124939
-1.784794	integer overflow	-0.124939
-2.682189	no overflow	-0.124939
-2.540149	array overflow	-0.124939
-2.544363	possible overflow	-0.124939
-2.259243	about overflow	-0.124939
-2.226448	An overflow	-0.124939
-1.490720	cause overflow	-0.124939
-2.037387	Integer overflow	-0.124939
-1.913910	give overflow	-0.124939
-1.822872	positive overflow	-0.124939
-1.595365	buffer overflow	-0.124939
-1.201132	against overflow	-0.124939
-1.076938	ignore overflow	-0.124939
-0.601311	uncaught overflow	-0.124939
-3.075749	a +=	-0.124939
-1.104119	i +=	-0.477121
-1.741197	temp +=	-0.124939
-0.633883	sum +=	-0.249877
-0.529148	list[i] +=	-0.249877
-1.375538	c1 +=	-0.124939
-0.601747	s +=	-0.124939
-0.601434	sum1 +=	-0.124939
-1.200999	u.i +=	-0.124939
-1.200999	sum2 +=	-0.124939
-1.200999	Y +=	-0.124939
-1.077618	list[i+1] +=	-0.124939
-1.076838	r1 +=	-0.124939
-1.077618	list[i+2] +=	-0.124939
-1.076838	Z +=	-0.124939
-1.076838	s3 +=	-0.124939
-1.076838	s2 +=	-0.124939
-0.901527	s0 +=	-0.124939
-0.901527	s1 +=	-0.124939
-0.601278	*const_cast<int*>(&x) +=	-0.124939
-0.601278	15] +=	-0.124939
-0.601278	matrix[FuncRow(i)][FuncCol(i)] +=	-0.124939
-0.601278	matrix[i][j] +=	-0.124939
-3.818263	the integers	-0.124939
-2.837689	of integers	-0.124939
-3.063296	to integers	-0.124939
-2.914139	and integers	-0.124939
-3.293265	are integers	-0.124939
-2.660692	using integers	-0.124939
-1.908118	two integers	-0.124939
-1.649207	64-bit integers	-0.124939
-1.542785	between integers	-0.602060
-1.537933	32-bit integers	-0.124939
-0.872325	unsigned integers	-0.221849
-2.139686	four integers	-0.124939
-2.125005	eight integers	-0.124939
-1.299555	signed integers	-0.124939
-1.676344	16-bit integers	-0.124939
-1.552014	multiply integers	-0.124939
-1.202277	sixteen integers	-0.124939
-0.504904	8-bit integers	-0.124939
-2.543714	the option	-0.166331
-2.239026	with option	-0.124939
-2.418048	This option	-0.425969
-1.690045	an option	-0.492916
-1.837031	compiler option	-0.346788
-2.936325	this option	-0.124939
-2.518911	any option	-0.124939
-2.387877	optimization option	-0.124939
-2.006732	handling option	-0.124939
-1.048013	output option	-0.124939
-1.744364	unroll option	-0.124939
-1.674707	(e.g. option	-0.124939
-1.501326	12 option	-0.124939
-1.077538	-fpie option	-0.124939
-0.901994	annotation option	-0.124939
-0.601512	file" option	-0.124939
-2.743848	is good	-0.249877
-2.123511	a good	-0.316824
-3.653838	of good	-0.124939
-3.411540	for good	-0.124939
-2.723995	are good	-0.425969
-2.041531	as good	-0.726999
-3.062941	not good	-0.124939
-1.611696	A good	-0.301030
-2.748865	all good	-0.124939
-2.547135	many good	-0.124939
-1.389742	very good	-0.124939
-1.077939	mean good	-0.124939
-3.033480	the power	-0.425969
-1.813012	a power	-1.572097
-2.722724	integer power	-0.124939
-2.044220	Integer power	-0.124939
-1.282234	high power	-0.425969
-1.712249	processing power	-0.124939
-1.554109	low power	-0.124939
-1.299462	computing power	-0.124939
-0.601746	computational power	-0.124939
-2.691365	the matrix	-0.182931
-2.193038	a matrix	-0.249877
-3.634807	to matrix	-0.124939
-2.707926	in matrix	-0.301030
-3.394752	for matrix	-0.124939
-2.225011	A matrix	-0.425969
-2.138981	different matrix	-0.124939
-1.726189	64 matrix	-0.124939
-2.230832	big matrix	-0.124939
-1.393297	copy matrix	-0.425969
-1.180881	512 matrix	-0.124939
-1.712167	per matrix	-0.124939
-1.598224	define matrix	-0.124939
-1.444756	transpose matrix	-0.124939
-3.605133	a Linux	-0.124939
-3.590100	of Linux	-0.124939
-3.595376	to Linux	-0.124939
-3.412680	and Linux	-0.124939
-2.549565	in Linux	-0.425969
-2.556241	for Linux	-0.124939
-3.231350	// Linux	-0.124939
-3.104947	on Linux	-0.124939
-1.836675	compiler Linux	-0.346788
-1.207384	64-bit Linux	-0.367977
-2.564065	In Linux	-0.124939
-1.378840	32-bit Linux	-0.249877
-1.737651	bit Linux	-0.124939
-2.263446	about Linux	-0.124939
-2.238368	Windows Linux	-0.124939
-1.823371	supports Linux	-0.124939
-0.851254	Windows, Linux	-0.602060
-1.077338	_WIN32 Linux	-0.124939
-1.897742	not been	-0.221849
-1.314673	have been	-0.159701
-0.977906	has been	-0.147215
-1.300444	already been	-0.124939
-2.842749	to cause	-0.124939
-2.550606	and cause	-0.124939
-3.419422	that cause	-0.124939
-1.637366	can cause	-0.124939
-1.731583	may cause	-0.301030
-1.809914	will cause	-0.425969
-1.544641	doesn't cause	-0.124939
-1.523831	common cause	-0.425969
-2.114142	would cause	-0.124939
-1.502490	frequent cause	-0.124939
-1.300268	schemes cause	-0.124939
-2.467580	the AVX	-0.460731
-2.805298	The AVX	-0.124939
-3.370726	for AVX	-0.124939
-2.214315	// AVX	-0.425969
-3.196873	if AVX	-0.124939
-2.485600	with AVX	-0.425969
-2.923894	use AVX	-0.124939
-2.187657	from AVX	-0.425969
-2.692913	no AVX	-0.124939
-2.690118	set AVX	-0.124939
-2.430084	4 AVX	-0.124939
-1.409181	without AVX	-0.124939
-2.291390	instructions AVX	-0.124939
-1.298824	256 AVX	-0.124939
-1.501671	system. AVX	-0.124939
-0.204004	12.1 AVX	-0.425969
-0.601478	wmmintrin.h AVX	-0.124939
-3.631536	of classes	-0.124939
-3.458707	and classes	-0.124939
-3.577965	in classes	-0.124939
-1.137202	vector classes	-0.124939
-2.647612	into classes	-0.124939
-2.573801	C++ classes	-0.124939
-0.683464	container classes	-0.182931
-1.263279	string classes	-0.124939
-1.678840	child classes	-0.124939
-0.902705	Vector classes	-0.425969
-1.554380	parent classes	-0.124939
-0.090140	Container classes	-0.124939
-0.601579	wrapper classes	-0.124939
-0.601579	generations classes	-0.124939
-2.256197	is done	-0.321233
-3.496741	and done	-0.124939
-1.771079	be done	-0.335792
-2.119527	are done	-0.204120
-3.031232	than done	-0.124939
-2.298354	have done	-0.124939
-2.842223	has done	-0.124939
-2.751593	all done	-0.124939
-2.054286	was done	-0.124939
-1.874006	usually done	-0.124939
-1.299128	necessarily done	-0.124939
-2.147035	is therefore	-0.229674
-2.109733	and therefore	-0.170696
-2.733604	are therefore	-0.425969
-2.246320	can therefore	-0.249877
-3.034388	may therefore	-0.124939
-2.223190	will therefore	-0.124939
-1.422432	should therefore	-0.425969
-3.671409	a precision	-0.124939
-3.086192	of precision	-0.124939
-2.130078	same precision	-0.124939
-2.104149	point precision	-0.124939
-0.871511	double precision	-0.276206
-0.826922	single precision	-0.271067
-1.281894	high precision	-0.124939
-1.853616	require precision	-0.124939
-1.377730	mixed precision	-0.124939
-0.204037	Single precision	-0.124939
-0.601646	High precision	-0.124939
-0.601646	(single precision	-0.124939
-3.157832	the line	-0.124939
-3.091157	a line	-0.425969
-3.153196	or line	-0.124939
-3.141728	by line	-0.124939
-3.113255	with line	-0.124939
-2.929153	this line	-0.124939
-2.061511	one line	-0.425969
-0.991644	cache line	-0.301030
-2.676599	each line	-0.124939
-1.491210	matrix line	-0.124939
-2.055320	above line	-0.124939
-2.018300	next line	-0.124939
-1.955858	last line	-0.124939
-1.915142	Each line	-0.124939
-1.500862	interpreted line	-0.124939
-1.299187	memset line	-0.124939
-0.346582	command line	-0.301030
-0.601445	Command line	-0.124939
-3.342281	and works	-0.124939
-2.251243	that works	-0.221849
-3.202918	it works	-0.124939
-3.076281	code works	-0.124939
-2.164166	This works	-0.301030
-2.999217	compiler works	-0.124939
-2.280282	this works	-0.124939
-2.889101	It works	-0.124939
-2.711374	one works	-0.124939
-1.630209	cache works	-0.425969
-2.552524	also works	-0.124939
-1.647385	method works	-0.124939
-2.224764	doesn't works	-0.124939
-2.034180	dispatching works	-0.124939
-2.016747	branches works	-0.124939
-2.005292	implementation works	-0.124939
-1.262106	mechanism works	-0.124939
-1.913367	linking works	-0.124939
-1.796097	profiler works	-0.124939
-1.766410	vectorization works	-0.124939
-1.297643	already works	-0.124939
-0.379871	"what works	-0.425969
-0.901393	14.13b works	-0.124939
-0.901393	Multithreading works	-0.124939
-0.601211	(|) works	-0.124939
-3.008474	the optimized	-0.425969
-2.880851	is optimized	-0.124939
-3.381102	and optimized	-0.124939
-3.537950	in optimized	-0.124939
-2.796633	The optimized	-0.124939
-2.333485	be optimized	-0.124939
-2.707217	are optimized	-0.124939
-3.132830	or optimized	-0.124939
-3.020143	not optimized	-0.124939
-3.012202	an optimized	-0.124939
-3.001179	you optimized	-0.124939
-2.670494	each optimized	-0.124939
-1.532709	best optimized	-0.124939
-2.118755	contains optimized	-0.124939
-1.392024	well optimized	-0.124939
-2.081415	simply optimized	-0.124939
-1.638469	includes optimized	-0.124939
-0.856188	fully optimized	-0.124939
-0.332994	highly optimized	-0.249877
-0.379925	Not optimized	-0.124939
-0.901660	carefully optimized	-0.124939
-3.093245	is inside	-0.124939
-3.257224	be inside	-0.124939
-3.199920	it inside	-0.124939
-3.073039	code inside	-0.124939
-2.823576	memory inside	-0.124939
-2.739298	used inside	-0.124939
-2.531228	objects inside	-0.124939
-2.503668	variable inside	-0.124939
-2.471692	table inside	-0.124939
-1.377906	branch inside	-0.726999
-2.446396	elements inside	-0.124939
-2.235412	arrays inside	-0.124939
-1.298095	calculations inside	-0.602060
-2.061860	counter inside	-0.124939
-2.016102	branches inside	-0.124939
-0.580408	declared inside	-0.602060
-1.672494	counters inside	-0.124939
-1.636441	condition inside	-0.124939
-0.943424	defined inside	-0.124939
-1.442763	body inside	-0.124939
-1.374942	happens inside	-0.124939
-1.296634	Objects inside	-0.124939
-1.297510	nothing inside	-0.124939
-0.901326	etc.) inside	-0.124939
-0.601177	entirely inside	-0.124939
-0.601177	log) inside	-0.124939
-3.397110	the manual	-0.425969
-3.659637	a manual	-0.124939
-2.931815	and manual	-0.124939
-2.058727	in manual	-0.564271
-1.894211	This manual	-0.346788
-2.363356	compiler manual	-0.124939
-1.881769	this manual	-0.425969
-2.447973	See manual	-0.124939
-2.274429	Gnu manual	-0.124939
-1.826280	my manual	-0.124939
-0.601888	(See manual	-0.726999
-0.601702	present manual	-0.425969
-0.902195	vectorclass manual	-0.124939
-3.496884	a /	-0.124939
-1.429981	b /	-0.124939
-2.597568	i /	-0.124939
-1.870756	Here, /	-0.124939
-1.819478	5 /	-0.124939
-1.794138	1. /	-0.124939
-1.739125	temp /	-0.124939
-1.702973	100 /	-0.124939
-1.548238	eax /	-0.124939
-1.441095	xn /	-0.124939
-1.441095	Borland /	-0.124939
-1.374149	CodeGear /	-0.124939
-1.200069	size) /	-0.124939
-0.504743	int)b /	-0.124939
-1.076139	Signed /	-0.124939
-0.124826	(2n /	-0.602060
-1.076139	kb /	-0.124939
-0.901060	a2 /	-0.124939
-0.901060	a1 /	-0.124939
-0.901060	2.0 /	-0.124939
-0.901060	8192 /	-0.124939
-0.901060	(a+1) /	-0.124939
-0.601044	a2*b1) /	-0.124939
-0.601044	ebx,eax /	-0.124939
-0.601044	(10000 /	-0.124939
-0.601044	int)a /	-0.124939
-0.601044	address) /	-0.124939
-0.601044	(1. /	-0.124939
-0.601044	80x86 /	-0.124939
-0.601044	(0x2710 /	-0.124939
-3.150330	is explained	-0.124939
-2.484753	are explained	-0.301030
-1.173571	as explained	-0.744727
-1.745477	further explained	-0.124939
-0.488003	reasons explained	-0.726999
-1.203132	As explained	-0.124939
-0.902596	mechanisms explained	-0.124939
-3.417397	the calculated	-0.124939
-2.216931	is calculated	-0.249877
-1.706075	be calculated	-0.397940
-2.729735	are calculated	-0.124939
-2.846058	has calculated	-0.124939
-2.119133	only calculated	-0.124939
-2.334210	always calculated	-0.124939
-0.601746	17is calculated	-0.124939
-0.601746	16is calculated	-0.124939
-2.324381	the calculation	-0.903090
-2.215399	The calculation	-0.425969
-3.093028	This calculation	-0.124939
-2.954790	this calculation	-0.124939
-2.888906	A calculation	-0.124939
-2.011955	each calculation	-0.425969
-2.495514	order calculation	-0.124939
-1.344658	address calculation	-0.124939
-1.712208	total calculation	-0.124939
-0.601679	estimated calculation	-0.124939
-0.601679	Address calculation	-0.124939
-1.304217	} };	-0.689210
-2.451006	elements };	-0.124939
-2.416542	bit };	-0.124939
-2.109265	structure };	-0.124939
-1.299030	c; };	-0.124939
-1.741790	d; };	-0.124939
-1.442883	19 };	-0.124939
-1.077038	perhaps };	-0.124939
-0.379925	b;} };	-0.124939
-0.203977	c:2; };	-0.124939
-0.901660	Saturday };	-0.124939
-0.203977	f(); };	-0.124939
-0.901660	b[1000]; };	-0.124939
-0.901660	NotPolymorphic(); };	-0.124939
-0.901660	0x40 };	-0.124939
-0.203977	:1;//signbit };	-0.425969
-0.601344	~C1(); };	-0.124939
-0.601344	UnusedFiller; };	-0.124939
-0.601344	abc; };	-0.124939
-0.601344	"Delta" };	-0.124939
-0.601344	4.; };	-0.124939
-3.098912	is 128	-0.124939
-2.851326	a 128	-0.602060
-1.966180	int 128	-0.124939
-2.982246	than 128	-0.124939
-2.643023	page 128	-0.124939
-2.634042	double 128	-0.124939
-2.574877	float 128	-0.124939
-1.807693	2 128	-0.425969
-1.747703	4 128	-0.124939
-2.415665	8 128	-0.124939
-1.729759	first 128	-0.124939
-2.315510	16 128	-0.124939
-2.325604	SSE2 128	-0.124939
-1.455652	128 128	-0.124939
-1.951709	dispatcher 128	-0.124939
-1.262700	char 128	-0.124939
-1.673151	% 128	-0.124939
-1.637774	SSE 128	-0.124939
-1.441479	int64_t 128	-0.124939
-0.601570	strlen 128	-0.425969
-1.200866	uint64_t 128	-0.124939
-1.201677	12.2 128	-0.124939
-0.901460	......................................................................... 128	-0.124939
-0.601244	(MMX), 128	-0.124939
-3.746182	the uses	-0.124939
-3.351666	and uses	-0.124939
-2.251642	that uses	-0.124939
-3.205937	it uses	-0.124939
-2.536878	function uses	-0.124939
-3.079548	code uses	-0.124939
-2.997428	int uses	-0.124939
-2.356824	compiler uses	-0.124939
-2.204863	It uses	-0.124939
-1.752261	program uses	-0.249877
-2.634042	double uses	-0.124939
-2.574877	float uses	-0.124939
-2.463894	software uses	-0.124939
-2.016605	typically uses	-0.124939
-1.331200	application uses	-0.124939
-1.315783	implementation uses	-0.124939
-1.999430	their uses	-0.124939
-1.984756	never uses	-0.124939
-1.872724	feature uses	-0.124939
-1.740098	sometimes uses	-0.124939
-1.739297	still uses	-0.124939
-1.549015	typical uses	-0.124939
-1.442286	vector, uses	-0.124939
-0.601244	CString uses	-0.124939
-2.745773	the four	-0.191886
-3.609800	is four	-0.124939
-2.573999	of four	-0.124939
-3.567989	to four	-0.124939
-3.272614	are four	-0.124939
-2.565332	or four	-0.124939
-2.481460	with four	-0.124939
-2.993070	than four	-0.124939
-2.936444	have four	-0.124939
-2.823540	has four	-0.124939
-1.862784	only four	-0.124939
-2.669478	do four	-0.124939
-2.409525	first four	-0.124939
-2.142197	get four	-0.124939
-2.096599	every four	-0.124939
-2.015880	next four	-0.124939
-1.935973	read four	-0.124939
-0.943053	e.g. four	-0.124939
-1.442883	hold four	-0.124939
-1.077038	each, four	-0.124939
-0.901660	calculates four	-0.124939
-2.993070	than functions.	-0.124939
-2.780674	different functions.	-0.124939
-1.514645	library functions.	-0.249877
-2.579831	multiple functions.	-0.124939
-2.571327	two functions.	-0.124939
-1.095728	member functions.	-0.367977
-1.524196	virtual functions.	-0.124939
-2.094508	hardware functions.	-0.124939
-1.045042	intrinsic functions.	-0.124939
-1.044804	mathematical functions.	-0.124939
-1.952479	string functions.	-0.124939
-1.949004	three functions.	-0.124939
-1.048074	frame functions.	-0.124939
-1.676257	overloaded functions.	-0.124939
-1.500166	polymorphic functions.	-0.124939
-1.077038	speed-critical functions.	-0.124939
-1.077751	trigonometric functions.	-0.124939
-0.901660	non-member functions.	-0.124939
-0.901660	thread-safe functions.	-0.124939
-0.901660	non-virtual functions.	-0.124939
-0.601344	unreferenced functions.	-0.124939
-3.600648	is another	-0.124939
-2.824803	to another	-0.124939
-3.371067	and another	-0.124939
-2.701512	in another	-0.124939
-3.333421	for another	-0.124939
-3.126248	or another	-0.124939
-2.110093	by another	-0.249877
-2.079847	with another	-0.124939
-3.090485	on another	-0.124939
-2.184851	from another	-0.425969
-2.667849	do another	-0.124939
-2.275349	making another	-0.124939
-2.241470	while another	-0.124939
-1.140062	calls another	-0.249877
-2.222165	Use another	-0.124939
-2.174412	inside another	-0.124939
-1.871177	goes another	-0.124939
-1.708571	causes another	-0.124939
-1.596104	set, another	-0.124939
-1.202623	produces another	-0.124939
-0.601311	interface, another	-0.124939
-0.601311	encounter another	-0.124939
-3.018719	the parameters	-0.124939
-3.610324	of parameters	-0.124939
-3.382025	The parameters	-0.124939
-2.294863	function parameters	-0.124939
-3.077636	as parameters	-0.124939
-2.824324	vector parameters	-0.124939
-2.102302	point parameters	-0.425969
-1.786405	integer parameters	-0.301030
-1.169952	template parameters	-0.346788
-2.265037	its parameters	-0.124939
-1.456004	four parameters	-0.124939
-0.747859	Function parameters	-0.367977
-1.771077	desired parameters	-0.124939
-1.709469	macro parameters	-0.124939
-0.601621	fourteen parameters	-0.425969
-0.901994	(three parameters	-0.124939
-2.157149	to get	-0.162727
-3.423738	and get	-0.124939
-2.400737	can get	-0.124939
-3.236476	// get	-0.124939
-3.160203	or get	-0.124939
-2.410920	not get	-0.124939
-2.346396	may get	-0.425969
-3.007873	you get	-0.124939
-1.808609	will get	-0.124939
-2.724608	should get	-0.124939
-1.843517	we get	-0.124939
-2.052402	both get	-0.124939
-2.021368	typically get	-0.124939
-1.991355	don't get	-0.124939
-1.377310	soon get	-0.124939
-1.201798	(1) get	-0.124939
-0.901927	(4) get	-0.124939
-2.355364	= b;	-0.124939
-1.762945	int b;	-0.301030
-1.727329	double b;	-0.124939
-1.607445	& b;	-0.124939
-0.571997	a, b;	-0.602060
-2.192621	+= b;	-0.124939
-2.097899	: b;	-0.124939
-1.954477	&& b;	-0.124939
-1.955355	| b;	-0.124939
-1.936490	|| b;	-0.124939
-1.874584	0, b;	-0.124939
-0.793597	bool b;	-0.301030
-1.077839	a[100], b;	-0.124939
-3.363740	the check	-0.124939
-3.585112	a check	-0.124939
-2.239623	to check	-0.425969
-2.399455	can check	-0.301030
-3.221275	// check	-0.124939
-2.407592	not check	-0.124939
-3.067854	This check	-0.124939
-2.883705	then check	-0.124939
-1.614738	no check	-0.726999
-2.256108	extra check	-0.124939
-2.250714	must check	-0.124939
-2.004254	automatically check	-0.124939
-1.974156	automatic check	-0.124939
-1.952336	runtime check	-0.124939
-1.637352	bounds check	-0.124939
-1.596633	might check	-0.124939
-1.595960	input check	-0.124939
-1.550202	brand check	-0.124939
-1.202077	missing check	-0.124939
-1.201398	(1) check	-0.124939
-2.180234	is advantageous	-0.539912
-1.848525	be advantageous	-0.564271
-3.361856	are advantageous	-0.124939
-1.896037	not advantageous	-0.346788
-2.253621	more advantageous	-0.124939
-2.437833	less advantageous	-0.124939
-2.402001	how advantageous	-0.124939
-2.334210	always advantageous	-0.124939
-2.042051	particular advantageous	-0.124939
-2.405235	is implemented	-0.329059
-1.705900	be implemented	-0.602060
-2.210617	are implemented	-0.221849
-3.217286	if implemented	-0.124939
-2.299038	have implemented	-0.425969
-2.405563	often implemented	-0.124939
-2.165078	calculation implemented	-0.124939
-2.041636	programs implemented	-0.124939
-2.026183	typically implemented	-0.124939
-2.029961	preferably implemented	-0.124939
-2.690391	the problem	-0.329059
-2.717161	a problem	-0.124939
-2.416165	The problem	-0.124939
-2.418657	This problem	-0.124939
-1.599492	this problem	-0.367977
-2.873340	A problem	-0.124939
-2.697277	no problem	-0.124939
-2.229830	big problem	-0.124939
-1.710373	alignment problem	-0.124939
-1.710882	causes problem	-0.124939
-1.676630	Another problem	-0.124939
-1.552711	usability problem	-0.124939
-1.444078	serious problem	-0.124939
-1.378669	worst problem	-0.124939
-0.902061	safety problem	-0.124939
-2.144810	is known	-0.492916
-3.648175	a known	-0.124939
-3.631536	of known	-0.124939
-3.458707	and known	-0.124939
-3.339219	be known	-0.124939
-1.803498	not known	-0.903090
-2.791938	only known	-0.124939
-2.711265	integer known	-0.124939
-1.981085	size known	-0.124939
-2.650014	pointer known	-0.124939
-2.522192	any known	-0.124939
-2.286350	constant known	-0.124939
-1.590406	error known	-0.425969
-0.601675	already known	-0.124939
-1.479592	for (i	-1.587337
-2.018355	if (i	-0.346788
-2.261210	while (i	-0.124939
-3.143781	the solution	-0.301030
-3.556713	a solution	-0.124939
-3.327765	The solution	-0.124939
-2.413808	This solution	-0.124939
-2.035015	this solution	-0.301030
-2.742162	which solution	-0.124939
-1.265458	efficient solution	-0.204120
-2.281581	simple solution	-0.124939
-2.214884	best solution	-0.124939
-2.090315	standard solution	-0.124939
-1.347594	optimal solution	-0.124939
-2.000892	complicated solution	-0.124939
-1.949158	better solution	-0.124939
-0.901995	alternative solution	-0.124939
-1.499703	fastest solution	-0.124939
-1.376313	powerful solution	-0.124939
-1.202557	clean solution	-0.124939
-1.201777	reasonable solution	-0.124939
-1.076838	Which solution	-0.124939
-0.901527	viable solution	-0.124939
-0.901527	universal solution	-0.124939
-0.601278	radical solution	-0.124939
-0.601278	ultimate solution	-0.124939
-2.905460	the container	-0.124939
-2.322984	a container	-0.301030
-3.065041	of container	-0.425969
-3.357936	The container	-0.124939
-3.374852	that container	-0.124939
-2.568937	or container	-0.124939
-2.816474	make container	-0.124939
-2.750240	other container	-0.124939
-2.724403	one container	-0.124939
-2.655076	example container	-0.124939
-2.569782	such container	-0.124939
-1.632359	efficient container	-0.301030
-2.093159	standard container	-0.124939
-1.891560	own container	-0.124939
-1.871820	made container	-0.124939
-1.797822	STL container	-0.124939
-1.710200	accessing container	-0.124939
-1.674798	containing container	-0.124939
-1.077238	well-tested container	-0.124939
-3.169409	the advantage	-0.301030
-2.075764	The advantage	-1.028029
-3.084474	This advantage	-0.124939
-1.765648	an advantage	-0.301030
-2.699476	no advantage	-0.124939
-2.574617	such advantage	-0.124939
-2.539764	takes advantage	-0.124939
-0.965289	take advantage	-1.028029
-2.225530	speed advantage	-0.124939
-2.218990	specific advantage	-0.124939
-1.826292	main advantage	-0.124939
-1.711690	maximum advantage	-0.124939
-1.675500	full advantage	-0.124939
-1.077739	took advantage	-0.124939
-1.680714	// Function	-0.602060
-3.092868	code Function	-0.124939
-2.384336	libraries Function	-0.124939
-2.334825	method Function	-0.124939
-2.256108	function. Function	-0.124939
-2.147576	parameters Function	-0.124939
-2.085307	copy Function	-0.124939
-1.392001	memory. Function	-0.425969
-1.637352	instructions. Function	-0.124939
-1.298308	Functions Function	-0.124939
-1.297630	itself. Function	-0.124939
-1.201398	operators. Function	-0.124939
-0.379938	7.7 Function	-0.425969
-0.379938	7.16 Function	-0.425969
-0.203983	7.15 Function	-0.425969
-0.901727	__fastcall. Function	-0.124939
-0.601378	12.6. Function	-0.124939
-0.601378	/Gr Function	-0.124939
-0.601378	about. Function	-0.124939
-0.601378	libircmt.lib. Function	-0.124939
-3.559230	to support	-0.124939
-3.333421	for support	-0.124939
-2.252440	that support	-0.124939
-2.161192	not support	-0.124939
-2.290892	have support	-0.425969
-2.866889	will support	-0.124939
-2.156638	has support	-0.425969
-2.810480	make support	-0.124939
-2.499919	some support	-0.124939
-2.382060	libraries support	-0.124939
-1.237054	AVX support	-0.124939
-1.151471	hardware support	-0.602060
-2.003020	handling support	-0.124939
-1.988899	don't support	-0.124939
-1.949774	better support	-0.124939
-1.913910	requires support	-0.124939
-1.106012	off support	-0.425969
-1.676767	OS support	-0.124939
-1.499194	profiling support	-0.124939
-1.443426	debugging support	-0.124939
-0.901593	inherent support	-0.124939
-0.601311	excellent support	-0.124939
-3.859350	the supported	-0.124939
-2.254561	is supported	-0.689210
-2.924658	and supported	-0.124939
-2.772132	that supported	-0.124939
-2.476075	are supported	-0.602060
-2.540693	if supported	-0.425969
-3.082279	as supported	-0.124939
-2.413153	not supported	-0.124939
-2.789912	only supported	-0.124939
-1.641294	SSE2 supported	-0.124939
-2.266626	about supported	-0.124939
-1.491894	AVX supported	-0.124939
-0.380005	Get supported	-0.425969
-0.902061	minimum supported	-0.124939
-0.601545	Detect supported	-0.124939
-3.638473	is eight	-0.124939
-3.068000	of eight	-0.425969
-2.549565	in eight	-0.726999
-3.293265	are eight	-0.124939
-3.153196	or eight	-0.124939
-2.111770	by eight	-0.726999
-2.945457	have eight	-0.124939
-2.749576	but eight	-0.124939
-2.643492	into eight	-0.124939
-1.731189	first eight	-0.124939
-2.347216	these eight	-0.124939
-2.109632	run eight	-0.124939
-1.674522	handle eight	-0.124939
-0.093862	Load eight	-1.028029
-0.856717	involves eight	-0.124939
-1.077338	each, eight	-0.124939
-0.901860	handles eight	-0.124939
-0.601445	reloaded eight	-0.124939
-3.859350	the operators	-0.124939
-2.694351	and operators	-0.124939
-2.809697	The operators	-0.124939
-2.854341	from operators	-0.124939
-2.740784	all operators	-0.124939
-2.754558	but operators	-0.124939
-2.209626	These operators	-0.124939
-2.041053	Integer operators	-0.124939
-0.769127	Boolean operators	-0.124939
-0.981551	overloaded operators	-0.124939
-0.112998	bitwise operators	-0.279841
-1.298462	increment operators	-0.124939
-0.504944	Overloaded operators	-0.124939
-0.380005	decrement operators	-0.124939
-0.601545	relational operators	-0.124939
-3.904735	the few	-0.124939
-1.919946	a few	-0.236913
-3.416374	The few	-0.124939
-3.140572	with few	-0.124939
-3.096516	as few	-0.124939
-2.226732	A few	-0.124939
-2.796472	same few	-0.124939
-2.796018	only few	-0.124939
-2.765072	which few	-0.124939
-2.486482	very few	-0.124939
-2.148928	uses few	-0.124939
-1.774435	Unfortunately, few	-0.124939
-2.520772	that contains	-0.124939
-2.451310	code contains	-0.124939
-1.752699	program contains	-0.124939
-2.763384	loop contains	-0.124939
-2.744195	which contains	-0.124939
-1.514512	library contains	-0.249877
-1.798306	software contains	-0.425969
-1.715644	often contains	-0.124939
-2.006664	expression contains	-0.124939
-1.821408	section contains	-0.124939
-1.707835	testing contains	-0.124939
-1.636758	ebx contains	-0.124939
-0.805132	now contains	-0.124939
-1.500676	ecx contains	-0.124939
-1.442684	collection contains	-0.124939
-1.376479	edx contains	-0.124939
-1.201877	www.agner.org/optimize/cppexamples.zip contains	-0.124939
-0.901593	Library" contains	-0.124939
-0.601311	www.agner.org/optimize/asmlib.zip contains	-0.124939
-0.601311	http://www.agner.org/optimize/asmlib.zip contains	-0.124939
-0.601311	(CGrandParent) contains	-0.124939
-0.601311	(CParent<>) contains	-0.124939
-3.047696	of whether	-0.124939
-3.342281	and whether	-0.124939
-3.079945	on whether	-0.124939
-2.563248	efficient whether	-0.124939
-2.341829	sure whether	-0.124939
-2.327385	out whether	-0.124939
-2.256117	about whether	-0.124939
-2.149944	check whether	-0.124939
-2.051065	fast whether	-0.124939
-1.315690	see whether	-0.124939
-1.957878	difference whether	-0.124939
-1.894671	shows whether	-0.124939
-1.873230	know whether	-0.124939
-1.708419	clear whether	-0.124939
-1.674493	predict whether	-0.124939
-0.943294	consider whether	-0.124939
-0.647251	checks whether	-0.602060
-1.441248	compile-time whether	-0.124939
-1.375141	evaluate whether	-0.124939
-0.149662	deciding whether	-0.522879
-1.076638	determine whether	-0.124939
-0.901393	considering whether	-0.124939
-0.901393	determines whether	-0.124939
-0.901393	decides whether	-0.124939
-0.601211	correctly whether	-0.124939
-0.384474	100; i++)	-0.602060
-1.853974	2; i++)	-0.124939
-0.075225	size; i++)	-0.522879
-0.856563	n; i++)	-0.124939
-1.445715	256; i++)	-0.124939
-1.299108	1000; i++)	-0.124939
-1.077739	i<100; i++)	-0.124939
-0.902128	20; i++)	-0.124939
-0.902128	i<n; i++)	-0.124939
-0.204024	rows; i++)	-0.124939
-0.204024	NumberOfTests; i++)	-0.425969
-0.601579	arraysize; i++)	-0.124939
-0.601579	ArraySize; i++)	-0.124939
-0.601579	list.Size(); i++)	-0.124939
-3.020797	the list	-0.124939
-2.516724	a list	-0.301030
-3.620800	of list	-0.124939
-3.624609	to list	-0.124939
-3.572018	in list	-0.124939
-2.873340	A list	-0.124939
-2.484999	long list	-0.124939
-2.328233	following list	-0.124939
-0.601766	linked list	-0.204120
-0.877523	negative list	-0.602060
-0.877523	positive list	-0.301030
-1.638332	entire list	-0.124939
-1.598469	linear list	-0.124939
-1.444590	smallest list	-0.124939
-0.124882	sorted list	-0.124939
-3.339285	that would	-0.124939
-2.569088	it would	-0.425969
-2.163827	This would	-0.124939
-2.355647	compiler would	-0.124939
-2.992953	you would	-0.124939
-2.279484	this would	-0.124939
-2.756160	loop would	-0.124939
-2.736118	which would	-0.124939
-2.503668	variable would	-0.124939
-2.516669	we would	-0.124939
-2.169064	line would	-0.124939
-2.144451	parameters would	-0.124939
-2.133931	solution would	-0.124939
-2.014408	multiplication would	-0.124939
-2.004836	implementation would	-0.124939
-1.744668	d would	-0.124939
-1.636441	loops would	-0.124939
-1.593313	metaprogramming would	-0.124939
-1.498138	chain would	-0.124939
-0.805121	who would	-0.124939
-0.601354	reduction would	-0.124939
-1.296634	15.1c would	-0.124939
-1.200600	otherwise would	-0.124939
-1.076539	0x273F would	-0.124939
-1.076539	logarithm would	-0.124939
-0.901326	sizeof(S1) would	-0.124939
-3.920982	the likely	-0.124939
-1.941983	is likely	-0.990240
-3.345650	are likely	-0.124939
-2.925775	more likely	-0.124939
-2.686846	most likely	-0.124939
-2.576858	also likely	-0.124939
-1.802687	very likely	-0.124939
-2.435871	less likely	-0.124939
-2.184071	therefore likely	-0.124939
-2.069755	quite likely	-0.124939
-1.502869	equally likely	-0.124939
-2.823578	the structure	-0.124939
-2.515889	a structure	-0.602060
-2.841197	of structure	-0.301030
-2.064656	or structure	-0.346788
-3.078863	This structure	-0.124939
-2.869535	A structure	-0.124939
-2.170754	data structure	-0.124939
-2.826781	program structure	-0.124939
-2.783979	same structure	-0.124939
-1.774328	whole structure	-0.124939
-1.676874	logical structure	-0.124939
-1.638001	parallel structure	-0.124939
-1.552479	logic structure	-0.124939
-1.299934	multidimensional structure	-0.124939
-1.201931	pipeline structure	-0.124939
-1.201931	class, structure	-0.124939
-2.887813	is doing	-0.124939
-2.489245	of doing	-0.204120
-3.423738	and doing	-0.124939
-2.557437	for doing	-0.124939
-2.319984	are doing	-0.124939
-3.146252	by doing	-0.124939
-3.038646	not doing	-0.124939
-3.007935	than doing	-0.124939
-2.935451	time doing	-0.124939
-2.246399	when doing	-0.124939
-1.778749	from doing	-0.124939
-2.848012	at doing	-0.124939
-2.780089	CPU doing	-0.124939
-2.772586	loop doing	-0.124939
-1.912790	actually doing	-0.124939
-1.771795	fact doing	-0.124939
-0.901927	busy doing	-0.124939
-2.408283	to run	-0.367977
-3.435084	and run	-0.124939
-2.525240	that run	-0.301030
-2.243629	can run	-0.124939
-2.095864	may run	-0.124939
-3.009563	you run	-0.124939
-1.693159	will run	-0.221849
-2.889843	then run	-0.124939
-2.850076	at run	-0.124939
-2.787895	only run	-0.124939
-2.725760	should run	-0.124939
-2.680718	each run	-0.124939
-2.360896	test run	-0.124939
-2.322617	always run	-0.124939
-1.952951	applications run	-0.124939
-1.742195	still run	-0.124939
-1.951255	to calculate	-0.558594
-2.950241	and calculate	-0.124939
-2.038702	can calculate	-0.124939
-3.031620	may calculate	-0.124939
-2.222627	will calculate	-0.124939
-2.804296	only calculate	-0.124939
-2.258963	must calculate	-0.124939
-1.554622	could calculate	-0.124939
-3.873958	the inline	-0.124939
-2.572864	to inline	-0.221849
-3.394752	for inline	-0.124939
-3.086973	as inline	-0.124939
-2.380232	an inline	-0.124939
-2.931712	use inline	-0.124939
-2.790181	same inline	-0.124939
-2.798038	functions inline	-0.124939
-0.854405	static inline	-0.522879
-2.453037	cannot inline	-0.124939
-2.432012	call inline	-0.124939
-2.233167	An inline	-0.124939
-2.230367	Use inline	-0.124939
-1.773838	functions, inline	-0.124939
-3.053401	of every	-0.124939
-2.550310	for every	-0.124939
-3.354173	that every	-0.124939
-3.119765	or every	-0.124939
-2.212992	on every	-0.301030
-2.911725	this every	-0.124939
-2.176023	at every	-0.124939
-2.645294	example every	-0.124939
-2.429147	called every	-0.124939
-2.180772	done every	-0.124939
-2.118909	list every	-0.124939
-2.018038	branches every	-0.124939
-1.298376	block every	-0.425969
-1.008242	addition every	-0.301030
-1.823979	loaded every	-0.124939
-1.635692	updates every	-0.124939
-1.635692	e.g. every	-0.124939
-1.499703	misprediction every	-0.124939
-1.376313	evaluated every	-0.124939
-1.297132	updated every	-0.124939
-0.901527	incremented every	-0.124939
-0.601278	re-allocated every	-0.124939
-0.601278	re-calculated every	-0.124939
-2.634379	the standard	-0.301030
-3.094306	a standard	-0.124939
-3.070980	of standard	-0.425969
-2.567266	The standard	-0.124939
-3.370726	for standard	-0.124939
-3.385573	that standard	-0.124939
-3.076084	This standard	-0.124939
-3.007935	than standard	-0.124939
-2.262407	use standard	-0.124939
-1.864872	many standard	-0.124939
-2.288608	simple standard	-0.124939
-2.209308	several standard	-0.124939
-1.826152	C standard	-0.124939
-1.639397	includes standard	-0.124939
-1.552247	include standard	-0.124939
-0.379978	C/C++ standard	-0.425969
-0.601478	IEEE standard	-0.124939
-2.824902	the hardware	-0.204120
-2.516724	a hardware	-0.602060
-2.691713	of hardware	-0.726999
-3.624609	to hardware	-0.124939
-3.572018	in hardware	-0.124939
-3.390361	The hardware	-0.124939
-3.116120	on hardware	-0.124939
-1.906848	has hardware	-0.602060
-2.106646	other hardware	-0.124939
-2.139482	known hardware	-0.124939
-1.332603	microprocessor hardware	-0.124939
-1.995388	intrinsic hardware	-0.124939
-1.378156	defines hardware	-0.124939
-0.902061	direct hardware	-0.124939
-0.601545	catching hardware	-0.124939
-3.557575	is 1	-0.124939
-2.884030	and 1	-0.425969
-3.251041	be 1	-0.124939
-2.321279	or 1	-0.124939
-3.075254	with 1	-0.124939
-2.827894	at 1	-0.124939
-2.596868	+ 1	-0.124939
-2.420313	unsigned 1	-0.124939
-1.723516	64 1	-0.124939
-2.305002	always 1	-0.124939
-2.312311	16 1	-0.124939
-2.320081	32 1	-0.124939
-2.277811	& 1	-0.124939
-1.404042	1 1	-0.124939
-1.741816	bool 1	-0.124939
-1.636177	? 1	-0.124939
-1.443505	ebx, 1	-0.124939
-1.441690	eax, 1	-0.124939
-1.440786	127 1	-0.124939
-1.077351	164 1	-0.124939
-1.076439	ecx, 1	-0.124939
-1.076439	Volume 1	-0.124939
-0.901260	Adding 1	-0.124939
-0.901260	subtracting 1	-0.124939
-0.601144	Contents 1	-0.124939
-0.601144	1023 1	-0.124939
-0.601144	level- 1	-0.124939
-3.078787	a :	-0.124939
-2.717839	one :	-0.124939
-1.957980	b :	-0.425969
-1.808441	2 :	-0.425969
-2.090303	1 :	-0.124939
-2.085990	sign :	-0.124939
-0.761345	exponent :	-0.124939
-0.647401	fraction :	-0.124939
-1.502162	2) :	-0.124939
-0.747289	D :	-0.425969
-0.747289	C1 :	-0.425969
-0.504700	CChild1 :	-0.425969
-1.076938	CChild2 :	-0.124939
-0.901593	CParent :	-0.124939
-0.901593	C2 :	-0.124939
-0.901593	1.0f :	-0.124939
-0.601311	"=m"(n) :	-0.124939
-0.601311	1.5f :	-0.124939
-0.601311	c1() :	-0.124939
-0.601311	EXCEPTION_EXECUTE_HANDLER :	-0.124939
-0.601311	" :	-0.124939
-0.601311	"m"(x) :	-0.124939
-2.566645	to add	-0.124939
-2.903869	and add	-0.124939
-3.364389	that add	-0.124939
-2.212359	// add	-0.124939
-3.132830	or add	-0.124939
-3.020143	not add	-0.124939
-2.344124	may add	-0.124939
-2.209995	then add	-0.124939
-2.113517	instruction add	-0.124939
-2.522176	we add	-0.124939
-2.340391	even add	-0.124939
-2.287299	instructions add	-0.124939
-2.243167	; add	-0.124939
-2.227969	doesn't add	-0.124939
-2.089667	add add	-0.124939
-1.910188	actually add	-0.124939
-1.674139	mov add	-0.124939
-1.077038	register, add	-0.124939
-0.901660	sar add	-0.124939
-0.901660	shr add	-0.124939
-0.601344	86 add	-0.124939
-0.993268	64-bit mode	-0.388180
-1.538901	32-bit mode	-0.124939
-1.326121	bit mode	-0.124939
-0.981769	16-bit mode	-0.124939
-1.445590	rounding mode	-0.124939
-0.187038	console mode	-0.425969
-0.249808	flush-to-zero mode	-0.124939
-0.505024	protected mode	-0.425969
-0.204057	denormals-are-zero mode	-0.124939
-2.877374	a store	-0.301030
-2.000964	to store	-0.209260
-2.552109	and store	-0.425969
-3.323598	can store	-0.124939
-2.350400	may store	-0.124939
-2.897752	will store	-0.124939
-2.863830	memory store	-0.124939
-2.254171	; store	-0.124939
-1.599287	might store	-0.124939
-0.902395	better: store	-0.124939
-2.688448	the values	-0.550907
-2.567266	The values	-0.301030
-2.948503	have values	-0.124939
-1.582670	other values	-0.522879
-2.690118	set values	-0.124939
-1.910030	multiple values	-0.425969
-2.579153	two values	-0.124939
-2.484560	table values	-0.124939
-2.004412	their values	-0.124939
-1.951732	three values	-0.124939
-1.770649	desired values	-0.124939
-1.376732	five values	-0.124939
-1.298129	actual values	-0.124939
-1.298129	valid values	-0.124939
-1.298129	key values	-0.124939
-0.901927	G values	-0.124939
-0.901927	R values	-0.124939
-2.370266	code. All	-0.124939
-2.302588	system All	-0.124939
-2.233444	execution All	-0.124939
-2.060266	program. All	-0.124939
-1.984017	systems. All	-0.124939
-1.866797	options All	-0.124939
-1.817508	are: All	-0.124939
-1.763150	variables. All	-0.124939
-1.704952	operations. All	-0.124939
-1.670190	needed. All	-0.124939
-0.980646	purposes. All	-0.124939
-1.549237	problems. All	-0.124939
-1.547240	storage. All	-0.124939
-1.498085	fast. All	-0.124939
-1.442100	do. All	-0.124939
-1.200069	conditions. All	-0.124939
-1.076139	stored. All	-0.124939
-1.076139	prone. All	-0.124939
-0.901060	relocation. All	-0.124939
-0.901060	feature. All	-0.124939
-0.901060	9.2. All	-0.124939
-0.601044	constructed. All	-0.124939
-0.601044	149 All	-0.124939
-0.601044	steps. All	-0.124939
-0.601044	93). All	-0.124939
-0.601044	www.agner.org/optimize/#vectorclass All	-0.124939
-0.601044	Relocation. All	-0.124939
-0.601044	Comments All	-0.124939
-0.601044	integer). All	-0.124939
-0.601044	unwinding. All	-0.124939
-2.376329	the sign	-0.630089
-3.407529	The sign	-0.124939
-3.403065	for sign	-0.124939
-3.257609	// sign	-0.124939
-3.135898	with sign	-0.124939
-2.127457	int sign	-0.301030
-2.018496	set sign	-0.425969
-2.364370	test sign	-0.124939
-2.334430	out sign	-0.124939
-1.502469	down sign	-0.124939
-1.378422	Set sign	-0.124939
-0.902195	flip sign	-0.124939
-0.601612	inequality sign	-0.124939
-3.377728	the copy	-0.124939
-3.094306	a copy	-0.124939
-2.832869	to copy	-0.301030
-2.690194	and copy	-0.301030
-2.567266	The copy	-0.124939
-3.370726	for copy	-0.124939
-2.615871	// copy	-0.425969
-2.222441	A copy	-0.425969
-2.025235	no copy	-0.124939
-2.269550	Some copy	-0.124939
-2.009531	Most copy	-0.124939
-1.913925	Many copy	-0.124939
-1.802124	unused copy	-0.124939
-1.674311	Any copy	-0.124939
-0.090132	non-inlined copy	-0.249877
-1.077438	backup copy	-0.124939
-0.901927	constructors, copy	-0.124939
-2.835945	of optimizing	-0.124939
-3.401897	and optimizing	-0.124939
-2.946945	in optimizing	-0.124939
-2.792237	for optimizing	-0.124939
-2.513501	by optimizing	-0.124939
-1.855604	an optimizing	-0.823909
-3.000439	than optimizing	-0.124939
-2.906933	when optimizing	-0.124939
-2.843913	at optimizing	-0.124939
-2.816061	because optimizing	-0.124939
-2.475082	between optimizing	-0.124939
-1.130273	An optimizing	-0.726999
-2.218347	best optimizing	-0.124939
-2.191328	good optimizing	-0.124939
-2.076864	All optimizing	-0.124939
-1.846435	advanced optimizing	-0.124939
-1.741745	prevent optimizing	-0.124939
-1.376334	job optimizing	-0.124939
-0.601411	Prevent optimizing	-0.124939
-3.175316	the memory.	-0.124939
-2.848301	of memory.	-0.124939
-2.152635	in memory.	-0.176091
-3.120798	code memory.	-0.124939
-2.165529	program memory.	-0.124939
-1.961142	into memory.	-0.124939
-1.654349	static memory.	-0.124939
-2.272247	stack memory.	-0.124939
-1.503494	allocated memory.	-0.124939
-1.827151	main memory.	-0.124939
-0.726722	RAM memory.	-0.124939
-1.299374	contiguous memory.	-0.124939
-3.859350	the well	-0.124939
-3.637008	a well	-0.124939
-3.446735	and well	-0.124939
-1.545891	as well	-0.564271
-3.048201	not well	-0.124939
-2.648641	pointer well	-0.124939
-2.482077	very well	-0.124939
-1.452854	how well	-0.301030
-1.553530	work well	-0.425969
-1.480613	works well	-0.425969
-2.066490	quite well	-0.124939
-0.877580	predicted well	-0.301030
-1.078152	Works well	-0.124939
-0.902061	worked well	-0.124939
-0.902061	flexible, well	-0.124939
-3.337045	the information	-0.124939
-3.517482	of information	-0.124939
-3.305724	for information	-0.124939
-3.051847	This information	-0.124939
-2.921826	have information	-0.124939
-2.901595	this information	-0.124939
-2.886559	more information	-0.124939
-2.069655	all information	-0.425969
-2.673797	no information	-0.124939
-2.493684	some information	-0.124939
-2.338096	without information	-0.124939
-2.250934	extra information	-0.124939
-1.533214	necessary information	-0.425969
-1.765096	No information	-0.124939
-1.670766	full information	-0.124939
-1.597664	added information	-0.124939
-0.747482	CPUID information	-0.124939
-1.443638	debug information	-0.124939
-1.375816	unwinding information	-0.124939
-0.601530	gets information	-0.425969
-1.200600	additional information	-0.124939
-1.201477	application-specific information	-0.124939
-0.901326	insufficient information	-0.124939
-0.601177	seek information	-0.124939
-0.601177	recovery information	-0.124939
-0.601177	incomplete information	-0.124939
-2.213204	is simply	-0.301030
-3.401897	and simply	-0.124939
-3.374852	that simply	-0.124939
-2.710889	are simply	-0.124939
-3.146301	or simply	-0.124939
-2.893361	It simply	-0.124939
-2.749274	used simply	-0.124939
-2.643194	pointer simply	-0.124939
-2.596609	number simply	-0.124939
-2.273207	error simply	-0.124939
-2.281992	I simply	-0.124939
-2.190698	integers simply	-0.124939
-2.182861	done simply	-0.124939
-2.151919	implemented simply	-0.124939
-1.894732	numbers simply	-0.124939
-1.638932	copied simply	-0.124939
-1.550499	brand simply	-0.124939
-1.442638	measured simply	-0.124939
-0.901794	significantly simply	-0.124939
-2.903895	is able	-0.602060
-1.771689	be able	-0.937852
-1.981866	are able	-1.028029
-2.421060	not able	-0.425969
-2.335892	always able	-0.124939
-1.918701	actually able	-0.124939
-1.181114	were able	-0.425969
-1.745390	sometimes able	-0.124939
-2.879128	is certain	-0.301030
-2.854953	a certain	-0.124939
-3.559230	to certain	-0.124939
-3.333421	for certain	-0.124939
-2.762977	that certain	-0.124939
-3.282873	be certain	-0.124939
-2.468621	are certain	-0.301030
-2.536314	if certain	-0.124939
-3.090485	on certain	-0.124939
-3.015638	not certain	-0.124939
-2.933481	have certain	-0.124939
-2.837836	at certain	-0.124939
-2.810480	make certain	-0.124939
-2.682189	no certain	-0.124939
-2.175497	therefore certain	-0.124939
-2.068080	count certain	-0.124939
-2.060834	quite certain	-0.124939
-2.044676	was certain	-0.124939
-1.743714	prevents certain	-0.124939
-0.805281	almost certain	-0.124939
-0.901593	obey certain	-0.124939
-0.601311	query certain	-0.124939
-0.497655	clock cycles	-0.301030
-2.614637	// ...	-0.124939
-1.615626	{ ...	-0.124939
-1.110656	i; ...	-0.249877
-1.299310	c; ...	-0.425969
-1.918774	public: ...	-0.124939
-1.913938	x; ...	-0.124939
-1.552624	y; ...	-0.124939
-0.747819	c); ...	-0.425969
-1.077338	136 ...	-0.124939
-1.077338	j; ...	-0.124939
-0.901860	CriticalFunction(); ...	-0.124939
-0.901860	479001600}; ...	-0.124939
-0.901860	list[size]; ...	-0.124939
-0.601445	List[ArraySize]; ...	-0.124939
-0.601445	Func1(2); ...	-0.124939
-0.601445	log(2.0); ...	-0.124939
-0.601445	b[1], ...	-0.124939
-0.601445	FactorialTable[b]; ...	-0.124939
-2.818318	the addresses	-0.602060
-3.057796	to addresses	-0.425969
-3.391375	and addresses	-0.124939
-3.139513	or addresses	-0.124939
-2.939428	have addresses	-0.124939
-2.841465	from addresses	-0.124939
-2.181806	memory addresses	-0.124939
-2.783583	different addresses	-0.124939
-2.577978	64-bit addresses	-0.124939
-1.813919	return addresses	-0.124939
-2.344509	these addresses	-0.124939
-2.219447	element addresses	-0.124939
-2.204648	These addresses	-0.124939
-2.136008	Function addresses	-0.124939
-2.076027	All addresses	-0.124939
-0.851331	relative addresses	-0.301030
-1.595960	row addresses	-0.124939
-1.443758	absolute addresses	-0.124939
-1.376812	self-relative addresses	-0.124939
-1.201398	round addresses	-0.124939
-3.117008	a counter	-0.124939
-2.785206	point counter	-0.124939
-1.115558	loop counter	-0.238882
-2.720408	integer counter	-0.124939
-2.098604	add counter	-0.124939
-2.072283	cycles counter	-0.124939
-1.106654	Loop counter	-0.124939
-0.944274	monitor counter	-0.124939
-0.857054	cycle counter	-0.124939
-0.216668	stamp counter	-0.124939
-2.823578	the shared	-0.204120
-3.658697	is shared	-0.124939
-2.442360	a shared	-0.492916
-3.435084	and shared	-0.124939
-3.566151	in shared	-0.124939
-3.324433	be shared	-0.124939
-3.307598	are shared	-0.124939
-3.167325	or shared	-0.124939
-3.043397	not shared	-0.124939
-2.913928	when shared	-0.124939
-2.223296	A shared	-0.425969
-2.822553	make shared	-0.124939
-2.783979	same shared	-0.124939
-1.902719	64-bit shared	-0.124939
-1.756055	called shared	-0.425969
-2.247607	large shared	-0.124939
-3.086023	to count	-0.124939
-3.425329	that count	-0.124939
-3.250572	it count	-0.124939
-1.287034	loop count	-0.380211
-2.559977	clock count	-0.124939
-1.733103	first count	-0.124939
-2.184859	therefore count	-0.124939
-1.994816	don't count	-0.124939
-0.135370	repeat count	-0.564271
-1.598942	seconds count	-0.124939
-2.506205	the program.	-0.301030
-2.607536	a program.	-0.221849
-3.645250	to program.	-0.124939
-2.575420	C++ program.	-0.124939
-2.418728	first program.	-0.124939
-2.231837	big program.	-0.124939
-2.098777	mode program.	-0.124939
-1.078186	application program.	-0.124939
-1.826722	main program.	-0.124939
-1.775025	whole program.	-0.124939
-1.711101	final program.	-0.124939
-1.502915	designed program.	-0.124939
-1.202778	existing program.	-0.124939
-2.401211	is quite	-0.124939
-2.223026	be quite	-0.124939
-3.300372	are quite	-0.124939
-3.038646	not quite	-0.124939
-2.948503	have quite	-0.124939
-2.754508	which quite	-0.124939
-2.751231	but quite	-0.124939
-2.566261	also quite	-0.124939
-1.726033	take quite	-0.425969
-2.262102	accessed quite	-0.124939
-2.266655	does quite	-0.124939
-1.222987	actually quite	-0.124939
-1.825579	predicted quite	-0.124939
-1.639397	occur quite	-0.124939
-1.598581	happen quite	-0.124939
-1.376732	happens quite	-0.124939
-1.298129	runs quite	-0.124939
-2.300729	is used.	-0.279841
-2.497952	be used.	-0.124939
-1.981178	are used.	-0.182931
-3.073053	not used.	-0.124939
-2.863830	memory used.	-0.124939
-2.369565	registers used.	-0.124939
-1.299711	never used.	-0.124939
-1.244887	longer used.	-0.124939
-1.917381	actually used.	-0.124939
-1.203078	seldom used.	-0.124939
-2.169161	data files	-0.124939
-2.816474	make files	-0.124939
-2.073657	all files	-0.425969
-2.598937	library files	-0.124939
-1.657350	object files	-0.124939
-2.534437	many files	-0.124939
-2.207504	several files	-0.124939
-2.070824	intermediate files	-0.124939
-1.027020	source files	-0.124939
-1.638291	loading files	-0.124939
-1.638291	resource files	-0.124939
-0.902571	header files	-0.124939
-0.601700	help files	-0.301030
-1.443281	Open files	-0.124939
-0.601671	.cpp files	-0.425969
-0.504890	configuration files	-0.124939
-1.202824	Header files	-0.124939
-1.077238	Object files	-0.124939
-0.601411	Temporary files	-0.124939
-1.964058	is recommended	-1.447158
-1.897400	not recommended	-0.346788
-2.587719	also recommended	-0.124939
-1.493290	therefore recommended	-0.124939
-0.601880	strongly recommended	-0.124939
-2.913521	the intermediate	-0.221849
-3.631536	of intermediate	-0.124939
-2.928222	and intermediate	-0.124939
-2.571077	The intermediate	-0.124939
-3.394752	for intermediate	-0.124939
-3.205504	if intermediate	-0.124939
-2.467086	on intermediate	-0.425969
-1.626224	an intermediate	-0.425969
-2.454444	makes intermediate	-0.124939
-2.100045	every intermediate	-0.124939
-1.406414	store intermediate	-0.124939
-2.081076	All intermediate	-0.124939
-1.443799	storing intermediate	-0.124939
-1.077739	frameworks, intermediate	-0.124939
-2.887813	is fast	-0.124939
-2.291138	for fast	-0.124939
-3.317224	be fast	-0.124939
-3.300372	are fast	-0.124939
-1.833944	as fast	-0.903090
-2.948503	have fast	-0.124939
-1.824006	so fast	-0.425969
-2.479165	very fast	-0.124939
-2.158206	calculated fast	-0.124939
-2.064866	quite fast	-0.124939
-1.741258	particularly fast	-0.124939
-1.640552	enable fast	-0.124939
-1.552824	addition, fast	-0.124939
-1.501671	equally fast	-0.124939
-1.376732	job fast	-0.124939
-1.202377	sufficiently fast	-0.124939
-0.601478	reciprocal, fast	-0.124939
-3.973723	the allocation	-0.124939
-2.825452	The allocation	-0.124939
-0.983319	memory allocation	-0.271067
-2.423484	register allocation	-0.124939
-2.330820	dynamic allocation	-0.124939
-1.554901	involves allocation	-0.124939
-1.503190	frequent allocation	-0.124939
-1.378810	Register allocation	-0.124939
-1.806734	for (int	-0.970037
-0.857018	factorial (int	-0.425969
-0.049205	SomeFunction (int	-0.492916
-1.203178	Multiply (int	-0.124939
-1.078240	Func1 (int	-0.124939
-0.902462	Plus2 (int	-0.124939
-0.902462	MultiplyBy (int	-0.124939
-0.902462	FuncA (int	-0.124939
-0.601746	FuncB (int	-0.124939
-3.873958	the write	-0.124939
-2.346272	to write	-0.329059
-2.928222	and write	-0.124939
-2.578085	or write	-0.124939
-2.348107	may write	-0.124939
-3.012963	you write	-0.124939
-2.399537	often write	-0.124939
-2.294485	instructions write	-0.124939
-1.591748	I write	-0.425969
-2.224120	threads write	-0.124939
-0.412834	nontemporal write	-0.522879
-1.598701	programmers write	-0.124939
-1.202198	uncached write	-0.124939
-0.601579	Noncached write	-0.124939
-2.057108	to optimize	-0.221849
-3.483690	and optimize	-0.124939
-2.653916	can optimize	-0.124939
-2.416524	not optimize	-0.124939
-3.034223	compiler optimize	-0.124939
-2.892453	will optimize	-0.124939
-2.003342	compilers optimize	-0.124939
-2.402539	often optimize	-0.124939
-1.502669	easily optimize	-0.124939
-1.202464	otherwise optimize	-0.124939
-0.601646	selecting optimize	-0.124939
-0.601646	Cannot optimize	-0.124939
-2.406858	the above	-0.301030
-2.306358	The above	-0.346788
-2.543848	if above	-0.124939
-3.095918	This above	-0.124939
-2.762445	used above	-0.124939
-1.776070	described above	-0.124939
-1.554568	mentioned above	-0.124939
-1.299295	28 above	-0.124939
-0.902395	matrix[c][r] above	-0.124939
-0.601712	position above	-0.124939
-2.371464	code. However,	-0.124939
-2.248370	function. However,	-0.124939
-2.059933	used. However,	-0.124939
-1.845682	CPUs. However,	-0.124939
-1.792617	platforms. However,	-0.124939
-1.791665	size. However,	-0.124939
-1.735580	resources. However,	-0.124939
-1.670549	possible. However,	-0.124939
-1.672474	thread. However,	-0.124939
-1.673439	purposes. However,	-0.124939
-1.593293	sets. However,	-0.124939
-1.549468	critical. However,	-0.124939
-1.547535	executed. However,	-0.124939
-1.499285	value. However,	-0.124939
-1.441294	120 However,	-0.124939
-1.441294	processor. However,	-0.124939
-1.442265	automatically. However,	-0.124939
-1.200202	are. However,	-0.124939
-1.201178	first. However,	-0.124939
-1.076239	platform. However,	-0.124939
-0.901126	debugger. However,	-0.124939
-0.901126	screen. However,	-0.124939
-0.901126	flow. However,	-0.124939
-0.901126	calculation. However,	-0.124939
-0.901126	implementations. However,	-0.124939
-0.601077	brackets. However,	-0.124939
-0.601077	maintenance. However,	-0.124939
-0.601077	models. However,	-0.124939
-0.601077	F1. However,	-0.124939
-3.556713	a was	-0.124939
-2.520032	that was	-0.124939
-2.166445	it was	-0.249877
-3.163210	function was	-0.124939
-2.768071	CPU was	-0.124939
-2.793282	instruction was	-0.124939
-2.679436	set was	-0.124939
-2.600161	there was	-0.124939
-1.797958	software was	-0.124939
-2.255759	CPUs was	-0.124939
-1.873052	feature was	-0.124939
-1.796117	statement was	-0.124939
-1.768089	operation was	-0.124939
-1.595069	seconds was	-0.124939
-1.549311	brand was	-0.124939
-1.444037	CPUID was	-0.124939
-1.377868	How was	-0.124939
-1.297132	recommendation was	-0.124939
-1.297132	Basic was	-0.124939
-1.297909	consumption was	-0.124939
-1.297132	15.1c was	-0.124939
-1.200999	alloca was	-0.124939
-0.901527	11.2b was	-0.124939
-3.552275	of both	-0.124939
-2.269367	in both	-0.191886
-3.282873	be both	-0.124939
-3.265943	are both	-0.124939
-2.536314	if both	-0.124939
-3.124088	by both	-0.124939
-3.095954	with both	-0.124939
-2.923783	time both	-0.124939
-2.866889	will both	-0.124939
-2.880668	then both	-0.124939
-2.836419	from both	-0.124939
-2.821715	has both	-0.124939
-2.812822	because both	-0.124939
-2.049719	optimize both	-0.124939
-2.028208	Therefore, both	-0.124939
-1.821408	supports both	-0.124939
-1.798012	Supports both	-0.124939
-1.744456	outside both	-0.124939
-1.597586	checks both	-0.124939
-1.375737	evaluate both	-0.124939
-0.901593	destination both	-0.124939
-0.601311	(2013) both	-0.124939
-3.746182	the programs	-0.124939
-3.534530	of programs	-0.124939
-3.351666	and programs	-0.124939
-2.546212	in programs	-0.124939
-2.782075	for programs	-0.124939
-3.087555	with programs	-0.124939
-2.572616	64-bit programs	-0.124939
-2.557927	C++ programs	-0.124939
-2.565000	such programs	-0.124939
-2.525590	many programs	-0.124939
-2.463894	software programs	-0.124939
-2.471097	32-bit programs	-0.124939
-2.273869	making programs	-0.124939
-2.264133	Some programs	-0.124939
-2.199165	common programs	-0.124939
-2.112033	few programs	-0.124939
-2.019759	Mac programs	-0.124939
-2.017392	application programs	-0.124939
-1.909824	Many programs	-0.124939
-1.674757	Other programs	-0.124939
-1.553860	oriented programs	-0.124939
-1.296966	CPU-intensive programs	-0.124939
-0.601244	interactive programs	-0.124939
-0.601244	Multithreaded programs	-0.124939
-2.911897	the problems	-0.124939
-3.620800	of problems	-0.124939
-3.624609	to problems	-0.124939
-2.834653	has problems	-0.124939
-1.669534	these problems	-0.124939
-2.349309	without problems	-0.124939
-2.209626	These problems	-0.124939
-2.208634	common problems	-0.124939
-1.491791	cause problems	-0.124939
-1.316931	caching problems	-0.124939
-0.601862	compatibility problems	-0.425969
-0.943659	resource problems	-0.124939
-1.553734	finding problems	-0.124939
-0.601660	usability problems	-0.301030
-1.077639	technical problems	-0.124939
-2.916930	time unless	-0.124939
-2.505219	variable unless	-0.124939
-2.502361	so unless	-0.124939
-2.377082	optimization unless	-0.124939
-2.357038	pointers unless	-0.124939
-2.344134	systems unless	-0.124939
-2.280058	& unless	-0.124939
-1.571127	CPUs unless	-0.425969
-2.230491	calculations unless	-0.124939
-1.405022	mode unless	-0.124939
-2.001175	handling unless	-0.124939
-1.800270	avoided unless	-0.124939
-1.077484	slow unless	-0.425969
-1.741440	frame unless	-0.124939
-1.741440	safe unless	-0.124939
-1.708419	clear unless	-0.124939
-1.550392	default unless	-0.124939
-1.442928	object, unless	-0.124939
-1.442928	rounding unless	-0.124939
-1.076638	manually unless	-0.124939
-0.901393	X, unless	-0.124939
-0.601211	bits), unless	-0.124939
-0.601211	b*(2.0/3.0) unless	-0.124939
-0.601211	constant, unless	-0.124939
-0.601211	unfavorable, unless	-0.124939
-2.510044	the optimal	-0.249877
-2.545523	is optimal	-0.204120
-3.463523	The optimal	-0.124939
-2.501216	be optimal	-0.301030
-2.012450	not optimal	-0.249877
-3.061916	an optimal	-0.124939
-2.439805	less optimal	-0.124939
-3.889074	the space	-0.124939
-3.659637	a space	-0.124939
-3.642544	of space	-0.124939
-3.407529	The space	-0.124939
-1.999483	more space	-0.124939
-1.372441	memory space	-0.234083
-2.793315	same space	-0.124939
-1.632113	cache space	-0.124939
-2.447458	address space	-0.124939
-2.216437	much space	-0.124939
-1.676339	disk space	-0.124939
-1.676783	little space	-0.124939
-0.726650	heap space	-0.124939
-3.353677	are cases,	-0.124939
-2.779931	other cases,	-0.124939
-2.754337	all cases,	-0.124939
-1.186813	most cases,	-0.176091
-2.578525	such cases,	-0.124939
-1.867558	many cases,	-0.124939
-1.073549	some cases,	-0.329059
-1.608095	simple cases,	-0.124939
-2.124252	few cases,	-0.124939
-0.943857	simplest cases,	-0.124939
-2.293282	if else	-0.301030
-1.031402	} else	-1.124939
-0.425889	anything else	-0.124939
-0.601880	34 else	-0.124939
-0.601880	68 else	-0.124939
-1.847244	a lot	-1.062791
-2.235444	A lot	-0.425969
-3.068855	- Integer	-0.124939
-2.471658	2 Integer	-0.124939
-2.378268	optimization Integer	-0.124939
-1.696752	time. Integer	-0.124939
-2.187302	overflow Integer	-0.124939
-1.430667	operators Integer	-0.124939
-2.015821	multiplication Integer	-0.124939
-1.970987	division Integer	-0.124939
-1.796490	cases. Integer	-0.124939
-1.794101	size. Integer	-0.124939
-1.707113	performance. Integer	-0.124939
-1.376147	15 Integer	-0.124939
-1.296966	microprocessors. Integer	-0.124939
-1.201677	constants. Integer	-0.124939
-1.201677	results. Integer	-0.124939
-0.379885	14.4 Integer	-0.425969
-1.076738	microprocessor. Integer	-0.124939
-0.203957	14.5 Integer	-0.124939
-0.901460	158 Integer	-0.124939
-0.901460	discussion. Integer	-0.124939
-0.901460	processor). Integer	-0.124939
-0.601244	division: Integer	-0.124939
-0.601244	15.1d. Integer	-0.124939
-0.601244	8.24. Integer	-0.124939
-3.433259	the dispatching	-0.124939
-2.830060	The dispatching	-0.124939
-0.930230	CPU dispatching	-0.312025
-2.460495	makes dispatching	-0.124939
-1.979714	automatic dispatching	-0.124939
-0.204077	Model-specific dispatching	-0.124939
-4.033767	the particular	-0.124939
-1.882464	a particular	-0.204120
-3.456133	that particular	-0.124939
-3.396204	are particular	-0.124939
-2.704094	each particular	-0.124939
-2.379173	the microprocessor	-0.329059
-2.883165	a microprocessor	-0.124939
-3.101957	of microprocessor	-0.124939
-3.553263	and microprocessor	-0.124939
-3.622045	in microprocessor	-0.124939
-2.231066	A microprocessor	-0.425969
-0.505002	dedicated microprocessor	-0.124939
-2.348935	to replace	-0.425969
-2.245721	can replace	-0.124939
-3.220740	or replace	-0.124939
-1.731859	may replace	-0.602060
-2.222064	will replace	-0.124939
-2.900799	then replace	-0.124939
-1.764782	cannot replace	-0.425969
-2.407083	often replace	-0.124939
-1.316871	automatically replace	-0.124939
-2.352699	the next	-0.151268
-2.143555	The next	-0.124939
-3.303234	// next	-0.124939
-2.154237	get next	-0.124939
-0.601880	mainstream next	-0.124939
-3.780728	the branches	-0.124939
-2.684286	of branches	-0.249877
-3.381102	and branches	-0.124939
-3.342588	The branches	-0.124939
-3.364389	that branches	-0.124939
-2.452083	code branches	-0.124939
-2.725059	all branches	-0.124939
-2.746400	used branches	-0.124939
-2.684313	no branches	-0.124939
-2.571327	two branches	-0.124939
-1.863345	many branches	-0.124939
-2.276090	making branches	-0.124939
-2.205707	several branches	-0.124939
-2.114622	few branches	-0.124939
-1.893179	dispatch branches	-0.124939
-1.741085	preceding branches	-0.124939
-1.708197	Avoid branches	-0.124939
-0.943338	identical branches	-0.124939
-1.077751	Eliminate branches	-0.124939
-0.601344	Unpredictable branches	-0.124939
-0.601344	Predictable branches	-0.124939
-2.537618	is typically	-0.204120
-3.600094	of typically	-0.124939
-2.769496	that typically	-0.124939
-2.473932	are typically	-0.124939
-2.417440	This typically	-0.124939
-3.007475	may typically	-0.124939
-2.217587	will typically	-0.124939
-2.645909	pointer typically	-0.124939
-2.536391	takes typically	-0.124939
-2.330367	SSE2 typically	-0.124939
-1.956285	programmer typically	-0.124939
-1.913357	framework typically	-0.124939
-1.711947	strings typically	-0.124939
-1.501671	devices typically	-0.124939
-1.443679	developers typically	-0.124939
-1.299287	frameworks typically	-0.124939
-1.078019	level, typically	-0.124939
-3.139513	or operator	-0.124939
-2.815571	vector operator	-0.124939
-2.722204	one operator	-0.124939
-1.352666	& operator	-0.124939
-1.951008	| operator	-0.124939
-1.798065	index operator	-0.124939
-1.744112	sum operator	-0.124939
-0.726468	overloaded operator	-0.602060
-0.680832	casting operator	-0.124939
-1.298308	AND operator	-0.124939
-0.601514	OR operator	-0.124939
-1.201398	modulo operator	-0.124939
-1.201398	pre-increment operator	-0.124939
-0.901727	dynamic_cast operator	-0.124939
-0.203983	const_cast operator	-0.124939
-0.901727	[] operator	-0.124939
-0.601378	?: operator	-0.124939
-0.601378	post-increment operator	-0.124939
-0.601378	reinterpret_cast operator	-0.124939
-0.601378	static_cast operator	-0.124939
-3.725720	is preferably	-0.124939
-3.353677	are preferably	-0.124939
-3.179315	by preferably	-0.124939
-3.094902	- preferably	-0.124939
-2.350400	may preferably	-0.124939
-0.988980	should preferably	-1.000000
-1.237956	therefore preferably	-0.301030
-1.378476	files, preferably	-0.124939
-0.902395	container, preferably	-0.124939
-0.601712	SSE2, preferably	-0.124939
-1.801784	= 1;	-0.301030
-2.413351	- 1;	-0.425969
-1.086483	+ 1;	-0.425969
-1.407203	: 1;	-0.425969
-1.181201	<< 1;	-0.425969
-1.378731	^ 1;	-0.124939
-0.601813	>>= 1;	-0.124939
-2.372623	time. Therefore,	-0.124939
-1.955369	data. Therefore,	-0.124939
-1.931915	set. Therefore,	-0.124939
-1.869901	called. Therefore,	-0.124939
-1.706752	it. Therefore,	-0.124939
-1.707585	mode. Therefore,	-0.124939
-1.593641	references. Therefore,	-0.124939
-1.551232	to. Therefore,	-0.124939
-1.550392	critical. Therefore,	-0.124939
-1.499240	applications. Therefore,	-0.124939
-1.441248	parameters. Therefore,	-0.124939
-1.442087	number. Therefore,	-0.124939
-0.601380	consuming. Therefore,	-0.124939
-1.296800	numbers. Therefore,	-0.124939
-1.296800	addresses. Therefore,	-0.124939
-1.200733	exception. Therefore,	-0.124939
-1.200733	declared. Therefore,	-0.124939
-0.504640	another. Therefore,	-0.124939
-1.076638	int. Therefore,	-0.124939
-1.077484	78 Therefore,	-0.124939
-0.901393	programmed. Therefore,	-0.124939
-0.901393	strides. Therefore,	-0.124939
-0.901393	namespaces. Therefore,	-0.124939
-0.901393	PCs. Therefore,	-0.124939
-0.601211	calculated. Therefore,	-0.124939
-3.889074	the Mac	-0.124939
-2.210301	and Mac	-0.182931
-2.955430	in Mac	-0.124939
-3.403065	for Mac	-0.124939
-3.189418	or Mac	-0.124939
-3.123731	on Mac	-0.124939
-2.587526	64-bit Mac	-0.124939
-1.262942	32-bit Mac	-0.346788
-1.992888	systems. Mac	-0.124939
-1.712433	Linux, Mac	-0.124939
-1.077839	perhaps Mac	-0.124939
-0.124889	Intel-based Mac	-0.301030
-0.902195	date. Mac	-0.124939
-2.756975	the multiplication	-0.191886
-3.113691	a multiplication	-0.124939
-2.550606	and multiplication	-0.124939
-3.596311	in multiplication	-0.124939
-2.818629	The multiplication	-0.425969
-3.420184	for multiplication	-0.124939
-2.954790	this multiplication	-0.124939
-2.104612	point multiplication	-0.124939
-2.040575	integer multiplication	-0.124939
-0.934806	Integer multiplication	-0.249877
-1.554401	involves multiplication	-0.124939
-2.505568	the application	-0.204120
-2.811913	The application	-0.425969
-2.380232	an application	-0.124939
-2.417567	first application	-0.124939
-2.271893	Some application	-0.124939
-2.233167	An application	-0.124939
-2.038279	particular application	-0.124939
-1.917114	graphics application	-0.124939
-1.898300	your application	-0.124939
-1.896399	second application	-0.124939
-1.710737	final application	-0.124939
-1.551989	typical application	-0.124939
-0.902128	web application	-0.124939
-0.902128	WTL application	-0.124939
-1.516316	const x)	-0.602060
-0.986815	& x)	-0.903090
-1.364863	(int x)	-0.425969
-0.425814	(float x)	-0.602060
-0.090150	p(double x)	-0.726999
-0.090150	xpow10(double x)	-0.726999
-1.203078	(double x)	-0.124939
-0.204050	Exp(float x)	-0.425969
-0.601712	Func1(int x)	-0.124939
-0.601712	Func2(double x)	-0.124939
-3.600648	is automatically	-0.124939
-3.559230	to automatically	-0.124939
-3.359251	that automatically	-0.124939
-2.398603	can automatically	-0.124939
-2.451310	code automatically	-0.124939
-2.214813	will automatically	-0.124939
-2.821691	data automatically	-0.124939
-2.763384	loop automatically	-0.124939
-2.718895	should automatically	-0.124939
-2.380650	optimization automatically	-0.124939
-2.311817	operations automatically	-0.124939
-2.238365	arrays automatically	-0.124939
-2.227166	doesn't automatically	-0.124939
-2.033032	programs automatically	-0.124939
-1.871177	goes automatically	-0.124939
-1.739285	inlined automatically	-0.124939
-1.551087	writes automatically	-0.124939
-1.441942	update automatically	-0.124939
-0.901593	14.14b automatically	-0.124939
-0.601311	12.8b automatically	-0.124939
-0.601311	shall automatically	-0.124939
-0.601311	73) automatically	-0.124939
-2.122552	to see	-0.460731
-3.496741	and see	-0.124939
-2.403313	can see	-0.301030
-3.018112	you see	-0.124939
-2.220940	will see	-0.124939
-2.576858	also see	-0.124939
-1.773978	code, see	-0.124939
-1.299508	features, see	-0.124939
-1.078039	operations, see	-0.124939
-0.601679	topic, see	-0.124939
-0.601679	registers; see	-0.124939
-3.889074	the caching	-0.124939
-3.471019	and caching	-0.124939
-3.407843	that caching	-0.124939
-2.049847	code caching	-0.249877
-1.309340	data caching	-0.522879
-2.778480	If caching	-0.124939
-2.701686	no caching	-0.124939
-1.764356	makes caching	-0.124939
-2.351379	without caching	-0.124939
-2.182937	cause caching	-0.124939
-1.299241	Data caching	-0.124939
-0.902195	Efficient caching	-0.124939
-0.902195	Code caching	-0.124939
-2.164980	that allows	-0.204120
-2.577560	it allows	-0.124939
-2.009729	This allows	-0.249877
-2.362759	compiler allows	-0.124939
-2.760816	which allows	-0.124939
-2.571527	also allows	-0.124939
-1.553614	Windows allows	-0.124939
-2.221780	language allows	-0.124939
-1.977005	reference allows	-0.124939
-1.956149	mechanism allows	-0.124939
-0.856659	logic allows	-0.124939
-1.552466	standardized allows	-0.124939
-0.902128	biased allows	-0.124939
-0.902128	shared_ptr allows	-0.124939
-3.483690	and sets	-0.124939
-3.140572	with sets	-0.124939
-2.896070	then sets	-0.124939
-2.846169	data sets	-0.124939
-1.092385	instruction sets	-0.271067
-2.672742	example sets	-0.124939
-1.641384	32 sets	-0.124939
-1.851975	constructor sets	-0.124939
-1.555060	Instruction sets	-0.124939
-1.202878	routine sets	-0.124939
-0.601646	__intel_cpu_features_init() sets	-0.124939
-0.601646	similarly sets	-0.124939
-2.692342	the expression	-0.182931
-2.572355	The expression	-0.124939
-3.087307	This expression	-0.124939
-2.130433	an expression	-0.301030
-2.713533	integer expression	-0.124939
-2.514283	some expression	-0.124939
-1.289977	An expression	-0.124939
-2.073394	intermediate expression	-0.124939
-1.954477	&& expression	-0.124939
-1.675896	Any expression	-0.124939
-1.299241	equivalent expression	-0.124939
-1.300135	loop-invariant expression	-0.124939
-0.601612	non-reduced expression	-0.124939
-3.638473	is implementation	-0.124939
-3.099684	code implementation	-0.124939
-3.073324	This implementation	-0.124939
-2.819925	vector implementation	-0.124939
-2.789462	different implementation	-0.124939
-2.752426	which implementation	-0.124939
-2.567381	C++ implementation	-0.124939
-2.548927	possible implementation	-0.124939
-1.388704	software implementation	-0.249877
-2.229794	An implementation	-0.124939
-2.219217	best implementation	-0.124939
-1.502694	good implementation	-0.124939
-1.151738	hardware implementation	-0.301030
-1.061357	complicated implementation	-0.301030
-1.595947	metaprogramming implementation	-0.124939
-1.550797	typical implementation	-0.124939
-1.376533	mixed implementation	-0.124939
-1.299800	safer implementation	-0.124939
-2.418524	4 Most	-0.124939
-2.375080	code. Most	-0.124939
-2.077802	memory. Most	-0.124939
-1.987803	cache. Most	-0.124939
-1.949876	efficient. Most	-0.124939
-1.907808	framework Most	-0.124939
-1.867758	storage Most	-0.124939
-1.821172	reductions Most	-0.124939
-1.822897	loop. Most	-0.124939
-1.736851	resources. Most	-0.124939
-1.706391	testing Most	-0.124939
-1.708123	variable. Most	-0.124939
-1.593313	references. Most	-0.124939
-1.594180	sets. Most	-0.124939
-1.549291	expressions. Most	-0.124939
-1.441889	optimizations. Most	-0.124939
-1.296634	maintain. Most	-0.124939
-1.297510	reduction Most	-0.124939
-1.076539	12. Most	-0.124939
-1.076539	on. Most	-0.124939
-0.901326	constructs Most	-0.124939
-0.901326	executable. Most	-0.124939
-0.901326	compression Most	-0.124939
-0.901326	updated. Most	-0.124939
-0.601177	47 Most	-0.124939
-0.601177	guidelines. Most	-0.124939
-3.873958	the complicated	-0.124939
-2.869771	a complicated	-0.124939
-3.631536	of complicated	-0.124939
-3.458707	and complicated	-0.124939
-3.119909	on complicated	-0.124939
-2.943617	this complicated	-0.124939
-1.344269	more complicated	-0.166331
-2.681969	most complicated	-0.124939
-2.512591	so complicated	-0.124939
-2.210629	These complicated	-0.124939
-1.993309	Using complicated	-0.124939
-1.874122	reduce complicated	-0.124939
-1.077739	More complicated	-0.124939
-1.077739	extremely complicated	-0.124939
-3.108427	of handling	-0.124939
-3.476007	for handling	-0.124939
-1.177897	error handling	-0.124939
-0.391162	exception handling	-0.179296
-0.380139	Exception handling	-0.425969
-3.099684	code like	-0.124939
-2.120711	functions like	-0.124939
-2.304130	cases like	-0.124939
-2.175629	classes like	-0.124939
-2.152249	implemented like	-0.124939
-2.110232	would like	-0.124939
-1.932941	expressions like	-0.124939
-0.810052	look like	-0.425969
-1.132432	things like	-0.124939
-1.710498	tasks like	-0.124939
-1.595947	statements like	-0.124939
-1.554460	situations like	-0.124939
-1.202890	treated like	-0.124939
-1.201664	techniques like	-0.124939
-0.379965	behaves like	-0.124939
-0.124870	looks like	-0.602060
-0.601445	expanded like	-0.124939
-0.601445	actions like	-0.124939
-3.889074	the dependency	-0.124939
-3.107133	a dependency	-0.124939
-3.642544	of dependency	-0.124939
-2.881054	A dependency	-0.124939
-2.575591	such dependency	-0.124939
-1.036841	long dependency	-0.329059
-1.752470	critical dependency	-0.124939
-1.917608	Each dependency	-0.124939
-1.774136	Such dependency	-0.124939
-1.502469	down dependency	-0.124939
-0.070555	loop-carried dependency	-0.346788
-1.078286	carried dependency	-0.124939
-1.078286	Long dependency	-0.124939
-3.407135	the members	-0.124939
-2.725900	are members	-0.425969
-3.145297	with members	-0.124939
-1.187696	data members	-0.197489
-2.760962	used members	-0.124939
-2.696022	class members	-0.124939
-2.527525	variable members	-0.124939
-2.270531	its members	-0.124939
-1.445256	smallest members	-0.124939
-0.601755	Data members	-0.425969
-0.902328	Non-static members	-0.124939
-2.689223	of their	-0.124939
-3.604904	to their	-0.124939
-3.423738	and their	-0.124939
-3.196873	if their	-0.124939
-2.112190	by their	-0.124939
-3.073041	as their	-0.124939
-2.911584	when their	-0.124939
-2.848012	at their	-0.124939
-2.824191	program their	-0.124939
-2.820517	make their	-0.124939
-1.881152	because their	-0.602060
-2.009302	each their	-0.124939
-2.359744	test their	-0.124939
-1.826726	change their	-0.124939
-1.678340	fit their	-0.124939
-1.638820	keep their	-0.124939
-1.078019	leaving their	-0.124939
-2.307892	type __m128i	-0.124939
-2.226475	element __m128i	-0.124939
-1.110985	c __m128i	-0.425969
-1.006191	inline __m128i	-0.425969
-0.333127	d, __m128i	-0.726999
-0.504921	bit-mask: __m128i	-0.425969
-0.504921	c: __m128i	-0.425969
-0.504921	b: __m128i	-0.425969
-1.078420	operations: __m128i	-0.124939
-0.380059	(0,0,0,0,0,0,0,0) __m128i	-0.425969
-0.380059	(2,2,2,2,2,2,2,2) __m128i	-0.425969
-2.879516	} Using	-0.124939
-2.256108	function. Using	-0.124939
-1.990547	cache. Using	-0.124939
-1.952336	lookup Using	-0.124939
-1.770707	2. Using	-0.124939
-1.709903	variable. Using	-0.124939
-1.674468	precision. Using	-0.124939
-0.805225	12 Using	-0.425969
-0.504741	14.9 Using	-0.425969
-1.202077	expensive. Using	-0.124939
-1.201398	efficiency. Using	-0.124939
-0.504741	16.1 Using	-0.425969
-0.901727	temporarily. Using	-0.124939
-0.901727	105). Using	-0.124939
-0.203983	12.4 Using	-0.425969
-0.203983	12.5 Using	-0.425969
-0.601378	chapter. Using	-0.124939
-0.601378	150. Using	-0.124939
-0.601378	parallel: Using	-0.124939
-0.601378	11. Using	-0.124939
-2.908667	the Boolean	-0.522879
-3.094306	a Boolean	-0.124939
-2.839439	of Boolean	-0.124939
-2.805298	The Boolean	-0.124939
-3.370726	for Boolean	-0.124939
-3.117690	with Boolean	-0.124939
-3.073041	as Boolean	-0.124939
-3.007935	than Boolean	-0.124939
-2.948503	have Boolean	-0.124939
-2.538028	many Boolean	-0.124939
-1.771795	produce Boolean	-0.124939
-1.443679	reciprocal Boolean	-0.124939
-1.298129	mispredictions. Boolean	-0.124939
-1.202377	true. Boolean	-0.124939
-0.901927	overdetermined Boolean	-0.124939
-0.901927	invalid. Boolean	-0.124939
-0.601478	76 Boolean	-0.124939
-3.029211	the cache.	-0.425969
-3.204797	or cache.	-0.124939
-2.459885	code cache.	-0.425969
-1.554835	data cache.	-0.124939
-2.799653	same cache.	-0.124939
-2.098530	1 cache.	-0.124939
-0.823824	level-2 cache.	-0.124939
-0.793776	level-1 cache.	-0.301030
-1.677068	disk cache.	-0.124939
-0.380059	micro-op cache.	-0.124939
-0.601679	level-3 cache.	-0.124939
-3.483690	and don't	-0.124939
-3.413594	that don't	-0.124939
-1.519031	you don't	-0.301030
-2.759598	but don't	-0.124939
-2.688389	compilers don't	-0.124939
-1.315241	we don't	-0.346788
-2.365511	they don't	-0.124939
-1.591961	I don't	-0.124939
-2.086429	simply don't	-0.124939
-0.902261	please don't	-0.124939
-0.902261	factorials don't	-0.124939
-0.601646	"we don't	-0.124939
-3.561427	of 256	-0.124939
-3.168781	= 256	-0.124939
-3.132830	or 256	-0.124939
-3.007583	int 256	-0.124939
-2.777951	only 256	-0.124939
-2.639982	double 256	-0.124939
-2.580081	float 256	-0.124939
-1.748516	4 256	-0.124939
-1.742497	8 256	-0.124939
-1.738657	unsigned 256	-0.124939
-1.638610	16 256	-0.124939
-2.324879	32 256	-0.124939
-2.177649	AVX 256	-0.124939
-1.954577	char 256	-0.124939
-1.933174	> 256	-0.124939
-1.889693	AVX2 256	-0.124939
-1.442174	int64_t 256	-0.124939
-1.442174	available, 256	-0.124939
-1.201265	uint64_t 256	-0.124939
-0.601344	(XMM), 256	-0.124939
-0.601344	832 256	-0.124939
-3.368353	the intrinsic	-0.124939
-3.580330	of intrinsic	-0.124939
-3.357936	The intrinsic	-0.124939
-2.402508	for intrinsic	-0.425969
-3.286271	are intrinsic	-0.124939
-3.108865	with intrinsic	-0.124939
-2.918759	use intrinsic	-0.124939
-2.786513	different intrinsic	-0.124939
-2.749274	used intrinsic	-0.124939
-2.674555	each intrinsic	-0.124939
-2.658703	using intrinsic	-0.124939
-2.329001	SSE2 intrinsic	-0.124939
-2.225222	Use intrinsic	-0.124939
-2.133967	support intrinsic	-0.124939
-1.044749	Using intrinsic	-0.602060
-1.822880	supports intrinsic	-0.124939
-1.707646	so-called intrinsic	-0.124939
-0.601411	assembly-like intrinsic	-0.124939
-0.601411	_mm_clflush intrinsic	-0.124939
-3.368353	the methods	-0.124939
-2.786513	different methods	-0.124939
-2.750240	other methods	-0.124939
-2.749274	used methods	-0.124939
-2.569782	such methods	-0.124939
-2.384249	optimization methods	-0.124939
-1.256827	these methods	-0.249877
-2.348941	useful methods	-0.124939
-1.639833	following methods	-0.124939
-1.522333	These methods	-0.124939
-2.054829	above methods	-0.124939
-1.988470	development methods	-0.124939
-1.972008	various methods	-0.124939
-1.871184	storage methods	-0.124939
-1.596257	similar methods	-0.124939
-0.601411	code-based methods	-0.124939
-0.601411	workaround methods	-0.124939
-0.601411	table-based methods	-0.124939
-0.601411	suggests methods	-0.124939
-2.864155	a signed	-0.301030
-3.600094	of signed	-0.124939
-2.832869	to signed	-0.124939
-3.385573	that signed	-0.124939
-3.317224	be signed	-0.124939
-3.117690	with signed	-0.124939
-3.108640	on signed	-0.124939
-2.364996	than signed	-0.124939
-2.662690	using signed	-0.124939
-1.796200	between signed	-0.425969
-2.227273	Use signed	-0.124939
-1.674885	mix signed	-0.124939
-1.501094	integer, signed	-0.124939
-1.376732	comparing signed	-0.124939
-0.601595	int, signed	-0.425969
-0.504917	8-bit signed	-0.124939
-0.901927	char, signed	-0.124939
-3.671409	a model	-0.124939
-2.436980	and model	-0.346788
-2.776115	that model	-0.124939
-3.197039	or model	-0.124939
-2.884962	A model	-0.124939
-1.933999	memory model	-0.124939
-1.700860	CPU model	-0.249877
-2.358636	new model	-0.124939
-1.130773	processor model	-0.124939
-2.023179	next model	-0.124939
-1.502257	false model	-0.124939
-0.601646	-fp- model	-0.124939
-3.382493	the development	-0.124939
-3.610324	of development	-0.124939
-2.692268	and development	-0.124939
-2.807492	The development	-0.425969
-2.826781	program development	-0.124939
-2.678748	most development	-0.124939
-2.509442	some development	-0.124939
-1.181548	software development	-0.124939
-2.478801	between development	-0.124939
-2.193284	good development	-0.124939
-1.848104	advanced development	-0.124939
-1.553023	easy development	-0.124939
-1.377477	powerful development	-0.124939
-1.298295	popular development	-0.124939
-0.901994	Various development	-0.124939
-0.901994	integrated development	-0.124939
-3.805387	the mathematical	-0.124939
-2.835945	of mathematical	-0.301030
-2.538770	and mathematical	-0.249877
-3.549012	in mathematical	-0.124939
-3.355417	for mathematical	-0.124939
-3.146301	or mathematical	-0.124939
-3.101286	on mathematical	-0.124939
-2.672755	do mathematical	-0.124939
-2.348941	useful mathematical	-0.124939
-2.262392	about mathematical	-0.124939
-1.522076	common mathematical	-0.124939
-2.166538	optimized mathematical	-0.124939
-2.106523	doing mathematical	-0.124939
-2.003611	complicated mathematical	-0.124939
-1.846435	advanced mathematical	-0.124939
-1.674159	mix mathematical	-0.124939
-1.500630	heavy mathematical	-0.124939
-1.297796	computing mathematical	-0.124939
-0.601411	vectorizing mathematical	-0.124939
-2.542344	is never	-0.301030
-2.324506	are never	-0.249877
-2.654683	can never	-0.425969
-1.968515	will never	-0.124939
-1.630077	should never	-0.124939
-2.362368	user never	-0.124939
-2.196188	overflow never	-0.124939
-2.054286	was never	-0.124939
-2.042035	space never	-0.124939
-1.598643	input never	-0.124939
-1.202598	directive never	-0.124939
-2.063889	a separate	-0.329059
-3.689596	of separate	-0.124939
-2.961181	in separate	-0.124939
-3.378551	be separate	-0.124939
-2.973676	have separate	-0.124939
-2.837076	make separate	-0.124939
-2.808013	functions separate	-0.124939
-2.652818	into separate	-0.124939
-2.373354	need separate	-0.124939
-3.904735	the block	-0.124939
-3.671409	a block	-0.124939
-3.316490	can block	-0.124939
-1.167162	memory block	-0.271067
-2.846169	data block	-0.124939
-2.251747	large block	-0.124939
-2.232845	big block	-0.124939
-2.192662	small block	-0.124939
-1.895689	own block	-0.124939
-1.874638	old block	-0.124939
-1.677525	possibly block	-0.124939
-1.503082	try block	-0.124939
-3.022886	the name	-0.124939
-2.811913	The name	-0.124939
-2.022969	function name	-0.124939
-2.801463	different name	-0.124939
-1.877816	same name	-0.301030
-2.016783	class name	-0.124939
-2.440542	called name	-0.124939
-2.267226	its name	-0.124939
-2.267691	about name	-0.124939
-1.742922	local name	-0.124939
-1.551989	brand name	-0.124939
-1.077739	arbitrary name	-0.124939
-0.902128	looking name	-0.124939
-0.601579	Assembly name	-0.124939
-3.889074	the systems.	-0.124939
-1.087284	64-bit systems.	-0.176091
-2.514283	some systems.	-0.124939
-2.481349	32-bit systems.	-0.124939
-1.326621	operating systems.	-0.124939
-2.183375	Linux systems.	-0.124939
-2.025871	Mac systems.	-0.124939
-1.899075	bigger systems.	-0.124939
-1.639882	message systems.	-0.124939
-1.639882	BSD systems.	-0.124939
-0.680883	embedded systems.	-0.124939
-1.077839	Unix-like systems.	-0.124939
-0.601612	Itanium systems.	-0.124939
-2.244216	to put	-0.346788
-2.707066	and put	-0.124939
-2.746231	be put	-0.124939
-3.028870	may put	-0.124939
-3.021580	you put	-0.124939
-2.973676	have put	-0.124939
-1.959977	then put	-0.602060
-1.393385	simply put	-0.124939
-0.856956	Don't put	-0.124939
-3.845218	the needs	-0.124939
-3.626121	a needs	-0.124939
-2.369329	that needs	-0.425969
-2.052953	it needs	-0.346788
-3.078863	This needs	-0.124939
-2.361565	compiler needs	-0.425969
-2.774450	loop needs	-0.124939
-2.641871	b needs	-0.124939
-2.322059	file needs	-0.124939
-2.284347	constant needs	-0.124939
-2.123238	list needs	-0.124939
-1.131896	section needs	-0.425969
-1.742195	still needs	-0.124939
-1.676331	iteration needs	-0.124939
-1.201931	handler needs	-0.124939
-0.601512	ReadB needs	-0.124939
-2.352586	= y	-0.602060
-1.898346	{ y	-0.726999
-2.650066	double y	-0.124939
-2.488898	return y	-0.124939
-2.111883	structure y	-0.124939
-2.009420	expression y	-0.124939
-1.992386	c; y	-0.124939
-1.935792	> y	-0.124939
-1.875362	Here, y	-0.124939
-1.827599	a; y	-0.124939
-1.772158	b) y	-0.124939
-0.442106	y; y	-0.726999
-0.504821	b2; y	-0.425969
-0.601512	1) y	-0.124939
-0.601512	a.x, y	-0.124939
-0.601512	write: y	-0.124939
-3.022886	the conversion	-0.249877
-2.417059	The conversion	-0.124939
-3.084474	This conversion	-0.124939
-2.877180	A conversion	-0.124939
-2.841162	data conversion	-0.124939
-2.711265	integer conversion	-0.124939
-1.981085	size conversion	-0.124939
-2.592472	float conversion	-0.124939
-2.447047	before conversion	-0.124939
-2.429970	unsigned conversion	-0.124939
-1.203121	type conversion	-0.124939
-1.491561	precision conversion	-0.124939
-0.902128	Efficient conversion	-0.124939
-0.601579	integer-to-float conversion	-0.124939
-3.240964	= c;	-0.124939
-2.128813	int c;	-0.301030
-2.662484	double c;	-0.124939
-1.698277	+ c;	-0.124939
-1.871928	* c;	-0.124939
-2.500431	return c;	-0.124939
-2.159615	/ c;	-0.124939
-0.418356	b, c;	-0.329059
-1.677778	% c;	-0.124939
-0.090150	r, c;	-0.425969
-3.056282	of #include	-0.124939
-2.594116	library #include	-0.124939
-2.326960	SSE2 #include	-0.124939
-2.243442	versions #include	-0.124939
-2.171913	classes #include	-0.124939
-2.035931	dispatching #include	-0.124939
-1.931195	compilers. #include	-0.124939
-1.742232	vectorized #include	-0.124939
-1.298042	16.2 #include	-0.124939
-1.201877	16.1 #include	-0.124939
-1.201877	9.3 #include	-0.124939
-1.076938	141 #include	-0.124939
-1.077685	9.6b. #include	-0.124939
-0.379912	InstructionSet() #include	-0.425969
-0.901593	<stdio.h> #include	-0.124939
-0.901593	Or #include	-0.124939
-0.601311	<excpt.h> #include	-0.124939
-0.601311	<float.h> #include	-0.124939
-0.601311	(SSE2): #include	-0.124939
-0.601311	(Intel) #include	-0.124939
-0.601311	(SSE): #include	-0.124939
-0.601311	114 #include	-0.124939
-3.904735	the various	-0.124939
-3.653838	of various	-0.124939
-2.935438	and various	-0.124939
-3.590110	in various	-0.124939
-1.831351	are various	-0.321233
-2.297670	have various	-0.124939
-2.823714	because various	-0.124939
-2.455949	makes various	-0.124939
-2.124917	contains various	-0.124939
-1.875047	reduce various	-0.124939
-1.378143	show various	-0.124939
-0.902261	describe various	-0.124939
-3.037791	the disadvantage	-0.425969
-2.727900	a disadvantage	-0.249877
-2.142602	The disadvantage	-0.669007
-1.978661	A disadvantage	-0.301030
-1.583294	important disadvantage	-0.425969
-0.726752	Another disadvantage	-0.301030
-1.378977	biggest disadvantage	-0.124939
-3.033480	the high	-0.249877
-3.143955	is high	-0.124939
-2.383782	a high	-0.182931
-2.823166	The high	-0.124939
-2.813303	for high	-0.425969
-2.846058	has high	-0.124939
-1.571086	so high	-0.602060
-2.490933	very high	-0.124939
-0.902462	annoyingly high	-0.124939
-3.937861	the zero	-0.124939
-2.900270	is zero	-0.124939
-3.677347	of zero	-0.124939
-2.199639	to zero	-0.124939
-3.510197	and zero	-0.124939
-2.544301	takes zero	-0.124939
-2.055171	was zero	-0.124939
-1.300196	__m128i zero	-0.425969
-0.380072	terminating zero	-0.425969
-0.601712	remains zero	-0.124939
-3.018719	the Microsoft	-0.124939
-3.122353	is Microsoft	-0.124939
-3.068867	to Microsoft	-0.124939
-3.435084	and Microsoft	-0.124939
-2.807492	The Microsoft	-0.124939
-3.167325	or Microsoft	-0.124939
-2.486641	with Microsoft	-0.124939
-2.774821	If Microsoft	-0.124939
-1.954023	below. Microsoft	-0.124939
-1.224651	Intel, Microsoft	-0.124939
-1.553023	available. Microsoft	-0.124939
-1.443879	Borland Microsoft	-0.124939
-1.376932	CodeGear Microsoft	-0.124939
-0.901994	(The Microsoft	-0.124939
-0.601512	date): Microsoft	-0.124939
-0.601512	tested: Microsoft	-0.124939
-3.604904	to what	-0.124939
-3.423738	and what	-0.124939
-3.385573	that what	-0.124939
-3.160203	or what	-0.124939
-2.058675	on what	-0.249877
-2.848012	at what	-0.124939
-2.266655	does what	-0.124939
-2.224039	But what	-0.124939
-2.092895	add what	-0.124939
-1.897307	shows what	-0.124939
-0.925989	know what	-0.124939
-1.132376	change what	-0.124939
-1.501094	explicitly what	-0.124939
-1.502828	exactly what	-0.124939
-0.901927	that's what	-0.124939
-0.204004	Checking what	-0.425969
-0.901927	reader what	-0.124939
-3.412236	the parameter	-0.124939
-2.877374	a parameter	-0.124939
-2.851897	of parameter	-0.602060
-2.942777	and parameter	-0.425969
-2.549009	function parameter	-0.124939
-2.720408	integer parameter	-0.124939
-1.982412	size parameter	-0.425969
-1.286629	template parameter	-0.124939
-1.150046	; parameter	-0.249877
-1.203078	implicit parameter	-0.124939
-3.937861	the division	-0.124939
-3.510197	and division	-0.124939
-3.434624	The division	-0.124939
-2.371060	than division	-0.425969
-1.577008	point division	-0.124939
-2.744711	one division	-0.124939
-1.788021	integer division	-0.124939
-2.058930	fast division	-0.124939
-0.586628	Integer division	-0.425969
-1.078139	involving division	-0.124939
-2.881226	a reference	-0.301030
-1.649448	or reference	-0.505150
-2.900958	A reference	-0.124939
-1.356938	const reference	-0.124939
-2.292415	constant reference	-0.124939
-1.803940	relative reference	-0.124939
-1.078340	Return reference	-0.124939
-0.902529	null reference	-0.124939
-2.822257	the source	-0.124939
-3.373846	The source	-0.124939
-3.370726	for source	-0.124939
-3.385573	that source	-0.124939
-2.792431	different source	-0.124939
-1.876671	same source	-0.301030
-2.735479	all source	-0.124939
-2.728834	one source	-0.124939
-2.349861	useful source	-0.124939
-2.206512	common source	-0.124939
-2.143485	another source	-0.124939
-2.078544	All source	-0.124939
-1.501094	frequent source	-0.124939
-1.443679	Open source	-0.124939
-1.443102	reliable source	-0.124939
-1.077438	open source	-0.124939
-0.601478	valuable source	-0.124939
-2.831588	the cost	-0.903090
-2.420654	The cost	-0.726999
-3.073053	not cost	-0.124939
-3.095918	This cost	-0.124939
-1.616556	no cost	-0.249877
-2.528830	any cost	-0.124939
-2.498206	performance cost	-0.124939
-1.318874	extra cost	-0.124939
-2.253832	large cost	-0.124939
-1.133122	overhead cost	-0.124939
-2.629088	is running	-0.124939
-3.394752	for running	-0.124939
-2.477150	are running	-0.301030
-1.837456	when running	-0.425969
-2.447047	before running	-0.124939
-2.323111	system running	-0.124939
-1.554188	avoid running	-0.124939
-2.224120	threads running	-0.124939
-2.212028	thread running	-0.124939
-1.954260	applications running	-0.124939
-1.874122	process running	-0.124939
-1.298628	processes running	-0.124939
-1.077739	starts running	-0.124939
-1.078219	Consider running	-0.124939
-2.347001	and automatic	-0.204120
-2.814140	The automatic	-0.124939
-2.804753	for automatic	-0.124939
-2.083540	with automatic	-0.425969
-2.467933	on automatic	-0.124939
-2.701686	no automatic	-0.124939
-2.682738	do automatic	-0.124939
-2.554702	where automatic	-0.124939
-2.231403	Use automatic	-0.124939
-2.124228	contains automatic	-0.124939
-1.825838	supports automatic	-0.124939
-1.299688	install automatic	-0.124939
-0.601612	intrinsics, automatic	-0.124939
-3.873958	the resources	-0.124939
-2.928222	and resources	-0.124939
-1.725151	more resources	-0.823909
-2.853411	memory resources	-0.124939
-2.107371	other resources	-0.124939
-2.760816	which resources	-0.124939
-2.743461	all resources	-0.124939
-1.639521	system resources	-0.124939
-1.503228	allocated resources	-0.124939
-2.075345	shared resources	-0.124939
-0.823374	network resources	-0.124939
-1.298628	computing resources	-0.124939
-0.902128	reserve resources	-0.124939
-0.601579	steals resources	-0.124939
-3.166486	the induction	-0.602060
-3.620800	of induction	-0.124939
-3.446735	and induction	-0.124939
-2.487685	with induction	-0.124939
-2.129553	an induction	-0.301030
-2.929090	use induction	-0.124939
-2.824598	make induction	-0.124939
-2.787069	same induction	-0.124939
-2.102763	point induction	-0.425969
-2.697277	no induction	-0.124939
-1.909383	two induction	-0.124939
-2.369023	need induction	-0.124939
-1.895937	second induction	-0.124939
-1.202064	explicit induction	-0.124939
-0.204017	Update induction	-0.425969
-3.039963	the reason	-0.726999
-1.929422	The reason	-0.564271
-2.815913	same reason	-0.124939
-1.501053	no reason	-0.823909
-1.829740	main reason	-0.124939
-1.378931	security reason	-0.124939
-2.763838	the dispatcher	-0.367977
-2.832383	The dispatcher	-0.124939
-2.913354	A dispatcher	-0.124939
-2.845598	make dispatcher	-0.124939
-1.163427	CPU dispatcher	-0.204120
-3.619149	is n	-0.124939
-3.062101	of n	-0.124939
-3.369589	that n	-0.124939
-3.221275	// n	-0.124939
-2.512443	by n	-0.124939
-3.097656	on n	-0.124939
-2.244886	when n	-0.425969
-2.769991	If n	-0.124939
-2.546639	where n	-0.124939
-2.186725	+= n	-0.124939
-2.090471	add n	-0.124939
-2.058358	(int n	-0.124939
-2.025979	1; n	-0.124939
-1.912827	x; n	-0.124939
-1.708559	adding n	-0.124939
-1.595287	significant n	-0.124939
-1.443082	xn n	-0.124939
-1.443082	<= n	-0.124939
-0.901727	subtracting n	-0.124939
-0.901727	>> n	-0.124939
-3.845218	the string	-0.124939
-2.715837	a string	-0.249877
-3.610324	of string	-0.124939
-2.692268	and string	-0.124939
-2.807492	The string	-0.124939
-3.378588	for string	-0.124939
-3.391034	that string	-0.124939
-3.077636	as string	-0.124939
-2.926484	use string	-0.124939
-2.851735	from string	-0.124939
-2.680718	each string	-0.124939
-2.207571	common string	-0.124939
-1.551393	style string	-0.124939
-0.504821	constants, string	-0.425969
-0.901994	ASCII string	-0.124939
-0.901994	SSE4.2 string	-0.124939
-2.327324	the programmer	-0.425969
-3.779216	a programmer	-0.124939
-2.584028	The programmer	-0.124939
-2.030499	application programmer	-0.124939
-3.805387	the three	-0.124939
-3.401897	and three	-0.124939
-2.800944	The three	-0.124939
-3.303156	be three	-0.124939
-2.710889	are three	-0.425969
-3.146301	or three	-0.124939
-3.063995	as three	-0.124939
-2.942432	have three	-0.124939
-1.905963	has three	-0.124939
-2.730238	all three	-0.124939
-2.642468	into three	-0.124939
-2.457870	way three	-0.124939
-2.233351	compiled three	-0.124939
-2.097581	every three	-0.124939
-1.956706	Make three	-0.124939
-1.799098	Supports three	-0.124939
-1.551782	approximately three	-0.124939
-1.077238	reveals three	-0.124939
-0.901794	executes three	-0.124939
-3.131480	is better	-0.425969
-2.607536	a better	-0.124939
-2.931815	and better	-0.124939
-3.403065	for better	-0.124939
-2.339464	be better	-0.249877
-3.330027	are better	-0.124939
-3.164836	by better	-0.124939
-2.881054	A better	-0.124939
-2.828718	make better	-0.124939
-1.679274	need better	-0.124939
-2.010804	expression better	-0.124939
-1.299688	becoming better	-0.124939
-1.202331	performs better	-0.124939
-2.834292	the keyword	-0.204120
-2.307748	The keyword	-0.522879
-1.495746	static keyword	-0.249877
-1.771256	const keyword	-0.124939
-1.733359	register keyword	-0.124939
-2.116384	inline keyword	-0.124939
-0.805843	volatile keyword	-0.124939
-1.299629	__fastcall keyword	-0.124939
-3.099413	not efficient.	-0.124939
-1.559512	more efficient.	-0.191886
-2.496938	very efficient.	-0.124939
-0.793825	less efficient.	-0.204120
-0.805964	equally efficient.	-0.124939
-2.524312	a lookup	-0.425969
-2.953275	use lookup	-0.124939
-0.897521	table lookup	-0.166331
-1.545177	Use lookup	-0.425969
-1.263964	Table lookup	-0.124939
-1.504083	GOT lookup	-0.124939
-2.378697	the end	-0.630089
-3.702202	of end	-0.124939
-3.701604	to end	-0.124939
-3.159787	with end	-0.124939
-3.043367	than end	-0.124939
-2.536828	we end	-0.124939
-2.335892	always end	-0.124939
-0.601779	mark end	-0.124939
-2.946945	in applications	-0.124939
-3.355417	for applications	-0.124939
-2.816474	make applications	-0.124939
-2.750240	other applications	-0.124939
-2.569782	such applications	-0.124939
-2.534437	many applications	-0.124939
-1.799353	software applications	-0.124939
-2.439896	critical applications	-0.124939
-1.327331	Some applications	-0.124939
-2.207504	several applications	-0.124939
-1.992280	mathematical applications	-0.124939
-1.377623	embedded applications	-0.124939
-1.297796	Multiple applications	-0.124939
-1.297796	CPU-intensive applications	-0.124939
-1.077238	multithreaded applications	-0.124939
-0.901794	Unix applications	-0.124939
-0.901794	WTL applications	-0.124939
-0.601411	resource-hungry applications	-0.124939
-0.601411	Memory-hungry applications	-0.124939
-2.425749	8 below.	-0.124939
-1.056023	explained below.	-0.249877
-2.145924	128 below.	-0.124939
-2.009420	see below.	-0.124939
-1.078523	described below.	-0.124939
-1.743278	given below.	-0.124939
-0.688905	discussed below.	-0.124939
-1.553568	mentioned below.	-0.124939
-1.443879	sections below.	-0.124939
-1.377477	13.1 below.	-0.124939
-1.298295	guidelines below.	-0.124939
-1.202477	8.1 below.	-0.124939
-0.504821	146 below.	-0.124939
-1.078085	164 below.	-0.124939
-0.901994	summarized below.	-0.124939
-0.901994	14.19 below.	-0.124939
-3.845218	the &&	-0.124939
-2.442360	a &&	-0.191886
-3.610324	of &&	-0.124939
-3.029311	an &&	-0.124939
-2.641871	b &&	-0.124939
-2.120567	operators &&	-0.124939
-1.989699	256 &&	-0.124939
-1.996721	y &&	-0.124939
-1.827057	change &&	-0.124939
-1.078085	!a &&	-0.124939
-0.901994	min &&	-0.124939
-0.901994	ARRAYSIZE &&	-0.124939
-0.204010	(a<b &&	-0.124939
-0.601512	b<c &&	-0.124939
-0.601512	INVALID_HANDLE_VALUE &&	-0.124939
-0.601512	if(!a &&	-0.124939
-3.402093	the |	-0.124939
-2.445186	a |	-0.271067
-3.483690	and |	-0.124939
-2.884962	A |	-0.124939
-0.747939	4) |	-0.124939
-0.601728	Wednesday |	-0.124939
-0.204037	(Tuesday |	-0.425969
-0.902261	(~a&c) |	-0.124939
-0.204037	(a&b) |	-0.425969
-0.601646	_mm_setcsr(_mm_getcsr() |	-0.124939
-0.601646	0x7FFFFF) |	-0.124939
-0.601646	0x0F) |	-0.124939
-1.721961	// Make	-0.388180
-1.955795	below. Make	-0.124939
-1.937812	set. Make	-0.124939
-1.744173	class. Make	-0.124939
-1.711545	mode. Make	-0.124939
-1.711101	object. Make	-0.124939
-1.711989	variable. Make	-0.124939
-1.598045	returns. Make	-0.124939
-1.202331	126 Make	-0.124939
-1.202331	support. Make	-0.124939
-1.202331	operators. Make	-0.124939
-0.902195	executables. Make	-0.124939
-0.601612	alternatives: Make	-0.124939
-2.874124	} We	-0.124939
-2.063795	used. We	-0.124939
-1.989173	cache. We	-0.124939
-1.967642	zero We	-0.124939
-1.930611	compilers. We	-0.124939
-1.845742	compiler. We	-0.124939
-1.707474	performance. We	-0.124939
-1.442485	number. We	-0.124939
-1.375538	chain. We	-0.124939
-1.297132	names. We	-0.124939
-1.200999	u.f We	-0.124939
-1.200999	example. We	-0.124939
-1.200999	optimized. We	-0.124939
-1.076838	28. We	-0.124939
-0.901527	15.1c. We	-0.124939
-0.901527	bit. We	-0.124939
-0.601278	caveats. We	-0.124939
-0.601278	...). We	-0.124939
-0.601278	PowerPC). We	-0.124939
-0.601278	set). We	-0.124939
-0.601278	'this'. We	-0.124939
-0.601278	15.1c? We	-0.124939
-0.601278	annoying. We	-0.124939
-3.169409	the examples	-0.124939
-2.571077	The examples	-0.124939
-2.561044	for examples	-0.602060
-2.457529	code examples	-0.124939
-2.917643	more examples	-0.124939
-2.543469	many examples	-0.124939
-1.669819	these examples	-0.124939
-2.329142	following examples	-0.124939
-2.212028	several examples	-0.124939
-2.123540	contains examples	-0.124939
-1.364045	above examples	-0.124939
-1.077739	More examples	-0.124939
-0.902128	contrived examples	-0.124939
-0.902128	obscure examples	-0.124939
-2.794298	for char	-0.124939
-2.592706	= char	-0.425969
-2.750718	used char	-0.124939
-2.688320	set char	-0.124939
-2.583805	static char	-0.124939
-2.423206	8 char	-0.124939
-1.325878	unsigned char	-0.124939
-2.321980	16 char	-0.124939
-2.329683	SSE2 char	-0.124939
-2.327299	32 char	-0.124939
-2.149225	}; char	-0.124939
-1.442870	MMX char	-0.124939
-0.901860	stdint.h char	-0.124939
-0.601445	7.10b char	-0.124939
-0.601445	7.31b char	-0.124939
-0.601445	7.31a char	-0.124939
-0.601445	8.17 char	-0.124939
-0.601445	7.9b char	-0.124939
-2.760393	the difference	-0.669007
-3.721976	a difference	-0.124939
-2.578801	The difference	-0.124939
-1.269309	no difference	-0.425969
-2.236897	big difference	-0.124939
-1.803940	relative difference	-0.124939
-1.554622	Time difference	-0.124939
-0.601779	minimal difference	-0.124939
-2.958296	in addition	-0.124939
-2.382588	an addition	-0.124939
-3.031232	than addition	-0.124939
-1.345569	point addition	-0.182931
-2.064878	one addition	-0.425969
-2.011955	each addition	-0.124939
-2.359772	new addition	-0.124939
-2.275369	important addition	-0.124939
-2.148563	another addition	-0.124939
-1.048456	preceding addition	-0.124939
-1.640036	allow addition	-0.124939
-3.024984	the data.	-0.124939
-2.418361	of data.	-0.367977
-2.831006	vector data.	-0.124939
-2.758010	used data.	-0.124939
-2.590947	static data.	-0.124939
-1.678026	test data.	-0.124939
-2.360541	user data.	-0.124939
-2.354056	these data.	-0.124939
-1.712433	writing data.	-0.124939
-1.598045	input data.	-0.124939
-1.077839	read-only data.	-0.124939
-0.601612	Misaligned data.	-0.124939
-0.601612	writeable data.	-0.124939
-2.540763	is too	-0.124939
-2.494713	be too	-0.124939
-2.478228	are too	-0.124939
-3.189418	or too	-0.124939
-2.945016	time too	-0.124939
-2.838421	has too	-0.124939
-1.854207	takes too	-0.124939
-2.052523	was too	-0.124939
-1.800393	become too	-0.124939
-1.745062	unroll too	-0.124939
-1.444477	generates too	-0.124939
-1.202778	Making too	-0.124939
-0.601612	worrying too	-0.124939
-3.637008	a mechanism	-0.124939
-2.809697	The mechanism	-0.124939
-3.081659	This mechanism	-0.124939
-3.025891	compiler mechanism	-0.124939
-2.658168	Intel mechanism	-0.124939
-2.273119	Gnu mechanism	-0.124939
-1.554042	execution mechanism	-0.124939
-2.040043	dispatching mechanism	-0.124939
-0.580527	dispatch mechanism	-0.204120
-1.745620	out-of-order mechanism	-0.124939
-1.503094	detection mechanism	-0.124939
-1.443567	update mechanism	-0.124939
-0.680726	unwinding mechanism	-0.124939
-1.077639	updating mechanism	-0.124939
-0.902061	renaming mechanism	-0.124939
-2.099066	// Table	-0.346788
-3.077979	- Table	-0.124939
-2.320895	16 Table	-0.124939
-2.329001	SSE2 Table	-0.124939
-1.973277	Microsoft Table	-0.124939
-1.890929	AVX2 Table	-0.124939
-1.871820	options Table	-0.124939
-1.639575	nontemporal Table	-0.124939
-1.551140	overflow. Table	-0.124939
-1.297796	AVX512 Table	-0.124939
-1.201531	(Gnu) Table	-0.124939
-1.201531	uint64_t Table	-0.124939
-1.077238	132 Table	-0.124939
-0.901794	license Table	-0.124939
-0.601411	F64vec4 Table	-0.124939
-0.601411	168.3 Table	-0.124939
-0.601411	-opt-report Table	-0.124939
-0.601411	multiply-and-add Table	-0.124939
-0.601411	97 Table	-0.124939
-3.018719	the runtime	-0.124939
-3.097477	a runtime	-0.124939
-3.610324	of runtime	-0.124939
-3.378588	for runtime	-0.124939
-3.112364	on runtime	-0.124939
-2.926484	use runtime	-0.124939
-2.869535	A runtime	-0.124939
-1.927053	at runtime	-0.124939
-2.738123	all runtime	-0.124939
-2.572677	such runtime	-0.124939
-1.562351	large runtime	-0.425969
-2.228829	big runtime	-0.124939
-2.219902	language runtime	-0.124939
-1.852423	require runtime	-0.124939
-1.769999	No runtime	-0.124939
-1.077538	Big runtime	-0.124939
-2.631026	is needed	-0.221849
-2.740509	be needed	-0.124939
-2.723995	are needed	-0.124939
-1.895017	not needed	-0.346788
-3.027262	than needed	-0.124939
-2.858589	memory needed	-0.124939
-2.699226	set needed	-0.124939
-2.664592	size needed	-0.124939
-2.244218	work needed	-0.124939
-1.916064	actually needed	-0.124939
-1.638915	rarely needed	-0.124939
-0.601811	searching needed	-0.124939
-3.637008	a means	-0.124939
-3.193135	function means	-0.124939
-2.517759	by means	-0.425969
-1.727253	This means	-0.970037
-2.740784	all means	-0.124939
-1.842350	variable means	-0.425969
-2.154579	/ means	-0.124939
-1.873154	10 means	-0.124939
-1.850687	function, means	-0.124939
-1.850687	operands means	-0.124939
-1.852211	here means	-0.124939
-1.676120	% means	-0.124939
-1.378156	protection means	-0.124939
-1.202064	Metaprogramming means	-0.124939
-0.601545	(Scalar means	-0.124939
-2.755842	the last	-0.191886
-2.935438	and last	-0.124939
-3.416374	The last	-0.124939
-2.448165	as last	-0.425969
-3.027262	than last	-0.124939
-0.925927	0, last	-0.602060
-1.677114	true last	-0.124939
-1.599579	come last	-0.124939
-0.805683	8, last	-0.425969
-1.299374	16, last	-0.124939
-0.601646	12, last	-0.124939
-0.601646	400, last	-0.124939
-2.751697	one byte	-0.124939
-1.036273	first byte	-0.669007
-2.239376	bytes byte	-0.124939
-1.152336	1 byte	-0.301030
-0.501279	last byte	-1.028029
-1.714030	per byte	-0.124939
-1.378977	15 byte	-0.124939
-3.397110	the parts	-0.425969
-2.921037	when parts	-0.124939
-2.828718	make parts	-0.124939
-1.730161	different parts	-0.726999
-2.108097	other parts	-0.425969
-2.758010	used parts	-0.124939
-1.222316	critical parts	-0.823909
-2.220120	specific parts	-0.124939
-2.084550	certain parts	-0.124939
-1.552732	time-consuming parts	-0.124939
-1.444477	consuming parts	-0.124939
-1.298795	Critical parts	-0.124939
-1.077839	nearby parts	-0.124939
-2.606507	a ||	-0.124939
-3.631536	of ||	-0.124939
-3.458707	and ||	-0.124939
-3.036348	an ||	-0.124939
-2.307124	0 ||	-0.124939
-0.070553	(a&&b) ||	-0.346788
-1.299108	Wednesday ||	-0.124939
-0.902128	(a&&c) ||	-0.124939
-0.902128	!(a ||	-0.124939
-0.902128	(!a&&c) ||	-0.124939
-0.902128	Tuesday ||	-0.124939
-0.601579	(a&&!b) ||	-0.124939
-0.601579	defined(__unix__) ||	-0.124939
-0.601579	if(!(a ||	-0.124939
-3.595007	a >	-0.124939
-3.046783	x >	-0.124939
-1.958754	b >	-0.425969
-2.606692	i >	-0.124939
-2.480366	2 >	-0.124939
-2.216952	c >	-0.124939
-1.639575	list[i] >	-0.124939
-0.747410	(a >	-0.425969
-1.202177	(u.i >	-0.124939
-1.201531	u.f >	-0.124939
-1.202177	(n >	-0.124939
-0.901794	(bb[i] >	-0.124939
-0.901794	bb[i] >	-0.124939
-0.203990	select(b >	-0.425969
-0.601411	-a >	-0.124939
-0.601411	(absvalue >	-0.124939
-0.601411	(SIZE >	-0.124939
-0.601411	<=, >	-0.124939
-0.601411	abs(u.f) >	-0.124939
-3.496741	and types	-0.124939
-1.383849	different types	-0.425969
-2.776530	other types	-0.124939
-1.787752	integer types	-0.124939
-2.591163	two types	-0.124939
-2.517541	some types	-0.124939
-1.817542	return types	-0.124939
-2.356823	these types	-0.124939
-1.607849	simple types	-0.124939
-1.377930	mixed types	-0.124939
-1.078039	specified types	-0.124939
-3.083107	of expressions	-0.124939
-1.851172	point expressions	-0.124939
-1.512706	integer expressions	-0.221849
-2.594271	float expressions	-0.124939
-2.587122	two expressions	-0.124939
-2.575591	such expressions	-0.124939
-2.250708	large expressions	-0.124939
-1.363788	write expressions	-0.124939
-2.042106	Integer expressions	-0.124939
-0.981823	algebraic expressions	-0.124939
-1.639438	simplest expressions	-0.124939
-0.601612	accept expressions	-0.124939
-0.601612	Reducible expressions	-0.124939
-2.468067	is difficult	-0.669007
-2.939092	and difficult	-0.425969
-2.496870	be difficult	-0.602060
-2.725900	are difficult	-0.425969
-3.124419	code difficult	-0.124939
-2.252459	more difficult	-0.124939
-2.569985	In difficult	-0.124939
-2.487961	very difficult	-0.124939
-2.184071	therefore difficult	-0.124939
-2.069755	quite difficult	-0.124939
-1.078039	slow, difficult	-0.124939
-0.908388	instruction set.	-0.271067
-2.710694	each set.	-0.124939
-3.170500	function instead	-0.124939
-3.089499	code instead	-0.124939
-3.007583	int instead	-0.124939
-2.824077	data instead	-0.124939
-2.605019	i instead	-0.124939
-2.580081	float instead	-0.124939
-2.583146	object instead	-0.124939
-2.478794	table instead	-0.124939
-2.358561	registers instead	-0.124939
-2.314012	system instead	-0.124939
-1.824000	references instead	-0.124939
-1.674139	counters instead	-0.124939
-1.595662	templates instead	-0.124939
-1.501584	#if instead	-0.124939
-0.747473	rounding instead	-0.425969
-1.297464	macros instead	-0.124939
-1.201977	format instead	-0.124939
-1.077038	-fpie instead	-0.124939
-1.077038	typedef instead	-0.124939
-0.901660	int) instead	-0.124939
-0.601344	|) instead	-0.124939
-3.123731	on compilers.	-0.124939
-1.888330	different compilers.	-0.124939
-2.108097	other compilers.	-0.124939
-2.077117	all compilers.	-0.124939
-2.661100	Intel compilers.	-0.124939
-1.640921	C++ compilers.	-0.124939
-2.514283	some compilers.	-0.124939
-1.582542	Gnu compilers.	-0.124939
-1.976042	Microsoft compilers.	-0.124939
-0.688911	PathScale compilers.	-0.124939
-1.553177	across compilers.	-0.124939
-1.444923	Clang compilers.	-0.124939
-1.202331	commercial compilers.	-0.124939
-2.751470	is transferred	-0.124939
-2.229403	be transferred	-0.823909
-1.877227	are transferred	-0.647817
-3.245810	or transferred	-0.124939
-2.905580	then transferred	-0.124939
-2.339275	always transferred	-0.124939
-3.702211	is longer	-0.124939
-3.671409	a longer	-0.124939
-3.653838	of longer	-0.124939
-3.354527	be longer	-0.124939
-2.884962	A longer	-0.124939
-1.774892	no longer	-0.301030
-1.441244	takes longer	-0.425969
-1.313331	take longer	-0.425969
-2.282822	making longer	-0.124939
-1.270146	much longer	-0.602060
-2.183285	matrix longer	-0.124939
-1.942827	byte longer	-0.124939
-2.541699	and after	-0.249877
-3.160203	or after	-0.124939
-2.780911	same after	-0.124939
-2.115696	only after	-0.124939
-2.588912	object after	-0.124939
-2.546757	array after	-0.124939
-2.262102	accessed after	-0.124939
-2.152580	check after	-0.124939
-1.377889	cycles after	-0.425969
-1.956285	needed after	-0.124939
-1.741831	output after	-0.124939
-1.552247	destructors after	-0.124939
-1.443102	switches after	-0.124939
-1.077438	removed after	-0.124939
-0.901927	_mm_empty() after	-0.124939
-0.601478	resume after	-0.124939
-0.601478	locked after	-0.124939
-2.571970	to read	-0.346788
-3.446735	and read	-0.124939
-2.492566	be read	-0.124939
-3.306042	can read	-0.124939
-3.048201	not read	-0.124939
-3.012726	may read	-0.124939
-3.011259	you read	-0.124939
-2.884625	will read	-0.124939
-2.116552	only read	-0.124939
-2.754558	but read	-0.124939
-2.528877	we read	-0.124939
-1.552711	256-bit read	-0.124939
-1.443567	had read	-0.124939
-1.202064	uncached read	-0.124939
-0.902061	99 read	-0.124939
-2.684270	to give	-0.249877
-2.924658	and give	-0.425969
-3.306042	can give	-0.124939
-3.174565	or give	-0.124939
-3.048201	not give	-0.124939
-3.012726	may give	-0.124939
-2.218702	will give	-0.124939
-2.726915	should give	-0.124939
-2.355246	systems give	-0.124939
-2.232821	doesn't give	-0.124939
-2.111904	would give	-0.124939
-1.743067	sometimes give	-0.124939
-1.742558	still give	-0.124939
-1.597448	counts give	-0.124939
-0.805459	inputs give	-0.124939
-2.381173	code. Each	-0.124939
-2.377033	time. Each	-0.124939
-1.738977	resources. Each	-0.124939
-1.711728	bytes. Each	-0.124939
-1.674844	stack. Each	-0.124939
-1.674139	classes. Each	-0.124939
-1.637055	instructions. Each	-0.124939
-1.500166	units. Each	-0.124939
-1.443592	simultaneously. Each	-0.124939
-1.443592	threads. Each	-0.124939
-1.376645	cores. Each	-0.124939
-1.201977	initialization. Each	-0.124939
-1.201265	information. Each	-0.124939
-1.202690	diagonal. Each	-0.124939
-1.077751	reasons: Each	-0.124939
-1.077038	list. Each	-0.124939
-0.601344	(methods) Each	-0.124939
-0.601344	64. Each	-0.124939
-0.601344	unacceptable. Each	-0.124939
-0.601344	Z. Each	-0.124939
-0.601344	29. Each	-0.124939
-2.579704	it becomes	-0.124939
-1.703935	code becomes	-0.182931
-2.783891	loop becomes	-0.124939
-2.297601	instructions becomes	-0.124939
-2.226475	threads becomes	-0.124939
-2.164581	calculation becomes	-0.124939
-2.071624	counter becomes	-0.124939
-1.348802	space becomes	-0.124939
-1.317226	caching becomes	-0.124939
-1.992817	never becomes	-0.124939
-0.601679	click becomes	-0.124939
-3.690917	is aligned	-0.124939
-2.494713	be aligned	-0.602060
-2.322994	are aligned	-0.124939
-1.904318	vector aligned	-0.124939
-2.828718	make aligned	-0.124939
-2.098777	store aligned	-0.124939
-2.024113	typically aligned	-0.124939
-2.028965	preferably aligned	-0.124939
-1.954477	three aligned	-0.124939
-1.918934	load aligned	-0.124939
-1.744173	S1 aligned	-0.124939
-0.504971	16kB aligned	-0.425969
-0.902195	properly aligned	-0.124939
-3.471019	and directives	-0.124939
-1.670105	these directives	-0.124939
-2.274429	Gnu directives	-0.124939
-2.211634	These directives	-0.124939
-1.974278	#include directives	-0.124939
-1.976042	Microsoft directives	-0.124939
-1.157288	#define directives	-0.124939
-1.598934	Compiler directives	-0.124939
-0.856795	Optimization directives	-0.124939
-0.601861	OpenMP directives	-0.301030
-0.805732	#if directives	-0.124939
-0.124889	Preprocessing directives	-0.124939
-0.902195	preprocessing directives	-0.124939
-2.773456	that requires	-0.124939
-2.326868	it requires	-0.124939
-2.167914	This requires	-0.124939
-2.943617	this requires	-0.124939
-2.896942	It requires	-0.124939
-2.399537	often requires	-0.124939
-1.678552	pointers requires	-0.124939
-1.649068	method requires	-0.124939
-2.299631	processors requires	-0.124939
-2.180311	precision requires	-0.124939
-2.163096	calculation requires	-0.124939
-1.895451	vectors requires	-0.124939
-1.872229	usually requires	-0.124939
-0.902128	sampling requires	-0.124939
-3.859350	the optimizations	-0.124939
-3.077001	of optimizations	-0.124939
-3.025891	compiler optimizations	-0.124939
-2.106646	other optimizations	-0.124939
-2.086472	which optimizations	-0.425969
-2.075960	all optimizations	-0.124939
-2.679385	do optimizations	-0.124939
-1.886782	such optimizations	-0.124939
-1.590923	making optimizations	-0.124939
-2.217863	specific optimizations	-0.124939
-2.109386	doing optimizations	-0.124939
-1.897969	improve optimizations	-0.124939
-1.640885	enable optimizations	-0.124939
-1.202577	interprocedural optimizations	-0.124939
-0.601545	cross-module optimizations	-0.124939
-3.889074	the graphics	-0.124939
-2.444478	a graphics	-0.191886
-3.083107	of graphics	-0.124939
-3.189418	or graphics	-0.124939
-2.701686	no graphics	-0.124939
-2.250708	large graphics	-0.124939
-2.220120	specific graphics	-0.124939
-1.917608	Each graphics	-0.124939
-0.805553	heavy graphics	-0.124939
-1.078286	cover graphics	-0.124939
-0.902195	third-party graphics	-0.124939
-0.902195	Various graphics	-0.124939
-0.601612	rendering graphics	-0.124939
-2.877374	a public	-0.301030
-3.677347	of public	-0.124939
-2.942777	and public	-0.124939
-2.970448	have public	-0.124939
-2.754337	all public	-0.124939
-2.528830	any public	-0.124939
-0.588475	: public	-0.234083
-2.084475	All public	-0.124939
-0.902395	override public	-0.124939
-0.601712	B1, public	-0.124939
-1.176355	{ public:	-0.602060
-1.921792	x; public:	-0.124939
-1.203533	56 public:	-0.124939
-0.601913	a[N]; public:	-0.124939
-3.407135	the framework	-0.124939
-3.113691	a framework	-0.124939
-3.093028	This framework	-0.124939
-2.485404	software framework	-0.124939
-2.263986	extra framework	-0.124939
-1.008826	runtime framework	-0.124939
-1.224167	graphics framework	-0.124939
-1.203054	interface framework	-0.124939
-0.981662	level framework	-0.124939
-1.377930	complex framework	-0.124939
-0.266180	.NET framework	-0.124939
-3.071680	to look	-0.425969
-3.306042	can look	-0.124939
-3.048201	not look	-0.124939
-2.096184	may look	-0.301030
-2.080900	you look	-0.602060
-2.873340	A look	-0.124939
-2.884625	will look	-0.124939
-2.796071	functions look	-0.124939
-2.726915	should look	-0.124939
-2.569765	also look	-0.124939
-2.416409	first look	-0.124939
-2.022738	typically look	-0.124939
-0.504841	Let's look	-0.425969
-0.902061	let's look	-0.124939
-0.902061	(3) look	-0.124939
-1.147931	static linking	-0.329059
-1.103841	dynamic linking	-0.221849
-0.664142	Dynamic linking	-0.249877
-1.679535	level linking	-0.124939
-1.555022	easy linking	-0.124939
-0.601932	Static linking	-0.124939
-2.381173	code. Many	-0.124939
-2.145660	functions. Many	-0.124939
-2.065784	program. Many	-0.124939
-1.951086	below. Many	-0.124939
-1.912273	processors. Many	-0.124939
-1.846722	compiler. Many	-0.124939
-1.674139	counters Many	-0.124939
-1.636350	updates Many	-0.124939
-1.595662	dispatching. Many	-0.124939
-1.552027	integers. Many	-0.124939
-1.442883	solution. Many	-0.124939
-1.297464	microprocessors. Many	-0.124939
-1.201265	databases Many	-0.124939
-1.201265	options. Many	-0.124939
-1.201265	input. Many	-0.124939
-0.901660	developer.intel.com. Many	-0.124939
-0.901660	slower. Many	-0.124939
-0.601344	more. Many	-0.124939
-0.601344	breakdown. Many	-0.124939
-0.601344	services. Many	-0.124939
-0.601344	properly. Many	-0.124939
-2.831006	vector processors.	-0.124939
-2.804515	different processors.	-0.124939
-1.565715	Intel processors.	-0.124939
-2.514283	some processors.	-0.124939
-2.140341	known processors.	-0.124939
-1.917608	graphics processors.	-0.124939
-0.877769	VIA processors.	-0.124939
-0.981555	logical processors.	-0.124939
-0.902374	future processors.	-0.124939
-0.902374	newer processors.	-0.124939
-1.502469	PC processors.	-0.124939
-1.378869	newest processors.	-0.124939
-1.077839	contemporary processors.	-0.124939
-2.630056	is actually	-0.124939
-3.645250	to actually	-0.124939
-3.403065	for actually	-0.124939
-2.322994	are actually	-0.124939
-3.312980	can actually	-0.124939
-3.240613	it actually	-0.124939
-2.348680	may actually	-0.124939
-2.651390	pointer actually	-0.124939
-2.360541	user actually	-0.124939
-1.202331	modifications actually	-0.124939
-1.077839	F2 actually	-0.124939
-0.902195	code" actually	-0.124939
-0.601612	temp++ actually	-0.124939
-2.301982	of Intel,	-1.079181
-3.438006	for Intel,	-0.124939
-2.384165	an Intel,	-0.124939
-2.870313	from Intel,	-0.124939
-2.692560	compilers Intel,	-0.124939
-2.057910	both Intel,	-0.124939
-1.977895	Microsoft Intel,	-0.124939
-0.442287	Microsoft, Intel,	-0.425969
-1.203492	Clang, Intel,	-0.124939
-2.383168	a linked	-0.550907
-2.341727	be linked	-0.249877
-3.353677	are linked	-0.124939
-3.035240	than linked	-0.124939
-2.942360	use linked	-0.124939
-2.228461	A linked	-0.425969
-2.899217	then linked	-0.124939
-2.806000	functions linked	-0.124939
-1.803407	dynamically linked	-0.124939
-0.601712	(dynamically linked	-0.124939
-3.233883	= x;	-0.124939
-2.128361	int x;	-0.124939
-2.896104	} x;	-0.124939
-1.662443	float x;	-0.301030
-2.552176	* x;	-0.124939
-2.498488	return x;	-0.124939
-1.503172	+= x;	-0.124939
-0.274611	*= x;	-0.124939
-1.444877	C1 x;	-0.124939
-0.504998	Bitfield x;	-0.425969
-0.902328	ptr x;	-0.124939
-3.086192	of microprocessors	-0.124939
-3.483690	and microprocessors	-0.124939
-2.662574	Intel microprocessors	-0.124939
-1.833379	some microprocessors	-0.124939
-2.462670	way microprocessors	-0.124939
-1.180601	old microprocessors	-0.124939
-0.691886	modern microprocessors	-0.249877
-1.598755	newer microprocessors	-0.124939
-0.442263	Modern microprocessors	-0.249877
-1.202464	older microprocessors	-0.124939
-1.203291	Smaller microprocessors	-0.124939
-0.601646	Today's microprocessors	-0.124939
-2.200018	to load	-0.388180
-3.444045	The load	-0.124939
-3.078198	not load	-0.124939
-2.900426	will load	-0.124939
-2.182201	at load	-0.425969
-1.554609	work load	-0.425969
-2.224669	specific load	-0.124939
-1.299462	actual load	-0.124939
-0.601746	it) load	-0.124939
-3.097848	to control	-0.425969
-1.152683	loop control	-0.425969
-1.792476	cache control	-0.124939
-2.710413	set control	-0.124939
-2.559273	version control	-0.124939
-1.446304	Cache control	-0.124939
-2.688921	to assume	-0.249877
-3.496741	and assume	-0.124939
-2.245122	can assume	-0.726999
-3.204797	or assume	-0.124939
-3.018112	you assume	-0.124939
-1.844913	we assume	-0.124939
-1.764435	cannot assume	-0.425969
-2.114142	would assume	-0.124939
-0.747731	generally assume	-0.425969
-0.601679	makers assume	-0.124939
-0.601679	safely assume	-0.124939
-1.923460	= 100;	-0.367977
-0.803898	< 100;	-0.761761
-3.184329	the numbers	-0.301030
-1.345730	point numbers	-0.249877
-1.457449	four numbers	-0.124939
-2.129162	eight numbers	-0.124939
-1.299838	model numbers	-0.124939
-1.445277	thousand numbers	-0.124939
-0.902462	denormal numbers	-0.124939
-0.902462	hexadecimal numbers	-0.124939
-0.601746	(low numbers	-0.124939
-3.659637	a platform	-0.124939
-3.642544	of platform	-0.124939
-2.804515	different platform	-0.124939
-1.484927	bit platform	-0.124939
-2.242902	Windows platform	-0.124939
-2.183375	Linux platform	-0.124939
-0.876287	hardware platform	-0.124939
-1.348727	optimal platform	-0.124939
-1.976042	Microsoft platform	-0.124939
-1.876355	x86 platform	-0.124939
-1.502469	PC platform	-0.124939
-1.202778	x86-64 platform	-0.124939
-1.077839	efficiency, platform	-0.124939
-3.763557	is later	-0.124939
-2.278073	and later	-0.271067
-3.456591	for later	-0.124939
-1.839594	or later	-0.550907
-2.950521	use later	-0.124939
-2.300821	& later	-0.124939
-2.074265	cycles later	-0.124939
-3.135467	code together	-0.124939
-1.455318	used together	-0.602060
-1.860171	objects together	-0.124939
-1.136973	stored together	-0.204120
-1.224636	linked together	-0.124939
-1.641213	keep together	-0.124939
-1.299629	project together	-0.124939
-1.078620	joined together	-0.124939
-2.837012	the dispatch	-0.204120
-3.749654	a dispatch	-0.124939
-3.726366	to dispatch	-0.124939
-2.552539	function dispatch	-0.124939
-1.297963	CPU dispatch	-0.301030
-1.960170	runtime dispatch	-0.124939
-3.397110	the calling	-0.124939
-3.690917	is calling	-0.124939
-3.471019	and calling	-0.124939
-3.407529	The calling	-0.124939
-3.403065	for calling	-0.124939
-2.138660	function calling	-0.124939
-2.271073	by calling	-0.124939
-3.091718	as calling	-0.124939
-3.023327	than calling	-0.124939
-1.758098	before calling	-0.425969
-2.220120	specific calling	-0.124939
-2.097461	standard calling	-0.124939
-1.379316	5: calling	-0.124939
-3.620800	of your	-0.124939
-3.624609	to your	-0.124939
-2.952584	in your	-0.124939
-2.800541	for your	-0.124939
-3.396565	that your	-0.124939
-2.540693	if your	-0.124939
-1.899329	make your	-0.301030
-2.776037	If your	-0.124939
-2.446201	before your	-0.124939
-2.176722	inside your	-0.124939
-2.054779	write your	-0.124939
-2.046650	unless your	-0.124939
-1.597958	define your	-0.124939
-0.902061	send your	-0.124939
-0.601545	Inserting your	-0.124939
-0.714934	its own	-0.124939
-1.317271	their own	-0.124939
-0.672535	your own	-0.221849
-1.133035	my own	-0.425969
-1.600800	My own	-0.124939
-3.147131	is declared	-0.124939
-2.228236	be declared	-0.221849
-2.326785	are declared	-0.124939
-3.083406	not declared	-0.124939
-1.446403	objects declared	-0.425969
-2.510875	variables declared	-0.124939
-1.712648	macro declared	-0.124939
-1.016753	Variables declared	-0.425969
-2.918431	the XMM	-0.823909
-3.596311	in XMM	-0.124939
-3.425403	The XMM	-0.124939
-3.420184	for XMM	-0.124939
-2.543215	if XMM	-0.425969
-2.782991	point XMM	-0.124939
-2.149683	uses XMM	-0.124939
-2.043162	Integer XMM	-0.124939
-1.994696	Boolean XMM	-0.124939
-0.450658	128-bit XMM	-0.124939
-1.299508	versus XMM	-0.124939
-2.551422	the second	-0.263241
-2.887069	a second	-0.124939
-2.425190	The second	-0.249877
-2.104515	every second	-0.124939
-0.601880	compatibility, second	-0.124939
-1.052134	example shows	-0.425969
-1.809000	table shows	-0.124939
-1.202999	12.4b shows	-0.124939
-0.902529	16) shows	-0.124939
-0.902529	39 shows	-0.124939
-0.902529	58 shows	-0.124939
-0.601779	77) shows	-0.124939
-0.601779	131) shows	-0.124939
-3.601075	and interface	-0.124939
-0.616965	user interface	-0.301030
-0.806121	graphical interface	-0.124939
-0.505062	well-defined interface	-0.425969
-2.580081	to improve	-0.124939
-1.963543	can improve	-0.271067
-3.094011	not improve	-0.124939
-1.823837	may improve	-0.522879
-2.808495	only improve	-0.124939
-1.679322	possibly improve	-0.124939
-3.873958	the higher	-0.124939
-3.679909	is higher	-0.124939
-2.517560	a higher	-0.204120
-3.339219	be higher	-0.124939
-3.322421	are higher	-0.124939
-3.181928	or higher	-0.124939
-2.877180	A higher	-0.124939
-2.836533	has higher	-0.124939
-2.522192	any higher	-0.124939
-1.524417	much higher	-0.124939
-2.021546	next higher	-0.124939
-1.917588	give higher	-0.124939
-1.872229	usually higher	-0.124939
-0.601579	hence higher	-0.124939
-2.889570	is bigger	-0.602060
-3.610324	of bigger	-0.124939
-3.382025	The bigger	-0.124939
-3.324433	be bigger	-0.124939
-3.307598	are bigger	-0.124939
-3.112364	on bigger	-0.124939
-3.077636	as bigger	-0.124939
-2.774450	loop bigger	-0.124939
-1.416541	new bigger	-0.602060
-2.242834	arrays bigger	-0.124939
-2.007268	allows bigger	-0.124939
-1.915589	becomes bigger	-0.124939
-1.799106	become bigger	-0.124939
-1.711093	offset bigger	-0.124939
-1.298295	Objects bigger	-0.124939
-1.298295	ever bigger	-0.124939
-3.382493	the vectors	-0.124939
-3.566151	in vectors	-0.124939
-3.378588	for vectors	-0.124939
-2.465396	on vectors	-0.124939
-2.706765	integer vectors	-0.124939
-2.664697	using vectors	-0.124939
-1.978693	double vectors	-0.124939
-2.588896	float vectors	-0.124939
-2.583408	64-bit vectors	-0.124939
-1.995090	intrinsic vectors	-0.124939
-1.898180	XMM vectors	-0.124939
-1.898180	bigger vectors	-0.124939
-1.711636	Define vectors	-0.124939
-1.502415	YMM vectors	-0.124939
-1.077538	(128 vectors	-0.124939
-0.379992	3-dimensional vectors	-0.124939
-3.231350	// Floating	-0.124939
-2.646004	double Floating	-0.124939
-2.637941	n.a. Floating	-0.124939
-2.498937	variables Floating	-0.124939
-1.990260	systems. Floating	-0.124939
-1.973737	division Floating	-0.124939
-1.741468	shift Floating	-0.124939
-1.709890	cycles. Floating	-0.124939
-1.676344	purposes. Floating	-0.124939
-1.551405	expressions. Floating	-0.124939
-1.442870	parameters. Floating	-0.124939
-1.444091	integer. Floating	-0.124939
-0.379965	14.6 Floating	-0.425969
-0.901860	105. Floating	-0.124939
-0.203997	7.3 Floating	-0.425969
-0.601445	organized. Floating	-0.124939
-0.601445	cycles). Floating	-0.124939
-0.601445	79 Floating	-0.124939
-3.920982	the AVX2	-0.124939
-3.425403	The AVX2	-0.124939
-3.420184	for AVX2	-0.124939
-3.268574	// AVX2	-0.124939
-2.925841	when AVX2	-0.124939
-1.812580	2 AVX2	-0.124939
-1.338174	4 AVX2	-0.124939
-1.491225	8 AVX2	-0.124939
-0.885500	256 AVX2	-0.124939
-1.897034	vectors AVX2	-0.124939
-0.902328	AVX, AVX2	-0.124939
-3.422620	the piece	-0.425969
-2.279713	a piece	-1.124939
-3.189243	by piece	-0.124939
-2.132819	same piece	-0.425969
-2.447463	critical piece	-0.124939
-2.246932	calculations piece	-0.124939
-1.503581	small piece	-0.124939
-2.042810	particular piece	-0.124939
-2.546321	is divisible	-0.903090
-2.752029	be divisible	-0.425969
-2.423346	not divisible	-0.425969
-2.674289	size divisible	-0.124939
-1.228604	address divisible	-0.823909
-0.965024	addresses divisible	-0.726999
-3.677347	of <<	-0.124939
-1.957574	n <<	-0.124939
-1.641372	list[i] <<	-0.124939
-0.033849	cout <<	-0.346788
-1.503069	j <<	-0.124939
-0.902395	3) <<	-0.124939
-0.601712	asa <<	-0.124939
-0.601712	as(a <<	-0.124939
-0.601712	(C <<	-0.124939
-0.601712	(B <<	-0.124939
-1.594460	} Here,	-0.204120
-2.218144	i; Here,	-0.124939
-2.150724	b; Here,	-0.124939
-2.071542	... Here,	-0.124939
-1.993309	c; Here,	-0.124939
-1.916167	x; Here,	-0.124939
-0.902128	ArrayOfStructures[100]; Here,	-0.124939
-0.902128	3.5; Here,	-0.124939
-0.902128	testing. Here,	-0.124939
-0.601579	blog. Here,	-0.124939
-0.601579	sets). Here,	-0.124939
-0.601579	List[i]++; Here,	-0.124939
-0.601579	c1::*MemberPointer; Here,	-0.124939
-0.601579	1]; Here,	-0.124939
-2.831588	the x86	-0.425969
-3.602603	in x86	-0.124939
-3.434624	The x86	-0.124939
-3.429004	for x86	-0.124939
-1.551659	all x86	-0.522879
-2.428908	bit x86	-0.124939
-2.084475	All x86	-0.124939
-1.802372	Supports x86	-0.124939
-1.802372	modern x86	-0.124939
-0.902395	__linux__ x86	-0.124939
-3.398860	The process	-0.124939
-2.467086	on process	-0.124939
-2.877180	A process	-0.124939
-2.684875	each process	-0.124939
-2.055874	allocation process	-0.124939
-2.007035	complicated process	-0.124939
-1.299300	development process	-0.124939
-1.955676	lookup process	-0.124939
-0.691866	installation process	-0.124939
-1.501314	background process	-0.124939
-1.443799	update process	-0.124939
-1.298628	Development process	-0.124939
-0.601579	learning process	-0.124939
-0.601579	delaying process	-0.124939
-3.175316	the binary	-0.124939
-2.721157	a binary	-0.124939
-3.653838	of binary	-0.124939
-3.655951	to binary	-0.124939
-2.956861	in binary	-0.124939
-3.096516	as binary	-0.124939
-2.937003	use binary	-0.124939
-2.226732	A binary	-0.425969
-2.269427	its binary	-0.124939
-1.773614	produce binary	-0.124939
-0.902261	biased binary	-0.124939
-0.902261	facilities, binary	-0.124939
-2.412592	to know	-0.367977
-3.078198	not know	-0.124939
-2.335873	you know	-0.124939
-2.535683	we know	-0.124939
-1.764782	cannot know	-0.124939
-1.290398	doesn't know	-0.124939
-2.115265	would know	-0.124939
-1.995313	don't know	-0.124939
-0.601746	Intel) know	-0.124939
-3.137673	is 512	-0.124939
-3.113691	a 512	-0.425969
-3.496741	and 512	-0.124939
-3.425403	The 512	-0.124939
-2.576858	also 512	-0.124939
-1.745149	8 512	-0.425969
-1.640868	16 512	-0.425969
-2.184071	matrix 512	-0.124939
-0.650141	512 512	-0.221849
-0.601679	38.7 512	-0.124939
-0.601679	80.9 512	-0.124939
-2.489418	to generate	-0.301030
-2.957836	and generate	-0.425969
-3.449795	that generate	-0.124939
-1.463319	will generate	-0.249877
-2.240203	doesn't generate	-0.124939
-2.012992	automatically generate	-0.124939
-3.035630	the advantages	-0.249877
-2.022520	The advantages	-1.079181
-2.847988	has advantages	-0.124939
-2.786814	other advantages	-0.124939
-2.554560	many advantages	-0.124939
-2.225814	specific advantages	-0.124939
-2.217520	several advantages	-0.124939
-2.060268	above advantages	-0.124939
-2.931815	and r	-0.124939
-2.774783	that r	-0.425969
-2.355364	= r	-0.301030
-3.164836	by r	-0.124939
-2.983818	{ r	-0.124939
-2.921037	when r	-0.124939
-2.554702	where r	-0.124939
-1.746488	0; r	-0.425969
-2.251142	; r	-0.124939
-2.127752	whether r	-0.124939
-1.333469	1; r	-0.425969
-1.976484	what r	-0.124939
-1.078286	lies r	-0.124939
-2.257839	is usually	-0.166331
-2.484753	are usually	-0.124939
-2.905823	will usually	-0.124939
-2.304470	processors usually	-0.124939
-2.247919	calculations usually	-0.124939
-1.503423	integer, usually	-0.124939
-1.203132	databases usually	-0.124939
-3.397110	the results	-0.124939
-2.572355	The results	-0.301030
-3.087307	This results	-0.124939
-2.334430	out results	-0.124939
-2.331361	32 results	-0.124939
-2.144687	four results	-0.124939
-0.964790	intermediate results	-0.124939
-1.444032	measured results	-0.124939
-1.444032	reliable results	-0.124939
-1.444477	thousand results	-0.124939
-0.902195	inconsistent results	-0.124939
-0.902195	misleading results	-0.124939
-0.601612	experimental results	-0.124939
-3.601075	and b,	-0.124939
-3.070100	int b,	-0.124939
-0.439955	a, b,	-0.851937
-1.078741	a[100], b,	-0.124939
-3.407135	the storage	-0.124939
-3.089300	of storage	-0.124939
-2.818629	The storage	-0.124939
-3.174435	by storage	-0.124939
-2.173420	data storage	-0.124939
-2.593837	static storage	-0.124939
-1.843702	variable storage	-0.124939
-1.378310	Register storage	-0.124939
-0.124896	Thread-local storage	-0.124939
-1.078039	endian storage	-0.124939
-0.204044	thread-local storage	-0.124939
-2.696271	the old	-0.249877
-3.444045	The old	-0.124939
-3.438006	for old	-0.124939
-2.085192	with old	-0.249877
-3.139365	on old	-0.124939
-3.054456	an old	-0.124939
-2.490933	very old	-0.124939
-1.202865	years old	-0.124939
-0.601746	plain old	-0.124939
-2.488680	to reduce	-0.124939
-3.553263	and reduce	-0.124939
-1.963387	can reduce	-0.191886
-2.352128	may reduce	-0.425969
-2.905823	will reduce	-0.124939
-2.459003	cannot reduce	-0.124939
-1.919363	actually reduce	-0.124939
-2.525990	that goes	-0.124939
-3.174565	or goes	-0.124939
-2.326468	it goes	-0.124939
-3.193135	function goes	-0.124939
-3.110113	code goes	-0.124939
-2.940207	time goes	-0.124939
-2.826540	vector goes	-0.124939
-2.324254	always goes	-0.124939
-1.742558	output goes	-0.124939
-1.677140	frequency goes	-0.124939
-1.501048	DLL goes	-0.124939
-1.298462	project goes	-0.124939
-1.077639	9.5a goes	-0.124939
-0.601545	p->f() goes	-0.124939
-0.601545	1% goes	-0.124939
-3.097477	a union	-0.425969
-3.382025	The union	-0.124939
-3.167325	or union	-0.124939
-2.977050	{ union	-0.124939
-1.974298	A union	-0.301030
-1.077538	14.28 union	-0.124939
-0.901994	14.23b union	-0.124939
-0.901994	14.26 union	-0.124939
-0.901994	14.27 union	-0.124939
-0.901994	14.23 union	-0.124939
-0.601512	7.40b union	-0.124939
-0.601512	doubles: union	-0.124939
-0.601512	7.39 union	-0.124939
-0.601512	14.29 union	-0.124939
-0.601512	14.24 union	-0.124939
-0.601512	14.25 union	-0.124939
-1.802044	= 0,	-0.124939
-1.929531	at 0,	-0.602060
-1.244829	> 0,	-0.425969
-0.204077	memset(a, 0,	-0.425969
-0.204077	_controlfp_s(&dummy, 0,	-0.425969
-0.601846	memset(list, 0,	-0.124939
-2.303005	is called.	-0.221849
-3.412800	be called.	-0.124939
-2.329076	are called.	-0.249877
-2.059619	was called.	-0.124939
-1.045377	never called.	-0.124939
-2.841197	of 10	-0.301030
-3.614644	to 10	-0.124939
-3.435084	and 10	-0.124939
-3.391034	that 10	-0.124939
-3.083547	- 10	-0.124939
-2.551228	where 10	-0.124939
-2.546005	value 10	-0.124939
-1.853509	takes 10	-0.124939
-2.413569	take 10	-0.124939
-1.991309	systems. 10	-0.124939
-1.713269	until 10	-0.124939
-1.597150	chapter 10	-0.124939
-1.597693	Windows. 10	-0.124939
-1.298295	executed 10	-0.124939
-0.901994	.................................................................................................... 10	-0.124939
-0.901994	99 10	-0.124939
-2.900270	is based	-0.301030
-3.370394	be based	-0.124939
-1.981178	are based	-1.028029
-2.794543	CPU based	-0.124939
-2.580316	C++ based	-0.124939
-2.225561	language based	-0.124939
-1.958604	dispatcher based	-0.124939
-1.917723	framework based	-0.124939
-1.828012	go based	-0.124939
-1.553530	chosen based	-0.124939
-2.692441	to choose	-0.124939
-2.709222	and choose	-0.301030
-1.656463	may choose	-0.271067
-3.023324	you choose	-0.124939
-2.903116	will choose	-0.124939
-2.735086	should choose	-0.124939
-2.011733	automatically choose	-0.124939
-1.445477	developers choose	-0.124939
-3.412236	the options	-0.124939
-2.365155	compiler options	-0.124939
-0.945387	optimization options	-0.249877
-2.294131	available options	-0.124939
-2.183497	line options	-0.124939
-2.086523	certain options	-0.124939
-1.976744	various options	-0.124939
-1.802717	installation options	-0.124939
-1.445423	debugging options	-0.124939
-0.601712	power-save options	-0.124939
-3.402093	the feature	-0.124939
-2.873556	a feature	-0.301030
-3.204911	function feature	-0.124939
-2.420490	This feature	-0.124939
-2.040036	this feature	-0.602060
-2.884962	A feature	-0.124939
-2.790364	CPU feature	-0.124939
-2.576567	such feature	-0.124939
-2.388732	template feature	-0.124939
-1.678266	test feature	-0.124939
-1.742831	special feature	-0.124939
-0.902261	interposition feature	-0.124939
-1.890192	different ways	-0.602060
-2.786814	other ways	-0.124939
-2.560554	possible ways	-0.124939
-2.217520	several ways	-0.124939
-2.059990	fast ways	-0.124939
-0.519718	various ways	-0.425969
-1.263641	three ways	-0.425969
-0.204064	smarter ways	-0.425969
-2.164980	that were	-0.602060
-2.457544	elements were	-0.124939
-2.364328	they were	-0.124939
-2.294485	instructions were	-0.124939
-2.247634	versions were	-0.124939
-2.040162	space were	-0.124939
-1.875547	results were	-0.124939
-1.825818	tested were	-0.124939
-1.711690	tasks were	-0.124939
-1.639139	sizes were	-0.124939
-1.202198	tests were	-0.124939
-0.902128	8.15a were	-0.124939
-0.902128	differences were	-0.124939
-0.601579	Func2 were	-0.124939
-3.937861	the link	-0.124939
-3.695957	a link	-0.124939
-3.678178	to link	-0.124939
-3.510197	and link	-0.124939
-2.820892	The link	-0.124939
-1.495515	static link	-0.249877
-1.012019	dynamic link	-0.425969
-1.772967	No link	-0.124939
-1.640680	previous link	-0.124939
-0.902395	symbolic link	-0.124939
-2.900270	is made	-0.301030
-2.061516	be made	-0.124939
-2.970448	have made	-0.124939
-2.161389	has made	-0.425969
-2.613731	library made	-0.124939
-2.599191	object made	-0.124939
-1.502723	once made	-0.124939
-1.078139	projects made	-0.124939
-0.601712	ready made	-0.124939
-0.601712	Ready made	-0.124939
-2.549306	the appropriate	-0.467361
-3.453675	The appropriate	-0.124939
-3.083406	not appropriate	-0.124939
-3.058170	an appropriate	-0.124939
-2.839191	make appropriate	-0.124939
-2.691779	most appropriate	-0.124939
-2.236622	Use appropriate	-0.124939
-0.601779	wherever appropriate	-0.124939
-1.363243	int i,	-0.271067
-1.700041	+ i,	-0.602060
-0.601947	%10I64i", i,	-0.124939
-3.973723	the constructor	-0.124939
-3.721976	a constructor	-0.124939
-2.825452	The constructor	-0.124939
-2.627133	// constructor	-0.124939
-2.900958	A constructor	-0.124939
-2.299367	simple constructor	-0.124939
-0.695347	copy constructor	-0.191886
-0.601842	default constructor	-0.124939
-3.086192	of CPUs.	-0.124939
-1.730420	different CPUs.	-0.249877
-1.978473	Intel CPUs.	-0.124939
-1.514384	AMD CPUs.	-0.124939
-2.008005	their CPUs.	-0.124939
-1.876687	x86 CPUs.	-0.124939
-1.874638	old CPUs.	-0.124939
-1.828382	VIA CPUs.	-0.124939
-1.801642	modern CPUs.	-0.124939
-1.712699	non-Intel CPUs.	-0.124939
-1.598755	future CPUs.	-0.124939
-1.077939	earlier CPUs.	-0.124939
-2.204072	= 2;	-0.249877
-1.423836	+ 2;	-0.823909
-1.343960	* 2;	-0.346788
-2.486319	< 2;	-0.124939
-2.198599	+= 2;	-0.124939
-1.181288	<< 2;	-0.425969
-2.741336	is just	-0.124939
-3.634807	to just	-0.124939
-2.954005	in just	-0.124939
-3.339219	be just	-0.124939
-3.322421	are just	-0.124939
-3.197025	function just	-0.124939
-3.131274	with just	-0.124939
-3.012963	you just	-0.124939
-2.957773	have just	-0.124939
-2.918654	when just	-0.124939
-2.896942	It just	-0.124939
-2.828767	vector just	-0.124939
-0.902128	significantly just	-0.124939
-0.902128	index, just	-0.124939
-3.677347	of a[i]	-0.124939
-3.678178	to a[i]	-0.124939
-3.240964	= a[i]	-0.124939
-1.552087	{ a[i]	-1.028029
-2.254171	; a[i]	-0.124939
-2.227263	element a[i]	-0.124939
-1.433944	i++) a[i]	-0.425969
-1.299295	here: a[i]	-0.124939
-1.078139	overflow: a[i]	-0.124939
-1.078139	formula a[i]	-0.124939
-3.382493	the function,	-0.124939
-2.936325	this function,	-0.124939
-2.783979	same function,	-0.124939
-2.731067	one function,	-0.124939
-2.603813	library function,	-0.124939
-2.484926	member function,	-0.124939
-2.384717	template function,	-0.124939
-1.606619	simple function,	-0.124939
-2.168872	optimized function,	-0.124939
-2.144327	another function,	-0.124939
-1.854598	innermost function,	-0.124939
-1.743821	frame function,	-0.124939
-1.553023	latter function,	-0.124939
-1.502961	detection function,	-0.124939
-1.298841	select function,	-0.124939
-0.901994	non-member function,	-0.124939
-2.642149	the operands	-0.301030
-3.453675	The operands	-0.124939
-2.789669	point operands	-0.124939
-2.759880	all operands	-0.124939
-2.560554	where operands	-0.124939
-1.300245	Boolean operands	-0.124939
-1.224748	aligned operands	-0.124939
-0.902529	Aligned operands	-0.124939
-2.552838	the innermost	-0.689210
-1.130986	critical innermost	-0.425969
-1.300464	Critical innermost	-0.124939
-3.624609	to require	-0.124939
-3.396565	that require	-0.124939
-2.333738	or require	-0.124939
-3.048201	not require	-0.124939
-2.347536	may require	-0.124939
-2.318771	operations require	-0.124939
-2.293451	instructions require	-0.124939
-2.243583	arrays require	-0.124939
-2.179434	precision require	-0.124939
-2.111904	would require	-0.124939
-1.953605	applications require	-0.124939
-1.826371	references require	-0.124939
-1.444590	profilers require	-0.124939
-0.902061	MOVNTDQ require	-0.124939
-0.601545	algebra) require	-0.124939
-2.759251	the compiler.	-0.124939
-3.120350	a compiler.	-0.124939
-2.816946	different compiler.	-0.124939
-2.806084	same compiler.	-0.124939
-1.979390	Intel compiler.	-0.124939
-2.581960	C++ compiler.	-0.124939
-2.277063	Gnu compiler.	-0.124939
-2.150270	another compiler.	-0.124939
-1.282422	Microsoft compiler.	-0.124939
-3.029211	the advanced	-0.249877
-2.850095	of advanced	-0.124939
-3.496741	and advanced	-0.124939
-3.420184	for advanced	-0.124939
-3.131478	on advanced	-0.124939
-2.382588	an advanced	-0.124939
-2.925775	more advanced	-0.124939
-2.686846	most advanced	-0.124939
-1.995073	using advanced	-0.124939
-2.548979	many advanced	-0.124939
-1.640793	under advanced	-0.124939
-3.615500	a #define	-0.124939
-3.160203	or #define	-0.124939
-3.185458	function #define	-0.124939
-3.117690	with #define	-0.124939
-3.020423	compiler #define	-0.124939
-2.483899	2 #define	-0.124939
-2.424476	8 #define	-0.124939
-2.430036	example, #define	-0.124939
-2.196604	etc. #define	-0.124939
-1.825006	5 #define	-0.124939
-1.078019	#else #define	-0.124939
-1.077438	runtime. #define	-0.124939
-0.204004	elements: #define	-0.425969
-0.901927	name. #define	-0.124939
-0.901927	__GNUC__ #define	-0.124939
-0.601478	N: #define	-0.124939
-0.601478	<math.h> #define	-0.124939
-3.665433	of points	-0.124939
-2.328071	it points	-0.301030
-1.968010	pointer points	-0.124939
-2.330866	always points	-0.124939
-2.126356	list points	-0.124939
-1.916722	actually points	-0.124939
-0.926148	r points	-0.301030
-1.803520	unused points	-0.124939
-1.803520	p points	-0.124939
-0.124896	initially points	-0.602060
-0.601679	eight) points	-0.124939
-2.722497	a switch	-0.249877
-3.666922	to switch	-0.124939
-2.702786	and switch	-0.301030
-2.809007	for switch	-0.124939
-3.204797	or switch	-0.124939
-2.227596	A switch	-0.124939
-2.824818	because switch	-0.124939
-1.743638	task switch	-0.124939
-1.553264	n; switch	-0.124939
-1.300268	context switch	-0.124939
-0.601679	lists, switch	-0.124939
-3.407135	the range	-0.425969
-2.583626	of range	-0.124939
-3.425403	The range	-0.124939
-2.954790	this range	-0.124939
-2.799653	same range	-0.124939
-2.767216	which range	-0.124939
-1.758369	address range	-0.425969
-1.744015	limited range	-0.124939
-1.300268	live range	-0.124939
-0.204044	Live range	-0.425969
-0.601679	narrow range	-0.124939
-3.417397	the start	-0.124939
-2.487209	to start	-0.124939
-3.524083	and start	-0.124939
-3.327196	can start	-0.124939
-3.028870	may start	-0.124939
-2.082099	you start	-0.602060
-2.222064	will start	-0.124939
-2.255186	; start	-0.124939
-1.679047	during start	-0.124939
-3.172353	the modules	-0.124939
-3.642544	of modules	-0.124939
-3.189418	or modules	-0.124939
-2.108097	other modules	-0.124939
-2.746155	all modules	-0.124939
-2.608743	library modules	-0.124939
-2.587122	two modules	-0.124939
-2.444008	critical modules	-0.124939
-2.272676	Some modules	-0.124939
-2.222723	language modules	-0.124939
-1.992007	separate modules	-0.124939
-0.856706	across modules	-0.124939
-0.601791	.cpp modules	-0.124939
-3.392183	the smaller	-0.425969
-2.893108	is smaller	-0.124939
-3.103890	a smaller	-0.124939
-3.631536	of smaller	-0.124939
-3.398860	The smaller	-0.124939
-3.339219	be smaller	-0.124939
-3.113645	code smaller	-0.124939
-2.647612	into smaller	-0.124939
-2.591003	multiple smaller	-0.124939
-2.346057	even smaller	-0.124939
-2.233635	bytes smaller	-0.124939
-1.916640	becomes smaller	-0.124939
-1.874122	made smaller	-0.124939
-1.443799	units smaller	-0.124939
-2.753620	used here	-0.124939
-2.586648	static here	-0.124939
-2.225227	necessary here	-0.124939
-2.224157	speed here	-0.124939
-2.162109	calculation here	-0.124939
-2.136363	problem here	-0.124939
-1.332374	operator here	-0.425969
-1.954023	n here	-0.124939
-1.954560	runtime here	-0.124939
-1.825435	go here	-0.124939
-1.743278	given here	-0.124939
-1.378022	anything here	-0.124939
-1.298841	And here	-0.124939
-0.504821	said here	-0.425969
-1.077538	decomposition here	-0.124939
-0.601512	provoked here	-0.124939
-3.402093	the core	-0.124939
-2.816379	The core	-0.425969
-3.337768	are core	-0.124939
-2.796472	same core	-0.124939
-1.700860	CPU core	-0.124939
-2.443115	called core	-0.124939
-2.325746	system core	-0.124939
-2.246244	execution core	-0.124939
-2.235268	processor core	-0.124939
-2.026023	microprocessor core	-0.124939
-1.744472	math core	-0.124939
-1.445916	Math core	-0.124939
-3.407135	the relevant	-0.124939
-2.898468	is relevant	-0.301030
-3.362388	be relevant	-0.124939
-2.925775	more relevant	-0.124939
-1.667388	all relevant	-0.425969
-2.576858	also relevant	-0.124939
-2.359772	new relevant	-0.124939
-1.875510	options relevant	-0.124939
-1.445636	hardly relevant	-0.124939
-1.078039	keywords relevant	-0.124939
-0.902328	respects relevant	-0.124939
-2.364528	registers are:	-0.124939
-2.366022	pointers are:	-0.124939
-2.328233	programming are:	-0.124939
-2.269610	stack are:	-0.124939
-1.363637	allocation are:	-0.124939
-2.040043	dispatching are:	-0.124939
-1.223972	linking are:	-0.124939
-1.826371	references are:	-0.124939
-1.800042	inlining are:	-0.124939
-1.501048	iterations are:	-0.124939
-1.444078	free are:	-0.124939
-1.444590	profilers are:	-0.124939
-1.298974	effects are:	-0.124939
-1.202064	solutions are:	-0.124939
-0.601545	Disadvantages are:	-0.124939
-3.678178	to around	-0.124939
-2.245985	work around	-0.124939
-1.919095	directives around	-0.124939
-1.879079	ways around	-0.124939
-0.158321	scattered around	-0.492916
-1.299642	wrap around	-0.124939
-1.078486	jumping around	-0.124939
-0.902395	randomly around	-0.124939
-0.204050	parenthesis around	-0.124939
-0.601712	circumstances around	-0.124939
-2.839431	to 5	-0.124939
-3.089188	- 5	-0.124939
-2.793973	only 5	-0.124939
-1.870696	* 5	-0.124939
-1.854207	takes 5	-0.425969
-2.482551	between 5	-0.124939
-1.916727	processors. 5	-0.124939
-1.744617	== 5	-0.124939
-1.202331	....................................................................................... 5	-0.124939
-1.077839	........................................................................................... 5	-0.124939
-1.077839	23 5	-0.124939
-0.902195	3, 5	-0.124939
-0.601612	website. 5	-0.124939
-2.905720	is replaced	-0.602060
-3.553263	and replaced	-0.124939
-1.942846	be replaced	-0.778151
-3.378690	are replaced	-0.124939
-2.849927	has replaced	-0.124939
-2.184558	been replaced	-0.124939
-2.154425	parameters replaced	-0.124939
-2.610726	= a;	-0.124939
-1.855548	int a;	-0.221849
-2.638646	+ a;	-0.124939
-1.504067	float a;	-0.249877
-1.746461	bool a;	-0.124939
-0.380112	{double a;	-0.425969
-0.204071	{int a;	-0.425969
-3.080043	of things	-0.124939
-3.394752	for things	-0.124939
-2.766484	other things	-0.124939
-1.748794	do things	-0.301030
-2.591003	multiple things	-0.124939
-2.585116	two things	-0.124939
-2.512664	some things	-0.124939
-2.292165	simple things	-0.124939
-2.110105	doing things	-0.124939
-1.974632	various things	-0.124939
-1.953789	three things	-0.124939
-0.902128	reveal things	-0.124939
-0.902128	funny things	-0.124939
-0.601579	ingenious things	-0.124939
-3.904735	the negative	-0.124939
-3.702211	is negative	-0.124939
-2.721157	a negative	-0.425969
-2.935438	and negative	-0.124939
-3.416374	The negative	-0.124939
-3.411540	for negative	-0.124939
-3.354527	be negative	-0.124939
-3.337768	are negative	-0.124939
-3.062941	not negative	-0.124939
-2.226732	A negative	-0.124939
-2.703908	no negative	-0.124939
-2.555866	possible negative	-0.124939
-1.704626	code section	-0.124939
-3.107671	This section	-0.124939
-2.042799	this section	-0.124939
-1.922360	data section	-0.124939
-2.028114	next section	-0.124939
-0.601846	language", section	-0.124939
-3.412236	the reductions	-0.124939
-2.928520	more reductions	-0.124939
-2.088719	which reductions	-0.124939
-2.550832	many reductions	-0.124939
-2.296953	simple reductions	-0.124939
-2.012537	Most reductions	-0.124939
-0.358939	algebraic reductions	-0.204120
-1.299642	obvious reductions	-0.124939
-1.299642	equivalent reductions	-0.124939
-1.078486	Algebraic reductions	-0.124939
-2.687753	to go	-0.124939
-3.483690	and go	-0.124939
-3.316490	can go	-0.124939
-3.263057	// go	-0.124939
-3.204911	function go	-0.124939
-2.349252	may go	-0.124939
-2.220380	will go	-0.124939
-2.802001	functions go	-0.124939
-2.506060	variables go	-0.124939
-2.256196	must go	-0.124939
-2.243007	calculations go	-0.124939
-1.202464	otherwise go	-0.124939
-3.402168	that depends	-0.124939
-2.931712	use depends	-0.124939
-2.828767	vector depends	-0.124939
-2.778202	loop depends	-0.124939
-1.859442	value depends	-0.124939
-1.791922	branch depends	-0.425969
-1.469849	calculation depends	-0.425969
-2.023896	application depends	-0.124939
-1.956622	addition depends	-0.124939
-1.826767	predicted depends	-0.124939
-1.745308	sum depends	-0.124939
-1.600138	gain depends	-0.124939
-1.077739	bookkeeping depends	-0.124939
-0.601579	truth depends	-0.124939
-4.012816	the example:	-0.124939
-3.466191	for example:	-0.124939
-2.974076	this example:	-0.124939
-0.793889	For example:	-1.204120
-2.336480	following example:	-0.124939
-1.679322	Another example:	-0.124939
-3.553263	and tested	-0.124939
-2.228819	be tested	-0.221849
-1.681739	have tested	-0.204120
-2.251335	versions tested	-0.124939
-1.492428	been tested	-0.124939
-1.745477	further tested	-0.124939
-0.601813	well- tested	-0.124939
-4.012816	the contentions	-0.124939
-1.998175	when contentions	-0.301030
-1.425338	cache contentions	-0.204120
-2.582467	such contentions	-0.124939
-1.238336	cause contentions	-0.602060
-0.492821	Cache contentions	-0.301030
-2.907552	is predicted	-0.124939
-1.998999	be predicted	-0.182931
-2.485850	are predicted	-0.301030
-3.094011	not predicted	-0.124939
-2.089804	simply predicted	-0.124939
-1.876984	usually predicted	-0.124939
-2.699242	the main	-0.329059
-3.728566	of main	-0.124939
-2.714436	in main	-0.301030
-2.581407	The main	-0.124939
-2.878526	from main	-0.124939
-2.601431	two main	-0.124939
-2.549109	and references	-0.124939
-3.197039	or references	-0.124939
-3.027262	than references	-0.124939
-2.923048	more references	-0.124939
-1.994643	using references	-0.124939
-2.288362	constant references	-0.124939
-1.106547	relative references	-0.124939
-1.445090	absolute references	-0.124939
-1.378143	self-relative references	-0.124939
-1.299374	versus references	-0.124939
-0.601646	non-constant references	-0.124939
-0.601646	Internal references	-0.124939
-2.751470	is loaded	-0.124939
-3.568623	and loaded	-0.124939
-1.998999	be loaded	-0.182931
-2.735551	are loaded	-0.124939
-2.028959	typically loaded	-0.124939
-0.601846	over- loaded	-0.124939
-3.937861	the positive	-0.124939
-2.610637	a positive	-0.346788
-3.434624	The positive	-0.124939
-2.811150	for positive	-0.124939
-2.892887	A positive	-0.124939
-2.593197	two positive	-0.124939
-2.519179	some positive	-0.124939
-1.563628	large positive	-0.425969
-1.364030	both positive	-0.124939
-1.553876	low positive	-0.124939
-2.644770	the loop.	-0.234083
-3.764183	a loop.	-0.124939
-2.373772	test loop.	-0.124939
-0.626831	innermost loop.	-0.221849
-0.902730	infinite loop.	-0.124939
-2.697259	the computer	-0.249877
-3.123718	a computer	-0.124939
-2.962630	in computer	-0.124939
-2.900958	A computer	-0.124939
-2.749355	one computer	-0.124939
-2.554560	many computer	-0.124939
-2.441960	4 computer	-0.124939
-1.876623	old computer	-0.124939
-3.037791	the overhead	-0.726999
-2.423370	The overhead	-0.726999
-2.715188	no overhead	-0.124939
-1.159701	extra overhead	-0.124939
-1.564269	large overhead	-0.124939
-1.978089	high overhead	-0.124939
-1.678776	little overhead	-0.124939
-1.924887	and VIA	-0.397940
-3.272416	or VIA	-0.124939
-1.203894	recognize VIA	-0.124939
-3.973723	the pointer.	-0.124939
-2.522614	a pointer.	-0.425969
-2.869134	memory pointer.	-0.124939
-1.794231	member pointer.	-0.124939
-2.275788	stack pointer.	-0.124939
-0.689085	smart pointer.	-0.124939
-0.902809	'this' pointer.	-0.124939
-1.078340	hidden pointer.	-0.124939
-2.778791	that supports	-0.124939
-1.954037	compiler supports	-0.124939
-2.899829	It supports	-0.124939
-1.860119	CPU supports	-0.124939
-2.019649	set supports	-0.425969
-2.578649	also supports	-0.124939
-1.678469	N supports	-0.124939
-1.554914	tool supports	-0.124939
-0.601712	N+1 supports	-0.124939
-0.601712	(IDE) supports	-0.124939
-3.889074	the C	-0.124939
-3.471019	and C	-0.124939
-3.583995	in C	-0.124939
-3.346806	be C	-0.124939
-2.489780	with C	-0.124939
-3.023327	than C	-0.124939
-1.582542	Gnu C	-0.124939
-1.992007	separate C	-0.124939
-1.639882	either C	-0.124939
-0.204030	2.2, C	-0.425969
-0.204030	fashioned C	-0.425969
-0.902195	low-level C	-0.124939
-0.601612	official C	-0.124939
-2.747642	is compatible	-0.726999
-2.499037	be compatible	-0.602060
-2.169415	not compatible	-0.301030
-2.690129	most compatible	-0.124939
-2.350150	even compatible	-0.124939
-1.553796	fully compatible	-0.124939
-1.445903	highly compatible	-0.124939
-0.601809	backwards compatible	-0.124939
-0.902462	mostly compatible	-0.124939
-3.695957	a change	-0.124939
-3.678178	to change	-0.124939
-3.434624	The change	-0.124939
-2.129647	can change	-0.346788
-2.350400	may change	-0.124939
-3.019842	you change	-0.124939
-2.221502	will change	-0.124939
-2.534541	we change	-0.124939
-1.764609	cannot change	-0.124939
-1.554221	Don't change	-0.124939
-3.920982	the global	-0.124939
-2.609601	a global	-0.346788
-3.496741	and global	-0.124939
-2.583667	or global	-0.124939
-2.742407	one global	-0.124939
-2.527525	variable global	-0.124939
-2.507259	variables global	-0.124939
-1.757384	called global	-0.124939
-2.246967	avoid global	-0.124939
-2.083623	All global	-0.124939
-1.711831	Avoid global	-0.124939
-3.665433	of my	-0.124939
-3.496741	and my	-0.124939
-2.438522	in my	-0.124939
-2.809007	for my	-0.124939
-3.419422	that my	-0.124939
-3.174435	by my	-0.124939
-2.469632	on my	-0.124939
-2.448768	See my	-0.124939
-2.442608	For my	-0.124939
-2.011729	see my	-0.124939
-1.299128	(In my	-0.124939
-3.181304	the conversions	-0.124939
-2.754337	all conversions	-0.124939
-2.599714	float conversions	-0.124939
-2.372629	need conversions	-0.124939
-2.308776	type conversions	-0.124939
-1.554775	avoid conversions	-0.425969
-1.110538	These conversions	-0.124939
-1.712196	Avoid conversions	-0.124939
-1.378476	Type conversions	-0.124939
-0.601712	point-to-integer conversions	-0.124939
-3.937861	the statement	-0.124939
-2.543848	if statement	-0.124939
-2.958580	this statement	-0.124939
-2.744711	one statement	-0.124939
-2.693312	each statement	-0.124939
-1.745969	call statement	-0.124939
-1.898594	control statement	-0.124939
-0.626679	switch statement	-0.221849
-1.503069	throw() statement	-0.124939
-1.502377	general statement	-0.124939
-3.086192	of errors	-0.425969
-2.837300	program errors	-0.124939
-2.796472	same errors	-0.124939
-1.633209	such errors	-0.301030
-1.641137	programming errors	-0.124939
-2.183691	cause errors	-0.124939
-2.009225	handling errors	-0.124939
-1.444677	serious errors	-0.124939
-1.077939	unpredictable errors	-0.124939
-1.077939	137 errors	-0.124939
-0.601646	detecting errors	-0.124939
-0.601646	fatal errors	-0.124939
-3.553263	and off	-0.124939
-3.260763	it off	-0.124939
-0.343739	turn off	-0.329059
-1.713538	them off	-0.124939
-1.445924	log off	-0.124939
-0.249823	turning off	-0.301030
-0.601813	cut off	-0.124939
-3.095582	of unused	-0.425969
-2.384165	an unused	-0.124939
-2.498325	2 unused	-0.124939
-1.751780	4 unused	-0.425969
-2.443404	For unused	-0.124939
-1.150126	; unused	-0.726999
-2.125138	few unused	-0.124939
-2.099426	add unused	-0.124939
-0.856893	6 unused	-0.425969
-3.904735	the relative	-0.124939
-3.110400	a relative	-0.124939
-3.411540	for relative	-0.124939
-3.337768	are relative	-0.124939
-3.204911	function relative	-0.124939
-2.135808	because relative	-0.124939
-1.793777	member relative	-0.425969
-1.876276	generate relative	-0.124939
-1.712287	offset relative	-0.124939
-0.902261	mostly relative	-0.124939
-0.204037	self- relative	-0.124939
-0.601646	addressed relative	-0.124939
-2.698001	of columns	-0.425969
-2.942777	and columns	-0.425969
-3.179315	by columns	-0.124939
-2.928264	when columns	-0.124939
-2.103058	loop columns	-0.425969
-2.782170	If columns	-0.124939
-2.433469	8 columns	-0.124939
-1.803753	unused columns	-0.124939
-0.204050	20, columns	-0.425969
-0.902395	10, columns	-0.124939
-3.074511	to p	-0.425969
-2.773456	that p	-0.124939
-3.213308	= p	-0.124939
-3.160115	by p	-0.124939
-3.086973	as p	-0.124939
-2.650014	pointer p	-0.124939
-2.593288	object p	-0.124939
-2.546307	* p	-0.124939
-2.447047	before p	-0.124939
-2.218144	i; p	-0.124939
-2.127322	whether p	-0.124939
-1.552944	away p	-0.124939
-0.902128	p->Hello(); p	-0.124939
-0.902128	p; p	-0.124939
-2.757100	all platforms.	-0.124939
-1.300054	Windows platforms.	-0.301030
-2.028115	Mac platforms.	-0.124939
-0.766629	x86 platforms.	-0.124939
-1.554109	across platforms.	-0.124939
-1.503269	PC platforms.	-0.124939
-0.504962	x86-64 platforms.	-0.124939
-1.078240	Unix-like platforms.	-0.124939
-0.204057	major platforms.	-0.124939
-2.110283	other languages	-0.124939
-2.358213	these languages	-0.124939
-1.019740	programming languages	-0.124939
-2.238957	compiled languages	-0.124939
-1.641026	includes languages	-0.124939
-1.502723	interpreted languages	-0.124939
-1.078486	high-level languages	-0.124939
-0.902395	Interpreted languages	-0.124939
-0.601712	Compiled languages	-0.124939
-0.601712	Low-level languages	-0.124939
-3.181304	the installation	-0.602060
-2.576211	The installation	-0.124939
-3.429004	for installation	-0.124939
-2.862670	at installation	-0.124939
-2.754337	all installation	-0.124939
-2.418847	take installation	-0.124939
-0.981785	during installation	-0.425969
-1.553530	standardized installation	-0.124939
-0.505011	Program installation	-0.124939
-1.202731	individual installation	-0.124939
-1.992192	name depending	-0.124939
-1.878745	ways depending	-0.124939
-1.802875	dynamically depending	-0.124939
-0.274628	cycles, depending	-0.823909
-1.444265	memory, depending	-0.124939
-1.444265	integers, depending	-0.124939
-1.077939	64, depending	-0.124939
-0.902261	four, depending	-0.124939
-0.601646	meanings depending	-0.124939
-0.601646	12.4a, depending	-0.124939
-0.601646	move, depending	-0.124939
-0.601646	solutions, depending	-0.124939
-3.031340	the syntax	-0.425969
-3.695957	a syntax	-0.124939
-2.576211	The syntax	-0.124939
-2.928520	more syntax	-0.124939
-1.895193	C++ syntax	-0.124939
-2.519179	some syntax	-0.124939
-2.258604	assembly syntax	-0.124939
-2.245645	Windows syntax	-0.124939
-2.185542	Linux syntax	-0.124939
-1.078486	bypassing syntax	-0.124939
-3.955422	the cases.	-0.124939
-1.749844	most cases.	-0.124939
-2.579507	such cases.	-0.124939
-1.867944	many cases.	-0.124939
-1.580539	some cases.	-0.124939
-1.354098	simple cases.	-0.124939
-2.227127	best cases.	-0.124939
-2.057910	both cases.	-0.124939
-1.640634	simplest cases.	-0.124939
-1.915051	processors. Supports	-0.124939
-1.849180	compiler. Supports	-0.124939
-1.710010	library. Supports	-0.124939
-1.675789	libraries. Supports	-0.124939
-1.639086	well. Supports	-0.124939
-1.597150	sets. Supports	-0.124939
-1.598236	not. Supports	-0.124939
-1.201931	options. Supports	-0.124939
-1.077538	code). Supports	-0.124939
-0.901994	parallelization. Supports	-0.124939
-0.901994	64-bit. Supports	-0.124939
-0.901994	Mac. Supports	-0.124939
-0.901994	source. Supports	-0.124939
-0.601512	PSDK). Supports	-0.124939
-0.601512	yet. Supports	-0.124939
-0.601512	workaround. Supports	-0.124939
-2.928420	the choice	-0.522879
-2.309841	The choice	-0.823909
-1.089899	good choice	-0.726999
-2.046165	optimal choice	-0.124939
-1.445898	suitable choice	-0.124939
-3.725720	is 1.	-0.124939
-2.942777	and 1.	-0.124939
-2.605736	= 1.	-0.124939
-2.182755	or 1.	-0.124939
-2.603078	number 1.	-0.124939
-0.902395	problem: 1.	-0.124939
-0.601712	dispatching: 1.	-0.124939
-0.601712	local: 1.	-0.124939
-0.601712	manuals: 1.	-0.124939
-0.601712	satisfied: 1.	-0.124939
-2.760393	the STL	-0.191886
-2.962630	in STL	-0.124939
-2.384956	an STL	-0.124939
-2.947783	use STL	-0.124939
-2.786814	other STL	-0.124939
-2.276616	Some STL	-0.124939
-1.202999	objects. STL	-0.124939
-0.902529	container. STL	-0.124939
-2.302549	is intended	-0.522879
-2.735551	are intended	-0.124939
-3.126470	as intended	-0.124939
-3.094011	not intended	-0.124939
-3.051650	than intended	-0.124939
-1.503870	unit intended	-0.124939
-3.653838	of dynamically	-0.124939
-3.590110	in dynamically	-0.124939
-3.354527	be dynamically	-0.124939
-3.096516	as dynamically	-0.124939
-2.807590	different dynamically	-0.124939
-1.089539	allocated dynamically	-0.249877
-2.192662	small dynamically	-0.124939
-1.677937	frequency dynamically	-0.124939
-1.639737	align dynamically	-0.124939
-0.601811	Aligning dynamically	-0.425969
-1.077939	aligning dynamically	-0.124939
-0.601646	vary dynamically	-0.124939
-3.742373	of consecutive	-0.124939
-3.204574	by consecutive	-0.124939
-2.422584	64 consecutive	-0.124939
-2.152812	four consecutive	-0.124939
-0.480270	eight consecutive	-1.204120
-3.433259	the profiler	-0.124939
-2.614807	a profiler	-0.221849
-2.218221	The profiler	-0.204120
-2.909183	A profiler	-0.124939
-1.446091	Intel's profiler	-0.124939
-0.601846	AMD's profiler	-0.124939
-2.693621	to become	-0.124939
-2.405038	can become	-0.124939
-2.301099	have become	-0.124939
-2.905823	will become	-0.124939
-2.903980	then become	-0.124939
-1.749428	has become	-0.124939
-1.503669	easily become	-0.124939
-3.683510	a Windows,	-0.124939
-3.596311	in Windows,	-0.124939
-2.409267	for Windows,	-0.249877
-3.145297	with Windows,	-0.124939
-1.904146	64-bit Windows,	-0.124939
-2.569985	In Windows,	-0.124939
-1.792979	32-bit Windows,	-0.124939
-2.359772	systems Windows,	-0.124939
-2.026992	Mac Windows,	-0.124939
-1.678203	16-bit Windows,	-0.124939
-1.299128	(In Windows,	-0.124939
-3.417397	the index	-0.124939
-3.431318	that index	-0.124939
-3.111235	as index	-0.124939
-3.098826	This index	-0.124939
-2.384165	an index	-0.124939
-1.339318	array index	-0.221849
-2.010176	their index	-0.124939
-1.264002	last index	-0.124939
-0.601746	FatalAppExitA(0,"Array index	-0.124939
-3.695957	a modern	-0.124939
-2.584600	of modern	-0.346788
-3.434624	The modern	-0.124939
-3.425329	that modern	-0.124939
-2.825925	because modern	-0.124939
-1.826275	all modern	-0.124939
-2.688485	most modern	-0.124939
-2.084475	All modern	-0.124939
-2.012537	Most modern	-0.124939
-1.378130	Several modern	-0.124939
-2.526740	that gives	-0.602060
-3.237344	it gives	-0.124939
-3.113645	code gives	-0.124939
-3.084474	This gives	-0.124939
-2.760816	which gives	-0.124939
-2.695560	set gives	-0.124939
-2.585116	two gives	-0.124939
-2.399537	often gives	-0.124939
-2.356373	systems gives	-0.124939
-1.852542	here gives	-0.124939
-1.598701	SSE4.1 gives	-0.124939
-1.078219	CPUs" gives	-0.124939
-0.902128	0's gives	-0.124939
-0.902128	N&(N-1) gives	-0.124939
-1.936230	// Loop	-0.191886
-3.056926	x Loop	-0.124939
-2.897987	} Loop	-0.124939
-2.389399	time. Loop	-0.124939
-1.852148	compiler. Loop	-0.124939
-1.445077	case. Loop	-0.124939
-1.299642	factor. Loop	-0.124939
-0.902395	eliminated. Loop	-0.124939
-0.601712	12.4a. Loop	-0.124939
-0.601712	8.23a. Loop	-0.124939
-3.835016	is avoided	-0.124939
-1.707301	be avoided	-0.522879
-2.349470	to turn	-0.726999
-2.950241	and turn	-0.124939
-3.615467	in turn	-0.124939
-3.330824	can turn	-0.124939
-3.228937	or turn	-0.124939
-3.083406	not turn	-0.124939
-3.023324	you turn	-0.124939
-2.902387	then turn	-0.124939
-3.992830	the inlining	-0.124939
-3.553263	and inlining	-0.124939
-3.456591	for inlining	-0.124939
-2.140714	function inlining	-0.249877
-2.116415	by inlining	-0.425969
-2.459734	makes inlining	-0.124939
-1.031900	Function inlining	-0.124939
-3.417397	the size.	-0.124939
-3.220740	or size.	-0.124939
-3.131753	code size.	-0.124939
-1.905383	vector size.	-0.124939
-1.632749	cache size.	-0.124939
-2.530808	variable size.	-0.124939
-1.733093	register size.	-0.124939
-2.224669	specific size.	-0.124939
-2.184415	line size.	-0.124939
-3.181304	the network	-0.301030
-3.117008	a network	-0.124939
-3.678178	to network	-0.124939
-2.942777	and network	-0.124939
-3.434624	The network	-0.124939
-3.429004	for network	-0.124939
-3.212695	or network	-0.124939
-3.150073	with network	-0.124939
-2.470484	on network	-0.425969
-0.902395	databases, network	-0.124939
-3.955422	the slow	-0.124939
-2.902079	is slow	-0.301030
-2.879296	a slow	-0.124939
-3.361856	are slow	-0.124939
-3.154903	with slow	-0.124939
-2.350976	may slow	-0.425969
-2.764696	but slow	-0.124939
-2.324822	operations slow	-0.124939
-1.048465	particularly slow	-0.124939
-2.632117	+ b)	-0.124939
-2.599714	float b)	-0.124939
-2.481477	< b)	-0.124939
-2.297291	& b)	-0.124939
-1.214823	/ b)	-0.124939
-2.099969	: b)	-0.124939
-1.938269	|| b)	-0.124939
-1.938955	> b)	-0.124939
-1.773655	>= b)	-0.124939
-0.634058	bool b)	-0.726999
-3.553263	and >=	-0.124939
-1.668293	i >=	-0.124939
-1.446960	(i >=	-0.124939
-1.445677	(a >=	-0.124939
-0.090158	(level >=	-0.425969
-0.124911	(iset >=	-0.124939
-0.902596	int)i >=	-0.124939
-2.511976	the desired	-0.204120
-3.752625	to desired	-0.124939
-3.601075	and desired	-0.124939
-3.179892	with desired	-0.124939
-2.069653	used. Such	-0.124939
-1.639139	way. Such	-0.124939
-1.501314	software. Such	-0.124939
-1.444278	processor. Such	-0.124939
-1.443799	language. Such	-0.124939
-1.377331	chain. Such	-0.124939
-1.202198	running. Such	-0.124939
-0.902128	market. Such	-0.124939
-0.204024	chip. Such	-0.124939
-0.601579	standards. Such	-0.124939
-0.601579	identification. Such	-0.124939
-0.601579	games. Such	-0.124939
-0.601579	__declspec(thread). Such	-0.124939
-0.601579	reproducible. Such	-0.124939
-3.407135	the #pragma	-0.425969
-2.583667	or #pragma	-0.124939
-3.208908	function #pragma	-0.124939
-2.939673	use #pragma	-0.124939
-2.330866	always #pragma	-0.124939
-2.057275	write #pragma	-0.124939
-1.224548	aligned #pragma	-0.425969
-1.641172	nontemporal #pragma	-0.124939
-0.805723	__restrict #pragma	-0.425969
-0.601679	aliased #pragma	-0.124939
-0.601679	Vectorize #pragma	-0.124939
-1.706048	code. Dynamic	-0.124939
-2.070966	used. Dynamic	-0.124939
-2.057057	allocation Dynamic	-0.124939
-1.957203	efficient. Dynamic	-0.124939
-1.597933	user. Dynamic	-0.124939
-1.552586	allocation. Dynamic	-0.124939
-1.552586	inefficient. Dynamic	-0.124939
-1.298962	28 Dynamic	-0.124939
-1.202464	are. Dynamic	-0.124939
-1.077939	limited. Dynamic	-0.124939
-0.204037	9.6 Dynamic	-0.425969
-0.204037	3.6 Dynamic	-0.425969
-2.760962	used functions,	-0.124939
-2.612062	library functions,	-0.124939
-2.487723	member functions,	-0.124939
-2.270531	its functions,	-0.124939
-2.219629	virtual functions,	-0.124939
-1.045376	intrinsic functions,	-0.301030
-1.598643	similar functions,	-0.124939
-1.554401	pure functions,	-0.124939
-1.078039	well-tested functions,	-0.124939
-0.380059	exponential functions,	-0.425969
-0.380059	trigonometric functions,	-0.425969
-2.761538	the whole	-0.271067
-3.127113	a whole	-0.124939
-2.817642	for whole	-0.425969
-2.692956	do whole	-0.124939
-2.449615	called whole	-0.124939
-2.237673	Use whole	-0.124939
-2.115169	doing whole	-0.124939
-3.992830	the inefficient	-0.124939
-2.635910	is inefficient	-0.522879
-3.378690	are inefficient	-0.124939
-3.061916	an inefficient	-0.124939
-2.980204	have inefficient	-0.124939
-1.390333	very inefficient	-0.124939
-1.378977	quite inefficient	-0.124939
-2.595593	the level-2	-0.823909
-3.133982	a level-2	-0.425969
-3.483916	The level-2	-0.124939
-3.476007	for level-2	-0.124939
-3.232478	if level-2	-0.124939
-2.835650	the response	-0.602060
-3.735594	a response	-0.124939
-3.456591	for response	-0.124939
-1.385193	long response	-0.425969
-1.942264	longer response	-0.124939
-0.902596	immediate response	-0.124939
-0.601813	irregular response	-0.124939
-3.140803	is described	-0.124939
-3.353677	are described	-0.124939
-1.926918	as described	-0.522879
-2.970448	have described	-0.124939
-2.342324	method described	-0.124939
-2.310139	cases described	-0.124939
-1.993786	methods described	-0.124939
-1.802372	syntax described	-0.124939
-1.744380	further described	-0.124939
-0.902395	paragraph described	-0.124939
-2.496353	of 2.	-0.204120
-3.386863	be 2.	-0.124939
-2.274115	by 2.	-0.301030
-1.802268	platforms. 2.	-0.124939
-1.600379	version. 2.	-0.124939
-0.601779	meaning. 2.	-0.124939
-0.601779	loader. 2.	-0.124939
-0.601779	PathScale. 2.	-0.124939
-3.653838	of variables.	-0.124939
-2.796472	same variables.	-0.124939
-2.748865	all variables.	-0.124939
-2.715813	integer variables.	-0.124939
-2.592390	static variables.	-0.124939
-2.431475	unsigned variables.	-0.124939
-1.478487	register variables.	-0.124939
-2.355438	these variables.	-0.124939
-1.961719	induction variables.	-0.124939
-1.918103	public variables.	-0.124939
-1.132793	global variables.	-0.124939
-1.804112	consecutive variables.	-0.124939
-3.715184	of lines	-0.124939
-1.229210	cache lines	-0.234083
-2.654919	into lines	-0.124939
-2.443300	4 lines	-0.124939
-2.334098	16 lines	-0.124939
-2.217713	These lines	-0.124939
-2.126915	few lines	-0.124939
-3.033480	the hot	-0.249877
-2.879296	a hot	-0.301030
-3.524083	and hot	-0.124939
-3.220740	or hot	-0.124939
-2.962402	this hot	-0.124939
-2.530505	any hot	-0.124939
-1.078804	find hot	-0.425969
-1.554734	finding hot	-0.124939
-0.601746	identifying hot	-0.124939
-1.938345	types Unfortunately,	-0.124939
-1.874168	called. Unfortunately,	-0.124939
-1.853741	2; Unfortunately,	-0.124939
-1.743576	calls. Unfortunately,	-0.124939
-1.676120	classes. Unfortunately,	-0.124939
-1.677140	purposes. Unfortunately,	-0.124939
-1.597448	dispatching. Unfortunately,	-0.124939
-1.444078	table. Unfortunately,	-0.124939
-1.444078	core. Unfortunately,	-0.124939
-1.377131	this. Unfortunately,	-0.124939
-1.298974	80 Unfortunately,	-0.124939
-1.077639	CodeAnalyst. Unfortunately,	-0.124939
-0.601545	portability. Unfortunately,	-0.124939
-0.601545	lrint. Unfortunately,	-0.124939
-0.601545	132. Unfortunately,	-0.124939
-1.641298	C++ v.	-0.124939
-1.598265	compiler, v.	-0.124939
-0.647521	Compiler v.	-0.124939
-1.078420	C/C++ v.	-0.124939
-0.902328	VectorC v.	-0.124939
-0.601679	(gcc v.	-0.124939
-0.601679	(MKL v.	-0.124939
-0.601679	Asmlib: v.	-0.124939
-0.601679	bcc, v.	-0.124939
-0.601679	Glibc v.	-0.124939
-0.601679	2008, v.	-0.124939
-3.087307	This operation	-0.124939
-2.129395	same operation	-0.425969
-2.103687	point operation	-0.124939
-2.737835	one operation	-0.124939
-2.293789	& operation	-0.124939
-2.213375	single operation	-0.124939
-2.098777	store operation	-0.124939
-1.917608	graphics operation	-0.124939
-1.743286	shift operation	-0.124939
-1.678561	128-bit operation	-0.124939
-1.299241	AND operation	-0.124939
-0.902195	digital operation	-0.124939
-0.601612	illegal operation	-0.124939
-2.920080	the code,	-0.124939
-3.678178	to code,	-0.124939
-2.578185	efficient code,	-0.124939
-2.087551	optimizing code,	-0.124939
-1.379308	intermediate code,	-0.124939
-1.977431	source code,	-0.124939
-1.553876	independent code,	-0.124939
-1.299295	CPU-intensive code,	-0.124939
-0.902395	legacy code,	-0.124939
-0.601712	superfluous code,	-0.124939
-3.973723	the instance	-0.124939
-3.058170	an instance	-0.124939
-1.370618	one instance	-0.492916
-2.697593	each instance	-0.124939
-2.392785	template instance	-0.124939
-1.671982	new instance	-0.425969
-2.026463	next instance	-0.124939
-1.920089	Each instance	-0.124939
-2.774783	that comes	-0.124939
-3.189418	or comes	-0.124939
-2.578274	it comes	-0.124939
-3.031428	compiler comes	-0.124939
-2.897662	It comes	-0.124939
-2.762939	which comes	-0.124939
-2.662996	size comes	-0.124939
-2.143002	advantage comes	-0.124939
-1.991567	model comes	-0.124939
-1.976484	parameter comes	-0.124939
-1.640327	delay comes	-0.124939
-1.639882	BSD comes	-0.124939
-1.077839	feedback comes	-0.124939
-4.055779	the fact	-0.124939
-2.154438	in fact	-0.124939
-2.426103	The fact	-0.726999
-2.982037	this fact	-0.124939
-2.161638	to find	-0.505150
-2.589556	also find	-0.124939
-2.461585	cannot find	-0.124939
-0.902797	(2) find	-0.124939
-3.088949	to rely	-0.425969
-2.530512	that rely	-0.301030
-2.656222	can rely	-0.425969
-2.733909	should rely	-0.124939
-1.510234	cannot rely	-0.602060
-2.334210	always rely	-0.124939
-2.258270	must rely	-0.124939
-1.554421	Don't rely	-0.124939
-0.601746	surely rely	-0.124939
-2.620840	// No	-0.124939
-2.651390	pointer No	-0.124939
-1.699305	time. No	-0.124939
-2.085429	memory. No	-0.124939
-2.070309	used. No	-0.124939
-1.825838	are: No	-0.124939
-1.552288	parameter. No	-0.124939
-1.552288	storage. No	-0.124939
-1.502469	array. No	-0.124939
-1.077839	2004. No	-0.124939
-0.601612	/EHs- No	-0.124939
-0.601612	53). No	-0.124939
-0.601612	-ipo No	-0.124939
-2.846094	to produce	-0.124939
-2.780135	that produce	-0.124939
-3.327196	can produce	-0.124939
-2.169415	not produce	-0.602060
-3.028870	may produce	-0.124939
-2.900426	will produce	-0.124939
-2.733909	should produce	-0.124939
-2.692560	compilers produce	-0.124939
-1.433063	operators produce	-0.124939
-3.402093	the position-independent	-0.124939
-3.653838	of position-independent	-0.124939
-2.935438	and position-independent	-0.425969
-3.096516	as position-independent	-0.124939
-2.265232	use position-independent	-0.425969
-2.830792	make position-independent	-0.124939
-2.672820	using position-independent	-0.124939
-2.352417	without position-independent	-0.124939
-2.329204	always position-independent	-0.124939
-2.148928	uses position-independent	-0.124939
-1.742831	special position-independent	-0.124939
-0.601646	burdensome position-independent	-0.124939
-3.456591	for vectorization	-0.124939
-2.155112	make vectorization	-0.124939
-2.281082	want vectorization	-0.124939
-2.155161	advantageous vectorization	-0.124939
-2.130337	whether vectorization	-0.124939
-0.868100	automatic vectorization	-0.249877
-0.412917	Automatic vectorization	-0.124939
-3.322421	are including	-0.124939
-3.160115	by including	-0.124939
-2.241058	calculations including	-0.124939
-1.827719	VIA including	-0.124939
-1.801390	Windows, including	-0.124939
-1.772885	code, including	-0.124939
-1.016360	strings including	-0.425969
-1.299108	on, including	-0.124939
-1.299108	platforms, including	-0.124939
-1.299108	features, including	-0.124939
-1.202677	n, including	-0.124939
-0.601579	data, including	-0.124939
-0.601579	computer, including	-0.124939
-0.601579	package, including	-0.124939
-2.569581	for checking	-0.301030
-3.194293	by checking	-0.124939
-2.029928	no checking	-0.425969
-2.357649	without checking	-0.124939
-1.803469	syntax checking	-0.124939
-0.688963	bounds checking	-0.124939
-0.090158	Bounds checking	-0.124939
-3.035630	the out-of-order	-0.249877
-3.098758	of out-of-order	-0.124939
-3.701604	to out-of-order	-0.124939
-2.243767	with out-of-order	-0.602060
-2.712908	no out-of-order	-0.124939
-2.691236	do out-of-order	-0.124939
-2.114442	doing out-of-order	-0.124939
-1.746507	prevents out-of-order	-0.124939
-3.666922	to platforms	-0.124939
-2.140970	different platforms	-0.124939
-1.857985	other platforms	-0.124939
-2.767216	which platforms	-0.124939
-2.751593	all platforms	-0.124939
-2.595880	multiple platforms	-0.124939
-2.212909	common platforms	-0.124939
-2.184819	Linux platforms	-0.124939
-2.026992	Mac platforms	-0.124939
-1.877019	x86 platforms	-0.124939
-0.601679	ARM platforms	-0.124939
-2.750191	is particularly	-0.124939
-3.395338	be particularly	-0.124939
-2.212369	are particularly	-0.124939
-2.980204	have particularly	-0.124939
-2.905051	A particularly	-0.124939
-2.533875	any particularly	-0.124939
-2.176439	works particularly	-0.124939
-3.150330	is given	-0.124939
-3.127113	a given	-0.124939
-2.750088	be given	-0.124939
-2.327547	are given	-0.249877
-3.121332	as given	-0.124939
-2.184558	been given	-0.124939
-0.601862	advice given	-0.124939
-3.422620	the output	-0.124939
-3.538428	and output	-0.124939
-3.453675	The output	-0.124939
-3.116254	as output	-0.124939
-3.058170	an output	-0.124939
-3.045587	compiler output	-0.124939
-2.902387	then output	-0.124939
-0.942358	assembly output	-0.204120
-2.595593	the level-1	-0.522879
-3.764183	a level-1	-0.124939
-3.476007	for level-1	-0.124939
-2.819239	same level-1	-0.124939
-1.641654	entire level-1	-0.124939
-3.889074	the resources.	-0.124939
-3.083107	of resources.	-0.124939
-2.793315	same resources.	-0.124939
-2.769807	other resources.	-0.124939
-2.444008	critical resources.	-0.124939
-2.262223	extra resources.	-0.124939
-2.195248	allocated resources.	-0.124939
-2.121604	few resources.	-0.124939
-1.773693	network resources.	-0.124939
-1.743286	limited resources.	-0.124939
-1.298795	computing resources.	-0.124939
-1.202331	scarce resources.	-0.124939
-0.601612	ample resources.	-0.124939
-3.131480	is outside	-0.124939
-2.855992	memory outside	-0.124939
-2.757911	but outside	-0.124939
-2.524268	variable outside	-0.124939
-2.320778	operations outside	-0.124939
-2.224904	element outside	-0.124939
-2.194809	overflow outside	-0.124939
-2.186013	done outside	-0.124939
-2.070309	counter outside	-0.124939
-1.898188	declared outside	-0.124939
-1.826722	go outside	-0.124939
-1.640772	defined outside	-0.124939
-1.202331	move outside	-0.124939
-3.904735	the task	-0.124939
-2.873556	a task	-0.124939
-3.653838	of task	-0.124939
-3.483690	and task	-0.124939
-3.096516	as task	-0.124939
-3.090158	This task	-0.124939
-2.951034	this task	-0.124939
-2.689073	each task	-0.124939
-2.214255	single task	-0.124939
-1.744472	given task	-0.124939
-1.676294	Any task	-0.124939
-0.601646	essential task	-0.124939
-2.636893	is limited	-0.221849
-2.729261	a limited	-0.124939
-3.403981	be limited	-0.124939
-3.245810	or limited	-0.124939
-3.169723	with limited	-0.124939
-2.231938	A limited	-0.124939
-3.608987	in vectorized	-0.124939
-3.444045	The vectorized	-0.124939
-2.813303	for vectorized	-0.425969
-2.227654	be vectorized	-0.221849
-2.945063	use vectorized	-0.124939
-2.465983	faster vectorized	-0.124939
-2.437833	example, vectorized	-0.124939
-1.078240	indeed vectorized	-0.124939
-0.601746	series, vectorized	-0.124939
-3.150330	is sometimes	-0.124939
-2.954022	and sometimes	-0.124939
-2.733604	are sometimes	-0.124939
-2.130336	can sometimes	-0.522879
-3.048475	compiler sometimes	-0.124939
-1.919608	framework sometimes	-0.124939
-1.446170	They sometimes	-0.124939
-3.184329	the local	-0.301030
-3.120350	a local	-0.124939
-3.524083	and local	-0.124939
-2.567125	for local	-0.301030
-2.808013	functions local	-0.124939
-2.783359	other local	-0.124939
-2.757100	all local	-0.124939
-1.775333	including local	-0.124939
-1.078240	parameters, local	-0.124939
-2.837012	the costs	-0.301030
-3.728566	of costs	-0.124939
-2.581407	The costs	-0.602060
-2.585890	also costs	-0.124939
-2.502919	performance costs	-0.124939
-1.525016	These costs	-0.124939
-3.101957	of S1	-0.124939
-2.997679	{ S1	-0.124939
-2.239376	bytes S1	-0.124939
-2.154670	}; S1	-0.124939
-1.901363	100; S1	-0.124939
-0.393690	struct S1	-0.602060
-0.380112	b;}; S1	-0.124939
-3.689596	of math	-0.124939
-1.538836	vector math	-0.425969
-2.667024	Intel math	-0.124939
-2.215063	common math	-0.124939
-2.209225	AMD math	-0.124939
-2.184723	precision math	-0.124939
-2.174368	optimized math	-0.124939
-2.059460	fast math	-0.124939
-1.678111	little math	-0.124939
-3.689596	of temp	-0.124939
-3.689733	to temp	-0.124939
-2.359096	= temp	-0.301030
-2.993009	{ temp	-0.124939
-2.837076	make temp	-0.124939
-2.422180	register temp	-0.124939
-1.743812	save temp	-0.124939
-0.529364	temp; temp	-0.425969
-0.601746	&list[0]; temp	-0.124939
-3.184329	the inlined	-0.301030
-3.737969	is inlined	-0.124939
-3.689596	of inlined	-0.124939
-2.499037	be inlined	-0.124939
-3.361856	are inlined	-0.124939
-2.384165	an inlined	-0.124939
-2.407083	often inlined	-0.124939
-2.334210	always inlined	-0.124939
-1.875195	usually inlined	-0.124939
-2.902079	is still	-0.301030
-2.245721	can still	-0.249877
-3.253942	it still	-0.124939
-3.131753	code still	-0.124939
-2.900426	will still	-0.124939
-2.535683	we still	-0.124939
-2.115265	would still	-0.124939
-1.918350	framework still	-0.124939
-1.444964	capabilities still	-0.124939
-3.033480	the class.	-0.124939
-2.587429	or class.	-0.124939
-2.806084	same class.	-0.124939
-2.150270	another class.	-0.124939
-2.141130	container class.	-0.124939
-0.981957	child class.	-0.124939
-1.555048	derived class.	-0.124939
-1.555048	parent class.	-0.124939
-0.902462	object's class.	-0.124939
-3.904735	the database	-0.124939
-3.110400	a database	-0.124939
-3.197039	or database	-0.124939
-2.884962	A database	-0.124939
-2.325746	system database	-0.124939
-2.086429	optimizing database	-0.124939
-1.677937	Optimizing database	-0.124939
-1.444677	Open database	-0.124939
-0.680993	System database	-0.124939
-1.299374	development, database	-0.124939
-0.902261	mutexes, database	-0.124939
-0.601646	registration database	-0.124939
-3.412236	the constants	-0.124939
-3.092430	of constants	-0.124939
-3.429004	for constants	-0.124939
-2.800137	only constants	-0.124939
-2.779931	other constants	-0.124939
-2.105075	point constants	-0.425969
-1.911500	two constants	-0.124939
-2.043690	Integer constants	-0.124939
-1.640680	identical constants	-0.124939
-0.601712	String constants	-0.124939
-3.683510	a bool	-0.124939
-3.665433	of bool	-0.124939
-2.236086	bytes bool	-0.124939
-1.111073	a, bool	-0.726999
-1.829092	a; bool	-0.124939
-1.554021	y; bool	-0.124939
-1.078039	way: bool	-0.124939
-1.078420	7.11 bool	-0.124939
-0.902328	z; bool	-0.124939
-0.601679	7.10a bool	-0.124939
-0.601679	7.9a bool	-0.124939
-2.384862	time. Do	-0.124939
-2.261344	function. Do	-0.124939
-2.069653	used. Do	-0.124939
-1.956149	efficient. Do	-0.124939
-1.937317	set. Do	-0.124939
-1.551989	allocation. Do	-0.124939
-1.552944	block. Do	-0.124939
-1.444278	optimizations. Do	-0.124939
-1.077739	list. Do	-0.124939
-1.078219	resource. Do	-0.124939
-0.902128	column; Do	-0.124939
-0.601579	editions). Do	-0.124939
-0.601579	key. Do	-0.124939
-0.601579	map. Do	-0.124939
-3.427907	the frame	-0.425969
-3.127113	a frame	-0.124939
-3.094861	to frame	-0.124939
-3.553263	and frame	-0.124939
-3.047489	than frame	-0.124939
-2.231066	A frame	-0.425969
-1.168845	stack frame	-0.124939
-2.500162	2 ==	-0.124939
-2.151954	128 ==	-0.124939
-1.445477	(a ==	-0.124939
-0.149729	INSTRSET ==	-0.346788
-0.504982	(b ==	-0.425969
-0.380099	Day ==	-0.124939
-0.902529	(Day ==	-0.124939
-0.601779	(GetExceptionCode() ==	-0.124939
-2.384401	int d;	-0.425969
-1.569988	double d;	-0.425969
-1.953724	+ d;	-0.124939
-0.325822	c, d;	-0.221849
-1.078821	{double d;	-0.124939
-3.937861	the special	-0.124939
-2.877374	a special	-0.124939
-3.677347	of special	-0.124939
-2.959736	in special	-0.124939
-2.811150	for special	-0.124939
-3.353677	are special	-0.124939
-2.970448	have special	-0.124939
-2.834971	make special	-0.124939
-2.418847	take special	-0.124939
-1.378130	Several special	-0.124939
-2.487944	to prevent	-0.301030
-2.950241	and prevent	-0.124939
-3.330824	can prevent	-0.124939
-3.135467	code prevent	-0.124939
-2.903116	will prevent	-0.124939
-2.238551	doesn't prevent	-0.124939
-1.876902	options prevent	-0.124939
-1.078340	To prevent	-0.124939
-2.727900	a shift	-0.249877
-2.556648	and shift	-0.124939
-3.334482	can shift	-0.124939
-3.262927	= shift	-0.124939
-2.970150	this shift	-0.124939
-2.905823	will shift	-0.124939
-1.564318	; shift	-0.124939
-2.838379	the destructor	-0.124939
-2.615856	a destructor	-0.346788
-2.913354	A destructor	-0.124939
-2.719783	no destructor	-0.124939
-2.222614	virtual destructor	-0.124939
-2.579172	to save	-0.124939
-2.405038	can save	-0.124939
-2.352128	may save	-0.124939
-2.905823	will save	-0.124939
-2.736266	should save	-0.124939
-2.257221	; save	-0.124939
-1.679022	possibly save	-0.124939
-3.538428	and prevents	-0.124939
-2.329277	it prevents	-0.301030
-1.895123	This prevents	-0.522879
-2.966259	this prevents	-0.124939
-2.808288	instruction prevents	-0.124939
-2.773712	which prevents	-0.124939
-2.582254	also prevents	-0.124939
-1.978359	division prevents	-0.124939
-2.512622	the preceding	-0.249877
-3.505314	The preceding	-0.124939
-3.185067	with preceding	-0.124939
-3.937861	the safe	-0.124939
-3.140803	is safe	-0.124939
-3.117008	a safe	-0.124939
-3.434624	The safe	-0.124939
-2.744316	be safe	-0.124939
-3.073053	not safe	-0.124939
-2.253039	more safe	-0.124939
-2.800137	only safe	-0.124939
-2.215682	thread safe	-0.124939
-2.198592	exception safe	-0.124939
-2.950241	and d	-0.124939
-2.609056	= d	-0.425969
-2.995338	{ d	-0.124939
-2.666704	double d	-0.124939
-1.952368	+ d	-0.425969
-1.458495	b; d	-0.425969
-0.793665	d; d	-0.602060
-0.601779	DTRUE: d	-0.124939
-0.380112	2.5 Choice	-0.425969
-0.204071	2.3 Choice	-0.425969
-0.204071	2.2 Choice	-0.425969
-0.204071	2.1 Choice	-0.425969
-0.204071	2.7 Choice	-0.425969
-0.204071	2.6 Choice	-0.425969
-0.204071	2.4 Choice	-0.425969
-2.202300	to tell	-0.564271
-2.660871	can tell	-0.124939
-2.910414	then tell	-0.124939
-3.955422	the Pentium	-0.124939
-2.725190	a Pentium	-0.425969
-2.823166	The Pentium	-0.124939
-3.139365	on Pentium	-0.124939
-2.896904	A Pentium	-0.124939
-2.667024	Intel Pentium	-0.124939
-2.254878	while Pentium	-0.124939
-1.180902	old Pentium	-0.124939
-0.601746	oldest Pentium	-0.124939
-3.750575	is further	-0.124939
-3.721976	a further	-0.124939
-2.131291	for further	-0.271067
-3.386863	be further	-0.124939
-3.370192	are further	-0.124939
-3.135467	code further	-0.124939
-2.900958	A further	-0.124939
-2.789657	loop further	-0.124939
-3.113645	code Assume	-0.124939
-2.890504	} Assume	-0.124939
-2.589509	static Assume	-0.124939
-1.919489	aligned Assume	-0.124939
-1.676451	access. Assume	-0.124939
-1.502270	throw() Assume	-0.124939
-1.444278	speed. Assume	-0.124939
-0.902128	aligned(16))) Assume	-0.124939
-0.902128	const)) Assume	-0.124939
-0.902128	ivdep Assume	-0.124939
-0.902128	-fno-rtti Assume	-0.124939
-0.601579	column-wise. Assume	-0.124939
-0.601579	78. Assume	-0.124939
-0.601579	general. Assume	-0.124939
-3.412236	the efficiency	-0.425969
-2.420654	The efficiency	-0.726999
-2.958580	this efficiency	-0.124939
-2.928264	when efficiency	-0.124939
-2.842657	program efficiency	-0.124939
-2.794543	CPU efficiency	-0.124939
-2.727150	cache efficiency	-0.124939
-1.899627	improve efficiency	-0.124939
-1.803407	relative efficiency	-0.124939
-1.202731	highest efficiency	-0.124939
-2.921735	the repeat	-0.823909
-3.689733	to repeat	-0.124939
-2.255186	; repeat	-0.124939
-1.976963	high repeat	-0.124939
-1.016468	maximum repeat	-0.425969
-1.554109	low repeat	-0.124939
-1.553484	typical repeat	-0.124939
-1.503269	fixed repeat	-0.124939
-1.203178	Let's repeat	-0.124939
-3.190443	the unroll	-0.602060
-2.579172	to unroll	-0.522879
-3.025075	you unroll	-0.124939
-2.905823	will unroll	-0.124939
-2.104254	loop unroll	-0.124939
-2.695362	compilers unroll	-0.124939
-1.876387	usually unroll	-0.124939
-2.584749	it calls.	-0.124939
-1.688466	function calls.	-0.279841
-2.336448	system calls.	-0.124939
-1.901041	interface calls.	-0.124939
-3.407135	the algorithm	-0.425969
-3.665433	of algorithm	-0.124939
-3.047122	an algorithm	-0.124939
-2.527161	any algorithm	-0.124939
-2.421059	first algorithm	-0.124939
-2.331879	following algorithm	-0.124939
-2.295751	simple algorithm	-0.124939
-2.225357	best algorithm	-0.124939
-1.094256	optimal algorithm	-0.124939
-2.009102	complicated algorithm	-0.124939
-0.902328	universal algorithm	-0.124939
-3.920982	the sum	-0.124939
-3.683510	a sum	-0.124939
-3.665433	of sum	-0.124939
-3.268574	// sum	-0.124939
-2.988389	{ sum	-0.124939
-2.896104	} sum	-0.124939
-1.915800	float sum	-0.425969
-1.178915	i++) sum	-0.602060
-1.852782	i, sum	-0.124939
-0.902328	list[size], sum	-0.124939
-0.902328	initialize sum	-0.124939
-3.422620	the strings	-0.425969
-3.228937	or strings	-0.124939
-2.080021	all strings	-0.425969
-2.101909	store strings	-0.124939
-1.678165	handle strings	-0.124939
-1.445198	storing strings	-0.124939
-0.124908	Text strings	-0.124939
-0.380099	text strings	-0.425969
-1.916727	processors. On	-0.124939
-1.851988	CPUs. On	-0.124939
-1.850662	compiler. On	-0.124939
-1.742401	resources. On	-0.124939
-1.202331	structures. On	-0.124939
-1.078286	output. On	-0.124939
-0.902195	difficult. On	-0.124939
-0.902195	hyperthreading. On	-0.124939
-0.902195	tree. On	-0.124939
-0.601612	predictor. On	-0.124939
-0.601612	comparison. On	-0.124939
-0.601612	profitable. On	-0.124939
-0.601612	9.1b. On	-0.124939
-3.042146	the exponent	-0.425969
-3.739297	to exponent	-0.124939
-2.832383	The exponent	-0.425969
-2.378750	// exponent	-0.602060
-2.131083	int exponent	-0.602060
-3.715184	of Linux,	-0.124939
-3.553263	and Linux,	-0.124939
-2.964085	in Linux,	-0.124939
-1.650959	64-bit Linux,	-0.124939
-2.487044	32-bit Linux,	-0.124939
-0.692047	Windows, Linux,	-0.249877
-0.601813	(Windows, Linux,	-0.124939
-2.645648	the possibility	-0.602060
-0.982007	Another possibility	-0.425969
-0.902797	theoretical possibility	-0.124939
-0.902797	obscure possibility	-0.124939
-3.973723	the discussion	-0.124939
-3.123718	a discussion	-0.425969
-2.815467	for discussion	-0.425969
-2.966259	this discussion	-0.124939
-2.934062	more discussion	-0.124939
-2.900958	A discussion	-0.124939
-1.977803	various discussion	-0.124939
-0.634007	further discussion	-0.726999
-3.434624	The conditions	-0.124939
-1.658689	multiple conditions	-0.124939
-2.358213	these conditions	-0.124939
-1.641510	following conditions	-0.425969
-2.281949	error conditions	-0.124939
-2.086523	certain conditions	-0.124939
-2.013226	caching conditions	-0.124939
-1.956547	three conditions	-0.124939
-1.203078	Copyright conditions	-0.124939
-1.202731	worst-case conditions	-0.124939
-3.749654	a non-Intel	-0.124939
-3.169723	with non-Intel	-0.124939
-1.715046	on non-Intel	-0.329059
-2.909183	A non-Intel	-0.124939
-0.902663	treats non-Intel	-0.124939
-0.902663	treat non-Intel	-0.124939
-3.080229	to it.	-0.124939
-3.127587	on it.	-0.124939
-2.937003	use it.	-0.124939
-2.371183	need it.	-0.124939
-2.269828	about it.	-0.124939
-2.246244	calls it.	-0.124939
-2.246244	avoid it.	-0.124939
-2.138320	support it.	-0.124939
-1.826742	tested it.	-0.124939
-1.677114	execute it.	-0.124939
-1.298962	understand it.	-0.124939
-0.601646	recompile it.	-0.124939
-2.112931	CPU (See	-0.425969
-1.812580	2 (See	-0.425969
-2.265105	CPUs (See	-0.124939
-2.244729	Windows (See	-0.124939
-1.939933	types (See	-0.124939
-1.852782	CPUs. (See	-0.124939
-1.851276	modules (See	-0.124939
-1.773978	2. (See	-0.124939
-1.772471	variables. (See	-0.124939
-1.554021	mispredicted (See	-0.124939
-0.902328	www.intel.com. (See	-0.124939
-3.098758	of registers.	-0.124939
-2.962630	in registers.	-0.124939
-1.905649	vector registers.	-0.301030
-2.820110	different registers.	-0.124939
-2.725053	integer registers.	-0.124939
-2.653867	into registers.	-0.124939
-1.900571	XMM registers.	-0.124939
-0.805899	YMM registers.	-0.124939
-3.039963	the maximum	-0.124939
-3.749654	a maximum	-0.124939
-2.309142	The maximum	-0.346788
-2.555158	value maximum	-0.124939
-2.422403	take maximum	-0.124939
-1.203266	worst-case maximum	-0.124939
-1.375561	64-bit mode.	-0.124939
-1.263455	32-bit mode.	-0.346788
-1.740913	bit mode.	-0.124939
-0.601913	sleep mode.	-0.124939
-1.516469	elements per	-0.301030
-2.287774	times per	-0.124939
-2.090548	values per	-0.124939
-0.848520	cycles per	-0.221849
-0.601940	Time per	-0.602060
-2.813303	for testing	-0.124939
-3.361856	are testing	-0.124939
-2.524226	by testing	-0.124939
-2.250461	when testing	-0.124939
-2.827035	because testing	-0.124939
-2.458216	makes testing	-0.124939
-1.299775	development, testing	-0.124939
-0.204057	Worst-case testing	-0.124939
-0.601746	Best-case testing	-0.124939
-3.178299	the alignment	-0.124939
-3.665433	of alignment	-0.124939
-3.425403	The alignment	-0.124939
-3.145297	with alignment	-0.124939
-3.131478	on alignment	-0.124939
-3.093028	This alignment	-0.124939
-2.954790	this alignment	-0.124939
-2.925841	when alignment	-0.124939
-2.654157	pointer alignment	-0.124939
-1.918975	requires alignment	-0.124939
-1.078039	Specifies alignment	-0.124939
-2.552838	the right	-0.212089
-2.679221	size right	-0.124939
-1.746944	shift right	-0.124939
-2.762687	the offset	-0.271067
-3.473600	The offset	-0.124939
-3.065695	an offset	-0.124939
-2.717480	no offset	-0.124939
-1.133154	global offset	-0.425969
-1.713872	total offset	-0.124939
-3.973723	the compatibility	-0.124939
-2.586556	of compatibility	-0.522879
-2.950241	and compatibility	-0.425969
-2.933150	when compatibility	-0.124939
-1.299909	backwards compatibility	-0.124939
-0.902529	resolve compatibility	-0.124939
-0.902529	cross-platform compatibility	-0.124939
-0.601779	bugs, compatibility	-0.124939
-3.955422	the macro	-0.124939
-2.879296	a macro	-0.124939
-3.689596	of macro	-0.124939
-3.431318	that macro	-0.124939
-2.229327	A macro	-0.124939
-2.235573	Use macro	-0.124939
-1.016531	Define macro	-0.124939
-1.078240	Replace macro	-0.124939
-0.902462	preprocessing macro	-0.124939
-1.813716	2 bytes.	-0.425969
-1.497799	4 bytes.	-0.602060
-1.745948	8 bytes.	-0.425969
-1.727428	64 bytes.	-0.124939
-2.332982	16 bytes.	-0.124939
-2.151954	128 bytes.	-0.124939
-1.503190	12 bytes.	-0.124939
-0.902529	400 bytes.	-0.124939
-3.037791	the object.	-0.124939
-2.812612	same object.	-0.124939
-2.699749	each object.	-0.124939
-0.965086	shared object.	-0.124939
-1.830290	global object.	-0.124939
-1.640988	entire object.	-0.124939
-0.902596	anonymous object.	-0.124939
-2.853706	of 100	-0.124939
-3.361856	are 100	-0.124939
-3.184250	by 100	-0.124939
-2.494001	with 100	-0.124939
-3.096824	- 100	-0.124939
-2.556133	* 100	-0.124939
-2.249654	result 100	-0.124939
-1.554421	element. 100	-0.124939
-0.747811	eax, 100	-0.124939
-1.916727	x; Note	-0.124939
-1.800835	1. Note	-0.124939
-1.599379	version. Note	-0.124939
-1.502469	system. Note	-0.124939
-1.501579	arrays. Note	-0.124939
-1.444477	details. Note	-0.124939
-1.299241	explanation. Note	-0.124939
-1.202331	optimized. Note	-0.124939
-0.902195	away. Note	-0.124939
-0.601612	Day. Note	-0.124939
-0.601612	disassembler. Note	-0.124939
-0.601612	131 Note	-0.124939
-0.601612	a[i]. Note	-0.124939
-2.282069	making them	-0.124939
-2.279286	want them	-0.124939
-2.210385	compile them	-0.124939
-1.874584	reduce them	-0.124939
-1.803945	turn them	-0.124939
-1.711101	copying them	-0.124939
-1.640327	reading them	-0.124939
-1.377531	comparing them	-0.124939
-1.078286	copies them	-0.124939
-1.077839	leave them	-0.124939
-0.902195	join them	-0.124939
-0.601612	hide them	-0.124939
-0.601612	getting them	-0.124939
-2.957836	and writing	-0.124939
-3.387359	are writing	-0.124939
-1.979290	or writing	-0.301030
-3.126470	as writing	-0.124939
-2.493969	software writing	-0.124939
-1.535534	threads writing	-0.425969
-3.184329	the library.	-0.124939
-2.298303	function library.	-0.124939
-2.787432	point library.	-0.124939
-2.700582	class library.	-0.124939
-2.596747	static library.	-0.124939
-1.977895	source library.	-0.124939
-1.829067	C library.	-0.124939
-1.078240	LIBM library.	-0.124939
-0.601746	non-vector library.	-0.124939
-2.986098	{ struct	-0.124939
-1.458150	}; struct	-0.124939
-1.444677	follows: struct	-0.124939
-1.202464	__declspec(align(16)) struct	-0.124939
-1.202878	14.9 struct	-0.124939
-1.078353	1024; struct	-0.124939
-1.077939	7.13 struct	-0.124939
-0.902261	8.15a struct	-0.124939
-0.601646	7.40a struct	-0.124939
-0.601646	8.15b struct	-0.124939
-0.601646	7.35b struct	-0.124939
-0.601646	7.35a struct	-0.124939
-3.417397	the calculations.	-0.425969
-2.787432	point calculations.	-0.124939
-2.041542	integer calculations.	-0.124939
-2.359608	these calculations.	-0.124939
-1.045339	mathematical calculations.	-0.301030
-1.919592	graphics calculations.	-0.124939
-1.640322	parallel calculations.	-0.124939
-1.299462	actual calculations.	-0.124939
-0.902462	overlapping calculations.	-0.124939
-3.433259	the operand	-0.425969
-3.065695	an operand	-0.124939
-2.067298	one operand	-0.425969
-1.479472	first operand	-0.301030
-0.788917	second operand	-0.726999
-1.379144	predictable operand	-0.124939
-3.721976	a reduced	-0.124939
-3.702202	of reduced	-0.124939
-2.500125	be reduced	-0.301030
-3.159787	with reduced	-0.124939
-2.366359	compiler reduced	-0.124939
-1.908401	has reduced	-0.602060
-2.693959	compilers reduced	-0.124939
-2.183410	been reduced	-0.124939
-0.875636	clock cycles.	-0.238882
-2.513269	the final	-0.204120
-2.280599	its final	-0.124939
-2.477007	the sake	-1.238882
-3.689596	of operations.	-0.124939
-1.746451	vector operations.	-0.124939
-2.787432	point operations.	-0.124939
-2.722724	integer operations.	-0.124939
-2.100352	standard operations.	-0.124939
-1.744746	shift operations.	-0.124939
-0.805803	arithmetic operations.	-0.124939
-1.078240	input/output operations.	-0.124939
-0.601746	(SIMD) operations.	-0.124939
-2.262223	function. When	-0.124939
-1.993771	cache. When	-0.124939
-1.825838	are: When	-0.124939
-1.799509	size. When	-0.124939
-1.711101	operations. When	-0.124939
-1.676783	precision. When	-0.124939
-1.553177	aliasing When	-0.124939
-1.298795	method. When	-0.124939
-1.298795	mispredictions. When	-0.124939
-0.902195	additions. When	-0.124939
-0.601612	GB. When	-0.124939
-0.601612	profiling. When	-0.124939
-0.601612	100000000. When	-0.124939
-3.937861	the tasks	-0.124939
-3.429004	for tasks	-0.124939
-2.141636	different tasks	-0.124939
-2.779931	other tasks	-0.124939
-2.296953	simple tasks	-0.124939
-1.406713	standard tasks	-0.124939
-2.086523	certain tasks	-0.124939
-1.678469	Other tasks	-0.124939
-0.856777	time-consuming tasks	-0.124939
-0.902395	trivial tasks	-0.124939
-2.262223	function. Avoid	-0.124939
-2.150370	functions. Avoid	-0.124939
-2.070749	program. Avoid	-0.124939
-1.825838	are: Avoid	-0.124939
-1.553177	problems. Avoid	-0.124939
-1.444477	19 Avoid	-0.124939
-1.299241	8. Avoid	-0.124939
-1.202331	declared. Avoid	-0.124939
-0.902195	93. Avoid	-0.124939
-0.902195	26. Avoid	-0.124939
-0.902195	9. Avoid	-0.124939
-0.601612	22. Avoid	-0.124939
-0.601612	140. Avoid	-0.124939
-3.955422	the effect	-0.124939
-2.307053	The effect	-0.522879
-3.098826	This effect	-0.124939
-2.962402	this effect	-0.124939
-2.530505	any effect	-0.124939
-1.828444	negative effect	-0.124939
-1.598929	significant effect	-0.124939
-1.445277	polymorphism effect	-0.124939
-1.202865	dramatic effect	-0.124939
-2.835650	the amount	-0.903090
-1.713538	total amount	-0.124939
-1.599595	significant amount	-0.124939
-0.805933	required amount	-0.425969
-1.379717	equal amount	-0.124939
-1.299796	considerable amount	-0.124939
-0.902596	insufficient amount	-0.124939
-3.422620	the variable.	-0.124939
-3.123718	a variable.	-0.124939
-3.437390	that variable.	-0.124939
-2.603382	float variable.	-0.124939
-1.733359	register variable.	-0.425969
-1.608588	simple variable.	-0.124939
-1.264537	induction variable.	-0.124939
-1.745111	local variable.	-0.124939
-3.889074	the time,	-0.124939
-3.659637	a time,	-0.124939
-3.642544	of time,	-0.124939
-2.947310	this time,	-0.124939
-2.793315	same time,	-0.124939
-2.788289	CPU time,	-0.124939
-2.523842	any time,	-0.124939
-2.486881	long time,	-0.124939
-2.262223	extra time,	-0.124939
-2.210385	compile time,	-0.124939
-1.992007	development time,	-0.124939
-1.078286	compile- time,	-0.124939
-0.601612	programmers' time,	-0.124939
-2.696022	class Variables	-0.124939
-2.273130	stack Variables	-0.124939
-2.086614	memory. Variables	-0.124939
-1.957731	efficient. Variables	-0.124939
-1.180702	storage Variables	-0.124939
-1.826829	are: Variables	-0.124939
-1.552885	inefficient. Variables	-0.124939
-1.552885	storage. Variables	-0.124939
-0.380059	9.4 Variables	-0.425969
-0.601679	26). Variables	-0.124939
-0.601679	stride. Variables	-0.124939
-3.973723	the copying	-0.124939
-3.098758	of copying	-0.124939
-3.538428	and copying	-0.124939
-2.000323	by copying	-0.346788
-3.116254	as copying	-0.124939
-2.933150	when copying	-0.124939
-1.078340	wasteful copying	-0.124939
-1.078340	backup copying	-0.124939
-3.653838	of optimization.	-0.124939
-3.655951	to optimization.	-0.124939
-3.411540	for optimization.	-0.124939
-3.120798	code optimization.	-0.124939
-3.034223	compiler optimization.	-0.124939
-2.951034	this optimization.	-0.124939
-2.165529	program optimization.	-0.124939
-2.483711	software optimization.	-0.124939
-1.744061	prevent optimization.	-0.124939
-1.676294	full optimization.	-0.124939
-0.902261	careful optimization.	-0.124939
-0.601646	profile-guided optimization.	-0.124939
-3.701604	to accessing	-0.124939
-2.297015	for accessing	-0.124939
-3.228937	or accessing	-0.124939
-3.159787	with accessing	-0.124939
-3.116254	as accessing	-0.124939
-2.372808	than accessing	-0.425969
-2.933150	when accessing	-0.124939
-1.713764	When accessing	-0.124939
-3.204797	or until	-0.124939
-3.131478	on until	-0.124939
-2.798073	only until	-0.124939
-2.527525	variable until	-0.124939
-2.328174	file until	-0.124939
-1.828336	loaded until	-0.124939
-0.492831	wait until	-0.124939
-0.902328	waits until	-0.124939
-0.902328	repeated until	-0.124939
-0.601679	detected until	-0.124939
-0.601679	postponed until	-0.124939
-3.937861	the performance.	-0.124939
-2.554075	in performance.	-0.124939
-3.135404	on performance.	-0.124939
-2.842657	program performance.	-0.124939
-2.558203	possible performance.	-0.124939
-2.226241	best performance.	-0.124939
-1.899627	improve performance.	-0.124939
-1.712885	reduced performance.	-0.124939
-1.642064	improved performance.	-0.124939
-1.078139	improving performance.	-0.124939
-3.456591	for adding	-0.124939
-3.378690	are adding	-0.124939
-1.833925	by adding	-0.191886
-2.453012	before adding	-0.124939
-2.357649	without adding	-0.124939
-2.012851	like adding	-0.124939
-1.641480	keep adding	-0.124939
-1.683304	// Define	-0.359022
-0.601980	cores: Define	-0.124939
-3.483690	and causes	-0.124939
-2.765072	which causes	-0.124939
-1.981748	size causes	-0.425969
-2.553180	version causes	-0.124939
-2.125731	list causes	-0.124939
-2.056650	write causes	-0.124939
-1.801232	inlining causes	-0.124939
-1.744472	destructor causes	-0.124939
-1.598755	stride causes	-0.124939
-1.502257	frequent causes	-0.124939
-0.902261	bug causes	-0.124939
-0.601646	free) causes	-0.124939
-3.937861	the processing	-0.124939
-3.695957	a processing	-0.124939
-3.035240	than processing	-0.124939
-2.158183	vector processing	-0.124939
-1.976401	high processing	-0.124939
-0.969458	graphics processing	-0.124939
-1.639990	parallel processing	-0.124939
-1.078139	signal processing	-0.124939
-0.902395	sound processing	-0.124939
-0.902395	physics processing	-0.124939
-2.351077	to divide	-0.726999
-3.584546	and divide	-0.124939
-3.341893	can divide	-0.124939
-3.278211	= divide	-0.124939
-2.337311	you divide	-0.124939
-2.764992	the so-called	-0.124939
-3.779216	a so-called	-0.124939
-2.584028	The so-called	-0.124939
-2.425414	This so-called	-0.124939
-3.147131	is clear	-0.425969
-3.386863	be clear	-0.124939
-3.330824	can clear	-0.124939
-3.083406	not clear	-0.124939
-1.726186	more clear	-0.221849
-2.712908	no clear	-0.124939
-2.438818	less clear	-0.124939
-2.285848	making clear	-0.124939
-2.645648	the total	-0.301030
-3.137458	a total	-0.124939
-3.494484	The total	-0.124939
-2.329784	bits total	-0.124939
-2.415696	to mix	-0.124939
-3.104883	not mix	-0.124939
-0.601962	Don't mix	-0.301030
-0.601913	balanced mix	-0.124939
-3.553263	and 16-bit	-0.124939
-2.440262	in 16-bit	-0.522879
-3.456591	for 16-bit	-0.124939
-3.164727	with 16-bit	-0.124939
-2.841316	make 16-bit	-0.124939
-1.433877	eight 16-bit	-0.124939
-1.713538	mode. 16-bit	-0.124939
-3.433259	the child	-0.425969
-2.713566	and child	-0.124939
-2.830060	The child	-0.124939
-1.328252	its child	-0.301030
-1.503657	polymorphic child	-0.124939
-1.078541	correct child	-0.124939
-3.665433	of containers	-0.124939
-3.425403	The containers	-0.124939
-3.345650	are containers	-0.124939
-2.675326	example containers	-0.124939
-2.213651	These containers	-0.124939
-2.178048	inside containers	-0.124939
-1.993193	separate containers	-0.124939
-1.917472	Many containers	-0.124939
-1.875510	made containers	-0.124939
-1.106220	STL containers	-0.124939
-1.444498	suitable containers	-0.124939
-2.415696	to fit	-0.669007
-2.786920	that fit	-0.124939
-3.104883	not fit	-0.124939
-2.177181	data fit	-0.124939
-2.296257	to predict	-0.301030
-2.660871	can predict	-0.124939
-0.601947	occasionally predict	-0.124939
-3.973723	the priority	-0.124939
-2.820110	different priority	-0.124939
-2.132819	same priority	-0.124939
-2.217520	thread priority	-0.124939
-1.977525	high priority	-0.124939
-1.203391	higher priority	-0.124939
-0.856940	low priority	-0.124939
-0.805787	lower priority	-0.124939
-4.012816	the disk	-0.124939
-3.728566	of disk	-0.124939
-2.957836	and disk	-0.124939
-3.466191	for disk	-0.124939
-1.641959	reading disk	-0.124939
-0.234053	hard disk	-0.204120
-0.912618	clock frequency	-0.301030
-3.702202	of unknown	-0.124939
-3.058170	an unknown	-0.124939
-2.873034	from unknown	-0.124939
-2.759880	all unknown	-0.124939
-2.554560	many unknown	-0.124939
-2.056945	was unknown	-0.124939
-0.650150	were unknown	-0.823909
-1.678165	handle unknown	-0.124939
-2.473439	is obtained	-0.367977
-2.347055	be obtained	-0.249877
-0.902864	doubt obtained	-0.124939
-2.140029	function libraries.	-0.124939
-2.840079	vector libraries.	-0.124939
-2.667024	Intel libraries.	-0.124939
-2.596747	static libraries.	-0.124939
-2.330388	dynamic libraries.	-0.124939
-2.254878	large libraries.	-0.124939
-1.877683	link libraries.	-0.124939
-1.745370	math libraries.	-0.124939
-0.902462	external libraries.	-0.124939
-3.992830	the iteration	-0.124939
-1.654048	one iteration	-0.124939
-1.759924	each iteration	-0.124939
-2.267534	extra iteration	-0.124939
-2.103518	every iteration	-0.124939
-1.745723	preceding iteration	-0.124939
-1.641480	previous iteration	-0.124939
-4.012816	the counters	-0.124939
-3.728566	of counters	-0.124939
-3.473600	The counters	-0.124939
-2.502919	performance counters	-0.124939
-2.218734	These counters	-0.124939
-0.245487	monitor counters	-0.124939
-2.389399	time. Optimizing	-0.124939
-2.203858	etc. Optimizing	-0.124939
-1.827325	are: Optimizing	-0.124939
-1.106376	1. Optimizing	-0.425969
-1.774344	2. Optimizing	-0.124939
-1.553876	critical. Optimizing	-0.124939
-1.553185	storage. Optimizing	-0.124939
-1.445077	speed. Optimizing	-0.124939
-0.680963	9 Optimizing	-0.425969
-0.902395	processor). Optimizing	-0.124939
-2.727900	a 128-bit	-0.425969
-3.713808	to 128-bit	-0.124939
-2.827750	The 128-bit	-0.425969
-1.912775	two 128-bit	-0.124939
-2.131074	supported 128-bit	-0.124939
-1.920587	Each 128-bit	-0.124939
-1.678285	full 128-bit	-0.124939
-3.763557	is possibly	-0.124939
-2.954022	and possibly	-0.124939
-2.130336	can possibly	-0.221849
-3.034388	may possibly	-0.124939
-2.768129	but possibly	-0.124939
-1.554822	could possibly	-0.124939
-0.902596	overwritten, possibly	-0.124939
-3.615467	in x,	-0.124939
-1.569479	double x,	-0.249877
-1.916935	float x,	-0.124939
-2.061939	(int x,	-0.124939
-1.745669	S1 x,	-0.124939
-1.599820	modify x,	-0.124939
-1.203279	(double x,	-0.124939
-0.601779	f, x,	-0.124939
-2.645648	the stack.	-0.234083
-2.428739	register stack.	-0.124939
-2.013817	their stack.	-0.124939
-1.900457	own stack.	-0.124939
-3.702202	of 2,	-0.124939
-3.255482	= 2,	-0.124939
-3.098754	- 2,	-0.124939
-1.952368	+ 2,	-0.425969
-2.604528	number 2,	-0.124939
-0.333128	1, 2,	-0.249877
-1.378530	(i.e. 2,	-0.124939
-0.601779	(0, 2,	-0.124939
-2.923397	the full	-0.124939
-3.721976	a full	-0.124939
-3.702202	of full	-0.124939
-3.615467	in full	-0.124939
-3.228937	or full	-0.124939
-3.159787	with full	-0.124939
-2.947783	use full	-0.124939
-2.847988	has full	-0.124939
-2.387124	time. Another	-0.124939
-1.827971	loop. Another	-0.124939
-1.298962	itself. Another	-0.124939
-1.202464	Watcom Another	-0.124939
-1.202464	double. Another	-0.124939
-1.077939	chains. Another	-0.124939
-0.902261	GOT. Another	-0.124939
-0.902261	slower. Another	-0.124939
-0.601646	ARRAYSIZE. Another	-0.124939
-0.601646	DLL. Another	-0.124939
-0.601646	CPU’s. Another	-0.124939
-0.601646	considerably. Another	-0.124939
-3.750575	is overloaded	-0.124939
-2.950241	and overloaded	-0.425969
-3.386863	be overloaded	-0.124939
-3.228937	or overloaded	-0.124939
-2.132642	an overloaded	-0.124939
-1.996366	using overloaded	-0.124939
-2.600813	multiple overloaded	-0.124939
-2.238275	An overloaded	-0.124939
-3.805008	is possible.	-0.124939
-3.421802	be possible.	-0.124939
-1.850940	if possible.	-0.124939
-2.203812	as possible.	-0.124939
-1.843150	more efficiently	-0.124939
-1.591345	most efficiently	-0.425969
-1.492427	less efficiently	-0.301030
-2.251329	work efficiently	-0.124939
-3.228937	or models	-0.124939
-2.798762	CPU models	-0.124939
-1.014651	processor models	-0.346788
-2.225814	specific models	-0.124939
-1.994977	development models	-0.124939
-1.599820	future models	-0.124939
-1.599820	newer models	-0.124939
-0.601779	Later models	-0.124939
-3.819753	is OS	-0.124939
-0.465170	Mac OS	-0.522879
-1.922697	requires OS	-0.124939
-2.636893	is needed.	-0.124939
-2.171335	not needed.	-0.124939
-3.051650	than needed.	-0.124939
-2.938091	when needed.	-0.124939
-1.641108	rarely needed.	-0.124939
-0.601846	unpacking needed.	-0.124939
-3.524083	and classes.	-0.124939
-3.438006	for classes.	-0.124939
-2.840079	vector classes.	-0.124939
-2.679014	using classes.	-0.124939
-2.359608	these classes.	-0.124939
-1.031798	container classes.	-0.124939
-1.502956	polymorphic classes.	-0.124939
-1.378330	base classes.	-0.124939
-0.902462	reusable classes.	-0.124939
-3.160071	is changed	-0.124939
-2.063903	be changed	-0.271067
-2.163784	has changed	-0.124939
-0.601913	artificially changed	-0.124939
-2.902079	is true	-0.124939
-3.378551	be true	-0.124939
-2.607393	= true	-0.425969
-3.220282	if true	-0.124939
-2.407083	often true	-0.124939
-2.334210	always true	-0.124939
-1.957240	&& true	-0.124939
-1.938864	|| true	-0.124939
-0.601746	result, true	-0.124939
-3.992830	the thread.	-0.124939
-3.735594	a thread.	-0.124939
-2.823297	different thread.	-0.124939
-2.790297	other thread.	-0.124939
-2.751697	one thread.	-0.124939
-1.484581	each thread.	-0.221849
-1.458109	another thread.	-0.124939
-2.830060	The names	-0.425969
-2.025068	function names	-0.346788
-2.301788	have names	-0.124939
-2.535778	variable names	-0.124939
-2.142934	Function names	-0.124939
-1.554383	brand names	-0.124939
-1.035421	even though	-0.602060
-1.854131	function, though	-0.124939
-1.503423	systems, though	-0.124939
-1.445431	available, though	-0.124939
-1.300042	backwards though	-0.124939
-0.601813	b[i]*c[i], though	-0.124939
-0.601813	Object1.Hello(), though	-0.124939
-2.416319	to execute	-0.191886
-2.406771	can execute	-0.124939
-2.466229	code execute	-0.124939
-3.524083	and %	-0.124939
-1.707156	b %	-0.124939
-2.615156	i %	-0.124939
-2.144873	(i %	-0.124939
-1.300088	SIZE %	-0.124939
-1.202865	size) %	-0.124939
-0.505024	int)b %	-0.124939
-0.902462	64) %	-0.124939
-0.601746	0x40) %	-0.124939
-3.035240	than mov	-0.124939
-2.806257	instruction mov	-0.124939
-2.298645	instructions mov	-0.124939
-2.098604	add mov	-0.124939
-0.726613	mov mov	-0.124939
-0.902395	xor mov	-0.124939
-0.902395	push mov	-0.124939
-0.902395	$B1$1: mov	-0.124939
-0.601712	$B2$2: mov	-0.124939
-0.601712	$B1$2: mov	-0.124939
-2.855522	of N	-0.124939
-2.568351	for N	-0.301030
-3.223299	if N	-0.124939
-3.159787	with N	-0.124939
-2.784647	If N	-0.124939
-2.560554	where N	-0.124939
-1.994699	model N	-0.124939
-1.299909	case, N	-0.124939
-1.615658	different kinds	-0.823909
-2.790297	other kinds	-0.124939
-2.762677	all kinds	-0.124939
-2.599358	two kinds	-0.124939
-2.150766	four kinds	-0.124939
-2.088506	certain kinds	-0.124939
-0.380112	Different kinds	-0.425969
-2.823166	The details	-0.124939
-2.567125	for details	-0.124939
-2.931282	more details	-0.124939
-2.783359	other details	-0.124939
-1.553796	standardized details	-0.124939
-1.078240	technical details	-0.124939
-1.078240	More details	-0.124939
-0.601746	hardware-related details	-0.124939
-0.601746	Further details	-0.124939
-3.973723	the RAM	-0.124939
-2.855522	of RAM	-0.301030
-3.615467	in RAM	-0.124939
-2.254204	more RAM	-0.425969
-2.192755	from RAM	-0.425969
-2.560554	where RAM	-0.124939
-1.744276	save RAM	-0.124939
-1.713484	time, RAM	-0.124939
-3.042146	the rows	-0.425969
-3.108427	of rows	-0.425969
-2.131083	int rows	-0.602060
-2.492715	between rows	-0.124939
-1.525943	through rows	-0.124939
-3.671409	a square	-0.124939
-3.655951	to square	-0.124939
-3.416374	The square	-0.124939
-3.263057	// square	-0.124939
-2.740115	one square	-0.124939
-2.596078	float square	-0.124939
-2.443115	called square	-0.124939
-2.232442	Use square	-0.124939
-2.010041	like square	-0.124939
-1.444677	reciprocal square	-0.124939
-0.902261	division, square	-0.124939
-0.601646	Division, square	-0.124939
-3.701604	to fail	-0.124939
-3.228937	or fail	-0.124939
-2.351552	may fail	-0.124939
-1.969457	will fail	-0.602060
-2.408608	often fail	-0.124939
-2.367889	they fail	-0.124939
-1.492809	therefore fail	-0.425969
-0.601779	products fail	-0.124939
-1.890938	different purposes.	-0.124939
-2.113215	other purposes.	-0.124939
-1.913753	multiple purposes.	-0.124939
-1.679705	test purposes.	-0.124939
-1.672108	these purposes.	-0.124939
-0.601846	demonstration purposes.	-0.124939
-2.802001	functions (e.g.	-0.124939
-2.723211	cache (e.g.	-0.124939
-2.601632	number (e.g.	-0.124939
-2.480280	branch (e.g.	-0.124939
-2.434087	call (e.g.	-0.124939
-2.246244	calls (e.g.	-0.124939
-2.123699	operators (e.g.	-0.124939
-1.955573	applications (e.g.	-0.124939
-1.918922	linking (e.g.	-0.124939
-1.874638	storage (e.g.	-0.124939
-1.743240	algorithm (e.g.	-0.124939
-1.298962	flag (e.g.	-0.124939
-3.111698	of compiling	-0.124939
-3.263365	or compiling	-0.124939
-2.529690	by compiling	-0.124939
-1.555851	when compiling	-0.669007
-2.851158	to convert	-0.124939
-2.658540	can convert	-0.124939
-1.970086	will convert	-0.301030
-2.214868	then convert	-0.124939
-2.426943	first convert	-0.124939
-2.260353	must convert	-0.124939
-1.721405	same thing	-0.425969
-2.749355	one thing	-0.124939
-1.733583	first thing	-0.124939
-2.277446	important thing	-0.124939
-1.899178	second thing	-0.124939
-1.678722	Another thing	-0.124939
-1.299629	third thing	-0.124939
-1.299909	obvious thing	-0.124939
-3.206117	the least	-0.301030
-1.366685	at least	-0.301030
-3.420184	for containing	-0.124939
-3.124419	code containing	-0.124939
-2.835519	vector containing	-0.124939
-2.018209	class containing	-0.425969
-2.419584	register containing	-0.124939
-2.328174	file containing	-0.124939
-2.182581	line containing	-0.124939
-1.993568	block containing	-0.124939
-1.640793	subexpression containing	-0.124939
-1.202978	www.agner.org/optimize/cppexamples.zip containing	-0.124939
-0.902328	patterns containing	-0.124939
-2.488760	< 0)	-0.124939
-1.245043	> 0)	-0.124939
-0.517670	== 0)	-0.522879
-0.266234	!= 0)	-0.425969
-3.728566	of precision.	-0.124939
-3.568623	and precision.	-0.124939
-2.794178	point precision.	-0.124939
-1.569818	double precision.	-0.124939
-1.110922	single precision.	-0.124939
-0.601846	losing precision.	-0.124939
-3.920982	the algebraic	-0.124939
-3.665433	of algebraic	-0.124939
-3.496741	and algebraic	-0.124939
-2.939673	use algebraic	-0.124939
-2.832877	make algebraic	-0.124939
-2.824818	because algebraic	-0.124939
-2.527161	any algebraic	-0.124939
-1.607849	simple algebraic	-0.124939
-2.009102	complicated algebraic	-0.124939
-1.976215	various algebraic	-0.124939
-1.917472	Many algebraic	-0.124939
-3.111698	of structures	-0.124939
-3.263365	or structures	-0.124939
-1.415714	data structures	-0.182931
-2.240988	big structures	-0.124939
-2.612717	a little	-0.221849
-3.159787	with little	-0.124939
-3.116254	as little	-0.124939
-2.976928	have little	-0.124939
-2.900958	A little	-0.124939
-2.546588	takes little	-0.124939
-2.492426	very little	-0.124939
-1.958487	too little	-0.124939
-3.263057	// Any	-0.124939
-2.057057	allocation Any	-0.124939
-1.711466	object. Any	-0.124939
-1.553410	block. Any	-0.124939
-1.502257	units. Any	-0.124939
-1.378556	here. Any	-0.124939
-1.378143	counter. Any	-0.124939
-1.298962	maintain. Any	-0.124939
-0.902261	shared. Any	-0.124939
-0.601646	saved. Any	-0.124939
-0.601646	checking. Any	-0.124939
-0.601646	device. Any	-0.124939
-3.187376	the logical	-0.124939
-3.721976	a logical	-0.124939
-3.098758	of logical	-0.124939
-2.589322	or logical	-0.425969
-2.809336	same logical	-0.124939
-2.749355	one logical	-0.124939
-2.129627	eight logical	-0.124939
-0.601779	even-numbered logical	-0.124939
-2.381961	int level	-0.425969
-2.266645	extra level	-0.124939
-1.535213	element level	-0.124939
-2.141938	Function level	-0.124939
-1.027499	high level	-0.301030
-1.899456	higher level	-0.124939
-1.202999	highest level	-0.124939
-0.601779	"function level	-0.124939
-2.471338	on access.	-0.124939
-1.934930	memory access.	-0.124939
-2.840079	vector access.	-0.124939
-2.695447	each access.	-0.124939
-2.102523	every access.	-0.124939
-2.101591	hardware access.	-0.124939
-1.744123	database access.	-0.124939
-1.555048	non-static access.	-0.124939
-1.502956	random access.	-0.124939
-3.193533	the bitwise	-0.301030
-2.424279	The bitwise	-0.425969
-3.169723	with bitwise	-0.124939
-2.685297	using bitwise	-0.124939
-1.545177	Use bitwise	-0.425969
-0.902663	corresponding bitwise	-0.124939
-4.012816	the handle	-0.124939
-2.489418	to handle	-0.204120
-2.957836	and handle	-0.124939
-3.338172	can handle	-0.124939
-2.905580	then handle	-0.124939
-2.240203	doesn't handle	-0.124939
-2.841125	the heap	-0.204120
-3.771380	of heap	-0.124939
-2.311241	The heap	-0.346788
-1.678111	mov DWORD	-0.124939
-1.503269	ecx DWORD	-0.124939
-0.747937	ebx, DWORD	-0.425969
-0.680927	edx, DWORD	-0.425969
-1.078240	ecx, DWORD	-0.124939
-0.380086	[edx] DWORD	-0.124939
-0.902462	[esp+8] DWORD	-0.124939
-0.601746	[eax+400] DWORD	-0.124939
-0.601746	[esp+4] DWORD	-0.124939
-2.390541	time. Other	-0.124939
-1.918971	processors. Other	-0.124939
-1.299462	priority. Other	-0.124939
-1.203178	format. Other	-0.124939
-1.078240	Literature Other	-0.124939
-0.204057	3.11 Other	-0.425969
-0.204057	3.9 Other	-0.425969
-0.204057	7.31 Other	-0.425969
-0.601746	Gnu). Other	-0.124939
-2.759484	used during	-0.124939
-2.495868	performance during	-0.124939
-2.296560	instructions during	-0.124939
-2.055836	both during	-0.124939
-1.827971	core during	-0.124939
-1.827561	computer during	-0.124939
-1.828382	change during	-0.124939
-1.378143	occurs during	-0.124939
-1.202464	selected during	-0.124939
-0.902261	itself, during	-0.124939
-0.902261	grows during	-0.124939
-0.601646	framework, during	-0.124939
-2.751470	is initialized	-0.249877
-3.568623	and initialized	-0.124939
-3.466191	for initialized	-0.124939
-2.502309	be initialized	-0.301030
-2.561659	array initialized	-0.124939
-2.185709	been initialized	-0.124939
-2.405038	can occur	-0.124939
-2.352128	may occur	-0.124939
-2.905823	will occur	-0.124939
-2.584068	also occur	-0.124939
-2.239376	doesn't occur	-0.124939
-1.133045	contentions occur	-0.425969
-1.203379	seldom occur	-0.124939
-3.046545	the target	-0.249877
-2.837066	The target	-0.124939
-1.263479	branch target	-0.346788
-1.640793	program, especially	-0.124939
-1.502490	systems, especially	-0.124939
-1.299128	inefficient, especially	-0.124939
-1.202598	precision, especially	-0.124939
-0.902328	resource, especially	-0.124939
-0.902328	file, especially	-0.124939
-0.902328	relocation, especially	-0.124939
-0.902328	chains, especially	-0.124939
-0.601679	slower, especially	-0.124939
-0.601679	chain, especially	-0.124939
-0.601679	consuming, especially	-0.124939
-2.614807	a smart	-0.522879
-3.728566	of smart	-0.124939
-2.231938	A smart	-0.425969
-2.685297	using smart	-0.124939
-2.495429	very smart	-0.124939
-2.012357	their smart	-0.124939
-3.437390	that includes	-0.124939
-2.011158	This includes	-0.124939
-3.045587	compiler includes	-0.124939
-2.582254	also includes	-0.124939
-2.465437	way includes	-0.124939
-2.331885	file includes	-0.124939
-1.920646	linking includes	-0.124939
-0.601779	Currently includes	-0.124939
-2.597947	the entire	-0.170696
-3.081147	an entire	-0.124939
-2.928420	the executable	-0.346788
-3.069506	an executable	-0.124939
-2.220471	single executable	-0.124939
-1.878119	binary executable	-0.124939
-0.878006	main executable	-0.124939
-4.012816	the subexpression	-0.124939
-3.749654	a subexpression	-0.124939
-3.245810	or subexpression	-0.124939
-2.815913	same subexpression	-0.124939
-1.270284	common subexpression	-0.301030
-0.266219	Common subexpression	-0.425969
-2.581905	to insert	-0.221849
-2.561235	and insert	-0.249877
-3.345647	can insert	-0.124939
-3.042798	may insert	-0.124939
-3.422620	the nontemporal	-0.425969
-3.702202	of nontemporal	-0.124939
-3.453675	The nontemporal	-0.124939
-1.905649	vector nontemporal	-0.124939
-2.681098	using nontemporal	-0.124939
-1.712370	so-called nontemporal	-0.124939
-1.678165	mix nontemporal	-0.124939
-1.641492	insert nontemporal	-0.124939
-4.012816	the bounds	-0.124939
-3.749654	a bounds	-0.124939
-3.728566	of bounds	-0.124939
-2.244961	with bounds	-0.301030
-3.151472	on bounds	-0.124939
-1.455910	array bounds	-0.124939
-3.506862	for improved	-0.124939
-1.894749	be improved	-0.647817
-3.035630	the SSE	-0.425969
-2.151954	128 SSE	-0.124939
-2.101909	mode SSE	-0.124939
-1.445477	sections SSE	-0.124939
-0.601779	_mm_prefetch SSE	-0.124939
-0.601779	_mm_stream_ps SSE	-0.124939
-0.601779	mmintrin.h SSE	-0.124939
-0.601779	_mm_stream_pi SSE	-0.124939
-2.638867	is discussed	-0.221849
-2.329842	are discussed	-0.425969
-3.136933	as discussed	-0.124939
-2.589556	also discussed	-0.124939
-3.973723	the updates	-0.124939
-2.568351	for updates	-0.124939
-2.848080	program updates	-0.124939
-1.978916	automatic updates	-0.124939
-0.944146	Automatic updates	-0.124939
-1.503190	frequent updates	-0.124939
-1.445477	consuming updates	-0.124939
-0.601779	download updates	-0.124939
-3.726366	to consider	-0.124939
-1.823837	may consider	-0.346788
-3.026833	you consider	-0.124939
-2.908547	will consider	-0.124939
-2.288850	I consider	-0.124939
-1.564951	must consider	-0.124939
-3.422620	the loading	-0.425969
-3.386863	be loading	-0.124939
-2.957277	time loading	-0.124939
-2.873034	from loading	-0.124939
-2.848080	program loading	-0.124939
-1.664725	without loading	-0.425969
-1.446597	lazy loading	-0.124939
-0.505038	Program loading	-0.124939
-3.378551	be below	-0.124939
-2.680539	example below	-0.124939
-2.450991	address below	-0.124939
-2.168068	explained below	-0.124939
-1.106708	columns below	-0.425969
-1.504208	listed below	-0.124939
-1.299462	28 below	-0.124939
-0.380086	matrix[r][c] below	-0.425969
-0.902462	7.15b below	-0.124939
-3.422620	the reading	-0.425969
-3.701604	to reading	-0.124939
-2.950241	and reading	-0.124939
-3.370192	are reading	-0.124939
-3.228937	or reading	-0.124939
-3.159787	with reading	-0.124939
-3.143363	on reading	-0.124939
-2.372808	than reading	-0.425969
-3.920982	the directly	-0.124939
-3.208908	function directly	-0.124939
-3.101367	as directly	-0.124939
-2.246967	calls directly	-0.124939
-2.057275	write directly	-0.124939
-1.445636	representation directly	-0.124939
-1.378310	C++, directly	-0.124939
-1.078039	instruments directly	-0.124939
-0.902328	C1::f directly	-0.124939
-0.902328	fed directly	-0.124939
-0.601679	Called directly	-0.124939
-2.767310	the simplest	-0.191886
-2.427934	The simplest	-0.124939
-3.187376	the situation	-0.602060
-3.453675	The situation	-0.124939
-2.947783	use situation	-0.124939
-2.900958	A situation	-0.124939
-2.804296	only situation	-0.124939
-2.532187	any situation	-0.124939
-1.617636	case situation	-0.124939
-2.216144	common situation	-0.124939
-4.055779	the message	-0.124939
-3.779216	a message	-0.124939
-3.601075	and message	-0.124939
-0.829706	error message	-0.249877
-2.578801	The delay	-0.301030
-3.101755	This delay	-0.124939
-2.957277	time delay	-0.124939
-2.903116	will delay	-0.124939
-2.255927	large delay	-0.124939
-2.238551	doesn't delay	-0.124939
-1.299629	considerable delay	-0.124939
-0.204064	forwarding delay	-0.425969
-3.190443	the condition	-0.124939
-3.735594	a condition	-0.124939
-3.226337	if condition	-0.124939
-2.791596	loop condition	-0.124939
-2.284903	error condition	-0.124939
-1.504113	overflow condition	-0.124939
-1.203478	control condition	-0.124939
-0.897721	performance monitor	-0.689210
-3.937861	the resource	-0.124939
-3.677347	of resource	-0.124939
-3.212695	or resource	-0.124939
-2.802856	same resource	-0.124939
-2.110283	other resource	-0.124939
-1.378476	files, resource	-0.124939
-1.300335	economize resource	-0.124939
-1.202731	scarce resource	-0.124939
-0.601712	precious resource	-0.124939
-0.601712	objects), resource	-0.124939
-2.859179	of cores	-0.124939
-2.826508	different cores	-0.124939
-1.701877	CPU cores	-0.249877
-2.604133	multiple cores	-0.124939
-2.151788	four cores	-0.124939
-1.203479	soft cores	-0.124939
-3.695957	a parallel	-0.124939
-3.677347	of parallel	-0.124939
-3.602603	in parallel	-0.124939
-3.429004	for parallel	-0.124939
-2.112992	doing parallel	-0.124939
-2.010820	allows parallel	-0.124939
-1.106446	Supports parallel	-0.425969
-1.378823	specifying parallel	-0.124939
-0.902395	inherently parallel	-0.124939
-0.601712	massively parallel	-0.124939
-3.602603	in either	-0.124939
-2.465421	faster either	-0.124939
-1.458698	implemented either	-0.425969
-1.920127	linked either	-0.124939
-1.877351	choose either	-0.124939
-1.553185	contain either	-0.124939
-1.445423	saved either	-0.124939
-1.203078	going either	-0.124939
-1.078139	blocks, either	-0.124939
-0.902395	unit, either	-0.124939
-2.142969	different implementations	-0.425969
-1.582986	Some implementations	-0.124939
-1.524491	common implementations	-0.425969
-2.013400	Most implementations	-0.124939
-2.010902	their implementations	-0.124939
-1.775353	slow implementations	-0.124939
-1.600100	alternative implementations	-0.124939
-0.902529	early implementations	-0.124939
-3.742373	of calculating	-0.124939
-2.132639	for calculating	-0.367977
-3.055851	than calculating	-0.124939
-2.940583	when calculating	-0.124939
-1.203400	begin calculating	-0.124939
-3.665433	of ebx	-0.124939
-3.596311	in ebx	-0.124939
-2.248465	result ebx	-0.124939
-2.149683	uses ebx	-0.124939
-1.742884	save ebx	-0.124939
-1.444877	Now ebx	-0.124939
-1.378310	Register ebx	-0.124939
-1.078039	esp ebx	-0.124939
-1.078039	pop ebx	-0.124939
-1.078039	$B1$2 ebx	-0.124939
-0.601679	restore ebx	-0.124939
-2.812612	same generation	-0.124939
-1.733823	first generation	-0.124939
-2.364345	new generation	-0.124939
-1.333220	next generation	-0.425969
-0.948463	second generation	-0.301030
-1.445431	compile-time generation	-0.124939
-1.299796	third generation	-0.124939
-2.580081	to enable	-0.346788
-3.568623	and enable	-0.124939
-3.245810	or enable	-0.124939
-3.037173	may enable	-0.124939
-2.223755	will enable	-0.124939
-2.013627	sets enable	-0.124939
-2.105539	point instructions.	-0.124939
-2.359608	these instructions.	-0.124939
-1.492774	AVX instructions.	-0.124939
-1.958790	string instructions.	-0.124939
-1.899025	control instructions.	-0.124939
-1.553796	subsequent instructions.	-0.124939
-1.445903	machine instructions.	-0.124939
-1.378956	scan instructions.	-0.124939
-1.202865	detailed instructions.	-0.124939
-2.751470	is copied	-0.249877
-2.502309	be copied	-0.124939
-3.094011	not copied	-0.124939
-2.185709	been copied	-0.124939
-1.078754	contents copied	-0.124939
-0.601846	deleted, copied	-0.124939
-3.665433	of e.g.	-0.124939
-3.666922	to e.g.	-0.124939
-3.101367	as e.g.	-0.124939
-3.037036	compiler e.g.	-0.124939
-1.599021	set, e.g.	-0.124939
-1.444877	hold e.g.	-0.124939
-1.444498	available, e.g.	-0.124939
-1.078039	language, e.g.	-0.124939
-0.601679	multi-threading, e.g.	-0.124939
-0.601679	with, e.g.	-0.124939
-0.601679	interrupt, e.g.	-0.124939
-2.351615	to keep	-0.182931
-3.601075	and keep	-0.124939
-2.342685	always keep	-0.124939
-0.601913	producers keep	-0.124939
-0.068455	DWORD PTR	-0.263241
-2.704783	set Automatic	-0.124939
-1.938554	expressions Automatic	-0.124939
-1.898714	dispatch Automatic	-0.124939
-1.773775	vectorization Automatic	-0.124939
-1.299462	tools. Automatic	-0.124939
-0.204057	3.4 Automatic	-0.425969
-0.902462	12.1a. Automatic	-0.124939
-0.204057	12.3 Automatic	-0.425969
-0.601746	updates. Automatic	-0.124939
-2.247483	Windows Library	-0.124939
-0.333128	Template Library	-0.124939
-1.446317	Math Library	-0.124939
-1.445477	16. Library	-0.124939
-1.299909	Kernel Library	-0.124939
-1.202999	optimized. Library	-0.124939
-1.078340	LIBM Library	-0.124939
-0.601779	directly: Library	-0.124939
-3.130534	a ?	-0.425969
-1.707590	b ?	-0.301030
-1.618111	0 ?	-0.425969
-0.982143	0) ?	-0.124939
-0.902663	EXCEPTION_FLT_OVERFLOW ?	-0.124939
-0.601846	OneOrTwo5[(b!=0) ?	-0.124939
-2.905720	is defined	-0.301030
-2.733604	are defined	-0.124939
-2.849927	has defined	-0.124939
-2.603672	object defined	-0.124939
-2.512087	variables defined	-0.124939
-2.184558	been defined	-0.124939
-1.492723	classes defined	-0.425969
-3.776939	is Visual	-0.124939
-0.751646	Microsoft Visual	-0.346788
-1.203266	processing. Visual	-0.124939
-0.380126	C#, Visual	-0.124939
-1.078541	free. Visual	-0.124939
-0.601846	(MS Visual	-0.124939
-2.695991	to align	-0.124939
-3.341893	can align	-0.124939
-2.630953	// align	-0.124939
-2.224320	will align	-0.124939
-1.564732	; align	-0.124939
-3.098758	of sizes	-0.124939
-1.890192	different sizes	-0.301030
-2.759880	all sizes	-0.124939
-2.558911	array sizes	-0.124939
-2.423484	register sizes	-0.124939
-2.186440	matrix sizes	-0.124939
-2.044750	Integer sizes	-0.124939
-1.853699	smaller sizes	-0.124939
-2.610726	= temp;	-0.124939
-1.728794	double temp;	-0.301030
-1.873163	* temp;	-0.124939
-2.424792	register temp;	-0.124939
-1.879579	b, temp;	-0.124939
-1.555562	c, temp;	-0.124939
-1.078440	a[100], temp;	-0.124939
-3.431318	that allow	-0.124939
-3.078198	not allow	-0.124939
-2.900426	will allow	-0.124939
-2.733909	should allow	-0.124939
-2.692560	compilers allow	-0.124939
-1.417449	systems allow	-0.124939
-2.028115	Mac allow	-0.124939
-1.803049	Windows, allow	-0.124939
-1.745370	math allow	-0.124939
-3.992830	the PathScale	-0.124939
-2.442792	and PathScale	-0.522879
-3.237292	or PathScale	-0.124939
-1.922799	Intel, PathScale	-0.124939
-1.802733	platforms. PathScale	-0.124939
-1.203379	PGI PathScale	-0.124939
-0.601813	Hat). PathScale	-0.124939
-3.726366	to BSD	-0.124939
-3.568623	and BSD	-0.124939
-2.965544	in BSD	-0.124939
-0.485454	Linux, BSD	-0.522879
-1.445878	Open BSD	-0.124939
-0.601846	Mac, BSD	-0.124939
-2.647506	+ f;	-0.124939
-1.100384	float f;	-0.602060
-2.514286	return f;	-0.124939
-2.597947	the previous	-0.279841
-3.810938	a previous	-0.124939
-0.881329	< size;	-0.865301
-2.546321	is rarely	-0.124939
-3.568623	and rarely	-0.124939
-3.264215	it rarely	-0.124939
-2.769856	but rarely	-0.124939
-1.600567	programmers rarely	-0.124939
-1.600354	features rarely	-0.124939
-2.823297	different way.	-0.124939
-2.790297	other way.	-0.124939
-1.387412	following way.	-0.124939
-1.078737	inefficient way.	-0.124939
-1.641480	either way.	-0.124939
-0.505002	suboptimal way.	-0.124939
-0.601813	graceful way.	-0.124939
-3.422620	the vector.	-0.124939
-3.123718	a vector.	-0.124939
-2.749355	one vector.	-0.124939
-2.671033	size vector.	-0.124939
-1.996088	Boolean vector.	-0.124939
-1.960154	last vector.	-0.124939
-1.016584	per vector.	-0.124939
-1.379090	largest vector.	-0.124939
-2.905720	is easier	-0.602060
-3.553263	and easier	-0.124939
-2.582580	it easier	-0.425969
-1.720316	often easier	-0.124939
-1.920342	becomes easier	-0.124939
-1.854622	just easier	-0.124939
-1.078440	reordering easier	-0.124939
-3.150330	is identical	-0.425969
-2.733604	are identical	-0.124939
-2.087042	All identical	-0.124939
-1.503669	almost identical	-0.124939
-0.805982	exactly identical	-0.124939
-1.078440	joining identical	-0.124939
-0.204071	Join identical	-0.425969
-3.510197	and 20	-0.124939
-2.412155	- 20	-0.425969
-2.867610	from 20	-0.124939
-1.203078	repeats 20	-0.124939
-1.202731	...................................................................................................... 20	-0.124939
-0.902395	....................................................... 20	-0.124939
-0.902395	163 20	-0.124939
-0.601712	links. 20	-0.124939
-0.601712	access................................................................................................................ 20	-0.124939
-0.601712	files). 20	-0.124939
-2.453620	as well.	-0.124939
-2.493925	very well.	-0.124939
-2.059542	optimize well.	-0.124939
-0.877937	predicted well.	-0.301030
-1.203132	performs well.	-0.124939
-0.380112	reasonably well.	-0.124939
-0.601813	moderately well.	-0.124939
-2.837012	the program,	-0.301030
-3.749654	a program,	-0.124939
-2.586931	C++ program,	-0.124939
-1.900958	your program,	-0.124939
-1.713659	final program,	-0.124939
-1.078541	multithreaded program,	-0.124939
-3.702202	of list[i]	-0.124939
-2.312311	{ list[i]	-0.425969
-2.013121	expression list[i]	-0.124939
-1.957933	&& list[i]	-0.124939
-1.877180	<< list[i]	-0.124939
-0.204064	i++){ list[i]	-0.124939
-0.204064	i+=3){ list[i]	-0.124939
-0.601779	i+=3,i_div_3++){ list[i]	-0.124939
-2.954797	time under	-0.124939
-1.914339	program under	-0.301030
-2.499379	performance under	-0.124939
-2.227127	best under	-0.124939
-2.188126	done under	-0.124939
-1.828132	tested under	-0.124939
-1.299462	runs under	-0.124939
-1.078240	services under	-0.124939
-0.902462	Wikipedia under	-0.124939
-3.568623	and expect	-0.124939
-2.658540	can expect	-0.124939
-3.094011	not expect	-0.124939
-2.336951	you expect	-0.425969
-2.539127	we expect	-0.124939
-1.351012	cannot expect	-0.425969
-2.420880	register except	-0.124939
-1.633264	bits except	-0.425969
-1.712885	time, except	-0.124939
-1.445423	object, except	-0.124939
-1.299642	library, except	-0.124939
-1.202731	underflow except	-0.124939
-1.078139	programs, except	-0.124939
-1.078486	stack, except	-0.124939
-0.601712	faster, except	-0.124939
-0.601712	representation, except	-0.124939
-3.955422	the loops	-0.124939
-2.954797	time loops	-0.124939
-1.911924	two loops	-0.124939
-2.579507	such loops	-0.124939
-2.195411	small loops	-0.124939
-2.057910	both loops	-0.124939
-1.745994	unroll loops	-0.124939
-1.078240	metaprogramming, loops	-0.124939
-0.204057	Nested loops	-0.425969
-0.501359	reason why	-0.124939
-1.601181	reasons why	-0.124939
-1.555716	explanation why	-0.124939
-0.902797	explains why	-0.124939
-1.248620	CPU dispatching.	-0.170696
-1.552677	{ cout	-1.028029
-2.565813	array cout	-0.124939
-1.601607	f cout	-0.124939
-2.957836	and references.	-0.124939
-3.245810	or references.	-0.124939
-3.169723	with references.	-0.124939
-2.685297	using references.	-0.124939
-1.048748	local references.	-0.124939
-0.425874	internal references.	-0.124939
-3.701604	to come	-0.124939
-2.531271	that come	-0.602060
-3.228937	or come	-0.124939
-3.031620	may come	-0.124939
-2.551387	objects come	-0.124939
-2.367889	they come	-0.124939
-2.011733	automatically come	-0.124939
-2.012565	members come	-0.124939
-3.728566	of statements	-0.124939
-3.229397	if statements	-0.124939
-2.604133	multiple statements	-0.124939
-2.101900	add statements	-0.124939
-0.626758	switch statements	-0.124939
-0.902663	Switch statements	-0.124939
-3.294053	= u;	-0.124939
-2.131994	int u;	-0.602060
-1.595493	} u;	-0.425969
-3.417397	the SSE4.1	-0.425969
-3.689733	to SSE4.1	-0.124939
-3.438006	for SSE4.1	-0.124939
-3.279823	// SSE4.1	-0.124939
-3.154903	with SSE4.1	-0.124939
-2.704783	set SSE4.1	-0.124939
-2.299691	instructions SSE4.1	-0.124939
-1.299775	library, SSE4.1	-0.124939
-0.601746	tmmintrin.h SSE4.1	-0.124939
-3.992830	the chapter	-0.124939
-2.555779	in chapter	-0.249877
-3.104703	This chapter	-0.124939
-2.450362	See chapter	-0.124939
-2.027287	next chapter	-0.124939
-1.641480	previous chapter	-0.124939
-0.601813	loops" chapter	-0.124939
-3.147131	is similar	-0.425969
-3.721976	a similar	-0.124939
-2.950241	and similar	-0.124939
-3.437390	that similar	-0.124939
-2.976928	have similar	-0.124939
-2.900958	A similar	-0.124939
-2.492426	very similar	-0.124939
-2.127684	contains similar	-0.124939
-2.360923	of course	-0.182931
-0.601947	zigzag course	-0.124939
-0.601947	(Of course	-0.124939
-2.556648	and back	-0.726999
-2.250846	result back	-0.124939
-1.829307	go back	-0.124939
-1.679022	priority back	-0.124939
-1.378731	places back	-0.124939
-1.299796	jumps back	-0.124939
-0.601813	dates back	-0.124939
-3.196645	the risk	-0.602060
-3.133982	a risk	-0.124939
-2.771589	but risk	-0.124939
-1.776732	no risk	-0.602060
-1.900754	higher risk	-0.124939
-3.735594	a garbage	-0.124939
-2.556648	and garbage	-0.249877
-3.456591	for garbage	-0.124939
-3.104703	This garbage	-0.124939
-2.449615	called garbage	-0.124939
-1.854868	start garbage	-0.124939
-1.554329	time-consuming garbage	-0.124939
-3.101957	of templates	-0.124939
-3.553263	and templates	-0.124939
-2.496127	with templates	-0.124939
-2.020119	class templates	-0.124939
-2.683192	using templates	-0.124939
-1.996553	Using templates	-0.124939
-1.078440	classes, templates	-0.124939
-3.466191	for buffer	-0.124939
-2.874504	memory buffer	-0.124939
-2.793543	loop buffer	-0.124939
-2.601148	static buffer	-0.124939
-0.689044	target buffer	-0.124939
-0.124915	circular buffer	-0.301030
-3.190443	the header	-0.301030
-3.735594	a header	-0.124939
-3.553263	and header	-0.124939
-3.291371	// header	-0.124939
-2.670016	Intel header	-0.124939
-2.101805	standard header	-0.124939
-1.157868	appropriate header	-0.425969
-3.427907	the future	-0.124939
-3.735594	a future	-0.124939
-3.456591	for future	-0.124939
-3.443548	that future	-0.124939
-2.220603	on future	-0.124939
-3.047489	than future	-0.124939
-1.679514	though future	-0.124939
-2.445703	called whenever	-0.124939
-2.353094	useful whenever	-0.124939
-2.244965	calculations whenever	-0.124939
-2.072283	cycles whenever	-0.124939
-1.975715	zero whenever	-0.124939
-1.979844	cost whenever	-0.124939
-1.899282	declared whenever	-0.124939
-1.554221	mispredicted whenever	-0.124939
-1.299642	And whenever	-0.124939
-0.601712	initiative whenever	-0.124939
-2.530791	by unrolling	-0.425969
-1.851183	loop unrolling	-0.124939
-0.575678	Loop unrolling	-0.221849
-3.689596	of CriticalFunction	-0.124939
-3.689733	to CriticalFunction	-0.124939
-3.050760	int CriticalFunction	-0.124939
-2.556133	* CriticalFunction	-0.124939
-2.556216	version CriticalFunction	-0.124939
-2.284472	times CriticalFunction	-0.124939
-1.433941	supported CriticalFunction	-0.425969
-2.129474	whether CriticalFunction	-0.124939
-1.678111	execute CriticalFunction	-0.124939
-2.852860	to swap	-0.301030
-3.584546	and swap	-0.124939
-2.630953	// swap	-0.425969
-3.099413	not swap	-0.124939
-1.510620	cannot swap	-0.602060
-3.123718	a newer	-0.124939
-3.615467	in newer	-0.124939
-3.453675	The newer	-0.124939
-3.143363	on newer	-0.124939
-2.900958	A newer	-0.124939
-2.080021	all newer	-0.124939
-2.691779	most newer	-0.124939
-2.086184	All newer	-0.124939
-3.433259	the fraction	-0.124939
-2.130628	int fraction	-0.602060
-2.754050	one fraction	-0.124939
-2.258033	large fraction	-0.124939
-2.198177	small fraction	-0.124939
-1.407258	1 fraction	-0.425969
-3.094861	to modify	-0.124939
-2.657766	can modify	-0.124939
-2.591223	or modify	-0.124939
-3.225276	function modify	-0.124939
-2.537976	we modify	-0.124939
-2.459003	cannot modify	-0.124939
-1.996308	don't modify	-0.124939
-3.689596	of seconds	-0.124939
-3.431318	that seconds	-0.124939
-3.039285	than seconds	-0.124939
-2.993009	{ seconds	-0.124939
-2.783407	If seconds	-0.124939
-2.704783	set seconds	-0.124939
-2.254878	while seconds	-0.124939
-1.524583	several seconds	-0.124939
-1.714435	until seconds	-0.124939
-3.476007	for unaligned	-0.124939
-1.152587	store unaligned	-0.602060
-1.997484	Using unaligned	-0.124939
-0.969797	load unaligned	-0.602060
-0.505078	16kB unaligned	-0.425969
-2.962402	this address.	-0.124939
-2.866474	memory address.	-0.124939
-2.816946	different address.	-0.124939
-2.806084	same address.	-0.124939
-2.272748	its address.	-0.124939
-1.224556	load address.	-0.124939
-1.378643	self-relative address.	-0.124939
-1.299462	valid address.	-0.124939
-0.601746	(signed) address.	-0.124939
-2.105121	// Store	-0.522879
-1.642977	SSE2 Store	-0.124939
-0.689122	SSE Store	-0.301030
-3.042146	the sequence	-0.249877
-2.887069	a sequence	-0.301030
-3.635507	in sequence	-0.124939
-3.483916	The sequence	-0.124939
-2.494492	long sequence	-0.124939
-3.433259	the compiler,	-0.124939
-2.671520	Intel compiler,	-0.124939
-1.896551	C++ compiler,	-0.124939
-1.583477	Gnu compiler,	-0.124939
-2.188448	Linux compiler,	-0.124939
-1.364591	both compiler,	-0.124939
-3.763557	is significant	-0.124939
-2.883165	a significant	-0.124939
-3.456591	for significant	-0.124939
-3.088676	not significant	-0.124939
-2.693436	most significant	-0.124939
-0.981876	least significant	-0.124939
-1.300042	seven significant	-0.124939
-2.581140	it might	-0.124939
-3.042718	compiler might	-0.124939
-2.962402	this might	-0.124939
-2.900554	It might	-0.124939
-2.509666	variables might	-0.124939
-2.450991	address might	-0.124939
-2.364203	user might	-0.124939
-1.961282	We might	-0.124939
-0.902462	coprocessor might	-0.124939
-3.037791	the CPU.	-0.124939
-3.715184	of CPU.	-0.124939
-2.670016	Intel CPU.	-0.124939
-2.102783	hardware CPU.	-0.124939
-1.803469	modern CPU.	-0.124939
-1.714030	non-Intel CPU.	-0.124939
-0.902596	GHz CPU.	-0.124939
-2.665535	Intel Vector	-0.124939
-2.323468	bits Vector	-0.124939
-1.772967	variables. Vector	-0.124939
-1.598942	sets. Vector	-0.124939
-0.902395	12.5. Vector	-0.124939
-0.601712	12.4. Vector	-0.124939
-0.601712	lately. Vector	-0.124939
-0.601712	12.1. Vector	-0.124939
-0.601712	12.7. Vector	-0.124939
-0.601712	(ZMM). Vector	-0.124939
-2.839750	the length	-0.903090
-2.834718	The length	-0.425969
-1.961447	string length	-0.124939
-1.600742	row length	-0.124939
-3.601075	and sets.	-0.124939
-2.177181	data sets.	-0.124939
-1.494468	instruction sets.	-0.124939
-2.304960	instructions sets.	-0.124939
-2.387490	a linear	-0.329059
-3.649399	in linear	-0.124939
-1.777133	including linear	-0.124939
-3.750575	is something	-0.124939
-2.781484	that something	-0.425969
-2.847988	has something	-0.124939
-2.003929	do something	-0.425969
-2.114442	doing something	-0.124939
-1.996366	put something	-0.124939
-1.078340	Storing something	-0.124939
-1.078340	certainly something	-0.124939
-3.098758	of f	-0.124939
-2.627133	// f	-0.425969
-2.902387	then f	-0.124939
-2.131020	i++) f	-0.124939
-1.853977	i, f	-0.124939
-0.601779	(float)i; f	-0.124939
-0.601779	float(i); f	-0.124939
-0.601779	f=i; f	-0.124939
-3.749654	a penalty	-0.124939
-3.473600	The penalty	-0.124939
-3.107671	This penalty	-0.124939
-2.717480	no penalty	-0.124939
-1.396010	performance penalty	-0.249877
-0.805881	misprediction penalty	-0.124939
-3.689596	of F1	-0.124939
-3.689733	to F1	-0.124939
-3.431318	that F1	-0.124939
-3.217015	function F1	-0.124939
-2.544482	if F1	-0.124939
-3.184250	by F1	-0.124939
-2.900799	then F1	-0.124939
-2.783407	If F1	-0.124939
-0.601746	returning. F1	-0.124939
-3.776939	is invalid	-0.124939
-2.957836	and invalid	-0.425969
-2.502309	be invalid	-0.124939
-3.169723	with invalid	-0.124939
-1.224626	becomes invalid	-0.124939
-0.601846	violations, invalid	-0.124939
-3.483916	The reasons	-0.124939
-2.207899	for reasons	-0.602060
-1.830172	main reasons	-0.124939
-1.745852	special reasons	-0.124939
-1.379131	security reasons	-0.124939
-3.728566	of setting	-0.124939
-3.568623	and setting	-0.124939
-3.466191	for setting	-0.124939
-3.245810	or setting	-0.124939
-2.000973	by setting	-0.221849
-2.878526	from setting	-0.124939
-3.992830	the module	-0.124939
-3.735594	a module	-0.124939
-2.823297	different module	-0.124939
-1.721672	same module	-0.124939
-2.790297	other module	-0.124939
-2.492242	software module	-0.124939
-1.995573	separate module	-0.124939
-2.598734	the beginning	-0.823909
-3.725720	is within	-0.124939
-2.867610	from within	-0.124939
-2.851235	data within	-0.124939
-2.800137	only within	-0.124939
-2.011506	members within	-0.124939
-1.975715	zero within	-0.124939
-1.598597	statements within	-0.124939
-0.902395	irrelevant within	-0.124939
-0.601712	keys within	-0.124939
-0.601712	obsolete within	-0.124939
-2.638867	is used,	-0.221849
-2.755938	be used,	-0.124939
-2.739473	are used,	-0.124939
-1.300297	ever used,	-0.124939
-3.538428	and checks	-0.124939
-3.437390	that checks	-0.124939
-2.581859	it checks	-0.124939
-2.712908	no checks	-0.124939
-2.580492	such checks	-0.124939
-1.503973	overflow checks	-0.425969
-1.959598	dispatcher checks	-0.124939
-1.202999	explicit checks	-0.124939
-3.245810	or input	-0.124939
-3.151472	on input	-0.124939
-2.454719	as input	-0.124939
-3.065695	an input	-0.124939
-1.258509	user input	-0.249877
-2.334377	file input	-0.124939
-3.405232	are not.	-0.124939
-3.345647	can not.	-0.124939
-1.904829	or not.	-0.367977
-2.782528	which not.	-0.124939
-3.431318	that programmers	-0.124939
-1.867944	many programmers	-0.124939
-2.520823	some programmers	-0.124939
-2.488809	software programmers	-0.124939
-2.259199	assembly programmers	-0.124939
-2.012968	Most programmers	-0.124939
-1.918660	Many programmers	-0.124939
-1.852024	advanced programmers	-0.124939
-0.601746	Application programmers	-0.124939
-3.955422	the alternative	-0.124939
-3.444045	The alternative	-0.124939
-3.220282	if alternative	-0.124939
-2.945063	use alternative	-0.124939
-2.298158	simple alternative	-0.124939
-1.544912	An alternative	-0.124939
-1.678423	Another alternative	-0.124939
-0.601746	little-known alternative	-0.124939
-0.601746	light-weight alternative	-0.124939
-1.640335	instructions. My	-0.124939
-1.378130	this. My	-0.124939
-1.202731	example. My	-0.124939
-1.202731	languages. My	-0.124939
-0.902395	Asmlib My	-0.124939
-0.902395	counters. My	-0.124939
-0.902395	matrix. My	-0.124939
-0.902395	me. My	-0.124939
-0.601712	level. My	-0.124939
-0.601712	identified. My	-0.124939
-3.156800	is organized	-0.124939
-2.345526	be organized	-0.249877
-2.737508	are organized	-0.124939
-3.232478	if organized	-0.124939
-2.374660	registers organized	-0.124939
-0.885090	critical stride	-0.221849
-1.248663	instruction set,	-0.221849
-2.926739	the current	-0.124939
-3.568623	and current	-0.124939
-3.449795	that current	-0.124939
-3.151472	on current	-0.124939
-2.562917	where current	-0.124939
-1.678897	handle current	-0.124939
-3.973723	the 'this'	-0.124939
-3.123718	a 'this'	-0.124939
-3.453675	The 'this'	-0.124939
-3.447199	for 'this'	-0.124939
-2.273861	its 'this'	-0.124939
-1.202999	transferring 'this'	-0.124939
-0.504982	implicit 'this'	-0.425969
-1.078340	references, 'this'	-0.124939
-4.012816	the problem.	-0.124939
-3.749654	a problem.	-0.124939
-1.767983	this problem.	-0.124939
-2.238938	big problem.	-0.124939
-2.152841	another problem.	-0.124939
-1.378931	security problem.	-0.124939
-3.524083	and 3	-0.124939
-2.690254	page 3	-0.124939
-2.545443	takes 3	-0.124939
-2.419734	take 3	-0.124939
-2.073252	program. 3	-0.124939
-1.995002	systems. 3	-0.124939
-0.805803	interrupt 3	-0.425969
-1.378643	14 3	-0.124939
-0.601746	....................................................................................................................... 3	-0.124939
-2.812068	functions counts	-0.124939
-2.775899	which counts	-0.124939
-1.611139	clock counts	-0.124939
-1.803224	profiler counts	-0.124939
-0.856937	subsequent counts	-0.124939
-1.300042	event counts	-0.124939
-1.203132	case" counts	-0.124939
-2.695991	to gain	-0.249877
-2.832383	The gain	-0.124939
-3.110660	This gain	-0.124939
-2.337311	you gain	-0.124939
-2.199103	small gain	-0.124939
-2.779931	other processors,	-0.124939
-2.550832	many processors,	-0.124939
-2.439292	4 processors,	-0.124939
-2.208630	AMD processors,	-0.124939
-1.829046	VIA processors,	-0.124939
-1.713231	non-Intel processors,	-0.124939
-1.599287	future processors,	-0.124939
-1.300335	CISC processors,	-0.124939
-1.202731	older processors,	-0.124939
-0.601712	Atom processors,	-0.124939
-2.246320	can happen	-0.124939
-3.034388	may happen	-0.124939
-2.905823	will happen	-0.124939
-2.850818	program happen	-0.124939
-2.512087	variables happen	-0.124939
-2.410138	often happen	-0.124939
-2.187232	matrix happen	-0.124939
-3.395338	be enough	-0.124939
-3.088676	not enough	-0.124939
-2.849927	has enough	-0.124939
-1.799286	long enough	-0.425969
-1.290420	big enough	-0.301030
-2.197253	small enough	-0.124939
-1.640742	rarely enough	-0.124939
-3.726366	to apply	-0.124939
-2.423346	not apply	-0.425969
-2.352706	may apply	-0.425969
-2.044569	should apply	-0.124939
-2.339275	always apply	-0.124939
-0.748017	rules apply	-0.124939
-2.897987	} Obviously,	-0.124939
-1.772967	variables. Obviously,	-0.124939
-1.712196	object. Obviously,	-0.124939
-1.712540	cycles. Obviously,	-0.124939
-1.677434	needed. Obviously,	-0.124939
-1.598942	dispatching. Obviously,	-0.124939
-1.378476	again. Obviously,	-0.124939
-1.299295	finished. Obviously,	-0.124939
-1.299295	process. Obviously,	-0.124939
-0.902395	framework. Obviously,	-0.124939
-3.128071	code version.	-0.124939
-2.693312	each version.	-0.124939
-2.558203	possible version.	-0.124939
-2.484187	32-bit version.	-0.124939
-2.102026	every version.	-0.124939
-2.074685	intermediate version.	-0.124939
-1.875629	old version.	-0.124939
-1.773655	desired version.	-0.124939
-1.599633	alternative version.	-0.124939
-1.078139	up-to-date version.	-0.124939
-3.955422	the row	-0.124939
-3.708772	a row	-0.124939
-3.608987	in row	-0.124939
-3.248162	= row	-0.124939
-2.870313	from row	-0.124939
-2.012844	each row	-0.124939
-2.441229	0; row	-0.124939
-1.713497	per row	-0.124939
-1.642198	calculating row	-0.124939
-1.482914	C++ Compiler	-0.249877
-1.299963	Mars Compiler	-0.124939
-1.299963	macros Compiler	-0.124939
-0.204077	8.5 Compiler	-0.425969
-0.601846	18.2. Compiler	-0.124939
-0.601846	selected. Compiler	-0.124939
-2.388112	a matter	-1.028029
-1.546144	doesn't matter	-0.124939
-3.417397	the declaration	-0.124939
-3.444045	The declaration	-0.124939
-3.050760	int declaration	-0.124939
-2.700582	class declaration	-0.124939
-2.596747	static declaration	-0.124939
-2.530808	variable declaration	-0.124939
-1.677487	full declaration	-0.124939
-1.554421	available. declaration	-0.124939
-0.902462	"C" declaration	-0.124939
-2.415696	to allocate	-0.271067
-3.104883	not allocate	-0.124939
-2.354282	even allocate	-0.124939
-2.188890	classes allocate	-0.124939
-4.033767	the series	-0.124939
-2.525163	a series	-0.903090
-3.110660	This series	-0.124939
-2.978038	this series	-0.124939
-1.203400	Taylor series	-0.124939
-3.955422	the features	-0.124939
-2.973676	have features	-0.124939
-2.806084	same features	-0.124939
-2.783359	other features	-0.124939
-1.706889	optimization features	-0.124939
-2.362052	new features	-0.124939
-1.852024	advanced features	-0.124939
-1.445277	consuming features	-0.124939
-0.601746	Important features	-0.124939
-2.637879	is added	-0.823909
-3.742373	of added	-0.124939
-2.753979	be added	-0.124939
-2.986831	have added	-0.124939
-2.186862	been added	-0.124939
-2.933502	the user.	-0.124939
-0.733297	end user.	-0.124939
-3.146803	code to:	-0.124939
-1.768228	this to:	-0.823909
-2.177540	optimized to:	-0.124939
-1.714384	reduced to:	-0.124939
-0.981983	changed to:	-0.425969
-2.525163	a waste	-0.903090
-3.584546	and waste	-0.124939
-2.369680	they waste	-0.124939
-2.239962	big waste	-0.124939
-1.714205	total waste	-0.124939
-3.713808	to metaprogramming	-0.124939
-2.905051	A metaprogramming	-0.124939
-2.403324	how metaprogramming	-0.124939
-1.286868	template metaprogramming	-0.124939
-1.959116	better metaprogramming	-0.124939
-1.678285	full metaprogramming	-0.124939
-1.078687	considered metaprogramming	-0.124939
-3.735594	a map	-0.124939
-3.553263	and map	-0.124939
-2.827750	The map	-0.425969
-1.878349	link map	-0.124939
-0.550798	hash map	-0.124939
-1.078440	Generate map	-0.124939
-0.601813	"generate map	-0.124939
-2.697180	to define	-0.124939
-3.345647	can define	-0.124939
-2.220732	// define	-0.124939
-3.042798	may define	-0.124939
-3.278300	it returns.	-0.124939
-1.738361	function returns.	-0.176091
-3.642397	in Windows.	-0.124939
-3.486050	for Windows.	-0.124939
-1.491953	64-bit Windows.	-0.249877
-1.379884	32-bit Windows.	-0.249877
-1.387620	programming style	-0.124939
-0.878046	C style	-0.124939
-1.714564	writing style	-0.124939
-0.601880	x87 style	-0.124939
-0.601880	C- style	-0.124939
-1.873929	// Load	-1.028029
-1.379733	call. Load	-0.124939
-3.270502	= 3;	-0.124939
-3.062260	int 3;	-0.124939
-2.640844	+ 3;	-0.124939
-1.460226	* 3;	-0.425969
-2.163687	/ 3;	-0.124939
-1.679109	% 3;	-0.124939
-2.905720	is approximately	-0.124939
-3.715184	of approximately	-0.124939
-3.456591	for approximately	-0.124939
-3.378690	are approximately	-0.124939
-3.237292	or approximately	-0.124939
-2.421511	take approximately	-0.124939
-2.268997	accessed approximately	-0.124939
-3.108427	of order.	-0.124939
-2.046165	optimal order.	-0.124939
-0.550787	non-sequential order.	-0.301030
-0.805928	random order.	-0.124939
-1.203400	sequential order.	-0.124939
-2.315129	case 3:	-0.124939
-0.772263	manual 3:	-0.970037
-0.902864	Manual 3:	-0.124939
-3.516422	The microarchitecture	-0.124939
-0.042749	"The microarchitecture	-1.028029
-2.909392	is easy	-0.602060
-2.961684	and easy	-0.124939
-3.254499	or easy	-0.124939
-2.030872	no easy	-0.425969
-0.902730	facilities, easy	-0.124939
-3.728566	of situations	-0.124939
-2.556348	in situations	-0.425969
-3.403981	be situations	-0.124939
-3.387359	are situations	-0.124939
-2.585890	also situations	-0.124939
-2.372585	test situations	-0.124939
-2.416319	to implement	-0.367977
-2.189853	classes implement	-0.124939
-1.830927	tested implement	-0.124939
-3.043367	than 65	-0.124939
-0.856940	65 65	-0.124939
-1.202999	......................................................................................... 65	-0.124939
-0.902529	namespaces. 65	-0.124939
-0.601779	Namespaces........................................................................................................... 65	-0.124939
-0.601779	16.4 65	-0.124939
-0.601779	80.8 65	-0.124939
-0.601779	.............................................................................. 65	-0.124939
-3.199780	the chosen	-0.124939
-2.911239	is chosen	-0.124939
-3.421802	be chosen	-0.124939
-2.163784	has chosen	-0.124939
-3.708772	a 256-bit	-0.124939
-3.689596	of 256-bit	-0.124939
-3.689733	to 256-bit	-0.124939
-3.524083	and 256-bit	-0.124939
-3.444045	The 256-bit	-0.124939
-2.747027	one 256-bit	-0.124939
-2.130408	supported 256-bit	-0.124939
-2.011414	allows 256-bit	-0.124939
-1.203178	splitting 256-bit	-0.124939
-2.909392	is slightly	-0.301030
-1.866626	only slightly	-0.301030
-2.117342	run slightly	-0.124939
-0.902730	Sum1 slightly	-0.124939
-0.902730	although slightly	-0.124939
-3.584546	and scattered	-0.124939
-2.753979	be scattered	-0.425969
-2.329076	are scattered	-0.249877
-2.816162	functions scattered	-0.124939
-2.209115	etc. scattered	-0.124939
-3.689733	to contain	-0.124939
-3.431318	that contain	-0.124939
-3.327196	can contain	-0.124939
-3.028870	may contain	-0.124939
-2.733909	should contain	-0.124939
-2.367293	they contain	-0.124939
-2.184107	classes contain	-0.124939
-1.078240	books contain	-0.124939
-0.601746	newsgroups contain	-0.124939
-3.553263	and writes	-0.124939
-2.591223	or writes	-0.124939
-3.260763	it writes	-0.124939
-3.225276	function writes	-0.124939
-2.762677	all writes	-0.124939
-0.944186	nontemporal writes	-0.124939
-1.378731	normal writes	-0.124939
-3.973723	the device	-0.124939
-3.721976	a device	-0.124939
-3.538428	and device	-0.124939
-3.615467	in device	-0.124939
-2.786814	other device	-0.124939
-2.594476	64-bit device	-0.124939
-0.856940	logic device	-0.124939
-1.299629	Critical device	-0.124939
-3.153553	is independent	-0.425969
-3.387359	are independent	-0.124939
-1.679961	OS independent	-0.124939
-1.503870	almost independent	-0.124939
-1.378931	completely independent	-0.124939
-0.124915	position- independent	-0.124939
-1.429466	memory allocation.	-0.124939
-2.333418	dynamic allocation.	-0.124939
-3.133982	a non-static	-0.425969
-3.254499	or non-static	-0.124939
-2.081773	all non-static	-0.124939
-1.591260	any non-static	-0.301030
-2.088761	All non-static	-0.124939
-2.931801	the subsequent	-0.124939
-2.585345	The subsequent	-0.124939
-2.774052	all subsequent	-0.124939
-2.424796	This applies	-0.425969
-2.134886	same applies	-0.425969
-1.642333	also applies	-0.602060
-2.505721	2 applies	-0.124939
-1.300310	advice applies	-0.124939
-2.347821	be applied	-0.726999
-1.723433	when applied	-0.823909
-2.561235	and destructors	-0.124939
-3.179892	with destructors	-0.124939
-1.828229	all destructors	-0.301030
-2.232750	necessary destructors	-0.124939
-3.094861	to integers.	-0.124939
-3.121332	as integers.	-0.124939
-1.905291	64-bit integers.	-0.124939
-2.435260	unsigned integers.	-0.124939
-1.997044	signed integers.	-0.124939
-1.679268	16-bit integers.	-0.124939
-1.678776	containing integers.	-0.124939
-2.210931	in terms	-1.028029
-1.805781	consecutive terms	-0.124939
-3.094861	to help	-0.425969
-3.334482	can help	-0.124939
-2.524131	some help	-0.124939
-2.357649	without help	-0.124939
-2.102538	store help	-0.124939
-0.680994	files, help	-0.425969
-1.299796	remote help	-0.124939
-3.103884	to transfer	-0.124939
-3.494484	The transfer	-0.124939
-0.751710	parameter transfer	-0.221849
-0.601913	Parameter transfer	-0.124939
-1.935862	memory blocks	-0.124939
-2.604133	multiple blocks	-0.124939
-1.545220	big blocks	-0.124939
-1.713659	copying blocks	-0.124939
-0.601846	building blocks	-0.124939
-0.601846	"Moving blocks	-0.124939
-2.177540	optimized away	-0.124939
-1.393839	optimizing away	-0.124939
-0.950237	optimize away	-0.249877
-1.997664	put away	-0.124939
-1.830172	go away	-0.124939
-1.307548	example 15.1b	-0.492916
-1.746831	inlined 15.1b	-0.124939
-1.714985	reduced 15.1b	-0.124939
-4.012816	the low	-0.124939
-3.776939	is low	-0.124939
-2.885112	a low	-0.124939
-2.497193	with low	-0.124939
-2.495429	very low	-0.124939
-0.902663	got low	-0.124939
-3.726366	to multiply	-0.124939
-2.658540	can multiply	-0.124939
-3.297262	// multiply	-0.124939
-2.361917	= multiply	-0.301030
-2.737449	should multiply	-0.124939
-2.459862	cannot multiply	-0.124939
-3.713808	to share	-0.124939
-2.405038	can share	-0.602060
-2.552534	objects share	-0.124939
-2.229636	threads share	-0.124939
-2.013096	members share	-0.124939
-1.876387	usually share	-0.124939
-1.299796	28 share	-0.124939
-2.354446	is enabled.	-0.176091
-1.767554	an explanation	-0.602060
-2.722100	no explanation	-0.124939
-2.338334	following explanation	-0.124939
-1.203533	detailed explanation	-0.124939
-3.776939	is near	-0.124939
-2.768430	used near	-0.124939
-1.345203	stored near	-0.726999
-2.450927	called near	-0.124939
-1.900533	together near	-0.124939
-1.503870	equally near	-0.124939
-2.909392	is provided	-0.301030
-2.486950	are provided	-0.301030
-2.986831	have provided	-0.124939
-1.854996	function, provided	-0.124939
-0.902730	branches, provided	-0.124939
-2.767310	the latter	-0.191886
-2.839426	The latter	-0.425969
-3.361856	are 6	-0.124939
-3.279823	// 6	-0.124939
-3.096824	- 6	-0.124939
-2.073252	program. 6	-0.124939
-1.378330	plus 6	-0.124939
-1.299462	future. 6	-0.124939
-1.078240	........................................................................................... 6	-0.124939
-0.902462	24 6	-0.124939
-0.601746	system......................................................................................... 6	-0.124939
-2.957836	and stores	-0.425969
-2.552539	function stores	-0.124939
-2.847011	vector stores	-0.124939
-1.393726	simply stores	-0.425969
-1.960382	mechanism stores	-0.124939
-0.601846	[ecx+eax*4],ebx stores	-0.124939
-3.264215	it to.	-0.124939
-0.743291	points to.	-0.249877
-1.600993	apply to.	-0.124939
-1.504723	pointed to.	-0.124939
-1.299963	jumps to.	-0.124939
-0.902663	refers to.	-0.124939
-3.433259	the default	-0.124939
-3.749654	a default	-0.124939
-3.726366	to default	-0.124939
-3.297262	// default	-0.124939
-2.275337	by default	-0.124939
-2.909183	A default	-0.124939
-2.325563	bits Instruction	-0.124939
-2.228844	element Instruction	-0.124939
-1.994699	name Instruction	-0.124939
-1.641213	BSD Instruction	-0.124939
-0.747851	follows: Instruction	-0.425969
-1.378810	4. Instruction	-0.124939
-0.601779	"\nError: Instruction	-0.124939
-0.601779	13.1. Instruction	-0.124939
-3.771380	of finding	-0.124939
-2.133541	for finding	-0.367977
-3.064377	than finding	-0.124939
-3.750575	is inefficient.	-0.124939
-3.370192	are inefficient.	-0.124939
-2.492426	very inefficient.	-0.124939
-2.408608	often inefficient.	-0.124939
-2.072220	quite inefficient.	-0.124939
-1.317446	caching inefficient.	-0.124939
-1.919811	becomes inefficient.	-0.124939
-1.599541	course inefficient.	-0.124939
-0.418402	b, c,	-0.550907
-1.879699	0, c,	-0.124939
-3.538428	and search	-0.124939
-3.453675	The search	-0.124939
-3.370192	are search	-0.124939
-2.094087	If search	-0.425969
-2.043086	programs search	-0.124939
-1.959320	string search	-0.124939
-1.900292	improve search	-0.124939
-1.876623	binary search	-0.124939
-2.796647	CPU Modern	-0.124939
-2.126063	operators Modern	-0.124939
-2.058219	optimize Modern	-0.124939
-1.744123	resources. Modern	-0.124939
-1.553796	chains Modern	-0.124939
-1.378643	prediction. Modern	-0.124939
-1.202865	parallel. Modern	-0.124939
-0.601746	mechanisms. Modern	-0.124939
-0.601746	temp2. Modern	-0.124939
-1.776773	memory block.	-0.249877
-2.365496	new block.	-0.124939
-2.199867	allocated block.	-0.124939
-2.028114	next block.	-0.124939
-1.504083	try block.	-0.124939
-0.902663	environment block.	-0.124939
-2.909392	is critical.	-0.301030
-2.753979	be critical.	-0.124939
-3.099413	not critical.	-0.124939
-2.696769	most critical.	-0.124939
-1.048786	particularly critical.	-0.124939
-0.619074	dependency chains	-0.191886
-0.204104	Dependency chains	-0.124939
-4.012816	the time-consuming	-0.124939
-2.939675	more time-consuming	-0.124939
-2.004719	most time-consuming	-0.124939
-1.549817	very time-consuming	-0.124939
-2.073871	quite time-consuming	-0.124939
-1.997231	put time-consuming	-0.124939
-1.890938	different brands	-0.602060
-2.115122	CPU brands	-0.124939
-2.793808	other brands	-0.124939
-2.765493	all brands	-0.124939
-1.679535	Other brands	-0.124939
-1.078541	competing brands	-0.124939
-2.905720	is available.	-0.602060
-3.226337	if available.	-0.124939
-2.584068	also available.	-0.124939
-2.399431	libraries available.	-0.124939
-2.198471	option available.	-0.124939
-1.941526	types available.	-0.124939
-0.902596	became available.	-0.124939
-1.802825	cases. Don't	-0.124939
-1.678165	possible. Don't	-0.124939
-1.445757	threads. Don't	-0.124939
-1.078340	metaprogramming. Don't	-0.124939
-0.204064	14.7 Don't	-0.425969
-0.601779	overkill. Don't	-0.124939
-0.601779	evicted. Don't	-0.124939
-0.601779	opposite: Don't	-0.124939
-3.776939	is brand	-0.124939
-2.974076	this brand	-0.124939
-1.861097	CPU brand	-0.124939
-1.845356	any brand	-0.124939
-2.044330	particular brand	-0.124939
-1.679961	unknown brand	-0.124939
-2.548725	is executed.	-0.124939
-3.414453	are executed.	-0.124939
-1.364877	was executed.	-0.124939
-3.763557	is faster.	-0.124939
-2.492242	software faster.	-0.124939
-2.286120	times faster.	-0.124939
-1.270518	much faster.	-0.124939
-2.166569	calculation faster.	-0.124939
-1.978824	division faster.	-0.124939
-1.678776	execute faster.	-0.124939
-2.767310	the diagonal	-0.191886
-0.944387	below diagonal	-0.124939
-1.856272	int n;	-0.124939
-2.488760	< n;	-0.124939
-0.748012	<= n;	-0.124939
-0.902797	ptr n;	-0.124939
-2.362861	= *p	-0.602060
-3.204574	by *p	-0.124939
-2.313767	{ *p	-0.425969
-1.078821	reload *p	-0.124939
-0.204084	string[100], *p	-0.425969
-3.438678	the logic	-0.124939
-3.483916	The logic	-0.124939
-1.915612	program logic	-0.301030
-0.204084	programmable logic	-0.425969
-0.601880	Programmable logic	-0.124939
-3.422620	the Microsoft,	-0.425969
-3.453675	The Microsoft,	-0.124939
-3.189243	by Microsoft,	-0.124939
-3.159787	with Microsoft,	-0.124939
-2.873034	from Microsoft,	-0.124939
-1.922598	Intel, Microsoft,	-0.124939
-1.802268	platforms. Microsoft,	-0.124939
-1.378530	(i.e. Microsoft,	-0.124939
-3.438678	the hard	-0.124939
-2.730626	a hard	-0.726999
-3.476007	for hard	-0.124939
-2.560214	many hard	-0.124939
-1.504430	fragmented hard	-0.124939
-3.702202	of purposes	-0.124939
-2.820110	different purposes	-0.124939
-2.111747	other purposes	-0.425969
-2.691779	most purposes	-0.124939
-2.216144	common purposes	-0.124939
-1.744554	special purposes	-0.124939
-1.502910	general purposes	-0.124939
-0.601779	educational purposes	-0.124939
-4.012816	the typical	-0.124939
-2.729261	a typical	-0.124939
-3.473600	The typical	-0.124939
-2.909183	A typical	-0.124939
-2.525794	some typical	-0.124939
-0.601846	Four typical	-0.124939
-3.721976	a usability	-0.124939
-3.702202	of usability	-0.124939
-2.950241	and usability	-0.124939
-3.453675	The usability	-0.124939
-3.447199	for usability	-0.124939
-3.370192	are usability	-0.124939
-2.277446	important usability	-0.124939
-0.601779	problems, usability	-0.124939
-3.150330	is pure	-0.124939
-3.127113	a pure	-0.124939
-3.378690	are pure	-0.124939
-2.905051	A pure	-0.124939
-1.678776	containing pure	-0.124939
-1.554083	contain pure	-0.124939
-1.555068	involves pure	-0.124939
-2.695991	to vectorize	-0.425969
-3.584546	and vectorize	-0.124939
-3.099413	not vectorize	-0.124939
-2.224320	will vectorize	-0.124939
-1.997305	don't vectorize	-0.124939
-2.731125	cache problems.	-0.124939
-1.809559	performance problems.	-0.124939
-2.361007	these problems.	-0.124939
-1.712927	alignment problems.	-0.124939
-1.714602	compatibility problems.	-0.124939
-1.078340	technical problems.	-0.124939
-0.601779	Installation problems.	-0.124939
-0.601779	Compatibility problems.	-0.124939
-3.431318	that could	-0.124939
-3.253942	it could	-0.124939
-3.217015	function could	-0.124939
-3.131753	code could	-0.124939
-3.021580	you could	-0.124939
-1.994381	methods could	-0.124939
-1.378643	portability could	-0.124939
-1.202865	N1 could	-0.124939
-0.601746	r+i/2 could	-0.124939
-3.242285	function parameter.	-0.124939
-2.761189	one parameter.	-0.124939
-1.003320	template parameter.	-0.124939
-3.046545	the derived	-0.249877
-2.733370	a derived	-0.425969
-3.618258	and derived	-0.124939
-3.378551	be mentioned	-0.124939
-3.361856	are mentioned	-0.124939
-3.111235	as mentioned	-0.124939
-2.324822	operations mentioned	-0.124939
-2.043909	problems mentioned	-0.124939
-1.994381	methods mentioned	-0.124939
-1.445590	disadvantages mentioned	-0.124939
-0.902462	time-consumers mentioned	-0.124939
-0.902462	ones mentioned	-0.124939
-2.377321	// Time	-0.124939
-2.672658	size Time	-0.124939
-1.599595	user. Time	-0.124939
-0.902596	kilobytes Time	-0.124939
-0.902596	9.6a Time	-0.124939
-0.601813	9.1. Time	-0.124939
-0.601813	9.3. Time	-0.124939
-1.445878	table. Optimization	-0.124939
-0.505022	17 Optimization	-0.425969
-0.204077	"Performance Optimization	-0.425969
-0.204077	8.6 Optimization	-0.425969
-0.601846	"Software Optimization	-0.124939
-0.601846	Architectures Optimization	-0.124939
-1.693973	point expressions.	-0.124939
-2.732115	integer expressions.	-0.124939
-2.303013	simple expressions.	-0.124939
-1.300526	Boolean expressions.	-0.124939
-1.680341	algebraic expressions.	-0.124939
-3.091895	to include	-0.124939
-3.083406	not include	-0.124939
-2.735086	should include	-0.124939
-2.693959	compilers include	-0.124939
-2.012565	sets include	-0.124939
-1.803103	languages include	-0.124939
-1.078340	packages include	-0.124939
-1.078620	Examples include	-0.124939
-2.510282	return y;	-0.124939
-0.567201	x, y;	-0.124939
-0.748080	d, y;	-0.425969
-1.203400	100, y;	-0.124939
-0.601880	1.23456, y;	-0.124939
-4.012816	the overflow.	-0.124939
-3.105180	of overflow.	-0.124939
-2.819828	for overflow.	-0.124939
-2.729748	integer overflow.	-0.124939
-1.493173	cause overflow.	-0.124939
-0.902663	generating overflow.	-0.124939
-1.615304	array element.	-0.124939
-2.219577	single element.	-0.124939
-2.188026	matrix element.	-0.124939
-2.028114	next element.	-0.124939
-1.016692	per element.	-0.124939
-1.078541	pivot element.	-0.124939
-1.216749	object oriented	-0.669007
-1.078842	Object oriented	-0.124939
-0.601947	non-object oriented	-0.124939
-3.973723	the fully	-0.124939
-3.147131	is fully	-0.124939
-3.721976	a fully	-0.124939
-3.701604	to fully	-0.124939
-3.370192	are fully	-0.124939
-3.083406	not fully	-0.124939
-2.873034	from fully	-0.124939
-2.335892	always fully	-0.124939
-3.728566	of storage.	-0.124939
-1.733893	register storage.	-0.124939
-1.996170	separate storage.	-0.124939
-1.503444	temporary storage.	-0.124939
-0.124915	big-endian storage.	-0.124939
-1.078541	endian storage.	-0.124939
-3.715184	of addition,	-0.124939
-3.622045	in addition,	-0.124939
-3.121332	as addition,	-0.124939
-2.373685	than addition,	-0.425969
-2.791918	point addition,	-0.124939
-2.042511	integer addition,	-0.124939
-0.601813	saturated addition,	-0.124939
-3.101957	of everything	-0.425969
-3.443548	that everything	-0.124939
-2.628403	// everything	-0.425969
-2.561734	where everything	-0.124939
-2.347783	sure everything	-0.124939
-2.296358	up everything	-0.124939
-1.445677	eliminate everything	-0.124939
-2.330082	it involves	-0.124939
-3.142992	code involves	-0.124939
-2.974076	this involves	-0.124939
-2.585890	also involves	-0.124939
-1.634161	operations involves	-0.425969
-0.601846	driver involves	-0.124939
-2.899878	} Here	-0.124939
-2.074183	... Here	-0.124939
-1.640634	way. Here	-0.124939
-1.444964	language. Here	-0.124939
-1.202865	double. Here	-0.124939
-0.902462	2" Here	-0.124939
-0.902462	iterations. Here	-0.124939
-0.601746	capabilities. Here	-0.124939
-0.601746	a2/b2; Here	-0.124939
-4.012816	the factorial	-0.124939
-2.383586	int factorial	-0.425969
-2.729748	integer factorial	-0.124939
-1.959958	n factorial	-0.124939
-0.505022	n, factorial	-0.425969
-0.505065	x++) factorial	-0.425969
-3.427907	the OpenMP	-0.425969
-3.194293	by OpenMP	-0.124939
-2.237673	Use OpenMP	-0.124939
-1.803469	Supports OpenMP	-0.124939
-0.680994	processing, OpenMP	-0.425969
-1.078440	directives. OpenMP	-0.124939
-0.601813	107), OpenMP	-0.124939
-2.659744	pointer eax	-0.124939
-2.257221	; eax	-0.124939
-1.503669	array. eax	-0.124939
-0.492806	ebx, eax	-0.301030
-1.445677	eax, eax	-0.124939
-1.378977	edx, eax	-0.124939
-0.902596	compares eax	-0.124939
-1.569095	int bb[],	-1.079181
-2.756627	is mispredicted	-0.726999
-2.231745	be mispredicted	-0.124939
-3.737969	is standardized	-0.124939
-3.708772	a standardized	-0.124939
-3.524083	and standardized	-0.124939
-3.378551	be standardized	-0.124939
-3.111235	as standardized	-0.124939
-3.078198	not standardized	-0.124939
-2.945063	use standardized	-0.124939
-1.553796	fully standardized	-0.124939
-0.902462	non- standardized	-0.124939
-3.689596	of (or	-0.124939
-2.845360	program (or	-0.124939
-2.704783	set (or	-0.124939
-2.615406	library (or	-0.124939
-2.454705	stored (or	-0.124939
-2.335874	SSE2 (or	-0.124939
-2.148730	four (or	-0.124939
-1.959722	char (or	-0.124939
-1.445277	delete (or	-0.124939
-1.109741	optimize across	-0.124939
-1.224797	optimizations across	-0.124939
-1.831442	compatible across	-0.124939
-1.554809	transfer across	-0.124939
-1.554596	standardized across	-0.124939
-0.601846	unchanged across	-0.124939
-1.047310	clock cycle	-0.477121
-1.439426	pointer aliasing	-0.124939
-2.339790	out aliasing	-0.124939
-1.379625	Pointer aliasing	-0.124939
-0.601943	strict aliasing	-0.425969
-1.569095	int aa[],	-1.079181
-2.426034	This tool	-0.124939
-1.149523	test tool	-0.346788
-1.300623	development tool	-0.425969
-4.012816	the parent	-0.124939
-3.130534	a parent	-0.425969
-2.859179	of parent	-0.301030
-3.297262	// parent	-0.124939
-2.604133	multiple parent	-0.124939
-2.059993	both parent	-0.124939
-3.752625	to care	-0.124939
-1.601490	takes care	-0.602060
-1.313887	take care	-0.726999
-1.997805	don't care	-0.124939
-2.696769	most systems,	-0.124939
-1.905864	64-bit systems,	-0.124939
-1.794141	32-bit systems,	-0.124939
-1.741579	operating systems,	-0.124939
-2.030371	Mac systems,	-0.124939
-0.204084	CriticalFunction_386(int parm1,	-0.425969
-0.204084	CriticalFunction_SSE2(int parm1,	-0.425969
-0.204084	CriticalFunction_AVX(int parm1,	-0.425969
-0.601880	CriticalFunctionType(int parm1,	-0.124939
-0.601880	CriticalFunction_Dispatch(int parm1,	-0.124939
-2.907552	is included	-0.301030
-3.387359	are included	-0.124939
-3.094011	not included	-0.124939
-2.400614	libraries included	-0.124939
-1.876984	usually included	-0.124939
-0.902663	license included	-0.124939
-3.127113	a false	-0.124939
-3.456591	for false	-0.124939
-3.395338	be false	-0.124939
-3.262927	= false	-0.124939
-3.237292	or false	-0.124939
-1.958627	&& false	-0.124939
-1.940056	|| false	-0.124939
-3.992830	the value.	-0.124939
-2.812612	same value.	-0.124939
-2.506315	return value.	-0.124939
-2.274977	its value.	-0.124939
-2.165101	calculated value.	-0.124939
-1.713784	maximum value.	-0.124939
-0.944087	previous value.	-0.124939
-1.659735	object file.	-0.301030
-1.282797	source file.	-0.124939
-1.746210	output file.	-0.124939
-1.642373	executable file.	-0.124939
-1.600441	input file.	-0.124939
-3.060361	x *=	-0.124939
-1.998519	y *=	-0.124939
-1.601072	f *=	-0.124939
-0.857085	factorial *=	-0.425969
-1.445677	xn *=	-0.124939
-1.078440	xxn *=	-0.124939
-0.902596	nfac *=	-0.124939
-2.730626	a temporary	-0.124939
-3.742373	of temporary	-0.124939
-3.476007	for temporary	-0.124939
-3.396204	are temporary	-0.124939
-1.078641	inserts temporary	-0.124939
-3.750575	is 12	-0.124939
-2.236622	Use 12	-0.124939
-1.678443	access. 12	-0.124939
-1.554342	approximately 12	-0.124939
-1.445757	2: 12	-0.124939
-1.202999	103 12	-0.124939
-0.902529	10, 12	-0.124939
-0.601779	libraries........................................................................................ 12	-0.124939
-3.427907	the memcpy	-0.124939
-3.713808	to memcpy	-0.124939
-3.553263	and memcpy	-0.124939
-1.078440	0.11 memcpy	-0.124939
-1.078440	0.12 memcpy	-0.124939
-0.601813	Processor memcpy	-0.124939
-0.601813	0.22 memcpy	-0.124939
-4.012816	the procedure	-0.124939
-2.885112	a procedure	-0.602060
-3.728566	of procedure	-0.124939
-3.473600	The procedure	-0.124939
-2.450927	called procedure	-0.124939
-0.601846	ordinary procedure	-0.124939
-3.137458	a PC	-0.124939
-3.642397	in PC	-0.124939
-3.159735	on PC	-0.124939
-0.993018	standard PC	-0.425969
-3.130534	a frequent	-0.124939
-3.473600	The frequent	-0.124939
-2.735551	are frequent	-0.425969
-2.939675	more frequent	-0.124939
-2.787139	If frequent	-0.124939
-2.695099	most frequent	-0.124939
-0.601779	_mm256_i64gather_pd unlimited	-0.124939
-0.601779	_mm_i64gather_pd unlimited	-0.124939
-0.601779	_mm256_i32gather_epi32 unlimited	-0.124939
-0.601779	_mm_i32gather_ps unlimited	-0.124939
-0.601779	_mm256_i64gather_epi32 unlimited	-0.124939
-0.601779	_mm_i32gather_epi32 unlimited	-0.124939
-0.601779	_mm_i64gather_epi32 unlimited	-0.124939
-0.601779	_mm256_i32gather_ps unlimited	-0.124939
-3.438678	the parallelism	-0.425969
-0.380139	fine-grained parallelism	-0.124939
-0.204084	coarse-grained parallelism	-0.124939
-0.601880	Fine-grained parallelism	-0.124939
-0.601880	Coarse-grained parallelism	-0.124939
-1.354401	CPU detection	-0.425969
-3.553263	and c2	-0.124939
-2.844688	vector c2	-0.124939
-2.490152	between c2	-0.124939
-1.300437	__m128i c2	-0.425969
-1.203379	bit-mask: c2	-0.124939
-1.078440	r1; c2	-0.124939
-0.902596	c1; c2	-0.124939
-0.093901	3: "The	-1.028029
-1.714392	adding throw()	-0.124939
-0.550810	throw() throw()	-0.301030
-1.446132	exceptions throw()	-0.124939
-0.249845	empty throw()	-0.124939
-4.033767	the prediction	-0.124939
-3.764183	a prediction	-0.124939
-3.476007	for prediction	-0.124939
-1.379740	branch prediction	-0.124939
-1.854280	advanced prediction	-0.124939
-3.444166	the polymorphic	-0.124939
-2.731996	a polymorphic	-0.124939
-2.442488	call polymorphic	-0.124939
-1.078888	implementing polymorphic	-0.124939
-2.976928	have #if	-0.124939
-2.947783	use #if	-0.124939
-2.901777	} #if	-0.124939
-2.828148	because #if	-0.124939
-2.706651	set #if	-0.124939
-2.397429	code. #if	-0.124939
-1.554063	n; #if	-0.124939
-0.902529	compiled. #if	-0.124939
-3.750575	is now	-0.124939
-3.370192	are now	-0.124939
-3.330824	can now	-0.124939
-1.746507	Assume now	-0.124939
-1.503469	ecx now	-0.124939
-1.445757	body now	-0.124939
-1.203559	ranges now	-0.124939
-0.601779	Borland's now	-0.124939
-2.974076	this unit	-0.124939
-2.962280	time unit	-0.124939
-2.815913	same unit	-0.124939
-2.754050	one unit	-0.124939
-1.016521	processing unit	-0.124939
-0.380126	Execution unit	-0.425969
-1.203826	calling conventions	-0.124939
-0.070574	"Calling conventions	-0.823909
-1.203780	Calling conventions	-0.124939
-3.130534	a register.	-0.124939
-2.160095	vector register.	-0.124939
-2.815913	same register.	-0.124939
-2.754050	one register.	-0.124939
-1.901171	XMM register.	-0.124939
-1.679535	logical register.	-0.124939
-3.133982	a kind	-0.425969
-2.043262	this kind	-0.602060
-2.829742	different kind	-0.124939
-1.979935	what kind	-0.124939
-0.601880	worse kind	-0.124939
-4.012816	the graphical	-0.124939
-2.885112	a graphical	-0.602060
-2.909183	A graphical	-0.124939
-1.899260	own graphical	-0.124939
-1.378931	Several graphical	-0.124939
-0.601846	system-specific graphical	-0.124939
-3.438678	the lower	-0.124939
-3.790747	is lower	-0.124939
-3.133982	a lower	-0.124939
-3.739297	to lower	-0.124939
-2.498263	with lower	-0.124939
-4.033767	the label	-0.124939
-2.704094	each label	-0.124939
-0.692140	unused label	-0.425969
-1.746390	preceding label	-0.124939
-1.078641	$B1$2 label	-0.124939
-3.992830	the iterations	-0.124939
-3.715184	of iterations	-0.124939
-2.936860	more iterations	-0.124939
-2.791596	loop iterations	-0.124939
-2.599358	two iterations	-0.124939
-1.524957	several iterations	-0.124939
-1.997044	mathematical iterations	-0.124939
-3.444166	the misprediction	-0.124939
-3.137458	a misprediction	-0.124939
-3.601075	and misprediction	-0.124939
-1.539358	branch misprediction	-0.124939
-1.859127	an integer,	-0.124939
-2.600118	64-bit integer,	-0.124939
-1.878618	binary integer,	-0.124939
-1.555277	6 integer,	-0.124939
-0.124931	lazy binding	-0.204120
-0.204104	Lazy binding	-0.124939
-3.764183	a just-in-time	-0.124939
-2.961684	and just-in-time	-0.124939
-3.155584	on just-in-time	-0.124939
-2.269217	use just-in-time	-0.124939
-0.204084	interpreters, just-in-time	-0.425969
-3.721976	a try	-0.124939
-3.701604	to try	-0.124939
-3.031620	may try	-0.124939
-2.995338	{ try	-0.124939
-2.903116	will try	-0.124939
-2.902387	then try	-0.124939
-2.712908	no try	-0.124939
-2.536828	we try	-0.124939
-3.433259	the background	-0.124939
-3.749654	a background	-0.124939
-3.728566	of background	-0.124939
-2.819828	for background	-0.124939
-1.503657	heavy background	-0.124939
-0.902663	theoretical background	-0.124939
-2.913095	is converted	-0.301030
-2.347055	be converted	-0.726999
-2.945610	when converted	-0.124939
-1.659905	object pointed	-0.301030
-1.861311	value pointed	-0.425969
-1.846079	variable pointed	-0.425969
-1.642427	target pointed	-0.124939
-3.742373	of CPUs,	-0.124939
-2.797347	other CPUs,	-0.124939
-1.980616	Intel CPUs,	-0.124939
-1.106814	modern CPUs,	-0.425969
-0.380139	multi-core CPUs,	-0.124939
-3.780575	to account	-0.124939
-1.264716	into account	-0.492916
-1.112691	* p)	-0.726999
-4.103464	the chain	-0.124939
-0.619074	dependency chain	-0.124939
-3.553263	and algorithms	-0.124939
-3.463523	The algorithms	-0.124939
-3.147399	on algorithms	-0.124939
-2.143638	different algorithms	-0.124939
-2.217713	These algorithms	-0.124939
-2.011873	complicated algorithms	-0.124939
-1.853150	advanced algorithms	-0.124939
-3.444166	the PLT	-0.124939
-3.779216	a PLT	-0.124939
-2.561235	and PLT	-0.249877
-3.494484	The PLT	-0.124939
-3.427907	the heavy	-0.124939
-3.735594	a heavy	-0.124939
-3.713808	to heavy	-0.124939
-3.164727	with heavy	-0.124939
-3.121332	as heavy	-0.124939
-2.715188	no heavy	-0.124939
-2.524131	some heavy	-0.124939
-3.135467	code once	-0.124939
-3.043367	than once	-0.124939
-2.866951	at once	-0.124939
-2.804296	only once	-0.124939
-2.448307	called once	-0.124939
-2.287788	I once	-0.124939
-0.902529	Compile once	-0.124939
-0.601779	(rebased) once	-0.124939
-4.012816	the additions	-0.124939
-3.105180	of additions	-0.425969
-2.607080	float additions	-0.124939
-1.913200	two additions	-0.124939
-2.151788	four additions	-0.124939
-1.959958	n additions	-0.124939
-2.891008	a hash	-0.124939
-1.821422	A hash	-0.425969
-0.601947	trees, hash	-0.124939
-2.654919	into ecx	-0.124939
-1.564318	; ecx	-0.425969
-1.445677	eax, ecx	-0.124939
-1.445677	is. ecx	-0.124939
-0.601813	stack). ecx	-0.124939
-0.601813	[eax+4], ecx	-0.124939
-0.601813	[eax], ecx	-0.124939
-4.078968	the system.	-0.124939
-1.118636	operating system.	-0.204120
-2.252114	Windows system.	-0.124939
-2.794178	point variables,	-0.124939
-2.601148	static variables,	-0.124939
-2.426104	register variables,	-0.124939
-1.609082	simple variables,	-0.124939
-1.745844	local variables,	-0.124939
-0.681028	Register variables,	-0.124939
-3.163367	is equally	-0.124939
-2.214716	are equally	-0.221849
-2.271787	accessed equally	-0.124939
-1.094643	cases, however,	-0.301030
-1.299963	inefficient, however,	-0.124939
-1.203266	needed, however,	-0.124939
-0.902663	do, however,	-0.124939
-0.601846	accurate, however,	-0.124939
-0.601846	OK, however,	-0.124939
-3.153553	is designed	-0.425969
-3.403981	be designed	-0.124939
-2.735551	are designed	-0.425969
-1.995958	never designed	-0.124939
-1.446091	poorly designed	-0.124939
-0.601846	originally designed	-0.124939
-3.721976	a profiling	-0.124939
-3.538428	and profiling	-0.124939
-3.159787	with profiling	-0.124939
-2.839191	make profiling	-0.124939
-2.820110	different profiling	-0.124939
-1.898067	own profiling	-0.124939
-1.299629	languages, profiling	-0.124939
-0.601779	offering profiling	-0.124939
-3.776939	is fragmented	-0.124939
-3.568623	and fragmented	-0.124939
-3.403981	be fragmented	-0.124939
-2.939675	more fragmented	-0.124939
-1.224626	becomes fragmented	-0.124939
-1.106655	become fragmented	-0.124939
-3.427907	the inputs	-0.124939
-3.463523	The inputs	-0.124939
-2.561734	possible inputs	-0.124939
-1.829307	negative inputs	-0.124939
-1.445924	allowed inputs	-0.124939
-1.203132	mouse inputs	-0.124939
-0.601813	Higher inputs	-0.124939
-3.160071	is fast.	-0.124939
-1.390688	very fast.	-0.124939
-2.075529	quite fast.	-0.124939
-1.504271	equally fast.	-0.124939
-2.986831	have family	-0.124939
-2.115562	CPU family	-0.124939
-2.277217	its family	-0.124939
-0.926371	x86 family	-0.124939
-0.601880	brand, family	-0.124939
-2.612401	= 4,	-0.124939
-1.878044	<< 4,	-0.124939
-1.746908	Pentium 4,	-0.124939
-0.981887	2, 4,	-0.124939
-0.902663	3, 4,	-0.124939
-0.601846	Iss. 4,	-0.124939
-2.629676	// Virtual	-0.124939
-2.814110	functions Virtual	-0.124939
-1.996594	systems. Virtual	-0.124939
-0.380126	7.20 Virtual	-0.425969
-0.902663	pure. Virtual	-0.124939
-0.601846	96). Virtual	-0.124939
-3.715184	of j	-0.124939
-3.164727	with j	-0.124939
-2.312796	{ j	-0.425969
-2.638646	+ j	-0.124939
-2.442486	0; j	-0.124939
-2.030713	replace j	-0.124939
-1.554576	multiply j	-0.124939
-3.433259	the interrupt	-0.425969
-3.466191	for interrupt	-0.124939
-2.386542	an interrupt	-0.124939
-2.239992	An interrupt	-0.124939
-1.995958	never interrupt	-0.124939
-0.902663	drivers, interrupt	-0.124939
-2.363807	= -1	-0.124939
-1.609605	& -1	-0.425969
-1.264261	| -1	-0.425969
-1.379332	^ -1	-0.124939
-3.395338	be 8,	-0.124939
-3.262927	= 8,	-0.124939
-3.237292	or 8,	-0.124939
-3.047489	than 8,	-0.124939
-2.183091	at 8,	-0.425969
-2.668829	double 8,	-0.124939
-1.503177	4, 8,	-0.124939
-0.857262	execution units.	-0.191886
-2.031673	multiplication units.	-0.124939
-3.538428	and who	-0.124939
-2.766409	but who	-0.124939
-2.365123	user who	-0.124939
-1.445477	developers who	-0.124939
-1.299909	And who	-0.124939
-0.601779	those who	-0.124939
-0.601779	Those who	-0.124939
-0.601779	people who	-0.124939
-3.044340	the fastest	-0.249877
-3.160071	is fastest	-0.124939
-3.756635	of fastest	-0.124939
-3.494484	The fastest	-0.124939
-3.237292	or __restrict	-0.124939
-1.873163	* __restrict	-0.124939
-1.959851	keyword __restrict	-0.124939
-1.503669	__restrict __restrict	-0.124939
-0.902596	ivdep __restrict	-0.124939
-0.601813	noalias) __restrict	-0.124939
-0.601813	on) __restrict	-0.124939
-3.061916	an arithmetic	-0.124939
-2.727394	integer arithmetic	-0.124939
-2.692956	do arithmetic	-0.124939
-2.659744	pointer arithmetic	-0.124939
-2.115169	doing arithmetic	-0.124939
-0.681044	Pointer arithmetic	-0.124939
-0.601813	multiplexers, arithmetic	-0.124939
-3.444166	the DLL	-0.124939
-2.889034	a DLL	-0.124939
-2.822591	same DLL	-0.124939
-1.264319	runtime DLL	-0.124939
-3.433259	the factors	-0.124939
-2.826508	different factors	-0.124939
-2.218734	These factors	-0.124939
-1.525144	several factors	-0.425969
-1.679961	unknown factors	-0.124939
-0.601846	limiting factors	-0.124939
-2.930107	the Gnu,	-0.522879
-3.494484	The Gnu,	-0.124939
-3.136933	as Gnu,	-0.124939
-1.555863	Microsoft, Gnu,	-0.124939
-3.433259	the arrays.	-0.124939
-2.258033	large arrays.	-0.124939
-1.641959	initialized arrays.	-0.124939
-1.641534	align arrays.	-0.124939
-1.601206	unaligned arrays.	-0.124939
-0.380126	character arrays.	-0.124939
-3.715184	of devices	-0.124939
-2.581478	such devices	-0.124939
-2.332404	system devices	-0.124939
-1.503768	small devices	-0.124939
-1.554576	logic devices	-0.124939
-1.379470	Common devices	-0.124939
-0.601813	hand-held devices	-0.124939
-3.992830	the branch.	-0.124939
-3.735594	a branch.	-0.124939
-3.715184	of branch.	-0.124939
-3.443548	that branch.	-0.124939
-1.203478	control branch.	-0.124939
-1.641480	previous branch.	-0.124939
-1.078440	wrong branch.	-0.124939
-3.433259	the required	-0.425969
-3.153553	is required	-0.425969
-3.229397	if required	-0.124939
-2.874504	memory required	-0.124939
-1.378931	SSE3 required	-0.124939
-0.902663	previously required	-0.124939
-2.089558	= (unsigned	-0.522879
-1.776248	>= (unsigned	-0.124939
-1.446279	<= (unsigned	-0.124939
-0.601913	operator[] (unsigned	-0.124939
-2.754041	is almost	-0.249877
-2.968477	in almost	-0.425969
-2.565293	where almost	-0.124939
-1.922230	give almost	-0.124939
-3.433259	the GOT	-0.124939
-3.130534	a GOT	-0.124939
-2.953275	use GOT	-0.124939
-1.776019	slow GOT	-0.124939
-0.902663	effect. GOT	-0.124939
-0.601846	suppress. GOT	-0.124939
-3.427907	the array.	-0.124939
-3.061916	an array.	-0.124939
-2.823297	different array.	-0.124939
-2.151982	another array.	-0.124939
-1.600333	linear array.	-0.124939
-1.378731	normal array.	-0.124939
-0.902596	destination array.	-0.124939
-2.214128	are listed	-0.522879
-3.136933	as listed	-0.124939
-2.304960	instructions listed	-0.124939
-1.078741	ReadTSC listed	-0.124939
-3.196645	the general	-0.124939
-3.739297	to general	-0.124939
-2.822024	for general	-0.124939
-2.942510	more general	-0.124939
-1.775456	No general	-0.124939
-3.438678	the preferred	-0.124939
-2.909392	is preferred	-0.301030
-3.483916	The preferred	-0.124939
-3.412800	be preferred	-0.124939
-3.396204	are preferred	-0.124939
-1.103423	clock cycles,	-0.550907
-3.135467	code explicitly	-0.124939
-3.045587	compiler explicitly	-0.124939
-2.856361	data explicitly	-0.124939
-2.043917	space explicitly	-0.124939
-2.044195	dispatching explicitly	-0.124939
-1.828875	reductions explicitly	-0.124939
-1.746507	tell explicitly	-0.124939
-1.712927	alignment explicitly	-0.124939
-2.190667	memory space.	-0.425969
-1.633545	cache space.	-0.124939
-1.878618	storage space.	-0.124939
-1.679630	disk space.	-0.124939
-2.891008	a fixed	-0.124939
-2.969484	and fixed	-0.124939
-2.246757	with fixed	-0.124939
-2.871811	memory Memory	-0.124939
-1.713047	processing Memory	-0.124939
-1.445677	disk. Memory	-0.124939
-0.204071	3.13 Memory	-0.124939
-0.902596	math. Memory	-0.124939
-0.601813	include: Memory	-0.124939
-0.601813	API's. Memory	-0.124939
-3.728566	of zero.	-0.124939
-2.851158	to zero.	-0.124939
-3.387359	are zero.	-0.124939
-2.230430	element zero.	-0.124939
-2.089804	simply zero.	-0.124939
-1.804474	gives zero.	-0.124939
-2.526871	a non-sequential	-0.301030
-3.185067	with non-sequential	-0.124939
-2.346864	access non-sequential	-0.124939
-3.728566	of multiplying	-0.124939
-3.466191	for multiplying	-0.124939
-3.199403	by multiplying	-0.124939
-3.051650	than multiplying	-0.124939
-2.938091	when multiplying	-0.124939
-1.504533	before multiplying	-0.301030
-2.791918	point Conversion	-0.124939
-2.198960	integers Conversion	-0.124939
-2.074265	used. Conversion	-0.124939
-1.300289	conversion Conversion	-0.425969
-1.554329	enabled. Conversion	-0.124939
-1.299796	point. Conversion	-0.124939
-0.601813	%. Conversion	-0.124939
-2.075987	count down	-0.124939
-2.058726	was down	-0.124939
-0.823671	slow down	-0.301030
-1.745844	shift down	-0.124939
-1.379144	break down	-0.124939
-0.601846	shut down	-0.124939
-3.427907	the software.	-0.124939
-3.715184	of software.	-0.124939
-2.028508	application software.	-0.124939
-1.900625	your software.	-0.124939
-1.378731	security software.	-0.124939
-1.078440	23 software.	-0.124939
-0.902596	legacy software.	-0.124939
-4.033767	the interpreted	-0.124939
-2.909392	is interpreted	-0.124939
-3.584546	and interpreted	-0.124939
-2.967008	in interpreted	-0.124939
-3.412800	be interpreted	-0.124939
-3.153553	is exactly	-0.124939
-2.735551	are exactly	-0.124939
-2.983505	have exactly	-0.124939
-2.843452	make exactly	-0.124939
-2.115898	doing exactly	-0.124939
-1.445878	measure exactly	-0.124939
-3.721976	a jump	-0.124939
-3.702202	of jump	-0.124939
-3.437390	that jump	-0.124939
-2.966259	this jump	-0.124939
-2.266645	extra jump	-0.124939
-2.256202	; jump	-0.124939
-2.028400	microprocessor jump	-0.124939
-1.802546	statement jump	-0.124939
-2.913095	is determined	-0.301030
-2.347055	be determined	-0.249877
-2.416315	often determined	-0.124939
-1.625187	int cc[])	-1.028029
-3.154529	code line.	-0.124939
-1.425635	cache line.	-0.124939
-2.190416	matrix line.	-0.124939
-3.421802	be easily	-0.124939
-2.247221	can easily	-0.124939
-3.104883	not easily	-0.124939
-1.765651	cannot easily	-0.425969
-0.920037	type identification	-0.367977
-1.601502	Compiler identification	-0.124939
-3.973723	the vectors.	-0.124939
-3.702202	of vectors.	-0.124939
-3.615467	in vectors.	-0.124939
-2.789669	point vectors.	-0.124939
-2.725053	integer vectors.	-0.124939
-2.653867	into vectors.	-0.124939
-1.712927	adding vectors.	-0.124939
-1.679560	128-bit vectors.	-0.124939
-2.643054	+ 2)	-0.124939
-2.564156	* 2)	-0.124939
-1.249357	+= 2)	-0.602060
-1.775815	>= 2)	-0.124939
-0.204084	((x2) 2)	-0.425969
-2.957277	time applications.	-0.124939
-2.820110	different applications.	-0.124939
-2.759880	all applications.	-0.124939
-2.580492	such applications.	-0.124939
-2.255927	large applications.	-0.124939
-2.247483	Windows applications.	-0.124939
-1.078340	intensive applications.	-0.124939
-0.601779	matical applications.	-0.124939
-2.827750	The volatile	-0.425969
-3.443548	that volatile	-0.124939
-1.959851	keyword volatile	-0.124939
-1.900379	declared volatile	-0.124939
-1.503669	volatile volatile	-0.124939
-0.601813	Explain volatile	-0.124939
-0.601813	dummy[4]; volatile	-0.124939
-1.517326	cache misses	-0.221849
-1.016807	causes misses	-0.425969
-1.446705	Cache misses	-0.124939
-1.009268	lookup tables	-0.124939
-1.776540	produce tables	-0.124939
-0.806033	PLT tables	-0.425969
-0.204091	Lookup tables	-0.124939
-3.735594	a random	-0.124939
-2.964085	in random	-0.425969
-3.456591	for random	-0.124939
-3.194293	by random	-0.124939
-3.047489	than random	-0.124939
-2.936860	more random	-0.124939
-2.869107	at random	-0.124939
-0.358990	OS X	-0.204120
-1.203667	__declspec(align(16)) X	-0.124939
-0.902864	Alignd(X) X	-0.124939
-2.552534	objects Conversions	-0.124939
-2.075244	... Conversions	-0.124939
-2.074265	used. Conversions	-0.124939
-1.996308	conversion Conversions	-0.124939
-1.678776	precision. Conversions	-0.124939
-1.554329	enabled. Conversions	-0.124939
-0.505002	14.8 Conversions	-0.425969
-3.438678	the YMM	-0.124939
-2.961684	and YMM	-0.425969
-3.483916	The YMM	-0.124939
-0.857080	256-bit YMM	-0.124939
-0.902730	named YMM	-0.124939
-2.911239	is resolved	-0.301030
-3.405232	are resolved	-0.124939
-3.104883	not resolved	-0.124939
-1.649755	always resolved	-0.425969
-4.055779	the purpose	-0.124939
-2.426103	The purpose	-0.425969
-2.230423	specific purpose	-0.124939
-1.746285	special purpose	-0.124939
-4.055779	the -fpic	-0.124939
-3.179892	with -fpic	-0.124939
-1.251184	without -fpic	-0.249877
-2.200758	option -fpic	-0.124939
-4.012816	the D	-0.124939
-3.473600	The D	-0.124939
-3.466191	for D	-0.124939
-2.020597	class D	-0.425969
-1.445665	language. D	-0.124939
-0.601846	Yet, D	-0.124939
-3.260763	it had	-0.124939
-3.025075	you had	-0.124939
-2.850818	program had	-0.124939
-2.372615	registers had	-0.124939
-1.804207	columns had	-0.124939
-1.679022	models had	-0.124939
-1.078440	PC's had	-0.124939
-2.727394	integer parameters.	-0.124939
-2.424792	register parameters.	-0.124939
-2.393804	template parameters.	-0.124939
-2.274977	its parameters.	-0.124939
-2.150766	four parameters.	-0.124939
-2.126915	few parameters.	-0.124939
-1.203132	additional parameters.	-0.124939
-2.103085	1 ebx,	-0.124939
-1.407340	add ebx,	-0.124939
-1.879015	r ebx,	-0.124939
-0.857044	eax ebx,	-0.124939
-1.203400	31 ebx,	-0.124939
-3.779216	a measure	-0.124939
-2.697180	to measure	-0.124939
-3.601075	and measure	-0.124939
-3.030371	you measure	-0.124939
-2.909392	is poorly	-0.124939
-3.764183	a poorly	-0.124939
-3.396204	are poorly	-0.124939
-0.601880	original, poorly	-0.124939
-0.601880	perform poorly	-0.124939
-2.701658	do this:	-0.124939
-0.694750	like this:	-0.903090
-4.012816	the sections	-0.124939
-2.861548	data sections	-0.124939
-2.336480	following sections	-0.124939
-2.061264	above sections	-0.124939
-1.554596	subsequent sections	-0.124939
-0.204077	-ffunction- sections	-0.124939
-1.554809	problems. Software	-0.124939
-1.445878	disk. Software	-0.124939
-1.445665	anyway. Software	-0.124939
-0.380126	Architecture Software	-0.425969
-0.601846	rights. Software	-0.124939
-0.601846	swapping. Software	-0.124939
-1.714276	calculations. Even	-0.124939
-1.678531	needed. Even	-0.124939
-1.641480	well. Even	-0.124939
-1.445677	table. Even	-0.124939
-1.378977	4. Even	-0.124939
-0.601813	common. Even	-0.124939
-0.601813	come. Even	-0.124939
-2.869107	at 19	-0.124939
-2.499320	table 19	-0.124939
-1.078440	....................................................................................................... 19	-0.124939
-0.902596	.................................................................................................... 19	-0.124939
-0.902596	160 19	-0.124939
-0.601813	162 19	-0.124939
-0.601813	press. 19	-0.124939
-2.639857	is important.	-0.221849
-2.948234	more important.	-0.124939
-0.902864	increasingly important.	-0.124939
-2.842504	the carry	-0.425969
-3.516422	The carry	-0.124939
-3.108427	of lazy	-0.124939
-2.961684	and lazy	-0.425969
-3.155584	on lazy	-0.124939
-2.231939	But lazy	-0.124939
-1.641834	allow lazy	-0.124939
-3.121332	as xn	-0.124939
-2.605227	float xn	-0.124939
-2.554234	value xn	-0.124939
-2.197740	+= xn	-0.124939
-2.116392	calculate xn	-0.124939
-0.601813	nfac; xn	-0.124939
-0.601813	ex xn	-0.124939
-1.580553	time stamp	-0.669007
-4.012816	the debugging	-0.124939
-2.819828	for debugging	-0.124939
-1.942351	after debugging	-0.124939
-1.804474	off debugging	-0.124939
-1.678684	full debugging	-0.124939
-0.601846	verifying, debugging	-0.124939
-2.617468	= 10;	-0.124939
-1.215629	/ 10;	-0.301030
-0.982045	% 10;	-0.425969
-3.433259	the table.	-0.124939
-2.336480	following table.	-0.124939
-2.222115	virtual table.	-0.124939
-2.061264	above table.	-0.124939
-1.379784	linkage table.	-0.124939
-0.601846	pre-calculated table.	-0.124939
-3.728566	of 1,	-0.124939
-3.270502	= 1,	-0.124939
-3.245810	or 1,	-0.124939
-1.641534	sizes 1,	-0.124939
-1.078541	Volume 1,	-0.124939
-0.204077	{1, 1,	-0.425969
-3.779216	a vector,	-0.124939
-2.862866	of vector,	-0.301030
-2.068270	one vector,	-0.124939
-2.029771	next vector,	-0.124939
-1.851325	if (b)	-0.970037
-4.012816	the object,	-0.124939
-2.815913	same object,	-0.124939
-2.258033	large object,	-0.124939
-2.199867	allocated object,	-0.124939
-1.379869	shared object,	-0.425969
-1.446517	composite object,	-0.124939
-3.790747	is allowed	-0.124939
-3.412800	be allowed	-0.124939
-2.737508	are allowed	-0.124939
-2.424493	not allowed	-0.124939
-2.810609	only allowed	-0.124939
-3.110004	to delete	-0.124939
-2.448683	and delete	-0.221849
-1.829798	pointer. Likewise,	-0.124939
-1.713293	object. Likewise,	-0.124939
-1.599841	sets. Likewise,	-0.124939
-1.554329	overflow. Likewise,	-0.124939
-1.203132	type. Likewise,	-0.124939
-0.902596	false. Likewise,	-0.124939
-0.601813	operand. Likewise,	-0.124939
-1.762520	as follows:	-0.191886
-2.847011	vector simultaneously.	-0.124939
-2.553685	objects simultaneously.	-0.124939
-1.535534	threads simultaneously.	-0.124939
-1.299963	processes simultaneously.	-0.124939
-1.299963	jobs simultaneously.	-0.124939
-0.601846	seemingly simultaneously.	-0.124939
-3.142992	code itself	-0.124939
-3.051383	compiler itself	-0.124939
-2.168943	program itself	-0.124939
-2.029171	application itself	-0.124939
-1.900321	calling itself	-0.124939
-1.555022	device itself	-0.124939
-1.633969	efficient solution.	-0.301030
-1.960377	better solution.	-0.124939
-1.776532	inefficient solution.	-0.124939
-1.445898	reliable solution.	-0.124939
-1.078641	up-to-date solution.	-0.124939
-3.105180	of algebra	-0.124939
-2.847011	vector algebra	-0.124939
-2.794178	point algebra	-0.124939
-2.045812	Integer algebra	-0.124939
-1.997018	Boolean algebra	-0.124939
-1.600567	linear algebra	-0.124939
-2.731996	a suitable	-0.124939
-3.756635	of suitable	-0.124939
-3.104883	not suitable	-0.124939
-2.771180	all suitable	-0.124939
-3.297262	// Template	-0.124939
-1.555107	Windows Template	-0.425969
-1.678897	possible. Template	-0.124939
-0.902663	Standard Template	-0.124939
-0.601846	(Standard Template	-0.124939
-0.601846	Active Template	-0.124939
-3.334482	can spend	-0.124939
-3.088676	not spend	-0.124939
-3.025075	you spend	-0.124939
-2.089239	well spend	-0.124939
-2.043813	programs spend	-0.124939
-1.995328	never spend	-0.124939
-1.958872	applications spend	-0.124939
-1.746831	task switches	-0.124939
-0.187069	context switches	-0.124939
-0.380166	Context switches	-0.124939
-2.854568	to disk.	-0.124939
-2.884088	from disk.	-0.124939
-0.857244	hard disk.	-0.124939
-0.902797	floppy disk.	-0.124939
-3.130534	a serious	-0.425969
-3.387359	are serious	-0.124939
-2.939675	more serious	-0.124939
-2.830383	because serious	-0.124939
-2.695099	most serious	-0.124939
-1.679322	Another serious	-0.124939
-1.619936	* c);	-0.301030
-0.204091	(b, c);	-0.425969
-0.601913	CriticalFunction(b, c);	-0.124939
-0.601913	(*CriticalFunction)(b, c);	-0.124939
-0.321214	Visual Studio	-0.124939
-0.601980	(Visual Studio	-0.124939
-2.386035	int a[100];	-0.124939
-1.504654	float a[100];	-0.425969
-1.855976	i, a[100];	-0.124939
-4.078968	the trick	-0.124939
-2.311241	The trick	-0.221849
-1.746719	special trick	-0.124939
-3.433259	the disadvantages	-0.124939
-3.473600	The disadvantages	-0.124939
-3.387359	are disadvantages	-0.124939
-2.525794	some disadvantages	-0.124939
-2.363818	these disadvantages	-0.124939
-2.336480	following disadvantages	-0.124939
-2.373636	registers eax,	-0.124939
-1.407258	1 eax,	-0.124939
-1.078541	cmp eax,	-0.124939
-0.902663	[esp+8] eax,	-0.124939
-0.601846	PTR[ecx+eax*4],ebx eax,	-0.124939
-0.601846	2:8+esp eax,	-0.124939
-2.911239	is distributed	-0.124939
-3.601075	and distributed	-0.124939
-2.755938	be distributed	-0.425969
-2.402990	libraries distributed	-0.124939
-3.160071	is generally	-0.124939
-3.601075	and generally	-0.124939
-2.739473	are generally	-0.124939
-2.660093	can generally	-0.425969
-1.906151	64-bit mode,	-0.124939
-1.794334	32-bit mode,	-0.124939
-1.740913	bit mode,	-0.124939
-0.902797	exclusive mode,	-0.124939
-2.965566	and Linux.	-0.124939
-2.968477	in Linux.	-0.124939
-2.824232	for Linux.	-0.124939
-2.600118	64-bit Linux.	-0.124939
-2.060471	{ C1	-0.124939
-1.608249	class C1	-0.249877
-3.387359	are instances	-0.124939
-2.081189	all instances	-0.425969
-2.604133	multiple instances	-0.124939
-2.558322	many instances	-0.124939
-2.394826	template instances	-0.124939
-0.601846	renamed instances	-0.124939
-2.640849	is called,	-0.346788
-2.759882	be called,	-0.124939
-3.433259	the update	-0.124939
-3.726366	to update	-0.124939
-3.473600	The update	-0.124939
-3.245810	or update	-0.124939
-3.065695	an update	-0.124939
-2.365496	new update	-0.124939
-3.061512	x <=	-0.124939
-1.923169	i <=	-0.124939
-2.313176	0 <=	-0.124939
-1.959958	n <=	-0.124939
-0.601846	min) <=	-0.124939
-0.601846	1.0 <=	-0.124939
-3.752625	to integer.	-0.124939
-1.975315	an integer.	-0.124939
-2.489920	32-bit integer.	-0.124939
-1.078741	nearest integer.	-0.124939
-3.444166	the body	-0.425969
-3.237970	function body	-0.124939
-1.850960	loop body	-0.124939
-2.278341	its body	-0.124939
-0.709237	hardware definition	-0.367977
-4.033767	the Java	-0.124939
-3.108427	of Java	-0.124939
-3.476007	for Java	-0.124939
-1.535586	best Java	-0.124939
-1.713667	so-called Java	-0.124939
-2.673030	Intel Math	-0.124939
-2.211612	AMD Math	-0.124939
-0.748008	Intel's Math	-0.425969
-1.379491	Core Math	-0.124939
-0.505042	"Intel Math	-0.425969
-1.955438	compiler generates	-0.249877
-1.997805	conversion generates	-0.124939
-1.078741	-128 generates	-0.124939
-0.902797	sampling generates	-0.124939
-3.476007	for executing	-0.124939
-3.204574	by executing	-0.124939
-2.474771	on executing	-0.124939
-1.245152	after executing	-0.124939
-0.601880	speculatively executing	-0.124939
-3.553263	and Open	-0.124939
-1.713293	library. Open	-0.124939
-1.641480	well. Open	-0.124939
-1.078440	2004. Open	-0.124939
-0.902596	connections. Open	-0.124939
-0.601813	Yeppp. Open	-0.124939
-0.601813	mutexes. Open	-0.124939
-3.302196	= 256;	-0.124939
-1.171795	< 256;	-0.602060
-3.728566	of optimizations.	-0.124939
-2.793808	other optimizations.	-0.124939
-2.089168	certain optimizations.	-0.124939
-1.745844	further optimizations.	-0.124939
-0.505022	interprocedural optimizations.	-0.124939
-0.902663	low-level optimizations.	-0.124939
-3.303234	// Cache	-0.124939
-1.900933	together Cache	-0.124939
-0.204084	9.2 Cache	-0.425969
-0.204084	9.10 Cache	-0.425969
-0.902730	9.2. Cache	-0.124939
-2.752029	be slower	-0.124939
-2.400614	libraries slower	-0.124939
-2.221056	much slower	-0.124939
-2.116744	run slower	-0.124939
-1.679109	execute slower	-0.124939
-0.601846	nor slower	-0.124939
-3.763557	is free	-0.124939
-3.553263	and free	-0.124939
-3.456591	for free	-0.124939
-3.088676	not free	-0.124939
-2.751697	one free	-0.124939
-1.829062	my free	-0.124939
-1.554822	could free	-0.124939
-1.864015	time consuming	-0.249877
-0.380166	time- consuming	-0.124939
-0.601947	Time- consuming	-0.124939
-2.858005	to hold	-0.301030
-2.247822	can hold	-0.124939
-4.055779	the memory,	-0.124939
-2.716079	in memory,	-0.124939
-1.504563	allocated memory,	-0.124939
-0.601913	segmented memory,	-0.124939
-1.024790	(see p.	-0.124939
-1.879118	storage p.	-0.124939
-1.379532	above, p.	-0.124939
-1.096146	< SIZE;	-0.492916
-1.884648	this case.	-0.124939
-2.706283	each case.	-0.124939
-2.489920	32-bit case.	-0.124939
-1.642280	either case.	-0.124939
-3.476007	for (	-0.124939
-0.124919	Alignd (	-0.602060
-0.601880	longdoublevalue (	-0.124939
-0.601880	doublevalue (	-0.124939
-0.601880	floatvalue (	-0.124939
-3.403981	be expensive	-0.124939
-2.255371	more expensive	-0.124939
-2.769856	but expensive	-0.124939
-2.520186	so expensive	-0.124939
-2.495429	very expensive	-0.124939
-2.073871	quite expensive	-0.124939
-3.568623	and rounding	-0.124939
-3.051650	than rounding	-0.124939
-2.794178	point rounding	-0.124939
-2.685297	using rounding	-0.124939
-1.799053	between rounding	-0.425969
-2.238727	Use rounding	-0.124939
-1.607485	page 130	-0.726999
-1.078741	129 130	-0.124939
-0.902797	......................................................................... 130	-0.124939
-0.601913	following: 130	-0.124939
-3.763557	is far	-0.124939
-3.735594	a far	-0.124939
-3.553263	and far	-0.124939
-2.089484	values far	-0.124939
-1.959851	keyword far	-0.124939
-1.599841	course far	-0.124939
-0.902596	storage, far	-0.124939
-2.088995	memory. They	-0.124939
-1.774459	variables. They	-0.124939
-1.378977	branches. They	-0.124939
-1.203132	information. They	-0.124939
-0.902596	64-bit. They	-0.124939
-0.601813	smart. They	-0.124939
-0.601813	unreliable. They	-0.124939
-3.715184	of exceptions	-0.124939
-3.456591	for exceptions	-0.124939
-3.226337	if exceptions	-0.124939
-2.683192	using exceptions	-0.124939
-1.078440	throw exceptions	-0.124939
-0.902596	thrown exceptions	-0.124939
-0.902596	Catch exceptions	-0.124939
-3.455353	the system,	-0.124939
-1.210413	operating system,	-0.221849
-3.196645	the absolute	-0.602060
-2.956048	use absolute	-0.124939
-2.719783	no absolute	-0.124939
-2.488960	32-bit absolute	-0.124939
-1.300130	compare absolute	-0.124939
-2.617468	= (a	-0.124939
-2.134664	if (a	-0.124939
-0.601947	MAX(a,b) (a	-0.124939
-3.992830	the machine	-0.124939
-3.715184	of machine	-0.124939
-3.121332	as machine	-0.124939
-2.654919	into machine	-0.124939
-2.221617	virtual machine	-0.124939
-2.126915	few machine	-0.124939
-0.902596	resulting machine	-0.124939
-3.262927	= Induction	-0.124939
-3.058393	int Induction	-0.124939
-2.903685	} Induction	-0.124939
-2.464181	elements Induction	-0.124939
-1.939811	expressions Induction	-0.124939
-1.078440	motion Induction	-0.124939
-0.601813	70 Induction	-0.124939
-3.713808	to 120	-0.124939
-3.553263	and 120	-0.124939
-2.696958	page 120	-0.124939
-1.940790	set. 120	-0.124939
-1.078440	.......................................................................................................... 120	-0.124939
-0.902596	....................................................... 120	-0.124939
-0.601813	memory................................................................. 120	-0.124939
-2.909392	is hardly	-0.124939
-3.396204	are hardly	-0.124939
-3.174778	with hardly	-0.124939
-2.853831	has hardly	-0.124939
-2.059619	was hardly	-0.124939
-2.842504	the CPUID	-0.301030
-2.817016	only CPUID	-0.124939
-4.033767	the saved	-0.124939
-2.503406	be saved	-0.124939
-3.396204	are saved	-0.124939
-2.853831	has saved	-0.124939
-2.059619	was saved	-0.124939
-3.992830	the changes	-0.124939
-3.715184	of changes	-0.124939
-2.558251	version changes	-0.124939
-2.524131	some changes	-0.124939
-1.960096	dispatcher changes	-0.124939
-1.803224	index changes	-0.124939
-1.299796	__fastcall changes	-0.124939
-3.414453	are integers,	-0.124939
-1.906439	64-bit integers,	-0.124939
-1.379959	32-bit integers,	-0.124939
-3.779216	a collection	-0.124939
-2.917566	A collection	-0.124939
-0.488071	garbage collection	-0.124939
-0.601913	Boost collection	-0.124939
-2.400194	optimization manuals	-0.124939
-1.672108	these manuals	-0.124939
-2.336480	programming manuals	-0.124939
-1.679535	Other manuals	-0.124939
-1.554596	subsequent manuals	-0.124939
-1.378931	five manuals	-0.124939
-3.196645	the processor.	-0.301030
-2.673030	Intel processor.	-0.124939
-2.445992	4 processor.	-0.124939
-1.300130	actual processor.	-0.124939
-1.203580	soft processor.	-0.124939
-3.142992	code Shared	-0.124939
-2.393984	time. Shared	-0.124939
-2.188448	Linux Shared	-0.124939
-1.264049	below. Shared	-0.425969
-1.641746	BSD Shared	-0.124939
-1.599928	references. Shared	-0.124939
-3.114994	of storing	-0.124939
-2.574535	for storing	-0.124939
-2.530791	by storing	-0.124939
-3.483916	The developers	-0.124939
-2.845598	make developers	-0.124939
-1.804274	software developers	-0.124939
-2.278998	Some developers	-0.124939
-0.748008	Software developers	-0.124939
-1.689614	int parm2)	-0.669007
-2.962280	time T	-0.124939
-2.905601	} T	-0.124939
-1.617941	type T	-0.124939
-2.223177	a, T	-0.124939
-2.117380	inline T	-0.124939
-0.601846	protected: T	-0.124939
-3.106933	to eliminate	-0.124939
-2.247521	can eliminate	-0.124939
-2.591400	also eliminate	-0.124939
-3.111698	of 2:	-0.124939
-2.314265	case 2:	-0.124939
-1.215807	manual 2:	-0.602060
-1.980368	parameter 2:	-0.124939
-2.731996	a composite	-0.425969
-3.756635	of composite	-0.124939
-2.047169	cases, composite	-0.124939
-1.203533	transferring composite	-0.124939
-3.463523	The profilers	-0.124939
-3.164727	with profilers	-0.124939
-2.277409	Some profilers	-0.124939
-2.217713	These profilers	-0.124939
-1.978334	various profilers	-0.124939
-1.775932	Unfortunately, profilers	-0.124939
-0.902596	third-party profilers	-0.124939
-3.779216	a highly	-0.124939
-3.601075	and highly	-0.124939
-2.329842	are highly	-0.249877
-2.288895	making highly	-0.124939
-3.553263	and again	-0.124939
-2.452769	address again	-0.124939
-1.713538	them again	-0.124939
-1.503423	interpreted again	-0.124939
-1.379470	Reading again	-0.124939
-1.203379	Then again	-0.124939
-0.902596	reused again	-0.124939
-3.713808	to 127	-0.124939
-3.047489	than 127	-0.124939
-2.558251	version 127	-0.124939
-1.445431	127 127	-0.124939
-1.078440	-128 127	-0.124939
-1.078440	2exponent 127	-0.124939
-0.601813	11.8 127	-0.124939
-1.034249	assembly language.	-0.124939
-1.446479	D language.	-0.124939
-1.446705	definition language.	-0.124939
-2.140241	be aware	-0.425969
-0.601980	(be aware	-0.124939
-2.153936	functions. Alternatively,	-0.124939
-1.802488	size. Alternatively,	-0.124939
-1.679022	stack. Alternatively,	-0.124939
-1.599841	returns. Alternatively,	-0.124939
-1.503423	applications. Alternatively,	-0.124939
-1.203379	format. Alternatively,	-0.124939
-1.078440	Windows). Alternatively,	-0.124939
-2.798735	point capabilities	-0.124939
-2.402700	optimization capabilities	-0.124939
-0.634188	out-of-order capabilities	-0.124939
-1.714246	processing capabilities	-0.124939
-2.488760	< 4)	-0.124939
-2.200322	+= 4)	-0.124939
-0.926359	<< 4)	-0.301030
-1.078800	>= 4)	-0.425969
-2.931801	the linker	-0.221849
-3.505314	The linker	-0.124939
-1.600929	compiler, linker	-0.124939
-2.612401	= int64_t	-0.124939
-3.245810	or int64_t	-0.124939
-3.062260	int int64_t	-0.124939
-2.503860	2 int64_t	-0.124939
-2.102322	1 int64_t	-0.124939
-0.601846	263-1 int64_t	-0.124939
-3.108427	of bits.	-0.124939
-1.728049	64 bits.	-0.124939
-2.337941	32 bits.	-0.124939
-2.269319	extra bits.	-0.124939
-1.900754	higher bits.	-0.124939
-4.012816	the measurements	-0.124939
-3.449795	that measurements	-0.124939
-2.276346	time measurements	-0.124939
-2.905580	then measurements	-0.124939
-2.843452	make measurements	-0.124939
-2.525794	some measurements	-0.124939
-4.033767	the representation	-0.124939
-3.483916	The representation	-0.124939
-2.796451	point representation	-0.124939
-2.732115	integer representation	-0.124939
-0.926271	binary representation	-0.124939
-1.856514	int SomeFunction	-0.823909
-2.612687	float SomeFunction	-0.124939
-2.346191	void SomeFunction	-0.124939
-2.850818	program size,	-0.124939
-2.733126	cache size,	-0.124939
-2.186258	line size,	-0.124939
-1.713538	total size,	-0.124939
-1.600087	declaration size,	-0.124939
-1.503669	fixed size,	-0.124939
-1.378977	Type size,	-0.124939
-3.260763	it is.	-0.124939
-2.791596	loop is.	-0.124939
-2.452769	address is.	-0.124939
-1.919363	actually is.	-0.124939
-1.774704	vectorization is.	-0.124939
-1.599595	metaprogramming is.	-0.124939
-1.445924	itself is.	-0.124939
-0.333190	algebra reductions:	-0.124939
-0.124930	(vector) reductions:	-0.124939
-3.156800	is waiting	-0.425969
-3.396204	are waiting	-0.124939
-2.276865	time waiting	-0.425969
-2.413215	often waiting	-0.124939
-2.259089	while waiting	-0.124939
-3.763557	is available,	-0.124939
-3.395338	be available,	-0.124939
-3.378690	are available,	-0.124939
-2.708528	set available,	-0.124939
-2.695362	compilers available,	-0.124939
-2.584068	also available,	-0.124939
-1.299796	currently available,	-0.124939
-3.139213	code automatically.	-0.124939
-2.248649	work automatically.	-0.124939
-2.176439	works automatically.	-0.124939
-1.775686	comes automatically.	-0.124939
-1.746215	vectorized automatically.	-0.124939
-1.713293	alignment automatically.	-0.124939
-1.555315	vectorize automatically.	-0.124939
-3.476007	for powers	-0.124939
-3.396204	are powers	-0.124939
-3.131670	as powers	-0.124939
-1.743428	using powers	-0.602060
-2.251326	avoid powers	-0.124939
-3.130534	a debug	-0.425969
-3.726366	to debug	-0.124939
-2.129074	contains debug	-0.124939
-1.503444	temporary debug	-0.124939
-1.203479	17 debug	-0.124939
-0.601846	Uses debug	-0.124939
-3.466191	for polymorphism	-0.124939
-2.358703	without polymorphism	-0.124939
-1.960170	runtime polymorphism	-0.124939
-1.775382	desired polymorphism	-0.124939
-0.601975	Runtime polymorphism	-0.124939
-0.601846	Compile-time polymorphism	-0.124939
-3.601075	and Clang	-0.124939
-2.834718	The Clang	-0.425969
-1.804131	platforms. Clang	-0.124939
-0.550859	Gnu, Clang	-0.301030
-2.911239	is measured	-0.124939
-2.834718	The measured	-0.124939
-3.421802	be measured	-0.124939
-1.878910	were measured	-0.124939
-3.656516	in details.	-0.124939
-2.209508	for details.	-0.124939
-3.438678	the factor	-0.124939
-3.133982	a factor	-0.124939
-3.483916	The factor	-0.124939
-2.704094	each factor	-0.124939
-1.601520	risk factor	-0.124939
-0.249845	*)d, x);	-0.602060
-0.204091	order(int x);	-0.124939
-0.601913	_mm_hadd_ps(x, x);	-0.124939
-0.601913	x2, x);	-0.124939
-4.033767	the core.	-0.124939
-2.819239	same core.	-0.124939
-2.805170	CPU core.	-0.124939
-2.704094	each core.	-0.124939
-1.290770	processor core.	-0.124939
-3.433259	the rules	-0.425969
-3.473600	The rules	-0.124939
-2.815913	same rules	-0.124939
-2.558322	many rules	-0.124939
-2.089168	certain rules	-0.124939
-0.601846	coding rules	-0.124939
-3.715184	of speed.	-0.124939
-3.456591	for speed.	-0.124939
-3.047489	than speed.	-0.124939
-1.899888	higher speed.	-0.124939
-1.678285	full speed.	-0.124939
-0.601813	real-time speed.	-0.124939
-0.601813	single-thread speed.	-0.124939
-3.766375	to vectorization.	-0.124939
-2.877841	at vectorization.	-0.124939
-0.751742	automatic vectorization.	-0.346788
-2.373636	registers anyway.	-0.124939
-2.142086	support anyway.	-0.124939
-1.961018	needed anyway.	-0.124939
-1.830164	loaded anyway.	-0.124939
-1.679109	true anyway.	-0.124939
-0.204077	restarted anyway.	-0.124939
-2.768474	the smallest	-0.271067
-2.768474	the responsibility	-0.970037
-1.899679	AVX2 Mathematical	-0.124939
-1.078641	manipulation Mathematical	-0.124939
-0.380139	14.10 Mathematical	-0.425969
-0.902730	140). Mathematical	-0.124939
-0.204084	12.7 Mathematical	-0.425969
-2.600118	64-bit MMX	-0.124939
-1.313842	64 MMX	-0.124939
-2.336883	file MMX	-0.124939
-1.203533	older MMX	-0.124939
-3.137458	a reliable	-0.124939
-2.002411	more reliable	-0.124939
-2.698445	most reliable	-0.124939
-2.155001	get reliable	-0.124939
-3.196645	the Borland	-0.124939
-2.673030	Intel Borland	-0.124939
-1.600621	Windows. Borland	-0.124939
-1.379131	CodeGear Borland	-0.124939
-0.601880	2005). Borland	-0.124939
-2.842504	the sense	-0.903090
-2.463553	makes sense	-0.124939
-2.842504	the latest	-0.425969
-3.516422	The latest	-0.124939
-2.629676	// Now	-0.124939
-1.555022	to. Now	-0.124939
-1.378931	above. Now	-0.124939
-0.601846	d); Now	-0.124939
-0.601846	interrupted. Now	-0.124939
-0.601846	(s0+s1)+(s2+s3); Now	-0.124939
-1.024677	execution units	-0.124939
-2.221813	These units	-0.124939
-1.777133	Such units	-0.124939
-2.852860	to do.	-0.124939
-3.099413	not do.	-0.124939
-2.460722	cannot do.	-0.124939
-2.045271	programs do.	-0.124939
-0.601880	event-counters do.	-0.124939
-3.044340	the reciprocal	-0.249877
-3.106562	- reciprocal	-0.124939
-1.078741	approximate reciprocal	-0.124939
-0.601913	xx-xx--x- reciprocal	-0.124939
-1.460864	* d,	-0.726999
-0.602033	c, d,	-0.301030
-1.384109	multiple threads.	-0.221849
-2.495294	between threads.	-0.124939
-0.601947	stopping threads.	-0.124939
-3.992830	the log	-0.124939
-3.553263	and log	-0.124939
-3.463523	The log	-0.124939
-3.262927	= log	-0.124939
-3.237292	or log	-0.124939
-1.920832	requires log	-0.124939
-1.829798	loop. log	-0.124939
-4.055779	the thousand	-0.124939
-2.731996	a thousand	-0.249877
-3.756635	of thousand	-0.124939
-3.642397	in thousand	-0.124939
-3.735594	a compile-time	-0.124939
-3.553263	and compile-time	-0.124939
-3.237292	or compile-time	-0.124939
-3.164727	with compile-time	-0.124939
-2.869107	at compile-time	-0.124939
-2.533875	any compile-time	-0.124939
-2.012606	allows compile-time	-0.124939
-2.697180	to remove	-0.249877
-3.263365	or remove	-0.124939
-3.042798	may remove	-0.124939
-0.902797	add, remove	-0.124939
-3.776939	is Intel's	-0.124939
-3.728566	of Intel's	-0.124939
-3.628723	in Intel's	-0.124939
-3.169723	with Intel's	-0.124939
-1.854776	CPUs. Intel's	-0.124939
-0.380126	overriding Intel's	-0.425969
-2.001951	by 16.	-0.124939
-2.710685	page 16.	-0.124939
-1.203667	modulo 16.	-0.124939
-2.557489	in registers,	-0.124939
-3.159735	on registers,	-0.124939
-1.446132	MMX registers,	-0.124939
-0.601913	restoring registers,	-0.124939
-2.492381	to transpose	-0.301030
-2.444614	call transpose	-0.124939
-2.697180	to wait	-0.726999
-3.601075	and wait	-0.124939
-2.914047	will wait	-0.124939
-2.261748	must wait	-0.124939
-2.797347	other number.	-0.124939
-2.796451	point number.	-0.124939
-2.488960	32-bit number.	-0.124939
-1.997843	signed number.	-0.124939
-1.045377	model number.	-0.124939
-3.444166	the break	-0.124939
-3.103884	to break	-0.124939
-2.914047	will break	-0.124939
-0.902797	press break	-0.124939
-3.749654	a constant.	-0.124939
-3.387359	are constant.	-0.124939
-3.245810	or constant.	-0.124939
-2.815913	same constant.	-0.124939
-2.729748	integer constant.	-0.124939
-2.187393	precision constant.	-0.124939
-0.182924	procedure linkage	-0.602060
-2.018544	if possible,	-0.346788
-3.147653	as possible,	-0.124939
-1.210296	bit scan	-0.346788
-0.601980	(bit scan	-0.124939
-1.741381	bit systems:	-0.124939
-0.567249	16-bit systems:	-0.249877
-2.256541	more predictable	-0.425969
-2.698445	most predictable	-0.124939
-2.405315	how predictable	-0.124939
-0.748041	poorly predictable	-0.124939
-1.181636	<< "Hello	-0.425969
-0.187074	Writes "Hello	-0.425969
-2.911239	is equal	-0.602060
-3.421802	be equal	-0.124939
-3.073352	an equal	-0.124939
-2.189618	therefore equal	-0.124939
-3.483916	The CodeGear	-0.124939
-2.474771	on CodeGear	-0.124939
-2.673030	Intel CodeGear	-0.124939
-2.164711	/ CodeGear	-0.124939
-0.601880	9.0 CodeGear	-0.124939
-3.835016	is compact	-0.124939
-1.727224	more compact	-0.124939
-2.978038	this polynomial	-0.124939
-2.907526	} polynomial	-0.124939
-0.601952	Calculate polynomial	-0.124939
-1.078821	degree polynomial	-0.124939
-0.902730	Vec4f polynomial	-0.124939
-4.012816	the Common	-0.124939
-2.729748	integer Common	-0.124939
-1.855839	2; Common	-0.124939
-1.445665	reductions: Common	-0.124939
-1.299963	elimination Common	-0.124939
-0.601846	Verilog. Common	-0.124939
-3.449795	that reads	-0.124939
-3.245810	or reads	-0.124939
-2.853572	program reads	-0.124939
-1.900533	later reads	-0.124939
-1.601206	unaligned reads	-0.124939
-0.902663	afterwards reads	-0.124939
-2.874504	memory plus	-0.124939
-2.670965	double plus	-0.124939
-2.453660	address plus	-0.124939
-2.294455	constant plus	-0.124939
-2.129497	list plus	-0.124939
-1.503870	label plus	-0.124939
-0.848014	manual 5:	-0.602060
-3.100855	to increase	-0.425969
-3.341893	can increase	-0.124939
-2.460722	cannot increase	-0.124939
-1.920690	actually increase	-0.124939
-0.902730	minor increase	-0.124939
-2.591959	C++ casting	-0.124939
-1.203885	type casting	-0.249877
-1.379645	Type casting	-0.124939
-2.592476	of course,	-0.124939
-0.601980	Of course,	-0.124939
-2.933502	the scope	-0.823909
-3.786643	of scope	-0.124939
-4.033767	the principle	-0.124939
-2.832383	The principle	-0.425969
-3.110660	This principle	-0.124939
-2.978038	this principle	-0.124939
-2.819239	same principle	-0.124939
-3.048761	the throughput	-0.425969
-0.806084	unit throughput	-0.124939
-3.160071	is spent	-0.124939
-2.990184	have spent	-0.124939
-2.277384	time spent	-0.124939
-2.076257	cycles spent	-0.124939
-3.286060	= 16;	-0.124939
-1.470383	/ 16;	-0.124939
-0.981978	% 16;	-0.124939
-1.446279	<= 16;	-0.124939
-3.111698	of Func	-0.124939
-2.277384	time Func	-0.425969
-2.884088	from Func	-0.124939
-2.344998	void Func	-0.124939
-2.493125	to identify	-0.204120
-3.776939	is 15	-0.124939
-3.568623	and 15	-0.124939
-2.871274	at 15	-0.124939
-1.203479	pool. 15	-0.124939
-1.203266	150 15	-0.124939
-0.601846	90. 15	-0.124939
-2.548886	takes 14	-0.124939
-1.996594	systems. 14	-0.124939
-1.713446	optimization. 14	-0.124939
-1.446304	130 14	-0.124939
-0.601846	framework........................................................................... 14	-0.124939
-0.601846	language...................................................... 14	-0.124939
-2.696416	do this.	-0.124939
-1.555510	avoid this.	-0.124939
-2.014518	see this.	-0.124939
-1.379311	avoiding this.	-0.124939
-1.203400	illustrates this.	-0.124939
-2.729748	integer Register	-0.124939
-2.607080	float Register	-0.124939
-2.399984	code. Register	-0.124939
-1.299963	finished. Register	-0.124939
-1.203479	constants. Register	-0.124939
-0.902663	4; Register	-0.124939
-3.794788	a complex	-0.124939
-1.843377	more complex	-0.124939
-2.921819	A complex	-0.124939
-2.399984	code. Intrinsic	-0.124939
-1.995958	name Intrinsic	-0.124939
-1.959958	below. Intrinsic	-0.124939
-1.641534	instructions. Intrinsic	-0.124939
-1.445878	case. Intrinsic	-0.124939
-0.601846	12.3. Intrinsic	-0.124939
-3.752625	to call.	-0.124939
-2.300777	function call.	-0.124939
-2.429319	first call.	-0.124939
-2.105015	every call.	-0.124939
-2.225452	will notice	-0.425969
-1.846781	we notice	-0.425969
-0.505082	Copyright notice	-0.124939
-2.630953	// Add	-0.425969
-2.075771	program. Add	-0.124939
-1.804023	1. Add	-0.124939
-1.714026	library. Add	-0.124939
-1.600441	dispatching. Add	-0.124939
-1.263546	branch prediction.	-0.221849
-1.715127	right prediction.	-0.124939
-4.055779	the expected	-0.124939
-3.805008	is expected	-0.124939
-2.504505	be expected	-0.124939
-3.405232	are expected	-0.124939
-2.854568	to declare	-0.124939
-3.601075	and declare	-0.124939
-3.042798	may declare	-0.124939
-3.030371	you declare	-0.124939
-3.202937	the application.	-0.124939
-1.349727	particular application.	-0.124939
-0.601947	MFC application.	-0.124939
-2.962280	time here.	-0.124939
-2.768430	used here.	-0.124939
-1.855201	appropriate here.	-0.124939
-1.300389	pitfalls here.	-0.124939
-1.078541	odd here.	-0.124939
-0.902663	400 here.	-0.124939
-3.206117	the largest	-0.124939
-0.124930	numerically largest	-0.602060
-4.033767	the dispatched	-0.124939
-3.764183	a dispatched	-0.124939
-3.100855	to dispatched	-0.124939
-3.635507	in dispatched	-0.124939
-2.153702	another dispatched	-0.124939
-1.647693	data members.	-0.124939
-2.716931	class members.	-0.124939
-2.786920	that fits	-0.425969
-3.271200	it fits	-0.124939
-1.282920	what fits	-0.425969
-0.902797	float's fits	-0.124939
-3.102641	- x-xxxx--x	-0.124939
-1.803623	inlining x-xxxx--x	-0.124939
-1.378931	x-xxxx--x x-xxxx--x	-0.124939
-0.902663	a&(b|c) x-xxxx--x	-0.124939
-0.601846	a|(b&c) x-xxxx--x	-0.124939
-0.601846	Loopunrolling x-xxxx--x	-0.124939
-2.822024	for giving	-0.124939
-3.204574	by giving	-0.124939
-1.379491	am giving	-0.124939
-0.902730	By giving	-0.124939
-0.601880	subset, giving	-0.124939
-1.694513	point comparisons	-0.425969
-1.914908	two comparisons	-0.124939
-2.673030	Intel Performance	-0.124939
-2.588600	C++ Performance	-0.124939
-1.752873	4 Performance	-0.425969
-1.203580	"Intel Performance	-0.124939
-0.601880	"Integrated Performance	-0.124939
-1.556156	(see above,	-0.124939
-1.471086	explained above,	-0.124939
-1.300444	16.2 above,	-0.124939
-0.601913	bodies above,	-0.124939
-1.471133	explained above.	-0.124939
-1.747170	given above.	-0.124939
-0.601997	mentioned above.	-0.124939
-1.600141	address. Pointer	-0.124939
-1.300176	propagation Pointer	-0.124939
-0.902663	77 Pointer	-0.124939
-0.902663	rounding. Pointer	-0.124939
-0.601846	sin. Pointer	-0.124939
-0.601846	accessed. Pointer	-0.124939
-3.739297	to detect	-0.124939
-3.341893	can detect	-0.124939
-3.039976	may detect	-0.124939
-2.911288	will detect	-0.124939
-1.317379	automatically detect	-0.124939
-4.012816	the normal	-0.124939
-3.749654	a normal	-0.124939
-3.726366	to normal	-0.124939
-3.568623	and normal	-0.124939
-3.169723	with normal	-0.124939
-3.051650	than normal	-0.124939
-1.940653	compilers. Several	-0.124939
-1.679109	libraries. Several	-0.124939
-1.078541	forums Several	-0.124939
-0.601846	SSE. Several	-0.124939
-0.601846	(OWL). Several	-0.124939
-0.601846	Perl. Several	-0.124939
-3.805008	is convenient	-0.124939
-3.779216	a convenient	-0.124939
-2.755938	be convenient	-0.124939
-2.256541	more convenient	-0.425969
-3.739297	to show	-0.124939
-2.961684	and show	-0.124939
-2.810609	only show	-0.124939
-1.078641	135 show	-0.124939
-1.078821	9.1 show	-0.124939
-2.967008	in column	-0.124939
-3.174778	with column	-0.124939
-2.443747	0; column	-0.124939
-1.300130	swapping column	-0.124939
-0.601880	leftmost column	-0.124939
-0.124934	parm2) {...}	-0.903090
-2.401800	libraries Test	-0.124939
-1.600441	dispatching. Test	-0.124939
-1.446078	disk. Test	-0.124939
-1.300130	mispredictions. Test	-0.124939
-0.204084	13.4 Test	-0.425969
-3.742373	of c1	-0.124939
-3.584546	and c1	-0.124939
-2.021077	class c1	-0.124939
-2.443747	0; c1	-0.124939
-1.078641	r1; c1	-0.124939
-3.294053	= x-	-0.124939
-1.954680	x x-	-0.726999
-1.379985	x- x-	-0.124939
-3.303234	// Number	-0.124939
-1.634332	bits Number	-0.425969
-1.804023	1. Number	-0.124939
-0.902730	column. Number	-0.124939
-0.902730	113 Number	-0.124939
-4.012816	the portability	-0.124939
-3.749654	a portability	-0.124939
-3.728566	of portability	-0.124939
-3.229397	if portability	-0.124939
-2.938091	when portability	-0.124939
-1.078541	efficiency, portability	-0.124939
-3.303234	// SSE3	-0.124939
-1.900216	vectors SSE3	-0.124939
-0.902730	-msse2 SSE3	-0.124939
-0.204084	Suppl. SSE3	-0.124939
-0.601880	emmintrin.h SSE3	-0.124939
-2.583735	to evaluate	-0.124939
-2.346122	always evaluate	-0.124939
-2.968477	in embedded	-0.425969
-3.179892	with embedded	-0.124939
-2.529139	some embedded	-0.124939
-1.504329	small embedded	-0.124939
-2.528592	by Agner	-0.124939
-2.687411	using Agner	-0.124939
-2.673030	Intel Agner	-0.124939
-1.203400	class, Agner	-0.124939
-0.902730	By Agner	-0.124939
-2.933502	the availability	-0.823909
-3.516422	The availability	-0.124939
-2.875933	Example 13.1	-0.124939
-1.750270	example 13.1	-0.124939
-1.300444	122 13.1	-0.124939
-0.902797	loops. 13.1	-0.124939
-3.794788	a reference,	-0.124939
-2.187994	or reference,	-0.124939
-0.601947	non-const reference,	-0.124939
-3.444166	the .NET	-0.124939
-2.834718	The .NET	-0.425969
-1.300297	Basic .NET	-0.124939
-0.601913	Microsoft's .NET	-0.124939
-1.445878	(a !=	-0.124939
-1.203693	z !=	-0.124939
-1.203479	(n !=	-0.124939
-1.203479	(b !=	-0.124939
-0.601846	(handle !=	-0.124939
-0.601846	(*p !=	-0.124939
-2.129595	few files,	-0.124939
-0.944248	resource files,	-0.124939
-1.555570	help files,	-0.124939
-0.505091	configuration files,	-0.124939
-1.133187	references Pointers	-0.124939
-1.679923	thread. Pointers	-0.124939
-1.600742	address. Pointers	-0.124939
-0.204091	7.6 Pointers	-0.425969
-2.377210	than half	-0.124939
-2.877841	at half	-0.124939
-1.867109	only half	-0.124939
-2.822024	for converting	-0.124939
-3.396204	are converting	-0.124939
-2.940583	when converting	-0.124939
-2.454732	before converting	-0.124939
-1.078821	implicitly converting	-0.124939
-2.810609	only occurs	-0.124939
-1.504574	exception occurs	-0.124939
-1.855534	switch occurs	-0.124939
-1.642193	subexpression occurs	-0.124939
-1.504070	interrupt occurs	-0.124939
-2.221230	// Set	-0.249877
-0.601947	7.6. Set	-0.124939
-0.601947	7.5. Set	-0.124939
-3.156800	is costly	-0.124939
-3.396204	are costly	-0.124939
-2.074699	quite costly	-0.124939
-1.300130	relatively costly	-0.124939
-1.078641	extremely costly	-0.124939
-2.933502	the newest	-0.346788
-3.516422	The newest	-0.124939
-3.476007	for specifying	-0.124939
-3.204574	by specifying	-0.124939
-2.359759	without specifying	-0.124939
-1.854996	constructor specifying	-0.124939
-0.505042	C, specifying	-0.425969
-3.456133	that follows	-0.124939
-3.267693	it follows	-0.124939
-2.455821	as follows	-0.124939
-2.662565	pointer follows	-0.124939
-0.601880	closely follows	-0.124939
-3.756635	of comparing	-0.124939
-2.529690	by comparing	-0.124939
-2.376326	than comparing	-0.124939
-0.902797	counter, comparing	-0.124939
-3.776939	is efficient,	-0.124939
-3.568623	and efficient,	-0.124939
-2.939675	more efficient,	-0.124939
-2.769856	but efficient,	-0.124939
-2.374268	pointers efficient,	-0.124939
-2.073871	quite efficient,	-0.124939
-3.742373	of computers	-0.124939
-3.456133	that computers	-0.124939
-2.831504	because computers	-0.124939
-1.804202	modern computers	-0.124939
-0.681061	powerful computers	-0.124939
-4.033767	the B	-0.124939
-3.742373	of B	-0.124939
-2.152812	four B	-0.124939
-1.078821	A, B	-0.124939
-0.204084	1.1, B	-0.425969
-2.402555	code. System	-0.124939
-0.204091	3.8 System	-0.425969
-0.204091	14.13 System	-0.425969
-0.902797	else. System	-0.124939
-2.862866	of five	-0.124939
-3.752625	to five	-0.124939
-2.771180	all five	-0.124939
-1.680069	changed five	-0.124939
-2.704094	each step	-0.124939
-2.220471	single step	-0.124939
-2.028942	next step	-0.124939
-1.203616	second step	-0.124939
-0.601880	91 step	-0.124939
-3.776939	is poor	-0.124939
-3.728566	of poor	-0.124939
-3.726366	to poor	-0.124939
-3.403981	be poor	-0.124939
-2.878526	from poor	-0.124939
-0.601846	Very poor	-0.124939
-3.726366	to prefetch	-0.124939
-3.473600	The prefetch	-0.124939
-3.338172	can prefetch	-0.124939
-2.459862	cannot prefetch	-0.124939
-2.305165	processors prefetch	-0.124939
-2.012992	automatically prefetch	-0.124939
-3.065695	an 9	-0.124939
-2.562136	* 9	-0.124939
-2.491431	between 9	-0.124939
-1.203479	ebx. 9	-0.124939
-1.078754	6, 9	-0.124939
-0.902663	84 9	-0.124939
-3.168158	on deciding	-0.124939
-1.723433	when deciding	-0.823909
-3.764183	a self-relative	-0.124939
-3.742373	of self-relative	-0.124939
-3.476007	for self-relative	-0.124939
-2.117521	calculate self-relative	-0.124939
-1.132999	supports self-relative	-0.124939
-3.286060	= (float	-0.124939
-1.679923	square (float	-0.124939
-0.249845	parabola (float	-0.602060
-0.902797	lrintf (float	-0.124939
-3.794788	a Core	-0.124939
-1.566897	Intel Core	-0.425969
-2.212811	AMD Core	-0.124939
-3.199780	the debugger	-0.124939
-3.779216	a debugger	-0.124939
-3.494484	The debugger	-0.124939
-2.917566	A debugger	-0.124939
-4.078968	the ^	-0.124939
-2.733370	a ^	-0.124939
-1.300690	~a ^	-0.124939
-2.962280	time regardless	-0.124939
-2.815913	same regardless	-0.124939
-2.046236	cases, regardless	-0.124939
-1.503657	false regardless	-0.124939
-1.445878	registers, regardless	-0.124939
-0.601846	name, regardless	-0.124939
-3.728566	of truncation	-0.124939
-3.726366	to truncation	-0.124939
-3.051650	than truncation	-0.124939
-2.953275	use truncation	-0.124939
-2.830383	because truncation	-0.124939
-1.300176	specifies truncation	-0.124939
-4.033767	the base	-0.124939
-3.133982	a base	-0.124939
-3.739297	to base	-0.124939
-2.864164	data base	-0.124939
-1.203400	image base	-0.124939
-4.012816	the result.	-0.124939
-2.815913	same result.	-0.124939
-2.219577	single result.	-0.124939
-2.165797	calculated result.	-0.124939
-1.829740	negative result.	-0.124939
-1.829740	positive result.	-0.124939
-1.804423	1. How	-0.124939
-0.505062	8.1 How	-0.425969
-1.078741	multiplications. How	-0.124939
-0.204091	3.1 How	-0.425969
-0.694801	dependency chain.	-0.124939
-2.343884	access Reading	-0.124939
-2.075140	program. Reading	-0.124939
-1.679109	access. Reading	-0.124939
-1.503870	tables Reading	-0.124939
-1.078541	0x1C. Reading	-0.124939
-0.902663	0x4700. Reading	-0.124939
-4.055779	the compilation	-0.124939
-3.263365	or compilation	-0.124939
-1.922230	requires compilation	-0.124939
-0.550842	just-in-time compilation	-0.301030
-0.455921	hot spots	-0.204120
-3.199780	the behavior	-0.602060
-3.494484	The behavior	-0.124939
-2.201050	overflow behavior	-0.124939
-1.078741	wasteful behavior	-0.124939
-3.107671	This happens	-0.124939
-2.808495	only happens	-0.124939
-2.778097	which happens	-0.124939
-2.028959	typically happens	-0.124939
-1.979502	what happens	-0.124939
-1.555235	everything happens	-0.124939
-4.012816	the 7	-0.124939
-2.871274	at 7	-0.124939
-2.249330	Windows 7	-0.124939
-2.251867	versions 7	-0.124939
-1.078541	25 7	-0.124939
-0.601846	tool. 7	-0.124939
-2.019824	page 87	-0.124939
-1.379311	Func 87	-0.124939
-1.203400	............................................................................................. 87	-0.124939
-1.203400	......................................................................................... 87	-0.124939
-0.601880	................................................................................................... 87	-0.124939
-2.849346	vector Type	-0.124939
-1.446078	table. Type	-0.124939
-1.446078	follows: Type	-0.124939
-0.380139	7.11 Type	-0.124939
-0.902730	safer. Type	-0.124939
-2.144978	different places	-0.124939
-2.229266	specific places	-0.124939
-2.152812	four places	-0.124939
-1.960556	n places	-0.124939
-1.879015	r places	-0.124939
-0.961022	stack unwinding	-0.204120
-3.430994	be static,	-0.124939
-0.849705	keyword static,	-0.726999
-0.902864	Without static,	-0.124939
-0.970002	I am	-0.301030
-2.891008	a leaf	-0.301030
-2.234565	A leaf	-0.425969
-2.495294	between leaf	-0.124939
-3.805008	is evaluated	-0.124939
-2.755938	be evaluated	-0.425969
-2.739473	are evaluated	-0.124939
-3.104883	not evaluated	-0.124939
-3.739297	to completely	-0.124939
-2.753979	be completely	-0.124939
-3.254499	or completely	-0.124939
-3.155584	on completely	-0.124939
-1.680521	fail completely	-0.124939
-3.618258	and again.	-0.124939
-0.488082	back again.	-0.124939
-1.203780	breakpoint again.	-0.124939
-3.756635	of powerful	-0.124939
-2.990184	have powerful	-0.124939
-2.002411	more powerful	-0.301030
-2.075529	quite powerful	-0.124939
-3.199780	the form	-0.602060
-2.800916	other form	-0.124939
-1.901187	numbers form	-0.124939
-1.878618	binary form	-0.124939
-3.790747	is deallocated	-0.124939
-2.961684	and deallocated	-0.425969
-3.396204	are deallocated	-0.124939
-2.587719	also deallocated	-0.124939
-2.013623	automatically deallocated	-0.124939
-1.960020	three times.	-0.124939
-1.776892	response times.	-0.124939
-1.379131	five times.	-0.124939
-1.300310	hundred times.	-0.124939
-0.505042	inconvenient times.	-0.124939
-3.635507	in 32-	-0.124939
-3.483916	The 32-	-0.124939
-2.822024	for 32-	-0.425969
-2.913354	A 32-	-0.124939
-1.804202	Supports 32-	-0.124939
-2.965566	and edx	-0.124939
-3.642397	in edx	-0.124939
-3.104883	not edx	-0.124939
-1.564939	; edx	-0.425969
-1.510813	cannot rule	-0.602060
-0.857174	aliasing rule	-0.425969
-1.379532	completely rule	-0.124939
-3.642397	in one.	-0.124939
-1.672916	new one.	-0.124939
-1.048925	preceding one.	-0.124939
-1.642280	previous one.	-0.124939
-3.805008	is permissible	-0.124939
-3.421802	be permissible	-0.124939
-3.405232	are permissible	-0.124939
-2.172620	not permissible	-0.301030
-3.048761	the worst	-0.425969
-2.839426	The worst	-0.124939
-3.046545	the job	-0.124939
-2.232482	best job	-0.124939
-1.504245	background job	-0.124939
-2.696770	compilers due	-0.124939
-1.900321	higher due	-0.124939
-1.641959	delay due	-0.124939
-1.600354	future due	-0.124939
-0.902663	unstable due	-0.124939
-0.902663	differences due	-0.124939
-2.090567	= 1.0;	-0.124939
-2.516301	return 1.0;	-0.124939
-3.449795	that depend	-0.124939
-2.737449	should depend	-0.124939
-2.240203	doesn't depend	-0.124939
-1.996806	don't depend	-0.124939
-1.996170	methods depend	-0.124939
-1.679535	details depend	-0.124939
-3.046545	the biggest	-0.249877
-3.505314	The biggest	-0.124939
-1.715098	Define biggest	-0.124939
-1.997221	name ?Func@@YAXQAHAAH@Z	-0.124939
-0.681095	?Func@@YAXQAHAAH@Z ?Func@@YAXQAHAAH@Z	-0.124939
-1.203680	ALIGN ?Func@@YAXQAHAAH@Z	-0.124939
-0.204091	PUBLIC ?Func@@YAXQAHAAH@Z	-0.425969
-3.267693	it defines	-0.124939
-1.535514	language defines	-0.124939
-1.997843	__m128i defines	-0.124939
-0.902730	__m128 defines	-0.124939
-0.601880	__m128d defines	-0.124939
-2.014234	not overlap.	-0.124939
-2.658364	b overlap.	-0.124939
-1.504358	now overlap.	-0.124939
-0.944122	parallel processing,	-0.425969
-1.203580	video processing,	-0.124939
-1.203400	image processing,	-0.124939
-1.078641	signal processing,	-0.124939
-0.902730	sound processing,	-0.124939
-1.027949	void SelectAddMul(short	-0.903090
-4.033767	the users	-0.124939
-1.869487	many users	-0.124939
-2.495702	software users	-0.124939
-1.962169	end users	-0.124939
-1.830352	computer users	-0.124939
-3.584546	and soon	-0.124939
-3.131670	as soon	-0.124939
-2.224320	will soon	-0.124939
-1.300130	Basic soon	-0.124939
-1.203400	As soon	-0.124939
-3.764183	a six	-0.124939
-2.810609	only six	-0.124939
-2.550040	takes six	-0.124939
-2.428129	first six	-0.124939
-0.857080	approximately six	-0.124939
-1.642229	16 Testing	-0.124939
-2.231760	speed Testing	-0.124939
-0.601880	14.7b. Testing	-0.124939
-0.601880	14.7a. Testing	-0.124939
-0.601880	rare. Testing	-0.124939
-1.257507	In general,	-0.301030
-2.583735	to roll	-0.823909
-2.543761	we roll	-0.124939
-2.814110	functions (i.e.	-0.124939
-2.520186	so (i.e.	-0.124939
-2.513302	variables (i.e.	-0.124939
-2.503860	2 (i.e.	-0.124939
-2.076412	addresses (i.e.	-0.124939
-1.600354	module (i.e.	-0.124939
-3.568623	and edx,	-0.124939
-3.628723	in edx,	-0.124939
-2.438693	8 edx,	-0.124939
-1.554596	eax edx,	-0.124939
-1.445878	eax, edx,	-0.124939
-1.078541	ecx, edx,	-0.124939
-3.108427	of C++,	-0.124939
-2.575123	In C++,	-0.124939
-1.641654	e.g. C++,	-0.124939
-1.203580	C, C++,	-0.124939
-0.601880	managed C++,	-0.124939
-1.332769	+ i);	-0.903090
-3.742373	of mixed	-0.124939
-3.412800	be mixed	-0.124939
-2.302478	have mixed	-0.124939
-2.913354	A mixed	-0.124939
-0.902730	integration, mixed	-0.124939
-3.238707	if protection	-0.124939
-2.804514	other protection	-0.124939
-0.979374	copy protection	-0.425969
-2.105853	loop counter.	-0.124939
-2.044455	integer counter.	-0.124939
-0.748143	stamp counter.	-0.124939
-4.055779	the structure.	-0.124939
-3.137458	a structure.	-0.124939
-2.596978	or structure.	-0.124939
-2.859135	program structure.	-0.124939
-3.805008	is 4.	-0.124939
-0.793848	Pentium 4.	-0.301030
-0.601913	Linux) 4.	-0.124939
-0.601913	makers. 4.	-0.124939
-3.466191	for security	-0.124939
-2.562917	where security	-0.124939
-2.152841	another security	-0.124939
-2.061264	above security	-0.124939
-0.601846	compelling security	-0.124939
-0.601846	party security	-0.124939
-3.728566	of branches.	-0.124939
-3.568623	and branches.	-0.124939
-2.793808	other branches.	-0.124939
-2.558322	many branches.	-0.124939
-1.959323	three branches.	-0.124939
-1.078541	nearby branches.	-0.124939
-2.153475	128 Is16vec8	-0.124939
-1.997018	c; Is16vec8	-0.124939
-1.203479	c: Is16vec8	-0.124939
-1.203479	b: Is16vec8	-0.124939
-1.078754	(0,0,0,0,0,0,0,0) Is16vec8	-0.124939
-1.078754	(2,2,2,2,2,2,2,2) Is16vec8	-0.124939
-1.702386	CPU cores.	-0.249877
-2.609161	multiple cores.	-0.124939
-2.242691	processor cores.	-0.124939
-3.742373	of communication	-0.124939
-2.822024	for communication	-0.124939
-3.456133	that communication	-0.124939
-2.831504	because communication	-0.124939
-2.232118	necessary communication	-0.124939
-2.826452	for avoiding	-0.124939
-2.277178	by avoiding	-0.124939
-2.344400	always avoiding	-0.124939
-3.726366	to anything	-0.124939
-3.151472	on anything	-0.124939
-3.051650	than anything	-0.124939
-2.060205	optimize anything	-0.124939
-1.980778	cost anything	-0.124939
-1.078541	alias anything	-0.124939
-3.309290	// INSTRSET	-0.124939
-1.714246	macro INSTRSET	-0.124939
-0.806033	#if INSTRSET	-0.425969
-0.204091	#elif INSTRSET	-0.425969
-2.343884	access Accessing	-0.124939
-1.830164	pointer. Accessing	-0.124939
-1.379144	again. Accessing	-0.124939
-1.203266	structures. Accessing	-0.124939
-1.078541	compact. Accessing	-0.124939
-0.601846	Efficiency Accessing	-0.124939
-3.601075	and internal	-0.124939
-2.573291	for internal	-0.602060
-3.179892	with internal	-0.124939
-2.345869	access internal	-0.124939
-2.276563	by type-casting	-0.301030
-2.943090	when type-casting	-0.124939
-1.554984	style type-casting	-0.124939
-1.203533	C-style type-casting	-0.124939
-3.199780	the requirements	-0.602060
-2.220785	These requirements	-0.124939
-1.805008	off requirements	-0.124939
-1.714392	alignment requirements	-0.124939
-4.055779	the profiler.	-0.124939
-3.137458	a profiler.	-0.124939
-2.230423	specific profiler.	-0.124939
-0.601913	ready-made profiler.	-0.124939
-4.033767	the __fastcall	-0.124939
-3.233697	function __fastcall	-0.124939
-1.960914	keyword __fastcall	-0.124939
-0.601880	fastcall)) __fastcall	-0.124939
-0.601880	calling. __fastcall	-0.124939
-3.764183	a loss	-0.124939
-3.584546	and loss	-0.124939
-3.254499	or loss	-0.124939
-2.537271	any loss	-0.124939
-2.277395	about loss	-0.124939
-2.797347	other cleanup	-0.124939
-2.768328	all cleanup	-0.124939
-2.232118	necessary cleanup	-0.124939
-2.013623	handling cleanup	-0.124939
-1.855714	require cleanup	-0.124939
-0.505082	9.3 Functions	-0.425969
-0.380166	7.14 Functions	-0.124939
-0.601947	10.1.020. Functions	-0.124939
-2.289871	error handling.	-0.124939
-1.090079	exception handling.	-0.124939
-2.965566	and Fortran	-0.124939
-3.642397	in Fortran	-0.124939
-0.601913	versatile. Fortran	-0.124939
-0.601913	Pascal, Fortran	-0.124939
-4.055779	the increment	-0.124939
-3.103884	to increment	-0.124939
-2.797465	loop increment	-0.124939
-2.278486	about increment	-0.124939
-3.636150	and drivers	-0.124939
-0.442322	device drivers	-0.249877
-2.699570	to economize	-0.425969
-3.636150	and economize	-0.124939
-2.395138	time. Templates	-0.124939
-1.554684	parameter. Templates	-0.124939
-1.446438	10; Templates	-0.124939
-1.078641	7.28 Templates	-0.124939
-0.601880	57 Templates	-0.124939
-0.902886	row 28	-0.124939
-0.681128	column 28	-0.124939
-0.902864	systems). 28	-0.124939
-3.752625	to seven	-0.124939
-2.475633	on seven	-0.425969
-2.189763	cause seven	-0.124939
-1.555277	approximately seven	-0.124939
-3.430994	be turned	-0.124939
-2.854055	vector turned	-0.124939
-0.926395	options turned	-0.301030
-3.771380	of inheritance	-0.124939
-1.660014	multiple inheritance	-0.124939
-1.300464	Multiple inheritance	-0.124939
-2.699570	to overcome	-0.249877
-3.440385	be overcome	-0.124939
-2.699570	to maintain.	-0.124939
-3.636150	and maintain.	-0.124939
-2.858005	to fourteen	-0.301030
-2.973437	and fourteen	-0.124939
-3.752625	to 122	-0.124939
-2.020535	page 122	-0.425969
-0.601913	strategies........................................................................................ 122	-0.124939
-0.601913	sets........................... 122	-0.124939
-1.864215	time consuming.	-0.249877
-1.079022	time- consuming.	-0.124939
-4.033767	the method.	-0.124939
-2.978038	this method.	-0.124939
-2.441429	call method.	-0.124939
-2.395850	template method.	-0.124939
-2.303013	simple method.	-0.124939
-3.111698	of backwards	-0.124939
-3.104883	not backwards	-0.124939
-2.271088	accessed backwards	-0.124939
-1.078888	track backwards	-0.124939
-4.033767	the remote	-0.124939
-3.764183	a remote	-0.124939
-3.739297	to remote	-0.124939
-3.155584	on remote	-0.124939
-0.601880	updates, remote	-0.124939
-3.136933	as int,	-0.124939
-3.073352	an int,	-0.124939
-2.507590	2 int,	-0.124939
-1.618888	short int,	-0.124939
-3.601075	and bc	-0.124939
-2.851694	vector bc	-0.124939
-1.300678	__m128i bc	-0.425969
-1.203680	bit-mask: bc	-0.124939
-1.045598	development tools.	-0.124939
-1.107051	installation tools.	-0.124939
-2.068757	one operation.	-0.124939
-2.222263	single operation.	-0.124939
-1.048969	shift operation.	-0.124939
-3.048761	the future.	-0.124939
-0.601980	distant future.	-0.124939
-4.033767	the swapping	-0.124939
-3.396204	are swapping	-0.124939
-2.940583	when swapping	-0.124939
-2.877215	memory swapping	-0.124939
-1.503890	Memory swapping	-0.124939
-4.103464	the AVX512	-0.124939
-0.766809	512 AVX512	-0.124939
-2.734749	a considerable	-0.124939
-2.926114	A considerable	-0.124939
-4.033767	the memset	-0.124939
-3.742373	of memset	-0.124939
-3.739297	to memset	-0.124939
-3.456133	that memset	-0.124939
-2.816162	functions memset	-0.124939
-2.935209	the rest	-0.823909
-1.866176	version on,	-0.425969
-1.980802	running on,	-0.124939
-0.601969	turned on,	-0.124939
-2.687411	using Agner's	-0.124939
-2.187930	classes Agner's	-0.124939
-0.902730	107). Agner's	-0.124939
-0.601880	-mveclibabi=acml. Agner's	-0.124939
-0.601880	amd_vrd2_exp Agner's	-0.124939
-4.033767	the Digital	-0.124939
-3.584546	and Digital	-0.124939
-1.203400	Watcom Digital	-0.124939
-0.601880	2008. Digital	-0.124939
-0.601880	intrinsics. Digital	-0.124939
-4.033767	the third	-0.124939
-3.764183	a third	-0.124939
-3.483916	The third	-0.124939
-3.055851	than third	-0.124939
-0.601880	59 third	-0.124939
-2.105884	// Roll	-0.823909
-2.632234	// Critical	-0.124939
-1.078741	only. Critical	-0.124939
-1.078741	brand. Critical	-0.124939
-1.078741	C++. Critical	-0.124939
-0.149757	5: "Calling	-0.823909
-4.055779	the CISC	-0.124939
-3.601075	and CISC	-0.124939
-2.834718	The CISC	-0.425969
-3.179892	with CISC	-0.124939
-3.584546	and 22	-0.124939
-1.203400	....................................................................................... 22	-0.124939
-0.902730	................................................................................................ 22	-0.124939
-0.601880	switches..................................................................................................... 22	-0.124939
-0.601880	access....................................................................................................... 22	-0.124939
-3.494484	The AND	-0.124939
-2.632234	// AND	-0.425969
-2.605607	two AND	-0.124939
-1.680801	bitwise AND	-0.124939
-3.455353	the effort	-0.425969
-1.453661	optimization effort	-0.301030
-2.107867	point numbers.	-0.124939
-1.830606	negative numbers.	-0.124939
-1.446279	thousand numbers.	-0.124939
-0.902797	denormal numbers.	-0.124939
-2.942510	more popular	-0.124939
-2.913354	A popular	-0.124939
-2.696769	most popular	-0.124939
-2.441785	less popular	-0.124939
-1.078641	One popular	-0.124939
-3.315431	// SIZE	-0.124939
-2.131994	int SIZE	-0.602060
-1.961417	&& SIZE	-0.124939
-1.203533	(RTTI) Runtime	-0.124939
-0.380153	7.21 Runtime	-0.425969
-0.601913	7.43a. Runtime	-0.124939
-0.601913	73. Runtime	-0.124939
-2.337406	programming principles	-0.124939
-1.878119	storage principles	-0.124939
-1.854280	advanced principles	-0.124939
-1.830172	main principles	-0.124939
-0.601880	engineering principles	-0.124939
-3.111698	of context	-0.425969
-3.494484	The context	-0.124939
-2.917566	A context	-0.124939
-0.601913	Frequent context	-0.124939
-3.233697	function names.	-0.124939
-2.537448	variable names.	-0.124939
-2.261586	assembly names.	-0.124939
-2.219403	common names.	-0.124939
-0.601880	identifier names.	-0.124939
-3.742373	of reducing	-0.124939
-3.476007	for reducing	-0.124939
-2.873452	at reducing	-0.124939
-2.359759	without reducing	-0.124939
-1.920690	actually reducing	-0.124939
-2.660871	can benefit	-0.425969
-2.916823	will benefit	-0.124939
-0.857197	could benefit	-0.425969
-2.755938	be worth	-0.425969
-1.641842	rarely worth	-0.124939
-1.601034	alternative worth	-0.124939
-1.446572	hardly worth	-0.124939
-3.057257	compiler manual.	-0.124939
-2.297377	this manual.	-0.124939
-2.429319	first manual.	-0.124939
-1.300444	present manual.	-0.124939
-3.462566	that specifies	-0.124939
-2.497443	software specifies	-0.124939
-1.407595	standard specifies	-0.124939
-1.961447	keyword specifies	-0.124939
-3.601075	and searching	-0.124939
-2.967341	time searching	-0.124939
-1.961447	string searching	-0.124939
-0.601972	Is searching	-0.425969
-2.279358	stack versus	-0.124939
-1.379625	Pointers versus	-0.124939
-0.601972	Static versus	-0.425969
-1.078741	Signed versus	-0.124939
-1.186897	constant propagation	-0.249877
-1.203961	Constant propagation	-0.124939
-3.449723	the reduction	-0.124939
-2.046622	particular reduction	-0.124939
-0.380166	Algebraic reduction	-0.124939
-2.739185	cache effects	-0.124939
-1.133158	negative effects	-0.425969
-1.830606	positive effects	-0.124939
-0.601913	side effects	-0.124939
-1.424505	+ 1.;	-0.346788
-3.516422	The live	-0.124939
-0.902955	their live	-0.726999
-3.140962	a multidimensional	-0.124939
-2.598913	or multidimensional	-0.425969
-2.921819	A multidimensional	-0.124939
-2.856283	to install	-0.301030
-2.262447	must install	-0.124939
-0.902864	please install	-0.124939
-3.742373	of development,	-0.124939
-2.856345	program development,	-0.124939
-2.805170	CPU development,	-0.124939
-2.495702	software development,	-0.124939
-0.601880	GUI development,	-0.124939
-3.444166	the strict	-0.425969
-3.779216	a strict	-0.124939
-3.486050	for strict	-0.124939
-2.442779	less strict	-0.124939
-2.416998	for (c	-0.726999
-2.649750	+ (c	-0.124939
-1.961155	below. Position-independent	-0.124939
-1.776540	2. Position-independent	-0.124939
-0.902797	default. Position-independent	-0.124939
-0.204091	14.12 Position-independent	-0.124939
-3.805008	is obvious	-0.124939
-2.755938	be obvious	-0.425969
-3.073352	an obvious	-0.124939
-2.584452	such obvious	-0.124939
-3.160071	is swapped	-0.425969
-3.421802	be swapped	-0.124939
-3.405232	are swapped	-0.124939
-2.354282	even swapped	-0.124939
-1.300130	.......................................................................................... 21	-0.124939
-1.203400	...................................................................................................... 21	-0.124939
-1.078821	loaded. 21	-0.124939
-1.078641	....................................................................................................... 21	-0.124939
-0.601880	................................................................................................................. 21	-0.124939
-1.379813	biggest vectors:	-0.124939
-0.090170	eight-element vectors:	-0.726999
-3.483916	The OR	-0.124939
-3.303234	// OR	-0.124939
-2.240852	An OR	-0.124939
-1.680701	bitwise OR	-0.124939
-0.601880	EXCLUSIVE OR	-0.124939
-2.380183	// Array	-0.124939
-1.504245	arrays. Array	-0.124939
-0.601947	7.15a. Array	-0.124939
-2.800916	other processes	-0.124939
-1.914433	multiple processes	-0.124939
-2.562115	many processes	-0.124939
-1.503978	background processes	-0.124939
-3.790747	is portable	-0.124939
-3.764183	a portable	-0.124939
-3.412800	be portable	-0.124939
-3.099413	not portable	-0.124939
-1.554863	fully portable	-0.124939
-3.106933	to consume	-0.124939
-2.660871	can consume	-0.425969
-2.820294	functions consume	-0.124939
-1.079038	Such schemes	-0.124939
-0.425933	protection schemes	-0.602060
-2.414551	- 80	-0.425969
-2.707212	page 80	-0.124939
-2.155730	functions. 80	-0.124939
-1.998097	put 80	-0.124939
-1.600595	references. Arrays	-0.124939
-0.204091	7.10 Arrays	-0.124939
-0.902797	dynamically. Arrays	-0.124939
-0.601913	behaviors. Arrays	-0.124939
-3.476007	for lists	-0.124939
-3.254499	or lists	-0.124939
-2.502333	table lists	-0.124939
-1.922122	linked lists	-0.124939
-0.601880	Linked lists	-0.124939
-3.202937	the event	-0.301030
-2.231583	specific event	-0.124939
-0.601947	meaningless event	-0.124939
-3.137458	a computer.	-0.124939
-2.447344	4 computer.	-0.124939
-2.154564	another computer.	-0.124939
-0.902797	mainframe computer.	-0.124939
-3.150649	code Static	-0.124939
-1.830314	are: Static	-0.124939
-1.601034	not. Static	-0.124939
-0.380153	14.11 Static	-0.425969
-3.819753	is becoming	-0.124939
-2.489158	are becoming	-0.124939
-2.190416	therefore becoming	-0.124939
-4.055779	the select	-0.124939
-3.752625	to select	-0.124939
-2.786920	that select	-0.124939
-2.342685	always select	-0.124939
-3.764183	a list,	-0.124939
-3.635507	in list,	-0.124939
-3.131670	as list,	-0.124939
-1.830172	negative list,	-0.124939
-0.601880	queue, list,	-0.124939
-2.914958	is executed	-0.124939
-2.759882	be executed	-0.124939
-3.048761	the actual	-0.124939
-2.015283	their actual	-0.124939
-2.982037	this case,	-0.124939
-0.857156	latter case,	-0.124939
-1.503978	general case,	-0.124939
-0.902797	General case,	-0.124939
-2.505295	performance over	-0.124939
-1.181697	advantages over	-0.124939
-1.203533	alloca over	-0.124939
-0.601913	controversies over	-0.124939
-2.891008	a realistic	-0.301030
-2.948234	more realistic	-0.124939
-2.921819	A realistic	-0.124939
-2.864722	of abc	-0.301030
-1.715098	struct abc	-0.124939
-0.601947	c;}; abc	-0.124939
-2.756627	is finished.	-0.124939
-3.423873	are finished.	-0.124939
-1.586153	other hand,	-0.124939
-3.601075	and _WIN64	-0.124939
-2.425644	not _WIN64	-0.124939
-1.900603	platform _WIN64	-0.124939
-0.902797	_LP64 _WIN64	-0.124939
-2.584654	to recover	-0.522879
-4.078968	the console	-0.124939
-3.140962	a console	-0.425969
-2.234565	A console	-0.425969
-3.202937	the advice	-0.301030
-3.505314	The advice	-0.124939
-2.825970	same advice	-0.124939
-2.833001	different ways.	-0.124939
-1.914053	two ways.	-0.124939
-2.447344	4 ways.	-0.124939
-2.441329	8 ways.	-0.124939
-2.875933	Example 16.2	-0.124939
-2.004465	example 16.2	-0.124939
-2.076402	program. 16.2	-0.124939
-0.902797	155 16.2	-0.124939
-4.033767	the pow	-0.124939
-3.584546	and pow	-0.124939
-3.483916	The pow	-0.124939
-3.055851	than pow	-0.124939
-0.601880	sqrt, pow	-0.124939
-3.805008	is split	-0.124939
-3.103884	to split	-0.124939
-3.421802	be split	-0.124939
-2.060515	was split	-0.124939
-3.405232	are generated	-0.124939
-2.465431	code generated	-0.425969
-2.076694	files generated	-0.124939
-0.902797	comments generated	-0.124939
-3.160071	is created	-0.425969
-3.462566	that created	-0.124939
-3.263365	or created	-0.124939
-1.805008	dynamically created	-0.124939
-2.734749	a hundred	-0.124939
-2.223082	several hundred	-0.124939
-3.742373	of 250	-0.124939
-3.278211	= 250	-0.124939
-3.233697	function 250	-0.124939
-3.055851	than 250	-0.124939
-0.601880	not! 250	-0.124939
-3.756635	of computing	-0.124939
-3.601075	and computing	-0.124939
-2.824232	for computing	-0.124939
-2.442779	less computing	-0.124939
-3.742373	of pointers,	-0.124939
-2.223331	through pointers,	-0.124939
-1.600621	invalid pointers,	-0.124939
-1.446258	far pointers,	-0.124939
-1.078641	parameters, pointers,	-0.124939
-3.766375	to limit	-0.124939
-2.091163	certain limit	-0.124939
-0.346762	upper limit	-0.301030
-3.584546	and 90	-0.124939
-2.703767	page 90	-0.124939
-1.804202	syntax 90	-0.124939
-1.203400	...................................................................................................... 90	-0.124939
-1.078641	...................................................................................... 90	-0.124939
-3.739297	to follow	-0.124939
-3.028599	you follow	-0.124939
-2.907185	then follow	-0.124939
-1.776173	lines follow	-0.124939
-1.078641	labels follow	-0.124939
-3.764183	a loop-carried	-0.124939
-2.719783	no loop-carried	-0.124939
-2.603514	two loop-carried	-0.124939
-1.775456	No loop-carried	-0.124939
-1.642553	especially loop-carried	-0.124939
-3.237970	function library,	-0.124939
-2.161054	vector library,	-0.425969
-2.712197	class library,	-0.124939
-2.604107	static library,	-0.124939
-4.078968	the recommendation	-0.124939
-1.535767	specific recommendation	-0.124939
-0.902931	My recommendation	-0.124939
-2.061226	allocation Objects	-0.124939
-1.776173	2. Objects	-0.124939
-1.679263	needed. Objects	-0.124939
-1.554684	inefficient. Objects	-0.124939
-1.078641	collection. Objects	-0.124939
-3.137458	a compromise	-0.425969
-3.752625	to compromise	-0.124939
-2.241860	doesn't compromise	-0.124939
-0.902797	viable compromise	-0.124939
-0.070578	Digital Mars	-0.124939
-2.913095	is already	-0.301030
-3.469094	that already	-0.124939
-2.857770	has already	-0.124939
-3.160071	is nothing	-0.425969
-2.855796	has nothing	-0.124939
-2.832628	because nothing	-0.124939
-2.698156	do nothing	-0.124939
-2.222973	c (a&&b)	-0.124939
-0.902730	--xx----- (a&&b)	-0.124939
-0.902730	x--xx---- (a&&b)	-0.124939
-0.601880	75 (a&&b)	-0.124939
-0.601880	a&&b (a&&b)	-0.124939
-4.033767	the physical	-0.124939
-3.742373	of physical	-0.124939
-3.204574	by physical	-0.124939
-2.366650	new physical	-0.124939
-2.152812	four physical	-0.124939
-2.018733	if ((unsigned	-0.346788
-3.104597	- xxxxxxxxx	-0.124939
-0.902730	a/1=a xxxxxxxxx	-0.124939
-0.902730	x-xxxxxxx xxxxxxxxx	-0.124939
-0.601880	Constantfolding xxxxxxxxx	-0.124939
-0.601880	xxxxxxx-x xxxxxxxxx	-0.124939
-2.993562	have constructors	-0.124939
-2.540694	any constructors	-0.124939
-1.139028	copy constructors	-0.602060
-3.805008	is increased	-0.124939
-2.755938	be increased	-0.124939
-2.270651	CPUs increased	-0.124939
-2.188020	been increased	-0.124939
-2.588600	C++ programming,	-0.124939
-2.335096	system programming,	-0.124939
-2.230333	language programming,	-0.124939
-1.300310	11 programming,	-0.124939
-0.601880	object-oriented programming,	-0.124939
-2.808142	other factor.	-0.124939
-0.634202	unroll factor.	-0.124939
-1.446132	available, i.e.	-0.124939
-0.601943	16, i.e.	-0.425969
-0.601913	taken, i.e.	-0.124939
-0.601913	matrix, i.e.	-0.124939
-3.805008	is nonzero	-0.124939
-3.779216	a nonzero	-0.124939
-3.756635	of nonzero	-0.124939
-2.547665	if nonzero	-0.124939
-3.742373	of unacceptably	-0.124939
-3.204574	by unacceptably	-0.124939
-2.986831	have unacceptably	-0.124939
-1.746390	sometimes unacceptably	-0.124939
-1.078641	experience unacceptably	-0.124939
-2.015072	each process.	-0.124939
-1.997367	development process.	-0.124939
-1.901041	dispatch process.	-0.124939
-1.446132	update process.	-0.124939
-3.303234	// Calculate	-0.124939
-0.902730	15.1c. Calculate	-0.124939
-0.902730	15.1b. Calculate	-0.124939
-0.601880	8.23b. Calculate	-0.124939
-0.601880	15.1a. Calculate	-0.124939
-3.303234	// Only	-0.124939
-1.714026	library. Only	-0.124939
-1.503890	file. Only	-0.124939
-1.503890	line. Only	-0.124939
-1.203580	ebx. Only	-0.124939
-3.267693	it adds	-0.124939
-3.233697	function adds	-0.124939
-1.920690	actually adds	-0.124939
-1.504250	identification adds	-0.124939
-0.601880	ebx,1 adds	-0.124939
-1.425516	test ()	-0.602060
-1.379645	Func ()	-0.124939
-0.902864	CriticalInnerFunction ()	-0.124939
-3.303234	// Division	-0.124939
-1.714744	calculations. Division	-0.124939
-1.714205	cycles. Division	-0.124939
-1.555223	faster. Division	-0.124939
-0.601880	matters: Division	-0.124939
-4.055779	the pitfalls	-0.124939
-2.834718	The pitfalls	-0.425969
-2.220494	common pitfalls	-0.124939
-2.129595	few pitfalls	-0.124939
-2.864769	program package	-0.124939
-1.390882	software package	-0.124939
-4.033767	the equivalent	-0.124939
-3.790747	is equivalent	-0.124939
-3.483916	The equivalent	-0.124939
-3.396204	are equivalent	-0.124939
-2.116627	doing equivalent	-0.124939
-2.856283	to understand	-0.124939
-3.618258	and understand	-0.124939
-1.998305	don't understand	-0.124939
-2.187930	classes Fortunately,	-0.124939
-1.300310	all. Fortunately,	-0.124939
-1.078641	sizes. Fortunately,	-0.124939
-0.601880	less. Fortunately,	-0.124939
-0.601880	d.y; Fortunately,	-0.124939
-4.078968	the command	-0.124939
-2.891008	a command	-0.301030
-2.921819	A command	-0.124939
-3.294053	= a[i];	-0.124939
-2.514286	return a[i];	-0.124939
-1.249550	+= a[i];	-0.124939
-4.055779	the relatively	-0.124939
-3.805008	is relatively	-0.124939
-3.779216	a relatively	-0.124939
-2.739473	are relatively	-0.124939
-1.282916	high priority.	-0.124939
-0.857174	low priority.	-0.124939
-1.504358	lower priority.	-0.124939
-2.864164	data files.	-0.124939
-2.622172	library files.	-0.124939
-1.979756	source files.	-0.124939
-1.679263	disk files.	-0.124939
-1.601340	header files.	-0.124939
-2.913095	is inefficient,	-0.124939
-2.076360	quite inefficient,	-0.124939
-1.078842	extremely inefficient,	-0.124939
-4.033767	the guidelines	-0.124939
-2.365231	these guidelines	-0.124939
-2.337406	following guidelines	-0.124939
-2.278998	Some guidelines	-0.124939
-0.601880	Accessibility guidelines	-0.124939
-0.216701	Math Kernel	-0.221849
-3.835016	is necessarily	-0.124939
-2.014681	not necessarily	-0.124939
-3.584546	and returns	-0.124939
-3.233697	function returns	-0.124939
-2.780307	which returns	-0.124939
-1.804920	unused returns	-0.124939
-1.078821	ret returns	-0.124939
-2.945363	more jobs	-0.124939
-2.605607	two jobs	-0.124939
-0.601943	cleanup jobs	-0.124939
-0.601913	foreground jobs	-0.124939
-1.746569	class. Data	-0.124939
-1.203760	work. Data	-0.124939
-0.902730	together. Data	-0.124939
-0.902730	87). Data	-0.124939
-0.601880	areas. Data	-0.124939
-3.584546	and frameworks	-0.124939
-3.396204	are frameworks	-0.124939
-1.960735	runtime frameworks	-0.124939
-1.900574	interface frameworks	-0.124939
-1.776532	Such frameworks	-0.124939
-4.078968	the excessive	-0.124939
-2.134861	an excessive	-0.602060
-2.961646	use excessive	-0.124939
-3.160071	is safer	-0.425969
-3.405232	are safer	-0.124939
-2.917566	A safer	-0.124939
-2.189618	therefore safer	-0.124939
-1.942788	set. Aligning	-0.124939
-0.204097	12.8 Aligning	-0.425969
-0.204097	12.9 Aligning	-0.425969
-0.634214	out-of-order execution.	-0.124939
-1.642656	parallel execution.	-0.124939
-2.386854	int a[size],	-0.124939
-1.664346	float a[size],	-0.301030
-3.048761	the latency	-0.249877
-3.810938	a latency	-0.124939
-3.103884	to specify	-0.124939
-3.030371	you specify	-0.124939
-2.541438	we specify	-0.124939
-2.090935	well specify	-0.124939
-0.994820	i; for(i=0;	-0.346788
-3.444166	the larger	-0.124939
-3.805008	is larger	-0.124939
-3.779216	a larger	-0.124939
-2.014400	allows larger	-0.124939
-3.131670	as -(-a)	-0.124939
-3.104597	- -(-a)	-0.124939
-2.647331	n.a. -(-a)	-0.124939
-2.014518	expression -(-a)	-0.124939
-2.013980	like -(-a)	-0.124939
-1.804023	cases. Multiple	-0.124939
-1.203580	146 Multiple	-0.124939
-0.601880	"function". Multiple	-0.124939
-0.601880	exact. Multiple	-0.124939
-0.601880	7.38a. Multiple	-0.124939
-3.835016	is unfortunately	-0.124939
-1.666769	but unfortunately	-0.124939
-3.756635	of n!	-0.124939
-2.632234	// n!	-0.124939
-3.136933	as n!	-0.124939
-2.314702	0 n!	-0.124939
-3.235581	if pieces	-0.124939
-1.504329	small pieces	-0.425969
-1.642280	identical pieces	-0.124939
-1.300297	Critical pieces	-0.124939
-3.114994	of Basic	-0.124939
-3.496332	for Basic	-0.124939
-0.944369	Visual Basic	-0.124939
-1.446078	solution. (In	-0.124939
-1.379131	this. (In	-0.124939
-1.203400	input. (In	-0.124939
-1.078641	inlined. (In	-0.124939
-0.902730	respectively. (In	-0.124939
-2.833001	different microprocessors.	-0.124939
-2.114689	other microprocessors.	-0.124939
-2.698445	most microprocessors.	-0.124939
-0.601913	Intel/x86-compatible microprocessors.	-0.124939
-2.836285	different modules.	-0.124939
-1.861267	other modules.	-0.301030
-2.337805	system modules.	-0.124939
-3.309290	// s	-0.124939
-1.446425	x); s	-0.124939
-0.505062	s; s	-0.425969
-0.601913	//Loopby4 s	-0.124939
-3.449723	the project	-0.124939
-3.794788	a project	-0.124939
-1.804981	software project	-0.124939
-3.805008	is divided	-0.124939
-2.755938	be divided	-0.425969
-3.104883	not divided	-0.124939
-1.878181	usually divided	-0.124939
-2.195613	from www.agner.org/optimize/asmlib.zip.	-0.124939
-2.184877	at www.agner.org/optimize/asmlib.zip.	-0.124939
-2.625596	library www.agner.org/optimize/asmlib.zip.	-0.124939
-1.264261	| Wednesday	-0.425969
-1.747016	== Wednesday	-0.124939
-1.503978	4, Wednesday	-0.124939
-0.902797	Tuesday, Wednesday	-0.124939
-2.889723	from mispredictions.	-0.124939
-1.380003	branch mispredictions.	-0.124939
-3.456133	that relies	-0.124939
-3.146803	code relies	-0.124939
-2.856345	program relies	-0.124939
-1.960914	mechanism relies	-0.124939
-0.601880	MKL relies	-0.124939
-2.209115	etc. And	-0.124939
-1.679443	precision. And	-0.124939
-1.300130	maintain. And	-0.124939
-0.902730	(PLT). And	-0.124939
-0.601880	www.yeppp.info And	-0.124939
-2.145649	different platforms,	-0.425969
-2.562115	many platforms,	-0.124939
-2.494002	between platforms,	-0.124939
-2.189909	Linux platforms,	-0.124939
-2.856283	to compare	-0.124939
-3.618258	and compare	-0.124939
-2.261322	; compare	-0.124939
-3.805008	is valid	-0.124939
-3.137458	a valid	-0.124939
-3.756635	of valid	-0.124939
-3.752625	to valid	-0.124939
-3.111698	of CPU-intensive	-0.124939
-3.752625	to CPU-intensive	-0.124939
-3.486050	for CPU-intensive	-0.124939
-2.529139	some CPU-intensive	-0.124939
-1.679923	stack. Is	-0.124939
-0.748012	solution. Is	-0.124939
-0.902797	tree. Is	-0.124939
-0.601913	38). Is	-0.124939
-1.750955	do so.	-0.124939
-1.555511	approximately so.	-0.124939
-0.601947	excessively so.	-0.124939
-3.790747	is seen	-0.124939
-3.412800	be seen	-0.124939
-3.099413	not seen	-0.124939
-2.986831	have seen	-0.124939
-1.300130	ever seen	-0.124939
-1.745852	resources. Typically,	-0.124939
-1.554863	enabled. Typically,	-0.124939
-1.503890	units. Typically,	-0.124939
-1.300130	future. Typically,	-0.124939
-0.601880	caches. Typically,	-0.124939
-4.033767	the 107	-0.124939
-2.703767	page 107	-0.124939
-1.203400	......................................................................................... 107	-0.124939
-0.601880	................................................................. 107	-0.124939
-0.601880	.......................................................... 107	-0.124939
-3.160071	is contiguous	-0.425969
-3.179892	with contiguous	-0.124939
-2.758797	one contiguous	-0.124939
-1.854991	modules contiguous	-0.124939
-3.267693	it gets	-0.124939
-2.780307	which gets	-0.124939
-2.709849	class gets	-0.124939
-2.367895	user gets	-0.124939
-1.961451	programmer gets	-0.124939
-3.742373	of manuals.	-0.124939
-3.584546	and manuals.	-0.124939
-2.401445	optimization manuals.	-0.124939
-1.554863	subsequent manuals.	-0.124939
-1.379131	five manuals.	-0.124939
-3.113670	This tells	-0.124939
-2.336883	file tells	-0.124939
-1.961447	keyword tells	-0.124939
-1.106858	profiler tells	-0.425969
-2.856283	to wrap	-0.301030
-3.110422	not wrap	-0.124939
-2.557943	value wrap	-0.124939
-2.709849	class separately	-0.124939
-2.606686	object separately	-0.124939
-2.188108	line separately	-0.124939
-2.029834	branches separately	-0.124939
-1.714205	them separately	-0.124939
-1.555570	pure __attribute((	-0.124939
-1.300297	__fastcall __attribute((	-0.124939
-0.204091	align(16)) __attribute((	-0.425969
-0.902797	const)) __attribute((	-0.124939
-3.790747	is necessary.	-0.124939
-3.412800	be necessary.	-0.124939
-3.099413	not necessary.	-0.124939
-3.055851	than necessary.	-0.124939
-2.564103	where necessary.	-0.124939
-3.805008	is increasing	-0.124939
-3.209807	by increasing	-0.124939
-2.388134	an increasing	-0.425969
-0.601913	monotonically increasing	-0.124939
-2.530791	by 16,	-0.425969
-2.877841	at 16,	-0.124939
-0.806044	8, 16,	-0.124939
-2.836285	different threads,	-0.124939
-1.660014	multiple threads,	-0.301030
-2.495294	between threads,	-0.124939
-0.857127	6 Development	-0.124939
-1.446279	details. Development	-0.124939
-0.601913	old-fashioned. Development	-0.124939
-0.601913	(Integrated Development	-0.124939
-2.756627	is AND'ed	-0.726999
-2.996966	have AND'ed	-0.124939
-0.689148	subexpression elimination	-0.124939
-0.681177	Pointer elimination	-0.124939
-3.110422	not all.	-0.124939
-1.930277	at all.	-0.301030
-1.714873	them all.	-0.124939
-3.054310	compiler ..........................................................................................	-0.124939
-2.337406	programming ..........................................................................................	-0.124939
-1.979935	resources ..........................................................................................	-0.124939
-1.203400	maintenance ..........................................................................................	-0.124939
-1.203580	sequentially ..........................................................................................	-0.124939
-4.055779	the upper	-0.124939
-0.505062	reasonable upper	-0.425969
-1.078888	Get upper	-0.124939
-0.601913	not-too-big upper	-0.124939
-3.146803	code addresses.	-0.124939
-2.877215	memory addresses.	-0.124939
-1.804741	relative addresses.	-0.124939
-1.446258	absolute addresses.	-0.124939
-1.203400	round addresses.	-0.124939
-3.140962	a loop-invariant	-0.124939
-2.969484	and loop-invariant	-0.425969
-2.340389	out loop-invariant	-0.124939
-3.739297	to sum1	-0.124939
-3.002400	{ sum1	-0.124939
-2.514521	variables sum1	-0.124939
-0.902730	list[size], sum1	-0.124939
-0.601880	list[i+1];} sum1	-0.124939
-3.286060	= ~a	-0.124939
-1.609605	& ~a	-0.425969
-1.379332	^ ~a	-0.124939
-0.601913	--------- ~a	-0.124939
-2.514521	variables Compilers	-0.124939
-2.401268	code. Compilers	-0.124939
-1.997484	cache. Compilers	-0.124939
-1.503890	X Compilers	-0.124939
-1.379311	overlap. Compilers	-0.124939
-0.601880	"memory" );	-0.124939
-0.601880	<<6 );	-0.124939
-0.601880	bb[size] );	-0.124939
-0.601880	cc[size] );	-0.124939
-0.601880	aa[size] );	-0.124939
-3.742373	of 18	-0.124939
-1.379671	Number 18	-0.124939
-0.902730	158 18	-0.124939
-0.902730	.................................................................................................. 18	-0.124939
-0.601880	159 18	-0.124939
-2.277395	about them.	-0.124939
-2.251326	avoid them.	-0.124939
-1.998202	needs them.	-0.124939
-1.504430	multiplying them.	-0.124939
-0.601880	connect them.	-0.124939
-1.765013	floating point.	-0.124939
-1.715098	per point.	-0.124939
-1.203667	entry point.	-0.124939
-1.864215	time consumption	-0.249877
-2.203721	power consumption	-0.124939
-2.118543	by 8.	-0.249877
-0.902931	appropriate. 8.	-0.124939
-4.078968	the key	-0.124939
-3.140962	a key	-0.124939
-2.598913	or key	-0.124939
-2.134861	an explanation.	-0.124939
-1.746944	further explanation.	-0.124939
-1.680110	little explanation.	-0.124939
-3.204574	by itself.	-0.124939
-3.146803	code itself.	-0.124939
-2.856345	program itself.	-0.124939
-1.854996	constructor itself.	-0.124939
-1.804023	profiler itself.	-0.124939
-2.757906	be updated	-0.124939
-1.493363	been updated	-0.124939
-0.601947	Last updated	-0.124939
-2.914047	will appear	-0.124939
-2.859135	program appear	-0.124939
-1.673414	they appear	-0.425969
-1.854991	modules appear	-0.124939
-3.483916	The Codeplay	-0.124939
-1.642013	well. Codeplay	-0.124939
-1.300130	xxxxxxxxx Codeplay	-0.124939
-0.601880	2005. Codeplay	-0.124939
-0.601880	CodeGear, Codeplay	-0.124939
-2.606686	object (except	-0.124939
-1.941072	expressions (except	-0.124939
-1.679622	iteration (except	-0.124939
-1.642013	loops (except	-0.124939
-1.445898	capabilities (except	-0.124939
-3.444166	the combined	-0.425969
-3.421802	be combined	-0.124939
-3.405232	are combined	-0.124939
-3.057257	compiler combined	-0.124939
-3.819753	is definitely	-0.124939
-1.790380	should definitely	-0.602060
-1.504358	binding definitely	-0.124939
-3.601075	and jumps	-0.124939
-3.271200	it jumps	-0.124939
-2.221220	thread jumps	-0.124939
-0.380153	Eliminate jumps	-0.124939
-2.851694	vector elements.	-0.124939
-2.712197	class elements.	-0.124939
-1.870342	array elements.	-0.124939
-1.555570	finding elements.	-0.124939
-2.774052	all .cpp	-0.124939
-1.660014	multiple .cpp	-0.301030
-1.601042	current .cpp	-0.124939
-1.869874	many features,	-0.124939
-2.090935	optimizing features,	-0.124939
-1.600595	metaprogramming features,	-0.124939
-1.078741	backup features,	-0.124939
-3.154529	code flag	-0.124939
-1.980125	zero flag	-0.124939
-0.492878	carry flag	-0.124939
-1.090017	+= 8)	-0.726999
-1.777116	>= 8)	-0.124939
-4.033767	the ever	-0.124939
-3.635507	in ever	-0.124939
-2.986831	have ever	-0.124939
-2.201246	exception ever	-0.124939
-1.446438	hardly ever	-0.124939
-2.221728	// Writes	-0.726999
-1.981236	resources Writes	-0.124939
-3.584546	and 13	-0.124939
-2.873452	at 13	-0.124939
-1.446078	120 13	-0.124939
-0.601880	121 13	-0.124939
-0.601880	1.19 13	-0.124939
-3.635507	in b[i]	-0.124939
-3.278211	= b[i]	-0.124939
-3.232478	if b[i]	-0.124939
-3.002400	{ b[i]	-0.124939
-2.131920	i++) b[i]	-0.124939
-2.913095	is doubled.	-0.301030
-3.110422	not doubled.	-0.124939
-2.189180	been doubled.	-0.124939
-3.584546	and written	-0.124939
-3.412800	be written	-0.124939
-2.556085	value written	-0.124939
-2.045271	programs written	-0.124939
-0.601880	hand- written	-0.124939
-1.228308	programming languages,	-0.124939
-0.601980	script languages,	-0.124939
-2.346565	or malloc	-0.301030
-2.820294	functions malloc	-0.124939
-1.555285	(or malloc	-0.124939
-2.535083	that runs	-0.124939
-2.861943	program runs	-0.124939
-2.499190	software runs	-0.124939
-2.756627	is true,	-0.249877
-3.147653	as true,	-0.124939
-4.033767	the division.	-0.124939
-3.254499	or division.	-0.124939
-2.849346	vector division.	-0.124939
-2.796451	point division.	-0.124939
-2.732115	integer division.	-0.124939
-2.617468	= C;	-0.124939
-2.647506	+ C;	-0.124939
-0.380166	B, C;	-0.124939
-0.601972	0.18 0.18	-0.124939
-1.078741	0.11 0.18	-0.124939
-1.078741	0.12 0.18	-0.124939
-0.601913	0.75 0.18	-0.124939
-2.648062	n.a. MS	-0.124939
-1.708156	optimization MS	-0.425969
-1.446132	int64_t MS	-0.124939
-1.203533	uint64_t MS	-0.124939
-2.907526	} #endif	-0.124939
-1.554863	n; #endif	-0.124939
-1.078641	pure_function #endif	-0.124939
-0.601880	__attribute__((aligned(16))) #endif	-0.124939
-0.601880	SelectAddMul_AVX2 #endif	-0.124939
-3.444166	the present	-0.124939
-3.494484	The present	-0.124939
-3.486050	for present	-0.124939
-3.104883	not present	-0.124939
-2.856283	to 15.1c	-0.124939
-2.696566	example 15.1c	-0.124939
-0.601947	151 15.1c	-0.124939
-2.206703	= 1000;	-0.249877
-2.491214	< 1000;	-0.124939
-3.444166	the strlen	-0.425969
-0.902797	0.28 strlen	-0.124939
-0.601913	examples: strlen	-0.124939
-0.601913	0.27 strlen	-0.124939
-3.805008	is __asm	-0.124939
-3.263365	or __asm	-0.124939
-1.921792	x; __asm	-0.124939
-0.204091	syntax: __asm	-0.124939
-1.335353	clock cycle.	-0.346788
-2.873452	at 11	-0.124939
-2.550040	takes 11	-0.124939
-2.230333	language 11	-0.124939
-1.679263	needed. 11	-0.124939
-1.203400	103 11	-0.124939
-3.456133	that belong	-0.124939
-2.768328	all belong	-0.124939
-2.413215	often belong	-0.124939
-2.340977	always belong	-0.124939
-1.776173	lines belong	-0.124939
-2.575123	In 50	-0.124939
-2.550040	takes 50	-0.124939
-1.203400	.............................................................................................. 50	-0.124939
-1.078641	............................................................................................... 50	-0.124939
-1.078641	took 50	-0.124939
-2.990184	have facilities	-0.124939
-1.854845	advanced facilities	-0.124939
-0.857156	search facilities	-0.425969
-1.379478	powerful facilities	-0.124939
-3.104597	- 5.	-0.124939
-2.295479	constant 5.	-0.124939
-1.878477	<< 5.	-0.124939
-1.855175	CPUs. 5.	-0.124939
-0.902730	AVX. 5.	-0.124939
-3.160071	is currently	-0.124939
-3.405232	are currently	-0.124939
-2.346886	method currently	-0.124939
-2.168359	manual currently	-0.124939
-2.029477	multiplication here:	-0.124939
-1.555402	mentioned here:	-0.124939
-1.300130	principles here:	-0.124939
-1.300490	pitfalls here:	-0.124939
-0.601880	reporting here:	-0.124939
-1.804423	cases. Does	-0.124939
-1.600742	sets. Does	-0.124939
-0.902855	Windows. Does	-0.124939
-1.203533	IDE. Does	-0.124939
-3.174778	with macros	-0.124939
-3.131670	as macros	-0.124939
-2.251326	avoid macros	-0.124939
-2.239784	Use macros	-0.124939
-0.601880	Predefined macros	-0.124939
-2.353863	may prefer	-0.425969
-3.030371	you prefer	-0.124939
-2.914047	will prefer	-0.124939
-2.541438	we prefer	-0.124939
-3.449723	the divisor	-0.425969
-2.548304	if divisor	-0.425969
-2.297534	constant divisor	-0.124939
-0.380179	3.5 Program	-0.425969
-0.204104	3.3 Program	-0.425969
-3.819753	is better.	-0.124939
-1.555692	work better.	-0.124939
-0.601947	clearly better.	-0.124939
-3.618258	and BSD,	-0.124939
-3.163926	on BSD,	-0.124939
-1.016852	Linux, BSD,	-0.124939
-4.078968	the bit-mask:	-0.124939
-3.140962	a bit-mask:	-0.425969
-0.902864	inverted bit-mask:	-0.124939
-3.756635	of two.	-0.124939
-3.263365	or two.	-0.124939
-3.060093	than two.	-0.124939
-2.847755	make two.	-0.124939
-1.855868	start up,	-0.124939
-1.203680	cleaned up,	-0.124939
-1.078741	starts up,	-0.124939
-1.078741	filled up,	-0.124939
-1.203680	cleaned up.	-0.124939
-1.078741	starts up.	-0.124939
-1.078741	filled up.	-0.124939
-0.601913	broken up.	-0.124939
-2.505295	performance reasons.	-0.124939
-2.221220	several reasons.	-0.124939
-1.555277	usability reasons.	-0.124939
-0.902797	marketing reasons.	-0.124939
-2.707212	page 103	-0.124939
-1.078741	..................................................................................................... 103	-0.124939
-0.902797	................................................................................................. 103	-0.124939
-0.902797	writing: 103	-0.124939
-1.815996	2 Choosing	-0.425969
-1.133332	5 Choosing	-0.425969
-1.864416	time slices	-0.124939
-2.135306	an exception.	-0.124939
-2.156294	another exception.	-0.124939
-2.304380	& enum	-0.124939
-2.241715	An enum	-0.124939
-1.714978	conditions enum	-0.124939
-0.601913	bool, enum	-0.124939
-2.861943	program repeats	-0.124939
-2.105853	loop repeats	-0.124939
-2.591400	also repeats	-0.124939
-3.455353	the highest	-0.124939
-2.839426	The highest	-0.124939
-2.189618	matrix 96	-0.124939
-1.300297	.......................................................................................... 96	-0.124939
-0.902797	...................................................................................................................... 96	-0.124939
-0.601913	............................................................. 96	-0.124939
-3.110004	to recommend	-0.124939
-0.204104	textbooks recommend	-0.124939
-3.780575	to lead	-0.124939
-2.407205	can lead	-0.602060
-2.388932	an additional	-0.124939
-3.060224	compiler additional	-0.124939
-1.203667	transferring additional	-0.124939
-2.722100	no 51	-0.124939
-2.707212	page 51	-0.124939
-0.601913	classes............................................................................................ 51	-0.124939
-0.601913	............................................................................ 51	-0.124939
-2.851694	vector 56	-0.124939
-1.203533	.............................................................................................. 56	-0.124939
-1.203533	............................................................................................. 56	-0.124939
-0.601913	................................................................................................................... 56	-0.124939
-3.779216	a type.	-0.124939
-2.833001	different type.	-0.124939
-2.734494	integer type.	-0.124939
-1.078741	wrong type.	-0.124939
-3.794788	a place	-0.124939
-3.766375	to place	-0.124939
-2.068757	one place	-0.124939
-3.163367	is preferable	-0.124939
-3.430994	be preferable	-0.124939
-2.416315	often preferable	-0.124939
-3.106933	to overlap	-0.425969
-3.349433	can overlap	-0.124939
-3.110422	not overlap	-0.124939
-3.050988	the eight-element	-0.726999
-1.856541	takes 40	-0.124939
-1.203780	s; 40	-0.124939
-0.601947	conversions.................................................................................................... 40	-0.124939
-3.064983	x 43	-0.124939
-2.021248	page 43	-0.124939
-0.601947	statements............................................................................. 43	-0.124939
-2.969484	and sixteen	-0.425969
-3.272416	or sixteen	-0.124939
-1.642548	either sixteen	-0.124939
-3.506862	for turning	-0.124939
-2.277793	by turning	-0.301030
-2.184877	at initialization.	-0.124939
-2.377727	need initialization.	-0.124939
-2.233382	necessary initialization.	-0.124939
-1.804598	platforms. Graphics	-0.124939
-1.078842	input/output Graphics	-0.124939
-0.204097	3.10 Graphics	-0.124939
-4.055779	the obstacles	-0.124939
-2.366648	these obstacles	-0.124939
-2.280231	important obstacles	-0.124939
-2.220494	common obstacles	-0.124939
-3.455353	the asmlib	-0.124939
-1.998965	using asmlib	-0.425969
-2.402555	code. Furthermore,	-0.124939
-1.554984	executed. Furthermore,	-0.124939
-0.902797	crash. Furthermore,	-0.124939
-0.601913	edx. Furthermore,	-0.124939
-3.110004	to obtain	-0.425969
-2.661651	can obtain	-0.124939
-3.114994	of ebx.	-0.124939
-3.766375	to ebx.	-0.124939
-1.078842	pop ebx.	-0.124939
-3.263365	or estimate	-0.124939
-1.203680	reasonable estimate	-0.124939
-0.601913	roughly estimate	-0.124939
-0.601913	our estimate	-0.124939
-3.163367	is enabled	-0.124939
-2.344400	always enabled	-0.124939
-1.714873	them enabled	-0.124939
-3.636150	and enables	-0.124939
-2.172039	This enables	-0.301030
-0.380179	8.4 Obstacles	-0.425969
-0.204104	8.3 Obstacles	-0.425969
-1.195559	& r)	-0.425969
-3.642397	in regular	-0.124939
-3.486050	for regular	-0.124939
-2.875641	at regular	-0.124939
-2.304235	simple regular	-0.124939
-3.771380	of m	-0.124939
-2.468920	way m	-0.124939
-1.158068	function, m	-0.425969
-2.403846	code. Metaprogramming	-0.124939
-0.681128	15 Metaprogramming	-0.124939
-1.203667	Metaprogramming Metaprogramming	-0.124939
-1.962318	examples explain	-0.124939
-0.505082	me explain	-0.124939
-1.078842	To explain	-0.124939
-2.396295	time. Dispatch	-0.124939
-1.961155	below. Dispatch	-0.124939
-1.941850	compilers. Dispatch	-0.124939
-0.601913	times: Dispatch	-0.124939
-3.136933	as well,	-0.124939
-2.178337	optimized well,	-0.124939
-1.830752	predicted well,	-0.124939
-1.078888	reasonably well,	-0.124939
-3.819753	is sufficiently	-0.124939
-2.741447	are sufficiently	-0.425969
-0.902864	worked sufficiently	-0.124939
-1.961155	below. 126	-0.124939
-1.446132	127 126	-0.124939
-1.300297	.......................................................................................... 126	-0.124939
-1.078741	..................................................................................................... 126	-0.124939
-3.805008	is bad	-0.124939
-3.779216	a bad	-0.124939
-3.756635	of bad	-0.124939
-1.746431	particularly bad	-0.124939
-1.570667	double p(double	-0.726999
-3.163367	is said	-0.425969
-3.430994	be said	-0.124939
-1.643000	easier said	-0.124939
-4.055779	the modulo	-0.124939
-3.752625	to modulo	-0.124939
-2.619451	i modulo	-0.124939
-2.252057	avoid modulo	-0.124939
-3.618258	and databases	-0.124939
-0.982090	Other databases	-0.124939
-1.300464	remote databases	-0.124939
-1.181620	0, _EM_OVERFLOW);	-0.425969
-0.204104	_controlfp(0, _EM_OVERFLOW);	-0.425969
-1.379625	protection against	-0.124939
-0.601913	warn against	-0.124939
-0.601913	weighed against	-0.124939
-0.601913	remedies against	-0.124939
-1.803985	size. Vectorized	-0.124939
-1.203533	operators. Vectorized	-0.124939
-0.601913	12.4b. Vectorized	-0.124939
-0.601913	112 Vectorized	-0.124939
-0.601913	printf("Beta"); break;	-0.124939
-0.601913	printf("Gamma"); break;	-0.124939
-0.601913	printf("Delta"); break;	-0.124939
-0.601913	printf("Alpha"); break;	-0.124939
-3.206117	the loader	-0.124939
-3.636150	and loader	-0.124939
-1.446479	number. Failure	-0.124939
-0.380166	deallocated. Failure	-0.425969
-0.902864	flow. Failure	-0.124939
-2.757926	is declared.	-0.124939
-3.756635	of resources,	-0.124939
-2.945363	more resources,	-0.124939
-2.822591	same resources,	-0.124939
-1.776687	network resources,	-0.124939
-3.805008	is true.	-0.124939
-3.486050	for true.	-0.124939
-3.421802	be true.	-0.124939
-2.342685	always true.	-0.124939
-3.601075	and objects.	-0.124939
-2.771180	all objects.	-0.124939
-2.712197	class objects.	-0.124939
-2.153838	four objects.	-0.124939
-2.559206	in parallel.	-0.124939
-3.209807	by one,	-0.124939
-2.812734	only one,	-0.124939
-2.342685	always one,	-0.124939
-1.379332	plus one,	-0.124939
-1.973401	int list[300];	-0.726999
-0.333204	SIZE; r++)	-0.726999
-3.302196	= parabola	-0.124939
-1.664346	float parabola	-0.602060
-2.380902	// x^4	-0.301030
-0.601980	x^3, x^4	-0.124939
-3.794788	a mouse	-0.124939
-3.618258	and mouse	-0.124939
-2.598913	or mouse	-0.124939
-1.287347	template specialization	-0.425969
-2.797465	loop index.	-0.124939
-2.564424	array index.	-0.124939
-2.304235	simple index.	-0.124939
-0.601913	top-of-stack index.	-0.124939
-2.505295	performance options.	-0.124939
-2.402700	optimization options.	-0.124939
-1.831483	relevant options.	-0.124939
-0.601913	monitoring options.	-0.124939
-0.748140	SIZE; c++)	-0.425969
-0.380179	r; c++)	-0.425969
-2.468020	elements are.	-0.124939
-2.403959	optimization are.	-0.124939
-1.673535	they are.	-0.124939
-3.440385	be needed,	-0.124939
-2.490266	are needed,	-0.124939
-3.786643	of declaring	-0.124939
-2.277793	by declaring	-0.301030
-4.103464	the SVML	-0.124939
-1.726591	Intel SVML	-0.124939
-2.600857	or *.so).	-0.425969
-0.204104	(*.dll, *.so).	-0.124939
-2.135158	if (u.i	-0.249877
-2.232750	necessary support.	-0.124939
-2.189909	AVX support.	-0.124939
-1.503978	profiling support.	-0.124939
-0.601913	C++0x support.	-0.124939
-3.636150	and subtraction	-0.124939
-0.602007	addition, subtraction	-0.602060
-2.633519	// Multiply	-0.425969
-3.074073	int Multiply	-0.124939
-0.601947	b<c) Multiply	-0.124939
-0.380166	*(p++) |=	-0.425969
-0.601947	*(int*)&x |=	-0.124939
-0.601947	x.i |=	-0.124939
-1.777853	memory pool.	-0.249877
-3.469094	that performs	-0.124939
-1.866176	version performs	-0.124939
-2.242691	processor performs	-0.124939
-3.449723	the "Intel	-0.124939
-3.505314	The "Intel	-0.124939
-1.203780	Intel: "Intel	-0.124939
-2.396295	time. Are	-0.124939
-1.203680	index. Are	-0.124939
-0.902797	small. Are	-0.124939
-0.601913	94 Are	-0.124939
-3.494484	The pre-increment	-0.124939
-2.958838	use pre-increment	-0.124939
-2.565293	where pre-increment	-0.124939
-1.831044	change pre-increment	-0.124939
-3.601075	and ownership	-0.124939
-1.555277	transfer ownership	-0.124939
-0.601913	transfers ownership	-0.124939
-0.601913	looses ownership	-0.124939
-2.707212	page 88	-0.124939
-0.601913	delete). 88	-0.124939
-0.601913	...................................... 88	-0.124939
-0.601913	together...................................... 88	-0.124939
-0.505118	|= 0x80000000;	-0.425969
-0.204104	^= 0x80000000;	-0.124939
-2.660871	can move	-0.124939
-3.045638	may move	-0.124939
-1.203667	mouse move	-0.124939
-2.909459	} Can	-0.124939
-1.300444	all. Can	-0.124939
-1.078741	2004. Can	-0.124939
-0.902797	container. Can	-0.124939
-3.771380	of defining	-0.124939
-3.496332	for defining	-0.124939
-2.530791	by defining	-0.124939
-2.789663	that produces	-0.124939
-1.846760	variable produces	-0.425969
-3.118316	of precision,	-0.124939
-1.984824	double precision,	-0.124939
-2.892991	a non-inlined	-0.602060
-3.119752	This non-inlined	-0.124939
-3.206117	the drawbacks	-0.602060
-3.636150	and drawbacks	-0.124939
-3.004779	{ __declspec(align(16))	-0.124939
-1.203680	12.2 __declspec(align(16))	-0.124939
-0.902797	Alignd(X) __declspec(align(16))	-0.124939
-0.902797	alignment. __declspec(align(16))	-0.124939
-3.756635	of u.f	-0.124939
-3.462566	that u.f	-0.124939
-3.309290	// u.f	-0.124939
-1.446279	<= u.f	-0.124939
-4.055779	the commercial	-0.124939
-2.917566	A commercial	-0.124939
-2.562115	many commercial	-0.124939
-0.902797	optional commercial	-0.124939
-2.061679	write configuration	-0.124939
-1.379478	files, configuration	-0.124939
-0.902797	drivers, configuration	-0.124939
-0.601913	DLLs, configuration	-0.124939
-2.021248	page 134	-0.124939
-0.902864	range"; 134	-0.124939
-0.902864	.................................................................................................. 134	-0.124939
-3.154529	code lines.	-0.124939
-2.047816	cache lines.	-0.124939
-2.130492	few lines.	-0.124939
-3.496332	for restrictions	-0.124939
-2.130492	few restrictions	-0.124939
-1.393999	certain restrictions	-0.425969
-2.648062	n.a. Constant	-0.124939
-1.980222	Microsoft Constant	-0.124939
-0.601913	6.0f; Constant	-0.124939
-0.601913	places. Constant	-0.124939
-0.567273	heap manager	-0.124939
-2.491452	branch pattern	-0.124939
-0.124930	periodic pattern	-0.301030
-4.078968	the x86-64	-0.124939
-2.969484	and x86-64	-0.425969
-1.078842	_M_IX86 x86-64	-0.124939
-2.789663	that *p+2	-0.425969
-1.421333	calculate *p+2	-0.124939
-3.618258	and Watcom	-0.124939
-0.748052	Open Watcom	-0.124939
-1.300577	Codeplay Watcom	-0.124939
-3.794788	a round	-0.124939
-3.766375	to round	-0.124939
-2.184877	at round	-0.124939
-2.945363	more cores,	-0.124939
-2.807327	CPU cores,	-0.124939
-2.607478	multiple cores,	-0.124939
-1.078741	RISC cores,	-0.124939
-2.788289	that chooses	-0.425969
-2.861943	program chooses	-0.124939
-2.344400	always chooses	-0.124939
-3.163367	is running.	-0.124939
-3.414453	are running.	-0.124939
-2.945610	when running.	-0.124939
-3.166689	is serial	-0.124939
-0.204104	Transforming serial	-0.425969
-1.782277	from cc	-0.726999
-2.633519	// Header	-0.425969
-2.716116	set Header	-0.124939
-0.601947	12.2. Header	-0.124939
-3.462566	that 150	-0.124939
-2.707212	page 150	-0.124939
-1.300297	.......................................................................................... 150	-0.124939
-1.078741	....................................................................................................... 150	-0.124939
-2.584307	efficient thanks	-0.124939
-2.014255	automatically thanks	-0.124939
-1.600742	similar thanks	-0.124939
-1.504563	fragmented thanks	-0.124939
-2.207363	= 2.0;	-0.249877
-3.206117	the pipeline	-0.124939
-3.810938	a pipeline	-0.124939
-3.078083	int n)	-0.124939
-1.110136	(int n)	-0.602060
-1.418422	user input.	-0.124939
-1.203801	mouse input.	-0.124939
-3.057257	compiler 8.1	-0.124939
-2.503848	table 8.1	-0.124939
-1.960863	Table 8.1	-0.124939
-1.078741	66 8.1	-0.124939
-2.314265	case conditions.	-0.124939
-2.104577	hardware conditions.	-0.124939
-1.203533	worst-case conditions.	-0.124939
-0.601913	best-case conditions.	-0.124939
-2.530791	by choosing	-0.124939
-2.945610	when choosing	-0.124939
-1.962318	programmer choosing	-0.124939
-2.021248	page 146	-0.425969
-1.830814	are: 146	-0.124939
-0.601947	libraries............................................................................ 146	-0.124939
-2.818223	functions ..............................................................................................	-0.124939
-1.942726	types ..............................................................................................	-0.124939
-1.922084	directives ..............................................................................................	-0.124939
-1.901187	control ..............................................................................................	-0.124939
-3.246643	function _mm256_zeroupper()	-0.124939
-1.492632	call _mm256_zeroupper()	-0.602060
-1.600929	user. Making	-0.124939
-0.601969	13 Making	-0.425969
-0.601947	improvements. Making	-0.124939
-3.455353	the flush-to-zero	-0.425969
-0.681177	Set flush-to-zero	-0.124939
-3.779216	a Taylor	-0.124939
-3.136933	as Taylor	-0.124939
-0.601913	12.9b. Taylor	-0.124939
-0.601913	12.9a. Taylor	-0.124939
-2.566185	* SelectAddMul_pointer	-0.124939
-1.504563	2) SelectAddMul_pointer	-0.124939
-1.300737	8) SelectAddMul_pointer	-0.124939
-1.078741	5) SelectAddMul_pointer	-0.124939
-4.078968	the dispatcher.	-0.124939
-3.794788	a dispatcher.	-0.124939
-2.116442	CPU dispatcher.	-0.425969
-0.391196	Gnu, Clang,	-0.425969
-3.601075	and 14.9	-0.124939
-2.875933	Example 14.9	-0.124939
-1.078741	141 14.9	-0.124939
-0.902797	int)u; 14.9	-0.124939
-3.163926	on n,	-0.124939
-2.297534	constant n,	-0.124939
-0.982113	x, n,	-0.425969
-2.875933	Example 14.8	-0.124939
-2.693854	example 14.8	-0.124939
-1.078741	platform. 14.8	-0.124939
-1.078741	140 14.8	-0.124939
-3.756635	of overflow,	-0.124939
-3.486050	for overflow,	-0.124939
-2.734494	integer overflow,	-0.124939
-2.189763	cause overflow,	-0.124939
-1.203961	100; x++)	-0.425969
-1.555398	n; x++)	-0.124939
-0.601947	i--, x++)	-0.124939
-3.414453	are optimal.	-0.124939
-2.426797	not optimal.	-0.124939
-2.886896	from optimal.	-0.124939
-0.124930	_mm_storeu_si128((__m128i *)d,	-0.301030
-0.601980	_mm_store_si128((__m128i *)d,	-0.124939
-3.794788	a class,	-0.124939
-0.902999	Vector class,	-0.124939
-1.555850	derived class,	-0.124939
-2.909459	} z	-0.124939
-1.960718	&& z	-0.124939
-0.902797	cos(x); z	-0.124939
-0.902797	sin(x); z	-0.124939
-2.559206	in advance	-0.249877
-1.747932	vector c:	-0.249877
-3.163367	is guaranteed	-0.425969
-3.414453	are guaranteed	-0.124939
-3.110422	not guaranteed	-0.124939
-3.042798	may think	-0.124939
-2.908797	then think	-0.124939
-2.289915	I think	-0.124939
-1.997805	don't think	-0.124939
-2.135306	an example.	-0.124939
-2.990146	this example.	-0.124939
-3.494484	The older	-0.124939
-3.179892	with older	-0.124939
-3.159735	on older	-0.124939
-1.715124	On older	-0.124939
-3.819753	is commonly	-0.124939
-2.005740	most commonly	-0.425969
-2.607710	two commonly	-0.124939
-4.055779	the queue	-0.124939
-3.779216	a queue	-0.124939
-2.917566	A queue	-0.124939
-0.902797	FIFO queue	-0.124939
-4.055779	the {}	-0.124939
-2.180379	inside {}	-0.124939
-1.078741	5) {}	-0.124939
-0.601913	vector() {}	-0.124939
-2.649750	+ 1.0f;	-0.124939
-1.249647	+= 1.0f;	-0.301030
-0.380179	ret ALIGN	-0.124939
-0.204104	assembly: ALIGN	-0.425969
-2.724428	no modification	-0.124939
-1.680743	need modification	-0.124939
-2.091163	certain modification	-0.124939
-1.601042	similar solutions	-0.124939
-0.204097	Possible solutions	-0.124939
-0.601947	hybrid solutions	-0.124939
-1.294161	optimization guide	-0.726999
-4.078968	the appendix	-0.124939
-2.388932	an appendix	-0.425969
-2.242579	An appendix	-0.124939
-3.505314	The 17	-0.124939
-1.379872	Number 17	-0.124939
-0.204097	157 17	-0.425969
-4.078968	the empty	-0.124939
-3.505314	The empty	-0.124939
-2.388932	an empty	-0.425969
-2.565871	and maintenance	-0.124939
-3.185067	with 1:	-0.124939
-2.315129	case 1:	-0.124939
-1.283007	parameter 1:	-0.124939
-0.601969	11 Out	-0.425969
-0.601947	First-In-Last- Out	-0.124939
-0.601947	First-In-First- Out	-0.124939
-3.144495	a protected	-0.425969
-3.110004	to protected	-0.425969
-1.555285	allocation. Container	-0.124939
-0.204097	9.7 Container	-0.425969
-0.601947	threads? Container	-0.124939
-3.136933	as alternatives	-0.124939
-2.584307	efficient alternatives	-0.124939
-2.565293	possible alternatives	-0.124939
-1.979930	various alternatives	-0.124939
-3.756635	of modifications	-0.124939
-3.209807	by modifications	-0.124939
-1.901625	your modifications	-0.124939
-1.856014	require modifications	-0.124939
-1.249647	+= i_div_3;	-0.124939
-1.856377	i, i_div_3;	-0.124939
-3.294053	= s;	-0.124939
-2.386035	int s;	-0.124939
-0.902864	__m128 s;	-0.124939
-0.204104	"worst case"	-0.124939
-0.204104	"best case"	-0.124939
-2.700769	to distinguish	-0.249877
-3.505314	The missing	-0.124939
-2.741447	are missing	-0.425969
-2.921819	A missing	-0.124939
-1.680604	Optimizing subroutines	-0.124939
-0.124930	"Optimizing subroutines	-0.602060
-1.300623	development tools	-0.124939
-1.600929	metaprogramming tools	-0.124939
-1.504245	profiling tools	-0.124939
-3.302196	= 0x2710	-0.124939
-1.504911	address 0x2710	-0.301030
-0.664190	hot spot	-0.124939
-2.837066	The powN	-0.425969
-3.238707	if powN	-0.124939
-2.714557	class powN	-0.124939
-3.449723	the C-style	-0.124939
-3.315431	// C-style	-0.124939
-1.879118	old C-style	-0.124939
-2.231294	language While	-0.124939
-2.155730	functions. While	-0.124939
-0.902797	language". While	-0.124939
-0.601913	others. While	-0.124939
-1.459053	}; Bitfield	-0.425969
-1.880135	union Bitfield	-0.124939
-1.715098	struct Bitfield	-0.124939
-3.106933	to clean	-0.425969
-2.700127	most clean	-0.124939
-2.262447	must clean	-0.124939
-1.446279	-fpic according	-0.124939
-1.446572	representation according	-0.124939
-1.078888	behave according	-0.124939
-0.902797	Now, according	-0.124939
-3.315431	// Bounds	-0.124939
-0.204097	14.2 Bounds	-0.425969
-0.601947	fragmentation. Bounds	-0.124939
-0.902977	u; u.i	-0.124939
-1.555398	n; u.i	-0.124939
-1.300577	nonzero u.i	-0.124939
-3.779216	a dramatic	-0.124939
-2.945363	more dramatic	-0.124939
-2.498453	very dramatic	-0.124939
-2.075529	quite dramatic	-0.124939
-2.388932	an IDE.	-0.124939
-1.901057	own IDE.	-0.124939
-1.446366	Studio IDE.	-0.124939
-3.494484	The lengths	-0.124939
-2.833001	different lengths	-0.124939
-2.539124	variable lengths	-0.124939
-0.601913	great lengths	-0.124939
-3.110422	not expensive.	-0.124939
-1.805140	very expensive.	-0.425969
-2.443774	less expensive.	-0.124939
-3.114994	of efficiency.	-0.124939
-3.649399	in efficiency.	-0.124939
-1.901959	improve efficiency.	-0.124939
-0.944324	20 Copyright	-0.425969
-0.601947	www.agner.org/optimize. Copyright	-0.124939
-0.601947	Denmark. Copyright	-0.124939
-3.805008	is extended	-0.124939
-3.421802	be extended	-0.124939
-3.405232	are extended	-0.124939
-3.073352	an extended	-0.124939
-2.741224	cache size)	-0.124939
-1.776682	>= size)	-0.124939
-0.204097	(line size)	-0.124939
-0.902797	smmintrin.h (Gnu)	-0.124939
-0.601913	fma4intrin.h (Gnu)	-0.124939
-0.601913	xopintrin.h (Gnu)	-0.124939
-0.601913	x86intrin.h (Gnu)	-0.124939
-3.805008	is contained	-0.124939
-3.779216	a contained	-0.124939
-3.752625	to contained	-0.124939
-1.379332	completely contained	-0.124939
-3.771380	of transferring	-0.124939
-2.826452	for transferring	-0.124939
-3.215103	by transferring	-0.124939
-1.961980	efficient. Access	-0.124939
-0.204097	9.9 Access	-0.425969
-0.601947	locally. Access	-0.124939
-2.575782	for saving	-0.124939
-0.601980	frame, saving	-0.124939
-2.562115	many years	-0.124939
-2.221220	several years	-0.124939
-1.379478	six years	-0.124939
-0.902797	ten years	-0.124939
-1.984824	double y,	-0.425969
-0.982160	x, y,	-0.425969
-3.114994	of structured	-0.124939
-3.185067	with structured	-0.124939
-3.163926	on structured	-0.124939
-2.369381	compiler documentation	-0.425969
-1.379532	poor documentation	-0.124939
-1.078842	Microprocessor documentation	-0.124939
-3.007172	{ CChild1	-0.124939
-2.022037	class CChild1	-0.425969
-0.902864	Object2; CChild1	-0.124939
-1.300297	Mars PGI	-0.124939
-0.902797	(The PGI	-0.124939
-0.601913	tolerated. PGI	-0.124939
-0.601913	2007. PGI	-0.124939
-1.714100	100 As	-0.124939
-0.601913	pow(x,n) As	-0.124939
-0.601913	perfectly. As	-0.124939
-0.601913	5). As	-0.124939
-0.391189	identification (RTTI)	-0.124939
-2.118970	by default,	-0.124939
-1.570667	double xpow10(double	-0.726999
-0.090173	a1, a2,	-0.726999
-1.930526	at inconvenient	-0.301030
-2.593253	also inconvenient	-0.124939
-2.348589	be expressed	-0.726999
-3.449723	the bottleneck	-0.425969
-3.794788	a bottleneck	-0.124939
-2.231583	specific bottleneck	-0.124939
-4.055779	the directive	-0.124939
-3.779216	a directive	-0.124939
-1.855575	#define directive	-0.124939
-0.601913	__assume_aligned directive	-0.124939
-2.095355	If not,	-0.425969
-2.272688	does not,	-0.124939
-1.300804	Does not,	-0.124939
-2.892991	a scarce	-0.124939
-1.879778	were scarce	-0.124939
-2.201085	Example 12.4b	-0.124939
-2.005572	example 12.4b	-0.124939
-3.455353	the lrint	-0.124939
-2.386854	int lrint	-0.425969
-2.607478	multiple versions.	-0.124939
-2.605607	two versions.	-0.124939
-2.600118	64-bit versions.	-0.124939
-2.332551	dynamic versions.	-0.124939
-2.345869	access .............................................................................................	-0.124939
-2.188890	classes .............................................................................................	-0.124939
-2.130032	operators .............................................................................................	-0.124939
-2.030208	multiplication .............................................................................................	-0.124939
-1.446479	16. Alignment	-0.124939
-0.505082	9.5 Alignment	-0.425969
-0.601947	7.2. Alignment	-0.124939
-3.805008	is going	-0.124939
-3.756635	of going	-0.124939
-3.104883	not going	-0.124939
-2.943090	when going	-0.124939
-2.969484	and underflow	-0.124939
-3.077232	an underflow	-0.124939
-2.801031	point underflow	-0.124939
-0.187080	live ranges	-0.425969
-3.618258	and splitting	-0.124939
-2.837066	The splitting	-0.425969
-1.879344	were splitting	-0.124939
-3.206117	the user's	-0.301030
-1.963071	end user's	-0.124939
-3.110422	not __INTEL_COMPILER	-0.124939
-2.190641	Linux __INTEL_COMPILER	-0.124939
-0.505082	__INTEL_COMPILER __INTEL_COMPILER	-0.124939
-2.757906	be cleaned	-0.124939
-3.414453	are cleaned	-0.124939
-1.980802	resources cleaned	-0.124939
-3.819753	is cached.	-0.124939
-3.430994	be cached.	-0.124939
-2.426797	not cached.	-0.124939
-3.618258	and video	-0.124939
-3.272416	or video	-0.124939
-0.505082	RGB video	-0.425969
-2.559206	in aa:	-0.425969
-2.607444	number information.	-0.124939
-2.298105	available information.	-0.124939
-2.014255	handling information.	-0.124939
-1.831483	relevant information.	-0.124939
-2.188890	classes Whenever	-0.124939
-2.076257	used. Whenever	-0.124939
-1.601034	problem. Whenever	-0.124939
-1.203533	better. Whenever	-0.124939
-4.078968	the area	-0.124939
-2.191227	memory area	-0.425969
-2.869446	data area	-0.124939
-4.078968	the consequence	-0.124939
-2.837066	The consequence	-0.124939
-1.078842	unfortunate consequence	-0.124939
-1.984824	double a1,	-0.425969
-0.505118	y, a1,	-0.425969
-3.805008	is unsigned.	-0.124939
-3.752625	to unsigned.	-0.124939
-3.263365	or unsigned.	-0.124939
-3.235581	if unsigned.	-0.124939
-3.185067	with pointers.	-0.124939
-1.962318	char pointers.	-0.124939
-0.902909	invalid pointers.	-0.124939
-2.707212	page 26	-0.124939
-2.718452	floating 26	-0.124939
-0.601913	constructs........................................................................ 26	-0.124939
-0.601913	storage............................................................................. 26	-0.124939
-1.679630	possible. Smaller	-0.124939
-1.503978	software. Smaller	-0.124939
-1.078741	caching. Smaller	-0.124939
-0.601913	microcontrollers: Smaller	-0.124939
-2.707212	page 29	-0.124939
-1.446132	int64_t 29	-0.124939
-1.379478	column 29	-0.124939
-0.601913	operators............................................................................... 29	-0.124939
-3.752625	to sum2	-0.124939
-3.601075	and sum2	-0.124939
-1.878764	0, sum2	-0.124939
-0.902797	list[i]; sum2	-0.124939
-3.294053	= (n	-0.124939
-2.548304	if (n	-0.124939
-2.261210	while (n	-0.124939
-3.496332	for (b	-0.124939
-3.294053	= (b	-0.124939
-2.548304	if (b	-0.124939
-2.530791	by 2n	-0.425969
-3.185067	with 2n	-0.124939
-3.064377	than 2n	-0.124939
-2.726769	no idea	-0.124939
-1.249700	good idea	-0.602060
-2.866797	data ......................................................................................................	-0.124939
-2.376122	pointers ......................................................................................................	-0.124939
-2.345869	access ......................................................................................................	-0.124939
-1.746285	database ......................................................................................................	-0.124939
-3.649399	in C,	-0.124939
-1.407742	standard C,	-0.425969
-1.555511	include C,	-0.124939
-3.309290	// Same	-0.124939
-0.601913	12.4c. Same	-0.124939
-0.601913	12.4e. Same	-0.124939
-0.601913	12.4d. Same	-0.124939
-2.660871	can disable	-0.425969
-2.741018	should disable	-0.124939
-0.902864	test. disable	-0.124939
-3.449723	the assumption	-0.425969
-3.077232	an assumption	-0.124939
-2.540694	any assumption	-0.124939
-3.163367	is treated	-0.425969
-2.591400	also treated	-0.124939
-2.091501	simply treated	-0.124939
-1.941850	compilers. Fastcall	-0.124939
-0.601913	optimize(...) Fastcall	-0.124939
-0.601913	/vms Fastcall	-0.124939
-0.601913	compiler). Fastcall	-0.124939
-1.901282	vectors RGB	-0.124939
-0.601992	Aligning RGB	-0.425969
-0.601947	root, RGB	-0.124939
-3.601075	and avoids	-0.124939
-3.462566	that avoids	-0.124939
-3.271200	it avoids	-0.124939
-2.773330	but avoids	-0.124939
-3.166689	is prevented	-0.425969
-2.759882	be prevented	-0.124939
-2.818223	functions .......................................................................................	-0.124939
-1.900603	platform .......................................................................................	-0.124939
-1.746431	algorithm .......................................................................................	-0.124939
-1.379625	throughput .......................................................................................	-0.124939
-3.805008	is seldom	-0.124939
-3.462566	that seldom	-0.124939
-2.884088	from seldom	-0.124939
-1.998097	put seldom	-0.124939
-2.826452	for mixing	-0.124939
-3.163926	on mixing	-0.124939
-2.945610	when mixing	-0.124939
-0.380166	7.12 Branches	-0.425969
-0.902864	15.1b. Branches	-0.124939
-0.601947	penalty. Branches	-0.124939
-4.055779	the double.	-0.124939
-3.805008	is double.	-0.124939
-2.605607	two double.	-0.124939
-0.601913	long, double.	-0.124939
-2.875933	Example 16.1	-0.124939
-2.693854	example 16.1	-0.124939
-1.078741	153 16.1	-0.124939
-0.902797	below) 16.1	-0.124939
-2.417865	for (r	-0.726999
-3.805008	is 50%	-0.124939
-2.812734	only 50%	-0.124939
-1.679777	true 50%	-0.124939
-1.555423	mispredicted 50%	-0.124939
-2.892991	a suboptimal	-0.301030
-3.780575	to suboptimal	-0.124939
-0.391189	memcpy 16kB	-0.425969
-2.833001	different tasks.	-0.124939
-2.304235	simple tasks.	-0.124939
-1.998243	mathematical tasks.	-0.124939
-0.601913	distinct tasks.	-0.124939
-4.055779	the image	-0.124939
-3.601075	and image	-0.124939
-3.405232	are image	-0.124939
-1.203680	RGB image	-0.124939
-4.078968	the worst-case	-0.124939
-1.714760	testing worst-case	-0.124939
-0.944324	under worst-case	-0.124939
-1.679777	true (1)	-0.124939
-0.902797	problem: (1)	-0.124939
-0.601913	object: (1)	-0.124939
-0.601913	Step (1)	-0.124939
-3.779216	a float,	-0.124939
-3.756635	of float,	-0.124939
-2.494002	between float,	-0.124939
-1.300444	int, float,	-0.124939
-2.005018	example 9.5	-0.124939
-1.776456	variables. 9.5	-0.124939
-1.203667	88 9.5	-0.124939
-2.207363	= Induction;	-0.249877
-3.752625	to uncached	-0.124939
-3.405232	are uncached	-0.124939
-3.073352	an uncached	-0.124939
-2.241715	An uncached	-0.124939
-4.055779	the individual	-0.124939
-3.752625	to individual	-0.124939
-3.209807	by individual	-0.124939
-1.379478	identify individual	-0.124939
-3.752625	to begin	-0.124939
-3.462566	that begin	-0.124939
-3.345647	can begin	-0.124939
-2.261748	must begin	-0.124939
-1.258869	user interface.	-0.124939
-2.875933	Example 9.3	-0.124939
-2.503848	table 9.3	-0.124939
-1.379332	87 9.3	-0.124939
-0.902797	CPUs". 9.3	-0.124939
-2.044651	this option.	-0.124939
-1.446680	-fpic option.	-0.124939
-3.050988	the diagonal.	-0.425969
-3.618258	and interfaces	-0.124939
-1.673308	user interfaces	-0.124939
-2.105177	hardware interfaces	-0.124939
-1.753421	4 floats	-0.124939
-2.154867	four floats	-0.124939
-1.714534	100 floats	-0.124939
-3.106933	to another.	-0.124939
-3.272416	or another.	-0.124939
-3.064377	than another.	-0.124939
-2.386854	int N>	-0.425969
-0.380179	<int N>	-0.124939
-0.380179	7.19 Class	-0.425969
-0.380179	7.18 Class	-0.425969
-2.076402	program. Small	-0.124939
-1.203533	parallel. Small	-0.124939
-0.902797	controlled. Small	-0.124939
-0.902797	favorable: Small	-0.124939
-3.462566	that N1	-0.124939
-2.296505	constant N1	-0.124939
-1.855575	#define N1	-0.124939
-0.601913	#undef N1	-0.124939
-3.756635	of alloca	-0.124939
-2.943090	when alloca	-0.124939
-2.782528	which alloca	-0.124939
-1.600742	returns. alloca	-0.124939
-3.636150	and aliasing.	-0.124939
-1.715623	pointer aliasing.	-0.124939
-2.348589	be eliminated	-0.425969
-3.779216	a detailed	-0.124939
-3.486050	for detailed	-0.124939
-2.945363	more detailed	-0.124939
-2.462021	makes detailed	-0.124939
-1.997854	256 F32vec4	-0.124939
-0.505105	x^4 F32vec4	-0.124939
-1.203667	floats F32vec4	-0.124939
-3.272416	or mask	-0.124939
-2.241904	Use mask	-0.124939
-1.300758	__m128i mask	-0.425969
-3.206117	the original	-0.124939
-3.516422	The original	-0.124939
-2.771180	all caches	-0.124939
-2.405315	how caches	-0.124939
-2.015130	Most caches	-0.124939
-0.601913	other's caches	-0.124939
-3.780575	to recognize	-0.124939
-1.971348	will recognize	-0.602060
-0.505082	513 513	-0.124939
-0.601947	168.5 513	-0.124939
-0.601947	230.7 513	-0.124939
-1.203780	Threads Threads	-0.124939
-0.204097	7.29 Threads	-0.124939
-0.601947	Linux). Threads	-0.124939
-0.380179	7.27 Overloaded	-0.425969
-0.204104	7.26 Overloaded	-0.425969
-1.776540	2. Contentions	-0.124939
-0.902797	buffer. Contentions	-0.124939
-0.601913	(BTB). Contentions	-0.124939
-0.601913	experiments. Contentions	-0.124939
-3.163367	is illustrated	-0.425969
-3.430994	be illustrated	-0.124939
-3.142260	as illustrated	-0.124939
-1.702566	other words,	-0.249877
-3.505314	The returned	-0.124939
-2.757906	be returned	-0.425969
-3.414453	are returned	-0.124939
-3.505314	The existing	-0.124939
-3.185067	with existing	-0.124939
-2.388932	an existing	-0.124939
-2.402555	code. Let's	-0.124939
-1.679777	precision. Let's	-0.124939
-0.601913	rows. Let's	-0.124939
-0.601913	inputs. Let's	-0.124939
-2.585474	it is,	-0.124939
-2.605476	there is,	-0.124939
-1.963109	reason is,	-0.124939
-1.591625	example illustrates	-0.124939
-3.114994	of unit-testing	-0.124939
-3.215103	by unit-testing	-0.124939
-3.116700	This unit-testing	-0.124939
-0.346761	for(i=0; i<300;	-0.301030
-0.601980	for(i=i_div_3=0; i<300;	-0.124939
-1.504857	p) {return	-0.124939
-1.203827	r) {return	-0.124939
-0.601913	ReadB() {return	-0.124939
-0.601913	Sum1() {return	-0.124939
-2.166766	/ b2;	-0.124939
-0.505105	b1, b2;	-0.425969
-0.902864	B2 b2;	-0.124939
-1.504124	applications. Remember	-0.124939
-1.300297	names. Remember	-0.124939
-1.203533	better. Remember	-0.124939
-0.601913	down. Remember	-0.124939
-3.505314	The explicit	-0.124939
-2.388932	an explicit	-0.124939
-2.849923	make explicit	-0.124939
-4.055779	the mirror	-0.124939
-3.752625	to mirror	-0.124939
-3.042798	may mirror	-0.124939
-2.278341	its mirror	-0.124939
-2.736132	a dedicated	-0.249877
-1.236177	void Disp()	-0.726999
-1.973401	int r,	-0.726999
-2.201085	Example 8.26a	-0.124939
-2.005572	example 8.26a	-0.124939
-3.794788	a breakpoint	-0.124939
-0.902954	3 breakpoint	-0.124939
-1.504471	fixed breakpoint	-0.124939
-0.090173	a2, b1,	-0.425969
-3.462566	that appears	-0.124939
-3.271200	it appears	-0.124939
-2.982037	this appears	-0.124939
-2.241860	processor appears	-0.124939
-4.055779	the functionality	-0.124939
-2.103557	add functionality	-0.124939
-1.776248	desired functionality	-0.124939
-1.203680	well-defined functionality	-0.124939
-2.804514	other languages.	-0.124939
-1.642819	programming languages.	-0.124939
-0.601947	well-known languages.	-0.124939
-3.756635	of sequential	-0.124939
-3.642397	in sequential	-0.124939
-3.179892	with sequential	-0.124939
-0.902797	non- sequential	-0.124939
-1.930526	at www.agner.org/optimize/cppexamples.zip	-0.124939
-2.452363	See www.agner.org/optimize/cppexamples.zip	-0.124939
-1.228380	programming style.	-0.249877
-2.837066	The MOVNTQ	-0.425969
-3.315431	// MOVNTQ	-0.124939
-2.741224	cache MOVNTQ	-0.124939
-3.805008	is optimized.	-0.124939
-3.104883	not optimized.	-0.124939
-2.442779	less optimized.	-0.124939
-1.555130	fully optimized.	-0.124939
-2.866797	data .........................................................................................	-0.124939
-1.922084	directives .........................................................................................	-0.124939
-1.776102	vectorization .........................................................................................	-0.124939
-0.902797	topics .........................................................................................	-0.124939
-2.714557	class CHello	-0.124939
-1.224970	public CHello	-0.425969
-0.902864	Object2; CHello	-0.124939
-2.506711	be found	-0.301030
-1.642576	rarely found	-0.124939
-3.215103	by me	-0.124939
-0.204097	Let me	-0.425969
-0.601947	sent me	-0.124939
-4.055779	the counts.	-0.124939
-2.563361	clock counts.	-0.124939
-1.555130	subsequent counts.	-0.124939
-1.203533	case" counts.	-0.124939
-2.505295	performance measurement	-0.124939
-1.998097	put measurement	-0.124939
-1.776248	desired measurement	-0.124939
-0.601913	Your measurement	-0.124939
-2.607478	multiple layers	-0.124939
-2.497443	software layers	-0.124939
-2.221220	several layers	-0.124939
-1.997367	separate layers	-0.124939
-2.289871	error handler	-0.124939
-1.249736	exception handler	-0.124939
-3.163367	is coded	-0.425969
-3.430994	be coded	-0.124939
-3.414453	are coded	-0.124939
-3.618258	and changing	-0.124939
-2.530791	by changing	-0.124939
-1.804823	index changing	-0.124939
-3.449723	the unit-test	-0.124939
-3.794788	a unit-test	-0.124939
-2.986072	this unit-test	-0.124939
-4.078968	the implicit	-0.124939
-3.505314	The implicit	-0.124939
-2.388932	an implicit	-0.124939
-3.601075	and smaller.	-0.124939
-3.405232	are smaller.	-0.124939
-2.241860	bytes smaller.	-0.124939
-2.076694	files smaller.	-0.124939
-3.206117	the interval	-0.124939
-1.777116	desired interval	-0.124939
-4.055779	the 33	-0.124939
-1.555277	65 33	-0.124939
-0.902797	...................................................................................................................... 33	-0.124939
-0.601913	Booleans................................................................................................................... 33	-0.124939
-2.707212	page 31	-0.124939
-1.775956	variables. 31	-0.124939
-1.446572	ebx, 31	-0.124939
-1.203533	63 31	-0.124939
-0.391189	(unsigned int)b	-0.425969
-1.804131	platforms. 3.	-0.124939
-1.504271	interrupt 3.	-0.124939
-1.446572	vectorization. 3.	-0.124939
-0.601913	namespace. 3.	-0.124939
-1.879118	10 μs	-0.124939
-1.831039	5 μs	-0.124939
-0.601992	250 μs	-0.124939
-2.808142	other module.	-0.124939
-1.203890	another module.	-0.301030
-1.777565	Dynamic cast	-0.124939
-1.300590	Static cast	-0.124939
-0.601913	Reinterpret cast	-0.124939
-0.601913	Const cast	-0.124939
-3.142260	as 8-bit	-0.124939
-2.388932	an 8-bit	-0.425969
-2.691672	using 8-bit	-0.124939
-1.804485	size. Integers	-0.124939
-1.642435	sizes Integers	-0.124939
-0.380166	7.2 Integers	-0.425969
-2.376122	pointers Calling	-0.124939
-1.746870	class. Calling	-0.124939
-1.300297	5. Calling	-0.124939
-0.601913	exit. Calling	-0.124939
-0.505082	lrint (double	-0.425969
-0.902864	ipow (double	-0.124939
-0.601947	IntegerPower (double	-0.124939
-1.459153	}; Weekdays	-0.425969
-0.505102	enum Weekdays	-0.425969
-3.486050	for application-specific	-0.124939
-2.104431	store application-specific	-0.124939
-2.090935	optimizing application-specific	-0.124939
-1.600888	define application-specific	-0.124939
-2.223406	c first.	-0.124939
-1.715270	operand first.	-0.124939
-1.601181	come first.	-0.124939
-1.504124	fastest first.	-0.124939
-4.055779	the considerations	-0.124939
-3.209807	by considerations	-0.124939
-2.338334	following considerations	-0.124939
-0.902797	conflicting considerations	-0.124939
-2.507590	2 63	-0.124939
-2.232021	element 63	-0.124939
-1.300297	valid 63	-0.124939
-1.203533	63 63	-0.124939
-3.819753	is represented	-0.124939
-2.757906	be represented	-0.124939
-1.776908	fact represented	-0.124939
-3.752625	to force	-0.124939
-3.345647	can force	-0.124939
-2.658088	into force	-0.124939
-1.960863	applications force	-0.124939
-2.982037	this manually.	-0.124939
-2.698156	do manually.	-0.124939
-1.830606	reductions manually.	-0.124939
-0.902797	parentheses manually.	-0.124939
-3.430994	be identified	-0.124939
-2.741447	are identified	-0.425969
-2.557156	objects identified	-0.124939
-2.969951	in www.agner.org/optimize/cppexamples.zip.	-0.124939
-2.877841	at www.agner.org/optimize/cppexamples.zip.	-0.124939
-2.451962	See www.agner.org/optimize/cppexamples.zip.	-0.124939
-3.779216	a virus	-0.124939
-3.752625	to virus	-0.124939
-3.486050	for virus	-0.124939
-0.601913	Firewalls, virus	-0.124939
-3.618258	and structures.	-0.124939
-3.272416	or structures.	-0.124939
-2.177721	data structures.	-0.124939
-2.623881	library exp	-0.124939
-1.641842	Library exp	-0.124939
-1.203533	floats exp	-0.124939
-1.203680	exp exp	-0.124939
-1.984745	size (in	-0.425969
-2.189966	line (in	-0.124939
-1.680336	frequency (in	-0.124939
-3.140962	a pointer,	-0.124939
-1.601494	'this' pointer,	-0.124939
-0.601947	imported pointer,	-0.124939
-3.819753	is kept	-0.124939
-2.757906	be kept	-0.124939
-3.414453	are kept	-0.124939
-2.675268	double Y	-0.124939
-2.539124	variable Y	-0.124939
-2.515743	variables Y	-0.124939
-0.601913	Y; Y	-0.124939
-2.945610	when interprocedural	-0.124939
-2.699904	do interprocedural	-0.124939
-0.505105	enables interprocedural	-0.425969
-2.741447	are incompatible	-0.425969
-3.154529	code incompatible	-0.124939
-0.902864	use, incompatible	-0.124939
-1.998488	256 bytes)	-0.124939
-0.249860	(in bytes)	-0.301030
-3.449723	the selected	-0.124939
-3.819753	is selected	-0.124939
-3.430994	be selected	-0.124939
-3.819753	is multiplied	-0.124939
-2.757906	be multiplied	-0.425969
-1.804823	index multiplied	-0.124939
-2.969484	and reproducible	-0.124939
-2.948234	more reproducible	-0.124939
-2.155766	get reproducible	-0.124939
-3.405232	are normally	-0.124939
-3.104883	not normally	-0.124939
-3.113670	This normally	-0.124939
-2.367807	systems normally	-0.124939
-3.486050	for constants.	-0.124939
-2.945363	more constants.	-0.124939
-2.734494	integer constants.	-0.124939
-1.203533	defining constants.	-0.124939
-3.154529	code cache,	-0.124939
-2.177721	data cache,	-0.124939
-2.825970	same cache,	-0.124939
-3.142260	as entry	-0.124939
-2.221589	common entry	-0.124939
-0.806067	PLT entry	-0.124939
-3.805008	is inferior	-0.124939
-3.405232	are inferior	-0.124939
-3.073352	an inferior	-0.124939
-2.241715	An inferior	-0.124939
-2.759882	be obsolete.	-0.124939
-0.204104	Mostly obsolete.	-0.124939
-2.260148	while simultaneously	-0.124939
-2.250893	calculations simultaneously	-0.124939
-1.980368	running simultaneously	-0.124939
-1.300297	jobs simultaneously	-0.124939
-1.078942	service routine	-0.124939
-0.124930	initialization routine	-0.301030
-3.405232	are auto_ptr	-0.124939
-2.758797	one auto_ptr	-0.124939
-1.203533	one, auto_ptr	-0.124939
-0.601913	shared_ptr. auto_ptr	-0.124939
-2.491452	branch tree	-0.124939
-0.926438	binary tree	-0.301030
-3.166689	is unable	-0.425969
-2.759882	be unable	-0.425969
-1.555285	executed. Optimizes	-0.124939
-0.748097	vectorization. Optimizes	-0.124939
-1.078842	conventions. Optimizes	-0.124939
-2.108801	point constants,	-0.425969
-1.264563	string constants,	-0.124939
-3.449723	the techniques	-0.124939
-2.339264	following techniques	-0.124939
-2.014662	complicated techniques	-0.124939
-3.263365	or otherwise	-0.124939
-2.782528	which otherwise	-0.124939
-2.370279	they otherwise	-0.124939
-2.118087	would otherwise	-0.124939
-0.204097	7.9 Smart	-0.425969
-0.902864	deleted. Smart	-0.124939
-0.601947	auto_ptr. Smart	-0.124939
-3.271200	it opens	-0.124939
-3.237970	function opens	-0.124939
-2.782528	which opens	-0.124939
-2.714206	set opens	-0.124939
-2.757906	be modified	-0.425969
-3.414453	are modified	-0.124939
-1.997854	never modified	-0.124939
-2.005572	example 15.1a	-0.425969
-1.016890	reduced 15.1a	-0.425969
-1.804598	platforms. Comparison	-0.124939
-0.902864	8.1. Comparison	-0.124939
-0.204097	8.2 Comparison	-0.425969
-3.819753	is finished	-0.124939
-2.993562	have finished	-0.124939
-2.164184	has finished	-0.425969
-2.914958	is run.	-0.124939
-3.353252	can run.	-0.124939
-2.177721	data sequentially	-0.124939
-2.457697	stored sequentially	-0.124939
-2.271787	accessed sequentially	-0.124939
-2.884088	from Intel:	-0.124939
-2.402700	optimization Intel:	-0.124939
-1.203680	documentation Intel:	-0.124939
-0.902797	regularly. Intel:	-0.124939
-2.982037	this format.	-0.124939
-2.675268	double format.	-0.124939
-2.336883	file format.	-0.124939
-0.601913	OMF format.	-0.124939
-2.590276	C++ programs.	-0.124939
-2.178337	optimized programs.	-0.124939
-1.680069	16-bit programs.	-0.124939
-1.555863	oriented programs.	-0.124939
-1.504271	non-sequential manner	-0.124939
-0.902797	systematic manner	-0.124939
-0.601913	unconventional manner	-0.124939
-0.601913	column-wise manner	-0.124939
-2.699600	compilers work.	-0.124939
-2.342685	always work.	-0.124939
-2.280231	important work.	-0.124939
-1.922522	microprocessors work.	-0.124939
-3.263365	or uint64_t	-0.124939
-2.507590	2 uint64_t	-0.124939
-1.997221	256 uint64_t	-0.124939
-0.601913	264-1 uint64_t	-0.124939
-2.135158	if (level	-0.726999
-3.642397	in tests	-0.124939
-3.494484	The tests	-0.124939
-2.529139	some tests	-0.124939
-2.505295	performance tests	-0.124939
-1.379478	times. Then	-0.124939
-1.203533	support. Then	-0.124939
-1.203533	option. Then	-0.124939
-0.601913	F1? Then	-0.124939
-3.140962	a soft	-0.425969
-1.714534	so-called soft	-0.124939
-1.078842	FPGA soft	-0.124939
-2.207363	= 100,	-0.124939
-1.446366	reliable results.	-0.124939
-0.505082	reproducible results.	-0.124939
-0.902864	undesired results.	-0.124939
-3.779216	a hyperthreading	-0.124939
-2.958838	use hyperthreading	-0.124939
-2.789645	If hyperthreading	-0.124939
-2.252057	avoid hyperthreading	-0.124939
-3.618258	and operators.	-0.124939
-0.982113	overloaded operators.	-0.124939
-1.078955	decrement operators.	-0.124939
-3.163367	is simpler	-0.124939
-2.223051	much simpler	-0.124939
-1.922471	becomes simpler	-0.124939
-2.801031	point format	-0.124939
-1.642593	file format	-0.124939
-1.714760	right format	-0.124939
-3.140962	a reasonable	-0.124939
-2.814870	only reasonable	-0.124939
-2.724428	no reasonable	-0.124939
-4.055779	the resolution	-0.124939
-1.979784	high resolution	-0.124939
-1.901187	higher resolution	-0.124939
-0.902797	millisecond resolution	-0.124939
-2.734494	integer units,	-0.124939
-2.252057	execution units,	-0.124939
-1.961593	addition units,	-0.124939
-1.504271	arithmetic units,	-0.124939
-2.875933	Example 12.2	-0.124939
-2.270214	function. 12.2	-0.124939
-1.300297	107 12.2	-0.124939
-1.203533	126 12.2	-0.124939
-1.747932	vector b:	-0.249877
-2.866797	data processing.	-0.124939
-1.641988	parallel processing.	-0.124939
-1.203533	image processing.	-0.124939
-1.078888	multi-core processing.	-0.124939
-2.892991	a well-defined	-0.301030
-1.379972	behavior well-defined	-0.124939
-2.222227	// Still	-0.726999
-2.414951	- 45	-0.425969
-2.223614	i; 45	-0.124939
-0.601947	Loops...................................................................................................................... 45	-0.124939
-1.782277	from bb	-0.726999
-2.971430	in detail	-0.124939
-2.951125	more detail	-0.124939
-4.078968	the advices	-0.124939
-1.922246	Many advices	-0.124939
-1.379532	security advices	-0.124939
-2.587990	The conclusion	-0.301030
-3.166689	is deleted	-0.124939
-1.902134	later deleted	-0.124939
-4.103464	the 49	-0.124939
-2.021961	page 49	-0.124939
-2.556097	function parameters,	-0.124939
-2.398936	template parameters,	-0.124939
-2.403846	code. Storing	-0.124939
-1.747170	class. Storing	-0.124939
-1.714873	mode. Storing	-0.124939
-3.618258	and (set)	-0.124939
-2.993562	have (set)	-0.124939
-0.601947	formula: (set)	-0.124939
-3.618258	and fine-grained	-0.124939
-3.185067	with fine-grained	-0.124939
-0.601947	exploiting fine-grained	-0.124939
-1.504513	zero. Execution	-0.124939
-0.204104	3.16 Execution	-0.425969
-2.366658	= LoadVector(cc	-0.602060
-2.135752	an arbitrary	-0.124939
-1.747170	class. Which	-0.124939
-0.902864	values. Which	-0.124939
-0.902864	effect. Which	-0.124939
-3.045638	may behave	-0.124939
-2.701022	compilers behave	-0.124939
-2.344400	always behave	-0.124939
-2.424639	64 bits,	-0.124939
-2.339601	32 bits,	-0.124939
-2.154867	four bits,	-0.124939
-3.618258	and compact.	-0.124939
-2.948234	more compact.	-0.124939
-2.443774	less compact.	-0.124939
-2.789663	that behaves	-0.425969
-0.902931	Overflow behaves	-0.124939
-3.636150	and FPGA	-0.124939
-2.389731	an FPGA	-0.124939
-3.618258	and earlier	-0.124939
-3.185067	with earlier	-0.124939
-2.305573	& earlier	-0.124939
-1.794595	< 5)	-0.124939
-1.777116	>= 5)	-0.124939
-1.179173	operators &,	-0.602060
-2.710685	page 101	-0.124939
-1.300464	necessary. 101	-0.124939
-0.601947	Multithreading.............................................................................................................. 101	-0.124939
-1.388036	following reasons:	-0.301030
-2.468020	elements consecutively	-0.124939
-2.457697	stored consecutively	-0.124939
-2.271787	accessed consecutively	-0.124939
-1.961980	efficient. Extra	-0.124939
-1.962657	data. Extra	-0.124939
-1.446479	processor. Extra	-0.124939
-3.771380	of error.	-0.124939
-3.077232	an error.	-0.124939
-2.339264	programming error.	-0.124939
-3.430994	be carried	-0.124939
-1.879344	were carried	-0.124939
-0.601947	loop- carried	-0.124939
-3.215103	by reordering	-0.124939
-3.116700	This reordering	-0.124939
-2.986072	this reordering	-0.124939
-2.155429	another platform.	-0.124939
-2.031503	Mac platform.	-0.124939
-1.504471	PC platform.	-0.124939
-2.743430	are satisfied	-0.425969
-3.116033	not satisfied	-0.124939
-3.045638	may catch	-0.124939
-2.916823	will catch	-0.124939
-2.911401	} catch	-0.124939
-0.902864	-mAVX /arch:AVX	-0.124939
-0.601947	(multithreaded) /arch:AVX	-0.124939
-0.601947	(/arch:SSE2, /arch:AVX	-0.124939
-2.710685	page 93	-0.124939
-1.680336	containers 93	-0.124939
-1.078842	..................................................................................................... 93	-0.124939
-2.223614	virtual 53	-0.124939
-0.902864	........................................................................................ 53	-0.124939
-0.601947	(methods)......................................................................... 53	-0.124939
-1.504358	X #else	-0.124939
-1.300577	); #else	-0.124939
-0.902864	__attribute__((const)) #else	-0.124939
-2.389731	an addition.	-0.124939
-2.803340	point addition.	-0.124939
-2.397456	time. Text	-0.124939
-1.680110	classes. Text	-0.124939
-1.078842	Strings Text	-0.124939
-2.247959	with big-endian	-0.602060
-0.902864	B2; 54	-0.124939
-0.601947	........................................................................... 54	-0.124939
-0.601947	.............................................................................................................. 54	-0.124939
-3.618258	and 119	-0.124939
-1.714760	library. 119	-0.124939
-0.601947	vectors........................................................................ 119	-0.124939
-1.961417	&& !a	-0.124939
-1.942449	|| !a	-0.124939
-0.902864	a&&(b||c) !a	-0.124939
-2.868457	of abstraction	-0.124939
-1.380047	bits each,	-0.124939
-1.078974	>= 11)	-0.425969
-0.902931	(chapter 11)	-0.124939
-3.771380	of code).	-0.124939
-1.879118	binary code).	-0.124939
-0.601947	(byte code).	-0.124939
-2.062425	allocation ......................................................................................	-0.124939
-1.203780	unit-testing ......................................................................................	-0.124939
-0.902864	cycle? ......................................................................................	-0.124939
-3.455353	the wrong	-0.124939
-3.810938	a wrong	-0.124939
-2.366658	= LoadVector(bb	-0.602060
-3.780575	to alias	-0.124939
-2.427954	not alias	-0.124939
-1.937421	memory blocks,	-0.124939
-2.369753	user feedback	-0.124939
-1.831039	main feedback	-0.124939
-1.078842	User feedback	-0.124939
-1.158171	#define pure_function	-0.124939
-0.601980	Func1(double) pure_function	-0.124939
-3.108537	- a-a	-0.124939
-2.648794	n.a. a-a	-0.124939
-0.902864	a-(-b)=a+b a-a	-0.124939
-1.062760	dependency chains.	-0.124939
-3.496332	for prefetching	-0.124939
-1.980915	automatic prefetching	-0.124939
-1.203780	simultaneously prefetching	-0.124939
-4.078968	the compiler-generated	-0.124939
-3.618258	and compiler-generated	-0.124939
-1.300464	understand compiler-generated	-0.124939
-2.926114	A redesign	-0.124939
-0.380179	complete redesign	-0.425969
-0.380179	behave differently	-0.124939
-1.079022	behaves differently	-0.124939
-2.912038	then B,	-0.124939
-0.380179	A, B,	-0.425969
-0.505102	Optimizes reasonably	-0.425969
-1.078942	optimizes reasonably	-0.124939
-1.446479	core. Two	-0.124939
-1.078842	templates. Two	-0.124939
-0.601947	_mm_add_epi16(a,b). Two	-0.124939
-0.902864	.................................................................................. 55	-0.124939
-0.601947	-2.0 55	-0.124939
-0.601947	.................................................................................................................... 55	-0.124939
-0.793899	math libraries:	-0.301030
-2.659150	into projects	-0.124939
-2.591959	C++ projects	-0.124939
-2.499190	software projects	-0.124939
-0.492900	( 1)sign	-0.602060
-2.881985	Example 14.6	-0.124939
-2.445012	0; 14.6	-0.124939
-1.078842	137 14.6	-0.124939
-4.078968	the combination	-0.124939
-3.794788	a combination	-0.124939
-1.300577	OR combination	-0.124939
-3.246643	function libraries,	-0.124939
-1.181684	link libraries,	-0.425969
-2.242691	doesn't mean	-0.124939
-1.901621	numbers mean	-0.124939
-1.078842	brackets mean	-0.124939
-3.060224	compiler inserts	-0.124939
-2.416315	often inserts	-0.124939
-1.804823	profiler inserts	-0.124939
-0.380179	_mm_loadu_si128((__m128i const*)p);	-0.425969
-0.601980	_mm_load_si128((__m128i const*)p);	-0.124939
-3.794788	a hidden	-0.124939
-3.430994	be hidden	-0.124939
-1.922021	actually hidden	-0.124939
-2.649527	n.a. x*x*x*x*x*x*x*x	-0.124939
-0.204104	((a*x+b)*x+c)*x+d x*x*x*x*x*x*x*x	-0.425969
-2.196187	from errors.	-0.124939
-2.586446	such errors.	-0.124939
-2.091388	memory. One	-0.124939
-1.300464	tools. One	-0.124939
-1.078842	once. One	-0.124939
-0.982128	square blocking	-0.124939
-0.601980	Square blocking	-0.124939
-2.634807	// Faster	-0.425969
-1.078942	elsewhere. Faster	-0.124939
-1.555585	typical sources	-0.124939
-0.806068	frequent sources	-0.425969
-3.766375	to well-tested	-0.124939
-3.215103	by well-tested	-0.124939
-2.131168	contains well-tested	-0.124939
-1.504704	small devices,	-0.124939
-1.446760	smallest devices,	-0.124939
-2.801031	point multiplication,	-0.124939
-0.902864	subtraction, multiplication,	-0.124939
-0.601947	(addition, multiplication,	-0.124939
-1.493804	AVX part.	-0.425969
-2.047388	particular part.	-0.124939
-3.272416	or API	-0.124939
-2.337805	system API	-0.124939
-2.104726	standard API	-0.124939
-2.171234	program starts	-0.124939
-1.831553	computer starts	-0.124939
-3.154529	code only.	-0.124939
-1.944256	parts only.	-0.124939
-0.902864	multiplications only.	-0.124939
-2.106254	loop counters,	-0.124939
-2.610850	multiple counters,	-0.124939
-2.864769	program execution,	-0.124939
-1.049138	out-of-order execution,	-0.124939
-2.445012	0; list[i+1]	-0.124939
-1.203667	i_div_3; list[i+1]	-0.124939
-0.601947	=0; list[i+1]	-0.124939
-4.078968	the distance	-0.124939
-2.986072	this distance	-0.124939
-0.902864	whose distance	-0.124939
-2.888122	Example 14.28	-0.124939
-2.005572	example 14.28	-0.124939
-3.766375	to zero,	-0.124939
-0.601947	towards zero,	-0.124939
-0.601947	select_gt(b, zero,	-0.124939
-2.445012	0; r1	-0.124939
-1.805275	Loop r1	-0.124939
-1.446818	SIZE; r1	-0.124939
-1.805275	Loop r2	-0.124939
-1.078842	r1; r2	-0.124939
-0.601947	r1+1; r2	-0.124939
-0.902864	ammintrin.h (MS)	-0.124939
-0.601947	intrin.h (MS)	-0.124939
-0.601947	nmmintrin.h (MS)	-0.124939
-3.771380	of aligning	-0.124939
-3.496332	for aligning	-0.124939
-2.886896	from aligning	-0.124939
-3.496332	for assuming	-0.124939
-3.414453	are assuming	-0.124939
-2.886896	from assuming	-0.124939
-3.302196	= r;	-0.124939
-1.794595	< r;	-0.425969
-1.158219	range analysis	-0.124939
-0.601980	thorough analysis	-0.124939
-2.355023	may seem	-0.124939
-1.831394	tested seem	-0.124939
-2.973437	and perhaps	-0.124939
-1.643054	except perhaps	-0.124939
-0.806084	interrupt service	-0.124939
-0.601980	Interrupt service	-0.124939
-2.281043	Gnu Comes	-0.124939
-1.980689	Microsoft Comes	-0.124939
-0.601947	Embarcadero Comes	-0.124939
-1.700547	+ esp	-0.124939
-1.673384	new features.	-0.124939
-2.243524	processor features.	-0.124939
-2.659659	b ---xx----	-0.124939
-0.204104	(-a==-b)=(a==b) ---xx----	-0.124939
-2.156555	parameters ...............................................................................................	-0.124939
-2.091501	optimizing ...............................................................................................	-0.124939
-1.555511	usability ...............................................................................................	-0.124939
-4.078968	the C/C++	-0.124939
-3.505314	The C/C++	-0.124939
-1.203667	Watcom C/C++	-0.124939
-3.302196	= 100.	-0.124939
-1.794595	< 100.	-0.124939
-3.078083	int b;};	-0.124939
-1.984824	double b;};	-0.425969
-3.469094	that consumes	-0.124939
-2.784760	which consumes	-0.124939
-1.746944	still consumes	-0.124939
-2.154867	four numbers,	-0.124939
-1.997854	model numbers,	-0.124939
-0.902864	hexadecimal numbers,	-0.124939
-1.555511	critical. 129	-0.124939
-1.078842	129 129	-0.124939
-0.601947	17.4 129	-0.124939
-2.859733	to reload	-0.301030
-4.078968	the 124	-0.124939
-0.601947	cases........................................................................................................ 124	-0.124939
-0.601947	.................................................................................... 124	-0.124939
-2.213049	code motion	-0.124939
-3.810938	a speed-critical	-0.124939
-2.828682	for speed-critical	-0.124939
-3.771380	of numbers:	-0.124939
-2.801031	point numbers:	-0.124939
-1.714534	100 numbers:	-0.124939
-1.830814	section (page	-0.124939
-1.601042	chapter (page	-0.124939
-1.203780	8.1 (page	-0.124939
-3.780575	to 12.	-0.124939
-0.902946	chapter 12.	-0.124939
-2.425987	take 1000	-0.124939
-0.505102	repeats 1000	-0.425969
-3.414453	are long.	-0.124939
-1.961642	too long.	-0.124939
-1.300804	unacceptably long.	-0.124939
-2.743272	cache organization	-0.124939
-0.748124	Cache organization	-0.124939
-2.916830	is slow,	-0.124939
-3.166689	is performed	-0.425969
-3.440385	be performed	-0.124939
-3.794788	a high-level	-0.124939
-2.261210	while high-level	-0.124939
-1.855412	advanced high-level	-0.124939
-2.718555	in advance.	-0.301030
-2.062650	fast anyway	-0.124939
-1.746719	database anyway	-0.124939
-1.555511	default anyway	-0.124939
-1.931645	library (*.dll	-0.425969
-2.405379	libraries (*.dll	-0.124939
-3.505314	The Intel-based	-0.124939
-3.142260	as Intel-based	-0.124939
-1.203667	BSD, Intel-based	-0.124939
-3.636150	and main()	-0.124939
-2.386854	int main()	-0.425969
-3.294053	= x2	-0.124939
-2.677436	double x2	-0.124939
-2.612687	float x2	-0.124939
-2.337805	system database,	-0.124939
-1.300464	remote database,	-0.124939
-1.300464	list, database,	-0.124939
-1.804598	platforms. Works	-0.124939
-0.601947	MKL). Works	-0.124939
-0.601947	(IPP). Works	-0.124939
-3.771380	of calculations:	-0.124939
-3.496332	for calculations:	-0.124939
-1.203667	modulo calculations:	-0.124939
-4.078968	the basis	-0.124939
-0.601947	(FIFO) basis	-0.124939
-0.601947	(FILO) basis	-0.124939
-3.505314	The updating	-0.124939
-2.377727	need updating	-0.124939
-1.642774	Automatic updating	-0.124939
-2.869446	data manipulation	-0.124939
-2.436964	bit manipulation	-0.124939
-1.961980	string manipulation	-0.124939
-2.619170	= 28.	-0.124939
-2.608910	number 28.	-0.124939
-2.714557	class C0	-0.124939
-1.922584	public C0	-0.124939
-0.902864	obj1; C0	-0.124939
-2.062199	optimize access,	-0.124939
-1.379532	base access,	-0.124939
-0.601947	First-In-Last-Out access,	-0.124939
-3.771380	of calculations,	-0.124939
-2.118090	doing calculations,	-0.124939
-1.998643	mathematical calculations,	-0.124939
-3.496332	for multi-core	-0.124939
-3.272416	or multi-core	-0.124939
-3.163926	on multi-core	-0.124939
-2.741224	cache level,	-0.124939
-2.338141	file level,	-0.124939
-1.680223	priority level,	-0.124939
-2.880052	at Exception	-0.124939
-1.317760	handling Exception	-0.425969
-3.766375	to optimization,	-0.124939
-2.861943	program optimization,	-0.124939
-0.902864	Without optimization,	-0.124939
-2.211236	etc. Whether	-0.124939
-1.555398	expressions. Whether	-0.124939
-0.601947	Sum3. Whether	-0.124939
-3.455353	the contents	-0.425969
-1.642656	entire contents	-0.124939
-2.607710	two books	-0.124939
-1.831717	relevant books	-0.124939
-0.601947	group books	-0.124939
-3.819753	is removed	-0.124939
-3.430994	be removed	-0.124939
-3.116700	This removed	-0.124939
-2.710685	page 164	-0.124939
-1.078842	.......................................................................................................... 164	-0.124939
-0.601947	www.gnu.org/copyleft/fdl.html. 164	0.000000
-1.664558	float matrix[rows][columns];	-0.602060
-0.969975	linked list.	-0.124939
-3.516422	The exponential	-0.124939
-0.204104	logarithms, exponential	-0.425969
-3.505314	The generality	-0.124939
-3.496332	for generality	-0.124939
-1.679885	full generality	-0.124939
-2.586201	it takes.	-0.124939
-2.334055	part takes.	-0.124939
-3.794788	a multithreaded	-0.124939
-2.576850	In multithreaded	-0.124939
-2.091501	optimizing multithreaded	-0.124939
-1.334278	1; list[i+2]	-0.425969
-1.203801	i_div_3; list[i+2]	-0.124939
-2.680877	size Total	-0.124939
-1.771859	elements Total	-0.425969
-3.274736	it explicitly.	-0.124939
-3.154529	code explicitly.	-0.124939
-2.403959	optimization explicitly.	-0.124939
-2.804514	other programs,	-0.124939
-2.530822	some programs,	-0.124939
-1.680336	16-bit programs,	-0.124939
-3.274736	it optimizes	-0.124939
-3.060224	compiler optimizes	-0.124939
-1.446366	Studio optimizes	-0.124939
-1.504513	profiling instruments	-0.124939
-0.505102	measurement instruments	-0.124939
-2.634807	// After	-0.425969
-1.504513	branch. After	-0.124939
-1.133332	reductions involving	-0.124939
-1.504911	Conversions involving	-0.124939
-0.948801	XMM (vector)	-0.602060
-4.078968	the unfortunate	-0.124939
-3.819753	is unfortunate	-0.124939
-3.077232	an unfortunate	-0.124939
-0.492890	2: "Optimizing	-0.602060
-4.078968	the parameter,	-0.124939
-3.794788	a parameter,	-0.124939
-3.242285	function parameter,	-0.124939
-2.886896	from exceptions.	-0.124939
-2.691672	using exceptions.	-0.124939
-2.105177	hardware exceptions.	-0.124939
-4.078968	the time-	-0.124939
-2.499973	very time-	-0.124939
-1.998531	put time-	-0.124939
-0.124934	Opteron K8	-0.124939
-3.819753	is loaded.	-0.124939
-2.189180	been loaded.	-0.124939
-0.902864	heavily loaded.	-0.124939
-2.577033	for (i=0;	-0.301030
-3.636150	and 14.30	-0.124939
-2.201085	Example 14.30	-0.124939
-3.294053	= b;}	-0.124939
-2.647506	+ b;}	-0.124939
-1.203667	{return b;}	-0.124939
-3.116700	This wasteful	-0.124939
-2.986072	this wasteful	-0.124939
-0.601947	unnecessarily wasteful	-0.124939
-2.381622	// Return	-0.124939
-1.395821	void StoreVector(void	-0.602060
-0.249867	Smaller microcontrollers	-0.602060
-3.649399	in character	-0.124939
-3.185067	with character	-0.124939
-3.142260	as character	-0.124939
-3.166689	is implemented.	-0.425969
-3.423873	are implemented.	-0.124939
-1.625408	In fact,	-0.301030
-1.930775	at runtime.	-0.124939
-2.799439	loop manually	-0.124939
-2.191317	done manually	-0.124939
-1.078842	motion manually	-0.124939
-3.771380	of xxn	-0.124939
-2.201186	+= xxn	-0.124939
-0.601947	x^n/n! xxn	-0.124939
-0.124934	&, |,	-0.301030
-2.881985	Example 7.2	-0.124939
-1.680110	classes. 7.2	-0.124939
-1.203667	26 7.2	-0.124939
-1.680223	thread. Thread-local	-0.124939
-1.555511	block. Thread-local	-0.124939
-1.379645	times. Thread-local	-0.124939
-2.861943	program 81	-0.124939
-2.710685	page 81	-0.124939
-0.601947	................................................................................... 81	-0.124939
-2.881985	Example 7.1	-0.124939
-1.300577	manuals. 7.1	-0.124939
-1.203667	26 7.1	-0.124939
-1.962093	dispatcher signal	-0.124939
-1.379645	processing, signal	-0.124939
-0.902864	statistics, signal	-0.124939
-2.894984	a circular	-0.602060
-2.696566	example 7.4	-0.124939
-2.339601	32 7.4	-0.124939
-2.156329	functions. 7.4	-0.124939
-3.766375	to ignore	-0.124939
-3.045638	may ignore	-0.124939
-2.315467	cases ignore	-0.124939
-3.618258	and keywords	-0.124939
-2.564025	many keywords	-0.124939
-0.601947	Compiler-specific keywords	-0.124939
-2.881985	Example 7.8	-0.124939
-0.902864	37 7.8	-0.124939
-0.902864	changed. 7.8	-0.124939
-2.122163	only once.	-0.124939
-2.456215	called once.	-0.124939
-3.009578	{ 89	-0.124939
-2.021961	page 89	-0.425969
-3.074073	int list[100];	-0.124939
-2.612687	float list[100];	-0.124939
-1.747170	S1 list[100];	-0.124939
-2.759882	be considered	-0.124939
-0.601980	traditionally considered	-0.124939
-2.971430	in Windows).	-0.124939
-2.602966	64-bit Windows).	-0.124939
-2.214047	compile for.	-0.124939
-1.107114	intended for.	-0.124939
-4.078968	the divisions	-0.124939
-3.618258	and divisions	-0.124939
-1.300464	Multiple divisions	-0.124939
-2.619170	= 8;	-0.124939
-2.105539	: 8;	-0.124939
-3.469094	that reflects	-0.124939
-3.116700	This reflects	-0.124939
-2.677436	double reflects	-0.124939
-2.189853	classes .....................................................................................................	-0.124939
-1.078842	Hyperthreading .....................................................................................................	-0.124939
-0.902864	Implementation .....................................................................................................	-0.124939
-3.469094	that lies	-0.124939
-1.962996	difference lies	-0.124939
-1.747509	efficiency lies	-0.124939
-3.636150	and trigonometric	-0.124939
-1.079038	functions, trigonometric	-0.425969
-3.110004	to manipulate	-0.425969
-3.281660	or manipulate	-0.124939
-2.381622	// fractional	-0.602060
-2.196187	from -128	-0.124939
-2.443980	8 -128	-0.124939
-2.759882	be spaced	-0.425969
-3.423873	are spaced	-0.124939
-3.081147	an approximate	-0.124939
-1.365233	fast approximate	-0.124939
-3.649399	in comparisons,	-0.124939
-3.414453	are comparisons,	-0.124939
-2.801031	point comparisons,	-0.124939
-1.078842	features. User	-0.124939
-0.902864	deleted. User	-0.124939
-0.601947	seriously. User	-0.124939
-3.209321	the dividend	-0.301030
-2.185324	at unpredictable	-0.124939
-2.191295	cause unpredictable	-0.124939
-1.045696	__m128i LoadVector(void	-0.602060
-3.215103	by step.	-0.124939
-2.030602	next step.	-0.124939
-1.901508	second step.	-0.124939
-2.677436	double Z	-0.124939
-2.540806	variable Z	-0.124939
-0.601947	Z; Z	-0.124939
-2.743430	are separated	-0.425969
-3.116033	not separated	-0.124939
-3.618258	and 64,	-0.124939
-3.215103	by 64,	-0.124939
-0.601947	32, 64,	-0.124939
-3.618258	and copies	-0.124939
-3.469094	that copies	-0.124939
-0.601947	ebx,31 copies	-0.124939
-2.829374	same brand.	-0.124939
-2.116883	CPU brand.	-0.124939
-3.819753	is annoying	-0.124939
-3.414453	are annoying	-0.124939
-3.077232	an annoying	-0.124939
-2.456215	called CodeAnalyst.	-0.124939
-1.515587	AMD CodeAnalyst.	-0.124939
-0.748092	19 Literature	-0.124939
-0.601980	titles. Literature	-0.124939
-3.110004	to study	-0.425969
-1.831394	my study	-0.124939
-3.209321	the stack,	-0.301030
-0.647797	garbage collection.	-0.124939
-2.586201	it occurs,	-0.124939
-1.998488	never occurs,	-0.124939
-1.249759	option -fno-pic	-0.124939
-1.203849	platform _M_IX86	-0.124939
-1.078942	_M_IX86 _M_IX86	-0.124939
-3.819753	is elsewhere	-0.124939
-2.092404	information elsewhere	-0.124939
-1.805162	errors elsewhere	-0.124939
-3.272416	or bypassing	-0.124939
-3.215103	by bypassing	-0.124939
-3.060224	compiler bypassing	-0.124939
-2.859733	to 0x273F	-0.124939
-3.618258	and 135	-0.124939
-2.911401	} 135	-0.124939
-0.601947	once................................... 135	-0.124939
-3.242285	function looks	-0.124939
-3.154529	code looks	-0.124939
-2.189853	classes looks	-0.124939
-1.880335	union {double	-0.124939
-1.049075	S1 {double	-0.425969
-2.828682	for implementing	-0.124939
-2.233937	But implementing	-0.124939
-3.771380	of int.	-0.124939
-3.766375	to int.	-0.124939
-2.315017	type int.	-0.124939
-2.882686	memory space,	-0.124939
-2.741224	cache space,	-0.124939
-1.680562	RAM space,	-0.124939
-3.349433	can skip	-0.124939
-3.045638	may skip	-0.124939
-0.902864	Please skip	-0.124939
-2.710685	page 137	-0.124939
-1.446592	rounding 137	-0.124939
-0.601947	division...................................................................................................... 137	-0.124939
-1.504358	line. 132	-0.124939
-1.203667	......................................................................................... 132	-0.124939
-0.902864	................................................................................................. 132	-0.124939
-3.771380	of position-	-0.124939
-2.462786	makes position-	-0.124939
-1.714534	so-called position-	-0.124939
-3.321661	// Index	-0.124939
-0.204104	"Error: Index	-0.425969
-0.601947	optimize("a",on). Specifies	-0.124939
-0.601947	only). Specifies	-0.124939
-0.601947	__attribute__((aligned(16))). Specifies	-0.124939
-3.209321	the residual	-0.602060
-2.162015	vector operations,	-0.124939
-2.739293	integer operations,	-0.124939
-3.771380	of C++.	-0.124939
-3.272416	or C++.	-0.124939
-2.243367	compiled C++.	-0.124939
-2.804514	other input/output	-0.124939
-2.338141	file input/output	-0.124939
-1.078842	File input/output	-0.124939
-3.063211	compiler packages	-0.124939
-1.805336	software packages	-0.124939
-4.078968	the operations:	-0.124939
-1.300577	AND operations:	-0.124939
-1.203667	modulo operations:	-0.124939
-1.922926	processors. Explicit	-0.124939
-0.204104	9.11 Explicit	-0.425969
-2.986072	this purpose.	-0.124939
-2.231583	specific purpose.	-0.124939
-2.046622	particular purpose.	-0.124939
-1.875230	* reciprocal_divisor;	-0.124939
-0.601980	y2, reciprocal_divisor;	-0.124939
-2.457324	before compilation.	-0.124939
-0.806116	just-in-time compilation.	-0.124939
-3.294053	= (number	-0.124939
-2.166766	/ (number	-0.124939
-1.680110	% (number	-0.124939
-1.291111	big endian	-0.124939
-3.469094	that allocates	-0.124939
-3.274736	it allocates	-0.124939
-0.601947	queue) allocates	-0.124939
-2.710685	page 136	-0.124939
-1.446592	x); 136	-0.124939
-1.203667	............................................................................................. 136	-0.124939
-2.904927	It reveals	-0.124939
-1.078842	listing reveals	-0.124939
-0.601947	ball reveals	-0.124939
-3.835016	is filled	-0.124939
-2.759882	be filled	-0.124939
-3.272416	or (requires	-0.124939
-2.716116	set (requires	-0.124939
-1.203667	loader (requires	-0.124939
-1.751400	compilers offer	-0.301030
-1.078942	Bitfields Bitfields	-0.124939
-0.380179	7.25 Bitfields	-0.124939
-3.315431	// At	-0.124939
-0.902864	computers. At	-0.124939
-0.601947	dominating. At	-0.124939
-2.389731	an up-to-date	-0.124939
-2.701816	most up-to-date	-0.124939
-1.505016	before leaving	-0.301030
-1.446680	details. Inheritance	-0.124939
-0.380179	7.22 Inheritance	-0.124939
-2.861943	program 153	-0.124939
-2.710685	page 153	-0.124939
-0.601947	speed.............................................................................................................. 153	-0.124939
-1.980351	high degree	-0.124939
-1.555285	typical degree	-0.124939
-0.601947	n'th degree	-0.124939
-2.060742	{ _mm_storeu_si128((__m128i	-0.602060
-1.922923	optimizations automatically,	-0.124939
-1.300464	15.1c automatically,	-0.124939
-0.902864	11.1b automatically,	-0.124939
-2.567207	array sequentially.	-0.124939
-1.574853	accessed sequentially.	-0.124939
-0.380179	7.4 Enums	-0.124939
-0.601980	disguise. Enums	-0.124939
-1.601155	CPU. Algebraic	-0.124939
-1.379532	call. Algebraic	-0.124939
-0.601947	reductions. Algebraic	-0.124939
-3.786643	of A,	-0.124939
-2.386854	int A,	-0.425969
-4.078968	the operands.	-0.124939
-2.062087	both operands.	-0.124939
-1.998418	Boolean operands.	-0.124939
-1.300690	for(i=0; i<100;	-0.124939
-1.078955	(i=0; i<100;	-0.124939
-0.601947	for(i=0,i2=0; i<100;	-0.124939
-0.602012	0.18 0.11	-0.124939
-1.078942	0.12 0.11	-0.124939
-2.509467	2 0.12	-0.124939
-1.300690	0.18 0.12	-0.124939
-0.902864	0.44 0.12	-0.124939
-3.455353	the nearest	-0.124939
-3.780575	to nearest	-0.124939
-1.680110	libraries. To	-0.124939
-1.300464	future. To	-0.124939
-0.601947	rebooted. To	-0.124939
-2.369400	x x--	-0.425969
-1.446600	reductions: x--	-0.124939
-2.591959	C++ language,	-0.124939
-2.339264	programming language,	-0.124939
-1.446705	definition language,	-0.124939
-4.078968	the 145	-0.124939
-2.710685	page 145	-0.124939
-1.203667	....................................................................................... 145	-0.124939
-2.710685	page 140	-0.124939
-2.612687	float 140	-0.124939
-0.601947	double..................................................................................... 140	-0.124939
-2.710685	page 141	-0.124939
-0.902864	x64 141	-0.124939
-0.601947	................................... 141	-0.124939
-3.064377	than RISC	-0.124939
-2.495294	between RISC	-0.124939
-0.902864	got RISC	-0.124939
-1.922359	processors. Consider	-0.124939
-1.746719	resources. Consider	-0.124939
-0.902864	loops. Consider	-0.124939
-3.771380	of text	-0.124939
-1.679998	handle text	-0.124939
-1.446366	storing text	-0.124939
-1.855637	compiler. Object	-0.124939
-1.203667	88 Object	-0.124939
-0.601947	discontinued Object	-0.124939
-2.881985	Example 14.10	-0.124939
-0.902864	142 14.10	-0.124939
-0.601947	u[0]. 14.10	-0.124939
-2.881985	Example 14.11	-0.124939
-1.078842	145 14.11	-0.124939
-0.601947	limitation). 14.11	-0.124939
-1.446999	template <int	-0.301030
-1.504245	iterations back.	-0.124939
-1.379532	places back.	-0.124939
-1.300577	written back.	-0.124939
-2.881985	Example 8.4	-0.124939
-1.203667	option. 8.4	-0.124939
-0.902864	77 8.4	-0.124939
-2.881985	Example 8.7	-0.124939
-0.902864	105. 8.7	-0.124939
-0.902864	82 8.7	-0.124939
-1.565561	assembly listing	-0.124939
-1.747312	output listing	-0.124939
-2.772974	used twice	-0.124939
-2.469146	const twice	-0.124939
-2.167890	calculated twice	-0.124939
-3.771380	of Pascal	-0.124939
-1.804598	platforms. Pascal	-0.124939
-1.379645	C++, Pascal	-0.124939
-3.430994	be expected.	-0.124939
-3.142260	as expected.	-0.124939
-1.831378	contentions expected.	-0.124939
-1.714760	performance. 14.4	-0.124939
-1.446705	130 14.4	-0.124939
-1.078842	135 14.4	-0.124939
-3.007172	{ Vec16s	-0.124939
-2.714557	class Vec16s	-0.124939
-0.601947	Vec32uc Vec16s	-0.124939
-1.961980	efficient. Simple	-0.124939
-1.504358	fast. Simple	-0.124939
-0.601947	fast=2 Simple	-0.124939
-0.204104	Developer’s Manual",	-0.425969
-0.601980	Programmer’s Manual",	-0.124939
-2.973437	and leave	-0.124939
-2.742214	should leave	-0.124939
-2.759882	be solved	-0.425969
-2.859753	has solved	-0.124939
-3.819753	is supplied	-0.124939
-3.414453	are supplied	-0.124939
-2.993562	have supplied	-0.124939
-1.680336	purposes. Available	-0.124939
-1.680110	access. Available	-0.124939
-1.379645	Agner Available	-0.124939
-3.166689	is translated	-0.124939
-2.190343	been translated	-0.124939
-1.906726	64-bit Linux:	-0.124939
-0.601980	/Gy, Linux:	-0.124939
-1.600929	user. With	-0.124939
-1.300464	numbers. With	-0.124939
-1.078842	step. With	-0.124939
-1.446366	Linux. Has	-0.124939
-1.203667	IDE. Has	-0.124939
-0.601947	builder Has	-0.124939
-2.743430	are overriding	-0.425969
-2.015600	allows overriding	-0.124939
-1.260509	AMD Opteron	-0.602060
-1.486615	operating systems".	-0.124939
-3.455353	the correct	-0.124939
-3.835016	is correct	-0.124939
-3.154529	code caching.	-0.124939
-2.882686	memory caching.	-0.124939
-2.062199	optimize caching.	-0.124939
-3.771380	of overflow:	-0.124939
-2.801031	point overflow:	-0.124939
-1.203780	avoids overflow:	-0.124939
-2.789663	that scans	-0.425969
-3.246643	function scans	-0.124939
-1.388036	following way:	-0.124939
-2.403846	code. Sometimes	-0.124939
-1.300577	consuming. Sometimes	-0.124939
-1.203667	tasks. Sometimes	-0.124939
-2.490883	32-bit -fno-builtin	-0.124939
-2.436964	bit -fno-builtin	-0.124939
-2.201524	option -fno-builtin	-0.124939
-3.110004	to justify	-0.124939
-1.504672	easily justify	-0.124939
-3.209321	the contrary,	-0.124939
-0.948781	calling conventions.	-0.124939
-3.516422	The initialization	-0.124939
-2.389731	an initialization	-0.425969
-3.455353	the Internet	-0.124939
-0.902931	163 Internet	-0.124939
-3.110004	to cover	-0.425969
-3.116033	not cover	-0.124939
-1.300632	itself. Constructors	-0.124939
-0.380179	7.23 Constructors	-0.425969
-2.495294	between PC's	-0.124939
-2.430512	first PC's	-0.124939
-2.104726	standard PC's	-0.124939
-2.881985	Example 7.21	-0.124939
-1.078842	53 7.21	-0.124939
-0.902864	effort. 7.21	-0.124939
-3.469094	that delays	-0.124939
-2.190528	cause delays	-0.124939
-0.601947	severe delays	-0.124939
-0.903028	i, a);	-0.602060
-2.973437	and c[i]	-0.425969
-0.601980	b[i]; c[i]	-0.124939
-3.496332	for cleaning	-0.124939
-2.969894	time cleaning	-0.124939
-2.886896	from cleaning	-0.124939
-2.825970	same way,	-0.124939
-2.804514	other way,	-0.124939
-2.761189	one way,	-0.124939
-2.091388	memory. Big	-0.124939
-1.300690	computer. Big	-0.124939
-0.601947	of. Big	-0.124939
-2.973437	and ZMM	-0.425969
-0.601980	512-bit ZMM	-0.124939
-3.786643	of coefficients	-0.124939
-0.204104	Polynomial coefficients	-0.124939
-3.142260	as DOS	-0.124939
-2.368967	systems DOS	-0.124939
-1.879118	old DOS	-0.124939
-3.516422	The -fpie	-0.124939
-1.504783	option -fpie	-0.124939
-2.564025	many labels	-0.124939
-2.315129	case labels	-0.124939
-1.203667	sequential labels	-0.124939
-0.982128	2, 6,	-0.425969
-1.504513	4, 6,	-0.124939
-1.078842	pop ret	-0.124939
-0.601947	$B2$3: ret	-0.124939
-0.601947	beginning. ret	-0.124939
-1.961755	below. Signed	-0.124939
-1.555398	overflow. Signed	-0.124939
-0.601947	7.4. Signed	-0.124939
-3.618258	and logarithms	-0.124939
-3.142260	as logarithms	-0.124939
-2.155766	uses logarithms	-0.124939
-3.819753	is stored.	-0.124939
-3.430994	be stored.	-0.124939
-3.414453	are stored.	-0.124939
-1.555398	standardized manner.	-0.124939
-1.504471	non-sequential manner.	-0.124939
-1.504358	random manner.	-0.124939
-1.804485	size. Today,	-0.124939
-1.078842	slow. Today,	-0.124939
-0.902864	computers. Today,	-0.124939
-4.103464	the easiest	-0.124939
-2.839426	The easiest	-0.425969
-3.618258	and pop	-0.124939
-1.078842	100. pop	-0.124939
-0.601947	$B1$3: pop	-0.124939
-2.297534	constant 3.5	-0.124939
-1.446479	19 3.5	-0.124939
-1.300464	process. 3.5	-0.124939
-1.879699	options -S	-0.124939
-0.204104	/FA -S	-0.124939
-3.835016	is inlined.	-0.124939
-2.759882	be inlined.	-0.124939
-1.407841	add cmp	-0.124939
-0.601980	i++. cmp	-0.124939
-2.872111	data flow	-0.124939
-2.171234	program flow	-0.124939
-1.980238	#include directives.	-0.124939
-1.555850	OpenMP directives.	-0.124939
-0.601947	Preprocessor directives.	-0.124939
-2.593253	also deallocated.	-0.124939
-1.493597	been deallocated.	-0.124939
-2.951125	more (128	-0.124939
-2.022740	set (128	-0.124939
-1.203667	obsolete. Programmers	-0.124939
-0.902864	instruction. Programmers	-0.124939
-0.601947	expansions. Programmers	-0.124939
-3.766375	to focus	-0.124939
-2.948234	more focus	-0.124939
-1.831039	main focus	-0.124939
-3.246643	function definition.	-0.124939
-2.022517	class definition.	-0.124939
-4.103464	the track	-0.124939
-0.944355	keep track	-0.425969
-2.990146	this condition.	-0.124939
-1.592808	error condition.	-0.124939
-3.618258	and s3	-0.124939
-1.879231	0, s3	-0.124939
-0.601947	a[i+2]; s3	-0.124939
-1.879231	0, s2	-0.124939
-0.601947	a[i+1]; s2	-0.124939
-0.601947	s1, s2	-0.124939
-2.477363	on contemporary	-0.124939
-1.777434	Unfortunately, contemporary	-0.124939
-1.300464	.......................................................................................... 66	-0.124939
-1.078955	1.0f;} 66	-0.124939
-0.902864	............................................................................................ 66	-0.124939
-3.819753	is probably	-0.124939
-3.349433	can probably	-0.124939
-0.601947	disassembly, probably	-0.124939
-3.771380	of longjmp	-0.124939
-3.242285	function longjmp	-0.124939
-3.163926	on longjmp	-0.124939
-0.124934	1)sign 2exponent	-0.124939
-1.804711	statement leads	-0.124939
-1.776569	vectorization leads	-0.124939
-1.504358	binding leads	-0.124939
-2.679221	size Alignd	-0.124939
-2.252676	arrays Alignd	-0.124939
-1.300577	); Alignd	-0.124939
-2.828682	for improving	-0.124939
-3.168158	on improving	-0.124939
-2.741224	cache sizes.	-0.124939
-2.190416	matrix sizes.	-0.124939
-1.379532	mixed sizes.	-0.124939
-1.642548	loading .......................................................................................................	-0.124939
-1.203667	Metaprogramming .......................................................................................................	-0.124939
-1.203667	databases .......................................................................................................	-0.124939
-3.469094	that holds	-0.124939
-2.315017	type holds	-0.124939
-1.555398	eax holds	-0.124939
-3.771380	of competing	-0.124939
-3.414453	are competing	-0.124939
-2.921819	A competing	-0.124939
-1.643006	programming questions	-0.124939
-0.601980	answer questions	-0.124939
-3.810938	a register,	-0.124939
-2.162015	vector register,	-0.124939
-1.901508	interface etc.,	-0.124939
-1.855863	function, etc.,	-0.124939
-0.902864	compilers, etc.,	-0.124939
-4.078968	the ReadTSC	-0.124939
-3.242285	function ReadTSC	-0.124939
-2.155766	get ReadTSC	-0.124939
-1.133491	replaced with:	-0.425969
-1.078942	Replace with:	-0.124939
-3.649399	in kernel	-0.124939
-2.337805	system kernel	-0.124939
-2.190641	Linux kernel	-0.124939
-0.878215	VIA CPUs").	-0.301030
-0.903028	i, j;	-0.124939
-3.144495	a natural	-0.124939
-2.131868	contains natural	-0.124939
-1.715211	calculations. Examples	-0.124939
-1.379532	above. Examples	-0.124939
-1.203667	run. Examples	-0.124939
-3.007172	{ (iset	-0.124939
-0.601947	&SelectAddMul_AVX2; (iset	-0.124939
-0.601947	&SelectAddMul_SSE41; (iset	-0.124939
-3.242285	function F2	-0.124939
-3.215103	by F2	-0.124939
-2.315129	case F2	-0.124939
-2.600857	or moving	-0.425969
-3.068704	than moving	-0.124939
-2.888122	Example 9.6b.	-0.124939
-2.005572	example 9.6b.	-0.425969
-0.902864	only) -O3	-0.124939
-0.601947	/O3 -O3	-0.124939
-0.601947	/Ox -O3	-0.124939
-3.278300	it unusual	-0.124939
-2.427954	not unusual	-0.425969
-1.793627	cache misses,	-0.602060
-3.108537	- Divide	-0.124939
-2.104388	add Divide	-0.124939
-0.601947	---xx---x Divide	-0.124939
-2.894984	a sorted	-0.602060
-3.771380	of efficiency,	-0.124939
-2.741224	cache efficiency,	-0.124939
-2.495294	between efficiency,	-0.124939
-3.209321	the same.	-0.124939
-1.931645	library (STL)	-0.124939
-1.642576	Library (STL)	-0.124939
-1.204002	get rid	-0.602060
-1.879118	10 ms	-0.124939
-1.446479	120 ms	-0.124939
-1.078842	30 ms	-0.124939
-2.607710	two arrays,	-0.124939
-2.261210	large arrays,	-0.124939
-2.242017	big arrays,	-0.124939
-2.468986	elements matrix[r][c]	-0.124939
-1.536176	element matrix[r][c]	-0.124939
-3.766375	to issue	-0.124939
-3.077232	an issue	-0.124939
-1.379645	portability issue	-0.124939
-3.110004	to solve	-0.425969
-3.116033	not solve	-0.124939
-2.144095	problem since	-0.124939
-1.300464	updated since	-0.124939
-0.601947	pulses since	-0.124939
-2.916830	is beyond	-0.602060
-2.257715	more readable	-0.124939
-0.601980	human readable	-0.124939
-3.819753	is infinity	-0.124939
-3.430994	be infinity	-0.124939
-3.272416	or infinity	-0.124939
-3.771380	of bookkeeping	-0.124939
-2.986072	this bookkeeping	-0.124939
-1.642322	why bookkeeping	-0.124939
-2.530822	some formula	-0.124939
-1.747283	safe formula	-0.124939
-1.714760	right formula	-0.124939
-4.078968	the technical	-0.124939
-3.771380	of technical	-0.124939
-1.714873	causes technical	-0.124939
-2.190641	AVX instr.	-0.124939
-1.601268	SSE4.1 instr.	-0.124939
-1.379532	SSE3 instr.	-0.124939
-3.455353	the specified	-0.124939
-2.031753	typically specified	-0.124939
-3.771380	of organizing	-0.124939
-3.496332	for organizing	-0.124939
-3.215103	by organizing	-0.124939
-2.888122	Example 9.5a	-0.124939
-2.005572	example 9.5a	-0.124939
-3.835016	is false,	-0.124939
-2.619170	= false,	-0.425969
-3.618258	and open	-0.124939
-3.349433	can open	-0.124939
-1.680223	Another open	-0.124939
-2.047297	optimal decomposition	-0.124939
-0.601947	functional decomposition	-0.124939
-0.601947	Functional decomposition	-0.124939
-3.771380	of measuring	-0.124939
-3.618258	and measuring	-0.124939
-3.215103	by measuring	-0.124939
-0.204104	3.7 File	-0.124939
-0.601980	categories: File	-0.124939
-3.166689	is negligible	-0.124939
-3.810938	a negligible	-0.124939
-3.274736	it took	-0.124939
-3.154529	code took	-0.124939
-1.962883	We took	-0.124939
-2.523068	so on.	-0.124939
-1.980802	running on.	-0.124939
-1.300577	turned on.	-0.124939
-1.922926	processors. Hyperthreading	-0.124939
-0.204104	10.1 Hyperthreading	-0.124939
-3.108537	- 30	-0.124939
-2.031052	typically 30	-0.124939
-0.601947	142). 30	-0.124939
-2.784760	which initially	-0.124939
-2.665404	pointer initially	-0.124939
-1.203667	entry initially	-0.124939
-2.427954	not occur.	-0.124939
-2.243524	doesn't occur.	-0.124939
-1.504513	arrays. Strings	-0.124939
-0.204104	9.8 Strings	-0.124939
-1.923085	directives Preprocessing	-0.124939
-0.204104	7.32 Preprocessing	-0.425969
-3.110004	to utilize	-0.425969
-1.555665	fully utilize	-0.124939
-2.868457	of (0,0,0,0,0,0,0,0)	-0.301030
-1.832169	example: 38	-0.124939
-1.078842	.......................................................................................................... 38	-0.124939
-0.902864	..................................................................................................................... 38	-0.124939
-3.272416	or reference.	-0.124939
-2.469146	const reference.	-0.124939
-0.902864	null reference.	-0.124939
-0.903028	#define FUNCNAME	-0.124939
-4.078968	the history	-0.124939
-3.505314	The history	-0.124939
-0.902864	past history	-0.124939
-2.714557	class CChild2	-0.124939
-0.902864	Object1; CChild2	-0.124939
-0.601947	p1->Hello(); CChild2	-0.124939
-1.139128	sign bit:	-0.301030
-1.715664	discussion forums	-0.124939
-1.078842	Internet forums	-0.124939
-0.902864	internet forums	-0.124939
-1.805543	relative addressing	-0.124939
-0.681162	self-relative addressing	-0.425969
-2.366658	= 1024;	-0.301030
-3.142260	as C#,	-0.124939
-2.403846	code. C#,	-0.124939
-0.902864	Java, C#,	-0.124939
-2.378095	than allocating	-0.124939
-0.902931	re- allocating	-0.124939
-3.469094	that a+b	-0.124939
-2.648794	n.a. a+b	-0.124939
-1.446366	reductions: a+b	-0.124939
-2.507818	be taken	-0.602060
-3.455353	the microprocessor.	-0.124939
-3.786643	of microprocessor.	-0.124939
-3.242285	function argument	-0.124939
-2.986072	this argument	-0.124939
-2.825970	same argument	-0.124939
-2.790903	If Func1	-0.124939
-2.346191	void Func1	-0.124939
-2.279581	about Func1	-0.124939
-2.971430	in Unix-like	-0.124939
-2.776943	all Unix-like	-0.124939
-3.108537	- -----	-0.124939
-3.064983	x -----	-0.124939
-0.902864	x---- -----	-0.124939
-2.568224	* 2.5	-0.124939
-2.442652	8 2.5	-0.124939
-1.942449	compilers. 2.5	-0.124939
-2.973437	and read-only	-0.124939
-3.423873	are read-only	-0.124939
-3.794788	a well-structured	-0.124939
-3.618258	and well-structured	-0.124939
-2.948234	more well-structured	-0.124939
-2.330846	bits represent	-0.124939
-1.601042	counts represent	-0.124939
-0.601947	truly represent	-0.124939
-1.777472	find elsewhere.	-0.124939
-1.203894	found elsewhere.	-0.124939
-0.902864	reused elsewhere.	-0.124939
-4.078968	the micro-op	-0.124939
-3.794788	a micro-op	-0.124939
-3.272416	or micro-op	-0.124939
-3.166689	is best.	-0.124939
-2.180093	works best.	-0.124939
-3.771380	of returning	-0.124939
-3.215103	by returning	-0.124939
-2.945610	when returning	-0.124939
-1.830814	are: Long	-0.124939
-1.680110	precision. Long	-0.124939
-1.555511	order. Long	-0.124939
-2.619170	= r1;	-0.124939
-2.491214	< r1;	-0.124939
-3.794788	a CPU-	-0.124939
-3.505314	The CPU-	-0.124939
-2.849923	make CPU-	-0.124939
-3.780575	to top	-0.124939
-1.565354	; top	-0.425969
-3.766375	to decide	-0.124939
-3.469094	that decide	-0.124939
-3.045638	may decide	-0.124939
-1.761417	each other.	-0.301030
-0.982128	square brackets	-0.124939
-1.203801	{} brackets	-0.124939
-1.078842	since 2004.	-0.124939
-0.601947	8.42n, 2004.	-0.124939
-0.601947	2.1.7, 2004.	-0.124939
-3.819753	is odd	-0.124939
-3.077232	an odd	-0.124939
-1.680110	little odd	-0.124939
-2.881985	Example 7.7	-0.124939
-1.203667	operators. 7.7	-0.124939
-0.902864	36 7.7	-0.124939
-1.601268	Compiler Documentation	-0.124939
-0.902864	Free Documentation	-0.124939
-0.601947	www.openmp.org. Documentation	-0.124939
-1.337859	error prone.	-0.124939
-2.877841	at compile-	-0.124939
-2.724428	no compile-	-0.124939
-1.642435	allow compile-	-0.124939
-2.271111	function. Global	-0.124939
-1.714760	it. Global	-0.124939
-1.601042	returns. Global	-0.124939
-2.505368	table lookups	-0.124939
-1.504584	PLT lookups	-0.124939
-0.601947	simultaneous lookups	-0.124939
-2.403959	optimization Whole	-0.124939
-2.077035	program. Whole	-0.124939
-0.601947	/Og Whole	-0.124939
-3.294053	= a*b	-0.124939
-0.601947	b+a a*b	-0.124939
-0.601947	b+a, a*b	-0.124939
-3.455353	the linker.	-0.124939
-2.333418	dynamic linker.	-0.124939
-3.786643	of security.	-0.124939
-3.110004	to security.	-0.124939
-1.556066	table lookup.	-0.124939
-2.021961	page 78	-0.425969
-1.715127	library. 78	-0.124939
-3.166689	is handled	-0.124939
-3.440385	be handled	-0.124939
-2.063741	(int a[],	-0.124939
-0.204104	Func(int a[],	-0.425969
-3.819753	is implicitly	-0.124939
-3.242285	function implicitly	-0.124939
-2.191317	done implicitly	-0.124939
-3.455353	the terminating	-0.425969
-2.457324	before terminating	-0.124939
-3.110422	not _WIN32	-0.124939
-1.901169	platform _WIN32	-0.124939
-1.078842	_WIN32 _WIN32	-0.124939
-3.771380	of (2n	-0.124939
-2.568224	* (2n	-0.124939
-2.297534	constant (2n	-0.124939
-2.789663	that measures	-0.425969
-1.805224	profiler measures	-0.124939
-3.618258	and multiplications.	-0.124939
-2.724428	no multiplications.	-0.124939
-2.154867	four multiplications.	-0.124939
-2.444773	less intensive	-0.124939
-0.204104	computationally intensive	-0.124939
-2.759882	be moved	-0.425969
-3.281660	or moved	-0.124939
-3.294053	= ReadTSC()	-0.124939
-2.496416	long ReadTSC()	-0.124939
-2.241904	Use ReadTSC()	-0.124939
-2.916830	is valid.	-0.301030
-0.346777	a[size], b[size];	-0.602060
-1.855637	compiler. Not	-0.124939
-1.776569	vectorization Not	-0.124939
-0.601947	builder. Not	-0.124939
-2.948146	when none	-0.124939
-2.081223	but none	-0.425969
-3.064377	than "what	-0.124939
-0.601947	thinks "what	-0.124939
-0.601947	kind: "what	-0.124939
-1.078842	addition. Comparing	-0.124939
-0.601947	clause. Comparing	-0.124939
-0.601947	2.1. Comparing	-0.124939
-1.714647	processing instructions,	-0.124939
-1.203667	sequential instructions,	-0.124939
-0.601947	fence instructions,	-0.124939
-2.015225	sets Microprocessor	-0.124939
-1.504245	branch. Microprocessor	-0.124939
-1.203667	obsolete. Microprocessor	-0.124939
-2.398936	template metaprogramming.	-0.124939
-1.680891	need metaprogramming.	-0.124939
-3.242285	function Size	-0.124939
-2.468020	elements Size	-0.124939
-1.680110	classes. Size	-0.124939
-3.506862	for metaprogramming,	-0.124939
-1.701936	template metaprogramming,	-0.124939
-2.661651	can bypass	-0.425969
-3.281660	or bypass	-0.124939
-2.262785	assembly output.	-0.124939
-2.232257	language output.	-0.124939
-1.998418	Boolean output.	-0.124939
-2.031390	microprocessor ...........................................................................................	-0.124939
-1.980689	division ...........................................................................................	-0.124939
-1.901169	platform ...........................................................................................	-0.124939
-3.455353	the numerically	-0.425969
-0.601980	Find numerically	-0.124939
-1.942449	|| expression.	-0.124939
-1.555398	chosen expression.	-0.124939
-1.504471	arithmetic expression.	-0.124939
-2.377052	pointers ..........................................................................................................	-0.124939
-1.379759	notice ..........................................................................................................	-0.124939
-0.902864	Conclusion ..........................................................................................................	-0.124939
-3.516422	The InstructionSet()	-0.124939
-2.828682	for InstructionSet()	-0.425969
-2.031165	branches Eliminate	-0.124939
-1.300690	1.; Eliminate	-0.124939
-1.300464	jumps Eliminate	-0.124939
-3.794788	a backup	-0.124939
-1.961642	better backup	-0.124939
-0.601947	legitimate backup	-0.124939
-1.942788	set. 13.6	-0.124939
-1.555511	65 13.6	-0.124939
-1.203667	126 13.6	-0.124939
-2.381622	// Get	-0.301030
-3.110422	not throw	-0.124939
-1.997854	never throw	-0.124939
-1.680223	possibly throw	-0.124939
-1.942788	set. More	-0.124939
-1.856767	a[i] More	-0.124939
-1.555511	problems. More	-0.124939
-1.078842	Devirtualization ---x-----	-0.124939
-0.902864	x-xxxxxxx ---x-----	-0.124939
-0.902864	x--x----- ---x-----	-0.124939
-1.566582	return _mm_loadu_si128((__m128i	-0.301030
-2.232257	language Before	-0.124939
-1.379645	spots Before	-0.124939
-1.203667	tasks. Before	-0.124939
-1.998192	systems. Applications	-0.124939
-1.203667	interface. Applications	-0.124939
-0.601947	141. Applications	-0.124939
-3.108537	- 25	-0.124939
-1.714760	performance. 25	-0.124939
-0.601947	process...................................................................................................... 25	-0.124939
-4.103464	the AVX-512	-0.124939
-0.505102	12.2 AVX-512	-0.425969
-2.509467	2 23	-0.124939
-2.014549	their 23	-0.124939
-1.078842	............................................................................................... 23	-0.124939
-1.971663	will evict	-0.301030
-2.271111	function. Copying	-0.124939
-2.091388	memory. Copying	-0.124939
-0.601947	backwards. Copying	-0.124939
-2.577033	for (x	-0.602060
-2.048652	else being	-0.124939
-1.961755	n being	-0.124939
-0.601947	That being	-0.124939
-3.315431	// sum,	-0.124939
-2.430512	first sum,	-0.124939
-1.901508	second sum,	-0.124939
-3.505314	The unrolled	-0.124939
-2.799439	loop unrolled	-0.124939
-1.379532	completely unrolled	-0.124939
-3.166689	is slow.	-0.124939
-1.962276	too slow.	-0.124939
-0.249867	aa: StoreVector(aa	-0.602060
-2.881985	Example 7.11	-0.124939
-1.601268	problem. 7.11	-0.124939
-1.078842	38 7.11	-0.124939
-4.078968	the market	-0.124939
-3.618258	and market	-0.124939
-2.809494	CPU market	-0.124939
-3.771380	of vectors,	-0.124939
-3.649399	in vectors,	-0.124939
-1.998418	Boolean vectors,	-0.124939
-2.201861	allocated resource.	-0.124939
-1.746944	limited resource.	-0.124939
-1.203667	scarce resource.	-0.124939
-1.981538	Intel Architecture	-0.425969
-0.601980	"AMD64 Architecture	-0.124939
-2.881985	Example 7.12	-0.124939
-2.271111	function. 7.12	-0.124939
-1.203667	40 7.12	-0.124939
-3.166689	is limited.	-0.124939
-2.501498	very limited.	-0.124939
-2.888122	Example 11.3	-0.124939
-2.005572	example 11.3	-0.124939
-3.272416	or typedef	-0.124939
-2.315017	type typedef	-0.124939
-2.156555	parameters typedef	-0.124939
-2.196187	from 0x2700	-0.425969
-2.457245	address 0x2700	-0.124939
-2.156667	}; Replace	-0.124939
-1.078842	on. Replace	-0.124939
-0.601947	7.34b. Replace	-0.124939
-0.902864	model. Instead,	-0.124939
-0.601947	2011). Instead,	-0.124939
-0.601947	NOT. Instead,	-0.124939
-3.618258	and frameworks,	-0.124939
-1.961867	runtime frameworks,	-0.124939
-1.922584	graphics frameworks,	-0.124939
-3.294053	= *(p++)	-0.124939
-1.680902	0) *(p++)	-0.124939
-0.601947	i--) *(p++)	-0.124939
-2.271449	CPUs (Intel	-0.124939
-2.211236	etc. (Intel	-0.124939
-0.902864	only) (Intel	-0.124939
-3.810938	a nearby	-0.124939
-2.116167	other nearby	-0.124939
-1.962276	too fragmented.	-0.124939
-1.107003	become fragmented.	-0.124939
-3.786643	of truncation.	-0.124939
-2.973437	and truncation.	-0.124939
-0.380179	7.1 Different	-0.425969
-0.601980	number). Different	-0.124939
-3.209321	the logarithm	-0.124939
-3.656516	in Day	-0.124939
-1.245226	|| Day	-0.425969
-3.819753	is ported	-0.124939
-1.901733	later ported	-0.124939
-1.504471	easily ported	-0.124939
-3.281660	or inline.	-0.124939
-2.556097	function inline.	-0.124939
-3.819753	is big.	-0.124939
-2.499973	very big.	-0.124939
-1.961642	too big.	-0.124939
-2.973437	and deallocation	-0.425969
-0.601980	allocation, deallocation	-0.124939
-1.556066	table (PLT)	-0.124939
-2.881985	Example 7.22	-0.124939
-1.078842	54 7.22	-0.124939
-0.902864	implementations. 7.22	-0.124939
-0.204104	3.14 Context	-0.124939
-0.601980	renewed. Context	-0.124939
-2.881985	Example 7.23	-0.124939
-2.156667	}; 7.23	-0.124939
-1.078842	54 7.23	-0.124939
-4.078968	the services	-0.124939
-1.922246	Many services	-0.124939
-1.504245	background services	-0.124939
-2.881985	Example 7.20	-0.124939
-1.680110	access. 7.20	-0.124939
-1.078842	53 7.20	-0.124939
-2.916830	is extremely	-0.124939
-1.747551	8 kb	-0.124939
-1.880017	512 kb	-0.124939
-2.278409	by joining	-0.124939
-3.780575	to decrement	-0.124939
-2.973437	and decrement	-0.124939
-3.294053	= 0x1C.	-0.124939
-2.716116	set 0x1C.	-0.124939
-2.608176	number 0x1C.	-0.124939
-2.973437	and free.	-0.124939
-3.506862	for free.	-0.124939
-2.634807	// Check	-0.124939
-1.777275	2. Check	-0.124939
-3.771380	of double,	-0.124939
-2.601540	64-bit double,	-0.124939
-1.203780	float, double,	-0.124939
-1.355198	simple periodic	-0.602060
-2.881985	Example 7.27	-0.124939
-2.156329	functions. 7.27	-0.124939
-1.203667	56 7.27	-0.124939
-2.881985	Example 7.24	-0.124939
-1.078842	55 7.24	-0.124939
-0.601947	53. 7.24	-0.124939
-4.078968	the product	-0.124939
-2.499190	software product	-0.124939
-1.078842	competing product	-0.124939
-2.881985	Example 7.25	-0.124939
-1.555624	integers. 7.25	-0.124939
-1.078842	55 7.25	-0.124939
-2.881985	Example 7.28	-0.124939
-1.804823	cases. 7.28	-0.124939
-1.203667	56 7.28	-0.124939
-3.618258	and references,	-0.124939
-1.300464	pointers, references,	-0.124939
-0.601947	Pointers, references,	-0.124939
-0.878215	VIA CPUs"	-0.301030
-2.530822	some experience	-0.124939
-2.339264	programming experience	-0.124939
-1.601155	might experience	-0.124939
-2.859733	to determine	-0.124939
-1.446999	template <typename	-0.124939
-1.078842	-S Generate	-0.124939
-0.902864	-static Generate	-0.124939
-0.601947	/Fm Generate	-0.124939
-2.916830	is certainly	-0.124939
-1.961755	below. Devirtualization	-0.124939
-1.776569	vectorization Devirtualization	-0.124939
-0.601947	8.19. Devirtualization	-0.124939
-3.794788	a pivot	-0.124939
-3.142260	as pivot	-0.124939
-1.446366	suitable pivot	-0.124939
-2.338590	16 __declspec(	-0.124939
-1.504471	__restrict __declspec(	-0.124939
-0.902864	aligned(16))) __declspec(	-0.124939
-3.786643	of mispredictions	-0.124939
-1.794643	branch mispredictions	-0.124939
-0.903028	i, a[100],	-0.124939
-3.771380	of allocations	-0.124939
-2.882686	memory allocations	-0.124939
-2.564025	many allocations	-0.124939
-2.294707	if necessary,	-0.124939
-2.881985	Example 9.4	-0.124939
-2.156329	functions. 9.4	-0.124939
-1.203667	88 9.4	-0.124939
-3.794788	a float.	-0.124939
-2.154867	four float.	-0.124939
-1.300577	int, float.	-0.124939
-1.700547	+ 1.0f;}	-0.301030
-3.166689	is indeed	-0.124939
-3.423873	are indeed	-0.124939
-2.505368	table 9.1	-0.124939
-2.346864	access 9.1	-0.124939
-1.379532	87 9.1	-0.124939
-2.804514	other (not	-0.124939
-1.830927	tested (not	-0.124939
-0.902864	NAN (not	-0.124939
-3.794788	a built-in	-0.124939
-3.505314	The built-in	-0.124939
-1.078842	inserts built-in	-0.124939
-3.771380	of n.	-0.124939
-1.831039	positive n.	-0.124939
-0.601947	value, n.	-0.124939
-3.794788	a complete	-0.124939
-2.921819	A complete	-0.124939
-2.131168	contains complete	-0.124939
-2.114505	x (x)	-0.301030
-1.642435	ebx ecx,	-0.124939
-1.379645	edx, ecx,	-0.124939
-0.902864	ENDP ecx,	-0.124939
-2.600857	or modified.	-0.425969
-3.116033	not modified.	-0.124939
-0.249867	Constant folding	-0.301030
-3.278300	it expects	-0.124939
-1.673496	user expects	-0.124939
-2.381622	// Call	-0.301030
-2.507818	be joined	-0.301030
-2.854055	vector classes,	-0.124939
-1.961980	string classes,	-0.124939
-1.777133	functions, classes,	-0.124939
-3.349433	can compute	-0.124939
-2.262447	must compute	-0.124939
-2.261322	; compute	-0.124939
-3.118316	of interpreting	-0.425969
-3.506862	for interpreting	-0.124939
-3.636150	and LIBM	-0.124939
-1.515587	AMD LIBM	-0.124939
-2.789663	that accesses	-0.124939
-2.091354	All accesses	-0.124939
-4.078968	the $B1$2	-0.124939
-1.714534	100 $B1$2	-0.124939
-0.902864	jl $B1$2	-0.124939
-1.679998	disk copying.	-0.124939
-1.504358	Memory copying.	-0.124939
-0.601947	illegitimate copying.	-0.124939
-0.124934	Manual", Volume	-0.124939
-2.507818	be placed	-0.301030
-0.823883	hot spot.	-0.124939
-3.144495	a variable,	-0.124939
-2.739293	integer variable,	-0.124939
-3.771380	of jumping	-0.124939
-3.496332	for jumping	-0.124939
-1.943352	after jumping	-0.124939
-2.969894	time compared	-0.124939
-1.446592	disadvantages compared	-0.124939
-0.601947	duration compared	-0.124939
-3.780575	to 3-dimensional	-0.124939
-2.600857	or 3-dimensional	-0.425969
-3.771380	of x.	-0.124939
-3.649399	in x.	-0.124939
-3.163926	on x.	-0.124939
-3.766375	to a.	-0.124939
-3.649399	in a.	-0.124939
-2.647506	+ a.	-0.124939
-3.766375	to post-increment.	-0.124939
-3.272416	or post-increment.	-0.124939
-3.064377	than post-increment.	-0.124939
-3.166689	is sufficient	-0.425969
-3.440385	be sufficient	-0.124939
-3.835016	is evicted	-0.124939
-2.759882	be evicted	-0.124939
-4.078968	the flags	-0.124939
-1.980125	zero flags	-0.124939
-0.902864	partial flags	-0.124939
-3.649399	in Sum2	-0.124939
-3.064377	than Sum2	-0.124939
-0.601947	Sum1, Sum2	-0.124939
-2.868457	of (2,2,2,2,2,2,2,2)	-0.301030
-0.689169	PTR [edx]	-0.301030
-2.881985	Example 7.14	-0.124939
-1.203780	45 7.14	-0.124939
-1.078842	big. 7.14	-0.124939
-2.881985	Example 7.16	-0.124939
-1.300464	50 7.16	-0.124939
-1.078842	systems". 7.16	-0.124939
-2.881985	Example 7.17	-0.124939
-1.714760	object. 7.17	-0.124939
-1.300464	50 7.17	-0.124939
-1.998965	using templates.	-0.124939
-0.601980	recursive templates.	-0.124939
-2.881985	Example 7.13	-0.124939
-1.300464	microprocessors. 7.13	-0.124939
-1.203667	43 7.13	-0.124939
-2.881985	Example 7.19	-0.124939
-1.715325	bytes. 7.19	-0.124939
-1.203667	51 7.19	-0.124939
-1.281183	But beware	-0.301030
-2.881985	Example 7.18	-0.124939
-1.714760	performance. 7.18	-0.124939
-1.203667	51 7.18	-0.124939
-1.923085	graphics card	-0.124939
-0.902931	accelerator card	-0.124939
-1.300632	inefficient, (4)	-0.124939
-0.601980	finally (4)	-0.124939
-2.609824	two elements:	-0.124939
-2.567207	array elements:	-0.124939
-3.148056	a viable	-0.124939
-1.078942	38 7.10	-0.124939
-0.902931	93. 7.10	-0.124939
-2.620879	= _mm_set1_epi16(2);	-0.425969
-2.716931	class templates,	-0.124939
-1.805145	STL templates,	-0.124939
-3.636150	and complexity	-0.124939
-1.980918	high complexity	-0.124939
-1.079022	14.4 511	-0.124939
-0.902931	511 511	-0.124939
-0.380193	7.8 Member	-0.124939
-3.506862	for modifying	-0.124939
-3.220466	by modifying	-0.124939
-2.996966	have undesired	-0.124939
-1.777275	produce undesired	-0.124939
-0.902931	48 7.15	-0.124939
-0.601980	respect. 7.15	-0.124939
-3.516422	The symbol	-0.124939
-1.714968	so-called symbol	-0.124939
-2.501498	very problematic	-0.124939
-1.747232	particularly problematic	-0.124939
-3.113097	to invest	-0.124939
-3.636150	and memcpy,	-0.124939
-3.147653	as memcpy,	-0.124939
-2.977426	and Sum3	-0.124939
-1.446600	anyway. Pure	-0.124939
-1.300632	them. Pure	-0.124939
-3.835016	is impossible	-0.124939
-3.423873	are impossible	-0.124939
-2.022999	class B1;	-0.425969
-1.408064	store forwarding	-0.425969
-0.902931	Far storage,	-0.124939
-0.601980	little-endian storage,	-0.124939
-2.022675	page 107).	-0.124939
-2.167797	/ 64)	-0.124939
-0.601980	(typically 64)	-0.124939
-2.506893	table static.	-0.124939
-0.601980	lookup-table static.	-0.124939
-1.300711	_WIN64 _M_X64	-0.124939
-0.902931	_M_X64 _M_X64	-0.124939
-2.347387	void CriticalFunction();	-0.124939
-0.601980	ReadTSC(); CriticalFunction();	-0.124939
-0.902931	x-xxx---x x-xxx---x	-0.124939
-0.902931	--xx----- x-xxx---x	-0.124939
-2.460257	as shown	-0.425969
-3.636150	and lack	-0.124939
-2.370130	systems lack	-0.124939
-2.620879	= a2	-0.124939
-2.620879	= a1	-0.124939
-2.714185	page 16)	-0.124939
-2.202052	+= 16)	-0.124939
-3.148056	a debugger.	-0.124939
-3.835016	is mostly	-0.124939
-3.636150	and mostly	-0.124939
-2.306769	& a)	-0.124939
-1.379892	(float a)	-0.124939
-0.601980	qword ptr	-0.124939
-0.601980	dword ptr	-0.124939
-3.516422	The fastcall	-0.124939
-2.242969	Use fastcall	-0.124939
-3.786643	of accumulators	-0.124939
-2.610850	multiple accumulators	-0.124939
-1.970878	pointer aliasing"	-0.124939
-0.204111	13.5 Implementation	-0.124939
-2.507684	performance significantly	-0.124939
-2.299518	up significantly	-0.124939
-1.079022	fine-grained parallelism.	-0.124939
-1.078942	natural parallelism.	-0.124939
-0.204111	39916800, 479001600};	-0.124939
-0.902931	book "Performance	-0.124939
-0.601980	Hoisie: "Performance	-0.124939
-3.835016	is type-casted	-0.124939
-3.423873	are type-casted	-0.124939
-2.679615	double x4	-0.124939
-2.614572	float x4	-0.124939
-1.866721	clock frequency.	-0.124939
-3.786643	of interpretation	-0.124939
-3.281660	or interpretation	-0.124939
-2.331989	operations (chapter	-0.124939
-2.253522	execution (chapter	-0.124939
-1.203954	own research,	-0.124939
-0.902931	Codes", SIAM	-0.124939
-0.601980	Hoisie, SIAM	-0.124939
-2.262353	; a[i+1]	-0.124939
-1.203881	Induction; a[i+1]	-0.124939
-1.300632	xxxxxxxxx x-xxx----	-0.124939
-0.601980	x---x---x x-xxx----	-0.124939
-3.281660	or send	-0.124939
-1.998806	don't send	-0.124939
-1.264698	char string[100],	-0.425969
-2.339720	16 SSSE3	-0.124939
-2.212300	etc. SSSE3	-0.124939
-3.786643	of expressions,	-0.124939
-2.803340	point expressions,	-0.124939
-2.563383	version CriticalFunctionType	-0.124939
-0.902931	prototype CriticalFunctionType	-0.124939
-2.022675	page 71).	-0.124939
-3.656516	in ASCII	-0.124939
-0.601980	zero-terminated ASCII	-0.124939
-3.116033	not overlapping	-0.124939
-2.889723	from overlapping	-0.124939
-3.810938	a computationally	-0.124939
-3.116033	not computationally	-0.124939
-1.962514	mechanism executes	-0.124939
-1.203801	12.4b executes	-0.124939
-1.300632	project window	-0.124939
-0.601980	disassembly window	-0.124939
-3.281660	or ten	-0.124939
-3.246643	function ten	-0.124939
-3.321661	// Structure	-0.124939
-1.203801	smaller. Structure	-0.124939
-3.786643	of jobs.	-0.124939
-1.504513	background jobs.	-0.124939
-0.902931	So please	-0.124939
-0.902931	saying please	-0.124939
-0.380193	7.24 Unions	-0.124939
-0.902931	developer.intel.com. AMD:	-0.124939
-0.902931	regularly. AMD:	-0.124939
-3.302196	= ((a*x+b)*x+c)*x+d	-0.124939
-3.066146	x ((a*x+b)*x+c)*x+d	-0.124939
-1.446760	important. 9.2	-0.124939
-1.379733	87 9.2	-0.124939
-3.835016	is 1024	-0.124939
-3.780575	to 1024	-0.124939
-1.365238	was programmed.	-0.124939
-0.902931	12.5. Aligned	-0.124939
-0.601980	performance). Aligned	-0.124939
-4.103464	the past	-0.124939
-0.902931	Writing past	-0.124939
-2.091989	memory. 9.6	-0.124939
-1.300711	90 9.6	-0.124939
-3.461056	the object's	-0.124939
-0.505131	b1, b2,	-0.425969
-2.834886	because partial	-0.124939
-1.714968	so-called partial	-0.124939
-2.620879	= (a+b)+(c+d)	-0.425969
-3.078083	int (16	-0.124939
-2.680877	size (16	-0.124939
-2.814439	instruction xor	-0.124939
-1.680445	mov xor	-0.124939
-1.203881	Remember again,	-0.124939
-1.078942	logarithm again,	-0.124939
-1.203801	96 9.9	-0.124939
-1.203801	www.agner.org/optimize/cppexamples.zip. 9.9	-0.124939
-2.977426	and resolve	-0.124939
-4.103464	the context.	-0.124939
-2.370130	new context.	-0.124939
-1.866591	version (May	-0.425969
-2.022675	page 131.	-0.124939
-4.103464	the goal	-0.124939
-1.300711	realistic goal	-0.124939
-3.636150	and discovered	-0.124939
-2.996966	have discovered	-0.124939
-1.919597	float Exp(float	-0.425969
-1.078942	93 9.8	-0.124939
-0.902931	needs. 9.8	-0.124939
-2.649527	n.a. _MSC_VER	-0.124939
-0.902931	#ifdef _MSC_VER	-0.124939
-0.204111	156 16.3	-0.425969
-0.902931	90% chance	-0.124939
-0.601980	50-50 chance	-0.124939
-3.440385	be manipulated	-0.124939
-2.062310	was manipulated	-0.124939
-3.786643	of c+b	-0.124939
-1.642895	subexpression c+b	-0.124939
-3.113097	to override	-0.124939
-2.964472	use branches,	-0.124939
-1.446680	eliminate branches,	-0.124939
-2.610850	multiple applications,	-0.124939
-2.586446	such applications,	-0.124939
-2.996966	have developed	-0.124939
-2.092068	well developed	-0.124939
-1.300632	method. 7.29	-0.124939
-0.601980	Templates...............................................................................................................57 7.29	-0.124939
-3.786643	of CriticalFunction.	-0.124939
-3.780575	to CriticalFunction.	-0.124939
-2.169226	manual discusses	-0.124939
-1.831315	section discusses	-0.124939
-0.601980	SelectAddMul_SSE41 #elif	-0.124939
-0.601980	SelectAddMul_SSE2 #elif	-0.124939
-1.300711	); 7.26	-0.124939
-1.203801	56 7.26	-0.124939
-0.204111	4: "Instruction	-0.425969
-3.835016	is 400	-0.124939
-3.321661	// 400	-0.124939
-1.107136	Loop invariant	-0.425969
-1.955540	+ c*x	-0.425969
-3.158444	code carefully	-0.124939
-2.710694	each carefully	-0.124939
-2.948146	when CriticalInnerFunction	-0.124939
-2.347387	void CriticalInnerFunction	-0.124939
-0.601980	--------x a/1=a	-0.124939
-0.601980	----x---x a/1=a	-0.124939
-3.009578	{ __m128	-0.124939
-2.315916	type __m128	-0.124939
-2.306769	& operator;	-0.124939
-1.962276	| operator;	-0.124939
-3.810938	a subexpression.	-0.124939
-2.298565	constant subexpression.	-0.124939
-3.835016	is freed	-0.124939
-3.440385	be freed	-0.124939
-1.300711	OR operator,	-0.124939
-0.601980	assignment operator,	-0.124939
-0.902931	&Object2; p->Hello();	-0.124939
-0.601980	p->NotPolymorphic(); p->Hello();	-0.124939
-2.677589	Intel CPUs:	-0.124939
-1.831712	VIA CPUs:	-0.124939
-2.084121	all 0's	-0.124939
-3.810938	a chip	-0.124939
-2.829374	same chip	-0.124939
-1.379733	^ operator.	-0.124939
-0.601980	sizeof operator.	-0.124939
-3.353252	can proceed	-0.124939
-2.593253	also proceed	-0.124939
-2.387675	int CriticalFunction_386(int	-0.425969
-3.636150	and scientific	-0.124939
-3.656516	in scientific	-0.124939
-3.835016	is biased	-0.124939
-3.810938	a biased	-0.124939
-3.810938	a minor	-0.124939
-2.567682	possible minor	-0.124939
-3.461056	the screen.	-0.124939
-3.461056	the market.	-0.124939
-0.602032	__attribute(( aligned(16)))	-0.124939
-2.761868	be justified	-0.124939
-3.780575	to exit	-0.124939
-1.203881	Calling exit	-0.124939
-2.620879	= cos(x);	-0.124939
-2.542495	variable having	-0.124939
-0.902931	p2 having	-0.124939
-2.236325	A for-loop	-0.124939
-2.105380	1 char,	-0.124939
-0.902931	types: char,	-0.124939
-0.902931	(-a)*(-b)=a*b a/a=1	-0.124939
-0.601980	---xxx--- a/a=1	-0.124939
-0.748133	serious legal	-0.124939
-2.808142	other resource,	-0.124939
-1.203801	scarce resource,	-0.124939
-1.283190	automatic parallelization.	-0.124939
-3.121663	of keeping	-0.124939
-0.902931	Event-based sampling:	-0.124939
-0.601980	Time-based sampling:	-0.124939
-2.888122	Example 12.5.	-0.124939
-1.962196	Table 12.5.	-0.124939
-3.119752	This reduces	-0.124939
-2.015521	automatically reduces	-0.124939
-3.810938	a non-member	-0.124939
-1.747312	local non-member	-0.124939
-2.761868	be vectorized,	-0.124939
-0.204111	x=y; y=temp;}	-0.124939
-0.204111	disturbing influences	-0.124939
-2.699296	example explains	-0.124939
-1.962276	better explains	-0.124939
-3.780575	to emulate	-0.124939
-3.353252	can emulate	-0.124939
-3.281660	or four,	-0.124939
-3.220466	by four,	-0.124939
-1.593138	I believe	-0.425969
-3.656516	in stdint.h	-0.124939
-2.339403	file stdint.h	-0.124939
-0.944418	subexpression elimin.,	-0.124939
-2.069733	one instance.	-0.124939
-1.650945	void TransposeCopy(double	-0.425969
-3.081147	an insufficient	-0.124939
-2.859753	has insufficient	-0.124939
-4.103464	the dangers	-0.124939
-3.786643	of dangers	-0.124939
-3.423873	are aligned.	-0.124939
-0.902931	optimally aligned.	-0.124939
-4.103464	the external	-0.124939
-3.190304	with external	-0.124939
-1.181723	<< "Error:	-0.425969
-1.601502	SSE4.1 smmintrin.h	-0.124939
-1.078942	(MS) smmintrin.h	-0.124939
-1.962434	runtime frameworks.	-0.124939
-1.901975	interface frameworks.	-0.124939
-0.204111	Sunday, Monday,	-0.124939
-0.982216	OS X,	-0.124939
-2.926114	A GNU	-0.124939
-0.601980	price GNU	-0.124939
-2.714185	page 127.	-0.124939
-1.446680	generates 127.	-0.124939
-1.866591	version FuncType	-0.124939
-3.780575	to C1::f	-0.124939
-2.444614	call C1::f	-0.124939
-1.962514	efficient. Splitting	-0.124939
-0.601980	rule. Splitting	-0.124939
-1.715127	operations. Algorithms	-0.124939
-0.601980	matrixes. Algorithms	-0.124939
-0.902931	-msse4.1 -mAVX	-0.124939
-0.601980	/arch:SSE4.1 -mAVX	-0.124939
-3.113097	to worry	-0.124939
-2.223162	single instruction.	-0.124939
-1.379892	scan instruction.	-0.124939
-2.636099	// x^2	-0.124939
-3.440385	be disabled	-0.124939
-3.423873	are disabled	-0.124939
-4.103464	the CPU-specific	-0.124939
-3.423873	are CPU-specific	-0.124939
-2.202357	Example 8.26b	-0.124939
-3.516422	The preprocessing	-0.124939
-2.859753	has preprocessing	-0.124939
-2.839593	different strides.	-0.124939
-1.504672	fixed strides.	-0.124939
-3.780575	to 15.	-0.124939
-2.714185	page 15.	-0.124939
-3.113097	to develop	-0.425969
-2.636099	// Full	-0.425969
-3.321661	// (N	-0.124939
-1.203801	N1 (N	-0.124939
-3.321661	// Enable	-0.124939
-0.902931	12.1a. Enable	-0.124939
-2.701816	most cases:	-0.124939
-2.340196	following cases:	-0.124939
-3.113097	to non-AVX	-0.124939
-2.620879	= a+(b+c)	-0.425969
-3.461056	the mouse.	-0.124939
-1.300632	5. www.amd.com.	-0.124939
-0.601980	Processors". www.amd.com.	-0.124939
-3.780575	to -56	-0.124939
-2.253840	result -56	-0.124939
-2.258303	more difficult.	-0.124939
-2.212300	etc. -msse3	-0.124939
-0.902931	/MT -msse3	-0.124939
-1.203801	8.26a (32-bit	-0.124939
-0.601980	segments (32-bit	-0.124939
-1.379813	B values.	-0.124939
-1.203801	case" values.	-0.124939
-0.902931	-msse2 /arch:SSE2	-0.124939
-0.601980	double) /arch:SSE2	-0.124939
-2.558320	objects Vec8s	-0.124939
-1.379733	Is16vec8 Vec8s	-0.124939
-2.716931	class CParent	-0.124939
-1.555745	Here CParent	-0.124939
-0.902931	78). Adding	-0.124939
-0.601980	around. Adding	-0.124939
-0.601980	3B. developer.intel.com.	-0.124939
-0.601980	Manual". developer.intel.com.	-0.124939
-0.204111	-fomit- frame-	-0.425969
-1.283135	#include <stdio.h>	-0.124939
-1.203801	96 9.11	-0.124939
-0.902931	2001. 9.11	-0.124939
-4.103464	the relocations	-0.124939
-1.879937	generate relocations	-0.124939
-2.602810	or NAN	-0.124939
-1.203801	96 9.10	-0.124939
-0.601980	opposite). 9.10	-0.124939
-0.681213	"Hello 2"	-0.124939
-2.516301	return sum;	-0.124939
-1.879699	0, sum;	-0.124939
-1.504592	vectors. 12.10	-0.124939
-1.446680	120 12.10	-0.124939
-3.786643	of semaphores,	-0.124939
-3.147653	as semaphores,	-0.124939
-1.300632	microprocessors. Multiplication	-0.124939
-1.078942	microprocessor. Multiplication	-0.124939
-3.302196	= N&(N-1)	-0.124939
-2.912038	then N&(N-1)	-0.124939
-3.113097	to assembly:	-0.425969
-0.204111	produced regularly.	-0.124939
-3.506862	for vacant	-0.124939
-3.116033	not vacant	-0.124939
-4.103464	the early	-0.124939
-2.281392	Some early	-0.124939
-2.542416	any non-polymorphic	-0.124939
-0.601980	Place non-polymorphic	-0.124939
-2.620879	= 3.3;	-0.124939
-2.565943	many users.	-0.124939
-2.500945	software users.	-0.124939
-2.996966	have extern	-0.124939
-2.803340	point extern	-0.124939
-4.103464	the heap.	-0.124939
-2.885448	memory heap.	-0.124939
-3.516422	The formats	-0.124939
-2.339403	file formats	-0.124939
-2.761868	be ruled	-0.425969
-3.835016	is reused	-0.124939
-3.440385	be reused	-0.124939
-1.642736	vector. Organize	-0.124939
-0.601980	bottleneck. Organize	-0.124939
-1.225171	public CParent<CChild1>	-0.425969
-2.888122	Example 14.12b	-0.124939
-2.699296	example 14.12b	-0.124939
-2.178803	data types:	-0.124939
-4.103464	the FDIV	-0.124939
-3.516422	The FDIV	-0.124939
-4.103464	the decimal	-0.124939
-3.810938	a decimal	-0.124939
-2.614572	float nfac	-0.124939
-1.922926	x; nfac	-0.124939
-1.777355	network connections.	-0.124939
-1.747153	database connections.	-0.124939
-3.148056	a PC.	-0.124939
-3.168158	on hacks	-0.124939
-0.601980	self-styled hacks	-0.124939
-2.290267	times 24	-0.124939
-1.203801	....................................................................................... 24	-0.124939
-2.417873	often suffer	-0.124939
-2.191215	therefore suffer	-0.124939
-3.636150	and throw.	-0.124939
-3.353252	can throw.	-0.124939
-2.331910	bits differently.	-0.124939
-2.180093	works differently.	-0.124939
-0.380193	__declspec( align(16))	-0.425969
-2.016414	each element,	-0.425969
-1.642815	loading Often,	-0.124939
-1.203881	two. Often,	-0.124939
-1.078942	condition. Replacing	-0.124939
-1.079022	inline. Replacing	-0.124939
-1.955540	+ b*x*x	-0.425969
-1.565681	assembly language".	-0.124939
-3.636150	and that's	-0.124939
-2.776831	but that's	-0.124939
-1.078942	124 13.3	-0.124939
-0.902931	programming. 13.3	-0.124939
-3.423873	are inherent	-0.124939
-2.996966	have inherent	-0.124939
-0.601980	-openmp -static	-0.124939
-0.601980	-m64 -static	-0.124939
-1.049135	S1 ArrayOfStructures[100];	-0.124939
-1.300711	122 13.2	-0.124939
-1.300632	files. 13.2	-0.124939
-2.739293	integer comparison	-0.124939
-1.078942	approximate comparison	-0.124939
-4.103464	the hint	-0.124939
-3.810938	a hint	-0.124939
-1.203801	126 13.5	-0.124939
-1.078942	elsewhere. 13.5	-0.124939
-2.511353	2 13.4	-0.124939
-0.601980	decision. 13.4	-0.124939
-2.156532	128 13.7	-0.124939
-1.078942	129 13.7	-0.124939
-1.078942	kernel code"	-0.124939
-0.601980	"position-independent code"	-0.124939
-3.121663	of course.	-0.124939
-1.283135	#include <dvec.h>	-0.425969
-4.103464	the loop,	-0.124939
-2.262274	while loop,	-0.124939
-1.999044	signed number,	-0.124939
-1.504672	family number,	-0.124939
-3.810938	a case:	-0.124939
-1.504592	lower case:	-0.124939
-2.533001	by rolling	-0.425969
-4.103464	the loop:	-0.124939
-1.856934	innermost loop:	-0.124939
-3.302196	= 2.0f;	-0.124939
-2.649750	+ 2.0f;	-0.124939
-2.022675	page 52.	-0.124939
-2.830924	for supporting	-0.124939
-3.475723	that thrown	-0.124939
-1.446600	exceptions thrown	-0.124939
-3.220466	by invoking	-0.124939
-2.362944	without invoking	-0.124939
-2.620879	= {1,	-0.425969
-3.780575	to construct	-0.124939
-3.246643	function construct	-0.124939
-3.170036	is compiled.	-0.124939
-1.650945	void transpose(double	-0.425969
-0.380193	8.7 Checking	-0.425969
-0.944418	subexpression elimination,	-0.425969
-3.302196	= StringLength;	-0.124939
-1.856377	i, StringLength;	-0.124939
-2.031832	application integration,	-0.124939
-1.747153	database integration,	-0.124939
-2.131391	few kilobytes	-0.124939
-1.079022	Total kilobytes	-0.124939
-2.305250	have got	-0.124939
-3.278300	it locally	-0.124939
-1.981236	resources locally	-0.124939
-3.786643	of it,	-0.124939
-2.015600	allows it,	-0.124939
-3.302196	= 32.	-0.124939
-1.879381	usually 32.	-0.124939
-4.103464	the minimum	-0.124939
-2.331910	bits minimum	-0.124939
-2.852102	make thread-specific	-0.124939
-1.680445	containing thread-specific	-0.124939
-3.078083	int a[2];	-0.124939
-1.856377	i, a[2];	-0.124939
-2.787003	which can't	-0.124939
-2.509828	You can't	-0.124939
-2.157089	parameters Vec4f	-0.124939
-0.601980	Vec4uq Vec4f	-0.124939
-1.379733	call. (2)	-0.124939
-1.078942	occurs, (2)	-0.124939
-1.049126	preceding paragraph	-0.124939
-2.919617	will remain	-0.124939
-1.555665	diagonal remain	-0.124939
-0.505122	virus scanners	-0.124939
-2.801422	loop calculates	-0.124939
-2.787003	which calculates	-0.124939
-1.715286	accessing databases,	-0.124939
-1.203801	resources, databases,	-0.124939
-0.601980	"Effective C++".	-0.124939
-0.601980	Effective C++".	-0.124939
-1.446600	Studio 2008	-0.124939
-0.601980	Server 2008	-0.124939
-1.446680	sections /Gy	-0.124939
-0.902931	functions) /Gy	-0.124939
-1.300632	C; Assuming	-0.124939
-0.601980	has. Assuming	-0.124939
-1.879619	binary search,	-0.124939
-1.601502	linear search,	-0.124939
-2.370597	compiler .........................................................................	-0.124939
-2.593648	C++ constructs	-0.124939
-2.340196	programming constructs	-0.124939
-2.147670	different screen	-0.425969
-1.446680	optimizations. Loops	-0.124939
-1.078942	7.13 Loops	-0.124939
-2.347387	void Plus2	-0.124939
-1.831792	a; Plus2	-0.124939
-3.110520	- a*0	-0.124939
-2.649527	n.a. a*0	-0.124939
-3.110520	- a*1	-0.124939
-2.649527	n.a. a*1	-0.124939
-1.283135	#include "vectorclass.h"	-0.425969
-0.602041	a[size], b[size],	-0.124939
-1.985265	double Table[100];	-0.425969
-0.748133	sections describe	-0.124939
-3.810938	a GOT.	-0.124939
-3.636150	and GOT.	-0.124939
-3.516422	The dynamic_cast	-0.124939
-2.463553	makes dynamic_cast	-0.124939
-0.903035	3 Finding	-0.425969
-0.204111	24, 120,	-0.425969
-3.506862	for uninitialized	-0.124939
-3.423873	are uninitialized	-0.124939
-0.601980	81). 77	-0.124939
-0.601980	....................................................................... 77	-0.124939
-3.810938	a string.	-0.124939
-0.601980	vendor string.	-0.124939
-3.066146	x 74	-0.124939
-0.601980	compilers............................................................................. 74	-0.124939
-2.913351	} 73	-0.124939
-2.714185	page 73	-0.124939
-1.078942	CChild2 Object2;	-0.124939
-0.902931	C2 Object2;	-0.124939
-4.103464	the destination	-0.124939
-3.636150	and destination	-0.124939
-1.811150	table lookup:	-0.425969
-3.636150	and 72	-0.124939
-1.831792	a; 72	-0.124939
-2.306769	& (Tuesday	-0.124939
-2.015918	expression (Tuesday	-0.124939
-2.387675	int b:2;	-0.425969
-2.620879	= string;	-0.124939
-3.506862	for putting	-0.124939
-3.220466	by putting	-0.124939
-3.190304	with 14.14b	-0.124939
-2.888122	Example 14.14b	-0.124939
-3.656516	in non-	-0.124939
-3.168158	on non-	-0.124939
-3.780575	to 15.1c.	-0.124939
-2.888122	Example 15.1c.	-0.124939
-2.022675	page 73).	-0.124939
-3.158444	code .......................................................	-0.124939
-1.901816	vectors .......................................................	-0.124939
-0.902931	semaphores, mutexes,	-0.124939
-0.902931	windows, mutexes,	-0.124939
-3.835016	is volatile.	-0.124939
-1.902214	declared volatile.	-0.124939
-2.964472	use relocation.	-0.124939
-2.378461	need relocation.	-0.124939
-1.264642	| (~a&c)	-0.124939
-3.810938	a 90%	-0.124939
-2.156532	uses 90%	-0.124939
-3.078083	int MultiplyBy	-0.124939
-2.792165	If MultiplyBy	-0.124939
-3.116033	not suited	-0.124939
-2.233381	best suited	-0.124939
-0.748142	Software Developer’s	-0.425969
-0.204111	b2, y1,	-0.124939
-2.888122	Example 14.14a	-0.124939
-2.699296	example 14.14a	-0.124939
-2.370685	user settings	-0.124939
-0.902931	color settings	-0.124939
-2.272010	function. Compile	-0.124939
-1.446839	130 Compile	-0.124939
-2.272010	function. Provoke	-0.124939
-1.446680	disk. Provoke	-0.124939
-2.390532	an import	-0.425969
-1.998647	block turns	-0.124939
-1.504513	prediction turns	-0.124939
-3.636150	and 0x4700.	-0.124939
-2.889723	from 0x4700.	-0.124939
-1.078942	----- x----	-0.124939
-0.902931	x---- x----	-0.124939
-0.204111	2.8 Overcoming	-0.425969
-0.902931	a*0=0 a*1=a	-0.124939
-0.601980	--xxxx-xx a*1=a	-0.124939
-1.300632	necessary. Take	-0.124939
-1.078942	features. Take	-0.124939
-2.872111	data shuffling,	-0.124939
-0.601980	conversion, shuffling,	-0.124939
-2.147670	different priorities	-0.124939
-3.121663	of range";	-0.124939
-1.980997	various corrections	-0.124939
-1.203881	me corrections	-0.124939
-2.444773	less safe.	-0.124939
-2.202846	exception safe.	-0.124939
-3.835016	is supported.	-0.124939
-3.116033	not supported.	-0.124939
-2.390532	an anonymous	-0.124939
-3.835016	is pure.	-0.124939
-3.440385	be pure.	-0.124939
-2.307086	instructions SSE4.2	-0.124939
-0.902931	smmintrin.h SSE4.2	-0.124939
-2.516301	return clock;	-0.124939
-2.497381	long clock;	-0.124939
-2.006127	example 12.4a	-0.124939
-1.985413	size matrices,	-0.425969
-1.923164	give inconsistent	-0.124939
-1.923005	becomes inconsistent	-0.124939
-0.601980	_mm_or_si128(c2, bc);	-0.124939
-0.601980	_mm_andnot_si128(mask, bc);	-0.124939
-3.113097	to join	-0.124939
-3.121663	of range.	-0.124939
-2.387675	int c:2;	-0.425969
-1.504592	fast. Value	-0.124939
-1.078942	slow. Value	-0.124939
-2.716931	class CGrandParent	-0.124939
-1.923085	public CGrandParent	-0.124939
-0.204111	_mm_add_epi16(c, two);	-0.425969
-3.066146	x --	-0.124939
-1.300632	xxxxxxxxx --	-0.124939
-3.835016	is bypassed	-0.124939
-3.440385	be bypassed	-0.124939
-2.223082	several drivers,	-0.124939
-0.601980	Device drivers,	-0.124939
-3.835016	is -0	-0.124939
-3.281660	or -0	-0.124939
-1.225171	graphics accelerator	-0.124939
-1.680445	access. 3.10	-0.124939
-1.300711	21 3.10	-0.124939
-1.300711	21 3.11	-0.124939
-1.078942	best. 3.11	-0.124939
-3.278300	it increases	-0.124939
-2.506893	table increases	-0.124939
-0.602023	21 3.13	-0.425969
-1.300632	22 3.14	-0.124939
-1.078942	caching. 3.14	-0.124939
-1.379813	cores. 3.15	-0.124939
-1.300632	22 3.15	-0.124939
-1.379733	chain. 3.16	-0.124939
-1.300632	22 3.16	-0.124939
-2.852102	make Sum1	-0.124939
-2.156930	functions. Sum1	-0.124939
-0.204111	a*b+a*c=a*(b+c) a*x*x*x	-0.425969
-0.505131	|= 0x20;	-0.124939
-3.220466	by TILESIZE	-0.124939
-3.078083	int TILESIZE	-0.124939
-2.542416	any expression,	-0.124939
-1.962117	&& expression,	-0.124939
-2.278946	time consumers	-0.124939
-4.103464	the CPU,	-0.124939
-1.777355	slow CPU,	-0.124939
-2.620879	= &Object1;	-0.124939
-2.370130	systems .............................................................................	-0.124939
-2.273123	does .............................................................................	-0.124939
-0.204111	Approximate exp(x)	-0.425969
-3.121663	of programming.	-0.124939
-3.110520	- time1;	-0.124939
-2.497381	long time1;	-0.124939
-1.394267	certain events,	-0.124939
-3.835016	is achieved	-0.124939
-3.440385	be achieved	-0.124939
-1.283135	#include <emmintrin.h>	-0.124939
-1.504592	applications. 2.8	-0.124939
-1.379813	14 2.8	-0.124939
-4.103464	the answers	-0.124939
-2.156532	get answers	-0.124939
-3.786643	of starting	-0.124939
-1.079022	Before starting	-0.124939
-2.164987	has disadvantages:	-0.124939
-1.555745	6 2.3	-0.124939
-1.300632	manual. 2.3	-0.124939
-2.491452	branch ahead	-0.124939
-2.077589	counter ahead	-0.124939
-3.835016	is inserted	-0.124939
-2.996966	have inserted	-0.124939
-1.998885	cache. 2.2	-0.124939
-1.831474	5 2.2	-0.124939
-0.902931	-msse /arch:SSE	-0.124939
-0.601980	vectors) /arch:SSE	-0.124939
-1.901737	platform 2.1	-0.124939
-1.831474	5 2.1	-0.124939
-2.649750	+ 2.0	-0.124939
-2.491214	< 2.0	-0.124939
-0.204111	5040, 40320,	-0.425969
-2.387675	int Func(int);	-0.425969
-2.919617	will invalidate	-0.124939
-0.601980	actively invalidate	-0.124939
-2.841800	The opposite	-0.124939
-3.656516	in itself,	-0.124939
-1.922767	framework itself,	-0.124939
-1.504592	12 2.7	-0.124939
-0.601980	undocumented. 2.7	-0.124939
-2.387675	int a[1000];	-0.124939
-1.879619	10 2.6	-0.124939
-1.856138	compiler. 2.6	-0.124939
-3.220466	by S.	-0.124939
-0.601980	Henry S.	-0.124939
-2.223082	thread environment	-0.124939
-1.998567	development environment	-0.124939
-3.009578	{ F2(b);	-0.124939
-0.902931	b[1000]; F2(b);	-0.124939
-3.278300	it handles	-0.124939
-2.031991	microprocessor handles	-0.124939
-1.715048	optimization. 2.4	-0.124939
-1.555745	6 2.4	-0.124939
-3.780575	to note	-0.124939
-0.902931	Please note	-0.124939
-2.132504	whether others	-0.124939
-1.203881	well, others	-0.124939
-2.232747	specific needs.	-0.124939
-1.203881	user's needs.	-0.124939
-2.167797	/ sar	-0.124939
-2.105221	add sar	-0.124939
-0.204111	((visibility ("internal")))	-0.124939
-2.888122	Example 8.15a	-0.124939
-2.699296	example 8.15a	-0.124939
-2.192348	memory footprint	-0.124939
-3.636150	and 14.13b	-0.124939
-2.888122	Example 14.13b	-0.124939
-3.281660	or namespaces.	-0.124939
-2.693818	using namespaces.	-0.124939
-3.506862	for preventing	-0.124939
-0.601980	effectively preventing	-0.124939
-2.636099	// Lowest	-0.425969
-0.601980	Friday, Saturday	-0.124939
-0.601980	0x20, Saturday	-0.124939
-0.204111	(a+b)+(c+d) a*b+a*c=a*(b+c)	-0.425969
-0.204111	screen resolutions,	-0.124939
-1.379813	4. So	-0.124939
-0.601980	everybody. So	-0.124939
-2.202357	Example 9.6a	-0.124939
-2.620879	= a*(b+c)	-0.425969
-1.777434	Such events	-0.124939
-1.504592	random events	-0.124939
-3.220466	by __fastcall.	-0.124939
-2.693818	using __fastcall.	-0.124939
-3.110520	- a+0	-0.124939
-2.649527	n.a. a+0	-0.124939
-1.446680	speed. Delays	-0.124939
-0.601980	reproducibility. Delays	-0.124939
-0.902931	Technical Report	-0.124939
-0.601980	"Technical Report	-0.124939
-3.302196	= (bb[i]	-0.124939
-2.105539	: (bb[i]	-0.124939
-3.835016	is specified.	-0.124939
-2.718034	set specified.	-0.124939
-3.246643	function prototype	-0.124939
-2.144934	Function prototype	-0.124939
-2.714185	page 39	-0.124939
-0.601980	j++) 39	-0.124939
-2.444773	example, let's	-0.124939
-0.601980	difference, let's	-0.124939
-2.549586	if (Day	-0.124939
-3.440385	be visible	-0.124939
-3.116033	not visible	-0.124939
-2.425670	64 Kbytes	-0.124939
-1.998488	256 Kbytes	-0.124939
-3.810938	a proxy	-0.124939
-3.516422	The proxy	-0.124939
-2.913351	} Microprocessors	-0.124939
-1.902055	control Microprocessors	-0.124939
-2.022675	page 105.	-0.124939
-2.272487	accessed recently	-0.124939
-1.680604	least recently	-0.124939
-3.780575	to creating	-0.124939
-3.506862	for creating	-0.124939
-2.620879	= order(i);	-0.124939
-2.666831	pointer refers	-0.124939
-1.504672	parallelism refers	-0.124939
-3.810938	a floppy	-0.124939
-3.147653	as floppy	-0.124939
-3.786643	of underflow.	-0.124939
-3.636150	and underflow.	-0.124939
-1.203801	...................................................................................................... 37	-0.124939
-0.902931	avoided. 37	-0.124939
-3.656516	in 36	-0.124939
-0.902931	............................................................................................ 36	-0.124939
-1.610327	& 3)	-0.124939
-2.586446	such contrived	-0.124939
-2.501498	very contrived	-0.124939
-1.379813	branches. Manual	-0.124939
-0.902931	www.intel.com. Manual	-0.124939
-1.504751	space. Excessive	-0.124939
-0.601980	much. Excessive	-0.124939
-2.347387	void FuncA	-0.124939
-0.601980	alternately FuncA	-0.124939
-3.636150	and web	-0.124939
-0.902931	integration, web	-0.124939
-3.353252	can occur,	-0.124939
-2.243524	doesn't occur,	-0.124939
-3.780575	to reorder	-0.124939
-3.048496	may reorder	-0.124939
-3.281660	or microseconds	-0.124939
-2.425987	take microseconds	-0.124939
-4.103464	the Standard	-0.124939
-1.078942	security. Standard	-0.124939
-4.103464	the const_cast	-0.124939
-3.516422	The const_cast	-0.124939
-1.601423	used, though.	-0.124939
-0.601980	optimal, though.	-0.124939
-0.602041	MS compiler:	-0.124939
-0.204111	F3(bool y)	-0.425969
-0.902931	-static /MT	-0.124939
-0.601980	/openmp /MT	-0.124939
-3.835016	is overwritten,	-0.124939
-3.440385	be overwritten,	-0.124939
-0.944409	PTR [esp+8]	-0.124939
-3.440385	be annoyingly	-0.124939
-1.923164	give annoyingly	-0.124939
-2.614572	float list[size];	-0.124939
-1.747471	S1 list[size];	-0.124939
-1.203801	commercial license	-0.124939
-0.902931	License license	-0.124939
-0.601980	b2); y1	-0.124939
-0.601980	y2; y1	-0.124939
-1.078942	reciprocal_divisor; y2	-0.124939
-0.601980	b1; y2	-0.124939
-1.158252	#define swapd(x,y)	-0.425969
-0.602032	((unsigned int)i	-0.124939
-2.387675	int CriticalFunction_SSE2(int	-0.425969
-1.816377	2 GHz	-0.124939
-3.835016	is false.	-0.124939
-2.948146	when false.	-0.124939
-1.879619	10 Multithreading	-0.124939
-1.078942	101 Multithreading	-0.124939
-1.158252	CPUs. New	-0.425969
-1.875645	* CriticalFunctionDispatch(void)	-0.124939
-2.369497	these methods.	-0.124939
-1.901975	dispatch methods.	-0.124939
-3.063211	compiler became	-0.124939
-1.379813	soon became	-0.124939
-2.817016	only if,	-0.124939
-2.157327	advantageous if,	-0.124939
-1.203881	user's computers.	-0.124939
-0.902931	mainframe computers.	-0.124939
-1.300632	C; x.abc	-0.124939
-0.601980	7.40c x.abc	-0.124939
-0.204111	16.3 Worst-case	-0.425969
-1.555665	expressions. Operations	-0.124939
-1.203801	aliasing. Operations	-0.124939
-2.405379	libraries named	-0.124939
-2.377747	registers named	-0.124939
-0.902931	x-xxxxxx- a*0=0	-0.124939
-0.902931	a+0=a a*0=0	-0.124939
-3.636150	and p2	-0.124939
-0.601980	p2; p2	-0.124939
-3.656516	in p1	-0.124939
-0.601980	p1; p1	-0.124939
-2.084121	all major	-0.425969
-2.964472	use internet	-0.124939
-1.379733	Several internet	-0.124939
-1.875645	* p;	-0.124939
-3.078083	int lrintf	-0.124939
-2.822375	functions lrintf	-0.124939
-4.103464	the resulting	-0.124939
-3.516422	The resulting	-0.124939
-0.204111	swapd(a[r][c], a[c][r]);	-0.124939
-1.493905	precision math.	-0.124939
-3.302196	= 2048	-0.124939
-1.880017	512 2048	-0.124939
-2.649750	+ 3.5;	-0.124939
-2.570272	* 3.5;	-0.124939
-3.516422	The DLLs	-0.124939
-2.253046	Windows DLLs	-0.124939
-3.506862	for Unix	-0.124939
-2.602966	64-bit Unix	-0.124939
-1.962434	lookup Lookup	-0.124939
-1.078942	lookup. Lookup	-0.124939
-2.157089	parameters differ	-0.124939
-1.300711	drivers differ	-0.124939
-2.620879	= InstructionSet();	-0.425969
-1.650945	void F1()	-0.124939
-3.119752	This safety	-0.124939
-1.300711	compromise safety	-0.124939
-3.786643	of predefined	-0.124939
-2.242969	Use predefined	-0.124939
-3.810938	a variable-size	-0.124939
-1.601662	allocate variable-size	-0.124939
-2.306769	& obj1;	-0.124939
-1.446680	C1 obj1;	-0.124939
-0.204111	Intensive Codes",	-0.124939
-2.745422	are summarized	-0.124939
-3.835016	is small.	-0.124939
-1.962276	too small.	-0.124939
-2.015521	handling ................................................................................	-0.124939
-0.902931	consumers ................................................................................	-0.124939
-3.810938	a buffer.	-0.124939
-1.642895	target buffer.	-0.124939
-1.919597	float list[size],	-0.124939
-1.283135	#include "asmlib.h"	-0.425969
-3.786643	of ArraySize	-0.124939
-3.078083	int ArraySize	-0.124939
-2.614572	float Live	-0.124939
-1.555585	storage. Live	-0.124939
-0.601980	c2, mask);	-0.124939
-0.601980	_mm_and_si128(c2, mask);	-0.124939
-3.190304	with suffixes	-0.124939
-2.222845	These suffixes	-0.124939
-4.103464	the programmer.	-0.124939
-2.031832	application programmer.	-0.124939
-0.902931	a+0=a x-xxxxxx-	-0.124939
-0.601980	x-xx----x x-xxxxxx-	-0.124939
-3.810938	a name.	-0.124939
-2.829374	same name.	-0.124939
-3.810938	a third-party	-0.124939
-2.593253	also third-party	-0.124939
-0.902931	b*a (a+b)+c=a+(b+c)	-0.124939
-0.601980	a+b+c=a+(b+c) (a+b)+c=a+(b+c)	-0.124939
-3.506862	for audio	-0.124939
-0.601980	streaming audio	-0.124939
-3.147653	as arguments	-0.124939
-2.829374	same arguments	-0.124939
-3.081147	an infinite	-0.124939
-1.379813	avoiding infinite	-0.124939
-2.171808	program flow.	-0.124939
-2.355946	even worse,	-0.124939
-1.446600	Even worse,	-0.124939
-2.048645	cache miss	-0.124939
-4.103464	the unsafe	-0.124939
-3.835016	is unsafe	-0.124939
-1.482652	optimized away.	-0.124939
-4.103464	the movements	-0.124939
-1.300711	physical movements	-0.124939
-2.620879	= ((x2)	-0.425969
-3.780575	to windows,	-0.124939
-1.446600	memory, windows,	-0.124939
-3.780575	to pressing	-0.124939
-2.015680	like pressing	-0.124939
-1.504592	register. Factors	-0.124939
-1.446680	is. Factors	-0.124939
-3.147653	as price,	-0.124939
-1.980918	high price,	-0.124939
-1.300632	jumps Jumps	-0.124939
-1.203801	optimized. Jumps	-0.124939
-2.977426	and maintaining	-0.124939
-1.078942	operands. Nevertheless,	-0.124939
-0.902931	PC. Nevertheless,	-0.124939
-3.636150	and sound	-0.124939
-1.379813	processing, sound	-0.124939
-3.636150	and servers	-0.124939
-3.168158	on servers	-0.124939
-2.157753	make utility.	-0.124939
-4.103464	the executable.	-0.124939
-2.829374	same executable.	-0.124939
-2.761868	be controlled.	-0.124939
-2.232747	specific literature	-0.124939
-1.504513	general literature	-0.124939
-2.620879	= 512;	-0.425969
-2.272010	extra precautions	-0.124939
-1.747153	special precautions	-0.124939
-2.745422	are smarter	-0.425969
-0.204111	SIAM 2001.	-0.124939
-0.902931	73). Current	-0.124939
-0.601980	accumulators. Current	-0.124939
-3.170036	is concentrated	-0.425969
-2.315715	{ aa[i]	-0.425969
-3.148056	a null	-0.124939
-3.835016	is capable	-0.124939
-3.423873	are capable	-0.124939
-2.913351	} FuncC(i);	-0.124939
-0.902931	FuncA(i); FuncC(i);	-0.124939
-3.506862	for updating.	-0.124939
-0.601980	Hardware updating.	-0.124939
-3.636150	and MOVNTDQ	-0.124939
-2.743272	cache MOVNTDQ	-0.124939
-3.516422	The renaming	-0.124939
-2.431391	register renaming	-0.124939
-1.715366	When considering	-0.124939
-1.300871	worth considering	-0.124939
-3.835016	is worthwhile	-0.124939
-3.440385	be worthwhile	-0.124939
-0.902931	x-xxx---- a-(-b)=a+b	-0.124939
-0.601980	--xxxxxx- a-(-b)=a+b	-0.124939
-1.650945	void f();	-0.425969
-2.202528	allocated separately.	-0.124939
-1.446600	measured separately.	-0.124939
-2.347863	access patterns	-0.124939
-1.203801	regular patterns	-0.124939
-2.022675	page 93.	-0.124939
-4.103464	the lowest	-0.124939
-0.601980	Error: lowest	-0.124939
-1.856377	#define EXCEPTION_FLT_OVERFLOW	-0.124939
-1.747551	== EXCEPTION_FLT_OVERFLOW	-0.124939
-3.810938	a constructor,	-0.124939
-2.092466	copy constructor,	-0.124939
-0.601980	Intel/MASM syntax:	-0.124939
-0.601980	Gnu/AT&T syntax:	-0.124939
-1.875645	* 1.2;	-0.425969
-2.022675	page 26.	-0.124939
-4.103464	the parentheses	-0.124939
-2.609824	two parentheses	-0.124939
-2.202449	overflow check.	-0.124939
-1.805304	syntax check.	-0.124939
-3.786643	of experiments	-0.124939
-2.701658	do experiments	-0.124939
-2.253522	execution .................................................................................................	-0.124939
-1.504672	tables .................................................................................................	-0.124939
-0.204111	"Instruction tables".	-0.124939
-2.167797	/ jl	-0.124939
-1.078942	cmp jl	-0.124939
-1.715207	total computation	-0.124939
-0.902931	overall computation	-0.124939
-2.852102	make thread-local	-0.124939
-1.715605	(See thread-local	-0.124939
-3.113097	to date.	-0.124939
-3.810938	a physics	-0.124939
-1.203881	dedicated physics	-0.124939
-2.168591	calculated first,	-0.124939
-2.092148	values first,	-0.124939
-1.556256	(see below)	-0.124939
-3.835016	is eliminated.	-0.124939
-3.423873	are eliminated.	-0.124939
-2.468986	elements c.load(cc+i);	-0.124939
-0.902931	b.load(bb+i); c.load(cc+i);	-0.124939
-0.601980	!(!a)=a x-xxxxxxx	-0.124939
-0.601980	x-xxxx-x- x-xxxxxxx	-0.124939
-1.601423	CPU. Unrolling	-0.124939
-0.601980	FuncC. Unrolling	-0.124939
-2.859753	has hyperthreading.	-0.124939
-2.693818	using hyperthreading.	-0.124939
-0.902931	1024 bytes,	-0.124939
-0.902931	8192 bytes,	-0.124939
-1.708937	libraries (*.lib,	-0.425969
-3.786643	of irrelevant	-0.124939
-3.440385	be irrelevant	-0.124939
-3.440385	be careful	-0.124939
-1.999203	needs careful	-0.124939
-2.178803	data compression	-0.124939
-2.972462	time intervals	-0.124939
-1.078942	unpredictable intervals	-0.124939
-2.830924	for (c2	-0.425969
-3.081147	an immediate	-0.124939
-1.078942	expects immediate	-0.124939
-1.446680	C1 Object1;	-0.124939
-1.203881	CChild1 Object1;	-0.124939
-1.078942	multiplication, etc.)	-0.124939
-0.601980	OS, etc.)	-0.124939
-2.977426	and 64-bit.	-0.124939
-3.516422	The indirect	-0.124939
-0.601980	"Gnu indirect	-0.124939
-0.601980	---x---xx (-a==-b)=(a==b)	-0.124939
-0.601980	---xx--xx (-a==-b)=(a==b)	-0.124939
-2.859753	has hyperthreading,	-0.124939
-2.693818	using hyperthreading,	-0.124939
-3.461056	the exponent,	-0.425969
-2.315715	{ FuncA(i);	-0.124939
-4.103464	the cross-platform	-0.124939
-3.786643	of cross-platform	-0.124939
-0.204111	swapd(x,y) {temp=x;	-0.425969
-2.178803	data decomposition.	-0.124939
-1.702144	template parameter:	-0.124939
-2.787003	which determines	-0.124939
-1.715605	operand determines	-0.124939
-1.317915	members (properties)	-0.124939
-3.078083	int ABC	-0.124939
-1.856377	#define ABC	-0.124939
-4.103464	the comments	-0.124939
-2.131391	few comments	-0.124939
-1.300711	Arrays .....................................................................................................................	-0.124939
-1.078942	Literature .....................................................................................................................	-0.124939
-3.835016	is profitable	-0.124939
-3.440385	be profitable	-0.124939
-1.555745	logic behind	-0.124939
-1.078942	hidden behind	-0.124939
-0.902931	Fog. Technical	-0.124939
-0.601980	TR18015 Technical	-0.124939
-1.943288	set. Neither	-0.124939
-0.601980	hour. Neither	-0.124939
-2.710694	each calculation.	-0.124939
-2.031435	next calculation.	-0.124939
-0.602032	__attribute(( const))	-0.124939
-3.780575	to test,	-0.124939
-1.642895	under test,	-0.124939
-0.204111	/Gy -ffunction-	-0.425969
-0.204111	/arch:SSE -msse	-0.124939
-2.636099	// Initialize	-0.124939
-2.567207	array indices	-0.124939
-1.805781	consecutive indices	-0.124939
-3.009578	{ s0	-0.124939
-2.614572	float s0	-0.124939
-1.078942	listing /FA	-0.124939
-0.601980	masm=intel /FA	-0.124939
-3.506862	for marketing	-0.124939
-1.504592	heavy marketing	-0.124939
-3.078083	int parm2);	-0.124939
-0.601980	(*CriticalFunction)(parm1, parm2);	-0.124939
-0.902931	(a+b)+c=a+(b+c) --xx-----	-0.124939
-0.902931	x--x----- --xx-----	-0.124939
-2.620879	= 20,	-0.425969
-4.103464	the conflicting	-0.124939
-2.417873	often conflicting	-0.124939
-1.794843	< 20;	-0.124939
-4.103464	the 61	-0.124939
-0.902931	................................................................................ 61	-0.124939
-3.220466	by looking	-0.124939
-0.902931	funny looking	-0.124939
-3.190304	with coarse-grained	-0.124939
-2.496589	between coarse-grained	-0.124939
-2.468986	elements matrix[c][r]	-0.124939
-2.233619	element matrix[c][r]	-0.124939
-0.902931	-msse3 -mssse3	-0.124939
-0.601980	/arch:SSE3 -mssse3	-0.124939
-3.780575	to isolate	-0.124939
-3.636150	and isolate	-0.124939
-2.405141	code. Let	-0.124939
-1.601343	sets. Let	-0.124939
-2.972915	in question.	-0.124939
-3.321661	// x^n	-0.124939
-2.155898	four x^n	-0.124939
-3.475723	that treats	-0.124939
-1.962593	dispatcher treats	-0.124939
-1.708919	optimization topics	-0.124939
-1.601343	address. (3)	-0.124939
-0.601980	around, (3)	-0.124939
-1.264661	lookup table:	-0.425969
-3.835016	is unstable	-0.124939
-3.423873	are unstable	-0.124939
-1.379813	cores. 60	-0.124939
-0.601980	.................................................................................................................. 60	-0.124939
-3.786643	of iterations.	-0.124939
-0.601980	Newton-Raphson iterations.	-0.124939
-1.078942	analysis Join	-0.124939
-0.902931	area. Join	-0.124939
-3.302196	= bb[i]	-0.124939
-2.948146	when bb[i]	-0.124939
-4.103464	the sampling	-0.124939
-0.902931	Event-based sampling	-0.124939
-3.302196	= (memory	-0.124939
-2.558320	objects (memory	-0.124939
-3.506862	for verifying	-0.124939
-0.601980	testing, verifying	-0.124939
-1.446680	19 3.6	-0.124939
-0.601980	acceptable. 3.6	-0.124939
-1.300711	18 3.4	-0.124939
-1.078942	manner. 3.4	-0.124939
-2.620879	= log(b[i])	-0.425969
-1.078942	100. Now,	-0.124939
-0.601980	sizeof(float)). Now,	-0.124939
-2.511353	2 a+a+a+a=a*4	-0.124939
-0.601980	((x2)2)2 a+a+a+a=a*4	-0.124939
-3.656516	in doubt	-0.124939
-2.726769	no doubt	-0.124939
-2.022675	page 78).	-0.124939
-3.636150	and v.f	-0.124939
-1.943209	> v.f	-0.124939
-0.902931	FIFO manner?	-0.124939
-0.601980	FILO manner?	-0.124939
-3.068704	than generating	-0.124939
-2.362944	without generating	-0.124939
-1.300632	priority. Especially	-0.124939
-1.300632	addresses. Especially	-0.124939
-3.063211	compiler ....................................................................................................	-0.124939
-1.642656	updates ....................................................................................................	-0.124939
-2.339720	16 3.2	-0.124939
-0.601980	improved. 3.2	-0.124939
-3.009578	{ F1(a);	-0.124939
-0.902931	a[1000]; F1(a);	-0.124939
-2.272010	function. Switch	-0.124939
-1.300711	ways. Switch	-0.124939
-2.620879	= _mm_set1_epi16(0);	-0.425969
-3.158444	code everywhere	-0.124939
-1.556063	scattered everywhere	-0.124939
-2.620879	= a&&(b||c)	-0.124939
-3.302196	= (a&b)	-0.124939
-1.879699	0, (a&b)	-0.124939
-2.339720	16 3.3	-0.124939
-0.601980	sections. 3.3	-0.124939
-2.339720	16 3.1	-0.124939
-0.902931	consumers 3.1	-0.124939
-1.879778	goes randomly	-0.124939
-1.556063	scattered randomly	-0.124939
-1.203801	structures. Useful	-0.124939
-0.601980	requirement. Useful	-0.124939
-1.642895	20 3.8	-0.124939
-0.601980	finish. 3.8	-0.124939
-0.944418	20 3.9	-0.425969
-2.062866	optimize ............................................................................................	-0.124939
-1.831553	references ............................................................................................	-0.124939
-1.546256	big mainframe	-0.124939
-3.321661	// (time	-0.124939
-3.110520	- (time	-0.124939
-1.715048	optimization. Everything	-0.124939
-1.504592	register. Everything	-0.124939
-3.170036	is required.	-0.124939
-4.103464	the theoretical	-0.124939
-3.516422	The theoretical	-0.124939
-3.780575	to 12.1a.	-0.124939
-2.888122	Example 12.1a.	-0.124939
-4.103464	the file,	-0.124939
-0.601980	.exe file,	-0.124939
-1.556063	hard working	-0.124939
-0.601980	indexes, working	-0.124939
-3.516422	The pragmas	-0.124939
-3.147653	as pragmas	-0.124939
-3.780575	to use.	-0.124939
-3.656516	in use.	-0.124939
-3.780575	to use,	-0.124939
-2.431391	register use,	-0.124939
-2.444773	less favorable:	-0.124939
-1.777036	vectorization favorable:	-0.124939
-2.339166	system color	-0.124939
-1.203881	RGB color	-0.124939
-3.835016	is 8192	-0.124939
-3.302196	= 8192	-0.124939
-2.620879	= _mm_cmpgt_epi16(b,	-0.425969
-3.835016	is lost.	-0.124939
-3.423873	are lost.	-0.124939
-0.204111	7.30 Exceptions	-0.425969
-4.103464	the question	-0.124939
-3.656516	in question	-0.124939
-3.636150	and afterwards	-0.124939
-2.864769	program afterwards	-0.124939
-1.601343	returns. Every	-0.124939
-0.601980	columns. Every	-0.124939
-4.103464	the denormals-are-zero	-0.124939
-3.636150	and denormals-are-zero	-0.124939
-3.246643	function declaration.	-0.124939
-2.716931	class declaration.	-0.124939
-2.808142	other exceptions:	-0.124939
-1.943686	after exceptions:	-0.124939
-3.113097	to read.	-0.124939
-1.831315	are: Non-static	-0.124939
-0.902931	instance. Non-static	-0.124939
-3.810938	a re-	-0.124939
-3.281660	or re-	-0.124939
-2.627317	library requiring	-0.124939
-1.922767	framework requiring	-0.124939
-1.300711	abc {int	-0.124939
-0.902931	Sab {int	-0.124939
-4.103464	the branching	-0.124939
-3.516422	The branching	-0.124939
-2.859753	has changed.	-0.124939
-1.998488	never changed.	-0.124939
-2.611245	object belongs	-0.124939
-1.203881	normally belongs	-0.124939
-1.245346	|| (a&&c)	-0.124939
-3.078083	int NumberOfTests	-0.124939
-0.601980	Repeat NumberOfTests	-0.124939
-3.835016	is obviously	-0.124939
-3.278300	it obviously	-0.124939
-1.133419	go undetected.	-0.124939
-1.300632	runs alone	-0.124939
-0.601980	stand alone	-0.124939
-3.461056	the caller	-0.124939
-1.962276	better understanding	-0.124939
-0.601980	basic understanding	-0.124939
-3.353252	can influence	-0.124939
-2.859753	has influence	-0.124939
-2.224274	c x-xx-----	-0.124939
-1.379733	x-xxxx--x x-xx-----	-0.124939
-1.879778	called. Lazy	-0.124939
-1.078942	long. Lazy	-0.124939
-3.321661	// Volatile	-0.124939
-1.555665	enabled. Volatile	-0.124939
-3.780575	to lock	-0.124939
-0.601980	temporarily lock	-0.124939
-3.835016	is allowed.	-0.124939
-3.116033	not allowed.	-0.124939
-2.586367	efficient today	-0.124939
-2.370130	new today	-0.124939
-2.620879	= (double)(signed	-0.425969
-3.810938	a programmable	-0.124939
-2.926114	A programmable	-0.124939
-2.586446	such checks.	-0.124939
-1.805304	syntax checks.	-0.124939
-1.962434	lookup mechanisms	-0.124939
-0.601980	Updating mechanisms	-0.124939
-2.506893	table 8.1.	-0.124939
-1.962196	Table 8.1.	-0.124939
-4.103464	the G	-0.124939
-2.155898	four G	-0.124939
-2.191374	Linux Align	-0.124939
-1.379813	4. Align	-0.124939
-2.649750	+ c)	-0.124939
-2.167797	/ c)	-0.124939
-3.302196	= &CriticalFunction_386;	-0.124939
-2.516301	return &CriticalFunction_386;	-0.124939
-2.609824	two (three	-0.124939
-2.281154	stack (three	-0.124939
-2.315715	{ goto	-0.124939
-3.116033	not testing.	-0.124939
-2.889723	from testing.	-0.124939
-0.204111	"override" feature.	-0.124939
-0.204111	symbol interposition	-0.124939
-3.475723	that loads	-0.124939
-2.864769	program loads	-0.124939
-0.204111	(*.lib, *.a)	-0.425969
-2.117324	CPU dispatchers	-0.124939
-1.504592	register. Registers	-0.124939
-1.078942	reference. Registers	-0.124939
-2.212300	etc. Event-based	-0.124939
-0.601980	reliable. Event-based	-0.124939
-2.047865	problems associated	-0.124939
-1.805463	errors associated	-0.124939
-3.810938	a time-consumer	-0.124939
-1.379813	biggest time-consumer	-0.124939
-1.504831	detection mechanism.	-0.124939
-1.504513	prediction mechanism.	-0.124939
-1.747818	8 Optimizations	-0.425969
-3.636150	and machines	-0.124939
-1.446680	Java machines	-0.124939
-2.299876	this problem:	-0.124939
-2.092466	copy constructors,	-0.124939
-1.555745	default constructors,	-0.124939
-1.601263	references. References	-0.124939
-1.203801	type. References	-0.124939
-2.745422	are mutually	-0.425969
-3.147653	as _mm_empty()	-0.124939
-1.680445	execute _mm_empty()	-0.124939
-3.048496	may report	-0.124939
-2.405221	optimization report	-0.124939
-2.776943	all disturbing	-0.124939
-2.091354	All disturbing	-0.124939
-3.656516	in develop-	-0.124939
-2.500945	software develop-	-0.124939
-2.761868	be negative.	-0.425969
-1.555824	search facilities,	-0.124939
-1.446760	debugging facilities,	-0.124939
-4.103464	the creation	-0.124939
-3.516422	The creation	-0.124939
-3.063211	compiler warning	-0.124939
-2.726769	no warning	-0.124939
-3.078083	int min	-0.124939
-1.777116	>= min	-0.124939
-1.379733	constant. 14.2	-0.124939
-1.078942	132 14.2	-0.124939
-1.504513	zero. 14.3	-0.124939
-1.203801	134 14.3	-0.124939
-1.078942	132 14.1	-0.124939
-0.902931	topics 14.1	-0.124939
-4.103464	the vectorclass	-0.124939
-3.110520	- vectorclass	-0.124939
-1.078942	reciprocal_divisor; 14.7	-0.124939
-0.902931	139 14.7	-0.124939
-3.636150	and debugging.	-0.124939
-3.190304	with debugging.	-0.124939
-1.650945	void Func(int	-0.425969
-3.835016	is (columns	-0.124939
-2.570272	* (columns	-0.124939
-1.078942	136 14.5	-0.124939
-0.601980	96. 14.5	-0.124939
-3.835016	is defined.	-0.124939
-3.440385	be defined.	-0.124939
-0.902931	-mssse3 -msse4.1	-0.124939
-0.601980	/arch:SSSE2 -msse4.1	-0.124939
-3.321661	// x^10	-0.124939
-2.516301	return x^10	-0.124939
-1.871037	many branches):	-0.425969
-2.315715	{ DoThisThreeTimesAWeek();	-0.425969
-0.204111	/arch:SSE2 -msse2	-0.124939
-2.460257	as logarithms,	-0.425969
-2.533001	by default.	-0.124939
-3.506862	for WTL	-0.124939
-2.926114	A WTL	-0.124939
-2.636099	// _controlfp(0,	-0.425969
-3.780575	to load.	-0.124939
-2.253125	work load.	-0.124939
-2.620879	= select(b	-0.425969
-1.680604	level framework.	-0.124939
-1.379972	.NET framework.	-0.124939
-1.300632	handling. 8.6	-0.124939
-1.078942	81 8.6	-0.124939
-2.977426	and synchronization	-0.425969
-1.555585	executed. Without	-0.124939
-0.902931	73 Without	-0.124939
-3.506862	for millisecond	-0.124939
-3.190304	with millisecond	-0.124939
-2.912038	then sizeof(S1)	-0.124939
-1.446600	factor sizeof(S1)	-0.124939
-3.810938	a high-priority	-0.124939
-3.636150	and high-priority	-0.124939
-2.500945	software development.	-0.124939
-1.555824	easy development.	-0.124939
-3.780575	to push	-0.124939
-0.902931	$B1$1: push	-0.124939
-3.121663	of Numerically	-0.425969
-3.780575	to verify	-0.124939
-3.636150	and verify	-0.124939
-1.317897	allows us	-0.425969
-3.636150	and searching,	-0.124939
-0.601980	sorting, searching,	-0.124939
-3.170036	is known.	-0.124939
-0.902931	148 14.13	-0.124939
-0.601980	X. 14.13	-0.124939
-1.203881	146 14.12	-0.124939
-0.601980	147 14.12	-0.124939
-2.192348	memory area.	-0.124939
-2.888122	Example 14.19	-0.124939
-2.699296	example 14.19	-0.124939
-3.068704	than rounding.	-0.124939
-2.280678	about rounding.	-0.124939
-2.649750	+ column;	-0.124939
-0.601980	row, column;	-0.124939
-2.507684	performance dramatically	-0.124939
-0.902931	24 dramatically	-0.124939
-1.962991	data. 148	-0.124939
-0.601980	code.................................................................................. 148	-0.124939
-0.601980	latencies. 8.5	-0.124939
-0.601980	CPU.............................................................................81 8.5	-0.124939
-2.636099	// Polynomial	-0.425969
-2.355946	even temporarily.	-0.124939
-1.680604	least temporarily.	-0.124939
-2.501498	very obscure	-0.124939
-0.902931	construct obscure	-0.124939
-2.888122	Example 14.1c	-0.124939
-2.699296	example 14.1c	-0.124939
-2.334055	part 142	-0.124939
-0.601980	......................... 142	-0.124939
-3.506862	for "assume	-0.124939
-2.202290	option "assume	-0.124939
-3.423873	are properly	-0.124939
-1.078942	deleted properly	-0.124939
-2.405221	optimization issue.	-0.124939
-0.902931	legal issue.	-0.124939
-3.835016	is restarted	-0.124939
-3.636150	and restarted	-0.124939
-3.078083	int Func2()	-0.124939
-2.347387	void Func2()	-0.124939
-0.902931	x-xx----- x--x-----	-0.124939
-0.601980	x-xx--xx- x--x-----	-0.124939
-3.068704	than PCs.	-0.124939
-2.105459	standard PCs.	-0.124939
-1.715048	optimization. 8.2	-0.124939
-1.078942	66 8.2	-0.124939
-1.715127	it. Possible	-0.124939
-0.601980	machines? Possible	-0.124939
-3.636150	and IDE's	-0.124939
-2.015998	Most IDE's	-0.124939
-1.943050	compilers. 8.3	-0.124939
-0.902931	74 8.3	-0.124939
-2.761868	be obtained.	-0.124939
-2.972915	in C++:	-0.124939
-2.620879	= sin(x);	-0.124939
-4.103464	the delay.	-0.124939
-2.497381	long delay.	-0.124939
-2.620879	= 1.f;	-0.124939
-1.078942	explicitly. Divisions	-0.124939
-0.902931	additions. Divisions	-0.124939
-2.405141	code. 7.32	-0.124939
-1.555745	65 7.32	-0.124939
-1.856377	i, largest_index	-0.124939
-0.601980	absvalue; largest_index	-0.124939
-1.635188	bits wide,	-0.124939
-3.009578	{ list[i].a	-0.124939
-1.715286	accessing list[i].a	-0.124939
-2.243524	processor model.	-0.124939
-2.232747	specific model.	-0.124939
-1.555745	65 7.33	-0.124939
-0.902931	discussion. 7.33	-0.124939
-2.830924	for manipulating	-0.425969
-0.204111	3.15 Dependency	-0.425969
-2.460257	as additions.	-0.124939
-2.743272	cache MOVNTPD	-0.124939
-0.601980	MOVNTPS, MOVNTPD	-0.124939
-1.446760	integer. 158	-0.124939
-0.902931	............................................................................. 158	-0.124939
-3.302196	= A;	-0.124939
-2.649750	+ A;	-0.124939
-1.866721	clock cycle?	-0.124939
-4.103464	the server.	-0.124939
-2.377350	test server.	-0.124939
-1.078942	...................................................................................... 156	-0.124939
-0.601980	large. 156	-0.124939
-3.636150	and fine-tuned	-0.124939
-3.423873	are fine-tuned	-0.124939
-0.902931	................................................................................................ 157	-0.124939
-0.902931	normal. 157	-0.124939
-3.113097	to draw	-0.124939
-1.680908	test examples.	-0.124939
-1.555984	derived class:	-0.124939
-0.601980	grandparent class:	-0.124939
-3.786643	of sharing	-0.124939
-3.423873	are sharing	-0.124939
-3.780575	to 155	-0.124939
-0.601980	.................................................................... 155	-0.124939
-0.902931	60 7.30	-0.124939
-0.601980	multithreading. 7.30	-0.124939
-2.808142	other ways,	-0.124939
-2.450061	4 ways,	-0.124939
-2.761868	be predicted.	-0.124939
-0.380193	(i=0; i<n;	-0.124939
-3.835016	is dividing	-0.124939
-2.457324	before dividing	-0.124939
-2.808142	other complications	-0.124939
-2.191295	cause complications	-0.124939
-2.362944	without AVX,	-0.124939
-1.642656	e.g. AVX,	-0.124939
-1.078942	on. 7.31	-0.124939
-0.902931	61 7.31	-0.124939
-2.977426	and redo	-0.425969
-3.506862	for parallelization	-0.124939
-1.981315	automatic parallelization	-0.124939
-1.555824	element. Matrix	-0.124939
-1.446680	follows: Matrix	-0.124939
-1.555745	destructors ..................................................................................	-0.124939
-1.379813	spots ..................................................................................	-0.124939
-1.446919	c); a.store(aa+i);	-0.124939
-1.203961	aa: a.store(aa+i);	-0.124939
-1.610327	& 0x7FFFFFFF)	-0.425969
-3.353252	can add,	-0.124939
-0.601980	horizontal add,	-0.124939
-3.835016	is fed	-0.124939
-3.440385	be fed	-0.124939
-4.103464	the array,	-0.124939
-3.081147	an array,	-0.124939
-2.830924	for AVX.	-0.124939
-1.158252	#define Alignd(X)	-0.124939
-2.620879	= _mm_add_epi16(c,	-0.425969
-3.475723	that waits	-0.124939
-2.776831	but waits	-0.124939
-3.068704	than loops,	-0.124939
-2.262274	while loops,	-0.124939
-3.506862	for Tuesday,	-0.124939
-0.902931	Monday, Tuesday,	-0.124939
-3.656516	in loops.	-0.124939
-1.856934	innermost loops.	-0.124939
-3.321661	// Increment	-0.124939
-0.902931	respectively. Increment	-0.124939
-3.440385	be cached	-0.124939
-3.423873	are cached	-0.124939
-1.601773	constant propagation,	-0.124939
-2.872111	data exceeds	-0.124939
-1.998488	never exceeds	-0.124939
-3.656516	in Wikipedia	-0.124939
-1.943050	compilers. Wikipedia	-0.124939
-1.715127	testing ................................................................................................	-0.124939
-1.555665	chains ................................................................................................	-0.124939
-0.902931	inverted mask.	-0.124939
-0.601980	affinity mask.	-0.124939
-2.919617	will conclude	-0.124939
-2.191215	therefore conclude	-0.124939
-2.761868	be shared.	-0.124939
-3.835016	is relocated	-0.124939
-3.423873	are relocated	-0.124939
-0.857286	everything else.	-0.124939
-3.148056	a FIFO	-0.124939
-0.602023	Kernel Library"	-0.124939
-3.506862	for dealing	-0.124939
-3.423873	are dealing	-0.124939
-1.650945	void Hello()	-0.425969
-0.902931	unit. Various	-0.124939
-0.601980	network. Various	-0.124939
-2.602966	64-bit software,	-0.124939
-1.805304	modern software,	-0.124939
-3.321661	// Overflow	-0.124939
-1.078942	30 Overflow	-0.124939
-2.693818	using multiplications	-0.124939
-2.299518	up multiplications	-0.124939
-2.532511	some differences	-0.124939
-1.776957	No differences	-0.124939
-2.620879	= 2.2,	-0.425969
-2.829374	same machine.	-0.124939
-2.224115	virtual machine.	-0.124939
-1.805145	STL containers.	-0.124939
-0.902931	deleting containers.	-0.124939
-2.491452	branch tree.	-0.124939
-1.879619	binary tree.	-0.124939
-0.601980	flawed approach	-0.124939
-0.601980	thought-through approach	-0.124939
-1.504965	allocated dynamically.	-0.124939
-0.902931	CriticalFunctionDispatch(void) __asm__	-0.124939
-0.601980	(); __asm__	-0.124939
-2.305250	have difficulties	-0.425969
-2.977426	and deleting	-0.124939
-3.810938	a discussion.	-0.124939
-1.747312	further discussion.	-0.124939
-1.078942	Enums ......................................................................................................................	-0.124939
-1.078942	Strings ......................................................................................................................	-0.124939
-2.317506	short int)	-0.124939
-1.555585	(or int)	-0.124939
-2.549586	if (y)	-0.425969
-2.990146	this purpose,	-0.124939
-2.232747	specific purpose,	-0.124939
-0.902931	sorting algorithms,	-0.124939
-0.601980	encryption algorithms,	-0.124939
-3.302196	= &CriticalFunction_SSE2;	-0.124939
-2.516301	return &CriticalFunction_SSE2;	-0.124939
-2.347387	void Disp();	-0.124939
-0.902931	"; Disp();	-0.124939
-3.835016	is consistent	-0.124939
-3.220466	by consistent	-0.124939
-3.302196	= 1.23456.	-0.124939
-3.068704	than 1.23456.	-0.124939
-0.204111	u, v;	-0.425969
-0.204111	_mm_mullo_epi16 (b,	-0.425969
-2.417873	often reveal	-0.124939
-1.643134	implementations reveal	-0.124939
-3.835016	is created.	-0.124939
-3.423873	are created.	-0.124939
-2.964472	use denormal	-0.124939
-0.902931	generating denormal	-0.124939
-3.440385	be optional	-0.124939
-0.601980	License, optional	-0.124939
-0.903053	unaligned op.	-0.124939
-2.243445	An experiment	-0.124939
-1.831394	my experiment	-0.124939
-2.262353	; Induction++;	-0.124939
-1.203881	Induction; Induction++;	-0.124939
-2.839593	different precisions	-0.124939
-2.803340	point precisions	-0.124939
-0.204111	13.3 Difficult	-0.124939
-2.390532	an interpreter	-0.124939
-1.811150	table (PLT).	-0.124939
-2.387675	int order(int	-0.425969
-3.786643	of digital	-0.124939
-1.379733	complex digital	-0.124939
-0.505122	i<300; i++){	-0.425969
-0.902931	0.44 0.40	-0.124939
-0.601980	0.89 0.40	-0.124939
-0.505131	y, z;	-0.124939
-1.078942	........................................................................................... 139	-0.124939
-0.902931	c) 139	-0.124939
-0.601980	0.57 0.44	-0.124939
-0.601980	0.38 0.44	-0.124939
-0.204111	bb, cc);	-0.425969
-1.901737	platform __GNUC__	-0.124939
-0.902931	#ifdef __GNUC__	-0.124939
-3.475723	that covered	-0.124939
-3.423873	are covered	-0.124939
-1.181705	old fashioned	-0.425969
-3.190304	with alloca.	-0.124939
-2.693818	using alloca.	-0.124939
-2.091989	memory. Efficient	-0.124939
-1.078942	truncation. Efficient	-0.124939
-2.885448	memory spaces	-0.124939
-2.299518	up spaces	-0.124939
-1.981236	parameter $B1$1:	-0.124939
-1.504592	12 $B1$1:	-0.124939
-0.204111	40320, 362880,	-0.425969
-2.167797	/ (line	-0.124939
-0.601980	sets) (line	-0.124939
-3.281660	or column.	-0.124939
-2.990146	this column.	-0.124939
-2.258303	more complex,	-0.124939
-1.962355	below. 3.7	-0.124939
-1.642895	20 3.7	-0.124939
-3.810938	a cheap	-0.124939
-1.300632	relatively cheap	-0.124939
-1.300919	mathematical purity.	-0.124939
-0.902931	0.f, 0.f,	-0.124939
-0.601980	s(0.f, 0.f,	-0.124939
-2.202357	Example 14.23b	-0.124939
-1.556256	(see below).	-0.124939
-3.810938	a division,	-0.124939
-2.190977	precision division,	-0.124939
-3.786643	of received	-0.124939
-1.300791	command received	-0.124939
-1.203801	dramatic degradation	-0.124939
-0.902931	slight degradation	-0.124939
-4.103464	the "override"	-0.124939
-2.990146	this "override"	-0.124939
-2.888122	Example 11.2b	-0.124939
-2.699296	example 11.2b	-0.124939
-3.786643	of matrix[j][0]	-0.124939
-0.902931	order(i); matrix[j][0]	-0.124939
-1.078942	---x----- x--xx----	-0.124939
-0.601980	(a&&b)||(a&&!b)=a x--xx----	-0.124939
-2.272010	function. Sometimes,	-0.124939
-1.078942	spot. Sometimes,	-0.124939
-3.113097	to distribute	-0.124939
-3.461056	the "worst	-0.425969
-2.745422	are overdetermined	-0.124939
-0.602032	250 ms.	-0.124939
-2.137658	same thing.	-0.124939
-3.786643	of squares:	-0.124939
-2.776943	all squares:	-0.124939
-2.972462	time MemberPointer	-0.124939
-2.457324	before MemberPointer	-0.124939
-2.196762	from www.intel.com.	-0.124939
-1.504910	+= TILESIZE)	-0.425969
-3.786643	of sources.	-0.124939
-1.680763	unknown sources.	-0.124939
-4.103464	the first-in-last-out	-0.124939
-3.810938	a first-in-last-out	-0.124939
-0.204111	362880, 3628800,	-0.425969
-2.429114	not uncommon	-0.425969
-3.810938	a coprocessor	-0.124939
-1.923085	graphics coprocessor	-0.124939
-2.105539	: 23;	-0.124939
-1.879778	<< 23;	-0.124939
-1.446600	Linux. 82	-0.124939
-1.203801	.............................................................................................. 82	-0.124939
-2.593648	C++ relates	-0.124939
-2.233222	language relates	-0.124939
-1.831633	pointer. 7.9	-0.124939
-0.601980	pointers.......................................................................................................37 7.9	-0.124939
-4.103464	the alignment.	-0.124939
-1.300711	Data alignment.	-0.124939
-1.555824	integers. 7.5	-0.124939
-1.203801	33 7.5	-0.124939
-1.504965	good deal	-0.425969
-1.078942	(requires binutils	-0.124939
-0.601980	Requires binutils	-0.124939
-1.715127	operations. 7.6	-0.124939
-1.203801	33 7.6	-0.124939
-0.681195	14 Specific	-0.425969
-2.272010	function. __attribute__((const))	-0.124939
-1.078942	pure_function __attribute__((const))	-0.124939
-3.423873	are unnecessary	-0.124939
-1.715127	Avoid unnecessary	-0.124939
-1.919597	float b[1000];	-0.124939
-1.203801	29 7.3	-0.124939
-1.203801	31 7.3	-0.124939
-3.220466	by performing	-0.124939
-1.962276	better performing	-0.124939
-2.817016	only once,	-0.124939
-2.168591	calculated once,	-0.124939
-1.879699	0, s1	-0.124939
-1.300632	a[i]; s1	-0.124939
-2.718034	set (called	-0.124939
-1.601263	statements (called	-0.124939
-2.224115	i; 84	-0.124939
-0.902931	............................................................................. 84	-0.124939
-0.204111	extern "C"	-0.124939
-2.776943	all respects	-0.124939
-2.565943	many respects	-0.124939
-1.078942	Documentation License	-0.124939
-0.601980	yes License	-0.124939
-2.452363	See ISO/IEC	-0.124939
-0.601980	en.wikipedia.org/wiki/Compiler_optimization. ISO/IEC	-0.124939
-3.147653	as command-line	-0.124939
-2.926114	A command-line	-0.124939
-2.262353	; i++	-0.124939
-2.031911	operator i++	-0.124939
-2.022675	page 137).	-0.124939
-4.103464	the template.	-0.124939
-2.829374	same template.	-0.124939
-3.321661	// General	-0.124939
-0.902931	GNU General	-0.124939
-4.103464	the container,	-0.124939
-2.223162	single container,	-0.124939
-3.516422	The integrated	-0.124939
-3.281660	or integrated	-0.124939
-3.835016	is started.	-0.124939
-2.062310	was started.	-0.124939
-2.787003	which transposes	-0.124939
-2.699296	example transposes	-0.124939
-3.461056	the container.	-0.124939
-1.821604	return (*SelectAddMul_pointer)(aa,	-0.425969
-2.226587	will crash	-0.124939
-2.620879	= 1.1,	-0.425969
-3.113097	to reserve	-0.124939
-1.300632	division. Older	-0.124939
-0.601980	etc.). Older	-0.124939
-3.835016	is deleted.	-0.124939
-3.440385	be deleted.	-0.124939
-1.446600	Linux. Asmlib	-0.124939
-1.300711	13 Asmlib	-0.124939
-1.203801	run. Both	-0.124939
-1.079022	linker. Both	-0.124939
-2.620879	= &Object2;	-0.124939
-2.387675	int a:4;	-0.425969
-0.902931	a*1=a (-a)*(-b)=a*b	-0.124939
-0.601980	x-xxxxx-x (-a)*(-b)=a*b	-0.124939
-3.440385	be left	-0.124939
-2.431391	register left	-0.124939
-0.601980	c1+TILESIZE; c2++)	-0.124939
-0.601980	r2; c2++)	-0.124939
-0.748133	processor. Nested	-0.425969
-3.321661	// ipow	-0.124939
-2.679615	double ipow	-0.124939
-2.739293	integer comparison,	-0.124939
-0.902931	subtraction, comparison,	-0.124939
-2.745422	are produced	-0.425969
-1.650945	void NotPolymorphic();	-0.124939
-3.110520	- a*b+a*c	-0.124939
-2.649527	n.a. a*b+a*c	-0.124939
-2.888122	Example 11.1a	-0.124939
-2.699296	example 11.1a	-0.124939
-2.243524	doesn't support,	-0.124939
-2.191374	AVX support,	-0.124939
-2.338754	you forget	-0.425969
-3.461056	the inverted	-0.124939
-3.780575	to 11.1b	-0.124939
-2.888122	Example 11.1b	-0.124939
-3.302196	= 80.	-0.124939
-2.714185	page 80.	-0.124939
-2.608910	number i.	-0.124939
-0.902931	index, i.	-0.124939
-3.119752	This worked	-0.124939
-2.996966	have worked	-0.124939
-3.170036	is needed:	-0.425969
-1.408027	1 Introduction	-0.124939
-1.203881	i<300; i+=3){	-0.124939
-0.601980	i<301; i+=3){	-0.124939
-1.753969	4 PUBLIC	-0.425969
-0.204111	a+a+a+a=a*4 -(-a)=a	-0.124939
-0.505122	Intel: "IA-32	-0.425969
-3.170036	is loaded,	-0.124939
-1.245346	|| (a&&b&&c)	-0.425969
-2.316233	0 a+0=a	-0.124939
-0.601980	---xxx-x- a+0=a	-0.124939
-2.888122	Example 7.15b	-0.124939
-2.699296	example 7.15b	-0.124939
-2.680877	size grows	-0.124939
-2.567207	array grows	-0.124939
-3.835016	is 0.	-0.124939
-2.491214	< 0.	-0.124939
-0.204111	r1+TILESIZE; r2++)	-0.425969
-4.103464	the index,	-0.124939
-1.078942	brackets index,	-0.124939
-4.103464	the time-consumers	-0.124939
-2.222686	common time-consumers	-0.124939
-1.365340	fast enough.	-0.124939
-0.204111	120, 720,	-0.425969
-1.379980	quite tedious	-0.124939
-2.253125	work correctly.	-0.124939
-2.180093	works correctly.	-0.124939
-3.636150	and flexible,	-0.124939
-0.601980	universal, flexible,	-0.124939
-3.078083	int ARRAYSIZE	-0.124939
-2.491214	< ARRAYSIZE	-0.124939
-1.283172	source annotation	-0.124939
-1.379813	edx, respectively.	-0.124939
-0.601980	137, respectively.	-0.124939
-1.601582	3 1.1	-0.124939
-1.203801	information. 1.1	-0.124939
-2.714185	page 43).	-0.124939
-1.446600	p. 43).	-0.124939
-1.203801	u.i ^=	-0.124939
-0.601980	u.i[1] ^=	-0.124939
-2.084121	all 1's	-0.124939
-2.964472	use hexadecimal	-0.124939
-1.998885	Using hexadecimal	-0.124939
-2.218515	} u,	-0.425969
-2.533001	by extending	-0.124939
-0.380193	|, ^,	-0.124939
-3.148056	a systematic	-0.124939
-2.031911	operator forces	-0.124939
-1.880335	union forces	-0.124939
-3.835016	is rolled	-0.124939
-1.300632	list, rolled	-0.124939
-1.601423	module __attribute__	-0.124939
-0.902931	("internal"))) __attribute__	-0.124939
-3.656516	in Java,	-0.124939
-3.147653	as Java,	-0.124939
-1.079126	#pragma ivdep	-0.124939
-3.636150	and Gnu.	-0.124939
-3.190304	with Gnu.	-0.124939
-4.103464	the recursion	-0.124939
-3.516422	The recursion	-0.124939
-3.148056	a ^a	-0.425969
-3.786643	of algebra,	-0.124939
-1.998885	Boolean algebra,	-0.124939
-2.885448	memory management	-0.124939
-1.680683	heap management	-0.124939
-3.423873	are lots	-0.124939
-3.190304	with lots	-0.124939
-0.902931	www.amd.com. 163	-0.124939
-0.902931	..................................................................................................................... 163	-0.124939
-0.601980	team projects,	-0.124939
-0.601980	one-man projects,	-0.124939
-0.902931	/MT 160	-0.124939
-0.601980	options....................................................................................... 160	-0.124939
-3.321661	// (This	-0.124939
-1.715286	variable. (This	-0.124939
-2.977426	and Mac.	-0.124939
-2.829374	same chip.	-0.124939
-2.811673	CPU chip.	-0.124939
-3.440385	be wrapped	-0.124939
-3.423873	are wrapped	-0.124939
-1.133428	predicted perfectly	-0.124939
-0.204111	_mm_cmpgt_epi16(b, zero);	-0.425969
-1.493831	been added?	-0.425969
-0.902931	C++". Addison-Wesley,	-0.124939
-0.601980	Delight". Addison-Wesley,	-0.124939
-2.106655	loop counter,	-0.124939
-2.607087	static modifier	-0.124939
-0.902931	fastcall modifier	-0.124939
-3.281660	or -Ofast	-0.124939
-0.902931	better: -Ofast	-0.124939
-3.780575	to test.	-0.124939
-0.902931	155 test.	-0.124939
-3.780575	to respond	-0.124939
-1.998488	never respond	-0.124939
-0.204111	Numerically Intensive	-0.425969
-4.103464	the planning	-0.124939
-0.902931	early planning	-0.124939
-2.716931	class C2	-0.124939
-0.902931	Object1; C2	-0.124939
-4.103464	the R	-0.124939
-2.155898	four R	-0.124939
-3.148056	a release	-0.425969
-3.461056	the fraction.	-0.124939
-3.281660	or Friday	-0.124939
-0.601980	0x10, Friday	-0.124939
-1.643194	programming textbooks	-0.425969
-3.440385	be slower.	-0.124939
-2.864769	program slower.	-0.124939
-1.300711	90 9.7	-0.124939
-0.902931	alloca. 9.7	-0.124939
-3.302196	= c1;	-0.124939
-2.716931	class c1;	-0.124939
-0.806143	just-in-time compilers,	-0.124939
-3.636150	and subtracting	-0.124939
-3.220466	by subtracting	-0.124939
-2.636099	// Returns	-0.124939
-3.516422	The insight	-0.124939
-2.370130	new insight	-0.124939
-0.601980	a+b=b+a a*b=b*a	-0.124939
-0.601980	a+b=b+a, a*b=b*a	-0.124939
-1.794843	< r1+TILESIZE;	-0.425969
-3.147653	as 0/a	-0.124939
-3.110520	- 0/a	-0.124939
-2.122597	only hope	-0.425969
-3.321661	// Portability	-0.124939
-1.379813	14 Portability	-0.124939
-2.808142	other compilers).	-0.124939
-1.079022	DOS compilers).	-0.124939
-2.636099	// Catch	-0.124939
-3.068704	than 99%	-0.124939
-1.078942	programs, 99%	-0.124939
-0.681213	"Hello ";	-0.124939
-1.715366	struct Sab	-0.124939
-1.079022	b;}; Sab	-0.124939
-1.601263	significant contribution	-0.124939
-1.078942	negligible contribution	-0.124939
-3.810938	a distinction	-0.124939
-2.281630	important distinction	-0.124939
-2.636099	// Update	-0.425969
-1.446760	Clang Supported	-0.124939
-0.601980	vectorclass.h Supported	-0.124939
-0.204111	develop- ment	-0.124939
-2.272010	function. typeof(CriticalFunction)	-0.124939
-0.601980	("CriticalFunction"); typeof(CriticalFunction)	-0.124939
-3.461056	the ones	-0.124939
-3.170036	is busy	-0.124939
-3.440385	be optimally	-0.124939
-2.119141	run optimally	-0.124939
-1.300632	necessary. Fast	-0.124939
-1.078942	keywords Fast	-0.124939
-3.516422	The funny	-0.124939
-2.532511	some funny	-0.124939
-1.747153	database queries	-0.124939
-0.601980	Database queries	-0.124939
-2.864769	program saying	-0.124939
-0.902931	messages saying	-0.124939
-2.378983	than normal.	-0.124939
-0.681213	"Hello 1"	-0.425969
-3.835016	is updated.	-0.124939
-1.962593	dispatcher updated.	-0.124939
-3.516422	The clumsy	-0.124939
-1.923801	look clumsy	-0.124939
-0.204111	(double)(signed int)u;	-0.124939
-4.103464	the trivial	-0.124939
-3.506862	for trivial	-0.124939
-0.204111	transpose(double a[SIZE][SIZE])	-0.425969
-3.281660	or x64	-0.124939
-2.167797	/ x64	-0.124939
-2.602966	64-bit systems).	-0.124939
-1.880017	x86 systems).	-0.124939
-3.170036	is wasted	-0.425969
-0.902931	General Public	-0.124939
-0.902931	Fog. Public	-0.124939
-2.272010	extra dummy	-0.124939
-2.105221	add dummy	-0.124939
-4.103464	the symbolic	-0.124939
-3.810938	a symbolic	-0.124939
-3.440385	be fetched	-0.124939
-3.423873	are fetched	-0.124939
-3.302196	= 0x40	-0.124939
-3.281660	or 0x40	-0.124939
-3.110520	- (a&b)|(a&c)	-0.124939
-0.601980	x-xxxxx-- (a&b)|(a&c)	-0.124939
-3.068704	than relocation,	-0.124939
-2.378461	need relocation,	-0.124939
-1.943050	compilers. (The	-0.124939
-1.300711	explanation. (The	-0.124939
-3.321661	// Mixing	-0.124939
-1.943050	compilers. Mixing	-0.124939
-3.278300	it decides	-0.124939
-3.246643	function decides	-0.124939
-1.283135	#include <xmmintrin.h>	-0.124939
-2.570272	* Func1(x)	-0.124939
-2.516301	return Func1(x)	-0.124939
-0.204111	720, 5040,	-0.425969
-3.461056	the ability	-0.425969
-3.220466	by 3,	-0.124939
-1.680524	2, 3,	-0.124939
-1.300711	21 3.12	-0.124939
-1.300791	modules. 3.12	-0.124939
-3.116033	not 123	-0.124939
-0.902931	ABC 123	-0.124939
-3.780575	to provoke	-0.124939
-2.919617	will provoke	-0.124939
-0.602023	Codeplay VectorC	-0.124939
-2.808142	other processes.	-0.124939
-2.610850	multiple processes.	-0.124939
-0.204111	__attribute__ ((visibility	-0.425969
-3.835016	is stronger	-0.124939
-2.223718	much stronger	-0.124939
-3.423873	are accessible	-0.124939
-3.116033	not accessible	-0.124939
-3.636150	and reusable	-0.124939
-3.656516	in reusable	-0.124939
-3.516422	The procedures	-0.124939
-1.446760	far procedures	-0.124939
-3.302196	= !(a	-0.124939
-2.649527	n.a. !(a	-0.124939
-0.602023	18 Overview	-0.425969
-0.601980	SSE4A ammintrin.h	-0.124939
-0.601980	XOP ammintrin.h	-0.124939
-3.068704	than sequences	-0.124939
-2.201893	small sequences	-0.124939
-2.977426	and stop	-0.425969
-1.747312	further expansions	-0.124939
-1.203801	Taylor expansions	-0.124939
-1.394359	sign bit.	-0.124939
-0.204111	12.10 Conclusion	-0.124939
-2.607087	static linking.	-0.124939
-2.333418	dynamic linking.	-0.124939
-1.203801	IDE. Free	-0.124939
-0.902931	GNU Free	-0.124939
-4.103464	the bottlenecks	-0.124939
-2.507684	performance bottlenecks	-0.124939
-3.780575	to interrupts	-0.124939
-1.879937	generate interrupts	-0.124939
-3.190304	with legacy	-0.124939
-2.532511	some legacy	-0.124939
-2.872111	data segment	-0.124939
-2.763595	one segment	-0.124939
-2.649527	n.a. __unix__	-0.124939
-0.902931	__linux__ __unix__	-0.124939
-3.636150	and attempts	-0.124939
-3.278300	it attempts	-0.124939
-1.504592	vectors. Code	-0.124939
-0.902931	eliminated. Code	-0.124939
-0.505131	Weekdays Day;	-0.425969
-2.315715	{ swapd(a[r2][c2],a[c2][r2]);	-0.425969
-0.204111	(*SelectAddMul_pointer)(aa, bb,	-0.425969
-1.747073	save power.	-0.124939
-1.715048	processing power.	-0.124939
-2.278946	time consumer	-0.124939
-3.148056	a parenthesis	-0.425969
-1.680445	classes. Security	-0.124939
-1.300791	computer. Security	-0.124939
-2.280599	its limit,	-0.124939
-0.601980	acceptable limit,	-0.124939
-2.964472	use ~	-0.124939
-0.902931	^, ~	-0.124939
-3.246643	function swapd(a[r][c],	-0.124939
-1.555665	diagonal swapd(a[r][c],	-0.124939
-2.398619	time. Single	-0.124939
-1.998885	cache. Single	-0.124939
-0.902931	0.25 0.28	-0.124939
-0.601980	0.29 0.28	-0.124939
-2.996966	have Booleans	-0.124939
-0.902931	7.5 Booleans	-0.124939
-1.078942	K8 0.24	-0.124939
-0.902931	0.25 0.24	-0.124939
-0.380193	9.1 Caching	-0.425969
-3.048496	may interfere	-0.124939
-2.919617	will interfere	-0.124939
-2.666831	pointer -fomit-	-0.124939
-0.601980	/Oy -fomit-	-0.124939
-0.902931	0.24 0.25	-0.124939
-0.902931	1.00 0.25	-0.124939
-2.620879	= _mm_mullo_epi16	-0.425969
-4.103464	the FMA4	-0.124939
-2.213411	AMD FMA4	-0.124939
-0.902931	Public distribution	-0.124939
-0.601980	Non-public distribution	-0.124939
-1.777275	b) >>	-0.124939
-0.601980	<<, >>	-0.124939
-3.780575	to do,	-0.124939
-1.078942	Programmers do,	-0.124939
-3.241855	if appropriate.	-0.124939
-2.567682	where appropriate.	-0.124939
-1.300711	manuals. Please	-0.124939
-1.300711	explanation. Please	-0.124939
-0.204111	3.12 Network	-0.425969
-2.439079	unsigned __int64	-0.124939
-0.902931	compiler: __int64	-0.124939
-1.794870	branch mispredictions,	-0.124939
-1.794843	< rows;	-0.425969
-2.872111	data together.	-0.124939
-1.923324	linked together.	-0.124939
-3.113097	to organize	-0.124939
-3.148056	a bitfield	-0.124939
-2.888122	Example 15.1b.	-0.124939
-2.699296	example 15.1b.	-0.124939
-3.278300	it sees	-0.124939
-3.063211	compiler sees	-0.124939
-2.398619	time. Interpreted	-0.124939
-0.601980	script. Interpreted	-0.124939
-3.063211	compiler treat	-0.124939
-2.593253	also treat	-0.124939
-0.204111	TransposeCopy(double a[SIZE][SIZE],	-0.425969
-3.302196	= (a<b	-0.124939
-0.601980	!(a<b)=(a>=b) (a<b	-0.124939
-3.461056	the processor).	-0.124939
-2.996966	have occurred.	-0.124939
-2.859753	has occurred.	-0.124939
-3.461056	the corresponding	-0.124939
-3.009578	{ memset(a,	-0.124939
-1.980759	zero memset(a,	-0.124939
-1.875645	* m;}	-0.124939
-3.636150	and recovering	-0.124939
-3.506862	for recovering	-0.124939
-3.009578	{ Sunday,	-0.124939
-1.747391	constants Sunday,	-0.124939
-0.204111	mutually incompatible.	-0.124939
-2.649750	+ 2;}	-0.124939
-2.202052	+= 2;}	-0.124939
-3.835016	is fast,	-0.124939
-0.601980	-fp-model fast,	-0.124939
-2.092068	optimizing University	-0.124939
-0.902931	Technical University	-0.124939
-3.780575	to log,	-0.124939
-0.601980	pow, log,	-0.124939
-3.475723	that begins	-0.124939
-1.446760	body begins	-0.124939
-2.202357	Example 14.26	-0.124939
-2.202357	Example 14.27	-0.124939
-3.835016	is somewhat	-0.124939
-2.180093	works somewhat	-0.124939
-0.748142	poorly predictable.	-0.124939
-0.857277	addition, subtraction,	-0.124939
-2.888122	Example 14.23	-0.124939
-2.699296	example 14.23	-0.124939
-3.810938	a slight	-0.124939
-2.191295	cause slight	-0.124939
-2.253522	execution unit.	-0.124939
-1.715048	processing unit.	-0.124939
-3.636150	and direct	-0.124939
-2.015600	allows direct	-0.124939
-4.103464	the generic	-0.124939
-3.810938	a generic	-0.124939
-1.962593	addition unit,	-0.124939
-1.715048	processing unit,	-0.124939
-3.835016	is invalid.	-0.124939
-1.805145	become invalid.	-0.124939
-3.835016	is heavily	-0.124939
-1.777992	rely heavily	-0.124939
-2.817016	only self-	-0.124939
-1.643134	calculating self-	-0.124939
-2.852102	make log2	-0.124939
-2.679615	double log2	-0.124939
-0.944455	monitor counters.	-0.124939
-1.556099	execution speed,	-0.425969
-1.245346	|| (!a&&c)	-0.124939
-3.516422	The []	-0.124939
-0.601980	Safe []	-0.124939
-2.022675	page 122.	-0.425969
-2.289871	error messages	-0.124939
-0.601980	pop-up messages	-0.124939
-0.601980	F2(float x[]);	-0.124939
-0.601980	F1(int x[]);	-0.124939
-2.117324	CPU only)	-0.124939
-1.078942	automatically, although	-0.124939
-0.601980	133 although	-0.124939
-3.423873	are primitive	-0.124939
-1.300632	relatively primitive	-0.124939
-0.902931	S. Goedecker	-0.124939
-0.601980	Stefan Goedecker	-0.124939
-0.204111	13.2 Model-specific	-0.425969
-0.601980	/GR– -fno-rtti	-0.124939
-0.601980	/GR- -fno-rtti	-0.124939
-2.990146	this initialization,	-0.124939
-0.601980	clauses: initialization,	-0.124939
-3.009578	{ largest_abs	-0.124939
-0.601980	absvalue, largest_abs	-0.124939
-1.601502	alternative implementations.	-0.124939
-1.446680	Java implementations.	-0.124939
-0.380193	6, 24,	-0.425969
-3.636150	and studying	-0.124939
-3.506862	for studying	-0.124939
-2.714185	page 87).	-0.124939
-1.446600	p. 87).	-0.124939
-2.048645	cache contentions.	-0.124939
-2.716931	class SafeArray	-0.124939
-0.902931	7.15b SafeArray	-0.124939
-2.714185	page 58	-0.124939
-1.300632	so. 58	-0.124939
-0.602032	becoming increasingly	-0.124939
-3.147653	as accurate	-0.124939
-1.203881	sufficiently accurate	-0.124939
-2.951125	more efforts	-0.124939
-2.405221	optimization efforts	-0.124939
-2.171808	program starts.	-0.425969
-2.131868	contains i/2+r.	-0.124939
-1.300632	computing i/2+r.	-0.124939
-3.461056	the reader	-0.124939
-3.168158	on usability,	-0.124939
-1.715286	time, usability,	-0.124939
-1.203881	true. template<>	-0.124939
-0.902931	recursion template<>	-0.124939
-4.103464	the low-level	-0.124939
-3.780575	to low-level	-0.124939
-3.170036	is available:	-0.425969
-1.078942	overflow: _controlfp_s(&dummy,	-0.124939
-0.601980	_fpreset(); _controlfp_s(&dummy,	-0.124939
-1.565561	; mangled	-0.425969
-0.204111	12.6 Transforming	-0.425969
-2.919617	will contend	-0.124939
-2.405379	libraries contend	-0.124939
-0.903053	garbage collector	-0.124939
-0.602032	((unsigned int)n	-0.425969
-0.902931	created. Far	-0.124939
-0.601980	huge). Far	-0.124939
-3.121663	of factorials:	-0.124939
-2.127817	functions ........................................................................................	-0.124939
-3.278300	it compares	-0.124939
-2.905661	It compares	-0.124939
-0.944427	below shows.	-0.124939
-2.913351	} By	-0.124939
-1.747471	platforms By	-0.124939
-2.502567	with certainty	-0.124939
-0.601980	rightmost 1-bit	-0.124939
-0.601980	right-most 1-bit	-0.124939
-3.321661	// Or	-0.124939
-1.446600	parameters. Or	-0.124939
-1.203801	running. Programs	-0.124939
-1.203801	conditions. Programs	-0.124939
-2.306769	& operation,	-0.124939
-1.747312	shift operation,	-0.124939
-3.170036	is closed.	-0.425969
-1.446680	Open source.	-0.124939
-1.078942	open source.	-0.124939
-1.394359	sign :1;//signbit	-0.425969
-2.761868	be avoided,	-0.124939
-4.103464	the book	-0.124939
-0.601980	Advanced book	-0.124939
-3.835016	is avoided.	-0.124939
-3.440385	be avoided.	-0.124939
-3.321661	// EMMS	-0.124939
-3.081147	an EMMS	-0.124939
-2.701658	do immediately	-0.124939
-1.079022	placed immediately	-0.124939
-1.181714	0, sizeof(a));	-0.124939
-2.022675	page 105).	-0.124939
-2.431391	register usage	-0.124939
-0.601980	"Register usage	-0.124939
-3.113097	to fix	-0.425969
-1.985265	double b[SIZE][SIZE])	-0.425969
-3.636150	and press	-0.124939
-1.300632	key press	-0.124939
-3.147653	as sorting	-0.124939
-2.701816	most sorting	-0.124939
-1.861808	objects (*.dll,	-0.425969
-1.952105	n.a. 1.00	-0.124939
-1.078942	23 ,	-0.124939
-0.902931	52 ,	-0.124939
-2.951125	more efficiently.	-0.124939
-2.444773	less efficiently.	-0.124939
-4.103464	the word	-0.124939
-3.810938	a word	-0.124939
-1.923085	public B2	-0.124939
-1.923562	public: B2	-0.124939
-1.078942	101 10.1	-0.124939
-0.601980	(www.intel.com/technology/itj/). 10.1	-0.124939
-1.300632	a[i]; Converting	-0.124939
-0.902931	undetected. Converting	-0.124939
-3.810938	a matrix.	-0.124939
-1.880017	512 matrix.	-0.124939
-3.302196	= B;	-0.124939
-2.649750	+ B;	-0.124939
-2.118823	doing divisions.	-0.124939
-1.555745	independent divisions.	-0.124939
-1.318008	dependency chains,	-0.124939
-3.786643	of coprocessors	-0.124939
-3.147653	as coprocessors	-0.124939
-2.077907	... x.a	-0.124939
-1.300632	C; x.a	-0.124939
-2.829374	same effect.	-0.124939
-2.726769	no effect.	-0.124939
-3.113097	to keyboard	-0.124939
-0.204111	3628800, 39916800,	-0.425969
-2.636099	// Approximate	-0.425969
-2.315715	{ Table[x]	-0.425969
-1.772146	const restriction	-0.124939
-0.902931	B; x.c	-0.124939
-0.601980	2.; x.c	-0.124939
-3.440385	be misleading	-0.124939
-1.923164	give misleading	-0.124939
-2.872111	data #ifdef	-0.124939
-0.601980	8.22 #ifdef	-0.124939
-1.805383	installation ..................................................................................................	-0.124939
-1.777275	checking ..................................................................................................	-0.124939
-0.601980	12.8a. Sum	-0.124939
-0.601980	12.8b. Sum	-0.124939
-1.300791	1.; x.b	-0.124939
-0.902931	A; x.b	-0.124939
-1.133446	VIA CPUs".	-0.124939
-2.167797	/ shr	-0.124939
-1.680445	mov shr	-0.124939
-2.331910	bits each.	-0.124939
-2.243524	bytes each.	-0.124939
-1.300632	xxxxxxxxx 0/a=0	-0.124939
-0.601980	x-xxx-x-- 0/a=0	-0.124939
-2.533001	by replacing	-0.124939
-3.302196	= 1.0f	-0.124939
-1.642815	? 1.0f	-0.124939
-2.299876	this manual,	-0.124939
-3.636150	and involve	-0.124939
-3.048496	may involve	-0.124939
-2.346122	always Optimize	-0.124939
-2.191374	Linux Optimize	-0.124939
-1.504910	+= list[i];	-0.124939
-3.321661	// initialize	-0.124939
-1.079022	sum, initialize	-0.124939
-1.300632	n! 117	-0.124939
-0.601980	vectorization............................................................. 117	-0.124939
-3.475723	that previously	-0.124939
-0.601980	assigned previously	-0.124939
-2.405379	libraries 113	-0.124939
-1.203801	............................................................................................. 113	-0.124939
-0.982188	square root	-0.425969
-3.636150	and statistics,	-0.124939
-3.506862	for statistics,	-0.124939
-1.962117	three times,	-0.124939
-1.777593	response times,	-0.124939
-4.103464	the obstacle	-0.124939
-3.081147	an obstacle	-0.124939
-0.681195	Agner Fog.	-0.124939
-1.203881	exp 12.8	-0.124939
-1.078942	119 12.8	-0.124939
-4.103464	the IDE	-0.124939
-3.081147	an IDE	-0.124939
-1.680445	access. 12.9	-0.124939
-1.446680	120 12.9	-0.124939
-3.786643	of removing	-0.124939
-3.220466	by removing	-0.124939
-1.680908	test setup	-0.124939
-0.602023	_WIN64 _LP64	-0.124939
-2.306690	simple type,	-0.124939
-2.145093	known type,	-0.124939
-3.009578	{ b.load(bb+i);	-0.124939
-2.468986	elements b.load(bb+i);	-0.124939
-4.103464	the factorials	-0.124939
-1.446680	reciprocal factorials	-0.124939
-4.103464	the subroutine	-0.124939
-1.998567	separate subroutine	-0.124939
-3.636150	and later.	-0.124939
-3.281660	or later.	-0.124939
-1.300632	107 12.4	-0.124939
-1.300632	division. 12.4	-0.124939
-0.902931	109 12.5	-0.124939
-0.601980	section. 12.5	-0.124939
-0.982225	algebraic manipulations	-0.124939
-2.192348	memory leaks	-0.124939
-2.105459	standard says	-0.124939
-0.601980	convention says	-0.124939
-2.511353	2 12.6	-0.124939
-0.902931	113 12.6	-0.124939
-2.022675	page 140).	-0.124939
-0.902931	117 12.7	-0.124939
-0.601980	118 12.7	-0.124939
-2.511353	2 52	-0.124939
-2.157168	}; 52	-0.124939
-3.113097	to obey	-0.124939
-2.022675	page 72.	-0.124939
-2.620879	= b*a	-0.124939
-3.835016	is 95	-0.124939
-2.714185	page 95	-0.124939
-1.300711	elements. 12.1	-0.124939
-0.902931	105 12.1	-0.124939
-4.103464	the time-critical	-0.124939
-3.656516	in time-critical	-0.124939
-1.300632	future. 12.3	-0.124939
-1.300632	107 12.3	-0.124939
-3.810938	a universal	-0.124939
-1.776957	No universal	-0.124939
-2.718034	set Suppl.	-0.124939
-0.601980	pmmintrin.h Suppl.	-0.124939
-2.919617	will crash.	-0.124939
-2.339166	system crash.	-0.124939
-0.601980	a*b*c=a*(b*c) a+b+c+d	-0.124939
-0.601980	a+b+c=c+b+a a+b+c+d	-0.124939
-3.780575	to 99	-0.124939
-1.203801	.............................................................................................. 99	-0.124939
-3.321661	// Remove	-0.124939
-2.031832	branches Remove	-0.124939
-1.777275	code, interpreters,	-0.124939
-1.078942	frameworks, interpreters,	-0.124939
-3.636150	and 9.	-0.124939
-1.680604	level 9.	-0.124939
-2.549586	if any,	-0.124939
-2.964472	use thread-safe	-0.124939
-2.926114	A thread-safe	-0.124939
-1.650945	void F3(bool	-0.425969
-3.656516	in exclusive	-0.124939
-3.506862	for exclusive	-0.124939
-2.506893	table 9.2.	-0.124939
-1.962196	Table 9.2.	-0.124939
-1.794843	< NumberOfTests;	-0.425969
-2.919617	will stay	-0.124939
-1.300632	necessarily stay	-0.124939
-2.602966	64-bit extension	-0.124939
-1.747312	further extension	-0.124939
-4.103464	the "best	-0.124939
-3.636150	and "best	-0.124939
-2.492801	member functions)	-0.124939
-0.601980	ced functions)	-0.124939
-3.835016	is repeated	-0.124939
-3.440385	be repeated	-0.124939
-2.387675	int FactorialTable[13]	-0.425969
-1.962276	| Friday)	-0.124939
-1.747551	== Friday)	-0.124939
-1.680843	child function:	-0.124939
-1.203881	lrint function:	-0.124939
-2.202357	Example 8.21	-0.124939
-3.302196	= &CriticalFunction_AVX;	-0.124939
-2.516301	return &CriticalFunction_AVX;	-0.124939
-1.715127	operations. 105	-0.124939
-0.601980	operations............................................................................................... 105	-0.124939
-3.063211	compiler interpret	-0.124939
-2.912038	then interpret	-0.124939
-1.805224	1. Writing	-0.124939
-1.203801	programs. Writing	-0.124939
-0.204111	FDIV bug	-0.124939
-2.636099	// Compare	-0.425969
-0.204111	{temp=x; x=y;	-0.425969
-3.281660	or better,	-0.124939
-1.446600	Even better,	-0.124939
-3.780575	to flip	-0.124939
-3.321661	// flip	-0.124939
-1.459106	four float's	-0.124939
-1.526080	several minutes	-0.425969
-2.913351	} 109	-0.124939
-0.902931	........................................................................................ 109	-0.124939
-2.620879	= (a+1)	-0.124939
-2.808142	other reasons,	-0.124939
-2.369497	these reasons,	-0.124939
-1.446600	Even better:	-0.124939
-0.601980	option) better:	-0.124939
-2.710694	each method,	-0.124939
-1.642736	simplest method,	-0.124939
-2.542495	variable whose	-0.124939
-1.715605	Variables whose	-0.124939
-2.977426	and delete,	-0.124939
-3.170036	is accessed,	-0.124939
-2.620879	= a&(b|c)	-0.124939
-3.506862	for assigning	-0.124939
-3.220466	by assigning	-0.124939
-2.817016	only 10%	-0.124939
-1.680445	true 10%	-0.124939
-0.601980	1994. Mostly	-0.124939
-0.601980	1997. Mostly	-0.124939
-1.471172	manual 4:	-0.425969
-1.471005	/ 4;	-0.124939
-1.923324	microprocessors have.	-0.124939
-1.379733	users have.	-0.124939
-3.786643	of relieving	-0.124939
-3.506862	for relieving	-0.124939
-3.302196	= 10,	-0.124939
-1.504672	8, 10,	-0.124939
-1.379813	?Func@@YAXQAHAAH@Z ENDP	-0.124939
-0.601980	?Func2@@YAXQAHAAH@Z ENDP	-0.124939
-3.321661	// incremented	-0.124939
-2.190343	been incremented	-0.124939
-1.747471	calls. 48	-0.124939
-0.601980	................................................................................................................ 48	-0.124939
-1.747551	== Tuesday	-0.124939
-1.680524	2, Tuesday	-0.124939
-2.168591	calculated internally	-0.124939
-2.157566	implemented internally	-0.124939
-1.805622	unused fourth	-0.124939
-0.902931	Every fourth	-0.124939
-2.565943	many tips	-0.124939
-2.532511	some tips	-0.124939
-3.506862	for shared_ptr	-0.124939
-0.601980	assignment. shared_ptr	-0.124939
-2.977426	and BSD.	-0.124939
-3.461056	the effort.	-0.124939
-2.299438	available today.	-0.124939
-1.446680	Java today.	-0.124939
-1.777275	2. Put	-0.124939
-0.601980	question: Put	-0.124939
-2.387675	int CriticalFunction_AVX(int	-0.425969
-2.830924	for (r2	-0.425969
-2.533001	by writing:	-0.124939
-2.022999	class B2;	-0.124939
-3.461056	the overall	-0.124939
-0.380193	7.17 Structures	-0.425969
-3.636150	and executables.	-0.124939
-2.839593	different executables.	-0.124939
-3.835016	is inherently	-0.124939
-3.423873	are inherently	-0.124939
-3.780575	to me.	-0.124939
-2.889723	from me.	-0.124939
-3.810938	a non-virtual	-0.124939
-3.068704	than non-virtual	-0.124939
-3.835016	is safer.	-0.124939
-2.593253	also safer.	-0.124939
-3.786643	of b+c	-0.124939
-1.203881	first. b+c	-0.124939
-0.204111	__unix__ __linux__	-0.124939
-2.340853	16 -32768	-0.124939
-3.310495	= b+a	-0.124939
-4.129425	the factorials,	-0.124939
-3.802463	of convenience	-0.124939
-3.073074	than 231.	-0.124939
-2.518327	return pow(x,10);	-0.124939
-2.502706	software companies	-0.124939
-2.922430	will dominate	-0.124939
-2.233913	specific preferences	-0.124939
-1.777874	#pragma optimize("a",on).	-0.124939
-1.777874	#pragma optimize(...)	-0.124939
-1.079043	---x----- x---x---x	-0.124939
-1.079043	2exponent 16383	-0.124939
-2.719960	set extensions.	-0.124939
-1.203981	μs today,	-0.124939
-2.894347	Example 14.5b	-0.124939
-2.894347	Example 14.5a	-0.124939
-1.380026	specifying otherwise.	-0.124939
-0.602014	for(inti=0;i<16;i+=4){ //Loopby4	-0.124939
-2.572330	* b2);	-0.124939
-0.602014	1./362880., 1./3628800.,	-0.124939
-3.827713	a niche	-0.124939
-2.492447	< 10)	-0.124939
-1.300799	__asm fistp	-0.124939
-1.805718	Windows, SetThreadAffinityMask,	-0.124939
-3.433503	are common,	-0.124939
-2.874792	data (low	-0.124939
-2.503028	very common.	-0.124939
-1.902674	bigger segments	-0.124939
-0.602014	Mac: Darwin8	-0.124939
-1.680871	level 108	-0.124939
-0.902998	fast, compact,	-0.124939
-3.850836	is 102	-0.124939
-0.902998	anonymous namespace.	-0.124939
-3.085097	an update,	-0.124939
-3.527822	The similarity	-0.124939
-1.079043	contemporary 106	-0.124939
-2.544145	any function)	-0.124939
-3.827713	a "move	-0.124939
-0.902998	indirect function"	-0.124939
-0.602014	Thursday, Friday,	-0.124939
-0.902998	(16 bits),	-0.124939
-1.300799	division. Correction	-0.124939
-3.527822	The vulnerability	-0.124939
-4.129425	the reinstallation	-0.124939
-3.225895	by ignoring	-0.124939
-2.156931	four sums	-0.124939
-2.307969	& N-1)==0	-0.124939
-0.602014	1./2., 1./6.,	-0.124939
-1.777643	b) {x	-0.124939
-1.556025	element. Rather	-0.124939
-1.777735	inefficient code-based	-0.124939
-1.999122	model N+1	-0.124939
-4.129425	the 512-bit	-0.124939
-2.894347	Example 7.6.	-0.124939
-3.291104	or circumvent	-0.124939
-0.902998	0/a=0 ---x---xx	-0.124939
-1.831862	my blog.	-0.124939
-0.602014	std.org/jtc1/sc22/wg21/docs/TR18015.pdf. OpenMP.	-0.124939
-1.856778	CPUs. Includes	-0.124939
-1.555933	eax $B2$2	-0.124939
-0.902998	undesired effects.	-0.124939
-0.602014	"Hacker's Delight".	-0.124939
-2.341268	32 AND-operations	-0.124939
-3.663751	in precompiled	-0.124939
-3.827713	a typo	-0.124939
-2.717714	page 51).	-0.124939
-0.602014	Documentation". Included	-0.124939
-1.446834	anyway. Updates	-0.124939
-0.602014	'?', '@'	-0.124939
-2.824466	functions (methods)	-0.124939
-2.803414	loop count.	-0.124939
-3.112512	- masm=intel	-0.124939
-2.263848	must warn	-0.124939
-0.602014	measurements: warm	-0.124939
-3.121718	not noticed	-0.124939
-2.595343	C++ constructs........................................................................	-0.124939
-3.112512	- a<<b<<c	-0.124939
-2.888228	memory leak.	-0.124939
-1.601644	similar utility	-0.124939
-3.654810	and clumsy,	-0.124939
-3.527822	The 16-byte	-0.124939
-2.779853	all zeroes.	-0.124939
-2.534207	some caveats.	-0.124939
-1.504873	label $B1$2:.	-0.124939
-3.654810	and repagination	-0.124939
-3.795255	to a[i+2]	-0.124939
-3.066220	compiler recognizes	-0.124939
-3.121718	not recognized	-0.124939
-0.602014	_endthread() cleans	-0.124939
-2.032546	operator (bitwise	-0.124939
-0.902998	cross-platform portability.	-0.124939
-2.169292	calculated independently.	-0.124939
-2.894347	Example 9.5b	-0.124939
-3.795255	to answer	-0.124939
-1.805579	STL (Standard	-0.124939
-2.894347	Example 13.2.	-0.124939
-1.079089	[edx] adds,	-0.124939
-2.967317	use objconv	-0.124939
-2.233913	specific purpose:	-0.124939
-1.446881	serious limitations	-0.124939
-3.310495	= x8*x2;	-0.124939
-2.994258	this limitation).	-0.124939
-1.203935	43 speculatively	-0.124939
-4.129425	the disassembly	-0.124939
-2.745331	cache (en.wikipedia.org/wiki/L2_cache).	-0.124939
-2.729123	no attempt	-0.124939
-2.913667	then 0+1.23456	-0.124939
-0.902998	(time before)	-0.124939
-2.888228	memory blocks.	-0.124939
-2.340669	file timingtest.h	-0.124939
-3.795255	to C1::Disp()	-0.124939
-2.803414	loop predictor.	-0.124939
-3.310495	= 128.	-0.124939
-2.347850	always accurate,	-0.124939
-1.203935	double. Misaligned	-0.124939
-4.129425	the FAQ	-0.124939
-2.457547	called before.	-0.124939
-4.129425	the Xnu	-0.124939
-1.079043	being initialized.	-0.124939
-2.703420	do so).	-0.124939
-1.856639	compiler. Remember,	-0.124939
-0.602014	/fp:fast /fp:fast=2	-0.124939
-2.203058	option /MT).	-0.124939
-0.602014	(VML, MKL).	-0.124939
-1.923586	public B1	-0.124939
-3.482454	that CParent::Hello()	-0.124939
-3.827713	a user-defined	-0.124939
-1.777920	hot spots,	-0.124939
-3.802463	of B.	-0.124939
-3.517654	for exploiting	-0.124939
-3.802463	of sets).	-0.124939
-2.191509	been lost	-0.124939
-3.310495	= i+1;	-0.124939
-3.850836	is terminated.	-0.124939
-1.446881	serious burden	-0.124939
-0.902998	a&&(b||c) (a&&!b)	-0.124939
-2.967317	use SafeArray:	-0.124939
-2.157300	128 Is8vec16	-0.124939
-2.048156	particular weakness	-0.124939
-2.595343	C++ imple-	-0.124939
-1.203935	processing. Yeppp.	-0.124939
-1.715495	Avoid nested	-0.124939
-2.307969	& source)	-0.124939
-1.079043	books 1994.	-0.124939
-2.861745	has side	-0.124939
-3.000398	have ample	-0.124939
-2.332977	bits (MMX),	-0.124939
-3.073074	than 1/50	-0.124939
-1.079043	/arch:AVX /openmp	-0.124939
-2.544927	we forgot	-0.124939
-1.962818	three clauses	-0.124939
-0.602014	Poor reproducibility.	-0.124939
-3.310495	= -abs(x);.	-0.124939
-3.112512	- x-xxx	-0.124939
-0.902998	CriticalFunction(); timediff[i]	-0.124939
-0.602014	signed. Be	-0.124939
-1.747911	sum for(inti=0;i<16;i+=4){	-0.124939
-2.290871	error message.	-0.124939
-2.458192	before coordination	-0.124939
-2.954036	more distant	-0.124939
-3.011997	{ Sunday	-0.124939
-2.263387	; restore	-0.124939
-0.602014	sizeof(b)); 47	-0.124939
-0.902998	Addison-Wesley, 1996.	-0.124939
-2.892568	from www.agner.org/optimize.	-0.124939
-0.602014	NUMROWS; row++)	-0.124939
-3.795255	to resume	-0.124939
-0.602014	~(~a)=a x-xxxxx--	-0.124939
-2.888228	memory areas.	-0.124939
-0.602014	fld qword	-0.124939
-3.327981	// Repeat	-0.124939
-2.016156	handling /EHs-	-0.124939
-3.827713	a union:	-0.124939
-2.702043	example (12.4e)	-0.124939
-2.224016	thread steals	-0.124939
-0.602014	ADC (add	-0.124939
-3.850836	is moved,	-0.124939
-3.449984	be moved.	-0.124939
-2.888228	memory areas,	-0.124939
-0.902998	, longdoublevalue	-0.124939
-2.414881	rather unconventional	-0.124939
-3.067312	x *const_cast<int*>(&x)	-0.124939
-3.162394	code (option	-0.124939
-0.902998	a*1=a x-xxxxx-x	-0.124939
-0.602014	malloc. Handles	-0.124939
-3.802463	of strange	-0.124939
-1.805579	become invalid,	-0.124939
-0.902998	MOVNTDQ _mm_stream_si128	-0.124939
-0.602014	(Windows: /Gy,	-0.124939
-1.203935	enabled (there	-0.124939
-3.153114	as sqrt	-0.124939
-3.310495	= b.y	-0.124939
-2.145250	support SSE.	-0.124939
-1.601829	fraction bits:	-0.124939
-2.317001	0 264-1	-0.124939
-3.162394	code generality.	-0.124939
-3.310495	= sin(0.8);	-0.124939
-3.654810	and esp+12	-0.124939
-1.715495	adding bounds-checking	-0.124939
-3.482454	that owns	-0.124939
-2.446280	0; i--,	-0.124939
-0.602014	vice versa.	-0.124939
-3.251046	function body.	-0.124939
-3.291104	or CString	-0.124939
-1.203981	513 58.7	-0.124939
-3.327981	// Prevent	-0.124939
-1.747680	limited "express"	-0.124939
-0.902998	complexity (en.wikipedia.org/wiki/Standard_Template_Library).	-0.124939
-3.795255	to zero:	-0.124939
-1.923494	x; *(int*)&x	-0.124939
-0.602014	(option -fno-pic).	-0.124939
-2.766015	one tread	-0.124939
-2.451427	4 rows.	-0.124939
-3.850836	is saturated.	-0.124939
-4.129425	the rows,	-0.124939
-2.244312	An uncaught	-0.124939
-3.827713	a macro,	-0.124939
-0.602014	Ignoring virtualization.	-0.124939
-3.795255	to seek	-0.124939
-3.827713	a macro.	-0.124939
-3.433503	are constructed.	-0.124939
-2.608584	static data,	-0.124939
-2.348587	void FuncB	-0.124939
-3.482454	that limits	-0.124939
-1.203935	F32vec4 xx4(x4);	-0.124939
-3.827713	a precious	-0.124939
-0.602014	Family 15h	-0.124939
-2.502706	software installed,	-0.124939
-3.449984	be installed.	-0.124939
-2.719054	floating point:	-0.124939
-3.517654	for IA-32/Intel64,	-0.124939
-0.902998	SSSE3 _mm_perm_epi8	-0.124939
-3.112512	- min)	-0.124939
-4.129425	the LLVM	-0.124939
-2.894347	Example 7.40a	-0.124939
-2.894347	Example 7.40b	-0.124939
-2.894347	Example 7.40c	-0.124939
-1.079043	compute (FuncRow(i)*columns	-0.124939
-1.504826	X (Darwin)	-0.124939
-3.357106	can see,	-0.124939
-4.129425	the importance	-0.124939
-2.347850	always comparable	-0.124939
-3.195606	with interpretation.	-0.124939
-3.310495	= x∙xn-1,	-0.124939
-1.379934	happens rarely.	-0.124939
-1.777874	#pragma optimize("a",	-0.124939
-2.778593	but neither	-0.124939
-1.680871	predict correctly	-0.124939
-3.802463	of sets)	-0.124939
-3.251046	function libraries........................................................................................	-0.124939
-3.795255	to mind.	-0.124939
-2.815473	instruction sets,	-0.124939
-3.527822	The dot	-0.124939
-2.652005	+ a2*b1)	-0.124939
-2.419436	often mispredicted.	-0.124939
-2.544191	variable Day.	-0.124939
-3.449984	be mispredicted,	-0.124939
-1.777504	vectorization Good	-0.124939
-0.902998	(time after)	-0.124939
-2.616466	float OneOrTwo5[2]	-0.124939
-1.504780	temporary intermediates,	-0.124939
-3.850836	is incremented.	-0.124939
-2.191509	been incremented,	-0.124939
-0.602014	works, here's	-0.124939
-3.850836	is insufficient.	-0.124939
-3.291104	or she	-0.124939
-3.827713	a not-too-big	-0.124939
-2.745331	cache evictions	-0.124939
-2.092960	sign bit,	-0.124939
-2.719960	set Prefetch	-0.124939
-2.813863	CPU core).	-0.124939
-1.079089	__declspec( noalias)	-0.124939
-3.517654	for transposition	-0.124939
-2.712916	each other's	-0.124939
-1.556071	provided below,	-0.124939
-0.902998	preventing illegitimate	-0.124939
-1.747726	prevent legitimate	-0.124939
-1.300799	files. 121	-0.124939
-1.680871	logical architecture	-0.124939
-2.567869	many renamed	-0.124939
-3.850836	is unacceptable	-0.124939
-2.811801	other hardware-related	-0.124939
-2.882274	at 12,	-0.124939
-0.902998	(chapter 12)	-0.124939
-3.654810	and "More	-0.124939
-0.602014	Meyers: "Effective	-0.124939
-3.327981	// Dispatcher	-0.124939
-2.452764	See www.gnu.org/copyleft/fdl.html.	-0.124939
-4.129425	the Boost	-0.124939
-0.902998	shr ebx,31	-0.124939
-1.379934	security matters.	-0.124939
-1.981440	#include <ia32intrin.h>	-0.124939
-3.795255	to finish.	-0.124939
-2.967317	use inappropriate	-0.124939
-2.316863	case 0:	-0.124939
-3.654810	and IA-32	-0.124939
-3.195606	with x87	-0.124939
-0.902998	a*b=b*a a+b+c=a+(b+c)	-0.124939
-3.310495	= 6.0f;	-0.124939
-1.203935	{} brackets.	-0.124939
-3.795255	to NULL.	-0.124939
-3.153114	as b*(2.0/3.0)	-0.124939
-3.291104	or .a),	-0.124939
-4.129425	the tolerance	-0.124939
-1.379980	Test Processor	-0.124939
-1.999353	cache. Files	-0.124939
-2.867614	program compactness,	-0.124939
-3.225895	by (partial)	-0.124939
-0.602014	Michael Abrash:	-0.124939
-2.975045	time T+1	-0.124939
-2.106240	: 0]	-0.124939
-0.602014	Third Edition,	-0.124939
-2.702043	example 14.7b	-0.124939
-1.203935	interface. Otherwise	-0.124939
-2.611948	two suggested	-0.124939
-2.894347	Example 14.3a	-0.124939
-2.894347	Example 14.3b	-0.124939
-2.629046	library (DLL)	-0.124939
-1.079043	FUNCNAME SelectAddMul_SSE41	-0.124939
-2.513246	2 thenaandbcannot	-0.124939
-0.602014	(Red Hat).	-0.124939
-0.602014	231-1 int32_t	-0.124939
-3.517654	for issuing	-0.124939
-3.850836	is unchanged	-0.124939
-1.680825	stack. Deallocation	-0.124939
-1.902258	own initiative	-0.124939
-3.162394	code bloat	-0.124939
-1.555887	inefficient. Division,	-0.124939
-1.300845	90 Gives	-0.124939
-3.153114	as C-	-0.124939
-0.602014	floata; boolb=0;	-0.124939
-3.654810	and C#	-0.124939
-1.504826	false (0);	-0.124939
-3.085097	an acceptable	-0.124939
-3.654810	and FuncB,	-0.124939
-0.902998	1-bit removed.	-0.124939
-2.922430	will trigger	-0.124939
-3.827713	a basic	-0.124939
-0.602014	columns; j++)	-0.124939
-2.882274	at once...................................	-0.124939
-1.856870	switch statements,	-0.124939
-3.850836	is compiling.	-0.124939
-3.802463	of frustration	-0.124939
-2.819173	only 2-3	-0.124939
-3.433503	are prone	-0.124939
-1.079089	Context switches.....................................................................................................	-0.124939
-1.831908	go deeper	-0.124939
-1.943651	|| b))	-0.124939
-2.032546	operator less.	-0.124939
-3.827713	a "function".	-0.124939
-1.079043	lookups Max.	-0.124939
-2.307969	& operator[]	-0.124939
-3.850836	is enabled:	-0.124939
-2.612775	object owns.	-0.124939
-3.433503	are wrapper	-0.124939
-1.203981	__INTEL_COMPILER 161	-0.124939
-0.902998	_M_X64 162	-0.124939
-3.357106	can be,	-0.124939
-1.601736	linear algebra)	-0.124939
-3.795255	to be.	-0.124939
-3.327981	// Re-do	-0.124939
-3.802463	of algebra.	-0.124939
-2.975045	time slices.	-0.124939
-4.129425	the transformation	-0.124939
-3.327981	// x^8	-0.124939
-2.564417	version 2.11	-0.124939
-1.203981	disable power-save	-0.124939
-2.811801	other situations:	-0.124939
-0.602014	sensible balance	-0.124939
-0.902998	1.00 0.35	-0.124939
-3.850836	is over.	-0.124939
-3.085097	an over-	-0.124939
-1.962818	&& a<c)	-0.124939
-3.310495	= a1/b1	-0.124939
-2.106240	: "memory"	-0.124939
-0.602014	IA-32/Intel64, 2009.	-0.124939
-2.534207	some situations,	-0.124939
-1.504826	false vendor	-0.124939
-0.602014	100> list;	-0.124939
-3.327981	// Non-polymorphic	-0.124939
-2.203196	good investment.	-0.124939
-2.308153	instructions MOVNTPS,	-0.124939
-2.191878	precision (80	-0.124939
-3.082131	int matrix[NUMROWS][NUMCOLUMNS];	-0.124939
-0.602014	prediction). 149	-0.124939
-1.204028	*)d, x);}	-0.124939
-3.802463	of cc[i]+2	-0.124939
-2.426704	64 Iu32vec2	-0.124939
-2.157300	128 Iu32vec4	-0.124939
-3.011997	{ time1	-0.124939
-2.291194	making plug-ins	-0.124939
-1.999122	256 Vec32c	-0.124939
-2.994258	this interval,	-0.124939
-2.214382	compile time?	-0.124939
-3.310495	= &SelectAddMul_AVX2;	-0.124939
-1.380026	Core i7	-0.124939
-1.999168	development kit	-0.124939
-2.568605	array i)	-0.124939
-1.379934	polynomial (Vec4f	-0.124939
-4.129425	the transitions	-0.124939
-1.203935	cached. Usually	-0.124939
-0.602014	u[2]} a[size];	-0.124939
-1.079043	organization ...................................................................................................	-0.124939
-3.795255	to x?"	-0.124939
-2.048433	problems separating	-0.124939
-2.609644	number (the	-0.124939
-2.348587	void g()	-0.124939
-3.066220	compiler technology,	-0.124939
-2.568605	array 800	-0.124939
-3.449984	be tolerated.	-0.124939
-2.513246	2 Gbytes.	-0.124939
-2.572330	* p1;	-0.124939
-3.082131	int CriticalFunctionType(int	-0.124939
-0.602014	created, deleted,	-0.124939
-3.112512	- a/1	-0.124939
-1.079043	C++. Yet,	-0.124939
-2.426704	64 -263	-0.124939
-2.063211	was assigned	-0.124939
-1.300799	here: functional	-0.124939
-3.153114	as 2eee	-0.124939
-3.121718	not human	-0.124939
-3.827713	a plug-in	-0.124939
-0.902998	<xmmintrin.h> _mm_setcsr(_mm_getcsr()	-0.124939
-3.073074	than 2-20,	-0.124939
-3.121718	not yet	-0.124939
-3.654810	and Newton-Raphson	-0.124939
-3.449984	be regarded	-0.124939
-2.712916	each pixel	-0.124939
-2.224016	thread affinity	-0.124939
-1.379980	?Func@@YAXQAHAAH@Z PROCNEAR	-0.124939
-2.717714	page 119).	-0.124939
-4.129425	the stack).	-0.124939
-0.602014	decades ago,	-0.124939
-3.051374	may reuse	-0.124939
-0.602014	answer. Beginners	-0.124939
-3.327981	// Error:	-0.124939
-2.888228	memory pool,	-0.124939
-2.191509	been reordered,	-0.124939
-0.902998	, doublevalue	-0.124939
-0.602014	Primitives (IPP).	-0.124939
-2.882274	at hand.	-0.124939
-3.827713	a hand-	-0.124939
-3.654810	and |)	-0.124939
-1.962910	better standardization	-0.124939
-0.602014	solutions. Patches	-0.124939
-3.654810	and object-oriented	-0.124939
-2.445681	call WriteFile	-0.124939
-0.902998	symbolic link.	-0.124939
-1.902443	dispatch strategies........................................................................................	-0.124939
-2.508884	performance monitoring	-0.124939
-1.203935	1: printf("Beta");	-0.124939
-3.795255	to re-use	-0.124939
-3.850836	is dead	-0.124939
-3.527822	The Core2	-0.124939
-2.842927	different meaning.	-0.124939
-2.048156	particular meaning,	-0.124939
-1.300799	updated 2014-08-07.	-0.124939
-3.153114	as (int)&matrix[0][0]	-0.124939
-3.327981	// Safe	-0.124939
-2.778593	but i*12,	-0.124939
-1.963049	keyword __thread	-0.124939
-1.601644	buffer (BTB).	-0.124939
-2.224016	several meanings	-0.124939
-2.169569	calculation capabilities.	-0.124939
-2.032269	next paragraph.	-0.124939
-3.082131	int Sum2(S3	-0.124939
-0.602014	TR 18015,	-0.124939
-2.842927	different alignments	-0.124939
-2.805661	point status:	-0.124939
-1.680779	classes. Including	-0.124939
-2.106517	every millisecond.	-0.124939
-1.747772	S1 list[100],	-0.124939
-3.527822	The benchmark	-0.124939
-0.602014	nn ifbit=1	-0.124939
-2.445773	example, f(x)	-0.124939
-1.446881	follows: floatvalue	-0.124939
-1.079089	definition. Inlining	-0.124939
-1.601644	seconds remains	-0.124939
-4.129425	the worst-	-0.124939
-3.802463	of titles.	-0.124939
-2.894347	Example 11.2a	-0.124939
-1.203935	(RTTI) /GR–	-0.124939
-2.078442	... ~C1();	-0.124939
-2.652005	+ (vector	-0.124939
-4.129425	the initial	-0.124939
-0.602014	www.agner.org/ optimize/#vectorclass	-0.124939
-2.572330	* powN<true,N/2>::p(x);	-0.124939
-2.457547	called accumulators.	-0.124939
-2.717714	page 22.	-0.124939
-1.777643	fact addressed	-0.124939
-2.348587	void F0()	-0.124939
-2.032638	Mac OS,	-0.124939
-1.079043	earlier vmlsExp4	-0.124939
-3.195606	with -mcmodel=large,	-0.124939
-3.795255	to +127.	-0.124939
-2.681804	double precision:	-0.124939
-2.492447	< 223	-0.124939
-2.032915	1; x[1]	-0.124939
-3.654810	and Adolfy	-0.124939
-3.310495	= 64;	-0.124939
-2.502706	software products	-0.124939
-2.340669	file dvec.h	-0.124939
-2.406578	libraries (.dll	-0.124939
-2.224016	several stages	-0.124939
-3.850836	is InstructionSet().The	-0.124939
-3.802463	of 64.	-0.124939
-0.602014	USB sticks	-0.124939
-2.234420	threads Parallelization	-0.124939
-2.191832	line covers	-0.124939
-2.291517	I die.	-0.124939
-2.717714	page 153.	-0.124939
-2.317001	0 65535	-0.124939
-0.902998	report /Qopt-report	-0.124939
-3.827713	a low-priority	-0.124939
-0.602014	12.1b. Vectorization	-0.124939
-1.379934	x-xxxx--x Profile-guided	-0.124939
-3.654810	and destructors.	-0.124939
-2.213367	etc. Locked	-0.124939
-1.902305	platform independence,	-0.124939
-2.333853	dynamic libraries............................................................................	-0.124939
-3.802463	of fine-tuning,	-0.124939
-3.281894	it exits.	-0.124939
-2.803414	loop exits,	-0.124939
-1.999122	name mangling	-0.124939
-2.282377	Gnu utilities	-0.124939
-0.902998	Difficult cases........................................................................................................	-0.124939
-2.191509	been alleviated	-0.124939
-2.611948	two decimals,	-0.124939
-2.192016	matrix cell	-0.124939
-3.482454	that transfers	-0.124939
-3.310495	= list[j].b	-0.124939
-0.902998	order(i); list[j].a	-0.124939
-2.894347	Example 12.4a.	-0.124939
-2.702043	example 12.4a,	-0.124939
-2.446603	For example,a	-0.124939
-3.449984	be obeyed.	-0.124939
-3.433503	are sufficient,	-0.124939
-1.715495	final product.	-0.124939
-1.643037	ebx restores	-0.124939
-0.602014	July 2011).	-0.124939
-0.902998	inherently serial,	-0.124939
-3.449984	be restored	-0.124939
-1.079089	C#, managed	-0.124939
-1.715495	registers. Disadvantages	-0.124939
-1.380026	expected real-time	-0.124939
-0.602014	1./5040., 1./40320.,	-0.124939
-2.234558	speed exceeding	-0.124939
-1.079043	---xx---- (a+c==b+c)=(a==b)	-0.124939
-1.962864	Table 18.2.	-0.124939
-0.602014	menu click	-0.124939
-3.310495	= 1.23456,	-0.124939
-2.016156	automatically download	-0.124939
-0.602014	-fopenmp /Qopenmp	-0.124939
-3.122826	This technique	-0.124939
-3.654810	and de-allocation	-0.124939
-3.225895	by x<<3,	-0.124939
-2.234282	best optimizer.	-0.124939
-2.281778	about division).	-0.124939
-3.827713	a monotonically	-0.124939
-3.827713	a top-of-stack	-0.124939
-2.612545	multiple configurations	-0.124939
-3.654810	and off.	-0.124939
-0.602014	Vec8i Vec8ui	-0.124939
-2.564417	version 2.20	-0.124939
-2.650262	n.a. 2.23	-0.124939
-0.602014	www.open- std.org/jtc1/sc22/wg21/docs/TR18015.pdf.	-0.124939
-3.795255	to relocate,	-0.124939
-3.433503	are advised	-0.124939
-3.827713	a button	-0.124939
-1.715495	right positions	-0.124939
-2.994258	this did	-0.124939
-0.602014	Iu16vec8 Vec8us	-0.124939
-1.981440	#include <excpt.h>	-0.124939
-3.827713	a minimal	-0.124939
-1.300845	Kernel Library,	-0.124939
-0.902998	Far Systems	-0.124939
-3.291104	or more.	-0.124939
-2.356780	even telling	-0.124939
-3.654810	and unexpected	-0.124939
-1.504780	profiling feasible.	-0.124939
-3.310495	= x2*x2;	-0.124939
-3.802463	of Mathcad	-0.124939
-2.203058	option -mveclibabi=acml.	-0.124939
-2.371619	user friendly	-0.124939
-1.300799	processes running,	-0.124939
-2.842927	different targets	-0.124939
-2.254256	calls exit.	-0.124939
-1.747633	task switching.	-0.124939
-2.341130	following alternatives:	-0.124939
-2.858815	vector operations...............................................................................................	-0.124939
-2.717714	page 27).	-0.124939
-1.880167	options ...................................................................................	-0.124939
-2.652005	+ log(c[i]);	-0.124939
-3.225895	by assignment,	-0.124939
-3.225895	by assignment.	-0.124939
-3.795255	to diagnose.	-0.124939
-2.894347	Example 8.9b	-0.124939
-3.795255	to 15.1c).	-0.124939
-1.203935	{} vector(float	-0.124939
-3.082131	int c;};	-0.124939
-2.894347	Example 8.9a	-0.124939
-2.815473	instruction sets...........................	-0.124939
-1.601690	Windows. Integrates	-0.124939
-2.348587	void AddTwo(int	-0.124939
-0.602014	scanf. Violation	-0.124939
-2.254026	work evenly	-0.124939
-2.191509	been identified,	-0.124939
-3.802463	of simultaneous	-0.124939
-3.517654	for incrementing	-0.124939
-1.805579	become imprecise	-0.124939
-2.679119	Intel Technology	-0.124939
-3.310495	= -100,	-0.124939
-2.445681	call p->f()	-0.124939
-0.602014	0); DontSkip	-0.124939
-0.602014	1., 1./2.,	-0.124939
-3.827713	a blend	-0.124939
-2.652005	+ e	-0.124939
-3.654810	and flexibility	-0.124939
-0.602014	SelectAddMul, SelectAddMul_SSE2,	-0.124939
-2.470414	const Greek[4]	-0.124939
-1.204028	(in Windows:	-0.124939
-3.827713	a wealth	-0.124939
-3.850836	is correlated	-0.124939
-2.341591	out independently	-0.124939
-3.327981	// Output	-0.124939
-1.079043	<typename T,	-0.124939
-1.079043	<typename T>	-0.124939
-0.902998	FuncType SelectAddMul,	-0.124939
-1.204028	empty throw()specification	-0.124939
-2.169292	calculated asa	-0.124939
-1.079043	93 themselves.	-0.124939
-3.073074	than ARRAYSIZE.	-0.124939
-1.380026	defines electrical	-0.124939
-2.681804	double A2	-0.124939
-2.106240	: "=m"(n)	-0.124939
-3.654810	and A.	-0.124939
-0.902998	environment (IDE)	-0.124939
-0.902998	ISO/IEC TR	-0.124939
-1.680779	libraries. Numbers	-0.124939
-2.263341	large amounts	-0.124939
-1.300799	__asm fld	-0.124939
-1.504826	fast. Calculating	-0.124939
-0.602014	Mathcad (v.	-0.124939
-1.777828	whole polygon	-0.124939
-1.079043	----- ~(~a)=a	-0.124939
-2.378918	pointers /vms	-0.124939
-3.051374	may sample	-0.124939
-2.892568	from everybody.	-0.124939
-3.281894	it unwise	-0.124939
-2.440447	operating systems").	-0.124939
-3.482454	that looses	-0.124939
-0.902998	&Object2; p2->Hello();	-0.124939
-2.894347	Example 8.23b.	-0.124939
-2.652005	+ d);	-0.124939
-1.203981	pointers. 144	-0.124939
-2.106378	hardware circuits	-0.124939
-2.894347	Example 14.1b	-0.124939
-1.555933	n; 143	-0.124939
-2.894347	Example 14.1a	-0.124939
-3.850836	is ecx+eax*4.	-0.124939
-1.079043	int. Reinterpret	-0.124939
-1.203935	100, max	-0.124939
-1.079043	(page 77)	-0.124939
-2.378550	test theory.	-0.124939
-0.602014	"move constructor"	-0.124939
-2.224801	through 14,	-0.124939
-2.930452	A Number)	-0.124939
-2.922430	will grow	-0.124939
-1.379980	systems: Pointers,	-0.124939
-1.805487	size. Unpredictable	-0.124939
-3.291104	or QueryPerformanceCounter	-0.124939
-4.129425	the workload	-0.124939
-3.225895	by u[0].	-0.124939
-2.832806	same class).	-0.124939
-3.433503	are seeing	-0.124939
-1.446927	Software distributors	-0.124939
-0.602014	2A, 2B,	-0.124939
-0.602014	2006 (Red	-0.124939
-2.273049	CPUs optimally.	-0.124939
-2.874792	data optimally,	-0.124939
-3.802463	of view.	-0.124939
-1.504826	heavy competition.	-0.124939
-1.777458	v. 8.42n,	-0.124939
-0.602014	Coriolis group	-0.124939
-3.795255	to thank	-0.124939
-0.602014	OpenMP. www.openmp.org.	-0.124939
-1.747633	particularly interesting	-0.124939
-1.379934	x-xxxx--x ~a&~b=~(a|b)	-0.124939
-1.504873	label ;eax=addressofa	-0.124939
-1.601690	declaration "static"	-0.124939
-3.802463	of 0x800	-0.124939
-3.357106	can subtract	-0.124939
-2.994258	this works,	-0.124939
-0.602014	CriticalFunction, @gnu_indirect_function");	-0.124939
-2.811801	other optimizations,	-0.124939
-3.654810	and matrixes.	-0.124939
-2.842927	different speeds.	-0.124939
-1.379934	SSE3 tmmintrin.h	-0.124939
-0.602014	x(0) {};	-0.124939
-1.963187	programmer forgets	-0.124939
-1.379934	above. 7.	-0.124939
-1.831908	positive integer:	-0.124939
-1.680964	compiling module2.cpp.	-0.124939
-2.016294	members last:	-0.124939
-2.426704	64 14.0	-0.124939
-0.602014	SelectAddMul_SSE41, SelectAddMul_AVX2,	-0.124939
-0.602014	consumers. Choose	-0.124939
-2.652005	+ list[j].c;	-0.124939
-1.203935	{return r.a	-0.124939
-2.842927	different places).	-0.124939
-3.517654	for Linux)	-0.124939
-3.291104	or decrementing	-0.124939
-2.213367	etc. Accessibility	-0.124939
-2.811801	other constructors.	-0.124939
-2.681804	double x10	-0.124939
-3.327981	// Generic	-0.124939
-0.602014	F32vec8 F64vec4	-0.124939
-3.850836	is system-independent,	-0.124939
-0.602014	92 DynamicArray[i]	-0.124939
-2.703420	do cross-module	-0.124939
-0.602014	Remember, therefore,	-0.124939
-2.652005	+ FuncCol(i))	-0.124939
-3.225895	by requesting	-0.124939
-2.874792	data locally.	-0.124939
-2.894347	Example 8.3a	-0.124939
-2.894347	Example 12.4c.	-0.124939
-1.805810	gives a+b=0,	-0.124939
-0.902998	AMD: "Software	-0.124939
-2.263986	assembly listing.	-0.124939
-3.663751	in eax.	-0.124939
-1.831954	computer game	-0.124939
-3.327981	// polynomial(x)	-0.124939
-2.572330	* cc[i]);	-0.124939
-1.504826	line. Time-based	-0.124939
-3.654810	and uninstallation	-0.124939
-1.446834	reductions: a+b=b+a	-0.124939
-4.129425	the remaining	-0.124939
-3.327981	// u.d	-0.124939
-0.602014	ifunc branch).	-0.124939
-1.379980	Type conversions....................................................................................................	-0.124939
-2.340531	system forbids	-0.124939
-2.717714	page 107.	-0.124939
-1.555979	block. Walking	-0.124939
-0.602014	754 (1985).	-0.124939
-2.595113	also de-allocated.	-0.124939
-1.203981	Threads ..................................................................................................................	-0.124939
-0.902998	-Ofast /O3	-0.124939
-2.813863	CPU dispatching:	-0.124939
-2.813863	CPU dispatching,	-0.124939
-3.195606	with C++0x	-0.124939
-2.168831	/ (b1*b2);	-0.124939
-0.602014	1./8.71782E10, 1./1.30767E12,	-0.124939
-0.602014	list[100], *temp;	-0.124939
-3.195606	with segmented	-0.124939
-3.085097	an integral	-0.124939
-1.379934	CodeGear compiler).	-0.124939
-2.078304	program. Weighing	-0.124939
-2.223878	These workaround	-0.124939
-0.602014	appropriately. Users	-0.124939
-0.602014	matters. Problems	-0.124939
-0.602014	/Qopenmp -m32	-0.124939
-1.079089	double, bool,	-0.124939
-0.902998	generic branch,	-0.124939
-0.902998	-msse4.1 /arch:SSE4.1	-0.124939
-1.203935	worst-case performance:	-0.124939
-0.602014	Enterprise editions).	-0.124939
-2.458146	address [ecx+eax*4].	-0.124939
-0.602014	optimizer. Borland/CodeGear/Embarcadero	-0.124939
-1.203935	maintenance easier.	-0.124939
-2.559809	value 1000.	-0.124939
-2.695975	using ready	-0.124939
-3.000398	have studied	-0.124939
-2.652005	+ 0.666666666666666666667;	-0.124939
-2.202919	+= A2;	-0.124939
-1.715403	so-called commpage.	-0.124939
-3.281894	it uses.	-0.124939
-2.719317	class powN<true,0>	-0.124939
-0.602014	0x2F00, 0x3700,	-0.124939
-0.602014	column++) matrix[row][column]	-0.124939
-2.203196	good performance).	-0.124939
-1.601644	chapter describes	-0.124939
-3.310495	= {2.6f,	-0.124939
-2.273188	accessed non-sequentially	-0.124939
-3.073074	than 1%	-0.124939
-0.602014	non-recoverable errors;	-0.124939
-2.307969	& 1)	-0.124939
-2.682539	size (typically	-0.124939
-3.482454	that hackers	-0.124939
-0.602014	hard-to-find errors,	-0.124939
-4.129425	the formula:	-0.124939
-0.602014	loop? Certainly	-0.124939
-1.805579	statement occupies	-0.124939
-1.380026	internal multi-threading,	-0.124939
-1.981440	#include "xmmintrin.h"	-0.124939
-3.850836	is occupied	-0.124939
-1.601736	My experimental	-0.124939
-1.902489	control condition:	-0.124939
-2.882274	at runtime).	-0.124939
-0.602014	r2, c1,	-0.124939
-0.602014	Specifications, Dr	-0.124939
-3.802463	of switching	-0.124939
-2.611948	two formulas	-0.124939
-3.795255	to trace	-0.124939
-2.457547	called Single-Instruction-Multiple-Data	-0.124939
-0.902998	initialization, condition,	-0.124939
-3.011997	{ protected:	-0.124939
-3.827713	a zigzag	-0.124939
-2.994258	this topic,	-0.124939
-1.079043	User complaints	-0.124939
-2.332977	bits (XMM),	-0.124939
-3.251046	function local:	-0.124939
-3.073074	than 20.	-0.124939
-1.379980	C++, D,	-0.124939
-2.016248	like throw(A,B,C)	-0.124939
-3.281894	it fills	-0.124939
-1.880167	made local.	-0.124939
-2.445773	less precise	-0.124939
-0.602014	18.3. Predefined	-0.124939
-2.766015	one local,	-0.124939
-2.273188	accessed column-wise.	-0.124939
-3.251046	function __intel_cpu_features_init_x()	-0.124939
-1.079043	flags stall	-0.124939
-1.643175	list[i] =0;	-0.124939
-1.203935	Graphics accelerators	-0.124939
-0.902998	C++". Addison-Wesley.	-0.124939
-1.079089	bit: absvalue	-0.124939
-1.680779	mov eax,0.	-0.124939
-0.602014	fistp dword	-0.124939
-0.602014	inte- ger	-0.124939
-3.527822	The recommendations	-0.124939
-2.874792	data decomposition,	-0.124939
-1.504826	jump targets.	-0.124939
-3.449984	be undesired.	-0.124939
-2.341268	32 bytes).	-0.124939
-0.602014	discussions. Turn	-0.124939
-2.894347	Example 12.6.	-0.124939
-2.702043	example 7.32b.	-0.124939
-2.679119	Intel VTune,	-0.124939
-1.203935	interval [1.0,	-0.124939
-2.608584	static if),	-0.124939
-3.850836	is referencing	-0.124939
-2.457547	called VTune;	-0.124939
-0.602014	incredibly stupid	-0.124939
-2.364011	without worrying	-0.124939
-2.458192	before storing.	-0.124939
-2.132431	operators ......................................................................	-0.124939
-2.894347	Example 7.29b	-0.124939
-0.602014	Jr.: "Hacker's	-0.124939
-1.203935	objects. Storage	-0.124939
-0.602014	constant: Unsigned	-0.124939
-0.902998	increasingly blurred	-0.124939
-2.894347	Example 7.29a	-0.124939
-3.082131	int dummy;	-0.124939
-3.153114	as eliminating	-0.124939
-2.063580	write FatalAppExitA(0,"Array	-0.124939
-3.225895	by emulating	-0.124939
-2.564417	version satisfies	-0.124939
-3.310495	= (int)n	-0.124939
-1.601690	module with,	-0.124939
-0.602014	utilized appropriately.	-0.124939
-0.602014	378.7 168.5	-0.124939
-2.518327	return x10;	-0.124939
-0.602014	58.7 168.3	-0.124939
-1.777643	produce streaming	-0.124939
-2.518327	return (2.5f	-0.124939
-2.497887	between commas	-0.124939
-1.079043	reciprocal_divisor; reciprocal_divisor	-0.124939
-4.129425	the usual	-0.124939
-2.451427	4 ?Func2@@YAXQAHAAH@Z	-0.124939
-1.981440	#include <float.h>	-0.124939
-3.654810	and temp++	-0.124939
-3.433503	are doing.	-0.124939
-3.850836	is unreasonably	-0.124939
-3.663751	in popularity	-0.124939
-0.902998	relocated (rebased)	-0.124939
-1.079043	cmp ja	-0.124939
-2.332977	bits (YMM),	-0.124939
-0.602014	[1.0, 2.0)	-0.124939
-3.482454	that dates	-0.124939
-3.327981	// Called	-0.124939
-2.650262	n.a. (-a)*(-b)	-0.124939
-3.082131	int row,	-0.124939
-1.203981	mirror position	-0.124939
-3.827713	a constructor.	-0.124939
-3.654810	and 2B.	-0.124939
-3.827713	a number).	-0.124939
-2.308660	processors (0,	-0.124939
-2.717714	page 78.	-0.124939
-1.880121	binary decimals	-0.124939
-1.999168	separate executables	-0.124939
-3.433503	are among	-0.124939
-1.962956	below. Installing	-0.124939
-1.943743	> largest_abs)	-0.124939
-2.702043	example 8.15b.	-0.124939
-1.379980	truncation towards	-0.124939
-2.032408	multiplication b[i]*c[i],	-0.124939
-0.902998	appropriate. Compiler-specific	-0.124939
-3.850836	is maintained	-0.124939
-2.954036	more efficient:	-0.124939
-1.999445	__m128i LoadVectorA(void	-0.124939
-0.602014	b[arraysize], c[arraysize];	-0.124939
-4.129425	the performance,	-0.124939
-3.827713	a sensible	-0.124939
-1.079089	-O3 Interprocedural	-0.124939
-2.145435	Function Assembly	-0.124939
-2.719317	class powN<true,N>	-0.124939
-3.112512	- a+b+c	-0.124939
-3.827713	a compelling	-0.124939
-4.129425	the profile.	-0.124939
-3.433503	are cumbersome	-0.124939
-2.882274	at 403	-0.124939
-1.379934	SSE3 pmmintrin.h	-0.124939
-3.795255	to experience.	-0.124939
-2.867614	program executable:	-0.124939
-3.827713	a vector).	-0.124939
-1.300799	Development Environments)	-0.124939
-1.715634	non-Intel machines?	-0.124939
-3.517654	for those	-0.124939
-3.310495	= a[i].u[1]	-0.124939
-1.380026	protection scheme	-0.124939
-4.129425	the IDE,	-0.124939
-2.307969	& (N-1))	-0.124939
-3.517654	for investigating	-0.124939
-1.777874	#pragma novector	-0.124939
-3.310495	= 110;	-0.124939
-0.902998	vacant spaces.	-0.124939
-0.602014	icon signaling	-0.124939
-3.449984	be passed	-0.124939
-1.981440	#include <pmmintrin.h>	-0.124939
-2.432907	first sub-vector.	-0.124939
-4.129425	the IEEE	-0.124939
-2.032546	operator (|)	-0.124939
-1.300799	relatively expensive,	-0.124939
-1.999122	model N-1	-0.124939
-3.433503	are dominating	-0.124939
-1.300891	__attribute(( fastcall))	-0.124939
-0.602014	brutally interrupted.	-0.124939
-2.157300	128 17.4	-0.124939
-1.504826	interpreted script	-0.124939
-3.310495	= lookup[b];	-0.124939
-1.079089	_mm_loadu_si128((__m128i const*)p);}	-0.124939
-0.602014	Iu8vec16 Vec16uc	-0.124939
-1.203935	F32vec4 xxn(x4,	-0.124939
-2.652005	+ i/2;	-0.124939
-2.741712	integer According	-0.124939
-2.032592	microprocessor microarchitecture.	-0.124939
-2.446603	For team	-0.124939
-1.943743	> abs(v.f)	-0.124939
-0.902998	__asm__ (".type	-0.124939
-2.446603	For one-man	-0.124939
-3.517654	for vectorization.............................................................	-0.124939
-1.079043	metaprogramming. None	-0.124939
-1.856778	#define MAX(a,b)	-0.124939
-1.643129	target buffer,	-0.124939
-3.357106	can roughly	-0.124939
-3.310495	= pow(x,n)	-0.124939
-0.902998	3) <<6	-0.124939
-4.129425	the arrays:	-0.124939
-0.602014	15h Processors".	-0.124939
-0.602014	methods: Instrumentation:	-0.124939
-3.802463	of i&15	-0.124939
-0.602014	"__attribute__((visibility ("hidden")))".	-0.124939
-3.153114	as buffers	-0.124939
-1.902212	AVX2 _mm256_i64gather_pd	-0.124939
-3.281894	it twice.	-0.124939
-0.602014	Today (2013)	-0.124939
-4.129425	the capability	-0.124939
-3.795255	to _endthread()	-0.124939
-3.310495	= 2.5*x^2	-0.124939
-3.291104	or __debugbreak();.	-0.124939
-2.811801	other system-	-0.124939
-3.251046	function calls,	-0.124939
-3.795255	to fine-	-0.124939
-2.629046	library asmlib,	-0.124939
-3.850836	is aiming	-0.124939
-2.191509	been found,	-0.124939
-2.048156	particular subtask	-0.124939
-3.654810	and lrint.	-0.124939
-2.572330	* c[i]);	-0.124939
-0.602014	-231 231-1	-0.124939
-2.668262	pointer serves	-0.124939
-2.815473	instruction latencies	-0.124939
-1.747680	limited audience	-0.124939
-0.602014	unreferen- ced	-0.124939
-1.923540	becomes full.	-0.124939
-3.802463	of if.	-0.124939
-0.602014	Feature bloat.	-0.124939
-3.527822	The radical	-0.124939
-1.504919	space. Putting	-0.124939
-3.527822	The absence	-0.124939
-3.011997	{ FuncB(i);	-0.124939
-3.802463	of solving	-0.124939
-4.129425	the programmers'	-0.124939
-2.399785	time. Four	-0.124939
-3.802463	of (a+b).	-0.124939
-1.203981	contained objects?	-0.124939
-3.663751	in y.	-0.124939
-1.643037	bounds violations,	-0.124939
-4.129425	the processor)	-0.124939
-2.930452	A sourcebook	-0.124939
-1.379934	SSE3 horizontal	-0.124939
-2.805661	point comparison.	-0.124939
-3.449984	be broken	-0.124939
-1.380026	(float *)alloca(n	-0.124939
-3.153114	as spell-checking	-0.124939
-1.504919	(unsigned int)size)	-0.124939
-2.894347	Example 7.34a.	-0.124939
-2.819173	only SSE).	-0.124939
-4.129425	the CPU-type	-0.124939
-3.654810	and communicating	-0.124939
-4.129425	the even-numbered	-0.124939
-2.842927	different meaning	-0.124939
-3.310495	= a<<(b+c)	-0.124939
-3.162394	code mixes	-0.124939
-2.567869	many encryption	-0.124939
-2.291517	I tried	-0.124939
-2.616466	float coef[16]	-0.124939
-3.527822	The fallacy	-0.124939
-3.121718	not referenced	-0.124939
-0.902998	EXCEPTION_FLT_OVERFLOW 0xC0000091L	-0.124939
-3.827713	a formalism.	-0.124939
-3.802463	of underflow:	-0.124939
-2.263387	; mark	-0.124939
-2.661281	into sub-vectors	-0.124939
-1.446834	compile-time polymorphism.	-0.124939
-4.129425	the __assume_aligned	-0.124939
-1.446834	compile-time polymorphism,	-0.124939
-1.963002	runtime polymorphism:	-0.124939
-0.602014	self-explaining menus	-0.124939
-2.703880	compilers (Microsoft,	-0.124939
-0.602014	e, f,	-0.124939
-3.795255	to date):	-0.124939
-2.439847	unsigned Examples:	-0.124939
-2.745331	cache MOVNTPS	-0.124939
-3.281894	it lacks	-0.124939
-3.449984	be arranged	-0.124939
-2.119789	calculate (c+d)	-0.124939
-3.850836	is pushed	-0.124939
-3.654810	and USB	-0.124939
-2.281732	its brand,	-0.124939
-2.254441	result (b+c)	-0.124939
-3.291104	or "frame	-0.124939
-1.715634	struct Sdouble	-0.124939
-2.092636	well tested,	-0.124939
-3.082131	int Sum3(S3	-0.124939
-1.446834	suitable duration.	-0.124939
-4.129425	the lifetime	-0.124939
-0.602014	&list[100]; temp++)	-0.124939
-2.894347	Example 14.13c	-0.124939
-0.602014	numbered consecutively?	-0.124939
-2.894347	Example 14.13a	-0.124939
-3.051374	may fill	-0.124939
-2.894347	Example 8.15b	-0.124939
-1.902212	AVX2 _mm_i64gather_pd	-0.124939
-3.654810	and finally	-0.124939
-1.777735	Such hybrid	-0.124939
-1.079043	list[100]; Func1(list,	-0.124939
-1.777828	whole workday	-0.124939
-3.433503	are instantiated	-0.124939
-2.811801	other flaws	-0.124939
-1.963187	char pointers).	-0.124939
-1.601736	map file"	-0.124939
-2.518327	return IntegerPower<10>(x);	-0.124939
-1.880213	were tested:	-0.124939
-3.654810	and analyzing	-0.124939
-2.719317	class S2	-0.124939
-2.719317	class S3	-0.124939
-2.650262	n.a. (a+b)+c	-0.124939
-3.310495	= {1.1,	-0.124939
-2.874792	data elements,	-0.124939
-0.902998	function: (static_cast<MyChild*>(this))->Disp();	-0.124939
-3.449984	be cross-	-0.124939
-1.680687	(e.g. GetLogicalProcessorInformation	-0.124939
-3.802463	of it).	-0.124939
-2.078027	quite substantial.	-0.124939
-3.310495	= A*x*x	-0.124939
-1.203935	(Gnu) AES,	-0.124939
-3.482454	that u	-0.124939
-1.556025	device driver.	-0.124939
-3.449984	be combined.	-0.124939
-3.654810	and disadvantages.	-0.124939
-2.544927	we loose	-0.124939
-0.602014	competition. Processors	-0.124939
-3.073074	than investing	-0.124939
-2.281732	its arguments.	-0.124939
-2.882274	at www.agner.org/optimize/asmlib.zip	-0.124939
-3.327981	// Round	-0.124939
-2.016063	complicated reductions.	-0.124939
-2.503028	very kludgy.	-0.124939
-1.715634	Linux, sched_setaffinity).	-0.124939
-2.544145	any answer.	-0.124939
-2.281732	its out-of-	-0.124939
-3.082131	int r1,	-0.124939
-1.556025	easy GUI	-0.124939
-3.654810	and fence	-0.124939
-0.602014	sign, eee	-0.124939
-0.602014	scalar (Scalar	-0.124939
-1.300799	handling. Omitting	-0.124939
-3.073074	than nine,	-0.124939
-0.602014	lea $B2$2:	-0.124939
-2.894347	Example 7.10b	-0.124939
-2.652005	+ d.y;	-0.124939
-2.894347	Example 7.10a	-0.124939
-3.802463	of randomness	-0.124939
-1.680825	priority thread,	-0.124939
-2.502706	software development",	-0.124939
-3.121718	not selected.	-0.124939
-3.082131	int BigArray[1024]	-0.124939
-2.016386	see shortly.	-0.124939
-2.815473	instruction latencies,	-0.124939
-0.602014	broader perspective	-0.124939
-2.498348	long latencies.	-0.124939
-0.602014	~, <<,	-0.124939
-3.082131	int FuncRow(int);	-0.124939
-3.327981	// versions:	-0.124939
-3.310495	= 1.0E8,	-0.124939
-3.654810	and 12.4c	-0.124939
-1.203981	Intel: "Intel®	-0.124939
-1.079043	seem illogical	-0.124939
-1.079089	Architecture Programmer’s	-0.124939
-0.902998	clumsy AND-OR	-0.124939
-3.310495	= (short	-0.124939
-1.962818	&& !b)	-0.124939
-2.612545	multiple versions,	-0.124939
-1.380026	embedded microcontrollers.	-0.124939
-1.943743	> -b	-0.124939
-2.016386	expression -a	-0.124939
-0.602014	PREFETCH _mm_prefetch	-0.124939
-1.962864	Table 12.4.	-0.124939
-1.079043	<typename MyChild>	-0.124939
-3.082131	int bb[size]	-0.124939
-0.902998	digital building	-0.124939
-3.433503	are universal,	-0.124939
-3.795255	to pool	-0.124939
-3.802463	of &list[100]	-0.124939
-3.327981	// Entry	-0.124939
-2.616466	float add_elements(__m128	-0.124939
-0.602014	void. Returning	-0.124939
-0.902998	i<n; ++i).	-0.124939
-3.082131	int NUMROWS	-0.124939
-3.310495	= (s0+s1)+(s2+s3);	-0.124939
-1.379980	Performance Primitives	-0.124939
-3.827713	a narrow	-0.124939
-2.894347	Example 12.4e.	-0.124939
-1.963002	runtime DLL's	-0.124939
-0.602014	17.9: "Moving	-0.124939
-2.181381	inside sqaure:	-0.124939
-3.310495	= &SelectAddMul_dispatch;	-0.124939
-0.602014	Locked mutexes.	-0.124939
-1.203935	mouse move.	-0.124939
-2.168831	/ nfac;	-0.124939
-3.517654	for detecting	-0.124939
-0.602014	conditional move,	-0.124939
-0.602014	Booth: "Inner	-0.124939
-0.602014	logically distinct	-0.124939
-3.310495	= (a&b)&(c&d)	-0.124939
-0.902998	Public License,	-0.124939
-2.813863	CPU core,	-0.124939
-2.016248	like string,	-0.124939
-0.602014	4.5 0.82	-0.124939
-0.602014	Print heading	-0.124939
-2.717714	page 60.	-0.124939
-1.715449	optimization. Prefetching	-0.124939
-1.981717	automatic vectorization,	-0.124939
-1.601644	current position.	-0.124939
-0.602014	0.77 0.89	-0.124939
-1.504780	iterations ahead.	-0.124939
-2.307969	& 15]	-0.124939
-1.203981	virus scanner	-0.124939
-2.867614	program logic.	-0.124939
-0.602014	2.11 ifunc	-0.124939
-2.824466	functions /Gr	-0.124939
-3.310495	= n∙(n-1)!.	-0.124939
-2.915311	} Transposing	-0.124939
-3.225895	by controlling	-0.124939
-3.654810	and shared_ptr.	-0.124939
-3.654810	and databases.	-0.124939
-2.458192	before trying	-0.124939
-3.827713	a multitasking	-0.124939
-2.032408	multiplication (27	-0.124939
-2.032408	multiplication (20	-0.124939
-3.310495	= (total	-0.124939
-2.894347	Example 8.5b	-0.124939
-2.406486	optimization /GL	-0.124939
-2.894347	Example 8.5a	-0.124939
-3.802463	of it)	-0.124939
-1.902443	dispatch strategies	-0.124939
-3.850836	is requested.	-0.124939
-3.850836	is delayed	-0.124939
-2.133123	i++) List[i]++;	-0.124939
-2.458192	before you.	-0.124939
-0.902998	14.27 assumes	-0.124939
-2.805661	point variable:	-0.124939
-0.902998	Addison-Wesley, 2003.	-0.124939
-3.850836	is assumed	-0.124939
-2.595343	C++ builder.	-0.124939
-0.602014	||, !	-0.124939
-2.341130	programming nowadays	-0.124939
-1.079043	7.28 Templates...............................................................................................................57	-0.124939
-1.556071	implement OneOrTwo5[b!=0]	-0.124939
-1.504965	hash maps	-0.124939
-1.601644	chapter 9.10,	-0.124939
-2.611948	two steps.	-0.124939
-1.601783	version. Updating	-0.124939
-3.291104	or -axAVX.	-0.124939
-3.225895	by inverting	-0.124939
-3.850836	is (int)(&list[100])	-0.124939
-3.310495	= _mm_or_si128(c2,	-0.124939
-1.680779	mov ebx,eax	-0.124939
-2.399969	template specialization.	-0.124939
-2.399969	template specialization,	-0.124939
-3.449984	be improved.	-0.124939
-3.654810	and Fortran.	-0.124939
-0.602014	Scott Meyers:	-0.124939
-3.310495	= (a1*b2	-0.124939
-3.121718	not evaluated,	-0.124939
-3.310495	= OneOrTwo5[b!=0];	-0.124939
-2.106240	: x(0)	-0.124939
-0.602014	vmlsExp4 vmldExp2	-0.124939
-3.449984	be rounded	-0.124939
-3.327981	// Truncation	-0.124939
-2.168831	/ 1.2345;	-0.124939
-3.663751	in duration	-0.124939
-1.379980	type-casting pointers:	-0.124939
-3.195606	with _mm.	-0.124939
-2.882274	at Wikibooks.	-0.124939
-1.747633	task switches;	-0.124939
-3.051374	may interleave	-0.124939
-1.379980	overlap. 27	-0.124939
-0.602014	1.21 0.57	-0.124939
-0.602014	__cpuid(dummy, 0);	-0.124939
-3.795255	to Eclipse	-0.124939
-3.195606	with real	-0.124939
-2.652005	+ B*x	-0.124939
-0.602014	Journal, 2002).	-0.124939
-4.129425	the "generate	-0.124939
-2.347850	always true/false	-0.124939
-3.121718	not detected	-0.124939
-1.079043	Big supercomputers	-0.124939
-3.225895	by initializing	-0.124939
-3.850836	is mirrored	-0.124939
-2.133123	i++) matrix[FuncRow(i)][FuncCol(i)]	-0.124939
-3.327981	// Implicit	-0.124939
-2.419436	often underestimate	-0.124939
-2.994258	this rule.	-0.124939
-1.601598	user. Installation	-0.124939
-3.433503	are undocumented.	-0.124939
-3.291104	or bitmap	-0.124939
-2.307969	& 0x7FFFFF)	-0.124939
-1.079043	Volume 2A	-0.124939
-3.827713	a valuable	-0.124939
-1.203935	C-style type-casting.	-0.124939
-2.874792	data bases,	-0.124939
-2.789259	which redirects	-0.124939
-2.106332	mode (SSE2):	-0.124939
-2.192062	cause severe	-0.124939
-1.963187	last member.	-0.124939
-2.616466	float vectors)	-0.124939
-0.602014	funda- mentally	-0.124939
-3.482454	that connect	-0.124939
-0.602014	responsi- bility	-0.124939
-1.504965	*= n+1;	-0.124939
-1.715772	(See Sutter:	-0.124939
-2.954036	more primitive,	-0.124939
-3.517654	for transposing	-0.124939
-2.832806	same computer,	-0.124939
-3.850836	is closest	-0.124939
-3.795255	to print	-0.124939
-4.129425	the evaluation	-0.124939
-3.517654	for foreground	-0.124939
-0.602014	(Linux only).	-0.124939
-3.795255	to obtain,	-0.124939
-3.654810	and investigated	-0.124939
-2.741712	integer power,	-0.124939
-3.245026	if (handle	-0.124939
-1.777458	v. 11.1	-0.124939
-3.327981	// Constructor	-0.124939
-1.203935	31 11.6	-0.124939
-0.602014	removable media	-0.124939
-3.527822	The benefits	-0.124939
-1.203935	33 11.8	-0.124939
-2.518327	return powN<(N	-0.124939
-4.129425	the bias	-0.124939
-3.850836	is utilized	-0.124939
-4.129425	the MKL	-0.124939
-0.602014	-263 263-1	-0.124939
-1.203935	F32vec4 F64vec2	-0.124939
-2.263986	assembly language",	-0.124939
-1.680964	OS X.	-0.124939
-1.923679	linking (remove	-0.124939
-2.244358	processor X"	-0.124939
-3.850836	is developing	-0.124939
-3.433503	are aligned,	-0.124939
-2.063580	write 2.0/3.0	-0.124939
-2.894347	Example 7.31b	-0.124939
-2.894347	Example 7.31a	-0.124939
-2.572330	* powN<true,N-N1>::p(x);	-0.124939
-1.380073	Common Language	-0.124939
-1.923540	becomes noticeable.	-0.124939
-2.513246	2 gigabytes	-0.124939
-1.962956	n >>=	-0.124939
-2.717714	page 103)	-0.124939
-3.827713	a learning	-0.124939
-2.894347	Example 7.43b.	-0.124939
-2.534207	some links.	-0.124939
-3.082131	int UnusedFiller;	-0.124939
-3.827713	a 50-50	-0.124939
-3.035733	you know).	-0.124939
-1.203935	detailed overview	-0.124939
-3.850836	is supposed	-0.124939
-2.894347	Example 14.4b	-0.124939
-2.894347	Example 15.1a.	-0.124939
-2.513246	2 Mbytes.	-0.124939
-3.663751	in interactive	-0.124939
-0.902998	2008 version).	-0.124939
-0.602014	ifbit=1 bitofn	-0.124939
-1.300799	ever happens.	-0.124939
-3.654810	and error-prone.	-0.124939
-0.902998	Wikipedia article	-0.124939
-2.611948	two entries.	-0.124939
-2.778593	but risky.	-0.124939
-1.300799	project built	-0.124939
-1.747772	class. Members	-0.124939
-4.129425	the majority	-0.124939
-3.357106	can build	-0.124939
-1.923679	linking (multithreaded)	-0.124939
-1.300891	((unsigned int)(i	-0.124939
-1.642944	rarely justifies	-0.124939
-2.894347	Example 8.13a	-0.124939
-2.894347	Example 8.13b	-0.124939
-0.902998	^, ~,	-0.124939
-4.129425	the wheel.	-0.124939
-3.795255	to come.	-0.124939
-2.180827	works (gcc	-0.124939
-4.129425	the self-explaining	-0.124939
-3.051374	may actively	-0.124939
-0.602014	4.4, 2.5};	-0.124939
-2.652005	+ a.x,	-0.124939
-3.795255	to weigh	-0.124939
-4.129425	the resource-hungry	-0.124939
-3.195606	with zero-bits	-0.124939
-2.419436	often reorganized	-0.124939
-1.504965	-1 (a&~b)|(~a&b)=a^b	-0.124939
-0.602014	62 __try	-0.124939
-3.517654	for minimizing	-0.124939
-3.663751	in relation	-0.124939
-3.310495	= b[r][c];	-0.124939
-3.195606	with enum,	-0.124939
-2.702043	example 8.21,	-0.124939
-2.894347	Example 14.15b	-0.124939
-1.962910	too fine	-0.124939
-0.602014	E-book Usability	-0.124939
-1.981717	automatic CPU-dispatching	-0.124939
-3.291104	or double)	-0.124939
-2.892568	from main,	-0.124939
-3.795255	to truly	-0.124939
-3.527822	The allocation,	-0.124939
-2.419436	often seen,	-0.124939
-0.602014	Classes (MFC).	-0.124939
-0.602014	strlen, sprintf,	-0.124939
-3.827713	a double:	-0.124939
-0.602014	(".type CriticalFunction,	-0.124939
-3.654810	and reorganize:	-0.124939
-3.654810	and convoluted	-0.124939
-2.652005	+ b[i];	-0.124939
-1.642991	e.g. .R.	-0.124939
-2.364011	without restrictions.	-0.124939
-3.850836	is copyrighted	-0.124939
-3.291104	or network.	-0.124939
-2.492447	< arraysize;	-0.124939
-3.795255	to express	-0.124939
-2.063488	both cheaper	-0.124939
-2.888228	memory re-allocation	-0.124939
-2.913667	then de-referenced	-0.124939
-2.078257	used. Web	-0.124939
-3.795255	to restart	-0.124939
-0.602014	DLL's (dynamically	-0.124939
-3.517654	for (j	-0.124939
-3.291104	or clearing	-0.124939
-3.517654	for auto_ptr.	-0.124939
-0.602014	-fno-alias Non-strict	-0.124939
-3.082131	int List[ArraySize];	-0.124939
-1.831862	my experiments.	-0.124939
-3.850836	is acceptable.	-0.124939
-3.310495	= *(++p)	-0.124939
-0.602014	Vec4q Vec4uq	-0.124939
-3.327981	// Modulo	-0.124939
-0.602014	suggested improvements).	-0.124939
-3.795255	to vectorize,	-0.124939
-3.802463	of doubles	-0.124939
-2.793431	If so,	-0.124939
-0.602014	7.43b. Compile-time	-0.124939
-4.129425	the weekdays.	-0.124939
-3.066220	compiler price	-0.124939
-3.291104	or PSDK).	-0.124939
-1.300799	__fastcall Noncached	-0.124939
-1.856917	range printf(Greek[n]);	-0.124939
-1.079089	modified. Unlike	-0.124939
-2.371619	user interface,	-0.124939
-1.642944	Library (MKL	-0.124939
-3.517654	for response.	-0.124939
-3.085097	an MFC	-0.124939
-3.121718	not supported");	-0.124939
-2.492585	branch misprediction,	-0.124939
-4.129425	the loader.	-0.124939
-2.119789	calculate (1./1.2345)	-0.124939
-3.310495	= array[++i]	-0.124939
-3.082131	int Func1(int	-0.124939
-0.902998	immediate responses	-0.124939
-3.433503	are offering	-0.124939
-2.191786	classes Programming	-0.124939
-3.827713	a First-In-Last-	-0.124939
-2.406440	code. Inserting	-0.124939
-3.310495	= (10000	-0.124939
-1.555979	away cpuid	-0.124939
-2.702043	example 16.2.	-0.124939
-0.602014	theory. Advice	-0.124939
-2.650262	n.a. a+a+a+a	-0.124939
-1.681149	DWORD PTR[ecx+eax*4],ebx	-0.124939
-0.902998	(32-bit mode):	-0.124939
-3.310495	= {1.0f,	-0.124939
-2.525000	so kludgy	-0.124939
-1.962818	three clauses:	-0.124939
-0.602014	occupied throughout	-0.124939
-2.681804	double x8	-0.124939
-0.902998	(This eliminates	-0.124939
-3.449984	be straightforward.	-0.124939
-3.654810	and create	-0.124939
-3.827713	a non-const	-0.124939
-3.225895	by dropping	-0.124939
-1.643222	Visual Studio.	-0.124939
-2.214012	AMD SSE4A	-0.124939
-3.291104	or friend	-0.124939
-3.251046	function inlining,	-0.124939
-2.608584	static linking,	-0.124939
-2.702043	example 12.2,	-0.124939
-3.850836	is unnecessarily	-0.124939
-3.850836	is caught	-0.124939
-0.602014	if-else structure),	-0.124939
-3.850836	is checked	-0.124939
-2.202919	+= a[i+1];	-0.124939
-3.245026	if (true)	-0.124939
-2.717714	page 107),	-0.124939
-2.717714	page 122)	-0.124939
-2.717714	page 62.	-0.124939
-1.902443	second source,	-0.124939
-2.717714	page 96.	-0.124939
-2.063211	was coded.	-0.124939
-2.180735	optimized further.	-0.124939
-3.327981	// Branch/loop	-0.124939
-3.827713	a key?	-0.124939
-2.572330	* sizeof(float));	-0.124939
-0.902998	a/1=a x-xxx-x--	-0.124939
-3.112512	- xx	-0.124939
-0.602014	unique key.	-0.124939
-1.601598	user. Menus,	-0.124939
-1.203935	default, conform	-0.124939
-2.572330	* sizeof(float)).	-0.124939
-3.827713	a password.	-0.124939
-0.602014	casting. Linked	-0.124939
-2.858815	vector classes):	-0.124939
-2.842927	different compilers.............................................................................	-0.124939
-2.544145	any transition	-0.124939
-1.204028	subtraction (3	-0.124939
-0.602014	obeyed. Copy	-0.124939
-1.777458	v. 10.1.020.	-0.124939
-0.902998	ISO/IEC TR18015	-0.124939
-3.850836	is happening.	-0.124939
-3.225895	by keys	-0.124939
-1.680918	overloaded assignment	-0.124939
-0.902998	7.33 Namespaces...........................................................................................................	-0.124939
-2.254533	versions alternatingly	-0.124939
-1.447019	d, e,	-0.124939
-0.602014	1.4, 2005.	-0.124939
-1.203935	defining _mm_malloc	-0.124939
-2.254256	calls exit(),	-0.124939
-1.880167	0, sizeof(list));	-0.124939
-1.203935	Small hand-held	-0.124939
-2.572330	* a;}	-0.124939
-2.595343	C++ 5.82	-0.124939
-3.827713	a year	-0.124939
-0.602014	series: ex	-0.124939
-0.602014	1./120., 1./720.,	-0.124939
-0.602014	latencies, throughputs	-0.124939
-1.203935	i_div_3; for(i=i_div_3=0;	-0.124939
-1.446973	Template meta-	-0.124939
-0.902998	goto CFALSE;	-0.124939
-3.011997	{ CFALSE:	-0.124939
-1.715495	registers. Except	-0.124939
-0.602014	optimize/#vectorclass Include	-0.124939
-1.203981	kept entirely	-0.124939
-3.281894	it (&ArraySize)	-0.124939
-0.902998	ASCII form.	-0.124939
-2.922430	will cut	-0.124939
-2.063211	was developed.	-0.124939
-2.565062	clock frequency,	-0.124939
-3.827713	a lineage	-0.124939
-2.470506	faster nor	-0.124939
-2.518327	return x*x	-0.124939
-1.902258	own error-handling	-0.124939
-1.715587	clear correspondence	-0.124939
-3.433503	are areas	-0.124939
-3.051374	may occasionally	-0.124939
-2.253887	calculations forms	-0.124939
-0.602014	modularity, reusability	-0.124939
-2.717714	page 141.	-0.124939
-3.195606	with First-In-First-Out	-0.124939
-2.503028	very old-fashioned.	-0.124939
-2.518327	return n;}	-0.124939
-2.032915	replace u[1]	-0.124939
-2.534207	some indication	-0.124939
-1.300845	operation. x*8	-0.124939
-2.954036	more complicated.	-0.124939
-2.888228	memory block,	-0.124939
-0.602014	(doubly ended	-0.124939
-1.805718	Windows, -msse2,	-0.124939
-4.129425	the strongest	-0.124939
-3.795255	to remember	-0.124939
-1.504826	file. Keep	-0.124939
-1.446881	disk. Memory-hungry	-0.124939
-4.129425	the sizeof	-0.124939
-3.827713	a server	-0.124939
-2.789259	which affects	-0.124939
-1.902443	dispatch decision	-0.124939
-0.602014	0.63 0.75	-0.124939
-2.513246	2 0.77	-0.124939
-1.379934	chain. Nothing	-0.124939
-2.399969	template <bool	-0.124939
-3.827713	a staircase	-0.124939
-3.527822	The ?:	-0.124939
-2.888228	memory economy	-0.124939
-0.602014	1./6.22702E9, 1./8.71782E10,	-0.124939
-2.518327	return ipow(x,10);	-0.124939
-0.902998	shuffling, packing,	-0.124939
-0.602014	2003. Contains	-0.124939
-3.085097	an attribute	-0.124939
-1.446881	free E-book	-0.124939
-3.310495	= (int)d;	-0.124939
-1.962864	Table 7.2.	-0.124939
-0.602014	s0, s1,	-0.124939
-1.300845	Functions _intel_fast_memcpy	-0.124939
-0.902998	windows, graphic	-0.124939
-3.517654	for vectors........................................................................	-0.124939
-2.894347	Example 9.1a	-0.124939
-2.894347	Example 9.1b	-0.124939
-3.082131	int absvalue,	-0.124939
-2.340669	file mathimf.h	-0.124939
-4.129425	the GetTickCount	-0.124939
-1.902674	XMM registers;	-0.124939
-2.406578	libraries (.lib	-0.124939
-3.827713	a 2'nd	-0.124939
-3.802463	of verifying,	-0.124939
-3.527822	The lesson	-0.124939
-0.602014	Sequential forward	-0.124939
-4.129425	the original,	-0.124939
-3.795255	to translate	-0.124939
-0.602014	experience. Occasionally,	-0.124939
-1.601598	significant improvements.	-0.124939
-3.310495	= absvalue;	-0.124939
-1.831816	section position-independent,	-0.124939
-3.827713	a conditional	-0.124939
-1.962864	Table 9.1.	-0.124939
-2.282055	stack frame,	-0.124939
-2.282055	stack frame"	-0.124939
-1.446834	127 int8_t	-0.124939
-1.446834	p. 104).	-0.124939
-0.602014	Vec2d Vec8f	-0.124939
-0.602014	Vec16us Vec8i	-0.124939
-0.602014	x^2, x^3,	-0.124939
-2.048386	dispatching ....................................................................................	-0.124939
-1.643083	? 1.5f	-0.124939
-0.602014	/GL --combine	-0.124939
-1.902212	AVX2 _mm256_i32gather_epi32	-0.124939
-1.203935	aliasing. __declspec(noalias)	-0.124939
-2.273049	CPUs unequally	-0.124939
-1.962910	| (b&c)	-0.124939
-4.129425	the .exe	-0.124939
-1.446927	2: printf("Gamma");	-0.124939
-2.499179	order polynomial:	-0.124939
-3.225895	by commas.	-0.124939
-3.654810	and popped	-0.124939
-2.502706	software engineering	-0.124939
-3.827713	a polynomial.	-0.124939
-4.129425	the burdensome	-0.124939
-0.902998	Free trial	-0.124939
-1.923540	becomes contiguous.	-0.124939
-2.378780	registers .................................................................	-0.124939
-2.032454	typically thinks	-0.124939
-2.572330	* _mm_load_ps(coef+i);	-0.124939
-1.902535	later maintenance.	-0.124939
-3.802463	of downloaded	-0.124939
-2.518327	return (*CriticalFunction)(parm1,	-0.124939
-2.168831	/ b1;	-0.124939
-2.882274	at explaining	-0.124939
-2.078488	count (ArraySize)	-0.124939
-3.449984	be prepared	-0.124939
-1.715403	so-called iterators	-0.124939
-1.923355	actually implies	-0.124939
-3.073074	than 200.	-0.124939
-0.602014	jeopardizing safety,	-0.124939
-1.379934	x-xxxx--x (a|b)&(a|c)	-0.124939
-0.602014	4.1.0, 2006	-0.124939
-1.504780	4, 2007	-0.124939
-0.602014	© 2004	-0.124939
-2.717714	page 53).	-0.124939
-3.827713	a ready-made	-0.124939
-3.327981	// Detect	-0.124939
-3.153114	as GetPrivateProfileString	-0.124939
-1.902489	calling vector::reserve	-0.124939
-1.923586	public CParent<CChild2>	-0.124939
-2.544191	variable storage.............................................................................	-0.124939
-3.795255	to query	-0.124939
-2.894347	Example 7.33b	-0.124939
-3.654810	and delete).	-0.124939
-3.795255	to compose	-0.124939
-3.251046	function prototype:	-0.124939
-2.888228	memory fragmentation.	-0.124939
-3.850836	is OK,	-0.124939
-3.153114	as sorting,	-0.124939
-0.902998	y1, y2;	-0.124939
-2.192062	cause fatal	-0.124939
-4.129425	the scarcity	-0.124939
-1.203935	obsolete. Rick	-0.124939
-2.745331	cache MOVNTI	-0.124939
-2.559809	value -100+100+100	-0.124939
-3.654810	and classes............................................................................................	-0.124939
-2.568605	array initializer	-0.124939
-0.602014	cryptography (www.intel.com).	-0.124939
-0.602014	1./40320., 1./362880.,	-0.124939
-3.449984	be non-zero,	-0.124939
-1.856732	constructor initializes	-0.124939
-3.291104	or g(x)	-0.124939
-3.035733	you discover	-0.124939
-1.923632	give -2.0	-0.124939
-2.717714	page 93).	-0.124939
-3.162394	code (release	-0.124939
-3.172431	on publicly	-0.124939
-1.446973	highly optimized,	-0.124939
-3.827713	a discrete	-0.124939
-0.602014	/Qopt-report -opt-report	-0.124939
-2.191509	been unsatisfied	-0.124939
-3.121718	not safe,	-0.124939
-2.282055	stack entries	-0.124939
-4.129425	the other,	-0.124939
-3.291104	or seemingly	-0.124939
-3.085097	an imported	-0.124939
-3.121718	not standardized.	-0.124939
-3.795255	to compensate	-0.124939
-0.902998	test, maintain	-0.124939
-2.016248	like sin.	-0.124939
-0.602014	exp, sin,	-0.124939
-4.129425	the rightmost	-0.124939
-2.234190	language ...............................................................................	-0.124939
-4.129425	the insertion	-0.124939
-2.874792	data object:	-0.124939
-2.254533	versions instead.	-0.124939
-3.449984	be saved.	-0.124939
-3.310495	= _mm_andnot_si128(mask,	-0.124939
-2.894347	Example 8.11b	-0.124939
-2.894347	Example 8.11a	-0.124939
-0.902998	IDE's (Integrated	-0.124939
-3.850836	is opposite).	-0.124939
-0.902998	AMD: "AMD64	-0.124939
-3.310495	= 0x20,	-0.124939
-3.011997	{ DTRUE:	-0.124939
-0.902998	goto DTRUE;	-0.124939
-3.795255	to exchange	-0.124939
-2.789259	which supposedly	-0.124939
-1.880121	binary digits.	-0.124939
-2.717714	page 44.	-0.124939
-1.601598	significant digits,	-0.124939
-1.680779	libraries. www.agner.org/optimize/#vectorclass	-0.124939
-3.245026	if (absvalue	-0.124939
-2.702043	example 8.23b	-0.124939
-1.943651	|| defined(__GNUC__)	-0.124939
-2.223786	common excuse	-0.124939
-1.079043	FUNCNAME SelectAddMul_SSE2	-0.124939
-1.601644	course system-specific.	-0.124939
-1.680733	needed. Predictable	-0.124939
-1.805487	size. Later	-0.124939
-2.703880	compilers www.agner.org/	-0.124939
-1.300845	physical factors.	-0.124939
-1.777551	>= operators).	-0.124939
-1.943651	|| (b&&c)	-0.124939
-2.446280	0; 35	-0.124939
-2.915311	} 34	-0.124939
-0.602014	characters '?',	-0.124939
-0.602014	0.3, -2.0,	-0.124939
-1.504919	(unsigned int)a	-0.124939
-3.517654	for holding	-0.124939
-1.999168	separate module,	-0.124939
-1.203935	processing. Running	-0.124939
-4.129425	the hint,	-0.124939
-3.172431	on system-specific	-0.124939
-3.310495	= instrset_detect();	-0.124939
-3.291104	or remotely.	-0.124939
-2.703512	most distributions	-0.124939
-0.602014	"Inner Loops:	-0.124939
-3.482454	that "we	-0.124939
-2.930452	A little-known	-0.124939
-0.602014	Adolfy Hoisie:	-0.124939
-2.695975	using InstructionSet():	-0.124939
-0.602014	Encryption, decryption,	-0.124939
-1.831862	my crystal	-0.124939
-0.902998	millisecond resolution.	-0.124939
-1.379934	completely absent	-0.124939
-0.902998	PC. Similarly,	-0.124939
-2.832806	same name,	-0.124939
-2.234420	element (approximately):	-0.124939
-3.850836	is reset	-0.124939
-3.827713	a disassembly,	-0.124939
-1.079043	natural ordering?	-0.124939
-3.172431	on arranging	-0.124939
-2.406486	optimization hints	-0.124939
-2.016017	their functionality.	-0.124939
-3.654810	and decoded	-0.124939
-1.079043	---xx---- a<<b<<c=a<<(b+c)	-0.124939
-2.544191	variable __intel_cpu_feature_indicator	-0.124939
-0.602014	Journal Vol.	-0.124939
-0.902998	remain unchanged.	-0.124939
-3.850836	is unchanged,	-0.124939
-3.827713	a tag	-0.124939
-0.602014	6); Or,	-0.124939
-0.602014	included. Combining	-0.124939
-3.795255	to fetch	-0.124939
-3.517654	for (c1	-0.124939
-3.850836	is reserved	-0.124939
-3.827713	a balanced	-0.124939
-2.078027	quite convenient.	-0.124939
-2.308660	processors (when	-0.124939
-2.364011	without effectively	-0.124939
-1.504873	throw() specification.	-0.124939
-1.962910	too high.	-0.124939
-2.729123	no native	-0.124939
-3.073074	than rendering	-0.124939
-3.827713	a First-In-First-	-0.124939
-2.745331	cache line:	-0.124939
-3.122826	This dilemma	-0.124939
-3.310495	= (b*c)/d,	-0.124939
-2.922430	will propagate	-0.124939
-0.902998	perfectly varies	-0.124939
-2.745331	cache line,	-0.124939
-1.902443	interface framework...........................................................................	-0.124939
-1.962956	n floats:	-0.124939
-2.498348	long timediff[NumberOfTests];	-0.124939
-2.156931	four floats.	-0.124939
-1.446834	p. 22).	-0.124939
-3.327981	// Place	-0.124939
-1.963187	char abc;	-0.124939
-3.850836	is ambiguous	-0.124939
-2.813863	CPU models.	-0.124939
-0.902998	123 correspond	-0.124939
-2.819173	only _mm_permutevar_ps	-0.124939
-2.894347	Example 7.38b.	-0.124939
-3.310495	= Y;	-0.124939
-3.082131	int BigArray[1024];	-0.124939
-3.654810	and 3B.	-0.124939
-2.341130	following steps	-0.124939
-3.073074	than looping	-0.124939
-3.310495	= Func1(2);	-0.124939
-0.602014	mentally flawed	-0.124939
-1.880398	know about.	-0.124939
-0.602014	nagging pop-up	-0.124939
-3.517654	for signifying	-0.124939
-2.432723	register renaming.	-0.124939
-1.203981	me manually,	-0.124939
-1.981440	#include <malloc.h>	-0.124939
-4.129425	the spell	-0.124939
-2.347850	always sequential,	-0.124939
-3.310495	= &list[0];	-0.124939
-0.902998	NAN (Not	-0.124939
-2.534207	some day	-0.124939
-2.272911	extra complications.	-0.124939
-1.079043	At least,	-0.124939
-2.679119	Intel VTune	-0.124939
-1.642944	Library __vrs4_expf	-0.124939
-3.112512	- x-xx----x	-0.124939
-1.805625	profiler identifies	-0.124939
-0.902998	a/a=1 --------x	-0.124939
-3.310495	= a|(b&c)	-0.124939
-2.445312	8 double's	-0.124939
-1.923725	linked lists.	-0.124939
-0.602014	initializer lists,	-0.124939
-3.449984	be programmed	-0.124939
-2.776029	used most.	-0.124939
-3.327981	// Print	-0.124939
-1.079043	Microprocessor designers	-0.124939
-1.446881	core. Try	-0.124939
-2.458699	stored contiguously	-0.124939
-2.894347	Example 8.1b	-0.124939
-3.327981	// x,y	-0.124939
-2.894347	Example 8.1a	-0.124939
-3.291104	or -fno-strict-overflow.	-0.124939
-0.902998	re- usable	-0.124939
-3.327981	// Reset	-0.124939
-4.129425	the likelihood	-0.124939
-3.827713	a fixed-size	-0.124939
-2.016386	implementation dependent.	-0.124939
-0.902998	Remove right-most	-0.124939
-0.602014	(a&~b)|(~a&b)=a^b ---------	-0.124939
-2.717714	page 143.	-0.124939
-3.654810	and ||).	-0.124939
-3.795255	to facilitate	-0.124939
-0.602014	3.1, 2007.	-0.124939
-2.894347	Example 12.9b.	-0.124939
-2.458146	address esp+8	-0.124939
-1.902535	together ......................................	-0.124939
-2.888228	memory economy,	-0.124939
-1.300799	updated lately.	-0.124939
-3.802463	of structures:	-0.124939
-2.273188	accessed row-wise,	-0.124939
-3.827713	a lookup-table	-0.124939
-3.449984	be reached	-0.124939
-2.894347	Example 8.16	-0.124939
-0.602014	Windows: __rdtsc()).	-0.124939
-2.508423	table (GOT).	-0.124939
-2.894347	Example 8.17	-0.124939
-2.894347	Example 8.18	-0.124939
-1.680779	access. Sequential	-0.124939
-2.510129	You may,	-0.124939
-3.802463	of multithreading.	-0.124939
-2.894347	Example 7.43	-0.124939
-2.894347	Example 7.42	-0.124939
-3.449984	be renewed.	-0.124939
-2.894347	Example 7.45	-0.124939
-2.894347	Example 7.44	-0.124939
-2.894347	Example 7.4.	-0.124939
-1.379934	Is16vec8 two(2,2,2,2,2,2,2,2);	-0.124939
-3.291104	or while-loop	-0.124939
-2.406440	code. Compiled	-0.124939
-3.802463	of 1/n!	-0.124939
-1.880351	512 378.7	-0.124939
-3.850836	is counting	-0.124939
-0.602014	fprintf(stderr, "\nError:	-0.124939
-1.203935	underflow neutralize	-0.124939
-0.902998	2.0f; x.i	-0.124939
-0.602014	2056 38.1	-0.124939
-0.602014	2040 38.7	-0.124939
-2.016202	allows "__attribute__((visibility("hidden")))".	-0.124939
-3.654810	and animations	-0.124939
-0.602014	Reference Manual".	-0.124939
-3.517654	for demonstration	-0.124939
-1.923586	graphics cards,	-0.124939
-0.602014	1./4.790016E8, 1./6.22702E9,	-0.124939
-3.225895	by hand	-0.124939
-1.902258	own caller,	-0.124939
-1.300845	16, 32,	-0.124939
-2.191832	line provokes	-0.124939
-1.902443	second operand.	-0.124939
-2.854291	make memory-hungry	-0.124939
-3.850836	is provoked	-0.124939
-3.310495	= 32;	-0.124939
-3.310495	= WhateverFunction(i);	-0.124939
-3.433503	are unavoidable.	-0.124939
-1.079043	-fno-pic apparently	-0.124939
-3.827713	a DLL.	-0.124939
-0.602014	MOVNTPS _mm_stream_ps	-0.124939
-3.795255	to perform	-0.124939
-2.263387	; mark_end;	-0.124939
-2.717714	page 96).	-0.124939
-3.327981	// x^1,	-0.124939
-2.203058	option -ftrapv,	-0.124939
-2.629046	library libircmt.lib.	-0.124939
-2.168831	/ 1.2345);	-0.124939
-3.850836	is repetitive.	-0.124939
-1.856870	switch (n)	-0.124939
-2.975045	time consumers.	-0.124939
-3.827713	a request	-0.124939
-3.802463	of N:	-0.124939
-3.011997	{ "Alpha",	-0.124939
-2.513246	2 (be	-0.124939
-3.850836	is artificially	-0.124939
-3.291104	or animation.	-0.124939
-3.654810	and cons	-0.124939
-2.092498	certain tolerance.	-0.124939
-0.602014	0x3700, 0x3F00	-0.124939
-2.572330	* 17is	-0.124939
-2.281732	its reputation.	-0.124939
-2.426704	64 Iu8vec8	-0.124939
-1.555979	aliasing (/Oa).	-0.124939
-0.602014	maximum, saturated	-0.124939
-2.439294	bit offsets).	-0.124939
-1.680779	mov $B1$2:	-0.124939
-2.203196	good knowledge	-0.124939
-1.379934	1.0; list[i].b	-0.124939
-2.340669	file stub.	-0.124939
-3.291104	or Espresso)	-0.124939
-1.556025	Optimization Guide	-0.124939
-3.654810	and "Integrated	-0.124939
-3.795255	to T+6,	-0.124939
-2.534207	some examples:	-0.124939
-2.263848	must bear	-0.124939
-1.880351	Here, log(2.0)	-0.124939
-3.112512	- andnot(a,a)	-0.124939
-2.894347	Example 12.8a.	-0.124939
-0.602014	&= 0x7FFFFFFF;	-0.124939
-0.602014	-msse2, -mavx,	-0.124939
-3.291104	or "__attribute__((visibility	-0.124939
-1.680779	mov 2:8+esp	-0.124939
-0.602014	_mm_blendv_epi8(bc, c2,	-0.124939
-2.063534	optimize anything,	-0.124939
-2.565062	clock pulses	-0.124939
-2.371297	systems disappears	-0.124939
-1.556025	search requests	-0.124939
-2.445773	less reliable.	-0.124939
-1.680733	possible. Typically	-0.124939
-3.225895	by constructing	-0.124939
-0.902998	allowed. Non-public	-0.124939
-2.513246	2 GB,	-0.124939
-1.680687	Any writable	-0.124939
-0.602014	c1, c2;	-0.124939
-3.291104	or inttypes.h	-0.124939
-2.892568	from attempting	-0.124939
-1.079043	takes. Debugging.	-0.124939
-2.695975	using indexes,	-0.124939
-0.902998	-(-a)=a --xxxxxx-	-0.124939
-4.129425	the preprocessor	-0.124939
-2.244358	processor enters	-0.124939
-0.602014	4.5.2, July	-0.124939
-2.616466	float lookup[2]	-0.124939
-0.602014	-156. Surprisingly,	-0.124939
-2.587401	efficient because,	-0.124939
-2.364011	without jeopardizing	-0.124939
-3.482454	that draws	-0.124939
-3.654810	and UNIX	-0.124939
-2.192108	AVX _mm256_permutevar_ps	-0.124939
-3.654810	and '$'	-0.124939
-1.963187	examples exist.	-0.124939
-1.962864	Table 9.3.	-0.124939
-2.776029	used freely	-0.124939
-2.364011	without taking	-0.124939
-1.446927	absolute values:	-0.124939
-0.602014	paralleli- zation	-0.124939
-2.682539	size (4096).	-0.124939
-1.300799	Development process......................................................................................................	-0.124939
-0.902998	G values,	-0.124939
-0.602014	minimum, maximum,	-0.124939
-2.357288	useful discussions	-0.124939
-3.827713	a #define,	-0.124939
-0.602014	status: _fpreset();	-0.124939
-1.880213	were observed	-0.124939
-2.702043	example 15.1d	-0.124939
-2.502706	software package,	-0.124939
-1.079043	allocating piecewise	-0.124939
-2.741712	integer division:	-0.124939
-1.805810	columns unused.	-0.124939
-0.902998	Sab ab[size];	-0.124939
-3.357106	can steal	-0.124939
-3.310495	= array[i++]	-0.124939
-3.654810	and FPGAs.	-0.124939
-2.202919	+= x^n/n!	-0.124939
-1.079043	File access................................................................................................................	-0.124939
-3.795255	to OMF	-0.124939
-1.681010	fit nicely	-0.124939
-2.378550	test finishes	-0.124939
-2.608584	static keyword:	-0.124939
-2.608584	static keyword,	-0.124939
-0.602014	0.5ns. 2GHz	-0.124939
-3.654810	and cryptography	-0.124939
-4.129425	the Professional	-0.124939
-2.432723	register keyword.	-0.124939
-3.153114	as strcpy,	-0.124939
-2.894347	Example 7.35b	-0.124939
-2.894347	Example 7.35a	-0.124939
-3.827713	a plain	-0.124939
-0.902998	<xmmintrin.h> _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);	-0.124939
-0.602014	MOVNTI _mm_stream_si32	-0.124939
-2.513246	2 GB.	-0.124939
-3.850836	is advisable	-0.124939
-0.602014	imple- mentations	-0.124939
-1.902212	AVX2 _mm_i32gather_ps	-0.124939
-3.310495	= 100000001.23456.	-0.124939
-2.894347	Example 9.6b	-0.124939
-1.715634	non-Intel processors).	-0.124939
-3.449984	be weighed	-0.124939
-1.642991	e.g. /arch:SSE2.	-0.124939
-0.602014	5, 2009).	-0.124939
-0.902998	inserted UnusedFiller	-0.124939
-2.224016	several flaws:	-0.124939
-0.902998	7.5 Booleans...................................................................................................................	-0.124939
-1.999445	mathematical notion	-0.124939
-1.079089	(Intel Atom).	-0.124939
-2.426704	64 kbytes.	-0.124939
-0.602014	(Microsoft, Intel)	-0.124939
-2.224063	single comparison:	-0.124939
-2.892568	from Intel.	-0.124939
-1.079043	conventions. FreeBSD	-0.124939
-2.508884	performance somewhat.	-0.124939
-2.815473	instruction set:	-0.124939
-2.348587	void SelectAddMul_dispatch(short	-0.124939
-1.981440	#include <intrin.h>	-0.124939
-3.449984	be reduced.	-0.124939
-0.602014	NAN. Avoiding	-0.124939
-2.290871	error reporting	-0.124939
-3.195606	with profiling,	-0.124939
-3.802463	of profiling.	-0.124939
-2.559486	objects numbered	-0.124939
-0.902998	planning phase	-0.124939
-2.168831	/ (b1	-0.124939
-2.595343	C++ builder	-0.124939
-0.902998	strides. Uncached	-0.124939
-4.129425	the responsi-	-0.124939
-0.602014	hasn't thought	-0.124939
-2.629046	library (SVML).	-0.124939
-3.112512	- xx(-)x-	-0.124939
-2.894347	Example 8.23a.	-0.124939
-3.433503	are indexed	-0.124939
-2.778593	but event-counters	-0.124939
-2.078027	quite often.	-0.124939
-2.371297	systems Microcontrollers	-0.124939
-3.827713	a genuine	-0.124939
-3.795255	to calculate.	-0.124939
-1.300799	programming, modularity,	-0.124939
-3.802463	of modularity.	-0.124939
-3.795255	to localize	-0.124939
-3.795255	to optimize,	-0.124939
-1.715495	it. Instead	-0.124939
-1.079043	spot. Repeating	-0.124939
-1.203935	parallel. Fine-grained	-0.124939
-0.602014	shell script.	-0.124939
-1.446834	p. 28)	-0.124939
-2.595113	also work,	-0.124939
-1.379980	column 28,	-0.124939
-2.191509	been calculated.	-0.124939
-4.129425	the "FDIV	-0.124939
-1.777458	v. 3.1,	-0.124939
-3.162394	code slower,	-0.124939
-4.129425	the reciprocal:	-0.124939
-3.291104	or __attribute__((fastcall)).	-0.124939
-3.082131	int cc[size]	-0.124939
-0.602014	\n fistpl	-0.124939
-0.602014	MKL, VML	-0.124939
-0.602014	Sandy Bridge)	-0.124939
-1.446834	MMX mmintrin.h	-0.124939
-2.498348	long long,	-0.124939
-2.032592	microprocessor wastes	-0.124939
-3.850836	is inexact	-0.124939
-2.811801	other odd-sized	-0.124939
-0.602014	1./39916800., 1./4.790016E8,	-0.124939
-3.827713	a thread-like	-0.124939
-3.795255	to T+5,	-0.124939
-3.654810	and Itanium	-0.124939
-1.805625	1. Relocation.	-0.124939
-2.813863	CPU brands,	-0.124939
-1.203981	513 2056	-0.124939
-2.492447	< NUMROWS;	-0.124939
-0.602014	tables: Lists	-0.124939
-2.915311	} module2.cpp	-0.124939
-2.894347	Example 12.8b.	-0.124939
-0.602014	0.35 0.29	-0.124939
-2.894347	Example 14.18c	-0.124939
-1.643129	SSE xmmintrin.h	-0.124939
-1.680825	square blocking:	-0.124939
-0.602014	0.59 0.27	-0.124939
-0.902998	0.28 0.22	-0.124939
-1.962910	| ((C	-0.124939
-1.962910	| ((B	-0.124939
-3.310495	= Func(a[i]);	-0.124939
-2.867614	program creates	-0.124939
-2.341130	following work-around	-0.124939
-2.078581	intermediate results,	-0.124939
-3.073074	than log)	-0.124939
-0.902998	x-xxx---- a*b*c=a*(b*c)	-0.124939
-2.544145	any non-vector	-0.124939
-3.449984	be recycled?	-0.124939
-3.195606	with carry)	-0.124939
-2.032638	Mac OS.	-0.124939
-3.527822	The FactorialTable	-0.124939
-1.856778	i, timediff[i]);	-0.124939
-1.963187	programmer can.	-0.124939
-0.602014	DEC, JNZ).	-0.124939
-2.016756	dependency chain,	-0.124939
-1.963095	addition to)	-0.124939
-0.602014	----x---- !(a<b)=(a>=b)	-0.124939
-2.717714	page 134.	-0.124939
-1.203981	9.3 shows,	-0.124939
-3.281894	it feeds	-0.124939
-1.943743	> 1.0)	-0.124939
-1.504780	general improvements	-0.124939
-2.191509	been introduced	-0.124939
-1.446927	do. Hence,	-0.124939
-0.902998	(called x86)	-0.124939
-2.894347	Example 8.2a	-0.124939
-2.894347	Example 8.2b	-0.124939
-2.254256	calls alternately	-0.124939
-1.079043	service routines	-0.124939
-3.251046	function billions	-0.124939
-1.079043	K8 1.09	-0.124939
-3.153114	as x4∙xn-4.	-0.124939
-2.717714	page 103),	-0.124939
-1.832093	around 1980	-0.124939
-2.702043	example 14.7b,	-0.124939
-2.894347	Example 14.7b.	-0.124939
-0.602014	year. Ignoring	-0.124939
-3.121718	not affected	-0.124939
-0.602014	3.; x.d	-0.124939
-1.923494	x; x.f	-0.124939
-2.894347	Example 7.9b	-0.124939
-2.894347	Example 7.9a	-0.124939
-2.092636	simply identical.	-0.124939
-1.379980	occurs somewhere	-0.124939
-1.962818	&& !b	-0.124939
-2.888228	memory bus	-0.124939
-1.203981	Out (FIFO)	-0.124939
-2.341130	programming practice,	-0.124939
-2.702043	example 8.24	-0.124939
-2.894347	Example 8.25	-0.124939
-2.340669	file disassembler.	-0.124939
-2.132431	operators &&,	-0.124939
-2.894347	Example 8.20	-0.124939
-3.153114	as (critical	-0.124939
-2.894347	Example 8.22	-0.124939
-2.503028	very smart.	-0.124939
-2.894347	Example 12.9a.	-0.124939
-3.517654	for SSE2,	-0.124939
-2.652005	+ r.b;}	-0.124939
-1.715495	operations. Multiplying	-0.124939
-4.129425	the post-increment	-0.124939
-2.281778	about bugs,	-0.124939
-3.291104	or C1::f.	-0.124939
-0.602014	F64vec2 F32vec8	-0.124939
-0.602014	Dispatcher. Will	-0.124939
-1.203935	against overkill.	-0.124939
-2.894347	Example 8.3b	-0.124939
-3.085097	an ordinary	-0.124939
-2.364011	without paying	-0.124939
-0.602014	Technology Journal	-0.124939
-3.310495	= -1.0E8,	-0.124939
-0.902998	slight imprecision	-0.124939
-2.244358	doesn't provide	-0.124939
-2.333023	operations in-between	-0.124939
-2.544191	variable __intel_cpu_feature_indicator_x.	-0.124939
-1.747541	save recovery	-0.124939
-1.642944	Library (WTL).	-0.124939
-0.602014	2.7, 2.8.	-0.124939
-2.439294	bit indicates	-0.124939
-0.902998	0x20; 46	-0.124939
-1.642944	Library (WTL):	-0.124939
-1.300799	mispredictions. 44	-0.124939
-1.446834	reliable decision.	-0.124939
-0.902998	systems). 42	-0.124939
-1.203935	place indicated	-0.124939
-0.602014	PC's, workstations	-0.124939
-0.602014	i++,i2+=2.0f)a[i]=i2; 41	-0.124939
-0.902998	2048 230.7	-0.124939
-2.224385	much faster,	-0.124939
-1.643083	? (cc[i]	-0.124939
-2.544191	variable 85	-0.124939
-2.867614	program /Qipo	-0.124939
-0.902998	tedious indeed.	-0.124939
-2.032915	1; a[1]	-0.124939
-1.880213	<< 6);	-0.124939
-3.517654	for hackers.	-0.124939
-3.850836	is exact.	-0.124939
-3.517654	for (row	-0.124939
-2.371619	user friendly.	-0.124939
-1.446927	poorly predictable,	-0.124939
-1.902443	interface (OnIdle	-0.124939
-2.419436	often abusing	-0.124939
-2.272911	function. Leaf	-0.124939
-0.602014	"Gamma", "Delta"	-0.124939
-3.310495	= 0x10,	-0.124939
-1.300845	Functions ................................................................................................................	-0.124939
-3.291104	or C2::Disp()	-0.124939
-2.661281	into sleep	-0.124939
-0.602014	Single-Instruction-Multiple-Data (SIMD)	-0.124939
-3.085097	an if-else	-0.124939
-1.601829	CriticalFunction ();	-0.124939
-3.433503	are feeding	-0.124939
-0.902998	a/a=1 ----x---x	-0.124939
-3.482454	that violate	-0.124939
-2.717714	page 34.	-0.124939
-0.602014	0.6 1.19	-0.124939
-0.902998	x--xx---- (a&&b)||(a&&!b)=a	-0.124939
-1.747587	special mathe-	-0.124939
-3.153114	as VHDL	-0.124939
-3.449984	be postponed	-0.124939
-1.805810	gives rise	-0.124939
-3.082131	int u[2]}	-0.124939
-0.902998	web browsing	-0.124939
-2.803414	loop counter:	-0.124939
-0.602014	{1.1, 0.3,	-0.124939
-1.923494	processors. Henry	-0.124939
-2.544927	we encounter	-0.124939
-1.999353	cache. Bit-fields	-0.124939
-3.433503	are unacceptable.	-0.124939
-2.016017	their live-ranges	-0.124939
-0.902998	0.40 0.30	-0.124939
-3.327981	// Serialize	-0.124939
-1.556025	Optimization Reference	-0.124939
-2.975045	time consuming,	-0.124939
-1.715772	compatibility problems,	-0.124939
-2.508423	table (GOT)	-0.124939
-1.079043	K8 0.38	-0.124939
-0.602014	(a+c==b+c)=(a==b) ----x----	-0.124939
-3.850836	is created,	-0.124939
-0.602014	matrix[SIZE][SIZE]; transpose(matrix);	-0.124939
-2.745331	cache contention.	-0.124939
-2.518327	return powN<(N1&(N1-1))==0,N1>::p(x)	-0.124939
-3.517654	for (r1	-0.124939
-3.225895	by wrapping	-0.124939
-3.449984	be omitted,	-0.124939
-1.203935	{return p->a	-0.124939
-1.300799	necessarily newer.	-0.124939
-0.602014	circuits consisting	-0.124939
-3.085097	an estimated	-0.124939
-1.601644	dispatching. Underestimating	-0.124939
-3.663751	in edx.	-0.124939
-3.654810	and Z.	-0.124939
-0.602014	IA-32 Architectures	-0.124939
-3.291104	or hide	-0.124939
-1.504873	throw() specification	-0.124939
-0.602014	fetching, decoding	-0.124939
-0.602014	exit(), abort(),	-0.124939
-3.073074	than others.	-0.124939
-2.202919	+= Z;	-0.124939
-2.203196	allocated memory.................................................................	-0.124939
-3.449984	be considered.	-0.124939
-3.663751	in general.	-0.124939
-2.894347	Example 7.38a.	-0.124939
-3.433503	are hundreds	-0.124939
-2.681804	double Func2(double	-0.124939
-1.643222	Visual studio	-0.124939
-3.795255	to reinvent	-0.124939
-3.802463	of yesterday's	-0.124939
-1.777458	v. 4.5.2,	-0.124939
-0.602014	PROC NEAR	-0.124939
-3.527822	The CodeGear,	-0.124939
-2.340669	file http://www.agner.org/optimize/asmlib.zip	-0.124939
-1.379934	CodeGear compiler)	-0.124939
-0.902998	511 2040	-0.124939
-2.894347	Example 7.43a.	-0.124939
-3.153114	as recursive	-0.124939
-0.902998	testing. Trying	-0.124939
-2.191832	line 29.	-0.124939
-2.364011	without returning.	-0.124939
-2.224847	a, sizeof(b));	-0.124939
-2.224016	thread scheduler.	-0.124939
-2.508423	table summarizes	-0.124939
-3.482454	that (b*c)	-0.124939
-2.092636	simply prints	-0.124939
-3.162394	code optimization",	-0.124939
-1.204028	parabola (2.0f);	-0.124939
-2.078442	... list[i	-0.124939
-3.517654	for pow(x,N)	-0.124939
-1.643083	PTR [eax+400]	-0.124939
-3.073074	than -156.	-0.124939
-3.449984	be speeded	-0.124939
-3.310495	= y.d	-0.124939
-3.449984	be used:	-0.124939
-2.544191	variable m.	-0.124939
-3.082131	int m)	-0.124939
-3.310495	= y.a	-0.124939
-3.310495	= y.b	-0.124939
-3.310495	= y.c	-0.124939
-0.602014	optimization", Coriolis	-0.124939
-1.079043	64, ...).	-0.124939
-1.079089	<int m>	-0.124939
-0.602014	WriteFile(handle, ...))	-0.124939
-4.129425	the oldest	-0.124939
-3.654810	and correspondingly	-0.124939
-2.861745	has excellent	-0.124939
-3.310495	= (float)i;	-0.124939
-4.129425	the grandparent	-0.124939
-0.602014	Vec8ui Vec4q	-0.124939
-0.902998	sar ebx,1	-0.124939
-3.245026	if (u.i[1]	-0.124939
-2.281732	its simplicity.	-0.124939
-1.446927	simultaneously. Actually,	-0.124939
-1.079089	3-dimensional geometry	-0.124939
-0.602014	Is32vec4 Vec4i	-0.124939
-3.482454	that saves	-0.124939
-3.121718	not aliased	-0.124939
-0.602014	Vec8f Vec4d	-0.124939
-0.602014	respectively (MS	-0.124939
-0.602014	thread-like scheduling	-0.124939
-1.680687	(e.g. PowerPC).	-0.124939
-2.729123	no guarantee	-0.124939
-0.602014	over. Virtualization	-0.124939
-2.234282	best algorithm.	-0.124939
-2.347850	always compete	-0.124939
-2.703420	do searches	-0.124939
-3.517654	for both,	-0.124939
-1.643175	nontemporal writes.	-0.124939
-3.291104	or p->member	-0.124939
-3.000398	have confirmed	-0.124939
-3.654810	and hence	-0.124939
-2.106240	: EXCEPTION_CONTINUE_SEARCH)	-0.124939
-2.894347	Example 14.21.	-0.124939
-3.153114	as versatile.	-0.124939
-3.654810	and systematization	-0.124939
-2.888228	memory footprint.	-0.124939
-3.654810	and memcpy:	-0.124939
-3.225895	by summing	-0.124939
-2.888228	memory ports,	-0.124939
-1.079043	expression. Assume,	-0.124939
-3.527822	The ultimate	-0.124939
-3.850836	is organized.	-0.124939
-3.654810	and parsing	-0.124939
-1.203981	Copyright ©	-0.124939
-3.527822	The official	-0.124939
-3.122826	This fragmentation	-0.124939
-0.602014	AND-OR construction	-0.124939
-3.433503	are modified,	-0.124939
-2.554687	takes 40%	-0.124939
-1.777458	v. 9.0	-0.124939
-0.602014	-read_only_relocs suppress.	-0.124939
-3.850836	is reflected,	-0.124939
-3.654810	and 137,	-0.124939
-1.643083	condition terminates	-0.124939
-1.446834	p. 26).	-0.124939
-1.831954	predicted perfectly.	-0.124939
-2.341268	32 16.4	-0.124939
-0.602014	--combine -fwhole-	-0.124939
-3.850836	is terminated	-0.124939
-1.203935	63 .	-0.124939
-3.121718	not backwards.	-0.124939
-0.602014	"Beta", "Gamma",	-0.124939
-2.503028	very often,	-0.124939
-1.203981	pattern history,	-0.124939
-1.923863	public: c1()	-0.124939
-4.129425	the integer-to-float	-0.124939
-3.082131	int arraysize	-0.124939
-1.079043	Vec16s Vec16us	-0.124939
-0.902998	a-(-b)=a+b ---xxx-x-	-0.124939
-2.223786	common subexpressions,	-0.124939
-1.446927	remove unreferenced	-0.124939
-3.827713	a non-recursing	-0.124939
-3.066220	compiler puts	-0.124939
-0.602014	(v. 15.0)	-0.124939
-3.802463	of identifier	-0.124939
-2.234190	language gained	-0.124939
-0.902998	planning stage	-0.124939
-1.079043	i<100; i++,i2+=2.0f)a[i]=i2;	-0.124939
-3.162394	code (byte	-0.124939
-2.629046	library asmlib..	-0.124939
-3.517654	for combining	-0.124939
-3.663751	in scope.	-0.124939
-3.225895	by selecting	-0.124939
-1.680733	structures .............................................................	-0.124939
-1.747587	database connections,	-0.124939
-3.517654	for (temp	-0.124939
-3.449984	be added.	-0.124939
-0.902998	7.33 Namespaces	-0.124939
-2.913667	then merge	-0.124939
-2.717714	page 142).	-0.124939
-3.153114	as flush	-0.124939
-0.602014	PHP, ASP	-0.124939
-2.092636	well documented.	-0.124939
-2.340531	system standards.	-0.124939
-0.902998	Member pointers.......................................................................................................37	-0.124939
-3.517654	for correctness	-0.124939
-2.559486	objects (rather	-0.124939
-0.902998	-0 (zero	-0.124939
-3.082131	int CriticalFunction_Dispatch(int	-0.124939
-4.129425	the BTB	-0.124939
-1.079089	offer profile-guided	-0.124939
-1.379980	?Func@@YAXQAHAAH@Z PROC	-0.124939
-2.652005	+ 0x3FF	-0.124939
-2.894347	Example 7.32a	-0.124939
-2.263341	while (*p	-0.124939
-1.555979	usability issues,	-0.124939
-2.832806	same coding	-0.124939
-2.348587	void F2(float	-0.124939
-3.066220	compiler options.......................................................................................	-0.124939
-2.652005	+ 0x3FFF	-0.124939
-0.902998	jl $B1$3:	-0.124939
-3.310495	= 0.0;	-0.124939
-0.602014	12.4b, rewritten	-0.124939
-0.902998	semaphores, mutexes	-0.124939
-1.902212	AVX2 _mm256_i64gather_epi32	-0.124939
-3.802463	of attack	-0.124939
-0.902998	external clock.	-0.124939
-0.902998	log, exp,	-0.124939
-2.348863	access rights.	-0.124939
-2.419436	often fluctuating	-0.124939
-3.795255	to combine	-0.124939
-1.962818	&& b<c	-0.124939
-3.357106	can incur	-0.124939
-2.595113	also included.	-0.124939
-3.433503	are compiler-specific.	-0.124939
-3.802463	of security,	-0.124939
-3.527822	The reinterpret_cast	-0.124939
-1.715403	so-called CPU-dispatcher	-0.124939
-2.572330	* 1.5f;	-0.124939
-2.824466	functions Sum1,	-0.124939
-2.078396	files (*.ini	-0.124939
-3.850836	is returned.	-0.124939
-0.602014	1.09 1.25	-0.124939
-1.079043	0.11 1.21	-0.124939
-2.719317	class (also	-0.124939
-1.504780	prediction mechanisms.	-0.124939
-0.902998	S. Warren,	-0.124939
-0.902998	39 matrix[i][j]	-0.124939
-3.795255	to pass	-0.124939
-1.902443	dispatch mechanisms,	-0.124939
-1.379934	x-xxxx--x Constantfolding	-0.124939
-3.310495	= &CriticalFunction_Dispatch;	-0.124939
-3.121718	not traditionally	-0.124939
-1.805625	cases. Database	-0.124939
-1.079043	alias upon	-0.124939
-2.032962	preferably isolated	-0.124939
-3.827713	a thorough	-0.124939
-1.643083	? EXCEPTION_EXECUTE_HANDLER	-0.124939
-1.777551	operation isolates	-0.124939
-0.602014	crystal ball	-0.124939
-2.779853	all intrin.h	-0.124939
-3.327981	// Convert	-0.124939
-3.153114	as pow,	-0.124939
-2.223786	common denominator	-0.124939
-1.504919	(unsigned int)(max	-0.124939
-1.203935	regular pattern,	-0.124939
-1.447019	aware of.	-0.124939
-0.602014	41 Float	-0.124939
-1.446881	solution. Sort	-0.124939
-2.419436	often excessively	-0.124939
-0.602014	reordered, inlined,	-0.124939
-0.902998	repeated 1024/4	-0.124939
-1.446881	<= max)	-0.124939
-2.203058	option /QaxAVX	-0.124939
-2.994258	this delaying	-0.124939
-3.291104	or /Ox	-0.124939
-1.747818	frame /Oy	-0.124939
-3.654810	and reorganize	-0.124939
-3.291104	or intranet	-0.124939
-1.943743	> v.i	-0.124939
-1.680779	containing (2,2,2,2),	-0.124939
-2.824466	functions directly:	-0.124939
-1.380073	!= INVALID_HANDLE_VALUE	-0.124939
-0.602014	5.82 (Embarcadero/CodeGear/Borland	-0.124939
-1.555979	aliasing /Oa	-0.124939
-2.406486	optimization /Og	-0.124939
-2.717714	page 54.	-0.124939
-3.654810	and micro-operation	-0.124939
-0.602014	&&, ||,	-0.124939
-2.660958	b Bit	-0.124939
-2.568881	possible inputs.	-0.124939
-2.559809	value infinity,	-0.124939
-1.923632	give infinity.	-0.124939
-1.555933	fully utilizing	-0.124939
-2.307923	simple solution,	-0.124939
-0.902998	favorable: Larger	-0.124939
-2.191509	been criticized	-0.124939
-3.291104	or PathScale.	-0.124939
-1.681010	algebraic reduction.	-0.124939
-4.129425	the label.	-0.124939
-2.470506	faster despite	-0.124939
-2.234558	speed /O2	-0.124939
-3.795255	to reinstall	-0.124939
-4.129425	the framework,	-0.124939
-2.157300	uses SSE3.	-0.124939
-2.048156	particular situation,	-0.124939
-1.963418	means modulo.	-0.124939
-3.310495	= 5.0f;	-0.124939
-0.902998	a[2]; a[0]	-0.124939
-2.254256	avoid hard-to-find	-0.124939
-2.695975	using new.	-0.124939
-0.602014	-ffast-math /fp:fast	-0.124939
-3.449984	be adjusted	-0.124939
-3.433503	are dominating.	-0.124939
-2.364011	without discriminating	-0.124939
-3.795255	to 36.	-0.124939
-0.602014	2014-08-07. Contents	-0.124939
-0.602014	2.8. Asmlib:	-0.124939
-3.827713	a zero-terminated	-0.124939
-3.850836	is pipelined,	-0.124939
-3.827713	a polymorphous	-0.124939
-3.654810	and _mm_free.	-0.124939
-1.902535	later discovers	-0.124939
-0.902998	MOVNTPD _mm_stream_pd	-0.124939
-2.612545	multiple streams	-0.124939
-1.203981	MOVNTQ _mm_stream_pi	-0.124939
-2.564417	version 2.20,	-0.124939
-2.975045	time intervals.	-0.124939
-2.717714	page 81).	-0.124939
-2.565062	clock cycles).	-0.124939
-2.572330	* 16is	-0.124939
-3.654810	and non-constant	-0.124939
-3.850836	is taken.	-0.124939
-3.654810	and irregular	-0.124939
-1.446927	linker extracts	-0.124939
-3.850836	is taken,	-0.124939
-1.831954	computer games	-0.124939
-3.310495	= lrint(d);	-0.124939
-2.157300	128 Is32vec4	-0.124939
-2.426704	64 Is32vec2	-0.124939
-2.202827	small microcontrollers:	-0.124939
-2.445681	call (other	-0.124939
-0.602014	/Qipo -ipo	-0.124939
-3.082131	int x[])	-0.124939
-2.681804	double a[arraysize],	-0.124939
-1.556210	3: printf("Delta");	-0.124939
-2.525000	so 1.2	-0.124939
-3.122826	This ends	-0.124939
-0.902998	respond quickly	-0.124939
-1.805672	syntax restriction,	-0.124939
-3.153114	as email	-0.124939
-1.747772	platforms (Windows,	-0.124939
-1.379934	plus i*sizeof(S1).	-0.124939
-4.129425	the end.	-0.124939
-2.406440	code. 147	-0.124939
-2.975045	time packed	-0.124939
-2.078581	addresses 0x2F00,	-0.124939
-2.203058	option (Windows:	-0.124939
-1.943651	|| (!a&&b)	-0.124939
-3.654810	and temp2.	-0.124939
-0.602014	"express" edition	-0.124939
-2.452348	critical stride,	-0.124939
-2.452348	critical stride.	-0.124939
-0.602014	(critical stride)	-0.124939
-3.795255	to temporarily	-0.124939
-2.502706	software teachers	-0.124939
-3.654810	and stopping	-0.124939
-2.559809	value 0x2C	-0.124939
-2.719317	class (CGrandParent)	-0.124939
-2.888228	memory caches.	-0.124939
-2.063488	both positive.	-0.124939
-0.602014	Rick Booth:	-0.124939
-0.902998	math. Libraries	-0.124939
-0.602014	complicated? Because	-0.124939
-3.327981	// Find	-0.124939
-3.291104	or 2016.	-0.124939
-1.446834	reductions: !(!a)=a	-0.124939
-3.291104	or NAN.	-0.124939
-1.203935	type. Interrupt	-0.124939
-3.449984	be platform-independent	-0.124939
-0.602014	strcpy, strcat,	-0.124939
-0.602014	BSF (bit	-0.124939
-3.153114	as required,	-0.124939
-1.680779	containing numerical	-0.124939
-3.527822	The characters	-0.124939
-3.449984	be made)	-0.124939
-3.802463	of matrices.	-0.124939
-0.602014	Wednesday, Thursday,	-0.124939
-2.492816	32-bit counterparts.	-0.124939
-2.717714	page 90.	-0.124939
-3.654810	and double.....................................................................................	-0.124939
-2.399785	time. (Of	-0.124939
-2.595343	C++ language......................................................	-0.124939
-0.602014	(rarely 64).	-0.124939
-4.129425	the STL.	-0.124939
-3.281894	it matters:	-0.124939
-2.307969	& 0=	-0.124939
-3.663751	in advance,	-0.124939
-3.654810	and operators...............................................................................	-0.124939
-2.717714	page 84).	-0.124939
-0.902998	x64 (Visual	-0.124939
-1.962864	Table 13.1.	-0.124939
-2.695975	using new/delete	-0.124939
-2.894347	Example 14.22b	-0.124939
-2.894347	Example 14.22a	-0.124939
-3.827713	a technological	-0.124939
-2.356780	even matters,	-0.124939
-2.789259	which interprets	-0.124939
-1.680779	access. Run	-0.124939
-3.517654	for vectorizing	-0.124939
-2.882274	at 400,	-0.124939
-0.602014	Performance". www.open-	-0.124939
-2.894347	Example 15.1d.	-0.124939
-1.555933	overflow. Taking	-0.124939
-3.449984	be reloaded	-0.124939
-2.341130	following features:	-0.124939
-3.517654	for Nerds	-0.124939
-4.129425	the user-written	-0.124939
-2.915311	} 59	-0.124939
-0.602014	amd_vrs4_expf amd_vrd2_exp	-0.124939
-2.572330	* 5;	-0.124939
-3.850836	is clearly	-0.124939
-0.602014	/Oa -fno-alias	-0.124939
-2.078027	quite ingenious	-0.124939
-0.902998	template. 57	-0.124939
-2.223786	common denominator:	-0.124939
-2.681804	double Func1(double)	-0.124939
-1.555887	(or later)	-0.124939
-3.802463	of Denmark.	-0.124939
-2.348587	void StoreNTD(double	-0.124939
-1.079043	float. (Both	-0.124939
-0.602014	Builder 5,	-0.124939
-3.663751	in two:	-0.124939
-2.894347	Example 14.18a	-0.124939
-2.894347	Example 14.18b	-0.124939
-2.717714	page 53.	-0.124939
-2.652005	+ two,	-0.124939
-1.715634	struct Slongdouble	-0.124939
-2.894347	Example 9.2b	-0.124939
-2.894347	Example 9.2a	-0.124939
-1.715541	mode. Much	-0.124939
-3.449984	be evicted.	-0.124939
-3.795255	to advertise	-0.124939
-1.079043	---xx---- (-a>-b)=(a<b)	-0.124939
-3.433503	are short.	-0.124939
-2.717714	page 45.	-0.124939
-0.602014	1./3628800., 1./39916800.,	-0.124939
-2.779853	all occurrences	-0.124939
-0.902998	power. Connecting	-0.124939
-2.717714	page 134)	-0.124939
-2.702043	example 12.1a,	-0.124939
-0.602014	memmove, memset,	-0.124939
-3.850836	is destroyed.	-0.124939
-2.544927	we have:	-0.124939
-2.695975	using memset:	-0.124939
-1.504780	4, etc.).	-0.124939
-2.508423	table 9.2,	-0.124939
-0.602014	stupid things.	-0.124939
-2.994258	this limitation	-0.124939
-2.894347	Example 8.24.	-0.124939
-1.203981	virus attacks	-0.124939
-2.894347	Example 7.32b	-0.124939
-3.527822	The undocumented	-0.124939
-3.357106	can surely	-0.124939
-1.203981	2n -1.	-0.124939
-3.433503	are met:	-0.124939
-3.850836	is shut	-0.124939
-2.341130	following sections.	-0.124939
-0.602014	unexpected behaviors.	-0.124939
-1.643083	well. Very	-0.124939
-1.643037	allow assembly-like	-0.124939
-0.602014	development", Addison-	-0.124939
-4.129425	the following:	-0.124939
-4.129425	the difference,	-0.124939
-3.827713	a bottleneck.	-0.124939
-1.680871	logical sequence.	-0.124939
-3.011997	{ __declspec(__align(64))	-0.124939
-3.251046	function rounds	-0.124939
-1.715449	macro expansions.	-0.124939
-3.827713	a temp1	-0.124939
-2.168831	/ 0x40)	-0.124939
-3.802463	of temp.	-0.124939
-2.915311	} printf("\nResults:");	-0.124939
-0.602014	hand. Low-level	-0.124939
-2.307969	& 0x0F)	-0.124939
-1.203935	(RTTI) ...........................................................................	-0.124939
-2.518327	return powN<true,N/2>::p(x)	-0.124939
-0.602014	(release version)	-0.124939
-2.660958	b memcpy(b,	-0.124939
-0.602014	a[arraysize], b[arraysize],	-0.124939
-4.129425	the truth	-0.124939
-4.129425	the ADX	-0.124939
-3.802463	of ADC	-0.124939
-3.795255	to realize	-0.124939
-0.602014	purpose: Contain	-0.124939
-1.300845	13 objects,	-0.124939
-2.157300	128 I64vec2	-0.124939
-3.449984	be mitigated	-0.124939
-0.902998	-mAVX -axSSE3,	-0.124939
-3.795255	to objects)	-0.124939
-3.663751	in 2015	-0.124939
-3.482454	that r+i/2	-0.124939
-2.975045	time lag.	-0.124939
-3.654810	and tedious.	-0.124939
-2.106378	hardware design.	-0.124939
-2.502706	software design,	-0.124939
-2.892568	from scratch.	-0.124939
-2.513246	2 0.63	-0.124939
-0.902998	(-a)*(-b)=a*b ---xxx---	-0.124939
-1.831908	5 Programmable	-0.124939
-4.129425	the producer	-0.124939
-3.281894	it says.	-0.124939
-1.079043	Linux: -ffunction-sections)	-0.124939
-3.850836	is re-allocated	-0.124939
-3.663751	in nn	-0.124939
-1.680687	(e.g. DEC,	-0.124939
-1.446881	registers, whereas	-0.124939
-3.327981	// Faster,	-0.124939
-0.902998	__attribute__((const)) (Linux	-0.124939
-0.602014	0.5 ns	-0.124939
-0.602014	"More Effective	-0.124939
-2.214012	AMD XOP	-0.124939
-3.827713	a XOR	-0.124939
-3.827713	a stand	-0.124939
-0.602014	/fp:fast=2 -fp-model	-0.124939
-0.602014	polymorphous class?	-0.124939
-3.654810	and free)	-0.124939
-0.602014	1./1.30767E12, 1./2.09227E13};	-0.124939
-2.717714	page 135).	-0.124939
-1.680733	disk caching,	-0.124939
-1.601690	define fprintf	-0.124939
-3.654810	and Sum3.	-0.124939
-2.994258	this block:	-0.124939
-0.602014	Thin clients	-0.124939
-1.777458	v. 1.4,	-0.124939
-0.602014	NUMCOLUMNS; column++)	-0.124939
-2.861745	has occurred	-0.124939
-1.902489	control tool.	-0.124939
-4.129425	the iterator	-0.124939
-3.085097	an illegal	-0.124939
-2.894347	Example 8.6a	-0.124939
-2.894347	Example 8.6b	-0.124939
-3.827713	a zip	-0.124939
-2.894347	Example 7.15a.	-0.124939
-3.073074	than 33%	-0.124939
-3.449984	be signed.	-0.124939
-3.850836	is signed,	-0.124939
-2.894347	Example 7.5.	-0.124939
-1.446881	Now s0,	-0.124939
-0.602014	gained remarkably	-0.124939
-1.999261	systems. Today	-0.124939
-3.795255	to great	-0.124939
-0.602014	(*.ini files).	-0.124939
-1.079043	/arch:AVX /QaxSSE3,	-0.124939
-1.680871	details (www.agner.org/optimize/testp.zip).	-0.124939
-3.517654	for everything,	-0.124939
-1.079043	copying. Security.	-0.124939
-0.602014	C99 standard.	-0.124939
-1.504780	profiling methods:	-0.124939
-0.602014	objects, respectively	-0.124939
-1.203935	SVML v.10.3	-0.124939
-1.203935	SVML v.10.2	-0.124939
-3.291104	or tiling.	-0.124939
-1.715403	100 doubles:	-0.124939
-0.602014	200. Next,	-0.124939
-2.587401	efficient alternative.	-0.124939
-1.446973	Template Library)	-0.124939
-1.680779	mov lea	-0.124939
-2.348587	void StoreVectorA(void	-0.124939
-3.449984	be emphasized	-0.124939
-3.291104	or *.so)	-0.124939
-1.715449	optimization. en.wikipedia.org/wiki/Compiler_optimization.	-0.124939
-0.602014	PCLMUL wmmintrin.h	-0.124939
-0.602014	gates, flip-flops,	-0.124939
-3.663751	in Microsoft's	-0.124939
-1.747680	output (/FAs	-0.124939
-4.129425	the standards	-0.124939
-2.717714	page 140.	-0.124939
-0.602014	i2; for(i=0,i2=0;	-0.124939
-1.079043	divisions (Division	-0.124939
-1.300891	for(i=0; i<301;	-0.124939
-0.602014	(if valid)	-0.124939
-2.679119	Intel CPU’s.	-0.124939
-3.011997	{ ab[i].b	-0.124939
-2.254210	arrays forwards,	-0.124939
-0.602014	Gauss elimination.	-0.124939
-2.419436	often unreliable.	-0.124939
-3.795255	to port	-0.124939
-1.300845	www.agner.org/optimize/asmlib.zip. Currently	-0.124939
-3.433503	are cheap,	-0.124939
-1.300845	Kernel Library.	-0.124939
-0.602014	2007 (www.intel.com/technology/itj/).	-0.124939
-2.815473	instruction timing,	-0.124939
-1.880351	512 520	-0.124939
-2.457547	called properties)	-0.124939
-2.224063	single result,	-0.124939
-3.827713	a reply	-0.124939
-2.894347	Example 14.17b	-0.124939
-2.106240	: 52;	-0.124939
-3.433503	are costless	-0.124939
-1.504826	misprediction penalty.	-0.124939
-3.225895	by fetching,	-0.124939
-1.379934	normal afterwards.	-0.124939
-3.310495	= 123;	-0.124939
-2.661281	into groups	-0.124939
-2.518327	return _mm_load_si128((__m128i	-0.124939
-3.000398	have sent	-0.124939
-1.643083	loops (less	-0.124939
-0.602014	ia32intrin.h _mm_exp_ps	-0.124939
-3.449984	be noticeable	-0.124939
-0.602014	_mm_exp_ps _mm_exp_pd	-0.124939
-1.601644	address. Step	-0.124939
-1.203935	directive __declspec(cpu_dispatch(...)).	-0.124939
-0.902998	b[size], c[size];	-0.124939
-3.654810	and bb[i]*cc[i]	-0.124939
-1.079043	list[100]; memset(list,	-0.124939
-4.129425	the broader	-0.124939
-2.492447	< NUMCOLUMNS;	-0.124939
-3.654810	and semicolons	-0.124939
-3.357106	can toggle	-0.124939
-2.894347	Example 14.7a.	-0.124939
-2.333023	operations Today's	-0.124939
-2.192062	cause holes	-0.124939
-1.962864	Table 12.1.	-0.124939
-0.602014	exist. Therefore	-0.124939
-2.234190	language output,	-0.124939
-0.602014	initialisation i=0;	-0.124939
-2.224063	single session.	-0.124939
-2.832806	same algorithm,	-0.124939
-3.112512	- 2014.	-0.124939
-2.202919	+= a[i+2];	-0.124939
-1.504780	4, anda	-0.124939
-3.051374	may deviate	-0.124939
-1.504780	software. Background	-0.124939
-1.981440	#include "instrset_detect.cpp"	-0.124939
-2.702043	example 12.1b	-0.124939
-2.063580	write _mm_add_epi16(a,b).	-0.124939
-4.129425	the EXCLUSIVE	-0.124939
-2.702043	example 8.26b:	-0.124939
-2.616466	float list[16];	-0.124939
-1.300845	strict formalism	-0.124939
-3.195606	with full-size	-0.124939
-3.827713	a graceful	-0.124939
-3.281894	it changes.	-0.124939
-1.642944	Library (ATL)	-0.124939
-0.602014	/Qparallel -parallel	-0.124939
-1.380026	16; n++)	-0.124939
-2.399785	time. (Examples	-0.124939
-1.962910	| Friday))	-0.124939
-1.777735	including relaxed	-0.124939
-0.602014	(bitwise and)	-0.124939
-2.214012	AMD Family	-0.124939
-2.133077	supported fprintf(stderr,	-0.124939
-2.702043	example 16.1.	-0.124939
-3.357106	can handle.	-0.124939
-2.399785	time. Uses	-0.124939
-3.291104	or modifies	-0.124939
-2.063534	optimize specifically	-0.124939
-3.795255	to controversies	-0.124939
-2.348587	void F1(int	-0.124939
-4.129425	the representation,	-0.124939
-1.079089	back. Thus,	-0.124939
-2.016386	see http://www.agner.org/optimize/	-0.124939
-1.079089	13.6 80.9	-0.124939
-0.602014	14.0 80.8	-0.124939
-3.654810	and intelligible	-0.124939
-4.129425	the computational	-0.124939
-3.291104	or __attribute__((aligned(16))).	-0.124939
-2.975045	time measurement.	-0.124939
-3.433503	are obscured	-0.124939
-2.370928	these considerations.	-0.124939
-2.867614	program dictates	-0.124939
-3.482454	that crashes	-0.124939
-3.850836	is 83	-0.124939
-0.602014	Pragmatic Look	-0.124939
-1.831816	section 17.9:	-0.124939
-0.602014	flip-flops, multiplexers,	-0.124939
-4.129425	the next.	-0.124939
-2.741712	integer representations	-0.124939
-3.795255	to deallocate	-0.124939
-3.011997	{ _mm_store_si128((__m128i	-0.124939
-1.204028	eliminated completely.	-0.124939
-3.433503	are different.	-0.124939
-2.703880	compilers succeeded	-0.124939
-3.000398	have little-endian	-0.124939
-2.994258	this chapter.	-0.124939
-0.602014	Is8vec16 Vec16c	-0.124939
-2.048386	dispatching 125	-0.124939
-2.894347	Example 14.16a	-0.124939
-4.129425	the if-branch	-0.124939
-1.880721	based mainly	-0.124939
-2.717714	page 132.	-0.124939
-2.316817	type size_t	-0.124939
-3.827713	a FILO	-0.124939
-1.079043	12. Higher	-0.124939
-1.504873	ecx 86	-0.124939
-0.602014	x,y coordinates	-0.124939
-2.572330	* b2	-0.124939
-2.572330	* b1	-0.124939
-4.129425	the "Macro	-0.124939
-3.654810	and b.	-0.124939
-3.654810	and VIA.	-0.124939
-1.856824	just happened	-0.124939
-1.079043	runtime. Polymorphism	-0.124939
-2.332977	bits 32-62.	-0.124939
-3.162394	code motion.	-0.124939
-1.555979	purposes (www.boost.org).	-0.124939
-2.263341	while (seconds	-0.124939
-3.327981	// continue	-0.124939
-3.172431	on Intel/x86-compatible	-0.124939
-2.894347	Example 7.26b	-0.124939
-0.602014	Dobbs Journal,	-0.124939
-3.066220	compiler (parallel	-0.124939
-2.894347	Example 7.26a	-0.124939
-4.129425	the possibilities	-0.124939
-2.811801	other subtasks	-0.124939
-3.291104	or bottleneck,	-0.124939
-3.517654	for analysis.	-0.124939
-1.379980	Testing speed..............................................................................................................	-0.124939
-0.902998	color difference.	-0.124939
-0.902998	SSE4.2 nmmintrin.h	-0.124939
-0.602014	enum, const,	-0.124939
-3.654810	and SVML.	-0.124939
-3.073074	than non-object	-0.124939
-3.795255	to catching	-0.124939
-3.011997	{ F1();	-0.124939
-3.121718	not used).	-0.124939
-2.564417	version 2.6.30	-0.124939
-2.745331	cache contentions,	-0.124939
-3.225895	by causing	-0.124939
-3.011997	{ StoreNTD(&a[c][r],	-0.124939
-3.251046	function F1.	-0.124939
-1.300799	__asm ("fldl	-0.124939
-2.894347	Example 8.19.	-0.124939
-2.033100	Therefore, micro-	-0.124939
-3.073074	than 15.1b,	-0.124939
-0.602014	8.20 module1.cpp	-0.124939
-3.291104	or memory-intensive	-0.124939
-3.663751	in F1?	-0.124939
-2.661281	into account.	-0.124939
-0.602014	Xnu project.	-0.124939
-2.502706	software project,	-0.124939
-2.497887	between recoverable	-0.124939
-2.253980	Windows 3.x.	-0.124939
-1.643037	bounds checking,	-0.124939
-0.602014	spell checking.	-0.124939
-2.894347	Example 8.10b	-0.124939
-2.894347	Example 8.10a	-0.124939
-2.180735	optimized yet.	-0.124939
-0.602014	0.82 0.59	-0.124939
-3.082131	int DontSkip;	-0.124939
-0.902998	Plus2 (&a);	-0.124939
-2.078304	program. During	-0.124939
-0.602014	(-a>-b)=(a<b) ---xx---x	-0.124939
-3.051374	may view	-0.124939
-1.504826	now discontinued	-0.124939
-1.446834	Linux. Address	-0.124939
-3.850836	is virtually	-0.124939
-3.121718	not satisfactory.	-0.124939
-2.717714	page 87.	-0.124939
-0.902998	fast, -fp-	-0.124939
-1.680779	counters ....................................................................	-0.124939
-3.517654	for fetching	-0.124939
-2.616466	float i2;	-0.124939
-1.379934	Is16vec8 zero(0,0,0,0,0,0,0,0);	-0.124939
-0.902998	accelerator card.	-0.124939
-3.795255	to Func1,	-0.124939
-0.902998	usage convention	-0.124939
-1.642944	Library (OWL).	-0.124939
-4.129425	the <,	-0.124939
-3.153114	as <.	-0.124939
-0.602014	JavaScript, PHP,	-0.124939
-4.129425	the non-reduced	-0.124939
-2.340669	file level.	-0.124939
-2.888228	memory released	-0.124939
-2.281732	its address:	-0.124939
-3.654810	and Enterprise	-0.124939
-2.157300	128 Vec2uq	-0.124939
-1.856870	switch statements.............................................................................	-0.124939
-1.715634	Linux, Mac,	-0.124939
-2.892568	from www.agner.org/optimize/testp.zip.	-0.124939
-3.291104	or CString.	-0.124939
-3.795255	to receive	-0.124939
-0.602014	5.5 Mac:	-0.124939
-3.850836	is expanded	-0.124939
-0.602014	planned solutions.	-0.124939
-2.341130	following solutions,	-0.124939
-2.572330	* (a+1);	-0.124939
-2.894347	Example 7.30b	-0.124939
-2.894347	Example 7.30a	-0.124939
-0.602014	supported"); return;	-0.124939
-2.406578	libraries published	-0.124939
-1.680687	(e.g. GetProcessAffinityMask	-0.124939
-3.449984	be reinstalled	-0.124939
-2.016386	see emulated	-0.124939
-1.504965	hash map.	-0.124939
-0.902998	160 /Qparallel	-0.124939
-3.654810	and fffff	-0.124939
-1.643222	Visual Basic,	-0.124939
-3.310495	= OneOrTwo5[b	-0.124939
-3.517654	for Basic.	-0.124939
-3.850836	is fastest.	-0.124939
-2.518327	return square(x)	-0.124939
-1.446881	changes fastest:	-0.124939
-1.446834	T max(T	-0.124939
-1.962910	too late.	-0.124939
-0.902998	SafeArray <float,	-0.124939
-3.654810	and closer	-0.124939
-3.527822	The static_cast	-0.124939
-0.602014	superior performance/price	-0.124939
-1.300799	considerable improvement	-0.124939
-2.882274	at runtime,	-0.124939
-0.602014	x^1, x^2,	-0.124939
-2.439294	bit set).	-0.124939
-3.122826	This corresponds	-0.124939
-0.602014	discrete icon	-0.124939
-0.602014	("int 3");	-0.124939
-0.602014	1.25 1.61	-0.124939
-2.717714	page 38).	-0.124939
-1.504826	X __attribute__((aligned(16)))	-0.124939
-1.601736	Compiler Documentation".	-0.124939
-3.663751	in connection	-0.124939
-1.300799	runs satisfactorily	-0.124939
-0.602014	kit (SDK	-0.124939
-3.195606	with alloca:	-0.124939
-1.555887	inefficient. Linear	-0.124939
-3.082131	int list[301];	-0.124939
-1.300891	Position-independent code..................................................................................	-0.124939
-2.253980	Windows Server	-0.124939
-0.602014	formats. Comments	-0.124939
-3.802463	of synchronizing	-0.124939
-3.172431	on redesigning	-0.124939
-3.195606	with alloca,	-0.124939
-0.602014	graphic brushes,	-0.124939
-2.518327	return _mm_cvtss_si32(_mm_load_ss(&x));}	-0.124939
-3.663751	in mind,	-0.124939
-1.379980	self-relative addressing.	-0.124939
-0.602014	you. Optimized	-0.124939
-0.902998	supporting multi-threaded	-0.124939
-2.894347	Example 7.3.	-0.124939
-1.379980	Agner Fog	-0.124939
-0.602014	;eax=addressofa ;edx=addressinr	-0.124939
-0.602014	//=A*x*x+B*x+C //=DeltaY	-0.124939
-1.079043	float. Similar	-0.124939
-1.203981	pool. Alignment?	-0.124939
-4.129425	the startup	-0.124939
-0.602014	{2.6f, 1.5f};	-0.124939
-3.482454	that doesn’t.	-0.124939
-2.894347	Example 7.39	-0.124939
-2.702043	example 12.8a	-0.124939
-3.795255	to 12.8b	-0.124939
-3.291104	or "how	-0.124939
-2.702043	example 7.35	-0.124939
-2.894347	Example 7.37	-0.124939
-3.195606	with #)	-0.124939
-0.602014	electrical connections	-0.124939
-2.894347	Example 7.36	-0.124939
-3.310495	= MAX(f(x),	-0.124939
-3.251046	function add_horizontal)	-0.124939
-1.446881	trick violates	-0.124939
-3.112512	- 8*x	-0.124939
-2.452764	See www.agner.org/optimize	-0.124939
-3.051374	may write:	-0.124939
-3.000398	have exploited.	-0.124939
-3.122826	This closely	-0.124939
-3.654810	and foremost,	-0.124939
-2.282331	important remedy	-0.124939
-2.340531	system dependent	-0.124939
-3.082131	int c1::*MemberPointer;	-0.124939
-2.446280	0; i--)	-0.124939
-3.654810	and list[i].b.	-0.124939
-3.291104	or removable	-0.124939
-3.795255	to ignore,	-0.124939
-1.079043	mean atomic.	-0.124939
-2.572330	* 5).	-0.124939
-2.717714	page 87)	-0.124939
-3.850836	is distributed.	-0.124939
-1.300891	definitely degrades	-0.124939
-0.602014	7.3. Explain	-0.124939
-2.629046	library (VML,	-0.124939
-0.602014	<bool IsPowerOf2,	-0.124939
-3.310495	= Func(ab[i].a);	-0.124939
-2.451427	4 (NetBurst)	-0.124939
-3.281894	it understands	-0.124939
-2.609644	number 6!	-0.124939
-2.492447	< ArraySize;	-0.124939
-1.203935	performs poorly.	-0.124939
-3.802463	of habit,	-0.124939
-3.310495	= log(2.0);	-0.124939
-1.747911	Pentium M	-0.124939
-2.565062	clock period	-0.124939
-3.654810	and flexibility,	-0.124939
-0.902998	first-in-last-out fashion.	-0.124939
-3.654810	and 3A	-0.124939
-3.802463	of CPU-time	-0.124939
-0.602014	block: 62	-0.124939
-0.602014	Runtime, CLR,	-0.124939
-0.602014	2.23 0.95	-0.124939
-4.129425	the exponent:	-0.124939
-2.858815	vector operands:	-0.124939
-1.999261	systems. 67	-0.124939
-2.915311	} 68	-0.124939
-2.032915	1; 69	-0.124939
-3.517654	for details).	-0.124939
-0.602014	v 4.0.1.	-0.124939
-3.802463	of DLLs,	-0.124939
-1.379980	structure. Incrementing	-0.124939
-4.129425	the conversion.	-0.124939
-2.842927	different browsers,	-0.124939
-3.310495	= 50;	-0.124939
-3.850836	is unrealistic	-0.124939
-2.106378	hardware identification.	-0.124939
-1.555979	approximately 500	-0.124939
-1.880351	choose between.	-0.124939
-3.795255	to consult	-0.124939
-1.643083	previous iteration.	-0.124939
-2.544145	any event,	-0.124939
-2.503028	very helpful	-0.124939
-2.508884	performance costs.	-0.124939
-1.079089	Available protocols	-0.124939
-1.981763	reference instead:	-0.124939
-1.643037	sizes (char,	-0.124939
-1.203981	Optimizes moderately	-0.124939
-1.504873	__restrict aa,	-0.124939
-0.602014	(b*c) overflows,	-0.124939
-1.962864	Table 12.3.	-0.124939
-2.191509	been brutally	-0.124939
-2.652005	+ sign(i)	-0.124939
-2.922430	will occur:	-0.124939
-1.446881	Borland bcc,	-0.124939
-0.602014	powN<true,N-N1>::p(x); #undef	-0.124939
-0.602014	2eee 1.fffff,	-0.124939
-0.602014	18015, "Technical	-0.124939
-2.702043	example converts	-0.124939
-2.572330	* sizeof(float))	-0.124939
-1.715634	struct Sfloat	-0.124939
-3.327981	// erroneously	-0.124939
-1.079043	multiplications. Subtractions	-0.124939
-2.595343	C++ Builder	-0.124939
-2.169292	calculated as(a	-0.124939
-3.654810	and shifts	-0.124939
-2.858815	vector intrinsics	-0.124939
-0.602014	ja $B2$3:	-0.124939
-0.902998	Vec4f Vec2d	-0.124939
-2.340531	system breakdown.	-0.124939
-2.202827	small subtasks,	-0.124939
-3.654810	and y?"	-0.124939
-3.827713	a driver	-0.124939
-1.079043	Microprocessor producers	-0.124939
-0.602014	I64vec2 Vec2q	-0.124939
-1.962818	three parts:	-0.124939
-2.426704	64 Is8vec8	-0.124939
-3.251046	function prototypes	-0.124939
-1.379934	five manuals:	-0.124939
-3.310495	= (int)(&list[0])	-0.124939
-0.602014	footprint. If,	-0.124939
-0.602014	micro-operation breakdowns	-0.124939
-3.795255	to minimize	-0.124939
-0.602014	65535 uint16_t	-0.124939
-0.602014	-32768 32767	-0.124939
-3.663751	in parts,	-0.124939
-3.527822	The IPP	-0.124939
-0.902998	University courses	-0.124939
-0.902998	FMA4 fma4intrin.h	-0.124939
-3.663751	in all,	-0.124939
-3.251046	function bodies	-0.124939
-2.106055	add eax,1	-0.124939
-2.874792	data Loading	-0.124939
-3.245026	if (SIZE	-0.124939
-0.602014	tricks Michael	-0.124939
-2.307923	simple actions	-0.124939
-3.433503	are risking	-0.124939
-4.129425	the sign,	-0.124939
-2.348587	void MathLoop()	-0.124939
-2.954036	more heuristic	-0.124939
-3.251046	function (n!)	-0.124939
-1.680825	square root,	-0.124939
-4.129425	the GOT,	-0.124939
-1.777458	v. 7.2).	-0.124939
-2.717714	page 130.	-0.124939
-3.449984	be ameliorated	-0.124939
-2.048202	programs installed	-0.124939
-3.827713	a Gauss	-0.124939
-1.831862	my blog	-0.124939
-0.602014	Dr Dobbs	-0.124939
-4.129425	the fundamental	-0.124939
-1.643083	PTR [eax+4],	-0.124939
-2.572330	* dest,	-0.124939
-0.902998	__asm__ ("CriticalFunction");	-0.124939
-0.602014	1.2345; Change	-0.124939
-3.449984	be scheduled	-0.124939
-2.203058	option -fwrapv	-0.124939
-2.668262	pointer conversions.	-0.124939
-3.517654	for educational	-0.124939
-2.432723	register state.	-0.124939
-1.079043	compute i/2	-0.124939
-0.602014	__vrs4_expf __vrd2_exp	-0.124939
-1.504826	heavy traffic	-0.124939
-0.902998	(memory address)	-0.124939
-1.601598	user. Compatibility	-0.124939
-2.316817	type conversions:	-0.124939
-3.433503	are confined	-0.124939
-3.291104	or glitches	-0.124939
-3.827713	a funda-	-0.124939
-2.681804	double matrix[SIZE][SIZE];	-0.124939
-2.348587	void FUNCNAME(short	-0.124939
-2.234420	element matrix[c][r].	-0.124939
-0.902998	writing: __declspec(align(64))	-0.124939
-0.602014	%1 \n	-0.124939
-3.310495	= MultiplyBy<8>(10);	-0.124939
-1.831954	references accept	-0.124939
-3.654810	and suggestions	-0.124939
-1.079043	25 Since	-0.124939
-1.831908	negative impacts	-0.124939
-1.805810	dynamically (with	-0.124939
-1.300799	method. Your	-0.124939
-3.357106	can learn	-0.124939
-0.902998	price, compatibility,	-0.124939
-2.492447	< c1+TILESIZE;	-0.124939
-2.975045	time slice	-0.124939
-2.048479	Integer division......................................................................................................	-0.124939
-1.963187	needed _mm_shuffle_epi8	-0.124939
-1.902535	later __svml_expf4	-0.124939
-1.715495	it. Complicated	-0.124939
-1.379934	1.0; temp->b	-0.124939
-2.518327	return list[x];	-0.124939
-3.011997	{ temp->a	-0.124939
-2.223786	common subexpressions	-0.124939
-2.333023	operations (addition,	-0.124939
-2.291517	I guess,	-0.124939
-2.894347	Example 12.1b.	-0.124939
-2.702043	example 12.1b,	-0.124939
-3.310495	= &SelectAddMul_SSE2;	-0.124939
-2.789259	which imprecisions	-0.124939
-2.191786	classes (Intel)	-0.124939
-0.602014	__svml_expf4 __svml_exp2	-0.124939
-2.813863	CPU hardware.	-0.124939
-3.449984	be followed	-0.124939
-0.602014	SelectAddMul_AVX2, SelectAddMul_dispatch;	-0.124939
-2.611948	two branches:	-0.124939
-2.032408	multiplication prior	-0.124939
-2.518327	return a+1;.	-0.124939
-2.371297	systems (but	-0.124939
-3.251046	function __intel_cpu_features_init()	-0.124939
-2.202827	small low-power	-0.124939
-2.202919	+= list[i+1];}	-0.124939
-1.680687	(e.g. IsProcessorFeaturePresent	-0.124939
-2.611948	two names,	-0.124939
-3.433503	are satisfied.	-0.124939
-3.527822	The distinctions	-0.124939
-1.203981	i<300; i+=3,i_div_3++){	-0.124939
-2.379195	need relocation	-0.124939
-2.554687	takes hours	-0.124939
-3.310495	= b+a,	-0.124939
-3.433503	are satisfied:	-0.124939
-3.051374	may neverthe-	-0.124939
-2.616466	float list[]	-0.124939
-1.643083	condition clause.	-0.124939
-2.145389	container expandable,	-0.124939
-2.106194	standard 754	-0.124939
-0.602014	xxn(x4, x2*x,	-0.124939
-1.601690	CPU. Should	-0.124939
-3.850836	is deprecated.	-0.124939
-0.602014	facilitate porting	-0.124939
-1.642944	Library amd_vrs4_expf	-0.124939
-3.085097	an hour.	-0.124939
-3.482454	that discriminates	-0.124939
-3.663751	in applying	-0.124939
-3.281894	it has.	-0.124939
-1.079043	2exponent 1023	-0.124939
-0.602014	handle. Waiting	-0.124939
-2.858815	vector classes:	-0.124939
-0.902998	&Object1; p1->Hello();	-0.124939
-3.310495	= _mm_blendv_epi8(bc,	-0.124939
-2.894347	Example 8.12a	-0.124939
-2.894347	Example 8.12b	-0.124939
-0.902998	CriticalFunctionType CriticalFunction_Dispatch;	-0.124939
-2.406440	code. (Compile	-0.124939
-2.078304	program. Frequent	-0.124939
-3.073074	than isolating	-0.124939
-3.850836	is strongly	-0.124939
-2.954036	more manageable	-0.124939
-2.300152	up include:	-0.124939
-1.446881	registers, totaling	-0.124939
-3.291104	or (5)	-0.124939
-2.308153	instructions (MOVNT)	-0.124939
-1.300799	Basic .NET,	-0.124939
-3.827713	a column-wise	-0.124939
-1.203981	Making exception-safe	-0.124939
-1.079043	153 spends	-0.124939
-0.602014	0.30 4.5	-0.124939
-3.082131	int Size()	-0.124939
-0.602014	7.1. Sizes	-0.124939
-2.652005	+ 4.;	-0.124939
-3.827713	a website.	-0.124939
-0.602014	Menus, buttons,	-0.124939
-2.702043	example 9.5a:	-0.124939
-2.712916	each call,	-0.124939
-3.066220	compiler .......................................................................	-0.124939
-0.602014	1./720., 1./5040.,	-0.124939
-3.082131	int ReadB()	-0.124939
-1.880305	results printf("\n%2i	-0.124939
-1.962864	Table 7.1.	-0.124939
-2.612545	multiple elements?	-0.124939
-0.902998	aliasing" (if	-0.124939
-3.449984	be caused	-0.124939
-1.880213	<< x.f;	-0.124939
-0.902998	0/a=0 ---xx--xx	-0.124939
-2.340853	16 XOP,	-0.124939
-3.291104	or First-In-Last-Out	-0.124939
-2.572330	* sizeof(float)	-0.124939
-1.203935	support. Hardware	-0.124939
-1.079043	manipulation tricks	-0.124939
-2.307969	& a=	-0.124939
-2.192016	matrix a:	-0.124939
-2.832806	same directory	-0.124939
-0.602014	"Alpha", "Beta",	-0.124939
-3.153114	as (b*2.0)/3.0	-0.124939
-0.602014	deque (doubly	-0.124939
-3.310495	= CriticalFunction(b,	-0.124939
-0.602014	(remove unreferen-	-0.124939
-3.827713	a scalar	-0.124939
-1.446834	p. 57).	-0.124939
-2.106240	: 63;	-0.124939
-1.203935	processing. Scott	-0.124939
-2.263341	large delays.	-0.124939
-3.327981	// Read	-0.124939
-1.643175	Automatic paralleli-	-0.124939
-0.602014	255 uint8_t	-0.124939
-1.379934	detect opportunities	-0.124939
-1.504873	8, Thursday	-0.124939
-1.902212	AVX2 _mm_i32gather_epi32	-0.124939
-3.000398	have gone	-0.124939
-3.291104	or aliasing,	-0.124939
-2.518327	return add_elements(s);	-0.124939
-3.327981	// Dispatcher.	-0.124939
-3.291104	or void.	-0.124939
-2.356780	even worse	-0.124939
-3.153114	as ((a+b)+c)+d.	-0.124939
-0.602014	Foundation Classes	-0.124939
-0.602014	strcat, strlen,	-0.124939
-3.327981	// Constructor-style	-0.124939
-3.517654	for correctness.	-0.124939
-2.745331	cache miss.	-0.124939
-3.827713	a higher-priority	-0.124939
-0.602014	Background services.	-0.124939
-2.719317	class library).	-0.124939
-1.643083	PTR [esp+4]	-0.124939
-2.518327	return N;	-0.124939
-2.572330	* DynamicArray	-0.124939
-3.310495	= _mm_hadd_ps(x,	-0.124939
-1.777551	>= N)	-0.124939
-4.129425	the design	-0.124939
-2.078257	cycles (depending	-0.124939
-3.850836	is obvious.	-0.124939
-3.850836	is obvious,	-0.124939
-2.348587	void FuncType(short	-0.124939
-2.572330	* (1.	-0.124939
-1.680871	changed freely.	-0.124939
-1.079089	(x) x-xx--xx-	-0.124939
-0.602014	StoreNTD(&a[c][r], b[r][c]);	-0.124939
-2.233913	specific option)	-0.124939
-3.153114	as replacements	-0.124939
-2.595343	C++ Performance".	-0.124939
-2.203381	exception handler,	-0.124939
-3.654810	and increment.	-0.124939
-2.967317	use segmentation	-0.124939
-3.153114	as integers:	-0.124939
-0.602014	-parallel -openmp	-0.124939
-3.122826	This behaviour	-0.124939
-3.225895	by XOR'ing	-0.124939
-3.517654	for RTTI	-0.124939
-2.832806	same divisor.	-0.124939
-2.652005	+ a.y);}	-0.124939
-3.310495	= r1+1;	-0.124939
-2.224016	thread increments	-0.124939
-2.203381	exception handlers	-0.124939
-3.291104	or references:	-0.124939
-0.602014	(parallel composer)	-0.124939
-3.327981	// 2-dimensional	-0.124939
-1.880305	generate -128,	-0.124939
-0.602014	//=2*A //=A*x*x+B*x+C	-0.124939
-1.079043	multithreaded applications:	-0.124939
-3.291104	or /Fa	-0.124939
-3.085097	an unrecoverable	-0.124939
-1.079043	condition. Things	-0.124939
-2.340669	file /Fm	-0.124939
-4.129425	the texts	-0.124939
-3.802463	of redesign.	-0.124939
-1.962910	| 0x3F800000;	-0.124939
-3.802463	of -fpic.	-0.124939
-3.802463	of research	-0.124939
-3.850836	is servicing.	-0.124939
-0.902998	clock; __cpuid(dummy,	-0.124939
-3.802463	of rows/columns	-0.124939
-0.902998	&Object1; p->NotPolymorphic();	-0.124939
-0.602014	A. Hoisie,	-0.124939
-1.300799	pointers, e.g.:	-0.124939
-3.482454	that destroys	-0.124939
-1.805579	STL deque	-0.124939
-3.066220	compiler knows	-0.124939
-3.663751	in 2010.	-0.124939
-3.517654	for 80x86	-0.124939
-3.654810	and restoring	-0.124939
-2.652005	+ a2/b2;	-0.124939
-2.518327	return _mm_cvtss_f32(s);	-0.124939
-2.203058	option -read_only_relocs	-0.124939
-3.654810	and newsgroups	-0.124939
-2.702043	example 12.4b,	-0.124939
-2.894347	Example 12.4b.	-0.124939
-2.332977	bits (rarely	-0.124939
-3.310495	= 10000,	-0.124939
-0.602014	a[0], b[0],	-0.124939
-2.717714	page 72).	-0.124939
-4.129425	the dimensions	-0.124939
-0.602014	g++ v	-0.124939
-0.902998	y1, y2,	-0.124939
-1.203935	Small lightweight	-0.124939
-1.079043	esp ;alignby4	-0.124939
-1.831816	are: Coarse	-0.124939
-3.251046	function vectorized:	-0.124939
-0.902998	obviously influenced	-0.124939
-1.777458	v. 5.5	-0.124939
-1.504826	Memory swapping.	-0.124939
-3.433503	are breaking	-0.124939
-1.962956	below. Cannot	-0.124939
-2.629046	library libmmt.lib	-0.124939
-3.517654	for speeding	-0.124939
-1.379980	Func ;a	-0.124939
-3.195606	with _finite())	-0.124939
-0.902998	decomposition. Functional	-0.124939
-0.602014	performance/price ratio.	-0.124939
-0.902998	proceed unattended.	-0.124939
-3.795255	to reflect	-0.124939
-0.602014	;a ;r	-0.124939
-0.602014	{1.0f, 2.5f};	-0.124939
-2.717714	page 61.	-0.124939
-0.602014	-m32 -m64	-0.124939
-0.902998	busy concentrating	-0.124939
-2.132431	operators (&	-0.124939
-1.601736	My preference	-0.124939
-0.602014	Func1(list, &list[8]);	-0.124939
-2.378780	registers (6	-0.124939
-2.263341	while (0	-0.124939
-2.332977	bits (YMM)	-0.124939
-2.894347	Example 12.4d.	-0.124939
-1.715495	copying process,	-0.124939
-4.129425	the best-case	-0.124939
-2.652005	+ d.x;	-0.124939
-1.446834	reductions: a+b=b+a,	-0.124939
-2.378780	registers (8	-0.124939
-3.310495	= (A	-0.124939
-1.962910	| (C	-0.124939
-2.502706	software developer	-0.124939
-1.962910	| (B	-0.124939
-0.602014	Prefetch PREFETCH	-0.124939
-1.999122	name mangling.	-0.124939
-2.445773	less susceptible	-0.124939
-1.715495	performance. Stefan	-0.124939
-2.106332	mode (SSE):	-0.124939
-3.802463	of inte-	-0.124939
-0.902998	-msse3 /arch:SSE3	-0.124939
-3.802463	of sum.	-0.124939
-3.153114	as ReadB	-0.124939
-3.291104	or Verilog.	-0.124939
-3.850836	is inferior.	-0.124939
-2.432907	first dimension	-0.124939
-1.300799	third party	-0.124939
-0.902998	www.amd.com. Advices	-0.124939
-2.290871	error reporting.	-0.124939
-3.449984	be wired	-0.124939
-2.894347	Example 14.12a	-0.124939
-3.795255	to refresh	-0.124939
-3.527822	The inequality	-0.124939
-1.079043	templates. Ready	-0.124939
-2.842927	different types.	-0.124939
-3.310495	= !a;	-0.124939
-1.504919	#if defined(__unix__)	-0.124939
-4.129425	the destructor,	-0.124939
-3.827713	a destructor.	-0.124939
-4.129425	the wires	-0.124939
-4.129425	the _mm_clflush	-0.124939
-3.291104	or sizes?	-0.124939
-4.129425	the SelectAddMul	-0.124939
-0.902998	a*0=0 --xxxx-xx	-0.124939
-1.079043	7.13 Loops......................................................................................................................	-0.124939
-3.082131	int iset	-0.124939
-4.129425	the beginning.	-0.124939
-3.795255	to adhere	-0.124939
-3.310495	= __rdtsc();	-0.124939
-2.168831	/ 3.0;	-0.124939
-1.715541	cycles. Calculations	-0.124939
-1.944067	longer loop-	-0.124939
-1.079043	i<100; i++)a[i]=2*i;	-0.124939
-3.654810	and non-recoverable	-0.124939
-2.729123	no side-effects	-0.124939
-0.602014	things. Looking	-0.124939
-2.994258	this loop?	-0.124939
-0.602014	Gnu: Glibc	-0.124939
-2.719317	class C1,	-0.124939
-2.191832	line written.	-0.124939
-3.327981	// Partial	-0.124939
-3.663751	in a[]	-0.124939
-0.902998	consistent modularity	-0.124939
-2.308660	processors properly.	-0.124939
-1.999122	model fast=2	-0.124939
-3.082131	int aa[size]	-0.124939
-1.923355	actually throws	-0.124939
-3.795255	to feed	-0.124939
-4.129425	the former	-0.124939
-0.602014	"FDIV bug".	-0.124939
-2.254256	execution considerably.	-0.124939
-1.446881	developers feel	-0.124939
-3.482454	that relate	-0.124939
-3.310495	= b++;	-0.124939
-2.445773	less well-known	-0.124939
-1.643037	vector. 6.	-0.124939
-3.085097	an antivirus	-0.124939
-2.842927	different sizes,	-0.124939
-4.129425	the pipeline.	-0.124939
-0.602014	%0 "	-0.124939
-3.153114	as gates,	-0.124939
-1.831954	computer games.	-0.124939
-2.518327	return FactorialTable[n];	-0.124939
-2.399785	time. Newer	-0.124939
-2.681804	double IntegerPower	-0.124939
-2.717714	page 158.	-0.124939
-1.680687	(e.g. '>')	-0.124939
-1.962864	Table 18.1.	-0.124939
-1.805579	become obsolete	-0.124939
-3.795255	to 15.1c,	-0.124939
-2.518327	return prediction).	-0.124939
-2.341268	32 -231	-0.124939
-1.880351	feature information,	-0.124939
-3.795255	to a*b*c*2.	-0.124939
-3.195606	with nagging	-0.124939
-1.300799	Multiple threads?	-0.124939
-2.702043	example 7.22.	-0.124939
-1.203935	languages. www.yeppp.info	-0.124939
-2.492447	< &list[100];	-0.124939
-1.805718	Windows, Intel/MASM	-0.124939
-0.902998	82 Keywords	-0.124939
-3.122826	This requires,	-0.124939
-2.016017	their workplace	-0.124939
-1.747633	particularly risky	-0.124939
-3.517654	for "standard	-0.124939
-3.850836	is cached,	-0.124939
-0.902998	obj1; p->f();	-0.124939
-1.643037	bounds checking).	-0.124939
-1.079043	pivot search:	-0.124939
-3.327981	// instrset_detect	-0.124939
-2.975045	time measure.	-0.124939
-3.654810	and concentrate	-0.124939
-3.195606	with double's.	-0.124939
-1.446881	case. Inlined	-0.124939
-3.310495	= a*4	-0.124939
-1.831816	supports this).	-0.124939
-2.572330	* x2;	-0.124939
-0.602014	Language Runtime,	-0.124939
-0.602014	n-1 multiplications,	-0.124939
-1.963326	data. That	-0.124939
-2.544927	we reach	-0.124939
-0.602014	x2*x, x2,	-0.124939
-3.112512	- 76	-0.124939
-0.902998	x-xx----- 75	-0.124939
-3.082131	int 832	-0.124939
-1.300799	considerable job,	-0.124939
-2.406486	optimization job.	-0.124939
-1.300845	8. 71	-0.124939
-2.915311	} 70	-0.124939
-2.452764	See www.openmp.org	-0.124939
-3.654810	and down.	-0.124939
-0.902998	programmer. 79	-0.124939
-3.433503	are CPLDs	-0.124939
-2.568605	array coincides	-0.124939
-2.894347	Example 8.14b	-0.124939
-1.300799	key press.	-0.124939
-2.894347	Example 8.14a	-0.124939
-3.827713	a bit-mask	-0.124939
-1.962910	too worried	-0.124939
-2.106240	: "m"(x)	-0.124939
-4.129425	the sign-bit	-0.124939
-2.894347	Example 7.33a	-0.124939
-2.503028	very stupid.	-0.124939
-2.888228	memory pooling.	-0.124939
-0.902998	(memory pooling)	-0.124939
-3.482454	that shares	-0.124939
-3.654810	and modular.	-0.124939
-1.380026	scan forward)	-0.124939
-1.962818	three advantages:	-0.124939
-3.654810	and WritePrivateProfileString	-0.124939
-1.446834	Studio 2005).	-0.124939
-2.967317	use try,	-0.124939
-2.805661	point multiply-and-add	-0.124939
-1.555979	writes only,	-0.124939
-3.122826	This triangle	-0.124939
-0.902998	x-xxxxxx- x-xxxx-x-	-0.124939
-3.327981	// Rounding	-0.124939
-1.680687	(e.g. Quine–McCluskey	-0.124939
-1.923632	requires n-1	-0.124939
-0.602014	fundamental laws	-0.124939
-3.795255	to x^0/0!	-0.124939
-3.225895	by semicolons,	-0.124939
-2.719960	set (/arch:SSE2,	-0.124939
-3.310495	= float(i);	-0.124939
-2.567869	many decimals.	-0.124939
-0.602014	Certainly not!	-0.124939
-2.458699	stored together......................................	-0.124939
-2.157300	uses (live	-0.124939
-2.805661	point -ffast-math	-0.124939
-3.802463	of managing	-0.124939
-3.082131	int Sum1()	-0.124939
-3.850836	is rare.	-0.124939
-3.449984	be responded	-0.124939
-1.715495	adding -100	-0.124939
-2.652005	+ 2.;	-0.124939
-3.827713	a pre-calculated	-0.124939
-0.902998	(a+b)+c=a+(b+c) a+b+c=c+b+a	-0.124939
-4.129425	the devirtualization	-0.124939
-3.654810	and invoked	-0.124939
-2.611948	two 128-	-0.124939
-2.702043	example 9.5b.	-0.124939
-3.827713	a printer	-0.124939
-0.602014	unattended. Uninstallation	-0.124939
-1.856778	CPUs. Half	-0.124939
-2.719960	set Important	-0.124939
-3.654810	and Func2	-0.124939
-0.902998	let's say	-0.124939
-2.702043	example 12.3a,	-0.124939
-2.519431	variables .........................	-0.124939
-1.680964	128-bit reads.	-0.124939
-0.602014	1./24., 1./120.,	-0.124939
-2.695975	using unions	-0.124939
-0.602014	D, Pascal,	-0.124939
-0.902998	chains, namely	-0.124939
-2.874792	data structure,	-0.124939
-1.963326	data. Multidimensional	-0.124939
-3.153114	as 'this'.	-0.124939
-3.517654	for AVX2,	-0.124939
-1.981440	#include <asmlib.h>	-0.124939
-2.063488	both 16-bit,	-0.124939
-3.195606	with u.i[1]	-0.124939
-1.962910	too much.	-0.124939
-2.544191	variable (eax)	-0.124939
-0.602014	optimize("a", on)	-0.124939
-4.129425	the attention	-0.124939
-4.129425	the level-	-0.124939
-3.654810	and maintainability	-0.124939
-1.643222	f; f=i;	-0.124939
-2.652005	+ 0x7F	-0.124939
-3.433503	are relying	-0.124939
-0.602014	BIOS setup.	-0.124939
-1.446973	1, Monday	-0.124939
-3.085097	an n'th	-0.124939
-1.601644	chapter "Register	-0.124939
-0.902998	(~a&c) a&b&c&d	-0.124939
-2.712916	each clause	-0.124939
-3.449984	be ignored	-0.124939
-3.827713	a union,	-0.124939
-3.527822	The i<20	-0.124939
-0.602014	<, <=,	-0.124939
-3.654810	and relational	-0.124939
-3.067312	x *x;	-0.124939
-1.379980	Performance Primitives"	-0.124939
-1.079043	14.30 finds	-0.124939
-2.347850	always false:	-0.124939
-2.181381	inside square:	-0.124939
-0.602014	Copy protection.	-0.124939
-3.291104	or iterative	-0.124939
-2.078811	shared objects),	-0.124939
-1.747633	particularly tricky.	-0.124939
-4.129425	the opposite:	-0.124939
-4.129425	the for-loop:	-0.124939
-4.129425	the complication	-0.124939
-3.310495	= ++b;	-0.124939
-3.827713	a square.	-0.124939
-0.602014	SelectAddMul_SSE2, SelectAddMul_SSE41,	-0.124939
-3.827713	a strategy	-0.124939
-3.225895	by AND'ing	-0.124939
-2.348587	void xplus2()	-0.124939
-2.244358	processor X?"	-0.124939
-1.079089	Exception Specifications,	-0.124939
-3.827713	a million	-0.124939
-2.063626	allocation (new	-0.124939
-2.994258	this reason.	-0.124939
-2.994258	this reason,	-0.124939
-0.602014	-mveclibabi -fopenmp	-0.124939
-1.856732	smaller squares	-0.124939
-1.203935	optimal. Best-case	-0.124939
-2.281778	about investigation	-0.124939
-3.663751	in disguise.	-0.124939
-2.518327	return route.	-0.124939
-1.962910	too small,	-0.124939
-3.527822	The compactness	-0.124939
-2.803414	loop overhead.	-0.124939
-2.078257	counter //=2*A	-0.124939
-2.106240	: 2.6f;	-0.124939
-0.602014	printf("\n%2i %10I64i",	-0.124939
-2.719054	floating point-to-integer	-0.124939
-2.815473	instruction set?".	-0.124939
-3.527822	The loop-branch	-0.124939
-2.518327	return vector(x	-0.124939
-2.894347	Example 8.8b	-0.124939
-2.824466	functions Encryption,	-0.124939
-2.894347	Example 8.8a	-0.124939
-1.379934	^ ~b	-0.124939
-0.902998	users. Firewalls,	-0.124939
-0.602014	a<<b<<c=a<<(b+c) x-xxx--xx	-0.124939
-2.652005	+ 100*16,	-0.124939
-0.602014	difference. Newest	-0.124939
-2.347850	always normalized,	-0.124939
-2.253980	Windows MFC).	-0.124939
-1.379980	unwinding ..............................................................................	-0.124939
-3.195606	with widely	-0.124939
-3.449984	be re-calculated	-0.124939
-3.527822	The advise	-0.124939
-1.999353	cache. Multithreaded	-0.124939
-2.032269	next year.	-0.124939
-0.902998	specified. Insert	-0.124939
-0.602014	inlining. Reducible	-0.124939
-2.157669	}; vector()	-0.124939
-2.132292	few decades	-0.124939
-3.654810	and decreased	-0.124939
-3.225895	by CPU.............................................................................81	-0.124939
-3.310495	= (*CriticalFunction)(b,	-0.124939
-2.894347	Example 12.7.	-0.124939
-2.348863	access patterns.	-0.124939
-3.654810	and publish	-0.124939
-2.470414	const definitions	-0.124939
-3.310495	= Multiply(10,8);	-0.124939
-3.850836	is "undefined".	-0.124939
-1.079043	handled separately:	-0.124939
-1.643083	PTR [ecx+eax*4],ebx	-0.124939
-4.129425	the std::unexpected()	-0.124939
-3.310495	= b.x	-0.124939
-3.802463	of people.	-0.124939
-0.602014	__except (GetExceptionCode()	-0.124939
-2.203058	option -mveclibabi=svml.	-0.124939
-3.011997	{ a[c][r]	-0.124939
-3.433503	are uninitialized,	-0.124939
-3.802463	of jumps,	-0.124939
-2.132292	few places.	-0.124939
-0.902998	scientific computing,	-0.124939
-0.602014	nowadays stress	-0.124939
-2.157300	128 Iu8vec16	-0.124939
-0.602014	a[1], b[1],	-0.124939
-4.129425	the strictness	-0.124939
-1.962864	Table 2.1.	-0.124939
-2.244358	bytes alignment,	-0.124939
-1.777458	v. 2.00.	-0.124939
-1.300845	wrap around.	-0.124939
-0.602014	sub-expressions. Why	-0.124939
-1.504826	Memory access.......................................................................................................	-0.124939
-2.492447	< arraysize)	-0.124939
-2.316817	type casting.	-0.124939
-2.316817	type casting,	-0.124939
-1.643037	bounds violations	-0.124939
-0.902998	-Ofast -mveclibabi	-0.124939
-3.291104	or malloc.	-0.124939
-0.602014	2014. Last	-0.124939
-3.291104	or malloc)	-0.124939
-0.602014	_mm_stream_pi((__m64*)dest, *(__m64*)&source);	-0.124939
-2.492447	< 231	-0.124939
-2.233913	specific interval.	-0.124939
-3.433503	are removed,	-0.124939
-2.348587	void Func()	-0.124939
-2.092498	certain interval:	-0.124939
-3.663751	in question:	-0.124939
-2.492447	< r2;	-0.124939
-1.643037	bounds violation,	-0.124939
-3.433503	are incremental	-0.124939
-3.850836	is admittedly	-0.124939
-2.452348	critical application-	-0.124939
-3.795255	to collect	-0.124939
-0.602014	Covers PC's,	-0.124939
-1.999122	name "position-independent	-0.124939
-4.129425	the application,	-0.124939
-1.777920	hot spots.	-0.124939
-2.492447	< list.Size();	-0.124939
-4.129425	the essential	-0.124939
-0.602014	r1, r2,	-0.124939
-3.225895	by xx-xx--x-	-0.124939
-3.654810	and mirroring	-0.124939
-3.517654	for "function	-0.124939
-2.191509	been identified.	-0.124939
-0.602014	MAX(f(x), g(x));	-0.124939
-2.157531	functions. Time-	-0.124939
-2.063211	was originally	-0.124939
-3.850836	is 8*1024/64	-0.124939
-0.602014	~a&~b=~(a|b) --xxxx---	-0.124939
-1.380026	follows (using	-0.124939
-4.129425	the series:	-0.124939
-2.652005	+ 3.;	-0.124939
-1.943743	> v.i)	-0.124939
-3.310495	= select_gt(b,	-0.124939
-3.327981	// Prototype	-0.124939
-1.680825	stack. String	-0.124939
-3.195606	with IsPowerOf2	-0.124939
-0.602014	2.20, glibc	-0.124939
-4.129425	the pros	-0.124939
-0.602014	mathe- matical	-0.124939
-0.902998	(properties) ............................................................................	-0.124939
-3.850836	is stored?	-0.124939
-2.595113	also tends	-0.124939
-4.129425	the leftmost	-0.124939
-3.654810	and Gnu).	-0.124939
-1.203935	Taylor series,	-0.124939
-1.203935	Taylor series.	-0.124939
-2.332977	bits (ZMM).	-0.124939
-0.602014	18.1. Command	-0.124939
-1.777458	v. 2.1.7,	-0.124939
-0.902998	effort. Square	-0.124939
-4.129425	the DelayFiveSeconds	-0.124939
-2.407316	how tortuous	-0.124939
-2.378780	registers ..........................................................	-0.124939
-0.602014	dialog boxes,	-0.124939
-0.602014	_mm_hadd_ps(s, s);	-0.124939
-2.729123	no yes	-0.124939
-0.602014	buttons, dialog	-0.124939
-3.085097	an integer).	-0.124939
-1.079089	|, ~.	-0.124939
-1.446834	reductions: ~(~a)	-0.124939
-0.602014	quadratic matrix,	-0.124939
-1.300799	xxxxxxxxx xxxxxxx-x	-0.124939
-1.962956	below. Those	-0.124939
-1.300891	.cpp file)	-0.124939
-2.492585	branch predictions	-0.124939
-1.831908	positive value,	-0.124939
-1.999353	c; x[0]	-0.124939
-3.011997	{ _mm_stream_pi((__m64*)dest,	-0.124939
-3.153114	as OneOrTwo5[(b!=0)	-0.124939
-3.795255	to fake	-0.124939
-2.332977	bits (XMM)	-0.124939
-2.612545	multiple logically	-0.124939
-2.813863	CPU vendors	-0.124939
-2.741712	integer constant,	-0.124939
-3.082131	int i[2];	-0.124939
-3.035733	you analyze	-0.124939
-3.449984	be accomplished	-0.124939
-0.602014	try, catch,	-0.124939
-0.602014	Edition, 2005;	-0.124939
-0.602014	AES, PCLMUL	-0.124939
-1.963049	keyword wherever	-0.124939
-2.016063	complicated criteria	-0.124939
-1.079043	Inheritance ..............................................................................................................	-0.124939
-2.032546	operator ++i	-0.124939
-0.902998	dividing repeatedly	-0.124939
-2.016248	like sqrt,	-0.124939
-3.291104	or __restrict__,	-0.124939
-3.517654	for raising	-0.124939
-0.602014	dvec.h vectorclass.h	-0.124939
-2.032269	next section.	-0.124939
-3.162394	code section,	-0.124939
-2.349186	method unfavorable,	-0.124939
-2.191786	classes 114	-0.124939
-1.079043	Bitfields ...................................................................................................................	-0.124939
-1.963187	programmer hasn't	-0.124939
-0.902998	0.f, 1.f);	-0.124939
-2.341453	SSE2 emmintrin.h	-0.124939
-2.347850	always optimal,	-0.124939
-4.129425	the occurrence	-0.124939
-0.602014	fine-tuning, testing,	-0.124939
-1.747726	preceding row.	-0.124939
-2.824466	functions (methods).........................................................................	-0.124939
-3.073074	than self-styled	-0.124939
-2.652005	+ ia32intrin.h	-0.124939
-2.307969	& N-1)==0,N>::p(x);	-0.124939
-3.827713	a minute	-0.124939
-3.795255	to develop.	-0.124939
-1.203935	are. Declare	-0.124939
-4.129425	the exact	-0.124939
-3.802463	of RAM,	-0.124939
-4.129425	the circumstances	-0.124939
-2.719317	class powN<true,1>	-0.124939
-3.517654	for NOT.	-0.124939
-3.310495	= (0x2710	-0.124939
-1.902212	AVX2 _mm_i64gather_epi32	-0.124939
-2.168831	/ Embarcadero	-0.124939
-2.291102	times lower;	-0.124939
-2.587401	efficient table-based	-0.124939
-1.880167	reduce (a*b*c)+(c*b*a)	-0.124939
-0.602014	38.1 97	-0.124939
-2.119789	calculate pow(x,10)	-0.124939
-0.602014	VTune; AMD's	-0.124939
-3.802463	of data",	-0.124939
-3.011997	{ 92	-0.124939
-3.225895	by allowing	-0.124939
-0.902998	tables". Tips	-0.124939
-3.291104	or vice	-0.124939
-2.609644	number generators.	-0.124939
-3.310495	= x4*x4;	-0.124939
-3.000398	have tested.	-0.124939
-2.652005	+ c.x	-0.124939
-2.652005	+ c.y	-0.124939
-2.244358	processor activates	-0.124939
-1.379934	places back,	-0.124939
-2.950696	when activated	-0.124939
-0.602014	polynomial. Scheduling	-0.124939
-2.894347	Example 7.34b.	-0.124939
-0.902998	("internal"))) Vectorize	-0.124939
-1.831862	my comments,	-0.124939
-1.963557	induction variables:	-0.124939
-1.601690	features 80386	-0.124939
-2.894347	Example 14.16b	-0.124939
-0.902998	Portability note:	-0.124939
-0.602014	13.1, Requires	-0.124939
-4.129425	the Pentium-II	-0.124939
-3.850836	is Borland's	-0.124939
-1.504965	*= xx4;	-0.124939
-0.902998	exceptions: __except	-0.124939
-3.802463	of rounding,	-0.124939
-0.902998	legal issue,	-0.124939
-0.902998	removing superfluous	-0.124939
-2.445773	example, b*2.0/3.0	-0.124939
-3.449984	be mainstream	-0.124939
-3.310495	= _mm_hadd_ps(s,	-0.124939
-3.850836	is Perl.	-0.124939
-2.894347	Example 14.17a	-0.124939
-3.795255	to 15.1c?	-0.124939
-3.517654	for discussions.	-0.124939
-2.282055	stack unwinding.	-0.124939
-1.300799	__asm ("int	-0.124939
-3.795255	to relax	-0.124939
-1.555887	(or higher)	-0.124939
-0.902998	i++ ;checkifi<100	-0.124939
-1.879982	usually dealt	-0.124939
-3.310495	= ((x2)2)2	-0.124939
-2.263986	assembly language:	-0.124939
-3.082131	int dummy[4];	-0.124939
-2.813863	CPU model,	-0.124939
-2.703880	compilers exist	-0.124939
-2.894347	Example 14.15a	-0.124939
-1.680733	structures (without	-0.124939
-0.602014	true/false Loopunrolling	-0.124939
-3.850836	is wrong,	-0.124939
-3.066220	compiler makers	-0.124939
-3.827713	a textbook	-0.124939
-2.106240	: 15;	-0.124939
-2.888228	memory leaks.	-0.124939
-0.602014	unreasonably large.	-0.124939
-0.902998	mask. Poor	-0.124939
-0.602014	22). 159	-0.124939
-2.272911	function. 154	-0.124939
-0.902998	-fno-rtti /GR-	-0.124939
-2.915311	} 152	-0.124939
-1.504919	GOT entry.	-0.124939
-3.251046	function inlining.	-0.124939
-3.795255	to Object1.Hello(),	-0.124939
-3.795255	to 151	-0.124939
-2.092636	well thought-through	-0.124939
-3.795255	to vectorize.	-0.124939
-3.654810	and suggests	-0.124939
-3.073074	than normally.	-0.124939
-1.902212	AVX2 _mm256_i32gather_ps	-0.124939
-1.880121	10 Multithreading..............................................................................................................	-0.124939
-0.602014	abort(), _endthread(),	-0.124939
-0.602014	(live ranges)	-0.124939
-2.016386	implementation analogous	-0.124939
-2.611948	two summation	-0.124939
-2.440447	operating system.........................................................................................	-0.124939
-2.518327	return statement:	-0.124939
-2.347850	always happy	-0.124939
-2.440447	operating systems"	-0.124939
-1.715634	Linux, Gnu/AT&T	-0.124939
-2.341130	programming Device	-0.124939
-3.654810	and mainframes,	-0.124939
-4.129425	the device.	-0.124939
-0.602014	"Macro loops"	-0.124939
-0.902998	vectorclass www.agner.org/optimize/#vectorclass.	-0.124939
-0.602014	("fldl %1	-0.124939
-1.079043	access, sort	-0.124939
-0.602014	7.29b floata;	-0.124939
-1.643175	Automatic updates.	-0.124939
-1.981717	automatic updates,	-0.124939
-1.446927	log (b[i]	-0.124939
-3.827713	a level-3	-0.124939
-1.777458	v. 14.00	-0.124939
-1.555933	enabled. Few	-0.124939
-3.482454	that detects	-0.124939
-1.999122	name _alloca)	-0.124939
-0.602014	occurred anywhere	-0.124939
-1.923586	public B1,	-0.124939
-2.016433	Most importantly,	-0.124939
-1.203935	Graphics .................................................................................................................	-0.124939
-0.602014	b[0], a[1],	-0.124939
-2.572330	* 0.5	-0.124939
-1.601598	user. Feature	-0.124939
-0.602014	0.95 0.6	-0.124939
-2.994258	this mask,	-0.124939
-2.616466	float list[ARRAYSIZE];	-0.124939
-2.426704	64 Is16vec4	-0.124939
-1.923494	processors. Details	-0.124939
-1.555979	include JavaScript,	-0.124939
-1.203935	100, NUMCOLUMNS	-0.124939
-1.555887	(or eight)	-0.124939
-1.643037	way. First	-0.124939
-3.802463	of losing	-0.124939
-3.310495	= ReadTSC();	-0.124939
-2.813863	CPU cores:	-0.124939
-3.066220	compiler makers.	-0.124939
-1.556025	Don't panic	-0.124939
-3.527822	The sin	-0.124939
-3.850836	is rebooted.	-0.124939
-1.902489	calling WritePrivateProfileString,	-0.124939
-1.079043	catch (...)	-0.124939
-2.811801	other abuse	-0.124939
-2.994258	this place.	-0.124939
-3.082131	int FuncCol(int);	-0.124939
-2.032546	operator %.	-0.124939
-1.079089	CPUs"). Const	-0.124939
-3.327981	// abs(u.f)	-0.124939
-1.556164	Instruction tables:	-0.124939
-2.492447	< 13)	-0.124939
-0.602014	feasible. Interference	-0.124939
-2.842927	different times:	-0.124939
-3.112512	- min))	-0.124939
-1.643083	resource conflicts.	-0.124939
-0.602014	work, 133	-0.124939
-2.525000	so complicated?	-0.124939
-0.602014	patch. 131	-0.124939
-1.981440	#include <math.h>	-0.124939
-2.078304	program. Application	-0.124939
-2.567869	many people	-0.124939
-0.902998	floppy disks	-0.124939
-1.555887	parameter. Further	-0.124939
-4.129425	the single-thread	-0.124939
-1.601875	garbage collection,	-0.124939
-2.915311	} 138	-0.124939
-3.251046	function tables.	-0.124939
-0.602014	fistpl %0	-0.124939
-1.504826	jump tables,	-0.124939
-2.016017	their superior	-0.124939
-2.954036	more powerful.	-0.124939
-4.129425	the newsgroup	-0.124939
-3.035733	you activate	-0.124939
-3.291104	or C2,	-0.124939
-3.195606	with massively	-0.124939
-3.827713	a unique	-0.124939
-3.795255	to multithreading	-0.124939
-3.850836	is unlikely	-0.124939
-3.802463	of pending	-0.124939
-2.499179	order a[0],	-0.124939
-2.426704	64 kb.	-0.124939
-1.880351	512 kb,	-0.124939
-2.223786	common practice	-0.124939
-1.504919	identification (RTTI).	-0.124939
-1.504919	identification (RTTI),	-0.124939
-2.892568	from www.agner.org/optimize/testp.zip	-0.124939
-0.902998	Unions ....................................................................................................................	-0.124939
-3.654810	and ease	-0.124939
-3.449984	be resized	-0.124939
-1.747911	Pentium Pro	-0.124939
-1.601783	come unpredictably	-0.124939
-2.203150	integers ...................................	-0.124939
-1.747772	calls. Internal	-0.124939
-2.741712	integer comparisons.	-0.124939
-0.902998	(3) trap	-0.124939
-3.449984	be overridden	-0.124939
-0.602014	Iu32vec4 Vec4ui	-0.124939
-2.223786	common sub-expressions.	-0.124939
-1.300799	valid addresses,	-0.124939
-2.842927	different opinions	-0.124939
-2.842927	different microprocessors,	-0.124939
-2.016294	members individually.	-0.124939
-2.652005	+ p->b;}	-0.124939
-2.803414	loop initialisation	-0.124939
-3.482454	that matters	-0.124939
-0.602014	fine- tune	-0.124939
-1.962864	Table 18.3.	-0.124939
-3.073074	than doubled	-0.124939
-2.370928	these categories:	-0.124939
-1.079043	service routines,	-0.124939
-3.802463	of usability.	-0.124939
-1.962910	| 0x8040);	-0.124939
-3.827713	a couple	-0.124939
-2.106332	mode Parameter	-0.124939
-3.449984	be huge).	-0.124939
-2.702043	example 13.1,	-0.124939
-1.556025	faster. Of	-0.124939
-1.079043	(page 131)	-0.124939
-3.654810	and recompile	-0.124939
-3.327981	// Main	-0.124939
-0.902998	named MKL,	-0.124939
-2.652005	+ log(c[i]);.	-0.124939
-1.079043	being said,	-0.124939
-3.802463	of ways).	-0.124939
-1.831862	my tests,	-0.124939
-1.880121	binary trees,	-0.124939
-3.827713	a template:	-0.124939
-0.602014	(80 bits).	-0.124939
-3.310495	= FactorialTable[b];	-0.124939
-1.300845	doubled. Thin	-0.124939
-2.518327	return _mm_cvtsd_si32(_mm_load_sd(&x));}	-0.124939
-0.602014	(short int)i;	-0.124939
-3.000398	have tried.	-0.124939
-3.663751	in Windows)	-0.124939
-2.119835	structure needed?	-0.124939
-1.446973	1, 2A,	-0.124939
-3.153114	as follows.	-0.124939
-3.310495	= static_cast<float>(i);	-0.124939
-2.994258	this respect.	-0.124939
-1.923540	becomes bulky	-0.124939
-2.930452	A light-weight	-0.124939
-1.079043	One kilobyte	-0.124939
-3.172431	on correction	-0.124939
-3.281894	it supports.	-0.124939
-2.813863	CPU supports,	-0.124939
-1.643175	temp; 104	-0.124939
-3.225895	by 5-10%	-0.124939
-2.106148	1 0.5ns.	-0.124939
-3.850836	is profitable.	-0.124939
-2.317001	0 255	-0.124939
-3.527822	The MASM	-0.124939
-2.717714	page 150.	-0.124939
-2.815473	instruction directly,	-0.124939
-3.281894	it directly.	-0.124939
-3.051374	may argue	-0.124939
-3.654810	and planned	-0.124939
-2.994258	this capability:	-0.124939
-1.715495	final destination,	-0.124939
-0.602014	UNIX shell	-0.124939
-3.291104	or .so).	-0.124939
-3.802463	of range");	-0.124939
-3.449984	be reversed	-0.124939
-0.602014	-2.0, 4.4,	-0.124939
-3.153114	as reflecting	-0.124939
-3.251046	function scanf.	-0.124939
-3.663751	in isolation	-0.124939
-3.433503	are limiting	-0.124939
-2.492816	32-bit (signed)	-0.124939
-2.805661	point exceptions,	-0.124939
-2.975045	time measurements:	-0.124939
-1.446881	Now 1.0	-0.124939
-3.291104	or YMM)	-0.124939
-1.601644	sets. Covers	-0.124939
-0.902998	Tuesday, Wednesday,	-0.124939
-2.378780	registers (XMM	-0.124939
-2.426704	64 Iu16vec4	-0.124939
-2.544145	any patch.	-0.124939
-0.602014	1/n! 1.,	-0.124939
-3.121718	not met	-0.124939
-1.079043	(MS) xopintrin.h	-0.124939
-0.602014	FuncB(i+1); FuncC(i+1);	-0.124939
-3.827713	a ^0	-0.124939
-4.129425	the C99	-0.124939
-2.157300	128 Iu16vec8	-0.124939
-1.079043	(MS) x86intrin.h	-0.124939
-0.602014	packing, unpacking	-0.124939
-3.291104	or -fsource-asm).	-0.124939
-1.079043	feedback seriously.	-0.124939
-0.602014	BigArray[1024] __attribute__((aligned(64)));	-0.124939
-1.601690	might clash	-0.124939
-0.902998	first-in-last-out nature	-0.124939
-1.300845	equivalent if(!(a	-0.124939
-2.888228	memory requirement.	-0.124939
-0.902998	2001. Advanced	-0.124939
-3.449984	be propagated	-0.124939
-3.795255	to C0::f	-0.124939
-2.426704	64 I64vec1	-0.124939
-2.719317	class (CParent<>)	-0.124939
-1.203935	bad dilemma.	-0.124939
-1.943790	set. High	-0.124939
-0.602014	4.0.1. Gnu:	-0.124939
-3.654810	and API's.	-0.124939
-2.568881	possible workaround.	-0.124939
-1.777458	v. 4.1.0,	-0.124939
-1.203981	MOVNTQ _mm_empty();	-0.124939
-3.663751	in parallel:	-0.124939
-3.245026	if our	-0.124939
-3.112512	- 8.0f)	-0.124939
-3.850836	is considerable.	-0.124939
-0.602014	"frame pointer".	-0.124939
-0.602014	Abrash: "Zen	-0.124939
-1.981763	reference parameters).	-0.124939
-1.300799	considerable debate	-0.124939
-2.930452	A Pragmatic	-0.124939
-0.902998	9. Multiplications	-0.124939
-2.858815	vector (1,2,3,4),	-0.124939
-1.079043	(page 146).	-0.124939
-3.663751	in green.	-0.124939
-0.602014	Typical candidates	-0.124939
-0.902998	C++: Preprocessor	-0.124939
-1.446927	threads. Out-of-order	-0.124939
-2.263341	while he	-0.124939
-3.225895	by thousands	-0.124939
-3.827713	a menu	-0.124939
-2.994258	this chapter,	-0.124939
-3.654810	and shuffling	-0.124939
-3.310495	= a&&b	-0.124939
-2.668262	pointer arithmetics	-0.124939
-1.999122	256 Vec32uc	-0.124939
-1.777458	variables. Move	-0.124939
-2.063580	write if(!a	-0.124939
-3.517654	for (column	-0.124939
-2.611948	two gives:	-0.124939
-3.795255	to 100000000.	-0.124939
-2.340669	file formats.	-0.124939
-0.602014	11, Iss.	-0.124939
-2.861745	has incomplete	-0.124939
-0.602014	0: printf("Alpha");	-0.124939
-1.300799	third generations	-0.124939
-3.850836	is costless.	-0.124939
-0.602014	heuristic guidelines.	-0.124939
-2.892568	from knowing	-0.124939
-2.702043	example 7.43b	-0.124939
-3.310495	= 18,	-0.124939
-0.602014	lag. Thinking	-0.124939
-1.601644	address. Relocation	-0.124939
-1.715495	registers. Typical	-0.124939
-0.602014	32767 int16_t	-0.124939
-0.902998	License shall	-0.124939
-2.572330	* p2;	-0.124939
-1.923863	public: SafeArray()	-0.124939
-3.654810	and closes	-0.124939
-0.602014	232-1 uint32_t	-0.124939
-2.106240	: 2.5f;	-0.124939
-1.446973	debug breakpoints	-0.124939
-3.663751	in favor	-0.124939
-0.602014	Darwin8 g++	-0.124939
-0.902998	Introduction .......................................................................................................................	-0.124939
-1.747680	still frustrated	-0.124939
-1.777458	v. 2.7,	-0.124939
-2.202919	+= a[i+3];	-0.124939
-4.129425	the columns.	-0.124939
-1.446834	T a[N];	-0.124939
-2.213367	etc. Overriding	-0.124939
-2.492447	< columns;	-0.124939
-0.602014	7.41b a.x	-0.124939
-0.602014	d.x; a.y	-0.124939
-2.874792	data conversion,	-0.124939
-3.850836	is responsible	-0.124939
-0.602014	for-loop: i++;	-0.124939
-0.902998	F1() throw();	-0.124939
-1.300799	increment i++.	-0.124939
-1.999168	development tools,	-0.124939
-2.426888	take precedence,	-0.124939
-2.445681	call __intel_cpu_features_init_x().	-0.124939
-3.327981	// (Some	-0.124939
-3.654810	and well-	-0.124939
-2.892568	from a=a*2;	-0.124939
-3.085097	an interrupt,	-0.124939
-1.777458	v. 7.1-4,	-0.124939
-3.195606	with truncation,	-0.124939
-1.203935	years old.	-0.124939
-2.317001	0 232-1	-0.124939
-2.281732	its API.	-0.124939
-2.316817	type __m128d	-0.124939
-0.602014	Addison- Wesley	-0.124939
-2.717714	page 73)	-0.124939
-2.717714	page 73.	-0.124939
-1.203935	F32vec4 s(0.f,	-0.124939
-4.129425	the Active	-0.124939
-3.827713	a subset,	-0.124939
-2.568881	possible remedies	-0.124939
-4.129425	the resultant	-0.124939
-4.129425	the sequence,	-0.124939
-3.357106	can safely	-0.124939
-2.307969	& 1];	-0.124939
-4.129425	the kind:	-0.124939
-0.602014	multitasking environment,	-0.124939
-2.719054	floating point).	-0.124939
-1.923586	directives (everything	-0.124939
-0.602014	studio 2008,	-0.124939
-1.504873	__restrict bb)	-0.124939
-0.602014	7.1-4, 2008.	-0.124939
-2.858815	vector intrinsics,	-0.124939
-2.858815	vector intrinsics.	-0.124939
-2.272911	extra layer	-0.124939
-1.203981	Out (FILO)	-0.124939
-1.680871	level linking"	-0.124939
-2.894347	Example 14.2a	-0.124939
-2.894347	Example 14.2b	-0.124939
-2.913667	then FuncC.	-0.124939
-3.654810	and similarly	-0.124939
-0.902998	2008 R2	-0.124939
-0.602014	0x800 apart.	-0.124939
-1.902351	vectors FMA3	-0.124939
-3.654810	and clarity	-0.124939
-2.016248	like these,	-0.124939
-3.195606	with these.	-0.124939
-0.602014	hardware. Porting	-0.124939
-2.191509	been wasted.	-0.124939
-0.902998	memcpy, memmove,	-0.124939
-1.643083	PTR [esp+12]	-0.124939
-3.517654	for reserving	-0.124939
-3.850836	is tempting	-0.124939
-1.962818	three levels	-0.124939
-2.348587	void DelayFiveSeconds()	-0.124939
-3.795255	to mimic	-0.124939
-0.602014	Addison-Wesley. Third	-0.124939
-3.517654	for identifying	-0.124939
-2.291517	I disagree	-0.124939
-3.153114	as AQtime,	-0.124939
-2.894347	Example 14.29	-0.124939
-2.894347	Example 14.24	-0.124939
-2.894347	Example 14.25	-0.124939
-1.601736	problem. Vectors	-0.124939
-1.079043	approximate reciprocal,	-0.124939
-2.894347	Example 14.20	-0.124939
-2.702043	example 14.21	-0.124939
-3.795255	to adapt	-0.124939
-3.433503	are unrelated	-0.124939
-4.129425	the unit-	-0.124939
-1.079043	FUNCNAME SelectAddMul_AVX2	-0.124939
-3.310495	= 1.6;	-0.124939
-1.380073	spent fighting	-0.124939
-3.654810	and __intel_new_strlen	-0.124939
-3.654810	and ARM	-0.124939
-3.663751	in a[i].	-0.124939
-1.981671	running at,	-0.124939
-3.357106	can overwrite	-0.124939
-0.602014	increments seconds.	-0.124939
-2.882274	at 399	-0.124939
-2.192108	AVX immintrin.h	-0.124939
-3.827713	a BSF	-0.124939
-3.082131	int seconds;	-0.124939
-2.572330	* 1.2f;	-0.124939
-3.153114	as intended,	-0.124939
-3.291104	or makefile.	-0.124939
-2.567869	many strings.	-0.124939
-3.051374	may supply	-0.124939
-3.827713	a queue.	-0.124939
-2.832806	same queue,	-0.124939
-2.457547	called from),	-0.124939
-3.517654	for distinguishing	-0.124939
-0.602014	ended queue)	-0.124939
-0.602014	newsgroup comp.lang.asm.x86	-0.124939
-2.702043	example 9.1b.	-0.124939
-3.082131	int cc[]);	-0.124939
-1.203935	parallel. Coarse-grained	-0.124939
-1.446927	Intel's term	-0.124939
-0.602014	string, wstring	-0.124939
-1.923586	public symbols,	-0.124939
-1.203935	dramatic consequences.	-0.124939
-3.802463	of activating	-0.124939
-2.202919	+= sum2;	-0.124939
-3.850836	is minimized.	-0.124939
-1.880213	were inserted,	-0.124939
-2.518327	return *(T*)0;	-0.124939
-2.702043	example 7.30b.	-0.124939
-3.051374	may vary	-0.124939
-2.894347	Example 14.4a	-0.124939
-3.357106	can exceed	-0.124939
-1.962818	&& WriteFile(handle,	-0.124939
-3.310495	= &SelectAddMul_SSE41;	-0.124939
-0.602014	Vol. 11,	-0.124939
-2.191509	been allocated.	-0.124939
-1.601644	chapter 11.	-0.124939
-3.850836	is minimized	-0.124939
-3.121718	not alias,	-0.124939
-2.915311	} 115	-0.124939
-2.679119	Intel Atom	-0.124939
-0.602014	<float, 100>	-0.124939
-0.602014	instrset_detect(); 116	-0.124939
-2.915311	} 111	-0.124939
-0.902998	mask); 110	-0.124939
-1.300845	wrap around,	-0.124939
-2.915311	} 112	-0.124939
-2.203058	option -Wstrict-overflow=2,	-0.124939
-0.902998	FuncC(i); FuncB(i+1);	-0.124939
-0.602014	operands: minimum,	-0.124939
-1.880213	called. 118	-0.124939
-2.106240	: 11;	-0.124939
-3.827713	a constant:	-0.124939
-0.602014	Warren, Jr.:	-0.124939
-2.717714	page 70).	-0.124939
-2.032546	operator (&)	-0.124939
-2.915311	} list[300]	-0.124939
-2.132431	operators (&&	-0.124939
-3.327981	// Default	-0.124939
-0.902998	remain locked	-0.124939
-0.602014	1./6., 1./24.,	-0.124939
-2.308153	instructions executed,	-0.124939
-3.449984	be annoying.	-0.124939
-1.079043	list. 94	-0.124939
-2.048340	space 91	-0.124939
-2.202919	+= 9;	-0.124939
-0.602014	9.5a: 98	-0.124939
-2.032638	Mac platform,	-0.124939
-2.950696	when exiting	-0.124939
-2.534207	some rare	-0.124939
-3.802463	of occupying	-0.124939
-3.310495	= _mm_and_si128(c2,	-0.124939
-3.654810	and getting	-0.124939
-3.663751	in Linux).	-0.124939
-2.894347	Example 7.41a	-0.124939
-2.894347	Example 7.41b	-0.124939
-1.504780	zero. Zero	-0.124939
-1.880213	<< endl;	-0.124939
-1.680779	% 0x20	-0.124939
-1.643083	PTR [eax],	-0.124939
-0.602014	7.38b. Alternative	-0.124939
-1.999353	Boolean NOT	-0.124939
-0.902998	misleading reports	-0.124939
-2.244082	big registration	-0.124939
-3.085097	an error;	-0.124939
-3.251046	function calling.	-0.124939
-1.446927	far (arrays	-0.124939
-3.291104	or __declspec(thread).	-0.124939
-2.572330	* 2.5;	-0.124939
-4.129425	the granularity	-0.124939
-1.203935	u.i &=	-0.124939
-3.449984	be accessed.	-0.124939
-4.129425	the BIOS	-0.124939
-3.827713	a quadratic	-0.124939
-2.913667	then d+e,	-0.124939
-3.850836	is re-loaded	-0.124939
-2.299599	constant 2.5,	-0.124939
-2.132569	contains writeable	-0.124939
-3.291104	or malloc/free	-0.124939
-1.203935	enabled (single	-0.124939
-2.457547	called "Gnu	-0.124939
-0.902998	xor eax,eax.	-0.124939
-3.663751	in nature,	-0.124939
-0.902998	-mssse3 /arch:SSSE2	-0.124939
-2.717714	page 27.	-0.124939
-0.602014	Wesley 1997.	-0.124939
-2.191786	classes Nowadays,	-0.124939
-1.680687	(e.g. Sandy	-0.124939
-1.379980	Performance Primitives".	-0.124939
-1.999122	name ;startofFunc	-0.124939
-3.663751	in meaningless	-0.124939
-3.067312	x ---	-0.124939
-2.419436	often disturb	-0.124939
-4.129425	the task-specific	-0.124939
-0.902998	connections. Temporary	-0.124939
-3.122826	This '1'	-0.124939
-3.310495	= dummy[0];	-0.124939
-3.654810	and replaces	-0.124939
-3.310495	= a+1;	-0.124939
-3.121718	not reproducible.	-0.124939
-2.273557	does incredibly	-0.124939
-1.715587	variable. Efficiency	-0.124939
-1.079089	valid. Re-interpreting	-0.124939
-2.695975	using inheritance.	-0.124939
-2.612545	multiple inheritance,	-0.124939
-1.079043	brand. Future	-0.124939
-1.902443	second sub-vector	-0.124939
-2.032546	operator (^)	-0.124939
-3.850836	is incurred	-0.124939
-1.962818	&& b<c)	-0.124939
-1.981624	Microsoft Foundation	-0.124939
-2.811801	other volumes	-0.124939
-1.962864	Table 12.2.	-0.124939

\3-grams:
-1.591150	it is the	-0.124939
-0.557155	on is the	-0.124939
-1.617528	code is the	-0.124939
-0.869631	This is the	-0.249877
-0.946082	this is the	-0.124939
-1.337947	It is the	-0.903090
-1.093071	which is the	-0.124939
-1.297731	set is the	-0.124939
-1.339164	pointer is the	-0.124939
-1.232970	array is the	-0.124939
-0.557155	where is the	-0.124939
-1.355925	variable is the	-0.124939
-1.047048	access is the	-0.124939
-0.956377	SSE2 is the	-0.124939
-1.143958	stack is the	-0.124939
-0.557155	calls is the	-0.124939
-1.258325	result is the	-0.124939
-1.223139	matrix is the	-0.124939
-1.451438	solution is the	-0.124939
-0.557155	i++) is the	-0.124939
-1.112197	list is the	-0.124939
-0.817491	reference is the	-0.124939
-1.047048	n is the	-0.124939
-0.956377	r is the	-0.124939
-1.258325	here is the	-0.124939
-0.557155	strings is the	-0.124939
-0.956377	containers is the	-0.124939
-0.993582	alternative is the	-0.124939
-0.557155	metaprogramming is the	-0.124939
-0.361688	cycle is the	-0.425969
-0.956377	parallelism is the	-0.124939
-0.557155	?Func@@YAXQAHAAH@Z is the	-0.124939
-0.557155	"what is the	-0.124939
-0.557155	$B1$2 is the	-0.124939
-0.557155	considering is the	-0.124939
-0.557155	burden is the	-0.124939
-0.557155	eee is the	-0.124939
-0.557155	fffff is the	-0.124939
-0.557155	eax,1 is the	-0.124939
-0.499044	be of the	-0.124939
-0.396636	function of the	-0.124939
-0.354652	code of the	-0.124939
-0.557912	use of the	-0.124939
-0.354652	more of the	-0.124939
-0.399584	because of the	-0.124939
-0.142087	which of the	-0.124939
-0.499044	all of the	-0.124939
-0.052491	one of the	-0.166331
-0.499044	class of the	-0.124939
-0.499044	each of the	-0.124939
-0.220416	most of the	-0.182931
-0.438075	size of the	-0.166331
-0.071499	multiple of the	-0.329059
-0.437415	object of the	-0.249877
-0.138603	many of the	-0.124939
-0.128404	version of the	-0.425969
-0.303401	value of the	-0.287666
-0.237148	any of the	-0.124939
-0.081324	some of the	-0.191886
-0.371413	performance of the	-0.124939
-0.283194	order of the	-0.346788
-0.675135	branch of the	-0.124939
-0.270675	member of the	-0.346788
-0.228864	address of the	-0.197489
-0.354652	call of the	-0.124939
-0.701439	out of the	-0.124939
-0.156048	part of the	-0.510290
-0.375757	bits of the	-0.124939
-0.703164	case of the	-0.124939
-0.354652	Some of the	-0.124939
-0.257159	versions of the	-0.212089
-0.209859	result of the	-0.367977
-0.354652	element of the	-0.124939
-0.354652	much of the	-0.124939
-0.340613	overflow of the	-0.425969
-0.631543	integers of the	-0.124939
-1.099181	power of the	-0.124939
-0.451735	calculation of the	-0.124939
-0.354652	parameters of the	-0.124939
-0.354652	problem of the	-0.124939
-0.702819	advantage of the	-0.301030
-0.354652	support of the	-0.124939
-0.354652	few of the	-0.124939
-0.560260	structure of the	-0.124939
-0.247832	copy of the	-0.124939
-0.499044	problems of the	-0.124939
-0.354652	space of the	-0.124939
-0.195432	implementation of the	-0.124939
-0.089389	Most of the	-0.124939
-0.474566	members of the	-0.602060
-0.742872	disadvantage of the	-0.124939
-0.354652	resources of the	-0.124939
-0.522598	end of the	-0.124939
-0.354652	runtime of the	-0.124939
-0.029214	parts of the	-0.580871
-0.571823	instead of the	-0.124939
-0.354652	Each of the	-0.124939
-0.499044	optimizations of the	-0.124939
-0.499044	Many of the	-0.124939
-0.499044	results of the	-0.124939
-0.354652	operands of the	-0.124939
-0.263909	start of the	-0.124939
-0.586118	overhead of the	-0.124939
-0.263909	installation of the	-0.124939
-0.161531	instance of the	-0.124939
-0.499044	output of the	-0.124939
-0.354652	task of the	-0.124939
-0.540242	efficiency of the	-0.124939
-0.463281	discussion of the	-0.124939
-0.340613	offset of the	-0.124939
-0.706876	sake of the	-0.124939
-0.340613	effect of the	-0.124939
-0.354652	frequency of the	-0.124939
-0.142087	iteration of the	-0.124939
-0.354652	models of the	-0.124939
-0.499044	names of the	-0.124939
-0.560260	loading of the	-0.124939
-0.499044	reading of the	-0.124939
-0.354652	situation of the	-0.124939
-0.700745	implementations of the	-0.124939
-0.499044	sizes of the	-0.124939
-0.142087	fraction of the	-0.124939
-0.641643	length of the	-0.124939
-0.120091	beginning of the	-0.191886
-0.499044	declaration of the	-0.124939
-0.560260	features of the	-0.124939
-0.609924	waste of the	-0.124939
-0.263909	independent of the	-0.124939
-0.354652	help of the	-0.124939
-0.695501	explanation of the	-0.124939
-0.354652	logic of the	-0.124939
-0.338644	care of the	-0.124939
-0.594440	purpose of the	-0.124939
-0.340613	instances of the	-0.124939
-0.499044	body of the	-0.124939
-0.354652	changes of the	-0.124939
-0.594440	representation of the	-0.124939
-0.036082	responsibility of the	-0.492916
-0.142087	reciprocal of the	-0.425969
-0.354652	polynomial of the	-0.124939
-0.782108	scope of the	-0.124939
-0.560260	throughput of the	-0.124939
-0.499044	step of the	-0.124939
-0.473295	regardless of the	-0.124939
-0.499044	compilation of the	-0.124939
-0.340613	behavior of the	-0.124939
-0.354652	job of the	-0.124939
-0.560260	requirements of the	-0.124939
-0.051399	rest of the	-0.124939
-0.499044	latency of the	-0.124939
-0.354652	facilities of the	-0.124939
-0.354652	estimate of the	-0.124939
-0.089389	ownership of the	-0.602060
-0.340613	drawbacks of the	-0.425969
-0.354652	modification of the	-0.124939
-0.354652	modifications of the	-0.124939
-0.354652	lengths of the	-0.124939
-0.089389	50% of the	-0.124939
-0.499044	bytes) of the	-0.124939
-0.354652	resolution of the	-0.124939
-0.499044	redesign of the	-0.124939
-0.354652	analysis of the	-0.124939
-0.499044	contents of the	-0.124939
-0.354652	generality of the	-0.124939
-0.499044	track of the	-0.124939
-0.263909	rid of the	-0.124939
-0.354652	decomposition of the	-0.124939
-0.354652	Documentation of the	-0.124939
-0.089389	none of the	-0.301030
-0.354652	indeed of the	-0.124939
-0.354652	beware of the	-0.124939
-0.354652	90% of the	-0.124939
-0.354652	lowest of the	-0.124939
-0.499044	understanding of the	-0.124939
-0.142087	99% of the	-0.425969
-0.354652	expansions of the	-0.124939
-0.142087	10% of the	-0.425969
-0.354652	1/50 of the	-0.124939
-0.354652	architecture of the	-0.124939
-0.354652	flexibility of the	-0.124939
-0.354652	decimals of the	-0.124939
-0.354652	None of the	-0.124939
-0.354652	bility of the	-0.124939
-0.354652	evaluation of the	-0.124939
-0.354652	bias of the	-0.124939
-0.354652	overview of the	-0.124939
-0.354652	knowledge of the	-0.124939
-0.354652	x86) of the	-0.124939
-0.354652	searches of the	-0.124939
-0.354652	systematization of the	-0.124939
-0.354652	fragmentation of the	-0.124939
-0.354652	Much of the	-0.124939
-0.354652	segmentation of the	-0.124939
-0.354652	dimensions of the	-0.124939
-0.354652	investigation of the	-0.124939
-0.354652	compactness of the	-0.124939
-0.354652	nature of the	-0.124939
-0.354652	clarity of the	-0.124939
-1.085853	a to the	-0.124939
-0.622701	it to the	-0.124939
-1.082668	code to the	-0.124939
-0.710422	as to the	-0.124939
-1.285024	compiler to the	-0.124939
-0.223737	x to the	-0.602060
-0.699248	memory to the	-0.425969
-0.710422	data to the	-0.124939
-0.710422	only to the	-0.124939
-0.498443	point to the	-0.249877
-0.495128	all to the	-0.124939
-1.121494	integer to the	-0.124939
-0.380779	pointer to the	-0.301030
-0.814944	object to the	-0.124939
-0.878456	number to the	-0.124939
-0.180523	static to the	-0.425969
-0.634077	call to the	-0.249877
-0.784107	pointers to the	-0.124939
-0.460655	access to the	-0.124939
-0.710422	16 to the	-0.124939
-0.495128	instructions to the	-0.124939
-0.710422	available to the	-0.124939
-0.180523	constant to the	-0.124939
-1.164908	important to the	-0.124939
-1.034700	calls to the	-0.124939
-0.495128	execution to the	-0.124939
-1.108362	known to the	-0.124939
-0.495128	add to the	-0.124939
-0.495128	write to the	-0.124939
-0.180523	parameter to the	-0.124939
-0.641658	reference to the	-0.301030
-0.710422	n to the	-0.124939
-0.437890	addition to the	-0.124939
-0.495128	transferred to the	-0.124939
-0.180523	interface to the	-0.124939
-0.514511	goes to the	-0.124939
-0.710422	link to the	-0.124939
-0.710422	made to the	-0.124939
-0.562477	points to the	-0.124939
-0.710422	go to the	-0.124939
-0.710422	overhead to the	-0.124939
-0.052256	relative to the	-0.425969
-0.437890	writing to the	-0.425969
-0.495128	clear to the	-0.124939
-0.495128	iteration to the	-0.124939
-0.878456	changed to the	-0.124939
-0.495128	updates to the	-0.124939
-0.334234	directly to the	-0.124939
-0.223737	copied to the	-0.124939
-0.710422	similar to the	-0.124939
-0.878456	back to the	-0.124939
-0.495128	row to the	-0.124939
-0.387325	added to the	-0.124939
-0.995484	applies to the	-0.124939
-0.495128	eax to the	-0.124939
-0.495128	throw() to the	-0.124939
-0.495128	inputs to the	-0.124939
-0.180523	distributed to the	-0.425969
-0.063507	equal to the	-0.124939
-0.495128	reads to the	-0.124939
-0.495128	column to the	-0.124939
-0.574309	due to the	-0.124939
-0.710422	obvious to the	-0.124939
-0.710422	swapped to the	-0.124939
-0.180523	limit to the	-0.124939
-0.514511	belong to the	-0.124939
-0.495128	place to the	-0.124939
-0.437890	thanks to the	-0.124939
-0.814944	alternatives to the	-0.124939
-0.495128	modifications to the	-0.124939
-0.080949	according to the	-0.124939
-0.814944	extended to the	-0.124939
-0.495128	inconvenient to the	-0.124939
-0.710422	inferior to the	-0.124939
-0.180523	annoying to the	-0.124939
-0.710422	translated to the	-0.124939
-0.814944	leads to the	-0.124939
-0.710422	compared to the	-0.124939
-0.495128	refers to the	-0.124939
-0.710422	belongs to the	-0.124939
-0.495128	caller to the	-0.124939
-0.180523	contribution to the	-0.425969
-0.495128	messages to the	-0.124939
-0.495128	extension to the	-0.124939
-0.495128	Updates to the	-0.124939
-0.495128	unacceptable to the	-0.124939
-0.495128	According to the	-0.124939
-0.495128	closest to the	-0.124939
-0.495128	conform to the	-0.124939
-0.495128	closer to the	-0.124939
-0.495128	adapt to the	-0.124939
-1.028624	function and the	-0.124939
-0.836526	compiler and the	-0.124939
-0.170729	CPU and the	-0.726999
-0.584510	cache and the	-0.425969
-0.836526	library and the	-0.124939
-0.727296	2 and the	-0.124939
-0.505328	elements and the	-0.124939
-0.961464	called and the	-0.124939
-1.079776	pointers and the	-0.124939
-0.727296	32 and the	-0.124939
-0.949343	file and the	-0.124939
-1.079627	0 and the	-0.124939
-0.982807	processors and the	-0.124939
-0.634451	times and the	-0.124939
-1.132187	Windows and the	-0.124939
-0.505328	calculations and the	-0.124939
-0.338900	processor and the	-0.124939
-0.949343	language and the	-0.124939
-0.505328	etc. and the	-0.124939
-0.903387	allocated and the	-0.124939
-0.505328	parameters and the	-0.124939
-0.444568	count and the	-0.124939
-0.727296	microprocessor and the	-0.124939
-0.961464	branches and the	-0.124939
-0.505328	development and the	-0.124939
-0.727296	name and the	-0.124939
-0.505328	applications and the	-0.124939
-0.727296	framework and the	-0.124939
-0.505328	later and the	-0.124939
-0.505328	b, and the	-0.124939
-0.505328	options and the	-0.124939
-0.727296	constructor and the	-0.124939
-0.338900	function, and the	-0.124939
-0.836526	smaller and the	-0.124939
-0.505328	contentions and the	-0.124939
-0.903536	platforms and the	-0.124939
-0.505328	level-1 and the	-0.124939
-0.505328	math and the	-0.124939
-0.727296	alignment and the	-0.124939
-0.836526	thing and the	-0.124939
-0.836526	program, and the	-0.124939
-0.505328	compiler, and the	-0.124939
-0.182973	declaration and the	-0.124939
-0.505328	away and the	-0.124939
-0.727296	15.1b and the	-0.124939
-0.727296	integer, and the	-0.124939
-0.505328	vector, and the	-0.124939
-0.727296	linker and the	-0.124939
-0.903536	possible, and the	-0.124939
-0.505328	.NET and the	-0.124939
-0.505328	obvious and the	-0.124939
-0.505328	latency and the	-0.124939
-0.505328	resources, and the	-0.124939
-0.727296	overflow, and the	-0.124939
-0.727296	advance and the	-0.124939
-0.505328	resolution and the	-0.124939
-0.505328	bits, and the	-0.124939
-0.505328	B, and the	-0.124939
-0.505328	API and the	-0.124939
-0.727296	level, and the	-0.124939
-0.505328	parameter, and the	-0.124939
-0.505328	easiest and the	-0.124939
-0.727296	flow and the	-0.124939
-0.505328	hint and the	-0.124939
-0.505328	itself, and the	-0.124939
-0.727296	exponent, and the	-0.124939
-0.505328	properly and the	-0.124939
-0.505328	support, and the	-0.124939
-0.505328	tedious and the	-0.124939
-0.505328	statistics, and the	-0.124939
-0.505328	(0); and the	-0.124939
-0.505328	sufficient, and the	-0.124939
-0.505328	! and the	-0.124939
-0.505328	source, and the	-0.124939
-0.505328	T+6, and the	-0.124939
-0.505328	terminated and the	-0.124939
-0.505328	www.agner.org/optimize and the	-0.124939
-0.505328	call, and the	-0.124939
-0.505328	libmmt.lib and the	-0.124939
-0.505328	process, and the	-0.124939
-0.505328	workplace and the	-0.124939
-0.505328	www.openmp.org and the	-0.124939
-0.505328	++i and the	-0.124939
-0.505328	lower; and the	-0.124939
-0.505328	(&) and the	-0.124939
-0.088851	be in the	-0.124939
-0.795949	or in the	-0.124939
-0.589309	it in the	-0.124939
-0.972869	function in the	-0.124939
-0.811655	code in the	-0.124939
-0.497818	not in the	-0.124939
-0.856545	int in the	-0.124939
-0.888548	than in the	-0.124939
-0.555515	compiler in the	-0.124939
-0.625957	time in the	-0.124939
-0.582588	data in the	-0.124939
-0.262304	because in the	-0.425969
-0.497818	functions in the	-0.124939
-0.088851	only in the	-0.124939
-0.338460	other in the	-0.124939
-0.941492	loop in the	-0.124939
-0.347418	used in the	-0.301030
-0.391518	integer in the	-0.124939
-0.262304	set in the	-0.124939
-0.351690	example in the	-0.124939
-0.555515	size in the	-0.124939
-0.555515	object in the	-0.124939
-0.351690	number in the	-0.124939
-0.430871	value in the	-0.124939
-0.636165	objects in the	-0.124939
-0.476629	variable in the	-0.124939
-0.657632	variables in the	-0.124939
-0.262304	table in the	-0.124939
-0.494919	way in the	-0.124939
-0.724505	elements in the	-0.124939
-0.783715	stored in the	-0.301030
-0.671011	called in the	-0.124939
-0.494919	address in the	-0.124939
-0.351690	4 in the	-0.124939
-0.391518	example, in the	-0.124939
-0.555515	bit in the	-0.124939
-0.671011	registers in the	-0.124939
-0.351690	test in the	-0.124939
-0.746166	useful in the	-0.124939
-0.351690	even in the	-0.124939
-0.494919	operations in the	-0.124939
-0.351690	type in the	-0.124939
-0.351690	instructions in the	-0.124939
-0.359285	available in the	-0.249877
-0.351690	error in the	-0.124939
-0.680453	times in the	-0.124939
-0.671011	stack in the	-0.124939
-0.953347	accessed in the	-0.124939
-0.351690	while in the	-0.124939
-0.494919	arrays in the	-0.124939
-0.064877	calls in the	-0.249877
-0.589309	bytes in the	-0.124939
-0.338460	threads in the	-0.425969
-0.351690	necessary in the	-0.124939
-0.696531	element in the	-0.124939
-0.351690	language in the	-0.124939
-0.494919	But in the	-0.124939
-0.351690	small in the	-0.124939
-0.351690	overflow in the	-0.124939
-0.141178	option in the	-0.425969
-0.088851	classes in the	-0.124939
-0.141178	works in the	-0.124939
-0.236177	explained in the	-0.221849
-0.623980	implemented in the	-0.124939
-0.589309	advantage in the	-0.124939
-0.351690	support in the	-0.124939
-0.555515	supported in the	-0.124939
-0.610925	run in the	-0.124939
-0.351690	hardware in the	-0.124939
-0.262304	values in the	-0.124939
-0.555515	information in the	-0.124939
-0.351690	cycles in the	-0.124939
-0.246440	addresses in the	-0.602060
-0.494919	counter in the	-0.124939
-0.064877	space in the	-0.425969
-0.856545	dispatching in the	-0.124939
-0.351690	preferably in the	-0.124939
-0.351690	see in the	-0.124939
-0.494919	handling in the	-0.124939
-0.351690	members in the	-0.124939
-0.351690	methods in the	-0.124939
-0.351690	name in the	-0.124939
-0.351690	zero in the	-0.124939
-0.336603	running in the	-0.301030
-0.494919	dispatcher in the	-0.124939
-0.555515	examples in the	-0.124939
-0.262304	mechanism in the	-0.425969
-0.141178	microprocessors in the	-0.425969
-0.351690	later in the	-0.124939
-0.686219	together in the	-0.124939
-0.387243	declared in the	-0.124939
-0.351690	goes in the	-0.124939
-0.351690	options in the	-0.124939
-0.351690	were in the	-0.124939
-0.351690	points in the	-0.124939
-0.686219	around in the	-0.124939
-0.277383	contentions in the	-0.301030
-0.589309	references in the	-0.124939
-0.351690	overhead in the	-0.124939
-0.351690	change in the	-0.124939
-0.351690	conversions in the	-0.124939
-0.494919	statement in the	-0.124939
-0.328442	described in the	-0.124939
-0.610925	lines in the	-0.124939
-0.351690	operation in the	-0.124939
-0.610925	given in the	-0.124939
-0.351690	S1 in the	-0.124939
-0.494919	database in the	-0.124939
-0.351690	constants in the	-0.124939
-0.690106	strings in the	-0.124939
-0.351690	macro in the	-0.124939
-0.351690	100 in the	-0.124939
-0.494919	containers in the	-0.124939
-0.351690	priority in the	-0.124939
-0.351690	names in the	-0.124939
-0.494919	rows in the	-0.124939
-0.351690	fail in the	-0.124939
-0.351690	structures in the	-0.124939
-0.246440	occur in the	-0.301030
-0.351690	improved in the	-0.124939
-0.494919	discussed in the	-0.124939
-0.141178	delay in the	-0.425969
-0.141178	either in the	-0.124939
-0.262304	except in the	-0.124939
-0.351690	back in the	-0.124939
-0.351690	happen in the	-0.124939
-0.494919	away in the	-0.124939
-0.555515	provided in the	-0.124939
-0.351690	chains in the	-0.124939
-0.555515	mentioned in the	-0.124939
-0.246440	included in the	-0.124939
-0.351690	account in the	-0.124939
-0.494919	algorithms in the	-0.124939
-0.494919	additions in the	-0.124939
-0.351690	factors in the	-0.124939
-0.514551	listed in the	-0.124939
-0.351690	interpreted in the	-0.124939
-0.141178	misses in the	-0.124939
-0.351690	YMM in the	-0.124939
-0.351690	free in the	-0.124939
-0.555515	saved in the	-0.124939
-0.351690	changes in the	-0.124939
-0.351690	units in the	-0.124939
-0.351690	reciprocal in the	-0.124939
-0.141178	spent in the	-0.124939
-0.494919	occurs in the	-0.124939
-0.351690	step in the	-0.124939
-0.141178	spots in the	-0.124939
-0.351690	places in the	-0.124939
-0.351690	evaluated in the	-0.124939
-0.141178	portable in the	-0.425969
-0.351690	recover in the	-0.124939
-0.351690	advice in the	-0.124939
-0.351690	already in the	-0.124939
-0.262304	seen in the	-0.124939
-0.351690	key in the	-0.124939
-0.088851	appear in the	-0.124939
-0.351690	flag in the	-0.124939
-0.351690	present in the	-0.124939
-0.351690	place in the	-0.124939
-0.351690	serial in the	-0.124939
-0.351690	modifications in the	-0.124939
-0.494919	missing in the	-0.124939
-0.351690	lengths in the	-0.124939
-0.064877	Contentions in the	-0.124939
-0.141178	breakpoint in the	-0.124939
-0.351690	appears in the	-0.124939
-0.351690	handler in the	-0.124939
-0.351690	changing in the	-0.124939
-0.494919	kept in the	-0.124939
-0.351690	techniques in the	-0.124939
-0.351690	FPGA in the	-0.124939
-0.494919	consecutively in the	-0.124939
-0.351690	abstraction in the	-0.124939
-0.351690	updating in the	-0.124939
-0.351690	instruments in the	-0.124939
-0.351690	wasteful in the	-0.124939
-0.141178	lies in the	-0.425969
-0.351690	elsewhere in the	-0.124939
-0.494919	supplied in the	-0.124939
-0.351690	delays in the	-0.124939
-0.351690	logarithms in the	-0.124939
-0.351690	kernel in the	-0.124939
-0.351690	(PLT) in the	-0.124939
-0.494919	shown in the	-0.124939
-0.351690	disabled in the	-0.124939
-0.141178	relocations in the	-0.425969
-0.351690	locally in the	-0.124939
-0.351690	answers in the	-0.124939
-0.351690	inserted in the	-0.124939
-0.351690	visible in the	-0.124939
-0.351690	everywhere in the	-0.124939
-0.351690	pragmas in the	-0.124939
-0.351690	alone in the	-0.124939
-0.351690	time-consumer in the	-0.124939
-0.141178	Optimizations in the	-0.425969
-0.351690	parallelization in the	-0.124939
-0.351690	overdetermined in the	-0.124939
-0.351690	integrated in the	-0.124939
-0.351690	annotation in the	-0.124939
-0.351690	previously in the	-0.124939
-0.351690	stay in the	-0.124939
-0.351690	dominate in the	-0.124939
-0.351690	dot in the	-0.124939
-0.351690	alleviated in the	-0.124939
-0.351690	positions in the	-0.124939
-0.351690	Numbers in the	-0.124939
-0.351690	grow in the	-0.124939
-0.351690	flaws in the	-0.124939
-0.351690	mirrored in the	-0.124939
-0.351690	Programming in the	-0.124939
-0.351690	Nothing in the	-0.124939
-0.351690	contiguously in the	-0.124939
-0.351690	UnusedFiller in the	-0.124939
-0.351690	version) in the	-0.124939
-0.351690	foremost, in the	-0.124939
-0.351690	glitches in the	-0.124939
-0.351690	predictions in the	-0.124939
-0.351690	anywhere in the	-0.124939
-0.351690	resized in the	-0.124939
-0.719887	is for the	-0.124939
-0.719887	or for the	-0.124939
-0.601878	function for the	-0.124939
-0.454346	code for the	-0.124939
-0.405666	time for the	-0.425969
-0.770159	data for the	-0.124939
-0.447013	program for the	-0.124939
-0.897933	functions for the	-0.124939
-0.998789	used for the	-0.124939
-0.567816	one for the	-0.124939
-0.168366	cache for the	-0.124939
-0.168366	set for the	-0.124939
-0.447013	size for the	-0.124939
-0.634021	object for the	-0.124939
-0.634021	array for the	-0.124939
-0.210327	possible for the	-0.301030
-0.803375	version for the	-0.124939
-0.827021	variables for the	-0.124939
-0.803375	performance for the	-0.124939
-0.447013	called for the	-0.124939
-0.770159	register for the	-0.124939
-0.634021	registers for the	-0.124939
-0.719887	need for the	-0.124939
-0.447013	test for the	-0.124939
-0.982969	useful for the	-0.124939
-0.867069	file for the	-0.124939
-0.844735	instructions for the	-0.124939
-0.915295	available for the	-0.124939
-0.634021	important for the	-0.124939
-0.447013	large for the	-0.124939
-0.324874	compiled for the	-0.124939
-0.447013	big for the	-0.124939
-0.878564	option for the	-0.124939
-0.405666	good for the	-0.124939
-0.858511	optimized for the	-0.124939
-0.858675	check for the	-0.124939
-0.447013	solution for the	-0.124939
-0.741782	support for the	-0.124939
-0.719887	1 for the	-0.124939
-0.634021	information for the	-0.124939
-0.634021	files for the	-0.124939
-0.447013	above for the	-0.124939
-0.447013	space for the	-0.124939
-0.447013	caching for the	-0.124939
-0.634021	handling for the	-0.124939
-0.719887	name for the	-0.124939
-0.447013	disadvantage for the	-0.124939
-0.634021	difference for the	-0.124939
-0.803375	needed for the	-0.124939
-0.076075	difficult for the	-0.249877
-0.634021	options for the	-0.124939
-0.447013	appropriate for the	-0.124939
-0.447013	constructor for the	-0.124939
-0.447013	relevant for the	-0.124939
-0.447013	destructor for the	-0.124939
-0.447013	them for the	-0.124939
-0.827021	compiling for the	-0.124939
-0.104713	easier for the	-0.301030
-0.447013	identical for the	-0.124939
-0.770159	except for the	-0.124939
-0.447013	stride for the	-0.124939
-0.634021	enough for the	-0.124939
-0.168366	chosen for the	-0.124939
-0.447013	included for the	-0.124939
-0.447013	factors for the	-0.124939
-0.447013	poorly for the	-0.124939
-0.405666	wait for the	-0.124939
-0.447013	Testing for the	-0.124939
-0.311425	(except for the	-0.124939
-0.447013	51 for the	-0.124939
-0.634021	documentation for the	-0.124939
-0.447013	blocking for the	-0.124939
-0.447013	competing for the	-0.124939
-0.168366	unusual for the	-0.124939
-0.634021	suited for the	-0.124939
-0.447013	proxy for the	-0.124939
-0.447013	consistent for the	-0.124939
-0.447013	unnecessary for the	-0.124939
-0.447013	accurate for the	-0.124939
-0.168366	contend for the	-0.425969
-0.447013	subroutine for the	-0.124939
-0.447013	preferences for the	-0.124939
-0.447013	Correction for the	-0.124939
-0.447013	FAQ for the	-0.124939
-0.447013	maintained for the	-0.124939
-0.447013	122) for the	-0.124939
-0.447013	Except for the	-0.124939
-0.447013	compensate for the	-0.124939
-0.447013	compete for the	-0.124939
-0.447013	standards for the	-0.124939
-0.447013	specifically for the	-0.124939
-0.447013	Prototype for the	-0.124939
-0.447013	correction for the	-0.124939
-0.265173	is that the	-0.178184
-0.730876	and that the	-0.124939
-0.963059	code that the	-0.124939
-0.560503	compiler that the	-0.124939
-0.583637	instruction that the	-0.124939
-0.702604	class that the	-0.124939
-0.448183	so that the	-0.124939
-0.413600	long that the	-0.124939
-0.356547	sure that the	-0.124939
-0.159319	case that the	-0.124939
-0.294802	important that the	-0.124939
-0.583637	work that the	-0.124939
-0.583637	avoid that the	-0.124939
-0.583637	check that the	-0.124939
-0.583637	problem that the	-0.124939
-0.583637	advantage that the	-0.124939
-0.659225	likely that the	-0.124939
-0.413600	calculate that the	-0.124939
-0.413600	copy that the	-0.124939
-0.799514	certain that the	-0.124939
-0.583637	fast that the	-0.124939
-0.702604	problems that the	-0.124939
-0.294802	see that the	-0.124939
-0.583637	block that the	-0.124939
-0.413600	name that the	-0.124939
-0.099481	disadvantage that the	-0.301030
-0.803541	means that the	-0.124939
-0.784164	requires that the	-0.124939
-0.471222	assume that the	-0.124939
-0.659225	feature that the	-0.124939
-0.214967	require that the	-0.124939
-0.759728	things that the	-0.124939
-0.413600	reductions that the	-0.124939
-0.152222	fact that the	-0.124939
-0.881074	Assume that the	-0.124939
-0.382606	possibility that the	-0.124939
-0.413600	discussion that the	-0.124939
-0.237071	Note that the	-0.204120
-0.413600	predict that the	-0.124939
-0.413600	frequency that the	-0.124939
-0.413600	consider that the	-0.124939
-0.413600	delay that the	-0.124939
-0.583637	risk that the	-0.124939
-0.159319	provided that the	-0.124939
-0.524490	sense that the	-0.124939
-0.583637	notice that the	-0.124939
-0.413600	expected that the	-0.124939
-0.413600	detect that the	-0.124939
-0.413600	estimate that the	-0.124939
-0.413600	said that the	-0.124939
-0.583637	assumption that the	-0.124939
-0.294802	recognize that the	-0.124939
-0.583637	assuming that the	-0.124939
-0.413600	chance that the	-0.124939
-0.583637	believe that the	-0.124939
-0.583637	Assuming that the	-0.124939
-0.413600	certainty that the	-0.124939
-0.583637	says that the	-0.124939
-0.413600	forgets that the	-0.124939
-0.413600	illogical that the	-0.124939
-0.413600	assumed that the	-0.124939
-0.413600	emphasized that the	-0.124939
-0.413600	complication that the	-0.124939
-0.413600	unlikely that the	-0.124939
-0.413600	knowing that the	-0.124939
-1.727298	to be the	-0.124939
-2.146801	may be the	-0.124939
-0.880188	course be the	-0.124939
-1.038833	b are the	-0.124939
-0.588190	problem are the	-0.124939
-0.588190	inputs are the	-0.124939
-0.588190	algebra are the	-0.124939
-0.588190	principles are the	-0.124939
-0.570878	vector or the	-0.124939
-0.991642	loop or the	-0.124939
-1.187837	array or the	-0.124939
-0.991642	way or the	-0.124939
-1.018907	aligned or the	-0.124939
-0.570878	positive or the	-0.124939
-0.842816	overloaded or the	-0.124939
-0.570878	possible, or the	-0.124939
-0.842816	reference, or the	-0.124939
-0.570878	searching, or the	-0.124939
-0.598155	access it the	-0.124939
-0.504416	that if the	-0.124939
-0.379936	or if the	-0.124939
-0.425357	function if the	-0.124939
-0.504416	not if the	-0.124939
-0.233821	than if the	-0.124939
-0.300571	compiler if the	-0.124939
-0.476433	memory if the	-0.124939
-0.425357	functions if the	-0.124939
-0.379844	only if the	-0.124939
-0.300571	instruction if the	-0.124939
-0.300571	point if the	-0.124939
-0.425433	loop if the	-0.124939
-0.601326	used if the	-0.124939
-0.300571	one if the	-0.124939
-0.476433	integer if the	-0.124939
-0.300571	double if the	-0.124939
-0.221619	efficient if the	-0.124939
-0.233821	possible if the	-0.124939
-0.175919	2 if the	-0.249877
-0.124786	performance if the	-0.124939
-0.425357	branch if the	-0.124939
-0.300571	way if the	-0.124939
-0.300892	faster if the	-0.221849
-0.514409	example, if the	-0.124939
-0.300571	systems if the	-0.124939
-0.124786	useful if the	-0.124939
-0.228303	even if the	-0.124939
-0.300571	system if the	-0.124939
-0.425357	available if the	-0.124939
-0.300571	up if the	-0.124939
-0.300571	error if the	-0.124939
-0.425357	CPUs if the	-0.124939
-0.300571	best if the	-0.124939
-0.425357	necessary if the	-0.124939
-0.300571	element if the	-0.124939
-0.265985	But if the	-0.124939
-0.300571	speed if the	-0.124939
-0.476433	thread if the	-0.124939
-0.300571	integers if the	-0.124939
-0.425433	check if the	-0.124939
-0.124816	advantageous if the	-0.124939
-0.124786	problem if the	-0.124939
-0.300571	advantage if the	-0.124939
-0.425357	mode if the	-0.124939
-0.425357	well if the	-0.124939
-0.425357	fast if the	-0.124939
-0.124786	problems if the	-0.124939
-0.504416	see if the	-0.124939
-0.425357	implementation if the	-0.124939
-0.300571	complicated if the	-0.124939
-0.300571	methods if the	-0.124939
-0.425357	disadvantage if the	-0.124939
-0.300571	reference if the	-0.124939
-0.300571	lookup if the	-0.124939
-0.124786	needed if the	-0.425969
-0.300571	vectors if the	-0.124939
-0.300571	results if the	-0.124939
-0.300571	operands if the	-0.124939
-0.300571	here if the	-0.124939
-0.124786	contentions if the	-0.425969
-0.300571	predicted if the	-0.124939
-0.425357	errors if the	-0.124939
-0.425357	inefficient if the	-0.124939
-0.300571	platforms if the	-0.124939
-0.425357	vectorized if the	-0.124939
-0.300571	inlined if the	-0.124939
-0.300571	further if the	-0.124939
-0.300571	disk if the	-0.124939
-0.300571	obtained if the	-0.124939
-0.504416	efficiently if the	-0.124939
-0.300571	models if the	-0.124939
-0.300686	fail if the	-0.124939
-0.300571	target if the	-0.124939
-0.162560	especially if the	-0.124939
-0.300571	updates if the	-0.124939
-0.504416	consider if the	-0.124939
-0.300571	directly if the	-0.124939
-0.425357	loops if the	-0.124939
-0.124786	happen if the	-0.425969
-0.300571	matter if the	-0.124939
-0.300571	pure if the	-0.124939
-0.300571	cycle if the	-0.124939
-0.300571	frequent if the	-0.124939
-0.300571	however, if the	-0.124939
-0.300571	Likewise, if the	-0.124939
-0.300571	compact if the	-0.124939
-0.300571	course, if the	-0.124939
-0.300571	complex if the	-0.124939
-0.300571	Test if the	-0.124939
-0.300571	happens if the	-0.124939
-0.300571	permissible if the	-0.124939
-0.300571	(i.e. if the	-0.124939
-0.079089	eliminated if the	-0.301030
-0.300571	selected if the	-0.124939
-0.300571	delays if the	-0.124939
-0.300571	(STL) if the	-0.124939
-0.300571	determine if the	-0.124939
-0.300571	mispredictions if the	-0.124939
-0.300571	WriteFile if the	-0.124939
-0.300571	(YMM) if the	-0.124939
-0.300571	sign-bit if the	-0.124939
-0.300571	ignored if the	-0.124939
-0.300571	(XMM) if the	-0.124939
-0.300571	minute if the	-0.124939
-0.300571	minimized if the	-0.124939
-0.715958	and by the	-0.124939
-0.420413	not by the	-0.124939
-0.454721	than by the	-0.124939
-0.593753	more by the	-0.124939
-0.969056	loop by the	-0.124939
-0.802429	used by the	-0.124939
-0.420413	stored by the	-0.124939
-0.298243	called by the	-0.124939
-0.593753	address by the	-0.124939
-0.420413	call by the	-0.124939
-0.671292	out by the	-0.124939
-0.593753	arrays by the	-0.124939
-0.454721	done by the	-0.124939
-0.824325	calculated by the	-0.124939
-0.593753	implemented by the	-0.124939
-0.209284	supported by the	-0.182931
-0.420413	automatically by the	-0.124939
-0.420413	needed by the	-0.124939
-0.824325	aligned by the	-0.124939
-0.294188	divisible by the	-0.279841
-0.762591	replaced by the	-0.124939
-0.420413	predicted by the	-0.124939
-0.298243	limited by the	-0.124939
-0.715958	obtained by the	-0.124939
-0.420413	square by the	-0.124939
-0.420413	converted by the	-0.124939
-0.420413	additions by the	-0.124939
-0.277451	determined by the	-0.124939
-0.073165	generated by the	-0.249877
-0.420413	illustrated by the	-0.124939
-0.202446	multiplied by the	-0.301030
-0.100576	modified by the	-0.301030
-0.420413	manually by the	-0.124939
-0.420413	ArraySize by the	-0.124939
-0.420413	relocated by the	-0.124939
-0.420413	bitfield by the	-0.124939
-0.420413	investigated by the	-0.124939
-0.420413	caught by the	-0.124939
-0.420413	indicated by the	-0.124939
-0.420413	influenced by the	-0.124939
-0.420413	activated by the	-0.124939
-0.671385	or with the	-0.124939
-0.716061	it with the	-0.124939
-0.423208	function with the	-0.425969
-0.716061	compiler with the	-0.124939
-0.671385	CPU with the	-0.124939
-0.484299	library with the	-0.425969
-0.420466	variable with the	-0.124939
-0.161220	bits with the	-0.124939
-0.745254	processors with the	-0.124939
-0.100584	up with the	-0.124939
-0.671385	accessed with the	-0.124939
-0.891382	compiled with the	-0.124939
-0.499647	threads with the	-0.425969
-0.298270	compile with the	-0.425969
-0.420466	overflow with the	-0.124939
-0.671385	integers with the	-0.124939
-0.472987	done with the	-0.124939
-0.593831	calculated with the	-0.124939
-0.745254	problem with the	-0.124939
-0.420466	types with the	-0.124939
-0.593831	declared with the	-0.124939
-0.593831	link with the	-0.124939
-0.420466	points with the	-0.124939
-0.593831	things with the	-0.124939
-0.856893	compatible with the	-0.124939
-0.796324	comes with the	-0.124939
-0.593831	vectorized with the	-0.124939
-0.277474	obtained with the	-0.124939
-0.420466	N with the	-0.124939
-0.420466	directly with the	-0.124939
-0.298270	come with the	-0.124939
-0.420466	happen with the	-0.124939
-0.420466	included with the	-0.124939
-0.420466	c2 with the	-0.124939
-0.420466	DLL with the	-0.124939
-0.202462	multiplying with the	-0.124939
-0.420466	bc with the	-0.124939
-0.420466	separately with the	-0.124939
-0.796324	AND'ed with the	-0.124939
-0.420466	combined with the	-0.124939
-0.420466	entry with the	-0.124939
-0.420466	tests with the	-0.124939
-0.671385	satisfied with the	-0.124939
-0.298270	Comes with the	-0.124939
-0.420466	compared with the	-0.124939
-0.593831	Microprocessors with the	-0.124939
-0.420466	conflicting with the	-0.124939
-0.420466	configurations with the	-0.124939
-0.420466	unsatisfied with the	-0.124939
-0.420466	rewritten with the	-0.124939
-0.420466	coincides with the	-0.124939
-0.420466	repeatedly with the	-0.124939
-0.420466	fighting with the	-0.124939
-0.094533	than on the	-0.301030
-0.356188	compiler on the	-0.124939
-0.694023	time on the	-0.124939
-0.356188	memory on the	-0.124939
-0.356188	program on the	-0.124939
-0.703733	only on the	-0.124939
-0.501188	version on the	-0.124939
-0.356188	objects on the	-0.124939
-0.042523	stored on the	-0.425969
-0.562729	processors on the	-0.124939
-0.502316	work on the	-0.124939
-0.248553	calculations on the	-0.124939
-0.619131	best on the	-0.124939
-0.356188	much on the	-0.124939
-0.501188	overflow on the	-0.124939
-0.356188	done on the	-0.124939
-0.501188	parameters on the	-0.124939
-0.356188	optimal on the	-0.124939
-0.356188	space on the	-0.124939
-0.438638	running on the	-0.124939
-0.142558	transferred on the	-0.425969
-0.501188	optimizations on the	-0.124939
-0.356188	graphics on the	-0.124939
-0.356188	together on the	-0.124939
-0.356188	storage on the	-0.124939
-0.483311	based on the	-0.124939
-0.501188	around on the	-0.124939
-0.137450	depends on the	-0.170696
-0.356188	compatible on the	-0.124939
-0.124524	depending on the	-0.249877
-0.356188	comes on the	-0.124939
-0.434014	rely on the	-0.249877
-0.341728	effect on the	-0.124939
-0.356188	especially on the	-0.124939
-0.356188	15 on the	-0.124939
-0.299375	depend on the	-0.124939
-0.356188	list, on the	-0.124939
-0.356188	compromise on the	-0.124939
-0.619131	relies on the	-0.124939
-0.356188	appears on the	-0.124939
-0.501188	μs on the	-0.124939
-0.356188	focus on the	-0.124939
-0.356188	forums on the	-0.124939
-0.356188	interpretation on the	-0.124939
-0.356188	influence on the	-0.124939
-0.356188	efforts on the	-0.124939
-0.356188	Turn on the	-0.124939
-0.356188	Storage on the	-0.124939
-0.356188	pushed on the	-0.124939
-0.356188	(depending on the	-0.124939
-0.356188	relying on the	-0.124939
-0.596506	to code the	-0.124939
-0.778215	time as the	-0.124939
-0.800481	same as the	-0.124939
-1.091770	such as the	-0.124939
-1.184032	long as the	-0.124939
-1.158464	stored as the	-0.124939
-0.463780	good as the	-0.124939
-0.535149	precision as the	-0.124939
-1.137318	well as the	-0.124939
-0.903174	code, as the	-0.124939
-0.535149	features as the	-0.124939
-0.535149	chosen as the	-0.124939
-0.778215	soon as the	-0.124939
-0.535149	consumption as the	-0.124939
-0.535149	blurred as the	-0.124939
-0.535149	directory as the	-0.124939
-1.536499	is not the	-0.124939
-0.996994	but not the	-0.124939
-0.575164	rows, not the	-0.124939
-0.698709	time than the	-0.124939
-0.857991	more than the	-0.124939
-0.421009	CPU than the	-0.124939
-0.717135	other than the	-0.124939
-0.421009	set than the	-0.124939
-0.768636	efficient than the	-0.124939
-0.546359	faster than the	-0.191886
-0.476590	less than the	-0.425969
-0.716523	rather than the	-0.124939
-0.421009	instructions than the	-0.124939
-0.421009	calculate than the	-0.124939
-0.422244	resources than the	-0.124939
-0.672354	better than the	-0.124939
-0.594641	higher than the	-0.124939
-0.229403	bigger than the	-0.204120
-0.421009	modules than the	-0.124939
-0.421009	smaller than the	-0.124939
-0.421009	safe than the	-0.124939
-0.298543	priority than the	-0.124939
-0.825584	slower than the	-0.124939
-0.594641	predictable than the	-0.124939
-0.421009	larger than the	-0.124939
-0.421009	input/output than the	-0.124939
-0.421009	footprint than the	-0.124939
-0.824871	to have the	-0.124939
-0.746235	not have the	-0.124939
-1.029955	libraries have the	-0.124939
-0.942454	systems have the	-0.124939
-0.707320	doesn't have the	-0.124939
-1.351403	don't have the	-0.124939
-0.551549	languages have the	-0.124939
-0.884551	to this the	-0.425969
-0.583521	122 this the	-0.124939
-1.166074	the time the	-0.301030
-0.245169	each time the	-0.425969
-0.596408	first time the	-0.425969
-0.429957	every time the	-0.301030
-0.747317	next time the	-0.124939
-0.747317	last time the	-0.124939
-0.455014	to use the	-0.168404
-0.415836	and use the	-0.124939
-0.458717	that use the	-0.124939
-0.379856	can use the	-0.492916
-0.534148	or use the	-0.124939
-0.486477	not use the	-0.124939
-0.420063	may use the	-0.124939
-0.415836	will use the	-0.124939
-0.379536	do use the	-0.124939
-0.149564	compilers use the	-0.124939
-0.534148	cannot use the	-0.124939
-0.600935	always use the	-0.124939
-0.534148	applications use the	-0.124939
-0.534148	normally use the	-0.124939
-0.379536	mean use the	-0.124939
-0.379536	CPUs: use the	-0.124939
-0.379536	thenaandbcannot use the	-0.124939
-0.379536	Subtractions use the	-0.124939
-0.223150	or when the	-0.602060
-0.125806	function when the	-0.425969
-0.429447	code when the	-0.124939
-0.223150	time when the	-0.301030
-0.429447	memory when the	-0.124939
-0.303636	program when the	-0.124939
-0.639068	only when the	-0.124939
-0.481036	used when the	-0.124939
-0.303636	there when the	-0.124939
-0.509327	efficient when the	-0.124939
-0.429447	faster when the	-0.124939
-0.125806	called when the	-0.124939
-0.303636	systems when the	-0.124939
-0.302983	useful when the	-0.124939
-0.664529	even when the	-0.124939
-0.303636	bits when the	-0.124939
-0.429447	operations when the	-0.124939
-0.303636	cases when the	-0.124939
-0.303636	want when the	-0.124939
-0.303636	best when the	-0.124939
-0.303636	language when the	-0.124939
-0.303636	But when the	-0.124939
-0.429447	matrix when the	-0.124939
-0.125806	precision when the	-0.425969
-0.429447	problem when the	-0.124939
-0.303636	counter when the	-0.124939
-0.303636	files when the	-0.124939
-0.303636	allocation when the	-0.124939
-0.303636	programs when the	-0.124939
-0.303636	problems when the	-0.124939
-0.125806	automatically when the	-0.124939
-0.303636	disadvantage when the	-0.124939
-0.303636	modules when the	-0.124939
-0.481036	relevant when the	-0.124939
-0.125806	dynamically when the	-0.124939
-0.303636	inefficient when the	-0.124939
-0.303636	task when the	-0.124939
-0.303636	obtained when the	-0.124939
-0.605538	efficiently when the	-0.124939
-0.079701	initialized when the	-0.301030
-0.303636	especially when the	-0.124939
-0.605538	fragmented when the	-0.124939
-0.303636	inputs when the	-0.124939
-0.125806	resolved when the	-0.124939
-0.303636	update when the	-0.124939
-0.303636	collection when the	-0.124939
-0.303636	truncation when the	-0.124939
-0.303636	places when the	-0.124939
-0.235570	deallocated when the	-0.425969
-0.303636	increased when the	-0.124939
-0.303636	deleted when the	-0.124939
-0.303636	negligible when the	-0.124939
-0.303636	freed when the	-0.124939
-0.303636	bypassed when the	-0.124939
-0.303636	precisions when the	-0.124939
-0.303636	float's when the	-0.124939
-0.303636	processor) when the	-0.124939
-0.303636	33% when the	-0.124939
-0.303636	released when the	-0.124939
-0.303636	decreased when the	-0.124939
-0.383966	function then the	-0.124939
-0.716578	time then the	-0.124939
-0.383966	more then the	-0.124939
-0.646699	functions then the	-0.124939
-0.540487	set then the	-0.124939
-0.279521	2 then the	-0.425969
-0.383966	registers then the	-0.124939
-0.383966	case then the	-0.124939
-0.540487	times then the	-0.124939
-0.383966	line then the	-0.124939
-0.540487	parameters then the	-0.124939
-0.383966	values then the	-0.124939
-0.383966	methods then the	-0.124939
-0.383966	high then the	-0.124939
-0.383966	function, then the	-0.124939
-0.716578	range then the	-0.124939
-0.540487	index then the	-0.124939
-0.150863	time, then the	-0.124939
-0.383966	changed then the	-0.124939
-0.540487	execute then the	-0.124939
-0.608336	module then the	-0.124939
-0.383966	near then the	-0.124939
-0.383966	once then the	-0.124939
-0.150863	object, then the	-0.124939
-0.383966	integers, then the	-0.124939
-0.383966	flag then the	-0.124939
-0.383966	true, then the	-0.124939
-0.383966	pipeline then the	-0.124939
-0.383966	changing then the	-0.124939
-0.383966	slow, then the	-0.124939
-0.383966	false, then the	-0.124939
-0.540487	sum, then the	-0.124939
-0.540487	double, then the	-0.124939
-0.383966	vacant then the	-0.124939
-0.383966	priorities then the	-0.124939
-0.383966	GHz then the	-0.124939
-0.383966	loops, then the	-0.124939
-0.383966	9.10, then the	-0.124939
-0.383966	row-wise, then the	-0.124939
-0.383966	ignore, then the	-0.124939
-0.383966	only, then the	-0.124939
-0.383966	18, then the	-0.124939
-0.561857	function from the	-0.124939
-0.561857	than from the	-0.124939
-0.975211	compiler from the	-0.124939
-0.561857	cache from the	-0.124939
-0.469098	value from the	-0.124939
-0.633426	return from the	-0.124939
-0.504332	called from the	-0.124939
-0.398758	call from the	-0.124939
-0.398758	file from the	-0.124939
-0.844100	available from the	-0.124939
-0.674181	accessed from the	-0.124939
-0.195776	calculated from the	-0.301030
-0.398758	sets from the	-0.124939
-0.398758	n from the	-0.124939
-0.398758	runtime from the	-0.124939
-0.398758	needed from the	-0.124939
-0.700607	read from the	-0.124939
-0.398758	goes from the	-0.124939
-0.398758	right from the	-0.124939
-0.398758	writing from the	-0.124939
-0.398758	efficiently from the	-0.124939
-0.633426	far from the	-0.124939
-0.700607	benefit from the	-0.124939
-0.398758	generated from the	-0.124939
-0.398758	returns from the	-0.124939
-0.398758	gets from the	-0.124939
-0.398758	removed from the	-0.124939
-0.398758	separated from the	-0.124939
-0.398758	returning from the	-0.124939
-0.398758	evicted from the	-0.124939
-0.398758	warning from the	-0.124939
-0.398758	fetched from the	-0.124939
-0.398758	popped from the	-0.124939
-0.398758	deviate from the	-0.124939
-0.550956	it at the	-0.124939
-0.391240	memory at the	-0.124939
-0.283319	used at the	-0.425969
-0.391240	compilers at the	-0.124939
-0.391240	variable at the	-0.124939
-0.408257	elements at the	-0.124939
-0.391240	faster at the	-0.124939
-0.660114	done at the	-0.124939
-0.391240	cycles at the	-0.124939
-0.391240	branches at the	-0.124939
-0.391240	multiplication at the	-0.124939
-0.391240	name at the	-0.124939
-0.391240	zero at the	-0.124939
-0.391240	lookup at the	-0.124939
-0.162735	look at the	-0.191886
-0.391240	options at the	-0.124939
-0.391240	things at the	-0.124939
-0.045258	unknown at the	-0.903090
-0.391240	thing at the	-0.124939
-0.391240	least at the	-0.124939
-0.391240	counts at the	-0.124939
-0.391240	DLL at the	-0.124939
-0.391240	break at the	-0.124939
-0.391240	popular at the	-0.124939
-0.391240	begins at the	-0.124939
-0.391240	lost at the	-0.124939
-0.391240	Looking at the	-0.124939
-0.933488	it has the	-0.124939
-0.446588	This has the	-0.425969
-0.501382	{ has the	-0.124939
-0.828108	It has the	-0.124939
-0.501382	always has the	-0.124939
-0.893730	microprocessor has the	-0.124939
-0.501382	main has the	-0.124939
-0.501382	inlining has the	-0.124939
-0.501382	position-independent has the	-0.124939
-0.501382	occur has the	-0.124939
-0.501382	executable has the	-0.124939
-0.501382	auto_ptr has the	-0.124939
-0.595001	to make the	-0.243038
-0.690142	and make the	-0.124939
-0.736947	not make the	-0.124939
-0.543892	will make the	-0.124939
-0.160128	would make the	-0.124939
-0.416514	course make the	-0.124939
-0.416514	Templates make the	-0.124939
-0.416514	(5) make the	-0.124939
-0.325158	is because the	-0.124939
-0.296160	be because the	-0.124939
-0.296160	or because the	-0.124939
-0.394282	function because the	-0.124939
-0.342455	time because the	-0.124939
-0.296160	data because the	-0.124939
-0.296160	functions because the	-0.124939
-0.296160	all because the	-0.124939
-0.296160	cache because the	-0.124939
-0.296160	integer because the	-0.124939
-0.296160	object because the	-0.124939
-0.342455	efficient because the	-0.124939
-0.296160	possible because the	-0.124939
-0.296160	version because the	-0.124939
-0.231293	performance because the	-0.124939
-0.296160	software because the	-0.124939
-0.296160	long because the	-0.124939
-0.469836	faster because the	-0.124939
-0.296160	often because the	-0.124939
-0.296160	test because the	-0.124939
-0.296160	available because the	-0.124939
-0.469836	times because the	-0.124939
-0.296160	large because the	-0.124939
-0.296160	calls because the	-0.124939
-0.296160	result because the	-0.124939
-0.296160	necessary because the	-0.124939
-0.296160	128 because the	-0.124939
-0.419487	solution because the	-0.124939
-0.078203	mode because the	-0.124939
-0.296160	programs because the	-0.124939
-0.296160	microprocessor because the	-0.124939
-0.296160	better because the	-0.124939
-0.296160	applications because the	-0.124939
-0.296160	needed because the	-0.124939
-0.296160	types because the	-0.124939
-0.296160	read because the	-0.124939
-0.296160	process because the	-0.124939
-0.419487	operands because the	-0.124939
-0.419487	here because the	-0.124939
-0.297372	inefficient because the	-0.124939
-0.296160	copied because the	-0.124939
-0.296160	-fpic because the	-0.124939
-0.296160	occurs because the	-0.124939
-0.296160	28 because the	-0.124939
-0.419487	twice because the	-0.124939
-0.296160	-fpie because the	-0.124939
-0.296160	i*12, because the	-0.124939
-0.296160	stall because the	-0.124939
-0.296160	evaluated, because the	-0.124939
-0.296160	line, because the	-0.124939
-0.964176	and only the	-0.124939
-0.510606	in only the	-0.124939
-0.972859	with only the	-0.124939
-0.510606	on only the	-0.124939
-1.043357	not only the	-0.124939
-0.847920	use only the	-0.124939
-0.510606	using only the	-0.124939
-0.510606	initialized only the	-0.124939
-0.510606	includes only the	-0.124939
-0.510606	insert only the	-0.124939
-0.510606	processors, only the	-0.124939
-0.510606	Actually, only the	-0.124939
-0.510606	understands only the	-0.124939
-0.341093	code. If the	-0.124939
-0.270124	function. If the	-0.124939
-0.431549	memory. If the	-0.124939
-0.270124	used. If the	-0.124939
-0.114390	cache. If the	-0.124939
-0.385230	systems. If the	-0.124939
-0.385230	efficient. If the	-0.124939
-0.216154	set. If the	-0.124939
-0.270124	calls. If the	-0.124939
-0.270124	object. If the	-0.124939
-0.385230	library. If the	-0.124939
-0.270124	purposes. If the	-0.124939
-0.270124	way. If the	-0.124939
-0.385230	CPU. If the	-0.124939
-0.385230	problem. If the	-0.124939
-0.270124	order. If the	-0.124939
-0.270124	executed. If the	-0.124939
-0.270124	file. If the	-0.124939
-0.270124	register. If the	-0.124939
-0.385230	table. If the	-0.124939
-0.114390	simultaneously. If the	-0.124939
-0.385230	number. If the	-0.124939
-0.270124	constant. If the	-0.124939
-0.270124	members. If the	-0.124939
-0.270124	maintain. If the	-0.124939
-0.270124	priority. If the	-0.124939
-0.270124	elimination If the	-0.124939
-0.270124	better. If the	-0.124939
-0.270124	declared. If the	-0.124939
-0.270124	www.agner.org/optimize/cppexamples.zip. If the	-0.124939
-0.270124	addition. If the	-0.124939
-0.270124	code). If the	-0.124939
-0.270124	12. If the	-0.124939
-0.270124	long. If the	-0.124939
-0.270124	same. If the	-0.124939
-0.270124	slow. If the	-0.124939
-0.270124	0x1C. If the	-0.124939
-0.270124	0x20; If the	-0.124939
-0.270124	containers. If the	-0.124939
-0.270124	ms. If the	-0.124939
-0.270124	105). If the	-0.124939
-0.270124	time? If the	-0.124939
-0.270124	class). If the	-0.124939
-0.270124	7. If the	-0.124939
-0.270124	62. If the	-0.124939
-0.270124	coded. If the	-0.124939
-0.270124	key? If the	-0.124939
-0.270124	complicated. If the	-0.124939
-0.270124	writes. If the	-0.124939
-0.270124	analysis. If the	-0.124939
-0.270124	elements? If the	-0.124939
-0.270124	references: If the	-0.124939
-0.270124	pipeline. If the	-0.124939
-0.270124	stored? If the	-0.124939
-0.270124	sum2; If the	-0.124939
-0.270124	allocated. If the	-0.124939
-0.414899	in which the	-0.301030
-0.546856	code which the	-0.124939
-0.746184	of all the	-0.124939
-0.387695	and all the	-0.124939
-0.746184	in all the	-0.124939
-0.690148	for all the	-0.124939
-0.932429	that all the	-0.124939
-0.336050	if all the	-0.124939
-0.707788	with all the	-0.124939
-0.827899	on all the	-0.124939
-0.594487	then all the	-0.124939
-0.298491	because all the	-0.124939
-0.420906	last all the	-0.124939
-0.420906	load all the	-0.124939
-0.161341	inlining all the	-0.425969
-0.420906	checking all the	-0.124939
-0.420906	stores all the	-0.124939
-0.420906	manipulate all the	-0.124939
-0.420906	solve all the	-0.124939
-0.420906	distribute all the	-0.124939
-0.420906	pool all the	-0.124939
-0.470321	all but the	-0.124939
-0.470321	example, but the	-0.124939
-0.820753	functions, but the	-0.124939
-0.820753	time, but the	-0.124939
-0.764655	efficient, but the	-0.124939
-0.470321	edx but the	-0.124939
-0.764655	threads, but the	-0.124939
-0.470321	BSD, but the	-0.124939
-0.670411	well, but the	-0.124939
-0.470321	64, but the	-0.124939
-0.470321	vectors, but the	-0.124939
-0.470321	vectorized, but the	-0.124939
-0.670411	occur, but the	-0.124939
-0.470321	hyperthreading, but the	-0.124939
-0.470321	macro, but the	-0.124939
-0.470321	103), but the	-0.124939
-0.470321	situation, but the	-0.124939
-0.470321	aliasing, but the	-0.124939
-0.890094	have used the	-0.124939
-0.653592	to set the	-0.124939
-0.574411	addition, set the	-0.124939
-0.668481	to do the	-0.124939
-1.081786	can do the	-0.124939
-0.767123	will do the	-0.124939
-0.471572	program do the	-0.124939
-0.672397	cannot do the	-0.124939
-0.471572	must do the	-0.124939
-0.613773	of using the	-0.124939
-0.610564	and using the	-0.124939
-0.385294	in using the	-0.124939
-0.610564	for using the	-0.124939
-0.621573	are using the	-0.425969
-0.446447	by using the	-0.279841
-0.280216	on using the	-0.124939
-0.385294	array using the	-0.124939
-0.362646	without using the	-0.124939
-0.385294	expressions using the	-0.124939
-0.385294	operation using the	-0.124939
-0.385294	finished using the	-0.124939
-0.581238	can double the	-0.124939
-0.581238	would double the	-0.124939
-0.704609	data into the	-0.124939
-0.400750	i into the	-0.124939
-0.400750	possible into the	-0.124939
-0.400750	branch into the	-0.124939
-0.400750	register into the	-0.124939
-0.155703	best into the	-0.425969
-0.400750	block into the	-0.124939
-0.400750	put into the	-0.124939
-0.564761	linked into the	-0.124939
-0.400750	feature into the	-0.124939
-0.373593	them into the	-0.124939
-0.564761	fit into the	-0.124939
-0.400750	N into the	-0.124939
-0.400750	directly into the	-0.124939
-0.400750	back into the	-0.124939
-0.400750	instruments into the	-0.124939
-0.400750	fed into the	-0.124939
-0.400750	deeper into the	-0.124939
-0.400750	Integrates into the	-0.124939
-0.400750	nicely into the	-0.124939
-0.400750	feed into the	-0.124939
-0.773022	but also the	-0.124939
-0.594163	how efficient the	-0.124939
-0.649784	function. In the	-0.124939
-0.171023	faster. In the	-0.124939
-0.649784	speed. In the	-0.124939
-0.457195	all. In the	-0.124939
-0.457195	two. In the	-0.124939
-0.457195	occur. In the	-0.124939
-0.457195	big. In the	-0.124939
-0.457195	string. In the	-0.124939
-0.457195	name. In the	-0.124939
-0.457195	obtained. In the	-0.124939
-0.457195	60. In the	-0.124939
-0.457195	146). In the	-0.124939
-0.310600	and where the	-0.124939
-0.310600	program where the	-0.124939
-0.310600	set where the	-0.124939
-0.265370	cases where the	-0.182931
-0.310600	instructions where the	-0.124939
-0.310600	mode where the	-0.124939
-0.310600	sets where the	-0.124939
-0.310600	model where the	-0.124939
-0.310600	examples where the	-0.124939
-0.310600	process where the	-0.124939
-0.310600	computer where the	-0.124939
-0.310600	predict where the	-0.124939
-0.430728	situation where the	-0.124939
-0.310600	templates where the	-0.124939
-0.456728	situations where the	-0.124939
-0.310600	determined where the	-0.124939
-0.310600	step where the	-0.124939
-0.310600	(i.e. where the	-0.124939
-0.310600	Fortran where the	-0.124939
-0.310600	manner where the	-0.124939
-0.885011	compiler takes the	-0.124939
-0.549928	2, so the	-0.124939
-0.549928	thousand so the	-0.124939
-0.549928	truncation so the	-0.124939
-0.549928	digits, so the	-0.124939
-0.587769	will return the	-0.124939
-0.431332	in between the	-0.124939
-0.610129	performance between the	-0.124939
-0.481137	difference between the	-0.124939
-0.431332	framework between the	-0.124939
-0.610129	synchronization between the	-0.124939
-0.610129	distinction between the	-0.124939
-0.431332	similarity between the	-0.124939
-0.431332	transitions between the	-0.124939
-0.431332	evenly between the	-0.124939
-0.431332	observed between the	-0.124939
-0.431332	distinguishing between the	-0.124939
-1.345069	virtual member the	-0.124939
-0.756662	the way the	-0.124939
-0.999534	no way the	-0.124939
-2.049491	is faster the	-0.124939
-0.853032	This makes the	-0.124939
-0.377197	this makes the	-0.124939
-0.377197	only makes the	-0.124939
-0.634383	which makes the	-0.124939
-0.377197	one makes the	-0.124939
-0.597047	also makes the	-0.124939
-0.377197	call makes the	-0.124939
-0.377197	processor makes the	-0.124939
-0.148874	option makes the	-0.124939
-0.377197	simply makes the	-0.124939
-0.530813	linking makes the	-0.124939
-0.377197	templates makes the	-0.124939
-0.377197	checks makes the	-0.124939
-0.377197	blocks makes the	-0.124939
-0.377197	instances makes the	-0.124939
-0.385257	time before the	-0.124939
-0.385257	program before the	-0.124939
-0.270144	value before the	-0.124939
-0.270144	takes before the	-0.124939
-0.270144	table before the	-0.124939
-0.270144	long before the	-0.124939
-0.072826	called before the	-0.124939
-0.270144	times before the	-0.124939
-0.385257	stack before the	-0.124939
-0.270144	check before the	-0.124939
-0.114398	known before the	-0.425969
-0.431579	values before the	-0.124939
-0.270144	well before the	-0.124939
-0.270144	cycles before the	-0.124939
-0.270144	addition before the	-0.124939
-0.270144	comes before the	-0.124939
-0.270144	priority before the	-0.124939
-0.270144	iteration before the	-0.124939
-0.270144	resolved before the	-0.124939
-0.270144	again before the	-0.124939
-0.270144	B before the	-0.124939
-0.270144	freed before the	-0.124939
-0.114398	immediately before the	-0.124939
-0.270144	restored before the	-0.124939
-1.510483	is called the	-0.124939
-0.555874	memory called the	-0.124939
-0.555874	cache called the	-0.124939
-0.649529	memory. See the	-0.124939
-0.457031	platforms. See the	-0.124939
-0.457031	mode. See the	-0.124939
-0.457031	version. See the	-0.124939
-0.457031	branch. See the	-0.124939
-0.457031	handling. See the	-0.124939
-0.649529	pool. See the	-0.124939
-0.457031	doing. See the	-0.124939
-0.457031	__declspec(cpu_dispatch(...)). See the	-0.124939
-0.457031	obvious. See the	-0.124939
-0.292889	to call the	-0.271067
-0.603221	and call the	-0.124939
-0.682635	can call the	-0.124939
-0.426742	may call the	-0.124939
-0.426742	modules call the	-0.124939
-0.426742	Now call the	-0.124939
-0.353161	this example, the	-0.124939
-0.956692	For example, the	-0.124939
-0.211950	above example, the	-0.124939
-0.589806	shows first the	-0.124939
-0.588982	single register the	-0.124939
-1.351853	can take the	-0.124939
-0.505684	not take the	-0.124939
-0.505684	b take the	-0.124939
-0.505684	We take the	-0.124939
-0.505684	usually take the	-0.124939
-0.505684	Let's take the	-0.124939
-1.729814	is often the	-0.124939
-0.554454	is how the	-0.124939
-1.074574	and how the	-0.124939
-0.949636	about how the	-0.124939
-1.028398	not need the	-0.124939
-1.021122	we need the	-0.124939
-1.070907	doesn't need the	-0.124939
-1.028398	don't need the	-0.124939
-0.729352	to test the	-0.124939
-0.529283	should test the	-0.124939
-0.757582	and without the	-0.124939
-0.412148	or without the	-0.124939
-0.581491	object without the	-0.124939
-0.412148	version without the	-0.124939
-0.412148	libraries without the	-0.124939
-0.412148	even without the	-0.124939
-0.412148	processors without the	-0.124939
-0.412148	CPUs without the	-0.124939
-0.412148	calculations without the	-0.124939
-0.412148	changed without the	-0.124939
-0.412148	(Compile without the	-0.124939
-0.568491	for even the	-0.124939
-0.568491	cases even the	-0.124939
-0.997273	are sure the	-0.124939
-0.572747	make sure the	-0.191886
-0.770256	makes sure the	-0.124939
-0.578791	Make sure the	-0.124939
-0.866268	and always the	-0.124939
-0.854477	to access the	-0.124939
-0.762014	that access the	-0.124939
-0.525818	(4) access the	-0.124939
-0.291018	shift out the	-0.425969
-0.198085	rule out the	-0.301030
-0.229764	roll out the	-0.425969
-0.157238	rolling out the	-0.425969
-0.840555	in case the	-0.124939
-0.521393	latter case the	-0.124939
-1.143635	most cases the	-0.124939
-0.987326	some cases the	-0.124939
-0.619356	set up the	-0.124939
-0.549895	speed up the	-0.124939
-0.193174	look up the	-0.301030
-0.549895	split up the	-0.124939
-0.390506	warm up the	-0.124939
-0.390506	cleans up the	-0.124939
-0.390506	fills up the	-0.124939
-0.390506	fill up the	-0.124939
-0.390506	summing up the	-0.124939
-0.312789	of making the	-0.124939
-0.693676	for making the	-0.124939
-0.408352	by making the	-0.124939
-0.386384	places making the	-0.124939
-0.747190	two times the	-0.124939
-0.742621	many times the	-0.124939
-0.747190	three times the	-0.124939
-1.048530	you want the	-0.124939
-0.714525	We want the	-0.124939
-0.497623	just want the	-0.124939
-0.513582	much about the	-0.124939
-0.289300	information about the	-0.182931
-0.365028	care about the	-0.124939
-0.365028	that's about the	-0.124939
-0.365028	thought about the	-0.124939
-0.449115	that does the	-0.124939
-0.771068	It does the	-0.124939
-0.473565	which does the	-0.124939
-0.473565	operator does the	-0.124939
-0.473565	__intel_cpu_features_init_x() does the	-0.124939
-0.390840	and while the	-0.124939
-0.390840	used, while the	-0.124939
-0.390840	called, while the	-0.124939
-0.390840	integers, while the	-0.124939
-0.390840	break while the	-0.124939
-0.390840	once, while the	-0.124939
-0.390840	expensive, while the	-0.124939
-0.390840	unchanged, while the	-0.124939
-0.390840	both, while the	-0.124939
-0.390840	intended, while the	-0.124939
-0.586449	BSD work the	-0.124939
-0.508757	that calls the	-0.124939
-0.488258	always calls the	-0.124939
-0.488258	16.2 calls the	-0.124939
-0.488258	loader calls the	-0.124939
-0.561908	to avoid the	-0.124939
-0.420054	can avoid the	-0.124939
-0.616974	may avoid the	-0.124939
-0.547865	you avoid the	-0.124939
-0.586733	word processor the	-0.124939
-0.483486	2. Use the	-0.124939
-0.483486	option. Use the	-0.124939
-0.483486	implemented. Use the	-0.124939
-0.483486	spot. Use the	-0.124939
-0.483486	listing. Use the	-0.124939
-0.461636	precision. But the	-0.124939
-0.461636	array. But the	-0.124939
-0.461636	integer. But the	-0.124939
-0.461636	5. But the	-0.124939
-0.461636	languages. But the	-0.124939
-0.461636	market. But the	-0.124939
-0.461018	library through the	-0.124939
-1.101205	accessed through the	-0.124939
-0.655755	goes through the	-0.124939
-0.746520	go through the	-0.124939
-0.461018	updates through the	-0.124939
-0.461018	propagate through the	-0.124939
-0.729375	and compile the	-0.124939
-0.339465	you compile the	-0.124939
-0.506574	we compile the	-0.124939
-0.448721	that cause the	-0.124939
-0.907571	can cause the	-0.124939
-0.949853	may cause the	-0.124939
-0.210821	will cause the	-0.124939
-0.875446	have done the	-0.124939
-1.469455	and therefore the	-0.124939
-0.160314	be inside the	-0.124939
-0.160314	memory inside the	-0.124939
-0.160314	used inside the	-0.124939
-0.160314	objects inside the	-0.124939
-0.160314	variable inside the	-0.124939
-0.034877	branch inside the	-0.249877
-0.160314	arrays inside the	-0.124939
-0.047154	calculations inside the	-0.301030
-0.160314	counter inside the	-0.124939
-0.449229	declared inside the	-0.425969
-0.160314	condition inside the	-0.124939
-0.246930	defined inside the	-0.124939
-0.160314	body inside the	-0.124939
-0.160314	happens inside the	-0.124939
-0.160314	nothing inside the	-0.124939
-0.160314	etc.) inside the	-0.124939
-0.160314	log) inside the	-0.124939
-1.362788	is calculated the	-0.124939
-0.809523	only calculated the	-0.124939
-0.990628	that uses the	-0.124939
-0.751582	It uses the	-0.124939
-0.519732	never uses the	-0.124939
-0.751582	can get the	-0.124939
-0.659857	not get the	-0.124939
-0.805908	will get the	-0.124939
-0.463633	both get the	-0.124939
-0.463633	typically get the	-0.124939
-1.304597	to check the	-0.124939
-0.347314	can check the	-0.124939
-0.875471	more advantageous the	-0.124939
-0.988332	that support the	-0.124939
-0.866281	not support the	-0.124939
-0.518954	will support the	-0.124939
-0.485690	which contains the	-0.124939
-0.695033	now contains the	-0.124939
-0.485690	ecx contains the	-0.124939
-0.485690	edx contains the	-0.124939
-0.299870	efficient whether the	-0.124939
-0.299870	sure whether the	-0.124939
-0.299870	about whether the	-0.124939
-0.424423	see whether the	-0.124939
-0.299870	shows whether the	-0.124939
-0.299870	know whether the	-0.124939
-0.299870	predict whether the	-0.124939
-0.078949	checks whether the	-0.124939
-0.299870	compile-time whether the	-0.124939
-0.299870	determines whether the	-0.124939
-0.783433	of doing the	-0.124939
-0.677610	are doing the	-0.124939
-0.400573	by doing the	-0.124939
-0.400573	time doing the	-0.124939
-0.677610	from doing the	-0.124939
-0.400573	fact doing the	-0.124939
-0.400573	busy doing the	-0.124939
-0.650979	to run the	-0.124939
-0.483605	you run the	-0.124939
-0.948018	will run the	-0.124939
-0.120756	to calculate the	-0.284640
-0.459568	can calculate the	-0.124939
-0.531757	to inline the	-0.124939
-0.515826	cannot inline the	-0.124939
-0.806602	to add the	-0.124939
-0.773147	// add the	-0.124939
-0.636204	may add the	-0.124939
-0.636204	then add the	-0.124939
-0.448430	register, add the	-0.124939
-0.749570	to store the	-0.124939
-0.179375	and store the	-0.124939
-0.348440	can store the	-0.124939
-0.490407	may store the	-0.124939
-0.348440	will store the	-0.124939
-0.348440	might store the	-0.124939
-0.348440	better: store the	-0.124939
-0.541804	needed. All the	-0.124939
-0.541804	do. All the	-0.124939
-0.341893	to copy the	-0.124939
-0.914470	and copy the	-0.124939
-0.927640	of optimizing the	-0.124939
-0.796432	by optimizing the	-0.124939
-0.356681	how well the	-0.124939
-1.554364	is simply the	-0.124939
-0.335186	to write the	-0.124939
-0.871383	to optimize the	-0.124939
-0.503184	often optimize the	-0.124939
-0.470637	used above the	-0.124939
-0.470637	28 above the	-0.124939
-0.470637	matrix[c][r] above the	-0.124939
-0.470637	position above the	-0.124939
-0.355060	function. However, the	-0.124939
-0.355060	resources. However, the	-0.124939
-0.355060	purposes. However, the	-0.124939
-0.355060	sets. However, the	-0.124939
-0.355060	executed. However, the	-0.124939
-0.355060	value. However, the	-0.124939
-0.355060	debugger. However, the	-0.124939
-0.355060	calculation. However, the	-0.124939
-0.578070	recommendation was the	-0.124939
-0.869405	in both the	-0.124939
-0.436604	by both the	-0.124939
-0.436604	because both the	-0.124939
-0.436604	Therefore, both the	-0.124939
-0.436604	checks both the	-0.124939
-0.198473	time unless the	-0.124939
-0.198473	variable unless the	-0.124939
-0.198473	optimization unless the	-0.124939
-0.198473	systems unless the	-0.124939
-0.198473	calculations unless the	-0.124939
-0.293983	mode unless the	-0.124939
-0.198473	handling unless the	-0.124939
-0.087996	slow unless the	-0.124939
-0.198473	safe unless the	-0.124939
-0.198473	clear unless the	-0.124939
-0.198473	rounding unless the	-0.124939
-0.198473	bits), unless the	-0.124939
-0.198473	constant, unless the	-0.124939
-0.198473	unfavorable, unless the	-0.124939
-0.572546	most cases, the	-0.124939
-0.568994	many cases, the	-0.124939
-0.607767	some cases, the	-0.124939
-0.568994	simple cases, the	-0.124939
-0.317983	to replace the	-0.249877
-1.039027	may replace the	-0.124939
-0.561905	will replace the	-0.124939
-0.400132	data. Therefore, the	-0.124939
-0.400132	called. Therefore, the	-0.124939
-0.400132	mode. Therefore, the	-0.124939
-0.400132	critical. Therefore, the	-0.124939
-0.563859	consuming. Therefore, the	-0.124939
-0.400132	addresses. Therefore, the	-0.124939
-0.660727	to see the	-0.124939
-0.827354	can see the	-0.124939
-0.549455	it allows the	-0.124939
-0.366134	This allows the	-0.425969
-0.390200	which allows the	-0.124939
-0.390200	reference allows the	-0.124939
-0.390200	mechanism allows the	-0.124939
-1.053193	instruction sets the	-0.124939
-0.457347	example sets the	-0.124939
-0.457347	__intel_cpu_features_init() sets the	-0.124939
-0.457347	similarly sets the	-0.124939
-0.580636	code like the	-0.124939
-0.417765	cache. Using the	-0.124939
-0.417765	temporarily. Using the	-0.124939
-0.417765	105). Using the	-0.124939
-0.417765	chapter. Using the	-0.124939
-0.417765	11. Using the	-0.124939
-0.578705	or model the	-0.124939
-0.532913	can block the	-0.124939
-0.532913	possibly block the	-0.124939
-0.588227	to put the	-0.124939
-0.515483	and put the	-0.124939
-0.326225	may put the	-0.124939
-0.326225	have put the	-0.124939
-0.084105	then put the	-0.301030
-0.583655	iteration needs the	-0.124939
-0.411940	to what the	-0.124939
-0.411940	shows what the	-0.124939
-0.656306	know what the	-0.124939
-0.158856	Checking what the	-0.425969
-0.773242	avoid running the	-0.124939
-0.532301	Consider running the	-0.124939
-0.997891	// Make the	-0.124939
-0.406095	class. Make the	-0.124939
-0.406095	object. Make the	-0.124939
-0.406095	returns. Make the	-0.124939
-0.406095	alternatives: Make the	-0.124939
-0.860604	and last the	-0.124939
-0.600844	and after the	-0.124939
-0.300219	or after the	-0.124939
-0.300219	check after the	-0.124939
-0.124668	cycles after the	-0.124939
-0.300219	output after the	-0.124939
-0.300219	_mm_empty() after the	-0.124939
-0.300219	locked after the	-0.124939
-0.923740	to read the	-0.124939
-0.437320	may read the	-0.124939
-0.437320	you read the	-0.124939
-0.619200	only read the	-0.124939
-0.315098	to give the	-0.124939
-0.131135	and give the	-0.425969
-0.319890	not give the	-0.124939
-0.319890	doesn't give the	-0.124939
-0.319890	counts give the	-0.124939
-1.350669	code becomes the	-0.124939
-1.009666	it requires the	-0.124939
-0.567920	to load the	-0.124939
-0.430840	will load the	-0.124939
-0.186328	to control the	-0.124939
-1.150107	to assume the	-0.124939
-0.676527	by calling the	-0.124939
-0.423344	than calling the	-0.124939
-0.162010	before calling the	-0.124939
-0.887125	example shows the	-0.124939
-0.358003	to improve the	-0.124939
-0.222611	can improve the	-0.301030
-0.194820	not improve the	-0.124939
-0.093982	may improve the	-0.425969
-0.194820	only improve the	-0.124939
-0.194820	possibly improve the	-0.124939
-0.379273	} Here, the	-0.124939
-0.300091	i; Here, the	-0.124939
-0.300091	x; Here, the	-0.124939
-0.300091	3.5; Here, the	-0.124939
-0.300091	List[i]++; Here, the	-0.124939
-0.300091	c1::*MemberPointer; Here, the	-0.124939
-1.012620	doesn't know the	-0.124939
-1.374094	will generate the	-0.124939
-1.410590	is usually the	-0.124939
-0.754300	to reduce the	-0.124939
-0.531979	can reduce the	-0.124939
-0.415206	cannot reduce the	-0.124939
-0.858995	it goes the	-0.124939
-0.515665	always goes the	-0.124939
-0.503649	to choose the	-0.124939
-0.233547	and choose the	-0.124939
-0.446219	may choose the	-0.124939
-0.300091	will choose the	-0.124939
-0.300091	automatically choose the	-0.124939
-0.185266	has made the	-0.124939
-0.849504	simple function, the	-0.124939
-1.007163	to start the	-0.124939
-0.513290	and start the	-0.124939
-0.170777	the smaller the	-0.124939
-0.456244	The smaller the	-0.124939
-0.733900	parenthesis around the	-0.124939
-0.509277	circumstances around the	-0.124939
-0.846549	which reductions the	-0.124939
-1.097558	to go the	-0.124939
-0.585045	have tested the	-0.124939
-0.992716	CPU supports the	-0.124939
-0.275216	to change the	-0.124939
-0.208750	can change the	-0.124939
-0.391880	may change the	-0.124939
-0.391880	will change the	-0.124939
-0.275216	we change the	-0.124939
-0.330729	turn off the	-0.124939
-0.261916	log off the	-0.124939
-0.211303	turning off the	-0.124939
-0.261916	cut off the	-0.124939
-0.572792	PSDK). Supports the	-0.124939
-0.847988	64-bit Windows, the	-0.124939
-0.079138	that gives the	-0.301030
-0.300816	set gives the	-0.124939
-0.300816	two gives the	-0.124939
-0.300816	N&(N-1) gives the	-0.124939
-0.441654	and inlining the	-0.124939
-0.433741	by inlining the	-0.124939
-0.246047	types Unfortunately, the	-0.124939
-0.246047	called. Unfortunately, the	-0.124939
-0.246047	calls. Unfortunately, the	-0.124939
-0.246047	purposes. Unfortunately, the	-0.124939
-0.246047	dispatching. Unfortunately, the	-0.124939
-0.246047	this. Unfortunately, the	-0.124939
-0.246047	portability. Unfortunately, the	-0.124939
-0.229203	to find the	-0.124939
-0.210545	cannot find the	-0.124939
-0.210545	(2) find the	-0.124939
-0.694812	to produce the	-0.124939
-0.433456	will produce the	-0.124939
-0.433456	should produce the	-0.124939
-0.246047	by including the	-0.124939
-0.246047	VIA including the	-0.124939
-0.105827	strings including the	-0.425969
-0.246047	features, including the	-0.124939
-0.246047	n, including the	-0.124939
-0.246047	computer, including the	-0.124939
-0.205162	is outside the	-0.124939
-0.125807	memory outside the	-0.124939
-0.125807	but outside the	-0.124939
-0.125807	variable outside the	-0.124939
-0.125807	operations outside the	-0.124939
-0.125807	element outside the	-0.124939
-0.125807	overflow outside the	-0.124939
-0.125807	done outside the	-0.124939
-0.125807	go outside the	-0.124939
-0.125807	move outside the	-0.124939
-0.331931	is still the	-0.124939
-0.424272	can prevent the	-0.124939
-0.424272	code prevent the	-0.124939
-0.424272	will prevent the	-0.124939
-0.570680	no destructor the	-0.124939
-0.146495	it prevents the	-0.124939
-0.081860	This prevents the	-0.425969
-0.157688	instruction prevents the	-0.124939
-0.157688	also prevents the	-0.124939
-0.157688	division prevents the	-0.124939
-0.085604	to tell the	-1.028029
-0.157688	then tell the	-0.124939
-0.575501	Let's repeat the	-0.124939
-0.555258	to unroll the	-0.425969
-0.172564	CPUs. On the	-0.124939
-0.172564	compiler. On the	-0.124939
-0.172564	resources. On the	-0.124939
-0.172564	structures. On the	-0.124939
-0.172564	difficult. On the	-0.124939
-0.172564	hyperthreading. On the	-0.124939
-0.172564	profitable. On the	-0.124939
-0.172564	9.1b. On the	-0.124939
-0.987722	64-bit Linux, the	-0.124939
-0.489412	x; Note the	-0.124939
-0.489412	Day. Note the	-0.124939
-0.300766	function. When the	-0.124939
-0.300766	size. When the	-0.124939
-0.300766	precision. When the	-0.124939
-0.300766	method. When the	-0.124939
-0.300766	mispredictions. When the	-0.124939
-0.412474	function. Avoid the	-0.124939
-0.412474	are: Avoid the	-0.124939
-0.412474	93. Avoid the	-0.124939
-0.274006	by copying the	-0.124939
-1.153829	for accessing the	-0.124939
-0.137180	or until the	-0.124939
-0.137180	only until the	-0.124939
-0.137180	variable until the	-0.124939
-0.137180	file until the	-0.124939
-0.137180	loaded until the	-0.124939
-0.249886	wait until the	-0.124939
-0.137180	waits until the	-0.124939
-0.137180	repeated until the	-0.124939
-0.137180	postponed until the	-0.124939
-0.984844	by adding the	-0.124939
-0.482615	before adding the	-0.124939
-0.483459	and causes the	-0.124939
-0.483459	free) causes the	-0.124939
-0.565288	than processing the	-0.124939
-0.126567	to divide the	-0.301030
-0.310894	you divide the	-0.124939
-1.229496	to mix the	-0.124939
-0.064193	to fit the	-0.425969
-0.284761	that fit the	-0.124939
-1.078041	to predict the	-0.124939
-0.680556	can predict the	-0.124939
-0.199404	even though the	-0.124939
-0.283604	backwards though the	-0.124939
-0.964126	to execute the	-0.124939
-0.773710	can execute the	-0.124939
-0.478499	or compiling the	-0.124939
-0.683447	by compiling the	-0.124939
-0.680556	then convert the	-0.124939
-0.476693	first convert the	-0.124939
-0.710715	at least the	-0.124939
-0.155664	class containing the	-0.124939
-0.400612	line containing the	-0.124939
-0.553180	to handle the	-0.425969
-0.339163	performance during the	-0.124939
-0.339163	computer during the	-0.124939
-0.339163	change during the	-0.124939
-0.339163	selected during the	-0.124939
-0.263557	that includes the	-0.124939
-0.446653	This includes the	-0.124939
-0.263557	also includes the	-0.124939
-0.263557	way includes the	-0.124939
-0.263557	file includes the	-0.124939
-0.752136	to insert the	-0.124939
-0.364998	and insert the	-0.124939
-1.197921	may consider the	-0.124939
-0.563701	be loading the	-0.124939
-0.322282	be below the	-0.124939
-0.322282	28 below the	-0.124939
-0.131908	matrix[r][c] below the	-0.425969
-0.666527	and reading the	-0.124939
-0.467867	with reading the	-0.124939
-0.467867	will delay the	-0.124939
-0.467867	doesn't delay the	-0.124939
-0.168030	of calculating the	-0.124939
-0.149037	for calculating the	-0.124939
-0.168030	than calculating the	-0.124939
-0.168030	when calculating the	-0.124939
-0.149999	to enable the	-0.301030
-0.167548	or enable the	-0.124939
-0.167548	may enable the	-0.124939
-0.255776	will enable the	-0.124939
-0.167548	sets enable the	-0.124939
-0.561306	with, e.g. the	-0.124939
-0.626887	to keep the	-0.124939
-0.562502	can align the	-0.124939
-0.562502	will allow the	-0.124939
-0.560113	and rarely the	-0.124939
-0.388601	performance under the	-0.124939
-0.388601	done under the	-0.124939
-0.388601	runs under the	-0.124939
-0.152679	you expect the	-0.124939
-0.850497	cannot expect the	-0.124939
-0.174249	bits except the	-0.425969
-1.253008	reason why the	-0.124939
-0.563992	zero whenever the	-0.124939
-0.170569	by unrolling the	-0.425969
-0.302855	not swap the	-0.124939
-0.079546	cannot swap the	-0.301030
-0.522308	to modify the	-0.124939
-0.522308	or modify the	-0.124939
-0.371208	don't modify the	-0.124939
-0.125425	// Store the	-0.726999
-0.817997	Gnu compiler, the	-0.124939
-0.302132	and setting the	-0.124939
-0.347767	by setting the	-0.124939
-0.302132	from setting the	-0.124939
-0.243180	from within the	-0.124939
-0.243180	data within the	-0.124939
-0.243180	only within the	-0.124939
-0.243180	members within the	-0.124939
-0.243180	obsolete within the	-0.124939
-0.830024	should apply the	-0.124939
-0.372057	cycles. Obviously, the	-0.124939
-0.372057	needed. Obviously, the	-0.124939
-0.372057	framework. Obviously, the	-0.124939
-1.283384	to allocate the	-0.124939
-0.540079	to implement the	-0.124939
-0.354465	classes implement the	-0.124939
-0.814696	has chosen the	-0.124939
-0.554180	classes contain the	-0.124939
-0.142030	to help the	-0.124939
-0.354465	can help the	-0.124939
-0.402887	optimize away the	-0.124939
-0.027601	to share the	-0.124939
-0.009007	can share the	-0.602060
-0.027601	objects share the	-0.124939
-0.027601	threads share the	-0.124939
-0.027601	members share the	-0.124939
-0.027601	usually share the	-0.124939
-0.027601	28 share the	-0.124939
-0.562881	is near the	-0.124939
-0.094782	and stores the	-0.124939
-0.316181	function stores the	-0.124939
-0.094782	simply stores the	-0.124939
-0.174377	for finding the	-0.249877
-0.215539	than finding the	-0.124939
-0.557061	most purposes the	-0.124939
-0.106185	to vectorize the	-0.301030
-0.160402	and vectorize the	-0.124939
-0.247037	will vectorize the	-0.124939
-0.160402	don't vectorize the	-0.124939
-0.817319	to include the	-0.124939
-0.445171	it involves the	-0.124939
-0.279471	this involves the	-0.124939
-0.279471	also involves the	-0.124939
-0.279471	driver involves the	-0.124939
-0.557061	iterations. Here the	-0.124939
-0.956140	optimize across the	-0.124939
-0.551750	code once the	-0.124939
-0.553359	never interrupt the	-0.124939
-0.553359	where almost the	-0.124939
-0.556596	for multiplying the	-0.124939
-0.301948	slow down the	-0.425969
-0.468406	are exactly the	-0.124939
-0.332463	have exactly the	-0.124939
-0.332463	doing exactly the	-0.124939
-0.545016	models had the	-0.124939
-1.015901	to measure the	-0.124939
-0.798863	one vector, the	-0.124939
-0.798863	to delete the	-0.124939
-0.545016	sets. Likewise, the	-0.124939
-0.798863	64-bit mode, the	-0.124939
-0.545016	to update the	-0.124939
-0.378527	compiler generates the	-0.425969
-0.410414	for executing the	-0.124939
-0.578932	after executing the	-0.124939
-0.407770	not free the	-0.124939
-0.407770	could free the	-0.124939
-0.291835	to hold the	-0.124939
-0.798863	the system, the	-0.124939
-0.407770	dispatcher changes the	-0.124939
-0.407770	__fastcall changes the	-0.124939
-0.795632	by storing the	-0.124939
-0.407770	above. Now the	-0.124939
-0.407770	d); Now the	-0.124939
-0.302986	to remove the	-0.124939
-0.303640	may remove the	-0.124939
-1.197371	to transpose the	-0.124939
-0.540349	how predictable the	-0.124939
-0.538268	memory plus the	-0.124939
-0.049631	to increase the	-0.124939
-0.105674	can increase the	-0.124939
-0.105674	cannot increase the	-0.124939
-0.105674	actually increase the	-0.124939
-0.463051	to identify the	-0.124939
-0.544542	1. Add the	-0.124939
-0.608175	to declare the	-0.124939
-0.383870	may declare the	-0.124939
-0.151687	that fits the	-0.124939
-0.783687	for giving the	-0.124939
-0.783687	explained above, the	-0.124939
-0.538268	may detect the	-0.124939
-0.787356	and show the	-0.124939
-0.540349	mispredictions. Test the	-0.124939
-1.047907	to evaluate the	-0.124939
-0.997094	or reference, the	-0.124939
-0.542440	than half the	-0.124939
-0.610621	only half the	-0.124939
-0.794788	for converting the	-0.124939
-0.385327	by specifying the	-0.124939
-0.385327	without specifying the	-0.124939
-0.542440	closely follows the	-0.124939
-0.538268	counter, comparing the	-0.124939
-0.542440	can prefetch the	-0.124939
-0.544542	Without static, the	-0.124939
-0.540349	speed Testing the	-0.124939
-1.179198	In general, the	-0.124939
-0.538268	module (i.e. the	-0.124939
-0.540349	for avoiding the	-0.124939
-0.608175	by avoiding the	-0.124939
-0.767498	to increment the	-0.124939
-0.092636	to economize the	-0.301030
-0.130286	and economize the	-0.124939
-0.339032	to overcome the	-0.124939
-0.528994	when swapping the	-0.124939
-0.771742	turned on, the	-0.124939
-0.227324	of reducing the	-0.124939
-0.227324	for reducing the	-0.124939
-0.227324	without reducing the	-0.124939
-0.060276	be worth the	-0.425969
-0.130286	rarely worth the	-0.124939
-0.130286	hardly worth the	-0.124939
-0.531439	software specifies the	-0.124939
-0.531439	// OR the	-0.124939
-0.528994	table lists the	-0.124939
-0.496013	that select the	-0.124939
-0.352476	always select the	-0.124939
-0.528994	in list, the	-0.124939
-0.352476	this case, the	-0.124939
-0.496013	latter case, the	-0.124939
-0.496013	advantages over the	-0.124939
-0.352476	controversies over the	-0.124939
-1.018609	other hand, the	-0.124939
-0.771742	to split the	-0.124939
-0.533899	to limit the	-0.124939
-0.130286	to follow the	-0.124939
-0.130286	you follow the	-0.124939
-0.130286	then follow the	-0.124939
-0.130286	lines follow the	-0.124939
-0.528994	CPUs increased the	-0.124939
-0.352476	file. Only the	-0.124939
-0.352476	ebx. Only the	-0.124939
-0.528994	function adds the	-0.124939
-0.528994	sizes. Fortunately, the	-0.124939
-0.330234	to specify the	-0.124939
-0.227324	we specify the	-0.124939
-0.227324	well specify the	-0.124939
-0.965349	but unfortunately the	-0.124939
-0.528994	inlined. (In the	-0.124939
-0.888943	to compare the	-0.124939
-0.354103	stack. Is the	-0.124939
-0.354103	38). Is the	-0.124939
-0.354103	enabled. Typically, the	-0.124939
-0.354103	future. Typically, the	-0.124939
-0.354103	user gets the	-0.124939
-0.354103	programmer gets the	-0.124939
-0.130286	This tells the	-0.124939
-0.130286	file tells the	-0.124939
-0.060276	profiler tells the	-0.124939
-0.262731	to wrap the	-0.425969
-0.531439	by increasing the	-0.124939
-0.533899	is definitely the	-0.124939
-0.515443	and BSD, the	-0.124939
-0.031178	2 Choosing the	-0.425969
-0.031178	5 Choosing the	-0.425969
-0.515443	to place the	-0.124939
-0.076749	to overlap the	-0.124939
-0.170035	can overlap the	-0.124939
-0.930510	by turning the	-0.124939
-0.127668	to obtain the	-0.425969
-0.239815	This enables the	-0.425969
-0.436989	me explain the	-0.124939
-0.309269	To explain the	-0.124939
-0.515443	weighed against the	-0.124939
-0.238772	by declaring the	-0.124939
-0.515443	may move the	-0.124939
-0.515443	container. Can the	-0.124939
-0.521393	always chooses the	-0.124939
-0.439462	by choosing the	-0.124939
-0.311111	programmer choosing the	-0.124939
-0.521393	is commonly the	-0.124939
-0.515443	of transferring the	-0.124939
-0.518408	and splitting the	-0.124939
-0.518408	problem. Whenever the	-0.124939
-0.309269	it avoids the	-0.124939
-0.309269	but avoids the	-0.124939
-0.515443	can begin the	-0.124939
-0.990005	other words, the	-0.124939
-0.929295	example illustrates the	-0.124939
-0.309269	to mirror the	-0.124939
-0.309269	may mirror the	-0.124939
-0.744292	by changing the	-0.124939
-0.309269	to force the	-0.124939
-0.309269	applications force the	-0.124939
-0.309269	it opens the	-0.124939
-0.309269	set opens the	-0.124939
-0.170035	have finished the	-0.124939
-0.076749	has finished the	-0.124939
-0.493758	mode. Storing the	-0.124939
-0.493758	by reordering the	-0.124939
-0.493758	simultaneously prefetching the	-0.124939
-0.493758	this distance the	-0.124939
-0.493758	from aligning the	-0.124939
-0.201562	to reload the	-0.124939
-0.493758	Without optimization, the	-0.124939
-0.497522	expressions. Whether the	-0.124939
-0.493758	This removed the	-0.124939
-0.493758	it optimizes the	-0.124939
-0.812088	// Return the	-0.124939
-0.201562	In fact, the	-0.124939
-0.493758	may ignore the	-0.124939
-0.088699	that reflects the	-0.124939
-0.088699	This reflects the	-0.124939
-0.088699	double reflects the	-0.124939
-0.105674	to manipulate the	-0.124939
-0.245623	that copies the	-0.124939
-0.245623	ebx,31 copies the	-0.124939
-0.105674	to study the	-0.124939
-0.245623	by bypassing the	-0.124939
-0.245623	compiler bypassing the	-0.124939
-0.493758	Please skip the	-0.124939
-0.493758	it allocates the	-0.124939
-0.201562	compilers offer the	-0.124939
-0.493758	// At the	-0.124939
-0.201562	before leaving the	-0.425969
-0.245623	processors. Consider the	-0.124939
-0.245623	loops. Consider the	-0.124939
-0.708175	and leave the	-0.124939
-0.497522	user. With the	-0.124939
-0.497522	code. Sometimes the	-0.124939
-0.708175	to justify the	-0.124939
-0.812088	the contrary, the	-0.124939
-0.105674	to cover the	-0.425969
-0.493758	same way, the	-0.124939
-0.245623	slow. Today, the	-0.124939
-0.245623	computers. Today, the	-0.124939
-0.493758	to focus the	-0.124939
-0.493758	is probably the	-0.124939
-0.708175	for improving the	-0.124939
-0.497522	eax holds the	-0.124939
-0.105674	or moving the	-0.425969
-0.493758	pulses since the	-0.124939
-0.027601	is beyond the	-0.602060
-0.245623	of organizing the	-0.124939
-0.245623	by organizing the	-0.124939
-0.493758	can open the	-0.124939
-0.245623	and measuring the	-0.124939
-0.245623	by measuring the	-0.124939
-0.105674	to utilize the	-0.124939
-0.497522	counts represent the	-0.124939
-0.105674	that measures the	-0.124939
-0.042089	can bypass the	-0.124939
-0.088699	or bypass the	-0.124939
-0.890451	will evict the	-0.124939
-0.493758	function. Copying the	-0.124939
-0.493758	and market the	-0.124939
-0.493758	2011). Instead, the	-0.124939
-0.812088	by joining the	-0.124939
-0.812088	to determine the	-0.124939
-0.105674	of interpreting the	-0.425969
-0.453389	for modifying the	-0.124939
-0.453389	systems lack the	-0.124939
-0.453389	Writing past the	-0.124939
-0.643868	to override the	-0.124939
-0.453389	to exit the	-0.124939
-0.453389	variable having the	-0.124939
-0.453389	This reduces the	-0.124939
-0.453389	better explains the	-0.124939
-0.453389	to emulate the	-0.124939
-0.453389	12.1a. Enable the	-0.124939
-0.453389	78). Adding the	-0.124939
-0.140920	vector. Organize the	-0.124939
-0.140920	bottleneck. Organize the	-0.124939
-0.453389	while loop, the	-0.124939
-0.140920	by invoking the	-0.124939
-0.140920	without invoking the	-0.124939
-0.453389	which calculates the	-0.124939
-0.064769	3 Finding the	-0.425969
-0.140920	for putting the	-0.124939
-0.140920	by putting the	-0.124939
-0.064769	2.8 Overcoming the	-0.425969
-0.453389	necessary. Take the	-0.124939
-0.140920	it increases the	-0.124939
-0.140920	table increases the	-0.124939
-0.453389	actively invalidate the	-0.124939
-0.453389	4. So the	-0.124939
-0.453389	operands. Nevertheless, the	-0.124939
-0.453389	FuncC. Unrolling the	-0.124939
-0.453389	which determines the	-0.124939
-0.140920	logic behind the	-0.124939
-0.140920	hidden behind the	-0.124939
-0.140920	to isolate the	-0.124939
-0.140920	and isolate the	-0.124939
-0.453389	for verifying the	-0.124939
-0.453389	sizeof(float)). Now, the	-0.124939
-0.140920	priority. Especially the	-0.124939
-0.140920	addresses. Especially the	-0.124939
-0.453389	library requiring the	-0.124939
-0.453389	can influence the	-0.124939
-0.140920	that loads the	-0.124939
-0.140920	program loads the	-0.124939
-0.643868	to draw the	-0.124939
-0.140920	of sharing the	-0.124939
-0.140920	are sharing the	-0.124939
-0.064769	and redo the	-0.124939
-0.643868	and deleting the	-0.124939
-0.453389	that covered the	-0.124939
-0.453389	spot. Sometimes, the	-0.124939
-0.643868	will crash the	-0.124939
-0.643868	to reserve the	-0.124939
-0.453389	run. Both the	-0.124939
-0.643868	is loaded, the	-0.124939
-0.643868	by extending the	-0.124939
-0.140920	operator forces the	-0.124939
-0.140920	union forces the	-0.124939
-0.064769	and stop the	-0.124939
-0.643868	to organize the	-0.124939
-0.453389	compiler sees the	-0.124939
-0.453389	and studying the	-0.124939
-0.453389	it compares the	-0.124939
-0.064769	to fix the	-0.124939
-0.453389	may involve the	-0.124939
-0.453389	by removing the	-0.124939
-0.453389	compiler interpret the	-0.124939
-0.453389	to flip the	-0.124939
-0.453389	these reasons, the	-0.124939
-0.453389	for relieving the	-0.124939
-0.453389	2. Put the	-0.124939
-0.350854	by ignoring the	-0.124939
-0.350854	that owns the	-0.124939
-0.350854	that limits the	-0.124939
-0.350854	sign bit, the	-0.124939
-0.350854	interface. Otherwise the	-0.124939
-0.350854	will trigger the	-0.124939
-0.350854	// Re-do the	-0.124939
-0.350854	problems separating the	-0.124939
-0.350854	decades ago, the	-0.124939
-0.350854	may reuse the	-0.124939
-0.350854	classes. Including the	-0.124939
-0.350854	ebx restores the	-0.124939
-0.350854	even telling the	-0.124939
-0.350854	fast. Calculating the	-0.124939
-0.350854	to thank the	-0.124939
-0.350854	consumers. Choose the	-0.124939
-0.350854	system forbids the	-0.124939
-0.350854	program. Weighing the	-0.124939
-0.350854	as eliminating the	-0.124939
-0.350854	by emulating the	-0.124939
-0.350854	version satisfies the	-0.124939
-0.350854	are among the	-0.124939
-0.350854	icon signaling the	-0.124939
-0.350854	of solving the	-0.124939
-0.350854	it lacks the	-0.124939
-0.350854	we loose the	-0.124939
-0.350854	handling. Omitting the	-0.124939
-0.350854	by controlling the	-0.124939
-0.350854	before trying the	-0.124939
-0.350854	by inverting the	-0.124939
-0.350854	may interleave the	-0.124939
-0.350854	rarely justifies the	-0.124939
-0.350854	to weigh the	-0.124939
-0.350854	to restart the	-0.124939
-0.350854	occupied throughout the	-0.124939
-0.350854	(This eliminates the	-0.124939
-0.350854	by dropping the	-0.124939
-0.350854	example 12.2, the	-0.124939
-0.350854	if-else structure), the	-0.124939
-0.350854	experience. Occasionally, the	-0.124939
-0.350854	at explaining the	-0.124939
-0.350854	for holding the	-0.124939
-0.350854	included. Combining the	-0.124939
-0.350854	to fetch the	-0.124939
-0.350854	by constructing the	-0.124939
-0.350854	processor enters the	-0.124939
-0.350854	can steal the	-0.124939
-0.350854	NAN. Avoiding the	-0.124939
-0.350854	to localize the	-0.124939
-0.350854	spot. Repeating the	-0.124939
-0.350854	column 28, the	-0.124939
-0.350854	addition to) the	-0.124939
-0.350854	9.3 shows, the	-0.124939
-0.350854	without paying the	-0.124939
-0.350854	doesn't provide the	-0.124939
-0.350854	operations in-between the	-0.124939
-0.350854	often abusing the	-0.124939
-0.350854	by wrapping the	-0.124939
-0.350854	dispatching. Underestimating the	-0.124939
-0.350854	to reinvent the	-0.124939
-0.350854	table summarizes the	-0.124939
-0.350854	condition terminates the	-0.124939
-0.350854	compiler puts the	-0.124939
-0.350854	then merge the	-0.124939
-0.350854	to combine the	-0.124939
-0.350854	alias upon the	-0.124939
-0.350854	operation isolates the	-0.124939
-0.350854	solution. Sort the	-0.124939
-0.350854	and reorganize the	-0.124939
-0.350854	faster despite the	-0.124939
-0.350854	linker extracts the	-0.124939
-0.350854	This ends the	-0.124939
-0.350854	complicated? Because the	-0.124939
-0.350854	which interprets the	-0.124939
-0.350854	overflow. Taking the	-0.124939
-0.350854	example 12.1a, the	-0.124939
-0.350854	are met: the	-0.124939
-0.350854	exist. Therefore the	-0.124939
-0.350854	that crashes the	-0.124939
-0.350854	to deallocate the	-0.124939
-0.350854	program. During the	-0.124939
-0.350854	may view the	-0.124939
-0.350854	trick violates the	-0.124939
-0.350854	to consult the	-0.124939
-0.350854	any event, the	-0.124939
-0.350854	to minimize the	-0.124939
-0.350854	example 12.1b, the	-0.124939
-0.350854	in applying the	-0.124939
-0.350854	to refresh the	-0.124939
-0.350854	and concentrate the	-0.124939
-0.350854	that shares the	-0.124939
-0.350854	chains, namely the	-0.124939
-0.350854	14.30 finds the	-0.124939
-0.350854	= ++b; the	-0.124939
-0.350854	nowadays stress the	-0.124939
-0.350854	to collect the	-0.124939
-0.350854	are. Declare the	-0.124939
-0.350854	fine- tune the	-0.124939
-0.350854	my tests, the	-0.124939
-0.350854	variables. Move the	-0.124939
-0.350854	and closes the	-0.124939
-0.350854	etc. Overriding the	-0.124939
-0.350854	to mimic the	-0.124939
-0.350854	can overwrite the	-0.124939
-0.350854	of activating the	-0.124939
-0.350854	when exiting the	-0.124939
-0.350854	often disturb the	-0.124939
-0.350854	and replaces the	-0.124939
-0.350854	valid. Re-interpreting the	-0.124939
-1.924360	that a is	-0.124939
-1.226115	if a is	-0.124939
-1.612578	when a is	-0.124939
-1.560844	points to is	-0.124939
-0.490862	pointed to is	-0.124939
-1.158492	objects and is	-0.124939
-0.879677	big and is	-0.124939
-0.879677	support and is	-0.124939
-1.044519	dispatching and is	-0.124939
-0.879677	checking and is	-0.124939
-0.590196	is, and is	-0.124939
-1.464239	optimized for is	-0.124939
-0.687427	function that is	-0.124939
-0.692102	code that is	-0.124939
-0.746268	time that is	-0.124939
-0.516609	memory that is	-0.124939
-1.141778	program that is	-0.124939
-0.741912	one that is	-0.124939
-0.746268	set that is	-0.124939
-0.932325	class that is	-0.124939
-0.316397	size that is	-0.301030
-0.981456	library that is	-0.124939
-0.861079	object that is	-0.124939
-0.981456	version that is	-0.124939
-0.138792	value that is	-0.124939
-0.861079	variable that is	-0.124939
-0.516609	table that is	-0.124939
-0.451891	software that is	-0.124939
-1.193688	branch that is	-0.124939
-0.185632	address that is	-0.425969
-0.861079	method that is	-0.124939
-0.516609	type that is	-0.124939
-0.516609	constant that is	-0.124939
-0.311951	expression that is	-0.249877
-0.516609	zero that is	-0.124939
-0.516609	offset that is	-0.124939
-0.185632	operand that is	-0.124939
-0.861079	everything that is	-0.124939
-0.516609	measure that is	-0.124939
-0.516609	polymorphism that is	-0.124939
-0.516609	unwinding that is	-0.124939
-0.516609	divisor that is	-0.124939
-0.746268	routine that is	-0.124939
-0.516609	(PLT) that is	-0.124939
-0.185632	Everything that is	-0.425969
-0.516609	Code that is	-0.124939
-0.727369	and it is	-0.124939
-0.443025	that it is	-0.221849
-0.129422	if it is	-0.267606
-0.346911	as it is	-0.124939
-0.382930	than it is	-0.124939
-1.072444	time it is	-0.124939
-0.286500	when it is	-0.301030
-0.256231	then it is	-0.271067
-0.496734	because it is	-0.124939
-0.280052	CPU it is	-0.124939
-0.074904	If it is	-0.124939
-0.398218	which it is	-0.124939
-0.400348	but it is	-0.191886
-0.280052	one it is	-0.124939
-0.229261	where it is	-0.124939
-0.289251	before it is	-0.124939
-0.280052	libraries it is	-0.124939
-0.221972	sure it is	-0.124939
-0.235052	cases it is	-0.425969
-0.280052	important it is	-0.124939
-0.280052	while it is	-0.124939
-0.211233	But it is	-0.124939
-0.280052	therefore it is	-0.124939
-0.035839	whether it is	-0.301030
-0.280052	However, it is	-0.124939
-0.117833	cases, it is	-0.124939
-0.280052	microprocessor it is	-0.124939
-0.159226	Therefore, it is	-0.124939
-0.280052	applications it is	-0.124939
-0.280052	Here, it is	-0.124939
-0.398218	though it is	-0.124939
-0.117833	program, it is	-0.124939
-0.280052	why it is	-0.124939
-0.446023	whenever it is	-0.124939
-0.280052	used, it is	-0.124939
-0.280052	Obviously, it is	-0.124939
-0.280052	Here it is	-0.124939
-0.280052	Likewise, it is	-0.124939
-0.446023	called, it is	-0.124939
-0.280052	Now it is	-0.124939
-0.280052	above, it is	-0.124939
-0.280052	general, it is	-0.124939
-0.280052	C++, it is	-0.124939
-0.280052	event it is	-0.124939
-0.280052	hand, it is	-0.124939
-0.280052	Fortunately, it is	-0.124939
-0.117833	And it is	-0.124939
-0.280052	platforms, it is	-0.124939
-0.280052	languages, it is	-0.124939
-0.280052	Furthermore, it is	-0.124939
-0.280052	words, it is	-0.124939
-0.398218	Sometimes it is	-0.124939
-0.280052	Today, it is	-0.124939
-0.398218	Often, it is	-0.124939
-0.280052	Nevertheless, it is	-0.124939
-0.280052	software, it is	-0.124939
-0.280052	algebra, it is	-0.124939
-0.117833	projects, it is	-0.124939
-0.280052	method, it is	-0.124939
-0.280052	accessed, it is	-0.124939
-0.280052	see, it is	-0.124939
-0.280052	performance, it is	-0.124939
-0.280052	Hence, it is	-0.124939
-0.280052	design, it is	-0.124939
-0.280052	bottleneck, it is	-0.124939
-0.280052	project, it is	-0.124939
-0.280052	habit, it is	-0.124939
-0.280052	these, it is	-0.124939
-0.280052	nature, it is	-0.124939
-0.550370	the function is	-0.321233
-0.767251	a function is	-0.234083
-0.651931	The function is	-0.124939
-0.349204	This function is	-0.124939
-0.799274	this function is	-0.124939
-0.938349	A function is	-0.124939
-0.445203	other function is	-0.124939
-0.878455	each function is	-0.124939
-0.469443	member function is	-0.124939
-0.599617	critical function is	-0.124939
-0.631240	template function is	-0.124939
-0.650770	virtual function is	-0.124939
-0.716500	inline function is	-0.124939
-0.766358	dispatcher function is	-0.124939
-0.716500	graphics function is	-0.124939
-0.445203	linked function is	-0.124939
-0.799274	frame function is	-0.124939
-0.716500	Assume function is	-0.124939
-0.863268	pure function is	-0.124939
-0.766358	dispatched function is	-0.124939
-0.435479	leaf function is	-0.124939
-0.445203	lrint function is	-0.124939
-0.445203	InstructionSet() function is	-0.124939
-0.445203	user-defined function is	-0.124939
-0.445203	sin function is	-0.124939
-0.598622	while if is	-0.124939
-0.596460	j by is	-0.124939
-1.699088	rely on is	-0.124939
-0.716161	the code is	-0.173243
-0.714752	of code is	-0.124939
-0.835244	The code is	-0.124939
-0.450052	function code is	-0.124939
-0.776579	this code is	-0.124939
-0.289999	program code is	-0.301030
-0.450052	such code is	-0.124939
-0.312905	system code is	-0.124939
-0.441139	intermediate code is	-0.425969
-0.873489	above code is	-0.124939
-0.638705	source code is	-0.124939
-0.169164	your code is	-0.124939
-0.964774	position-independent code is	-0.124939
-0.776579	machine code is	-0.124939
-0.312905	Position-independent code is	-0.124939
-0.638705	Vectorized code is	-0.124939
-0.638705	built-in code is	-0.124939
-0.450052	unsafe code is	-0.124939
-0.450052	Interpreted code is	-0.124939
-0.450052	Complicated code is	-0.124939
-0.591883	etc., as is	-0.124939
-0.591883	vectors, as is	-0.124939
-0.215659	// This is	-0.346788
-0.800337	} This is	-0.124939
-0.663091	code. This is	-0.124939
-0.063449	time. This is	-0.124939
-0.340750	Gnu This is	-0.124939
-0.667481	function. This is	-0.124939
-0.538152	functions. This is	-0.124939
-0.340750	b; This is	-0.124939
-0.591252	memory. This is	-0.124939
-0.591252	program. This is	-0.124939
-0.256334	data. This is	-0.124939
-0.538152	called. This is	-0.124939
-0.479779	CPUs. This is	-0.124939
-0.479779	loop. This is	-0.124939
-0.479779	pointer. This is	-0.124939
-0.479779	cases. This is	-0.124939
-0.340750	size. This is	-0.124939
-0.538152	class. This is	-0.124939
-0.340750	it. This is	-0.124939
-0.479779	variable. This is	-0.124939
-0.479779	purposes. This is	-0.124939
-0.340750	instructions. This is	-0.124939
-0.340750	vector. This is	-0.124939
-0.479779	well. This is	-0.124939
-0.340750	returns. This is	-0.124939
-0.479779	block. This is	-0.124939
-0.340750	value. This is	-0.124939
-0.340750	system. This is	-0.124939
-0.340750	software. This is	-0.124939
-0.340750	line. This is	-0.124939
-0.340750	parameters. This is	-0.124939
-0.479779	simultaneously. This is	-0.124939
-0.340750	Studio This is	-0.124939
-0.340750	processor. This is	-0.124939
-0.256334	bits. This is	-0.124939
-0.479779	do. This is	-0.124939
-0.479779	16. This is	-0.124939
-0.479779	times. This is	-0.124939
-0.340750	counter. This is	-0.124939
-0.340750	structure. This is	-0.124939
-0.340750	Mars This is	-0.124939
-0.340750	up. This is	-0.124939
-0.340750	objects. This is	-0.124939
-0.340750	pointers. This is	-0.124939
-0.340750	counts. This is	-0.124939
-0.340750	results. This is	-0.124939
-0.340750	addition. This is	-0.124939
-0.340750	long. This is	-0.124939
-0.340750	directives. This is	-0.124939
-0.479779	CPUs"). This is	-0.124939
-0.340750	same. This is	-0.124939
-0.340750	other. This is	-0.124939
-0.340750	truncation. This is	-0.124939
-0.340750	x. This is	-0.124939
-0.340750	frameworks. This is	-0.124939
-0.340750	compiled. This is	-0.124939
-0.340750	32. This is	-0.124939
-0.340750	declaration. This is	-0.124939
-0.340750	default. This is	-0.124939
-0.340750	rounding. This is	-0.124939
-0.340750	temporarily. This is	-0.124939
-0.340750	predicted. This is	-0.124939
-0.340750	alloca. This is	-0.124939
-0.340750	$B1$2:. This is	-0.124939
-0.340750	before. This is	-0.124939
-0.340750	de-allocated. This is	-0.124939
-0.340750	[ecx+eax*4]. This is	-0.124939
-0.340750	0.666666666666666666667; This is	-0.124939
-0.340750	spaces. This is	-0.124939
-0.340750	if. This is	-0.124939
-0.340750	specialization. This is	-0.124939
-0.340750	happens. This is	-0.124939
-0.340750	35 This is	-0.124939
-0.340750	kbytes. This is	-0.124939
-0.340750	(SVML). This is	-0.124939
-0.340750	often. This is	-0.124939
-0.340750	i*sizeof(S1). This is	-0.124939
-0.340750	CString. This is	-0.124939
-0.340750	deprecated. This is	-0.124939
-0.340750	((a+b)+c)+d. This is	-0.124939
-0.340750	measure. This is	-0.124939
-0.340750	usability. This is	-0.124939
-0.340750	eax,eax. This is	-0.124939
-0.374397	an int is	-0.124939
-2.008973	short int is	-0.124939
-1.192589	the compiler is	-0.301030
-1.143213	The compiler is	-0.124939
-0.753633	This compiler is	-0.124939
-0.733311	Intel compiler is	-0.124939
-0.551590	C++ compiler is	-0.425969
-1.292208	Gnu compiler is	-0.124939
-0.753633	Clang compiler is	-0.124939
-0.520934	Mars compiler is	-0.124939
-1.267092	of x is	-0.124939
-0.595057	that x is	-0.124939
-1.118509	of this is	-0.124939
-1.020598	for this is	-0.124939
-0.813323	that this is	-0.124939
-0.513861	if this is	-0.124939
-0.494351	as this is	-0.124939
-0.709147	from this is	-0.124939
-0.640479	because this is	-0.124939
-0.709147	If this is	-0.124939
-0.354837	but this is	-0.204120
-0.494351	example, this is	-0.124939
-0.494351	test this is	-0.124939
-0.494351	system this is	-0.124939
-0.876580	However, this is	-0.124939
-0.180335	Obviously, this is	-0.124939
-0.494351	unfortunately this is	-0.124939
-1.361431	the time is	-0.425969
-0.531467	This time is	-0.124939
-0.531467	If time is	-0.124939
-1.019569	much time is	-0.124939
-0.350584	calculation time is	-0.124939
-0.531467	conversion time is	-0.124939
-0.558344	response time is	-0.124939
-0.531467	No time is	-0.124939
-0.531467	measured time is	-0.124939
-0.531467	Extra time is	-0.124939
-0.531467	computation time is	-0.124939
-0.891230	you use is	-0.124939
-0.245180	of A is	-0.124939
-0.052684	} It is	-0.124939
-0.112654	functions It is	-0.124939
-0.112654	object It is	-0.124939
-0.112654	libraries It is	-0.124939
-0.025544	code. It is	-0.124939
-0.117526	time. It is	-0.124939
-0.112654	pointers It is	-0.124939
-0.112654	etc. It is	-0.124939
-0.116710	functions. It is	-0.124939
-0.116710	memory. It is	-0.124939
-0.066205	used. It is	-0.124939
-0.112654	systems. It is	-0.124939
-0.112654	data. It is	-0.124939
-0.112654	set. It is	-0.124939
-0.112654	processors. It is	-0.124939
-0.112654	called. It is	-0.124939
-0.112654	compiler. It is	-0.124939
-0.112654	are: It is	-0.124939
-0.112654	loop. It is	-0.124939
-0.249560	pointer. It is	-0.124939
-0.112654	cases. It is	-0.124939
-0.112654	variables. It is	-0.124939
-0.112654	calls. It is	-0.124939
-0.112654	object. It is	-0.124939
-0.052684	calculations. It is	-0.124939
-0.112654	optimization. It is	-0.124939
-0.112654	performance. It is	-0.124939
-0.112654	thread. It is	-0.124939
-0.112654	structures It is	-0.124939
-0.112654	references. It is	-0.124939
-0.112654	Windows. It is	-0.124939
-0.116710	integers. It is	-0.124939
-0.052684	to. It is	-0.124939
-0.112654	critical. It is	-0.124939
-0.112654	executed. It is	-0.124939
-0.112654	faster. It is	-0.124939
-0.052684	problems. It is	-0.124939
-0.112654	expressions. It is	-0.124939
-0.112654	arrays. It is	-0.124939
-0.112654	zero. It is	-0.124939
-0.112654	language. It is	-0.124939
-0.112654	is. It is	-0.124939
-0.112654	automatically. It is	-0.124939
-0.052684	core. It is	-0.124939
-0.112654	vectorization. It is	-0.124939
-0.112654	do. It is	-0.124939
-0.112654	constant. It is	-0.124939
-0.112654	members. It is	-0.124939
-0.112654	times. It is	-0.124939
-0.112654	structure. It is	-0.124939
-0.112654	profiler. It is	-0.124939
-0.112654	handling. It is	-0.124939
-0.112654	a[i]; It is	-0.124939
-0.112654	execution. It is	-0.124939
-0.112654	input. It is	-0.124939
-0.112654	IDE. It is	-0.124939
-0.112654	versions. It is	-0.124939
-0.112654	information. It is	-0.124939
-0.112654	interface. It is	-0.124939
-0.112654	unit-testing It is	-0.124939
-0.112654	style. It is	-0.124939
-0.112654	counts. It is	-0.124939
-0.112654	smaller. It is	-0.124939
-0.112654	programs. It is	-0.124939
-0.112654	part. It is	-0.124939
-0.112654	organization It is	-0.124939
-0.189419	purpose. It is	-0.124939
-0.112654	manner. It is	-0.124939
-0.112654	x. It is	-0.124939
-0.112654	context. It is	-0.124939
-0.112654	aligned. It is	-0.124939
-0.112654	throw. It is	-0.124939
-0.112654	course. It is	-0.124939
-0.112654	73). It is	-0.124939
-0.112654	disadvantages: It is	-0.124939
-0.112654	away. It is	-0.124939
-0.112654	decomposition. It is	-0.124939
-0.112654	lost. It is	-0.124939
-0.112654	read. It is	-0.124939
-0.112654	148 It is	-0.124939
-0.112654	Gnu. It is	-0.124939
-0.112654	updated. It is	-0.124939
-0.112654	shows. It is	-0.124939
-0.112654	efficiently. It is	-0.124939
-0.112654	divisions. It is	-0.124939
-0.112654	72. It is	-0.124939
-0.112654	message. It is	-0.124939
-0.112654	product. It is	-0.124939
-0.112654	off. It is	-0.124939
-0.112654	diagnose. It is	-0.124939
-0.112654	bloat. It is	-0.124939
-0.112654	polymorphism. It is	-0.124939
-0.112654	move. It is	-0.124939
-0.112654	ahead. It is	-0.124939
-0.112654	strategies It is	-0.124939
-0.112654	type-casting. It is	-0.124939
-0.112654	response. It is	-0.124939
-0.112654	happening. It is	-0.124939
-0.112654	standardized. It is	-0.124939
-0.112654	convenient. It is	-0.124939
-0.112654	animation. It is	-0.124939
-0.112654	_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON); It is	-0.124939
-0.112654	can. It is	-0.124939
-0.112654	indeed. It is	-0.124939
-0.112654	hackers. It is	-0.124939
-0.112654	friendly. It is	-0.124939
-0.112654	54. It is	-0.124939
-0.112654	considerations. It is	-0.124939
-0.112654	poorly. It is	-0.124939
-0.112654	fashion. It is	-0.124939
-0.112654	between. It is	-0.124939
-0.112654	130. It is	-0.124939
-0.112654	57). It is	-0.124939
-0.112654	correctness. It is	-0.124939
-0.112654	61. It is	-0.124939
-0.112654	sizes? It is	-0.124939
-0.112654	a*b*c*2. It is	-0.124939
-0.112654	double's. It is	-0.124939
-0.112654	pooling. It is	-0.124939
-0.112654	decimals. It is	-0.124939
-0.112654	develop. It is	-0.124939
-0.112654	leaks. It is	-0.124939
-0.112654	costless. It is	-0.124939
-0.112654	queue. It is	-0.124939
-1.533284	the memory is	-0.124939
-1.398763	in memory is	-0.124939
-1.408285	static memory is	-0.124939
-1.182985	allocated memory is	-0.124939
-1.002915	static data is	-0.124939
-0.575122	input data is	-0.124939
-0.850788	thread-specific data is	-0.124939
-0.575122	numerical data is	-0.124939
-0.530420	the program is	-0.227601
-0.800100	a program is	-0.124939
-0.960870	The program is	-0.124939
-0.822629	test program is	-0.124939
-0.435030	optimized program is	-0.124939
-0.305537	mode program is	-0.124939
-0.435030	calling program is	-0.124939
-0.435030	intensive program is	-0.124939
-0.997594	other functions is	-0.124939
-1.076775	member functions is	-0.124939
-0.997594	these functions is	-0.124939
-2.018501	the CPU is	-0.124939
-0.587324	(NetBurst) CPU is	-0.124939
-1.666078	the other is	-0.124939
-1.437031	each other is	-0.124939
-1.070544	scan instruction is	-0.124939
-0.843797	the point is	-0.124939
-1.709556	floating point is	-0.425969
-0.843797	decimal point is	-0.124939
-1.163858	the loop is	-0.124939
-1.467507	a loop is	-0.124939
-0.938007	while loop is	-0.124939
-1.269143	innermost loop is	-0.124939
-0.363095	compiler which is	-0.124939
-0.363095	memory which is	-0.124939
-0.573896	pointer which is	-0.124939
-0.363095	library which is	-0.124939
-0.363095	object which is	-0.124939
-0.363095	variable which is	-0.124939
-0.510863	address which is	-0.124939
-0.363095	bit which is	-0.124939
-0.363095	support which is	-0.124939
-0.363095	process which is	-0.124939
-0.363095	operation which is	-0.124939
-0.510863	code, which is	-0.124939
-0.363095	compiler, which is	-0.124939
-0.363095	trick which is	-0.124939
-0.363095	integers, which is	-0.124939
-0.363095	size, which is	-0.124939
-0.363095	latency which is	-0.124939
-0.144658	true, which is	-0.124939
-0.510863	up, which is	-0.124939
-0.510863	stack, which is	-0.124939
-0.363095	number, which is	-0.124939
-0.363095	division, which is	-0.124939
-0.363095	comparison, which is	-0.124939
-0.363095	counter, which is	-0.124939
-0.363095	collector which is	-0.124939
-0.144658	operation, which is	-0.124939
-0.363095	(DLL) which is	-0.124939
-0.363095	x<<3, which is	-0.124939
-0.363095	branch, which is	-0.124939
-0.363095	asmlib, which is	-0.124939
-0.363095	polymorphism, which is	-0.124939
-0.363095	everything, which is	-0.124939
-0.363095	output, which is	-0.124939
-0.363095	.NET, which is	-0.124939
-0.363095	bit-mask which is	-0.124939
-0.363095	2.5, which is	-0.124939
-1.167986	at all is	-0.124939
-0.570761	Intel but is	-0.124939
-0.570761	C++ but is	-0.124939
-0.842598	processors, but is	-0.124939
-0.570761	test, but is	-0.124939
-2.053686	is used is	-0.124939
-0.568355	if one is	-0.124939
-0.366414	which one is	-0.124939
-0.838109	preceding one is	-0.124939
-1.235222	a cache is	-0.124939
-1.226372	code cache is	-0.124939
-0.763461	A cache is	-0.124939
-0.659158	data cache is	-0.124939
-1.135091	same cache is	-0.124939
-0.842642	level-2 cache is	-0.425969
-1.050281	level-1 cache is	-0.124939
-1.327636	the integer is	-0.124939
-1.135217	an integer is	-0.124939
-0.557941	even integer is	-0.124939
-0.958351	An integer is	-0.124939
-0.317306	instruction set is	-0.425969
-1.161419	the class is	-0.124939
-1.287705	a class is	-0.124939
-0.629697	or class is	-0.124939
-0.449921	template class is	-0.124939
-1.074967	child class is	-0.124939
-1.051817	derived class is	-0.124939
-0.513565	base class is	-0.124939
-1.975090	to do is	-0.124939
-1.691970	can do is	-0.124939
-0.544040	This example is	-0.124939
-0.737431	this example is	-0.124939
-0.544040	An example is	-0.124939
-0.191879	My example is	-0.425969
-1.177660	different compilers is	-0.124939
-1.349987	and double is	-0.124939
-0.572039	A double is	-0.124939
-0.844989	64-bit double is	-0.124939
-1.592451	the size is	-0.124939
-0.717779	for size is	-0.124939
-0.499595	cache size is	-0.124939
-0.754867	integer size is	-0.124939
-0.889333	register size is	-0.124939
-0.499595	its size is	-0.124939
-0.499595	specific size is	-0.124939
-1.032761	line size is	-0.124939
-0.499595	smaller size is	-0.124939
-0.499595	RAM size is	-0.124939
-0.523096	the pointer is	-0.124939
-0.488418	The pointer is	-0.124939
-0.813403	function pointer is	-0.124939
-0.488418	This pointer is	-0.124939
-0.699460	this pointer is	-0.124939
-0.801046	A pointer is	-0.124939
-0.699460	Assume pointer is	-0.124939
-0.508891	smart pointer is	-0.124939
-0.438331	of b is	-0.124939
-1.198803	and b is	-0.124939
-0.495799	that b is	-0.124939
-0.223916	if b is	-0.602060
-0.081013	when b is	-0.249877
-0.787327	the library is	-0.124939
-0.537130	vector library is	-0.124939
-0.753482	dynamic library is	-0.124939
-0.195518	when i is	-0.124939
-0.560658	If i is	-0.124939
-0.560658	counter i is	-0.124939
-0.498911	the object is	-0.124939
-0.735048	or object is	-0.124939
-0.845996	an object is	-0.124939
-0.429975	no object is	-0.124939
-0.608083	new object is	-0.124939
-0.653167	shared object is	-0.124939
-0.429975	Each object is	-0.124939
-0.608083	local object is	-0.124939
-0.429975	original object is	-0.124939
-1.295622	point number is	-0.124939
-0.583773	higher number is	-0.124939
-0.593408	word static is	-0.124939
-0.387170	and there is	-0.425969
-0.341968	that there is	-0.124939
-0.257399	if there is	-0.321233
-0.184808	- there is	-0.124939
-0.082647	when there is	-0.425969
-0.043023	then there is	-0.602060
-0.312691	because there is	-0.124939
-0.385770	If there is	-0.425969
-0.288860	but there is	-0.124939
-0.277021	where there is	-0.124939
-0.184808	case there is	-0.124939
-0.453112	But there is	-0.124939
-0.184808	whether there is	-0.124939
-0.053315	unless there is	-0.602060
-0.116787	cases, there is	-0.301030
-0.184808	why there is	-0.124939
-0.184808	course there is	-0.124939
-0.184808	used, there is	-0.124939
-0.184808	diagonal there is	-0.124939
-0.184808	systems, there is	-0.124939
-0.184808	however, there is	-0.124939
-0.184808	general, there is	-0.124939
-0.184808	unfortunately there is	-0.124939
-0.277021	Typically, there is	-0.124939
-0.184808	enabled there is	-0.124939
-0.736005	and C++ is	-0.124939
-1.025185	in C++ is	-0.124939
-0.510531	when C++ is	-0.124939
-0.510531	However, C++ is	-0.124939
-0.510531	libraries. C++ is	-0.124939
-0.510531	reasons. C++ is	-0.124939
-0.510531	15. C++ is	-0.124939
-0.510531	Portability C++ is	-0.124939
-0.261543	double There is	-0.124939
-0.261543	functions. There is	-0.124939
-0.261543	... There is	-0.124939
-0.261543	systems. There is	-0.124939
-0.374078	processors. There is	-0.124939
-0.261543	process There is	-0.124939
-0.261543	called. There is	-0.124939
-0.261543	are: There is	-0.124939
-0.261543	size. There is	-0.124939
-0.261543	object. There is	-0.124939
-0.261543	way. There is	-0.124939
-0.261543	references. There is	-0.124939
-0.261543	returns. There is	-0.124939
-0.261543	allocation. There is	-0.124939
-0.261543	inefficient. There is	-0.124939
-0.261543	block. There is	-0.124939
-0.261543	parameter. There is	-0.124939
-0.261543	value. There is	-0.124939
-0.261543	branch. There is	-0.124939
-0.261543	parameters. There is	-0.124939
-0.261543	bits. There is	-0.124939
-0.261543	automatically. There is	-0.124939
-0.261543	throughput There is	-0.124939
-0.261543	consuming. There is	-0.124939
-0.261543	execution. There is	-0.124939
-0.261543	support. There is	-0.124939
-0.261543	screen. There is	-0.124939
-0.261543	programmer. There is	-0.124939
-0.261543	created. There is	-0.124939
-0.261543	43). There is	-0.124939
-0.261543	Gnu. There is	-0.124939
-0.261543	Conclusion There is	-0.124939
-0.261543	87). There is	-0.124939
-0.261543	recycled? There is	-0.124939
-0.261543	x4∙xn-4. There is	-0.124939
-0.261543	Namespaces There is	-0.124939
-0.261543	returned. There is	-0.124939
-0.261543	.so). There is	-0.124939
-0.737089	the array is	-0.124939
-0.728773	each array is	-0.124939
-0.728773	simple array is	-0.124939
-0.951809	An array is	-0.124939
-0.963365	multidimensional array is	-0.124939
-0.506214	fixed-size array is	-0.124939
-0.593772	1.fffff, where is	-0.124939
-0.565031	code version is	-0.425969
-0.790258	64-bit version is	-0.124939
-0.541990	Windows version is	-0.124939
-0.541990	compiled version is	-0.124939
-1.164610	the value is	-0.124939
-0.987994	a value is	-0.124939
-1.048815	The value is	-0.124939
-1.063246	each value is	-0.124939
-0.747785	its value is	-0.124939
-0.375683	of objects is	-0.124939
-0.595909	the variable is	-0.221849
-0.757530	a variable is	-0.301030
-0.681877	or variable is	-0.124939
-0.904150	A variable is	-0.124939
-0.583481	do so is	-0.124939
-1.285149	point variables is	-0.124939
-0.649679	register variables is	-0.124939
-1.865446	of 2 is	-0.124939
-0.780972	the table is	-0.124939
-0.974854	virtual table is	-0.124939
-0.896900	lookup table is	-0.124939
-1.017506	the performance is	-0.124939
-0.919570	The performance is	-0.124939
-0.494381	code performance is	-0.124939
-0.494381	when performance is	-0.124939
-0.111569	best performance is	-0.602060
-0.574283	application software is	-0.124939
-0.574283	CPU-intensive software is	-0.124939
-0.580251	storage order is	-0.124939
-0.580251	link order is	-0.124939
-1.196992	the branch is	-0.124939
-0.924125	and branch is	-0.124939
-0.748034	if branch is	-0.124939
-0.549472	control branch is	-0.124939
-0.517649	wrong branch is	-0.124939
-0.658242	data member is	-0.124939
-0.720338	this way is	-0.124939
-1.021368	same way is	-0.124939
-0.336992	other way is	-0.425969
-0.720338	first way is	-0.124939
-0.501141	second way is	-0.124939
-0.501141	compatible way is	-0.124939
-0.874367	of elements is	-0.301030
-0.527267	this address is	-0.124939
-0.527267	its address is	-0.124939
-0.348732	target address is	-0.124939
-0.527267	whose address is	-0.124939
-1.588012	function call is	-0.124939
-1.047106	carry bit is	-0.124939
-1.101975	of register is	-0.124939
-0.554283	A register is	-0.124939
-1.188514	vector register is	-0.124939
-0.452132	of optimization is	-0.124939
-0.746902	software optimization is	-0.124939
-0.516982	interprocedural optimization is	-0.124939
-0.516982	81 optimization is	-0.124939
-1.829988	function libraries is	-0.124939
-1.095035	A template is	-0.124939
-0.193703	powN template is	-0.124939
-0.449591	of registers is	-0.124939
-0.513055	because registers is	-0.124939
-0.853262	integer registers is	-0.124939
-0.513055	available registers is	-0.124939
-0.881069	smart pointers is	-0.124939
-1.460144	the user is	-0.124939
-0.804002	a user is	-0.124939
-1.024405	end user is	-0.124939
-0.479190	The method is	-0.124939
-0.202937	This method is	-0.279841
-0.340322	this method is	-0.124939
-0.242099	which method is	-0.124939
-0.315156	general method is	-0.124939
-0.526655	The access is	-0.124939
-0.605836	memory access is	-0.124939
-0.958990	file access is	-0.124939
-0.566201	= 16 is	-0.124939
-1.199386	by 16 is	-0.124939
-0.547532	if SSE2 is	-0.124939
-0.547532	possible. SSE2 is	-0.124939
-0.547532	executable. SSE2 is	-0.124939
-0.543519	The system is	-0.124939
-1.136564	operating system is	-0.124939
-0.623300	the file is	-0.425969
-1.111553	a file is	-0.124939
-0.589910	meta- programming is	-0.124939
-0.588785	1024 bits is	-0.124939
-1.623711	vector operations is	-0.124939
-1.056156	> 0 is	-0.124939
-0.483016	composite type is	-0.124939
-0.763083	this case is	-0.124939
-0.522166	simplest case is	-0.124939
-0.931832	worst case is	-0.124939
-1.047931	of processors is	-0.124939
-0.792992	Intel processors is	-0.124939
-0.922992	PC processors is	-0.124939
-0.887777	the constant is	-0.124939
-0.429004	a constant is	-0.903090
-0.982237	the error is	-0.124939
-0.861671	of error is	-0.124939
-0.746723	this error is	-0.124939
-0.861671	residual error is	-0.124939
-1.138949	the stack is	-0.124939
-0.458748	The stack is	-0.124939
-0.513738	register stack is	-0.124939
-1.064787	of CPUs is	-0.124939
-0.562818	old CPUs is	-0.124939
-0.588608	character arrays is	-0.124939
-1.679406	function calls is	-0.124939
-0.588879	fastest execution is	-0.124939
-0.835438	the result is	-0.124939
-0.867584	The result is	-0.124939
-0.447258	first result is	-0.124939
-0.447258	second result is	-0.124939
-0.447258	33 result is	-0.124939
-0.437500	the processor is	-0.124939
-0.486045	logical processor is	-0.124939
-0.695608	soft processor is	-0.124939
-0.587754	127 bytes is	-0.124939
-0.821481	of threads is	-0.124939
-0.821481	between threads is	-0.124939
-1.070840	array element is	-0.124939
-0.351131	first element is	-0.124939
-0.497877	C++ language is	-0.124939
-0.540425	programming language is	-0.124939
-0.847341	assembly language is	-0.124939
-0.386158	definition language is	-0.124939
-0.789026	The speed is	-0.124939
-0.708013	for speed is	-0.124939
-0.440638	if speed is	-0.124939
-0.166679	when speed is	-0.124939
-0.166679	where speed is	-0.124939
-0.590568	then c is	-0.124939
-0.362666	How much is	-0.425969
-1.062136	a thread is	-0.124939
-0.891653	one thread is	-0.124939
-0.968582	another thread is	-0.124939
-0.555154	multiplication, etc. is	-0.124939
-0.555154	mutexes, etc. is	-0.124939
-0.195066	The exception is	-0.124939
-1.047909	is allocated is	-0.124939
-0.557434	been allocated is	-0.124939
-1.013275	of overflow is	-0.124939
-0.528729	Integer overflow is	-0.124939
-0.528729	against overflow is	-0.124939
-0.833307	unsigned integers is	-0.124939
-0.816451	output option is	-0.124939
-0.556584	-fpie option is	-0.124939
-0.604696	the matrix is	-0.124939
-0.456064	a matrix is	-0.249877
-1.285814	64-bit Linux is	-0.124939
-1.077845	32-bit Linux is	-0.124939
-0.555768	if AVX is	-0.124939
-0.555768	system. AVX is	-0.124939
-1.552142	vector classes is	-0.124939
-0.968154	double precision is	-0.124939
-0.761255	Single precision is	-0.124939
-0.585023	last line is	-0.124939
-0.586494	already works is	-0.124939
-1.032277	is optimized is	-0.124939
-0.298247	This manual is	-0.301030
-0.174350	present manual is	-0.124939
-0.588379	Address calculation is	-0.124939
-0.589698	brand check is	-0.124939
-0.435001	the problem is	-0.124939
-0.698645	The problem is	-0.124939
-0.397847	this problem is	-0.602060
-0.216181	the solution is	-0.124939
-0.385290	This solution is	-0.124939
-0.216181	this solution is	-0.124939
-0.270169	which solution is	-0.124939
-0.270169	best solution is	-0.124939
-0.385290	optimal solution is	-0.124939
-0.270169	complicated solution is	-0.124939
-0.385290	alternative solution is	-0.124939
-0.270169	powerful solution is	-0.124939
-0.270169	clean solution is	-0.124939
-0.270169	reasonable solution is	-0.124939
-0.270169	universal solution is	-0.124939
-1.096403	the container is	-0.124939
-0.809407	or container is	-0.124939
-1.488979	bitwise operators is	-0.124939
-0.589341	i<n; i++) is	-0.124939
-0.795084	the list is	-0.124939
-0.971176	a list is	-0.124939
-0.537860	linked list is	-0.124939
-0.742017	sorted list is	-0.124939
-0.590071	quite likely is	-0.124939
-0.316245	or structure is	-0.124939
-0.583636	This standard is	-0.124939
-1.301541	the hardware is	-0.124939
-0.545922	+ 1 is	-0.124939
-0.545922	& 1 is	-0.124939
-0.869166	16-bit mode is	-0.124939
-1.636967	to store is	-0.124939
-0.585389	two values is	-0.124939
-0.587737	The sign is	-0.124939
-1.160290	non-inlined copy is	-0.124939
-0.803154	the information is	-0.124939
-0.549224	This information is	-0.124939
-0.797551	memory addresses is	-0.124939
-0.546093	self-relative addresses is	-0.124939
-0.740896	loop counter is	-0.124939
-0.442404	cycles counter is	-0.124939
-0.626952	monitor counter is	-0.124939
-0.626952	cycle counter is	-0.124939
-0.296294	loop count is	-0.249877
-0.414688	first count is	-0.124939
-0.242779	repeat count is	-0.124939
-0.546531	memory allocation is	-0.124939
-0.583017	uncached write is	-0.124939
-0.866176	these problems is	-0.124939
-0.466020	The space is	-0.124939
-0.492782	memory space is	-0.124939
-0.987833	the microprocessor is	-0.124939
-0.581195	several branches is	-0.124939
-0.581639	& operator is	-0.124939
-0.091747	overloaded operator is	-0.124939
-0.367844	dynamic_cast operator is	-0.124939
-0.517551	const_cast operator is	-0.124939
-0.367844	reinterpret_cast operator is	-0.124939
-0.670833	the multiplication is	-0.124939
-0.537479	graphics application is	-0.124939
-0.537479	WTL application is	-0.124939
-0.368173	code caching is	-0.425969
-0.393078	If caching is	-0.124939
-0.393078	without caching is	-0.124939
-0.393078	Data caching is	-0.124939
-0.393078	Efficient caching is	-0.124939
-0.927303	instruction sets is	-0.124939
-1.038817	the expression is	-0.124939
-0.496482	This expression is	-0.124939
-0.496482	some expression is	-0.124939
-0.424244	vector implementation is	-0.124939
-0.424244	which implementation is	-0.124939
-0.390012	software implementation is	-0.425969
-0.775567	complicated implementation is	-0.124939
-0.793402	exception handling is	-0.124939
-0.171006	Exception handling is	-0.124939
-1.536100	data members is	-0.124939
-0.729712	memory model is	-0.124939
-0.409215	CPU model is	-0.124939
-0.781205	processor model is	-0.124939
-0.922320	memory block is	-0.124939
-1.200666	function name is	-0.124939
-0.463592	the conversion is	-0.124939
-0.443828	The disadvantage is	-0.602060
-0.761751	A disadvantage is	-0.124939
-0.761751	Another disadvantage is	-0.124939
-1.440477	to zero is	-0.124939
-0.489144	that what is	-0.124939
-0.927479	on what is	-0.124939
-0.489144	reader what is	-0.124939
-0.533159	the parameter is	-0.124939
-0.599781	a parameter is	-0.124939
-0.533159	function parameter is	-0.124939
-0.149359	size parameter is	-0.124939
-0.637363	template parameter is	-0.124939
-0.532713	The division is	-0.124939
-1.276613	Integer division is	-0.124939
-1.414639	or reference is	-0.124939
-0.534054	A reference is	-0.124939
-1.270026	the source is	-0.124939
-0.535849	This cost is	-0.124939
-0.904809	extra cost is	-0.124939
-0.475372	The reason is	-0.726999
-0.441722	that n is	-0.124939
-0.166967	when n is	-0.124939
-0.441722	where n is	-0.124939
-0.442479	the string is	-0.124939
-0.402569	a string is	-0.425969
-0.442479	each string is	-0.124939
-0.691643	register keyword is	-0.124939
-0.483593	inline keyword is	-0.124939
-0.483593	__fastcall keyword is	-0.124939
-0.808120	table lookup is	-0.124939
-0.577150	of && is	-0.124939
-1.318854	the difference is	-0.124939
-0.898609	The difference is	-0.124939
-0.860527	preceding addition is	-0.124939
-0.483593	This mechanism is	-0.124939
-0.970606	dispatch mechanism is	-0.124939
-0.691643	unwinding mechanism is	-0.124939
-0.577668	of || is	-0.124939
-0.859922	of optimizations is	-0.124939
-0.852401	graphics framework is	-0.124939
-0.297642	static linking is	-0.249877
-0.756932	dynamic linking is	-0.124939
-0.424964	modern microprocessors is	-0.124939
-0.475615	older microprocessors is	-0.124939
-0.187495	work load is	-0.124939
-0.867030	we assume is	-0.124939
-0.710113	point numbers is	-0.124939
-0.518521	of platform is	-0.124939
-0.518521	different platform is	-0.124939
-0.577673	a dispatch is	-0.124939
-0.928265	user interface is	-0.124939
-0.574712	when AVX2 is	-0.124939
-0.659264	on process is	-0.124939
-0.463256	lookup process is	-0.124939
-0.463256	delaying process is	-0.124939
-0.465182	by r is	-0.124939
-0.465182	where r is	-0.124939
-0.465182	whether r is	-0.124939
-0.658510	of storage is	-0.124939
-0.658510	data storage is	-0.124939
-0.749918	Thread-local storage is	-0.124939
-0.173569	a union is	-0.425969
-0.841435	A union is	-0.124939
-0.575803	that 10 is	-0.124939
-0.375129	function feature is	-0.124939
-0.527871	This feature is	-0.124939
-0.093024	this feature is	-0.301030
-0.457205	A constructor is	-0.124939
-0.918831	copy constructor is	-0.124939
-0.739192	default constructor is	-0.124939
-0.580992	element a[i] is	-0.124939
-0.576360	with #define is	-0.124939
-0.580327	of points is	-0.124939
-0.577679	context switch is	-0.124939
-1.193586	of range is	-0.124939
-0.288668	used here is	-0.124939
-0.288668	static here is	-0.124939
-0.288668	speed here is	-0.124939
-0.288668	problem here is	-0.124939
-0.120779	operator here is	-0.124939
-0.288668	And here is	-0.124939
-1.106731	CPU core is	-0.124939
-1.032539	code section is	-0.124939
-0.839037	data section is	-0.124939
-1.040151	cache contentions is	-0.124939
-0.509475	such contentions is	-0.124939
-0.406030	the computer is	-0.124939
-0.399583	one computer is	-0.124939
-0.573252	type conversions is	-0.124939
-1.027523	such errors is	-0.124939
-0.576203	when columns is	-0.124939
-0.125153	to p is	-0.124939
-0.426827	that p is	-0.124939
-0.301674	by p is	-0.124939
-0.301674	before p is	-0.124939
-0.301674	whether p is	-0.124939
-0.193034	the syntax is	-0.124939
-0.618611	The syntax is	-0.124939
-0.631093	the STL is	-0.124939
-0.442619	A profiler is	-0.124939
-0.442619	Intel's profiler is	-0.124939
-0.442619	AMD's profiler is	-0.124939
-0.548566	the index is	-0.124939
-0.389584	that index is	-0.124939
-0.424562	array index is	-0.425969
-0.443076	function inlining is	-0.124939
-0.335782	the network is	-0.124939
-0.995080	/ b) is	-0.124939
-0.575320	a response is	-0.124939
-0.572180	of lines is	-0.124939
-0.180861	same operation is	-0.425969
-0.820622	bounds checking is	-0.124939
-0.885035	Bounds checking is	-0.124939
-0.805487	a task is	-0.124939
-0.490574	given task is	-0.124939
-1.185337	is limited is	-0.124939
-0.571793	little math is	-0.124939
-0.568457	or database is	-0.124939
-0.842963	of constants is	-0.124939
-0.573471	a bool is	-0.124939
-1.096761	stack frame is	-0.124939
-0.777664	the destructor is	-0.124939
-0.425752	A destructor is	-0.124939
-0.425752	virtual destructor is	-0.124939
-0.427550	when efficiency is	-0.124939
-0.427550	program efficiency is	-0.124939
-0.427550	highest efficiency is	-0.124939
-0.490574	of algorithm is	-0.124939
-0.490574	following algorithm is	-0.124939
-0.571348	handle strings is	-0.124939
-0.163367	the exponent is	-0.124939
-0.125525	The exponent is	-0.124939
-0.179030	Another possibility is	-0.425969
-0.571348	these conditions is	-0.124939
-0.567760	Best-case testing is	-0.124939
-0.983473	the alignment is	-0.124939
-0.573154	cross-platform compatibility is	-0.124939
-0.566867	the macro is	-0.124939
-0.212285	an operand is	-0.124939
-0.093299	one operand is	-0.124939
-0.044149	second operand is	-0.249877
-0.518505	The effect is	-0.124939
-0.417266	this effect is	-0.124939
-0.402910	of containers is	-0.124939
-0.402910	made containers is	-0.124939
-0.567915	STL containers is	-0.124939
-0.835493	same priority is	-0.124939
-0.387782	clock frequency is	-0.124939
-0.402248	the iteration is	-0.124939
-0.639435	each iteration is	-0.124939
-0.402248	preceding iteration is	-0.124939
-0.339183	if N is	-0.124939
-0.339183	If N is	-0.124939
-0.339183	where N is	-0.124939
-0.339183	case, N is	-0.124939
-0.568884	important thing is	-0.124939
-0.565020	the handle is	-0.124939
-1.235041	the heap is	-0.124939
-0.981610	vector nontemporal is	-0.124939
-1.068209	array bounds is	-0.124939
-1.510000	be improved is	-0.124939
-0.472112	The situation is	-0.124939
-0.673253	case situation is	-0.124939
-1.318933	error message is	-0.124939
-0.282322	The delay is	-0.124939
-0.389325	This delay is	-0.124939
-0.616176	the condition is	-0.124939
-0.388626	a condition is	-0.124939
-0.547185	control condition is	-0.124939
-0.388626	different cores is	-0.124939
-0.365016	CPU cores is	-0.425969
-0.467058	result ebx is	-0.124939
-0.467058	Register ebx is	-0.124939
-0.469578	of list[i] is	-0.124939
-0.469578	expression list[i] is	-0.124939
-1.119269	switch statements is	-0.124939
-0.560254	This chapter is	-0.124939
-0.964189	target buffer is	-0.124939
-0.967092	loop unrolling is	-0.124939
-0.459210	times CriticalFunction is	-0.124939
-0.459210	whether CriticalFunction is	-0.124939
-0.831597	the fraction is	-0.124939
-0.567158	row length is	-0.124939
-0.429873	of f is	-0.124939
-0.125911	// f is	-0.124939
-0.303955	then f is	-0.124939
-0.829470	misprediction penalty is	-0.124939
-0.457405	function F1 is	-0.124939
-0.457405	returning. F1 is	-0.124939
-0.372824	simple alternative is	-0.124939
-0.524598	An alternative is	-0.124939
-0.372824	light-weight alternative is	-0.124939
-0.438147	critical stride is	-0.301030
-0.564844	for 'this' is	-0.124939
-0.560254	per row is	-0.124939
-1.053127	template metaprogramming is	-0.124939
-0.970015	hash map is	-0.124939
-0.555848	they contain is	-0.124939
-0.822011	logic device is	-0.124939
-0.393079	parameter transfer is	-0.124939
-0.353471	Parameter transfer is	-0.124939
-0.714155	memory blocks is	-0.124939
-0.629312	big blocks is	-0.124939
-0.649272	example 15.1b is	-0.124939
-0.697749	the latter is	-0.124939
-0.141967	The latter is	-0.124939
-1.228528	dependency chains is	-0.124939
-0.555848	particular brand is	-0.124939
-1.228528	the diagonal is	-0.124939
-0.443946	different purposes is	-0.124939
-0.443946	educational purposes is	-0.124939
-0.559631	user. Time is	-0.124939
-0.168071	// everything is	-0.124939
-0.558366	capabilities. Here is	-0.124939
-0.562172	directives. OpenMP is	-0.124939
-0.475263	clock cycle is	-0.301030
-1.116560	pointer aliasing is	-0.124939
-0.500704	This tool is	-0.124939
-0.142452	development tool is	-0.124939
-0.556018	and memcpy is	-0.124939
-0.134788	the parallelism is	-0.124939
-0.331275	Fine-grained parallelism is	-0.124939
-0.429796	because #if is	-0.124939
-0.429796	code. #if is	-0.124939
-0.428741	this unit is	-0.124939
-0.428741	time unit is	-0.124939
-0.554607	each label is	-0.124939
-0.551799	of iterations is	-0.124939
-0.946530	branch misprediction is	-0.124939
-1.177349	lazy binding is	-0.124939
-0.551799	theoretical background is	-0.124939
-1.177915	dependency chain is	-0.124939
-0.553201	complicated algorithms is	-0.124939
-0.553201	possible inputs is	-0.124939
-0.554607	and who is	-0.124939
-0.807786	the DLL is	-0.124939
-0.556018	memory required is	-0.124939
-0.554607	keyword volatile is	-0.124939
-1.136134	cache misses is	-0.124939
-1.108127	The purpose is	-0.124939
-1.059281	without -fpic is	-0.124939
-0.548233	Yet, D is	-0.124939
-0.548233	value xn is	-0.124939
-1.116963	and delete is	-0.124939
-0.409942	code itself is	-0.124939
-0.409942	device itself is	-0.124939
-1.015377	context switches is	-0.124939
-1.116963	The trick is	-0.124939
-0.652806	loop body is	-0.124939
-0.409942	its body is	-0.124939
-1.059281	compiler generates is	-0.124939
-0.546649	using exceptions is	-0.124939
-1.205650	the CPUID is	-0.124939
-0.549822	five manuals is	-0.124939
-0.798545	type T is	-0.124939
-0.942130	binary representation is	-0.124939
-0.801377	Runtime polymorphism is	-0.124939
-0.798545	the factor is	-0.124939
-0.549822	loop. log is	-0.124939
-0.543522	This principle is	-0.124939
-0.151109	time Func is	-0.425969
-0.151480	we notice is	-0.425969
-0.541695	if portability is	-0.124939
-0.914285	the debugger is	-0.124939
-0.539876	image base is	-0.124939
-0.541695	the compilation is	-0.124939
-0.541695	name ?Func@@YAXQAHAAH@Z is	-0.124939
-0.547199	macro INSTRSET is	-0.124939
-0.530567	versatile. Fortran is	-0.124939
-0.532705	of inheritance is	-0.124939
-0.530567	Memory swapping is	-0.124939
-0.534853	that memset is	-0.124939
-0.263954	optimization effort is	-0.425969
-1.022430	constant propagation is	-0.124939
-0.773946	Algebraic reduction is	-0.124939
-0.263185	of abc is	-0.124939
-0.770227	My recommendation is	-0.124939
-0.530567	program package is	-0.124939
-0.770227	cleanup jobs is	-0.124939
-0.530567	of n! is	-0.124939
-0.770227	of Basic is	-0.124939
-0.957571	or malloc is	-0.124939
-0.530567	example 15.1c is	-0.124939
-0.530567	with macros is	-0.124939
-0.534853	you prefer is	-0.124939
-0.060382	the divisor is	-0.124939
-0.060382	if divisor is	-0.425969
-0.933262	time slices is	-0.124939
-0.519559	An enum is	-0.124939
-0.516968	our estimate is	-0.124939
-0.170379	way m is	-0.124939
-0.076887	function, m is	-0.124939
-1.056771	template specialization is	-0.124939
-0.516968	where pre-increment is	-0.124939
-0.522166	and ownership is	-0.124939
-0.127902	that *p+2 is	-0.425969
-0.519559	and 14.9 is	-0.124939
-0.516968	certain modification is	-0.124939
-0.519559	if powN is	-0.124939
-0.127902	the bottleneck is	-0.124939
-0.519559	data area is	-0.124939
-0.751287	The consequence is	-0.124939
-0.519559	an assumption is	-0.124939
-0.519559	compilers. Fastcall is	-0.124939
-0.516968	Step (1) is	-0.124939
-0.516968	when alloca is	-0.124939
-0.861874	the original is	-0.124939
-0.309980	by unit-testing is	-0.124939
-0.309980	This unit-testing is	-0.124939
-0.516968	desired interval is	-0.124939
-0.751287	250 μs is	-0.124939
-0.927378	(in bytes) is	-0.124939
-0.516968	If hyperthreading is	-0.124939
-0.309980	point format is	-0.124939
-0.437943	file format is	-0.124939
-0.201891	The conclusion is	-0.425969
-0.815113	of abstraction is	-0.124939
-0.495209	data manipulation is	-0.124939
-0.201891	the dividend is	-0.124939
-0.495209	of coefficients is	-0.124939
-0.495209	sequential labels is	-0.124939
-0.495209	main focus is	-0.124939
-0.495209	function longjmp is	-0.124939
-0.710555	library (STL) is	-0.124939
-0.715967	element matrix[r][c] is	-0.124939
-0.495209	why bookkeeping is	-0.124939
-0.495209	processors. Hyperthreading is	-0.124939
-0.498497	that a+b is	-0.124939
-0.495209	this argument is	-0.124939
-0.498497	kind: "what is	-0.124939
-0.495209	CPU market is	-0.124939
-0.710555	example 11.3 is	-0.124939
-0.498497	= *(p++) is	-0.124939
-0.246169	software product is	-0.124939
-0.246169	competing product is	-0.124939
-0.498497	of allocations is	-0.124939
-0.495209	jl $B1$2 is	-0.124939
-0.454711	realistic goal is	-0.124939
-0.454711	when CriticalInnerFunction is	-0.124939
-0.454711	Here CParent is	-0.124939
-0.454711	then N&(N-1) is	-0.124939
-0.454711	integer comparison is	-0.124939
-0.454711	linear search, is	-0.124939
-0.645919	memory footprint is	-0.124939
-0.454711	The proxy is	-0.124939
-0.454711	compromise safety is	-0.124939
-0.454711	worth considering is	-0.124939
-0.454711	hour. Neither is	-0.124939
-0.454711	.exe file, is	-0.124939
-0.454711	The branching is	-0.124939
-0.645919	symbol interposition is	-0.124939
-0.454711	example 14.1c is	-0.124939
-0.454711	of matrix[j][0] is	-0.124939
-0.141242	time MemberPointer is	-0.124939
-0.141242	before MemberPointer is	-0.124939
-0.645919	all 1's is	-0.124939
-0.454711	// (This is	-0.124939
-0.454711	or Friday is	-0.124939
-0.454711	database queries is	-0.124939
-0.454711	performance bottlenecks is	-0.124939
-0.645919	a bitfield is	-0.124939
-0.454711	of coprocessors is	-0.124939
-0.645919	if any, is	-0.124939
-0.645919	Example 8.21 is	-0.124939
-0.645919	FDIV bug is	-0.124939
-0.351898	no attempt is	-0.124939
-0.351898	serious burden is	-0.124939
-0.351898	enabled (there is	-0.124939
-0.351898	the LLVM is	-0.124939
-0.351898	or she is	-0.124939
-0.351898	example 14.7b is	-0.124939
-0.351898	of cc[i]+2 is	-0.124939
-0.351898	This technique is	-0.124939
-0.351898	different targets is	-0.124939
-0.351898	empty throw()specification is	-0.124939
-0.351898	// u.d is	-0.124939
-0.351898	constant: Unsigned is	-0.124939
-0.351898	model N-1 is	-0.124939
-0.351898	of i&15 is	-0.124939
-0.351898	the CPU-type is	-0.124939
-0.351898	sign, eee is	-0.124939
-0.351898	and 12.4c is	-0.124939
-0.351898	of &list[100] is	-0.124939
-0.351898	// Truncation is	-0.124939
-0.351898	processor X" is	-0.124939
-0.351898	often seen, is	-0.124939
-0.351898	memory re-allocation is	-0.124939
-0.351898	it (&ArraySize) is	-0.124939
-0.351898	operation. x*8 is	-0.124939
-0.351898	count (ArraySize) is	-0.124939
-0.351898	or g(x) is	-0.124939
-0.351898	which supposedly is	-0.124939
-0.351898	or while-loop is	-0.124939
-0.351898	and animations is	-0.124939
-0.351898	Here, log(2.0) is	-0.124939
-0.351898	or inttypes.h is	-0.124939
-0.351898	= array[i++] is	-0.124939
-0.351898	memory bus is	-0.124939
-0.351898	or C2::Disp() is	-0.124939
-0.351898	over. Virtualization is	-0.124939
-0.351898	or p->member is	-0.124939
-0.351898	(v. 15.0) is	-0.124939
-0.351898	* 1.5f; is	-0.124939
-0.351898	"express" edition is	-0.124939
-0.351898	divisions (Division is	-0.124939
-0.351898	and bb[i]*cc[i] is	-0.124939
-0.351898	type size_t is	-0.124939
-0.351898	runtime. Polymorphism is	-0.124939
-0.351898	other subtasks is	-0.124939
-0.351898	and fffff is	-0.124939
-0.351898	important remedy is	-0.124939
-0.351898	page 87) is	-0.124939
-0.351898	add eax,1 is	-0.124939
-0.351898	This behaviour is	-0.124939
-0.351898	My preference is	-0.124939
-0.351898	This triangle is	-0.124939
-0.351898	// Rounding is	-0.124939
-0.351898	The loop-branch is	-0.124939
-0.351898	the strictness is	-0.124939
-0.351898	sub-expressions. Why is	-0.124939
-0.351898	or malloc) is	-0.124939
-0.351898	and mirroring is	-0.124939
-0.351898	the occurrence is	-0.124939
-0.351898	(or higher) is	-0.124939
-0.351898	other abuse is	-0.124939
-0.351898	One kilobyte is	-0.124939
-0.351898	example 7.43b is	-0.124939
-0.351898	address. Relocation is	-0.124939
-0.351898	example 14.21 is	-0.124939
-0.351898	the granularity is	-0.124939
-0.351898	This '1' is	-0.124939
-0.927487	to is a	-0.124939
-1.094823	that is a	-0.301030
-1.407465	it is a	-0.124939
-1.092409	function is a	-0.425969
-1.354582	code is a	-0.124939
-0.769443	This is a	-0.191886
-0.695789	compiler is a	-0.425969
-1.373289	this is a	-0.124939
-0.514745	use is a	-0.124939
-1.709062	It is a	-0.124939
-0.927487	memory is a	-0.124939
-0.927487	data is a	-0.124939
-1.296449	which is a	-0.124939
-1.148265	cache is a	-0.124939
-0.384982	example is a	-0.124939
-0.601895	size is a	-0.301030
-0.812127	b is a	-0.124939
-0.627298	there is a	-0.159701
-0.600766	There is a	-0.221849
-1.060872	array is a	-0.124939
-0.743110	so is a	-0.124939
-0.856970	register is a	-0.124939
-0.856970	template is a	-0.124939
-0.976042	registers is a	-0.124939
-0.927487	access is a	-0.124939
-0.450686	case is a	-0.124939
-1.287178	constant is a	-0.124939
-0.530860	stack is a	-0.124939
-1.060872	language is a	-0.124939
-0.185196	much is a	-0.425969
-0.384982	matrix is a	-0.124939
-1.270462	solution is a	-0.124939
-0.976042	list is a	-0.124939
-0.514745	likely is a	-0.124939
-0.856970	structure is a	-0.124939
-0.976042	counter is a	-0.124939
-1.055223	caching is a	-0.124939
-0.927487	n is a	-0.124939
-0.856970	r is a	-0.124939
-0.919201	union is a	-0.124939
-0.514745	switch is a	-0.124939
-0.660873	here is a	-0.124939
-0.514745	columns is a	-0.124939
-0.593926	p is a	-0.124939
-0.976042	exponent is a	-0.124939
-0.856970	iteration is a	-0.124939
-0.228901	N is a	-0.301030
-0.743110	situation is a	-0.124939
-0.856970	condition is a	-0.124939
-0.514745	statements is a	-0.124939
-0.343154	stride is a	-0.425969
-0.514745	row is a	-0.124939
-0.514745	device is a	-0.124939
-0.514745	Time is a	-0.124939
-0.514745	Here is a	-0.124939
-0.514745	OpenMP is a	-0.124939
-0.514745	chain is a	-0.124939
-0.185196	itself is a	-0.124939
-0.514745	T is a	-0.124939
-0.514745	factor is a	-0.124939
-0.514745	log is a	-0.124939
-0.514745	swapping is a	-0.124939
-0.514745	reduction is a	-0.124939
-0.743110	abc is a	-0.124939
-0.514745	prefer is a	-0.124939
-0.450686	divisor is a	-0.425969
-0.185196	*p+2 is a	-0.425969
-0.514745	interval is a	-0.124939
-0.514745	bytes) is a	-0.124939
-0.514745	abstraction is a	-0.124939
-0.514745	(STL) is a	-0.124939
-0.514745	CParent is a	-0.124939
-0.514745	bug is a	-0.124939
-0.514745	LLVM is a	-0.124939
-1.253296	= a a	-0.124939
-0.596989	a= a a	-0.124939
-0.721663	function of a	-0.425969
-0.739392	because of a	-0.124939
-0.923947	functions of a	-0.124939
-0.747855	CPU of a	-0.124939
-0.229624	loop of a	-0.301030
-0.517543	integer of a	-0.124939
-1.048885	example of a	-0.124939
-1.071203	size of a	-0.301030
-0.517543	pointer of a	-0.124939
-0.391587	object of a	-0.221849
-1.128222	version of a	-0.124939
-1.223670	value of a	-0.124939
-1.084996	objects of a	-0.124939
-1.145504	performance of a	-0.124939
-0.572023	member of a	-0.124939
-0.533182	elements of a	-0.124939
-1.484191	address of a	-0.124939
-1.242414	bit of a	-0.124939
-0.864331	out of a	-0.124939
-0.668241	part of a	-0.321233
-0.598851	bits of a	-0.301030
-1.116705	type of a	-0.124939
-0.897273	versions of a	-0.124939
-1.048553	speed of a	-0.124939
-0.988086	overflow of a	-0.124939
-0.747855	uses of a	-0.124939
-1.417735	advantage of a	-0.124939
-0.863148	structure of a	-0.124939
-0.596725	values of a	-0.425969
-0.185850	sign of a	-0.124939
-0.414560	members of a	-0.425969
-0.747855	development of a	-0.124939
-1.183161	disadvantage of a	-0.124939
-0.706568	end of a	-0.124939
-1.769045	parts of a	-0.124939
-1.157697	types of a	-0.124939
-0.986309	instead of a	-0.124939
-0.747855	optimizations of a	-0.124939
-0.984187	range of a	-0.124939
-0.517543	modules of a	-0.124939
-0.517543	change of a	-0.124939
-1.231124	instance of a	-0.124939
-0.747855	output of a	-0.124939
-1.177801	efficiency of a	-0.124939
-0.747855	sum of a	-0.124939
-0.988086	offset of a	-0.124939
-0.258541	length of a	-0.346788
-1.340269	beginning of a	-0.124939
-0.517543	transfer of a	-0.124939
-0.533182	Conversion of a	-0.124939
-0.747855	body of a	-0.124939
-0.747855	collection of a	-0.124939
-1.084996	scope of a	-0.124939
-0.452495	form of a	-0.124939
-0.517543	Objects of a	-0.124939
-0.517543	Division of a	-0.124939
-0.747855	latency of a	-0.124939
-1.031673	pieces of a	-0.124939
-0.747855	redesign of a	-0.124939
-0.863148	combination of a	-0.124939
-0.517543	window of a	-0.124939
-0.517543	dangers of a	-0.124939
-0.747855	Value of a	-0.124939
-0.747855	creation of a	-0.124939
-0.185850	Sum of a	-0.124939
-0.517543	menus of a	-0.124939
-0.517543	benefits of a	-0.124939
-0.517543	insertion of a	-0.124939
-0.517543	notion of a	-0.124939
-0.517543	layer of a	-0.124939
-1.195438	it to a	-0.124939
-1.316573	code to a	-0.124939
-1.370099	point to a	-0.124939
-0.468359	integer to a	-0.124939
-0.548640	class to a	-0.124939
-0.583514	pointer to a	-0.124939
-0.935333	object to a	-0.124939
-0.548640	objects to a	-0.124939
-0.569847	call to a	-0.346788
-1.332310	access to a	-0.124939
-0.548640	file to a	-0.124939
-1.222703	calls to a	-0.124939
-0.192897	thread to a	-0.425969
-1.200047	parameters to a	-0.124939
-0.802107	application to a	-0.124939
-0.537720	reference to a	-0.124939
-0.802107	link to a	-0.124939
-1.319613	points to a	-0.124939
-0.548640	columns to a	-0.124939
-1.060272	writing to a	-0.124939
-1.021261	changed to a	-0.124939
-0.802107	executable to a	-0.124939
-1.103137	copied to a	-0.124939
-0.802107	similar to a	-0.124939
-0.648408	added to a	-0.124939
-0.312680	applied to a	-0.124939
-0.316067	converted to a	-0.249877
-0.802107	jump to a	-0.124939
-0.802107	linker to a	-0.124939
-0.802107	equivalent to a	-0.124939
-0.548640	updated to a	-0.124939
-0.548640	Writes to a	-0.124939
-0.085826	lead to a	-0.124939
-0.548640	loader to a	-0.124939
-0.935333	leads to a	-0.124939
-0.192897	type-casted to a	-0.124939
-0.548640	respond to a	-0.124939
-0.548640	distribution to a	-0.124939
-0.548640	comparable to a	-0.124939
-0.548640	223 to a	-0.124939
-0.548640	-b to a	-0.124939
-0.548640	confined to a	-0.124939
-1.436807	functions and a	-0.124939
-1.006270	class and a	-0.124939
-0.576372	version and a	-0.124939
-1.402021	systems and a	-0.124939
-0.853149	user and a	-0.124939
-1.029240	processor and a	-0.124939
-1.185531	language and a	-0.124939
-0.853149	block and a	-0.124939
-0.576372	parameter and a	-0.124939
-0.576372	set, and a	-0.124939
-0.576372	addition, and a	-0.124939
-0.853149	object, and a	-0.124939
-0.853149	C++, and a	-0.124939
-0.853149	development, and a	-0.124939
-0.853149	cores, and a	-0.124939
-0.576372	functionality and a	-0.124939
-0.853149	database, and a	-0.124939
-0.576372	(PLT) and a	-0.124939
-0.576372	Kbytes and a	-0.124939
-0.576372	running, and a	-0.124939
-0.576372	traffic and a	-0.124939
-0.457949	a in a	-0.124939
-0.213469	or in a	-0.124939
-0.793502	it in a	-0.124939
-0.449626	function in a	-0.301030
-0.650959	with in a	-0.124939
-0.819696	code in a	-0.124939
-0.604077	not in a	-0.124939
-0.443954	than in a	-0.124939
-0.171218	this in a	-0.124939
-0.650959	memory in a	-0.124939
-0.712501	data in a	-0.124939
-0.740617	program in a	-0.124939
-0.650959	same in a	-0.124939
-0.604077	functions in a	-0.124939
-1.145684	loop in a	-0.124939
-0.828629	but in a	-0.124939
-0.832569	used in a	-0.124939
-0.457949	one in a	-0.124939
-0.890412	integer in a	-0.124939
-0.650959	b in a	-0.124939
-0.650959	version in a	-0.124939
-1.058792	objects in a	-0.124939
-0.749701	variable in a	-0.124939
-0.457949	so in a	-0.124939
-0.457031	variables in a	-0.124939
-0.793502	software in a	-0.124939
-0.618568	elements in a	-0.249877
-0.740617	faster in a	-0.124939
-0.543165	stored in a	-0.234083
-0.316726	called in a	-0.124939
-0.890412	example, in a	-0.124939
-0.650959	first in a	-0.124939
-0.650959	access in a	-0.124939
-0.740617	file in a	-0.124939
-0.407048	bits in a	-0.124939
-0.827050	up in a	-0.124939
-0.865555	times in a	-0.124939
-0.212611	accessed in a	-0.367977
-0.171218	CPUs in a	-0.425969
-0.793502	calculations in a	-0.124939
-0.483197	result in a	-0.124939
-0.793502	bytes in a	-0.124939
-0.865555	threads in a	-0.124939
-0.826444	element in a	-0.124939
-0.413093	done in a	-0.124939
-0.316726	calculation in a	-0.425969
-0.573245	implemented in a	-0.301030
-0.457949	likely in a	-0.124939
-0.828629	run in a	-0.124939
-0.827050	values in a	-0.124939
-0.740617	information in a	-0.124939
-0.650959	fast in a	-0.124939
-0.457949	branches in a	-0.124939
-0.457949	typically in a	-0.124939
-0.457949	complicated in a	-0.124939
-0.457949	programmer in a	-0.124939
-0.457949	lookup in a	-0.124939
-0.650959	last in a	-0.124939
-0.441664	declared in a	-0.124939
-0.457949	piece in a	-0.124939
-0.457949	smaller in a	-0.124939
-0.457949	here in a	-0.124939
-0.372440	columns in a	-0.602060
-0.828629	lines in a	-0.124939
-0.537130	strings in a	-0.425969
-0.316726	conditions in a	-0.124939
-0.457949	tasks in a	-0.124939
-0.457949	obtained in a	-0.124939
-0.457949	possibly in a	-0.124939
-0.650959	rows in a	-0.124939
-0.650959	message in a	-0.124939
-0.483197	defined in a	-0.124939
-0.457949	sequence in a	-0.124939
-0.650959	something in a	-0.124939
-0.457949	invalid in a	-0.124939
-0.457949	organized in a	-0.124939
-0.457949	'this' in a	-0.124939
-0.457949	implement in a	-0.124939
-0.962465	included in a	-0.124939
-0.457949	algebra in a	-0.124939
-0.740617	saved in a	-0.124939
-0.650959	contained in a	-0.124939
-0.171218	coded in a	-0.425969
-0.457949	PC's in a	-0.124939
-0.457949	handled in a	-0.124939
-0.457949	pivot in a	-0.124939
-0.650959	placed in a	-0.124939
-0.457949	proceed in a	-0.124939
-0.457949	typo in a	-0.124939
-0.457949	investing in a	-0.124939
-0.457949	absent in a	-0.124939
-0.457949	programmed in a	-0.124939
-0.457949	finishes in a	-0.124939
-0.457949	indexed in a	-0.124939
-0.457949	semicolons in a	-0.124939
-0.457949	scheduled in a	-0.124939
-0.457949	Calculations in a	-0.124939
-0.553589	that for a	-0.124939
-1.243526	function for a	-0.124939
-1.036125	use for a	-0.124939
-0.811019	memory for a	-0.124939
-0.811019	library for a	-0.124939
-0.811019	branch for a	-0.124939
-0.811019	16 for a	-0.124939
-1.133035	file for a	-0.124939
-1.303652	compiled for a	-0.124939
-0.553589	element for a	-0.124939
-1.216321	optimized for a	-0.124939
-0.947491	container for a	-0.124939
-0.553589	implementation for a	-0.124939
-0.811019	handling for a	-0.124939
-0.553589	lookup for a	-0.124939
-0.553589	platform for a	-0.124939
-1.147610	compiling for a	-0.124939
-0.553589	3 for a	-0.124939
-0.811019	enough for a	-0.124939
-0.947491	designed for a	-0.124939
-0.553589	inputs for a	-0.124939
-1.072430	wait for a	-0.124939
-0.811019	principle for a	-0.124939
-0.553589	87 for a	-0.124939
-0.811019	90 for a	-0.124939
-0.553589	directive for a	-0.124939
-0.811019	area for a	-0.124939
-0.553589	49 for a	-0.124939
-0.553589	perhaps for a	-0.124939
-0.553589	Compile for a	-0.124939
-0.811019	fine-tuned for a	-0.124939
-0.553589	wired for a	-0.124939
-1.903485	is that a	-0.124939
-1.535214	code that a	-0.124939
-1.262394	compiler that a	-0.124939
-1.114949	compilers that a	-0.124939
-0.557920	example, that a	-0.124939
-1.235204	sure that a	-0.124939
-0.818885	small that a	-0.124939
-0.566396	certain that a	-0.124939
-0.988606	means that a	-0.124939
-0.818885	shows that a	-0.124939
-0.637101	know that a	-0.124939
-1.259328	require that a	-0.124939
-0.331570	Assume that a	-0.124939
-1.083236	possibility that a	-0.124939
-0.818885	happen that a	-0.124939
-0.818885	specifies that a	-0.124939
-0.557920	tells that a	-0.124939
-0.557920	unusual that a	-0.124939
-0.818885	says that a	-0.124939
-0.557920	stage that a	-0.124939
-0.557920	feel that a	-0.124939
-1.409559	to be a	-0.124939
-1.443451	can be a	-0.204120
-1.103474	may be a	-0.124939
-0.568643	only be a	-0.425969
-1.545819	should be a	-0.124939
-0.995520	also be a	-0.124939
-0.800446	even be a	-0.124939
-0.547713	always be a	-0.124939
-1.360073	must be a	-0.124939
-1.115425	therefore be a	-0.124939
-0.569274	preferably be a	-0.346788
-0.800446	course be a	-0.124939
-0.976538	might be a	-0.124939
-0.933079	could be a	-0.124939
-1.173845	there are a	-0.301030
-1.284788	There are a	-0.301030
-0.200668	Registers are a	-0.425969
-0.762505	a = a	-0.647817
-1.056503	x = a	-0.124939
-0.731324	b = a	-0.301030
-0.340386	0 = a	-0.124939
-0.618634	c = a	-0.124939
-0.631629	y = a	-0.425969
-0.761908	d = a	-0.124939
-0.732775	false = a	-0.124939
-0.732775	ecx = a	-0.124939
-0.340386	-(-a) = a	-0.425969
-0.183751	a*1 = a	-0.425969
-0.183751	a+0 = a	-0.425969
-0.508607	(!a&&c) = a	-0.124939
-0.508607	a/1 = a	-0.124939
-0.508607	(b&&c) = a	-0.124939
-0.508607	(!a&&b) = a	-0.124939
-0.508607	~b = a	-0.124939
-0.508607	~(~a) = a	-0.124939
-0.508607	^0 = a	-0.124939
-0.193260	time or a	-0.124939
-0.550287	same or a	-0.124939
-0.624042	one or a	-0.602060
-0.955049	pointer or a	-0.124939
-0.805064	library or a	-0.124939
-0.550287	table or a	-0.124939
-0.939358	CPUs or a	-0.124939
-0.805064	line or a	-0.124939
-0.805064	list or a	-0.124939
-0.805064	reference or a	-0.124939
-0.550287	module or a	-0.124939
-0.550287	DLL or a	-0.124939
-0.358756	tree or a	-0.425969
-0.550287	objconv or a	-0.124939
-1.359204	make it a	-0.124939
-0.596712	give it a	-0.124939
-1.409430	the function a	-0.249877
-1.411647	critical function a	-0.124939
-1.002135	that if a	-0.124939
-1.220610	2 if a	-0.124939
-1.442459	faster if a	-0.124939
-0.561184	example, if a	-0.124939
-1.402856	even if a	-0.124939
-1.178227	But if a	-0.124939
-0.542104	optimized if a	-0.124939
-1.196315	check if a	-0.124939
-1.188400	advantageous if a	-0.124939
-1.002135	see if a	-0.124939
-0.790460	inefficient if a	-0.124939
-0.542104	algorithm if a	-0.124939
-0.542104	occur if a	-0.124939
-0.790460	loops if a	-0.124939
-0.542104	significant if a	-0.124939
-0.191448	invalid if a	-0.124939
-0.542104	(or if a	-0.124939
-0.191448	evaluated if a	-0.425969
-0.498567	is by a	-0.124939
-0.716081	it by a	-0.124939
-1.096360	than by a	-0.124939
-1.022043	used by a	-0.124939
-0.892172	i by a	-0.124939
-0.498567	variable by a	-0.124939
-0.112144	branch by a	-0.301030
-1.009982	calculated by a	-0.124939
-0.498567	counter by a	-0.124939
-0.886815	multiplication by a	-0.124939
-0.097960	division by a	-0.367977
-0.560052	replaced by a	-0.124939
-0.498567	database by a	-0.124939
-0.498567	initialized by a	-0.124939
-1.087752	improved by a	-0.124939
-0.822156	multiply by a	-0.124939
-1.044005	determined by a	-0.124939
-0.112144	Division by a	-0.301030
-0.440147	identified by a	-0.124939
-1.017111	multiplied by a	-0.124939
-0.181354	spaced by a	-0.425969
-0.498567	Modulo by a	-0.124939
-0.498567	Multiplying by a	-0.124939
-0.437145	function with a	-0.124939
-0.448616	on with a	-0.124939
-0.870450	code with a	-0.124939
-0.636490	not with a	-0.124939
-0.722897	CPU with a	-0.124939
-0.448616	other with a	-0.124939
-0.406759	loop with a	-0.124939
-0.636490	integer with a	-0.124939
-0.312206	class with a	-0.124939
-0.448616	size with a	-0.124939
-0.312206	b with a	-0.124939
-0.503669	library with a	-0.124939
-0.722897	array with a	-0.124939
-0.636490	elements with a	-0.124939
-0.448616	call with a	-0.124939
-0.807027	processors with a	-0.124939
-0.312206	constant with a	-0.124939
-0.636490	times with a	-0.124939
-0.722897	accessed with a	-0.124939
-0.886208	CPUs with a	-0.124939
-0.448616	language with a	-0.124939
-0.722897	integers with a	-0.124939
-0.678776	done with a	-0.124939
-0.448616	list with a	-0.124939
-0.636490	run with a	-0.124939
-0.448616	However, with a	-0.124939
-0.448616	members with a	-0.124939
-0.448616	model with a	-0.124939
-0.448616	n with a	-0.124939
-0.448616	end with a	-0.124939
-0.636490	platform with a	-0.124939
-0.168788	modules with a	-0.124939
-0.448616	tested with a	-0.124939
-0.722897	computer with a	-0.124939
-0.931715	compatible with a	-0.124939
-0.830877	compatibility with a	-0.124939
-0.944746	obtained with a	-0.124939
-0.937520	multiplying with a	-0.124939
-0.448616	Func with a	-0.124939
-0.448616	communication with a	-0.124939
-0.448616	type-casting with a	-0.124939
-0.448616	performed with a	-0.124939
-0.448616	moved with a	-0.124939
-0.448616	Loops with a	-0.124939
-0.448616	ways, with a	-0.124939
-0.448616	trace with a	-0.124939
-0.448616	vector::reserve with a	-0.124939
-0.448616	reached with a	-0.124939
-0.723664	is on a	-0.124939
-0.723664	function on a	-0.124939
-0.503146	long on a	-0.124939
-0.503146	file on a	-0.124939
-0.831860	processors on a	-0.124939
-0.182453	accessed on a	-0.124939
-1.089431	work on a	-0.124939
-0.503146	compiled on a	-0.124939
-0.503146	threads on a	-0.124939
-0.943303	best on a	-0.124939
-0.503146	matrix on a	-0.124939
-0.723664	implemented on a	-0.124939
-0.831860	run on a	-0.124939
-0.831860	fast on a	-0.124939
-0.503146	else on a	-0.124939
-0.503146	addition on a	-0.124939
-0.723664	tested on a	-0.124939
-1.197004	rely on a	-0.124939
-0.503146	either on a	-0.124939
-0.503146	measured on a	-0.124939
-0.503146	package on a	-0.124939
-0.503146	bad on a	-0.124939
-0.723664	μs on a	-0.124939
-0.723664	performed on a	-0.124939
-0.503146	specified on a	-0.124939
-0.503146	9.5a on a	-0.124939
-0.503146	miss on a	-0.124939
-0.503146	perfectly on a	-0.124939
-0.503146	tag on a	-0.124939
-0.503146	satisfactorily on a	-0.124939
-0.503146	NOT on a	-0.124939
-0.437976	or as a	-0.124939
-0.437976	it as a	-0.124939
-0.437976	not as a	-0.124939
-0.620198	than as a	-0.124939
-0.932486	same as a	-0.124939
-0.641765	used as a	-0.124939
-0.437976	b as a	-0.124939
-0.887072	such as a	-0.124939
-0.345327	efficient as a	-0.301030
-0.345327	stored as a	-0.124939
-0.437976	often as a	-0.124939
-0.437976	programming as a	-0.124939
-0.620198	work as a	-0.124939
-0.620198	compiled as a	-0.124939
-0.437976	language as a	-0.124939
-0.437976	done as a	-0.124939
-0.234788	implemented as a	-0.301030
-0.805651	fast as a	-0.124939
-0.306992	name as a	-0.124939
-0.437976	numbers as a	-0.124939
-0.921228	just as a	-0.124939
-0.437976	STL as a	-0.124939
-0.437976	intended as a	-0.124939
-0.437976	given as a	-0.124939
-0.437976	offset as a	-0.124939
-0.437976	occur as a	-0.124939
-0.796290	either as a	-0.124939
-0.437976	ebx as a	-0.124939
-0.399484	organized as a	-0.124939
-0.620198	provided as a	-0.124939
-0.620198	interpreted as a	-0.124939
-0.437976	edx as a	-0.124939
-0.437976	appear as a	-0.124939
-0.437976	queue as a	-0.124939
-0.399484	expressed as a	-0.124939
-0.437976	FPGA as a	-0.124939
-0.437976	ReadTSC as a	-0.124939
-0.437976	microseconds as a	-0.124939
-0.620198	internally as a	-0.124939
-0.437976	regarded as a	-0.124939
-0.437976	assignment, as a	-0.124939
-1.177988	is not a	-0.124939
-0.570578	means not a	-0.124939
-0.942347	n.a. - a	-0.425969
-1.733860	{ int a	-0.124939
-1.035670	more than a	-0.124939
-0.961934	efficient than a	-0.124939
-0.958113	faster than a	-0.301030
-1.198201	less than a	-0.124939
-0.992701	rather than a	-0.221849
-0.511577	bits than a	-0.124939
-1.126783	resources than a	-0.124939
-0.341728	longer than a	-0.425969
-0.511577	time-consuming than a	-0.124939
-0.511577	lower than a	-0.124939
-0.590758	slower than a	-0.124939
-0.511577	simpler than a	-0.124939
-0.511577	verify than a	-0.124939
-0.680718	else { a	-0.823909
-0.322039	(b) { a	-0.726999
-0.555583	(true) { a	-0.124939
-0.734700	to have a	-0.124939
-0.701134	and have a	-0.124939
-1.052214	that have a	-0.124939
-0.548380	may have a	-0.124939
-0.725320	functions have a	-0.124939
-0.880828	compilers have a	-0.425969
-0.434147	also have a	-0.124939
-1.025012	objects have a	-0.124939
-0.906558	we have a	-0.124939
-0.803163	elements have a	-0.124939
-0.803163	systems have a	-0.124939
-0.701134	even have a	-0.124939
-0.906558	CPUs have a	-0.124939
-0.331611	must have a	-0.124939
-0.489447	thread have a	-0.124939
-0.701134	preferably have a	-0.124939
-0.489447	still have a	-0.124939
-0.489447	models have a	-0.124939
-0.532914	every time a	-0.346788
-0.825256	next time a	-0.124939
-0.561401	Each time a	-0.124939
-0.561401	Every time a	-0.124939
-0.645425	to use a	-0.229674
-0.880601	and use a	-0.124939
-0.922624	can use a	-0.124939
-0.726171	not use a	-0.124939
-0.969995	may use a	-0.124939
-0.446371	then use a	-0.124939
-0.453390	variables use a	-0.124939
-0.643870	operations use a	-0.124939
-0.643870	must use a	-0.124939
-0.453390	well use a	-0.124939
-0.643870	applications use a	-0.124939
-0.453390	programmers use a	-0.124939
-0.453390	Alternatively, use a	-0.124939
-0.170036	(May use a	-0.425969
-0.911520	than when a	-0.124939
-1.288095	only when a	-0.124939
-0.911520	used when a	-0.124939
-0.538706	b when a	-0.124939
-0.538706	example, when a	-0.124939
-0.538706	times when a	-0.124939
-0.538706	advantageous when a	-0.124939
-0.538706	comes when a	-0.124939
-0.538706	Likewise, when a	-0.124939
-0.538706	happens when a	-0.124939
-0.538706	allocating when a	-0.124939
-0.538706	popularity when a	-0.124939
-0.579948	small then a	-0.124939
-0.371231	range then a	-0.425969
-0.579948	container, then a	-0.124939
-0.842720	it from a	-0.124939
-0.842720	data from a	-0.124939
-0.543353	value from a	-0.124939
-1.176606	called from a	-0.124939
-0.695766	available from a	-0.124939
-1.033136	calculated from a	-0.124939
-0.508207	known from a	-0.124939
-0.508207	optimal from a	-0.124939
-0.508207	resources from a	-0.124939
-0.957393	read from a	-0.124939
-0.508207	response from a	-0.124939
-0.732106	comes from a	-0.124939
-0.967659	recover from a	-0.124939
-0.508207	restriction from a	-0.124939
-1.157843	from memory a	-0.124939
-0.590020	much memory a	-0.124939
-0.490680	to at a	-0.124939
-0.490680	be at a	-0.124939
-0.457120	elements at a	-0.425969
-0.385584	stored at a	-0.602060
-0.490680	bit at a	-0.124939
-0.490680	bits at a	-0.124939
-0.490680	bytes at a	-0.124939
-0.179441	line at a	-0.124939
-0.490680	numbers at a	-0.124939
-0.490680	piece at a	-0.124939
-0.930645	loaded at a	-0.124939
-0.490680	comes at a	-0.124939
-0.490680	square at a	-0.124939
-0.490680	kilobytes at a	-0.124939
-0.490680	looking at a	-0.124939
-0.936855	that has a	-0.124939
-0.232060	code has a	-0.249877
-0.569619	This has a	-0.124939
-0.470083	but has a	-0.124939
-0.670033	integer has a	-0.124939
-0.670033	class has a	-0.124939
-1.013996	library has a	-0.124939
-0.846125	object has a	-0.124939
-0.470083	static has a	-0.124939
-0.846125	user has a	-0.124939
-0.764186	processor has a	-0.124939
-0.470083	application has a	-0.124939
-0.470083	parameter has a	-0.124939
-0.670033	keyword has a	-0.124939
-0.470083	chain has a	-0.124939
-0.470083	manager has a	-0.124939
-0.470083	reader has a	-0.124939
-0.530266	to make a	-0.161151
-0.691991	and make a	-0.124939
-0.847947	can make a	-0.124939
-0.739695	not make a	-0.124939
-0.666691	you make a	-0.124939
-0.882709	will make a	-0.124939
-0.417825	Alternatively, make a	-0.124939
-0.576443	language because a	-0.124939
-0.576443	values because a	-0.124939
-0.576443	vectorized, because a	-0.124939
-0.576443	advance, because a	-0.124939
-0.731533	is only a	-0.124939
-1.150919	not only a	-0.124939
-0.927724	use only a	-0.124939
-1.097524	has only a	-0.124939
-1.052662	takes only a	-0.124939
-0.927724	contains only a	-0.124939
-0.482302	pointer. If a	-0.124939
-0.689561	problem. If a	-0.124939
-0.482302	inefficient. If a	-0.124939
-0.689561	table. If a	-0.124939
-0.689561	integer. If a	-0.124939
-0.689561	number. If a	-0.124939
-0.482302	chain. If a	-0.124939
-0.482302	future. If a	-0.124939
-0.689561	factor. If a	-0.124939
-0.177381	part. If a	-0.124939
-0.482302	read. If a	-0.124939
-0.482302	obtained. If a	-0.124939
-0.482302	BSD. If a	-0.124939
-0.482302	extensions. If a	-0.124939
-0.482302	lookup[b]; If a	-0.124939
-0.482302	ways). If a	-0.124939
-1.234263	on which a	-0.124939
-0.587860	at which a	-0.124939
-0.494817	// set a	-0.425969
-0.576675	cannot set a	-0.124939
-1.253931	to do a	-0.124939
-1.088403	can do a	-0.425969
-0.659200	is using a	-0.124939
-0.691694	of using a	-0.124939
-0.952752	to using a	-0.124939
-0.579887	by using a	-0.124939
-0.463215	as using a	-0.124939
-0.463215	than using a	-0.124939
-0.594550	8.4 double a	-0.124939
-0.890492	array size a	-0.124939
-1.659152	function pointer a	-0.124939
-0.545786	function into a	-0.124939
-0.387655	memory into a	-0.124939
-0.678637	data into a	-0.124939
-0.387655	b into a	-0.124939
-0.387655	array into a	-0.124939
-0.387655	variables into a	-0.124939
-0.387655	calculations into a	-0.124939
-0.281451	files into a	-0.425969
-0.387655	y into a	-0.124939
-0.387655	together into a	-0.124939
-0.387655	task into a	-0.124939
-0.739476	them into a	-0.124939
-0.545786	fit into a	-0.124939
-0.151938	fits into a	-0.124939
-0.387655	turned into a	-0.124939
-0.387655	80 into a	-0.124939
-0.387655	combined into a	-0.124939
-0.387655	formula into a	-0.124939
-0.545786	joined into a	-0.124939
-0.151938	wrapped into a	-0.425969
-0.387655	isolated into a	-0.124939
-0.387655	packed into a	-0.124939
-0.594095	8.18 float a	-0.124939
-0.776535	is also a	-0.124939
-0.533595	and also a	-0.124939
-0.533595	possibly also a	-0.124939
-0.703084	to such a	-0.124939
-0.805631	in such a	-0.124939
-0.703084	if such a	-0.124939
-0.703084	have such a	-0.124939
-0.805631	do such a	-0.124939
-0.490644	using such a	-0.124939
-0.490644	around such a	-0.124939
-0.490644	illustrates such a	-0.124939
-0.490644	justify such a	-0.124939
-0.490644	supply such a	-0.124939
-0.596221	processors. In a	-0.124939
-1.737957	the array a	-0.124939
-1.324487	cases where a	-0.124939
-0.475401	solution where a	-0.124939
-0.475401	space where a	-0.124939
-0.358530	situation where a	-0.249877
-0.621529	situations where a	-0.124939
-0.475401	inheritance where a	-0.124939
-0.475401	instructions, where a	-0.124939
-1.137567	that takes a	-0.124939
-0.354957	integer takes a	-0.124939
-0.918098	typically takes a	-0.124939
-0.541482	collection takes a	-0.124939
-0.595147	address so a	-0.124939
-0.745352	and return a	-0.124939
-0.745352	function return a	-0.124939
-0.717278	{ return a	-0.124939
-0.798855	} return a	-0.124939
-0.171853	2; return a	-0.425969
-0.171853	3; return a	-0.124939
-0.562400	bytes between a	-0.124939
-0.810158	difference between a	-0.124939
-1.378320	the way a	-0.124939
-0.578180	which way a	-0.124939
-1.475338	This makes a	-0.124939
-0.529754	compiler makes a	-0.124939
-0.529754	It makes a	-0.124939
-0.529754	program makes a	-0.124939
-0.529754	position-independent, makes a	-0.124939
-1.006806	is called a	-0.124939
-1.453216	memory address a	-0.124939
-0.471115	to call a	-0.221849
-0.584658	For example, a	-0.124939
-1.206144	to take a	-0.124939
-1.001137	that take a	-0.124939
-1.080698	may take a	-0.124939
-0.523384	conversions take a	-0.124939
-0.523384	logarithms take a	-0.124939
-1.081097	is often a	-0.124939
-0.554026	how often a	-0.124939
-1.126713	know how a	-0.124939
-0.574696	idea how a	-0.124939
-0.533853	only need a	-0.124939
-0.546664	doesn't need a	-0.124939
-1.033790	don't need a	-0.124939
-1.664738	to test a	-0.124939
-0.473145	or even a	-0.124939
-0.549958	have even a	-0.124939
-0.484116	to access a	-0.124939
-0.844622	you access a	-0.124939
-0.668416	roll out a	-0.124939
-1.008822	= 0 a	-0.124939
-1.593968	the case a	-0.124939
-1.552119	a & a	-0.124939
-0.590160	each constant a	-0.124939
-0.521855	make up a	-0.124939
-0.521855	setting up a	-0.124939
-0.186850	Splitting up a	-0.425969
-0.809514	for making a	-0.124939
-0.706148	are making a	-0.124939
-0.798806	by making a	-0.124939
-0.307806	than making a	-0.124939
-0.927997	from making a	-0.124939
-0.439629	actually making a	-0.124939
-2.037021	you want a	-0.124939
-1.734212	information about a	-0.124939
-0.562304	bits while a	-0.124939
-0.562304	function, while a	-0.124939
-0.589140	;startofFunc ; a	-0.124939
-0.828704	then calls a	-0.124939
-0.563277	support calls a	-0.124939
-0.486528	it Use a	-0.124939
-0.486528	example: Use a	-0.124939
-0.486528	1. Use a	-0.124939
-0.178424	3.2 Use a	-0.425969
-0.588562	how big a	-0.124939
-0.535151	names. But a	-0.124939
-0.535151	itself. But a	-0.124939
-0.535151	simplicity. But a	-0.124939
-0.189275	function through a	-0.124939
-0.062967	class through a	-0.301030
-0.225410	b through a	-0.124939
-0.189275	object through a	-0.124939
-0.225410	variable through a	-0.124939
-0.225410	called through a	-0.124939
-0.225410	address through a	-0.124939
-0.483068	accessed through a	-0.124939
-0.368117	go through a	-0.124939
-0.225410	GOT through a	-0.124939
-0.225410	jump through a	-0.124939
-0.225410	caller through a	-0.124939
-0.225410	Walking through a	-0.124939
-0.225410	looping through a	-0.124939
-0.225410	propagated through a	-0.124939
-0.171530	= a, a	-0.249877
-1.262298	to compile a	-0.124939
-1.384675	a matrix a	-0.124939
-0.557192	to matrix a	-0.124939
-1.239985	not been a	-0.124939
-0.378105	may cause a	-0.124939
-1.021046	will cause a	-0.124939
-0.879683	have done a	-0.124939
-1.617509	is therefore a	-0.124939
-0.602562	is inside a	-0.124939
-0.426303	table inside a	-0.124939
-0.234155	declared inside a	-0.425969
-0.602562	defined inside a	-0.124939
-0.736362	that uses a	-0.124939
-0.587538	compiler uses a	-0.124939
-0.384444	program uses a	-0.124939
-0.587538	application uses a	-0.124939
-0.587538	implementation uses a	-0.124939
-0.416234	still uses a	-0.124939
-0.589225	The parameters a	-0.124939
-0.609702	to get a	-0.124939
-0.393260	and get a	-0.124939
-0.153560	may get a	-0.124939
-0.393260	should get a	-0.124939
-0.393260	soon get a	-0.124939
-0.393260	(4) get a	-0.124939
-0.731528	int b; a	-0.124939
-0.524089	double b; a	-0.124939
-0.169982	a, b; a	-1.028029
-0.643551	bool b; a	-0.124939
-0.194678	have implemented a	-0.124939
-0.589360	The solution a	-0.124939
-1.235415	that support a	-0.124939
-0.178855	software contains a	-0.124939
-0.699243	often contains a	-0.124939
-0.488285	expression contains a	-0.124939
-0.589248	considering whether a	-0.124939
-1.144982	are doing a	-0.124939
-0.706833	to run a	-0.124939
-1.301002	can calculate a	-0.124939
-1.248572	to inline a	-0.124939
-0.800456	then add a	-0.124939
-0.547718	we add a	-0.124939
-0.193062	// copy a	-0.425969
-0.586911	job optimizing a	-0.124939
-0.372021	is simply a	-0.823909
-1.349569	{ ... a	-0.124939
-0.956463	be quite a	-0.124939
-0.183578	take quite a	-0.425969
-1.400938	are used. a	-0.124939
-1.271119	to write a	-0.124939
-0.544084	you write a	-0.124939
-1.607615	to optimize a	-0.124939
-0.586779	CPUs. However, a	-0.124939
-0.872011	simple cases, a	-0.124939
-0.637864	to replace a	-0.124939
-0.676886	can replace a	-0.124939
-0.155544	cannot replace a	-0.124939
-0.563944	automatically replace a	-0.124939
-0.867167	Windows allows a	-0.124939
-0.538758	then sets a	-0.124939
-0.538758	routine sets a	-0.124939
-0.666615	the expression a	-0.425969
-0.820988	The expression a	-0.124939
-0.583397	for handling a	-0.124939
-0.600386	things like a	-0.124939
-0.424852	treated like a	-0.124939
-0.600386	behaves like a	-0.124939
-0.424852	expanded like a	-0.124939
-0.424852	actions like a	-0.124939
-0.537642	element __m128i a	-0.124939
-0.537642	operations: __m128i a	-0.124939
-0.584383	} Using a	-0.124939
-0.518403	to put a	-0.124939
-0.420011	you put a	-0.124939
-0.593154	Don't put a	-0.124939
-0.538259	This needs a	-0.124939
-0.538259	loop needs a	-0.124939
-0.592803	* c; a	-0.124939
-0.419776	/ c; a	-0.124939
-0.623896	b, c; a	-0.425969
-0.419776	% c; a	-0.124939
-0.534835	or what a	-0.124939
-0.777664	change what a	-0.124939
-0.584019	before running a	-0.124939
-1.126765	a && a	-0.124939
-0.529492	b && a	-0.124939
-1.350879	a | a	-0.124939
-0.216404	// Make a	-0.903090
-0.310800	support. Make a	-0.124939
-0.310800	operators. Make a	-0.124939
-0.486406	7.10b char a	-0.124939
-0.486406	8.17 char a	-0.124939
-0.486406	7.9b char a	-0.124939
-1.230239	is needed a	-0.124939
-0.525669	should give a	-0.124939
-0.525669	still give a	-0.124939
-0.524955	loop becomes a	-0.124939
-0.760527	caching becomes a	-0.124939
-0.692846	This requires a	-0.124939
-0.611709	pointers requires a	-0.124939
-0.432377	processors requires a	-0.124939
-0.432377	sampling requires a	-0.124939
-0.855136	to load a	-0.124939
-0.580499	as calling a	-0.124939
-1.341424	example shows a	-0.124939
-0.523673	131) shows a	-0.124939
-0.491922	to generate a	-0.124939
-0.148513	and generate a	-0.425969
-0.797757	will generate a	-0.124939
-0.578947	and reduce a	-0.124939
-0.940988	to choose a	-0.124939
-1.137296	may choose a	-0.124939
-0.518358	have made a	-0.124939
-0.518358	once made a	-0.124939
-0.797514	is just a	-0.124939
-0.653844	in just a	-0.124939
-0.459797	when just a	-0.124939
-0.580055	or require a	-0.124939
-0.366875	not require a	-0.124939
-0.516185	may require a	-0.124939
-0.366875	arrays require a	-0.124939
-0.366875	references require a	-0.124939
-1.014965	to start a	-0.124939
-0.515777	can start a	-0.124939
-0.575810	N supports a	-0.124939
-1.157093	of columns a	-0.124939
-0.835308	can become a	-0.124939
-0.902115	has become a	-0.124939
-0.577910	This gives a	-0.124939
-0.575799	for inlining a	-0.124939
-0.574663	|| b) a	-0.124939
-0.331994	way. Such a	-0.124939
-0.331994	processor. Such a	-0.124939
-0.331994	language. Such a	-0.124939
-0.331994	market. Such a	-0.124939
-0.331994	games. Such a	-0.124939
-0.577474	paragraph described a	-0.124939
-0.849922	operators produce a	-0.124939
-0.575785	are including a	-0.124939
-0.709349	be given a	-0.124939
-0.494474	been given a	-0.124939
-0.573968	make temp a	-0.124939
-1.198154	c, d; a	-0.124939
-0.991913	can save a	-0.124939
-0.575770	this prevents a	-0.124939
-1.546674	to tell a	-0.124939
-0.283435	to unroll a	-0.301030
-0.369894	usually unroll a	-0.124939
-0.570597	because testing a	-0.124939
-0.333807	or writing a	-0.124939
-0.572523	profiling. When a	-0.124939
-0.570597	when copying a	-0.124939
-0.617861	for accessing a	-0.124939
-0.142346	than accessing a	-0.124939
-0.355495	When accessing a	-0.124939
-1.001137	wait until a	-0.124939
-0.612393	by adding a	-0.425969
-0.571238	which causes a	-0.124939
-1.392708	to predict a	-0.124939
-0.176389	= true a	-0.425969
-0.986098	can execute a	-0.124939
-1.017546	for N a	-0.124939
-1.392708	at least a	-0.124939
-0.425563	to insert a	-0.124939
-0.744688	and insert a	-0.124939
-0.174257	without loading a	-0.425969
-1.068103	for calculating a	-0.124939
-0.472240	begin calculating a	-0.124939
-0.565862	to e.g. a	-0.124939
-0.322408	b ? a	-0.425969
-0.568867	has defined a	-0.124939
-0.457258	can expect a	-0.124939
-0.171040	cannot expect a	-0.301030
-1.286388	of course a	-0.124939
-0.375381	useful whenever a	-0.124939
-0.375381	cost whenever a	-0.124939
-0.375381	And whenever a	-0.124939
-0.830112	to modify a	-0.124939
-0.459868	of setting a	-0.124939
-0.894580	by setting a	-0.124939
-0.305135	is within a	-0.124939
-0.305135	zero within a	-0.124939
-0.305135	irrelevant within a	-0.124939
-0.305135	keys within a	-0.124939
-0.563221	functions counts a	-0.124939
-0.459868	many processors, a	-0.124939
-0.459868	older processors, a	-0.124939
-0.564861	} Obviously, a	-0.124939
-0.303315	to allocate a	-0.301030
-0.304232	classes allocate a	-0.124939
-0.566508	have added a	-0.124939
-0.568989	they waste a	-0.124939
-0.794870	// define a	-0.124939
-0.458580	may define a	-0.124939
-0.366322	to implement a	-0.124939
-0.559112	should contain a	-0.124939
-0.824368	or writes a	-0.124939
-0.824368	to transfer a	-0.124939
-1.090813	optimize away a	-0.124939
-0.824368	can multiply a	-0.124939
-0.635530	function stores a	-0.124939
-0.447993	mechanism stores a	-0.124939
-0.562731	of finding a	-0.124939
-0.829375	will vectorize a	-0.124939
-0.499615	to include a	-0.124939
-0.355061	sets include a	-0.124939
-0.355061	packages include a	-0.124939
-0.826031	integer addition, a	-0.124939
-0.560918	unchanged across a	-0.124939
-0.558056	previously required a	-0.124939
-0.993384	slow down a	-0.124939
-0.549845	it had a	-0.124939
-0.186571	/ 10; a	-0.425969
-0.096596	% 10; a	-0.425969
-0.549845	type. Likewise, a	-0.124939
-0.552119	can spend a	-0.124939
-1.165707	is called, a	-0.124939
-0.810424	after executing a	-0.124939
-0.549845	of exceptions a	-0.124939
-0.522773	to transpose a	-0.425969
-0.544326	the break a	-0.124939
-0.544326	to break a	-0.124939
-0.543022	address plus a	-0.124939
-0.545635	/ 16; a	-0.124939
-0.545635	% 16; a	-0.124939
-1.146989	to identify a	-0.124939
-0.544326	and show a	-0.124939
-0.386640	only show a	-0.124939
-1.063470	to evaluate a	-0.124939
-0.544326	non-const reference, a	-0.124939
-0.928049	only half a	-0.124939
-0.388461	for converting a	-0.124939
-0.272600	are converting a	-0.124939
-0.272600	implicitly converting a	-0.124939
-0.271902	that follows a	-0.124939
-0.271902	it follows a	-0.124939
-0.271902	pointer follows a	-0.124939
-0.543022	to base a	-0.124939
-0.546947	tables Reading a	-0.124939
-0.546947	numbers form a	-0.124939
-0.271902	__m128i defines a	-0.124939
-0.271902	__m128 defines a	-0.124939
-0.271902	__m128d defines a	-0.124939
-0.543022	c; Is16vec8 a	-0.124939
-0.271902	structures. Accessing a	-0.124939
-0.271902	compact. Accessing a	-0.124939
-0.271902	Efficiency Accessing a	-0.124939
-0.191298	to install a	-0.124939
-0.228712	must install a	-0.124939
-0.099471	can consume a	-0.425969
-0.228712	functions consume a	-0.124939
-1.033139	other hand, a	-0.124939
-0.535179	that created a	-0.124939
-0.538259	labels follow a	-0.124939
-0.354949	and returns a	-0.124939
-0.354949	which returns a	-0.124939
-0.780961	solution. Is a	-0.124939
-0.536716	resources. Typically, a	-0.124939
-0.536716	= ~a a	-0.124939
-0.536716	we prefer a	-0.124939
-0.755129	loop repeats a	-0.124939
-0.751957	by defining a	-0.124939
-0.260071	that produces a	-0.124939
-0.077157	variable produces a	-0.124939
-0.755129	calculate *p+2 a	-0.124939
-0.441349	by choosing a	-0.124939
-0.312514	when choosing a	-0.124939
-0.868504	for saving a	-0.124939
-0.311367	used. Whenever a	-0.124939
-0.311367	better. Whenever a	-0.124939
-0.519952	Sum1() {return a	-0.124939
-0.311367	pointers Calling a	-0.124939
-0.311367	class. Calling a	-0.124939
-0.521809	can force a	-0.124939
-0.755129	a pointer, a	-0.124939
-0.521809	function opens a	-0.124939
-0.715224	may seem a	-0.124939
-0.498047	still consumes a	-0.124939
-0.498047	compiler optimizes a	-0.124939
-0.821061	// Return a	-0.124939
-0.500402	Example 7.2 a	-0.124939
-0.498047	cases ignore a	-0.124939
-0.355625	be considered a	-0.124939
-0.247233	traditionally considered a	-0.124939
-0.500402	are spaced a	-0.124939
-0.355625	for implementing a	-0.124939
-0.247233	But implementing a	-0.124939
-0.498047	only). Specifies a	-0.124939
-0.498047	It reveals a	-0.124939
-0.498047	set (requires a	-0.124939
-0.498047	float 140 a	-0.124939
-0.498047	should leave a	-0.124939
-0.247233	numbers. With a	-0.124939
-0.247233	step. With a	-0.124939
-0.500402	function scans a	-0.124939
-0.498047	easily justify a	-0.124939
-0.247233	that holds a	-0.124939
-0.247233	type holds a	-0.124939
-0.498047	two arrays, a	-0.124939
-0.106256	= false, a	-0.425969
-0.247233	bits represent a	-0.124939
-0.247233	truly represent a	-0.124939
-0.247233	of returning a	-0.124939
-0.247233	by returning a	-0.124939
-0.500402	before terminating a	-0.124939
-0.821061	by joining a	-0.124939
-0.821061	is certainly a	-0.124939
-0.715224	is indeed a	-0.124939
-0.498047	NAN (not a	-0.124939
-0.498047	it expects a	-0.124939
-0.498047	can compute a	-0.124939
-0.457295	etc. SSSE3 a	-0.124939
-0.457295	mechanism executes a	-0.124939
-0.457295	have developed a	-0.124939
-0.649939	of keeping a	-0.124939
-0.457295	can emulate a	-0.124939
-0.457295	inline. Replacing a	-0.124939
-0.457295	Before starting a	-0.124939
-0.457295	only if, a	-0.124939
-0.457295	drivers differ a	-0.124939
-0.141868	to pressing a	-0.124939
-0.141868	like pressing a	-0.124939
-0.649939	and maintaining a	-0.124939
-0.457295	b.load(bb+i); c.load(cc+i); a	-0.124939
-0.457295	CPU. Unrolling a	-0.124939
-0.457295	and afterwards a	-0.124939
-0.141868	to lock a	-0.124939
-0.141868	temporarily lock a	-0.124939
-0.457295	implementations reveal a	-0.124939
-0.649939	y, z; a	-0.124939
-0.141868	which transposes a	-0.124939
-0.141868	example transposes a	-0.124939
-0.649939	// Returns a	-0.124939
-0.457295	it sees a	-0.124939
-0.457295	compiler treat a	-0.124939
-0.457295	make log2 a	-0.124939
-0.457295	for studying a	-0.124939
-0.649939	by replacing a	-0.124939
-0.457295	and involve a	-0.124939
-0.457295	simple type, a	-0.124939
-0.457295	1. Writing a	-0.124939
-0.457295	by assigning a	-0.124939
-0.457295	of relieving a	-0.124939
-0.353936	software installed, a	-0.124939
-0.353936	90 Gives a	-0.124939
-0.353936	to re-use a	-0.124939
-0.353936	definition. Inlining a	-0.124939
-0.353936	for incrementing a	-0.124939
-0.353936	or decrementing a	-0.124939
-0.353936	by requesting a	-0.124939
-0.353936	statement occupies a	-0.124939
-0.353936	below. Installing a	-0.124939
-0.353936	program executable: a	-0.124939
-0.353936	= (a&b)&(c&d) a	-0.124939
-0.353936	} Transposing a	-0.124939
-0.353936	Example 8.5b a	-0.124939
-0.353936	modified. Unlike a	-0.124939
-0.353936	and create a	-0.124939
-0.353936	calculations forms a	-0.124939
-0.353936	to compose a	-0.124939
-0.353936	that draws a	-0.124939
-0.353936	it feeds a	-0.124939
-0.353936	Example 8.2b a	-0.124939
-0.353936	Example 8.3b a	-0.124939
-0.353936	bit indicates a	-0.124939
-0.353936	can incur a	-0.124939
-0.353936	to pass a	-0.124939
-0.353936	to reinstall a	-0.124939
-0.353936	function rounds a	-0.124939
-0.353936	Example 8.10b a	-0.124939
-0.353936	for fetching a	-0.124939
-0.353936	on redesigning a	-0.124939
-0.353936	{2.6f, 1.5f}; a	-0.124939
-0.353936	example converts a	-0.124939
-0.353936	= MultiplyBy<8>(10); a	-0.124939
-0.353936	than isolating a	-0.124939
-0.353936	& a= a	-0.124939
-0.353936	{1.0f, 2.5f}; a	-0.124939
-0.353936	of managing a	-0.124939
-0.353936	a<<b<<c=a<<(b+c) x-xxx--xx a	-0.124939
-0.353936	and publish a	-0.124939
-0.353936	~a&~b=~(a|b) --xxxx--- a	-0.124939
-0.353936	of RAM, a	-0.124939
-0.353936	you activate a	-0.124939
-0.353936	of occupying a	-0.124939
-1.824061	This is of	-0.425969
-0.596859	works is of	-0.124939
-0.596859	animations is of	-0.124939
-0.598736	exceeding that of	-0.124939
-2.280084	may be of	-0.124939
-2.173244	should be of	-0.124939
-0.892905	These are of	-0.124939
-0.202741	table // of	-0.425969
-1.999499	a function of	-0.124939
-0.199319	linear function of	-0.124939
-1.194524	detection function of	-0.124939
-0.578567	increasing function of	-0.124939
-0.199319	exp function of	-0.124939
-0.578567	staircase function of	-0.124939
-2.502610	the code of	-0.124939
-2.001178	you may of	-0.124939
-1.286126	the time of	-0.301030
-1.038405	calculation time of	-0.124939
-0.025319	the use of	-0.124939
-0.080867	The use of	-0.124939
-0.494277	make use of	-0.124939
-0.494277	efficient use of	-0.124939
-0.494277	any use of	-0.124939
-0.494277	better use of	-0.124939
-0.494277	explicit use of	-0.124939
-0.494277	Excessive use of	-0.124939
-1.710016	or more of	-0.124939
-2.275916	the program of	-0.124939
-0.293304	a vector of	-0.380211
-0.460865	bit vector of	-0.124939
-0.833527	code because of	-0.124939
-0.996949	time because of	-0.124939
-0.833527	b because of	-0.124939
-0.503927	variables because of	-0.124939
-0.503927	systems because of	-0.124939
-0.833527	times because of	-0.124939
-0.503927	speed because of	-0.124939
-0.503927	parameters because of	-0.124939
-0.724963	solution because of	-0.124939
-0.503927	intended because of	-0.124939
-0.724963	avoided because of	-0.124939
-0.958466	inefficient because of	-0.124939
-0.503927	disk because of	-0.124939
-0.503927	preferred because of	-0.124939
-0.503927	completely because of	-0.124939
-0.868580	member functions of	-0.301030
-1.406160	a CPU of	-0.124939
-0.875332	newest CPU of	-0.124939
-0.587438	possible point of	-0.124939
-0.587438	technological point of	-0.124939
-1.642953	a loop of	-0.124939
-0.851287	innermost loop of	-0.425969
-0.845895	message loop of	-0.124939
-0.873510	discussed which of	-0.124939
-0.587018	advance which of	-0.124939
-1.288367	to all of	-0.124939
-1.316877	if all of	-0.124939
-0.313083	is one of	-0.602060
-0.313083	to one of	-0.301030
-1.215244	only one of	-0.124939
-0.604354	into one of	-0.425969
-0.507904	choose one of	-0.124939
-0.507904	Only one of	-0.124939
-0.507904	signifying one of	-0.124939
-1.451130	a cache of	-0.124939
-1.314834	data cache of	-0.124939
-1.478844	level-2 cache of	-0.124939
-0.892860	There should of	-0.124939
-1.850716	an integer of	-0.124939
-0.181548	a set of	-0.124939
-0.499370	which set of	-0.124939
-0.499370	used set of	-0.124939
-0.499370	one set of	-0.124939
-0.499370	each set of	-0.124939
-0.499370	particular set of	-0.124939
-0.499370	own set of	-0.124939
-0.499370	typical set of	-0.124939
-0.499370	suitable set of	-0.124939
-0.181548	realistic set of	-0.124939
-1.503792	the class of	-0.124939
-0.584132	what class of	-0.124939
-0.584265	or each of	-0.124939
-0.868201	after each of	-0.124939
-0.828837	the example of	-0.124939
-0.240972	an example of	-0.602060
-0.488289	where most of	-0.124939
-0.488289	way most of	-0.124939
-0.488289	while most of	-0.124939
-0.488289	But most of	-0.124939
-0.488289	uses most of	-0.124939
-0.488289	run most of	-0.124939
-0.488289	predicted most of	-0.124939
-0.178856	spend most of	-0.425969
-0.488289	runs most of	-0.124939
-0.488289	obtain most of	-0.124939
-0.488289	consumes most of	-0.124939
-0.488289	spends most of	-0.124939
-0.129144	the size of	-0.580871
-0.146644	The size of	-0.191886
-0.434058	line size of	-0.124939
-0.326893	maximum size of	-0.124939
-0.326893	Define size of	-0.124939
-0.326893	total size of	-0.124939
-0.133390	combined size of	-0.124939
-0.133390	Total size of	-0.425969
-2.082924	a pointer of	-0.124939
-1.179740	a library of	-0.124939
-0.034475	a multiple of	-0.778151
-1.195456	the object of	-0.124939
-0.422667	an object of	-0.367977
-0.693687	new object of	-0.124939
-0.329477	An object of	-0.425969
-0.018836	the number of	-0.312929
-0.061017	a number of	-0.124939
-0.012761	The number of	-0.191886
-0.030380	// number of	-0.301030
-0.172428	this number of	-0.124939
-0.046406	variable number of	-0.425969
-0.098371	large number of	-0.124939
-0.098371	optimal number of	-0.124939
-0.046406	limited number of	-0.124939
-0.030380	maximum number of	-0.124939
-0.098371	reduced number of	-0.124939
-0.030380	total number of	-0.301030
-0.098371	realistic number of	-0.124939
-0.030380	excessive number of	-0.124939
-0.098371	107 number of	-0.124939
-0.046406	increasing number of	-0.124939
-0.098371	extended number of	-0.124939
-0.098371	Max. number of	-0.124939
-0.098371	integral number of	-0.124939
-0.805846	an array of	-0.124939
-0.541940	dynamic array of	-0.124939
-0.541940	Make array of	-0.124939
-1.226356	for many of	-0.124939
-1.240672	with many of	-0.124939
-0.675070	has many of	-0.425969
-0.540305	avoids many of	-0.124939
-0.268020	same version of	-0.124939
-0.115888	which version of	-0.425969
-0.428502	each version of	-0.124939
-0.268020	possible version of	-0.124939
-0.453475	new version of	-0.124939
-0.558582	SSE2 version of	-0.124939
-0.268020	specific version of	-0.124939
-0.382491	optimized version of	-0.124939
-0.268020	optimal version of	-0.124939
-0.268020	better version of	-0.124939
-0.268020	old version of	-0.124939
-0.161243	appropriate version of	-0.823909
-0.382491	desired version of	-0.124939
-0.072376	right version of	-0.602060
-0.268020	final version of	-0.124939
-0.268020	future version of	-0.124939
-0.268020	newer version of	-0.124939
-0.268020	interpreted version of	-0.124939
-0.276040	debug version of	-0.425969
-0.072376	latest version of	-0.301030
-0.268020	popular version of	-0.124939
-0.382491	inferior version of	-0.124939
-0.268020	command-line version of	-0.124939
-0.382491	release version of	-0.124939
-0.243785	the value of	-0.425969
-0.144498	The value of	-0.124939
-0.252633	different value of	-0.124939
-0.252633	integer value of	-0.124939
-0.331814	each value of	-0.124939
-0.108200	new value of	-0.124939
-0.252633	binary value of	-0.124939
-0.252633	negative value of	-0.124939
-0.252633	preceding value of	-0.124939
-0.252633	final value of	-0.124939
-0.205771	absolute value of	-0.425969
-0.252633	initial value of	-0.124939
-0.354939	when objects of	-0.425969
-0.541439	store objects of	-0.124939
-0.541439	similar objects of	-0.124939
-0.541439	Returning objects of	-0.124939
-0.720082	in any of	-0.124939
-0.720082	if any of	-0.124939
-1.021112	by any of	-0.124939
-0.500987	use any of	-0.124939
-0.181936	If any of	-0.124939
-0.500987	but any of	-0.124939
-0.827271	without any of	-0.124939
-0.823060	of some of	-0.124939
-0.887864	to some of	-0.124939
-0.887864	and some of	-0.124939
-0.823060	with some of	-0.124939
-0.498995	described some of	-0.124939
-0.498995	Even some of	-0.124939
-0.498995	While some of	-0.124939
-0.498995	describe some of	-0.124939
-0.437713	a table of	-0.346788
-0.972390	The table of	-0.124939
-0.493894	// table of	-0.124939
-0.493894	make table of	-0.124939
-0.656024	the performance of	-0.346788
-0.494948	Comparing performance of	-0.124939
-0.494948	overall performance of	-0.124939
-0.494948	benchmark performance of	-0.124939
-0.332142	the order of	-0.367977
-0.322580	The order of	-0.124939
-0.470194	opposite order of	-0.124939
-0.825711	new branch of	-0.124939
-0.561649	particular branch of	-0.124939
-0.561649	dispatch branch of	-0.124939
-0.480567	is member of	-0.124939
-0.461494	a member of	-0.522879
-0.962909	data member of	-0.124939
-0.686768	polymorphic member of	-0.124939
-0.480567	(not member of	-0.124939
-0.600834	the way of	-0.124939
-1.047259	a way of	-0.124939
-0.424879	C++ way of	-0.124939
-0.300485	efficient way of	-0.124939
-0.600427	useful way of	-0.124939
-0.424879	simple way of	-0.124939
-0.424879	common way of	-0.124939
-0.600427	good way of	-0.124939
-0.424879	convenient way of	-0.124939
-0.424879	efficient, way of	-0.124939
-0.424879	portable way of	-0.124939
-1.235534	the elements of	-0.124939
-0.889832	all elements of	-0.124939
-0.966409	array elements of	-0.124939
-0.529381	four elements of	-0.124939
-0.529381	N elements of	-0.124939
-0.088060	the address of	-0.544068
-0.126520	The address of	-0.124939
-0.305791	return address of	-0.124939
-0.884201	every call of	-0.124939
-0.852934	each bit of	-0.124939
-0.323043	sign bit of	-0.425969
-0.451320	significant bit of	-0.124939
-1.110484	the optimization of	-0.124939
-0.116940	on optimization of	-0.602060
-0.590962	Two libraries of	-0.124939
-0.991842	that pointers of	-0.124939
-1.019049	two pointers of	-0.124939
-0.591674	versions even of	-0.124939
-0.806231	the method of	-0.124939
-0.675875	The method of	-0.124939
-0.490934	newer method of	-0.124939
-0.490934	C-style method of	-0.124939
-0.490934	original method of	-0.124939
-0.076659	is out of	-0.301030
-0.288564	are out of	-0.124939
-0.288564	or out of	-0.124939
-0.288564	if out of	-0.124939
-0.288564	not out of	-0.124939
-0.120743	instructions out of	-0.124939
-0.288564	conversions out of	-0.124939
-0.288564	index out of	-0.124939
-0.076659	Index out of	-0.301030
-0.120743	moved out of	-0.124939
-0.288564	being out of	-0.124939
-0.120743	jumping out of	-0.425969
-0.288564	breaking out of	-0.124939
-0.588786	zip file of	-0.124939
-0.035872	the part of	-0.124939
-0.017566	is part of	-0.124939
-0.017566	a part of	-0.124939
-0.035872	are part of	-0.124939
-0.035872	as part of	-0.124939
-0.035872	not part of	-0.124939
-0.017566	this part of	-0.425969
-0.035872	A part of	-0.124939
-0.011631	same part of	-0.602060
-0.035872	If part of	-0.124939
-0.035872	which part of	-0.124939
-0.035872	but part of	-0.124939
-0.063579	each part of	-0.425969
-0.035872	static part of	-0.124939
-0.035872	any part of	-0.124939
-0.010565	critical part of	-1.028029
-0.035872	access part of	-0.124939
-0.035872	important part of	-0.124939
-0.035872	large part of	-0.124939
-0.035872	small part of	-0.124939
-0.035872	optimized part of	-0.124939
-0.035872	another part of	-0.124939
-0.035872	particular part of	-0.124939
-0.035872	significant part of	-0.124939
-0.035872	time-consuming part of	-0.124939
-0.035872	(or part of	-0.124939
-0.035872	time-critical part of	-0.124939
-0.035872	task-specific part of	-0.124939
-0.296879	the bits of	-0.124939
-0.385470	16 bits of	-0.425969
-0.393273	32 bits of	-0.425969
-0.417706	n bits of	-0.124939
-0.417706	individual bits of	-0.124939
-1.633005	vector operations of	-0.124939
-0.300550	the type of	-0.124939
-0.396741	and type of	-0.124939
-0.670388	The type of	-0.124939
-0.396741	each type of	-0.124939
-0.396741	any type of	-0.124939
-0.396741	return type of	-0.124939
-0.396741	appropriate type of	-0.124939
-0.358369	the case of	-0.249877
-0.146144	in case of	-0.359022
-0.523372	possible cases of	-0.124939
-0.187200	Other cases of	-0.425969
-0.523372	rare cases of	-0.124939
-0.589975	project. Some of	-0.124939
-0.780631	to arrays of	-0.124939
-0.536528	make arrays of	-0.124939
-0.536528	few arrays of	-0.124939
-1.175893	the calculations of	-0.124939
-0.560619	necessary calculations of	-0.124939
-0.095175	more versions of	-0.425969
-0.114887	different versions of	-0.367977
-0.348719	multiple versions of	-0.301030
-0.184243	two versions of	-0.124939
-0.217237	new versions of	-0.124939
-0.217237	Some versions of	-0.124939
-0.217237	optimized versions of	-0.124939
-0.217237	special versions of	-0.124939
-0.217237	newer versions of	-0.124939
-0.217237	latest versions of	-0.124939
-0.217237	CPU-specific versions of	-0.124939
-0.217237	Fast versions of	-0.124939
-0.728894	the execution of	-0.425969
-0.536737	during execution of	-0.124939
-0.441501	the result of	-0.669007
-0.264924	a result of	-0.124939
-0.395758	The result of	-0.124939
-0.501664	intermediate result of	-0.124939
-1.253906	8 bytes of	-0.124939
-0.534722	consecutive bytes of	-0.124939
-0.534722	65 bytes of	-0.124939
-1.052650	first element of	-0.124939
-0.272058	the speed of	-0.124939
-0.790254	The speed of	-0.124939
-0.441188	high speed of	-0.124939
-0.589286	do much of	-0.124939
-0.585885	ports, etc. of	-0.124939
-0.581916	for overflow of	-0.425969
-0.502735	An overflow of	-0.124939
-0.502735	positive overflow of	-0.124939
-0.644467	to integers of	-0.124939
-0.644467	two integers of	-0.124939
-0.453775	four integers of	-0.124939
-0.453775	eight integers of	-0.124939
-0.453775	multiply integers of	-0.124939
-0.453775	sixteen integers of	-0.124939
-0.053554	the power of	-0.301030
-0.001663	a power of	-0.873127
-0.023885	high power of	-0.425969
-0.049161	processing power of	-0.124939
-0.049161	computational power of	-0.124939
-0.188096	common cause of	-0.124939
-0.527270	frequent cause of	-0.124939
-0.586422	a precision of	-0.124939
-0.099735	the calculation of	-0.425969
-0.144012	The calculation of	-0.124939
-0.552743	the uses of	-0.124939
-0.552743	typical uses of	-0.124939
-1.151771	the parameters of	-0.124939
-0.587380	worst problem of	-0.124939
-0.811623	alternative solution of	-0.124939
-0.553923	radical solution of	-0.124939
-0.411455	the advantage of	-0.124939
-0.016226	The advantage of	-0.425969
-0.150816	takes advantage of	-0.124939
-0.016226	take advantage of	-0.329059
-0.150816	main advantage of	-0.124939
-0.150816	maximum advantage of	-0.124939
-0.150816	full advantage of	-0.124939
-0.150816	took advantage of	-0.124939
-0.587078	for support of	-0.124939
-0.584169	only few of	-0.124939
-0.277300	a list of	-0.124939
-0.319102	A list of	-0.124939
-0.319102	long list of	-0.124939
-0.082736	negative list of	-0.124939
-0.244318	positive list of	-0.124939
-0.319102	smallest list of	-0.124939
-0.586826	15.1c would of	-0.124939
-0.516857	A structure of	-0.124939
-0.516857	whole structure of	-0.124939
-0.516857	logic structure of	-0.124939
-0.236497	the values of	-0.221849
-0.769099	The values of	-0.124939
-0.544360	systems. All of	-0.124939
-0.544360	Comments All of	-0.124939
-1.054481	the sign of	-0.425969
-0.637782	the copy of	-0.124939
-0.449454	unused copy of	-0.124939
-0.407330	non-inlined copy of	-0.425969
-0.449454	backup copy of	-0.124939
-0.244385	the addresses of	-0.249877
-0.792540	The allocation of	-0.124939
-0.543277	involves allocation of	-0.124939
-1.058963	the problems of	-0.124939
-0.541658	to problems of	-0.124939
-0.583412	address space of	-0.124939
-0.050979	a lot of	-0.212089
-0.028390	A lot of	-0.124939
-1.336642	the multiplication of	-0.124939
-0.392725	An implementation of	-0.124939
-0.553102	good implementation of	-0.124939
-0.729193	hardware implementation of	-0.124939
-0.284090	complicated implementation of	-0.124939
-0.392725	typical implementation of	-0.124939
-0.497502	4 Most of	-0.124939
-0.497502	framework Most of	-0.124939
-0.497502	resources. Most of	-0.124939
-0.435506	the members of	-0.124939
-0.127303	are members of	-0.124939
-0.308163	with members of	-0.124939
-0.649990	data members of	-0.124939
-0.308163	variable members of	-0.124939
-0.127303	Data members of	-0.425969
-0.308163	Non-static members of	-0.124939
-0.581644	other methods of	-0.124939
-0.777198	the development of	-0.124939
-0.534568	easy development of	-0.124939
-0.492440	a block of	-0.124939
-0.492440	big block of	-0.124939
-0.492440	own block of	-0.124939
-0.979702	the name of	-0.124939
-0.776544	The name of	-0.124939
-0.584996	the needs of	-0.124939
-1.132077	The conversion of	-0.124939
-0.621478	the disadvantage of	-0.124939
-0.247812	The disadvantage of	-0.124939
-0.185692	A disadvantage of	-0.124939
-0.096060	important disadvantage of	-0.124939
-0.185692	Another disadvantage of	-0.124939
-0.219584	biggest disadvantage of	-0.124939
-1.024311	a parameter of	-0.124939
-0.449484	useful source of	-0.124939
-0.449484	common source of	-0.124939
-0.449484	frequent source of	-0.124939
-0.449484	valuable source of	-0.124939
-0.032129	the cost of	-0.124939
-0.049130	The cost of	-0.124939
-0.349548	overhead cost of	-0.124939
-0.582989	the resources of	-0.124939
-1.143278	a string of	-0.124939
-0.468171	the end of	-0.221849
-0.310801	to end of	-0.124939
-0.310801	with end of	-0.124939
-0.310801	mark end of	-0.124939
-0.080927	for examples of	-0.602060
-0.309829	more examples of	-0.124939
-0.309829	many examples of	-0.124939
-0.309829	several examples of	-0.124939
-0.309829	contains examples of	-0.124939
-0.309829	More examples of	-0.124939
-0.581271	allow addition of	-0.124939
-0.861597	The mechanism of	-0.124939
-0.556360	// Table of	-0.425969
-0.580366	language runtime of	-0.124939
-0.487370	a means of	-0.124939
-0.178631	by means of	-0.425969
-0.697206	first byte of	-0.124939
-0.484326	per byte of	-0.124939
-0.013024	the parts of	-0.425969
-0.026450	when parts of	-0.124939
-0.026450	make parts of	-0.124939
-0.006463	different parts of	-0.425969
-0.013024	other parts of	-0.425969
-0.026450	used parts of	-0.124939
-0.005163	critical parts of	-0.823909
-0.026450	specific parts of	-0.124939
-0.026450	certain parts of	-0.124939
-0.026450	time-consuming parts of	-0.124939
-0.026450	Critical parts of	-0.124939
-0.026450	nearby parts of	-0.124939
-0.271903	and types of	-0.124939
-0.260751	different types of	-0.249877
-0.271903	other types of	-0.124939
-0.434132	integer types of	-0.124939
-0.271903	two types of	-0.124939
-0.271903	some types of	-0.124939
-0.011121	function instead of	-0.124939
-0.011121	code instead of	-0.124939
-0.011121	int instead of	-0.124939
-0.011121	data instead of	-0.124939
-0.011121	i instead of	-0.124939
-0.011121	float instead of	-0.124939
-0.011121	object instead of	-0.124939
-0.011121	table instead of	-0.124939
-0.011121	registers instead of	-0.124939
-0.011121	system instead of	-0.124939
-0.011121	references instead of	-0.124939
-0.011121	counters instead of	-0.124939
-0.011121	templates instead of	-0.124939
-0.011121	#if instead of	-0.124939
-0.005525	rounding instead of	-0.124939
-0.011121	macros instead of	-0.124939
-0.011121	format instead of	-0.124939
-0.011121	-fpie instead of	-0.124939
-0.011121	typedef instead of	-0.124939
-0.011121	int) instead of	-0.124939
-0.011121	|) instead of	-0.124939
-0.579342	unacceptable. Each of	-0.124939
-0.761351	all optimizations of	-0.124939
-0.525434	interprocedural optimizations of	-0.124939
-0.522822	code. Many of	-0.124939
-0.522822	microprocessors. Many of	-0.124939
-0.754568	four numbers of	-0.124939
-0.521481	eight numbers of	-0.124939
-0.538484	the vectors of	-0.124939
-0.382568	in vectors of	-0.124939
-0.538484	on vectors of	-0.124939
-0.382568	Define vectors of	-0.124939
-0.382568	(128 vectors of	-0.124939
-0.031369	the piece of	-0.425969
-0.006095	a piece of	-0.425969
-0.031369	same piece of	-0.124939
-0.065181	critical piece of	-0.124939
-0.133338	small piece of	-0.124939
-0.065181	particular piece of	-0.124939
-0.577562	The process of	-0.124939
-0.189229	the advantages of	-0.124939
-0.014936	The advantages of	-0.477121
-0.157556	above advantages of	-0.124939
-0.749612	the results of	-0.124939
-0.925703	The results of	-0.124939
-0.746390	The storage of	-0.124939
-0.746390	thread-local storage of	-0.124939
-0.072657	different ways of	-0.301030
-0.269346	possible ways of	-0.124939
-0.269346	fast ways of	-0.124939
-0.756539	various ways of	-0.124939
-0.114119	smarter ways of	-0.124939
-0.576814	The operands of	-0.124939
-0.145624	the range of	-0.124939
-0.366298	The range of	-0.124939
-0.366298	same range of	-0.124939
-0.366298	live range of	-0.124939
-0.653163	the start of	-0.124939
-0.459361	; start of	-0.124939
-0.459361	during start of	-0.124939
-1.002849	the modules of	-0.124939
-0.576585	execution core of	-0.124939
-0.038542	the overhead of	-0.124939
-0.038542	The overhead of	-0.249877
-0.325313	extra overhead of	-0.124939
-0.271394	large overhead of	-0.124939
-0.577192	The change of	-0.124939
-0.715035	The installation of	-0.124939
-0.167682	during installation of	-0.425969
-0.084120	the choice of	-0.249877
-0.028281	The choice of	-0.124939
-0.164453	suitable choice of	-0.124939
-0.849426	an index of	-0.124939
-0.248260	an instance of	-0.124939
-0.768168	one instance of	-0.124939
-0.248260	each instance of	-0.124939
-0.106627	new instance of	-0.425969
-0.248260	next instance of	-0.124939
-0.248260	Each instance of	-0.124939
-0.705730	the output of	-0.124939
-0.993139	assembly output of	-0.124939
-0.575704	declared outside of	-0.124939
-0.570588	essential task of	-0.124939
-0.248491	the costs of	-0.124939
-0.073324	The costs of	-0.301030
-1.228363	the destructor of	-0.124939
-0.008714	2.5 Choice of	-0.425969
-0.008714	2.3 Choice of	-0.425969
-0.008714	2.2 Choice of	-0.425969
-0.008714	2.1 Choice of	-0.425969
-0.008714	2.7 Choice of	-0.425969
-0.008714	2.6 Choice of	-0.425969
-0.008714	2.4 Choice of	-0.425969
-0.100311	the efficiency of	-0.124939
-0.047266	The efficiency of	-0.249877
-0.230978	relative efficiency of	-0.124939
-0.570588	an algorithm of	-0.124939
-0.495308	the sum of	-0.124939
-0.495308	a sum of	-0.124939
-0.572168	or strings of	-0.124939
-0.504254	the possibility of	-0.124939
-0.357103	obscure possibility of	-0.124939
-0.074276	the discussion of	-0.124939
-0.035552	a discussion of	-0.124939
-0.035552	for discussion of	-0.124939
-0.074276	more discussion of	-0.124939
-0.074276	A discussion of	-0.124939
-0.017412	further discussion of	-0.249877
-0.570600	a maximum of	-0.124939
-0.484865	The alignment of	-0.124939
-0.484865	Specifies alignment of	-0.124939
-0.338689	the offset of	-0.301030
-0.354717	The offset of	-0.124939
-0.331281	first operand of	-0.124939
-0.001432	the sake of	-0.159701
-0.356625	the effect of	-0.124939
-0.279040	The effect of	-0.301030
-0.003108	the amount of	-0.301030
-0.018990	total amount of	-0.124939
-0.018990	significant amount of	-0.124939
-0.009391	required amount of	-0.124939
-0.018990	equal amount of	-0.124939
-0.018990	considerable amount of	-0.124939
-0.018990	insufficient amount of	-0.124939
-0.570600	extra time, of	-0.124939
-0.569038	wasteful copying of	-0.124939
-0.569819	frequent causes of	-0.124939
-0.566397	balanced mix of	-0.124939
-0.477756	high priority of	-0.124939
-0.682257	low priority of	-0.124939
-1.467907	clock frequency of	-0.124939
-0.837595	one iteration of	-0.124939
-0.477756	every iteration of	-0.124939
-0.568079	future models of	-0.124939
-0.176592	The names of	-0.124939
-0.004045	different kinds of	-0.124939
-0.020611	other kinds of	-0.124939
-0.020611	all kinds of	-0.124939
-0.020611	two kinds of	-0.124939
-0.020611	four kinds of	-0.124939
-0.020611	certain kinds of	-0.124939
-0.010183	Different kinds of	-0.425969
-0.683354	The details of	-0.124939
-0.478441	technical details of	-0.124939
-0.403599	extra level of	-0.124939
-0.403599	higher level of	-0.124939
-0.403599	highest level of	-0.124939
-0.835493	The target of	-0.124939
-0.565118	the bounds of	-0.124939
-0.152431	the loading of	-0.124939
-0.389353	lazy loading of	-0.124939
-0.174174	the reading of	-0.124939
-0.842355	case situation of	-0.124939
-0.058835	different implementations of	-0.124939
-0.206484	Some implementations of	-0.124939
-0.058835	common implementations of	-0.124939
-0.126908	Most implementations of	-0.124939
-0.126908	slow implementations of	-0.124939
-0.126908	early implementations of	-0.124939
-0.314552	first generation of	-0.124939
-0.214904	new generation of	-0.124939
-0.094293	next generation of	-0.124939
-0.490644	second generation of	-0.124939
-0.214904	compile-time generation of	-0.124939
-0.842889	different sizes of	-0.124939
-0.468038	all sizes of	-0.124939
-0.042450	the risk of	-0.124939
-0.142282	but risk of	-0.124939
-0.042450	no risk of	-0.124939
-0.459798	large fraction of	-0.124939
-0.459798	small fraction of	-0.124939
-0.214547	the sequence of	-0.425969
-0.166684	a sequence of	-0.124939
-0.189164	The sequence of	-0.124939
-0.189164	long sequence of	-0.124939
-0.015050	the length of	-0.425969
-0.046811	The length of	-0.425969
-0.459011	The penalty of	-0.124939
-0.652617	misprediction penalty of	-0.124939
-0.618267	for reasons of	-0.425969
-0.022733	the beginning of	-0.602060
-0.011238	a matter of	-0.425969
-0.457440	The declaration of	-0.124939
-0.457440	full declaration of	-0.124939
-0.060476	the series of	-0.124939
-0.009512	a series of	-0.204120
-0.060476	This series of	-0.124939
-0.060476	this series of	-0.124939
-0.372853	the features of	-0.124939
-0.524639	optimization features of	-0.124939
-0.372853	consuming features of	-0.124939
-0.009512	a waste of	-0.204120
-0.060476	and waste of	-0.124939
-0.060476	big waste of	-0.124939
-0.060476	total waste of	-0.124939
-0.027705	The microarchitecture of	-0.124939
-0.003368	"The microarchitecture of	-1.028029
-0.141945	is independent of	-0.425969
-0.354188	almost independent of	-0.124939
-1.054384	and destructors of	-0.124939
-0.007913	in terms of	-0.182931
-0.561725	without help of	-0.124939
-0.559516	The transfer of	-0.124939
-0.444829	copying blocks of	-0.124939
-0.444829	"Moving blocks of	-0.124939
-0.067983	an explanation of	-0.221849
-0.160963	detailed explanation of	-0.124939
-0.047434	different brands of	-0.124939
-0.161404	other brands of	-0.124939
-0.161404	all brands of	-0.124939
-0.161404	competing brands of	-0.124939
-0.817788	any brand of	-0.124939
-0.559516	The logic of	-0.124939
-0.168014	"Performance Optimization of	-0.425969
-0.034182	takes care of	-0.301030
-0.025381	take care of	-0.124939
-0.555706	one unit of	-0.124939
-0.015366	a kind of	-0.124939
-0.010183	this kind of	-0.124939
-0.031296	different kind of	-0.124939
-0.031296	what kind of	-0.124939
-0.031296	worse kind of	-0.124939
-0.810417	several iterations of	-0.124939
-0.554480	and misprediction of	-0.124939
-1.181850	lazy binding of	-0.124939
-0.553256	the chain of	-0.124939
-0.952754	x86 family of	-0.124939
-0.186554	point Conversion of	-0.124939
-0.186554	used. Conversion of	-0.124939
-0.083336	conversion Conversion of	-0.425969
-0.186554	point. Conversion of	-0.124939
-0.555706	produce tables of	-0.124939
-0.559408	used. Conversions of	-0.124939
-0.220539	the purpose of	-0.124939
-0.131705	The purpose of	-0.124939
-0.441940	The trick of	-0.425969
-0.410692	The disadvantages of	-0.124939
-0.410692	are disadvantages of	-0.124939
-0.220539	are instances of	-0.124939
-0.096419	all instances of	-0.124939
-0.220539	renamed instances of	-0.124939
-0.158507	the body of	-0.124939
-0.549471	the changes of	-0.124939
-0.409688	a collection of	-0.124939
-0.409688	A collection of	-0.124939
-0.259277	be aware of	-0.124939
-0.221189	(be aware of	-0.124939
-1.019627	out-of-order capabilities of	-0.124939
-0.220539	the representation of	-0.124939
-0.220539	The representation of	-0.124939
-0.220539	integer representation of	-0.124939
-0.361366	binary representation of	-0.124939
-0.035956	for powers of	-0.124939
-0.035956	are powers of	-0.124939
-0.035956	as powers of	-0.124939
-0.011658	using powers of	-0.602060
-0.035956	avoid powers of	-0.124939
-0.801119	a factor of	-0.124939
-0.126485	the rules of	-0.124939
-0.305685	many rules of	-0.124939
-0.004958	the responsibility of	-0.970037
-0.379871	the reciprocal of	-0.425969
-0.541293	degree polynomial of	-0.124939
-0.739813	type casting of	-0.124939
-0.387854	Type casting of	-0.124939
-0.019238	the scope of	-0.346788
-0.151673	The principle of	-0.124939
-0.151673	the throughput of	-0.124939
-0.180852	// Number of	-0.124939
-0.081079	bits Number of	-0.425969
-0.180852	113 Number of	-0.124939
-0.008127	the availability of	-0.124939
-0.042249	The availability of	-0.124939
-0.925255	only half of	-0.124939
-0.385634	each step of	-0.124939
-0.542881	second step of	-0.124939
-0.042249	time regardless of	-0.124939
-0.042249	same regardless of	-0.124939
-0.042249	cases, regardless of	-0.124939
-0.042249	false regardless of	-0.124939
-0.042249	registers, regardless of	-0.124939
-0.042249	name, regardless of	-0.124939
-0.280395	just-in-time compilation of	-0.124939
-0.052338	the behavior of	-0.124939
-0.180852	The behavior of	-0.124939
-0.385634	vector Type of	-0.124939
-0.385634	follows: Type of	-0.124939
-0.052338	the form of	-0.301030
-0.180852	other form of	-0.124939
-0.152320	aliasing rule of	-0.425969
-0.999802	the job of	-0.124939
-0.073246	the requirements of	-0.124939
-0.051216	a loss of	-0.124939
-0.051216	and loss of	-0.124939
-0.051216	or loss of	-0.124939
-0.051216	any loss of	-0.124939
-0.051216	about loss of	-0.124939
-0.533820	all cleanup of	-0.124939
-0.531955	the swapping of	-0.124939
-0.009771	the rest of	-0.823909
-0.531955	advanced principles of	-0.124939
-0.141903	negative effects of	-0.425969
-0.904445	// Array of	-0.124939
-0.531955	or lists of	-0.124939
-0.951982	the event of	-0.124939
-0.533820	The advice of	-0.124939
-0.772639	specific recommendation of	-0.124939
-0.531955	inefficient. Objects of	-0.124939
-0.535693	calculations. Division of	-0.124939
-0.099284	The pitfalls of	-0.425969
-0.228209	common pitfalls of	-0.124939
-0.895755	is inefficient, of	-0.124939
-0.340177	the latency of	-0.124939
-0.051216	if pieces of	-0.124939
-0.024853	small pieces of	-0.124939
-0.051216	identical pieces of	-0.124939
-0.051216	Critical pieces of	-0.124939
-0.340177	time consumption of	-0.124939
-0.533820	advanced facilities of	-0.124939
-0.936777	time slices of	-0.124939
-0.518313	or estimate of	-0.124939
-0.520573	predicted well, of	-0.124939
-0.170682	transfer ownership of	-0.124939
-0.170682	transfers ownership of	-0.124939
-0.170682	looses ownership of	-0.124939
-0.020611	the drawbacks of	-0.301030
-0.065023	and drawbacks of	-0.124939
-0.520573	the queue of	-0.124939
-0.518313	no modification of	-0.124939
-0.128107	11 Out of	-0.425969
-0.518313	by modifications of	-0.124939
-0.518313	The lengths of	-0.124939
-0.065023	16. Alignment of	-0.124939
-0.031296	9.5 Alignment of	-0.425969
-0.065023	7.2. Alignment of	-0.124939
-0.128107	The splitting of	-0.124939
-0.520573	the area of	-0.124939
-0.310605	the consequence of	-0.124939
-0.438783	The consequence of	-0.124939
-0.170682	is 50% of	-0.124939
-0.170682	true 50% of	-0.124939
-0.170682	mispredicted 50% of	-0.124939
-0.518313	the functionality of	-0.124939
-0.312005	several layers of	-0.124939
-0.312005	separate layers of	-0.124939
-0.520573	size. Integers of	-0.124939
-0.518313	conflicting considerations of	-0.124939
-0.239529	(in bytes) of	-0.124939
-0.749164	the techniques of	-0.124939
-0.065023	platforms. Comparison of	-0.124939
-0.065023	8.1. Comparison of	-0.124939
-0.031296	8.2 Comparison of	-0.425969
-0.518313	the resolution of	-0.124939
-0.496488	values. Which of	-0.124939
-0.106045	complete redesign of	-0.124939
-0.089056	the combination of	-0.124939
-0.089056	a combination of	-0.124939
-0.089056	OR combination of	-0.124939
-0.089056	typical sources of	-0.124939
-0.042249	frequent sources of	-0.124939
-0.496488	thorough analysis of	-0.124939
-0.496488	Automatic updating of	-0.124939
-0.106045	the contents of	-0.124939
-0.499356	The generality of	-0.124939
-0.499356	my study of	-0.124939
-0.089056	= (number of	-0.124939
-0.089056	/ (number of	-0.124939
-0.089056	% (number of	-0.124939
-0.246649	high degree of	-0.124939
-0.246649	typical degree of	-0.124939
-0.499356	allows overriding of	-0.124939
-0.106045	keep track of	-0.124939
-0.027705	get rid of	-0.301030
-0.496488	optimal decomposition of	-0.124939
-0.089056	the history of	-0.124939
-0.089056	The history of	-0.124939
-0.089056	past history of	-0.124939
-0.089056	relative addressing of	-0.124939
-0.042249	self-relative addressing of	-0.425969
-0.089056	to top of	-0.124939
-0.042249	; top of	-0.425969
-0.496488	www.openmp.org. Documentation of	-0.124939
-0.089056	when none of	-0.124939
-0.042249	but none of	-0.425969
-0.089056	function Size of	-0.124939
-0.089056	elements Size of	-0.124939
-0.089056	classes. Size of	-0.124939
-0.817788	the logarithm of	-0.124939
-0.106045	and deallocation of	-0.124939
-0.246649	memory allocations of	-0.124939
-0.246649	many allocations of	-0.124939
-0.496488	are indeed of	-0.124939
-0.893476	But beware of	-0.124939
-0.455875	high complexity of	-0.124939
-0.455875	and lack of	-0.124939
-0.455875	disassembly window of	-0.124939
-0.455875	// Structure of	-0.124939
-0.455875	the goal of	-0.124939
-0.455875	50-50 chance of	-0.124939
-0.455875	the dangers of	-0.124939
-0.455875	approximate comparison of	-0.124939
-0.455875	uses 90% of	-0.124939
-0.141524	fast. Value of	-0.124939
-0.141524	slow. Value of	-0.124939
-0.141524	branch ahead of	-0.124939
-0.141524	counter ahead of	-0.124939
-0.647729	The opposite of	-0.124939
-0.141524	the movements of	-0.124939
-0.141524	physical movements of	-0.124939
-0.141524	is capable of	-0.124939
-0.141524	are capable of	-0.124939
-0.455875	the lowest of	-0.124939
-0.455875	heavy marketing of	-0.124939
-0.141524	better understanding of	-0.124939
-0.141524	basic understanding of	-0.124939
-0.141524	the creation of	-0.124939
-0.141524	The creation of	-0.124939
-0.455875	automatic parallelization of	-0.124939
-0.455875	dramatic degradation of	-0.124939
-0.065023	good deal of	-0.124939
-0.141524	are lots of	-0.124939
-0.141524	with lots of	-0.124939
-0.141524	than 99% of	-0.124939
-0.141524	programs, 99% of	-0.124939
-0.065023	18 Overview of	-0.425969
-0.141524	than sequences of	-0.124939
-0.141524	small sequences of	-0.124939
-0.455875	further expansions of	-0.124939
-0.065023	9.1 Caching of	-0.425969
-0.455875	Technical University of	-0.124939
-0.141524	12.8a. Sum of	-0.124939
-0.141524	12.8b. Sum of	-0.124939
-0.455875	the obstacle of	-0.124939
-0.647729	algebraic manipulations of	-0.124939
-0.455875	further extension of	-0.124939
-0.141524	only 10% of	-0.124939
-0.141524	true 10% of	-0.124939
-0.455875	Every fourth of	-0.124939
-0.352817	The vulnerability of	-0.124939
-0.352817	than 1/50 of	-0.124939
-0.352817	the importance of	-0.124939
-0.352817	for transposition of	-0.124939
-0.352817	logical architecture of	-0.124939
-0.352817	the transformation of	-0.124939
-0.352817	better standardization of	-0.124939
-0.352817	and de-allocation of	-0.124939
-0.352817	scanf. Violation of	-0.124939
-0.352817	and flexibility of	-0.124939
-0.352817	a wealth of	-0.124939
-0.352817	out independently of	-0.124939
-0.352817	large amounts of	-0.124939
-0.352817	and uninstallation of	-0.124939
-0.352817	binary decimals of	-0.124939
-0.352817	metaprogramming. None of	-0.124939
-0.352817	The absence of	-0.124939
-0.352817	The fallacy of	-0.124939
-0.352817	self-explaining menus of	-0.124939
-0.352817	the lifetime of	-0.124939
-0.352817	broader perspective of	-0.124939
-0.352817	responsi- bility of	-0.124939
-0.352817	the evaluation of	-0.124939
-0.352817	The benefits of	-0.124939
-0.352817	the bias of	-0.124939
-0.352817	2 gigabytes of	-0.124939
-0.352817	detailed overview of	-0.124939
-0.352817	class. Members of	-0.124939
-0.352817	the majority of	-0.124939
-0.352817	a lineage of	-0.124939
-0.352817	some indication of	-0.124939
-0.352817	the scarcity of	-0.124939
-0.352817	not safe, of	-0.124939
-0.352817	the insertion of	-0.124939
-0.352817	most distributions of	-0.124939
-0.352817	8 double's of	-0.124939
-0.352817	and cons of	-0.124939
-0.352817	good knowledge of	-0.124939
-0.352817	mathematical notion of	-0.124939
-0.352817	it. Instead of	-0.124939
-0.352817	tables: Lists of	-0.124939
-0.352817	(called x86) of	-0.124939
-0.352817	function billions of	-0.124939
-0.352817	programming practice, of	-0.124939
-0.352817	cache. Bit-fields of	-0.124939
-0.352817	be omitted, of	-0.124939
-0.352817	circuits consisting of	-0.124939
-0.352817	are hundreds of	-0.124939
-0.352817	do searches of	-0.124939
-0.352817	and systematization of	-0.124939
-0.352817	This fragmentation of	-0.124939
-0.352817	mode. Much of	-0.124939
-0.352817	all occurrences of	-0.124939
-0.352817	into groups of	-0.124939
-0.352817	cause holes of	-0.124939
-0.352817	7.1. Sizes of	-0.124939
-0.352817	the design of	-0.124939
-0.352817	use segmentation of	-0.124939
-0.352817	the dimensions of	-0.124939
-0.352817	This requires, of	-0.124939
-0.352817	fundamental laws of	-0.124939
-0.352817	the attention of	-0.124939
-0.352817	and maintainability of	-0.124939
-0.352817	about investigation of	-0.124939
-0.352817	The compactness of	-0.124939
-0.352817	The advise of	-0.124939
-0.352817	and ease of	-0.124939
-0.352817	a couple of	-0.124939
-0.352817	first-in-last-out nature of	-0.124939
-0.352817	Abrash: "Zen of	-0.124939
-0.352817	by thousands of	-0.124939
-0.352817	in favor of	-0.124939
-0.352817	extra layer of	-0.124939
-0.352817	and clarity of	-0.124939
-0.352817	three levels of	-0.124939
-0.352817	problem. Vectors of	-0.124939
-0.352817	misleading reports of	-0.124939
-1.917875	it is to	-0.124939
-1.173570	this is to	-0.124939
-1.070675	data is to	-0.124939
-1.235108	loop is to	-0.124939
-0.196379	do is to	-0.124939
-1.317754	performance is to	-0.124939
-0.831262	software is to	-0.124939
-0.475896	way is to	-0.124939
-0.571889	optimization is to	-0.124939
-0.564663	pointers is to	-0.124939
-1.503423	method is to	-0.124939
-1.100392	case is to	-0.124939
-1.070675	error is to	-0.124939
-0.564663	optimized is to	-0.124939
-0.321361	problem is to	-0.249877
-0.409546	solution is to	-0.204120
-0.831262	container is to	-0.124939
-0.831262	lookup is to	-0.124939
-1.285901	here is to	-0.124939
-0.564663	errors is to	-0.124939
-0.564663	limited is to	-0.124939
-0.196379	possibility is to	-0.124939
-0.564663	thing is to	-0.124939
-0.364864	cores is to	-0.425969
-0.364864	alternative is to	-0.124939
-0.564663	aliasing is to	-0.124939
-0.564663	purpose is to	-0.124939
-0.564663	delete is to	-0.124939
-0.564663	trick is to	-0.124939
-0.564663	generates is to	-0.124939
-0.564663	exceptions is to	-0.124939
-0.564663	recommendation is to	-0.124939
-0.564663	jobs is to	-0.124939
-0.564663	goal is to	-0.124939
-0.564663	safety is to	-0.124939
-0.564663	bottlenecks is to	-0.124939
-0.376320	set a to	-0.425969
-0.883974	add a to	-0.124939
-0.202170	copy a to	-0.425969
-0.592397	reduce a to	-0.124939
-0.376320	converting a to	-0.124939
-0.592397	prefer a to	-0.124939
-1.412175	function and to	-0.124939
-1.540140	Windows and to	-0.124939
-0.375058	zero and to	-0.425969
-0.589290	obstacles and to	-0.124939
-0.589290	operations, and to	-0.124939
-0.589290	module, and to	-0.124939
-1.707340	would be to	-0.124939
-0.591795	repeat or to	-0.124939
-0.591795	console or to	-0.124939
-0.865583	want it to	-0.124939
-0.582902	allows it to	-0.124939
-0.200220	convert it to	-0.124939
-0.582902	comparing it to	-0.124939
-0.582902	compare it to	-0.124939
-0.582902	redirects it to	-0.124939
-1.936811	the function to	-0.124939
-1.934495	a function to	-0.124939
-0.243196	// function to	-0.602060
-1.171069	this function to	-0.124939
-0.572770	which function to	-0.124939
-0.996645	one function to	-0.124939
-1.633466	member function to	-0.124939
-0.572770	Critical function to	-0.124939
-1.260233	the code to	-0.124939
-1.559340	of code to	-0.124939
-1.134035	critical code to	-0.124939
-1.134035	extra code to	-0.124939
-0.709257	assembly code to	-0.124939
-0.194055	AVX code to	-0.425969
-1.037125	machine code to	-0.124939
-0.592018	so as to	-0.124939
-0.592018	apply as to	-0.124939
-0.591034	compiler not to	-0.124939
-0.591034	sure not to	-0.124939
-0.598815	maintenance - to	-0.124939
-0.193394	program than to	-0.124939
-0.550898	b than to	-0.124939
-0.550898	optimization than to	-0.124939
-0.550898	operations than to	-0.124939
-0.806164	thread than to	-0.124939
-0.550898	container than to	-0.124939
-0.550898	block than to	-0.124939
-0.550898	recently than to	-0.124939
-0.550898	bitmap than to	-0.124939
-0.550898	2.0/3.0 than to	-0.124939
-0.550898	pooling) than to	-0.124939
-0.705174	the compiler to	-0.333215
-0.753899	a compiler to	-0.124939
-0.503004	Windows compiler to	-0.124939
-0.503004	particular compiler to	-0.124939
-0.586427	for x to	-0.124939
-0.586427	get x to	-0.124939
-0.586427	Calculate x to	-0.124939
-0.586427	initializes x to	-0.124939
-0.122423	allows you to	-0.124939
-0.199602	allow you to	-0.124939
-0.850714	that have to	-0.124939
-0.594166	not have to	-0.124939
-0.377388	may have to	-0.124939
-0.207305	you have to	-0.263241
-0.821228	will have to	-0.124939
-0.591566	data have to	-0.124939
-0.938595	functions have to	-0.124939
-0.591566	do have to	-0.124939
-0.742047	we have to	-0.124939
-0.047279	You have to	-0.124939
-0.418945	processors have to	-0.124939
-0.418945	calculations have to	-0.124939
-0.418945	versions have to	-0.124939
-0.401920	doesn't have to	-0.124939
-0.591566	would have to	-0.124939
-0.418945	values have to	-0.124939
-0.103771	don't have to	-0.124939
-0.418945	caches have to	-0.124939
-0.868763	want this to	-0.124939
-0.584557	expect this to	-0.124939
-0.584557	adds this to	-0.124939
-1.621610	the time to	-0.124939
-0.639674	more time to	-0.124939
-0.540946	same time to	-0.124939
-0.526922	takes time to	-0.124939
-0.499400	long time to	-0.124939
-1.220524	compile time to	-0.124939
-1.260745	longer time to	-0.124939
-0.555437	response time to	-0.124939
-0.978043	the memory to	-0.124939
-1.372052	of memory to	-0.124939
-0.374983	static memory to	-0.425969
-0.551407	swap memory to	-0.124939
-1.578785	look at to	-0.124939
-1.918972	the data to	-0.124939
-0.588402	organize data to	-0.124939
-1.427229	the program to	-0.124939
-1.103088	your program to	-0.124939
-0.975000	that has to	-0.124939
-1.157517	it has to	-0.124939
-1.095707	code has to	-0.124939
-0.583983	compiler has to	-0.124939
-1.016281	program has to	-0.124939
-0.951992	pointer has to	-0.124939
-0.485306	b has to	-0.124939
-0.329686	user has to	-0.124939
-0.485306	element has to	-0.124939
-0.485306	line has to	-0.124939
-0.485306	mode has to	-0.124939
-0.485306	addition has to	-0.124939
-0.485306	actually has to	-0.124939
-0.485306	offset has to	-0.124939
-0.694411	F1 has to	-0.124939
-0.485306	who has to	-0.124939
-0.587686	example only to	-0.124939
-0.874802	dispatching only to	-0.124939
-1.152742	the CPU to	-0.124939
-0.597776	forward) instruction to	-0.124939
-0.515021	to point to	-0.124939
-0.515021	= point to	-0.124939
-0.515021	it point to	-0.124939
-0.515021	will point to	-0.124939
-1.329953	floating point to	-0.301030
-0.185261	cannot point to	-0.425969
-0.185261	they point to	-0.124939
-0.515021	; point to	-0.124939
-1.168629	at all to	-0.124939
-1.406771	be used to	-0.124939
-0.571192	it used to	-0.124939
-0.571192	get used to	-0.124939
-1.818726	the cache to	-0.124939
-1.087688	an integer to	-0.124939
-0.918058	more integer to	-0.124939
-0.191306	from integer to	-0.425969
-1.000296	unsigned integer to	-0.124939
-1.000296	signed integer to	-0.124939
-0.584684	are set to	-0.124939
-0.584684	pointer set to	-0.124939
-0.592996	one class to	-0.124939
-1.168223	can do to	-0.124939
-1.487906	for example to	-0.124939
-1.080862	C++ compilers to	-0.124939
-0.534061	or double to	-0.301030
-1.056500	fixed size to	-0.124939
-0.351667	a pointer to	-0.455932
-0.417479	or pointer to	-0.124939
-0.703329	function pointer to	-0.425969
-0.807091	'this' pointer to	-0.124939
-0.160395	Set pointer to	-0.425969
-0.200471	in b to	-0.425969
-0.800507	of i to	-0.124939
-0.560726	add i to	-0.124939
-0.560726	type-casting i to	-0.124939
-0.592314	convert float to	-0.124939
-1.593505	an object to	-0.124939
-0.569381	one object to	-0.124939
-0.569381	first object to	-0.124939
-1.220634	a number to	-0.124939
-0.639185	point number to	-0.124939
-0.560004	model number to	-0.124939
-0.199865	keyword static to	-0.425969
-0.468783	more efficient to	-0.229674
-1.491128	the array to	-0.124939
-1.058816	an array to	-0.124939
-0.027736	is possible to	-0.286307
-0.024872	be possible to	-0.204120
-0.052929	it possible to	-0.191886
-0.125120	not possible to	-0.124939
-0.079345	also possible to	-0.124939
-0.079345	often possible to	-0.124939
-0.176500	always possible to	-0.124939
-0.176500	sometimes possible to	-0.124939
-1.285946	which version to	-0.124939
-0.594767	fourth value to	-0.124939
-0.886804	composite objects to	-0.124939
-0.096525	it takes to	-0.359022
-1.798882	the variable to	-0.124939
-0.548958	other variables to	-0.124939
-0.548958	these variables to	-0.124939
-1.275170	induction variables to	-0.124939
-0.548958	allow variables to	-0.124939
-0.589308	must return to	-0.124939
-0.559897	add 2 to	-0.124939
-0.195354	Add 2 to	-0.425969
-1.033728	the table to	-0.124939
-1.058767	virtual table to	-0.124939
-1.498052	the software to	-0.124939
-1.001171	for software to	-0.124939
-0.002609	in order to	-0.365673
-0.042784	In order to	-0.124939
-1.060540	code branch to	-0.124939
-0.545516	a way to	-0.124939
-0.125582	The way to	-0.124939
-0.125582	only way to	-0.425969
-0.235186	no way to	-0.425969
-0.302962	faster way to	-0.124939
-0.428546	useful way to	-0.124939
-0.045972	best way to	-0.124939
-0.428546	good way to	-0.124939
-0.428546	safe way to	-0.124939
-0.302962	simplest way to	-0.124939
-0.125582	easy way to	-0.124939
-0.302962	typical way to	-0.124939
-0.125582	fastest way to	-0.124939
-0.125582	easiest way to	-0.124939
-1.720461	of elements to	-0.124939
-1.005376	all elements to	-0.124939
-0.812136	is faster to	-0.124939
-0.677290	often faster to	-0.124939
-0.677290	even faster to	-0.124939
-0.677290	much faster to	-0.124939
-0.474648	usually faster to	-0.124939
-0.306956	the call to	-0.124939
-0.146504	a call to	-0.124939
-0.598692	function call to	-0.124939
-0.519511	A call to	-0.124939
-0.091991	one call to	-0.301030
-0.369231	each call to	-0.124939
-0.369231	any call to	-0.124939
-0.716997	first call to	-0.124939
-0.146504	single call to	-0.124939
-0.369231	Virtual call to	-0.124939
-0.653842	for example, to	-0.124939
-1.782146	For example, to	-0.124939
-0.836995	sign bit to	-0.124939
-1.333199	a register to	-0.124939
-0.812563	extra register to	-0.124939
-0.554442	physical register to	-0.124939
-0.129306	of how to	-0.204120
-0.025885	for how to	-0.124939
-0.389893	on how to	-0.124939
-0.368238	about how to	-0.124939
-0.104789	shows how to	-0.204120
-0.243185	know how to	-0.124939
-0.098275	discussed how to	-0.124939
-0.225497	specifies how to	-0.124939
-0.225497	programming, how to	-0.124939
-0.225497	illustrates how to	-0.124939
-0.098275	discusses how to	-0.124939
-0.590995	Use template to	-0.124939
-1.546883	XMM registers to	-0.124939
-0.141699	the need to	-0.124939
-0.193190	that need to	-0.301030
-0.433899	not need to	-0.124939
-0.260646	may need to	-0.301030
-0.109419	you need to	-0.124939
-0.065096	no need to	-0.191886
-0.258722	we need to	-0.124939
-0.245488	You need to	-0.124939
-0.245488	libraries need to	-0.124939
-0.245488	files need to	-0.124939
-0.276666	of pointers to	-0.301030
-0.668059	that pointers to	-0.124939
-0.418596	or pointers to	-0.124939
-0.668059	than pointers to	-0.124939
-0.418596	multiple pointers to	-0.124939
-0.418596	Any pointers to	-0.124939
-0.418596	keep pointers to	-0.124939
-0.418596	setting pointers to	-0.124939
-0.418596	initializing pointers to	-0.124939
-1.149894	the user to	-0.124939
-1.021377	is useful to	-0.124939
-0.510321	be useful to	-0.204120
-0.684495	also useful to	-0.124939
-0.827525	very useful to	-0.124939
-0.427773	often useful to	-0.124939
-0.175560	is sure to	-0.124939
-0.348012	are sure to	-0.124939
-0.677850	Make sure to	-0.124939
-1.057996	which method to	-0.124939
-0.972112	vector always to	-0.124939
-0.563362	therefore, always to	-0.124939
-0.575020	the access to	-0.124939
-0.648990	you access to	-0.124939
-0.407757	have access to	-0.124939
-0.407757	possible access to	-0.124939
-0.407757	get access to	-0.124939
-0.407757	fast access to	-0.124939
-0.407757	gives access to	-0.124939
-0.648990	network access to	-0.124939
-0.407757	giving access to	-0.124939
-0.407757	direct access to	-0.124939
-0.407757	forward access to	-0.124939
-1.200124	by 16 to	-0.124939
-0.566372	adds 16 to	-0.124939
-0.197446	turns out to	-0.425969
-0.958765	operating system to	-0.124939
-1.313600	the file to	-0.124939
-0.588958	other bits to	-0.124939
-0.589178	disk operations to	-0.124939
-0.196832	from 0 to	-0.124939
-0.589606	same type to	-0.124939
-1.693544	some cases to	-0.124939
-0.587036	and simple to	-0.124939
-0.875797	new instructions to	-0.124939
-1.510773	are available to	-0.124939
-0.566110	made available to	-0.124939
-1.053082	a constant to	-0.425969
-0.375271	are up to	-0.124939
-0.375271	code up to	-0.124939
-0.375271	not up to	-0.124939
-0.593857	set up to	-0.124939
-0.704164	take up to	-0.124939
-0.375271	count up to	-0.124939
-0.148303	allow up to	-0.425969
-0.375271	turned up to	-0.124939
-0.375271	(not up to	-0.124939
-0.375271	dispatchers up to	-0.124939
-0.375271	totaling up to	-0.124939
-0.997653	of times to	-0.124939
-0.576399	response times to	-0.124939
-0.113027	and want to	-0.124939
-0.052847	may want to	-0.124939
-0.091219	you want to	-0.212089
-0.066340	we want to	-0.124939
-0.113027	I want to	-0.124939
-0.189865	We want to	-0.124939
-0.113027	still want to	-0.124939
-0.052847	who want to	-0.124939
-0.091287	is important to	-0.182931
-0.519818	more important to	-0.124939
-0.200891	very important to	-0.124939
-0.244509	therefore important to	-0.124939
-0.244509	too important to	-0.124939
-0.877471	different CPUs to	-0.124939
-0.195420	sufficiently large to	-0.425969
-0.560556	heavy work to	-0.124939
-0.560556	reinstallation work to	-0.124939
-0.162953	the calls to	-0.124939
-0.426797	of calls to	-0.124939
-0.995371	function calls to	-0.124939
-0.426797	by calls to	-0.124939
-0.426797	no calls to	-0.124939
-0.426797	contains calls to	-0.124939
-0.426797	Multiple calls to	-0.124939
-0.586910	other calculations to	-0.124939
-1.432216	the execution to	-0.124939
-0.879454	final result to	-0.124939
-0.587919	hyperthreading processor to	-0.124939
-1.151517	is compiled to	-0.124939
-0.510819	first compiled to	-0.124939
-0.510819	8.26a compiled to	-0.124939
-0.510819	8.26b compiled to	-0.124939
-0.587919	of bytes to	-0.124939
-1.144257	very big to	-0.124939
-0.082735	is necessary to	-0.124939
-0.125298	be necessary to	-0.124939
-0.204982	it necessary to	-0.124939
-0.284164	not necessary to	-0.124939
-0.090508	often necessary to	-0.124939
-0.058197	therefore necessary to	-0.301030
-0.204982	rarely necessary to	-0.124939
-0.559494	nearest element to	-0.124939
-0.559494	dummy element to	-0.124939
-0.568313	execution speed to	-0.425969
-0.584845	is specific to	-0.124939
-0.459927	is common to	-0.425969
-0.529109	more common to	-0.124939
-1.113343	the thread to	-0.124939
-1.163068	a thread to	-0.124939
-0.588013	slices allocated to	-0.124939
-0.870855	too small to	-0.124939
-0.829802	of integers to	-0.124939
-0.829802	32-bit integers to	-0.124939
-0.752724	unsigned integers to	-0.124939
-1.082356	is good to	-0.124939
-0.557570	not good to	-0.124939
-0.588877	than done to	-0.124939
-1.370177	single precision to	-0.124939
-1.633541	cache line to	-0.124939
-0.756928	function parameters to	-0.124939
-0.466380	as parameters to	-0.124939
-0.664180	four parameters to	-0.124939
-0.173381	fourteen parameters to	-0.425969
-0.202051	is advantageous to	-0.367977
-0.150355	be advantageous to	-0.271067
-0.167711	not advantageous to	-0.301030
-0.198762	less advantageous to	-0.124939
-0.198762	always advantageous to	-0.124939
-0.564956	is known to	-0.726999
-0.463827	and known to	-0.124939
-0.880595	the solution to	-0.124939
-0.491514	simple solution to	-0.124939
-0.491514	standard solution to	-0.124939
-0.491514	better solution to	-0.124939
-0.592263	an advantage to	-0.124939
-0.493420	no advantage to	-0.124939
-0.493420	specific advantage to	-0.124939
-0.182460	// Function to	-0.726999
-0.080321	by eight to	-0.726999
-0.318979	deciding whether to	-0.124939
-0.488931	decides whether to	-0.124939
-0.041926	is likely to	-0.324511
-0.083468	are likely to	-0.124939
-0.083468	more likely to	-0.124939
-0.083468	also likely to	-0.124939
-0.154810	very likely to	-0.124939
-0.083468	less likely to	-0.124939
-0.083468	therefore likely to	-0.124939
-0.083468	equally likely to	-0.124939
-1.294371	the structure to	-0.124939
-0.583467	Adding 1 to	-0.124939
-0.582733	not add to	-0.124939
-0.549301	this information to	-0.124939
-0.549301	extra information to	-0.124939
-0.585158	used simply to	-0.124939
-0.006106	is able to	-0.124939
-0.011296	be able to	-0.204120
-0.002280	are able to	-0.182931
-0.009191	not able to	-0.124939
-0.018581	always able to	-0.124939
-0.018581	actually able to	-0.124939
-0.009191	were able to	-0.124939
-0.018581	sometimes able to	-0.124939
-0.297241	is certain to	-0.124939
-0.418423	not certain to	-0.124939
-0.418423	therefore certain to	-0.124939
-0.418423	was certain to	-0.124939
-0.590790	almost certain to	-0.124939
-0.936897	clock cycles to	-0.124939
-0.797756	return addresses to	-0.124939
-0.546208	these addresses to	-0.124939
-0.585393	seconds count to	-0.124939
-0.584606	many files to	-0.124939
-0.001924	is recommended to	-0.405765
-0.073496	not recommended to	-0.124939
-0.042301	also recommended to	-0.124939
-0.106687	therefore recommended to	-0.124939
-0.042301	strongly recommended to	-0.124939
-0.583206	threads write to	-0.124939
-0.581292	64-bit programs to	-0.124939
-0.513184	is optimal to	-0.124939
-0.305028	be optimal to	-0.124939
-0.820749	not optimal to	-0.124939
-1.239980	memory space to	-0.124939
-0.915039	heap space to	-0.124939
-1.028818	a lot to	-0.602060
-0.541336	overflow Integer to	-0.124939
-0.541336	discussion. Integer to	-0.124939
-0.787754	The dispatching to	-0.124939
-1.429290	CPU dispatching to	-0.124939
-0.581408	two branches to	-0.124939
-0.782639	an application to	-0.124939
-0.537672	typical application to	-0.124939
-0.583336	&& expression to	-0.124939
-1.462576	more complicated to	-0.124939
-0.581982	would like to	-0.124939
-1.537450	data members to	-0.124939
-1.143039	these methods to	-0.124939
-0.581198	data block to	-0.124939
-0.186870	a needs to	-0.124939
-0.117668	that needs to	-0.301030
-0.161028	it needs to	-0.124939
-0.083461	compiler needs to	-0.124939
-0.186870	b needs to	-0.124939
-0.186870	file needs to	-0.124939
-0.186870	constant needs to	-0.124939
-0.186870	list needs to	-0.124939
-0.186870	ReadB needs to	-0.124939
-1.027784	the conversion to	-0.124939
-0.535008	before conversion to	-0.124939
-0.898868	a parameter to	-0.124939
-0.533299	implicit parameter to	-0.124939
-1.208085	point division to	-0.124939
-0.495032	a reference to	-0.124939
-0.123782	or reference to	-0.329059
-0.218389	relative reference to	-0.124939
-0.218389	Return reference to	-0.124939
-0.218389	null reference to	-0.124939
-0.786073	no cost to	-0.124939
-0.414683	performance cost to	-0.124939
-0.661134	extra cost to	-0.124939
-0.414683	large cost to	-0.124939
-0.585240	overhead cost to	-0.124939
-0.056463	no reason to	-0.346788
-1.502174	CPU dispatcher to	-0.124939
-0.528286	add n to	-0.124939
-0.528286	adding n to	-0.124939
-0.579937	ASCII string to	-0.124939
-0.176070	the programmer to	-0.647817
-0.577410	executes three to	-0.124939
-0.459113	be better to	-0.124939
-1.140870	static keyword to	-0.124939
-0.577914	resource-hungry applications to	-0.124939
-0.577410	change && to	-0.124939
-0.627854	in addition to	-0.124939
-0.627854	an addition to	-0.124939
-0.442994	important addition to	-0.124939
-0.442994	another addition to	-0.124939
-0.579937	update mechanism to	-0.124939
-0.584011	Metaprogramming means to	-0.124939
-0.581078	these types to	-0.124939
-0.205623	is difficult to	-0.249877
-0.044567	and difficult to	-0.124939
-0.029198	be difficult to	-0.124939
-0.044567	are difficult to	-0.124939
-0.094236	code difficult to	-0.124939
-0.167529	more difficult to	-0.124939
-0.094236	very difficult to	-0.124939
-0.094236	therefore difficult to	-0.124939
-0.094236	quite difficult to	-0.124939
-0.094236	slow, difficult to	-0.124939
-1.135075	is transferred to	-0.124939
-1.126148	are aligned to	-0.124939
-0.579549	easy linking to	-0.124939
-1.379614	point numbers to	-0.124939
-0.577873	runtime dispatch to	-0.124939
-0.186480	well-defined interface to	-0.425969
-1.110295	installation process to	-0.124939
-0.374076	time goes to	-0.124939
-0.374076	output goes to	-0.124939
-0.374076	project goes to	-0.124939
-0.374076	p->f() goes to	-0.124939
-0.374076	1% goes to	-0.124939
-0.611426	may choose to	-0.124939
-0.465298	you choose to	-0.124939
-0.745825	compiler options to	-0.124939
-0.516347	various options to	-0.124939
-0.269183	several ways to	-0.124939
-0.170522	various ways to	-0.346788
-0.114062	three ways to	-0.124939
-0.749436	The link to	-0.124939
-0.518473	symbolic link to	-0.124939
-0.343873	is made to	-0.124939
-0.577837	wherever appropriate to	-0.124939
-0.170580	it points to	-0.124939
-0.290077	pointer points to	-0.124939
-0.195335	always points to	-0.124939
-0.195335	actually points to	-0.124939
-0.466168	r points to	-0.124939
-0.195335	p points to	-0.124939
-0.055884	initially points to	-0.301030
-0.577837	to switch to	-0.124939
-0.098843	you start to	-0.124939
-0.577837	will start to	-0.124939
-0.577837	necessary here to	-0.124939
-0.314683	more relevant to	-0.124939
-0.314683	also relevant to	-0.124939
-0.314683	options relevant to	-0.124939
-0.314683	hardly relevant to	-0.124939
-0.314683	keywords relevant to	-0.124939
-0.314683	respects relevant to	-0.124939
-0.510202	two things to	-0.124939
-0.510202	ingenious things to	-0.124939
-0.507872	// go to	-0.124939
-0.507872	function go to	-0.124939
-0.575095	simply predicted to	-0.124939
-0.508453	more references to	-0.124939
-0.508453	Internal references to	-0.124939
-0.920314	extra overhead to	-0.124939
-0.511958	little overhead to	-0.124939
-0.301356	are relative to	-0.124939
-0.301356	function relative to	-0.124939
-0.125047	member relative to	-0.425969
-0.301356	offset relative to	-0.124939
-0.301356	addressed relative to	-0.124939
-0.576345	unused columns to	-0.124939
-0.774053	is intended to	-0.124939
-0.632426	are intended to	-0.124939
-0.521272	a profiler to	-0.425969
-1.050785	// Loop to	-0.124939
-0.505672	8.23a. Loop to	-0.124939
-0.557999	is inefficient to	-0.425969
-0.575452	immediate response to	-0.124939
-1.381659	cache lines to	-0.124939
-0.716228	that comes to	-0.124939
-0.716228	it comes to	-0.124939
-1.186064	is limited to	-0.124939
-0.906433	the costs to	-0.124939
-0.427637	of costs to	-0.124939
-0.427637	performance costs to	-0.124939
-0.946265	the destructor to	-0.124939
-1.031973	a destructor to	-0.124939
-0.602806	is safe to	-0.124939
-0.602806	be safe to	-0.124939
-0.602806	more safe to	-0.124939
-0.567995	requires alignment to	-0.124939
-0.790572	a macro to	-0.124939
-0.691149	Define macro to	-0.124939
-0.568869	want them to	-0.124939
-0.354729	are writing to	-0.124939
-0.770667	or writing to	-0.124939
-0.142111	threads writing to	-0.425969
-0.329751	be reduced to	-0.124939
-1.184092	more clear to	-0.124939
-0.835881	higher priority to	-0.124939
-1.078722	one iteration to	-0.124939
-0.522802	processor models to	-0.124939
-0.477774	is changed to	-0.124939
-0.485421	be changed to	-0.124939
-0.339293	artificially changed to	-0.124939
-0.341532	may fail to	-0.124939
-0.236213	often fail to	-0.124939
-0.236213	they fail to	-0.124939
-0.102241	therefore fail to	-0.124939
-0.236213	products fail to	-0.124939
-0.683517	first thing to	-0.124939
-0.478542	obvious thing to	-0.124939
-1.295968	data structures to	-0.124939
-1.235671	the heap to	-0.124939
-0.282383	be initialized to	-0.124939
-0.389443	been initialized to	-0.124939
-0.985280	the executable to	-0.124939
-0.763431	main executable to	-0.124939
-0.566167	a subexpression to	-0.124939
-0.563097	automatic updates to	-0.124939
-0.389443	calls directly to	-0.124939
-0.389443	write directly to	-0.124939
-0.389443	fed directly to	-0.124939
-0.316896	is copied to	-0.124939
-0.322316	been copied to	-0.124939
-0.322316	contents copied to	-0.124939
-0.564118	register sizes to	-0.124939
-0.060394	is easier to	-0.124939
-0.214298	and easier to	-0.124939
-0.313791	often easier to	-0.124939
-0.214298	just easier to	-0.124939
-0.152259	is identical to	-0.124939
-0.547378	are identical to	-0.124939
-0.566167	from 20 to	-0.124939
-0.470522	not expect to	-0.124939
-0.470522	we expect to	-0.124939
-0.170662	is similar to	-0.124939
-0.304015	result back to	-0.124939
-0.304015	priority back to	-0.124939
-0.304015	jumps back to	-0.124939
-0.304015	dates back to	-0.124939
-0.455801	set seconds to	-0.124939
-0.647613	several seconds to	-0.124939
-1.101216	the sequence to	-0.124939
-0.563858	has something to	-0.124939
-1.098321	performance penalty to	-0.124939
-0.563858	special reasons to	-0.124939
-0.562736	software programmers to	-0.124939
-0.562736	little-known alternative to	-0.124939
-0.373673	program happen to	-0.124939
-0.373673	variables happen to	-0.124939
-0.373673	matrix happen to	-0.124939
-0.189381	be enough to	-0.124939
-0.084449	long enough to	-0.425969
-0.458775	big enough to	-0.124939
-0.189381	small enough to	-0.124939
-0.189381	rarely enough to	-0.124939
-0.084262	not apply to	-0.124939
-0.084262	may apply to	-0.425969
-0.188907	always apply to	-0.124939
-0.282096	rules apply to	-0.124939
-0.560500	a row to	-0.124939
-0.561616	variable declaration to	-0.124939
-0.561616	new features to	-0.124939
-0.031771	is added to	-0.346788
-0.282096	be added to	-0.124939
-0.089339	is easy to	-0.124939
-0.563539	test situations to	-0.124939
-0.497583	or writes to	-0.124939
-0.353603	function writes to	-0.124939
-0.353603	all writes to	-0.124939
-0.032514	This applies to	-0.124939
-0.032514	same applies to	-0.124939
-0.021403	also applies to	-0.124939
-0.067660	advice applies to	-0.124939
-0.006753	be applied to	-0.249877
-0.005394	when applied to	-0.522879
-0.764069	and destructors to	-0.124939
-0.444109	with destructors to	-0.124939
-0.209658	example 15.1b to	-0.124939
-0.217154	reduced 15.1b to	-0.124939
-0.557347	pointer eax to	-0.124939
-0.563539	don't care to	-0.124939
-0.558956	The procedure to	-0.124939
-0.554810	adding throw() to	-0.124939
-0.252510	may try to	-0.124939
-0.252510	will try to	-0.124939
-0.252510	then try to	-0.124939
-0.252510	we try to	-0.124939
-0.092337	is converted to	-0.425969
-0.017989	be converted to	-0.249877
-0.076910	when converted to	-0.124939
-0.092337	object pointed to	-0.124939
-0.036755	value pointed to	-0.425969
-0.036755	variable pointed to	-0.124939
-0.076910	target pointed to	-0.124939
-0.553437	advanced algorithms to	-0.124939
-0.553437	OK, however, to	-0.124939
-0.163802	are designed to	-0.124939
-0.810743	the inputs to	-0.124939
-0.251174	is preferred to	-0.124939
-0.331396	be preferred to	-0.124939
-0.558956	%. Conversion to	-0.124939
-0.428892	count down to	-0.124939
-0.428892	was down to	-0.124939
-0.427864	; jump to	-0.124939
-0.427864	microprocessor jump to	-0.124939
-0.578441	are allowed to	-0.124939
-0.578441	not allowed to	-0.124939
-1.117640	and delete to	-0.124939
-0.158652	be distributed to	-0.425969
-1.059836	compiler generates to	-0.124939
-0.546915	time T to	-0.124939
-0.442278	the linker to	-0.124939
-0.576784	time measurements to	-0.124939
-0.408956	some measurements to	-0.124939
-0.799019	the factor to	-0.124939
-0.546915	64-bit MMX to	-0.124939
-0.554702	makes sense to	-0.124939
-0.032514	is equal to	-0.602060
-0.105903	be equal to	-0.124939
-0.105903	therefore equal to	-0.124939
-0.543698	or reads to	-0.124939
-0.271824	program. Add to	-0.124939
-0.271824	library. Add to	-0.124939
-0.271824	dispatching. Add to	-0.124939
-0.270869	is expected to	-0.124939
-0.432631	be expected to	-0.124939
-0.270869	are expected to	-0.124939
-0.180600	is convenient to	-0.124939
-0.271824	be convenient to	-0.124939
-0.080979	more convenient to	-0.124939
-0.541914	leftmost column to	-0.124939
-0.541914	of portability to	-0.124939
-0.543698	thread. Pointers to	-0.124939
-0.387448	when converting to	-0.124939
-0.387448	before converting to	-0.124939
-0.541914	extremely costly to	-0.124939
-0.793287	powerful computers to	-0.124939
-0.914904	the debugger to	-0.124939
-0.966214	not permissible to	-0.124939
-0.042188	compilers due to	-0.124939
-0.042188	higher due to	-0.124939
-0.042188	delay due to	-0.124939
-0.042188	future due to	-0.124939
-0.042188	unstable due to	-0.124939
-0.042188	differences due to	-0.124939
-0.541914	in edx, to	-0.124939
-0.142144	the effort to	-0.425969
-0.530823	engineering principles to	-0.124939
-0.141719	be obvious to	-0.124939
-0.354836	be swapped to	-0.124939
-0.354836	even swapped to	-0.124939
-0.354836	be portable to	-0.124939
-0.354836	not portable to	-0.124939
-0.354836	certain limit to	-0.124939
-0.675392	upper limit to	-0.124939
-0.141719	is nothing to	-0.124939
-0.770671	be increased to	-0.124939
-0.353450	is equivalent to	-0.124939
-0.353450	are equivalent to	-0.124939
-0.770671	cleanup jobs to	-0.124939
-0.060400	is safer to	-0.124939
-0.130578	are safer to	-0.124939
-0.130578	therefore safer to	-0.124939
-0.535008	expression -(-a) to	-0.124939
-0.770671	be updated to	-0.124939
-0.535008	program appear to	-0.124939
-0.537116	resources Writes to	-0.124939
-0.051141	that belong to	-0.124939
-0.051141	all belong to	-0.124939
-0.051141	often belong to	-0.124939
-0.051141	always belong to	-0.124939
-0.051141	lines belong to	-0.124939
-0.099159	may prefer to	-0.124939
-0.227871	will prefer to	-0.124939
-0.933909	time slices to	-0.124939
-0.064926	to lead to	-0.124939
-0.020582	can lead to	-0.602060
-0.747300	one place to	-0.124939
-0.259317	is preferable to	-0.124939
-0.170435	be preferable to	-0.124939
-0.170435	often preferable to	-0.124939
-0.170435	the obstacles to	-0.124939
-0.170435	important obstacles to	-0.124939
-0.170435	common obstacles to	-0.124939
-0.031251	8.4 Obstacles to	-0.425969
-0.031251	8.3 Obstacles to	-0.425969
-0.862423	the loader to	-0.124939
-0.064926	number. Failure to	-0.124939
-0.031251	deallocated. Failure to	-0.425969
-0.064926	flow. Failure to	-0.124939
-0.517216	change pre-increment to	-0.124939
-0.064926	efficient thanks to	-0.124939
-0.064926	automatically thanks to	-0.124939
-0.064926	similar thanks to	-0.124939
-0.064926	fragmented thanks to	-0.124939
-0.031251	is guaranteed to	-0.425969
-0.064926	are guaranteed to	-0.124939
-0.064926	not guaranteed to	-0.124939
-0.747300	need modification to	-0.124939
-0.747300	Possible solutions to	-0.124939
-0.076910	an appendix to	-0.425969
-0.170435	An appendix to	-0.124939
-0.170435	as alternatives to	-0.124939
-0.170435	possible alternatives to	-0.124939
-0.170435	various alternatives to	-0.124939
-0.517216	of modifications to	-0.124939
-0.517216	metaprogramming tools to	-0.124939
-0.064926	-fpic according to	-0.124939
-0.064926	representation according to	-0.124939
-0.064926	behave according to	-0.124939
-0.064926	Now, according to	-0.124939
-0.517216	great lengths to	-0.124939
-0.170435	is extended to	-0.124939
-0.170435	be extended to	-0.124939
-0.170435	are extended to	-0.124939
-0.311664	efficient. Access to	-0.124939
-0.311664	locally. Access to	-0.124939
-0.517216	many years to	-0.124939
-0.519746	also inconvenient to	-0.124939
-0.517216	__assume_aligned directive to	-0.124939
-0.310095	is going to	-0.124939
-0.310095	not going to	-0.124939
-0.049732	good idea to	-0.124939
-0.519746	and interfaces to	-0.124939
-0.519746	Use mask to	-0.124939
-0.310095	names. Remember to	-0.124939
-0.310095	down. Remember to	-0.124939
-0.310095	it appears to	-0.124939
-0.310095	this appears to	-0.124939
-0.517216	add functionality to	-0.124939
-0.862423	exception handler to	-0.124939
-0.311664	is inferior to	-0.124939
-0.311664	are inferior to	-0.124939
-0.517216	one auto_ptr to	-0.124939
-0.031251	is unable to	-0.124939
-0.031251	be unable to	-0.124939
-0.031251	example 15.1a to	-0.124939
-0.031251	reduced 15.1a to	-0.124939
-0.517216	systematic manner to	-0.124939
-0.892320	The conclusion to	-0.124939
-0.495445	point multiplication, to	-0.124939
-0.495445	tested seem to	-0.124939
-0.710942	from -128 to	-0.124939
-0.892320	the dividend to	-0.124939
-0.246258	is annoying to	-0.124939
-0.246258	are annoying to	-0.124939
-0.495445	output listing to	-0.124939
-0.354373	is translated to	-0.124939
-0.246258	been translated to	-0.124939
-0.498656	option -fno-builtin to	-0.124939
-0.088920	statement leads to	-0.124939
-0.088920	vectorization leads to	-0.124939
-0.088920	binding leads to	-0.124939
-0.710942	programming questions to	-0.124939
-0.495445	portability issue to	-0.124939
-0.495445	function argument to	-0.124939
-0.495445	may decide to	-0.124939
-0.495445	completely unrolled to	-0.124939
-0.042188	from 0x2700 to	-0.425969
-0.088920	address 0x2700 to	-0.124939
-0.088920	is ported to	-0.124939
-0.088920	later ported to	-0.124939
-0.088920	easily ported to	-0.124939
-0.495445	some experience to	-0.124939
-0.815606	if necessary, to	-0.124939
-0.892320	// Call to	-0.124939
-0.495445	All accesses to	-0.124939
-0.246258	time compared to	-0.124939
-0.246258	disadvantages compared to	-0.124939
-0.105903	is sufficient to	-0.124939
-0.454925	is impossible to	-0.124939
-0.141294	is type-casted to	-0.124939
-0.141294	are type-casted to	-0.124939
-0.454925	was manipulated to	-0.124939
-0.454925	code carefully to	-0.124939
-0.454925	of dangers to	-0.124939
-0.646253	virus scanners to	-0.124939
-0.646253	different priorities to	-0.124939
-0.454925	get answers to	-0.124939
-0.454925	function prototype to	-0.124939
-0.454925	256 Kbytes to	-0.124939
-0.454925	parallelism refers to	-0.124939
-0.454925	take microseconds to	-0.124939
-0.454925	extra precautions to	-0.124939
-0.141294	is worthwhile to	-0.124939
-0.141294	be worthwhile to	-0.124939
-0.454925	is profitable to	-0.124939
-0.646253	// Initialize to	-0.124939
-0.141294	object belongs to	-0.124939
-0.141294	normally belongs to	-0.124939
-0.646253	the caller to	-0.124939
-0.454925	// Volatile to	-0.124939
-0.064926	allows us to	-0.124939
-0.141294	flawed approach to	-0.124939
-0.141294	thought-through approach to	-0.124939
-0.141294	C++ relates to	-0.124939
-0.141294	language relates to	-0.124939
-0.454925	example 11.1a to	-0.124939
-0.064926	you forget to	-0.124939
-0.454925	never respond to	-0.124939
-0.141294	significant contribution to	-0.124939
-0.141294	negligible contribution to	-0.124939
-0.064926	the ability to	-0.124939
-0.141294	and attempts to	-0.124939
-0.141294	it attempts to	-0.124939
-0.646253	time consumer to	-0.124939
-0.454925	Non-public distribution to	-0.124939
-0.454925	error messages to	-0.124939
-0.454925	as coprocessors to	-0.124939
-0.454925	sum, initialize to	-0.124939
-0.454925	an obstacle to	-0.124939
-0.454925	64-bit extension to	-0.124939
-0.064926	several minutes to	-0.124939
-0.454925	been incremented to	-0.124939
-0.352068	anyway. Updates to	-0.124939
-0.352068	serious limitations to	-0.124939
-0.352068	we forgot to	-0.124939
-0.352068	malloc. Handles to	-0.124939
-0.352068	adding bounds-checking to	-0.124939
-0.352068	always comparable to	-0.124939
-0.352068	is unacceptable to	-0.124939
-0.352068	time T+1 to	-0.124939
-0.352068	are prone to	-0.124939
-0.352068	a plug-in to	-0.124939
-0.352068	< 223 to	-0.124939
-0.352068	are advised to	-0.124939
-0.352068	it unwise to	-0.124939
-0.352068	"move constructor" to	-0.124939
-0.352068	of switching to	-0.124939
-0.352068	like throw(A,B,C) to	-0.124939
-0.352068	are cumbersome to	-0.124939
-0.352068	#pragma novector to	-0.124939
-0.352068	integer According to	-0.124939
-0.352068	the capability to	-0.124939
-0.352068	I tried to	-0.124939
-0.352068	// Round to	-0.124939
-0.352068	> -b to	-0.124939
-0.352068	// Entry to	-0.124939
-0.352068	be rounded to	-0.124939
-0.352068	is closest to	-0.124939
-0.352068	is supposed to	-0.124939
-0.352068	in relation to	-0.124939
-0.352068	immediate responses to	-0.124939
-0.352068	default, conform to	-0.124939
-0.352068	123 correspond to	-0.124939
-0.352068	following steps to	-0.124939
-0.352068	core. Try to	-0.124939
-0.352068	from attempting to	-0.124939
-0.352068	example 15.1d to	-0.124939
-0.352068	is advisable to	-0.124939
-0.352068	gives rise to	-0.124939
-0.352068	throw() specification to	-0.124939
-0.352068	testing. Trying to	-0.124939
-0.352068	// Convert to	-0.124939
-0.352068	41 Float to	-0.124939
-0.352068	respond quickly to	-0.124939
-0.352068	software teachers to	-0.124939
-0.352068	* 5; to	-0.124939
-0.352068	to port to	-0.124939
-0.352068	example 12.1b to	-0.124939
-0.352068	just happened to	-0.124939
-0.352068	and closer to	-0.124939
-0.352068	This corresponds to	-0.124939
-0.352068	example 12.8a to	-0.124939
-0.352068	Runtime, CLR, to	-0.124939
-0.352068	are risking to	-0.124939
-0.352068	are confined to	-0.124939
-0.352068	multiplication prior to	-0.124939
-0.352068	takes hours to	-0.124939
-0.352068	have gone to	-0.124939
-0.352068	less susceptible to	-0.124939
-0.352068	to adhere to	-0.124939
-0.352068	that relate to	-0.124939
-0.352068	and WritePrivateProfileString to	-0.124939
-0.352068	be responded to	-0.124939
-0.352068	adding -100 to	-0.124939
-0.352068	also tends to	-0.124939
-0.352068	reduce (a*b*c)+(c*b*a) to	-0.124939
-0.352068	implementation analogous to	-0.124939
-0.352068	always happy to	-0.124939
-0.352068	common practice to	-0.124939
-0.352068	in Windows) to	-0.124939
-0.352068	from a=a*2; to	-0.124939
-0.352068	is tempting to	-0.124939
-0.352068	to adapt to	-0.124939
-0.352068	are unrelated to	-0.124939
-0.352068	7.38b. Alternative to	-0.124939
-2.561128	it is and	-0.124939
-1.592415	of a and	-0.602060
-1.818370	for a and	-0.124939
-0.808174	if a and	-0.726999
-1.094337	If a and	-0.425969
-1.844851	into a and	-0.124939
-0.583801	array a and	-0.124939
-1.026558	between a and	-0.124939
-1.376910	making a and	-0.124939
-0.583801	parameters a and	-0.124939
-0.583801	used. a and	-0.124939
-0.583801	arrays, a and	-0.124939
-0.583801	joining a and	-0.124939
-0.583801	MultiplyBy<8>(10); a and	-0.124939
-1.567357	points to and	-0.124939
-0.597500	delete it and	-0.124939
-1.986491	the function and	-0.124939
-1.187129	this function and	-0.124939
-1.007327	one function and	-0.124939
-1.266310	each function and	-0.124939
-1.352915	critical function and	-0.124939
-1.187129	another function and	-0.124939
-1.110733	dispatcher function and	-0.124939
-0.853891	inlined function and	-0.124939
-0.894204	time if and	-0.124939
-0.591067	versions with and	-0.124939
-1.394782	compiled with and	-0.124939
-1.274449	turn on and	-0.124939
-1.409784	the code and	-0.124939
-0.801505	of code and	-0.726999
-0.349001	when code and	-0.425969
-0.962301	same code and	-0.124939
-0.962301	point code and	-0.124939
-1.059211	extra code and	-0.124939
-0.886385	compiled code and	-0.124939
-0.843842	intermediate code and	-0.425969
-0.527875	binary code and	-0.124939
-0.615025	position-independent code and	-0.301030
-0.962301	machine code and	-0.124939
-0.527875	well-structured code and	-0.124939
-0.527875	startup code and	-0.124939
-2.262884	the compiler and	-0.124939
-1.729656	Gnu compiler and	-0.124939
-0.586978	best compiler and	-0.124939
-0.598885	between x and	-0.124939
-1.424472	the time and	-0.124939
-0.518263	a time and	-0.124939
-1.149102	of time and	-0.124939
-1.344586	compile time and	-0.124939
-0.811706	development time and	-0.124939
-0.553969	installation time and	-0.124939
-2.168032	to use and	-0.124939
-0.877192	cache use and	-0.124939
-0.887054	becoming more and	-0.124939
-1.180530	of A and	-0.124939
-1.359968	of memory and	-0.124939
-0.763600	static memory and	-0.124939
-0.549457	less memory and	-0.124939
-0.549457	common memory and	-0.124939
-0.803572	main memory and	-0.124939
-1.023690	RAM memory and	-0.124939
-0.549457	uncached memory and	-0.124939
-1.757964	the data and	-0.124939
-0.997770	test data and	-0.124939
-0.573194	intermediate data and	-0.124939
-0.847157	thread-specific data and	-0.124939
-1.815743	a program and	-0.124939
-0.585590	final program and	-0.124939
-2.378948	to make and	-0.124939
-0.523592	to functions and	-0.124939
-0.758185	different functions and	-0.124939
-0.876667	all functions and	-0.124939
-0.876667	critical functions and	-0.124939
-0.523592	both functions and	-0.124939
-1.206978	intrinsic functions and	-0.124939
-0.876667	string functions and	-0.124939
-0.187251	public functions and	-0.124939
-0.876667	Virtual functions and	-0.124939
-0.523592	leaf functions and	-0.124939
-1.035405	the CPU and	-0.522879
-0.374958	scan instruction and	-0.124939
-2.313303	floating point and	-0.124939
-1.317386	the loop and	-0.124939
-1.560239	a loop and	-0.124939
-1.334383	innermost loop and	-0.124939
-1.365678	is used and	-0.124939
-0.577877	longer used and	-0.124939
-1.047820	have one and	-0.124939
-0.565908	code cache and	-0.124939
-0.548901	as cache and	-0.124939
-1.196431	data cache and	-0.124939
-1.319117	level-2 cache and	-0.124939
-0.830365	bit integer and	-0.124939
-1.069120	unsigned integer and	-0.124939
-0.564177	mix integer and	-0.124939
-0.564177	(6 integer and	-0.124939
-1.875956	See page and	-0.124939
-1.096475	instruction set and	-0.301030
-0.537746	list, set and	-0.124939
-0.843944	same class and	-0.124939
-0.486545	parent class and	-0.124939
-1.947773	to do and	-0.124939
-1.672477	can do and	-0.124939
-0.495092	with compilers and	-0.124939
-0.495092	only compilers and	-0.124939
-1.000645	Intel compilers and	-0.124939
-0.440050	C++ compilers and	-0.903090
-0.710362	how compilers and	-0.124939
-0.710362	good compilers and	-0.124939
-0.495092	Intel's compilers and	-0.124939
-1.578140	are using and	-0.124939
-0.199558	float, double and	-0.425969
-1.924513	the size and	-0.124939
-0.863663	for size and	-0.124939
-1.463775	the Intel and	-0.124939
-0.187372	from Intel and	-0.124939
-0.083659	Microsoft, Intel and	-0.249877
-0.524118	Gnu, Intel and	-0.124939
-1.632833	function pointer and	-0.124939
-1.107605	if b and	-0.124939
-0.551127	a, b and	-0.124939
-0.551127	add b and	-0.124939
-0.193444	Multiply b and	-0.425969
-1.019038	function library and	-0.124939
-0.567625	efficient library and	-0.124939
-0.582688	to i and	-0.124939
-0.582688	eliminate i and	-0.124939
-0.922771	bit float and	-0.124939
-0.191746	mix float and	-0.124939
-0.543439	Mixing float and	-0.124939
-0.543439	mixes float and	-0.124939
-1.039217	by two and	-0.124939
-1.486340	the number and	-0.124939
-1.078549	of static and	-0.124939
-0.196900	both static and	-0.425969
-0.517729	of C++ and	-0.301030
-1.183853	in C++ and	-0.124939
-2.013145	more efficient and	-0.124939
-1.598501	less efficient and	-0.124939
-1.722806	an array and	-0.124939
-0.591953	if possible and	-0.124939
-1.176819	debug version and	-0.124939
-2.137490	the value and	-0.124939
-1.030906	class objects and	-0.124939
-0.551865	large objects and	-0.124939
-1.068170	allocated objects and	-0.124939
-0.551865	declare objects and	-0.124939
-0.617636	point variables and	-0.425969
-0.459574	all variables and	-0.124939
-0.459574	many variables and	-0.124939
-0.653496	Such variables and	-0.124939
-0.653496	local variables and	-0.124939
-0.171638	non-static variables and	-0.425969
-0.459574	internal variables and	-0.124939
-0.171638	Integers variables and	-0.124939
-1.031798	and return and	-0.124939
-1.683069	of 2 and	-0.124939
-0.571246	between 2 and	-0.124939
-0.876173	import table and	-0.124939
-0.850727	program performance and	-0.124939
-0.850727	good performance and	-0.124939
-0.579617	sequential order and	-0.124939
-0.579617	natural order and	-0.124939
-0.479814	very long and	-0.124939
-0.560619	annoyingly long and	-0.124939
-0.688588	and 32-bit and	-0.124939
-0.677669	for 32-bit and	-0.425969
-0.608173	between 32-bit and	-0.124939
-0.430035	support 32-bit and	-0.124939
-0.430035	both 32-bit and	-0.124939
-0.430035	supports 32-bit and	-0.124939
-0.163832	Supports 32-bit and	-0.425969
-0.430035	including 32-bit and	-0.124939
-0.430035	Linux, 32-bit and	-0.124939
-0.430035	X, 32-bit and	-0.124939
-0.430035	16-bit, 32-bit and	-0.124939
-0.590470	that branch and	-0.124939
-0.884819	first way and	-0.124939
-0.881234	data elements and	-0.124939
-0.795240	calls faster and	-0.124939
-0.544796	becomes faster and	-0.124939
-0.544796	generally faster and	-0.124939
-0.544796	packages faster and	-0.124939
-0.789242	time before and	-0.124939
-0.789242	program before and	-0.124939
-0.541416	counter before and	-0.124939
-0.541416	count before and	-0.124939
-0.963207	is called and	-0.301030
-1.091113	are called and	-0.124939
-0.557228	code address and	-0.124939
-1.259336	memory address and	-0.124939
-0.557228	their address and	-0.124939
-1.355283	Pentium 4 and	-0.124939
-1.270494	the call and	-0.124939
-0.571925	of call and	-0.124939
-0.587722	be 8 and	-0.124939
-0.588736	8 bit and	-0.124939
-0.588296	reflected, first and	-0.124939
-1.525858	a register and	-0.124939
-0.589351	"Intel 64 and	-0.124939
-1.061851	function libraries and	-0.124939
-0.515091	vector libraries and	-0.124939
-0.515091	different libraries and	-0.124939
-0.515091	runtime libraries and	-0.124939
-1.150026	point registers and	-0.124939
-0.446371	the pointers and	-0.124939
-0.633035	as pointers and	-0.124939
-0.446371	all pointers and	-0.124939
-0.633035	using pointers and	-0.124939
-0.801919	member pointers and	-0.124939
-0.446371	while pointers and	-0.124939
-0.446371	All pointers and	-0.124939
-0.446371	link pointers and	-0.124939
-0.446371	includes pointers and	-0.124939
-1.159000	the test and	-0.124939
-1.039358	a new and	-0.124939
-0.237621	with new and	-0.249877
-0.442729	using new and	-0.124939
-0.442729	uses new and	-0.124939
-0.442729	operators new and	-0.124939
-0.442729	over new and	-0.124939
-0.816815	64-bit systems and	-0.124939
-0.361531	32-bit systems and	-0.124939
-0.724914	operating systems and	-0.124939
-0.442729	existing systems and	-0.124939
-1.143724	the user and	-0.124939
-0.868175	avoid these and	-0.124939
-1.084061	memory access and	-0.124939
-0.524499	CPU access and	-0.124939
-0.953183	file access and	-0.124939
-0.524499	non-sequential access and	-0.124939
-0.878682	for SSE2 and	-0.124939
-1.190824	operating system and	-0.124939
-0.566087	by 32 and	-0.124939
-0.566087	16, 32 and	-0.124939
-0.501880	library file and	-0.124939
-0.501880	C++ file and	-0.124939
-0.501880	source file and	-0.124939
-0.520152	executable file and	-0.124939
-1.290815	vector operations and	-0.124939
-1.139715	point operations and	-0.124939
-0.542134	bit operations and	-0.124939
-0.687943	is 0 and	-0.124939
-0.327814	to 0 and	-0.425969
-0.695791	than 0 and	-0.425969
-0.687943	< 0 and	-0.124939
-1.436401	the type and	-0.124939
-0.972434	function type and	-0.124939
-0.587364	possible case and	-0.124939
-1.674812	some cases and	-0.124939
-0.685453	different processors and	-0.124939
-0.479749	simple processors and	-0.124939
-0.685453	AMD processors and	-0.124939
-0.685453	physical processors and	-0.124939
-0.479749	older processors and	-0.124939
-0.479749	emulated processors and	-0.124939
-0.560749	is constant and	-0.124939
-0.560749	precision constant and	-0.124939
-0.917637	set up and	-0.124939
-0.541288	goes up and	-0.124939
-0.917637	cleaning up and	-0.124939
-1.030208	residual error and	-0.124939
-0.646345	two times and	-0.124939
-0.454985	256 times and	-0.124939
-0.454985	20 times and	-0.124939
-0.454985	random times and	-0.124939
-0.646345	1000 times and	-0.124939
-0.454985	unpredictable times and	-0.124939
-0.454985	ten times and	-0.124939
-1.043799	the stack and	-0.124939
-0.117414	Intel, Gnu and	-0.301030
-0.587909	so important and	-0.124939
-0.560585	most CPUs and	-0.124939
-0.560585	64-bit CPUs and	-0.124939
-0.809361	of arrays and	-0.124939
-0.662322	large arrays and	-0.124939
-0.173080	big arrays and	-0.124939
-0.465200	variables, arrays and	-0.124939
-0.465200	Align arrays and	-0.124939
-0.519915	The Windows and	-0.124939
-0.254744	for Windows and	-0.124939
-0.515645	64-bit Windows and	-0.124939
-0.584380	32-bit Windows and	-0.124939
-0.369517	bit Windows and	-0.124939
-0.146590	both Windows and	-0.124939
-0.369517	BSD, Windows and	-0.124939
-0.925458	function calls and	-0.425969
-0.509965	many calls and	-0.124939
-0.509965	jumps, calls and	-0.124939
-1.413307	point calculations and	-0.124939
-1.053151	two versions and	-0.124939
-1.048810	out-of-order execution and	-0.124939
-0.690939	the processor and	-0.425969
-0.531604	VIA processor and	-0.124939
-1.526571	is compiled and	-0.124939
-1.115345	is big and	-0.124939
-0.555408	code big and	-0.124939
-1.456672	multiple threads and	-0.124939
-1.677856	the best and	-0.124939
-0.591746	programming language and	-0.124939
-0.727799	assembly language and	-0.124939
-1.241131	execution speed and	-0.124939
-0.589124	b, c and	-0.124939
-0.527821	between single and	-0.124939
-0.527821	mix single and	-0.124939
-0.527821	mixing single and	-0.124939
-0.581076	units, etc. and	-0.124939
-0.239505	on AMD and	-0.602060
-0.258340	as AMD and	-0.124939
-0.258340	both AMD and	-0.124939
-0.040104	Intel, AMD and	-1.204120
-0.959848	is allocated and	-0.124939
-0.458558	are allocated and	-0.425969
-0.617760	is small and	-0.124939
-1.129084	a small and	-0.124939
-0.648746	the overflow and	-0.124939
-0.378419	of overflow and	-0.124939
-0.797725	for overflow and	-0.124939
-0.407617	on overflow and	-0.124939
-0.648746	an overflow and	-0.124939
-0.407617	about overflow and	-0.124939
-0.407617	give overflow and	-0.124939
-0.773995	of integers and	-0.124939
-0.108846	between integers and	-0.602060
-0.773995	32-bit integers and	-0.124939
-1.536804	a matrix and	-0.124939
-0.308727	of Linux and	-0.124939
-0.165510	in Linux and	-0.124939
-0.499751	64-bit Linux and	-0.124939
-0.308727	In Linux and	-0.124939
-0.306790	32-bit Linux and	-0.124939
-0.308727	supports Linux and	-0.124939
-0.080710	Windows, Linux and	-0.301030
-1.618838	the AVX and	-0.124939
-0.494940	of classes and	-0.124939
-1.056549	vector classes and	-0.124939
-0.494940	C++ classes and	-0.124939
-1.034129	container classes and	-0.124939
-0.867349	this works and	-0.124939
-0.582946	carefully optimized and	-0.124939
-1.144962	address calculation and	-0.124939
-1.046880	integer parameters and	-0.124939
-0.753983	the problem and	-0.124939
-0.937244	AVX support and	-0.124939
-0.549423	OS support and	-0.124939
-0.580752	These operators and	-0.124939
-0.583191	to list and	-0.124939
-0.918452	of structure and	-0.124939
-0.742361	data structure and	-0.124939
-0.514302	logical structure and	-0.124939
-1.235220	to inline and	-0.124939
-1.016020	or 1 and	-0.124939
-0.839808	bit mode and	-0.124939
-0.683803	16-bit mode and	-0.124939
-0.176491	protected mode and	-0.425969
-0.586242	for sign and	-0.124939
-1.315189	loop counter and	-0.124939
-0.540736	integer counter and	-0.124939
-0.675206	first count and	-0.124939
-0.627256	repeat count and	-0.124939
-0.581839	data files and	-0.124939
-0.657087	object files and	-0.124939
-0.294185	help files and	-0.124939
-0.412384	Open files and	-0.124939
-0.581839	configuration files and	-0.124939
-0.916581	is fast and	-0.124939
-1.056291	for fast and	-0.124939
-0.617768	The allocation and	-0.124939
-0.436378	register allocation and	-0.124939
-0.436378	dynamic allocation and	-0.124939
-0.436378	frequent allocation and	-0.124939
-0.436378	Register allocation and	-0.124939
-0.577490	C++ programs and	-0.124939
-0.650760	the problems and	-0.124939
-0.187595	compatibility problems and	-0.301030
-0.525084	resource problems and	-0.124939
-0.273827	usability problems and	-0.124939
-1.119298	cache space and	-0.124939
-0.807289	CPU dispatching and	-0.124939
-1.382036	the microprocessor and	-0.124939
-0.778880	dedicated microprocessor and	-0.124939
-0.413680	of branches and	-0.124939
-0.652312	many branches and	-0.124939
-0.458816	preceding branches and	-0.124939
-0.853591	a multiplication and	-0.124939
-0.577546	12.8b automatically and	-0.124939
-1.148247	code caching and	-0.124939
-0.922866	instruction sets and	-0.124939
-0.744774	more complicated and	-0.124939
-0.490826	extremely complicated and	-0.124939
-1.560425	exception handling and	-0.124939
-1.158802	look like and	-0.124939
-0.577249	optimization methods and	-0.124939
-0.417597	on signed and	-0.124939
-0.417597	using signed and	-0.124939
-0.160428	between signed and	-0.425969
-0.417597	mix signed and	-0.124939
-1.131805	CPU model and	-0.124939
-1.248992	software development and	-0.124939
-0.917475	memory block and	-0.124939
-0.769301	class name and	-0.124939
-0.530034	brand name and	-0.124939
-0.579222	data conversion and	-0.124939
-0.854201	is high and	-0.124939
-0.995875	to zero and	-0.124939
-0.177758	terminating zero and	-0.425969
-0.769821	The Microsoft and	-0.124939
-0.769821	Intel, Microsoft and	-0.124939
-0.859405	function parameter and	-0.124939
-0.578982	involving division and	-0.124939
-0.578982	that source and	-0.124939
-0.579668	starts running and	-0.124939
-1.015198	network resources and	-0.124939
-0.850854	by n and	-0.124939
-1.131766	a string and	-0.124939
-0.574448	becoming better and	-0.124939
-0.573740	Unix applications and	-0.124939
-0.573033	operators && and	-0.124939
-0.526833	than addition and	-0.124939
-1.138939	point addition and	-0.124939
-0.575455	<=, > and	-0.124939
-0.753876	of expressions and	-0.124939
-0.521076	simplest expressions and	-0.124939
-0.523901	to read and	-0.124939
-0.781226	be read and	-0.124939
-0.520304	#include directives and	-0.124939
-0.520304	Compiler directives and	-0.124939
-0.575005	all public and	-0.124939
-0.747904	the framework and	-0.124939
-0.934838	.NET framework and	-0.124939
-1.068841	static linking and	-0.124939
-0.423187	Dynamic linking and	-0.425969
-1.133825	modern microprocessors and	-0.124939
-0.707124	point numbers and	-0.425969
-1.168304	hardware platform and	-0.124939
-0.576142	cycles later and	-0.124939
-0.576142	project together and	-0.124939
-1.194677	128-bit XMM and	-0.124939
-0.923546	user interface and	-0.124939
-0.578604	become bigger and	-0.124939
-0.846572	on vectors and	-0.124939
-0.570453	for AVX2 and	-0.124939
-0.573963	of << and	-0.124939
-0.541164	all x86 and	-0.425969
-0.463344	Supports x86 and	-0.124939
-0.846994	development process and	-0.124939
-0.993621	the advantages and	-0.124939
-0.520032	has advantages and	-0.124939
-0.580872	and b, and	-0.124939
-0.845390	data storage and	-0.124939
-1.354811	optimization options and	-0.124939
-0.639401	copy constructor and	-0.124939
-0.579724	of a[i] and	-0.124939
-0.645948	the function, and	-0.124939
-0.454730	library function, and	-0.124939
-0.454730	select function, and	-0.124939
-1.403009	the operands and	-0.124939
-0.839055	an advanced and	-0.124939
-1.184320	of range and	-0.124939
-1.239295	to start and	-0.124939
-0.569758	library modules and	-0.124939
-0.734466	is smaller and	-0.124939
-0.454730	code smaller and	-0.124939
-0.454730	bytes smaller and	-0.124939
-0.506380	system core and	-0.124939
-0.506380	microprocessor core and	-0.124939
-0.574593	jumping around and	-0.124939
-0.504744	between 5 and	-0.124939
-0.504744	3, 5 and	-0.124939
-1.291573	code section and	-0.124939
-0.569824	further tested and	-0.124939
-0.573635	the contentions and	-0.124939
-0.726322	for positive and	-0.124939
-0.726322	both positive and	-0.124939
-0.572679	in C and	-0.124939
-0.508023	one global and	-0.124939
-0.508023	Avoid global and	-0.124939
-0.989064	the conversions and	-0.124939
-0.839106	if statement and	-0.124939
-0.573949	it off and	-0.124939
-0.504501	as p and	-0.124939
-0.504501	away p and	-0.124939
-1.219980	programming languages and	-0.124939
-0.571918	for installation and	-0.124939
-0.573949	vary dynamically and	-0.124939
-1.114002	function inlining and	-0.124939
-0.569981	databases, network and	-0.124939
-0.989283	a slow and	-0.124939
-0.496893	used functions, and	-0.124939
-0.496893	virtual functions, and	-0.124939
-0.568907	into lines and	-0.124939
-1.557396	to find and	-0.124939
-0.495079	syntax checking and	-0.124939
-0.814841	bounds checking and	-0.124939
-0.578614	other platforms and	-0.124939
-0.365992	which platforms and	-0.124939
-0.365992	all platforms and	-0.124939
-0.365992	ARM platforms and	-0.124939
-1.501693	the level-1 and	-0.124939
-1.173936	is limited and	-0.124939
-0.568926	fast math and	-0.124939
-0.567778	String constants and	-0.124939
-0.566633	= shift and	-0.124939
-0.841319	is safe and	-0.124939
-0.572388	cache efficiency and	-0.124939
-0.986574	Text strings and	-0.124939
-0.579603	by testing and	-0.124939
-0.410869	makes testing and	-0.124939
-0.410869	development, testing and	-0.124939
-0.480730	on alignment and	-0.124939
-0.480730	pointer alignment and	-0.124939
-0.825590	with 100 and	-0.124939
-0.416062	stack Variables and	-0.124939
-0.416062	memory. Variables and	-0.124939
-0.587283	storage Variables and	-0.124939
-0.562802	signal processing and	-0.124939
-0.391969	more clear and	-0.124939
-0.352205	less clear and	-0.124939
-0.352205	making clear and	-0.124939
-0.566299	in x, and	-0.124939
-1.478215	Mac OS and	-0.124939
-0.240050	function names and	-0.301030
-0.338228	brand names and	-0.124939
-0.567634	time, RAM and	-0.124939
-0.176160	of rows and	-0.425969
-0.196728	same thing and	-0.124939
-0.824519	of structures and	-0.124939
-0.563190	seldom occur and	-0.124939
-0.567542	very smart and	-0.124939
-0.445760	the SSE and	-0.124939
-0.466496	are reading and	-0.124939
-0.466496	on reading and	-0.124939
-1.056876	The simplest and	-0.124939
-0.646615	error message and	-0.124939
-1.092926	CPU cores and	-0.124939
-0.560312	array sizes and	-0.124939
-0.467653	the PathScale and	-0.124939
-0.467653	Intel, PathScale and	-0.124939
-0.129863	Linux, BSD and	-0.726999
-0.317391	the program, and	-0.124939
-0.455005	to SSE4.1 and	-0.124939
-0.455005	for SSE4.1 and	-0.124939
-0.556349	memory buffer and	-0.124939
-0.556349	of seconds and	-0.124939
-0.813192	the compiler, and	-0.124939
-0.958286	be invalid and	-0.124939
-0.556349	file input and	-0.124939
-0.521797	many programmers and	-0.124939
-0.370847	assembly programmers and	-0.124939
-0.370847	advanced programmers and	-0.124939
-1.333497	critical stride and	-0.124939
-1.333497	instruction set, and	-0.124939
-0.371868	4 processors, and	-0.124939
-0.371868	VIA processors, and	-0.124939
-0.371868	future processors, and	-0.124939
-0.833424	doesn't matter and	-0.124939
-0.453768	class declaration and	-0.124939
-0.453768	"C" declaration and	-0.124939
-0.818877	optimization features and	-0.124939
-0.562648	been added and	-0.124939
-0.555005	OS independent and	-0.124939
-0.555005	optimized away and	-0.124939
-1.087324	example 15.1b and	-0.124939
-0.446725	inlined 15.1b and	-0.124939
-0.989674	= multiply and	-0.124939
-1.300230	an explanation and	-0.124939
-0.826331	CPU brands and	-0.124939
-1.213330	the diagonal and	-0.124939
-0.560231	reload *p and	-0.124939
-0.556740	point addition, and	-0.124939
-0.279612	Supports OpenMP and	-0.124939
-0.117681	processing, OpenMP and	-0.425969
-0.279612	107), OpenMP and	-0.124939
-0.440030	be standardized and	-0.124939
-0.440030	fully standardized and	-0.124939
-0.263899	of parent and	-0.425969
-0.354633	both parent and	-0.124939
-0.803588	operating systems, and	-0.124939
-0.549466	for false and	-0.124939
-0.807055	a PC and	-0.124939
-0.807055	coarse-grained parallelism and	-0.124939
-0.553329	between c2 and	-0.124939
-0.547547	for prediction and	-0.124939
-0.547547	the iterations and	-0.124939
-0.754647	an integer, and	-0.124939
-0.424887	binary integer, and	-0.124939
-0.549466	on algorithms and	-0.124939
-0.810549	the PLT and	-0.124939
-0.134545	of additions and	-0.124939
-0.330510	n additions and	-0.124939
-0.426333	into ecx and	-0.124939
-0.426333	eax, ecx and	-0.124939
-0.547547	local variables, and	-0.124939
-0.549466	accurate, however, and	-0.124939
-0.547547	languages, profiling and	-0.124939
-0.331674	is fragmented and	-0.124939
-0.331674	be fragmented and	-0.124939
-0.467327	become fragmented and	-0.124939
-0.464155	CPU family and	-0.124939
-0.329350	its family and	-0.124939
-0.329350	brand, family and	-0.124939
-0.426333	of devices and	-0.124939
-0.426333	system devices and	-0.124939
-0.360577	the GOT and	-0.124939
-0.251087	use GOT and	-0.124939
-0.251087	effect. GOT and	-0.124939
-0.251087	suppress. GOT and	-0.124939
-0.549466	processing Memory and	-0.124939
-0.551393	shut down and	-0.124939
-0.456244	cache misses and	-0.425969
-0.544615	with -fpic and	-0.124939
-1.242590	the carry and	-0.124939
-0.428044	for debugging and	-0.124939
-0.302585	off debugging and	-0.124939
-0.302585	verifying, debugging and	-0.124939
-0.544615	next vector, and	-0.124939
-0.407746	the object, and	-0.124939
-0.407746	allocated object, and	-0.124939
-0.546794	be allowed and	-0.124939
-0.575005	program itself and	-0.124939
-0.407746	application itself and	-0.124939
-0.548984	linear algebra and	-0.124939
-0.542446	task switches and	-0.124939
-0.936172	is distributed and	-0.124939
-0.572680	32-bit mode, and	-0.124939
-0.406163	exclusive mode, and	-0.124939
-0.794918	of Java and	-0.124939
-0.544615	is free and	-0.124939
-0.791067	more expensive and	-0.124939
-0.157681	between rounding and	-0.425969
-0.438901	operating system, and	-0.124939
-1.003124	32-bit integers, and	-0.124939
-0.406163	interpreted again and	-0.124939
-0.406163	reused again and	-0.124939
-0.788197	the linker and	-0.124939
-0.407746	compiler, linker and	-0.124939
-0.548984	to debug and	-0.124939
-0.291823	Gnu, Clang and	-0.425969
-0.920397	more reliable and	-0.124939
-0.925592	the Borland and	-0.124939
-1.061564	execution units and	-0.124939
-0.544615	restoring registers, and	-0.124939
-0.519447	to transpose and	-0.425969
-0.156923	if possible, and	-0.124939
-0.179635	as possible, and	-0.124939
-0.382390	is compact and	-0.124939
-0.668408	more compact and	-0.124939
-0.540736	unaligned reads and	-0.124939
-1.039779	of course, and	-0.124939
-1.126964	to identify and	-0.124939
-0.984022	more complex and	-0.124939
-0.150402	4 Performance and	-0.425969
-0.150402	13.4 Test and	-0.425969
-0.382390	when portability and	-0.124939
-0.382390	efficiency, portability and	-0.124939
-1.039779	to evaluate and	-0.124939
-0.543256	Basic .NET and	-0.124939
-0.384139	references Pointers and	-0.124939
-0.114098	7.6 Pointers and	-0.425969
-0.538230	are costly and	-0.124939
-0.385895	more efficient, and	-0.124939
-0.385895	pointers efficient, and	-0.124939
-0.540736	of computers and	-0.124939
-0.538230	A, B and	-0.124939
-0.384139	between 9 and	-0.124939
-0.384139	6, 9 and	-0.124939
-1.089069	Intel Core and	-0.124939
-0.535738	a debugger and	-0.124939
-0.382390	of truncation and	-0.124939
-0.382390	to truncation and	-0.124939
-1.126964	hot spots and	-0.124939
-0.538230	Windows 7 and	-0.124939
-0.538230	quite powerful and	-0.124939
-0.115040	for 32- and	-0.425969
-0.271985	Supports 32- and	-0.124939
-0.538230	sound processing, and	-0.124939
-0.535738	computer users and	-0.124939
-0.382390	e.g. C++, and	-0.124939
-0.382390	managed C++, and	-0.124939
-0.543256	for communication and	-0.124939
-0.385895	because communication and	-0.124939
-0.526517	Pascal, Fortran and	-0.124939
-0.526517	the increment and	-0.124939
-0.529445	accessed backwards and	-0.124939
-0.526517	memory swapping and	-0.124939
-0.226580	of memset and	-0.124939
-0.226580	to memset and	-0.124939
-0.226580	functions memset and	-0.124939
-0.526517	more popular and	-0.124939
-0.532394	string searching and	-0.124939
-0.338070	constant propagation and	-0.124939
-0.351154	program development, and	-0.124939
-0.351154	software development, and	-0.124939
-0.529445	is obvious and	-0.124939
-0.526517	Linked lists and	-0.124939
-0.529445	sqrt, pow and	-0.124939
-0.526517	far pointers, and	-0.124939
-0.526517	allocation Objects and	-0.124939
-0.129889	have constructors and	-0.124939
-0.039133	copy constructors and	-0.301030
-0.768281	if nonzero and	-0.124939
-0.958617	software package and	-0.124939
-0.883290	to understand and	-0.124939
-0.526517	quite inefficient, and	-0.124939
-0.526517	foreground jobs and	-0.124939
-1.014917	the latency and	-0.124939
-0.529445	Linux platforms, and	-0.124939
-0.526517	branches separately and	-0.124939
-0.883290	subexpression elimination and	-0.124939
-0.529445	variables sum1 and	-0.124939
-0.526517	code. Compilers and	-0.124939
-0.529445	CodeGear, Codeplay and	-0.124939
-0.351154	optimizing features, and	-0.124939
-0.351154	backup features, and	-0.124939
-0.526517	zero flag and	-0.124939
-0.353108	in b[i] and	-0.124939
-0.353108	if b[i] and	-0.124939
-0.128233	or malloc and	-0.124939
-0.129889	functions malloc and	-0.124939
-0.129889	(or malloc and	-0.124939
-1.014917	is true, and	-0.124939
-0.513042	input/output Graphics and	-0.124939
-0.520175	these obstacles and	-0.124939
-0.520175	of m and	-0.124939
-0.513042	of resources, and	-0.124939
-0.513042	always one, and	-0.124939
-0.853234	are needed, and	-0.124939
-0.513042	the SVML and	-0.124939
-0.049493	addition, subtraction and	-0.602060
-0.740233	double precision, and	-0.124939
-0.513042	that u.f and	-0.124939
-0.740233	page 134 and	-0.124939
-0.746243	calculate *p+2 and	-0.124939
-0.308147	more cores, and	-0.124939
-0.308147	multiple cores, and	-0.124939
-0.853234	the pipeline and	-0.124939
-0.752337	Set flush-to-zero and	-0.124939
-0.516594	example 14.8 and	-0.124939
-0.308147	for overflow, and	-0.124939
-0.308147	integer overflow, and	-0.124939
-0.306357	in advance and	-0.124939
-0.740233	"worst case" and	-0.124939
-0.310359	= 0x2710 and	-0.124939
-0.614544	address 0x2710 and	-0.124939
-0.923096	hot spot and	-0.124939
-0.513042	frame, saving and	-0.124939
-0.746243	of structured and	-0.124939
-0.516594	poor documentation and	-0.124939
-0.516594	does not, and	-0.124939
-0.740233	example 12.4b and	-0.124939
-0.513042	an underflow and	-0.124939
-0.516594	than 2n and	-0.124939
-0.128026	7.12 Branches and	-0.425969
-0.435484	user interfaces and	-0.124939
-0.308147	hardware interfaces and	-0.124939
-0.513042	other's caches and	-0.124939
-0.740233	it is, and	-0.124939
-0.746243	3 breakpoint and	-0.124939
-0.513042	well-defined functionality and	-0.124939
-0.310359	multiple layers and	-0.124939
-0.310359	software layers and	-0.124939
-0.513042	variables Y and	-0.124939
-0.513042	are auto_ptr and	-0.124939
-0.746243	string constants, and	-0.124939
-0.516594	which opens and	-0.124939
-0.516594	right format and	-0.124939
-0.513042	millisecond resolution and	-0.124939
-0.513042	addition units, and	-0.124939
-0.704438	page 49 and	-0.124939
-0.491474	four bits, and	-0.124939
-0.495985	accessed consecutively and	-0.124939
-0.495985	(chapter 11) and	-0.124939
-0.495985	then B, and	-0.124939
-0.491474	Square blocking and	-0.124939
-0.491474	system API and	-0.124939
-0.491474	Loop r1 and	-0.124939
-0.495985	Loop r2 and	-0.124939
-0.491474	fast anyway and	-0.124939
-0.244762	system database, and	-0.124939
-0.244762	remote database, and	-0.124939
-0.491474	doing calculations, and	-0.124939
-0.244762	cache level, and	-0.124939
-0.244762	file level, and	-0.124939
-0.491474	relevant books and	-0.124939
-0.244762	for generality and	-0.124939
-0.244762	full generality and	-0.124939
-0.491474	the parameter, and	-0.124939
-0.491474	many keywords and	-0.124939
-0.807347	option -fno-pic and	-0.124939
-0.704438	platform _M_IX86 and	-0.124939
-0.491474	information elsewhere and	-0.124939
-0.491474	integer operations, and	-0.124939
-0.704438	software packages and	-0.124939
-0.491474	page 136 and	-0.124939
-0.491474	15.1c automatically, and	-0.124939
-0.491474	page 145 and	-0.124939
-0.491474	between RISC and	-0.124939
-0.491474	C++, Pascal and	-0.124939
-0.105361	7.23 Constructors and	-0.425969
-0.491474	between PC's and	-0.124939
-0.244762	as DOS and	-0.124939
-0.244762	systems DOS and	-0.124939
-0.491474	7.4. Signed and	-0.124939
-0.491474	as logarithms and	-0.124939
-0.495985	the easiest and	-0.124939
-0.244762	data flow and	-0.124939
-0.352456	program flow and	-0.124939
-0.491474	s1, s2 and	-0.124939
-0.491474	function, etc., and	-0.124939
-0.491474	by F2 and	-0.124939
-0.491474	human readable and	-0.124939
-0.491474	functional decomposition and	-0.124939
-0.491474	internet forums and	-0.124939
-0.491474	If Func1 and	-0.124939
-0.491474	is odd and	-0.124939
-0.491474	Boolean vectors, and	-0.124939
-0.495985	allocation, deallocation and	-0.124939
-0.807347	table (PLT) and	-0.124939
-0.491474	Pointers, references, and	-0.124939
-0.201043	Constant folding and	-0.425969
-0.088399	in Sum2 and	-0.124939
-0.088399	than Sum2 and	-0.124939
-0.088399	Sum1, Sum2 and	-0.124939
-0.451307	smaller. Structure and	-0.124939
-0.451307	n.a. _MSC_VER and	-0.124939
-0.451307	& operator; and	-0.124939
-0.451307	are CPU-specific and	-0.124939
-0.064556	to develop and	-0.124939
-0.451307	microprocessors. Multiplication and	-0.124939
-0.451307	example 14.12b and	-0.124939
-0.451307	self-styled hacks and	-0.124939
-0.451307	a hint and	-0.124939
-0.640644	preceding paragraph and	-0.124939
-0.640644	virus scanners and	-0.124939
-0.451307	page 73 and	-0.124939
-0.451307	color settings and	-0.124939
-0.451307	me corrections and	-0.124939
-0.451307	becomes inconsistent and	-0.124939
-0.451307	of starting and	-0.124939
-0.451307	in itself, and	-0.124939
-0.451307	64 Kbytes and	-0.124939
-0.140412	to creating and	-0.124939
-0.140412	for creating and	-0.124939
-0.451307	alternately FuncA and	-0.124939
-0.451307	be overwritten, and	-0.124939
-0.451307	advantageous if, and	-0.124939
-0.451307	in p1 and	-0.124939
-0.451307	functions lrintf and	-0.124939
-0.451307	for audio and	-0.124939
-0.451307	high price, and	-0.124939
-0.451307	register renaming and	-0.124939
-0.640644	data compression and	-0.124939
-0.064556	the exponent, and	-0.124939
-0.451307	testing, verifying and	-0.124939
-0.064556	7.30 Exceptions and	-0.425969
-0.451307	copy constructors, and	-0.124939
-0.451307	to push and	-0.124939
-0.451307	sorting, searching, and	-0.124939
-0.451307	deleted properly and	-0.124939
-0.451307	accessing list[i].a and	-0.124939
-0.451307	MOVNTPS, MOVNTPD and	-0.124939
-0.451307	respectively. Increment and	-0.124939
-0.640644	constant propagation, and	-0.124939
-0.640644	Kernel Library" and	-0.124939
-0.451307	up multiplications and	-0.124939
-0.451307	be optional and	-0.124939
-0.451307	platform __GNUC__ and	-0.124939
-0.640644	Example 14.23b and	-0.124939
-0.451307	many respects and	-0.124939
-0.451307	doesn't support, and	-0.124939
-0.640644	quite tedious and	-0.124939
-0.640644	a systematic and	-0.124939
-0.140412	memory management and	-0.124939
-0.140412	heap management and	-0.124939
-0.451307	look clumsy and	-0.124939
-0.451307	are fetched and	-0.124939
-0.451307	ABC 123 and	-0.124939
-0.451307	in reusable and	-0.124939
-0.451307	Taylor expansions and	-0.124939
-0.451307	to interrupts and	-0.124939
-0.451307	Public distribution and	-0.124939
-0.451307	to log, and	-0.124939
-0.140412	S. Goedecker and	-0.124939
-0.140412	Stefan Goedecker and	-0.124939
-0.451307	as accurate and	-0.124939
-0.451307	as sorting and	-0.124939
-0.640644	to keyboard and	-0.124939
-0.064556	square root and	-0.425969
-0.451307	and statistics, and	-0.124939
-0.640644	memory leaks and	-0.124939
-0.451307	page 95 and	-0.124939
-0.640644	and delete, and	-0.124939
-0.640644	is accessed, and	-0.124939
-0.064556	7.17 Structures and	-0.124939
-0.349209	are common, and	-0.124939
-0.349209	fast, compact, and	-0.124939
-0.349209	'?', '@' and	-0.124939
-0.349209	memory areas, and	-0.124939
-0.349209	of strange and	-0.124939
-0.349209	become invalid, and	-0.124939
-0.349209	as sqrt and	-0.124939
-0.349209	and esp+12 and	-0.124939
-0.349209	= x∙xn-1, and	-0.124939
-0.349209	cache evictions and	-0.124939
-0.349209	program compactness, and	-0.124939
-0.349209	code bloat and	-0.124939
-0.349209	and C# and	-0.124939
-0.349209	false (0); and	-0.124939
-0.349209	of frustration and	-0.124939
-0.349209	some situations, and	-0.124939
-0.349209	compiler technology, and	-0.124939
-0.349209	different alignments and	-0.124939
-0.349209	platform independence, and	-0.124939
-0.349209	are sufficient, and	-0.124939
-0.349209	processes running, and	-0.124939
-0.349209	2A, 2B, and	-0.124939
-0.349209	gives a+b=0, and	-0.124939
-0.349209	hard-to-find errors, and	-0.124939
-0.349209	initialization, condition, and	-0.124939
-0.349209	one local, and	-0.124939
-0.349209	between commas and	-0.124939
-0.349209	bits (YMM), and	-0.124939
-0.349209	are dominating and	-0.124939
-0.349209	as spell-checking and	-0.124939
-0.349209	well tested, and	-0.124939
-0.349209	priority thread, and	-0.124939
-0.349209	||, ! and	-0.124939
-0.349209	Volume 2A and	-0.124939
-0.349209	for transposing and	-0.124939
-0.349209	are aligned, and	-0.124939
-0.349209	both cheaper and	-0.124939
-0.349209	second source, and	-0.124939
-0.349209	defining _mm_malloc and	-0.124939
-0.349209	latencies, throughputs and	-0.124939
-0.349209	modularity, reusability and	-0.124939
-0.349209	memory economy and	-0.124939
-0.349209	Functions _intel_fast_memcpy and	-0.124939
-0.349209	as GetPrivateProfileString and	-0.124939
-0.349209	be non-zero, and	-0.124939
-0.349209	test, maintain and	-0.124939
-0.349209	separate module, and	-0.124939
-0.349209	is ambiguous and	-0.124939
-0.349209	always sequential, and	-0.124939
-0.349209	Intel VTune and	-0.124939
-0.349209	address esp+8 and	-0.124939
-0.349209	by hand and	-0.124939
-0.349209	own caller, and	-0.124939
-0.349209	0x3700, 0x3F00 and	-0.124939
-0.349209	to T+6, and	-0.124939
-0.349209	G values, and	-0.124939
-0.349209	the Professional and	-0.124939
-0.349209	conventions. FreeBSD and	-0.124939
-0.349209	to optimize, and	-0.124939
-0.349209	MKL, VML and	-0.124939
-0.349209	CPU brands, and	-0.124939
-0.349209	service routines and	-0.124939
-0.349209	PC's, workstations and	-0.124939
-0.349209	fetching, decoding and	-0.124939
-0.349209	3-dimensional geometry and	-0.124939
-0.349209	is terminated and	-0.124939
-0.349209	common subexpressions, and	-0.124939
-0.349209	as flush and	-0.124939
-0.349209	PHP, ASP and	-0.124939
-0.349209	usability issues, and	-0.124939
-0.349209	semaphores, mutexes and	-0.124939
-0.349209	often fluctuating and	-0.124939
-0.349209	dispatch mechanisms, and	-0.124939
-0.349209	containing (2,2,2,2), and	-0.124939
-0.349209	value infinity, and	-0.124939
-0.349209	computer games and	-0.124939
-0.349209	as email and	-0.124939
-0.349209	be platform-independent and	-0.124939
-0.349209	this limitation and	-0.124939
-0.349209	virus attacks and	-0.124939
-0.349209	a temp1 and	-0.124939
-0.349209	Template Library) and	-0.124939
-0.349209	512 520 and	-0.124939
-0.349209	Library (ATL) and	-0.124939
-0.349209	version 2.6.30 and	-0.124939
-0.349209	than 15.1b, and	-0.124939
-0.349209	between recoverable and	-0.124939
-0.349209	be reinstalled and	-0.124939
-0.349209	of synchronizing and	-0.124939
-0.349209	See www.agner.org/optimize and	-0.124939
-0.349209	system dependent and	-0.124939
-0.349209	clock period and	-0.124939
-0.349209	and 3A and	-0.124939
-0.349209	Available protocols and	-0.124939
-0.349209	vector intrinsics and	-0.124939
-0.349209	the GOT, and	-0.124939
-0.349209	heavy traffic and	-0.124939
-0.349209	more manageable and	-0.124939
-0.349209	each call, and	-0.124939
-0.349209	generate -128, and	-0.124939
-0.349209	library libmmt.lib and	-0.124939
-0.349209	with _finite()) and	-0.124939
-0.349209	operators (& and	-0.124939
-0.349209	copying process, and	-0.124939
-0.349209	no side-effects and	-0.124939
-0.349209	consistent modularity and	-0.124939
-0.349209	different sizes, and	-0.124939
-0.349209	their workplace and	-0.124939
-0.349209	See www.openmp.org and	-0.124939
-0.349209	are CPLDs and	-0.124939
-0.349209	allocation (new and	-0.124939
-0.349209	smaller squares and	-0.124939
-0.349209	+ 100*16, and	-0.124939
-0.349209	bounds violations and	-0.124939
-0.349209	the pros and	-0.124939
-0.349209	how tortuous and	-0.124939
-0.349209	try, catch, and	-0.124939
-0.349209	Edition, 2005; and	-0.124939
-0.349209	operator ++i and	-0.124939
-0.349209	times lower; and	-0.124939
-0.349209	and mainframes, and	-0.124939
-0.349209	access, sort and	-0.124939
-0.349209	this mask, and	-0.124939
-0.349209	floppy disks and	-0.124939
-0.349209	jump tables, and	-0.124939
-0.349209	becomes bulky and	-0.124939
-0.349209	vector (1,2,3,4), and	-0.124939
-0.349209	pointer arithmetics and	-0.124939
-0.349209	with truncation, and	-0.124939
-0.349209	operator (&) and	-0.124939
-0.349209	operators (&& and	-0.124939
-0.349209	an error; and	-0.124939
-1.412184	loop is in	-0.124939
-1.627477	pointer is in	-0.124939
-1.275764	address is in	-0.124939
-0.894667	matrix a in	-0.124939
-1.483271	systems and in	-0.124939
-0.590844	platforms, and in	-0.124939
-0.880941	level, and in	-0.124939
-0.590844	price, and in	-0.124939
-0.590844	15.1b, and in	-0.124939
-1.750395	to be in	-0.425969
-1.051457	still be in	-0.124939
-1.034969	and are in	-0.124939
-1.907079	you are in	-0.124939
-1.146116	program are in	-0.124939
-1.582533	we are in	-0.124939
-1.345001	they are in	-0.425969
-1.617922	compilers can in	-0.124939
-1.610386	function or in	-0.124939
-1.033067	code or in	-0.124939
-0.586138	manual or in	-0.124939
-0.586138	flag or in	-0.124939
-1.335526	make it in	-0.124939
-0.590450	store it in	-0.124939
-0.880173	write it in	-0.124939
-0.590450	disable it in	-0.124939
-1.220772	the function in	-0.522879
-0.920992	a function in	-0.550907
-0.554007	time-consuming function in	-0.124939
-1.101001	detection function in	-0.124939
-0.811775	strlen function in	-0.124939
-0.554007	std::unexpected() function in	-0.124939
-0.884452	dealing with in	-0.124939
-0.592641	dealt with in	-0.124939
-1.106408	the code in	-0.425969
-0.929984	of code in	-0.124939
-0.698981	The code in	-0.346788
-0.965277	same code in	-0.124939
-0.542635	critical code in	-0.425969
-1.062188	above code in	-0.124939
-0.528967	definition code in	-0.124939
-0.767452	compiler-generated code in	-0.124939
-1.000694	way as in	-0.124939
-1.320963	well as in	-0.124939
-0.574291	explicitly as in	-0.124939
-0.574291	memory, as in	-0.124939
-0.574291	principle as in	-0.124939
-0.574291	counters, as in	-0.124939
-0.574291	union, as in	-0.124939
-1.520664	is not in	-0.301030
-0.844993	or not in	-0.124939
-1.513754	but not in	-0.124939
-0.844993	registers, not in	-0.124939
-0.572041	(but not in	-0.124939
-0.598609	expensive - in	-0.124939
-0.850880	to int in	-0.124939
-1.697757	unsigned int in	-0.124939
-1.208391	short int in	-0.301030
-0.575171	int16_t int in	-0.124939
-0.542808	functions than in	-0.124939
-0.542808	library than in	-0.124939
-1.482320	faster than in	-0.124939
-1.331889	less than in	-0.124939
-1.240896	rather than in	-0.301030
-0.542808	file than in	-0.124939
-0.117942	Linux than in	-0.602060
-0.191605	mode than in	-0.124939
-0.542808	device than in	-0.124939
-2.284574	the compiler in	-0.124939
-1.849148	Intel compiler in	-0.124939
-1.742398	Gnu compiler in	-0.124939
-0.599356	store x in	-0.124939
-1.207419	compiler may in	-0.425969
-1.700553	It may in	-0.124939
-1.118694	program may in	-0.124939
-0.579077	declaration may in	-0.124939
-1.307391	avoid this in	-0.124939
-0.589043	like this in	-0.124939
-1.341475	a time in	-0.124939
-1.216536	of time in	-0.124939
-0.570132	one time in	-0.124939
-1.341475	long time in	-0.124939
-0.989678	its time in	-0.124939
-1.389490	longer time in	-0.124939
-2.106324	to use in	-0.124939
-0.584950	CPU use in	-0.124939
-0.869518	resource use in	-0.124939
-0.596069	exits, when in	-0.124939
-0.875724	main memory in	-0.124939
-0.588161	reserving memory in	-0.124939
-1.510192	the data in	-0.124939
-1.227366	of data in	-0.124939
-0.908878	to data in	-0.124939
-1.266189	and data in	-0.124939
-1.045700	The data in	-0.124939
-0.537584	same data in	-0.124939
-0.465328	all data in	-0.124939
-0.537584	received data in	-0.124939
-0.537584	arranging data in	-0.124939
-1.643934	the program in	-0.124939
-0.861259	entire program in	-0.124939
-0.087985	result vector in	-0.726999
-0.592624	look different in	-0.124939
-1.023792	pointers because in	-0.124939
-0.582801	*(++p) because in	-0.124939
-0.582801	array[++i] because in	-0.124939
-1.854100	the same in	-0.124939
-1.030146	of functions in	-0.124939
-0.807449	different functions in	-0.124939
-0.551612	about functions in	-0.124939
-0.942610	Virtual functions in	-0.124939
-0.551612	suitable functions in	-0.124939
-0.551612	internal functions in	-0.124939
-0.551612	non-polymorphic functions in	-0.124939
-0.579738	registers only in	-0.124939
-0.579738	option only in	-0.124939
-0.579738	comes only in	-0.124939
-0.380602	each other in	-0.249877
-0.887670	decimal point in	-0.124939
-1.250234	the loop in	-0.301030
-1.381052	a loop in	-0.124939
-0.881657	The loop in	-0.425969
-0.903446	while loop in	-0.124939
-0.535266	c loop in	-0.124939
-0.778418	message loop in	-0.124939
-1.334794	function which in	-0.124939
-1.063907	functions, but in	-0.124939
-0.562541	systems, but in	-0.124939
-0.562541	case, but in	-0.124939
-0.562541	programming, but in	-0.124939
-0.562541	required, but in	-0.124939
-0.633527	is used in	-0.367977
-1.000915	be used in	-0.124939
-0.482765	are used in	-0.329059
-0.433427	data used in	-0.124939
-0.433427	only used in	-0.124939
-0.694759	also used in	-0.124939
-0.613296	method used in	-0.124939
-0.433427	now used in	-0.124939
-0.433427	Microcontrollers used in	-0.124939
-0.593280	number one in	-0.124939
-1.163883	the cache in	-0.124939
-0.859600	the integer in	-0.124939
-0.938883	an integer in	-0.124939
-0.850248	is set in	-0.124939
-0.574836	same set in	-0.124939
-2.007358	instruction set in	-0.124939
-1.291655	derived class in	-0.124939
-1.148852	parent class in	-0.124939
-1.182048	an example in	-0.124939
-1.428334	integer size in	-0.124939
-1.098905	register size in	-0.124939
-1.297569	line size in	-0.124939
-1.371851	a pointer in	-0.124939
-0.827783	this pointer in	-0.124939
-1.161564	'this' pointer in	-0.124939
-1.067892	and b in	-0.124939
-1.513540	of i in	-0.124939
-1.048840	to float in	-0.124939
-1.634281	the object in	-0.124939
-0.839147	data object in	-0.124939
-1.084452	each object in	-0.124939
-1.350813	point number in	-0.124939
-1.064184	more efficient in	-0.425969
-1.419722	less efficient in	-0.124939
-1.385466	is possible in	-0.124939
-1.017175	the version in	-0.124939
-0.860785	desired version in	-0.124939
-1.355800	the value in	-0.124939
-0.529233	by value in	-0.124939
-0.529233	maximum value in	-0.124939
-0.529233	B value in	-0.124939
-0.529233	R value in	-0.124939
-0.912975	the objects in	-0.124939
-0.681490	of objects in	-0.124939
-0.451085	are objects in	-0.124939
-0.556551	shared objects in	-0.124939
-0.451085	graphics objects in	-0.124939
-0.108076	Shared objects in	-0.204120
-0.886724	the variable in	-0.124939
-0.726390	a variable in	-0.124939
-0.847659	A variable in	-0.124939
-0.455497	other variable in	-0.124939
-0.170583	register variable in	-0.124939
-0.647141	public variable in	-0.124939
-0.315543	global variable in	-0.124939
-0.593639	does so in	-0.124939
-0.460850	on variables in	-0.124939
-0.938166	point variables in	-0.124939
-0.655492	used variables in	-0.124939
-0.460850	most variables in	-0.124939
-0.564837	register variables in	-0.425969
-0.171966	precision variables in	-0.425969
-0.655492	public variables in	-0.124939
-0.655492	local variables in	-0.124939
-0.460850	Storing variables in	-0.124939
-1.212405	of 2 in	-0.124939
-0.961483	by 2 in	-0.124939
-1.032404	the table in	-0.124939
-1.484030	a table in	-0.124939
-0.592197	improve performance in	-0.124939
-0.794313	making software in	-0.124939
-0.191931	Optimizing software in	-0.124939
-0.544275	memory-hungry software in	-0.124939
-0.408959	the order in	-0.903090
-0.886273	The order in	-0.124939
-0.494982	non-sequential order in	-0.124939
-0.883506	if branch in	-0.124939
-1.208407	one way in	-0.124939
-0.854314	safe way in	-0.124939
-0.834617	the elements in	-0.124939
-0.335299	of elements in	-0.249877
-0.518767	for elements in	-0.124939
-0.368705	any elements in	-0.124939
-0.368705	language elements in	-0.124939
-0.707987	eight elements in	-0.124939
-0.481331	consecutive elements in	-0.726999
-0.368705	subsequent elements in	-0.124939
-0.825138	calls faster in	-0.124939
-0.561338	slightly faster in	-0.124939
-0.561338	executed faster in	-0.124939
-0.594288	declared const in	-0.124939
-0.266254	is stored in	-0.124939
-0.315713	and stored in	-0.124939
-0.198358	be stored in	-0.346788
-0.184915	are stored in	-0.221849
-0.215827	pointer stored in	-0.124939
-0.129778	objects stored in	-0.124939
-0.215827	been stored in	-0.124939
-0.215827	typically stored in	-0.124939
-0.215827	never stored in	-0.124939
-0.215827	usually stored in	-0.124939
-1.514174	is called in	-0.124939
-0.556389	actually called in	-0.124939
-0.816097	usually called in	-0.124939
-0.575260	function address in	-0.124939
-0.575260	other address in	-0.124939
-0.589907	factor 4 in	-0.124939
-0.879144	by 8 in	-0.124939
-0.963003	For example, in	-0.221849
-0.476544	each bit in	-0.124939
-1.551976	sign bit in	-0.124939
-1.168845	to unsigned in	-0.124939
-1.698306	the first in	-0.124939
-0.572076	or first in	-0.124939
-1.825090	function libraries in	-0.124939
-1.389416	in registers in	-0.124939
-1.198185	vector registers in	-0.124939
-0.937631	integer registers in	-0.124939
-1.170418	through pointers in	-0.124939
-1.640757	to test in	-0.124939
-1.059493	is useful in	-0.124939
-0.515158	be useful in	-0.124939
-0.712690	also useful in	-0.124939
-0.443159	less useful in	-0.124939
-0.590621	handling even in	-0.124939
-1.335531	The method in	-0.124939
-0.568783	calling method in	-0.124939
-1.078975	file access in	-0.124939
-0.982106	network access in	-0.124939
-0.588082	number 16 in	-0.124939
-1.109706	a file in	-0.124939
-0.543618	data file in	-0.124939
-0.543618	entire file in	-0.124939
-0.466767	of bits in	-0.124939
-0.664791	same bits in	-0.124939
-0.840882	multiple bits in	-0.124939
-0.895887	64 bits in	-0.124939
-0.745148	32 bits in	-0.124939
-0.466767	small bits in	-0.124939
-0.833307	of operations in	-0.124939
-0.565768	primitive operations in	-0.124939
-0.590038	element 0 in	-0.124939
-0.589036	to type in	-0.124939
-1.577904	the case in	-0.124939
-0.594067	is short in	-0.124939
-0.586244	quite simple in	-0.124939
-0.587513	pending instructions in	-0.124939
-0.541009	is available in	-0.124939
-0.390582	be available in	-0.124939
-0.589178	are available in	-0.124939
-0.550005	also available in	-0.124939
-0.550005	registers available in	-0.124939
-0.550005	processors available in	-0.124939
-0.390582	become available in	-0.124939
-1.092371	look up in	-0.124939
-0.920767	clean up in	-0.124939
-0.791341	cleaned up in	-0.124939
-0.587458	minor error in	-0.124939
-0.517584	multiple times in	-0.124939
-1.157835	many times in	-0.124939
-0.185859	several times in	-0.425969
-0.989884	the stack in	-0.425969
-0.538359	call stack in	-0.124939
-0.586350	read about in	-0.124939
-0.710438	is accessed in	-0.124939
-0.443692	are accessed in	-0.367977
-0.379854	or accessed in	-0.124939
-0.149657	objects accessed in	-0.425969
-0.577852	non-Intel CPUs in	-0.425969
-0.585961	incremented, while in	-0.124939
-1.059108	of arrays in	-0.124939
-0.824559	static arrays in	-0.124939
-1.240045	to work in	-0.124939
-0.822576	not work in	-0.124939
-1.034803	32-bit Windows in	-0.124939
-0.614471	function calls in	-0.726999
-0.995785	the calculations in	-0.124939
-0.509608	integer calculations in	-0.124939
-0.340839	multiple calculations in	-0.124939
-0.877744	the result in	-0.301030
-0.489564	this result in	-0.124939
-0.489564	store result in	-0.124939
-1.538519	is compiled in	-0.124939
-1.270289	4 bytes in	-0.124939
-1.169854	8 bytes in	-0.124939
-0.638628	unused bytes in	-0.124939
-1.141491	very big in	-0.124939
-0.840661	different threads in	-0.124939
-1.065548	multiple threads in	-0.124939
-0.586434	two threads in	-0.124939
-1.177935	be necessary in	-0.124939
-0.329784	an element in	-0.124939
-0.131951	each element in	-0.602060
-0.585638	array element in	-0.124939
-0.329784	new element in	-0.124939
-0.329784	every element in	-0.124939
-0.464747	Each element in	-0.124939
-0.250280	largest element in	-0.124939
-1.156944	definition language in	-0.124939
-0.822782	function. But in	-0.124939
-0.560052	b. But in	-0.124939
-1.247938	execution speed in	-0.124939
-1.110904	the thread in	-0.124939
-1.177322	separate thread in	-0.124939
-0.867763	functions, etc. in	-0.124939
-0.704397	an exception in	-0.425969
-1.161731	are allocated in	-0.124939
-0.584882	kept small in	-0.124939
-0.873749	cause overflow in	-0.124939
-0.828800	64-bit integers in	-0.124939
-0.828800	32-bit integers in	-0.124939
-1.113569	unsigned integers in	-0.124939
-0.721277	signed integers in	-0.124939
-0.556155	handling option in	-0.124939
-0.556155	unroll option in	-0.124939
-1.372410	a matrix in	-0.124939
-0.813087	512 matrix in	-0.124939
-0.586435	to Linux in	-0.124939
-0.700866	container classes in	-0.425969
-0.524322	parent classes in	-0.124939
-1.190349	is done in	-0.124939
-1.224466	be done in	-0.124939
-0.500775	all done in	-0.124939
-0.500775	usually done in	-0.124939
-0.760428	same precision in	-0.124939
-0.967214	double precision in	-0.124939
-1.627763	cache line in	-0.124939
-0.360423	This works in	-0.425969
-0.276441	are explained in	-0.124939
-0.601218	as explained in	-0.425969
-0.378109	further explained in	-0.124939
-1.234163	is calculated in	-0.124939
-1.363255	be calculated in	-0.124939
-0.522900	has calculated in	-0.124939
-1.110323	the calculation in	-0.425969
-0.953759	address calculation in	-0.124939
-0.969639	is advantageous in	-0.124939
-0.380852	is implemented in	-0.301030
-0.374796	be implemented in	-0.124939
-0.652578	are implemented in	-0.124939
-0.331909	programs implemented in	-0.124939
-0.586158	usability problem in	-0.124939
-0.588007	pointer known in	-0.124939
-0.600183	efficient solution in	-0.124939
-0.521002	viable solution in	-0.124939
-0.592116	an advantage in	-0.124939
-0.493244	such advantage in	-0.124939
-0.493244	speed advantage in	-0.124939
-0.585789	profiling support in	-0.124939
-0.660151	is supported in	-0.124939
-0.587219	is eight in	-0.124939
-0.585319	in list in	-0.124939
-1.909049	is likely in	-0.124939
-1.131854	the structure in	-0.124939
-0.549522	program structure in	-0.124939
-0.315276	that run in	-0.124939
-0.787019	can run in	-0.124939
-0.454943	should run in	-0.124939
-0.454943	each run in	-0.124939
-0.584696	in hardware in	-0.124939
-0.749199	the values in	-0.124939
-0.510863	G values in	-0.124939
-0.580842	program. All in	-0.124939
-0.584563	worked well in	-0.124939
-0.740706	the information in	-0.124939
-0.513322	debug information in	-0.124939
-0.513322	application-specific information in	-0.124939
-1.797842	clock cycles in	-0.124939
-0.444190	and addresses in	-0.124939
-0.444190	All addresses in	-0.124939
-0.310045	relative addresses in	-0.124939
-0.444190	round addresses in	-0.124939
-0.792118	monitor counter in	-0.124939
-1.063527	stamp counter in	-0.124939
-0.542750	be fast in	-0.124939
-0.542750	are fast in	-0.124939
-1.700222	memory allocation in	-0.124939
-0.584711	thread. However, in	-0.124939
-1.133886	is optimal in	-0.124939
-0.963465	be optimal in	-0.124939
-0.465609	a space in	-0.124939
-0.755425	more space in	-0.124939
-0.465609	much space in	-0.124939
-0.465609	little space in	-0.124939
-1.979498	a lot in	-0.124939
-0.447175	CPU dispatching in	-0.425969
-1.020362	a microprocessor in	-0.124939
-0.580610	and branches in	-0.124939
-0.580138	level, typically in	-0.124939
-0.585359	files, preferably in	-0.124939
-0.649004	code automatically in	-0.124939
-0.456694	optimization automatically in	-0.124939
-0.456694	operations automatically in	-0.124939
-0.456694	writes automatically in	-0.124939
-0.582753	you see in	-0.124939
-0.353132	hardware implementation in	-0.425969
-1.456597	more complicated in	-0.124939
-0.982158	error handling in	-0.124939
-1.288238	exception handling in	-0.124939
-0.581769	used members in	-0.124939
-0.859954	the methods in	-0.124939
-1.119977	the name in	-0.124939
-0.578530	remains zero in	-0.124939
-1.019338	integer division in	-0.124939
-0.435427	no cost in	-0.425969
-0.491387	any cost in	-0.124939
-0.602946	is running in	-0.124939
-0.259906	are running in	-0.124939
-0.789828	when running in	-0.124939
-0.347280	system running in	-0.124939
-0.347280	threads running in	-0.124939
-0.347280	thread running in	-0.124939
-0.529015	make dispatcher in	-0.124939
-1.229204	CPU dispatcher in	-0.124939
-1.695840	the programmer in	-0.124939
-1.313425	a lookup in	-0.124939
-1.718610	the end in	-0.124939
-0.793006	the examples in	-0.124939
-0.793006	The examples in	-0.124939
-0.693084	code examples in	-0.124939
-0.340891	a difference in	-0.124939
-0.197688	no difference in	-0.346788
-0.340891	big difference in	-0.124939
-0.562335	dispatch mechanism in	-0.425969
-0.483154	detection mechanism in	-0.124939
-0.784212	is needed in	-0.124939
-0.473961	not needed in	-0.425969
-0.405667	than needed in	-0.124939
-0.405667	memory needed in	-0.124939
-0.530001	true last in	-0.124939
-0.530001	come last in	-0.124939
-0.032829	be transferred in	-0.346788
-0.130303	are transferred in	-0.669007
-0.582233	byte longer in	-0.124939
-0.678268	of optimizations in	-0.124939
-0.475261	specific optimizations in	-0.124939
-0.475261	improve optimizations in	-0.124939
-0.851068	a framework in	-0.124939
-0.583140	A look in	-0.124939
-0.524256	newer microprocessors in	-0.124939
-1.003105	Modern microprocessors in	-0.124939
-0.322157	the numbers in	-0.124939
-0.469303	denormal numbers in	-0.124939
-0.578392	use later in	-0.124939
-0.539098	objects together in	-0.124939
-0.462178	stored together in	-0.425969
-0.539098	linked together in	-0.124939
-0.382997	joined together in	-0.124939
-0.821329	be declared in	-0.124939
-0.724419	are declared in	-0.124939
-0.425322	objects declared in	-0.124939
-0.583497	by piece in	-0.124939
-1.013877	doesn't know in	-0.124939
-0.857211	and r in	-0.124939
-0.374453	This results in	-0.124939
-0.374453	four results in	-0.124939
-0.629434	intermediate results in	-0.124939
-0.374453	thousand results in	-0.124939
-0.374453	experimental results in	-0.124939
-0.576516	function goes in	-0.124939
-0.575851	power-save options in	-0.124939
-0.576516	Func2 were in	-0.124939
-0.575140	all operands in	-0.124939
-0.580065	unused points in	-0.124939
-1.133558	a switch in	-0.124939
-1.002962	is smaller in	-0.124939
-0.577244	provoked here in	-0.124939
-0.262495	scattered around in	-0.249877
-0.354987	randomly around in	-0.124939
-0.340889	do things in	-0.124939
-1.253577	algebraic reductions in	-0.124939
-1.003493	be tested in	-0.124939
-0.729324	been tested in	-0.124939
-0.089369	cause contentions in	-0.602060
-0.263850	Cache contentions in	-0.425969
-0.675046	and references in	-0.124939
-0.562524	relative references in	-0.124939
-0.399216	absolute references in	-0.124939
-0.399216	self-relative references in	-0.124939
-1.117751	extra overhead in	-0.124939
-0.575827	a change in	-0.124939
-0.572671	point-to-integer conversions in	-0.124939
-0.721566	if statement in	-0.124939
-0.501882	one statement in	-0.124939
-0.138598	of errors in	-0.425969
-0.343368	program errors in	-0.124939
-0.257769	such errors in	-0.124939
-0.162743	of columns in	-0.602060
-0.124953	and columns in	-0.425969
-0.301073	by columns in	-0.124939
-0.847647	other languages in	-0.124939
-0.711964	The syntax in	-0.124939
-0.627509	C++ syntax in	-0.124939
-0.442768	assembly syntax in	-0.124939
-1.643900	be avoided in	-0.124939
-0.847321	quite inefficient in	-0.124939
-0.267136	is described in	-0.124939
-0.176796	are described in	-0.124939
-0.088185	as described in	-0.249877
-0.176796	cases described in	-0.124939
-0.176796	syntax described in	-0.124939
-0.176796	further described in	-0.124939
-0.377085	cache lines in	-0.124939
-0.329609	4 lines in	-0.124939
-0.329609	16 lines in	-0.124939
-0.569950	graphics operation in	-0.124939
-0.576638	the instance in	-0.124939
-0.446558	is given in	-0.124939
-0.446558	be given in	-0.124939
-0.312491	are given in	-0.124939
-0.316380	as given in	-0.124939
-0.568630	the task in	-0.124939
-0.569517	or limited in	-0.124939
-1.052992	the costs in	-0.124939
-0.494484	also costs in	-0.124939
-0.843599	of S1 in	-0.124939
-0.492249	register temp in	-0.124939
-0.492249	save temp in	-0.124939
-0.489286	system database in	-0.124939
-0.489286	registration database in	-0.124939
-0.570406	identical constants in	-0.124939
-0.573084	of bool in	-0.124939
-0.569517	this shift in	-0.124939
-0.852025	and d in	-0.124939
-0.179281	the algorithm in	-0.124939
-0.108731	all strings in	-0.124939
-0.254115	store strings in	-0.124939
-0.254115	storing strings in	-0.124939
-0.108731	text strings in	-0.124939
-0.662952	multiple conditions in	-0.124939
-0.415713	error conditions in	-0.124939
-0.415713	worst-case conditions in	-0.124939
-1.399380	the right in	-0.124939
-0.979324	a macro in	-0.124939
-0.832282	with 100 in	-0.124939
-0.839343	different tasks in	-0.124939
-0.566162	parallel processing in	-0.124939
-0.477208	The containers in	-0.124939
-0.477208	example containers in	-0.124939
-0.566375	different priority in	-0.124939
-1.115607	be obtained in	-0.124939
-0.475536	of counters in	-0.124939
-0.919855	monitor counters in	-0.124939
-0.566375	overwritten, possibly in	-0.124939
-1.207087	function names in	-0.124939
-0.567405	standardized details in	-0.124939
-0.985099	the rows in	-0.124939
-0.478886	between rows in	-0.124939
-0.844069	may fail in	-0.124939
-0.563300	applications (e.g. in	-0.124939
-0.840188	by compiling in	-0.124939
-0.711303	at least in	-0.124939
-1.291510	data structures in	-0.124939
-0.422492	can occur in	-0.124939
-0.377081	may occur in	-0.124939
-0.263858	will occur in	-0.124939
-0.112191	contentions occur in	-0.425969
-0.323218	inefficient, especially in	-0.124939
-0.323218	precision, especially in	-0.124939
-0.323218	resource, especially in	-0.124939
-0.323218	relocation, especially in	-0.124939
-1.509269	be improved in	-0.124939
-0.447494	are discussed in	-0.124939
-0.566626	listed below in	-0.124939
-0.649021	error message in	-0.124939
-0.173881	forwarding delay in	-0.425969
-0.564387	scarce resource in	-0.124939
-0.974743	of cores in	-0.124939
-0.467454	saved either in	-0.124939
-0.467454	blocks, either in	-0.124939
-0.212785	is defined in	-0.425969
-0.264417	been defined in	-0.124939
-0.112388	classes defined in	-0.124939
-0.561049	but rarely in	-0.124939
-0.390496	register except in	-0.124939
-0.390496	underflow except in	-0.124939
-0.390496	representation, except in	-0.124939
-0.559577	loops" chapter in	-0.124939
-0.565694	places back in	-0.124939
-0.821911	class templates in	-0.124939
-0.965558	loop unrolling in	-0.124939
-0.458912	of CriticalFunction in	-0.124939
-0.458912	to CriticalFunction in	-0.124939
-1.099880	the sequence in	-0.124939
-0.457950	put something in	-0.124939
-0.457950	Storing something in	-0.124939
-0.965558	be invalid in	-0.124939
-1.087413	user input in	-0.124939
-0.830895	is organized in	-0.124939
-0.564464	transferring 'this' in	-0.124939
-0.517303	to gain in	-0.124939
-0.349853	The gain in	-0.124939
-0.242729	This gain in	-0.124939
-0.349853	you gain in	-0.124939
-0.242729	small gain in	-0.124939
-1.066120	can happen in	-0.124939
-0.558363	considered metaprogramming in	-0.124939
-0.560793	can define in	-0.124939
-1.300444	to implement in	-0.124939
-0.564558	consecutive terms in	-0.124939
-0.557783	multiple blocks in	-0.124939
-0.443498	put away in	-0.124939
-0.443498	go away in	-0.124939
-0.557783	is low in	-0.124939
-0.675327	is provided in	-0.124939
-0.263984	are provided in	-0.124939
-0.957954	by default in	-0.124939
-1.225864	dependency chains in	-0.124939
-0.557783	general purposes in	-0.124939
-0.354789	operations mentioned in	-0.124939
-0.354789	time-consumers mentioned in	-0.124939
-0.354789	ones mentioned in	-0.124939
-0.167712	17 Optimization in	-0.425969
-0.560481	up everything in	-0.124939
-0.555103	of (or in	-0.124939
-0.164397	is included in	-0.124939
-0.185557	are included in	-0.124939
-0.185557	not included in	-0.124939
-0.185557	license included in	-0.124939
-0.551062	two iterations in	-0.124939
-1.324115	into account in	-0.124939
-1.174804	dependency chain in	-0.124939
-0.427203	and algorithms in	-0.124939
-0.603912	different algorithms in	-0.124939
-0.429448	float additions in	-0.124939
-0.429448	four additions in	-0.124939
-0.555552	unknown factors in	-0.124939
-0.071588	are listed in	-0.249877
-0.127797	as listed in	-0.124939
-0.127797	instructions listed in	-0.124939
-0.552553	reductions explicitly in	-0.124939
-0.944930	is interpreted in	-0.124939
-1.081077	be determined in	-0.124939
-0.163368	causes misses in	-0.425969
-0.555552	named YMM in	-0.124939
-0.550995	specific purpose in	-0.124939
-0.378923	without -fpic in	-0.124939
-0.545920	registers had in	-0.124939
-0.547605	table 19 in	-0.124939
-0.577677	are allowed in	-0.124939
-0.577677	not allowed in	-0.124939
-0.549297	calling itself in	-0.124939
-0.806338	of algebra in	-0.124939
-0.547605	for free in	-0.124939
-0.545920	be expensive in	-0.124939
-0.545920	Catch exceptions in	-0.124939
-0.481599	be saved in	-0.124939
-0.304010	are saved in	-0.124939
-0.304010	was saved in	-0.124939
-0.547605	of changes in	-0.124939
-0.928737	is measured in	-0.124939
-0.545920	risk factor in	-0.124939
-1.073151	execution units in	-0.124939
-1.057756	the reciprocal in	-0.124939
-0.546956	minor increase in	-0.124939
-0.544993	is spent in	-0.124939
-0.544993	time spent in	-0.124939
-0.541094	exception occurs in	-0.124939
-0.384389	interrupt occurs in	-0.124939
-0.792118	as follows in	-0.124939
-0.541094	next step in	-0.124939
-0.463571	hot spots in	-0.425969
-0.539158	specific places in	-0.124939
-0.788673	are evaluated in	-0.124939
-0.115255	and deallocated in	-0.425969
-0.272604	also deallocated in	-0.124939
-0.541094	are permissible in	-0.124939
-0.785255	many users in	-0.124939
-0.788673	approximately six in	-0.124939
-0.772962	and fourteen in	-0.124939
-0.529865	programming principles in	-0.124939
-0.772962	the reduction in	-0.124939
-0.354452	is portable in	-0.124939
-0.354452	fully portable in	-0.124939
-0.529865	linked lists in	-0.124939
-1.178919	to recover in	-0.124939
-0.949053	the advice in	-0.124939
-0.949053	is already in	-0.124939
-0.532140	available, i.e. in	-0.124939
-0.967733	software package in	-0.124939
-0.227585	is seen in	-0.124939
-0.227585	be seen in	-0.124939
-0.227585	not seen in	-0.124939
-0.532140	modules contiguous in	-0.124939
-0.529865	class separately in	-0.124939
-0.529865	old-fashioned. Development in	-0.124939
-0.769008	or key in	-0.124939
-0.099052	they appear in	-0.425969
-0.227585	modules appear in	-0.124939
-0.355970	loops (except in	-0.124939
-0.355970	capabilities (except in	-0.124939
-0.529865	code flag in	-0.124939
-0.352940	be written in	-0.124939
-0.352940	programs written in	-0.124939
-0.532140	not present in	-0.124939
-0.745723	one place in	-0.124939
-0.127797	and sixteen in	-0.425969
-0.516288	always enabled in	-0.124939
-0.750411	is serial in	-0.124939
-0.516288	require modifications in	-0.124939
-0.127797	are missing in	-0.124939
-0.064844	Optimizing subroutines in	-0.124939
-0.020557	"Optimizing subroutines in	-0.602060
-0.516288	different lengths in	-0.124939
-0.309663	is contained in	-0.124939
-0.309663	completely contained in	-0.124939
-0.516288	point underflow in	-0.124939
-0.750411	be prevented in	-0.124939
-0.064844	2. Contentions in	-0.124939
-0.064844	buffer. Contentions in	-0.124939
-0.064844	(BTB). Contentions in	-0.124939
-0.064844	experiments. Contentions in	-0.124939
-0.076826	is illustrated in	-0.425969
-0.170226	as illustrated in	-0.124939
-0.127797	be returned in	-0.124939
-0.516288	there is, in	-0.124939
-0.309663	a breakpoint in	-0.124939
-0.309663	fixed breakpoint in	-0.124939
-0.519046	that appears in	-0.124939
-0.154600	be found in	-0.124939
-0.170226	rarely found in	-0.124939
-0.860369	exception handler in	-0.124939
-0.311374	be coded in	-0.124939
-0.311374	are coded in	-0.124939
-0.516288	index changing in	-0.124939
-0.309663	is kept in	-0.124939
-0.309663	are kept in	-0.124939
-0.745723	the techniques in	-0.124939
-0.309663	stored sequentially in	-0.124939
-0.309663	accessed sequentially in	-0.124939
-0.516288	much simpler in	-0.124939
-0.353948	in detail in	-0.124939
-0.245926	more detail in	-0.124939
-0.494561	security advices in	-0.124939
-0.709492	an FPGA in	-0.124939
-0.245926	elements consecutively in	-0.124939
-0.245926	stored consecutively in	-0.124939
-0.813761	of abstraction in	-0.124939
-0.494561	whose distance in	-0.124939
-0.494561	default anyway in	-0.124939
-0.494561	need updating in	-0.124939
-0.494561	profiling instruments in	-0.124939
-0.494561	unnecessarily wasteful in	-0.124939
-0.245926	difference lies in	-0.124939
-0.245926	efficiency lies in	-0.124939
-0.494561	errors elsewhere in	-0.124939
-0.494561	than RISC in	-0.124939
-0.245926	are supplied in	-0.124939
-0.245926	have supplied in	-0.124939
-0.494561	standard PC's in	-0.124939
-0.494561	cause delays in	-0.124939
-0.494561	uses logarithms in	-0.124939
-0.494561	on longjmp in	-0.124939
-0.494561	system kernel in	-0.124939
-0.494561	of bookkeeping in	-0.124939
-0.494561	right formula in	-0.124939
-0.494561	{} brackets in	-0.124939
-0.494561	be handled in	-0.124939
-0.813761	table (PLT) in	-0.124939
-0.494561	as pivot in	-0.124939
-0.201745	be placed in	-0.124939
-0.645003	to invest in	-0.124939
-0.645003	and Sum3 in	-0.124939
-0.064844	as shown in	-0.124939
-0.454121	also proceed in	-0.124939
-0.645003	be justified in	-0.124939
-0.645003	disturbing influences in	-0.124939
-0.454121	be disabled in	-0.124939
-0.141098	the relocations in	-0.124939
-0.141098	generate relocations in	-0.124939
-0.454121	kernel code" in	-0.124939
-0.454121	it locally in	-0.124939
-0.454121	If MultiplyBy in	-0.124939
-0.454121	the answers in	-0.124939
-0.454121	is inserted in	-0.124939
-0.454121	reproducibility. Delays in	-0.124939
-0.454121	not visible in	-0.124939
-0.645003	are summarized in	-0.124939
-0.454121	do experiments in	-0.124939
-0.454121	scattered everywhere in	-0.124939
-0.454121	as pragmas in	-0.124939
-0.454121	runs alone in	-0.124939
-0.454121	biggest time-consumer in	-0.124939
-0.064844	8 Optimizations in	-0.425969
-0.454121	for parallelization in	-0.124939
-0.454121	are covered in	-0.124939
-0.454121	slight degradation in	-0.124939
-0.645003	are overdetermined in	-0.124939
-0.454121	or integrated in	-0.124939
-0.645003	source annotation in	-0.124939
-0.645003	develop- ment in	-0.124939
-0.454121	more efforts in	-0.124939
-0.454121	right-most 1-bit in	-0.124939
-0.454121	"Register usage in	-0.124939
-0.454121	assigned previously in	-0.124939
-0.454121	necessarily stay in	-0.124939
-0.454121	| Friday) in	-0.124939
-0.454121	question: Put in	-0.124939
-0.351432	will dominate in	-0.124939
-0.351432	a niche in	-0.124939
-0.351432	Windows, SetThreadAffinityMask, in	-0.124939
-0.351432	32 AND-operations in	-0.124939
-0.351432	a typo in	-0.124939
-0.351432	not recognized in	-0.124939
-0.351432	The dot in	-0.124939
-0.351432	: 0] in	-0.124939
-0.351432	Gnu utilities in	-0.124939
-0.351432	been alleviated in	-0.124939
-0.351432	right positions in	-0.124939
-0.351432	libraries. Numbers in	-0.124939
-0.351432	will grow in	-0.124939
-0.351432	is system-independent, in	-0.124939
-0.351432	two formulas in	-0.124939
-0.351432	be arranged in	-0.124939
-0.351432	other flaws in	-0.124939
-0.351432	(e.g. GetLogicalProcessorInformation in	-0.124939
-0.351432	than investing in	-0.124939
-0.351432	of randomness in	-0.124939
-0.351432	is mirrored in	-0.124939
-0.351432	often reorganized in	-0.124939
-0.351432	then de-referenced in	-0.124939
-0.351432	classes Programming in	-0.124939
-0.351432	a server in	-0.124939
-0.351432	chain. Nothing in	-0.124939
-0.351432	completely absent in	-0.124939
-0.351432	and decoded in	-0.124939
-0.351432	be programmed in	-0.124939
-0.351432	stored contiguously in	-0.124939
-0.351432	You may, in	-0.124939
-0.351432	must bear in	-0.124939
-0.351432	efficient because, in	-0.124939
-0.351432	test finishes in	-0.124939
-0.351432	inserted UnusedFiller in	-0.124939
-0.351432	planning phase in	-0.124939
-0.351432	are indexed in	-0.124939
-0.351432	The FactorialTable in	-0.124939
-0.351432	general improvements in	-0.124939
-0.351432	been introduced in	-0.124939
-0.351432	occurs somewhere in	-0.124939
-0.351432	slight imprecision in	-0.124939
-0.351432	interface (OnIdle in	-0.124939
-0.351432	table (GOT) in	-0.124939
-0.351432	thread-like scheduling in	-0.124939
-0.351432	AND-OR construction in	-0.124939
-0.351432	so 1.2 in	-0.124939
-0.351432	(release version) in	-0.124939
-0.351432	the iterator in	-0.124939
-0.351432	gained remarkably in	-0.124939
-0.351432	are cheap, in	-0.124939
-0.351432	are costless in	-0.124939
-0.351432	and semicolons in	-0.124939
-0.351432	are obscured in	-0.124939
-0.351432	integer representations in	-0.124939
-0.351432	compilers succeeded in	-0.124939
-0.351432	the if-branch in	-0.124939
-0.351432	// continue in	-0.124939
-0.351432	(e.g. GetProcessAffinityMask in	-0.124939
-0.351432	considerable improvement in	-0.124939
-0.351432	and foremost, in	-0.124939
-0.351432	of CPU-time in	-0.124939
-0.351432	University courses in	-0.124939
-0.351432	be scheduled in	-0.124939
-0.351432	compute i/2 in	-0.124939
-0.351432	or glitches in	-0.124939
-0.351432	(e.g. IsProcessorFeaturePresent in	-0.124939
-0.351432	* sizeof(float) in	-0.124939
-0.351432	of rows/columns in	-0.124939
-0.351432	cycles. Calculations in	-0.124939
-0.351432	or iterative in	-0.124939
-0.351432	branch predictions in	-0.124939
-0.351432	my comments, in	-0.124939
-0.351432	occurred anywhere in	-0.124939
-0.351432	be resized in	-0.124939
-0.351432	be overridden in	-0.124939
-0.351432	lag. Thinking in	-0.124939
-0.351432	and __intel_new_strlen in	-0.124939
-0.351432	other volumes in	-0.124939
-1.535360	}; // The	-0.124939
-0.594122	loops // The	-0.124939
-1.287313	of x The	-0.124939
-0.596880	xplus2() { The	-0.124939
-0.569329	} } The	-0.170696
-0.420076	0; } The	-0.124939
-1.168896	1; } The	-0.124939
-1.054035	2; } The	-0.124939
-0.913174	3; } The	-0.124939
-0.760708	1.; } The	-0.124939
-0.667231	2.0; } The	-0.124939
-0.667231	1.0f; } The	-0.124939
-0.468312	Z } The	-0.124939
-0.468312	pow(x,10); } The	-0.124939
-0.468312	abs(v.f) } The	-0.124939
-0.468312	printf(Greek[n]); } The	-0.124939
-0.468312	Func(a[i]); } The	-0.124939
-0.468312	timediff[i]); } The	-0.124939
-0.468312	list[x]; } The	-0.124939
-0.594393	Prefetching data The	-0.124939
-1.279341	Mathematical functions The	-0.124939
-0.859893	Overloaded functions The	-0.124939
-0.579925	fastcall functions The	-0.124939
-1.079966	C++ compilers The	-0.124939
-0.594043	platform. Intel The	-0.124939
-0.592000	Register variables The	-0.124939
-1.370443	in table The	-0.124939
-1.176809	if unsigned The	-0.124939
-0.939770	the code. The	-0.124939
-0.754935	of code. The	-0.124939
-0.307847	point code. The	-0.124939
-0.622842	intermediate code. The	-0.124939
-0.706302	source code. The	-0.124939
-0.439712	independent code. The	-0.124939
-0.439712	Fortran code. The	-0.124939
-0.622842	application-specific code. The	-0.124939
-0.439712	precompiled code. The	-0.124939
-0.730409	the time. The	-0.124939
-0.730409	of time. The	-0.124939
-0.878900	same time. The	-0.124939
-0.816161	extra time. The	-0.124939
-0.791643	compile time. The	-0.124939
-0.642637	load time. The	-0.124939
-0.452595	save time. The	-0.124939
-0.642637	user's time. The	-0.124939
-0.986615	YMM registers The	-0.124939
-0.986615	ZMM registers The	-0.124939
-0.412944	the function. The	-0.124939
-0.601857	member function. The	-0.124939
-0.259576	critical function. The	-0.124939
-0.346674	new function. The	-0.124939
-0.346674	desired function. The	-0.124939
-0.487960	inlined function. The	-0.124939
-0.346674	message function. The	-0.124939
-0.346674	polymorphic function. The	-0.124939
-0.346674	strlen function. The	-0.124939
-0.500652	access, etc. The	-0.124939
-0.500652	way, etc. The	-0.124939
-0.500652	databases, etc. The	-0.124939
-0.500652	connections, etc. The	-0.124939
-0.871722	bit Linux The	-0.124939
-0.553764	bit }; The	-0.124939
-0.811336	b;} }; The	-0.124939
-0.337186	library functions. The	-0.124939
-0.349940	two functions. The	-0.124939
-0.362656	member functions. The	-0.124939
-0.492488	virtual functions. The	-0.124939
-0.349940	hardware functions. The	-0.124939
-0.349940	polymorphic functions. The	-0.124939
-0.349940	trigonometric functions. The	-0.124939
-0.865778	decrement operators The	-0.124939
-0.846316	the memory. The	-0.124939
-1.095110	in memory. The	-0.124939
-0.509868	code memory. The	-0.124939
-0.403320	the program. The	-0.124939
-0.398928	a program. The	-0.124939
-0.360151	mode program. The	-0.124939
-0.569124	application program. The	-0.124939
-0.544176	is used. The	-0.301030
-0.472844	not used. The	-0.124939
-0.862243	of microprocessor The	-0.124939
-0.860356	identical branches The	-0.124939
-0.581652	date. Mac The	-0.124939
-0.152729	the cache. The	-0.124939
-0.115765	code cache. The	-0.124939
-0.489873	data cache. The	-0.124939
-0.437285	level-2 cache. The	-0.124939
-0.218474	level-1 cache. The	-0.124939
-0.390383	micro-op cache. The	-0.124939
-0.508430	64-bit systems. The	-0.124939
-0.593175	operating systems. The	-0.124939
-0.353922	Linux systems. The	-0.124939
-0.353922	bigger systems. The	-0.124939
-0.353922	BSD systems. The	-0.124939
-0.353922	Itanium systems. The	-0.124939
-1.020657	+ c; The	-0.124939
-0.519742	more efficient. The	-0.124939
-0.404279	very efficient. The	-0.124939
-0.649100	less efficient. The	-0.124939
-0.766548	explained below. The	-0.124939
-0.568788	described below. The	-0.124939
-0.403506	given below. The	-0.124939
-0.403506	sections below. The	-0.124939
-0.403506	14.19 below. The	-0.124939
-0.519384	the data. The	-0.124939
-0.338634	of data. The	-0.124939
-0.309872	vector data. The	-0.124939
-0.309872	used data. The	-0.124939
-0.437798	test data. The	-0.124939
-0.309872	read-only data. The	-0.124939
-0.860351	return types The	-0.124939
-0.418195	instruction set. The	-0.124939
-0.778210	different compilers. The	-0.124939
-0.778210	C++ compilers. The	-0.124939
-0.477151	Clang compilers. The	-0.124939
-0.732575	Intel processors. The	-0.124939
-0.606236	newer processors. The	-0.124939
-0.428748	PC processors. The	-0.124939
-0.428748	contemporary processors. The	-0.124939
-1.178892	hardware platform The	-0.124939
-1.271514	stored together The	-0.124939
-0.579005	is called. The	-0.124939
-0.354120	are called. The	-0.124939
-0.590708	never called. The	-0.124939
-0.512203	of CPUs. The	-0.124939
-0.512203	AMD CPUs. The	-0.124939
-0.364048	old CPUs. The	-0.124939
-0.364048	modern CPUs. The	-0.124939
-0.364048	earlier CPUs. The	-0.124939
-0.850017	Boolean operands The	-0.124939
-1.276877	the compiler. The	-0.124939
-0.845899	across modules The	-0.124939
-0.448076	pointers are: The	-0.124939
-0.635657	allocation are: The	-0.124939
-0.448076	inlining are: The	-0.124939
-0.539424	the loop. The	-0.124939
-0.353866	test loop. The	-0.124939
-0.689988	innermost loop. The	-0.124939
-0.353866	infinite loop. The	-0.124939
-1.143558	a pointer. The	-0.124939
-0.508118	hidden pointer. The	-0.124939
-0.572231	Type conversions The	-0.124939
-0.336828	Windows platforms. The	-0.124939
-0.848430	Program installation The	-0.124939
-0.388915	the cases. The	-0.124939
-0.616664	most cases. The	-0.124939
-0.388915	such cases. The	-0.124939
-0.616664	simple cases. The	-0.124939
-0.722048	and 1. The	-0.124939
-0.895686	or 1. The	-0.124939
-1.095423	Function inlining The	-0.124939
-0.569782	variable size. The	-0.124939
-0.739887	of 2. The	-0.124939
-0.378386	be 2. The	-0.124939
-0.276587	by 2. The	-0.124939
-0.567714	these variables. The	-0.124939
-0.567206	allocated resources. The	-0.124939
-0.516701	or class. The	-0.124939
-0.367241	same class. The	-0.124939
-0.367241	container class. The	-0.124939
-0.367241	parent class. The	-0.124939
-0.385693	it calls. The	-0.124939
-0.279420	function calls. The	-0.124939
-0.270478	interface calls. The	-0.124939
-0.984443	optimal algorithm The	-0.124939
-0.482873	tested it. The	-0.124939
-0.482873	execute it. The	-0.124939
-0.294342	vector registers. The	-0.124939
-0.582296	YMM registers. The	-0.124939
-0.247010	32-bit mode. The	-0.124939
-0.496603	bit mode. The	-0.124939
-0.571623	12 bytes. The	-0.124939
-0.700842	the object. The	-0.124939
-0.412693	global object. The	-0.124939
-0.412693	anonymous object. The	-0.124939
-0.789733	the library. The	-0.124939
-0.789733	function library. The	-0.124939
-0.142111	the calculations. The	-0.124939
-0.499149	integer calculations. The	-0.124939
-0.354727	overlapping calculations. The	-0.124939
-0.556414	clock cycles. The	-0.124939
-0.690482	arithmetic operations. The	-0.124939
-0.482873	(SIMD) operations. The	-0.124939
-0.353508	that variable. The	-0.124939
-0.141737	register variable. The	-0.124939
-0.497450	induction variable. The	-0.124939
-0.565629	prevent optimization. The	-0.124939
-0.482873	program performance. The	-0.124939
-0.482873	best performance. The	-0.124939
-0.400823	dynamic libraries. The	-0.124939
-0.400823	large libraries. The	-0.124939
-0.400823	math libraries. The	-0.124939
-1.350271	the stack. The	-0.124939
-0.916352	if possible. The	-0.124939
-0.772474	as possible. The	-0.124939
-0.867976	is needed. The	-0.124939
-0.474273	when needed. The	-0.124939
-0.400823	and classes. The	-0.124939
-0.400823	for classes. The	-0.124939
-0.678084	container classes. The	-0.124939
-0.565941	the thread. The	-0.124939
-0.639518	different purposes. The	-0.124939
-0.567018	other purposes. The	-0.124939
-0.567018	test purposes. The	-0.124939
-0.400823	and precision. The	-0.124939
-0.400823	point precision. The	-0.124939
-0.400823	losing precision. The	-0.124939
-0.636977	memory access. The	-0.124939
-0.400823	each access. The	-0.124939
-0.400823	every access. The	-0.124939
-0.829980	control condition The	-0.124939
-0.545132	AVX instructions. The	-0.124939
-0.387200	string instructions. The	-0.124939
-0.387200	subsequent instructions. The	-0.124939
-0.567487	+ f; The	-0.124939
-0.613771	following way. The	-0.124939
-0.545132	inefficient way. The	-0.124939
-0.545132	suboptimal way. The	-0.124939
-0.545132	the vector. The	-0.124939
-0.387200	size vector. The	-0.124939
-0.545132	per vector. The	-0.124939
-0.665346	as well. The	-0.124939
-0.467119	performs well. The	-0.124939
-1.315019	CPU dispatching. The	-0.124939
-1.114502	switch statements The	-0.124939
-0.454672	its address. The	-0.124939
-0.454672	(signed) address. The	-0.124939
-0.845626	instruction sets. The	-0.124939
-0.454672	instructions sets. The	-0.124939
-0.216954	or not. The	-0.124939
-0.648973	this problem. The	-0.124939
-0.372223	another problem. The	-0.124939
-0.372223	security problem. The	-0.124939
-0.562893	systems. 3 The	-0.124939
-0.302252	each version. The	-0.124939
-0.302252	32-bit version. The	-0.124939
-0.302252	alternative version. The	-0.124939
-0.302252	up-to-date version. The	-0.124939
-1.114502	end user. The	-0.124939
-0.640769	function returns. The	-0.124939
-0.411552	64-bit Windows. The	-0.124939
-0.262924	non-sequential order. The	-0.124939
-0.496509	random order. The	-0.124939
-0.554540	dynamic allocation. The	-0.124939
-0.629765	64-bit integers. The	-0.124939
-0.444241	16-bit integers. The	-0.124939
-0.623676	is enabled. The	-0.124939
-0.557343	future. 6 The	-0.124939
-0.554540	quite inefficient. The	-0.124939
-0.262924	is critical. The	-0.124939
-0.496509	be critical. The	-0.124939
-0.074651	is available. The	-0.124939
-0.278836	libraries available. The	-0.124939
-1.151354	is executed. The	-0.124939
-0.353711	times faster. The	-0.124939
-0.353711	calculation faster. The	-0.124939
-0.353711	execute faster. The	-0.124939
-0.628107	performance problems. The	-0.124939
-0.443159	Installation problems. The	-0.124939
-1.189620	template parameter. The	-0.124939
-0.815279	of overflow. The	-0.124939
-0.278836	single element. The	-0.124939
-0.278836	matrix element. The	-0.124939
-0.396622	per element. The	-0.124939
-0.278836	pivot element. The	-0.124939
-0.812740	register storage. The	-0.124939
-0.330689	the value. The	-0.124939
-0.330689	return value. The	-0.124939
-0.330689	calculated value. The	-0.124939
-0.552063	input file. The	-0.124939
-0.603361	a register. The	-0.124939
-0.603361	vector register. The	-0.124939
-0.552063	at once The	-0.124939
-0.428008	the system. The	-0.124939
-0.840525	operating system. The	-0.124939
-0.728734	very fast. The	-0.124939
-0.426835	quite fast. The	-0.124939
-0.545043	execution units. The	-0.124939
-0.550503	that branch. The	-0.124939
-0.330689	an array. The	-0.124939
-0.330689	another array. The	-0.124939
-0.330689	normal array. The	-0.124939
-1.041035	cache space. The	-0.124939
-0.939888	to zero. The	-0.124939
-0.780103	cache line. The	-0.124939
-0.426835	matrix line. The	-0.124939
-0.426835	adding vectors. The	-0.124939
-0.426835	128-bit vectors. The	-0.124939
-0.426835	large applications. The	-0.124939
-0.426835	Windows applications. The	-0.124939
-1.173372	OS X The	-0.124939
-0.799405	the table. The	-0.124939
-0.547131	up-to-date solution. The	-0.124939
-0.796260	and Linux. The	-0.124939
-0.409274	to integer. The	-0.124939
-0.694228	an integer. The	-0.124939
-0.407991	of optimizations. The	-0.124939
-0.575364	interprocedural optimizations. The	-0.124939
-0.691757	this case. The	-0.124939
-0.407991	32-bit case. The	-0.124939
-0.547131	Intel processor. The	-0.124939
-0.432348	of bits. The	-0.124939
-0.432348	64 bits. The	-0.124939
-0.305806	extra bits. The	-0.124939
-0.407991	loop is. The	-0.124939
-0.407991	itself is. The	-0.124939
-0.303785	work automatically. The	-0.124939
-0.303785	alignment automatically. The	-0.124939
-0.303785	vectorize automatically. The	-0.124939
-0.548900	platforms. Clang The	-0.124939
-0.407991	in details. The	-0.124939
-0.738699	for details. The	-0.124939
-0.304794	to vectorization. The	-0.124939
-0.400760	automatic vectorization. The	-0.124939
-0.545369	support anyway. The	-0.124939
-0.651638	to do. The	-0.124939
-0.409274	not do. The	-0.124939
-0.349235	multiple threads. The	-0.124939
-0.303785	between threads. The	-0.124939
-0.931666	model number. The	-0.124939
-0.538615	a constant. The	-0.124939
-0.787870	bit systems: The	-0.124939
-0.784298	Calculate polynomial The	-0.124939
-0.538615	avoiding this. The	-0.124939
-0.538615	first call. The	-0.124939
-0.540640	right prediction. The	-0.124939
-0.431838	the application. The	-0.124939
-0.385490	particular application. The	-0.124939
-0.270323	MFC application. The	-0.124939
-0.385490	used here. The	-0.124939
-0.385490	odd here. The	-0.124939
-0.538615	class members. The	-0.124939
-0.911306	mentioned above. The	-0.124939
-0.540640	positive result. The	-0.124939
-0.384072	25 7 The	-0.124939
-0.384072	tool. 7 The	-0.124939
-1.134804	stack unwinding The	-0.124939
-0.540640	breakpoint again. The	-0.124939
-0.384072	in one. The	-0.124939
-0.540640	new one. The	-0.124939
-0.787870	stamp counter. The	-0.124939
-0.540640	a structure. The	-0.124939
-0.540640	or structure. The	-0.124939
-0.279577	Pentium 4. The	-0.124939
-0.384072	and branches. The	-0.124939
-0.384072	nearby branches. The	-0.124939
-0.529333	the profiler. The	-0.124939
-0.966278	to maintain. The	-0.124939
-0.889721	development tools. The	-0.124939
-0.529333	negative numbers. The	-0.124939
-0.529333	assembly names. The	-0.124939
-0.529333	first manual. The	-0.124939
-0.330363	a computer. The	-0.124939
-0.227426	4 computer. The	-0.124939
-0.227426	another computer. The	-0.124939
-0.966278	is finished. The	-0.124939
-0.496265	two ways. The	-0.124939
-0.352657	8 ways. The	-0.124939
-0.352657	program. 16.2 The	-0.124939
-0.352657	155 16.2 The	-0.124939
-0.531713	than pow The	-0.124939
-0.889721	+= a[i]; The	-0.124939
-0.768086	high priority. The	-0.124939
-0.966278	out-of-order execution. The	-0.124939
-0.529333	Intel/x86-compatible microprocessors. The	-0.124939
-0.496265	at www.agner.org/optimize/asmlib.zip. The	-0.124939
-0.352657	library www.agner.org/optimize/asmlib.zip. The	-0.124939
-0.966278	branch mispredictions. The	-0.124939
-0.889721	do so. The	-0.124939
-0.529333	code addresses. The	-0.124939
-0.529333	connect them. The	-0.124939
-0.889721	floating point. The	-0.124939
-0.339163	by 8. The	-0.124939
-0.352657	further explanation. The	-0.124939
-0.352657	little explanation. The	-0.124939
-0.352657	class elements. The	-0.124939
-0.496265	array elements. The	-0.124939
-0.262829	is doubled. The	-0.124939
-0.529333	or division. The	-0.124939
-0.183489	clock cycle. The	-0.124939
-0.529333	AVX. 5. The	-0.124939
-0.529333	pitfalls here: The	-0.124939
-0.515772	clearly better. The	-0.124939
-0.515772	broken up. The	-0.124939
-0.515772	usability reasons. The	-0.124939
-0.859232	an exception. The	-0.124939
-0.515772	a type. The	-0.124939
-0.437195	at initialization. The	-0.124939
-0.309422	necessary initialization. The	-0.124939
-0.437195	of ebx. The	-0.124939
-0.309422	to ebx. The	-0.124939
-0.515772	is bad The	-0.124939
-0.309422	is true. The	-0.124939
-0.309422	always true. The	-0.124939
-0.515772	class objects. The	-0.124939
-0.309422	loop index. The	-0.124939
-0.309422	array index. The	-0.124939
-0.076779	or *.so). The	-0.124939
-0.258918	(*.dll, *.so). The	-0.124939
-0.309422	code lines. The	-0.124939
-0.437195	cache lines. The	-0.124939
-0.744850	is running. The	-0.124939
-0.859232	user input. The	-0.124939
-0.518657	a dispatcher. The	-0.124939
-0.515772	from optimal. The	-0.124939
-0.859232	an example. The	-0.124939
-0.518657	+ 1.0f; The	-0.124939
-0.744850	of efficiency. The	-0.124939
-0.515772	64-bit versions. The	-0.124939
-0.744850	not cached. The	-0.124939
-0.309422	is unsigned. The	-0.124939
-0.309422	or unsigned. The	-0.124939
-0.309422	with pointers. The	-0.124939
-0.437195	invalid pointers. The	-0.124939
-0.515772	two double. The	-0.124939
-0.110443	the diagonal. The	-0.124939
-0.749749	to another. The	-0.124939
-0.859232	pointer aliasing. The	-0.124939
-0.307309	programming style. The	-0.124939
-0.515772	clock counts. The	-0.124939
-0.515772	are smaller. The	-0.124939
-0.170110	platforms. 3. The	-0.124939
-0.170110	interrupt 3. The	-0.124939
-0.170110	namespace. 3. The	-0.124939
-0.170110	other module. The	-0.124939
-0.154525	another module. The	-0.124939
-0.064798	Dynamic cast The	-0.124939
-0.064798	Static cast The	-0.124939
-0.064798	Reinterpret cast The	-0.124939
-0.064798	Const cast The	-0.124939
-0.515772	parentheses manually. The	-0.124939
-0.859232	is run. The	-0.124939
-0.309422	double format. The	-0.124939
-0.309422	file format. The	-0.124939
-0.515772	optimized programs. The	-0.124939
-0.170110	compilers work. The	-0.124939
-0.170110	important work. The	-0.124939
-0.170110	microprocessors work. The	-0.124939
-0.494071	and compact. The	-0.124939
-0.201634	following reasons: The	-0.124939
-0.245741	of error. The	-0.124939
-0.245741	programming error. The	-0.124939
-0.494071	library. 119 The	-0.124939
-0.494071	(byte code). The	-0.124939
-0.708688	from errors. The	-0.124939
-0.494071	multiplications only. The	-0.124939
-0.708688	range analysis The	-0.124939
-0.494071	processor features. The	-0.124939
-0.201634	in advance. The	-0.124939
-0.494071	number 28. The	-0.124939
-0.708688	it takes. The	-0.124939
-0.245741	using exceptions. The	-0.124939
-0.245741	hardware exceptions. The	-0.124939
-0.105716	is implemented. The	-0.124939
-0.494071	called once. The	-0.124939
-0.494071	64-bit Windows). The	-0.124939
-0.494071	compile for. The	-0.124939
-0.494071	second step. The	-0.124939
-0.708688	CPU brand. The	-0.124939
-0.812740	garbage collection. The	-0.124939
-0.494071	} 135 The	-0.124939
-0.497733	particular purpose. The	-0.124939
-0.245741	before compilation. The	-0.124939
-0.353711	just-in-time compilation. The	-0.124939
-0.708688	accessed sequentially. The	-0.124939
-0.494071	the operands. The	-0.124939
-0.245741	iterations back. The	-0.124939
-0.245741	written back. The	-0.124939
-0.494071	be expected. The	-0.124939
-0.812740	operating systems". The	-0.124939
-0.812740	calling conventions. The	-0.124939
-0.494071	are stored. The	-0.124939
-0.714706	been deallocated. The	-0.124939
-0.494071	matrix sizes. The	-0.124939
-0.105716	example 9.6b. The	-0.425969
-0.812740	the same. The	-0.124939
-0.708688	not occur. The	-0.124939
-0.812740	error prone. The	-0.124939
-0.353711	the linker. The	-0.124939
-0.245741	dynamic linker. The	-0.124939
-0.494071	no multiplications. The	-0.124939
-0.708688	need metaprogramming. The	-0.124939
-0.245741	assembly output. The	-0.124939
-0.245741	Boolean output. The	-0.124939
-0.494071	arithmetic expression. The	-0.124939
-0.245741	allocated resource. The	-0.124939
-0.245741	limited resource. The	-0.124939
-0.708688	and truncation. The	-0.124939
-0.494071	is big. The	-0.124939
-0.494071	four float. The	-0.124939
-0.201634	+ 1.0f;} The	-0.124939
-0.494071	of n. The	-0.124939
-0.494071	illegitimate copying. The	-0.124939
-0.494071	in x. The	-0.124939
-0.494071	or post-increment. The	-0.124939
-0.494071	recursive templates. The	-0.124939
-0.644310	page 107). The	-0.124939
-0.644310	13.5 Implementation The	-0.124939
-0.140990	fine-grained parallelism. The	-0.124939
-0.140990	natural parallelism. The	-0.124939
-0.644310	clock frequency. The	-0.124939
-0.644310	page 71). The	-0.124939
-0.453674	background jobs. The	-0.124939
-0.453674	the context. The	-0.124939
-0.140990	^ operator. The	-0.124939
-0.140990	sizeof operator. The	-0.124939
-0.644310	automatic parallelization. The	-0.124939
-0.140990	Event-based sampling: The	-0.124939
-0.140990	Time-based sampling: The	-0.124939
-0.644310	one instance. The	-0.124939
-0.453674	runtime frameworks. The	-0.124939
-0.140990	page 127. The	-0.124939
-0.140990	generates 127. The	-0.124939
-0.453674	single instruction. The	-0.124939
-0.453674	most cases: The	-0.124939
-0.644310	the mouse. The	-0.124939
-0.644310	more difficult. The	-0.124939
-0.453674	B values. The	-0.124939
-0.644310	"Hello 2" The	-0.124939
-0.140990	the heap. The	-0.124939
-0.140990	memory heap. The	-0.124939
-0.453674	works differently. The	-0.124939
-0.644310	assembly language". The	-0.124939
-0.644310	page 52. The	-0.124939
-0.453674	7.13 Loops The	-0.124939
-0.453674	and GOT. The	-0.124939
-0.453674	a string. The	-0.124939
-0.453674	is volatile. The	-0.124939
-0.453674	use relocation. The	-0.124939
-0.453674	not supported. The	-0.124939
-0.644310	of range. The	-0.124939
-0.644310	of programming. The	-0.124939
-0.644310	has disadvantages: The	-0.124939
-0.453674	user's needs. The	-0.124939
-0.453674	by __fastcall. The	-0.124939
-0.453674	is specified. The	-0.124939
-0.140990	of underflow. The	-0.124939
-0.140990	and underflow. The	-0.124939
-0.453674	when false. The	-0.124939
-0.453674	10 Multithreading The	-0.124939
-0.453674	dispatch methods. The	-0.124939
-0.453674	is small. The	-0.124939
-0.644310	make utility. The	-0.124939
-0.644310	be controlled. The	-0.124939
-0.140990	for updating. The	-0.124939
-0.140990	Hardware updating. The	-0.124939
-0.140990	allocated separately. The	-0.124939
-0.140990	measured separately. The	-0.124939
-0.644310	page 26. The	-0.124939
-0.644310	members (properties) The	-0.124939
-0.453674	cores. 60 The	-0.124939
-0.453674	of iterations. The	-0.124939
-0.644310	is required. The	-0.124939
-0.453674	in use. The	-0.124939
-0.453674	class declaration. The	-0.124939
-0.644310	go undetected. The	-0.124939
-0.453674	enabled. Volatile The	-0.124939
-0.453674	is allowed. The	-0.124939
-0.453674	table 8.1. The	-0.124939
-0.453674	+ c) The	-0.124939
-0.453674	detection mechanism. The	-0.124939
-0.064798	be negative. The	-0.124939
-0.453674	be defined. The	-0.124939
-0.453674	work load. The	-0.124939
-0.453674	level framework. The	-0.124939
-0.644310	memory area. The	-0.124939
-0.453674	standard PCs. The	-0.124939
-0.644310	test examples. The	-0.124939
-0.644310	be predicted. The	-0.124939
-0.453674	inverted mask. The	-0.124939
-0.140990	same machine. The	-0.124939
-0.140990	virtual machine. The	-0.124939
-0.644310	allocated dynamically. The	-0.124939
-0.453674	than 1.23456. The	-0.124939
-0.644310	table (PLT). The	-0.124939
-0.453674	or column. The	-0.124939
-0.644310	(see below). The	-0.124939
-0.453674	unknown sources. The	-0.124939
-0.644310	page 137). The	-0.124939
-0.453674	the template. The	-0.124939
-0.453674	was started. The	-0.124939
-0.140990	= 80. The	-0.124939
-0.140990	page 80. The	-0.124939
-0.453674	number i. The	-0.124939
-0.140990	is 0. The	-0.124939
-0.140990	< 0. The	-0.124939
-0.453674	work correctly. The	-0.124939
-0.140990	3 1.1 The	-0.124939
-0.140990	information. 1.1 The	-0.124939
-0.453674	page 43). The	-0.124939
-0.644310	and Mac. The	-0.124939
-0.644310	the fraction. The	-0.124939
-0.140990	other compilers). The	-0.124939
-0.140990	DOS compilers). The	-0.124939
-0.140990	other processes. The	-0.124939
-0.140990	multiple processes. The	-0.124939
-0.644310	sign bit. The	-0.124939
-0.453674	dynamic linking. The	-0.124939
-0.453674	classes. Security The	-0.124939
-0.453674	7.5 Booleans The	-0.124939
-0.453674	linked together. The	-0.124939
-0.453674	become invalid. The	-0.124939
-0.064798	page 122. The	-0.124939
-0.064798	program starts. The	-0.124939
-0.140990	contains i/2+r. The	-0.124939
-0.140990	computing i/2+r. The	-0.124939
-0.644310	below shows. The	-0.124939
-0.064798	is closed. The	-0.124939
-0.453674	is avoided. The	-0.124939
-0.140990	bits each. The	-0.124939
-0.140990	bytes each. The	-0.124939
-0.453674	and later. The	-0.124939
-0.644310	page 140). The	-0.124939
-0.453674	operations. 105 The	-0.124939
-0.644310	/ 4; The	-0.124939
-0.453674	users have. The	-0.124939
-0.644310	and BSD. The	-0.124939
-0.351080	page 51). The	-0.124939
-0.351080	loop count. The	-0.124939
-0.351080	calculated independently. The	-0.124939
-0.351080	cache (en.wikipedia.org/wiki/L2_cache). The	-0.124939
-0.351080	being initialized. The	-0.124939
-0.351080	= i+1; The	-0.124939
-0.351080	is terminated. The	-0.124939
-0.351080	code generality. The	-0.124939
-0.351080	= sin(0.8); The	-0.124939
-0.351080	complexity (en.wikipedia.org/wiki/Standard_Template_Library). The	-0.124939
-0.351080	Ignoring virtualization. The	-0.124939
-0.351080	be installed. The	-0.124939
-0.351080	with interpretation. The	-0.124939
-0.351080	happens rarely. The	-0.124939
-0.351080	is insufficient. The	-0.124939
-0.351080	CPU core). The	-0.124939
-0.351080	to be. The	-0.124939
-0.351080	other situations: The	-0.124939
-0.351080	page 119). The	-0.124939
-0.351080	next paragraph. The	-0.124939
-0.351080	every millisecond. The	-0.124939
-0.351080	and destructors. The	-0.124939
-0.351080	about division). The	-0.124939
-0.351080	page 27). The	-0.124939
-0.351080	pointers. 144 The	-0.124939
-0.351080	is ecx+eax*4. The	-0.124939
-0.351080	CPUs optimally. The	-0.124939
-0.351080	compiling module2.cpp. The	-0.124939
-0.351080	in eax. The	-0.124939
-0.351080	/ (b1*b2); The	-0.124939
-0.351080	worst-case performance: The	-0.124939
-0.351080	value 1000. The	-0.124939
-0.351080	at runtime). The	-0.124939
-0.351080	than 20. The	-0.124939
-0.351080	Graphics accelerators The	-0.124939
-0.351080	mov eax,0. The	-0.124939
-0.351080	before storing. The	-0.124939
-0.351080	example 8.15b. The	-0.124939
-0.351080	a vector). The	-0.124939
-0.351080	methods: Instrumentation: The	-0.124939
-0.351080	in y. The	-0.124939
-0.351080	only SSE). The	-0.124939
-0.351080	a formalism. The	-0.124939
-0.351080	suitable duration. The	-0.124939
-0.351080	and disadvantages. The	-0.124939
-0.351080	very kludgy. The	-0.124939
-0.351080	Linux, sched_setaffinity). The	-0.124939
-0.351080	see shortly. The	-0.124939
-0.351080	and databases. The	-0.124939
-0.351080	type-casting pointers: The	-0.124939
-0.351080	at Wikibooks. The	-0.124939
-0.351080	overlap. 27 The	-0.124939
-0.351080	becomes noticeable. The	-0.124939
-0.351080	you know). The	-0.124939
-0.351080	and error-prone. The	-0.124939
-0.351080	but risky. The	-0.124939
-0.351080	the wheel. The	-0.124939
-0.351080	the weekdays. The	-0.124939
-0.351080	example 16.2. The	-0.124939
-0.351080	be straightforward. The	-0.124939
-0.351080	optimized further. The	-0.124939
-0.351080	a password. The	-0.124939
-0.351080	p. 104). The	-0.124939
-0.351080	becomes contiguous. The	-0.124939
-0.351080	versions instead. The	-0.124939
-0.351080	binary digits. The	-0.124939
-0.351080	page 44. The	-0.124939
-0.351080	physical factors. The	-0.124939
-0.351080	>= operators). The	-0.124939
-0.351080	remain unchanged. The	-0.124939
-0.351080	throw() specification. The	-0.124939
-0.351080	four floats. The	-0.124939
-0.351080	register renaming. The	-0.124939
-0.351080	used most. The	-0.124939
-0.351080	implementation dependent. The	-0.124939
-0.351080	page 143. The	-0.124939
-0.351080	and ||). The	-0.124939
-0.351080	Windows: __rdtsc()). The	-0.124939
-0.351080	/ 1.2345); The	-0.124939
-0.351080	is repetitive. The	-0.124939
-0.351080	certain tolerance. The	-0.124939
-0.351080	its reputation. The	-0.124939
-0.351080	aliasing (/Oa). The	-0.124939
-0.351080	takes. Debugging. The	-0.124939
-0.351080	and FPGAs. The	-0.124939
-0.351080	register keyword. The	-0.124939
-0.351080	= 100000001.23456. The	-0.124939
-0.351080	e.g. /arch:SSE2. The	-0.124939
-0.351080	several flaws: The	-0.124939
-0.351080	(Intel Atom). The	-0.124939
-0.351080	performance somewhat. The	-0.124939
-0.351080	p. 28) The	-0.124939
-0.351080	or __attribute__((fastcall)). The	-0.124939
-0.351080	page 134. The	-0.124939
-0.351080	+ r.b;} The	-0.124939
-0.351080	necessarily newer. The	-0.124939
-0.351080	variable m. The	-0.124939
-0.351080	best algorithm. The	-0.124939
-0.351080	63 . The	-0.124939
-0.351080	well documented. The	-0.124939
-0.351080	using new. The	-0.124939
-0.351080	the end. The	-0.124939
-0.351080	or 2016. The	-0.124939
-0.351080	page 84). The	-0.124939
-0.351080	following features: The	-0.124939
-0.351080	2n -1. The	-0.124939
-0.351080	of temp. The	-0.124939
-0.351080	and tedious. The	-0.124939
-0.351080	hardware design. The	-0.124939
-0.351080	copying. Security. The	-0.124939
-0.351080	efficient alternative. The	-0.124939
-0.351080	Gauss elimination. The	-0.124939
-0.351080	Kernel Library. The	-0.124939
-0.351080	normal afterwards. The	-0.124939
-0.351080	the next. The	-0.124939
-0.351080	and VIA. The	-0.124939
-0.351080	purposes (www.boost.org). The	-0.124939
-0.351080	and SVML. The	-0.124939
-0.351080	Plus2 (&a); The	-0.124939
-0.351080	not satisfactory. The	-0.124939
-0.351080	as <. The	-0.124939
-0.351080	is fastest. The	-0.124939
-0.351080	Agner Fog The	-0.124939
-0.351080	that doesn’t. The	-0.124939
-0.351080	is distributed. The	-0.124939
-0.351080	number 6! The	-0.124939
-0.351080	systems. 67 The	-0.124939
-0.351080	for details). The	-0.124939
-0.351080	the conversion. The	-0.124939
-0.351080	performance costs. The	-0.124939
-0.351080	return a+1;. The	-0.124939
-0.351080	are satisfied. The	-0.124939
-0.351080	large delays. The	-0.124939
-0.351080	class library). The	-0.124939
-0.351080	changed freely. The	-0.124939
-0.351080	and increment. The	-0.124939
-0.351080	multithreaded applications: The	-0.124939
-0.351080	page 72). The	-0.124939
-0.351080	performance/price ratio. The	-0.124939
-0.351080	name mangling. The	-0.124939
-0.351080	of sum. The	-0.124939
-0.351080	/ 3.0; The	-0.124939
-0.351080	i<100; i++)a[i]=2*i; The	-0.124939
-0.351080	"FDIV bug". The	-0.124939
-0.351080	8. 71 The	-0.124939
-0.351080	and modular. The	-0.124939
-0.351080	three advantages: The	-0.124939
-0.351080	128-bit reads. The	-0.124939
-0.351080	: 2.6f; The	-0.124939
-0.351080	hot spots. The	-0.124939
-0.351080	Taylor series. The	-0.124939
-0.351080	|, ~. The	-0.124939
-0.351080	preceding row. The	-0.124939
-0.351080	have tested. The	-0.124939
-0.351080	to vectorize. The	-0.124939
-0.351080	vectorclass www.agner.org/optimize/#vectorclass. The	-0.124939
-0.351080	function tables. The	-0.124939
-0.351080	more powerful. The	-0.124939
-0.351080	integer comparisons. The	-0.124939
-0.351080	(80 bits). The	-0.124939
-0.351080	return _mm_cvtsd_si32(_mm_load_sd(&x));} The	-0.124939
-0.351080	have tried. The	-0.124939
-0.351080	as follows. The	-0.124939
-0.351080	it directly. The	-0.124939
-0.351080	"frame pointer". The	-0.124939
-0.351080	reference parameters). The	-0.124939
-0.351080	years old. The	-0.124939
-0.351080	with these. The	-0.124939
-0.351080	been wasted. The	-0.124939
-0.351080	example 7.30b. The	-0.124939
-0.351080	page 70). The	-0.124939
-0.597663	all is for	-0.124939
-1.283703	manual is for	-0.124939
-0.597663	preference is for	-0.124939
-1.804744	code and for	-0.124939
-0.591208	array and for	-0.124939
-1.476243	pointers and for	-0.124939
-0.591208	speed and for	-0.124939
-0.591208	features and for	-0.124939
-0.591208	variables, and for	-0.124939
-0.598849	use that for	-0.124939
-1.064523	manuals are for	-0.124939
-0.590274	this or for	-0.124939
-0.590274	optimization or for	-0.124939
-0.590274	recovering or for	-0.124939
-0.202975	use it for	-0.124939
-2.012408	the function for	-0.124939
-1.404613	a function for	-0.124939
-0.857537	template function for	-0.124939
-0.857537	inlined function for	-0.124939
-0.578687	right function for	-0.124939
-0.857537	strlen function for	-0.124939
-1.640790	The code for	-0.124939
-1.187485	program code for	-0.124939
-0.560824	64-bit code for	-0.124939
-1.155395	extra code for	-0.124939
-0.714892	assembly code for	-0.124939
-1.413714	intermediate code for	-0.124939
-0.824196	optimal code for	-0.124939
-0.560824	identical code for	-0.124939
-0.195554	serial code for	-0.124939
-0.560824	build code for	-0.124939
-0.888386	same as for	-0.124939
-0.595714	CPUs, not for	-0.124939
-0.587864	precision than for	-0.124939
-0.587864	contentions than for	-0.124939
-0.587864	shared_ptr than for	-0.124939
-1.080281	A compiler for	-0.124939
-1.180878	Intel compiler for	-0.124939
-0.568840	C++ compiler for	-0.425969
-1.071809	Gnu compiler for	-0.124939
-1.019440	Microsoft compiler for	-0.124939
-0.548026	source compiler for	-0.124939
-0.548026	your compiler for	-0.124939
-0.548026	PathScale compiler for	-0.124939
-0.548026	commercial compiler for	-0.124939
-0.548026	cheap compiler for	-0.124939
-1.289353	of x for	-0.124939
-1.400529	cc[]) { for	-0.124939
-0.241202	r++) { for	-0.602060
-0.830621	TILESIZE) { for	-0.124939
-0.196304	r2++) { for	-0.425969
-0.564316	bb) { for	-0.124939
-0.595551	using this for	-0.124939
-1.268484	of time for	-0.124939
-0.862569	development time for	-0.124939
-0.581327	save time for	-0.124939
-0.581327	saves time for	-0.124939
-1.439343	to use for	-0.124939
-1.703103	can use for	-0.124939
-1.137791	of memory for	-0.124939
-0.852356	much data for	-0.124939
-0.575952	little data for	-0.124939
-1.005142	prefetch data for	-0.124939
-0.575952	prefetching data for	-0.124939
-1.949039	a program for	-0.124939
-0.579945	is different for	-0.124939
-0.859933	be different for	-0.124939
-1.015954	are different for	-0.124939
-2.352892	the same for	-0.124939
-0.512924	have functions for	-0.124939
-0.512924	efficient functions for	-0.124939
-0.228429	many functions for	-0.124939
-0.512924	necessary functions for	-0.124939
-1.165599	intrinsic functions for	-0.124939
-0.512924	various functions for	-0.124939
-0.740033	frame functions for	-0.124939
-0.592105	Mathematical functions for	-0.124939
-0.977913	Intrinsic functions for	-0.124939
-0.512924	QueryPerformanceCounter functions for	-0.124939
-0.321940	used only for	-0.124939
-0.531415	method only for	-0.124939
-0.368221	works only for	-0.124939
-0.771700	dispatching only for	-0.124939
-0.531415	allowed only for	-0.124939
-0.590303	no instruction for	-0.124939
-0.590303	assembly instruction for	-0.124939
-1.721293	a loop for	-0.124939
-1.238904	// loop for	-0.124939
-0.580349	Main loop for	-0.124939
-0.587822	spots, but for	-0.124939
-0.587822	computing, but for	-0.124939
-0.382197	is used for	-0.124939
-0.343801	and used for	-0.124939
-0.378650	be used for	-0.157123
-0.763292	are used for	-0.124939
-0.343801	// used for	-0.124939
-0.138733	not used for	-0.425969
-0.483987	time used for	-0.124939
-0.483987	when used for	-0.124939
-0.343801	CPU used for	-0.124939
-0.542968	also used for	-0.124939
-0.621930	often used for	-0.124939
-0.343801	space used for	-0.124939
-0.343801	algorithms used for	-0.124939
-0.343801	currently used for	-0.124939
-0.691448	and one for	-0.124939
-0.545320	program, one for	-0.124939
-0.545320	set, one for	-0.124939
-0.545320	times, one for	-0.124939
-0.545320	parts: one for	-0.124939
-0.545320	branches: one for	-0.124939
-1.529502	of cache for	-0.124939
-0.586071	extra cache for	-0.124939
-1.717239	instruction set for	-0.425969
-1.606127	or class for	-0.124939
-0.665161	Intel compilers for	-0.124939
-0.595327	used most for	-0.124939
-1.176855	vector size for	-0.124939
-0.890482	to b for	-0.124939
-1.625476	function library for	-0.124939
-1.022116	interface library for	-0.124939
-0.582024	intermediate object for	-0.124939
-0.582024	temporary object for	-0.124939
-0.580728	for C++ for	-0.124939
-0.861425	as C++ for	-0.124939
-0.864554	is efficient for	-0.124939
-1.550481	most efficient for	-0.124939
-0.580353	same array for	-0.124939
-1.017068	linear array for	-0.124939
-1.415806	it possible for	-0.124939
-1.040713	as possible for	-0.124939
-0.555094	therefore possible for	-0.124939
-0.555094	rarely possible for	-0.124939
-0.791330	64-bit version for	-0.124939
-0.542595	bit version for	-0.124939
-1.003553	new version for	-0.124939
-0.542595	another version for	-0.124939
-0.542595	separate version for	-0.124939
-0.594505	temporary objects for	-0.124939
-1.636018	a variable for	-0.124939
-1.407162	induction variable for	-0.124939
-0.539109	induction variables for	-0.124939
-0.318685	Induction variables for	-0.124939
-1.050201	hash table for	-0.124939
-0.777178	good performance for	-0.124939
-0.534557	optimize performance for	-0.124939
-0.534557	identical performance for	-0.124939
-0.534557	poor performance for	-0.124939
-0.534557	degrades performance for	-0.124939
-1.600883	the software for	-0.124939
-0.370145	code branch for	-0.124939
-1.874463	is called for	-0.124939
-1.380974	= 0; for	-0.124939
-2.130490	For example, for	-0.124939
-1.172411	to unsigned for	-0.124939
-1.124252	vector register for	-0.124939
-0.562816	same register for	-0.124939
-0.538495	temporary register for	-0.124939
-1.068857	function libraries for	-0.124939
-0.749073	standard libraries for	-0.124939
-0.518260	Many libraries for	-0.124939
-0.518260	well-tested libraries for	-0.124939
-0.591840	Function template for	-0.124939
-0.570650	using registers for	-0.124939
-1.463753	XMM registers for	-0.124939
-1.109738	the need for	-0.124939
-0.552310	The need for	-0.124939
-1.410768	no need for	-0.124939
-1.656186	to test for	-0.124939
-0.238956	is useful for	-0.124939
-0.526286	be useful for	-0.221849
-0.069804	are useful for	-0.124939
-0.274831	most useful for	-0.124939
-0.438392	also useful for	-0.124939
-0.323423	very useful for	-0.124939
-0.529565	cases, even for	-0.124939
-0.529565	occurs, even for	-0.124939
-0.529565	time-consumer even for	-0.124939
-0.529565	times, even for	-0.124939
-1.340375	this method for	-0.124939
-0.569883	preferred method for	-0.124939
-1.571491	not always for	-0.124939
-1.204453	by 16 for	-0.124939
-0.567375	page 16 for	-0.124939
-1.820300	operating system for	-0.124939
-0.568965	page 32 for	-0.124939
-0.568965	preferably 32 for	-0.124939
-1.029442	the file for	-0.124939
-0.584840	header file for	-0.124939
-0.339050	Header file for	-0.425969
-1.011898	the bits for	-0.124939
-0.567117	enough bits for	-0.124939
-0.512448	integer operations for	-0.301030
-0.836526	is 0 for	-0.124939
-0.567504	value 0 for	-0.124939
-0.591282	different cases for	-0.124939
-0.461291	of instructions for	-0.124939
-0.461291	no instructions for	-0.124939
-0.172079	extra instructions for	-0.124939
-0.461291	intrinsic instructions for	-0.124939
-0.656182	reorder instructions for	-0.124939
-0.461291	ADX instructions for	-0.124939
-0.550571	is available for	-0.124939
-0.486763	are available for	-0.124939
-0.407635	register available for	-0.124939
-0.574841	registers available for	-0.124939
-0.407635	Only available for	-0.124939
-1.041609	residual error for	-0.124939
-0.332538	response times for	-0.124939
-1.699405	the stack for	-0.124939
-1.800609	is important for	-0.124939
-1.008248	very important for	-0.124939
-0.589934	other CPUs for	-0.124939
-0.587901	too large for	-0.124939
-1.246549	to work for	-0.124939
-1.128058	doesn't work for	-0.124939
-0.668267	code versions for	-0.124939
-1.215665	different versions for	-0.124939
-0.523159	multiple versions for	-0.602060
-0.468967	several versions for	-0.124939
-0.588888	physics processor for	-0.124939
-0.511741	is compiled for	-0.124939
-0.424590	and compiled for	-0.124939
-0.803702	code compiled for	-0.124939
-0.424590	each compiled for	-0.124939
-0.162351	programs compiled for	-0.124939
-0.874441	too big for	-0.124939
-1.038614	is best for	-0.124939
-1.572486	is necessary for	-0.124939
-1.207168	not necessary for	-0.124939
-1.165561	per element for	-0.124939
-1.517119	assembly language for	-0.124939
-1.241731	The speed for	-0.124939
-0.755500	int i; for	-0.903090
-1.158557	is common for	-0.124939
-0.556600	/arch:AVX etc. for	-0.124939
-0.556600	-mavx, etc. for	-0.124939
-1.260911	to compile for	-0.124939
-0.589885	Enable exception for	-0.124939
-1.341770	be allocated for	-0.124939
-0.561506	the option for	-0.124939
-0.165602	an option for	-0.124939
-0.448367	compiler option for	-0.124939
-0.369657	file" option for	-0.124939
-0.443129	is good for	-0.425969
-0.182447	are good for	-0.124939
-1.564931	a matrix for	-0.124939
-0.872908	of precision for	-0.124939
-1.246039	that works for	-0.124939
-0.637807	is optimized for	-0.124939
-0.565570	are optimized for	-0.124939
-0.401305	not optimized for	-0.124939
-0.401305	you optimized for	-0.124939
-0.401305	each optimized for	-0.124939
-0.373983	highly optimized for	-0.124939
-0.565570	Not optimized for	-0.124939
-0.174485	the manual for	-0.124939
-0.671062	compiler manual for	-0.124939
-0.972092	this manual for	-0.124939
-0.470731	vectorclass manual for	-0.124939
-0.588553	a[100], b; for	-0.124939
-0.352237	the check for	-0.124939
-0.314532	to check for	-0.124939
-0.352237	not check for	-0.124939
-0.244591	then check for	-0.124939
-0.049466	no check for	-0.425969
-0.244591	automatically check for	-0.124939
-0.244591	automatic check for	-0.124939
-0.244591	might check for	-0.124939
-0.244591	missing check for	-0.124939
-0.244591	(1) check for	-0.124939
-0.589403	are advantageous for	-0.124939
-1.317426	efficient solution for	-0.124939
-0.762267	a container for	-0.124939
-0.521462	one container for	-0.124939
-0.114296	have support for	-0.124939
-0.114296	has support for	-0.124939
-0.269854	make support for	-0.124939
-0.269854	some support for	-0.124939
-0.072764	hardware support for	-0.124939
-0.269854	better support for	-0.124939
-0.114296	off support for	-0.124939
-0.269854	inherent support for	-0.124939
-0.269854	excellent support for	-0.124939
-0.747383	overloaded operators for	-0.124939
-0.788527	bitwise operators for	-0.425969
-0.879044	rows; i++) for	-0.124939
-0.869606	a standard for	-0.124939
-0.872046	microprocessor hardware for	-0.124939
-0.184727	and 1 for	-0.124939
-0.512745	with 1 for	-0.124939
-0.547356	and optimizing for	-0.124939
-0.547356	between optimizing for	-0.124939
-0.549753	some information for	-0.124939
-0.549753	recovery information for	-0.124939
-1.818476	clock cycles for	-0.124939
-0.585902	// ... for	-0.124939
-0.383674	i; ... for	-0.425969
-0.415130	136 ... for	-0.124939
-0.415130	j; ... for	-0.124939
-0.415130	List[ArraySize]; ... for	-0.124939
-0.546878	64-bit addresses for	-0.124939
-0.546878	element addresses for	-0.124939
-0.928075	source files for	-0.124939
-0.545646	Header files for	-0.124939
-1.265728	not recommended for	-0.124939
-1.716581	memory allocation for	-0.124939
-1.598997	to optimize for	-0.124939
-0.585704	mentioned above for	-0.124939
-0.868479	caching problems for	-0.124939
-1.298363	is optimal for	-0.124939
-0.583690	same space for	-0.124939
-1.427990	some cases, for	-0.124939
-0.582652	all branches for	-0.124939
-1.279270	= 1; for	-0.124939
-1.360578	+ 1; for	-0.124939
-1.155968	code caching for	-0.124939
-0.584244	best implementation for	-0.124939
-0.961595	exception handling for	-0.124939
-0.453635	used methods for	-0.124939
-0.453635	useful methods for	-0.124939
-0.453635	various methods for	-0.124939
-0.453635	suggests methods for	-0.124939
-0.581938	be separate for	-0.124939
-1.322563	memory block for	-0.124939
-0.535192	small block for	-0.124939
-0.492012	different name for	-0.124939
-0.881409	same name for	-0.124939
-0.492012	local name for	-0.124939
-0.477629	r, c; for	-0.425969
-1.158054	a disadvantage for	-0.124939
-0.581537	annoyingly high for	-0.124939
-1.451866	to zero for	-0.124939
-0.583211	reserve resources for	-0.124939
-0.613221	The reason for	-0.301030
-0.446655	security reason for	-0.124939
-1.465930	table lookup for	-0.124939
-0.864639	code examples for	-0.124939
-1.279418	no difference for	-0.124939
-0.533723	Time difference for	-0.124939
-0.439370	is needed for	-0.124939
-0.573478	be needed for	-0.124939
-0.869011	not needed for	-0.124939
-0.406707	work needed for	-0.124939
-0.578811	two expressions for	-0.124939
-0.451832	is difficult for	-0.602060
-0.625948	more difficult for	-0.124939
-0.347454	OpenMP directives for	-0.124939
-1.009947	runtime framework for	-0.124939
-1.407035	static linking for	-0.124939
-0.773173	int x; for	-0.124939
-0.324677	float x; for	-0.425969
-1.190323	hardware platform for	-0.124939
-0.521711	is higher for	-0.124939
-0.521711	are higher for	-0.124939
-0.860958	cannot know for	-0.124939
-0.579432	reliable results for	-0.124939
-0.747633	the options for	-0.124939
-0.517412	available options for	-0.124939
-0.345165	a feature for	-0.124939
-1.048108	be made for	-0.124939
-0.517412	library made for	-0.124939
-0.578762	most appropriate for	-0.124939
-0.577105	a constructor for	-0.124939
-1.034577	is relevant for	-0.124939
-1.001301	this section for	-0.124939
-1.369319	the computer for	-0.124939
-0.064196	good choice for	-0.249877
-0.346447	optimal choice for	-0.124939
-0.848832	in STL for	-0.124939
-0.250383	is intended for	-0.124939
-0.330579	are intended for	-0.124939
-0.227596	not intended for	-0.124939
-0.227596	unit intended for	-0.124939
-0.577795	is avoided for	-0.124939
-1.387739	cache lines for	-0.124939
-0.155474	one instance for	-0.823909
-0.181465	no checking for	-0.124939
-0.991776	be inlined for	-0.124939
-0.841606	a database for	-0.124939
-1.229526	the destructor for	-0.124939
-0.531816	the possibility for	-0.124939
-0.838609	Define macro for	-0.124939
-0.570124	hide them for	-0.124939
-0.569186	separate containers for	-0.124939
-0.403779	are: Optimizing for	-0.124939
-0.403779	critical. Optimizing for	-0.124939
-0.403779	speed. Optimizing for	-0.124939
-0.842695	through rows for	-0.124939
-0.073342	when compiling for	-0.124939
-1.302994	data structures for	-0.124939
-0.566318	precious resource for	-0.124939
-0.283046	double temp; for	-0.425969
-0.390716	register temp; for	-0.124939
-0.153165	it easier for	-0.425969
-0.391892	reordering easier for	-0.124939
-0.834324	exactly identical for	-0.124939
-1.254949	the program, for	-0.124939
-0.323909	time, except for	-0.124939
-0.323909	object, except for	-0.124939
-0.323909	programs, except for	-0.124939
-0.323909	stack, except for	-0.124939
-0.456932	and templates for	-0.124939
-0.456932	Using templates for	-0.124939
-0.566758	// header for	-0.124939
-0.303312	a penalty for	-0.124939
-0.303312	no penalty for	-0.124939
-0.302740	performance penalty for	-0.124939
-0.564825	The reasons for	-0.124939
-0.562900	software module for	-0.124939
-1.161958	is used, for	-0.124939
-0.459195	no checks for	-0.124939
-0.459195	explicit checks for	-0.124939
-1.357814	critical stride for	-0.124939
-0.564825	page 3 for	-0.124939
-0.561941	event counts for	-0.124939
-0.318054	big enough for	-0.124939
-0.562900	have features for	-0.124939
-0.563007	= 3; for	-0.124939
-0.714718	is chosen for	-0.124939
-0.629775	has chosen for	-0.124939
-0.998443	all destructors for	-0.124939
-1.152236	parameter transfer for	-0.124939
-0.355037	The search for	-0.124939
-0.355037	programs search for	-0.124939
-0.355037	binary search for	-0.124939
-0.445875	9.1. Time for	-0.124939
-0.445875	9.3. Time for	-0.124939
-1.125712	be mispredicted for	-0.124939
-0.502914	test tool for	-0.124939
-0.995786	is included for	-0.124939
-0.557178	and c2 for	-0.124939
-0.815380	processing unit for	-0.124939
-0.147246	calling conventions for	-0.124939
-0.014354	"Calling conventions for	-0.823909
-0.077041	Calling conventions for	-0.124939
-0.558364	to account for	-0.124939
-0.813238	different algorithms for	-0.124939
-0.430663	a PLT for	-0.124939
-0.814666	and PLT for	-0.124939
-0.428896	only once for	-0.124939
-0.428896	Compile once for	-0.124939
-0.135276	is designed for	-0.124939
-0.332810	never designed for	-0.124939
-0.554815	The inputs for	-0.124939
-0.557178	limiting factors for	-0.124939
-0.135276	is required for	-0.124939
-0.332810	if required for	-0.124939
-0.817532	a GOT for	-0.124939
-0.551128	perform poorly for	-0.124939
-0.548466	not suitable for	-0.124939
-0.552466	// Template for	-0.124939
-0.552466	i, a[100]; for	-0.124939
-0.804180	32-bit mode, for	-0.124939
-0.045556	page 130 for	-0.249877
-0.409923	and 120 for	-0.124939
-0.409923	page 120 for	-0.124939
-0.549795	some changes for	-0.124939
-0.549795	and again for	-0.124939
-0.548466	optimization capabilities for	-0.124939
-0.017614	is waiting for	-0.124939
-0.035973	are waiting for	-0.124939
-0.017614	time waiting for	-0.425969
-0.035973	often waiting for	-0.124939
-0.035973	while waiting for	-0.124939
-0.552466	The rules for	-0.124939
-0.045764	to wait for	-0.249877
-0.386916	the principle for	-0.124939
-0.386916	this principle for	-0.124939
-0.925855	be expected for	-0.124939
-0.543191	C++ Performance for	-0.124939
-0.797853	be convenient for	-0.124939
-0.541665	and c1 for	-0.124939
-0.789682	page 87 for	-0.124939
-0.280508	not permissible for	-0.425969
-1.058987	= 1.0; for	-0.124939
-0.543191	rare. Testing for	-0.124939
-0.546261	off requirements for	-0.124939
-0.340317	device drivers for	-0.124939
-0.141962	page 122 for	-0.425969
-0.535913	and bc for	-0.124939
-0.355433	and searching for	-0.124939
-0.355433	time searching for	-0.124939
-0.051240	biggest vectors: for	-0.124939
-0.012254	eight-element vectors: for	-0.726999
-0.534112	page 80 for	-0.124939
-0.354244	and 90 for	-0.124939
-0.354244	page 90 for	-0.124939
-0.773272	My recommendation for	-0.124939
-0.534112	// Only for	-0.124939
-0.532318	page 107 for	-0.124939
-0.532318	X Compilers for	-0.124939
-0.229207	object (except for	-0.124939
-0.229207	expressions (except for	-0.124939
-0.229207	iteration (except for	-0.124939
-0.534112	have facilities for	-0.124939
-0.518665	page 103 for	-0.124939
-0.518665	page 51 for	-0.124939
-0.757210	is preferable for	-0.124939
-0.749763	page 43 for	-0.124939
-0.110726	template specialization for	-0.602060
-0.518665	page 88 for	-0.124939
-0.937701	heap manager for	-0.124939
-0.518665	page 150 for	-0.124939
-0.015373	optimization guide for	-0.249877
-0.749763	development tools for	-0.124939
-0.128161	compiler documentation for	-0.124939
-0.518665	a directive for	-0.124939
-0.128161	memory area for	-0.124939
-0.518665	page 29 for	-0.124939
-0.518665	100 floats for	-0.124939
-0.491813	at www.agner.org/optimize/cppexamples.zip for	-0.124939
-0.310769	See www.agner.org/optimize/cppexamples.zip for	-0.124939
-0.518665	page 31 for	-0.124939
-0.520839	i; 45 for	-0.124939
-0.713207	page 49 for	-0.124939
-0.496823	page 101 for	-0.124939
-0.496823	page 93 for	-0.124939
-0.496823	and 119 for	-0.124939
-0.818490	memory blocks, for	-0.124939
-0.713207	square blocking for	-0.124939
-0.499581	= r; for	-0.124939
-0.496823	except perhaps for	-0.124939
-0.496823	cache organization for	-0.124939
-0.499581	for calculations: for	-0.124939
-0.499581	the basis for	-0.124939
-0.496823	page 81 for	-0.124939
-0.106090	page 89 for	-0.425969
-0.496823	page 153 for	-0.124939
-0.496823	page 140 for	-0.124939
-0.496823	page 141 for	-0.124939
-0.499581	used twice for	-0.124939
-0.496823	are competing for	-0.124939
-0.106090	not unusual for	-0.425969
-0.246775	10 ms for	-0.124939
-0.246775	30 ms for	-0.124939
-0.496823	Compiler Documentation for	-0.124939
-0.496823	PLT lookups for	-0.124939
-0.106090	page 78 for	-0.425969
-0.496823	the market for	-0.124939
-0.499581	in Day for	-0.124939
-0.202256	VIA CPUs" for	-0.425969
-0.713207	a variable, for	-0.124939
-0.499581	be sufficient for	-0.124939
-0.456180	accelerator card for	-0.124939
-0.456180	of accumulators for	-0.124939
-0.648204	be justified for	-0.124939
-0.456180	0, sum; for	-0.124939
-0.456180	the loop, for	-0.124939
-0.456180	innermost loop: for	-0.124939
-0.456180	i, StringLength; for	-0.124939
-0.456180	of it, for	-0.124939
-0.456180	i, a[2]; for	-0.124939
-0.456180	and 72 for	-0.124939
-0.141598	not suited for	-0.124939
-0.141598	best suited for	-0.124939
-0.456180	130 Compile for	-0.124939
-0.456180	various corrections for	-0.124939
-0.065054	Approximate exp(x) for	-0.425969
-0.648204	certain events, for	-0.124939
-0.456180	a proxy for	-0.124939
-0.456180	specific literature for	-0.124939
-0.456180	special precautions for	-0.124939
-0.456180	structures. Useful for	-0.124939
-0.456180	compiler warning for	-0.124939
-0.456180	row, column; for	-0.124939
-0.456180	24 dramatically for	-0.124939
-0.456180	and IDE's for	-0.124939
-0.648204	= 1.f; for	-0.124939
-0.141598	and fine-tuned for	-0.124939
-0.141598	are fine-tuned for	-0.124939
-0.456180	that waits for	-0.124939
-0.456180	is consistent for	-0.124939
-0.648204	an interpreter for	-0.124939
-0.456180	memory spaces for	-0.124939
-0.456180	all squares: for	-0.124939
-0.065054	not uncommon for	-0.124939
-0.456180	are unnecessary for	-0.124939
-0.456180	i; 84 for	-0.124939
-0.456180	register left for	-0.124939
-0.456180	much stronger for	-0.124939
-0.456180	The procedures for	-0.124939
-0.456180	use ~ for	-0.124939
-0.456180	sufficiently accurate for	-0.124939
-0.141598	will contend for	-0.124939
-0.141598	libraries contend for	-0.124939
-0.456180	+ B; for	-0.124939
-0.456180	Linux Optimize for	-0.124939
-0.456180	the subroutine for	-0.124939
-0.353058	specific preferences for	-0.124939
-0.353058	division. Correction for	-0.124939
-0.353058	similar utility for	-0.124939
-0.353058	the FAQ for	-0.124939
-0.353058	NUMROWS; row++) for	-0.124939
-0.353058	can be, for	-0.124939
-0.353058	this interval, for	-0.124939
-0.353058	two decimals, for	-0.124939
-0.353058	matrix cell for	-0.124939
-0.353058	list[100], *temp; for	-0.124939
-0.353058	Intel VTune, for	-0.124939
-0.353058	separate executables for	-0.124939
-0.353058	is maintained for	-0.124939
-0.353058	the IDE, for	-0.124939
-0.353058	as buffers for	-0.124939
-0.353058	limited audience for	-0.124939
-0.353058	A sourcebook for	-0.124939
-0.353058	different meaning for	-0.124939
-0.353058	inside sqaure: for	-0.124939
-0.353058	is delayed for	-0.124939
-0.353058	v. 11.1 for	-0.124939
-0.353058	E-book Usability for	-0.124939
-0.353058	e.g. .R. for	-0.124939
-0.353058	page 122) for	-0.124939
-0.353058	registers. Except for	-0.124939
-0.353058	be prepared for	-0.124939
-0.353058	to compensate for	-0.124939
-0.353058	is reserved for	-0.124939
-0.353058	long timediff[NumberOfTests]; for	-0.124939
-0.353058	a request for	-0.124939
-0.353058	Optimization Guide for	-0.124939
-0.353058	search requests for	-0.124939
-0.353058	static keyword, for	-0.124939
-0.353058	always compete for	-0.124939
-0.353058	expression. Assume, for	-0.124939
-0.353058	of attack for	-0.124939
-0.353058	or intranet for	-0.124939
-0.353058	been criticized for	-0.124939
-0.353058	math. Libraries for	-0.124939
-0.353058	} printf("\nResults:"); for	-0.124939
-0.353058	the standards for	-0.124939
-0.353058	optimize specifically for	-0.124939
-0.353058	dispatching 125 for	-0.124939
-0.353058	the possibilities for	-0.124939
-0.353058	very helpful for	-0.124939
-0.353058	function prototypes for	-0.124939
-0.353058	footprint. If, for	-0.124939
-0.353058	micro-operation breakdowns for	-0.124939
-0.353058	in parts, for	-0.124939
-0.353058	my blog for	-0.124939
-0.353058	and suggestions for	-0.124939
-0.353058	handle. Waiting for	-0.124939
-0.353058	detect opportunities for	-0.124939
-0.353058	as replacements for	-0.124939
-0.353058	exception handlers for	-0.124939
-0.353058	or /Fa for	-0.124939
-0.353058	be wired for	-0.124939
-0.353058	example 12.3a, for	-0.124939
-0.353058	a strategy for	-0.124939
-0.353058	handled separately: for	-0.124939
-0.353058	// Prototype for	-0.124939
-0.353058	compilers exist for	-0.124939
-0.353058	operating systems" for	-0.124939
-0.353058	v. 14.00 for	-0.124939
-0.353058	name _alloca) for	-0.124939
-0.353058	than doubled for	-0.124939
-0.353058	on correction for	-0.124939
-0.353058	by 5-10% for	-0.124939
-0.353058	Typical candidates for	-0.124939
-0.353058	is responsible for	-0.124939
-0.353058	newsgroup comp.lang.asm.x86 for	-0.124939
-0.353058	Intel's term for	-0.124939
-1.104934	code is that	-0.249877
-1.565646	compiler is that	-0.124939
-1.179099	this is that	-0.124939
-1.078414	data is that	-0.124939
-1.106596	point is that	-0.124939
-1.355600	cache is that	-0.124939
-1.325552	set is that	-0.124939
-0.981657	double is that	-0.124939
-1.228041	library is that	-0.124939
-1.519889	method is that	-0.124939
-1.294988	result is that	-0.124939
-1.282074	language is that	-0.124939
-0.835704	Linux is that	-0.124939
-1.321452	problem is that	-0.124939
-0.068442	disadvantage is that	-0.522879
-1.203097	parameter is that	-0.124939
-0.087382	reason is that	-0.249877
-0.567062	optimizations is that	-0.124939
-1.175324	linking is that	-0.124939
-0.981657	storage is that	-0.124939
-0.713190	here is that	-0.124939
-0.196891	contentions is that	-0.124939
-0.835704	inlining is that	-0.124939
-0.981657	containers is that	-0.124939
-0.567062	improved is that	-0.124939
-0.567062	binding is that	-0.124939
-0.567062	algorithms is that	-0.124939
-0.567062	volatile is that	-0.124939
-0.196891	notice is that	-0.124939
-0.567062	macros is that	-0.124939
-0.567062	consequence is that	-0.124939
-0.567062	assumption is that	-0.124939
-0.196891	conclusion is that	-0.124939
-0.567062	argument is that	-0.124939
-1.071571	history of that	-0.124939
-1.171246	faster and that	-0.124939
-1.398736	times and that	-0.124939
-0.593605	1 and that	-0.124939
-0.593605	dynamically and that	-0.124939
-0.886342	development, and that	-0.124939
-0.598270	style are that	-0.124939
-1.725791	the function that	-0.124939
-0.917110	a function that	-0.249877
-1.272041	The function that	-0.124939
-0.328857	A function that	-0.124939
-0.550348	const function that	-0.124939
-0.805173	every function that	-0.124939
-0.939506	graphics function that	-0.124939
-0.550348	Any function that	-0.124939
-1.088243	detection function that	-0.124939
-0.550348	error-handling function that	-0.124939
-1.670671	compatible with that	-0.124939
-0.889295	optimizations on that	-0.124939
-0.595108	effort on that	-0.124939
-1.088941	the code that	-0.124939
-0.534113	a code that	-0.124939
-0.914515	of code that	-0.124939
-1.433224	The code that	-0.124939
-0.344910	for code that	-0.124939
-0.344910	A code that	-0.124939
-0.518666	make code that	-0.124939
-0.518666	other code that	-0.124939
-0.518666	All code that	-0.124939
-0.518666	optimize code that	-0.124939
-0.518666	complicated code that	-0.124939
-0.518666	Any code that	-0.124939
-0.937703	loop-invariant code that	-0.124939
-0.518666	improving code that	-0.124939
-1.358500	the compiler that	-0.221849
-0.851067	a compiler that	-0.124939
-1.155622	A compiler that	-0.124939
-1.755228	The time that	-0.124939
-1.264976	same time that	-0.124939
-2.346887	to use that	-0.124939
-1.798591	of memory that	-0.124939
-1.521681	of data that	-0.124939
-0.583606	while data that	-0.124939
-0.583606	local data that	-0.124939
-1.188658	the program that	-0.124939
-1.413911	a program that	-0.124939
-0.897089	C++ program that	-0.124939
-0.462112	test program that	-0.124939
-0.532531	Windows program that	-0.124939
-0.532531	well-structured program that	-0.124939
-0.532531	antivirus program that	-0.124939
-0.231923	the functions that	-0.425969
-0.864143	of functions that	-0.124939
-0.489150	for functions that	-0.124939
-0.700650	if functions that	-0.124939
-0.489150	from functions that	-0.124939
-0.802551	other functions that	-0.124939
-1.120644	member functions that	-0.124939
-0.489150	several functions that	-0.124939
-0.489150	few functions that	-0.124939
-1.165173	mathematical functions that	-0.124939
-1.276138	of CPU that	-0.124939
-0.880860	an instruction that	-0.124939
-0.590803	slow instruction that	-0.124939
-1.237058	a loop that	-0.124939
-0.573623	another loop that	-0.124939
-1.395497	innermost loop that	-0.124939
-0.597521	generally used that	-0.124939
-0.175400	the one that	-0.124939
-1.115814	is one that	-0.124939
-0.677307	and one that	-0.124939
-1.319152	only one that	-0.124939
-0.531179	function, one that	-0.124939
-1.563379	a cache that	-0.124939
-1.874500	an integer that	-0.124939
-1.723842	instruction set that	-0.124939
-1.399695	the class that	-0.124939
-1.417236	or class that	-0.124939
-0.646350	container class that	-0.124939
-1.401581	the compilers that	-0.124939
-0.558120	for compilers that	-0.124939
-0.558120	on compilers that	-0.124939
-0.558120	from compilers that	-0.124939
-0.819249	these compilers that	-0.124939
-0.555647	data size that	-0.124939
-0.423916	integer size that	-0.425969
-1.008873	the library that	-0.124939
-0.771644	function library that	-0.124939
-0.992588	The object that	-0.124939
-1.092116	or object that	-0.124939
-1.606051	an object that	-0.124939
-0.922660	the version that	-0.124939
-0.191735	The version that	-0.124939
-0.543393	one version that	-0.124939
-0.543393	generic version that	-0.124939
-1.064246	the value that	-0.425969
-0.460939	a value that	-0.425969
-1.411529	and objects that	-0.124939
-0.863189	big objects that	-0.124939
-1.565405	the variable that	-0.124939
-1.540111	a variable that	-0.124939
-1.197610	A variable that	-0.124939
-0.214012	in so that	-0.124939
-0.214012	it so that	-0.124939
-0.214012	function so that	-0.124939
-0.093955	code so that	-0.124939
-0.093955	time so that	-0.124939
-0.214012	vector so that	-0.124939
-0.214012	different so that	-0.124939
-0.214012	example so that	-0.124939
-0.214012	call so that	-0.124939
-0.214012	less so that	-0.124939
-0.214012	bit so that	-0.124939
-0.214012	pointers so that	-0.124939
-0.214012	operations so that	-0.124939
-0.214012	calculations so that	-0.124939
-0.214012	threads so that	-0.124939
-0.214012	exception so that	-0.124939
-0.214012	mode so that	-0.124939
-0.214012	source so that	-0.124939
-0.214012	start so that	-0.124939
-0.214012	negative so that	-0.124939
-0.214012	section so that	-0.124939
-0.214012	statement so that	-0.124939
-0.214012	inlined so that	-0.124939
-0.214012	macro so that	-0.124939
-0.214012	100 so that	-0.124939
-0.214012	changed so that	-0.124939
-0.214012	identical so that	-0.124939
-0.214012	unrolling so that	-0.124939
-0.093955	organized so that	-0.425969
-0.214012	integer, so that	-0.124939
-0.214012	possible, so that	-0.124939
-0.214012	compact so that	-0.124939
-0.214012	above, so that	-0.124939
-0.214012	9.5 so that	-0.124939
-0.214012	12.4a so that	-0.124939
-0.214012	factorials so that	-0.124939
-0.214012	switches; so that	-0.124939
-0.214012	0x2C so that	-0.124939
-0.758353	the variables that	-0.124939
-0.758353	for variables that	-0.124939
-0.523689	that variables that	-0.124939
-0.523689	counter variables that	-0.124939
-0.523689	initialized variables that	-0.124939
-0.523689	uninitialized variables that	-0.124939
-1.734589	the table that	-0.124939
-0.594637	highest performance that	-0.124939
-1.189148	of software that	-0.124939
-0.547536	on software that	-0.124939
-0.800129	make software that	-0.124939
-0.547536	Security software that	-0.124939
-0.595130	so long that	-0.124939
-0.578955	a branch that	-0.301030
-0.861627	of branch that	-0.124939
-0.349798	A branch that	-0.249877
-0.479800	Remove branch that	-0.124939
-0.319711	a way that	-0.124939
-0.594666	interface elements that	-0.124939
-0.771816	memory address that	-0.425969
-1.356866	for example, that	-0.124939
-0.592783	logical register that	-0.124939
-0.554564	or libraries that	-0.124939
-1.154030	function libraries that	-0.425969
-0.592705	saving registers that	-0.124939
-0.593086	with pointers that	-0.124939
-0.883333	performance test that	-0.124939
-0.550886	all systems that	-0.124939
-1.067855	operating systems that	-0.124939
-0.096269	be sure that	-0.124939
-0.478297	are sure that	-0.124939
-0.263558	make sure that	-0.393784
-0.111962	makes sure that	-0.367977
-0.061753	making sure that	-0.124939
-0.938131	the method that	-0.124939
-0.549786	important method that	-0.124939
-0.549786	unfortunate method that	-0.124939
-1.317163	a file that	-0.124939
-1.693505	= 0 that	-0.124939
-1.553337	the type that	-0.124939
-1.443239	the case that	-0.124939
-0.568197	likely case that	-0.124939
-0.462524	on instructions that	-0.124939
-0.839448	vector instructions that	-0.124939
-0.462524	specific instructions that	-0.124939
-0.462524	single instructions that	-0.124939
-0.462524	few instructions that	-0.124939
-0.462524	certain instructions that	-0.124939
-0.462524	application-specific instructions that	-0.124939
-0.605248	the processors that	-0.124939
-0.392675	of processors that	-0.425969
-0.427029	on processors that	-0.425969
-0.428091	some processors that	-0.124939
-0.302090	first processors that	-0.425969
-0.605248	unknown processors that	-0.124939
-1.722783	a constant that	-0.124939
-0.590787	common error that	-0.124939
-0.927868	is important that	-0.301030
-0.565050	with CPUs that	-0.124939
-0.565050	all CPUs that	-0.124939
-0.589732	so large that	-0.124939
-1.067388	of arrays that	-0.124939
-0.829365	to arrays that	-0.124939
-0.563001	other work that	-0.124939
-0.563001	User work that	-0.124939
-1.171576	to avoid that	-0.124939
-0.590386	any processor that	-0.124939
-0.195701	so big that	-0.124939
-0.561763	for threads that	-0.124939
-1.307088	multiple threads that	-0.124939
-0.534214	a language that	-0.124939
-1.304282	programming language that	-0.124939
-0.534214	Any language that	-0.124939
-1.174054	a thread that	-0.124939
-0.560172	A thread that	-0.124939
-0.194982	so small that	-0.124939
-1.197735	the option that	-0.124939
-1.260423	an option that	-0.124939
-0.530519	any option that	-0.124939
-1.415930	container classes that	-0.124939
-0.886642	the line that	-0.124939
-0.941849	cache line that	-0.124939
-1.399786	Function parameters that	-0.124939
-1.412916	to check that	-0.124939
-0.557068	runtime check that	-0.124939
-1.305977	the problem that	-0.124939
-0.554250	causes problem that	-0.124939
-1.322147	efficient solution that	-0.124939
-0.801030	a container that	-0.124939
-0.361389	the advantage that	-0.124939
-0.518843	from operators that	-0.124939
-0.518843	all operators that	-0.124939
-0.518843	but operators that	-0.124939
-1.194481	is likely that	-0.124939
-0.752144	very likely that	-0.124939
-0.588620	parallel structure that	-0.124939
-1.303988	can calculate that	-0.124939
-1.052940	to copy that	-0.124939
-0.878271	CPUID information that	-0.124939
-0.813009	is certain that	-0.124939
-0.448894	be certain that	-0.124939
-0.813009	are certain that	-0.124939
-0.448894	quite certain that	-0.124939
-0.636918	almost certain that	-0.124939
-1.838790	clock cycles that	-0.124939
-0.547915	or addresses that	-0.124939
-0.547915	absolute addresses that	-0.124939
-0.872111	a counter that	-0.124939
-0.802729	loop count that	-0.425969
-0.587047	Temporary files that	-0.124939
-0.879925	therefore recommended that	-0.124939
-0.192193	so fast that	-0.124939
-0.192036	I write that	-0.425969
-0.898616	in programs that	-0.124939
-0.724011	for programs that	-0.124939
-0.503355	making programs that	-0.124939
-0.854560	the problems that	-0.124939
-0.668009	resource problems that	-0.124939
-0.468804	finding problems that	-0.124939
-0.844100	usability problems that	-0.124939
-1.030259	a microprocessor that	-0.124939
-0.877123	of branches that	-0.124939
-0.463944	making branches that	-0.124939
-0.463944	Unpredictable branches that	-0.124939
-0.463944	Predictable branches that	-0.124939
-0.540788	or operator that	-0.124939
-0.788131	casting operator that	-0.124939
-0.584578	second application that	-0.124939
-0.335784	can see that	-0.425969
-0.715974	will see that	-0.124939
-0.718701	the expression that	-0.124939
-0.575056	The expression that	-0.124939
-0.268838	an expression that	-0.425969
-0.575056	An expression that	-0.124939
-0.363808	Any expression that	-0.124939
-0.363808	loop-invariant expression that	-0.124939
-0.583631	so complicated that	-0.124939
-1.557993	data members that	-0.124939
-0.583508	a model that	-0.124939
-0.927194	memory block that	-0.124939
-0.583508	arbitrary name that	-0.124939
-0.223141	the disadvantage that	-0.602060
-0.110980	so high that	-0.602060
-1.467336	to zero that	-0.124939
-0.868883	allocated resources that	-0.124939
-0.586159	same reason that	-0.124939
-1.521924	CPU dispatcher that	-0.124939
-1.715945	the programmer that	-0.124939
-0.770411	in applications that	-0.124939
-0.530673	for applications that	-0.124939
-1.291902	dispatch mechanism that	-0.124939
-0.165181	function means that	-0.124939
-0.020086	This means that	-0.271067
-0.074784	variable means that	-0.425969
-0.165181	10 means that	-0.124939
-0.165181	function, means that	-0.124939
-0.165181	operands means that	-0.124939
-0.165181	here means that	-0.124939
-0.862017	write expressions that	-0.124939
-0.581476	preprocessing directives that	-0.124939
-0.693796	it requires that	-0.124939
-0.693796	This requires that	-0.124939
-0.432899	often requires that	-0.124939
-0.612497	method requires that	-0.124939
-0.477658	the optimizations that	-0.124939
-0.477658	compiler optimizations that	-0.124939
-0.682100	making optimizations that	-0.124939
-0.524760	software framework that	-0.124939
-0.879305	runtime framework that	-0.124939
-0.864889	old microprocessors that	-0.124939
-0.113394	to assume that	-0.124939
-0.065364	and assume that	-0.124939
-0.015442	can assume that	-0.249877
-0.065364	or assume that	-0.124939
-0.065364	you assume that	-0.124939
-0.133552	we assume that	-0.124939
-0.031453	cannot assume that	-0.124939
-0.065364	would assume that	-0.124939
-0.031453	generally assume that	-0.425969
-0.065364	makers assume that	-0.124939
-0.065364	safely assume that	-0.124939
-0.759154	table shows that	-0.124939
-0.524156	16) shows that	-0.124939
-0.337844	not know that	-0.124939
-0.475782	you know that	-0.124939
-0.337844	we know that	-0.124939
-0.475782	cannot know that	-0.124939
-0.337844	would know that	-0.124939
-0.337844	Intel) know that	-0.124939
-0.522122	other advantages that	-0.124939
-0.522122	specific advantages that	-0.124939
-1.385824	optimization options that	-0.124939
-0.665189	the feature that	-0.124939
-0.467019	special feature that	-0.124939
-0.467019	interposition feature that	-0.124939
-1.013267	default constructor that	-0.124939
-0.460931	may require that	-0.124939
-0.326985	operations require that	-0.124939
-0.326985	instructions require that	-0.124939
-0.326985	applications require that	-0.124939
-0.326985	profilers require that	-0.124939
-0.326985	MOVNTDQ require that	-0.124939
-0.514175	all modules that	-0.124939
-0.742147	.cpp modules that	-0.124939
-0.644662	of things that	-0.124939
-0.453901	three things that	-0.124939
-0.453901	reveal things that	-0.124939
-0.855546	the reductions that	-0.124939
-0.576156	each statement that	-0.124939
-0.730211	programming errors that	-0.124939
-0.507074	detecting errors that	-0.124939
-0.728904	other languages that	-0.124939
-0.985706	programming languages that	-0.124939
-1.207133	a profiler that	-0.124939
-0.574483	illegal operation that	-0.124939
-0.331809	the fact that	-0.124939
-0.062258	The fact that	-0.726999
-0.574665	to platforms that	-0.124939
-0.493750	single task that	-0.124939
-0.493750	Any task that	-0.124939
-0.574142	for constants that	-0.124939
-0.280625	a destructor that	-0.124939
-0.319119	code Assume that	-0.124939
-0.319119	} Assume that	-0.124939
-0.319119	access. Assume that	-0.124939
-0.319119	speed. Assume that	-0.124939
-0.319119	general. Assume that	-0.124939
-0.573099	first algorithm that	-0.124939
-0.504511	the possibility that	-0.301030
-0.357653	theoretical possibility that	-0.124939
-0.576003	this discussion that	-0.124939
-0.573749	The conditions that	-0.124939
-0.572627	an offset that	-0.124939
-0.105302	1. Note that	-0.124939
-0.105302	version. Note that	-0.124939
-0.105302	system. Note that	-0.124939
-0.105302	arrays. Note that	-0.124939
-0.105302	details. Note that	-0.124939
-0.105302	explanation. Note that	-0.124939
-0.105302	optimized. Note that	-0.124939
-0.105302	away. Note that	-0.124939
-0.105302	disassembler. Note that	-0.124939
-0.105302	131 Note that	-0.124939
-0.179194	the operand that	-0.425969
-0.572627	other tasks that	-0.124939
-0.256012	efficient. Variables that	-0.124939
-0.366925	storage Variables that	-0.124939
-0.256012	are: Variables that	-0.124939
-0.256012	storage. Variables that	-0.124939
-0.109408	9.4 Variables that	-0.425969
-0.178741	is clear that	-0.124939
-0.570866	occasionally predict that	-0.124939
-1.478277	clock frequency that	-0.124939
-0.570262	extra iteration that	-0.124939
-0.479528	or models that	-0.124939
-0.479528	newer models that	-0.124939
-0.988436	is true that	-0.124939
-0.843926	have names that	-0.124939
-0.570866	other details that	-0.124939
-0.480509	Another thing that	-0.124939
-0.480509	third thing that	-0.124939
-1.314064	data structures that	-0.124939
-0.838927	must consider that	-0.124939
-0.568795	time delay that	-0.124939
-0.568139	soft cores that	-0.124939
-0.567484	of ebx that	-0.124939
-0.563459	of statements that	-0.124939
-0.564173	zigzag course that	-0.124939
-0.657387	a risk that	-0.124939
-0.462060	higher risk that	-0.124939
-0.564173	loop buffer that	-0.124939
-0.460368	is something that	-0.124939
-0.460368	certainly something that	-0.124939
-0.742027	clock counts that	-0.124939
-0.458684	case" counts that	-0.124939
-0.798759	can happen that	-0.124939
-0.460368	often happen that	-0.124939
-0.963951	programming style that	-0.124939
-0.447745	function, provided that	-0.124939
-0.447745	branches, provided that	-0.124939
-0.356554	that everything that	-0.124939
-0.356554	sure everything that	-0.124939
-0.356554	eliminate everything that	-0.124939
-0.556949	Assume now that	-0.124939
-1.330491	into account that	-0.124939
-0.470130	the factors that	-0.124939
-0.135565	several factors that	-0.425969
-0.556949	compiler explicitly that	-0.124939
-0.551862	a measure that	-0.124939
-0.552855	disk. Software that	-0.124939
-0.551862	the trick that	-0.124939
-0.552855	some disadvantages that	-0.124939
-0.553851	multiple instances that	-0.124939
-0.550871	so expensive that	-0.124939
-0.558683	be aware that	-0.425969
-0.551862	runtime polymorphism that	-0.124939
-0.013692	the sense that	-0.301030
-1.066826	of course, that	-0.124939
-0.152044	will notice that	-0.124939
-0.929683	be expected that	-0.124939
-0.544032	can detect that	-0.124939
-0.545170	9.1 show that	-0.124939
-0.152044	C, specifying that	-0.124939
-1.149813	stack unwinding that	-0.124939
-0.535972	other cleanup that	-0.124939
-0.142339	9.3 Functions that	-0.425969
-0.500189	standard specifies that	-0.124939
-0.355472	keyword specifies that	-0.124939
-0.535972	dynamically. Arrays that	-0.124939
-0.534635	for lists that	-0.124939
-0.264353	the event that	-0.425969
-0.534635	collection. Objects that	-0.124939
-0.535972	areas. Data that	-0.124939
-0.537313	are frameworks that	-0.124939
-0.538659	keyword tells that	-0.124939
-0.535972	powerful facilities that	-0.124939
-0.538659	constant divisor that	-0.124939
-0.440403	to recommend that	-0.124939
-0.440403	textbooks recommend that	-0.124939
-0.520910	roughly estimate that	-0.124939
-0.522530	be said that	-0.124939
-0.065251	may think that	-0.124939
-0.065251	then think that	-0.124939
-0.065251	I think that	-0.124939
-0.065251	don't think that	-0.124939
-0.524156	efficient alternatives that	-0.124939
-0.520910	profiling tools that	-0.124939
-0.943617	hot spot that	-0.124939
-0.520910	variable lengths that	-0.124939
-0.522530	unfortunate consequence that	-0.124939
-0.128503	the assumption that	-0.124939
-0.049941	will recognize that	-0.301030
-0.522530	applications. Remember that	-0.124939
-0.520910	the considerations that	-0.124939
-0.240211	initialization routine that	-0.124939
-0.520910	one, auto_ptr that	-0.124939
-0.247573	are assuming that	-0.124939
-0.247573	from assuming that	-0.124939
-0.498957	optimize("a",on). Specifies that	-0.124939
-0.498957	ball reveals that	-0.124939
-0.498957	many labels that	-0.124939
-0.498957	instruction. Programmers that	-0.124939
-0.498957	function F2 that	-0.124939
-0.501012	it unusual that	-0.124939
-0.089378	systems. Applications that	-0.124939
-0.089378	interface. Applications that	-0.124939
-0.089378	141. Applications that	-0.124939
-0.822979	table (PLT) that	-0.124939
-0.498957	Many services that	-0.124939
-0.356062	// Check that	-0.124939
-0.247573	2. Check that	-0.124939
-0.202736	But beware that	-0.124939
-0.458124	Remember again, that	-0.124939
-0.142068	and discovered that	-0.124939
-0.142068	have discovered that	-0.124939
-0.458124	90% chance that	-0.124939
-0.458124	a chip that	-0.124939
-0.065251	I believe that	-0.124939
-0.142068	operations. Algorithms that	-0.124939
-0.142068	matrixes. Algorithms that	-0.124939
-0.458124	on hacks that	-0.124939
-0.142068	C; Assuming that	-0.124939
-0.142068	has. Assuming that	-0.124939
-0.142068	to note that	-0.124939
-0.142068	Please note that	-0.124939
-0.458124	random events that	-0.124939
-0.142068	expressions. Operations that	-0.124939
-0.142068	aliasing. Operations that	-0.124939
-0.142068	register. Factors that	-0.124939
-0.142068	is. Factors that	-0.124939
-0.458124	on servers that	-0.124939
-0.142068	optimization. Everything that	-0.124939
-0.142068	register. Everything that	-0.124939
-0.458124	may report that	-0.124939
-0.458124	to verify that	-0.124939
-0.458124	other complications that	-0.124939
-0.458124	therefore conclude that	-0.124939
-0.458124	up spaces that	-0.124939
-0.651232	more complex, that	-0.124939
-0.065251	only hope that	-0.124939
-0.651232	the ones that	-0.124939
-0.458124	program saying that	-0.124939
-0.458124	vectors. Code that	-0.124939
-0.651232	with certainty that	-0.124939
-0.458124	conditions. Programs that	-0.124939
-0.142068	standard says that	-0.124939
-0.142068	convention says that	-0.124939
-0.458124	then interpret that	-0.124939
-0.354589	not noticed that	-0.124939
-0.354589	making plug-ins that	-0.124939
-0.354589	speed exceeding that	-0.124939
-0.354589	programmer forgets that	-0.124939
-0.354589	into sub-vectors that	-0.124939
-0.354589	seem illogical that	-0.124939
-0.354589	virus scanner that	-0.124939
-0.354589	14.27 assumes that	-0.124939
-0.354589	is assumed that	-0.124939
-0.354589	so kludgy that	-0.124939
-0.354589	to remember that	-0.124939
-0.354589	file mathimf.h that	-0.124939
-0.354589	so-called iterators that	-0.124939
-0.354589	you discover that	-0.124939
-0.354589	common excuse that	-0.124939
-0.354589	the likelihood that	-0.124939
-0.354589	or Espresso) that	-0.124939
-0.354589	web browsing that	-0.124939
-0.354589	no guarantee that	-0.124939
-0.354589	planning stage that	-0.124939
-0.354589	so-called CPU-dispatcher that	-0.124939
-0.354589	later discovers that	-0.124939
-0.354589	to realize that	-0.124939
-0.354589	Thin clients that	-0.124939
-0.354589	be emphasized that	-0.124939
-0.354589	or *.so) that	-0.124939
-0.354589	strict formalism that	-0.124939
-0.354589	program dictates that	-0.124939
-0.354589	in mind, that	-0.124939
-0.354589	is unrealistic that	-0.124939
-0.354589	common subexpressions that	-0.124939
-0.354589	I guess, that	-0.124939
-0.354589	condition. Things that	-0.124939
-0.354589	compiler knows that	-0.124939
-0.354589	the wires that	-0.124939
-0.354589	developers feel that	-0.124939
-0.354589	82 Keywords that	-0.124939
-0.354589	let's say that	-0.124939
-0.354589	the complication that	-0.124939
-0.354589	to multithreading that	-0.124939
-0.354589	is unlikely that	-0.124939
-0.354589	may argue that	-0.124939
-0.354589	from knowing that	-0.124939
-0.354589	directives (everything that	-0.124939
-1.218911	a to be	-0.124939
-0.811447	function to be	-0.124939
-1.245555	code to be	-0.124939
-1.453812	compiler to be	-0.124939
-0.765476	have to be	-0.124939
-0.351997	this to be	-0.124939
-1.240558	memory to be	-0.124939
-0.597931	has to be	-0.124939
-0.981083	number to be	-0.124939
-0.534688	variable to be	-0.124939
-0.463487	variables to be	-0.124939
-0.902099	table to be	-0.124939
-0.777408	software to be	-0.124939
-0.569557	need to be	-0.191886
-1.109767	sure to be	-0.124939
-0.189784	out to be	-0.425969
-1.390749	want to be	-0.124939
-0.534688	line to be	-0.124939
-0.176017	parameters to be	-0.726999
-0.176017	known to be	-0.124939
-0.306499	likely to be	-0.124939
-0.394729	certain to be	-0.124939
-0.777408	addresses to be	-0.124939
-0.534688	files to be	-0.124939
-0.480775	needs to be	-0.124939
-0.534688	division to be	-0.124939
-1.424978	programmer to be	-0.124939
-0.902099	intended to be	-0.124939
-0.534688	heap to be	-0.124939
-0.777408	executable to be	-0.124939
-0.534688	sequence to be	-0.124939
-0.116916	happen to be	-0.301030
-0.626534	enough to be	-0.124939
-0.351997	expected to be	-0.124939
-0.233977	guaranteed to be	-0.124939
-0.534688	tools to be	-0.124939
-0.777408	going to be	-0.124939
-0.189784	appears to be	-0.124939
-0.534688	argument to be	-0.124939
-0.534688	dangers to be	-0.124939
-0.534688	happened to be	-0.124939
-0.102408	to can be	-0.124939
-0.342109	and can be	-0.124939
-0.178962	that can be	-0.182931
-0.241052	it can be	-0.157123
-0.292491	function can be	-0.124939
-0.165710	code can be	-0.124939
-0.114937	This can be	-0.372723
-0.236666	x can be	-0.124939
-0.383832	this can be	-0.124939
-0.236666	time can be	-0.124939
-0.236666	use can be	-0.124939
-0.373737	It can be	-0.301030
-0.342109	data can be	-0.124939
-0.342109	vector can be	-0.124939
-0.342109	same can be	-0.124939
-0.196145	functions can be	-0.124939
-0.236666	instruction can be	-0.124939
-0.383832	loop can be	-0.124939
-0.094527	which can be	-0.124939
-0.102408	integer can be	-0.124939
-0.383832	set can be	-0.124939
-0.236666	class can be	-0.124939
-0.102408	example can be	-0.124939
-0.580646	compilers can be	-0.124939
-0.102408	size can be	-0.124939
-0.065523	pointer can be	-0.301030
-0.236666	b can be	-0.124939
-0.150824	library can be	-0.124939
-0.065523	object can be	-0.124939
-0.138205	array can be	-0.124939
-0.106821	objects can be	-0.124939
-0.065523	variable can be	-0.124939
-0.196145	variables can be	-0.124939
-0.236666	2 can be	-0.124939
-0.342109	performance can be	-0.124939
-0.196145	branch can be	-0.124939
-0.236666	stored can be	-0.124939
-0.102408	address can be	-0.124939
-0.236666	bit can be	-0.124939
-0.102408	register can be	-0.124939
-0.102408	libraries can be	-0.425969
-0.102408	pointers can be	-0.124939
-0.342109	they can be	-0.124939
-0.048192	method can be	-0.425969
-0.236666	access can be	-0.124939
-0.102408	system can be	-0.124939
-0.236666	file can be	-0.124939
-0.236666	programming can be	-0.124939
-0.102408	operations can be	-0.124939
-0.236666	type can be	-0.124939
-0.383832	processors can be	-0.124939
-0.236666	available can be	-0.124939
-0.102408	constant can be	-0.124939
-0.236666	stack can be	-0.124939
-0.406300	CPUs can be	-0.124939
-0.236666	arrays can be	-0.124939
-0.236666	work can be	-0.124939
-0.236666	calls can be	-0.124939
-0.236666	result can be	-0.124939
-0.236666	bytes can be	-0.124939
-0.236666	inside can be	-0.124939
-0.102408	problem can be	-0.124939
-0.236666	list can be	-0.124939
-0.236666	hardware can be	-0.124939
-0.236666	information can be	-0.124939
-0.236666	... can be	-0.124939
-0.048192	counter can be	-0.425969
-0.342109	allocation can be	-0.124939
-0.236666	both can be	-0.124939
-0.236666	programs can be	-0.124939
-0.236666	space can be	-0.124939
-0.236666	dispatching can be	-0.124939
-0.236666	branches can be	-0.124939
-0.236666	multiplication can be	-0.124939
-0.102408	sets can be	-0.124939
-0.236666	implementation can be	-0.124939
-0.236666	handling can be	-0.124939
-0.236666	members can be	-0.124939
-0.236666	reference can be	-0.124939
-0.236666	keyword can be	-0.124939
-0.236666	lookup can be	-0.124939
-0.236666	applications can be	-0.124939
-0.196145	mechanism can be	-0.124939
-0.236666	numbers can be	-0.124939
-0.383832	union can be	-0.124939
-0.236666	constructor can be	-0.124939
-0.236666	section can be	-0.124939
-0.236666	contentions can be	-0.124939
-0.342109	conversions can be	-0.124939
-0.342109	statement can be	-0.124939
-0.236666	languages can be	-0.124939
-0.236666	costs can be	-0.124939
-0.236666	constants can be	-0.124939
-0.236666	offset can be	-0.124939
-0.236666	effect can be	-0.124939
-0.236666	counters can be	-0.124939
-0.236666	loading can be	-0.124939
-0.102408	condition can be	-0.124939
-0.236666	stride can be	-0.124939
-0.236666	metaprogramming can be	-0.124939
-0.236666	map can be	-0.124939
-0.102408	chains can be	-0.124939
-0.383832	tool can be	-0.124939
-0.236666	binding can be	-0.124939
-0.236666	DLL can be	-0.124939
-0.236666	tables can be	-0.124939
-0.236666	sections can be	-0.124939
-0.196145	They can be	-0.124939
-0.236666	exceptions can be	-0.124939
-0.236666	manuals can be	-0.124939
-0.236666	units can be	-0.124939
-0.236666	polynomial can be	-0.124939
-0.236666	13.1 can be	-0.124939
-0.236666	Pointers can be	-0.124939
-0.236666	behavior can be	-0.124939
-0.236666	edx can be	-0.124939
-0.236666	job can be	-0.124939
-0.236666	abc can be	-0.124939
-0.102408	limit can be	-0.124939
-0.236666	guidelines can be	-0.124939
-0.236666	estimate can be	-0.124939
-0.236666	Metaprogramming can be	-0.124939
-0.236666	12.4b can be	-0.124939
-0.236666	Integers can be	-0.124939
-0.236666	techniques can be	-0.124939
-0.236666	resolution can be	-0.124939
-0.236666	projects can be	-0.124939
-0.236666	14.28 can be	-0.124939
-0.236666	divisions can be	-0.124939
-0.236666	s3 can be	-0.124939
-0.236666	etc., can be	-0.124939
-0.236666	Strings can be	-0.124939
-0.236666	read-only can be	-0.124939
-0.236666	c+b can be	-0.124939
-0.236666	chip can be	-0.124939
-0.236666	formats can be	-0.124939
-0.236666	miss can be	-0.124939
-0.236666	Jumps can be	-0.124939
-0.236666	parentheses can be	-0.124939
-0.236666	(b+c) can be	-0.124939
-0.236666	dilemma can be	-0.124939
-0.236666	work-around can be	-0.124939
-0.236666	8.24 can be	-0.124939
-0.236666	(Examples can be	-0.124939
-0.236666	!a; can be	-0.124939
-0.236666	Zero can be	-0.124939
-0.147276	may not be	-0.271067
-0.637311	will not be	-0.124939
-0.540485	should not be	-0.425969
-0.526365	need not be	-0.124939
-0.958208	therefore not be	-0.124939
-0.526365	might not be	-0.124939
-0.436329	and may be	-0.124939
-0.178980	that may be	-0.249877
-0.147474	it may be	-0.257564
-0.238493	function may be	-0.124939
-0.436329	code may be	-0.124939
-0.347831	This may be	-0.124939
-0.832158	compiler may be	-0.124939
-0.308777	time may be	-0.124939
-0.247555	It may be	-0.234083
-0.517612	program may be	-0.124939
-0.517612	which may be	-0.124939
-0.308777	but may be	-0.124939
-0.308777	integer may be	-0.124939
-0.488794	compilers may be	-0.124939
-0.308777	pointer may be	-0.124939
-0.308777	library may be	-0.124939
-0.059090	there may be	-0.124939
-0.103077	There may be	-0.221849
-0.308777	variables may be	-0.124939
-0.436329	table may be	-0.124939
-0.436329	pointers may be	-0.124939
-0.308777	they may be	-0.124939
-0.127506	method may be	-0.124939
-0.308777	access may be	-0.124939
-0.308777	times may be	-0.124939
-0.308777	Windows may be	-0.124939
-0.308777	execution may be	-0.124939
-0.238493	processor may be	-0.124939
-0.308777	These may be	-0.124939
-0.308777	calculation may be	-0.124939
-0.308777	solution may be	-0.124939
-0.308777	count may be	-0.124939
-0.308777	allocation may be	-0.124939
-0.308777	multiplication may be	-0.124939
-0.308777	methods may be	-0.124939
-0.308777	reference may be	-0.124939
-0.308777	mechanism may be	-0.124939
-0.127506	constructor may be	-0.124939
-0.308777	modules may be	-0.124939
-0.308777	network may be	-0.124939
-0.436329	frequency may be	-0.124939
-0.308777	usability may be	-0.124939
-0.308777	They may be	-0.124939
-0.308777	compilation may be	-0.124939
-0.308777	Templates may be	-0.124939
-0.308777	tree may be	-0.124939
-0.308777	Bitfields may be	-0.124939
-0.308777	2.5 may be	-0.124939
-0.308777	tolerance may be	-0.124939
-0.560006	a will be	-0.124939
-0.940986	it will be	-0.124939
-0.671786	function will be	-0.124939
-0.461759	code will be	-0.124939
-0.879022	This will be	-0.124939
-0.839321	you will be	-0.124939
-0.560006	memory will be	-0.124939
-0.267404	program will be	-0.124939
-0.560006	cache will be	-0.124939
-0.397486	integer will be	-0.124939
-0.397486	class will be	-0.124939
-0.560006	b will be	-0.124939
-0.397486	there will be	-0.124939
-0.397486	There will be	-0.124939
-0.397486	array will be	-0.124939
-0.397486	objects will be	-0.124939
-0.397486	variables will be	-0.124939
-0.560006	branch will be	-0.124939
-0.560006	user will be	-0.124939
-0.096834	result will be	-0.124939
-0.397486	speed will be	-0.124939
-0.397486	line will be	-0.124939
-0.397486	multiplication will be	-0.124939
-0.397486	caching will be	-0.124939
-0.397486	section will be	-0.124939
-0.560006	main will be	-0.124939
-0.560006	operation will be	-0.124939
-0.397486	vectorization will be	-0.124939
-0.397486	constants will be	-0.124939
-0.397486	instances will be	-0.124939
-0.397486	0x273F will be	-0.124939
-0.397486	3.5 will be	-0.124939
-0.397486	today will be	-0.124939
-0.397486	modifier will be	-0.124939
-0.397486	b+c will be	-0.124939
-0.397486	103) will be	-0.124939
-0.397486	b*2.0/3.0 will be	-0.124939
-1.055386	can then be	-0.124939
-0.593982	will then be	-0.124939
-0.582260	can only be	-0.425969
-0.648725	not only be	-0.425969
-0.569544	should only be	-0.124939
-0.599324	would all be	-0.124939
-0.326217	it should be	-0.124939
-0.366369	code should be	-0.124939
-0.224151	This should be	-0.124939
-0.472345	you should be	-0.124939
-0.242133	It should be	-0.124939
-0.366369	program should be	-0.124939
-0.224151	class should be	-0.124939
-0.224151	example should be	-0.124939
-0.224151	b should be	-0.124939
-0.326217	There should be	-0.124939
-0.097772	array should be	-0.124939
-0.564020	You should be	-0.124939
-0.224151	table should be	-0.124939
-0.224151	performance should be	-0.124939
-0.224151	software should be	-0.124939
-0.224151	branch should be	-0.124939
-0.366369	test should be	-0.124939
-0.224151	systems should be	-0.124939
-0.224151	method should be	-0.124939
-0.224151	constant should be	-0.124939
-0.097772	arrays should be	-0.124939
-0.097772	versions should be	-0.124939
-0.224151	bytes should be	-0.124939
-0.224151	etc. should be	-0.124939
-0.224151	programs should be	-0.124939
-0.224151	problems should be	-0.124939
-0.224151	dispatching should be	-0.124939
-0.224151	parameter should be	-0.124939
-0.224151	resources should be	-0.124939
-0.046140	together should be	-0.726999
-0.224151	results should be	-0.124939
-0.224151	storage should be	-0.124939
-0.224151	lines should be	-0.124939
-0.224151	output should be	-0.124939
-0.326217	containers should be	-0.124939
-0.097772	updates should be	-0.124939
-0.224151	penalty should be	-0.124939
-0.224151	counts should be	-0.124939
-0.326217	developers should be	-0.124939
-0.224151	guidelines should be	-0.124939
-0.097772	queue should be	-0.425969
-0.224151	considerations should be	-0.124939
-0.224151	modified should be	-0.124939
-0.224151	feedback should be	-0.124939
-0.224151	calculations, should be	-0.124939
-0.224151	formats should be	-0.124939
-0.224151	servers should be	-0.124939
-0.224151	wide, should be	-0.124939
-0.224151	function) should be	-0.124939
-0.224151	Patches should be	-0.124939
-0.224151	complaints should be	-0.124939
-0.224151	scheme should be	-0.124939
-0.224151	imprecisions should be	-0.124939
-0.224151	file) should be	-0.124939
-0.224151	malloc/free should be	-0.124939
-0.116198	can also be	-0.263241
-0.398590	may also be	-0.124939
-0.306350	should also be	-0.124939
-0.436675	might also be	-0.124939
-0.597326	all software be	-0.124939
-0.292744	or cannot be	-0.124939
-0.339415	it cannot be	-0.124939
-0.491968	function cannot be	-0.124939
-0.292744	code cannot be	-0.124939
-0.586988	you cannot be	-0.124939
-0.292744	instruction cannot be	-0.124939
-0.292744	which cannot be	-0.124939
-0.292744	size cannot be	-0.124939
-0.292744	object cannot be	-0.124939
-0.292744	variable cannot be	-0.124939
-0.633195	You cannot be	-0.124939
-0.292744	address cannot be	-0.124939
-0.292744	optimization cannot be	-0.124939
-0.339415	they cannot be	-0.124939
-0.292744	cases cannot be	-0.124939
-0.292744	times cannot be	-0.124939
-0.292744	CPUs cannot be	-0.124939
-0.292744	work cannot be	-0.124939
-0.292744	name cannot be	-0.124939
-0.292744	resources cannot be	-0.124939
-0.292744	lookup cannot be	-0.124939
-0.292744	linking cannot be	-0.124939
-0.292744	operands cannot be	-0.124939
-0.292744	loaded cannot be	-0.124939
-0.596991	neverthe- less be	-0.124939
-0.398043	can often be	-0.124939
-0.918324	will often be	-0.124939
-0.573961	can even be	-0.124939
-0.999813	may even be	-0.124939
-0.889734	should always be	-0.124939
-1.164748	most cases be	-0.124939
-0.997326	some cases be	-0.124939
-0.236246	a must be	-0.124939
-0.236246	that must be	-0.124939
-0.502588	you must be	-0.124939
-0.236246	It must be	-0.124939
-0.236246	instruction must be	-0.124939
-0.236246	do must be	-0.124939
-0.236246	i must be	-0.124939
-0.236246	object must be	-0.124939
-0.236246	bit must be	-0.124939
-0.236246	they must be	-0.124939
-0.236246	threads must be	-0.124939
-0.236246	sign must be	-0.124939
-0.236246	framework must be	-0.124939
-0.236246	vectors must be	-0.124939
-0.236246	constructor must be	-0.124939
-0.236246	errors must be	-0.124939
-0.236246	index must be	-0.124939
-0.236246	SIZE must be	-0.124939
-0.236246	pragmas must be	-0.124939
-0.236246	any, must be	-0.124939
-0.236246	correctness must be	-0.124939
-0.427042	can therefore be	-0.425969
-0.683822	will therefore be	-0.124939
-0.603133	should therefore be	-0.124939
-1.254686	the container be	-0.124939
-0.103978	it would be	-0.124939
-0.347587	compiler would be	-0.124939
-0.347587	this would be	-0.124939
-0.240958	loop would be	-0.124939
-0.240958	which would be	-0.124939
-0.240958	variable would be	-0.124939
-0.240958	line would be	-0.124939
-0.240958	parameters would be	-0.124939
-0.240958	solution would be	-0.124939
-0.240958	metaprogramming would be	-0.124939
-0.347587	reduction would be	-0.124939
-0.240958	otherwise would be	-0.124939
-0.240958	logarithm would be	-0.124939
-0.240958	sizeof(S1) would be	-0.124939
-0.592115	most likely be	-0.124939
-0.239117	may preferably be	-0.124939
-0.025236	should preferably be	-0.367977
-0.144033	therefore preferably be	-0.124939
-0.181129	can never be	-0.124939
-0.820195	will never be	-0.124939
-0.871283	may actually be	-0.124939
-1.390229	in fact be	-0.124939
-0.142780	can sometimes be	-0.124939
-0.394421	can still be	-0.124939
-0.430619	would still be	-0.124939
-0.384198	can possibly be	-0.124939
-0.343352	may possibly be	-0.124939
-0.343352	could possibly be	-0.124939
-0.622052	of course be	-0.124939
-0.531315	it might be	-0.124939
-0.377549	this might be	-0.124939
-0.377549	It might be	-0.124939
-0.358626	function could be	-0.124939
-0.358626	portability could be	-0.124939
-0.358626	r+i/2 could be	-0.124939
-0.562311	can now be	-0.124939
-0.562430	can easily be	-0.124939
-0.136265	cannot easily be	-0.124939
-0.804796	will soon be	-0.124939
-0.064181	should definitely be	-0.301030
-0.526552	} Can be	-0.124939
-0.504319	can probably be	-0.124939
-0.463002	which can't be	-0.124939
-0.358428	some day be	-0.124939
-0.358428	Dispatcher. Will be	-0.124939
-1.667225	point to are	-0.124939
-0.892754	stack and are	-0.124939
-0.596861	space and are	-0.124939
-0.596861	evaluate and are	-0.124939
-1.333167	code that are	-0.124939
-0.346541	data that are	-0.124939
-1.164120	program that are	-0.124939
-0.295815	functions that are	-0.301030
-0.998340	compilers that are	-0.124939
-0.756013	objects that are	-0.124939
-0.397732	variables that are	-0.301030
-0.346541	libraries that are	-0.124939
-0.522325	pointers that are	-0.124939
-1.065011	instructions that are	-0.124939
-0.756013	CPUs that are	-0.124939
-0.756013	arrays that are	-0.124939
-0.522325	parameters that are	-0.124939
-0.932104	programs that are	-0.124939
-0.947372	branches that are	-0.124939
-0.522325	members that are	-0.124939
-0.522325	constants that are	-0.124939
-0.522325	tasks that are	-0.124939
-0.053844	Variables that are	-0.301030
-0.186959	Functions that are	-0.425969
-0.522325	Arrays that are	-0.124939
-0.522325	lists that are	-0.124939
-0.522325	Objects that are	-0.124939
-0.522325	Data that are	-0.124939
-0.522325	lengths that are	-0.124939
-0.522325	considerations that are	-0.124939
-0.873816	Applications that are	-0.124939
-0.756013	Algorithms that are	-0.124939
-0.522325	events that are	-0.124939
-0.186959	Operations that are	-0.124939
-0.522325	spaces that are	-0.124939
-0.522325	ones that are	-0.124939
-0.522325	iterators that are	-0.124939
-1.672202	a function are	-0.425969
-0.881611	any function are	-0.124939
-0.591188	overloaded function are	-0.124939
-1.873312	the code are	-0.124939
-1.262826	program code are	-0.124939
-1.250803	critical code are	-0.124939
-1.253632	that you are	-0.124939
-0.523342	if you are	-0.301030
-0.680405	code you are	-0.124939
-0.901698	as you are	-0.124939
-0.680405	compiler you are	-0.124939
-0.476795	when you are	-0.301030
-1.495111	then you are	-0.124939
-0.551136	If you are	-0.191886
-0.777107	sure you are	-0.124939
-0.680405	whether you are	-0.124939
-0.476795	unless you are	-0.301030
-0.856497	what you are	-0.124939
-0.680405	library, you are	-0.124939
-0.730151	the data are	-0.204120
-0.826461	and data are	-0.124939
-0.517205	that data are	-0.124939
-0.747280	if data are	-0.124939
-0.185771	when data are	-0.425969
-0.517205	which data are	-0.124939
-0.862398	static data are	-0.124939
-1.624194	the program are	-0.124939
-1.009166	C++ program are	-0.124939
-1.031273	mode program are	-0.124939
-0.875821	the functions are	-0.124939
-0.655488	The functions are	-0.124939
-0.171965	two functions are	-0.124939
-1.027150	member functions are	-0.124939
-0.746191	these functions are	-0.124939
-0.460847	Some functions are	-0.124939
-0.460847	important functions are	-0.124939
-0.171965	These functions are	-0.124939
-0.835465	virtual functions are	-0.124939
-0.689773	mathematical functions are	-0.124939
-0.460847	Library functions are	-0.124939
-0.746191	Virtual functions are	-0.124939
-0.415052	Intrinsic functions are	-0.124939
-0.655488	Fastcall functions are	-0.124939
-0.460847	Small functions are	-0.124939
-0.460847	Sometimes, functions are	-0.124939
-0.460847	Leaf functions are	-0.124939
-0.795374	each other are	-0.425969
-2.281351	the loop are	-0.124939
-1.026226	and which are	-0.124939
-0.550306	functions which are	-0.124939
-0.550306	available which are	-0.124939
-0.550306	directives which are	-0.124939
-0.550306	conditions which are	-0.124939
-0.550306	registers, which are	-0.124939
-0.550306	comparisons, which are	-0.124939
-0.597979	order but are	-0.124939
-1.325547	data cache are	-0.124939
-1.494117	level-2 cache are	-0.124939
-1.258600	level-1 cache are	-0.124939
-2.376596	instruction set are	-0.124939
-1.061972	a class are	-0.124939
-1.237305	child class are	-0.124939
-0.638452	derived class are	-0.124939
-1.158327	the compilers are	-0.124939
-0.510411	The compilers are	-0.124939
-0.490231	all compilers are	-0.124939
-0.490231	64-bit compilers are	-0.124939
-1.281202	C++ compilers are	-0.124939
-0.702411	Gnu compilers are	-0.124939
-1.364938	Some compilers are	-0.124939
-0.490231	common compilers are	-0.124939
-0.490231	few compilers are	-0.124939
-0.490231	Watcom compilers are	-0.124939
-0.490231	Current compilers are	-0.124939
-0.490231	Few compilers are	-0.124939
-0.597109	256-bit size are	-0.124939
-0.819713	and b are	-0.124939
-1.182147	each object are	-0.124939
-0.292670	and there are	-0.301030
-0.296830	that there are	-0.204120
-0.357097	if there are	-0.380211
-0.404024	than there are	-0.124939
-0.452510	because there are	-0.124939
-0.253542	If there are	-0.124939
-0.349481	but there are	-0.124939
-0.404024	where there are	-0.124939
-0.224542	But there are	-0.124939
-0.120460	However, there are	-0.124939
-0.705054	cases, there are	-0.124939
-0.284468	Here, there are	-0.124939
-0.284468	Fortunately, there are	-0.124939
-0.404024	Typically, there are	-0.124939
-0.284468	avoided, there are	-0.124939
-0.091733	compiler There are	-0.425969
-0.208179	code. There are	-0.124939
-0.091733	time. There are	-0.124939
-0.208179	function. There are	-0.124939
-0.208179	etc. There are	-0.124939
-0.208179	efficient. There are	-0.124939
-0.208179	below. There are	-0.124939
-0.306110	processors. There are	-0.124939
-0.208179	vectors There are	-0.124939
-0.208179	resources. There are	-0.124939
-0.208179	calls. There are	-0.124939
-0.208179	it. There are	-0.124939
-0.208179	registers. There are	-0.124939
-0.208179	performance. There are	-0.124939
-0.208179	instructions. There are	-0.124939
-0.208179	address. There are	-0.124939
-0.208179	not. There are	-0.124939
-0.208179	enabled. There are	-0.124939
-0.208179	expressions. There are	-0.124939
-0.208179	arrays. There are	-0.124939
-0.208179	vectors. There are	-0.124939
-0.208179	core. There are	-0.124939
-0.208179	threads. There are	-0.124939
-0.208179	manual. There are	-0.124939
-0.208179	8. There are	-0.124939
-0.208179	*.so). There are	-0.124939
-0.208179	optimal. There are	-0.124939
-0.208179	maintenance There are	-0.124939
-0.208179	explicitly. There are	-0.124939
-0.208179	Windows). There are	-0.124939
-0.208179	CodeAnalyst. There are	-0.124939
-0.208179	way: There are	-0.124939
-0.208179	security. There are	-0.124939
-0.208179	limited. There are	-0.124939
-0.208179	0x1C. There are	-0.124939
-0.208179	copying. There are	-0.124939
-0.208179	post-increment. There are	-0.124939
-0.208179	check. There are	-0.124939
-0.208179	tables". There are	-0.124939
-0.208179	power. There are	-0.124939
-0.208179	-abs(x);. There are	-0.124939
-0.208179	uses. There are	-0.124939
-0.208179	2B. There are	-0.124939
-0.208179	normally. There are	-0.124939
-0.208179	point). There are	-0.124939
-0.597094	the objects are	-0.124939
-0.998837	and objects are	-0.124939
-0.628140	The objects are	-0.124939
-0.104130	If objects are	-0.301030
-0.403049	allocated objects are	-0.124939
-0.938224	shared objects are	-0.124939
-0.443180	local objects are	-0.124939
-0.443180	so-called objects are	-0.124939
-1.172208	Shared objects are	-0.124939
-0.628140	composite objects are	-0.124939
-1.007500	that we are	-0.124939
-0.438668	time we are	-0.124939
-0.399959	because we are	-0.124939
-0.438668	where we are	-0.124939
-0.438668	example, we are	-0.124939
-0.438668	examples we are	-0.124939
-0.438668	While we are	-0.124939
-0.621250	Then we are	-0.124939
-0.438668	7.4 we are	-0.124939
-0.438668	since we are	-0.124939
-0.438668	Similarly, we are	-0.124939
-0.438668	14.7b, we are	-0.124939
-0.438668	Next, we are	-0.124939
-0.716851	used variables are	-0.124939
-1.064047	register variables are	-0.124939
-0.499033	how variables are	-0.124939
-0.321688	Boolean variables are	-0.301030
-1.044988	Induction variables are	-0.124939
-0.716851	Global variables are	-0.124939
-1.748119	the table are	-0.124939
-0.594389	multi-threaded software are	-0.124939
-0.393432	the elements are	-0.425969
-0.502250	when elements are	-0.124939
-0.502250	many elements are	-0.124939
-0.502250	container elements are	-0.124939
-1.192911	objects stored are	-0.124939
-1.776106	sign bit are	-0.124939
-1.472874	to optimization are	-0.124939
-0.509210	function libraries are	-0.204120
-0.568078	Intel libraries are	-0.124939
-0.547556	dynamic libraries are	-0.425969
-0.568078	standard libraries are	-0.124939
-0.403021	Dynamic libraries are	-0.124939
-0.403021	purpose libraries are	-0.124939
-0.403021	LIBM libraries are	-0.124939
-0.891407	in registers are	-0.124939
-0.730437	vector registers are	-0.124939
-0.662997	point registers are	-0.124939
-0.096047	stack registers are	-0.301030
-0.220125	XMM registers are	-0.301030
-0.623232	YMM registers are	-0.124939
-0.534450	if pointers are	-0.124939
-1.035683	member pointers are	-0.124939
-0.776992	smart pointers are	-0.124939
-0.980418	Smart pointers are	-0.124939
-0.586701	operating systems are	-0.221849
-0.844019	because these are	-0.124939
-0.844019	but these are	-0.124939
-0.237514	and they are	-0.124939
-0.421654	that they are	-0.124939
-0.179845	function they are	-0.124939
-0.102718	if they are	-0.124939
-0.179845	- they are	-0.124939
-0.179845	time they are	-0.124939
-0.038469	when they are	-0.124939
-0.275072	because they are	-0.124939
-0.168358	which they are	-0.249877
-0.080679	but they are	-0.124939
-0.270892	where they are	-0.124939
-0.179845	before they are	-0.124939
-0.179845	how they are	-0.124939
-0.179845	cases they are	-0.124939
-0.270892	whether they are	-0.124939
-0.179845	programs they are	-0.124939
-0.179845	unless they are	-0.124939
-0.179845	whenever they are	-0.124939
-1.338381	memory access are	-0.124939
-1.397648	oriented programming are	-0.124939
-1.386423	64 bits are	-0.124939
-0.859236	vector operations are	-0.124939
-0.823182	point operations are	-0.124939
-0.889490	integer operations are	-0.124939
-0.419872	these operations are	-0.124939
-0.161056	Integer operations are	-0.124939
-0.100489	Vector operations are	-0.124939
-0.592946	arithmetic operations are	-0.124939
-0.593434	These cases are	-0.124939
-0.463554	where instructions are	-0.124939
-0.463554	Some instructions are	-0.124939
-0.319416	These instructions are	-0.124939
-0.416877	write instructions are	-0.425969
-0.751427	executing instructions are	-0.124939
-0.464422	vector processors are	-0.124939
-0.661098	different processors are	-0.124939
-0.661098	Intel processors are	-0.124939
-0.661098	AMD processors are	-0.124939
-0.464422	x86 processors are	-0.124939
-0.753114	PC processors are	-0.124939
-0.464422	Newer processors are	-0.124939
-0.235625	Modern CPUs are	-0.124939
-0.162842	the arrays are	-0.271067
-0.391693	when arrays are	-0.124939
-0.391693	If arrays are	-0.124939
-0.391693	Linear arrays are	-0.124939
-1.270109	for Windows are	-0.124939
-1.708616	function calls are	-0.124939
-0.940723	the calculations are	-0.124939
-0.703427	address calculations are	-0.124939
-0.703427	precision calculations are	-0.124939
-0.490854	All calculations are	-0.124939
-0.490854	certain calculations are	-0.124939
-0.190901	New versions are	-0.425969
-0.539654	trial versions are	-0.124939
-0.899952	the threads are	-0.124939
-0.797948	different threads are	-0.124939
-0.996081	multiple threads are	-0.124939
-0.979133	two threads are	-0.124939
-0.486907	high-priority threads are	-0.124939
-1.264716	and c are	-0.124939
-0.561183	calls. These are	-0.124939
-0.561183	efficiency. These are	-0.124939
-0.880781	or thread are	-0.124939
-0.822888	functions, etc. are	-0.124939
-0.560110	Monday, etc. are	-0.124939
-0.881535	two integers are	-0.124939
-1.197254	vector classes are	-0.124939
-1.147618	container classes are	-0.124939
-0.966018	Container classes are	-0.124939
-0.442744	the parameters are	-0.124939
-0.418364	function parameters are	-0.124939
-0.111177	point parameters are	-0.124939
-0.210753	integer parameters are	-0.425969
-0.201367	template parameters are	-0.124939
-0.373361	four parameters are	-0.124939
-0.197406	Function parameters are	-0.425969
-0.260989	macro parameters are	-0.124939
-1.401078	this problem are	-0.124939
-0.590381	STL container are	-0.124939
-0.752317	The operators are	-0.124939
-0.792319	bitwise operators are	-0.124939
-1.264911	or structure are	-0.124939
-0.342893	The values are	-0.124939
-0.514165	key values are	-0.124939
-0.627314	the addresses are	-0.124939
-0.478037	Function addresses are	-0.124939
-0.858799	relative addresses are	-0.124939
-0.477483	library files are	-0.124939
-0.477483	intermediate files are	-0.124939
-0.778875	source files are	-0.124939
-0.681820	header files are	-0.124939
-0.873631	if both are	-0.124939
-0.874092	these problems are	-0.124939
-0.541993	the branches are	-0.124939
-0.541993	dispatch branches are	-0.124939
-1.142420	and multiplication are	-0.124939
-0.933890	instruction sets are	-0.425969
-0.586411	its members are	-0.124939
-0.387379	these methods are	-0.124939
-0.593810	These methods are	-0.124939
-0.420451	development methods are	-0.124939
-0.420451	similar methods are	-0.124939
-0.585370	of development are	-0.124939
-0.415060	which resources are	-0.124939
-0.415060	all resources are	-0.124939
-0.585798	allocated resources are	-0.124939
-0.415060	shared resources are	-0.124939
-0.661799	network resources are	-0.124939
-0.583487	such applications are	-0.124939
-0.899990	The examples are	-0.124939
-0.775824	these examples are	-0.124939
-0.586263	protection means are	-0.124939
-0.583143	and || are	-0.124939
-0.582902	Integer expressions are	-0.124939
-0.613773	these directives are	-0.124939
-0.433742	These directives are	-0.124939
-0.613773	#define directives are	-0.124939
-0.613773	#if directives are	-0.124939
-1.128942	.NET framework are	-0.124939
-0.502627	of microprocessors are	-0.124939
-0.357218	Intel microprocessors are	-0.124939
-0.357218	way microprocessors are	-0.124939
-0.689326	modern microprocessors are	-0.124939
-0.342475	Modern microprocessors are	-0.124939
-0.850891	the numbers are	-0.124939
-1.017345	point numbers are	-0.124939
-0.674800	model numbers are	-0.124939
-0.648599	used together are	-0.124939
-0.581824	YMM vectors are	-0.124939
-0.864722	and r are	-0.124939
-0.593231	the results are	-0.124939
-0.298067	The results are	-0.124939
-0.715266	intermediate results are	-0.124939
-0.862063	variable storage are	-0.124939
-1.154792	optimization options are	-0.124939
-0.520439	certain options are	-0.124939
-0.503023	the operands are	-0.124939
-1.015101	the modules are	-0.124939
-1.276638	algebraic reductions are	-0.124939
-1.120452	and references are	-0.124939
-0.579893	and C are	-0.124939
-1.115721	These conversions are	-0.124939
-0.707298	programming languages are	-0.124939
-0.393044	high-level languages are	-0.124939
-0.393044	Interpreted languages are	-0.124939
-0.393044	Low-level languages are	-0.124939
-1.318955	the STL are	-0.124939
-0.577021	These lines are	-0.124939
-0.851099	the output are	-0.124939
-0.854621	These costs are	-0.124939
-0.449827	the constants are	-0.124939
-0.130782	point constants are	-0.124939
-0.449827	two constants are	-0.124939
-0.318800	Integer constants are	-0.124939
-1.002298	Text strings are	-0.124939
-0.125835	following conditions are	-0.124939
-0.303724	certain conditions are	-0.124939
-0.303724	caching conditions are	-0.124939
-0.303724	Copyright conditions are	-0.124939
-0.848847	standard tasks are	-0.124939
-0.998759	and child are	-0.124939
-1.267647	monitor counters are	-0.124939
-1.008817	function names are	-0.124939
-0.481507	Function names are	-0.124939
-0.572271	Further details are	-0.124939
-0.198179	the rows are	-0.301030
-0.570981	or structures are	-0.124939
-0.568729	frequent updates are	-0.124939
-0.569662	multiple cores are	-0.124939
-0.571534	alternative implementations are	-0.124939
-0.323102	different sizes are	-0.425969
-0.569662	Open BSD are	-0.124939
-0.391766	metaprogramming, loops are	-0.124939
-0.153129	Nested loops are	-0.425969
-0.565532	Switch statements are	-0.124939
-0.833813	class templates are	-0.124939
-0.568084	in sequence are	-0.124939
-0.979012	clock counts are	-0.124939
-0.567062	and map are	-0.124939
-0.562218	writing style are	-0.124939
-0.311791	all destructors are	-0.124939
-1.163362	parameter transfer are	-0.124939
-0.568178	the diagonal are	-0.425969
-0.563341	special purposes are	-0.124939
-0.563341	language. Here are	-0.124939
-1.050004	branch prediction are	-0.124939
-0.826132	calling conventions are	-0.124939
-0.819231	the background are	-0.124939
-0.558736	These algorithms are	-0.124939
-0.559990	the additions are	-0.124939
-0.558736	allowed inputs are	-0.124939
-0.559362	Those who are	-0.124939
-0.612291	the factors are	-0.124939
-0.432763	These factors are	-0.124939
-0.432294	Common devices are	-0.124939
-0.432294	hand-held devices are	-0.124939
-0.559362	Cache misses are	-0.124939
-0.135688	PLT tables are	-0.425969
-0.470663	Lookup tables are	-0.124939
-0.553591	for D are	-0.124939
-0.553591	you measure are	-0.124939
-0.553591	above sections are	-0.124939
-0.813590	of algebra are	-0.124939
-0.809745	Context switches are	-0.124939
-0.811023	of Java are	-0.124939
-0.552885	thrown exceptions are	-0.124939
-0.555008	virtual machine are	-0.124939
-0.306847	optimization manuals are	-0.124939
-0.433742	these manuals are	-0.124939
-0.306847	subsequent manuals are	-0.124939
-0.306847	The profilers are	-0.124939
-0.306847	These profilers are	-0.124939
-0.306847	Unfortunately, profilers are	-0.124939
-1.033988	out-of-order capabilities are	-0.124939
-0.553591	that measurements are	-0.124939
-0.552885	These units are	-0.124939
-0.554299	and log are	-0.124939
-0.152312	point comparisons are	-0.124939
-0.389507	These requirements are	-0.124939
-0.389507	alignment requirements are	-0.124939
-0.780712	and Fortran are	-0.124939
-1.033684	device drivers are	-0.124939
-0.356499	10; Templates are	-0.124939
-0.356499	57 Templates are	-0.124939
-0.536575	storage principles are	-0.124939
-0.211991	Such schemes are	-0.124939
-0.039565	protection schemes are	-0.301030
-0.356499	references. Arrays are	-0.124939
-0.356499	behaviors. Arrays are	-0.124939
-0.539441	any constructors are	-0.124939
-0.536575	Some guidelines are	-0.124939
-0.229581	runtime frameworks are	-0.124939
-0.229581	interface frameworks are	-0.124939
-0.229581	Such frameworks are	-0.124939
-0.537528	power consumption are	-0.124939
-0.142653	search facilities are	-0.425969
-0.536575	as macros are	-0.124939
-0.522789	hybrid solutions are	-0.124939
-0.522789	and sum2 are	-0.124939
-0.313392	15.1b. Branches are	-0.124939
-0.313392	penalty. Branches are	-0.124939
-0.522789	Most caches are	-0.124939
-0.312681	Threads Threads are	-0.124939
-0.312681	Linux). Threads are	-0.124939
-0.522789	performance tests are	-0.124939
-0.502207	and main() are	-0.124939
-0.500744	and divisions are	-0.124939
-0.500744	disguise. Enums are	-0.124939
-0.502207	itself. Constructors are	-0.124939
-0.106620	and c[i] are	-0.124939
-0.248240	calculations. Examples are	-0.124939
-0.248240	above. Examples are	-0.124939
-0.500744	table lookups are	-0.124939
-0.653771	and Sum3 are	-0.124939
-0.653771	disturbing influences are	-0.124939
-0.459750	programming constructs are	-0.124939
-0.459750	user settings are	-0.124939
-0.459750	well, others are	-0.124939
-0.459750	The DLLs are	-0.124939
-0.459750	These suffixes are	-0.124939
-0.459750	same arguments are	-0.124939
-0.459750	time intervals are	-0.124939
-0.459750	and v.f are	-0.124939
-0.653771	CPU dispatchers are	-0.124939
-0.142460	register. Registers are	-0.124939
-0.142460	reference. Registers are	-0.124939
-0.142460	references. References are	-0.124939
-0.142460	type. References are	-0.124939
-0.459750	short int) are	-0.124939
-0.459750	sorting algorithms, are	-0.124939
-0.459750	my experiment are	-0.124939
-0.459750	operator i++ are	-0.124939
-0.459750	common time-consumers are	-0.124939
-0.459750	far procedures are	-0.124939
-0.459750	^, ~ are	-0.124939
-0.355870	and repagination are	-0.124939
-0.355870	three clauses are	-0.124939
-0.355870	X (Darwin) are	-0.124939
-0.355870	(chapter 12) are	-0.124939
-0.355870	answer. Beginners are	-0.124939
-0.355870	name mangling are	-0.124939
-0.355870	Software distributors are	-0.124939
-0.355870	The recommendations are	-0.124939
-0.355870	instruction latencies are	-0.124939
-0.355870	and '$' are	-0.124939
-0.355870	and parsing are	-0.124939
-0.355870	to objects) are	-0.124939
-0.355870	called properties) are	-0.124939
-0.355870	= 123; are	-0.124939
-0.355870	with #) are	-0.124939
-0.355870	time slice are	-0.124939
-0.355870	instructions (MOVNT) are	-0.124939
-0.355870	(e.g. '>') are	-0.124939
-0.355870	each clause are	-0.124939
-0.355870	CPU vendors are	-0.124939
-0.355870	9. Multiplications are	-0.124939
-0.746624	pointed to can	-0.425969
-1.885532	code and can	-0.124939
-0.597871	consecutively and can	-0.124939
-1.158822	code that can	-0.124939
-1.246058	compiler that can	-0.124939
-1.307880	program that can	-0.124939
-0.554179	cache that can	-0.124939
-0.554179	performance that can	-0.124939
-1.326021	branch that can	-0.124939
-1.037918	way that can	-0.124939
-1.188066	instructions that can	-0.124939
-0.948953	language that can	-0.124939
-0.554179	structure that can	-0.124939
-0.194112	count that can	-0.425969
-1.037918	branches that can	-0.124939
-0.194112	applications that can	-0.124939
-0.554179	expressions that can	-0.124939
-0.812087	advantages that can	-0.124939
-0.360421	things that can	-0.425969
-0.554179	profiler that can	-0.124939
-0.812087	thing that can	-0.124939
-0.194112	something that can	-0.124939
-0.360421	factors that can	-0.124939
-0.554179	alternatives that can	-0.124939
-0.554179	F2 that can	-0.124939
-0.554179	chip that can	-0.124939
-0.554179	Espresso) that can	-0.124939
-0.627482	and it can	-0.249877
-0.766586	that it can	-0.301030
-1.698777	if it can	-0.124939
-1.094168	than it can	-0.124939
-1.393671	then it can	-0.124939
-1.045369	because it can	-0.301030
-0.921257	but it can	-0.124939
-0.519623	so it can	-0.124939
-1.245576	before it can	-0.124939
-0.696874	cases it can	-0.425969
-1.089618	But it can	-0.124939
-1.332727	Therefore, it can	-0.124939
-0.751395	what it can	-0.124939
-0.867769	called, it can	-0.124939
-0.519623	worse, it can	-0.124939
-0.519623	(b*c)/d, it can	-0.124939
-0.519623	least, it can	-0.124939
-2.073218	the function can	-0.124939
-2.053317	a function can	-0.124939
-1.212279	this function can	-0.124939
-1.023787	one function can	-0.124939
-0.865387	calling function can	-0.124939
-1.212279	frame function can	-0.124939
-0.582800	exponential function can	-0.124939
-1.153437	the code can	-0.249877
-1.571216	of code can	-0.124939
-1.179689	The code can	-0.124939
-1.043031	this code can	-0.124939
-1.043031	same code can	-0.124939
-1.139941	above code can	-0.124939
-0.629404	} This can	-0.602060
-0.436934	code. This can	-0.124939
-0.917691	memory. This can	-0.124939
-0.708038	x; This can	-0.124939
-0.708038	pointer. This can	-0.124939
-0.493674	operations. This can	-0.124939
-0.708038	variable. This can	-0.124939
-0.436934	stack. This can	-0.124939
-0.493674	fast. This can	-0.124939
-0.708038	important. This can	-0.124939
-0.708038	times. This can	-0.124939
-0.493674	process. This can	-0.124939
-0.493674	files. This can	-0.124939
-0.493674	cached. This can	-0.124939
-0.493674	b2; This can	-0.124939
-0.493674	only. This can	-0.124939
-0.708038	CPUs"). This can	-0.124939
-0.493674	free. This can	-0.124939
-0.180171	modified. This can	-0.124939
-0.493674	defined. This can	-0.124939
-0.493674	saturated. This can	-0.124939
-0.493674	(a+b). This can	-0.124939
-0.493674	it). This can	-0.124939
-0.493674	scheduler. This can	-0.124939
-0.493674	32-62. This can	-0.124939
-0.493674	patterns. This can	-0.124939
-0.493674	place. This can	-0.124939
-1.017388	the compiler can	-0.234083
-1.100298	a compiler can	-0.124939
-0.826010	The compiler can	-0.124939
-0.928217	A compiler can	-0.124939
-1.278804	Intel compiler can	-0.124939
-0.953455	Gnu compiler can	-0.425969
-0.460314	good compiler can	-0.124939
-0.436776	optimizing compiler can	-0.124939
-0.497610	just-in-time compiler can	-0.124939
-0.600276	case x can	-0.124939
-0.659508	and you can	-0.124939
-0.576825	that you can	-0.221849
-0.885909	if you can	-0.124939
-0.792824	as you can	-0.124939
-0.659636	then you can	-0.191886
-0.692939	because you can	-0.124939
-1.096626	If you can	-0.124939
-0.740016	but you can	-0.124939
-0.432429	compilers you can	-0.124939
-0.787878	where you can	-0.124939
-0.692939	so you can	-0.124939
-0.770924	example, you can	-0.124939
-0.164480	how you can	-0.124939
-0.611786	cases you can	-0.124939
-0.432429	while you can	-0.124939
-0.432429	Windows you can	-0.124939
-0.164480	cases, you can	-0.124939
-0.611786	optimizations you can	-0.124939
-0.692939	Here, you can	-0.124939
-0.164480	things you can	-0.425969
-0.611786	Windows, you can	-0.124939
-1.009726	Alternatively, you can	-0.124939
-0.304248	general, you can	-0.124939
-0.432429	Instead, you can	-0.124939
-0.432429	operator; you can	-0.124939
-0.432429	reason, you can	-0.124939
-1.246389	if this can	-0.124939
-1.037570	then this can	-0.124939
-1.037570	how this can	-0.124939
-0.598848	load time can	-0.124939
-0.896200	cache use can	-0.124939
-1.149848	time. It can	-0.124939
-0.545614	does It can	-0.124939
-0.545614	database It can	-0.124939
-0.545614	operations. It can	-0.124939
-0.545614	CPU. It can	-0.124939
-0.972789	integers. It can	-0.124939
-0.545614	parameter. It can	-0.124939
-0.545614	Linux. It can	-0.124939
-0.545614	numbers. It can	-0.124939
-0.545614	list[i].b. It can	-0.124939
-1.464000	static memory can	-0.124939
-1.162970	RAM memory can	-0.124939
-1.582846	and data can	-0.124939
-0.881547	public data can	-0.124939
-1.614983	The program can	-0.124939
-0.590833	7 program can	-0.124939
-0.498488	each vector can	-0.124939
-0.921876	The same can	-0.124939
-1.024108	other functions can	-0.124939
-1.519672	intrinsic functions can	-0.124939
-0.582916	missing functions can	-0.124939
-2.062270	the CPU can	-0.124939
-1.603225	The CPU can	-0.124939
-0.599014	prefetch instruction can	-0.124939
-1.582871	the loop can	-0.124939
-1.534778	The loop can	-0.124939
-0.535586	CPU which can	-0.124939
-0.778979	i which can	-0.124939
-0.535586	register which can	-0.124939
-0.535586	instructions which can	-0.124939
-0.535586	references, which can	-0.124939
-0.535586	operator, which can	-0.124939
-0.535586	attribute which can	-0.124939
-0.535586	multiplications, which can	-0.124939
-0.535586	YMM) which can	-0.124939
-1.233170	to integer can	-0.124939
-1.766721	an integer can	-0.124939
-0.578041	the set can	-0.124939
-1.676645	instruction set can	-0.124939
-1.187765	template class can	-0.124939
-0.786791	this example can	-0.425969
-0.985893	other compilers can	-0.124939
-1.031381	Intel compilers can	-0.124939
-0.729007	these compilers can	-0.124939
-1.411862	Some compilers can	-0.124939
-0.838727	optimizing compilers can	-0.124939
-0.633462	Most compilers can	-0.124939
-0.183217	PathScale compilers can	-0.124939
-0.506354	Modern compilers can	-0.124939
-0.586542	variable size can	-0.124939
-0.586542	>= size can	-0.124939
-1.935319	a pointer can	-0.124939
-1.004566	A pointer can	-0.124939
-0.851950	link pointer can	-0.124939
-0.597333	on b can	-0.124939
-1.208338	class library can	-0.124939
-0.371160	dynamic library can	-0.425969
-0.914652	interface library can	-0.124939
-0.597579	that i can	-0.124939
-1.661292	the object can	-0.124939
-1.405926	shared object can	-0.124939
-0.845153	existing object can	-0.124939
-1.486550	an array can	-0.124939
-0.816214	simple array can	-0.124939
-0.556453	large array can	-0.124939
-1.109682	An array can	-0.124939
-1.306357	of objects can	-0.124939
-0.968508	when objects can	-0.124939
-1.005327	class objects can	-0.124939
-0.792417	many objects can	-0.124939
-0.543208	new objects can	-0.124939
-1.548234	a variable can	-0.124939
-1.202514	A variable can	-0.124939
-1.336630	induction variable can	-0.124939
-0.482702	that we can	-0.124939
-0.584761	then we can	-0.124939
-0.829249	because we can	-0.124939
-0.432255	so we can	-0.124939
-0.438633	systems we can	-0.124939
-0.438633	constants we can	-0.124939
-0.621198	Here we can	-0.124939
-0.438633	As we can	-0.124939
-0.438633	lesson we can	-0.124939
-0.565776	of variables can	-0.124939
-0.565776	Integer variables can	-0.124939
-1.349488	induction variables can	-0.124939
-1.926103	of 2 can	-0.124939
-0.349575	time You can	-0.124939
-0.349575	unsigned You can	-0.124939
-0.349575	code. You can	-0.124939
-0.491980	time. You can	-0.124939
-0.349575	functions. You can	-0.124939
-0.491980	efficient. You can	-0.124939
-0.349575	compiler. You can	-0.124939
-0.491980	cycles. You can	-0.124939
-0.349575	references. You can	-0.124939
-0.349575	16. You can	-0.124939
-0.349575	result. You can	-0.124939
-0.349575	one. You can	-0.124939
-0.491980	overlap. You can	-0.124939
-0.349575	n. You can	-0.124939
-0.349575	a. You can	-0.124939
-0.349575	test. You can	-0.124939
-0.349575	twice. You can	-0.124939
-0.349575	heading You can	-0.124939
-0.349575	__intel_cpu_feature_indicator_x. You can	-0.124939
-0.349575	account. You can	-0.124939
-0.349575	entry. You can	-0.124939
-0.349575	makefile. You can	-0.124939
-1.216526	The table can	-0.124939
-1.014312	hash table can	-0.124939
-1.218833	in performance can	-0.124939
-1.200504	The performance can	-0.124939
-1.446689	of software can	-0.124939
-0.775266	A branch can	-0.124939
-0.563750	optimal branch can	-0.124939
-1.819139	are stored can	-0.124939
-1.927875	the address can	-0.124939
-1.032318	target address can	-0.124939
-1.056981	carry bit can	-0.124939
-1.226050	same register can	-0.124939
-0.575602	XMM register can	-0.124939
-1.179480	of optimization can	-0.124939
-0.198431	Function libraries can	-0.425969
-1.313732	vector registers can	-0.124939
-1.472958	XMM registers can	-0.124939
-0.573252	invalid pointers can	-0.124939
-1.098837	Smart pointers can	-0.124939
-1.225557	64-bit systems can	-0.124939
-1.612350	operating systems can	-0.124939
-1.829746	the user can	-0.124939
-1.197176	and they can	-0.124939
-1.299414	because they can	-0.124939
-1.048836	This method can	-0.425969
-0.530376	same method can	-0.124939
-0.530376	similar method can	-0.124939
-0.593300	data access can	-0.124939
-1.216653	operating system can	-0.425969
-1.324887	a file can	-0.124939
-1.397305	oriented programming can	-0.124939
-1.819591	critical part can	-0.124939
-0.840938	of operations can	-0.124939
-0.569873	Boolean operations can	-0.124939
-1.177352	composite type can	-0.124939
-0.883653	new instructions can	-0.124939
-0.797429	virtual processors can	-0.124939
-0.546024	Many processors can	-0.124939
-0.797429	non-Intel processors can	-0.124939
-0.885878	processors available can	-0.124939
-1.550588	a constant can	-0.124939
-0.567398	A constant can	-0.124939
-1.665551	an error can	-0.124939
-1.728814	the stack can	-0.124939
-0.935445	Intel CPUs can	-0.124939
-0.517804	x86 CPUs can	-0.124939
-0.748299	modern CPUs can	-0.124939
-1.049329	Modern CPUs can	-0.124939
-0.883285	and arrays can	-0.124939
-0.591553	caches work can	-0.124939
-1.708077	function calls can	-0.124939
-0.591185	intermediate calculations can	-0.124939
-1.866414	the result can	-0.124939
-1.406631	the processor can	-0.124939
-1.387703	unused bytes can	-0.124939
-0.562850	that threads can	-0.124939
-1.313173	multiple threads can	-0.124939
-1.264573	and c can	-0.124939
-0.792099	one thread can	-0.124939
-0.852121	another thread can	-0.124939
-0.852121	Each thread can	-0.124939
-0.484039	third thread can	-0.124939
-0.484039	high-priority thread can	-0.124939
-0.772374	that overflow can	-0.124939
-0.531803	no overflow can	-0.124939
-0.531803	array overflow can	-0.124939
-1.156574	Container classes can	-0.124939
-1.668944	cache line can	-0.124939
-0.592113	branches inside can	-0.124939
-0.814000	This problem can	-0.124939
-0.555234	safety problem can	-0.124939
-0.815055	This solution can	-0.124939
-0.991146	this solution can	-0.124939
-1.042540	sorted list can	-0.124939
-1.318061	the hardware can	-0.124939
-0.590076	unwinding information can	-0.124939
-1.359494	{ ... can	-0.124939
-0.624797	loop counter can	-0.602060
-0.874770	stamp counter can	-0.124939
-1.075411	memory allocation can	-0.124939
-0.588126	described above can	-0.124939
-0.586993	then both can	-0.124939
-0.586269	oriented programs can	-0.124939
-1.459593	memory space can	-0.124939
-0.587050	automatic dispatching can	-0.124939
-0.742212	the microprocessor can	-0.124939
-0.586099	that branches can	-0.124939
-1.357112	the multiplication can	-0.124939
-1.543146	the application can	-0.124939
-1.346352	instruction sets can	-0.124939
-0.787644	32 sets can	-0.124939
-0.586757	mixed implementation can	-0.124939
-1.635066	exception handling can	-0.124939
-1.568156	data members can	-0.124939
-1.142198	template parameter can	-0.124939
-1.607111	or reference can	-0.124939
-1.453576	the programmer can	-0.124939
-0.899849	The programmer can	-0.124939
-0.868319	register keyword can	-0.124939
-1.490418	table lookup can	-0.124939
-0.583383	WTL applications can	-0.124939
-0.145204	} We can	-0.124939
-0.145204	used. We can	-0.124939
-0.145204	cache. We can	-0.124939
-0.145204	zero We can	-0.124939
-0.145204	compiler. We can	-0.124939
-0.145204	number. We can	-0.124939
-0.145204	names. We can	-0.124939
-0.145204	u.f We can	-0.124939
-0.145204	15.1c. We can	-0.124939
-0.145204	bit. We can	-0.124939
-0.145204	caveats. We can	-0.124939
-0.145204	...). We can	-0.124939
-0.145204	PowerPC). We can	-0.124939
-0.145204	set). We can	-0.124939
-0.145204	'this'. We can	-0.124939
-0.697590	execution mechanism can	-0.124939
-0.487267	dispatching mechanism can	-0.124939
-0.980065	dispatch mechanism can	-0.124939
-0.581902	extra framework can	-0.124939
-1.151237	Modern microprocessors can	-0.124939
-1.399353	point numbers can	-0.124939
-1.611942	user interface can	-0.124939
-0.752657	development process can	-0.124939
-0.942168	installation process can	-0.124939
-0.468719	or union can	-0.124939
-0.321879	A union can	-0.124939
-1.331052	copy constructor can	-0.124939
-1.345294	code section can	-0.124939
-1.274951	have tested can	-0.124939
-1.280210	cache contentions can	-0.124939
-0.507187	float conversions can	-0.124939
-0.908199	These conversions can	-0.124939
-0.506902	throw() statement can	-0.124939
-0.506902	general statement can	-0.124939
-0.508041	same errors can	-0.124939
-0.508041	serious errors can	-0.124939
-1.254722	programming languages can	-0.124939
-1.115414	Function inlining can	-0.124939
-0.576223	digital operation can	-0.124939
-0.495495	as output can	-0.124939
-0.495495	compiler output can	-0.124939
-0.854525	These costs can	-0.124939
-0.574435	A database can	-0.124939
-0.851642	two constants can	-0.124939
-0.574814	simple algorithm can	-0.124939
-0.573195	This alignment can	-0.124939
-1.322137	the offset can	-0.124939
-0.575647	This effect can	-0.124939
-0.571313	These counters can	-0.124939
-1.218132	The heap can	-0.124939
-0.569577	program loading can	-0.124939
-0.767180	the condition can	-0.124939
-0.471601	if condition can	-0.124939
-0.569577	four cores can	-0.124939
-0.571972	same generation can	-0.124939
-0.978743	target buffer can	-0.124939
-1.375865	critical stride can	-0.124939
-0.565417	how metaprogramming can	-0.124939
-0.981447	hash map can	-0.124939
-0.568089	dependency chains can	-0.425969
-0.503142	This tool can	-0.124939
-0.439652	test tool can	-0.124939
-0.820192	Lazy binding can	-0.124939
-0.961717	x86 family can	-0.124939
-0.958489	a DLL can	-0.124939
-0.821363	Lookup tables can	-0.124939
-0.553495	data sections can	-0.124939
-1.033648	context switches can	-0.124939
-1.144412	Visual Studio can	-0.124939
-0.307211	variables. They can	-0.124939
-0.307211	branches. They can	-0.124939
-0.307211	smart. They can	-0.124939
-0.552773	if exceptions can	-0.124939
-1.035839	garbage collection can	-0.124939
-0.812159	these manuals can	-0.124939
-1.033648	out-of-order capabilities can	-0.124939
-0.553495	then measurements can	-0.124939
-0.552773	Such units can	-0.124939
-0.545904	this polynomial can	-0.124939
-0.930705	example 13.1 can	-0.124939
-0.547565	address. Pointers can	-0.124939
-0.545904	A debugger can	-0.124939
-0.548398	wasteful behavior can	-0.124939
-0.798696	and edx can	-0.124939
-0.545904	background job can	-0.124939
-0.958325	of abc can	-0.124939
-0.265224	upper limit can	-0.425969
-0.536467	following guidelines can	-0.124939
-0.538418	ever seen can	-0.124939
-0.522685	reasonable estimate can	-0.124939
-0.522685	code. Metaprogramming can	-0.124939
-0.948330	heap manager can	-0.124939
-0.240675	periodic pattern can	-0.124939
-0.756629	example 12.4b can	-0.124939
-0.523865	sizes Integers can	-0.124939
-0.523865	running simultaneously can	-0.124939
-0.522685	following techniques can	-0.124939
-0.522685	which otherwise can	-0.124939
-0.522685	higher resolution can	-0.124939
-0.502141	A redesign can	-0.124939
-0.500644	C++ projects can	-0.124939
-0.719515	example 14.28 can	-0.124939
-0.500644	Multiple divisions can	-0.124939
-0.500644	and s3 can	-0.124939
-0.500644	interface etc., can	-0.124939
-0.500644	arrays. Strings can	-0.124939
-0.500644	are read-only can	-0.124939
-0.459659	(typically 64) can	-0.124939
-0.459659	subexpression c+b can	-0.124939
-0.459659	same chip can	-0.124939
-0.459659	The formats can	-0.124939
-0.653629	cache miss can	-0.124939
-0.459659	jumps Jumps can	-0.124939
-0.459659	two parentheses can	-0.124939
-0.459659	set. Neither can	-0.124939
-0.459659	explicitly. Divisions can	-0.124939
-0.459659	c) 139 can	-0.124939
-0.459659	fastcall modifier can	-0.124939
-0.459659	new insight can	-0.124939
-0.459659	Database queries can	-0.124939
-0.459659	the bottlenecks can	-0.124939
-0.459659	<<, >> can	-0.124939
-0.355799	one tread can	-0.124939
-0.355799	result (b+c) can	-0.124939
-0.355799	CPUs unequally can	-0.124939
-0.355799	This dilemma can	-0.124939
-0.355799	the preprocessor can	-0.124939
-0.355799	following work-around can	-0.124939
-0.355799	example 8.24 can	-0.124939
-0.355799	the BTB can	-0.124939
-0.355799	common denominator can	-0.124939
-0.355799	(if valid) can	-0.124939
-0.355799	time. (Examples can	-0.124939
-0.355799	programs installed can	-0.124939
-0.355799	= !a; can	-0.124939
-0.355799	and shuffling can	-0.124939
-0.355799	zero. Zero can	-0.124939
-0.355799	far (arrays can	-0.124939
-2.859683	in the //	-0.124939
-0.599865	powN is //	-0.124939
-1.069270	loop for //	-0.124939
-0.596281	Virtual function //	-0.124939
-0.596281	instrset_detect function //	-0.124939
-0.441998	this by //	-0.823909
-1.604788	replaced by //	-0.124939
-1.872741	Gnu compiler //	-0.124939
-0.201946	small x //	-0.124939
-0.591297	square x //	-0.124939
-0.556713	= { //	-0.124939
-0.395218	vector { //	-0.124939
-0.655849	i++) { //	-0.249877
-0.731743	else { //	-0.124939
-0.766474	x) { //	-0.124939
-0.475305	0) { //	-0.124939
-0.146641	cc[]) { //	-0.301030
-0.667531	2) { //	-0.124939
-0.395218	parm2) { //	-0.124939
-0.194664	4) { //	-0.301030
-0.693543	() { //	-0.124939
-0.070260	8) { //	-0.726999
-0.857743	r++) { //	-0.124939
-0.369688	c++) { //	-0.425969
-0.369688	n) { //	-0.425969
-0.395218	5) { //	-0.124939
-0.154122	11) { //	-0.425969
-0.395218	CriticalFunctionDispatch(void) { //	-0.124939
-0.154122	0x7FFFFFFF) { //	-0.124939
-0.556713	TILESIZE) { //	-0.124939
-0.154122	a[SIZE][SIZE]) { //	-0.124939
-0.395218	i) { //	-0.124939
-0.395218	__try { //	-0.124939
-0.395218	EXCEPTION_CONTINUE_SEARCH) { //	-0.124939
-0.395218	N) { //	-0.124939
-0.395218	arraysize) { //	-0.124939
-0.395218	v.i) { //	-0.124939
-0.395218	13) { //	-0.124939
-1.245701	} } //	-0.124939
-0.508495	multiplication } //	-0.124939
-0.843342	x); } //	-0.124939
-0.113490	const*)p); } //	-0.602060
-0.508495	p->Hello(); } //	-0.124939
-0.508495	clock; } //	-0.124939
-0.732588	&CriticalFunction_386; } //	-0.124939
-0.732588	&CriticalFunction_SSE2; } //	-0.124939
-0.732588	cc); } //	-0.124939
-0.508495	x10; } //	-0.124939
-0.508495	return; } //	-0.124939
-0.508495	_mm_cvtss_f32(s); } //	-0.124939
-0.508495	N-1)==0,N>::p(x); } //	-0.124939
-0.508495	*(T*)0; } //	-0.124939
-0.893350	constant data //	-0.124939
-0.957973	intrinsic functions //	-0.124939
-1.413800	line size //	-0.124939
-0.889580	have multiple //	-0.124939
-0.891141	function version //	-0.124939
-1.060243	vector objects //	-0.124939
-1.223343	of 2 //	-0.425969
-0.971015	== 2 //	-0.124939
-0.984217	a table //	-0.425969
-0.888474	with branch //	-0.124939
-0.888674	swap elements //	-0.124939
-1.371849	is faster //	-0.425969
-1.269900	first call //	-0.124939
-2.122268	= 0; //	-0.124939
-1.031330	return 0; //	-0.124939
-1.769267	sign bit //	-0.124939
-0.489770	if unsigned //	-0.425969
-0.540573	bytes. first //	-0.301030
-1.668031	the code. //	-0.124939
-1.699313	compile time. //	-0.124939
-0.928496	to test //	-0.124939
-0.532828	each test //	-0.124939
-0.532828	before test //	-0.124939
-1.275474	// SSE2 //	-0.124939
-1.014930	to 0 //	-0.124939
-1.578414	= 0 //	-0.124939
-0.590982	provoke error //	-0.124939
-0.591725	NumberOfTests times //	-0.124939
-0.001264	code. Example: //	-0.726999
-0.005080	time. Example: //	-0.124939
-0.002533	function. Example: //	-0.425969
-0.002533	memory. Example: //	-0.425969
-0.005080	used. Example: //	-0.124939
-0.005080	called. Example: //	-0.124939
-0.005080	loop. Example: //	-0.124939
-0.002533	2. Example: //	-0.425969
-0.005080	variables. Example: //	-0.124939
-0.005080	calls. Example: //	-0.124939
-0.005080	registers. Example: //	-0.124939
-0.005080	variable. Example: //	-0.124939
-0.005080	needed. Example: //	-0.124939
-0.005080	instructions. Example: //	-0.124939
-0.005080	order. Example: //	-0.124939
-0.005080	to. Example: //	-0.124939
-0.002533	overflow. Example: //	-0.425969
-0.005080	value. Example: //	-0.124939
-0.005080	branch. Example: //	-0.124939
-0.005080	constant. Example: //	-0.124939
-0.005080	prediction. Example: //	-0.124939
-0.005080	result. Example: //	-0.124939
-0.002533	counter. Example: //	-0.425969
-0.005080	operation. Example: //	-0.124939
-0.005080	finished. Example: //	-0.124939
-0.005080	ways. Example: //	-0.124939
-0.005080	execution. Example: //	-0.124939
-0.005080	elements. Example: //	-0.124939
-0.005080	once. Example: //	-0.124939
-0.005080	limited. Example: //	-0.124939
-0.005080	static. Example: //	-0.124939
-0.005080	known. Example: //	-0.124939
-0.005080	thing. Example: //	-0.124939
-0.005080	divisions. Example: //	-0.124939
-0.005080	later. Example: //	-0.124939
-0.005080	zeroes. Example: //	-0.124939
-0.005080	undesired. Example: //	-0.124939
-0.005080	offsets). Example: //	-0.124939
-0.005080	overhead. Example: //	-0.124939
-0.005080	individually. Example: //	-0.124939
-0.591142	Aligned arrays //	-0.124939
-1.246059	doesn't work //	-0.124939
-0.591952	Store result //	-0.124939
-1.382359	unused bytes //	-0.124939
-0.588499	<ia32intrin.h> etc. //	-0.124939
-0.337150	in matrix //	-0.425969
-0.501488	define matrix //	-0.124939
-0.501488	transpose matrix //	-0.124939
-1.587331	vector classes //	-0.124939
-0.206749	} }; //	-0.191886
-0.352128	perhaps }; //	-0.124939
-0.352128	NotPolymorphic(); }; //	-0.124939
-0.352128	4.; }; //	-0.124939
-1.055124	int b; //	-0.124939
-0.815442	double b; //	-0.124939
-0.894922	a, b; //	-0.124939
-0.587696	prevent optimizing //	-0.124939
-0.184242	c; ... //	-0.425969
-0.510684	CriticalFunction(); ... //	-0.124939
-0.798284	Loop counter //	-0.124939
-1.075117	stamp counter //	-0.124939
-0.585076	sum operator //	-0.124939
-0.191629	: 1; //	-0.124939
-0.709787	size conversion //	-0.124939
-0.494741	unsigned conversion //	-0.124939
-0.877522	type conversion //	-0.124939
-0.741860	b, c; //	-0.124939
-1.469236	to zero //	-0.124939
-0.557991	// Table //	-0.425969
-0.766669	Gnu compilers. //	-0.124939
-0.528515	Microsoft compilers. //	-0.124939
-0.583399	S1 aligned //	-0.124939
-0.581020	* x; //	-0.124939
-1.377460	= 100; //	-0.124939
-1.446929	or later //	-0.124939
-0.579285	// AVX2 //	-0.124939
-0.744272	// constructor //	-0.124939
-0.858480	default constructor //	-0.124939
-1.242199	* 2; //	-0.124939
-0.580365	go here //	-0.124939
-1.217207	int a; //	-0.124939
-0.055796	the example: //	-0.124939
-0.004385	For example: //	-1.204120
-0.055796	following example: //	-0.124939
-0.055796	Another example: //	-0.124939
-0.337011	is slow //	-0.124939
-0.163265	int d; //	-0.124939
-0.605023	+ d; //	-0.124939
-0.845348	through rows //	-0.124939
-0.568987	Called directly //	-0.124939
-1.016551	double temp; //	-0.124939
-0.568358	both loops //	-0.124939
-0.459971	// SSE4.1 //	-0.124939
-0.459971	with SSE4.1 //	-0.124939
-0.830854	with templates //	-0.124939
-0.024945	code to: //	-0.124939
-0.004876	this to: //	-0.823909
-0.024945	optimized to: //	-0.124939
-0.024945	reduced to: //	-0.124939
-0.012294	changed to: //	-0.425969
-1.067779	template metaprogramming //	-0.124939
-0.561968	// multiply //	-0.124939
-0.824907	below diagonal //	-0.124939
-0.970480	// Time //	-0.124939
-1.064492	x, y; //	-0.124939
-0.556365	align arrays. //	-0.124939
-0.558894	SSE3 required //	-0.124939
-0.558050	different array. //	-0.124939
-1.031650	to measure //	-0.124939
-0.013696	like this: //	-0.903090
-0.987886	/ 10; //	-0.124939
-1.208781	as follows: //	-0.124939
-0.187061	* c); //	-0.425969
-0.096894	(b, c); //	-0.425969
-0.811795	int a[100]; //	-0.124939
-0.554974	= 256; //	-0.124939
-0.806636	type T //	-0.124939
-0.810068	of 2: //	-0.124939
-0.552111	metaprogramming is. //	-0.124939
-1.141834	for details. //	-0.124939
-0.553063	x2, x); //	-0.124939
-0.544318	integer constant. //	-0.124939
-0.544318	} polynomial //	-0.124939
-0.365220	type casting //	-0.124939
-0.546503	/ 16; //	-0.124939
-0.546503	% 16; //	-0.124939
-0.006787	parm2) {...} //	-0.301030
-0.545409	Example 13.1 //	-0.124939
-0.006787	+ i); //	-0.425969
-0.534914	simple method. //	-0.124939
-0.534914	return a[i]; //	-0.124939
-0.536196	unused returns //	-0.124939
-0.777804	// n! //	-0.124939
-0.780049	from www.agner.org/optimize/asmlib.zip. //	-0.124939
-0.355620	cc[size] ); //	-0.124939
-0.355620	aa[size] ); //	-0.124939
-0.534914	entry point. //	-0.124939
-0.536196	at 13 //	-0.124939
-0.355620	} #endif //	-0.124939
-0.355620	SelectAddMul_AVX2 #endif //	-0.124939
-0.521181	writing: 103 //	-0.124939
-0.031412	0, _EM_OVERFLOW); //	-0.425969
-0.031412	_controlfp(0, _EM_OVERFLOW); //	-0.425969
-0.240823	// x^4 //	-0.124939
-0.077268	|= 0x80000000; //	-0.124939
-0.260408	^= 0x80000000; //	-0.124939
-0.522734	the dispatcher. //	-0.124939
-0.521181	with 1: //	-0.124939
-0.522734	to unsigned. //	-0.124939
-0.521181	variable Y //	-0.124939
-0.501184	an error. //	-0.124939
-0.247670	X #else //	-0.124939
-0.247670	); #else //	-0.124939
-0.089411	of numbers: //	-0.124939
-0.089411	point numbers: //	-0.124939
-0.089411	100 numbers: //	-0.124939
-0.247670	of calculations: //	-0.124939
-0.247670	modulo calculations: //	-0.124939
-0.356185	= 8; //	-0.124939
-0.247670	: 8; //	-0.124939
-0.247670	the operations: //	-0.124939
-0.247670	modulo operations: //	-0.124939
-0.499215	of overflow: //	-0.124939
-0.823522	following way: //	-0.124939
-0.717151	Polynomial coefficients //	-0.124939
-0.042409	replaced with: //	-0.425969
-0.089411	Replace with: //	-0.124939
-0.499215	- 30 //	-0.124939
-0.499215	example: 38 //	-0.124939
-0.202794	sign bit: //	-0.425969
-0.027807	a[size], b[size]; //	-0.301030
-0.065274	= _mm_set1_epi16(2); //	-0.425969
-0.458358	#ifdef _MSC_VER //	-0.124939
-0.651597	x=y; y=temp;} //	-0.124939
-0.651597	// x^2 //	-0.124939
-0.651597	#include <stdio.h> //	-0.124939
-0.651597	= 3.3; //	-0.124939
-0.065274	#include <dvec.h> //	-0.425969
-0.142125	a case: //	-0.124939
-0.142125	lower case: //	-0.124939
-0.458358	the loop: //	-0.124939
-0.065274	#include "vectorclass.h" //	-0.124939
-0.065274	table lookup: //	-0.425969
-0.142125	_mm_or_si128(c2, bc); //	-0.124939
-0.142125	_mm_andnot_si128(mask, bc); //	-0.124939
-0.065274	_mm_add_epi16(c, two); //	-0.425969
-0.458358	by TILESIZE //	-0.124939
-0.458358	- time1; //	-0.124939
-0.651597	#include <emmintrin.h> //	-0.124939
-0.651597	swapd(a[r][c], a[c][r]); //	-0.124939
-0.065274	= InstructionSet(); //	-0.425969
-0.065274	#include "asmlib.h" //	-0.124939
-0.458358	c2, mask); //	-0.124939
-0.065274	= 512; //	-0.425969
-0.065274	* 1.2; //	-0.124939
-0.458358	elements c.load(cc+i); //	-0.124939
-0.651597	template parameter: //	-0.124939
-0.458358	int parm2); //	-0.124939
-0.458358	// x^n //	-0.124939
-0.065274	lookup table: //	-0.425969
-0.065274	= _mm_set1_epi16(0); //	-0.425969
-0.142125	// x^10 //	-0.124939
-0.142125	return x^10 //	-0.124939
-0.458358	derived class: //	-0.124939
-0.142125	: 23; //	-0.124939
-0.142125	<< 23; //	-0.124939
-0.065274	is needed: //	-0.425969
-0.065274	_mm_cmpgt_epi16(b, zero); //	-0.124939
-0.651597	"Hello "; //	-0.124939
-0.065274	"Hello 1" //	-0.425969
-0.651597	(double)(signed int)u; //	-0.124939
-0.458358	has occurred. //	-0.124939
-0.458358	+= 2;} //	-0.124939
-0.065274	is available: //	-0.425969
-0.651597	0, sizeof(a)); //	-0.124939
-0.458358	elements b.load(bb+i); //	-0.124939
-0.458358	lrint function: //	-0.124939
-0.354774	use SafeArray: //	-0.124939
-0.354774	a union: //	-0.124939
-0.354774	fraction bits: //	-0.124939
-0.354774	to zero: //	-0.124939
-0.354774	F32vec4 xx4(x4); //	-0.124939
-0.354774	floating point: //	-0.124939
-0.354774	is enabled: //	-0.124939
-0.354774	// x^8 //	-0.124939
-0.354774	100> list; //	-0.124939
-0.354774	double precision: //	-0.124939
-0.354774	= 64; //	-0.124939
-0.354774	+ log(c[i]); //	-0.124939
-0.354774	&Object2; p2->Hello(); //	-0.124939
-0.354774	CriticalFunction, @gnu_indirect_function"); //	-0.124939
-0.354774	positive integer: //	-0.124939
-0.354774	members last: //	-0.124939
-0.354774	+= A2; //	-0.124939
-0.354774	#include "xmmintrin.h" //	-0.124939
-0.354774	control condition: //	-0.124939
-0.354774	more efficient: //	-0.124939
-0.354774	b[arraysize], c[arraysize]; //	-0.124939
-0.354774	#include <pmmintrin.h> //	-0.124939
-0.354774	_mm_loadu_si128((__m128i const*)p);} //	-0.124939
-0.354774	the arrays: //	-0.124939
-0.354774	of underflow: //	-0.124939
-0.354774	runtime polymorphism: //	-0.124939
-0.354774	unsigned Examples: //	-0.124939
-0.354774	= &SelectAddMul_dispatch; //	-0.124939
-0.354774	point variable: //	-0.124939
-0.354774	*= n+1; //	-0.124939
-0.354774	// Constructor //	-0.124939
-0.354774	ifbit=1 bitofn //	-0.124939
-0.354774	a double: //	-0.124939
-0.354774	and reorganize: //	-0.124939
-0.354774	suggested improvements). //	-0.124939
-0.354774	away cpuid //	-0.124939
-0.354774	* sizeof(float)); //	-0.124939
-0.354774	vector classes): //	-0.124939
-0.354774	return ipow(x,10); //	-0.124939
-0.354774	= (int)d; //	-0.124939
-0.354774	order polynomial: //	-0.124939
-0.354774	* _mm_load_ps(coef+i); //	-0.124939
-0.354774	|| defined(__GNUC__) //	-0.124939
-0.354774	using InstructionSet(): //	-0.124939
-0.354774	int BigArray[1024]; //	-0.124939
-0.354774	of structures: //	-0.124939
-0.354774	Example 7.45 //	-0.124939
-0.354774	Is16vec8 two(2,2,2,2,2,2,2,2); //	-0.124939
-0.354774	= WhateverFunction(i); //	-0.124939
-0.354774	&= 0x7FFFFFFF; //	-0.124939
-0.354774	absolute values: //	-0.124939
-0.354774	static keyword: //	-0.124939
-0.354774	single comparison: //	-0.124939
-0.354774	instruction set: //	-0.124939
-0.354774	the reciprocal: //	-0.124939
-0.354774	Library (WTL): //	-0.124939
-0.354774	loop counter: //	-0.124939
-0.354774	// Serialize //	-0.124939
-0.354774	be used: //	-0.124939
-0.354774	Example 14.21. //	-0.124939
-0.354774	and memcpy: //	-0.124939
-0.354774	library asmlib.. //	-0.124939
-0.354774	= 0.0; //	-0.124939
-0.354774	= &CriticalFunction_Dispatch; //	-0.124939
-0.354774	uses SSE3. //	-0.124939
-0.354774	= lrint(d); //	-0.124939
-0.354774	common denominator: //	-0.124939
-0.354774	in two: //	-0.124939
-0.354774	we have: //	-0.124939
-0.354774	using memset: //	-0.124939
-0.354774	define fprintf //	-0.124939
-0.354774	: 52; //	-0.124939
-0.354774	#include "instrset_detect.cpp" //	-0.124939
-0.354774	x,y coordinates //	-0.124939
-0.354774	Is16vec8 zero(0,0,0,0,0,0,0,0); //	-0.124939
-0.354774	its address: //	-0.124939
-0.354774	changes fastest: //	-0.124939
-0.354774	with alloca: //	-0.124939
-0.354774	//=A*x*x+B*x+C //=DeltaY //	-0.124939
-0.354774	the exponent: //	-0.124939
-0.354774	reference instead: //	-0.124939
-0.354774	type conversions: //	-0.124939
-0.354774	element matrix[c][r]. //	-0.124939
-0.354774	= &SelectAddMul_SSE2; //	-0.124939
-0.354774	SelectAddMul_AVX2, SelectAddMul_dispatch; //	-0.124939
-0.354774	vector classes: //	-0.124939
-0.354774	CriticalFunctionType CriticalFunction_Dispatch; //	-0.124939
-0.354774	<< x.f; //	-0.124939
-0.354774	matrix a: //	-0.124939
-0.354774	: 63; //	-0.124939
-0.354774	return add_elements(s); //	-0.124939
-0.354774	as integers: //	-0.124939
-0.354774	| 0x3F800000; //	-0.124939
-0.354774	pointers, e.g.: //	-0.124939
-0.354774	return FactorialTable[n]; //	-0.124939
-0.354774	example 7.22. //	-0.124939
-0.354774	obj1; p->f(); //	-0.124939
-0.354774	pivot search: //	-0.124939
-0.354774	* x2; //	-0.124939
-0.354774	to x^0/0! //	-0.124939
-0.354774	example 9.5b. //	-0.124939
-0.354774	always false: //	-0.124939
-0.354774	inside square: //	-0.124939
-0.354774	a square. //	-0.124939
-0.354774	_mm_stream_pi((__m64*)dest, *(__m64*)&source); //	-0.124939
-0.354774	certain interval: //	-0.124939
-0.354774	0.f, 1.f); //	-0.124939
-0.354774	induction variables: //	-0.124939
-0.354774	*= xx4; //	-0.124939
-0.354774	: 15; //	-0.124939
-0.354774	function. 154 //	-0.124939
-0.354774	return statement: //	-0.124939
-0.354774	a template: //	-0.124939
-0.354774	= static_cast<float>(i); //	-0.124939
-0.354774	this capability: //	-0.124939
-0.354774	BigArray[1024] __attribute__((aligned(64))); //	-0.124939
-0.354774	MOVNTQ _mm_empty(); //	-0.124939
-0.354774	two gives: //	-0.124939
-0.354774	int seconds; //	-0.124939
-0.354774	* 1.2f; //	-0.124939
-0.354774	int cc[]); //	-0.124939
-0.354774	instrset_detect(); 116 //	-0.124939
-0.354774	mask); 110 //	-0.124939
-0.354774	: 11; //	-0.124939
-0.354774	<< endl; //	-0.124939
-0.354774	* 2.5; //	-0.124939
-1.628803	to a =	-0.124939
-1.140263	function a =	-0.124939
-0.555956	int a =	-0.124939
-0.032546	{ a =	-0.647817
-1.460426	have a =	-0.124939
-0.555956	double a =	-0.124939
-0.555956	pointer a =	-0.124939
-0.555956	float a =	-0.124939
-0.555956	address a =	-0.124939
-1.294761	example, a =	-0.124939
-0.555956	case a =	-0.124939
-0.555956	& a =	-0.124939
-0.072630	b; a =	-0.425969
-0.555956	solution a =	-0.124939
-0.555956	... a =	-0.124939
-0.361178	expression a =	-0.124939
-0.194499	__m128i a =	-0.124939
-0.067729	c; a =	-0.522879
-0.815310	&& a =	-0.124939
-0.555956	| a =	-0.124939
-0.119568	char a =	-0.301030
-0.555956	d; a =	-0.124939
-0.086451	10; a =	-0.425969
-0.194499	16; a =	-0.425969
-0.555956	Is16vec8 a =	-0.124939
-0.555956	7.2 a =	-0.124939
-0.555956	140 a =	-0.124939
-0.555956	c.load(cc+i); a =	-0.124939
-0.555956	z; a =	-0.124939
-0.555956	Writing a =	-0.124939
-0.555956	8.2b a =	-0.124939
-0.555956	8.3b a =	-0.124939
-0.555956	8.10b a =	-0.124939
-0.555956	1.5f}; a =	-0.124939
-0.555956	2.5f}; a =	-0.124939
-0.196724	int x =	-0.124939
-0.120814	than x =	-0.124939
-0.196724	example, x =	-0.124939
-0.566277	2, x =	-0.124939
-0.566277	y; x =	-0.124939
-0.201990	double A =	-0.425969
-0.025298	int size =	-0.505150
-0.170549	b; b =	-0.124939
-0.455363	1 b =	-0.124939
-0.170549	__m128i b =	-0.425969
-0.170549	c; b =	-0.124939
-0.455363	0, b =	-0.124939
-0.455363	a; b =	-0.124939
-0.455363	Is16vec8 b =	-0.124939
-0.455363	-100, b =	-0.124939
-0.455363	-1.0E8, b =	-0.124939
-0.455363	(2.0f); b =	-0.124939
-0.455363	5.0f; b =	-0.124939
-0.455363	Multiply(10,8); b =	-0.124939
-0.455363	a+1; b =	-0.124939
-0.698674	= i =	-0.124939
-0.435567	int i =	-0.124939
-0.435567	example i =	-0.124939
-0.021686	(int i =	-1.238882
-0.435567	40 i =	-0.124939
-0.200336	__m128i two =	-0.425969
-0.597594	dummy[0]; clock =	-0.124939
-0.576900	* 4 =	-0.124939
-0.576900	/ 4 =	-0.124939
-0.576620	* 8 =	-0.124939
-0.576620	/ 8 =	-0.124939
-0.593882	% 32 =	-0.124939
-0.547050	& 0 =	-0.124939
-0.192546	| 0 =	-0.425969
-0.083469	by constant =	-0.425969
-0.196538	// result =	-0.425969
-0.141223	4 bytes =	-0.602060
-0.319929	8 bytes =	-0.425969
-0.370188	2048 bytes =	-0.124939
-0.281302	to c =	-0.124939
-0.281302	{ c =	-0.124939
-0.281302	If c =	-0.124939
-0.281302	b; c =	-0.124939
-0.118263	__m128i c =	-0.425969
-0.281302	division c =	-0.124939
-0.118263	d; c =	-0.425969
-0.281302	temp; c =	-0.124939
-0.281302	Is16vec8 c =	-0.124939
-0.281302	100, c =	-0.124939
-0.281302	3.5; c =	-0.124939
-0.281302	1.0E8, c =	-0.124939
-0.281302	CFALSE: c =	-0.124939
-0.281302	(a+1); c =	-0.124939
-0.002166	for (i =	-0.985277
-0.029396	{ y =	-0.425969
-0.131820	double y =	-0.124939
-0.131820	return y =	-0.124939
-0.131820	expression y =	-0.124939
-0.131820	c; y =	-0.124939
-0.131820	a; y =	-0.124939
-0.131820	b) y =	-0.124939
-0.029396	y; y =	-0.425969
-0.060928	b2; y =	-0.124939
-0.131820	write: y =	-0.124939
-0.189968	__m128i zero =	-0.425969
-0.533023	If n =	-0.124939
-0.533023	(int n =	-0.124939
-0.351217	1 byte =	-0.425969
-0.521581	{ r =	-0.124939
-0.521581	when r =	-0.124939
-0.010834	{ a[i] =	-0.182931
-0.095336	; a[i] =	-0.124939
-0.045057	i++) a[i] =	-0.124939
-0.095336	here: a[i] =	-0.124939
-0.095336	overflow: a[i] =	-0.124939
-0.095336	formula a[i] =	-0.124939
-0.184735	2.2, C =	-0.425969
-0.168467	20, columns =	-0.124939
-0.447397	10, columns =	-0.124939
-0.394009	* p =	-0.124939
-0.394009	i; p =	-0.124939
-0.394009	p->Hello(); p =	-0.124939
-0.394009	p; p =	-0.124939
-0.577300	< b) =	-0.124939
-0.370583	{ temp =	-0.124939
-0.186750	temp; temp =	-0.124939
-0.127040	{ d =	-0.124939
-0.127040	double d =	-0.124939
-0.058891	+ d =	-0.124939
-0.058891	b; d =	-0.425969
-0.038361	d; d =	-0.301030
-0.127040	DTRUE: d =	-0.124939
-0.319714	} sum =	-0.124939
-0.131078	float sum =	-0.124939
-0.319714	i, sum =	-0.124939
-0.319714	list[size], sum =	-0.124939
-0.573613	shift right =	-0.124939
-0.480713	&& true =	-0.124939
-0.480713	|| true =	-0.124939
-0.327843	for N =	-0.124939
-0.098341	int rows =	-0.301030
-0.177147	int level =	-0.425969
-0.673993	i++){ list[i] =	-0.124939
-0.673993	i+=3){ list[i] =	-0.124939
-0.305195	* CriticalFunction =	-0.124939
-0.305195	version CriticalFunction =	-0.124939
-0.126323	supported CriticalFunction =	-0.124939
-0.566374	{ seconds =	-0.124939
-0.305454	i, f =	-0.124939
-0.305454	(float)i; f =	-0.124939
-0.305454	float(i); f =	-0.124939
-0.305454	f=i; f =	-0.124939
-0.118573	{ *p =	-0.425969
-0.118573	string[100], *p =	-0.425969
-0.168832	n, factorial =	-0.425969
-0.563106	; eax =	-0.124939
-0.432064	&& false =	-0.124939
-0.432064	|| false =	-0.124939
-0.135850	__m128i c2 =	-0.425969
-0.334621	bit-mask: c2 =	-0.124939
-0.164498	; ecx =	-0.124939
-0.164498	{ j =	-0.425969
-0.083568	& -1 =	-0.425969
-0.083568	| -1 =	-0.425969
-0.187143	^ -1 =	-0.124939
-0.412893	as xn =	-0.124939
-0.412893	float xn =	-0.124939
-0.555874	int Induction =	-0.124939
-0.152208	1.1, B =	-0.425969
-0.152208	; edx =	-0.124939
-0.099831	__m128i bc =	-0.425969
-0.229683	bit-mask: bc =	-0.124939
-0.063943	int SIZE =	-0.301030
-0.029341	for (c =	-0.726999
-0.229683	as -(-a) =	-0.124939
-0.229683	- -(-a) =	-0.124939
-0.229683	n.a. -(-a) =	-0.124939
-0.536921	as n! =	-0.124939
-0.230119	x); s =	-0.124939
-0.099993	s; s =	-0.124939
-0.537805	4, Wednesday =	-0.124939
-0.537805	list[size], sum1 =	-0.124939
-0.099831	& ~a =	-0.425969
-0.229683	^ ~a =	-0.124939
-0.357265	{ b[i] =	-0.124939
-0.357265	i++) b[i] =	-0.124939
-0.065444	* SelectAddMul_pointer =	-0.124939
-0.065444	2) SelectAddMul_pointer =	-0.124939
-0.065444	8) SelectAddMul_pointer =	-0.124939
-0.065444	5) SelectAddMul_pointer =	-0.124939
-0.171757	} z =	-0.124939
-0.171757	cos(x); z =	-0.124939
-0.171757	sin(x); z =	-0.124939
-0.523124	n; u.i =	-0.124939
-0.757383	(line size) =	-0.124939
-0.523124	0, sum2 =	-0.124939
-0.015460	for (r =	-0.425969
-0.523124	that N1 =	-0.124939
-0.128838	__m128i mask =	-0.425969
-0.523124	double Y =	-0.124939
-0.089650	and (set) =	-0.124939
-0.089650	have (set) =	-0.124939
-0.089650	formula: (set) =	-0.124939
-0.248359	&& !a =	-0.124939
-0.248359	|| !a =	-0.124939
-0.089650	- a-a =	-0.124939
-0.089650	n.a. a-a =	-0.124939
-0.089650	a-(-b)=a+b a-a =	-0.124939
-0.089650	n.a. x*x*x*x*x*x*x*x =	-0.124939
-0.042516	((a*x+b)*x+c)*x+d x*x*x*x*x*x*x*x =	-0.124939
-0.248359	0; list[i+1] =	-0.124939
-0.248359	=0; list[i+1] =	-0.124939
-0.248359	double x2 =	-0.124939
-0.248359	float x2 =	-0.124939
-0.106663	1; list[i+2] =	-0.425969
-0.501062	double Z =	-0.124939
-0.502420	b[i]; c[i] =	-0.124939
-0.501062	0, s3 =	-0.124939
-0.501062	0, s2 =	-0.124939
-0.248359	n.a. a+b =	-0.124939
-0.248359	reductions: a+b =	-0.124939
-0.248359	b+a a*b =	-0.124939
-0.248359	b+a, a*b =	-0.124939
-0.027877	for (x =	-0.301030
-0.720207	8 kb =	-0.124939
-0.142530	double x4 =	-0.124939
-0.142530	float x4 =	-0.124939
-0.142530	; a[i+1] =	-0.124939
-0.142530	Induction; a[i+1] =	-0.124939
-0.460039	float nfac =	-0.124939
-0.142530	- a*0 =	-0.124939
-0.142530	n.a. a*0 =	-0.124939
-0.142530	- a*1 =	-0.124939
-0.142530	n.a. a*1 =	-0.124939
-0.460039	int TILESIZE =	-0.124939
-0.460039	0x20, Saturday =	-0.124939
-0.142530	- a+0 =	-0.124939
-0.142530	n.a. a+0 =	-0.124939
-0.142530	b2); y1 =	-0.124939
-0.142530	y2; y1 =	-0.124939
-0.142530	reciprocal_divisor; y2 =	-0.124939
-0.142530	b1; y2 =	-0.124939
-0.142530	C; x.abc =	-0.124939
-0.142530	7.40c x.abc =	-0.124939
-0.460039	p2; p2 =	-0.124939
-0.460039	p1; p1 =	-0.124939
-0.460039	int ArraySize =	-0.124939
-0.065444	{ aa[i] =	-0.124939
-0.065444	for (c2 =	-0.124939
-0.460039	int ABC =	-0.124939
-0.460039	float s0 =	-0.124939
-0.654224	|| (a&&c) =	-0.124939
-0.460039	int NumberOfTests =	-0.124939
-0.460039	int min =	-0.124939
-0.460039	factor sizeof(S1) =	-0.124939
-0.142530	i, largest_index =	-0.124939
-0.142530	absvalue; largest_index =	-0.124939
-0.460039	{ list[i].a =	-0.124939
-0.460039	order(i); matrix[j][0] =	-0.124939
-0.460039	0, s1 =	-0.124939
-0.142530	- a*b+a*c =	-0.124939
-0.142530	n.a. a*b+a*c =	-0.124939
-0.065444	|| (a&&b&&c) =	-0.124939
-0.460039	int ARRAYSIZE =	-0.124939
-0.065444	a ^a =	-0.425969
-0.460039	0x10, Friday =	-0.124939
-0.142530	as 0/a =	-0.124939
-0.142530	- 0/a =	-0.124939
-0.142530	- (a&b)|(a&c) =	-0.124939
-0.142530	x-xxxxx-- (a&b)|(a&c) =	-0.124939
-0.460039	double log2 =	-0.124939
-0.654224	|| (!a&&c) =	-0.124939
-0.142530	{ largest_abs =	-0.124939
-0.142530	absvalue, largest_abs =	-0.124939
-0.142530	... x.a =	-0.124939
-0.142530	C; x.a =	-0.124939
-0.065444	{ Table[x] =	-0.124939
-0.142530	B; x.c =	-0.124939
-0.142530	2.; x.c =	-0.124939
-0.142530	1.; x.b =	-0.124939
-0.142530	A; x.b =	-0.124939
-0.142530	a*b*c=a*(b*c) a+b+c+d =	-0.124939
-0.142530	a+b+c=c+b+a a+b+c+d =	-0.124939
-0.065444	int FactorialTable[13] =	-0.425969
-0.460039	2, Tuesday =	-0.124939
-0.065444	for (r2 =	-0.124939
-0.460039	first. b+c =	-0.124939
-0.356098	b) {x =	-0.124939
-0.356098	- a<<b<<c =	-0.124939
-0.356098	then 0+1.23456 =	-0.124939
-0.356098	CriticalFunction(); timediff[i] =	-0.124939
-0.356098	{ Sunday =	-0.124939
-0.356098	float OneOrTwo5[2] =	-0.124939
-0.356098	&& a<c) =	-0.124939
-0.356098	{ time1 =	-0.124939
-0.356098	- a/1 =	-0.124939
-0.356098	1; x[1] =	-0.124939
-0.356098	order(i); list[j].a =	-0.124939
-0.356098	0); DontSkip =	-0.124939
-0.356098	const Greek[4] =	-0.124939
-0.356098	double A2 =	-0.124939
-0.356098	100, max =	-0.124939
-0.356098	double x10 =	-0.124939
-0.356098	92 DynamicArray[i] =	-0.124939
-0.356098	// polynomial(x) =	-0.124939
-0.356098	column++) matrix[row][column] =	-0.124939
-0.356098	bit: absvalue =	-0.124939
-0.356098	reciprocal_divisor; reciprocal_divisor =	-0.124939
-0.356098	n.a. (-a)*(-b) =	-0.124939
-0.356098	- a+b+c =	-0.124939
-0.356098	float coef[16] =	-0.124939
-0.356098	n.a. (a+b)+c =	-0.124939
-0.356098	int NUMROWS =	-0.124939
-0.356098	is (int)(&list[100]) =	-0.124939
-0.356098	for (j =	-0.124939
-0.356098	n.a. a+a+a+a =	-0.124939
-0.356098	double x8 =	-0.124939
-0.356098	| (b&c) =	-0.124939
-0.356098	x-xxxx--x (a|b)&(a|c) =	-0.124939
-0.356098	value -100+100+100 =	-0.124939
-0.356098	|| (b&&c) =	-0.124939
-0.356098	for (c1 =	-0.124939
-0.356098	1.0; list[i].b =	-0.124939
-0.356098	- andnot(a,a) =	-0.124939
-0.356098	float lookup[2] =	-0.124939
-0.356098	3.; x.d =	-0.124939
-0.356098	x; x.f =	-0.124939
-0.356098	&& !b =	-0.124939
-0.356098	1; a[1] =	-0.124939
-0.356098	for (row =	-0.124939
-0.356098	for (r1 =	-0.124939
-0.356098	int arraysize =	-0.124939
-0.356098	for (temp =	-0.124939
-0.356098	repeated 1024/4 =	-0.124939
-0.356098	a[2]; a[0] =	-0.124939
-0.356098	|| (!a&&b) =	-0.124939
-0.356098	(critical stride) =	-0.124939
-0.356098	0.5 ns =	-0.124939
-0.356098	{ ab[i].b =	-0.124939
-0.356098	* sizeof(float)) =	-0.124939
-0.356098	1.0; temp->b =	-0.124939
-0.356098	{ temp->a =	-0.124939
-0.356098	float list[] =	-0.124939
-0.356098	8, Thursday =	-0.124939
-0.356098	* DynamicArray =	-0.124939
-0.356098	int iset =	-0.124939
-0.356098	1, Monday =	-0.124939
-0.356098	(~a&c) a&b&c&d =	-0.124939
-0.356098	^ ~b =	-0.124939
-0.356098	{ a[c][r] =	-0.124939
-0.356098	is 8*1024/64 =	-0.124939
-0.356098	with IsPowerOf2 =	-0.124939
-0.356098	reductions: ~(~a) =	-0.124939
-0.356098	c; x[0] =	-0.124939
-0.356098	100, NUMCOLUMNS =	-0.124939
-0.356098	a ^0 =	-0.124939
-0.356098	for (column =	-0.124939
-0.356098	7.41b a.x =	-0.124939
-0.356098	d.x; a.y =	-0.124939
-0.356098	} list[300] =	-0.124939
-0.356098	% 0x20 =	-0.124939
-0.598670	asmlib.. // or	-0.124939
-1.445551	the function or	-0.124939
-1.844366	a function or	-0.124939
-1.066387	same function or	-0.124939
-0.563321	no function or	-0.124939
-1.213931	each function or	-0.124939
-0.828786	any function or	-0.124939
-1.574112	member function or	-0.124939
-0.972006	single function or	-0.124939
-0.828786	every function or	-0.124939
-0.563321	No function or	-0.124939
-1.134749	frame function or	-0.124939
-0.563321	friend function or	-0.124939
-0.895158	represented with or	-0.124939
-1.058626	system code or	-0.124939
-1.406976	assembly code or	-0.124939
-1.048702	vectorized code or	-0.124939
-1.950658	unsigned int or	-0.124939
-0.596865	reflect this or	-0.124939
-0.963839	compile time or	-0.425969
-1.634518	of memory or	-0.124939
-0.832290	in memory or	-0.124939
-0.597003	exchange data or	-0.124939
-2.034347	the program or	-0.124939
-1.213525	of program or	-0.124939
-1.555701	The program or	-0.124939
-1.790556	a vector or	-0.124939
-2.411175	the same or	-0.124939
-1.179123	virtual functions or	-0.124939
-0.929338	intrinsic functions or	-0.425969
-0.574787	individual functions or	-0.124939
-0.892922	particular CPU or	-0.124939
-1.579356	the loop or	-0.124939
-1.731742	a loop or	-0.124939
-2.068846	is used or	-0.124939
-1.148580	and one or	-0.124939
-0.876899	has one or	-0.124939
-1.283582	only one or	-0.124939
-0.758361	using one or	-0.124939
-0.523694	read one or	-0.124939
-0.758361	just one or	-0.124939
-0.523694	enable one or	-0.124939
-0.523694	22 one or	-0.124939
-0.523694	units, one or	-0.124939
-0.523694	Contain one or	-0.124939
-1.522817	code cache or	-0.124939
-2.351836	instruction set or	-0.124939
-0.619310	the class or	-0.301030
-0.678484	a class or	-0.425969
-0.532160	object's class or	-0.124939
-1.349960	other compilers or	-0.124939
-1.061733	code size or	-0.124939
-0.121901	Clang, Intel or	-0.124939
-0.921919	the pointer or	-0.124939
-0.478534	a pointer or	-0.689210
-0.655192	A pointer or	-0.124939
-0.158679	any pointer or	-0.425969
-0.411305	const pointer or	-0.124939
-0.411305	4 pointer or	-0.124939
-0.411305	8 pointer or	-0.124939
-0.411305	simple pointer or	-0.124939
-0.411305	integer, pointer or	-0.124939
-0.411305	returned pointer or	-0.124939
-0.411305	variable, pointer or	-0.124939
-1.633749	function library or	-0.124939
-0.583431	graphics library or	-0.124939
-0.451525	a float or	-0.425969
-0.516043	of float or	-0.124939
-0.451525	= float or	-0.425969
-0.185500	from float or	-0.425969
-0.516043	(8 float or	-0.124939
-0.504623	is two or	-0.124939
-0.504623	be two or	-0.124939
-0.633636	are two or	-0.425969
-0.463518	have two or	-0.425969
-0.947385	between two or	-0.124939
-0.726121	doing two or	-0.124939
-0.504623	Make two or	-0.124939
-1.767412	the object or	-0.124939
-0.583161	as object or	-0.124939
-0.559264	to static or	-0.124939
-0.821340	or static or	-0.124939
-0.559264	on static or	-0.124939
-0.821340	functions static or	-0.124939
-0.582036	compiled C++ or	-0.124939
-0.582036	C, C++ or	-0.124939
-1.350552	the array or	-0.124939
-1.013108	an array or	-0.124939
-0.543157	size array or	-0.124939
-0.543157	Any array or	-0.124939
-1.178718	as possible or	-0.124939
-1.069369	a variable or	-0.425969
-0.552351	no variable or	-0.124939
-0.808782	const variable or	-0.124939
-1.057466	global variables or	-0.124939
-1.905982	of 2 or	-0.124939
-0.886458	import table or	-0.124939
-1.068873	of order or	-0.124939
-0.875773	long long or	-0.124939
-1.928242	in 32-bit or	-0.124939
-1.351735	data member or	-0.124939
-0.562113	different way or	-0.124939
-0.569814	one way or	-0.124939
-0.595691	#define, const or	-0.124939
-0.593035	2, 4 or	-0.124939
-1.691960	to call or	-0.124939
-1.105274	16 8 or	-0.124939
-0.575161	lower 8 or	-0.124939
-0.593650	entire 64 or	-0.124939
-1.254834	program optimization or	-0.124939
-0.592659	linked libraries or	-0.124939
-0.533528	by pointers or	-0.124939
-0.462748	through pointers or	-0.124939
-0.533528	contain pointers or	-0.124939
-0.592040	can test or	-0.124939
-0.660031	with new or	-0.124939
-0.550859	(with new or	-0.124939
-0.592002	16-bit systems or	-0.124939
-1.165782	file access or	-0.124939
-0.591418	8, 16 or	-0.124939
-0.812325	the SSE2 or	-0.726999
-0.702643	for SSE2 or	-0.124939
-0.702643	set SSE2 or	-0.124939
-0.886433	ruled out or	-0.124939
-1.845309	operating system or	-0.124939
-0.168589	be 0 or	-0.124939
-0.195458	than 0 or	-0.346788
-0.447861	always 0 or	-0.124939
-0.594801	with short or	-0.124939
-0.590894	assembly instructions or	-0.124939
-0.592119	Use Gnu or	-0.124939
-1.380941	most important or	-0.124939
-0.117625	multiple CPUs or	-0.124939
-1.253026	inline assembly or	-0.124939
-0.971201	is large or	-0.124939
-1.227407	very large or	-0.124939
-1.559246	the arrays or	-0.124939
-1.456480	point calculations or	-0.124939
-1.159092	128 bytes or	-0.124939
-1.153753	the speed or	-0.124939
-0.796784	for speed or	-0.124939
-0.954414	execution speed or	-0.124939
-0.486338	reduced speed or	-0.124939
-0.696083	half speed or	-0.124939
-0.589445	with single or	-0.124939
-1.693802	Intel, AMD or	-0.124939
-1.394123	an exception or	-0.124939
-0.600212	is small or	-0.425969
-0.723310	very small or	-0.124939
-0.723310	too small or	-0.124939
-0.893156	an overflow or	-0.124939
-0.770679	cause overflow or	-0.124939
-0.530828	ignore overflow or	-0.124939
-0.559392	are integers or	-0.124939
-0.559392	16-bit integers or	-0.124939
-0.194901	A matrix or	-0.425969
-1.501669	the AVX or	-0.124939
-0.558169	for AVX or	-0.124939
-0.556898	into classes or	-0.124939
-1.257883	container classes or	-0.124939
-0.921346	double precision or	-0.124939
-0.646954	single precision or	-0.124939
-0.361656	command line or	-0.124939
-0.880564	compiler manual or	-0.124939
-1.598352	be advantageous or	-0.124939
-1.485109	a container or	-0.124939
-0.878195	involves eight or	-0.124939
-0.586530	with few or	-0.124939
-1.175561	linked list or	-0.124939
-0.945268	sorted list or	-0.124939
-0.557816	the structure or	-0.124939
-0.104183	a structure or	-0.346788
-0.241520	of structure or	-0.425969
-0.314129	This structure or	-0.124939
-0.314129	same structure or	-0.124939
-0.314129	class, structure or	-0.124939
-0.588827	functions inline or	-0.124939
-0.586050	doesn't add or	-0.124939
-0.470485	64-bit mode or	-0.425969
-0.587741	valid values or	-0.124939
-0.587019	loading files or	-0.124939
-0.585965	technical problems or	-0.124939
-1.141093	cache space or	-0.124939
-1.727226	CPU dispatching or	-0.124939
-0.868733	many branches or	-0.124939
-0.583984	involves multiplication or	-0.124939
-0.868021	code automatically or	-0.124939
-0.962600	an expression or	-0.124939
-0.914270	An expression or	-0.124939
-0.889355	data members or	-0.124939
-0.583772	code-based methods or	-0.124939
-0.420596	be signed or	-0.124939
-0.420596	integer, signed or	-0.124939
-0.161255	int, signed or	-0.425969
-0.420596	char, signed or	-0.124939
-0.584073	try block or	-0.124939
-0.533669	takes zero or	-0.124939
-0.533669	was zero or	-0.124939
-0.868229	with Microsoft or	-0.124939
-0.352530	a reference or	-0.124939
-0.582870	as string or	-0.124939
-0.581241	be three or	-0.124939
-1.479208	table lookup or	-0.124939
-1.269309	different types or	-0.124939
-0.530064	mixed types or	-0.124939
-1.018829	point expressions or	-0.124939
-0.530064	and read or	-0.124939
-0.530064	not read or	-0.124939
-0.838608	are aligned or	-0.124939
-0.780307	vector aligned or	-0.124939
-0.478198	properly aligned or	-0.124939
-0.863725	is declared or	-0.124939
-0.519030	A process or	-0.124939
-0.938659	installation process or	-0.124939
-0.580898	misleading results or	-0.124939
-0.857989	The constructor or	-0.124939
-0.577686	of modules or	-0.124939
-0.510602	is negative or	-0.124939
-0.510602	are negative or	-0.124939
-1.359611	be predicted or	-0.124939
-1.116610	is loaded or	-0.124939
-0.577601	the positive or	-0.124939
-0.401970	be C or	-0.124939
-0.566542	with C or	-0.124939
-0.401970	separate C or	-0.124939
-0.401970	either C or	-0.124939
-1.159375	turn off or	-0.124939
-0.507446	them off or	-0.124939
-0.577042	Windows syntax or	-0.124939
-0.576580	their index or	-0.124939
-1.028380	the network or	-0.124939
-0.575917	but slow or	-0.124939
-0.576410	library functions, or	-0.124939
-0.574634	multiple platforms or	-0.124939
-0.573059	each task or	-0.124939
-0.997411	be inlined or	-0.124939
-0.577273	to repeat or	-0.124939
-0.572595	can clear or	-0.124939
-0.557696	hard disk or	-0.124939
-0.480487	is overloaded or	-0.124939
-0.480487	be overloaded or	-0.124939
-0.569620	always true or	-0.124939
-0.479007	with little or	-0.124939
-0.479007	have little or	-0.124939
-1.111035	is initialized or	-0.124939
-0.447878	the SSE or	-0.124939
-0.174541	than reading or	-0.425969
-0.764861	of cores or	-0.124939
-0.889800	CPU cores or	-0.124939
-0.765900	be copied or	-0.124939
-0.470953	deleted, copied or	-0.124939
-1.235583	Linux, BSD or	-0.124939
-0.568765	multithreaded program, or	-0.124939
-0.568105	time loops or	-0.124939
-0.564131	classes, templates or	-0.124939
-0.564131	static buffer or	-0.124939
-0.564131	than seconds or	-0.124939
-0.563413	Linux compiler, or	-0.124939
-0.564851	different module or	-0.124939
-0.413569	user input or	-0.124939
-0.830280	each row or	-0.124939
-0.565572	link map or	-0.124939
-0.564090	int 3; or	-0.124939
-0.561700	normal writes or	-0.124939
-0.637004	CPU brands or	-0.124939
-0.448950	Other brands or	-0.124939
-0.560114	unknown brand or	-0.124939
-0.564090	by *p or	-0.124939
-0.556909	10, 12 or	-0.124939
-0.556026	a prediction or	-0.124939
-1.111316	an integer, or	-0.124939
-0.556909	called once or	-0.124939
-0.431124	or __restrict or	-0.124939
-0.431124	keyword __restrict or	-0.124939
-0.815438	runtime DLL or	-0.124939
-0.443408	and delete or	-0.124939
-1.068069	class C1 or	-0.124939
-0.809634	be called, or	-0.124939
-0.550826	new update or	-0.124939
-0.815092	be slower or	-0.124939
-0.551823	without polymorphism or	-0.124939
-0.552823	add, remove or	-0.124939
-1.154250	if possible, or	-0.124939
-0.387999	that reads or	-0.124939
-0.387999	afterwards reads or	-0.124939
-0.548589	of scope or	-0.124939
-0.387201	a reference, or	-0.124939
-0.652642	or reference, or	-0.124939
-0.543988	to five or	-0.124939
-0.181439	access Reading or	-0.124939
-0.181439	program. Reading or	-0.124939
-0.181439	access. Reading or	-0.124939
-0.181439	0x1C. Reading or	-0.124939
-0.545133	requires compilation or	-0.124939
-0.534592	calling. __fastcall or	-0.124939
-0.534592	on remote or	-0.124939
-0.535937	side effects or	-0.124939
-0.777239	multiple processes or	-0.124939
-0.538641	the console or	-0.124939
-0.142332	is created or	-0.425969
-0.984583	a hundred or	-0.124939
-0.958054	a command or	-0.124939
-1.029954	the latency or	-0.124939
-0.535937	Tuesday, Wednesday or	-0.124939
-0.777239	a key or	-0.124939
-0.901874	carry flag or	-0.124939
-0.524135	not overlap or	-0.124939
-0.870551	are needed, or	-0.124939
-0.520868	use pre-increment or	-0.124939
-0.520868	mouse move or	-0.124939
-0.520868	alignment. __declspec(align(16)) or	-0.124939
-0.932402	periodic pattern or	-0.124939
-0.522498	specific bottleneck or	-0.124939
-0.128496	RGB video or	-0.425969
-0.524135	only 50% or	-0.124939
-0.520868	true (1) or	-0.124939
-0.520868	are uncached or	-0.124939
-0.524135	use, incompatible or	-0.124939
-0.311791	calculations simultaneously or	-0.124939
-0.311791	jobs simultaneously or	-0.124939
-0.171254	branch tree or	-0.124939
-0.155260	binary tree or	-0.425969
-0.520868	use hyperthreading or	-0.124939
-0.822895	bits each, or	-0.124939
-0.822895	memory blocks, or	-0.124939
-0.498917	templates. Two or	-0.124939
-0.716660	square blocking or	-0.124939
-0.042391	library (*.dll or	-0.425969
-0.089373	libraries (*.dll or	-0.124939
-0.500985	list, database, or	-0.124939
-0.498917	function parameter, or	-0.124939
-0.500985	storing text or	-0.124939
-0.498917	is correct or	-0.124939
-0.498917	options -S or	-0.124939
-0.498917	more (128 or	-0.124939
-0.247559	only) -O3 or	-0.124939
-0.247559	/Ox -O3 or	-0.124939
-0.089373	is infinity or	-0.124939
-0.089373	be infinity or	-0.124939
-0.089373	or infinity or	-0.124939
-0.500985	returns. Global or	-0.124939
-0.498917	backwards. Copying or	-0.124939
-0.498917	on. Replace or	-0.124939
-0.500985	for interpreting or	-0.124939
-0.458088	graphics card or	-0.124939
-0.458088	of interpretation or	-0.124939
-0.458088	not overlapping or	-0.124939
-0.458088	project window or	-0.124939
-0.458088	size (16 or	-0.124939
-0.458088	assignment operator, or	-0.124939
-0.651175	A for-loop or	-0.124939
-0.458088	file stdint.h or	-0.124939
-0.458088	segments (32-bit or	-0.124939
-0.458088	signed number, or	-0.124939
-0.458088	resources locally or	-0.124939
-0.458088	binary search, or	-0.124939
-0.458088	are uninitialized or	-0.124939
-0.458088	&& expression, or	-0.124939
-0.458088	speed. Delays or	-0.124939
-0.458088	use internet or	-0.124939
-0.458088	streaming audio or	-0.124939
-0.458088	consecutive indices or	-0.124939
-0.458088	is unstable or	-0.124939
-0.065247	(*.lib, *.a) or	-0.425969
-0.458088	and searching, or	-0.124939
-0.458088	an array, or	-0.124939
-0.458088	this purpose, or	-0.124939
-0.458088	graphics coprocessor or	-0.124939
-0.458088	for recovering or	-0.124939
-0.458088	this initialization, or	-0.124939
-0.458088	key press or	-0.124939
-0.651175	to keyboard or	-0.124939
-0.458088	}; 52 or	-0.124939
-0.651175	and delete, or	-0.124939
-0.354561	an update, or	-0.124939
-0.354561	use objconv or	-0.124939
-0.354561	to C1::Disp() or	-0.124939
-0.354561	particular weakness or	-0.124939
-0.354561	to x?" or	-0.124939
-0.354561	each pixel or	-0.124939
-0.354561	keyword __thread or	-0.124939
-0.354561	example, f(x) or	-0.124939
-0.354561	libraries (.dll or	-0.124939
-0.354561	version 2.20 or	-0.124939
-0.354561	a button or	-0.124939
-0.354561	become imprecise or	-0.124939
-0.354561	whole polygon or	-0.124939
-0.354561	data optimally, or	-0.124939
-0.354561	declaration "static" or	-0.124939
-0.354561	computer game or	-0.124939
-0.354561	inte- ger or	-0.124939
-0.354561	whole workday or	-0.124939
-0.354561	to vectorize, or	-0.124939
-0.354561	branch misprediction, or	-0.124939
-0.354561	a year or	-0.124939
-0.354561	with First-In-First-Out or	-0.124939
-0.354561	the GetTickCount or	-0.124939
-0.354561	libraries (.lib or	-0.124939
-0.354561	stack frame" or	-0.124939
-0.354561	aliasing. __declspec(noalias) or	-0.124939
-0.354561	is reset or	-0.124939
-0.354561	allocating piecewise or	-0.124939
-0.354561	program creates or	-0.124939
-0.354561	that violate or	-0.124939
-0.354561	as VHDL or	-0.124939
-0.354561	reordered, inlined, or	-0.124939
-0.354561	option /QaxAVX or	-0.124939
-0.354561	speed /O2 or	-0.124939
-0.354561	using new/delete or	-0.124939
-0.354561	memmove, memset, or	-0.124939
-0.354561	in 2015 or	-0.124939
-0.354561	is signed, or	-0.124939
-0.354561	output (/FAs or	-0.124939
-0.354561	enum, const, or	-0.124939
-0.354561	("int 3"); or	-0.124939
-0.354561	kit (SDK or	-0.124939
-0.354561	structure. Incrementing or	-0.124939
-0.354561	option -fwrapv or	-0.124939
-0.354561	(e.g. Quine–McCluskey or	-0.124939
-0.354561	a printer or	-0.124939
-0.354561	for AVX2, or	-0.124939
-0.354561	are incremental or	-0.124939
-0.354561	complicated criteria or	-0.124939
-0.354561	the Pentium-II or	-0.124939
-0.354561	from www.agner.org/optimize/testp.zip or	-0.124939
-0.354561	valid addresses, or	-0.124939
-0.354561	instruction directly, or	-0.124939
-0.354561	of range"); or	-0.124939
-0.354561	registers (XMM or	-0.124939
-0.354561	to C0::f or	-0.124939
-0.354561	while he or	-0.124939
-0.354561	string, wstring or	-0.124939
-0.354561	option -Wstrict-overflow=2, or	-0.124939
-0.600897	Neither is it	-0.124939
-1.969441	address of it	-0.124939
-1.027295	compiler and it	-0.124939
-1.416200	time and it	-0.124939
-1.490245	functions and it	-0.124939
-1.296886	cache and it	-0.124939
-0.867820	order and it	-0.124939
-0.584067	cases and it	-0.124939
-0.494265	calls and it	-0.124939
-1.043911	function, and it	-0.124939
-1.027295	processors, and it	-0.124939
-0.584067	however, and it	-0.124939
-0.584067	-fpic and it	-0.124939
-0.584067	inefficient, and it	-0.124939
-0.867820	cores, and it	-0.124939
-0.867820	layers and it	-0.124939
-0.584067	fluctuating and it	-0.124939
-0.658432	is that it	-0.321233
-1.322075	code that it	-0.124939
-0.751961	time that it	-0.124939
-1.022495	functions that it	-0.425969
-0.506584	so that it	-0.393784
-1.489694	sure that it	-0.124939
-0.751961	small that it	-0.124939
-0.751961	advantage that it	-0.124939
-1.236487	expression that it	-0.124939
-0.115011	high that it	-0.301030
-0.482033	means that it	-0.346788
-0.868510	optimizations that it	-0.124939
-1.315422	assume that it	-0.124939
-0.751961	shows that it	-0.124939
-0.519955	expensive that it	-0.124939
-0.519955	show that it	-0.124939
-0.186410	event that it	-0.124939
-0.941094	think that it	-0.124939
-0.519955	consequence that it	-0.124939
-0.519955	saying that it	-0.124939
-0.519955	kludgy that it	-0.124939
-0.519955	discovers that it	-0.124939
-0.519955	knows that it	-0.124939
-0.519955	argue that it	-0.124939
-0.676635	or if it	-0.425969
-0.732832	function if it	-0.124939
-0.732832	as if it	-0.124939
-0.911869	not if it	-0.124939
-0.732832	time if it	-0.124939
-0.843657	memory if it	-0.124939
-0.747419	only if it	-0.124939
-1.077022	loop if it	-0.124939
-0.843657	integer if it	-0.124939
-1.065506	efficient if it	-0.124939
-0.732832	branch if it	-0.124939
-0.508640	call if it	-0.124939
-0.908923	pointers if it	-0.124939
-0.508640	arrays if it	-0.124939
-0.843657	thread if it	-0.124939
-1.077022	check if it	-0.124939
-0.732832	well if it	-0.124939
-0.183759	cycles if it	-0.425969
-0.732832	fast if it	-0.124939
-0.911869	see if it	-0.124939
-0.732832	global if it	-0.124939
-0.508640	statement if it	-0.124939
-0.508640	costs if it	-0.124939
-0.508640	destructor if it	-0.124939
-0.968596	safe if it	-0.124939
-0.911869	efficiently if it	-0.124939
-0.911869	consider if it	-0.124939
-0.508640	message if it	-0.124939
-0.508640	style if it	-0.124939
-0.508640	consumer if it	-0.124939
-0.508640	subroutine if it	-0.124939
-1.384928	long as it	-0.124939
-1.329734	stored as it	-0.124939
-1.040784	distributed as it	-0.124939
-0.588880	executed as it	-0.124939
-1.186113	more than it	-0.124939
-0.574210	data than it	-0.124939
-0.574210	systems than it	-0.124939
-1.000477	important than it	-0.124939
-1.477836	bigger than it	-0.124939
-0.849069	purposes than it	-0.124939
-0.448899	the time it	-1.028029
-0.270734	The time it	-1.079181
-1.047334	each time it	-0.124939
-0.986064	long time it	-0.124939
-0.415091	much time it	-0.124939
-1.259065	every time it	-0.124939
-0.655579	last time it	-0.124939
-2.258095	to use it	-0.124939
-1.804843	can use it	-0.124939
-0.520637	x when it	-0.124939
-0.688453	only when it	-0.124939
-0.870033	used when it	-0.124939
-0.494966	even when it	-0.124939
-0.520637	compiled when it	-0.124939
-0.520637	line when it	-0.124939
-0.520637	running when it	-0.124939
-0.520637	advantages when it	-0.124939
-0.520637	message when it	-0.124939
-0.520637	costly when it	-0.124939
-0.520637	permissible when it	-0.124939
-0.520637	manually when it	-0.124939
-0.747308	code then it	-0.124939
-0.571217	program then it	-0.124939
-0.686343	functions then it	-0.124939
-0.571217	set then it	-0.124939
-0.405165	compilers then it	-0.124939
-0.405165	static then it	-0.124939
-0.571217	available then it	-0.124939
-0.405165	up then it	-0.124939
-0.405165	large then it	-0.124939
-0.405165	execution then it	-0.124939
-0.405165	necessary then it	-0.124939
-0.405165	advantageous then it	-0.124939
-0.405165	problem then it	-0.124939
-0.405165	known then it	-0.124939
-0.405165	cycles then it	-0.124939
-0.405165	automatically then it	-0.124939
-0.405165	core then it	-0.124939
-0.571217	index then it	-0.124939
-0.405165	efficiency then it	-0.124939
-0.405165	resource then it	-0.124939
-0.644484	module then it	-0.124939
-0.405165	debugger then it	-0.124939
-0.156954	on, then it	-0.425969
-0.571217	not, then it	-0.124939
-0.405165	manner then it	-0.124939
-0.405165	arrays, then it	-0.124939
-0.405165	below) then it	-0.124939
-0.405165	segment then it	-0.124939
-0.405165	today, then it	-0.124939
-0.405165	identified, then it	-0.124939
-0.405165	dispatching, then it	-0.124939
-0.405165	found, then it	-0.124939
-0.405165	fine then it	-0.124939
-0.405165	T+5, then it	-0.124939
-0.405165	predictable, then it	-0.124939
-0.405165	made) then it	-0.124939
-0.405165	obvious, then it	-0.124939
-0.405165	small, then it	-0.124939
-0.405165	met then it	-0.124939
-1.686326	to make it	-0.124939
-1.422305	and make it	-0.124939
-0.761366	that make it	-0.425969
-1.099718	then make it	-0.124939
-0.833151	is because it	-0.124939
-0.465373	function because it	-0.425969
-0.624418	code because it	-0.124939
-0.393488	compiler because it	-0.124939
-0.761208	time because it	-0.124939
-0.393488	do because it	-0.124939
-0.393488	library because it	-0.124939
-0.761208	efficient because it	-0.124939
-0.554206	variable because it	-0.124939
-0.624418	pointers because it	-0.124939
-0.393488	useful because it	-0.124939
-0.393488	up because it	-0.124939
-0.624418	times because it	-0.124939
-0.153625	optimal because it	-0.124939
-0.393488	cost because it	-0.124939
-0.393488	mechanism because it	-0.124939
-0.393488	just because it	-0.124939
-0.749357	inefficient because it	-0.124939
-0.393488	platforms because it	-0.124939
-0.393488	constants because it	-0.124939
-0.393488	executable because it	-0.124939
-0.393488	parallel because it	-0.124939
-0.393488	seconds because it	-0.124939
-0.153625	consuming because it	-0.425969
-0.393488	poor because it	-0.124939
-0.393488	processes because it	-0.124939
-0.393488	one, because it	-0.124939
-0.393488	simpler because it	-0.124939
-0.393488	interesting because it	-0.124939
-0.393488	non-sequentially because it	-0.124939
-0.393488	Bridge) because it	-0.124939
-0.393488	alloca, because it	-0.124939
-0.393488	risky because it	-0.124939
-2.238701	the CPU it	-0.124939
-0.582812	branch. If it	-0.124939
-0.865411	first. If it	-0.124939
-0.582812	58 If it	-0.124939
-1.244349	on which it	-0.124939
-0.590116	array, which it	-0.124939
-0.699945	cases, but it	-0.124939
-0.306145	function, but it	-0.124939
-0.617590	set, but it	-0.124939
-0.699945	CPUs, but it	-0.124939
-0.436260	cycles, but it	-0.124939
-0.699945	threads, but it	-0.124939
-0.617590	well, but it	-0.124939
-0.617590	pointer, but it	-0.124939
-0.436260	again, but it	-0.124939
-0.617590	applications, but it	-0.124939
-0.436260	software, but it	-0.124939
-0.436260	method, but it	-0.124939
-0.436260	core, but it	-0.124939
-0.436260	hint, but it	-0.124939
-0.436260	profiling, but it	-0.124939
-0.436260	solution, but it	-0.124939
-0.436260	restriction, but it	-0.124939
-0.436260	caching, but it	-0.124939
-0.436260	subtasks, but it	-0.124939
-0.436260	expandable, but it	-0.124939
-0.436260	job, but it	-0.124939
-0.436260	section, but it	-0.124939
-0.436260	wrong, but it	-0.124939
-1.293045	the one it	-0.124939
-2.397880	instruction set it	-0.124939
-2.237328	to do it	-0.124939
-1.532226	the pointer it	-0.124939
-1.950745	the object it	-0.124939
-0.487568	point where it	-0.124939
-0.487568	variable where it	-0.124939
-0.635629	cases where it	-0.249877
-0.487568	false where it	-0.124939
-0.487568	mode, where it	-0.124939
-0.487568	cache, where it	-0.124939
-0.487568	data", where it	-0.124939
-2.196818	the value it	-0.124939
-0.596952	C1, so it	-0.124939
-0.656204	and makes it	-0.124939
-0.881369	it makes it	-0.124939
-0.816125	This makes it	-0.301030
-0.800803	which makes it	-0.124939
-0.461305	library makes it	-0.124939
-0.461305	pointers makes it	-0.124939
-0.656204	linking makes it	-0.124939
-0.461305	declaration makes it	-0.124939
-0.472761	Intel before it	-0.124939
-0.674286	stack before it	-0.124939
-0.472761	overflow before it	-0.124939
-0.769475	values before it	-0.124939
-0.472761	temp before it	-0.124939
-0.472761	misprediction before it	-0.124939
-0.472761	compilation before it	-0.124939
-0.472761	(c+d) before it	-0.124939
-0.472761	sub-vector before it	-0.124939
-0.823711	and call it	-0.124939
-0.568549	first call it	-0.124939
-1.494885	For example, it	-0.425969
-0.594862	best optimization it	-0.124939
-0.594975	most libraries it	-0.124939
-1.039800	make sure it	-0.301030
-1.599154	to access it	-0.124939
-0.564896	this case it	-0.124939
-0.896921	most cases it	-0.124939
-0.665397	many cases it	-0.124939
-0.575546	some cases it	-0.425969
-0.467151	complex cases it	-0.124939
-1.062772	than making it	-0.124939
-1.872360	you want it	-0.124939
-1.236795	we want it	-0.124939
-1.179671	more important it	-0.124939
-0.592544	on, while it	-0.124939
-1.402399	the work it	-0.124939
-0.487975	resources. But it	-0.124939
-0.487975	CPU. But it	-0.124939
-0.487975	parameter. But it	-0.124939
-0.487975	checks. But it	-0.124939
-0.487975	issue. But it	-0.124939
-1.515825	and therefore it	-0.124939
-0.435793	out whether it	-0.124939
-0.616880	consider whether it	-0.124939
-0.435793	evaluate whether it	-0.124939
-0.527763	deciding whether it	-0.425969
-0.435793	determine whether it	-0.124939
-0.811094	and calculate it	-0.124939
-0.553631	may calculate it	-0.124939
-1.180102	and store it	-0.124939
-1.054856	how well it	-0.124939
-0.799410	and write it	-0.124939
-0.799410	or write it	-0.124939
-0.589256	size. However, it	-0.124939
-0.587669	How was it	-0.124939
-0.545867	other cases, it	-0.124939
-1.277092	some cases, it	-0.124939
-1.681077	the microprocessor it	-0.124939
-0.544274	or replace it	-0.124939
-0.544274	then replace it	-0.124939
-0.271429	references. Therefore, it	-0.124939
-0.271429	to. Therefore, it	-0.124939
-0.271429	applications. Therefore, it	-0.124939
-0.271429	number. Therefore, it	-0.124939
-0.271429	declared. Therefore, it	-0.124939
-0.386933	another. Therefore, it	-0.124939
-0.271429	int. Therefore, it	-0.124939
-0.271429	78 Therefore, it	-0.124939
-0.271429	programmed. Therefore, it	-0.124939
-0.271429	PCs. Therefore, it	-0.124939
-0.271429	calculated. Therefore, it	-0.124939
-1.161150	This allows it	-0.124939
-0.537301	and what it	-0.124939
-0.781987	change what it	-0.124939
-0.585060	multithreaded applications it	-0.124939
-0.532290	object after it	-0.124939
-0.532290	accessed after it	-0.124939
-0.584476	or give it	-0.124939
-1.581631	loop control it	-0.124939
-1.294030	} Here, it	-0.124939
-0.581463	directives around it	-0.124939
-0.581076	then turn it	-0.124939
-1.378622	in fact it	-0.124939
-0.854200	and sometimes it	-0.124939
-0.854200	and prevent it	-0.124939
-1.254008	This prevents it	-0.124939
-0.856068	can tell it	-0.124939
-0.575317	of time, it	-0.124939
-0.850160	of copying it	-0.124939
-0.575317	as accessing it	-0.124939
-0.576375	and divide it	-0.124939
-0.997683	each iteration it	-0.124939
-0.629332	even though it	-0.124939
-0.689250	then convert it	-0.124939
-0.482110	must convert it	-0.124939
-0.571245	I consider it	-0.124939
-1.002715	the program, it	-0.124939
-0.472931	final program, it	-0.124939
-1.299239	reason why it	-0.124939
-0.376956	cycles whenever it	-0.124939
-0.376956	mispredicted whenever it	-0.124939
-0.376956	initiative whenever it	-0.124939
-1.178189	is used, it	-0.124939
-0.568280	again. Obviously, it	-0.124939
-0.567942	other features it	-0.124939
-0.564684	should multiply it	-0.124939
-0.564684	} Here it	-0.124939
-0.813644	to delete it	-0.124939
-0.554570	overflow. Likewise, it	-0.124939
-0.225086	is called, it	-0.124939
-0.555038	interrupted. Now it	-0.124939
-0.934289	to declare it	-0.124939
-0.547673	by giving it	-0.124939
-0.547673	bodies above, it	-0.124939
-0.800376	than comparing it	-0.124939
-1.197676	In general, it	-0.124939
-0.548211	In C++, it	-0.124939
-0.548749	to anything it	-0.124939
-0.538829	specific event it	-0.124939
-1.047681	other hand, it	-0.124939
-0.538829	or created it	-0.124939
-0.538198	classes Fortunately, it	-0.124939
-0.990965	but unfortunately it	-0.124939
-0.357356	etc. And it	-0.124939
-0.357356	maintain. And it	-0.124939
-0.538829	between platforms, it	-0.124939
-0.538198	and compare it	-0.124939
-0.538198	script languages, it	-0.124939
-0.524362	crash. Furthermore, it	-0.124939
-0.936913	by declaring it	-0.124939
-0.524362	derived class, it	-0.124939
-0.525126	should disable it	-0.124939
-1.005073	other words, it	-0.124939
-0.502238	to optimization, it	-0.124939
-0.248797	consuming. Sometimes it	-0.124939
-0.248797	tasks. Sometimes it	-0.124939
-0.503207	size. Today, it	-0.124939
-0.502238	large arrays, it	-0.124939
-0.502238	integer variable, it	-0.124939
-0.142787	loading Often, it	-0.124939
-0.142787	two. Often, it	-0.124939
-0.461109	allows it, it	-0.124939
-0.461109	even worse, it	-0.124939
-0.461109	PC. Nevertheless, it	-0.124939
-0.461109	modern software, it	-0.124939
-0.461109	Boolean algebra, it	-0.124939
-0.142787	team projects, it	-0.124939
-0.142787	one-man projects, it	-0.124939
-0.461109	automatically, although it	-0.124939
-0.461109	parameters. Or it	-0.124939
-0.461109	each method, it	-0.124939
-0.655898	is accessed, it	-0.124939
-0.356940	compiler recognizes it	-0.124939
-0.356940	can see, it	-0.124939
-0.356940	cached. Usually it	-0.124939
-0.356940	is referencing it	-0.124939
-0.356940	the performance, it	-0.124939
-0.356940	which redirects it	-0.124939
-0.356940	= (b*c)/d, it	-0.124939
-0.356940	At least, it	-0.124939
-0.356940	possible. Typically it	-0.124939
-0.356940	do. Hence, it	-0.124939
-0.356940	software design, it	-0.124939
-0.356940	or bottleneck, it	-0.124939
-0.356940	software project, it	-0.124939
-0.356940	of habit, it	-0.124939
-0.356940	in all, it	-0.124939
-0.356940	by XOR'ing it	-0.124939
-0.356940	by AND'ing it	-0.124939
-0.356940	Most importantly, it	-0.124939
-0.356940	as reflecting it	-0.124939
-0.356940	like these, it	-0.124939
-0.356940	in nature, it	-0.124939
-1.795353	is the function	-0.124939
-1.659717	of the function	-0.425969
-1.387756	to the function	-0.124939
-1.546436	and the function	-0.124939
-1.680517	in the function	-0.249877
-1.832762	for the function	-0.124939
-1.300211	that the function	-0.301030
-1.351959	or the function	-0.124939
-1.472807	if the function	-0.425969
-1.768910	by the function	-0.124939
-1.751919	with the function	-0.124939
-1.486447	as the function	-0.124939
-1.671303	than the function	-0.124939
-0.306880	time the function	-0.778151
-1.027196	when the function	-0.367977
-1.701417	from the function	-0.124939
-1.770651	at the function	-0.124939
-1.270325	make the function	-0.425969
-1.690262	because the function	-0.124939
-1.521266	but the function	-0.124939
-1.609841	where the function	-0.124939
-0.890806	before the function	-0.726999
-1.431140	call the function	-0.124939
-1.086359	case the function	-0.124939
-1.393352	up the function	-0.124939
-1.112943	times the function	-0.124939
-1.158220	want the function	-0.124939
-1.412532	about the function	-0.124939
-1.183269	calls the function	-0.124939
-0.963254	inside the function	-0.124939
-1.517529	calculate the function	-0.124939
-0.366890	inline the function	-0.124939
-1.010509	unless the function	-0.124939
-1.262451	allows the function	-0.124939
-0.181915	Make the function	-0.124939
-0.485319	calling the function	-0.124939
-1.158220	When the function	-0.124939
-0.988004	Avoid the function	-0.124939
-1.345996	until the function	-0.124939
-0.569494	across the function	-0.124939
-0.197408	changes the function	-0.124939
-0.840230	declare the function	-0.124939
-0.569494	giving the function	-0.124939
-0.840230	declaring the function	-0.124939
-0.569494	Put the function	-0.124939
-0.569494	Declare the function	-0.124939
-1.312336	is a function	-0.522879
-1.348135	of a function	-0.124939
-1.085315	to a function	-0.221849
-1.511640	in a function	-0.124939
-0.979680	that a function	-0.301030
-1.430374	or a function	-0.124939
-1.007273	as a function	-0.124939
-1.113412	than a function	-0.124939
-1.449888	have a function	-0.124939
-0.767682	time a function	-0.425969
-1.516519	use a function	-0.124939
-1.302772	when a function	-0.124939
-0.812574	memory a function	-0.124939
-1.467712	make a function	-0.124939
-0.820747	If a function	-0.301030
-1.350748	using a function	-0.124939
-0.949621	between a function	-0.124939
-1.135648	call a function	-0.124939
-0.475961	up a function	-0.124939
-0.812574	while a function	-0.124939
-0.812574	calls a function	-0.124939
-0.891328	through a function	-0.301030
-0.476458	inside a function	-0.301030
-1.038738	contains a function	-0.124939
-0.554448	inline a function	-0.124939
-1.150991	replace a function	-0.124939
-0.194171	sets a function	-0.425969
-0.812574	what a function	-0.124939
-0.554448	inlining a function	-0.124939
-0.949621	whenever a function	-0.124939
-0.554448	Obviously, a function	-0.124939
-0.554448	exceptions a function	-0.124939
-0.812574	Whenever a function	-0.124939
-0.812574	Calling a function	-0.124939
-0.554448	Specifies a function	-0.124939
-0.554448	Replacing a function	-0.124939
-0.554448	replacing a function	-0.124939
-0.554448	Inlining a function	-0.124939
-0.554448	publish a function	-0.124939
-0.202474	// of function	-0.425969
-2.208895	number of function	-0.124939
-1.605735	disadvantage of function	-0.124939
-1.627041	advantages of function	-0.124939
-1.037823	Choice of function	-0.124939
-0.593891	binding of function	-0.124939
-0.593891	chain of function	-0.124939
-1.180059	Comparison of function	-0.124939
-0.900741	addresses to function	-0.124939
-1.574969	functions and function	-0.124939
-0.973460	compilers and function	-0.425969
-1.272328	allocation and function	-0.124939
-0.500722	branches and function	-0.425969
-1.769849	} The function	-0.124939
-1.449251	time. The function	-0.124939
-1.563929	function. The function	-0.124939
-1.231100	below. The function	-0.124939
-0.587144	call. The function	-0.124939
-0.587144	microprocessors. The function	-0.124939
-0.873754	reasons: The function	-0.124939
-0.587144	119 The function	-0.124939
-0.873754	exceptions. The function	-0.124939
-0.587144	instance. The function	-0.124939
-2.110769	used for function	-0.124939
-0.896605	information for function	-0.124939
-2.078696	{ // function	-0.124939
-0.499770	matrix // function	-0.425969
-0.593130	cc[]); // function	-0.124939
-0.594315	compilers or function	-0.124939
-0.594315	branches or function	-0.124939
-0.594315	block or function	-0.124939
-1.073088	spent on function	-0.124939
-2.005776	such as function	-0.124939
-0.592488	objects as function	-0.124939
-1.051075	variable as function	-0.124939
-0.793544	// This function	-0.124939
-1.537091	} This function	-0.124939
-0.862326	compilers. This function	-0.124939
-0.581200	dispatching. This function	-0.124939
-0.581200	overflow. This function	-0.124939
-1.107462	of this function	-0.124939
-0.578831	// this function	-0.124939
-1.363823	use this function	-0.124939
-0.578831	inline this function	-0.124939
-0.575180	loop A function	-0.124939
-0.575180	local A function	-0.124939
-0.575180	incompatible. A function	-0.124939
-0.575180	body. A function	-0.124939
-0.575180	destructor. A function	-0.124939
-0.598926	Mathematical vector function	-0.124939
-1.447349	that make function	-0.124939
-1.580532	can make function	-0.124939
-1.707383	a different function	-0.124939
-1.755060	of different function	-0.124939
-1.525675	the same function	-0.124939
-1.745980	any other function	-0.124939
-0.598470	decides which function	-0.124939
-0.861827	that one function	-0.124939
-0.861827	by one function	-0.124939
-1.204405	from one function	-0.124939
-1.187581	that no function	-0.124939
-1.417515	of each function	-0.124939
-1.440140	for each function	-0.124939
-0.806028	at each function	-0.124939
-0.550823	test each function	-0.124939
-0.358986	times each function	-0.124939
-0.598084	able do function	-0.124939
-1.067518	that most function	-0.124939
-2.060982	by using function	-0.124939
-1.678903	the Intel function	-0.124939
-1.227504	in Intel function	-0.124939
-1.202489	an Intel function	-0.124939
-0.566921	optimized Intel function	-0.124939
-0.989688	the library function	-0.124939
-0.467477	a library function	-0.124939
-0.788459	The library function	-0.124939
-0.788459	Intel library function	-0.124939
-0.893480	through multiple function	-0.124939
-1.237978	are many function	-0.124939
-0.881900	with many function	-0.124939
-0.863854	of any function	-0.124939
-0.863854	with any function	-0.124939
-0.596284	correspondence between function	-0.124939
-0.729333	the member function	-0.124939
-0.352186	a member function	-0.124939
-0.521563	or member function	-0.124939
-0.622674	class member function	-0.124939
-0.370682	each member function	-0.124939
-0.092246	static member function	-0.301030
-0.370682	const member function	-0.124939
-0.725245	virtual member function	-0.124939
-0.370682	Assume member function	-0.124939
-0.397446	non-static member function	-0.425969
-0.521563	polymorphic member function	-0.124939
-1.188558	a const function	-0.124939
-1.737167	This makes function	-0.124939
-0.579279	frame makes function	-0.124939
-0.848831	the critical function	-0.221849
-0.951814	a critical function	-0.124939
-0.178104	Call critical function	-0.124939
-0.876391	the template function	-0.124939
-1.015283	the simple function	-0.124939
-1.508323	a simple function	-0.124939
-1.524633	The Gnu function	-0.124939
-1.766887	information about function	-0.124939
-1.258673	the extra function	-0.124939
-0.592277	details. Use function	-0.124939
-1.538815	The best function	-0.124939
-0.903733	a single function	-0.124939
-0.562603	vectors. These function	-0.124939
-0.562603	Primitives". These function	-0.124939
-0.406416	the virtual function	-0.301030
-0.549218	a virtual function	-0.124939
-0.563796	of virtual function	-0.124939
-0.563796	to virtual function	-0.124939
-0.563796	and virtual function	-0.124939
-0.400089	inefficient virtual function	-0.124939
-1.006305	function through function	-0.124939
-0.564095	data through function	-0.124939
-1.048919	Some common function	-0.124939
-1.253529	the thread function	-0.124939
-1.187307	the power function	-0.124939
-0.559222	and optimized function	-0.124939
-0.821263	best optimized function	-0.124939
-0.590966	dispatcher 128 function	-0.124939
-1.044602	only four function	-0.124939
-0.886142	by another function	-0.124939
-0.468565	making another function	-0.124939
-0.420246	calls another function	-0.124939
-0.468565	Use another function	-0.124939
-0.519946	as inline function	-0.124939
-0.751946	an inline function	-0.124939
-0.519946	An inline function	-0.124939
-0.552499	that every function	-0.124939
-0.809049	at every function	-0.124939
-0.807843	a standard function	-0.124939
-0.551831	includes standard function	-0.124939
-0.787253	the intrinsic function	-0.124939
-0.540291	each intrinsic function	-0.124939
-0.989488	a separate function	-0.124939
-1.544614	are various function	-0.124939
-0.620032	the dispatcher function	-0.124939
-0.634017	The dispatcher function	-0.124939
-0.447010	A dispatcher function	-0.124939
-0.584464	information. Each function	-0.124939
-0.604400	a graphics function	-0.124939
-0.479168	Various graphics function	-0.124939
-0.583994	functions. Many function	-0.124939
-1.455674	a linked function	-0.124939
-0.761560	the calling function	-0.124939
-0.525555	The calling function	-0.124939
-0.583110	My own function	-0.124939
-1.553461	the appropriate function	-0.124939
-0.862312	Gnu C function	-0.124939
-1.485794	the desired function	-0.124939
-0.577620	storage. No function	-0.124939
-0.497256	Intel math function	-0.124939
-0.497256	optimized math function	-0.124939
-0.334948	the inlined function	-0.124939
-0.131119	the frame function	-0.124939
-0.451235	a frame function	-0.124939
-0.131119	A frame function	-0.124939
-0.430180	throw() Assume function	-0.124939
-0.430180	const)) Assume function	-0.124939
-0.430180	ivdep Assume function	-0.124939
-1.444987	the right function	-0.124939
-0.832854	// Define function	-0.124939
-0.573320	CPU’s. Another function	-0.124939
-0.999528	an overloaded function	-0.124939
-0.572519	// Any function	-0.124939
-0.569697	string length function	-0.124939
-0.666137	a linear function	-0.425969
-0.801817	to define function	-0.124939
-0.801817	// define function	-0.124939
-1.262026	the latter function	-0.124939
-0.975024	very time-consuming function	-0.124939
-0.318522	a pure function	-0.124939
-0.218056	A pure function	-0.124939
-0.218056	containing pure function	-0.124939
-0.218056	contain pure function	-0.124939
-0.218056	involves pure function	-0.124939
-0.449451	the factorial function	-0.124939
-0.449451	integer factorial function	-0.124939
-0.723460	optimize across function	-0.124939
-0.636951	optimizations across function	-0.124939
-0.824865	the memcpy function	-0.124939
-0.136105	CPU detection function	-0.124939
-1.057183	a polymorphic function	-0.124939
-0.825581	// Virtual function	-0.124939
-0.822724	for general function	-0.124939
-0.554773	well. Even function	-0.124939
-0.950429	for storing function	-0.124939
-0.555652	call transpose function	-0.124939
-0.390504	name Intrinsic function	-0.124939
-0.390504	below. Intrinsic function	-0.124939
-0.182496	the dispatched function	-0.124939
-0.182496	a dispatched function	-0.124939
-0.274164	to dispatched function	-0.124939
-0.182496	another dispatched function	-0.124939
-0.547873	SSE. Several function	-0.124939
-1.060863	// Set function	-0.124939
-0.162312	a leaf function	-0.425969
-0.081646	A leaf function	-0.425969
-0.783908	// Critical function	-0.124939
-0.357459	the pow function	-0.124939
-0.357459	The pow function	-0.124939
-0.538985	monotonically increasing function	-0.124939
-0.143065	the strlen function	-0.124939
-0.762303	the asmlib function	-0.124939
-0.524551	a round function	-0.124939
-0.761066	the lrint function	-0.124939
-0.525268	optimize(...) Fastcall function	-0.124939
-0.313494	Library exp function	-0.124939
-0.313494	floats exp function	-0.124939
-0.502418	virtual 53 function	-0.124939
-0.502418	or API function	-0.124939
-0.722455	loop counters, function	-0.124939
-0.503327	The exponential function	-0.124939
-0.722455	an up-to-date function	-0.124939
-0.502418	efficient. Simple function	-0.124939
-0.503327	The InstructionSet() function	-0.124939
-0.461273	The indirect function	-0.124939
-0.461273	the 61 function	-0.124939
-0.656154	to distribute function	-0.124939
-0.461273	keywords Fast function	-0.124939
-0.065568	; mangled function	-0.425969
-0.461273	always Optimize function	-0.124939
-0.461273	A thread-safe function	-0.124939
-0.357069	a user-defined function	-0.124939
-0.357069	Avoid nested function	-0.124939
-0.357069	or friend function	-0.124939
-0.357069	// Branch/loop function	-0.124939
-0.357069	own error-handling function	-0.124939
-0.357069	a staircase function	-0.124939
-0.357069	you. Optimized function	-0.124939
-0.357069	// instrset_detect function	-0.124939
-0.357069	the std::unexpected() function	-0.124939
-0.357069	the DelayFiveSeconds function	-0.124939
-0.357069	The sin function	-0.124939
-0.357069	called from), function	-0.124939
-2.576435	that the if	-0.124939
-1.072213	how the if	-0.124939
-0.599725	loop, the if	-0.124939
-1.065480	thing and if	-0.124939
-0.597445	etc., and if	-0.124939
-0.894129	4. The if	-0.124939
-0.597556	135 The if	-0.124939
-1.582368	is that if	-0.124939
-1.039260	means that if	-0.124939
-0.503024	_EM_OVERFLOW); // if	-0.425969
-0.556178	used or if	-0.124939
-0.556178	out or if	-0.124939
-0.815713	large or if	-0.124939
-0.239254	small or if	-0.602060
-0.556178	values or if	-0.124939
-0.815713	negative or if	-0.124939
-0.556178	predicted or if	-0.124939
-0.556178	functions, or if	-0.124939
-0.556178	effects or if	-0.124939
-0.556178	overlap or if	-0.124939
-0.556178	pattern or if	-0.124939
-0.556178	blocks, or if	-0.124939
-0.556178	correct or if	-0.124939
-0.556178	unstable or if	-0.124939
-0.556178	initialization, or if	-0.124939
-0.556178	addresses, or if	-0.124939
-1.702572	a function if	-0.124939
-2.336125	the code if	-0.124939
-1.049918	error code if	-0.124939
-0.592085	dead code if	-0.124939
-1.545825	same as if	-0.124939
-0.594619	object as if	-0.124939
-0.776598	but not if	-0.124939
-0.586416	But not if	-0.124939
-0.598392	(unsigned int if	-0.124939
-1.741089	efficient than if	-0.124939
-0.589581	variables than if	-0.124939
-0.589581	form than if	-0.124939
-2.550803	the compiler if	-0.124939
-1.595230	i++) { if	-0.124939
-0.179708	b) { if	-0.726999
-0.575987	0) { if	-0.301030
-1.078927	n) { if	-0.124939
-0.815755	y) { if	-0.124939
-1.169615	extra time if	-0.124939
-0.885495	compile- time if	-0.124939
-1.158541	0; } if	-0.124939
-1.241262	3; } if	-0.124939
-0.872179	&CriticalFunction_AVX; } if	-0.124939
-1.608874	the memory if	-0.124939
-1.446540	in memory if	-0.124939
-1.138050	RAM memory if	-0.124939
-2.331877	the program if	-0.124939
-1.705285	member functions if	-0.124939
-1.244225	virtual functions if	-0.124939
-1.051867	and only if	-0.124939
-1.051867	but only if	-0.124939
-0.539489	possible only if	-0.124939
-0.755423	works only if	-0.124939
-0.539489	run only if	-0.124939
-0.539489	methods only if	-0.124939
-0.539489	needed only if	-0.124939
-0.539489	F1 only if	-0.124939
-0.598854	blend instruction if	-0.124939
-2.461065	floating point if	-0.124939
-1.272248	the loop if	-0.124939
-1.139962	a loop if	-0.124939
-0.544560	no loop if	-0.124939
-0.544560	repeat loop if	-0.124939
-0.544560	infinite loop if	-0.124939
-0.597639	manually, but if	-0.124939
-0.922613	be used if	-0.221849
-1.358935	into one if	-0.124939
-1.526105	code cache if	-0.124939
-1.198913	an integer if	-0.124939
-1.115714	signed integer if	-0.124939
-1.726996	instruction set if	-0.425969
-0.834053	for example if	-0.425969
-0.890775	than double if	-0.124939
-1.530182	integer size if	-0.124939
-1.667080	function pointer if	-0.124939
-0.585990	& b if	-0.124939
-0.585990	| b if	-0.124939
-0.595893	usable library if	-0.124939
-0.596034	them static if	-0.124939
-0.546591	and efficient if	-0.124939
-1.792877	more efficient if	-0.124939
-0.912384	most efficient if	-0.124939
-1.428062	less efficient if	-0.124939
-1.288486	not possible if	-0.124939
-0.197364	only possible if	-0.124939
-0.596362	static version if	-0.124939
-0.891087	any objects if	-0.124939
-0.594646	one variable if	-0.124939
-0.595284	static variables if	-0.124939
-0.657489	of 2 if	-0.425969
-1.056613	lookup table if	-0.124939
-1.728337	the performance if	-0.124939
-1.217579	in performance if	-0.124939
-0.578965	possible branch if	-0.124939
-0.578965	single branch if	-0.124939
-1.066856	efficient way if	-0.124939
-0.805582	is faster if	-0.522879
-0.462670	goes faster if	-0.124939
-0.077714	Still faster if	-0.726999
-1.610935	function call if	-0.124939
-0.679075	For example, if	-0.301030
-1.177390	to unsigned if	-0.124939
-1.572053	a register if	-0.124939
-0.359933	function pointers if	-0.425969
-1.097587	member pointers if	-0.124939
-1.327259	64-bit systems if	-0.124939
-1.824795	the user if	-0.124939
-1.524051	is useful if	-0.124939
-1.751326	be useful if	-0.124939
-0.412835	arrays even if	-0.124939
-0.412835	works even if	-0.124939
-0.412835	cycles even if	-0.124939
-0.412835	Intel, even if	-0.124939
-0.412835	mispredicted even if	-0.124939
-0.412835	called, even if	-0.124939
-0.412835	up, even if	-0.124939
-0.412835	resources, even if	-0.124939
-0.412835	execution, even if	-0.124939
-0.412835	overflows, even if	-0.124939
-0.412835	handler, even if	-0.124939
-1.456969	this method if	-0.124939
-0.594079	left out if	-0.124939
-1.858375	operating system if	-0.124939
-0.593009	return 0 if	-0.124939
-1.603031	the case if	-0.124939
-1.520932	are available if	-0.124939
-0.837883	only available if	-0.124939
-0.593161	filled up if	-0.124939
-1.661678	an error if	-0.124939
-0.592529	be important if	-0.124939
-0.833090	different CPUs if	-0.124939
-0.565651	106 CPUs if	-0.124939
-0.590599	time while if	-0.124939
-0.882451	large arrays if	-0.124939
-1.394076	64-bit Windows if	-0.124939
-0.592325	same result if	-0.124939
-1.278303	works best if	-0.124939
-1.577589	is necessary if	-0.124939
-1.213133	not necessary if	-0.124939
-1.317670	array element if	-0.124939
-0.464308	cycles. But if	-0.124939
-0.464308	obsolete. But if	-0.124939
-0.464308	programmed. But if	-0.124939
-0.464308	delay. But if	-0.124939
-0.464308	miss. But if	-0.124939
-0.464308	conflicts. But if	-0.124939
-0.591434	reduce speed if	-0.124939
-2.055718	int i; if	-0.124939
-0.322751	separate thread if	-0.124939
-1.045806	64-bit integers if	-0.124939
-0.590330	annotation option if	-0.124939
-1.171198	is good if	-0.124939
-1.387183	single precision if	-0.124939
-0.589222	memset line if	-0.124939
-1.156126	be optimized if	-0.124939
-0.556574	b[1000]; }; if	-0.124939
-0.556574	"Delta" }; if	-0.124939
-0.361372	bool b; if	-0.425969
-0.521373	to check if	-0.124939
-0.733481	can check if	-0.124939
-0.395684	// check if	-0.124939
-0.557389	not check if	-0.124939
-0.395684	must check if	-0.124939
-0.395684	input check if	-0.124939
-0.883627	is advantageous if	-0.425969
-0.834679	be advantageous if	-0.124939
-0.667772	more advantageous if	-0.124939
-0.554840	no problem if	-0.124939
-0.554840	big problem if	-0.124939
-1.339721	an advantage if	-0.124939
-0.587447	always 1 if	-0.124939
-1.026762	bit mode if	-0.124939
-0.805419	denormals-are-zero mode if	-0.124939
-1.274140	other values if	-0.124939
-0.549291	quite well if	-0.124939
-0.979366	predicted well if	-0.124939
-1.192688	clock cycles if	-0.425969
-1.058301	i; ... if	-0.124939
-0.547830	list[size]; ... if	-0.124939
-1.267204	not recommended if	-0.124939
-0.546075	very fast if	-0.124939
-0.546075	calculated fast if	-0.124939
-0.587916	F1. However, if	-0.124939
-0.585623	32-bit programs if	-0.124939
-0.544000	without problems if	-0.124939
-0.793824	cause problems if	-0.124939
-0.339807	if else if	-0.425969
-1.730633	} else if	-0.124939
-0.585490	first application if	-0.124939
-0.585106	loop automatically if	-0.124939
-0.531355	to see if	-0.124939
-0.540491	possible implementation if	-0.124939
-1.040682	software implementation if	-0.124939
-1.494345	more complicated if	-0.124939
-0.584692	above methods if	-0.124939
-0.465339	a disadvantage if	-0.124939
-1.026341	is zero if	-0.124939
-0.585287	But what if	-0.124939
-1.142477	const reference if	-0.124939
-1.486003	table lookup if	-0.124939
-1.025683	at runtime if	-0.124939
-0.559518	not needed if	-0.425969
-1.289421	stored together if	-0.124939
-0.583255	becomes bigger if	-0.124939
-0.581077	using vectors if	-0.124939
-0.582287	don't know if	-0.124939
-0.581633	inconsistent results if	-0.124939
-0.579837	one function, if	-0.124939
-1.437373	the operands if	-0.124939
-0.578807	separate modules if	-0.124939
-0.579837	becomes smaller if	-0.124939
-0.580870	runtime here if	-0.124939
-1.010229	this section if	-0.124939
-0.591543	cache contentions if	-0.425969
-1.013182	is predicted if	-0.124939
-0.579291	than C if	-0.124939
-0.512674	variable global if	-0.124939
-0.512674	variables global if	-0.124939
-1.208921	switch statement if	-0.124939
-0.507654	cause errors if	-0.124939
-0.507654	fatal errors if	-0.124939
-0.895341	very inefficient if	-0.124939
-0.721817	quite inefficient if	-0.124939
-0.576344	by checking if	-0.124939
-0.575438	Linux platforms if	-0.124939
-0.515293	be vectorized if	-0.124939
-1.286788	the costs if	-0.124939
-0.574127	usually inlined if	-0.124939
-1.203126	c, d; if	-0.124939
-1.225611	a destructor if	-0.124939
-0.521072	be safe if	-0.124939
-0.370335	only safe if	-0.124939
-0.370335	thread safe if	-0.124939
-0.370335	exception safe if	-0.124939
-0.574564	loop further if	-0.124939
-0.574127	complicated algorithm if	-0.124939
-1.152225	the exponent if	-0.124939
-1.239935	hard disk if	-0.124939
-1.341646	is obtained if	-0.124939
-0.177076	most efficiently if	-0.124939
-0.657166	less efficiently if	-0.124939
-0.571156	CPU models if	-0.124939
-0.342585	to fail if	-0.124939
-0.087180	will fail if	-0.124939
-0.988152	can occur if	-0.124939
-1.113091	the target if	-0.124939
-0.324867	program, especially if	-0.124939
-0.324867	systems, especially if	-0.124939
-0.324867	file, especially if	-0.124939
-0.324867	consuming, especially if	-0.124939
-0.567905	the updates if	-0.124939
-0.415232	may consider if	-0.124939
-0.324242	will consider if	-0.124939
-0.457202	must consider if	-0.124939
-0.569550	function directly if	-0.124939
-1.336488	error message if	-0.124939
-0.567905	in parallel if	-0.124939
-0.571202	becomes easier if	-0.124939
-0.471141	the loops if	-0.124939
-0.471141	unroll loops if	-0.124939
-0.591512	} u; if	-0.124939
-0.564632	is significant if	-0.124939
-0.744525	be invalid if	-0.124939
-0.654135	becomes invalid if	-0.124939
-0.836763	is organized if	-0.124939
-1.109648	to gain if	-0.124939
-0.799970	can happen if	-0.124939
-0.460923	will happen if	-0.124939
-0.839006	doesn't matter if	-0.124939
-0.966909	programming style if	-0.124939
-0.563969	some help if	-0.124939
-0.564633	following explanation if	-0.124939
-0.829982	is pure if	-0.124939
-0.561325	stored (or if	-0.124939
-1.422762	clock cycle if	-0.124939
-0.557960	more frequent if	-0.124939
-0.558697	$B1$2 label if	-0.124939
-0.557225	point variables, if	-0.124939
-0.557960	needed, however, if	-0.124939
-0.820303	small devices if	-0.124939
-0.557960	data explicitly if	-0.124939
-0.960253	lookup tables if	-0.124939
-0.553673	after debugging if	-0.124939
-0.552011	operand. Likewise, if	-0.124939
-0.552011	but expensive if	-0.124939
-0.552011	allows compile-time if	-0.124939
-1.073783	more compact if	-0.124939
-1.070578	of course, if	-0.124939
-1.010992	more complex if	-0.124939
-0.545154	to detect if	-0.124939
-0.546108	dispatching. Test if	-0.124939
-0.797578	is costly if	-0.124939
-0.545154	is poor if	-0.124939
-0.545154	typically happens if	-0.124939
-0.152003	be evaluated if	-0.425969
-0.546108	be permissible if	-0.124939
-0.545154	so (i.e. if	-0.124939
-1.039764	other hand, if	-0.124939
-0.536853	taken, i.e. if	-0.124939
-0.535733	object separately if	-0.124939
-0.536853	as true, if	-0.124939
-0.755412	need modification if	-0.124939
-0.111049	be eliminated if	-0.602060
-0.521974	is selected if	-0.124939
-0.521974	high resolution if	-0.124939
-0.106516	// Faster if	-0.425969
-0.499969	database anyway if	-0.124939
-0.501689	Example 7.8 if	-0.124939
-0.499969	RAM space, if	-0.124939
-0.499969	severe delays if	-0.124939
-0.499969	of longjmp if	-0.124939
-0.718397	programming questions if	-0.124939
-0.718397	library (STL) if	-0.124939
-0.721245	// Check if	-0.124939
-0.825115	to determine if	-0.124939
-0.718397	branch mispredictions if	-0.124939
-0.459044	multiple accumulators if	-0.124939
-0.652669	pointer aliasing" if	-0.124939
-0.652669	39916800, 479001600}; if	-0.124939
-0.459044	copy constructor, if	-0.124939
-0.459044	> v.f if	-0.124939
-0.065344	many branches): if	-0.425969
-0.065344	u, v; if	-0.425969
-0.459044	relatively cheap if	-0.124939
-0.065344	Weekdays Day; if	-0.425969
-0.652669	time consumer if	-0.124939
-0.652669	be avoided, if	-0.124939
-0.459044	separate subroutine if	-0.124939
-0.652669	memory leaks if	-0.124939
-0.355315	Example 14.5b if	-0.124939
-0.355315	& N-1)==0 if	-0.124939
-0.355315	call WriteFile if	-0.124939
-0.355315	n; 143 if	-0.124939
-0.355315	A Number) if	-0.124939
-0.355315	function calls, if	-0.124939
-0.355315	Example 14.4b if	-0.124939
-0.355315	with zero-bits if	-0.124939
-0.355315	Example 14.15b if	-0.124939
-0.355315	element (approximately): if	-0.124939
-0.355315	6); Or, if	-0.124939
-0.355315	is inexact if	-0.124939
-0.355315	are modified, if	-0.124939
-0.355315	be adjusted if	-0.124939
-0.355315	Example 8.10a if	-0.124939
-0.355315	at runtime, if	-0.124939
-0.355315	will occur: if	-0.124939
-0.355315	bits (YMM) if	-0.124939
-0.355315	the destructor, if	-0.124939
-0.355315	the sign-bit if	-0.124939
-0.355315	be ignored if	-0.124939
-0.355315	always normalized, if	-0.124939
-0.355315	are uninitialized, if	-0.124939
-0.355315	bits (XMM) if	-0.124939
-0.355315	or __restrict__, if	-0.124939
-0.355315	a minute if	-0.124939
-0.355315	Example 14.15a if	-0.124939
-0.355315	float list[ARRAYSIZE]; if	-0.124939
-0.355315	Don't panic if	-0.124939
-0.355315	be reversed if	-0.124939
-0.355315	level linking" if	-0.124939
-0.355315	is minimized if	-0.124939
-0.355315	not alias, if	-0.124939
-2.360853	function is by	-0.124939
-0.746692	pointed to by	-0.124939
-0.595929	possible and by	-0.124939
-1.067005	linking and by	-0.124939
-0.595929	invalid, and by	-0.124939
-0.595929	period and by	-0.124939
-1.571462	function or by	-0.124939
-0.580780	int or by	-0.124939
-1.143174	static or by	-0.124939
-1.124621	precision or by	-0.124939
-0.861524	process or by	-0.124939
-0.580780	latency or by	-0.124939
-0.580780	indices or by	-0.124939
-0.580780	signed, or by	-0.124939
-0.894085	replace it by	-0.124939
-0.597534	multiply it by	-0.124939
-1.055693	single function by	-0.124939
-0.501944	leaf function by	-0.425969
-1.580480	intermediate code by	-0.124939
-1.564046	position-independent code by	-0.124939
-0.884147	compiler-generated code by	-0.124939
-0.598393	system, not by	-0.124939
-1.020372	rather than by	-0.301030
-0.565483	classes than by	-0.124939
-0.565483	ways than by	-0.124939
-0.565483	algorithm than by	-0.124939
-0.599333	different compiler by	-0.124939
-1.292416	of x by	-0.124939
-1.349598	of this by	-0.124939
-0.544753	it this by	-0.124939
-1.148416	do this by	-0.124939
-0.544753	does this by	-0.124939
-1.113901	avoid this by	-0.124939
-0.142614	replace this by	-0.823909
-0.544753	improve this by	-0.124939
-0.544753	confirmed this by	-0.124939
-1.541928	much more by	-0.124939
-0.592943	once more by	-0.124939
-1.529589	in memory by	-0.124939
-2.175875	the program by	-0.124939
-1.442732	whole program by	-0.124939
-0.598166	speed-critical functions by	-0.124939
-2.222903	the CPU by	-0.124939
-1.021935	the loop by	-0.221849
-0.747463	this loop by	-0.124939
-0.065111	out loop by	-0.522879
-1.133814	innermost loop by	-0.124939
-1.273954	is used by	-0.124939
-1.180972	are used by	-0.124939
-0.776820	time used by	-0.124939
-0.189707	memory used by	-0.124939
-1.108503	often used by	-0.124939
-0.534352	was used by	-0.124939
-0.663563	into one by	-0.425969
-0.580072	inserted, one by	-0.124939
-1.712629	you should by	-0.124939
-0.893636	is set by	-0.124939
-1.416265	child class by	-0.124939
-1.348066	a double by	-0.124939
-0.597222	longer size by	-0.124939
-0.573780	replaced i by	-0.124939
-0.198315	divide i by	-0.425969
-1.819255	an object by	-0.124939
-1.359094	point number by	-0.124939
-1.190790	CPU clock by	-0.124939
-1.068908	absolute value by	-0.124939
-0.595362	integer variable by	-0.124939
-1.060770	global variables by	-0.124939
-1.753716	of 2 by	-0.124939
-0.858433	to 2 by	-0.124939
-0.595108	align table by	-0.124939
-1.180817	the performance by	-0.124939
-0.565145	measuring performance by	-0.124939
-1.365717	the branch by	-0.124939
-1.404268	a branch by	-0.124939
-0.563910	predictable branch by	-0.124939
-0.596770	b member by	-0.124939
-0.596205	intelligible way by	-0.124939
-0.596486	functions faster by	-0.124939
-0.596476	information stored by	-0.124939
-1.544499	is called by	-0.124939
-0.195479	functions called by	-0.124939
-1.360148	memory address by	-0.124939
-0.578139	each address by	-0.124939
-1.615846	function call by	-0.124939
-0.994812	of optimization by	-0.124939
-0.364465	to optimization by	-0.249877
-1.520926	in registers by	-0.124939
-0.997736	ZMM registers by	-0.124939
-1.331235	64-bit systems by	-0.124939
-0.593511	exclusive access by	-0.124939
-0.805007	ruled out by	-0.124939
-0.193253	rolled out by	-0.124939
-1.326356	a file by	-0.124939
-0.885463	different type by	-0.124939
-0.883790	this error by	-0.124939
-1.186854	is accessed by	-0.124939
-0.191365	not accessed by	-0.425969
-0.831559	and arrays by	-0.124939
-0.564823	replace arrays by	-0.124939
-1.048741	32-bit Windows by	-0.124939
-0.592360	delays execution by	-0.124939
-0.592817	better result by	-0.124939
-1.340517	16 bytes by	-0.124939
-1.297556	the speed by	-0.124939
-0.574647	in speed by	-0.425969
-1.328850	for overflow by	-0.124939
-0.513857	is done by	-0.124939
-0.387326	and done by	-0.124939
-0.710966	be done by	-0.124939
-0.387326	has done by	-0.124939
-0.387326	was done by	-0.124939
-0.387326	necessarily done by	-0.124939
-1.708206	double precision by	-0.124939
-0.558757	this line by	-0.124939
-0.558757	interpreted line by	-0.124939
-1.056548	This works by	-0.124939
-1.158976	be optimized by	-0.124939
-0.389688	be calculated by	-0.204120
-0.878957	function uses by	-0.124939
-1.042257	to another by	-0.124939
-1.269371	not advantageous by	-0.124939
-1.145657	are implemented by	-0.124939
-0.557704	typically implemented by	-0.124939
-1.205114	the problem by	-0.124939
-0.684004	this problem by	-0.124939
-0.192383	is supported by	-0.271067
-0.402183	and supported by	-0.124939
-0.075529	are supported by	-0.301030
-0.118869	if supported by	-0.425969
-1.039202	or 1 by	-0.124939
-0.589035	table values by	-0.124939
-0.345576	number simply by	-0.124939
-0.345576	error simply by	-0.124939
-0.345576	done simply by	-0.124939
-0.345576	implemented simply by	-0.124939
-0.345576	numbers simply by	-0.124939
-0.345576	copied simply by	-0.124939
-0.345576	brand simply by	-0.124939
-0.345576	measured simply by	-0.124939
-0.345576	significantly simply by	-0.124939
-1.591911	loop counter by	-0.124939
-1.256354	memory space by	-0.124939
-1.008612	cache space by	-0.124939
-0.938511	the multiplication by	-0.124939
-0.173037	The multiplication by	-0.124939
-0.662052	integer multiplication by	-0.124939
-0.585990	inlined automatically by	-0.124939
-0.903811	is zero by	-0.124939
-1.192555	to zero by	-0.124939
-0.131259	than division by	-0.425969
-0.555442	point division by	-0.124939
-0.320274	one division by	-0.124939
-0.283651	Integer division by	-0.726999
-0.585255	actually needed by	-0.124939
-1.555240	are transferred by	-0.124939
-0.089920	be aligned by	-0.124939
-0.599563	are aligned by	-0.124939
-0.357593	typically aligned by	-0.124939
-0.357593	preferably aligned by	-0.124939
-0.582538	to dispatch by	-0.124939
-0.866333	is declared by	-0.124939
-0.582538	every second by	-0.124939
-0.585059	calculations piece by	-0.124939
-0.002030	is divisible by	-0.425969
-0.006117	be divisible by	-0.124939
-0.006117	not divisible by	-0.425969
-0.012322	size divisible by	-0.124939
-0.002437	address divisible by	-0.221849
-0.003048	addresses divisible by	-0.425969
-0.581256	significantly just by	-0.124939
-0.516741	even smaller by	-0.124939
-0.516741	made smaller by	-0.124939
-1.122119	CPU core by	-0.124939
-1.014672	to 5 by	-0.124939
-0.024554	is replaced by	-0.124939
-0.078265	and replaced by	-0.124939
-0.083828	be replaced by	-0.367977
-0.078265	are replaced by	-0.124939
-0.078265	been replaced by	-0.124939
-0.078265	parameters replaced by	-0.124939
-0.579475	not negative by	-0.124939
-1.013075	this section by	-0.124939
-1.368853	be predicted by	-0.124939
-1.011785	the conversions by	-0.124939
-0.579660	and off by	-0.124939
-0.857018	the index by	-0.124939
-0.151677	be avoided by	-0.176091
-0.577213	this fact by	-0.124939
-0.458007	is limited by	-0.124939
-0.428391	be limited by	-0.124939
-1.002957	be inlined by	-0.124939
-0.850152	a database by	-0.124939
-1.244080	the destructor by	-0.124939
-0.849488	may save by	-0.124939
-0.575491	code further by	-0.124939
-0.577262	improve efficiency by	-0.124939
-0.576907	you unroll by	-0.124939
-0.573513	when alignment by	-0.124939
-0.573135	Replace macro by	-0.124939
-0.490421	can divide by	-0.124939
-0.490421	= divide by	-0.124939
-0.358307	is obtained by	-0.124939
-0.665969	be obtained by	-0.124939
-1.097501	more efficiently by	-0.124939
-1.315902	be changed by	-0.124939
-0.572034	to square by	-0.124939
-0.571217	big structures by	-0.124939
-0.570293	array initialized by	-0.124939
-0.058675	be improved by	-0.249877
-0.569848	faster either by	-0.124939
-1.115040	is copied by	-0.124939
-0.672223	// align by	-0.124939
-0.672223	; align by	-0.124939
-0.569848	such loops by	-0.124939
-0.566755	a module by	-0.124939
-0.356576	to gain by	-0.124939
-0.530235	you gain by	-0.124939
-0.834237	each row by	-0.124939
-0.501890	can multiply by	-0.124939
-0.265011	= multiply by	-0.124939
-0.511171	lazy binding by	-0.425969
-1.002339	is converted by	-0.124939
-0.822955	two additions by	-0.124939
-0.560147	originally designed by	-0.124939
-0.559550	multiply j by	-0.124939
-0.431990	code explicitly by	-0.124939
-0.431990	alignment explicitly by	-0.124939
-0.433325	of multiplying by	-0.124939
-0.433325	than multiplying by	-0.124939
-0.558954	this jump by	-0.124939
-0.165388	is determined by	-0.425969
-0.212920	be determined by	-0.124939
-0.187119	often determined by	-0.124939
-1.151401	cache misses by	-0.124939
-1.034735	context switches by	-0.124939
-0.554476	Other manuals by	-0.124939
-1.076896	more compact by	-0.124939
-0.800606	two comparisons by	-0.124939
-0.547029	91 step by	-0.124939
-0.547802	alias anything by	-0.124939
-0.909194	multiple inheritance by	-0.124939
-0.537719	be overcome by	-0.124939
-0.060803	code generated by	-0.425969
-0.131525	files generated by	-0.124939
-0.131525	comments generated by	-0.124939
-0.537719	dynamically created by	-0.124939
-0.536812	of pointers, by	-0.124939
-0.781129	be increased by	-0.124939
-0.229651	// Division by	-0.124939
-0.229651	faster. Division by	-0.124939
-0.229651	matters: Division by	-0.124939
-0.536812	these guidelines by	-0.124939
-0.537719	are combined by	-0.124939
-0.524117	b<c) Multiply by	-0.124939
-0.524117	Does not, by	-0.124939
-0.128823	by 2n by	-0.124939
-0.759087	be prevented by	-0.124939
-0.525217	be illustrated by	-0.124939
-0.524117	are returned by	-0.124939
-0.757204	example 8.26a by	-0.124939
-0.065435	be identified by	-0.124939
-0.031486	are identified by	-0.124939
-0.065435	objects identified by	-0.124939
-0.065435	is multiplied by	-0.124939
-0.031486	be multiplied by	-0.425969
-0.065435	index multiplied by	-0.124939
-0.077433	be modified by	-0.425969
-0.171734	never modified by	-0.124939
-0.523020	unconventional manner by	-0.124939
-0.523020	avoid hyperthreading by	-0.124939
-0.500963	later deleted by	-0.124939
-0.500963	be hidden by	-0.124939
-0.500963	to zero, by	-0.124939
-0.500963	done manually by	-0.124939
-0.106649	be spaced by	-0.425969
-0.106649	are separated by	-0.124939
-0.106649	be solved by	-0.425969
-0.089638	- Divide by	-0.124939
-0.089638	add Divide by	-0.124939
-0.089638	---xx---x Divide by	-0.124939
-0.502354	120 ms by	-0.124939
-0.720042	branch mispredictions by	-0.124939
-0.827219	if necessary, by	-0.124939
-0.459949	exceptions thrown by	-0.124939
-0.459949	is bypassed by	-0.124939
-0.654082	Intensive Codes", by	-0.124939
-0.459949	of ArraySize by	-0.124939
-0.459949	code everywhere by	-0.124939
-0.459949	Linux Align by	-0.124939
-0.459949	performance dramatically by	-0.124939
-0.459949	before dividing by	-0.124939
-0.459949	are relocated by	-0.124939
-0.459949	command received by	-0.124939
-0.459949	size grows by	-0.124939
-0.459949	data segment by	-0.124939
-0.654082	a bitfield by	-0.124939
-0.356027	threads Parallelization by	-0.124939
-0.356027	[1.0, 2.0) by	-0.124939
-0.356027	and investigated by	-0.124939
-0.356027	is copyrighted by	-0.124939
-0.356027	// Modulo by	-0.124939
-0.356027	of doubles by	-0.124939
-0.356027	is caught by	-0.124939
-0.356027	replace u[1] by	-0.124939
-0.356027	paralleli- zation by	-0.124939
-0.356027	not affected by	-0.124939
-0.356027	operations. Multiplying by	-0.124939
-0.356027	place indicated by	-0.124939
-0.356027	be mitigated by	-0.124939
-0.356027	libraries published by	-0.124939
-0.356027	be ameliorated by	-0.124939
-0.356027	be followed by	-0.124939
-0.356027	be caused by	-0.124939
-0.356027	obviously influenced by	-0.124939
-0.356027	be accomplished by	-0.124939
-0.356027	when activated by	-0.124939
-0.356027	still frustrated by	-0.124939
-0.884971	list or with	-0.124939
-0.884971	delete or with	-0.124939
-0.592906	polymorphism or with	-0.124939
-0.883394	write it with	-0.124939
-0.883394	replace it with	-0.124939
-0.592100	XOR'ing it with	-0.124939
-0.592100	AND'ing it with	-0.124939
-2.180920	a function with	-0.124939
-0.881413	simple function with	-0.124939
-1.248740	another function with	-0.124939
-1.258707	pure function with	-0.124939
-1.073431	log on with	-0.124939
-2.193653	the code with	-0.124939
-1.141061	this code with	-0.124939
-0.585416	class code with	-0.124939
-1.223540	optimized code with	-0.124939
-0.585416	user-written code with	-0.124939
-0.594258	possibly not with	-0.124939
-0.594258	specialization, not with	-0.124939
-0.594095	signed than with	-0.124939
-0.594095	parallelism than with	-0.124939
-1.488493	a compiler with	-0.124939
-0.586131	each compiler with	-0.124939
-0.586131	Borland compiler with	-0.124939
-0.586131	friendly compiler with	-0.124939
-1.051695	optimize this with	-0.124939
-0.592703	explain this with	-0.124939
-1.811017	of memory with	-0.124939
-2.055966	the data with	-0.124939
-2.169991	the program with	-0.124939
-0.590731	A program with	-0.124939
-1.187842	of functions with	-0.124939
-1.488407	works only with	-0.124939
-0.486219	a CPU with	-0.124939
-1.771344	the other with	-0.124939
-1.239492	a loop with	-0.124939
-0.368891	A loop with	-0.425969
-2.042265	be used with	-0.124939
-1.896219	are used with	-0.124939
-1.765350	an integer with	-0.124939
-1.036884	simple integer with	-0.124939
-1.101277	a class with	-0.425969
-1.481127	or class with	-0.124939
-1.846341	can do with	-0.124939
-0.495261	above example with	-0.124939
-0.596965	kb size with	-0.124939
-0.851326	&& b with	-0.124939
-0.575407	|| b with	-0.124939
-0.575407	AND'ed b with	-0.124939
-0.772468	function library with	-0.301030
-0.193333	this library with	-0.425969
-0.597524	comparing i with	-0.124939
-1.816344	an object with	-0.124939
-1.561659	an array with	-0.124939
-0.987124	linear array with	-0.124939
-0.569158	variable-size array with	-0.124939
-1.148747	debug version with	-0.124939
-0.865401	release version with	-0.124939
-1.186933	allocated objects with	-0.124939
-0.595004	member variable with	-0.124939
-0.593886	can return with	-0.124939
-1.667287	a table with	-0.124939
-1.445643	of software with	-0.124939
-0.578401	these elements with	-0.124939
-0.578401	distinguish elements with	-0.124939
-2.063456	is faster with	-0.124939
-1.824132	be stored with	-0.124939
-1.689501	is called with	-0.124939
-0.576912	erroneously called with	-0.124939
-1.394563	Pentium 4 with	-0.124939
-1.613383	function call with	-0.124939
-1.871626	function libraries with	-0.124939
-1.262458	A template with	-0.124939
-0.189323	to systems with	-0.425969
-0.532652	in systems with	-0.124939
-0.532652	utilize systems with	-0.124939
-0.593903	A method with	-0.124939
-0.887669	carried out with	-0.124939
-0.882942	a system with	-0.124939
-0.593546	* 32 with	-0.124939
-0.366956	multiple bits with	-0.425969
-0.547913	return operations with	-0.124939
-0.547913	makes operations with	-0.124939
-0.800805	arithmetic operations with	-0.124939
-1.052082	function type with	-0.124939
-1.605398	the case with	-0.124939
-0.462877	on processors with	-0.124939
-0.899870	first processors with	-0.124939
-0.832007	PC processors with	-0.124939
-0.503215	lightweight processors with	-0.124939
-0.885704	only available with	-0.124939
-1.421318	a constant with	-0.124939
-0.543860	integer constant with	-0.124939
-0.543860	single constant with	-0.124939
-0.545097	end up with	-0.124939
-0.192114	keep up with	-0.425969
-1.356511	many times with	-0.124939
-0.567252	250 times with	-0.124939
-1.185777	is accessed with	-0.124939
-0.632124	be accessed with	-0.124939
-0.175384	for CPUs with	-0.124939
-0.324523	on CPUs with	-0.124939
-0.474298	Use CPUs with	-0.124939
-0.474298	Older CPUs with	-0.124939
-0.992541	of arrays with	-0.124939
-0.784542	aligned arrays with	-0.124939
-0.538753	variable-size arrays with	-0.124939
-1.157390	to work with	-0.124939
-0.538301	may work with	-0.124939
-1.048014	doesn't work with	-0.124939
-0.881318	mathematical calculations with	-0.124939
-1.511985	multiple versions with	-0.124939
-0.591443	i7 processor with	-0.124939
-0.954329	is compiled with	-0.124939
-0.718215	be compiled with	-0.124939
-0.632649	are compiled with	-0.124939
-0.405057	code compiled with	-0.124939
-0.632649	when compiled with	-0.124939
-0.446120	normally compiled with	-0.124939
-0.660625	of threads with	-0.124939
-0.660625	more threads with	-0.124939
-0.660625	other threads with	-0.124939
-0.464122	into threads with	-0.124939
-0.922446	two threads with	-0.124939
-0.660625	separate threads with	-0.124939
-0.880592	high-level language with	-0.124939
-1.267331	separate thread with	-0.124939
-0.547312	to compile with	-0.124939
-0.953413	you compile with	-0.124939
-0.531827	has allocated with	-0.124939
-0.189135	Memory allocated with	-0.425969
-1.046723	integer overflow with	-0.124939
-0.895171	of integers with	-0.124939
-0.895171	64-bit integers with	-0.124939
-0.772199	8-bit integers with	-0.124939
-1.170441	32-bit Linux with	-0.124939
-0.589513	wrapper classes with	-0.124939
-0.666588	is done with	-0.124939
-0.371190	be done with	-0.346788
-0.466350	are done with	-0.124939
-1.054725	command line with	-0.124939
-0.879767	method works with	-0.124939
-1.505761	be calculated with	-0.124939
-0.557111	always calculated with	-0.124939
-1.129853	is implemented with	-0.124939
-1.402854	be implemented with	-0.124939
-0.979142	are implemented with	-0.124939
-0.496686	calculation implemented with	-0.124939
-0.808788	a problem with	-0.124939
-0.808788	The problem with	-0.124939
-0.464941	A problem with	-0.124939
-0.464941	Another problem with	-0.124939
-0.464941	serious problem with	-0.124939
-1.684186	is known with	-0.124939
-0.590389	12.6. Function with	-0.124939
-0.589377	linear list with	-0.124939
-0.944656	may run with	-0.124939
-0.552442	test run with	-0.124939
-0.588537	Works well with	-0.124939
-0.588854	different addresses with	-0.124939
-1.588776	loop counter with	-0.124939
-1.744111	memory allocation with	-0.124939
-0.588209	platform. However, with	-0.124939
-1.143517	in programs with	-0.124939
-0.587113	common problems with	-0.124939
-1.738288	CPU dispatching with	-0.124939
-0.191458	A microprocessor with	-0.425969
-0.588044	container, preferably with	-0.124939
-0.585938	An application with	-0.124939
-1.034470	An expression with	-0.124939
-1.567069	data members with	-0.124939
-0.585127	table-based methods with	-0.124939
-0.586488	comparing signed with	-0.124939
-0.584900	A model with	-0.124939
-1.030944	integer division with	-0.124939
-0.583685	>> n with	-0.124939
-0.585899	always end with	-0.124939
-0.532084	mathematical applications with	-0.124939
-0.532084	CPU-intensive applications with	-0.124939
-0.774765	an addition with	-0.124939
-1.162097	point addition with	-0.124939
-1.438473	different types with	-0.124939
-0.866892	such optimizations with	-0.124939
-0.523232	a platform with	-0.124939
-0.523232	PC platform with	-0.124939
-1.449427	or later with	-0.124939
-0.865159	linked together with	-0.124939
-0.524708	variables declared with	-0.124939
-0.524708	macro declared with	-0.124939
-0.521252	to link with	-0.124939
-0.521252	and link with	-0.124939
-0.520224	object made with	-0.124939
-0.520224	projects made with	-0.124939
-0.582451	eight) points with	-0.124939
-0.515564	or modules with	-0.124939
-0.515564	critical modules with	-0.124939
-0.859422	the core with	-0.124939
-0.916160	do things with	-0.124939
-0.512945	funny things with	-0.124939
-1.214269	be tested with	-0.124939
-0.644631	a computer with	-0.124939
-0.453881	A computer with	-0.124939
-0.453881	old computer with	-0.124939
-0.023256	is compatible with	-0.124939
-0.031293	be compatible with	-0.124939
-0.109230	not compatible with	-0.124939
-0.101583	even compatible with	-0.124939
-0.101583	highly compatible with	-0.124939
-0.176240	backwards compatible with	-0.124939
-0.101583	mostly compatible with	-0.124939
-0.524216	switch statement with	-0.124939
-1.139303	allocated dynamically with	-0.124939
-1.058538	// Loop with	-0.124939
-0.508235	12.4a. Loop with	-0.124939
-0.854619	a network with	-0.124939
-0.537962	that comes with	-0.124939
-0.382204	compiler comes with	-0.124939
-0.382204	It comes with	-0.124939
-0.382204	which comes with	-0.124939
-0.816797	other platforms with	-0.124939
-0.496015	common platforms with	-0.124939
-0.978311	be vectorized with	-0.124939
-0.496344	example, vectorized with	-0.124939
-0.574632	any algorithm with	-0.124939
-0.256343	the compatibility with	-0.124939
-0.242851	of compatibility with	-0.124939
-0.256343	when compatibility with	-0.124939
-0.256343	backwards compatibility with	-0.124939
-0.575565	polymorphism effect with	-0.124939
-1.401201	to predict with	-0.124939
-0.479957	is obtained with	-0.124939
-0.290071	be obtained with	-0.425969
-0.286478	doubt obtained with	-0.124939
-1.096357	more efficiently with	-0.124939
-0.845874	have names with	-0.124939
-0.994743	of N with	-0.124939
-0.570225	number (e.g. with	-0.124939
-0.643347	data structures with	-0.124939
-0.569921	representation directly with	-0.124939
-0.841957	are defined with	-0.124939
-0.093165	that come with	-0.301030
-0.317674	circular buffer with	-0.425969
-1.079441	can happen with	-0.124939
-0.968366	C style with	-0.124939
-0.828363	nontemporal writes with	-0.124939
-1.250622	dependency chains with	-0.124939
-0.562494	compares eax with	-0.124939
-0.560459	libraries included with	-0.124939
-0.559791	vector c2 with	-0.124939
-0.822303	two additions with	-0.124939
-0.957976	a DLL with	-0.124939
-0.821082	small devices with	-0.124939
-0.254627	by multiplying with	-0.124939
-0.254627	when multiplying with	-0.124939
-0.206963	before multiplying with	-0.425969
-1.089647	be determined with	-0.124939
-0.552571	calculations. Even with	-0.124939
-0.581975	Runtime polymorphism with	-0.124939
-0.412476	Compile-time polymorphism with	-0.124939
-0.944973	is measured with	-0.124939
-0.796861	Calculate polynomial with	-0.124939
-0.798400	of Func with	-0.124939
-0.546568	disk. Test with	-0.124939
-0.799945	powerful computers with	-0.124939
-0.545705	the users with	-0.124939
-0.545705	be mixed with	-0.124939
-0.548299	of communication with	-0.124939
-0.546568	style type-casting with	-0.124939
-0.538301	vector bc with	-0.124939
-0.142808	is swapped with	-0.425969
-0.564049	// Array with	-0.124939
-0.357008	7.15a. Array with	-0.124939
-0.536272	; compare with	-0.124939
-0.142604	is contiguous with	-0.425969
-0.536272	them separately with	-0.124939
-0.029321	is AND'ed with	-0.249877
-0.537286	compiler combined with	-0.124939
-0.536272	avoid macros with	-0.124939
-0.522496	and databases with	-0.124939
-0.523724	12.4b. Vectorized with	-0.124939
-0.522496	column 29 with	-0.124939
-0.522496	that begin with	-0.124939
-0.313302	is represented with	-0.124939
-0.442409	be represented with	-0.124939
-0.077386	are incompatible with	-0.124939
-0.171618	code incompatible with	-0.124939
-0.756307	PLT entry with	-0.124939
-0.522496	some tests with	-0.124939
-0.523724	behavior well-defined with	-0.124939
-0.042481	are satisfied with	-0.124939
-0.089573	not satisfied with	-0.124939
-0.089573	Gnu Comes with	-0.124939
-0.089573	Microsoft Comes with	-0.124939
-0.089573	Embarcadero Comes with	-0.124939
-0.502021	be performed with	-0.124939
-0.248136	MKL). Works with	-0.124939
-0.248136	(IPP). Works with	-0.124939
-0.502021	is supplied with	-0.124939
-0.502021	or moved with	-0.124939
-0.502021	duration compared with	-0.124939
-0.459496	are impossible with	-0.124939
-0.459496	be manipulated with	-0.124939
-0.459496	optimizations. Loops with	-0.124939
-0.459496	example 14.14a with	-0.124939
-0.142399	} Microprocessors with	-0.124939
-0.142399	control Microprocessors with	-0.124939
-0.459496	regular patterns with	-0.124939
-0.459496	often conflicting with	-0.124939
-0.459496	indexes, working with	-0.124939
-0.142399	problems associated with	-0.124939
-0.142399	errors associated with	-0.124939
-0.459496	and machines with	-0.124939
-0.459496	4 ways, with	-0.124939
-0.459496	cause complications with	-0.124939
-0.142399	for dealing with	-0.124939
-0.142399	are dealing with	-0.124939
-0.653375	by extending with	-0.124939
-0.142399	may interfere with	-0.124939
-0.142399	will interfere with	-0.124939
-0.459496	that begins with	-0.124939
-0.459496	an IDE with	-0.124939
-0.355671	Documentation". Included with	-0.124939
-0.355671	before coordination with	-0.124939
-0.355671	example (12.4e) with	-0.124939
-0.355671	ADC (add with	-0.124939
-0.355671	12.1b. Vectorization with	-0.124939
-0.355671	multiple configurations with	-0.124939
-0.355671	Far Systems with	-0.124939
-0.355671	is correlated with	-0.124939
-0.355671	through 14, with	-0.124939
-0.355671	matters. Problems with	-0.124939
-0.355671	to trace with	-0.124939
-0.355671	competition. Processors with	-0.124939
-0.355671	Big supercomputers with	-0.124939
-0.355671	project built with	-0.124939
-0.355671	calling vector::reserve with	-0.124939
-0.355671	been unsatisfied with	-0.124939
-0.355671	be reached with	-0.124939
-0.355671	-0 (zero with	-0.124939
-0.355671	12.4b, rewritten with	-0.124939
-0.355671	multiple streams with	-0.124939
-0.355671	in connection with	-0.124939
-0.355671	array coincides with	-0.124939
-0.355671	and invoked with	-0.124939
-0.355671	dividing repeatedly with	-0.124939
-0.355671	calculate pow(x,10) with	-0.124939
-0.355671	usually dealt with	-0.124939
-0.355671	might clash with	-0.124939
-0.355671	I disagree with	-0.124939
-0.355671	spent fighting with	-0.124939
-2.700743	it is on	-0.124939
-0.599944	focus is on	-0.124939
-1.364923	processors and on	-0.124939
-1.276871	this function on	-0.124939
-1.184686	same function on	-0.124939
-1.020315	but not on	-0.124939
-0.874187	registers, not on	-0.124939
-0.587368	research, not on	-0.124939
-1.115521	rather than on	-0.823909
-0.848322	expressions than on	-0.124939
-0.573813	interface than on	-0.124939
-1.885522	Gnu compiler on	-0.124939
-1.964423	the time on	-0.124939
-0.856168	CPU time on	-0.124939
-1.010571	its time on	-0.124939
-1.135510	execution time on	-0.124939
-0.856168	spend time on	-0.124939
-0.896963	resource use on	-0.124939
-0.598851	For more on	-0.124939
-0.598421	allocates memory on	-0.124939
-0.598196	speed-critical program on	-0.124939
-0.546996	work only on	-0.124939
-1.317429	works only on	-0.124939
-0.192535	well only on	-0.425969
-0.546996	optimal only on	-0.124939
-0.192535	depends only on	-0.124939
-0.546996	tested only on	-0.124939
-1.185295	at all on	-0.124939
-0.598302	numbers, but on	-0.124939
-2.045984	be used on	-0.124939
-0.589960	typically used on	-0.124939
-0.596520	The example on	-0.124939
-1.533913	integer size on	-0.124939
-1.944100	the object on	-0.124939
-0.596806	generally possible on	-0.124939
-0.866194	advanced version on	-0.124939
-0.866194	inferior version on	-0.124939
-0.596742	other objects on	-0.124939
-0.907056	of performance on	-0.124939
-1.043206	The performance on	-0.124939
-0.117185	reduced performance on	-0.301030
-1.187019	very long on	-0.124939
-1.392793	is stored on	-0.124939
-1.490768	be stored on	-0.124939
-0.834215	are stored on	-0.602060
-0.515359	Variables stored on	-0.124939
-1.926119	is called on	-0.124939
-1.689530	to test on	-0.124939
-1.664425	is useful on	-0.124939
-0.594393	applications even on	-0.124939
-1.328754	a file on	-0.124939
-1.220280	vector operations on	-0.124939
-0.527739	do operations on	-0.124939
-0.527739	mathematical operations on	-0.124939
-0.527739	Similar operations on	-0.124939
-1.723301	some cases on	-0.124939
-0.798111	the processors on	-0.124939
-1.054851	of processors on	-0.124939
-0.798111	virtual processors on	-0.124939
-0.836566	less important on	-0.124939
-0.567526	particularly important on	-0.124939
-1.177678	are accessed on	-0.425969
-1.258740	inline assembly on	-0.124939
-0.569891	to work on	-0.124939
-0.448846	that work on	-0.124939
-0.636844	not work on	-0.124939
-0.448846	this work on	-0.124939
-0.168848	directives work on	-0.425969
-0.942076	the calculations on	-0.124939
-0.880302	do calculations on	-0.124939
-0.491334	doing calculations on	-0.124939
-0.491334	start calculations on	-0.124939
-0.491334	parallel calculations on	-0.124939
-0.592768	cross- compiled on	-0.124939
-0.592154	64 bytes on	-0.124939
-1.510879	multiple threads on	-0.124939
-0.487058	work best on	-0.124939
-0.167289	works best on	-0.124939
-1.254492	The speed on	-0.124939
-0.592223	very much on	-0.124939
-0.560936	possible overflow on	-0.124939
-0.560936	buffer overflow on	-0.124939
-0.881274	64 matrix on	-0.124939
-1.429667	container classes on	-0.124939
-1.640946	be done on	-0.124939
-0.880601	of precision on	-0.124939
-1.148956	that works on	-0.124939
-0.558766	also works on	-0.124939
-0.591761	a manual on	-0.124939
-0.323157	is explained on	-0.124939
-0.499248	are explained on	-0.124939
-0.272427	as explained on	-1.204120
-0.045738	reasons explained on	-0.726999
-1.047110	the parameters on	-0.124939
-0.557175	(three parameters on	-0.124939
-0.557868	extra check on	-0.124939
-0.557868	bounds check on	-0.124939
-0.557868	if implemented on	-0.124939
-0.557868	preferably implemented on	-0.124939
-0.591052	fastest solution on	-0.124939
-1.437992	is supported on	-0.124939
-0.755940	and supported on	-0.124939
-0.522282	only supported on	-0.124939
-0.878156	decrement operators on	-0.124939
-0.519156	then run on	-0.124939
-0.519156	only run on	-0.124939
-0.519156	still run on	-0.124939
-0.177151	work well on	-0.124939
-0.177151	works well on	-0.124939
-1.196007	clock cycles on	-0.124939
-0.589095	don't count on	-0.124939
-0.192887	all files on	-0.425969
-0.509125	quite fast on	-0.124939
-0.509125	particularly fast on	-0.124939
-0.509125	sufficiently fast on	-0.124939
-1.313465	is optimal on	-0.124939
-0.587547	of space on	-0.124939
-1.043336	anything else on	-0.124939
-1.451347	CPU dispatching on	-0.124939
-0.544871	makes dispatching on	-0.124939
-0.734595	is running on	-0.124939
-0.200918	when running on	-0.124939
-0.415387	processes running on	-0.124939
-0.584455	performs better on	-0.124939
-1.031590	The examples on	-0.124939
-1.399067	point addition on	-0.124939
-0.867059	algebraic expressions on	-0.124939
-0.860004	are transferred on	-0.425969
-0.766602	all optimizations on	-0.124939
-0.528476	doing optimizations on	-0.124939
-0.583655	rendering graphics on	-0.124939
-0.583416	keep together on	-0.124939
-0.603901	the dispatch on	-0.124939
-0.581762	by storage on	-0.124939
-0.060082	is based on	-0.124939
-0.030957	be based on	-0.124939
-0.003751	are based on	-0.124939
-0.030957	CPU based on	-0.124939
-0.030957	C++ based on	-0.124939
-0.030957	language based on	-0.124939
-0.030957	dispatcher based on	-0.124939
-0.030957	framework based on	-0.124939
-0.030957	go based on	-0.124939
-0.030957	chosen based on	-0.124939
-0.582920	CPU feature on	-0.124939
-0.580465	processor core on	-0.124939
-1.222208	scattered around on	-0.124939
-0.513497	wrap around on	-0.124939
-0.402952	more reductions on	-0.124939
-0.402952	simple reductions on	-0.124939
-0.482133	algebraic reductions on	-0.425969
-0.078300	use depends on	-0.124939
-0.078300	vector depends on	-0.124939
-0.078300	loop depends on	-0.124939
-0.148725	value depends on	-0.124939
-0.037388	branch depends on	-0.124939
-0.037388	calculation depends on	-0.425969
-0.078300	application depends on	-0.124939
-0.078300	addition depends on	-0.124939
-0.078300	predicted depends on	-0.124939
-0.078300	sum depends on	-0.124939
-0.078300	gain depends on	-0.124939
-0.078300	truth depends on	-0.124939
-0.528907	be tested on	-0.124939
-0.582015	fully compatible on	-0.124939
-0.015468	name depending on	-0.124939
-0.015468	ways depending on	-0.124939
-0.015468	dynamically depending on	-0.124939
-0.003050	cycles, depending on	-0.823909
-0.015468	memory, depending on	-0.124939
-0.015468	integers, depending on	-0.124939
-0.015468	64, depending on	-0.124939
-0.015468	four, depending on	-0.124939
-0.015468	meanings depending on	-0.124939
-0.015468	12.4a, depending on	-0.124939
-0.015468	move, depending on	-0.124939
-0.015468	solutions, depending on	-0.124939
-1.661181	be avoided on	-0.124939
-0.685581	to turn on	-0.425969
-0.488053	and turn on	-0.124939
-0.346742	can turn on	-0.124939
-0.346742	not turn on	-0.124939
-0.579119	methods described on	-0.124939
-0.854499	point operation on	-0.124939
-0.577956	model comes on	-0.124939
-0.019348	to rely on	-0.124939
-0.066225	that rely on	-0.425969
-0.019348	can rely on	-0.124939
-0.039599	should rely on	-0.124939
-0.012803	cannot rely on	-0.124939
-0.039599	always rely on	-0.124939
-0.039599	must rely on	-0.124939
-0.039599	Don't rely on	-0.124939
-0.039599	surely rely on	-0.124939
-1.131807	are given on	-0.124939
-0.574696	certain tasks on	-0.124939
-0.358011	any effect on	-0.124939
-0.358011	negative effect on	-0.124939
-0.358011	significant effect on	-0.124939
-0.358011	dramatic effect on	-0.124939
-0.573209	work efficiently on	-0.124939
-0.525684	processor models on	-0.425969
-0.787210	for details on	-0.124939
-0.481627	more details on	-0.124939
-0.571459	chain, especially on	-0.124939
-0.923920	is discussed on	-0.124939
-0.473102	as discussed on	-0.124939
-0.571069	explained below on	-0.124939
-1.018535	The delay on	-0.124939
-0.570290	unit, either on	-0.124939
-0.569901	save ebx on	-0.124939
-0.568088	doing something on	-0.124939
-0.476439	clock cycle on	-0.124939
-0.821720	is fastest on	-0.124939
-1.223497	are listed on	-0.124939
-0.554895	you spend on	-0.124939
-0.554305	make measurements on	-0.124939
-0.553715	were measured on	-0.124939
-0.307183	the log on	-0.124939
-0.307183	The log on	-0.124939
-0.307183	requires log on	-0.124939
-0.389792	is spent on	-0.124939
-0.389792	time spent on	-0.124939
-0.273619	cycles spent on	-0.124939
-0.547509	is 15 on	-0.124939
-0.546832	than normal on	-0.124939
-0.042540	that depend on	-0.124939
-0.042540	should depend on	-0.124939
-0.042540	doesn't depend on	-0.124939
-0.042540	don't depend on	-0.124939
-0.042540	methods depend on	-0.124939
-0.042540	details depend on	-0.124939
-0.961006	optimization effort on	-0.124939
-0.537375	negative list, on	-0.124939
-0.538169	to compromise on	-0.124939
-0.988634	software package on	-0.124939
-0.051572	that relies on	-0.124939
-0.051572	code relies on	-0.124939
-0.051572	program relies on	-0.124939
-0.051572	mechanism relies on	-0.124939
-0.051572	MKL relies on	-0.124939
-0.313631	time. Dispatch on	-0.124939
-0.313631	times: Dispatch on	-0.124939
-0.523564	particularly bad on	-0.124939
-0.758138	page 134 on	-0.124939
-0.065482	for restrictions on	-0.124939
-0.065482	few restrictions on	-0.124939
-0.031508	certain restrictions on	-0.124939
-0.524527	processor appears on	-0.124939
-0.313039	5 μs on	-0.124939
-0.442055	250 μs on	-0.124939
-0.524527	256 bytes) on	-0.124939
-0.523564	in tests on	-0.124939
-0.722923	in detail on	-0.124939
-0.501480	Many advices on	-0.124939
-0.357271	behave differently on	-0.124939
-0.248515	behaves differently on	-0.124939
-0.106719	is performed on	-0.124939
-0.501480	titles. Literature on	-0.124939
-0.501480	more focus on	-0.124939
-0.501480	typically specified on	-0.124939
-0.720900	example 9.5a on	-0.124939
-0.501480	discussion forums on	-0.124939
-0.501480	zero flags on	-0.124939
-0.460420	or interpretation on	-0.124939
-0.142622	Technical Report on	-0.124939
-0.142622	"Technical Report on	-0.124939
-0.460420	www.intel.com. Manual on	-0.124939
-0.654819	cache miss on	-0.124939
-0.460420	general literature on	-0.124939
-0.065482	is concentrated on	-0.124939
-0.460420	of experiments on	-0.124939
-0.460420	has influence on	-0.124939
-0.460420	two (three on	-0.124939
-0.654819	will crash on	-0.124939
-0.654819	predicted perfectly on	-0.124939
-0.460420	run optimally on	-0.124939
-0.065482	is wasted on	-0.124939
-0.460420	rely heavily on	-0.124939
-0.460420	optimization efforts on	-0.124939
-0.460420	Advanced book on	-0.124939
-0.654819	const restriction on	-0.124939
-0.460420	the IDE on	-0.124939
-0.654819	algebraic manipulations on	-0.124939
-0.460420	will stay on	-0.124939
-0.142622	many tips on	-0.124939
-0.142622	some tips on	-0.124939
-0.356398	provided below, on	-0.124939
-0.356398	cache. Files on	-0.124939
-0.356398	discussions. Turn on	-0.124939
-0.356398	objects. Storage on	-0.124939
-0.356398	is pushed on	-0.124939
-0.356398	Wikipedia article on	-0.124939
-0.356398	theory. Advice on	-0.124939
-0.356398	a tag on	-0.124939
-0.356398	Example 7.43 on	-0.124939
-0.356398	based mainly on	-0.124939
-0.356398	runs satisfactorily on	-0.124939
-0.356398	negative impacts on	-0.124939
-0.356398	cycles (depending on	-0.124939
-0.356398	of research on	-0.124939
-0.356398	busy concentrating on	-0.124939
-0.356398	www.amd.com. Advices on	-0.124939
-0.356398	are relying on	-0.124939
-0.356398	BIOS setup. on	-0.124939
-0.356398	a textbook on	-0.124939
-0.356398	different opinions on	-0.124939
-0.356398	Boolean NOT on	-0.124939
-0.356398	is incurred on	-0.124939
-1.445639	is the code	-0.124939
-1.155435	of the code	-0.376751
-1.607562	to the code	-0.124939
-1.356489	and the code	-0.124939
-1.134344	in the code	-0.289749
-1.328487	that the code	-0.221849
-1.292312	if the code	-0.301030
-1.667975	by the code	-0.124939
-1.573629	than the code	-0.124939
-1.154872	when the code	-0.522879
-1.037402	then the code	-0.249877
-1.342334	at the code	-0.425969
-0.817600	make the code	-0.346788
-1.590062	because the code	-0.124939
-1.365663	only the code	-0.124939
-1.248323	If the code	-0.301030
-1.450691	but the code	-0.124939
-1.120924	into the code	-0.124939
-0.415251	makes the code	-0.124939
-1.592744	before the code	-0.124939
-1.335160	sure the code	-0.124939
-1.053827	case the code	-0.124939
-1.313627	making the code	-0.124939
-1.120083	want the code	-0.124939
-0.961871	check the code	-0.124939
-1.351416	whether the code	-0.124939
-0.821474	All the code	-0.124939
-0.362615	optimize the code	-0.124939
-1.243415	However, the code	-0.124939
-1.448326	unless the code	-0.124939
-1.199264	replace the code	-0.124939
-1.199264	Therefore, the code	-0.124939
-1.237579	Here, the code	-0.124939
-1.210719	change the code	-0.124939
-0.961871	copying the code	-0.124939
-0.144512	vectorize the code	-0.221849
-0.821474	Now the code	-0.124939
-0.559338	Whenever the code	-0.124939
-0.559338	prefetching the code	-0.124939
-0.821474	study the code	-0.124939
-0.559338	contrary, the code	-0.124939
-0.559338	reduces the code	-0.124939
-0.559338	organize the code	-0.124939
-0.559338	reorganize the code	-0.124939
-0.559338	tune the code	-0.124939
-2.463636	of a code	-0.124939
-0.895488	which a code	-0.124939
-0.895488	shows a code	-0.124939
-0.598242	execute a code	-0.124939
-1.067828	insert a code	-0.124939
-1.933702	size of code	-0.124939
-0.372435	branch of code	-0.425969
-1.702759	lot of code	-0.124939
-0.260612	piece of code	-0.221849
-1.212600	range of code	-0.124939
-1.567865	amount of code	-0.124939
-0.948325	kinds of code	-0.124939
-1.385840	terms of code	-0.124939
-0.586645	pieces of code	-0.124939
-0.582875	parallelization of code	-0.124939
-0.200215	Caching of code	-0.425969
-0.582875	"Zen of code	-0.124939
-1.740373	needs to code	-0.124939
-0.599517	efficiency and code	-0.124939
-1.196536	names and code	-0.124939
-0.601337	degradation in code	-0.124939
-1.578568	} The code	-0.124939
-0.803825	time. The code	-0.124939
-1.362128	functions. The code	-0.124939
-0.567854	branches The code	-0.124939
-1.286199	data. The code	-0.124939
-0.983718	compilers. The code	-0.124939
-0.567854	together The code	-0.124939
-1.080990	calculations. The code	-0.124939
-0.983718	access. The code	-0.124939
-0.567854	dispatching. The code	-0.124939
-0.567854	X The code	-0.124939
-0.983718	automatically. The code	-0.124939
-0.567854	members. The code	-0.124939
-0.983718	3. The code	-0.124939
-0.837176	operator. The code	-0.124939
-0.567854	specified. The code	-0.124939
-0.567854	allowed. The code	-0.124939
-0.837176	122. The code	-0.124939
-0.567854	know). The code	-0.124939
-0.567854	contiguous. The code	-0.124939
-0.567854	features: The code	-0.124939
-0.567854	tedious. The code	-0.124939
-0.567854	_mm_cvtsd_si32(_mm_load_sd(&x));} The code	-0.124939
-1.354718	and for code	-0.124939
-1.283869	choice for code	-0.124939
-0.597706	criticized for code	-0.124939
-1.076036	likely that code	-0.124939
-1.198074	functions or code	-0.124939
-1.589434	The function code	-0.124939
-0.600285	later with code	-0.124939
-0.600382	Literature on code	-0.124939
-0.961499	} This code	-0.124939
-0.588961	n;} This code	-0.124939
-0.588961	16.1. This code	-0.124939
-1.074929	priority than code	-0.124939
-1.218730	with this code	-0.124939
-0.584305	which this code	-0.124939
-0.584305	running this code	-0.124939
-0.584305	overflow, this code	-0.124939
-1.261376	or when code	-0.124939
-1.261376	time when code	-0.124939
-0.588436	CriticalFunction when code	-0.124939
-1.399065	time. A code	-0.124939
-0.587293	correctly. A code	-0.124939
-0.587293	says. A code	-0.124939
-1.593561	the program code	-0.124939
-1.168975	of program code	-0.124939
-0.959312	The program code	-0.425969
-2.508366	to make code	-0.124939
-1.823723	a different code	-0.124939
-1.528904	the same code	-0.124939
-0.895967	is other code	-0.124939
-1.464326	floating point code	-0.124939
-1.428706	of which code	-0.124939
-0.900065	that all code	-0.124939
-0.573771	call all code	-0.124939
-0.573771	Not all code	-0.124939
-1.068043	Vector class code	-0.124939
-1.070594	make multiple code	-0.124939
-1.733906	and 64-bit code	-0.124939
-0.597988	maintaining such code	-0.124939
-1.716451	less efficient code	-0.124939
-1.425601	situations where code	-0.124939
-0.596987	run any code	-0.124939
-1.875085	This makes code	-0.124939
-1.117242	the critical code	-0.124939
-0.188660	Making critical code	-0.425969
-1.485018	64 bit code	-0.124939
-1.410231	32 bit code	-0.124939
-0.560100	the system code	-0.425969
-0.804933	in system code	-0.124939
-1.069182	the error code	-0.124939
-0.950929	an error code	-0.124939
-0.593725	discussions about code	-0.124939
-1.181176	no extra code	-0.124939
-0.615047	any extra code	-0.124939
-0.496750	add extra code	-0.124939
-0.496750	inserts extra code	-0.124939
-0.169913	and assembly code	-0.425969
-0.643140	use assembly code	-0.124939
-0.643140	need assembly code	-0.124939
-0.169913	following assembly code	-0.425969
-0.816912	inline assembly code	-0.124939
-0.466347	the compiled code	-0.124939
-0.912663	directly compiled code	-0.124939
-0.592366	and small code	-0.124939
-0.592830	for good code	-0.124939
-0.195585	from AVX code	-0.425969
-0.449929	the optimized code	-0.124939
-0.678311	The optimized code	-0.124939
-0.678311	fully optimized code	-0.124939
-0.899431	highly optimized code	-0.124939
-0.590914	or every code	-0.124939
-0.589741	149 All code	-0.124939
-0.592655	the intermediate code	-0.124939
-0.295128	of intermediate code	-0.124939
-0.418117	and intermediate code	-0.124939
-0.468298	The intermediate code	-0.124939
-0.122962	on intermediate code	-0.124939
-0.180673	an intermediate code	-0.221849
-1.646044	to optimize code	-0.124939
-1.043768	the above code	-0.124939
-0.287037	The above code	-0.124939
-0.442951	This above code	-0.124939
-1.331927	the optimal code	-0.124939
-0.546207	less optimal code	-0.124939
-1.091948	a particular code	-0.124939
-1.262154	32-bit Mac code	-0.124939
-1.037881	a complicated code	-0.124939
-1.088708	the source code	-0.124939
-0.537814	The source code	-0.124939
-0.529270	time. Each code	-0.124939
-0.529270	initialization. Each code	-0.124939
-0.526429	that your code	-0.124939
-0.526429	before your code	-0.124939
-0.583450	to binary code	-0.124939
-0.582187	most advanced code	-0.124939
-0.195272	the position-independent code	-0.124939
-0.054811	and position-independent code	-0.124939
-0.054811	use position-independent code	-0.124939
-0.117553	make position-independent code	-0.124939
-0.117553	using position-independent code	-0.124939
-0.117553	without position-independent code	-0.124939
-0.117553	uses position-independent code	-0.124939
-0.117553	special position-independent code	-0.124939
-0.117553	burdensome position-independent code	-0.124939
-0.430257	in vectorized code	-0.124939
-0.430257	The vectorized code	-0.124939
-0.430257	use vectorized code	-0.124939
-0.573505	here. Any code	-0.124939
-0.844549	exactly identical code	-0.124939
-0.977868	position- independent code	-0.124939
-1.135371	to vectorize code	-0.124939
-1.288479	hardware definition code	-0.124939
-0.222577	the machine code	-0.124939
-0.222577	as machine code	-0.124939
-0.222577	into machine code	-0.124939
-0.222577	resulting machine code	-0.124939
-0.390478	code. System code	-0.124939
-0.390478	else. System code	-0.124939
-0.230387	below. Position-independent code	-0.124939
-0.230387	default. Position-independent code	-0.124939
-0.334120	14.12 Position-independent code	-0.124939
-0.212507	a loop-invariant code	-0.124939
-0.060969	and loop-invariant code	-0.124939
-0.131917	out loop-invariant code	-0.124939
-0.313902	size. Vectorized code	-0.124939
-0.313902	operators. Vectorized code	-0.124939
-0.129187	Transforming serial code	-0.425969
-0.313902	on mixing code	-0.124939
-0.313902	when mixing code	-0.124939
-0.525932	Your measurement code	-0.124939
-0.761352	data cache, code	-0.124939
-0.249176	the compiler-generated code	-0.124939
-0.249176	and compiler-generated code	-0.124939
-0.723849	for improving code	-0.124939
-0.503257	and well-structured code	-0.124939
-0.249176	The built-in code	-0.124939
-0.249176	inserts built-in code	-0.124939
-0.503887	contains complete code	-0.124939
-0.065645	Loop invariant code	-0.425969
-0.657351	to non-AVX code	-0.124939
-0.462036	The resulting code	-0.124939
-0.462036	the unsafe code	-0.124939
-0.462036	linker. Both code	-0.124939
-0.462036	script. Interpreted code	-0.124939
-0.357669	is dead code	-0.124939
-0.357669	can build code	-0.124939
-0.357669	the user-written code	-0.124939
-0.357669	the startup code	-0.124939
-0.357669	it. Complicated code	-0.124939
-0.357669	Making exception-safe code	-0.124939
-0.357669	the resultant code	-0.124939
-2.282912	function is as	-0.124939
-1.426234	operator is as	-0.124939
-0.894599	sets is as	-0.124939
-0.378500	destructor is as	-0.425969
-2.339311	to be as	-0.124939
-2.247790	should be as	-0.124939
-1.186793	files are as	-0.124939
-0.597663	i++ are as	-0.124939
-0.598579	parameter, or as	-0.124939
-0.600291	recognizes it as	-0.124939
-1.290322	detection function as	-0.124939
-1.194322	same code as	-0.124939
-0.598586	size, not as	-0.124939
-2.182442	rather than as	-0.124939
-0.594532	buffer than as	-0.124939
-0.600368	access x as	-0.124939
-1.353570	you may as	-0.425969
-1.069408	should have as	-0.124939
-1.268506	same time as	-0.124939
-1.171596	extra time as	-0.124939
-0.598885	for use as	-0.124939
-1.959422	the data as	-0.124939
-0.882182	much data as	-0.124939
-1.715036	the program as	-0.124939
-0.895710	256-bit vector as	-0.124939
-1.076664	the same as	-0.221849
-1.068016	string functions as	-0.124939
-0.598206	automatically, but as	-0.124939
-1.274152	is used as	-0.425969
-0.898773	be used as	-0.221849
-0.776992	when used as	-0.124939
-1.108874	often used as	-0.124939
-1.638523	level-2 cache as	-0.124939
-2.022785	to do as	-0.124939
-0.587447	should do as	-0.124939
-0.586852	and size as	-0.124939
-0.586852	same size as	-0.124939
-0.597595	/ b as	-0.124939
-1.183257	or object as	-0.124939
-0.596293	36 C++ as	-0.124939
-0.183849	function such as	-0.124939
-0.019152	functions such as	-0.182931
-0.183849	compilers such as	-0.124939
-0.183849	operations such as	-0.124939
-0.183849	type such as	-0.124939
-0.183849	cases such as	-0.124939
-0.183849	CPUs such as	-0.124939
-0.183849	branches such as	-0.124939
-0.183849	applications such as	-0.124939
-0.183849	types such as	-0.124939
-0.082268	optimizations such as	-0.124939
-0.183849	reductions such as	-0.124939
-0.082268	languages such as	-0.124939
-0.039189	tasks such as	-0.124939
-0.183849	time, such as	-0.124939
-0.183849	blocks such as	-0.124939
-0.183849	purposes such as	-0.124939
-0.183849	iterations such as	-0.124939
-0.183849	vector, such as	-0.124939
-0.183849	memory, such as	-0.124939
-0.183849	profilers such as	-0.124939
-0.183849	available, such as	-0.124939
-0.183849	threads, such as	-0.124939
-0.183849	languages, such as	-0.124939
-0.183849	resources, such as	-0.124939
-0.183849	overflow, such as	-0.124939
-0.183849	considerations such as	-0.124939
-0.183849	comparisons, such as	-0.124939
-0.183849	language, such as	-0.124939
-0.183849	classes, such as	-0.124939
-0.183849	templates, such as	-0.124939
-0.183849	resource, such as	-0.124939
-0.183849	shuffling, such as	-0.124939
-0.183849	events, such as	-0.124939
-0.183849	suffixes such as	-0.124939
-0.183849	serial, such as	-0.124939
-0.183849	vectorization, such as	-0.124939
-0.183849	obtain, such as	-0.124939
-0.183849	media such as	-0.124939
-0.183849	9.2, such as	-0.124939
-0.183849	information, such as	-0.124939
-0.118128	as efficient as	-0.301030
-0.372520	previous value as	-0.124939
-1.063110	vector objects as	-0.124939
-1.576842	the variable as	-0.124939
-0.567343	point variable as	-0.124939
-1.339086	induction variable as	-0.124939
-0.596532	designed so as	-0.124939
-0.566095	multiple variables as	-0.124939
-0.587315	Boolean variables as	-0.425969
-0.098900	as long as	-0.271067
-0.240861	same way as	-0.124939
-0.566616	is stored as	-0.425969
-0.744069	and stored as	-0.124939
-1.451193	are stored as	-0.124939
-0.886583	quite often as	-0.124939
-1.623143	not always as	-0.124939
-1.399431	oriented programming as	-0.124939
-1.660441	are available as	-0.124939
-0.593016	six times as	-0.124939
-2.042903	you want as	-0.124939
-1.564772	the arrays as	-0.124939
-0.564602	development work as	-0.124939
-0.564602	little work as	-0.124939
-1.470848	point calculations as	-0.124939
-1.385597	is compiled as	-0.124939
-0.975293	be compiled as	-0.124939
-0.591326	C language as	-0.124939
-0.883383	as much as	-0.124939
-0.591187	same thread as	-0.124939
-0.590650	as small as	-0.124939
-0.591483	using integers as	-0.124939
-0.081925	as good as	-0.249877
-1.404861	64-bit Linux as	-0.124939
-1.640184	be done as	-0.124939
-0.195258	are therefore as	-0.124939
-0.880237	same precision as	-0.124939
-0.880383	Not optimized as	-0.124939
-1.146067	is calculated as	-0.124939
-0.730520	be calculated as	-0.124939
-1.564574	to get as	-0.124939
-0.591381	particular advantageous as	-0.124939
-0.557237	is implemented as	-0.425969
-0.458633	be implemented as	-0.346788
-0.392769	are implemented as	-0.124939
-0.353117	often implemented as	-0.124939
-0.194557	error known as	-0.425969
-0.140601	as well as	-0.124939
-0.588961	therefore count as	-0.124939
-0.587384	not quite as	-0.124939
-0.046778	as fast as	-0.124939
-0.874831	not optimize as	-0.124939
-0.586656	few branches as	-0.124939
-0.334334	same name as	-0.425969
-0.710776	class name as	-0.124939
-0.584847	that string as	-0.124939
-0.583442	accept expressions as	-0.124939
-0.853565	is transferred as	-0.124939
-0.484658	then transferred as	-0.124939
-0.484658	always transferred as	-0.124939
-1.130935	.NET framework as	-0.124939
-0.583023	thousand numbers as	-0.124939
-1.134180	are declared as	-0.124939
-1.130793	intermediate results as	-0.124939
-0.582044	results were as	-0.124939
-0.412138	be just as	-0.124939
-0.412138	are just as	-0.124939
-0.412138	vector just as	-0.124939
-0.412138	index, just as	-0.124939
-0.580959	be smaller as	-0.124939
-0.579749	obvious reductions as	-0.124939
-1.257135	programming languages as	-0.124939
-0.856977	in STL as	-0.124939
-1.521302	is intended as	-0.124939
-0.438230	optimizing code, as	-0.124939
-0.438230	source code, as	-0.124939
-0.438230	CPU-intensive code, as	-0.124939
-0.712711	different platforms as	-0.124939
-0.817858	other platforms as	-0.124939
-0.853253	is given as	-0.124939
-0.979405	be vectorized as	-0.124939
-0.496795	indeed vectorized as	-0.124939
-1.324163	the offset as	-0.124939
-0.573460	Use macro as	-0.124939
-0.574164	comparing them as	-0.124939
-1.147933	same thing as	-0.124939
-0.842138	may occur as	-0.124939
-0.570516	to reading as	-0.124939
-0.153214	implemented either as	-0.425969
-0.392059	linked either as	-0.124939
-0.569691	uses ebx as	-0.124939
-0.842910	are defined as	-0.124939
-0.566133	not significant as	-0.124939
-0.835651	becomes invalid as	-0.124939
-0.608418	be organized as	-0.124939
-0.431616	are organized as	-0.124939
-0.305258	if organized as	-0.124939
-0.305258	registers organized as	-0.124939
-1.378835	instruction set, as	-0.124939
-0.567935	non-Intel processors, as	-0.124939
-0.838167	rules apply as	-0.124939
-0.567033	same features as	-0.124939
-0.970708	C style as	-0.124939
-0.971978	is chosen as	-0.124939
-0.312335	is provided as	-0.124939
-0.563310	as standardized as	-0.124939
-0.560915	usually included as	-0.124939
-0.559253	is now as	-0.124939
-0.559807	same unit as	-0.124939
-0.135981	modern CPUs, as	-0.425969
-0.471928	multi-core CPUs, as	-0.124939
-0.559807	of j as	-0.124939
-0.560361	different factors as	-0.124939
-0.559253	dispatching explicitly as	-0.124939
-0.692546	is interpreted as	-0.124939
-0.432213	be interpreted as	-0.124939
-0.613335	is exactly as	-0.124939
-0.613335	are exactly as	-0.124939
-0.554092	calculate xn as	-0.124939
-0.486768	is distributed as	-0.124939
-0.307437	and distributed as	-0.124939
-0.307437	libraries distributed as	-0.124939
-0.811930	64-bit mode, as	-0.124939
-0.947191	in memory, as	-0.124939
-0.811930	the system, as	-0.124939
-0.810800	64-bit integers, as	-0.124939
-0.554092	the measurements as	-0.124939
-0.548024	same principle as	-0.124939
-0.548743	be static, as	-0.124939
-0.547306	in edx as	-0.124939
-0.546589	software users as	-0.124939
-0.388710	as soon as	-0.124939
-0.388710	As soon as	-0.124939
-0.781699	be executed as	-0.124939
-1.034744	time consumption as	-0.124939
-0.538822	will appear as	-0.124939
-0.537979	value written as	-0.124939
-0.907827	to 15.1c as	-0.124939
-0.524353	cleaned up, as	-0.124939
-0.524353	bool, enum as	-0.124939
-0.757742	of precision, as	-0.124939
-0.524353	a queue as	-0.124939
-0.015464	be expressed as	-0.249877
-0.525375	// Same as	-0.124939
-0.077461	is treated as	-0.124939
-0.171804	simply treated as	-0.124939
-0.129075	is coded as	-0.124939
-0.442757	be represented as	-0.124939
-0.313560	fact represented as	-0.124939
-0.759493	and reproducible as	-0.124939
-0.501261	template parameters, as	-0.124939
-0.720537	an FPGA as	-0.124939
-0.720537	small devices, as	-0.124939
-0.501261	multiple counters, as	-0.124939
-0.720537	out-of-order execution, as	-0.124939
-0.501261	optimize access, as	-0.124939
-0.501261	cache space, as	-0.124939
-0.720537	vector operations, as	-0.124939
-0.501261	compilers, etc., as	-0.124939
-0.501261	get ReadTSC as	-0.124939
-0.501261	for metaprogramming, as	-0.124939
-0.501261	of vectors, as	-0.124939
-0.501261	vector classes, as	-0.124939
-0.460221	class templates, as	-0.124939
-0.460221	eliminate branches, as	-0.124939
-0.460221	well developed as	-0.124939
-0.460221	Such events as	-0.124939
-0.460221	or microseconds as	-0.124939
-0.460221	register use, as	-0.124939
-0.460221	as _mm_empty() as	-0.124939
-0.460221	other ways, as	-0.124939
-0.460221	without AVX, as	-0.124939
-0.460221	are cached as	-0.124939
-0.460221	have Booleans as	-0.124939
-0.142574	calculated internally as	-0.124939
-0.142574	implemented internally as	-0.124939
-0.356241	and clumsy, as	-0.124939
-0.356241	switch statements, as	-0.124939
-0.356241	not yet as	-0.124939
-0.356241	be regarded as	-0.124939
-0.356241	memory pool, as	-0.124939
-0.356241	by assignment, as	-0.124939
-0.356241	other optimizations, as	-0.124939
-0.356241	increasingly blurred as	-0.124939
-0.356241	be passed as	-0.124939
-0.356241	pointer serves as	-0.124939
-0.356241	data elements, as	-0.124939
-0.356241	implement OneOrTwo5[b!=0] as	-0.124939
-0.356241	static linking, as	-0.124939
-0.356241	clock frequency, as	-0.124939
-0.356241	optimization hints as	-0.124939
-0.356241	is pipelined, as	-0.124939
-0.356241	critical stride, as	-0.124939
-0.356241	cache contentions, as	-0.124939
-0.356241	bounds checking, as	-0.124939
-0.356241	function (n!) as	-0.124939
-0.356241	same directory as	-0.124939
-0.356241	a union, as	-0.124939
-0.356241	legal issue, as	-0.124939
-0.356241	garbage collection, as	-0.124939
-0.356241	2008 R2 as	-0.124939
-1.062791	and is not	-0.124939
-1.439332	that is not	-0.124939
-0.885882	it is not	-0.249877
-1.125796	function is not	-0.124939
-1.020453	code is not	-0.124939
-1.461430	This is not	-0.124939
-1.357434	compiler is not	-0.124939
-0.527095	this is not	-0.191886
-0.892094	A is not	-0.124939
-1.026373	It is not	-0.204120
-0.969110	functions is not	-0.124939
-1.017033	but is not	-0.124939
-1.525946	set is not	-0.124939
-1.225917	size is not	-0.124939
-0.969110	i is not	-0.124939
-0.956706	object is not	-0.124939
-0.769879	number is not	-0.124939
-1.118491	array is not	-0.124939
-0.969110	objects is not	-0.124939
-1.022863	table is not	-0.124939
-1.187729	performance is not	-0.124939
-1.022863	address is not	-0.124939
-0.892094	processors is not	-0.124939
-0.969110	error is not	-0.124939
-0.769879	CPUs is not	-0.124939
-0.543790	processor is not	-0.425969
-0.892094	precision is not	-0.124939
-1.225917	count is not	-0.124939
-0.530367	branches is not	-0.124939
-0.460731	handling is not	-0.425969
-0.530367	name is not	-0.124939
-0.892094	keyword is not	-0.124939
-0.769879	interface is not	-0.124939
-0.350100	union is not	-0.124939
-0.350100	constructor is not	-0.425969
-0.530367	points is not	-0.124939
-0.769879	section is not	-0.124939
-0.969110	computer is not	-0.124939
-1.102044	p is not	-0.124939
-0.769879	STL is not	-0.124939
-1.070909	index is not	-0.124939
-0.530367	conditions is not	-0.124939
-0.530367	alignment is not	-0.124939
-0.530367	compatibility is not	-0.124939
-0.663474	operand is not	-0.124939
-1.070909	N is not	-0.124939
-0.530367	unrolling is not	-0.124939
-0.530367	length is not	-0.124939
-0.892094	tool is not	-0.124939
-0.530367	iterations is not	-0.124939
-0.530367	required is not	-0.124939
-0.530367	misses is not	-0.124939
-0.530367	debugger is not	-0.124939
-0.530367	base is not	-0.124939
-0.530367	propagation is not	-0.124939
-0.530367	package is not	-0.124939
-1.017033	divisor is not	-0.124939
-0.530367	Fastcall is not	-0.124939
-0.530367	(1) is not	-0.124939
-0.530367	hyperthreading is not	-0.124939
-0.769879	format is not	-0.124939
-0.530367	mirroring is not	-0.124939
-0.530367	'1' is not	-0.124939
-0.597819	2n and not	-0.124939
-0.597819	readable and not	-0.124939
-0.597819	_MSC_VER and not	-0.124939
-0.597819	__GNUC__ and not	-0.124939
-0.961971	that are not	-0.124939
-0.898256	you are not	-0.124939
-1.363322	data are not	-0.124939
-1.268440	functions are not	-0.124939
-0.920288	compilers are not	-0.124939
-1.462746	there are not	-0.124939
-0.715475	objects are not	-0.124939
-0.690530	libraries are not	-0.124939
-0.544171	systems are not	-0.124939
-1.128804	they are not	-0.124939
-1.157652	operations are not	-0.124939
-0.664012	instructions are not	-0.124939
-1.095433	processors are not	-0.124939
-1.426426	parameters are not	-0.124939
-1.024301	resources are not	-0.124939
-1.103482	microprocessors are not	-0.124939
-0.893158	numbers are not	-0.124939
-0.530829	reductions are not	-0.124939
-0.530829	conversions are not	-0.124939
-0.770681	names are not	-0.124939
-0.530829	sequence are not	-0.124939
-0.350303	tables are not	-0.425969
-0.530829	D are not	-0.124939
-0.893158	profilers are not	-0.124939
-0.530829	algorithms, are not	-0.124939
-0.530829	'>') are not	-0.124939
-2.077882	it can not	-0.124939
-0.598946	redesign can not	-0.124939
-1.281062	speed or not	-0.124939
-0.597992	hyperthreading or not	-0.124939
-1.198833	and by not	-0.124939
-1.197279	and not not	-0.124939
-2.606450	the compiler not	-0.124939
-1.245985	it may not	-0.425969
-0.772907	compiler may not	-0.425969
-0.834058	It may not	-0.602060
-0.760816	memory may not	-0.124939
-0.525123	functions may not	-0.124939
-0.525123	cache may not	-0.124939
-0.880126	compilers may not	-0.124939
-0.760816	pointers may not	-0.124939
-0.525123	user may not	-0.124939
-0.954857	system may not	-0.124939
-0.525123	alloca may not	-0.124939
-0.525123	exit may not	-0.124939
-0.525123	sticks may not	-0.124939
-0.899326	They have not	-0.124939
-0.600115	expressions when not	-0.124939
-0.734346	it will not	-0.124939
-1.264250	code will not	-0.124939
-1.003166	It will not	-0.124939
-1.142280	program will not	-0.124939
-1.036603	compilers will not	-0.425969
-1.003166	we will not	-0.124939
-0.542461	You will not	-0.124939
-0.542461	16 will not	-0.124939
-1.683011	it has not	-0.124939
-1.528195	compiler has not	-0.124939
-1.453887	library has not	-0.124939
-0.825531	cases, but not	-0.124939
-0.890733	time, but not	-0.124939
-0.718722	used, but not	-0.124939
-0.718722	processors, but not	-0.124939
-0.825531	CPUs, but not	-0.124939
-0.500165	8, but not	-0.124939
-0.500165	memory, but not	-0.124939
-0.825531	efficient, but not	-0.124939
-0.500165	float, but not	-0.124939
-0.718722	applications, but not	-0.124939
-0.500165	complex, but not	-0.124939
-0.500165	.a), but not	-0.124939
-0.500165	GB, but not	-0.124939
-0.500165	noticeable but not	-0.124939
-0.196150	that should not	-0.425969
-1.463529	you should not	-0.124939
-0.972717	dispatcher should not	-0.124939
-0.563598	measurement should not	-0.124939
-2.423952	instruction set not	-0.124939
-0.203941	that do not	-0.249877
-0.149604	compilers do not	-0.124939
-0.277266	we do not	-0.124939
-0.379673	variables do not	-0.124939
-0.149604	libraries do not	-0.124939
-0.379673	pointers do not	-0.124939
-0.379673	systems do not	-0.124939
-0.379673	they do not	-0.124939
-0.379673	operations do not	-0.124939
-0.379673	directives do not	-0.124939
-0.379673	vectors do not	-0.124939
-0.379673	contentions do not	-0.124939
-0.379673	references do not	-0.124939
-0.379673	conversions do not	-0.124939
-0.379673	containers do not	-0.124939
-0.379673	programmers do not	-0.124939
-0.379673	Compilers do not	-0.124939
-0.093812	ranges do not	-0.602060
-0.379673	studied do not	-0.124939
-0.379673	live-ranges do not	-0.124939
-0.379673	ranges) do not	-0.124939
-0.896494	Assume pointer not	-0.124939
-0.596134	class need not	-0.124939
-0.596024	Be sure not	-0.124939
-0.890315	set SSE2 not	-0.124939
-0.314192	it does not	-0.124939
-0.164263	function does not	-0.124939
-0.074411	This does not	-0.124939
-0.216414	compiler does not	-0.124939
-0.164263	this does not	-0.124939
-0.285329	It does not	-0.124939
-0.164263	loop does not	-0.124939
-0.048165	pointer does not	-0.301030
-0.164263	library does not	-0.124939
-0.164263	object does not	-0.124939
-0.074411	2 does not	-0.124939
-0.164263	long does not	-0.124939
-0.164263	thread does not	-0.124939
-0.164263	manual does not	-0.124939
-0.164263	list does not	-0.124939
-0.164263	dispatcher does not	-0.124939
-0.164263	programmer does not	-0.124939
-0.164263	aliasing does not	-0.124939
-0.164263	hand, does not	-0.124939
-0.164263	unit-test does not	-0.124939
-0.164263	argument does not	-0.124939
-0.164263	14.26 does not	-0.124939
-0.593881	n. But not	-0.124939
-1.171518	is therefore not	-0.124939
-0.740603	and therefore not	-0.124939
-1.135629	should therefore not	-0.124939
-1.048419	This would not	-0.124939
-1.599044	is simply not	-0.124939
-0.589642	seconds was not	-0.124939
-0.587631	(Scalar means not	-0.124939
-1.029533	bit platform not	-0.124939
-1.489112	is usually not	-0.124939
-1.234283	that were not	-0.124939
-0.522995	tasks were not	-0.124939
-0.069153	time. Do not	-0.124939
-0.069153	function. Do not	-0.124939
-0.069153	used. Do not	-0.124939
-0.069153	efficient. Do not	-0.124939
-0.069153	set. Do not	-0.124939
-0.069153	allocation. Do not	-0.124939
-0.069153	block. Do not	-0.124939
-0.069153	optimizations. Do not	-0.124939
-0.069153	list. Do not	-0.124939
-0.069153	resource. Do not	-0.124939
-0.069153	column; Do not	-0.124939
-0.069153	editions). Do not	-0.124939
-0.574697	but possibly not	-0.124939
-0.574927	function, though not	-0.124939
-0.840357	it might not	-0.124939
-0.566246	should include not	-0.124939
-0.827223	Intel CPUs, not	-0.124939
-0.556534	columns had not	-0.124939
-0.816704	are generally not	-0.124939
-1.142623	operating system, not	-0.124939
-0.556534	fixed size, not	-0.124939
-0.705198	in registers, not	-0.124939
-0.414932	on registers, not	-0.124939
-0.504629	I am not	-0.124939
-0.787345	not _WIN64 not	-0.124939
-0.786896	is currently not	-0.124939
-0.132039	cases. Does not	-0.124939
-0.132039	sets. Does not	-0.124939
-0.212655	Windows. Does not	-0.124939
-0.132039	IDE. Does not	-0.124939
-0.503978	IDE. Has not	-0.124939
-0.503978	a register, not	-0.124939
-0.504369	profiler measures not	-0.124939
-0.658379	own research, not	-0.124939
-0.462692	is 95 not	-0.124939
-0.358185	[edx] adds, not	-0.124939
-0.358185	the rows, not	-0.124939
-0.358185	this did not	-0.124939
-0.358185	template specialization, not	-0.124939
-0.358185	arrays forwards, not	-0.124939
-0.358185	systems (but not	-0.124939
-0.358185	take precedence, not	-0.124939
-0.865431	functions // This	-0.124939
-0.865431	0 // This	-0.124939
-0.582823	required // This	-0.124939
-0.200204	16; // This	-0.425969
-0.582823	method. // This	-0.124939
-0.582823	time1; // This	-0.124939
-0.582823	square. // This	-0.124939
-1.130667	} } This	-0.124939
-0.925680	1; } This	-0.124939
-0.534907	f; } This	-0.124939
-0.547533	3; } This	-0.425969
-0.534907	break; } This	-0.124939
-0.534907	FuncC(i); } This	-0.124939
-0.534907	sums } This	-0.124939
-0.534907	FuncC(i+1); } This	-0.124939
-0.927103	the code. This	-0.124939
-0.781678	intermediate code. This	-0.124939
-0.537125	non-AVX code. This	-0.124939
-0.906371	the time. This	-0.124939
-0.986211	a time. This	-0.124939
-0.536516	first time. This	-0.124939
-1.042268	extra time. This	-0.124939
-0.592723	10 Gnu This	-0.124939
-0.485136	the function. This	-0.249877
-0.814579	dispatcher function. This	-0.124939
-0.589336	2, etc. This	-0.124939
-1.196906	member functions. This	-0.124939
-0.760183	virtual functions. This	-0.124939
-0.879294	intrinsic functions. This	-0.124939
-0.590343	|| b; This	-0.124939
-0.725188	the memory. This	-0.124939
-0.916975	in memory. This	-0.124939
-0.638367	program memory. This	-0.124939
-0.638367	into memory. This	-0.124939
-0.725188	RAM memory. This	-0.124939
-0.753140	the program. This	-0.124939
-0.863735	a program. This	-0.124939
-0.445425	C++ program. This	-0.124939
-0.445425	final program. This	-0.124939
-0.538274	1 cache. This	-0.124939
-0.910501	level-2 cache. This	-0.124939
-1.347884	more efficient. This	-0.124939
-0.583327	8 below. This	-0.124939
-0.861816	the data. This	-0.124939
-0.654808	of data. This	-0.124939
-1.703844	instruction set. This	-0.124939
-0.889596	different compilers. This	-0.124939
-0.767992	other compilers. This	-0.124939
-0.762937	+= x; This	-0.124939
-1.010511	*= x; This	-0.124939
-0.703207	is called. This	-0.124939
-0.757816	never called. This	-0.124939
-0.451740	different CPUs. This	-0.124939
-0.744398	Intel compiler. This	-0.124939
-0.515506	C++ compiler. This	-0.124939
-0.528687	innermost loop. This	-0.124939
-0.512130	memory pointer. This	-0.124939
-0.738695	member pointer. This	-0.124939
-0.904715	Windows platforms. This	-0.124939
-0.905530	x86 platforms. This	-0.124939
-0.839606	most cases. This	-0.124939
-0.506763	both cases. This	-0.124939
-0.445854	is 1. This	-0.124939
-0.632240	= 1. This	-0.124939
-0.767724	or 1. This	-0.124939
-1.109838	cache size. This	-0.124939
-1.002189	register variables. This	-0.124939
-0.573822	network resources. This	-0.124939
-0.605739	or class. This	-0.124939
-0.605739	child class. This	-0.124939
-0.428418	derived class. This	-0.124939
-0.851557	+ d; This	-0.124939
-0.846114	to it. This	-0.124939
-0.846114	of registers. This	-0.124939
-0.850430	64 bytes. This	-0.124939
-1.096783	shared object. This	-0.124939
-0.996295	the library. This	-0.124939
-0.574472	actual calculations. This	-0.124939
-0.572638	integer operations. This	-0.124939
-0.699726	the variable. This	-0.124939
-0.488582	local variable. This	-0.124939
-0.697891	program optimization. This	-0.124939
-0.487452	profile-guided optimization. This	-0.124939
-0.387085	the stack. This	-0.124939
-0.341156	their stack. This	-0.124939
-1.262029	if possible. This	-0.124939
-0.570274	than needed. This	-0.124939
-0.438066	each thread. This	-0.124939
-0.571261	another thread. This	-0.124939
-0.687045	other purposes. This	-0.124939
-0.687045	these purposes. This	-0.124939
-0.838503	point instructions. This	-0.124939
-0.838503	the vector. This	-0.124939
-0.671841	as well. This	-0.124939
-0.471222	very well. This	-0.124939
-1.348307	CPU dispatching. This	-0.124939
-0.566527	the problem. This	-0.124939
-1.324445	function returns. This	-0.124939
-0.827739	random order. This	-0.124939
-1.253777	memory allocation. This	-0.124939
-0.405870	memory block. This	-0.124939
-1.179379	is executed. This	-0.124939
-0.826551	for overflow. This	-0.124939
-0.558799	same value. This	-0.124939
-0.303706	object file. This	-0.425969
-0.558080	logical register. This	-0.124939
-1.197309	operating system. This	-0.124939
-1.049911	very fast. This	-0.124939
-0.558080	multiplication units. This	-0.124939
-0.820490	the array. This	-0.124939
-0.557361	23 software. This	-0.124939
-1.165515	cache line. This	-0.124939
-0.558080	the vectors. This	-0.124939
-0.552145	its parameters. This	-0.124939
-0.444620	is important. This	-0.124939
-0.412799	vector simultaneously. This	-0.124939
-0.582452	threads simultaneously. This	-0.124939
-1.141967	Visual Studio This	-0.124939
-0.985966	the processor. This	-0.124939
-0.434575	of bits. This	-0.124939
-0.434575	64 bits. This	-0.124939
-0.307469	32 bits. This	-0.124939
-0.552956	actually is. This	-0.124939
-0.552956	than speed. This	-0.124939
-0.657816	to do. This	-0.124939
-0.412799	event-counters do. This	-0.124939
-0.727998	by 16. This	-0.124939
-0.412212	modulo 16. This	-0.124939
-0.799443	Copyright notice This	-0.124939
-1.071021	data members. This	-0.124939
-1.014110	back again. This	-0.124939
-0.387955	hundred times. This	-0.124939
-0.546218	inconvenient times. This	-0.124939
-0.797775	preceding one. This	-0.124939
-0.797775	stamp counter. This	-0.124939
-0.797775	or structure. This	-0.124939
-0.535862	ready-made profiler. This	-0.124939
-0.535862	compiler manual. This	-0.124939
-0.535862	are finished. This	-0.124939
-0.536957	4 ways. This	-0.124939
-1.040177	Digital Mars This	-0.124939
-0.535862	dispatch process. This	-0.124939
-0.535862	data files. This	-0.124939
-0.984372	out-of-order execution. This	-0.124939
-0.229370	different modules. This	-0.124939
-0.191700	other modules. This	-0.124939
-0.264704	at all. This	-0.124939
-0.535862	absolute addresses. This	-0.124939
-0.535862	multiplying them. This	-0.124939
-0.535862	per point. This	-0.124939
-0.957474	is doubled. This	-0.124939
-1.131832	clock cycle. This	-0.124939
-0.522099	filled up. This	-0.124939
-0.522099	marketing reasons. This	-0.124939
-0.522099	all objects. This	-0.124939
-0.757899	cache lines. This	-0.124939
-0.933990	+= 1.0f; This	-0.124939
-0.522099	multiple versions. This	-0.124939
-0.522099	be cached. This	-0.124939
-0.523425	if unsigned. This	-0.124939
-0.757899	invalid pointers. This	-0.124939
-0.873309	this option. This	-0.124939
-0.523425	/ b2; This	-0.124939
-0.755627	programming languages. This	-0.124939
-0.522099	subsequent counts. This	-0.124939
-0.522099	and smaller. This	-0.124939
-0.936274	another module. This	-0.124939
-0.522099	this manually. This	-0.124939
-0.757899	reproducible results. This	-0.124939
-0.500088	point addition. This	-0.124939
-0.500088	code only. This	-0.124939
-0.500088	are long. This	-0.124939
-0.897469	in advance. This	-0.124939
-0.718594	= 28. This	-0.124939
-0.247996	is loaded. This	-0.124939
-0.247996	been loaded. This	-0.124939
-0.500088	compiled C++. This	-0.124939
-0.500088	code caching. This	-0.124939
-0.500088	is stored. This	-0.124939
-0.500088	random manner. This	-0.124939
-0.500088	#include directives. This	-0.124939
-0.247996	function definition. This	-0.124939
-0.356604	class definition. This	-0.124939
-0.202990	VIA CPUs"). This	-0.124939
-0.825367	the same. This	-0.124939
-0.500088	null reference. This	-0.124939
-0.897469	each other. This	-0.124939
-0.247996	too fragmented. This	-0.124939
-0.356604	become fragmented. This	-0.124939
-0.500088	of truncation. This	-0.124939
-0.247996	or inline. This	-0.124939
-0.356604	function inline. This	-0.124939
-0.718594	and free. This	-0.124939
-0.106532	or modified. This	-0.425969
-0.500088	of x. This	-0.124939
-0.500088	in a. This	-0.124939
-0.459153	table static. This	-0.124939
-0.459153	interface frameworks. This	-0.124939
-0.652838	the mouse. This	-0.124939
-0.652838	S1 ArrayOfStructures[100]; This	-0.124939
-0.652838	is compiled. This	-0.124939
-0.459153	= 32. This	-0.124939
-0.459153	a; 72 This	-0.124939
-0.459153	declared volatile. This	-0.124939
-0.459153	less safe. This	-0.124939
-0.459153	be pure. This	-0.124939
-0.652838	of range. This	-0.124939
-0.459153	< 2.0 This	-0.124939
-0.459153	is lost. This	-0.124939
-0.459153	function declaration. This	-0.124939
-0.459153	never changed. This	-0.124939
-0.652838	"override" feature. This	-0.124939
-0.459153	is defined. This	-0.124939
-0.652838	by default. This	-0.124939
-0.459153	software development. This	-0.124939
-0.652838	is known. This	-0.124939
-0.459153	than rounding. This	-0.124939
-0.459153	even temporarily. This	-0.124939
-0.652838	be predicted. This	-0.124939
-0.652838	everything else. This	-0.124939
-0.459153	with alloca. This	-0.124939
-0.652838	250 ms. This	-0.124939
-0.652838	page 137). This	-0.124939
-0.459153	index, i. This	-0.124939
-0.652838	1 Introduction This	-0.124939
-0.652838	than normal. This	-0.124939
-0.459153	have occurred. This	-0.124939
-0.459153	less efficiently. This	-0.124939
-0.652838	+= list[i]; This	-0.124939
-0.459153	different executables. This	-0.124939
-0.355400	than 231. This	-0.124939
-0.355400	undesired effects. This	-0.124939
-0.355400	label $B1$2:. This	-0.124939
-0.355400	called before. This	-0.124939
-0.355400	is saturated. This	-0.124939
-0.355400	is compiling. This	-0.124939
-0.355400	time slices. This	-0.124939
-0.355400	2 Gbytes. This	-0.124939
-0.355400	task switching. This	-0.124939
-0.355400	of view. This	-0.124939
-0.355400	ifunc branch). This	-0.124939
-0.355400	754 (1985). This	-0.124939
-0.355400	also de-allocated. This	-0.124939
-0.355400	address [ecx+eax*4]. This	-0.124939
-0.355400	+ 0.666666666666666666667; This	-0.124939
-0.355400	made local. This	-0.124939
-0.355400	32 bytes). This	-0.124939
-0.355400	vacant spaces. This	-0.124939
-0.355400	becomes full. This	-0.124939
-0.355400	of if. This	-0.124939
-0.355400	of (a+b). This	-0.124939
-0.355400	of it). This	-0.124939
-0.355400	quite substantial. This	-0.124939
-0.355400	its arguments. This	-0.124939
-0.355400	or -axAVX. This	-0.124939
-0.355400	template specialization. This	-0.124939
-0.355400	last member. This	-0.124939
-0.355400	2008 version). This	-0.124939
-0.355400	ever happens. This	-0.124939
-0.355400	two entries. This	-0.124939
-0.355400	Visual Studio. This	-0.124939
-0.355400	return n;} This	-0.124939
-0.355400	0; 35 This	-0.124939
-0.355400	their functionality. This	-0.124939
-0.355400	; mark_end; This	-0.124939
-0.355400	size (4096). This	-0.124939
-0.355400	columns unused. This	-0.124939
-0.355400	64 kbytes. This	-0.124939
-0.355400	be reduced. This	-0.124939
-0.355400	library (SVML). This	-0.124939
-0.355400	quite often. This	-0.124939
-0.355400	DEC, JNZ). This	-0.124939
-0.355400	thread scheduler. This	-0.124939
-0.355400	be added. This	-0.124939
-0.355400	external clock. This	-0.124939
-0.355400	plus i*sizeof(S1). This	-0.124939
-0.355400	page 45. This	-0.124939
-0.355400	from scratch. This	-0.124939
-0.355400	polymorphous class? This	-0.124939
-0.355400	page 135). This	-0.124939
-0.355400	or tiling. This	-0.124939
-0.355400	example 16.1. This	-0.124939
-0.355400	bits 32-62. This	-0.124939
-0.355400	page 87. This	-0.124939
-0.355400	from www.agner.org/optimize/testp.zip. This	-0.124939
-0.355400	or CString. This	-0.124939
-0.355400	previous iteration. This	-0.124939
-0.355400	v. 7.2). This	-0.124939
-0.355400	register state. This	-0.124939
-0.355400	is deprecated. This	-0.124939
-0.355400	as ((a+b)+c)+d. This	-0.124939
-0.355400	(parallel composer) This	-0.124939
-0.355400	of -fpic. This	-0.124939
-0.355400	in 2010. This	-0.124939
-0.355400	line written. This	-0.124939
-0.355400	page 158. This	-0.124939
-0.355400	time measure. This	-0.124939
-0.355400	return route. This	-0.124939
-0.355400	Windows MFC). This	-0.124939
-0.355400	access patterns. This	-0.124939
-0.355400	is "undefined". This	-0.124939
-0.355400	option -mveclibabi=svml. This	-0.124939
-0.355400	Portability note: This	-0.124939
-0.355400	this place. This	-0.124939
-0.355400	64 kb. This	-0.124939
-0.355400	of usability. This	-0.124939
-0.355400	+ log(c[i]);. This	-0.124939
-0.355400	(short int)i; This	-0.124939
-0.355400	or -fsource-asm). This	-0.124939
-0.355400	F1() throw(); This	-0.124939
-0.355400	xor eax,eax. This	-0.124939
-0.643621	= a -	-0.425969
-0.935939	return a -	-0.425969
-1.075442	multiply by -	-0.124939
-0.276234	- - -	-1.179296
-0.326144	x - -	-1.054358
-0.210478	n.a. - -	-1.255273
-0.371827	-- - -	-0.124939
-0.259804	-(-a)=a - -	-0.124939
-0.361483	a x -	-0.124939
-0.384696	- x -	-1.255273
-0.607154	x x -	-1.403692
-0.506297	n.a. x -	-0.602060
-0.809630	* x -	-0.124939
-0.361483	-1 x -	-0.124939
-0.361483	---x----- x -	-0.124939
-0.361483	x-xxx---x x -	-0.124939
-2.409315	the program -	-0.124939
-0.596460	- n.a. -	-0.726999
-0.360601	x n.a. -	-0.124939
-0.048722	n.a. n.a. -	-0.784991
-0.251926	reciprocal n.a. -	-0.124939
-0.893493	2) 2 -	-0.124939
-0.199412	takes 4 -	-0.124939
-1.183130	of 8 -	-0.124939
-0.235560	= 0 -	-0.425969
-0.399944	bits 0 -	-0.124939
-0.399944	typically 0 -	-0.124939
-0.399944	0= 0 -	-0.124939
-1.536765	unsigned integers -	-0.124939
-0.591111	Volume 1 -	-0.124939
-0.933259	of 10 -	-0.124939
-0.522999	until 10 -	-0.124939
-0.579988	>= b) -	-0.124939
-0.578368	is inlined -	-0.124939
-0.377592	and 3 -	-0.124939
-0.377592	takes 3 -	-0.124939
-0.377592	take 3 -	-0.124939
-0.562131	approximately 12 -	-0.124939
-0.970068	= -1 -	-0.124939
-0.556715	quite expensive -	-0.124939
-0.549971	takes 14 -	-0.124939
-0.540263	takes 50 -	-0.124939
-0.762950	takes 40 -	-0.124939
-0.958199	and maintenance -	-0.124939
-0.725315	/FA -S -	-0.124939
-0.504139	x ----- -	-0.124939
-0.504476	= a*b -	-0.124939
-0.504139	= ReadTSC() -	-0.124939
-0.901967	Constant folding -	-0.124939
-0.065725	= a+(b+c) -	-0.124939
-0.143202	x -- -	-0.124939
-0.143202	xxxxxxxxx -- -	-0.124939
-0.065725	= a*(b+c) -	-0.124939
-0.658608	a+a+a+a=a*4 -(-a)=a -	-0.124939
-0.658608	= b*a -	-0.124939
-0.658608	= a&(b|c) -	-0.124939
-0.358299	of convenience -	-0.124939
-0.358299	- x-xxx -	-0.124939
-0.358299	(time after) -	-0.124939
-0.358299	= (int)n -	-0.124939
-0.358299	= 2.5*x^2 -	-0.124939
-0.358299	= a<<(b+c) -	-0.124939
-0.358299	multiplication (27 -	-0.124939
-0.358299	multiplication (20 -	-0.124939
-0.358299	((unsigned int)(i -	-0.124939
-0.358299	subtraction (3 -	-0.124939
-0.358299	© 2004 -	-0.124939
-0.358299	- xx(-)x- -	-0.124939
-0.358299	(unsigned int)(max -	-0.124939
-0.358299	see http://www.agner.org/optimize/ -	-0.124939
-0.358299	= a*4 -	-0.124939
-0.358299	x --- -	-0.124939
-1.984382	that is an	-0.124939
-2.275063	it is an	-0.124939
-1.812082	program is an	-0.124939
-1.547325	pointer is an	-0.124939
-0.676716	b is an	-0.301030
-1.214448	there is an	-0.425969
-1.403857	C++ is an	-0.124939
-1.468843	There is an	-0.124939
-1.247437	processor is an	-0.124939
-1.235012	counter is an	-0.124939
-0.588029	source is an	-0.124939
-1.150527	n is an	-0.124939
-0.588029	10 is an	-0.124939
-1.235012	exponent is an	-0.124939
-1.409835	size of an	-0.124939
-1.249643	address of an	-0.425969
-1.522351	bits of an	-0.124939
-1.484123	type of an	-0.124939
-0.565558	case of an	-0.221849
-1.797450	versions of an	-0.124939
-1.154562	overflow of an	-0.124939
-1.251472	copy of an	-0.124939
-1.411115	end of an	-0.124939
-1.221321	range of an	-0.124939
-1.236135	Conversion of an	-0.124939
-1.029624	throughput of an	-0.124939
-1.286176	availability of an	-0.124939
-0.869431	compilation of an	-0.124939
-0.584905	event of an	-0.124939
-0.584905	functionality of an	-0.124939
-0.883855	or to an	-0.124939
-1.860382	pointer to an	-0.124939
-1.166465	number to an	-0.124939
-1.175577	compiled to an	-0.124939
-0.592336	aligned to an	-0.124939
-0.889536	points to an	-0.425969
-1.432846	applies to an	-0.124939
-1.409503	converted to an	-0.124939
-0.592336	functionality to an	-0.124939
-1.166465	15.1a to an	-0.124939
-0.592336	bounds-checking to an	-0.124939
-0.598077	multiplication and an	-0.124939
-0.895160	core and an	-0.124939
-0.895160	mode, and an	-0.124939
-1.157729	pointer in an	-0.124939
-1.707082	elements in an	-0.124939
-1.921752	stored in an	-0.124939
-0.871619	first in an	-0.124939
-1.372383	bits in an	-0.124939
-1.614585	element in an	-0.124939
-1.226255	run in an	-0.124939
-0.586039	allocation in an	-0.124939
-0.586039	microprocessor in an	-0.124939
-0.871619	last in an	-0.124939
-0.589197	together in an	-0.425969
-1.032791	provided in an	-0.124939
-0.586039	Put in an	-0.124939
-0.586039	scheduling in an	-0.124939
-1.791140	used for an	-0.124939
-0.864938	32 for an	-0.124939
-0.582565	allocated for an	-0.124939
-1.148080	130 for an	-0.124939
-0.582565	80 for an	-0.124939
-0.582565	43 for an	-0.124939
-0.582565	81 for an	-0.124939
-0.200150	89 for an	-0.124939
-0.200150	78 for an	-0.425969
-0.200150	CPUs" for an	-0.425969
-1.801465	assume that an	-0.124939
-0.599144	dictates that an	-0.124939
-1.491127	to be an	-0.124939
-1.866720	can be an	-0.301030
-1.841111	will be an	-0.124939
-1.644165	also be an	-0.124939
-1.580739	would be an	-0.124939
-1.682965	preferably be an	-0.124939
-0.878226	line or an	-0.124939
-0.878226	expression or an	-0.124939
-0.589450	map or an	-0.124939
-0.589450	integer, or an	-0.124939
-0.589450	operator, or an	-0.124939
-0.877572	and if an	-0.124939
-1.421714	check if an	-0.124939
-0.589113	what if an	-0.124939
-1.166384	fail if an	-0.124939
-0.589113	Number) if an	-0.124939
-1.404621	or by an	-0.124939
-0.667327	calculated by an	-0.425969
-0.588146	received by an	-0.124939
-0.588146	followed by an	-0.124939
-1.010170	or with an	-0.124939
-1.211254	code with an	-0.124939
-0.855886	this with an	-0.124939
-0.577817	return with an	-0.124939
-1.010170	accessed with an	-0.124939
-1.427757	done with an	-0.124939
-1.114344	implemented with an	-0.124939
-0.855886	platform with an	-0.124939
-0.589930	called on an	-0.124939
-0.594742	running on an	-0.425969
-1.712285	based on an	-0.124939
-0.552305	x as an	-0.124939
-0.808698	data as an	-0.124939
-0.789169	used as an	-0.425969
-0.944316	variable as an	-0.124939
-0.944316	way as an	-0.124939
-0.552305	available as an	-0.124939
-0.944316	transferred as an	-0.124939
-0.808698	provided as an	-0.124939
-0.808698	interpreted as an	-0.124939
-1.069255	expressed as an	-0.124939
-0.944316	treated as an	-0.124939
-0.808698	coded as an	-0.124939
-0.808698	represented as an	-0.124939
-0.552305	(n!) as an	-0.124939
-1.473602	is not an	-0.124939
-1.822351	more than an	-0.124939
-0.591055	expensive than an	-0.124939
-0.591055	compact than an	-0.124939
-0.630291	will have an	-0.124939
-0.602190	compilers have an	-0.726999
-1.066315	also have an	-0.124939
-1.090879	we have an	-0.124939
-0.806545	even have an	-0.124939
-1.271675	doesn't have an	-0.124939
-0.551110	Linux have an	-0.124939
-1.764076	every time an	-0.124939
-2.013977	to use an	-0.124939
-1.410246	may use an	-0.124939
-1.377184	then use an	-0.124939
-0.577940	languages use an	-0.124939
-0.577940	Don't use an	-0.124939
-0.599559	microprocessors when an	-0.124939
-0.599748	volatile then an	-0.124939
-0.659757	stored at an	-0.425969
-0.835259	start at an	-0.124939
-0.483666	loaded at an	-0.124939
-0.566822	begin at an	-0.124939
-1.525042	it has an	-0.124939
-0.887026	compiler has an	-0.425969
-1.324007	program has an	-0.124939
-1.347099	library has an	-0.124939
-0.565269	Sum1 has an	-0.124939
-0.945540	can make an	-0.124939
-1.154766	then make an	-0.124939
-0.599561	issue because an	-0.124939
-1.591466	is only an	-0.124939
-1.215638	but only an	-0.124939
-0.866895	requires only an	-0.124939
-0.599226	up. If an	-0.124939
-0.598949	Pascal used an	-0.124939
-1.361474	to set an	-0.124939
-2.034642	to do an	-0.124939
-0.588168	register, do an	-0.124939
-1.568958	of using an	-0.124939
-0.959552	for using an	-0.124939
-0.898976	are using an	-0.425969
-1.670722	by using an	-0.124939
-0.555621	class into an	-0.124939
-0.555621	software into an	-0.124939
-0.555621	compiled into an	-0.124939
-1.042326	loaded into an	-0.124939
-0.555621	one, into an	-0.124939
-0.598208	making i an	-0.124939
-0.844820	to such an	-0.124939
-0.571948	make such an	-0.124939
-0.571948	Porting such an	-0.124939
-0.595902	may return an	-0.124939
-0.858850	and makes an	-0.124939
-0.579377	linker makes an	-0.124939
-1.813101	is often an	-0.124939
-1.269386	don't need an	-0.124939
-0.573248	versions without an	-0.124939
-0.573248	applications without an	-0.124939
-0.909075	to access an	-0.124939
-0.522037	and making an	-0.124939
-1.297671	by making an	-0.124939
-0.552300	from making an	-0.124939
-1.502214	many times an	-0.124939
-0.593234	assumption about an	-0.124939
-0.592920	wide, while an	-0.124939
-0.885927	you avoid an	-0.124939
-0.564467	etc. Use an	-0.124939
-0.830901	data. Use an	-0.124939
-0.593136	C1::f. But an	-0.124939
-0.830359	goes through an	-0.124939
-0.564174	main through an	-0.124939
-0.557236	code uses an	-0.124939
-0.557236	feature uses an	-0.124939
-1.574262	to get an	-0.124939
-0.591426	check whether an	-0.124939
-1.044772	is doing an	-0.124939
-1.257087	will run an	-0.124939
-0.589639	or add an	-0.124939
-1.593156	is simply an	-0.124939
-0.588167	11.2b was an	-0.124939
-1.461890	most cases, an	-0.124939
-1.153840	can replace an	-0.124939
-0.874916	behaves like an	-0.124939
-0.587541	function. Using an	-0.124939
-1.037342	and put an	-0.124939
-1.260205	it needs an	-0.124939
-1.029415	it requires an	-0.124939
-0.584581	58 shows an	-0.124939
-0.493679	to generate an	-0.124939
-0.389503	will generate an	-0.124939
-0.583680	should choose an	-0.124939
-1.130727	is just an	-0.124939
-0.509359	code gives an	-0.124939
-0.509359	SSE4.1 gives an	-0.124939
-0.579103	software. Such an	-0.124939
-1.381052	in fact an	-0.124939
-0.579103	Windows, including an	-0.124939
-0.575901	operations. When an	-0.124939
-0.575228	19 Avoid an	-0.124939
-0.575228	as copying an	-0.124939
-0.490326	to accessing an	-0.124939
-0.490326	when accessing an	-0.124939
-1.306529	by adding an	-0.124939
-0.575452	write causes an	-0.124939
-0.853533	you divide an	-0.124939
-0.572824	branch (e.g. an	-0.124939
-0.848280	can convert an	-0.124939
-1.251422	to handle an	-0.124939
-1.190140	to insert an	-0.124939
-0.462632	called whenever an	-0.124939
-0.462632	declared whenever an	-0.124939
-0.568351	function modify an	-0.124939
-0.568925	or setting an	-0.124939
-1.101463	optimize away an	-0.124939
-0.555066	PC's had an	-0.124939
-0.548161	constant plus an	-0.124939
-0.548618	you declare an	-0.124939
-0.548161	will detect an	-0.124939
-0.802887	language defines an	-0.124939
-0.549075	pointer. Accessing an	-0.124939
-0.784404	to increment an	-0.124939
-0.538675	it adds an	-0.124939
-0.539748	you specify an	-0.124939
-0.525473	of declaring an	-0.124939
-0.524824	functions. While an	-0.124939
-0.502678	will catch an	-0.124939
-0.502678	dispatcher signal an	-0.124939
-0.502678	builder Has an	-0.124939
-0.502678	to issue an	-0.124939
-0.502678	clause. Comparing an	-0.124939
-0.502678	possibly throw an	-0.124939
-0.722886	user expects an	-0.124939
-0.461509	function construct an	-0.124939
-0.461509	slow CPU, an	-0.124939
-0.461509	a constructor, an	-0.124939
-0.461509	never exceeds an	-0.124939
-0.461509	by performing an	-0.124939
-0.461509	will provoke an	-0.124939
-0.461509	a[i]; Converting an	-0.124939
-0.656524	by replacing an	-0.124939
-0.357255	works, here's an	-0.124939
-0.357255	for issuing an	-0.124939
-0.357255	are seeing an	-0.124939
-0.357255	line provokes an	-0.124939
-0.357255	are feeding an	-0.124939
-0.357255	simply prints an	-0.124939
-0.357255	actually throws an	-0.124939
-0.357255	specified. Insert an	-0.124939
-0.357255	to fake an	-0.124939
-0.357255	for raising an	-0.124939
-0.357255	that detects an	-0.124939
-1.075296	double to int	-0.124939
-0.600113	5; to int	-0.124939
-0.939766	bytes = int	-0.124939
-0.804111	float or int	-0.124939
-1.806553	of an int	-0.124939
-1.259762	if an int	-0.124939
-0.591369	while an int	-0.124939
-2.173476	short int int	-0.124939
-0.981682	S1 { int	-0.124939
-0.534903	struct { int	-0.124939
-1.270576	p) { int	-0.124939
-0.352090	r) { int	-0.425969
-0.777783	Bitfield { int	-0.124939
-0.189832	main() { int	-0.425969
-0.534903	ReadTSC() { int	-0.124939
-0.189832	Func2() { int	-0.124939
-0.777783	(y) { int	-0.124939
-0.189832	b[SIZE][SIZE]) { int	-0.425969
-0.534903	x[]) { int	-0.124939
-1.368050	first time int	-0.124939
-0.587200	parm2); } int	-0.124939
-0.873862	&CriticalFunction_386; } int	-0.124939
-0.587200	sizeof(a)); } int	-0.124939
-0.597976	nearest integer int	-0.124939
-2.401385	instruction set int	-0.124939
-0.373523	asmlib library int	-0.425969
-0.356352	SSE2 version int	-0.425969
-0.795076	AVX version int	-0.124939
-0.192027	Lowest version int	-0.425969
-0.890667	2 2 int	-0.124939
-0.977151	unsigned long int	-0.124939
-0.565321	systems: long int	-0.124939
-0.565321	Linux: long int	-0.124939
-0.544834	the const int	-0.124939
-0.111554	{ const int	-0.124939
-0.262054	} const int	-0.124939
-0.447566	static const int	-0.124939
-0.262054	constant const int	-0.124939
-0.262054	i; const int	-0.124939
-0.262054	vectorization const int	-0.124939
-0.262054	x); const int	-0.124939
-0.262054	#endif const int	-0.124939
-0.262054	14.8 const int	-0.124939
-0.262054	16.1 const int	-0.124939
-0.262054	14.30 const int	-0.124939
-0.262054	9.5a const int	-0.124939
-0.262054	11.3 const int	-0.124939
-0.262054	9.4 const int	-0.124939
-0.262054	7.17 const int	-0.124939
-0.111554	Func(int); const int	-0.425969
-0.262054	9.6a const int	-0.124939
-0.262054	11.2b const int	-0.124939
-0.262054	squares: const int	-0.124939
-0.262054	factorials: const int	-0.124939
-0.262054	14.5a const int	-0.124939
-0.262054	11.2a const int	-0.124939
-0.262054	7.33b const int	-0.124939
-0.262054	7.33a const int	-0.124939
-0.262054	FuncCol(int); const int	-0.124939
-0.262054	14.4a const int	-0.124939
-1.275694	32 4 int	-0.124939
-1.383904	= 0; int	-0.301030
-0.471598	to unsigned int	-0.124939
-0.570531	an unsigned int	-0.124939
-0.397832	int unsigned int	-0.124939
-0.074843	{ unsigned int	-0.602060
-0.445592	4 unsigned int	-0.124939
-0.117732	part unsigned int	-0.425969
-0.279758	d; unsigned int	-0.124939
-0.279758	x, unsigned int	-0.124939
-0.279758	f; unsigned int	-0.124939
-0.397832	systems: unsigned int	-0.124939
-0.279758	normal unsigned int	-0.124939
-0.279758	1000; unsigned int	-0.124939
-0.279758	7.7 unsigned int	-0.124939
-0.279758	7.25 unsigned int	-0.124939
-0.279758	142 unsigned int	-0.124939
-0.279758	a[size]; unsigned int	-0.124939
-0.279758	T, unsigned int	-0.124939
-0.279758	0x3FF unsigned int	-0.124939
-0.279758	0x3FFF unsigned int	-0.124939
-0.279758	14.22b unsigned int	-0.124939
-0.279758	14.22a unsigned int	-0.124939
-0.279758	uint16_t unsigned int	-0.124939
-0.279758	0x7F unsigned int	-0.124939
-1.266438	128 SSE2 int	-0.124939
-0.080510	than short int	-0.124939
-0.080510	{ short int	-0.124939
-0.080510	A short int	-0.124939
-0.080510	using short int	-0.124939
-0.080510	4 short int	-0.124939
-0.080510	8 short int	-0.124939
-0.025214	unsigned short int	-0.124939
-0.080510	SSE2 short int	-0.124939
-0.080510	type short int	-0.124939
-0.038392	i; short int	-0.425969
-0.080510	256 short int	-0.124939
-0.080510	char short int	-0.124939
-0.080510	AVX2 short int	-0.124939
-0.008243	bb[], short int	-0.778151
-0.008243	aa[], short int	-1.079181
-0.025214	( short int	-0.124939
-0.080510	MMX short int	-0.124939
-0.080510	11 short int	-0.124939
-0.080510	7.22 short int	-0.124939
-0.080510	int8_t short int	-0.124939
-1.257087	doesn't work int	-0.124939
-2.067156	int i; int	-0.124939
-1.284252	(int a, int	-0.124939
-1.529295	unsigned integers int	-0.124939
-0.883075	256 AVX int	-0.124939
-0.818599	b;} }; int	-0.124939
-0.557763	UnusedFiller; }; int	-0.124939
-0.376017	int b; int	-0.124939
-0.818445	double b; int	-0.124939
-0.767499	static inline int	-0.301030
-0.589596	479001600}; ... int	-0.124939
-0.873006	unsigned 256 int	-0.124939
-1.050350	int c; int	-0.124939
-0.817981	{ public: int	-0.124939
-0.188357	Bitfield x; int	-0.425969
-0.686675	= 100; int	-0.124939
-1.132490	256 AVX2 int	-0.124939
-0.582878	will reduce int	-0.124939
-0.860968	allocation are: int	-0.124939
-0.360403	int a; int	-0.425969
-0.312490	float a; int	-0.425969
-0.129995	{int a; int	-0.124939
-1.156043	double d; int	-0.124939
-0.573895	(int x, int	-0.124939
-0.077388	float f; int	-0.970037
-0.592285	} u; int	-0.425969
-0.015489	CriticalFunction_386(int parm1, int	-0.425969
-0.015489	CriticalFunction_SSE2(int parm1, int	-0.425969
-0.015489	CriticalFunction_AVX(int parm1, int	-0.425969
-0.031550	CriticalFunctionType(int parm1, int	-0.124939
-0.031550	CriticalFunction_Dispatch(int parm1, int	-0.124939
-0.561232	operator[] (unsigned int	-0.124939
-0.433405	volatile volatile int	-0.124939
-0.433405	dummy[4]; volatile int	-0.124939
-0.815627	= 10; int	-0.124939
-0.435129	int a[100]; int	-0.124939
-0.362924	float a[100]; int	-0.425969
-0.554841	version 127 int	-0.124939
-1.039937	64 MMX int	-0.124939
-1.059768	16-bit systems: int	-0.124939
-0.548927	= 16; int	-0.124939
-0.548433	at 7 int	-0.124939
-1.079988	= 1.0; int	-0.124939
-0.006816	void SelectAddMul(short int	-0.903090
-0.784022	// n! int	-0.124939
-0.342674	= 1000; int	-0.124939
-0.538458	is __asm int	-0.124939
-0.015489	int list[300]; int	-0.425969
-0.525315	B2 b2; int	-0.124939
-0.027929	float matrix[rows][columns]; int	-0.301030
-0.503367	{ 89 int	-0.124939
-0.502478	S1 list[100]; int	-0.124939
-0.503367	Example 14.10 int	-0.124939
-0.503367	Example 14.11 int	-0.124939
-0.503367	Example 8.7 int	-0.124939
-0.503367	Example 7.21 int	-0.124939
-0.830437	i, j; int	-0.124939
-0.203525	= 1024; int	-0.425969
-0.089833	(int a[], int	-0.124939
-0.042598	Func(int a[], int	-0.425969
-0.502478	parameters typedef int	-0.124939
-0.503367	Example 7.23 int	-0.124939
-0.503367	Example 7.20 int	-0.124939
-0.503367	Example 7.19 int	-0.124939
-0.503367	Example 7.18 int	-0.124939
-0.656240	x=y; y=temp;} int	-0.124939
-0.461327	Example 14.12b int	-0.124939
-0.065574	double Table[100]; int	-0.425969
-0.065574	int b:2; int	-0.425969
-0.656240	= string; int	-0.124939
-0.461327	Example 14.13b int	-0.124939
-0.461327	S1 list[size]; int	-0.124939
-0.656240	* p; int	-0.124939
-0.656240	extern "C" int	-0.124939
-0.065574	int a:4; int	-0.425969
-0.461327	class c1; int	-0.124939
-0.656240	* m;} int	-0.124939
-0.461327	+ 2;} int	-0.124939
-0.357112	Example 14.3a int	-0.124939
-0.357112	Example 14.3b int	-0.124939
-0.357112	int matrix[NUMROWS][NUMCOLUMNS]; int	-0.124939
-0.357112	Example 8.9b int	-0.124939
-0.357112	Example 8.9a int	-0.124939
-0.357112	Example 14.1b int	-0.124939
-0.357112	Example 14.1a int	-0.124939
-0.357112	at 403 int	-0.124939
-0.357112	= 110; int	-0.124939
-0.357112	Example 14.13c int	-0.124939
-0.357112	Example 14.13a int	-0.124939
-0.357112	int FuncRow(int); int	-0.124939
-0.357112	Example 8.13a int	-0.124939
-0.357112	Example 8.13b int	-0.124939
-0.357112	Example 9.1a int	-0.124939
-0.357112	Example 9.1b int	-0.124939
-0.357112	Example 8.11b int	-0.124939
-0.357112	Example 8.11a int	-0.124939
-0.357112	Example 7.42 int	-0.124939
-0.357112	Sab ab[size]; int	-0.124939
-0.357112	void SelectAddMul_dispatch(short int	-0.124939
-0.357112	} module2.cpp int	-0.124939
-0.357112	square blocking: int	-0.124939
-0.357112	<int m> int	-0.124939
-0.357112	Example 8.6a int	-0.124939
-0.357112	Example 8.6b int	-0.124939
-0.357112	float list[16]; int	-0.124939
-0.357112	8.20 module1.cpp int	-0.124939
-0.357112	Example 7.30b int	-0.124939
-0.357112	Example 7.30a int	-0.124939
-0.357112	int list[301]; int	-0.124939
-0.357112	<bool IsPowerOf2, int	-0.124939
-0.357112	__restrict aa, int	-0.124939
-0.357112	void FUNCNAME(short int	-0.124939
-0.357112	writing: __declspec(align(64)) int	-0.124939
-0.357112	Example 8.12a int	-0.124939
-0.357112	Example 8.12b int	-0.124939
-0.357112	void FuncType(short int	-0.124939
-0.357112	Example 14.12a int	-0.124939
-0.357112	Example 8.14b int	-0.124939
-0.357112	Example 8.14a int	-0.124939
-0.357112	+ p->b;} int	-0.124939
-0.357112	32767 int16_t int	-0.124939
-0.357112	= 1.6; int	-0.124939
-0.357112	at 399 int	-0.124939
-0.357112	9.5a: 98 int	-0.124939
-0.250846	more time than	-0.176091
-0.257794	longer time than	-0.221849
-1.719245	to use than	-0.124939
-1.393652	is more than	-0.124939
-0.803140	in more than	-0.124939
-0.906528	for more than	-0.124939
-0.701116	by more than	-0.124939
-0.803140	have more than	-0.124939
-0.489436	at more than	-0.124939
-0.928079	no more than	-0.124939
-0.803140	do more than	-0.124939
-0.510034	take more than	-0.301030
-1.114582	much more than	-0.124939
-0.864822	uses more than	-0.124939
-0.179137	cycles more than	-0.124939
-0.489436	was more than	-0.124939
-0.489436	actually more than	-0.124939
-0.489436	load more than	-0.124939
-0.489436	go more than	-0.124939
-0.489436	occurs more than	-0.124939
-0.489436	prefetch more than	-0.124939
-0.489436	programs, more than	-0.124939
-0.489436	implies more than	-0.124939
-0.599383	more data than	-0.124939
-2.205282	the program than	-0.124939
-1.902781	a program than	-0.124939
-1.531970	library functions than	-0.124939
-2.253011	the CPU than	-0.124939
-0.575547	size other than	-0.124939
-0.575547	library other than	-0.124939
-0.575547	sizes other than	-0.124939
-0.575547	c1 other than	-0.124939
-2.414482	instruction set than	-0.124939
-1.436508	+ b than	-0.124939
-1.488916	dynamic library than	-0.124939
-0.449038	more efficient than	-0.157123
-0.270726	less efficient than	-0.124939
-0.119723	other value than	-0.602060
-0.993724	previous value than	-0.124939
-0.597162	are variables than	-0.124939
-0.597050	another way than	-0.124939
-0.255068	is faster than	-0.249877
-0.259428	be faster than	-0.124939
-0.105955	are faster than	-0.124939
-0.246402	C++ faster than	-0.124939
-0.246402	called faster than	-0.124939
-0.354558	often faster than	-0.124939
-0.202031	times faster than	-0.124939
-0.354558	much faster than	-0.124939
-0.246402	calculated faster than	-0.124939
-0.397557	run faster than	-0.124939
-0.246402	execute faster than	-0.124939
-0.246402	little faster than	-0.124939
-0.246402	increasing faster than	-0.124939
-0.246402	ipow faster than	-0.124939
-0.510486	is less than	-0.221849
-0.294986	be less than	-0.301030
-0.406140	not less than	-0.124939
-0.406140	at less than	-0.124939
-0.406140	times less than	-0.124939
-0.406140	while less than	-0.124939
-0.406140	write less than	-0.124939
-0.406140	difference less than	-0.124939
-0.004398	code rather than	-0.124939
-0.004398	time rather than	-0.425969
-0.008842	use rather than	-0.124939
-0.004398	memory rather than	-0.425969
-0.008842	integer rather than	-0.124939
-0.008842	float rather than	-0.124939
-0.008842	object rather than	-0.124939
-0.008842	array rather than	-0.124939
-0.004398	8 rather than	-0.124939
-0.008842	register rather than	-0.124939
-0.008842	template rather than	-0.124939
-0.001754	registers rather than	-0.522879
-0.008842	pointers rather than	-0.124939
-0.008842	access rather than	-0.124939
-0.008842	system rather than	-0.124939
-0.008842	bits rather than	-0.124939
-0.008842	0 rather than	-0.124939
-0.008842	instructions rather than	-0.124939
-0.008842	processors rather than	-0.124939
-0.008842	times rather than	-0.124939
-0.008842	stack rather than	-0.124939
-0.008842	calls rather than	-0.124939
-0.008842	container rather than	-0.124939
-0.008842	mode rather than	-0.124939
-0.008842	cycles rather than	-0.124939
-0.008842	sets rather than	-0.124939
-0.008842	implementation rather than	-0.124939
-0.008842	parameter rather than	-0.124939
-0.008842	expressions rather than	-0.124939
-0.008842	linking rather than	-0.124939
-0.008842	references rather than	-0.124939
-0.008842	loaded rather than	-0.124939
-0.008842	operation rather than	-0.124939
-0.008842	100 rather than	-0.124939
-0.008842	models rather than	-0.124939
-0.008842	beginning rather than	-0.124939
-0.008842	blocks rather than	-0.124939
-0.008842	memcpy rather than	-0.124939
-0.008842	factor rather than	-0.124939
-0.008842	units rather than	-0.124939
-0.008842	step rather than	-0.124939
-0.008842	advance rather than	-0.124939
-0.008842	zero, rather than	-0.124939
-0.008842	xxn rather than	-0.124939
-0.008842	frameworks, rather than	-0.124939
-0.008842	-56 rather than	-0.124939
-0.008842	once, rather than	-0.124939
-0.008842	!b) rather than	-0.124939
-0.008842	connections rather than	-0.124939
-0.008842	(b*2.0)/3.0 rather than	-0.124939
-0.008842	unions rather than	-0.124939
-0.008842	X?" rather than	-0.124939
-0.008842	matters rather than	-0.124939
-0.008842	supports, rather than	-0.124939
-0.008842	tools, rather than	-0.124939
-0.008842	at, rather than	-0.124939
-0.595685	compiler optimization than	-0.124939
-0.595261	such systems than	-0.124939
-0.594650	separate file than	-0.124939
-0.594759	more bits than	-0.124939
-0.594800	| operations than	-0.124939
-0.594334	control instructions than	-0.124939
-0.469427	more important than	-0.124939
-0.793925	less important than	-0.124939
-0.972584	one thread than	-0.124939
-1.067105	each thread than	-0.124939
-0.593717	computing power than	-0.124939
-0.460208	64-bit Linux than	-0.602060
-1.626612	vector classes than	-0.124939
-1.400254	single precision than	-0.124939
-1.498351	a container than	-0.124939
-1.809767	to calculate than	-0.124939
-1.407535	64-bit mode than	-0.124939
-1.033557	bit mode than	-0.124939
-0.166395	other values than	-0.726999
-1.883577	clock cycles than	-0.124939
-1.041404	more space than	-0.124939
-1.044502	anything else than	-0.124939
-0.588178	with signed than	-0.124939
-1.622656	memory block than	-0.124939
-1.500822	to zero than	-0.124939
-0.047957	more resources than	-0.346788
-0.321243	memory resources than	-0.124939
-0.453133	other resources than	-0.124939
-0.321243	computing resources than	-0.124939
-0.178999	is better than	-0.124939
-0.926913	be better than	-0.124939
-0.545180	integer expressions than	-0.124939
-0.485143	is longer than	-0.124939
-0.485143	be longer than	-0.124939
-0.485143	matrix longer than	-0.124939
-1.628566	user interface than	-0.124939
-0.762670	much higher than	-0.124939
-0.526199	usually higher than	-0.124939
-0.062573	is bigger than	-0.602060
-0.223694	be bigger than	-0.124939
-0.223694	are bigger than	-0.124939
-0.223694	as bigger than	-0.124939
-0.223694	loop bigger than	-0.124939
-0.223694	arrays bigger than	-0.124939
-0.223694	offset bigger than	-0.124939
-0.223694	Objects bigger than	-0.124939
-0.584678	other ways than	-0.124939
-0.864659	other modules than	-0.124939
-0.582777	units smaller than	-0.124939
-1.288265	cache contentions than	-0.124939
-1.220761	array index than	-0.124939
-0.856575	more safe than	-0.124939
-0.577573	best algorithm than	-0.124939
-0.574149	same priority than	-0.124939
-0.574149	higher priority than	-0.124939
-0.574149	lower priority than	-0.124939
-1.497239	clock frequency than	-0.124939
-1.103039	more efficiently than	-0.124939
-0.177580	more RAM than	-0.124939
-1.014983	circular buffer than	-0.124939
-0.833585	logic device than	-0.124939
-0.978100	memory blocks than	-0.124939
-0.565460	more time-consuming than	-0.124939
-0.169036	other purposes than	-0.124939
-0.825630	coarse-grained parallelism than	-0.124939
-0.561349	is lower than	-0.124939
-0.561349	more random than	-0.124939
-0.162494	be slower than	-0.124939
-0.089979	libraries slower than	-0.124939
-0.089979	much slower than	-0.124939
-0.089979	run slower than	-0.124939
-0.089979	execute slower than	-0.124939
-0.089979	nor slower than	-0.124939
-0.815087	more expensive than	-0.124939
-0.953070	more reliable than	-0.124939
-0.152635	more predictable than	-0.124939
-1.084457	more compact than	-0.124939
-0.549911	binary form than	-0.124939
-0.539414	is larger than	-0.124939
-0.526011	easier said than	-0.124939
-0.526011	a bottleneck than	-0.124939
-0.761534	is simpler than	-0.124939
-0.503954	elsewhere. Faster than	-0.124939
-0.503357	other input/output than	-0.124939
-0.657494	memory footprint than	-0.124939
-0.462127	accessed recently than	-0.124939
-0.462127	and verify than	-0.124939
-0.462127	for shared_ptr than	-0.124939
-0.357741	element. Rather than	-0.124939
-0.357741	or bitmap than	-0.124939
-0.357741	write 2.0/3.0 than	-0.124939
-0.357741	objects (rather than	-0.124939
-0.357741	call (other than	-0.124939
-0.357741	loops (less than	-0.124939
-0.357741	(memory pooling) than	-0.124939
-1.836516	of the compiler	-0.124939
-1.777879	to the compiler	-0.124939
-1.497286	and the compiler	-0.124939
-1.701378	in the compiler	-0.124939
-1.167782	for the compiler	-0.669007
-1.172252	that the compiler	-0.182931
-1.642271	if the compiler	-0.124939
-1.213361	by the compiler	-0.124939
-1.523174	on the compiler	-0.124939
-1.431685	as the compiler	-0.124939
-1.167355	then the compiler	-0.124939
-1.639729	from the compiler	-0.124939
-1.712905	at the compiler	-0.124939
-1.590045	make the compiler	-0.124939
-1.125554	because the compiler	-0.124939
-1.259706	If the compiler	-0.124939
-0.698033	but the compiler	-0.124939
-1.210821	where the compiler	-0.425969
-1.063729	so the compiler	-0.124939
-1.520866	makes the compiler	-0.124939
-1.615076	before the compiler	-0.124939
-0.841392	See the compiler	-0.425969
-0.834262	example, the compiler	-0.124939
-1.352513	sure the compiler	-0.124939
-0.969866	cases the compiler	-0.124939
-1.183637	But the compiler	-0.124939
-1.369598	whether the compiler	-0.124939
-0.827246	well the compiler	-0.124939
-0.641666	cases, the compiler	-0.124939
-0.407933	allows the compiler	-0.602060
-0.333191	what the compiler	-0.301030
-1.250583	give the compiler	-0.124939
-0.701112	Here, the compiler	-0.124939
-0.562485	function, the compiler	-0.124939
-1.250583	Unfortunately, the compiler	-0.124939
-0.363946	prevent the compiler	-0.425969
-0.316212	prevents the compiler	-0.823909
-0.036535	tell the compiler	-0.380211
-0.479319	enable the compiler	-0.602060
-0.562485	allow the compiler	-0.124939
-0.969866	expect the compiler	-0.124939
-0.562485	why the compiler	-0.124939
-0.969866	help the compiler	-0.124939
-0.562485	Likewise, the compiler	-0.124939
-0.562485	reference, the compiler	-0.124939
-0.562485	list, the compiler	-0.124939
-0.562485	hand, the compiler	-0.124939
-0.969866	specify the compiler	-0.124939
-1.063729	tells the compiler	-0.124939
-0.195911	enables the compiler	-0.425969
-0.562485	optimization, the compiler	-0.124939
-0.827246	fact, the compiler	-0.124939
-0.562485	Sometimes the compiler	-0.124939
-0.562485	Adding the compiler	-0.124939
-0.827246	invoking the compiler	-0.124939
-0.827246	forces the compiler	-0.124939
-0.562485	12.1a, the compiler	-0.124939
-0.562485	Therefore the compiler	-0.124939
-0.562485	12.1b, the compiler	-0.124939
-0.562485	++b; the compiler	-0.124939
-2.340735	of a compiler	-0.124939
-1.909851	that a compiler	-0.124939
-1.957550	use a compiler	-0.124939
-1.704668	using a compiler	-0.124939
-1.571192	example, a compiler	-0.124939
-1.273342	Use a compiler	-0.124939
-1.457797	get a compiler	-0.124939
-1.176432	requires a compiler	-0.124939
-0.500883	expect a compiler	-0.425969
-1.573573	choice of compiler	-0.124939
-1.044096	Choice of compiler	-0.124939
-0.203385	Overview of compiler	-0.124939
-1.076538	programmers and compiler	-0.124939
-1.299516	included in compiler	-0.124939
-0.702207	} The compiler	-0.522879
-1.243523	functions. The compiler	-0.124939
-1.067306	called. The compiler	-0.124939
-0.544176	modules The compiler	-0.124939
-0.544176	inlining The compiler	-0.124939
-1.008139	2. The compiler	-0.124939
-0.924538	object. The compiler	-0.124939
-1.008139	variable. The compiler	-0.124939
-0.924538	access. The compiler	-0.124939
-0.544176	f; The compiler	-0.124939
-0.924538	problem. The compiler	-0.124939
-0.794137	enabled. The compiler	-0.124939
-0.544176	executed. The compiler	-0.124939
-0.544176	solution. The compiler	-0.124939
-0.544176	Linux. The compiler	-0.124939
-0.794137	integer. The compiler	-0.124939
-0.924538	vectorization. The compiler	-0.124939
-0.924538	threads. The compiler	-0.124939
-0.794137	here. The compiler	-0.124939
-0.794137	one. The compiler	-0.124939
-0.544176	division. The compiler	-0.124939
-0.794137	initialization. The compiler	-0.124939
-0.544176	1.0f; The compiler	-0.124939
-0.924538	module. The compiler	-0.124939
-0.544176	big. The compiler	-0.124939
-0.794137	1.0f;} The compiler	-0.124939
-0.544176	x. The compiler	-0.124939
-0.544176	4; The compiler	-0.124939
-0.544176	i+1; The compiler	-0.124939
-0.544176	Instrumentation: The compiler	-0.124939
-0.544176	1.2345); The compiler	-0.124939
-0.544176	/arch:SSE2. The compiler	-0.124939
-0.544176	84). The compiler	-0.124939
-0.544176	temp. The compiler	-0.124939
-0.544176	(&a); The compiler	-0.124939
-0.544176	a+1;. The compiler	-0.124939
-0.544176	72). The compiler	-0.124939
-0.544176	3.0; The compiler	-0.124939
-0.601317	guess, that compiler	-0.124939
-0.598652	optimization by compiler	-0.124939
-1.703210	rely on compiler	-0.124939
-0.597347	article on compiler	-0.124939
-0.892815	well. This compiler	-0.124939
-0.596892	composer) This compiler	-0.124939
-0.576452	possible. A compiler	-0.124939
-0.576452	result. A compiler	-0.124939
-0.576452	37 A compiler	-0.124939
-0.576452	Basic. A compiler	-0.124939
-0.576452	Scheduling A compiler	-0.124939
-1.834913	a different compiler	-0.124939
-2.603440	the same compiler	-0.124939
-1.193707	predict which compiler	-0.124939
-1.070121	but no compiler	-0.124939
-1.364827	to each compiler	-0.124939
-0.277996	the Intel compiler	-0.263241
-0.455332	in Intel compiler	-0.124939
-0.110041	The Intel compiler	-0.182931
-0.735614	on Intel compiler	-0.124939
-0.529324	compiler Intel compiler	-0.124939
-0.148564	Windows Intel compiler	-0.425969
-0.093202	Linux Intel compiler	-0.301030
-1.274904	the C++ compiler	-0.124939
-1.229372	Intel C++ compiler	-0.124939
-0.802561	Gnu C++ compiler	-0.124939
-0.802561	PathScale C++ compiler	-0.124939
-0.802561	PGI C++ compiler	-0.124939
-1.923697	a new compiler	-0.124939
-1.982642	The following compiler	-0.124939
-0.197686	the Gnu compiler	-0.234083
-0.067747	in Gnu compiler	-0.124939
-0.140209	The Gnu compiler	-0.204120
-0.067747	Windows Gnu compiler	-0.602060
-0.887388	a Windows compiler	-0.124939
-1.777425	the best compiler	-0.124939
-1.242178	a good compiler	-0.124939
-0.381055	A good compiler	-0.301030
-0.048257	an optimizing compiler	-0.346788
-0.061208	An optimizing compiler	-0.249877
-0.324062	good optimizing compiler	-0.124939
-1.783151	a particular compiler	-0.124939
-0.589119	maintain. Most compiler	-0.124939
-0.644030	The Microsoft compiler	-0.124939
-0.453494	or Microsoft compiler	-0.124939
-0.453494	If Microsoft compiler	-0.124939
-0.453494	(The Microsoft compiler	-0.124939
-0.588009	open source compiler	-0.124939
-0.585983	code. Each compiler	-0.124939
-0.870552	for your compiler	-0.124939
-0.583684	Use appropriate compiler	-0.124939
-0.572795	or PathScale compiler	-0.124939
-0.979686	the chosen compiler	-0.124939
-0.562587	a just-in-time compiler	-0.124939
-0.159760	The Clang compiler	-0.124939
-0.955779	the Borland compiler	-0.124939
-0.549829	The CodeGear compiler	-0.124939
-1.054533	Digital Mars compiler	-0.124939
-0.540307	language programming, compiler	-0.124939
-0.540518	The Codeplay compiler	-0.124939
-0.230785	n.a. MS compiler	-0.124939
-0.100240	optimization MS compiler	-0.425969
-0.526404	A commercial compiler	-0.124939
-0.526659	(The PGI compiler	-0.124939
-0.462874	stand alone compiler	-0.124939
-0.462874	a cheap compiler	-0.124939
-0.358328	user friendly compiler	-0.124939
-0.358328	a genuine compiler	-0.124939
-2.208406	= a x	-0.124939
-1.947437	address of x	-0.124939
-0.898759	bit of x	-0.124939
-0.895896	reading of x	-0.124939
-1.358977	availability of x	-0.124939
-1.078191	2 to x	-0.124939
-0.601267	template for x	-0.124939
-2.291380	is that x	-0.124939
-2.249744	a = x	-0.124939
-0.202903	x2 = x	-0.124939
-1.375349	depend on x	-0.124939
-0.681340	- - x	-1.079181
-0.321224	x - x	-1.367977
-1.118055	n.a. - x	-0.602060
-0.891759	to int x	-0.124939
-0.596358	reduce int x	-0.124939
-1.189952	efficient than x	-0.425969
-1.925840	faster than x	-0.124939
-0.239550	- x x	-1.216019
-0.139922	x x x	-1.392110
-0.417380	n.a. x x	-0.301030
-0.162108	x- x x	-0.602060
-0.281939	(x) x x	-0.124939
-0.188781	74 x x	-0.124939
-1.189619	- n.a. x	-0.425969
-0.547879	x n.a. x	-0.124939
-1.173190	n.a. n.a. x	-0.602060
-1.969468	the object x	-0.124939
-0.575453	x * x	-0.425969
-0.558579	(2.5f * x	-0.124939
-0.558579	8.0f) * x	-0.124939
-0.836732	{ return x	-0.726999
-0.597448	there between x	-0.124939
-1.639784	= 0; x	-0.425969
-1.498545	For example, x	-0.425969
-1.606804	to access x	-0.124939
-0.595329	former case x	-0.124939
-0.363943	for small x	-0.425969
-1.585728	to get x	-0.124939
-1.702812	to store x	-0.124939
-1.226119	*= x; x	-0.124939
-0.574998	- 2, x	-0.124939
-0.574998	// square x	-0.124939
-0.840948	can modify x	-0.124939
-1.077107	x, y; x	-0.124939
-0.970332	= -1 x	-0.124939
-0.024402	x x- x	-0.425969
-0.107099	x- x- x	-0.124939
-0.540820	15.1a. Calculate x	-0.124939
-0.915668	subexpression elimination x	-0.124939
-1.008806	= 2.0; x	-0.124939
-0.042704	x x-- x	-0.124939
-0.090070	reductions: x-- x	-0.124939
-0.504319	Devirtualization ---x----- x	-0.124939
-0.203935	x (x) x	-0.124939
-0.463002	x-xxx---x x-xxx---x x	-0.124939
-0.463002	x 74 x	-0.124939
-0.463002	a+b=b+a, a*b=b*a x	-0.124939
-0.358428	- xx x	-0.124939
-0.358428	constructor initializes x	-0.124939
-0.599391	powerful and may	-0.124939
-0.599391	ambiguous and may	-0.124939
-0.675570	variables that may	-0.425969
-1.385198	instructions that may	-0.124939
-0.884467	advantages that may	-0.124939
-0.592649	cleanup that may	-0.124939
-0.592649	Things that may	-0.124939
-0.979934	and it may	-0.425969
-1.246849	that it may	-0.124939
-0.670197	then it may	-0.602060
-1.070758	because it may	-0.124939
-0.804932	but it may	-0.726999
-0.189577	example, it may	-0.124939
-0.533776	optimization it may	-0.124939
-0.899976	case it may	-0.124939
-1.121825	But it may	-0.124939
-0.533776	arrays, it may	-0.124939
-0.533776	it, it may	-0.124939
-1.541169	The function may	-0.124939
-1.175999	same function may	-0.124939
-1.457072	critical function may	-0.124939
-1.063622	error code may	-0.124939
-1.063622	compiled code may	-0.124939
-1.134501	memory. This may	-0.124939
-0.563254	again. This may	-0.124939
-0.971835	modules. This may	-0.124939
-0.563254	reasons. This may	-0.124939
-0.828663	inline. This may	-0.124939
-0.563254	72 This may	-0.124939
-0.563254	range. This may	-0.124939
-0.563254	full. This may	-0.124939
-0.563254	entries. This may	-0.124939
-0.563254	reduced. This may	-0.124939
-0.563254	45. This may	-0.124939
-1.125285	the compiler may	-0.271067
-1.150078	a compiler may	-0.124939
-0.969694	of compiler may	-0.124939
-0.540162	The compiler may	-0.359022
-0.969694	A compiler may	-0.124939
-1.226679	optimizing compiler may	-0.124939
-1.298845	and you may	-0.124939
-0.480624	or you may	-0.124939
-0.450256	then you may	-0.249877
-0.844191	but you may	-0.124939
-0.883872	example, you may	-0.124939
-0.686859	cases you may	-0.124939
-0.686859	Windows, you may	-0.124939
-0.480624	code, you may	-0.124939
-0.686859	systems, you may	-0.124939
-0.480624	object, you may	-0.124939
-0.166049	Alternatively, you may	-0.124939
-0.480624	case, you may	-0.124939
-0.480624	Furthermore, you may	-0.124939
-0.480624	fact, you may	-0.124939
-0.480624	better, you may	-0.124939
-0.480624	safety, you may	-0.124939
-1.409694	because this may	-0.124939
-0.594304	systems, this may	-0.124939
-1.195299	extra time may	-0.124939
-0.502778	arrays It may	-0.124939
-1.052925	pointer. It may	-0.124939
-0.502778	registers. It may	-0.124939
-0.502778	cycles. It may	-0.124939
-0.502778	vector. It may	-0.124939
-0.502778	branch. It may	-0.124939
-0.723053	space. It may	-0.124939
-0.502778	software. It may	-0.124939
-0.502778	anyway. It may	-0.124939
-0.723053	here. It may	-0.124939
-0.502778	one. It may	-0.124939
-0.502778	so. It may	-0.124939
-0.502778	predictable. It may	-0.124939
-0.502778	safer. It may	-0.124939
-0.502778	profile. It may	-0.124939
-0.502778	objects? It may	-0.124939
-0.502778	high. It may	-0.124939
-0.502778	unavoidable. It may	-0.124939
-1.256037	allocated memory may	-0.124939
-1.167760	RAM memory may	-0.124939
-1.631980	the program may	-0.124939
-1.741367	a program may	-0.124939
-1.525081	The program may	-0.124939
-0.599148	inlined functions may	-0.124939
-2.249131	the CPU may	-0.124939
-0.574485	CPUs which may	-0.124939
-0.574485	intervals which may	-0.124939
-0.574485	moved, which may	-0.124939
-0.574485	results, which may	-0.124939
-0.598993	security, but may	-0.124939
-1.359773	level-1 cache may	-0.124939
-1.068161	An integer may	-0.124939
-2.409972	instruction set may	-0.124939
-1.191471	above example may	-0.124939
-1.501578	the compilers may	-0.124939
-0.857615	future compilers may	-0.124939
-0.857615	current compilers may	-0.124939
-1.425077	line size may	-0.124939
-1.286293	smart pointer may	-0.124939
-1.066416	interface library may	-0.124939
-1.525046	then there may	-0.124939
-0.968242	because there may	-0.124939
-0.561848	so there may	-0.124939
-1.224016	However, there may	-0.124939
-0.534664	dispatching There may	-0.124939
-0.534664	user. There may	-0.124939
-0.534664	faster. There may	-0.124939
-0.534664	Mbytes. There may	-0.124939
-0.534664	36. There may	-0.124939
-0.534664	inheritance. There may	-0.124939
-0.893817	allocated array may	-0.124939
-1.349470	that we may	-0.124939
-0.790963	case we may	-0.124939
-0.542388	future we may	-0.124939
-0.542388	available, we may	-0.124939
-0.542388	algebra, we may	-0.124939
-0.893021	Global variables may	-0.124939
-0.260603	// You may	-0.124939
-0.372861	time. You may	-0.124939
-0.260603	used. You may	-0.124939
-0.372861	efficient. You may	-0.124939
-0.260603	below. You may	-0.124939
-0.260603	called. You may	-0.124939
-0.260603	pointer. You may	-0.124939
-0.372861	operations. You may	-0.124939
-0.260603	needed. You may	-0.124939
-0.260603	classes. You may	-0.124939
-0.260603	precision. You may	-0.124939
-0.260603	way. You may	-0.124939
-0.260603	vector. You may	-0.124939
-0.260603	zero. You may	-0.124939
-0.260603	anyway. You may	-0.124939
-0.260603	application. You may	-0.124939
-0.111041	cores. You may	-0.124939
-0.260603	modules. You may	-0.124939
-0.260603	itself. You may	-0.124939
-0.260603	expensive. You may	-0.124939
-0.260603	debugger. You may	-0.124939
-0.260603	52. You may	-0.124939
-0.260603	question. You may	-0.124939
-0.260603	model. You may	-0.124939
-0.260603	incompatible. You may	-0.124939
-0.260603	today. You may	-0.124939
-0.260603	108 You may	-0.124939
-0.260603	-fno-strict-overflow. You may	-0.124939
-0.260603	contention. You may	-0.124939
-0.260603	64). You may	-0.124939
-0.260603	used). You may	-0.124939
-0.260603	dilemma. You may	-0.124939
-0.580984	this table may	-0.124939
-0.580984	written table may	-0.124939
-1.223316	of pointers may	-0.124939
-0.574581	use pointers may	-0.124939
-0.595041	other systems may	-0.124939
-1.059346	The user may	-0.124939
-0.595785	or they may	-0.124939
-1.681242	This method may	-0.124939
-0.572902	vector method may	-0.124939
-1.058041	network access may	-0.124939
-1.063714	the system may	-0.124939
-0.931832	operating system may	-0.124939
-0.594147	few times may	-0.124939
-1.404931	64-bit Windows may	-0.124939
-1.516338	function calls may	-0.124939
-0.566116	Function calls may	-0.124939
-0.593165	The calculations may	-0.124939
-0.886376	program execution may	-0.124939
-0.190706	virtual processor may	-0.124939
-0.538787	M processor may	-0.124939
-0.592476	www.agner.org/optimize/cppexamples.zip. These may	-0.124939
-1.167679	each thread may	-0.124939
-0.592182	history, etc. may	-0.124939
-0.592304	A calculation may	-0.124939
-1.331297	efficient solution may	-0.124939
-1.251447	the container may	-0.124939
-1.597319	repeat count may	-0.124939
-1.764669	memory allocation may	-0.124939
-0.588275	The branches may	-0.124939
-1.150739	and multiplication may	-0.124939
-0.588339	C++ implementation may	-0.124939
-0.588144	class members may	-0.124939
-0.874237	following methods may	-0.124939
-1.613024	or reference may	-0.124939
-0.586691	a programmer may	-0.124939
-0.587242	annoying. We may	-0.124939
-0.872240	unwinding mechanism may	-0.124939
-0.895340	point expressions may	-0.124939
-0.531775	such expressions may	-0.124939
-1.028596	runtime framework may	-0.124939
-0.866523	on process may	-0.124939
-0.518309	simple constructor may	-0.124939
-1.089955	copy constructor may	-0.124939
-0.582100	Some modules may	-0.124939
-0.582935	given here may	-0.124939
-1.019113	data section may	-0.124939
-1.017323	The syntax may	-0.124939
-0.860591	the profiler may	-0.124939
-0.858531	a network may	-0.124939
-0.804319	clock frequency may	-0.124939
-0.571189	consuming updates may	-0.124939
-0.568705	int declaration may	-0.124939
-0.986573	hash map may	-0.124939
-0.565419	and writes may	-0.124939
-1.008752	program logic may	-0.124939
-0.565419	The usability may	-0.124939
-1.217089	dependency chain may	-0.124939
-0.556505	memory. They may	-0.124939
-1.042970	garbage collection may	-0.124939
-0.555831	The developers may	-0.124939
-0.815083	time measurements may	-0.124939
-0.978789	just-in-time compilation may	-0.124939
-1.038434	device drivers may	-0.124939
-0.539542	parameter. Templates may	-0.124939
-0.525224	similar solutions may	-0.124939
-0.525224	returns. alloca may	-0.124939
-0.525224	this unit-test may	-0.124939
-0.938976	binary tree may	-0.124939
-0.503057	the advices may	-0.124939
-0.503057	once. One may	-0.124939
-0.503057	Bitfields Bitfields may	-0.124939
-0.503754	* 2.5 may	-0.124939
-0.723517	computationally intensive may	-0.124939
-0.461855	Calling exit may	-0.124939
-0.461855	// Overflow may	-0.124939
-0.657066	test setup may	-0.124939
-0.357526	the tolerance may	-0.124939
-0.357526	USB sticks may	-0.124939
-0.357526	software developer may	-0.124939
-0.357526	first dimension may	-0.124939
-0.357526	operator (^) may	-0.124939
-0.587982	to and you	-0.124939
-1.403548	function and you	-0.124939
-1.770072	code and you	-0.124939
-1.519774	functions and you	-0.124939
-1.038246	library and you	-0.124939
-0.875376	efficient and you	-0.124939
-1.150355	access and you	-0.124939
-0.587982	handling and you	-0.124939
-0.875376	features, and you	-0.124939
-0.587982	anyway and you	-0.124939
-0.587982	odd and you	-0.124939
-0.587982	operator; and you	-0.124939
-0.587982	sequential, and you	-0.124939
-1.976010	is that you	-0.124939
-1.617136	code that you	-0.124939
-1.312919	compiler that you	-0.124939
-0.838975	instruction that you	-0.124939
-0.838975	set that you	-0.124939
-1.451759	so that you	-0.124939
-0.838975	arrays that you	-0.124939
-0.568821	processor that you	-0.124939
-0.484903	requires that you	-0.124939
-0.568821	options that you	-0.124939
-1.208090	Assume that you	-0.124939
-0.838975	thing that you	-0.124939
-0.568821	statements that you	-0.124939
-0.197265	counts that you	-0.124939
-0.568821	course, that you	-0.124939
-1.084150	think that you	-0.124939
-0.568821	unrealistic that you	-0.124939
-0.568821	say that you	-0.124939
-0.601080	Neither can you	-0.124939
-0.600225	purpose, or you	-0.124939
-0.749479	and if you	-0.124939
-0.937262	that if you	-0.124939
-0.749479	as if you	-0.124939
-1.199703	only if you	-0.124939
-1.109795	loop if you	-0.124939
-0.186072	example if you	-0.124939
-0.518498	size if you	-0.124939
-0.518498	table if you	-0.124939
-1.199703	example, if you	-0.124939
-0.518498	unsigned if you	-0.124939
-0.518498	important if you	-0.124939
-0.749479	CPUs if you	-0.124939
-0.749479	necessary if you	-0.124939
-0.518498	option if you	-0.124939
-0.518498	good if you	-0.124939
-0.518498	precision if you	-0.124939
-0.518498	line if you	-0.124939
-0.518498	section if you	-0.124939
-0.518498	C if you	-0.124939
-0.749479	global if you	-0.124939
-0.749479	vectorized if you	-0.124939
-0.518498	organized if you	-0.124939
-0.518498	help if you	-0.124939
-0.518498	explanation if you	-0.124939
-0.518498	devices if you	-0.124939
-0.518498	anyway if you	-0.124939
-0.518498	questions if you	-0.124939
-0.518498	aliasing" if you	-0.124939
-0.518498	leaks if you	-0.124939
-0.518498	adjusted if you	-0.124939
-0.518498	panic if you	-0.124939
-2.480148	the code you	-0.124939
-1.929638	of code you	-0.124939
-0.482032	long as you	-0.124939
-0.865122	soon as you	-0.124939
-0.582661	clumsy, as you	-0.124939
-0.582661	issue, as you	-0.124939
-1.920098	the compiler you	-0.124939
-2.087906	the time you	-0.124939
-1.729001	The time you	-0.124939
-1.319695	first time you	-0.124939
-0.825157	code when you	-0.124939
-0.561348	do when you	-0.124939
-0.825157	example when you	-0.124939
-0.561348	first when you	-0.124939
-0.825157	signed when you	-0.124939
-0.825157	counters when you	-0.124939
-0.561348	readable when you	-0.124939
-0.561348	indices when you	-0.124939
-0.688948	are then you	-0.124939
-0.303157	code then you	-0.124939
-0.608473	program then you	-0.124939
-0.735570	functions then you	-0.124939
-0.430233	loop then you	-0.124939
-0.430233	cache then you	-0.124939
-0.608473	pointer then you	-0.124939
-0.430233	object then you	-0.124939
-0.430233	calculations then you	-0.124939
-0.430233	works then you	-0.124939
-0.430233	structure then you	-0.124939
-0.430233	... then you	-0.124939
-0.430233	application then you	-0.124939
-0.430233	handling then you	-0.124939
-0.430233	addition then you	-0.124939
-0.430233	vectors then you	-0.124939
-0.430233	supports then you	-0.124939
-0.430233	code, then you	-0.124939
-0.430233	instance then you	-0.124939
-0.430233	models then you	-0.124939
-0.430233	set, then you	-0.124939
-0.430233	independent then you	-0.124939
-0.430233	integer, then you	-0.124939
-0.430233	changes then you	-0.124939
-0.608473	not, then you	-0.124939
-0.430233	numbers, then you	-0.124939
-0.430233	efficiency, then you	-0.124939
-0.430233	differ then you	-0.124939
-0.430233	limit, then you	-0.124939
-0.430233	meaning, then you	-0.124939
-0.430233	so, then you	-0.124939
-0.430233	algorithm, then you	-0.124939
-0.430233	y?" then you	-0.124939
-2.013625	a program you	-0.124939
-0.869526	operands because you	-0.124939
-0.584954	fastest because you	-0.124939
-0.584954	course, because you	-0.124939
-0.599306	if only you	-0.124939
-0.424248	64 If you	-0.124939
-0.425108	code. If you	-0.124939
-0.678149	memory. If you	-0.124939
-0.599482	systems. If you	-0.124939
-0.599482	efficient. If you	-0.124939
-0.775573	set. If you	-0.124939
-0.599482	library. If you	-0.124939
-0.424248	cycles. If you	-0.124939
-0.599482	thread. If you	-0.124939
-0.424248	u; If you	-0.124939
-0.424248	storage. If you	-0.124939
-0.424248	16. If you	-0.124939
-0.424248	branches. If you	-0.124939
-0.424248	www.agner.org/optimize/asmlib.zip. If you	-0.124939
-0.424248	programs. If you	-0.124939
-0.162257	results. If you	-0.124939
-0.424248	errors. If you	-0.124939
-0.424248	methods. If you	-0.124939
-0.424248	macro. If you	-0.124939
-0.424248	__debugbreak();. If you	-0.124939
-0.424248	(www.intel.com). If you	-0.124939
-0.424248	ordering? If you	-0.124939
-0.424248	42 If you	-0.124939
-0.424248	sequence. If you	-0.124939
-0.424248	152 If you	-0.124939
-0.849073	automatically but you	-0.124939
-0.574212	system, but you	-0.124939
-0.574212	manual, but you	-0.124939
-0.574212	type, but you	-0.124939
-1.069519	most compilers you	-0.124939
-0.570893	function where you	-0.124939
-0.570893	case where you	-0.124939
-0.570893	Internet where you	-0.124939
-0.567840	C++ so you	-0.124939
-0.567840	code, so you	-0.124939
-0.567840	bits, so you	-0.124939
-0.596830	algorithm before you	-0.124939
-0.400095	for example, you	-0.124939
-1.344621	For example, you	-0.124939
-1.132167	and how you	-0.124939
-1.433197	shows how you	-0.124939
-0.595378	endian systems you	-0.124939
-1.333276	are sure you	-0.124939
-0.552309	not sure you	-0.124939
-1.766338	make sure you	-0.124939
-0.595454	Which method you	-0.124939
-1.514649	this case you	-0.124939
-1.265491	most cases you	-0.124939
-1.547023	some cases you	-0.124939
-0.593639	vector, while you	-0.124939
-0.593632	(In Windows you	-0.124939
-1.062066	How much you	-0.124939
-0.592052	Which solution you	-0.124939
-0.815242	of whether you	-0.124939
-0.555919	difference whether you	-0.124939
-0.590007	operations. All you	-0.124939
-0.590010	first. However, you	-0.124939
-0.589373	of problems you	-0.124939
-0.352220	so unless you	-0.124939
-0.352220	& unless you	-0.124939
-0.141341	CPUs unless you	-0.425969
-0.495656	mode unless you	-0.124939
-0.352220	avoided unless you	-0.124939
-0.352220	X, unless you	-0.124939
-0.352220	b*(2.0/3.0) unless you	-0.124939
-1.236103	most cases, you	-0.124939
-0.546574	such cases, you	-0.124939
-0.402153	set. Therefore, you	-0.124939
-0.402153	it. Therefore, you	-0.124939
-0.566809	consuming. Therefore, you	-0.124939
-0.402153	exception. Therefore, you	-0.124939
-0.402153	strides. Therefore, you	-0.124939
-0.402153	namespaces. Therefore, you	-0.124939
-0.579881	that allows you	-0.425969
-0.719607	compiler allows you	-0.124939
-0.493563	does what you	-0.124939
-0.811681	know what you	-0.124939
-0.493563	exactly what you	-0.124939
-0.870590	will give you	-0.124939
-0.188655	which optimizations you	-0.124939
-0.469243	... Here, you	-0.124939
-0.469243	testing. Here, you	-0.124939
-0.469243	blog. Here, you	-0.124939
-0.742874	of things you	-0.124939
-0.514605	various things you	-0.124939
-0.509579	in Windows, you	-0.124939
-0.509579	In Windows, you	-0.124939
-0.579445	to code, you	-0.124939
-0.576513	are: When you	-0.124939
-0.576958	on until you	-0.124939
-0.473415	that allow you	-0.124939
-0.770770	systems allow you	-0.124939
-0.473692	C++ program, you	-0.124939
-0.473692	your program, you	-0.124939
-0.838851	the compiler, you	-0.124939
-0.569322	dispatching. Obviously, you	-0.124939
-0.449671	... Here you	-0.124939
-0.449671	way. Here you	-0.124939
-0.433896	most systems, you	-0.124939
-0.614006	64-bit systems, you	-0.124939
-0.556539	composite object, you	-0.124939
-0.556014	false. Likewise, you	-0.124939
-0.151281	functions. Alternatively, you	-0.124939
-0.151281	size. Alternatively, you	-0.124939
-0.151281	stack. Alternatively, you	-0.124939
-0.151281	returns. Alternatively, you	-0.124939
-0.151281	Windows). Alternatively, you	-0.124939
-0.249474	In general, you	-0.301030
-0.786638	latter case, you	-0.124939
-0.143138	vector library, you	-0.124939
-0.525708	executed. Furthermore, you	-0.124939
-0.525708	that 150 you	-0.124939
-0.310742	other words, you	-0.124939
-0.526137	support. Then you	-0.124939
-0.503518	smallest devices, you	-0.124939
-0.724282	out-of-order execution, you	-0.124939
-0.832653	is slow, you	-0.124939
-0.249272	etc. Whether you	-0.124939
-0.249272	Sum3. Whether you	-0.124939
-0.901277	In fact, you	-0.124939
-0.832653	the contrary, you	-0.124939
-0.249272	spots Before you	-0.124939
-0.249272	tasks. Before you	-0.124939
-0.503518	NOT. Instead, you	-0.124939
-0.462273	| operator; you	-0.124939
-0.462273	specific purpose, you	-0.124939
-0.462273	The insight you	-0.124939
-0.462273	Even better, you	-0.124939
-0.357855	example 8.21, you	-0.124939
-0.357855	jeopardizing safety, you	-0.124939
-0.357855	optimize anything, you	-0.124939
-0.357855	this reason, you	-0.124939
-0.357855	way. First you	-0.124939
-0.597769	Greek[4] = {	-0.124939
-0.597769	coef[16] = {	-0.124939
-0.599301	class vector {	-0.124939
-0.066425	100; i++) {	-0.669007
-0.085406	size; i++) {	-0.124939
-0.210289	n; i++) {	-0.124939
-0.130073	256; i++) {	-0.124939
-0.130073	1000; i++) {	-0.124939
-0.130073	20; i++) {	-0.124939
-0.210289	rows; i++) {	-0.124939
-0.060185	NumberOfTests; i++) {	-0.124939
-0.130073	arraysize; i++) {	-0.124939
-0.130073	list.Size(); i++) {	-0.124939
-0.334450	if else {	-0.124939
-0.013477	} else {	-0.403692
-0.086477	34 else {	-0.124939
-0.086477	68 else {	-0.124939
-0.003019	const x) {	-0.301030
-0.001507	& x) {	-0.301030
-0.004536	(int x) {	-0.124939
-0.003019	(float x) {	-0.602060
-0.002262	p(double x) {	-0.425969
-0.002262	xpow10(double x) {	-0.425969
-0.009120	(double x) {	-0.124939
-0.004536	Exp(float x) {	-0.124939
-0.009120	Func1(int x) {	-0.124939
-0.009120	Func2(double x) {	-0.124939
-0.210715	{ union {	-0.124939
-0.210715	14.28 union {	-0.124939
-0.210715	14.23b union {	-0.124939
-0.210715	14.26 union {	-0.124939
-0.210715	14.27 union {	-0.124939
-0.210715	14.23 union {	-0.124939
-0.210715	7.39 union {	-0.124939
-0.210715	14.29 union {	-0.124939
-0.210715	14.24 union {	-0.124939
-0.210715	14.25 union {	-0.124939
-0.333910	& b) {	-0.124939
-0.062539	bool b) {	-0.726999
-0.221844	struct S1 {	-0.124939
-0.576265	{ struct {	-0.124939
-0.115782	< 0) {	-0.124939
-0.193155	> 0) {	-0.124939
-0.067327	== 0) {	-0.124939
-0.085927	!= 0) {	-0.301030
-0.561732	{ try {	-0.124939
-0.032535	* p) {	-0.492916
-0.003824	int cc[]) {	-0.550907
-0.255395	* 2) {	-0.124939
-0.069663	+= 2) {	-0.124939
-0.005002	if (b) {	-0.492916
-1.078294	class C1 {	-0.124939
-1.361676	int parm2) {	-0.124939
-0.222701	< 4) {	-0.124939
-0.222701	+= 4) {	-0.124939
-0.097229	>= 4) {	-0.425969
-0.802244	class c1 {	-0.124939
-0.016556	test () {	-0.124939
-0.051692	Func () {	-0.124939
-0.051692	CriticalInnerFunction () {	-0.124939
-0.029413	+= 8) {	-0.726999
-0.111375	& r) {	-0.301030
-0.015503	SIZE; r++) {	-0.425969
-0.031580	SIZE; c++) {	-0.124939
-0.031580	r; c++) {	-0.425969
-0.065638	int n) {	-0.124939
-0.020796	(int n) {	-0.301030
-0.129278	100; x++) {	-0.425969
-0.525868	class powN {	-0.124939
-0.314181	union Bitfield {	-0.124939
-0.314181	struct Bitfield {	-0.124939
-0.525350	>= size) {	-0.124939
-0.015503	void Disp() {	-0.726999
-0.172252	class CHello {	-0.124939
-0.077641	public CHello {	-0.425969
-0.129278	enum Weekdays {	-0.124939
-0.723716	< 5) {	-0.124939
-0.106946	>= 11) {	-0.425969
-0.106946	int main() {	-0.425969
-0.249146	class C0 {	-0.124939
-0.249146	public C0 {	-0.124939
-0.503177	long ReadTSC() {	-0.124939
-0.461964	+= 16) {	-0.124939
-0.142993	& a) {	-0.124939
-0.142993	(float a) {	-0.124939
-0.065638	public CParent<CChild1> {	-0.124939
-0.142993	class CGrandParent {	-0.124939
-0.142993	public CGrandParent {	-0.124939
-0.065638	F3(bool y) {	-0.124939
-0.657237	* CriticalFunctionDispatch(void) {	-0.124939
-0.657237	void F1() {	-0.124939
-0.142993	int Func2() {	-0.124939
-0.142993	void Func2() {	-0.124939
-0.065638	& 0x7FFFFFFF) {	-0.425969
-0.065638	void Hello() {	-0.425969
-0.065638	if (y) {	-0.124939
-0.065638	+= TILESIZE) {	-0.124939
-0.142993	c1+TILESIZE; c2++) {	-0.124939
-0.142993	r2; c2++) {	-0.124939
-0.065638	r1+TILESIZE; r2++) {	-0.425969
-0.065638	transpose(double a[SIZE][SIZE]) {	-0.425969
-0.461964	class SafeArray {	-0.124939
-0.065638	double b[SIZE][SIZE]) {	-0.425969
-0.461964	public B2 {	-0.124939
-0.461964	== Friday) {	-0.124939
-0.357612	< 10) {	-0.124939
-0.357612	public B1 {	-0.124939
-0.357612	& source) {	-0.124939
-0.357612	array i) {	-0.124939
-0.357612	void g() {	-0.124939
-0.357612	void F0() {	-0.124939
-0.357612	class powN<true,0> {	-0.124939
-0.357612	> largest_abs) {	-0.124939
-0.357612	class powN<true,N> {	-0.124939
-0.357612	(unsigned int)size) {	-0.124939
-0.357612	struct Sdouble {	-0.124939
-0.357612	&list[100]; temp++) {	-0.124939
-0.357612	class S2 {	-0.124939
-0.357612	class S3 {	-0.124939
-0.357612	62 __try {	-0.124939
-0.357612	if (true) {	-0.124939
-0.357612	public CParent<CChild2> {	-0.124939
-0.357612	switch (n) {	-0.124939
-0.357612	> 1.0) {	-0.124939
-0.357612	int m) {	-0.124939
-0.357612	WriteFile(handle, ...)) {	-0.124939
-0.357612	: EXCEPTION_CONTINUE_SEARCH) {	-0.124939
-0.357612	<= max) {	-0.124939
-0.357612	int x[]) {	-0.124939
-0.357612	struct Slongdouble {	-0.124939
-0.357612	16; n++) {	-0.124939
-0.357612	| Friday)) {	-0.124939
-0.357612	struct Sfloat {	-0.124939
-0.357612	void MathLoop() {	-0.124939
-0.357612	int Size() {	-0.124939
-0.357612	>= N) {	-0.124939
-0.357612	void xplus2() {	-0.124939
-0.357612	< arraysize) {	-0.124939
-0.357612	void Func() {	-0.124939
-0.357612	> v.i) {	-0.124939
-0.357612	class powN<true,1> {	-0.124939
-0.357612	catch (...) {	-0.124939
-0.357612	< 13) {	-0.124939
-0.357612	- min)) {	-0.124939
-0.357612	public: SafeArray() {	-0.124939
-0.357612	__restrict bb) {	-0.124939
-0.357612	void DelayFiveSeconds() {	-0.124939
-2.102908	is to have	-0.124939
-1.668460	efficient to have	-0.124939
-1.394758	sure to have	-0.124939
-1.806535	important to have	-0.124939
-1.820713	necessary to have	-0.124939
-0.887463	good to have	-0.124939
-1.349499	certain to have	-0.124939
-0.887463	allowed to have	-0.124939
-1.173408	convenient to have	-0.124939
-0.887463	sufficient to have	-0.124939
-0.898295	instruction and have	-0.124939
-1.289140	loop and have	-0.124939
-1.107132	functions that have	-0.425969
-1.213286	compilers that have	-0.124939
-0.583036	registers that have	-0.124939
-1.024439	systems that have	-0.124939
-1.484746	processors that have	-0.124939
-0.372501	operators that have	-0.124939
-0.372501	programs that have	-0.124939
-0.583036	labels that have	-0.124939
-0.583036	Programmers that have	-0.124939
-1.975422	it can have	-0.124939
-2.042077	This can have	-0.124939
-1.502658	you can have	-0.124939
-1.050063	processors can have	-0.124939
-2.621152	the code have	-0.124939
-1.496649	will not have	-0.124939
-0.804230	do not have	-0.124939
-1.242520	does not have	-0.124939
-2.599128	the compiler have	-0.124939
-1.650418	you may have	-0.124939
-1.072341	program may have	-0.124939
-1.650788	You may have	-0.124939
-0.565182	systems may have	-0.124939
-1.072341	system may have	-0.124939
-1.008312	processor may have	-0.124939
-0.565182	etc. may have	-0.124939
-0.832221	expressions may have	-0.124939
-0.565182	unit-test may have	-0.124939
-1.073287	that you have	-0.124939
-1.201510	if you have	-0.425969
-1.049172	as you have	-0.124939
-1.095908	then you have	-0.602060
-0.526325	systems you have	-0.124939
-0.526325	case you have	-0.124939
-0.526325	All you have	-0.124939
-0.730445	unless you have	-0.124939
-0.762887	optimizations you have	-0.124939
-0.762887	Here you have	-0.124939
-0.938979	general, you have	-0.124939
-0.526325	execution, you have	-0.124939
-0.526325	anything, you have	-0.124939
-0.570011	and will have	-0.124939
-1.369880	you will have	-0.124939
-1.088061	we will have	-0.124939
-0.841195	user will have	-0.124939
-0.570011	processor will have	-0.124939
-0.570011	loader will have	-0.124939
-1.255887	The data have	-0.124939
-0.592648	image data have	-0.124939
-2.396132	the program have	-0.124939
-0.789728	if functions have	-0.124939
-0.541691	vector functions have	-0.124939
-1.186684	library functions have	-0.124939
-1.347197	member functions have	-0.124939
-0.918596	these functions have	-0.124939
-0.541691	specific functions have	-0.124939
-0.541691	All functions have	-0.124939
-0.918596	string functions have	-0.124939
-0.541691	Inlined functions have	-0.124939
-1.292762	can only have	-0.124939
-1.018698	code should have	-0.124939
-0.580949	block should have	-0.124939
-1.018698	dispatcher should have	-0.124939
-1.340948	that do have	-0.124939
-1.052615	we do have	-0.124939
-0.919652	other compilers have	-0.124939
-0.969750	Intel compilers have	-0.124939
-0.930135	C++ compilers have	-0.425969
-0.316394	Some compilers have	-0.249877
-1.232992	Most compilers have	-0.124939
-0.483258	Many compilers have	-0.124939
-1.068638	code size have	-0.124939
-0.896112	by Intel have	-0.124939
-1.075665	and b have	-0.124939
-1.524965	class library have	-0.124939
-0.560189	compilers also have	-0.124939
-0.195417	systems also have	-0.425969
-0.560189	F1 also have	-0.124939
-0.118142	all objects have	-0.602060
-0.191961	Do objects have	-0.124939
-1.350283	that we have	-0.124939
-0.688699	then we have	-0.124939
-0.542571	Here, we have	-0.124939
-0.542571	numbers, we have	-0.124939
-0.893481	Such variables have	-0.124939
-0.524699	} You have	-0.124939
-0.524699	functions You have	-0.124939
-0.524699	handling. You have	-0.124939
-0.524699	manual. You have	-0.124939
-0.524699	72. You have	-0.124939
-0.524699	job. You have	-0.124939
-0.196107	if elements have	-0.124939
-0.972203	all elements have	-0.124939
-0.576651	size often have	-0.124939
-0.576651	hackers often have	-0.124939
-1.117413	function libraries have	-0.124939
-0.539457	all libraries have	-0.124939
-0.539457	these libraries have	-0.124939
-0.595597	These registers have	-0.124939
-0.194052	bit systems have	-0.124939
-1.073206	Some systems have	-0.124939
-0.595944	after they have	-0.124939
-0.998658	may even have	-0.124939
-0.573527	don't even have	-0.124939
-0.594427	AVX instructions have	-0.124939
-0.594871	micro- processors have	-0.124939
-0.317619	that I have	-0.124939
-0.217340	compiler I have	-0.124939
-0.025142	compilers I have	-0.367977
-0.217340	examples I have	-0.124939
-0.095214	Here, I have	-0.124939
-0.217340	called. I have	-0.124939
-0.217340	performance. I have	-0.124939
-0.217340	element. I have	-0.124939
-0.217340	arrays. I have	-0.124939
-0.217340	number. I have	-0.124939
-0.217340	call. I have	-0.124939
-0.217340	one. I have	-0.124939
-0.217340	manually. I have	-0.124939
-0.217340	research, I have	-0.124939
-0.217340	easier. I have	-0.124939
-0.217340	tricky. I have	-0.124939
-0.217340	chapter, I have	-0.124939
-0.882945	Intel CPUs have	-0.124939
-0.496979	Some CPUs have	-0.124939
-0.496979	Many CPUs have	-0.124939
-0.713465	modern CPUs have	-0.124939
-0.496979	Current CPUs have	-0.124939
-1.265017	it does have	-0.124939
-0.789855	functions must have	-0.124939
-0.541762	class must have	-0.124939
-0.541762	task must have	-0.124939
-0.886155	address calculations have	-0.124939
-1.571248	different versions have	-0.124939
-0.564194	it doesn't have	-0.301030
-0.376051	compiler doesn't have	-0.124939
-0.537726	The threads have	-0.124939
-0.782734	other threads have	-0.124939
-0.537726	all threads have	-0.124939
-1.257526	the thread have	-0.124939
-1.051484	for Linux have	-0.124939
-0.554349	you would have	-0.124939
-0.554349	we would have	-0.124939
-0.590624	five values have	-0.124939
-0.589495	destination both have	-0.124939
-0.503849	strings typically have	-0.124939
-0.503849	devices typically have	-0.124939
-0.503849	developers typically have	-0.124939
-1.670483	should preferably have	-0.124939
-0.971513	therefore preferably have	-0.124939
-0.937150	instruction sets have	-0.124939
-0.425411	you don't have	-0.602060
-0.249542	we don't have	-0.602060
-0.358304	simply don't have	-0.124939
-0.587761	different methods have	-0.124939
-0.586364	embedded applications have	-0.124939
-1.035425	the examples have	-0.124939
-0.529690	Smaller microprocessors have	-0.124939
-0.529690	Today's microprocessors have	-0.124939
-0.694714	the operands have	-0.124939
-0.462659	where operands have	-0.124939
-0.580792	these languages have	-0.124939
-0.577978	framework sometimes have	-0.124939
-0.577837	capabilities still have	-0.124939
-0.574255	development models have	-0.124939
-0.569096	variables might have	-0.124939
-0.462692	Most programmers have	-0.124939
-0.462692	Many programmers have	-0.124939
-1.263683	the diagonal have	-0.124939
-0.566011	N1 could have	-0.124939
-0.825381	the inputs have	-0.124939
-0.967885	x86 family have	-0.124939
-0.561708	people who have	-0.124939
-1.158181	cache misses have	-0.124939
-0.414971	information. They have	-0.124939
-0.414971	64-bit. They have	-0.124939
-0.274044	that computers have	-0.124939
-0.274044	because computers have	-0.124939
-0.274044	modern computers have	-0.124939
-1.164020	hot spots have	-0.124939
-0.761751	development tools have	-0.124939
-0.525666	all caches have	-0.124939
-0.503478	software projects have	-0.124939
-0.027967	Smaller microcontrollers have	-0.602060
-0.462237	You can't have	-0.124939
-0.462237	whether others have	-0.124939
-0.462237	OS, etc.) have	-0.124939
-0.357827	Development Environments) have	-0.124939
-0.357827	Microprocessor designers have	-0.124939
-0.357827	imple- mentations have	-0.124939
-0.357827	in isolation have	-0.124939
-1.423695	performance of this	-0.124939
-1.718016	calculation of this	-0.124939
-0.818085	advantage of this	-0.124939
-0.874207	name of this	-0.124939
-1.504136	cost of this	-0.124939
-1.423695	end of this	-0.124939
-1.311315	costs of this	-0.124939
-0.921225	discussion of this	-0.124939
-1.411490	implementations of this	-0.124939
-1.311315	explanation of this	-0.124939
-1.378262	care of this	-0.124939
-1.148156	purpose of this	-0.124939
-0.341838	scope of this	-0.301030
-1.503458	a to this	-0.124939
-0.500360	solution to this	-0.425969
-0.377012	Add to this	-0.425969
-0.594106	solutions to this	-0.124939
-0.124048	appendix to this	-0.602060
-0.594106	conclusion to this	-0.124939
-1.062152	used and this	-0.124939
-1.181559	integer and this	-0.124939
-0.891663	integer, and this	-0.124939
-0.596309	accessed, and this	-0.124939
-0.596309	infinity, and this	-0.124939
-1.189979	and in this	-0.124939
-1.688904	code in this	-0.124939
-1.423112	data in this	-0.124939
-1.445251	loop in this	-0.124939
-1.166148	but in this	-0.124939
-0.571524	float in this	-0.124939
-0.844025	address in this	-0.124939
-0.571524	libraries in this	-0.124939
-0.571524	0 in this	-0.124939
-0.571524	Windows in this	-0.124939
-0.993346	solution in this	-0.124939
-0.844025	handling in this	-0.124939
-0.993346	examples in this	-0.124939
-1.189979	needed in this	-0.124939
-0.844025	statement in this	-0.124939
-1.294376	columns in this	-0.124939
-1.421420	described in this	-0.124939
-1.215194	occur in this	-0.124939
-0.844025	message in this	-0.124939
-0.571524	define in this	-0.124939
-0.571524	exceptions in this	-0.124939
-0.571524	measured in this	-0.124939
-0.571524	reduction in this	-0.124939
-1.020116	illustrated in this	-0.124939
-0.571524	MultiplyBy in this	-0.124939
-0.571524	0] in this	-0.124939
-0.571524	formulas in this	-0.124939
-0.571524	1.2 in this	-0.124939
-0.571524	volumes in this	-0.124939
-0.600730	parameters). The this	-0.124939
-1.410781	function for this	-0.124939
-1.563290	code for this	-0.124939
-0.496208	reason for this	-0.124939
-0.587257	reasons for this	-0.124939
-0.587257	mispredicted for this	-0.124939
-1.036205	designed for this	-0.124939
-0.587257	basis for this	-0.124939
-0.587257	125 for this	-0.124939
-0.587257	doubled for this	-0.124939
-1.475371	compiler that this	-0.124939
-1.283764	certain that this	-0.124939
-0.894371	note that this	-0.124939
-0.900224	0 // this	-0.124939
-0.600860	tell it this	-0.124939
-1.753886	or if this	-0.124939
-1.422656	loop if this	-0.124939
-0.589272	automatically if this	-0.124939
-0.589272	know if this	-0.124939
-0.589272	tables if this	-0.124939
-1.232570	processors with this	-0.124939
-1.232570	problem with this	-0.124939
-1.161764	AND'ed with this	-0.124939
-0.874399	dealing with this	-0.124939
-0.587477	disagree with this	-0.124939
-0.590183	more on this	-0.124939
-0.592533	turn on this	-0.425969
-0.590183	measurements on this	-0.124939
-1.437360	long as this	-0.124939
-1.462868	not have this	-0.124939
-1.808387	you have this	-0.124939
-1.859229	to use this	-0.124939
-0.883019	can use this	-0.301030
-1.309360	then use this	-0.124939
-0.970345	should use this	-0.124939
-0.970345	always use this	-0.124939
-0.827591	normally use this	-0.124939
-0.587518	elements then this	-0.124939
-0.587518	stride then this	-0.124939
-0.874477	cycles, then this	-0.124939
-0.592867	clear from this	-0.124939
-0.592867	learn from this	-0.124939
-1.825029	known at this	-0.124939
-2.133552	to make this	-0.124939
-1.503611	and make this	-0.124939
-1.472365	can make this	-0.124939
-1.193571	not make this	-0.124939
-1.177101	function because this	-0.124939
-0.955252	pointers because this	-0.124939
-0.556706	0 because this	-0.124939
-0.556706	processors because this	-0.124939
-0.556706	position-independent because this	-0.124939
-0.556706	tasks because this	-0.124939
-0.556706	0x80000000; because this	-0.124939
-0.591186	size. If this	-0.124939
-0.591186	address. If this	-0.124939
-1.285653	on which this	-0.124939
-0.729744	code, but this	-0.124939
-0.506795	course, but this	-0.124939
-0.506795	library, but this	-0.124939
-0.729744	pointer, but this	-0.124939
-0.506795	occurs, but this	-0.124939
-0.506795	metaprogramming, but this	-0.124939
-0.506795	unit, but this	-0.124939
-0.506795	factorials, but this	-0.124939
-0.506795	2-20, but this	-0.124939
-0.506795	-mcmodel=large, but this	-0.124939
-0.506795	block, but this	-0.124939
-0.506795	-ftrapv, but this	-0.124939
-0.506795	symbols, but this	-0.124939
-1.234306	to do this	-0.124939
-1.070181	can do this	-0.124939
-0.939677	will do this	-0.124939
-0.895506	am using this	-0.124939
-0.317506	} In this	-0.425969
-0.653490	code. In this	-0.124939
-0.459570	b; In this	-0.124939
-0.459570	resources. In this	-0.124939
-0.653490	speed. In this	-0.124939
-0.459570	chains. In this	-0.124939
-0.459570	55 In this	-0.124939
-0.459570	elsewhere. In this	-0.124939
-0.459570	71). In this	-0.124939
-0.459570	cycle? In this	-0.124939
-0.459570	divisor. In this	-0.124939
-0.459570	g(x)); In this	-0.124939
-0.597398	prefetching so this	-0.124939
-0.596372	will call this	-0.124939
-0.597034	systems". For this	-0.124939
-0.596438	preceding example, this	-0.124939
-1.384093	shows how this	-0.124939
-1.083063	know how this	-0.124939
-0.557851	describes how this	-0.124939
-1.703010	to test this	-0.124939
-1.902044	operating system this	-0.124939
-1.732486	some cases this	-0.124939
-1.873645	you want this	-0.124939
-0.568983	don't want this	-0.124939
-0.593742	worried about this	-0.124939
-1.057019	It does this	-0.124939
-0.982226	to avoid this	-0.124939
-0.674998	can avoid this	-0.124939
-0.765752	may avoid this	-0.124939
-0.671294	cannot avoid this	-0.124939
-0.564744	time. But this	-0.124939
-0.564744	today. But this	-0.124939
-1.062757	object through this	-0.124939
-1.251113	that support this	-0.124939
-1.259349	to inline this	-0.124939
-0.883957	to optimize this	-0.124939
-0.509872	will optimize this	-0.124939
-0.475340	used. However, this	-0.124939
-0.475340	processor. However, this	-0.124939
-0.475340	flow. However, this	-0.124939
-0.475340	maintenance. However, this	-0.124939
-0.120831	may replace this	-0.522879
-0.566417	will replace this	-0.124939
-0.588221	implemented like this	-0.124939
-1.050467	are running this	-0.124939
-0.586471	same after this	-0.124939
-0.872056	only read this	-0.124939
-1.587398	example shows this	-0.124939
-1.367614	can improve this	-0.124939
-0.881149	to reduce this	-0.124939
-0.173987	may reduce this	-0.425969
-1.043898	and choose this	-0.124939
-0.514475	work around this	-0.124939
-0.514475	ways around this	-0.124939
-0.925043	compiler supports this	-0.124939
-0.854891	CPU supports this	-0.124939
-0.863839	may change this	-0.124939
-0.579532	80 Unfortunately, this	-0.124939
-0.857376	is outside this	-0.124939
-0.577809	To prevent this	-0.124939
-0.574445	b[i]*c[i], though this	-0.124939
-0.574445	instructions during this	-0.124939
-0.572020	best under this	-0.124939
-0.572418	and expect this	-0.124939
-0.929206	reason why this	-0.124939
-0.473073	explanation why this	-0.124939
-0.462565	variables. Obviously, this	-0.124939
-0.462565	finished. Obviously, this	-0.124939
-1.318327	to implement this	-0.124939
-0.566337	[ecx+eax*4],ebx stores this	-0.124939
-0.561269	Mac systems, this	-0.124939
-1.140535	operating system, this	-0.124939
-1.043625	can eliminate this	-0.124939
-0.548827	Of course, this	-0.124939
-0.548827	am giving this	-0.124939
-1.038887	to overcome this	-0.124939
-0.539733	to 122 this	-0.124939
-0.540140	please install this	-0.124939
-0.539327	ebx,1 adds this	-0.124939
-0.994175	but unfortunately this	-0.124939
-0.525456	edx. Furthermore, this	-0.124939
-0.762236	me explain this	-0.124939
-0.525456	remedies against this	-0.124939
-0.525947	cause overflow, this	-0.124939
-0.761389	by changing this	-0.124939
-0.503277	may skip this	-0.124939
-0.503277	computers. At this	-0.124939
-0.503901	has solved this	-0.124939
-0.106960	to solve this	-0.124939
-0.462055	microprocessor handles this	-0.124939
-0.462055	will conclude this	-0.124939
-0.357684	can subtract this	-0.124939
-0.357684	often underestimate this	-0.124939
-0.357684	have confirmed this	-0.124939
-0.357684	1.2345; Change this	-0.124939
-0.357684	to reflect this	-0.124939
-1.996655	is the time	-0.124939
-1.573919	of the time	-0.279841
-1.834248	to the time	-0.425969
-1.678573	and the time	-0.124939
-2.117838	if the time	-0.124939
-1.593571	with the time	-0.124939
-2.018115	on the time	-0.124939
-1.142390	than the time	-0.425969
-1.457567	have the time	-0.124939
-0.123038	this the time	-0.301030
-0.674325	at the time	-0.380211
-1.547598	only the time	-0.124939
-0.870089	also the time	-0.124939
-1.816087	before the time	-0.124939
-1.616163	calculate the time	-0.124939
-1.155514	read the time	-0.124939
-1.222802	includes the time	-0.124939
-1.222802	stores the time	-0.124939
-1.222802	increase the time	-0.124939
-1.030576	reducing the time	-0.124939
-0.870089	Consider the time	-0.124939
-0.870089	measuring the time	-0.124939
-0.585246	to) the time	-0.124939
-2.392050	function is time	-0.124939
-1.971776	be a time	-0.124939
-2.155249	as a time	-0.124939
-0.556095	at a time	-0.301030
-1.265176	lot of time	-0.124939
-0.852730	waste of time	-0.124939
-0.203196	ahead of time	-0.124939
-1.075708	complicated and time	-0.124939
-1.392236	time. The time	-0.124939
-1.341083	program. The time	-0.124939
-1.191485	below. The time	-0.124939
-1.191485	loop. The time	-0.124939
-0.577829	installation The time	-0.124939
-0.577829	bytes. The time	-0.124939
-1.114385	calculations. The time	-0.124939
-0.577829	prediction. The time	-0.124939
-0.577829	maintain. The time	-0.124939
-0.855909	doubled. The time	-0.124939
-0.577829	input. The time	-0.124939
-0.855909	style. The time	-0.124939
-0.577829	run. The time	-0.124939
-0.577829	virtualization. The time	-0.124939
-0.577829	__rdtsc()). The time	-0.124939
-0.577829	tolerance. The time	-0.124939
-0.577829	costs. The time	-0.124939
-2.887443	can be time	-0.124939
-1.294958	methods are time	-0.124939
-0.600946	resolution if time	-0.124939
-0.600599	resources. This time	-0.124939
-0.594724	during this time	-0.124939
-0.594724	underestimate this time	-0.124939
-1.196029	programs use time	-0.124939
-0.763271	use more time	-0.124939
-0.458287	no more time	-0.425969
-0.174580	takes more time	-0.249877
-0.372579	take more time	-0.425969
-0.800702	much more time	-0.425969
-0.526548	consume more time	-0.124939
-0.526548	40% more time	-0.124939
-0.593114	sum1 from time	-0.124939
-0.593114	sum2 from time	-0.124939
-1.415540	the same time	-0.221849
-2.089033	the CPU time	-0.124939
-0.591546	more CPU time	-0.124939
-0.599549	measurement. If time	-0.124939
-1.856213	only one time	-0.124939
-0.193544	than each time	-0.425969
-0.551583	memory each time	-0.124939
-0.551583	value each time	-0.124939
-0.807396	after each time	-0.124939
-0.551583	updates each time	-0.124939
-0.597791	course also time	-0.124939
-0.597775	obviously takes time	-0.124939
-1.766548	is very time	-0.124939
-0.533746	a long time	-0.124939
-1.313805	as long time	-0.124939
-0.941566	very long time	-0.124939
-0.495933	how long time	-0.124939
-0.495933	too long time	-0.124939
-0.892987	are critical time	-0.124939
-0.856305	the first time	-0.249877
-1.223332	The first time	-0.124939
-0.510825	only first time	-0.124939
-0.889681	because these time	-0.124939
-0.890277	a short time	-0.124939
-0.511227	of its time	-0.124939
-0.519352	The extra time	-0.124939
-0.554183	no extra time	-0.124939
-0.704987	the execution time	-0.124939
-0.516179	their execution time	-0.124939
-0.745538	total execution time	-0.124939
-0.736316	and much time	-0.124939
-0.227854	how much time	-0.301030
-0.395264	at compile time	-0.204120
-1.112864	the calculation time	-0.425969
-0.528678	estimated calculation time	-0.124939
-1.165207	will get time	-0.124939
-0.233244	this every time	-0.124939
-0.233244	example every time	-0.124939
-0.233244	called every time	-0.124939
-0.233244	done every time	-0.124939
-0.233244	list every time	-0.124939
-0.233244	branches every time	-0.124939
-0.101148	block every time	-0.425969
-0.233244	loaded every time	-0.124939
-0.233244	updates every time	-0.124939
-0.233244	misprediction every time	-0.124939
-0.233244	evaluated every time	-0.124939
-0.233244	updated every time	-0.124939
-0.233244	re-allocated every time	-0.124939
-0.233244	re-calculated every time	-0.124939
-1.316300	the next time	-0.124939
-1.146431	The next time	-0.124939
-1.151363	of their time	-0.124939
-0.191036	The development time	-0.124939
-1.151130	The conversion time	-0.124939
-0.189991	as last time	-0.124939
-0.163494	takes longer time	-0.124939
-0.729816	take longer time	-0.124939
-0.303141	making longer time	-0.124939
-0.079603	much longer time	-0.602060
-0.585602	(methods) Each time	-0.124939
-0.585846	The load time	-0.124939
-0.581111	take installation time	-0.124939
-0.108281	the response time	-0.346788
-0.579190	parameter. No time	-0.124939
-0.577960	be particularly time	-0.124939
-1.191043	to save time	-0.124939
-1.291111	the so-called time	-0.124939
-0.574761	core during time	-0.124939
-0.414928	not spend time	-0.124939
-0.414928	never spend time	-0.124939
-0.815867	The measured time	-0.124939
-0.366205	the biggest time	-0.425969
-0.787518	to consume time	-0.124939
-0.539828	details. Development time	-0.124939
-0.525940	at regular time	-0.124939
-0.526311	more reproducible time	-0.124939
-0.504208	zero. Execution time	-0.124939
-0.503738	processor. Extra time	-0.124939
-0.504208	an annoying time	-0.124939
-0.249354	no compile- time	-0.124939
-0.249354	allow compile- time	-0.124939
-0.462473	overall computation time	-0.124939
-0.462473	returns. Every time	-0.124939
-0.658036	// Returns time	-0.124939
-0.658036	develop- ment time	-0.124939
-0.358013	with real time	-0.124939
-0.358013	that saves time	-0.124939
-0.358013	// Read time	-0.124939
-0.358013	are: Coarse time	-0.124939
-0.358013	the exact time	-0.124939
-2.468198	that the use	-0.124939
-1.486890	by the use	-0.602060
-1.694477	with the use	-0.425969
-1.786609	makes the use	-0.124939
-1.539045	prevents the use	-0.124939
-1.066174	Avoid the use	-0.124939
-0.502517	economize the use	-0.425969
-0.894376	Especially the use	-0.124939
-2.731685	in a use	-0.124939
-0.845715	is to use	-0.271067
-1.315508	function to use	-0.124939
-1.288829	than to use	-0.124939
-1.078978	you to use	-0.124939
-1.018307	program to use	-0.124939
-1.443799	has to use	-0.124939
-0.937392	efficient to use	-0.425969
-1.203964	possible to use	-0.301030
-0.547643	version to use	-0.124939
-0.547643	branch to use	-0.124939
-1.451452	way to use	-0.124939
-0.782924	faster to use	-0.124939
-0.903022	how to use	-0.249877
-1.147637	need to use	-0.124939
-0.547643	method to use	-0.124939
-0.547643	cases to use	-0.124939
-1.470384	want to use	-0.124939
-1.075086	necessary to use	-0.124939
-0.400786	advantageous to use	-0.329059
-1.018307	whether to use	-0.124939
-1.559326	likely to use	-0.124939
-0.490986	recommended to use	-0.602060
-0.557994	optimal to use	-0.124939
-0.327881	reason to use	-0.124939
-0.932908	choose to use	-0.124939
-0.192677	inefficient to use	-0.124939
-0.547643	lines to use	-0.124939
-0.932908	safe to use	-0.124939
-1.158160	easier to use	-0.124939
-0.800321	expect to use	-0.124939
-0.547643	reasons to use	-0.124939
-0.357620	preferred to use	-0.124939
-0.471689	safer to use	-0.124939
-0.932908	prefer to use	-0.124939
-0.547643	profitable to use	-0.124939
-0.547643	unwise to use	-0.124939
-0.547643	cumbersome to use	-0.124939
-1.274375	loop and use	-0.124939
-0.892231	i and use	-0.124939
-0.892231	interface and use	-0.124939
-0.596596	off and use	-0.124939
-0.596596	local, and use	-0.124939
-1.462784	program. The use	-0.124939
-1.062806	purposes. The use	-0.124939
-1.062806	vector. The use	-0.124939
-1.062806	threads. The use	-0.124939
-0.600979	matrix for use	-0.124939
-1.868967	code that use	-0.124939
-1.386583	instructions that use	-0.124939
-0.592872	classes that use	-0.124939
-0.884904	modules that use	-0.124939
-0.592872	platforms that use	-0.124939
-1.052180	Applications that use	-0.124939
-1.299646	it can use	-0.425969
-1.306174	function can use	-0.124939
-0.868181	compiler can use	-0.425969
-0.962093	you can use	-0.221849
-1.396208	compilers can use	-0.124939
-1.378544	we can use	-0.124939
-1.135719	You can use	-0.124939
-0.826902	programmer can use	-0.124939
-0.562297	interface can use	-0.124939
-0.969388	union can use	-0.124939
-1.066578	code or use	-0.124939
-0.597818	directly, or use	-0.124939
-0.898371	will not use	-0.124939
-1.239664	do not use	-0.124939
-1.577881	does not use	-0.124939
-0.347591	Do not use	-0.204120
-1.050630	you may use	-0.124939
-0.843086	You may use	-0.221849
-0.565273	framework may use	-0.124939
-0.595790	method you use	-0.124939
-0.890640	whether you use	-0.124939
-1.005024	this will use	-0.124939
-1.153302	loop will use	-0.124939
-1.102775	compilers will use	-0.124939
-0.575908	library will use	-0.124939
-0.860585	and then use	-0.124939
-0.958722	can then use	-0.124939
-0.558089	element then use	-0.124939
-0.558089	option then use	-0.124939
-0.819193	used, then use	-0.124939
-0.194962	basis then use	-0.425969
-1.655013	can make use	-0.124939
-0.599371	optimizing CPU use	-0.124939
-0.598905	examples all use	-0.124939
-1.487549	code cache use	-0.124939
-1.378404	data cache use	-0.124939
-0.664336	You should use	-0.124939
-0.581030	Software should use	-0.124939
-0.598770	you do use	-0.124939
-0.598356	For example use	-0.124939
-0.588841	best compilers use	-0.124939
-0.588841	(Some compilers use	-0.124939
-1.731358	can also use	-0.124939
-0.598155	to efficient use	-0.124939
-0.597340	avoid any use	-0.124939
-1.283760	if we use	-0.124939
-1.473355	point variables use	-0.124939
-1.448990	You cannot use	-0.124939
-1.217786	they cannot use	-0.124939
-2.186677	For example, use	-0.124939
-0.595910	systems often use	-0.124939
-0.891090	class libraries use	-0.124939
-0.574313	These systems use	-0.124939
-0.574313	Unix-like systems use	-0.124939
-0.941527	to always use	-0.124939
-0.806656	and always use	-0.124939
-0.806656	should always use	-0.124939
-1.476050	vector operations use	-0.124939
-1.349605	integer operations use	-0.124939
-0.594898	libraries available use	-0.124939
-1.081097	Intel CPUs use	-0.124939
-0.837237	AMD CPUs use	-0.124939
-0.836108	functions must use	-0.124939
-0.567279	programs must use	-0.124939
-1.260202	the threads use	-0.124939
-0.885316	to integers use	-0.124939
-1.276168	container classes use	-0.124939
-0.824502	string classes use	-0.124939
-1.599814	as well use	-0.124939
-0.471272	many programs use	-0.124939
-0.471272	common programs use	-0.124939
-0.471272	application programs use	-0.124939
-0.471272	Other programs use	-0.124939
-0.794488	that typically use	-0.124939
-0.794488	will typically use	-0.124939
-1.150130	should never use	-0.124939
-0.586712	make better use	-0.124939
-0.778206	software applications use	-0.124939
-0.535144	several applications use	-0.124939
-1.266922	programming languages use	-0.124939
-0.574623	Many containers use	-0.124939
-0.574070	in full use	-0.124939
-0.473703	the resource use	-0.124939
-0.473703	economize resource use	-0.124939
-0.846443	Some implementations use	-0.124939
-0.569502	some programmers use	-0.124939
-0.566211	overkill. Don't use	-0.124939
-0.825489	the DLL use	-0.124939
-0.557172	applications. Alternatively, use	-0.124939
-0.525940	The explicit use	-0.124939
-0.314134	not normally use	-0.124939
-0.314134	systems normally use	-0.124939
-0.503738	brackets mean use	-0.124939
-0.503738	future. To use	-0.124939
-0.065689	version (May use	-0.425969
-0.462473	Intel CPUs: use	-0.124939
-0.462473	occurs, (2) use	-0.124939
-0.462473	space. Excessive use	-0.124939
-0.462473	Windows DLLs use	-0.124939
-0.462473	Java machines use	-0.124939
-0.462473	as Java, use	-0.124939
-0.358013	2 thenaandbcannot use	-0.124939
-0.358013	stack entries use	-0.124939
-0.358013	float. (Both use	-0.124939
-0.358013	multiplications. Subtractions use	-0.124939
-1.374612	not the more	-0.124939
-0.601122	system, the more	-0.124939
-1.789671	that is more	-0.124939
-1.497140	it is more	-0.346788
-1.781622	code is more	-0.124939
-1.619154	compiler is more	-0.124939
-1.246588	It is more	-0.550907
-1.102490	data is more	-0.124939
-1.652508	program is more	-0.124939
-1.707999	which is more	-0.124939
-1.399780	class is more	-0.124939
-1.574755	there is more	-0.124939
-0.849309	member is more	-0.124939
-0.574337	libraries is more	-0.124939
-1.102490	access is more	-0.124939
-0.574337	operations is more	-0.124939
-0.849309	type is more	-0.124939
-0.849309	Linux is more	-0.124939
-0.574337	calculation is more	-0.124939
-1.545908	solution is more	-0.124939
-0.574337	operators is more	-0.124939
-0.574337	write is more	-0.124939
-1.302431	operand is more	-0.124939
-0.849309	situation is more	-0.124939
-1.000817	transfer is more	-0.124939
-0.849309	blocks is more	-0.124939
-1.000817	latter is more	-0.124939
-0.849309	#if is more	-0.124939
-0.574337	pre-increment is more	-0.124939
-0.574337	*(p++) is more	-0.124939
-0.574337	array[i++] is more	-0.124939
-2.287658	to a more	-0.124939
-1.800903	in a more	-0.124939
-0.598326	Gives a more	-0.124939
-0.593236	more and more	-0.124939
-1.169851	faster and more	-0.124939
-0.593236	bigger and more	-0.124939
-1.053229	smaller and more	-0.124939
-1.169851	clear and more	-0.124939
-0.885617	mode, and more	-0.124939
-0.593236	expensive and more	-0.124939
-0.593236	cheaper and more	-0.124939
-1.629286	accessed in more	-0.124939
-0.897368	But in more	-0.124939
-1.542135	described in more	-0.124939
-1.338365	and for more	-0.124939
-1.175779	register for more	-0.124939
-0.594800	31 for more	-0.124939
-0.594800	119 for more	-0.124939
-0.594800	literature for more	-0.124939
-1.569175	may be more	-0.301030
-1.178168	possibly be more	-0.124939
-1.763811	functions are more	-0.124939
-1.975737	there are more	-0.124939
-0.591520	development are more	-0.124939
-0.591520	switches are more	-0.124939
-1.048298	comparisons are more	-0.124939
-0.591520	12) are more	-0.124939
-0.476561	one or more	-0.249877
-0.091288	two or more	-0.182931
-0.565577	bytes or more	-0.124939
-0.565577	expressions or more	-0.124939
-0.565577	Two or more	-0.124939
-1.595246	makes it more	-0.124939
-1.708500	replaced by more	-0.124939
-0.597467	increased by more	-0.124939
-0.597156	computers with more	-0.124939
-1.064631	satisfied with more	-0.124939
-1.498477	the code more	-0.124939
-1.121301	point code more	-0.124939
-0.859709	source code more	-0.124939
-0.579828	independent code more	-0.124939
-1.322047	will have more	-0.124939
-1.487385	functions have more	-0.124939
-1.043357	typically have more	-0.124939
-1.505310	may use more	-0.124939
-1.174252	programs use more	-0.124939
-0.570303	sets A more	-0.124939
-0.570303	enabled. A more	-0.124939
-0.570303	collection. A more	-0.124939
-0.570303	date. A more	-0.124939
-0.570303	("hidden")))". A more	-0.124939
-0.570303	_mm_free. A more	-0.124939
-0.898262	run at more	-0.124939
-1.975731	the data more	-0.124939
-0.592648	making data more	-0.124939
-2.396132	the program more	-0.124939
-2.511153	to make more	-0.124939
-2.088057	is used more	-0.124939
-0.598816	adding one more	-0.124939
-1.807149	is no more	-0.124939
-1.505749	have no more	-0.124939
-1.240514	takes no more	-0.124939
-1.018666	take no more	-0.124939
-1.520629	to do more	-0.124939
-0.857640	or do more	-0.124939
-1.145261	that takes more	-0.124939
-1.761319	it takes more	-0.124939
-0.543808	list takes more	-0.124939
-1.007069	conversion takes more	-0.124939
-0.543808	DLL takes more	-0.124939
-0.597059	SSE4.1 some more	-0.124939
-0.892307	making software more	-0.124939
-1.183717	array elements more	-0.124939
-0.597076	software. For more	-0.124939
-1.051251	to take more	-0.124939
-0.479330	and take more	-0.124939
-1.276981	can take more	-0.124939
-0.558511	may take more	-0.124939
-0.782579	functions take more	-0.124939
-0.479330	C++ take more	-0.124939
-0.479330	sometimes take more	-0.124939
-1.136414	is often more	-0.124939
-0.595773	detailed optimization more	-0.124939
-0.573527	is even more	-0.124939
-0.847785	An even more	-0.124939
-0.888710	takes up more	-0.124939
-1.723657	function calls more	-0.124939
-0.527293	is much more	-0.124939
-0.381165	a much more	-0.124939
-0.499899	takes much more	-0.425969
-0.381165	take much more	-0.124939
-0.381165	often much more	-0.124939
-0.381165	uses much more	-0.124939
-0.381165	made much more	-0.124939
-0.381165	obtain much more	-0.124939
-1.427971	is therefore more	-0.124939
-1.325117	and therefore more	-0.124939
-0.592384	Multithreading works more	-0.124939
-1.704777	be calculated more	-0.124939
-1.167131	address calculation more	-0.124939
-0.496920	and uses more	-0.124939
-0.496920	it uses more	-0.124939
-0.496920	int uses more	-0.124939
-0.943633	program uses more	-0.124939
-1.579817	to get more	-0.124939
-1.786088	a few more	-0.124939
-1.200412	clock cycles more	-0.425969
-0.589074	CPUs was more	-0.124939
-0.820547	data caching more	-0.124939
-0.720301	makes caching more	-0.124939
-0.587761	program development more	-0.124939
-0.643274	code becomes more	-0.124939
-0.685552	space becomes more	-0.124939
-1.221293	is actually more	-0.124939
-1.553258	to load more	-0.124939
-1.138712	function calling more	-0.124939
-1.332817	be made more	-0.124939
-1.025245	or require more	-0.124939
-0.581779	can go more	-0.124939
-0.861071	have become more	-0.124939
-0.448463	it gives more	-0.124939
-0.448463	often gives more	-0.124939
-0.448463	CPUs" gives more	-0.124939
-0.580667	makes inlining more	-0.124939
-0.580074	also find more	-0.124939
-1.270699	assembly output more	-0.124939
-0.714836	is sometimes more	-0.124939
-0.714836	are sometimes more	-0.124939
-0.574255	is possibly more	-0.124939
-0.504034	a little more	-0.124939
-0.804985	to allocate more	-0.124939
-0.377420	not allocate more	-0.124939
-0.377420	even allocate more	-0.124939
-0.185056	is slightly more	-0.425969
-0.185056	only slightly more	-0.124939
-0.218554	Sum1 slightly more	-0.124939
-0.561470	(rebased) once more	-0.124939
-0.556507	well spend more	-0.124939
-1.104982	point comparisons more	-0.124939
-0.549359	subexpression occurs more	-0.124939
-0.549668	cannot prefetch more	-0.124939
-0.787218	to consume more	-0.124939
-0.566139	are becoming more	-0.124939
-0.358304	therefore becoming more	-0.124939
-0.539545	in ever more	-0.124939
-0.503478	some programs, more	-0.124939
-0.724215	than allocating more	-0.124939
-0.832567	is certainly more	-0.124939
-0.657665	to invest more	-0.124939
-0.462237	makes dynamic_cast more	-0.124939
-0.462237	be achieved more	-0.124939
-0.462237	be cached more	-0.124939
-0.462237	is somewhat more	-0.124939
-0.357827	may sample more	-0.124939
-0.357827	actually implies more	-0.124939
-0.357827	takes 40% more	-0.124939
-1.195724	aware of when	-0.124939
-0.898872	track of when	-0.124939
-1.166250	functions or when	-0.124939
-0.589066	possible or when	-0.124939
-0.246956	mode or when	-0.602060
-0.597362	power function when	-0.124939
-0.893745	pow function when	-0.124939
-0.898778	details on when	-0.124939
-1.595307	position-independent code when	-0.124939
-1.062532	vectorized code when	-0.124939
-0.599426	cache as when	-0.124939
-1.755661	efficient than when	-0.124939
-1.908598	faster than when	-0.124939
-0.590762	index than when	-0.124939
-0.599789	alone compiler when	-0.124939
-0.600500	object x when	-0.124939
-1.654166	The time when	-0.124939
-1.378521	long time when	-0.124939
-1.115740	extra time when	-0.124939
-1.136207	execution time when	-0.124939
-0.578223	ment time when	-0.124939
-1.685051	the memory when	-0.124939
-0.883577	into memory when	-0.124939
-0.598576	big program when	-0.124939
-0.507636	a only when	-0.124939
-1.034971	not only when	-0.124939
-1.063338	used only when	-0.124939
-0.507636	size only when	-0.124939
-0.507636	branch only when	-0.124939
-0.507636	AVX only when	-0.124939
-0.507636	loaded only when	-0.124939
-0.507636	chosen only when	-0.124939
-0.507636	applies only when	-0.124939
-0.227049	mispredicted only when	-0.602060
-0.507636	evaluated only when	-0.124939
-0.507636	services only when	-0.124939
-1.924119	is used when	-0.124939
-1.972415	be used when	-0.124939
-1.021134	also used when	-0.124939
-1.733813	instruction set when	-0.124939
-2.235980	to do when	-0.124939
-0.836434	for example when	-0.124939
-0.597828	default size when	-0.124939
-0.597966	evaluate b when	-0.124939
-0.200777	positive number when	-0.124939
-0.598550	put there when	-0.124939
-1.468732	but also when	-0.124939
-0.821530	is efficient when	-0.124939
-1.433245	more efficient when	-0.124939
-1.486601	less efficient when	-0.124939
-1.365567	not possible when	-0.124939
-1.944230	of 2 when	-0.124939
-1.954496	is faster when	-0.124939
-1.140403	be faster when	-0.124939
-1.701023	is called when	-0.124939
-1.338398	be called when	-0.124939
-0.596392	is critical when	-0.124939
-2.173102	For example, when	-0.124939
-0.595323	comes first when	-0.124939
-0.594905	other libraries when	-0.124939
-1.449790	vector registers when	-0.124939
-1.694424	to test when	-0.124939
-1.512064	32-bit systems when	-0.124939
-1.338422	is useful when	-0.124939
-1.609915	be useful when	-0.124939
-1.375983	are useful when	-0.124939
-1.072436	very useful when	-0.124939
-0.460487	memory even when	-0.124939
-0.460487	objects even when	-0.124939
-0.460487	mechanism even when	-0.124939
-0.460487	needed even when	-0.124939
-0.460487	inlined even when	-0.124939
-0.460487	used, even when	-0.124939
-0.460487	default, even when	-0.124939
-0.460487	space, even when	-0.124939
-1.268176	executable file when	-0.124939
-0.886856	512 bits when	-0.124939
-0.996961	vector operations when	-0.124939
-1.394890	most cases when	-0.124939
-0.593569	inconvenient times when	-0.124939
-1.740803	the stack when	-0.124939
-2.044514	you want when	-0.124939
-0.592658	also work when	-0.124939
-1.568924	is compiled when	-0.124939
-1.050321	is best when	-0.124939
-1.336289	not necessary when	-0.124939
-1.577815	programming language when	-0.124939
-0.592827	++i). But when	-0.124939
-1.272015	the matrix when	-0.124939
-1.399399	a matrix when	-0.124939
-1.038292	double precision when	-0.425969
-0.591135	by line when	-0.124939
-1.603683	be advantageous when	-0.124939
-1.043812	a problem when	-0.124939
-1.286850	this problem when	-0.124939
-1.805100	to calculate when	-0.124939
-1.599400	loop counter when	-0.124939
-0.589253	several files when	-0.124939
-1.757283	memory allocation when	-0.124939
-0.587699	CPU-intensive programs when	-0.124939
-0.876094	cause problems when	-0.124939
-0.541140	goes automatically when	-0.124939
-0.541140	update automatically when	-0.124939
-0.587737	different implementation when	-0.124939
-0.786366	than signed when	-0.124939
-0.539788	Use signed when	-0.124939
-1.162619	a disadvantage when	-0.124939
-0.586716	process running when	-0.124939
-1.734275	the end when	-0.124939
-0.584358	large expressions when	-0.124939
-0.868119	#define directives when	-0.124939
-0.584741	cross-module optimizations when	-0.124939
-0.869116	some microprocessors when	-0.124939
-0.582577	each process when	-0.124939
-0.584291	many advantages when	-0.124939
-0.583148	32 results when	-0.124939
-0.581120	language modules when	-0.124939
-0.315657	is relevant when	-0.124939
-0.455733	be relevant when	-0.124939
-0.447015	allocated dynamically when	-0.425969
-1.662846	be avoided when	-0.124939
-0.578717	are inefficient when	-0.124939
-0.578479	delay comes when	-0.124939
-0.576334	this task when	-0.124939
-1.345240	is obtained when	-0.124939
-0.481569	the counters when	-0.124939
-0.936849	monitor counters when	-0.124939
-0.874692	most efficiently when	-0.124939
-0.291356	less efficiently when	-0.425969
-0.367953	is initialized when	-0.425969
-0.729255	be initialized when	-0.124939
-0.571810	slower, especially when	-0.124939
-1.344668	error message when	-0.124939
-0.571810	library, except when	-0.124939
-0.568898	execute CriticalFunction when	-0.124939
-1.110468	performance penalty when	-0.124939
-0.567849	is invalid when	-0.124939
-0.823711	fine-grained parallelism when	-0.124939
-0.108815	into account when	-0.346788
-0.560130	inefficient, however, when	-0.124939
-0.335335	more fragmented when	-0.124939
-0.472338	becomes fragmented when	-0.124939
-0.472338	become fragmented when	-0.124939
-0.560130	mouse inputs when	-0.124939
-0.999802	is preferred when	-0.124939
-0.560130	space explicitly when	-0.124939
-0.761351	is resolved when	-0.124939
-0.414697	not resolved when	-0.124939
-0.554458	object. Likewise, when	-0.124939
-0.814348	program itself when	-0.124939
-0.554942	more serious when	-0.124939
-1.151030	Visual Studio when	-0.124939
-0.584223	the disadvantages when	-0.124939
-0.413996	these disadvantages when	-0.124939
-0.554458	an update when	-0.124939
-1.040245	garbage collection when	-0.124939
-0.801172	is costly when	-0.124939
-0.548118	than truncation when	-0.124939
-0.547563	This happens when	-0.124939
-0.800178	different places when	-0.124939
-0.038895	keyword static, when	-0.726999
-0.274109	is deallocated when	-0.124939
-0.274109	are deallocated when	-0.124939
-0.274109	automatically deallocated when	-0.124939
-0.548118	is permissible when	-0.124939
-0.538742	less strict when	-0.124939
-0.538742	viable compromise when	-0.124939
-0.538090	is increased when	-0.124939
-0.910067	to understand when	-0.124939
-0.524257	more dramatic when	-0.124939
-0.525047	into force when	-0.124939
-0.759327	is simpler when	-0.124939
-0.721991	is deleted when	-0.124939
-0.502138	motion manually when	-0.124939
-0.829715	option -fno-pic when	-0.124939
-0.502138	class Vec16s when	-0.124939
-0.721991	more readable when	-0.124939
-0.721991	is negligible when	-0.124939
-0.502138	re- allocating when	-0.124939
-0.502138	about Func1 when	-0.124939
-0.248760	function implicitly when	-0.124939
-0.248760	done implicitly when	-0.124939
-0.721991	be evicted when	-0.124939
-0.461019	is freed when	-0.124939
-0.655756	all 0's when	-0.124939
-0.461019	be bypassed when	-0.124939
-0.461019	is achieved when	-0.124939
-0.461019	be careful when	-0.124939
-0.461019	array indices when	-0.124939
-0.461019	requirement. Useful when	-0.124939
-0.461019	the question when	-0.124939
-0.461019	point precisions when	-0.124939
-0.655756	all 1's when	-0.124939
-0.461019	is stronger when	-0.124939
-0.655756	four float's when	-0.124939
-0.356869	loop exits, when	-0.124939
-0.356869	in popularity when	-0.124939
-0.356869	the processor) when	-0.124939
-0.356869	to Eclipse when	-0.124939
-0.356869	systems disappears when	-0.124939
-0.356869	than 33% when	-0.124939
-0.356869	memory released when	-0.124939
-0.356869	and decreased when	-0.124939
-0.356869	const definitions when	-0.124939
-2.113940	value of A	-0.124939
-0.898387	calculation of A	-0.301030
-0.594388	Z = A	-0.124939
-0.887879	x.abc = A	-0.124939
-0.594388	A2 = A	-0.124939
-1.886433	Gnu compiler A	-0.124939
-1.936977	} } A	-0.124939
-0.861703	b; } A	-0.124939
-1.497343	2; } A	-0.124939
-0.861703	1.0f; } A	-0.124939
-1.597258	of data A	-0.124939
-1.587159	and data A	-0.124939
-1.160954	of functions A	-0.124939
-0.590860	Pure functions A	-0.124939
-1.570077	innermost loop A	-0.124939
-0.495735	const double A	-0.425969
-0.594355	critical code. A	-0.124939
-0.766872	the time. A	-0.124939
-0.823277	a time. A	-0.124939
-0.766872	of time. A	-0.124939
-0.494613	same time. A	-0.124939
-0.825660	compile time. A	-0.124939
-0.471445	calculation time. A	-0.124939
-1.175168	Smart pointers A	-0.124939
-0.196747	other function. A	-0.425969
-0.616551	member functions. A	-0.425969
-0.712251	mathematical functions. A	-0.124939
-0.442922	string functions. A	-0.124939
-0.627745	frame functions. A	-0.124939
-0.442922	thread-safe functions. A	-0.124939
-1.047359	= b; A	-0.124939
-0.589187	main memory. A	-0.124939
-1.183371	is used. A	-0.124939
-0.737690	never used. A	-0.124939
-0.737690	longer used. A	-0.124939
-0.587082	Instruction sets A	-0.124939
-1.203746	64-bit systems. A	-0.124939
-0.784830	embedded systems. A	-0.124939
-1.145563	type conversion A	-0.124939
-0.770623	the data. A	-0.124939
-1.013661	of data. A	-0.124939
-0.447233	user data. A	-0.124939
-0.447233	input data. A	-0.124939
-1.716077	instruction set. A	-0.124939
-1.133822	Intel processors. A	-0.124939
-0.581913	Register storage A	-0.124939
-1.147504	are called. A	-0.124939
-1.318742	a pointer. A	-0.124939
-0.576695	unsigned variables. A	-0.124939
-0.576123	functions local A	-0.124939
-0.574190	calls it. A	-0.124939
-0.574190	into registers. A	-0.124939
-1.178003	64-bit mode. A	-0.124939
-0.574190	each object. A	-0.124939
-0.574190	static library. A	-0.124939
-0.574190	of operations. A	-0.124939
-0.573873	careful optimization. A	-0.124939
-0.864037	in performance. A	-0.124939
-0.489105	improved performance. A	-0.124939
-0.572289	static libraries. A	-0.124939
-1.382896	the stack. A	-0.124939
-0.571947	is possible. A	-0.124939
-0.572632	one thread. A	-0.124939
-0.841275	AVX instructions. A	-0.124939
-0.570053	other way. A	-0.124939
-0.323568	predicted well. A	-0.124939
-0.566979	different address. A	-0.124939
-1.069021	and destructors A	-0.124939
-1.316052	is enabled. A	-0.124939
-0.406832	points to. A	-0.124939
-0.847393	memory block. A	-0.124939
-0.448380	next block. A	-0.124939
-0.830309	particularly critical. A	-0.124939
-0.830309	and usability A	-0.124939
-0.559632	output file. A	-0.124939
-0.822927	Pointer arithmetic A	-0.124939
-0.560131	logic devices A	-0.124939
-0.135927	memory space. A	-0.124939
-0.560632	cache space. A	-0.124939
-0.559133	of zero. A	-0.124939
-0.559133	your software. A	-0.124939
-0.559632	into vectors. A	-0.124939
-0.553895	template parameters. A	-0.124939
-0.799719	is important. A	-0.124939
-0.413705	increasingly important. A	-0.124939
-0.811573	Context switches A	-0.124939
-0.583189	hard disk. A	-0.124939
-0.413298	floppy disk. A	-0.124939
-1.038771	this case. A	-0.124939
-0.554459	for polymorphism A	-0.124939
-0.554459	full speed. A	-0.124939
-0.547009	every call. A	-0.124939
-0.424015	branch prediction. A	-0.124939
-1.076828	data members. A	-0.124939
-0.799188	explained above. A	-0.124939
-0.547656	same result. A	-0.124939
-1.122325	dependency chain. A	-0.124939
-0.800345	inconvenient times. A	-0.124939
-0.800345	integer counter. A	-0.124939
-0.547656	other branches. A	-0.124939
-1.057880	CPU cores. A	-0.124939
-0.782421	a profiler. A	-0.124939
-0.538308	7.28 Templates A	-0.124939
-1.035521	time consuming. A	-0.124939
-0.537548	the method. A	-0.124939
-0.908793	development tools. A	-0.124939
-0.783758	one operation. A	-0.124939
-0.991277	unroll factor. A	-0.124939
-0.782421	each process. A	-0.124939
-0.537548	not necessary. A	-0.124939
-0.782421	Pointer elimination A	-0.124939
-0.538308	finding elements. A	-0.124939
-0.538308	not doubled. A	-0.124939
-0.537548	mentioned here: A	-0.124939
-0.876984	an exception. A	-0.124939
-0.524653	need initialization. A	-0.124939
-0.758426	3.10 Graphics A	-0.124939
-0.524653	simple index. A	-0.124939
-0.524653	few lines. A	-0.124939
-0.523732	hardware conditions. A	-0.124939
-0.876984	an example. A	-0.124939
-0.128930	very expensive. A	-0.124939
-0.523732	two versions. A	-0.124939
-0.523732	distinct tasks. A	-0.124939
-0.951127	user interface. A	-0.124939
-0.758426	4 floats A	-0.124939
-0.760008	data sequentially A	-0.124939
-0.828656	dependency chains. A	-0.124939
-0.828656	code motion A	-0.124939
-0.828656	garbage collection. A	-0.124939
-0.501640	of int. A	-0.124939
-0.501640	const reference. A	-0.124939
-0.828656	error prone. A	-0.124939
-0.655046	7.24 Unions A	-0.124939
-0.460565	constant subexpression. A	-0.124939
-0.460565	bits differently. A	-0.124939
-0.655046	of course. A	-0.124939
-0.460565	avoided. 37 A	-0.124939
-0.655046	to date. A	-0.124939
-0.655046	page 78). A	-0.124939
-0.142657	and debugging. A	-0.124939
-0.142657	with debugging. A	-0.124939
-0.460565	to load. A	-0.124939
-0.655046	(see below). A	-0.124939
-0.655046	fast enough. A	-0.124939
-0.460565	works correctly. A	-0.124939
-0.655046	Codeplay VectorC A	-0.124939
-0.460565	static linking. A	-0.124939
-0.655046	mutually incompatible. A	-0.124939
-0.655046	monitor counters. A	-0.124939
-0.655046	VIA CPUs". A	-0.124939
-0.460565	n! 117 A	-0.124939
-0.356512	memory blocks. A	-0.124939
-0.356512	do so). A	-0.124939
-0.356512	be moved. A	-0.124939
-0.356512	function body. A	-0.124939
-0.356512	often mispredicted. A	-0.124939
-0.356512	object owns. A	-0.124939
-0.356512	good investment. A	-0.124939
-0.356512	page 153. A	-0.124939
-0.356512	other constructors. A	-0.124939
-0.356512	page 107. A	-0.124939
-0.356512	jump targets. A	-0.124939
-0.356512	example 7.32b. A	-0.124939
-0.356512	a constructor. A	-0.124939
-0.356512	first sub-vector. A	-0.124939
-0.356512	microprocessor microarchitecture. A	-0.124939
-0.356512	"__attribute__((visibility ("hidden")))". A	-0.124939
-0.356512	device driver. A	-0.124939
-0.356512	(See Sutter: A	-0.124939
-0.356512	Classes (MFC). A	-0.124939
-0.356512	without restrictions. A	-0.124939
-0.356512	0, sizeof(list)); A	-0.124939
-0.356512	ASCII form. A	-0.124939
-0.356512	was developed. A	-0.124939
-0.356512	"Inner Loops: A	-0.124939
-0.356512	millisecond resolution. A	-0.124939
-0.356512	NAN (Not A	-0.124939
-0.356512	extra complications. A	-0.124939
-0.356512	linked lists. A	-0.124939
-0.356512	0.5ns. 2GHz A	-0.124939
-0.356512	Library (WTL). A	-0.124939
-0.356512	0x20; 46 A	-0.124939
-0.356512	be considered. A	-0.124939
-0.356512	in scope. A	-0.124939
-0.356512	give infinity. A	-0.124939
-0.356512	and _mm_free. A	-0.124939
-0.356512	is taken. A	-0.124939
-0.356512	it says. A	-0.124939
-0.356512	details (www.agner.org/optimize/testp.zip). A	-0.124939
-0.356512	it changes. A	-0.124939
-0.356512	for Basic. A	-0.124939
-0.356512	have exploited. A	-0.124939
-0.356512	is servicing. A	-0.124939
-0.356512	is inferior. A	-0.124939
-0.356512	different types. A	-0.124939
-0.356512	a destructor. A	-0.124939
-0.356512	this reason. A	-0.124939
-0.356512	instruction set?". A	-0.124939
-0.356512	specific interval. A	-0.124939
-0.356512	polynomial. Scheduling A	-0.124939
-0.356512	} 138 A	-0.124939
-0.356512	structure needed? A	-0.124939
-0.356512	increments seconds. A	-0.124939
-2.566985	of a will	-0.124939
-2.000885	that a will	-0.124939
-1.494400	memory and will	-0.124939
-0.600553	x.f; // will	-0.124939
-1.040177	and it will	-0.124939
-1.921905	that it will	-0.124939
-0.958999	then it will	-0.221849
-1.680941	but it will	-0.124939
-0.982151	call it will	-0.124939
-0.982151	case it will	-0.124939
-1.492008	Therefore, it will	-0.124939
-0.567252	created it will	-0.124939
-0.567252	all, it will	-0.124939
-1.323362	library function will	-0.124939
-1.490661	virtual function will	-0.124939
-1.165406	dispatcher function will	-0.124939
-0.592053	DelayFiveSeconds function will	-0.124939
-1.575092	the code will	-0.124939
-1.697201	The code will	-0.124939
-1.087088	This code will	-0.124939
-1.159079	optimized code will	-0.124939
-0.575994	above code will	-0.124939
-0.569715	resulting code will	-0.124939
-0.569715	resultant code will	-0.124939
-1.163146	function. This will	-0.124939
-0.828699	definition. This will	-0.124939
-0.563273	reference. This will	-0.124939
-0.828699	inline. This will	-0.124939
-0.563273	changed. This will	-0.124939
-0.563273	slices. This will	-0.124939
-0.563273	-axAVX. This will	-0.124939
-0.563273	functionality. This will	-0.124939
-0.563273	(4096). This will	-0.124939
-0.563273	87. This will	-0.124939
-0.563273	-fpic. This will	-0.124939
-1.456044	the compiler will	-0.124939
-0.785748	The compiler will	-0.124939
-0.534754	which compiler will	-0.124939
-1.038799	Gnu compiler will	-0.124939
-0.476964	good compiler will	-0.124939
-0.823490	optimizing compiler will	-0.124939
-1.189623	as you will	-0.124939
-0.829871	compiler you will	-0.124939
-1.137106	then you will	-0.124939
-0.973515	because you will	-0.124939
-0.973515	so you will	-0.124939
-0.829871	program, you will	-0.124939
-0.563909	compiler, you will	-0.124939
-0.591783	and this will	-0.124939
-1.633389	but this will	-0.124939
-0.581323	// It will	-0.124939
-1.038653	memory. It will	-0.124939
-0.581323	system. It will	-0.124939
-0.581323	positive. It will	-0.124939
-1.737292	of memory will	-0.124939
-0.592712	No memory will	-0.124939
-1.593128	the program will	-0.124939
-1.482348	The program will	-0.124939
-0.995042	application program will	-0.124939
-0.845226	entire program will	-0.124939
-1.660660	The CPU will	-0.124939
-1.948095	the loop will	-0.124939
-1.486961	The loop will	-0.124939
-0.851446	this loop will	-0.124939
-0.575471	whole loop will	-0.124939
-0.849646	i which will	-0.124939
-0.574516	optimization, which will	-0.124939
-0.574516	-56 which will	-0.124939
-0.574516	a[] which will	-0.124939
-0.599020	variables, but will	-0.124939
-1.174526	the cache will	-0.124939
-0.598395	negative integer will	-0.124939
-0.895444	same class will	-0.124939
-1.056162	the compilers will	-0.124939
-0.879778	The compilers will	-0.124939
-0.841538	other compilers will	-0.124939
-0.731189	most compilers will	-0.124939
-0.643274	some compilers will	-0.124939
-0.617061	Some compilers will	-0.124939
-0.643274	good compilers will	-0.124939
-0.731189	optimizing compilers will	-0.124939
-0.363619	Most compilers will	-0.124939
-0.453005	Optimizing compilers will	-0.124939
-0.643274	future compilers will	-0.124939
-1.161233	of b will	-0.124939
-1.666121	and b will	-0.124939
-1.523332	class library will	-0.124939
-1.530929	of i will	-0.124939
-1.370051	and there will	-0.124939
-0.598474	} There will	-0.124939
-1.065443	linear array will	-0.124939
-0.894451	this value will	-0.124939
-1.482934	and objects will	-0.124939
-1.252467	then we will	-0.124939
-0.814509	But we will	-0.124939
-0.555515	four, we will	-0.124939
-0.555515	Thus, we will	-0.124939
-1.282408	do so will	-0.124939
-0.597030	which variables will	-0.124939
-0.597777	me. You will	-0.124939
-0.910652	a branch will	-0.124939
-1.188216	eight elements will	-0.124939
-1.339948	64-bit systems will	-0.124939
-1.630305	the user will	-0.124939
-1.101903	end user will	-0.124939
-0.594627	code 16 will	-0.124939
-0.844487	The file will	-0.124939
-1.246300	header file will	-0.124939
-1.639955	of programming will	-0.124939
-0.594688	Future processors will	-0.124939
-0.545281	not. I will	-0.124939
-0.545281	Instead, I will	-0.124939
-0.545281	apart. I will	-0.124939
-1.482517	point calculations will	-0.124939
-1.569531	the result will	-0.124939
-1.094371	The result will	-0.124939
-0.787582	final result will	-0.124939
-0.885742	a processor will	-0.124939
-1.258229	the threads will	-0.124939
-0.592866	preferred language will	-0.124939
-1.514307	the speed will	-0.124939
-0.984406	each thread will	-0.124939
-0.984406	another thread will	-0.124939
-0.984406	Each thread will	-0.124939
-1.051634	integer overflow will	-0.124939
-1.686693	cache line will	-0.124939
-0.592469	my manual will	-0.124939
-0.883193	& b; will	-0.124939
-0.880911	overloaded operators will	-0.124939
-0.588421	[] operator will	-0.124939
-0.588146	this multiplication will	-0.124939
-0.588664	Code caching will	-0.124939
-1.148044	processor model will	-0.124939
-0.588630	Here, y will	-0.124939
-0.872958	above examples will	-0.124939
-0.583959	such feature will	-0.124939
-0.581735	same core will	-0.124939
-1.360783	code section will	-0.124939
-1.287548	cache contentions will	-0.124939
-0.342773	in main will	-0.124939
-0.503388	This operation will	-0.124939
-0.503388	& operation will	-0.124939
-0.578613	whether vectorization will	-0.124939
-0.577673	only constants will	-0.124939
-0.851470	A macro will	-0.124939
-0.573738	The counters will	-0.124939
-0.844325	overflow condition will	-0.124939
-0.993770	of cores will	-0.124939
-0.568996	that F1 will	-0.124939
-1.387871	critical stride will	-0.124939
-1.140059	The trick will	-0.124939
-0.556544	template instances will	-0.124939
-0.555562	to 127 will	-0.124939
-0.556216	The linker will	-0.124939
-0.802798	the break will	-0.124939
-0.802124	many users will	-0.124939
-0.539153	cache. Compilers will	-0.124939
-0.539594	Number 18 will	-0.124939
-0.880499	the loader will	-0.124939
-0.955300	heap manager will	-0.124939
-0.525821	Number 17 will	-0.124939
-0.241538	address 0x2710 will	-0.425969
-0.723616	example 14.28 will	-0.124939
-0.503117	and 14.30 will	-0.124939
-0.831799	to 0x273F will	-0.124939
-0.503794	constant 3.5 will	-0.124939
-0.461909	of c+b will	-0.124939
-0.461909	are disabled will	-0.124939
-0.461909	new today will	-0.124939
-0.461909	static modifier will	-0.124939
-0.461909	of b+c will	-0.124939
-0.357569	= OneOrTwo5[b!=0]; will	-0.124939
-0.357569	page 103) will	-0.124939
-0.357569	the producer will	-0.124939
-0.357569	(bitwise and) will	-0.124939
-0.357569	= b++; will	-0.124939
-0.357569	example, b*2.0/3.0 will	-0.124939
-1.699000	function. The }	-0.124939
-1.545941	virtual function }	-0.124939
-0.499116	} } }	-0.263241
-0.406005	elements } }	-0.124939
-1.016283	i; } }	-0.124939
-0.327816	... } }	-0.301030
-0.602263	1; } }	-0.124939
-0.662791	2; } }	-0.124939
-0.406005	range } }	-0.124939
-0.406005	5 } }	-0.124939
-0.290933	a); } }	-0.425969
-0.157191	F2(b); } }	-0.124939
-0.157191	a.store(aa+i); } }	-0.124939
-0.406005	Induction++; } }	-0.124939
-0.157191	swapd(a[r2][c2],a[c2][r2]); } }	-0.425969
-0.406005	i/2; } }	-0.124939
-0.406005	c[i]); } }	-0.124939
-0.406005	b[r][c]; } }	-0.124939
-0.406005	CFALSE; } }	-0.124939
-0.406005	DTRUE; } }	-0.124939
-0.406005	transpose(matrix); } }	-0.124939
-0.406005	b[r][c]); } }	-0.124939
-0.406005	i++; } }	-0.124939
-0.892365	swap elements }	-0.124939
-1.547615	= 0; }	-0.124939
-0.356395	return 0; }	-0.425969
-1.329109	array element }	-0.124939
-0.150053	= i; }	-0.124939
-0.594874	int i; }	-0.550907
-0.958426	= b; }	-0.124939
-0.557971	: b; }	-0.124939
-0.589974	// ... }	-0.124939
-0.775288	{ ... }	-0.124939
-0.417874	x; ... }	-0.124939
-0.417874	Func1(2); ... }	-0.124939
-0.417874	log(2.0); ... }	-0.124939
-0.417874	FactorialTable[b]; ... }	-0.124939
-0.588369	index operator }	-0.124939
-0.436340	= 1; }	-0.124939
-0.099389	- 1; }	-0.425969
-0.157736	+ 1; }	-0.204120
-0.099389	<< 1; }	-0.425969
-0.228491	>>= 1; }	-0.124939
-0.588087	for multiplication }	-0.124939
-0.540087	= c; }	-0.124939
-0.540087	return c; }	-0.124939
-1.034607	is zero }	-0.124939
-0.872028	Table lookup }	-0.124939
-0.584775	return x; }	-0.124939
-0.198423	= 2; }	-0.124939
-0.028940	+ 2; }	-0.221849
-0.295333	* 2; }	-0.425969
-0.076323	<< 2; }	-0.425969
-1.213452	of range }	-0.124939
-1.019920	to 5 }	-0.124939
-0.862698	both positive }	-0.124939
-0.576748	to exponent }	-0.124939
-0.845063	= temp; }	-0.124939
-0.572300	return f; }	-0.124939
-0.218361	+ 3; }	-0.124939
-0.309876	* 3; }	-0.124939
-0.218361	/ 3; }	-0.124939
-0.218361	% 3; }	-0.124939
-0.565687	return y; }	-0.124939
-0.565955	n factorial }	-0.124939
-0.080547	*)d, x); }	-0.124939
-0.548583	return 1.0; }	-0.124939
-0.185058	+ 1.; }	-0.124939
-0.539542	is nonzero }	-0.124939
-0.539088	+ C; }	-0.124939
-0.526324	printf("Delta"); break; }	-0.124939
-0.525224	range"; 134 }	-0.124939
-0.310575	= 2.0; }	-0.124939
-0.241337	+= 1.0f; }	-0.124939
-0.880356	+= i_div_3; }	-0.124939
-0.525224	#undef N1 }	-0.124939
-0.042631	_mm_loadu_si128((__m128i const*)p); }	-0.425969
-0.089908	_mm_load_si128((__m128i const*)p); }	-0.124939
-0.503057	variable Z }	-0.124939
-0.027951	i, a); }	-0.301030
-0.461855	&Object2; p->Hello(); }	-0.124939
-0.657066	= cos(x); }	-0.124939
-0.461855	to C1::f }	-0.124939
-0.461855	return sum; }	-0.124939
-0.461855	+ 2.0f; }	-0.124939
-0.657066	of range"; }	-0.124939
-0.461855	return clock; }	-0.124939
-0.461855	or -0 }	-0.124939
-0.142966	{ F2(b); }	-0.124939
-0.142966	b[1000]; F2(b); }	-0.124939
-0.461855	} FuncC(i); }	-0.124939
-0.657066	{ FuncA(i); }	-0.124939
-0.461855	(*CriticalFunction)(parm1, parm2); }	-0.124939
-0.461855	four x^n }	-0.124939
-0.142966	{ F1(a); }	-0.124939
-0.142966	a[1000]; F1(a); }	-0.124939
-0.142966	= &CriticalFunction_386; }	-0.124939
-0.142966	return &CriticalFunction_386; }	-0.124939
-0.065627	{ DoThisThreeTimesAWeek(); }	-0.124939
-0.657066	= sin(x); }	-0.124939
-0.142966	c); a.store(aa+i); }	-0.124939
-0.142966	aa: a.store(aa+i); }	-0.124939
-0.142966	= &CriticalFunction_SSE2; }	-0.124939
-0.142966	return &CriticalFunction_SSE2; }	-0.124939
-0.461855	Induction; Induction++; }	-0.124939
-0.065627	bb, cc); }	-0.124939
-0.065627	{ swapd(a[r2][c2],a[c2][r2]); }	-0.425969
-0.461855	// EMMS }	-0.124939
-0.657066	0, sizeof(a)); }	-0.124939
-0.142966	= &CriticalFunction_AVX; }	-0.124939
-0.142966	return &CriticalFunction_AVX; }	-0.124939
-0.461855	} 109 }	-0.124939
-0.357526	return pow(x,10); }	-0.124939
-0.357526	four sums }	-0.124939
-0.357526	(time before) }	-0.124939
-0.357526	* powN<true,N/2>::p(x); }	-0.124939
-0.357526	+ list[j].c; }	-0.124939
-0.357526	* cc[i]); }	-0.124939
-0.357526	return x10; }	-0.124939
-0.357526	+ i/2; }	-0.124939
-0.357526	> abs(v.f) }	-0.124939
-0.357526	* c[i]); }	-0.124939
-0.357526	{ FuncB(i); }	-0.124939
-0.357526	return IntegerPower<10>(x); }	-0.124939
-0.357526	function: (static_cast<MyChild*>(this))->Disp(); }	-0.124939
-0.357526	= b[r][c]; }	-0.124939
-0.357526	range printf(Greek[n]); }	-0.124939
-0.357526	goto CFALSE; }	-0.124939
-0.357526	goto DTRUE; }	-0.124939
-0.357526	= Func(a[i]); }	-0.124939
-0.357526	i, timediff[i]); }	-0.124939
-0.357526	matrix[SIZE][SIZE]; transpose(matrix); }	-0.124939
-0.357526	{ F1(); }	-0.124939
-0.357526	supported"); return; }	-0.124939
-0.357526	= Func(ab[i].a); }	-0.124939
-0.357526	1; 69 }	-0.124939
-0.357526	return list[x]; }	-0.124939
-0.357526	return N; }	-0.124939
-0.357526	StoreNTD(&a[c][r], b[r][c]); }	-0.124939
-0.357526	return _mm_cvtss_f32(s); }	-0.124939
-0.357526	Func1(list, &list[8]); }	-0.124939
-0.357526	int i[2]; }	-0.124939
-0.357526	& N-1)==0,N>::p(x); }	-0.124939
-0.357526	temp; 104 }	-0.124939
-0.357526	FuncB(i+1); FuncC(i+1); }	-0.124939
-0.357526	+= a[i+3]; }	-0.124939
-0.357526	for-loop: i++; }	-0.124939
-0.357526	return *(T*)0; }	-0.124939
-0.357526	} 111 }	-0.124939
-0.357526	+= 9; }	-0.124939
-1.660155	pointer is then	-0.124939
-0.899179	ebx is then	-0.124939
-0.589113	A and then	-0.124939
-0.877572	constant and then	-0.124939
-1.041445	structure and then	-0.124939
-0.589113	model and then	-0.124939
-1.053663	zero and then	-0.124939
-0.589113	string and then	-0.124939
-0.877572	message and then	-0.124939
-0.589113	added and then	-0.124939
-0.589113	PC and then	-0.124939
-0.589113	calculations, and then	-0.124939
-0.589113	a+b=0, and then	-0.124939
-1.061791	values are then	-0.124939
-1.181084	files are then	-0.124939
-0.596185	members are then	-0.124939
-1.708686	code can then	-0.124939
-1.943077	compiler can then	-0.124939
-1.278020	thread can then	-0.124939
-1.196527	dispatched function then	-0.124939
-2.356481	the code then	-0.124939
-1.265412	a code then	-0.124939
-1.879258	of code then	-0.124939
-0.876637	CPU time then	-0.124939
-0.588632	short time then	-0.124939
-1.562641	compile time then	-0.124939
-1.740844	or more then	-0.124939
-0.896802	operation will then	-0.124939
-1.533093	in memory then	-0.124939
-2.187643	the program then	-0.124939
-1.891561	a program then	-0.124939
-1.364955	library functions then	-0.124939
-1.573417	member functions then	-0.124939
-1.184564	virtual functions then	-0.124939
-0.852699	frame functions then	-0.124939
-1.783535	the other then	-0.124939
-0.598528	big loop then	-0.124939
-1.357597	function which then	-0.124939
-1.856367	the cache then	-0.124939
-0.598686	thread should then	-0.124939
-0.875753	be set then	-0.124939
-2.191338	instruction set then	-0.124939
-1.188682	different compilers then	-0.124939
-1.187034	vector size then	-0.124939
-2.014315	a pointer then	-0.124939
-1.243424	smart pointer then	-0.124939
-1.735317	function library then	-0.124939
-1.062509	by two then	-0.124939
-1.948110	the object then	-0.124939
-0.598017	odd number then	-0.124939
-0.597117	object static then	-0.124939
-1.024003	of 2 then	-0.301030
-1.902825	the performance then	-0.124939
-1.180834	array elements then	-0.124939
-1.191544	above example, then	-0.124939
-0.594614	enough registers then	-0.124939
-1.612480	the case then	-0.124939
-0.876570	is available then	-0.124939
-1.055692	clean up then	-0.124939
-1.673053	an error then	-0.124939
-0.837687	thousand times then	-0.124939
-0.837687	1000 times then	-0.124939
-1.050499	is large then	-0.124939
-0.885606	function must then	-0.124939
-1.050385	of calculations then	-0.124939
-0.885054	program execution then	-0.124939
-1.061833	a result then	-0.124939
-1.166987	128 bytes then	-0.124939
-0.592804	are necessary then	-0.124939
-0.592346	one element then	-0.124939
-0.591110	_endthread(), etc. then	-0.124939
-1.400935	an exception then	-0.124939
-1.338434	is small then	-0.124939
-0.882659	output option then	-0.124939
-1.678157	cache line then	-0.124939
-0.591291	profiler works then	-0.124939
-0.557407	of parameters then	-0.124939
-1.178857	template parameters then	-0.124939
-1.270613	not advantageous then	-0.124939
-1.160837	a problem then	-0.124939
-0.882005	already known then	-0.124939
-1.046002	AVX support then	-0.124939
-0.880064	data structure then	-0.124939
-0.589652	set values then	-0.124939
-1.868679	clock cycles then	-0.124939
-0.589245	b[1], ... then	-0.124939
-0.588724	point counter then	-0.124939
-1.749050	CPU dispatching then	-0.124939
-0.587267	your application then	-0.124939
-0.586857	73) automatically then	-0.124939
-1.646577	exception handling then	-0.124939
-1.158782	these methods then	-0.124939
-0.586571	old block then	-0.124939
-0.871422	is high then	-0.124939
-1.539294	CPU dispatcher then	-0.124939
-0.870746	preceding addition then	-0.124939
-0.582976	64-bit vectors then	-0.124939
-0.581547	innermost function, then	-0.124939
-0.462295	this range then	-0.124939
-0.462295	limited range then	-0.124939
-0.462295	narrow range then	-0.124939
-1.124765	CPU core then	-0.124939
-0.580595	or references then	-0.124939
-1.015882	CPU supports then	-0.124939
-0.508172	as index then	-0.124939
-1.007595	array index then	-0.124939
-1.192460	the code, then	-0.124939
-1.382971	one instance then	-0.124939
-0.577304	want vectorization then	-0.124939
-0.577760	CPU efficiency then	-0.124939
-0.489829	a time, then	-0.124939
-0.489829	any time, then	-0.124939
-0.572897	specific models then	-0.124939
-0.847189	has changed then	-0.124939
-0.606965	to execute then	-0.124939
-0.570680	same resource then	-0.124939
-0.566920	Intel compiler, then	-0.124939
-0.632448	same module then	-0.124939
-0.376125	other module then	-0.124939
-0.376125	separate module then	-0.124939
-0.486138	is used, then	-0.124939
-1.382117	critical stride then	-0.124939
-1.382117	instruction set, then	-0.124939
-0.564415	are independent then	-0.124939
-0.566060	equally near then	-0.124939
-1.256974	dependency chains then	-0.124939
-1.122244	an integer, then	-0.124939
-0.559931	than once then	-0.124939
-0.697839	clock cycles, then	-0.124939
-0.560388	declared volatile then	-0.124939
-0.159397	shared object, then	-0.425969
-0.554748	version changes then	-0.124939
-1.038082	32-bit integers, then	-0.124939
-0.800841	poorly predictable then	-0.124939
-0.932176	the debugger then	-0.124939
-0.142862	version on, then	-0.425969
-0.539265	are swapped then	-0.124939
-0.909557	carry flag then	-0.124939
-1.036134	is true, then	-0.124939
-0.877694	the pipeline then	-0.124939
-0.524889	constant n, then	-0.124939
-0.128978	If not, then	-0.124939
-0.524047	and changing then	-0.124939
-0.524047	non-sequential manner then	-0.124939
-0.501939	four numbers, then	-0.124939
-0.829291	is slow, then	-0.124939
-0.248685	(FIFO) basis then	-0.124939
-0.248685	(FILO) basis then	-0.124939
-0.501939	is elsewhere then	-0.124939
-0.501939	one way, then	-0.124939
-0.501939	cache efficiency, then	-0.124939
-0.501939	big arrays, then	-0.124939
-0.503007	is false, then	-0.124939
-0.248685	first sum, then	-0.124939
-0.248685	second sum, then	-0.124939
-0.248685	of double, then	-0.124939
-0.248685	64-bit double, then	-0.124939
-0.460837	not vacant then	-0.124939
-0.655472	different priorities then	-0.124939
-0.655472	2 GHz then	-0.124939
-0.460837	parameters differ then	-0.124939
-0.142722	calculated first, then	-0.124939
-0.142722	values first, then	-0.124939
-0.655472	(see below) then	-0.124939
-0.460837	has hyperthreading, then	-0.124939
-0.460837	while loops, then	-0.124939
-0.460837	the container, then	-0.124939
-0.460837	AVX support, then	-0.124939
-0.460837	one segment then	-0.124939
-0.460837	acceptable limit, then	-0.124939
-0.356726	μs today, then	-0.124939
-0.356726	and FuncB, then	-0.124939
-0.356726	particular meaning, then	-0.124939
-0.356726	been identified, then	-0.124939
-0.356726	CPU dispatching, then	-0.124939
-0.356726	been found, then	-0.124939
-0.356726	chapter 9.10, then	-0.124939
-0.356726	too fine then	-0.124939
-0.356726	If so, then	-0.124939
-0.356726	the other, then	-0.124939
-0.356726	accessed row-wise, then	-0.124939
-0.356726	to T+5, then	-0.124939
-0.356726	poorly predictable, then	-0.124939
-0.356726	be made) then	-0.124939
-0.356726	same algorithm, then	-0.124939
-0.356726	to ignore, then	-0.124939
-0.356726	and y?" then	-0.124939
-0.356726	is obvious, then	-0.124939
-0.356726	for RTTI then	-0.124939
-0.356726	= 10000, then	-0.124939
-0.356726	writes only, then	-0.124939
-0.356726	too small, then	-0.124939
-0.356726	< 231 then	-0.124939
-0.356726	or C2, then	-0.124939
-0.356726	not met then	-0.124939
-0.356726	= 18, then	-0.124939
-0.356726	then d+e, then	-0.124939
-0.898799	compilers. // It	-0.124939
-0.593045	-0 } It	-0.124939
-0.593045	109 } It	-0.124939
-1.646914	intrinsic functions It	-0.124939
-1.823481	an object It	-0.124939
-0.594585	two libraries It	-0.124939
-1.292442	the code. It	-0.124939
-0.538181	integer code. It	-0.124939
-0.910282	extra code. It	-0.124939
-0.910282	source code. It	-0.124939
-0.520010	long time. It	-0.124939
-0.991451	extra time. It	-0.124939
-0.230258	longer time. It	-0.301030
-1.064510	function pointers It	-0.124939
-1.458894	compiler does It	-0.124939
-0.592697	clearing arrays It	-0.124939
-0.590862	maps etc. It	-0.124939
-0.881200	intrinsic functions. It	-0.124939
-0.881200	mathematical functions. It	-0.124939
-0.525596	unreferenced functions. It	-0.124939
-0.856284	the memory. It	-0.124939
-1.110929	in memory. It	-0.124939
-0.742582	allocated memory. It	-0.124939
-0.697788	is used. It	-0.124939
-0.900730	are used. It	-0.124939
-0.632791	longer used. It	-0.124939
-0.446212	seldom used. It	-0.124939
-1.445260	64-bit systems. It	-0.124939
-0.586294	these data. It	-0.124939
-1.716395	instruction set. It	-0.124939
-1.025528	VIA processors. It	-0.124939
-0.582389	be called. It	-0.124939
-0.863000	Intel CPUs. It	-0.124939
-0.580861	same compiler. It	-0.124939
-0.579646	registers are: It	-0.124939
-1.424605	the loop. It	-0.124939
-0.196387	a pointer. It	-0.249877
-0.502901	'this' pointer. It	-0.124939
-0.579056	best cases. It	-0.124939
-0.576742	induction variables. It	-0.124939
-0.576747	another class. It	-0.124939
-0.851641	System database It	-0.124939
-1.449733	function calls. It	-0.124939
-1.025198	vector registers. It	-0.124939
-1.102128	shared object. It	-0.124939
-0.574230	C library. It	-0.124939
-0.702315	integer calculations. It	-0.124939
-0.878407	mathematical calculations. It	-0.124939
-1.497451	clock cycles. It	-0.124939
-0.574230	input/output operations. It	-0.124939
-0.573916	full optimization. It	-0.124939
-0.574230	improve performance. It	-0.124939
-1.193791	each thread. It	-0.124939
-1.328524	data structures It	-0.124939
-0.570092	one vector. It	-0.124939
-0.566619	or references. It	-0.124939
-1.079586	the CPU. It	-0.124939
-0.567422	in Windows. It	-0.124939
-0.502833	to integers. It	-0.124939
-0.357366	signed integers. It	-0.124939
-0.357366	containing integers. It	-0.124939
-0.448748	it to. It	-0.124939
-0.448748	apply to. It	-0.124939
-0.564182	not critical. It	-0.124939
-0.564626	became available. It	-0.124939
-0.828740	was executed. It	-0.124939
-0.564626	software faster. It	-0.124939
-0.448408	cache problems. It	-0.124939
-0.448408	alignment problems. It	-0.124939
-1.228745	template parameter. It	-0.124939
-1.067720	point expressions. It	-0.124939
-0.822989	previous value. It	-0.124939
-1.202283	operating system. It	-0.124939
-0.821182	the arrays. It	-0.124939
-0.559178	the branch. It	-0.124939
-0.433263	storage space. It	-0.124939
-0.433263	disk space. It	-0.124939
-0.559178	element zero. It	-0.124939
-0.559178	legacy software. It	-0.124939
-0.554498	better solution. It	-0.124939
-0.811654	for Linux. It	-0.124939
-1.100766	assembly language. It	-0.124939
-0.554498	it is. It	-0.124939
-0.555056	code automatically. It	-0.124939
-0.413325	the core. It	-0.124939
-0.413325	same core. It	-0.124939
-1.174373	automatic vectorization. It	-0.124939
-0.553940	true anyway. It	-0.124939
-0.951132	to do. It	-0.124939
-0.547053	precision constant. It	-0.124939
-0.547053	see this. It	-0.124939
-0.389424	time here. It	-0.124939
-0.389424	appropriate here. It	-0.124939
-1.076978	data members. It	-0.124939
-0.547693	response times. It	-0.124939
-0.547693	previous one. It	-0.124939
-0.547693	program structure. It	-0.124939
-0.782497	a profiler. It	-0.124939
-0.989246	exception handling. It	-0.124939
-0.782497	installation tools. It	-0.124939
-0.782497	point numbers. It	-0.124939
-0.537591	= a[i]; It	-0.124939
-0.989246	out-of-order execution. It	-0.124939
-0.537591	approximately so. It	-0.124939
-0.523774	mouse input. It	-0.124939
-0.523774	own IDE. It	-0.124939
-0.523774	dynamic versions. It	-0.124939
-0.523774	number information. It	-0.124939
-0.951239	user interface. It	-0.124939
-0.760062	of unit-testing It	-0.124939
-1.004073	programming style. It	-0.124939
-0.523774	the counts. It	-0.124939
-0.523774	files smaller. It	-0.124939
-0.523774	do manually. It	-0.124939
-0.523774	16-bit programs. It	-0.124939
-0.502833	particular part. It	-0.124939
-0.721230	< 100. It	-0.124939
-0.721230	Cache organization It	-0.124939
-0.248589	this purpose. It	-0.124939
-0.248589	specific purpose. It	-0.124939
-0.721230	accessed sequentially. It	-0.124939
-0.501680	non-sequential manner. It	-0.124939
-0.501680	on x. It	-0.124939
-0.460601	new context. It	-0.124939
-0.460601	are aligned. It	-0.124939
-0.460601	and throw. It	-0.124939
-0.655103	of course. It	-0.124939
-0.655103	page 73). It	-0.124939
-0.655103	has disadvantages: It	-0.124939
-0.460601	a buffer. It	-0.124939
-0.655103	optimized away. It	-0.124939
-0.655103	make utility. It	-0.124939
-0.460601	syntax check. It	-0.124939
-0.655103	data decomposition. It	-0.124939
-0.460601	are lost. It	-0.124939
-0.655103	to read. It	-0.124939
-0.460601	data. 148 It	-0.124939
-0.460601	is started. It	-0.124939
-0.460601	with Gnu. It	-0.124939
-0.460601	dispatcher updated. It	-0.124939
-0.655103	poorly predictable. It	-0.124939
-0.655103	below shows. It	-0.124939
-0.460601	open source. It	-0.124939
-0.460601	more efficiently. It	-0.124939
-0.460601	doing divisions. It	-0.124939
-0.655103	page 72. It	-0.124939
-0.460601	also safer. It	-0.124939
-0.356540	error message. It	-0.124939
-0.356540	final product. It	-0.124939
-0.356540	and off. It	-0.124939
-0.356540	to diagnose. It	-0.124939
-0.356540	the profile. It	-0.124939
-0.356540	Feature bloat. It	-0.124939
-0.356540	contained objects? It	-0.124939
-0.356540	compile-time polymorphism. It	-0.124939
-0.356540	mouse move. It	-0.124939
-0.356540	iterations ahead. It	-0.124939
-0.356540	dispatch strategies It	-0.124939
-0.356540	C-style type-casting. It	-0.124939
-0.356540	for response. It	-0.124939
-0.356540	is happening. It	-0.124939
-0.356540	not standardized. It	-0.124939
-0.356540	quite convenient. It	-0.124939
-0.356540	too high. It	-0.124939
-0.356540	are unavoidable. It	-0.124939
-0.356540	or animation. It	-0.124939
-0.356540	<xmmintrin.h> _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON); It	-0.124939
-0.356540	non-Intel processors). It	-0.124939
-0.356540	programmer can. It	-0.124939
-0.356540	tedious indeed. It	-0.124939
-0.356540	for hackers. It	-0.124939
-0.356540	user friendly. It	-0.124939
-0.356540	page 54. It	-0.124939
-0.356540	the label. It	-0.124939
-0.356540	both positive. It	-0.124939
-0.356540	these considerations. It	-0.124939
-0.356540	and list[i].b. It	-0.124939
-0.356540	mean atomic. It	-0.124939
-0.356540	performs poorly. It	-0.124939
-0.356540	first-in-last-out fashion. It	-0.124939
-0.356540	choose between. It	-0.124939
-0.356540	page 130. It	-0.124939
-0.356540	pointer conversions. It	-0.124939
-0.356540	p. 57). It	-0.124939
-0.356540	for correctness. It	-0.124939
-0.356540	page 61. It	-0.124939
-0.356540	or sizes? It	-0.124939
-0.356540	to a*b*c*2. It	-0.124939
-0.356540	with double's. It	-0.124939
-0.356540	memory pooling. It	-0.124939
-0.356540	many decimals. It	-0.124939
-0.356540	to develop. It	-0.124939
-0.356540	memory leaks. It	-0.124939
-0.356540	in green. It	-0.124939
-0.356540	is costless. It	-0.124939
-0.356540	a queue. It	-0.124939
-0.037104	by // Example	-0.124939
-0.293559	code. // Example	-0.124939
-0.293559	time. // Example	-0.124939
-0.004468	Example: // Example	-0.124939
-0.229798	conversion // Example	-0.124939
-0.416035	compilers. // Example	-0.124939
-0.014463	example: // Example	-0.124939
-0.021880	to: // Example	-0.124939
-0.293559	arrays. // Example	-0.124939
-0.293559	array. // Example	-0.124939
-0.037104	this: // Example	-0.124939
-0.293559	follows: // Example	-0.124939
-0.293559	2: // Example	-0.124939
-0.293559	is. // Example	-0.124939
-0.293559	details. // Example	-0.124939
-0.293559	www.agner.org/optimize/asmlib.zip. // Example	-0.124939
-0.293559	103 // Example	-0.124939
-0.293559	1: // Example	-0.124939
-0.293559	unsigned. // Example	-0.124939
-0.077677	numbers: // Example	-0.124939
-0.122434	calculations: // Example	-0.124939
-0.122434	operations: // Example	-0.124939
-0.293559	overflow: // Example	-0.124939
-0.293559	way: // Example	-0.124939
-0.077677	with: // Example	-0.124939
-0.293559	38 // Example	-0.124939
-0.122434	bit: // Example	-0.124939
-0.122434	case: // Example	-0.124939
-0.293559	loop: // Example	-0.124939
-0.122434	lookup: // Example	-0.124939
-0.293559	parameter: // Example	-0.124939
-0.122434	table: // Example	-0.124939
-0.293559	class: // Example	-0.124939
-0.122434	needed: // Example	-0.124939
-0.122434	available: // Example	-0.124939
-0.293559	function: // Example	-0.124939
-0.293559	SafeArray: // Example	-0.124939
-0.293559	union: // Example	-0.124939
-0.293559	bits: // Example	-0.124939
-0.293559	zero: // Example	-0.124939
-0.293559	point: // Example	-0.124939
-0.293559	enabled: // Example	-0.124939
-0.293559	precision: // Example	-0.124939
-0.293559	integer: // Example	-0.124939
-0.293559	last: // Example	-0.124939
-0.293559	condition: // Example	-0.124939
-0.293559	efficient: // Example	-0.124939
-0.293559	arrays: // Example	-0.124939
-0.293559	underflow: // Example	-0.124939
-0.293559	polymorphism: // Example	-0.124939
-0.293559	Examples: // Example	-0.124939
-0.293559	variable: // Example	-0.124939
-0.293559	double: // Example	-0.124939
-0.293559	reorganize: // Example	-0.124939
-0.293559	improvements). // Example	-0.124939
-0.293559	classes): // Example	-0.124939
-0.293559	polynomial: // Example	-0.124939
-0.293559	InstructionSet(): // Example	-0.124939
-0.293559	structures: // Example	-0.124939
-0.293559	values: // Example	-0.124939
-0.293559	keyword: // Example	-0.124939
-0.293559	comparison: // Example	-0.124939
-0.293559	set: // Example	-0.124939
-0.293559	reciprocal: // Example	-0.124939
-0.293559	(WTL): // Example	-0.124939
-0.293559	counter: // Example	-0.124939
-0.293559	used: // Example	-0.124939
-0.293559	memcpy: // Example	-0.124939
-0.293559	denominator: // Example	-0.124939
-0.293559	two: // Example	-0.124939
-0.293559	have: // Example	-0.124939
-0.293559	memset: // Example	-0.124939
-0.293559	address: // Example	-0.124939
-0.293559	fastest: // Example	-0.124939
-0.293559	alloca: // Example	-0.124939
-0.293559	exponent: // Example	-0.124939
-0.293559	instead: // Example	-0.124939
-0.293559	conversions: // Example	-0.124939
-0.293559	matrix[c][r]. // Example	-0.124939
-0.293559	classes: // Example	-0.124939
-0.293559	a: // Example	-0.124939
-0.293559	integers: // Example	-0.124939
-0.293559	e.g.: // Example	-0.124939
-0.293559	7.22. // Example	-0.124939
-0.293559	search: // Example	-0.124939
-0.293559	9.5b. // Example	-0.124939
-0.293559	false: // Example	-0.124939
-0.293559	square: // Example	-0.124939
-0.293559	interval: // Example	-0.124939
-0.293559	variables: // Example	-0.124939
-0.293559	statement: // Example	-0.124939
-0.293559	template: // Example	-0.124939
-0.293559	capability: // Example	-0.124939
-0.293559	gives: // Example	-0.124939
-0.293559	1.2f; // Example	-0.124939
-2.011916	} } Example	-0.124939
-0.588066	positive } Example	-0.124939
-0.588066	exponent } Example	-0.124939
-0.567894	mode): ; Example	-0.124939
-0.567894	8.26b: ; Example	-0.124939
-0.482933	per element Example	-0.124939
-0.740306	the loop. Example	-0.124939
-0.981766	big-endian storage. Example	-0.124939
-0.835782	at runtime. Example	-0.124939
-0.504982	chosen expression. Example	-0.124939
-0.597296	// or from	-0.124939
-0.597296	optimally, or from	-0.124939
-1.059651	call it from	-0.124939
-0.595452	prevent it from	-0.124939
-0.595452	prevents it from	-0.124939
-1.354609	library function from	-0.124939
-1.066191	single function from	-0.124939
-2.368000	the code from	-0.124939
-0.741175	assembly code from	-0.425969
-1.191091	value than from	-0.124939
-1.981724	faster than from	-0.124939
-1.160521	the compiler from	-0.425969
-0.870994	constant data from	-0.124939
-0.585716	Accessing data from	-0.124939
-0.585716	send data from	-0.124939
-0.480696	integer vector from	-0.726999
-0.245740	called only from	-0.301030
-1.319401	the CPU from	-0.301030
-1.389177	and one from	-0.124939
-0.589828	available, one from	-0.124939
-1.568936	level-2 cache from	-0.124939
-1.309411	level-1 cache from	-0.124939
-0.598604	latest compilers from	-0.124939
-0.598337	convert b from	-0.124939
-1.395863	the value from	-0.124939
-1.049998	a value from	-0.124939
-0.794523	this value from	-0.124939
-1.147211	each value from	-0.124939
-1.704214	the variable from	-0.124939
-1.674511	a variable from	-0.124939
-0.365548	to return from	-0.124939
-0.834275	; return from	-0.124939
-1.765886	the table from	-0.124939
-1.152369	the elements from	-0.124939
-0.338362	consecutive elements from	-0.602060
-0.787961	is called from	-0.124939
-0.399785	be called from	-0.124939
-0.825963	are called from	-0.124939
-0.104646	when called from	-0.124939
-0.843999	also called from	-0.124939
-0.891497	A call from	-0.124939
-1.174107	map file from	-0.124939
-0.772766	is available from	-0.124939
-0.809548	are available from	-0.124939
-0.630107	also available from	-0.124939
-0.444464	always available from	-0.124939
-0.444464	easily available from	-0.124939
-0.444464	Library, available from	-0.124939
-1.095140	is accessed from	-0.124939
-0.613740	be accessed from	-0.124939
-0.519204	when accessed from	-0.124939
-0.593229	0x40 bytes from	-0.124939
-1.337092	two threads from	-0.124939
-0.592617	the integers from	-0.124939
-0.493634	is calculated from	-0.425969
-1.690414	is known from	-0.124939
-0.591434	requires support from	-0.124939
-0.591125	entire list from	-0.124939
-0.590126	subtracting 1 from	-0.124939
-0.937587	object files from	-0.124939
-0.549564	resource files from	-0.124939
-1.166071	not optimal from	-0.124939
-1.625537	instruction sets from	-0.124939
-0.587359	functions separate from	-0.124939
-1.620487	memory block from	-0.124939
-0.882984	The conversion from	-0.124939
-0.496996	A conversion from	-0.124939
-0.496996	Efficient conversion from	-0.124939
-0.587295	steals resources from	-0.124939
-0.586104	subtracting n from	-0.124939
-1.033284	at runtime from	-0.124939
-0.872825	are needed from	-0.124939
-0.976323	is transferred from	-0.124939
-0.532980	or transferred from	-0.124939
-0.471810	to read from	-0.124939
-0.402606	we read from	-0.124939
-0.402606	had read from	-0.124939
-0.402606	99 read from	-0.124939
-0.585348	functions linked from	-0.124939
-0.870286	of microprocessors from	-0.124939
-0.584401	for calling from	-0.124939
-0.583491	9.5a goes from	-0.124939
-0.462787	which range from	-0.124939
-0.172463	address range from	-0.425969
-1.379118	be loaded from	-0.124939
-0.447791	all conversions from	-0.124939
-0.168571	avoid conversions from	-0.124939
-0.579685	for response from	-0.124939
-1.415925	cache lines from	-0.124939
-0.503727	or comes from	-0.124939
-0.503727	feedback comes from	-0.124939
-0.575568	size right from	-0.124939
-0.852727	and writing from	-0.124939
-0.575955	been reduced from	-0.124939
-0.575955	be clear from	-0.124939
-1.102250	more efficiently from	-0.124939
-0.574266	variable names from	-0.124939
-0.571373	restore ebx from	-0.124939
-1.119082	is copied from	-0.124939
-0.377089	to come from	-0.124939
-0.377089	or come from	-0.124939
-0.377089	they come from	-0.124939
-0.434430	integers Conversion from	-0.124939
-0.434430	enabled. Conversion from	-0.124939
-0.561009	a jump from	-0.124939
-0.307885	is far from	-0.124939
-0.307885	values far from	-0.124939
-0.307885	course far from	-0.124939
-0.953826	be saved from	-0.124939
-0.556136	programming manuals from	-0.124939
-0.555792	Reading again from	-0.124939
-0.390112	program reads from	-0.124939
-0.390112	later reads from	-0.124939
-0.025072	can benefit from	-0.124939
-0.051681	will benefit from	-0.124939
-0.025072	could benefit from	-0.124939
-0.073017	to recover from	-0.249877
-0.540434	are generated from	-0.124939
-0.539045	been increased from	-0.124939
-0.539507	ret returns from	-0.124939
-0.539970	it gets from	-0.124939
-0.539507	to sum1 from	-0.124939
-0.525742	when going from	-0.124939
-0.525182	to sum2 from	-0.124939
-0.129149	is prevented from	-0.124939
-0.761882	user interfaces from	-0.124939
-0.880261	the interval from	-0.124939
-1.006469	vector b: from	-0.124939
-0.503017	is removed from	-0.124939
-0.503727	not separated from	-0.124939
-0.503017	details. Inheritance from	-0.124939
-0.249086	purposes. Available from	-0.124939
-0.249086	Agner Available from	-0.124939
-0.503017	answer questions from	-0.124939
-0.503727	when returning from	-0.124939
-0.503017	Use ReadTSC() from	-0.124939
-0.723450	be evicted from	-0.124939
-0.142958	often suffer from	-0.124939
-0.142958	therefore suffer from	-0.124939
-0.461818	no warning from	-0.124939
-0.461818	be fetched from	-0.124939
-0.142958	are accessible from	-0.124939
-0.142958	not accessible from	-0.124939
-0.461818	and recovering from	-0.124939
-0.657009	const restriction from	-0.124939
-0.357498	file timingtest.h from	-0.124939
-0.357498	not referenced from	-0.124939
-0.357498	any transition from	-0.124939
-0.357498	and popped from	-0.124939
-0.357498	may deviate from	-0.124939
-0.357498	can learn from	-0.124939
-0.357498	feasible. Interference from	-0.124939
-0.357498	} 115 from	-0.124939
-0.357498	is re-loaded from	-0.124939
-1.932569	of the memory	-0.221849
-2.278792	and the memory	-0.124939
-2.369597	for the memory	-0.124939
-1.954094	that the memory	-0.124939
-2.239092	by the memory	-0.124939
-2.153593	because the memory	-0.124939
-1.346608	cause the memory	-0.124939
-0.891597	causes the memory	-0.124939
-0.202957	free the memory	-0.124939
-0.891597	Especially the memory	-0.124939
-0.601400	remedy is memory	-0.124939
-2.312338	of a memory	-0.124939
-1.764472	in a memory	-0.124939
-2.101388	as a memory	-0.124939
-1.593963	when a memory	-0.124939
-0.918589	at a memory	-0.602060
-0.887270	holds a memory	-0.124939
-0.594078	managing a memory	-0.124939
-1.578767	part of memory	-0.124939
-1.817387	versions of memory	-0.124939
-0.201177	allocation of memory	-0.124939
-1.037003	block of memory	-0.124939
-1.739795	piece of memory	-0.124939
-1.232849	range of memory	-0.124939
-0.587541	index of memory	-0.124939
-0.591957	amount of memory	-0.124939
-0.587541	copying of memory	-0.124939
-1.354444	risk of memory	-0.124939
-0.587541	swapping of memory	-0.124939
-0.874521	deallocation of memory	-0.124939
-0.587541	de-allocation of memory	-0.124939
-0.587541	amounts of memory	-0.124939
-1.295045	cost to memory	-0.124939
-1.076179	directly to memory	-0.124939
-1.199445	access and memory	-0.124939
-0.592713	x in memory	-0.124939
-1.402315	functions in memory	-0.124939
-1.603585	variable in memory	-0.124939
-1.590171	variables in memory	-0.124939
-2.002618	stored in memory	-0.124939
-0.594567	around in memory	-0.124939
-0.884594	sequentially in memory	-0.124939
-0.592713	distance in memory	-0.124939
-1.067099	libraries. The memory	-0.124939
-0.597995	stack. The memory	-0.124939
-0.597995	(PLT). The memory	-0.124939
-1.076495	likely that memory	-0.124939
-0.600371	container or memory	-0.124939
-0.600940	method if memory	-0.124939
-1.493652	or by memory	-0.124939
-0.600518	calculations with memory	-0.124939
-0.203056	known as memory	-0.124939
-0.900160	purposes. This memory	-0.124939
-0.600183	bottleneck than memory	-0.124939
-1.073150	computers have memory	-0.124939
-1.700874	but this memory	-0.124939
-1.285569	takes more memory	-0.124939
-1.056192	allocate more memory	-0.124939
-1.238868	value from memory	-0.124939
-1.201877	read from memory	-0.124939
-0.580335	loaded from memory	-0.124939
-0.580335	re-loaded from memory	-0.124939
-1.657011	and data memory	-0.124939
-0.373194	use different memory	-0.124939
-1.029125	at different memory	-0.124939
-1.185291	the same memory	-0.249877
-1.575525	in one memory	-0.124939
-0.590253	allocates one memory	-0.124939
-0.587641	load into memory	-0.124939
-1.149113	loaded into memory	-0.124939
-1.343640	with multiple memory	-0.124939
-0.585674	keep multiple memory	-0.124939
-0.326513	in static memory	-0.301030
-1.031464	The static memory	-0.124939
-0.172950	from static memory	-0.726999
-1.678372	most efficient memory	-0.124939
-0.597884	maximum possible memory	-0.124939
-0.583662	always takes memory	-0.124939
-0.583662	never takes memory	-0.124939
-0.597325	destroys any memory	-0.124939
-0.892380	much less memory	-0.124939
-0.596512	cases take memory	-0.124939
-1.232695	a new memory	-0.425969
-0.145160	of dynamic memory	-0.522879
-0.297810	with dynamic memory	-0.124939
-0.037495	use dynamic memory	-0.602060
-0.297810	using dynamic memory	-0.124939
-0.297810	without dynamic memory	-0.124939
-0.078535	avoid dynamic memory	-0.301030
-0.297810	All dynamic memory	-0.124939
-0.297810	Whenever dynamic memory	-0.124939
-1.804772	in case memory	-0.124939
-0.356063	to stack memory	-0.124939
-0.544034	in stack memory	-0.124939
-0.594115	87 about memory	-0.124939
-1.185882	a large memory	-0.124939
-0.789406	of large memory	-0.124939
-0.541509	This large memory	-0.124939
-0.565377	of big memory	-0.124939
-0.977294	one big memory	-0.124939
-1.187234	how much memory	-0.124939
-1.480788	most common memory	-0.124939
-0.786384	the allocated memory	-0.124939
-0.177112	The allocated memory	-0.124939
-0.481218	own allocated memory	-0.124939
-1.089497	dynamically allocated memory	-0.124939
-0.591838	for another memory	-0.124939
-1.093171	a particular memory	-0.425969
-1.452177	its own memory	-0.124939
-0.108864	new bigger memory	-0.301030
-1.406022	the old memory	-0.124939
-0.865944	a smaller memory	-0.124939
-0.718489	the main memory	-0.124939
-0.269399	code. Dynamic memory	-0.124939
-0.178634	allocation Dynamic memory	-0.124939
-0.178634	allocation. Dynamic memory	-0.124939
-0.178634	inefficient. Dynamic memory	-0.124939
-0.178634	28 Dynamic memory	-0.124939
-0.178634	are. Dynamic memory	-0.124939
-0.178634	limited. Dynamic memory	-0.124939
-0.080196	9.6 Dynamic memory	-0.425969
-0.858449	time. No memory	-0.124939
-1.291692	to prevent memory	-0.124939
-0.177566	9 Optimizing memory	-0.425969
-0.659369	of RAM memory	-0.124939
-0.138573	from RAM memory	-0.124939
-0.343288	where RAM memory	-0.124939
-1.016927	to swap memory	-0.124939
-0.540117	cause seven memory	-0.124939
-0.540428	the excessive memory	-0.124939
-0.539806	a larger memory	-0.124939
-0.540117	one contiguous memory	-0.124939
-0.762187	at round memory	-0.124939
-0.881932	for saving memory	-0.124939
-0.525919	to uncached memory	-0.124939
-0.525919	execution units, memory	-0.124939
-0.833080	an arbitrary memory	-0.124939
-0.503718	efficient. Extra memory	-0.124939
-0.503718	that allocates memory	-0.124939
-0.065687	execution speed, memory	-0.124939
-0.357999	for minimizing memory	-0.124939
-0.357999	strides. Uncached memory	-0.124939
-0.357999	for reserving memory	-0.124939
-0.601056	responded to at	-0.124939
-2.478670	may be at	-0.124939
-0.599773	loaded or at	-0.124939
-0.895162	calculate it at	-0.124939
-0.598078	reflecting it at	-0.124939
-0.600505	53 function at	-0.124939
-1.371029	aligned by at	-0.124939
-0.597804	extra code at	-0.124939
-0.898269	or not at	-0.124939
-1.660061	rather than at	-0.124939
-2.590786	the compiler at	-0.124939
-0.599772	consume time at	-0.124939
-1.070607	stack memory at	-0.124939
-0.897712	class has at	-0.124939
-1.836069	are used at	-0.124939
-0.200057	never used at	-0.425969
-1.359586	other compilers at	-0.124939
-0.598255	Make pointer at	-0.124939
-1.011105	function library at	-0.124939
-0.562931	Load library at	-0.124939
-1.004160	asmlib library at	-0.124939
-1.186118	as possible at	-0.124939
-0.894284	its value at	-0.124939
-1.866520	the variable at	-0.124939
-1.764933	the table at	-0.124939
-0.547330	The elements at	-0.124939
-0.471492	eight elements at	-0.425969
-0.547330	dummy elements at	-0.124939
-1.064289	run faster at	-0.124939
-1.393676	is stored at	-0.124939
-1.056572	be stored at	-0.425969
-0.515663	then stored at	-0.124939
-0.185411	i.e. stored at	-0.425969
-1.467732	memory address at	-0.124939
-0.595951	small bit at	-0.124939
-1.558130	32 bits at	-0.124939
-0.593978	optimization instructions at	-0.124939
-1.665546	are available at	-0.124939
-1.747005	the stack at	-0.124939
-1.267856	that calls at	-0.124939
-0.593034	some calculations at	-0.124939
-1.345601	16 bytes at	-0.124939
-0.592812	are best at	-0.124939
-0.592029	b) etc. at	-0.124939
-1.167491	very good at	-0.124939
-0.812950	is done at	-0.124939
-1.235805	be done at	-0.124939
-1.025463	are done at	-0.124939
-0.195450	one line at	-0.425969
-0.232407	this manual at	-0.301030
-1.947306	as explained at	-0.124939
-1.392240	is calculated at	-0.124939
-1.513257	be calculated at	-0.124939
-0.280260	is known at	-0.602060
-0.222705	be known at	-0.124939
-0.030055	not known at	-0.903090
-0.222705	integer known at	-0.124939
-0.324389	size known at	-0.124939
-0.222705	constant known at	-0.124939
-0.324389	already known at	-0.124939
-0.882753	not supported at	-0.124939
-0.948351	may run at	-0.124939
-1.134089	will run at	-0.124939
-0.193454	multiple values at	-0.124939
-1.878695	clock cycles at	-0.124939
-0.590144	row addresses at	-0.124939
-0.588131	Avoid branches at	-0.124939
-0.875272	point multiplication at	-0.124939
-0.587145	its name at	-0.124939
-1.497214	to zero at	-0.124939
-0.777015	and better at	-0.124939
-0.534464	are better at	-0.124939
-1.505424	table lookup at	-0.124939
-0.121613	first byte at	-0.221849
-0.113407	bytes byte at	-0.124939
-0.366409	1 byte at	-0.124939
-0.012659	last byte at	-0.124939
-0.113407	15 byte at	-0.124939
-1.145452	is transferred at	-0.124939
-1.141448	are aligned at	-0.124939
-0.235957	not look at	-0.124939
-0.517297	may look at	-0.124939
-0.065363	you look at	-0.602060
-0.235957	should look at	-0.124939
-0.235957	also look at	-0.124939
-0.102147	Let's look at	-0.124939
-0.235957	let's look at	-0.124939
-0.868281	four numbers at	-0.124939
-0.870632	small piece at	-0.124939
-0.583250	installation options at	-0.124939
-1.023867	to start at	-0.124939
-0.518577	may start at	-0.124939
-0.693415	scattered around at	-0.425969
-0.581882	multiple things at	-0.124939
-0.581248	equivalent reductions at	-0.124939
-0.548782	be loaded at	-0.425969
-0.569548	are loaded at	-0.124939
-0.404026	typically loaded at	-0.124939
-0.580931	N+1 supports at	-0.124939
-0.579082	BSD comes at	-0.124939
-0.575857	no offset at	-0.124939
-0.237368	was unknown at	-0.124939
-0.038205	were unknown at	-0.823909
-0.573743	one square at	-0.124939
-0.574184	one thing at	-0.124939
-1.409864	at least at	-0.124939
-0.993907	can occur at	-0.124939
-0.568295	which counts at	-0.124939
-0.839944	be added at	-0.124939
-0.564700	library (or at	-0.124939
-0.560569	same DLL at	-0.124939
-0.187572	is resolved at	-0.124939
-0.097206	always resolved at	-0.425969
-0.951775	in memory, at	-0.124939
-0.548821	will break at	-0.124939
-0.548405	everything happens at	-0.124939
-0.548821	not evaluated at	-0.124939
-0.538914	less popular at	-0.124939
-0.539403	and pow at	-0.124939
-0.784825	the project at	-0.124939
-0.314090	below. Dispatch at	-0.124939
-0.314090	compilers. Dispatch at	-0.124939
-0.526239	the appendix at	-0.124939
-0.525056	must begin at	-0.124939
-0.525056	same cache, at	-0.124939
-0.723251	the Internet at	-0.124939
-0.724497	program flow at	-0.124939
-0.723251	is handled at	-0.124939
-0.461709	and memcpy, at	-0.124939
-0.461709	few kilobytes at	-0.124939
-0.461709	be visible at	-0.124939
-0.461709	by looking at	-0.124939
-0.461709	element matrix[c][r] at	-0.124939
-0.461709	generate interrupts at	-0.124939
-0.461709	to do, at	-0.124939
-0.461709	body begins at	-0.124939
-0.656838	garbage collector at	-0.124939
-0.357412	been lost at	-0.124939
-0.357412	is aiming at	-0.124939
-0.357412	are instantiated at	-0.124939
-0.357412	calculate (1./1.2345) at	-0.124939
-0.357412	dispatch decision at	-0.124939
-0.357412	for Nerds at	-0.124939
-0.357412	Pragmatic Look at	-0.124939
-0.357412	need relocation at	-0.124939
-0.357412	things. Looking at	-0.124939
-0.357412	come unpredictably at	-0.124939
-0.357412	debug breakpoints at	-0.124939
-1.799929	of the data	-0.124939
-1.726217	and the data	-0.124939
-2.107678	in the data	-0.425969
-1.581576	if the data	-0.425969
-2.166538	when the data	-0.124939
-1.882380	make the data	-0.124939
-1.777152	into the data	-0.124939
-1.847100	where the data	-0.124939
-1.714967	makes the data	-0.124939
-1.157615	access the data	-0.124939
-1.535375	up the data	-0.124939
-1.525910	making the data	-0.124939
-0.669140	Therefore, the data	-0.124939
-1.043844	smaller the data	-0.124939
-0.589959	processing the data	-0.124939
-1.389766	divide the data	-0.124939
-0.589959	converting the data	-0.124939
-0.589959	reordering the data	-0.124939
-0.589959	aligning the data	-0.124939
-0.879215	manipulate the data	-0.124939
-0.879215	organizing the data	-0.124939
-0.201673	Organize the data	-0.124939
-2.534769	This is data	-0.124939
-2.499792	of a data	-0.124939
-0.599045	cases, a data	-0.124939
-1.192184	accessing a data	-0.124939
-1.070199	Accessing a data	-0.124939
-1.598048	set of data	-0.124939
-2.027420	size of data	-0.124939
-1.543753	type of data	-0.124939
-1.714655	case of data	-0.124939
-1.244348	lot of data	-0.124939
-1.049644	byte of data	-0.124939
-1.262077	pieces of data	-0.124939
-0.089394	Alignment of data	-0.124939
-0.883177	contents of data	-0.124939
-0.940667	pointers to data	-0.124939
-0.898326	references to data	-0.124939
-0.612892	code and data	-0.204120
-0.877343	use and data	-0.124939
-1.527713	functions and data	-0.124939
-1.318497	cache and data	-0.124939
-0.588996	caching and data	-0.124939
-0.588996	algorithms and data	-0.124939
-0.588996	decomposition and data	-0.124939
-1.875703	} The data	-0.124939
-1.451094	data. The data	-0.124939
-0.888932	index. The data	-0.124939
-0.594923	(properties) The data	-0.124939
-0.888932	processes. The data	-0.124939
-1.377266	require that data	-0.124939
-1.065980	program or data	-0.124939
-0.597615	size or data	-0.124939
-0.597953	explicitly if data	-0.124939
-0.597953	poor if data	-0.124939
-0.600502	vectors. This data	-0.124939
-0.599809	sample more data	-0.124939
-0.377037	efficiently when data	-0.425969
-2.571651	the same data	-0.124939
-1.773581	and other data	-0.124939
-1.326808	or other data	-0.124939
-1.736229	in which data	-0.124939
-1.175597	of all data	-0.124939
-1.444019	on all data	-0.124939
-0.573908	copying all data	-0.124939
-0.573908	contain all data	-0.124939
-1.427844	often used data	-0.124939
-1.719708	a class data	-0.124939
-1.165415	parent class data	-0.124939
-1.031333	the multiple data	-0.124939
-0.585518	on multiple data	-0.124939
-0.597604	allowing two data	-0.124939
-1.099183	of static data	-0.124939
-1.196093	The static data	-0.124939
-0.573355	because static data	-0.124939
-0.597758	structure where data	-0.124939
-0.477201	This makes data	-0.970037
-0.862106	which makes data	-0.124939
-2.002274	the first data	-0.124939
-0.485547	of test data	-0.124939
-0.950044	The test data	-0.124939
-0.569431	// constant data	-0.124939
-0.569431	Copying constant data	-0.124939
-1.336620	for making data	-0.124939
-0.593851	but its data	-0.124939
-0.593894	26 about data	-0.124939
-0.409422	in large data	-0.425969
-0.642540	for large data	-0.124939
-0.452532	with large data	-0.124939
-0.452532	on large data	-0.124939
-0.858659	very large data	-0.124939
-0.452532	Use large data	-0.124939
-0.593588	threads, while data	-0.124939
-0.466090	have big data	-0.124939
-0.992628	very big data	-0.124939
-0.830025	as much data	-0.124939
-0.973729	too much data	-0.124939
-1.696202	to store data	-0.124939
-0.880155	store intermediate data	-0.124939
-0.944330	a public data	-0.124939
-0.768239	and public data	-0.124939
-1.450100	its own data	-0.124939
-0.583640	of binary data	-0.124939
-0.583640	plain old data	-0.124939
-0.582414	more advanced data	-0.124939
-0.530499	the level-1 data	-0.301030
-0.372168	a level-1 data	-0.124939
-0.577818	including local data	-0.124939
-1.451332	the right data	-0.124939
-0.853333	and writing data	-0.124939
-0.574072	too little data	-0.124939
-0.844531	will align data	-0.124939
-0.569077	cannot modify data	-0.124939
-0.568881	on input data	-0.124939
-0.312942	any non-static data	-0.425969
-0.565562	the time-consuming data	-0.124939
-0.556491	a far data	-0.124939
-0.815292	of storing data	-0.124939
-1.255946	the smallest data	-0.124939
-0.549340	help files, data	-0.124939
-0.274037	to prefetch data	-0.124939
-0.274037	processors prefetch data	-0.124939
-0.274037	automatically prefetch data	-0.124939
-0.549340	signal processing, data	-0.124939
-0.549653	access Accessing data	-0.124939
-0.539523	the remote data	-0.124939
-0.540258	set. Aligning data	-0.124939
-0.129308	9.9 Access data	-0.425969
-0.525645	RGB image data	-0.124939
-0.129308	7.18 Class data	-0.425969
-0.525645	favorable: Small data	-0.124939
-0.503458	data. Extra data	-0.124939
-0.503458	for prefetching data	-0.124939
-0.503458	for aligning data	-0.124939
-0.504021	for organizing data	-0.124939
-0.724182	and read-only data	-0.124939
-0.724182	that accesses data	-0.124939
-0.462218	or send data	-0.124939
-0.657636	of keeping data	-0.124939
-0.143054	make thread-specific data	-0.124939
-0.143054	containing thread-specific data	-0.124939
-0.462218	of received data	-0.124939
-0.657636	to organize data	-0.124939
-0.357812	optimization. Prefetching data	-0.124939
-0.357812	to exchange data	-0.124939
-0.357812	Encryption, decryption, data	-0.124939
-0.357812	on arranging data	-0.124939
-0.357812	Any writable data	-0.124939
-0.357812	favorable: Larger data	-0.124939
-0.357812	containing numerical data	-0.124939
-0.357812	data Loading data	-0.124939
-0.357812	data structure, data	-0.124939
-1.272960	of the program	-0.221849
-1.588645	and the program	-0.124939
-1.492076	in the program	-0.124939
-1.345422	if the program	-0.301030
-1.833940	by the program	-0.124939
-1.733809	than the program	-0.124939
-1.096097	time the program	-0.124939
-0.923329	when the program	-0.602060
-1.084224	make the program	-0.124939
-1.180736	If the program	-0.124939
-1.564035	but the program	-0.124939
-1.617052	into the program	-0.124939
-1.600161	makes the program	-0.124939
-1.042425	before the program	-0.124939
-1.179940	want the program	-0.124939
-0.860553	while the program	-0.425969
-0.488707	compile the program	-0.124939
-1.104697	run the program	-0.124939
-1.324878	after the program	-0.124939
-1.179940	When the program	-0.124939
-1.376031	until the program	-0.124939
-1.002562	modify the program	-0.124939
-0.574990	update the program	-0.124939
-0.574990	words, the program	-0.124939
-0.574990	determines the program	-0.124939
-0.850539	behind the program	-0.124939
-0.850539	stop the program	-0.124939
-0.574990	Otherwise the program	-0.124939
-0.574990	terminates the program	-0.124939
-1.064546	of a program	-0.221849
-1.827744	in a program	-0.124939
-1.739585	that a program	-0.124939
-0.954423	if a program	-0.301030
-1.763026	on a program	-0.124939
-1.691844	than a program	-0.124939
-1.482504	when a program	-0.124939
-0.853328	If a program	-0.124939
-0.904389	where a program	-0.124939
-0.581550	running a program	-0.124939
-0.862996	load a program	-0.124939
-0.581550	down a program	-0.124939
-1.020348	install a program	-0.124939
-0.581550	redesigning a program	-0.124939
-2.107309	size of program	-0.124939
-1.193418	cases of program	-0.124939
-1.807935	piece of program	-0.124939
-0.895926	priority of program	-0.124939
-1.480576	terms of program	-0.124939
-1.290474	around in program	-0.124939
-0.599397	contiguous in program	-0.124939
-0.599397	Delays in program	-0.124939
-1.428079	time. The program	-0.124939
-1.216575	below. The program	-0.124939
-1.216575	called. The program	-0.124939
-1.135293	calculations. The program	-0.124939
-0.867315	sets. The program	-0.124939
-0.867315	initialization. The program	-0.124939
-0.867315	true. The program	-0.124939
-1.026566	*.so). The program	-0.124939
-0.867315	compilation. The program	-0.124939
-0.583804	deallocated. The program	-0.124939
-0.583804	interpretation. The program	-0.124939
-0.583804	sched_setaffinity). The program	-0.124939
-0.583804	error-prone. The program	-0.124939
-1.294956	speed or program	-0.124939
-0.600857	impacts on program	-0.124939
-0.599953	reason. A program	-0.124939
-0.997091	a C++ program	-0.124939
-0.572938	another C++ program	-0.124939
-0.572938	well-structured C++ program	-0.124939
-1.414966	it makes program	-0.124939
-0.464421	the test program	-0.124939
-0.985200	a test program	-0.124939
-0.536157	small test program	-0.124939
-0.887660	a Windows program	-0.124939
-1.393868	a big program	-0.124939
-0.887310	etc. But program	-0.124939
-1.177372	highly optimized program	-0.124939
-0.229858	console mode program	-0.301030
-0.182791	The application program	-0.124939
-0.726023	an application program	-0.124939
-0.870487	the calling program	-0.124939
-0.607022	in your program	-0.124939
-0.783114	make your program	-0.124939
-0.429271	inside your program	-0.124939
-0.429271	unless your program	-0.124939
-1.020415	The installation program	-0.124939
-1.499472	the desired program	-0.124939
-0.358834	the whole program	-0.124939
-0.093459	for whole program	-0.124939
-0.212706	do whole program	-0.124939
-0.212706	called whole program	-0.124939
-0.212706	Use whole program	-0.124939
-0.212706	doing whole program	-0.124939
-0.579930	used. No program	-0.124939
-1.478155	the final program	-0.124939
-1.208633	more clear program	-0.124939
-0.483521	used during program	-0.124939
-0.483521	grows during program	-0.124939
-0.694282	the entire program	-0.124939
-0.572490	programmers rarely program	-0.124939
-0.550175	the 7 program	-0.124939
-0.504359	a speed-critical program	-0.124939
-0.504359	more well-structured program	-0.124939
-0.090075	optimization Whole program	-0.124939
-0.090075	program. Whole program	-0.124939
-0.090075	/Og Whole program	-0.124939
-0.725682	computationally intensive program	-0.124939
-0.463038	for preventing program	-0.124939
-0.463038	time, usability, program	-0.124939
-0.358457	and analyzing program	-0.124939
-0.358457	of downloaded program	-0.124939
-0.358457	--combine -fwhole- program	-0.124939
-0.358457	an antivirus program	-0.124939
-1.780088	function that has	-0.124939
-1.256191	library that has	-0.124939
-0.592714	file that has	-0.124939
-0.884595	block that has	-0.124939
-0.592714	iteration that has	-0.124939
-1.051727	everything that has	-0.124939
-1.485965	that it has	-0.124939
-1.013062	if it has	-0.124939
-1.465803	when it has	-0.124939
-1.309874	because it has	-0.124939
-0.818049	which it has	-0.124939
-1.620957	but it has	-0.124939
-0.812734	before it has	-0.425969
-0.557461	work it has	-0.124939
-1.455719	Therefore, it has	-0.124939
-0.194826	after it has	-0.425969
-0.557461	anything it has	-0.124939
-2.406127	the function has	-0.124939
-1.835857	member function has	-0.124939
-1.825001	the code has	-0.124939
-1.796177	The code has	-0.124939
-1.132858	This code has	-0.124939
-1.149608	all code has	-0.124939
-0.865999	System code has	-0.124939
-1.457276	} This has	-0.124939
-0.573700	function. This has	-0.425969
-1.148205	program. This has	-0.124939
-0.196854	cache. This has	-0.124939
-0.981211	called. This has	-0.124939
-0.566890	executed. This has	-0.124939
-0.566890	list[i]; This has	-0.124939
-0.566890	bytes). This has	-0.124939
-1.422852	the compiler has	-0.124939
-1.200240	The compiler has	-0.124939
-1.115939	A compiler has	-0.124939
-1.209547	Intel compiler has	-0.124939
-0.558194	Codeplay compiler has	-0.124939
-0.899255	CParent<CChild1> { has	-0.124939
-0.898820	this time has	-0.124939
-0.593287	pointer. It has	-0.124939
-0.587535	processors). It has	-0.124939
-1.665247	the program has	-0.124939
-0.689494	a program has	-0.204120
-0.912827	The program has	-0.124939
-1.319764	the CPU has	-0.301030
-0.599081	unit-test but has	-0.124939
-1.530026	the integer has	-0.124939
-0.877911	32-bit integer has	-0.124939
-1.484760	instruction set has	-0.124939
-1.535402	the class has	-0.124939
-0.588638	polymorphic class has	-0.124939
-1.481141	this example has	-0.124939
-1.363692	Intel compilers has	-0.124939
-1.068304	code size has	-0.124939
-0.803035	the pointer has	-0.124939
-0.929976	function pointer has	-0.124939
-0.816033	link pointer has	-0.124939
-0.598439	because b has	-0.124939
-1.225967	the library has	-0.124939
-0.736387	The library has	-0.124939
-0.736387	or library has	-0.124939
-0.171729	This library has	-0.425969
-0.719239	class library has	-0.124939
-1.234672	the object has	-0.425969
-1.413476	shared object has	-0.124939
-0.597841	where static has	-0.124939
-0.597572	While C++ has	-0.124939
-0.572161	function also has	-0.124939
-0.572161	stack also has	-0.124939
-0.572161	unrolling also has	-0.124939
-2.202796	the value has	-0.124939
-1.767799	the table has	-0.124939
-1.064217	of performance has	-0.124939
-0.597015	suboptimal way has	-0.124939
-1.571479	a template has	-0.124939
-1.126403	of registers has	-0.124939
-1.322574	vector registers has	-0.124939
-1.107402	the user has	-0.124939
-0.811777	a user has	-0.124939
-0.594115	variable always has	-0.124939
-1.901768	operating system has	-0.124939
-1.343966	the file has	-0.124939
-0.594629	Each type has	-0.124939
-0.594036	another error has	-0.124939
-1.236026	the processor has	-0.124939
-0.784818	a processor has	-0.124939
-0.538910	This processor has	-0.124939
-1.329858	array element has	-0.124939
-1.560224	assembly language has	-0.124939
-1.168296	Each thread has	-0.124939
-0.884709	point overflow has	-0.124939
-1.687717	cache line has	-0.124939
-0.882533	This problem has	-0.124939
-1.328847	linked list has	-0.124939
-0.591178	pipeline structure has	-0.124939
-0.590625	rounding mode has	-0.124939
-1.597771	repeat count has	-0.124939
-1.041169	heap space has	-0.124939
-0.584831	the microprocessor has	-0.124939
-1.561255	the application has	-0.124939
-1.161762	CPU model has	-0.124939
-0.874350	the parameter has	-0.124939
-1.736348	the programmer has	-0.124939
-0.463716	static keyword has	-0.124939
-0.872758	each addition has	-0.124939
-0.584645	user actually has	-0.124939
-1.217993	hardware platform has	-0.124939
-1.451250	the operands has	-0.124939
-1.039139	in main has	-0.124939
-1.395293	the computer has	-0.124939
-0.581175	pointer p has	-0.124939
-0.861202	C++ syntax has	-0.124939
-1.330596	the STL has	-0.124939
-1.123538	Function inlining has	-0.124939
-0.580119	template instance has	-0.124939
-0.580269	as position-independent has	-0.124939
-1.330763	the offset has	-0.124939
-1.256875	the heap has	-0.124939
-0.572005	doesn't occur has	-0.124939
-0.995149	main executable has	-0.124939
-0.568671	until seconds has	-0.124939
-0.658159	if F1 has	-0.124939
-0.462552	then F1 has	-0.124939
-0.569110	selected. Compiler has	-0.124939
-0.976618	programming style has	-0.124939
-1.264790	the latter has	-0.124939
-1.218126	dependency chain has	-0.124939
-0.561519	user who has	-0.124939
-0.556025	language. D has	-0.124939
-0.955696	heap manager has	-0.124939
-0.955696	hot spot has	-0.124939
-0.525434	shared_ptr. auto_ptr has	-0.124939
-0.503257	This reordering has	-0.124939
-0.503257	platforms. Pascal has	-0.124939
-0.657351	A for-loop has	-0.124939
-0.462036	functions. Sum1 has	-0.124939
-0.657351	the reader has	-0.124939
-0.462036	member functions) has	-0.124939
-0.357669	indirect function" has	-0.124939
-0.357669	that CParent::Hello() has	-0.124939
-0.357669	stack. Deallocation has	-0.124939
-0.357669	example 8.23b has	-0.124939
-0.357669	-fno-pic apparently has	-0.124939
-0.357669	initialisation i=0; has	-0.124939
-2.228823	is the vector	-0.124939
-2.348893	of the vector	-0.124939
-2.293411	and the vector	-0.124939
-2.384617	for the vector	-0.124939
-1.481394	by the vector	-0.602060
-2.122687	than the vector	-0.124939
-1.481878	at the vector	-0.425969
-2.189749	If the vector	-0.124939
-1.932116	using the vector	-0.124939
-1.871172	into the vector	-0.124939
-1.274679	Using the vector	-0.124939
-2.197325	of a vector	-0.124939
-1.917031	in a vector	-0.124939
-1.554065	as a vector	-0.425969
-0.876617	into a vector	-0.124939
-1.534323	where a vector	-0.124939
-0.589827	calculate a vector	-0.124939
-0.181162	Make a vector	-0.903090
-1.874280	use of vector	-0.124939
-2.104675	size of vector	-0.124939
-1.904476	advantage of vector	-0.124939
-1.648240	kinds of vector	-0.124939
-0.598267	extension of vector	-0.124939
-1.367307	processors and vector	-0.124939
-0.599883	11) and vector	-0.124939
-1.805518	elements in vector	-0.124939
-0.599374	element in vector	-0.249877
-1.067279	registers. The vector	-0.124939
-1.067279	bits. The vector	-0.124939
-0.598056	105 The vector	-0.124939
-1.281059	performance for vector	-0.124939
-0.894991	suited for vector	-0.124939
-0.597992	Useful for vector	-0.124939
-0.900723	"vectorclass.h" // vector	-0.124939
-0.502644	functions or vector	-0.425969
-1.893412	divisible by vector	-0.124939
-0.868855	integer with vector	-0.124939
-0.584605	available with vector	-0.124939
-1.028791	arrays with vector	-0.124939
-1.220027	problem with vector	-0.124939
-0.584605	Function with vector	-0.124939
-0.584605	dispatching with vector	-0.124939
-0.892702	size as vector	-0.124939
-1.576542	implemented as vector	-0.124939
-0.899183	microprocessors have vector	-0.124939
-1.283587	to use vector	-0.124939
-1.653705	can use vector	-0.124939
-0.573205	also use vector	-0.124939
-0.599993	some more vector	-0.124939
-0.947281	for integer vector	-0.124939
-0.873739	more integer vector	-0.124939
-0.186951	aligned integer vector	-0.124939
-0.053842	unaligned integer vector	-0.602060
-0.598710	7.41a class vector	-0.124939
-1.073623	of each vector	-0.124939
-1.215355	in each vector	-0.124939
-0.840923	then each vector	-0.124939
-1.560778	are using vector	-0.124939
-1.936149	by using vector	-0.124939
-1.019202	of Intel vector	-0.124939
-1.155492	and Intel vector	-0.124939
-1.456522	The Intel vector	-0.124939
-0.547945	using Intel vector	-0.124939
-0.800862	libraries: Intel vector	-0.124939
-0.547945	(using Intel vector	-0.124939
-0.083869	cc into vector	-0.726999
-0.083869	bb into vector	-0.726999
-1.420523	the 64-bit vector	-0.124939
-1.678814	most efficient vector	-0.124939
-0.597941	biggest possible vector	-0.124939
-0.937964	a long vector	-0.124939
-0.550982	some long vector	-0.124939
-0.550982	libraries: long vector	-0.124939
-0.326534	128 bit vector	-0.301030
-0.543929	128- bit vector	-0.124939
-1.919780	a new vector	-0.124939
-0.506646	the short vector	-0.124939
-0.729495	a short vector	-0.124939
-0.506646	of short vector	-0.124939
-0.506646	and short vector	-0.124939
-0.506646	Intel short vector	-0.124939
-1.183122	the available vector	-0.124939
-1.448719	the constant vector	-0.124939
-0.769446	the result vector	-0.726999
-1.165000	with another vector	-0.124939
-0.171331	12 Using vector	-0.124939
-0.171331	12.5 Using vector	-0.425969
-0.540563	for Boolean vector	-0.124939
-0.540563	mispredictions. Boolean vector	-0.124939
-0.588649	The intrinsic vector	-0.124939
-0.585360	The XMM vector	-0.124939
-0.585360	of bigger vector	-0.124939
-1.128640	compiler supports vector	-0.124939
-1.208955	in my vector	-0.124939
-0.581088	parallelization. Supports vector	-0.124939
-0.734329	an STL vector	-0.124939
-0.509533	objects. STL vector	-0.124939
-0.080170	the #pragma vector	-0.124939
-0.178566	use #pragma vector	-0.124939
-0.178566	always #pragma vector	-0.124939
-0.178566	write #pragma vector	-0.124939
-0.080170	aligned #pragma vector	-0.425969
-0.178566	nontemporal #pragma vector	-0.124939
-0.178566	Vectorize #pragma vector	-0.124939
-0.577951	of special vector	-0.124939
-1.453673	the right vector	-0.124939
-0.419591	// Define vector	-0.425969
-0.416746	a 128-bit vector	-0.124939
-0.407715	supported 128-bit vector	-0.124939
-0.572019	massively parallel vector	-0.124939
-0.572159	not allow vector	-0.124939
-0.569583	me. My vector	-0.124939
-0.449892	a 256-bit vector	-0.124939
-0.449892	one 256-bit vector	-0.124939
-0.557433	AVX2 Mathematical vector	-0.124939
-0.938496	the largest vector	-0.124939
-0.549693	using Agner vector	-0.124939
-0.051739	using Agner's vector	-0.124939
-0.051739	classes Agner's vector	-0.124939
-0.051739	107). Agner's vector	-0.124939
-0.051739	-mveclibabi=acml. Agner's vector	-0.124939
-0.051739	amd_vrd2_exp Agner's vector	-0.124939
-0.786627	the larger vector	-0.124939
-0.526390	or sixteen vector	-0.124939
-0.526390	RISC cores, vector	-0.124939
-0.504275	= b;} vector	-0.124939
-0.462564	and scientific vector	-0.124939
-0.143137	of predefined vector	-0.124939
-0.143137	Use predefined vector	-0.124939
-0.358084	processors (when vector	-0.124939
-0.358084	other odd-sized vector	-0.124939
-0.358084	b Bit vector	-0.124939
-0.358084	+ a.y);} vector	-0.124939
-0.358084	// 2-dimensional vector	-0.124939
-1.884625	or a make	-0.124939
-2.300163	as a make	-0.124939
-0.771711	is to make	-0.550907
-0.664670	and to make	-0.425969
-0.771663	as to make	-0.124939
-0.531394	- to make	-0.124939
-1.116309	compiler to make	-0.124939
-1.434089	have to make	-0.124939
-1.362679	has to make	-0.124939
-0.771663	do to make	-0.124939
-0.906596	efficient to make	-0.425969
-0.894460	array to make	-0.124939
-0.898090	possible to make	-0.301030
-0.531394	value to make	-0.124939
-1.108729	takes to make	-0.124939
-0.637100	order to make	-0.321233
-0.908075	way to make	-0.301030
-1.206654	faster to make	-0.124939
-0.705341	how to make	-0.204120
-0.794957	useful to make	-0.124939
-1.097509	sure to make	-0.124939
-0.795662	want to make	-0.221849
-1.321793	important to make	-0.124939
-0.350551	common to make	-0.124939
-1.467461	advantageous to make	-0.124939
-1.019399	solution to make	-0.124939
-0.531394	structure to make	-0.124939
-1.159881	recommended to make	-0.124939
-0.771663	dispatching to make	-0.124939
-0.531394	complicated to make	-0.124939
-1.334836	needs to make	-0.124939
-0.195007	programmer to make	-0.492916
-0.531394	means to make	-0.124939
-0.894460	choose to make	-0.124939
-1.230183	ways to make	-0.124939
-0.771663	things to make	-0.124939
-0.771663	Loop to make	-0.124939
-0.771663	destructor to make	-0.124939
-0.894460	safe to make	-0.124939
-0.531394	subexpression to make	-0.124939
-0.894460	easy to make	-0.124939
-0.971936	convenient to make	-0.124939
-0.189037	effort to make	-0.124939
-0.894460	preferable to make	-0.124939
-0.894460	idea to make	-0.124939
-0.531394	manner to make	-0.124939
-0.771663	sufficient to make	-0.124939
-0.531394	carefully to make	-0.124939
-0.771663	forget to make	-0.124939
-0.531394	tried to make	-0.124939
-0.531394	advisable to make	-0.124939
-0.531394	tends to make	-0.124939
-1.156739	data and make	-0.124939
-0.589721	case and make	-0.124939
-1.377893	times and make	-0.124939
-0.878753	problem and make	-0.124939
-0.589721	list and make	-0.124939
-0.589721	conversions and make	-0.124939
-1.156739	possible, and make	-0.124939
-0.878753	truncation and make	-0.124939
-0.589721	package and make	-0.124939
-0.589721	spot and make	-0.124939
-0.589721	aligned, and make	-0.124939
-0.589721	error; and make	-0.124939
-1.368361	instructions that make	-0.124939
-0.879081	modules that make	-0.124939
-0.589889	conditions that make	-0.124939
-0.589889	details that make	-0.124939
-0.589889	disadvantages that make	-0.124939
-0.201658	Factors that make	-0.425969
-0.589889	complications that make	-0.124939
-1.818040	that can make	-0.124939
-1.732141	compiler can make	-0.124939
-1.230383	you can make	-0.124939
-1.475800	compilers can make	-0.124939
-0.577208	instructions can make	-0.124939
-0.577208	application can make	-0.124939
-1.021766	We can make	-0.124939
-1.008524	tool can make	-0.124939
-0.577208	modifier can make	-0.124939
-1.077529	2 // make	-0.124939
-0.601043	space or make	-0.124939
-1.308991	do not make	-0.124939
-1.791437	does not make	-0.124939
-0.962611	Do not make	-0.124939
-1.326301	you may make	-0.124939
-1.359761	You may make	-0.124939
-0.591329	only you make	-0.124939
-1.887218	If you make	-0.124939
-0.591329	Then you make	-0.124939
-0.485148	This will make	-0.124939
-1.295109	compiler will make	-0.124939
-0.896115	this will make	-0.124939
-0.819015	compilers will make	-0.124939
-0.896115	I will make	-0.124939
-0.532110	overflow will make	-0.124939
-0.532110	b; will make	-0.124939
-0.532110	b++; will make	-0.124939
-0.581832	library then make	-0.124939
-0.863534	parameters then make	-0.124939
-0.581832	counter then make	-0.124939
-0.581832	compiler, then make	-0.124939
-1.789809	Some compilers make	-0.124939
-1.154203	compiler cannot make	-0.124939
-1.287678	you cannot make	-0.124939
-1.114543	they cannot make	-0.124939
-0.547415	Compilers cannot make	-0.124939
-0.574403	you must make	-0.124939
-1.519116	compiler doesn't make	-0.124939
-0.555022	loops would make	-0.124939
-0.555022	chain would make	-0.124939
-0.590000	parameters. Therefore, make	-0.124939
-1.319254	of course make	-0.124939
-1.211368	OS X make	-0.124939
-0.557610	format. Alternatively, make	-0.124939
-0.540919	time. Templates make	-0.124939
-0.463294	or better, make	-0.124939
-0.358658	non-recoverable errors; make	-0.124939
-0.358658	or (5) make	-0.124939
-2.746392	of the different	-0.124939
-2.489419	to the different	-0.124939
-2.674780	in the different	-0.124939
-2.223530	If the different	-0.124939
-1.008541	between the different	-0.124939
-1.187298	test the different	-0.124939
-1.636463	whether the different	-0.124939
-1.480155	put the different	-0.124939
-0.597793	contain the different	-0.124939
-0.894598	manipulate the different	-0.124939
-0.597793	summarizes the different	-0.124939
-1.638759	size is different	-0.124939
-2.166864	of a different	-0.124939
-1.417530	to a different	-0.124939
-1.494546	in a different	-0.124939
-1.249190	function a different	-0.124939
-1.535908	with a different	-0.124939
-1.344352	use a different	-0.124939
-1.198587	has a different	-0.124939
-1.628029	using a different	-0.124939
-1.371548	uses a different	-0.124939
-0.588508	had a different	-0.124939
-2.076741	number of different	-0.124939
-0.341944	objects of different	-0.301030
-1.425289	performance of different	-0.124939
-0.201207	pointers of different	-0.425969
-1.037418	arrays of different	-0.124939
-0.486469	efficiency of different	-0.602060
-0.587688	strings of different	-0.124939
-1.539933	discussion of different	-0.124939
-0.874806	consumption of different	-0.124939
-0.496470	Comparison of different	-0.124939
-0.587688	transposition of different	-0.124939
-0.587688	hundreds of different	-0.124939
-0.587688	Sizes of different	-0.124939
-1.626520	pointers to different	-0.124939
-0.599907	priorities to different	-0.124939
-0.599907	port to different	-0.124939
-0.600059	settings and different	-0.124939
-0.600059	alignments and different	-0.124939
-1.547847	data in different	-0.124939
-1.418433	stored in different	-0.124939
-1.582019	available in different	-0.124939
-1.589239	implemented in different	-0.124939
-1.055114	optimizations in different	-0.124939
-0.886898	tested in different	-0.124939
-0.886898	kept in different	-0.124939
-1.063371	functions The different	-0.124939
-1.551453	cache. The different	-0.124939
-1.183164	variable. The different	-0.124939
-0.892486	sets. The different	-0.124939
-1.327101	function for different	-0.124939
-1.020454	different for different	-0.124939
-0.571704	cases for different	-0.124939
-0.056497	versions for different	-0.301030
-0.571704	compile for different	-0.124939
-0.121955	conventions for different	-0.903090
-0.571704	algorithms for different	-0.124939
-0.844363	area for different	-0.124939
-0.571704	organization for different	-0.124939
-0.571704	spaces for different	-0.124939
-0.571704	cell for different	-0.124939
-1.742872	means that different	-0.124939
-2.764722	can be different	-0.124939
-2.145730	will be different	-0.124939
-1.450013	there are different	-0.124939
-0.597345	prediction are different	-0.124939
-1.848313	or if different	-0.124939
-0.578512	4 with different	-0.124939
-0.724640	compiled with different	-0.124939
-1.273480	threads with different	-0.124939
-0.578512	addresses with different	-0.124939
-1.493647	compatible with different	-0.124939
-0.578512	Test with different	-0.124939
-0.578512	streams with different	-0.124939
-0.886952	tested on different	-0.124939
-0.593916	fastest on different	-0.124939
-0.886952	differently on different	-0.124939
-1.074719	treated as different	-0.124939
-1.251908	will use different	-0.124939
-0.877835	operations use different	-0.124939
-0.589249	threads use different	-0.124939
-0.885779	files from different	-0.124939
-1.258981	read from different	-0.124939
-0.200988	around at different	-0.124939
-0.586619	decision at different	-0.124939
-2.519364	to make different	-0.124939
-0.599664	stub. If different	-0.124939
-1.938051	for each different	-0.124939
-2.263479	to do different	-0.124939
-2.078231	by using different	-0.124939
-1.771621	and b different	-0.124939
-0.964685	of two different	-0.124939
-1.057307	in two different	-0.124939
-1.124254	are two different	-0.124939
-0.474326	have two different	-0.124939
-0.767077	make two different	-0.124939
-0.528750	correspondingly two different	-0.124939
-1.042425	in many different	-0.124939
-0.767563	for many different	-0.425969
-1.161342	with many different	-0.124939
-0.872248	have many different	-0.124939
-0.521626	from many different	-0.124939
-0.754817	so many different	-0.124939
-1.064975	have very different	-0.124939
-0.509056	or between different	-0.124939
-0.509056	switch between different	-0.124939
-1.038966	Conversions between different	-0.124939
-0.909619	communication between different	-0.124939
-0.509056	jumps between different	-0.124939
-0.509056	Switch between different	-0.124939
-0.509056	environment, between different	-0.124939
-0.593798	link. Use different	-0.124939
-0.593438	operator These different	-0.124939
-0.433853	for several different	-0.124939
-0.347842	are several different	-0.124939
-0.400317	by several different	-0.124939
-0.400317	on several different	-0.124939
-0.677126	has several different	-0.124939
-0.400317	test several different	-0.124939
-0.592167	to support different	-0.124939
-0.592064	into eight different	-0.124939
-1.163188	are doing different	-0.124939
-0.586984	Supports three different	-0.124939
-0.586464	will look different	-0.124939
-0.576669	and copying different	-0.124939
-1.283827	to mix different	-0.124939
-0.562405	to try different	-0.124939
-0.562552	of CPUs, different	-0.124939
-0.143243	on seven different	-0.124939
-0.143243	different platforms, different	-0.425969
-0.763382	for mixing different	-0.124939
-0.462820	p2 having different	-0.124939
-0.658579	screen resolutions, different	-0.124939
-0.462820	that treats different	-0.124939
-0.462820	for assigning different	-0.124939
-0.358285	different browsers, different	-0.124939
-0.358285	with widely different	-0.124939
-0.358285	different microprocessors, different	-0.124939
-1.094776	This is because	-0.249877
-2.472126	may be because	-0.124939
-1.071794	program or because	-0.124939
-0.910749	member function because	-0.301030
-0.877491	simple function because	-0.124939
-1.239657	frame function because	-0.124939
-1.076361	than if because	-0.124939
-1.258243	optimized code because	-0.124939
-0.885466	optimal code because	-0.124939
-0.593159	non-AVX code because	-0.124939
-2.013259	Intel compiler because	-0.124939
-1.379445	a time because	-0.124939
-1.273089	first time because	-0.124939
-0.490811	execution time because	-0.425969
-1.488261	compile time because	-0.124939
-0.598998	store data because	-0.124939
-1.812888	member functions because	-0.124939
-1.069636	while loop because	-0.124939
-1.188538	at all because	-0.124939
-1.645536	level-2 cache because	-0.124939
-1.912277	an integer because	-0.124939
-1.862886	can do because	-0.124939
-1.483642	or double because	-0.124939
-1.052390	and b because	-0.124939
-0.576281	< b because	-0.124939
-1.190608	link library because	-0.124939
-0.597515	contained object because	-0.124939
-0.547538	are efficient because	-0.124939
-1.797953	more efficient because	-0.124939
-0.932655	very efficient because	-0.124939
-1.432199	less efficient because	-0.124939
-0.800134	equally efficient because	-0.124939
-0.597364	optimizations possible because	-0.124939
-0.893983	optimized version because	-0.124939
-1.672067	a variable because	-0.124939
-1.435285	induction variable because	-0.124939
-1.359404	register variables because	-0.124939
-1.183045	the performance because	-0.425969
-0.833750	program performance because	-0.124939
-0.890945	32-bit software because	-0.124939
-0.892658	is long because	-0.124939
-1.865898	is faster because	-0.124939
-0.563487	code faster because	-0.124939
-0.972431	run faster because	-0.124939
-0.596576	particularly critical because	-0.124939
-0.586848	same register because	-0.425969
-0.888966	quite often because	-0.124939
-0.595392	function template because	-0.124939
-1.171452	of pointers because	-0.124939
-0.949603	than pointers because	-0.124939
-0.554441	uses pointers because	-0.124939
-0.594803	unit- test because	-0.124939
-0.594733	some systems because	-0.124939
-1.666974	is useful because	-0.124939
-1.704518	= 0 because	-0.124939
-0.594444	VIA processors because	-0.124939
-0.594355	became available because	-0.124939
-1.056643	cleaning up because	-0.124939
-0.544601	eight times because	-0.124939
-0.544601	subsequent times because	-0.124939
-0.544601	hundred times because	-0.124939
-1.052220	is large because	-0.124939
-1.718990	function calls because	-0.124939
-0.593644	correct result because	-0.124939
-1.337545	not necessary because	-0.124939
-1.555910	assembly language because	-0.124939
-0.885236	half speed because	-0.124939
-0.591138	than 128 because	-0.124939
-1.048918	function parameters because	-0.124939
-1.667849	is advantageous because	-0.124939
-1.189818	efficient solution because	-0.124939
-0.816699	optimal solution because	-0.124939
-1.342181	an advantage because	-0.124939
-1.245694	Boolean operators because	-0.124939
-0.630976	64-bit mode because	-0.602060
-1.470197	the values because	-0.124939
-0.588177	interactive programs because	-0.124939
-0.876811	caching problems because	-0.124939
-0.470522	not optimal because	-0.425969
-1.038643	a microprocessor because	-0.124939
-1.514073	more complicated because	-0.124939
-1.162690	no cost because	-0.124939
-1.156534	be better because	-0.124939
-0.585480	critical applications because	-0.124939
-0.586008	compiler mechanism because	-0.124939
-0.872322	be needed because	-0.124939
-0.871240	simple types because	-0.124939
-0.585843	uncached read because	-0.124939
-0.584103	hexadecimal numbers because	-0.124939
-0.583020	allocation process because	-0.124939
-0.582487	function just because	-0.124939
-1.198096	the operands because	-0.124939
-0.748623	Boolean operands because	-0.124939
-1.022005	is smaller because	-0.124939
-0.518428	n here because	-0.124939
-0.518428	anything here because	-0.124939
-0.581038	than intended because	-0.124939
-0.970259	be avoided because	-0.124939
-0.287826	is inefficient because	-0.124939
-0.645723	very inefficient because	-0.124939
-0.580072	always position-independent because	-0.124939
-0.855244	different platforms because	-0.124939
-0.577266	other constants because	-0.124939
-0.851773	time-consuming tasks because	-0.124939
-1.251250	hard disk because	-0.124939
-0.994100	main executable because	-0.124939
-0.570743	inherently parallel because	-0.124939
-0.571543	not copied because	-0.124939
-1.151423	switch statements because	-0.124939
-0.837490	several seconds because	-0.124939
-0.824496	fine-grained parallelism because	-0.124939
-0.823840	is fastest because	-0.124939
-0.560988	are preferred because	-0.124939
-1.076990	without -fpic because	-0.124939
-0.382882	time consuming because	-0.425969
-0.555021	total size, because	-0.124939
-1.080592	of course, because	-0.124939
-0.548581	only occurs because	-0.124939
-0.389594	quite costly because	-0.124939
-0.389594	relatively costly because	-0.124939
-0.548117	be poor because	-0.124939
-0.548117	fail completely because	-0.124939
-0.784328	column 28 because	-0.124939
-0.784328	multiple processes because	-0.124939
-0.524782	plus one, because	-0.124939
-0.762503	^= 0x80000000; because	-0.124939
-0.761365	is serial because	-0.124939
-0.761365	example 9.5 because	-0.124939
-0.524782	becomes simpler because	-0.124939
-0.724208	behave differently because	-0.124939
-0.502638	is unfortunate because	-0.124939
-0.248945	const twice because	-0.124939
-0.248945	calculated twice because	-0.124939
-0.722820	option -fpie because	-0.124939
-0.502638	an issue because	-0.124939
-0.722820	is negligible because	-0.124939
-0.142875	very problematic because	-0.124939
-0.142875	particularly problematic because	-0.124939
-0.656467	be vectorized, because	-0.124939
-0.461473	is unsafe because	-0.124939
-0.357226	but i*12, because	-0.124939
-0.357226	particularly interesting because	-0.124939
-0.357226	accessed non-sequentially because	-0.124939
-0.357226	flags stall because	-0.124939
-0.357226	not evaluated, because	-0.124939
-0.357226	= *(++p) because	-0.124939
-0.357226	= array[++i] because	-0.124939
-0.357226	cache line, because	-0.124939
-0.357226	Sandy Bridge) because	-0.124939
-0.357226	in advance, because	-0.124939
-0.357226	with alloca, because	-0.124939
-0.357226	particularly risky because	-0.124939
-0.924272	is the same	-0.970037
-1.350954	of the same	-0.159701
-1.119840	to the same	-0.234083
-1.097289	in the same	-0.279841
-1.188031	for the same	-0.204120
-1.655763	that the same	-0.124939
-0.328612	are the same	-0.124939
-1.678500	if the same	-0.124939
-1.588062	by the same	-0.124939
-0.887007	with the same	-0.182931
-1.446544	on the same	-0.124939
-1.243710	have the same	-0.124939
-0.620204	use the same	-0.249877
-0.882179	from the same	-0.124939
-0.701409	at the same	-0.550907
-0.788713	has the same	-0.602060
-1.320499	because the same	-0.124939
-1.353204	If the same	-0.124939
-0.965513	using the same	-0.124939
-1.427716	into the same	-0.124939
-1.382765	In the same	-0.124939
-1.455368	where the same	-0.124939
-0.628848	take the same	-0.425969
-0.803949	even the same	-0.124939
-0.143260	does the same	-0.221849
-0.549667	work the same	-0.124939
-1.121228	calls the same	-0.124939
-1.132408	But the same	-0.124939
-1.121228	get the same	-0.124939
-0.463032	doing the same	-0.301030
-1.414950	calculate the same	-0.124939
-1.024318	write the same	-0.124939
-0.549667	becomes the same	-0.124939
-0.803949	shows the same	-0.124939
-0.803949	goes the same	-0.124939
-0.549667	go the same	-0.124939
-0.358490	produce the same	-0.124939
-0.803949	still the same	-0.124939
-0.549667	accessing the same	-0.124939
-0.803949	least the same	-0.124939
-0.803949	keep the same	-0.124939
-0.559652	within the same	-0.124939
-0.036104	share the same	-0.176091
-0.358490	exactly the same	-0.124939
-0.803949	executing the same	-0.124939
-0.193123	interpreting the same	-0.425969
-0.549667	having the same	-0.124939
-0.549667	requiring the same	-0.124939
-0.193123	sharing the same	-0.124939
-0.549667	reuse the same	-0.124939
-1.768293	} The same	-0.124939
-0.587016	unsigned The same	-0.124939
-1.470934	cache. The same	-0.124939
-1.035528	order. The same	-0.124939
-0.587016	storage. The same	-0.124939
-0.873506	closed. The same	-0.124939
-0.587016	operators). The same	-0.124939
-0.587016	floats. The same	-0.124939
-0.587016	2016. The same	-0.124939
-0.587016	reads. The same	-0.124939
-0.587016	row. The same	-0.124939
-0.376898	only from same	-0.425969
-1.114361	execution units same	-0.124939
-2.367224	of the functions	-0.425969
-2.411315	for the functions	-0.124939
-1.691175	with the functions	-0.124939
-2.166500	use the functions	-0.124939
-2.005543	make the functions	-0.124939
-1.299379	all the functions	-0.425969
-1.065093	containing the functions	-0.124939
-1.065093	implement the functions	-0.124939
-0.597313	extracts the functions	-0.124939
-0.597313	collect the functions	-0.124939
-1.586871	order of functions	-0.124939
-0.598979	even of functions	-0.124939
-1.426040	speed of functions	-0.124939
-1.889392	instead of functions	-0.124939
-0.601335	accesses to functions	-0.124939
-0.600861	operators and functions	-0.124939
-1.071405	memory. The functions	-0.124939
-0.599452	difficult. The functions	-0.124939
-1.444939	conventions for functions	-0.124939
-2.022622	sure that functions	-0.124939
-0.598012	Windows if functions	-0.124939
-1.188151	efficiently if functions	-0.124939
-0.600020	library have functions	-0.124939
-1.071869	return from functions	-0.124939
-0.599548	intrinsic vector functions	-0.124939
-1.588722	the different functions	-0.124939
-1.480777	several different functions	-0.124939
-0.866436	some other functions	-0.124939
-1.042527	while other functions	-0.124939
-0.583346	calls other functions	-0.124939
-1.430039	of which functions	-0.124939
-1.559061	for all functions	-0.124939
-1.305592	if all functions	-0.124939
-0.582169	declare all functions	-0.124939
-1.428411	often used functions	-0.124939
-1.646986	are using functions	-0.124939
-1.226792	the library functions	-0.124939
-0.510966	of library functions	-0.124939
-0.848703	in library functions	-0.124939
-0.510966	most library functions	-0.124939
-0.736736	Intel library functions	-0.124939
-0.510966	any library functions	-0.124939
-0.510966	standard library functions	-0.124939
-0.510966	optimizing library functions	-0.124939
-0.510966	consuming library functions	-0.124939
-1.141094	these two functions	-0.124939
-0.870435	These two functions	-0.124939
-1.439335	as efficient functions	-0.124939
-1.193416	are many functions	-0.124939
-0.478008	contains many functions	-0.425969
-0.557722	Includes many functions	-0.124939
-0.493576	the member functions	-0.425969
-0.380806	and member functions	-0.124939
-0.149937	in member functions	-0.124939
-0.380806	The member functions	-0.124939
-0.277862	make member functions	-0.124939
-0.640930	class member functions	-0.124939
-0.380806	any member functions	-0.124939
-0.459987	virtual member functions	-0.124939
-0.094007	Virtual member functions	-0.124939
-0.149937	Class member functions	-0.124939
-0.380806	Non-static member functions	-0.124939
-1.375920	the critical functions	-0.124939
-0.562120	making critical functions	-0.124939
-1.416030	of these functions	-0.124939
-1.036802	that these functions	-0.124939
-0.553812	Unfortunately, these functions	-0.124939
-1.907615	operating system functions	-0.124939
-1.064034	compiler. Some functions	-0.124939
-1.776716	information about functions	-0.124939
-1.397021	most important functions	-0.124939
-1.392078	the necessary functions	-0.124939
-1.052775	CPU- specific functions	-0.124939
-0.563671	commpage. These functions	-0.124939
-0.563671	_mm. These functions	-0.124939
-0.696217	and virtual functions	-0.124939
-0.696217	with virtual functions	-0.124939
-0.486420	If virtual functions	-0.124939
-0.486420	avoid virtual functions	-0.124939
-0.486420	Avoid virtual functions	-0.124939
-0.593168	If several functions	-0.124939
-1.788449	a few functions	-0.124939
-0.591482	Use inline functions	-0.124939
-0.590171	unwinding. All functions	-0.124939
-0.589672	optimize both functions	-0.124939
-1.521977	more complicated functions	-0.124939
-0.135821	for intrinsic functions	-0.124939
-0.230710	with intrinsic functions	-0.124939
-0.230710	use intrinsic functions	-0.124939
-0.230710	different intrinsic functions	-0.124939
-0.230710	using intrinsic functions	-0.124939
-0.230710	SSE2 intrinsic functions	-0.124939
-0.230710	Use intrinsic functions	-0.124939
-0.064177	Using intrinsic functions	-0.124939
-0.640785	of mathematical functions	-0.124939
-0.322387	and mathematical functions	-0.124939
-0.329745	for mathematical functions	-0.124939
-0.329745	or mathematical functions	-0.124939
-0.464694	common mathematical functions	-0.124939
-0.329745	advanced mathematical functions	-0.124939
-0.329745	computing mathematical functions	-0.124939
-0.587506	contains various functions	-0.124939
-0.802882	and string functions	-0.124939
-0.489311	common string functions	-0.124939
-0.489311	style string functions	-0.124939
-0.872503	The three functions	-0.124939
-0.587271	mode. Make functions	-0.124939
-0.529588	of public functions	-0.124939
-0.529588	All public functions	-0.124939
-0.583055	multiple smaller functions	-0.124939
-0.864181	with C functions	-0.124939
-0.498056	of math functions	-0.124939
-0.498056	common math functions	-0.124939
-0.577899	of inlined functions	-0.124939
-0.715410	to frame functions	-0.124939
-0.498160	than frame functions	-0.124939
-0.571693	optimized. Library functions	-0.124939
-0.335858	functions Virtual functions	-0.124939
-0.335858	pure. Virtual functions	-0.124939
-0.335858	96). Virtual functions	-0.124939
-0.556195	all suitable functions	-0.124939
-0.090015	manipulation Mathematical functions	-0.124939
-0.042679	14.10 Mathematical functions	-0.124939
-0.090015	140). Mathematical functions	-0.124939
-0.042679	12.7 Mathematical functions	-0.425969
-0.182697	code. Intrinsic functions	-0.124939
-0.182697	instructions. Intrinsic functions	-0.124939
-0.182697	case. Intrinsic functions	-0.124939
-0.182697	12.3. Intrinsic functions	-0.124939
-0.550089	between leaf functions	-0.124939
-0.549817	and internal functions	-0.124939
-0.526264	The missing functions	-0.124939
-0.314105	/vms Fastcall functions	-0.124939
-0.314105	compiler). Fastcall functions	-0.124939
-0.525877	identify individual functions	-0.124939
-0.525877	program. Small functions	-0.124939
-0.129331	7.26 Overloaded functions	-0.124939
-0.724548	for speed-critical functions	-0.124939
-0.462419	them. Pure functions	-0.124939
-0.462419	Use fastcall functions	-0.124939
-0.462419	Place non-polymorphic functions	-0.124939
-0.462419	function. Sometimes, functions	-0.124939
-0.462419	Avoid unnecessary functions	-0.124939
-0.357970	// Non-polymorphic functions	-0.124939
-0.357970	or QueryPerformanceCounter functions	-0.124939
-0.357970	function. Leaf functions	-0.124939
-0.357970	or memory-intensive functions	-0.124939
-0.357970	case. Inlined functions	-0.124939
-1.689589	is the only	-0.124939
-2.618433	that the only	-0.124939
-1.663240	about the only	-0.124939
-2.011078	it is only	-0.124939
-1.567001	This is only	-0.124939
-1.851750	this is only	-0.124939
-1.956748	which is only	-0.124939
-1.374206	there is only	-0.124939
-1.538174	variable is only	-0.124939
-1.050644	keyword is only	-0.124939
-0.592338	penalty is only	-0.124939
-0.592338	log(2.0) is only	-0.124939
-0.601211	evaluate a only	-0.124939
-0.601200	etc. of only	-0.124939
-0.596384	one and only	-0.124939
-0.596384	one, and only	-0.124939
-0.596384	automatically, and only	-0.124939
-0.596384	if, and only	-0.124939
-0.596384	thread, and only	-0.124939
-0.901710	dispatcher in only	-0.124939
-0.597832	number. The only	-0.124939
-0.597832	example. The only	-0.124939
-0.597832	aliasing. The only	-0.124939
-2.204906	so that only	-0.124939
-2.409235	may be only	-0.124939
-2.285487	should be only	-0.124939
-2.137643	that are only	-0.124939
-0.595126	size are only	-0.124939
-2.158282	There are only	-0.124939
-0.595126	int) are only	-0.124939
-2.060048	you can only	-0.124939
-1.527127	which can only	-0.124939
-1.055686	We can only	-0.425969
-0.592116	otherwise can only	-0.124939
-0.900762	calculate it only	-0.124939
-0.600853	runtime, if only	-0.124939
-0.600451	grows by only	-0.124939
-1.170912	systems with only	-0.124939
-0.590706	system with only	-0.124939
-1.326196	CPUs with only	-0.124939
-0.590706	pow(x,10) with only	-0.124939
-1.732274	rely on only	-0.124939
-0.862214	can not only	-0.124939
-1.530823	will not only	-0.124939
-0.581142	would not only	-0.124939
-0.581142	include not only	-0.124939
-0.581142	measures not only	-0.124939
-0.581142	precedence, not only	-0.124939
-1.375357	Therefore, you only	-0.124939
-1.344982	will have only	-0.124939
-1.265801	CPUs have only	-0.124939
-0.599666	though this only	-0.124939
-1.318187	that use only	-0.124939
-1.764526	can use only	-0.124939
-1.431717	then use only	-0.124939
-1.623047	called from only	-0.124939
-0.199374	function has only	-0.124939
-0.578834	template has only	-0.124939
-0.578834	computer has only	-0.124939
-1.688157	will make only	-0.124939
-0.599307	smaller functions only	-0.124939
-0.566146	user but only	-0.124939
-0.566146	multiplication but only	-0.124939
-0.834006	automatically but only	-0.124939
-0.566146	needed, but only	-0.124939
-0.566146	relocation, but only	-0.124939
-1.339638	is used only	-0.124939
-1.394727	be used only	-0.124939
-1.726543	are used only	-0.124939
-0.897305	unrolling should only	-0.124939
-1.481794	this example only	-0.124939
-2.070842	by using only	-0.124939
-1.189958	register size only	-0.124939
-0.597730	libraries where only	-0.124939
-0.597730	are possible only	-0.124939
-1.818712	it takes only	-0.124939
-0.816517	memory takes only	-0.124939
-0.556620	C++ takes only	-0.124939
-0.556620	precision takes only	-0.124939
-0.892944	new branch only	-0.124939
-1.443369	is called only	-0.124939
-1.218983	be called only	-0.124939
-0.192277	function called only	-0.425969
-2.183756	For example, only	-0.124939
-1.044951	that take only	-0.124939
-1.140213	may take only	-0.124939
-1.002658	will take only	-0.124939
-0.542285	operations take only	-0.124939
-1.687070	in registers only	-0.124939
-0.595871	language need only	-0.124939
-1.466605	this method only	-0.124939
-0.886308	will work only	-0.124939
-0.593242	XOP, AMD only	-0.124939
-0.592755	this option only	-0.124939
-0.592574	use AVX only	-0.124939
-1.416185	is done only	-0.124939
-1.206694	are done only	-0.124939
-0.404698	and works only	-0.124939
-0.404698	code works only	-0.124939
-0.404698	compiler works only	-0.124939
-0.570533	this works only	-0.124939
-0.570533	method works only	-0.124939
-0.404698	dispatching works only	-0.124939
-0.570533	mechanism works only	-0.124939
-0.404698	14.13b works only	-0.124939
-1.164265	The problem only	-0.124939
-0.873719	that contains only	-0.124939
-0.755939	code contains only	-0.124939
-0.755939	now contains only	-0.124939
-0.591196	implementation would only	-0.124939
-1.161963	can run only	-0.124939
-0.359231	predicted well only	-0.425969
-1.319789	is optimal only	-0.124939
-0.797799	the dispatching only	-0.124939
-1.458528	CPU dispatching only	-0.124939
-0.875867	Windows allows only	-0.124939
-0.587691	such methods only	-0.124939
-1.261194	it needs only	-0.124939
-1.243417	is needed only	-0.124939
-0.768313	pointers requires only	-0.124939
-0.529464	precision requires only	-0.124939
-0.582199	doing things only	-0.124939
-0.515174	that depends only	-0.124939
-0.743836	value depends only	-0.124939
-0.863058	been tested only	-0.124939
-1.380738	be loaded only	-0.124939
-0.509392	compiler. Supports only	-0.124939
-0.509392	sets. Supports only	-0.124939
-0.579479	size comes only	-0.124939
-1.385057	in fact only	-0.124939
-0.574013	subexpression containing only	-0.124939
-1.254514	to handle only	-0.124939
-1.119832	is initialized only	-0.124939
-0.572109	linking includes only	-0.124939
-1.119832	and insert only	-0.124939
-0.569225	to F1 only	-0.124939
-0.569428	other processors, only	-0.124939
-0.977614	is chosen only	-0.124939
-0.566848	2 applies only	-0.124939
-0.055311	is mispredicted only	-0.425969
-0.556443	is allowed only	-0.124939
-0.991773	to hold only	-0.124939
-0.549285	is evaluated only	-0.124939
-0.913295	is executed only	-0.124939
-0.539458	is valid only	-0.124939
-0.785783	is currently only	-0.124939
-0.525582	all. Can only	-0.124939
-0.503398	the services only	-0.124939
-0.462164	by modifying only	-0.124939
-0.357769	simultaneously. Actually, only	-0.124939
-0.357769	it understands only	-0.124939
-1.811675	of the CPU	-0.124939
-2.063855	to the CPU	-0.124939
-1.704849	in the CPU	-0.221849
-1.584068	for the CPU	-0.124939
-1.425146	that the CPU	-0.124939
-2.079145	if the CPU	-0.124939
-1.604795	by the CPU	-0.425969
-1.981939	on the CPU	-0.124939
-1.847146	than the CPU	-0.124939
-1.889997	use the CPU	-0.124939
-1.856850	then the CPU	-0.124939
-1.909128	at the CPU	-0.124939
-1.192048	because the CPU	-0.425969
-1.911159	If the CPU	-0.124939
-1.741955	all the CPU	-0.124939
-1.557261	do the CPU	-0.124939
-1.484489	up the CPU	-0.124939
-1.214315	want the CPU	-0.124939
-1.217575	inside the CPU	-0.124939
-0.586969	both the CPU	-0.425969
-0.662458	replace the CPU	-0.425969
-1.309509	allows the CPU	-0.124939
-1.150044	sets the CPU	-0.124939
-1.345000	Unfortunately, the CPU	-0.124939
-1.042393	prevent the CPU	-0.124939
-0.842241	prevents the CPU	-0.425969
-1.025105	help the CPU	-0.124939
-1.133418	tells the CPU	-0.124939
-0.583277	since the CPU	-0.124939
-1.025105	bypass the CPU	-0.124939
-0.583277	override the CPU	-0.124939
-0.583277	limits the CPU	-0.124939
-1.825101	or a CPU	-0.124939
-1.996863	on a CPU	-0.124939
-1.862315	has a CPU	-0.124939
-1.981546	make a CPU	-0.124939
-1.184470	need a CPU	-0.124939
-0.893154	give a CPU	-0.124939
-0.597064	keeping a CPU	-0.124939
-2.334347	number of CPU	-0.124939
-1.603981	type of CPU	-0.124939
-1.912977	lot of CPU	-0.124939
-1.071733	pitfalls of CPU	-0.124939
-1.068056	history of CPU	-0.124939
-0.901856	approach to CPU	-0.124939
-0.901074	system and CPU	-0.124939
-1.384700	program. The CPU	-0.124939
-1.139732	version. The CPU	-0.124939
-0.585046	processor. The CPU	-0.124939
-0.585046	Implementation The CPU	-0.124939
-0.585046	instruction. The CPU	-0.124939
-0.585046	cases: The CPU	-0.124939
-0.585046	programming. The CPU	-0.124939
-0.585046	mechanism. The CPU	-0.124939
-0.585046	independently. The CPU	-0.124939
-0.585046	renaming. The CPU	-0.124939
-0.585046	newer. The CPU	-0.124939
-0.585046	old. The CPU	-0.124939
-1.784326	check for CPU	-0.124939
-1.485670	intended for CPU	-0.124939
-0.600938	13.1 // CPU	-0.124939
-1.074699	CPUs or CPU	-0.124939
-1.283877	optimization by CPU	-0.124939
-0.597708	dispatch by CPU	-0.124939
-1.282713	code with CPU	-0.124939
-1.287220	library with CPU	-0.124939
-0.893544	concentrated on CPU	-0.124939
-0.597261	research on CPU	-0.124939
-2.345326	rather than CPU	-0.124939
-1.196624	libraries have CPU	-0.124939
-0.600061	spend more CPU	-0.124939
-1.073372	relevant when CPU	-0.124939
-0.593705	set. A CPU	-0.124939
-0.593705	developed. A CPU	-0.124939
-1.593316	look at CPU	-0.124939
-1.431298	between different CPU	-0.124939
-1.167891	only one CPU	-0.124939
-0.877304	and each CPU	-0.124939
-1.307036	in each CPU	-0.124939
-0.588483	AVX using CPU	-0.124939
-0.876348	am using CPU	-0.124939
-1.234786	the Intel CPU	-0.124939
-0.550731	The multiple CPU	-0.124939
-1.173414	with multiple CPU	-0.124939
-0.940444	use multiple CPU	-0.124939
-0.745274	between multiple CPU	-0.124939
-0.565786	Jumps between CPU	-0.124939
-0.565786	discriminating between CPU	-0.124939
-0.565786	discriminates between CPU	-0.124939
-1.956322	is called CPU	-0.124939
-0.595682	functions without CPU	-0.124939
-0.516519	a specific CPU	-0.249877
-0.629489	of specific CPU	-0.124939
-0.167587	for specific CPU	-0.124939
-0.592188	software uses CPU	-0.124939
-0.592319	a known CPU	-0.124939
-0.590893	than optimizing CPU	-0.124939
-1.093864	a particular CPU	-0.124939
-0.588505	keep their CPU	-0.124939
-0.587986	for automatic CPU	-0.124939
-0.201268	with automatic CPU	-0.301030
-0.416535	contains automatic CPU	-0.124939
-0.585608	properly. Many CPU	-0.124939
-1.454263	its own CPU	-0.124939
-1.129162	compiler supports CPU	-0.124939
-0.575041	an unknown CPU	-0.124939
-0.572667	set Automatic CPU	-0.124939
-0.572542	Wikipedia under CPU	-0.124939
-0.569425	have similar CPU	-0.124939
-0.841119	should apply CPU	-0.124939
-0.159727	overriding Intel's CPU	-0.425969
-0.390455	122 13.1 CPU	-0.124939
-0.390455	loops. 13.1 CPU	-0.124939
-0.463437	the newest CPU	-0.124939
-0.549606	of poor CPU	-0.124939
-0.526193	of bad CPU	-0.124939
-0.526193	monitoring options. CPU	-0.124939
-1.008193	vector c: CPU	-0.124939
-0.762659	an explicit CPU	-0.124939
-0.503978	which consumes CPU	-0.124939
-0.504369	processors. Explicit CPU	-0.124939
-0.249443	set. 13.6 CPU	-0.124939
-0.249443	126 13.6 CPU	-0.124939
-0.249443	etc. (Intel CPU	-0.124939
-0.249443	only) (Intel CPU	-0.124939
-0.143167	128 13.7 CPU	-0.124939
-0.143167	129 13.7 CPU	-0.124939
-0.358185	Example 13.2. CPU	-0.124939
-0.358185	use inappropriate CPU	-0.124939
-0.358185	4 (NetBurst) CPU	-0.124939
-1.796007	and the other	-0.124939
-2.578102	in the other	-0.124939
-2.361676	for the other	-0.124939
-1.545561	or the other	-0.124939
-2.209056	with the other	-0.124939
-2.294030	on the other	-0.124939
-1.779074	as the other	-0.124939
-2.106209	than the other	-0.124939
-0.501545	times the other	-0.124939
-0.891187	goes the other	-0.124939
-0.389816	On the other	-0.726999
-0.596068	rarely the other	-0.124939
-1.640359	there is other	-0.124939
-1.673305	result of other	-0.124939
-0.600727	independently of other	-0.124939
-1.072839	costs to other	-0.124939
-0.679117	apply to other	-0.425969
-0.578020	branch and other	-0.124939
-1.252626	arrays and other	-0.124939
-1.232569	integers and other	-0.124939
-1.252626	files and other	-0.124939
-0.856272	sets and other	-0.124939
-0.856272	expressions and other	-0.124939
-0.856272	interface and other	-0.124939
-0.578020	network and other	-0.124939
-0.856272	functions, and other	-0.124939
-1.115043	platforms and other	-0.124939
-0.578020	smart and other	-0.124939
-0.856272	C++, and other	-0.124939
-0.578020	swapping and other	-0.124939
-0.856272	propagation and other	-0.124939
-0.578020	lists and other	-0.124939
-0.856272	database, and other	-0.124939
-0.578020	scanners and other	-0.124939
-0.578020	leaks and other	-0.124939
-0.578020	evictions and other	-0.124939
-0.578020	geometry and other	-0.124939
-0.578020	limitation and other	-0.124939
-0.578020	attacks and other	-0.124939
-0.578020	Library) and other	-0.124939
-1.694625	than in other	-0.124939
-0.594992	different in other	-0.124939
-1.412915	functions in other	-0.124939
-1.412915	running in other	-0.124939
-1.273423	defined in other	-0.124939
-0.594992	prevented in other	-0.124939
-1.058313	found in other	-0.124939
-1.076611	vector. The other	-0.124939
-0.589180	not for other	-0.124939
-1.896021	used for other	-0.124939
-1.308106	variables for other	-0.124939
-1.497945	available for other	-0.124939
-0.589180	resources for other	-0.124939
-1.240141	needed for other	-0.124939
-1.041634	possibility for other	-0.124939
-0.589180	unit for other	-0.124939
-0.589180	card for other	-0.124939
-1.564540	There are other	-0.124939
-1.038363	memory or other	-0.124939
-0.588023	CPU or other	-0.124939
-0.588023	exception or other	-0.124939
-0.875456	disk or other	-0.124939
-0.588023	database, or other	-0.124939
-0.588023	printer or other	-0.124939
-0.901113	disadvantage if other	-0.124939
-0.900652	multiplying by other	-0.124939
-1.116873	compiler with other	-0.124939
-0.199315	used with other	-0.124939
-1.012158	operations with other	-0.124939
-1.493941	compatible with other	-0.124939
-0.199315	contiguous with other	-0.124939
-0.578551	coordination with other	-0.124939
-0.900549	implemented on other	-0.124939
-1.565038	time than other	-0.124939
-1.331838	faster than other	-0.124939
-0.583478	cycles than other	-0.124939
-0.583478	frequency than other	-0.124939
-0.879585	b have other	-0.124939
-1.044385	operands have other	-0.124939
-0.590149	might have other	-0.124939
-0.586927	calling from other	-0.124939
-0.873333	accessible from other	-0.124939
-0.586927	Interference from other	-0.124939
-1.046251	by all other	-0.124939
-0.880862	sets all other	-0.124939
-0.599493	storage, but other	-0.124939
-0.897512	least one other	-0.124939
-0.986581	or no other	-0.124939
-0.352853	if no other	-0.124939
-0.721838	have no other	-0.301030
-0.906678	but no other	-0.124939
-0.536647	produce no other	-0.124939
-1.114544	to each other	-0.124939
-1.342646	for each other	-0.124939
-0.534349	from each other	-0.124939
-0.208173	near each other	-0.522879
-0.897010	or do other	-0.124939
-1.153929	on most other	-0.124939
-0.588958	than most other	-0.124939
-0.598888	any size other	-0.124939
-1.193386	a library other	-0.124939
-1.357409	in two other	-0.124939
-0.987255	are also other	-0.124939
-0.778234	size. In other	-0.124939
-0.903200	calculations. In other	-0.124939
-0.535161	parameter. In other	-0.124939
-0.535161	for. In other	-0.124939
-0.535161	safe. In other	-0.124939
-0.535161	__intel_cpu_features_init_x(). In other	-0.124939
-0.297928	to any other	-0.124939
-0.419786	and any other	-0.124939
-0.670176	for any other	-0.124939
-0.670176	or any other	-0.124939
-0.202256	by any other	-0.301030
-0.419786	as any other	-0.124939
-0.419786	not any other	-0.124939
-0.592819	have any other	-0.124939
-0.297928	from any other	-0.124939
-0.161032	call any other	-0.124939
-0.419786	insert any other	-0.124939
-1.131900	to some other	-0.124939
-1.131900	and some other	-0.124939
-0.888573	two. Some other	-0.124939
-0.541874	overhead while other	-0.124939
-0.541874	x, while other	-0.124939
-0.541874	Func1, while other	-0.124939
-1.270825	that calls other	-0.124939
-0.593594	and several other	-0.124939
-1.636973	can cause other	-0.124939
-0.587927	makes various other	-0.124939
-1.350610	can reduce other	-0.124939
-0.584644	developers choose other	-0.124939
-0.583760	algebra) require other	-0.124939
-0.577182	predictor. On other	-0.124939
-0.574612	saved. Any other	-0.124939
-0.845862	of sizes other	-0.124939
-0.549851	compilers. Several other	-0.124939
-0.804280	class c1 other	-0.124939
-0.540535	performance over other	-0.124939
-0.358342	which affects other	-0.124939
-2.420366	of the instruction	-0.124939
-1.788046	to the instruction	-0.124939
-2.762356	in the instruction	-0.124939
-1.934604	for the instruction	-0.425969
-0.900020	details of instruction	-0.124939
-0.600522	Lists of instruction	-0.124939
-1.372994	processors and instruction	-0.124939
-1.193742	2. The instruction	-0.124939
-0.897870	elements. The instruction	-0.124939
-1.296503	used if instruction	-0.124939
-1.800188	depending on instruction	-0.124939
-0.592888	efficient. This instruction	-0.124939
-0.592888	set. This instruction	-0.124939
-0.592888	view. This instruction	-0.124939
-1.356792	has an instruction	-0.124939
-0.596086	insert an instruction	-0.124939
-1.490962	for this instruction	-0.124939
-1.241938	with this instruction	-0.124939
-0.589581	support this instruction	-0.124939
-1.723839	only when instruction	-0.124939
-0.947020	for different instruction	-0.301030
-2.580250	the same instruction	-0.124939
-1.212122	on which instruction	-0.124939
-0.582763	checks which instruction	-0.124939
-0.582763	detect which instruction	-0.124939
-1.569368	has no instruction	-0.124939
-1.932662	for each instruction	-0.124939
-1.419889	the 64-bit instruction	-0.124939
-1.187538	best possible instruction	-0.124939
-1.485634	A branch instruction	-0.124939
-1.627964	64 bit instruction	-0.124939
-1.917720	a new instruction	-0.124939
-1.079151	of these instruction	-0.425969
-0.118198	the SSE2 instruction	-0.878266
-0.246068	and SSE2 instruction	-0.124939
-0.067618	The SSE2 instruction	-0.602060
-0.246068	or SSE2 instruction	-0.124939
-0.246068	145 SSE2 instruction	-0.124939
-0.246068	-msse SSE2 instruction	-0.124939
-0.595341	AVX 32 instruction	-0.124939
-0.485605	the available instruction	-0.425969
-0.594064	Details about instruction	-0.124939
-1.263628	inline assembly instruction	-0.124939
-1.392009	the necessary instruction	-0.124939
-1.069674	the specific instruction	-0.124939
-1.458310	a specific instruction	-0.124939
-0.321162	the AVX instruction	-0.425969
-0.546283	The AVX instruction	-0.124939
-0.152039	12.1 AVX instruction	-0.425969
-0.410962	the supported instruction	-0.124939
-0.410962	as supported instruction	-0.124939
-0.410962	about supported instruction	-0.124939
-0.158583	Get supported instruction	-0.425969
-0.410962	minimum supported instruction	-0.124939
-0.410962	Detect supported instruction	-0.124939
-1.276806	nontemporal write instruction	-0.124939
-1.093012	a particular instruction	-0.124939
-1.359970	The next instruction	-0.124939
-0.587490	of various instruction	-0.124939
-1.162435	on what instruction	-0.124939
-0.265303	and later instruction	-0.602060
-0.112097	or later instruction	-0.602060
-0.466319	a higher instruction	-0.124939
-0.387138	or higher instruction	-0.124939
-0.387138	any higher instruction	-0.124939
-0.387138	next higher instruction	-0.124939
-0.525934	the AVX2 instruction	-0.124939
-0.525934	The AVX2 instruction	-0.124939
-0.596838	the x86 instruction	-0.425969
-0.469311	bit x86 instruction	-0.124939
-1.556992	the appropriate instruction	-0.124939
-0.865358	backwards compatible instruction	-0.124939
-0.859438	particularly slow instruction	-0.124939
-0.804026	the desired instruction	-0.124939
-0.856722	a given instruction	-0.124939
-0.572303	sections SSE instruction	-0.124939
-0.172468	the SSE4.1 instruction	-0.425969
-0.658347	a newer instruction	-0.124939
-0.462672	The newer instruction	-0.124939
-1.156655	the current instruction	-0.124939
-0.978797	a low instruction	-0.124939
-0.825712	a lower instruction	-0.124939
-0.270735	the CPUID instruction	-0.124939
-0.559713	the latest instruction	-0.425969
-0.208183	bit scan instruction	-0.301030
-0.549250	-msse2 SSE3 instruction	-0.124939
-0.377711	the newest instruction	-0.124939
-0.274262	The newest instruction	-0.124939
-0.549802	The prefetch instruction	-0.124939
-0.539740	the AVX512 instruction	-0.124939
-0.131985	the CISC instruction	-0.124939
-0.060998	The CISC instruction	-0.425969
-0.131985	with CISC instruction	-0.124939
-0.762078	the highest instruction	-0.124939
-0.526248	the x86-64 instruction	-0.124939
-0.129250	The MOVNTQ instruction	-0.124939
-0.762078	the selected instruction	-0.124939
-0.724515	the specified instruction	-0.124939
-0.089985	the AVX-512 instruction	-0.124939
-0.042666	12.2 AVX-512 instruction	-0.425969
-0.462401	Error: lowest instruction	-0.124939
-0.462401	the FMA4 instruction	-0.124939
-0.657922	the corresponding instruction	-0.124939
-0.462401	an EMMS instruction	-0.124939
-0.357956	number (the instruction	-0.124939
-0.357956	a blend instruction	-0.124939
-0.357956	(or later) instruction	-0.124939
-0.357956	scan forward) instruction	-0.124939
-0.357956	difference. Newest instruction	-0.124939
-0.357956	Pentium Pro instruction	-0.124939
-2.725294	to the point	-0.124939
-1.849395	but the point	-0.124939
-1.444999	sure to point	-0.124939
-0.901474	edx = point	-0.124939
-1.597524	makes it point	-0.124939
-1.733157	it will point	-0.124939
-0.001484	the floating point	-0.159701
-0.001484	a floating point	-0.335792
-0.002146	of floating point	-0.124939
-0.035804	to floating point	-0.204120
-0.003870	and floating point	-0.124939
-0.009741	in floating point	-0.124939
-0.019706	The floating point	-0.124939
-0.030465	for floating point	-0.124939
-0.019706	that floating point	-0.124939
-0.019706	or floating point	-0.124939
-0.006470	with floating point	-0.124939
-0.006470	on floating point	-0.301030
-0.009741	than floating point	-0.124939
-0.019706	have floating point	-0.124939
-0.019706	A floating point	-0.124939
-0.004843	from floating point	-0.425969
-0.006470	make floating point	-0.301030
-0.019706	different floating point	-0.124939
-0.019706	all floating point	-0.124939
-0.019706	one floating point	-0.124939
-0.019706	each floating point	-0.124939
-0.006470	two floating point	-0.124939
-0.009741	any floating point	-0.124939
-0.009741	between floating point	-0.425969
-0.006470	makes floating point	-0.301030
-0.019706	new floating point	-0.124939
-0.009741	making floating point	-0.425969
-0.009741	does floating point	-0.425969
-0.019706	big floating point	-0.124939
-0.006470	eight floating point	-0.301030
-0.019706	contains floating point	-0.124939
-0.009741	doing floating point	-0.425969
-0.019706	fast floating point	-0.124939
-0.019706	generate floating point	-0.124939
-0.019706	positive floating point	-0.124939
-0.019706	100 floating point	-0.124939
-0.019706	causes floating point	-0.124939
-0.019706	mix floating point	-0.124939
-0.019706	Any floating point	-0.124939
-0.019706	entire floating point	-0.124939
-0.019706	style floating point	-0.124939
-0.019706	variables, floating point	-0.124939
-0.019706	strict floating point	-0.124939
-0.009741	nonzero floating point	-0.124939
-0.019706	larger floating point	-0.124939
-0.019706	additional floating point	-0.124939
-0.009741	manipulating floating point	-0.425969
-0.019706	Catch floating point	-0.124939
-0.019706	mispredictions, floating point	-0.124939
-0.019706	precise floating point	-0.124939
-0.019706	Non-strict floating point	-0.124939
-0.019706	native floating point	-0.124939
-0.019706	Reset floating point	-0.124939
-0.019706	relaxed floating point	-0.124939
-0.019706	relax floating point	-0.124939
-0.019706	FMA3 floating point	-0.124939
-0.895847	a possible point	-0.124939
-0.199654	types cannot point	-0.425969
-0.850911	objects they point	-0.124939
-0.575187	texts they point	-0.124939
-0.594780	Induction++; ; point	-0.124939
-0.012385	// Floating point	-0.124939
-0.012385	double Floating point	-0.124939
-0.012385	n.a. Floating point	-0.124939
-0.012385	variables Floating point	-0.124939
-0.012385	systems. Floating point	-0.124939
-0.012385	division Floating point	-0.124939
-0.012385	shift Floating point	-0.124939
-0.012385	cycles. Floating point	-0.124939
-0.012385	purposes. Floating point	-0.124939
-0.012385	expressions. Floating point	-0.124939
-0.012385	parameters. Floating point	-0.124939
-0.012385	integer. Floating point	-0.124939
-0.006148	14.6 Floating point	-0.425969
-0.012385	105. Floating point	-0.124939
-0.006148	7.3 Floating point	-0.425969
-0.012385	organized. Floating point	-0.124939
-0.012385	cycles). Floating point	-0.124939
-0.012385	79 Floating point	-0.124939
-0.527101	floating 26 point	-0.124939
-0.527101	common entry point	-0.124939
-0.143355	the decimal point	-0.124939
-0.143355	a decimal point	-0.124939
-0.358801	a technological point	-0.124939
-1.562448	is the loop	-0.124939
-1.796320	of the loop	-0.124939
-1.916272	and the loop	-0.124939
-1.979095	in the loop	-0.425969
-2.000499	for the loop	-0.124939
-1.777017	that the loop	-0.124939
-1.429573	or the loop	-0.124939
-1.304867	if the loop	-0.271067
-1.922960	by the loop	-0.124939
-2.031113	when the loop	-0.124939
-1.431939	then the loop	-0.124939
-1.828288	from the loop	-0.124939
-1.882529	If the loop	-0.124939
-1.733023	where the loop	-0.124939
-1.280012	before the loop	-0.124939
-0.406820	out the loop	-0.522879
-1.470966	up the loop	-0.124939
-1.429573	avoid the loop	-0.124939
-0.628653	inside the loop	-0.204120
-1.591658	unless the loop	-0.124939
-1.362110	after the loop	-0.124939
-1.335707	off the loop	-0.124939
-0.633925	outside the loop	-0.124939
-0.199905	unroll the loop	-0.124939
-0.862677	predict the loop	-0.124939
-0.862677	execute the loop	-0.124939
-0.199905	unrolling the loop	-0.124939
-1.339799	vectorize the loop	-0.124939
-0.581384	evaluate the loop	-0.124939
-0.581384	comparing the loop	-0.124939
-0.581384	increment the loop	-0.124939
-0.581384	Unrolling the loop	-0.124939
-2.052572	is a loop	-0.124939
-1.286176	of a loop	-0.301030
-1.558976	in a loop	-0.249877
-1.747495	that a loop	-0.124939
-1.702607	if a loop	-0.124939
-1.317308	use a loop	-0.124939
-1.735977	make a loop	-0.124939
-1.091168	If a loop	-0.124939
-1.463784	example, a loop	-0.124939
-0.864404	out a loop	-0.124939
-1.390349	inside a loop	-0.124939
-0.245405	unroll a loop	-0.124939
-0.864404	processors, a loop	-0.124939
-0.582286	vectorize a loop	-0.124939
-0.582286	Unrolling a loop	-0.124939
-0.582286	incrementing a loop	-0.124939
-0.897537	calculations of loop	-0.124939
-0.124628	top of loop	-0.602060
-1.232379	} The loop	-0.124939
-1.216787	loop. The loop	-0.124939
-1.135469	calculations. The loop	-0.124939
-1.135469	not. The loop	-0.124939
-1.026702	value. The loop	-0.124939
-0.583853	5. The loop	-0.124939
-0.583853	better. The loop	-0.124939
-0.583853	eax. The loop	-0.124939
-0.583853	1000. The loop	-0.124939
-0.583853	eax,0. The loop	-0.124939
-0.583853	freely. The loop	-0.124939
-0.583853	7.30b. The loop	-0.124939
-1.330917	{ // loop	-0.301030
-0.591679	rows // loop	-0.124939
-0.882570	x^10 // loop	-0.124939
-1.075304	variable as loop	-0.124939
-0.974185	} This loop	-0.124939
-1.058827	then this loop	-0.124939
-1.058827	optimize this loop	-0.124939
-1.404875	time. A loop	-0.124939
-0.875769	well. A loop	-0.124939
-0.875769	prediction. A loop	-0.124939
-2.239591	is no loop	-0.124939
-0.897023	power using loop	-0.124939
-1.681104	most efficient loop	-0.124939
-0.064682	Roll out loop	-0.823909
-0.235812	the while loop	-0.124939
-1.394416	a big loop	-0.124939
-0.594111	The c loop	-0.124939
-0.592495	inside another loop	-0.124939
-0.171770	the innermost loop	-0.124939
-0.245822	critical innermost loop	-0.301030
-1.348639	the whole loop	-0.124939
-1.012388	a special loop	-0.124939
-0.579162	; repeat loop	-0.124939
-0.869796	the maximum loop	-0.124939
-1.029301	The maximum loop	-0.124939
-0.474192	the message loop	-0.124939
-0.474192	a message loop	-0.124939
-0.827004	simple variables, loop	-0.124939
-0.540912	use excessive loop	-0.124939
-0.504460	The unrolled loop	-0.124939
-0.463130	much. Excessive loop	-0.124939
-0.463130	avoiding infinite loop	-0.124939
-0.659066	// Initialize loop	-0.124939
-0.463130	// Increment loop	-0.124939
-0.358529	temporary intermediates, loop	-0.124939
-0.358529	integer power, loop	-0.124939
-0.358529	The i<20 loop	-0.124939
-0.358529	// Main loop	-0.124939
-0.600285	_MSC_VER // If	-0.124939
-1.370269	... } If	-0.124939
-0.595718	exception. 64 If	-0.124939
-0.960603	point code. If	-0.124939
-0.538736	error code. If	-0.124939
-0.538736	simplest code. If	-0.124939
-0.784511	application-specific code. If	-0.124939
-0.593322	virtual function. If	-0.124939
-0.743430	program memory. If	-0.124939
-0.743430	into memory. If	-0.124939
-0.857386	static memory. If	-0.124939
-1.041676	be used. If	-0.124939
-1.095660	data cache. If	-0.124939
-0.539758	level-3 cache. If	-0.124939
-1.206266	64-bit systems. If	-0.124939
-0.539529	some systems. If	-0.124939
-0.534495	not efficient. If	-0.124939
-0.777069	equally efficient. If	-0.124939
-0.938404	instruction set. If	-0.124939
-0.484126	each set. If	-0.124939
-0.584898	some compilers. If	-0.124939
-1.469357	is called. If	-0.124939
-1.428148	the loop. If	-0.124939
-1.019578	smart pointer. If	-0.124939
-1.119188	cache size. If	-0.124939
-1.453089	function calls. If	-0.124939
-1.225259	32-bit mode. If	-0.124939
-1.104961	the object. If	-0.124939
-0.803945	function library. If	-0.124939
-0.489827	point library. If	-0.124939
-1.502385	clock cycles. If	-0.124939
-0.482079	different thread. If	-0.124939
-0.689201	another thread. If	-0.124939
-0.848044	test purposes. If	-0.124939
-0.991685	following way. If	-0.124939
-0.842846	a vector. If	-0.124939
-0.836690	local references. If	-0.124939
-0.569119	= u; If	-0.124939
-0.837257	load address. If	-0.124939
-0.801976	the CPU. If	-0.124939
-0.461841	non-Intel CPU. If	-0.124939
-0.462080	a problem. If	-0.124939
-0.838393	this problem. If	-0.124939
-0.564935	sequential order. If	-0.124939
-0.564262	are inefficient. If	-0.124939
-0.830522	was executed. If	-0.124939
-1.233235	template parameter. If	-0.124939
-0.564262	endian storage. If	-0.124939
-0.823620	source file. If	-0.124939
-0.823620	a register. If	-0.124939
-1.242388	execution units. If	-0.124939
-0.560135	a branch. If	-0.124939
-0.413911	above table. If	-0.124939
-0.413911	linkage table. If	-0.124939
-0.584550	threads simultaneously. If	-0.124939
-0.414217	seemingly simultaneously. If	-0.124939
-0.703805	an integer. If	-0.124939
-0.414217	nearest integer. If	-0.124939
-0.554886	loaded anyway. If	-0.124939
-0.555309	page 16. If	-0.124939
-0.413911	32-bit number. If	-0.124939
-0.413911	signed number. If	-0.124939
-0.547984	or constant. If	-0.124939
-1.117666	branch prediction. If	-0.124939
-0.802674	particular application. If	-0.124939
-1.080139	data members. If	-0.124939
-1.126003	dependency chain. If	-0.124939
-0.389517	and again. If	-0.124939
-0.656920	back again. If	-0.124939
-1.020756	not overlap. If	-0.124939
-0.548470	many branches. If	-0.124939
-0.991827	to maintain. If	-0.124939
-0.991827	the future. If	-0.124939
-0.357516	other factor. If	-0.124939
-0.599428	unroll factor. If	-0.124939
-0.538502	lower priority. If	-0.124939
-0.785104	at www.agner.org/optimize/asmlib.zip. If	-0.124939
-0.538502	be necessary. If	-0.124939
-0.911038	subexpression elimination If	-0.124939
-0.538502	memory addresses. If	-0.124939
-0.538502	<< 5. If	-0.124939
-0.760013	work better. If	-0.124939
-0.524656	cleaned up. If	-0.124939
-0.953603	is declared. If	-0.124939
-0.760013	is running. If	-0.124939
-0.953603	identification (RTTI) If	-0.124939
-0.313543	operand first. If	-0.124939
-0.313543	come first. If	-0.124939
-0.524656	See www.agner.org/optimize/cppexamples.zip. If	-0.124939
-0.524656	oriented programs. If	-0.124939
-0.313543	reliable results. If	-0.124939
-0.442734	reproducible results. If	-0.124939
-0.722621	an addition. If	-0.124939
-0.502518	of code). If	-0.124939
-0.722621	from errors. If	-0.124939
-0.106858	AVX part. If	-0.425969
-0.722621	chapter 12. If	-0.124939
-0.502518	too long. If	-0.124939
-0.830522	the same. If	-0.124939
-0.722621	is slow. If	-0.124939
-0.502518	set 0x1C. If	-0.124939
-0.461364	to CriticalFunction. If	-0.124939
-0.461364	to 15. If	-0.124939
-0.461364	following cases: If	-0.124939
-0.656296	|= 0x20; If	-0.124939
-0.461364	these methods. If	-0.124939
-0.461364	has hyperthreading. If	-0.124939
-0.142849	FIFO manner? If	-0.124939
-0.142849	FILO manner? If	-0.124939
-0.656296	to read. If	-0.124939
-0.656296	be obtained. If	-0.124939
-0.461364	deleting containers. If	-0.124939
-0.656296	250 ms. If	-0.124939
-0.065577	been added? If	-0.425969
-0.461364	so. 58 If	-0.124939
-0.656296	page 105). If	-0.124939
-0.656296	and BSD. If	-0.124939
-0.357140	set extensions. If	-0.124939
-0.357140	a macro. If	-0.124939
-0.357140	1-bit removed. If	-0.124939
-0.357140	compile time? If	-0.124939
-0.357140	same class). If	-0.124939
-0.357140	different speeds. If	-0.124939
-0.357140	above. 7. If	-0.124939
-0.357140	= lookup[b]; If	-0.124939
-0.357140	or __debugbreak();. If	-0.124939
-0.357140	numbered consecutively? If	-0.124939
-0.357140	= n∙(n-1)!. If	-0.124939
-0.357140	page 62. If	-0.124939
-0.357140	was coded. If	-0.124939
-0.357140	a key? If	-0.124939
-0.357140	more complicated. If	-0.124939
-0.357140	cryptography (www.intel.com). If	-0.124939
-0.357140	or remotely. If	-0.124939
-0.357140	natural ordering? If	-0.124939
-0.357140	file stub. If	-0.124939
-0.357140	to calculate. If	-0.124939
-0.357140	systems). 42 If	-0.124939
-0.357140	nontemporal writes. If	-0.124939
-0.357140	logical sequence. If	-0.124939
-0.357140	time measurement. If	-0.124939
-0.357140	for analysis. If	-0.124939
-0.357140	multiple elements? If	-0.124939
-0.357140	or references: If	-0.124939
-0.357140	vector. 6. If	-0.124939
-0.357140	the pipeline. If	-0.124939
-0.357140	is stored? If	-0.124939
-0.357140	} 152 If	-0.124939
-0.357140	of ways). If	-0.124939
-0.357140	is considerable. If	-0.124939
-0.357140	: 2.5f; If	-0.124939
-0.357140	+= sum2; If	-0.124939
-0.357140	been allocated. If	-0.124939
-0.684183	list of which	-0.301030
-1.558784	choice of which	-0.124939
-0.596409	recommendation of which	-0.124939
-0.596409	indication of which	-0.124939
-0.596409	reports of which	-0.124939
-0.893433	do and which	-0.124939
-0.597205	allowed and which	-0.124939
-0.597205	costly and which	-0.124939
-0.597205	situations, and which	-0.124939
-0.709347	function in which	-0.249877
-1.803515	code in which	-0.124939
-0.106000	order in which	-0.492916
-0.873820	thread in which	-0.124939
-0.587178	brackets in which	-0.124939
-1.646593	a function which	-0.124939
-1.293809	library function which	-0.124939
-1.733585	member function which	-0.124939
-1.227887	another function which	-0.124939
-0.586413	API function which	-0.124939
-1.034687	processors on which	-0.124939
-1.680241	based on which	-0.124939
-0.201008	models on which	-0.124939
-0.586716	opinions on which	-0.124939
-1.073919	error code which	-0.124939
-0.600122	new compiler which	-0.124939
-1.659280	compile time which	-0.124939
-1.070766	stack memory which	-0.124939
-0.599430	address at which	-0.124939
-0.599120	that functions which	-0.124939
-2.248360	the CPU which	-0.124939
-1.192461	template class which	-0.124939
-1.355924	a double which	-0.124939
-1.511951	function pointer which	-0.124939
-0.581707	'this' pointer which	-0.124939
-0.597727	point library which	-0.124939
-0.835254	of i which	-0.124939
-1.568458	shared object which	-0.124939
-1.830534	a variable which	-0.124939
-1.364992	memory address which	-0.124939
-0.579049	higher address which	-0.124939
-0.596289	following example, which	-0.124939
-0.596013	single bit which	-0.124939
-1.404669	vector register which	-0.124939
-0.889689	find out which	-0.124939
-1.898187	operating system which	-0.124939
-0.594055	conversion instructions which	-0.124939
-0.594532	profilers available which	-0.124939
-1.443388	information about which	-0.124939
-0.543294	recommendation about which	-0.124939
-0.543294	debate about which	-0.124939
-0.593831	Pentium CPUs which	-0.124939
-0.884381	8-bit integers which	-0.124939
-1.690365	is known which	-0.124939
-0.591421	debugging support which	-0.124939
-1.317418	can calculate which	-0.124939
-0.588318	| operator which	-0.124939
-0.897925	to see which	-0.124939
-0.500670	and see which	-0.124939
-0.584964	and directives which	-0.124939
-0.584774	77) shows which	-0.124939
-0.583339	complicated process which	-0.124939
-1.130314	extra overhead which	-0.124939
-1.219635	a profiler which	-0.124939
-0.578650	shift operation which	-0.124939
-0.944474	the code, which	-0.124939
-0.724369	intermediate code, which	-0.124939
-0.576330	three conditions which	-0.124939
-0.851592	when testing which	-0.124939
-0.303453	to predict which	-0.124939
-0.925379	is discussed which	-0.124939
-0.473747	also discussed which	-0.124939
-0.571811	to consider which	-0.124939
-0.837726	C++ compiler, which	-0.124939
-0.839587	it checks which	-0.124939
-1.216779	dependency chain which	-0.124939
-0.561296	access non-sequential which	-0.124939
-0.555773	special trick which	-0.124939
-1.041730	32-bit integers, which	-0.124939
-0.555427	line size, which	-0.124939
-0.555773	MMX registers, which	-0.124939
-0.801885	automatically detect which	-0.124939
-0.550109	on deciding which	-0.124939
-0.539490	a latency which	-0.124939
-0.342890	is true, which	-0.425969
-0.313775	start up, which	-0.124939
-0.313775	filled up, which	-0.124939
-1.006433	in advance which	-0.124939
-0.525161	5) {} which	-0.124939
-0.954960	by default, which	-0.124939
-0.831544	of abstraction which	-0.124939
-0.502997	program optimization, which	-0.124939
-0.502997	point comparisons, which	-0.124939
-0.203641	the stack, which	-0.124939
-0.502997	Library (STL) which	-0.124939
-0.502997	to decide which	-0.124939
-0.502997	and references, which	-0.124939
-0.461800	OR operator, which	-0.124939
-0.461800	to -56 which	-0.124939
-0.461800	family number, which	-0.124939
-0.461800	the CPU, which	-0.124939
-0.461800	unpredictable intervals which	-0.124939
-0.461800	the array, which	-0.124939
-0.656980	an interpreter which	-0.124939
-0.461800	a division, which	-0.124939
-0.461800	integer comparison, which	-0.124939
-0.656980	loop counter, which	-0.124939
-0.461800	function decides which	-0.124939
-0.656980	garbage collector which	-0.124939
-0.656980	with certainty which	-0.124939
-0.142953	& operation, which	-0.124939
-0.142953	shift operation, which	-0.124939
-0.357483	is moved, which	-0.124939
-0.357483	be mispredicted, which	-0.124939
-0.357483	library (DLL) which	-0.124939
-0.357483	by x<<3, which	-0.124939
-0.357483	generic branch, which	-0.124939
-0.357483	library asmlib, which	-0.124939
-0.357483	compile-time polymorphism, which	-0.124939
-0.357483	an attribute which	-0.124939
-0.357483	intermediate results, which	-0.124939
-0.357483	even matters, which	-0.124939
-0.357483	Linux: -ffunction-sections) which	-0.124939
-0.357483	for everything, which	-0.124939
-0.357483	language output, which	-0.124939
-0.357483	Basic .NET, which	-0.124939
-0.357483	in a[] which	-0.124939
-0.357483	n-1 multiplications, which	-0.124939
-0.357483	a bit-mask which	-0.124939
-0.357483	variable (eax) which	-0.124939
-0.357483	CPU model, which	-0.124939
-0.357483	calling WritePrivateProfileString, which	-0.124939
-0.357483	identification (RTTI), which	-0.124939
-0.357483	or YMM) which	-0.124939
-0.357483	constant 2.5, which	-0.124939
-2.512361	This is all	-0.124939
-2.143844	which is all	-0.124939
-1.448089	size of all	-0.124939
-1.361532	values of all	-0.124939
-1.428479	care of all	-0.124939
-1.071613	rid of all	-0.124939
-1.065444	array to all	-0.124939
-1.607743	pointers to all	-0.124939
-1.607743	access to all	-0.124939
-0.893884	information to all	-0.124939
-0.597432	keyword to all	-0.124939
-1.538081	applied to all	-0.124939
-1.279743	file and all	-0.124939
-0.597718	AVX2 and all	-0.124939
-0.597718	statement and all	-0.124939
-0.597718	true, and all	-0.124939
-1.605132	available in all	-0.124939
-1.064412	precision in all	-0.124939
-0.893188	counters in all	-0.124939
-1.069280	deallocated in all	-0.124939
-0.597081	permissible in all	-0.124939
-0.860405	memory for all	-0.124939
-1.758631	used for all	-0.124939
-0.860405	method for all	-0.124939
-0.580193	stack for all	-0.124939
-0.580193	best for all	-0.124939
-1.421250	option for all	-0.124939
-1.640417	check for all	-0.124939
-0.580193	allocation for all	-0.124939
-0.860405	made for all	-0.124939
-0.584480	choice for all	-0.124939
-0.860405	PLT for all	-0.124939
-0.580193	GOT for all	-0.124939
-0.580193	c1 for all	-0.124939
-0.580193	exist for all	-0.124939
-2.105377	is that all	-0.124939
-1.294025	sure that all	-0.425969
-1.045235	important that all	-0.124939
-1.639436	means that all	-0.124939
-1.154146	requires that all	-0.124939
-1.314780	sense that all	-0.124939
-0.869143	specifies that all	-0.124939
-0.869143	Check that all	-0.124939
-0.584755	verify that all	-0.124939
-0.584755	guarantee that all	-0.124939
-1.470720	only if all	-0.124939
-1.406810	loop if all	-0.124939
-1.256134	efficient if all	-0.124939
-1.321325	But if all	-0.124939
-0.586573	zero if all	-0.124939
-0.586573	runtime if all	-0.124939
-1.493592	used by all	-0.124939
-0.594628	should by all	-0.124939
-1.673207	supported by all	-0.124939
-0.569189	data with all	-0.124939
-0.839662	version with all	-0.124939
-0.569189	out with all	-0.124939
-0.569189	line with all	-0.124939
-0.569189	works with all	-0.124939
-1.427415	compatible with all	-0.124939
-0.569189	Even with all	-0.124939
-0.485130	AND'ed with all	-0.124939
-0.197344	Works with all	-0.425969
-0.824775	version on all	-0.124939
-0.480138	operations on all	-0.124939
-0.561140	assembly on all	-0.124939
-0.707268	work on all	-0.124939
-0.824775	works on all	-0.124939
-0.363378	supported on all	-0.124939
-1.091377	well on all	-0.124939
-1.156388	turn on all	-0.124939
-0.561140	efficiently on all	-0.124939
-0.561140	incurred on all	-0.124939
-0.600236	though not all	-0.124939
-0.600046	end when all	-0.124939
-0.593836	example, then all	-0.124939
-0.886795	first, then all	-0.124939
-0.579994	not at all	-0.124939
-1.036117	used at all	-0.124939
-0.579994	evaluated at all	-0.124939
-0.579994	visible at all	-0.124939
-1.690157	will make all	-0.124939
-0.585025	double because all	-0.124939
-0.585025	numbers because all	-0.124939
-0.869664	costly because all	-0.124939
-0.892957	needed before all	-0.124939
-1.063231	can call all	-0.124939
-2.187367	For example, all	-0.124939
-1.707137	to test all	-0.124939
-0.593896	4, while all	-0.124939
-1.633934	can cause all	-0.124939
-0.591461	d would all	-0.124939
-1.405585	to store all	-0.124939
-0.810042	may store all	-0.124939
-0.590632	These addresses all	-0.124939
-1.155924	can replace all	-0.124939
-0.542651	and sets all	-0.124939
-0.542651	constructor sets all	-0.124939
-0.588576	handler needs all	-0.124939
-1.561864	// Make all	-0.124939
-0.873886	above examples all	-0.124939
-0.873886	and last all	-0.124939
-0.694346	only after all	-0.124939
-0.485265	array after all	-0.124939
-0.485265	needed after all	-0.124939
-0.585901	not load all	-0.124939
-1.038741	turning off all	-0.124939
-0.448393	processors. Supports all	-0.124939
-0.448393	source. Supports all	-0.124939
-0.448393	workaround. Supports all	-0.124939
-0.465775	by inlining all	-0.425969
-0.579919	package, including all	-0.124939
-0.579708	without checking all	-0.124939
-0.578718	and prevents all	-0.124939
-0.853143	by testing all	-0.124939
-1.228110	by copying all	-0.124939
-0.490994	list causes all	-0.124939
-0.490994	stride causes all	-0.124939
-1.308317	reason why all	-0.124939
-0.569567	object. Obviously, all	-0.124939
-0.565761	to contain all	-0.124939
-0.566614	vector stores all	-0.124939
-0.566102	transfer across all	-0.124939
-0.164973	in almost all	-0.124939
-0.556353	pointer. Likewise, all	-0.124939
-0.556782	has saved all	-0.124939
-1.080381	to remove all	-0.124939
-0.549674	and declare all	-0.124939
-0.540204	to select all	-0.124939
-0.539915	less. Fortunately, all	-0.124939
-0.526024	fma4intrin.h (Gnu) all	-0.124939
-0.504262	or manipulate all	-0.124939
-0.503818	C++ language, all	-0.124939
-0.107032	that scans all	-0.425969
-0.504262	not solve all	-0.124939
-0.504262	vectorization Not all	-0.124939
-0.658150	to join all	-0.124939
-0.658150	to distribute all	-0.124939
-0.358070	to pool all	-0.124939
-0.358070	are removed, all	-0.124939
-0.358070	you analyze all	-0.124939
-2.385887	a function but	-0.124939
-1.768621	more time but	-0.124939
-1.281568	of all but	-0.124939
-0.895298	by Intel but	-0.124939
-0.598189	not i but	-0.124939
-1.476559	of C++ but	-0.124939
-0.597519	specific order but	-0.124939
-0.596088	contrived example, but	-0.124939
-0.888665	under test but	-0.124939
-1.844493	the user but	-0.124939
-0.887969	physical processors but	-0.124939
-1.284333	= a, but	-0.124939
-0.883901	point overflow but	-0.124939
-0.588158	Mac programs but	-0.124939
-1.085332	most cases, but	-0.124939
-1.158087	some cases, but	-0.124939
-0.730063	simplest cases, but	-0.124939
-0.874735	a multiplication but	-0.124939
-0.541524	arrays automatically but	-0.124939
-0.541524	14.14b automatically but	-0.124939
-0.657439	the function, but	-0.124939
-0.462093	optimized function, but	-0.124939
-0.462093	latter function, but	-0.124939
-0.279236	intrinsic functions, but	-0.124939
-0.383423	similar functions, but	-0.124939
-0.383423	pure functions, but	-0.124939
-0.503291	efficient code, but	-0.124939
-0.503291	superfluous code, but	-0.124939
-0.357768	the time, but	-0.124939
-0.357768	compile time, but	-0.124939
-0.357768	compile- time, but	-0.124939
-0.357768	programmers' time, but	-0.124939
-0.899045	is used, but	-0.124939
-0.657159	be used, but	-0.124939
-0.696438	instruction set, but	-0.124939
-0.462374	AMD processors, but	-0.124939
-0.462374	Atom processors, but	-0.124939
-0.823803	32-bit systems, but	-0.124939
-0.335498	other CPUs, but	-0.124939
-0.472562	Intel CPUs, but	-0.124939
-0.472562	multi-core CPUs, but	-0.124939
-0.560249	register variables, but	-0.124939
-0.560970	or 8, but	-0.124939
-1.394737	clock cycles, but	-0.124939
-0.556221	or 1, but	-0.124939
-0.950989	in memory, but	-0.124939
-1.138575	operating system, but	-0.124939
-0.813571	64-bit integers, but	-0.124939
-1.080517	of course, but	-0.124939
-0.548095	16.2 above, but	-0.124939
-0.273955	is efficient, but	-0.124939
-0.273955	and efficient, but	-0.124939
-0.273955	quite efficient, but	-0.124939
-0.548562	not edx but	-0.124939
-0.539159	general case, but	-0.124939
-0.539159	function library, but	-0.124939
-0.538610	system programming, but	-0.124939
-0.230182	different threads, but	-0.124939
-0.192196	multiple threads, but	-0.124939
-0.785257	many features, but	-0.124939
-0.992135	programming languages, but	-0.124939
-0.524761	on BSD, but	-0.124939
-0.313591	as well, but	-0.124939
-0.313591	reasonably well, but	-0.124939
-0.879308	are needed, but	-0.124939
-0.760194	double precision, but	-0.124939
-0.953886	hot spot but	-0.124939
-0.525426	a float, but	-0.124939
-0.760194	it is, but	-0.124939
-0.760194	the unit-test but	-0.124939
-0.442799	a pointer, but	-0.124939
-0.313591	imported pointer, but	-0.124939
-0.502618	64 bits, but	-0.124939
-0.503460	function libraries, but	-0.124939
-0.722786	small devices, but	-0.124939
-0.502618	model numbers, but	-0.124939
-0.502618	by 64, but	-0.124939
-0.722786	it occurs, but	-0.124939
-0.502618	optimizations automatically, but	-0.124939
-0.722786	more readable but	-0.124939
-0.502618	fence instructions, but	-0.124939
-0.722786	template metaprogramming, but	-0.124939
-0.502618	in vectors, but	-0.124939
-0.461455	little-endian storage, but	-0.124939
-0.461455	of expressions, but	-0.124939
-0.461455	logarithm again, but	-0.124939
-0.142870	multiple applications, but	-0.124939
-0.142870	such applications, but	-0.124939
-0.656439	be vectorized, but	-0.124939
-0.461455	any expression, but	-0.124939
-0.142870	can occur, but	-0.124939
-0.142870	doesn't occur, but	-0.124939
-0.461455	using hyperthreading, but	-0.124939
-0.461455	under test, but	-0.124939
-0.461455	64-bit software, but	-0.124939
-0.656439	more complex, but	-0.124939
-0.656439	is loaded, but	-0.124939
-0.461455	and flexible, but	-0.124939
-0.461455	need relocation, but	-0.124939
-0.461455	addition unit, but	-0.124939
-0.461455	on usability, but	-0.124939
-0.656439	this manual, but	-0.124939
-0.656439	test setup but	-0.124939
-0.461455	known type, but	-0.124939
-0.461455	other reasons, but	-0.124939
-0.461455	simplest method, but	-0.124939
-0.357212	the factorials, but	-0.124939
-0.357212	hot spots, but	-0.124939
-0.357212	a macro, but	-0.124939
-0.357212	or .a), but	-0.124939
-0.357212	than 2-20, but	-0.124939
-0.357212	with -mcmodel=large, but	-0.124939
-0.357212	to relocate, but	-0.124939
-0.357212	static if), but	-0.124939
-0.357212	CPU core, but	-0.124939
-0.357212	data bases, but	-0.124939
-0.357212	more primitive, but	-0.124939
-0.357212	from main, but	-0.124939
-0.357212	memory block, but	-0.124939
-0.357212	the hint, but	-0.124939
-0.357212	me manually, but	-0.124939
-0.357212	option -ftrapv, but	-0.124939
-0.357212	2 GB, but	-0.124939
-0.357212	with profiling, but	-0.124939
-0.357212	page 103), but	-0.124939
-0.357212	very often, but	-0.124939
-0.357212	of security, but	-0.124939
-0.357212	simple solution, but	-0.124939
-0.357212	particular situation, but	-0.124939
-0.357212	syntax restriction, but	-0.124939
-0.357212	as required, but	-0.124939
-0.357212	// Faster, but	-0.124939
-0.357212	disk caching, but	-0.124939
-0.357212	be noticeable but	-0.124939
-0.357212	small subtasks, but	-0.124939
-0.357212	container expandable, but	-0.124939
-0.357212	or aliasing, but	-0.124939
-0.357212	to 15.1c, but	-0.124939
-0.357212	is cached, but	-0.124939
-0.357212	considerable job, but	-0.124939
-0.357212	scientific computing, but	-0.124939
-0.357212	type casting, but	-0.124939
-0.357212	code section, but	-0.124939
-0.357212	of rounding, but	-0.124939
-0.357212	is wrong, but	-0.124939
-0.357212	final destination, but	-0.124939
-0.357212	public symbols, but	-0.124939
-0.357212	Mac platform, but	-0.124939
-1.261579	and is used	-0.124939
-1.062716	that is used	-0.221849
-1.877017	it is used	-0.425969
-1.849533	code is used	-0.124939
-2.053812	This is used	-0.124939
-2.225477	It is used	-0.124939
-1.364241	which is used	-0.124939
-0.826614	cache is used	-0.124939
-1.200053	table is used	-0.124939
-1.015827	thread is used	-0.124939
-0.579899	standard is used	-0.124939
-0.579899	mode is used	-0.124939
-1.200053	counter is used	-0.124939
-1.121547	space is used	-0.124939
-0.480464	operator is used	-0.602060
-1.015827	keyword is used	-0.124939
-0.371211	process is used	-0.124939
-1.218457	feature is used	-0.124939
-0.579899	bool is used	-0.124939
-0.579899	frame is used	-0.124939
-0.859844	algorithm is used	-0.124939
-0.579899	INSTRSET is used	-0.124939
-0.579899	longjmp is used	-0.124939
-0.601316	popular and used	-0.124939
-1.066933	can be used	-0.477121
-1.030855	may be used	-0.367977
-1.418247	will be used	-0.124939
-1.135545	only be used	-0.124939
-1.489612	should be used	-0.124939
-0.534324	also be used	-0.346788
-1.092688	cannot be used	-0.124939
-0.786082	even be used	-0.124939
-1.091941	therefore be used	-0.124939
-0.913697	still be used	-0.124939
-0.566072	that are used	-0.572097
-1.475621	data are used	-0.124939
-1.081638	functions are used	-0.124939
-1.200663	which are used	-0.124939
-1.418841	libraries are used	-0.124939
-1.046795	pointers are used	-0.124939
-1.524328	they are used	-0.124939
-1.200663	processors are used	-0.124939
-0.956172	manuals are used	-0.124939
-0.557073	units are used	-0.124939
-0.956172	frameworks are used	-0.124939
-0.817342	Threads are used	-0.124939
-0.601126	ipow(x,10); // used	-0.124939
-1.441884	than it used	-0.124939
-1.490626	are not used	-0.425969
-1.259407	I have used	-0.124939
-2.165092	the time used	-0.124939
-1.772497	The time used	-0.124939
-0.594565	directives when used	-0.124939
-0.594565	definitions when used	-0.124939
-1.698200	the memory used	-0.124939
-0.593391	data memory used	-0.124939
-0.898859	or data used	-0.124939
-1.725493	is only used	-0.124939
-2.266735	the CPU used	-0.124939
-1.458682	the most used	-0.124939
-0.965020	is also used	-0.124939
-0.598004	lines we used	-0.124939
-0.580094	are often used	-0.124939
-0.223151	most often used	-0.124939
-0.492944	Keep often used	-0.124939
-0.998715	the method used	-0.124939
-1.356821	The method used	-0.124939
-1.586295	to get used	-0.124939
-1.044302	that was used	-0.124939
-1.157620	cache space used	-0.124939
-0.589451	frameworks typically used	-0.124939
-1.039795	memory model used	-0.124939
-0.467331	are never used	-0.425969
-1.049817	no longer used	-0.124939
-0.577155	additions. When used	-0.124939
-0.562391	are now used	-0.124939
-0.562391	The algorithms used	-0.124939
-0.557009	you had used	-0.124939
-0.557130	and generally used	-0.124939
-0.550074	Func 87 used	-0.124939
-0.540546	method currently used	-0.124939
-0.077756	most commonly used	-0.124939
-0.172537	two commonly used	-0.124939
-0.314455	from seldom used	-0.124939
-0.314455	put seldom used	-0.124939
-0.504399	of Pascal used	-0.124939
-0.358486	systems Microcontrollers used	-0.124939
-2.332794	is the one	-0.124939
-1.196563	be the one	-0.124939
-2.234563	than the one	-0.124939
-0.600158	like the one	-0.124939
-1.490166	find the one	-0.124939
-1.838404	This is one	-0.425969
-0.896300	product is one	-0.124939
-0.598652	(there is one	-0.124939
-0.598652	Polymorphism is one	-0.124939
-0.601363	counts a one	-0.124939
-1.155103	calculation of one	-0.425969
-1.219933	pointer to one	-0.124939
-0.598213	priority to one	-0.124939
-1.067741	identical to one	-0.124939
-1.285842	belong to one	-0.124939
-1.410601	set and one	-0.124939
-1.461637	Intel and one	-0.124939
-1.264473	file and one	-0.124939
-0.888096	global and one	-0.124939
-1.056880	program, and one	-0.124939
-0.888096	SSE4.1 and one	-0.124939
-0.594498	brands, and one	-0.124939
-1.155930	it in one	-0.124939
-1.343576	int in one	-0.124939
-1.252840	integer in one	-0.124939
-0.878327	class in one	-0.124939
-1.320765	value in one	-0.124939
-1.961980	stored in one	-0.124939
-0.589502	pointers in one	-0.124939
-1.252840	together in one	-0.124939
-0.878327	temp in one	-0.124939
-1.320765	strings in one	-0.124939
-0.589502	terms in one	-0.124939
-0.878327	additions in one	-0.124939
-1.193921	data for one	-0.124939
-0.897960	addresses for one	-0.124939
-2.190062	so that one	-0.124939
-2.003851	sure that one	-0.124939
-1.440072	instructions are one	-0.124939
-0.899835	zero or one	-0.124939
-1.847547	or if one	-0.124939
-0.597621	comparisons by one	-0.124939
-0.597621	created by one	-0.124939
-1.074860	cycle on one	-0.124939
-1.295320	a code one	-0.124939
-1.988341	unsigned int one	-0.124939
-0.625064	more than one	-0.124939
-1.243588	can have one	-0.124939
-1.322770	will have one	-0.124939
-0.589948	variables have one	-0.124939
-1.586676	not use one	-0.124939
-1.271714	will use one	-0.124939
-1.025011	only from one	-0.124939
-0.574131	block from one	-0.124939
-0.198389	transferred from one	-0.124939
-0.574131	saved from one	-0.124939
-1.440591	program has one	-0.124939
-0.585794	structure has one	-0.124939
-0.585794	latter has one	-0.124939
-2.349618	to make one	-0.124939
-1.601929	and make one	-0.124939
-0.824330	the only one	-0.124939
-1.095538	is only one	-0.124939
-0.862157	and only one	-0.124939
-0.471914	that only one	-0.124939
-0.174783	be only one	-0.124939
-0.422490	with only one	-0.124939
-0.174783	have only one	-0.124939
-0.471914	from only one	-0.124939
-0.217400	has only one	-0.124939
-0.471914	make only one	-0.124939
-0.471914	example, only one	-0.124939
-0.422490	take only one	-0.425969
-0.973969	mispredicted only one	-0.124939
-0.471914	hold only one	-0.124939
-0.591341	prediction. If one	-0.124939
-0.881911	first. If one	-0.124939
-1.342825	of which one	-0.124939
-0.582836	out which one	-0.124939
-1.023889	see which one	-0.124939
-0.876127	is using one	-0.124939
-1.935851	by using one	-0.124939
-0.973303	files into one	-0.124939
-0.545902	read into one	-0.124939
-0.545902	modules into one	-0.124939
-1.053633	them into one	-0.124939
-0.545902	0x273F into one	-0.124939
-0.797211	joined into one	-0.124939
-0.598516	is number one	-0.124939
-0.597922	threads where one	-0.124939
-1.066535	typically takes one	-0.124939
-2.187194	For example, one	-0.124939
-0.889006	takes up one	-0.124939
-1.506673	many times one	-0.124939
-0.593135	entirely inside one	-0.124939
-1.165383	will get one	-0.124939
-0.191151	section needs one	-0.425969
-0.586609	but read one	-0.124939
-1.027481	that goes one	-0.124939
-1.365226	may choose one	-0.124939
-0.519018	to just one	-0.124939
-0.750364	in just one	-0.124939
-0.583159	detection function, one	-0.124939
-1.129160	to go one	-0.124939
-0.577792	should save one	-0.124939
-0.809915	the preceding one	-0.124939
-1.311726	by adding one	-0.124939
-0.718046	at least one	-0.124939
-0.849188	and handle one	-0.124939
-0.572695	and enable one	-0.124939
-1.272104	the program, one	-0.124939
-1.391222	instruction set, one	-0.124939
-1.305789	to allocate one	-0.124939
-1.045175	can eliminate one	-0.124939
-0.556330	currently available, one	-0.124939
-0.539893	and 22 one	-0.124939
-0.540186	line. Only one	-0.124939
-0.526003	integer units, one	-0.124939
-0.503798	queue) allocates one	-0.124939
-0.462528	goes randomly one	-0.124939
-0.462528	three times, one	-0.124939
-0.358056	2exponent 16383 one	-0.124939
-0.358056	for signifying one	-0.124939
-0.358056	purpose: Contain one	-0.124939
-0.358056	and shifts one	-0.124939
-0.358056	three parts: one	-0.124939
-0.358056	two branches: one	-0.124939
-0.358056	two names, one	-0.124939
-0.358056	were inserted, one	-0.124939
-2.527033	in the cache	-0.124939
-2.357003	that the cache	-0.124939
-1.699210	by the cache	-0.425969
-1.322612	than the cache	-0.301030
-1.457287	from the cache	-0.124939
-1.609188	because the cache	-0.124939
-1.622670	If the cache	-0.124939
-1.893788	all the cache	-0.124939
-1.936838	before the cache	-0.124939
-1.339434	cause the cache	-0.124939
-1.058314	uses the cache	-0.124939
-0.889068	least the cache	-0.124939
-0.594992	evict the cache	-0.124939
-0.594992	28, the cache	-0.124939
-2.363560	is a cache	-0.124939
-2.339750	of a cache	-0.124939
-1.339148	also a cache	-0.124939
-0.594941	so a cache	-0.124939
-0.888966	how a cache	-0.124939
-1.176318	cause a cache	-0.124939
-0.202687	loading a cache	-0.124939
-0.594941	fetching a cache	-0.124939
-0.594941	occupying a cache	-0.124939
-1.708561	because of cache	-0.124939
-1.611651	set of cache	-0.124939
-2.205525	number of cache	-0.124939
-1.250989	lot of cache	-0.124939
-1.646413	amount of cache	-0.124939
-0.886633	details of cache	-0.124939
-0.886633	penalty of cache	-0.124939
-0.849026	waste of cache	-0.124939
-0.593754	levels of cache	-0.124939
-1.298737	goes to cache	-0.124939
-1.196121	access and cache	-0.124939
-0.899072	sets and cache	-0.124939
-1.564817	cache. The cache	-0.124939
-1.188838	processors. The cache	-0.124939
-0.895382	line. The cache	-0.124939
-2.201302	will be cache	-0.124939
-0.600648	access or cache	-0.124939
-0.900555	align by cache	-0.124939
-1.382495	the code cache	-0.221849
-1.217429	The code cache	-0.425969
-0.573390	that code cache	-0.124939
-0.573390	Both code cache	-0.124939
-2.258148	such as cache	-0.124939
-1.196409	uses more cache	-0.124939
-0.886715	data A cache	-0.124939
-0.593795	sequentially A cache	-0.124939
-1.654959	the data cache	-0.124939
-0.902725	and data cache	-0.124939
-1.124209	The data cache	-0.124939
-0.240284	level-1 data cache	-0.124939
-0.599462	eight different cache	-0.124939
-1.325436	the same cache	-0.301030
-0.599552	under CPU cache	-0.124939
-0.897374	are other cache	-0.124939
-0.599493	87 used cache	-0.124939
-1.464646	are no cache	-0.124939
-1.636350	have no cache	-0.124939
-0.598978	has most cache	-0.124939
-0.598067	today where cache	-0.124939
-0.597604	loading any cache	-0.124939
-1.922772	a new cache	-0.124939
-1.769622	of these cache	-0.124939
-0.064959	bytes without cache	-0.124939
-0.367180	take up cache	-0.124939
-1.537871	an extra cache	-0.124939
-1.373226	can cause cache	-0.124939
-1.236646	may cause cache	-0.124939
-0.658943	the four cache	-0.425969
-0.883191	only four cache	-0.124939
-1.365837	the last cache	-0.124939
-0.529907	64. Each cache	-0.124939
-0.529907	29. Each cache	-0.124939
-1.223635	to improve cache	-0.124939
-0.132226	the level-2 cache	-0.124939
-0.042728	a level-2 cache	-0.124939
-0.090123	The level-2 cache	-0.124939
-0.090123	for level-2 cache	-0.124939
-0.090123	if level-2 cache	-0.124939
-1.200277	the code, cache	-0.124939
-0.859312	// No cache	-0.124939
-0.358936	the level-1 cache	-0.124939
-0.274524	for level-1 cache	-0.124939
-0.274524	entire level-1 cache	-0.124939
-1.011309	a special cache	-0.124939
-1.292805	to prevent cache	-0.124939
-1.011068	can save cache	-0.124939
-0.572296	an entire cache	-0.124939
-0.556669	very expensive cache	-0.124939
-1.080523	a thousand cache	-0.124939
-0.833893	an arbitrary cache	-0.124939
-0.107070	9.11 Explicit cache	-0.425969
-0.504449	a micro-op cache	-0.124939
-0.462801	disk. Provoke cache	-0.124939
-0.358271	instruction sets, cache	-0.124939
-0.358271	= (total cache	-0.124939
-0.358271	memory economy, cache	-0.124939
-0.358271	without taking cache	-0.124939
-0.358271	instructions executed, cache	-0.124939
-0.746568	expression that should	-0.425969
-1.579404	where it should	-0.124939
-0.598201	Typically it should	-0.124939
-1.447048	a function should	-0.124939
-0.592133	thread-safe function should	-0.124939
-1.053818	vectorized code should	-0.124939
-0.886017	System code should	-0.124939
-0.593440	measurement code should	-0.124939
-0.600460	calculations. This should	-0.124939
-1.591330	optimizing compiler should	-0.124939
-1.507136	that you should	-0.124939
-1.322877	then you should	-0.124939
-0.920844	because you should	-0.124939
-1.003665	but you should	-0.124939
-1.062183	example, you should	-0.124939
-0.251815	Therefore, you should	-0.425969
-0.920844	Here, you should	-0.124939
-0.791398	program, you should	-0.124939
-0.542633	Obviously, you should	-0.124939
-0.542633	contrary, you should	-0.124939
-0.581352	available. It should	-0.124939
-0.862617	space. It should	-0.124939
-0.581352	tools. It should	-0.124939
-0.581352	buffer. It should	-0.124939
-1.071166	test data should	-0.124939
-0.984650	The program should	-0.124939
-0.585567	No program should	-0.124939
-0.897545	math functions should	-0.124939
-1.576823	innermost loop should	-0.124939
-1.645678	or class should	-0.124939
-1.481514	this example should	-0.124939
-1.587777	The size should	-0.124939
-1.768259	and b should	-0.124939
-1.187728	each object should	-0.124939
-1.479184	of C++ should	-0.124939
-0.584955	NULL. There should	-0.124939
-0.584955	commas. There should	-0.124939
-0.494168	multidimensional array should	-0.425969
-1.319521	the objects should	-0.124939
-0.774292	and objects should	-0.425969
-0.597480	decomposition, we should	-0.124939
-0.760059	operations. You should	-0.124939
-0.524683	allocation. You should	-0.124939
-0.524683	software. You should	-0.124939
-0.760059	overlap. You should	-0.124939
-0.524683	them. You should	-0.124939
-0.524683	late. You should	-0.124939
-1.280291	The table should	-0.124939
-0.597059	software performance should	-0.124939
-0.596520	All software should	-0.124939
-0.892880	loop branch should	-0.124939
-0.949843	The test should	-0.124939
-0.812736	performance test should	-0.124939
-0.554537	speed test should	-0.124939
-0.595246	Web systems should	-0.124939
-0.595363	or method should	-0.124939
-0.888792	complicated cases should	-0.124939
-1.753092	a constant should	-0.124939
-0.566228	Big arrays should	-0.124939
-0.566228	Multidimensional arrays should	-0.124939
-1.483918	point calculations should	-0.124939
-1.402195	multiple versions should	-0.124939
-0.566553	three versions should	-0.124939
-1.346762	16 bytes should	-0.124939
-1.521579	multiple threads should	-0.124939
-1.168543	Each thread should	-0.124939
-0.592451	boxes, etc. should	-0.124939
-1.338850	a list should	-0.124939
-1.607151	loop counter should	-0.124939
-1.500134	loop count should	-0.124939
-0.588846	of programs should	-0.124939
-0.589240	These problems should	-0.124939
-0.877664	the dispatching should	-0.124939
-1.622528	memory block should	-0.124939
-1.148617	template parameter should	-0.124939
-0.874452	and resources should	-0.124939
-0.603188	CPU dispatcher should	-0.124939
-0.586597	updating mechanism should	-0.124939
-1.138997	.NET framework should	-0.124939
-0.234665	used together should	-0.726999
-0.947671	installation process should	-0.124939
-0.522437	update process should	-0.124939
-1.135934	intermediate results should	-0.124939
-1.025795	Thread-local storage should	-0.124939
-0.579288	few lines should	-0.124939
-0.577706	and output should	-0.124939
-0.482809	These containers should	-0.124939
-0.482809	inside containers should	-0.124939
-1.101795	one iteration should	-0.124939
-0.770194	for updates should	-0.124939
-0.473124	program updates should	-0.124939
-1.154549	switch statements should	-0.124939
-0.487000	Loop unrolling should	-0.124939
-0.569386	This penalty should	-0.124939
-0.986070	clock counts should	-0.124939
-0.565903	other device should	-0.124939
-0.825123	Lazy binding should	-0.124939
-0.825598	an interrupt should	-0.124939
-0.556394	rights. Software should	-0.124939
-0.584945	software developers should	-0.124939
-0.584945	Software developers should	-0.124939
-0.539392	Accessibility guidelines should	-0.124939
-0.313940	A queue should	-0.124939
-0.313940	FIFO queue should	-0.124939
-0.525995	or video should	-0.124939
-0.525995	performance measurement should	-0.124939
-0.525519	following considerations should	-0.124939
-0.525995	service routine should	-0.124939
-0.526471	are modified should	-0.124939
-0.503337	User feedback should	-0.124939
-0.503337	mathematical calculations, should	-0.124939
-0.462109	file formats should	-0.124939
-0.462109	and servers should	-0.124939
-0.657465	bits wide, should	-0.124939
-0.357727	any function) should	-0.124939
-0.357727	solutions. Patches should	-0.124939
-0.357727	appropriately. Users should	-0.124939
-0.357727	User complaints should	-0.124939
-0.357727	protection scheme should	-0.124939
-0.357727	which imprecisions should	-0.124939
-0.357727	unattended. Uninstallation should	-0.124939
-0.357727	.cpp file) should	-0.124939
-0.357727	or malloc/free should	-0.124939
-2.389553	of the integer	-0.124939
-2.503334	to the integer	-0.124939
-2.487004	that the integer	-0.124939
-2.512045	if the integer	-0.124939
-2.210965	because the integer	-0.124939
-1.943512	all the integer	-0.124939
-1.956534	using the integer	-0.124939
-1.360632	take the integer	-0.124939
-1.188386	reduce the integer	-0.124939
-1.067328	smaller the integer	-0.124939
-1.916282	use of integer	-0.124939
-2.416065	number of integer	-0.124939
-0.976178	point to integer	-0.124939
-0.378680	double to integer	-0.124939
-0.598241	Float to integer	-0.124939
-0.599908	point and integer	-0.124939
-0.898796	SSE4.1 and integer	-0.124939
-2.135866	stored in integer	-0.124939
-1.182695	element. The integer	-0.124939
-1.063014	application. The integer	-0.124939
-0.892245	index. The integer	-0.124939
-0.596604	8.15b. The integer	-0.124939
-1.648755	functions for integer	-0.124939
-1.409660	instructions for integer	-0.124939
-1.758537	check for integer	-0.124939
-0.596501	problems for integer	-0.124939
-1.075752	operands are integer	-0.124939
-0.600600	functions with integer	-0.124939
-0.590436	operators on integer	-0.124939
-0.500463	reductions on integer	-0.124939
-0.590436	manipulations on integer	-0.124939
-1.370515	fast as integer	-0.124939
-0.989782	is an integer	-0.124939
-1.019256	of an integer	-0.124939
-1.196798	to an integer	-0.124939
-1.341743	for an integer	-0.124939
-0.970729	or an integer	-0.124939
-1.262424	as an integer	-0.124939
-1.019643	use an integer	-0.124939
-0.852925	only an integer	-0.124939
-0.739995	do an integer	-0.124939
-0.512901	whether an integer	-0.124939
-0.512901	simply an integer	-0.124939
-0.512901	replace an integer	-0.124939
-0.512901	fact an integer	-0.124939
-0.512901	When an integer	-0.124939
-0.512901	adding an integer	-0.124939
-0.512901	divide an integer	-0.124939
-0.512901	convert an integer	-0.124939
-0.512901	increment an integer	-0.124939
-0.512901	declaring an integer	-0.124939
-0.512901	Comparing an integer	-0.124939
-0.512901	Converting an integer	-0.124939
-0.512901	replacing an integer	-0.124939
-0.899522	predictable than integer	-0.124939
-0.899154	typically use integer	-0.124939
-1.630463	or more integer	-0.124939
-0.588735	one more integer	-0.124939
-0.588735	few more integer	-0.124939
-1.053094	conversions from integer	-0.124939
-0.885526	Conversion from integer	-0.124939
-1.365931	with vector integer	-0.124939
-1.589889	the different integer	-0.124939
-1.765424	of different integer	-0.124939
-0.599857	advantage because integer	-0.124939
-1.535369	for other integer	-0.124939
-1.813098	of each integer	-0.124939
-2.260052	to do integer	-0.124939
-1.066902	first two integer	-0.124939
-1.303408	a 64-bit integer	-0.124939
-1.141629	of 64-bit integer	-0.124939
-1.678961	most efficient integer	-0.124939
-1.125750	a 32-bit integer	-0.124939
-0.581102	as 32-bit integer	-0.124939
-1.623676	most critical integer	-0.124939
-1.233356	128 bit integer	-0.124939
-0.856823	256 bit integer	-0.124939
-0.544194	the unsigned integer	-0.124939
-0.544194	The unsigned integer	-0.124939
-0.469514	an unsigned integer	-0.124939
-1.178186	that these integer	-0.124939
-0.595694	the even integer	-0.124939
-1.371585	a simple integer	-0.124939
-0.799278	do simple integer	-0.124939
-0.547059	mix simple integer	-0.124939
-0.539262	operations An integer	-0.124939
-0.539262	s; An integer	-0.124939
-0.539262	+127. An integer	-0.124939
-1.048594	that contains integer	-0.124939
-1.779254	a particular integer	-0.124939
-0.589530	often replace integer	-0.124939
-0.191110	14.9 Using integer	-0.425969
-0.316989	a signed integer	-0.124939
-0.741666	to signed integer	-0.124939
-0.458496	that signed integer	-0.124939
-0.587588	/ means integer	-0.124939
-0.530061	store aligned integer	-0.124939
-0.530061	load aligned integer	-0.124939
-0.864135	A negative integer	-0.124939
-1.243835	a positive integer	-0.124939
-1.282501	to mix integer	-0.124939
-0.054841	store unaligned integer	-0.602060
-0.054841	load unaligned integer	-0.602060
-0.566156	allows 256-bit integer	-0.124939
-0.834026	the default integer	-0.124939
-0.825783	Register variables, integer	-0.124939
-0.306194	the smallest integer	-0.602060
-1.023804	more complex integer	-0.124939
-0.390391	first six integer	-0.124939
-0.549730	approximately six integer	-0.124939
-0.787192	and fourteen integer	-0.124939
-0.540532	at reducing integer	-0.124939
-0.358476	15.1c. Calculate integer	-0.124939
-0.358476	15.1b. Calculate integer	-0.124939
-0.762478	an additional integer	-0.124939
-0.526088	of defining integer	-0.124939
-0.724881	reductions involving integer	-0.124939
-0.503878	to nearest integer	-0.124939
-0.503878	fast. Simple integer	-0.124939
-0.658236	subexpression elimin., integer	-0.124939
-0.358113	CodeGear compiler) integer	-0.124939
-0.358113	registers (6 integer	-0.124939
-0.358113	bounds violation, integer	-0.124939
-0.358113	(3) trap integer	-0.124939
-1.907155	This is no	-0.124939
-1.498429	object is no	-0.124939
-0.368947	there is no	-0.274701
-0.211433	There is no	-0.359022
-0.567542	execution is no	-0.124939
-0.567542	priority is no	-0.124939
-0.598182	8 and no	-0.124939
-1.192590	count and no	-0.124939
-0.895369	constructor and no	-0.124939
-1.067651	additions and no	-0.124939
-1.071708	sure that no	-0.124939
-0.891928	recommend that no	-0.124939
-2.435055	may be no	-0.124939
-2.152436	will be no	-0.124939
-1.984670	that are no	-0.124939
-0.919111	there are no	-0.124939
-1.519380	There are no	-0.124939
-1.825159	they are no	-0.124939
-0.593317	few or no	-0.124939
-0.202357	little or no	-0.124939
-0.593317	number, or no	-0.124939
-1.059355	code if no	-0.124939
-0.595350	objects if no	-0.124939
-1.610894	even if no	-0.124939
-0.600926	inlined - no	-0.124939
-1.247470	to have no	-0.124939
-1.048404	can have no	-0.124939
-0.782013	functions have no	-0.425969
-0.910849	elements have no	-0.124939
-1.087920	I have no	-0.124939
-0.783957	preferably have no	-0.124939
-0.783957	microprocessors have no	-0.124939
-0.910849	operands have no	-0.124939
-0.117389	microcontrollers have no	-0.124939
-0.538421	mentations have no	-0.124939
-0.594633	necessary when no	-0.124939
-0.594633	Useful when no	-0.124939
-1.134456	that has no	-0.124939
-1.216874	code has no	-0.124939
-1.297575	compiler has no	-0.124939
-0.915844	set has no	-0.124939
-1.241303	library has no	-0.124939
-0.354546	object has no	-0.124939
-0.540533	functions) has no	-0.124939
-0.540533	Deallocation has no	-0.124939
-0.540533	apparently has no	-0.124939
-0.582820	overflow but no	-0.124939
-0.582820	expressions, but no	-0.124939
-0.582820	if), but no	-0.124939
-0.544269	code takes no	-0.124939
-0.544269	object takes no	-0.124939
-0.794303	often takes no	-0.124939
-0.544269	handling takes no	-0.124939
-1.008411	conversion takes no	-0.124939
-1.312555	it makes no	-0.124939
-0.580083	#define makes no	-0.124939
-0.560103	double take no	-0.124939
-0.560103	calculations take no	-0.124939
-0.560103	precisions take no	-0.124939
-0.594827	hint about no	-0.124939
-1.167620	will get no	-0.124939
-1.044139	program contains no	-0.124939
-0.556212	section contains no	-0.124939
-1.601697	is simply no	-0.124939
-1.033646	This requires no	-0.124939
-0.585610	set control no	-0.124939
-1.157422	to assume no	-0.124939
-0.580356	can produce no	-0.124939
-0.498740	-fno-rtti Assume no	-0.124939
-0.498740	78. Assume no	-0.124939
-0.557285	conversion generates no	-0.124939
-0.504757	for assuming no	-0.124939
-0.504560	or (requires no	-0.124939
-0.143294	for "assume no	-0.124939
-0.143294	option "assume no	-0.124939
-0.358600	is virtually no	-0.124939
-0.601607	storage and page	-0.124939
-0.511081	example on page	-0.124939
-0.511081	classes on page	-0.124939
-0.017019	explained on page	-0.149762
-0.511081	examples on page	-0.124939
-0.511081	described on page	-0.124939
-0.511081	given on page	-0.124939
-0.184335	discussed on page	-0.124939
-0.511081	below on page	-0.124939
-0.511081	listed on page	-0.124939
-0.511081	detail on page	-0.124939
-0.511081	below, on page	-0.124939
-0.511081	7.43 on page	-0.124939
-1.781340	the memory page	-0.124939
-0.600241	explained at page	-0.124939
-0.598580	(See also page	-0.124939
-0.107435	or See page	-0.124939
-0.107435	function. See page	-0.124939
-0.107435	functions. See page	-0.124939
-0.183199	memory. See page	-0.124939
-0.107435	program. See page	-0.124939
-0.050404	used. See page	-0.124939
-0.107435	processors. See page	-0.124939
-0.107435	1. See page	-0.124939
-0.107435	possible. See page	-0.124939
-0.107435	way. See page	-0.124939
-0.107435	CPU. See page	-0.124939
-0.107435	not. See page	-0.124939
-0.107435	order. See page	-0.124939
-0.107435	allocation. See page	-0.124939
-0.107435	available. See page	-0.124939
-0.107435	expressions. See page	-0.124939
-0.107435	storage. See page	-0.124939
-0.050404	system. See page	-0.124939
-0.107435	optimizations. See page	-0.124939
-0.107435	language. See page	-0.124939
-0.107435	this. See page	-0.124939
-0.107435	overlap. See page	-0.124939
-0.107435	files. See page	-0.124939
-0.107435	so. See page	-0.124939
-0.107435	manuals. See page	-0.124939
-0.050404	dispatcher. See page	-0.425969
-0.107435	cached. See page	-0.124939
-0.107435	aliasing. See page	-0.124939
-0.107435	compact. See page	-0.124939
-0.107435	errors. See page	-0.124939
-0.107435	takes. See page	-0.124939
-0.107435	exceptions. See page	-0.124939
-0.107435	occur. See page	-0.124939
-0.050404	other. See page	-0.124939
-0.107435	aligned. See page	-0.124939
-0.107435	required. See page	-0.124939
-0.107435	mechanism. See page	-0.124939
-0.107435	delay. See page	-0.124939
-0.107435	containers. See page	-0.124939
-0.107435	contentions. See page	-0.124939
-0.107435	crash. See page	-0.124939
-0.107435	incremented. See page	-0.124939
-0.107435	requested. See page	-0.124939
-0.107435	OS. See page	-0.124939
-0.107435	motion. See page	-0.124939
-0.107435	(RTTI). See page	-0.124939
-0.107435	0x8040); See page	-0.124939
-0.094641	to (see page	-0.124939
-0.094641	compiler (see page	-0.124939
-0.094641	one (see page	-0.124939
-0.168009	cache (see page	-0.124939
-0.094641	class (see page	-0.124939
-0.094641	double (see page	-0.124939
-0.094641	pointer (see page	-0.124939
-0.094641	efficient (see page	-0.124939
-0.044748	registers (see page	-0.124939
-0.094641	system (see page	-0.124939
-0.094641	instructions (see page	-0.124939
-0.094641	processors (see page	-0.124939
-0.094641	constant (see page	-0.124939
-0.094641	necessary (see page	-0.124939
-0.094641	integers (see page	-0.124939
-0.094641	precision (see page	-0.124939
-0.094641	list (see page	-0.124939
-0.094641	1 (see page	-0.124939
-0.094641	expressions (see page	-0.124939
-0.094641	range (see page	-0.124939
-0.094641	intended (see page	-0.124939
-0.044748	vectorization (see page	-0.124939
-0.094641	checking (see page	-0.124939
-0.094641	time-consuming (see page	-0.124939
-0.044748	aliasing (see page	-0.425969
-0.094641	profiling (see page	-0.124939
-0.094641	capabilities (see page	-0.124939
-0.094641	mispredictions (see page	-0.124939
-0.094641	profitable (see page	-0.124939
-0.094641	CPU-dispatching (see page	-0.124939
-0.094641	devirtualization (see page	-0.124939
-0.543376	operations, see page	-0.124939
-0.543376	registers; see page	-0.124939
-0.584972	chapter 10 page	-0.124939
-0.109868	2 (See page	-0.124939
-0.257303	Windows (See page	-0.124939
-0.257303	CPUs. (See page	-0.124939
-0.257303	modules (See page	-0.124939
-0.257303	2. (See page	-0.124939
-0.805602	(see above, page	-0.124939
-0.940222	example 13.1 page	-0.124939
-0.463495	example 14.23 page	-0.124939
-0.358816	example 7.35 page	-0.124939
-3.046119	in the set	-0.124939
-1.199855	f is set	-0.124939
-0.600990	Friday is set	-0.124939
-1.402576	use a set	-0.425969
-2.186595	is to set	-0.124939
-2.127205	have to set	-0.124939
-1.923754	way to set	-0.124939
-1.975981	recommended to set	-0.124939
-1.283828	belong to set	-0.124939
-0.894404	attempts to set	-0.124939
-1.298957	lines in set	-0.124939
-2.252142	can be set	-0.124939
-0.901530	options are set	-0.124939
-2.181925	you can set	-0.124939
-1.070340	tool can set	-0.124939
-1.055645	0x80000000; // set	-0.124939
-0.376998	b[size]; // set	-0.425969
-0.594072	0x7FFFFFFF; // set	-0.124939
-0.600089	lines from set	-0.124939
-2.094549	the data set	-0.124939
-1.487327	with different set	-0.124939
-2.621205	the same set	-0.124939
-0.192878	the instruction set	-0.124939
-0.203271	on instruction set	-0.124939
-0.057790	This instruction set	-0.301030
-0.299968	an instruction set	-0.124939
-0.175560	this instruction set	-0.124939
-0.175560	which instruction set	-0.124939
-0.203271	64-bit instruction set	-0.124939
-0.203271	possible instruction set	-0.124939
-0.203271	bit instruction set	-0.124939
-0.203271	new instruction set	-0.124939
-0.107097	SSE2 instruction set	-0.602060
-0.203271	32 instruction set	-0.124939
-0.086264	AVX instruction set	-0.492916
-0.349399	supported instruction set	-0.124939
-0.299968	particular instruction set	-0.124939
-0.223568	later instruction set	-0.522879
-0.322636	higher instruction set	-0.124939
-0.089850	AVX2 instruction set	-0.124939
-0.057790	x86 instruction set	-0.124939
-0.203271	appropriate instruction set	-0.124939
-0.299968	desired instruction set	-0.124939
-0.203271	SSE instruction set	-0.124939
-0.089850	SSE4.1 instruction set	-0.425969
-0.299968	newer instruction set	-0.124939
-0.203271	low instruction set	-0.124939
-0.089850	latest instruction set	-0.124939
-0.203271	SSE3 instruction set	-0.124939
-0.337661	newest instruction set	-0.124939
-0.203271	AVX512 instruction set	-0.124939
-0.124585	CISC instruction set	-0.124939
-0.203271	highest instruction set	-0.124939
-0.203271	x86-64 instruction set	-0.124939
-0.057790	AVX-512 instruction set	-0.301030
-0.203271	lowest instruction set	-0.124939
-0.203271	(the instruction set	-0.124939
-0.203271	later) instruction set	-0.124939
-0.203271	Newest instruction set	-0.124939
-0.203271	Pro instruction set	-0.124939
-0.599642	calculate which set	-0.124939
-1.072047	commonly used set	-0.124939
-1.293351	is one set	-0.124939
-1.943268	for each set	-0.124939
-0.599016	its pointer set	-0.124939
-0.597361	debugger cannot set	-0.124939
-1.788136	a particular set	-0.124939
-1.458953	its own set	-0.124939
-0.112960	bits Instruction set	-0.124939
-0.112960	element Instruction set	-0.124939
-0.112960	name Instruction set	-0.124939
-0.112960	BSD Instruction set	-0.124939
-0.052818	follows: Instruction set	-0.124939
-0.112960	"\nError: Instruction set	-0.124939
-1.076973	a typical set	-0.124939
-0.566844	in addition, set	-0.124939
-1.047156	a suitable set	-0.124939
-0.540721	as list, set	-0.124939
-0.266086	a realistic set	-0.425969
-0.090101	AVX instr. set	-0.124939
-0.090101	SSE4.1 instr. set	-0.124939
-0.090101	SSE3 instr. set	-0.124939
-2.192780	of the class	-0.301030
-2.508132	to the class	-0.124939
-2.698866	in the class	-0.124939
-2.448878	for the class	-0.124939
-2.516729	if the class	-0.124939
-2.235276	If the class	-0.124939
-0.973753	about the class	-0.124939
-1.915716	inside the class	-0.124939
-1.863156	is a class	-0.124939
-1.307279	of a class	-0.204120
-1.630209	to a class	-0.425969
-1.910217	in a class	-0.124939
-1.858269	be a class	-0.124939
-1.275775	into a class	-0.124939
-0.488603	inside a class	-0.124939
-1.641236	object of class	-0.124939
-0.902014	belongs to class	-0.124939
-1.073356	structure and class	-0.124939
-0.600109	Structure and class	-0.124939
-0.901386	work for class	-0.124939
-0.611923	function or class	-0.124939
-0.240774	structure or class	-0.124939
-0.573260	52 or class	-0.124939
-0.593871	registers. A class	-0.124939
-0.593871	constructors. A class	-0.124939
-0.974672	the vector class	-0.124939
-0.900170	The vector class	-0.124939
-0.533859	// vector class	-0.124939
-1.112992	Intel vector class	-0.124939
-0.533859	my vector class	-0.124939
-0.533859	My vector class	-0.124939
-0.175872	Agner's vector class	-0.425969
-1.941660	the same class	-0.124939
-1.289265	virtual functions class	-0.124939
-1.363543	to all class	-0.124939
-1.292690	to one class	-0.124939
-0.840091	a template class	-0.124939
-1.050475	A template class	-0.124939
-0.539061	above template class	-0.124939
-1.741293	a simple class	-0.124939
-0.897915	} }; class	-0.124939
-0.764067	f(); }; class	-0.124939
-0.669388	a container class	-0.425969
-0.440317	The container class	-0.124939
-0.440317	other container class	-0.124939
-0.799855	efficient container class	-0.124939
-0.440317	made container class	-0.124939
-1.038498	know what class	-0.124939
-0.085945	the child class	-0.124939
-0.323998	and child class	-0.124939
-0.287426	The child class	-0.124939
-0.169236	its child class	-0.124939
-0.193203	correct child class	-0.124939
-0.574985	suitable containers class	-0.124939
-0.459949	first generation class	-0.124939
-0.248322	second generation class	-0.124939
-0.326263	third generation class	-0.124939
-0.377761	12.4. Vector class	-0.124939
-0.377761	lately. Vector class	-0.124939
-0.377761	12.7. Vector class	-0.124939
-0.840739	the declaration class	-0.124939
-0.193241	the derived class	-0.124939
-0.107125	a derived class	-0.124939
-0.162535	and derived class	-0.124939
-0.283047	the parent class	-0.124939
-0.118861	a parent class	-0.124939
-0.578245	of parent class	-0.124939
-1.062798	a polymorphic class	-0.124939
-0.804280	a base class	-0.124939
-0.566420	multiple inheritance class	-0.124939
-0.358478	Multiple inheritance class	-0.124939
-0.077737	int N> class	-0.124939
-0.261840	<int N> class	-0.124939
-0.504199	B2; 54 class	-0.124939
-0.504199	Conversions involving class	-0.124939
-0.504199	Example 7.28 class	-0.124939
-0.504199	8.19. Devirtualization class	-0.124939
-0.504516	Example 7.14 class	-0.124939
-0.065731	class B1; class	-0.425969
-0.658694	the object's class	-0.124939
-0.462893	grandparent class: class	-0.124939
-0.462893	"; Disp(); class	-0.124939
-0.143215	true. template<> class	-0.124939
-0.143215	recursion template<> class	-0.124939
-0.462893	undetected. Converting class	-0.124939
-0.658694	class B2; class	-0.124939
-0.358342	// versions: class	-0.124939
-0.358342	<typename MyChild> class	-0.124939
-0.358342	Example 7.44 class	-0.124939
-0.358342	Example 7.37 class	-0.124939
-0.358342	Example 7.36 class	-0.124939
-0.358342	Example 7.41a class	-0.124939
-2.369426	of the floating	-0.425969
-2.454716	that the floating	-0.124939
-2.274656	by the floating	-0.124939
-1.498662	when the floating	-0.602060
-1.185732	so the floating	-0.124939
-1.971990	before the floating	-0.124939
-1.515974	store the floating	-0.124939
-1.278165	When the floating	-0.124939
-1.065318	reflects the floating	-0.124939
-0.597390	in-between the floating	-0.124939
-1.638792	b is floating	-0.124939
-1.649083	of a floating	-0.602060
-2.126037	to a floating	-0.124939
-1.767549	and a floating	-0.124939
-1.814896	if a floating	-0.124939
-1.820417	than a floating	-0.124939
-1.739427	If a floating	-0.124939
-1.273603	do a floating	-0.124939
-1.256236	access a floating	-0.124939
-0.884614	needs a floating	-0.124939
-0.592724	addition, a floating	-0.124939
-0.592724	rounds a floating	-0.124939
-1.824820	use of floating	-0.124939
-2.243168	number of floating	-0.124939
-1.568915	order of floating	-0.124939
-1.183931	cases of floating	-0.124939
-0.596735	types of floating	-0.602060
-1.267885	range of floating	-0.124939
-0.595225	manipulations of floating	-0.124939
-0.743256	integer to floating	-0.124939
-0.501446	integers to floating	-0.425969
-0.890864	conversion to floating	-0.124939
-1.350267	apply to floating	-0.124939
-0.595904	Conversion to floating	-0.124939
-0.890864	converting to floating	-0.124939
-1.183149	integer and floating	-0.124939
-0.345001	integers and floating	-0.602060
-0.596722	constants and floating	-0.124939
-1.197503	calculations in floating	-0.124939
-1.200799	especially in floating	-0.124939
-1.593215	functions. The floating	-0.124939
-1.339614	variables for floating	-0.124939
-0.889132	registers for floating	-0.124939
-0.595025	exception for floating	-0.124939
-0.595025	accumulators for floating	-0.124939
-0.595025	keyword, for floating	-0.124939
-1.823535	assume that floating	-0.124939
-2.145338	there are floating	-0.124939
-0.900039	integers or floating	-0.124939
-0.887449	example with floating	-0.124939
-0.887449	addition with floating	-0.124939
-1.055926	incompatible with floating	-0.124939
-1.430870	than on floating	-0.124939
-0.501843	reductions on floating	-0.425969
-0.600463	results as floating	-0.124939
-1.990249	faster than floating	-0.124939
-0.891056	expressions than floating	-0.124939
-1.660714	that have floating	-0.124939
-1.071937	space. A floating	-0.124939
-0.860895	than from floating	-0.124939
-1.017333	conversion from floating	-0.124939
-1.017333	conversions from floating	-0.124939
-0.860895	Conversion from floating	-0.124939
-2.230943	to make floating	-0.124939
-0.495160	cannot make floating	-0.425969
-0.599347	mixing different floating	-0.124939
-1.619933	that all floating	-0.124939
-1.858009	only one floating	-0.124939
-0.598785	If each floating	-0.124939
-0.487857	or two floating	-0.425969
-0.573609	require two floating	-0.124939
-0.493661	before any floating	-0.425969
-0.660386	Conversions between floating	-0.425969
-1.228370	it makes floating	-0.124939
-0.196068	set makes floating	-0.425969
-0.892188	and 8 floating	-0.124939
-1.920928	a new floating	-0.124939
-0.197383	difficulties making floating	-0.425969
-0.491344	that does floating	-0.425969
-1.391572	a big floating	-0.124939
-0.522891	are eight floating	-0.124939
-0.756982	first eight floating	-0.124939
-0.756982	involves eight floating	-0.124939
-0.591668	loop contains floating	-0.124939
-0.633591	of doing floating	-0.425969
-0.590156	enable fast floating	-0.124939
-0.584403	that generate floating	-0.124939
-0.582201	two positive floating	-0.124939
-0.576269	are 100 floating	-0.124939
-0.576599	bug causes floating	-0.124939
-1.282832	to mix floating	-0.124939
-0.574306	units. Any floating	-0.124939
-1.386631	the entire floating	-0.124939
-0.565900	x87 style floating	-0.124939
-0.561757	static variables, floating	-0.124939
-0.540308	for strict floating	-0.124939
-0.358329	a nonzero floating	-0.124939
-0.358329	of nonzero floating	-0.124939
-0.540045	allows larger floating	-0.124939
-0.762587	an additional floating	-0.124939
-0.065707	for manipulating floating	-0.425969
-0.658322	// Catch floating	-0.124939
-0.658322	branch mispredictions, floating	-0.124939
-0.358156	less precise floating	-0.124939
-0.358156	-fno-alias Non-strict floating	-0.124939
-0.358156	no native floating	-0.124939
-0.358156	// Reset floating	-0.124939
-0.358156	including relaxed floating	-0.124939
-0.358156	to relax floating	-0.124939
-0.358156	vectors FMA3 floating	-0.124939
-0.915876	size of each	-0.221849
-1.256480	address of each	-0.124939
-1.605288	result of each	-0.124939
-1.359744	speed of each	-0.124939
-1.593827	advantages of each	-0.124939
-1.465515	length of each	-0.124939
-0.588443	destructors of each	-0.124939
-0.876269	consumption of each	-0.124939
-0.876269	bytes) of each	-0.124939
-0.123404	Size of each	-0.301030
-0.588443	logarithm of each	-0.124939
-0.378397	2 to each	-0.425969
-0.597539	allocated to each	-0.124939
-0.597539	features to each	-0.124939
-1.283221	belong to each	-0.124939
-0.597539	unrelated to each	-0.124939
-0.600059	CPU-specific and each	-0.124939
-0.600059	x∙xn-1, and each	-0.124939
-1.795503	elements in each	-0.124939
-1.186366	threads in each	-0.124939
-1.271831	lines in each	-0.124939
-0.891174	counters in each	-0.124939
-0.891174	occurs in each	-0.124939
-0.596061	formula in each	-0.124939
-1.299606	function for each	-0.124939
-1.075120	use for each	-0.124939
-0.365445	different for each	-0.124939
-1.172030	file for each	-0.124939
-0.566044	error for each	-0.124939
-0.979019	container for each	-0.124939
-0.566044	optimal for each	-0.124939
-0.566044	separate for each	-0.124939
-0.833818	block for each	-0.124939
-0.979019	name for each	-0.124939
-0.833818	difference for each	-0.124939
-0.068378	instance for each	-0.221849
-0.566044	containers for each	-0.124939
-0.196674	once for each	-0.124939
-0.566044	changes for each	-0.124939
-1.265661	waiting for each	-0.124939
-0.566044	bc for each	-0.124939
-0.566044	manager for each	-0.124939
-0.566044	prototypes for each	-0.124939
-0.592996	CPU that each	-0.124939
-1.497932	so that each	-0.124939
-0.592996	account that each	-0.124939
-0.673975	sense that each	-0.124939
-1.637892	structure or each	-0.124939
-1.188612	that if each	-0.124939
-1.533243	example, if each	-0.124939
-0.600800	access by each	-0.124939
-0.597478	objects with each	-0.124939
-0.597478	stored with each	-0.124939
-1.664818	rather than each	-0.425969
-1.073761	threads have each	-0.124939
-1.198778	much time each	-0.124939
-0.886967	available then each	-0.124939
-0.593924	chains then each	-0.124939
-1.072581	far from each	-0.124939
-1.195305	from memory each	-0.124939
-0.885582	code at each	-0.124939
-0.593218	addresses at each	-0.124939
-1.263779	time because each	-0.124939
-0.592443	serial because each	-0.124939
-0.599664	loop. If each	-0.124939
-0.598795	fact using each	-0.124939
-1.072845	work into each	-0.124939
-0.797071	loop where each	-0.124939
-0.545823	sequence where each	-0.124939
-0.545823	chain where each	-0.124939
-0.545823	calculations, where each	-0.124939
-0.545823	__intel_cpu_feature_indicator where each	-0.124939
-2.207902	the value each	-0.124939
-0.597326	cache between each	-0.124939
-0.596885	or less each	-0.124939
-1.709447	to test each	-0.124939
-1.011808	of times each	-0.124939
-0.779972	many times each	-0.124939
-0.593961	members. But each	-0.124939
-1.624055	to calculate each	-0.124939
-1.152067	can calculate each	-0.124939
-1.701266	to store each	-0.124939
-0.589000	get next each	-0.124939
-1.023588	and after each	-0.124939
-0.533204	switches after each	-0.124939
-1.157591	to give each	-0.124939
-1.299156	} Here, each	-0.124939
-0.583437	same function, each	-0.124939
-0.995446	for updates each	-0.124939
-0.570291	statements within each	-0.124939
-0.112900	used near each	-0.124939
-0.025594	stored near each	-0.425969
-0.112900	called near each	-0.124939
-0.112900	together near each	-0.124939
-0.549762	By giving each	-0.124939
-0.143243	// AND each	-0.425969
-0.540465	CPU development, each	-0.124939
-0.540465	matrix, i.e. each	-0.124939
-0.504462	branch. After each	-0.124939
-0.833936	the contrary, each	-0.124939
-0.504462	than moving each	-0.124939
-0.833936	if necessary, each	-0.124939
-0.462820	will invalidate each	-0.124939
-0.658579	to draw each	-0.124939
-0.065723	// Compare each	-0.425969
-0.358285	multiple versions, each	-0.124939
-0.358285	underflow neutralize each	-0.124939
-1.382242	is to do	-0.124939
-0.731560	compiler to do	-0.124939
-0.939908	have to do	-0.221849
-0.811447	possible to do	-0.182931
-1.172678	takes to do	-0.124939
-1.262668	how to do	-0.124939
-1.559921	need to do	-0.124939
-1.094248	important to do	-0.425969
-0.821306	work to do	-0.124939
-0.772320	necessary to do	-0.124939
-0.821306	good to do	-0.124939
-1.107579	advantageous to do	-0.124939
-0.646659	able to do	-0.191886
-0.961638	cycles to do	-0.124939
-1.150450	optimal to do	-0.124939
-0.821306	better to do	-0.124939
-0.763366	ways to do	-0.124939
-0.961638	safe to do	-0.124939
-0.821306	thing to do	-0.124939
-1.053540	try to do	-0.124939
-0.821306	obvious to do	-0.124939
-1.086577	safer to do	-0.124939
-0.239992	Failure to do	-0.602060
-0.559246	seem to do	-0.124939
-0.559246	decide to do	-0.124939
-1.052841	systems that do	-0.124939
-0.593101	microprocessors that do	-0.124939
-0.202313	languages that do	-0.124939
-0.593101	cores that do	-0.124939
-0.593101	facilities that do	-0.124939
-1.684313	that can do	-0.124939
-1.654712	it can do	-0.124939
-1.586260	compiler can do	-0.124939
-0.875641	you can do	-0.204120
-0.819670	CPU can do	-0.124939
-0.876514	compilers can do	-0.124939
-1.353319	we can do	-0.124939
-1.050751	CPUs can do	-0.124939
-0.558350	processor can do	-0.124939
-0.566747	thread can do	-0.124939
-0.819670	programmer can do	-0.124939
-0.558350	microprocessors can do	-0.124939
-0.558350	algorithm can do	-0.124939
-0.558350	preprocessor can do	-0.124939
-2.293654	{ // do	-0.124939
-0.598326	order or do	-0.124939
-0.598326	command or do	-0.124939
-0.938179	will not do	-0.124939
-2.012056	If you do	-0.124939
-1.668332	compiler will do	-0.124939
-1.128090	compilers will do	-0.124939
-2.415442	the program do	-0.124939
-0.599494	routine should do	-0.124939
-1.041491	most compilers do	-0.124939
-0.589130	why compilers do	-0.124939
-0.840494	and we do	-0.124939
-1.486151	that we do	-0.124939
-0.840494	But we do	-0.124939
-1.475913	point variables do	-0.124939
-0.860043	compilers cannot do	-0.124939
-0.580003	therefore cannot do	-0.124939
-1.704393	function libraries do	-0.124939
-0.854070	Intel libraries do	-0.124939
-1.061996	that pointers do	-0.124939
-1.521451	32-bit systems do	-0.124939
-1.413727	because they do	-0.124939
-1.467571	integer operations do	-0.124939
-1.272071	you must do	-0.124939
-1.695372	be able do	-0.124939
-0.871943	these directives do	-0.124939
-0.585387	bigger vectors do	-0.124939
-1.041499	when contentions do	-0.124939
-0.865216	relative references do	-0.124939
-1.127247	These conversions do	-0.124939
-0.850885	STL containers do	-0.124939
-0.841312	many programmers do	-0.124939
-0.540590	overlap. Compilers do	-0.124939
-0.526679	2004. Can do	-0.124939
-0.111503	live ranges do	-0.602060
-0.725815	vector register, do	-0.124939
-0.358514	have studied do	-0.124939
-0.358514	their live-ranges do	-0.124939
-0.358514	(live ranges) do	-0.124939
-1.869127	as the example	-0.124939
-2.190617	at the example	-0.124939
-1.694703	way of example	-0.124939
-0.899139	collection of example	-0.124939
-0.600080	transformation of example	-0.124939
-0.601643	analogous to example	-0.124939
-0.598419	code in example	-0.124939
-0.208529	as in example	-0.124939
-0.225260	loop in example	-0.124939
-1.113866	but in example	-0.124939
-1.601931	used in example	-0.124939
-0.818337	b in example	-0.124939
-1.386835	variable in example	-0.124939
-0.361886	2 in example	-0.124939
-0.557619	branch in example	-0.124939
-0.818337	method in example	-0.124939
-0.818337	work in example	-0.124939
-1.369138	explained in example	-0.124939
-0.557619	list in example	-0.124939
-0.818337	structure in example	-0.124939
-0.361886	syntax in example	-0.124939
-1.113866	given in example	-0.124939
-0.557619	below in example	-0.124939
-0.557619	unrolling in example	-0.124939
-0.194861	CriticalFunction in example	-0.124939
-0.361886	illustrated in example	-0.124939
-0.818337	shown in example	-0.124939
-0.557619	Friday) in example	-0.124939
-0.557619	FactorialTable in example	-0.124939
-0.557619	construction in example	-0.124939
-0.557619	if-branch in example	-0.124939
-0.601376	type. The example	-0.124939
-0.589344	cases, for example	-0.124939
-0.589344	program, for example	-0.124939
-0.589344	suitable for example	-0.124939
-0.589344	variable, for example	-0.124939
-0.589344	loop, for example	-0.124939
-0.589344	it, for example	-0.124939
-0.589344	events, for example	-0.124939
-0.589344	interval, for example	-0.124939
-0.589344	parts, for example	-0.124939
-0.593691	code as example	-0.124939
-0.593691	Same as example	-0.124939
-0.593691	parameters, as example	-0.124939
-0.600877	note: This example	-0.124939
-1.000498	for an example	-0.425969
-1.656459	as an example	-0.124939
-0.588237	shows an example	-0.124939
-2.076959	faster than example	-0.124939
-0.644202	in this example	-0.271067
-0.565480	giving this example	-0.124939
-0.838096	or from example	-0.124939
-0.121061	code from example	-0.124939
-0.985008	conversion from example	-0.124939
-0.985008	come from example	-0.124939
-1.622757	The same example	-0.124939
-0.599068	matrix using example	-0.124939
-0.547182	double In example	-0.124939
-0.547182	register. In example	-0.124939
-0.547182	16. In example	-0.124939
-0.547182	application. In example	-0.124939
-0.547182	explicitly. In example	-0.124939
-0.597302	directives. For example	-0.124939
-0.596025	Use these example	-0.124939
-0.172078	The following example	-0.656418
-0.313855	InstructionSet().The following example	-0.124939
-0.594276	executed. An example	-0.124939
-0.590458	compiler optimize example	-0.124939
-0.685741	the above example	-0.301030
-0.997543	The above example	-0.124939
-1.365871	The next example	-0.124939
-0.589225	situations like example	-0.124939
-1.285178	to reduce example	-0.124939
-0.575252	can convert example	-0.124939
-0.291909	will convert example	-0.124939
-0.570084	we modify example	-0.124939
-0.463369	this. My example	-0.124939
-0.463369	example. My example	-0.124939
-0.540964	actually reducing example	-0.124939
-0.463203	automatically reduces example	-0.124939
-0.358586	the SelectAddMul example	-0.124939
-2.056618	of the compilers	-0.425969
-1.982313	that the compilers	-0.124939
-1.302720	all the compilers	-0.425969
-1.420654	choose the compilers	-0.124939
-0.598173	reductions the compilers	-0.124939
-0.598173	on, the compilers	-0.124939
-0.595308	compiler. The compilers	-0.124939
-1.059232	precision. The compilers	-0.124939
-0.595308	constant. The compilers	-0.124939
-0.595308	so. The compilers	-0.124939
-0.595308	71 The compilers	-0.124939
-1.593706	only for compilers	-0.124939
-1.076610	come with compilers	-0.124939
-1.440875	work on compilers	-0.124939
-0.898787	accessible from compilers	-0.124939
-1.485515	the different compilers	-0.124939
-1.640218	of different compilers	-0.124939
-1.342138	with different compilers	-0.124939
-0.856254	from different compilers	-0.124939
-1.071986	use only compilers	-0.124939
-1.478791	and other compilers	-0.124939
-0.750422	with other compilers	-0.124939
-0.561516	Some other compilers	-0.124939
-0.363537	while other compilers	-0.124939
-0.599252	Fortunately, all compilers	-0.124939
-0.579126	However, most compilers	-0.124939
-0.579126	On most compilers	-0.124939
-0.579126	Fortunately, most compilers	-0.124939
-0.328048	and Intel compilers	-0.301030
-1.159739	in Intel compilers	-0.124939
-1.456915	The Intel compilers	-0.124939
-0.801152	compiler Intel compilers	-0.124939
-0.598355	Some 64-bit compilers	-0.124939
-0.927541	of C++ compilers	-0.124939
-0.436666	that C++ compilers	-0.124939
-0.154141	different C++ compilers	-0.550907
-0.436666	all C++ compilers	-0.124939
-0.436666	most C++ compilers	-0.124939
-0.436666	All C++ compilers	-0.124939
-0.165619	Most C++ compilers	-0.124939
-0.618207	Microsoft C++ compilers	-0.124939
-1.131950	that some compilers	-0.124939
-0.582862	Unfortunately, some compilers	-0.124939
-2.190833	For example, compilers	-0.124939
-0.795976	of how compilers	-0.124939
-1.560072	of these compilers	-0.124939
-0.574501	tell these compilers	-0.124939
-1.015291	and Gnu compilers	-0.124939
-0.568941	or Gnu compilers	-0.124939
-0.311032	optimization Some compilers	-0.124939
-0.311032	time. Some compilers	-0.124939
-0.128247	systems. Some compilers	-0.425969
-0.311032	compilers. Some compilers	-0.124939
-0.311032	directives Some compilers	-0.124939
-0.239770	compiler. Some compilers	-0.124939
-0.311032	unrolling Some compilers	-0.124939
-0.311032	order. Some compilers	-0.124939
-0.311032	available. Some compilers	-0.124939
-0.311032	line. Some compilers	-0.124939
-0.311032	division. Some compilers	-0.124939
-0.439356	two. Some compilers	-0.124939
-0.311032	style. Some compilers	-0.124939
-0.311032	places). Some compilers	-0.124939
-1.547491	The best compilers	-0.124939
-1.053918	Some common compilers	-0.124939
-0.562610	all good compilers	-0.124939
-1.064126	very good compilers	-0.124939
-0.591680	Unfortunately, few compilers	-0.124939
-0.515980	because optimizing compilers	-0.124939
-0.515980	best optimizing compilers	-0.124939
-0.515980	All optimizing compilers	-0.124939
-0.262801	memory. Most compilers	-0.124939
-0.262801	cache. Most compilers	-0.124939
-0.262801	storage Most compilers	-0.124939
-0.262801	reductions Most compilers	-0.124939
-0.262801	loop. Most compilers	-0.124939
-0.262801	variable. Most compilers	-0.124939
-0.262801	sets. Most compilers	-0.124939
-0.262801	reduction Most compilers	-0.124939
-0.262801	executable. Most compilers	-0.124939
-0.262801	compression Most compilers	-0.124939
-0.262801	47 Most compilers	-0.124939
-0.585857	slower. Many compilers	-0.124939
-0.575001	processor). Optimizing compilers	-0.124939
-0.483369	format. Other compilers	-0.124939
-0.483369	Gnu). Other compilers	-0.124939
-0.546319	and PathScale compilers	-0.425969
-1.310645	reason why compilers	-0.124939
-0.463080	that future compilers	-0.124939
-0.463080	though future compilers	-0.124939
-0.462995	that current compilers	-0.124939
-0.462995	where current compilers	-0.124939
-0.566940	optimize Modern compilers	-0.124939
-1.257659	the latest compilers	-0.124939
-0.557105	with Intel's compilers	-0.124939
-0.152864	8.1 How compilers	-0.425969
-1.054676	Digital Mars compilers	-0.124939
-0.526446	many commercial compilers	-0.124939
-0.526446	and Watcom compilers	-0.124939
-0.504529	number). Different compilers	-0.124939
-0.462911	73). Current compilers	-0.124939
-0.462911	vectorclass.h Supported compilers	-0.124939
-0.358357	vectorization Good compilers	-0.124939
-0.358357	enabled. Few compilers	-0.124939
-0.358357	// (Some compilers	-0.124939
-1.247987	is the most	-0.726999
-1.729634	of the most	-0.367977
-1.871988	to the most	-0.124939
-1.521529	and the most	-0.124939
-1.829301	in the most	-0.425969
-2.166342	that the most	-0.124939
-2.189971	if the most	-0.124939
-1.974630	use the most	-0.124939
-1.937571	then the most	-0.124939
-1.861418	make the most	-0.124939
-1.575946	only the most	-0.124939
-1.513488	making the most	-0.124939
-1.152252	run the most	-0.124939
-1.639142	calculate the most	-0.124939
-0.789959	put the most	-0.124939
-1.360088	choose the most	-0.124939
-1.237108	When the most	-0.124939
-1.249162	finding the most	-0.124939
-0.876382	select the most	-0.124939
-0.876382	choosing the most	-0.124939
-0.588501	probably the most	-0.124939
-0.876382	isolate the most	-0.124939
-0.588501	among the most	-0.124939
-2.253384	that is most	-0.124939
-0.898051	type is most	-0.124939
-1.071645	what is most	-0.124939
-1.074143	profiler is most	-0.124939
-2.154422	version of most	-0.124939
-0.600210	best and most	-0.124939
-0.600210	simplest and most	-0.124939
-1.245473	and in most	-0.124939
-0.587490	can in most	-0.124939
-1.299332	time in most	-0.124939
-1.050515	because in most	-0.124939
-1.311808	value in most	-0.124939
-0.587490	simple in most	-0.124939
-0.874424	advantageous in most	-0.124939
-0.874424	fast in most	-0.124939
-0.587490	However, in most	-0.124939
-0.874424	optimal in most	-0.124939
-0.201167	implementation in most	-0.124939
-0.587490	lists in most	-0.124939
-0.587490	because, in most	-0.124939
-1.685216	} The most	-0.124939
-1.014853	functions The most	-0.124939
-0.579542	variables The most	-0.124939
-1.259866	set. The most	-0.124939
-1.120306	cases. The most	-0.124939
-0.859164	operations. The most	-0.124939
-1.014853	purposes. The most	-0.124939
-0.579542	condition The most	-0.124939
-1.014853	problem. The most	-0.124939
-1.120306	available. The most	-0.124939
-1.014853	faster. The most	-0.124939
-1.120306	element. The most	-0.124939
-0.579542	execution. The most	-0.124939
-0.579542	methods. The most	-0.124939
-0.579542	Security The most	-0.124939
-0.579542	generality. The most	-0.124939
-0.901545	but for most	-0.124939
-0.598037	used that most	-0.124939
-0.598037	again, that most	-0.124939
-0.598037	conclude that most	-0.124939
-1.282964	systems are most	-0.124939
-1.278562	resources are most	-0.124939
-0.597473	statements are most	-0.124939
-1.709189	supported by most	-0.124939
-0.502468	comes with most	-0.124939
-0.880540	important on most	-0.124939
-0.590639	precision on most	-0.124939
-0.880540	cycles on most	-0.124939
-1.045781	cycle on most	-0.124939
-2.264503	such as most	-0.124939
-0.600538	Faster than most	-0.124939
-1.196420	function will most	-0.124939
-1.539787	program has most	-0.124939
-1.990629	are used most	-0.124939
-0.512306	cases In most	-0.124939
-0.851626	etc. In most	-0.124939
-0.512306	integers In most	-0.124939
-0.512306	cycles. In most	-0.124939
-0.512306	optimizations. In most	-0.124939
-0.512306	step. In most	-0.124939
-0.512306	calculation. In most	-0.124939
-0.512306	strings. In most	-0.124939
-0.598192	object where most	-0.124939
-1.282693	one way most	-0.124939
-0.596873	the 8 most	-0.124939
-1.063681	functions take most	-0.124939
-1.494557	is accessed most	-0.124939
-0.594360	Windows, while most	-0.124939
-0.594098	versa. But most	-0.124939
-0.232942	cache works most	-0.602060
-0.884149	application uses most	-0.124939
-1.407395	to run most	-0.124939
-0.590506	brackets. However, most	-0.124939
-1.041177	are predicted most	-0.124939
-0.577269	output. On most	-0.124939
-0.415290	programs spend most	-0.124939
-0.415290	applications spend most	-0.124939
-0.540503	d.y; Fortunately, most	-0.124939
-0.540503	software runs most	-0.124939
-0.526594	code. Furthermore, most	-0.124939
-0.763709	can obtain most	-0.124939
-0.504359	that consumes most	-0.124939
-0.358457	25 Since most	-0.124939
-0.358457	153 spends most	-0.124939
-1.738110	solution is using	-0.124939
-0.600929	15.0) is using	-0.124939
-0.703265	advantage of using	-0.221849
-0.708668	disadvantage of using	-0.124939
-1.704627	instead of using	-0.124939
-0.443908	advantages of using	-0.221849
-1.141461	possibility of using	-0.124939
-1.141461	purpose of using	-0.124939
-0.200763	trick of using	-0.124939
-0.870631	disadvantages of using	-0.124939
-1.156299	drawbacks of using	-0.124939
-0.585528	cons of using	-0.124939
-0.585528	advise of using	-0.124939
-0.202755	speed to using	-0.124939
-0.501067	advantage to using	-0.124939
-0.596624	cost to using	-0.124939
-0.595276	penalty to using	-0.124939
-0.595276	alternative to using	-0.124939
-1.059138	alternatives to using	-0.124939
-1.192575	classes and using	-0.124939
-0.897279	counter and using	-0.124939
-0.897279	devices and using	-0.124939
-1.202291	advantage in using	-0.124939
-1.192812	reason for using	-0.124939
-1.192812	penalty for using	-0.124939
-0.895519	tool for using	-0.124939
-0.620117	you are using	-0.182931
-0.996974	we are using	-0.124939
-1.223058	systems are using	-0.124939
-1.284753	microprocessors are using	-0.124939
-0.601055	round function using	-0.124939
-1.047060	or by using	-0.124939
-1.135118	than by using	-0.124939
-0.510772	x by using	-0.124939
-1.223151	this by using	-0.124939
-0.510772	functions by using	-0.124939
-1.223151	loop by using	-0.124939
-0.510772	set by using	-0.124939
-0.510772	clock by using	-0.124939
-0.510772	variables by using	-0.124939
-0.736410	2 by using	-0.124939
-0.510772	systems by using	-0.124939
-0.510772	result by using	-0.124939
-0.113795	speed by using	-0.124939
-0.510772	optimized by using	-0.124939
-1.064200	simply by using	-0.124939
-0.736410	zero by using	-0.124939
-0.510772	conversions by using	-0.124939
-1.098213	avoided by using	-0.124939
-0.510772	further by using	-0.124939
-0.510772	efficiency by using	-0.124939
-0.917285	obtained by using	-0.124939
-0.487969	improved by using	-0.124939
-0.736410	explicitly by using	-0.124939
-0.510772	anything by using	-0.124939
-0.510772	guidelines by using	-0.124939
-0.510772	hyperthreading by using	-0.124939
-0.510772	hidden by using	-0.124939
-0.510772	necessary, by using	-0.124939
-0.510772	segment by using	-0.124939
-0.510772	ameliorated by using	-0.124939
-0.500312	restrictions on using	-0.124939
-0.594026	Manual on using	-0.124939
-1.374866	efficient as using	-0.124939
-0.600598	by not using	-0.124939
-2.352906	rather than using	-0.124939
-0.600286	simpler when using	-0.124939
-1.290780	benefit from using	-0.124939
-0.598877	same example using	-0.124939
-1.281774	An array using	-0.124939
-1.065599	speed between using	-0.124939
-0.370841	Same example, using	-0.124939
-0.534124	error without using	-0.124939
-0.534124	exception without using	-0.124939
-0.534124	errors without using	-0.124939
-0.534124	directly without using	-0.124939
-0.595892	call method using	-0.124939
-0.563032	integer power using	-0.124939
-0.563032	Integer power using	-0.124939
-1.607387	a matrix using	-0.124939
-1.053282	without AVX using	-0.124939
-1.709134	be calculated using	-0.124939
-1.535592	bitwise operators using	-0.124939
-1.775492	memory allocation using	-0.124939
-0.589813	by preferably using	-0.124939
-0.872889	algebraic expressions using	-0.124939
-0.580076	single operation using	-0.124939
-1.390568	in fact using	-0.124939
-1.008529	multiple conditions using	-0.124939
-0.698199	instruction set, using	-0.425969
-0.817184	allocated memory, using	-0.124939
-0.504807	I am using	-0.124939
-0.527021	is finished using	-0.124939
-0.463057	running. Programs using	-0.124939
-0.358471	highly optimized, using	-0.124939
-2.410098	with the double	-0.124939
-0.901771	everything is double	-0.124939
-2.473338	is a double	-0.124939
-2.440374	of a double	-0.124939
-1.597647	example, a double	-0.124939
-0.894394	while a double	-0.124939
-0.597690	modify a double	-0.124939
-0.894394	stores a double	-0.124939
-1.540480	a to double	-0.124939
-0.599764	precision to double	-0.124939
-0.898510	converting to double	-0.124939
-1.170413	integer and double	-0.124939
-0.185741	float and double	-0.124939
-0.123966	single and double	-0.301030
-1.071392	than for double	-0.124939
-1.193765	even for double	-0.124939
-1.375744	know that double	-0.124939
-1.294905	constants are double	-0.124939
-2.224639	you can double	-0.124939
-0.940521	bytes = double	-0.124939
-0.181050	float or double	-0.301030
-0.582710	single or double	-0.124939
-1.131410	precision or double	-0.124939
-1.988196	faster than double	-0.124939
-2.215426	rather than double	-0.124939
-1.903323	x) { double	-0.124939
-1.554230	union { double	-0.124939
-1.141450	S1 { double	-0.124939
-1.156290	n) { double	-0.124939
-1.542178	may use double	-0.124939
-0.599464	complications. A double	-0.124939
-0.599980	y; } double	-0.124939
-0.599360	using loop double	-0.124939
-1.769894	and b double	-0.124939
-1.187342	of two double	-0.124939
-0.086935	public: static double	-0.726999
-0.664673	a 64-bit double	-0.124939
-1.964180	of 2 double	-0.124939
-0.509039	the long double	-0.124939
-0.183853	and long double	-0.124939
-0.733499	with long double	-0.124939
-0.509039	have long double	-0.124939
-0.509039	when long double	-0.124939
-0.509039	8 long double	-0.124939
-0.547900	of const double	-0.124939
-1.234564	static const double	-0.124939
-0.547900	variables const double	-0.124939
-0.547900	x; const double	-0.124939
-1.279101	4 4 double	-0.124939
-0.593849	pure_function ; double	-0.124939
-0.884756	256 AVX double	-0.124939
-0.591984	float 128 double	-0.124939
-0.591595	hold four double	-0.124939
-1.643547	a, b; double	-0.124939
-1.047950	This would double	-0.124939
-1.718352	static inline double	-0.124939
-1.466115	most cases, double	-0.124939
-0.588249	precision. Using double	-0.124939
-0.587902	float 256 double	-0.124939
-0.479465	r, c; double	-0.425969
-0.346686	of 10 double	-0.425969
-0.782140	int a; double	-0.124939
-0.768412	float a; double	-0.124939
-0.156789	{double a; double	-0.425969
-0.572348	128 SSE double	-0.124939
-0.093438	int u; double	-0.602060
-0.995540	512 AVX512 double	-0.124939
-0.786397	= C; double	-0.124939
-0.540117	pure_function #endif double	-0.124939
-0.314125	of float, double	-0.124939
-0.314125	between float, double	-0.124939
-0.504195	Example 8.4 double	-0.124939
-0.724615	Polynomial coefficients double	-0.124939
-0.249347	are: Long double	-0.124939
-0.249347	precision. Long double	-0.124939
-0.503718	loop unrolled double	-0.124939
-0.658008	= 3.3; double	-0.124939
-0.462455	Example 14.14b double	-0.124939
-0.462455	Example 14.14a double	-0.124939
-0.462455	+ A; double	-0.124939
-0.065687	TransposeCopy(double a[SIZE][SIZE], double	-0.425969
-0.357999	= x2*x2; double	-0.124939
-0.357999	int dummy; double	-0.124939
-0.357999	c1, c2; double	-0.124939
-0.357999	Example 14.18c double	-0.124939
-0.357999	Example 8.2a double	-0.124939
-0.357999	Example 7.32a double	-0.124939
-0.357999	Example 7.32b double	-0.124939
-0.357999	{ __declspec(__align(64)) double	-0.124939
-0.357999	Example 14.17b double	-0.124939
-0.357999	Example 14.16a double	-0.124939
-0.357999	* dest, double	-0.124939
-0.357999	x *x; double	-0.124939
-0.357999	Example 8.8b double	-0.124939
-0.357999	Example 8.8a double	-0.124939
-0.357999	= x4*x4; double	-0.124939
-0.357999	Example 14.16b double	-0.124939
-0.357999	Example 14.17a double	-0.124939
-0.357999	Example 14.20 double	-0.124939
-2.473142	of the size	-0.124939
-2.237679	for the size	-0.124939
-1.492949	if the size	-0.346788
-1.458421	by the size	-0.301030
-2.177348	on the size	-0.124939
-1.724643	as the size	-0.124939
-1.479183	when the size	-0.602060
-2.047389	because the size	-0.124939
-2.078713	If the size	-0.124939
-1.333372	where the size	-0.124939
-1.521078	example, the size	-0.124939
-1.681054	unless the size	-0.124939
-1.409291	fit the size	-0.124939
-1.254191	increase the size	-0.124939
-0.883744	half the size	-0.124939
-0.202146	Is the size	-0.425969
-0.592279	Return the size	-0.124939
-0.883744	increases the size	-0.124939
-0.901503	type and size	-0.124939
-0.590847	efficient. The size	-0.425969
-0.875583	units. The size	-0.124939
-0.875583	8. The size	-0.124939
-0.875583	elements. The size	-0.124939
-0.588089	objects. The size	-0.124939
-1.038547	module. The size	-0.124939
-0.875583	reasons: The size	-0.124939
-0.588089	expression. The size	-0.124939
-0.588089	i. The size	-0.124939
-0.898472	optimizing for size	-0.124939
-1.072273	Optimizing for size	-0.124939
-2.382283	the code size	-0.124939
-0.886790	and code size	-0.124939
-0.593833	small code size	-0.124939
-0.411191	const int size	-1.204120
-0.599897	smallest data size	-0.124939
-0.803357	the vector size	-0.124939
-0.579006	new vector size	-0.124939
-1.768596	of different size	-0.124939
-0.592320	copying different size	-0.124939
-0.599545	units same size	-0.124939
-0.599282	where cache size	-0.124939
-1.197359	the integer size	-0.124939
-0.969284	The integer size	-0.124939
-1.338130	an integer size	-0.124939
-0.530430	efficient integer size	-0.124939
-0.530430	particular integer size	-0.124939
-0.530430	default integer size	-0.124939
-0.116371	smallest integer size	-0.602060
-0.598762	memory page size	-0.124939
-1.633948	the array size	-0.124939
-0.584441	final array size	-0.124939
-1.284147	of variable size	-0.124939
-0.894543	of any size	-0.124939
-1.102604	the register size	-0.124939
-1.143200	vector register size	-0.124939
-0.191722	new register size	-0.124939
-0.594602	sure its size	-0.124939
-1.687301	a specific size	-0.124939
-0.825630	64 matrix size	-0.124939
-0.825630	512 matrix size	-0.124939
-0.164112	a line size	-0.425969
-0.439199	cache line size	-0.124939
-0.590002	performance. Integer size	-0.124939
-0.588537	the block size	-0.124939
-0.587090	a longer size	-0.124939
-0.866944	a smaller size	-0.124939
-1.016140	i >= size	-0.124939
-1.229867	The maximum size	-0.124939
-0.783347	the final size	-0.124939
-1.530744	// Define size	-0.124939
-0.576945	The total size	-0.124939
-0.574754	a full size	-0.124939
-0.575242	the RAM size	-0.124939
-0.566552	of 256-bit size	-0.124939
-0.834759	the default size	-0.124939
-0.530591	a fixed size	-0.124939
-0.473162	and fixed size	-0.124939
-0.530591	with fixed size	-0.124939
-0.566723	// Array size	-0.124939
-0.358666	arrays. Array size	-0.124939
-0.143278	the combined size	-0.425969
-0.107099	elements Total size	-0.425969
-0.725615	8 kb size	-0.124939
-0.143241	element. Matrix size	-0.124939
-0.143241	follows: Matrix size	-0.124939
-0.358428	CPUs. Half size	-0.124939
-2.000937	of the Intel	-0.425969
-2.332475	to the Intel	-0.124939
-2.199303	and the Intel	-0.124939
-2.182383	in the Intel	-0.425969
-2.288154	for the Intel	-0.124939
-1.717359	that the Intel	-0.124939
-2.172803	by the Intel	-0.124939
-2.150419	with the Intel	-0.124939
-2.225080	on the Intel	-0.124939
-1.560942	use the Intel	-0.124939
-2.117880	If the Intel	-0.124939
-1.541053	See the Intel	-0.124939
-1.444143	However, the Intel	-0.124939
-1.341090	cases, the Intel	-0.124939
-0.593949	tests, the Intel	-0.124939
-0.593949	Overriding the Intel	-0.124939
-1.889486	use of Intel	-0.124939
-1.916115	versions of Intel	-0.124939
-1.070475	features of Intel	-0.124939
-0.599138	favor of Intel	-0.124939
-1.816980	AMD and Intel	-0.124939
-0.892676	Microsoft and Intel	-0.124939
-0.892676	PathScale and Intel	-0.124939
-0.203067	Clang and Intel	-0.124939
-1.786654	function in Intel	-0.124939
-0.891164	counter in Intel	-0.124939
-0.677629	dispatching in Intel	-0.425969
-1.067256	mechanism in Intel	-0.124939
-1.277498	defined in Intel	-0.124939
-1.780133	} The Intel	-0.124939
-0.587979	Intel The Intel	-0.124939
-1.357014	systems. The Intel	-0.124939
-1.150344	not. The Intel	-0.124939
-0.587979	this. The Intel	-0.124939
-0.875370	127. The Intel	-0.124939
-0.587979	required. The Intel	-0.124939
-0.875370	122. The Intel	-0.124939
-0.587979	details). The Intel	-0.124939
-0.587979	www.agner.org/optimize/#vectorclass. The Intel	-0.124939
-1.659899	code for Intel	-0.124939
-1.578409	only for Intel	-0.124939
-0.600648	Gnu or Intel	-0.124939
-0.597766	designed by Intel	-0.124939
-0.597766	published by Intel	-0.124939
-0.597468	only with Intel	-0.124939
-0.597468	Included with Intel	-0.124939
-0.583850	and on Intel	-0.124939
-1.135456	not on Intel	-0.124939
-0.867403	works on Intel	-0.124939
-0.583850	feature on Intel	-0.124939
-0.583850	tests on Intel	-0.124939
-0.583850	setup. on Intel	-0.124939
-1.664433	is an Intel	-0.124939
-0.491727	on an Intel	-0.124939
-0.588637	using an Intel	-0.124939
-0.579913	fake an Intel	-0.124939
-1.329351	Intel compiler Intel	-0.124939
-0.899355	CPUs use Intel	-0.124939
-0.600166	work when Intel	-0.124939
-0.885758	one from Intel	-0.124939
-0.885758	Available from Intel	-0.124939
-1.894084	for different Intel	-0.124939
-0.896559	example, using Intel	-0.124939
-0.598664	2 double Intel	-0.124939
-1.527791	class library Intel	-0.124939
-0.597276	Intel. See Intel	-0.124939
-0.597208	profiler. For Intel	-0.124939
-0.594787	PathScale Gnu Intel	-0.124939
-0.659584	compiler Windows Intel	-0.425969
-1.172684	128 bytes Intel	-0.124939
-0.322194	compiler Linux Intel	-0.602060
-0.884680	well optimized Intel	-0.124939
-0.590850	on certain Intel	-0.124939
-1.263980	32-bit Mac Intel	-0.124939
-1.033972	PathScale compilers. Intel	-0.124939
-0.585732	dispatching. Many Intel	-0.124939
-1.369213	and later Intel	-0.124939
-0.866577	aligned operands Intel	-0.124939
-0.572296	multi-threading, e.g. Intel	-0.124939
-0.840565	all newer Intel	-0.124939
-0.569551	on current Intel	-0.124939
-0.283091	The Microsoft, Intel	-0.124939
-0.283091	with Microsoft, Intel	-0.124939
-0.283091	from Microsoft, Intel	-0.124939
-0.283091	(i.e. Microsoft, Intel	-0.124939
-0.562691	The Gnu, Intel	-0.124939
-0.111468	Gnu, Clang, Intel	-0.602060
-0.762878	Vector class, Intel	-0.124939
-0.504099	and earlier Intel	-0.124939
-0.504099	Mac platform. Intel	-0.124939
-0.203886	math libraries: Intel	-0.124939
-0.658551	page 131. Intel	-0.124939
-0.658551	unaligned op. Intel	-0.124939
-0.065722	Intel: "IA-32 Intel	-0.425969
-0.358271	vmlsExp4 vmldExp2 Intel	-0.124939
-0.358271	5, 2009). Intel	-0.124939
-0.358271	The undocumented Intel	-0.124939
-0.358271	__svml_expf4 __svml_exp2 Intel	-0.124939
-0.358271	v. 2.00. Intel	-0.124939
-0.358271	follows (using Intel	-0.124939
-0.358271	as AQtime, Intel	-0.124939
-2.207478	of the pointer	-0.124939
-2.530756	that the pointer	-0.124939
-2.302546	with the pointer	-0.124939
-2.370574	when the pointer	-0.124939
-2.159578	then the pointer	-0.124939
-1.995946	before the pointer	-0.124939
-1.480664	after the pointer	-0.124939
-1.744637	is a pointer	-0.124939
-1.968332	of a pointer	-0.124939
-1.063153	to a pointer	-0.425969
-1.587964	and a pointer	-0.124939
-0.855065	it a pointer	-0.124939
-1.769749	with a pointer	-0.124939
-1.818864	as a pointer	-0.124939
-1.450392	when a pointer	-0.124939
-1.632231	has a pointer	-0.124939
-1.676578	make a pointer	-0.124939
-0.919807	return a pointer	-0.124939
-0.494119	through a pointer	-0.367977
-0.855065	what a pointer	-0.124939
-0.577384	transfer a pointer	-0.124939
-0.855065	stores a pointer	-0.124939
-0.577384	Likewise, a pointer	-0.124939
-1.031156	converting a pointer	-0.124939
-0.855065	returns a pointer	-0.124939
-0.577384	Returns a pointer	-0.124939
-0.577384	decrementing a pointer	-0.124939
-0.577384	Unlike a pointer	-0.124939
-0.577384	pass a pointer	-0.124939
-0.601304	arithmetics and pointer	-0.124939
-0.901559	pointer. The pointer	-0.124939
-2.188180	used for pointer	-0.124939
-0.601384	Specifies that pointer	-0.124939
-0.900679	reference or pointer	-0.124939
-1.254331	the function pointer	-0.221849
-1.143625	a function pointer	-0.221849
-1.601695	member function pointer	-0.124939
-0.837164	through function pointer	-0.124939
-0.567847	Set function pointer	-0.124939
-0.601119	uninitialized, if pointer	-0.124939
-0.600822	member. This pointer	-0.124939
-1.273909	and this pointer	-0.124939
-0.595119	The this pointer	-0.124939
-0.588125	conversion A pointer	-0.124939
-0.588125	arithmetic A pointer	-0.124939
-0.588125	elimination A pointer	-0.124939
-1.183297	has no pointer	-0.124939
-0.536750	about no pointer	-0.124939
-0.536750	assume no pointer	-0.124939
-0.190249	Assume no pointer	-0.124939
-0.536750	assuming no pointer	-0.124939
-0.190249	"assume no pointer	-0.425969
-1.780608	the array pointer	-0.124939
-1.282072	of possible pointer	-0.124939
-0.583391	return any pointer	-0.124939
-0.866521	making any pointer	-0.124939
-1.336109	the member pointer	-0.124939
-1.285809	data member pointer	-0.124939
-1.065453	A const pointer	-0.124939
-1.280925	4 4 pointer	-0.124939
-1.350058	8 8 pointer	-0.124939
-1.743158	a simple pointer	-0.124939
-0.888404	have its pointer	-0.124939
-1.782274	information about pointer	-0.124939
-1.687898	a specific pointer	-0.124939
-0.946251	// Function pointer	-0.124939
-0.587678	126 Make pointer	-0.124939
-0.523409	No link pointer	-0.124939
-0.523409	previous link pointer	-0.124939
-0.498664	aligned Assume pointer	-0.124939
-0.498664	aligned(16))) Assume pointer	-0.124939
-0.436531	a smart pointer	-0.124939
-0.113516	A smart pointer	-0.124939
-0.267627	their smart pointer	-0.124939
-0.352590	a 'this' pointer	-0.124939
-0.244867	The 'this' pointer	-0.124939
-0.244867	its 'this' pointer	-0.124939
-0.105399	implicit 'this' pointer	-0.124939
-0.562371	6 integer, pointer	-0.124939
-0.366567	// Set pointer	-0.425969
-0.939129	by avoiding pointer	-0.124939
-0.883514	the original pointer	-0.124939
-0.526818	The returned pointer	-0.124939
-0.526818	the implicit pointer	-0.124939
-0.725715	a variable, pointer	-0.124939
-0.065747	-fomit- frame- pointer	-0.124939
-2.129951	value of b	-0.124939
-1.289578	elements of b	-0.124939
-1.195503	offset of b	-0.124939
-0.897326	Value of b	-0.124939
-0.855458	a to b	-0.124939
-0.123438	a and b	-0.301030
-1.424283	time and b	-0.124939
-1.014785	element in b	-0.425969
-1.824402	assume that b	-0.124939
-0.549103	a = b	-0.681241
-0.192615	result = b	-0.425969
-1.003935	c = b	-0.425969
-1.354350	a[i] = b	-0.124939
-1.017472	temp = b	-0.124939
-1.166591	not if b	-0.124939
-0.202164	b if b	-0.425969
-0.592369	inexact if b	-0.124939
-0.900531	check on b	-0.124939
-1.563073	only when b	-0.124939
-1.133283	efficient when b	-0.124939
-0.583238	implementation when b	-0.124939
-0.583238	however, when b	-0.124939
-1.072826	b because b	-0.124939
-0.496840	a + b	-0.522879
-1.240789	c + b	-0.124939
-0.534761	1.0f + b	-0.124939
-1.069056	b * b	-0.425969
-0.597333	a < b	-0.124939
-1.589050	a & b	-0.124939
-0.594480	align its b	-0.124939
-0.594089	example, a, b	-0.124939
-0.592348	a / b	-0.124939
-1.528451	a, b; b	-0.124939
-0.558536	0, b; b	-0.124939
-0.591146	be 1 b	-0.124939
-0.193903	2 : b	-0.425969
-0.881385	and add b	-0.124939
-0.542975	intermediate expression b	-0.124939
-0.542975	equivalent expression b	-0.124939
-0.191170	b: __m128i b	-0.425969
-0.540771	double c; b	-0.124939
-1.260172	b, c; b	-0.124939
-0.669471	a && b	-0.124939
-1.377267	a | b	-0.124939
-1.228448	a || b	-0.124939
-0.485169	a > b	-0.124939
-0.178089	(a > b	-0.124939
-1.418372	= 0, b	-0.124939
-0.865186	= a; b	-0.124939
-0.177620	+ 2, b	-0.425969
-1.002506	to convert b	-0.124939
-0.175309	a ? b	-0.425969
-1.086453	to evaluate b	-0.124939
-1.024800	a ^ b	-0.124939
-0.549829	b: Is16vec8 b	-0.124939
-0.540940	have AND'ed b	-0.124939
-0.129333	// Multiply b	-0.425969
-0.504179	of security. b	-0.124939
-0.725381	that accesses b	-0.124939
-0.358328	= -100, b	-0.124939
-0.358328	= -1.0E8, b	-0.124939
-0.358328	parabola (2.0f); b	-0.124939
-0.358328	= 5.0f; b	-0.124939
-0.358328	+ two, b	-0.124939
-0.358328	a XOR b	-0.124939
-0.358328	= Multiply(10,8); b	-0.124939
-0.358328	= a+1; b	-0.124939
-0.600956	divide it into	-0.124939
-2.321364	a function into	-0.124939
-1.280656	frame function into	-0.124939
-1.977910	of code into	-0.124939
-1.288722	allocated memory into	-0.124939
-1.328237	the data into	-0.124939
-0.572939	right data into	-0.124939
-0.572939	organizing data into	-0.124939
-0.572939	Loading data into	-0.124939
-0.480787	integer vector into	-0.726999
-0.598762	data set into	-0.124939
-1.647449	or class into	-0.124939
-1.193795	of b into	-0.124939
-1.783666	the library into	-0.124939
-1.531957	of i into	-0.124939
-0.894439	allocated array into	-0.124939
-1.187463	as possible into	-0.124939
-0.597306	simple variables into	-0.124939
-1.464384	of software into	-0.124939
-1.605099	a branch into	-0.124939
-0.596181	flags register into	-0.124939
-1.483270	to take into	-0.124939
-0.577677	you take into	-0.124939
-0.594985	read operations into	-0.124939
-0.888839	split up into	-0.124939
-0.686490	the work into	-0.425969
-0.540362	of work into	-0.124939
-0.593625	consuming calculations into	-0.124939
-1.575052	is compiled into	-0.124939
-0.196374	fits best into	-0.425969
-1.443077	the matrix into	-0.124939
-0.853223	source files into	-0.124939
-0.184796	.cpp files into	-0.425969
-1.179250	compatibility problems into	-0.124939
-1.624323	memory block into	-0.124939
-0.875906	be put into	-0.124939
-0.588776	structure y into	-0.124939
-1.034053	be read into	-0.124939
-0.460353	be linked into	-0.124939
-0.585763	it) load into	-0.124939
-0.869545	objects together into	-0.124939
-0.868835	the vectors into	-0.124939
-0.583985	or goes into	-0.124939
-0.868222	test feature into	-0.124939
-0.865191	.cpp modules into	-0.124939
-0.863679	will go into	-0.124939
-0.684876	is loaded into	-0.124939
-0.404397	and loaded into	-0.124939
-0.803175	be loaded into	-0.124939
-0.570091	are loaded into	-0.124939
-1.010226	a task into	-0.124939
-0.358179	copying them into	-0.124939
-0.358179	copies them into	-0.124939
-0.358179	join them into	-0.124939
-0.358179	getting them into	-0.124939
-0.490959	the tasks into	-0.124939
-0.703598	time-consuming tasks into	-0.124939
-0.483352	not fit into	-0.124939
-0.691254	data fit into	-0.124939
-1.001328	of N into	-0.124939
-0.572273	instruments directly into	-0.124939
-0.995328	be copied into	-0.124939
-0.569582	automatically come into	-0.124939
-0.569938	go back into	-0.124939
-0.875409	be organized into	-0.124939
-0.658957	are organized into	-0.124939
-1.060271	branch prediction into	-0.124939
-0.390617	it fits into	-0.124939
-0.390617	float's fits into	-0.124939
-1.022942	the job into	-0.124939
-0.540029	be turned into	-0.124939
-0.540029	cache effects into	-0.124939
-0.540029	put 80 into	-0.124939
-0.540029	was split into	-0.124939
-0.051724	is divided into	-0.124939
-0.025092	be divided into	-0.124939
-0.051724	not divided into	-0.124939
-0.051724	usually divided into	-0.124939
-0.540029	be combined into	-0.124939
-0.525814	by one, into	-0.124939
-0.015512	from cc into	-0.726999
-0.015512	from bb into	-0.726999
-0.724448	measurement instruments into	-0.124939
-0.832866	to 0x273F into	-0.124939
-0.725297	is translated into	-0.124939
-0.503618	some formula into	-0.124939
-0.027972	be taken into	-0.602060
-0.203779	be joined into	-0.124939
-0.462364	is fed into	-0.124939
-0.143089	be wrapped into	-0.124939
-0.143089	are wrapped into	-0.124939
-0.357927	go deeper into	-0.124939
-0.357927	Windows. Integrates into	-0.124939
-0.357927	fit nicely into	-0.124939
-0.357927	preferably isolated into	-0.124939
-0.357927	time packed into	-0.124939
-0.357927	to feed into	-0.124939
-0.811647	= a +	-0.367977
-0.410023	return a +	-0.823909
-0.883212	write a +	-0.124939
-0.592007	{return a +	-0.124939
-0.247620	* x +	-0.301030
-0.376838	= A +	-0.124939
-0.708854	= b +	-0.124939
-0.194694	+ b +	-0.346788
-0.179252	* b +	-0.425969
-0.896190	; i +	-0.124939
-0.596631	1: 4 +	-0.124939
-0.596575	1: 8 +	-0.124939
-0.420520	= c +	-0.124939
-0.161234	+ c +	-0.425969
-0.161234	0, c +	-0.425969
-0.161234	? c +	-0.425969
-0.420520	zero, c +	-0.124939
-0.589124	vector operator +	-0.124939
-0.106477	= y +	-0.602060
-0.458783	a.x, y +	-0.124939
-0.346915	= r +	-0.124939
-0.583927	= a[i] +	-0.124939
-0.581532	= p +	-0.124939
-0.579813	+ b) +	-0.124939
-0.181374	= d +	-0.124939
-0.100518	// exponent +	-0.124939
-0.569384	= row +	-0.124939
-0.090107	= *p +	-0.301030
-0.816633	= (a +	-0.124939
-0.993791	<< 4) +	-0.124939
-0.550010	* 9 +	-0.124939
-0.540835	+ (c +	-0.124939
-0.540572	= b[i] +	-0.124939
-0.882459	Intel SVML +	-0.124939
-0.526469	for (b +	-0.124939
-0.027984	= LoadVector(cc +	-0.602060
-0.027984	= LoadVector(bb +	-0.602060
-0.027984	aa: StoreVector(aa +	-0.602060
-0.065707	+ c*x +	-0.425969
-0.065707	+ b*x*x +	-0.425969
-0.065707	a*b+a*c=a*(b+c) a*x*x*x +	-0.425969
-0.462656	= bb[i] +	-0.124939
-0.065707	= log(b[i]) +	-0.124939
-0.462656	* Func1(x) +	-0.124939
-0.462656	= 1.0f +	-0.124939
-0.462656	?Func@@YAXQAHAAH@Z ENDP +	-0.124939
-0.358156	= b.y +	-0.124939
-0.358156	compute (FuncRow(i)*columns +	-0.124939
-0.358156	= a1/b1 +	-0.124939
-0.358156	as (int)&matrix[0][0] +	-0.124939
-0.358156	= list[j].b +	-0.124939
-0.358156	+ e +	-0.124939
-0.358156	{return r.a +	-0.124939
-0.358156	= A*x*x +	-0.124939
-0.358156	= (a1*b2 +	-0.124939
-0.358156	+ B*x +	-0.124939
-0.358156	return x*x +	-0.124939
-0.358156	? (cc[i] +	-0.124939
-0.358156	{return p->a +	-0.124939
-0.358156	= y.d +	-0.124939
-0.358156	= y.a +	-0.124939
-0.358156	= y.b +	-0.124939
-0.358156	= y.c +	-0.124939
-0.358156	return square(x) +	-0.124939
-0.358156	- 8*x +	-0.124939
-0.358156	= (int)(&list[0]) +	-0.124939
-0.358156	return vector(x +	-0.124939
-0.358156	= b.x +	-0.124939
-0.358156	+ c.x +	-0.124939
-0.358156	+ c.y +	-0.124939
-0.292199	a - n.a.	-0.204120
-0.608489	- - n.a.	-1.630089
-1.411204	x - n.a.	-0.602060
-0.855202	n.a. - n.a.	-0.602060
-0.402808	2 - n.a.	-0.124939
-0.254411	0 - n.a.	-0.301030
-0.402808	b) - n.a.	-0.124939
-0.402808	-1 - n.a.	-0.124939
-0.402808	a*b - n.a.	-0.124939
-0.402808	folding - n.a.	-0.124939
-0.567766	a+(b+c) - n.a.	-0.124939
-0.567766	a*(b+c) - n.a.	-0.124939
-0.402808	b*a - n.a.	-0.124939
-0.402808	a&(b|c) - n.a.	-0.124939
-0.402808	a<<(b+c) - n.a.	-0.124939
-0.402808	a*4 - n.a.	-0.124939
-0.595796	n.a. x n.a.	-0.301030
-0.583531	elimination x n.a.	-0.124939
-0.583531	a*b=b*a x n.a.	-0.124939
-0.226085	- n.a. n.a.	-1.367977
-0.313294	__INTEL_COMPILER n.a. n.a.	-0.124939
-0.585858	Linux platform n.a.	-0.124939
-0.557654	- reciprocal n.a.	-0.124939
-0.764503	__INTEL_COMPILER __INTEL_COMPILER n.a.	-0.124939
-0.504941	not _WIN32 n.a.	-0.124939
-0.463568	0.44 0.40 n.a.	-0.124939
-0.463568	0.25 0.24 n.a.	-0.124939
-0.358873	1.25 1.61 n.a.	-0.124939
-2.155068	of the library	-0.124939
-2.414025	to the library	-0.124939
-2.272808	and the library	-0.124939
-2.580474	in the library	-0.124939
-2.424647	if the library	-0.124939
-2.107494	than the library	-0.124939
-2.296296	when the library	-0.124939
-1.462975	from the library	-0.124939
-0.970461	call the library	-0.124939
-1.186523	calling the library	-0.124939
-1.412649	including the library	-0.124939
-1.186523	economize the library	-0.124939
-0.891280	loads the library	-0.124939
-2.557662	is a library	-0.124939
-2.604580	in a library	-0.124939
-2.246549	as a library	-0.124939
-1.614910	example, a library	-0.124939
-1.202638	addresses of library	-0.124939
-1.364472	time in library	-0.124939
-0.599397	CPU-time in library	-0.124939
-0.599397	__intel_new_strlen in library	-0.124939
-1.579971	code. The library	-0.124939
-0.898581	register. The library	-0.124939
-1.930023	useful for library	-0.124939
-1.067904	program or library	-0.124939
-0.895539	object or library	-0.124939
-1.594174	the function library	-0.124939
-1.534043	a function library	-0.124939
-1.333179	The function library	-0.124939
-1.191868	A function library	-0.124939
-1.094958	Intel function library	-0.124939
-1.131863	another function library	-0.124939
-0.827357	standard function library	-0.124939
-0.827357	separate function library	-0.124939
-0.562545	own function library	-0.124939
-0.562545	C function library	-0.124939
-0.827357	math function library	-0.124939
-0.562545	asmlib function library	-0.124939
-0.562545	up-to-date function library	-0.124939
-0.870719	compiler. This library	-0.124939
-0.200772	platforms. This library	-0.425969
-0.585573	7.2). This library	-0.124939
-0.585573	-mveclibabi=svml. This library	-0.124939
-0.796477	use this library	-0.425969
-0.599980	ReadTSC() from library	-0.124939
-1.195482	long vector library	-0.124939
-2.561458	floating point library	-0.124939
-0.169893	vector class library	-0.182931
-0.884326	Vector class library	-0.124939
-0.599108	of most library	-0.124939
-0.588328	Many Intel library	-0.124939
-0.588328	undocumented Intel library	-0.124939
-1.680734	most efficient library	-0.124939
-1.066510	for any library	-0.124939
-0.198884	standard template library	-0.425969
-0.212599	a dynamic library	-0.301030
-0.105900	A dynamic library	-0.602060
-0.646215	same dynamic library	-0.124939
-0.454901	another dynamic library	-0.124939
-1.394411	the necessary library	-0.124939
-1.586011	to get library	-0.124939
-0.591335	that standard library	-0.124939
-0.881535	in optimizing library	-0.124939
-1.359360	a graphics library	-0.124939
-0.586275	dynamically linked library	-0.124939
-0.685047	user interface library	-0.124939
-0.798762	static link library	-0.124939
-0.357244	dynamic link library	-0.301030
-0.514967	math core library	-0.124939
-0.514967	Math core library	-0.124939
-0.582531	well- tested library	-0.124939
-0.616187	vector math library	-0.124939
-1.389267	the entire library	-0.124939
-0.567095	call. Load library	-0.124939
-0.817839	on executing library	-0.124939
-0.557091	Time- consuming library	-0.124939
-0.261886	the asmlib library	-0.124939
-0.077752	using asmlib library	-0.425969
-0.358457	re- usable library	-0.124939
-0.358457	The IPP library	-0.124939
-0.358457	Performance Primitives" library	-0.124939
-1.472817	function of i	-0.124939
-1.058197	value of i	-0.124939
-0.643365	bit of i	-0.124939
-0.595398	conversion of i	-0.124939
-1.078248	this to i	-0.124939
-1.377244	0 and i	-0.124939
-0.601390	noticed that i	-0.124939
-1.686332	a[i] = i	-0.124939
-0.891168	list[i] = i	-0.124939
-0.596058	eax = i	-0.124939
-0.601124	label if i	-0.124939
-1.591525	same as i	-0.124939
-2.568300	is not i	-0.124939
-0.600620	work int i	-0.124939
-0.888228	number when i	-0.124939
-0.594565	invalid when i	-0.124939
-1.831006	it has i	-0.124939
-0.599749	15. If i	-0.124939
-1.534605	for example i	-0.124939
-0.154810	= 0; i	-1.716003
-1.712493	by making i	-0.124939
-0.567440	i ; i	-0.124939
-0.567440	( ; i	-0.124939
-1.178623	list[i] += i	-0.124939
-1.249633	to add i	-0.124939
-1.614114	loop counter i	-0.124939
-0.036079	for (int i	-1.238882
-0.587295	min && i	-0.124939
-0.586759	0 || i	-0.124939
-0.970556	< 100; i	-0.425969
-1.151995	= 2; i	-0.124939
-0.583152	has replaced i	-0.124939
-0.758626	to divide i	-0.425969
-0.572776	loop condition i	-0.124939
-1.587635	< size; i	-0.124939
-0.065402	< 256; i	-0.823909
-0.557130	also eliminate i	-0.124939
-0.805180	two comparisons i	-0.124939
-0.804680	than comparing i	-0.124939
-0.981021	by type-casting i	-0.124939
-0.526636	s; 40 i	-0.124939
-1.008950	= 2.0; i	-0.124939
-0.463075	= StringLength; i	-0.124939
-0.658980	< 20; i	-0.124939
-0.901757	everything is float	-0.124939
-2.544169	is a float	-0.124939
-2.316730	to a float	-0.124939
-2.124231	by a float	-0.124939
-1.192541	because a float	-0.124939
-0.601314	Conversions of float	-0.124939
-1.194980	i to float	-0.124939
-0.203658	Integer to float	-0.124939
-0.600945	100000001.23456. The float	-0.124939
-1.373647	variables for float	-0.124939
-0.526078	bytes = float	-0.249877
-1.557294	else { float	-0.124939
-1.715537	x) { float	-0.124939
-0.134308	union { float	-1.028029
-1.033969	S1 { float	-0.124939
-2.404006	to use float	-0.124939
-1.052816	conversion from float	-0.124939
-1.052816	conversions from float	-0.124939
-0.895172	{ static float	-0.124939
-0.412810	static const float	-0.124939
-0.891959	by 4 float	-0.124939
-1.347335	8 8 float	-0.124939
-1.187137	128 bit float	-0.124939
-0.823945	256 bit float	-0.124939
-0.560687	(128 bit float	-0.124939
-1.339867	by 16 float	-0.124939
-1.269147	128 SSE2 float	-0.124939
-0.757055	int i; float	-0.204120
-0.593948	vector(float a, float	-0.124939
-0.591969	double 128 float	-0.124939
-0.591574	get four float	-0.124939
-1.339770	a list float	-0.124939
-1.718292	static inline float	-0.124939
-0.587883	uint64_t 256 float	-0.124939
-0.586068	56 public: float	-0.124939
-0.585401	= x; float	-0.124939
-0.431972	= 100; float	-0.301030
-1.137455	256 AVX2 float	-0.124939
-0.584051	8.15a were float	-0.124939
-0.582390	bool a; float	-0.124939
-0.328467	Don't mix float	-0.425969
-1.001496	to convert float	-0.124939
-0.570313	Taylor series float	-0.124939
-0.825405	Register variables, float	-0.124939
-0.420487	float a[100]; float	-0.124939
-0.995478	512 AVX512 float	-0.124939
-1.039753	= 1000; float	-0.124939
-0.504181	Example 14.6 float	-0.124939
-0.504181	Example 7.1 float	-0.124939
-0.725386	= 8; float	-0.124939
-0.503698	1.0f;} 66 float	-0.124939
-0.833037	i, j; float	-0.124939
-0.504181	Example 7.27 float	-0.124939
-0.504181	Example 7.24 float	-0.124939
-0.504181	Example 7.16 float	-0.124939
-0.657979	subexpression elimin., float	-0.124939
-0.657979	// x^2 float	-0.124939
-0.657979	int a[1000]; float	-0.124939
-0.657979	= 1.f; float	-0.124939
-0.462437	Example 11.1a float	-0.124939
-0.462437	Example 11.1b float	-0.124939
-0.462437	// Mixing float	-0.124939
-0.357984	Example 8.3a float	-0.124939
-0.357984	Example 7.29a float	-0.124939
-0.357984	code mixes float	-0.124939
-0.357984	* a;} float	-0.124939
-0.357984	n floats: float	-0.124939
-0.357984	Example 8.1b float	-0.124939
-0.357984	Example 8.1a float	-0.124939
-0.357984	Example 8.16 float	-0.124939
-0.357984	Example 8.18 float	-0.124939
-0.357984	= 32; float	-0.124939
-0.357984	Example 14.18a float	-0.124939
-0.357984	Example 14.18b float	-0.124939
-0.357984	1./1.30767E12, 1./2.09227E13}; float	-0.124939
-0.357984	b[size], c[size]; float	-0.124939
-0.357984	Example 7.26b float	-0.124939
-0.357984	Example 7.26a float	-0.124939
-0.357984	= 50; float	-0.124939
-0.357984	registers (8 float	-0.124939
-0.357984	Example 14.2a float	-0.124939
-0.357984	Example 14.2b float	-0.124939
-0.900950	utilize the multiple	-0.124939
-0.600989	merge the multiple	-0.124939
-0.600989	combine the multiple	-0.124939
-1.475330	is a multiple	-0.823909
-1.456795	by a multiple	-0.425969
-0.595707	size a multiple	-0.124939
-0.595707	spaced a multiple	-0.124939
-1.075218	ported to multiple	-0.124939
-0.600735	Alternative to multiple	-0.124939
-0.824555	code in multiple	-0.425969
-1.058327	program in multiple	-0.124939
-1.898354	used in multiple	-0.124939
-0.594997	compiled in multiple	-0.124939
-0.601212	Library. The multiple	-0.124939
-1.488676	used for multiple	-0.124939
-0.895362	array for multiple	-0.124939
-1.576823	one or multiple	-0.124939
-0.595612	platforms or multiple	-0.124939
-0.595612	once or multiple	-0.124939
-1.533398	example, if multiple	-0.124939
-1.192517	safe if multiple	-0.124939
-1.519586	used by multiple	-0.124939
-0.597824	zation by multiple	-0.124939
-1.020662	CPU with multiple	-0.124939
-1.145601	loop with multiple	-0.124939
-1.145601	systems with multiple	-0.124939
-0.581665	method with multiple	-0.124939
-0.581665	expression with multiple	-0.124939
-1.020662	computer with multiple	-0.124939
-0.581665	Processors with multiple	-0.124939
-0.900566	performed on multiple	-0.124939
-2.261069	such as multiple	-0.124939
-0.967316	that have multiple	-0.124939
-2.174319	to use multiple	-0.124939
-1.252072	and use multiple	-0.124939
-0.589294	To use multiple	-0.124939
-0.599903	Inheritance from multiple	-0.124939
-0.599903	CParent::Hello() has multiple	-0.124939
-2.232813	to make multiple	-0.124939
-1.141939	may make multiple	-0.124939
-1.562701	will make multiple	-0.124939
-0.897171	can set multiple	-0.124939
-2.264915	to do multiple	-0.124939
-0.507921	it into multiple	-0.124939
-0.731626	function into multiple	-0.124939
-0.507921	code into multiple	-0.124939
-0.956588	data into multiple	-0.124939
-0.507921	up into multiple	-0.124939
-0.340076	work into multiple	-0.425969
-0.731626	tasks into multiple	-0.124939
-0.507921	job into multiple	-0.124939
-0.956588	divided into multiple	-0.124939
-0.063571	shared between multiple	-0.221849
-0.495991	jump between multiple	-0.124939
-0.495991	distributed between multiple	-0.124939
-0.495991	workload between multiple	-0.124939
-0.595802	mask out multiple	-0.124939
-1.339059	for making multiple	-0.124939
-0.594239	semicolons, while multiple	-0.124939
-1.641489	to avoid multiple	-0.124939
-0.980906	may avoid multiple	-0.124939
-0.975390	go through multiple	-0.124939
-0.564638	separately through multiple	-0.124939
-0.802322	is doing multiple	-0.124939
-0.489039	and doing multiple	-0.124939
-0.863880	from doing multiple	-0.124939
-0.489039	CPU doing multiple	-0.124939
-0.588926	shared_ptr allows multiple	-0.124939
-0.588590	parallel: Using multiple	-0.124939
-0.588085	for running multiple	-0.124939
-0.584598	automatically generate multiple	-0.124939
-0.582327	(IDE) supports multiple	-0.124939
-0.338508	for checking multiple	-0.425969
-0.853901	for testing multiple	-0.124939
-0.576770	declared. Avoid multiple	-0.124939
-0.577023	cores: Define multiple	-0.124939
-0.850903	of compiling multiple	-0.124939
-0.574818	patterns containing multiple	-0.124939
-1.331693	to keep multiple	-0.124939
-0.390610	14.7b. Testing multiple	-0.124939
-0.390610	14.7a. Testing multiple	-0.124939
-0.504219	processing instructions, multiple	-0.124939
-0.658722	for supporting multiple	-0.124939
-0.358357	processing. Running multiple	-0.124939
-0.358357	for combining multiple	-0.124939
-0.358357	access. Run multiple	-0.124939
-0.358357	can toggle multiple	-0.124939
-2.042374	of the two	-0.249877
-2.404068	for the two	-0.124939
-2.443469	that the two	-0.124939
-2.180262	because the two	-0.124939
-1.007248	between the two	-0.124939
-0.597140	mix the two	-0.124939
-0.893305	keep the two	-0.124939
-0.893305	Now the two	-0.124939
-0.597140	interleave the two	-0.124939
-0.597140	namely the two	-0.124939
-2.324473	that is two	-0.124939
-1.890696	use of two	-0.124939
-1.623861	vector of two	-0.124939
-1.488303	performance of two	-0.124939
-1.286946	vectors of two	-0.124939
-1.267968	used in two	-0.124939
-1.180750	software in two	-0.124939
-0.596098	compiling in two	-0.124939
-1.277660	defined in two	-0.124939
-0.596098	representations in two	-0.124939
-1.956148	} The two	-0.124939
-0.599759	up. The two	-0.124939
-0.901671	specifying that two	-0.124939
-2.515153	may be two	-0.124939
-1.418921	there are two	-0.124939
-1.030392	There are two	-0.124939
-0.895920	one or two	-0.425969
-0.593116	each, or two	-0.124939
-0.593116	AVX2, or two	-0.124939
-0.970789	loop by two	-0.124939
-0.594848	unroll by two	-0.124939
-0.597559	table with two	-0.124939
-0.894135	calculated with two	-0.124939
-0.597087	vector as two	-0.124939
-0.893200	represented as two	-0.124939
-1.883094	more than two	-0.124939
-2.223298	rather than two	-0.124939
-1.482622	to have two	-0.124939
-1.751147	you have two	-0.124939
-1.222629	CPUs have two	-0.124939
-0.585207	family have two	-0.124939
-1.487925	program has two	-0.124939
-0.592901	8.23b has two	-0.124939
-2.353844	to make two	-0.124939
-1.588069	can make two	-0.124939
-0.882191	integer. If two	-0.124939
-0.591485	considerable. If two	-0.124939
-1.576931	to do two	-0.124939
-0.587831	operations into two	-0.124939
-0.587831	split into two	-0.124939
-1.880163	the variable two	-0.124939
-1.260562	difference between two	-0.124939
-0.536618	select between two	-0.124939
-0.117161	chooses between two	-0.124939
-1.282584	one way two	-0.124939
-1.256440	the first two	-0.124939
-1.424825	The first two	-0.124939
-0.981910	of these two	-0.124939
-0.777880	and these two	-0.124939
-0.534958	distinguish these two	-0.124939
-1.477419	of making two	-0.124939
-0.564180	} These two	-0.124939
-0.564180	1996. These two	-0.124939
-0.950160	is doing two	-0.124939
-0.950160	for doing two	-0.124939
-0.709964	to run two	-0.425969
-1.364119	The next two	-0.124939
-0.191181	(2,2,2,2,2,2,2,2) __m128i two	-0.425969
-0.875651	avoid running two	-0.124939
-0.587620	set. Make two	-0.124939
-0.519322	with just two	-0.124939
-0.519322	have just two	-0.124939
-0.583808	would require two	-0.124939
-0.578658	doesn't prevent two	-0.124939
-0.319324	to swap two	-0.124939
-0.450194	for approximately two	-0.124939
-0.450194	accessed approximately two	-0.124939
-0.557013	Then again two	-0.124939
-0.915564	to compare two	-0.124939
-0.504279	addition. Comparing two	-0.124939
-0.462965	condition. Replacing two	-0.124939
-0.358400	and correspondingly two	-0.124939
-0.358400	by allowing two	-0.124939
-2.098294	of the object	-0.124939
-2.291113	to the object	-0.124939
-2.161526	and the object	-0.124939
-2.434146	in the object	-0.124939
-2.249569	for the object	-0.124939
-1.905012	that the object	-0.124939
-1.733260	if the object	-0.124939
-2.188620	on the object	-0.124939
-1.708975	when the object	-0.425969
-2.033909	at the object	-0.124939
-2.088031	If the object	-0.124939
-1.890180	where the object	-0.124939
-0.592690	member the object	-0.124939
-0.592690	register the object	-0.124939
-1.557768	sure the object	-0.124939
-1.587850	whether the object	-0.124939
-0.592690	destructor the object	-0.124939
-0.592690	move the object	-0.124939
-0.592690	deleting the object	-0.124939
-0.592690	constructing the object	-0.124939
-0.592690	met: the object	-0.124939
-0.894530	class of object	-0.124939
-2.125091	version of object	-0.124939
-1.598333	type of object	-0.124939
-1.651992	advantages of object	-0.124939
-0.203256	effects of object	-0.425969
-1.068330	classes. The object	-0.124939
-0.598413	another. The object	-0.124939
-0.598413	division). The object	-0.124939
-1.267011	array or object	-0.124939
-0.247916	variable or object	-0.124939
-1.075442	distributed as object	-0.124939
-1.359243	is an object	-0.124939
-1.379029	of an object	-0.124939
-0.861303	to an object	-0.425969
-0.903193	in an object	-0.425969
-0.770546	that an object	-0.124939
-1.017917	on an object	-0.124939
-1.333413	as an object	-0.124939
-0.530751	time an object	-0.124939
-0.544107	use an object	-0.124939
-0.892979	such an object	-0.124939
-0.770546	access an object	-0.124939
-0.188891	accessing an object	-0.124939
-0.770546	whenever an object	-0.124939
-0.530751	Accessing an object	-0.124939
-0.530751	construct an object	-0.124939
-1.400190	the data object	-0.124939
-0.599777	three different object	-0.124939
-2.620252	the same object	-0.124939
-1.288246	from one object	-0.124939
-1.193007	that no object	-0.124939
-1.074668	of each object	-0.124939
-0.570286	store each object	-0.124939
-0.570286	moving each object	-0.124939
-1.359531	a static object	-0.124939
-2.015039	the first object	-0.124939
-1.234633	a new object	-0.124939
-0.539686	above. An object	-0.124939
-0.539686	declared. An object	-0.124939
-0.539686	Inheritance An object	-0.124939
-1.161231	a single object	-0.425969
-1.271347	or structure object	-0.124939
-0.629908	the shared object	-0.124939
-0.151011	a shared object	-0.221849
-0.130367	A shared object	-0.124939
-0.448102	64-bit shared object	-0.124939
-0.317524	large shared object	-0.124939
-0.591014	makes intermediate object	-0.124939
-0.586294	reasons: Each object	-0.124939
-0.335791	the local object	-0.124939
-0.572766	reasons why object	-0.124939
-1.063603	a temporary object	-0.124939
-0.763956	textbooks recommend object	-0.124939
-0.526945	a contained object	-0.124939
-0.883898	the original object	-0.124939
-0.314523	The existing object	-0.124939
-0.444054	an existing object	-0.124939
-0.463203	compilers. Mixing object	-0.124939
-0.358586	the usual object	-0.124939
-1.606660	is the number	-0.425969
-2.352408	of the number	-0.124939
-2.176635	to the number	-0.124939
-1.521760	and the number	-0.602060
-1.853896	that the number	-0.425969
-1.482700	or the number	-0.124939
-1.573247	if the number	-0.726999
-1.644971	by the number	-0.425969
-2.085725	on the number	-0.124939
-1.676423	as the number	-0.124939
-1.506163	than the number	-0.425969
-1.683213	when the number	-0.425969
-1.861906	make the number	-0.124939
-1.030123	If the number	-0.602060
-0.876449	double the number	-0.124939
-1.088364	where the number	-0.602060
-1.596644	between the number	-0.124939
-1.513780	making the number	-0.124939
-1.316444	Therefore, the number	-0.124939
-1.152379	reduce the number	-0.124939
-1.039809	reducing the number	-0.124939
-0.876449	measures the number	-0.124939
-1.299426	manual is number	-0.124939
-2.567610	in a number	-0.124939
-0.803493	are a number	-0.425969
-1.758375	from a number	-0.124939
-0.598704	want a number	-0.124939
-1.434098	program. The number	-0.124939
-1.167142	available. The number	-0.124939
-0.884208	system. The number	-0.124939
-0.592516	systems: The number	-0.124939
-0.884208	8. The number	-0.124939
-0.592516	small. The number	-0.124939
-0.592516	27 The number	-0.124939
-0.202982	512; // number	-0.425969
-0.596399	64; // number	-0.124939
-1.568706	to this number	-0.124939
-1.459140	use this number	-0.124939
-1.259087	floating point number	-0.204120
-1.320918	to set number	-0.124939
-0.589536	in set number	-0.124939
-1.684033	a variable number	-0.124939
-1.279016	A variable number	-0.124939
-1.186530	a 32-bit number	-0.124939
-1.396572	very large number	-0.124939
-0.565317	of element number	-0.124939
-0.565317	reach element number	-0.124939
-1.053118	the line number	-0.124939
-0.590085	The optimal number	-0.124939
-0.588604	false model number	-0.124939
-1.303289	a higher number	-0.124939
-0.185248	large positive number	-0.425969
-0.886580	a limited number	-0.124939
-0.715922	A limited number	-0.124939
-0.277296	The maximum number	-0.602060
-0.577152	a reduced number	-0.124939
-0.472646	the total number	-0.602060
-0.562567	have family number	-0.124939
-0.562472	for random number	-0.124939
-0.964207	a realistic number	-0.124939
-0.064192	an excessive number	-0.602060
-0.540634	the 107 number	-0.124939
-0.143305	an increasing number	-0.425969
-0.527073	an extended number	-0.124939
-0.526721	valid 63 number	-0.124939
-0.504480	an odd number	-0.124939
-0.203971	will evict number	-0.124939
-0.358543	lookups Max. number	-0.124939
-0.358543	an integral number	-0.124939
-1.714946	with the static	-0.124939
-2.242935	use the static	-0.124939
-2.278259	because the static	-0.124939
-1.624449	without the static	-0.124939
-1.290370	add the static	-0.124939
-2.271405	to a static	-0.124939
-1.796677	in a static	-0.124939
-1.836527	or a static	-0.124939
-1.893610	than a static	-0.124939
-1.913533	advantage of static	-0.124939
-0.599153	mechanism of static	-0.124939
-0.897296	storage of static	-0.124939
-1.195458	behavior of static	-0.124939
-0.901971	Access to static	-0.124939
-0.598994	public and static	-0.124939
-0.896979	global and static	-0.124939
-0.598994	123 and static	-0.124939
-1.069428	table in static	-0.124939
-1.182132	stored in static	-0.301030
-0.893337	something in static	-0.124939
-1.640761	function. The static	-0.124939
-1.059114	memory. The static	-0.124939
-1.177564	class. The static	-0.124939
-1.059114	module. The static	-0.124939
-0.595267	tables. The static	-0.124939
-0.901578	clear that static	-0.124939
-0.598108	inline or static	-0.124939
-0.598108	Global or static	-0.124939
-1.200110	not if static	-0.124939
-1.735375	rely on static	-0.124939
-0.593464	want as static	-0.124939
-0.376752	either as static	-0.425969
-0.596111	efficiently than static	-0.124939
-1.351250	slower than static	-0.124939
-1.977396	x) { static	-0.124939
-1.269016	() { static	-0.124939
-2.275867	to use static	-0.124939
-0.594698	never use static	-0.124939
-0.600183	file when static	-0.124939
-0.667128	functions. A static	-0.425969
-0.587947	stack. A static	-0.124939
-1.000732	data from static	-0.124939
-0.574306	table from static	-0.124939
-0.574306	list from static	-0.124939
-0.574306	linked from static	-0.124939
-0.574306	copied from static	-0.124939
-0.599921	problems because static	-0.124939
-1.049699	all functions static	-0.124939
-1.726609	member functions static	-0.124939
-1.739651	for all static	-0.124939
-1.186958	of using static	-0.124939
-1.832666	by using static	-0.124939
-0.895698	local object static	-0.124939
-0.368647	static static static	-0.124939
-0.573708	module static static	-0.124939
-0.082390	from array static	-0.726999
-0.082390	into array static	-0.726999
-0.598086	functions, where static	-0.124939
-0.887452	align large static	-0.124939
-1.343377	int b; static	-0.124939
-1.050191	not support static	-0.124939
-1.194663	in both static	-0.124939
-0.548436	with both static	-0.124939
-0.614889	the keyword static	-0.425969
-0.586006	this requires static	-0.124939
-0.658862	{ public: static	-0.726999
-0.576781	making them static	-0.124939
-1.096861	This includes static	-0.124939
-1.087073	same module static	-0.124939
-1.145810	int n; static	-0.124939
-0.787980	to specify static	-0.124939
-0.526362	{ __declspec(align(16)) static	-0.124939
-0.763868	<int N> static	-0.124939
-0.658608	#include <emmintrin.h> static	-0.124939
-0.462838	Example 14.19 static	-0.124939
-0.462838	statements (called static	-0.124939
-0.658608	of factorials: static	-0.124939
-0.462838	the word static	-0.124939
-0.358299	floata; boolb=0; static	-0.124939
-0.358299	<typename T> static	-0.124939
-0.358299	cache line: static	-0.124939
-0.358299	return _mm_cvtss_si32(_mm_load_ss(&x));} static	-0.124939
-0.358299	function add_horizontal) static	-0.124939
-2.897249	of the 64-bit	-0.124939
-2.810298	in the 64-bit	-0.124939
-2.572313	that the 64-bit	-0.124939
-2.234933	use the 64-bit	-0.124939
-2.268415	because the 64-bit	-0.124939
-1.486278	after the 64-bit	-0.124939
-1.432927	including the 64-bit	-0.124939
-1.918068	of a 64-bit	-0.124939
-1.849136	and a 64-bit	-0.124939
-2.531365	in a 64-bit	-0.124939
-1.745061	using a 64-bit	-0.124939
-0.894963	write a 64-bit	-0.124939
-1.239425	advantage of 64-bit	-0.124939
-1.654231	disadvantage of 64-bit	-0.124939
-0.599282	marketing of 64-bit	-0.124939
-0.601622	portability to 64-bit	-0.124939
-0.102446	32-bit and 64-bit	-0.263241
-1.255857	integers and 64-bit	-0.124939
-0.123179	32- and 64-bit	-0.124939
-0.768785	than in 64-bit	-0.301030
-1.709889	used in 64-bit	-0.124939
-0.340359	efficient in 64-bit	-0.301030
-0.999527	faster in 64-bit	-0.124939
-0.368708	registers in 64-bit	-0.425969
-1.321511	bits in 64-bit	-0.124939
-1.456830	available in 64-bit	-0.124939
-1.100861	bytes in 64-bit	-0.124939
-1.321511	running in 64-bit	-0.124939
-0.579351	needed in 64-bit	-0.425969
-0.573854	avoided in 64-bit	-0.124939
-0.573854	default in 64-bit	-0.124939
-0.573854	follows in 64-bit	-0.124939
-0.573854	fourteen in 64-bit	-0.124939
-0.573854	i.e. in 64-bit	-0.124939
-0.198330	sixteen in 64-bit	-0.124939
-0.573854	enabled in 64-bit	-0.124939
-0.573854	simpler in 64-bit	-0.124939
-0.573854	anyway in 64-bit	-0.124939
-0.573854	recognized in 64-bit	-0.124939
-0.599861	6 The 64-bit	-0.124939
-0.898704	is. The 64-bit	-0.124939
-1.556832	available for 64-bit	-0.124939
-1.479444	compiled for 64-bit	-0.124939
-1.688873	support for 64-bit	-0.124939
-0.892624	drivers for 64-bit	-0.124939
-0.595800	32-bit or 64-bit	-0.124939
-0.595800	systems or 64-bit	-0.124939
-0.595800	(32-bit or 64-bit	-0.124939
-1.898807	efficient than 64-bit	-0.124939
-2.176288	to use 64-bit	-0.124939
-1.768357	can use 64-bit	-0.124939
-1.474635	may use 64-bit	-0.124939
-0.600045	increased from 64-bit	-0.124939
-0.599776	fact only 64-bit	-0.124939
-1.683112	on all 64-bit	-0.124939
-0.895785	into two 64-bit	-0.124939
-0.535243	2 In 64-bit	-0.124939
-0.778379	efficient. In 64-bit	-0.124939
-0.535243	Windows. In 64-bit	-0.124939
-0.535243	parameters. In 64-bit	-0.124939
-0.535243	cycle. In 64-bit	-0.124939
-0.535243	32. In 64-bit	-0.124939
-1.281202	4 4 64-bit	-0.124939
-0.595280	speeding up 64-bit	-0.124939
-0.594914	mode. Some 64-bit	-0.124939
-0.565978	files. Use 64-bit	-0.124939
-0.565978	point. Use 64-bit	-0.124939
-0.589949	numbers. Therefore, 64-bit	-0.124939
-1.366636	more efficient. 64-bit	-0.124939
-0.854395	in registers. 64-bit	-0.124939
-0.574943	use full 64-bit	-0.124939
-0.846902	can expect 64-bit	-0.124939
-0.988996	internal references. 64-bit	-0.124939
-1.088117	to define 64-bit	-0.124939
-1.026169	or reference, 64-bit	-0.124939
-0.540634	respectively. (In 64-bit	-0.124939
-0.526721	int64_t 29 64-bit	-0.124939
-0.463148	unsigned __int64 64-bit	-0.124939
-0.358543	registers, whereas 64-bit	-0.124939
-0.358543	are different. 64-bit	-0.124939
-0.891032	order and there	-0.124939
-0.595989	limited and there	-0.124939
-0.595989	matter and there	-0.124939
-0.595989	common, and there	-0.124939
-0.595989	areas, and there	-0.124939
-0.595989	dominating and there	-0.124939
-1.214657	and that there	-0.124939
-2.056459	so that there	-0.124939
-1.133702	way that there	-0.124939
-1.093405	assume that there	-0.124939
-1.025326	feature that there	-0.124939
-1.531642	Note that there	-0.124939
-0.200314	aware that there	-0.124939
-0.866455	discovered that there	-0.124939
-0.583356	complex, that there	-0.124939
-0.583356	discover that there	-0.124939
-1.443872	elements are there	-0.124939
-1.520000	or if there	-0.124939
-0.548795	program if there	-0.124939
-1.118632	used if there	-0.124939
-0.548795	cache if there	-0.124939
-0.935712	integer if there	-0.124939
-0.358115	pointers if there	-0.425969
-0.935712	thread if there	-0.124939
-0.548795	programs if there	-0.124939
-0.548795	bigger if there	-0.124939
-0.548795	smaller if there	-0.124939
-0.472414	safe if there	-0.425969
-0.548795	exponent if there	-0.124939
-1.103415	especially if there	-0.124939
-1.021722	consider if there	-0.124939
-0.548795	i.e. if there	-0.124939
-0.548795	separately if there	-0.124939
-0.548795	accumulators if there	-0.124939
-0.548795	calls, if there	-0.124939
-0.600908	convenience - there	-0.124939
-0.891641	RAM than there	-0.124939
-0.596298	blocks than there	-0.124939
-1.175016	efficient when there	-0.124939
-0.594599	critical when there	-0.124939
-0.357343	time then there	-0.425969
-0.547000	memory then there	-0.124939
-1.016410	functions then there	-0.124939
-0.547000	two then there	-0.124939
-0.547000	performance then there	-0.124939
-0.547000	error then there	-0.124939
-0.547000	etc. then there	-0.124939
-0.799171	used, then there	-0.124939
-0.547000	elsewhere then there	-0.124939
-0.585183	differently because there	-0.124939
-0.585183	negligible because there	-0.124939
-0.869967	problematic because there	-0.124939
-0.567752	mode. If there	-0.124939
-0.197038	again. If there	-0.425969
-0.567752	running. If there	-0.124939
-0.567752	calculate. If there	-0.124939
-0.849822	code, but there	-0.124939
-0.574610	precision, but there	-0.124939
-0.574610	devices, but there	-0.124939
-0.574610	bases, but there	-0.124939
-0.584580	operations where there	-0.124939
-0.868807	however, where there	-0.124939
-0.597913	parameter, so there	-0.124939
-1.517948	this case there	-0.124939
-0.783917	function. But there	-0.124939
-0.538398	pointer. But there	-0.124939
-0.538398	integers. But there	-0.124939
-0.592271	and whether there	-0.124939
-0.443520	possible. However, there	-0.124939
-0.443520	critical. However, there	-0.124939
-0.443520	120 However, there	-0.124939
-0.443520	automatically. However, there	-0.124939
-0.443520	are. However, there	-0.124939
-0.508143	default unless there	-0.124939
-0.508143	object, unless there	-0.124939
-0.508143	manually unless there	-0.124939
-0.977270	most cases, there	-0.124939
-0.473907	some cases, there	-0.301030
-0.876962	simply put there	-0.124939
-0.584822	ArrayOfStructures[100]; Here, there	-0.124939
-1.312166	reason why there	-0.124939
-0.569950	(Of course there	-0.124939
-0.841227	are used, there	-0.124939
-1.268096	the diagonal there	-0.124939
-0.827222	64-bit systems, there	-0.124939
-1.003314	cases, however, there	-0.124939
-1.202632	In general, there	-0.124939
-0.540634	all. Fortunately, there	-0.124939
-0.997909	but unfortunately there	-0.124939
-0.358735	units. Typically, there	-0.124939
-0.358735	caches. Typically, there	-0.124939
-0.763569	is enabled there	-0.124939
-0.659095	be avoided, there	-0.124939
-2.066633	of the C++	-0.124939
-2.482909	for the C++	-0.124939
-2.301041	with the C++	-0.124939
-1.361479	But the C++	-0.124939
-0.896756	behind the C++	-0.124939
-0.598881	Because the C++	-0.124939
-2.644991	in a C++	-0.124939
-0.600070	In a C++	-0.124939
-1.499206	Make a C++	-0.124939
-2.112870	version of C++	-0.124939
-0.248547	optimization of C++	-0.602060
-1.625083	disadvantage of C++	-0.124939
-1.641181	advantages of C++	-0.124939
-1.351225	brands of C++	-0.124939
-0.596106	maintainability of C++	-0.124939
-1.626424	Windows and C++	-0.124939
-0.600135	C and C++	-0.124939
-1.412959	functions in C++	-0.124939
-1.176549	software in C++	-0.124939
-0.596404	errors in C++	-0.124939
-0.595001	processing in C++	-0.124939
-0.889086	allowed in C++	-0.124939
-0.595001	Development in C++	-0.124939
-1.177758	etc. The C++	-0.124939
-0.595318	conversions The C++	-0.124939
-0.889709	needed. The C++	-0.124939
-1.059261	work. The C++	-0.124939
-0.889709	resource. The C++	-0.124939
-1.076688	is for C++	-0.124939
-1.377326	sense that C++	-0.124939
-0.901080	casting // C++	-0.124939
-0.248440	C or C++	-0.124939
-0.203178	Report on C++	-0.124939
-2.131286	such as C++	-0.124939
-0.597064	developed as C++	-0.124939
-0.899430	disadvantages when C++	-0.124939
-0.599861	117 A C++	-0.124939
-0.828085	of different C++	-0.124939
-1.123173	in different C++	-0.124939
-0.560392	for different C++	-0.903090
-1.174326	several different C++	-0.124939
-1.681014	on all C++	-0.124939
-0.599048	Furthermore, most C++	-0.124939
-1.135785	the Intel C++	-0.124939
-0.992192	of Intel C++	-0.124939
-0.784325	with Intel C++	-0.124939
-0.538630	compilers. Intel C++	-0.124939
-0.538630	2009). Intel C++	-0.124939
-0.538630	2.00. Intel C++	-0.124939
-0.598922	vectors into C++	-0.124939
-0.598335	it. In C++	-0.124939
-1.436758	The Gnu C++	-0.124939
-0.568951	2009. Gnu C++	-0.124939
-1.056120	in compiled C++	-0.124939
-0.592299	produces another C++	-0.124939
-0.590745	options All C++	-0.124939
-0.590435	platforms. However, C++	-0.124939
-0.543049	efficient. Most C++	-0.124939
-0.543049	optimizations. Most C++	-0.124939
-0.538434	and Microsoft C++	-0.124939
-0.538434	tested: Microsoft C++	-0.124939
-0.583299	on advanced C++	-0.124939
-0.581417	most modern C++	-0.124939
-1.104181	function libraries. C++	-0.124939
-0.474197	platforms. PathScale C++	-0.124939
-0.474197	Hat). PathScale C++	-0.124939
-1.111024	assembly language. C++	-0.124939
-0.661894	the Borland C++	-0.124939
-0.415114	2005). Borland C++	-0.124939
-0.526467	several reasons. C++	-0.124939
-0.526467	language While C++	-0.124939
-0.314377	tolerated. PGI C++	-0.124939
-0.314377	2007. PGI C++	-0.124939
-0.526707	in C, C++	-0.124939
-0.504239	a well-structured C++	-0.124939
-0.462929	page 15. C++	-0.124939
-0.462929	in 36 C++	-0.124939
-0.462929	security. Standard C++	-0.124939
-0.462929	14 Portability C++	-0.124939
-0.358371	optimizer. Borland/CodeGear/Embarcadero C++	-0.124939
-0.358371	Intel: "Intel® C++	-0.124939
-0.358371	5.82 (Embarcadero/CodeGear/Borland C++	-0.124939
-1.769120	it is also	-0.124939
-1.615201	function is also	-0.124939
-2.157564	This is also	-0.124939
-1.724630	It is also	-0.301030
-1.145261	memory is also	-0.124939
-1.145261	functions is also	-0.124939
-1.394466	C++ is also	-0.124939
-0.872664	so is also	-0.124939
-0.200980	allocated is also	-0.124939
-0.872664	option is also	-0.124939
-1.242171	manual is also	-0.124939
-1.374752	operator is also	-0.124939
-0.373953	mechanism is also	-0.425969
-0.586581	buffer is also	-0.124939
-0.586581	Fortran is also	-0.124939
-0.586581	while-loop is also	-0.124939
-0.601329	programs and also	-0.124939
-1.201474	loop that also	-0.124939
-1.120042	program are also	-0.124939
-0.199506	other are also	-0.425969
-1.018532	There are also	-0.221849
-1.562357	objects are also	-0.124939
-1.545561	libraries are also	-0.124939
-1.120042	files are also	-0.124939
-0.859019	together are also	-0.124939
-0.579466	purposes are also	-0.124939
-1.768178	it can also	-0.124939
-1.694879	compiler can also	-0.124939
-1.831029	you can also	-0.124939
-0.628165	It can also	-0.301030
-1.022705	variables can also	-0.124939
-1.022705	branch can also	-0.124939
-0.846614	systems can also	-0.124939
-0.572904	classes can also	-0.124939
-0.572904	parameter can also	-0.124939
-0.997002	union can also	-0.124939
-0.846614	pattern can also	-0.124939
-0.572904	(arrays can also	-0.124939
-0.601182	time, it also	-0.124939
-0.601070	latter function also	-0.124939
-0.596975	possible. This also	-0.124939
-0.596975	137). This also	-0.124939
-1.857888	you may also	-0.124939
-1.350847	There may also	-0.124939
-1.231037	we may also	-0.124939
-0.587130	map may also	-0.124939
-0.600152	OneOrTwo5[b!=0]; will also	-0.124939
-1.063157	functions. It also	-0.124939
-0.593974	this. It also	-0.124939
-0.543339	test but also	-0.124939
-1.005707	time, but also	-0.124939
-0.543339	features, but also	-0.124939
-0.543339	languages, but also	-0.124939
-0.543339	spot but also	-0.124939
-0.543339	main, but also	-0.124939
-0.543339	casting, but also	-0.124939
-0.543339	platform, but also	-0.124939
-1.144436	function should also	-0.124939
-0.581241	video should also	-0.124939
-0.581241	Uninstallation should also	-0.124939
-2.430801	instruction set also	-0.124939
-1.289683	The compilers also	-0.124939
-0.488583	Some systems also	-0.425969
-1.773227	of these also	-0.124939
-1.796862	This method also	-0.124939
-1.272770	register stack also	-0.124939
-1.341073	C++ language also	-0.124939
-0.593282	about Linux also	-0.124939
-0.591953	increment operators also	-0.124939
-1.775824	memory allocation also	-0.124939
-0.876544	These methods also	-0.124939
-1.161853	static keyword also	-0.124939
-0.586739	Reducible expressions also	-0.124939
-1.336061	the STL also	-0.124939
-0.577368	www.intel.com. (See also	-0.124939
-0.850721	and possibly also	-0.124939
-1.318121	of course also	-0.124939
-1.184843	Loop unrolling also	-0.124939
-0.569972	coprocessor might also	-0.124939
-0.570057	by F1 also	-0.124939
-0.550231	and soon also	-0.124939
-0.107113	link libraries, also	-0.425969
-0.358500	is 102 also	-0.124939
-1.195147	source of such	-0.124939
-0.898580	consequence of such	-0.124939
-0.599799	absence of such	-0.124939
-1.671017	reference to such	-0.124939
-1.074770	costs to such	-0.124939
-1.070849	solution in such	-0.124939
-1.070849	supported in such	-0.124939
-0.599265	reorganized in such	-0.124939
-0.894952	efficient for such	-0.124939
-0.894952	checks for such	-0.124939
-0.597972	warning for such	-0.124939
-1.191614	possibility that such	-0.124939
-0.894713	hope that such	-0.124939
-0.597851	realize that such	-0.124939
-1.867571	member function such	-0.124939
-0.598045	user if such	-0.124939
-0.598045	gain if such	-0.124939
-0.590402	use on such	-0.124939
-0.590402	even on such	-0.124939
-1.045107	fast on such	-0.124939
-0.590402	operation on such	-0.124939
-1.464332	not have such	-0.124939
-0.889053	do have such	-0.124939
-1.073181	should use such	-0.124939
-0.599637	doesn't make such	-0.124939
-0.548626	using functions such	-0.124939
-0.374054	mathematical functions such	-0.726999
-0.548626	C functions such	-0.124939
-0.802082	math functions such	-0.124939
-0.548626	memory-intensive functions such	-0.124939
-0.599317	often, but such	-0.124939
-2.228809	is no such	-0.124939
-1.901207	to do such	-0.124939
-0.857942	not do such	-0.124939
-1.013106	will do such	-0.124939
-0.598933	Good compilers such	-0.124939
-0.598588	Programs using such	-0.124939
-1.067660	calculations. In such	-0.124939
-1.600110	with many such	-0.124939
-1.465039	integer operations such	-0.124939
-1.183204	composite type such	-0.124939
-0.889287	special cases such	-0.124939
-1.174010	of CPUs such	-0.124939
-0.590175	screen. However, such	-0.124939
-0.878304	automatically replace such	-0.124939
-0.588943	used branches such	-0.124939
-0.586701	other applications such	-0.124939
-0.872720	simple types such	-0.124939
-0.769041	other optimizations such	-0.124939
-0.529884	do optimizations such	-0.124939
-0.864723	parenthesis around such	-0.124939
-1.288372	algebraic reductions such	-0.124939
-0.509671	compiled languages such	-0.124939
-0.509671	includes languages such	-0.124939
-0.341206	to prevent such	-0.301030
-0.358353	for tasks such	-0.124939
-0.504214	standard tasks such	-0.124939
-0.358353	Other tasks such	-0.124939
-0.358353	trivial tasks such	-0.124939
-0.576594	long time, such	-0.124939
-1.308201	reason why such	-0.124939
-0.566084	building blocks such	-0.124939
-0.566084	of purposes such	-0.124939
-0.561597	mathematical iterations such	-0.124939
-0.992477	of vector, such	-0.124939
-0.556330	segmented memory, such	-0.124939
-0.556766	third-party profilers such	-0.124939
-0.556330	also available, such	-0.124939
-0.540480	between threads, such	-0.124939
-0.995789	programming languages, such	-0.124939
-0.526003	same resources, such	-0.124939
-0.526359	of overflow, such	-0.124939
-0.957230	example illustrates such	-0.124939
-0.526003	by considerations such	-0.124939
-0.503798	in comparisons, such	-0.124939
-0.503798	definition language, such	-0.124939
-0.724748	to justify such	-0.124939
-0.503798	string classes, such	-0.124939
-0.462528	STL templates, such	-0.124939
-0.462528	other resource, such	-0.124939
-0.462528	data shuffling, such	-0.124939
-0.658122	certain events, such	-0.124939
-0.462528	with suffixes such	-0.124939
-0.658122	and maintaining such	-0.124939
-0.358056	inherently serial, such	-0.124939
-0.358056	automatic vectorization, such	-0.124939
-0.358056	to obtain, such	-0.124939
-0.358056	removable media such	-0.124939
-0.358056	table 9.2, such	-0.124939
-0.358056	feature information, such	-0.124939
-0.358056	hardware. Porting such	-0.124939
-0.358056	may supply such	-0.124939
-2.524978	This is efficient	-0.124939
-0.601065	1.5f; is efficient	-0.124939
-1.640782	discussion of efficient	-0.124939
-1.078520	obstacles to efficient	-0.124939
-0.902084	compact and efficient	-0.124939
-2.213150	will be efficient	-0.124939
-0.901766	Templates are efficient	-0.124939
-0.339284	is as efficient	-0.602060
-0.859896	therefore as efficient	-0.124939
-1.353385	well as efficient	-0.124939
-0.199602	exactly as efficient	-0.425969
-0.640340	be an efficient	-0.301030
-1.736528	compilers have efficient	-0.124939
-0.313839	is more efficient	-0.903090
-0.791138	a more efficient	-0.124939
-0.555200	and more efficient	-0.124939
-0.379581	be more efficient	-0.425969
-0.488456	are more efficient	-0.124939
-0.577253	by more efficient	-0.124939
-0.525332	code more efficient	-0.124939
-0.741453	A more efficient	-0.124939
-0.409275	make more efficient	-0.124939
-0.577253	often more efficient	-0.124939
-0.652110	much more efficient	-0.124939
-0.292603	caching more efficient	-0.124939
-0.651639	becomes more efficient	-0.124939
-0.409275	calling more efficient	-0.124939
-0.409275	inlining more efficient	-0.124939
-0.158110	sometimes more efficient	-0.425969
-0.272607	slightly more efficient	-0.124939
-0.673859	the most efficient	-0.249877
-0.869830	is most efficient	-0.124939
-0.959312	The most efficient	-0.124939
-0.332579	are most efficient	-0.124939
-1.071659	a very efficient	-0.124939
-1.293364	be very efficient	-0.124939
-0.370438	is less efficient	-0.492916
-0.406627	and less efficient	-0.124939
-0.513100	be less efficient	-0.124939
-0.411383	are less efficient	-0.124939
-0.406627	input less efficient	-0.124939
-0.596858	Note how efficient	-0.124939
-0.594451	matrices. An efficient	-0.124939
-1.423411	is quite efficient	-0.124939
-0.876182	and various efficient	-0.124939
-0.463556	are equally efficient	-0.124939
-1.294373	detection function In	-0.124939
-2.006091	} } In	-0.124939
-0.874532	c; } In	-0.124939
-0.874532	2.0; } In	-0.124939
-1.067828	to double In	-0.124939
-0.596665	be 2 In	-0.124939
-1.005610	system code. In	-0.124939
-1.005610	compiled code. In	-0.124939
-0.595701	Member pointers In	-0.124939
-0.594929	Difficult cases In	-0.124939
-1.493375	the function. In	-0.124939
-1.012812	critical function. In	-0.124939
-0.534469	programming, etc. In	-0.124939
-0.534469	counters, etc. In	-0.124939
-0.534469	limit, etc. In	-0.124939
-1.533779	unsigned integers In	-0.124939
-0.592194	+= b; In	-0.124939
-1.593305	the program. In	-0.124939
-0.857507	less efficient. In	-0.124939
-0.585186	different processors. In	-0.124939
-1.432233	the loop. In	-0.124939
-0.509025	code size. In	-0.124939
-0.733476	register size. In	-0.124939
-0.578880	same variables. In	-0.124939
-1.119849	Bounds checking In	-0.124939
-0.577554	the resources. In	-0.124939
-0.576028	need it. In	-0.124939
-0.297882	mathematical calculations. In	-0.124939
-0.419695	graphics calculations. In	-0.124939
-1.508094	clock cycles. In	-0.124939
-1.181961	Loop unrolling In	-0.124939
-1.111900	64-bit Windows. In	-0.124939
-0.449808	is faster. In	-0.124939
-0.725141	much faster. In	-0.124939
-1.238423	template parameter. In	-0.124939
-0.978933	array element. In	-0.124939
-0.561470	XMM register. In	-0.124939
-0.561470	equally fast. In	-0.124939
-0.555969	register parameters. In	-0.124939
-0.556507	objects simultaneously. In	-0.124939
-0.556238	other optimizations. In	-0.124939
-1.107953	assembly language. In	-0.124939
-0.414582	higher speed. In	-0.124939
-0.414582	single-thread speed. In	-0.124939
-1.108913	by 16. In	-0.124939
-0.937843	the application. In	-0.124939
-0.785936	low priority. In	-0.124939
-0.539908	them all. In	-0.124939
-1.137070	clock cycle. In	-0.124939
-0.526106	of two. In	-0.124939
-0.525666	case" counts. In	-0.124939
-0.832567	dependency chains. In	-0.124939
-0.503478	-2.0 55 In	-0.124939
-0.503478	it explicitly. In	-0.124939
-0.724215	intended for. In	-0.124939
-0.503478	by step. In	-0.124939
-0.503478	this condition. In	-0.124939
-0.503478	doesn't occur. In	-0.124939
-0.503478	reused elsewhere. In	-0.124939
-0.503478	very big. In	-0.124939
-0.657665	page 71). In	-0.124939
-0.462237	software users. In	-0.124939
-0.462237	can throw. In	-0.124939
-0.462237	usually 32. In	-0.124939
-0.462237	vendor string. In	-0.124939
-0.462237	exception safe. In	-0.124939
-0.462237	optimal, though. In	-0.124939
-0.462237	same name. In	-0.124939
-0.462237	each calculation. In	-0.124939
-0.657665	be obtained. In	-0.124939
-0.657665	clock cycle? In	-0.124939
-0.657665	mathematical purity. In	-0.124939
-0.462237	microprocessors have. In	-0.124939
-0.357827	specifying otherwise. In	-0.124939
-0.357827	option /MT). In	-0.124939
-0.357827	of B. In	-0.124939
-0.357827	to mind. In	-0.124939
-0.357827	page 60. In	-0.124939
-0.357827	Journal, 2002). In	-0.124939
-0.357827	course system-specific. In	-0.124939
-0.357827	mispredictions. 44 In	-0.124939
-0.357827	page 34. In	-0.124939
-0.357827	32-bit counterparts. In	-0.124939
-0.357827	are short. In	-0.124939
-0.357827	is destroyed. In	-0.124939
-0.357827	self-relative addressing. In	-0.124939
-0.357827	same divisor. In	-0.124939
-0.357827	MAX(f(x), g(x)); In	-0.124939
-0.357827	number generators. In	-0.124939
-0.357827	(page 146). In	-0.124939
-0.357827	call __intel_cpu_features_init_x(). In	-0.124939
-0.357827	its API. In	-0.124939
-0.357827	many strings. In	-0.124939
-0.693222	= a *	-0.602060
-2.104553	as a *	-0.124939
-1.605563	return a *	-0.124939
-1.067614	a[], int *	-0.124939
-0.596238	aa, int *	-0.124939
-1.025468	= x *	-0.124939
-0.088713	return x *	-0.425969
-0.433351	= b *	-0.234083
-1.105403	+ b *	-0.124939
-0.170787	: b *	-0.425969
-0.648362	expression b *	-0.124939
-0.170787	2, b *	-0.425969
-0.456282	two, b *	-0.124939
-1.068999	= i *	-0.124939
-0.598210	floats: float *	-0.124939
-0.597183	= 2 *	-0.124939
-0.532729	char const *	-0.124939
-0.116666	LoadVector(void const *	-0.602060
-0.532729	LoadVectorA(void const *	-0.124939
-0.596674	= 8 *	-0.124939
-0.548878	Plus2 (int *	-0.124939
-0.548878	FuncA (int *	-0.124939
-0.584229	value 10 *	-0.124939
-0.864487	* 5 *	-0.124939
-0.335696	= temp *	-0.425969
-0.576435	* 100 *	-0.124939
-0.434428	+ j *	-0.124939
-0.434428	replace j *	-0.124939
-1.046012	if (a *	-0.124939
-0.540430	c;}; abc *	-0.124939
-1.008373	if (u.i *	-0.124939
-0.526580	Object2; CChild1 *	-0.124939
-0.526862	Object2; CHello *	-0.124939
-0.504435	take 1000 *	-0.124939
-0.504435	= x2 *	-0.124939
-0.504435	obj1; C0 *	-0.124939
-0.027989	void StoreVector(void *	-0.602060
-0.504079	+= xxn *	-0.124939
-0.504079	p1->Hello(); CChild2 *	-0.124939
-0.658522	= a2 *	-0.124939
-0.658522	= a1 *	-0.124939
-0.462783	version CriticalFunctionType *	-0.124939
-0.658522	version FuncType *	-0.124939
-0.462783	: (bb[i] *	-0.124939
-0.143189	is (columns *	-0.124939
-0.143189	* (columns *	-0.124939
-0.143189	function. typeof(CriticalFunction) *	-0.124939
-0.143189	("CriticalFunction"); typeof(CriticalFunction) *	-0.124939
-0.462783	return Func1(x) *	-0.124939
-0.658522	= (a+1) *	-0.124939
-0.358256	int Sum2(S3 *	-0.124939
-0.358256	For example,a *	-0.124939
-0.358256	void AddTwo(int *	-0.124939
-0.358256	+ FuncCol(i)) *	-0.124939
-0.358256	return (2.5f *	-0.124939
-0.358256	= a[i].u[1] *	-0.124939
-0.358256	(float *)alloca(n *	-0.124939
-0.358256	/ (b1 *	-0.124939
-0.358256	return powN<(N1&(N1-1))==0,N1>::p(x) *	-0.124939
-0.358256	> v.i *	-0.124939
-0.358256	void StoreNTD(double *	-0.124939
-0.358256	return powN<true,N/2>::p(x) *	-0.124939
-0.358256	void StoreVectorA(void *	-0.124939
-0.358256	4, anda *	-0.124939
-0.358256	* b2 *	-0.124939
-0.358256	* b1 *	-0.124939
-0.358256	log (b[i] *	-0.124939
-0.358256	- 8.0f) *	-0.124939
-1.270853	of compiler There	-0.124939
-0.890767	by compiler There	-0.124939
-0.599838	Func(ab[i].a); } There	-0.124939
-0.598134	is double There	-0.124939
-0.595620	mathematical code. There	-0.124939
-1.204175	same time. There	-0.124939
-1.182997	extra time. There	-0.124939
-0.593888	memcpy function. There	-0.124939
-0.592432	size, etc. There	-0.124939
-0.591982	different functions. There	-0.124939
-1.369887	{ ... There	-0.124939
-0.877634	Model-specific dispatching There	-0.124939
-0.587799	the systems. There	-0.124939
-1.546791	less efficient. There	-0.124939
-1.158705	explained below. There	-0.124939
-0.767735	logical processors. There	-0.124939
-0.767735	future processors. There	-0.124939
-0.584305	for vectors There	-0.124939
-0.583622	Development process There	-0.124939
-0.583738	was called. There	-0.124939
-0.581375	free are: There	-0.124939
-1.016415	vector size. There	-0.124939
-0.577378	other resources. There	-0.124939
-1.456309	function calls. There	-0.124939
-0.575868	support it. There	-0.124939
-0.852196	of registers. There	-0.124939
-0.575868	same object. There	-0.124939
-1.107676	in performance. There	-0.124939
-0.571660	control instructions. There	-0.124939
-0.571660	different way. There	-0.124939
-0.985460	internal references. There	-0.124939
-0.568734	same address. There	-0.124939
-1.328299	or not. There	-0.124939
-1.154459	end user. There	-0.124939
-1.341974	function returns. There	-0.124939
-1.272420	memory allocation. There	-0.124939
-1.324769	is enabled. There	-0.124939
-0.565184	becomes inefficient. There	-0.124939
-0.565653	try block. There	-0.124939
-0.978614	much faster. There	-0.124939
-1.237554	template parameter. There	-0.124939
-0.832659	Boolean expressions. There	-0.124939
-0.561571	maximum value. There	-0.124939
-0.561048	unaligned arrays. There	-0.124939
-0.824607	control branch. There	-0.124939
-0.561309	point vectors. There	-0.124939
-0.555788	four parameters. There	-0.124939
-0.556969	higher bits. There	-0.124939
-0.556378	comes automatically. There	-0.124939
-0.556083	CPU core. There	-0.124939
-1.141561	multiple threads. There	-0.124939
-0.803738	unit throughput There	-0.124939
-0.539768	time- consuming. There	-0.124939
-0.539371	present manual. There	-0.124939
-0.994299	out-of-order execution. There	-0.124939
-1.038969	by 8. There	-0.124939
-0.763121	(*.dll, *.so). There	-0.124939
-0.525498	AVX support. There	-0.124939
-0.761461	not optimal. There	-0.124939
-0.955867	and maintenance There	-0.124939
-0.503317	code explicitly. There	-0.124939
-0.723949	in Windows). There	-0.124939
-0.723949	AMD CodeAnalyst. There	-0.124939
-0.832226	following way: There	-0.124939
-0.723949	to security. There	-0.124939
-0.503317	very limited. There	-0.124939
-0.503317	number 0x1C. There	-0.124939
-0.503317	Memory copying. There	-0.124939
-0.503317	to post-increment. There	-0.124939
-0.657436	the screen. There	-0.124939
-0.462091	application programmer. There	-0.124939
-0.462091	overflow check. There	-0.124939
-0.657436	"Instruction tables". There	-0.124939
-0.462091	is created. There	-0.124939
-0.462091	p. 43). There	-0.124939
-0.462091	and Gnu. There	-0.124939
-0.657436	12.10 Conclusion There	-0.124939
-0.462091	save power. There	-0.124939
-0.462091	p. 87). There	-0.124939
-0.357712	= -abs(x);. There	-0.124939
-0.357712	to NULL. There	-0.124939
-0.357712	it uses. There	-0.124939
-0.357712	and 2B. There	-0.124939
-0.357712	2 Mbytes. There	-0.124939
-0.357712	by commas. There	-0.124939
-0.357712	be recycled? There	-0.124939
-0.357712	as x4∙xn-4. There	-0.124939
-0.357712	7.33 Namespaces There	-0.124939
-0.357712	is returned. There	-0.124939
-0.357712	to 36. There	-0.124939
-0.357712	than normally. There	-0.124939
-0.357712	or .so). There	-0.124939
-0.357712	floating point). There	-0.124939
-0.357712	using inheritance. There	-0.124939
-1.783916	of the array	-0.191886
-2.273676	and the array	-0.124939
-2.581666	in the array	-0.124939
-1.978400	if the array	-0.124939
-1.983523	make the array	-0.124939
-1.418325	which the array	-0.124939
-1.911666	all the array	-0.124939
-1.180902	case the array	-0.124939
-0.596138	compares the array	-0.124939
-0.596138	Sort the array	-0.124939
-1.947859	address of array	-0.124939
-0.503007	addresses of array	-0.124939
-1.484177	end of array	-0.124939
-0.598493	Violation of array	-0.124939
-1.639845	access to array	-0.124939
-0.601342	sizes and array	-0.124939
-1.299258	result in array	-0.124939
-1.340976	variables for array	-0.124939
-1.748581	check for array	-0.124939
-1.453586	intended for array	-0.124939
-0.889614	checking for array	-0.124939
-0.889614	checks for array	-0.124939
-0.900767	object or array	-0.124939
-1.394281	is an array	-0.124939
-0.757353	of an array	-0.124939
-0.872916	to an array	-0.124939
-1.269343	in an array	-0.124939
-1.087165	if an array	-0.124939
-0.753224	as an array	-0.124939
-0.537948	set an array	-0.124939
-0.537948	like an array	-0.124939
-0.537948	copying an array	-0.124939
-0.537948	setting an array	-0.124939
-0.537948	feeding an array	-0.124939
-0.088497	vector from array	-0.726999
-2.615516	the same array	-0.124939
-0.897808	for one array	-0.124939
-1.123444	of each array	-0.124939
-1.070099	fixed size array	-0.124939
-0.087335	vector into array	-0.726999
-0.895724	swap two array	-0.124939
-0.595859	Make dynamic array	-0.124939
-1.050739	a simple array	-0.124939
-0.887958	A large array	-0.124939
-0.491140	} An array	-0.124939
-0.491140	arrays. An array	-0.124939
-0.491140	Arrays An array	-0.124939
-0.491140	www.agner.org/optimize/cppexamples.zip. An array	-0.124939
-0.491140	27. An array	-0.124939
-0.594145	Loop through array	-0.124939
-0.970508	the allocated array	-0.124939
-0.562736	An allocated array	-0.124939
-1.563720	// Make array	-0.124939
-1.209079	cycles per array	-0.124939
-1.478688	the final array	-0.124939
-0.574895	allocation Any array	-0.124939
-0.412353	a linear array	-0.124939
-1.159828	the current array	-0.124939
-1.063239	a temporary array	-0.124939
-0.212749	a multidimensional array	-0.124939
-0.061054	or multidimensional array	-0.124939
-0.132117	A multidimensional array	-0.124939
-0.526679	to individual array	-0.124939
-0.763819	string constants, array	-0.124939
-0.463111	a variable-size array	-0.124939
-0.463111	Safe [] array	-0.124939
-0.358514	// Output array	-0.124939
-0.358514	a fixed-size array	-0.124939
-1.373891	arrays and where	-0.124939
-2.521729	the function where	-0.124939
-2.629360	the code where	-0.124939
-2.017569	a program where	-0.124939
-0.897715	the point where	-0.124939
-1.305286	a loop where	-0.124939
-2.148667	be used where	-0.124939
-2.422719	instruction set where	-0.124939
-1.572212	shared object where	-0.124939
-0.895389	functions static where	-0.124939
-0.893967	public variable where	-0.124939
-0.596130	large libraries where	-0.124939
-1.690567	vector operations where	-0.124939
-0.595069	general case where	-0.124939
-0.461739	the cases where	-0.124939
-0.048628	in cases where	-0.346788
-0.133609	be cases where	-0.124939
-0.327578	are cases where	-0.124939
-0.591649	most cases where	-0.124939
-0.327578	In cases where	-0.124939
-0.461739	many cases where	-0.124939
-0.327578	simple cases where	-0.124939
-0.133609	few cases where	-0.425969
-0.461739	special cases where	-0.124939
-0.594752	carry) instructions where	-0.124939
-1.339894	two threads where	-0.124939
-0.592239	a solution where	-0.124939
-1.359455	a structure where	-0.124939
-1.581443	64-bit mode where	-0.124939
-1.155491	in programs where	-0.124939
-1.475446	memory space where	-0.124939
-0.588773	data sets where	-0.124939
-1.038489	memory model where	-0.124939
-0.587267	obscure examples where	-0.124939
-0.871918	of expressions where	-0.124939
-0.584137	learning process where	-0.124939
-0.582252	4 computer where	-0.124939
-0.581137	interpreted languages where	-0.124939
-0.579976	member functions, where	-0.124939
-0.850084	can predict where	-0.124939
-0.038648	the situation where	-0.301030
-0.128098	use situation where	-0.124939
-0.128098	A situation where	-0.124939
-0.128098	only situation where	-0.124939
-0.128098	any situation where	-0.124939
-0.128098	common situation where	-0.124939
-0.839946	of templates where	-0.124939
-1.017123	a sequence where	-0.124939
-0.172537	overflow checks where	-0.124939
-0.112870	of situations where	-0.124939
-0.084565	in situations where	-0.124939
-0.112870	be situations where	-0.124939
-0.112870	are situations where	-0.124939
-0.112870	also situations where	-0.124939
-0.561890	= false where	-0.124939
-1.221459	dependency chain where	-0.124939
-0.305116	cases, however, where	-0.124939
-1.003230	is determined where	-0.124939
-0.816562	bit mode, where	-0.124939
-0.804096	second step where	-0.124939
-0.549517	addresses (i.e. where	-0.124939
-0.540002	in Fortran where	-0.124939
-0.915227	multiple inheritance where	-0.124939
-0.526109	a pipeline where	-0.124939
-0.762514	data cache, where	-0.124939
-0.526109	column-wise manner where	-0.124939
-0.503898	of calculations, where	-0.124939
-0.724914	the Internet where	-0.124939
-0.503898	sequential instructions, where	-0.124939
-0.658265	example 12.4a where	-0.124939
-0.462619	efficient today where	-0.124939
-0.462619	An experiment where	-0.124939
-0.358127	are areas where	-0.124939
-0.358127	variable __intel_cpu_feature_indicator where	-0.124939
-0.358127	around 1980 where	-0.124939
-0.358127	for pow(x,N) where	-0.124939
-0.358127	2eee 1.fffff, where	-0.124939
-0.358127	of data", where	-0.124939
-0.358127	places back, where	-0.124939
-0.358127	the sequence, where	-0.124939
-1.077048	implement the many	-0.124939
-0.601348	thank the many	-0.124939
-1.444747	speed is many	-0.124939
-0.600742	costly to many	-0.124939
-0.600742	consumer to many	-0.124939
-1.064701	compiler in many	-0.124939
-1.639174	variable in many	-0.124939
-0.597179	explicitly in many	-0.124939
-0.597179	users in many	-0.124939
-0.893383	missing in many	-0.124939
-1.149396	use for many	-0.124939
-1.233639	performance for many	-0.124939
-0.341954	libraries for many	-0.124939
-1.730202	useful for many	-0.124939
-0.877608	available for many	-0.124939
-0.587719	resource for many	-0.124939
-0.587719	market for many	-0.124939
-0.898311	discovered that many	-0.124939
-0.599664	mind, that many	-0.124939
-0.836948	there are many	-0.204120
-1.497802	critical function many	-0.124939
-1.545607	used by many	-0.124939
-1.086010	it with many	-0.124939
-1.086010	compiler with many	-0.124939
-0.840032	program with many	-0.124939
-0.840032	called with many	-0.124939
-0.569387	template with many	-0.124939
-0.569387	programs with many	-0.124939
-0.569387	application with many	-0.124939
-0.840032	applications with many	-0.124939
-0.987725	computer with many	-0.124939
-0.840032	statement with many	-0.124939
-0.569387	IDE with many	-0.124939
-2.261558	such as many	-0.124939
-1.514964	to have many	-0.124939
-1.565515	that have many	-0.124939
-1.661092	compilers have many	-0.124939
-0.600212	way, then many	-0.124939
-1.626576	called from many	-0.124939
-1.487531	it has many	-0.124939
-0.790891	program has many	-0.124939
-1.319957	library has many	-0.124939
-0.559359	C++ has many	-0.124939
-0.559359	D has many	-0.124939
-0.559359	Pascal has many	-0.124939
-1.990151	are used many	-0.124939
-1.285561	divided into many	-0.124939
-0.844838	efficient. In many	-0.124939
-0.571958	priority. In many	-0.124939
-0.571958	purity. In many	-0.124939
-0.865226	be so many	-0.124939
-1.211922	are so many	-0.124939
-2.191007	For example, many	-0.124939
-0.191016	count how many	-0.425969
-0.540169	tell how many	-0.124939
-0.540169	counts how many	-0.124939
-0.594532	from its many	-0.124939
-0.594257	cases, while many	-0.124939
-0.594030	code. But many	-0.124939
-1.175740	program uses many	-0.124939
-0.870966	program contains many	-0.124939
-0.435840	library contains many	-0.124939
-0.492014	Library" contains many	-0.124939
-1.058688	that run many	-0.124939
-1.702193	to store many	-0.124939
-0.587213	Making too many	-0.124939
-1.314268	to generate many	-0.124939
-1.028489	that goes many	-0.124939
-0.580220	classes. Unfortunately, many	-0.124939
-0.577203	processors. On many	-0.124939
-0.574838	block containing many	-0.124939
-0.566247	books contain many	-0.124939
-1.046491	can hold many	-0.124939
-0.540768	have seen many	-0.124939
-0.526707	and avoids many	-0.124939
-0.504239	Linux. Has many	-0.124939
-0.462929	Even worse, many	-0.124939
-0.462929	framework requiring many	-0.124939
-0.358371	CPUs. Includes many	-0.124939
-0.358371	2003. Contains many	-0.124939
-0.358371	or modifies many	-0.124939
-0.358371	or "how many	-0.124939
-2.198232	at the possible	-0.124939
-1.124786	it is possible	-1.238882
-1.749629	this is possible	-0.124939
-0.923868	It is possible	-1.054358
-1.373379	also a possible	-0.124939
-0.600914	justify a possible	-0.124939
-1.505515	number of possible	-0.124939
-1.190174	aware of possible	-0.124939
-0.598531	obstacle of possible	-0.124939
-0.601405	explanation and possible	-0.124939
-1.763538	may be possible	-0.425969
-2.098043	should be possible	-0.124939
-0.592445	less be possible	-0.124939
-0.376340	might be possible	-0.425969
-0.601268	objects) are possible	-0.124939
-0.417305	make it possible	-0.301030
-0.479124	makes it possible	-0.726999
-0.582805	was it possible	-0.124939
-1.377524	2 if possible	-0.124939
-0.879636	data as possible	-0.124939
-0.590175	much as possible	-0.124939
-0.590175	small as possible	-0.124939
-0.590175	standardized as possible	-0.124939
-1.350215	is not possible	-0.346788
-1.128364	therefore not possible	-0.124939
-1.196294	} A possible	-0.124939
-1.029455	is only possible	-0.425969
-0.898113	are other possible	-0.124939
-1.288295	in all possible	-0.124939
-1.219947	is also possible	-0.425969
-1.138692	is often possible	-0.425969
-1.665279	not always possible	-0.124939
-0.594795	change its possible	-0.124939
-0.794815	the best possible	-0.124939
-1.187934	The best possible	-0.124939
-1.660608	is therefore possible	-0.124939
-0.872313	other optimizations possible	-0.124939
-0.857903	is sometimes possible	-0.124939
-1.112230	the maximum possible	-0.124939
-0.962277	the simplest possible	-0.124939
-0.829484	The simplest possible	-0.124939
-1.228035	is rarely possible	-0.124939
-0.562532	of fastest possible	-0.124939
-0.817693	is generally possible	-0.124939
-0.408118	the worst possible	-0.124939
-0.550343	Define biggest possible	-0.124939
-2.382950	of the clock	-0.124939
-2.678491	in the clock	-0.124939
-1.751339	that the clock	-0.301030
-2.501286	if the clock	-0.124939
-2.290290	by the clock	-0.124939
-2.340794	when the clock	-0.124939
-0.597852	measure the clock	-0.124939
-0.597852	Occasionally, the clock	-0.124939
-0.597852	event, the clock	-0.124939
-0.902178	unit is clock	-0.124939
-1.973222	is a clock	-0.425969
-1.685207	of a clock	-0.602060
-2.273688	to a clock	-0.124939
-2.454757	number of clock	-0.124939
-0.596827	Multithreading The clock	-0.124939
-0.596827	load. The clock	-0.124939
-0.596827	PCs. The clock	-0.124939
-0.596827	afterwards. The clock	-0.124939
-1.196830	uses more clock	-0.124939
-0.599922	2GHz A clock	-0.124939
-1.298110	the CPU clock	-0.301030
-0.853551	using CPU clock	-0.124939
-0.565462	or one clock	-0.124939
-1.096451	only one clock	-0.124939
-0.565462	takes one clock	-0.124939
-0.832739	just one clock	-0.124939
-1.476050	the two clock	-0.124939
-0.198337	approximately two clock	-0.124939
-0.597390	- 2 clock	-0.124939
-0.579137	to 4 clock	-0.124939
-0.579137	- 4 clock	-0.124939
-0.596844	- 8 clock	-0.124939
-0.595544	- 16 clock	-0.124939
-0.564282	save several clock	-0.124939
-0.564282	wastes several clock	-0.124939
-0.461505	a few clock	-0.602060
-0.410785	The few clock	-0.124939
-0.360130	addition every clock	-0.425969
-0.588855	change their clock	-0.124939
-0.588456	only 256 clock	-0.124939
-0.587206	every three clock	-0.124939
-1.302742	a higher clock	-0.124939
-0.469431	- 10 clock	-0.124939
-0.669002	takes 10 clock	-0.124939
-0.469431	take 10 clock	-0.124939
-0.504612	the core clock	-0.124939
-0.143305	The core clock	-0.124939
-0.358638	are core clock	-0.124939
-0.358638	called core clock	-0.124939
-0.456277	- 5 clock	-0.124939
-0.170786	takes 5 clock	-0.425969
-0.576719	- 100 clock	-0.124939
-0.393845	and 20 clock	-0.124939
-0.153728	- 20 clock	-0.124939
-0.566552	- 6 clock	-0.124939
-0.550138	and 15 clock	-0.124939
-0.143278	- 80 clock	-0.124939
-0.997409	the actual clock	-0.124939
-0.601229	a hundred clock	-0.124939
-0.358547	several hundred clock	-0.124939
-0.540640	takes 11 clock	-0.124939
-0.540459	took 50 clock	-0.124939
-0.763278	takes 40 clock	-0.124939
-0.129355	- 45 clock	-0.124939
-0.504319	- 25 clock	-0.124939
-0.065742	size matrices, clock	-0.425969
-0.358428	only 2-3 clock	-0.124939
-0.358428	is counting clock	-0.124939
-0.358428	approximately 500 clock	-0.124939
-0.358428	= dummy[0]; clock	-0.124939
-2.219505	then the version	-0.124939
-2.334461	If the version	-0.124939
-1.670412	call the version	-0.124939
-2.093248	use a version	-0.124939
-0.599687	takes. The version	-0.124939
-0.599687	brand. The version	-0.124939
-0.598100	appropriate function version	-0.124939
-0.598100	desired function version	-0.124939
-1.243216	a code version	-0.124939
-1.146306	this code version	-0.124939
-0.201039	Each code version	-0.425969
-0.586869	advanced code version	-0.124939
-2.601612	the same version	-0.124939
-0.567216	time which version	-0.124939
-0.567216	known which version	-0.124939
-0.567216	testing which version	-0.124939
-0.567216	deciding which version	-0.124939
-0.567216	certainty which version	-0.124939
-0.897438	make one version	-0.124939
-1.614804	of each version	-0.124939
-1.213112	for each version	-0.124939
-1.361847	the static version	-0.124939
-1.303991	a 64-bit version	-0.124939
-0.870980	The 64-bit version	-0.124939
-1.188438	best possible version	-0.124939
-1.063384	32- bit version	-0.124939
-0.714944	the new version	-0.124939
-1.395968	a new version	-0.124939
-0.777945	each new version	-0.124939
-1.789042	the SSE2 version	-0.124939
-0.570933	// SSE2 version	-0.425969
-0.887328	The Windows version	-0.124939
-1.055965	directly compiled version	-0.124939
-0.593539	// specific version	-0.124939
-0.488485	// AVX version	-0.124939
-0.488029	the optimized version	-0.124939
-0.592211	set, another version	-0.124939
-1.581908	the optimal version	-0.124939
-1.685283	a separate version	-0.124939
-1.230963	a better version	-0.124939
-0.584286	years old version	-0.124939
-0.224879	the appropriate version	-0.602060
-0.291731	The appropriate version	-0.124939
-0.453360	the advanced version	-0.124939
-0.804720	the desired version	-0.124939
-0.506882	the right version	-0.602060
-1.476692	the final version	-0.124939
-0.569711	a future version	-0.124939
-0.840635	a newer version	-0.124939
-1.158605	the current version	-0.124939
-0.979580	the chosen version	-0.124939
-0.562131	the interpreted version	-0.124939
-0.097282	a debug version	-0.124939
-0.222841	17 debug version	-0.124939
-0.222841	Uses debug version	-0.124939
-0.304743	the latest version	-0.602060
-0.550533	to dispatched version	-0.124939
-0.390947	in dispatched version	-0.124939
-0.540263	most popular version	-0.124939
-0.762950	the selected version	-0.124939
-0.314492	an inferior version	-0.124939
-0.314492	An inferior version	-0.124939
-0.504139	Linux kernel version	-0.124939
-0.065725	// Lowest version	-0.425969
-0.143202	(requires binutils version	-0.124939
-0.143202	Requires binutils version	-0.124939
-0.462838	A command-line version	-0.124939
-0.065725	a release version	-0.124939
-0.462838	a generic version	-0.124939
-0.358299	// Generic version	-0.124939
-0.358299	2.20, glibc version	-0.124939
-0.358299	// Default version	-0.124939
-2.119951	of the value	-0.124939
-2.115469	to the value	-0.124939
-1.292557	that the value	-0.367977
-1.462996	if the value	-0.823909
-1.997017	by the value	-0.124939
-1.977589	with the value	-0.124939
-2.029703	on the value	-0.124939
-1.929680	use the value	-0.124939
-1.894789	then the value	-0.124939
-1.413014	from the value	-0.425969
-1.654363	has the value	-0.124939
-1.825495	make the value	-0.124939
-1.915932	because the value	-0.124939
-1.953076	If the value	-0.124939
-1.142605	so the value	-0.124939
-1.503442	sure the value	-0.124939
-0.589040	get the value	-0.124939
-1.020454	calculate the value	-0.124939
-1.626334	unless the value	-0.124939
-1.389761	after the value	-0.124939
-0.495349	read the value	-0.425969
-1.357871	Here, the value	-0.124939
-0.585845	know the value	-0.124939
-0.585845	generate the value	-0.124939
-1.344624	change the value	-0.124939
-1.304588	gives the value	-0.124939
-1.440914	until the value	-0.124939
-0.871244	reading the value	-0.124939
-1.357871	calculating the value	-0.124939
-0.871244	hold the value	-0.124939
-0.871244	reload the value	-0.124939
-0.585845	restores the value	-0.124939
-1.984752	that a value	-0.124939
-1.769175	from a value	-0.124939
-0.599492	constant a value	-0.124939
-0.599492	Reading a value	-0.124939
-0.893008	explanation. The value	-0.124939
-0.596990	counts. The value	-0.124939
-0.596990	false. The value	-0.124939
-0.596990	. The value	-0.124939
-0.601053	transferred by value	-0.124939
-1.274523	and this value	-0.124939
-0.595280	subtract this value	-0.124939
-0.599849	each different value	-0.124939
-0.784750	no other value	-0.425969
-1.593385	any other value	-0.124939
-2.568706	floating point value	-0.124939
-1.617465	the integer value	-0.124939
-0.808332	and each value	-0.124939
-0.631283	that each value	-0.124939
-0.808332	because each value	-0.124939
-0.808332	calculate each value	-0.124939
-0.552102	Here, each value	-0.124939
-0.597689	its return value	-0.124939
-1.343479	the new value	-0.124939
-1.673853	a new value	-0.124939
-0.568972	by its value	-0.124939
-0.568972	then its value	-0.124939
-1.029228	the binary value	-0.124939
-0.582844	possible negative value	-0.124939
-1.507595	the preceding value	-0.124939
-0.577284	value maximum value	-0.124939
-1.480023	the final value	-0.124939
-0.478113	the previous value	-0.301030
-0.080689	the absolute value	-0.301030
-0.550435	four B value	-0.124939
-0.463294	bits minimum value	-0.124939
-0.463294	four R value	-0.124939
-0.463294	unused fourth value	-0.124939
-0.358658	the initial value	-0.124939
-2.863971	of the objects	-0.124939
-2.551541	that the objects	-0.124939
-2.026634	if the objects	-0.124939
-1.056676	all the objects	-0.124939
-0.599298	whenever the objects	-0.124939
-1.231300	number of objects	-0.522879
-1.582859	type of objects	-0.124939
-0.202939	movements of objects	-0.425969
-0.410571	variables and objects	-0.221849
-0.124018	Variables and objects	-0.124939
-1.538937	time. The objects	-0.124939
-0.599851	(en.wikipedia.org/wiki/Standard_Template_Library). The objects	-0.124939
-1.580109	only for objects	-0.124939
-0.898610	principle for objects	-0.124939
-2.151459	there are objects	-0.124939
-0.588936	on when objects	-0.124939
-0.374914	fragmented when objects	-0.425969
-0.585915	possible vector objects	-0.124939
-1.172409	Define vector objects	-0.124939
-0.585915	allow vector objects	-0.124939
-1.820689	for different objects	-0.124939
-0.592419	make different objects	-0.124939
-1.487518	with other objects	-0.124939
-0.200336	manner? If objects	-0.425969
-0.583462	consecutively? If objects	-0.124939
-0.582711	before all objects	-0.124939
-0.372368	after all objects	-0.425969
-0.842431	and class objects	-0.124939
-0.570672	all class objects	-0.124939
-0.570672	involving class objects	-0.124939
-0.570672	Converting class objects	-0.124939
-0.584375	store many objects	-0.124939
-0.584375	containing many objects	-0.124939
-0.866630	if any objects	-0.124939
-0.583447	remove any objects	-0.124939
-0.596068	and new objects	-0.124939
-1.712797	by making objects	-0.124939
-0.887992	align large objects	-0.124939
-0.565979	for big objects	-0.124939
-0.565979	other big objects	-0.124939
-0.183366	all allocated objects	-0.124939
-0.711100	dynamically allocated objects	-0.124939
-1.703898	to store objects	-0.124939
-0.418639	in shared objects	-0.124939
-0.418639	when shared objects	-0.124939
-0.418639	make shared objects	-0.124939
-0.591111	64-bit shared objects	-0.124939
-0.160716	called shared objects	-0.425969
-0.871977	of graphics objects	-0.124939
-1.033754	for local objects	-0.124939
-0.498892	key. Do objects	-0.124939
-0.498892	map. Do objects	-0.124939
-1.007652	The so-called objects	-0.124939
-0.841042	and similar objects	-0.124939
-0.841192	or modify objects	-0.124939
-0.562353	of temporary objects	-0.124939
-0.036347	code Shared objects	-0.124939
-0.036347	time. Shared objects	-0.124939
-0.036347	Linux Shared objects	-0.124939
-0.017793	below. Shared objects	-0.425969
-0.036347	BSD Shared objects	-0.124939
-0.036347	references. Shared objects	-0.124939
-0.415509	cases, composite objects	-0.124939
-0.415509	transferring composite objects	-0.124939
-0.939311	to declare objects	-0.124939
-0.065754	time. Are objects	-0.124939
-0.065754	index. Are objects	-0.124939
-0.065754	small. Are objects	-0.124939
-0.065754	94 Are objects	-0.124939
-0.358529	void. Returning objects	-0.124939
-0.901604	compact and takes	-0.124939
-1.497075	one that takes	-0.124939
-1.169294	class that takes	-0.124939
-1.257917	library that takes	-0.124939
-1.257917	version that takes	-0.124939
-1.169294	way that takes	-0.124939
-0.885328	task that takes	-0.124939
-1.235058	that it takes	-0.124939
-1.119202	than it takes	-0.124939
-0.139469	time it takes	-1.185637
-1.498280	because it takes	-0.124939
-0.764480	Sometimes it takes	-0.124939
-0.764480	Often, it takes	-0.124939
-0.527248	Usually it takes	-0.124939
-2.639339	the code takes	-0.124939
-1.922782	the compiler takes	-0.124939
-1.272742	used. It takes	-0.124939
-0.587686	started. It takes	-0.124939
-0.587686	green. It takes	-0.124939
-1.746344	of memory takes	-0.124939
-1.498965	in memory takes	-0.124939
-2.414620	the program takes	-0.124939
-0.599902	branch instruction takes	-0.124939
-0.599679	unrolled loop takes	-0.124939
-1.204647	to integer takes	-0.124939
-1.693522	an integer takes	-0.124939
-1.125378	unsigned integer takes	-0.124939
-1.303561	a double takes	-0.124939
-1.440372	or double takes	-0.124939
-1.288691	'this' pointer takes	-0.124939
-0.598493	structure object takes	-0.124939
-0.598302	language. C++ takes	-0.124939
-1.777003	the table takes	-0.124939
-0.577427	conversion often takes	-0.124939
-0.577427	disk often takes	-0.124939
-0.595423	constant always takes	-0.124939
-1.736040	double precision takes	-0.124939
-1.332417	linked list takes	-0.124939
-0.726026	This typically takes	-0.124939
-0.504566	pointer typically takes	-0.124939
-0.504566	SSE2 typically takes	-0.124939
-0.469964	Integer multiplication takes	-0.124939
-1.669230	exception handling takes	-0.124939
-0.588530	directive never takes	-0.124939
-0.866882	the conversion takes	-0.124939
-0.458640	This conversion takes	-0.124939
-0.795001	type conversion takes	-0.124939
-0.458640	integer-to-float conversion takes	-0.124939
-0.918553	point division takes	-0.124939
-0.717686	Integer division takes	-0.124939
-0.716192	point addition takes	-0.124939
-0.860220	point operation takes	-0.124939
-0.172607	that something takes	-0.425969
-0.826878	runtime DLL takes	-0.124939
-1.046970	garbage collection takes	-0.124939
-0.557130	them again takes	-0.124939
-0.550212	because truncation takes	-0.124939
-0.540873	cycles. Division takes	-0.124939
-0.463075	microprocessor. Multiplication takes	-0.124939
-0.463075	the branching takes	-0.124939
-0.463075	it obviously takes	-0.124939
-2.174044	is the variable	-0.124939
-2.292203	of the variable	-0.124939
-2.504035	in the variable	-0.124939
-1.928683	that the variable	-0.124939
-2.362404	if the variable	-0.124939
-2.064489	than the variable	-0.124939
-1.781945	time the variable	-0.124939
-2.130736	If the variable	-0.124939
-0.740594	which the variable	-0.425969
-1.755382	but the variable	-0.124939
-1.572900	sure the variable	-0.124939
-1.174510	write the variable	-0.124939
-0.500578	sets the variable	-0.124939
-1.403467	give the variable	-0.124939
-0.888033	away the variable	-0.124939
-0.594466	transferring the variable	-0.124939
-0.888033	forces the variable	-0.124939
-0.594466	fetch the variable	-0.124939
-1.486422	of a variable	-0.124939
-1.415126	to a variable	-0.124939
-1.048545	that a variable	-0.301030
-1.426677	by a variable	-0.124939
-1.846596	on a variable	-0.124939
-1.632014	from a variable	-0.124939
-1.816321	make a variable	-0.124939
-1.234907	access a variable	-0.124939
-1.038313	writing a variable	-0.124939
-1.150441	accessing a variable	-0.124939
-1.038313	Accessing a variable	-0.124939
-0.588005	treat a variable	-0.124939
-1.290685	objects of variable	-0.124939
-1.068836	arrays of variable	-0.124939
-0.723523	kinds of variable	-0.301030
-0.504160	names and variable	-0.124939
-1.042128	function or variable	-0.124939
-1.074619	typically have variable	-0.124939
-0.843343	data A variable	-0.124939
-1.303768	time. A variable	-0.124939
-0.571160	pointer. A variable	-0.124939
-0.571160	thread. A variable	-0.124939
-0.843343	expensive. A variable	-0.124939
-0.571160	below). A variable	-0.124939
-0.898349	some other variable	-0.124939
-2.569751	floating point variable	-0.124939
-1.365421	than one variable	-0.124939
-1.934602	an integer variable	-0.124939
-1.193368	that no variable	-0.124939
-1.187237	class member variable	-0.124939
-0.580448	global const variable	-0.124939
-0.580448	local const variable	-0.124939
-1.189616	an unsigned variable	-0.124939
-0.902815	a register variable	-0.425969
-1.328363	the shared variable	-0.124939
-1.053344	a signed variable	-0.124939
-0.070291	the induction variable	-0.124939
-0.209150	an induction variable	-0.124939
-0.258294	same induction variable	-0.124939
-0.258294	no induction variable	-0.124939
-0.258294	second induction variable	-0.124939
-0.110221	Update induction variable	-0.124939
-0.350091	a public variable	-0.124939
-0.292714	a global variable	-0.301030
-1.064112	a temporary variable	-0.124939
-0.557478	the saved variable	-0.124939
-1.371916	integers of any	-0.124939
-0.600666	outside of any	-0.124939
-1.434230	it to any	-0.124939
-1.072669	writes to any	-0.124939
-0.599878	correspond to any	-0.124939
-0.601089	constructors, and any	-0.124939
-1.961420	used in any	-0.124939
-0.600420	speed in any	-0.124939
-2.090806	used for any	-0.124939
-0.598090	works for any	-0.124939
-0.598090	linking for any	-0.124939
-0.595482	set or any	-0.124939
-0.595482	call or any	-0.124939
-0.595482	bottleneck or any	-0.124939
-0.598109	detect if any	-0.124939
-0.598109	true, if any	-0.124939
-0.376058	accessed by any	-0.425969
-0.882710	line by any	-0.124939
-0.591750	bypassed by any	-0.124939
-1.065461	work with any	-0.124939
-0.893895	interfere with any	-0.124939
-0.600718	optimally on any	-0.124939
-1.374088	efficient as any	-0.124939
-1.731659	but not any	-0.124939
-2.071283	faster than any	-0.124939
-1.267327	can have any	-0.124939
-0.595107	inputs have any	-0.124939
-1.862186	can use any	-0.124939
-1.534449	called from any	-0.124939
-1.146207	accessed from any	-0.124939
-0.586842	referenced from any	-0.124939
-0.599884	added at any	-0.124939
-1.621852	will make any	-0.124939
-1.176353	cannot make any	-0.124939
-0.591413	units. If any	-0.124939
-0.591413	(RTTI) If any	-0.124939
-0.897831	used, but any	-0.124939
-2.262334	to do any	-0.124939
-0.598267	counts. In any	-0.124939
-0.597141	never return any	-0.124939
-0.192416	and before any	-0.124939
-0.546463	instruction before any	-0.124939
-1.054987	_mm256_zeroupper() before any	-0.124939
-0.199366	doesn't call any	-0.425969
-0.596671	doesn't take any	-0.124939
-1.180992	not need any	-0.124939
-1.180992	don't need any	-0.124939
-1.132949	compiled without any	-0.124939
-0.553561	microprocessors without any	-0.124939
-0.553561	freely without any	-0.124939
-0.198195	cannot access any	-0.425969
-0.569427	not making any	-0.124939
-0.569427	avoid making any	-0.124939
-1.173548	should avoid any	-0.124939
-0.883658	not get any	-0.124939
-0.555849	(CGrandParent) contains any	-0.124939
-0.555849	(CParent<>) contains any	-0.124939
-0.591559	and run any	-0.124939
-0.585201	and calling any	-0.124939
-0.584472	doesn't generate any	-0.124939
-1.349899	can reduce any	-0.124939
-0.103641	not produce any	-0.602060
-0.578925	defined outside any	-0.124939
-0.576791	this time, any	-0.124939
-0.576589	without adding any	-0.124939
-0.572587	may insert any	-0.124939
-0.572469	from loading any	-0.124939
-0.566300	not include any	-0.124939
-0.364511	is hardly any	-0.124939
-0.222811	with hardly any	-0.124939
-0.222811	has hardly any	-0.124939
-0.222811	was hardly any	-0.124939
-0.556960	or remove any	-0.124939
-0.938357	by avoiding any	-0.124939
-0.763272	to recommend any	-0.124939
-0.725148	not alias any	-0.124939
-0.504038	never throw any	-0.124939
-0.658465	and resolve any	-0.124939
-0.658465	to obey any	-0.124939
-0.358228	to express any	-0.124939
-0.358228	profiler identifies any	-0.124939
-0.358228	that destroys any	-0.124939
-1.491591	memory and we	-0.124939
-0.600097	range and we	-0.124939
-2.077413	is that we	-0.124939
-1.085120	so that we	-0.425969
-0.581655	large that we	-0.124939
-1.020636	line that we	-0.124939
-0.581655	information that we	-0.124939
-0.581655	cycles that we	-0.124939
-1.020636	optimizations that we	-0.124939
-0.581655	now that we	-0.124939
-0.581655	assumes that we	-0.124939
-0.581655	multithreading that we	-0.124939
-2.526177	the function we	-0.124939
-1.042567	code if we	-0.124939
-0.589509	but if we	-0.124939
-1.485922	example, if we	-0.124939
-0.589509	result if we	-0.124939
-0.589509	easier if we	-0.124939
-2.250965	the time we	-0.124939
-0.594473	understand when we	-0.124939
-0.594473	evicted when we	-0.124939
-0.563963	number then we	-0.124939
-0.829969	times then we	-0.124939
-0.563963	result then we	-0.124939
-0.829969	cycles, then we	-0.124939
-0.563963	n, then we	-0.124939
-0.563963	10000, then we	-0.124939
-0.563963	C2, then we	-0.124939
-1.376933	is because we	-0.124939
-1.010358	faster because we	-0.124939
-0.856018	here because we	-0.124939
-0.577887	9.5 because we	-0.124939
-0.575453	references. If we	-0.124939
-0.575453	anyway. If we	-0.124939
-0.575453	n∙(n-1)!. If we	-0.124939
-0.575453	2.5f; If we	-0.124939
-1.364685	function which we	-0.124939
-0.896233	this number we	-0.124939
-1.782974	cases where we	-0.124939
-0.553992	x so we	-0.124939
-0.553992	cache so we	-0.124939
-0.553992	case so we	-0.124939
-0.553992	bytes, so we	-0.124939
-0.597125	evicted before we	-0.124939
-1.359993	this example, we	-0.124939
-1.344289	64-bit systems we	-0.124939
-1.460707	the case we	-0.124939
-1.397612	this case we	-0.124939
-0.565227	applications. But we	-0.124939
-0.565227	1.23456. But we	-0.124939
-0.548727	code. However, we	-0.124939
-0.548727	models. However, we	-0.124939
-0.874364	these examples we	-0.124939
-1.299321	} Here, we	-0.124939
-1.421372	cache lines we	-0.124939
-0.857321	of constants we	-0.124939
-0.491408	cache. When we	-0.124939
-0.491408	100000000. When we	-0.124939
-0.691365	first thing we	-0.124939
-0.483421	second thing we	-0.124939
-0.840705	the future we	-0.124939
-0.569861	process. Obviously, we	-0.124939
-0.450126	double. Here we	-0.124939
-0.450126	a2/b2; Here we	-0.124939
-0.826414	= 4, we	-0.124939
-0.817057	bit mode, we	-0.124939
-0.556760	be available, we	-0.124939
-0.526659	CPU cores, we	-0.124939
-0.526404	language". While we	-0.124939
-0.526404	pow(x,n) As we	-0.124939
-0.314348	option. Then we	-0.124939
-0.314348	F1? Then we	-0.124939
-0.504179	hexadecimal numbers, we	-0.124939
-0.504502	example 7.4 we	-0.124939
-0.504179	problem since we	-0.124939
-0.462874	by four, we	-0.124939
-0.462874	of algebra, we	-0.124939
-0.358328	data decomposition, we	-0.124939
-0.358328	The lesson we	-0.124939
-0.358328	PC. Similarly, we	-0.124939
-0.358328	-156. Surprisingly, we	-0.124939
-0.358328	example 14.7b, we	-0.124939
-0.358328	200. Next, we	-0.124939
-0.358328	back. Thus, we	-0.124939
-0.358328	CPU. Should we	-0.124939
-0.899124	be of some	-0.124939
-1.627800	list of some	-0.124939
-1.437309	care of some	-0.124939
-1.656658	up to some	-0.124939
-1.070831	identical to some	-0.124939
-0.599259	programmers to some	-0.124939
-0.599259	rise to some	-0.124939
-1.605920	functions and some	-0.124939
-1.192516	mode and some	-0.124939
-0.598157	errors, and some	-0.124939
-0.598157	mechanisms, and some	-0.124939
-0.585266	and in some	-0.425969
-1.351494	not in some	-0.124939
-0.183806	may in some	-0.726999
-1.767245	used in some	-0.124939
-0.593582	efficient in some	-0.124939
-0.862263	possible in some	-0.124939
-0.581167	performance in some	-0.124939
-1.427585	useful in some	-0.124939
-1.019296	solution in some	-0.124939
-0.862263	structure in some	-0.124939
-1.019296	optimizations in some	-0.124939
-0.862263	least in some	-0.124939
-0.581167	expensive in some	-0.124939
-0.581167	imprecision in some	-0.124939
-0.581167	iterator in some	-0.124939
-0.596815	section for some	-0.124939
-1.183511	except for some	-0.124939
-0.596815	5-10% for some	-0.124939
-0.596815	comp.lang.asm.x86 for some	-0.124939
-0.891901	avoid that some	-0.124939
-0.596430	true that some	-0.124939
-0.596430	buffer that some	-0.124939
-0.891901	notice that some	-0.124939
-2.112936	there are some	-0.124939
-0.599404	Here are some	-0.124939
-0.601210	giving it some	-0.124939
-1.692013	supported by some	-0.124939
-0.597968	combined by some	-0.124939
-1.181707	comes with some	-0.124939
-1.336485	compatibility with some	-0.124939
-0.594460	14, with some	-0.124939
-0.790366	only on some	-0.124939
-0.590706	normal on some	-0.124939
-0.590706	IDE on some	-0.124939
-0.600674	solutions may some	-0.124939
-0.600452	does have some	-0.124939
-1.575751	compiler has some	-0.124939
-1.052541	It has some	-0.124939
-1.195806	may make some	-0.124939
-2.269253	to do some	-0.124939
-0.450103	function In some	-0.124939
-0.638783	function. In some	-0.124939
-0.450103	program. In some	-0.124939
-0.725697	calculations. In some	-0.124939
-0.450103	unrolling In some	-0.124939
-0.450103	element. In some	-0.124939
-0.450103	users. In some	-0.124939
-0.450103	though. In some	-0.124939
-0.450103	have. In some	-0.124939
-0.450103	mind. In some	-0.124939
-0.450103	2002). In some	-0.124939
-0.450103	44 In some	-0.124939
-0.450103	34. In some	-0.124939
-0.450103	API. In some	-0.124939
-1.067561	It takes some	-0.124939
-2.193451	For example, some	-0.124939
-0.595923	points out some	-0.124939
-1.154454	it does some	-0.124939
-1.334379	compiler does some	-0.124939
-1.049063	for doing some	-0.124939
-0.586313	can give some	-0.124939
-1.352035	can reduce some	-0.124939
-0.580571	have described some	-0.124939
-0.580420	2; Unfortunately, some	-0.124939
-1.194862	to save some	-0.124939
-0.570139	instructions SSE4.1 some	-0.124939
-0.557145	common. Even some	-0.124939
-0.526763	others. While some	-0.124939
-0.659152	sections describe some	-0.124939
-2.218988	function is so	-0.124939
-1.563751	code is so	-0.425969
-1.672603	time is so	-0.124939
-1.745513	object is so	-0.124939
-1.059016	variables is so	-0.124939
-0.595234	programming is so	-0.124939
-1.353528	matrix is so	-0.124939
-1.177436	syntax is so	-0.124939
-1.059016	effect is so	-0.124939
-0.600988	caller, and so	-0.124939
-1.974835	used in so	-0.124939
-1.833458	may be so	-0.124939
-2.105042	that are so	-0.124939
-1.054139	cache are so	-0.124939
-2.135564	There are so	-0.124939
-1.054139	CPUs are so	-0.124939
-0.886236	c[i] are so	-0.124939
-0.601030	around it so	-0.124939
-0.600869	thread function so	-0.124939
-2.485215	the code so	-0.124939
-1.281688	critical code so	-0.124939
-0.600760	on x so	-0.124939
-1.338130	of time so	-0.124939
-1.613212	compile time so	-0.124939
-1.071983	128-bit vector so	-0.124939
-0.599304	b different so	-0.124939
-1.868154	the cache so	-0.124939
-1.141363	to do so	-0.249877
-0.822519	not do so	-0.124939
-1.484040	this example so	-0.124939
-1.419299	in C++ so	-0.124939
-0.596969	same address so	-0.124939
-1.630513	function call so	-0.124939
-0.596757	register less so	-0.124939
-1.790393	sign bit so	-0.124939
-1.185989	through pointers so	-0.124939
-0.595169	64-bit operations so	-0.124939
-1.515884	this case so	-0.124939
-0.594751	or does so	-0.124939
-1.333067	the calculations so	-0.124939
-0.886484	separate threads so	-0.124939
-0.593368	any exception so	-0.124939
-1.047029	32-bit mode so	-0.124939
-0.587796	reliable source so	-0.124939
-0.866698	the start so	-0.124939
-0.582146	be negative so	-0.124939
-1.365643	code section so	-0.124939
-0.580931	this statement so	-0.124939
-1.199475	the code, so	-0.124939
-0.578103	are inlined so	-0.124939
-1.006112	a macro so	-0.124939
-0.576198	by 100 so	-0.124939
-0.574609	of 2, so	-0.124939
-0.850055	is changed so	-0.124939
-0.845540	are identical so	-0.124939
-0.987937	loop unrolling so	-0.124939
-0.416624	be organized so	-0.425969
-1.085305	template metaprogramming so	-0.124939
-1.129365	an integer, so	-0.124939
-0.562233	be designed so	-0.124939
-0.556625	in thousand so	-0.124939
-1.160994	if possible, so	-0.124939
-1.086113	more compact so	-0.124939
-0.803640	explained above, so	-0.124939
-0.549730	specifies truncation so	-0.124939
-0.957458	by default, so	-0.124939
-0.763054	example 9.5 so	-0.124939
-0.503878	32 bits, so	-0.124939
-0.503878	automatic prefetching so	-0.124939
-0.503878	a parameter, so	-0.124939
-0.658236	example 12.4a so	-0.124939
-0.462601	1024 bytes, so	-0.124939
-0.462601	reciprocal factorials so	-0.124939
-0.358113	task switches; so	-0.124939
-0.358113	is developing so	-0.124939
-0.358113	significant digits, so	-0.124939
-0.358113	value 0x2C so	-0.124939
-0.358113	class C1, so	-0.124939
-2.687402	that the variables	-0.124939
-1.442279	choose the variables	-0.124939
-2.454106	number of variables	-0.124939
-2.137818	used for variables	-0.124939
-1.486814	intended for variables	-0.124939
-2.024981	sure that variables	-0.124939
-1.076393	operands are variables	-0.124939
-1.199189	not on variables	-0.124939
-1.290123	not make variables	-0.124939
-0.599345	cause other variables	-0.124939
-1.358706	floating point variables	-0.124939
-0.733698	Floating point variables	-0.249877
-1.193923	predict which variables	-0.124939
-1.621811	that all variables	-0.124939
-1.375134	often used variables	-0.124939
-1.046841	commonly used variables	-0.124939
-1.070266	that most variables	-0.124939
-1.071840	for multiple variables	-0.124939
-1.068297	and static variables	-0.124939
-1.191752	contains many variables	-0.124939
-0.848916	for register variables	-0.124939
-0.511064	make register variables	-0.124939
-0.383158	point register variables	-0.301030
-0.511064	integer register variables	-0.124939
-0.596664	understand how variables	-0.124939
-0.595791	setting these variables	-0.124939
-0.594941	Putting simple variables	-0.124939
-1.174895	to its variables	-0.124939
-0.593660	that several variables	-0.124939
-0.707629	single precision variables	-0.425969
-0.590628	add counter variables	-0.124939
-0.589974	158 Integer variables	-0.124939
-0.422679	with Boolean variables	-0.124939
-0.422679	have Boolean variables	-0.124939
-0.422679	true. Boolean variables	-0.124939
-0.422679	overdetermined Boolean variables	-0.124939
-0.422679	invalid. Boolean variables	-0.124939
-0.284700	of induction variables	-0.124939
-0.284700	and induction variables	-0.124939
-0.404330	with induction variables	-0.124939
-0.284700	use induction variables	-0.124939
-0.284700	make induction variables	-0.124939
-0.119427	point induction variables	-0.124939
-0.404330	two induction variables	-0.124939
-0.284700	need induction variables	-0.124939
-0.769319	and public variables	-0.124939
-0.530044	have public variables	-0.124939
-0.456433	and global variables	-0.124939
-0.456433	avoid global variables	-0.124939
-0.456433	All global variables	-0.124939
-0.504627	used. Such variables	-0.124939
-0.504627	__declspec(thread). Such variables	-0.124939
-0.498315	and local variables	-0.124939
-0.498315	other local variables	-0.124939
-0.572766	for initialized variables	-0.124939
-0.572581	Windows, allow variables	-0.124939
-0.639321	all non-static variables	-0.124939
-0.450451	All non-static variables	-0.124939
-0.151425	} Induction variables	-0.124939
-0.151425	elements Induction variables	-0.124939
-0.151425	expressions Induction variables	-0.124939
-0.151425	motion Induction variables	-0.124939
-0.151425	70 Induction variables	-0.124939
-0.550101	4; Register variables	-0.124939
-0.549940	variables (i.e. variables	-0.124939
-0.550262	access internal variables	-0.124939
-0.129349	7.2 Integers variables	-0.425969
-0.504279	class. Storing variables	-0.124939
-0.249555	function. Global variables	-0.124939
-0.249555	it. Global variables	-0.124939
-0.462965	for uninitialized variables	-0.124939
-0.358400	two summation variables	-0.124939
-1.077208	copying the return	-0.124939
-0.601401	overwrite the return	-0.124939
-1.377858	explanation of return	-0.124939
-2.251635	is to return	-0.124939
-2.005622	recommended to return	-0.124939
-0.600014	a=a*2; to return	-0.124939
-0.203540	call and return	-0.124939
-1.544856	new and return	-0.124939
-0.601314	types The return	-0.124939
-0.601310	error can return	-0.124939
-0.891788	0; // return	-0.124939
-0.891788	x^10 // return	-0.124939
-0.596372	bitofn // return	-0.124939
-2.332955	the function return	-0.124939
-2.263638	a function return	-0.124939
-0.595355	storing function return	-0.124939
-1.076337	function may return	-0.124939
-1.014208	else { return	-0.124939
-0.288056	x) { return	-0.191886
-1.100692	b) { return	-0.124939
-0.304794	p) { return	-0.425969
-1.173187	(b) { return	-0.124939
-0.179383	a) { return	-0.124939
-0.490440	m) { return	-0.124939
-0.490440	Size() { return	-0.124939
-1.196539	function will return	-0.124939
-1.590877	1; } return	-0.124939
-1.225189	3; } return	-0.124939
-0.581823	factorial } return	-0.124939
-0.581823	x^n } return	-0.124939
-0.558035	chosen version return	-0.124939
-0.194951	dispatched version return	-0.425969
-0.558035	Default version return	-0.124939
-1.782164	of 2 return	-0.124939
-1.146612	+ 2 return	-0.124939
-0.594912	No error return	-0.124939
-0.888472	and its return	-0.124939
-0.594626	it must return	-0.124939
-0.836436	stack ; return	-0.124939
-0.982682	label ; return	-0.124939
-0.594053	*= i; return	-0.124939
-0.194835	7.16 Function return	-0.425969
-0.816081	SSE2 supported return	-0.124939
-0.816081	AVX supported return	-0.124939
-0.193301	c); ... return	-0.425969
-1.543947	+ 1; return	-0.124939
-1.152428	should never return	-0.124939
-0.550718	* 2; return	-0.425969
-0.438064	* 3; return	-0.425969
-0.550096	the normal return	-0.124939
-0.540727	n; #endif return	-0.124939
-0.540568	reporting here: return	-0.124939
-0.358500	= x8*x2; return	-0.124939
-0.358500	& (N-1)) return	-0.124939
-0.358500	4.4, 2.5}; return	-0.124939
-0.358500	page 134) return	-0.124939
-0.358500	by causing return	-0.124939
-0.358500	= __rdtsc(); return	-0.124939
-0.358500	_mm_hadd_ps(s, s); return	-0.124939
-1.202570	frequency is 2	-0.124939
-2.096771	on a 2	-0.124939
-1.409207	function of 2	-0.124939
-0.214693	power of 2	-0.321233
-0.212936	powers of 2	-0.124939
-0.900658	reduced to 2	-0.124939
-0.600842	Kbytes to 2	-0.124939
-2.208582	will be 2	-0.124939
-0.596416	a; // 2	-0.124939
-1.062466	d; // 2	-0.124939
-0.596416	13 // 2	-0.124939
-1.547339	x = 2	-0.124939
-1.176404	multiplication by 2	-0.124939
-0.889011	divide by 2	-0.124939
-0.594963	dividing by 2	-0.124939
-1.636641	0 - 2	-0.124939
-1.884762	more than 2	-0.124939
-1.574090	bigger than 2	-0.124939
-0.896891	= double 2	-0.124939
-1.590768	a + 2	-0.124939
-0.769800	c + 2	-0.425969
-0.565680	8*x + 2	-0.124939
-0.598093	(u.i * 2	-0.124939
-0.582116	unsigned 2 2	-0.124939
-0.864078	32 2 2	-0.124939
-0.597571	takes between 2	-0.124939
-0.858656	time. 4 2	-0.124939
-0.579275	............................................................................................... 4 2	-0.124939
-1.184631	or unsigned 2	-0.124939
-0.882338	double 64 2	-0.124939
-0.957485	long 64 2	-0.124939
-0.882338	4 64 2	-0.124939
-0.526098	Vec2q 64 2	-0.124939
-0.526098	Vec4ui 64 2	-0.124939
-1.172297	int 32 2	-0.124939
-0.573080	Iu16vec4 32 2	-0.124939
-0.592698	5 / 2	-0.124939
-1.162956	// add 2	-0.124939
-0.281933	INSTRSET == 2	-0.124939
-0.575113	(i % 2	-0.124939
-0.573012	address below 2	-0.124939
-0.377810	one fraction 2	-0.124939
-0.149055	1 fraction 2	-0.124939
-0.165130	((x2) 2) 2	-0.124939
-0.817473	= int64_t 2	-0.124939
-0.152897	// Add 2	-0.425969
-0.152874	Intel Core 2	-0.124939
-0.504520	cases........................................................................................................ 124 2	-0.124939
-0.358572	can exceed 2	-0.124939
-0.600785	error. // You	-0.124939
-0.600075	Read time You	-0.124939
-0.599952	111 } You	-0.124939
-1.655529	intrinsic functions You	-0.124939
-1.188354	if unsigned You	-0.124939
-0.595861	instruction code. You	-0.124939
-1.107980	a time. You	-0.124939
-1.005156	of time. You	-0.124939
-1.408914	member functions. You	-0.124939
-0.590106	memory used. You	-0.124939
-0.589411	^ 1; You	-0.124939
-1.146898	more efficient. You	-0.124939
-1.256235	less efficient. You	-0.124939
-0.586691	guidelines below. You	-0.124939
-1.475067	is called. You	-0.124939
-1.327539	the compiler. You	-0.124939
-0.864121	'this' pointer. You	-0.124939
-0.576188	different registers. You	-0.124939
-0.817379	clock cycles. You	-0.124939
-0.867940	vector operations. You	-0.124939
-0.490747	point operations. You	-0.124939
-1.000185	not needed. You	-0.124939
-0.574248	base classes. You	-0.124939
-0.574396	a thread. You	-0.124939
-1.102190	double precision. You	-0.124939
-0.571967	graceful way. You	-0.124939
-0.844854	per vector. You	-0.124939
-0.568894	with references. You	-0.124939
-0.569420	which not. You	-0.124939
-1.274306	memory allocation. You	-0.124939
-0.967136	to zero. You	-0.124939
-0.825279	the software. You	-0.124939
-1.044697	this case. You	-0.124939
-0.556149	needed anyway. You	-0.124939
-1.109467	by 16. You	-0.124939
-0.938133	the application. You	-0.124939
-0.549787	pitfalls here. You	-0.124939
-0.549507	the result. You	-0.124939
-0.803663	preceding one. You	-0.124939
-0.658252	not overlap. You	-0.124939
-0.390237	b overlap. You	-0.124939
-0.366159	CPU cores. You	-0.425969
-0.539719	error handling. You	-0.124939
-0.539719	denormal numbers. You	-0.124939
-0.786243	this manual. You	-0.124939
-0.963494	other modules. You	-0.124939
-0.539719	about them. You	-0.124939
-0.539719	program itself. You	-0.124939
-0.526232	not expensive. You	-0.124939
-0.503638	Boolean operands. You	-0.124939
-0.503638	positive n. You	-0.124939
-0.503638	+ a. You	-0.124939
-0.657893	a debugger. You	-0.124939
-0.462382	of CriticalFunction. You	-0.124939
-0.657893	page 52. You	-0.124939
-0.657893	in question. You	-0.124939
-0.462382	processor model. You	-0.124939
-0.657893	test examples. You	-0.124939
-0.657893	be shared. You	-0.124939
-0.462382	155 test. You	-0.124939
-0.657893	mutually incompatible. You	-0.124939
-0.657893	page 72. You	-0.124939
-0.462382	available today. You	-0.124939
-0.462382	to me. You	-0.124939
-0.357941	level 108 You	-0.124939
-0.357941	it twice. You	-0.124939
-0.357941	Print heading You	-0.124939
-0.357941	or -fno-strict-overflow. You	-0.124939
-0.357941	variable __intel_cpu_feature_indicator_x. You	-0.124939
-0.357941	cache contention. You	-0.124939
-0.357941	are compiler-specific. You	-0.124939
-0.357941	(rarely 64). You	-0.124939
-0.357941	not used). You	-0.124939
-0.357941	into account. You	-0.124939
-0.357941	too late. You	-0.124939
-0.357941	optimization job. You	-0.124939
-0.357941	GOT entry. You	-0.124939
-0.357941	bad dilemma. You	-0.124939
-0.357941	or makefile. You	-0.124939
-2.335996	of the table	-0.124939
-2.584058	in the table	-0.124939
-2.402949	that the table	-0.124939
-2.427514	if the table	-0.124939
-2.109429	than the table	-0.124939
-0.630169	calculate the table	-0.124939
-1.506088	store the table	-0.124939
-1.061790	copy the table	-0.124939
-1.061790	expect the table	-0.124939
-0.891419	declare the table	-0.124939
-0.891419	declaring the table	-0.124939
-0.891419	copies the table	-0.124939
-0.596185	Copying the table	-0.124939
-2.132054	to a table	-0.124939
-1.771073	and a table	-0.124939
-1.053310	by a table	-0.249877
-2.045927	with a table	-0.124939
-2.076188	as a table	-0.124939
-1.364582	use a table	-0.425969
-1.080493	from a table	-0.124939
-1.804382	has a table	-0.124939
-0.902284	principle of table	-0.124939
-1.919680	code and table	-0.124939
-0.600362	calculation and table	-0.124939
-1.786158	elements in table	-0.124939
-1.267226	results in table	-0.124939
-1.058585	mentioned in table	-0.124939
-0.423631	listed in table	-0.124939
-0.595085	summarized in table	-0.124939
-1.178455	element. The table	-0.124939
-0.595500	8.1. The table	-0.124939
-0.595500	examples. The table	-0.124939
-0.595500	104). The table	-0.124939
-0.595500	134. The table	-0.124939
-2.296125	{ // table	-0.124939
-0.600960	heavily on table	-0.124939
-2.040433	in this table	-0.124939
-2.525625	to make table	-0.124939
-1.819032	of each table	-0.124939
-0.891210	all these table	-0.124939
-1.288881	The following table	-0.124939
-0.593875	(GOT). These table	-0.124939
-1.125618	the virtual table	-0.124939
-0.624479	a virtual table	-0.124939
-0.511200	so-called virtual table	-0.124939
-0.385281	a lookup table	-0.124939
-0.580478	132. Unfortunately, table	-0.124939
-0.181381	for vectorized table	-0.124939
-0.179671	global offset table	-0.124939
-0.577151	9. Avoid table	-0.124939
-0.846583	// align table	-0.124939
-0.530961	a hash table	-0.124939
-0.378753	A hash table	-0.124939
-0.052831	procedure linkage table	-0.522879
-0.540884	hand- written table	-0.124939
-0.526992	112 Vectorized table	-0.124939
-0.526848	100 As table	-0.124939
-0.065767	an import table	-0.124939
-2.528400	of the performance	-0.124939
-2.279739	for the performance	-0.124939
-2.143582	with the performance	-0.124939
-1.746027	but the performance	-0.124939
-1.884520	using the performance	-0.124939
-1.906906	where the performance	-0.124939
-1.918706	before the performance	-0.124939
-1.171541	test the performance	-0.124939
-1.566175	up the performance	-0.124939
-1.596941	about the performance	-0.124939
-1.179460	read the performance	-0.124939
-0.318824	improve the performance	-0.204120
-1.171541	reduce the performance	-0.124939
-0.886495	reading the performance	-0.124939
-1.054521	reducing the performance	-0.124939
-0.886495	case, the performance	-0.124939
-0.593683	compare the performance	-0.124939
-0.593683	influence the performance	-0.124939
-0.593683	paying the performance	-0.124939
-2.646318	is a performance	-0.124939
-1.075570	include a performance	-0.124939
-1.663210	set of performance	-0.124939
-1.200048	optimization of performance	-0.124939
-0.600019	degradation of performance	-0.124939
-0.744306	difference in performance	-0.425969
-0.599145	gain in performance	-0.124939
-0.597212	improvement in performance	-0.124939
-1.274939	efficient. The performance	-0.124939
-1.178029	processors. The performance	-0.124939
-1.268654	CPUs. The performance	-0.124939
-0.889848	problems. The performance	-0.124939
-0.595389	mispredictions. The performance	-0.124939
-1.076896	required for performance	-0.124939
-0.600853	problems or performance	-0.124939
-0.600862	good code performance	-0.124939
-1.059986	or more performance	-0.425969
-1.295730	time when performance	-0.124939
-0.599968	counters. A performance	-0.124939
-1.257299	of program performance	-0.124939
-0.592954	analyzing program performance	-0.124939
-1.549547	is no performance	-0.124939
-1.187363	hardly any performance	-0.124939
-1.065276	that software performance	-0.124939
-1.064262	feature called performance	-0.124939
-0.596149	particularly useful performance	-0.124939
-0.595469	advanced system performance	-0.124939
-0.597369	The best performance	-0.602060
-1.480139	a good performance	-0.124939
-1.064417	very good performance	-0.124939
-0.590323	selecting optimize performance	-0.124939
-0.589241	testing Most performance	-0.124939
-0.191178	16.1 Using performance	-0.425969
-1.370499	can improve performance	-0.124939
-0.100509	has reduced performance	-0.602060
-0.572759	almost identical performance	-0.124939
-1.166893	to identify performance	-0.124939
-0.550051	Very poor performance	-0.124939
-0.540692	A realistic performance	-0.124939
-0.763387	The highest performance	-0.124939
-0.526615	no 51 performance	-0.124939
-0.504636	of measuring performance	-0.124939
-0.504379	2.1. Comparing performance	-0.124939
-0.463057	are inherent performance	-0.124939
-0.658951	the overall performance	-0.124939
-0.358471	The benchmark performance	-0.124939
-0.358471	for investigating performance	-0.124939
-0.358471	definitely degrades performance	-0.124939
-0.601791	activating the very	-0.124939
-1.799613	it is very	-0.124939
-2.018225	code is very	-0.124939
-2.235011	This is very	-0.124939
-2.409804	It is very	-0.124939
-1.190759	which is very	-0.124939
-1.326058	library is very	-0.124939
-2.127690	There is very	-0.124939
-1.246876	registers is very	-0.124939
-1.246876	counter is very	-0.124939
-1.160270	syntax is very	-0.124939
-0.590676	constants is very	-0.124939
-0.880612	algorithm is very	-0.124939
-0.880612	body is very	-0.124939
-1.547221	is a very	-0.124939
-1.822357	of a very	-0.124939
-2.239476	in a very	-0.124939
-1.889182	for a very	-0.124939
-1.336683	be a very	-0.124939
-1.542531	with a very	-0.124939
-2.002857	as a very	-0.124939
-1.758796	has a very	-0.124939
-1.252372	takes a very	-0.124939
-1.241014	require a very	-0.124939
-0.589375	certainly a very	-0.124939
-0.589375	indeed a very	-0.124939
-1.378132	apply to very	-0.124939
-1.073993	long and very	-0.124939
-0.600324	tested, and very	-0.124939
-1.566192	only for very	-0.124939
-0.895675	work for very	-0.124939
-0.598337	dramatically for very	-0.124939
-2.110097	to be very	-0.124939
-1.800011	can be very	-0.249877
-1.460684	will be very	-0.124939
-2.112059	that are very	-0.124939
-1.526632	operations are very	-0.124939
-1.340862	microprocessors are very	-0.124939
-0.593900	misses are very	-0.124939
-0.593900	capabilities are very	-0.124939
-0.600934	better on very	-0.124939
-0.600810	thread as very	-0.124939
-2.180124	are not very	-0.124939
-0.600681	intensive may very	-0.124939
-1.158950	libraries have very	-0.124939
-0.590320	instructions have very	-0.124939
-1.044871	computers have very	-0.124939
-0.600090	critical. A very	-0.124939
-1.479849	but also very	-0.124939
-1.756398	in some very	-0.124939
-0.865945	by some very	-0.124939
-1.877253	are accessed very	-0.124939
-1.572571	the arrays very	-0.124939
-1.051478	can get very	-0.124939
-1.553479	data caching very	-0.124939
-1.045140	is made very	-0.124939
-0.515160	other things very	-0.124939
-0.515160	some things very	-0.124939
-0.583182	bookkeeping depends very	-0.124939
-1.020411	can become very	-0.124939
-0.817693	are generally very	-0.124939
-0.540964	like -(-a) very	-0.124939
-0.540699	is unfortunately very	-0.124939
-0.526945	executed. Optimizes very	-0.124939
-0.504540	obsolete. Programmers very	-0.124939
-0.358586	is admittedly very	-0.124939
-2.382950	of the software	-0.124939
-2.476310	that the software	-0.124939
-2.501286	if the software	-0.124939
-1.127659	time the software	-0.425969
-2.181349	use the software	-0.124939
-2.225362	If the software	-0.124939
-1.355554	But the software	-0.124939
-1.355554	cause the software	-0.124939
-0.894714	optimizing the software	-0.124939
-0.597852	view the software	-0.124939
-1.714741	using a software	-0.124939
-1.060505	between a software	-0.124939
-0.595745	test a software	-0.124939
-0.595745	However, a software	-0.124939
-0.890550	choose a software	-0.124939
-0.595745	base a software	-0.124939
-1.060505	install a software	-0.124939
-0.890550	considered a software	-0.124939
-0.595745	reinstall a software	-0.124939
-1.792643	piece of software	-0.124939
-1.351367	costs of software	-0.124939
-0.596136	principles of software	-0.124939
-0.891322	splitting of software	-0.124939
-0.596136	updating of software	-0.124939
-0.596136	vulnerability of software	-0.124939
-0.596136	lineage of software	-0.124939
-0.596136	attention of software	-0.124939
-1.377739	relevant to software	-0.124939
-0.600185	process and software	-0.124939
-1.073581	programmers and software	-0.124939
-0.600486	shift in software	-0.124939
-0.600486	separately in software	-0.124939
-1.192725	time for software	-0.124939
-0.598228	common for software	-0.124939
-0.895460	uncommon for software	-0.124939
-0.598024	are that software	-0.124939
-1.188196	problems that software	-0.124939
-0.895055	believe that software	-0.124939
-0.900652	wasted on software	-0.124939
-0.600260	test when software	-0.124939
-0.898825	systems. A software	-0.124939
-1.831704	to make software	-0.124939
-1.071576	about which software	-0.124939
-1.622032	that all software	-0.124939
-1.070326	that most software	-0.124939
-0.597929	worse, many software	-0.124939
-1.474059	for 32-bit software	-0.124939
-0.581298	fast 32-bit software	-0.124939
-1.925321	a new software	-0.124939
-1.387674	of making software	-0.124939
-0.569581	with making software	-0.124939
-0.594815	redesign. Some software	-0.124939
-0.594597	other extra software	-0.124939
-0.594000	Even big software	-0.124939
-0.885015	well optimized software	-0.124939
-0.590828	problems. All software	-0.124939
-1.568801	the application software	-0.124939
-0.588855	make their software	-0.124939
-0.480343	updates Many software	-0.124939
-0.480343	databases Many software	-0.124939
-0.480343	more. Many software	-0.124939
-0.585221	Microsoft platform software	-0.124939
-0.585621	ever bigger software	-0.124939
-0.650890	the whole software	-0.124939
-0.177661	1. Optimizing software	-0.425969
-1.076074	a typical software	-0.124939
-0.787550	of CPU-intensive software	-0.124939
-0.540640	of 18 software	-0.124939
-0.763655	of structured software	-0.124939
-0.463002	of irrelevant software	-0.124939
-0.463002	hard working software	-0.124939
-0.463002	better performing software	-0.124939
-0.463002	computer. Security software	-0.124939
-0.358428	make memory-hungry software	-0.124939
-0.358428	supporting multi-threaded software	-0.124939
-1.917292	in the order	-0.425969
-2.451128	that the order	-0.124939
-1.065086	check the order	-0.124939
-0.597311	usually the order	-0.124939
-1.414954	change the order	-0.124939
-0.248818	swap the order	-0.602060
-0.597311	swapping the order	-0.124939
-1.065086	reflects the order	-0.124939
-0.597311	controlling the order	-0.124939
-1.886640	out of order	-0.124939
-0.203748	Out of order	-0.425969
-1.036517	it in order	-0.124939
-1.583669	code in order	-0.124939
-0.836824	data in order	-0.425969
-1.072750	other in order	-0.124939
-0.987344	set in order	-0.124939
-0.947811	size in order	-0.124939
-0.553718	i in order	-0.124939
-1.411359	variables in order	-0.124939
-0.987344	2 in order	-0.124939
-1.413313	order in order	-0.124939
-1.496519	elements in order	-0.124939
-0.553718	const in order	-0.124939
-0.553718	8 in order	-0.124939
-0.553718	unsigned in order	-0.124939
-0.811253	operations in order	-0.124939
-0.475504	times in order	-0.425969
-0.553718	big in order	-0.124939
-1.454932	element in order	-0.124939
-0.947811	information in order	-0.124939
-1.169660	addresses in order	-0.124939
-0.553718	end in order	-0.124939
-1.133427	needed in order	-0.124939
-1.133427	together in order	-0.124939
-0.553718	bool in order	-0.124939
-0.987344	conditions in order	-0.124939
-0.553718	right in order	-0.124939
-0.553718	cores in order	-0.124939
-0.553718	input in order	-0.124939
-0.553718	blocks in order	-0.124939
-0.553718	low in order	-0.124939
-0.811253	algorithms in order	-0.124939
-0.553718	purpose in order	-0.124939
-0.553718	itself in order	-0.124939
-0.553718	principles in order	-0.124939
-0.553718	package in order	-0.124939
-0.553718	is, in order	-0.124939
-0.553718	bookkeeping in order	-0.124939
-0.553718	influences in order	-0.124939
-0.553718	experiments in order	-0.124939
-0.553718	ment in order	-0.124939
-0.553718	randomness in order	-0.124939
-0.553718	de-referenced in order	-0.124939
-0.553718	phase in order	-0.124939
-0.553718	(GOT) in order	-0.124939
-0.553718	sizeof(float) in order	-0.124939
-0.598576	parameter. The order	-0.124939
-0.598576	Booleans The order	-0.124939
-0.598576	51). The order	-0.124939
-1.021302	} In order	-0.124939
-0.572157	B. In order	-0.124939
-0.572157	system-specific. In order	-0.124939
-1.056194	no specific order	-0.124939
-0.869561	the storage order	-0.124939
-0.869737	The link order	-0.124939
-1.241121	a non-sequential order	-0.124939
-0.527122	in sequential order	-0.124939
-0.726517	a natural order	-0.124939
-0.659639	The opposite order	-0.124939
-0.358816	its out-of- order	-0.124939
-0.358816	a 2'nd order	-0.124939
-3.037517	in the long	-0.124939
-2.763516	it is long	-0.124939
-1.439252	loop is long	-0.124939
-2.276861	of a long	-0.124939
-1.803190	has a long	-0.124939
-1.678500	using a long	-0.124939
-1.559373	where a long	-0.124939
-0.594701	takes a long	-0.124939
-0.596529	take a long	-0.425969
-0.376516	quite a long	-0.124939
-0.592881	causes a long	-0.124939
-0.884921	With a long	-0.124939
-0.592881	forms a long	-0.124939
-0.203749	double and long	-0.425969
-0.600838	misprediction, or long	-0.124939
-1.530203	done with long	-0.124939
-0.597600	chains with long	-0.124939
-0.852994	time as long	-0.124939
-0.852994	program as long	-0.124939
-0.576290	but as long	-0.124939
-1.029085	variables as long	-0.124939
-0.576290	times as long	-0.124939
-0.576290	calculations as long	-0.124939
-0.576290	significant as long	-0.124939
-0.576290	integers, as long	-0.124939
-1.733661	but not long	-0.124939
-0.600369	registers have long	-0.124939
-0.600277	calculate when long	-0.124939
-0.599953	sub-vector. A long	-0.124939
-1.533450	are no long	-0.124939
-1.066253	of some long	-0.124939
-1.565657	is so long	-0.124939
-1.442125	a very long	-0.124939
-0.983534	for very long	-0.124939
-0.697737	be very long	-0.425969
-0.766094	unsigned long long	-0.124939
-0.471051	SSE2 long long	-0.124939
-0.471051	i; long long	-0.124939
-0.471051	AVX2 long long	-0.124939
-0.471051	MMX long long	-0.124939
-0.471051	AVX512 long long	-0.124939
-0.471051	time1; long long	-0.124939
-0.471051	int32_t long long	-0.124939
-0.471051	<intrin.h> long long	-0.124939
-0.471051	DontSkip; long long	-0.124939
-1.349978	8 8 long	-0.124939
-0.824934	systems: unsigned long	-0.124939
-0.561226	Linux: unsigned long	-0.124939
-0.561226	uint32_t unsigned long	-0.124939
-0.892425	measure how long	-0.124939
-1.270620	128 SSE2 long	-0.124939
-1.178614	to avoid long	-0.425969
-2.075889	int i; long	-0.124939
-0.874120	takes too long	-0.124939
-1.140166	256 AVX2 long	-0.124939
-1.135105	is just long	-0.124939
-0.576910	22. Avoid long	-0.124939
-0.969525	branch misprediction long	-0.124939
-1.046457	64 MMX long	-0.124939
-1.064022	16-bit systems: long	-0.124939
-0.997534	512 AVX512 long	-0.124939
-0.132104	of unacceptably long	-0.124939
-0.132104	by unacceptably long	-0.124939
-0.132104	have unacceptably long	-0.124939
-0.132104	experience unacceptably long	-0.124939
-0.902212	math libraries: long	-0.124939
-0.725682	64-bit Linux: long	-0.124939
-0.658923	data types: long	-0.124939
-0.463038	long time1; long	-0.124939
-0.463038	give annoyingly long	-0.124939
-0.358457	231-1 int32_t long	-0.124939
-0.358457	#include <intrin.h> long	-0.124939
-0.358457	int DontSkip; long	-0.124939
-0.601152	mainframes, and between	-0.124939
-0.902061	cache in between	-0.124939
-0.600678	program, or between	-0.124939
-1.869758	the cache between	-0.124939
-0.598747	are there between	-0.124939
-1.067019	It takes between	-0.124939
-0.585832	in performance between	-0.124939
-1.400972	unused bytes between	-0.124939
-0.331763	in speed between	-0.124939
-0.448008	is shared between	-0.124939
-0.448008	and shared between	-0.124939
-0.448008	be shared between	-0.124939
-0.448008	are shared between	-0.124939
-0.448008	not shared between	-0.124939
-1.319430	is typically between	-0.124939
-1.152198	The conversion between	-0.124939
-0.082533	the difference between	-0.204120
-0.452746	The difference between	-0.124939
-0.780228	no difference between	-0.124939
-0.284629	minimal difference between	-0.124939
-0.870995	graphics framework between	-0.124939
-1.138156	to choose between	-0.124939
-1.151097	a switch between	-0.124939
-0.509845	need conversions between	-0.124939
-0.509845	Avoid conversions between	-0.124939
-1.261678	the choice between	-0.124939
-0.569946	CISC processors, between	-0.124939
-0.562131	that jump between	-0.124939
-0.129327	... Conversions between	-0.124939
-0.129327	conversion Conversions between	-0.124939
-0.129327	precision. Conversions between	-0.124939
-0.129327	enabled. Conversions between	-0.124939
-0.059867	14.8 Conversions between	-0.425969
-0.956499	is distributed between	-0.124939
-0.390817	for communication between	-0.124939
-0.274403	that communication between	-0.124939
-0.274403	necessary communication between	-0.124939
-0.787592	that select between	-0.124939
-0.540483	is split between	-0.124939
-0.143246	a compromise between	-0.124939
-0.540483	has nothing between	-0.124939
-0.540263	thread jumps between	-0.124939
-0.077732	that chooses between	-0.425969
-0.172476	program chooses between	-0.124939
-0.310966	to distinguish between	-0.124939
-0.504139	the distance between	-0.124939
-0.462838	optimized. Jumps between	-0.124939
-0.462838	function. Switch between	-0.124939
-0.065725	and synchronization between	-0.124939
-0.143202	a distinction between	-0.124939
-0.143202	important distinction between	-0.124939
-0.358299	The similarity between	-0.124939
-0.358299	sensible balance between	-0.124939
-0.358299	the transitions between	-0.124939
-0.358299	work evenly between	-0.124939
-0.358299	the workload between	-0.124939
-0.358299	and communicating between	-0.124939
-0.358299	clear correspondence between	-0.124939
-0.358299	perfectly varies between	-0.124939
-0.358299	were observed between	-0.124939
-0.358299	without discriminating between	-0.124939
-0.358299	The distinctions between	-0.124939
-0.358299	facilitate porting between	-0.124939
-0.358299	that discriminates between	-0.124939
-0.358299	multitasking environment, between	-0.124939
-0.358299	for distinguishing between	-0.124939
-3.061304	of the 32-bit	-0.124939
-2.607421	for the 32-bit	-0.124939
-2.269047	than the 32-bit	-0.124939
-2.518685	of a 32-bit	-0.124939
-2.239483	with a 32-bit	-0.124939
-1.622917	as a 32-bit	-0.124939
-1.445396	efficiency of 32-bit	-0.124939
-1.554581	applied to 32-bit	-0.124939
-0.939385	Windows and 32-bit	-0.124939
-1.691091	Linux and 32-bit	-0.124939
-1.204806	and in 32-bit	-0.124939
-1.573241	than in 32-bit	-0.124939
-0.913314	variables in 32-bit	-0.124939
-1.005089	faster in 32-bit	-0.124939
-0.852319	method in 32-bit	-0.124939
-1.329876	bits in 32-bit	-0.124939
-1.468506	available in 32-bit	-0.124939
-0.369571	stack in 32-bit	-0.124939
-1.107896	bytes in 32-bit	-0.124939
-1.107896	integers in 32-bit	-0.124939
-1.005089	precision in 32-bit	-0.124939
-0.575933	eight in 32-bit	-0.124939
-1.226938	addresses in 32-bit	-0.124939
-1.329876	running in 32-bit	-0.124939
-1.107896	references in 32-bit	-0.124939
-0.243934	especially in 32-bit	-0.301030
-0.575933	resource in 32-bit	-0.124939
-0.575933	purposes in 32-bit	-0.124939
-0.852319	-fpic in 32-bit	-0.124939
-0.575933	six in 32-bit	-0.124939
-0.575933	Sum3 in 32-bit	-0.124939
-0.747545	compiler for 32-bit	-0.301030
-1.234193	performance for 32-bit	-0.124939
-0.587844	cycles for 32-bit	-0.124939
-1.402649	intended for 32-bit	-0.124939
-1.301158	compiling for 32-bit	-0.124939
-0.587844	capabilities for 32-bit	-0.124939
-0.587844	Compilers for 32-bit	-0.124939
-0.587844	executables for 32-bit	-0.124939
-1.076746	b are 32-bit	-0.124939
-0.896560	#else // 32-bit	-0.124939
-0.598783	defined(__GNUC__) // 32-bit	-0.124939
-2.267964	such as 32-bit	-0.124939
-2.076502	faster than 32-bit	-0.124939
-2.414388	to use 32-bit	-0.124939
-0.202099	Supports only 32-bit	-0.425969
-0.599624	bits, but 32-bit	-0.124939
-0.871624	as two 32-bit	-0.124939
-0.871624	than two 32-bit	-0.124939
-0.598424	addressing. In 32-bit	-0.124939
-0.862968	performance between 32-bit	-0.124939
-1.445592	difference between 32-bit	-0.124939
-0.197320	-fno-builtin Gnu 32-bit	-0.124939
-0.592596	sometimes uses 32-bit	-0.124939
-0.592420	libraries support 32-bit	-0.124939
-1.246234	for fast 32-bit	-0.124939
-0.590397	(2013) both 32-bit	-0.124939
-0.589041	to their 32-bit	-0.124939
-0.586147	integers. Many 32-bit	-0.124939
-0.582624	It supports 32-bit	-0.124939
-0.510160	not. Supports 32-bit	-0.124939
-0.510160	code). Supports 32-bit	-0.124939
-0.580406	platforms, including 32-bit	-0.124939
-0.577228	and Linux, 32-bit	-0.124939
-0.817431	in Linux. 32-bit	-0.124939
-1.026225	or reference, 32-bit	-0.124939
-0.659123	OS X, 32-bit	-0.124939
-0.358557	both 16-bit, 32-bit	-0.124939
-2.297241	in the branch	-0.425969
-2.557053	if the branch	-0.124939
-2.329404	by the branch	-0.124939
-0.378961	called the branch	-0.124939
-1.364833	replace the branch	-0.124939
-0.598940	predictable the branch	-0.124939
-0.896873	avoids the branch	-0.124939
-0.601645	used is branch	-0.124939
-2.343055	is a branch	-0.124939
-2.167238	to a branch	-0.124939
-2.080740	with a branch	-0.124939
-1.702806	from a branch	-0.124939
-1.218368	has a branch	-0.124939
-0.887800	way a branch	-0.124939
-1.565330	example, a branch	-0.124939
-1.335869	replace a branch	-0.124939
-1.263770	Such a branch	-0.124939
-0.594348	feeds a branch	-0.124939
-0.679223	explanation of branch	-0.425969
-1.489262	kind of branch	-0.124939
-1.482648	function and branch	-0.124939
-0.203543	misses and branch	-0.425969
-0.898663	integers. The branch	-0.124939
-1.072557	critical. The branch	-0.124939
-2.140226	used for branch	-0.124939
-0.599804	results for branch	-0.124939
-0.601404	of that branch	-0.124939
-1.067752	the if branch	-0.124939
-0.895437	The if branch	-0.124939
-0.203232	Loop with branch	-0.124939
-0.900755	details on branch	-0.124939
-1.062976	A code branch	-0.124939
-0.593882	which code branch	-0.124939
-0.593882	any code branch	-0.124939
-1.055585	} A branch	-0.124939
-0.559900	b; A branch	-0.124939
-0.559900	way. A branch	-0.124939
-0.559900	call. A branch	-0.124939
-0.559900	course. A branch	-0.124939
-0.559900	CPUs". A branch	-0.124939
-0.559900	mispredicted. A branch	-0.124939
-0.559900	changes. A branch	-0.124939
-2.161798	the loop branch	-0.124939
-1.613482	The loop branch	-0.124939
-1.720301	have no branch	-0.124939
-0.598022	generate many branch	-0.124939
-1.189001	best possible branch	-0.124939
-0.597854	resolve any branch	-0.124939
-1.631951	to take branch	-0.124939
-1.234396	a new branch	-0.124939
-0.594725	43 about branch	-0.124939
-1.853058	a single branch	-0.124939
-1.638904	can cause branch	-0.124939
-1.583549	the optimal branch	-0.124939
-0.589870	each particular branch	-0.124939
-0.362490	loop control branch	-0.346788
-1.302966	the dispatch branch	-0.124939
-0.804997	poorly predictable branch	-0.124939
-0.550118	from poor branch	-0.124939
-0.526679	code cache, branch	-0.124939
-0.725815	the wrong branch	-0.124939
-0.028002	cache misses, branch	-0.301030
-0.463111	function. Provoke branch	-0.124939
-0.463111	branches Remove branch	-0.124939
-0.358514	target buffer, branch	-0.124939
-2.423544	to a <	-0.124939
-0.202996	0; x <	-0.425969
-0.258164	if i <	-0.124939
-0.005019	0; i <	-0.674611
-0.369705	; i <	-0.124939
-0.258164	condition i <	-0.124939
-0.258164	comparisons i <	-0.124939
-0.486786	if c <	-0.124939
-0.080138	0; c <	-0.425969
-0.554471	if (i <	-0.124939
-0.525415	while (i <	-0.124939
-0.587679	<= n <	-0.124939
-0.161633	0; r <	-0.425969
-0.161633	1; r <	-0.425969
-0.579038	&list[0]; temp <	-0.124939
-0.570160	0; row <	-0.124939
-0.434862	r1; c2 <	-0.124939
-0.434862	c1; c2 <	-0.124939
-0.562739	0; j <	-0.124939
-0.550473	0; column <	-0.124939
-0.550386	0; c1 <	-0.124939
-0.526932	<= u.f <	-0.124939
-0.504680	0; r1 <	-0.124939
-0.249703	r1; r2 <	-0.124939
-0.249703	r1+1; r2 <	-0.124939
-0.659381	((unsigned int)i <	-0.124939
-0.463330	n.a. !(a <	-0.124939
-0.065775	((unsigned int)n <	-0.124939
-0.358686	that u <	-0.124939
-0.358686	if (u.i[1] <	-0.124939
-0.358686	while (seconds <	-0.124939
-0.358686	while (0 <	-0.124939
-2.222207	of the member	-0.124939
-2.572313	that the member	-0.124939
-2.597943	if the member	-0.124939
-1.578595	have the member	-0.124939
-2.285296	If the member	-0.124939
-2.327031	that is member	-0.124939
-1.770430	and a member	-0.124939
-1.914405	be a member	-0.124939
-1.769121	or a member	-0.124939
-0.594737	function a member	-0.425969
-1.348976	as a member	-0.301030
-1.478303	Make a member	-0.124939
-1.168681	accessing a member	-0.124939
-1.052335	Accessing a member	-0.124939
-0.885009	Calling a member	-0.124939
-0.592926	force a member	-0.124939
-1.378866	implementation of member	-0.124939
-1.201354	classes and member	-0.124939
-0.504228	pointer in member	-0.425969
-0.601345	52. The member	-0.124939
-0.601346	meaning for member	-0.124939
-1.922714	pointer or member	-0.124939
-0.895712	members or member	-0.124939
-1.066113	work with member	-0.124939
-0.597660	complications with member	-0.124939
-0.087374	a data member	-0.124939
-0.835528	class data member	-0.124939
-0.566966	first data member	-0.124939
-2.234558	to make member	-0.124939
-1.415511	that make member	-0.124939
-1.142354	may make member	-0.124939
-2.617404	the same member	-0.124939
-1.769471	any other member	-0.124939
-1.086554	a class member	-0.124939
-0.570685	for class member	-0.124939
-0.842455	A class member	-0.124939
-0.599132	But each member	-0.124939
-0.598954	its b member	-0.124939
-0.121716	A static member	-0.602060
-0.894776	contains any member	-0.124939
-1.478168	the way member	-0.124939
-1.065547	A const member	-0.124939
-0.592958	a virtual member	-0.124939
-0.659710	to virtual member	-0.124939
-0.659710	with virtual member	-0.124939
-0.463540	one virtual member	-0.124939
-0.463540	no virtual member	-0.124939
-0.579055	static Assume member	-0.124939
-0.118887	a non-static member	-0.425969
-0.283122	or non-static member	-0.124939
-0.402252	all non-static member	-0.124939
-0.615092	the polymorphic member	-0.124939
-0.744463	a polymorphic member	-0.124939
-0.336131	systems. Virtual member	-0.124939
-0.136327	7.20 Virtual member	-0.425969
-0.129416	7.19 Class member	-0.425969
-0.504480	fast=2 Simple member	-0.124939
-0.504480	other (not member	-0.124939
-0.463148	any non-polymorphic member	-0.124939
-0.463148	are: Non-static member	-0.124939
-0.463148	a non-virtual member	-0.124939
-0.358543	structures (without member	-0.124939
-2.863971	of the way	-0.124939
-2.085455	in the way	-0.124939
-2.316424	with the way	-0.124939
-2.425213	on the way	-0.124939
-0.897583	control the way	-0.124939
-1.430843	Unfortunately, the way	-0.124939
-1.963698	is a way	-0.124939
-1.791371	in a way	-0.602060
-1.560531	such a way	-0.124939
-0.893512	shows a way	-0.124939
-0.898683	parallelism. The way	-0.124939
-0.599851	factors. The way	-0.124939
-1.345134	in this way	-0.124939
-1.838377	a different way	-0.124939
-1.533559	the same way	-0.425969
-1.165311	the only way	-0.124939
-1.049754	The only way	-0.124939
-0.856271	the other way	-0.301030
-1.194310	predict which way	-0.124939
-1.407726	in one way	-0.124939
-1.196562	than one way	-0.124939
-0.565542	goes one way	-0.124939
-0.565542	go one way	-0.124939
-0.565542	randomly one way	-0.124939
-1.295578	is no way	-0.301030
-1.286341	The C++ way	-0.124939
-1.022106	an efficient way	-0.124939
-1.953836	more efficient way	-0.124939
-0.996156	very efficient way	-0.124939
-0.893986	even faster way	-0.124939
-1.760246	the first way	-0.124939
-1.519993	The first way	-0.124939
-0.849030	a useful way	-0.124939
-1.198901	very useful way	-0.124939
-1.267308	A simple way	-0.124939
-0.761273	the best way	-0.602060
-0.751762	The best way	-0.425969
-1.449796	a common way	-0.124939
-1.480383	a good way	-0.124939
-1.240654	A good way	-0.124939
-0.592495	goes another way	-0.124939
-1.156193	The second way	-0.124939
-0.583067	most compatible way	-0.124939
-0.716137	a safe way	-0.124939
-0.498601	The safe way	-0.124939
-1.097170	The simplest way	-0.124939
-0.169252	no easy way	-0.425969
-0.566501	The typical way	-0.124939
-0.397162	the fastest way	-0.425969
-0.550524	a convenient way	-0.124939
-0.550524	but efficient, way	-0.124939
-0.540912	a portable way	-0.124939
-0.939937	a suboptimal way	-0.124939
-0.107118	The easiest way	-0.425969
-0.358529	and intelligible way	-0.124939
-2.837530	of the elements	-0.124939
-2.394131	and the elements	-0.124939
-2.534614	that the elements	-0.124939
-2.021766	if the elements	-0.124939
-1.432038	which the elements	-0.124939
-1.365091	take the elements	-0.124939
-0.598994	storing the elements	-0.124939
-0.598994	adds the elements	-0.124939
-0.774573	number of elements	-0.636822
-1.469980	types of elements	-0.124939
-0.497041	Number of elements	-0.124939
-0.201400	Type of elements	-0.124939
-1.077255	diagonal. The elements	-0.124939
-0.599894	c2 for elements	-0.124939
-0.599894	requests for elements	-0.124939
-0.599022	used if elements	-0.425969
-1.728546	only when elements	-0.124939
-1.984349	the data elements	-0.124939
-0.885643	multiple data elements	-0.124939
-1.514938	on all elements	-0.124939
-0.865431	sets all elements	-0.124939
-1.041523	after all elements	-0.124939
-1.438515	the array elements	-0.124939
-1.117009	of array elements	-0.124939
-1.181582	for array elements	-0.124939
-0.558490	individual array elements	-0.124939
-0.598162	"how many elements	-0.124939
-0.597986	alias any elements	-0.124939
-0.596097	swap these elements	-0.124939
-1.341848	C++ language elements	-0.124939
-0.592547	read four elements	-0.124939
-0.592556	accessing container elements	-0.124939
-0.705868	first eight elements	-0.124939
-0.492349	these eight elements	-0.124939
-0.492349	handle eight elements	-0.124939
-0.492349	handles eight elements	-0.124939
-1.163313	// add elements	-0.124939
-1.636488	user interface elements	-0.124939
-0.584743	where 10 elements	-0.124939
-0.008129	eight consecutive elements	-0.602060
-0.575315	with N elements	-0.124939
-0.172650	// swap elements	-0.124939
-0.566754	all subsequent elements	-0.124939
-1.009347	to distinguish elements	-0.124939
-0.527008	the mirror elements	-0.124939
-0.463276	add dummy elements	-0.124939
-2.129800	it is faster	-0.124939
-1.581074	function is faster	-0.425969
-1.709962	This is faster	-0.124939
-1.699075	It is faster	-0.301030
-0.492160	point is faster	-0.124939
-0.580620	2 is faster	-0.124939
-1.100812	method is faster	-0.124939
-1.124063	access is faster	-0.124939
-1.037310	file is faster	-0.124939
-1.142736	case is faster	-0.124939
-0.122851	constant is faster	-0.425969
-0.584824	implementation is faster	-0.425969
-0.861219	division is faster	-0.124939
-1.332002	operand is faster	-0.124939
-0.861219	blocks is faster	-0.124939
-1.017798	tool is faster	-0.124939
-0.580620	15.1c is faster	-0.124939
-0.580620	(This is faster	-0.124939
-0.580620	Unsigned is faster	-0.124939
-0.580620	14.21 is faster	-0.124939
-0.601660	prevents a faster	-0.124939
-2.267858	to be faster	-0.124939
-1.799050	may be faster	-0.124939
-2.057383	will be faster	-0.124939
-1.576506	operations are faster	-0.124939
-1.586548	arrays are faster	-0.124939
-2.642988	the code faster	-0.124939
-1.825440	member functions faster	-0.124939
-0.895836	and C++ faster	-0.124939
-0.597168	83 called faster	-0.124939
-1.138778	is often faster	-0.124939
-0.574095	be even faster	-0.124939
-0.574095	an even faster	-0.124939
-1.261804	many times faster	-0.124939
-0.545692	5 times faster	-0.124939
-0.545692	seven times faster	-0.124939
-1.033455	function calls faster	-0.124939
-0.564607	calculated much faster	-0.124939
-0.564607	usually much faster	-0.124939
-0.885054	are calculated faster	-0.124939
-0.535975	will run faster	-0.124939
-0.520914	applications run faster	-0.124939
-0.586296	threads becomes faster	-0.124939
-1.494145	is usually faster	-0.124939
-0.584777	vector goes faster	-0.124939
-0.850883	code execute faster	-0.124939
-1.202223	a little faster	-0.124939
-0.567079	run slightly faster	-0.124939
-0.817764	is generally faster	-0.124939
-0.788051	be executed faster	-0.124939
-0.540866	is increasing faster	-0.124939
-0.015531	// Still faster	-0.726999
-0.726049	software packages faster	-0.124939
-0.463239	// ipow faster	-0.124939
-0.358615	but neither faster	-0.124939
-2.265570	use the const	-0.124939
-1.493559	However, the const	-0.124939
-1.074798	remove the const	-0.124939
-0.600594	relieving the const	-0.124939
-2.322099	to a const	-0.124939
-2.126905	by a const	-0.124939
-0.599280	away a const	-0.124939
-0.599280	reference, a const	-0.124939
-1.501556	table of const	-0.124939
-0.901928	equivalent to const	-0.124939
-1.076427	purposes. The const	-0.124939
-1.935995	pointer or const	-0.124939
-0.595445	c1 { const	-0.124939
-0.595445	MathLoop() { const	-0.124939
-0.875221	to. A const	-0.124939
-0.587902	reference. A const	-0.124939
-0.587902	taken. A const	-0.124939
-0.600109	EMMS } const	-0.124939
-0.599846	table has const	-0.124939
-0.598650	dest, double const	-0.124939
-0.912656	and static const	-0.124939
-0.785306	{ static const	-0.124939
-0.539187	b; static const	-0.124939
-0.539187	__declspec(align(16)) static const	-0.124939
-0.539187	factorials: static const	-0.124939
-0.539187	boolb=0; static const	-0.124939
-1.292417	const * const	-0.124939
-1.525748	induction variables const	-0.124939
-0.594776	Integer constant const	-0.124939
-2.074577	int i; const	-0.124939
-0.077291	d, __m128i const	-0.726999
-0.587391	static char const	-0.124939
-1.032046	int x; const	-0.124939
-1.237774	be declared const	-0.124939
-1.245171	a global const	-0.124939
-1.199143	Automatic vectorization const	-0.124939
-0.856987	a local const	-0.124939
-0.556647	a, T const	-0.124939
-0.817194	order(int x); const	-0.124939
-0.550114	lrintf (float const	-0.124939
-0.540430	__attribute__((aligned(16))) #endif const	-0.124939
-0.526580	Example 14.8 const	-0.124939
-0.526580	example 16.1 const	-0.124939
-0.129317	lrint (double const	-0.425969
-0.725215	Example 14.30 const	-0.124939
-0.027989	__m128i LoadVector(void const	-0.602060
-0.504079	Example 9.5a const	-0.124939
-0.504079	Example 11.3 const	-0.124939
-0.504435	Example 9.4 const	-0.124939
-0.504435	Example 7.17 const	-0.124939
-0.065720	int Func(int); const	-0.425969
-0.658522	Example 9.6a const	-0.124939
-0.462783	Example 11.2b const	-0.124939
-0.462783	of squares: const	-0.124939
-0.658522	of factorials: const	-0.124939
-0.358256	Example 14.5a const	-0.124939
-0.358256	polynomial (Vec4f const	-0.124939
-0.358256	Example 11.2a const	-0.124939
-0.358256	+ (vector const	-0.124939
-0.358256	__m128i LoadVectorA(void const	-0.124939
-0.358256	float add_elements(__m128 const	-0.124939
-0.358256	Example 7.33b const	-0.124939
-0.358256	a #define, const	-0.124939
-0.358256	T max(T const	-0.124939
-0.358256	Example 7.33a const	-0.124939
-0.358256	int FuncCol(int); const	-0.124939
-0.358256	Example 14.4a const	-0.124939
-0.600248	pointer and makes	-0.124939
-1.196917	faster and makes	-0.124939
-1.974662	code that makes	-0.124939
-1.072210	destructor that makes	-0.124939
-1.076411	d; // makes	-0.124939
-1.662498	and it makes	-0.124939
-1.575159	that it makes	-0.124939
-1.399068	because it makes	-0.124939
-1.510294	where it makes	-0.124939
-0.585355	variable, it makes	-0.124939
-1.952698	the code makes	-0.425969
-1.020079	memory. This makes	-0.124939
-1.020079	program. This makes	-0.124939
-0.944413	data. This makes	-0.124939
-0.768322	x; This makes	-0.124939
-0.890033	1. This makes	-0.124939
-0.890033	class. This makes	-0.124939
-0.529469	bytes. This makes	-0.124939
-1.014972	stack. This makes	-0.124939
-0.529469	needed. This makes	-0.124939
-0.529469	order. This makes	-0.124939
-0.944413	bits. This makes	-0.124939
-0.890033	modules. This makes	-0.124939
-0.768322	all. This makes	-0.124939
-0.529469	doubled. This makes	-0.124939
-0.529469	lines. This makes	-0.124939
-0.768322	loaded. This makes	-0.124939
-0.529469	stored. This makes	-0.124939
-0.768322	fragmented. This makes	-0.124939
-0.529469	static. This makes	-0.124939
-0.529469	occurred. This makes	-0.124939
-0.529469	local. This makes	-0.124939
-2.204933	The compiler makes	-0.124939
-1.703856	but this makes	-0.124939
-0.600351	class. It makes	-0.124939
-0.599892	installation program makes	-0.124939
-0.599746	functions only makes	-0.124939
-0.575189	non-sequential which makes	-0.124939
-0.575189	default, which makes	-0.124939
-0.575189	abstraction which makes	-0.124939
-0.850915	stack, which makes	-0.124939
-0.897783	by one makes	-0.124939
-1.741060	instruction set makes	-0.425969
-1.529256	class library makes	-0.124939
-0.572931	it also makes	-0.124939
-0.846664	This also makes	-0.124939
-0.572931	keyword also makes	-0.124939
-1.633280	function call makes	-0.124939
-0.596244	Using pointers makes	-0.124939
-0.890080	handling system makes	-0.124939
-0.594209	non-Intel processor makes	-0.124939
-0.195949	This option makes	-0.425969
-0.592857	This check makes	-0.124939
-0.591189	integers simply makes	-0.124939
-1.151481	const reference makes	-0.124939
-0.874460	volatile keyword makes	-0.124939
-1.065629	dynamic linking makes	-0.124939
-1.016706	Dynamic linking makes	-0.124939
-0.583747	or #define makes	-0.124939
-1.118082	stack frame makes	-0.124939
-0.840964	of templates makes	-0.124939
-0.570142	such checks makes	-0.124939
-0.569972	static declaration makes	-0.124939
-0.980569	memory blocks makes	-0.124939
-0.557385	many instances makes	-0.124939
-1.144304	the linker makes	-0.124939
-0.504663	the product makes	-0.124939
-0.358500	section position-independent, makes	-0.124939
-0.600882	inlined or cannot	-0.124939
-1.593340	that it cannot	-0.124939
-2.064468	if it cannot	-0.124939
-1.979523	because it cannot	-0.124939
-1.588807	Therefore, it cannot	-0.124939
-2.260196	the function cannot	-0.124939
-1.161139	member function cannot	-0.124939
-0.592526	128 function cannot	-0.124939
-1.635249	intermediate code cannot	-0.124939
-1.631355	the compiler cannot	-0.301030
-2.005128	The compiler cannot	-0.124939
-0.583352	CodeGear compiler cannot	-0.124939
-0.966251	and you cannot	-0.124939
-1.632312	if you cannot	-0.124939
-1.123221	then you cannot	-0.124939
-1.558526	If you cannot	-0.124939
-1.029139	but you cannot	-0.124939
-0.941787	Here, you cannot	-0.124939
-0.806846	Here you cannot	-0.124939
-0.806846	systems, you cannot	-0.124939
-0.551277	Likewise, you cannot	-0.124939
-0.898794	MOVNTQ instruction cannot	-0.124939
-1.365451	function which cannot	-0.124939
-1.654753	level-2 cache cannot	-0.124939
-1.560065	the compilers cannot	-0.124939
-0.589122	example, compilers cannot	-0.124939
-0.896994	final size cannot	-0.124939
-1.072098	An object cannot	-0.124939
-1.355539	A variable cannot	-0.124939
-0.474040	1; You cannot	-0.124939
-0.676322	cycles. You cannot	-0.124939
-0.474040	thread. You cannot	-0.124939
-0.474040	not. You cannot	-0.124939
-0.474040	case. You cannot	-0.124939
-0.474040	here. You cannot	-0.124939
-0.474040	operands. You cannot	-0.124939
-0.474040	CriticalFunction. You cannot	-0.124939
-0.474040	examples. You cannot	-0.124939
-0.474040	compiler-specific. You cannot	-0.124939
-1.472354	memory address cannot	-0.124939
-1.273745	program optimization cannot	-0.124939
-0.532616	that they cannot	-0.124939
-1.085343	because they cannot	-0.124939
-0.746696	where they cannot	-0.124939
-0.516861	reductions they cannot	-0.124939
-0.889954	complicated cases cannot	-0.124939
-1.267554	vector instructions cannot	-0.124939
-0.595004	access times cannot	-0.124939
-1.175497	Intel CPUs cannot	-0.124939
-0.594342	it work cannot	-0.124939
-1.528440	and therefore cannot	-0.124939
-1.458829	the problem cannot	-0.124939
-1.152428	the name cannot	-0.124939
-1.151481	const reference cannot	-0.124939
-1.038908	network resources cannot	-0.124939
-0.874385	Table lookup cannot	-0.124939
-0.587816	optimized. We cannot	-0.124939
-0.750312	different types cannot	-0.425969
-1.241062	dynamic linking cannot	-0.124939
-0.583698	point operands cannot	-0.124939
-1.131533	is loaded cannot	-0.124939
-0.550096	The debugger cannot	-0.124939
-0.540568	variables Compilers cannot	-0.124939
-0.463093	encryption algorithms, cannot	-0.124939
-1.072943	operations and before	-0.124939
-0.599971	running and before	-0.124939
-1.989789	unsigned int before	-0.124939
-1.557357	the time before	-0.124939
-2.213215	the program before	-0.124939
-1.167934	your program before	-0.124939
-0.599803	EMMS instruction before	-0.124939
-1.068830	to double before	-0.124939
-1.360984	an Intel before	-0.124939
-0.597915	temporary array before	-0.124939
-2.207067	the value before	-0.124939
-2.061137	it takes before	-0.124939
-1.185125	virtual table before	-0.124939
-0.597423	misprediction long before	-0.124939
-1.559390	is called before	-0.124939
-1.277445	be called before	-0.124939
-0.827086	usually called before	-0.124939
-1.175502	of times before	-0.124939
-1.064915	the stack before	-0.124939
-0.886538	too big before	-0.124939
-1.337788	for overflow before	-0.124939
-0.885620	signed integers before	-0.124939
-1.732544	double precision before	-0.124939
-0.592706	a check before	-0.124939
-1.530603	is known before	-0.124939
-0.817873	size known before	-0.124939
-0.515884	their values before	-0.124939
-0.515884	desired values before	-0.124939
-0.515884	actual values before	-0.124939
-0.590893	pointer well before	-0.124939
-1.890246	clock cycles before	-0.124939
-1.245554	stamp counter before	-0.124939
-0.590621	clock count before	-0.124939
-1.039822	to signed before	-0.124939
-0.587199	new addition before	-0.124939
-0.535752	size needed before	-0.124939
-0.779270	searching needed before	-0.124939
-1.034721	be read before	-0.124939
-0.859923	it comes before	-0.124939
-0.578502	of temp before	-0.124939
-1.011219	optimal algorithm before	-0.124939
-0.574697	thread priority before	-0.124939
-1.103705	one iteration before	-0.124939
-1.283640	monitor counters before	-0.124939
-0.569834	security reasons before	-0.124939
-0.979931	// Time before	-0.124939
-0.826301	the misprediction before	-0.124939
-0.557289	are resolved before	-0.124939
-0.556722	address again before	-0.124939
-0.549606	of c1 before	-0.124939
-0.549823	of B before	-0.124939
-0.549823	or compilation before	-0.124939
-1.024136	the job before	-0.124939
-0.540343	require cleanup before	-0.124939
-0.065711	function _mm256_zeroupper() before	-0.124939
-0.020818	call _mm256_zeroupper() before	-0.301030
-0.526193	several years before	-0.124939
-0.503978	programming experience before	-0.124939
-0.503978	is evicted before	-0.124939
-0.462692	be freed before	-0.124939
-0.143167	do immediately before	-0.124939
-0.143167	placed immediately before	-0.124939
-0.358185	several stages before	-0.124939
-0.358185	be restored before	-0.124939
-0.358185	particular subtask before	-0.124939
-0.358185	calculate (c+d) before	-0.124939
-0.358185	is checked before	-0.124939
-0.358185	second sub-vector before	-0.124939
-2.092466	that is stored	-0.124939
-1.821628	it is stored	-0.124939
-1.170775	i is stored	-0.124939
-0.891126	variable is stored	-0.425969
-0.739609	result is stored	-0.425969
-1.053935	element is stored	-0.124939
-0.593481	sign is stored	-0.124939
-1.259733	exponent is stored	-0.124939
-0.593481	fraction is stored	-0.124939
-0.600475	distributed and stored	-0.124939
-0.899926	advance and stored	-0.124939
-1.565266	to be stored	-0.425969
-1.880452	can be stored	-0.425969
-1.841800	may be stored	-0.124939
-1.043008	will be stored	-0.249877
-1.026912	should be stored	-0.301030
-1.199095	cannot be stored	-0.425969
-1.597941	preferably be stored	-0.124939
-1.789109	that are stored	-0.124939
-0.486187	function are stored	-0.425969
-0.766140	data are stored	-0.301030
-0.583097	class are stored	-0.124939
-1.506681	objects are stored	-0.124939
-0.533321	variables are stored	-0.301030
-1.334038	elements are stored	-0.124939
-1.596111	parameters are stored	-0.124939
-0.570900	structure are stored	-0.124939
-0.991701	numbers are stored	-0.124939
-0.842858	together are stored	-0.124939
-1.163701	constants are stored	-0.124939
-2.185061	are not stored	-0.124939
-0.899721	is then stored	-0.124939
-2.117553	a pointer stored	-0.124939
-0.573246	are also stored	-0.425969
-1.261402	the objects stored	-0.124939
-0.818477	The objects stored	-0.124939
-0.194877	for objects stored	-0.425969
-1.270876	are always stored	-0.124939
-1.621805	have been stored	-0.124939
-0.591547	for information stored	-0.124939
-1.043426	are typically stored	-0.124939
-1.329742	is never stored	-0.124939
-1.029404	are usually stored	-0.124939
-0.577533	26). Variables stored	-0.124939
-0.143358	16, i.e. stored	-0.425969
-0.998847	not necessarily stored	-0.124939
-3.123536	of the called	-0.124939
-2.721828	to the called	-0.124939
-1.274981	and is called	-0.124939
-1.511055	that is called	-0.124939
-1.903655	it is called	-0.124939
-0.989357	function is called	-0.271067
-1.135048	This is called	-0.124939
-1.131214	functions is called	-0.124939
-1.806933	which is called	-0.124939
-1.131214	one is called	-0.124939
-1.307305	example is called	-0.124939
-1.023387	processors is called	-0.124939
-0.372345	profiler is called	-0.124939
-1.041200	destructor is called	-0.124939
-0.200169	CriticalFunction is called	-0.124939
-2.396252	can be called	-0.124939
-1.747662	may be called	-0.124939
-1.818620	cannot be called	-0.124939
-1.181832	must be called	-0.124939
-0.590776	Will be called	-0.124939
-1.175076	function are called	-0.124939
-1.382182	which are called	-0.124939
-1.271986	class are called	-0.124939
-0.592161	object are called	-0.124939
-0.883513	destructors are called	-0.124939
-0.592161	sum2 are called	-0.124939
-1.841208	member function called	-0.124939
-1.068017	Assume function called	-0.124939
-0.589036	compiler when called	-0.124939
-1.612955	only when called	-0.124939
-0.589036	also when called	-0.124939
-1.847684	of memory called	-0.124939
-1.050439	all functions called	-0.124939
-1.475221	library functions called	-0.124939
-1.726896	is only called	-0.124939
-0.599492	special cache called	-0.124939
-1.158352	is also called	-0.124939
-0.195586	libraries, also called	-0.425969
-0.597905	its variables called	-0.124939
-1.760870	has been called	-0.124939
-0.590461	function was called	-0.124939
-0.587746	a mechanism called	-0.124939
-1.144258	are actually called	-0.124939
-0.799227	is usually called	-0.124939
-0.845706	a feature called	-0.124939
-0.469818	A feature called	-0.124939
-0.669614	test feature called	-0.124939
-0.580550	its functions, called	-0.124939
-0.358701	class (also called	-0.124939
-0.358701	is 83 called	-0.124939
-0.358701	// erroneously called	-0.124939
-2.502538	of the address	-0.124939
-1.732907	to the address	-0.602060
-2.290965	that the address	-0.124939
-2.315023	if the address	-0.124939
-1.366592	make the address	-0.425969
-0.926642	up the address	-0.425969
-0.089477	contains the address	-0.726999
-1.033128	calculate the address	-0.425969
-0.593048	simply the address	-0.124939
-1.688018	unless the address	-0.124939
-1.395694	Here, the address	-0.124939
-1.437802	find the address	-0.124939
-1.052687	containing the address	-0.124939
-0.737396	calculating the address	-0.425969
-1.169142	tells the address	-0.124939
-0.593048	So the address	-0.124939
-0.593048	covered the address	-0.124939
-0.898805	address. The address	-0.124939
-0.898805	here. The address	-0.124939
-1.442532	instructions for address	-0.124939
-2.532610	the function address	-0.124939
-0.600944	different code address	-0.124939
-1.515112	to an address	-0.124939
-0.147125	at an address	-0.522879
-0.899784	If this address	-0.124939
-0.861532	variable from address	-0.124939
-0.580784	bytes from address	-0.124939
-0.580784	again from address	-0.124939
-0.861532	reads from address	-0.124939
-0.647907	a memory address	-0.301030
-0.991226	The memory address	-0.124939
-1.043057	from memory address	-0.124939
-0.194478	particular memory address	-0.124939
-0.555859	arbitrary memory address	-0.124939
-1.354777	stored at address	-0.124939
-0.593416	stack at address	-0.124939
-2.622161	the same address	-0.124939
-1.770645	any other address	-0.124939
-0.897382	calculate each address	-0.124939
-1.782020	the array address	-0.124939
-0.894283	the return address	-0.124939
-0.891174	but these address	-0.124939
-0.888748	if its address	-0.124939
-0.589150	the complicated address	-0.124939
-1.154537	by their address	-0.124939
-1.149014	the runtime address	-0.124939
-1.459116	its own address	-0.124939
-1.303631	a higher address	-0.124939
-0.368807	the target address	-0.124939
-0.554909	The target address	-0.124939
-0.970297	a fixed address	-0.124939
-0.805041	a base address	-0.124939
-0.788051	the larger address	-0.124939
-0.504580	a nearby address	-0.124939
-0.463239	variable whose address	-0.124939
-3.167540	of the 4	-0.124939
-1.676474	pointer is 4	-0.124939
-1.498864	function of 4	-0.124939
-0.600795	Structure of 4	-0.124939
-1.676993	up to 4	-0.124939
-0.376966	first // 4	-0.124939
-1.172713	b; // 4	-0.124939
-1.055416	d; // 4	-0.124939
-1.638025	bytes = 4	-0.124939
-1.066822	out by 4	-0.124939
-0.894811	align by 4	-0.124939
-1.075666	3 - 4	-0.124939
-0.891818	= int 4	-0.124939
-0.891818	or int 4	-0.124939
-0.896702	= double 4	-0.124939
-1.158084	= float 4	-0.124939
-0.586166	8 float 4	-0.124939
-1.682787	are also 4	-0.124939
-0.597978	8 * 4	-0.124939
-0.867636	double takes 4	-0.124939
-0.583971	Multiplication takes 4	-0.124939
-0.769003	float 4 4	-0.124939
-0.529862	unsigned 4 4	-0.124939
-0.529862	mode 4 4	-0.124939
-0.529862	_mm_permutevar_ps 4 4	-0.124939
-0.529862	_mm256_permutevar_ps 4 4	-0.124939
-1.184406	or unsigned 4	-0.124939
-0.882183	double 64 4	-0.124939
-0.957301	long 64 4	-0.124939
-0.882183	4 64 4	-0.124939
-0.348185	8 64 4	-0.124939
-1.006464	execution time. 4	-0.124939
-0.576444	computation time. 4	-0.124939
-1.091381	int 16 4	-0.124939
-0.551255	Iu8vec8 16 4	-0.124939
-0.551255	Is16vec4 16 4	-0.124939
-0.965745	int 32 4	-0.124939
-0.849120	float 32 4	-0.124939
-0.511158	4 32 4	-0.124939
-0.511158	Vec8us 32 4	-0.124939
-0.511158	Vec4i 32 4	-0.124939
-0.592554	8192 / 4	-0.124939
-1.048069	32-bit mode 4	-0.124939
-0.877610	32 sets 4	-0.124939
-0.232800	the Pentium 4	-0.124939
-0.136660	a Pentium 4	-0.124939
-0.337186	The Pentium 4	-0.124939
-0.232800	on Pentium 4	-0.124939
-0.232800	while Pentium 4	-0.124939
-1.127663	1, 2, 4	-0.124939
-1.017646	SSE Store 4	-0.124939
-0.562811	of procedure 4	-0.124939
-0.255843	_mm256_i32gather_epi32 unlimited 4	-0.124939
-0.255843	_mm_i32gather_ps unlimited 4	-0.124939
-0.255843	_mm_i32gather_epi32 unlimited 4	-0.124939
-0.255843	_mm256_i32gather_ps unlimited 4	-0.124939
-0.817184	= int64_t 4	-0.124939
-0.817184	a factor 4	-0.124939
-0.540525	....................................................................................... 22 4	-0.124939
-0.129365	assembly: ALIGN 4	-0.425969
-0.763387	parameter 1: 4	-0.124939
-0.504379	optimizing ............................................................................................... 4	-0.124939
-0.463057	least recently 4	-0.124939
-0.463057	8192 bytes, 4	-0.124939
-0.358471	only _mm_permutevar_ps 4	-0.124939
-0.358471	AVX _mm256_permutevar_ps 4	-0.124939
-0.600459	syntax or See	-0.124939
-1.705423	the code. See	-0.124939
-1.263345	member function. See	-0.124939
-0.592301	than functions. See	-0.124939
-0.942622	of memory. See	-0.124939
-0.551617	contiguous memory. See	-0.124939
-0.590323	main program. See	-0.124939
-0.734839	are used. See	-0.425969
-0.871936	Gnu compilers. See	-0.124939
-1.031390	VIA processors. See	-0.124939
-0.580795	across platforms. See	-0.124939
-0.580990	simplest cases. See	-0.124939
-1.125357	or 1. See	-0.124939
-0.579309	static variables. See	-0.124939
-0.576508	sleep mode. See	-0.124939
-0.576270	code optimization. See	-0.124939
-1.282280	if possible. See	-0.124939
-0.845214	inefficient way. See	-0.124939
-0.569431	Intel CPU. See	-0.124939
-1.329692	or not. See	-0.124939
-0.569736	possible version. See	-0.124939
-0.833959	of order. See	-0.124939
-1.275489	memory allocation. See	-0.124939
-0.566288	if available. See	-0.124939
-0.565952	integer expressions. See	-0.124939
-0.565785	of storage. See	-0.124939
-0.513456	operating system. See	-0.425969
-0.825699	control branch. See	-0.124939
-0.816457	interprocedural optimizations. See	-0.124939
-1.109405	assembly language. See	-0.124939
-0.803560	avoid this. See	-0.124939
-1.024394	not overlap. See	-0.124939
-0.995913	exception handling. See	-0.124939
-0.539936	disk files. See	-0.124939
-0.914428	do so. See	-0.124939
-0.540221	five manuals. See	-0.124939
-0.310857	memory pool. See	-0.124939
-0.129279	CPU dispatcher. See	-0.425969
-0.762405	not cached. See	-0.124939
-0.882219	pointer aliasing. See	-0.124939
-0.503838	less compact. See	-0.124939
-0.503838	such errors. See	-0.124939
-0.503838	part takes. See	-0.124939
-0.504275	from exceptions. See	-0.124939
-0.724814	not occur. See	-0.124939
-0.203828	each other. See	-0.425969
-0.462564	optimally aligned. See	-0.124939
-0.658179	is required. See	-0.124939
-0.462564	prediction mechanism. See	-0.124939
-0.462564	legal issue. See	-0.124939
-0.462564	long delay. See	-0.124939
-0.462564	STL containers. See	-0.124939
-0.462564	the alignment. See	-0.124939
-0.658179	cache contentions. See	-0.124939
-0.462564	will crash. See	-0.124939
-0.358084	is incremented. See	-0.124939
-0.358084	I die. See	-0.124939
-0.358084	are doing. See	-0.124939
-0.358084	is requested. See	-0.124939
-0.358084	from Intel. See	-0.124939
-0.358084	Mac OS. See	-0.124939
-0.358084	directive __declspec(cpu_dispatch(...)). See	-0.124939
-0.358084	code motion. See	-0.124939
-0.358084	is obvious. See	-0.124939
-0.358084	identification (RTTI). See	-0.124939
-0.358084	| 0x8040); See	-0.124939
-1.637701	of the critical	-0.477121
-1.886173	to the critical	-0.124939
-2.081787	and the critical	-0.124939
-1.751208	in the critical	-0.823909
-2.168342	for the critical	-0.124939
-2.194957	that the critical	-0.124939
-1.892763	if the critical	-0.124939
-2.071401	by the critical	-0.124939
-1.752327	time the critical	-0.124939
-1.957538	then the critical	-0.124939
-1.989207	because the critical	-0.124939
-2.023316	If the critical	-0.124939
-1.842600	where the critical	-0.124939
-0.592111	calls the critical	-0.425969
-1.824989	inside the critical	-0.124939
-1.532465	outside the critical	-0.124939
-1.242287	When the critical	-0.124939
-1.242287	includes the critical	-0.124939
-0.878631	executing the critical	-0.124939
-0.878631	identify the critical	-0.124939
-0.589659	distance the critical	-0.124939
-2.299218	code is critical	-0.124939
-1.928632	of a critical	-0.124939
-2.573953	in a critical	-0.124939
-1.285091	makes a critical	-0.124939
-0.598825	executing a critical	-0.124939
-0.601702	advices in critical	-0.124939
-0.896089	ways. The critical	-0.124939
-0.896089	lines. The critical	-0.124939
-0.896089	each. The critical	-0.124939
-0.601505	recommended for critical	-0.124939
-1.071705	cache are critical	-0.124939
-0.599554	access are critical	-0.124939
-0.601160	important or critical	-0.124939
-0.600289	tasks. A critical	-0.124939
-2.632812	the same critical	-0.124939
-0.586793	the most critical	-0.522879
-1.375185	The most critical	-0.124939
-0.597246	Optimizing less critical	-0.124939
-1.714525	by making critical	-0.124939
-1.196486	are particularly critical	-0.124939
-0.129431	13 Making critical	-0.425969
-0.204042	// Call critical	-0.425969
-0.358773	processor activates critical	-0.124939
-1.293089	Use the call	-0.124939
-1.659744	whether the call	-0.124939
-1.370308	replace the call	-0.124939
-1.073212	inlining the call	-0.124939
-1.073212	overlap the call	-0.124939
-0.600061	removing the call	-0.124939
-2.011925	that a call	-0.124939
-0.600959	across a call	-0.124939
-1.599404	overhead of call	-0.124939
-1.635280	code to call	-0.124939
-1.489028	have to call	-0.425969
-1.657662	time to call	-0.124939
-1.012867	takes to call	-0.301030
-1.867476	need to call	-0.124939
-1.924923	want to call	-0.124939
-1.667003	needs to call	-0.124939
-0.883692	destructor to call	-0.124939
-0.592253	handler to call	-0.124939
-0.592253	supposed to call	-0.124939
-1.074219	compiler and call	-0.124939
-0.600399	F2 and call	-0.124939
-2.011711	that can call	-0.124939
-2.044306	it can call	-0.124939
-1.571853	It can call	-0.124939
-1.194571	matrix // call	-0.124939
-0.598854	"; // call	-0.124939
-1.912040	the function call	-0.124939
-1.380428	a function call	-0.124939
-1.377844	The function call	-0.124939
-0.990953	or function call	-0.124939
-0.649798	each function call	-0.124939
-1.372861	virtual function call	-0.124939
-0.842327	intrinsic function call	-0.124939
-0.570617	Each function call	-0.124939
-1.090061	dispatched function call	-0.124939
-1.295929	should not call	-0.124939
-2.054538	you may call	-0.124939
-0.594217	address. A call	-0.124939
-0.594217	driver. A call	-0.124939
-1.073843	I will call	-0.124939
-1.474174	and then call	-0.124939
-0.581837	support then call	-0.124939
-0.581837	dispatching then call	-0.124939
-0.581837	support, then call	-0.124939
-1.273099	than one call	-0.124939
-1.143732	only one call	-0.425969
-0.897484	because each call	-0.124939
-0.895039	make any call	-0.124939
-1.393874	the first call	-0.124939
-0.348682	on first call	-0.124939
-0.188069	After first call	-0.425969
-0.890516	a system call	-0.124939
-0.705520	that doesn't call	-0.425969
-1.161526	a single call	-0.425969
-0.360207	on every call	-0.124939
-0.867324	other modules call	-0.124939
-0.827941	// Virtual call	-0.124939
-0.817905	// Now call	-0.124939
-0.096540	i = 0;	-0.970037
-1.141932	c = 0;	-0.124939
-0.053962	(i = 0;	-1.539912
-1.011815	d = 0;	-0.124939
-0.297365	sum = 0;	-0.301030
-0.666764	list[i] = 0;	-0.124939
-0.468017	seconds = 0;	-0.124939
-0.078262	(c = 0;	-0.726999
-0.468017	sum2 = 0;	-0.124939
-0.446484	(r = 0;	-0.425969
-0.321545	(x = 0;	-0.425969
-0.666764	largest_index = 0;	-0.124939
-0.666764	largest_abs = 0;	-0.124939
-0.468017	(j = 0;	-0.124939
-0.468017	(c1 = 0;	-0.124939
-0.468017	(row = 0;	-0.124939
-0.468017	(r1 = 0;	-0.124939
-0.468017	(column = 0;	-0.124939
-0.468017	list[300] = 0;	-0.124939
-1.624693	{ return 0;	-0.124939
-0.197093	... return 0;	-0.425969
-0.587406	i > 0;	-0.124939
-1.018178	i >= 0;	-0.124939
-0.550847	z != 0;	-0.124939
-0.902409	Today, the 8	-0.124939
-1.553073	cache is 8	-0.124939
-0.379071	cache of 8	-0.124939
-1.363409	integers of 8	-0.124939
-0.599214	double's of 8	-0.124939
-1.196619	integer and 8	-0.124939
-1.543259	systems and 8	-0.124939
-1.765923	would be 8	-0.124939
-0.598693	bytes // 8	-0.124939
-1.190808	b; // 8	-0.124939
-0.600920	sizeof(float)) = 8	-0.124939
-1.359617	aligned by 8	-0.124939
-1.874524	divisible by 8	-0.124939
-0.900706	4 - 8	-0.124939
-0.891739	= int 8	-0.124939
-0.891739	or int 8	-0.124939
-1.997453	on page 8	-0.124939
-0.598800	4 double 8	-0.124939
-1.193114	= float 8	-0.124939
-0.597913	10 * 8	-0.124939
-0.895098	double takes 8	-0.124939
-0.597436	nothing between 8	-0.124939
-0.513428	double 8 8	-0.124939
-0.513428	unsigned 8 8	-0.124939
-0.513428	mode 8 8	-0.124939
-0.972242	char 8 8	-0.124939
-0.513428	Agner 8 8	-0.124939
-0.513428	Is8vec8 8 8	-0.124939
-1.184278	or unsigned 8	-0.124939
-1.010508	double 64 8	-0.124939
-1.114774	long 64 8	-0.124939
-0.892687	or 16 8	-0.124939
-1.023664	int 16 8	-0.124939
-0.530624	Vec8s 16 8	-0.124939
-0.530624	Vec16uc 16 8	-0.124939
-0.965640	int 32 8	-0.124939
-0.849040	float 32 8	-0.124939
-0.511121	2 32 8	-0.124939
-0.849040	8 32 8	-0.124939
-0.511121	16 32 8	-0.124939
-1.451243	the constant 8	-0.124939
-0.592471	kb / 8	-0.124939
-1.321977	the structure 8	-0.124939
-1.582997	64-bit mode 8	-0.124939
-0.410125	set char 8	-0.124939
-0.695869	unsigned char 8	-0.124939
-0.410125	SSE2 char 8	-0.124939
-0.410125	MMX char 8	-0.124939
-0.410125	stdint.h char 8	-0.124939
-1.366544	the last 8	-0.124939
-1.234735	INSTRSET == 8	-0.124939
-0.841467	SSE2 Store 8	-0.124939
-0.450208	namespaces. 65 8	-0.124939
-0.450208	Namespaces........................................................................................................... 65 8	-0.124939
-0.255832	_mm256_i64gather_pd unlimited 8	-0.124939
-0.255832	_mm_i64gather_pd unlimited 8	-0.124939
-0.255832	_mm256_i64gather_epi32 unlimited 8	-0.124939
-0.255832	_mm_i64gather_epi32 unlimited 8	-0.124939
-0.826890	the lower 8	-0.124939
-0.817269	1 eax, 8	-0.124939
-1.046670	can hold 8	-0.124939
-0.550120	class, Agner 8	-0.124939
-0.763242	parameter 1: 8	-0.124939
-0.358414	language ............................................................................... 8	-0.124939
-0.358414	Is8vec16 Vec16c 8	-0.124939
-0.358414	128 Vec2uq 8	-0.124939
-0.358414	64 Is8vec8 8	-0.124939
-0.358414	512 kb, 8	-0.124939
-0.358414	64 I64vec1 8	-0.124939
-2.339804	it is less	-0.124939
-2.122941	function is less	-0.124939
-2.234327	This is less	-0.124939
-1.765438	compiler is less	-0.124939
-2.409106	It is less	-0.124939
-1.926774	which is less	-0.124939
-0.498263	but is less	-0.124939
-1.508727	class is less	-0.124939
-1.421250	array is less	-0.124939
-1.382763	value is less	-0.124939
-0.880547	option is less	-0.124939
-1.246726	list is less	-0.124939
-0.880547	numbers is less	-0.124939
-1.045791	delay is less	-0.124939
-0.590642	μs is less	-0.124939
-0.590642	bitfield is less	-0.124939
-0.601342	around and less	-0.124939
-0.601327	sufficient for less	-0.124939
-1.767434	to be less	-0.425969
-2.508528	can be less	-0.124939
-2.014986	will be less	-0.124939
-0.594150	fact be less	-0.124939
-1.721528	functions are less	-0.124939
-1.039432	cache are less	-0.124939
-1.606244	libraries are less	-0.124939
-0.588402	bits are less	-0.124939
-1.370994	instructions are less	-0.124939
-0.588402	expressions are less	-0.124939
-0.588402	implementations are less	-0.124939
-0.876190	requirements are less	-0.124939
-0.600897	50% or less	-0.124939
-1.596853	makes it less	-0.124939
-2.640147	the code less	-0.124939
-2.569652	is not less	-0.124939
-0.600410	applications have less	-0.124939
-0.899095	run at less	-0.124939
-2.415442	the program less	-0.124939
-0.599480	several other less	-0.124939
-1.194354	functions, but less	-0.124939
-1.479380	but also less	-0.124939
-0.596779	one register less	-0.124939
-1.272754	member pointers less	-0.124939
-0.595017	million times less	-0.124939
-0.594428	files while less	-0.124939
-0.831016	and much less	-0.124939
-0.564530	have much less	-0.124939
-1.186318	cache works less	-0.124939
-0.880082	or write less	-0.124939
-0.590165	brand was less	-0.124939
-0.313665	data caching less	-0.823909
-0.558056	makes caching less	-0.124939
-1.319017	that allows less	-0.124939
-0.587862	relative difference less	-0.124939
-1.127061	has become less	-0.124939
-0.580250	compilers produce less	-0.124939
-0.860192	make vectorization less	-0.124939
-0.575174	time. Optimizing less	-0.124939
-0.575243	available, though less	-0.124939
-0.841003	as input less	-0.124939
-0.681004	is slightly less	-0.124939
-0.681004	only slightly less	-0.124939
-0.358853	although slightly less	-0.124939
-0.463111	works somewhat less	-0.124939
-0.358514	may neverthe- less	-0.124939
-0.900830	counter // For	-0.124939
-1.398247	the code. For	-0.124939
-0.557692	this code. For	-0.124939
-0.557692	makes code. For	-0.124939
-1.042856	compile time. For	-0.124939
-0.593046	comparisons, etc. For	-0.124939
-1.526138	is used. For	-0.124939
-0.880294	header files For	-0.124939
-0.862131	many cases. For	-0.124939
-0.714979	of resources. For	-0.124939
-0.497899	ample resources. For	-0.124939
-0.853818	the variable. For	-0.124939
-0.576401	for optimization. For	-0.124939
-0.572274	Boolean vector. For	-0.124939
-0.677635	CPU dispatching. For	-0.425969
-0.463107	every version. For	-0.124939
-0.463107	intermediate version. For	-0.124939
-0.450091	pointed to. For	-0.124939
-0.450091	refers to. For	-0.124939
-0.776167	point expressions. For	-0.124939
-0.449857	algebraic expressions. For	-0.124939
-0.833876	cause overflow. For	-0.124939
-1.248355	execution units. For	-0.124939
-0.561780	of software. For	-0.124939
-0.556895	vectorized automatically. For	-0.124939
-0.556703	each core. For	-0.124939
-0.804196	a structure. For	-0.124939
-0.540067	specific profiler. For	-0.124939
-0.504196	one operation. For	-0.124939
-0.504196	shift operation. For	-0.124939
-0.997027	unroll factor. For	-0.124939
-0.526172	elements are. For	-0.124939
-0.526172	case conditions. For	-0.124939
-0.526172	in efficiency. For	-0.124939
-0.526172	different tasks. For	-0.124939
-0.762623	data structures. For	-0.124939
-0.314241	more constants. For	-0.124939
-0.314241	defining constants. For	-0.124939
-0.833594	operating systems". For	-0.124939
-0.503958	Preprocessor directives. For	-0.124939
-0.503958	mixed sizes. For	-0.124939
-0.833594	table lookup. For	-0.124939
-0.203855	is valid. For	-0.124939
-0.503958	than post-increment. For	-0.124939
-0.658351	clock frequency. For	-0.124939
-0.462674	of jobs. For	-0.124939
-0.462674	a subexpression. For	-0.124939
-0.462674	is supported. For	-0.124939
-0.658351	in question. For	-0.124939
-0.462674	easy development. For	-0.124939
-0.658351	mathematical purity. For	-0.124939
-0.462674	of sources. For	-0.124939
-0.658351	fast enough. For	-0.124939
-0.658351	the fraction. For	-0.124939
-0.658351	poorly predictable. For	-0.124939
-0.462674	execution unit. For	-0.124939
-0.462674	a matrix. For	-0.124939
-0.358170	of algebra. For	-0.124939
-0.358170	it exits. For	-0.124939
-0.358170	be combined. For	-0.124939
-0.358170	of modularity. For	-0.124939
-0.358170	simply identical. For	-0.124939
-0.358170	algebraic reduction. For	-0.124939
-0.358170	means modulo. For	-0.124939
-0.358170	eliminated completely. For	-0.124939
-0.358170	error reporting. For	-0.124939
-0.358170	is minimized. For	-0.124939
-0.594032	used, for example,	-0.124939
-0.594032	be, for example,	-0.124939
-0.594032	decimals, for example,	-0.124939
-0.594032	Assume, for example,	-0.124939
-0.594032	If, for example,	-0.124939
-0.594032	12.3a, for example,	-0.124939
-0.387293	In this example,	-0.301030
-0.122525	code. For example,	-0.124939
-0.199784	time. For example,	-0.124939
-0.121323	used. For example,	-0.124939
-0.121323	files For example,	-0.124939
-0.199784	resources. For example,	-0.124939
-0.121323	variable. For example,	-0.124939
-0.121323	optimization. For example,	-0.124939
-0.121323	vector. For example,	-0.124939
-0.056439	dispatching. For example,	-0.124939
-0.056439	expressions. For example,	-0.124939
-0.121323	overflow. For example,	-0.124939
-0.121323	units. For example,	-0.124939
-0.121323	automatically. For example,	-0.124939
-0.121323	core. For example,	-0.124939
-0.199784	operation. For example,	-0.124939
-0.121323	factor. For example,	-0.124939
-0.121323	are. For example,	-0.124939
-0.121323	conditions. For example,	-0.124939
-0.121323	efficiency. For example,	-0.124939
-0.121323	tasks. For example,	-0.124939
-0.121323	structures. For example,	-0.124939
-0.056439	constants. For example,	-0.124939
-0.199784	valid. For example,	-0.124939
-0.121323	post-increment. For example,	-0.124939
-0.121323	frequency. For example,	-0.124939
-0.121323	jobs. For example,	-0.124939
-0.121323	subexpression. For example,	-0.124939
-0.121323	supported. For example,	-0.124939
-0.121323	question. For example,	-0.124939
-0.121323	development. For example,	-0.124939
-0.121323	purity. For example,	-0.124939
-0.121323	sources. For example,	-0.124939
-0.121323	enough. For example,	-0.124939
-0.121323	fraction. For example,	-0.124939
-0.121323	unit. For example,	-0.124939
-0.121323	matrix. For example,	-0.124939
-0.121323	algebra. For example,	-0.124939
-0.121323	exits. For example,	-0.124939
-0.121323	modularity. For example,	-0.124939
-0.121323	identical. For example,	-0.124939
-0.121323	reduction. For example,	-0.124939
-0.121323	modulo. For example,	-0.124939
-0.121323	reporting. For example,	-0.124939
-0.121323	minimized. For example,	-0.124939
-1.920159	the following example,	-0.124939
-0.564246	the above example,	-0.425969
-1.509439	the preceding example,	-0.124939
-0.172672	12.4c. Same example,	-0.124939
-0.172672	12.4e. Same example,	-0.124939
-0.172672	12.4d. Same example,	-0.124939
-0.463604	very contrived example,	-0.124939
-2.698091	that the bit	-0.124939
-2.297149	use the bit	-0.124939
-1.855090	of this bit	-0.124939
-0.841934	if each bit	-0.124939
-0.570406	using each bit	-0.124939
-1.161769	where each bit	-0.124939
-0.570406	next each bit	-0.124939
-0.598255	Contains many bit	-0.124939
-0.597142	between 8 bit	-0.124939
-0.047686	in 64 bit	-0.204120
-0.600207	The 64 bit	-0.124939
-0.424732	code 64 bit	-0.124939
-0.424732	Gnu 64 bit	-0.124939
-0.424732	efficient. 64 bit	-0.124939
-0.424732	_WIN64 64 bit	-0.124939
-0.424732	-fno-pic). 64 bit	-0.124939
-0.595866	identification 16 bit	-0.124939
-0.440532	to 32 bit	-0.124939
-0.440532	and 32 bit	-0.124939
-0.103725	in 32 bit	-0.124939
-0.440532	directives 32 bit	-0.124939
-0.440532	over 32 bit	-0.124939
-0.440532	161 32 bit	-0.124939
-0.440532	80386 32 bit	-0.124939
-1.856700	a single bit	-0.124939
-1.449379	a small bit	-0.124939
-0.108141	a 128 bit	-0.602060
-0.470146	SSE2 128 bit	-0.124939
-0.470146	SSE 128 bit	-0.124939
-0.194831	the sign bit	-0.380211
-0.191627	// sign bit	-0.124939
-0.191627	with sign bit	-0.124939
-0.085329	set sign bit	-0.124939
-0.191627	test sign bit	-0.124939
-0.191627	down sign bit	-0.124939
-0.191627	Set sign bit	-0.124939
-0.191627	flip sign bit	-0.124939
-0.541024	AVX 256 bit	-0.124939
-0.541024	AVX2 256 bit	-0.124939
-0.835558	a slow bit	-0.124939
-0.504877	with slow bit	-0.124939
-0.841503	least significant bit	-0.124939
-0.482894	the carry bit	-0.124939
-0.308744	The carry bit	-0.124939
-0.274621	in 32- bit	-0.124939
-0.274621	The 32- bit	-0.124939
-0.274621	A 32- bit	-0.124939
-0.726317	set (128 bit	-0.124939
-0.463385	subtraction, comparison, bit	-0.124939
-0.358729	two 128- bit	-0.124939
-2.002758	of the operating	-0.726999
-1.944864	to the operating	-0.124939
-1.225887	and the operating	-0.425969
-2.488502	in the operating	-0.124939
-2.325315	that the operating	-0.124939
-1.320996	by the operating	-0.249877
-2.154229	with the operating	-0.124939
-1.001247	between the operating	-0.124939
-1.173101	tells the operating	-0.124939
-0.887303	force the operating	-0.124939
-1.047566	Choice of operating	-0.124939
-0.355761	compilers and operating	-0.301030
-0.201835	CPUs and operating	-0.124939
-0.590751	microprocessors and operating	-0.124939
-0.590751	platform and operating	-0.124939
-1.160552	platforms and operating	-0.124939
-1.193359	used. The operating	-0.124939
-1.567182	cache. The operating	-0.124939
-0.598443	databases. The operating	-0.124939
-1.601138	have an operating	-0.124939
-0.892118	without an operating	-0.124939
-1.840119	a different operating	-0.124939
-2.241913	is no operating	-0.124939
-0.896140	to multiple operating	-0.124939
-1.703863	the two operating	-0.124939
-1.536962	and 64-bit operating	-0.124939
-1.809394	in 64-bit operating	-0.124939
-1.100308	for 64-bit operating	-0.124939
-1.187655	and some operating	-0.124939
-1.256857	in 32-bit operating	-0.425969
-0.596079	though these operating	-0.124939
-1.056771	the Windows operating	-0.124939
-0.593407	and Linux operating	-0.124939
-0.591245	query certain operating	-0.124939
-0.589740	or Mac operating	-0.124939
-1.164366	the old operating	-0.124939
-0.523379	on old operating	-0.124939
-0.841198	both compiler, operating	-0.124939
-0.570076	and current operating	-0.124939
-0.513884	OS X operating	-0.124939
-0.998284	programming languages, operating	-0.124939
-0.129428	a protected operating	-0.124939
-0.504600	Unfortunately, contemporary operating	-0.124939
-0.463257	etc.). Older operating	-0.124939
-0.463257	Clang Supported operating	-0.124939
-0.358629	or circumvent operating	-0.124939
-0.902397	convert the unsigned	-0.124939
-0.902151	dividend is unsigned	-0.124939
-1.298934	Conversion of unsigned	-0.124939
-1.536487	a to unsigned	-0.124939
-1.192686	i to unsigned	-0.124939
-0.599173	dividend to unsigned	-0.124939
-0.599173	Convert to unsigned	-0.124939
-0.070214	signed and unsigned	-0.346788
-0.595864	Signed and unsigned	-0.124939
-1.076734	bits. The unsigned	-0.124939
-0.185699	signed or unsigned	-0.124939
-0.485867	faster if unsigned	-0.249877
-0.894135	than with unsigned	-0.124939
-0.597559	signed with unsigned	-0.124939
-1.204950	of an unsigned	-0.124939
-1.655498	as an unsigned	-0.124939
-0.588114	i an unsigned	-0.124939
-0.596338	2 int unsigned	-0.124939
-1.062235	long int unsigned	-0.124939
-0.590587	Sdouble { unsigned	-0.124939
-0.590587	Slongdouble { unsigned	-0.124939
-0.590587	Sfloat { unsigned	-0.124939
-0.600271	(2) use unsigned	-0.124939
-1.159242	64 4 unsigned	-0.124939
-0.968740	16 4 unsigned	-0.124939
-1.159242	32 4 unsigned	-0.124939
-1.256349	8 8 unsigned	-0.124939
-1.117753	16 8 unsigned	-0.124939
-1.059831	8 16 unsigned	-0.124939
-0.368273	fractional part unsigned	-0.425969
-0.592451	Signed / unsigned	-0.124939
-0.588419	int 256 unsigned	-0.124939
-0.578732	{double d; unsigned	-0.124939
-0.575133	(double x, unsigned	-0.124939
-1.002716	to convert unsigned	-0.124939
-1.474247	float f; unsigned	-0.124939
-0.366451	16-bit systems: unsigned	-0.124939
-0.549940	and normal unsigned	-0.124939
-0.540605	Signed versus unsigned	-0.124939
-1.040952	= 1000; unsigned	-0.124939
-0.725548	64-bit Linux: unsigned	-0.124939
-0.504569	Example 7.7 unsigned	-0.124939
-0.504569	Example 7.25 unsigned	-0.124939
-0.658808	MS compiler: unsigned	-0.124939
-0.462965	part 142 unsigned	-0.124939
-0.358400	256 Vec32c unsigned	-0.124939
-0.358400	u[2]} a[size]; unsigned	-0.124939
-0.358400	<typename T, unsigned	-0.124939
-0.358400	+ 0x3FF unsigned	-0.124939
-0.358400	+ 0x3FFF unsigned	-0.124939
-0.358400	Example 14.22b unsigned	-0.124939
-0.358400	Example 14.22a unsigned	-0.124939
-0.358400	65535 uint16_t unsigned	-0.124939
-0.358400	255 uint8_t unsigned	-0.124939
-0.358400	+ 0x7F unsigned	-0.124939
-0.358400	232-1 uint32_t unsigned	-0.124939
-2.093316	is the first	-0.124939
-2.068380	of the first	-0.124939
-2.232904	to the first	-0.124939
-2.367282	in the first	-0.124939
-1.830094	for the first	-0.124939
-2.222589	that the first	-0.124939
-0.590709	it the first	-0.124939
-1.905633	if the first	-0.425969
-2.136643	on the first	-0.124939
-1.976439	then the first	-0.124939
-1.596164	only the first	-0.124939
-2.044721	If the first	-0.124939
-1.712070	but the first	-0.124939
-0.919636	before the first	-0.249877
-1.509224	example, the first	-0.124939
-1.170920	times the first	-0.124939
-0.880677	calculated the first	-0.124939
-0.880677	optimizing the first	-0.124939
-0.590709	Windows, the first	-0.124939
-1.421702	find the first	-0.124939
-0.590709	Linux, the first	-0.124939
-1.472855	until the first	-0.124939
-0.880677	adding the first	-0.124939
-1.257305	within the first	-0.124939
-0.590709	way, the first	-0.124939
-0.590709	localize the first	-0.124939
-1.548687	faster to first	-0.124939
-1.925164	necessary to first	-0.124939
-0.870542	registers The first	-0.124939
-0.585481	algorithm The first	-0.124939
-1.031231	registers. The first	-0.124939
-1.141295	mode. The first	-0.124939
-0.870542	possible. The first	-0.124939
-1.031231	way. The first	-0.124939
-0.870542	ways. The first	-0.124939
-1.031231	diagonal. The first	-0.124939
-0.585481	performance: The first	-0.124939
-0.585481	vector). The first	-0.124939
-0.585481	further. The first	-0.124939
-0.585481	follows. The first	-0.124939
-1.201647	files are first	-0.124939
-0.601204	expression, or first	-0.124939
-0.887570	dispatching on first	-0.124939
-0.887570	dispatch on first	-0.124939
-0.887570	Dispatch on first	-0.124939
-0.600579	read this first	-0.124939
-1.199933	called only first	-0.124939
-1.592221	example shows first	-0.124939
-0.580641	parameter comes first	-0.124939
-0.078739	2 bytes. first	-0.124939
-0.050876	4 bytes. first	-0.301030
-0.078739	8 bytes. first	-0.124939
-0.174986	400 bytes. first	-0.124939
-0.504861	the 49 first	-0.124939
-0.107171	// After first	-0.425969
-0.358816	is reflected, first	-0.124939
-2.973960	of the register	-0.124939
-2.297558	because the register	-0.124939
-1.999517	using the register	-0.124939
-1.074172	way the register	-0.124939
-1.628596	without the register	-0.124939
-2.344578	is a register	-0.124939
-1.521665	in a register	-0.124939
-1.938858	be a register	-0.124939
-1.585602	as a register	-0.124939
-0.594393	temp a register	-0.124939
-0.887889	setting a register	-0.124939
-1.878971	use of register	-0.124939
-2.119383	value of register	-0.124939
-1.362884	explanation of register	-0.124939
-0.598539	opposite of register	-0.124939
-0.896076	capable of register	-0.124939
-0.895846	compilers The register	-0.124939
-1.566992	cache. The register	-0.124939
-1.189751	variable. The register	-0.124939
-1.479801	function for register	-0.124939
-0.598347	most for register	-0.124939
-0.598347	candidates for register	-0.124939
-0.600105	memory. A register	-0.124939
-1.291263	benefit from register	-0.124939
-1.515806	the vector register	-0.124939
-1.546221	a vector register	-0.124939
-1.119623	of vector register	-0.124939
-1.086490	each vector register	-0.124939
-0.559211	another vector register	-0.124939
-0.821243	256-bit vector register	-0.124939
-0.559211	largest vector register	-0.124939
-2.525100	to make register	-0.124939
-1.496663	the same register	-0.425969
-1.419075	The same register	-0.124939
-1.259189	floating point register	-0.301030
-0.897956	using one register	-0.124939
-0.897662	of integer register	-0.124939
-0.598538	c[size]; float register	-0.124939
-1.963640	is called register	-0.124939
-1.234680	a new register	-0.425969
-0.595303	largest available register	-0.124939
-0.594827	rules about register	-0.124939
-0.844143	an extra register	-0.124939
-1.854511	a single register	-0.124939
-1.656129	to optimize register	-0.124939
-1.225000	128-bit XMM register	-0.124939
-1.003306	the logical register	-0.124939
-1.063676	a temporary register	-0.124939
-0.827680	the YMM register	-0.124939
-0.557285	one free register	-0.124939
-0.964330	to fourteen register	-0.124939
-0.540849	new physical register	-0.124939
-0.504560	the flags register	-0.124939
-2.326938	with a 64	-0.124939
-2.128393	size of 64	-0.124939
-1.367974	integers of 64	-0.124939
-1.290809	vectors of 64	-0.124939
-1.077696	extended to 64	-0.124939
-1.543345	systems and 64	-0.124939
-0.899348	32 and 64	-0.124939
-1.665161	objects in 64	-0.124939
-1.067357	calculation in 64	-0.124939
-0.596108	longer in 64	-0.124939
-1.180786	references in 64	-0.124939
-0.891266	-fpic in 64	-0.124939
-1.067357	seen in 64	-0.124939
-1.431440	systems. The 64	-0.124939
-0.599779	expected. The 64	-0.124939
-2.914919	can be 64	-0.124939
-1.440829	which are 64	-0.124939
-0.900837	8 = 64	-0.124939
-0.900620	represented with 64	-0.124939
-0.900649	bit code 64	-0.124939
-0.600871	8 - 64	-0.124939
-1.038391	long int 64	-0.124939
-1.824460	unsigned int 64	-0.124939
-1.467366	short int 64	-0.124939
-0.600290	entries use 64	-0.124939
-0.577948	AVX double 64	-0.124939
-0.577948	SSE double 64	-0.124939
-0.577948	AVX512 double 64	-0.124939
-0.893801	32 2 64	-0.124939
-0.463891	long long 64	-0.124939
-1.159326	64 4 64	-0.124939
-0.968808	16 4 64	-0.124939
-1.159326	32 4 64	-0.124939
-1.179695	8 8 64	-0.124939
-0.577296	32 8 64	-0.425969
-0.542637	a 64 64	-0.124939
-0.791404	The 64 64	-0.124939
-0.542637	11.6 64 64	-0.124939
-0.542637	9.6b 64 64	-0.124939
-0.595712	Is32vec2 32 64	-0.124939
-0.594900	Asmlib Gnu 64	-0.124939
-0.592455	double uses 64	-0.124939
-0.881770	64 1 64	-0.124939
-0.624051	is typically 64	-0.124939
-1.365940	more efficient. 64	-0.124939
-0.535959	8 char 64	-0.124939
-0.984644	unsigned char 64	-0.124939
-1.389015	the entire 64	-0.124939
-0.556918	1 int64_t 64	-0.124939
-0.787869	not _WIN64 64	-0.124939
-0.526552	another exception. 64	-0.124939
-0.526770	Intel: "Intel 64	-0.124939
-0.463002	data exceeds 64	-0.124939
-0.463002	compiler: __int64 64	-0.124939
-0.358428	(option -fno-pic). 64	-0.124939
-0.358428	64 Iu32vec2 64	-0.124939
-0.358428	line covers 64	-0.124939
-0.358428	31 11.6 64	-0.124939
-0.358428	Example 9.6b 64	-0.124939
-0.358428	I64vec2 Vec2q 64	-0.124939
-0.358428	Iu32vec4 Vec4ui 64	-0.124939
-2.026898	have to take	-0.124939
-1.775513	has to take	-0.124939
-0.885134	do to take	-0.124939
-2.093641	order to take	-0.124939
-1.959728	how to take	-0.124939
-1.286258	need to take	-0.124939
-0.885134	work to take	-0.124939
-0.592989	process to take	-0.124939
-0.885134	destructors to take	-0.124939
-0.592989	appear to take	-0.124939
-0.592989	coprocessors to take	-0.124939
-0.901780	itself and take	-0.124939
-0.891915	objects that take	-0.124939
-1.409241	instructions that take	-0.124939
-1.182049	branches that take	-0.124939
-0.596436	instances that take	-0.124939
-1.784463	that can take	-0.124939
-1.768429	it can take	-0.124939
-1.831300	you can take	-0.124939
-1.450934	It can take	-0.124939
-0.846667	memory can take	-0.124939
-0.846667	program can take	-0.124939
-1.474199	which can take	-0.124939
-1.163409	You can take	-0.425969
-1.194679	thread can take	-0.124939
-1.012303	We can take	-0.124939
-0.572933	tread can take	-0.124939
-0.572933	installed can take	-0.124939
-1.872726	may not take	-0.124939
-1.317470	it may take	-0.124939
-1.499822	This may take	-0.124939
-0.578398	calculations may take	-0.124939
-0.578398	branches may take	-0.124939
-0.578398	process may take	-0.124939
-2.086220	if you take	-0.124939
-0.245373	loop will take	-0.124939
-0.864139	main will take	-0.124939
-0.584677	which functions take	-0.124939
-1.028990	critical functions take	-0.124939
-1.439440	mathematical functions take	-0.124939
-0.898038	developers should take	-0.124939
-1.428898	long double take	-0.124939
-1.773443	and b take	-0.124939
-1.422112	in C++ take	-0.124939
-1.063049	will often take	-0.124939
-0.595638	shift operations take	-0.124939
-1.739033	some cases take	-0.124939
-0.887866	precision calculations take	-0.124939
-0.594293	and doesn't take	-0.124939
-0.591916	multiplication would take	-0.124939
-0.878486	that typically take	-0.124939
-0.588265	and division take	-0.124939
-0.587867	28. We take	-0.124939
-0.584552	calculations usually take	-0.124939
-1.127563	These conversions take	-0.124939
-1.255843	can sometimes take	-0.124939
-0.578827	will still take	-0.124939
-0.526945	inputs. Let's take	-0.124939
-0.504540	and logarithms take	-0.124939
-0.463203	additions. Divisions take	-0.124939
-0.463203	different precisions take	-0.124939
-1.967643	it is often	-0.124939
-0.876735	as is often	-0.124939
-1.763409	This is often	-0.124939
-1.815599	this is often	-0.124939
-1.268296	It is often	-0.182931
-1.821233	program is often	-0.124939
-2.194234	there is often	-0.124939
-1.152919	objects is often	-0.124939
-1.040226	system is often	-0.124939
-0.876735	lookup is often	-0.124939
-0.876735	task is often	-0.124939
-0.600399	complex and often	-0.124939
-0.600399	delete, and often	-0.124939
-1.132926	program are often	-0.124939
-1.657759	functions are often	-0.124939
-1.417843	variables are often	-0.124939
-1.777680	they are often	-0.124939
-1.213723	threads are often	-0.124939
-1.132926	addresses are often	-0.124939
-0.583138	counts are often	-0.124939
-1.024722	profilers are often	-0.124939
-0.866036	requirements are often	-0.124939
-0.866036	Arrays are often	-0.124939
-0.583138	distributors are often	-0.124939
-1.950277	it can often	-0.124939
-1.866191	compiler can often	-0.124939
-1.277670	objects can often	-0.124939
-0.590223	operation can often	-0.124939
-0.879730	output can often	-0.124939
-0.590223	queries can often	-0.124939
-2.096441	because it often	-0.124939
-1.942149	but it often	-0.124939
-0.900926	Vectorized code often	-0.124939
-1.904001	Gnu compiler often	-0.124939
-1.636580	it will often	-0.124939
-1.724386	compilers will often	-0.124939
-0.588145	language will often	-0.124939
-1.537212	library functions often	-0.124939
-1.189546	the most often	-0.602060
-1.087064	is most often	-0.124939
-1.192497	vector size often	-0.124939
-0.597659	Programmers very often	-0.124939
-0.596811	e.g. how often	-0.124939
-0.891479	Mac systems often	-0.124939
-0.882538	other hardware often	-0.124939
-0.550454	occur quite often	-0.124939
-0.550454	happens quite often	-0.124939
-0.877091	size conversion often	-0.124939
-1.259979	hard disk often	-0.124939
-1.160506	switch statements often	-0.124939
-0.562652	do, however, often	-0.124939
-0.463312	Updating mechanisms often	-0.124939
-0.358672	software companies often	-0.124939
-0.358672	that hackers often	-0.124939
-0.358672	file. Keep often	-0.124939
-2.739215	in a rather	-0.124939
-2.486917	the code rather	-0.124939
-1.184966	point code rather	-0.124939
-0.967882	compile time rather	-0.425969
-0.600143	full use rather	-0.124939
-0.844949	in memory rather	-0.425969
-0.896933	64-bit integer rather	-0.124939
-0.598156	use float rather	-0.124939
-0.895544	existing object rather	-0.124939
-0.597926	one array rather	-0.124939
-0.857401	by 8 rather	-0.124939
-0.578615	constant 8 rather	-0.124939
-1.595463	a register rather	-0.124939
-0.596220	class template rather	-0.124939
-0.466197	in registers rather	-0.823909
-0.891056	using pointers rather	-0.124939
-0.595521	cache access rather	-0.124939
-1.912123	operating system rather	-0.124939
-1.401448	64 bits rather	-0.124939
-0.595213	get 0 rather	-0.124939
-0.594829	six instructions rather	-0.124939
-0.595135	present processors rather	-0.124939
-0.594739	10 times rather	-0.124939
-1.756143	the stack rather	-0.124939
-0.594189	API calls rather	-0.124939
-1.253861	the container rather	-0.124939
-1.047278	flush-to-zero mode rather	-0.124939
-1.890463	clock cycles rather	-0.124939
-0.588848	with sets rather	-0.124939
-1.165937	software implementation rather	-0.124939
-1.150200	template parameter rather	-0.124939
-1.241153	integer expressions rather	-0.124939
-1.430025	static linking rather	-0.124939
-0.864501	using references rather	-0.124939
-1.130379	is loaded rather	-0.124939
-0.579683	one operation rather	-0.124939
-0.576340	result 100 rather	-0.124939
-1.223674	processor models rather	-0.124939
-1.540861	the beginning rather	-0.124939
-0.834226	big blocks rather	-0.124939
-0.562319	to memcpy rather	-0.124939
-0.556556	each factor rather	-0.124939
-1.110052	execution units rather	-0.124939
-0.549841	single step rather	-0.124939
-1.008229	in advance rather	-0.124939
-0.503998	towards zero, rather	-0.124939
-0.503998	of xxn rather	-0.124939
-0.503998	and frameworks, rather	-0.124939
-0.462710	result -56 rather	-0.124939
-0.462710	calculated once, rather	-0.124939
-0.358199	&& !b) rather	-0.124939
-0.358199	electrical connections rather	-0.124939
-0.358199	as (b*2.0)/3.0 rather	-0.124939
-0.358199	using unions rather	-0.124939
-0.358199	processor X?" rather	-0.124939
-0.358199	that matters rather	-0.124939
-0.358199	CPU supports, rather	-0.124939
-0.358199	development tools, rather	-0.124939
-0.358199	running at, rather	-0.124939
-2.899545	of the optimization	-0.124939
-2.392877	when the optimization	-0.124939
-1.743743	do the optimization	-0.124939
-1.985981	using the optimization	-0.124939
-1.692764	between the optimization	-0.124939
-0.599679	focus the optimization	-0.124939
-0.599679	concentrate the optimization	-0.124939
-1.930373	lot of optimization	-0.124939
-0.379110	level of optimization	-0.124939
-0.897612	degree of optimization	-0.124939
-0.675304	relevant to optimization	-0.425969
-0.377827	obstacles to optimization	-0.124939
-0.089717	Obstacles to optimization	-0.726999
-0.594103	advices on optimization	-0.124939
-0.594103	book on optimization	-0.124939
-0.594103	Advices on optimization	-0.124939
-0.600936	on code optimization	-0.124939
-0.900426	on compiler optimization	-0.124939
-0.674412	do this optimization	-0.124939
-0.771471	whole program optimization	-0.124939
-0.121606	Whole program optimization	-0.124939
-0.598115	its many optimization	-0.124939
-1.446385	a software optimization	-0.124939
-0.581698	18 software optimization	-0.124939
-0.514777	C++ An optimization	-0.124939
-0.514777	CPUs: An optimization	-0.124939
-0.514777	C++: An optimization	-0.124939
-0.514777	language: An optimization	-0.124939
-1.780745	the best optimization	-0.124939
-0.593960	giving specific optimization	-0.124939
-0.593630	many good optimization	-0.124939
-0.538583	the various optimization	-0.124939
-0.784242	have various optimization	-0.124939
-0.586210	options. Many optimization	-0.124939
-0.764251	if your optimization	-0.124939
-0.527116	If your optimization	-0.124939
-0.570963	the relevant optimization	-0.124939
-0.197720	all relevant optimization	-0.602060
-0.743530	for my optimization	-0.124939
-0.514993	that my optimization	-0.124939
-1.194821	to insert optimization	-0.124939
-0.172615	8.5 Compiler optimization	-0.425969
-0.526805	makes detailed optimization	-0.124939
-0.526960	when interprocedural optimization	-0.124939
-0.504560	program 81 optimization	-0.124939
-0.504560	/Fm Generate optimization	-0.124939
-0.065764	14 Specific optimization	-0.425969
-0.358600	x-xxxx--x Profile-guided optimization	-0.124939
-0.358600	-O3 Interprocedural optimization	-0.124939
-0.358600	the strongest optimization	-0.124939
-1.299744	includes the libraries	-0.124939
-0.601397	Mac The libraries	-0.124939
-0.600999	system or libraries	-0.124939
-0.829093	of function libraries	-0.124939
-0.630537	and function libraries	-0.124939
-0.785951	for function libraries	-0.124939
-0.913521	or function libraries	-0.124939
-0.539553	most function libraries	-0.124939
-0.466577	Intel function libraries	-0.124939
-0.539553	Gnu function libraries	-0.124939
-0.539553	Use function libraries	-0.124939
-0.539553	best function libraries	-0.124939
-0.190878	These function libraries	-0.425969
-0.539553	common function libraries	-0.124939
-0.785951	optimized function libraries	-0.124939
-0.539553	various function libraries	-0.124939
-0.913521	graphics function libraries	-0.124939
-0.539553	Many function libraries	-0.124939
-0.785951	math function libraries	-0.124939
-0.539553	general function libraries	-0.124939
-0.539553	Several function libraries	-0.124939
-0.539553	distribute function libraries	-0.124939
-0.539553	Optimized function libraries	-0.124939
-1.195911	long vector libraries	-0.124939
-0.599806	try different libraries	-0.124939
-0.898181	most other libraries	-0.124939
-0.599512	not all libraries	-0.124939
-1.320829	container class libraries	-0.124939
-1.042587	Vector class libraries	-0.124939
-1.734162	in most libraries	-0.124939
-1.833238	the Intel libraries	-0.124939
-0.588431	when Intel libraries	-0.124939
-1.359101	in two libraries	-0.124939
-1.294070	from static libraries	-0.124939
-1.180607	All these libraries	-0.124939
-0.315278	the dynamic libraries	-0.124939
-0.454949	The dynamic libraries	-0.124939
-0.454949	not dynamic libraries	-0.124939
-0.454949	more dynamic libraries	-0.124939
-0.646289	same dynamic libraries	-0.124939
-0.454949	all dynamic libraries	-0.124939
-0.646289	versus dynamic libraries	-0.124939
-1.528857	The Gnu libraries	-0.124939
-0.888195	for large libraries	-0.124939
-0.557580	code Function libraries	-0.124939
-0.557580	libraries Function libraries	-0.124939
-1.323333	the standard libraries	-0.124939
-0.553495	include standard libraries	-0.124939
-0.587614	all runtime libraries	-0.124939
-0.586230	below. Many libraries	-0.124939
-0.586427	(dynamically linked libraries	-0.124939
-0.421047	static link libraries	-0.425969
-1.064581	dynamic link libraries	-0.124939
-0.580690	efficient. Dynamic libraries	-0.124939
-0.557489	special purpose libraries	-0.124939
-0.504580	_mm_add_epi16(a,b). Two libraries	-0.124939
-0.504580	contains well-tested libraries	-0.124939
-0.504580	and LIBM libraries	-0.124939
-2.550036	This is how	-0.124939
-0.248574	example of how	-0.602060
-0.547010	examples of how	-0.602060
-0.596226	study of how	-0.124939
-0.891500	understanding of how	-0.124939
-1.889669	code and how	-0.124939
-0.502819	called and how	-0.425969
-0.598182	like and how	-0.124939
-0.499286	130 for how	-0.425969
-0.883845	120 for how	-0.124939
-0.202157	122 for how	-0.425969
-0.592331	107 for how	-0.124939
-0.883845	www.agner.org/optimize/cppexamples.zip for how	-0.124939
-1.694249	depends on how	-0.124939
-1.098117	depending on how	-0.124939
-0.590723	Advice on how	-0.124939
-0.925251	details about how	-0.124939
-0.544472	comments about how	-0.124939
-0.544472	Tips about how	-0.124939
-1.322691	can calculate how	-0.124939
-0.805590	to count how	-0.124939
-0.550580	that count how	-0.124939
-1.644649	to see how	-0.124939
-0.241433	example shows how	-0.903090
-0.282257	12.4b shows how	-0.124939
-0.282257	39 shows how	-0.124939
-0.269453	to know how	-0.249877
-1.036806	for checking how	-0.124939
-0.858315	can tell how	-0.124939
-0.577540	a[i]. Note how	-0.124939
-0.497114	is discussed how	-0.425969
-0.572787	compiler e.g. how	-0.124939
-0.570034	profiler counts how	-0.124939
-0.705991	to measure how	-0.124939
-0.415338	and measure how	-0.124939
-0.550361	to show how	-0.124939
-0.540849	that specifies how	-0.124939
-0.540721	C++ programming, how	-0.124939
-0.916289	to understand how	-0.124939
-0.526960	examples explain how	-0.124939
-0.527116	no idea how	-0.124939
-0.959398	example illustrates how	-0.124939
-0.504560	that decide how	-0.124939
-0.143294	manual discusses how	-0.124939
-0.143294	section discusses how	-0.124939
-0.463221	in doubt how	-0.124939
-0.358600	chapter describes how	-0.124939
-1.789277	of the code.	-0.191886
-1.809176	in the code.	-0.221849
-1.613084	improve the code.	-0.124939
-0.596849	optimizes the code.	-0.124939
-0.596849	improving the code.	-0.124939
-1.129655	piece of code.	-0.124939
-1.290024	pieces of code.	-0.124939
-0.897552	sequences of code.	-0.124939
-0.600775	string as code.	-0.124939
-0.600341	handles this code.	-0.124939
-1.494557	the instruction code.	-0.124939
-1.625926	floating point code.	-0.301030
-1.195848	on integer code.	-0.124939
-1.999005	in 64-bit code.	-0.124939
-1.068156	or C++ code.	-0.124939
-0.893739	that makes code.	-0.124939
-2.108184	the critical code.	-0.124939
-0.806759	of system code.	-0.124939
-0.806759	in system code.	-0.124939
-0.551229	for system code.	-0.124939
-1.266626	the error code.	-0.124939
-0.405507	any extra code.	-0.124939
-1.307610	in assembly code.	-0.124939
-0.567721	compiler-generated assembly code.	-0.124939
-1.038956	the compiled code.	-0.124939
-0.914018	directly compiled code.	-0.124939
-0.539763	fully compiled code.	-0.124939
-1.123971	the intermediate code.	-0.124939
-1.328091	an intermediate code.	-0.124939
-1.569952	the application code.	-0.124939
-0.588832	vectorizing mathematical code.	-0.124939
-0.949772	the source code.	-0.124939
-0.333735	same source code.	-0.124939
-0.726701	the position-independent code.	-0.124939
-0.504971	of position-independent code.	-0.124939
-0.578941	faster vectorized code.	-0.124939
-0.572987	binary executable code.	-0.124939
-1.295308	the simplest code.	-0.124939
-0.980709	position- independent code.	-0.124939
-0.787859	and Fortran code.	-0.124939
-0.540925	2. Position-independent code.	-0.124939
-0.540634	to CPU-intensive code.	-0.124939
-0.526897	to suboptimal code.	-0.124939
-0.314494	for application-specific code.	-0.124939
-0.314494	optimizing application-specific code.	-0.124939
-0.659095	to non-AVX code.	-0.124939
-0.463148	in time-critical code.	-0.124939
-0.358543	in precompiled code.	-0.124939
-2.252993	of the time.	-0.124939
-0.765091	at a time.	-0.124939
-1.944986	lot of time.	-0.124939
-1.013544	amount of time.	-0.124939
-1.296013	slightly more time.	-0.124939
-1.418223	the same time.	-0.221849
-0.591991	than CPU time.	-0.124939
-0.591991	consumes CPU time.	-0.124939
-0.599235	less each time.	-0.124939
-0.599248	take most time.	-0.124939
-0.598210	branching takes time.	-0.124939
-1.682475	a long time.	-0.124939
-2.016240	the first time.	-0.124939
-0.540918	no extra time.	-0.124939
-0.714375	takes extra time.	-0.124939
-1.121009	any extra time.	-0.124939
-0.541091	to execution time.	-0.124939
-0.788667	and execution time.	-0.124939
-0.788667	total execution time.	-0.124939
-0.263173	at compile time.	-0.215115
-0.593091	total calculation time.	-0.124939
-0.591960	at run time.	-0.124939
-1.040493	and development time.	-0.124939
-0.587775	than last time.	-0.124939
-0.928218	takes longer time.	-0.124939
-0.209031	take longer time.	-0.301030
-0.188810	at load time.	-0.124939
-0.581785	at installation time.	-0.124939
-1.195453	to save time.	-0.124939
-1.425027	the total time.	-0.124939
-0.241769	the user's time.	-0.124939
-0.463294	total computation time.	-0.124939
-2.405570	of the template	-0.124939
-2.375226	and the template	-0.124939
-2.728110	in the template	-0.124939
-2.513373	that the template	-0.124939
-2.538585	if the template	-0.124939
-1.641121	because the template	-0.124939
-2.249088	If the template	-0.124939
-1.571828	example, the template	-0.124939
-0.601652	Why is template	-0.124939
-2.368367	is a template	-0.124939
-2.344200	of a template	-0.124939
-1.802317	and a template	-0.124939
-1.911461	that a template	-0.124939
-1.590493	as a template	-0.124939
-1.761244	If a template	-0.124939
-1.706022	using a template	-0.124939
-1.188901	through a template	-0.425969
-1.677150	set of template	-0.124939
-1.595578	functions. The template	-0.124939
-2.405301	a function template	-0.124939
-0.600965	macro by template	-0.124939
-1.186743	implemented with template	-0.124939
-0.597650	algorithm with template	-0.124939
-0.886421	size as template	-0.124939
-1.062512	name as template	-0.124939
-0.593646	factors as template	-0.124939
-1.755734	or more template	-0.124939
-1.334318	time. A template	-0.124939
-0.576655	parameters. A template	-0.124939
-0.576655	polymorphism A template	-0.124939
-0.576655	Templates A template	-0.124939
-0.576655	so). A template	-0.124939
-1.823427	a class template	-0.124939
-0.897023	power using template	-0.124939
-1.451705	the C++ template	-0.124939
-0.585485	In C++ template	-0.124939
-0.895484	however, where template	-0.124939
-1.973848	of 2 template	-0.124939
-0.596501	by template template	-0.124939
-1.173129	// Use template	-0.124939
-1.644164	// Function template	-0.124939
-1.322969	the standard template	-0.124939
-0.947029	The standard template	-0.124939
-1.674722	the above template	-0.124939
-0.589032	this complicated template	-0.124939
-1.016834	bounds checking template	-0.124939
-1.003096	of N template	-0.124939
-0.817754	of 2: template	-0.124939
-0.129377	The powN template	-0.425969
-0.463130	because partial template	-0.124939
-0.065754	// Full template	-0.425969
-0.659066	template parameter: template	-0.124939
-0.659066	* m;} template	-0.124939
-0.358529	by (partial) template	-0.124939
-0.358529	and convoluted template	-0.124939
-0.358529	a non-recursing template	-0.124939
-0.358529	// Partial template	-0.124939
-0.902706	Only the registers	-0.124939
-1.509755	number of registers	-0.301030
-1.615086	type of registers	-0.124939
-1.233074	but in registers	-0.124939
-1.563094	variables in registers	-0.124939
-1.163415	stored in registers	-0.124939
-0.217324	transferred in registers	-0.329059
-0.874619	returned in registers	-0.124939
-0.779926	the vector registers	-0.124939
-1.097041	of vector registers	-0.124939
-1.070674	bit vector registers	-0.124939
-0.552880	XMM vector registers	-0.124939
-0.945735	128-bit vector registers	-0.124939
-0.552880	sixteen vector registers	-0.124939
-0.600084	advantageous because registers	-0.124939
-1.467499	floating point registers	-0.124939
-1.470007	the integer registers	-0.124939
-0.862417	six integer registers	-0.124939
-0.581248	fourteen integer registers	-0.124939
-0.599236	from using registers	-0.124939
-0.595422	of available registers	-0.124939
-0.118178	point stack registers	-0.602060
-0.594057	stack. These registers	-0.124939
-0.036497	the XMM registers	-0.346788
-0.224034	in XMM registers	-0.124939
-0.097728	if XMM registers	-0.425969
-0.224034	uses XMM registers	-0.124939
-0.401223	128-bit XMM registers	-0.124939
-0.570489	not enough registers	-0.124939
-0.566985	to 256-bit registers	-0.124939
-0.136352	and YMM registers	-0.124939
-0.336210	The YMM registers	-0.124939
-0.884524	for saving registers	-0.124939
-0.042732	and ZMM registers	-0.124939
-0.090132	512-bit ZMM registers	-0.124939
-0.687934	without the need	-0.602060
-0.600784	eliminates the need	-0.124939
-1.500440	data. The need	-0.124939
-1.784331	functions that need	-0.124939
-0.202667	addresses that need	-0.124939
-0.594841	files that need	-0.124939
-0.594841	resources that need	-0.124939
-1.790614	may not need	-0.124939
-1.835929	do not need	-0.124939
-1.057641	does not need	-0.124939
-1.301320	that may need	-0.124939
-1.101824	program may need	-0.124939
-0.574140	array may need	-0.124939
-1.176526	we may need	-0.124939
-1.729498	You may need	-0.124939
-0.574140	logic may need	-0.124939
-0.574140	drivers may need	-0.124939
-1.966375	then you need	-0.124939
-1.790096	If you need	-0.124939
-1.021955	so you need	-0.124939
-1.021955	sure you need	-0.124939
-0.864114	words, you need	-0.124939
-0.599893	you only need	-0.124939
-0.843078	is no need	-0.669007
-0.536943	- no need	-0.124939
-1.825409	a class need	-0.124939
-0.897631	Other compilers need	-0.124939
-1.255027	then we need	-0.124939
-0.556234	before we need	-0.124939
-0.815815	case we need	-0.124939
-0.556234	cores, we need	-0.124939
-0.597998	registers. You need	-0.124939
-1.468960	dynamic libraries need	-0.124939
-1.815852	operating systems need	-0.124939
-0.796292	it doesn't need	-0.124939
-1.166715	compiler doesn't need	-0.124939
-0.491310	class doesn't need	-0.124939
-0.491310	object doesn't need	-0.124939
-1.056177	different threads need	-0.124939
-0.887499	high-level language need	-0.124939
-0.886064	will therefore need	-0.124939
-1.046924	object files need	-0.124939
-0.422870	that don't need	-0.124939
-0.649728	you don't need	-0.124939
-0.897528	we don't need	-0.124939
-0.422870	they don't need	-0.124939
-0.874776	software applications need	-0.124939
-1.299858	both the pointers	-0.124939
-0.548532	table of pointers	-0.602060
-0.203406	casting of pointers	-0.124939
-0.598070	programmer that pointers	-0.124939
-0.598070	explicitly that pointers	-0.124939
-0.895147	specifying that pointers	-0.124939
-0.600911	data or pointers	-0.124939
-0.895074	of function pointers	-0.425969
-1.354049	and function pointers	-0.124939
-1.849127	or if pointers	-0.124939
-1.548698	than by pointers	-0.124939
-0.900762	things with pointers	-0.124939
-1.467639	well as pointers	-0.124939
-1.064732	transferred as pointers	-0.124939
-0.883242	use than pointers	-0.124939
-2.126779	rather than pointers	-0.124939
-0.592022	(rather than pointers	-0.124939
-0.899685	typically use pointers	-0.124939
-2.523789	to make pointers	-0.124939
-0.599425	analyze all pointers	-0.124939
-1.211812	of using pointers	-0.124939
-0.598485	allows multiple pointers	-0.124939
-0.573989	that two pointers	-0.124939
-1.175924	between two pointers	-0.124939
-0.573989	Comparing two pointers	-0.124939
-0.536146	of member pointers	-0.124939
-0.779961	with member pointers	-0.124939
-0.956052	make member pointers	-0.124939
-0.536146	way member pointers	-0.124939
-0.536146	Simple member pointers	-0.124939
-0.594445	arguments while pointers	-0.124939
-0.361178	accessed through pointers	-0.249877
-1.255504	that uses pointers	-0.124939
-0.194840	7.7 Function pointers	-0.124939
-0.590971	Relocation. All pointers	-0.124939
-0.588753	variable. Using pointers	-0.124939
-0.584809	the link pointers	-0.124939
-0.574919	block. Any pointers	-0.124939
-0.474427	of smart pointers	-0.124939
-0.474427	using smart pointers	-0.124939
-1.097661	This includes pointers	-0.124939
-1.332723	to keep pointers	-0.124939
-0.570009	with invalid pointers	-0.124939
-1.185495	by setting pointers	-0.124939
-0.566501	may contain pointers	-0.124939
-0.031634	7.9 Smart pointers	-0.124939
-0.065754	deleted. Smart pointers	-0.124939
-0.065754	auto_ptr. Smart pointers	-0.124939
-0.659066	7.8 Member pointers	-0.124939
-0.358529	by initializing pointers	-0.124939
-2.914957	in the test	-0.124939
-2.403086	by the test	-0.124939
-2.026197	before the test	-0.124939
-1.494767	after the test	-0.124939
-1.393470	make a test	-0.124939
-1.290790	put a test	-0.124939
-0.599477	developed a test	-0.124939
-0.565404	set of test	-0.425969
-1.528027	function to test	-0.124939
-1.619411	code to test	-0.124939
-1.984704	have to test	-0.124939
-2.053059	order to test	-0.124939
-1.822615	way to test	-0.124939
-1.045897	example, to test	-0.124939
-1.346927	how to test	-0.124939
-0.590679	registers to test	-0.124939
-1.847381	need to test	-0.124939
-1.045897	times to test	-0.124939
-1.774830	necessary to test	-0.124939
-1.326075	relevant to test	-0.124939
-0.880619	things to test	-0.124939
-0.590679	practice to test	-0.124939
-0.601443	separately and test	-0.124939
-1.553470	useful in test	-0.124939
-1.477027	data. The test	-0.124939
-0.598443	counter. The test	-0.124939
-0.598443	spots. The test	-0.124939
-0.898748	variable for test	-0.124939
-0.898748	branch for test	-0.124939
-0.601355	>> can test	-0.124939
-1.603993	{ // test	-0.124939
-0.600960	textbook on test	-0.124939
-0.600136	code. A test	-0.124939
-1.072497	on different test	-0.124939
-1.723678	you should test	-0.124939
-1.943745	for each test	-0.124939
-0.864073	a performance test	-0.124939
-0.582113	realistic performance test	-0.124939
-0.597314	Time before test	-0.124939
-0.561757	}; void test	-0.425969
-0.552240	a[c][r]); void test	-0.124939
-1.745449	a simple test	-0.124939
-1.263175	The speed test	-0.124939
-1.448622	a small test	-0.124939
-0.976858	in my test	-0.124939
-0.743587	for my test	-0.124939
-0.324540	program under test	-0.124939
-0.463407	counters. My test	-0.124939
-0.463407	identified. My test	-0.124939
-1.009311	a dedicated test	-0.124939
-0.504783	a built-in test	-0.124939
-0.358629	the unit- test	-0.124939
-2.430801	of the new	-0.124939
-2.026037	to the new	-0.124939
-2.508374	for the new	-0.124939
-2.581778	if the new	-0.124939
-1.071193	uses the new	-0.124939
-0.897748	gets the new	-0.124939
-2.067097	is a new	-0.124939
-1.762791	of a new	-0.124939
-1.937415	to a new	-0.124939
-1.810044	for a new	-0.124939
-1.756276	that a new	-0.124939
-0.542374	time a new	-0.124939
-1.781646	use a new	-0.124939
-1.494909	when a new	-0.124939
-1.308226	make a new	-0.124939
-1.372508	making a new	-0.124939
-0.583089	support a new	-0.124939
-0.865943	load a new	-0.124939
-1.229664	generate a new	-0.124939
-0.200259	start a new	-0.124939
-0.865943	calculating a new	-0.124939
-0.245590	allocate a new	-0.301030
-0.583089	starting a new	-0.124939
-0.583089	maintaining a new	-0.124939
-0.583089	assigning a new	-0.124939
-0.583089	create a new	-0.124939
-0.601531	needed, and new	-0.124939
-0.585047	memory with new	-0.124939
-0.585047	object with new	-0.124939
-0.373326	allocated with new	-0.425969
-0.585047	allocation with new	-0.124939
-0.585047	dynamically with new	-0.124939
-0.600947	problem. This new	-0.124939
-1.320546	to each new	-0.124939
-0.589453	development, each new	-0.124939
-1.533914	to using new	-0.124939
-0.595106	this important new	-0.124939
-0.594002	set. These new	-0.124939
-0.592785	CString uses new	-0.124939
-0.592231	the operators new	-0.124939
-1.251118	to add new	-0.124939
-1.667589	the next new	-0.124939
-0.580448	and desired new	-0.124939
-0.577291	keep adding new	-0.124939
-0.566826	is brand new	-0.124939
-0.541006	alloca over new	-0.124939
-0.358729	to advertise new	-0.124939
-0.358729	to receive new	-0.124939
-0.358729	dynamically (with new	-0.124939
-0.900801	portable to systems	-0.124939
-1.075752	ported to systems	-0.124939
-0.902336	thread in systems	-0.124939
-0.599717	but other systems	-0.124939
-1.288931	in all systems	-0.124939
-1.125918	the 64-bit systems	-0.124939
-1.216211	in 64-bit systems	-0.124939
-0.784847	The 64-bit systems	-0.124939
-0.960938	use 64-bit systems	-0.124939
-1.129222	In 64-bit systems	-0.124939
-1.193674	on such systems	-0.124939
-1.867625	in some systems	-0.124939
-0.545225	in 32-bit systems	-0.367977
-0.482221	but 32-bit systems	-0.124939
-0.482221	Many 32-bit systems	-0.124939
-0.936968	64 bit systems	-0.425969
-1.343170	the operating systems	-0.124939
-0.504773	and operating systems	-0.124939
-0.348320	different operating systems	-0.124939
-0.348320	two operating systems	-0.124939
-0.260473	64-bit operating systems	-0.124939
-0.348320	some operating systems	-0.124939
-0.140137	32-bit operating systems	-0.425969
-0.348320	these operating systems	-0.124939
-0.348320	Linux operating systems	-0.124939
-0.140137	old operating systems	-0.124939
-0.348320	current operating systems	-0.124939
-0.348320	contemporary operating systems	-0.124939
-0.348320	Older operating systems	-0.124939
-0.348320	Supported operating systems	-0.124939
-0.521526	applications. Some systems	-0.124939
-0.521526	important. Some systems	-0.124939
-0.521526	for. Some systems	-0.124939
-0.521526	card. Some systems	-0.124939
-0.593984	3.x. These systems	-0.124939
-1.208996	and Mac systems	-0.124939
-0.545282	systems. Mac systems	-0.124939
-1.249241	in 16-bit systems	-0.124939
-0.152904	in embedded systems	-0.124939
-0.527087	with existing systems	-0.124939
-0.835223	big endian systems	-0.124939
-0.504864	fully utilize systems	-0.124939
-0.726283	in Unix-like systems	-0.124939
-0.463367	64-bit Unix systems	-0.124939
-0.358715	used. Web systems	-0.124939
-2.563532	of the user	-0.124939
-1.745569	to the user	-0.124939
-2.216554	and the user	-0.124939
-2.305799	for the user	-0.124939
-1.721928	that the user	-0.124939
-1.531710	or the user	-0.124939
-1.748451	if the user	-0.301030
-2.241693	on the user	-0.124939
-2.065072	than the user	-0.124939
-2.051730	then the user	-0.124939
-1.056855	way the user	-0.124939
-0.594489	processor the user	-0.124939
-0.594489	interrupt the user	-0.124939
-0.594489	place the user	-0.124939
-0.594489	telling the user	-0.124939
-0.594489	forbids the user	-0.124939
-2.012652	that a user	-0.124939
-1.668659	when a user	-0.124939
-0.896226	development of user	-0.124939
-1.883277	instead of user	-0.124939
-1.575231	choice of user	-0.124939
-1.044402	Choice of user	-0.425969
-1.773446	time to user	-0.124939
-0.601544	reinstalled and user	-0.124939
-1.422997	systems. The user	-0.124939
-0.896048	starts. The user	-0.124939
-0.598525	insufficient. The user	-0.124939
-1.059577	times for user	-0.124939
-0.740240	waiting for user	-0.425969
-0.595426	waits for user	-0.124939
-0.595426	Waiting for user	-0.124939
-1.490941	time. A user	-0.124939
-0.898851	and different user	-0.124939
-0.895770	simplest possible user	-0.124939
-1.804156	a very user	-0.124939
-0.597225	though less user	-0.124939
-0.882556	use standard user	-0.124939
-0.626578	the end user	-0.124939
-0.580593	code, including user	-0.124939
-0.129445	the graphical user	-0.124939
-0.039013	a graphical user	-0.301030
-0.129445	A graphical user	-0.124939
-0.129445	own graphical user	-0.124939
-0.957034	for storing user	-0.124939
-0.540939	A popular user	-0.124939
-0.463403	features. Take user	-0.124939
-0.876584	all of these	-0.124939
-1.654736	one of these	-0.124939
-1.274524	many of these	-0.124939
-0.547881	any of these	-0.124939
-1.478644	some of these	-0.124939
-1.825668	versions of these	-0.124939
-1.815396	advantage of these	-0.124939
-0.201395	All of these	-0.124939
-1.341036	implementation of these	-0.124939
-0.876584	Many of these	-0.124939
-1.152634	aware of these	-0.124939
-1.305103	availability of these	-0.124939
-0.588605	Which of these	-0.124939
-1.040006	combination of these	-0.124939
-0.588605	fourth of these	-0.124939
-1.202893	solution to these	-0.124939
-1.491549	function and these	-0.124939
-0.600336	vectors, and these	-0.124939
-1.077936	examples in these	-0.124939
-1.670045	functions for these	-0.124939
-0.598347	examples for these	-0.124939
-0.598347	avoided for these	-0.124939
-1.774389	assume that these	-0.124939
-0.898591	Note that these	-0.124939
-0.891928	beware that these	-0.124939
-0.600943	something on these	-0.124939
-1.371974	that use these	-0.124939
-0.592550	size, because these	-0.124939
-0.884274	problematic because these	-0.124939
-1.644878	for all these	-0.124939
-0.591059	Obviously, all these	-0.124939
-0.591154	instructions, but these	-0.124939
-0.591154	relocate, but these	-0.124939
-0.598444	variables. In these	-0.124939
-0.597596	balance between these	-0.124939
-0.893634	resources. For these	-0.124939
-1.607695	to access these	-0.124939
-1.642783	to avoid these	-0.124939
-1.078100	should avoid these	-0.124939
-0.594192	reasons. Use these	-0.124939
-0.594213	line. But these	-0.124939
-0.690401	purposes. All these	-0.124939
-0.482823	prone. All these	-0.124939
-0.482823	9.2. All these	-0.124939
-0.482823	www.agner.org/optimize/#vectorclass All these	-0.124939
-0.590624	implementations. However, these	-0.124939
-0.580449	lrint. Unfortunately, these	-0.124939
-1.558091	to tell these	-0.124939
-0.575326	systems, though these	-0.124939
-1.027155	will convert these	-0.124939
-0.570308	and swap these	-0.124939
-1.185726	by setting these	-0.124939
-1.041532	to overcome these	-0.124939
-1.009238	to distinguish these	-0.124939
-0.358600	to translate these	-0.124939
-1.418624	problems and they	-0.124939
-1.064680	thing and they	-0.124939
-1.064680	programmers and they	-0.124939
-0.597172	integers, and they	-0.124939
-0.597172	sizes, and they	-0.124939
-1.501516	so that they	-0.124939
-1.317280	sure that they	-0.124939
-0.594822	reason that they	-0.124939
-0.601058	needed, or they	-0.124939
-2.533908	the function they	-0.124939
-1.111848	or if they	-0.124939
-0.575606	static if they	-0.124939
-0.918029	even if they	-0.124939
-0.575606	values if they	-0.124939
-0.575606	together if they	-0.124939
-0.851702	errors if they	-0.124939
-0.575606	expensive if they	-0.124939
-0.575606	cheap if they	-0.124939
-0.600950	integers - they	-0.124939
-1.768836	every time they	-0.124939
-0.866608	of when they	-0.124939
-1.564693	only when they	-0.124939
-0.866608	counters when they	-0.124939
-0.583436	stronger when they	-0.124939
-0.956025	code because they	-0.124939
-1.143524	efficient because they	-0.124939
-0.993327	performance because they	-0.124939
-0.557015	critical because they	-0.124939
-0.557015	operators because they	-0.124939
-0.817236	avoided because they	-0.124939
-0.817236	costly because they	-0.124939
-0.417593	in which they	-0.602060
-0.591197	a, but they	-0.124939
-0.591197	integers, but they	-0.124939
-1.710937	cases where they	-0.124939
-1.454923	situation where they	-0.124939
-1.393801	the objects they	-0.124939
-1.438077	of objects they	-0.124939
-0.597341	stages before they	-0.124939
-1.183495	on how they	-0.124939
-1.403667	most cases they	-0.124939
-0.816137	of whether they	-0.124939
-0.816137	see whether they	-0.124939
-0.590112	the programs they	-0.124939
-0.590474	pointers unless they	-0.124939
-0.588374	that's what they	-0.124939
-0.873924	only after they	-0.124939
-0.865507	which reductions they	-0.124939
-0.570367	calculations whenever they	-0.124939
-0.358672	the texts they	-0.124939
-0.203513	with and without	-0.425969
-1.286095	loop and without	-0.124939
-0.600722	with or without	-0.124939
-0.898746	to memory without	-0.124939
-0.599832	storing data without	-0.124939
-1.072331	application program without	-0.124939
-2.604357	the same without	-0.124939
-1.535350	library functions without	-0.124939
-2.317889	the loop without	-0.124939
-2.150061	be used without	-0.124939
-1.286436	to integer without	-0.124939
-0.897195	Gnu compilers without	-0.124939
-1.360614	a double without	-0.124939
-0.877698	shared object without	-0.124939
-1.188633	new version without	-0.124939
-1.364356	shared objects without	-0.124939
-1.466601	dynamic libraries without	-0.124939
-0.595868	11.3 even without	-0.124939
-1.409656	oriented programming without	-0.124939
-1.347870	point operations without	-0.124939
-0.889052	reorder instructions without	-0.124939
-0.595237	old processors without	-0.124939
-0.594746	unrecoverable error without	-0.124939
-1.064361	on CPUs without	-0.124939
-1.055731	of calculations without	-0.124939
-0.594465	command-line versions without	-0.124939
-1.084911	is compiled without	-0.124939
-0.703908	are compiled without	-0.124939
-0.931613	code compiled without	-0.124939
-0.703908	when compiled without	-0.124939
-0.491149	object compiled without	-0.124939
-1.232383	4 bytes without	-0.124939
-1.117289	8 bytes without	-0.124939
-0.373090	16 bytes without	-0.602060
-0.593979	improve speed without	-0.124939
-1.406069	an exception without	-0.124939
-1.734372	double precision without	-0.124939
-0.883823	or container without	-0.124939
-0.873713	in applications without	-0.124939
-0.871859	old microprocessors without	-0.124939
-0.581526	handling errors without	-0.124939
-0.576750	backup copying without	-0.124939
-1.326046	be changed without	-0.124939
-0.850881	of compiling without	-0.124939
-0.572706	C1::f directly without	-0.124939
-0.569877	of F1 without	-0.124939
-0.550027	C-style type-casting without	-0.124939
-0.540535	an int, without	-0.124939
-0.526425	desired functionality without	-0.124939
-0.526425	a unit-test without	-0.124939
-0.504199	disassembly, probably without	-0.124939
-0.462893	in question without	-0.124939
-0.358342	used freely without	-0.124939
-0.358342	code. (Compile without	-0.124939
-1.572212	This is useful	-0.124939
-1.980848	It is useful	-0.124939
-1.895093	program is useful	-0.124939
-1.978760	which is useful	-0.124939
-0.889868	method is useful	-0.301030
-0.593514	testing is useful	-0.124939
-0.593514	principle is useful	-0.124939
-0.593514	throw()specification is useful	-0.124939
-2.020246	is a useful	-0.124939
-1.275720	can be useful	-0.602060
-1.095472	may be useful	-0.492916
-1.360862	which are useful	-0.124939
-1.607903	libraries are useful	-0.124939
-1.483454	operations are useful	-0.124939
-1.152731	directives are useful	-0.124939
-1.040081	profilers are useful	-0.124939
-0.876636	Threads are useful	-0.124939
-0.876636	References are useful	-0.124939
-0.588632	~ are useful	-0.124939
-1.372455	are more useful	-0.124939
-1.193453	is most useful	-0.124939
-1.188492	is also useful	-0.124939
-0.573237	102 also useful	-0.124939
-1.193124	contains many useful	-0.124939
-1.322302	is very useful	-0.124939
-1.369112	a very useful	-0.124939
-0.782635	and very useful	-0.124939
-0.683797	be very useful	-0.124939
-1.800163	is less useful	-0.124939
-1.836091	is often useful	-0.124939
-0.579123	A particularly useful	-0.124939
-0.566988	newsgroups contain useful	-0.124939
-2.242256	then the even	-0.124939
-2.808598	it is even	-0.124939
-0.601500	prone to even	-0.124939
-0.601152	overwritten, and even	-0.124939
-0.601178	table for even	-0.124939
-1.764799	would be even	-0.124939
-0.601238	denominator can even	-0.124939
-0.593015	update or even	-0.124939
-0.593015	hundred or even	-0.124939
-0.593015	uncached or even	-0.124939
-0.593015	search, or even	-0.124939
-2.397013	is not even	-0.124939
-0.596608	register, not even	-0.124939
-1.881907	is an even	-0.124939
-1.779136	It may even	-0.124939
-0.882182	memory may even	-0.124939
-1.932341	You may even	-0.124939
-1.844860	you have even	-0.124939
-0.898681	into memory even	-0.124939
-0.895033	different objects even	-0.124939
-1.841195	a variable even	-0.124939
-1.918444	the performance even	-0.124939
-1.736944	some cases even	-0.124939
-0.594246	different arrays even	-0.124939
-1.519469	multiple versions even	-0.124939
-0.565851	cases. An even	-0.124939
-0.565851	leak. An even	-0.124939
-0.592854	(|) works even	-0.124939
-1.891984	clock cycles even	-0.124939
-1.467833	most cases, even	-0.124939
-1.666780	exception handling even	-0.124939
-1.480577	you don't even	-0.124939
-0.587060	many applications even	-0.124939
-1.310847	dispatch mechanism even	-0.124939
-0.874311	are needed even	-0.124939
-0.872452	an Intel, even	-0.124939
-0.578624	to temp even	-0.124939
-0.578368	always inlined even	-0.124939
-0.840635	be used, even	-0.124939
-1.146795	be mispredicted even	-0.124939
-0.817283	be called, even	-0.124939
-0.915203	is executed even	-0.124939
-0.540483	function returns even	-0.124939
-0.526628	starts up, even	-0.124939
-0.526362	more resources, even	-0.124939
-0.958199	by default, even	-0.124939
-0.504139	program execution, even	-0.124939
-0.504139	never occurs, even	-0.124939
-0.504139	memory space, even	-0.124939
-0.725315	example 11.3 even	-0.124939
-0.462838	point expressions, even	-0.124939
-0.462838	a time-consumer even	-0.124939
-0.462838	response times, even	-0.124939
-0.358299	|| b)) even	-0.124939
-0.358299	than nine, even	-0.124939
-0.358299	(b*c) overflows, even	-0.124939
-0.358299	exception handler, even	-0.124939
-2.777335	it is sure	-0.124939
-2.527117	This is sure	-0.124939
-0.601575	know for sure	-0.124939
-1.968237	cannot be sure	-0.124939
-1.072195	never be sure	-0.124939
-0.964056	you are sure	-0.249877
-1.353724	they are sure	-0.425969
-1.361047	processors are sure	-0.124939
-0.588663	arguments are sure	-0.124939
-2.188384	are not sure	-0.124939
-0.667433	to make sure	-0.681241
-0.729929	and make sure	-0.124939
-0.916697	can make sure	-0.124939
-0.714975	you make sure	-0.124939
-0.403873	then make sure	-0.124939
-0.629987	must make sure	-0.124939
-0.444386	Therefore, make sure	-0.124939
-0.444386	errors; make sure	-0.124939
-0.678179	that makes sure	-0.124939
-0.918934	it makes sure	-0.124939
-0.831127	This makes sure	-0.602060
-0.475205	system makes sure	-0.124939
-0.475205	reference makes sure	-0.124939
-0.475205	keyword makes sure	-0.124939
-0.475205	product makes sure	-0.124939
-0.760689	of making sure	-0.425969
-1.401507	by making sure	-0.124939
-0.536357	variable. Make sure	-0.124939
-0.536357	executables. Make sure	-0.124939
-0.358873	signed. Be sure	-0.124939
-1.843606	but the method	-0.124939
-1.440698	choose the method	-0.124939
-0.601131	shows, the method	-0.124939
-1.172243	used. The method	-0.124939
-1.047304	bits. The method	-0.124939
-0.591172	pow The method	-0.124939
-0.881581	back. The method	-0.124939
-0.591172	language". The method	-0.124939
-0.881581	negative. The method	-0.124939
-0.591172	count. The method	-0.124939
-0.591172	elimination. The method	-0.124939
-1.738553	function or method	-0.124939
-0.921530	called. This method	-0.124939
-0.791907	CPUs. This method	-0.124939
-0.791907	compiler. This method	-0.124939
-0.542920	library. This method	-0.124939
-0.921530	thread. This method	-0.124939
-0.542920	allocation. This method	-0.124939
-0.542920	array. This method	-0.124939
-0.791907	16. This method	-0.124939
-0.542920	finished. This method	-0.124939
-0.791907	all. This method	-0.124939
-0.542920	versions. This method	-0.124939
-0.791907	loaded. This method	-0.124939
-0.542920	2.0 This method	-0.124939
-0.542920	efficiently. This method	-0.124939
-0.542920	executables. This method	-0.124939
-0.542920	added. This method	-0.124939
-0.542920	MFC). This method	-0.124939
-1.484614	of this method	-0.124939
-0.977798	that this method	-0.124939
-1.297373	use this method	-0.124939
-1.289328	because this method	-0.124939
-1.508226	but this method	-0.124939
-1.196690	avoid this method	-0.124939
-0.565572	choose this method	-0.124939
-0.565572	Unfortunately, this method	-0.124939
-0.600212	blocks. A method	-0.124939
-1.293055	short vector method	-0.124939
-1.623801	The same method	-0.124939
-1.345171	of which method	-0.124939
-0.866368	discussed which method	-0.124939
-0.583311	consider which method	-0.124939
-1.563547	induction variable method	-0.124939
-1.281821	first call method	-0.124939
-1.400530	most important method	-0.124939
-1.142153	function calling method	-0.124939
-0.570181	A similar method	-0.124939
-0.570233	A newer method	-0.124939
-0.566952	table. Optimization method	-0.124939
-0.562628	more general method	-0.124939
-0.562757	The preferred method	-0.124939
-0.526953	old C-style method	-0.124939
-0.526953	The original method	-0.124939
-0.504700	effect. Which method	-0.124939
-0.504700	an unfortunate method	-0.124939
-2.175649	that is always	-0.124939
-2.258986	function is always	-0.124939
-1.603629	b is always	-0.124939
-2.279445	there is always	-0.124939
-1.063816	SSE2 is always	-0.124939
-1.350007	parameter is always	-0.124939
-0.892787	section is always	-0.124939
-1.275716	exponent is always	-0.124939
-2.254447	is to always	-0.124939
-1.971588	compiler to always	-0.124939
-0.899193	reduced to always	-0.124939
-1.074256	small and always	-0.124939
-1.199195	count and always	-0.124939
-1.552858	branch that always	-0.124939
-1.724475	parameters are always	-0.124939
-1.172641	results are always	-0.124939
-1.055360	manuals are always	-0.124939
-0.887065	Arrays are always	-0.124939
-0.593974	properties) are always	-0.124939
-0.901474	1; // always	-0.124939
-0.601072	true or always	-0.124939
-1.196814	is not always	-0.124939
-0.990975	are not always	-0.124939
-1.408437	but not always	-0.124939
-1.211021	do not always	-0.124939
-1.506185	does not always	-0.124939
-1.669498	compiler will always	-0.124939
-1.038735	thread will always	-0.124939
-0.588155	core will always	-0.124939
-0.596647	#pragma vector always	-0.124939
-1.873109	the cache always	-0.124939
-0.590351	size should always	-0.124939
-0.879979	process should always	-0.124939
-1.884199	the variable always	-0.124939
-1.632678	you cannot always	-0.124939
-1.202079	and they always	-0.124939
-1.202079	that they always	-0.124939
-0.595235	point constant always	-0.124939
-1.761889	the stack always	-0.124939
-0.594772	recursion must always	-0.124939
-0.863300	call statement always	-0.124939
-0.863702	that p always	-0.124939
-1.095456	is almost always	-0.124939
-1.203077	I am always	-0.124939
-0.358686	Remember, therefore, always	-0.124939
-2.092500	make the access	-0.124939
-1.825705	makes the access	-0.124939
-2.094146	is to access	-0.124939
-1.651078	code to access	-0.124939
-1.597730	than to access	-0.124939
-2.119888	possible to access	-0.124939
-1.556035	order to access	-0.124939
-0.845768	faster to access	-0.425969
-0.886663	seconds to access	-0.124939
-1.171864	unable to access	-0.124939
-0.593769	steps to access	-0.124939
-0.601376	column. The access	-0.124939
-2.269938	is that access	-0.124939
-1.816685	functions that access	-0.124939
-1.738216	we can access	-0.124939
-1.300936	If you access	-0.124939
-0.591305	give you access	-0.124939
-1.074404	threads have access	-0.124939
-0.568023	and memory access	-0.124939
-0.568023	that memory access	-0.124939
-0.568023	if memory access	-0.124939
-0.568023	than memory access	-0.124939
-0.197096	Optimizing memory access	-0.124939
-0.899008	if data access	-0.124939
-0.599773	when CPU access	-0.124939
-0.599565	require other access	-0.124939
-0.599404	or cache access	-0.124939
-0.598279	fastest possible access	-0.124939
-1.218969	it cannot access	-0.124939
-1.122059	function cannot access	-0.124939
-0.596217	different user access	-0.124939
-0.530735	is file access	-0.124939
-0.188887	put file access	-0.124939
-0.530735	Optimizing file access	-0.124939
-1.587289	to get access	-0.124939
-1.246356	for fast access	-0.124939
-0.581801	which gives access	-0.124939
-0.623801	and network access	-0.124939
-0.440341	The network access	-0.124939
-0.440341	with network access	-0.124939
-0.827333	3.13 Memory access	-0.124939
-0.562619	with non-sequential access	-0.124939
-0.550230	subset, giving access	-0.124939
-0.526784	for regular access	-0.124939
-0.725982	3.7 File access	-0.124939
-0.463203	finally (4) access	-0.124939
-0.065762	3.12 Network access	-0.124939
-0.463203	allows direct access	-0.124939
-0.463203	for exclusive access	-0.124939
-0.358586	Sequential forward access	-0.124939
-1.946355	} } void	-0.124939
-1.304419	... } void	-0.124939
-1.503150	2; } void	-0.124939
-1.021133	x); } void	-0.124939
-1.067836	each version void	-0.124939
-0.894079	with branch void	-0.124939
-0.080129	public: virtual void	-0.425969
-0.486689	NotPolymorphic(); virtual void	-0.124939
-1.172013	another thread void	-0.124939
-0.195743	copy matrix void	-0.425969
-1.637621	vector classes void	-0.124939
-0.860623	} }; void	-0.425969
-0.470122	d; }; void	-0.124939
-0.670096	f(); }; void	-0.124939
-0.470122	~C1(); }; void	-0.124939
-0.592417	alignment problem void	-0.124939
-0.479974	static inline void	-0.346788
-0.432334	call inline void	-0.124939
-0.542455	{ public: void	-0.346788
-0.526700	matrix 96 void	-0.124939
-0.763533	Example 8.26a void	-0.124939
-0.504690	Example 7.12 void	-0.124939
-0.504460	type typedef void	-0.124939
-0.659066	Example 8.26b void	-0.124939
-0.659066	swapd(a[r][c], a[c][r]); void	-0.124939
-0.463130	Example 14.1c void	-0.124939
-0.463130	void Disp(); void	-0.124939
-0.143272	F2(float x[]); void	-0.124939
-0.143272	F1(int x[]); void	-0.124939
-0.659066	Example 8.21 void	-0.124939
-0.358529	Example 9.5b void	-0.124939
-0.358529	// Dispatcher void	-0.124939
-0.358529	*)d, x);} void	-0.124939
-0.358529	x(0) {}; void	-0.124939
-0.358529	EXCEPTION_FLT_OVERFLOW 0xC0000091L void	-0.124939
-0.358529	Example 8.5a void	-0.124939
-0.358529	function prototype: void	-0.124939
-0.358529	#include <malloc.h> void	-0.124939
-0.358529	Example 8.25 void	-0.124939
-0.358529	Example 9.2b void	-0.124939
-0.358529	Example 9.2a void	-0.124939
-0.358529	function vectorized: void	-0.124939
-0.358529	#include <asmlib.h> void	-0.124939
-1.077957	int is 16	-0.124939
-1.372858	integers of 16	-0.124939
-1.075489	block of 16	-0.124939
-0.600821	increased to 16	-0.124939
-0.600821	corresponds to 16	-0.124939
-0.601018	sizeof(S1) = 16	-0.124939
-0.202858	8 or 16	-0.124939
-0.595785	12 or 16	-0.124939
-1.567110	loop by 16	-0.124939
-0.586042	table by 16	-0.124939
-1.801689	divisible by 16	-0.124939
-0.586042	alignment by 16	-0.124939
-0.586042	structures by 16	-0.124939
-0.586042	Align by 16	-0.124939
-2.640552	the code 16	-0.124939
-0.900780	4 - 16	-0.124939
-1.781020	unsigned int 16	-0.124939
-1.225199	short int 16	-0.124939
-0.583998	systems: int 16	-0.124939
-1.594778	bigger than 16	-0.124939
-2.186360	See page 16	-0.124939
-0.896383	element number 16	-0.124939
-1.128478	char 8 16	-0.124939
-0.561630	Vec16c 8 16	-0.124939
-0.561630	I64vec1 8 16	-0.124939
-1.712080	to test 16	-0.124939
-1.091573	int 16 16	-0.124939
-0.551310	256 16 16	-0.124939
-0.551310	Vec4d 16 16	-0.124939
-1.172169	int 32 16	-0.124939
-0.997381	float 32 16	-0.124939
-0.588586	832 256 16	-0.124939
-0.190090	= char 16	-0.124939
-0.531655	SSE2 Store 16	-0.124939
-0.276272	SSE Store 16	-0.425969
-0.827186	the lower 16	-0.124939
-0.562550	be 8, 16	-0.124939
-0.562649	Compiler identification 16	-0.124939
-0.540612	actually adds 16	-0.124939
-0.526700	....................................................................................................... 150 16	-0.124939
-0.504460	cycle? ...................................................................................... 16	-0.124939
-0.463130	Is16vec8 Vec8s 16	-0.124939
-0.463130	consumers ................................................................................ 16	-0.124939
-0.463130	spots .................................................................................. 16	-0.124939
-0.358529	to 15.1c). 16	-0.124939
-0.358529	Iu8vec16 Vec16uc 16	-0.124939
-0.358529	64 Iu8vec8 16	-0.124939
-0.358529	Vec8f Vec4d 16	-0.124939
-0.358529	needed _mm_shuffle_epi8 16	-0.124939
-0.358529	64 Is16vec4 16	-0.124939
-1.870148	for the SSE2	-0.425969
-1.950154	if the SSE2	-0.425969
-1.061978	when the SSE2	-0.669007
-0.998239	only the SSE2	-0.124939
-1.576813	without the SSE2	-0.124939
-1.055813	cases the SSE2	-0.124939
-0.657489	unless the SSE2	-0.249877
-1.262754	Using the SSE2	-0.124939
-0.490256	enable the SSE2	-0.301030
-0.902008	SSE and SSE2	-0.124939
-1.286980	efficient. The SSE2	-0.124939
-1.283537	CPUs. The SSE2	-0.124939
-0.598505	140). The SSE2	-0.124939
-1.488497	optimized for SSE2	-0.124939
-0.599943	Only for SSE2	-0.124939
-1.556580	{ // SSE2	-0.425969
-0.673008	{...} // SSE2	-0.425969
-0.882793	#endif // SSE2	-0.124939
-0.901175	SSE or SSE2	-0.124939
-0.901394	mode if SSE2	-0.124939
-0.601027	Vectorized with SSE2	-0.124939
-2.217222	instruction set SSE2	-0.124939
-1.367013	Instruction set SSE2	-0.124939
-0.596188	integer without SSE2	-0.124939
-0.174334	2 128 SSE2	-0.124939
-0.670116	4 128 SSE2	-0.124939
-0.470135	8 128 SSE2	-0.124939
-0.470135	16 128 SSE2	-0.124939
-0.585658	float vectors SSE2	-0.124939
-1.532277	// Define SSE2	-0.124939
-1.287159	if possible. SSE2	-0.124939
-0.504721	the 145 SSE2	-0.124939
-0.463367	same executable. SSE2	-0.124939
-0.659439	/arch:SSE -msse SSE2	-0.124939
-0.358715	MOVNTDQ _mm_stream_si128 SSE2	-0.124939
-0.358715	MOVNTI _mm_stream_si32 SSE2	-0.124939
-0.358715	SSE xmmintrin.h SSE2	-0.124939
-0.358715	MOVNTPD _mm_stream_pd SSE2	-0.124939
-0.249486	index is out	-0.602060
-1.201814	languages are out	-0.124939
-0.900971	simultaneously or out	-0.124939
-0.601173	0 if out	-0.124939
-2.574418	is not out	-0.124939
-0.570830	execute instructions out	-0.124939
-0.991514	executing instructions out	-0.124939
-0.584103	list points out	-0.124939
-1.020634	the conversions out	-0.124939
-0.581655	FatalAppExitA(0,"Array index out	-0.124939
-0.885133	to find out	-0.124939
-0.736700	and shift out	-0.124939
-0.430792	can shift out	-0.124939
-0.430792	will shift out	-0.124939
-0.052830	cannot rule out	-0.301030
-0.182840	completely rule out	-0.124939
-0.008214	to roll out	-0.522879
-0.042719	we roll out	-0.124939
-0.009876	// Roll out	-0.823909
-0.763752	can move out	-0.124939
-0.526976	or mask out	-0.124939
-0.249666	be carried out	-0.124939
-0.249666	were carried out	-0.124939
-0.090104	// Index out	-0.124939
-0.042719	"Error: Index out	-0.425969
-0.107134	be moved out	-0.425969
-0.504580	n being out	-0.124939
-0.249666	for jumping out	-0.124939
-0.249666	after jumping out	-0.124939
-0.065765	be ruled out	-0.124939
-0.065765	by rolling out	-0.425969
-0.143298	block turns out	-0.124939
-0.143298	prediction turns out	-0.124939
-0.463239	be left out	-0.124939
-0.143298	is rolled out	-0.124939
-0.143298	list, rolled out	-0.124939
-0.358615	to print out	-0.124939
-0.358615	are breaking out	-0.124939
-2.106225	of the following	-0.124939
-1.526453	in the following	-0.380211
-1.652788	for the following	-0.602060
-2.319673	if the following	-0.124939
-2.153232	by the following	-0.124939
-1.520623	have the following	-0.124939
-1.717380	has the following	-0.124939
-1.329541	through the following	-0.124939
-0.593194	consider the following	-0.124939
-0.202332	generates the following	-0.425969
-0.593194	skip the following	-0.124939
-0.885535	Consider the following	-0.124939
-0.593194	Instead, the following	-0.124939
-0.563238	table The following	-0.124939
-1.389237	function. The following	-0.124939
-1.336870	functions. The following	-0.124939
-1.163032	efficient. The following	-0.124939
-1.186794	set. The following	-0.124939
-1.066122	processors. The following	-0.124939
-1.134439	loop. The following	-0.124939
-1.066122	2. The following	-0.124939
-0.971793	precision. The following	-0.124939
-1.066122	not. The following	-0.124939
-0.828633	vectors. The following	-0.124939
-0.828633	do. The following	-0.124939
-0.563238	again. The following	-0.124939
-0.828633	branches. The following	-0.124939
-0.828633	www.agner.org/optimize/asmlib.zip. The following	-0.124939
-0.828633	explanation. The following	-0.124939
-0.196073	unsigned. The following	-0.124939
-0.563238	errors. The following	-0.124939
-0.563238	only. The following	-0.124939
-0.828633	compilation. The following	-0.124939
-0.563238	multiplications. The following	-0.124939
-0.563238	runtime). The following	-0.124939
-0.563238	shortly. The following	-0.124939
-0.563238	Wikibooks. The following	-0.124939
-0.563238	noticeable. The following	-0.124939
-0.563238	Atom). The following	-0.124939
-0.563238	satisfactory. The following	-0.124939
-0.358973	is InstructionSet().The following	-0.124939
-2.471296	and the system	-0.124939
-2.018691	that the system	-0.124939
-1.197736	access the system	-0.124939
-0.600455	therefore the system	-0.124939
-2.160364	by a system	-0.124939
-2.081756	on a system	-0.124939
-1.634690	discussion of system	-0.124939
-0.600955	area of system	-0.124939
-1.358315	files and system	-0.124939
-0.743720	problems and system	-0.124939
-0.895668	interfaces and system	-0.124939
-1.820962	function in system	-0.124939
-1.074815	use in system	-0.124939
-0.601509	resources. The system	-0.124939
-1.500734	intended for system	-0.124939
-0.601068	determined with system	-0.124939
-0.599964	resolutions, different system	-0.124939
-0.147110	the operating system	-0.215115
-0.386832	of operating system	-0.124939
-0.073081	The operating system	-0.602060
-0.386832	an operating system	-0.124939
-0.271352	certain operating system	-0.124939
-0.271352	Mac operating system	-0.124939
-0.271352	compiler, operating system	-0.124939
-0.386832	X operating system	-0.124939
-0.386832	protected operating system	-0.124939
-0.271352	circumvent operating system	-0.124939
-1.005501	error handling system	-0.124939
-1.330836	exception handling system	-0.124939
-0.583936	under advanced system	-0.124939
-0.177736	3.11 Other system	-0.425969
-1.082510	are highly system	-0.124939
-0.550649	again. Accessing system	-0.124939
-0.659553	just-in-time compilers, system	-0.124939
-0.358773	service routines, system	-0.124939
-3.163294	of the 32	-0.124939
-1.075715	int is 32	-0.124939
-0.600901	size_t is 32	-0.124939
-1.377117	integers of 32	-0.124939
-0.902085	compared to 32	-0.124939
-0.601253	bit and 32	-0.124939
-1.727219	than in 32	-0.124939
-1.692482	objects in 32	-0.124939
-1.193509	references in 32	-0.124939
-0.598225	16 or 32	-0.124939
-0.598225	(16 or 32	-0.124939
-1.374381	aligned by 32	-0.124939
-1.199967	organized as 32	-0.124939
-1.026889	long int 32	-0.124939
-0.583921	SSE2 int 32	-0.124939
-0.583921	AVX int 32	-0.124939
-0.583921	AVX2 int 32	-0.124939
-0.583921	MMX int 32	-0.124939
-2.351380	rather than 32	-0.124939
-0.600280	(Both use 32	-0.124939
-1.692454	will make 32	-0.124939
-2.180876	See page 32	-0.124939
-1.534059	for example 32	-0.124939
-0.896594	64-bit double 32	-0.124939
-0.574202	SSE2 float 32	-0.124939
-0.574202	AVX2 float 32	-0.124939
-0.574202	AVX512 float 32	-0.124939
-0.894834	j * 32	-0.124939
-1.282579	64 2 32	-0.124939
-1.065847	unsigned long 32	-0.124939
-1.280704	64 4 32	-0.124939
-1.189268	32 8 32	-0.124939
-1.128093	char 8 32	-0.124939
-0.561525	Vec2uq 8 32	-0.124939
-1.188348	64 64 32	-0.124939
-1.066216	16 16 32	-0.124939
-1.665360	the AVX 32	-0.124939
-0.592439	float uses 32	-0.124939
-0.589777	SSE2, preferably 32	-0.124939
-1.047801	OpenMP directives 32	-0.124939
-0.860280	operators produce 32	-0.124939
-0.577004	with accessing 32	-0.124939
-0.574897	64) % 32	-0.124939
-0.787838	advantages over 32	-0.124939
-0.787838	8, 16, 32	-0.124939
-0.358779	the upper 32	-0.124939
-0.358779	Get upper 32	-0.124939
-0.358414	SSSE3 _mm_perm_epi8 32	-0.124939
-0.358414	__INTEL_COMPILER 161 32	-0.124939
-0.358414	Iu16vec8 Vec8us 32	-0.124939
-0.358414	operators ...................................................................... 32	-0.124939
-0.358414	Is32vec4 Vec4i 32	-0.124939
-0.358414	64 Is32vec2 32	-0.124939
-0.358414	features 80386 32	-0.124939
-0.358414	64 Iu16vec4 32	-0.124939
-2.284405	because the file	-0.124939
-2.014575	before the file	-0.124939
-1.623866	sure the file	-0.124939
-1.196179	access the file	-0.124939
-1.196179	write the file	-0.124939
-0.600061	closes the file	-0.124939
-0.902410	bottleneck is file	-0.124939
-2.278034	to a file	-0.124939
-1.281319	access a file	-0.124939
-1.067247	writing a file	-0.124939
-0.598046	writes a file	-0.124939
-0.598046	created a file	-0.124939
-0.598046	opens a file	-0.124939
-0.599953	linking. The file	-0.124939
-0.898887	closed. The file	-0.124939
-2.192923	used for file	-0.124939
-0.600079	old data file	-0.124939
-1.791104	the library file	-0.124939
-1.166315	the object file	-0.124939
-1.484092	an object file	-0.124939
-0.551133	different object file	-0.124939
-0.551133	usual object file	-0.124939
-1.068497	or C++ file	-0.124939
-1.067682	have many file	-0.124939
-0.594296	A big file	-0.124939
-0.940300	The intermediate file	-0.124939
-1.328352	an intermediate file	-0.124939
-1.688858	a separate file	-0.124939
-0.830884	to put file	-0.425969
-0.588350	one source file	-0.124939
-0.575347	storage. Optimizing file	-0.124939
-1.391160	the entire file	-0.124939
-0.204830	the executable file	-0.301030
-0.267634	an executable file	-0.124939
-0.267634	single executable file	-0.124939
-0.167964	the header file	-0.124939
-0.191188	a header file	-0.124939
-0.191188	standard header file	-0.124939
-0.085157	appropriate header file	-0.124939
-0.306276	a map file	-0.124939
-0.126680	The map file	-0.124939
-0.306276	Generate map file	-0.124939
-0.566796	and standardized file	-0.124939
-0.077780	// Header file	-0.425969
-0.172598	set Header file	-0.124939
-0.358672	optimize/#vectorclass Include file	-0.124939
-0.358672	a zip file	-0.124939
-2.998121	in the programming	-0.124939
-1.077351	way the programming	-0.124939
-0.902291	choosing a programming	-0.124939
-1.177975	time of programming	-0.124939
-0.888951	choice of programming	-0.425969
-1.036821	Choice of programming	-0.425969
-0.388452	matter of programming	-0.726999
-1.053036	history of programming	-0.124939
-0.885486	deal of programming	-0.124939
-0.593169	standardization of programming	-0.124939
-0.601649	courses in programming	-0.124939
-1.628013	called from programming	-0.124939
-1.293668	in other programming	-0.124939
-0.576524	choose other programming	-0.124939
-0.576524	Several other programming	-0.124939
-0.576524	over other programming	-0.124939
-0.599652	decide which programming	-0.124939
-0.598559	supports multiple programming	-0.124939
-1.531427	the C++ programming	-0.124939
-1.540199	the software programming	-0.124939
-1.446462	a software programming	-0.124939
-0.569060	functions Some programming	-0.124939
-0.569060	allocation. Some programming	-0.124939
-0.594417	other compiled programming	-0.124939
-0.717404	a common programming	-0.124939
-0.536852	other common programming	-0.124939
-0.591229	to certain programming	-0.124939
-1.788400	a particular programming	-0.124939
-0.875867	and various programming	-0.124939
-0.527126	to your programming	-0.124939
-0.527126	send your programming	-0.124939
-1.151176	the advanced programming	-0.124939
-0.581697	Several modern programming	-0.124939
-0.858151	a safe programming	-0.124939
-0.046313	object oriented programming	-0.204120
-0.112962	Object oriented programming	-0.124939
-0.827557	the preferred programming	-0.124939
-0.152904	14.13 System programming	-0.124939
-0.504580	may catch programming	-0.124939
-0.463239	the trivial programming	-0.124939
-0.463239	relatively primitive programming	-0.124939
-0.358615	Template meta- programming	-0.124939
-0.358615	classes Nowadays, programming	-0.124939
-1.314195	all the dynamic	-0.425969
-1.200252	load the dynamic	-0.124939
-2.047958	in a dynamic	-0.425969
-2.249004	as a dynamic	-0.124939
-0.897910	which a dynamic	-0.124939
-0.893020	uses of dynamic	-0.124939
-1.590750	cost of dynamic	-0.124939
-0.596996	process of dynamic	-0.124939
-0.975198	advantages of dynamic	-0.124939
-1.355460	costs of dynamic	-0.124939
-0.893020	disadvantages of dynamic	-0.124939
-0.379521	static and dynamic	-0.124939
-1.077163	libraries. The dynamic	-0.124939
-0.601386	reserved for dynamic	-0.124939
-0.203388	*.a) or dynamic	-0.425969
-0.601168	application if dynamic	-0.124939
-0.900864	associated with dynamic	-0.124939
-1.734917	but not dynamic	-0.124939
-2.356361	rather than dynamic	-0.124939
-1.609632	to use dynamic	-0.425969
-0.573516	libraries use dynamic	-0.124939
-0.198259	classes use dynamic	-0.425969
-0.573516	Java, use dynamic	-0.124939
-1.756335	or more dynamic	-0.124939
-0.588259	it. A dynamic	-0.124939
-0.588259	process. A dynamic	-0.124939
-0.588259	linking. A dynamic	-0.124939
-1.658942	can make dynamic	-0.124939
-1.945268	the same dynamic	-0.124939
-0.599497	make all dynamic	-0.124939
-1.907260	of using dynamic	-0.124939
-1.484483	between multiple dynamic	-0.124939
-1.784026	cases where dynamic	-0.124939
-0.596078	container without dynamic	-0.124939
-0.594531	application, while dynamic	-0.124939
-1.121129	to avoid dynamic	-0.425969
-0.541049	and avoid dynamic	-0.124939
-1.167394	with another dynamic	-0.124939
-0.881389	purposes. All dynamic	-0.124939
-0.992450	a separate dynamic	-0.425969
-1.564080	// Make dynamic	-0.124939
-0.143319	Static versus dynamic	-0.124939
-0.526960	classes Whenever dynamic	-0.124939
-1.712853	only the part	-0.124939
-2.302521	that is part	-0.124939
-1.373829	parameter is part	-0.124939
-2.650926	is a part	-0.124939
-1.075774	often a part	-0.124939
-0.601279	(Darwin) are part	-0.124939
-0.600822	included as part	-0.124939
-2.573734	is not part	-0.124939
-1.059007	that this part	-0.124939
-1.183946	on this part	-0.124939
-1.489754	time. A part	-0.124939
-1.692632	the same part	-0.602060
-0.599798	cases: If part	-0.124939
-1.071966	see which part	-0.124939
-0.599651	reasons, but part	-0.124939
-1.260694	in each part	-0.124939
-0.579715	time each part	-0.124939
-1.035584	times each part	-0.124939
-1.359595	a static part	-0.124939
-0.597942	include any part	-0.124939
-0.645700	the critical part	-0.669007
-0.727174	a critical part	-0.124939
-0.374913	same critical part	-0.124939
-0.141730	most critical part	-1.028029
-1.060935	you access part	-0.124939
-0.889100	an important part	-0.124939
-1.492827	a large part	-0.124939
-1.448406	a small part	-0.124939
-0.885381	The optimized part	-0.124939
-0.592583	and another part	-0.124939
-1.788136	a particular part	-0.124939
-0.569966	most significant part	-0.124939
-0.835020	most time-consuming part	-0.124939
-0.566617	program (or part	-0.124939
-0.028007	// fractional part	-0.301030
-0.463221	the time-critical part	-0.124939
-0.358600	the task-specific part	-0.124939
-3.067190	of the bits	-0.124939
-1.998483	all the bits	-0.124939
-0.601113	interpret the bits	-0.124939
-2.459997	number of bits	-0.124939
-1.197443	uses more bits	-0.124939
-1.946074	the same bits	-0.124939
-0.898281	all other bits	-0.124939
-1.337997	if all bits	-0.124939
-0.591115	testing all bits	-0.124939
-0.574197	set multiple bits	-0.124939
-0.574197	out multiple bits	-0.124939
-0.574197	toggle multiple bits	-0.124939
-1.184500	of 8 bits	-0.124939
-0.334024	of 64 bits	-0.124939
-0.180412	and 64 bits	-0.124939
-0.494670	be 64 bits	-0.124939
-0.494670	are 64 bits	-0.124939
-0.494670	use 64 bits	-0.124939
-0.891620	// test bits	-0.124939
-0.530848	is 16 bits	-0.124939
-0.770715	of 16 bits	-0.124939
-0.893203	or 16 bits	-0.124939
-0.530848	lower 16 bits	-0.124939
-0.599802	is 32 bits	-0.124939
-0.424462	of 32 bits	-0.124939
-0.599802	or 32 bits	-0.124939
-0.424462	use 32 bits	-0.124939
-0.424462	example 32 bits	-0.124939
-0.424462	double 32 bits	-0.124939
-0.424462	uses 32 bits	-0.124939
-0.424462	accessing 32 bits	-0.124939
-0.162316	upper 32 bits	-0.425969
-0.593555	writing small bits	-0.124939
-0.820230	is 128 bits	-0.124939
-0.558657	(MMX), 128 bits	-0.124939
-0.540941	available, 256 bits	-0.124939
-0.540941	(XMM), 256 bits	-0.124939
-0.587641	significant n bits	-0.124939
-0.523552	and 512 bits	-0.124939
-0.523552	also 512 bits	-0.124939
-0.570414	has enough bits	-0.124939
-0.295710	of vector, bits	-0.124939
-0.557281	declaration size, bits	-0.124939
-0.805161	by comparing bits	-0.124939
-0.526890	the individual bits	-0.124939
-0.463294	to 1024 bits	-0.124939
-0.065771	each element, bits	-0.425969
-0.358658	the remaining bits	-0.124939
-1.670994	kinds of operations	-0.124939
-1.375627	sequence of operations	-0.124939
-0.940348	the vector operations	-0.124939
-0.963147	of vector operations	-0.124939
-0.735522	and vector operations	-0.124939
-0.847132	The vector operations	-0.124939
-0.997735	with vector operations	-0.124939
-0.382751	use vector operations	-0.124939
-0.735522	using vector operations	-0.124939
-0.510244	64-bit vector operations	-0.124939
-0.510244	efficient vector operations	-0.124939
-0.972071	Using vector operations	-0.124939
-0.735522	Boolean vector operations	-0.124939
-0.510244	(when vector operations	-0.124939
-1.328154	floating point operations	-0.221849
-1.545273	Floating point operations	-0.124939
-1.233207	the integer operations	-0.124939
-0.538657	use integer operations	-0.124939
-0.538657	because integer operations	-0.124939
-0.538657	do integer operations	-0.124939
-0.538657	these integer operations	-0.124939
-0.190677	Using integer operations	-0.425969
-0.538657	Simple integer operations	-0.124939
-2.271584	to do operations	-0.124939
-0.598599	two 64-bit operations	-0.124939
-1.066306	and return operations	-0.124939
-1.880131	This makes operations	-0.124939
-0.597070	comparison, bit operations	-0.124939
-0.891353	and these operations	-0.124939
-1.266132	the extra operations	-0.124939
-0.799491	operators Integer operations	-0.124939
-0.547178	size. Integer operations	-0.124939
-0.877164	The Boolean operations	-0.124939
-1.165918	and mathematical operations	-0.124939
-1.517619	table lookup operations	-0.124939
-0.587655	and | operations	-0.124939
-0.587192	256-bit read operations	-0.124939
-1.118282	and shift operations	-0.124939
-0.575225	for disk operations	-0.124939
-0.377905	variables. Vector operations	-0.124939
-0.377905	sets. Vector operations	-0.124939
-0.377905	(ZMM). Vector operations	-0.124939
-0.434812	do arithmetic operations	-0.124939
-0.615394	Pointer arithmetic operations	-0.124939
-0.463330	are primitive operations	-0.124939
-0.358686	float. Similar operations	-0.124939
-2.151783	which is 0	-0.124939
-1.199990	one is 0	-0.124939
-1.200208	x to 0	-0.124939
-0.203727	b to 0	-0.425969
-1.833320	to be 0	-0.425969
-1.716693	i = 0	-0.124939
-0.841333	N = 0	-0.124939
-0.367137	~a = 0	-0.124939
-0.121267	a-a = 0	-0.301030
-0.197534	a*0 = 0	-0.425969
-0.197534	^a = 0	-0.124939
-0.197534	0/a = 0	-0.124939
-0.570084	andnot(a,a) = 0	-0.124939
-0.243828	value than 0	-0.602060
-0.088073	values than 0	-0.425969
-0.593608	integers from 0	-0.124939
-0.593608	interval from 0	-0.124939
-2.211139	the value 0	-0.124939
-1.066252	// return 0	-0.124939
-2.009588	i < 0	-0.124939
-1.019907	(i < 0	-0.124939
-1.276708	char 8 0	-0.124939
-1.184013	int 64 0	-0.124939
-1.456856	is always 0	-0.124939
-1.270616	int 16 0	-0.124939
-0.595903	long 32 0	-0.124939
-0.595712	test bits 0	-0.124939
-1.592065	a & 0	-0.124939
-0.594215	by element 0	-0.124939
-0.884611	we get 0	-0.124939
-0.589698	takes typically 0	-0.124939
-0.587660	xn n 0	-0.124939
-0.682157	a | 0	-0.425969
-0.178172	b > 0	-0.425969
-0.485504	bb[i] > 0	-0.124939
-0.884187	the interval 0	-0.124939
-0.358672	& 0= 0	-0.124939
-2.808387	of the type	-0.124939
-2.377151	and the type	-0.124939
-2.515531	that the type	-0.124939
-2.540757	if the type	-0.124939
-2.395019	on the type	-0.124939
-2.000776	where the type	-0.124939
-1.575398	while the type	-0.124939
-1.423741	choose the type	-0.124939
-0.896270	specifying the type	-0.124939
-0.598636	Re-interpreting the type	-0.124939
-1.293253	elements of type	-0.124939
-0.203728	numbers of type	-0.124939
-1.676914	than to type	-0.124939
-0.901856	size and type	-0.124939
-0.892967	is. The type	-0.124939
-0.596969	float. The type	-0.124939
-0.596969	declaration. The type	-0.124939
-0.892967	each. The type	-0.124939
-2.334801	the function type	-0.124939
-0.889913	Define function type	-0.124939
-0.889913	define function type	-0.124939
-2.357133	rather than type	-0.124939
-1.727884	a different type	-0.124939
-1.770627	of different type	-0.124939
-2.623118	the same type	-0.124939
-1.193434	unsigned integer type	-0.124939
-1.943745	for each type	-0.124939
-0.599034	and pointer type	-0.124939
-0.598565	The float type	-0.124939
-0.894951	with any type	-0.124939
-0.597653	The return type	-0.124939
-1.745449	a simple type	-0.124939
-1.331414	of doing type	-0.124939
-0.448220	for runtime type	-0.124939
-0.448220	use runtime type	-0.124939
-0.448220	require runtime type	-0.124939
-0.448220	No runtime type	-0.124939
-0.586346	instructions. Each type	-0.124939
-1.559688	the appropriate type	-0.124939
-0.582885	over- loaded type	-0.124939
-0.132714	a composite type	-0.124939
-0.223020	of composite type	-0.124939
-0.550500	rounding. Pointer type	-0.124939
-0.230875	(RTTI) Runtime type	-0.124939
-0.100273	7.21 Runtime type	-0.425969
-0.526848	// C-style type	-0.124939
-0.358629	// Implicit type	-0.124939
-0.358629	// Constructor-style type	-0.124939
-1.671852	is the case	-0.124939
-2.273690	in the case	-0.124939
-2.509313	if the case	-0.124939
-1.356501	not the case	-0.124939
-0.599257	In the case	-0.726999
-0.598017	often the case	-0.124939
-0.598017	commonly the case	-0.124939
-1.105645	function in case	-0.124939
-1.283733	time in case	-0.124939
-1.028268	use in case	-0.124939
-1.028268	program in case	-0.124939
-1.028268	object in case	-0.124939
-0.868494	way in case	-0.124939
-0.373068	up in case	-0.425969
-0.200534	exception in case	-0.425969
-1.137481	integers in case	-0.124939
-1.028268	numbers in case	-0.124939
-0.584417	operands in case	-0.124939
-1.234391	errors in case	-0.124939
-0.584417	everything in case	-0.124939
-0.584417	justified in case	-0.124939
-0.600710	(n) { case	-0.124939
-0.797502	in this case	-0.346788
-0.585768	In this case	-0.124939
-0.895885	worst possible case	-0.124939
-0.592240	the likely case	-0.124939
-1.098521	The simplest case	-0.124939
-1.270171	the latter case	-0.124939
-0.970761	the general case	-0.124939
-0.343736	the worst case	-0.425969
-0.391103	The worst case	-0.124939
-0.172649	printf("Beta"); break; case	-0.124939
-0.172649	printf("Gamma"); break; case	-0.124939
-0.172649	printf("Alpha"); break; case	-0.124939
-0.358830	the worst- case	-0.124939
-0.358830	the former case	-0.124939
-2.636226	for the cases	-0.124939
-1.710958	In the cases	-0.124939
-1.065046	size in cases	-0.124939
-0.893615	advantageous in cases	-0.124939
-1.189985	automatically in cases	-0.124939
-1.282283	errors in cases	-0.124939
-0.893615	containers in cases	-0.124939
-1.838422	may be cases	-0.425969
-2.154792	there are cases	-0.124939
-1.434478	many different cases	-0.124939
-0.499221	in most cases	-0.124939
-0.769205	In most cases	-0.124939
-1.068605	etc. In cases	-0.124939
-1.299120	are many cases	-0.124939
-1.044908	In many cases	-0.124939
-0.598375	all possible cases	-0.124939
-0.374249	in some cases	-0.182931
-0.262495	In some cases	-0.425969
-1.190810	in simple cases	-0.124939
-0.594002	best. These cases	-0.124939
-0.556256	the few cases	-0.124939
-1.483447	a few cases	-0.124939
-0.543185	These complicated cases	-0.124939
-0.543185	More complicated cases	-0.124939
-0.587434	In difficult cases	-0.124939
-0.716161	in special cases	-0.124939
-0.498615	are special cases	-0.124939
-0.177727	7.31 Other cases	-0.425969
-1.026666	more complex cases	-0.124939
-0.659467	13.3 Difficult cases	-0.124939
-0.358729	some rare cases	-0.124939
-1.502999	However, the short	-0.124939
-2.819130	it is short	-0.124939
-2.700637	in a short	-0.124939
-0.900801	With a short	-0.124939
-1.640056	list of short	-0.124939
-1.297764	libraries and short	-0.124939
-0.600935	macros with short	-0.124939
-1.198461	other than short	-0.124939
-1.198269	S1 { short	-0.124939
-0.600090	int. A short	-0.124939
-2.082600	by using short	-0.124939
-0.897117	libraries: Intel short	-0.124939
-1.064352	16 4 short	-0.124939
-1.184226	16 8 short	-0.124939
-0.966833	4 unsigned short	-0.124939
-0.825060	8 unsigned short	-0.124939
-0.561295	uint8_t unsigned short	-0.124939
-1.271023	128 SSE2 short	-0.124939
-1.059647	of type short	-0.124939
-1.380056	int i; short	-0.425969
-0.882149	1 1 short	-0.124939
-0.876691	unsigned 256 short	-0.124939
-1.149351	unsigned char short	-0.124939
-1.140908	256 AVX2 short	-0.124939
-0.003024	int bb[], short	-1.079181
-0.003024	int aa[], short	-1.079181
-0.080702	Alignd ( short	-0.602060
-1.047086	64 MMX short	-0.124939
-0.540832	at 11 short	-0.124939
-0.504743	Example 7.22 short	-0.124939
-0.463203	types: char, short	-0.124939
-0.358586	127 int8_t short	-0.124939
-0.358586	sizes (char, short	-0.124939
-3.019593	of the &	-0.124939
-2.375063	with the &	-0.124939
-2.213132	then the &	-0.124939
-1.372471	But the &	-0.124939
-1.174346	a a &	-0.124939
-1.496290	= a &	-0.425969
-2.082713	with a &	-0.124939
-0.264494	- a &	-0.124939
-0.887948	0 a &	-0.124939
-1.181600	a, a &	-0.124939
-0.594423	--xxxx--- a &	-0.124939
-0.601679	&& to &	-0.124939
-0.901826	operator. The &	-0.124939
-1.200406	p = &	-0.124939
-0.377986	a[], int &	-0.425969
-0.599133	conditions using &	-0.124939
-0.599022	security. b &	-0.124939
-0.463638	double const &	-0.124939
-0.077814	__m128i const &	-0.726999
-0.463638	T const &	-0.124939
-0.463638	(Vec4f const &	-0.124939
-0.463638	(vector const &	-0.124939
-0.463638	add_elements(__m128 const &	-0.124939
-0.463638	max(T const &	-0.124939
-1.855482	a single &	-0.124939
-0.590778	FuncB (int &	-0.124939
-0.557281	} T &	-0.124939
-0.311146	if (u.i &	-0.425969
-0.314572	= (n &	-0.124939
-0.444120	if (n &	-0.124939
-0.143311	// (N &	-0.124939
-0.143311	N1 (N &	-0.124939
-0.659324	if (Day &	-0.124939
-0.358658	int Sum3(S3 &	-0.124939
-0.358658	return powN<(N &	-0.124939
-0.358658	| ((C &	-0.124939
-0.358658	| ((B &	-0.124939
-0.358658	... list[i &	-0.124939
-0.358658	SVML v.10.3 &	-0.124939
-0.358658	SVML v.10.2 &	-0.124939
-0.358658	= OneOrTwo5[b &	-0.124939
-0.358658	= (A &	-0.124939
-3.070590	of the simple	-0.124939
-2.271621	than the simple	-0.124939
-1.707768	In the simple	-0.124939
-1.683324	is a simple	-0.124939
-2.205192	of a simple	-0.124939
-2.259344	in a simple	-0.124939
-1.871471	be a simple	-0.124939
-1.735035	or a simple	-0.124939
-1.784704	if a simple	-0.124939
-1.982126	with a simple	-0.124939
-1.787612	than a simple	-0.124939
-0.497967	then a simple	-0.425969
-0.590154	calling a simple	-0.124939
-1.158337	accessing a simple	-0.124939
-0.123600	follows a simple	-0.301030
-1.299602	elements of simple	-0.124939
-1.075752	times to simple	-0.124939
-0.600914	responses to simple	-0.124939
-0.601519	compact, and simple	-0.124939
-1.068250	file in simple	-0.124939
-0.502942	automatically in simple	-0.124939
-0.895772	least in simple	-0.124939
-0.596914	same for simple	-0.124939
-0.892858	efficient for simple	-0.124939
-1.183891	even for simple	-0.124939
-1.063922	times for simple	-0.124939
-2.273460	such as simple	-0.124939
-1.335397	time. A simple	-0.124939
-0.576843	speed. A simple	-0.124939
-0.576843	members. A simple	-0.124939
-0.576843	branches. A simple	-0.124939
-0.576843	profiler. A simple	-0.124939
-1.072712	contains only simple	-0.124939
-2.052265	to do simple	-0.124939
-1.745089	can do simple	-0.124939
-2.155846	the most simple	-0.124939
-1.283892	between two simple	-0.124939
-0.598499	pointers In simple	-0.124939
-0.597694	and between simple	-0.124939
-0.594336	contentions. Use simple	-0.124939
-1.422995	is quite simple	-0.124939
-1.352928	can reduce simple	-0.124939
-1.287159	to mix simple	-0.124939
-0.540895	In 50 simple	-0.124939
-0.358715	space. Putting simple	-0.124939
-2.026938	using the instructions	-0.124939
-1.501330	kind of instructions	-0.124939
-0.901580	i/2+r. The instructions	-0.124939
-1.736470	rely on instructions	-0.124939
-0.995417	The vector instructions	-0.124939
-0.572307	have vector instructions	-0.124939
-1.271768	use vector instructions	-0.124939
-0.572307	more vector instructions	-0.124939
-1.493616	integer vector instructions	-0.124939
-1.657980	the different instructions	-0.124939
-1.533651	are no instructions	-0.124939
-0.598330	next two instructions	-0.124939
-0.598211	pipeline where instructions	-0.124939
-0.596449	specific optimization instructions	-0.124939
-0.574775	These new instructions	-0.124939
-0.574775	adding new instructions	-0.124939
-1.179983	All these instructions	-0.124939
-0.594865	though. Some instructions	-0.124939
-0.837862	of extra instructions	-0.124939
-0.568223	few extra instructions	-0.124939
-0.594675	single assembly instructions	-0.124939
-0.593800	application- specific instructions	-0.124939
-0.593781	are single instructions	-0.124939
-0.536738	cache. These instructions	-0.124939
-0.536738	problem. These instructions	-0.124939
-0.536738	lookup. These instructions	-0.124939
-0.885681	The AVX instructions	-0.124939
-1.797057	a few instructions	-0.124939
-0.591087	have certain instructions	-0.124939
-0.165095	nontemporal write instructions	-0.249877
-0.588895	are intrinsic instructions	-0.124939
-0.876722	precision conversion instructions	-0.124939
-1.031238	cache control instructions	-0.124939
-1.002575	can execute instructions	-0.124939
-0.566624	supported 256-bit instructions	-0.124939
-0.566720	string search instructions	-0.124939
-0.308582	by executing instructions	-0.124939
-0.436068	on executing instructions	-0.124939
-0.308582	speculatively executing instructions	-0.124939
-0.557372	of machine instructions	-0.124939
-0.550212	only six instructions	-0.124939
-0.526834	define application-specific instructions	-0.124939
-0.143259	to reorder instructions	-0.124939
-0.143259	may reorder instructions	-0.124939
-0.358486	The 16-byte instructions	-0.124939
-0.358486	with carry) instructions	-0.124939
-0.358486	the ADX instructions	-0.124939
-0.358486	of pending instructions	-0.124939
-3.115892	of the processors	-0.124939
-2.539902	on the processors	-0.124939
-1.621886	list of processors	-0.124939
-0.424742	generation of processors	-0.301030
-0.592960	time on processors	-0.425969
-1.247053	best on processors	-0.124939
-0.590715	avoided on processors	-0.124939
-0.898927	and vector processors	-0.124939
-1.202651	for different processors	-0.124939
-1.192784	on most processors	-0.124939
-1.315896	in Intel processors	-0.124939
-0.588412	earlier Intel processors	-0.124939
-1.193470	on such processors	-0.124939
-1.187483	on some processors	-0.124939
-1.601470	the first processors	-0.124939
-0.917696	The first processors	-0.124939
-0.595175	between simple processors	-0.124939
-0.564668	other virtual processors	-0.124939
-0.564668	These virtual processors	-0.124939
-0.829566	and AMD processors	-0.124939
-0.563744	processors. AMD processors	-0.124939
-0.586189	processors. Many processors	-0.124939
-0.584860	The x86 processors	-0.124939
-0.584667	for old processors	-0.124939
-0.582881	recognize VIA processors	-0.124939
-0.581664	that modern processors	-0.124939
-0.695744	on non-Intel processors	-0.124939
-0.483670	all unknown processors	-0.124939
-0.483670	handle unknown processors	-0.124939
-0.483573	of logical processors	-0.124939
-0.138639	or logical processors	-0.124939
-0.343501	even-numbered logical processors	-0.124939
-0.175137	standard PC processors	-0.124939
-0.358673	of physical processors	-0.124939
-0.358673	four physical processors	-0.124939
-0.540832	for present processors	-0.124939
-0.526784	on older processors	-0.124939
-0.358586	Therefore, micro- processors	-0.124939
-0.358586	see emulated processors	-0.124939
-0.358586	Small lightweight processors	-0.124939
-0.358586	time. Newer processors	-0.124939
-0.358586	brand. Future processors	-0.124939
-2.501314	on the available	-0.124939
-0.600778	lists the available	-0.124939
-0.600778	increased the available	-0.124939
-0.900530	study the available	-0.124939
-1.342310	and is available	-0.124939
-2.492730	it is available	-0.124939
-2.225440	function is available	-0.124939
-1.186020	compiler is available	-0.425969
-1.444972	which is available	-0.124939
-2.033728	set is available	-0.124939
-0.595509	inttypes.h is available	-0.124939
-0.595509	edition is available	-0.124939
-2.460987	number of available	-0.124939
-2.520727	to be available	-0.124939
-1.887061	that are available	-0.124939
-0.579609	software are available	-0.124939
-1.546478	libraries are available	-0.124939
-0.746665	registers are available	-0.301030
-1.417213	operations are available	-0.124939
-1.236885	calculations are available	-0.124939
-1.035383	versions are available	-0.124939
-0.859292	These are available	-0.124939
-0.579609	tasks are available	-0.124939
-0.579609	templates are available	-0.124939
-1.015036	frameworks are available	-0.124939
-0.499173	are only available	-0.124939
-1.220235	is also available	-0.124939
-0.892967	extra register available	-0.124939
-1.907929	function libraries available	-0.124939
-1.106281	point registers available	-0.124939
-1.003814	integer registers available	-0.124939
-1.815698	operating systems available	-0.124939
-1.270441	are always available	-0.124939
-0.486268	logical processors available	-0.124939
-1.338934	be made available	-0.124939
-0.581729	will become available	-0.124939
-0.562757	be easily available	-0.124939
-0.557494	various profilers available	-0.124939
-0.940062	the largest available	-0.124939
-0.540971	library. Only available	-0.124939
-0.463349	soon became available	-0.124939
-0.358701	Kernel Library, available	-0.124939
-0.358701	on publicly available	-0.124939
-2.574900	to the constant	-0.124939
-2.581778	if the constant	-0.124939
-2.319549	with the constant	-0.124939
-1.614852	making the constant	-0.124939
-1.287796	add the constant	-0.124939
-1.431325	Here, the constant	-0.124939
-0.897748	around the constant	-0.124939
-0.599381	sees the constant	-0.124939
-0.601747	(ArraySize) is constant	-0.124939
-2.220034	is a constant	-0.124939
-1.871581	be a constant	-0.124939
-0.655720	by a constant	-0.477121
-1.982283	with a constant	-0.124939
-1.878191	use a constant	-0.124939
-1.646485	using a constant	-0.124939
-0.201714	adding a constant	-0.425969
-0.590161	plus a constant	-0.124939
-0.599371	inlining and constant	-0.124939
-0.203581	folding and constant	-0.425969
-0.599994	n. The constant	-0.124939
-0.898968	0. The constant	-0.124939
-1.595072	}; // constant	-0.124939
-0.592084	Multiply by constant	-0.124939
-0.123819	Divide by constant	-0.602060
-0.600926	declared as constant	-0.124939
-0.600243	subexpression. A constant	-0.124939
-2.571324	floating point constant	-0.124939
-1.865266	only one constant	-0.124939
-1.935277	an integer constant	-0.124939
-0.599299	giving each constant	-0.124939
-1.856700	a single constant	-0.124939
-1.738890	double precision constant	-0.124939
-0.590295	8.24. Integer constant	-0.124939
-1.219628	to enable constant	-0.124939
-0.557394	any compile-time constant	-0.124939
-0.504741	memory. Copying constant	-0.124939
-0.065780	subexpression elimination, constant	-0.425969
-2.292769	that are up	-0.124939
-2.641363	the code up	-0.124939
-0.600676	currently not up	-0.124939
-1.492684	that make up	-0.124939
-1.279911	to set up	-0.124939
-0.860153	be set up	-0.124939
-0.860153	can set up	-0.124939
-0.584028	loop takes up	-0.124939
-0.584028	pointer takes up	-0.124939
-0.479475	that take up	-0.425969
-1.201952	may take up	-0.124939
-0.196532	to speed up	-0.124939
-0.590940	it count up	-0.124939
-0.587816	we end up	-0.124939
-0.165514	to look up	-0.124939
-0.436275	first look up	-0.124939
-0.436275	(3) look up	-0.124939
-0.584711	frequency goes up	-0.124939
-0.974195	to keep up	-0.124939
-0.474220	always keep up	-0.124939
-0.772258	systems allow up	-0.124939
-0.474165	Mac allow up	-0.124939
-0.570198	for setting up	-0.124939
-0.540797	vector turned up	-0.124939
-0.504630	to split up	-0.124939
-0.358650	be split up	-0.124939
-0.077765	to clean up	-0.124939
-0.172560	must clean up	-0.124939
-0.444028	be cleaned up	-0.124939
-0.314504	are cleaned up	-0.124939
-0.725916	be filled up	-0.124939
-0.090093	for cleaning up	-0.124939
-0.090093	time cleaning up	-0.124939
-0.090093	from cleaning up	-0.124939
-0.504500	tested (not up	-0.124939
-0.143281	efficient. Splitting up	-0.124939
-0.143281	rule. Splitting up	-0.124939
-0.659123	CPU dispatchers up	-0.124939
-0.358557	measurements: warm up	-0.124939
-0.358557	_endthread() cleans up	-0.124939
-0.358557	it fills up	-0.124939
-0.358557	may fill up	-0.124939
-0.358557	be speeded up	-0.124939
-0.358557	by summing up	-0.124939
-0.358557	registers, totaling up	-0.124939
-0.358557	for speeding up	-0.124939
-2.566900	for the error	-0.124939
-1.585985	or the error	-0.124939
-1.850763	as the error	-0.124939
-2.202884	then the error	-0.124939
-0.600425	trigger the error	-0.124939
-1.196492	source of error	-0.124939
-1.490026	kind of error	-0.124939
-1.198387	form of error	-0.124939
-0.902372	approach to error	-0.124939
-0.203792	Exceptions and error	-0.425969
-0.601087	incompatible or error	-0.124939
-2.272957	such as error	-0.124939
-1.475627	of an error	-0.124939
-0.936576	and an error	-0.124939
-1.196265	with an error	-0.124939
-0.549150	If an error	-0.124939
-0.549150	return an error	-0.124939
-0.803021	makes an error	-0.124939
-0.549150	need an error	-0.124939
-1.084122	generate an error	-0.124939
-0.549150	detect an error	-0.124939
-0.549150	signal an error	-0.124939
-0.549150	issue an error	-0.124939
-0.549150	provoke an error	-0.124939
-0.549150	issuing an error	-0.124939
-0.549150	detects an error	-0.124939
-1.274665	and this error	-0.124939
-1.341240	avoid this error	-0.124939
-1.456261	and more error	-0.124939
-0.888551	therefore more error	-0.124939
-0.600165	recovering from error	-0.124939
-1.331494	or other error	-0.124939
-1.673879	any other error	-0.124939
-1.060944	common programming error	-0.124939
-0.594393	checking). An error	-0.124939
-1.451258	a common error	-0.124939
-0.592708	or another error	-0.124939
-0.541010	your own error	-0.124939
-0.519615	an appropriate error	-0.124939
-0.519615	make appropriate error	-0.124939
-0.860676	// No error	-0.124939
-0.028012	the residual error	-0.124939
-0.463349	a minor error	-0.124939
-0.463349	to provoke error	-0.124939
-0.358701	an unrecoverable error	-0.124939
-0.601291	issues, and I	-0.124939
-1.853349	function that I	-0.124939
-0.599704	complicated that I	-0.124939
-1.376438	But if I	-0.124939
-0.600648	no compiler I	-0.124939
-0.600277	force when I	-0.124939
-0.599737	speeds. If I	-0.124939
-0.599563	usability, but I	-0.124939
-0.387244	the compilers I	-0.823909
-1.098641	The compilers I	-0.124939
-1.001731	different compilers I	-0.124939
-0.592621	multiple functions. I	-0.124939
-1.037122	the examples I	-0.124939
-0.523398	b; Here, I	-0.124939
-0.523398	1]; Here, I	-0.124939
-1.478778	is called. I	-0.124939
-0.581280	line size. I	-0.124939
-0.491339	understand it. I	-0.124939
-0.491339	recompile it. I	-0.124939
-1.111231	in performance. I	-0.124939
-1.331089	or not. I	-0.124939
-0.566690	next element. I	-0.124939
-0.562238	initialized arrays. I	-0.124939
-0.956217	model number. I	-0.124939
-0.550029	to call. I	-0.124939
-0.804863	new one. I	-0.124939
-0.358570	and manuals. I	-0.124939
-0.358570	optimization manuals. I	-0.124939
-0.526594	performance options. I	-0.124939
-0.526594	reason is, I	-0.124939
-0.526594	reductions manually. I	-0.124939
-0.504359	as expected. I	-0.124939
-0.504359	model. Instead, I	-0.124939
-0.504359	to a. I	-0.124939
-0.658923	own research, I	-0.124939
-0.463038	to use. I	-0.124939
-0.658923	this manual, I	-0.124939
-0.358457	maintenance easier. I	-0.124939
-0.358457	embedded microcontrollers. I	-0.124939
-0.358457	particularly tricky. I	-0.124939
-0.358457	of people. I	-0.124939
-0.358457	being said, I	-0.124939
-0.358457	this chapter, I	-0.124939
-0.358457	0x800 apart. I	-0.124939
-0.358457	dramatic consequences. I	-0.124939
-0.754778	way of making	-0.301030
-0.202970	solution of making	-0.425969
-1.067813	means of making	-0.124939
-0.596339	advice of making	-0.124939
-0.891722	capable of making	-0.124939
-0.601607	two and making	-0.124939
-1.613373	code for making	-0.124939
-1.215554	useful for making	-0.124939
-1.180299	good for making	-0.124939
-0.887065	feature for making	-0.124939
-0.593974	facilities for making	-0.124939
-1.132001	you are making	-0.124939
-1.249480	or by making	-0.124939
-0.965032	code by making	-0.124939
-1.425572	this by making	-0.124939
-0.560586	faster by making	-0.124939
-1.425820	division by making	-0.124939
-0.792751	avoided by making	-0.124939
-0.560586	either by making	-0.124939
-0.560586	misses by making	-0.124939
-0.560586	switches by making	-0.124939
-0.560586	inheritance by making	-0.124939
-0.195502	solved by making	-0.124939
-0.560586	mispredictions by making	-0.124939
-0.560586	mitigated by making	-0.124939
-1.076302	satisfied with making	-0.124939
-0.900790	am not making	-0.124939
-1.930218	faster than making	-0.124939
-2.131346	rather than making	-0.124939
-0.592240	zero than making	-0.124939
-1.001722	it from making	-0.124939
-0.382595	compiler from making	-0.249877
-1.175429	should avoid making	-0.124939
-0.586488	for actually making	-0.124939
-0.573200	you consider making	-0.124939
-0.805602	different places making	-0.124939
-0.065791	have difficulties making	-0.425969
-1.754027	number of times	-0.124939
-1.195929	Number of times	-0.124939
-0.599312	billions of times	-0.124939
-1.068725	or multiple times	-0.124939
-0.586087	way two times	-0.124939
-0.586087	again two times	-0.124939
-0.499219	is many times	-0.124939
-0.499219	function many times	-0.124939
-0.499219	then many times	-0.124939
-0.499219	used many times	-0.124939
-0.081341	how many times	-0.249877
-0.499219	goes many times	-0.124939
-0.890841	that access times	-0.124939
-0.888152	The execution times	-0.124939
-0.564467	package several times	-0.124939
-0.564467	alternatingly several times	-0.124939
-0.592287	reloaded eight times	-0.124939
-1.798994	a few times	-0.124939
-0.588679	= 256 times	-0.124939
-0.535885	and three times	-0.124939
-0.535885	approximately three times	-0.124939
-0.584686	executed 10 times	-0.124939
-1.023707	to 5 times	-0.124939
-0.948592	the response times	-0.124939
-0.174672	long response times	-0.301030
-0.334702	longer response times	-0.124939
-0.572975	repeats 20 times	-0.124939
-1.147453	the subsequent times	-0.124939
-0.566844	improve search times	-0.124939
-0.562552	at random times	-0.124939
-0.383819	a thousand times	-0.124939
-0.550361	takes six times	-0.124939
-0.540849	to seven times	-0.124939
-0.998526	a hundred times	-0.124939
-0.540977	function 250 times	-0.124939
-0.940074	at inconvenient times	-0.124939
-0.107131	repeats 1000 times	-0.124939
-0.726016	at unpredictable times	-0.124939
-0.463221	function ten times	-0.124939
-0.463221	Repeat NumberOfTests times	-0.124939
-0.358600	a million times	-0.124939
-1.758291	to the stack	-0.124939
-2.355425	for the stack	-0.124939
-0.917432	on the stack	-0.204120
-2.051471	from the stack	-0.124939
-2.142492	because the stack	-0.124939
-2.081936	use a stack	-0.124939
-1.201008	up a stack	-0.124939
-0.504491	cases of stack	-0.425969
-0.804277	memory to stack	-0.124939
-1.073497	table to stack	-0.124939
-0.601582	references, and stack	-0.124939
-2.139806	stored in stack	-0.124939
-1.659721	function. The stack	-0.124939
-1.276687	below. The stack	-0.124939
-0.597081	situations: The stack	-0.124939
-0.597081	dependent. The stack	-0.124939
-0.601055	ebx on stack	-0.124939
-0.600231	ebx from stack	-0.124939
-1.626940	floating point stack	-0.602060
-1.720061	is called stack	-0.124939
-0.579793	mechanism called stack	-0.124939
-1.361266	the call stack	-0.124939
-0.541194	the register stack	-0.124939
-1.013164	of register stack	-0.124939
-0.940527	The register stack	-0.124939
-1.136620	point register stack	-0.124939
-1.324061	the standard stack	-0.124939
-0.947718	The standard stack	-0.124939
-0.580479	/EHs- No stack	-0.124939
-0.358787	for "standard stack	-0.124939
-1.377803	arrays and want	-0.124939
-1.358660	you may want	-0.425969
-0.359640	and you want	-0.602060
-0.606659	that you want	-0.346788
-0.755097	if you want	-0.823909
-0.675962	code you want	-0.124939
-0.677934	when you want	-0.425969
-0.473814	program you want	-0.124939
-0.681311	If you want	-0.346788
-0.324294	where you want	-0.124939
-0.866840	example, you want	-0.124939
-0.324294	what you want	-0.124939
-0.473814	150 you want	-0.124939
-0.675962	Whether you want	-0.124939
-1.353120	that we want	-0.124939
-0.543207	function we want	-0.124939
-0.554357	if we want	-0.425969
-1.005325	If we want	-0.124939
-0.889829	manuals. I want	-0.124939
-1.483626	you don't want	-0.124939
-0.536386	compilers. We want	-0.124939
-0.536386	chain. We want	-0.124939
-0.584178	you just want	-0.124939
-0.579202	we still want	-0.124939
-0.434979	developers who want	-0.124939
-0.434979	those who want	-0.124939
-0.933117	the code. Example:	-0.425969
-0.995929	of code. Example:	-0.124939
-0.914442	extra code. Example:	-0.124939
-1.278680	same time. Example:	-0.124939
-0.837787	called function. Example:	-0.124939
-0.568182	pure function. Example:	-0.124939
-1.259946	in memory. Example:	-0.124939
-0.943422	static memory. Example:	-0.124939
-1.431102	are used. Example:	-0.124939
-1.478674	is called. Example:	-0.124939
-0.582682	a loop. Example:	-0.124939
-0.583737	of 2. Example:	-0.425969
-0.579906	consecutive variables. Example:	-0.124939
-1.460450	function calls. Example:	-0.124939
-0.576890	XMM registers. Example:	-0.124939
-0.577037	float variable. Example:	-0.124939
-1.179403	is needed. Example:	-0.124939
-0.572639	detailed instructions. Example:	-0.124939
-1.010884	non-sequential order. Example:	-0.124939
-0.566674	jumps to. Example:	-0.124939
-0.638866	for overflow. Example:	-0.124939
-0.638866	cause overflow. Example:	-0.124939
-0.827177	previous value. Example:	-0.124939
-0.562216	previous branch. Example:	-0.124939
-0.550007	same constant. Example:	-0.124939
-1.122690	branch prediction. Example:	-0.124939
-0.550157	calculated result. Example:	-0.124939
-0.550157	loop counter. Example:	-0.124939
-0.550157	integer counter. Example:	-0.124939
-0.540657	single operation. Example:	-0.124939
-0.997471	is finished. Example:	-0.124939
-0.540657	different ways. Example:	-0.124939
-0.540481	parallel execution. Example:	-0.124939
-0.787900	array elements. Example:	-0.124939
-0.725648	only once. Example:	-0.124939
-0.725648	is limited. Example:	-0.124939
-0.463020	lookup-table static. Example:	-0.124939
-0.658894	is known. Example:	-0.124939
-0.658894	same thing. Example:	-0.124939
-0.463020	independent divisions. Example:	-0.124939
-0.463020	or later. Example:	-0.124939
-0.358443	all zeroes. Example:	-0.124939
-0.358443	be undesired. Example:	-0.124939
-0.358443	bit offsets). Example:	-0.124939
-0.358443	loop overhead. Example:	-0.124939
-0.358443	members individually. Example:	-0.124939
-2.337886	of the Gnu	-0.124939
-1.976352	to the Gnu	-0.425969
-1.798328	and the Gnu	-0.425969
-2.038978	in the Gnu	-0.301030
-2.238477	by the Gnu	-0.124939
-1.681808	with the Gnu	-0.124939
-2.111372	than the Gnu	-0.124939
-1.651001	only the Gnu	-0.124939
-1.557662	while the Gnu	-0.124939
-1.351936	replace the Gnu	-0.124939
-1.413442	Here, the Gnu	-0.124939
-0.803031	Intel and Gnu	-0.124939
-0.897854	PathScale and Gnu	-0.124939
-0.425117	dispatching in Gnu	-0.602060
-1.242889	CPUs. The Gnu	-0.124939
-1.311321	calls. The Gnu	-0.124939
-1.043372	libraries. The Gnu	-0.124939
-1.157001	version. The Gnu	-0.124939
-1.043372	vectorization. The Gnu	-0.124939
-1.043372	3. The Gnu	-0.124939
-0.589792	107). The Gnu	-0.124939
-0.589792	Mac. The Gnu	-0.124939
-0.589792	instead. The Gnu	-0.124939
-0.901617	#else // Gnu	-0.124939
-0.601189	Microsoft or Gnu	-0.124939
-0.404703	compiler Windows Gnu	-0.602060
-0.594443	libraries. Use Gnu	-0.124939
-0.219891	Microsoft, Intel, Gnu	-0.602060
-0.584953	systems. 10 Gnu	-0.124939
-0.573218	PGI PathScale Gnu	-0.124939
-1.115228	32-bit Windows. Gnu	-0.124939
-0.249763	32-bit -fno-builtin Gnu	-0.124939
-0.249763	bit -fno-builtin Gnu	-0.124939
-0.463476	13 Asmlib Gnu	-0.124939
-0.358801	IA-32/Intel64, 2009. Gnu	-0.124939
-0.600330	Development time Some	-0.124939
-0.599709	unnecessary functions Some	-0.124939
-1.273185	program optimization Some	-0.124939
-1.904017	function libraries Some	-0.124939
-1.708972	the code. Some	-0.124939
-1.736770	compile time. Some	-0.124939
-0.890448	Network access Some	-0.124939
-1.211293	64-bit systems. Some	-0.124939
-0.998220	operating systems. Some	-0.124939
-0.872740	all compilers. Some	-0.124939
-0.871643	Optimization directives Some	-0.124939
-0.886060	the compiler. Some	-0.124939
-0.658991	a compiler. Some	-0.124939
-0.658991	Microsoft compiler. Some	-0.124939
-1.435580	the loop. Some	-0.124939
-0.854130	bit mode. Some	-0.124939
-0.577214	16 bytes. Some	-0.124939
-1.184351	Loop unrolling Some	-0.124939
-0.566498	optimal order. Some	-0.124939
-1.277983	memory allocation. Some	-0.124939
-0.566612	option available. Some	-0.124939
-0.566498	technical problems. Some	-0.124939
-1.182661	cache line. Some	-0.124939
-0.562251	intensive applications. Some	-0.124939
-1.143904	is important. Some	-0.124939
-0.540394	avoid them. Some	-0.124939
-0.540394	the division. Some	-0.124939
-0.314387	than two. Some	-0.124939
-0.314387	make two. Some	-0.124939
-0.526489	starts up. Some	-0.124939
-1.008698	programming style. Some	-0.124939
-0.526489	can run. Some	-0.124939
-0.725515	intended for. Some	-0.124939
-0.726009	just-in-time compilation. Some	-0.124939
-0.504259	array sequentially. Some	-0.124939
-0.725515	10.1 Hyperthreading Some	-0.124939
-0.504259	works best. Some	-0.124939
-0.462947	used, though. Some	-0.124939
-0.358385	different places). Some	-0.124939
-0.358385	program logic. Some	-0.124939
-0.358385	time intervals. Some	-0.124939
-0.358385	the STL. Some	-0.124939
-0.358385	Xnu project. Some	-0.124939
-0.358385	accelerator card. Some	-0.124939
-0.358385	pool. Alignment? Some	-0.124939
-0.358385	of redesign. Some	-0.124939
-0.358385	very stupid. Some	-0.124939
-0.358385	Copy protection. Some	-0.124939
-1.728514	because of its	-0.124939
-0.891559	each of its	-0.124939
-0.759818	most of its	-0.602060
-1.539119	member of its	-0.124939
-1.573787	bits of its	-0.124939
-1.351938	values of its	-0.124939
-1.905328	pointer to its	-0.124939
-0.599302	return to its	-0.124939
-0.940077	pointers to its	-0.124939
-0.899750	type and its	-0.124939
-0.600387	side-effects and its	-0.124939
-1.074702	object in its	-0.124939
-0.600562	framework in its	-0.124939
-1.824363	or if its	-0.124939
-0.598271	register if its	-0.124939
-1.739207	replaced by its	-0.124939
-1.070586	constant with its	-0.124939
-0.597741	counter with its	-0.124939
-0.594137	object on its	-0.124939
-1.055833	run on its	-0.124939
-1.757738	based on its	-0.124939
-1.165657	other than its	-0.124939
-2.128828	rather than its	-0.124939
-1.050020	better than its	-0.124939
-1.466707	not have its	-0.124939
-1.059535	should have its	-0.124939
-0.899638	pointer then its	-0.124939
-1.291477	benefit from its	-0.124939
-0.600145	matrix[c][r] at its	-0.124939
-0.579318	thread has its	-0.124939
-0.579318	list has its	-0.124939
-0.579318	model has its	-0.124939
-0.579318	instance has its	-0.124939
-1.072097	cases, but its	-0.124939
-1.993778	make sure its	-0.124939
-1.784356	information about its	-0.124939
-1.172573	each thread its	-0.124939
-0.592706	(1) get its	-0.124939
-0.591979	must calculate its	-0.124939
-0.865678	cannot change its	-0.124939
-0.575181	then handle its	-0.124939
-1.097748	to align its	-0.124939
-0.283131	by type-casting its	-0.124939
-0.358658	fully utilizing its	-0.124939
-0.975365	too much about	-0.124939
-0.564629	worry much about	-0.124939
-0.175582	of information about	-0.124939
-0.175582	have information about	-0.124939
-0.175582	more information about	-0.124939
-0.078977	all information about	-0.425969
-0.175582	no information about	-0.124939
-0.175582	without information about	-0.124939
-0.078977	necessary information about	-0.124939
-0.175582	No information about	-0.124939
-0.175582	full information about	-0.124939
-0.175582	added information about	-0.124939
-0.265641	CPUID information about	-0.124939
-0.078977	gets information about	-0.124939
-0.175582	additional information about	-0.124939
-0.175582	insufficient information about	-0.124939
-0.175582	incomplete information about	-0.124939
-0.587152	can read about	-0.124939
-1.338530	be made about	-0.124939
-0.186321	said here about	-0.124939
-0.575315	The details about	-0.124939
-0.649340	for details about	-0.124939
-0.407957	More details about	-0.124939
-0.172638	do something about	-0.124939
-0.567094	to care about	-0.124939
-0.557514	certain rules about	-0.124939
-0.805121	page 87 about	-0.124939
-0.788128	specific recommendation about	-0.124939
-0.763824	page 43 about	-0.124939
-0.526869	page 26 about	-0.124939
-0.527008	any assumption about	-0.124939
-0.504620	page 137 about	-0.124939
-0.659295	to worry about	-0.124939
-0.463276	but that's about	-0.124939
-0.463276	the hint about	-0.124939
-0.463276	few comments about	-0.124939
-0.358643	useful discussions about	-0.124939
-0.358643	hasn't thought about	-0.124939
-0.358643	a reply about	-0.124939
-0.358643	too worried about	-0.124939
-0.358643	tables". Tips about	-0.124939
-0.358643	processors. Details about	-0.124939
-0.358643	considerable debate about	-0.124939
-1.298375	it is important	-1.079181
-1.166925	It is important	-0.647817
-1.430723	performance is important	-0.124939
-2.940086	can be important	-0.124939
-0.601055	concentrating on important	-0.124939
-1.496519	well as important	-0.124939
-1.189127	is an important	-0.124939
-0.600554	install this important	-0.124939
-0.867005	the more important	-0.124939
-1.954043	is more important	-0.124939
-1.295065	are more important	-0.124939
-0.867005	even more important	-0.124939
-1.343567	the most important	-0.124939
-0.550037	The most important	-0.124939
-1.567311	is so important	-0.124939
-0.824383	is very important	-0.301030
-1.693309	is less important	-0.124939
-0.579229	become less important	-0.124939
-0.595124	them. Some important	-0.124939
-0.594481	important. An important	-0.124939
-1.662558	is therefore important	-0.124939
-1.300904	is too important	-0.124939
-1.196571	are particularly important	-0.124939
-1.699972	it is accessed	-0.124939
-2.546090	It is accessed	-0.124939
-1.117518	object is accessed	-0.124939
-0.895006	variable is accessed	-0.124939
-1.378141	cache and accessed	-0.124939
-1.715383	can be accessed	-0.221849
-2.102609	should be accessed	-0.124939
-0.760038	data are accessed	-0.124939
-0.578885	class are accessed	-0.124939
-0.751959	objects are accessed	-0.301030
-0.321176	elements are accessed	-0.425969
-1.526198	registers are accessed	-0.124939
-0.622142	arrays are accessed	-0.602060
-1.068856	addresses are accessed	-0.124939
-0.364625	rows are accessed	-0.124939
-0.564095	structures are accessed	-0.124939
-0.196257	diagonal are accessed	-0.124939
-1.076749	memory or accessed	-0.124939
-1.893696	is not accessed	-0.425969
-1.495975	even when accessed	-0.124939
-0.494478	Are objects accessed	-0.425969
-1.764279	has been accessed	-0.124939
-1.393635	in fact accessed	-0.124939
-0.541114	is necessarily accessed	-0.124939
-1.428433	speed of CPUs	-0.124939
-1.369413	generation of CPUs	-0.124939
-0.678516	brands of CPUs	-0.124939
-1.490207	function for CPUs	-0.124939
-1.290304	version for CPUs	-0.124939
-1.703170	compatible with CPUs	-0.124939
-0.887369	function on CPUs	-0.124939
-1.452450	only on CPUs	-0.124939
-1.277657	performance on CPUs	-0.124939
-1.392892	many different CPUs	-0.124939
-1.485356	several different CPUs	-0.124939
-1.289024	than other CPUs	-0.124939
-1.626117	with all CPUs	-0.124939
-0.599238	by most CPUs	-0.124939
-1.081138	of Intel CPUs	-0.124939
-0.567899	For Intel CPUs	-0.124939
-0.567899	newer Intel CPUs	-0.124939
-0.567899	current Intel CPUs	-0.124939
-1.281609	with multiple CPUs	-0.124939
-1.000410	use multiple CPUs	-0.124939
-0.574185	Using multiple CPUs	-0.124939
-0.598567	all 64-bit CPUs	-0.124939
-0.595000	bytes. Some CPUs	-0.124939
-0.594246	compiler. Use CPUs	-0.124939
-0.973194	for AMD CPUs	-0.124939
-1.243956	on AMD CPUs	-0.124939
-0.586272	counters Many CPUs	-0.124939
-0.584911	modern x86 CPUs	-0.124939
-1.154114	with old CPUs	-0.124939
-0.510244	because modern CPUs	-0.124939
-0.510244	Most modern CPUs	-0.124939
-0.579127	oldest Pentium CPUs	-0.124939
-0.361868	on non-Intel CPUs	-0.301030
-0.305020	treats non-Intel CPUs	-0.124939
-0.305020	treat non-Intel CPUs	-0.124939
-0.570097	handle current CPUs	-0.124939
-0.283195	CPU Modern CPUs	-0.124939
-0.283195	resources. Modern CPUs	-0.124939
-0.283195	parallel. Modern CPUs	-0.124939
-0.283195	temp2. Modern CPUs	-0.124939
-0.463276	accumulators. Current CPUs	-0.124939
-0.463276	division. Older CPUs	-0.124939
-0.358643	contemporary 106 CPUs	-0.124939
-0.358643	small low-power CPUs	-0.124939
-1.954414	of the function.	-0.221849
-2.008200	to the function.	-0.124939
-1.474212	from the function.	-0.425969
-1.645004	call the function.	-0.124939
-1.917244	inside the function.	-0.124939
-2.599730	of a function.	-0.124939
-2.389682	to a function.	-0.124939
-0.591820	one other function.	-0.124939
-1.673315	any other function.	-0.124939
-1.790960	the library function.	-0.124939
-1.071101	from any function.	-0.124939
-1.149903	the member function.	-0.124939
-0.976325	a member function.	-0.124939
-0.985289	class member function.	-0.124939
-1.120401	virtual member function.	-0.124939
-0.199549	the called function.	-0.124939
-1.156173	the critical function.	-0.301030
-1.465704	the new function.	-0.124939
-1.855482	a single function.	-0.124939
-1.459714	the virtual function.	-0.124939
-1.666318	the next function.	-0.124939
-0.589000	_mm_clflush intrinsic function.	-0.124939
-1.688720	a separate function.	-0.124939
-0.439507	the dispatcher function.	-0.124939
-1.501482	the desired function.	-0.124939
-0.892160	the inlined function.	-0.124939
-0.716068	an inlined function.	-0.124939
-1.354241	error message function.	-0.124939
-0.835536	a pure function.	-0.124939
-0.827786	the memcpy function.	-0.124939
-1.064197	a polymorphic function.	-0.124939
-0.981761	a leaf function.	-0.124939
-0.541030	examples: strlen function.	-0.124939
-0.504640	the ReadTSC function.	-0.124939
-0.463294	anyway. Pure function.	-0.124939
-2.467566	of the extra	-0.124939
-1.754595	do the extra	-0.124939
-0.899874	away the extra	-0.124939
-0.600449	despite the extra	-0.124939
-1.278868	lot of extra	-0.124939
-0.901969	structure. The extra	-0.124939
-0.597099	231. This extra	-0.124939
-0.597099	135). This extra	-0.124939
-0.891640	is an extra	-0.124939
-1.406892	have an extra	-0.124939
-0.830832	makes an extra	-0.124939
-0.564430	add an extra	-0.124939
-0.564430	needs an extra	-0.124939
-0.564430	requires an extra	-0.124939
-0.564430	Such an extra	-0.124939
-0.564430	adds an extra	-0.124939
-1.198028	make this extra	-0.124939
-1.918016	and other extra	-0.124939
-1.504031	is no extra	-0.124939
-0.767072	be no extra	-0.124939
-0.174971	takes no extra	-0.425969
-0.349386	take no extra	-0.124939
-0.528748	generates no extra	-0.124939
-0.584159	table takes extra	-0.124939
-0.584159	again takes extra	-0.124939
-0.530323	take any extra	-0.124939
-0.530323	generate any extra	-0.124939
-0.116357	produce any extra	-0.301030
-0.530323	adding any extra	-0.124939
-1.188171	to some extra	-0.124939
-1.633490	to take extra	-0.124939
-1.420033	may need extra	-0.124939
-1.801670	a few extra	-0.124939
-0.591646	actually add extra	-0.124939
-0.550634	an 9 extra	-0.124939
-0.540961	identification adds extra	-0.124939
-0.504781	compiler inserts extra	-0.124939
-1.817915	function that does	-0.124939
-1.921254	code that does	-0.124939
-1.182126	loop that does	-0.124939
-0.596456	constructor that does	-0.124939
-0.601014	automatically or does	-0.124939
-1.593516	that it does	-0.124939
-0.590617	fact it does	-0.124939
-0.590617	sometimes it does	-0.124939
-0.590617	optimization, it does	-0.124939
-1.076416	Assume function does	-0.124939
-0.597037	profiler. This does	-0.124939
-0.597037	point. This does	-0.124939
-1.458680	the compiler does	-0.124939
-1.899579	The compiler does	-0.124939
-0.842793	This compiler does	-0.124939
-1.090886	Microsoft compiler does	-0.124939
-0.570866	Each compiler does	-0.124939
-1.197582	However, this does	-0.124939
-0.587712	value. It does	-0.124939
-0.587712	check. It does	-0.124939
-0.587712	conversions. It does	-0.124939
-2.322318	the loop does	-0.124939
-1.072031	pointer which does	-0.124939
-1.426083	the pointer does	-0.124939
-1.946717	a pointer does	-0.124939
-0.577462	specific pointer does	-0.124939
-0.598695	IPP library does	-0.124939
-1.972051	the object does	-0.124939
-1.282583	of 2 does	-0.425969
-0.894426	is long does	-0.124939
-0.886965	or thread does	-0.124939
-1.274709	This manual does	-0.124939
-1.165987	the list does	-0.124939
-0.589690	static_cast operator does	-0.124939
-1.555300	CPU dispatcher does	-0.124939
-1.037587	The programmer does	-0.124939
-1.147878	pointer aliasing does	-0.124939
-1.056031	other hand, does	-0.124939
-0.763788	the unit-test does	-0.124939
-0.504600	same argument does	-0.124939
-0.659267	Example 14.26 does	-0.124939
-0.358629	function __intel_cpu_features_init_x() does	-0.124939
-2.919477	in the assembly	-0.124939
-2.177006	at the assembly	-0.124939
-1.818740	makes the assembly	-0.124939
-1.297231	what the assembly	-0.124939
-1.937538	use of assembly	-0.124939
-0.601715	linking to assembly	-0.124939
-0.504183	C++ and assembly	-0.425969
-0.887181	with in assembly	-0.124939
-1.180470	pointer in assembly	-0.124939
-0.594033	d in assembly	-0.124939
-0.887181	allowed in assembly	-0.124939
-0.089554	subroutines in assembly	-0.249877
-1.691840	function. The assembly	-0.124939
-0.898968	output. The assembly	-0.124939
-1.567109	option for assembly	-0.124939
-1.193336	guide for assembly	-0.124939
-0.598436	/Fa for assembly	-0.124939
-0.203415	C++ or assembly	-0.124939
-1.254974	or an assembly	-0.124939
-1.571939	have an assembly	-0.124939
-1.254974	generate an assembly	-0.124939
-1.722918	to use assembly	-0.124939
-0.578967	function using assembly	-0.124939
-1.836209	by using assembly	-0.124939
-0.578967	optimized, using assembly	-0.124939
-1.328344	may need assembly	-0.124939
-0.575554	compilers need assembly	-0.124939
-1.221983	the following assembly	-0.425969
-0.594354	this: Use assembly	-0.124939
-0.594050	testing single assembly	-0.124939
-0.460015	for inline assembly	-0.124939
-0.654186	an inline assembly	-0.124939
-0.460015	use inline assembly	-0.124939
-0.460015	same inline assembly	-0.124939
-0.460015	functions, inline assembly	-0.124939
-0.504877	understand compiler-generated assembly	-0.124939
-0.504741	-static Generate assembly	-0.124939
-0.358729	instruction timing, assembly	-0.124939
-0.358729	The MASM assembly	-0.124939
-3.124501	of the large	-0.124939
-1.596399	avoid the large	-0.124939
-1.809150	object is large	-0.124939
-1.292543	list is large	-0.124939
-1.628293	count is large	-0.124939
-1.945155	is a large	-0.124939
-2.441386	in a large	-0.124939
-1.855026	if a large	-0.124939
-2.146017	as a large	-0.124939
-0.595895	copying a large	-0.124939
-0.595895	least a large	-0.124939
-1.060944	install a large	-0.124939
-0.595895	incur a large	-0.124939
-1.794957	case of large	-0.124939
-0.900822	allocations of large	-0.124939
-1.576969	data in large	-0.124939
-0.600587	contentions in large	-0.425969
-0.598386	inefficient in large	-0.124939
-1.905926	useful for large	-0.124939
-1.488497	optimized for large	-0.124939
-0.901026	applications with large	-0.124939
-1.297894	calculations on large	-0.124939
-0.600940	Gbytes. This large	-0.124939
-1.372336	that use large	-0.124939
-0.887632	block. A large	-0.124939
-0.594262	here: A large	-0.124939
-1.068585	etc. In large	-0.124939
-1.566951	is so large	-0.124939
-1.199930	is very large	-0.124939
-0.651692	a very large	-0.124939
-0.341180	for very large	-0.124939
-0.594336	server. Use large	-0.124939
-1.172822	has several large	-0.124939
-1.170760	and cause large	-0.124939
-1.037437	are too large	-0.124939
-0.829796	to align large	-0.124939
-0.676789	will align large	-0.124939
-0.129419	are sufficiently large	-0.425969
-0.504721	can skip large	-0.124939
-2.023857	that a must	-0.124939
-0.901725	framework that must	-0.124939
-2.197895	then it must	-0.124939
-2.419656	the function must	-0.124939
-0.895366	calling function must	-0.124939
-2.638532	the code must	-0.124939
-1.922660	the compiler must	-0.124939
-1.294924	of x must	-0.124939
-1.965767	then you must	-0.124939
-0.582049	However, you must	-0.124939
-0.582049	problems you must	-0.124939
-0.863951	words, you must	-0.124939
-0.582049	purpose, you must	-0.124939
-0.600343	manually. It must	-0.124939
-2.413800	the program must	-0.124939
-0.883413	The functions must	-0.124939
-1.332595	Mathematical functions must	-0.124939
-0.898765	MOVNTQ instruction must	-0.124939
-0.599563	is, but must	-0.124939
-1.365430	container class must	-0.124939
-0.897175	cannot do must	-0.124939
-1.533483	of i must	-0.124939
-1.839584	an object must	-0.124939
-1.780468	the array must	-0.124939
-0.894977	However, we must	-0.124939
-0.893972	loop branch must	-0.124939
-1.063790	carry bit must	-0.124939
-1.857879	the user must	-0.124939
-1.413557	because they must	-0.124939
-0.595145	said, I must	-0.124939
-1.527561	multiple threads must	-0.124939
-0.591366	inequality sign must	-0.124939
-0.589824	Multithreaded programs must	-0.124939
-0.587791	performance. We must	-0.124939
-0.871459	interface framework must	-0.124939
-0.585310	XMM vectors must	-0.124939
-1.346832	copy constructor must	-0.124939
-0.581634	137 errors must	-0.124939
-0.581457	This index must	-0.124939
-0.578592	This task must	-0.124939
-0.540846	// SIZE must	-0.124939
-0.463038	The pragmas must	-0.124939
-0.463038	The recursion must	-0.124939
-0.658923	if any, must	-0.124939
-0.358457	for correctness must	-0.124939
-3.006882	of the while	-0.124939
-2.905427	in the while	-0.124939
-1.438822	including the while	-0.124939
-0.600659	emulate the while	-0.124939
-1.076765	program, and while	-0.124939
-1.765923	would be while	-0.124939
-1.664397	compile time while	-0.124939
-1.875921	can do while	-0.124939
-2.335595	= 0; while	-0.124939
-1.563818	32 bits while	-0.124939
-0.365696	do calculations while	-0.124939
-0.880753	data files while	-0.124939
-0.590062	all cases, while	-0.124939
-0.583594	frame function, while	-0.124939
-1.399610	the computer while	-0.124939
-0.582951	no overhead while	-0.124939
-1.145231	for Windows, while	-0.124939
-0.575147	modify x, while	-0.124939
-0.840913	are used, while	-0.124939
-0.562170	Pentium 4, while	-0.124939
-0.817269	one vector, while	-0.124939
-1.178263	is called, while	-0.124939
-0.556896	are integers, while	-0.124939
-0.556896	program size, while	-0.124939
-0.556896	and compile-time while	-0.124939
-0.550120	press break while	-0.124939
-1.086913	= 1.0; while	-0.124939
-0.540622	running on, while	-0.124939
-0.540622	do nothing while	-0.124939
-0.964256	multiple threads, while	-0.124939
-0.658837	= string; while	-0.124939
-0.462984	as arguments while	-0.124939
-0.462984	after exceptions: while	-0.124939
-0.658837	bits wide, while	-0.124939
-0.462984	only once, while	-0.124939
-0.358414	been incremented, while	-0.124939
-0.358414	relatively expensive, while	-0.124939
-0.358414	is unchanged, while	-0.124939
-0.358414	for both, while	-0.124939
-0.358414	regular pattern, while	-0.124939
-0.358414	to Func1, while	-0.124939
-0.358414	and flexibility, while	-0.124939
-0.358414	by semicolons, while	-0.124939
-0.358414	the application, while	-0.124939
-0.358414	as intended, while	-0.124939
-2.554286	of a ;	-0.124939
-2.190497	= a ;	-0.124939
-0.600138	; a ;	-0.124939
-1.598852	points to ;	-0.124939
-0.245749	of loop ;	-0.124939
-1.533607	of i ;	-0.124939
-1.219551	of array ;	-0.124939
-0.584495	in array ;	-0.124939
-0.893994	; return ;	-0.124939
-1.065574	by 2 ;	-0.124939
-0.892978	by 4 ;	-0.124939
-0.568914	on stack ;	-0.124939
-0.568914	from stack ;	-0.124939
-1.237323	function name ;	-0.124939
-0.584784	; r ;	-0.124939
-0.575015	if true ;	-0.124939
-0.474103	in ebx ;	-0.124939
-0.474103	$B1$2 ebx ;	-0.124939
-0.846259	; align ;	-0.124939
-0.175116	unused label ;	-0.124939
-0.557503	for ( ;	-0.124939
-0.557503	= Induction ;	-0.124939
-0.763791	ret ALIGN ;	-0.124939
-0.311067	= Induction; ;	-0.124939
-0.504419	Func1(double) pure_function ;	-0.124939
-0.834579	+ esp ;	-0.124939
-0.463093	; Induction++; ;	-0.124939
-0.358500	eax $B2$2 ;	-0.124939
-0.358500	to a[i+2] ;	-0.124939
-0.358500	?Func@@YAXQAHAAH@Z PROCNEAR ;	-0.124939
-0.358500	(32-bit mode): ;	-0.124939
-0.358500	variable 85 ;	-0.124939
-0.358500	PROC NEAR ;	-0.124939
-0.358500	example 8.26b: ;	-0.124939
-0.358500	;eax=addressofa ;edx=addressinr ;	-0.124939
-0.358500	+ sign(i) ;	-0.124939
-0.358500	esp ;alignby4 ;	-0.124939
-0.358500	;a ;r ;	-0.124939
-0.358500	i++ ;checkifi<100 ;	-0.124939
-0.358500	PTR [esp+12] ;	-0.124939
-0.358500	name ;startofFunc ;	-0.124939
-1.989470	that the arrays	-0.425969
-2.362417	when the arrays	-0.124939
-2.250447	If the arrays	-0.124939
-0.937662	sure the arrays	-0.425969
-1.607200	making the arrays	-0.124939
-0.974775	whether the arrays	-0.425969
-0.598636	align the arrays	-0.124939
-1.892855	use of arrays	-0.124939
-2.368549	number of arrays	-0.124939
-1.489007	examples of arrays	-0.124939
-0.897642	alignment of arrays	-0.124939
-1.374298	apply to arrays	-0.124939
-1.495727	applies to arrays	-0.124939
-1.197366	objects and arrays	-0.124939
-0.600362	Objects and arrays	-0.124939
-1.201510	even for arrays	-0.124939
-1.197440	efficient when arrays	-0.124939
-1.659134	can make arrays	-0.124939
-1.898137	for different arrays	-0.124939
-0.599810	6. If arrays	-0.124939
-1.070343	fixed size arrays	-0.124939
-1.047802	as static arrays	-0.124939
-0.586087	large static arrays	-0.124939
-0.836672	of large arrays	-0.124939
-0.567583	several large arrays	-0.124939
-0.566093	that big arrays	-0.124939
-1.104084	have big arrays	-0.124939
-1.799479	a few arrays	-0.124939
-1.458385	to replace arrays	-0.124939
-0.530426	make aligned arrays	-0.124939
-0.530426	three aligned arrays	-0.124939
-0.865698	or global arrays	-0.124939
-1.189115	for accessing arrays	-0.124939
-0.827300	simple variables, arrays	-0.124939
-0.504783	in character arrays	-0.124939
-0.504600	memory. Big arrays	-0.124939
-0.463257	12.5. Aligned arrays	-0.124939
-0.463257	allocate variable-size arrays	-0.124939
-0.463257	4. Align arrays	-0.124939
-0.358629	or clearing arrays	-0.124939
-0.358629	inefficient. Linear arrays	-0.124939
-0.358629	data. Multidimensional arrays	-0.124939
-2.429600	and the work	-0.124939
-2.444484	on the work	-0.124939
-1.755589	when the work	-0.425969
-0.491601	divide the work	-0.301030
-1.712456	amount of work	-0.124939
-1.567359	function to work	-0.124939
-1.928765	compiler to work	-0.124939
-1.911590	way to work	-0.124939
-1.412457	sure to work	-0.124939
-1.950201	likely to work	-0.124939
-1.063969	intended to work	-0.124939
-0.596930	impossible to work	-0.124939
-0.601464	Keywords that work	-0.124939
-1.376935	make it work	-0.124939
-1.850824	may not work	-0.124939
-1.952230	does not work	-0.124939
-0.900399	this may work	-0.124939
-1.197632	make this work	-0.124939
-1.542460	code will work	-0.124939
-0.594147	model will work	-0.124939
-0.898248	is other work	-0.124939
-0.599552	functions should work	-0.124939
-0.598410	methods also work	-0.124939
-1.666360	not always work	-0.124939
-0.595845	trivial programming work	-0.124939
-1.265933	the extra work	-0.124939
-0.888394	code versions work	-0.124939
-0.796221	it doesn't work	-0.124939
-0.491255	method doesn't work	-0.124939
-0.491255	line doesn't work	-0.124939
-0.491255	above doesn't work	-0.124939
-0.588734	next model work	-0.124939
-1.305910	software development work	-0.124939
-0.530303	Gnu directives work	-0.124939
-0.530303	Microsoft directives work	-0.124939
-0.575212	as little work	-0.124939
-0.846725	in BSD work	-0.124939
-0.562612	some heavy work	-0.124939
-0.526869	how caches work	-0.124939
-0.504620	deleted. User work	-0.124939
-0.358643	the reinstallation work	-0.124939
-1.598603	points to (see	-0.124939
-2.611500	the compiler (see	-0.124939
-0.897660	preceding one (see	-0.124939
-1.380060	data cache (see	-0.124939
-0.879553	no cache (see	-0.124939
-1.365297	derived class (see	-0.124939
-1.486024	and double (see	-0.124939
-1.288547	smart pointer (see	-0.124939
-1.719627	less efficient (see	-0.124939
-1.526626	induction variables (see	-0.124939
-1.597373	a register (see	-0.124939
-1.534042	in registers (see	-0.124939
-1.481920	XMM registers (see	-0.124939
-1.342508	by 16 (see	-0.124939
-1.916679	operating system (see	-0.124939
-1.267189	vector instructions (see	-0.124939
-0.889670	non-Intel processors (see	-0.124939
-1.760140	a constant (see	-0.124939
-1.758838	the stack (see	-0.124939
-0.594099	where necessary (see	-0.124939
-1.537582	unsigned integers (see	-0.124939
-0.885316	point precision (see	-0.124939
-1.332165	linked list (see	-0.124939
-1.047582	or 1 (see	-0.124939
-0.880583	cycle counter (see	-0.124939
-1.034456	point expressions (see	-0.124939
-1.216698	of range (see	-0.124939
-0.581793	as intended (see	-0.124939
-0.443946	automatic vectorization (see	-0.425969
-1.122405	Bounds checking (see	-0.124939
-0.569782	using templates (see	-0.124939
-1.393802	critical stride (see	-0.124939
-1.267386	dependency chains (see	-0.124939
-0.566446	quite time-consuming (see	-0.124939
-0.810700	pointer aliasing (see	-0.124939
-0.450222	out aliasing (see	-0.124939
-1.062804	branch prediction (see	-0.124939
-0.562193	a profiling (see	-0.124939
-1.046317	out-of-order capabilities (see	-0.124939
-1.106103	the throughput (see	-0.124939
-0.504319	of mispredictions (see	-0.124939
-0.463002	be profitable (see	-0.124939
-0.358428	automatic CPU-dispatching (see	-0.124939
-0.358428	the devirtualization (see	-0.124939
-2.367435	is the Windows	-0.124939
-2.961288	in the Windows	-0.124939
-1.707926	In the Windows	-0.124939
-2.084574	for a Windows	-0.124939
-0.900983	made a Windows	-0.124939
-1.008408	Linux and Windows	-0.425969
-0.597247	7 and Windows	-0.124939
-0.893517	DOS and Windows	-0.124939
-0.597247	(ATL) and Windows	-0.124939
-1.074801	supported in Windows	-0.124939
-0.600595	(OnIdle in Windows	-0.124939
-0.899009	performance. The Windows	-0.124939
-0.600015	BSD. The Windows	-0.124939
-1.005182	compiler for Windows	-0.124939
-0.889942	compilers for Windows	-0.124939
-0.889942	library for Windows	-0.124939
-1.341905	compiling for Windows	-0.124939
-0.601296	BigArray[1024]; // Windows	-0.124939
-0.601038	cases on Windows	-0.124939
-1.050514	Intel compiler Windows	-0.301030
-0.122346	MS compiler Windows	-0.602060
-0.962478	of 64-bit Windows	-0.124939
-0.930143	and 64-bit Windows	-0.425969
-1.576912	in 64-bit Windows	-0.124939
-0.527940	than 64-bit Windows	-0.124939
-0.527940	efficient. 64-bit Windows	-0.124939
-0.527940	different. 64-bit Windows	-0.124939
-0.978535	and 32-bit Windows	-0.124939
-1.708571	in 32-bit Windows	-0.124939
-1.395629	for 32-bit Windows	-0.124939
-1.634322	64 bit Windows	-0.124939
-0.685194	in both Windows	-0.425969
-0.540961	input. (In Windows	-0.124939
-0.764116	Linux, BSD, Windows	-0.124939
-0.504781	discontinued Object Windows	-0.124939
-0.659525	_WIN64 _LP64 Windows	-0.124939
-0.358758	current position. Windows	-0.124939
-1.710225	between the calls	-0.124939
-1.596281	avoid the calls	-0.124939
-2.460657	number of calls	-0.124939
-1.443769	times and calls	-0.124939
-1.164686	function that calls	-0.124939
-0.930138	program that calls	-0.124939
-0.594828	statement that calls	-0.124939
-0.853120	of function calls	-0.124939
-0.647297	and function calls	-0.124939
-0.560029	on function calls	-0.124939
-0.822739	make function calls	-0.124939
-0.963620	many function calls	-0.124939
-0.195382	makes function calls	-0.124939
-0.560029	extra function calls	-0.124939
-1.322970	virtual function calls	-0.124939
-1.152899	pure function calls	-0.124939
-0.560029	Even function calls	-0.124939
-1.055989	dispatched function calls	-0.124939
-0.560029	61 function calls	-0.124939
-0.560029	nested function calls	-0.124939
-0.601072	loops by calls	-0.124939
-1.569641	and then calls	-0.124939
-0.594088	which then calls	-0.124939
-0.897681	contains no calls	-0.124939
-1.424545	has many calls	-0.124939
-0.595744	statement always calls	-0.124939
-0.595746	with system calls	-0.124939
-0.592606	Functions Function calls	-0.124939
-1.051225	AVX support calls	-0.124939
-1.166066	program contains calls	-0.124939
-0.587852	below. Make calls	-0.124939
-0.581990	in turn calls	-0.124939
-0.659582	if F1 calls	-0.124939
-0.463458	If F1 calls	-0.124939
-0.788424	example 16.2 calls	-0.124939
-0.540852	"function". Multiple calls	-0.124939
-0.884235	the loader calls	-0.124939
-0.526932	error handler calls	-0.124939
-0.504680	standard API calls	-0.124939
-0.358686	of jumps, calls	-0.124939
-2.617670	to the calculations	-0.124939
-2.463122	on the calculations	-0.124939
-1.368389	through the calculations	-0.124939
-1.073230	overlap the calculations	-0.124939
-1.073230	finished the calculations	-0.124939
-0.899113	redo the calculations	-0.124939
-0.425111	sequence of calculations	-0.124939
-0.901867	error. The calculations	-0.124939
-1.742888	depends on calculations	-0.124939
-1.488777	with other calculations	-0.124939
-1.105799	floating point calculations	-0.124939
-1.425088	Floating point calculations	-0.124939
-1.071298	simple integer calculations	-0.124939
-1.906720	to do calculations	-0.124939
-1.152461	can do calculations	-0.425969
-0.243535	doing multiple calculations	-0.301030
-0.597944	doing some calculations	-0.124939
-0.579801	these address calculations	-0.124939
-0.579801	runtime address calculations	-0.124939
-1.395513	the necessary calculations	-0.124939
-1.042570	double precision calculations	-0.124939
-0.883036	when doing calculations	-0.124939
-0.591197	are: All calculations	-0.124939
-0.881846	that certain calculations	-0.124939
-0.591088	if intermediate calculations	-0.124939
-0.788713	common mathematical calculations	-0.124939
-0.541117	mix mathematical calculations	-0.124939
-1.281738	to start calculations	-0.124939
-0.572915	doing parallel calculations	-0.124939
-0.562605	heavy background calculations	-0.124939
-0.562739	pointer arithmetic calculations	-0.124939
-0.817941	time- consuming calculations	-0.124939
-1.190509	all code versions	-0.124939
-0.597475	multiple code versions	-0.124939
-1.956917	Intel compiler versions	-0.124939
-0.596384	following compiler versions	-0.124939
-1.060374	or more versions	-0.425969
-0.656586	the different versions	-0.301030
-1.149640	in different versions	-0.124939
-0.465717	The different versions	-0.425969
-0.538196	if different versions	-0.124939
-1.149640	with different versions	-0.124939
-0.538196	If different versions	-0.124939
-1.193804	two different versions	-0.124939
-0.598755	get library versions	-0.124939
-0.198626	in multiple versions	-0.346788
-0.340253	make multiple versions	-0.124939
-0.508312	making multiple versions	-0.124939
-0.508312	generate multiple versions	-0.124939
-1.301642	are two versions	-0.124939
-0.848984	make two versions	-0.124939
-1.101908	these two versions	-0.124939
-0.596245	advertise new versions	-0.124939
-0.595050	Hyperthreading Some versions	-0.124939
-1.181769	the compiled versions	-0.124939
-0.887127	have several versions	-0.124939
-0.593224	includes optimized versions	-0.124939
-0.587627	all three versions	-0.124939
-0.578901	make special versions	-0.124939
-0.572889	16. Library versions	-0.124939
-0.570233	in newer versions	-0.124939
-0.557566	The latest versions	-0.124939
-0.463349	the CPU-specific versions	-0.124939
-0.065776	CPUs. New versions	-0.425969
-0.463349	as command-line versions	-0.124939
-0.463349	necessary. Fast versions	-0.124939
-0.358701	Free trial versions	-0.124939
-2.867165	of the execution	-0.124939
-2.426894	on the execution	-0.124939
-0.203573	block the execution	-0.425969
-1.431049	give the execution	-0.124939
-1.627344	improve the execution	-0.124939
-0.203573	down the execution	-0.124939
-0.804517	terms of execution	-0.425969
-0.601658	relation to execution	-0.124939
-1.371723	cache and execution	-0.124939
-0.600349	compactness, and execution	-0.124939
-1.958944	} The execution	-0.124939
-1.072770	access. The execution	-0.124939
-1.499867	optimized for execution	-0.124939
-0.676892	CPUs with execution	-0.425969
-1.924901	of an execution	-0.124939
-0.899940	often have execution	-0.124939
-1.061380	in program execution	-0.124939
-0.885289	during program execution	-0.124939
-1.593996	the different execution	-0.124939
-1.060274	use different execution	-0.124939
-2.567142	floating point execution	-0.124939
-0.599084	Half size execution	-0.124939
-0.598546	only 64-bit execution	-0.124939
-0.598298	used where execution	-0.124939
-0.372003	of order execution	-0.124939
-0.594548	flexibility, while execution	-0.124939
-0.593906	between several execution	-0.124939
-0.881823	for optimizing execution	-0.124939
-1.154537	of their execution	-0.124939
-0.394682	the out-of-order execution	-0.425969
-0.430998	to out-of-order execution	-0.124939
-0.727749	the total execution	-0.124939
-0.575396	full 128-bit execution	-0.124939
-0.575340	occurs during execution	-0.124939
-0.562572	The fastest execution	-0.124939
-0.504580	that delays execution	-0.124939
-0.358615	with full-size execution	-0.124939
-0.358615	threads. Out-of-order execution	-0.124939
-1.932152	is to avoid	-0.124939
-1.889854	have to avoid	-0.124939
-1.138307	used to avoid	-0.124939
-0.584649	size to avoid	-0.124939
-1.970308	possible to avoid	-0.124939
-1.293015	order to avoid	-0.124939
-1.750261	way to avoid	-0.124939
-1.096296	how to avoid	-0.124939
-1.028911	system to avoid	-0.124939
-0.584649	type to avoid	-0.124939
-1.810398	want to avoid	-0.124939
-1.819559	able to avoid	-0.124939
-0.541148	ways to avoid	-0.124939
-0.868938	models to avoid	-0.124939
-0.584649	situations to avoid	-0.124939
-0.868938	measurements to avoid	-0.124939
-0.584649	unrolled to avoid	-0.124939
-1.202564	possible, and avoid	-0.124939
-1.103404	you can avoid	-0.124939
-1.563443	we can avoid	-0.124939
-0.963659	You can avoid	-0.301030
-1.846139	compiler may avoid	-0.124939
-1.378120	You may avoid	-0.124939
-2.020317	if you avoid	-0.124939
-1.345558	as you avoid	-0.124939
-0.761289	you should avoid	-0.124939
-1.272615	You should avoid	-0.124939
-1.486118	you cannot avoid	-0.124939
-1.452070	You cannot avoid	-0.124939
-0.879444	may preferably avoid	-0.124939
-0.588045	all means avoid	-0.124939
-1.777251	and the result	-0.124939
-2.306579	for the result	-0.124939
-2.339795	that the result	-0.124939
-1.572271	on the result	-0.602060
-2.258820	when the result	-0.124939
-2.103644	because the result	-0.124939
-0.594513	return the result	-0.124939
-1.573305	sure the result	-0.124939
-1.492681	store the result	-0.124939
-1.174688	see the result	-0.124939
-0.594513	needs the result	-0.124939
-0.739147	give the result	-0.124939
-0.888125	convert the result	-0.124939
-0.089591	Store the result	-0.726999
-1.264542	stores the result	-0.124939
-2.069997	for a result	-0.124939
-1.629549	as a result	-0.425969
-1.178920	2. The result	-0.124939
-0.890307	fast. The result	-0.124939
-0.595622	profiler. The result	-0.124939
-0.595622	ecx+eax*4. The result	-0.124939
-0.595622	<. The result	-0.124939
-1.370900	i); // result	-0.124939
-0.598934	c.load(cc+i); // result	-0.124939
-0.600566	stores this result	-0.124939
-2.634777	the same result	-0.124939
-2.018652	the first result	-0.124939
-0.591775	; store result	-0.124939
-0.560546	the intermediate result	-0.124939
-1.234064	a better result	-0.124939
-1.527848	the second result	-0.124939
-0.784046	the final result	-0.124939
-1.239996	// Store result	-0.124939
-0.527101	the 33 result	-0.124939
-0.726484	the correct result	-0.124939
-2.573724	that the processor	-0.124939
-1.795092	if the processor	-0.301030
-2.358059	by the processor	-0.124939
-2.443608	on the processor	-0.124939
-1.655745	whether the processor	-0.124939
-1.295366	Such a processor	-0.124939
-0.900816	Whenever a processor	-0.124939
-0.942570	list of processor	-0.425969
-1.368882	know that processor	-0.124939
-0.898523	Assuming that processor	-0.124939
-1.295473	best on processor	-0.124939
-0.900742	simultaneously. This processor	-0.124939
-2.356361	rather than processor	-0.124939
-1.533950	the same processor	-0.425969
-0.735382	of which processor	-0.425969
-1.943268	for each processor	-0.124939
-1.068725	the multiple processor	-0.124939
-0.597942	time, any processor	-0.124939
-1.928118	a new processor	-0.124939
-0.887039	of specific processor	-0.124939
-0.564676	The virtual processor	-0.124939
-0.831287	A virtual processor	-0.124939
-0.592460	don't support processor	-0.124939
-1.788136	a particular processor	-0.124939
-1.665304	the next processor	-0.124939
-0.874509	and better processor	-0.124939
-0.582895	or VIA processor	-0.124939
-0.577271	A non-Intel processor	-0.124939
-0.575268	one logical processor	-0.124939
-0.129393	a soft processor	-0.124939
-0.526805	a hyperthreading processor	-0.124939
-0.463221	dedicated physics processor	-0.124939
-0.463221	a word processor	-0.124939
-0.358600	Core i7 processor	-0.124939
-0.358600	The Core2 processor	-0.124939
-0.358600	Pentium M processor	-0.124939
-3.024170	of the compiled	-0.124939
-2.645918	that the compiled	-0.124939
-1.372682	not the compiled	-0.124939
-1.818804	makes the compiled	-0.124939
-1.338605	and is compiled	-0.124939
-1.590104	that is compiled	-0.124939
-2.209974	function is compiled	-0.124939
-1.175988	code is compiled	-0.249877
-1.331890	program is compiled	-0.124939
-0.594843	D is compiled	-0.124939
-1.298456	file and compiled	-0.124939
-1.538633	useful in compiled	-0.124939
-0.940891	implemented in compiled	-0.124939
-2.339969	to be compiled	-0.124939
-2.248307	should be compiled	-0.124939
-1.187512	possibly be compiled	-0.124939
-1.194096	code are compiled	-0.124939
-0.599532	main() are compiled	-0.124939
-1.852973	of code compiled	-0.124939
-0.590549	with code compiled	-0.124939
-0.201793	mixing code compiled	-0.124939
-1.057516	than when compiled	-0.124939
-0.594718	process when compiled	-0.124939
-1.917695	and other compiled	-0.124939
-0.599312	necessary, each compiled	-0.124939
-1.575830	shared object compiled	-0.124939
-0.597041	are first compiled	-0.124939
-0.547137	and programs compiled	-0.124939
-1.016814	in programs compiled	-0.124939
-0.394062	the directly compiled	-0.124939
-0.394062	as directly compiled	-0.124939
-0.394062	C++, directly compiled	-0.124939
-0.566899	a fully compiled	-0.124939
-0.764080	Example 8.26a compiled	-0.124939
-0.527119	are normally compiled	-0.124939
-0.659496	Example 8.26b compiled	-0.124939
-0.358744	be cross- compiled	-0.124939
-0.600180	element } An	-0.124939
-0.599719	inline functions An	-0.124939
-1.421001	in C++ An	-0.124939
-1.287997	Induction variables An	-0.124939
-0.596344	application code. An	-0.124939
-0.596322	each time. An	-0.124939
-1.693714	vector operations An	-0.124939
-0.882870	Overloaded operators An	-0.124939
-0.591367	memory store An	-0.124939
-0.590663	first program. An	-0.124939
-1.527716	is used. An	-0.124939
-1.019894	some cases. An	-0.124939
-1.104314	container classes. An	-0.124939
-0.834279	caching inefficient. An	-0.124939
-1.199783	is executed. An	-0.124939
-0.826625	the arrays. An	-0.124939
-0.562147	gives zero. An	-0.124939
-1.143954	is important. An	-0.124939
-0.938507	mentioned above. An	-0.124939
-0.390648	single result. An	-0.124939
-0.390648	negative result. An	-0.124939
-1.040952	constant propagation An	-0.124939
-0.787808	7.10 Arrays An	-0.124939
-0.958598	is declared. An	-0.124939
-0.526739	= s; An	-0.124939
-0.763205	in www.agner.org/optimize/cppexamples.zip. An	-0.124939
-0.725548	7.22 Inheritance An	-0.124939
-0.725548	7.4 Enums An	-0.124939
-0.504279	is inlined. An	-0.124939
-0.726031	become fragmented. An	-0.124939
-0.504279	below. Devirtualization An	-0.124939
-0.462965	VIA CPUs: An	-0.124939
-0.658808	in C++: An	-0.124939
-0.658808	same thing. An	-0.124939
-0.358400	memory leak. An	-0.124939
-0.358400	to +127. An	-0.124939
-0.358400	operating systems"). An	-0.124939
-0.358400	char pointers). An	-0.124939
-0.358400	of matrices. An	-0.124939
-0.358400	C99 standard. An	-0.124939
-0.358400	bounds checking). An	-0.124939
-0.358400	assembly language: An	-0.124939
-0.358400	it supports. An	-0.124939
-0.358400	page 27. An	-0.124939
-1.724986	} // Use	-0.124939
-1.063211	... // Use	-0.124939
-0.887120	zero); // Use	-0.124939
-0.594002	2.5; // Use	-0.124939
-0.601176	copying it Use	-0.124939
-1.658916	intrinsic functions Use	-0.124939
-1.568271	assembly language Use	-0.124939
-0.593468	cards, etc. Use	-0.124939
-0.985077	the data. Use	-0.124939
-1.228115	of data. Use	-0.124939
-0.873009	all compilers. Use	-0.124939
-0.866886	Intel compiler. Use	-0.124939
-0.583152	for example: Use	-0.124939
-0.581493	satisfied: 1. Use	-0.124939
-0.580215	PathScale. 2. Use	-0.124939
-1.104713	function libraries. Use	-0.124939
-1.285379	if possible. Use	-0.124939
-1.003166	object file. Use	-0.124939
-0.557614	do this: Use	-0.124939
-1.161682	for details. Use	-0.124939
-0.540546	common names. Use	-0.124939
-0.540546	library files. Use	-0.124939
-0.915875	floating point. Use	-0.124939
-0.526636	performance reasons. Use	-0.124939
-0.763424	not optimal. Use	-0.124939
-0.883562	this option. Use	-0.124939
-0.527031	vectorization. 3. Use	-0.124939
-0.504649	are implemented. Use	-0.124939
-0.504399	contentions expected. Use	-0.124939
-0.834536	hot spot. Use	-0.124939
-0.143259	16 3.2 Use	-0.124939
-0.143259	improved. 3.2 Use	-0.124939
-0.143259	zero. 14.3 Use	-0.124939
-0.143259	134 14.3 Use	-0.124939
-0.143259	132 14.1 Use	-0.124939
-0.143259	topics 14.1 Use	-0.124939
-0.463075	test server. Use	-0.124939
-0.658980	cache contentions. Use	-0.124939
-0.463075	calls. 48 Use	-0.124939
-0.358486	symbolic link. Use	-0.124939
-0.358486	assembly listing. Use	-0.124939
-0.358486	Example 7.34a. Use	-0.124939
-0.358486	allows "__attribute__((visibility("hidden")))". Use	-0.124939
-0.358486	supports this). Use	-0.124939
-0.601748	string of bytes	-0.124939
-0.485834	is 4 bytes	-0.124939
-0.506723	4 4 bytes	-0.425969
-0.485834	Store 4 bytes	-0.124939
-0.080045	unlimited 4 bytes	-0.726999
-0.820966	of 8 bytes	-0.124939
-0.670562	and 8 bytes	-0.124939
-0.470416	takes 8 bytes	-0.124939
-0.470416	structure 8 bytes	-0.124939
-0.470416	Store 8 bytes	-0.124939
-0.078506	unlimited 8 bytes	-0.726999
-0.893125	typically 64 bytes	-0.124939
-0.706133	to 16 bytes	-0.124939
-0.492511	than 16 bytes	-0.124939
-0.492511	test 16 bytes	-0.124939
-0.111310	Store 16 bytes	-0.602060
-0.714779	is 128 bytes	-0.124939
-0.714779	first 128 bytes	-0.124939
-0.181164	strlen 128 bytes	-0.124939
-0.112645	of unused bytes	-0.124939
-0.265147	2 unused bytes	-0.124939
-0.112645	4 unused bytes	-0.124939
-0.112645	6 unused bytes	-0.124939
-0.582050	64 consecutive bytes	-0.124939
-0.567021	than 65 bytes	-0.124939
-0.557507	than 127 bytes	-0.124939
-0.557507	Type size, bytes	-0.124939
-0.463476	= 2048 bytes	-0.124939
-0.463476	or 0x40 bytes	-0.124939
-0.358801	array 800 bytes	-0.124939
-0.358801	bytes alignment, bytes	-0.124939
-3.050486	in the big	-0.124939
-1.624888	that is big	-0.425969
-2.335223	function is big	-0.124939
-1.622101	size is big	-0.124939
-2.456878	is a big	-0.124939
-1.910035	of a big	-0.124939
-2.015520	in a big	-0.124939
-0.893661	out a big	-0.124939
-1.185463	requires a big	-0.124939
-0.902330	deallocation of big	-0.124939
-1.376747	arrays and big	-0.124939
-1.202946	done in big	-0.124939
-1.595859	only for big	-0.124939
-0.601477	recommended that big	-0.124939
-1.842535	based on big	-0.124939
-1.075939	compiled code big	-0.124939
-1.525817	that have big	-0.124939
-1.493169	may have big	-0.124939
-1.135207	you have big	-0.124939
-1.372200	that use big	-0.124939
-0.899342	disk. A big	-0.124939
-1.916092	and other big	-0.124939
-1.515052	in one big	-0.124939
-1.022331	has one big	-0.124939
-0.582272	allocate one big	-0.124939
-2.049583	is no big	-0.124939
-1.467049	are no big	-0.124939
-1.454023	is so big	-0.124939
-1.212664	are so big	-0.124939
-1.443209	a very big	-0.124939
-0.551802	to very big	-0.124939
-0.551802	arrays very big	-0.124939
-0.551802	made very big	-0.124939
-0.596811	doubt how big	-0.124939
-1.082396	is too big	-0.124939
-0.905230	are too big	-0.124939
-0.577433	comparison. On big	-0.124939
-1.288811	or writing big	-0.124939
-0.557304	come. Even big	-0.124939
-0.358672	of yesterday's big	-0.124939
-1.936618	code and doesn't	-0.124939
-0.912624	function that doesn't	-0.301030
-1.270256	size that doesn't	-0.124939
-0.591558	solution that doesn't	-0.124939
-0.591558	dispatcher that doesn't	-0.124939
-0.591558	style that doesn't	-0.124939
-1.181793	that it doesn't	-0.249877
-1.005702	because it doesn't	-0.249877
-1.221428	but it doesn't	-0.124939
-1.003125	case it doesn't	-0.124939
-1.277195	the compiler doesn't	-0.425969
-1.428164	The compiler doesn't	-0.124939
-0.566811	chosen compiler doesn't	-0.124939
-0.600395	atomic. It doesn't	-0.124939
-2.271318	the CPU doesn't	-0.124939
-1.072977	CPUID instruction doesn't	-0.124939
-1.193795	signed integer doesn't	-0.124939
-0.897631	A class doesn't	-0.124939
-2.029355	the size doesn't	-0.124939
-1.973349	the object doesn't	-0.124939
-1.470416	this method doesn't	-0.124939
-1.267550	the error doesn't	-0.124939
-1.054652	integer overflow doesn't	-0.124939
-0.593427	above line doesn't	-0.124939
-0.880685	if above doesn't	-0.124939
-1.697062	the microprocessor doesn't	-0.124939
-0.580448	store operation doesn't	-0.124939
-0.562791	that volatile doesn't	-0.124939
-0.540917	manual currently doesn't	-0.124939
-1.647468	if the threads	-0.124939
-2.300266	because the threads	-0.124939
-1.922754	use of threads	-0.124939
-2.429677	number of threads	-0.124939
-0.601499	28) The threads	-0.124939
-0.601496	counts for threads	-0.124939
-1.298324	and that threads	-0.124939
-0.601145	processes or threads	-0.124939
-1.691750	or more threads	-0.124939
-1.182602	no more threads	-0.124939
-1.156111	The different threads	-0.124939
-0.585460	that different threads	-0.124939
-1.355922	between different threads	-0.124939
-1.380796	in other threads	-0.124939
-1.447669	no other threads	-0.124939
-0.599656	when all threads	-0.124939
-1.286473	divided into threads	-0.124939
-0.181284	if multiple threads	-0.124939
-0.181284	by multiple threads	-0.124939
-1.183815	into multiple threads	-0.124939
-1.086904	between multiple threads	-0.124939
-0.715601	avoid multiple threads	-0.124939
-0.498275	running multiple threads	-0.124939
-0.498275	Define multiple threads	-0.124939
-0.498275	Running multiple threads	-0.124939
-0.786766	If two threads	-0.124939
-0.540015	making two threads	-0.124939
-0.190982	run two threads	-0.425969
-0.540015	running two threads	-0.124939
-0.540015	prevent two threads	-0.124939
-0.371953	communication between threads	-0.124939
-0.592398	run eight threads	-0.124939
-0.788635	in separate threads	-0.124939
-0.541073	into separate threads	-0.124939
-0.504781	core. Two threads	-0.124939
-0.463422	and high-priority threads	-0.124939
-2.201346	is the best	-0.124939
-2.145957	of the best	-0.301030
-2.392976	to the best	-0.124939
-2.253968	and the best	-0.124939
-2.554833	in the best	-0.124939
-2.195241	with the best	-0.124939
-1.342779	not the best	-0.124939
-1.541353	have the best	-0.124939
-2.122139	use the best	-0.124939
-1.690003	do the best	-0.124939
-0.595593	model the best	-0.124939
-1.455925	find the best	-0.124939
-1.060062	Obviously, the best	-0.124939
-0.890251	select the best	-0.124939
-0.890251	choosing the best	-0.124939
-0.595593	Sometimes, the best	-0.124939
-0.595593	provide the best	-0.124939
-2.683411	It is best	-0.124939
-1.492283	language is best	-0.124939
-1.733739	solution is best	-0.124939
-1.481696	code. The best	-0.124939
-1.304074	calls. The best	-0.124939
-1.151909	available. The best	-0.124939
-0.876200	system. The best	-0.124939
-0.876200	case. The best	-0.124939
-0.876200	machine. The best	-0.124939
-0.876200	compilers). The best	-0.124939
-0.588407	shows. The best	-0.124939
-0.588407	duration. The best	-0.124939
-0.588407	flaws: The best	-0.124939
-1.298660	version for best	-0.124939
-2.300309	that are best	-0.124939
-1.404995	to work best	-0.124939
-0.498711	that works best	-0.425969
-0.476305	vectorization works best	-0.124939
-0.175887	"what works best	-0.425969
-0.152947	what fits best	-0.425969
-0.527228	that performs best	-0.124939
-2.604739	if the necessary	-0.124939
-1.579669	have the necessary	-0.124939
-1.973349	all the necessary	-0.124939
-1.744983	do the necessary	-0.124939
-1.375640	does the necessary	-0.124939
-1.072342	support the necessary	-0.124939
-0.599768	lack the necessary	-0.124939
-1.310071	it is necessary	-0.778151
-2.297455	This is necessary	-0.124939
-1.980796	It is necessary	-0.425969
-0.886150	unit-testing is necessary	-0.124939
-1.713273	amount of necessary	-0.124939
-1.444448	problems and necessary	-0.124939
-2.590396	can be necessary	-0.124939
-1.863027	not be necessary	-0.124939
-1.800537	may be necessary	-0.425969
-0.601440	updates are necessary	-0.124939
-1.597557	makes it necessary	-0.124939
-1.350869	is not necessary	-0.221849
-1.801934	are not necessary	-0.124939
-0.895866	checks where necessary	-0.124939
-0.598163	calling any necessary	-0.124939
-1.139382	is often necessary	-0.425969
-0.710384	is therefore necessary	-0.602060
-1.229737	is rarely necessary	-0.124939
-0.504861	the 124 necessary	-0.124939
-1.978709	address of element	-0.124939
-1.547665	used by element	-0.124939
-0.203261	swapped with element	-0.124939
-0.900564	access an element	-0.124939
-1.064766	the vector element	-0.425969
-1.865083	only one element	-0.124939
-0.613819	to each element	-0.425969
-1.344204	for each element	-0.124939
-0.189772	AND each element	-0.425969
-0.534638	i.e. each element	-0.124939
-0.189772	Compare each element	-0.425969
-0.964524	the array element	-0.124939
-1.033960	of array element	-0.124939
-1.374771	an array element	-0.124939
-0.776041	each array element	-0.124939
-0.533906	Output array element	-0.124939
-0.597755	each table element	-0.124939
-1.067567	the first element	-0.301030
-1.929993	a new element	-0.124939
-0.594881	this extra element	-0.124939
-0.592030	only calculate element	-0.124939
-1.048966	for every element	-0.124939
-1.368032	the last element	-0.124939
-0.530379	diagonal. Each element	-0.124939
-0.530379	list. Each element	-0.124939
-0.698640	cycles per element	-0.124939
-0.090143	Time per element	-0.301030
-0.073751	numerically largest element	-0.301030
-0.726283	the nearest element	-0.124939
-0.463367	extra dummy element	-0.124939
-0.358715	we reach element	-0.124939
-1.378457	also a language	-0.124939
-0.900133	But this language	-0.124939
-0.899646	important. A language	-0.124939
-0.551266	the C++ language	-0.124939
-0.549899	The C++ language	-0.425969
-1.429118	different C++ language	-0.124939
-0.576953	the programming language	-0.124939
-0.409071	a programming language	-0.124939
-0.467535	of programming language	-0.124939
-0.409071	which programming language	-0.124939
-0.409071	C++ programming language	-0.124939
-0.576953	software programming language	-0.124939
-0.409071	particular programming language	-0.124939
-0.409071	preferred programming language	-0.124939
-0.379655	of assembly language	-0.124939
-0.379655	to assembly language	-0.124939
-0.750677	in assembly language	-0.124939
-0.601133	for assembly language	-0.124939
-0.534318	or assembly language	-0.124939
-0.601133	an assembly language	-0.124939
-0.277256	using assembly language	-0.124939
-0.379655	Use assembly language	-0.124939
-0.379655	timing, assembly language	-0.124939
-0.379655	MASM assembly language	-0.124939
-0.887302	the common language	-0.124939
-0.583087	low-level C language	-0.124939
-0.575391	device. Any language	-0.124939
-0.828000	the preferred language	-0.124939
-0.557577	The D language	-0.124939
-0.178268	hardware definition language	-0.249877
-0.550586	integration, mixed language	-0.124939
-0.249770	a high-level language	-0.124939
-0.249770	advanced high-level language	-0.124939
-0.596375	CPU-intensive code. But	-0.124939
-0.891744	CPU time. But	-0.124939
-1.153103	member function. But	-0.124939
-0.837763	called function. But	-0.124939
-0.563308	sprintf, etc. But	-0.124939
-0.563308	Basic, etc. But	-0.124939
-1.023421	smart pointer. But	-0.124939
-0.580145	> b) But	-0.124939
-0.578481	same resources. But	-0.124939
-1.513152	clock cycles. But	-0.124939
-1.104447	double precision. But	-0.124939
-0.569879	modern CPU. But	-0.124939
-0.569782	it returns. But	-0.124939
-0.566659	unsigned integers. But	-0.124939
-0.566339	function parameter. But	-0.124939
-0.562430	destination array. But	-0.124939
-1.182913	cache line. But	-0.124939
-0.562311	matical applications. But	-0.124939
-1.047144	an integer. But	-0.124939
-1.086990	data members. But	-0.124939
-0.540459	function names. But	-0.124939
-0.540459	by itself. But	-0.124939
-0.540459	constant 5. But	-0.124939
-0.526552	other languages. But	-0.124939
-0.763278	be obsolete. But	-0.124939
-0.725615	be inlined. But	-0.124939
-0.504319	value, n. But	-0.124939
-0.658866	was programmed. But	-0.124939
-0.658866	the market. But	-0.124939
-0.463002	such checks. But	-0.124939
-0.463002	optimization issue. But	-0.124939
-0.463002	the delay. But	-0.124939
-0.463002	= 1.23456. But	-0.124939
-0.463002	Java today. But	-0.124939
-0.358428	vice versa. But	-0.124939
-0.358428	93 themselves. But	-0.124939
-0.358428	i<n; ++i). But	-0.124939
-0.358428	or C1::f. But	-0.124939
-0.358428	its simplicity. But	-0.124939
-0.358428	single session. But	-0.124939
-0.358428	and b. But	-0.124939
-0.358428	cache miss. But	-0.124939
-0.358428	resource conflicts. But	-0.124939
-2.396719	and the speed	-0.124939
-2.195897	than the speed	-0.124939
-2.245436	because the speed	-0.124939
-0.897084	double the speed	-0.124939
-1.578521	while the speed	-0.124939
-1.625680	improve the speed	-0.124939
-1.286169	increase the speed	-0.124939
-0.599047	Testing the speed	-0.124939
-0.897084	measures the speed	-0.124939
-1.199667	used to speed	-0.124939
-2.062712	how to speed	-0.124939
-0.490798	difference in speed	-0.602060
-0.599204	gain in speed	-0.425969
-1.455985	data. The speed	-0.124939
-1.060086	precision. The speed	-0.124939
-0.595601	correctly. The speed	-0.124939
-0.595601	optimally. The speed	-0.124939
-0.595601	6! The speed	-0.124939
-0.598465	software for speed	-0.124939
-1.068485	Optimizing for speed	-0.124939
-0.598465	Optimize for speed	-0.124939
-0.601233	version if speed	-0.124939
-0.594734	avoided when speed	-0.124939
-0.594734	preferred when speed	-0.124939
-0.584730	code where speed	-0.124939
-0.584730	areas where speed	-0.124939
-1.188564	hardly any speed	-0.124939
-1.073312	the execution speed	-0.124939
-0.180142	of execution speed	-0.425969
-0.493554	for execution speed	-0.124939
-0.493554	optimizing execution speed	-0.124939
-0.876179	The high speed	-0.124939
-1.225478	to improve speed	-0.124939
-0.584937	actually reduce speed	-0.124939
-0.577415	with reduced speed	-0.124939
-0.577320	a processing speed	-0.124939
-0.550649	than half speed	-0.124939
-0.391028	at half speed	-0.124939
-0.805598	16 Testing speed	-0.124939
-1.959656	for the specific	-0.124939
-1.589641	or the specific	-0.124939
-2.258579	than the specific	-0.124939
-1.379015	example is specific	-0.124939
-2.258757	is a specific	-0.124939
-1.845384	of a specific	-0.124939
-1.288805	to a specific	-0.425969
-2.298844	in a specific	-0.124939
-0.975717	for a specific	-0.124939
-1.861015	that a specific	-0.124939
-0.591605	Typically, a specific	-0.124939
-0.591605	indicates a specific	-0.124939
-1.497421	terms of specific	-0.124939
-0.600947	lists of specific	-0.124939
-1.290693	version for specific	-0.124939
-0.898926	fine-tuned for specific	-0.124939
-2.155270	there are specific	-0.124939
-0.601296	AVX2 // specific	-0.124939
-0.901263	brands or specific	-0.124939
-0.600206	instructions at specific	-0.124939
-1.007384	have no specific	-0.124939
-0.580639	(requires no specific	-0.124939
-0.840634	making any specific	-0.124939
-0.569710	recommend any specific	-0.124939
-0.569710	obey any specific	-0.124939
-0.580548	legacy code, specific	-0.124939
-1.395664	to fit specific	-0.124939
-0.575297	maintain. Any specific	-0.124939
-0.805442	for giving specific	-0.124939
-0.090129	a CPU- specific	-0.124939
-0.090129	The CPU- specific	-0.124939
-0.090129	make CPU- specific	-0.124939
-0.358758	other system- specific	-0.124939
-0.358758	critical application- specific	-0.124939
-1.202513	changed to c	-0.124939
-0.070289	b and c	-0.221849
-1.077194	diagonal. The c	-0.124939
-1.821029	y = c	-0.124939
-0.601173	reversed if c	-0.124939
-1.549070	0) { c	-0.124939
-0.600313	d+e, then c	-0.124939
-0.594738	in vector c	-0.425969
-0.599804	overlap. If c	-0.124939
-0.961975	b + c	-0.425969
-0.598142	(a * c	-0.124939
-1.225378	= 0; c	-0.726999
-1.051959	= b; c	-0.124939
-0.193967	b : c	-0.124939
-0.191216	c: __m128i c	-0.425969
-0.588293	fast division c	-0.124939
-1.757072	a, b, c	-0.124939
-0.187205	> 0, c	-0.425969
-0.517437	c, d; c	-0.425969
-0.175378	0 ? c	-0.425969
-0.846889	* temp; c	-0.124939
-0.550274	c: Is16vec8 c	-0.124939
-0.959455	= 100, c	-0.124939
-0.504580	select_gt(b, zero, c	-0.124939
-0.463239	* 3.5; c	-0.124939
-0.358615	= 1.0E8, c	-0.124939
-0.358615	{ CFALSE: c	-0.124939
-0.358615	* (a+1); c	-0.124939
-2.577377	it is much	-0.124939
-1.456588	which is much	-0.425969
-1.283331	processor is much	-0.124939
-0.894150	addresses is much	-0.124939
-1.065840	effect is much	-0.124939
-0.597567	-fpic is much	-0.124939
-1.639852	where a much	-0.124939
-0.600412	users and much	-0.124939
-0.600412	backwards and much	-0.124939
-0.601344	measure are much	-0.124939
-0.893654	do as much	-0.124939
-0.597316	get as much	-0.124939
-1.074619	typically have much	-0.124939
-0.600197	resolution. A much	-0.124939
-1.371863	that do much	-0.124939
-0.794395	memory takes much	-0.124939
-0.794395	often takes much	-0.124939
-0.356187	division takes much	-0.425969
-0.544321	truncation takes much	-0.124939
-1.566807	is so much	-0.124939
-0.597673	depends very much	-0.124939
-0.596991	typically take much	-0.124939
-1.597762	are often much	-0.124939
-0.467063	and how much	-0.425969
-0.540319	calculate how much	-0.124939
-0.787302	measure how much	-0.124939
-1.495962	is accessed much	-0.124939
-0.885191	are calculated much	-0.124939
-0.592738	typically uses much	-0.124939
-0.489926	has too much	-0.124939
-0.701914	takes too much	-0.124939
-0.489926	worrying too much	-0.124939
-1.494989	is usually much	-0.124939
-1.338833	be made much	-0.124939
-0.274563	1. How much	-0.124939
-0.115936	3.1 How much	-0.425969
-0.764147	can obtain much	-0.124939
-0.659381	to worry much	-0.124939
-2.094104	is a single	-0.124939
-1.399616	in a single	-0.124939
-1.826885	for a single	-0.124939
-1.413526	by a single	-0.124939
-1.876227	with a single	-0.124939
-1.919360	as a single	-0.124939
-1.765688	make a single	-0.124939
-0.781813	only a single	-0.124939
-0.669338	into a single	-0.204120
-1.028544	even a single	-0.124939
-1.028544	just a single	-0.124939
-0.584517	produce a single	-0.124939
-0.584517	isolating a single	-0.124939
-1.202828	back to single	-0.124939
-0.901982	higher for single	-0.124939
-1.077228	operators are single	-0.124939
-1.550381	done with single	-0.124939
-1.373654	fast as single	-0.124939
-1.733102	time than single	-0.124939
-1.509119	may use single	-0.124939
-0.594997	all use single	-0.124939
-0.600220	b from single	-0.124939
-1.653799	are using single	-0.124939
-1.066357	speed between single	-0.124939
-1.454003	the constant single	-0.124939
-0.884640	or four single	-0.124939
-0.592408	or eight single	-0.124939
-0.855003	for testing single	-0.124939
-0.575355	not mix single	-0.124939
-0.764311	for mixing single	-0.124939
-0.896799	largest_index = i;	-0.124939
-0.598903	matrix[j][0] = i;	-0.124939
-0.679329	{ int i;	-0.124939
-0.322466	0; int i;	-0.425969
-0.973915	unsigned int i;	-0.124939
-0.669831	100; int i;	-0.124939
-0.043086	f; int i;	-0.970037
-0.469955	10; int i;	-0.124939
-0.322466	a[100]; int i;	-0.124939
-0.469955	16; int i;	-0.124939
-0.469955	1.0; int i;	-0.124939
-0.669831	1000; int i;	-0.124939
-0.216854	list[300]; int i;	-0.602060
-0.845922	matrix[rows][columns]; int i;	-0.124939
-0.469955	list[100]; int i;	-0.124939
-0.469955	7.21 int i;	-0.124939
-0.469955	7.23 int i;	-0.124939
-0.469955	7.20 int i;	-0.124939
-0.469955	7.19 int i;	-0.124939
-0.469955	p; int i;	-0.124939
-0.469955	110; int i;	-0.124939
-0.469955	ab[size]; int i;	-0.124939
-0.469955	list[16]; int i;	-0.124939
-0.469955	7.30b int i;	-0.124939
-0.469955	7.30a int i;	-0.124939
-0.469955	list[301]; int i;	-0.124939
-0.599106	p + i;	-0.124939
-0.563080	f *= i;	-0.124939
-0.659897	a[size], b[size], i;	-0.124939
-1.628746	2; } These	-0.124939
-0.891793	position-independent code. These	-0.124939
-1.273203	extra time. These	-0.124939
-0.593391	mispredictions, etc. These	-0.124939
-1.047066	of memory. These	-0.124939
-0.878225	casting operator These	-0.124939
-1.305379	data cache. These	-0.124939
-1.737933	instruction set. These	-0.124939
-0.867050	of CPUs. These	-0.124939
-0.582667	the pointer. These	-0.124939
-0.578762	system calls. These	-0.124939
-0.574917	link libraries. These	-0.124939
-0.574998	register stack. These	-0.124939
-1.179316	is needed. These	-0.124939
-1.104447	single precision. These	-0.124939
-0.572619	largest vector. These	-0.124939
-0.463153	of CPU. These	-0.124939
-0.463153	hardware CPU. These	-0.124939
-1.160089	this problem. These	-0.124939
-0.562311	in vectors. These	-0.124939
-0.540459	development process. These	-0.124939
-0.540459	excessively so. These	-0.124939
-0.526552	improve efficiency. These	-0.124939
-0.763655	to another. These	-0.124939
-0.763278	in www.agner.org/optimize/cppexamples.zip. These	-0.124939
-0.504319	called CodeAnalyst. These	-0.124939
-0.504319	of microprocessor. These	-0.124939
-0.725615	is best. These	-0.124939
-0.834365	table lookup. These	-0.124939
-0.725615	and free. These	-0.124939
-0.463002	and 0x4700. These	-0.124939
-0.463002	syntax checks. These	-0.124939
-0.463002	the server. These	-0.124939
-0.658866	for AVX. These	-0.124939
-0.358428	= 128. These	-0.124939
-0.358428	Addison-Wesley, 1996. These	-0.124939
-0.358428	so-called commpage. These	-0.124939
-0.358428	and Fortran. These	-0.124939
-0.358428	with _mm. These	-0.124939
-0.358428	table (GOT). These	-0.124939
-0.358428	Windows 3.x. These	-0.124939
-0.358428	Performance Primitives". These	-0.124939
-2.073745	of the virtual	-0.124939
-2.383769	when the virtual	-0.124939
-1.980252	using the virtual	-0.124939
-0.897736	avoiding the virtual	-0.124939
-1.071175	bypass the virtual	-0.124939
-2.456161	of a virtual	-0.124939
-2.278805	to a virtual	-0.124939
-2.535731	in a virtual	-0.124939
-2.026892	for a virtual	-0.124939
-2.205907	as a virtual	-0.124939
-1.285278	call a virtual	-0.124939
-1.923402	instead of virtual	-0.124939
-0.600925	misprediction of virtual	-0.124939
-0.600914	dispatch to virtual	-0.124939
-0.600914	Call to virtual	-0.124939
-1.543488	pointers and virtual	-0.124939
-0.600437	tables, and virtual	-0.124939
-0.901908	machine. The virtual	-0.124939
-1.288319	obtained with virtual	-0.124939
-0.894574	polymorphism with virtual	-0.124939
-0.887632	important. A virtual	-0.124939
-0.594262	necessary. A virtual	-0.124939
-1.540159	for other virtual	-0.124939
-0.599847	called. If virtual	-0.124939
-0.898153	least one virtual	-0.124939
-1.575911	has no virtual	-0.124939
-1.469178	can avoid virtual	-0.124939
-0.887085	CPU. These virtual	-0.124939
-0.658981	{ public: virtual	-0.726999
-0.580564	the inefficient virtual	-0.124939
-0.577271	functions. Avoid virtual	-0.124939
-0.854703	This so-called virtual	-0.124939
-0.415450	the Java virtual	-0.124939
-0.415450	so-called Java virtual	-0.124939
-0.659439	void NotPolymorphic(); virtual	-0.124939
-0.358715	multiple inheritance, virtual	-0.124939
-1.077973	loading of several	-0.124939
-0.601443	Fortran and several	-0.124939
-0.601654	decoded in several	-0.124939
-0.889768	C++ for several	-0.124939
-0.595348	optimize for several	-0.124939
-0.889768	templates for several	-0.124939
-0.595348	delayed for several	-0.124939
-0.595348	prepared for several	-0.124939
-0.901885	risk that several	-0.124939
-0.942925	There are several	-0.425969
-1.077131	accessed by several	-0.124939
-0.600960	test on several	-0.124939
-1.517746	functions have several	-0.124939
-1.059475	systems have several	-0.124939
-0.600380	except when several	-0.124939
-1.400850	program has several	-0.124939
-0.579303	compilers has several	-0.124939
-0.858709	keyword has several	-0.124939
-0.579303	syntax has several	-0.124939
-1.072467	memory. If several	-0.124939
-0.599668	cached, but several	-0.124939
-0.597620	split between several	-0.124939
-0.999356	can take several	-0.124939
-0.560121	often take several	-0.124939
-1.713168	to test several	-0.124939
-1.050158	that contains several	-0.124939
-0.872272	that requires several	-0.124939
-1.557194	to load several	-0.124939
-0.581632	control statement several	-0.124939
-1.012682	can save several	-0.124939
-0.566945	have provided several	-0.124939
-0.998284	software package several	-0.124939
-0.504600	it took several	-0.124939
-0.358629	versions alternatingly several	-0.124939
-0.358629	microprocessor wastes several	-0.124939
-0.358629	power. Connecting several	-0.124939
-2.335212	the function through	-0.124939
-2.265355	a function through	-0.124939
-1.460774	critical function through	-0.124939
-1.073232	to data through	-0.124939
-0.595854	// loop through	-0.425969
-1.329349	child class through	-0.124939
-1.122144	generation class through	-0.124939
-1.279949	derived class through	-0.124939
-0.599022	accesses b through	-0.124939
-1.790960	the library through	-0.124939
-1.102112	or object through	-0.124939
-1.626925	an object through	-0.124939
-0.849098	data object through	-0.124939
-1.845956	a variable through	-0.124939
-1.964659	is called through	-0.124939
-0.597323	own address through	-0.124939
-0.842745	is accessed through	-0.124939
-0.609341	are accessed through	-0.346788
-0.436690	fact accessed through	-0.124939
-0.436690	necessarily accessed through	-0.124939
-0.523468	code goes through	-0.124939
-0.523468	DLL goes through	-0.124939
-0.648696	may go through	-0.124939
-0.456496	variables go through	-0.124939
-0.456496	must go through	-0.124939
-0.582844	from main through	-0.124939
-1.338034	// Loop through	-0.124939
-0.572873	download updates through	-0.124939
-0.827786	the GOT through	-0.124939
-0.562632	extra jump through	-0.124939
-0.550435	versions 7 through	-0.124939
-0.540808	line separately through	-0.124939
-0.659324	the caller through	-0.124939
-0.358658	block. Walking through	-0.124939
-0.358658	will propagate through	-0.124939
-0.358658	than looping through	-0.124939
-0.358658	be propagated through	-0.124939
-1.970129	for the common	-0.124939
-1.629115	It is common	-0.249877
-1.479579	is a common	-0.124939
-1.726574	using a common	-0.124939
-1.348625	also a common	-0.124939
-1.463526	making a common	-0.124939
-1.941072	versions of common	-0.124939
-0.601489	branch. The common	-0.124939
-1.708595	functions for common	-0.124939
-0.601131	short or common	-0.124939
-2.274466	such as common	-0.124939
-2.185041	is more common	-0.124939
-0.600258	46 A common	-0.124939
-1.917695	and other common	-0.124939
-0.897895	the most common	-0.124939
-0.648804	The most common	-0.124939
-1.465595	for many common	-0.124939
-0.868845	that many common	-0.124939
-0.544704	problems. Some common	-0.124939
-0.544704	best. Some common	-0.124939
-0.544704	stupid. Some common	-0.124939
-0.591279	93). All common	-0.124939
-1.167145	This allows common	-0.124939
-0.586417	to eliminate common	-0.124939
-0.706265	can eliminate common	-0.124939
-0.527119	'this' pointer, common	-0.124939
-0.358744	function inlining, common	-0.124939
-2.155905	a = a,	-0.124939
-0.882131	true = a,	-0.124939
-0.595666	-1 = a,	-0.425969
-1.058223	-(-a) = a,	-0.124939
-1.639386	even if a,	-0.124939
-1.013796	{ int a,	-0.425969
-0.568138	integers int a,	-0.124939
-0.568138	14.10 int a,	-0.124939
-0.568138	14.11 int a,	-0.124939
-0.568138	m;} int a,	-0.124939
-0.568138	8.6a int a,	-0.124939
-0.568138	8.6b int a,	-0.124939
-0.568138	1.6; int a,	-0.124939
-0.600122	a.y);} vector a,	-0.124939
-0.568209	14.14b double a,	-0.124939
-0.568209	14.14a double a,	-0.124939
-0.568209	14.18c double a,	-0.124939
-0.568209	8.2a double a,	-0.124939
-1.407141	{ float a,	-0.124939
-0.529892	66 float a,	-0.124939
-0.529892	11.1a float a,	-0.124939
-0.529892	11.1b float a,	-0.124939
-0.529892	8.16 float a,	-0.124939
-0.529892	14.18a float a,	-0.124939
-0.529892	14.18b float a,	-0.124939
-1.361352	this example, a,	-0.124939
-1.569978	const & a,	-0.124939
-0.184237	SomeFunction (int a,	-0.522879
-1.595106	int i, a,	-0.124939
-0.431040	way: bool a,	-0.124939
-0.431040	7.10a bool a,	-0.124939
-0.431040	7.9a bool a,	-0.124939
-0.504841	{ Vec16s a,	-0.124939
-0.463476	objects Vec8s a,	-0.124939
-0.358801	{} vector(float a,	-0.124939
-0.358801	b memcpy(b, a,	-0.124939
-2.643825	to the thread	-0.124939
-2.338766	in the thread	-0.124939
-0.600455	increasing the thread	-0.124939
-0.899886	fix the thread	-0.124939
-2.537196	in a thread	-0.124939
-1.885064	if a thread	-0.124939
-1.897483	than a thread	-0.124939
-1.635933	when a thread	-0.124939
-0.895203	setting a thread	-0.124939
-0.895203	lock a thread	-0.124939
-0.896173	process or thread	-0.124939
-0.598588	task or thread	-0.124939
-0.600870	generally not thread	-0.124939
-0.600289	doubled. A thread	-0.124939
-2.632812	the same thread	-0.124939
-1.808110	the other thread	-0.124939
-1.244410	to one thread	-0.124939
-0.864535	that one thread	-0.124939
-0.582355	where one thread	-0.124939
-1.587392	for each thread	-0.124939
-0.842001	then each thread	-0.124939
-0.570442	give each thread	-0.124939
-0.570442	contrary, each thread	-0.124939
-1.923604	operating system thread	-0.124939
-0.945320	by another thread	-0.124939
-0.884756	with another thread	-0.124939
-0.497723	while another thread	-0.124939
-0.497723	interface, another thread	-0.124939
-0.461645	a separate thread	-0.346788
-0.436221	stack. Each thread	-0.124939
-0.436221	simultaneously. Each thread	-0.124939
-0.436221	threads. Each thread	-0.124939
-0.436221	cores. Each thread	-0.124939
-0.540983	a third thread	-0.124939
-0.463440	a high-priority thread	-0.124939
-0.358773	a low-priority thread	-0.124939
-0.358773	a higher-priority thread	-0.124939
-1.056898	help files etc.	-0.124939
-1.016547	/ b) etc.	-0.124939
-0.182815	trigonometric functions, etc.	-0.124939
-0.575016	number 2, etc.	-0.124939
-0.840632	Gnu compiler, etc.	-0.124939
-0.556941	cache size, etc.	-0.124939
-0.540481	invalid pointers, etc.	-0.124939
-0.540481	11 programming, etc.	-0.124939
-0.526573	arithmetic units, etc.	-0.124939
-0.504339	(/arch:SSE2, /arch:AVX etc.	-0.124939
-0.504339	subtraction, multiplication, etc.	-0.124939
-0.725648	loop counters, etc.	-0.124939
-0.504339	base access, etc.	-0.124939
-0.504339	are comparisons, etc.	-0.124939
-0.504339	other way, etc.	-0.124939
-0.658894	Sunday, Monday, etc.	-0.124939
-0.463020	resources, databases, etc.	-0.124939
-0.463020	semaphores, mutexes, etc.	-0.124939
-0.658894	screen resolutions, etc.	-0.124939
-0.463020	horizontal add, etc.	-0.124939
-0.463020	than loops, etc.	-0.124939
-0.658894	constant propagation, etc.	-0.124939
-0.463020	its limit, etc.	-0.124939
-0.658894	branch mispredictions, etc.	-0.124939
-0.358443	#include <ia32intrin.h> etc.	-0.124939
-0.358443	hash maps etc.	-0.124939
-0.358443	strlen, sprintf, etc.	-0.124939
-0.358443	exp, sin, etc.	-0.124939
-0.358443	graphics cards, etc.	-0.124939
-0.358443	-msse2, -mavx, etc.	-0.124939
-0.358443	memory ports, etc.	-0.124939
-0.358443	pattern history, etc.	-0.124939
-0.358443	database connections, etc.	-0.124939
-0.358443	-mAVX -axSSE3, etc.	-0.124939
-0.358443	/arch:AVX /QaxSSE3, etc.	-0.124939
-0.358443	Visual Basic, etc.	-0.124939
-0.358443	graphic brushes, etc.	-0.124939
-0.358443	dialog boxes, etc.	-0.124939
-0.358443	abort(), _endthread(), etc.	-0.124939
-0.358443	point exceptions, etc.	-0.124939
-0.600437	4 and AMD	-0.124939
-0.600437	VTune and AMD	-0.124939
-0.601468	119). The AMD	-0.124939
-1.649579	code for AMD	-0.124939
-0.598426	VTune, for AMD	-0.124939
-0.598426	Guide for AMD	-0.124939
-1.136374	not on AMD	-0.124939
-0.584107	all on AMD	-0.124939
-0.584107	size on AMD	-0.124939
-1.249256	performance on AMD	-0.124939
-1.043989	supported on AMD	-0.124939
-1.152346	well on AMD	-0.124939
-2.273460	such as AMD	-0.124939
-0.899923	CPUs use AMD	-0.124939
-1.174334	128 bytes AMD	-0.124939
-0.593956	platforms. AMD AMD	-0.124939
-0.590592	Supports both AMD	-0.124939
-1.144609	Intel processors. AMD	-0.124939
-0.015343	of Intel, AMD	-1.079181
-0.162672	for Intel, AMD	-0.124939
-0.249809	an Intel, AMD	-0.124939
-0.162672	from Intel, AMD	-0.124939
-0.162672	both Intel, AMD	-0.124939
-0.867612	aligned operands AMD	-0.124939
-0.863313	x86-64 platforms. AMD	-0.124939
-0.526974	xopintrin.h (Gnu) AMD	-0.124939
-0.659439	page 131. AMD	-0.124939
-0.659439	unaligned op. AMD	-0.124939
-0.463367	SSE4A ammintrin.h AMD	-0.124939
-0.358715	_mm_exp_ps _mm_exp_pd AMD	-0.124939
-0.358715	__vrs4_expf __vrd2_exp AMD	-0.124939
-0.358715	16 XOP, AMD	-0.124939
-0.358715	AVX immintrin.h AMD	-0.124939
-1.595796	is to compile	-0.425969
-1.599048	possible to compile	-0.124939
-2.049834	want to compile	-0.124939
-1.923645	code and compile	-0.124939
-0.900253	framework and compile	-0.124939
-1.819569	that you compile	-0.124939
-1.454409	when you compile	-0.124939
-0.591426	First you compile	-0.124939
-0.621085	it at compile	-0.124939
-0.438559	compiler at compile	-0.124939
-0.438559	possible at compile	-0.124939
-0.438559	value at compile	-0.124939
-0.438559	available at compile	-0.124939
-0.438559	calculations at compile	-0.124939
-0.438559	etc. at compile	-0.124939
-0.752555	done at compile	-0.124939
-0.166125	calculated at compile	-0.124939
-0.066309	known at compile	-0.698970
-0.207859	resolved at compile	-0.301030
-0.438559	instantiated at compile	-0.124939
-0.438559	(1./1.2345) at compile	-0.124939
-1.189353	If we compile	-0.124939
-2.937667	of the exception	-0.124939
-2.616894	to the exception	-0.124939
-1.948744	for the exception	-0.124939
-2.191821	then the exception	-0.124939
-1.435266	off the exception	-0.124939
-2.056204	program is exception	-0.124939
-1.639204	cost of exception	-0.124939
-1.078040	alternatives to exception	-0.124939
-1.077408	debugging and exception	-0.124939
-0.599943	overflow. The exception	-0.124939
-0.599943	anyway. The exception	-0.124939
-1.740640	support for exception	-0.124939
-1.201767	think that exception	-0.124939
-1.547321	used by exception	-0.124939
-1.295644	relies on exception	-0.124939
-1.172034	of an exception	-0.124939
-0.581219	if an exception	-0.425969
-0.576161	catch an exception	-0.124939
-0.576161	throw an exception	-0.124939
-0.576161	raising an exception	-0.124939
-2.416231	to use exception	-0.124939
-1.196012	your program exception	-0.124939
-0.599823	compilers. If exception	-0.124939
-2.049369	is no exception	-0.124939
-1.055065	if no exception	-0.124939
-1.907884	of using exception	-0.124939
-1.286792	The C++ exception	-0.124939
-0.598327	its possible exception	-0.124939
-0.598001	throw any exception	-0.124939
-0.580264	-ipo No exception	-0.124939
-0.578791	possibly save exception	-0.124939
-1.313104	reason why exception	-0.124939
-0.314572	with structured exception	-0.124939
-0.314572	on structured exception	-0.124939
-0.129406	can disable exception	-0.425969
-0.463294	// Enable exception	-0.124939
-0.203940	wrap the allocated	-0.124939
-0.601161	owns the allocated	-0.124939
-1.625244	that is allocated	-0.124939
-1.800353	object is allocated	-0.124939
-0.898348	block is allocated	-0.124939
-0.601733	cleanup of allocated	-0.124939
-0.899029	error. The allocated	-0.124939
-0.600025	collection. The allocated	-0.124939
-1.715012	can be allocated	-0.522879
-1.981512	will be allocated	-0.124939
-2.153859	that are allocated	-0.124939
-2.046323	there are allocated	-0.124939
-0.202875	sizes are allocated	-0.425969
-1.833026	it has allocated	-0.124939
-0.599784	Any other allocated	-0.124939
-1.318940	to all allocated	-0.124939
-1.563539	that all allocated	-0.124939
-1.946140	for each allocated	-0.124939
-1.994007	make sure allocated	-0.124939
-0.594466	inefficient. An allocated	-0.124939
-1.762417	has been allocated	-0.124939
-1.460909	its own allocated	-0.124939
-0.229171	in dynamically allocated	-0.124939
-0.229171	as dynamically allocated	-0.124939
-0.229171	different dynamically allocated	-0.124939
-0.229171	small dynamically allocated	-0.124939
-0.229171	align dynamically allocated	-0.124939
-0.099641	Aligning dynamically allocated	-0.124939
-0.229171	aligning dynamically allocated	-0.124939
-0.434852	memory Memory allocated	-0.124939
-0.434852	include: Memory allocated	-0.124939
-0.960085	time slices allocated	-0.124939
-2.610472	it is small	-0.124939
-2.295900	function is small	-0.124939
-2.426982	This is small	-0.124939
-1.071636	elements is small	-0.124939
-0.937686	count is small	-0.425969
-2.427271	is a small	-0.124939
-2.470615	in a small	-0.124939
-2.144718	with a small	-0.124939
-1.874960	than a small	-0.124939
-1.971872	make a small	-0.124939
-1.468418	only a small	-0.124939
-1.063057	writing a small	-0.124939
-1.193304	allocate a small	-0.124939
-0.601702	design of small	-0.124939
-1.378453	relevant to small	-0.124939
-0.601519	economy and small	-0.124939
-1.977391	used in small	-0.124939
-1.189762	except for small	-0.124939
-0.203391	exp(x) for small	-0.425969
-0.601101	instructions or small	-0.124939
-0.900997	important on small	-0.124939
-0.900801	be as small	-0.124939
-1.286371	divided into small	-0.124939
-0.495094	on such small	-0.124939
-0.584570	into many small	-0.124939
-0.584570	uses many small	-0.124939
-1.187999	for some small	-0.124939
-1.454190	is so small	-0.124939
-1.212770	are so small	-0.124939
-1.610780	is very small	-0.124939
-0.581870	on very small	-0.124939
-1.043264	are typically small	-0.124939
-0.905356	are too small	-0.124939
-0.536083	time too small	-0.124939
-1.288962	or writing small	-0.124939
-0.540895	the relatively small	-0.124939
-0.764202	be kept small	-0.124939
-2.611341	for the overflow	-0.124939
-2.085451	make the overflow	-0.124939
-2.329590	because the overflow	-0.124939
-1.101837	case of overflow	-0.124939
-0.897717	problems of overflow	-0.124939
-1.428638	risk of overflow	-0.124939
-0.480443	check for overflow	-0.204120
-2.007245	sure that overflow	-0.124939
-0.898616	big that overflow	-0.124939
-0.900979	around on overflow	-0.124939
-1.263731	if an overflow	-0.124939
-1.254886	generate an overflow	-0.124939
-0.592431	away an overflow	-0.124939
-1.832716	to make overflow	-0.425969
-2.292927	floating point overflow	-0.124939
-1.785782	Floating point overflow	-0.124939
-1.019344	An integer overflow	-0.124939
-1.126039	signed integer overflow	-0.124939
-0.581185	trap integer overflow	-0.124939
-1.193419	that no overflow	-0.124939
-1.282643	An array overflow	-0.124939
-1.282817	of possible overflow	-0.124939
-0.888977	much about overflow	-0.124939
-0.887889	result. An overflow	-0.124939
-0.826146	doesn't cause overflow	-0.124939
-0.561886	would cause overflow	-0.124939
-0.590267	15 Integer overflow	-0.124939
-0.872428	inputs give overflow	-0.124939
-0.582899	A positive overflow	-0.124939
-0.570181	for buffer overflow	-0.124939
-0.526953	protection against overflow	-0.124939
-0.504700	to ignore overflow	-0.124939
-0.358701	An uncaught overflow	-0.124939
-1.641269	b; a +=	-0.124939
-0.600967	8.5b a +=	-0.124939
-0.184086	100; i +=	-0.124939
-0.510024	size; i +=	-0.124939
-0.064592	256; i +=	-0.522879
-0.510024	20; i +=	-0.124939
-1.159235	temp; temp +=	-0.124939
-0.372376	{ sum +=	-0.124939
-0.092543	i++) sum +=	-0.301030
-0.133188	{ list[i] +=	-0.425969
-0.459952	i++){ list[i] +=	-0.124939
-0.326265	i+=3,i_div_3++){ list[i] +=	-0.124939
-0.550386	r1; c1 +=	-0.124939
-0.358888	// s +=	-0.124939
-0.358888	//Loopby4 s +=	-0.124939
-0.358754	{ sum1 +=	-0.124939
-0.358754	list[i+1];} sum1 +=	-0.124939
-0.526932	nonzero u.i +=	-0.124939
-0.526932	list[i]; sum2 +=	-0.124939
-0.526932	Y; Y +=	-0.124939
-0.504837	i_div_3; list[i+1] +=	-0.124939
-0.504680	SIZE; r1 +=	-0.124939
-0.504837	i_div_3; list[i+2] +=	-0.124939
-0.504680	Z; Z +=	-0.124939
-0.504680	a[i+2]; s3 +=	-0.124939
-0.504680	a[i+1]; s2 +=	-0.124939
-0.463330	{ s0 +=	-0.124939
-0.463330	a[i]; s1 +=	-0.124939
-0.358686	x *const_cast<int*>(&x) +=	-0.124939
-0.358686	& 15] +=	-0.124939
-0.358686	i++) matrix[FuncRow(i)][FuncCol(i)] +=	-0.124939
-0.358686	39 matrix[i][j] +=	-0.124939
-1.300436	are the integers	-0.124939
-2.130853	size of integers	-0.124939
-0.600171	addition of integers	-0.124939
-1.293522	Conversion of integers	-0.124939
-0.600935	types to integers	-0.124939
-0.600935	numbers to integers	-0.124939
-0.203803	numbers and integers	-0.124939
-2.103027	they are integers	-0.124939
-0.599223	operators using integers	-0.124939
-1.158326	or two integers	-0.124939
-0.872030	If two integers	-0.124939
-0.999330	or 64-bit integers	-0.124939
-0.368677	use 64-bit integers	-0.124939
-0.196716	conversions between integers	-0.425969
-1.224924	Conversions between integers	-0.124939
-0.565857	of 32-bit integers	-0.124939
-0.565857	use 32-bit integers	-0.124939
-0.833471	two 32-bit integers	-0.124939
-0.455148	of unsigned integers	-0.124939
-0.227691	and unsigned integers	-0.124939
-0.170493	with unsigned integers	-0.124939
-0.455148	use unsigned integers	-0.124939
-0.455148	convert unsigned integers	-0.124939
-0.455148	versus unsigned integers	-0.124939
-0.592716	each, four integers	-0.124939
-0.592398	each, eight integers	-0.124939
-0.541169	of signed integers	-0.124939
-0.917355	to signed integers	-0.124939
-0.851391	eight 16-bit integers	-0.124939
-0.566967	cannot multiply integers	-0.124939
-0.527135	either sixteen integers	-0.124939
-0.314699	as 8-bit integers	-0.124939
-0.314699	using 8-bit integers	-0.124939
-1.029388	with the option	-0.124939
-2.382571	on the option	-0.124939
-0.599090	Use the option	-0.124939
-0.598350	e.g. the option	-0.124939
-0.594610	optimizations with option	-0.124939
-0.888316	made with option	-0.124939
-0.594610	well-defined with option	-0.124939
-0.597113	manual. This option	-0.124939
-0.597113	-fsource-asm). This option	-0.124939
-0.513442	have an option	-0.425969
-0.660467	has an option	-0.425969
-0.576244	specify an option	-0.124939
-1.830073	the compiler option	-0.124939
-1.471956	a compiler option	-0.124939
-2.006535	The compiler option	-0.124939
-0.583501	same compiler option	-0.124939
-0.900083	supports this option	-0.124939
-1.067506	without any option	-0.124939
-0.596759	strongest optimization option	-0.124939
-1.673219	exception handling option	-0.124939
-0.577916	assembly output option	-0.124939
-0.858515	loop unroll option	-0.124939
-0.575344	linking (e.g. option	-0.124939
-0.562813	Use 12 option	-0.124939
-0.504821	The -fpie option	-0.124939
-0.659582	source annotation option	-0.124939
-0.358787	map file" option	-0.124939
-2.113277	it is good	-0.425969
-2.656636	It is good	-0.124939
-1.072188	precision is good	-0.124939
-1.563841	is a good	-0.425969
-1.905400	be a good	-0.124939
-1.433050	not a good	-0.124939
-1.211580	has a good	-0.124939
-1.166575	because a good	-0.124939
-0.592365	done a good	-0.124939
-0.592365	therefore a good	-0.124939
-0.792667	get a good	-0.124939
-1.060003	quite a good	-0.124939
-1.378480	availability of good	-0.124939
-0.601555	recommendation for good	-0.124939
-0.503677	languages are good	-0.425969
-0.590378	always as good	-0.124939
-0.590378	optimized as good	-0.124939
-0.590378	optimize as good	-0.124939
-0.590378	cached as good	-0.124939
-2.585511	is not good	-0.124939
-1.092381	} A good	-0.124939
-0.843637	performance. A good	-0.124939
-0.571317	zero. A good	-0.124939
-0.571317	operation. A good	-0.124939
-0.571317	index. A good	-0.124939
-0.571317	exploited. A good	-0.124939
-0.599743	example, all good	-0.124939
-0.598379	Has many good	-0.124939
-1.444078	a very good	-0.124939
-0.551956	not very good	-0.124939
-0.943457	have very good	-0.124939
-0.808069	some very good	-0.124939
-0.504901	numbers mean good	-0.124939
-3.031907	of the power	-0.124939
-1.805790	to the power	-0.602060
-0.924133	is a power	-1.329059
-0.943973	be a power	-0.726999
-1.183191	by a power	-0.602060
-1.368700	not a power	-0.124939
-0.864753	matrix a power	-0.124939
-0.582469	been a power	-0.124939
-0.582469	columns a power	-0.124939
-0.582469	N a power	-0.124939
-0.898184	Calculate integer power	-0.124939
-0.590449	15.1d. Integer power	-0.124939
-0.720074	a high power	-0.425969
-0.577495	high processing power	-0.124939
-0.835829	with low power	-0.124939
-0.541157	and computing power	-0.124939
-0.358887	the computational power	-0.124939
-2.217345	of the matrix	-0.124939
-2.415098	and the matrix	-0.124939
-2.790362	in the matrix	-0.124939
-2.047819	make the matrix	-0.124939
-1.434097	divide the matrix	-0.124939
-0.599416	transpose the matrix	-0.124939
-1.869035	of a matrix	-0.124939
-1.623619	in a matrix	-0.249877
-1.827957	if a matrix	-0.124939
-0.202452	transpose a matrix	-0.124939
-0.886696	implementing a matrix	-0.124939
-0.886696	transposes a matrix	-0.124939
-0.593786	Transposing a matrix	-0.124939
-1.078276	writes to matrix	-0.124939
-0.679712	columns in matrix	-0.425969
-0.599515	rows/columns in matrix	-0.124939
-0.601535	lines for matrix	-0.124939
-0.594368	78). A matrix	-0.124939
-0.594368	needed? A matrix	-0.124939
-1.823423	for different matrix	-0.124939
-1.435381	with different matrix	-0.124939
-0.490708	64 64 matrix	-0.124939
-1.396616	a big matrix	-0.124939
-0.359636	and copy matrix	-0.425969
-0.538260	512 512 matrix	-0.124939
-0.577491	times per matrix	-0.124939
-1.089288	// define matrix	-0.124939
-1.224587	to transpose matrix	-0.124939
-2.099199	on a Linux	-0.124939
-1.941147	versions of Linux	-0.124939
-1.078190	identical to Linux	-0.124939
-1.638942	Windows and Linux	-0.124939
-1.436269	as in Linux	-0.124939
-1.577063	data in Linux	-0.124939
-0.598400	introduced in Linux	-0.124939
-0.598400	overridden in Linux	-0.124939
-1.684591	compiler for Linux	-0.124939
-1.570647	available for Linux	-0.124939
-1.286788	choice for Linux	-0.124939
-0.601296	__attribute__((aligned(64))); // Linux	-0.124939
-0.601038	possible on Linux	-0.124939
-1.286799	Intel compiler Linux	-0.124939
-0.937546	Gnu compiler Linux	-0.602060
-0.930143	and 64-bit Linux	-0.124939
-0.997555	in 64-bit Linux	-0.602060
-0.962478	for 64-bit Linux	-0.124939
-0.527940	Therefore, 64-bit Linux	-0.124939
-0.598519	/MT). In Linux	-0.124939
-0.940504	and 32-bit Linux	-0.124939
-1.623542	in 32-bit Linux	-0.124939
-0.550755	In 32-bit Linux	-0.124939
-0.805905	between 32-bit Linux	-0.124939
-1.491226	64 bit Linux	-0.124939
-1.415101	32 bit Linux	-0.124939
-0.889111	here about Linux	-0.124939
-1.358202	compiler Windows Linux	-0.124939
-0.582921	also supports Linux	-0.124939
-0.407017	for Windows, Linux	-0.425969
-0.637075	64-bit Windows, Linux	-0.124939
-0.504781	_WIN32 _WIN32 Linux	-0.124939
-0.585877	have not been	-0.124939
-0.373666	has not been	-0.124939
-0.585877	had not been	-0.124939
-0.585877	Has not been	-0.124939
-0.543118	code have been	-0.124939
-0.543118	compiler have been	-0.124939
-0.543118	program have been	-0.124939
-0.792257	b have been	-0.124939
-0.326238	objects have been	-0.301030
-0.922002	elements have been	-0.124939
-0.543118	examples have been	-0.124939
-0.543118	diagonal have been	-0.124939
-0.543118	could have been	-0.124939
-0.543118	spots have been	-0.124939
-0.543118	isolation have been	-0.124939
-0.562099	that has been	-0.124939
-0.624247	it has been	-0.249877
-0.482917	time has been	-0.124939
-0.504273	pointer has been	-0.124939
-0.177533	registers has been	-0.124939
-0.482917	file has been	-0.124939
-0.482917	problem has been	-0.124939
-0.482917	count has been	-0.124939
-0.482917	p has been	-0.124939
-0.482917	STL has been	-0.124939
-0.482917	seconds has been	-0.124939
-0.482917	spot has been	-0.124939
-0.482917	function" has been	-0.124939
-0.482917	i=0; has been	-0.124939
-0.541286	has already been	-0.124939
-1.664717	up to cause	-0.124939
-0.600193	small to cause	-0.124939
-1.991594	likely to cause	-0.124939
-1.425696	times and cause	-0.124939
-0.598408	invalid and cause	-0.124939
-0.598408	stride and cause	-0.124939
-0.598408	caches and cause	-0.124939
-1.202140	problems that cause	-0.124939
-1.769225	it can cause	-0.124939
-0.977391	This can cause	-0.124939
-0.997318	this can cause	-0.124939
-0.846838	memory can cause	-0.124939
-1.147868	array can cause	-0.124939
-0.573024	software can cause	-0.124939
-0.573024	calculations can cause	-0.124939
-0.997318	overflow can cause	-0.124939
-0.573024	alignment can cause	-0.124939
-0.573024	generation can cause	-0.124939
-0.573024	BTB can cause	-0.124939
-1.317660	it may cause	-0.124939
-0.911895	This may cause	-0.124939
-1.116822	which may cause	-0.124939
-0.578536	members may cause	-0.124939
-0.582332	so will cause	-0.124939
-0.582332	operators will cause	-0.124939
-0.200102	0x2710 will cause	-0.124939
-1.267145	that doesn't cause	-0.124939
-0.566390	integer doesn't cause	-0.124939
-1.293330	a common cause	-0.124939
-1.299050	most common cause	-0.124939
-0.883503	reduction would cause	-0.124939
-0.828035	a frequent cause	-0.124939
-0.788971	Such schemes cause	-0.124939
-2.376790	of the AVX	-0.425969
-2.482141	to the AVX	-0.124939
-2.665490	in the AVX	-0.124939
-1.915146	for the AVX	-0.124939
-2.000751	if the AVX	-0.124939
-2.175536	use the AVX	-0.124939
-2.218911	If the AVX	-0.124939
-0.203233	leaving the AVX	-0.425969
-0.597643	Enable the AVX	-0.124939
-0.899029	possible. The AVX	-0.124939
-0.600025	later. The AVX	-0.124939
-1.502385	compiled for AVX	-0.124939
-1.572167	{ // AVX	-0.425969
-0.674924	{...} // AVX	-0.425969
-1.551363	only if AVX	-0.124939
-0.743950	compiled with AVX	-0.425969
-0.600510	example, use AVX	-0.124939
-0.593684	going from AVX	-0.124939
-0.593684	transition from AVX	-0.124939
-1.576409	has no AVX	-0.124939
-1.071275	instr. set AVX	-0.124939
-0.893517	int 4 AVX	-0.124939
-0.360358	and without AVX	-0.124939
-1.134382	compiled without AVX	-0.124939
-0.595450	search instructions AVX	-0.124939
-0.788636	4 256 AVX	-0.124939
-0.788636	8 256 AVX	-0.124939
-1.212150	operating system. AVX	-0.124939
-0.143346	elements. 12.1 AVX	-0.124939
-0.143346	105 12.1 AVX	-0.124939
-0.358773	PCLMUL wmmintrin.h AVX	-0.124939
-1.938438	use of classes	-0.124939
-0.902185	Structures and classes	-0.124939
-1.378493	strings in classes	-0.124939
-0.185525	or vector classes	-0.124939
-1.016146	with vector classes	-0.124939
-0.745491	using vector classes	-0.124939
-1.059301	Intel vector classes	-0.124939
-0.451595	Using vector classes	-0.124939
-0.229264	Define vector classes	-0.124939
-0.516150	Agner vector classes	-0.124939
-1.143431	Agner's vector classes	-0.124939
-0.185525	predefined vector classes	-0.124939
-1.286608	data into classes	-0.124939
-0.598598	into C++ classes	-0.124939
-0.153214	of container classes	-0.124939
-0.392060	that container classes	-0.124939
-0.392060	make container classes	-0.124939
-0.392060	example container classes	-0.124939
-0.392060	standard container classes	-0.124939
-0.392060	own container classes	-0.124939
-0.392060	containing container classes	-0.124939
-0.536248	of string classes	-0.124939
-0.780139	The string classes	-0.124939
-0.851670	The child classes	-0.124939
-0.463652	12.5. Vector classes	-0.124939
-0.463652	12.1. Vector classes	-0.124939
-0.567150	multiple parent classes	-0.124939
-0.065791	allocation. Container classes	-0.124939
-0.031651	9.7 Container classes	-0.124939
-0.065791	threads? Container classes	-0.124939
-0.358816	are wrapper classes	-0.124939
-0.358816	third generations classes	-0.124939
-2.044055	it is done	-0.124939
-1.808821	This is done	-0.425969
-1.582875	size is done	-0.124939
-1.266339	allocation is done	-0.124939
-0.888879	multiplication is done	-0.124939
-0.888879	inlining is done	-0.124939
-0.594897	branching is done	-0.124939
-0.594897	C2::Disp() is done	-0.124939
-0.594897	Relocation is done	-0.124939
-0.902261	standardized and done	-0.124939
-1.633503	to be done	-0.124939
-1.516288	can be done	-0.492916
-1.878490	should be done	-0.124939
-1.619699	must be done	-0.124939
-1.091498	preferably be done	-0.124939
-1.512925	operations are done	-0.124939
-0.343502	calculations are done	-0.124939
-0.592277	tests are done	-0.124939
-0.592277	Multiplications are done	-0.124939
-0.600849	said than done	-0.124939
-1.877897	I have done	-0.124939
-0.595554	others have done	-0.124939
-1.833634	it has done	-0.124939
-0.898498	is all done	-0.124939
-0.590712	15.1c was done	-0.124939
-1.497023	is usually done	-0.124939
-0.999286	not necessarily done	-0.124939
-1.327785	and is therefore	-0.124939
-1.128420	It is therefore	-0.263241
-1.336095	caching is therefore	-0.124939
-1.336095	p is therefore	-0.124939
-1.439528	memory and therefore	-0.124939
-0.592056	make and therefore	-0.124939
-1.595591	variables and therefore	-0.124939
-0.883308	microprocessor and therefore	-0.124939
-0.883308	system, and therefore	-0.124939
-0.592056	understand and therefore	-0.124939
-0.592056	m and therefore	-0.124939
-0.592056	not, and therefore	-0.124939
-0.592056	non-zero, and therefore	-0.124939
-0.592056	dependent and therefore	-0.124939
-1.578642	operations are therefore	-0.124939
-0.599661	Constructors are therefore	-0.124939
-1.699320	code can therefore	-0.124939
-1.559846	It can therefore	-0.124939
-0.888558	allocation can therefore	-0.124939
-1.697465	We can therefore	-0.124939
-0.600858	developers may therefore	-0.124939
-0.594336	examples will therefore	-0.124939
-0.594336	14.30 will therefore	-0.124939
-0.951992	code should therefore	-0.124939
-1.076931	It should therefore	-0.124939
-0.643522	You should therefore	-0.124939
-0.555401	calculations should therefore	-0.124939
-0.555401	binding should therefore	-0.124939
-0.902535	holds a precision	-0.124939
-1.375959	regardless of precision	-0.124939
-1.296777	loss of precision	-0.124939
-1.948704	the same precision	-0.124939
-2.295703	floating point precision	-0.124939
-1.786867	Floating point precision	-0.124939
-0.454915	the double precision	-0.124939
-0.846223	a double precision	-0.124939
-0.734819	to double precision	-0.124939
-0.634764	and double precision	-0.124939
-0.454915	that double precision	-0.124939
-0.454915	are double precision	-0.124939
-0.646236	than double precision	-0.124939
-0.454915	use double precision	-0.124939
-0.454915	two double precision	-0.124939
-0.576772	long double precision	-0.124939
-0.454915	four double precision	-0.124939
-0.454915	cases, double precision	-0.124939
-0.454915	Using double precision	-0.124939
-0.170432	Long double precision	-0.124939
-0.441625	for single precision	-0.124939
-0.625761	use single precision	-0.124939
-0.441625	from single precision	-0.124939
-0.441625	using single precision	-0.124939
-0.441625	constant single precision	-0.124939
-0.441625	four single precision	-0.124939
-0.441625	eight single precision	-0.124939
-0.190713	for high precision	-0.124939
-0.584182	precision require precision	-0.124939
-0.805683	have mixed precision	-0.124939
-0.143368	time. Single precision	-0.124939
-0.143368	cache. Single precision	-0.124939
-0.358844	set. High precision	-0.124939
-0.358844	enabled (single precision	-0.124939
-1.593094	have the line	-0.124939
-1.550186	then the line	-0.124939
-1.641255	with a line	-0.425969
-0.601145	pixel or line	-0.124939
-0.901214	line by line	-0.124939
-0.601057	29 with line	-0.124939
-1.377155	replace this line	-0.124939
-0.590896	code one line	-0.124939
-1.317167	than one line	-0.124939
-0.587751	the cache line	-0.221849
-0.748313	a cache line	-0.124939
-0.330090	The cache line	-0.425969
-0.486174	by cache line	-0.124939
-0.486174	used cache line	-0.124939
-0.486174	new cache line	-0.124939
-0.178337	Each cache line	-0.124939
-0.486174	entire cache line	-0.124939
-1.945900	for each line	-0.124939
-1.280447	the matrix line	-0.124939
-1.409104	a matrix line	-0.124939
-0.880734	if above line	-0.124939
-1.367494	The next line	-0.124939
-0.587872	The last line	-0.124939
-0.586502	bytes. Each line	-0.124939
-0.562773	and interpreted line	-0.124939
-0.541121	the memset line	-0.124939
-0.230874	the command line	-0.124939
-0.192619	a command line	-0.425969
-0.358758	18.1. Command line	-0.124939
-1.501023	overflow and works	-0.124939
-1.896585	code that works	-0.124939
-0.847223	one that works	-0.124939
-1.265957	library that works	-0.124939
-1.265957	version that works	-0.124939
-1.077551	sure it works	-0.124939
-1.199770	This code works	-0.124939
-0.202338	optimization. This works	-0.124939
-0.593223	addresses. This works	-0.124939
-2.026337	Intel compiler works	-0.124939
-1.059151	how this works	-0.124939
-0.595280	course, this works	-0.124939
-0.600381	sequentially. It works	-0.124939
-1.074146	which one works	-0.124939
-1.022053	The cache works	-0.124939
-0.822162	code cache works	-0.124939
-0.845962	A cache works	-0.124939
-0.895849	It also works	-0.124939
-1.684708	This method works	-0.124939
-1.357349	this method works	-0.124939
-0.594364	currently doesn't works	-0.124939
-1.769397	CPU dispatching works	-0.124939
-0.878715	code branches works	-0.124939
-0.589376	code implementation works	-0.124939
-0.779850	execution mechanism works	-0.124939
-0.536083	renaming mechanism works	-0.124939
-1.158848	Dynamic linking works	-0.124939
-1.224789	a profiler works	-0.124939
-1.141873	automatic vectorization works	-0.124939
-0.540919	that already works	-0.124939
-0.249689	than "what works	-0.124939
-0.249689	thinks "what works	-0.124939
-0.463294	and 14.13b works	-0.124939
-0.463294	101 Multithreading works	-0.124939
-0.358658	operator (|) works	-0.124939
-2.918828	in the optimized	-0.124939
-1.372612	not the optimized	-0.124939
-1.199039	run the optimized	-0.124939
-0.600784	12.2, the optimized	-0.124939
-2.251929	code is optimized	-0.124939
-1.544466	cache is optimized	-0.124939
-1.074069	expression is optimized	-0.124939
-1.678003	compilers and optimized	-0.124939
-0.601683	obscured in optimized	-0.124939
-0.898948	output. The optimized	-0.124939
-0.599984	framework. The optimized	-0.124939
-1.980417	can be optimized	-0.124939
-1.180588	often be optimized	-0.124939
-1.893859	functions are optimized	-0.124939
-0.898006	examples are optimized	-0.124939
-0.601101	inlined, or optimized	-0.124939
-2.183821	are not optimized	-0.124939
-1.925885	of an optimized	-0.124939
-1.910217	that you optimized	-0.124939
-0.599287	function, each optimized	-0.124939
-1.086200	the best optimized	-0.124939
-1.175358	library contains optimized	-0.124939
-0.552204	the well optimized	-0.124939
-0.552204	a well optimized	-0.124939
-1.602432	is simply optimized	-0.124939
-0.573095	Currently includes optimized	-0.124939
-0.450455	the fully optimized	-0.124939
-0.450455	not fully optimized	-0.124939
-0.223015	a highly optimized	-0.124939
-0.241245	are highly optimized	-0.425969
-0.223015	making highly optimized	-0.124939
-0.249718	compiler. Not optimized	-0.124939
-0.249718	builder. Not optimized	-0.124939
-0.463367	each carefully optimized	-0.124939
-2.769656	it is inside	-0.124939
-1.439862	loop is inside	-0.124939
-1.910102	must be inside	-0.124939
-0.601239	declaring it inside	-0.124939
-2.643803	the code inside	-0.124939
-1.846963	of memory inside	-0.124939
-2.092974	is used inside	-0.124939
-0.598243	making objects inside	-0.124939
-0.597994	shared variable inside	-0.124939
-1.778802	the table inside	-0.124939
-0.796333	the branch inside	-0.425969
-1.341269	a branch inside	-0.124939
-0.805718	The branch inside	-0.124939
-0.893866	for elements inside	-0.124939
-0.594548	size arrays inside	-0.124939
-1.100065	the calculations inside	-0.124939
-0.540978	on calculations inside	-0.124939
-1.183546	point calculations inside	-0.124939
-0.881060	a counter inside	-0.124939
-0.589683	no branches inside	-0.124939
-0.389694	be declared inside	-0.425969
-0.386113	objects declared inside	-0.425969
-0.140536	Variables declared inside	-0.425969
-0.575212	performance counters inside	-0.124939
-0.846725	overflow condition inside	-0.124939
-0.852973	is defined inside	-0.124939
-0.474391	object defined inside	-0.124939
-0.557429	function body inside	-0.124939
-0.550319	what happens inside	-0.124939
-0.540786	needed. Objects inside	-0.124939
-0.540901	because nothing inside	-0.124939
-0.463276	multiplication, etc.) inside	-0.124939
-0.358643	kept entirely inside	-0.124939
-0.358643	than log) inside	-0.124939
-0.902255	See the manual	-0.425969
-1.502874	not a manual	-0.124939
-0.600538	49 and manual	-0.124939
-0.600538	paragraph and manual	-0.124939
-1.502829	explained in manual	-0.124939
-1.247557	given in manual	-0.124939
-0.880905	discussed in manual	-0.124939
-0.590826	chapter in manual	-0.124939
-1.046314	provided in manual	-0.124939
-1.347344	listed in manual	-0.124939
-0.590826	19 in manual	-0.124939
-0.201850	detail in manual	-0.124939
-0.590826	code" in manual	-0.124939
-0.590826	covered in manual	-0.124939
-0.585748	below. This manual	-0.124939
-0.871056	important. This manual	-0.124939
-0.585748	smaller. This manual	-0.124939
-0.585748	Introduction This manual	-0.124939
-0.585748	158. This manual	-0.124939
-1.924253	the compiler manual	-0.124939
-0.638417	to this manual	-0.602060
-1.457838	for this manual	-0.124939
-0.597440	cases. See manual	-0.124939
-1.780362	the Gnu manual	-0.124939
-0.583046	and my manual	-0.124939
-0.143400	CPU (See manual	-0.425969
-0.358948	CPUs (See manual	-0.124939
-0.358948	mispredicted (See manual	-0.124939
-0.504935	the present manual	-0.124939
-0.358868	The present manual	-0.124939
-0.463513	the vectorclass manual	-0.124939
-0.601645	compute a /	-0.124939
-0.846541	= b /	-0.124939
-0.991779	> b /	-0.124939
-0.598781	+= i /	-0.124939
-0.584860	sets). Here, /	-0.124939
-0.865296	* 5 /	-0.124939
-0.863127	= 1. /	-0.124939
-1.034092	= temp /	-0.124939
-0.854299	eax, 100 /	-0.124939
-1.011073	ebx, eax /	-0.124939
-0.557266	+= xn /	-0.124939
-0.557266	Windows. Borland /	-0.124939
-0.550230	/ CodeGear /	-0.124939
-0.526784	cache size) /	-0.124939
-0.366736	(unsigned int)b /	-0.124939
-0.504540	below. Signed /	-0.124939
-0.090098	of (2n /	-0.124939
-0.090098	* (2n /	-0.124939
-0.090098	constant (2n /	-0.124939
-0.504540	512 kb /	-0.124939
-0.659181	= a2 /	-0.124939
-0.659181	= a1 /	-0.124939
-0.463203	+ 2.0 /	-0.124939
-0.463203	is 8192 /	-0.124939
-0.659181	= (a+1) /	-0.124939
-0.358586	+ a2*b1) /	-0.124939
-0.358586	mov ebx,eax /	-0.124939
-0.358586	= (10000 /	-0.124939
-0.358586	(unsigned int)a /	-0.124939
-0.358586	(memory address) /	-0.124939
-0.358586	* (1. /	-0.124939
-0.358586	for 80x86 /	-0.124939
-0.358586	= (0x2710 /	-0.124939
-1.844617	method is explained	-0.124939
-0.901251	checking is explained	-0.124939
-0.597814	storage are explained	-0.124939
-0.894638	factors are explained	-0.124939
-0.597814	mangling are explained	-0.124939
-0.874115	code, as explained	-0.124939
-0.522458	processors, as explained	-0.124939
-0.522458	mode, as explained	-0.124939
-0.522458	system, as explained	-0.124939
-0.522458	static, as explained	-0.124939
-0.522458	up, as explained	-0.124939
-0.522458	precision, as explained	-0.124939
-0.522458	execution, as explained	-0.124939
-0.522458	space, as explained	-0.124939
-0.522458	operations, as explained	-0.124939
-0.522458	metaprogramming, as explained	-0.124939
-0.522458	classes, as explained	-0.124939
-0.522458	templates, as explained	-0.124939
-0.522458	branches, as explained	-0.124939
-0.522458	use, as explained	-0.124939
-0.522458	ways, as explained	-0.124939
-0.522458	AVX, as explained	-0.124939
-0.522458	statements, as explained	-0.124939
-0.522458	pool, as explained	-0.124939
-0.522458	optimizations, as explained	-0.124939
-0.522458	linking, as explained	-0.124939
-0.522458	frequency, as explained	-0.124939
-0.522458	pipelined, as explained	-0.124939
-0.522458	stride, as explained	-0.124939
-0.522458	contentions, as explained	-0.124939
-0.579258	is further explained	-0.124939
-0.205555	for reasons explained	-0.726999
-0.527270	perfectly. As explained	-0.124939
-0.463622	lookup mechanisms explained	-0.124939
-2.440684	by the calculated	-0.124939
-2.410813	with the calculated	-0.124939
-2.109428	that is calculated	-0.124939
-1.992592	which is calculated	-0.124939
-0.738803	value is calculated	-0.425969
-1.056087	expression is calculated	-0.124939
-0.594225	b) is calculated	-0.124939
-0.594225	xn is calculated	-0.124939
-0.594225	n! is calculated	-0.124939
-0.594225	coefficients is calculated	-0.124939
-0.594225	a+b is calculated	-0.124939
-0.594225	matrix[j][0] is calculated	-0.124939
-0.594225	g(x) is calculated	-0.124939
-1.855574	to be calculated	-0.124939
-1.326503	can be calculated	-0.388180
-1.740685	will be calculated	-0.124939
-1.655573	cannot be calculated	-0.124939
-1.008376	could be calculated	-0.124939
-1.896281	functions are calculated	-0.124939
-1.071959	operators are calculated	-0.124939
-1.833837	it has calculated	-0.124939
-1.029850	is only calculated	-0.124939
-1.271856	are always calculated	-0.124939
-0.358887	* 17is calculated	-0.124939
-0.358887	* 16is calculated	-0.124939
-2.209844	is the calculation	-0.124939
-1.794318	and the calculation	-0.425969
-2.571232	in the calculation	-0.124939
-2.356518	for the calculation	-0.124939
-2.417231	if the calculation	-0.124939
-1.773392	but the calculation	-0.124939
-0.595930	efficient the calculation	-0.124939
-1.950275	before the calculation	-0.124939
-1.613633	out the calculation	-0.124939
-1.585730	up the calculation	-0.124939
-0.890915	start the calculation	-0.124939
-0.595930	specifies the calculation	-0.124939
-0.890915	case, the calculation	-0.124939
-0.595930	begin the calculation	-0.124939
-1.061045	finished the calculation	-0.124939
-0.890915	redo the calculation	-0.124939
-0.595930	Re-do the calculation	-0.124939
-0.594197	c; The calculation	-0.124939
-1.335039	calls. The calculation	-0.124939
-0.594197	polynomial The calculation	-0.124939
-0.594197	28. The calculation	-0.124939
-0.887504	127. The calculation	-0.124939
-0.594197	supported. The calculation	-0.124939
-1.672908	} This calculation	-0.124939
-0.600617	shows this calculation	-0.124939
-0.600381	motion A calculation	-0.124939
-1.321054	that each calculation	-0.124939
-1.241873	where each calculation	-0.124939
-0.597917	out-of- order calculation	-0.124939
-1.273316	the address calculation	-0.124939
-0.546810	for address calculation	-0.124939
-0.546810	complicated address calculation	-0.124939
-1.426229	the total calculation	-0.124939
-0.358859	an estimated calculation	-0.124939
-0.358859	Linux. Address calculation	-0.124939
-0.541777	function } };	-0.124939
-0.541777	operator } };	-0.124939
-0.934799	1; } };	-0.425969
-0.541777	x; } };	-0.124939
-0.851676	2; } };	-0.425969
-0.541777	1.0; } };	-0.124939
-0.541777	N1 } };	-0.124939
-0.541777	powN<true,N/2>::p(x); } };	-0.124939
-0.541777	(static_cast<MyChild*>(this))->Disp(); } };	-0.124939
-0.597472	add elements };	-0.124939
-1.794296	sign bit };	-0.124939
-1.323321	the structure };	-0.124939
-0.354792	int c; };	-0.124939
-1.215573	c, d; };	-0.124939
-0.557441	at 19 };	-0.124939
-0.726283	and perhaps };	-0.124939
-0.249718	+ b;} };	-0.124939
-0.249718	{return b;} };	-0.124939
-0.065778	int c:2; };	-0.124939
-0.463367	Friday, Saturday };	-0.124939
-0.065778	void f(); };	-0.124939
-0.659439	float b[1000]; };	-0.124939
-0.659439	void NotPolymorphic(); };	-0.124939
-0.463367	= 0x40 };	-0.124939
-0.065778	sign :1;//signbit };	-0.425969
-0.358715	... ~C1(); };	-0.124939
-0.358715	int UnusedFiller; };	-0.124939
-0.358715	char abc; };	-0.124939
-0.358715	"Gamma", "Delta" };	-0.124939
-0.358715	+ 4.; };	-0.124939
-1.592821	class is 128	-0.124939
-1.076080	register is 128	-0.124939
-0.124735	defines a 128	-0.602060
-0.588200	4 int 128	-0.124939
-1.826337	unsigned int 128	-0.124939
-1.467919	short int 128	-0.124939
-1.733509	less than 128	-0.124939
-2.193314	See page 128	-0.124939
-0.599045	128 double 128	-0.124939
-0.598606	256 float 128	-0.124939
-0.586126	64 2 128	-0.425969
-0.583815	32 4 128	-0.124939
-1.184555	16 8 128	-0.124939
-1.322230	the first 128	-0.124939
-1.060679	8 16 128	-0.124939
-0.595947	vectors SSE2 128	-0.124939
-0.820256	128 128 128	-0.124939
-0.558671	12.2 128 128	-0.124939
-1.392501	the dispatcher 128	-0.124939
-0.985227	unsigned char 128	-0.124939
-0.536167	16 char 128	-0.124939
-0.575251	SIZE % 128	-0.124939
-0.573050	mode SSE 128	-0.124939
-0.557304	2 int64_t 128	-0.124939
-0.358812	0.28 strlen 128	-0.124939
-0.358812	0.27 strlen 128	-0.124939
-0.526911	2 uint64_t 128	-0.124939
-0.527040	126 12.2 128	-0.124939
-0.659353	compiler ......................................................................... 128	-0.124939
-0.358672	bits (MMX), 128	-0.124939
-2.752002	if the uses	-0.124939
-0.901932	big and uses	-0.124939
-1.291177	code that uses	-0.124939
-0.594822	application that uses	-0.124939
-0.888732	framework that uses	-0.124939
-0.594822	spot that uses	-0.124939
-1.972648	but it uses	-0.124939
-2.330714	a function uses	-0.124939
-0.895586	pow function uses	-0.124939
-0.600977	Mac code uses	-0.124939
-1.076568	an int uses	-0.124939
-1.923578	the compiler uses	-0.124939
-0.594006	CPUs. It uses	-0.124939
-0.594006	label. It uses	-0.124939
-1.637432	the program uses	-0.124939
-1.750290	a program uses	-0.124939
-1.531109	The program uses	-0.124939
-1.362425	a double uses	-0.124939
-1.193840	a float uses	-0.124939
-0.597606	when software uses	-0.124939
-0.589698	framework typically uses	-0.124939
-1.301150	the application uses	-0.124939
-0.545180	particular application uses	-0.124939
-0.543258	This implementation uses	-0.124939
-0.792507	good implementation uses	-0.124939
-0.589206	as their uses	-0.124939
-0.588771	user never uses	-0.124939
-0.869493	This feature uses	-0.124939
-0.578981	compiler sometimes uses	-0.124939
-0.578940	it still uses	-0.124939
-0.566733	Four typical uses	-0.124939
-0.557382	a vector, uses	-0.124939
-0.358672	or CString uses	-0.124939
-2.442379	of the four	-0.425969
-2.431158	and the four	-0.124939
-2.332834	with the four	-0.124939
-1.289487	add the four	-0.124939
-1.535741	store the four	-0.124939
-0.599727	vector, the four	-0.124939
-2.330120	that is four	-0.124939
-1.619787	vector of four	-0.124939
-1.068881	structure of four	-0.124939
-1.283995	vectors of four	-0.124939
-0.598599	maximum of four	-0.124939
-0.598599	groups of four	-0.124939
-1.202713	i to four	-0.124939
-2.261141	There are four	-0.124939
-0.896058	integers or four	-0.124939
-1.190168	precision or four	-0.124939
-0.894574	times with four	-0.124939
-0.597781	processor with four	-0.124939
-1.944371	more than four	-0.124939
-0.600554	only have four	-0.124939
-1.073353	processor has four	-0.124939
-1.153570	are only four	-0.124939
-1.153570	with only four	-0.124939
-0.584548	allows only four	-0.124939
-1.879282	can do four	-0.124939
-2.017203	the first four	-0.124939
-0.592769	you get four	-0.124939
-1.048966	for every four	-0.124939
-0.589677	// next four	-0.124939
-0.587218	will read four	-0.124939
-0.474297	of e.g. four	-0.124939
-0.474297	hold e.g. four	-0.124939
-1.047931	can hold four	-0.124939
-0.835223	bits each, four	-0.124939
-0.463367	loop calculates four	-0.124939
-1.901081	efficient than functions.	-0.124939
-0.599906	as different functions.	-0.124939
-1.491591	the library functions.	-0.124939
-0.563948	for library functions.	-0.124939
-0.563948	linked library functions.	-0.124939
-0.563948	executing library functions.	-0.124939
-1.581963	into multiple functions.	-0.124939
-1.704963	the two functions.	-0.124939
-0.508409	for member functions.	-0.124939
-0.732445	or member functions.	-0.124939
-0.508409	other member functions.	-0.124939
-1.037143	virtual member functions.	-0.124939
-0.465236	non-static member functions.	-0.124939
-0.508409	non-polymorphic member functions.	-0.124939
-1.318091	the virtual functions.	-0.124939
-0.831409	of virtual functions.	-0.124939
-0.591704	intrinsic hardware functions.	-0.124939
-0.498104	used intrinsic functions.	-0.124939
-0.498104	support intrinsic functions.	-0.124939
-0.498104	so-called intrinsic functions.	-0.124939
-0.498053	useful mathematical functions.	-0.124939
-0.498053	about mathematical functions.	-0.124939
-0.498053	optimized mathematical functions.	-0.124939
-0.587763	from string functions.	-0.124939
-0.587650	the three functions.	-0.124939
-0.716380	to frame functions.	-0.124939
-0.498748	and frame functions.	-0.124939
-0.851381	using overloaded functions.	-0.124939
-0.827666	the polymorphic functions.	-0.124939
-0.726283	for speed-critical functions.	-0.124939
-0.504864	and trigonometric functions.	-0.124939
-0.463367	local non-member functions.	-0.124939
-0.463367	use thread-safe functions.	-0.124939
-0.463367	than non-virtual functions.	-0.124939
-0.358715	remove unreferenced functions.	-0.124939
-1.078202	overflow is another	-0.124939
-1.910837	pointer to another	-0.124939
-0.600114	auto_ptr to another	-0.124939
-1.073369	ported to another	-0.124939
-0.901983	support and another	-0.124939
-0.897940	class in another	-0.124939
-1.288267	results in another	-0.124939
-0.599477	system-independent, in another	-0.124939
-1.202310	wait for another	-0.124939
-1.076268	overflow or another	-0.124939
-0.592064	second by another	-0.124939
-0.592064	5 by another	-0.124939
-0.592064	changed by another	-0.124939
-0.592064	deleted by another	-0.124939
-0.881932	addition with another	-0.124939
-0.591352	core with another	-0.124939
-0.591352	built with another	-0.124939
-0.591352	clash with another	-0.124939
-1.076019	cycle on another	-0.124939
-0.931072	called from another	-0.425969
-1.879122	can do another	-0.124939
-1.714016	by making another	-0.124939
-0.888397	calculations while another	-0.124939
-1.275942	function calls another	-0.124939
-0.516663	turn calls another	-0.124939
-0.185644	F1 calls another	-0.425969
-0.594318	file. Use another	-0.124939
-0.885992	is inside another	-0.124939
-1.029545	it goes another	-0.124939
-0.577292	destructor causes another	-0.124939
-1.395697	instruction set, another	-0.124939
-0.764379	that produces another	-0.124939
-0.358701	user interface, another	-0.124939
-0.358701	we encounter another	-0.124939
-2.590801	for the parameters	-0.124939
-2.049053	where the parameters	-0.124939
-0.600814	mode, the parameters	-0.124939
-0.600814	Storing the parameters	-0.124939
-1.639951	type of parameters	-0.124939
-0.601520	systems". The parameters	-0.124939
-1.059798	as function parameters	-0.124939
-0.595502	four function parameters	-0.124939
-0.595502	Simple function parameters	-0.124939
-0.600972	passed as parameters	-0.124939
-1.368652	with vector parameters	-0.124939
-2.294590	floating point parameters	-0.124939
-1.786432	Floating point parameters	-0.124939
-0.581260	two integer parameters	-0.124939
-0.862441	six integer parameters	-0.124939
-0.581260	compiler) integer parameters	-0.124939
-0.455491	the template parameters	-0.425969
-0.872749	as template parameters	-0.124939
-1.190191	has its parameters	-0.124939
-1.117785	of four parameters	-0.124939
-0.558705	first four parameters	-0.124939
-0.415635	parameters Function parameters	-0.124939
-0.159884	memory. Function parameters	-0.425969
-0.415635	operators. Function parameters	-0.124939
-0.159884	7.15 Function parameters	-0.124939
-0.415635	__fastcall. Function parameters	-0.124939
-0.580531	with desired parameters	-0.124939
-0.577342	that macro parameters	-0.124939
-0.266167	to fourteen parameters	-0.425969
-0.463458	stack (three parameters	-0.124939
-2.106743	possible to get	-0.124939
-1.167304	order to get	-0.124939
-0.593080	template to get	-0.124939
-1.388146	want to get	-0.124939
-1.742672	difficult to get	-0.124939
-1.462001	ways to get	-0.124939
-0.593080	-fno-builtin to get	-0.124939
-0.593080	experience to get	-0.124939
-0.601569	elsewhere and get	-0.124939
-1.278235	you can get	-0.124939
-1.076920	x // get	-0.124939
-0.601160	www.agner.org/optimize/testp.zip or get	-0.124939
-0.938455	will not get	-0.124939
-1.358465	you may get	-0.425969
-1.294679	example, you get	-0.124939
-0.829877	you will get	-0.124939
-1.022337	thread will get	-0.124939
-0.582274	y will get	-0.124939
-0.599611	Users should get	-0.124939
-0.583770	number we get	-0.124939
-0.867249	Then we get	-0.124939
-0.590663	will both get	-0.124939
-0.878966	will typically get	-0.124939
-1.262883	we don't get	-0.124939
-0.805598	will soon get	-0.124939
-0.527059	object: (1) get	-0.124939
-0.463440	inefficient, (4) get	-0.124939
-1.679631	a = b;	-0.124939
-0.596349	x[1] = b;	-0.124939
-0.147125	a; int b;	-0.346788
-0.580134	399 int b;	-0.124939
-1.116044	{ double b;	-0.124939
-0.495494	a; double b;	-0.124939
-0.895838	a & b;	-0.124939
-0.530689	int a, b;	-0.301030
-0.064123	double a, b;	-0.425969
-0.421545	float a, b;	-0.602060
-0.345882	i, a, b;	-0.124939
-0.662955	bool a, b;	-0.124939
-0.886681	a += b;	-0.124939
-0.882774	a : b;	-0.124939
-1.368016	a && b;	-0.124939
-1.380370	a | b;	-0.124939
-1.231502	a || b;	-0.124939
-1.422637	= 0, b;	-0.124939
-0.431057	a; bool b;	-0.124939
-0.431057	y; bool b;	-0.124939
-0.431057	z; bool b;	-0.124939
-0.835567	i, a[100], b;	-0.124939
-1.769775	do the check	-0.124939
-1.077493	bypass the check	-0.124939
-1.599022	such a check	-0.124939
-1.660188	code to check	-0.124939
-2.059228	have to check	-0.124939
-1.792813	has to check	-0.124939
-1.273412	way to check	-0.124939
-1.979096	how to check	-0.124939
-1.899428	need to check	-0.124939
-1.967253	want to check	-0.124939
-1.455565	calls to check	-0.124939
-1.826877	necessary to check	-0.124939
-0.892825	program can check	-0.124939
-1.872831	You can check	-0.124939
-1.719916	We can check	-0.124939
-2.298298	{ // check	-0.124939
-1.329352	does not check	-0.124939
-1.075851	class. This check	-0.124939
-0.600360	must then check	-0.124939
-1.461620	is no check	-0.425969
-0.990153	have no check	-0.425969
-0.888876	This extra check	-0.124939
-0.888702	function must check	-0.124939
-0.589346	doesn't automatically check	-0.124939
-0.588448	no automatic check	-0.124939
-0.874942	a runtime check	-0.124939
-0.573023	a bounds check	-0.124939
-0.570271	We might check	-0.124939
-0.841592	as input check	-0.124939
-0.981044	CPU brand check	-0.124939
-0.527103	A missing check	-0.124939
-0.526995	problem: (1) check	-0.124939
-1.494696	it is advantageous	-0.903090
-2.181174	function is advantageous	-0.124939
-1.798556	This is advantageous	-0.124939
-2.474483	It is advantageous	-0.124939
-1.260012	table is advantageous	-0.124939
-1.751025	method is advantageous	-0.124939
-1.339194	caching is advantageous	-0.124939
-1.861546	can be advantageous	-0.602060
-1.730221	not be advantageous	-0.124939
-1.687831	may be advantageous	-0.124939
-1.832283	will be advantageous	-0.124939
-1.639536	also be advantageous	-0.124939
-1.027041	cases be advantageous	-0.124939
-0.587533	therefore be advantageous	-0.124939
-0.601494	cores are advantageous	-0.124939
-1.463724	is not advantageous	-0.425969
-1.142496	therefore not advantageous	-0.124939
-1.489106	is more advantageous	-0.124939
-1.800443	is less advantageous	-0.124939
-0.596927	decide how advantageous	-0.124939
-0.596066	almost always advantageous	-0.124939
-0.590392	are particular advantageous	-0.124939
-2.155875	code is implemented	-0.124939
-2.050224	which is implemented	-0.124939
-1.558226	class is implemented	-0.124939
-1.465974	array is implemented	-0.124939
-0.597979	version is implemented	-0.425969
-0.892960	software is implemented	-0.124939
-1.069051	constructor is implemented	-0.124939
-1.326478	can be implemented	-0.467361
-1.520967	should be implemented	-0.425969
-1.008349	easily be implemented	-0.124939
-0.577143	day be implemented	-0.124939
-1.394348	which are implemented	-0.124939
-0.887335	etc. are implemented	-0.124939
-1.180697	languages are implemented	-0.124939
-1.063426	loops are implemented	-0.124939
-0.887335	Branches are implemented	-0.124939
-0.601271	modification if implemented	-0.124939
-0.890195	and have implemented	-0.124939
-1.878043	I have implemented	-0.124939
-1.836735	is often implemented	-0.124939
-0.593241	this calculation implemented	-0.124939
-0.880037	for programs implemented	-0.124939
-1.322840	is typically implemented	-0.124939
-0.590067	is preferably implemented	-0.124939
-2.874165	of the problem	-0.124939
-2.583491	if the problem	-0.124939
-2.276658	If the problem	-0.124939
-1.576264	avoid the problem	-0.124939
-1.193617	reduce the problem	-0.124939
-0.599410	ignore the problem	-0.124939
-0.897807	fix the problem	-0.124939
-0.599410	solving the problem	-0.124939
-1.759166	is a problem	-0.124939
-1.485605	not a problem	-0.124939
-1.184577	etc. The problem	-0.124939
-1.554758	cache. The problem	-0.124939
-0.893209	units. The problem	-0.124939
-0.597091	unchanged. The problem	-0.124939
-0.597119	registers. This problem	-0.124939
-0.597119	caching. This problem	-0.124939
-0.629986	to this problem	-0.301030
-0.842056	have this problem	-0.124939
-1.218045	avoid this problem	-0.124939
-0.570472	solved this problem	-0.124939
-0.842056	solve this problem	-0.124939
-1.197199	} A problem	-0.124939
-2.245923	is no problem	-0.124939
-1.174457	very big problem	-0.124939
-0.577392	with alignment problem	-0.124939
-0.577420	version causes problem	-0.124939
-0.575458	double. Another problem	-0.124939
-0.567021	a usability problem	-0.124939
-0.557557	most serious problem	-0.124939
-0.805871	The worst problem	-0.124939
-0.463476	This safety problem	-0.124939
-1.816390	it is known	-0.124939
-1.965728	which is known	-0.124939
-1.168300	objects is known	-0.124939
-0.376493	elements is known	-0.425969
-0.738952	result is known	-0.425969
-0.592824	store is known	-0.124939
-1.168300	n is known	-0.124939
-1.060902	process is known	-0.124939
-1.052043	condition is known	-0.124939
-1.176982	divisor is known	-0.124939
-0.902505	represent a known	-0.124939
-1.642237	object of known	-0.124939
-0.902185	constant and known	-0.124939
-2.004102	cannot be known	-0.124939
-1.350869	is not known	-0.823909
-1.801934	are not known	-0.124939
-0.599961	handle only known	-0.124939
-1.936628	an integer known	-0.124939
-1.331602	the size known	-0.124939
-0.599153	implicit pointer known	-0.124939
-1.071421	to any known	-0.124939
-1.764073	a constant known	-0.124939
-0.988881	of error known	-0.124939
-0.569828	programming error known	-0.124939
-0.266179	is already known	-0.124939
-0.560891	b for (i	-0.124939
-0.965807	0; for (i	-0.124939
-0.055940	i; for (i	-0.903090
-0.560891	b; for (i	-0.124939
-0.259675	... for (i	-0.726999
-0.195568	1; for (i	-0.425969
-0.560891	zero for (i	-0.124939
-0.363273	x; for (i	-0.425969
-1.000411	temp; for (i	-0.124939
-0.560891	3; for (i	-0.124939
-0.560891	a[100]; for (i	-0.124939
-0.560891	45 for (i	-0.124939
-0.560891	r; for (i	-0.124939
-0.560891	loop: for (i	-0.124939
-0.560891	StringLength; for (i	-0.124939
-0.560891	a[2]; for (i	-0.124939
-0.560891	84 for (i	-0.124939
-0.560891	timediff[NumberOfTests]; for (i	-0.124939
-0.560891	printf("\nResults:"); for (i	-0.124939
-0.589747	int if (i	-0.124939
-1.543713	{ if (i	-0.124939
-0.201629	... if (i	-0.124939
-0.589747	list[ARRAYSIZE]; if (i	-0.124939
-0.594978	exceptions: while (i	-0.124939
-1.549986	then the solution	-0.425969
-0.601125	Fortunately, the solution	-0.124939
-1.078098	But a solution	-0.124939
-0.601448	comparisons. The solution	-0.124939
-0.597064	mark_end; This solution	-0.124939
-0.597064	JNZ). This solution	-0.124939
-1.713350	of this solution	-0.124939
-0.879693	But this solution	-0.124939
-0.590204	Furthermore, this solution	-0.124939
-1.072161	see which solution	-0.124939
-1.741869	more efficient solution	-0.124939
-0.689550	most efficient solution	-0.124939
-0.906459	very efficient solution	-0.124939
-0.536554	An efficient solution	-0.124939
-1.268236	A simple solution	-0.124939
-1.549737	The best solution	-0.124939
-1.048540	The standard solution	-0.124939
-1.336310	the optimal solution	-0.124939
-0.547160	an optimal solution	-0.124939
-1.529715	more complicated solution	-0.124939
-1.233353	a better solution	-0.124939
-0.463458	The alternative solution	-0.124939
-0.659582	An alternative solution	-0.124939
-1.095284	the fastest solution	-0.124939
-0.981489	more powerful solution	-0.124939
-0.527179	most clean solution	-0.124939
-0.527056	only reasonable solution	-0.124939
-0.504680	class. Which solution	-0.124939
-0.659381	a viable solution	-0.124939
-0.463330	No universal solution	-0.124939
-0.358686	The radical solution	-0.124939
-0.358686	The ultimate solution	-0.124939
-2.883127	in the container	-0.124939
-2.300019	because the container	-0.124939
-2.314220	If the container	-0.124939
-1.625991	making the container	-0.124939
-0.600443	Can the container	-0.124939
-1.378328	use a container	-0.124939
-1.288817	into a container	-0.124939
-0.595910	defining a container	-0.124939
-0.890877	choosing a container	-0.124939
-0.890877	considered a container	-0.124939
-0.890877	lock a container	-0.124939
-0.595910	re-use a container	-0.124939
-1.498472	examples of container	-0.124939
-1.634571	discussion of container	-0.124939
-1.201840	class. The container	-0.124939
-0.601511	Remember that container	-0.124939
-0.599257	array or container	-0.124939
-2.527732	to make container	-0.124939
-1.917695	and other container	-0.124939
-0.898202	use one container	-0.124939
-1.070618	of example container	-0.124939
-1.068781	of such container	-0.124939
-0.572698	of efficient container	-0.124939
-1.954655	more efficient container	-0.124939
-0.572698	various efficient container	-0.124939
-0.882556	many standard container	-0.124939
-1.238811	your own container	-0.124939
-0.584902	Ready made container	-0.124939
-0.863449	an STL container	-0.124939
-1.189653	for accessing container	-0.124939
-0.575349	www.agner.org/optimize/cppexamples.zip containing container	-0.124939
-0.504761	by well-tested container	-0.124939
-1.105274	has the advantage	-0.425969
-1.375823	gives the advantage	-0.124939
-1.822767	} The advantage	-0.124939
-1.425397	program. The advantage	-0.124939
-1.504464	cache. The advantage	-0.124939
-1.047534	compilers. The advantage	-0.124939
-0.881737	enabled. The advantage	-0.124939
-1.047534	faster. The advantage	-0.124939
-0.591252	iterations. The advantage	-0.124939
-0.591252	m. The advantage	-0.124939
-0.600989	register. This advantage	-0.124939
-1.667449	is an advantage	-0.124939
-0.872877	be an advantage	-0.124939
-1.122830	not an advantage	-0.124939
-1.016832	only an advantage	-0.124939
-0.860546	gives an advantage	-0.124939
-2.246258	is no advantage	-0.124939
-0.598604	no such advantage	-0.124939
-1.361809	that takes advantage	-0.124939
-0.764807	to take advantage	-0.425969
-0.387410	can take advantage	-0.903090
-0.594377	any speed advantage	-0.124939
-1.692708	a specific advantage	-0.124939
-1.024468	The main advantage	-0.124939
-0.577464	take maximum advantage	-0.124939
-1.181557	the full advantage	-0.124939
-0.504861	We took advantage	-0.124939
-0.851879	function // Function	-0.124939
-0.835182	} // Function	-0.602060
-0.851879	functions // Function	-0.124939
-0.575700	classes // Function	-0.124939
-1.408050	}; // Function	-0.124939
-0.851879	SSE4.1 // Function	-0.124939
-0.851879	); // Function	-0.124939
-0.575700	parm2); // Function	-0.124939
-0.575700	const*)p);} // Function	-0.124939
-0.575700	CriticalFunction_Dispatch; // Function	-0.124939
-1.634086	position-independent code Function	-0.124939
-1.469051	dynamic libraries Function	-0.124939
-0.596075	Optimization method Function	-0.124939
-0.888876	inlined function. Function	-0.124939
-1.411557	Function parameters Function	-0.124939
-1.173136	non-inlined copy Function	-0.124939
-0.780094	in memory. Function	-0.425969
-0.573023	these instructions. Function	-0.124939
-0.788517	7.14 Functions Function	-0.124939
-0.540917	profiler itself. Function	-0.124939
-0.764043	overloaded operators. Function	-0.124939
-0.249726	operators. 7.7 Function	-0.124939
-0.249726	36 7.7 Function	-0.124939
-0.249726	50 7.16 Function	-0.124939
-0.249726	systems". 7.16 Function	-0.124939
-0.143333	48 7.15 Function	-0.124939
-0.143333	respect. 7.15 Function	-0.124939
-0.463385	using __fastcall. Function	-0.124939
-0.358729	Example 12.6. Function	-0.124939
-0.358729	functions /Gr Function	-0.124939
-0.358729	know about. Function	-0.124939
-0.358729	library libircmt.lib. Function	-0.124939
-1.743540	code to support	-0.124939
-0.601456	system for support	-0.124939
-1.266049	compilers that support	-0.124939
-0.598834	processors that support	-0.124939
-0.888758	CPUs that support	-0.124939
-1.334276	do not support	-0.124939
-1.177957	Does not support	-0.124939
-1.613465	that have support	-0.124939
-1.697872	compilers have support	-0.124939
-0.600293	processors will support	-0.124939
-1.052751	set has support	-0.124939
-0.593070	system has support	-0.124939
-2.526941	to make support	-0.124939
-0.894926	has some support	-0.124939
-0.596688	Gnu libraries support	-0.124939
-0.189280	with AVX support	-0.124939
-0.896932	without AVX support	-0.124939
-0.114843	has hardware support	-0.602060
-1.672076	exception handling support	-0.124939
-0.588895	"we don't support	-0.124939
-0.874781	need better support	-0.124939
-0.586458	It requires support	-0.124939
-0.714505	turn off support	-0.425969
-0.575467	requires OS support	-0.124939
-0.562628	and profiling support	-0.124939
-0.557494	full debugging support	-0.124939
-0.463349	have inherent support	-0.124939
-0.358701	has excellent support	-0.124939
-2.663955	for the supported	-0.124939
-0.730697	set is supported	-0.492916
-1.450709	C++ is supported	-0.124939
-0.202672	AVX is supported	-0.124939
-0.594870	AVX2 is supported	-0.124939
-1.702275	Linux and supported	-0.124939
-0.900002	standardized and supported	-0.124939
-0.855575	processors that supported	-0.124939
-1.861516	functions are supported	-0.124939
-1.690638	registers are supported	-0.124939
-1.187046	directives are supported	-0.124939
-0.895651	available if supported	-0.124939
-0.598325	__restrict__, if supported	-0.124939
-2.276487	such as supported	-0.124939
-2.077465	are not supported	-0.124939
-0.597057	set not supported	-0.124939
-0.599952	currently only supported	-0.124939
-0.584562	// SSE2 supported	-0.124939
-1.785965	information about supported	-0.124939
-0.488676	// AVX supported	-0.124939
-0.204051	// Get supported	-0.425969
-0.463476	the minimum supported	-0.124939
-0.358801	// Detect supported	-0.124939
-1.078284	variables is eight	-0.124939
-1.635723	vector of eight	-0.124939
-1.295495	vectors of eight	-0.124939
-0.089894	vector in eight	-0.726999
-2.261736	There are eight	-0.124939
-1.200473	precision or eight	-0.124939
-0.561143	loop by eight	-0.726999
-1.293706	can have eight	-0.124939
-0.599747	processors but eight	-0.124939
-0.599109	go into eight	-0.124939
-1.521169	The first eight	-0.124939
-0.578515	49 first eight	-0.124939
-0.596242	But these eight	-0.124939
-1.165409	can run eight	-0.124939
-0.575333	can handle eight	-0.124939
-0.007987	// Load eight	-1.028029
-0.169311	operations involves eight	-0.124939
-0.835352	bits each, eight	-0.124939
-0.463422	it handles eight	-0.124939
-0.358758	be reloaded eight	-0.124939
-2.425832	with the operators	-0.124939
-0.687749	variables and operators	-0.124939
-0.899070	vectors. The operators	-0.124939
-1.073165	cycle. The operators	-0.124939
-1.073749	come from operators	-0.124939
-1.624921	that all operators	-0.124939
-0.599773	1, but operators	-0.124939
-0.594093	free. These operators	-0.124939
-0.590365	results. Integer operators	-0.124939
-0.154251	the Boolean operators	-0.124939
-0.597524	The Boolean operators	-0.124939
-0.691959	using overloaded operators	-0.124939
-0.483789	multiple overloaded operators	-0.124939
-0.095760	the bitwise operators	-0.124939
-0.069769	The bitwise operators	-0.124939
-0.081875	with bitwise operators	-0.124939
-0.081875	using bitwise operators	-0.124939
-0.039011	Use bitwise operators	-0.425969
-0.081875	corresponding bitwise operators	-0.124939
-0.541026	about increment operators	-0.124939
-0.129454	7.27 Overloaded operators	-0.124939
-0.249763	to decrement operators	-0.124939
-0.358875	and decrement operators	-0.124939
-0.358801	and relational operators	-0.124939
-3.196202	of the few	-0.124939
-2.138130	is a few	-0.124939
-1.853486	for a few	-0.124939
-0.544000	are a few	-0.124939
-0.920221	or a few	-0.124939
-0.979292	than a few	-0.124939
-1.796283	make a few	-0.124939
-1.407370	only a few	-0.124939
-1.242494	takes a few	-0.124939
-0.872837	add a few	-0.124939
-0.586670	needed a few	-0.124939
-1.034556	just a few	-0.124939
-1.229015	require a few	-0.124939
-0.586670	until a few	-0.124939
-1.034556	include a few	-0.124939
-0.872837	break a few	-0.124939
-0.586670	SSSE3 a few	-0.124939
-0.601561	kludgy. The few	-0.124939
-1.201301	loop with few	-0.124939
-0.601018	have as few	-0.124939
-0.887899	disk. A few	-0.124939
-0.594398	lines. A few	-0.124939
-2.637741	the same few	-0.124939
-0.599981	where only few	-0.124939
-0.599828	matters, which few	-0.124939
-1.066614	have very few	-0.124939
-1.257097	that uses few	-0.124939
-0.580693	table. Unfortunately, few	-0.124939
-1.948168	code that contains	-0.124939
-1.188687	loop that contains	-0.124939
-0.895305	container that contains	-0.124939
-1.953862	the code contains	-0.124939
-1.983844	the program contains	-0.124939
-1.061845	a program contains	-0.124939
-2.002409	a loop contains	-0.124939
-0.599718	(eax) which contains	-0.124939
-1.227861	This library contains	-0.124939
-0.196223	core library contains	-0.124939
-0.563937	Primitives" library contains	-0.124939
-0.950309	the software contains	-0.425969
-0.855628	it often contains	-0.124939
-0.577681	code often contains	-0.124939
-1.423514	the expression contains	-0.124939
-1.370948	code section contains	-0.124939
-0.577251	are testing contains	-0.124939
-0.572984	Now ebx contains	-0.124939
-0.434777	ecx now contains	-0.124939
-0.434777	body now contains	-0.124939
-0.562757	is. ecx contains	-0.124939
-0.557421	Boost collection contains	-0.124939
-0.805431	and edx contains	-0.124939
-0.884552	at www.agner.org/optimize/cppexamples.zip contains	-0.124939
-0.659410	Kernel Library" contains	-0.124939
-0.358701	at www.agner.org/optimize/asmlib.zip contains	-0.124939
-0.358701	file http://www.agner.org/optimize/asmlib.zip contains	-0.124939
-0.358701	class (CGrandParent) contains	-0.124939
-0.358701	class (CParent<>) contains	-0.124939
-0.680308	regardless of whether	-0.124939
-0.901907	program and whether	-0.124939
-1.742802	depends on whether	-0.124939
-0.598513	be efficient whether	-0.124939
-0.596170	for sure whether	-0.124939
-0.890998	find out whether	-0.124939
-0.594895	made about whether	-0.124939
-1.558224	to check whether	-0.124939
-0.590628	equally fast whether	-0.124939
-0.947356	to see whether	-0.124939
-1.451218	no difference whether	-0.124939
-0.871089	table shows whether	-0.124939
-1.382447	to know whether	-0.124939
-0.577284	not clear whether	-0.124939
-1.416149	to predict whether	-0.124939
-0.521035	may consider whether	-0.124939
-0.377817	that checks whether	-0.124939
-0.531697	it checks whether	-0.124939
-0.377817	dispatcher checks whether	-0.124939
-0.557281	at compile-time whether	-0.124939
-1.088219	to evaluate whether	-0.124939
-0.019417	when deciding whether	-0.522879
-0.835052	to determine whether	-0.124939
-0.463294	When considering whether	-0.124939
-0.463294	operand determines whether	-0.124939
-0.463294	it decides whether	-0.124939
-0.358658	predict correctly whether	-0.124939
-0.151179	< 100; i++)	-0.602060
-0.584196	< 2; i++)	-0.124939
-0.019731	< size; i++)	-0.522879
-0.450566	< n; i++)	-0.124939
-0.639498	<= n; i++)	-0.124939
-1.295635	< 256; i++)	-0.124939
-0.541111	< 1000; i++)	-0.124939
-0.504861	(i=0; i<100; i++)	-0.124939
-0.659639	< 20; i++)	-0.124939
-0.659639	(i=0; i<n; i++)	-0.124939
-0.065791	< rows; i++)	-0.124939
-0.065791	< NumberOfTests; i++)	-0.425969
-0.358816	< arraysize; i++)	-0.124939
-0.358816	< ArraySize; i++)	-0.124939
-0.358816	< list.Size(); i++)	-0.124939
-2.922735	in the list	-0.124939
-2.647590	that the list	-0.124939
-2.673846	if the list	-0.124939
-1.940809	into the list	-0.124939
-1.976042	is a list	-0.425969
-2.458095	of a list	-0.124939
-2.027730	for a list	-0.124939
-2.105678	by a list	-0.124939
-1.281647	Such a list	-0.124939
-1.555368	beginning of list	-0.124939
-0.902472	element to list	-0.124939
-1.853510	elements in list	-0.124939
-0.600319	set?". A list	-0.124939
-1.683158	a long list	-0.124939
-1.986643	The following list	-0.124939
-0.321837	a linked list	-0.124939
-0.143384	A linked list	-0.425969
-0.213094	a negative list	-0.602060
-0.292719	a positive list	-0.301030
-1.392300	the entire list	-0.124939
-1.365865	a linear list	-0.124939
-1.259950	the smallest list	-0.124939
-0.028017	a sorted list	-0.124939
-0.901899	errors that would	-0.124939
-1.769149	when it would	-0.124939
-1.942025	but it would	-0.124939
-0.593216	them. This would	-0.124939
-0.593216	scratch. This would	-0.124939
-0.593216	log(c[i]);. This would	-0.124939
-2.471038	the compiler would	-0.124939
-1.568892	optimizing compiler would	-0.124939
-1.075227	time you would	-0.124939
-0.741396	because this would	-0.124939
-1.691576	The loop would	-0.124939
-0.599674	{} which would	-0.124939
-1.563015	induction variable would	-0.124939
-1.427773	then we would	-0.124939
-1.697791	cache line would	-0.124939
-1.168272	the parameters would	-0.124939
-0.592593	ultimate solution would	-0.124939
-1.377443	the multiplication would	-0.124939
-0.589363	safer implementation would	-0.124939
-0.858544	and d would	-0.124939
-0.846725	two loops would	-0.124939
-0.570036	to metaprogramming would	-0.124939
-1.225240	dependency chain would	-0.124939
-0.434774	but who would	-0.124939
-0.434774	And who would	-0.124939
-0.504726	the reduction would	-0.124939
-0.358719	particular reduction would	-0.124939
-0.916444	to 15.1c would	-0.124939
-0.526869	they otherwise would	-0.124939
-0.835009	to 0x273F would	-0.124939
-0.835009	the logarithm would	-0.124939
-0.463276	then sizeof(S1) would	-0.124939
-3.062047	in the likely	-0.124939
-0.587419	for is likely	-0.124939
-1.775211	it is likely	-0.301030
-1.514838	code is likely	-0.425969
-1.732820	compiler is likely	-0.124939
-1.803698	this is likely	-0.124939
-1.803694	program is likely	-0.124939
-1.874640	which is likely	-0.124939
-1.232310	address is likely	-0.124939
-1.036660	user is likely	-0.124939
-1.687063	method is likely	-0.124939
-1.036660	system is likely	-0.124939
-0.587419	bits is likely	-0.124939
-1.391340	problem is likely	-0.124939
-1.148302	model is likely	-0.124939
-0.874285	platform is likely	-0.124939
-1.378439	here is likely	-0.124939
-0.587419	brand is likely	-0.124939
-0.587419	comparison is likely	-0.124939
-0.587419	87) is likely	-0.124939
-1.743511	data are likely	-0.124939
-2.186426	is more likely	-0.124939
-0.599389	will most likely	-0.124939
-1.917967	is also likely	-0.124939
-1.079574	is very likely	-0.124939
-1.477402	are less likely	-0.124939
-1.531386	and therefore likely	-0.124939
-0.591104	which quite likely	-0.124939
-1.162104	are equally likely	-0.124939
-2.455514	of the structure	-0.124939
-2.850693	in the structure	-0.124939
-1.622455	making the structure	-0.124939
-1.196366	load the structure	-0.124939
-0.899196	made the structure	-0.124939
-2.492567	is a structure	-0.124939
-2.027160	in a structure	-0.425969
-2.207023	as a structure	-0.124939
-0.598106	big a structure	-0.124939
-0.895218	define a structure	-0.124939
-1.291740	array of structure	-0.124939
-1.073583	arrays of structure	-0.124939
-0.899350	alignment of structure	-0.124939
-0.272724	class or structure	-0.346788
-1.075934	thread. This structure	-0.124939
-0.600304	floats A structure	-0.124939
-0.885850	other data structure	-0.124939
-0.593355	own data structure	-0.124939
-0.600123	clear program structure	-0.124939
-2.633793	the same structure	-0.124939
-1.349481	the whole structure	-0.124939
-1.003855	the logical structure	-0.124939
-0.573065	a parallel structure	-0.124939
-0.835595	the logic structure	-0.124939
-0.788893	a multidimensional structure	-0.124939
-0.884572	the pipeline structure	-0.124939
-0.527080	a class, structure	-0.124939
-2.727399	it is doing	-0.124939
-2.358132	function is doing	-0.124939
-0.899729	microprocessor is doing	-0.124939
-1.671867	way of doing	-0.124939
-0.677876	method of doing	-0.425969
-0.548101	ways of doing	-0.124939
-0.601569	renaming and doing	-0.124939
-2.101162	used for doing	-0.124939
-0.895930	C++ for doing	-0.124939
-1.570731	available for doing	-0.124939
-2.008706	you are doing	-0.124939
-1.830023	functions are doing	-0.124939
-1.270917	threads are doing	-0.124939
-0.595868	Sum3 are doing	-0.124939
-0.601131	accomplished by doing	-0.124939
-2.185475	are not doing	-0.124939
-1.497541	resources than doing	-0.124939
-0.900077	spend time doing	-0.124939
-0.594734	size when doing	-0.124939
-1.182504	useful when doing	-0.124939
-0.789170	compiler from doing	-0.124939
-0.371623	CPU from doing	-0.124939
-0.600215	best at doing	-0.124939
-2.272131	the CPU doing	-0.124939
-1.582703	innermost loop doing	-0.124939
-0.586419	to actually doing	-0.124939
-1.392952	in fact doing	-0.124939
-0.659553	is busy doing	-0.124939
-2.168809	is to run	-0.124939
-1.951063	likely to run	-0.124939
-1.987742	able to run	-0.124939
-0.597001	programs to run	-0.124939
-0.893030	models to run	-0.124939
-1.184227	try to run	-0.124939
-1.064177	prefer to run	-0.124939
-1.077748	dispatching and run	-0.124939
-0.895384	threads that run	-0.124939
-0.598190	services that run	-0.124939
-0.598190	servers that run	-0.124939
-1.057430	set can run	-0.124939
-0.594688	part can run	-0.124939
-0.594688	cores can run	-0.124939
-0.594688	family can run	-0.124939
-0.882681	code may run	-0.124939
-0.882681	calls may run	-0.124939
-0.591736	thread may run	-0.124939
-2.013496	If you run	-0.124939
-0.768309	it will run	-0.124939
-1.429905	code will run	-0.124939
-1.006408	thread will run	-0.124939
-1.074173	can then run	-0.124939
-0.600224	or at run	-0.124939
-0.599942	Can only run	-0.124939
-0.898218	process should run	-0.124939
-1.820861	of each run	-0.124939
-1.181991	a test run	-0.124939
-1.060973	will always run	-0.124939
-0.587778	make applications run	-0.124939
-1.138560	can still run	-0.124939
-1.544775	than to calculate	-0.124939
-1.935783	have to calculate	-0.124939
-1.610867	time to calculate	-0.124939
-2.015537	possible to calculate	-0.124939
-1.005926	takes to calculate	-0.602060
-1.162455	variables to calculate	-0.124939
-2.005828	order to calculate	-0.124939
-1.464655	faster to calculate	-0.124939
-1.853368	want to calculate	-0.124939
-1.856135	able to calculate	-0.124939
-1.866140	recommended to calculate	-0.124939
-0.874874	application to calculate	-0.124939
-1.149410	start to calculate	-0.124939
-0.587723	care to calculate	-0.124939
-0.587723	procedure to calculate	-0.124939
-1.149410	convenient to calculate	-0.124939
-1.162455	safer to calculate	-0.124939
-1.493616	function and calculate	-0.124939
-0.600601	*p and calculate	-0.124939
-0.879885	and can calculate	-0.124939
-1.951295	it can calculate	-0.124939
-1.606992	we can calculate	-0.124939
-1.794761	You can calculate	-0.124939
-1.044823	processors can calculate	-0.124939
-1.654526	We can calculate	-0.124939
-1.948319	compiler may calculate	-0.124939
-1.735061	compiler will calculate	-0.124939
-1.173977	we will calculate	-0.124939
-0.600020	needs only calculate	-0.124939
-0.888968	compiler must calculate	-0.124939
-0.567168	you could calculate	-0.124939
-2.757092	if the inline	-0.124939
-1.950657	compiler to inline	-0.124939
-1.970811	likely to inline	-0.124939
-1.366495	able to inline	-0.124939
-1.287314	optimal to inline	-0.124939
-1.741933	support for inline	-0.124939
-0.600995	macro as inline	-0.124939
-1.463759	with an inline	-0.124939
-1.285043	using an inline	-0.124939
-2.419143	to use inline	-0.124939
-2.635763	the same inline	-0.124939
-1.073003	critical functions inline	-0.124939
-0.036515	array static inline	-0.726999
-0.449929	N> static inline	-0.124939
-0.449929	<emmintrin.h> static inline	-0.124939
-0.449929	14.19 static inline	-0.124939
-0.449929	T> static inline	-0.124939
-0.449929	line: static inline	-0.124939
-0.449929	_mm_cvtss_si32(_mm_load_ss(&x));} static inline	-0.124939
-0.449929	add_horizontal) static inline	-0.124939
-1.283045	it cannot inline	-0.124939
-1.635549	function call inline	-0.124939
-0.594510	functions An inline	-0.124939
-0.594461	possible. Use inline	-0.124939
-1.037395	intrinsic functions, inline	-0.124939
-0.600909	file of every	-0.124939
-1.297587	copy of every	-0.124939
-0.895812	block for every	-0.124939
-0.598406	expressions for every	-0.124939
-0.598406	again for every	-0.124939
-0.901939	recommend that every	-0.124939
-1.738410	function or every	-0.124939
-0.887419	dispatching on every	-0.124939
-0.887419	dispatch on every	-0.124939
-0.887419	Dispatch on every	-0.124939
-1.372301	do this every	-0.124939
-1.170691	done at every	-0.124939
-0.593458	breakpoints at every	-0.124939
-1.536138	for example every	-0.124939
-1.352044	are called every	-0.124939
-1.587120	is done every	-0.124939
-1.166192	the list every	-0.124939
-1.168155	of branches every	-0.124939
-0.934661	memory block every	-0.425969
-1.019355	point addition every	-0.124939
-0.179276	one addition every	-0.124939
-1.386241	be loaded every	-0.124939
-0.997030	for updates every	-0.124939
-0.572915	interrupt, e.g. every	-0.124939
-0.827592	a misprediction every	-0.124939
-0.805398	are evaluated every	-0.124939
-0.788244	be updated every	-0.124939
-0.463330	// incremented every	-0.124939
-0.358686	is re-allocated every	-0.124939
-0.358686	be re-calculated every	-0.124939
-2.842044	of the standard	-0.124939
-2.020467	to the standard	-0.124939
-2.413512	on the standard	-0.124939
-1.826860	as the standard	-0.124939
-2.264157	If the standard	-0.124939
-1.429385	Unfortunately, the standard	-0.124939
-0.599047	purposes the standard	-0.124939
-0.599047	Omitting the standard	-0.124939
-2.657148	is a standard	-0.124939
-1.977523	have a standard	-0.124939
-0.203899	rule of standard	-0.425969
-1.068722	classes. The standard	-0.124939
-0.598545	storing. The standard	-0.124939
-0.598545	pointer". The standard	-0.124939
-0.601505	structures for standard	-0.124939
-1.377538	know that standard	-0.124939
-0.600968	(1985). This standard	-0.124939
-1.497541	resources than standard	-0.124939
-1.814879	can use standard	-0.124939
-1.058329	should use standard	-0.124939
-1.465801	for many standard	-0.124939
-0.584630	Unfortunately, many standard	-0.124939
-0.595408	only simple standard	-0.124939
-0.594087	Connecting several standard	-0.124939
-0.583040	official C standard	-0.124939
-0.573155	compiler includes standard	-0.124939
-0.566985	compilers include standard	-0.124939
-0.249748	the C/C++ standard	-0.124939
-0.249748	The C/C++ standard	-0.124939
-0.358773	the IEEE standard	-0.124939
-2.548656	for the hardware	-0.124939
-2.465569	on the hardware	-0.124939
-1.594207	than the hardware	-0.425969
-2.406224	when the hardware	-0.124939
-2.286554	because the hardware	-0.124939
-1.851453	and a hardware	-0.124939
-1.799018	in a hardware	-0.602060
-1.897718	than a hardware	-0.124939
-1.605515	where a hardware	-0.124939
-0.898871	choice of hardware	-0.425969
-1.045524	Choice of hardware	-0.425969
-1.640986	access to hardware	-0.124939
-1.641016	implemented in hardware	-0.124939
-0.601530	them. The hardware	-0.124939
-1.843591	based on hardware	-0.124939
-0.373800	CPU has hardware	-0.425969
-1.143910	microprocessor has hardware	-0.124939
-0.671167	or other hardware	-0.124939
-0.592712	any known hardware	-0.124939
-1.426907	the microprocessor hardware	-0.124939
-0.545353	in microprocessor hardware	-0.124939
-0.877522	the intrinsic hardware	-0.124939
-0.805768	language defines hardware	-0.124939
-0.463476	and direct hardware	-0.124939
-0.358801	to catching hardware	-0.124939
-1.078460	cycle is 1	-0.124939
-0.899700	positive and 1	-0.124939
-0.600362	false and 1	-0.124939
-2.209982	will be 1	-0.124939
-0.548840	0 or 1	-0.124939
-1.076996	b with 1	-0.124939
-1.781708	byte at 1	-0.124939
-1.658759	b + 1	-0.124939
-1.184759	or unsigned 1	-0.124939
-1.115342	long 64 1	-0.124939
-0.578107	Iu32vec2 64 1	-0.124939
-0.595652	// always 1	-0.124939
-0.595759	_mm_shuffle_epi8 16 1	-0.124939
-0.595869	_mm_perm_epi8 32 1	-0.124939
-0.595242	b & 1	-0.124939
-0.553492	unsigned 1 1	-0.124939
-0.553492	bool 1 1	-0.124939
-0.579070	bytes bool 1	-0.124939
-0.572947	OneOrTwo5[(b!=0) ? 1	-0.124939
-0.818122	eax ebx, 1	-0.124939
-0.557324	PTR[ecx+eax*4],ebx eax, 1	-0.124939
-0.557236	2exponent 127 1	-0.124939
-0.504783	.......................................................................................................... 164 1	-0.124939
-0.504600	ENDP ecx, 1	-0.124939
-0.834966	Manual", Volume 1	-0.124939
-0.463257	around. Adding 1	-0.124939
-0.463257	and subtracting 1	-0.124939
-0.358629	2014-08-07. Contents 1	-0.124939
-0.358629	2exponent 1023 1	-0.124939
-0.358629	the level- 1	-0.124939
-0.203903	? a :	-0.124939
-0.599572	int one :	-0.124939
-0.201259	? b :	-0.425969
-0.493166	+ 2 :	-0.425969
-0.591604	? 1 :	-0.124939
-1.058314	int sign :	-0.124939
-0.100557	int exponent :	-0.124939
-0.093501	int fraction :	-0.124939
-0.562885	+ 2) :	-0.124939
-0.159829	class D :	-0.425969
-0.383888	class C1 :	-0.425969
-0.129415	class CChild1 :	-0.425969
-0.504700	class CChild2 :	-0.124939
-0.463349	class CParent :	-0.124939
-0.463349	class C2 :	-0.124939
-0.463349	? 1.0f :	-0.124939
-0.358701	: "=m"(n) :	-0.124939
-0.358701	? 1.5f :	-0.124939
-0.358701	public: c1() :	-0.124939
-0.358701	? EXCEPTION_EXECUTE_HANDLER :	-0.124939
-0.358701	%0 " :	-0.124939
-0.358701	: "m"(x) :	-0.124939
-2.147820	have to add	-0.124939
-2.223202	possible to add	-0.124939
-1.935417	takes to add	-0.124939
-0.896079	software to add	-0.124939
-1.290555	reason to add	-0.124939
-0.600437	operands and add	-0.124939
-0.600437	shift and add	-0.124939
-0.601497	plug-ins that add	-0.124939
-0.594143	operator // add	-0.124939
-0.887397	23; // add	-0.124939
-0.594143	2;} // add	-0.124939
-0.594143	add_elements(s); // add	-0.124939
-1.076312	loop or add	-0.124939
-2.042627	do not add	-0.124939
-1.396394	You may add	-0.124939
-0.594100	size then add	-0.124939
-1.055725	module then add	-0.124939
-0.202118	The instruction add	-0.124939
-0.895330	When we add	-0.124939
-1.061688	may even add	-0.124939
-0.595388	two instructions add	-0.124939
-0.594680	2 ; add	-0.124939
-1.403211	that doesn't add	-0.124939
-0.591589	86 add add	-0.124939
-0.872173	may actually add	-0.124939
-0.575310	add mov add	-0.124939
-0.726283	vector register, add	-0.124939
-0.463367	add sar add	-0.124939
-0.463367	mov shr add	-0.124939
-0.358715	ecx 86 add	-0.124939
-0.585539	in 64-bit mode	-0.367977
-0.978220	In 64-bit mode	-0.124939
-0.178430	Use 64-bit mode	-0.124939
-0.486554	reference, 64-bit mode	-0.124939
-1.225181	in 32-bit mode	-0.124939
-0.565934	reference, 32-bit mode	-0.124939
-0.873991	64 bit mode	-0.124939
-0.778799	32 bit mode	-0.124939
-0.483841	for 16-bit mode	-0.124939
-0.483841	mode. 16-bit mode	-0.124939
-0.557704	point rounding mode	-0.124939
-0.061092	a console mode	-0.124939
-0.061092	A console mode	-0.425969
-0.077809	the flush-to-zero mode	-0.124939
-0.262058	Set flush-to-zero mode	-0.124939
-0.129467	to protected mode	-0.425969
-0.143381	the denormals-are-zero mode	-0.124939
-0.143381	and denormals-are-zero mode	-0.124939
-2.564894	of a store	-0.124939
-2.066027	on a store	-0.124939
-1.294173	generate a store	-0.124939
-2.006803	is to store	-0.124939
-1.557629	than to store	-0.124939
-1.838931	compiler to store	-0.124939
-1.960420	have to store	-0.124939
-1.021520	efficient to store	-0.124939
-1.547358	possible to store	-0.124939
-0.877839	elements to store	-0.124939
-1.918112	how to store	-0.124939
-1.829851	need to store	-0.124939
-0.389211	Function to store	-0.425969
-1.155005	whether to store	-0.124939
-0.877839	space to store	-0.124939
-1.068355	class and store	-0.124939
-0.598421	*p+2 and store	-0.124939
-0.598421	(2,2,2,2), and store	-0.124939
-0.598421	(1,2,3,4), and store	-0.124939
-1.739615	we can store	-0.124939
-1.980754	you may store	-0.124939
-1.181464	system may store	-0.124939
-1.810019	compiler will store	-0.124939
-0.600281	Uncached memory store	-0.124939
-0.594863	to ; store	-0.124939
-0.570458	compiler might store	-0.124939
-0.463568	Even better: store	-0.124939
-2.788918	in the values	-0.124939
-2.001491	that the values	-0.425969
-2.429992	on the values	-0.124939
-2.047445	make the values	-0.124939
-1.532917	store the values	-0.124939
-1.071245	insert the values	-0.124939
-0.599398	show the values	-0.124939
-0.896089	}; The values	-0.124939
-1.283734	called. The values	-0.124939
-1.068722	array. The values	-0.124939
-1.665398	that have values	-0.124939
-0.121171	have other values	-0.301030
-0.767119	no other values	-0.425969
-0.599409	different set values	-0.124939
-0.200926	checking multiple values	-0.425969
-1.190598	these two values	-0.124939
-1.780280	the table values	-0.124939
-1.155372	by their values	-0.124939
-0.587738	have three values	-0.124939
-0.580510	to desired values	-0.124939
-0.550520	all five values	-0.124939
-0.540983	their actual values	-0.124939
-0.540983	to valid values	-0.124939
-0.540983	the key values	-0.124939
-0.463440	four G values	-0.124939
-0.463440	the R values	-0.124939
-0.596541	Position-independent code. All	-0.124939
-1.919839	operating system All	-0.124939
-0.888128	order execution All	-0.124939
-1.046426	application program. All	-0.124939
-0.588749	message systems. All	-0.124939
-1.410116	optimization options All	-0.124939
-0.582666	stack are: All	-0.124939
-0.580144	public variables. All	-0.124939
-0.577091	standard operations. All	-0.124939
-1.180274	is needed. All	-0.124939
-0.791152	different purposes. All	-0.124939
-0.691611	multiple purposes. All	-0.124939
-0.566750	Compatibility problems. All	-0.124939
-0.980444	big-endian storage. All	-0.124939
-1.063879	very fast. All	-0.124939
-0.557364	cannot do. All	-0.124939
-0.526784	best-case conditions. All	-0.124939
-0.504540	be stored. All	-0.124939
-0.834837	error prone. All	-0.124939
-0.463203	need relocation. All	-0.124939
-0.659181	"override" feature. All	-0.124939
-0.463203	table 9.2. All	-0.124939
-0.358586	are constructed. All	-0.124939
-0.358586	prediction). 149 All	-0.124939
-0.358586	two steps. All	-0.124939
-0.358586	page 93). All	-0.124939
-0.358586	libraries. www.agner.org/optimize/#vectorclass All	-0.124939
-0.358586	1. Relocation. All	-0.124939
-0.358586	formats. Comments All	-0.124939
-0.358586	an integer). All	-0.124939
-0.358586	stack unwinding. All	-0.124939
-2.435274	to the sign	-0.124939
-2.225590	with the sign	-0.124939
-1.555366	example, the sign	-0.124939
-1.182745	test the sign	-0.124939
-1.596758	without the sign	-0.124939
-0.939039	out the sign	-0.425969
-1.624819	about the sign	-0.124939
-1.187991	sets the sign	-0.124939
-1.410415	change the sign	-0.124939
-0.203026	except the sign	-0.124939
-1.182745	setting the sign	-0.124939
-0.892271	copies the sign	-0.124939
-0.596617	flip the sign	-0.124939
-0.596617	inverting the sign	-0.124939
-0.601551	fraction. The sign	-0.124939
-0.601545	corrections for sign	-0.124939
-0.901652	1; // sign	-0.124939
-0.601108	(zero with sign	-0.124939
-1.044817	unsigned int sign	-0.301030
-0.500184	// set sign	-0.425969
-0.891973	// test sign	-0.124939
-1.067281	shift out sign	-0.124939
-0.562912	shift down sign	-0.124939
-1.065329	// Set sign	-0.124939
-0.463513	// flip sign	-0.124939
-0.358830	The inequality sign	-0.124939
-2.724301	to the copy	-0.124939
-2.297796	use the copy	-0.124939
-2.657148	is a copy	-0.124939
-2.603325	of a copy	-0.124939
-1.752144	time to copy	-0.124939
-1.587009	useful to copy	-0.124939
-0.600150	block to copy	-0.124939
-0.897804	block and copy	-0.124939
-0.203589	transpose and copy	-0.425969
-0.896089	pointer. The copy	-0.124939
-1.068722	value. The copy	-0.124939
-0.598545	destructors. The copy	-0.124939
-1.934066	useful for copy	-0.124939
-0.598916	sizeof(a)); // copy	-0.124939
-0.598916	0.0; // copy	-0.124939
-0.887751	performance. A copy	-0.124939
-0.594323	initialization. A copy	-0.124939
-1.467652	are no copy	-0.124939
-1.493811	has no copy	-0.124939
-0.595112	protection. Some copy	-0.124939
-0.589497	updated. Most copy	-0.124939
-0.586459	breakdown. Many copy	-0.124939
-0.863823	an unused copy	-0.124939
-0.575320	object. Any copy	-0.124939
-0.020841	a non-inlined copy	-0.301030
-0.065786	This non-inlined copy	-0.124939
-0.504801	a backup copy	-0.124939
-0.463440	default constructors, copy	-0.124939
-0.679344	costs of optimizing	-0.124939
-1.073516	requirements of optimizing	-0.124939
-0.902058	size and optimizing	-0.124939
-1.736567	than in optimizing	-0.124939
-0.600590	efforts in optimizing	-0.124939
-1.906273	useful for optimizing	-0.124939
-1.197859	good for optimizing	-0.124939
-1.529208	than by optimizing	-0.124939
-1.067362	gain by optimizing	-0.124939
-0.868195	that an optimizing	-0.124939
-0.584262	then an optimizing	-0.124939
-0.584262	because an optimizing	-0.124939
-0.584262	But an optimizing	-0.124939
-0.584262	cases, an optimizing	-0.124939
-1.075294	important than optimizing	-0.124939
-1.296214	account when optimizing	-0.124939
-0.600198	good at optimizing	-0.124939
-0.899128	variable because optimizing	-0.124939
-0.597718	choice between optimizing	-0.124939
-0.514899	program. An optimizing	-0.124939
-0.514899	used. An optimizing	-0.124939
-0.514899	Devirtualization An optimizing	-0.124939
-0.514899	pointers). An optimizing	-0.124939
-1.782502	the best optimizing	-0.124939
-1.347898	A good optimizing	-0.124939
-0.591279	fast. All optimizing	-0.124939
-0.583890	many advanced optimizing	-0.124939
-1.294746	to prevent optimizing	-0.124939
-0.550475	best job optimizing	-0.124939
-0.358744	// Prevent optimizing	-0.124939
-3.078339	of the memory.	-0.124939
-2.360402	in the memory.	-0.124939
-1.073674	bytes of memory.	-0.124939
-1.819765	piece of memory.	-0.124939
-0.899410	blocks of memory.	-0.124939
-1.403580	not in memory.	-0.124939
-1.680418	than in memory.	-0.124939
-1.177452	other in memory.	-0.124939
-1.415348	stored in memory.	-0.124939
-1.265825	around in memory.	-0.124939
-0.885131	temp in memory.	-0.124939
-0.885131	sequentially in memory.	-0.124939
-0.885131	consecutively in memory.	-0.124939
-2.649553	the code memory.	-0.124939
-0.376666	in program memory.	-0.124939
-0.588040	library into memory.	-0.124939
-1.150568	loaded into memory.	-0.124939
-0.590660	in static memory.	-0.124939
-0.848866	than static memory.	-0.124939
-1.065478	to stack memory.	-0.124939
-0.767120	dynamically allocated memory.	-0.124939
-0.583083	of main memory.	-0.124939
-0.292024	of RAM memory.	-0.124939
-0.408140	in RAM memory.	-0.124939
-0.541146	with contiguous memory.	-0.124939
-2.311272	use the well	-0.124939
-2.336057	with a well	-0.124939
-0.601594	systematic and well	-0.124939
-0.196769	may as well	-0.124939
-0.834641	program as well	-0.124939
-0.566488	functions as well	-0.124939
-0.566488	Linux as well	-0.124939
-0.566488	framework as well	-0.124939
-0.566488	reading as well	-0.124939
-0.566488	users as well	-0.124939
-0.566488	enum as well	-0.124939
-0.566488	yet as well	-0.124939
-0.566488	R2 as well	-0.124939
-2.186304	are not well	-0.124939
-2.117812	a pointer well	-0.124939
-0.597787	may very well	-0.124939
-1.050919	on how well	-0.124939
-0.558404	see how well	-0.124939
-0.558404	checking how well	-0.124939
-0.567084	always work well	-0.124939
-1.148945	doesn't work well	-0.124939
-1.155897	that works well	-0.124939
-0.560984	it works well	-0.124939
-0.591022	predicted quite well	-0.124939
-0.316100	are predicted well	-0.425969
-0.456650	usually predicted well	-0.124939
-0.504944	platforms. Works well	-0.124939
-0.463476	have worked well	-0.124939
-0.463476	universal, flexible, well	-0.124939
-1.637498	sure the information	-0.124939
-1.550939	store the information	-0.124939
-1.202540	source of information	-0.124939
-1.376362	and for information	-0.124939
-0.600905	known. This information	-0.124939
-1.440963	doesn't have information	-0.124939
-1.493858	use this information	-0.124939
-1.292652	for more information	-0.124939
-0.591101	needs all information	-0.124939
-0.591101	saved all information	-0.124939
-1.575290	has no information	-0.124939
-0.597900	save some information	-0.124939
-0.596119	probably without information	-0.124939
-0.594810	adds extra information	-0.124939
-1.238936	the necessary information	-0.124939
-0.565478	124 necessary information	-0.124939
-0.580240	memory. No information	-0.124939
-1.180414	the full information	-0.124939
-0.570343	of added information	-0.124939
-0.882405	the CPUID information	-0.124939
-0.415503	only CPUID information	-0.124939
-0.557514	contains debug information	-0.124939
-1.167662	stack unwinding information	-0.124939
-0.358795	which gets information	-0.124939
-0.358795	class gets information	-0.124939
-0.526869	compiler additional information	-0.124939
-0.527008	store application-specific information	-0.124939
-0.463276	has insufficient information	-0.124939
-0.358643	to seek information	-0.124939
-0.358643	save recovery information	-0.124939
-0.358643	has incomplete information	-0.124939
-1.985887	It is simply	-0.425969
-0.965073	pointer is simply	-0.124939
-2.249455	there is simply	-0.124939
-1.055894	structure is simply	-0.124939
-0.202528	difference is simply	-0.425969
-1.055894	effect is simply	-0.124939
-0.594158	enum is simply	-0.124939
-0.594158	labels is simply	-0.124939
-0.594158	X" is simply	-0.124939
-1.638319	functions and simply	-0.124939
-1.874268	function that simply	-0.124939
-1.196580	function are simply	-0.124939
-1.071641	values are simply	-0.124939
-0.601131	imprecise or simply	-0.124939
-1.296062	time. It simply	-0.124939
-0.599745	When used simply	-0.124939
-0.897204	member pointer simply	-0.124939
-1.364025	point number simply	-0.124939
-1.689788	an error simply	-0.124939
-0.595306	people. I simply	-0.124939
-1.539588	unsigned integers simply	-0.124939
-1.587374	is done simply	-0.124939
-1.461503	is implemented simply	-0.124939
-1.415467	point numbers simply	-0.124939
-0.997587	be copied simply	-0.124939
-0.981104	CPU brand simply	-0.124939
-0.957034	is measured simply	-0.124939
-0.463403	performance significantly simply	-0.124939
-1.199592	compiler is able	-0.425969
-0.899851	microprocessor is able	-0.124939
-1.467119	to be able	-0.602060
-0.805825	not be able	-0.726999
-1.485042	may be able	-0.602060
-1.401300	will be able	-0.425969
-1.545030	would be able	-0.124939
-0.389279	compilers are able	-0.903090
-0.667865	microprocessors are able	-0.425969
-2.079732	are not able	-0.124939
-0.597147	usually not able	-0.124939
-1.671256	not always able	-0.124939
-1.145427	are actually able	-0.124939
-0.523712	they were able	-0.124939
-0.523712	tested were able	-0.124939
-0.858615	are sometimes able	-0.124939
-2.725246	it is certain	-0.124939
-0.899662	1 is certain	-0.124939
-0.600342	#define is certain	-0.124939
-1.916305	if a certain	-0.124939
-1.932213	than a certain	-0.124939
-1.196902	within a certain	-0.124939
-0.601701	adhere to certain	-0.124939
-0.601456	speed for certain	-0.124939
-2.007245	sure that certain	-0.124939
-0.599817	likelihood that certain	-0.124939
-2.002341	cannot be certain	-0.124939
-2.031606	you are certain	-0.124939
-1.558676	There are certain	-0.425969
-1.534140	only if certain	-0.124939
-0.598287	parallel if certain	-0.124939
-0.601003	flags on certain	-0.124939
-2.578545	is not certain	-0.124939
-0.900063	sets have certain	-0.124939
-0.600171	interrupts at certain	-0.124939
-1.659616	can make certain	-0.124939
-0.897707	be no certain	-0.124939
-1.661721	is therefore certain	-0.124939
-0.881366	to count certain	-0.124939
-1.422857	is quite certain	-0.124939
-0.590461	instruction was certain	-0.124939
-1.034543	it prevents certain	-0.124939
-0.397319	is almost certain	-0.124939
-0.659410	to obey certain	-0.124939
-0.358701	to query certain	-0.124939
-0.796065	the clock cycles	-0.124939
-0.311176	is clock cycles	-0.124939
-0.311176	more clock cycles	-0.124939
-0.617369	CPU clock cycles	-0.124939
-0.492430	two clock cycles	-0.124939
-0.311176	2 clock cycles	-0.124939
-0.439550	4 clock cycles	-0.124939
-0.439550	several clock cycles	-0.124939
-0.087178	few clock cycles	-0.204120
-0.239852	10 clock cycles	-0.124939
-0.405527	core clock cycles	-0.124939
-0.615641	5 clock cycles	-0.124939
-0.615641	20 clock cycles	-0.124939
-0.311176	15 clock cycles	-0.124939
-0.439550	80 clock cycles	-0.124939
-0.128294	hundred clock cycles	-0.124939
-0.311176	11 clock cycles	-0.124939
-0.311176	50 clock cycles	-0.124939
-0.128294	matrices, clock cycles	-0.425969
-0.311176	2-3 clock cycles	-0.124939
-0.311176	counting clock cycles	-0.124939
-1.072898	b[size]; // ...	-0.124939
-0.598907	WhateverFunction(i); // ...	-0.124939
-1.691407	i++) { ...	-0.124939
-0.571584	10) { ...	-0.124939
-0.571584	1.0) { ...	-0.124939
-0.571584	...)) { ...	-0.124939
-0.571584	max) { ...	-0.124939
-0.571584	(...) { ...	-0.124939
-0.571584	min)) { ...	-0.124939
-1.090256	int i; ...	-0.124939
-0.511265	b[size], i; ...	-0.124939
-0.745257	b, c; ...	-0.425969
-1.772035	{ public: ...	-0.124939
-0.586459	C1 x; ...	-0.124939
-1.078259	x, y; ...	-0.124939
-0.415621	CriticalFunction(b, c); ...	-0.124939
-0.415621	(*CriticalFunction)(b, c); ...	-0.124939
-0.504781	x); 136 ...	-0.124939
-0.835352	i, j; ...	-0.124939
-0.463422	void CriticalFunction(); ...	-0.124939
-0.659525	39916800, 479001600}; ...	-0.124939
-0.463422	float list[size]; ...	-0.124939
-0.358758	int List[ArraySize]; ...	-0.124939
-0.358758	= Func1(2); ...	-0.124939
-0.358758	= log(2.0); ...	-0.124939
-0.358758	a[1], b[1], ...	-0.124939
-0.358758	= FactorialTable[b]; ...	-0.124939
-2.598596	that the addresses	-0.124939
-2.285359	because the addresses	-0.124939
-1.730727	calculate the addresses	-0.124939
-0.899149	control the addresses	-0.124939
-1.291243	includes the addresses	-0.124939
-1.435440	calculating the addresses	-0.124939
-0.600921	alignment to addresses	-0.124939
-0.600921	structures to addresses	-0.124939
-1.552090	pointers and addresses	-0.124939
-1.200357	pointers or addresses	-0.124939
-0.600565	both have addresses	-0.124939
-0.899352	reads from addresses	-0.124939
-1.748780	of memory addresses	-0.124939
-0.593571	round memory addresses	-0.124939
-0.898822	from different addresses	-0.124939
-0.598631	full 64-bit addresses	-0.124939
-1.022985	function return addresses	-0.124939
-0.582509	causing return addresses	-0.124939
-0.596206	translate these addresses	-0.124939
-0.594270	calculate element addresses	-0.124939
-0.594002	0x4700. These addresses	-0.124939
-0.592637	itself. Function addresses	-0.124939
-0.591259	code. All addresses	-0.124939
-0.637110	because relative addresses	-0.124939
-0.449019	generate relative addresses	-0.124939
-0.637110	self- relative addresses	-0.124939
-0.570223	calculating row addresses	-0.124939
-0.557526	no absolute addresses	-0.124939
-0.550528	calculate self-relative addresses	-0.124939
-0.526995	to round addresses	-0.124939
-2.020607	is a counter	-0.124939
-2.576607	floating point counter	-0.124939
-1.014580	the loop counter	-0.221849
-0.901292	a loop counter	-0.301030
-1.168347	The loop counter	-0.124939
-0.512186	as loop counter	-0.124939
-0.914880	A loop counter	-0.124939
-0.512186	Initialize loop counter	-0.124939
-0.512186	Increment loop counter	-0.124939
-1.937532	an integer counter	-0.124939
-0.882806	may add counter	-0.124939
-1.900778	clock cycles counter	-0.124939
-0.640432	// Loop counter	-0.124939
-0.863624	performance monitor counter	-0.124939
-0.732324	clock cycle counter	-0.124939
-0.132381	time stamp counter	-0.124939
-2.943359	of the shared	-0.124939
-2.850693	in the shared	-0.124939
-1.483720	from the shared	-0.425969
-1.198292	compile the shared	-0.124939
-1.539048	store the shared	-0.124939
-2.331413	that is shared	-0.124939
-2.427829	of a shared	-0.124939
-1.645578	in a shared	-0.425969
-1.469054	making a shared	-0.124939
-0.597380	compile a shared	-0.124939
-1.077748	address and shared	-0.124939
-1.977696	used in shared	-0.124939
-2.940086	can be shared	-0.124939
-2.298001	that are shared	-0.124939
-0.601175	libraries or shared	-0.124939
-2.582711	is not shared	-0.124939
-1.495722	even when shared	-0.124939
-0.594338	libraries. A shared	-0.124939
-0.594338	above. A shared	-0.124939
-2.528525	to make shared	-0.124939
-2.633793	the same shared	-0.124939
-1.305519	a 64-bit shared	-0.124939
-0.586058	up 64-bit shared	-0.124939
-0.491653	also called shared	-0.425969
-1.398427	very large shared	-0.124939
-1.821951	call to count	-0.124939
-1.671727	up to count	-0.124939
-1.378746	variables that count	-0.124939
-0.601329	making it count	-0.124939
-1.054231	the loop count	-0.522879
-1.404969	a loop count	-0.124939
-1.283990	The loop count	-0.124939
-0.190869	maximum loop count	-0.425969
-1.656820	the clock count	-0.124939
-0.945346	The first count	-0.124939
-1.531504	and therefore count	-0.124939
-0.589071	but don't count	-0.124939
-0.017806	the repeat count	-0.522879
-0.097372	high repeat count	-0.124939
-0.045963	maximum repeat count	-0.425969
-0.097372	low repeat count	-0.124939
-0.097372	typical repeat count	-0.124939
-0.097372	fixed repeat count	-0.124939
-0.570433	while seconds count	-0.124939
-1.798219	of the program.	-0.271067
-2.273690	in the program.	-0.124939
-2.296013	by the program.	-0.124939
-0.598017	crash the program.	-0.124939
-0.598017	crashes the program.	-0.124939
-1.539558	of a program.	-0.249877
-1.194576	up a program.	-0.124939
-1.202942	start to program.	-0.124939
-1.068914	a C++ program.	-0.124939
-2.019136	the first program.	-0.124939
-1.396726	a big program.	-0.124939
-1.183844	console mode program.	-0.124939
-0.620327	the application program.	-0.124939
-1.417529	the main program.	-0.124939
-1.349621	the whole program.	-0.124939
-1.481631	the final program.	-0.124939
-0.562950	poorly designed program.	-0.124939
-0.764421	an existing program.	-0.124939
-2.548909	it is quite	-0.124939
-2.386259	This is quite	-0.124939
-2.008087	It is quite	-0.124939
-2.049168	which is quite	-0.124939
-1.465625	C++ is quite	-0.124939
-0.596918	problems is quite	-0.124939
-0.596918	12.4c is quite	-0.124939
-1.827908	can be quite	-0.124939
-2.248395	may be quite	-0.124939
-0.601408	slice are quite	-0.124939
-2.582014	is not quite	-0.124939
-1.293757	can have quite	-0.124939
-0.599773	model, which quite	-0.124939
-0.599756	flexible, but quite	-0.124939
-1.916894	is also quite	-0.124939
-1.020789	can take quite	-0.425969
-1.496489	is accessed quite	-0.124939
-1.464780	compiler does quite	-0.124939
-1.022815	is actually quite	-0.124939
-0.969068	are actually quite	-0.124939
-1.386624	be predicted quite	-0.124939
-0.573155	also occur quite	-0.124939
-0.570408	may happen quite	-0.124939
-0.550520	which happens quite	-0.124939
-0.916911	that runs quite	-0.124939
-2.034834	set is used.	-0.124939
-1.614873	pointer is used.	-0.124939
-1.558669	variable is used.	-0.124939
-1.275707	stack is used.	-0.124939
-1.060052	precision is used.	-0.124939
-0.595590	framework is used.	-0.124939
-0.596876	linking is used.	-0.124939
-0.595590	nontemporal is used.	-0.124939
-0.595590	alloca is used.	-0.124939
-2.220483	can be used.	-0.124939
-2.251359	should be used.	-0.124939
-0.480779	registers are used.	-0.221849
-1.127320	they are used.	-0.124939
-2.188384	are not used.	-0.124939
-1.849855	of memory used.	-0.124939
-1.187742	of registers used.	-0.124939
-0.631882	is never used.	-0.124939
-0.351526	no longer used.	-0.124939
-1.228622	is actually used.	-0.124939
-0.527262	is seldom used.	-0.124939
-1.405131	large data files	-0.124939
-0.593323	writing data files	-0.124939
-0.600076	or make files	-0.124939
-0.201926	scans all files	-0.425969
-0.598792	necessary library files	-0.124939
-1.680343	the object files	-0.124939
-1.278417	of object files	-0.124939
-0.574289	Mixing object files	-0.124939
-0.598271	requiring many files	-0.124939
-0.594054	load several files	-0.124939
-1.047184	The intermediate files	-0.124939
-0.494205	different source files	-0.124939
-0.494205	all source files	-0.124939
-0.494205	All source files	-0.124939
-0.573083	time loading files	-0.124939
-0.573083	or resource files	-0.124939
-0.835921	the header files	-0.124939
-0.463616	Intel header files	-0.124939
-0.358867	store help files	-0.124939
-0.143375	files, help files	-0.124939
-0.557480	connections. Open files	-0.124939
-0.266178	multiple .cpp files	-0.425969
-0.314693	write configuration files	-0.124939
-0.314693	drivers, configuration files	-0.124939
-0.527221	12.2. Header files	-0.124939
-0.504761	compiler. Object files	-0.124939
-0.358744	connections. Temporary files	-0.124939
-1.406967	it is recommended	-0.970037
-1.015150	It is recommended	-1.271067
-1.594615	is not recommended	-0.602060
-1.857989	are not recommended	-0.124939
-1.142676	therefore not recommended	-0.124939
-1.919042	is also recommended	-0.124939
-0.965730	is therefore recommended	-0.124939
-0.358945	is strongly recommended	-0.124939
-2.569754	for the intermediate	-0.124939
-2.484514	on the intermediate	-0.124939
-1.542224	store the intermediate	-0.124939
-0.899922	compiling the intermediate	-0.124939
-0.600473	interprets the intermediate	-0.124939
-1.678208	disadvantage of intermediate	-0.124939
-1.922018	code and intermediate	-0.124939
-0.600526	precision, and intermediate	-0.124939
-0.896150	format. The intermediate	-0.124939
-0.598576	step. The intermediate	-0.124939
-0.598576	distributed. The intermediate	-0.124939
-0.601535	objects for intermediate	-0.124939
-1.200885	consider if intermediate	-0.124939
-1.146093	based on intermediate	-0.425969
-1.629749	of an intermediate	-0.124939
-1.466803	to an intermediate	-0.124939
-1.309935	with an intermediate	-0.124939
-1.192545	use an intermediate	-0.124939
-0.572294	used an intermediate	-0.124939
-0.583957	using an intermediate	-0.425969
-1.169185	into an intermediate	-0.124939
-0.597508	// makes intermediate	-0.124939
-0.882872	of every intermediate	-0.124939
-1.009848	to store intermediate	-0.124939
-0.591382	integer). All intermediate	-0.124939
-0.818175	by storing intermediate	-0.124939
-0.504861	runtime frameworks, intermediate	-0.124939
-2.252789	code is fast	-0.124939
-0.600376	arrays is fast	-0.124939
-0.600376	search, is fast	-0.124939
-1.341959	and for fast	-0.124939
-0.595446	unsigned for fast	-0.124939
-1.402846	instructions for fast	-0.124939
-0.889961	options for fast	-0.124939
-0.595446	sourcebook for fast	-0.124939
-2.529736	may be fast	-0.124939
-1.595584	operations are fast	-0.124939
-1.247216	is as fast	-0.124939
-0.866483	are as fast	-0.124939
-0.866483	therefore as fast	-0.124939
-0.245654	just as fast	-0.602060
-1.199741	also have fast	-0.124939
-1.212912	are so fast	-0.124939
-0.582948	developing so fast	-0.124939
-1.775855	is very fast	-0.124939
-1.606583	is calculated fast	-0.124939
-0.590981	runs quite fast	-0.124939
-1.196486	are particularly fast	-0.124939
-1.219713	to enable fast	-0.124939
-0.567029	saturated addition, fast	-0.124939
-0.827905	is equally fast	-0.124939
-1.026866	the job fast	-0.124939
-0.527151	worked sufficiently fast	-0.124939
-0.358773	approximate reciprocal, fast	-0.124939
-2.759636	to the allocation	-0.124939
-1.073378	object. The allocation	-0.124939
-0.899213	integers. The allocation	-0.124939
-1.076654	the memory allocation	-0.124939
-0.261475	dynamic memory allocation	-0.212089
-0.033705	Dynamic memory allocation	-0.176091
-0.597174	optimize register allocation	-0.124939
-1.429283	of dynamic allocation	-0.124939
-0.981991	it involves allocation	-0.124939
-0.562974	The frequent allocation	-0.124939
-0.550752	finished. Register allocation	-0.124939
-0.790212	{ for (int	-0.425969
-1.022107	0; for (int	-0.124939
-0.673184	... for (int	-0.425969
-0.069387	vectors: for (int	-0.823909
-0.582190	floats for (int	-0.124939
-0.582190	sum; for (int	-0.124939
-0.582190	1.f; for (int	-0.124939
-0.582190	_alloca) for (int	-0.124939
-0.169334	int factorial (int	-0.425969
-0.007035	int SomeFunction (int	-0.823909
-0.036372	float SomeFunction (int	-0.124939
-0.036372	void SomeFunction (int	-0.124939
-0.527278	int Multiply (int	-0.124939
-0.504962	void Func1 (int	-0.124939
-0.463586	void Plus2 (int	-0.124939
-0.463586	int MultiplyBy (int	-0.124939
-0.463586	void FuncA (int	-0.124939
-0.358887	void FuncB (int	-0.124939
-2.364461	because the write	-0.124939
-0.969546	than to write	-0.124939
-2.170238	possible to write	-0.124939
-0.675411	easier to write	-0.124939
-1.061919	prefer to write	-0.124939
-0.891506	minutes to write	-0.124939
-0.596229	attempting to write	-0.124939
-0.600526	value and write	-0.124939
-1.074594	read and write	-0.124939
-0.203432	read or write	-0.124939
-1.980288	you may write	-0.124939
-2.007936	You may write	-0.124939
-2.087434	if you write	-0.124939
-0.596898	however, often write	-0.124939
-1.066152	These instructions write	-0.124939
-0.569924	if I write	-0.124939
-0.569924	If I write	-0.124939
-1.263791	the threads write	-0.124939
-0.113541	the nontemporal write	-0.124939
-0.267698	of nontemporal write	-0.124939
-0.267698	The nontemporal write	-0.124939
-0.267698	so-called nontemporal write	-0.124939
-0.570416	that programmers write	-0.124939
-0.527122	An uncached write	-0.124939
-0.358816	__fastcall Noncached write	-0.124939
-1.383517	and to optimize	-0.124939
-0.873072	compiler to optimize	-0.249877
-0.880824	data to optimize	-0.124939
-2.054827	order to optimize	-0.124939
-1.376888	want to optimize	-0.124939
-1.763763	important to optimize	-0.124939
-1.776143	necessary to optimize	-0.124939
-0.880824	information to optimize	-0.124939
-1.895607	able to optimize	-0.124939
-1.160674	start to optimize	-0.124939
-1.160674	try to optimize	-0.124939
-0.601632	inline and optimize	-0.124939
-1.324735	compiler can optimize	-0.124939
-0.502164	Does not optimize	-0.124939
-2.620254	the compiler optimize	-0.124939
-1.809750	compiler will optimize	-0.124939
-0.201542	How compilers optimize	-0.124939
-1.360060	can often optimize	-0.124939
-1.065139	can easily optimize	-0.124939
-0.527165	or otherwise optimize	-0.124939
-0.358844	by selecting optimize	-0.124939
-0.358844	below. Cannot optimize	-0.124939
-2.694152	of the above	-0.124939
-1.810335	in the above	-0.221849
-1.467424	from the above	-0.124939
-1.937600	using the above	-0.124939
-1.007939	In the above	-0.425969
-0.596984	repeat the above	-0.124939
-0.596984	(In the above	-0.124939
-0.892997	explain the above	-0.124939
-0.596984	Weighing the above	-0.124939
-0.890407	register. The above	-0.124939
-0.595672	a[i]; The above	-0.124939
-0.595672	sources. The above	-0.124939
-0.595672	rarely. The above	-0.124939
-0.595672	144 The above	-0.124939
-0.203376	// if above	-0.124939
-0.601017	is. This above	-0.124939
-0.599814	we used above	-0.124939
-0.580771	method described above	-0.124939
-0.567164	disadvantages mentioned above	-0.124939
-0.788746	column 28 above	-0.124939
-0.463568	elements matrix[c][r] above	-0.124939
-0.358873	mirror position above	-0.124939
-0.596556	64-bit code. However,	-0.124939
-0.594767	next function. However,	-0.124939
-1.529101	is used. However,	-0.124939
-0.583859	x86 CPUs. However,	-0.124939
-0.862993	major platforms. However,	-0.124939
-0.862909	the size. However,	-0.124939
-0.578746	scarce resources. However,	-0.124939
-1.002842	as possible. However,	-0.124939
-0.575210	other thread. However,	-0.124939
-1.003306	different purposes. However,	-0.124939
-0.841238	data sets. However,	-0.124939
-0.566768	most critical. However,	-0.124939
-0.566617	are executed. However,	-0.124939
-0.562636	its value. However,	-0.124939
-0.557285	set. 120 However,	-0.124939
-0.557285	actual processor. However,	-0.124939
-0.557380	works automatically. However,	-0.124939
-0.763715	they are. However,	-0.124939
-0.526960	fastest first. However,	-0.124939
-0.504560	PC platform. However,	-0.124939
-0.659209	a debugger. However,	-0.124939
-0.659209	the screen. However,	-0.124939
-0.659209	program flow. However,	-0.124939
-0.463221	next calculation. However,	-0.124939
-0.463221	Java implementations. However,	-0.124939
-0.358600	{} brackets. However,	-0.124939
-0.358600	later maintenance. However,	-0.124939
-0.358600	CPU models. However,	-0.124939
-0.358600	function F1. However,	-0.124939
-1.938736	if a was	-0.124939
-1.067536	line that was	-0.124939
-0.598143	model that was	-0.124939
-0.598143	ebx that was	-0.124939
-1.134273	time it was	-0.301030
-0.593269	value it was	-0.124939
-2.534234	the function was	-0.124939
-2.270505	the CPU was	-0.124939
-1.072935	CPUID instruction was	-0.124939
-2.434899	instruction set was	-0.124939
-1.671536	that there was	-0.124939
-0.950284	the software was	-0.124939
-1.279864	non-Intel CPUs was	-0.124939
-0.584949	template feature was	-0.124939
-0.581710	the statement was	-0.124939
-0.580386	128-bit operation was	-0.124939
-0.570160	If seconds was	-0.124939
-0.566756	this brand was	-0.124939
-1.224371	the CPUID was	-0.124939
-0.550647	multiplications. How was	-0.124939
-0.540852	the recommendation was	-0.124939
-0.788244	of Basic was	-0.124939
-1.041781	time consumption was	-0.124939
-0.916600	to 15.1c was	-0.124939
-0.526932	which alloca was	-0.124939
-0.463330	example 11.2b was	-0.124939
-1.554844	members of both	-0.124939
-0.889297	same in both	-0.124939
-1.590730	available in both	-0.124939
-0.889297	work in both	-0.124939
-0.595109	line in both	-0.124939
-1.267336	run in both	-0.124939
-1.065388	syntax in both	-0.124939
-0.595109	details in both	-0.124939
-2.526893	may be both	-0.124939
-0.601354	v.f are both	-0.124939
-1.192897	fail if both	-0.124939
-0.598287	v.f if both	-0.124939
-1.710168	supported by both	-0.124939
-1.076059	work with both	-0.124939
-1.370993	of time both	-0.124939
-0.899564	b will both	-0.124939
-0.600348	swapped then both	-0.124939
-0.600165	support from both	-0.124939
-0.600100	style has both	-0.124939
-0.899099	twice because both	-0.124939
-0.590593	and optimize both	-0.124939
-0.590019	time. Therefore, both	-0.124939
-0.582836	tool supports both	-0.124939
-0.581796	yet. Supports both	-0.124939
-0.579205	counter outside both	-0.124939
-0.570338	and checks both	-0.124939
-0.550408	always evaluate both	-0.124939
-0.463349	and destination both	-0.124939
-0.358701	Today (2013) both	-0.124939
-2.299694	than the programs	-0.124939
-0.601680	uninstallation of programs	-0.124939
-1.552263	systems and programs	-0.124939
-1.068208	use in programs	-0.124939
-1.531118	useful in programs	-0.124939
-1.068208	precision in programs	-0.124939
-1.286460	errors in programs	-0.124939
-1.195596	even for programs	-0.124939
-0.599913	high for programs	-0.124939
-0.600996	case with programs	-0.124939
-0.598588	expect 64-bit programs	-0.124939
-1.422625	in C++ programs	-0.124939
-0.598527	many such programs	-0.124939
-0.598193	But many programs	-0.124939
-1.070318	Many software programs	-0.124939
-0.597679	than 32-bit programs	-0.124939
-1.058860	are making programs	-0.124939
-0.595025	intervals. Some programs	-0.124939
-0.886922	many common programs	-0.124939
-0.592109	which few programs	-0.124939
-1.425730	and Mac programs	-0.124939
-0.589720	Some application programs	-0.124939
-0.586313	input. Many programs	-0.124939
-0.575347	time. Other programs	-0.124939
-1.379861	object oriented programs	-0.124939
-0.788205	of CPU-intensive programs	-0.124939
-0.358672	in interactive programs	-0.124939
-0.358672	cache. Multithreaded programs	-0.124939
-2.983631	of the problems	-0.124939
-2.362664	with the problems	-0.124939
-1.311726	all the problems	-0.124939
-1.199358	involves the problems	-0.124939
-1.502661	kind of problems	-0.124939
-0.601751	susceptible to problems	-0.124939
-1.075385	CPU has problems	-0.124939
-0.575034	to these problems	-0.124939
-1.104843	All these problems	-0.124939
-0.596270	compilers without problems	-0.124939
-0.594093	server. These problems	-0.124939
-1.055654	Some common problems	-0.124939
-1.376474	can cause problems	-0.124939
-0.561981	schemes cause problems	-0.124939
-0.543404	no caching problems	-0.124939
-0.543404	cause caching problems	-0.124939
-0.488071	of compatibility problems	-0.425969
-0.143396	and compatibility problems	-0.124939
-0.474451	of resource problems	-0.124939
-0.676977	other resource problems	-0.124939
-1.321576	for finding problems	-0.124939
-0.358850	of usability problems	-0.124939
-0.358850	important usability problems	-0.124939
-0.358850	problems, usability problems	-0.124939
-0.504841	causes technical problems	-0.124939
-1.495787	long time unless	-0.124939
-1.563148	induction variable unless	-0.124939
-1.284941	do so unless	-0.124939
-1.410476	the optimization unless	-0.124939
-0.891786	as pointers unless	-0.124939
-1.522277	32-bit systems unless	-0.124939
-0.595277	to & unless	-0.124939
-0.581565	non-Intel CPUs unless	-0.425969
-1.492291	point calculations unless	-0.124939
-0.947506	32-bit mode unless	-0.124939
-0.947506	flush-to-zero mode unless	-0.124939
-1.671505	exception handling unless	-0.124939
-1.669213	be avoided unless	-0.124939
-0.902524	is slow unless	-0.124939
-0.504810	are slow unless	-0.124939
-1.118601	stack frame unless	-0.124939
-0.579050	not safe unless	-0.124939
-1.209421	more clear unless	-0.124939
-0.981083	by default unless	-0.124939
-0.557445	large object, unless	-0.124939
-0.557445	than rounding unless	-0.124939
-0.504640	loop manually unless	-0.124939
-0.659324	OS X, unless	-0.124939
-0.358658	(16 bits), unless	-0.124939
-0.358658	as b*(2.0/3.0) unless	-0.124939
-0.358658	integer constant, unless	-0.124939
-0.358658	method unfavorable, unless	-0.124939
-2.691350	in the optimal	-0.124939
-1.188308	be the optimal	-0.124939
-1.529755	then the optimal	-0.124939
-1.360534	cases, the optimal	-0.124939
-1.419853	choose the optimal	-0.124939
-1.474081	find the optimal	-0.124939
-1.071203	produce the optimal	-0.124939
-0.089867	Choosing the optimal	-0.425969
-2.217538	that is optimal	-0.124939
-2.092820	it is optimal	-0.425969
-2.429972	This is optimal	-0.124939
-1.716019	solution is optimal	-0.124939
-1.286438	implementation is optimal	-0.124939
-0.601612	finished. The optimal	-0.124939
-1.886853	not be optimal	-0.124939
-1.820178	may be optimal	-0.124939
-1.479944	is not optimal	-0.249877
-1.199624	not an optimal	-0.124939
-0.597353	produce less optimal	-0.124939
-0.601892	deallocate the space	-0.124939
-0.601775	occupies a space	-0.124939
-1.713341	amount of space	-0.124939
-0.902072	heap. The space	-0.124939
-0.589208	up more space	-0.124939
-1.041712	allocate more space	-0.124939
-0.589208	allocating more space	-0.124939
-1.361887	the memory space	-0.124939
-1.363536	of memory space	-0.124939
-0.358649	The memory space	-0.124939
-0.550037	This memory space	-0.124939
-0.550037	efficient memory space	-0.124939
-0.550037	take memory space	-0.124939
-0.550037	saving memory space	-0.124939
-0.550037	Extra memory space	-0.124939
-2.636751	the same space	-0.124939
-0.898387	of cache space	-0.124939
-0.846196	up cache space	-0.124939
-0.572682	save cache space	-0.124939
-0.597435	larger address space	-0.124939
-1.056062	too much space	-0.124939
-0.851390	and disk space	-0.124939
-0.575468	takes little space	-0.124939
-0.487296	the heap space	-0.124939
-0.871463	The heap space	-0.124939
-2.263330	There are cases,	-0.124939
-1.373771	In other cases,	-0.124939
-1.289709	in all cases,	-0.124939
-0.602817	in most cases,	-0.124939
-0.259998	In most cases,	-0.221849
-0.598635	In such cases,	-0.124939
-0.373198	In many cases,	-0.124939
-0.850805	in some cases,	-0.124939
-0.498750	In some cases,	-0.346788
-0.571070	In simple cases,	-0.124939
-0.571070	50 simple cases,	-0.124939
-1.803627	a few cases,	-0.124939
-0.599067	the simplest cases,	-0.124939
-0.595478	pointer if else	-0.124939
-0.377567	else if else	-0.124939
-1.215371	} } else	-0.425969
-0.939958	0; } else	-0.124939
-0.710481	b; } else	-0.124939
-0.676450	1; } else	-0.602060
-0.495164	lookup } else	-0.124939
-0.784159	2; } else	-0.425969
-0.815019	1.; } else	-0.124939
-0.495164	nonzero } else	-0.124939
-0.495164	134 } else	-0.124939
-0.495164	range"; } else	-0.124939
-0.495164	FuncA(i); } else	-0.124939
-0.180532	F1(a); } else	-0.425969
-0.710481	&CriticalFunction_SSE2; } else	-0.124939
-0.710481	&CriticalFunction_AVX; } else	-0.124939
-0.495164	69 } else	-0.124939
-0.274659	on anything else	-0.124939
-0.274659	than anything else	-0.124939
-0.274659	optimize anything else	-0.124939
-0.358945	} 34 else	-0.124939
-0.358945	} 68 else	-0.124939
-1.806335	is a lot	-0.425969
-1.718570	if a lot	-0.124939
-1.866375	with a lot	-0.124939
-1.792049	use a lot	-0.124939
-0.340657	do a lot	-0.602060
-0.591087	take a lot	-0.425969
-1.026896	often a lot	-0.124939
-1.135718	cause a lot	-0.124939
-0.726513	uses a lot	-0.425969
-1.391701	get a lot	-0.124939
-1.135718	contains a lot	-0.124939
-1.217087	require a lot	-0.124939
-0.583923	save a lot	-0.124939
-0.583923	waste a lot	-0.124939
-0.583923	spend a lot	-0.124939
-0.372865	consume a lot	-0.425969
-0.583923	consumes a lot	-0.124939
-0.583923	differ a lot	-0.124939
-0.583923	installed, a lot	-0.124939
-0.583923	RAM, a lot	-0.124939
-1.343891	functions. A lot	-0.124939
-0.594549	vectors. A lot	-0.124939
-2.711617	- - Integer	-0.124939
-1.976498	of 2 Integer	-0.124939
-1.274587	program optimization Integer	-0.124939
-1.554787	compile time. Integer	-0.124939
-1.154664	longer time. Integer	-0.124939
-1.179441	of overflow Integer	-0.124939
-0.954068	and operators Integer	-0.124939
-0.556233	Integer operators Integer	-0.124939
-1.167978	Integer multiplication Integer	-0.124939
-1.452729	Integer division Integer	-0.124939
-0.863333	many cases. Integer	-0.124939
-0.581617	specific size. Integer	-0.124939
-1.112262	in performance. Integer	-0.124939
-0.550454	pool. 15 Integer	-0.124939
-0.788205	other microprocessors. Integer	-0.124939
-0.527040	for constants. Integer	-0.124939
-0.527040	undesired results. Integer	-0.124939
-0.249696	performance. 14.4 Integer	-0.124939
-0.249696	135 14.4 Integer	-0.124939
-0.726183	the microprocessor. Integer	-0.124939
-0.143316	136 14.5 Integer	-0.124939
-0.143316	96. 14.5 Integer	-0.124939
-0.463312	integer. 158 Integer	-0.124939
-0.463312	further discussion. Integer	-0.124939
-0.659353	the processor). Integer	-0.124939
-0.358672	integer division: Integer	-0.124939
-0.358672	Example 15.1d. Integer	-0.124939
-0.358672	Example 8.24. Integer	-0.124939
-2.541459	and the dispatching	-0.124939
-1.771004	do the dispatching	-0.124939
-0.600137	2" The dispatching	-0.124939
-0.600137	44. The dispatching	-0.124939
-1.132490	the CPU dispatching	-0.124939
-0.859899	of CPU dispatching	-0.124939
-0.671476	for CPU dispatching	-0.124939
-0.470993	// CPU dispatching	-0.124939
-0.671476	on CPU dispatching	-0.124939
-0.470993	have CPU dispatching	-0.124939
-0.671476	using CPU dispatching	-0.124939
-0.544807	automatic CPU dispatching	-0.124939
-0.470993	supports CPU dispatching	-0.124939
-0.470993	apply CPU dispatching	-0.124939
-0.470993	c: CPU dispatching	-0.124939
-0.470993	explicit CPU dispatching	-0.124939
-0.470993	Explicit CPU dispatching	-0.124939
-0.174551	13.6 CPU dispatching	-0.425969
-0.174551	13.7 CPU dispatching	-0.425969
-0.470993	13.2. CPU dispatching	-0.124939
-1.416667	it makes dispatching	-0.124939
-0.876610	The automatic dispatching	-0.124939
-0.065806	13.2 Model-specific dispatching	-0.124939
-3.067489	in the particular	-0.124939
-2.100479	of a particular	-0.124939
-1.696795	in a particular	-0.124939
-0.855204	for a particular	-0.124939
-0.893970	that a particular	-0.124939
-1.330284	on a particular	-0.124939
-1.188012	has a particular	-0.124939
-1.594210	using a particular	-0.124939
-1.499427	where a particular	-0.124939
-0.585309	whether a particular	-0.124939
-0.585309	optimizing a particular	-0.124939
-0.585309	supports a particular	-0.124939
-1.155689	expect a particular	-0.124939
-0.585309	activate a particular	-0.124939
-0.902179	on that particular	-0.124939
-1.078136	tables are particular	-0.124939
-1.367522	that each particular	-0.124939
-1.595775	that the microprocessor	-0.249877
-2.250797	by the microprocessor	-0.124939
-2.315083	on the microprocessor	-0.124939
-2.101148	then the microprocessor	-0.124939
-2.165017	because the microprocessor	-0.124939
-1.635815	If the microprocessor	-0.124939
-1.776432	makes the microprocessor	-0.124939
-1.063156	how the microprocessor	-0.124939
-1.063156	cases the microprocessor	-0.124939
-0.892341	well the microprocessor	-0.124939
-0.892341	fits the microprocessor	-0.124939
-0.596652	structure), the microprocessor	-0.124939
-2.663241	in a microprocessor	-0.124939
-1.074096	implement a microprocessor	-0.124939
-0.600358	(requires a microprocessor	-0.124939
-1.047778	Choice of microprocessor	-0.124939
-0.601696	technology, and microprocessor	-0.124939
-0.601749	improvements in microprocessor	-0.124939
-0.594474	chain. A microprocessor	-0.124939
-0.594474	counter. A microprocessor	-0.124939
-0.311277	a dedicated microprocessor	-0.124939
-2.150696	is to replace	-0.124939
-2.094316	have to replace	-0.124939
-1.344797	possible to replace	-0.301030
-1.850431	necessary to replace	-0.124939
-1.831246	advantageous to replace	-0.124939
-1.067667	expected to replace	-0.124939
-0.909911	compiler can replace	-0.124939
-1.202555	variable or replace	-0.124939
-0.715248	compiler may replace	-0.522879
-1.773492	You may replace	-0.124939
-1.734948	compiler will replace	-0.124939
-1.778911	compilers will replace	-0.124939
-0.600426	predictable then replace	-0.124939
-1.486118	you cannot replace	-0.124939
-1.452070	You cannot replace	-0.124939
-1.064136	will often replace	-0.124939
-0.922668	can automatically replace	-0.124939
-0.792751	will automatically replace	-0.124939
-2.655134	of the next	-0.124939
-1.977240	to the next	-0.124939
-2.039833	in the next	-0.124939
-2.371086	for the next	-0.124939
-1.737671	that the next	-0.124939
-2.302800	on the next	-0.124939
-1.732556	when the next	-0.124939
-1.986761	make the next	-0.124939
-1.181580	need the next	-0.124939
-0.891674	start the next	-0.124939
-1.512335	until the next	-0.124939
-1.168201	mode. The next	-0.124939
-0.884759	returns. The next	-0.124939
-0.884759	ebx. The next	-0.124939
-0.592798	metaprogramming. The next	-0.124939
-0.592798	range. The next	-0.124939
-0.884759	i/2+r. The next	-0.124939
-0.592798	VIA. The next	-0.124939
-0.601412	xx4; // next	-0.124939
-0.593021	// get next	-0.124939
-0.358945	be mainstream next	-0.124939
-2.753523	if the branches	-0.124939
-2.370154	number of branches	-0.124939
-1.931456	lot of branches	-0.124939
-0.599373	target of branches	-0.124939
-1.537833	series of branches	-0.124939
-1.202498	calls and branches	-0.124939
-0.601468	optimal. The branches	-0.124939
-2.293219	is that branches	-0.124939
-0.502398	all code branches	-0.124939
-0.599613	test all branches	-0.124939
-0.898443	seldom used branches	-0.124939
-1.193471	and no branches	-0.124939
-1.704963	the two branches	-0.124939
-1.493945	with many branches	-0.124939
-1.351444	has many branches	-0.124939
-1.058961	are making branches	-0.124939
-0.594022	contains several branches	-0.124939
-0.592167	as few branches	-0.124939
-1.303998	the dispatch branches	-0.124939
-0.579032	with preceding branches	-0.124939
-0.577271	8. Avoid branches	-0.124939
-0.175402	Join identical branches	-0.124939
-0.504864	jumps Eliminate branches	-0.124939
-0.358715	size. Unpredictable branches	-0.124939
-0.358715	needed. Predictable branches	-0.124939
-2.427834	This is typically	-0.124939
-1.705063	time is typically	-0.124939
-1.460776	which is typically	-0.124939
-1.610714	size is typically	-0.124939
-0.598298	malloc is typically	-0.124939
-0.601733	slices of typically	-0.124939
-0.599850	structures that typically	-0.124939
-0.599850	frameworks that typically	-0.124939
-1.186964	program are typically	-0.124939
-1.861142	functions are typically	-0.124939
-0.597707	child are typically	-0.124939
-0.597106	unsigned. This typically	-0.124939
-0.597106	C++. This typically	-0.124939
-1.634579	This may typically	-0.124939
-0.594236	programming will typically	-0.124939
-0.594236	calculations will typically	-0.124939
-1.687515	function pointer typically	-0.124939
-0.598288	instruction takes typically	-0.124939
-0.596013	without SSE2 typically	-0.124939
-1.037974	The programmer typically	-0.124939
-0.586439	This framework typically	-0.124939
-1.009254	Text strings typically	-0.124939
-0.562843	such devices typically	-0.124939
-0.818153	Software developers typically	-0.124939
-0.541134	and frameworks typically	-0.124939
-0.504917	priority level, typically	-0.124939
-1.738840	function or operator	-0.124939
-0.600072	b;} vector operator	-0.124939
-1.071834	has one operator	-0.124939
-0.471762	the & operator	-0.124939
-0.547759	The & operator	-0.124939
-0.874858	the | operator	-0.124939
-1.225105	array index operator	-0.124939
-0.579188	// sum operator	-0.124939
-0.408049	or overloaded operator	-0.124939
-0.649500	an overloaded operator	-0.124939
-0.408049	An overloaded operator	-0.124939
-0.391049	C++ casting operator	-0.124939
-0.745215	type casting operator	-0.124939
-0.541006	The AND operator	-0.124939
-0.358788	The OR operator	-0.124939
-0.358788	EXCLUSIVE OR operator	-0.124939
-0.526995	the modulo operator	-0.124939
-0.526995	The pre-increment operator	-0.124939
-0.463385	The dynamic_cast operator	-0.124939
-0.143333	the const_cast operator	-0.124939
-0.143333	The const_cast operator	-0.124939
-0.463385	The [] operator	-0.124939
-0.358729	The ?: operator	-0.124939
-0.358729	the post-increment operator	-0.124939
-0.358729	The reinterpret_cast operator	-0.124939
-0.358729	The static_cast operator	-0.124939
-0.902600	application is preferably	-0.124939
-0.601483	vectors are preferably	-0.124939
-1.674398	this by preferably	-0.124939
-0.601014	program - preferably	-0.124939
-2.008459	You may preferably	-0.124939
-0.596284	dimension may preferably	-0.124939
-0.920112	function should preferably	-0.124939
-0.485549	loop should preferably	-0.124939
-0.485549	object should preferably	-0.124939
-0.110338	objects should preferably	-0.602060
-0.485549	we should preferably	-0.124939
-0.795173	test should preferably	-0.124939
-0.485549	list should preferably	-0.124939
-0.485549	counter should preferably	-0.124939
-0.485549	count should preferably	-0.124939
-0.485549	statements should preferably	-0.124939
-0.694805	unrolling should preferably	-0.124939
-0.485549	device should preferably	-0.124939
-0.485549	interrupt should preferably	-0.124939
-0.401559	should therefore preferably	-0.301030
-0.550714	few files, preferably	-0.124939
-0.463568	single container, preferably	-0.124939
-0.358873	for SSE2, preferably	-0.124939
-1.637824	c = 1;	-0.124939
-0.863793	n = 1;	-0.124939
-1.462554	d = 1;	-0.124939
-1.128789	f = 1;	-0.124939
-0.497001	(r = 1;	-0.425969
-0.200026	list[i+1] = 1;	-0.425969
-0.581967	a[0] = 1;	-0.124939
-0.976277	a - 1;	-0.425969
-0.401265	a + 1;	-0.425969
-0.651013	b + 1;	-0.124939
-0.506600	x*x + 1;	-0.124939
-0.553825	one : 1;	-0.124939
-0.553825	sign : 1;	-0.124939
-0.810653	cout << 1;	-0.425969
-1.027535	a ^ 1;	-0.124939
-0.358916	n >>= 1;	-0.124939
-0.596570	run time. Therefore,	-0.124939
-0.587856	writeable data. Therefore,	-0.124939
-1.740617	instruction set. Therefore,	-0.124939
-1.154346	are called. Therefore,	-0.124939
-0.854699	to it. Therefore,	-0.124939
-1.189060	64-bit mode. Therefore,	-0.124939
-0.989487	internal references. Therefore,	-0.124939
-1.106191	points to. Therefore,	-0.124939
-0.835294	be critical. Therefore,	-0.124939
-0.562632	different applications. Therefore,	-0.124939
-0.557281	additional parameters. Therefore,	-0.124939
-0.557363	other number. Therefore,	-0.124939
-0.343571	time consuming. Therefore,	-0.124939
-0.788167	point numbers. Therefore,	-0.124939
-0.540808	relative addresses. Therefore,	-0.124939
-0.884139	an exception. Therefore,	-0.124939
-0.959627	is declared. Therefore,	-0.124939
-0.314572	or another. Therefore,	-0.124939
-0.314572	than another. Therefore,	-0.124939
-0.504640	type int. Therefore,	-0.124939
-0.504810	library. 78 Therefore,	-0.124939
-0.659324	was programmed. Therefore,	-0.124939
-0.463294	different strides. Therefore,	-0.124939
-0.463294	or namespaces. Therefore,	-0.124939
-0.463294	than PCs. Therefore,	-0.124939
-0.358658	been calculated. Therefore,	-0.124939
-2.568658	on the Mac	-0.124939
-1.576618	Windows and Mac	-0.124939
-0.753806	Linux and Mac	-0.124939
-0.089560	BSD and Mac	-0.249877
-1.703201	objects in Mac	-0.124939
-0.900212	tested in Mac	-0.124939
-1.711393	compiler for Mac	-0.124939
-0.601218	BSD or Mac	-0.124939
-1.076251	run on Mac	-0.124939
-0.598705	references. 64-bit Mac	-0.124939
-1.552649	in 32-bit Mac	-0.124939
-0.813450	for 32-bit Mac	-0.425969
-0.780056	Gnu 32-bit Mac	-0.124939
-0.536201	Linux. 32-bit Mac	-0.124939
-0.589014	Unix-like systems. Mac	-0.124939
-1.134265	Windows, Linux, Mac	-0.124939
-0.726551	and perhaps Mac	-0.124939
-0.090142	The Intel-based Mac	-0.124939
-0.090142	as Intel-based Mac	-0.124939
-0.090142	BSD, Intel-based Mac	-0.124939
-0.659668	to date. Mac	-0.124939
-2.910031	of the multiplication	-0.124939
-2.434291	and the multiplication	-0.124939
-2.821548	in the multiplication	-0.124939
-1.541069	then the multiplication	-0.124939
-1.584193	while the multiplication	-0.124939
-1.579839	avoid the multiplication	-0.124939
-2.079061	make a multiplication	-0.124939
-1.296045	require a multiplication	-0.124939
-0.895817	addition and multiplication	-0.124939
-0.124531	subtraction and multiplication	-0.124939
-1.300041	occur in multiplication	-0.124939
-1.962135	} The multiplication	-0.124939
-1.196278	element. The multiplication	-0.124939
-2.197375	used for multiplication	-0.124939
-0.600617	cases this multiplication	-0.124939
-1.881967	floating point multiplication	-0.124939
-0.879982	32-bit integer multiplication	-0.124939
-0.590353	replace integer multiplication	-0.124939
-0.673362	time. Integer multiplication	-0.124939
-0.472180	multiplication Integer multiplication	-0.124939
-0.174851	14.4 Integer multiplication	-0.124939
-0.567151	code involves multiplication	-0.124939
-2.387705	of the application	-0.124939
-2.348919	and the application	-0.124939
-2.688671	in the application	-0.124939
-2.006387	if the application	-0.124939
-2.295807	by the application	-0.124939
-2.162654	than the application	-0.124939
-2.095215	from the application	-0.124939
-2.230345	If the application	-0.124939
-1.981557	before the application	-0.124939
-0.598011	market the application	-0.124939
-1.196157	processors. The application	-0.124939
-0.899090	library. The application	-0.124939
-1.708117	in an application	-0.124939
-1.063194	such an application	-0.124939
-2.018894	the first application	-0.124939
-0.595149	access Some application	-0.124939
-0.594510	systems"). An application	-0.124939
-1.792113	a particular application	-0.124939
-0.872645	heavy graphics application	-0.124939
-0.871346	for your application	-0.124939
-1.032200	a second application	-0.124939
-1.481497	the final application	-0.124939
-1.078099	a typical application	-0.124939
-0.463495	integration, web application	-0.124939
-0.463495	A WTL application	-0.124939
-0.564088	(float const x)	-0.124939
-0.196255	(double const x)	-0.425969
-0.248986	const & x)	-0.903090
-1.304615	SomeFunction (int x)	-0.124939
-0.549175	MultiplyBy (int x)	-0.124939
-0.073769	parabola (float x)	-0.602060
-0.015539	double p(double x)	-0.726999
-0.015539	double xpow10(double x)	-0.726999
-0.527262	IntegerPower (double x)	-0.124939
-0.065798	float Exp(float x)	-0.425969
-0.358873	int Func1(int x)	-0.124939
-0.358873	double Func2(double x)	-0.124939
-1.202815	space is automatically	-0.124939
-2.072050	able to automatically	-0.124939
-1.501180	compiler that automatically	-0.124939
-1.317112	compiler can automatically	-0.124939
-1.603700	compilers can automatically	-0.124939
-1.953862	the code automatically	-0.124939
-1.141226	compilers will automatically	-0.124939
-1.073328	prefetch data automatically	-0.124939
-2.002409	a loop automatically	-0.124939
-1.071778	program should automatically	-0.124939
-0.892377	this optimization automatically	-0.124939
-1.697219	vector operations automatically	-0.124939
-0.888294	static arrays automatically	-0.124939
-1.403134	that doesn't automatically	-0.124939
-0.590150	software programs automatically	-0.124939
-1.029545	it goes automatically	-0.124939
-0.578939	often inlined automatically	-0.124939
-0.835394	nontemporal writes automatically	-0.124939
-0.557349	or update automatically	-0.124939
-0.463349	with 14.14b automatically	-0.124939
-0.358701	to 12.8b automatically	-0.124939
-0.358701	License shall automatically	-0.124939
-0.358701	page 73) automatically	-0.124939
-1.636216	code to see	-0.124939
-0.592344	at to see	-0.124939
-0.883871	compilers to see	-0.124939
-1.563813	possible to see	-0.124939
-1.050662	table to see	-0.124939
-1.926485	want to see	-0.124939
-0.592344	result to see	-0.124939
-1.917071	able to see	-0.124939
-1.324931	fail to see	-0.124939
-0.592344	generates to see	-0.124939
-0.883871	measurements to see	-0.124939
-0.592344	listing to see	-0.124939
-1.298959	libraries and see	-0.124939
-1.317296	compiler can see	-0.425969
-0.596943	user can see	-0.124939
-1.910746	that you see	-0.124939
-1.734722	compiler will see	-0.124939
-1.504662	you will see	-0.124939
-1.193886	may also see	-0.124939
-0.580671	independent code, see	-0.124939
-0.788795	many features, see	-0.124939
-0.726618	vector operations, see	-0.124939
-0.358859	this topic, see	-0.124939
-0.358859	XMM registers; see	-0.124939
-2.563079	and the caching	-0.124939
-1.077862	fragmented and caching	-0.124939
-0.902072	big that caching	-0.124939
-0.590597	in code caching	-0.124939
-1.056549	when code caching	-0.124939
-0.590597	where code caching	-0.124939
-0.590597	makes code caching	-0.124939
-1.537697	the data caching	-0.124939
-1.288268	and data caching	-0.124939
-0.040554	makes data caching	-0.425969
-0.599896	addresses. If caching	-0.124939
-1.536065	are no caching	-0.124939
-0.199660	code makes caching	-0.124939
-0.596297	data without caching	-0.124939
-1.642466	can cause caching	-0.124939
-0.541128	87). Data caching	-0.124939
-0.463513	memory. Efficient caching	-0.124939
-0.463513	eliminated. Code caching	-0.124939
-1.874066	code that allows	-0.124939
-1.053239	systems that allows	-0.124939
-1.053239	language that allows	-0.124939
-1.053239	option that allows	-0.124939
-0.885624	container that allows	-0.124939
-1.053239	feature that allows	-0.124939
-1.622582	that it allows	-0.124939
-0.589503	pure. This allows	-0.124939
-0.589503	iteration. This allows	-0.124939
-0.589503	"undefined". This allows	-0.124939
-0.589503	throw(); This allows	-0.124939
-1.957823	Intel compiler allows	-0.124939
-1.842140	Gnu compiler allows	-0.124939
-0.599806	-ffunction-sections) which allows	-0.124939
-0.598580	set also allows	-0.124939
-0.713218	64-bit Windows allows	-0.124939
-0.594311	D language allows	-0.124939
-1.152379	const reference allows	-0.124939
-0.587881	out-of-order mechanism allows	-0.124939
-0.313168	program logic allows	-0.124939
-0.567002	is standardized allows	-0.124939
-0.463495	is biased allows	-0.124939
-0.463495	assignment. shared_ptr allows	-0.124939
-0.902235	number and sets	-0.124939
-0.601118	working with sets	-0.124939
-0.600408	dispatcher then sets	-0.124939
-1.437972	large data sets	-0.124939
-1.160254	the instruction sets	-0.124939
-0.507759	if instruction sets	-0.124939
-0.507759	when instruction sets	-0.124939
-0.340002	different instruction sets	-0.124939
-0.907447	which instruction sets	-0.124939
-1.687793	SSE2 instruction sets	-0.124939
-1.102262	supported instruction sets	-0.124939
-0.507759	various instruction sets	-0.124939
-0.507759	what instruction sets	-0.124939
-0.507759	compatible instruction sets	-0.124939
-0.731356	newer instruction sets	-0.124939
-0.841753	newest instruction sets	-0.124939
-1.032386	CISC instruction sets	-0.124939
-1.195894	above example sets	-0.124939
-0.573282	the 32 sets	-0.124939
-0.573282	as 32 sets	-0.124939
-0.867913	The constructor sets	-0.124939
-0.567202	13.1. Instruction sets	-0.124939
-0.940539	initialization routine sets	-0.124939
-0.358844	function __intel_cpu_features_init() sets	-0.124939
-0.358844	and similarly sets	-0.124939
-2.875252	of the expression	-0.124939
-2.790845	in the expression	-0.124939
-2.584178	if the expression	-0.124939
-2.173516	then the expression	-0.124939
-2.259463	because the expression	-0.124939
-0.898877	while the expression	-0.124939
-1.429021	change the expression	-0.124939
-1.529260	time. The expression	-0.124939
-0.598586	efficiency. The expression	-0.124939
-0.598586	mask. The expression	-0.124939
-0.600996	d; This expression	-0.124939
-1.177910	is an expression	-0.425969
-1.539283	be an expression	-0.124939
-1.618958	the integer expression	-0.124939
-1.188459	that some expression	-0.124939
-0.539904	variables An expression	-0.124939
-0.539904	propagation An expression	-0.124939
-0.539904	thing. An expression	-0.124939
-1.259108	the intermediate expression	-0.124939
-0.587827	the && expression	-0.124939
-0.575415	counter. Any expression	-0.124939
-0.541128	The equivalent expression	-0.124939
-0.788940	a loop-invariant expression	-0.124939
-0.358830	the non-reduced expression	-0.124939
-0.601761	behaviour is implementation	-0.124939
-0.901024	particular code implementation	-0.124939
-1.672525	} This implementation	-0.124939
-1.815111	a vector implementation	-0.124939
-1.842369	a different implementation	-0.124939
-1.072324	about which implementation	-0.124939
-0.598546	A C++ implementation	-0.124939
-0.895789	simplest possible implementation	-0.124939
-0.904494	the software implementation	-0.124939
-0.797733	a software implementation	-0.124939
-0.594451	standard. An implementation	-0.124939
-1.782678	the best implementation	-0.124939
-1.481361	a good implementation	-0.124939
-1.241237	A good implementation	-0.124939
-0.597905	the hardware implementation	-0.425969
-1.226985	a hardware implementation	-0.124939
-0.828414	a complicated implementation	-0.124939
-1.086579	more complicated implementation	-0.124939
-0.501526	most complicated implementation	-0.124939
-0.570222	A metaprogramming implementation	-0.124939
-0.566872	A typical implementation	-0.124939
-0.550497	A mixed implementation	-0.124939
-0.541202	A safer implementation	-0.124939
-0.597118	procedure 4 Most	-0.124939
-0.596601	executable code. Most	-0.124939
-1.047676	static memory. Most	-0.124939
-0.588863	or cache. Most	-0.124939
-1.555207	less efficient. Most	-0.124939
-0.872009	interface framework Most	-0.124939
-1.029175	Thread-local storage Most	-0.124939
-0.582825	Algebraic reductions Most	-0.124939
-1.437134	the loop. Most	-0.124939
-0.578812	limited resources. Most	-0.124939
-0.854661	Worst-case testing Most	-0.124939
-0.854843	simple variable. Most	-0.124939
-0.841241	and references. Most	-0.124939
-1.216382	instruction sets. Most	-0.124939
-1.077418	point expressions. Most	-0.124939
-0.557344	low-level optimizations. Most	-0.124939
-0.540786	and maintain. Most	-0.124939
-0.788332	Algebraic reduction Most	-0.124939
-0.726116	chapter 12. Most	-0.124939
-0.504620	turned on. Most	-0.124939
-0.463276	C++ constructs Most	-0.124939
-0.463276	the executable. Most	-0.124939
-0.659295	data compression Most	-0.124939
-0.463276	is updated. Most	-0.124939
-0.358643	sizeof(b)); 47 Most	-0.124939
-0.358643	heuristic guidelines. Most	-0.124939
-2.302205	than the complicated	-0.124939
-2.611088	is a complicated	-0.124939
-2.058993	make a complicated	-0.124939
-1.586515	such a complicated	-0.124939
-1.678208	disadvantage of complicated	-0.124939
-0.601607	advanced and complicated	-0.124939
-1.843709	based on complicated	-0.124939
-1.494888	use this complicated	-0.124939
-0.798750	the more complicated	-0.124939
-1.335180	is more complicated	-0.124939
-1.075999	for more complicated	-0.124939
-1.188351	code more complicated	-0.124939
-1.121404	A more complicated	-0.124939
-0.930778	do more complicated	-0.124939
-0.546764	elements more complicated	-0.124939
-1.311915	much more complicated	-0.124939
-0.798750	little more complicated	-0.124939
-0.546764	somewhat more complicated	-0.124939
-2.156754	the most complicated	-0.124939
-1.567455	is so complicated	-0.124939
-0.594111	needed. These complicated	-0.124939
-0.589026	expensive. Using complicated	-0.124939
-1.286608	to reduce complicated	-0.124939
-0.504861	set. More complicated	-0.124939
-0.835524	is extremely complicated	-0.124939
-1.704979	way of handling	-0.124939
-1.499102	ways of handling	-0.124939
-0.601625	twice for handling	-0.124939
-0.187156	and error handling	-0.124939
-0.523179	as error handling	-0.124939
-0.757477	own error handling	-0.124939
-0.442863	the exception handling	-0.124939
-0.255927	of exception handling	-0.124939
-0.255927	to exception handling	-0.124939
-0.255927	for exception handling	-0.124939
-0.255927	that exception handling	-0.124939
-0.255927	use exception handling	-0.124939
-0.255927	If exception handling	-0.124939
-0.255927	using exception handling	-0.124939
-0.255927	C++ exception handling	-0.124939
-0.255927	possible exception handling	-0.124939
-0.255927	No exception handling	-0.124939
-0.255927	save exception handling	-0.124939
-0.255927	why exception handling	-0.124939
-0.366816	structured exception handling	-0.124939
-0.109378	disable exception handling	-0.425969
-0.107196	handling Exception handling	-0.425969
-1.636263	intermediate code like	-0.124939
-1.475503	library functions like	-0.124939
-0.592303	complicated functions like	-0.124939
-0.595640	difficult cases like	-0.124939
-0.593451	in classes like	-0.124939
-1.734536	be implemented like	-0.124939
-0.883330	who would like	-0.124939
-0.873699	write expressions like	-0.124939
-0.436344	can look like	-0.124939
-0.306187	may look like	-0.425969
-0.436344	typically look like	-0.124939
-0.515291	for things like	-0.124939
-0.515291	simple things like	-0.124939
-0.577399	simple tasks like	-0.124939
-0.570222	add statements like	-0.124939
-1.136909	in situations like	-0.124939
-0.527232	also treated like	-0.124939
-0.527038	complicated techniques like	-0.124939
-0.107161	that behaves like	-0.124939
-0.090129	function looks like	-0.124939
-0.090129	code looks like	-0.124939
-0.090129	classes looks like	-0.124939
-0.358758	is expanded like	-0.124939
-0.358758	simple actions like	-0.124939
-0.601892	splitting the dependency	-0.124939
-2.659242	is a dependency	-0.124939
-0.901059	break a dependency	-0.124939
-1.203231	effect of dependency	-0.124939
-0.600350	chains. A dependency	-0.124939
-0.896221	if such dependency	-0.124939
-0.858381	a long dependency	-0.425969
-0.496345	or long dependency	-0.124939
-0.496345	A long dependency	-0.124939
-0.496345	no long dependency	-0.124939
-0.180818	avoid long dependency	-0.425969
-0.496345	Avoid long dependency	-0.124939
-0.584010	a critical dependency	-0.124939
-0.586588	Z. Each dependency	-0.124939
-0.580679	chain. Such dependency	-0.124939
-0.562912	break down dependency	-0.124939
-0.051813	a loop-carried dependency	-0.124939
-0.051813	no loop-carried dependency	-0.124939
-0.051813	two loop-carried dependency	-0.124939
-0.051813	No loop-carried dependency	-0.124939
-0.051813	especially loop-carried dependency	-0.124939
-0.504971	loop- carried dependency	-0.124939
-0.504971	order. Long dependency	-0.124939
-1.202084	write the members	-0.124939
-1.077653	containing the members	-0.124939
-2.246013	that are members	-0.124939
-2.059023	they are members	-0.124939
-1.077320	class with members	-0.124939
-1.005764	The data members	-0.124939
-1.004316	all data members	-0.124939
-0.524791	used data members	-0.124939
-0.760246	class data members	-0.124939
-0.524791	two data members	-0.124939
-0.524791	where data members	-0.124939
-0.524791	its data members	-0.124939
-0.524791	align data members	-0.124939
-0.187527	non-static data members	-0.425969
-0.187527	Class data members	-0.425969
-0.524791	accesses data members	-0.124939
-1.431625	often used members	-0.124939
-0.897874	and class members	-0.124939
-0.598210	saved variable members	-0.124939
-1.465122	of its members	-0.124939
-1.260183	the smallest members	-0.124939
-0.358891	class. Data members	-0.124939
-0.358891	together. Data members	-0.124939
-0.463549	instance. Non-static members	-0.124939
-1.754806	because of their	-0.124939
-1.015437	most of their	-0.124939
-1.918619	versions of their	-0.124939
-0.902443	inferior to their	-0.124939
-0.601569	languages and their	-0.124939
-0.601233	variable if their	-0.124939
-0.883417	program by their	-0.124939
-1.031473	replaced by their	-0.124939
-1.174935	identified by their	-0.124939
-1.441687	long as their	-0.124939
-1.495672	even when their	-0.124939
-0.600215	reductions at their	-0.124939
-0.600111	rarely program their	-0.124939
-2.528261	to make their	-0.124939
-1.030612	b because their	-0.124939
-0.200707	register because their	-0.425969
-0.878305	with each their	-0.124939
-0.589491	have each their	-0.124939
-0.596406	and test their	-0.124939
-1.246368	can change their	-0.124939
-0.851614	that fit their	-0.124939
-1.334187	to keep their	-0.124939
-0.902702	before leaving their	-0.124939
-1.179200	The type __m128i	-0.124939
-1.482545	each element __m128i	-0.124939
-0.528020	and c __m128i	-0.425969
-0.184393	vector c __m128i	-0.425969
-0.608638	static inline __m128i	-0.425969
-0.045964	* d, __m128i	-0.726999
-0.129450	a bit-mask: __m128i	-0.425969
-0.311248	vector c: __m128i	-0.425969
-0.311248	vector b: __m128i	-0.425969
-0.504998	AND operations: __m128i	-0.124939
-0.204069	of (0,0,0,0,0,0,0,0) __m128i	-0.425969
-0.204069	of (2,2,2,2,2,2,2,2) __m128i	-0.425969
-2.175831	} } Using	-0.124939
-0.888876	a function. Using	-0.124939
-1.040966	level-2 cache. Using	-0.124939
-1.518036	table lookup Using	-0.124939
-1.281802	of 2. Using	-0.124939
-0.855030	simple variable. Using	-0.124939
-1.105847	single precision. Using	-0.124939
-0.434807	access. 12 Using	-0.124939
-0.434807	103 12 Using	-0.124939
-0.314620	141 14.9 Using	-0.124939
-0.314620	int)u; 14.9 Using	-0.124939
-0.527103	less expensive. Using	-0.124939
-0.764043	of efficiency. Using	-0.124939
-0.314620	153 16.1 Using	-0.124939
-0.314620	below) 16.1 Using	-0.124939
-0.463385	least temporarily. Using	-0.124939
-0.659467	page 105). Using	-0.124939
-0.143333	107 12.4 Using	-0.124939
-0.143333	division. 12.4 Using	-0.124939
-0.143333	109 12.5 Using	-0.124939
-0.143333	section. 12.5 Using	-0.124939
-0.358729	this chapter. Using	-0.124939
-0.358729	page 150. Using	-0.124939
-0.358729	in parallel: Using	-0.124939
-0.358729	chapter 11. Using	-0.124939
-2.238833	of the Boolean	-0.301030
-2.245475	than the Boolean	-0.124939
-1.700216	between the Boolean	-0.124939
-2.302383	as a Boolean	-0.124939
-2.077819	make a Boolean	-0.124939
-0.901208	order of Boolean	-0.124939
-1.787721	case of Boolean	-0.124939
-0.600025	43). The Boolean	-0.124939
-0.600025	~. The Boolean	-0.124939
-1.934066	useful for Boolean	-0.124939
-1.076211	operations with Boolean	-0.124939
-0.600960	integers as Boolean	-0.124939
-2.361011	rather than Boolean	-0.124939
-1.665398	that have Boolean	-0.124939
-1.607257	with many Boolean	-0.124939
-0.861116	that produce Boolean	-0.124939
-0.557518	xx-xx--x- reciprocal Boolean	-0.124939
-0.998910	branch mispredictions. Boolean	-0.124939
-0.527151	for true. Boolean	-0.124939
-0.659553	are overdetermined Boolean	-0.124939
-0.463440	is invalid. Boolean	-0.124939
-0.358773	- 76 Boolean	-0.124939
-2.110984	in the cache.	-0.301030
-0.600844	invalidate the cache.	-0.124939
-1.076749	memory or cache.	-0.124939
-1.954780	the code cache.	-0.425969
-0.778043	the data cache.	-0.124939
-2.638734	the same cache.	-0.124939
-0.591798	level- 1 cache.	-0.124939
-0.606417	the level-2 cache.	-0.124939
-0.761369	the level-1 cache.	-0.124939
-0.431115	same level-1 cache.	-0.124939
-0.575484	the disk cache.	-0.124939
-0.249793	the micro-op cache.	-0.124939
-0.249793	or micro-op cache.	-0.124939
-0.358859	a level-3 cache.	-0.124939
-0.601632	flag and don't	-0.124939
-1.078177	data that don't	-0.124939
-1.146214	that you don't	-0.425969
-1.297461	if you don't	-0.124939
-1.191337	as you don't	-0.124939
-1.852958	then you don't	-0.124939
-1.643759	If you don't	-0.124939
-1.275587	Therefore, you don't	-0.124939
-0.564314	devices, you don't	-0.124939
-0.599800	destination, but don't	-0.124939
-0.897782	current compilers don't	-0.124939
-0.792385	and we don't	-0.124939
-0.916212	that we don't	-0.425969
-0.480636	so we don't	-0.124939
-1.580159	if they don't	-0.124939
-0.569939	options. I don't	-0.124939
-0.569939	a. I don't	-0.124939
-0.591511	I simply don't	-0.124939
-0.463531	So please don't	-0.124939
-0.463531	the factorials don't	-0.124939
-0.358844	that "we don't	-0.124939
-1.078467	cache of 256	-0.124939
-0.601178	1024/4 = 256	-0.124939
-0.601101	(128 or 256	-0.124939
-0.600782	int int 256	-0.124939
-1.197651	take only 256	-0.124939
-0.599085	256 double 256	-0.124939
-0.598647	128 float 256	-0.124939
-0.583849	64 4 256	-0.124939
-0.588142	32 8 256	-0.124939
-1.013124	4 unsigned 256	-0.124939
-0.578906	256 unsigned 256	-0.124939
-0.368396	16 16 256	-0.124939
-1.061065	8 32 256	-0.124939
-0.593490	instructions AVX 256	-0.124939
-0.587830	32 char 256	-0.124939
-0.587124	(SIZE > 256	-0.124939
-0.585580	vectors AVX2 256	-0.124939
-0.557371	int int64_t 256	-0.124939
-0.557371	is available, 256	-0.124939
-0.526974	256 uint64_t 256	-0.124939
-0.358715	bits (XMM), 256	-0.124939
-0.358715	int 832 256	-0.124939
-2.286280	than the intrinsic	-0.124939
-1.202450	calling the intrinsic	-0.124939
-1.937687	use of intrinsic	-0.124939
-0.601489	double. The intrinsic	-0.124939
-1.037014	support for intrinsic	-0.124939
-0.892897	files for intrinsic	-0.124939
-0.596933	header for intrinsic	-0.124939
-2.261538	There are intrinsic	-0.124939
-1.200083	implemented with intrinsic	-0.124939
-2.417817	to use intrinsic	-0.124939
-1.853970	of different intrinsic	-0.124939
-0.599745	had used intrinsic	-0.124939
-1.366643	that each intrinsic	-0.124939
-2.084906	by using intrinsic	-0.124939
-0.595994	Define SSE2 intrinsic	-0.124939
-0.594372	language Use intrinsic	-0.124939
-1.255634	that support intrinsic	-0.124939
-0.498042	lookup Using intrinsic	-0.124939
-0.181228	12.4 Using intrinsic	-0.425969
-1.132081	compiler supports intrinsic	-0.124939
-1.297418	the so-called intrinsic	-0.124939
-0.358744	allow assembly-like intrinsic	-0.124939
-0.358744	the _mm_clflush intrinsic	-0.124939
-2.437823	by the methods	-0.124939
-1.298249	Using the methods	-0.124939
-0.599935	These different methods	-0.124939
-1.289604	than other methods	-0.124939
-1.072274	commonly used methods	-0.124939
-0.598566	use such methods	-0.124939
-0.892464	various optimization methods	-0.124939
-0.982789	of these methods	-0.124939
-0.982892	that these methods	-0.124939
-0.535335	use these methods	-0.124939
-0.596265	more useful methods	-0.124939
-1.667262	the following methods	-0.124939
-1.804625	The following methods	-0.124939
-0.564588	memory. These methods	-0.124939
-0.831125	CPU. These methods	-0.124939
-1.676260	the above methods	-0.124939
-0.588883	most development methods	-0.124939
-1.555678	are various methods	-0.124939
-0.869378	the storage methods	-0.124939
-0.841631	and similar methods	-0.124939
-0.358744	inefficient code-based methods	-0.124939
-0.358744	These workaround methods	-0.124939
-0.358744	efficient table-based methods	-0.124939
-0.358744	and suggests methods	-0.124939
-1.947364	of a signed	-0.124939
-2.361372	to a signed	-0.124939
-1.203139	behavior of signed	-0.124939
-1.435821	it to signed	-0.124939
-1.198414	integers to signed	-0.124939
-0.899278	conversion to signed	-0.124939
-0.902019	assumption that signed	-0.124939
-2.939051	can be signed	-0.124939
-0.601068	faster with signed	-0.124939
-0.901066	differently on signed	-0.124939
-1.833308	efficient than signed	-0.124939
-1.998373	faster than signed	-0.124939
-0.599236	between using signed	-0.124939
-0.581713	conversion between signed	-0.124939
-1.286851	Conversions between signed	-0.124939
-1.174287	// Use signed	-0.124939
-1.287606	to mix signed	-0.124939
-0.562793	64-bit integer, signed	-0.124939
-0.550520	of comparing signed	-0.124939
-0.358822	2 int, signed	-0.124939
-0.504871	short int, signed	-0.124939
-0.129450	an 8-bit signed	-0.124939
-0.463440	1 char, signed	-0.124939
-2.713408	is a model	-0.124939
-0.893666	name and model	-0.124939
-1.190060	names and model	-0.124939
-0.124410	family and model	-0.301030
-1.128766	assume that model	-0.124939
-0.601233	brand or model	-0.124939
-0.600365	inferior. A model	-0.124939
-1.633305	the memory model	-0.124939
-0.374186	large memory model	-0.124939
-0.854074	each CPU model	-0.124939
-1.314119	specific CPU model	-0.124939
-0.576861	known CPU model	-0.124939
-0.854074	particular CPU model	-0.124939
-0.596392	next new model	-0.124939
-0.185253	that processor model	-0.124939
-0.514990	each processor model	-0.124939
-0.514990	next processor model	-0.124939
-1.669630	the next model	-0.124939
-0.827998	a false model	-0.124939
-0.358844	fast, -fp- model	-0.124939
-1.077564	how the development	-0.124939
-1.201965	during the development	-0.124939
-0.601741	ease of development	-0.124939
-1.665647	compilers and development	-0.124939
-1.287996	language and development	-0.124939
-0.897829	portability and development	-0.124939
-1.073135	automatically. The development	-0.124939
-1.073135	application. The development	-0.124939
-0.600123	makes program development	-0.124939
-0.599339	Since most development	-0.124939
-1.188286	that some development	-0.124939
-0.863014	the software development	-0.124939
-0.523783	which software development	-0.124939
-0.523783	Some software development	-0.124939
-0.523783	platform software development	-0.124939
-0.523783	structured software development	-0.124939
-0.894523	compromise between development	-0.124939
-0.593790	of good development	-0.124939
-1.026994	of advanced development	-0.124939
-0.835673	and easy development	-0.124939
-0.550603	of powerful development	-0.124939
-0.541004	One popular development	-0.124939
-0.463458	network. Various development	-0.124939
-0.463458	The integrated development	-0.124939
-0.601856	follows the mathematical	-0.124939
-0.203740	reasons of mathematical	-0.425969
-0.600163	tables of mathematical	-0.124939
-0.598308	processing, and mathematical	-0.124939
-0.598308	searching, and mathematical	-0.124939
-0.203367	root and mathematical	-0.425969
-1.078079	is in mathematical	-0.124939
-1.443153	instructions for mathematical	-0.124939
-0.601131	memset, or mathematical	-0.124939
-0.901031	is on mathematical	-0.124939
-1.879604	can do mathematical	-0.124939
-0.596265	many useful mathematical	-0.124939
-1.785321	information about mathematical	-0.124939
-0.564567	for common mathematical	-0.124939
-1.298265	most common mathematical	-0.124939
-0.593270	contains optimized mathematical	-0.124939
-0.591983	loop doing mathematical	-0.124939
-1.530340	more complicated mathematical	-0.124939
-1.026804	of advanced mathematical	-0.124939
-1.287382	to mix mathematical	-0.124939
-0.562753	as heavy mathematical	-0.124939
-0.788398	for computing mathematical	-0.124939
-0.358744	for vectorizing mathematical	-0.124939
-2.613782	it is never	-0.124939
-1.689895	function is never	-0.425969
-1.982919	program is never	-0.124939
-0.897889	variable is never	-0.124939
-1.581289	that are never	-0.124939
-1.831066	functions are never	-0.124939
-1.979536	they are never	-0.124939
-0.599184	i can never	-0.124939
-1.744836	We can never	-0.124939
-0.875940	a will never	-0.124939
-1.467886	you will never	-0.124939
-0.588273	F1 will never	-0.124939
-0.487204	function should never	-0.124939
-1.120996	It should never	-0.124939
-0.572549	mechanism should never	-0.124939
-1.862276	the user never	-0.124939
-0.886813	that overflow never	-0.124939
-0.590712	feature was never	-0.124939
-1.480527	memory space never	-0.124939
-1.115354	user input never	-0.124939
-0.527186	#define directive never	-0.124939
-1.195995	in a separate	-0.221849
-1.744658	or a separate	-0.124939
-1.033438	into a separate	-0.301030
-1.423382	making a separate	-0.124939
-0.881160	implemented a separate	-0.124939
-2.465302	number of separate	-0.124939
-0.900250	access in separate	-0.124939
-0.900250	placed in separate	-0.124939
-2.389573	should be separate	-0.124939
-1.591124	to have separate	-0.124939
-1.196602	may make separate	-0.124939
-0.600038	used functions separate	-0.124939
-0.897332	tasks into separate	-0.124939
-0.596580	threads need separate	-0.124939
-2.365034	because the block	-0.124939
-1.203011	within a block	-0.124939
-0.901834	they can block	-0.124939
-0.910859	the memory block	-0.124939
-0.821727	a memory block	-0.124939
-0.521378	this memory block	-0.124939
-0.754392	one memory block	-0.124939
-0.521378	any memory block	-0.124939
-0.186740	new memory block	-0.425969
-0.754392	big memory block	-0.124939
-0.521378	own memory block	-0.124939
-0.346120	bigger memory block	-0.124939
-0.521378	old memory block	-0.124939
-2.097908	the data block	-0.124939
-0.888735	A large block	-0.124939
-1.056897	one big block	-0.124939
-1.450246	a small block	-0.124939
-1.461726	its own block	-0.124939
-1.411646	the old block	-0.124939
-1.203374	can possibly block	-0.124939
-0.562964	no try block	-0.124939
-2.355689	is the name	-0.124939
-2.648009	that the name	-0.124939
-2.273465	use the name	-0.124939
-0.600826	modifying the name	-0.124939
-1.582420	code. The name	-0.124939
-0.899090	www.agner.org/optimize/asmlib.zip. The name	-0.124939
-2.201206	the function name	-0.124939
-0.879052	Define function name	-0.124939
-0.879052	Intrinsic function name	-0.124939
-0.201655	mangled function name	-0.124939
-1.843373	a different name	-0.124939
-1.694323	the same name	-0.301030
-0.733401	child class name	-0.124939
-1.967474	is called name	-0.124939
-1.058554	than its name	-0.124939
-1.058575	details about name	-0.124939
-1.034465	the local name	-0.124939
-0.835524	any brand name	-0.124939
-0.835524	an arbitrary name	-0.124939
-0.463495	funny looking name	-0.124939
-0.358816	Function Assembly name	-0.124939
-1.714325	between the systems.	-0.124939
-1.012964	the 64-bit systems.	-0.124939
-0.891707	and 64-bit systems.	-0.124939
-0.663211	in 64-bit systems.	-0.124939
-1.188459	on some systems.	-0.124939
-1.955094	in 32-bit systems.	-0.124939
-0.876408	and operating systems.	-0.124939
-0.544616	multiple operating systems.	-0.124939
-0.971011	64-bit operating systems.	-0.124939
-1.054284	for Linux systems.	-0.124939
-0.879208	in Mac systems.	-0.124939
-0.585926	on bigger systems.	-0.124939
-0.573186	and message systems.	-0.124939
-0.573186	to BSD systems.	-0.124939
-0.391069	some embedded systems.	-0.124939
-0.550708	small embedded systems.	-0.124939
-0.726551	in Unix-like systems.	-0.124939
-0.358830	and Itanium systems.	-0.124939
-2.114385	is to put	-0.124939
-1.404749	and to put	-0.124939
-0.891370	useful to put	-0.124939
-1.168680	advantageous to put	-0.124939
-1.314146	recommended to put	-0.425969
-0.594698	like to put	-0.124939
-1.057459	idea to put	-0.124939
-1.907719	code and put	-0.124939
-0.599510	threads and put	-0.124939
-0.898005	functions, and put	-0.124939
-2.427011	to be put	-0.124939
-1.809745	preferably be put	-0.124939
-2.056615	you may put	-0.124939
-1.498995	unless you put	-0.124939
-0.600678	they have put	-0.124939
-0.588003	other then put	-0.124939
-0.588003	bytes then put	-0.124939
-0.588003	other, then put	-0.124939
-0.552352	and simply put	-0.124939
-0.808783	are simply put	-0.124939
-0.450681	threads. Don't put	-0.124939
-0.450681	opposite: Don't put	-0.124939
-3.191667	of the needs	-0.124939
-0.902474	&& a needs	-0.124939
-1.168686	function that needs	-0.124939
-0.892097	work that needs	-0.124939
-1.062795	destructor that needs	-0.124939
-1.084735	and it needs	-0.425969
-1.180430	because it needs	-0.124939
-0.600975	1.0f; This needs	-0.124939
-1.924069	the compiler needs	-0.425969
-2.003527	a loop needs	-0.124939
-0.897186	&& b needs	-0.124939
-1.276888	executable file needs	-0.124939
-0.595343	one constant needs	-0.124939
-1.059992	positive list needs	-0.124939
-1.062452	code section needs	-0.124939
-0.857957	data section needs	-0.124939
-0.579089	code still needs	-0.124939
-1.003769	each iteration needs	-0.124939
-0.884572	exception handler needs	-0.124939
-0.358787	as ReadB needs	-0.124939
-0.124296	z = y	-0.602060
-1.207693	else { y	-0.425969
-0.738420	(b) { y	-0.425969
-1.192608	{ double y	-0.124939
-1.066683	// return y	-0.124939
-1.323642	the structure y	-0.124939
-1.423985	the expression y	-0.124939
-1.041121	+ c; y	-0.124939
-0.587212	x > y	-0.124939
-0.585038	c; Here, y	-0.124939
-0.865967	= a; y	-0.124939
-0.580583	: b) y	-0.124939
-0.118903	d, y; y	-0.425969
-0.283170	100, y; y	-0.124939
-0.283170	1.23456, y; y	-0.124939
-0.129434	b1, b2; y	-0.425969
-0.358787	& 1) y	-0.124939
-0.358787	+ a.x, y	-0.124939
-0.358787	may write: y	-0.124939
-2.648009	that the conversion	-0.124939
-2.674269	if the conversion	-0.124939
-1.591076	example, the conversion	-0.124939
-0.900625	Typically, the conversion	-0.124939
-0.893229	well. The conversion	-0.124939
-0.893229	integer. The conversion	-0.124939
-0.597101	result. The conversion	-0.124939
-0.597101	i++)a[i]=2*i; The conversion	-0.124939
-0.600989	int)i; This conversion	-0.124939
-0.600335	mode. A conversion	-0.124939
-0.600186	This data conversion	-0.124939
-1.288546	to integer conversion	-0.124939
-1.550776	The size conversion	-0.124939
-0.588660	Integer size conversion	-0.124939
-1.069305	to float conversion	-0.124939
-0.597430	integers before conversion	-0.124939
-0.597246	/ unsigned conversion	-0.124939
-0.828298	the type conversion	-0.124939
-0.527268	Pointer type conversion	-0.124939
-0.527268	Implicit type conversion	-0.124939
-0.826282	point precision conversion	-0.124939
-0.561960	require precision conversion	-0.124939
-0.463495	truncation. Efficient conversion	-0.124939
-0.358816	the integer-to-float conversion	-0.124939
-2.390963	a = c;	-0.124939
-1.167012	b; int c;	-0.124939
-1.051057	public: int c;	-0.124939
-0.592482	b2; int c;	-0.124939
-0.599235	b; double c;	-0.124939
-0.707159	b + c;	-0.124939
-1.070327	b * c;	-0.124939
-0.597963	1; return c;	-0.124939
-1.258113	b / c;	-0.124939
-0.270654	int b, c;	-0.124939
-0.352332	a, b, c;	-0.367977
-1.003999	b % c;	-0.124939
-0.015539	int r, c;	-0.425969
-0.379754	means of #include	-0.124939
-1.530543	class library #include	-0.124939
-0.595966	with SSE2 #include	-0.124939
-0.594685	compiled versions #include	-0.124939
-1.640061	vector classes #include	-0.124939
-1.769860	CPU dispatching #include	-0.124939
-0.873584	other compilers. #include	-0.124939
-0.579091	series, vectorized #include	-0.124939
-0.540971	Example 16.2 #include	-0.124939
-0.527071	Example 16.1 #include	-0.124939
-0.527071	Example 9.3 #include	-0.124939
-0.504700	x64 141 #include	-0.124939
-0.504850	Example 9.6b. #include	-0.124939
-0.107150	for InstructionSet() #include	-0.425969
-0.659410	#include <stdio.h> #include	-0.124939
-0.463349	// Or #include	-0.124939
-0.358701	#include <excpt.h> #include	-0.124939
-0.358701	#include <float.h> #include	-0.124939
-0.358701	mode (SSE2): #include	-0.124939
-0.358701	classes (Intel) #include	-0.124939
-0.358701	mode (SSE): #include	-0.124939
-0.358701	classes 114 #include	-0.124939
-0.601898	applying the various	-0.124939
-1.378480	availability of various	-0.124939
-0.900077	checking and various	-0.124939
-1.198115	platforms and various	-0.124939
-1.641115	implemented in various	-0.124939
-1.865466	there are various	-0.124939
-0.696201	There are various	-0.279841
-1.038839	compilers have various	-0.124939
-0.600109	complicated because various	-0.124939
-1.065713	also makes various	-0.124939
-0.592396	www.agner.org/optimize/asmlib.zip contains various	-0.124939
-1.286787	to reduce various	-0.124939
-0.550677	135 show various	-0.124939
-0.659697	sections describe various	-0.124939
-1.590278	have the disadvantage	-0.124939
-0.850480	has the disadvantage	-0.602060
-2.571159	is a disadvantage	-0.124939
-2.038533	be a disadvantage	-0.124939
-1.859409	at a disadvantage	-0.124939
-1.071929	often a disadvantage	-0.124939
-1.256484	below. The disadvantage	-0.124939
-1.435916	data. The disadvantage	-0.124939
-1.256484	called. The disadvantage	-0.124939
-0.884720	Windows. The disadvantage	-0.124939
-1.051909	array. The disadvantage	-0.124939
-0.884720	starts. The disadvantage	-0.124939
-0.592778	avoided. The disadvantage	-0.124939
-0.588586	107. A disadvantage	-0.124939
-0.588586	form. A disadvantage	-0.124939
-0.588586	types. A disadvantage	-0.124939
-1.257336	most important disadvantage	-0.124939
-0.569316	An important disadvantage	-0.124939
-0.408151	time. Another disadvantage	-0.124939
-0.408151	itself. Another disadvantage	-0.124939
-0.408151	slower. Another disadvantage	-0.124939
-0.550770	The biggest disadvantage	-0.124939
-2.926676	in the high	-0.124939
-2.274488	use the high	-0.124939
-2.317498	because the high	-0.124939
-0.600855	With the high	-0.124939
-1.200395	objects is high	-0.124939
-0.901223	load is high	-0.124939
-1.956247	is a high	-0.425969
-1.865895	if a high	-0.124939
-2.147464	with a high	-0.124939
-1.895258	have a high	-0.124939
-1.834873	at a high	-0.124939
-1.592696	where a high	-0.124939
-0.596709	involve a high	-0.124939
-0.600107	statements The high	-0.124939
-0.600107	powerful. The high	-0.124939
-1.433361	instructions for high	-0.124939
-0.600062	Libraries for high	-0.124939
-0.600207	performance has high	-0.124939
-0.844292	is so high	-0.425969
-0.838225	be so high	-0.124939
-1.805821	a very high	-0.124939
-0.463586	be annoyingly high	-0.124939
-2.312388	use the zero	-0.124939
-0.899824	number is zero	-0.124939
-1.437432	value is zero	-0.124939
-1.197612	f is zero	-0.124939
-1.078360	byte of zero	-0.124939
-0.845964	a to zero	-0.124939
-1.415435	integer to zero	-0.124939
-1.180113	variables to zero	-0.124939
-1.055175	bit to zero	-0.124939
-1.055175	register to zero	-0.124939
-1.582168	pointers to zero	-0.124939
-1.055175	initialized to zero	-0.124939
-0.886939	seconds to zero	-0.124939
-0.886939	down to zero	-0.124939
-0.593909	Initialize to zero	-0.124939
-0.601658	carry and zero	-0.124939
-1.189487	conversion takes zero	-0.124939
-0.590734	a was zero	-0.124939
-0.191258	(0,0,0,0,0,0,0,0) __m128i zero	-0.425969
-0.107182	the terminating zero	-0.425969
-0.358873	seconds remains zero	-0.124939
-3.026477	of the Microsoft	-0.124939
-2.922081	in the Microsoft	-0.124939
-1.857728	as the Microsoft	-0.124939
-1.940702	into the Microsoft	-0.124939
-1.497356	C++ is Microsoft	-0.124939
-1.076242	tool is Microsoft	-0.124939
-0.600950	specific to Microsoft	-0.124939
-0.600950	plug-in to Microsoft	-0.124939
-1.502289	Intel and Microsoft	-0.124939
-1.961194	} The Microsoft	-0.124939
-0.899050	platforms. The Microsoft	-0.124939
-1.076530	Intel or Microsoft	-0.124939
-0.894674	made with Microsoft	-0.124939
-1.070766	Comes with Microsoft	-0.124939
-0.599877	// If Microsoft	-0.124939
-0.587812	mentioned below. Microsoft	-0.124939
-0.530604	compilers Intel, Microsoft	-0.124939
-0.530604	Clang, Intel, Microsoft	-0.124939
-0.567045	also available. Microsoft	-0.124939
-0.557538	Intel Borland Microsoft	-0.124939
-0.550542	Intel CodeGear Microsoft	-0.124939
-0.463458	explanation. (The Microsoft	-0.124939
-0.358787	to date): Microsoft	-0.124939
-0.358787	were tested: Microsoft	-0.124939
-0.601737	limitations to what	-0.124939
-0.902109	do and what	-0.124939
-0.902019	fast that what	-0.124939
-0.601160	called, or what	-0.124939
-0.590824	but on what	-0.124939
-1.721589	based on what	-0.124939
-1.694709	depends on what	-0.124939
-1.762883	depending on what	-0.124939
-1.594727	look at what	-0.124939
-1.464780	compiler does what	-0.124939
-0.594351	returns. But what	-0.124939
-0.591666	; add what	-0.124939
-1.592037	example shows what	-0.124939
-1.062812	to know what	-0.124939
-0.669718	you know what	-0.124939
-0.763795	doesn't know what	-0.124939
-1.079997	can change what	-0.124939
-0.744017	cannot change what	-0.124939
-0.562793	tell explicitly what	-0.124939
-0.562942	measure exactly what	-0.124939
-0.463440	and that's what	-0.124939
-0.065786	8.7 Checking what	-0.425969
-0.659553	the reader what	-0.124939
-3.135261	of the parameter	-0.124939
-2.729612	if the parameter	-0.124939
-2.564894	of a parameter	-0.124939
-1.917680	if a parameter	-0.124939
-2.278283	as a parameter	-0.124939
-0.646395	overhead of parameter	-0.602060
-0.600576	return and parameter	-0.124939
-1.295125	allocation and parameter	-0.124939
-1.713360	a function parameter	-0.124939
-0.599587	critical integer parameter	-0.124939
-1.331657	the size parameter	-0.425969
-1.352791	the template parameter	-0.124939
-1.352791	a template parameter	-0.124939
-0.539456	The template parameter	-0.124939
-1.051761	A template parameter	-0.124939
-0.865110	a ; parameter	-0.124939
-0.749360	Induction; ; parameter	-0.124939
-0.518428	PROCNEAR ; parameter	-0.124939
-0.518428	NEAR ; parameter	-0.124939
-0.764503	an implicit parameter	-0.124939
-2.103617	make the division	-0.124939
-0.601658	Multiplication and division	-0.124939
-0.902133	0. The division	-0.124939
-1.384506	faster than division	-0.425969
-1.911219	floating point division	-0.124939
-0.755943	Floating point division	-0.124939
-0.599721	eliminate one division	-0.124939
-1.126568	The integer division	-0.124939
-1.126568	for integer division	-0.124939
-0.581336	means integer division	-0.124939
-1.247580	for fast division	-0.124939
-0.352341	2 Integer division	-0.124939
-0.495825	time. Integer division	-0.124939
-0.352341	division Integer division	-0.124939
-0.352341	microprocessors. Integer division	-0.124939
-0.352341	microprocessor. Integer division	-0.124939
-0.495825	14.5 Integer division	-0.124939
-0.352341	processor). Integer division	-0.124939
-0.352341	division: Integer division	-0.124939
-0.726651	reductions involving division	-0.124939
-2.613915	is a reference	-0.124939
-1.294233	Use a reference	-0.124939
-0.899678	returns a reference	-0.124939
-0.287394	pointer or reference	-0.505150
-0.899829	to. A reference	-0.124939
-0.472077	a const reference	-0.124939
-0.548259	or const reference	-0.124939
-0.934407	A const reference	-0.124939
-1.764952	a constant reference	-0.124939
-0.863979	a relative reference	-0.124939
-0.835782	// Return reference	-0.124939
-0.659811	a null reference	-0.124939
-2.942723	of the source	-0.124939
-2.328756	in the source	-0.124939
-2.625566	if the source	-0.124939
-2.249265	use the source	-0.124939
-2.062509	make the source	-0.124939
-0.601509	code). The source	-0.124939
-1.596547	option for source	-0.124939
-1.744451	means that source	-0.124939
-1.488657	in different source	-0.124939
-1.693984	the same source	-0.301030
-0.599671	join all source	-0.124939
-1.657572	in one source	-0.124939
-0.891600	a useful source	-0.124939
-1.451868	a common source	-0.124939
-1.051963	in another source	-0.124939
-0.591321	steps. All source	-0.124939
-0.827813	a frequent source	-0.124939
-0.557518	Yeppp. Open source	-0.124939
-0.818051	a reliable source	-0.124939
-0.504801	Another open source	-0.124939
-0.358773	a valuable source	-0.124939
-2.453713	and the cost	-0.124939
-2.628221	if the cost	-0.124939
-2.163809	at the cost	-0.124939
-1.368841	But the cost	-0.124939
-0.600144	Avoiding the cost	-0.124939
-0.600144	Underestimating the cost	-0.124939
-0.597142	thread. The cost	-0.124939
-0.597142	60 The cost	-0.124939
-0.597142	defined. The cost	-0.124939
-0.597142	applications: The cost	-0.124939
-2.026688	does not cost	-0.124939
-0.601017	switching. This cost	-0.124939
-1.462172	is no cost	-0.425969
-1.365175	has no cost	-0.124939
-0.571647	virtually no cost	-0.124939
-1.067766	without any cost	-0.124939
-0.894894	no performance cost	-0.124939
-0.793200	This extra cost	-0.124939
-1.241153	an extra cost	-0.124939
-1.320822	no extra cost	-0.124939
-1.495417	a large cost	-0.124939
-0.744245	large overhead cost	-0.124939
-0.515415	high overhead cost	-0.124939
-2.102467	it is running	-0.124939
-1.589164	code is running	-0.124939
-0.599010	core is running	-0.124939
-0.601535	term for running	-0.124939
-2.197496	that are running	-0.124939
-1.695113	we are running	-0.124939
-0.597739	repagination are running	-0.124939
-1.565370	only when running	-0.124939
-0.866766	set when running	-0.124939
-0.583518	libraries when running	-0.124939
-0.583518	disappears when running	-0.124939
-0.597430	counters before running	-0.124939
-1.924477	operating system running	-0.124939
-1.179269	to avoid running	-0.124939
-0.594352	Two threads running	-0.124939
-0.594137	higher-priority thread running	-0.124939
-0.587820	Multiple applications running	-0.124939
-0.584990	background process running	-0.124939
-0.541048	other processes running	-0.124939
-0.726517	program starts running	-0.124939
-0.504957	resources. Consider running	-0.124939
-1.272676	language and automatic	-0.124939
-0.089726	OpenMP and automatic	-0.249877
-0.596239	intrinsics and automatic	-0.124939
-1.692657	function. The automatic	-0.124939
-1.073226	instructions. The automatic	-0.124939
-0.600022	features for automatic	-0.124939
-0.600022	intranet for automatic	-0.124939
-0.593545	code with automatic	-0.425969
-0.591442	(12.4e) with automatic	-0.124939
-0.591442	invoked with automatic	-0.124939
-1.040436	rely on automatic	-0.124939
-2.246595	is no automatic	-0.124939
-0.599363	Can do automatic	-0.124939
-1.429365	situations where automatic	-0.124939
-0.594479	compilers. Use automatic	-0.124939
-1.166635	program contains automatic	-0.124939
-0.865822	that supports automatic	-0.124939
-0.964928	to install automatic	-0.124939
-0.358830	vector intrinsics, automatic	-0.124939
-0.601886	shares the resources	-0.124939
-1.198015	data and resources	-0.124939
-1.199533	called and resources	-0.124939
-0.856603	use more resources	-0.124939
-1.437161	take more resources	-0.124939
-0.870034	much more resources	-0.425969
-1.233041	slightly more resources	-0.124939
-0.899453	more memory resources	-0.124939
-1.224057	and other resources	-0.124939
-1.195173	predict which resources	-0.124939
-0.599714	removed, all resources	-0.124939
-0.198185	Other system resources	-0.124939
-1.096063	are allocated resources	-0.124939
-0.562977	sure allocated resources	-0.124939
-1.328631	the shared resources	-0.124939
-0.440526	for network resources	-0.124939
-0.166649	on network resources	-0.124939
-0.541048	less computing resources	-0.124939
-0.659639	to reserve resources	-0.124939
-0.358816	thread steals resources	-0.124939
-3.075741	of the induction	-0.124939
-2.285566	use the induction	-0.124939
-2.086404	make the induction	-0.124939
-1.378965	method of induction	-0.124939
-0.601594	subexpressions, and induction	-0.124939
-0.894694	this with induction	-0.124939
-0.597842	polynomial with induction	-0.124939
-0.594393	by an induction	-0.124939
-1.167073	making an induction	-0.124939
-2.418877	to use induction	-0.124939
-1.291382	not make induction	-0.124939
-2.634777	the same induction	-0.124939
-2.294868	floating point induction	-0.124939
-1.786541	Floating point induction	-0.124939
-1.193780	and no induction	-0.124939
-1.577783	the two induction	-0.124939
-1.144238	of two induction	-0.124939
-1.279301	doesn't need induction	-0.124939
-1.032153	a second induction	-0.124939
-0.764226	an explicit induction	-0.124939
-0.065789	// Update induction	-0.425969
-1.283907	is the reason	-0.726999
-1.470782	code. The reason	-0.124939
-0.873468	1. The reason	-0.124939
-1.035473	cycles. The reason	-0.124939
-0.873468	well. The reason	-0.124939
-0.873468	do. The reason	-0.124939
-0.873468	4. The reason	-0.124939
-0.586996	point. The reason	-0.124939
-0.586996	occur. The reason	-0.124939
-0.586996	end. The reason	-0.124939
-0.586996	tested. The reason	-0.124939
-0.586996	directly. The reason	-0.124939
-2.643731	the same reason	-0.124939
-1.020540	is no reason	-0.823909
-1.024875	The main reason	-0.124939
-0.550765	compelling security reason	-0.124939
-2.601890	to the dispatcher	-0.124939
-2.582289	that the dispatcher	-0.124939
-2.136401	from the dispatcher	-0.124939
-1.808498	makes the dispatcher	-0.124939
-1.292145	calls the dispatcher	-0.124939
-1.369133	Therefore, the dispatcher	-0.124939
-1.296565	Make the dispatcher	-0.124939
-0.600148	dispatcher. The dispatcher	-0.124939
-0.600148	initialized. The dispatcher	-0.124939
-0.600472	conditions. A dispatcher	-0.124939
-0.600205	// make dispatcher	-0.124939
-1.278180	the CPU dispatcher	-0.124939
-0.464084	a CPU dispatcher	-0.124939
-0.415447	The CPU dispatcher	-0.346788
-0.753306	A CPU dispatcher	-0.124939
-0.753306	Intel CPU dispatcher	-0.124939
-2.330378	that is n	-0.124939
-1.295420	array of n	-0.124939
-0.900837	consequence of n	-0.124939
-1.299919	fact that n	-0.124939
-0.601278	x^0/0! // n	-0.124939
-1.650338	loop by n	-0.124939
-1.360640	calculated by n	-0.124939
-0.901014	check on n	-0.124939
-1.057492	than when n	-0.124939
-0.594709	serious when n	-0.124939
-0.599853	vector. If n	-0.124939
-0.598375	back, where n	-0.124939
-0.593666	u.i += n	-0.124939
-1.163742	// add n	-0.124939
-1.725984	for (int n	-0.124939
-1.488284	= 1; n	-0.124939
-1.227919	*= x; n	-0.124939
-1.316137	by adding n	-0.124939
-0.841503	least significant n	-0.124939
-0.557460	ex xn n	-0.124939
-0.557460	0 <= n	-0.124939
-0.463385	by subtracting n	-0.124939
-0.463385	b) >> n	-0.124939
-3.191667	of the string	-0.124939
-0.804179	time a string	-0.425969
-1.071726	produces a string	-0.124939
-0.599561	scans a string	-0.124939
-1.502919	implementations of string	-0.124939
-0.802491	memory and string	-0.124939
-0.599422	Memory and string	-0.124939
-1.692306	function. The string	-0.124939
-0.899050	applications. The string	-0.124939
-1.708977	functions for string	-0.124939
-0.601531	interpret that string	-0.124939
-2.275981	such as string	-0.124939
-2.418612	to use string	-0.124939
-0.600231	names from string	-0.124939
-1.820861	of each string	-0.124939
-0.594056	of common string	-0.124939
-0.981285	C style string	-0.124939
-0.129434	point constants, string	-0.425969
-0.463458	zero-terminated ASCII string	-0.124939
-0.463458	instructions SSE4.2 string	-0.124939
-1.849849	of the programmer	-0.903090
-2.408137	to the programmer	-0.124939
-1.528784	for the programmer	-0.726999
-1.949720	that the programmer	-0.124939
-2.418873	if the programmer	-0.124939
-2.144558	because the programmer	-0.124939
-1.773908	but the programmer	-0.124939
-1.061165	help the programmer	-0.124939
-0.595971	puts the programmer	-0.124939
-1.641076	example, a programmer	-0.124939
-0.598678	reasons. The programmer	-0.124939
-0.598678	features. The programmer	-0.124939
-0.598678	70). The programmer	-0.124939
-1.574148	the application programmer	-0.124939
-2.662180	for the three	-0.124939
-0.601544	way and three	-0.124939
-0.600004	r.b;} The three	-0.124939
-0.600004	increment. The three	-0.124939
-2.528596	may be three	-0.124939
-1.565659	There are three	-0.425969
-1.554446	two or three	-0.124939
-1.596474	implemented as three	-0.124939
-0.900125	data have three	-0.124939
-1.527879	This has three	-0.124939
-0.586175	example has three	-0.124939
-0.586175	for-loop has three	-0.124939
-1.196906	and all three	-0.124939
-1.286439	divided into three	-0.124939
-1.070195	other way three	-0.124939
-1.056923	be compiled three	-0.124939
-1.058851	addition every three	-0.124939
-1.564681	// Make three	-0.124939
-0.581846	well. Supports three	-0.124939
-0.981363	is approximately three	-0.124939
-0.504761	listing reveals three	-0.124939
-0.463403	12.4b executes three	-0.124939
-2.117476	set is better	-0.124939
-1.297197	version is better	-0.124939
-2.306446	to a better	-0.124939
-1.379281	be a better	-0.124939
-2.218152	with a better	-0.124939
-1.483199	get a better	-0.124939
-1.550247	new and better	-0.124939
-0.600538	better and better	-0.124939
-1.444079	waiting for better	-0.124939
-1.800643	may be better	-0.425969
-2.061101	will be better	-0.124939
-0.596138	actually be better	-0.124939
-1.711759	compilers are better	-0.124939
-0.601170	efficiently by better	-0.124939
-0.600350	usability A better	-0.124939
-2.529320	to make better	-0.124939
-0.575615	systems need better	-0.124939
-0.575615	applications need better	-0.124939
-0.589534	non-reduced expression better	-0.124939
-0.917396	are becoming better	-0.124939
-0.527143	processor performs better	-0.124939
-2.948482	of the keyword	-0.124939
-1.074752	using the keyword	-0.124939
-1.291593	add the keyword	-0.124939
-0.600156	Add the keyword	-0.124939
-1.060352	functions The keyword	-0.124939
-1.060352	value. The keyword	-0.124939
-0.890447	optimizations. The keyword	-0.124939
-0.595692	context. The keyword	-0.124939
-0.890447	80. The keyword	-0.124939
-0.641509	the static keyword	-0.124939
-0.569989	The static keyword	-0.124939
-1.142517	the const keyword	-0.124939
-0.580540	The const keyword	-0.124939
-0.370687	The register keyword	-0.124939
-0.592207	the inline keyword	-0.124939
-0.165173	The volatile keyword	-0.124939
-0.541179	the __fastcall keyword	-0.124939
-2.590455	is not efficient.	-0.124939
-1.397168	is more efficient.	-0.124939
-1.306757	and more efficient.	-0.124939
-1.272134	code more efficient.	-0.124939
-0.567544	calls more efficient.	-0.124939
-1.012691	caching more efficient.	-0.124939
-0.567544	comparisons more efficient.	-0.124939
-0.597930	caching very efficient.	-0.124939
-1.191438	is less efficient.	-0.124939
-0.914792	are less efficient.	-0.124939
-0.431131	it less efficient.	-0.124939
-0.431131	program less efficient.	-0.124939
-0.431131	pointers less efficient.	-0.124939
-0.048131	caching less efficient.	-0.124939
-0.690578	slightly less efficient.	-0.124939
-0.615744	is equally efficient.	-0.124939
-0.842251	are equally efficient.	-0.124939
-2.194737	with a lookup	-0.124939
-1.389185	use a lookup	-0.124939
-1.747934	using a lookup	-0.124939
-1.357444	also a lookup	-0.124939
-1.424391	uses a lookup	-0.124939
-1.631998	not use lookup	-0.124939
-1.152656	a table lookup	-0.124939
-0.462232	of table lookup	-0.124939
-0.172321	and table lookup	-0.124939
-0.462232	on table lookup	-0.124939
-0.462232	these table lookup	-0.124939
-0.802830	virtual table lookup	-0.124939
-0.462232	Unfortunately, table lookup	-0.124939
-0.657657	vectorized table lookup	-0.124939
-0.462232	Avoid table lookup	-0.124939
-0.462232	Vectorized table lookup	-0.124939
-0.196757	14.1 Use lookup	-0.425969
-1.127817	// Table lookup	-0.124939
-0.536344	132 Table lookup	-0.124939
-0.563050	slow GOT lookup	-0.124939
-2.674033	of the end	-0.124939
-1.621057	to the end	-0.726999
-2.044691	in the end	-0.124939
-1.901934	for the end	-0.124939
-1.959472	that the end	-0.425969
-2.098283	at the end	-0.124939
-1.960803	before the end	-0.124939
-1.560529	See the end	-0.124939
-0.596646	past the end	-0.124939
-0.601801	majority of end	-0.124939
-1.679174	point to end	-0.124939
-0.601159	compare with end	-0.124939
-0.900738	RAM than end	-0.124939
-0.598291	Surprisingly, we end	-0.124939
-0.596089	must always end	-0.124939
-0.358902	; mark end	-0.124939
-1.293731	but in applications	-0.124939
-1.198271	advantage in applications	-0.124939
-0.601486	advantageous for applications	-0.124939
-1.694667	will make applications	-0.124939
-1.072289	from other applications	-0.124939
-0.598566	However, such applications	-0.124939
-1.569090	for many applications	-0.124939
-0.372017	Many software applications	-0.124939
-0.597354	for critical applications	-0.124939
-0.544704	libraries Some applications	-0.124939
-0.544704	sequentially. Some applications	-0.124939
-0.544704	Alignment? Some applications	-0.124939
-0.594054	when several applications	-0.124939
-0.588996	on mathematical applications	-0.124939
-0.805661	small embedded applications	-0.124939
-0.540939	146 Multiple applications	-0.124939
-0.540939	some CPU-intensive applications	-0.124939
-0.504761	In multithreaded applications	-0.124939
-0.463403	for Unix applications	-0.124939
-0.463403	for WTL applications	-0.124939
-0.358744	the resource-hungry applications	-0.124939
-0.358744	disk. Memory-hungry applications	-0.124939
-0.597199	page 8 below.	-0.124939
-0.836178	as explained below.	-0.249877
-0.592848	page 128 below.	-0.124939
-0.589494	code, see below.	-0.124939
-0.726779	is described below.	-0.124939
-1.124719	as described below.	-0.124939
-1.138712	are given below.	-0.124939
-0.762423	is discussed below.	-0.124939
-0.409762	are discussed below.	-0.124939
-0.567087	are mentioned below.	-0.124939
-0.557538	the sections below.	-0.124939
-0.940131	example 13.1 below.	-0.124939
-0.541004	the guidelines below.	-0.124939
-0.527167	table 8.1 below.	-0.124939
-0.129434	page 146 below.	-0.124939
-0.504931	page 164 below.	-0.124939
-0.659582	are summarized below.	-0.124939
-0.463458	example 14.19 below.	-0.124939
-1.078623	expect the &&	-0.124939
-1.185697	a a &&	-0.124939
-2.157622	= a &&	-0.124939
-1.352862	replace a &&	-0.124939
-1.069872	expression a &&	-0.124939
-0.597380	b) a &&	-0.124939
-0.203180	true a &&	-0.124939
-0.902451	operand of &&	-0.124939
-1.761428	in an &&	-0.124939
-0.897186	expression b &&	-0.124939
-1.254286	Boolean operators &&	-0.124939
-0.588919	> 256 &&	-0.124939
-0.589127	> y &&	-0.124939
-0.583079	Don't change &&	-0.124939
-0.504931	a&&(b||c) !a &&	-0.124939
-0.463458	>= min &&	-0.124939
-0.463458	< ARRAYSIZE &&	-0.124939
-0.143351	= (a<b &&	-0.124939
-0.143351	!(a<b)=(a>=b) (a<b &&	-0.124939
-0.358787	&& b<c &&	-0.124939
-0.358787	!= INVALID_HANDLE_VALUE &&	-0.124939
-0.358787	write if(!a &&	-0.124939
-1.332291	using the |	-0.124939
-1.185814	a a |	-0.124939
-2.157970	= a |	-0.124939
-2.169369	with a |	-0.124939
-1.366410	- a |	-0.124939
-0.502354	a, a |	-0.124939
-0.597411	x-xxx--xx a |	-0.124939
-0.601632	<< and |	-0.124939
-1.075798	= A |	-0.124939
-0.295846	<< 4) |	-0.124939
-0.143379	| Wednesday |	-0.124939
-0.143368	& (Tuesday |	-0.124939
-0.143368	expression (Tuesday |	-0.124939
-0.659697	| (~a&c) |	-0.124939
-0.143368	= (a&b) |	-0.124939
-0.143368	0, (a&b) |	-0.124939
-0.358844	<xmmintrin.h> _mm_setcsr(_mm_getcsr() |	-0.124939
-0.358844	& 0x7FFFFF) |	-0.124939
-0.358844	& 0x0F) |	-0.124939
-1.147200	{ // Make	-0.425969
-1.422463	}; // Make	-0.124939
-0.856278	); // Make	-0.124939
-0.199205	_mm_set1_epi16(0); // Make	-0.425969
-0.578023	list; // Make	-0.124939
-0.578023	@gnu_indirect_function"); // Make	-0.124939
-0.578023	zero(0,0,0,0,0,0,0,0); // Make	-0.124939
-0.875158	described below. Make	-0.124939
-1.742642	instruction set. Make	-0.124939
-0.579191	object's class. Make	-0.124939
-1.189956	64-bit mode. Make	-0.124939
-1.113020	the object. Make	-0.124939
-0.855248	a variable. Make	-0.124939
-1.350673	function returns. Make	-0.124939
-0.527143	below. 126 Make	-0.124939
-0.527143	C++0x support. Make	-0.124939
-0.527143	and operators. Make	-0.124939
-0.463513	and executables. Make	-0.124939
-0.358830	following alternatives: Make	-0.124939
-0.600323	zero } We	-0.124939
-1.046681	be used. We	-0.124939
-1.053255	level-1 cache. We	-0.124939
-1.511345	to zero We	-0.124939
-0.587037	Intel compilers. We	-0.124939
-1.333363	the compiler. We	-0.124939
-0.577231	possible performance. We	-0.124939
-0.557402	point number. We	-0.124939
-1.135164	dependency chain. We	-0.124939
-0.540852	identifier names. We	-0.124939
-0.526932	of u.f We	-0.124939
-0.526932	this example. We	-0.124939
-0.526932	is optimized. We	-0.124939
-0.726216	= 28. We	-0.124939
-0.463330	to 15.1c. We	-0.124939
-0.659381	sign bit. We	-0.124939
-0.358686	some caveats. We	-0.124939
-0.358686	64, ...). We	-0.124939
-0.358686	(e.g. PowerPC). We	-0.124939
-0.358686	bit set). We	-0.124939
-0.358686	as 'this'. We	-0.124939
-0.358686	to 15.1c? We	-0.124939
-0.358686	be annoying. We	-0.124939
-3.076605	of the examples	-0.124939
-1.844299	but the examples	-0.124939
-1.595060	See the examples	-0.124939
-1.359714	set. The examples	-0.124939
-1.190348	version. The examples	-0.124939
-0.598576	documented. The examples	-0.124939
-0.895989	90 for examples	-0.124939
-0.598495	103 for examples	-0.124939
-0.895989	www.agner.org/optimize/cppexamples.zip for examples	-0.124939
-1.930911	The code examples	-0.124939
-0.597540	complete code examples	-0.124939
-0.600488	find more examples	-0.124939
-0.598348	seen many examples	-0.124939
-0.575051	In these examples	-0.124939
-1.104901	All these examples	-0.124939
-1.986765	The following examples	-0.124939
-0.594137	provided several examples	-0.124939
-0.592366	www.agner.org/optimize/cppexamples.zip contains examples	-0.124939
-1.416914	the above examples	-0.124939
-1.158258	The above examples	-0.124939
-0.504861	a[i] More examples	-0.124939
-0.463495	such contrived examples	-0.124939
-0.463495	construct obscure examples	-0.124939
-1.195831	except for char	-0.124939
-1.075017	(except for char	-0.124939
-0.203455	byte = char	-0.425969
-0.898488	have used char	-0.124939
-1.428856	Instruction set char	-0.124939
-0.598676	n; static char	-0.124939
-1.351668	8 8 char	-0.124939
-0.794756	int unsigned char	-0.124939
-0.794756	8 unsigned char	-0.124939
-0.544524	16 unsigned char	-0.124939
-0.544524	Vec32c unsigned char	-0.124939
-1.060947	8 16 char	-0.124939
-1.271560	128 SSE2 char	-0.124939
-1.061164	8 32 char	-0.124939
-0.884992	c:2; }; char	-0.124939
-1.047927	64 MMX char	-0.124939
-0.463422	in stdint.h char	-0.124939
-0.358758	Example 7.10b char	-0.124939
-0.358758	Example 7.31b char	-0.124939
-0.358758	Example 7.31a char	-0.124939
-0.358758	Example 8.17 char	-0.124939
-0.358758	Example 7.9b char	-0.124939
-2.911803	of the difference	-0.124939
-2.531147	for the difference	-0.124939
-1.839830	as the difference	-0.124939
-0.203668	Note the difference	-0.425969
-0.599804	illustrates the difference	-0.124939
-0.599804	Calculating the difference	-0.124939
-2.087531	be a difference	-0.124939
-1.572184	functions. The difference	-0.124939
-0.896271	80. The difference	-0.124939
-0.598637	FPGAs. The difference	-0.124939
-0.978791	is no difference	-0.522879
-0.190325	makes no difference	-0.124939
-0.537091	simply no difference	-0.124939
-0.888244	no big difference	-0.124939
-0.863979	a relative difference	-0.124939
-0.981935	// Time difference	-0.124939
-0.358902	a minimal difference	-0.124939
-1.928986	code in addition	-0.124939
-0.600628	(or in addition	-0.124939
-0.892424	do an addition	-0.124939
-0.596694	doing an addition	-0.124939
-1.733877	time than addition	-0.124939
-1.168312	floating point addition	-0.124939
-1.462192	Floating point addition	-0.124939
-1.046759	have one addition	-0.124939
-1.752827	only one addition	-0.124939
-0.878452	if each addition	-0.124939
-1.241873	where each addition	-0.124939
-1.932348	a new addition	-0.124939
-1.401291	most important addition	-0.124939
-0.592903	do another addition	-0.124939
-0.811133	the preceding addition	-0.124939
-0.573196	math allow addition	-0.124939
-3.028796	of the data.	-0.124939
-2.504323	on the data.	-0.124939
-0.600832	prefetch the data.	-0.124939
-0.900637	organizing the data.	-0.124939
-1.605393	list of data.	-0.124939
-1.064515	block of data.	-0.124939
-0.124387	addressing of data.	-0.301030
-0.893258	lots of data.	-0.124939
-0.597116	gigabytes of data.	-0.124939
-0.600142	odd-sized vector data.	-0.124939
-0.898565	most used data.	-0.124939
-1.069270	and static data.	-0.124939
-0.494347	of test data.	-0.124939
-0.596417	storing user data.	-0.124939
-0.596332	on these data.	-0.124939
-0.577505	as writing data.	-0.124939
-0.570370	or input data.	-0.124939
-0.726551	and read-only data.	-0.124939
-0.358830	double. Misaligned data.	-0.124939
-0.358830	contains writeable data.	-0.124939
-2.613118	it is too	-0.124939
-1.432408	problem is too	-0.124939
-1.715677	solution is too	-0.124939
-0.895652	container is too	-0.124939
-1.610942	count is too	-0.124939
-0.598325	granularity is too	-0.124939
-1.812095	to be too	-0.124939
-1.886039	not be too	-0.124939
-2.197757	that are too	-0.124939
-0.597750	C are too	-0.124939
-0.894511	c[i] are too	-0.124939
-1.202439	small or too	-0.124939
-0.600584	Execution time too	-0.124939
-1.541439	program has too	-0.124939
-1.967699	it takes too	-0.124939
-0.584206	program takes too	-0.124939
-0.590666	Basic was too	-0.124939
-1.128567	has become too	-0.124939
-0.579237	compilers unroll too	-0.124939
-0.557596	sampling generates too	-0.124939
-0.527214	improvements. Making too	-0.124939
-0.358830	without worrying too	-0.124939
-0.601760	described a mechanism	-0.124939
-1.073165	*.so). The mechanism	-0.124939
-0.899070	exceptions. The mechanism	-0.124939
-0.600982	route. This mechanism	-0.124939
-1.905071	Gnu compiler mechanism	-0.124939
-1.933390	the Intel mechanism	-0.124939
-1.780213	the Gnu mechanism	-0.124939
-0.365898	out-of-order execution mechanism	-0.124939
-0.879955	The dispatching mechanism	-0.124939
-0.686430	the dispatch mechanism	-0.124939
-0.492103	function dispatch mechanism	-0.124939
-0.307625	CPU dispatch mechanism	-0.249877
-1.139038	the out-of-order mechanism	-0.124939
-1.366286	CPU detection mechanism	-0.124939
-0.818133	the update mechanism	-0.124939
-0.470190	stack unwinding mechanism	-0.124939
-0.504841	The updating mechanism	-0.124939
-0.463476	The renaming mechanism	-0.124939
-2.051346	{ // Table	-0.124939
-0.591810	n! // Table	-0.124939
-0.591810	coefficients // Table	-0.124939
-0.591810	A2; // Table	-0.124939
-0.591810	FactorialTable[n]; // Table	-0.124939
-2.264468	n.a. - Table	-0.124939
-1.060902	or 16 Table	-0.124939
-0.595994	_mm_stream_si128 SSE2 Table	-0.124939
-0.588421	CodeGear Microsoft Table	-0.124939
-1.141816	4 AVX2 Table	-0.124939
-0.869425	compiler options Table	-0.124939
-0.997697	vector nontemporal Table	-0.124939
-0.835402	of overflow. Table	-0.124939
-0.998785	512 AVX512 Table	-0.124939
-0.527017	x86intrin.h (Gnu) Table	-0.124939
-0.527017	264-1 uint64_t Table	-0.124939
-0.504761	line. 132 Table	-0.124939
-0.463403	commercial license Table	-0.124939
-0.358744	F32vec8 F64vec4 Table	-0.124939
-0.358744	58.7 168.3 Table	-0.124939
-0.358744	/Qopt-report -opt-report Table	-0.124939
-0.358744	point multiply-and-add Table	-0.124939
-0.358744	38.1 97 Table	-0.124939
-2.492488	and the runtime	-0.124939
-2.259035	than the runtime	-0.124939
-0.600814	first the runtime	-0.124939
-1.592190	while the runtime	-0.124939
-2.302662	as a runtime	-0.124939
-1.295856	makes a runtime	-0.124939
-1.975926	lot of runtime	-0.124939
-1.741698	support for runtime	-0.124939
-0.901083	wasted on runtime	-0.124939
-1.631279	not use runtime	-0.124939
-0.600304	library. A runtime	-0.124939
-0.873311	than at runtime	-0.124939
-1.146474	done at runtime	-0.124939
-0.586915	transferred at runtime	-0.124939
-0.599685	including all runtime	-0.124939
-0.598589	why such runtime	-0.124939
-1.315846	a large runtime	-0.124939
-1.249804	very large runtime	-0.124939
-0.594435	on big runtime	-0.124939
-0.594278	common language runtime	-0.124939
-1.027480	or require runtime	-0.124939
-0.580479	pointer No runtime	-0.124939
-0.504821	of. Big runtime	-0.124939
-2.211825	code is needed	-0.124939
-1.713081	time is needed	-0.124939
-1.648954	pointer is needed	-0.124939
-0.599024	map is needed	-0.124939
-0.599024	polymorphism is needed	-0.124939
-2.442924	may be needed	-0.124939
-1.746203	would be needed	-0.124939
-2.245722	that are needed	-0.124939
-0.599607	lookups are needed	-0.124939
-1.363324	is not needed	-0.346788
-1.076741	longer than needed	-0.124939
-1.849493	of memory needed	-0.124939
-1.429301	Instruction set needed	-0.124939
-0.897485	final size needed	-0.124939
-0.594698	extra work needed	-0.124939
-1.228419	is actually needed	-0.124939
-1.229950	is rarely needed	-0.124939
-0.143390	Is searching needed	-0.124939
-2.330565	as a means	-0.124939
-1.871723	member function means	-0.124939
-0.378632	one by means	-0.425969
-0.578307	etc. This means	-0.124939
-0.856816	cases. This means	-0.124939
-0.578307	units. This means	-0.124939
-0.578307	ways. This means	-0.124939
-0.578307	execution. This means	-0.124939
-0.578307	cycle. This means	-0.124939
-0.578307	28. This means	-0.124939
-1.072139	by all means	-0.124939
-0.867157	const variable means	-0.124939
-1.043248	global variable means	-0.124939
-0.593028	Here, / means	-0.124939
-0.584953	to 10 means	-0.124939
-0.584063	non-member function, means	-0.124939
-0.584063	Aligned operands means	-0.124939
-0.584125	decomposition here means	-0.124939
-0.575428	and % means	-0.124939
-0.550679	other protection means	-0.124939
-0.527101	Metaprogramming Metaprogramming means	-0.124939
-0.358801	scalar (Scalar means	-0.124939
-2.821030	in the last	-0.124939
-2.579773	that the last	-0.124939
-2.334918	with the last	-0.124939
-2.156534	at the last	-0.124939
-1.289748	add the last	-0.124939
-1.487237	after the last	-0.124939
-0.599780	leave the last	-0.124939
-0.600551	x, and last	-0.124939
-0.600551	values, and last	-0.124939
-0.902092	negative. The last	-0.124939
-1.566424	same as last	-0.124939
-1.065473	way as last	-0.124939
-0.600838	way than last	-0.124939
-0.108101	at 0, last	-0.602060
-0.575487	often true last	-0.124939
-0.570478	objects come last	-0.124939
-0.165159	at 8, last	-0.425969
-0.541146	at 16, last	-0.124939
-0.358844	at 12, last	-0.124939
-0.358844	at 400, last	-0.124939
-0.599758	are one byte	-0.124939
-1.065689	the first byte	-0.425969
-0.260825	bytes. first byte	-0.823909
-1.404296	unused bytes byte	-0.124939
-0.518828	at 1 byte	-0.124939
-0.518828	16 1 byte	-0.124939
-0.518828	32 1 byte	-0.124939
-0.081537	0, last byte	-0.602060
-0.128873	8, last byte	-0.425969
-0.312941	16, last byte	-0.124939
-0.312941	12, last byte	-0.124939
-0.312941	400, last byte	-0.124939
-1.210483	cycles per byte	-0.124939
-0.550770	at 15 byte	-0.124939
-2.287739	than the parts	-0.124939
-1.078138	optimize the parts	-0.124939
-0.600500	possible when parts	-0.124939
-1.664185	and make parts	-0.124939
-1.488557	the different parts	-0.124939
-1.344761	in different parts	-0.124939
-0.720019	between different parts	-0.425969
-1.059326	to other parts	-0.124939
-0.592019	affects other parts	-0.124939
-0.898565	most used parts	-0.124939
-1.684457	the critical parts	-0.124939
-0.530249	in critical parts	-0.124939
-0.530249	or critical parts	-0.124939
-1.408120	most critical parts	-0.124939
-0.530249	less critical parts	-0.124939
-0.594282	system- specific parts	-0.124939
-0.882155	that certain parts	-0.124939
-0.835631	most time-consuming parts	-0.124939
-1.082423	time consuming parts	-0.124939
-0.541070	brand. Critical parts	-0.124939
-0.726551	other nearby parts	-0.124939
-2.174833	= a ||	-0.124939
-1.194554	a, a ||	-0.124939
-1.361285	replace a ||	-0.124939
-0.203476	false, a ||	-0.124939
-0.902482	operand of ||	-0.124939
-0.601607	&& and ||	-0.124939
-1.761691	in an ||	-0.124939
-0.890429	< 0 ||	-0.124939
-0.051811	c (a&&b) ||	-0.124939
-0.051811	--xx----- (a&&b) ||	-0.124939
-0.051811	x--xx---- (a&&b) ||	-0.124939
-0.051811	75 (a&&b) ||	-0.124939
-0.051811	a&&b (a&&b) ||	-0.124939
-0.541111	== Wednesday ||	-0.124939
-0.659639	|| (a&&c) ||	-0.124939
-0.463495	= !(a ||	-0.124939
-0.659639	|| (!a&&c) ||	-0.124939
-0.463495	== Tuesday ||	-0.124939
-0.358816	a&&(b||c) (a&&!b) ||	-0.124939
-0.358816	#if defined(__unix__) ||	-0.124939
-0.358816	equivalent if(!(a ||	-0.124939
-1.641801	return a >	-0.124939
-1.075686	= x >	-0.124939
-1.266441	= b >	-0.425969
-0.598849	StringLength; i >	-0.124939
-0.597771	* 2 >	-0.124939
-0.594225	* c >	-0.124939
-0.573166	&& list[i] >	-0.124939
-0.706265	if (a >	-0.124939
-0.415478	MAX(a,b) (a >	-0.124939
-1.009599	if (u.i >	-0.124939
-0.527017	// u.f >	-0.124939
-0.764257	if (n >	-0.124939
-0.463403	= (bb[i] >	-0.124939
-0.463403	when bb[i] >	-0.124939
-0.065782	= select(b >	-0.425969
-0.358744	expression -a >	-0.124939
-0.358744	if (absvalue >	-0.124939
-0.358744	if (SIZE >	-0.124939
-0.358744	<, <=, >	-0.124939
-0.358744	// abs(u.f) >	-0.124939
-0.902261	number and types	-0.124939
-0.716120	of different types	-0.249877
-0.551310	using different types	-0.124939
-0.697438	two different types	-0.425969
-0.551310	CPUs, different types	-0.124939
-0.599886	reduce other types	-0.124939
-0.199893	different integer types	-0.124939
-0.581323	defining integer types	-0.124939
-1.706804	the two types	-0.124939
-0.598121	reduce some types	-0.124939
-0.200172	Function return types	-0.124939
-0.596368	convert these types	-0.124939
-0.571053	of simple types	-0.124939
-1.091506	for simple types	-0.124939
-0.805723	have mixed types	-0.124939
-0.726618	the specified types	-0.124939
-1.845655	calculation of expressions	-0.124939
-1.548835	types of expressions	-0.124939
-1.627119	floating point expressions	-0.124939
-0.563784	are integer expressions	-0.124939
-0.481782	on integer expressions	-0.124939
-0.563784	other integer expressions	-0.124939
-0.563784	reducing integer expressions	-0.124939
-0.598756	for float expressions	-0.124939
-1.284493	between two expressions	-0.124939
-0.598612	but such expressions	-0.124939
-0.594806	skip large expressions	-0.124939
-0.549052	often write expressions	-0.124939
-0.549052	programmers write expressions	-0.124939
-0.590393	cases. Integer expressions	-0.124939
-0.692064	simple algebraic expressions	-0.124939
-0.483854	various algebraic expressions	-0.124939
-1.297050	the simplest expressions	-0.124939
-0.358830	references accept expressions	-0.124939
-0.358830	inlining. Reducible expressions	-0.124939
-2.196848	that is difficult	-0.124939
-2.581059	it is difficult	-0.124939
-1.622251	It is difficult	-0.425969
-2.065794	which is difficult	-0.124939
-1.074707	long and difficult	-0.124939
-0.600564	bulky and difficult	-0.124939
-2.680970	can be difficult	-0.124939
-1.819733	may be difficult	-0.425969
-1.071895	and are difficult	-0.124939
-2.246013	that are difficult	-0.124939
-2.649966	the code difficult	-0.124939
-2.094690	is more difficult	-0.124939
-1.456932	and more difficult	-0.124939
-0.598567	otherwise. In difficult	-0.124939
-1.280351	are very difficult	-0.124939
-1.531386	and therefore difficult	-0.124939
-1.424383	is quite difficult	-0.124939
-0.835653	is slow, difficult	-0.124939
-1.049960	the instruction set.	-0.124939
-0.465196	same instruction set.	-0.124939
-0.465196	each instruction set.	-0.124939
-0.173079	available instruction set.	-0.425969
-0.465196	necessary instruction set.	-0.124939
-0.662315	specific instruction set.	-0.124939
-1.231741	AVX instruction set.	-0.124939
-0.991968	supported instruction set.	-0.124939
-1.231741	later instruction set.	-0.124939
-0.514892	higher instruction set.	-0.124939
-0.662315	desired instruction set.	-0.124939
-0.465196	given instruction set.	-0.124939
-0.465196	current instruction set.	-0.124939
-0.465196	lower instruction set.	-0.124939
-0.754619	newest instruction set.	-0.124939
-0.465196	selected instruction set.	-0.124939
-0.465196	specified instruction set.	-0.124939
-0.465196	FMA4 instruction set.	-0.124939
-0.465196	corresponding instruction set.	-0.124939
-1.365254	in each set.	-0.124939
-1.076550	inline function instead	-0.124939
-0.900975	built-in code instead	-0.124939
-2.192389	short int instead	-0.124939
-1.073360	test data instead	-0.124939
-0.598837	has i instead	-0.124939
-0.598647	were float instead	-0.124939
-1.973163	the object instead	-0.124939
-1.066393	lookup table instead	-0.124939
-1.694576	in registers instead	-0.124939
-0.890625	handling system instead	-0.124939
-0.865671	using references instead	-0.124939
-1.287376	monitor counters instead	-0.124939
-0.841552	with templates instead	-0.124939
-0.562835	use #if instead	-0.124939
-0.415500	using rounding instead	-0.124939
-0.415500	Use rounding instead	-0.124939
-0.540895	Use macros instead	-0.124939
-0.764202	file format instead	-0.124939
-0.726283	option -fpie instead	-0.124939
-0.504721	or typedef instead	-0.124939
-0.463367	(or int) instead	-0.124939
-0.358715	and |) instead	-0.124939
-1.843826	based on compilers.	-0.124939
-1.758277	for different compilers.	-0.124939
-1.387756	with different compilers.	-0.124939
-0.870635	seven different compilers.	-0.124939
-1.381307	in other compilers.	-0.124939
-1.439299	with other compilers.	-0.124939
-1.573107	with all compilers.	-0.124939
-1.593280	on all compilers.	-0.124939
-1.292583	and Intel compilers.	-0.124939
-0.847268	as C++ compilers.	-0.124939
-1.319114	Intel C++ compilers.	-0.124939
-0.573253	modern C++ compilers.	-0.124939
-1.868614	in some compilers.	-0.124939
-0.366793	and Gnu compilers.	-0.124939
-0.876392	to Microsoft compilers.	-0.124939
-0.291234	and PathScale compilers.	-0.124939
-0.567057	compatible across compilers.	-0.124939
-0.557640	and Clang compilers.	-0.124939
-0.527143	the commercial compilers.	-0.124939
-2.117571	which is transferred	-0.124939
-0.379289	m is transferred	-0.124939
-0.599757	ownership is transferred	-0.124939
-1.412391	to be transferred	-0.726999
-1.682057	would be transferred	-0.124939
-0.255986	parameters are transferred	-0.602060
-0.585110	r are transferred	-0.124939
-0.901613	copied or transferred	-0.124939
-1.627554	and then transferred	-0.124939
-1.272183	are always transferred	-0.124939
-2.332451	that is longer	-0.124939
-2.427513	to a longer	-0.124939
-1.640284	cost of longer	-0.124939
-2.388172	should be longer	-0.124939
-0.600365	method. A longer	-0.124939
-1.918165	is no longer	-0.124939
-0.823666	are no longer	-0.124939
-1.191829	that takes longer	-0.124939
-0.956692	It takes longer	-0.124939
-0.993811	integer takes longer	-0.124939
-0.817720	multiplication takes longer	-0.124939
-1.290640	to take longer	-0.124939
-0.542947	would take longer	-0.124939
-0.542947	division take longer	-0.124939
-0.542947	Divisions take longer	-0.124939
-1.715035	by making longer	-0.124939
-0.331516	takes much longer	-0.602060
-1.449122	the matrix longer	-0.124939
-0.587447	one byte longer	-0.124939
-0.089889	before and after	-0.249877
-0.601160	member or after	-0.124939
-2.632812	the same after	-0.124939
-1.253786	but only after	-0.124939
-0.592191	things only after	-0.124939
-1.842286	an object after	-0.124939
-1.783578	the array after	-0.124939
-1.496489	is accessed after	-0.124939
-0.885128	the check after	-0.124939
-1.203345	clock cycles after	-0.425969
-0.875189	searching needed after	-0.124939
-0.579071	then output after	-0.124939
-0.566985	necessary destructors after	-0.124939
-1.047997	context switches after	-0.124939
-0.504801	be removed after	-0.124939
-0.463440	execute _mm_empty() after	-0.124939
-0.358773	to resume after	-0.124939
-0.358773	remain locked after	-0.124939
-1.190376	used to read	-0.124939
-2.048238	want to read	-0.124939
-1.068833	cycles to read	-0.124939
-0.598583	Trying to read	-0.124939
-0.598583	WritePrivateProfileString to read	-0.124939
-0.601594	buffer and read	-0.124939
-2.341692	to be read	-0.124939
-2.678693	can be read	-0.124939
-1.847634	must be read	-0.124939
-2.230948	you can read	-0.124939
-1.674301	Do not read	-0.124939
-2.055783	you may read	-0.124939
-2.013572	If you read	-0.124939
-1.588674	code will read	-0.124939
-0.592210	need only read	-0.124939
-0.592210	would only read	-0.124939
-0.599773	above, but read	-0.124939
-0.895448	when we read	-0.124939
-0.567021	splitting 256-bit read	-0.124939
-0.557507	program had read	-0.124939
-0.527101	an uncached read	-0.124939
-0.463476	to 99 read	-0.124939
-1.714489	code to give	-0.124939
-2.243907	possible to give	-0.124939
-0.599373	processor to give	-0.124939
-0.599373	appropriate to give	-0.124939
-1.494435	overflow and give	-0.124939
-0.600513	underflow and give	-0.124939
-0.901804	table can give	-0.124939
-1.502543	class or give	-0.124939
-2.025397	does not give	-0.124939
-0.600797	(^) may give	-0.124939
-0.594256	// will give	-0.124939
-0.887621	file will give	-0.124939
-1.071912	dispatcher should give	-0.124939
-1.816777	operating systems give	-0.124939
-0.594504	instruction doesn't give	-0.124939
-0.883404	this would give	-0.124939
-0.579134	They sometimes give	-0.124939
-1.138611	can still give	-0.124939
-0.841788	subsequent counts give	-0.124939
-0.434882	negative inputs give	-0.124939
-0.434882	Higher inputs give	-0.124939
-1.182976	of code. Each	-0.124939
-0.596626	installation time. Each	-0.124939
-0.578923	extra resources. Each	-0.124939
-0.855221	64 bytes. Each	-0.124939
-1.396738	the stack. Each	-0.124939
-0.575310	polymorphic classes. Each	-0.124939
-0.846800	point instructions. Each	-0.124939
-1.251537	execution units. Each	-0.124939
-0.557510	jobs simultaneously. Each	-0.124939
-1.145055	multiple threads. Each	-0.124939
-0.550510	processor cores. Each	-0.124939
-0.764202	at initialization. Each	-0.124939
-0.526974	handling information. Each	-0.124939
-1.065427	the diagonal. Each	-0.124939
-0.902613	following reasons: Each	-0.124939
-0.835223	linked list. Each	-0.124939
-0.358715	functions (methods) Each	-0.124939
-0.358715	of 64. Each	-0.124939
-0.358715	are unacceptable. Each	-0.124939
-0.358715	and Z. Each	-0.124939
-0.358715	line 29. Each	-0.124939
-2.273746	that it becomes	-0.124939
-2.180755	then it becomes	-0.124939
-1.488087	the code becomes	-0.124939
-0.994971	The code becomes	-0.301030
-1.111655	machine code becomes	-0.124939
-2.004461	a loop becomes	-0.124939
-1.184855	write instructions becomes	-0.124939
-0.887890	between threads becomes	-0.124939
-1.354246	The calculation becomes	-0.124939
-1.249031	stamp counter becomes	-0.124939
-1.268360	memory space becomes	-0.124939
-0.932085	heap space becomes	-0.124939
-0.543441	and caching becomes	-0.124939
-0.543441	that caching becomes	-0.124939
-0.589012	space never becomes	-0.124939
-0.358859	menu click becomes	-0.124939
-1.678331	pointer is aligned	-0.124939
-2.342556	to be aligned	-0.124939
-2.250339	should be aligned	-0.124939
-1.847981	must be aligned	-0.124939
-1.039420	data are aligned	-0.124939
-0.895721	arrays are aligned	-0.124939
-0.596680	#pragma vector aligned	-0.124939
-2.529320	to make aligned	-0.124939
-1.707171	to store aligned	-0.124939
-1.322584	is typically aligned	-0.124939
-0.590040	are preferably aligned	-0.124939
-0.587827	Make three aligned	-0.124939
-1.558184	to load aligned	-0.124939
-0.858496	of S1 aligned	-0.124939
-0.366798	memcpy 16kB aligned	-0.425969
-0.463513	are properly aligned	-0.124939
-0.601620	keywords and directives	-0.124939
-1.564667	of these directives	-0.124939
-1.104960	that these directives	-0.124939
-1.780362	the Gnu directives	-0.124939
-0.594129	Fortran. These directives	-0.124939
-0.876287	of #include directives	-0.124939
-1.152271	the Microsoft directives	-0.124939
-0.519688	runtime. #define directives	-0.124939
-0.519688	name. #define directives	-0.124939
-0.570433	18.2. Compiler directives	-0.124939
-0.169317	8.6 Optimization directives	-0.124939
-0.143396	the OpenMP directives	-0.425969
-0.358937	by OpenMP directives	-0.124939
-0.434969	have #if directives	-0.124939
-0.434969	compiled. #if directives	-0.124939
-0.090142	directives Preprocessing directives	-0.124939
-0.042737	7.32 Preprocessing directives	-0.124939
-0.463513	has preprocessing directives	-0.124939
-1.072645	language that requires	-0.124939
-0.599870	formalism that requires	-0.124939
-2.055059	because it requires	-0.124939
-1.914233	but it requires	-0.124939
-0.595965	importantly, it requires	-0.124939
-1.258887	program. This requires	-0.124939
-0.885739	block. This requires	-0.124939
-0.593298	option. This requires	-0.124939
-0.600579	system, this requires	-0.124939
-0.600411	library. It requires	-0.124939
-0.596898	hardware often requires	-0.124939
-0.369422	two pointers requires	-0.124939
-1.099807	This method requires	-0.124939
-0.595573	such processors requires	-0.124939
-0.593542	(single precision requires	-0.124939
-0.593201	This calculation requires	-0.124939
-0.585793	intrinsic vectors requires	-0.124939
-0.584917	databases usually requires	-0.124939
-0.463495	Event-based sampling requires	-0.124939
-1.446258	doing the optimizations	-0.124939
-1.496507	kind of optimizations	-0.124939
-1.200859	Comparison of optimizations	-0.124939
-2.619340	the compiler optimizations	-0.124939
-0.883170	of other optimizations	-0.124939
-0.591986	various other optimizations	-0.124939
-1.387404	of which optimizations	-0.124939
-1.163385	and which optimizations	-0.124939
-0.591257	off all optimizations	-0.124939
-0.591257	prevents all optimizations	-0.124939
-2.273927	to do optimizations	-0.124939
-1.031204	for such optimizations	-0.124939
-1.031204	do such optimizations	-0.124939
-0.582458	from making optimizations	-0.124939
-1.056136	CPU- specific optimizations	-0.124939
-1.165388	from doing optimizations	-0.124939
-1.371712	can improve optimizations	-0.124939
-0.847264	will enable optimizations	-0.124939
-0.527182	do interprocedural optimizations	-0.124939
-0.358801	do cross-module optimizations	-0.124939
-3.195064	of the graphics	-0.124939
-1.461896	to a graphics	-0.301030
-2.169128	with a graphics	-0.124939
-1.385439	on a graphics	-0.124939
-1.907569	have a graphics	-0.124939
-1.845655	calculation of graphics	-0.124939
-1.548835	types of graphics	-0.124939
-0.601218	coprocessor or graphics	-0.124939
-2.246595	is no graphics	-0.124939
-0.888702	the large graphics	-0.124939
-1.692910	a specific graphics	-0.124939
-0.586588	resources. Each graphics	-0.124939
-0.615545	the heavy graphics	-0.124939
-0.434912	a heavy graphics	-0.124939
-0.504971	not cover graphics	-0.124939
-0.463513	a third-party graphics	-0.124939
-0.463513	unit. Various graphics	-0.124939
-0.358830	than rendering graphics	-0.124939
-2.363556	to a public	-0.124939
-1.292475	access a public	-0.124939
-1.074028	whenever a public	-0.124939
-0.601786	overriding of public	-0.124939
-0.941520	functions and public	-0.124939
-0.600668	can't have public	-0.124939
-1.747169	for all public	-0.124939
-0.598222	avoiding any public	-0.124939
-0.141626	D : public	-0.124939
-0.141626	C1 : public	-0.124939
-0.141626	CChild1 : public	-0.425969
-0.353148	CChild2 : public	-0.124939
-0.353148	CParent : public	-0.124939
-0.353148	C2 : public	-0.124939
-0.591464	relocation. All public	-0.124939
-0.659754	to override public	-0.124939
-0.358873	public B1, public	-0.124939
-0.522923	C1 { public:	-0.124939
-0.522923	powN { public:	-0.124939
-0.115399	CHello { public:	-0.301030
-0.187097	C0 { public:	-0.425969
-0.757037	CParent<CChild1> { public:	-0.124939
-0.187097	CGrandParent { public:	-0.425969
-0.522923	B2 { public:	-0.124939
-0.522923	B1 { public:	-0.124939
-0.522923	powN<true,0> { public:	-0.124939
-0.522923	powN<true,N> { public:	-0.124939
-0.522923	S2 { public:	-0.124939
-0.522923	S3 { public:	-0.124939
-0.522923	CParent<CChild2> { public:	-0.124939
-0.522923	powN<true,1> { public:	-0.124939
-1.034736	int x; public:	-0.124939
-0.527334	vector 56 public:	-0.124939
-0.358959	T a[N]; public:	-0.124939
-3.134271	of the framework	-0.124939
-1.202084	load the framework	-0.124939
-0.901289	such a framework	-0.124939
-1.200976	code. This framework	-0.124939
-1.531461	a software framework	-0.124939
-1.542489	an extra framework	-0.124939
-0.866489	the runtime framework	-0.124939
-0.179309	large runtime framework	-0.124939
-0.530531	specific graphics framework	-0.124939
-0.530531	third-party graphics framework	-0.124939
-0.940588	user interface framework	-0.124939
-0.328991	high level framework	-0.124939
-0.550653	a complex framework	-0.124939
-0.274634	the .NET framework	-0.124939
-0.081883	The .NET framework	-0.124939
-0.182876	Microsoft's .NET framework	-0.124939
-1.925041	necessary to look	-0.124939
-1.738358	needs to look	-0.124939
-2.022612	compiler can look	-0.124939
-1.296391	should not look	-0.124939
-1.936200	You may look	-0.124939
-0.591743	implementation may look	-0.124939
-0.591743	setup may look	-0.124939
-1.961572	if you look	-0.124939
-1.887786	If you look	-0.124939
-0.591378	When you look	-0.124939
-0.899616	systems. A look	-0.124939
-1.197375	It will look	-0.124939
-1.197913	Intrinsic functions look	-0.124939
-1.724417	you should look	-0.124939
-1.193720	may also look	-0.124939
-0.893214	to first look	-0.124939
-0.589868	may typically look	-0.124939
-0.314669	code. Let's look	-0.124939
-0.314669	rows. Let's look	-0.124939
-0.463476	example, let's look	-0.124939
-0.463476	address. (3) look	-0.124939
-0.936106	of static linking	-0.124939
-0.518057	that static linking	-0.124939
-0.518057	if static linking	-0.124939
-0.518057	when static linking	-0.124939
-0.344638	using static linking	-0.124939
-0.518057	requires static linking	-0.124939
-0.518057	specify static linking	-0.124939
-1.214518	of dynamic linking	-0.124939
-0.509971	if dynamic linking	-0.124939
-0.509971	than dynamic linking	-0.124939
-0.509971	where dynamic linking	-0.124939
-0.509971	while dynamic linking	-0.124939
-0.384550	used. Dynamic linking	-0.124939
-0.384550	user. Dynamic linking	-0.124939
-0.151034	3.6 Dynamic linking	-0.425969
-0.575630	Function level linking	-0.124939
-0.567199	or easy linking	-0.124939
-0.358967	code Static linking	-0.124939
-0.358967	are: Static linking	-0.124939
-1.712893	the code. Many	-0.124939
-0.592842	speed-critical functions. Many	-0.124939
-0.591004	to program. Many	-0.124939
-1.037501	discussed below. Many	-0.124939
-0.872316	newer processors. Many	-0.124939
-0.867502	a compiler. Many	-0.124939
-1.287376	monitor counters Many	-0.124939
-0.846714	Automatic updates Many	-0.124939
-1.375456	CPU dispatching. Many	-0.124939
-0.835530	64-bit integers. Many	-0.124939
-0.557441	inefficient solution. Many	-0.124939
-0.788321	other microprocessors. Many	-0.124939
-0.764007	Other databases Many	-0.124939
-0.526974	relevant options. Many	-0.124939
-0.884331	user input. Many	-0.124939
-0.463367	Manual". developer.intel.com. Many	-0.124939
-0.463367	be slower. Many	-0.124939
-0.358715	or more. Many	-0.124939
-0.358715	system breakdown. Many	-0.124939
-0.358715	Background services. Many	-0.124939
-0.358715	processors properly. Many	-0.124939
-0.600142	scientific vector processors.	-0.124939
-1.073093	on different processors.	-0.124939
-0.837475	with Intel processors.	-0.124939
-0.647196	on Intel processors.	-0.124939
-0.568015	later Intel processors.	-0.124939
-1.188459	on some processors.	-0.124939
-0.592730	only known processors.	-0.124939
-0.586588	cover graphics processors.	-0.124939
-0.692500	and VIA processors.	-0.124939
-0.691961	of logical processors.	-0.124939
-0.483790	eight logical processors.	-0.124939
-0.751443	on future processors.	-0.124939
-0.463562	than future processors.	-0.124939
-0.463562	on newer processors.	-0.124939
-0.463562	most newer processors.	-0.124939
-0.562912	in PC processors.	-0.124939
-1.162361	the newest processors.	-0.124939
-0.726551	on contemporary processors.	-0.124939
-2.237172	that is actually	-0.124939
-2.451115	This is actually	-0.124939
-1.713006	time is actually	-0.124939
-1.996732	program is actually	-0.124939
-0.897025	16 is actually	-0.124939
-1.299852	goes to actually	-0.124939
-1.676630	code for actually	-0.124939
-1.830718	functions are actually	-0.124939
-1.674681	compilers are actually	-0.124939
-1.060988	CPUs are actually	-0.124939
-0.595911	consumption are actually	-0.124939
-2.102796	This can actually	-0.124939
-1.442689	than it actually	-0.124939
-1.893847	compiler may actually	-0.124939
-0.596261	set may actually	-0.124939
-0.599162	original pointer actually	-0.124939
-1.861961	the user actually	-0.124939
-0.527143	your modifications actually	-0.124939
-0.504881	case F2 actually	-0.124939
-0.463513	"position-independent code" actually	-0.124939
-0.358830	and temp++ actually	-0.124939
-0.037592	microarchitecture of Intel,	-1.079181
-0.601585	breakdowns for Intel,	-0.124939
-1.464115	with an Intel,	-0.124939
-1.183118	not an Intel,	-0.124939
-0.600307	microprocessors from Intel,	-0.124939
-0.599423	Supported compilers Intel,	-0.124939
-0.590805	supports both Intel,	-0.124939
-0.876502	Intel, Microsoft Intel,	-0.124939
-0.118935	the Microsoft, Intel,	-0.425969
-0.283264	by Microsoft, Intel,	-0.124939
-0.283264	platforms. Microsoft, Intel,	-0.124939
-1.065646	Gnu, Clang, Intel,	-0.124939
-1.902597	of a linked	-0.124939
-2.474092	in a linked	-0.124939
-1.364854	as a linked	-0.602060
-1.989813	use a linked	-0.124939
-1.847495	through a linked	-0.124939
-2.187481	can be linked	-0.124939
-2.196019	should be linked	-0.124939
-1.061741	cases be linked	-0.124939
-0.601483	modules are linked	-0.124939
-2.081553	faster than linked	-0.124939
-0.600575	containers use linked	-0.124939
-0.887958	block. A linked	-0.124939
-0.594428	lists. A linked	-0.124939
-1.074279	are then linked	-0.124939
-1.538349	library functions linked	-0.124939
-0.582041	of dynamically linked	-0.124939
-0.358873	DLL's (dynamically linked	-0.124939
-0.901572	xn = x;	-0.124939
-1.961073	const int x;	-0.124939
-0.202185	Table[100]; int x;	-0.124939
-1.593863	i; } x;	-0.124939
-1.257562	i; float x;	-0.124939
-0.574602	j; float x;	-0.124939
-0.574602	7.27 float x;	-0.124939
-1.290205	x * x;	-0.124939
-1.948343	{ return x;	-0.124939
-0.562972	matrix[FuncRow(i)][FuncCol(i)] += x;	-0.124939
-0.562972	matrix[i][j] += x;	-0.124939
-0.188077	x *= x;	-0.124939
-0.188077	y *= x;	-0.124939
-0.083936	factorial *= x;	-0.124939
-0.188077	xn *= x;	-0.124939
-0.957582	{ C1 x;	-0.124939
-0.129462	}; Bitfield x;	-0.425969
-0.463549	qword ptr x;	-0.124939
-1.374902	brands of microprocessors	-0.124939
-0.600993	family of microprocessors	-0.124939
-1.678682	compilers and microprocessors	-0.124939
-1.192931	of Intel microprocessors	-0.124939
-1.133669	that some microprocessors	-0.124939
-1.133669	on some microprocessors	-0.124939
-1.479013	the way microprocessors	-0.124939
-0.456413	with old microprocessors	-0.124939
-0.466635	of modern microprocessors	-0.425969
-0.395270	The modern microprocessors	-0.124939
-0.627456	all modern microprocessors	-0.124939
-0.570420	All newer microprocessors	-0.124939
-0.283251	operators Modern microprocessors	-0.124939
-0.283251	chains Modern microprocessors	-0.124939
-0.283251	prediction. Modern microprocessors	-0.124939
-0.283251	mechanisms. Modern microprocessors	-0.124939
-0.527165	with older microprocessors	-0.124939
-0.527295	possible. Smaller microprocessors	-0.124939
-0.358844	operations Today's microprocessors	-0.124939
-1.676064	time to load	-0.124939
-0.593917	cache to load	-0.124939
-1.880184	takes to load	-0.124939
-1.889701	need to load	-0.124939
-1.817150	necessary to load	-0.124939
-0.389944	Function to load	-0.425969
-1.679872	needs to load	-0.124939
-1.055195	writes to load	-0.124939
-0.902154	it. The load	-0.124939
-1.874333	may not load	-0.124939
-1.589145	code will load	-0.124939
-0.886290	Dispatch at load	-0.124939
-0.593579	relocation at load	-0.124939
-0.713295	the work load	-0.425969
-1.693717	a specific load	-0.124939
-0.999411	the actual load	-0.124939
-0.358887	of it) load	-0.124939
-1.980675	way to control	-0.124939
-0.901015	options to control	-0.124939
-0.792511	the loop control	-0.380211
-1.194625	The loop control	-0.124939
-0.518889	efficient loop control	-0.124939
-0.518889	i<20 loop control	-0.124939
-0.581542	other cache control	-0.124939
-0.199938	Explicit cache control	-0.124939
-1.429746	Instruction set control	-0.124939
-0.598480	a version control	-0.124939
-0.557774	9.2. Cache control	-0.124939
-1.293102	compiler to assume	-0.124939
-1.847460	has to assume	-0.124939
-0.599401	permissible to assume	-0.124939
-0.902261	problem and assume	-0.124939
-1.513731	you can assume	-0.425969
-1.223835	You can assume	-0.425969
-1.076749	overflow or assume	-0.124939
-0.600812	can you assume	-0.124939
-1.135377	If we assume	-0.124939
-0.583828	which we assume	-0.124939
-1.485993	you cannot assume	-0.124939
-1.451969	You cannot assume	-0.124939
-0.883503	compiler would assume	-0.124939
-0.159872	can generally assume	-0.425969
-0.358859	compiler makers assume	-0.124939
-0.358859	can safely assume	-0.124939
-0.446246	size = 100;	-0.221849
-0.586792	ARRAYSIZE = 100;	-0.124939
-0.586792	NUMCOLUMNS = 100;	-0.124939
-0.165006	x < 100;	-0.425969
-0.557948	i < 100;	-0.865301
-2.675890	that the numbers	-0.124939
-2.000314	all the numbers	-0.124939
-0.901389	hold the numbers	-0.124939
-1.168354	floating point numbers	-0.271067
-1.462295	Floating point numbers	-0.124939
-0.820568	with four numbers	-0.124939
-0.558842	have four numbers	-0.124939
-0.592489	have eight numbers	-0.124939
-1.139292	and model numbers	-0.124939
-0.999553	processor model numbers	-0.124939
-0.557674	of thousand numbers	-0.124939
-0.463586	generating denormal numbers	-0.124939
-0.463586	use hexadecimal numbers	-0.124939
-0.358887	data (low numbers	-0.124939
-2.100136	on a platform	-0.124939
-1.599693	choice of platform	-0.124939
-1.843624	a different platform	-0.124939
-1.384097	64 bit platform	-0.124939
-0.561371	16 bit platform	-0.124939
-1.329051	32 bit platform	-0.124939
-0.594676	_LP64 Windows platform	-0.124939
-0.593601	_WIN32 Linux platform	-0.124939
-0.903907	the hardware platform	-0.124939
-0.077052	of hardware platform	-0.124939
-0.888179	the optimal platform	-0.124939
-1.152271	the Microsoft platform	-0.124939
-0.585076	__linux__ x86 platform	-0.124939
-1.129062	standard PC platform	-0.124939
-0.527214	_M_IX86 x86-64 platform	-0.124939
-0.504881	of efficiency, platform	-0.124939
-2.301620	code is later	-0.124939
-1.453413	function and later	-0.124939
-0.595246	SSE2 and later	-0.124939
-0.595246	AVX and later	-0.124939
-0.889567	SSE and later	-0.124939
-0.595246	Core and later	-0.124939
-0.595246	pipeline and later	-0.124939
-0.889567	0x2710 and later	-0.124939
-0.601605	helpful for later	-0.124939
-0.261940	SSE2 or later	-0.726999
-0.200366	AVX or later	-0.124939
-0.583608	2.20 or later	-0.124939
-0.583608	Pentium-II or later	-0.124939
-2.421006	to use later	-0.124939
-0.595591	v.10.3 & later	-0.124939
-1.901445	clock cycles later	-0.124939
-2.651209	the code together	-0.124939
-0.671530	are used together	-0.602060
-1.394639	the objects together	-0.124939
-0.868546	many objects together	-0.124939
-0.843943	be stored together	-0.124939
-1.453876	are stored together	-0.124939
-1.046451	also stored together	-0.124939
-0.516109	always stored together	-0.124939
-1.017575	be linked together	-0.124939
-0.530602	then linked together	-0.124939
-1.334964	to keep together	-0.124939
-0.788823	software project together	-0.124939
-0.902903	be joined together	-0.124939
-2.195159	then the dispatch	-0.124939
-2.034188	where the dispatch	-0.124939
-1.623079	making the dispatch	-0.124939
-1.073529	uses the dispatch	-0.124939
-1.073529	implement the dispatch	-0.124939
-0.899314	bypassing the dispatch	-0.124939
-0.601828	called, a dispatch	-0.124939
-0.601815	sense to dispatch	-0.124939
-0.852251	virtual function dispatch	-0.124939
-1.579879	the CPU dispatch	-0.124939
-0.898952	The CPU dispatch	-0.425969
-0.788437	A CPU dispatch	-0.124939
-0.540961	Automatic CPU dispatch	-0.124939
-0.540961	similar CPU dispatch	-0.124939
-0.191193	13.1 CPU dispatch	-0.124939
-0.540961	inappropriate CPU dispatch	-0.124939
-0.588010	on runtime dispatch	-0.124939
-2.726290	to the calling	-0.124939
-1.444555	which the calling	-0.124939
-1.078386	template is calling	-0.124939
-1.077862	up and calling	-0.124939
-1.077655	object. The calling	-0.124939
-1.934806	useful for calling	-0.124939
-1.793690	the function calling	-0.124939
-0.884556	make function calling	-0.124939
-1.167810	same function calling	-0.124939
-0.595136	file by calling	-0.124939
-1.503774	avoided by calling	-0.124939
-0.595136	prevented by calling	-0.124939
-1.373928	fast as calling	-0.124939
-1.945651	more than calling	-0.124939
-0.579894	array before calling	-0.124939
-1.140753	_mm256_zeroupper() before calling	-0.124939
-1.056252	any specific calling	-0.124939
-1.497946	the standard calling	-0.124939
-1.283122	manual 5: calling	-0.124939
-0.601748	lifetime of your	-0.124939
-0.601751	answers to your	-0.124939
-0.600609	point in your	-0.124939
-0.600609	switch in your	-0.124939
-0.898985	necessary for your	-0.124939
-1.292858	manual for your	-0.124939
-0.601538	remember that your	-0.124939
-1.479652	check if your	-0.124939
-0.598325	hand, if your	-0.124939
-1.032499	you make your	-0.124939
-0.871418	must make your	-0.124939
-0.585935	better, make your	-0.124939
-0.599883	CriticalFunction. If your	-0.124939
-0.597421	years before your	-0.124939
-0.593472	counters inside your	-0.124939
-0.880707	may write your	-0.124939
-0.590513	frame unless your	-0.124939
-1.089227	to define your	-0.124939
-0.463476	don't send your	-0.124939
-0.358801	code. Inserting your	-0.124939
-0.683849	to its own	-0.124939
-0.569302	in its own	-0.124939
-0.642218	on its own	-0.124939
-0.569302	have its own	-0.124939
-0.197368	has its own	-0.124939
-0.403858	thread its own	-0.124939
-0.403858	get its own	-0.124939
-0.403858	handle its own	-0.124939
-0.543446	at their own	-0.124939
-0.543446	program their own	-0.124939
-0.281546	make your own	-0.425969
-0.387837	write your own	-0.124939
-0.387837	define your own	-0.124939
-0.387837	Inserting your own	-0.124939
-0.744218	on my own	-0.124939
-0.515400	For my own	-0.124939
-0.570564	Asmlib My own	-0.124939
-0.901422	class is declared	-0.124939
-1.647715	should be declared	-0.124939
-1.792422	must be declared	-0.124939
-1.118400	preferably be declared	-0.124939
-1.581479	that are declared	-0.124939
-1.389656	they are declared	-0.124939
-0.600986	was not declared	-0.124939
-0.525754	and objects declared	-0.301030
-0.818636	any objects declared	-0.124939
-0.895127	for variables declared	-0.124939
-0.855317	A macro declared	-0.124939
-0.491926	class Variables declared	-0.124939
-0.491926	inefficient. Variables declared	-0.124939
-2.887929	in the XMM	-0.124939
-2.262091	use the XMM	-0.124939
-1.510806	when the XMM	-0.602060
-0.601730	underflow in XMM	-0.124939
-0.601571	Windows). The XMM	-0.124939
-1.742287	support for XMM	-0.124939
-1.479796	check if XMM	-0.124939
-0.598346	costly if XMM	-0.124939
-1.876034	Floating point XMM	-0.124939
-0.885011	implementation uses XMM	-0.124939
-0.590421	- Integer XMM	-0.124939
-0.589067	76 Boolean XMM	-0.124939
-0.351295	a 128-bit XMM	-0.124939
-0.287597	to 128-bit XMM	-0.124939
-0.120415	The 128-bit XMM	-0.124939
-0.541163	stack versus XMM	-0.124939
-2.791273	of the second	-0.124939
-2.520985	to the second	-0.124939
-2.366959	and the second	-0.124939
-2.283695	in the second	-0.124939
-2.385368	on the second	-0.124939
-1.532101	then the second	-0.425969
-2.130253	at the second	-0.124939
-1.642739	whether the second	-0.124939
-0.598415	last the second	-0.124939
-2.147733	by a second	-0.124939
-1.884889	through a second	-0.124939
-0.600373	Installing a second	-0.124939
-1.555681	code. The second	-0.124939
-0.894368	functions. The second	-0.425969
-0.597193	137). The second	-0.124939
-0.591937	incremented every second	-0.124939
-0.358945	price, compatibility, second	-0.124939
-0.230681	following example shows	-0.467361
-0.499604	next example shows	-0.124939
-0.586174	The table shows	-0.124939
-0.764481	Example 12.4b shows	-0.124939
-0.463604	page 16) shows	-0.124939
-0.463604	page 39 shows	-0.124939
-0.463604	page 58 shows	-0.124939
-0.358902	(page 77) shows	-0.124939
-0.358902	(page 131) shows	-0.124939
-1.299401	language and interface	-0.124939
-0.630718	the user interface	-0.124939
-0.252818	of user interface	-0.301030
-0.577562	The user interface	-0.124939
-0.365347	A user interface	-0.124939
-0.365347	possible user interface	-0.124939
-0.365347	standard user interface	-0.124939
-0.365347	including user interface	-0.124939
-0.331999	graphical user interface	-0.124939
-0.365347	popular user interface	-0.124939
-0.435093	Several graphical interface	-0.124939
-0.435093	system-specific graphical interface	-0.124939
-0.241885	a well-defined interface	-0.425969
-2.225806	possible to improve	-0.124939
-1.582629	order to improve	-0.124939
-2.049674	want to improve	-0.124939
-0.598647	methods to improve	-0.124939
-2.017874	This can improve	-0.124939
-1.204957	You can improve	-0.124939
-0.875637	table can improve	-0.124939
-0.875637	systems can improve	-0.124939
-0.875637	statement can improve	-0.124939
-0.588116	64) can improve	-0.124939
-0.601012	did not improve	-0.124939
-1.325207	that may improve	-0.124939
-1.524510	This may improve	-0.124939
-1.311644	you may improve	-0.124939
-0.865658	this may improve	-0.124939
-1.370203	not only improve	-0.124939
-1.203735	can possibly improve	-0.124939
-0.601886	ignoring the higher	-0.124939
-1.203032	functions is higher	-0.124939
-2.493283	is a higher	-0.124939
-1.391170	for a higher	-0.124939
-1.614863	with a higher	-0.124939
-1.846587	at a higher	-0.124939
-2.525973	to be higher	-0.124939
-0.601440	costs are higher	-0.124939
-0.901379	SSE or higher	-0.124939
-0.600335	microarchitecture. A higher	-0.124939
-0.600166	size has higher	-0.124939
-1.067593	or any higher	-0.124939
-0.564760	are much higher	-0.124939
-0.564760	A much higher	-0.124939
-1.669119	the next higher	-0.124939
-1.159264	to give higher	-0.124939
-1.496514	is usually higher	-0.124939
-0.358816	and hence higher	-0.124939
-2.025899	program is bigger	-0.124939
-1.373542	matrix is bigger	-0.124939
-1.370246	parameter is bigger	-0.124939
-1.940981	advantage of bigger	-0.124939
-1.596687	code. The bigger	-0.124939
-2.530307	may be bigger	-0.124939
-2.298001	that are bigger	-0.124939
-0.901083	used on bigger	-0.124939
-1.075925	treated as bigger	-0.124939
-1.582784	innermost loop bigger	-0.124939
-1.248725	the new bigger	-0.124939
-1.170138	a new bigger	-0.425969
-0.594675	for arrays bigger	-0.124939
-1.320455	that allows bigger	-0.124939
-1.406488	code becomes bigger	-0.124939
-0.863561	have become bigger	-0.124939
-0.577431	total offset bigger	-0.124939
-0.541004	2. Objects bigger	-0.124939
-0.541004	the ever bigger	-0.124939
-2.298013	use the vectors	-0.124939
-0.601520	wrapping the vectors	-0.124939
-1.378094	time in vectors	-0.124939
-1.708977	functions for vectors	-0.124939
-1.190937	operations on vectors	-0.124939
-1.287847	calculations on vectors	-0.124939
-0.898001	bit integer vectors	-0.124939
-2.085537	by using vectors	-0.124939
-0.790062	and double vectors	-0.124939
-1.069224	bit float vectors	-0.124939
-1.423986	the 64-bit vectors	-0.124939
-0.589079	of intrinsic vectors	-0.124939
-1.225617	128-bit XMM vectors	-0.124939
-0.585893	The bigger vectors	-0.124939
-1.532661	// Define vectors	-0.124939
-0.828023	256-bit YMM vectors	-0.124939
-0.726450	set (128 vectors	-0.124939
-0.107166	or 3-dimensional vectors	-0.124939
-2.298921	{ // Floating	-0.124939
-1.487837	and double Floating	-0.124939
-2.362728	- n.a. Floating	-0.124939
-1.477160	point variables Floating	-0.124939
-1.460979	64-bit systems. Floating	-0.124939
-1.236815	point division Floating	-0.124939
-1.118606	and shift Floating	-0.124939
-1.515946	clock cycles. Floating	-0.124939
-0.851391	multiple purposes. Floating	-0.124939
-1.077952	point expressions. Floating	-0.124939
-0.557439	integer parameters. Floating	-0.124939
-0.557559	32-bit integer. Floating	-0.124939
-0.249741	0; 14.6 Floating	-0.124939
-0.249741	137 14.6 Floating	-0.124939
-0.659525	page 105. Floating	-0.124939
-0.143342	29 7.3 Floating	-0.124939
-0.143342	31 7.3 Floating	-0.124939
-0.358758	is organized. Floating	-0.124939
-0.358758	clock cycles). Floating	-0.124939
-0.358758	programmer. 79 Floating	-0.124939
-2.563926	and the AVX2	-0.124939
-0.601571	somewhat. The AVX2	-0.124939
-1.444197	one for AVX2	-0.124939
-0.901688	SSE4.1 // AVX2	-0.124939
-1.730027	only when AVX2	-0.124939
-0.582450	double 2 AVX2	-0.124939
-0.582450	int64_t 2 AVX2	-0.124939
-0.797441	int 4 AVX2	-0.124939
-0.546031	double 4 AVX2	-0.124939
-0.797441	float 4 AVX2	-0.124939
-0.546031	int64_t 4 AVX2	-0.124939
-0.195792	int 8 AVX2	-0.124939
-0.561931	float 8 AVX2	-0.124939
-0.652426	4 256 AVX2	-0.124939
-0.652426	8 256 AVX2	-0.124939
-0.652426	16 256 AVX2	-0.124939
-0.458889	32 256 AVX2	-0.124939
-0.871256	double vectors AVX2	-0.124939
-0.463549	e.g. AVX, AVX2	-0.124939
-1.501229	after the piece	-0.124939
-1.077707	insert the piece	-0.124939
-2.350709	of a piece	-0.124939
-1.846853	if a piece	-0.124939
-1.370265	make a piece	-0.425969
-1.763099	If a piece	-0.124939
-0.889617	how a piece	-0.124939
-0.595271	optimize a piece	-0.124939
-1.274489	generate a piece	-0.124939
-0.595271	optimizes a piece	-0.124939
-0.595271	studying a piece	-0.124939
-0.601218	piece by piece	-0.124939
-1.949516	the same piece	-0.425969
-1.282818	a critical piece	-0.124939
-0.594743	background calculations piece	-0.124939
-0.752439	a small piece	-0.124939
-1.793715	a particular piece	-0.124939
-1.372422	that is divisible	-0.602060
-2.082919	which is divisible	-0.124939
-0.937867	count is divisible	-0.425969
-2.428589	to be divisible	-0.124939
-1.880446	must be divisible	-0.124939
-1.894410	is not divisible	-0.425969
-0.897608	Array size divisible	-0.124939
-0.140801	an address divisible	-0.823909
-0.176757	to addresses divisible	-0.425969
-0.479792	have addresses divisible	-0.124939
-0.685522	memory addresses divisible	-0.124939
-1.939039	use of <<	-0.124939
-0.587927	+= n <<	-0.124939
-0.573282	<< list[i] <<	-0.124939
-0.003064	{ cout <<	-0.425969
-0.025136	array cout <<	-0.124939
-0.025136	f cout <<	-0.124939
-0.562963	with j <<	-0.124939
-0.659754	& 3) <<	-0.124939
-0.358873	calculated asa <<	-0.124939
-0.358873	calculated as(a <<	-0.124939
-0.358873	| (C <<	-0.124939
-0.358873	| (B <<	-0.124939
-1.461671	i; } Here,	-0.124939
-1.264502	... } Here,	-0.124939
-0.989639	1.; } Here,	-0.124939
-0.570117	sum; } Here,	-0.124939
-0.570117	list[j].c; } Here,	-0.124939
-0.570117	&list[8]); } Here,	-0.124939
-0.594246	+ i; Here,	-0.124939
-0.592948	| b; Here,	-0.124939
-1.375974	{ ... Here,	-0.124939
-1.041198	+ c; Here,	-0.124939
-0.872581	+= x; Here,	-0.124939
-0.659639	S1 ArrayOfStructures[100]; Here,	-0.124939
-0.463495	+ 3.5; Here,	-0.124939
-0.463495	from testing. Here,	-0.124939
-0.358816	my blog. Here,	-0.124939
-0.358816	of sets). Here,	-0.124939
-0.358816	i++) List[i]++; Here,	-0.124939
-0.358816	int c1::*MemberPointer; Here,	-0.124939
-0.358816	& 1]; Here,	-0.124939
-2.947196	of the x86	-0.124939
-2.622744	to the x86	-0.124939
-2.099258	in the x86	-0.301030
-2.467106	on the x86	-0.124939
-1.445535	bits in x86	-0.124939
-0.601581	versions. The x86	-0.124939
-1.202667	guide for x86	-0.124939
-0.896072	with all x86	-0.425969
-0.120893	Supports all x86	-0.301030
-1.064861	32- bit x86	-0.124939
-0.591464	storage. All x86	-0.124939
-0.581994	library. Supports x86	-0.124939
-0.581994	All modern x86	-0.124939
-0.659754	__unix__ __linux__ x86	-0.124939
-1.077624	are: The process	-0.124939
-0.378437	log on process	-0.124939
-0.600335	cores. A process	-0.124939
-1.946861	for each process	-0.124939
-0.590753	the allocation process	-0.124939
-1.042331	a complicated process	-0.124939
-0.788747	the development process	-0.124939
-1.100637	software development process	-0.124939
-0.587866	GOT lookup process	-0.124939
-0.096463	the installation process	-0.124939
-0.627443	The installation process	-0.124939
-0.562812	a background process	-0.124939
-0.557530	The update process	-0.124939
-0.788591	6 Development process	-0.124939
-0.358816	a learning process	-0.124939
-0.358816	this delaying process	-0.124939
-2.368726	is the binary	-0.124939
-1.864588	as the binary	-0.124939
-1.441968	off the binary	-0.124939
-2.616307	in a binary	-0.124939
-1.986529	that a binary	-0.124939
-2.037727	be a binary	-0.124939
-1.862915	or a binary	-0.124939
-1.678360	disadvantage of binary	-0.124939
-1.203258	compiled to binary	-0.124939
-2.120970	stored in binary	-0.124939
-0.600623	1-bit in binary	-0.124939
-1.076063	distributed as binary	-0.124939
-1.496205	then use binary	-0.124939
-0.594398	case. A binary	-0.124939
-0.594398	moved. A binary	-0.124939
-1.465025	of its binary	-0.124939
-1.017889	to produce binary	-0.124939
-0.463531	a biased binary	-0.124939
-0.463531	search facilities, binary	-0.124939
-1.417932	and to know	-0.124939
-2.175076	order to know	-0.124939
-1.563513	useful to know	-0.124939
-2.015191	want to know	-0.124939
-0.597050	dispatcher to know	-0.124939
-0.900077	programmer to know	-0.124939
-2.045299	do not know	-0.124939
-1.946531	If you know	-0.124939
-1.061509	sure you know	-0.124939
-1.683124	that we know	-0.124939
-0.588827	compiler cannot know	-0.124939
-0.795233	compiler doesn't know	-0.124939
-0.539961	microprocessor doesn't know	-0.124939
-0.883553	who would know	-0.124939
-0.877517	I don't know	-0.124939
-0.358887	(Microsoft, Intel) know	-0.124939
-0.856006	cache is 512	-0.124939
-2.710915	in a 512	-0.124939
-2.085690	for a 512	-0.124939
-0.902261	system, and 512	-0.124939
-1.077716	instructions. The 512	-0.124939
-0.598622	soon also 512	-0.124939
-0.199459	64 8 512	-0.425969
-0.198203	32 16 512	-0.425969
-1.449217	the matrix 512	-0.124939
-0.149400	a 512 512	-0.124939
-0.378980	The 512 512	-0.124939
-0.378980	38.7 512 512	-0.124939
-0.378980	80.9 512 512	-0.124939
-0.358859	2040 38.7 512	-0.124939
-0.358859	13.6 80.9 512	-0.124939
-1.187553	CPU to generate	-0.124939
-1.066697	system to generate	-0.124939
-2.032320	want to generate	-0.124939
-1.961620	likely to generate	-0.124939
-2.002025	able to generate	-0.124939
-0.597859	expression to generate	-0.124939
-0.680094	0 and generate	-0.425969
-1.077796	Applications that generate	-0.124939
-0.975202	it will generate	-0.124939
-1.381991	This will generate	-0.124939
-1.054003	which will generate	-0.124939
-0.559394	condition will generate	-0.124939
-0.559394	127 will generate	-0.124939
-0.559394	linker will generate	-0.124939
-0.559394	c+b will generate	-0.124939
-1.593930	it doesn't generate	-0.124939
-1.042813	can automatically generate	-0.124939
-2.247519	of the advantages	-0.301030
-0.600861	weigh the advantages	-0.124939
-0.497790	used. The advantages	-0.425969
-0.589862	user. The advantages	-0.124939
-0.879028	pointers. The advantages	-0.124939
-0.879028	style. The advantages	-0.124939
-0.589862	operands. The advantages	-0.124939
-0.879028	1.0f;} The advantages	-0.124939
-0.589862	dynamically. The advantages	-0.124939
-0.589862	bits). The advantages	-0.124939
-0.600215	type has advantages	-0.124939
-0.898854	also other advantages	-0.124939
-1.425886	has many advantages	-0.124939
-0.594382	are specific advantages	-0.124939
-0.887579	have several advantages	-0.124939
-1.677392	the above advantages	-0.124939
-1.873258	a and r	-0.124939
-0.900052	p and r	-0.124939
-1.072665	variable that r	-0.124939
-0.599877	0 that r	-0.124939
-0.891743	r = r	-0.124939
-1.689157	a[i] = r	-0.124939
-0.891743	edx = r	-0.124939
-0.901311	to by r	-0.124939
-1.076485	r) { r	-0.124939
-0.899976	example when r	-0.124939
-0.598443	sequence, where r	-0.124939
-1.640305	= 0; r	-0.425969
-1.057794	a ; r	-0.124939
-0.592458	clear whether r	-0.124939
-0.790245	= 1; r	-0.425969
-0.588520	add what r	-0.124939
-0.504971	that lies r	-0.124939
-2.471317	it is usually	-0.124939
-1.667060	function is usually	-0.124939
-2.332232	This is usually	-0.124939
-1.812494	compiler is usually	-0.124939
-2.509339	It is usually	-0.124939
-0.888932	order is usually	-0.124939
-1.583089	count is usually	-0.124939
-0.594923	area is usually	-0.124939
-0.594923	any, is usually	-0.124939
-0.594923	loop-branch is usually	-0.124939
-1.863012	functions are usually	-0.124939
-0.597814	cases are usually	-0.124939
-1.280201	constants are usually	-0.124939
-0.600445	Compilers will usually	-0.124939
-1.185151	logical processors usually	-0.124939
-1.494638	point calculations usually	-0.124939
-1.133530	an integer, usually	-0.124939
-0.527270	remote databases usually	-0.124939
-2.728172	if the results	-0.124939
-0.601538	OR the results	-0.124939
-1.068842	compilers. The results	-0.124939
-0.896170	optimizations. The results	-0.124939
-0.598586	sizes. The results	-0.124939
-0.900964	do. This results	-0.124939
-0.596069	print out results	-0.124939
-0.596027	produce 32 results	-0.124939
-1.394465	the four results	-0.124939
-0.685430	and intermediate results	-0.124939
-0.685430	store intermediate results	-0.124939
-0.479734	All intermediate results	-0.124939
-0.479734	storing intermediate results	-0.124939
-0.818216	The measured results	-0.124939
-0.557553	get reliable results	-0.124939
-0.557596	the thousand results	-0.124939
-0.463513	give inconsistent results	-0.124939
-0.463513	give misleading results	-0.124939
-0.358830	My experimental results	-0.124939
-1.879184	a and b,	-0.124939
-1.767204	{ int b,	-0.124939
-0.282050	if a, b,	-0.124939
-0.191214	int a, b,	-0.602060
-0.282050	vector a, b,	-0.124939
-0.265924	float a, b,	-0.726999
-0.223136	bool a, b,	-0.425969
-0.282050	Vec16s a, b,	-0.124939
-0.282050	Vec8s a, b,	-0.124939
-0.835955	i, a[100], b,	-0.124939
-3.134271	of the storage	-0.124939
-2.066609	where the storage	-0.124939
-1.631939	type of storage	-0.124939
-1.076011	bytes of storage	-0.124939
-1.540758	time. The storage	-0.124939
-0.600086	stored. The storage	-0.124939
-1.740411	replaced by storage	-0.124939
-0.593407	about data storage	-0.124939
-0.593407	binary data storage	-0.124939
-0.896501	or static storage	-0.124939
-0.587373	of variable storage	-0.124939
-0.550696	constants. Register storage	-0.124939
-0.090147	thread. Thread-local storage	-0.124939
-0.090147	block. Thread-local storage	-0.124939
-0.090147	times. Thread-local storage	-0.124939
-0.835653	big endian storage	-0.124939
-0.143372	make thread-local storage	-0.124939
-0.143372	(See thread-local storage	-0.124939
-2.309572	is the old	-0.124939
-2.877435	of the old	-0.124939
-2.578786	to the old	-0.124939
-1.941288	in the old	-0.124939
-2.432260	on the old	-0.124939
-0.601592	string. The old	-0.124939
-1.502780	compiled for old	-0.124939
-0.992065	compatible with old	-0.425969
-1.320293	compatibility with old	-0.124939
-1.048189	incompatible with old	-0.124939
-0.601116	crash on old	-0.124939
-0.900797	Use an old	-0.124939
-0.894756	some very old	-0.124939
-0.527228	six years old	-0.124939
-0.358887	a plain old	-0.124939
-1.288525	compiler to reduce	-0.124939
-2.206760	possible to reduce	-0.124939
-2.001905	able to reduce	-0.124939
-0.597852	lengths to reduce	-0.124939
-0.597852	capability to reduce	-0.124939
-0.902362	propagation and reduce	-0.124939
-1.915739	that can reduce	-0.124939
-2.002303	you can reduce	-0.124939
-0.676790	compilers can reduce	-0.124939
-0.588112	switches can reduce	-0.124939
-0.588112	seen can reduce	-0.124939
-1.250016	compiler may reduce	-0.425969
-1.839885	compilers will reduce	-0.124939
-0.894127	compilers cannot reduce	-0.124939
-0.586650	can actually reduce	-0.124939
-0.599480	branch that goes	-0.124939
-0.601189	reset or goes	-0.124939
-1.743368	when it goes	-0.124939
-2.054974	because it goes	-0.124939
-1.061130	whenever it goes	-0.124939
-0.601226	polymorphic function goes	-0.124939
-1.970304	The code goes	-0.124939
-2.255783	the time goes	-0.124939
-1.815369	a vector goes	-0.124939
-0.595928	that always goes	-0.124939
-0.579108	The output goes	-0.124939
-1.503778	clock frequency goes	-0.124939
-0.970644	a DLL goes	-0.124939
-0.788553	software project goes	-0.124939
-0.726484	example 9.5a goes	-0.124939
-0.358801	call p->f() goes	-0.124939
-0.358801	than 1% goes	-0.124939
-1.989713	into a union	-0.124939
-0.601020	Using a union	-0.124939
-1.201963	variable. The union	-0.124939
-1.639922	structure or union	-0.124939
-0.900351	y) { union	-0.124939
-1.039574	space. A union	-0.124939
-0.588452	example. A union	-0.124939
-0.588452	Unions A union	-0.124939
-0.504821	Example 14.28 union	-0.124939
-0.659582	Example 14.23b union	-0.124939
-0.659582	Example 14.26 union	-0.124939
-0.659582	Example 14.27 union	-0.124939
-0.463458	Example 14.23 union	-0.124939
-0.358787	Example 7.40b union	-0.124939
-0.358787	100 doubles: union	-0.124939
-0.358787	Example 7.39 union	-0.124939
-0.358787	Example 14.29 union	-0.124939
-0.358787	Example 14.24 union	-0.124939
-0.358787	Example 14.25 union	-0.124939
-1.597959	a = 0,	-0.124939
-1.610867	b = 0,	-0.124939
-1.039907	0 = 0,	-0.124939
-0.581979	sum1 = 0,	-0.124939
-0.581979	s3 = 0,	-0.124939
-0.581979	s2 = 0,	-0.124939
-0.581979	s0 = 0,	-0.124939
-0.581979	s1 = 0,	-0.124939
-0.830348	byte at 0,	-0.602060
-0.189539	select(b > 0,	-0.425969
-0.143394	{ memset(a, 0,	-0.124939
-0.143394	zero memset(a, 0,	-0.124939
-0.143394	overflow: _controlfp_s(&dummy, 0,	-0.124939
-0.143394	_fpreset(); _controlfp_s(&dummy, 0,	-0.124939
-0.358930	list[100]; memset(list, 0,	-0.124939
-0.889442	function is called.	-0.176091
-0.595623	CriticalInnerFunction is called.	-0.124939
-2.531952	to be called.	-0.124939
-2.156701	that are called.	-0.124939
-1.691047	objects are called.	-0.124939
-0.891045	destructors are called.	-0.124939
-0.595996	constructors are called.	-0.124939
-0.590848	alloca was called.	-0.124939
-0.596240	is never called.	-0.124939
-0.946271	are never called.	-0.124939
-1.444163	power of 10	-0.425969
-0.899350	penalty of 10	-0.124939
-0.601744	20 to 10	-0.124939
-0.601582	jobs and 10	-0.124939
-1.078124	recognize that 10	-0.124939
-0.600986	(3 - 10	-0.124939
-0.598414	experiment where 10	-0.124939
-2.212103	the value 10	-0.124939
-0.200484	something takes 10	-0.124939
-0.597062	still take 10	-0.124939
-1.153962	operating systems. 10	-0.124939
-0.577551	detected until 10	-0.124939
-0.570307	See chapter 10	-0.124939
-0.570345	for Windows. 10	-0.124939
-0.916963	is executed 10	-0.124939
-0.463458	compiler .................................................................................................... 10	-0.124939
-0.463458	.............................................................................................. 99 10	-0.124939
-1.074291	system is based	-0.124939
-0.600751	manual is based	-0.124939
-2.389106	should be based	-0.124939
-2.016486	that are based	-0.124939
-1.237830	methods are based	-0.124939
-0.588663	framework are based	-0.124939
-0.588663	Java are based	-0.124939
-0.588663	Fortran are based	-0.124939
-0.497062	schemes are based	-0.425969
-0.588663	recommendations are based	-0.124939
-0.599974	unknown CPU based	-0.124939
-1.069028	or C++ based	-0.124939
-0.594378	A language based	-0.124939
-1.557389	CPU dispatcher based	-0.124939
-0.872687	level framework based	-0.124939
-0.866001	will go based	-0.124939
-0.567084	be chosen based	-0.124939
-2.234138	is to choose	-0.124939
-0.897831	compilers to choose	-0.124939
-0.599423	done to choose	-0.124939
-0.599423	mask to choose	-0.124939
-0.898030	system and choose	-0.124939
-1.071613	operations and choose	-0.124939
-1.071613	processors, and choose	-0.124939
-1.176902	we may choose	-0.124939
-0.854443	You may choose	-0.124939
-0.574234	developer may choose	-0.124939
-0.900627	Whether you choose	-0.124939
-1.810288	compiler will choose	-0.124939
-1.724848	you should choose	-0.124939
-0.878440	will automatically choose	-0.124939
-0.557693	make developers choose	-0.124939
-2.539853	and the options	-0.124939
-1.077671	specify the options	-0.124939
-1.273801	of compiler options	-0.124939
-0.596476	appropriate compiler options	-0.124939
-0.677677	various optimization options	-0.124939
-0.474890	Many optimization options	-0.124939
-0.078958	relevant optimization options	-0.425969
-0.175532	Compiler optimization options	-0.124939
-1.184704	the available options	-0.124939
-0.593604	Command line options	-0.124939
-0.882248	if certain options	-0.124939
-0.876434	have various options	-0.124939
-0.582010	all installation options	-0.124939
-0.557688	the debugging options	-0.124939
-0.358873	disable power-save options	-0.124939
-2.381722	is the feature	-0.124939
-1.800539	has the feature	-0.124939
-1.293427	have a feature	-0.124939
-1.586648	such a feature	-0.124939
-0.601248	indirect function feature	-0.124939
-0.597140	branch). This feature	-0.124939
-0.597140	2010. This feature	-0.124939
-1.000710	but this feature	-0.425969
-0.590338	so this feature	-0.124939
-0.600365	compiler A feature	-0.124939
-1.434672	specific CPU feature	-0.124939
-1.068941	that such feature	-0.124939
-0.892572	C++ template feature	-0.124939
-1.106613	a test feature	-0.124939
-0.575555	built-in test feature	-0.124939
-0.579122	the special feature	-0.124939
-0.659697	symbol interposition feature	-0.124939
-1.031559	are different ways	-0.124939
-0.831286	several different ways	-0.425969
-1.432508	in other ways	-0.124939
-0.598491	other possible ways	-0.124939
-1.357148	are several ways	-0.124939
-0.590858	have fast ways	-0.124939
-0.321807	in various ways	-0.124939
-0.275240	are various ways	-0.522879
-0.321807	show various ways	-0.124939
-0.321807	describe various ways	-0.124939
-0.190147	are three ways	-0.425969
-0.065802	are smarter ways	-0.425969
-0.598118	processors that were	-0.602060
-0.885624	problem that were	-0.124939
-0.202341	models that were	-0.425969
-0.597540	10 elements were	-0.124939
-0.891972	whether they were	-0.124939
-0.595496	256-bit instructions were	-0.124939
-0.888601	compiler versions were	-0.124939
-0.590342	disk space were	-0.124939
-0.585045	measured results were	-0.124939
-1.292425	have tested were	-0.124939
-0.855217	different tasks were	-0.124939
-0.573138	matrix sizes were	-0.124939
-0.527122	The tests were	-0.124939
-0.463495	example 8.15a were	-0.124939
-0.463495	No differences were	-0.124939
-0.358816	and Func2 were	-0.124939
-2.666183	for the link	-0.124939
-1.878458	at a link	-0.124939
-2.012141	need to link	-0.124939
-0.601658	-fno-pic and link	-0.124939
-0.600096	differently. The link	-0.124939
-0.600096	together. The link	-0.124939
-1.182898	a static link	-0.124939
-0.363871	as static link	-0.425969
-0.826921	than static link	-0.124939
-1.004468	a dynamic link	-0.124939
-0.179486	or dynamic link	-0.425969
-0.490861	make dynamic link	-0.124939
-0.179486	separate dynamic link	-0.425969
-0.580622	array. No link	-0.124939
-1.431702	the previous link	-0.124939
-0.463568	a symbolic link	-0.124939
-1.492230	array is made	-0.124939
-0.600424	dispatch is made	-0.124939
-0.600424	attempt is made	-0.124939
-1.704857	can be made	-0.124939
-2.062826	should be made	-0.124939
-1.161091	often be made	-0.124939
-1.957867	I have made	-0.124939
-1.169590	microprocessor has made	-0.124939
-0.593167	reordering has made	-0.124939
-1.750512	function library made	-0.124939
-1.576591	shared object made	-0.124939
-0.562933	I once made	-0.124939
-0.504941	into projects made	-0.124939
-0.358873	using ready made	-0.124939
-0.358873	templates. Ready made	-0.124939
-1.522205	to the appropriate	-0.522879
-2.459574	for the appropriate	-0.124939
-2.283823	with the appropriate	-0.124939
-1.422147	choose the appropriate	-0.124939
-0.598398	include the appropriate	-0.124939
-0.895796	loads the appropriate	-0.124939
-0.598398	Including the appropriate	-0.124939
-1.443961	systems. The appropriate	-0.124939
-0.600986	simply not appropriate	-0.124939
-0.600922	prints an appropriate	-0.124939
-1.664584	and make appropriate	-0.124939
-1.193650	is most appropriate	-0.124939
-0.594569	3. Use appropriate	-0.124939
-0.358902	keyword wherever appropriate	-0.124939
-0.978824	0; int i,	-0.124939
-0.548989	... int i,	-0.124939
-0.978824	a[100]; int i,	-0.124939
-0.548989	n! int i,	-0.124939
-1.103763	list[300]; int i,	-0.124939
-0.358199	matrix[rows][columns]; int i,	-0.425969
-0.548989	string; int i,	-0.124939
-0.548989	list[size]; int i,	-0.124939
-0.548989	8.13a int i,	-0.124939
-0.548989	8.13b int i,	-0.124939
-0.548989	8.12a int i,	-0.124939
-0.548989	8.14b int i,	-0.124939
-0.548989	8.14a int i,	-0.124939
-0.122060	StoreVector(aa + i,	-0.602060
-0.358973	printf("\n%2i %10I64i", i,	-0.124939
-2.458380	by the constructor	-0.124939
-1.300043	function a constructor	-0.124939
-1.073378	array. The constructor	-0.124939
-0.600117	conversion. The constructor	-0.124939
-0.598996	data // constructor	-0.124939
-0.896984	constructor // constructor	-0.124939
-0.600426	destructors A constructor	-0.124939
-1.269505	A simple constructor	-0.124939
-0.558687	the copy constructor	-0.124939
-0.558687	a copy constructor	-0.124939
-0.629691	The copy constructor	-0.124939
-0.154512	A copy constructor	-0.124939
-0.558687	no copy constructor	-0.124939
-0.396578	Any copy constructor	-0.124939
-0.358928	a default constructor	-0.124939
-0.358928	// default constructor	-0.124939
-0.358928	A default constructor	-0.124939
-1.671731	set of CPUs.	-0.124939
-1.374902	brands of CPUs.	-0.124939
-1.170719	for different CPUs.	-0.124939
-1.395993	several different CPUs.	-0.124939
-0.578475	support different CPUs.	-0.124939
-0.876537	for Intel CPUs.	-0.124939
-0.588580	different Intel CPUs.	-0.124939
-0.973558	for AMD CPUs.	-0.124939
-1.244407	on AMD CPUs.	-0.124939
-0.589454	fit their CPUs.	-0.124939
-1.331167	the x86 CPUs.	-0.124939
-1.154857	with old CPUs.	-0.124939
-1.645792	and VIA CPUs.	-0.124939
-1.021476	all modern CPUs.	-0.124939
-0.577520	with non-Intel CPUs.	-0.124939
-0.990436	on future CPUs.	-0.124939
-0.504901	with earlier CPUs.	-0.124939
-2.033046	(i = 2;	-0.124939
-0.202494	list[i+2] = 2;	-0.425969
-0.593991	a[1] = 2;	-0.124939
-0.814561	r + 2;	-0.124939
-0.361003	*p + 2;	-0.425969
-0.555544	b[i] + 2;	-0.124939
-0.555544	bb[i] + 2;	-0.124939
-0.507385	a * 2;	-0.425969
-0.546727	a[i].u[1] * 2;	-0.124939
-2.179743	i < 2;	-0.124939
-0.886901	a += 2;	-0.124939
-0.810669	cout << 2;	-0.425969
-2.474526	This is just	-0.124939
-2.656143	It is just	-0.124939
-0.599703	classes is just	-0.124939
-1.072148	delay is just	-0.124939
-0.902486	translated to just	-0.124939
-0.900203	cache in just	-0.124939
-0.600614	AND-operations in just	-0.124939
-2.531450	may be just	-0.124939
-1.299174	calculations are just	-0.124939
-2.409639	a function just	-0.124939
-1.550563	done with just	-0.124939
-2.013648	If you just	-0.124939
-1.590722	to have just	-0.124939
-1.495823	even when just	-0.124939
-0.899799	purpose. It just	-0.124939
-1.815455	a vector just	-0.124939
-0.463495	up significantly just	-0.124939
-0.463495	brackets index, just	-0.124939
-1.600693	bits of a[i]	-0.124939
-1.679332	reference to a[i]	-0.124939
-1.201138	temp = a[i]	-0.124939
-0.548652	i++) { a[i]	-0.970037
-1.078124	2) { a[i]	-0.124939
-1.057939	loop ; a[i]	-0.124939
-1.336196	array element a[i]	-0.124939
-0.556560	2; i++) a[i]	-0.124939
-1.443151	size; i++) a[i]	-0.124939
-0.541135	multiplication here: a[i]	-0.124939
-0.504941	avoids overflow: a[i]	-0.124939
-0.504941	safe formula a[i]	-0.124939
-3.129359	of the function,	-0.124939
-1.078103	inline the function,	-0.124939
-1.587230	for this function,	-0.124939
-2.633793	the same function,	-0.124939
-0.599647	inside one function,	-0.124939
-1.792260	the library function,	-0.124939
-0.597815	non-virtual member function,	-0.124939
-1.575995	the template function,	-0.124939
-0.367506	the simple function,	-0.124939
-0.593317	an optimized function,	-0.124939
-1.176953	calls another function,	-0.124939
-1.591995	the innermost function,	-0.124939
-0.858462	a frame function,	-0.124939
-1.269971	the latter function,	-0.124939
-1.366249	CPU detection function,	-0.124939
-0.541076	the select function,	-0.124939
-0.463458	a non-member function,	-0.124939
-2.211001	of the operands	-0.124939
-1.996747	that the operands	-0.124939
-1.789660	if the operands	-0.301030
-1.198220	swap the operands	-0.124939
-0.601602	operands The operands	-0.124939
-2.577672	floating point operands	-0.124939
-1.289851	in all operands	-0.124939
-0.598491	expressions where operands	-0.124939
-1.186905	the Boolean operands	-0.124939
-0.917569	of Boolean operands	-0.124939
-0.188861	16kB aligned operands	-0.124939
-0.463604	performance). Aligned operands	-0.124939
-2.792171	of the innermost	-0.124939
-1.822865	in the innermost	-0.346788
-1.674295	only the innermost	-0.124939
-0.895854	also the innermost	-0.124939
-1.918778	inside the innermost	-0.124939
-0.898734	outside the innermost	-0.124939
-0.869649	the critical innermost	-0.346788
-0.515028	A critical innermost	-0.124939
-0.789016	// Critical innermost	-0.124939
-2.012725	likely to require	-0.124939
-1.828865	functions that require	-0.124939
-0.596060	methods or require	-0.124939
-0.596060	lookup or require	-0.124939
-0.596060	slower or require	-0.124939
-2.025397	does not require	-0.124939
-1.604301	This may require	-0.124939
-0.596246	measurements may require	-0.124939
-1.698393	vector operations require	-0.124939
-0.595481	these instructions require	-0.124939
-0.594687	global arrays require	-0.124939
-0.593525	mixed precision require	-0.124939
-1.049977	This would require	-0.124939
-1.037731	Some applications require	-0.124939
-0.583049	non-constant references require	-0.124939
-0.557607	Some profilers require	-0.124939
-0.463476	and MOVNTDQ require	-0.124939
-0.358801	linear algebra) require	-0.124939
-2.911212	of the compiler.	-0.124939
-2.093560	in the compiler.	-0.124939
-1.743874	by the compiler.	-0.124939
-2.449482	on the compiler.	-0.124939
-2.073949	in a compiler.	-0.124939
-1.844631	a different compiler.	-0.124939
-2.640726	the same compiler.	-0.124939
-1.834731	the Intel compiler.	-0.124939
-0.588608	or Intel compiler.	-0.124939
-1.430427	Intel C++ compiler.	-0.124939
-1.780659	the Gnu compiler.	-0.124939
-1.168730	with another compiler.	-0.124939
-0.992928	the Microsoft compiler.	-0.124939
-0.784781	with Microsoft compiler.	-0.124939
-3.030349	of the advanced	-0.124939
-2.504995	on the advanced	-0.124939
-1.199275	run the advanced	-0.124939
-0.900661	running the advanced	-0.124939
-1.946948	lot of advanced	-0.124939
-0.600224	lack of advanced	-0.124939
-0.600224	wealth of advanced	-0.124939
-0.601645	execution and advanced	-0.124939
-1.077698	is for advanced	-0.124939
-0.901169	tips on advanced	-0.124939
-1.833405	is an advanced	-0.124939
-1.279954	use an advanced	-0.124939
-1.293369	for more advanced	-0.124939
-2.157144	the most advanced	-0.124939
-1.041346	and using advanced	-0.124939
-1.566156	are using advanced	-0.124939
-1.425617	has many advanced	-0.124939
-0.573245	services under advanced	-0.124939
-1.640306	where a #define	-0.124939
-0.601160	const, or #define	-0.124939
-1.076639	inline function #define	-0.124939
-0.901107	declared with #define	-0.124939
-1.199176	Microsoft compiler #define	-0.124939
-1.066540	== 2 #define	-0.124939
-0.597185	== 8 #define	-0.124939
-2.195909	For example, #define	-0.124939
-0.593853	compiler, etc. #define	-0.124939
-0.582991	== 5 #define	-0.124939
-0.504917	__attribute__((const)) #else #define	-0.124939
-0.835395	at runtime. #define	-0.124939
-0.143346	two elements: #define	-0.124939
-0.143346	array elements: #define	-0.124939
-0.463440	a name. #define	-0.124939
-0.463440	#ifdef __GNUC__ #define	-0.124939
-0.358773	of N: #define	-0.124939
-0.358773	#include <math.h> #define	-0.124939
-2.464635	number of points	-0.124939
-0.595982	object it points	-0.124939
-1.061195	call it points	-0.124939
-0.891017	what it points	-0.124939
-2.024590	a pointer points	-0.124939
-1.594466	function pointer points	-0.124939
-0.596020	p always points	-0.124939
-0.592428	following list points	-0.124939
-0.586557	pointer actually points	-0.124939
-0.174281	that r points	-0.425969
-0.469925	what r points	-0.124939
-0.582046	few unused points	-0.124939
-0.582046	object p points	-0.124939
-0.090147	which initially points	-0.124939
-0.090147	pointer initially points	-0.124939
-0.090147	entry initially points	-0.124939
-0.358859	(or eight) points	-0.124939
-2.569448	is a switch	-0.124939
-0.599599	predict a switch	-0.124939
-1.071839	insert a switch	-0.124939
-0.898181	processors, a switch	-0.124939
-1.745671	needs to switch	-0.124939
-1.196439	branches and switch	-0.124939
-0.203604	Branches and switch	-0.124939
-0.899064	as for switch	-0.124939
-0.600042	replacements for switch	-0.124939
-1.077558	tree or switch	-0.124939
-0.887929	well. A switch	-0.124939
-0.594413	targets. A switch	-0.124939
-0.600114	statements because switch	-0.124939
-1.013823	a task switch	-0.124939
-1.148866	int n; switch	-0.124939
-0.541263	A context switch	-0.124939
-0.358859	initializer lists, switch	-0.124939
-2.381944	is the range	-0.124939
-0.601550	limit the range	-0.124939
-0.670846	out of range	-0.124939
-0.902113	underflow. The range	-0.124939
-2.043295	in this range	-0.124939
-2.638734	the same range	-0.124939
-0.599839	integers which range	-0.124939
-1.300252	the address range	-0.425969
-1.119060	a limited range	-0.124939
-0.541263	The live range	-0.124939
-0.143372	float Live range	-0.124939
-0.143372	storage. Live range	-0.124939
-0.358859	a narrow range	-0.124939
-1.502963	at the start	-0.124939
-1.828884	has to start	-0.124939
-1.187470	CPU to start	-0.124939
-2.206427	possible to start	-0.124939
-1.926593	takes to start	-0.124939
-1.355470	fail to start	-0.124939
-0.894685	minutes to start	-0.124939
-0.601670	iterations and start	-0.124939
-2.122039	it can start	-0.124939
-0.600843	collection may start	-0.124939
-0.591407	before you start	-0.124939
-0.201968	Before you start	-0.425969
-0.594316	CPU will start	-0.124939
-0.594316	manager will start	-0.124939
-0.594880	name ; start	-0.124939
-0.575602	framework, during start	-0.124939
-0.747313	which the modules	-0.124939
-1.999856	all the modules	-0.124939
-1.078292	loading of modules	-0.124939
-0.601218	templates or modules	-0.124939
-1.381307	in other modules	-0.124939
-1.448116	no other modules	-0.124939
-1.746601	for all modules	-0.124939
-0.598864	tested library modules	-0.124939
-1.706435	the two modules	-0.124939
-1.625098	most critical modules	-0.124939
-0.595161	up. Some modules	-0.124939
-1.571704	assembly language modules	-0.124939
-0.877328	in separate modules	-0.124939
-0.726650	optimize across modules	-0.124939
-0.639563	optimizations across modules	-0.124939
-0.358907	all .cpp modules	-0.124939
-0.681080	multiple .cpp modules	-0.124939
-0.601532	faster the smaller	-0.124939
-0.601532	advantageous the smaller	-0.124939
-2.253435	code is smaller	-0.124939
-0.600397	switches is smaller	-0.124939
-0.600397	proxy is smaller	-0.124939
-2.393352	to a smaller	-0.124939
-1.926402	has a smaller	-0.124939
-0.601756	Integers of smaller	-0.124939
-1.443533	systems. The smaller	-0.124939
-2.531450	may be smaller	-0.124939
-2.648727	the code smaller	-0.124939
-0.599137	matrix into smaller	-0.124939
-1.582462	into multiple smaller	-0.124939
-0.596227	variable even smaller	-0.124939
-1.512898	8 bytes smaller	-0.124939
-1.406701	code becomes smaller	-0.124939
-1.339742	be made smaller	-0.124939
-1.113545	execution units smaller	-0.124939
-0.898519	method used here	-0.124939
-1.072483	using static here	-0.124939
-1.686035	is necessary here	-0.124939
-1.519066	the speed here	-0.124939
-1.354102	The calculation here	-0.124939
-1.167625	The problem here	-0.124939
-0.796182	const_cast operator here	-0.124939
-0.545324	?: operator here	-0.124939
-0.587812	on n here	-0.124939
-1.037817	at runtime here	-0.124939
-0.583009	functions go here	-0.124939
-0.858409	advice given here	-0.124939
-0.550664	cost anything here	-0.124939
-0.541076	www.yeppp.info And here	-0.124939
-0.129434	is said here	-0.425969
-0.504821	Functional decomposition here	-0.124939
-0.358787	is provoked here	-0.124939
-3.133284	of the core	-0.124939
-2.298878	use the core	-0.124939
-1.073256	cycles. The core	-0.124939
-0.600076	frequency. The core	-0.124939
-0.601462	table are core	-0.124939
-2.637741	the same core	-0.124939
-1.879316	the CPU core	-0.124939
-0.854074	one CPU core	-0.124939
-0.718120	specific CPU core	-0.124939
-1.967988	is called core	-0.124939
-0.595949	routines, system core	-0.124939
-0.888555	The execution core	-0.124939
-1.189220	same processor core	-0.124939
-0.879216	dedicated microprocessor core	-0.124939
-0.579206	AMD math core	-0.124939
-0.557736	AMD Math core	-0.124939
-3.011552	in the relevant	-0.124939
-2.006947	all the relevant	-0.124939
-2.729996	it is relevant	-0.124939
-1.628885	size is relevant	-0.124939
-1.439005	speed is relevant	-0.124939
-1.201863	possibly be relevant	-0.124939
-2.186426	is more relevant	-0.124939
-0.667944	with all relevant	-0.602060
-1.451460	on all relevant	-0.124939
-1.917967	is also relevant	-0.124939
-0.596407	receive new relevant	-0.124939
-0.585044	line options relevant	-0.124939
-0.557709	are hardly relevant	-0.124939
-0.504921	and keywords relevant	-0.124939
-0.463549	all respects relevant	-0.124939
-1.462308	vector registers are:	-0.124939
-1.062672	than pointers are:	-0.124939
-1.411914	oriented programming are:	-0.124939
-1.273884	register stack are:	-0.124939
-1.082423	memory allocation are:	-0.124939
-1.770943	CPU dispatching are:	-0.124939
-0.543901	dynamic linking are:	-0.124939
-0.583049	than references are:	-0.124939
-1.146216	function inlining are:	-0.124939
-0.562789	loop iterations are:	-0.124939
-0.557557	and free are:	-0.124939
-0.557607	with profilers are:	-0.124939
-0.541093	positive effects are:	-0.124939
-0.764226	Possible solutions are:	-0.124939
-0.358801	registers. Disadvantages are:	-0.124939
-1.203028	back to around	-0.124939
-1.404913	to work around	-0.124939
-0.872779	#if directives around	-0.124939
-1.441244	various ways around	-0.124939
-0.113011	and scattered around	-0.124939
-0.052840	be scattered around	-0.425969
-0.152944	are scattered around	-0.425969
-0.113011	functions scattered around	-0.124939
-0.113011	etc. scattered around	-0.124939
-0.541181	not wrap around	-0.124939
-0.505011	of jumping around	-0.124939
-0.463568	scattered randomly around	-0.124939
-0.065798	a parenthesis around	-0.124939
-0.358873	the circumstances around	-0.124939
-1.664593	up to 5	-0.124939
-0.600178	count to 5	-0.124939
-0.600178	incremented to 5	-0.124939
-1.076010	3 - 5	-0.124939
-1.197884	take only 5	-0.124939
-1.646479	b * 5	-0.124939
-0.584858	100 * 5	-0.124939
-0.868087	addition takes 5	-0.124939
-0.584206	operation takes 5	-0.124939
-0.597792	typically between 5	-0.124939
-0.586557	graphics processors. 5	-0.124939
-1.235809	INSTRSET == 5	-0.124939
-0.527143	platform ....................................................................................... 5	-0.124939
-0.504881	platform ........................................................................................... 5	-0.124939
-0.504881	............................................................................................... 23 5	-0.124939
-0.463513	by 3, 5	-0.124939
-0.358830	a website. 5	-0.124939
-2.360289	function is replaced	-0.124939
-1.075956	m is replaced	-0.124939
-0.600444	x*8 is replaced	-0.124939
-1.202665	possible, and replaced	-0.124939
-1.608890	can be replaced	-0.602060
-2.081096	may be replaced	-0.124939
-1.886853	will be replaced	-0.124939
-1.148406	sometimes be replaced	-0.124939
-1.774354	parameters are replaced	-0.124939
-1.628971	compiler has replaced	-0.124939
-1.624207	have been replaced	-0.124939
-0.593024	its parameters replaced	-0.124939
-0.598879	{x = a;	-0.124939
-0.598879	x[0] = a;	-0.124939
-1.626358	{ int a;	-0.124939
-1.454943	short int a;	-0.124939
-1.027830	public: int a;	-0.124939
-0.584260	2;} int a;	-0.124939
-1.660773	b + a;	-0.124939
-0.563049	7.24 float a;	-0.124939
-0.563049	7.29a float a;	-0.124939
-0.563049	14.2a float a;	-0.124939
-0.563049	14.2b float a;	-0.124939
-0.579308	7.11 bool a;	-0.124939
-0.107190	S1 {double a;	-0.425969
-0.143390	abc {int a;	-0.124939
-0.143390	Sab {int a;	-0.124939
-0.900928	lots of things	-0.124939
-0.600978	couple of things	-0.124939
-1.078133	operators for things	-0.124939
-1.919303	and other things	-0.124939
-1.323752	to do things	-0.301030
-0.598732	do multiple things	-0.124939
-1.427298	are two things	-0.124939
-0.895159	does some things	-0.124939
-0.889992	to simple things	-0.124939
-1.332388	of doing things	-0.124939
-1.556157	are various things	-0.124939
-0.587805	reveals three things	-0.124939
-0.463495	often reveal things	-0.124939
-0.463495	some funny things	-0.124939
-0.358816	quite ingenious things	-0.124939
-1.600441	or the negative	-0.124939
-0.601801	u.d is negative	-0.124939
-2.049343	use a negative	-0.124939
-2.040759	make a negative	-0.124939
-1.194327	contains a negative	-0.124939
-1.071816	produces a negative	-0.124939
-1.494695	overflow and negative	-0.124939
-0.900077	positive and negative	-0.124939
-0.601561	variables. The negative	-0.124939
-1.077669	1 for negative	-0.124939
-1.077457	never be negative	-0.124939
-0.601462	both are negative	-0.124939
-1.199635	and not negative	-0.124939
-0.594398	software. A negative	-0.124939
-0.594398	differently. A negative	-0.124939
-1.577032	has no negative	-0.124939
-0.598452	A possible negative	-0.124939
-1.393317	the code section	-0.124939
-0.995021	The code section	-0.124939
-0.601044	languages. This section	-0.124939
-1.715826	of this section	-0.124939
-0.590412	skip this section	-0.124939
-0.590412	conclude this section	-0.124939
-1.898655	the data section	-0.124939
-1.229383	The data section	-0.124939
-0.586754	writable data section	-0.124939
-1.369123	The next section	-0.124939
-0.358930	assembly language", section	-0.124939
-1.770652	do the reductions	-0.124939
-0.902083	All the reductions	-0.124939
-1.074598	do more reductions	-0.124939
-1.163586	and which reductions	-0.124939
-0.591566	shows which reductions	-0.124939
-0.598410	while many reductions	-0.124939
-0.595533	most simple reductions	-0.124939
-0.589583	expressions. Most reductions	-0.124939
-0.237931	the algebraic reductions	-0.124939
-0.237931	and algebraic reductions	-0.124939
-0.237931	make algebraic reductions	-0.124939
-0.237931	any algebraic reductions	-0.124939
-0.343722	simple algebraic reductions	-0.124939
-0.237931	Many algebraic reductions	-0.124939
-0.541181	such obvious reductions	-0.124939
-0.541181	doing equivalent reductions	-0.124939
-0.505011	CPU. Algebraic reductions	-0.124939
-1.071233	example, to go	-0.124939
-2.066752	want to go	-0.124939
-1.981124	likely to go	-0.124939
-0.599394	predicted to go	-0.124939
-0.902235	counter and go	-0.124939
-2.070468	that can go	-0.124939
-2.300796	{ // go	-0.124939
-2.537827	the function go	-0.124939
-2.010215	it may go	-0.124939
-0.891584	table may go	-0.124939
-0.887679	branch will go	-0.124939
-0.594286	elements will go	-0.124939
-0.600009	Non-polymorphic functions go	-0.124939
-0.895039	public variables go	-0.124939
-0.594897	but must go	-0.124939
-0.594678	arithmetic calculations go	-0.124939
-0.527165	would otherwise go	-0.124939
-1.077636	everything that depends	-0.124939
-2.419143	to use depends	-0.124939
-1.198361	each vector depends	-0.124939
-2.003900	a loop depends	-0.124939
-1.298388	each value depends	-0.124939
-0.584415	return value depends	-0.124939
-0.589618	control branch depends	-0.425969
-0.195376	each calculation depends	-0.425969
-0.589900	final application depends	-0.124939
-0.875210	each addition depends	-0.124939
-1.386907	be predicted depends	-0.124939
-0.579249	of sum depends	-0.124939
-0.842141	The gain depends	-0.124939
-0.504861	this bookkeeping depends	-0.124939
-0.358816	the truth depends	-0.124939
-0.601934	Take the example:	-0.124939
-0.601615	blocks, for example:	-0.124939
-2.044193	in this example:	-0.124939
-0.609857	time. For example:	-0.124939
-0.431151	etc. For example:	-0.124939
-0.431151	cases. For example:	-0.124939
-0.164135	to. For example:	-0.425969
-0.431151	structure. For example:	-0.124939
-0.431151	sizes. For example:	-0.124939
-0.431151	lookup. For example:	-0.124939
-0.609857	valid. For example:	-0.124939
-0.431151	predictable. For example:	-0.124939
-0.431151	combined. For example:	-0.124939
-0.431151	completely. For example:	-0.124939
-1.920528	the following example:	-0.124939
-0.575618	ARRAYSIZE. Another example:	-0.124939
-0.601696	together and tested	-0.124939
-1.273516	should be tested	-0.249877
-1.728868	also be tested	-0.124939
-1.454200	that have tested	-0.124939
-0.724024	I have tested	-0.124939
-0.594817	Library versions tested	-0.124939
-1.159220	not been tested	-0.124939
-1.381863	have been tested	-0.124939
-0.579258	be further tested	-0.124939
-0.358916	and well- tested	-0.124939
-0.601934	removed the contentions	-0.124939
-0.589170	as when contentions	-0.124939
-0.877681	matrix when contentions	-0.124939
-0.589170	dramatic when contentions	-0.124939
-1.531660	the cache contentions	-0.124939
-0.555696	be cache contentions	-0.124939
-0.814838	cause cache contentions	-0.124939
-0.894019	level-2 cache contentions	-0.124939
-1.155942	level-1 cache contentions	-0.124939
-1.069077	of such contentions	-0.124939
-0.897371	to cause contentions	-0.124939
-0.975417	and cause contentions	-0.124939
-1.213385	can cause contentions	-0.124939
-0.308809	// Cache contentions	-0.124939
-0.127516	9.10 Cache contentions	-0.425969
-1.436018	loop is predicted	-0.124939
-1.440758	way is predicted	-0.124939
-1.293044	address is predicted	-0.124939
-2.073807	to be predicted	-0.124939
-1.788934	can be predicted	-0.249877
-1.682080	also be predicted	-0.124939
-1.796527	cannot be predicted	-0.124939
-1.625566	would be predicted	-0.124939
-2.018611	they are predicted	-0.124939
-0.378512	loops are predicted	-0.425969
-2.589745	is not predicted	-0.124939
-1.603812	is simply predicted	-0.124939
-1.497873	is usually predicted	-0.124939
-2.879079	of the main	-0.124939
-2.088127	in the main	-0.301030
-2.512745	for the main	-0.124939
-2.210005	than the main	-0.124939
-2.127956	from the main	-0.124939
-2.018519	where the main	-0.124939
-1.939882	instead of main	-0.124939
-0.898090	version in main	-0.124939
-1.659180	variable in main	-0.124939
-0.599553	instance in main	-0.124939
-1.569178	code. The main	-0.124939
-1.360185	set. The main	-0.124939
-0.598658	running. The main	-0.124939
-1.197281	accessed from main	-0.124939
-1.428015	are two main	-0.124939
-1.527821	pointers and references	-0.124939
-0.124530	Pointers and references	-0.124939
-1.200822	pointers or references	-0.124939
-2.362963	rather than references	-0.124939
-1.074540	have more references	-0.124939
-1.799958	of using references	-0.124939
-1.944132	by using references	-0.124939
-0.595404	as constant references	-0.124939
-0.735930	because relative references	-0.124939
-0.510487	mostly relative references	-0.124939
-0.557656	use absolute references	-0.124939
-0.550677	of self-relative references	-0.124939
-0.541146	Pointers versus references	-0.124939
-0.358844	and non-constant references	-0.124939
-0.358844	calls. Internal references	-0.124939
-1.354544	program is loaded	-0.124939
-0.678938	library is loaded	-0.124939
-0.601708	esp+12 and loaded	-0.124939
-1.714293	to be loaded	-0.124939
-2.076551	can be loaded	-0.124939
-2.118046	may be loaded	-0.124939
-1.916945	will be loaded	-0.124939
-1.796527	cannot be loaded	-0.124939
-1.719784	must be loaded	-0.124939
-1.012743	libraries are loaded	-0.124939
-1.323183	is typically loaded	-0.124939
-0.358930	an over- loaded	-0.124939
-1.679591	whether the positive	-0.124939
-1.987180	is a positive	-0.124939
-1.973835	that a positive	-0.124939
-2.023261	make a positive	-0.124939
-1.191530	contains a positive	-0.124939
-0.902133	performance. The positive	-0.124939
-1.582386	only for positive	-0.124939
-0.899083	0 for positive	-0.124939
-0.600396	variables. A positive	-0.124939
-0.598748	compare two positive	-0.124939
-1.188631	to some positive	-0.124939
-1.316389	a large positive	-0.124939
-1.250270	very large positive	-0.124939
-0.802895	if both positive	-0.124939
-0.549080	has both positive	-0.124939
-0.981785	a low positive	-0.124939
-2.422201	of the loop.	-0.124939
-1.005302	inside the loop.	-0.124939
-0.899562	outside the loop.	-0.124939
-1.192469	during the loop.	-0.124939
-0.599118	exit the loop.	-0.124939
-1.203672	unroll a loop.	-0.124939
-1.187897	the test loop.	-0.124939
-0.597143	the innermost loop.	-0.301030
-0.527411	critical innermost loop.	-0.124939
-0.463659	an infinite loop.	-0.124939
-1.814572	time the computer	-0.124939
-2.386054	when the computer	-0.124939
-2.278106	If the computer	-0.124939
-0.745060	off the computer	-0.124939
-0.853167	until the computer	-0.425969
-0.599452	restart the computer	-0.124939
-2.074075	in a computer	-0.124939
-1.014485	objects in computer	-0.124939
-1.074299	used. A computer	-0.124939
-0.599746	on one computer	-0.124939
-1.570530	for many computer	-0.124939
-1.415385	Pentium 4 computer	-0.124939
-0.585087	an old computer	-0.124939
-2.650954	that the overhead	-0.124939
-1.590278	avoid the overhead	-0.124939
-1.200552	involves the overhead	-0.124939
-0.900708	invoking the overhead	-0.124939
-1.005394	function. The overhead	-0.425969
-1.064681	are: The overhead	-0.124939
-1.064681	threads. The overhead	-0.124939
-1.194193	or no overhead	-0.124939
-0.991728	the extra overhead	-0.124939
-1.247000	no extra overhead	-0.124939
-0.520104	need extra overhead	-0.124939
-0.520104	9 extra overhead	-0.124939
-0.837272	the large overhead	-0.124939
-1.316661	a large overhead	-0.124939
-1.418482	a high overhead	-0.124939
-0.575586	very little overhead	-0.124939
-0.075820	AMD and VIA	-0.397940
-0.601365	AMD or VIA	-0.124939
-0.527391	to recognize VIA	-0.124939
-0.601922	holding the pointer.	-0.124939
-2.282164	to a pointer.	-0.124939
-1.841226	or a pointer.	-0.124939
-1.751153	from a pointer.	-0.124939
-1.474937	making a pointer.	-0.124939
-1.862068	through a pointer.	-0.124939
-1.281902	like a pointer.	-0.124939
-1.629588	a memory pointer.	-0.124939
-1.336873	the member pointer.	-0.124939
-0.581623	same member pointer.	-0.124939
-1.764449	the stack pointer.	-0.124939
-0.291255	a smart pointer.	-0.124939
-0.463681	the 'this' pointer.	-0.124939
-0.659932	a 'this' pointer.	-0.124939
-0.504982	a hidden pointer.	-0.124939
-1.490231	compiler that supports	-0.124939
-0.599897	microprocessor that supports	-0.124939
-2.049150	The compiler supports	-0.124939
-1.845021	Intel compiler supports	-0.124939
-1.149738	Microsoft compiler supports	-0.124939
-0.587813	PGI compiler supports	-0.124939
-0.600422	utility. It supports	-0.124939
-1.536259	the CPU supports	-0.124939
-1.570718	The CPU supports	-0.124939
-1.742730	instruction set supports	-0.425969
-1.481730	but also supports	-0.124939
-0.575567	model N supports	-0.124939
-1.203842	test tool supports	-0.124939
-0.358873	model N+1 supports	-0.124939
-0.358873	environment (IDE) supports	-0.124939
-2.730520	that the C	-0.124939
-0.601620	B and C	-0.124939
-0.902411	arrays in C	-0.124939
-1.201781	often be C	-0.124939
-0.597862	together with C	-0.124939
-0.597862	manipulated with C	-0.124939
-1.497804	resources than C	-0.124939
-1.082187	the Gnu C	-0.124939
-1.690379	a separate C	-0.124939
-0.573186	choose either C	-0.124939
-0.065793	= 2.2, C	-0.425969
-0.065793	old fashioned C	-0.425969
-0.463513	the low-level C	-0.124939
-0.358830	The official C	-0.124939
-1.378201	that is compatible	-0.602060
-1.291811	processor is compatible	-0.124939
-1.233298	not be compatible	-0.425969
-2.107684	will be compatible	-0.124939
-1.238052	are not compatible	-0.301030
-1.787952	The most compatible	-0.124939
-0.891608	not even compatible	-0.124939
-0.567105	are fully compatible	-0.124939
-0.557735	and highly compatible	-0.124939
-0.504999	of backwards compatible	-0.124939
-0.358914	not backwards compatible	-0.124939
-0.463586	is mostly compatible	-0.124939
-2.652508	of a change	-0.124939
-0.902544	allowed to change	-0.124939
-0.902133	updating. The change	-0.124939
-1.051110	loop can change	-0.124939
-1.352140	library can change	-0.124939
-1.819372	You can change	-0.124939
-1.167080	CPUs can change	-0.124939
-1.675344	We can change	-0.124939
-1.249958	compiler may change	-0.124939
-2.087738	if you change	-0.124939
-1.734835	compiler will change	-0.124939
-1.173902	which will change	-0.124939
-1.286070	if we change	-0.124939
-0.580216	reference cannot change	-0.124939
-0.580216	We cannot change	-0.124939
-0.567137	cases. Don't change	-0.124939
-3.062047	in the global	-0.124939
-2.306995	to a global	-0.124939
-2.576356	in a global	-0.124939
-2.230208	as a global	-0.124939
-1.644432	when a global	-0.124939
-0.598871	log2 a global	-0.124939
-1.078352	static and global	-0.124939
-0.503116	static or global	-0.124939
-0.599709	names, one global	-0.124939
-1.848644	a variable global	-0.124939
-0.598027	make variables global	-0.124939
-1.261386	are called global	-0.124939
-0.579859	variables called global	-0.124939
-0.594744	preferably avoid global	-0.124939
-0.591444	variables. All global	-0.124939
-0.577472	26. Avoid global	-0.124939
-0.902527	results of my	-0.124939
-0.601645	computers and my	-0.124939
-0.597339	about in my	-0.124939
-0.893699	matrix in my	-0.124939
-0.597339	look in my	-0.124939
-0.597339	reductions in my	-0.124939
-1.065170	found in my	-0.124939
-1.293014	manual for my	-0.124939
-0.600042	suggestions for my	-0.124939
-0.902099	note that my	-0.124939
-1.740411	replaced by my	-0.124939
-1.799501	based on my	-0.124939
-0.597664	mainly on my	-0.124939
-0.597449	issue. See my	-0.124939
-1.069877	code. For my	-0.124939
-0.589560	topic, see my	-0.124939
-0.541114	solution. (In my	-0.124939
-1.593561	avoid the conversions	-0.124939
-1.076614	Avoid the conversions	-0.124939
-0.601203	Move the conversions	-0.124939
-0.599772	language, all conversions	-0.124939
-1.069466	to float conversions	-0.124939
-1.274251	don't need conversions	-0.124939
-0.890476	different type conversions	-0.124939
-1.644234	to avoid conversions	-0.124939
-0.835925	cannot avoid conversions	-0.124939
-0.511243	time. These conversions	-0.124939
-0.511243	pointer. These conversions	-0.124939
-0.511243	precision. These conversions	-0.124939
-0.511243	checks. These conversions	-0.124939
-0.577492	140. Avoid conversions	-0.124939
-0.805832	7.11 Type conversions	-0.124939
-0.358873	floating point-to-integer conversions	-0.124939
-1.831458	time the statement	-0.124939
-1.068149	the if statement	-0.124939
-0.895704	The if statement	-0.124939
-0.600629	after this statement	-0.124939
-1.867099	only one statement	-0.124939
-1.367208	that each statement	-0.124939
-0.937852	function call statement	-0.124939
-1.592276	loop control statement	-0.124939
-0.351274	a switch statement	-0.124939
-0.369403	or switch statement	-0.124939
-0.519753	A switch statement	-0.124939
-0.369403	lists, switch statement	-0.124939
-0.971090	empty throw() statement	-0.124939
-0.562904	No general statement	-0.124939
-1.075988	cause of errors	-0.124939
-1.199867	source of errors	-0.124939
-0.600169	preventing program errors	-0.124939
-2.637741	the same errors	-0.124939
-0.996601	for such errors	-0.124939
-0.368250	prevent such errors	-0.124939
-0.997964	common programming errors	-0.124939
-0.573267	catch programming errors	-0.124939
-1.642629	can cause errors	-0.124939
-0.878301	of handling errors	-0.124939
-0.557616	because serious errors	-0.124939
-0.504901	cause unpredictable errors	-0.124939
-0.504901	rounding 137 errors	-0.124939
-0.358844	for detecting errors	-0.124939
-0.358844	cause fatal errors	-0.124939
-0.601696	optional and off	-0.124939
-0.601346	turn it off	-0.124939
-0.130053	to turn off	-0.425969
-0.332662	and turn off	-0.124939
-0.229238	or turn off	-0.124939
-0.577566	turn them off	-0.124939
-0.557737	or log off	-0.124939
-0.172677	for turning off	-0.124939
-0.156174	by turning off	-0.425969
-0.358916	will cut off	-0.124939
-2.432452	number of unused	-0.124939
-0.601016	holes of unused	-0.124939
-1.708700	in an unused	-0.124939
-1.183118	making an unused	-0.124939
-1.070987	// 2 unused	-0.124939
-1.160263	// 4 unused	-0.124939
-0.579578	also 4 unused	-0.124939
-0.597392	// For unused	-0.124939
-0.865141	loop ; unused	-0.124939
-0.518442	r ; unused	-0.124939
-0.518442	true ; unused	-0.124939
-0.518442	;r ; unused	-0.124939
-1.803873	a few unused	-0.124939
-1.252081	to add unused	-0.124939
-0.450663	are 6 unused	-0.124939
-0.450663	// 6 unused	-0.124939
-0.601898	explaining the relative	-0.124939
-2.304269	with a relative	-0.124939
-0.601051	sees a relative	-0.124939
-1.742169	support for relative	-0.124939
-1.201732	code are relative	-0.124939
-1.376166	each function relative	-0.124939
-1.264491	efficient because relative	-0.124939
-0.592633	smaller because relative	-0.124939
-1.336771	the member relative	-0.124939
-1.286382	data member relative	-0.124939
-1.411963	will generate relative	-0.124939
-1.336252	the offset relative	-0.124939
-0.463531	and mostly relative	-0.124939
-0.143368	only self- relative	-0.124939
-0.143368	calculating self- relative	-0.124939
-0.358844	fact addressed relative	-0.124939
-1.354231	number of columns	-0.425969
-0.203823	rows and columns	-0.425969
-1.200685	multiplication by columns	-0.124939
-0.900027	faster when columns	-0.124939
-0.595937	// loop columns	-0.425969
-0.599914	5. If columns	-0.124939
-0.597284	last 8 columns	-0.124939
-0.582056	add unused columns	-0.124939
-0.065798	= 20, columns	-0.425969
-0.463568	= 10, columns	-0.124939
-1.199752	i to p	-0.124939
-1.376931	added to p	-0.124939
-1.074813	see that p	-0.124939
-0.898722	clear that p	-0.124939
-1.200943	p = p	-0.124939
-0.901291	to by p	-0.124939
-0.600995	thing as p	-0.124939
-1.539146	the pointer p	-0.124939
-1.367116	of object p	-0.124939
-0.598372	C0 * p	-0.124939
-0.597430	read before p	-0.124939
-2.078242	int i; p	-0.124939
-0.592449	fast whether p	-0.124939
-0.835662	optimizing away p	-0.124939
-0.463495	p->NotPolymorphic(); p->Hello(); p	-0.124939
-0.659639	* p; p	-0.124939
-1.747359	for all platforms.	-0.124939
-0.552737	and Windows platforms.	-0.124939
-0.541235	on Windows platforms.	-0.124939
-1.427115	and Mac platforms.	-0.124939
-0.422094	for x86 platforms.	-0.124939
-0.299864	all x86 platforms.	-0.124939
-0.567129	standardized across platforms.	-0.124939
-0.562980	on PC platforms.	-0.124939
-0.129457	and x86-64 platforms.	-0.124939
-0.504962	all Unix-like platforms.	-0.124939
-0.065800	all major platforms.	-0.124939
-1.790976	and other languages	-0.124939
-1.253225	than other languages	-0.124939
-0.596386	However, these languages	-0.124939
-1.246670	of programming languages	-0.124939
-0.492593	from programming languages	-0.124939
-0.436222	other programming languages	-0.124939
-0.492593	compiled programming languages	-0.124939
-0.492593	modern programming languages	-0.124939
-1.057201	in compiled languages	-0.124939
-1.098863	This includes languages	-0.124939
-0.828072	in interpreted languages	-0.124939
-0.505011	while high-level languages	-0.124939
-0.463568	time. Interpreted languages	-0.124939
-0.358873	code. Compiled languages	-0.124939
-0.358873	hand. Low-level languages	-0.124939
-3.080080	of the installation	-0.124939
-2.616093	for the installation	-0.124939
-1.200701	during the installation	-0.124939
-1.068932	*.so). The installation	-0.124939
-0.598617	use. The installation	-0.124939
-0.598617	installed. The installation	-0.124939
-0.601575	procedures for installation	-0.124939
-0.899530	Dispatch at installation	-0.124939
-0.599772	select all installation	-0.124939
-0.597122	should take installation	-0.124939
-0.483845	both during installation	-0.124939
-0.483845	itself, during installation	-0.124939
-0.567084	use standardized installation	-0.124939
-0.129464	3.3 Program installation	-0.124939
-0.527207	by individual installation	-0.124939
-1.239306	function name depending	-0.124939
-1.441198	various ways depending	-0.124939
-0.582017	frequency dynamically depending	-0.124939
-0.167051	clock cycles, depending	-0.823909
-0.557576	the memory, depending	-0.124939
-1.048348	32-bit integers, depending	-0.124939
-0.504901	and 64, depending	-0.124939
-0.463531	or four, depending	-0.124939
-0.358844	several meanings depending	-0.124939
-0.358844	example 12.4a, depending	-0.124939
-0.358844	conditional move, depending	-0.124939
-0.358844	following solutions, depending	-0.124939
-2.649689	that the syntax	-0.124939
-1.839520	but the syntax	-0.124939
-0.746736	Unfortunately, the syntax	-0.124939
-0.601798	relieving a syntax	-0.124939
-0.598617	x The syntax	-0.124939
-1.068932	are: The syntax	-0.124939
-0.598617	space. The syntax	-0.124939
-0.900029	little more syntax	-0.124939
-1.453417	the C++ syntax	-0.124939
-1.239294	The C++ syntax	-0.124939
-0.895277	has some syntax	-0.124939
-1.266523	inline assembly syntax	-0.124939
-0.594722	// Windows syntax	-0.124939
-0.593643	// Linux syntax	-0.124939
-0.505011	or bypassing syntax	-0.124939
-3.199634	of the cases.	-0.124939
-0.781256	in most cases.	-0.124939
-1.069009	in such cases.	-0.124939
-0.588158	in many cases.	-0.124939
-0.915902	in some cases.	-0.124939
-0.237251	in simple cases.	-0.124939
-1.784267	the best cases.	-0.124939
-1.383626	in both cases.	-0.124939
-1.297399	the simplest cases.	-0.124939
-0.586499	newest processors. Supports	-0.124939
-0.867695	Microsoft compiler. Supports	-0.124939
-0.577372	source library. Supports	-0.124939
-0.575408	Intel libraries. Supports	-0.124939
-0.573135	moderately well. Supports	-0.124939
-1.217314	instruction sets. Supports	-0.124939
-0.570384	are not. Supports	-0.124939
-0.527080	optimization options. Supports	-0.124939
-0.504821	binary code). Supports	-0.124939
-0.659582	automatic parallelization. Supports	-0.124939
-0.659582	and 64-bit. Supports	-0.124939
-0.659582	and Mac. Supports	-0.124939
-0.463458	Open source. Supports	-0.124939
-0.358787	or PSDK). Supports	-0.124939
-0.358787	optimized yet. Supports	-0.124939
-0.358787	possible workaround. Supports	-0.124939
-2.891566	in the choice	-0.124939
-2.019874	that the choice	-0.425969
-0.900028	offer the choice	-0.124939
-0.900028	Today, the choice	-0.124939
-0.595723	platform The choice	-0.124939
-0.890507	applications. The choice	-0.124939
-0.595723	values. The choice	-0.124939
-0.595723	accelerators The choice	-0.124939
-0.595723	algorithm. The choice	-0.124939
-0.699158	a good choice	-0.602060
-0.908416	very good choice	-0.124939
-1.586850	the optimal choice	-0.124939
-1.048840	a suitable choice	-0.124939
-1.203386	point is 1.	-0.124939
-0.680054	0 and 1.	-0.124939
-0.896678	N = 1.	-0.124939
-0.598842	reciprocal_divisor = 1.	-0.124939
-0.389237	0 or 1.	-0.124939
-0.896639	evict number 1.	-0.124939
-0.659754	this problem: 1.	-0.124939
-0.358873	CPU dispatching: 1.	-0.124939
-0.358873	function local: 1.	-0.124939
-0.358873	five manuals: 1.	-0.124939
-0.358873	are satisfied: 1.	-0.124939
-2.225242	of the STL	-0.124939
-2.320439	in the STL	-0.124939
-1.487420	However, the STL	-0.124939
-0.898589	fact, the STL	-0.124939
-1.964193	used in STL	-0.124939
-0.900259	matrix in STL	-0.124939
-1.708817	in an STL	-0.124939
-1.274979	into an STL	-0.124939
-1.631854	not use STL	-0.124939
-0.599937	The other STL	-0.124939
-0.595223	STL. Some STL	-0.124939
-0.527249	four objects. STL	-0.124939
-0.659811	the container. STL	-0.124939
-2.053764	it is intended	-0.124939
-2.227985	function is intended	-0.124939
-2.122641	code is intended	-0.124939
-2.350277	This is intended	-0.124939
-1.997494	It is intended	-0.124939
-1.185069	handling is intended	-0.124939
-1.275810	feature is intended	-0.124939
-0.595616	interposition is intended	-0.124939
-2.247475	that are intended	-0.124939
-0.898326	examples are intended	-0.124939
-0.901147	vectorized as intended	-0.124939
-2.589745	is not intended	-0.124939
-1.374465	slower than intended	-0.124939
-0.828254	processing unit intended	-0.124939
-1.203254	addresses of dynamically	-0.124939
-2.140143	stored in dynamically	-0.124939
-1.912911	must be dynamically	-0.124939
-2.278008	such as dynamically	-0.124939
-1.073136	to different dynamically	-0.124939
-0.908238	is allocated dynamically	-0.124939
-0.391995	be allocated dynamically	-0.301030
-0.886682	many small dynamically	-0.124939
-1.504046	clock frequency dynamically	-0.124939
-1.098586	to align dynamically	-0.124939
-0.143390	12.8 Aligning dynamically	-0.425969
-0.504901	of aligning dynamically	-0.124939
-0.358844	may vary dynamically	-0.124939
-1.379269	sequence of consecutive	-0.124939
-1.201686	identified by consecutive	-0.124939
-0.597164	covers 64 consecutive	-0.124939
-0.592991	calculates four consecutive	-0.124939
-0.058216	in eight consecutive	-0.726999
-0.028133	Load eight consecutive	-1.028029
-2.239237	then the profiler	-0.124939
-1.444285	including the profiler	-0.124939
-2.220047	with a profiler	-0.124939
-0.896810	way a profiler	-0.124939
-0.599537	Use a profiler	-0.425969
-1.069794	include a profiler	-0.124939
-0.594248	programs. The profiler	-0.124939
-0.202546	sampling: The profiler	-0.425969
-0.887604	processes. The profiler	-0.124939
-0.594248	millisecond. The profiler	-0.124939
-0.594248	Debugging. The profiler	-0.124939
-0.600457	153. A profiler	-0.124939
-0.557753	CPUs. Intel's profiler	-0.124939
-0.358930	VTune; AMD's profiler	-0.124939
-1.492156	memory to become	-0.124939
-1.660634	point to become	-0.124939
-1.369785	certain to become	-0.124939
-0.897846	space to become	-0.124939
-1.713705	code can become	-0.124939
-0.596963	measurements can become	-0.124939
-0.596963	unequally can become	-0.124939
-1.060068	computers have become	-0.124939
-0.595595	projects have become	-0.124939
-0.600445	feature will become	-0.124939
-0.600438	block then become	-0.124939
-0.579459	way has become	-0.124939
-0.579459	space has become	-0.124939
-0.579459	platform has become	-0.124939
-0.579459	heap has become	-0.124939
-1.065413	can easily become	-0.124939
-1.918912	and a Windows,	-0.124939
-1.299712	example, in Windows,	-0.124939
-1.672555	compiler for Windows,	-0.124939
-0.893052	etc. for Windows,	-0.124939
-0.502114	guide for Windows,	-0.425969
-0.901229	platforms with Windows,	-0.124939
-1.628471	and 64-bit Windows,	-0.124939
-1.305744	In 64-bit Windows,	-0.124939
-0.598567	short. In Windows,	-0.124939
-1.475522	for 32-bit Windows,	-0.124939
-0.863025	// 32-bit Windows,	-0.124939
-1.817395	operating systems Windows,	-0.124939
-0.589986	Linux, Mac Windows,	-0.124939
-0.575552	and 16-bit Windows,	-0.124939
-0.541114	this. (In Windows,	-0.124939
-2.730092	if the index	-0.124939
-0.601562	multiplying the index	-0.124939
-0.902126	Check that index	-0.124939
-0.601053	j as index	-0.124939
-0.601024	i. This index	-0.124939
-1.464115	with an index	-0.124939
-0.596714	plus an index	-0.124939
-1.366712	the array index	-0.124939
-0.810875	an array index	-0.124939
-0.546169	[] array index	-0.124939
-1.155981	by their index	-0.124939
-0.670450	the last index	-0.124939
-0.358887	write FatalAppExitA(0,"Array index	-0.124939
-2.100699	on a modern	-0.124939
-0.598682	operations of modern	-0.124939
-1.424048	speed of modern	-0.124939
-0.598682	core of modern	-0.124939
-0.598682	capabilities of modern	-0.124939
-0.598682	complexity of modern	-0.124939
-0.601581	inefficient. The modern	-0.124939
-2.294233	is that modern	-0.124939
-1.493632	is because modern	-0.124939
-1.024466	by all modern	-0.124939
-0.583045	why all modern	-0.124939
-0.865858	almost all modern	-0.124939
-0.897784	with most modern	-0.124939
-0.591464	execution All modern	-0.124939
-0.589583	12. Most modern	-0.124939
-0.550676	Perl. Several modern	-0.124939
-1.529997	one that gives	-0.124939
-1.067712	method that gives	-0.124939
-1.067712	option that gives	-0.124939
-2.144262	because it gives	-0.124939
-1.985211	of code gives	-0.124939
-0.600989	clock. This gives	-0.124939
-0.599806	double which gives	-0.124939
-2.437759	instruction set gives	-0.124939
-1.190780	these two gives	-0.124939
-0.892827	it often gives	-0.124939
-1.523188	32-bit systems gives	-0.124939
-0.584139	calculation here gives	-0.124939
-0.570416	library, SSE4.1 gives	-0.124939
-0.902769	VIA CPUs" gives	-0.124939
-0.659639	all 0's gives	-0.124939
-0.463495	= N&(N-1) gives	-0.124939
-1.314108	{ // Loop	-0.124939
-0.587227	branch // Loop	-0.124939
-0.201113	Table // Loop	-0.425969
-0.587227	TILESIZE // Loop	-0.124939
-2.787258	x x Loop	-0.124939
-2.176973	} } Loop	-0.124939
-1.063522	execution time. Loop	-0.124939
-1.334832	the compiler. Loop	-0.124939
-1.048593	this case. Loop	-0.124939
-0.999479	unroll factor. Loop	-0.124939
-0.463568	is eliminated. Loop	-0.124939
-0.358873	Example 12.4a. Loop	-0.124939
-0.358873	Example 8.23a. Loop	-0.124939
-1.078610	transfer is avoided	-0.124939
-1.451961	can be avoided	-0.726999
-1.342162	should be avoided	-0.301030
-1.631495	preferably be avoided	-0.124939
-1.112298	sometimes be avoided	-0.124939
-0.370105	definitely be avoided	-0.124939
-2.150867	is to turn	-0.124939
-1.181416	program to turn	-0.124939
-1.810990	has to turn	-0.124939
-0.891590	user to turn	-0.124939
-1.557781	useful to turn	-0.124939
-1.070211	recommended to turn	-0.301030
-0.600601	using and turn	-0.124939
-0.600601	versions and turn	-0.124939
-0.601744	which in turn	-0.124939
-2.231744	you can turn	-0.124939
-0.601292	calculations or turn	-0.124939
-1.674837	Do not turn	-0.124939
-0.600827	until you turn	-0.124939
-0.600432	RTTI then turn	-0.124939
-2.760690	if the inlining	-0.124939
-0.902362	p and inlining	-0.124939
-0.601605	request for inlining	-0.124939
-0.891927	of function inlining	-0.124939
-1.344066	and function inlining	-0.124939
-0.592738	do function inlining	-0.124939
-0.376243	function by inlining	-0.425969
-1.483109	avoided by inlining	-0.124939
-1.449132	improved by inlining	-0.124939
-1.881277	This makes inlining	-0.124939
-0.495273	method Function inlining	-0.124939
-0.495273	function. Function inlining	-0.124939
-0.495273	copy Function inlining	-0.124939
-0.495273	about. Function inlining	-0.124939
-3.136252	of the size.	-0.124939
-0.902094	specifying the size.	-0.124939
-1.297129	speed or size.	-0.124939
-1.985805	of code size.	-0.124939
-1.664736	the vector size.	-0.124939
-0.586155	by vector size.	-0.124939
-0.586155	larger vector size.	-0.124939
-1.136781	the cache size.	-0.124939
-0.846274	and cache size.	-0.124939
-1.228137	level-1 cache size.	-0.124939
-1.285943	of variable size.	-0.124939
-1.304686	vector register size.	-0.124939
-0.578616	available register size.	-0.124939
-1.693717	a specific size.	-0.124939
-0.886374	matrix line size.	-0.124939
-2.058338	if the network	-0.124939
-2.058239	where the network	-0.124939
-2.711463	in a network	-0.124939
-2.083017	on a network	-0.124939
-0.601787	interfaces to network	-0.124939
-1.198215	access and network	-0.124939
-1.371383	files and network	-0.124939
-0.601581	controlled. The network	-0.124939
-1.077728	times for network	-0.124939
-0.901496	input or network	-0.124939
-0.601139	software with network	-0.124939
-1.362921	depend on network	-0.124939
-1.279522	relies on network	-0.124939
-0.463568	accessing databases, network	-0.124939
-0.601916	BSD, the slow	-0.124939
-1.853061	This is slow	-0.425969
-2.135441	which is slow	-0.124939
-1.891168	and a slow	-0.124939
-1.635069	with a slow	-0.124939
-1.077486	comparisons are slow	-0.124939
-1.375675	CPUs with slow	-0.124939
-0.891629	calls may slow	-0.124939
-0.596292	writes may slow	-0.124939
-0.599826	setup but slow	-0.124939
-0.595936	lookup operations slow	-0.124939
-0.498830	have particularly slow	-0.124939
-0.498830	any particularly slow	-0.124939
-0.599031	(a + b)	-0.124939
-0.598797	a, float b)	-0.124939
-0.597782	!(a < b)	-0.124939
-1.570390	const & b)	-0.124939
-0.116200	(2n / b)	-0.124939
-0.882868	a : b)	-0.124939
-0.587295	!(a || b)	-0.124939
-0.587318	c > b)	-0.124939
-0.580656	(a >= b)	-0.124939
-0.067503	a, bool b)	-0.726999
-0.601696	> and >=	-0.124939
-0.574959	and i >=	-0.124939
-0.574959	|| i >=	-0.124939
-0.574959	2.0; i >=	-0.124939
-0.574990	if (i >=	-0.124939
-0.818507	= (a >=	-0.124939
-0.015540	if (level >=	-0.425969
-0.090158	{ (iset >=	-0.124939
-0.090158	&SelectAddMul_AVX2; (iset >=	-0.124939
-0.090158	&SelectAddMul_SSE41; (iset >=	-0.124939
-0.659840	((unsigned int)i >=	-0.124939
-2.765833	of the desired	-0.124939
-1.520482	to the desired	-0.221849
-1.920924	for the desired	-0.124939
-1.481744	put the desired	-0.124939
-1.435297	enable the desired	-0.124939
-0.203319	obtain the desired	-0.124939
-1.078491	initialized to desired	-0.124939
-1.445204	problems and desired	-0.124939
-0.601200	type with desired	-0.124939
-1.433680	are used. Such	-0.124939
-0.573138	either way. Such	-0.124939
-0.562812	application software. Such	-0.124939
-0.557577	soft processor. Such	-0.124939
-0.557530	definition language. Such	-0.124939
-1.135936	dependency chain. Such	-0.124939
-0.527122	are running. Such	-0.124939
-0.659639	the market. Such	-0.124939
-0.143359	same chip. Such	-0.124939
-0.143359	CPU chip. Such	-0.124939
-0.358816	system standards. Such	-0.124939
-0.358816	hardware identification. Such	-0.124939
-0.358816	computer games. Such	-0.124939
-0.358816	or __declspec(thread). Such	-0.124939
-0.358816	not reproducible. Such	-0.124939
-2.299095	use the #pragma	-0.124939
-2.453301	when the #pragma	-0.124939
-0.896346	__restrict or #pragma	-0.124939
-0.598675	vectorize, or #pragma	-0.124939
-0.601255	Optimize function #pragma	-0.124939
-1.496259	then use #pragma	-0.124939
-1.061308	vector always #pragma	-0.124939
-0.590788	Noncached write #pragma	-0.124939
-0.530589	is aligned #pragma	-0.124939
-0.892605	vector aligned #pragma	-0.124939
-0.997970	vector nontemporal #pragma	-0.124939
-0.434966	ivdep __restrict #pragma	-0.124939
-0.434966	noalias) __restrict #pragma	-0.124939
-0.358859	not aliased #pragma	-0.124939
-0.358859	("internal"))) Vectorize #pragma	-0.124939
-1.008371	system code. Dynamic	-0.124939
-1.008371	extra code. Dynamic	-0.124939
-1.530789	is used. Dynamic	-0.124939
-1.779830	memory allocation Dynamic	-0.124939
-1.557041	less efficient. Dynamic	-0.124939
-1.161598	end user. Dynamic	-0.124939
-1.281811	memory allocation. Dynamic	-0.124939
-0.835610	caching inefficient. Dynamic	-0.124939
-0.541092	systems). 28 Dynamic	-0.124939
-0.527165	optimization are. Dynamic	-0.124939
-0.726584	is limited. Dynamic	-0.124939
-0.143368	memory. 9.6 Dynamic	-0.124939
-0.143368	90 9.6 Dynamic	-0.124939
-0.143368	19 3.6 Dynamic	-0.124939
-0.143368	acceptable. 3.6 Dynamic	-0.124939
-0.898595	seldom used functions,	-0.124939
-1.069736	in library functions,	-0.124939
-0.894695	with member functions,	-0.124939
-1.177029	to its functions,	-0.124939
-0.594273	inheritance, virtual functions,	-0.124939
-1.016466	for intrinsic functions,	-0.124939
-0.498175	supports intrinsic functions,	-0.124939
-0.498175	assembly-like intrinsic functions,	-0.124939
-0.570412	contains similar functions,	-0.124939
-0.567151	are pure functions,	-0.124939
-0.504921	to well-tested functions,	-0.124939
-0.107179	logarithms, exponential functions,	-0.425969
-0.107179	functions, trigonometric functions,	-0.425969
-2.445200	of the whole	-0.124939
-2.435550	and the whole	-0.124939
-2.531475	for the whole	-0.124939
-1.491820	put the whole	-0.124939
-0.599810	Test the whole	-0.124939
-0.599810	throughout the whole	-0.124939
-1.298123	take a whole	-0.124939
-0.601089	draws a whole	-0.124939
-1.582672	option for whole	-0.124939
-1.725024	support for whole	-0.124939
-1.881538	can do whole	-0.124939
-1.065518	feature called whole	-0.124939
-0.594587	"__attribute__((visibility("hidden")))". Use whole	-0.124939
-1.332913	of doing whole	-0.124939
-1.600737	avoid the inefficient	-0.124939
-2.103184	it is inefficient	-0.425969
-2.452467	This is inefficient	-0.124939
-0.897105	other is inefficient	-0.124939
-1.070235	storage is inefficient	-0.124939
-1.077550	comparisons are inefficient	-0.124939
-1.762616	in an inefficient	-0.124939
-1.737414	compilers have inefficient	-0.124939
-1.399590	is very inefficient	-0.124939
-1.038608	a very inefficient	-0.124939
-1.240170	be very inefficient	-0.124939
-1.089702	be quite inefficient	-0.124939
-0.550770	but quite inefficient	-0.124939
-2.383258	and the level-2	-0.124939
-1.933521	in the level-2	-0.425969
-2.477259	for the level-2	-0.124939
-2.522384	that the level-2	-0.124939
-2.186648	than the level-2	-0.124939
-2.111944	from the level-2	-0.124939
-1.543276	prevents the level-2	-0.124939
-1.905549	and a level-2	-0.124939
-0.601104	if, a level-2	-0.124939
-1.597805	cache. The level-2	-0.124939
-0.601625	stronger for level-2	-0.124939
-0.601298	Check if level-2	-0.124939
-2.013854	that the response	-0.425969
-1.655689	because the response	-0.124939
-2.303673	If the response	-0.124939
-1.196577	test the response	-0.124939
-1.599910	such a response	-0.124939
-1.444435	waiting for response	-0.124939
-0.086068	unacceptably long response	-0.425969
-0.587428	of longer response	-0.124939
-0.463622	an immediate response	-0.124939
-0.358916	and irregular response	-0.124939
-1.155733	method is described	-0.124939
-0.601483	algorithms are described	-0.124939
-0.586911	function as described	-0.124939
-1.035232	code, as described	-0.124939
-0.123229	CPUs, as described	-0.602060
-1.957867	I have described	-0.124939
-1.470988	The method described	-0.124939
-0.890515	the cases described	-0.124939
-0.877430	the methods described	-0.124939
-1.164874	the syntax described	-0.124939
-0.579202	are further described	-0.124939
-0.659754	preceding paragraph described	-0.124939
-0.913728	power of 2.	-0.221849
-1.434869	powers of 2.	-0.124939
-2.216697	will be 2.	-0.124939
-0.377448	i by 2.	-0.425969
-1.058871	out by 2.	-0.124939
-0.581989	Mac platforms. 2.	-0.124939
-0.570534	code version. 2.	-0.124939
-0.358902	different meaning. 2.	-0.124939
-0.358902	the loader. 2.	-0.124939
-0.358902	or PathScale. 2.	-0.124939
-1.554283	types of variables.	-0.124939
-2.637741	the same variables.	-0.124939
-1.289568	of all variables.	-0.124939
-1.196664	on integer variables.	-0.124939
-0.896479	use static variables.	-0.124939
-1.365974	and unsigned variables.	-0.124939
-1.125394	of register variables.	-0.124939
-0.965553	for register variables.	-0.124939
-1.234473	point register variables.	-0.124939
-1.062273	for these variables.	-0.124939
-0.875525	with induction variables.	-0.124939
-0.586606	any public variables.	-0.124939
-0.744144	or global variables.	-0.124939
-0.744144	called global variables.	-0.124939
-0.582072	of consecutive variables.	-0.124939
-2.465969	number of lines	-0.124939
-1.050385	the cache lines	-0.124939
-1.260392	of cache lines	-0.124939
-0.531293	different cache lines	-0.124939
-0.531293	any cache lines	-0.124939
-0.531293	these cache lines	-0.124939
-0.116482	four cache lines	-0.301030
-0.897360	organized into lines	-0.124939
-0.597391	the 4 lines	-0.124939
-0.891181	to 16 lines	-0.124939
-0.594239	128. These lines	-0.124939
-1.804363	a few lines	-0.124939
-0.900684	around the hot	-0.124939
-1.296232	finding the hot	-0.124939
-0.600855	once the hot	-0.124939
-0.900684	isolate the hot	-0.124939
-2.613443	is a hot	-0.124939
-0.600343	When a hot	-0.124939
-0.600343	identify a hot	-0.124939
-1.639572	functions and hot	-0.124939
-1.740421	function or hot	-0.124939
-2.043654	in this hot	-0.124939
-0.598236	identifies any hot	-0.124939
-0.885286	to find hot	-0.425969
-1.321827	for finding hot	-0.124939
-0.358887	for identifying hot	-0.124939
-1.036319	integer types Unfortunately,	-0.124939
-1.029867	never called. Unfortunately,	-0.124939
-1.152567	= 2; Unfortunately,	-0.124939
-1.462495	function calls. Unfortunately,	-0.124939
-1.106181	container classes. Unfortunately,	-0.124939
-0.851480	these purposes. Unfortunately,	-0.124939
-1.376181	CPU dispatching. Unfortunately,	-0.124939
-0.557557	virtual table. Unfortunately,	-0.124939
-0.957386	processor core. Unfortunately,	-0.124939
-0.550564	do this. Unfortunately,	-0.124939
-0.541093	functions. 80 Unfortunately,	-0.124939
-0.726484	AMD CodeAnalyst. Unfortunately,	-0.124939
-0.358801	cross-platform portability. Unfortunately,	-0.124939
-0.358801	and lrint. Unfortunately,	-0.124939
-0.358801	page 132. Unfortunately,	-0.124939
-0.847314	Gnu C++ v.	-0.124939
-0.847314	PathScale C++ v.	-0.124939
-0.847314	PGI C++ v.	-0.124939
-0.841895	C++ compiler, v.	-0.124939
-0.357388	C++ Compiler v.	-0.124939
-0.377926	Mars Compiler v.	-0.124939
-0.504998	Watcom C/C++ v.	-0.124939
-0.659725	Codeplay VectorC v.	-0.124939
-0.358859	works (gcc v.	-0.124939
-0.358859	Library (MKL v.	-0.124939
-0.358859	2.8. Asmlib: v.	-0.124939
-0.358859	Borland bcc, v.	-0.124939
-0.358859	Gnu: Glibc v.	-0.124939
-0.358859	studio 2008, v.	-0.124939
-0.600996	a. This operation	-0.124939
-1.948501	the same operation	-0.425969
-1.881753	floating point operation	-0.124939
-1.658045	in one operation	-0.124939
-1.184689	the & operation	-0.124939
-1.858410	a single operation	-0.124939
-1.058905	a store operation	-0.124939
-0.586588	Each graphics operation	-0.124939
-1.138713	a shift operation	-0.124939
-0.575573	Each 128-bit operation	-0.124939
-0.541128	bitwise AND operation	-0.124939
-0.463513	complex digital operation	-0.124939
-0.358830	an illegal operation	-0.124939
-2.090323	of the code,	-0.124939
-2.888533	in the code,	-0.124939
-1.203028	start to code,	-0.124939
-2.192949	more efficient code,	-0.124939
-0.882296	for optimizing code,	-0.124939
-1.328758	an intermediate code,	-0.124939
-0.550807	frameworks, intermediate code,	-0.124939
-1.304810	the source code,	-0.124939
-0.981785	position- independent code,	-0.124939
-0.541135	for CPU-intensive code,	-0.124939
-0.463568	with legacy code,	-0.124939
-0.358873	removing superfluous code,	-0.124939
-2.250592	then the instance	-0.124939
-0.900817	whenever an instance	-0.124939
-1.133036	than one instance	-0.124939
-0.938241	have one instance	-0.124939
-0.804244	make one instance	-0.124939
-1.419550	only one instance	-0.124939
-0.549831	get one instance	-0.124939
-0.193160	needs one instance	-0.425969
-0.897893	with each instance	-0.124939
-1.275431	A template instance	-0.124939
-1.235676	a new instance	-0.425969
-1.670654	the next instance	-0.124939
-0.586675	classes. Each instance	-0.124939
-1.290224	library that comes	-0.124939
-0.599877	algorithm that comes	-0.124939
-0.601218	initialized or comes	-0.124939
-1.769893	when it comes	-0.124939
-2.097467	because it comes	-0.124939
-2.207500	The compiler comes	-0.124939
-0.600414	source. It comes	-0.124939
-0.599817	(STL) which comes	-0.124939
-1.192942	register size comes	-0.124939
-0.592786	This advantage comes	-0.124939
-0.588975	new model comes	-0.124939
-0.588520	integer parameter comes	-0.124939
-0.573215	considerable delay comes	-0.124939
-0.573186	and BSD comes	-0.124939
-0.504881	main feedback comes	-0.124939
-3.205415	of the fact	-0.124939
-1.052620	is in fact	-0.124939
-0.148690	are in fact	-0.124939
-1.283144	may in fact	-0.124939
-0.593025	when in fact	-0.124939
-0.593025	had in fact	-0.124939
-1.064770	way. The fact	-0.124939
-0.893430	underflow. The fact	-0.124939
-0.597203	bit. The fact	-0.124939
-0.597203	20. The fact	-0.124939
-1.858322	of this fact	-0.124939
-2.081605	is to find	-0.124939
-1.318776	order to find	-0.301030
-1.388599	want to find	-0.425969
-0.593172	bytes to find	-0.124939
-1.928874	able to find	-0.124939
-1.114219	difficult to find	-0.124939
-0.202328	profiler to find	-0.425969
-1.734981	can also find	-0.124939
-1.634354	you cannot find	-0.124939
-0.463677	call. (2) find	-0.124939
-2.282412	is to rely	-0.124939
-1.199894	convenient to rely	-0.124939
-1.282240	compilers that rely	-0.124939
-1.067810	optimizations that rely	-0.124939
-0.895476	Algorithms that rely	-0.124939
-1.533514	you can rely	-0.425969
-0.599662	threads should rely	-0.124939
-1.067198	function cannot rely	-0.124939
-1.376051	you cannot rely	-0.124939
-1.361193	You cannot rely	-0.124939
-0.596066	cannot always rely	-0.124939
-0.594930	branch must rely	-0.124939
-0.567153	possible. Don't rely	-0.124939
-0.358887	can surely rely	-0.124939
-2.223986	{ // No	-0.124939
-1.788072	} // No	-0.124939
-0.897314	frame- pointer No	-0.124939
-1.007350	execution time. No	-0.124939
-1.555907	compile time. No	-0.124939
-1.478156	in memory. No	-0.124939
-0.591118	actually used. No	-0.124939
-0.583027	iterations are: No	-0.124939
-1.246102	template parameter. No	-0.124939
-0.566988	separate storage. No	-0.124939
-0.562912	linear array. No	-0.124939
-0.504881	2.1.7, 2004. No	-0.124939
-0.358830	handling /EHs- No	-0.124939
-0.358830	page 53). No	-0.124939
-0.358830	/Qipo -ipo No	-0.124939
-0.600207	example to produce	-0.124939
-1.434350	sure to produce	-0.124939
-0.600207	CLR, to produce	-0.124939
-1.074879	operators that produce	-0.124939
-0.599904	Programs that produce	-0.124939
-0.901865	output can produce	-0.124939
-1.929142	do not produce	-0.124939
-1.314698	does not produce	-0.425969
-0.900659	and may produce	-0.124939
-1.810154	compiler will produce	-0.124939
-0.599662	compiler should produce	-0.124939
-0.599423	Mars compilers produce	-0.124939
-1.109751	Boolean operators produce	-0.124939
-1.352925	bitwise operators produce	-0.124939
-3.133284	of the position-independent	-0.124939
-1.444072	off the position-independent	-0.124939
-1.378768	costs of position-independent	-0.124939
-0.379607	linking and position-independent	-0.425969
-0.901008	compiled as position-independent	-0.124939
-0.595043	often use position-independent	-0.124939
-0.889167	systems use position-independent	-0.124939
-0.600141	X make position-independent	-0.124939
-0.599301	not using position-independent	-0.124939
-0.596311	objects without position-independent	-0.124939
-1.458865	is always position-independent	-0.124939
-0.884980	compiler uses position-independent	-0.124939
-0.858365	for special position-independent	-0.124939
-0.358844	the burdensome position-independent	-0.124939
-1.677163	code for vectorization	-0.124939
-0.795377	that make vectorization	-0.124939
-2.049793	you want vectorization	-0.124939
-0.593040	how advantageous vectorization	-0.124939
-0.592515	correctly whether vectorization	-0.124939
-0.898437	and automatic vectorization	-0.124939
-0.644710	The automatic vectorization	-0.124939
-0.453932	where automatic vectorization	-0.124939
-0.453932	intrinsics, automatic vectorization	-0.124939
-0.267742	expressions Automatic vectorization	-0.124939
-0.267742	dispatch Automatic vectorization	-0.124939
-0.267742	12.1a. Automatic vectorization	-0.124939
-0.113557	12.3 Automatic vectorization	-0.124939
-2.084218	you are including	-0.124939
-0.601160	compiler by including	-0.124939
-0.888384	mathematical calculations including	-0.124939
-1.645618	and VIA including	-0.124939
-0.863759	32-bit Windows, including	-0.124939
-1.203062	the code, including	-0.124939
-0.179724	the strings including	-0.425969
-0.788702	turned on, including	-0.124939
-0.541111	many platforms, including	-0.124939
-0.541111	metaprogramming features, including	-0.124939
-0.527198	on n, including	-0.124939
-0.358816	static data, including	-0.124939
-0.358816	same computer, including	-0.124939
-0.358816	software package, including	-0.124939
-1.882440	useful for checking	-0.124939
-0.378809	operators for checking	-0.425969
-0.601228	overflow by checking	-0.124939
-2.053229	is no checking	-0.124939
-1.640970	have no checking	-0.124939
-0.596379	loop without checking	-0.124939
-0.582043	some syntax checking	-0.124939
-0.394160	of bounds checking	-0.124939
-0.284835	with bounds checking	-0.124939
-0.065804	// Bounds checking	-0.124939
-0.031657	14.2 Bounds checking	-0.124939
-0.065804	fragmentation. Bounds checking	-0.124939
-2.317755	because the out-of-order	-0.124939
-1.495653	However, the out-of-order	-0.124939
-1.295070	Using the out-of-order	-0.124939
-0.600861	general, the out-of-order	-0.124939
-1.243235	advantage of out-of-order	-0.124939
-1.203344	thanks to out-of-order	-0.124939
-0.202636	microprocessor with out-of-order	-0.425969
-0.888474	Microprocessors with out-of-order	-0.124939
-1.723659	have no out-of-order	-0.124939
-1.881376	can do out-of-order	-0.124939
-1.165817	from doing out-of-order	-0.124939
-0.579310	which prevents out-of-order	-0.124939
-0.902529	portable to platforms	-0.124939
-1.051810	to different platforms	-0.124939
-1.823835	for different platforms	-0.124939
-0.373043	to other platforms	-0.124939
-0.584357	on other platforms	-0.124939
-1.072552	about which platforms	-0.124939
-1.686985	on all platforms	-0.124939
-0.598769	supporting multiple platforms	-0.124939
-1.489839	most common platforms	-0.124939
-1.054365	for Linux platforms	-0.124939
-1.426930	and Mac platforms	-0.124939
-0.585102	All x86 platforms	-0.124939
-0.358859	and ARM platforms	-0.124939
-2.095853	set is particularly	-0.124939
-1.435730	speed is particularly	-0.124939
-1.289602	allocation is particularly	-0.124939
-0.599750	representation is particularly	-0.124939
-2.949515	can be particularly	-0.124939
-1.570752	that are particularly	-0.124939
-1.528718	operations are particularly	-0.124939
-0.594143	strings are particularly	-0.124939
-0.594143	drivers are particularly	-0.124939
-1.294265	CPUs have particularly	-0.124939
-0.600442	(www.agner.org/optimize/testp.zip). A particularly	-0.124939
-1.067896	or any particularly	-0.124939
-0.593467	implementation works particularly	-0.124939
-2.383000	function is given	-0.124939
-1.593843	class is given	-0.124939
-1.403983	for a given	-0.124939
-2.796294	can be given	-0.124939
-2.445265	may be given	-0.124939
-1.061174	classes are given	-0.124939
-0.595974	details are given	-0.124939
-0.595974	divisions are given	-0.124939
-0.595974	experiment are given	-0.124939
-0.601076	access, as given	-0.124939
-1.268228	not been given	-0.124939
-0.266222	the advice given	-0.124939
-3.013965	in the output	-0.124939
-1.202645	compile the output	-0.124939
-0.601683	input and output	-0.124939
-0.601602	file. The output	-0.124939
-0.601064	Booleans as output	-0.124939
-1.671104	to an output	-0.124939
-2.621475	the compiler output	-0.124939
-1.074315	are then output	-0.124939
-0.078880	the assembly output	-0.124939
-0.676448	The assembly output	-0.124939
-0.772168	an assembly output	-0.124939
-2.818775	of the level-1	-0.124939
-1.933521	in the level-1	-0.249877
-2.477259	for the level-1	-0.124939
-2.186648	than the level-1	-0.124939
-2.003520	where the level-1	-0.124939
-1.288003	both the level-1	-0.124939
-0.896528	reload the level-1	-0.124939
-2.717589	is a level-1	-0.124939
-1.077877	than for level-1	-0.124939
-2.644737	the same level-1	-0.124939
-1.393570	the entire level-1	-0.124939
-3.195064	of the resources.	-0.124939
-0.856258	waste of resources.	-0.124939
-2.636751	the same resources.	-0.124939
-1.369282	or other resources.	-0.124939
-0.893816	are critical resources.	-0.124939
-0.889072	of extra resources.	-0.124939
-0.593827	of allocated resources.	-0.124939
-0.592323	uses few resources.	-0.124939
-0.580657	or network resources.	-0.124939
-0.579145	with limited resources.	-0.124939
-0.541070	of computing resources.	-0.124939
-0.527143	were scarce resources.	-0.124939
-0.358830	have ample resources.	-0.124939
-2.775885	it is outside	-0.124939
-1.200287	i is outside	-0.124939
-1.073770	stack memory outside	-0.124939
-0.599791	function but outside	-0.124939
-0.598181	temporary variable outside	-0.124939
-0.595879	extra operations outside	-0.124939
-0.594366	last element outside	-0.124939
-1.340487	for overflow outside	-0.124939
-1.652900	be done outside	-0.124939
-1.617304	loop counter outside	-0.124939
-1.142780	are declared outside	-0.124939
-0.583064	calculations go outside	-0.124939
-0.573243	variables defined outside	-0.124939
-0.764299	can move outside	-0.124939
-3.196202	of the task	-0.124939
-1.660811	when a task	-0.124939
-1.626361	where a task	-0.124939
-1.294113	put a task	-0.124939
-1.640284	cost of task	-0.124939
-0.601632	interrupts and task	-0.124939
-0.601018	events as task	-0.124939
-0.601003	mouse. This task	-0.124939
-1.587716	for this task	-0.124939
-1.367083	to each task	-0.124939
-1.858655	a single task	-0.124939
-0.858525	a given task	-0.124939
-0.575438	checking. Any task	-0.124939
-0.358844	the essential task	-0.124939
-2.213002	code is limited	-0.124939
-0.897119	CPU is limited	-0.124939
-1.487478	performance is limited	-0.124939
-1.192259	frequency is limited	-0.124939
-0.599064	inputs is limited	-0.124939
-2.571587	is a limited	-0.124939
-2.335674	to a limited	-0.124939
-1.488463	only a limited	-0.124939
-1.194506	within a limited	-0.124939
-2.531284	to be limited	-0.124939
-0.901613	overloaded or limited	-0.124939
-0.601179	devices with limited	-0.124939
-0.594489	storage A limited	-0.124939
-0.888077	expensive. A limited	-0.124939
-1.203159	automatically in vectorized	-0.124939
-0.601592	prone. The vectorized	-0.124939
-1.690720	functions for vectorized	-0.124939
-2.148144	used for vectorized	-0.124939
-2.518620	can be vectorized	-0.124939
-1.728677	also be vectorized	-0.124939
-1.285385	cannot be vectorized	-0.124939
-0.594414	now be vectorized	-0.124939
-2.420473	to use vectorized	-0.124939
-0.597627	a faster vectorized	-0.124939
-1.069776	Same example, vectorized	-0.124939
-0.726685	is indeed vectorized	-0.124939
-0.358887	Taylor series, vectorized	-0.124939
-2.711667	It is sometimes	-0.124939
-0.601139	zero is sometimes	-0.124939
-0.900203	efficient, and sometimes	-0.124939
-0.600614	inconsistent and sometimes	-0.124939
-1.430634	processors are sometimes	-0.124939
-0.599661	macros are sometimes	-0.124939
-0.592515	optimization can sometimes	-0.124939
-0.884205	conversions can sometimes	-0.124939
-0.592515	Divisions can sometimes	-0.124939
-0.592515	139 can sometimes	-0.124939
-0.592515	shuffling can sometimes	-0.124939
-2.208173	The compiler sometimes	-0.124939
-0.872814	a framework sometimes	-0.124939
-0.557761	unreliable. They sometimes	-0.124939
-2.517044	and the local	-0.124939
-2.286828	use the local	-0.124939
-2.087223	make the local	-0.124939
-1.732367	to a local	-0.124939
-1.202564	data and local	-0.124939
-1.068718	name for local	-0.124939
-0.598544	destructors for local	-0.124939
-0.598544	lookups for local	-0.124939
-0.600038	Make functions local	-0.124939
-1.490385	with other local	-0.124939
-1.366746	to all local	-0.124939
-0.580736	data, including local	-0.124939
-0.726685	function parameters, local	-0.124939
-2.949773	of the costs	-0.124939
-2.468340	on the costs	-0.124939
-1.660868	about the costs	-0.124939
-0.600168	plus the costs	-0.124939
-0.899314	avoiding the costs	-0.124939
-0.600168	against the costs	-0.124939
-1.679056	kinds of costs	-0.124939
-0.598658	exception. The costs	-0.124939
-0.203437	1.1 The costs	-0.425969
-0.598692	STL also costs	-0.124939
-0.597987	inherent performance costs	-0.124939
-0.564810	CPUs. These costs	-0.124939
-0.564810	another. These costs	-0.124939
-1.444078	instance of S1	-0.124939
-1.201040	instances of S1	-0.124939
-0.600752	Func() { S1	-0.124939
-1.404296	unused bytes S1	-0.124939
-0.593029	19 }; S1	-0.124939
-1.386235	= 100; S1	-0.124939
-0.257314	__declspec(align(16)) struct S1	-0.124939
-0.257314	14.9 struct S1	-0.124939
-0.257314	8.15a struct S1	-0.124939
-0.257314	8.15b struct S1	-0.124939
-0.257314	7.35b struct S1	-0.124939
-0.257314	7.35a struct S1	-0.124939
-0.107190	double b;}; S1	-0.124939
-0.601794	library of math	-0.124939
-1.144552	of vector math	-0.124939
-0.645110	Intel vector math	-0.124939
-1.074748	long vector math	-0.124939
-0.572919	short vector math	-0.124939
-1.933983	the Intel math	-0.124939
-1.490144	most common math	-0.124939
-0.594086	The AMD math	-0.124939
-0.886385	high precision math	-0.124939
-0.885990	best optimized math	-0.124939
-1.247641	for fast math	-0.124939
-0.575546	A little math	-0.124939
-2.176814	value of temp	-0.124939
-1.078383	register to temp	-0.124939
-1.742669	b = temp	-0.124939
-1.752630	c = temp	-0.124939
-0.596398	c[i] = temp	-0.124939
-1.948183	i++) { temp	-0.124939
-1.695633	will make temp	-0.124939
-1.277061	of register temp	-0.124939
-0.579172	will save temp	-0.124939
-0.460090	= temp; temp	-0.124939
-0.326366	b, temp; temp	-0.124939
-0.326366	c, temp; temp	-0.124939
-0.326366	a[100], temp; temp	-0.124939
-0.358887	= &list[0]; temp	-0.124939
-3.080953	of the inlined	-0.124939
-2.057782	to the inlined	-0.124939
-2.301139	code is inlined	-0.124939
-0.902558	names of inlined	-0.124939
-2.344290	to be inlined	-0.124939
-2.369817	may be inlined	-0.124939
-1.934545	cannot be inlined	-0.124939
-1.077486	operators are inlined	-0.124939
-1.870471	of an inlined	-0.124939
-1.635339	to an inlined	-0.124939
-1.599796	are often inlined	-0.124939
-1.459368	is always inlined	-0.124939
-1.497363	is usually inlined	-0.124939
-2.730865	it is still	-0.124939
-0.899837	etc. is still	-0.124939
-0.600431	who is still	-0.124939
-1.057532	loop can still	-0.124939
-1.057532	set can still	-0.124939
-0.888538	solution can still	-0.124939
-0.594723	above can still	-0.124939
-1.597725	where it still	-0.124939
-2.650794	the code still	-0.124939
-1.733685	it will still	-0.124939
-0.895565	However, we still	-0.124939
-0.592181	0x273F would still	-0.124939
-0.872729	level framework still	-0.124939
-0.557644	processing capabilities still	-0.124939
-2.247391	of the class.	-0.124939
-1.948383	inside the class.	-0.124939
-0.941796	structure or class.	-0.124939
-2.640726	the same class.	-0.124939
-1.052372	to another class.	-0.124939
-1.503201	a container class.	-0.124939
-0.791788	and child class.	-0.124939
-0.868207	its child class.	-0.124939
-1.106959	the derived class.	-0.124939
-0.567201	// parent class.	-0.124939
-0.659783	the object's class.	-0.124939
-3.061147	in the database	-0.124939
-2.082657	use a database	-0.124939
-1.374191	replace a database	-0.124939
-0.601233	network or database	-0.124939
-1.197380	data. A database	-0.124939
-1.277086	the system database	-0.124939
-0.882243	by optimizing database	-0.124939
-0.575536	etc. Optimizing database	-0.124939
-0.557616	mutexes. Open database	-0.124939
-0.152940	3.8 System database	-0.124939
-0.541146	GUI development, database	-0.124939
-0.463531	windows, mutexes, database	-0.124939
-0.358844	big registration database	-0.124939
-2.729612	if the constants	-0.124939
-0.601556	Whether the constants	-0.124939
-2.432143	number of constants	-0.124939
-1.498877	table of constants	-0.124939
-1.444257	one for constants	-0.124939
-0.600001	containing only constants	-0.124939
-0.599903	by other constants	-0.124939
-1.882075	floating point constants	-0.425969
-1.578469	the two constants	-0.124939
-1.227713	between two constants	-0.124939
-0.590435	constants. Integer constants	-0.124939
-0.573237	All identical constants	-0.124939
-0.358873	stack. String constants	-0.124939
-1.829549	If a bool	-0.124939
-1.939167	instead of bool	-0.124939
-0.594560	alignment, bytes bool	-0.124939
-0.171837	(int a, bool	-0.726999
-1.149737	float a; bool	-0.124939
-1.078610	x, y; bool	-0.124939
-0.835653	following way: bool	-0.124939
-0.504998	Example 7.11 bool	-0.124939
-0.659725	y, z; bool	-0.124939
-0.358859	Example 7.10a bool	-0.124939
-0.358859	Example 7.9a bool	-0.124939
-1.183151	a time. Do	-0.124939
-1.266732	member function. Do	-0.124939
-1.433680	are used. Do	-0.124939
-1.556778	less efficient. Do	-0.124939
-1.742473	instruction set. Do	-0.124939
-1.281571	memory allocation. Do	-0.124939
-1.106536	memory block. Do	-0.124939
-0.557577	certain optimizations. Do	-0.124939
-0.835524	linked list. Do	-0.124939
-0.504957	scarce resource. Do	-0.124939
-0.463495	+ column; Do	-0.124939
-0.358816	Enterprise editions). Do	-0.124939
-0.358816	unique key. Do	-0.124939
-0.358816	hash map. Do	-0.124939
-1.077724	inlining the frame	-0.124939
-0.601574	turning the frame	-0.124939
-1.946630	than a frame	-0.124939
-1.076274	called a frame	-0.124939
-0.804716	calls to frame	-0.124939
-1.639823	functions and frame	-0.124939
-1.903750	efficient than frame	-0.124939
-1.343538	functions. A frame	-0.124939
-0.594474	exception. A frame	-0.124939
-0.754879	a stack frame	-0.124939
-0.186806	standard stack frame	-0.124939
-0.521663	No stack frame	-0.124939
-0.597961	% 2 ==	-0.124939
-0.592973	% 128 ==	-0.124939
-1.048713	if (a ==	-0.124939
-0.107188	// INSTRSET ==	-0.124939
-0.050295	#if INSTRSET ==	-0.425969
-0.050295	#elif INSTRSET ==	-0.124939
-0.314737	= (b ==	-0.124939
-0.444342	if (b ==	-0.124939
-0.107188	|| Day ==	-0.124939
-0.659811	if (Day ==	-0.124939
-0.358902	__except (GetExceptionCode() ==	-0.124939
-1.183129	b; int d;	-0.124939
-0.596716	7 int d;	-0.124939
-1.082562	{ double d;	-0.124939
-0.121060	u; double d;	-0.602060
-0.791923	c + d;	-0.124939
-0.268353	b, c, d;	-0.124939
-0.218825	0, c, d;	-0.124939
-0.505078	union {double d;	-0.124939
-1.804529	has the special	-0.124939
-2.661772	in a special	-0.124939
-2.274343	with a special	-0.124939
-1.963558	have a special	-0.124939
-1.678783	set of special	-0.124939
-0.900240	optimal in special	-0.124939
-1.076332	except in special	-0.124939
-1.295034	libraries for special	-0.124939
-1.073185	need for special	-0.124939
-2.157188	there are special	-0.124939
-1.847678	you have special	-0.124939
-2.530115	to make special	-0.124939
-1.634216	to take special	-0.124939
-0.550676	libraries. Several special	-0.124939
-2.192850	order to prevent	-0.124939
-1.285224	way to prevent	-0.124939
-2.032014	want to prevent	-0.124939
-0.894699	overhead to prevent	-0.124939
-0.597844	Volatile to prevent	-0.124939
-1.298118	CPU and prevent	-0.124939
-0.900178	user and prevent	-0.124939
-2.070748	that can prevent	-0.124939
-2.651209	the code prevent	-0.124939
-1.630616	This will prevent	-0.124939
-0.594602	It doesn't prevent	-0.124939
-0.585097	debugging options prevent	-0.124939
-0.504982	rebooted. To prevent	-0.124939
-2.133448	by a shift	-0.124939
-2.246636	with a shift	-0.124939
-2.254452	as a shift	-0.124939
-1.768828	using a shift	-0.124939
-1.068466	operations and shift	-0.124939
-0.598459	a[i] and shift	-0.124939
-0.598459	multiply and shift	-0.124939
-1.068466	additions and shift	-0.124939
-1.770973	We can shift	-0.124939
-1.202700	constant = shift	-0.124939
-1.588324	for this shift	-0.124939
-0.600445	14.28 will shift	-0.124939
-0.837279	ebx ; shift	-0.124939
-0.567909	sign(i) ; shift	-0.124939
-1.849461	and the destructor	-0.124939
-2.630128	if the destructor	-0.124939
-0.978507	call the destructor	-0.124939
-1.660931	about the destructor	-0.124939
-2.023616	be a destructor	-0.124939
-2.220319	with a destructor	-0.124939
-1.287364	have a destructor	-0.124939
-2.024174	make a destructor	-0.124939
-0.600472	owns. A destructor	-0.124939
-1.194297	and no destructor	-0.124939
-0.887757	A virtual destructor	-0.124939
-2.211973	is to save	-0.124939
-1.525760	have to save	-0.124939
-2.211383	order to save	-0.124939
-0.598640	calculations to save	-0.124939
-2.045650	it can save	-0.124939
-2.072727	This can save	-0.124939
-1.873676	You can save	-0.124939
-1.272998	we may save	-0.124939
-2.008851	You may save	-0.124939
-0.600445	but will save	-0.124939
-0.898333	it should save	-0.124939
-1.058085	label ; save	-0.124939
-0.851690	and possibly save	-0.124939
-0.601683	register and prevents	-0.124939
-1.749064	and it prevents	-0.124939
-2.055568	because it prevents	-0.124939
-0.595998	unfortunately it prevents	-0.124939
-1.225129	memory. This prevents	-0.124939
-1.032069	thread. This prevents	-0.124939
-0.585781	one. This prevents	-0.124939
-0.585781	volatile. This prevents	-0.124939
-0.585781	compiling. This prevents	-0.124939
-1.295434	if this prevents	-0.124939
-0.600039	write instruction prevents	-0.124939
-0.599872	chain which prevents	-0.124939
-0.896325	It also prevents	-0.124939
-1.039926	integer division prevents	-0.124939
-1.951977	of the preceding	-0.221849
-2.004646	to the preceding	-0.124939
-2.370963	on the preceding	-0.124939
-1.681401	In the preceding	-0.124939
-1.336074	before the preceding	-0.124939
-1.571169	See the preceding	-0.124939
-0.601653	unwinding The preceding	-0.124939
-0.601210	correlated with preceding	-0.124939
-2.312388	use the safe	-0.124939
-2.777335	it is safe	-0.124939
-2.527117	This is safe	-0.124939
-2.711463	in a safe	-0.124939
-1.497261	not a safe	-0.124939
-1.298642	called. The safe	-0.124939
-1.238527	not be safe	-0.124939
-1.737314	but not safe	-0.124939
-2.094830	is more safe	-0.124939
-0.888777	therefore more safe	-0.124939
-1.728021	is only safe	-0.124939
-0.594202	not thread safe	-0.124939
-0.593890	is exception safe	-0.124939
-0.600601	c and d	-0.124939
-0.900178	15.1b and d	-0.124939
-1.775256	c = d	-0.124939
-1.795656	y = d	-0.124939
-1.549767	0) { d	-0.124939
-0.599262	14.20 double d	-0.124939
-0.201222	c*x + d	-0.425969
-0.820748	& b; d	-0.124939
-0.558941	&& b; d	-0.124939
-0.205647	double d; d	-0.602060
-0.358902	{ DTRUE: d	-0.124939
-0.249823	8 2.5 Choice	-0.124939
-0.249823	compilers. 2.5 Choice	-0.124939
-0.143390	6 2.3 Choice	-0.124939
-0.143390	manual. 2.3 Choice	-0.124939
-0.143390	cache. 2.2 Choice	-0.124939
-0.143390	5 2.2 Choice	-0.124939
-0.143390	platform 2.1 Choice	-0.124939
-0.143390	5 2.1 Choice	-0.124939
-0.143390	12 2.7 Choice	-0.124939
-0.143390	undocumented. 2.7 Choice	-0.124939
-0.143390	10 2.6 Choice	-0.124939
-0.143390	compiler. 2.6 Choice	-0.124939
-0.143390	optimization. 2.4 Choice	-0.124939
-0.143390	6 2.4 Choice	-0.124939
-1.653098	code to tell	-0.124939
-2.123587	possible to tell	-0.124939
-1.271018	way to tell	-0.124939
-0.887036	always to tell	-0.124939
-0.593959	declaration to tell	-0.124939
-0.593959	directive to tell	-0.124939
-0.593959	prototype to tell	-0.124939
-0.593959	forgot to tell	-0.124939
-0.593959	throw(A,B,C) to tell	-0.124939
-0.593959	novector to tell	-0.124939
-2.041077	that can tell	-0.124939
-1.745290	We can tell	-0.124939
-0.600462	references then tell	-0.124939
-2.570217	on the Pentium	-0.124939
-0.988575	on a Pentium	-0.425969
-1.291350	CPUs. The Pentium	-0.124939
-1.073348	computer. The Pentium	-0.124939
-0.901204	cycles on Pentium	-0.124939
-0.899799	prediction. A Pentium	-0.124939
-1.363701	an Intel Pentium	-0.124939
-0.594875	pattern, while Pentium	-0.124939
-0.713685	the old Pentium	-0.124939
-0.358887	the oldest Pentium	-0.124939
-0.601829	background is further	-0.124939
-2.101807	for a further	-0.124939
-1.051212	possibility for further	-0.124939
-0.592535	expected for further	-0.124939
-0.592535	Performance for further	-0.124939
-0.592535	150 for further	-0.124939
-0.592535	101 for further	-0.124939
-0.592535	153 for further	-0.124939
-0.592535	140 for further	-0.124939
-2.948457	can be further	-0.124939
-1.298261	methods are further	-0.124939
-2.651209	the code further	-0.124939
-0.600426	instructions. A further	-0.124939
-2.326568	the loop further	-0.124939
-0.601058	exception-safe code Assume	-0.124939
-2.176516	} } Assume	-0.124939
-1.072527	static static Assume	-0.124939
-1.034512	vector aligned Assume	-0.124939
-1.003788	memory access. Assume	-0.124939
-1.004092	throw() throw() Assume	-0.124939
-0.557577	of speed. Assume	-0.124939
-0.659639	__attribute(( aligned(16))) Assume	-0.124939
-0.659639	__attribute(( const)) Assume	-0.124939
-0.659639	#pragma ivdep Assume	-0.124939
-0.463495	/GR- -fno-rtti Assume	-0.124939
-0.358816	accessed column-wise. Assume	-0.124939
-0.358816	page 78. Assume	-0.124939
-0.358816	in general. Assume	-0.124939
-2.547247	on the efficiency	-0.124939
-1.710990	between the efficiency	-0.124939
-1.184773	etc. The efficiency	-0.124939
-0.203132	7 The efficiency	-0.425969
-0.597142	Loops The efficiency	-0.124939
-1.587959	for this efficiency	-0.124939
-0.600526	question when efficiency	-0.124939
-1.291771	of program efficiency	-0.124939
-0.898928	on CPU efficiency	-0.124939
-0.599625	economy, cache efficiency	-0.124939
-1.269379	may improve efficiency	-0.124939
-0.582041	the relative efficiency	-0.124939
-0.764408	The highest efficiency	-0.124939
-2.625775	that the repeat	-0.124939
-2.046433	if the repeat	-0.425969
-2.418472	when the repeat	-0.124939
-2.316478	If the repeat	-0.124939
-1.203057	whether to repeat	-0.124939
-0.594880	;checkifi<100 ; repeat	-0.124939
-1.418278	a high repeat	-0.124939
-0.870602	the maximum repeat	-0.124939
-0.491862	worst-case maximum repeat	-0.124939
-0.567129	very low repeat	-0.124939
-0.567080	the typical repeat	-0.124939
-0.828159	and fixed repeat	-0.124939
-0.527278	precision. Let's repeat	-0.124939
-1.506404	by the unroll	-0.602060
-2.150315	have to unroll	-0.124939
-1.886700	necessary to unroll	-0.124939
-1.193940	advantage to unroll	-0.124939
-1.290850	reason to unroll	-0.124939
-0.896277	worthwhile to unroll	-0.124939
-1.709898	and you unroll	-0.124939
-1.839885	compilers will unroll	-0.124939
-1.629153	the loop unroll	-0.124939
-0.897867	some compilers unroll	-0.124939
-0.585078	will usually unroll	-0.124939
-1.622790	that it calls.	-0.124939
-1.465716	of function calls.	-0.124939
-1.005680	or function calls.	-0.124939
-1.243867	library function calls.	-0.124939
-0.576153	multiple function calls.	-0.124939
-1.005680	many function calls.	-0.124939
-0.852734	through function calls.	-0.124939
-0.581212	pure function calls.	-0.124939
-0.198813	across function calls.	-0.124939
-1.180744	and system calls.	-0.124939
-0.871538	graphical interface calls.	-0.124939
-1.262883	into the algorithm	-0.425969
-1.599814	choice of algorithm	-0.124939
-0.600893	defines an algorithm	-0.124939
-0.598207	express any algorithm	-0.124939
-2.019621	the first algorithm	-0.124939
-1.987130	The following algorithm	-0.124939
-1.748803	a simple algorithm	-0.124939
-1.783913	the best algorithm	-0.124939
-0.633062	the optimal algorithm	-0.124939
-0.589485	and complicated algorithm	-0.124939
-0.463549	a universal algorithm	-0.124939
-0.601904	calculates the sum	-0.124939
-2.714003	is a sum	-0.124939
-2.176528	value of sum	-0.124939
-0.901688	constructor // sum	-0.124939
-0.600724	n++) { sum	-0.124939
-0.600408	a[i+3]; } sum	-0.124939
-0.586525	x; float sum	-0.124939
-0.872556	a[100]; float sum	-0.124939
-1.353239	100; i++) sum	-0.124939
-1.348040	size; i++) sum	-0.124939
-0.523348	i<100; i++) sum	-0.124939
-1.595577	int i, sum	-0.124939
-0.659725	float list[size], sum	-0.124939
-0.463549	// initialize sum	-0.124939
-0.204022	handle the strings	-0.425969
-0.901555	types or strings	-0.124939
-0.201958	store all strings	-0.425969
-1.707954	to store strings	-0.124939
-1.261377	to handle strings	-0.124939
-0.818423	of storing strings	-0.124939
-0.090155	time. Text strings	-0.124939
-0.090155	classes. Text strings	-0.124939
-0.090155	Strings Text strings	-0.124939
-0.249815	of text strings	-0.124939
-0.249815	handle text strings	-0.124939
-0.586557	some processors. On	-0.124939
-1.152370	different CPUs. On	-0.124939
-1.334492	the compiler. On	-0.124939
-0.579100	few resources. On	-0.124939
-0.764299	data structures. On	-0.124939
-0.504971	language output. On	-0.124939
-0.659668	more difficult. On	-0.124939
-0.463513	using hyperthreading. On	-0.124939
-0.463513	branch tree. On	-0.124939
-0.358830	loop predictor. On	-0.124939
-0.358830	point comparison. On	-0.124939
-0.358830	is profitable. On	-0.124939
-0.358830	example 9.1b. On	-0.124939
-3.035041	of the exponent	-0.124939
-1.763891	when the exponent	-0.425969
-2.162319	from the exponent	-0.124939
-0.902615	n to exponent	-0.124939
-0.600148	numbers. The exponent	-0.124939
-0.600148	digits. The exponent	-0.124939
-0.892330	8; // exponent	-0.124939
-0.596647	15; // exponent	-0.124939
-0.596647	11; // exponent	-0.124939
-1.044964	unsigned int exponent	-0.602060
-0.601809	distributions of Linux,	-0.124939
-1.640185	Windows and Linux,	-0.124939
-1.703450	objects in Linux,	-0.124939
-0.600647	SetThreadAffinityMask, in Linux,	-0.124939
-1.538314	and 64-bit Linux,	-0.124939
-1.254708	In 64-bit Linux,	-0.124939
-0.573890	whereas 64-bit Linux,	-0.124939
-0.894682	// 32-bit Linux,	-0.124939
-0.395331	a Windows, Linux,	-0.124939
-0.395331	with Windows, Linux,	-0.124939
-0.395331	systems Windows, Linux,	-0.124939
-0.395331	Mac Windows, Linux,	-0.124939
-0.358916	platforms (Windows, Linux,	-0.124939
-2.848648	of the possibility	-0.124939
-2.400483	and the possibility	-0.124939
-0.941546	out the possibility	-0.425969
-1.649986	about the possibility	-0.124939
-0.203531	opens the possibility	-0.425969
-0.897237	offer the possibility	-0.124939
-0.599124	open the possibility	-0.124939
-0.483898	loop. Another possibility	-0.124939
-0.483898	GOT. Another possibility	-0.124939
-0.463677	the theoretical possibility	-0.124939
-0.463677	very obscure possibility	-0.124939
-1.600954	See the discussion	-0.124939
-1.403950	for a discussion	-0.425969
-0.899123	120 for discussion	-0.124939
-0.600072	93 for discussion	-0.124939
-0.900283	from this discussion	-0.124939
-1.293513	for more discussion	-0.124939
-0.600426	prone. A discussion	-0.124939
-1.556733	are various discussion	-0.124939
-0.372427	a further discussion	-0.124939
-0.350803	for further discussion	-0.602060
-0.601581	satisfied. The conditions	-0.124939
-0.574372	testing multiple conditions	-0.124939
-0.198439	Testing multiple conditions	-0.124939
-1.779151	of these conditions	-0.124939
-1.222352	the following conditions	-0.425969
-0.595306	from error conditions	-0.124939
-0.882248	if certain conditions	-0.124939
-0.589602	the caching conditions	-0.124939
-1.037998	has three conditions	-0.124939
-0.527262	www.agner.org/optimize. Copyright conditions	-0.124939
-0.764408	under worst-case conditions	-0.124939
-2.101451	on a non-Intel	-0.124939
-0.601179	well with non-Intel	-0.124939
-0.338497	performance on non-Intel	-0.301030
-1.336872	work on non-Intel	-0.124939
-0.577648	speed on non-Intel	-0.124939
-0.490339	well on non-Intel	-0.124939
-1.231562	running on non-Intel	-0.124939
-0.600457	processors. A non-Intel	-0.124939
-0.463641	dispatcher treats non-Intel	-0.124939
-0.463641	also treat non-Intel	-0.124939
-1.916763	pointer to it.	-0.124939
-1.673725	reference to it.	-0.124939
-0.601090	count on it.	-0.124939
-2.419674	to use it.	-0.124939
-1.290084	you need it.	-0.124939
-0.889312	something about it.	-0.124939
-1.272431	that calls it.	-0.124939
-1.469687	can avoid it.	-0.124939
-1.256065	that support it.	-0.124939
-0.583065	and tested it.	-0.124939
-1.307726	to execute it.	-0.124939
-0.541092	don't understand it.	-0.124939
-0.358844	and recompile it.	-0.124939
-1.577281	the CPU (See	-0.425969
-1.786433	of 2 (See	-0.124939
-1.022822	by 2 (See	-0.124939
-0.889163	AMD CPUs (See	-0.124939
-1.411576	64-bit Windows (See	-0.124939
-0.587351	specified types (See	-0.124939
-1.152460	different CPUs. (See	-0.124939
-0.867859	across modules (See	-0.124939
-1.037408	by 2. (See	-0.124939
-0.861178	global variables. (See	-0.124939
-1.149090	be mispredicted (See	-0.124939
-0.659725	from www.intel.com. (See	-0.124939
-1.496925	kind of registers.	-0.124939
-0.601023	scarcity of registers.	-0.124939
-1.677002	transferred in registers.	-0.124939
-0.900259	returned in registers.	-0.124939
-1.664797	the vector registers.	-0.124939
-0.586164	bigger vector registers.	-0.124939
-0.586164	special vector registers.	-0.124939
-1.437411	two different registers.	-0.124939
-0.599613	in integer registers.	-0.124939
-0.599179	copied into registers.	-0.124939
-0.585981	versus XMM registers.	-0.124939
-0.615712	the YMM registers.	-0.124939
-0.615712	256-bit YMM registers.	-0.124939
-3.034255	of the maximum	-0.124939
-1.201750	below the maximum	-0.124939
-0.600873	near the maximum	-0.124939
-0.600873	above, the maximum	-0.124939
-0.601828	allows a maximum	-0.124939
-1.060411	registers. The maximum	-0.124939
-0.595713	same. The maximum	-0.124939
-0.595713	27). The maximum	-0.124939
-0.595713	weekdays. The maximum	-0.124939
-0.595713	67 The maximum	-0.124939
-0.598446	minimum value maximum	-0.124939
-1.634579	to take maximum	-0.124939
-0.527291	the worst-case maximum	-0.124939
-1.399790	and 64-bit mode.	-0.124939
-1.022130	in 64-bit mode.	-0.124939
-0.939605	or 64-bit mode.	-0.124939
-0.725863	in 32-bit mode.	-0.346788
-1.492379	64 bit mode.	-0.124939
-1.416004	32 bit mode.	-0.124939
-0.358959	into sleep mode.	-0.124939
-0.877419	of elements per	-0.301030
-0.595395	execution times per	-0.124939
-0.591610	three values per	-0.124939
-0.672570	clock cycles per	-0.221849
-0.358970	size Time per	-0.124939
-0.358970	kilobytes Time per	-0.124939
-0.358970	9.6a Time per	-0.124939
-0.600062	this for testing	-0.124939
-1.908008	useful for testing	-0.124939
-2.085009	you are testing	-0.124939
-0.895366	zero by testing	-0.124939
-1.067646	gain by testing	-0.124939
-1.182700	useful when testing	-0.124939
-1.057762	relevant when testing	-0.124939
-1.073399	faster because testing	-0.124939
-1.065782	also makes testing	-0.124939
-0.541198	of development, testing	-0.124939
-0.065800	16.3 Worst-case testing	-0.124939
-0.358887	optimal. Best-case testing	-0.124939
-2.674997	that the alignment	-0.124939
-1.844559	but the alignment	-0.124939
-1.076596	specify the alignment	-0.124939
-1.775625	because of alignment	-0.124939
-1.077716	automatically. The alignment	-0.124939
-0.601129	Vectorization with alignment	-0.124939
-1.201242	restrictions on alignment	-0.124939
-0.601010	members. This alignment	-0.124939
-1.856905	of this alignment	-0.124939
-0.900010	operations when alignment	-0.124939
-0.599181	about pointer alignment	-0.124939
-0.586636	vectors requires alignment	-0.124939
-0.504921	__attribute__((aligned(16))). Specifies alignment	-0.124939
-1.633034	to the right	-0.249877
-1.251172	into the right	-0.124939
-0.895854	made the right	-0.124939
-1.476904	find the right	-0.124939
-0.599151	finding the right	-0.124939
-0.895854	putting the right	-0.124939
-0.897670	array size right	-0.124939
-0.858766	; shift right	-0.124939
-2.601516	to the offset	-0.124939
-2.607632	if the offset	-0.124939
-0.599816	code the offset	-0.124939
-2.184805	then the offset	-0.124939
-2.274656	because the offset	-0.124939
-2.291022	If the offset	-0.124939
-1.289922	stores the offset	-0.124939
-0.902215	}; The offset	-0.124939
-1.496281	with an offset	-0.124939
-1.194245	or no offset	-0.124939
-0.515421	the global offset	-0.124939
-0.744255	called global offset	-0.124939
-0.855444	a total offset	-0.124939
-2.733062	that the compatibility	-0.124939
-1.009007	sake of compatibility	-0.425969
-0.598697	causes of compatibility	-0.124939
-1.069171	requirements of compatibility	-0.124939
-1.069171	sources of compatibility	-0.124939
-1.543348	time and compatibility	-0.124939
-1.438479	problems and compatibility	-0.124939
-0.900061	set when compatibility	-0.124939
-0.788888	of backwards compatibility	-0.124939
-0.659811	and resolve compatibility	-0.124939
-0.463604	the cross-platform compatibility	-0.124939
-0.358902	about bugs, compatibility	-0.124939
-2.365895	because the macro	-0.124939
-2.363869	to a macro	-0.124939
-1.292512	like a macro	-0.124939
-0.899663	define a macro	-0.124939
-1.679647	result of macro	-0.124939
-0.902126	beware that macro	-0.124939
-0.887988	functions A macro	-0.124939
-0.594443	scope. A macro	-0.124939
-0.594551	7.34a. Use macro	-0.124939
-0.834807	// Define macro	-0.124939
-0.504962	7.34b. Replace macro	-0.124939
-0.463586	The preprocessing macro	-0.124939
-0.372281	// 2 bytes.	-0.425969
-0.240771	// 4 bytes.	-0.602060
-0.199468	// 8 bytes.	-0.425969
-1.032925	of 64 bytes.	-0.124939
-0.856833	typically 64 bytes.	-0.124939
-0.891151	of 16 bytes.	-0.124939
-0.885103	first 128 bytes.	-0.124939
-0.562974	is 12 bytes.	-0.124939
-0.463604	// 400 bytes.	-0.124939
-2.051687	to the object.	-0.124939
-2.594194	for the object.	-0.124939
-0.600867	delete the object.	-0.124939
-2.642727	the same object.	-0.124939
-1.948549	for each object.	-0.124939
-0.558988	the shared object.	-0.124939
-1.151516	a shared object.	-0.124939
-0.479807	same shared object.	-0.124939
-1.246788	a global object.	-0.124939
-1.393315	the entire object.	-0.124939
-0.659840	an anonymous object.	-0.124939
-1.292001	array of 100	-0.124939
-0.899456	sum of 100	-0.124939
-0.600239	Array of 100	-0.124939
-2.157428	there are 100	-0.124939
-0.901388	it by 100	-0.124939
-0.597903	i with 100	-0.124939
-0.597903	eax with 100	-0.124939
-0.601019	50 - 100	-0.124939
-0.598454	1000 * 100	-0.124939
-1.883048	the result 100	-0.124939
-0.835874	per element. 100	-0.124939
-0.586625	1 eax, 100	-0.124939
-0.415618	cmp eax, 100	-0.124939
-1.228521	*= x; Note	-0.124939
-0.863711	and 1. Note	-0.124939
-0.570464	desired version. Note	-0.124939
-0.562912	Windows system. Note	-0.124939
-0.827891	character arrays. Note	-0.124939
-1.163561	for details. Note	-0.124939
-0.917257	an explanation. Note	-0.124939
-0.527143	less optimized. Note	-0.124939
-0.659668	optimized away. Note	-0.124939
-0.358830	variable Day. Note	-0.124939
-0.358830	file disassembler. Note	-0.124939
-0.358830	patch. 131 Note	-0.124939
-0.358830	in a[i]. Note	-0.124939
-1.714933	by making them	-0.124939
-2.049570	you want them	-0.124939
-0.887327	and compile them	-0.124939
-1.353644	can reduce them	-0.124939
-0.582065	you turn them	-0.124939
-1.230977	by copying them	-0.124939
-0.847196	and reading them	-0.124939
-0.805643	by comparing them	-0.124939
-0.504971	and copies them	-0.124939
-0.726551	and leave them	-0.124939
-0.659668	to join them	-0.124939
-0.358830	or hide them	-0.124939
-0.358830	and getting them	-0.124939
-0.203833	reading and writing	-0.124939
-1.741301	we are writing	-0.124939
-0.201396	reading or writing	-0.124939
-0.089127	Reading or writing	-0.249877
-1.497429	well as writing	-0.124939
-0.894814	in software writing	-0.124939
-0.833115	more threads writing	-0.124939
-1.329255	multiple threads writing	-0.124939
-2.255076	of the library.	-0.124939
-2.267864	a function library.	-0.124939
-0.890173	different function library.	-0.124939
-0.890173	separate function library.	-0.124939
-2.577139	floating point library.	-0.124939
-1.633190	vector class library.	-0.124939
-1.360865	a static library.	-0.124939
-0.588563	Open source library.	-0.124939
-0.866087	Gnu C library.	-0.124939
-0.726685	AMD LIBM library.	-0.124939
-0.358887	any non-vector library.	-0.124939
-0.900408	Bitfield { struct	-0.124939
-0.195140	:1;//signbit }; struct	-0.124939
-1.230586	as follows: struct	-0.124939
-0.527165	12.2 __declspec(align(16)) struct	-0.124939
-0.527230	Example 14.9 struct	-0.124939
-0.902814	= 1024; struct	-0.124939
-0.504901	Example 7.13 struct	-0.124939
-0.463531	Example 8.15a struct	-0.124939
-0.358844	Example 7.40a struct	-0.124939
-0.358844	Example 8.15b struct	-0.124939
-0.358844	Example 7.35b struct	-0.124939
-0.358844	Example 7.35a struct	-0.124939
-3.013159	in the calculations.	-0.124939
-1.770740	do the calculations.	-0.124939
-2.577139	floating point calculations.	-0.124939
-1.538749	the integer calculations.	-0.124939
-0.880032	64-bit integer calculations.	-0.124939
-1.062431	for these calculations.	-0.124939
-0.946254	and mathematical calculations.	-0.124939
-0.498167	do mathematical calculations.	-0.124939
-0.498167	doing mathematical calculations.	-0.124939
-0.872813	heavy graphics calculations.	-0.124939
-0.573214	allows parallel calculations.	-0.124939
-0.999411	the actual calculations.	-0.124939
-0.463586	from overlapping calculations.	-0.124939
-0.805604	put the operand	-0.425969
-0.600941	when an operand	-0.124939
-0.201894	If one operand	-0.425969
-1.067973	the first operand	-0.301030
-0.415513	the second operand	-0.726999
-0.550789	most predictable operand	-0.124939
-1.994597	have a reduced	-0.124939
-1.078406	cause of reduced	-0.124939
-2.220878	can be reduced	-0.425969
-0.597961	Can be reduced	-0.124939
-0.901290	run with reduced	-0.124939
-1.958504	Intel compiler reduced	-0.124939
-1.842758	Gnu compiler reduced	-0.124939
-0.589724	library has reduced	-0.602060
-1.625382	the compilers reduced	-0.124939
-1.765214	has been reduced	-0.124939
-0.737072	two clock cycles.	-0.124939
-0.648073	4 clock cycles.	-0.124939
-0.456096	8 clock cycles.	-0.124939
-0.648073	several clock cycles.	-0.124939
-0.456096	256 clock cycles.	-0.124939
-0.456096	three clock cycles.	-0.124939
-0.958921	core clock cycles.	-0.124939
-0.456096	100 clock cycles.	-0.124939
-0.315832	20 clock cycles.	-0.124939
-0.456096	40 clock cycles.	-0.124939
-0.648073	45 clock cycles.	-0.124939
-0.456096	500 clock cycles.	-0.124939
-2.191154	of the final	-0.124939
-1.925825	in the final	-0.124939
-2.487463	that the final	-0.124939
-2.347032	when the final	-0.124939
-2.232578	If the final	-0.124939
-1.067355	check the final	-0.124939
-0.598082	allocate the final	-0.124939
-1.059165	on its final	-0.124939
-0.980638	for the sake	-1.238882
-0.902558	sequences of operations.	-0.124939
-1.014215	for vector operations.	-0.124939
-0.858718	as vector operations.	-0.124939
-1.295583	use vector operations.	-0.124939
-0.858718	Boolean vector operations.	-0.124939
-2.577139	floating point operations.	-0.124939
-0.898184	of integer operations.	-0.124939
-0.591840	simple standard operations.	-0.124939
-1.119190	and shift operations.	-0.124939
-0.434992	integer arithmetic operations.	-0.124939
-0.434992	doing arithmetic operations.	-0.124939
-0.504962	file input/output operations.	-0.124939
-0.358887	Single-Instruction-Multiple-Data (SIMD) operations.	-0.124939
-1.058320	dispatcher function. When	-0.124939
-1.178459	the cache. When	-0.124939
-0.583027	references are: When	-0.124939
-1.128426	cache size. When	-0.124939
-0.855156	arithmetic operations. When	-0.124939
-1.106315	single precision. When	-0.124939
-0.567057	Pointer aliasing When	-0.124939
-0.541070	call method. When	-0.124939
-0.999161	branch mispredictions. When	-0.124939
-0.659668	as additions. When	-0.124939
-0.358830	2 GB. When	-0.124939
-0.358830	of profiling. When	-0.124939
-0.358830	to 100000000. When	-0.124939
-0.601910	split the tasks	-0.124939
-0.902121	important for tasks	-0.124939
-1.596124	the different tasks	-0.124939
-1.394115	between different tasks	-0.124939
-0.898786	of other tasks	-0.124939
-1.178582	for simple tasks	-0.124939
-0.553773	for standard tasks	-0.124939
-0.811353	many standard tasks	-0.124939
-0.591514	for certain tasks	-0.124939
-0.575567	priority. Other tasks	-0.124939
-0.726690	very time-consuming tasks	-0.124939
-0.450629	put time-consuming tasks	-0.124939
-0.463568	for trivial tasks	-0.124939
-0.889072	a function. Avoid	-0.124939
-0.592941	non-virtual functions. Avoid	-0.124939
-1.258864	a program. Avoid	-0.124939
-0.583027	solutions are: Avoid	-0.124939
-0.835695	performance problems. Avoid	-0.124939
-0.557596	press. 19 Avoid	-0.124939
-0.541128	appropriate. 8. Avoid	-0.124939
-0.960314	is declared. Avoid	-0.124939
-0.659668	page 93. Avoid	-0.124939
-0.659668	page 26. Avoid	-0.124939
-0.463513	level 9. Avoid	-0.124939
-0.358830	page 22. Avoid	-0.124939
-0.358830	page 140. Avoid	-0.124939
-2.250391	then the effect	-0.124939
-0.595682	{ The effect	-0.124939
-1.456573	data. The effect	-0.124939
-1.270039	loop. The effect	-0.124939
-0.595682	manually. The effect	-0.124939
-0.595682	post-increment. The effect	-0.124939
-0.601024	effects. This effect	-0.124939
-0.900258	why this effect	-0.124939
-1.189023	hardly any effect	-0.124939
-0.583138	no negative effect	-0.124939
-0.990468	a significant effect	-0.124939
-0.557674	desired polymorphism effect	-0.124939
-0.527228	very dramatic effect	-0.124939
-2.454700	and the amount	-0.124939
-2.603488	that the amount	-0.124939
-2.407704	when the amount	-0.124939
-0.899302	increases the amount	-0.124939
-0.600162	reserve the amount	-0.124939
-0.600162	minimize the amount	-0.124939
-1.426573	the total amount	-0.124939
-0.990591	a significant amount	-0.124939
-0.165181	the required amount	-0.425969
-0.550853	an equal amount	-0.124939
-0.999537	a considerable amount	-0.124939
-0.463622	an insufficient amount	-0.124939
-3.137246	of the variable.	-0.124939
-2.547988	on the variable.	-0.124939
-2.161738	by a variable.	-0.124939
-1.296158	require a variable.	-0.124939
-0.902139	on that variable.	-0.124939
-1.194486	a float variable.	-0.124939
-0.903178	a register variable.	-0.425969
-1.051879	a simple variable.	-0.124939
-0.956540	an induction variable.	-0.124939
-0.536425	explicit induction variable.	-0.124939
-0.858588	a local variable.	-0.124939
-3.195064	of the time,	-0.124939
-1.878254	at a time,	-0.124939
-1.554821	waste of time,	-0.124939
-0.600592	At this time,	-0.124939
-2.636751	the same time,	-0.124939
-1.290551	of CPU time,	-0.124939
-0.598177	at any time,	-0.124939
-1.683295	a long time,	-0.124939
-0.889072	takes extra time,	-0.124939
-2.048207	at compile time,	-0.124939
-0.588988	between development time,	-0.124939
-0.504971	at compile- time,	-0.124939
-0.358830	the programmers' time,	-0.124939
-1.826939	a class Variables	-0.124939
-1.763936	the stack Variables	-0.124939
-1.478351	in memory. Variables	-0.124939
-1.368556	more efficient. Variables	-0.124939
-0.523644	static storage Variables	-0.124939
-0.758274	variable storage Variables	-0.124939
-0.583069	effects are: Variables	-0.124939
-0.567034	course inefficient. Variables	-0.124939
-0.567034	temporary storage. Variables	-0.124939
-0.249793	functions. 9.4 Variables	-0.124939
-0.249793	88 9.4 Variables	-0.124939
-0.358859	p. 26). Variables	-0.124939
-0.358859	critical stride. Variables	-0.124939
-3.064760	in the copying	-0.124939
-1.925198	instead of copying	-0.124939
-1.498967	ways of copying	-0.124939
-0.601683	transposing and copying	-0.124939
-1.473573	done by copying	-0.124939
-1.452567	simply by copying	-0.124939
-1.462943	avoided by copying	-0.124939
-0.589232	jump by copying	-0.124939
-0.589232	returned by copying	-0.124939
-2.280045	such as copying	-0.124939
-0.900061	implicitly when copying	-0.124939
-0.504982	this wasteful copying	-0.124939
-0.504982	legitimate backup copying	-0.124939
-1.712548	sake of optimization.	-0.124939
-1.378775	relevant to optimization.	-0.124939
-0.601555	possibilities for optimization.	-0.124939
-0.601075	about code optimization.	-0.124939
-0.900688	on compiler optimization.	-0.124939
-1.372980	do this optimization.	-0.124939
-0.795629	whole program optimization.	-0.124939
-0.597803	to software optimization.	-0.124939
-0.579185	options prevent optimization.	-0.124939
-0.575438	of full optimization.	-0.124939
-0.463531	needs careful optimization.	-0.124939
-0.358844	offer profile-guided optimization.	-0.124939
-1.299995	cost to accessing	-0.124939
-1.625766	code for accessing	-0.124939
-2.025911	used for accessing	-0.124939
-0.890135	variable for accessing	-0.124939
-1.455500	optimized for accessing	-0.124939
-0.595534	STL for accessing	-0.124939
-0.601292	files or accessing	-0.124939
-1.296544	problem with accessing	-0.124939
-1.374271	fast as accessing	-0.124939
-1.686517	time than accessing	-0.124939
-1.834774	efficient than accessing	-0.124939
-1.296495	or when accessing	-0.124939
-0.577578	aliasing When accessing	-0.124939
-0.901467	off or until	-0.124939
-0.601098	stay on until	-0.124939
-0.599991	valid only until	-0.124939
-1.848644	a variable until	-0.124939
-1.350642	the file until	-0.124939
-1.387281	be loaded until	-0.124939
-0.308814	and wait until	-0.124939
-0.308814	will wait until	-0.124939
-0.308814	must wait until	-0.124939
-0.463549	but waits until	-0.124939
-0.463549	is repeated until	-0.124939
-0.358859	not detected until	-0.124939
-0.358859	be postponed until	-0.124939
-2.666183	for the performance.	-0.124939
-0.598437	lot in performance.	-0.124939
-1.071966	cost in performance.	-0.124939
-1.432846	difference in performance.	-0.124939
-1.290252	gain in performance.	-0.124939
-1.201268	effect on performance.	-0.124939
-0.600192	on program performance.	-0.124939
-0.895942	worst possible performance.	-0.124939
-0.594389	for best performance.	-0.124939
-1.225848	to improve performance.	-0.124939
-0.577530	of reduced performance.	-0.124939
-0.573326	for improved performance.	-0.124939
-0.504941	on improving performance.	-0.124939
-0.601605	convenient for adding	-0.124939
-1.741163	we are adding	-0.124939
-1.042573	function by adding	-0.124939
-0.866482	address by adding	-0.124939
-0.583370	bytes by adding	-0.124939
-1.293898	calculated by adding	-0.124939
-1.403457	improved by adding	-0.124939
-0.583370	row by adding	-0.124939
-0.866482	2n by adding	-0.124939
-0.894004	needed before adding	-0.124939
-0.596379	type-casting without adding	-0.124939
-0.878501	things like adding	-0.124939
-0.573289	producers keep adding	-0.124939
-1.462921	{ // Define	-0.124939
-0.575852	arrays // Define	-0.124939
-0.852166	c; // Define	-0.124939
-0.575852	temp; // Define	-0.124939
-0.852166	x^4 // Define	-0.124939
-0.198750	<dvec.h> // Define	-0.425969
-0.852166	"vectorclass.h" // Define	-0.124939
-0.575852	<emmintrin.h> // Define	-0.124939
-0.852166	"asmlib.h" // Define	-0.124939
-0.575852	SelectAddMul_dispatch; // Define	-0.124939
-0.358988	CPU cores: Define	-0.124939
-0.601632	course, and causes	-0.124939
-0.599828	mispredicted, which causes	-0.124939
-0.201411	matrix size causes	-0.425969
-1.189778	new version causes	-0.124939
-1.166759	the list causes	-0.124939
-0.590772	the write causes	-0.124939
-0.581942	the inlining causes	-0.124939
-1.258261	the destructor causes	-0.124939
-1.396698	critical stride causes	-0.124939
-0.562893	most frequent causes	-0.124939
-0.659697	FDIV bug causes	-0.124939
-0.358844	and free) causes	-0.124939
-2.457783	by the processing	-0.124939
-1.919063	and a processing	-0.124939
-1.734006	time than processing	-0.124939
-0.593101	parallel vector processing	-0.124939
-0.593101	cores, vector processing	-0.124939
-1.164696	the high processing	-0.124939
-0.480875	the graphics processing	-0.124939
-0.980121	a graphics processing	-0.124939
-0.480875	no graphics processing	-0.124939
-0.573193	specifying parallel processing	-0.124939
-0.504941	statistics, signal processing	-0.124939
-0.463568	and sound processing	-0.124939
-0.463568	a physics processing	-0.124939
-1.341812	is to divide	-0.602060
-1.328473	order to divide	-0.301030
-1.923398	need to divide	-0.124939
-1.477052	ways to divide	-0.124939
-1.939425	code and divide	-0.124939
-1.935678	You can divide	-0.124939
-0.601374	right = divide	-0.124939
-1.476177	when you divide	-0.124939
-0.891268	library, you divide	-0.124939
-2.602264	to the so-called	-0.124939
-2.825190	in the so-called	-0.124939
-2.240388	use the so-called	-0.124939
-1.325153	using the so-called	-0.124939
-0.898636	bypassing the so-called	-0.124939
-0.599827	emulating the so-called	-0.124939
-2.333843	as a so-called	-0.124939
-1.190747	cases. The so-called	-0.124939
-0.896352	back. The so-called	-0.124939
-0.598678	modular. The so-called	-0.124939
-1.064748	functions. This so-called	-0.124939
-0.597195	object. This so-called	-0.124939
-2.135738	it is clear	-0.425969
-2.390041	should be clear	-0.124939
-2.231744	you can clear	-0.124939
-2.588329	is not clear	-0.124939
-0.582909	a more clear	-0.124939
-0.578250	it more clear	-0.124939
-0.578250	program more clear	-0.124939
-0.578250	software more clear	-0.124939
-2.248279	is no clear	-0.124939
-0.597342	code less clear	-0.124939
-1.341511	for making clear	-0.124939
-2.848648	of the total	-0.124939
-1.788655	to the total	-0.124939
-2.417065	on the total	-0.124939
-2.376396	when the total	-0.124939
-2.248275	because the total	-0.124939
-1.656059	If the total	-0.425969
-2.663990	is a total	-0.124939
-2.306913	with a total	-0.124939
-0.902256	operations. The total	-0.124939
-1.406448	64 bits total	-0.124939
-2.170952	is to mix	-0.124939
-1.418133	and to mix	-0.124939
-0.893198	not to mix	-0.124939
-1.172931	advantageous to mix	-0.124939
-1.989140	able to mix	-0.124939
-0.597086	multiplication, to mix	-0.124939
-1.675144	Do not mix	-0.124939
-0.143409	14.7 Don't mix	-0.425969
-0.358980	evicted. Don't mix	-0.124939
-0.358959	a balanced mix	-0.124939
-0.902362	DOS and 16-bit	-0.124939
-0.265178	int in 16-bit	-0.726999
-1.185610	integers in 16-bit	-0.124939
-1.501523	optimized for 16-bit	-0.124939
-1.705434	compatible with 16-bit	-0.124939
-2.530913	to make 16-bit	-0.124939
-0.194629	of eight 16-bit	-0.124939
-1.190405	64-bit mode. 16-bit	-0.124939
-3.139240	of the child	-0.124939
-2.642104	for the child	-0.124939
-0.124659	parent and child	-0.124939
-0.899253	// The child	-0.124939
-1.196480	class. The child	-0.124939
-1.239079	of its child	-0.124939
-1.010009	to its child	-0.124939
-0.544818	about its child	-0.124939
-0.563014	call polymorphic child	-0.124939
-0.726785	the correct child	-0.124939
-1.678715	set of containers	-0.124939
-0.601571	wheel. The containers	-0.124939
-0.601473	stored are containers	-0.124939
-0.599317	these example containers	-0.124939
-0.594166	so. These containers	-0.124939
-0.593498	Objects inside containers	-0.124939
-0.589023	have separate containers	-0.124939
-0.586583	solution. Many containers	-0.124939
-0.585044	ready made containers	-0.124939
-1.065238	the STL containers	-0.124939
-0.510425	other STL containers	-0.124939
-0.557598	of suitable containers	-0.124939
-0.893198	16 to fit	-0.124939
-0.893198	available to fit	-0.124939
-0.089792	eight to fit	-0.726999
-0.597086	necessary, to fit	-0.124939
-0.599937	tools that fit	-0.124939
-0.599937	sub-vectors that fit	-0.124939
-2.028243	does not fit	-0.124939
-1.401226	the data fit	-0.124939
-1.910983	compiler to predict	-0.124939
-2.142726	order to predict	-0.124939
-1.110544	able to predict	-0.124939
-1.119285	difficult to predict	-0.124939
-0.595523	algorithms to predict	-0.124939
-1.178543	unable to predict	-0.124939
-2.184568	you can predict	-0.124939
-1.070730	microprocessor can predict	-0.124939
-0.358973	may occasionally predict	-0.124939
-1.203567	setting the priority	-0.124939
-0.600093	widely different priority	-0.124939
-1.949516	the same priority	-0.124939
-1.263244	the thread priority	-0.124939
-0.876480	The high priority	-0.124939
-0.527311	has higher priority	-0.124939
-0.527311	give higher priority	-0.124939
-0.450677	the low priority	-0.124939
-0.726780	a low priority	-0.124939
-0.615658	a lower priority	-0.124939
-0.615658	with lower priority	-0.124939
-2.760714	to the disk	-0.124939
-1.775966	because of disk	-0.124939
-1.495214	memory and disk	-0.124939
-0.600627	RAM and disk	-0.124939
-1.444494	waiting for disk	-0.124939
-0.573319	or reading disk	-0.124939
-0.249854	the hard disk	-0.124939
-0.035324	a hard disk	-0.249877
-0.162708	for hard disk	-0.124939
-0.489007	the clock frequency	-0.249877
-0.215841	The clock frequency	-0.124939
-0.418748	CPU clock frequency	-0.124939
-0.466335	their clock frequency	-0.124939
-0.466335	higher clock frequency	-0.124939
-0.466335	actual clock frequency	-0.124939
-0.902573	CPU of unknown	-0.124939
-0.600922	about an unknown	-0.124939
-1.073977	come from unknown	-0.124939
-1.747548	for all unknown	-0.124939
-0.895883	so many unknown	-0.124939
-1.046184	that was unknown	-0.124939
-0.116746	that were unknown	-0.823909
-1.261377	to handle unknown	-0.124939
-2.198372	that is obtained	-0.124939
-0.547997	performance is obtained	-0.124939
-1.066236	microprocessors is obtained	-0.124939
-0.894417	interface is obtained	-0.124939
-1.066236	efficiency is obtained	-0.124939
-2.188765	can be obtained	-0.425969
-1.181293	sometimes be obtained	-0.124939
-1.181293	possibly be obtained	-0.124939
-0.463696	no doubt obtained	-0.124939
-0.592723	vector function libraries.	-0.124939
-0.884613	different function libraries.	-0.124939
-0.884613	optimized function libraries.	-0.124939
-0.884613	standard function libraries.	-0.124939
-1.293563	short vector libraries.	-0.124939
-1.933983	the Intel libraries.	-0.124939
-1.191127	of static libraries.	-0.124939
-0.596014	multiple dynamic libraries.	-0.124939
-1.399193	very large libraries.	-0.124939
-1.155182	static link libraries.	-0.124939
-1.315095	vector math libraries.	-0.124939
-0.463586	with external libraries.	-0.124939
-0.601928	Here the iteration	-0.124939
-0.198379	of one iteration	-0.124939
-1.459727	in one iteration	-0.124939
-1.176300	from one iteration	-0.124939
-1.678815	for each iteration	-0.124939
-1.200413	where each iteration	-0.124939
-0.579985	After each iteration	-0.124939
-1.542942	an extra iteration	-0.124939
-1.049427	for every iteration	-0.124939
-1.509548	the preceding iteration	-0.124939
-1.431961	the previous iteration	-0.124939
-3.203093	of the counters	-0.124939
-1.679056	set of counters	-0.124939
-0.601622	core). The counters	-0.124939
-1.924853	the performance counters	-0.124939
-0.594257	etc. These counters	-0.124939
-0.164685	performance monitor counters	-0.124939
-0.596777	total time. Optimizing	-0.124939
-0.593988	loops, etc. Optimizing	-0.124939
-0.583090	dispatching are: Optimizing	-0.124939
-0.510454	manuals: 1. Optimizing	-0.124939
-0.580689	platforms. 2. Optimizing	-0.124939
-1.011887	is critical. Optimizing	-0.124939
-0.981646	big-endian storage. Optimizing	-0.124939
-0.557654	for speed. Optimizing	-0.124939
-0.391100	ebx. 9 Optimizing	-0.124939
-0.391100	84 9 Optimizing	-0.124939
-0.659754	the processor). Optimizing	-0.124939
-1.043057	into a 128-bit	-0.301030
-1.617432	example, a 128-bit	-0.124939
-0.601808	MMX to 128-bit	-0.124939
-1.583108	code. The 128-bit	-0.124939
-0.899233	registers The 128-bit	-0.124939
-0.872350	as two 128-bit	-0.124939
-0.872350	into two 128-bit	-0.124939
-0.884236	that supported 128-bit	-0.124939
-0.586692	units. Each 128-bit	-0.124939
-1.182225	the full 128-bit	-0.124939
-0.601835	range is possibly	-0.124939
-0.600614	table and possibly	-0.124939
-0.600614	registers, and possibly	-0.124939
-1.353996	that can possibly	-0.124939
-1.685397	code can possibly	-0.124939
-1.264047	thread can possibly	-0.124939
-0.592515	valid) can possibly	-0.124939
-0.600858	size may possibly	-0.124939
-0.898668	set, but possibly	-0.124939
-0.567184	methods could possibly	-0.124939
-0.463622	is overwritten, possibly	-0.124939
-2.140480	stored in x,	-0.124939
-0.568297	7.32a double x,	-0.124939
-0.568297	7.32b double x,	-0.124939
-0.568297	8.8b double x,	-0.124939
-0.568297	8.8a double x,	-0.124939
-0.586565	public: float x,	-0.124939
-0.586565	a; float x,	-0.124939
-0.590907	Multiply (int x,	-0.124939
-0.579268	{ S1 x,	-0.124939
-0.842099	can modify x,	-0.124939
-0.527293	ipow (double x,	-0.124939
-0.358902	e, f, x,	-0.124939
-2.848648	of the stack.	-0.124939
-1.935483	for the stack.	-0.124939
-1.337876	on the stack.	-0.221849
-1.614947	up the stack.	-0.124939
-1.601821	a register stack.	-0.124939
-0.878555	each their stack.	-0.124939
-1.463038	its own stack.	-0.124939
-2.144064	power of 2,	-0.124939
-0.601337	Monday = 2,	-0.124939
-0.601023	(int)n - 2,	-0.124939
-0.791879	c + 2,	-0.425969
-0.896660	evict number 2,	-0.124939
-0.223093	of 1, 2,	-0.124939
-0.223093	sizes 1, 2,	-0.124939
-0.097376	{1, 1, 2,	-0.425969
-0.550720	2 (i.e. 2,	-0.124939
-0.358902	processors (0, 2,	-0.124939
-2.652240	if the full	-0.124939
-1.626684	making the full	-0.124939
-1.437931	give the full	-0.124939
-0.600509	had the full	-0.124939
-0.899993	Typically, the full	-0.124939
-0.601813	handling a full	-0.124939
-1.679692	result of full	-0.124939
-0.601744	server in full	-0.124939
-1.297202	speed or full	-0.124939
-0.901290	version with full	-0.124939
-1.295192	will use full	-0.124939
-0.600215	language has full	-0.124939
-0.892533	CPU time. Another	-0.124939
-1.229765	innermost loop. Another	-0.124939
-0.541092	code itself. Another	-0.124939
-0.764335	Open Watcom Another	-0.124939
-0.527165	the double. Another	-0.124939
-0.835610	dependency chains. Another	-0.124939
-0.463531	a GOT. Another	-0.124939
-0.463531	program slower. Another	-0.124939
-0.358844	than ARRAYSIZE. Another	-0.124939
-0.358844	a DLL. Another	-0.124939
-0.358844	Intel CPU’s. Another	-0.124939
-0.358844	execution considerably. Another	-0.124939
-0.902627	network is overloaded	-0.124939
-1.198315	classes and overloaded	-0.124939
-1.199759	constructors and overloaded	-0.124939
-2.005428	cannot be overloaded	-0.124939
-0.601292	constructor or overloaded	-0.124939
-1.820120	of an overloaded	-0.124939
-0.592565	Using an overloaded	-0.124939
-0.592565	constructor, an overloaded	-0.124939
-1.041454	and using overloaded	-0.124939
-1.041454	for using overloaded	-0.124939
-1.424874	with multiple overloaded	-0.124939
-0.594597	operators An overloaded	-0.124939
-2.833217	it is possible.	-0.124939
-2.532621	to be possible.	-0.124939
-0.867833	functions if possible.	-0.124939
-0.200463	set if possible.	-0.124939
-0.584074	library if possible.	-0.124939
-0.584074	function, if possible.	-0.124939
-0.584074	variables, if possible.	-0.124939
-0.584074	longjmp if possible.	-0.124939
-0.887091	work as possible.	-0.124939
-1.180337	good as possible.	-0.124939
-0.593987	reproducible as possible.	-0.124939
-0.583755	works more efficiently	-0.124939
-0.583755	calculated more efficiently	-0.124939
-0.583755	achieved more efficiently	-0.124939
-0.583755	cached more efficiently	-0.124939
-0.569895	accessed most efficiently	-0.124939
-0.121245	works most efficiently	-0.301030
-0.826422	much less efficiently	-0.124939
-0.562037	works less efficiently	-0.124939
-0.562037	somewhat less efficiently	-0.124939
-0.594816	should work efficiently	-0.124939
-0.901555	brands or models	-0.124939
-1.434908	specific CPU models	-0.124939
-0.179630	of processor models	-0.425969
-0.179630	which processor models	-0.425969
-0.491454	specific processor models	-0.124939
-0.594382	or specific models	-0.124939
-1.307559	software development models	-0.124939
-0.570495	for future models	-0.124939
-0.842099	all newer models	-0.124939
-0.358902	size. Later models	-0.124939
-2.407928	function is OS	-0.124939
-0.418957	and Mac OS	-0.124939
-0.417965	in Mac OS	-0.124939
-0.295013	for Mac OS	-0.124939
-0.295013	64-bit Mac OS	-0.124939
-0.218830	32-bit Mac OS	-0.301030
-0.230635	Intel-based Mac OS	-0.124939
-0.873021	method requires OS	-0.124939
-2.318345	function is needed.	-0.124939
-1.365435	library is needed.	-0.124939
-1.289169	optimization is needed.	-0.124939
-1.289169	implementation is needed.	-0.124939
-0.599064	re-allocation is needed.	-0.124939
-1.853742	is not needed.	-0.124939
-0.593366	95 not needed.	-0.124939
-0.600905	space than needed.	-0.124939
-1.730522	only when needed.	-0.124939
-1.230590	is rarely needed.	-0.124939
-0.358930	packing, unpacking needed.	-0.124939
-0.601670	structures and classes.	-0.124939
-1.597334	only for classes.	-0.124939
-0.600181	available vector classes.	-0.124939
-1.910389	of using classes.	-0.124939
-1.779380	of these classes.	-0.124939
-0.495251	such container classes.	-0.124939
-0.334291	efficient container classes.	-0.124939
-0.495251	well-tested container classes.	-0.124939
-0.562953	implementing polymorphic classes.	-0.124939
-0.550698	the base classes.	-0.124939
-0.463586	and reusable classes.	-0.124939
-0.901291	1 is changed	-0.124939
-0.601160	14.9 is changed	-0.124939
-2.116731	to be changed	-0.124939
-1.801967	can be changed	-0.249877
-2.158677	may be changed	-0.124939
-1.821219	cannot be changed	-0.124939
-1.266683	pointer has changed	-0.124939
-0.593215	value has changed	-0.124939
-0.358959	is artificially changed	-0.124939
-1.197639	a is true	-0.124939
-2.730865	it is true	-0.124939
-1.630586	b is true	-0.124939
-2.529285	to be true	-0.124939
-0.896702	true = true	-0.124939
-0.896702	!a = true	-0.124939
-1.499693	loop if true	-0.124939
-1.194032	most often true	-0.124939
-1.061443	to always true	-0.124939
-1.368475	a && true	-0.124939
-1.231853	a || true	-0.124939
-0.358887	single result, true	-0.124939
-0.902825	stop the thread.	-0.124939
-0.601821	terminating a thread.	-0.124939
-1.845136	a different thread.	-0.124939
-1.809881	the other thread.	-0.124939
-1.371077	into one thread.	-0.124939
-0.956079	for each thread.	-0.124939
-0.561340	by each thread.	-0.124939
-0.561340	into each thread.	-0.124939
-0.478745	by another thread.	-0.124939
-0.600137	addresses. The names	-0.124939
-0.600137	for. The names	-0.124939
-2.202417	the function names	-0.124939
-1.503466	The function names	-0.124939
-0.589933	between function names	-0.124939
-0.589933	about function names	-0.124939
-0.879165	define function names	-0.124939
-0.849598	functions have names	-0.124939
-0.895567	and variable names	-0.124939
-0.592785	libircmt.lib. Function names	-0.124939
-0.981887	CPU brand names	-0.124939
-0.496038	temp even though	-0.124939
-0.496038	executed even though	-0.124939
-0.496038	returns even though	-0.124939
-0.496038	expressions, even though	-0.124939
-0.496038	b)) even though	-0.124939
-0.496038	nine, even though	-0.124939
-0.584203	this function, though	-0.124939
-0.828183	operating systems, though	-0.124939
-0.557689	compilers available, though	-0.124939
-0.541233	track backwards though	-0.124939
-0.358916	multiplication b[i]*c[i], though	-0.124939
-0.358916	to Object1.Hello(), though	-0.124939
-1.629498	than to execute	-0.124939
-2.113066	have to execute	-0.124939
-1.713305	time to execute	-0.124939
-1.269534	takes to execute	-0.124939
-1.952187	likely to execute	-0.124939
-0.597093	microseconds to execute	-0.124939
-1.184159	CPUs can execute	-0.124939
-1.064125	microprocessor can execute	-0.124939
-0.596983	debugger can execute	-0.124939
-2.502545	the code execute	-0.124939
-1.937866	of code execute	-0.124939
-0.601670	truncation, and %	-0.124939
-1.011307	= b %	-0.124939
-1.069802	= i %	-0.124939
-1.273896	if (i %	-0.124939
-0.541239	&& SIZE %	-0.124939
-0.764445	(line size) %	-0.124939
-0.366813	(unsigned int)b %	-0.124939
-0.463586	/ 64) %	-0.124939
-0.358887	/ 0x40) %	-0.124939
-1.903176	efficient than mov	-0.124939
-0.600029	next instruction mov	-0.124939
-0.595559	The instructions mov	-0.124939
-0.591800	sar add mov	-0.124939
-0.649590	mov mov mov	-0.124939
-0.408101	$B1$1: mov mov	-0.124939
-0.408101	$B2$2: mov mov	-0.124939
-0.463568	mov xor mov	-0.124939
-0.463568	$B1$1: push mov	-0.124939
-0.463568	parameter $B1$1: mov	-0.124939
-0.358873	lea $B2$2: mov	-0.124939
-0.358873	mov $B1$2: mov	-0.124939
-2.148622	value of N	-0.124939
-2.131829	power of N	-0.124939
-0.899471	splitting of N	-0.124939
-0.124548	specialization for N	-0.301030
-0.601282	N-1)==0 if N	-0.124939
-0.901290	Array with N	-0.124939
-0.599926	removed. If N	-0.124939
-0.598491	pow(x,N) where N	-0.124939
-1.154331	processor model N	-0.124939
-0.541216	General case, N	-0.124939
-1.118450	The different kinds	-0.124939
-0.571587	do different kinds	-0.124939
-1.312522	two different kinds	-0.124939
-0.571587	doing different kinds	-0.124939
-0.571587	mix different kinds	-0.124939
-1.290598	than other kinds	-0.124939
-0.599816	cause all kinds	-0.124939
-1.707543	the two kinds	-0.124939
-0.592949	are four kinds	-0.124939
-0.591561	make certain kinds	-0.124939
-0.107190	7.1 Different kinds	-0.425969
-0.600107	names. The details	-0.124939
-0.600107	(en.wikipedia.org/wiki/L2_cache). The details	-0.124939
-0.896087	tool for details	-0.124939
-0.598544	141 for details	-0.124939
-0.598544	systems" for details	-0.124939
-1.076139	gives more details	-0.124939
-0.898820	also other details	-0.124939
-0.567105	non- standardized details	-0.124939
-0.504962	the technical details	-0.124939
-0.504962	problems. More details	-0.124939
-0.358887	other hardware-related details	-0.124939
-0.358887	parameter. Further details	-0.124939
-2.760174	if the RAM	-0.124939
-1.909581	use of RAM	-0.124939
-1.434621	speed of RAM	-0.124939
-1.699950	amount of RAM	-0.124939
-1.299454	results in RAM	-0.124939
-0.888815	with more RAM	-0.124939
-1.057941	allocate more RAM	-0.124939
-1.054802	data from RAM	-0.124939
-0.886686	variable from RAM	-0.124939
-0.598491	1980 where RAM	-0.124939
-0.858506	may save RAM	-0.124939
-0.577563	CPU time, RAM	-0.124939
-2.651799	that the rows	-0.124939
-2.052799	if the rows	-0.425969
-2.079721	make the rows	-0.124939
-1.769111	number of rows	-0.425969
-1.110205	const int rows	-0.602060
-0.597890	distance between rows	-0.124939
-0.196426	loop through rows	-0.124939
-2.337121	with a square	-0.124939
-1.829641	call to square	-0.124939
-0.902092	ebx. The square	-0.124939
-0.601350	multiply // square	-0.124939
-0.599696	handle one square	-0.124939
-0.598770	8.1a float square	-0.124939
-1.967988	is called square	-0.124939
-0.594497	expected. Use square	-0.124939
-0.589512	techniques like square	-0.124939
-0.557616	approximate reciprocal square	-0.124939
-0.463531	precision division, square	-0.124939
-0.358844	inefficient. Division, square	-0.124939
-2.013422	likely to fail	-0.124939
-0.601292	results or fail	-0.124939
-1.813255	It may fail	-0.124939
-0.596299	example may fail	-0.124939
-1.502557	code will fail	-0.124939
-1.151525	It will fail	-0.124939
-0.588302	trick will fail	-0.124939
-0.597004	companies often fail	-0.124939
-1.414878	because they fail	-0.124939
-1.330667	and therefore fail	-0.124939
-0.562070	may therefore fail	-0.124939
-0.358902	software products fail	-0.124939
-0.728541	many different purposes.	-0.124939
-1.439525	several different purposes.	-0.124939
-0.843501	for other purposes.	-0.124939
-0.373901	for multiple purposes.	-0.124939
-0.198706	for test purposes.	-0.124939
-1.565647	of these purposes.	-0.124939
-0.850913	all these purposes.	-0.124939
-0.358930	for demonstration purposes.	-0.124939
-0.600009	system functions (e.g.	-0.124939
-0.599603	micro-op cache (e.g.	-0.124939
-0.598812	63 number (e.g.	-0.124939
-1.610262	a branch (e.g.	-0.124939
-0.597291	system call (e.g.	-0.124939
-0.594732	system calls (e.g.	-0.124939
-0.592369	relational operators (e.g.	-0.124939
-1.037909	Some applications (e.g.	-0.124939
-1.432978	static linking (e.g.	-0.124939
-0.585010	endian storage (e.g.	-0.124939
-0.579143	universal algorithm (e.g.	-0.124939
-0.917170	carry flag (e.g.	-0.124939
-1.671282	disadvantage of compiling	-0.124939
-1.200109	possibility of compiling	-0.124939
-0.601350	interpreting or compiling	-0.124939
-0.598229	works by compiling	-0.124939
-0.598229	module by compiling	-0.124939
-0.567263	registers when compiling	-0.124939
-0.567263	Studio when compiling	-0.124939
-0.567263	strict when compiling	-0.124939
-0.567263	-fno-pic when compiling	-0.124939
-0.567263	Vec16s when compiling	-0.124939
-0.567263	Func1 when compiling	-0.124939
-0.567263	Eclipse when compiling	-0.124939
-1.728453	efficient to convert	-0.124939
-1.073709	example, to convert	-0.124939
-1.912641	necessary to convert	-0.124939
-1.745120	We can convert	-0.124939
-0.599209	tested can convert	-0.124939
-0.857079	compiler will convert	-0.301030
-1.570499	and then convert	-0.124939
-0.594188	231 then convert	-0.124939
-0.893448	to first convert	-0.124939
-0.889013	compiler must convert	-0.124939
-1.535592	the same thing	-0.425969
-1.366508	than one thing	-0.124939
-0.945385	The first thing	-0.124939
-1.401499	most important thing	-0.124939
-1.157423	The second thing	-0.124939
-0.575582	chains. Another thing	-0.124939
-0.541179	The third thing	-0.124939
-0.541216	an obvious thing	-0.124939
-1.948654	into the least	-0.124939
-0.601250	chooses the least	-0.124939
-0.601250	isolates the least	-0.124939
-0.549384	by at least	-0.124939
-0.549384	has at least	-0.124939
-0.549384	calls at least	-0.124939
-0.549384	supports at least	-0.124939
-0.549384	(or at least	-0.124939
-0.549384	memory, at least	-0.124939
-0.549384	cache, at least	-0.124939
-0.549384	memcpy, at least	-0.124939
-0.549384	do, at least	-0.124939
-0.601565	class for containing	-0.124939
-1.200224	loop-invariant code containing	-0.124939
-1.198449	bit vector containing	-0.124939
-1.728540	a class containing	-0.124939
-0.589742	simple class containing	-0.124939
-1.413768	vector register containing	-0.124939
-0.595983	big file containing	-0.124939
-1.054240	the line containing	-0.124939
-0.589034	large block containing	-0.124939
-0.573245	or subexpression containing	-0.124939
-0.884950	at www.agner.org/optimize/cppexamples.zip containing	-0.124939
-0.463549	access patterns containing	-0.124939
-0.597852	(u.i[1] < 0)	-0.124939
-0.533637	(n > 0)	-0.124939
-0.533637	(bb[i] > 0)	-0.124939
-0.320831	2 == 0)	-0.124939
-0.320831	128 == 0)	-0.124939
-0.320831	(a == 0)	-0.124939
-0.131439	(b == 0)	-0.124939
-0.182910	(a != 0)	-0.124939
-0.182910	(n != 0)	-0.124939
-0.182910	(b != 0)	-0.124939
-0.182910	(*p != 0)	-0.124939
-1.300057	loss of precision.	-0.124939
-0.902387	performance and precision.	-0.124939
-2.578739	floating point precision.	-0.124939
-1.333517	and double precision.	-0.124939
-0.197160	for double precision.	-0.124939
-1.275525	long double precision.	-0.124939
-0.511315	to single precision.	-0.124939
-0.511315	as single precision.	-0.124939
-0.511315	than single precision.	-0.124939
-0.737323	use single precision.	-0.124939
-0.358930	of losing precision.	-0.124939
-1.775813	do the algebraic	-0.124939
-1.202995	possibility of algebraic	-0.124939
-0.601645	methods and algebraic	-0.124939
-2.419940	to use algebraic	-0.124939
-1.198414	cannot make algebraic	-0.124939
-1.493603	is because algebraic	-0.124939
-0.598207	do any algebraic	-0.124939
-0.843143	do simple algebraic	-0.124939
-0.571053	reduce simple algebraic	-0.124939
-0.589485	reduce complicated algebraic	-0.124939
-0.588511	reduce various algebraic	-0.124939
-0.586583	compiler. Many algebraic	-0.124939
-1.924788	use of structures	-0.124939
-1.201108	instances of structures	-0.124939
-0.601350	arrays or structures	-0.124939
-1.318508	of data structures	-0.124939
-1.348423	and data structures	-0.124939
-0.813052	other data structures	-0.124939
-0.813052	multiple data structures	-0.124939
-0.700840	large data structures	-0.124939
-0.989143	big data structures	-0.124939
-0.554712	advanced data structures	-0.124939
-0.594643	and big structures	-0.124939
-2.219505	with a little	-0.124939
-0.896780	run a little	-0.124939
-0.896780	needs a little	-0.124939
-0.896780	becomes a little	-0.124939
-0.598893	seem a little	-0.124939
-0.601159	methods with little	-0.124939
-0.901101	do as little	-0.124939
-0.900351	programmers have little	-0.124939
-0.600426	factor. A little	-0.124939
-0.598375	and takes little	-0.124939
-1.777455	is very little	-0.124939
-0.587956	generates too little	-0.124939
-1.595708	}; // Any	-0.124939
-1.779830	memory allocation Any	-0.124939
-0.577452	entire object. Any	-0.124939
-0.567075	new block. Any	-0.124939
-1.252294	execution units. Any	-0.124939
-0.550723	400 here. Any	-0.124939
-0.805766	loop counter. Any	-0.124939
-0.999223	to maintain. Any	-0.124939
-0.659697	be shared. Any	-0.124939
-0.358844	be saved. Any	-0.124939
-0.358844	spell checking. Any	-0.124939
-0.358844	the device. Any	-0.124939
-2.968495	in the logical	-0.124939
-2.616890	for the logical	-0.124939
-1.296819	though the logical	-0.124939
-0.601813	form a logical	-0.124939
-1.768909	number of logical	-0.124939
-0.203450	cores or logical	-0.425969
-2.641725	the same logical	-0.124939
-1.867466	only one logical	-0.124939
-0.592499	but eight logical	-0.124939
-0.358902	the even-numbered logical	-0.124939
-0.203040	library int level	-0.425969
-1.542829	an extra level	-0.124939
-0.196587	vector element level	-0.124939
-0.592764	/Gr Function level	-0.124939
-0.437352	the high level	-0.124939
-1.032225	a high level	-0.124939
-0.585940	A higher level	-0.124939
-0.764481	the highest level	-0.124939
-0.358902	for "function level	-0.124939
-0.203241	files on access.	-0.124939
-0.587183	by memory access.	-0.124939
-0.587183	with memory access.	-0.124939
-0.587183	another memory access.	-0.124939
-1.369058	with vector access.	-0.124939
-0.897868	at each access.	-0.124939
-0.882984	at every access.	-0.124939
-0.591869	direct hardware access.	-0.124939
-0.579188	optimizing database access.	-0.124939
-1.012055	any non-static access.	-0.124939
-0.562953	than random access.	-0.124939
-2.287460	use the bitwise	-0.124939
-2.016246	using the bitwise	-0.124939
-0.601226	Nevertheless, the bitwise	-0.124939
-0.893389	needed. The bitwise	-0.124939
-0.597183	once The bitwise	-0.124939
-0.597183	||). The bitwise	-0.124939
-0.597183	-1. The bitwise	-0.124939
-0.601179	do with bitwise	-0.124939
-1.910860	of using bitwise	-0.124939
-0.196757	14.3 Use bitwise	-0.425969
-0.659869	the corresponding bitwise	-0.124939
-2.761207	if the handle	-0.124939
-1.285277	way to handle	-0.124939
-0.203276	large to handle	-0.425969
-0.894728	designed to handle	-0.124939
-1.195758	Failure to handle	-0.124939
-0.600627	these and handle	-0.124939
-0.600627	squares and handle	-0.124939
-1.739895	we can handle	-0.124939
-0.600444	should then handle	-0.124939
-1.404374	that doesn't handle	-0.124939
-2.378817	by the heap	-0.124939
-1.759004	when the heap	-0.124939
-1.075441	called the heap	-0.124939
-1.369085	cause the heap	-0.124939
-0.899349	causes the heap	-0.124939
-1.641033	cost of heap	-0.124939
-1.060500	order. The heap	-0.124939
-0.595743	allocation. The heap	-0.124939
-0.890547	heap. The heap	-0.124939
-0.595743	26. The heap	-0.124939
-0.595743	invalid. The heap	-0.124939
-0.575546	instruction mov DWORD	-0.124939
-0.562980	[eax], ecx DWORD	-0.124939
-0.415662	1 ebx, DWORD	-0.124939
-0.586690	add ebx, DWORD	-0.124939
-0.391086	eax edx, DWORD	-0.124939
-0.391086	ecx, edx, DWORD	-0.124939
-0.504962	ebx ecx, DWORD	-0.124939
-0.204078	PTR [edx] DWORD	-0.124939
-0.659783	PTR [esp+8] DWORD	-0.124939
-0.358887	PTR [eax+400] DWORD	-0.124939
-0.358887	PTR [esp+4] DWORD	-0.124939
-0.892615	user's time. Other	-0.124939
-0.586636	known processors. Other	-0.124939
-0.788784	high priority. Other	-0.124939
-0.527278	this format. Other	-0.124939
-0.726685	19 Literature Other	-0.124939
-0.143381	21 3.11 Other	-0.124939
-0.143381	best. 3.11 Other	-0.124939
-0.065800	20 3.9 Other	-0.425969
-0.143381	on. 7.31 Other	-0.124939
-0.143381	61 7.31 Other	-0.124939
-0.358887	and Gnu). Other	-0.124939
-2.094190	is used during	-0.124939
-1.923973	the performance during	-0.124939
-1.059871	executing instructions during	-0.124939
-0.590752	time both during	-0.124939
-1.132855	CPU core during	-0.124939
-1.402120	the computer during	-0.124939
-0.866031	will change during	-0.124939
-0.550677	switch occurs during	-0.124939
-0.527165	be selected during	-0.124939
-0.463531	framework itself, during	-0.124939
-0.463531	array grows during	-0.124939
-0.358844	the framework, during	-0.124939
-2.260588	that is initialized	-0.124939
-2.113907	it is initialized	-0.124939
-1.289635	table is initialized	-0.124939
-0.601708	constants, and initialized	-0.124939
-1.444494	one for initialized	-0.124939
-2.345594	to be initialized	-0.124939
-2.221274	can be initialized	-0.425969
-1.283514	An array initialized	-0.124939
-1.624427	have been initialized	-0.124939
-1.064066	overflow can occur	-0.124939
-0.596963	buffer can occur	-0.124939
-0.596963	collection can occur	-0.124939
-0.891659	expressions may occur	-0.124939
-0.596307	Overflow may occur	-0.124939
-0.600445	break will occur	-0.124939
-0.598678	expressions also occur	-0.124939
-0.594616	overflow doesn't occur	-0.124939
-0.343448	when contentions occur	-0.425969
-0.527309	that seldom occur	-0.124939
-2.053002	if the target	-0.124939
-2.217196	then the target	-0.124939
-0.900755	predict the target	-0.124939
-0.600168	predicted. The target	-0.124939
-0.600168	paragraph. The target	-0.124939
-0.557116	the branch target	-0.301030
-0.780188	The branch target	-0.124939
-0.536276	cache, branch target	-0.124939
-0.573245	a program, especially	-0.124939
-0.828035	32-bit systems, especially	-0.124939
-0.917222	is inefficient, especially	-0.124939
-0.764372	of precision, especially	-0.124939
-0.463549	scarce resource, especially	-0.124939
-0.463549	the file, especially	-0.124939
-0.463549	than relocation, especially	-0.124939
-0.659725	dependency chains, especially	-0.124939
-0.358859	code slower, especially	-0.124939
-0.358859	dependency chain, especially	-0.124939
-0.358859	time consuming, especially	-0.124939
-1.852396	or a smart	-0.124939
-2.034564	use a smart	-0.124939
-1.191649	need a smart	-0.124939
-1.869621	through a smart	-0.124939
-1.069794	whenever a smart	-0.124939
-1.503449	implementations of smart	-0.124939
-0.594489	pointers A smart	-0.124939
-1.056852	used. A smart	-0.124939
-1.910860	of using smart	-0.124939
-0.894841	things very smart	-0.124939
-0.878474	each their smart	-0.124939
-1.202221	loop that includes	-0.124939
-0.878408	compilers. This includes	-0.124939
-0.589544	variables. This includes	-0.124939
-0.589544	speed. This includes	-0.124939
-0.589544	feature. This includes	-0.124939
-2.028600	Intel compiler includes	-0.124939
-0.598664	language also includes	-0.124939
-0.894258	this way includes	-0.124939
-1.180503	map file includes	-0.124939
-0.872884	Static linking includes	-0.124939
-0.358902	www.agner.org/optimize/asmlib.zip. Currently includes	-0.124939
-2.384098	and the entire	-0.124939
-2.741689	in the entire	-0.124939
-1.797766	makes the entire	-0.124939
-1.608710	making the entire	-0.124939
-1.069427	copy the entire	-0.124939
-1.191162	load the entire	-0.124939
-1.069427	copying the entire	-0.124939
-0.598784	loading the entire	-0.124939
-0.598784	almost the entire	-0.124939
-0.896563	mirror the entire	-0.124939
-0.600980	causes an entire	-0.124939
-1.935537	into the executable	-0.124939
-1.293415	want the executable	-0.124939
-1.294928	both the executable	-0.124939
-0.900028	Only the executable	-0.124939
-0.600526	Both the executable	-0.124939
-1.296612	by an executable	-0.124939
-1.860372	a single executable	-0.124939
-0.585144	as binary executable	-0.124939
-0.464291	the main executable	-0.124939
-2.761207	if the subexpression	-0.124939
-1.599978	such a subexpression	-0.124939
-0.901613	expression or subexpression	-0.124939
-2.643731	the same subexpression	-0.124939
-0.537226	as common subexpression	-0.124939
-0.537226	allows common subexpression	-0.124939
-0.537226	inlining, common subexpression	-0.124939
-0.182900	integer Common subexpression	-0.124939
-0.182900	2; Common subexpression	-0.124939
-0.182900	reductions: Common subexpression	-0.124939
-0.182900	elimination Common subexpression	-0.124939
-2.212563	is to insert	-0.124939
-1.599090	possible to insert	-0.124939
-0.896319	Remember to insert	-0.124939
-0.598661	risking to insert	-0.124939
-1.525236	time and insert	-0.124939
-1.480798	memory and insert	-0.124939
-1.429625	set and insert	-0.124939
-0.598496	hand and insert	-0.124939
-2.023544	compiler can insert	-0.124939
-2.095347	You may insert	-0.124939
-2.238847	then the nontemporal	-0.124939
-1.298575	Using the nontemporal	-0.124939
-1.203345	effect of nontemporal	-0.124939
-0.601602	area. The nontemporal	-0.124939
-0.596696	#pragma vector nontemporal	-0.124939
-2.087224	by using nontemporal	-0.124939
-1.009317	The so-called nontemporal	-0.124939
-1.027686	Don't mix nontemporal	-0.124939
-0.573289	can insert nontemporal	-0.124939
-1.601302	outside the bounds	-0.124939
-0.601828	added a bounds	-0.124939
-1.379239	method of bounds	-0.124939
-1.057495	array with bounds	-0.124939
-1.057495	arrays with bounds	-0.124939
-0.888513	Array with bounds	-0.124939
-0.601142	134 on bounds	-0.124939
-1.117752	of array bounds	-0.124939
-0.331846	for array bounds	-0.124939
-0.601655	inlined for improved	-0.124939
-1.482204	can be improved	-0.550907
-1.859649	will be improved	-0.124939
-0.585768	probably be improved	-0.124939
-2.495374	and the SSE	-0.124939
-2.379412	with the SSE	-0.124939
-1.793174	has the SSE	-0.124939
-1.075595	support the SSE	-0.124939
-0.885103	4 128 SSE	-0.124939
-1.164745	bit mode SSE	-0.124939
-0.818472	-ffunction- sections SSE	-0.124939
-0.358902	PREFETCH _mm_prefetch SSE	-0.124939
-0.358902	MOVNTPS _mm_stream_ps SSE	-0.124939
-0.358902	MMX mmintrin.h SSE	-0.124939
-0.358902	MOVNTQ _mm_stream_pi SSE	-0.124939
-2.651704	it is discussed	-0.124939
-2.026178	It is discussed	-0.425969
-0.897146	threads is discussed	-0.124939
-0.599078	conversions is discussed	-0.124939
-0.596006	optimization are discussed	-0.124939
-1.664336	libraries are discussed	-0.124939
-1.271571	methods are discussed	-0.124939
-0.596006	time-consumers are discussed	-0.124939
-0.601110	devices, as discussed	-0.124939
-1.919222	is also discussed	-0.124939
-1.203567	need the updates	-0.124939
-0.378805	search for updates	-0.124939
-0.896106	searching for updates	-0.124939
-0.600215	downloaded program updates	-0.124939
-0.588594	install automatic updates	-0.124939
-0.175453	3.4 Automatic updates	-0.124939
-0.562974	If frequent updates	-0.124939
-1.082667	time consuming updates	-0.124939
-0.358902	automatically download updates	-0.124939
-1.920140	important to consider	-0.124939
-0.827418	you may consider	-0.346788
-2.014256	If you consider	-0.124939
-1.197734	we will consider	-0.124939
-0.889893	that I consider	-0.124939
-1.178237	you must consider	-0.124939
-0.567957	we must consider	-0.124939
-0.601568	requires the loading	-0.124939
-0.601568	involve the loading	-0.124939
-2.216697	will be loading	-0.124939
-1.772439	more time loading	-0.124939
-0.899613	cache from loading	-0.124939
-0.600215	But program loading	-0.124939
-0.574742	memory without loading	-0.124939
-0.574742	double without loading	-0.124939
-0.818670	of lazy loading	-0.124939
-0.129469	3.5 Program loading	-0.124939
-0.601515	all be below	-0.124939
-0.897686	the example below	-0.124939
-1.371083	an address below	-0.124939
-0.885745	is explained below	-0.124939
-0.184203	loop columns below	-0.425969
-0.563061	ReadTSC listed below	-0.124939
-0.788784	row 28 below	-0.124939
-0.249808	elements matrix[r][c] below	-0.124939
-0.358932	element matrix[r][c] below	-0.124939
-0.463586	example 7.15b below	-0.124939
-1.202155	case the reading	-0.124939
-1.444214	off the reading	-0.124939
-1.503083	applies to reading	-0.124939
-0.900178	stack and reading	-0.124939
-0.600601	optimize, and reading	-0.124939
-1.741025	we are reading	-0.124939
-0.901555	input or reading	-0.124939
-0.601159	connection with reading	-0.124939
-1.076381	spent on reading	-0.124939
-2.000092	faster than reading	-0.124939
-2.233343	rather than reading	-0.124939
-1.078713	Obviously, the directly	-0.124939
-2.538155	the function directly	-0.124939
-1.496974	well as directly	-0.124939
-0.594744	Make calls directly	-0.124939
-0.590788	instructions write directly	-0.124939
-0.557709	point representation directly	-0.124939
-0.805799	of C++, directly	-0.124939
-0.726618	measurement instruments directly	-0.124939
-0.463549	call C1::f directly	-0.124939
-0.463549	be fed directly	-0.124939
-0.358859	// Called directly	-0.124939
-2.322245	is the simplest	-0.124939
-2.321420	in the simplest	-0.124939
-2.533116	for the simplest	-0.124939
-1.690077	only the simplest	-0.124939
-0.679021	gives the simplest	-0.124939
-1.064830	vector. The simplest	-0.124939
-0.597223	tools. The simplest	-0.124939
-0.597223	module2.cpp. The simplest	-0.124939
-0.597223	repetitive. The simplest	-0.124939
-2.369588	is the situation	-0.124939
-2.699987	to the situation	-0.124939
-2.968495	in the situation	-0.124939
-0.902174	structure. The situation	-0.124939
-0.600593	a use situation	-0.124939
-1.074299	space. A situation	-0.124939
-1.196018	the only situation	-0.124939
-0.895505	in any situation	-0.124939
-0.367820	worst case situation	-0.124939
-0.594211	A common situation	-0.124939
-2.190042	from the message	-0.124939
-2.771340	in a message	-0.124939
-0.601734	mutexes and message	-0.124939
-0.426045	an error message	-0.221849
-0.442483	An error message	-0.124939
-0.627073	own error message	-0.124939
-0.627073	appropriate error message	-0.124939
-1.529646	time. The delay	-0.124939
-0.896271	line. The delay	-0.124939
-0.896271	linker. The delay	-0.124939
-0.601030	ms. This delay	-0.124939
-2.256812	the time delay	-0.124939
-1.197654	which will delay	-0.124939
-1.495690	a large delay	-0.124939
-0.594602	operation doesn't delay	-0.124939
-0.541179	A considerable delay	-0.124939
-0.065802	store forwarding delay	-0.425969
-2.969222	in the condition	-0.124939
-2.058645	if the condition	-0.124939
-0.601821	testing a condition	-0.124939
-1.076866	the if condition	-0.124939
-2.326793	the loop condition	-0.124939
-1.691242	an error condition	-0.124939
-0.971320	the overflow condition	-0.124939
-0.563053	uncaught overflow condition	-0.124939
-0.894017	loop control condition	-0.124939
-0.815102	the performance monitor	-0.301030
-0.838883	The performance monitor	-0.124939
-0.172335	more performance monitor	-0.425969
-0.462286	A performance monitor	-0.124939
-0.462286	called performance monitor	-0.124939
-0.462286	useful performance monitor	-0.124939
-0.172335	Using performance monitor	-0.425969
-1.203669	economize the resource	-0.124939
-1.078360	sources of resource	-0.124939
-0.601262	modules or resource	-0.124939
-2.639729	the same resource	-0.124939
-1.224317	and other resource	-0.124939
-0.805832	configuration files, resource	-0.124939
-1.090016	to economize resource	-0.124939
-0.884861	a scarce resource	-0.124939
-0.358873	a precious resource	-0.124939
-0.358873	shared objects), resource	-0.124939
-1.513962	number of cores	-0.124939
-1.662237	the different cores	-0.124939
-1.879976	the CPU cores	-0.124939
-0.338244	multiple CPU cores	-0.301030
-1.425039	with multiple cores	-0.124939
-0.885096	with four cores	-0.124939
-0.527325	FPGA soft cores	-0.124939
-1.939785	has a parallel	-0.124939
-1.712729	sake of parallel	-0.124939
-1.202821	calculations in parallel	-0.124939
-0.902121	directives for parallel	-0.124939
-1.050048	for doing parallel	-0.124939
-0.878389	logic allows parallel	-0.124939
-0.510468	options. Supports parallel	-0.124939
-0.510468	Mac. Supports parallel	-0.124939
-0.550753	for specifying parallel	-0.124939
-0.463568	is inherently parallel	-0.124939
-0.358873	with massively parallel	-0.124939
-1.940919	code in either	-0.124939
-1.070349	times faster either	-0.124939
-1.036081	be implemented either	-0.425969
-1.159513	be linked either	-0.124939
-1.368358	may choose either	-0.124939
-0.567057	can contain either	-0.124939
-0.957715	be saved either	-0.124939
-0.527262	of going either	-0.124939
-0.835696	memory blocks, either	-0.124939
-0.463568	processing unit, either	-0.124939
-0.738914	two different implementations	-0.425969
-0.569293	loop. Some implementations	-0.124939
-0.569293	run. Some implementations	-0.124939
-1.299345	most common implementations	-0.124939
-0.564766	All common implementations	-0.124939
-0.589607	code. Most implementations	-0.124939
-0.589536	and their implementations	-0.124939
-0.861442	particularly slow implementations	-0.124939
-0.570514	if alternative implementations	-0.124939
-0.463604	Some early implementations	-0.124939
-1.940026	instead of calculating	-0.124939
-1.451733	used for calculating	-0.124939
-1.326124	variables for calculating	-0.124939
-0.592565	processor for calculating	-0.124939
-1.646010	support for calculating	-0.124939
-1.255503	needed for calculating	-0.124939
-1.434434	intended for calculating	-0.124939
-2.082710	faster than calculating	-0.124939
-0.900112	implicitly when calculating	-0.124939
-0.527313	to begin calculating	-0.124939
-2.176528	value of ebx	-0.124939
-0.601730	i/2 in ebx	-0.124939
-1.272573	The result ebx	-0.124939
-0.885011	It uses ebx	-0.124939
-0.579125	; save ebx	-0.124939
-0.557635	to. Now ebx	-0.124939
-0.550696	code. Register ebx	-0.124939
-0.835653	+ esp ebx	-0.124939
-0.504921	100. pop ebx	-0.124939
-0.504921	100 $B1$2 ebx	-0.124939
-0.358859	; restore ebx	-0.124939
-2.642727	the same generation	-0.124939
-0.945404	The first generation	-0.124939
-0.891972	each new generation	-0.124939
-0.973458	the next generation	-0.425969
-1.074970	the second generation	-0.124939
-0.425000	The second generation	-0.425969
-0.557689	or compile-time generation	-0.124939
-0.541201	the third generation	-0.124939
-2.212170	is to enable	-0.124939
-2.211552	order to enable	-0.124939
-1.324965	recommended to enable	-0.124939
-0.896291	options to enable	-0.124939
-1.078127	up and enable	-0.124939
-1.202643	mode or enable	-0.124939
-1.635160	This may enable	-0.124939
-1.585266	This will enable	-0.124939
-1.174053	which will enable	-0.124939
-1.637352	instruction sets enable	-0.124939
-1.882182	floating point instructions.	-0.124939
-0.596405	access these instructions.	-0.124939
-0.969466	the AVX instructions.	-0.124939
-1.038200	and string instructions.	-0.124939
-0.585924	Cache control instructions.	-0.124939
-1.149024	the subsequent instructions.	-0.124939
-0.557735	few machine instructions.	-0.124939
-1.162385	bit scan instructions.	-0.124939
-0.527228	for detailed instructions.	-0.124939
-1.127121	object is copied	-0.124939
-1.366573	parameter is copied	-0.124939
-0.599757	14.1c is copied	-0.124939
-2.002397	can be copied	-0.124939
-2.589745	is not copied	-0.124939
-1.765838	has been copied	-0.124939
-0.505065	entire contents copied	-0.124939
-0.358930	created, deleted, copied	-0.124939
-1.641487	vector of e.g.	-0.124939
-1.774241	time to e.g.	-0.124939
-2.278516	such as e.g.	-0.124939
-2.620559	the compiler e.g.	-0.124939
-1.396798	instruction set, e.g.	-0.124939
-1.048532	can hold e.g.	-0.124939
-0.557598	set available, e.g.	-0.124939
-0.504921	programming language, e.g.	-0.124939
-0.358859	internal multi-threading, e.g.	-0.124939
-0.358859	module with, e.g.	-0.124939
-0.358859	an interrupt, e.g.	-0.124939
-2.151551	is to keep	-0.124939
-1.811308	has to keep	-0.124939
-1.901819	way to keep	-0.124939
-1.999850	want to keep	-0.124939
-1.831535	advantageous to keep	-0.124939
-1.346745	fail to keep	-0.124939
-0.596300	computers to keep	-0.124939
-1.062126	preferable to keep	-0.124939
-1.202816	objects and keep	-0.124939
-0.891411	they always keep	-0.124939
-0.358959	Microprocessor producers keep	-0.124939
-0.050303	mov DWORD PTR	-0.124939
-0.050303	ecx DWORD PTR	-0.124939
-0.024424	ebx, DWORD PTR	-0.425969
-0.024424	edx, DWORD PTR	-0.124939
-0.050303	ecx, DWORD PTR	-0.124939
-0.115982	[edx] DWORD PTR	-0.124939
-0.050303	[esp+8] DWORD PTR	-0.124939
-0.050303	[eax+400] DWORD PTR	-0.124939
-0.050303	[esp+4] DWORD PTR	-0.124939
-1.071536	instr. set Automatic	-0.124939
-0.587305	float expressions Automatic	-0.124939
-1.467359	CPU dispatch Automatic	-0.124939
-1.203241	Automatic vectorization Automatic	-0.124939
-0.788784	installation tools. Automatic	-0.124939
-0.143381	18 3.4 Automatic	-0.124939
-0.143381	manner. 3.4 Automatic	-0.124939
-0.463586	Example 12.1a. Automatic	-0.124939
-0.143381	future. 12.3 Automatic	-0.124939
-0.143381	107 12.3 Automatic	-0.124939
-0.358887	Automatic updates. Automatic	-0.124939
-0.594753	Object Windows Library	-0.124939
-0.097376	Windows Template Library	-0.124939
-0.223093	Standard Template Library	-0.124939
-0.223093	Active Template Library	-0.124939
-0.557775	Core Math Library	-0.124939
-1.114132	by 16. Library	-0.124939
-1.096489	Math Kernel Library	-0.124939
-0.527249	fully optimized. Library	-0.124939
-0.726718	AMD LIBM Library	-0.124939
-0.358902	functions directly: Library	-0.124939
-1.513373	= a ?	-0.425969
-1.850998	= b ?	-0.124939
-0.370111	> b ?	-0.425969
-0.367833	> 0 ?	-0.425969
-0.692187	> 0) ?	-0.124939
-1.090127	== 0) ?	-0.124939
-0.463641	== EXCEPTION_FLT_OVERFLOW ?	-0.124939
-0.358930	as OneOrTwo5[(b!=0) ?	-0.124939
-1.704502	function is defined	-0.425969
-0.899864	body is defined	-0.124939
-0.898304	etc. are defined	-0.124939
-1.289163	constants are defined	-0.124939
-0.600223	programmer has defined	-0.124939
-0.598827	static object defined	-0.124939
-0.598071	(i.e. variables defined	-0.124939
-1.624207	have been defined	-0.124939
-0.195820	Vector classes defined	-0.425969
-0.601842	Basic is Visual	-0.124939
-0.709150	the Microsoft Visual	-0.124939
-0.588605	is Microsoft Visual	-0.124939
-0.588605	to Microsoft Visual	-0.124939
-0.416953	below. Microsoft Visual	-0.124939
-0.416953	date): Microsoft Visual	-0.124939
-0.527291	multi-core processing. Visual	-0.124939
-0.249830	as C#, Visual	-0.124939
-0.249830	Java, C#, Visual	-0.124939
-0.505022	for free. Visual	-0.124939
-0.358930	respectively (MS Visual	-0.124939
-2.230919	order to align	-0.124939
-1.372411	how to align	-0.124939
-1.071381	choose to align	-0.124939
-2.023459	compiler can align	-0.124939
-1.070133	x // align	-0.124939
-0.599023	elements // align	-0.124939
-1.141590	compilers will align	-0.124939
-0.567941	return ; align	-0.124939
-0.567941	esp ; align	-0.124939
-0.901019	allocations of sizes	-0.124939
-0.601023	Bit-fields of sizes	-0.124939
-1.140278	of different sizes	-0.425969
-0.870769	and different sizes	-0.124939
-1.687516	on all sizes	-0.124939
-1.290370	for array sizes	-0.124939
-1.414055	vector register sizes	-0.124939
-0.886450	different matrix sizes	-0.124939
-0.880198	operators Integer sizes	-0.124939
-0.584185	of smaller sizes	-0.124939
-1.042250	a[i] = temp;	-0.124939
-0.199281	c; double temp;	-0.425969
-0.578389	c2; double temp;	-0.124939
-0.200644	temp * temp;	-0.124939
-0.597188	float register temp;	-0.124939
-0.585200	a[100], b, temp;	-0.124939
-1.408092	b, c, temp;	-0.124939
-0.835825	i, a[100], temp;	-0.124939
-1.443793	instructions that allow	-0.124939
-2.026947	does not allow	-0.124939
-1.630538	This will allow	-0.124939
-0.599662	C++ should allow	-0.124939
-1.785523	C++ compilers allow	-0.124939
-1.328834	32-bit systems allow	-0.124939
-1.075661	Some systems allow	-0.124939
-0.554891	Unix systems allow	-0.124939
-1.427115	and Mac allow	-0.124939
-0.582025	16-bit Windows, allow	-0.124939
-0.579252	precision math allow	-0.124939
-2.570998	on the PathScale	-0.124939
-0.800576	Intel and PathScale	-0.425969
-0.378335	Gnu and PathScale	-0.425969
-0.893790	Microsoft and PathScale	-0.124939
-1.076924	Intel or PathScale	-0.124939
-1.174058	Microsoft, Intel, PathScale	-0.124939
-0.582010	all platforms. PathScale	-0.124939
-0.527309	Mars PGI PathScale	-0.124939
-0.358916	(Red Hat). PathScale	-0.124939
-1.503197	applies to BSD	-0.124939
-1.712849	Linux and BSD	-0.124939
-1.014502	objects in BSD	-0.124939
-0.305176	of Linux, BSD	-0.124939
-0.431506	in Linux, BSD	-0.124939
-0.483356	64-bit Linux, BSD	-0.124939
-0.304136	Windows, Linux, BSD	-0.124939
-0.557732	and Open BSD	-0.124939
-0.358930	Linux, Mac, BSD	-0.124939
-0.599136	e + f;	-0.124939
-0.167022	{ float f;	-0.726999
-1.039680	i; float f;	-0.124939
-0.598091	i; return f;	-0.124939
-2.820211	of the previous	-0.124939
-2.540817	to the previous	-0.124939
-2.293169	in the previous	-0.124939
-2.401589	on the previous	-0.124939
-1.229128	from the previous	-0.602060
-1.969341	using the previous	-0.124939
-1.530763	until the previous	-0.124939
-1.803170	from a previous	-0.124939
-0.567891	i < size;	-0.865301
-2.615446	it is rarely	-0.124939
-2.020197	It is rarely	-0.124939
-1.071837	mechanism is rarely	-0.124939
-0.599107	feature is rarely	-0.124939
-1.553148	time and rarely	-0.124939
-2.321186	that it rarely	-0.124939
-0.599852	programs but rarely	-0.124939
-0.570547	Application programmers rarely	-0.124939
-0.570532	advanced features rarely	-0.124939
-1.845136	a different way.	-0.124939
-1.809881	the other way.	-0.124939
-0.967311	the following way.	-0.124939
-0.505061	an inefficient way.	-0.124939
-0.902868	very inefficient way.	-0.124939
-0.573289	going either way.	-0.124939
-0.241868	a suboptimal way.	-0.124939
-0.358916	a graceful way.	-0.124939
-2.728789	to the vector.	-0.124939
-1.954532	into the vector.	-0.124939
-2.074075	in a vector.	-0.124939
-1.658637	in one vector.	-0.124939
-0.599290	full size vector.	-0.124939
-0.877562	a Boolean vector.	-0.124939
-1.368956	the last vector.	-0.124939
-0.332740	elements per vector.	-0.124939
-0.940572	the largest vector.	-0.124939
-2.283478	that is easier	-0.124939
-2.683936	It is easier	-0.124939
-0.899864	15.1b is easier	-0.124939
-0.601696	manageable and easier	-0.124939
-0.899021	makes it easier	-0.425969
-1.139684	is often easier	-0.124939
-0.586684	calculation becomes easier	-0.124939
-1.136785	is just easier	-0.124939
-0.505002	this reordering easier	-0.124939
-1.200449	to is identical	-0.124939
-1.375627	p is identical	-0.124939
-1.289163	constants are identical	-0.124939
-0.599661	BSD are identical	-0.124939
-0.591526	stored. All identical	-0.124939
-0.563015	give almost identical	-0.124939
-0.615753	is exactly identical	-0.124939
-0.435049	make exactly identical	-0.124939
-0.835825	by joining identical	-0.124939
-0.143390	analysis Join identical	-0.124939
-0.143390	area. Join identical	-0.124939
-0.902286	5 and 20	-0.124939
-0.203112	10 - 20	-0.425969
-0.600296	reduced from 20	-0.124939
-0.764503	loop repeats 20	-0.124939
-0.527207	database ...................................................................................................... 20	-0.124939
-0.463568	code ....................................................... 20	-0.124939
-0.463568	..................................................................................................................... 163 20	-0.124939
-0.358873	some links. 20	-0.124939
-0.358873	File access................................................................................................................ 20	-0.124939
-0.358873	(*.ini files). 20	-0.124939
-0.597500	smaller as well.	-0.124939
-0.597500	languages as well.	-0.124939
-0.597902	Optimizes very well.	-0.124939
-0.880945	not optimize well.	-0.124939
-0.738325	is predicted well.	-0.124939
-0.927483	be predicted well.	-0.124939
-0.456752	not predicted well.	-0.124939
-0.764518	version performs well.	-0.124939
-0.107190	Optimizes reasonably well.	-0.124939
-0.358916	Optimizes moderately well.	-0.124939
-2.232789	of the program,	-0.124939
-1.747490	by the program,	-0.124939
-0.899314	execute the program,	-0.124939
-2.654534	of a program,	-0.124939
-1.069179	a C++ program,	-0.124939
-0.871532	in your program,	-0.124939
-1.482572	the final program,	-0.124939
-0.505022	a multithreaded program,	-0.124939
-1.979692	address of list[i]	-0.124939
-1.230540	else { list[i]	-0.425969
-1.424614	the expression list[i]	-0.124939
-0.587938	ARRAYSIZE && list[i]	-0.124939
-1.509048	cout << list[i]	-0.124939
-0.065802	i<300; i++){ list[i]	-0.124939
-0.143386	i<300; i+=3){ list[i]	-0.124939
-0.143386	i<301; i+=3){ list[i]	-0.124939
-0.358902	i<300; i+=3,i_div_3++){ list[i]	-0.124939
-1.296714	response time under	-0.124939
-1.475560	the program under	-0.301030
-1.924413	the performance under	-0.124939
-0.594405	performs best under	-0.124939
-1.339899	are done under	-0.124939
-1.229790	be tested under	-0.124939
-0.917326	that runs under	-0.124939
-0.504962	background services under	-0.124939
-0.463586	in Wikipedia under	-0.124939
-0.902387	instruction and expect	-0.124939
-1.533582	you can expect	-0.124939
-2.045970	do not expect	-0.124939
-2.020513	if you expect	-0.124939
-1.476153	unless you expect	-0.124939
-1.683374	that we expect	-0.124939
-1.288435	you cannot expect	-0.124939
-0.599050	You cannot expect	-0.602060
-1.601096	a register except	-0.124939
-0.198099	all bits except	-0.425969
-0.577530	same time, except	-0.124939
-0.557688	same object, except	-0.124939
-0.541181	static library, except	-0.124939
-0.764408	and underflow except	-0.124939
-0.504941	16-bit programs, except	-0.124939
-0.902858	the stack, except	-0.124939
-0.358873	much faster, except	-0.124939
-0.358873	the representation, except	-0.124939
-2.427552	with the loops	-0.124939
-0.900208	compile- time loops	-0.124939
-1.009591	the two loops	-0.124939
-0.598643	replace such loops	-0.124939
-0.886784	very small loops	-0.124939
-0.590805	outside both loops	-0.124939
-0.579284	will unroll loops	-0.124939
-0.726685	template metaprogramming, loops	-0.124939
-0.065800	processor. Nested loops	-0.425969
-0.059679	the reason why	-0.124939
-0.536824	The reason why	-0.124939
-0.312980	main reason why	-0.124939
-0.570590	main reasons why	-0.124939
-0.567253	no explanation why	-0.124939
-0.463677	example explains why	-0.124939
-1.538822	the CPU dispatching.	-0.124939
-0.534157	to CPU dispatching.	-0.124939
-0.776478	with CPU dispatching.	-0.124939
-0.534157	called CPU dispatching.	-0.124939
-0.534157	without CPU dispatching.	-0.124939
-0.330754	automatic CPU dispatching.	-0.124939
-0.534157	poor CPU dispatching.	-0.124939
-0.534157	bad CPU dispatching.	-0.124939
-0.567018	size) { cout	-0.124939
-0.087378	Disp() { cout	-0.726999
-0.196882	Hello() { cout	-0.425969
-0.567018	int)size) { cout	-0.124939
-0.598534	through array cout	-0.124939
-0.842333	of f cout	-0.124939
-0.855041	pointers and references.	-0.124939
-1.201171	pointers or references.	-0.124939
-0.601179	impossible with references.	-0.124939
-0.599379	when using references.	-0.124939
-0.335962	for local references.	-0.124939
-0.073776	for internal references.	-0.124939
-2.310362	possible to come	-0.124939
-0.598243	elements that come	-0.124939
-1.071580	libraries that come	-0.124939
-0.598243	mathimf.h that come	-0.124939
-0.601292	uninitialized or come	-0.124939
-0.600850	updates may come	-0.124939
-0.895830	big objects come	-0.124939
-1.580276	if they come	-0.124939
-0.589560	shall automatically come	-0.124939
-1.595236	data members come	-0.124939
-1.554602	series of statements	-0.124939
-0.601292	compile-time if statements	-0.124939
-0.598830	while multiple statements	-0.124939
-1.252344	to add statements	-0.124939
-0.271844	and switch statements	-0.124939
-0.519800	for switch statements	-0.124939
-0.519800	A switch statements	-0.124939
-0.369435	because switch statements	-0.124939
-0.463641	ways. Switch statements	-0.124939
-1.596796	d = u;	-0.124939
-1.045000	unsigned int u;	-0.602060
-0.361992	i; } u;	-0.346788
-0.570190	i[2]; } u;	-0.124939
-2.730092	if the SSE4.1	-0.124939
-1.772704	unless the SSE4.1	-0.124939
-0.601794	prior to SSE4.1	-0.124939
-1.444316	one for SSE4.1	-0.124939
-0.601376	SSE2 // SSE4.1	-0.124939
-0.901270	vectorized with SSE4.1	-0.124939
-1.071536	instr. set SSE4.1	-0.124939
-1.269528	vector instructions SSE4.1	-0.124939
-0.541198	class library, SSE4.1	-0.124939
-0.358887	SSE3 tmmintrin.h SSE4.1	-0.124939
-3.065668	in the chapter	-0.124939
-1.538553	explained in chapter	-0.124939
-0.853724	described in chapter	-0.124939
-1.068445	mentioned in chapter	-0.124939
-0.601037	class? This chapter	-0.124939
-0.597465	variables. See chapter	-0.124939
-1.368987	The next chapter	-0.124939
-1.431961	the previous chapter	-0.124939
-0.358916	"Macro loops" chapter	-0.124939
-2.154846	which is similar	-0.124939
-1.076405	template is similar	-0.124939
-1.898909	or a similar	-0.124939
-0.600601	strings and similar	-0.124939
-0.600601	blocking and similar	-0.124939
-0.601584	reveals that similar	-0.124939
-0.600688	Intel have similar	-0.124939
-0.600426	138 A similar	-0.124939
-1.280557	are very similar	-0.124939
-1.175925	library contains similar	-0.124939
-0.089740	is of course	-0.124939
-0.596422	are of course	-0.124939
-0.596422	may of course	-0.124939
-0.596422	should of course	-0.124939
-0.596422	would of course	-0.124939
-0.358973	a zigzag course	-0.124939
-0.358973	time. (Of course	-0.124939
-1.068466	address and back	-0.124939
-0.502986	mode and back	-0.425969
-0.895917	truncation and back	-0.124939
-1.883205	the result back	-0.124939
-0.583175	and go back	-0.124939
-0.575600	the priority back	-0.124939
-0.550743	r places back	-0.124939
-0.541201	and jumps back	-0.124939
-0.358916	that dates back	-0.124939
-1.636060	without the risk	-0.124939
-0.504653	involves the risk	-0.425969
-2.021211	is a risk	-0.124939
-0.599861	Faster, but risk	-0.124939
-1.296671	is no risk	-0.602060
-1.305208	a higher risk	-0.124939
-1.940193	has a garbage	-0.124939
-0.598459	switches and garbage	-0.124939
-0.598459	deallocation and garbage	-0.124939
-0.203397	management and garbage	-0.124939
-1.077817	need for garbage	-0.124939
-0.901047	fragmented. This garbage	-0.124939
-1.969275	is called garbage	-0.124939
-0.868138	will start garbage	-0.124939
-0.981876	very time-consuming garbage	-0.124939
-1.924351	use of templates	-0.124939
-1.201040	form of templates	-0.124939
-1.202665	classes and templates	-0.124939
-0.597923	effect with templates	-0.124939
-0.894855	polymorphism with templates	-0.124939
-1.322085	container class templates	-0.124939
-0.589796	containers class templates	-0.124939
-1.535206	to using templates	-0.124939
-0.589122	150. Using templates	-0.124939
-0.505002	functions, classes, templates	-0.124939
-1.802190	check for buffer	-0.124939
-1.629756	a memory buffer	-0.124939
-2.327018	the loop buffer	-0.124939
-1.361056	a static buffer	-0.124939
-0.265936	branch target buffer	-0.124939
-0.028024	a circular buffer	-0.301030
-3.082705	of the header	-0.124939
-2.517806	and the header	-0.124939
-2.287249	use the header	-0.124939
-0.601821	including a header	-0.124939
-0.601696	modules and header	-0.124939
-0.601394	"xmmintrin.h" // header	-0.124939
-1.366503	in Intel header	-0.124939
-1.498490	the standard header	-0.124939
-0.862211	the appropriate header	-0.425969
-3.014773	in the future	-0.124939
-1.711648	In the future	-0.124939
-1.829882	If a future	-0.124939
-1.299213	choice for future	-0.124939
-0.902153	hope that future	-0.124939
-1.263500	best on future	-0.124939
-0.594290	solution on future	-0.124939
-0.594290	bytes) on future	-0.124939
-2.364924	rather than future	-0.124939
-0.575629	Object1.Hello(), though future	-0.124939
-1.424420	be called whenever	-0.124939
-1.672074	is useful whenever	-0.124939
-1.494246	point calculations whenever	-0.124939
-1.900778	clock cycles whenever	-0.124939
-1.513448	to zero whenever	-0.124939
-1.040055	extra cost whenever	-0.124939
-1.142924	are declared whenever	-0.124939
-1.149149	be mispredicted whenever	-0.124939
-0.541181	(PLT). And whenever	-0.124939
-0.358873	own initiative whenever	-0.124939
-1.067817	gain by unrolling	-0.124939
-0.598239	dramatically by unrolling	-0.124939
-1.550205	The loop unrolling	-0.124939
-0.584084	excessive loop unrolling	-0.124939
-0.584084	Excessive loop unrolling	-0.124939
-0.347518	} Loop unrolling	-0.124939
-0.347518	time. Loop unrolling	-0.124939
-0.347518	case. Loop unrolling	-0.124939
-0.347518	factor. Loop unrolling	-0.124939
-0.347518	eliminated. Loop unrolling	-0.124939
-1.941828	versions of CriticalFunction	-0.124939
-1.829855	call to CriticalFunction	-0.124939
-0.600902	"C" int CriticalFunction	-0.124939
-0.598454	CriticalFunctionType * CriticalFunction	-0.124939
-0.598455	Generic version CriticalFunction	-0.124939
-1.177860	of times CriticalFunction	-0.124939
-0.816407	SSE2 supported CriticalFunction	-0.124939
-0.816407	AVX supported CriticalFunction	-0.124939
-0.592496	on whether CriticalFunction	-0.124939
-1.307999	to execute CriticalFunction	-0.124939
-1.073731	system to swap	-0.124939
-0.203754	macro to swap	-0.425969
-0.601721	diagonal and swap	-0.124939
-0.599023	diagonal // swap	-0.124939
-0.599023	a[c][r]); // swap	-0.124939
-1.675067	Do not swap	-0.124939
-0.901764	you cannot swap	-0.425969
-1.361358	You cannot swap	-0.124939
-1.441318	uses a newer	-0.124939
-0.901135	choose a newer	-0.124939
-1.640933	available in newer	-0.124939
-1.377470	set. The newer	-0.124939
-1.076381	fast on newer	-0.124939
-1.074299	used. A newer	-0.124939
-1.593993	on all newer	-0.124939
-0.591357	while all newer	-0.124939
-1.193650	on most newer	-0.124939
-0.591506	system All newer	-0.124939
-2.541459	and the fraction	-0.124939
-1.202202	setting the fraction	-0.124939
-1.044945	unsigned int fraction	-0.602060
-0.599771	16383 one fraction	-0.124939
-1.495964	a large fraction	-0.124939
-1.450898	a small fraction	-0.124939
-0.553831	127 1 fraction	-0.124939
-0.553831	1023 1 fraction	-0.124939
-1.926030	necessary to modify	-0.124939
-2.019005	recommended to modify	-0.124939
-0.745902	function can modify	-0.124939
-0.896462	classes or modify	-0.124939
-0.598733	remove or modify	-0.124939
-2.539470	the function modify	-0.124939
-1.189276	If we modify	-0.124939
-1.186376	function cannot modify	-0.124939
-0.589115	and don't modify	-0.124939
-2.176814	value of seconds	-0.124939
-1.827232	assume that seconds	-0.124939
-2.364139	rather than seconds	-0.124939
-0.600738	DelayFiveSeconds() { seconds	-0.124939
-0.898820	thread. If seconds	-0.124939
-1.367542	to set seconds	-0.124939
-0.594875	nothing while seconds	-0.124939
-1.167942	for several seconds	-0.124939
-0.975739	take several seconds	-0.124939
-1.009623	wait until seconds	-0.124939
-0.601625	account for unaligned	-0.124939
-0.754933	to store unaligned	-0.602060
-0.589149	efficiency. Using unaligned	-0.124939
-0.604847	to load unaligned	-0.602060
-0.366828	memcpy 16kB unaligned	-0.425969
-0.600642	through this address.	-0.124939
-1.629504	a memory address.	-0.124939
-1.844631	a different address.	-0.124939
-2.640726	the same address.	-0.124939
-0.595162	calculate its address.	-0.124939
-0.530590	specific load address.	-0.124939
-0.530590	actual load address.	-0.124939
-0.550733	a self-relative address.	-0.124939
-0.788784	a valid address.	-0.124939
-0.358887	32-bit (signed) address.	-0.124939
-0.501078	c); // Store	-0.425969
-0.883101	bc); // Store	-0.124939
-0.591951	mask); // Store	-0.124939
-0.591951	//=DeltaY // Store	-0.124939
-0.573384	_mm_stream_si32 SSE2 Store	-0.124939
-0.573384	_mm_stream_pd SSE2 Store	-0.124939
-0.394221	_mm_prefetch SSE Store	-0.124939
-0.394221	_mm_stream_ps SSE Store	-0.124939
-0.394221	_mm_stream_pi SSE Store	-0.124939
-3.035041	of the sequence	-0.124939
-2.351112	in the sequence	-0.124939
-2.678093	if the sequence	-0.124939
-2.066895	on a sequence	-0.124939
-0.600373	doing a sequence	-0.124939
-0.600373	follow a sequence	-0.124939
-0.601759	allocated in sequence	-0.124939
-1.298897	CPUs. The sequence	-0.124939
-1.683843	a long sequence	-0.124939
-2.441546	by the compiler,	-0.124939
-2.411643	with the compiler,	-0.124939
-1.363868	an Intel compiler,	-0.124939
-0.731962	Intel C++ compiler,	-0.124939
-1.621327	the Gnu compiler,	-0.124939
-0.569329	// Gnu compiler,	-0.124939
-0.593699	a Linux compiler,	-0.124939
-1.197357	in both compiler,	-0.124939
-0.549144	from both compiler,	-0.124939
-1.078508	delay is significant	-0.124939
-2.614388	is a significant	-0.124939
-1.914827	has a significant	-0.124939
-1.075784	consume a significant	-0.124939
-1.077817	possibility for significant	-0.124939
-2.589037	is not significant	-0.124939
-2.157664	the most significant	-0.124939
-0.329015	the least significant	-0.124939
-0.541233	approximately seven significant	-0.124939
-1.437023	cases it might	-0.124939
-0.598655	Or it might	-0.124939
-1.595032	optimizing compiler might	-0.124939
-1.074940	then this might	-0.124939
-0.600425	solution. It might	-0.124939
-0.895105	the variables might	-0.124939
-0.597472	fixed address might	-0.124939
-1.862592	the user might	-0.124939
-0.588045	example. We might	-0.124939
-0.463586	a coprocessor might	-0.124939
-2.350761	in the CPU.	-0.124939
-2.594194	for the CPU.	-0.124939
-2.506341	on the CPU.	-0.124939
-0.601809	brand of CPU.	-0.124939
-1.363812	an Intel CPU.	-0.124939
-0.591897	known hardware CPU.	-0.124939
-0.582043	a modern CPU.	-0.124939
-0.577593	a non-Intel CPU.	-0.124939
-0.659840	2 GHz CPU.	-0.124939
-0.599255	class, Intel Vector	-0.124939
-0.890890	vector, bits Vector	-0.124939
-1.017804	register variables. Vector	-0.124939
-1.217873	instruction sets. Vector	-0.124939
-0.463568	Table 12.5. Vector	-0.124939
-0.358873	Table 12.4. Vector	-0.124939
-0.358873	updated lately. Vector	-0.124939
-0.358873	Table 12.1. Vector	-0.124939
-0.358873	Example 12.7. Vector	-0.124939
-0.358873	bits (ZMM). Vector	-0.124939
-2.625106	to the length	-0.124939
-2.630510	if the length	-0.124939
-2.195512	then the length	-0.124939
-2.304333	If the length	-0.124939
-1.757903	unless the length	-0.124939
-0.899338	adding the length	-0.124939
-0.899294	doubled. The length	-0.124939
-0.600158	started. The length	-0.124939
-0.875508	The string length	-0.124939
-0.570559	the row length	-0.124939
-0.601734	lines and sets.	-0.124939
-0.739609	large data sets.	-0.124939
-0.562216	and instruction sets.	-0.124939
-0.195853	these instruction sets.	-0.124939
-0.588368	later instruction sets.	-0.124939
-0.595652	different instructions sets.	-0.124939
-1.956872	is a linear	-0.425969
-1.876980	than a linear	-0.124939
-1.382336	use a linear	-0.124939
-1.188393	then a linear	-0.124939
-1.063454	even a linear	-0.124939
-1.848012	through a linear	-0.124939
-0.601768	(e.g. in linear	-0.124939
-0.580822	calculations including linear	-0.124939
-2.339425	there is something	-0.124939
-0.203689	write that something	-0.425969
-1.075499	also has something	-0.124939
-1.579001	to do something	-0.425969
-0.592163	actually doing something	-0.124939
-0.877578	Don't put something	-0.124939
-0.504982	code. Storing something	-0.124939
-0.835782	is certainly something	-0.124939
-0.901846	bit of f	-0.124939
-2.225300	{ // f	-0.124939
-0.598996	30 // f	-0.124939
-0.899840	sum, then f	-0.124939
-0.884234	n; i++) f	-0.124939
-1.595931	int i, f	-0.124939
-0.358902	= (float)i; f	-0.124939
-0.358902	= float(i); f	-0.124939
-0.358902	f; f=i; f	-0.124939
-2.716990	is a penalty	-0.124939
-1.202372	version. The penalty	-0.124939
-0.601044	state. This penalty	-0.124939
-2.248954	is no penalty	-0.124939
-0.809294	a performance penalty	-0.124939
-0.809294	no performance penalty	-0.124939
-0.552635	any performance penalty	-0.124939
-0.552635	51 performance penalty	-0.124939
-0.615704	the misprediction penalty	-0.124939
-0.615704	a misprediction penalty	-0.124939
-1.900700	out of F1	-0.124939
-0.601794	specification to F1	-0.124939
-1.827232	assume that F1	-0.124939
-1.595003	The function F1	-0.124939
-1.071806	possible if F1	-0.124939
-0.598357	However, if F1	-0.124939
-1.077480	called by F1	-0.124939
-0.600426	exception then F1	-0.124939
-0.599920	necessary. If F1	-0.124939
-0.358887	without returning. F1	-0.124939
-0.902655	list[i] is invalid	-0.124939
-0.900228	overflow, and invalid	-0.124939
-0.600627	violations and invalid	-0.124939
-2.683833	can be invalid	-0.124939
-1.067855	would be invalid	-0.124939
-0.601179	Problems with invalid	-0.124939
-0.770284	it becomes invalid	-0.124939
-0.530600	counter becomes invalid	-0.124939
-0.358930	bounds violations, invalid	-0.124939
-0.601633	once. The reasons	-0.124939
-1.622008	functions for reasons	-0.124939
-0.594062	precision for reasons	-0.124939
-1.269884	manual for reasons	-0.124939
-0.594062	mode, for reasons	-0.124939
-0.202508	permissible for reasons	-0.425969
-1.418158	the main reasons	-0.124939
-0.579277	have special reasons	-0.124939
-0.550787	for security reasons	-0.124939
-1.713334	way of setting	-0.124939
-0.601708	test and setting	-0.124939
-1.298809	needed for setting	-0.124939
-1.298081	array or setting	-0.124939
-0.589251	value by setting	-0.124939
-1.452705	simply by setting	-0.124939
-0.589251	core by setting	-0.124939
-0.589251	zero, by setting	-0.124939
-0.589251	2.0) by setting	-0.124939
-1.292497	benefit from setting	-0.124939
-0.902825	compiling the module	-0.124939
-2.769457	in a module	-0.124939
-1.845136	a different module	-0.124939
-1.783279	the same module	-0.124939
-0.199202	from same module	-0.124939
-1.775608	any other module	-0.124939
-1.531839	a software module	-0.124939
-1.691211	a separate module	-0.124939
-2.411703	of the beginning	-0.425969
-1.635510	to the beginning	-0.726999
-2.523641	that the beginning	-0.124939
-2.297745	with the beginning	-0.124939
-2.112478	from the beginning	-0.124939
-1.905515	into the beginning	-0.124939
-1.299807	integer is within	-0.124939
-1.197108	accessed from within	-0.124939
-1.073710	to data within	-0.124939
-1.294881	used only within	-0.124939
-1.594978	data members within	-0.124939
-0.588496	of zero within	-0.124939
-0.570409	multiple statements within	-0.124939
-0.463568	be irrelevant within	-0.124939
-0.358873	by keys within	-0.124939
-0.358873	become obsolete within	-0.124939
-1.863094	compiler is used,	-0.124939
-2.633983	It is used,	-0.124939
-1.286319	allocation is used,	-0.124939
-1.070295	expression is used,	-0.124939
-1.289222	linking is used,	-0.124939
-2.798533	can be used,	-0.124939
-2.160184	will be used,	-0.124939
-1.014194	registers are used,	-0.124939
-0.541266	hardly ever used,	-0.124939
-0.601683	independent and checks	-0.124939
-0.601584	CPU-dispatcher that checks	-0.124939
-1.539568	before it checks	-0.124939
-0.598661	class, it checks	-0.124939
-1.536570	are no checks	-0.124939
-1.069032	of such checks	-0.124939
-0.196031	make overflow checks	-0.425969
-1.557635	CPU dispatcher checks	-0.124939
-0.527249	make explicit checks	-0.124939
-0.601321	text or input	-0.124939
-0.901255	overflow on input	-0.124939
-0.378386	variables as input	-0.124939
-1.295466	or an input	-0.124939
-1.126015	of user input	-0.124939
-0.535574	to user input	-0.124939
-0.574233	for user input	-0.425969
-0.596069	for file input	-0.124939
-0.601548	others are not.	-0.124939
-1.078007	functions can not.	-0.124939
-1.033057	code or not.	-0.124939
-1.033057	loop or not.	-0.124939
-0.586135	2 or not.	-0.124939
-1.226670	speed or not.	-0.124939
-0.586135	advantageous or not.	-0.124939
-0.373771	aligned or not.	-0.124939
-1.195605	and which not.	-0.124939
-1.202194	think that programmers	-0.124939
-1.466626	for many programmers	-0.124939
-0.584751	example, many programmers	-0.124939
-0.598150	example, some programmers	-0.124939
-1.472581	of software programmers	-0.124939
-1.058178	for assembly programmers	-0.124939
-0.589595	constructs Most programmers	-0.124939
-0.586625	program. Many programmers	-0.124939
-0.584118	for advanced programmers	-0.124939
-0.358887	program. Application programmers	-0.124939
-2.303466	than the alternative	-0.124939
-0.601592	size. The alternative	-0.124939
-0.601276	case if alternative	-0.124939
-1.295156	and use alternative	-0.124939
-1.269421	A simple alternative	-0.124939
-0.566411	inlined. An alternative	-0.124939
-0.566411	fragmented. An alternative	-0.124939
-0.575565	DLL. Another alternative	-0.124939
-0.358887	A little-known alternative	-0.124939
-0.358887	A light-weight alternative	-0.124939
-0.573215	scan instructions. My	-0.124939
-0.550676	illustrates this. My	-0.124939
-0.884861	an example. My	-0.124939
-0.527207	well-known languages. My	-0.124939
-0.463568	Linux. Asmlib My	-0.124939
-0.659754	monitor counters. My	-0.124939
-0.463568	512 matrix. My	-0.124939
-0.463568	from me. My	-0.124939
-0.358873	file level. My	-0.124939
-0.358873	been identified. My	-0.124939
-2.308379	that is organized	-0.124939
-1.549992	cache is organized	-0.124939
-2.594585	can be organized	-0.124939
-1.662934	should be organized	-0.425969
-1.061890	easily be organized	-0.124939
-0.599682	lines are organized	-0.124939
-0.599682	caches are organized	-0.124939
-1.076899	memory if organized	-0.124939
-1.182665	point registers organized	-0.124939
-0.691501	the critical stride	-0.191886
-0.106475	The critical stride	-0.124939
-0.846284	SSE2 instruction set,	-0.124939
-0.776490	specific instruction set,	-0.124939
-1.377997	AVX instruction set,	-0.124939
-0.680291	supported instruction set,	-0.425969
-0.776490	particular instruction set,	-0.124939
-1.122724	higher instruction set,	-0.124939
-2.989963	of the current	-0.124939
-2.648383	to the current	-0.124939
-2.653045	if the current	-0.124939
-2.364896	with the current	-0.124939
-0.600520	(i.e. the current	-0.124939
-0.902387	features, and current	-0.124939
-2.008334	code that current	-0.124939
-0.601142	tasks on current	-0.124939
-0.598510	12.4a where current	-0.124939
-0.575593	doesn't handle current	-0.124939
-1.203567	need the 'this'	-0.124939
-1.978960	have a 'this'	-0.124939
-1.200218	need a 'this'	-0.124939
-1.597954	functions. The 'this'	-0.124939
-0.601595	transfer for 'this'	-0.124939
-0.889437	type-casting its 'this'	-0.124939
-0.527249	by transferring 'this'	-0.124939
-0.314737	The implicit 'this'	-0.124939
-0.444342	an implicit 'this'	-0.124939
-0.504982	pointers, references, 'this'	-0.124939
-3.203093	of the problem.	-0.124939
-0.902626	becomes a problem.	-0.124939
-1.608940	of this problem.	-0.124939
-0.860763	have this problem.	-0.124939
-1.036854	reduce this problem.	-0.124939
-0.860763	around this problem.	-0.124939
-0.860763	solve this problem.	-0.124939
-1.057200	one big problem.	-0.124939
-0.592992	encounter another problem.	-0.124939
-0.550765	another security problem.	-0.124939
-1.078013	processors, and 3	-0.124939
-2.203959	See page 3	-0.124939
-0.895731	addition takes 3	-0.124939
-1.356110	may take 3	-0.124939
-1.601066	the program. 3	-0.124939
-1.154364	operating systems. 3	-0.124939
-0.165170	the interrupt 3	-0.425969
-0.550733	language...................................................... 14 3	-0.124939
-0.358887	Introduction ....................................................................................................................... 3	-0.124939
-1.828206	member functions counts	-0.124939
-0.599883	CPU, which counts	-0.124939
-0.939764	the clock counts	-0.124939
-1.144596	The clock counts	-0.124939
-1.288201	The profiler counts	-0.124939
-0.811742	the subsequent counts	-0.124939
-0.726779	The subsequent counts	-0.124939
-0.541233	meaningless event counts	-0.124939
-0.764518	"best case" counts	-0.124939
-0.124647	lot to gain	-0.124939
-0.897874	nothing to gain	-0.124939
-0.600148	priority. The gain	-0.124939
-0.899274	parallelism. The gain	-0.124939
-0.601051	substantial. This gain	-0.124939
-0.596109	much you gain	-0.124939
-0.596109	insight you gain	-0.124939
-0.593900	relatively small gain	-0.124939
-0.599903	On other processors,	-0.124939
-0.598410	On many processors,	-0.124939
-1.415196	Pentium 4 processors,	-0.124939
-1.349116	on AMD processors,	-0.124939
-1.645966	and VIA processors,	-0.124939
-1.394456	on non-Intel processors,	-0.124939
-0.990535	on future processors,	-0.124939
-0.541271	and CISC processors,	-0.124939
-0.527207	On older processors,	-0.124939
-0.358873	Intel Atom processors,	-0.124939
-2.011893	it can happen	-0.124939
-0.888558	same can happen	-0.124939
-0.202645	errors can happen	-0.124939
-1.199332	which may happen	-0.124939
-1.630695	This will happen	-0.124939
-2.427116	the program happen	-0.124939
-0.598071	several variables happen	-0.124939
-1.360401	can often happen	-0.124939
-0.593676	big matrix happen	-0.124939
-2.535478	may be enough	-0.124939
-2.189636	are not enough	-0.124939
-0.899424	integer has enough	-0.124939
-0.581854	not long enough	-0.124939
-0.581854	just long enough	-0.124939
-0.235291	is big enough	-0.301030
-1.348320	is small enough	-0.124939
-1.230484	is rarely enough	-0.124939
-0.601815	them to apply	-0.124939
-1.330079	does not apply	-0.425969
-0.596315	here may apply	-0.124939
-0.596315	advices may apply	-0.124939
-1.026316	you should apply	-0.124939
-1.671803	not always apply	-0.124939
-0.415690	same rules apply	-0.124939
-0.415690	coding rules apply	-0.124939
-2.176973	} } Obviously,	-0.124939
-0.580622	all variables. Obviously,	-0.124939
-1.113227	shared object. Obviously,	-0.124939
-1.516923	clock cycles. Obviously,	-0.124939
-1.003944	not needed. Obviously,	-0.124939
-1.376786	CPU dispatching. Obviously,	-0.124939
-1.027450	back again. Obviously,	-0.124939
-0.999349	is finished. Obviously,	-0.124939
-0.788746	each process. Obviously,	-0.124939
-0.463568	.NET framework. Obviously,	-0.124939
-0.901154	particular code version.	-0.124939
-1.947825	for each version.	-0.124939
-1.189941	best possible version.	-0.124939
-1.066548	the 32-bit version.	-0.124939
-0.882962	of every version.	-0.124939
-0.591226	every intermediate version.	-0.124939
-1.411838	the old version.	-0.124939
-1.503646	the desired version.	-0.124939
-0.570482	the alternative version.	-0.124939
-0.726651	an up-to-date version.	-0.124939
-2.466125	when the row	-0.124939
-2.653014	of a row	-0.124939
-1.853821	elements in row	-0.124939
-0.601325	matrix[row][column] = row	-0.124939
-1.441921	elements from row	-0.124939
-1.708804	of each row	-0.124939
-1.792732	for each row	-0.124939
-2.338651	= 0; row	-0.124939
-1.031497	elements per row	-0.124939
-1.342153	for calculating row	-0.124939
-0.707319	Intel C++ Compiler	-0.124939
-0.824869	Microsoft C++ Compiler	-0.124939
-0.561191	"Intel® C++ Compiler	-0.124939
-1.057534	Digital Mars Compiler	-0.124939
-0.541223	Predefined macros Compiler	-0.124939
-0.143394	latencies. 8.5 Compiler	-0.124939
-0.143394	CPU.............................................................................81 8.5 Compiler	-0.124939
-0.358930	Table 18.2. Compiler	-0.124939
-0.358930	not selected. Compiler	-0.124939
-1.956976	is a matter	-0.425969
-0.070268	simply a matter	-0.823909
-1.063476	just a matter	-0.124939
-1.440110	it doesn't matter	-0.124939
-0.566508	size doesn't matter	-0.124939
-2.703298	that the declaration	-0.124939
-2.023083	using the declaration	-0.124939
-1.990726	} The declaration	-0.124939
-2.063249	const int declaration	-0.124939
-1.656026	or class declaration	-0.124939
-1.288037	The static declaration	-0.124939
-1.849030	a variable declaration	-0.124939
-1.182034	the full declaration	-0.124939
-0.567153	types available. declaration	-0.124939
-0.659783	extern "C" declaration	-0.124939
-2.170952	is to allocate	-0.124939
-1.629428	than to allocate	-0.124939
-1.696375	efficient to allocate	-0.124939
-1.862651	necessary to allocate	-0.124939
-0.597086	delete to allocate	-0.124939
-1.064426	preferable to allocate	-0.124939
-0.597086	Try to allocate	-0.124939
-1.201061	Does not allocate	-0.124939
-0.596335	to even allocate	-0.124939
-0.886542	string classes allocate	-0.124939
-1.600856	or the series	-0.124939
-2.496518	is a series	-0.124939
-2.028409	in a series	-0.425969
-1.862298	through a series	-0.124939
-0.895382	made a series	-0.124939
-0.598189	executes a series	-0.124939
-0.601051	notice This series	-0.124939
-2.044373	in this series	-0.124939
-0.527313	12.9a. Taylor series	-0.124939
-3.199634	of the features	-0.124939
-1.198618	libraries have features	-0.124939
-2.640726	the same features	-0.124939
-1.920918	and other features	-0.124939
-1.297197	the optimization features	-0.124939
-0.577199	many optimization features	-0.124939
-0.596436	add new features	-0.124939
-1.027436	of advanced features	-0.124939
-0.818437	time- consuming features	-0.124939
-0.358887	set Important features	-0.124939
-2.238843	that is added	-0.124939
-1.286286	integer is added	-0.124939
-0.599071	c is added	-0.124939
-0.599071	members is added	-0.124939
-1.192285	f is added	-0.124939
-1.977581	lot of added	-0.124939
-2.257275	can be added	-0.124939
-1.958746	I have added	-0.124939
-1.624646	have been added	-0.124939
-2.045989	to the user.	-0.124939
-1.955860	for the user.	-0.124939
-2.394121	by the user.	-0.124939
-0.510129	the end user.	-0.124939
-2.652456	the code to:	-0.124939
-0.371414	reduce this to:	-0.425969
-0.580393	change this to:	-0.124939
-0.580393	changing this to:	-0.124939
-0.580393	Change this to:	-0.124939
-1.170803	be optimized to:	-0.124939
-1.031589	be reduced to:	-0.124939
-0.630020	be changed to:	-0.425969
-1.977131	is a waste	-0.425969
-1.852745	and a waste	-0.124939
-1.376178	be a waste	-0.425969
-1.188839	cause a waste	-0.124939
-0.601721	frustration and waste	-0.124939
-1.279333	and they waste	-0.124939
-1.397609	a big waste	-0.124939
-0.855479	a total waste	-0.124939
-1.299773	15.1b to metaprogramming	-0.124939
-1.372180	functions. A metaprogramming	-0.124939
-0.596943	explain how metaprogramming	-0.124939
-0.539489	is template metaprogramming	-0.124939
-0.539489	using template metaprogramming	-0.124939
-0.539489	where template metaprogramming	-0.124939
-0.539489	convoluted template metaprogramming	-0.124939
-0.587976	for better metaprogramming	-0.124939
-0.575557	has full metaprogramming	-0.124939
-0.726834	be considered metaprogramming	-0.124939
-0.601821	requesting a map	-0.124939
-1.445340	set and map	-0.124939
-1.489923	program. The map	-0.124939
-0.899233	linker. The map	-0.124939
-0.585153	a link map	-0.124939
-0.531174	a hash map	-0.124939
-0.378828	A hash map	-0.124939
-0.505002	-S Generate map	-0.124939
-0.358916	the "generate map	-0.124939
-1.288141	you to define	-0.124939
-1.720320	efficient to define	-0.124939
-2.029771	able to define	-0.124939
-0.897888	ability to define	-0.124939
-1.740035	we can define	-0.124939
-0.594292	size // define	-0.124939
-1.181222	matrix // define	-0.124939
-0.594292	<stdio.h> // define	-0.124939
-0.594292	fprintf // define	-0.124939
-2.057310	you may define	-0.124939
-1.798694	when it returns.	-0.124939
-1.023369	the function returns.	-0.176091
-0.902497	database in Windows.	-0.124939
-0.902240	drivers for Windows.	-0.124939
-0.995459	and 64-bit Windows.	-0.124939
-1.257710	in 64-bit Windows.	-0.124939
-0.833003	for 32-bit Windows.	-0.124939
-0.193388	only 32-bit Windows.	-0.124939
-0.688449	oriented programming style	-0.124939
-0.551724	primitive programming style	-0.124939
-0.456783	the C style	-0.124939
-0.170917	fashioned C style	-0.124939
-0.577622	software writing style	-0.124939
-0.358945	with x87 style	-0.124939
-0.358945	as C- style	-0.124939
-1.165451	{ // Load	-0.726999
-0.420201	i); // Load	-0.602060
-0.584983	b.load(bb+i); // Load	-0.124939
-0.940748	function call. Load	-0.124939
-1.201333	temp = 3;	-0.124939
-0.600933	__asm int 3;	-0.124939
-0.599091	9 + 3;	-0.124939
-0.512481	a * 3;	-0.425969
-0.593213	i / 3;	-0.124939
-0.575605	i % 3;	-0.124939
-2.501286	This is approximately	-0.124939
-1.074352	variables is approximately	-0.124939
-0.600444	misprediction is approximately	-0.124939
-0.601809	precision of approximately	-0.124939
-0.601605	x for approximately	-0.124939
-2.263929	There are approximately	-0.124939
-0.601306	array, or approximately	-0.124939
-1.184811	will take approximately	-0.124939
-1.353028	be accessed approximately	-0.124939
-1.202297	out of order.	-0.124939
-1.586850	the optimal order.	-0.124939
-0.287633	a non-sequential order.	-0.301030
-0.165180	in random order.	-0.124939
-0.527313	non- sequential order.	-0.124939
-1.060663	break; case 3:	-0.124939
-0.599106	and manual 3:	-0.124939
-0.796664	in manual 3:	-0.425969
-0.073567	(See manual 3:	-0.726999
-0.463696	branches. Manual 3:	-0.124939
-1.077993	3. The microarchitecture	-0.124939
-0.003834	3: "The microarchitecture	-1.028029
-2.038098	It is easy	-0.425969
-1.197746	error is easy	-0.124939
-0.900253	fast and easy	-0.124939
-0.600639	independence, and easy	-0.124939
-0.601335	assembly or easy	-0.124939
-1.551781	is no easy	-0.425969
-0.463659	debugging facilities, easy	-0.124939
-1.203147	aware of situations	-0.124939
-0.599254	useful in situations	-0.301030
-0.598456	RISC in situations	-0.124939
-2.536056	may be situations	-0.124939
-2.264128	There are situations	-0.124939
-1.686595	are also situations	-0.124939
-0.596570	in test situations	-0.124939
-1.696445	efficient to implement	-0.124939
-1.590109	possible to implement	-0.425969
-2.176011	order to implement	-0.124939
-2.010053	how to implement	-0.124939
-1.122685	difficult to implement	-0.124939
-0.593726	child classes implement	-0.124939
-1.293356	have tested implement	-0.124939
-0.600883	(less than 65	-0.124939
-0.450677	16.4 65 65	-0.124939
-0.450677	80.8 65 65	-0.124939
-0.527249	directives ......................................................................................... 65	-0.124939
-0.463604	using namespaces. 65	-0.124939
-0.358902	7.33 Namespaces........................................................................................................... 65	-0.124939
-0.358902	32 16.4 65	-0.124939
-0.358902	14.0 80.8 65	-0.124939
-0.358902	unwinding .............................................................................. 65	-0.124939
-2.059091	where the chosen	-0.124939
-1.673093	call the chosen	-0.124939
-1.376118	gives the chosen	-0.124939
-2.255596	code is chosen	-0.124939
-1.492548	C++ is chosen	-0.124939
-1.492548	language is chosen	-0.124939
-2.952704	can be chosen	-0.124939
-1.754394	it has chosen	-0.124939
-1.577276	compiler has chosen	-0.124939
-0.601805	emulate a 256-bit	-0.124939
-0.601794	Vectors of 256-bit	-0.124939
-1.078383	extended to 256-bit	-0.124939
-0.601670	XMM and 256-bit	-0.124939
-0.601592	below). The 256-bit	-0.124939
-0.898449	use one 256-bit	-0.124939
-0.884208	that supported 256-bit	-0.124939
-0.589551	also allows 256-bit	-0.124939
-0.527278	were splitting 256-bit	-0.124939
-2.732609	it is slightly	-0.124939
-0.600458	bytes is slightly	-0.124939
-1.074392	latter is slightly	-0.124939
-1.153990	are only slightly	-0.124939
-0.494650	takes only slightly	-0.425969
-1.050329	may run slightly	-0.124939
-0.463659	make Sum1 slightly	-0.124939
-0.463659	133 although slightly	-0.124939
-1.078165	fragmented and scattered	-0.124939
-2.429117	to be scattered	-0.124939
-2.446206	may be scattered	-0.124939
-2.156701	that are scattered	-0.124939
-1.039555	data are scattered	-0.124939
-0.891045	branches are scattered	-0.124939
-1.200159	many functions scattered	-0.124939
-0.594084	files etc. scattered	-0.124939
-2.310151	possible to contain	-0.124939
-0.601578	subexpressions that contain	-0.124939
-1.598395	It can contain	-0.124939
-0.600843	section may contain	-0.124939
-0.599662	data should contain	-0.124939
-0.892046	objects they contain	-0.124939
-0.593616	generations classes contain	-0.124939
-0.504962	two books contain	-0.124939
-0.358887	and newsgroups contain	-0.124939
-0.601696	reads and writes	-0.124939
-0.203453	reads or writes	-0.124939
-2.321082	that it writes	-0.124939
-1.377120	This function writes	-0.124939
-0.898613	causes all writes	-0.124939
-0.474585	mix nontemporal writes	-0.124939
-0.474585	insert nontemporal writes	-0.124939
-0.550743	with normal writes	-0.124939
-2.570607	on the device	-0.124939
-0.902596	calls a device	-0.124939
-0.601683	routines and device	-0.124939
-0.902459	(except in device	-0.124939
-1.369697	or other device	-0.124939
-2.001482	in 64-bit device	-0.124939
-0.169328	programmable logic device	-0.124939
-0.541179	C++. Critical device	-0.124939
-2.779276	it is independent	-0.124939
-0.601146	11.3 is independent	-0.124939
-0.601526	additions are independent	-0.124939
-0.575656	is OS independent	-0.124939
-1.096205	is almost independent	-0.124939
-0.550765	on completely independent	-0.124939
-0.090160	of position- independent	-0.124939
-0.090160	makes position- independent	-0.124939
-0.090160	so-called position- independent	-0.124939
-0.415086	dynamic memory allocation.	-0.124939
-0.596055	for dynamic allocation.	-0.124939
-1.261386	than a non-static	-0.425969
-0.901642	members or non-static	-0.124939
-1.594422	on all non-static	-0.124939
-0.591399	Likewise, all non-static	-0.124939
-0.840968	need any non-static	-0.124939
-0.197492	access any non-static	-0.425969
-0.591567	constructed. All non-static	-0.124939
-2.476116	and the subsequent	-0.124939
-2.892785	in the subsequent	-0.124939
-1.597744	than the subsequent	-0.124939
-0.900052	delay the subsequent	-0.124939
-1.569464	cache. The subsequent	-0.124939
-0.598688	manual. The subsequent	-0.124939
-0.598688	cached. The subsequent	-0.124939
-0.898728	causes all subsequent	-0.124939
-1.064727	functions. This applies	-0.124939
-0.597188	manner. This applies	-0.124939
-0.927624	The same applies	-0.425969
-0.847438	This also applies	-0.124939
-0.573343	Linux also applies	-0.124939
-0.573343	operators also applies	-0.124939
-1.981579	of 2 applies	-0.124939
-0.541268	same advice applies	-0.124939
-2.188949	can be applied	-0.425969
-0.598565	only be applied	-0.425969
-0.578103	results when applied	-0.124939
-0.088286	static, when applied	-0.726999
-0.503008	constructors and destructors	-0.124939
-0.203405	Constructors and destructors	-0.124939
-0.601200	classes with destructors	-0.124939
-0.672342	that all destructors	-0.301030
-0.594503	any necessary destructors	-0.124939
-0.856496	applied to integers.	-0.124939
-1.376288	efficient as integers.	-0.124939
-1.143709	of 64-bit integers.	-0.124939
-1.143709	for 64-bit integers.	-0.124939
-1.366096	and unsigned integers.	-0.124939
-0.877616	than signed integers.	-0.124939
-0.851718	eight 16-bit integers.	-0.124939
-0.575586	vector containing integers.	-0.124939
-1.772693	function in terms	-0.124939
-0.594117	- in terms	-0.124939
-0.377016	cost in terms	-0.425969
-0.202520	costs in terms	-0.425969
-0.594117	costless in terms	-0.124939
-0.594117	Thinking in terms	-0.124939
-0.582147	four consecutive terms	-0.124939
-2.282875	is to help	-0.124939
-2.271675	order to help	-0.124939
-1.739825	we can help	-0.124939
-1.067644	of some help	-0.124939
-0.596379	instructions without help	-0.124939
-1.708111	to store help	-0.124939
-0.550770	resource files, help	-0.124939
-0.550770	configuration files, help	-0.124939
-0.541201	updates, remote help	-0.124939
-1.549181	faster to transfer	-0.124939
-0.601035	constructor" to transfer	-0.124939
-1.202454	class. The transfer	-0.124939
-0.100025	of parameter transfer	-0.301030
-0.160256	and parameter transfer	-0.124939
-0.358959	mode Parameter transfer	-0.124939
-0.873889	more memory blocks	-0.124939
-0.873889	multiple memory blocks	-0.124939
-0.873889	big memory blocks	-0.124939
-1.583033	into multiple blocks	-0.124939
-0.566435	in big blocks	-0.124939
-0.566435	writing big blocks	-0.124939
-0.855422	of copying blocks	-0.124939
-0.358930	digital building blocks	-0.124939
-0.358930	17.9: "Moving blocks	-0.124939
-0.593488	simply optimized away	-0.124939
-0.552401	at optimizing away	-0.124939
-0.552401	Prevent optimizing away	-0.124939
-0.822124	to optimize away	-0.124939
-0.679635	can optimize away	-0.124939
-0.476117	easily optimize away	-0.124939
-0.877652	be put away	-0.124939
-1.133188	to go away	-0.124939
-0.919826	of example 15.1b	-0.124939
-0.542207	to example 15.1b	-0.124939
-1.221867	in example 15.1b	-0.425969
-0.621388	from example 15.1b	-0.425969
-0.919826	convert example 15.1b	-0.124939
-0.858755	an inlined 15.1b	-0.124939
-0.855560	compiler reduced 15.1b	-0.124939
-2.566050	and the low	-0.124939
-0.902655	load is low	-0.124939
-2.663732	in a low	-0.124939
-2.071385	for a low	-0.124939
-1.074119	produces a low	-0.124939
-1.280776	processors with low	-0.124939
-1.359957	threads with low	-0.124939
-1.806321	a very low	-0.124939
-0.659869	have got low	-0.124939
-0.601815	factor to multiply	-0.124939
-2.082339	it can multiply	-0.124939
-1.745120	We can multiply	-0.124939
-0.601403	for // multiply	-0.124939
-0.248621	constant = multiply	-0.301030
-1.724972	you should multiply	-0.124939
-0.597564	instructions cannot multiply	-0.124939
-1.774640	time to share	-0.124939
-0.892955	threads can share	-0.124939
-0.596963	c can share	-0.124939
-0.596963	simultaneously can share	-0.124939
-0.895849	different objects share	-0.124939
-1.264241	the threads share	-0.124939
-1.595364	data members share	-0.124939
-0.585078	processors usually share	-0.124939
-0.788861	row 28 share	-0.124939
-0.731699	set is enabled.	-0.191886
-1.278578	optimization is enabled.	-0.124939
-0.596337	higher) is enabled.	-0.124939
-0.387473	for an explanation	-0.602060
-1.724159	have no explanation	-0.124939
-1.920897	the following explanation	-0.124939
-0.527334	more detailed explanation	-0.124939
-1.641497	count is near	-0.124939
-1.993269	are used near	-0.124939
-1.567642	are stored near	-0.124939
-0.236994	also stored near	-0.602060
-1.353380	are called near	-0.124939
-0.585979	code together near	-0.124939
-1.162378	are equally near	-0.124939
-1.492495	language is provided	-0.124939
-0.899892	sets is provided	-0.124939
-0.600458	contain is provided	-0.124939
-0.597835	guidelines are provided	-0.124939
-0.894680	Examples are provided	-0.124939
-0.597835	parsing are provided	-0.124939
-1.958746	I have provided	-0.124939
-0.584238	member function, provided	-0.124939
-0.463659	use branches, provided	-0.124939
-2.915370	of the latter	-0.124939
-2.321420	in the latter	-0.124939
-2.291876	If the latter	-0.124939
-1.696415	In the latter	-0.124939
-1.072553	inlining the latter	-0.124939
-1.290038	though the latter	-0.124939
-1.434155	systems. The latter	-0.124939
-1.196642	mode. The latter	-0.124939
-2.157428	there are 6	-0.124939
-1.077815	first // 6	-0.124939
-1.076065	3 - 6	-0.124939
-0.591191	designed program. 6	-0.124939
-0.550698	double plus 6	-0.124939
-0.999411	the future. 6	-0.124939
-0.504962	microprocessor ........................................................................................... 6	-0.124939
-0.463586	....................................................................................... 24 6	-0.124939
-0.358887	operating system......................................................................................... 6	-0.124939
-1.438628	times and stores	-0.124939
-0.600627	matrix and stores	-0.124939
-2.427981	the function stores	-0.124939
-1.365849	This function stores	-0.124939
-0.899400	STL vector stores	-0.124939
-0.552388	It simply stores	-0.124939
-0.552388	pointer simply stores	-0.124939
-0.588016	Gnu mechanism stores	-0.124939
-0.358930	PTR [ecx+eax*4],ebx stores	-0.124939
-0.901674	want it to.	-0.124939
-0.760371	it points to.	-0.124939
-0.584280	pointer points to.	-0.124939
-0.295022	r points to.	-0.124939
-0.570577	to apply to.	-0.124939
-1.004480	object pointed to.	-0.124939
-0.541223	it jumps to.	-0.124939
-0.463641	pointer refers to.	-0.124939
-3.139240	of the default	-0.124939
-2.300180	use the default	-0.124939
-2.339258	with a default	-0.124939
-1.503197	applies to default	-0.124939
-0.601403	coordinates // default	-0.124939
-1.498170	used by default	-0.124939
-0.889483	registers by default	-0.124939
-0.595203	off by default	-0.124939
-0.600457	constructor. A default	-0.124939
-0.890948	vector, bits Instruction	-0.124939
-0.594435	table element Instruction	-0.124939
-1.239637	function name Instruction	-0.124939
-0.573272	Mac, BSD Instruction	-0.124939
-0.532456	as follows: Instruction	-0.425969
-0.550752	makers. 4. Instruction	-0.124939
-0.358902	fprintf(stderr, "\nError: Instruction	-0.124939
-0.358902	Table 13.1. Instruction	-0.124939
-1.203239	purpose of finding	-0.124939
-1.960994	used for finding	-0.124939
-0.970473	useful for finding	-0.301030
-1.434569	intended for finding	-0.124939
-1.060432	search for finding	-0.124939
-1.051352	required for finding	-0.124939
-0.600938	else than finding	-0.124939
-0.902627	numbers is inefficient.	-0.124939
-0.601505	procedures are inefficient.	-0.124939
-1.777455	is very inefficient.	-0.124939
-0.893037	and often inefficient.	-0.124939
-1.424801	is quite inefficient.	-0.124939
-0.855601	data caching inefficient.	-0.124939
-0.872827	caching becomes inefficient.	-0.124939
-1.321011	of course inefficient.	-0.124939
-0.295175	a, b, c,	-0.550907
-1.423986	= 0, c,	-0.124939
-0.601683	sort and search	-0.124939
-0.601602	needs. The search	-0.124939
-2.157669	there are search	-0.124939
-0.202027	added? If search	-0.425969
-0.590419	Some programs search	-0.124939
-0.587982	SSE4.2 string search	-0.124939
-1.372081	can improve search	-0.124939
-0.585087	use binary search	-0.124939
-0.898948	by CPU Modern	-0.124939
-1.050884	and operators Modern	-0.124939
-0.880879	compilers optimize Modern	-0.124939
-0.579188	critical resources. Modern	-0.124939
-0.835784	Dependency chains Modern	-0.124939
-1.124415	branch prediction. Modern	-0.124939
-0.960544	in parallel. Modern	-0.124939
-0.358887	prediction mechanisms. Modern	-0.124939
-0.358887	and temp2. Modern	-0.124939
-1.575163	the memory block.	-0.124939
-1.203844	allocated memory block.	-0.124939
-1.037663	bigger memory block.	-0.124939
-0.580805	contiguous memory block.	-0.124939
-1.467518	the new block.	-0.124939
-0.593914	each allocated block.	-0.124939
-1.671167	the next block.	-0.124939
-0.563050	a try block.	-0.124939
-0.463641	thread environment block.	-0.124939
-1.439206	speed is critical.	-0.124939
-0.679639	caching is critical.	-0.124939
-2.257275	can be critical.	-0.124939
-2.190473	are not critical.	-0.124939
-1.073975	are most critical.	-0.124939
-0.887626	is particularly critical.	-0.124939
-0.931699	are particularly critical.	-0.124939
-0.366232	of dependency chains	-0.124939
-0.366232	such dependency chains	-0.124939
-0.841833	long dependency chains	-0.124939
-0.366232	Such dependency chains	-0.124939
-0.366232	down dependency chains	-0.124939
-0.800854	loop-carried dependency chains	-0.124939
-0.366232	Long dependency chains	-0.124939
-0.065813	3.15 Dependency chains	-0.124939
-1.078802	finished the time-consuming	-0.124939
-0.600566	dynamic_cast more time-consuming	-0.124939
-1.459489	the most time-consuming	-0.124939
-0.566796	the very time-consuming	-0.124939
-0.566796	as very time-consuming	-0.124939
-0.566796	A very time-consuming	-0.124939
-1.249284	be quite time-consuming	-0.124939
-1.529754	to put time-consuming	-0.124939
-1.388372	with different brands	-0.124939
-0.870823	seven different brands	-0.124939
-0.585627	treats different brands	-0.124939
-1.050186	between CPU brands	-0.124939
-1.390983	specific CPU brands	-0.124939
-1.541882	for other brands	-0.124939
-1.687871	on all brands	-0.124939
-0.575630	processors. Other brands	-0.124939
-0.505022	of competing brands	-0.124939
-1.176249	set is available.	-0.602060
-0.601287	linking" if available.	-0.124939
-1.686475	are also available.	-0.124939
-1.910616	function libraries available.	-0.124939
-0.593888	optimization option available.	-0.124939
-1.036617	integer types available.	-0.124939
-0.463622	compiler became available.	-0.124939
-1.021623	most cases. Don't	-0.124939
-1.288611	if possible. Don't	-0.124939
-0.557721	stopping threads. Don't	-0.124939
-0.504982	template metaprogramming. Don't	-0.124939
-0.143386	reciprocal_divisor; 14.7 Don't	-0.124939
-0.143386	139 14.7 Don't	-0.124939
-0.358902	against overkill. Don't	-0.124939
-0.358902	be evicted. Don't	-0.124939
-0.358902	the opposite: Don't	-0.124939
-1.078528	what is brand	-0.124939
-1.440297	because this brand	-0.124939
-1.976420	the CPU brand	-0.124939
-0.868616	for CPU brand	-0.124939
-0.584481	at CPU brand	-0.124939
-0.583846	on any brand	-0.124939
-0.867395	have any brand	-0.124939
-1.794250	a particular brand	-0.124939
-0.575656	of unknown brand	-0.124939
-2.616448	it is executed.	-0.124939
-2.193926	code is executed.	-0.124939
-1.984005	program is executed.	-0.124939
-1.358658	branch is executed.	-0.124939
-0.203384	Func is executed.	-0.124939
-2.107235	they are executed.	-0.124939
-1.061581	it was executed.	-0.124939
-0.549177	statement was executed.	-0.124939
-2.175158	which is faster.	-0.124939
-0.597886	their software faster.	-0.124939
-0.889811	three times faster.	-0.124939
-0.683387	is much faster.	-0.124939
-0.537259	accessed much faster.	-0.124939
-1.169984	address calculation faster.	-0.124939
-0.588591	the division faster.	-0.124939
-0.851663	code execute faster.	-0.124939
-2.157711	at the diagonal	-0.124939
-0.504260	above the diagonal	-0.124939
-0.249383	below the diagonal	-0.124939
-0.599839	At the diagonal	-0.124939
-0.175469	columns below diagonal	-0.124939
-0.584289	integer int n;	-0.124939
-0.200507	u; int n;	-0.124939
-0.584289	14.3a int n;	-0.124939
-0.584289	14.3b int n;	-0.124939
-2.180036	i < n;	-0.124939
-0.415688	x <= n;	-0.124939
-0.586729	i <= n;	-0.124939
-0.463677	dword ptr n;	-0.124939
-1.690102	a[i] = *p	-0.124939
-0.502896	*p = *p	-0.425969
-0.601247	object by *p	-0.124939
-0.744409	p) { *p	-0.425969
-0.902970	to reload *p	-0.124939
-0.065807	char string[100], *p	-0.425969
-2.067478	where the logic	-0.124939
-0.601586	explains the logic	-0.124939
-1.077900	faster. The logic	-0.124939
-1.682342	the program logic	-0.425969
-1.581080	The program logic	-0.124939
-0.143399	a programmable logic	-0.124939
-0.143399	A programmable logic	-0.124939
-0.358945	5 Programmable logic	-0.124939
-2.641259	for the Microsoft,	-0.124939
-1.871558	as the Microsoft,	-0.124939
-0.601602	tried. The Microsoft,	-0.124939
-1.710976	supported by Microsoft,	-0.124939
-0.601159	Linux with Microsoft,	-0.124939
-0.600318	compilers from Microsoft,	-0.124939
-0.586762	Microsoft Intel, Microsoft,	-0.124939
-1.128866	x86 platforms. Microsoft,	-0.124939
-0.550720	functions (i.e. Microsoft,	-0.124939
-2.730295	to the hard	-0.124939
-2.549102	on the hard	-0.124939
-2.528829	of a hard	-0.124939
-1.396293	on a hard	-0.425969
-1.771282	from a hard	-0.124939
-1.742994	support for hard	-0.124939
-0.895975	to many hard	-0.124939
-0.563080	and fragmented hard	-0.124939
-2.465635	number of purposes	-0.124939
-1.901243	for different purposes	-0.124939
-0.843454	for other purposes	-0.425969
-0.599419	for most purposes	-0.124939
-0.887531	many common purposes	-0.124939
-0.858533	for special purposes	-0.124939
-0.828102	for general purposes	-0.124939
-0.358902	for educational purposes	-0.124939
-2.761207	if the typical	-0.124939
-2.050702	in a typical	-0.124939
-2.050345	on a typical	-0.124939
-0.599637	contain a typical	-0.124939
-0.601622	fastest. The typical	-0.124939
-0.600457	infinity. A typical	-0.124939
-0.598195	out some typical	-0.124939
-0.358930	time. Four typical	-0.124939
-2.428965	to a usability	-0.124939
-1.503343	terms of usability	-0.124939
-0.203828	Performance and usability	-0.124939
-1.377470	calls. The usability	-0.124939
-1.203190	possible for usability	-0.124939
-0.601505	problems are usability	-0.124939
-0.595236	as important usability	-0.124939
-0.358902	compatibility problems, usability	-0.124939
-1.709418	function is pure	-0.124939
-2.662402	is a pure	-0.124939
-2.395704	to a pure	-0.124939
-0.601516	log are pure	-0.124939
-0.899859	functions A pure	-0.124939
-0.575586	code containing pure	-0.124939
-0.567127	that contain pure	-0.124939
-0.982024	it involves pure	-0.124939
-2.170959	have to vectorize	-0.124939
-2.245726	possible to vectorize	-0.124939
-2.067916	want to vectorize	-0.124939
-1.193750	unable to vectorize	-0.124939
-0.601721	automatically and vectorize	-0.124939
-1.874640	may not vectorize	-0.124939
-1.112302	compiler will vectorize	-0.124939
-0.589144	compilers don't vectorize	-0.124939
-0.898277	no cache problems.	-0.124939
-0.582316	or performance problems.	-0.124939
-0.582316	investigating performance problems.	-0.124939
-0.891887	avoid these problems.	-0.124939
-0.577532	of alignment problems.	-0.124939
-0.577624	resolve compatibility problems.	-0.124939
-0.504982	of technical problems.	-0.124939
-0.358902	user. Installation problems.	-0.124939
-0.358902	user. Compatibility problems.	-0.124939
-1.077736	variable that could	-0.124939
-0.901640	though it could	-0.124939
-2.538812	the function could	-0.124939
-2.650794	the code could	-0.124939
-0.600822	8.21, you could	-0.124939
-0.877464	following methods could	-0.124939
-0.550733	the portability could	-0.124939
-0.527228	constant N1 could	-0.124939
-0.358887	that r+i/2 could	-0.124939
-1.076950	as function parameter.	-0.124939
-0.599808	a one parameter.	-0.124939
-0.789911	the template parameter.	-0.124939
-0.441034	a template parameter.	-0.124939
-0.801997	as template parameter.	-0.124939
-2.483701	of the derived	-0.124939
-2.497187	and the derived	-0.124939
-1.948832	inside the derived	-0.124939
-1.939264	of a derived	-0.425969
-2.336555	to a derived	-0.124939
-1.878619	and a derived	-0.124939
-1.078241	class and derived	-0.124939
-2.947402	can be mentioned	-0.124939
-1.712057	compilers are mentioned	-0.124939
-0.601053	collection, as mentioned	-0.124939
-1.699403	vector operations mentioned	-0.124939
-1.245814	the problems mentioned	-0.124939
-0.589058	storage methods mentioned	-0.124939
-0.818492	the disadvantages mentioned	-0.124939
-0.463586	the time-consumers mentioned	-0.124939
-0.659783	the ones mentioned	-0.124939
-1.182793	test // Time	-0.124939
-0.596629	times // Time	-0.124939
-0.596629	optimizing // Time	-0.124939
-0.897588	Matrix size Time	-0.124939
-1.162054	the user. Time	-0.124939
-0.463622	Total kilobytes Time	-0.124939
-0.659840	Example 9.6a Time	-0.124939
-0.358916	Table 9.1. Time	-0.124939
-0.358916	Table 9.3. Time	-0.124939
-0.818543	the table. Optimization	-0.124939
-0.129466	157 17 Optimization	-0.425969
-0.143394	book "Performance Optimization	-0.124939
-0.143394	Hoisie: "Performance Optimization	-0.124939
-0.143394	handling. 8.6 Optimization	-0.124939
-0.143394	81 8.6 Optimization	-0.124939
-0.358930	AMD: "Software Optimization	-0.124939
-0.358930	IA-32 Architectures Optimization	-0.124939
-1.467995	floating point expressions.	-0.124939
-0.599652	complex integer expressions.	-0.124939
-0.595623	two simple expressions.	-0.124939
-0.541296	than Boolean expressions.	-0.124939
-0.541296	many Boolean expressions.	-0.124939
-0.575678	complicated algebraic expressions.	-0.124939
-0.601007	be to include	-0.124939
-2.213998	have to include	-0.124939
-1.296750	should not include	-0.124939
-1.072047	test should include	-0.124939
-1.630469	Most compilers include	-0.124939
-1.637112	instruction sets include	-0.124939
-0.582027	Compiled languages include	-0.124939
-0.504982	compiler packages include	-0.124939
-0.505038	run. Examples include	-0.124939
-1.188316	} return y;	-0.124939
-0.667808	double x, y;	-0.124939
-0.483901	float x, y;	-0.124939
-0.343738	S1 x, y;	-0.124939
-0.343738	f, x, y;	-0.124939
-0.295871	c, d, y;	-0.425969
-0.960773	= 100, y;	-0.124939
-0.358945	= 1.23456, y;	-0.124939
-0.902837	avoids the overflow.	-0.124939
-1.796072	case of overflow.	-0.124939
-1.200049	possibility of overflow.	-0.124939
-1.103673	check for overflow.	-0.124939
-1.194517	for integer overflow.	-0.124939
-1.377268	can cause overflow.	-0.124939
-0.826543	doesn't cause overflow.	-0.124939
-0.463641	without generating overflow.	-0.124939
-1.577183	an array element.	-0.124939
-0.571563	per array element.	-0.124939
-0.571563	current array element.	-0.124939
-1.860126	a single element.	-0.124939
-1.449693	the matrix element.	-0.124939
-1.671167	the next element.	-0.124939
-0.511819	cycles per element.	-0.124939
-0.505022	suitable pivot element.	-0.124939
-0.392165	of object oriented	-0.602060
-0.889865	The object oriented	-0.124939
-1.377634	an object oriented	-0.124939
-0.529396	why object oriented	-0.124939
-0.529396	recommend object oriented	-0.124939
-0.505082	88 Object oriented	-0.124939
-0.358973	than non-object oriented	-0.124939
-3.064760	in the fully	-0.124939
-1.497785	C++ is fully	-0.124939
-1.200422	syntax is fully	-0.124939
-2.338545	with a fully	-0.124939
-1.995107	way to fully	-0.124939
-0.601505	Windows are fully	-0.124939
-2.189218	are not fully	-0.124939
-1.073977	it from fully	-0.124939
-1.671256	not always fully	-0.124939
-1.679056	kinds of storage.	-0.124939
-1.012448	for register storage.	-0.124939
-0.578658	from register storage.	-0.124939
-0.589111	need separate storage.	-0.124939
-0.562995	for temporary storage.	-0.124939
-0.028024	with big-endian storage.	-0.124939
-0.835869	big endian storage.	-0.124939
-1.445404	speed of addition,	-0.124939
-0.601749	may, in addition,	-0.124939
-2.280556	such as addition,	-0.124939
-1.036189	time than addition,	-0.425969
-2.578205	floating point addition,	-0.124939
-1.801329	an integer addition,	-0.124939
-0.590404	involving integer addition,	-0.124939
-0.358916	maximum, saturated addition,	-0.124939
-0.379799	execution of everything	-0.425969
-2.027830	sure that everything	-0.124939
-1.192028	b; // everything	-0.124939
-0.897002	1.2; // everything	-0.124939
-0.598500	languages where everything	-0.124939
-1.994294	make sure everything	-0.124939
-1.059863	clean up everything	-0.124939
-0.818507	to eliminate everything	-0.124939
-2.106722	if it involves	-0.124939
-1.743852	when it involves	-0.124939
-2.055737	because it involves	-0.124939
-2.652040	the code involves	-0.124939
-1.198623	However, this involves	-0.124939
-0.598692	method also involves	-0.124939
-0.651997	point operations involves	-0.425969
-0.358930	a driver involves	-0.124939
-2.177087	} } Here	-0.124939
-1.376372	{ ... Here	-0.124939
-0.847233	suboptimal way. Here	-0.124939
-1.113953	assembly language. Here	-0.124939
-0.527228	is double. Here	-0.124939
-0.659783	"Hello 2" Here	-0.124939
-0.463586	Newton-Raphson iterations. Here	-0.124939
-0.358887	calculation capabilities. Here	-0.124939
-0.358887	+ a2/b2; Here	-0.124939
-3.203093	of the factorial	-0.124939
-0.596706	14.1b int factorial	-0.124939
-0.596706	14.1a int factorial	-0.124939
-1.619832	the integer factorial	-0.124939
-0.588003	// n factorial	-0.124939
-0.129466	x, n, factorial	-0.425969
-0.314777	n; x++) factorial	-0.124939
-0.314777	i--, x++) factorial	-0.124939
-3.138242	of the OpenMP	-0.124939
-0.601574	Supports the OpenMP	-0.124939
-0.601228	Parallelization by OpenMP	-0.124939
-0.888271	data. Use OpenMP	-0.124939
-0.582043	64-bit. Supports OpenMP	-0.124939
-0.152940	parallel processing, OpenMP	-0.425969
-0.505002	OpenMP directives. OpenMP	-0.124939
-0.358916	page 107), OpenMP	-0.124939
-0.599217	array pointer eax	-0.124939
-0.594913	85 ; eax	-0.124939
-0.828222	the array. eax	-0.124939
-0.436362	add ebx, eax	-0.124939
-0.308801	r ebx, eax	-0.124939
-0.308801	31 ebx, eax	-0.124939
-0.557713	[esp+8] eax, eax	-0.124939
-0.550770	8 edx, eax	-0.124939
-0.463622	It compares eax	-0.124939
-0.677573	short int bb[],	-1.079181
-1.366732	branch is mispredicted	-0.124939
-0.746366	way is mispredicted	-0.425969
-1.623387	count is mispredicted	-0.124939
-1.771247	to be mispredicted	-0.124939
-2.158178	can be mispredicted	-0.124939
-2.022436	will be mispredicted	-0.124939
-0.902614	format is standardized	-0.124939
-2.768206	in a standardized	-0.124939
-0.601670	protocols and standardized	-0.124939
-2.389573	should be standardized	-0.124939
-0.901078	be as standardized	-0.124939
-2.587623	is not standardized	-0.124939
-1.074768	always use standardized	-0.124939
-0.835784	is fully standardized	-0.124939
-0.463586	on non- standardized	-0.124939
-1.939453	instead of (or	-0.124939
-1.073636	C++ program (or	-0.124939
-2.439356	instruction set (or	-0.124939
-0.598913	entire library (or	-0.124939
-1.693720	is stored (or	-0.124939
-1.940381	the SSE2 (or	-0.124939
-1.257077	of four (or	-0.124939
-0.587995	used char (or	-0.124939
-1.145563	and delete (or	-0.124939
-1.198807	to optimize across	-0.124939
-0.511092	otherwise optimize across	-0.124939
-0.511092	Cannot optimize across	-0.124939
-0.770329	making optimizations across	-0.124939
-0.530626	enable optimizations across	-0.124939
-1.042372	not compatible across	-0.124939
-1.175717	parameter transfer across	-0.124939
-0.567166	not standardized across	-0.124939
-0.358930	is unchanged across	-0.124939
-0.248654	a clock cycle	-0.425969
-0.498586	A clock cycle	-0.124939
-0.557671	one clock cycle	-0.124939
-0.537070	core clock cycle	-0.425969
-0.557102	that pointer aliasing	-0.124939
-0.535399	no pointer aliasing	-0.124939
-0.557102	possible pointer aliasing	-0.124939
-1.192364	rule out aliasing	-0.124939
-0.550842	77 Pointer aliasing	-0.124939
-0.143407	the strict aliasing	-0.425969
-0.056322	SelectAddMul(short int aa[],	-0.903090
-0.568269	SelectAddMul_dispatch(short int aa[],	-0.124939
-0.568269	FUNCNAME(short int aa[],	-0.124939
-0.568269	FuncType(short int aa[],	-0.124939
-0.597202	Studio. This tool	-0.124939
-0.597202	www.agner.org/optimize/testp.zip. This tool	-0.124939
-0.936838	a test tool	-0.124939
-0.864907	The test tool	-0.124939
-0.749204	my test tool	-0.124939
-0.186034	My test tool	-0.124939
-0.917687	and development tool	-0.124939
-0.541309	popular development tool	-0.124939
-3.203093	of the parent	-0.124939
-1.958164	of a parent	-0.425969
-0.379491	functions of parent	-0.124939
-0.600262	Members of parent	-0.124939
-0.601403	the // parent	-0.124939
-0.598830	from multiple parent	-0.124939
-0.590858	of both parent	-0.124939
-2.238432	have to care	-0.124939
-0.649793	that takes care	-0.425969
-0.842318	compiler takes care	-0.124939
-0.871261	to take care	-0.425969
-0.979273	can take care	-0.425969
-1.484085	you don't care	-0.124939
-1.497363	In most systems,	-0.124939
-0.665353	In 64-bit systems,	-0.124939
-1.257266	in 32-bit systems,	-0.124939
-0.858242	X operating systems,	-0.124939
-0.579058	languages, operating systems,	-0.124939
-1.427486	and Mac systems,	-0.124939
-0.065807	int CriticalFunction_386(int parm1,	-0.425969
-0.065807	int CriticalFunction_SSE2(int parm1,	-0.425969
-0.065807	int CriticalFunction_AVX(int parm1,	-0.425969
-0.358945	int CriticalFunctionType(int parm1,	-0.124939
-0.358945	int CriticalFunction_Dispatch(int parm1,	-0.124939
-2.255163	code is included	-0.124939
-1.729263	time is included	-0.124939
-0.899878	checking is included	-0.124939
-1.933411	functions are included	-0.124939
-2.589745	is not included	-0.124939
-0.596911	The libraries included	-0.124939
-1.030169	are usually included	-0.124939
-0.463641	License license included	-0.124939
-1.979120	have a false	-0.124939
-0.901150	given a false	-0.124939
-0.902180	0 for false	-0.124939
-2.530616	to be false	-0.124939
-0.601349	IsPowerOf2 = false	-0.124939
-0.601306	(1) or false	-0.124939
-1.368705	a && false	-0.124939
-1.232028	a || false	-0.124939
-1.446232	change the value.	-0.124939
-2.642727	the same value.	-0.124939
-1.067167	function return value.	-0.124939
-0.889471	with its value.	-0.124939
-0.885629	the calculated value.	-0.124939
-1.113526	the maximum value.	-0.124939
-0.733502	the previous value.	-0.124939
-1.681669	the object file.	-0.124939
-0.198453	single object file.	-0.425969
-0.960962	same source file.	-0.124939
-0.538940	another source file.	-0.124939
-0.579295	an output file.	-0.124939
-1.220023	the executable file.	-0.124939
-0.570538	an input file.	-0.124939
-0.600928	x; x *=	-0.124939
-0.589180	1) y *=	-0.124939
-0.570583	i++) f *=	-0.124939
-0.169339	x++) factorial *=	-0.425969
-0.557713	nfac; xn *=	-0.124939
-0.505002	x^n/n! xxn *=	-0.124939
-0.463622	x; nfac *=	-0.124939
-2.528829	of a temporary	-0.124939
-2.050822	in a temporary	-0.124939
-2.254951	as a temporary	-0.124939
-0.902619	creation of temporary	-0.124939
-2.199445	used for temporary	-0.124939
-1.502542	variables are temporary	-0.124939
-0.505042	profiler inserts temporary	-0.124939
-0.902627	abc is 12	-0.124939
-0.594569	optimal. Use 12	-0.124939
-1.004105	memory access. 12	-0.124939
-0.981878	is approximately 12	-0.124939
-0.557721	parameter 2: 12	-0.124939
-0.527249	................................................................................................. 103 12	-0.124939
-0.463604	8, 10, 12	-0.124939
-0.358902	function libraries........................................................................................ 12	-0.124939
-3.138242	of the memcpy	-0.124939
-2.299962	use the memcpy	-0.124939
-1.829998	call to memcpy	-0.124939
-1.078089	memset and memcpy	-0.124939
-0.726752	0.18 0.11 memcpy	-0.124939
-0.505002	0.44 0.12 memcpy	-0.124939
-0.358916	Test Processor memcpy	-0.124939
-0.358916	0.28 0.22 memcpy	-0.124939
-3.066577	in the procedure	-0.124939
-2.062354	in a procedure	-0.425969
-1.437091	uses a procedure	-0.124939
-1.503692	end of procedure	-0.124939
-0.601622	Linux The procedure	-0.124939
-0.597471	functions, called procedure	-0.124939
-0.358930	an ordinary procedure	-0.124939
-1.403519	on a PC	-0.124939
-1.641380	implemented in PC	-0.124939
-1.498882	only on PC	-0.124939
-0.385186	the standard PC	-0.425969
-2.021090	is a frequent	-0.124939
-0.902215	advance. The frequent	-0.124939
-1.196993	schemes are frequent	-0.124939
-1.072054	frameworks are frequent	-0.124939
-1.372790	are more frequent	-0.124939
-0.599938	remotely. If frequent	-0.124939
-2.157794	the most frequent	-0.124939
-0.358902	AVX2 _mm256_i64gather_pd unlimited	-0.124939
-0.358902	AVX2 _mm_i64gather_pd unlimited	-0.124939
-0.358902	AVX2 _mm256_i32gather_epi32 unlimited	-0.124939
-0.358902	AVX2 _mm_i32gather_ps unlimited	-0.124939
-0.358902	AVX2 _mm256_i64gather_epi32 unlimited	-0.124939
-0.358902	AVX2 _mm_i32gather_epi32 unlimited	-0.124939
-0.358902	AVX2 _mm_i64gather_epi32 unlimited	-0.124939
-0.358902	AVX2 _mm256_i32gather_ps unlimited	-0.124939
-1.377845	where the parallelism	-0.425969
-0.249837	and fine-grained parallelism	-0.124939
-0.249837	with fine-grained parallelism	-0.124939
-0.143399	with coarse-grained parallelism	-0.124939
-0.143399	between coarse-grained parallelism	-0.124939
-0.358945	parallel. Fine-grained parallelism	-0.124939
-0.358945	parallel. Coarse-grained parallelism	-0.124939
-0.870378	the CPU detection	-0.367977
-0.800894	Intel CPU detection	-0.124939
-0.601696	r2 and c2	-0.124939
-1.293641	in vector c2	-0.124939
-0.597866	choose between c2	-0.124939
-0.479811	c __m128i c2	-0.425969
-0.527309	the bit-mask: c2	-0.124939
-0.726752	= r1; c2	-0.124939
-0.463622	= c1; c2	-0.124939
-0.009143	manual 3: "The	-0.970037
-0.068455	Manual 3: "The	-0.124939
-1.317649	by adding throw()	-0.124939
-0.253893	throw() throw() throw()	-0.124939
-0.336313	exceptions throw() throw()	-0.124939
-0.557757	throw exceptions throw()	-0.124939
-0.172691	the empty throw()	-0.124939
-0.077818	an empty throw()	-0.124939
-2.761724	if the prediction	-0.124939
-2.339615	with a prediction	-0.124939
-0.601625	rules for prediction	-0.124939
-1.313063	the branch prediction	-0.124939
-0.806086	for branch prediction	-0.124939
-0.550855	no branch prediction	-0.124939
-0.550855	take branch prediction	-0.124939
-0.584209	and advanced prediction	-0.124939
-3.141244	of the polymorphic	-0.124939
-1.676581	call the polymorphic	-0.124939
-1.939166	of a polymorphic	-0.124939
-0.600132	call a polymorphic	-0.124939
-0.893783	// call polymorphic	-0.124939
-0.726901	for implementing polymorphic	-0.124939
-1.075079	should have #if	-0.124939
-0.600593	example use #if	-0.124939
-2.177202	} } #if	-0.124939
-0.600129	if because #if	-0.124939
-2.439676	instruction set #if	-0.124939
-1.063804	source code. #if	-0.124939
-1.149102	int n; #if	-0.124939
-0.659811	is compiled. #if	-0.124939
-0.601829	inheritance is now	-0.124939
-0.601505	solutions are now	-0.124939
-1.743951	code can now	-0.124939
-0.579310	column-wise. Assume now	-0.124939
-0.562998	stack). ecx now	-0.124939
-0.957796	loop body now	-0.124939
-1.065664	live ranges now	-0.124939
-0.358902	is Borland's now	-0.124939
-1.706429	but this unit	-0.124939
-1.819904	The time unit	-0.124939
-2.643731	the same unit	-0.124939
-0.599771	save one unit	-0.124939
-0.808175	graphics processing unit	-0.124939
-0.491874	physics processing unit	-0.124939
-0.107193	3.16 Execution unit	-0.425969
-0.960956	function calling conventions	-0.124939
-0.527380	specific calling conventions	-0.124939
-0.009882	5: "Calling conventions	-0.823909
-0.527373	5. Calling conventions	-0.124939
-2.074328	in a register.	-0.124939
-1.758148	a vector register.	-0.124939
-0.885430	as vector register.	-0.124939
-2.643731	the same register.	-0.124939
-0.599771	up one register.	-0.124939
-1.226093	128-bit XMM register.	-0.124939
-0.575630	same logical register.	-0.124939
-2.663460	is a kind	-0.124939
-1.374507	also a kind	-0.124939
-1.159336	make this kind	-0.124939
-0.880121	supports this kind	-0.124939
-0.590424	prevent this kind	-0.124939
-1.845640	a different kind	-0.124939
-0.588625	explicitly what kind	-0.124939
-0.358945	even worse kind	-0.124939
-0.601934	dropping the graphical	-0.124939
-1.948461	of a graphical	-0.425969
-1.914956	has a graphical	-0.124939
-0.600457	Graphics A graphical	-0.124939
-0.871413	their own graphical	-0.124939
-0.550765	(OWL). Several graphical	-0.124939
-0.358930	on system-specific graphical	-0.124939
-1.710346	only the lower	-0.124939
-1.298664	stores the lower	-0.124939
-1.203277	error is lower	-0.124939
-2.086650	for a lower	-0.124939
-1.872285	at a lower	-0.124939
-0.601823	string to lower	-0.124939
-1.360006	threads with lower	-0.124939
-0.597943	thread with lower	-0.124939
-2.201476	at the label	-0.124939
-1.288341	where each label	-0.124939
-0.070277	; unused label	-0.425969
-1.509765	the preceding label	-0.124939
-0.505042	the $B1$2 label	-0.124939
-1.078784	overlap the iterations	-0.124939
-2.465969	number of iterations	-0.124939
-1.758992	or more iterations	-0.124939
-1.199930	of loop iterations	-0.124939
-0.896584	doing two iterations	-0.124939
-1.168040	for several iterations	-0.124939
-0.564805	statement several iterations	-0.124939
-0.589136	in mathematical iterations	-0.124939
-1.202250	so the misprediction	-0.124939
-0.601592	detect the misprediction	-0.124939
-2.080515	make a misprediction	-0.124939
-1.498555	get a misprediction	-0.124939
-0.601734	prediction and misprediction	-0.124939
-1.374489	the branch misprediction	-0.124939
-1.414747	a branch misprediction	-0.124939
-0.565971	any branch misprediction	-0.124939
-1.156384	is an integer,	-0.124939
-0.950988	to an integer,	-0.124939
-1.579239	in an integer,	-0.124939
-0.598800	4 64-bit integer,	-0.124939
-0.585163	biased binary integer,	-0.124939
-0.567219	plus 6 integer,	-0.124939
-0.162721	of lazy binding	-0.124939
-0.042750	and lazy binding	-0.425969
-0.090171	on lazy binding	-0.124939
-0.090171	But lazy binding	-0.124939
-0.090171	allow lazy binding	-0.124939
-0.143412	called. Lazy binding	-0.124939
-0.143412	long. Lazy binding	-0.124939
-0.601836	hand, a just-in-time	-0.124939
-1.241264	code and just-in-time	-0.124939
-1.844768	based on just-in-time	-0.124939
-0.595106	implementations use just-in-time	-0.124939
-0.595106	machines use just-in-time	-0.124939
-0.143399	code, interpreters, just-in-time	-0.124939
-0.143399	frameworks, interpreters, just-in-time	-0.124939
-2.715793	is a try	-0.124939
-2.029813	recommended to try	-0.124939
-1.948319	compiler may try	-0.124939
-0.600745	F0() { try	-0.124939
-0.600435	producer will try	-0.124939
-0.600432	hyperthreading, then try	-0.124939
-2.248279	is no try	-0.124939
-0.598291	Should we try	-0.124939
-3.015582	in the background	-0.124939
-2.704729	that the background	-0.124939
-0.601828	leave a background	-0.124939
-1.977430	lot of background	-0.124939
-1.291276	performance for background	-0.124939
-0.899162	ms for background	-0.124939
-0.828220	the heavy background	-0.124939
-0.463641	The theoretical background	-0.124939
-1.293144	integer is converted	-0.124939
-1.587967	class is converted	-0.124939
-0.600471	14.7b is converted	-0.124939
-2.277058	to be converted	-0.124939
-1.982468	can be converted	-0.602060
-0.900147	number when converted	-0.124939
-1.681764	the object pointed	-0.124939
-1.278949	of object pointed	-0.124939
-1.001108	The object pointed	-0.124939
-1.515007	the value pointed	-0.425969
-1.189816	the variable pointed	-0.425969
-1.123122	the target pointed	-0.124939
-1.379034	brands of CPUs,	-0.124939
-1.290764	than other CPUs,	-0.124939
-0.876664	for Intel CPUs,	-0.124939
-0.588646	certain Intel CPUs,	-0.124939
-0.544868	of modern CPUs,	-0.425969
-0.249837	or multi-core CPUs,	-0.124939
-0.249837	on multi-core CPUs,	-0.124939
-0.601844	precautions to account	-0.124939
-0.190181	take into account	-0.124939
-0.536450	problems into account	-0.124939
-0.536450	prediction into account	-0.124939
-0.117140	taken into account	-0.602060
-0.737884	int * p)	-0.124939
-0.171892	const * p)	-0.726999
-0.184469	(int * p)	-0.425969
-0.511648	Sum2(S3 * p)	-0.124939
-1.680113	about the chain	-0.124939
-0.366232	the dependency chain	-0.124939
-0.515278	a dependency chain	-0.124939
-0.366232	A dependency chain	-0.124939
-0.841833	long dependency chain	-0.124939
-0.515278	critical dependency chain	-0.124939
-0.366232	Each dependency chain	-0.124939
-0.366232	carried dependency chain	-0.124939
-0.902362	flow and algorithms	-0.124939
-0.902195	branches. The algorithms	-0.124939
-0.601133	literature on algorithms	-0.124939
-1.773545	of different algorithms	-0.124939
-1.487221	several different algorithms	-0.124939
-0.594239	microprocessor. These algorithms	-0.124939
-0.589564	of complicated algorithms	-0.124939
-0.868005	using advanced algorithms	-0.124939
-1.377409	through the PLT	-0.124939
-0.601592	replaces the PLT	-0.124939
-1.299948	makes a PLT	-0.124939
-0.089901	GOT and PLT	-0.249877
-1.711025	function. The PLT	-0.124939
-3.138242	of the heavy	-0.124939
-1.444733	doing the heavy	-0.124939
-1.640826	example, a heavy	-0.124939
-1.203365	thanks to heavy	-0.124939
-0.601169	network with heavy	-0.124939
-2.280556	such as heavy	-0.124939
-2.248616	is no heavy	-0.124939
-0.598180	it some heavy	-0.124939
-1.985924	of code once	-0.124939
-1.946452	more than once	-0.124939
-0.899564	values at once	-0.124939
-0.600020	executed only once	-0.124939
-1.969018	is called once	-0.124939
-0.595395	consequences. I once	-0.124939
-0.463604	function. Compile once	-0.124939
-0.358902	relocated (rebased) once	-0.124939
-2.014511	all the additions	-0.124939
-0.601039	mix of additions	-0.124939
-1.076124	combination of additions	-0.124939
-0.598852	four float additions	-0.124939
-0.872379	do two additions	-0.124939
-0.872379	just two additions	-0.124939
-0.592970	do four additions	-0.124939
-0.875416	by n additions	-0.124939
-1.198464	or a hash	-0.124939
-2.067243	use a hash	-0.124939
-1.131858	data. A hash	-0.124939
-0.582836	elements. A hash	-0.124939
-0.582836	enough. A hash	-0.124939
-0.582836	interval. A hash	-0.124939
-0.358973	binary trees, hash	-0.124939
-1.192735	loaded into ecx	-0.124939
-0.837279	stack ; ecx	-0.124939
-0.567909	;edx=addressinr ; ecx	-0.124939
-0.557713	registers eax, ecx	-0.124939
-0.557713	address is. ecx	-0.124939
-0.358916	the stack). ecx	-0.124939
-0.358916	PTR [eax+4], ecx	-0.124939
-0.358916	PTR [eax], ecx	-0.124939
-3.069318	in the system.	-0.124939
-0.820362	the operating system.	-0.124939
-1.268753	and operating system.	-0.124939
-0.739756	an operating system.	-0.124939
-1.057841	the Windows system.	-0.124939
-2.578739	floating point variables,	-0.124939
-0.598808	includes static variables,	-0.124939
-1.361101	point register variables,	-0.124939
-1.091787	for simple variables,	-0.124939
-0.571138	as simple variables,	-0.124939
-0.579276	parameters, local variables,	-0.124939
-0.391124	integer Register variables,	-0.124939
-0.391124	float Register variables,	-0.124939
-2.528997	This is equally	-0.124939
-0.601166	p->member is equally	-0.124939
-1.055972	and are equally	-0.124939
-1.946209	they are equally	-0.124939
-0.594185	integers are equally	-0.124939
-0.594185	references are equally	-0.124939
-0.594185	123; are equally	-0.124939
-1.879857	are accessed equally	-0.124939
-0.508199	are cases, however,	-0.124939
-0.732092	many cases, however,	-0.124939
-0.508199	few cases, however,	-0.124939
-0.917482	is inefficient, however,	-0.124939
-0.527291	be needed, however,	-0.124939
-0.463641	Programmers do, however,	-0.124939
-0.358930	always accurate, however,	-0.124939
-0.358930	is OK, however,	-0.124939
-0.901264	CPU is designed	-0.124939
-0.901264	STL is designed	-0.124939
-2.531284	to be designed	-0.124939
-0.599671	dispatchers are designed	-0.124939
-0.599671	(MOVNT) are designed	-0.124939
-0.589105	was never designed	-0.124939
-0.557753	original, poorly designed	-0.124939
-0.358930	was originally designed	-0.124939
-1.829799	If a profiling	-0.124939
-1.078051	debugging and profiling	-0.124939
-0.901290	program with profiling	-0.124939
-2.530647	to make profiling	-0.124939
-1.540810	several different profiling	-0.124939
-1.239674	your own profiling	-0.124939
-0.999474	programming languages, profiling	-0.124939
-0.358902	are offering profiling	-0.124939
-2.301861	code is fragmented	-0.124939
-0.601708	slow and fragmented	-0.124939
-2.531284	to be fragmented	-0.124939
-1.074713	becomes more fragmented	-0.124939
-0.770284	space becomes fragmented	-0.124939
-0.530600	never becomes fragmented	-0.124939
-0.916609	to become fragmented	-0.124939
-0.510507	easily become fragmented	-0.124939
-2.731056	if the inputs	-0.124939
-2.007413	all the inputs	-0.124939
-1.501580	program. The inputs	-0.124939
-1.283517	of possible inputs	-0.124939
-0.866107	and negative inputs	-0.124939
-0.557737	only allowed inputs	-0.124939
-0.527270	and mouse inputs	-0.124939
-0.358916	12. Higher inputs	-0.124939
-2.155615	which is fast.	-0.124939
-0.601160	Rounding is fast.	-0.124939
-1.399814	is very fast.	-0.124939
-1.094176	are very fast.	-0.124939
-0.552059	accessed very fast.	-0.124939
-0.552059	generally very fast.	-0.124939
-0.591247	accessed quite fast.	-0.124939
-0.563066	accessed equally fast.	-0.124939
-1.294366	CPUs have family	-0.124939
-2.100766	the CPU family	-0.124939
-1.616432	The CPU family	-0.124939
-1.059012	on its family	-0.124939
-0.377342	the x86 family	-0.124939
-0.358945	its brand, family	-0.124939
-0.896775	n = 4,	-0.124939
-0.598891	Tuesday = 4,	-0.124939
-0.585141	asa << 4,	-0.124939
-0.858762	old Pentium 4,	-0.124939
-0.483869	(i.e. 2, 4,	-0.124939
-0.483869	(0, 2, 4,	-0.124939
-0.463641	2, 3, 4,	-0.124939
-0.358930	11, Iss. 4,	-0.124939
-0.599014	here // Virtual	-0.124939
-0.599014	p->f(); // Virtual	-0.124939
-1.828338	member functions Virtual	-0.124939
-0.589123	32-bit systems. Virtual	-0.124939
-0.249830	access. 7.20 Virtual	-0.124939
-0.249830	53 7.20 Virtual	-0.124939
-0.463641	is pure. Virtual	-0.124939
-0.358930	page 96). Virtual	-0.124939
-1.939739	instead of j	-0.124939
-0.601169	32 with j	-0.124939
-1.249862	i++) { j	-0.425969
-0.599076	(int)&matrix[0][0] + j	-0.124939
-2.338837	= 0; j	-0.124939
-1.158091	can replace j	-0.124939
-0.567165	to multiply j	-0.124939
-2.193677	at the interrupt	-0.124939
-1.077742	remove the interrupt	-0.124939
-0.902200	instruction for interrupt	-0.124939
-1.280140	by an interrupt	-0.124939
-0.596743	times an interrupt	-0.124939
-0.594627	code. An interrupt	-0.124939
-1.154467	should never interrupt	-0.124939
-0.463641	Device drivers, interrupt	-0.124939
-0.598691	-1 = -1	-0.124939
-1.068050	~a = -1	-0.124939
-0.896067	a & -1	-0.425969
-0.682514	a | -1	-0.425969
-1.027735	a ^ -1	-0.124939
-2.949515	can be 8,	-0.124939
-0.601349	Wednesday = 8,	-0.124939
-0.601306	4 or 8,	-0.124939
-1.199473	other than 8,	-0.124939
-1.085538	byte at 8,	-0.425969
-1.430715	long double 8,	-0.124939
-0.828144	2, 4, 8,	-0.124939
-0.639817	and execution units.	-0.124939
-0.639817	different execution units.	-0.124939
-0.450772	point execution units.	-0.124939
-0.450772	size execution units.	-0.124939
-0.450772	64-bit execution units.	-0.124939
-0.450772	several execution units.	-0.124939
-0.450772	full-size execution units.	-0.124939
-0.879517	point multiplication units.	-0.124939
-0.601683	packages and who	-0.124939
-0.599835	libraries, but who	-0.124939
-1.182200	end user who	-0.124939
-0.818472	software developers who	-0.124939
-0.541216	precision. And who	-0.124939
-0.358902	for those who	-0.124939
-0.358902	below. Those who	-0.124939
-0.358902	many people who	-0.124939
-1.199440	be the fastest	-0.124939
-0.900743	calculated the fastest	-0.124939
-1.374369	cases, the fastest	-0.124939
-0.900743	still the fastest	-0.124939
-2.278341	code is fastest	-0.124939
-1.844895	method is fastest	-0.124939
-1.713274	sake of fastest	-0.124939
-1.077931	critical. The fastest	-0.124939
-0.601306	__declspec(noalias) or __restrict	-0.124939
-0.869525	int * __restrict	-0.124939
-0.584954	AddTwo(int * __restrict	-0.124939
-1.314061	the keyword __restrict	-0.124939
-0.563015	on) __restrict __restrict	-0.124939
-0.659840	#pragma ivdep __restrict	-0.124939
-0.358916	__declspec( noalias) __restrict	-0.124939
-0.358916	optimize("a", on) __restrict	-0.124939
-1.887554	is an arithmetic	-0.124939
-0.599626	as integer arithmetic	-0.124939
-1.881538	can do arithmetic	-0.124939
-0.599217	if pointer arithmetic	-0.124939
-0.592179	than doing arithmetic	-0.124939
-0.391131	address. Pointer arithmetic	-0.124939
-0.391131	accessed. Pointer arithmetic	-0.124939
-0.358916	flip-flops, multiplexers, arithmetic	-0.124939
-2.239628	then the DLL	-0.124939
-1.299160	within the DLL	-0.124939
-2.062600	in a DLL	-0.124939
-2.060977	make a DLL	-0.124939
-2.645746	the same DLL	-0.124939
-0.780395	a runtime DLL	-0.124939
-0.536394	A runtime DLL	-0.124939
-2.007529	all the factors	-0.124939
-1.638675	up the factors	-0.124939
-1.435656	many different factors	-0.124939
-0.594257	libraries. These factors	-0.124939
-0.658516	are several factors	-0.425969
-0.575656	many unknown factors	-0.124939
-0.358930	are limiting factors	-0.124939
-2.475770	and the Gnu,	-0.124939
-2.393606	by the Gnu,	-0.124939
-2.365394	with the Gnu,	-0.124939
-1.852668	as the Gnu,	-0.124939
-2.263487	use the Gnu,	-0.124939
-0.601643	parallelization. The Gnu,	-0.124939
-2.282091	such as Gnu,	-0.124939
-0.567264	Intel, Microsoft, Gnu,	-0.124939
-2.510074	of the arrays.	-0.124939
-1.183062	in large arrays.	-0.124939
-0.573319	and initialized arrays.	-0.124939
-1.098973	to align arrays.	-0.124939
-0.570592	for unaligned arrays.	-0.124939
-0.249830	with character arrays.	-0.124939
-0.249830	as character arrays.	-0.124939
-2.465969	number of devices	-0.124939
-1.069054	that such devices	-0.124939
-0.596041	Accessing system devices	-0.124939
-0.563023	on small devices	-0.124939
-0.828238	such small devices	-0.124939
-0.567165	Programmable logic devices	-0.124939
-0.550825	Verilog. Common devices	-0.124939
-0.358916	Small hand-held devices	-0.124939
-3.201937	of the branch.	-0.124939
-2.716391	is a branch.	-0.124939
-1.503146	kind of branch.	-0.124939
-0.601591	with that branch.	-0.124939
-0.894017	loop control branch.	-0.124939
-0.573289	a previous branch.	-0.124939
-0.726752	the wrong branch.	-0.124939
-2.729792	to the required	-0.124939
-0.601580	allocates the required	-0.124939
-0.601146	math is required	-0.124939
-0.601146	manipulation is required	-0.124939
-0.601292	debugging if required	-0.124939
-1.850581	of memory required	-0.124939
-0.550765	// SSE3 required	-0.124939
-0.463641	that previously required	-0.124939
-1.274921	a = (unsigned	-0.726999
-1.694609	b = (unsigned	-0.124939
-0.580780	int)i >= (unsigned	-0.124939
-0.557771	min) <= (unsigned	-0.124939
-0.358959	& operator[] (unsigned	-0.124939
-2.261028	that is almost	-0.124939
-2.690473	it is almost	-0.124939
-2.658613	It is almost	-0.124939
-1.289702	list is almost	-0.124939
-1.964430	used in almost	-0.124939
-0.600661	Linux in almost	-0.124939
-0.896057	loop where almost	-0.124939
-0.586749	systems give almost	-0.124939
-3.139240	of the GOT	-0.124939
-1.501323	find the GOT	-0.124939
-1.905402	and a GOT	-0.124939
-0.601096	expects a GOT	-0.124939
-1.631998	not use GOT	-0.124939
-0.580769	the slow GOT	-0.124939
-0.463641	no effect. GOT	-0.124939
-0.358930	-read_only_relocs suppress. GOT	-0.124939
-3.138242	of the array.	-0.124939
-3.014773	in the array.	-0.124939
-1.762616	in an array.	-0.124939
-1.845136	a different array.	-0.124939
-1.052474	in another array.	-0.124939
-1.366376	a linear array.	-0.124939
-0.550743	a normal array.	-0.124939
-0.463622	the destination array.	-0.124939
-1.803102	functions are listed	-0.124939
-1.173402	results are listed	-0.124939
-1.262961	conditions are listed	-0.124939
-0.594174	suffixes are listed	-0.124939
-0.594174	latencies are listed	-0.124939
-0.601110	set, as listed	-0.124939
-0.595652	the instructions listed	-0.124939
-0.505062	function ReadTSC listed	-0.124939
-2.701397	to the general	-0.124939
-2.970680	in the general	-0.124939
-0.601232	consult the general	-0.124939
-1.379026	due to general	-0.124939
-1.584744	available for general	-0.124939
-0.600102	justified for general	-0.124939
-1.371378	A more general	-0.124939
-0.580742	53). No general	-0.124939
-0.601586	definitely the preferred	-0.124939
-0.601586	reasons, the preferred	-0.124939
-2.732609	it is preferred	-0.124939
-2.684462	It is preferred	-0.124939
-1.294656	version is preferred	-0.124939
-0.902236	returns. The preferred	-0.124939
-2.536635	may be preferred	-0.124939
-1.443511	processors are preferred	-0.124939
-0.509891	16 clock cycles,	-0.124939
-1.278998	few clock cycles,	-0.124939
-0.911021	10 clock cycles,	-0.124939
-0.340967	5 clock cycles,	-0.425969
-0.509891	6 clock cycles,	-0.124939
-0.734930	80 clock cycles,	-0.124939
-0.509891	25 clock cycles,	-0.124939
-0.601107	vectorize code explicitly	-0.124939
-2.621475	the compiler explicitly	-0.124939
-1.073774	prefetch data explicitly	-0.124939
-0.590441	the space explicitly	-0.124939
-1.772028	CPU dispatching explicitly	-0.124939
-1.292983	algebraic reductions explicitly	-0.124939
-1.558835	to tell explicitly	-0.124939
-1.009399	the alignment explicitly	-0.124939
-1.457470	same memory space.	-0.124939
-0.886608	takes memory space.	-0.124939
-0.898531	of cache space.	-0.124939
-0.572776	more cache space.	-0.124939
-0.846372	up cache space.	-0.124939
-0.869929	of storage space.	-0.124939
-0.851758	and disk space.	-0.124939
-2.616285	is a fixed	-0.124939
-1.197473	because a fixed	-0.124939
-1.074186	insert a fixed	-0.124939
-1.198565	objects and fixed	-0.124939
-1.075008	small and fixed	-0.124939
-0.202646	buffer with fixed	-0.124939
-0.594740	patterns with fixed	-0.124939
-1.292366	allocated memory Memory	-0.124939
-0.577539	sound processing Memory	-0.124939
-0.957777	to disk. Memory	-0.124939
-0.065804	21 3.13 Memory	-0.124939
-0.659840	precision math. Memory	-0.124939
-0.358916	up include: Memory	-0.124939
-0.358916	and API's. Memory	-0.124939
-1.078452	byte of zero.	-0.124939
-1.073709	array to zero.	-0.124939
-0.899434	elements to zero.	-0.124939
-0.600228	bits to zero.	-0.124939
-0.601526	bit are zero.	-0.124939
-0.594462	extra element zero.	-0.124939
-0.591592	or simply zero.	-0.124939
-0.582089	0's gives zero.	-0.124939
-1.451784	in a non-sequential	-0.301030
-0.901392	structures with non-sequential	-0.124939
-0.891522	the access non-sequential	-0.124939
-1.503692	ways of multiplying	-0.124939
-1.742876	support for multiplying	-0.124939
-1.550578	done by multiplying	-0.124939
-2.082478	faster than multiplying	-0.124939
-0.600560	2 when multiplying	-0.124939
-0.563089	double before multiplying	-0.124939
-0.563089	big before multiplying	-0.124939
-0.563089	precision before multiplying	-0.124939
-2.578205	floating point Conversion	-0.124939
-0.886914	and integers Conversion	-0.124939
-0.591216	registers used. Conversion	-0.124939
-0.541265	integer conversion Conversion	-0.124939
-0.541265	float conversion Conversion	-0.124939
-1.333663	is enabled. Conversion	-0.124939
-0.917430	floating point. Conversion	-0.124939
-0.358916	operator %. Conversion	-0.124939
-1.504794	loop count down	-0.124939
-0.590826	consumption was down	-0.124939
-0.166674	may slow down	-0.124939
-0.440618	operations slow down	-0.124939
-0.858659	; shift down	-0.124939
-0.805966	to break down	-0.124939
-0.358930	is shut down	-0.124939
-3.138242	of the software.	-0.124939
-3.014773	in the software.	-0.124939
-1.939491	use of software.	-0.124939
-1.573712	the application software.	-0.124939
-0.585983	of your software.	-0.124939
-0.550743	party security software.	-0.124939
-0.505002	their 23 software.	-0.124939
-0.463622	some legacy software.	-0.124939
-2.367045	because the interpreted	-0.124939
-1.729341	time is interpreted	-0.124939
-1.436065	loop is interpreted	-0.124939
-1.197746	i is interpreted	-0.124939
-0.601721	is and interpreted	-0.124939
-1.295443	example, in interpreted	-0.124939
-1.198533	advantage in interpreted	-0.124939
-2.217766	will be interpreted	-0.124939
-2.277884	code is exactly	-0.124939
-1.442608	operator is exactly	-0.124939
-1.761700	parameters are exactly	-0.124939
-0.599671	Enums are exactly	-0.124939
-0.600709	methods have exactly	-0.124939
-1.695923	will make exactly	-0.124939
-1.165940	are doing exactly	-0.124939
-1.048834	to measure exactly	-0.124939
-1.940057	has a jump	-0.124939
-1.503601	table of jump	-0.124939
-0.902139	threads that jump	-0.124939
-0.600654	eliminate this jump	-0.124939
-1.542829	an extra jump	-0.124939
-0.888880	array ; jump	-0.124939
-1.698628	the microprocessor jump	-0.124939
-1.225821	switch statement jump	-0.124939
-1.729498	time is determined	-0.124939
-1.074433	storage is determined	-0.124939
-0.600471	slices is determined	-0.124939
-2.188765	can be determined	-0.124939
-1.904155	cannot be determined	-0.124939
-1.061950	cases be determined	-0.124939
-1.838243	is often determined	-0.124939
-0.733688	short int cc[])	-1.028029
-0.601148	every code line.	-0.124939
-0.611361	a cache line.	-0.124939
-0.643787	same cache line.	-0.124939
-0.555726	arbitrary cache line.	-0.124939
-1.611514	a matrix line.	-0.124939
-2.391918	should be easily	-0.124939
-1.986566	that can easily	-0.124939
-1.923294	compiler can easily	-0.124939
-0.888587	performance can easily	-0.124939
-0.594748	heap can easily	-0.124939
-1.200046	and not easily	-0.124939
-0.580267	problem cannot easily	-0.124939
-0.580267	algorithms, cannot easily	-0.124939
-0.078292	runtime type identification	-0.249877
-0.107875	Runtime type identification	-0.301030
-0.570613	macros Compiler identification	-0.124939
-3.064760	in the vectors.	-0.124939
-2.465635	number of vectors.	-0.124939
-0.601744	etc. in vectors.	-0.124939
-2.577672	floating point vectors.	-0.124939
-0.599613	256-bit integer vectors.	-0.124939
-0.897346	organized into vectors.	-0.124939
-0.577532	like adding vectors.	-0.124939
-0.851750	two 128-bit vectors.	-0.124939
-0.599106	(cc[i] + 2)	-0.124939
-0.598520	v.i * 2)	-0.124939
-0.570615	i += 2)	-0.602060
-1.018178	(iset >= 2)	-0.124939
-0.065807	= ((x2) 2)	-0.425969
-0.600625	real time applications.	-0.124939
-1.435487	many different applications.	-0.124939
-1.747548	for all applications.	-0.124939
-1.069032	in such applications.	-0.124939
-1.182962	in large applications.	-0.124939
-1.279467	for Windows applications.	-0.124939
-0.504982	less intensive applications.	-0.124939
-0.358902	mathe- matical applications.	-0.124939
-0.600127	volatile. The volatile	-0.124939
-0.600127	Volatile The volatile	-0.124939
-1.600187	Note that volatile	-0.124939
-1.314061	the keyword volatile	-0.124939
-0.585974	not declared volatile	-0.124939
-0.563015	Explain volatile volatile	-0.124939
-0.358916	7.3. Explain volatile	-0.124939
-0.358916	int dummy[4]; volatile	-0.124939
-0.885354	of cache misses	-0.124939
-0.564172	code, cache misses	-0.124939
-0.564172	thousand cache misses	-0.124939
-0.564172	Provoke cache misses	-0.124939
-0.179748	size causes misses	-0.425969
-0.557813	together Cache misses	-0.124939
-0.490238	use lookup tables	-0.124939
-0.179333	Use lookup tables	-0.124939
-1.018273	to produce tables	-0.124939
-0.397484	and PLT tables	-0.425969
-0.143403	lookup Lookup tables	-0.124939
-0.143403	lookup. Lookup tables	-0.124939
-2.769457	in a random	-0.124939
-0.379646	deallocated in random	-0.425969
-1.935917	useful for random	-0.124939
-0.601228	caused by random	-0.124939
-2.082247	faster than random	-0.124939
-0.900087	data more random	-0.124939
-0.600302	occur at random	-0.124939
-0.182924	Mac OS X	-0.204120
-0.527355	Alignd(X) __declspec(align(16)) X	-0.124939
-0.659955	#define Alignd(X) X	-0.124939
-1.189757	class objects Conversions	-0.124939
-1.376531	{ ... Conversions	-0.124939
-1.434376	are used. Conversions	-0.124939
-0.877574	precision conversion Conversions	-0.124939
-1.106716	double precision. Conversions	-0.124939
-1.333663	is enabled. Conversions	-0.124939
-0.314747	platform. 14.8 Conversions	-0.124939
-0.314747	140 14.8 Conversions	-0.124939
-2.372466	in the YMM	-0.124939
-0.746767	set and YMM	-0.425969
-0.601633	SSE). The YMM	-0.124939
-0.450718	and 256-bit YMM	-0.124939
-0.450718	The 256-bit YMM	-0.124939
-0.463659	registers named YMM	-0.124939
-0.600465	if is resolved	-0.124939
-1.372292	library is resolved	-0.124939
-0.899905	#if is resolved	-0.124939
-2.106953	they are resolved	-0.124939
-2.591166	is not resolved	-0.124939
-1.318308	is always resolved	-0.124939
-1.175224	are always resolved	-0.124939
-2.668872	for the purpose	-0.124939
-0.893430	// The purpose	-0.124939
-0.597203	terminated. The purpose	-0.124939
-0.597203	y. The purpose	-0.124939
-0.597203	new. The purpose	-0.124939
-1.174496	the specific purpose	-0.124939
-0.579299	Several special purpose	-0.124939
-1.642423	without the -fpic	-0.124939
-1.442874	compiled with -fpic	-0.124939
-0.777125	object without -fpic	-0.124939
-0.547219	compiled without -fpic	-0.124939
-0.534527	compiling without -fpic	-0.124939
-1.568287	the option -fpic	-0.124939
-2.396495	is the D	-0.124939
-0.601622	library). The D	-0.124939
-0.601615	IDE's for D	-0.124939
-0.589809	54 class D	-0.124939
-0.589809	B2; class D	-0.124939
-0.557712	D language. D	-0.124939
-0.358930	C++. Yet, D	-0.124939
-2.151977	if it had	-0.124939
-2.087966	if you had	-0.124939
-2.427116	the program had	-0.124939
-1.463061	vector registers had	-0.124939
-0.582077	If columns had	-0.124939
-0.575600	Later models had	-0.124939
-0.505002	first PC's had	-0.124939
-0.599626	with integer parameters.	-0.124939
-0.597188	fourteen register parameters.	-0.124939
-0.596830	of template parameters.	-0.124939
-0.889471	and its parameters.	-0.124939
-0.592949	than four parameters.	-0.124939
-0.592440	same few parameters.	-0.124939
-0.527270	transferring additional parameters.	-0.124939
-0.591904	ebx, 1 ebx,	-0.124939
-0.811472	instruction add ebx,	-0.124939
-0.553839	instructions add ebx,	-0.124939
-1.046049	= r ebx,	-0.124939
-0.313223	ebx, eax ebx,	-0.124939
-0.527313	ebx, 31 ebx,	-0.124939
-0.601843	gives a measure	-0.124939
-1.584120	function to measure	-0.124939
-1.193778	program to measure	-0.124939
-2.068083	want to measure	-0.124939
-1.804641	difficult to measure	-0.124939
-1.202816	data and measure	-0.124939
-1.911117	that you measure	-0.124939
-2.732609	it is poorly	-0.124939
-1.437632	value is poorly	-0.124939
-1.370686	branch is poorly	-0.124939
-1.378868	replace a poorly	-0.124939
-0.902045	branches are poorly	-0.124939
-0.358945	the original, poorly	-0.124939
-0.358945	to perform poorly	-0.124939
-2.277763	to do this:	-0.124939
-0.195021	look like this:	-0.602060
-0.096645	looks like this:	-0.602060
-3.066577	in the sections	-0.124939
-0.600272	read-only data sections	-0.124939
-1.987740	The following sections	-0.124939
-1.677598	the above sections	-0.124939
-0.981929	The subsequent sections	-0.124939
-0.065806	/Gy -ffunction- sections	-0.124939
-0.567183	compatibility problems. Software	-0.124939
-0.957826	to disk. Software	-0.124939
-0.818505	restarted anyway. Software	-0.124939
-0.107193	Intel Architecture Software	-0.425969
-0.358930	access rights. Software	-0.124939
-0.358930	Memory swapping. Software	-0.124939
-0.577606	point calculations. Even	-0.124939
-1.004118	not needed. Even	-0.124939
-1.023427	predicted well. Even	-0.124939
-0.557713	pre-calculated table. Even	-0.124939
-0.982024	Pentium 4. Even	-0.124939
-0.358916	very common. Even	-0.124939
-0.358916	to come. Even	-0.124939
-1.783868	byte at 19	-0.124939
-1.419192	in table 19	-0.124939
-0.505002	loading ....................................................................................................... 19	-0.124939
-0.463622	updates .................................................................................................... 19	-0.124939
-0.463622	options....................................................................................... 160 19	-0.124939
-0.358916	_M_X64 162 19	-0.124939
-0.358916	key press. 19	-0.124939
-0.491260	speed is important.	-0.124939
-1.070315	efficiency is important.	-0.124939
-0.599084	portability is important.	-0.124939
-1.494995	and more important.	-0.124939
-0.659955	becoming increasingly important.	-0.124939
-2.331261	in the carry	-0.425969
-2.304773	If the carry	-0.124939
-1.929594	into the carry	-0.124939
-2.034726	where the carry	-0.124939
-1.073600	modify the carry	-0.124939
-0.601663	next. The carry	-0.124939
-1.769111	because of lazy	-0.124939
-0.901065	principle of lazy	-0.124939
-1.241264	code and lazy	-0.425969
-0.601150	delay on lazy	-0.124939
-0.594489	session. But lazy	-0.124939
-0.998083	systems allow lazy	-0.124939
-0.901124	value as xn	-0.124939
-1.630707	{ float xn	-0.124939
-1.362400	each value xn	-0.124939
-1.180011	sum += xn	-0.124939
-0.883602	will calculate xn	-0.124939
-0.358916	/ nfac; xn	-0.124939
-0.358916	series: ex xn	-0.124939
-1.122013	the time stamp	-0.425969
-1.599698	The time stamp	-0.124939
-0.569116	so-called time stamp	-0.124939
-0.569116	Returns time stamp	-0.124939
-2.366757	because the debugging	-0.124939
-2.149066	used for debugging	-0.124939
-0.600092	IDE, for debugging	-0.124939
-0.587431	removed after debugging	-0.124939
-1.413378	turn off debugging	-0.124939
-0.575580	with full debugging	-0.124939
-0.358930	of verifying, debugging	-0.124939
-1.528893	x = 10;	-0.124939
-0.598928	NumberOfTests = 10;	-0.124939
-1.019319	b / 10;	-0.124939
-0.767896	int)b / 10;	-0.124939
-0.529223	int)a / 10;	-0.124939
-0.791830	b % 10;	-0.124939
-0.692150	int)b % 10;	-0.124939
-2.729792	to the table.	-0.124939
-3.015582	in the table.	-0.124939
-1.920528	the following table.	-0.124939
-1.460636	the virtual table.	-0.124939
-1.677598	the above table.	-0.124939
-1.283216	procedure linkage table.	-0.124939
-0.358930	a pre-calculated table.	-0.124939
-0.601817	factor of 1,	-0.124939
-0.601362	Sunday = 1,	-0.124939
-1.502936	0 or 1,	-0.124939
-0.847342	of sizes 1,	-0.124939
-0.835869	Manual", Volume 1,	-0.124939
-0.065806	= {1, 1,	-0.425969
-2.655551	of a vector,	-0.124939
-1.454065	size of vector,	-0.425969
-1.624231	type of vector,	-0.124939
-0.960556	in one vector,	-0.124939
-1.671681	the next vector,	-0.124939
-0.481117	{ if (b)	-0.726999
-1.027357	} if (b)	-0.124939
-0.200466	b; if (b)	-0.425969
-3.203093	of the object,	-0.124939
-2.643731	the same object,	-0.124939
-1.495964	a large object,	-0.124939
-1.055187	the allocated object,	-0.124939
-1.169226	the shared object,	-0.124939
-1.308762	a shared object,	-0.124939
-1.119686	a composite object,	-0.124939
-0.601849	specialization is allowed	-0.124939
-2.391448	should be allowed	-0.124939
-0.599682	container are allowed	-0.124939
-0.599682	'$' are allowed	-0.124939
-1.894554	is not allowed	-0.124939
-1.073178	The only allowed	-0.124939
-1.670256	than to delete	-0.124939
-0.901072	forget to delete	-0.124939
-0.324764	new and delete	-0.221849
-0.583196	stack pointer. Likewise,	-0.124939
-1.113434	shared object. Likewise,	-0.124939
-1.218153	instruction sets. Likewise,	-0.124939
-0.567146	generating overflow. Likewise,	-0.124939
-0.527270	different type. Likewise,	-0.124939
-0.463622	is false. Likewise,	-0.124939
-0.358916	second operand. Likewise,	-0.124939
-1.238261	is as follows:	-0.124939
-0.860253	are as follows:	-0.124939
-1.122294	calculated as follows:	-0.124939
-0.580113	were as follows:	-0.124939
-1.141351	organized as follows:	-0.124939
-1.141351	expressed as follows:	-0.124939
-0.580113	elements, as follows:	-0.124939
-1.816143	a vector simultaneously.	-0.124939
-0.598434	modify objects simultaneously.	-0.124939
-0.565665	or threads simultaneously.	-0.124939
-0.565665	eight threads simultaneously.	-0.124939
-0.541223	many processes simultaneously.	-0.124939
-0.541223	two jobs simultaneously.	-0.124939
-0.358930	or seemingly simultaneously.	-0.124939
-2.652040	the code itself	-0.124939
-2.622087	the compiler itself	-0.124939
-2.226854	the program itself	-0.124939
-1.178406	test program itself	-0.124939
-1.573858	the application itself	-0.124939
-0.585971	is calling itself	-0.124939
-0.567199	the device itself	-0.124939
-0.368271	an efficient solution.	-0.124939
-1.486355	most efficient solution.	-0.124939
-1.234954	a better solution.	-0.124939
-1.124669	very inefficient solution.	-0.124939
-0.557734	most reliable solution.	-0.124939
-0.505042	most up-to-date solution.	-0.124939
-0.379802	rules of algebra	-0.124939
-0.600211	Bit vector algebra	-0.124939
-1.876702	Floating point algebra	-0.124939
-0.590491	optimization Integer algebra	-0.124939
-0.589136	reciprocal Boolean algebra	-0.124939
-0.570547	including linear algebra	-0.124939
-2.529209	of a suitable	-0.124939
-1.628692	with a suitable	-0.124939
-0.599652	finding a suitable	-0.124939
-1.503783	examples of suitable	-0.124939
-2.190893	are not suitable	-0.124939
-1.748308	for all suitable	-0.124939
-0.601403	metaprogramming // Template	-0.124939
-0.982032	the Windows Template	-0.124939
-1.175792	and Windows Template	-0.124939
-0.575593	be possible. Template	-0.124939
-0.463641	the Standard Template	-0.124939
-0.358930	STL (Standard Template	-0.124939
-0.358930	the Active Template	-0.124939
-0.601457	manager can spend	-0.124939
-2.027465	does not spend	-0.124939
-1.075507	time you spend	-0.124939
-0.591579	very well spend	-0.124939
-0.590438	Many programs spend	-0.124939
-1.041367	will never spend	-0.124939
-1.038207	Some applications spend	-0.124939
-0.579327	as task switches	-0.124939
-0.061100	of context switches	-0.124939
-0.132226	The context switches	-0.124939
-0.132226	Frequent context switches	-0.124939
-0.358990	3.14 Context switches	-0.124939
-0.249852	renewed. Context switches	-0.124939
-0.804363	memory to disk.	-0.124939
-0.899463	swapped to disk.	-0.124939
-0.899700	files from disk.	-0.124939
-0.639809	the hard disk.	-0.124939
-0.450766	fragmented hard disk.	-0.124939
-0.463677	a floppy disk.	-0.124939
-0.203927	become a serious	-0.425969
-2.158150	there are serious	-0.124939
-0.600566	possibly more serious	-0.124939
-0.600139	unsafe because serious	-0.124939
-1.788156	The most serious	-0.124939
-0.575618	considerably. Another serious	-0.124939
-0.815320	b * c);	-0.301030
-0.065809	_mm_mullo_epi16 (b, c);	-0.425969
-0.358959	= CriticalFunction(b, c);	-0.124939
-0.358959	= (*CriticalFunction)(b, c);	-0.124939
-0.177321	Microsoft Visual Studio	-0.124939
-0.216118	processing. Visual Studio	-0.124939
-0.216118	free. Visual Studio	-0.124939
-0.216118	(MS Visual Studio	-0.124939
-0.358988	x64 (Visual Studio	-0.124939
-2.127772	short int a[100];	-0.124939
-1.063402	public: int a[100];	-0.124939
-0.563099	4 float a[100];	-0.124939
-0.563099	list float a[100];	-0.124939
-0.563099	7.26b float a[100];	-0.124939
-0.563099	7.26a float a[100];	-0.124939
-1.596521	int i, a[100];	-0.124939
-0.601951	used the trick	-0.124939
-0.890547	true. The trick	-0.124939
-0.595743	(b1*b2); The trick	-0.124939
-0.595743	pointers: The trick	-0.124939
-0.595743	143. The trick	-0.124939
-0.595743	sum. The trick	-0.124939
-1.014252	a special trick	-0.124939
-1.597282	have the disadvantages	-0.124939
-0.902130	over the disadvantages	-0.124939
-0.902215	advance. The disadvantages	-0.124939
-2.158150	there are disadvantages	-0.124939
-0.598195	have some disadvantages	-0.124939
-0.596459	overcome these disadvantages	-0.124939
-1.920528	the following disadvantages	-0.124939
-0.596583	the registers eax,	-0.124939
-0.553831	eax, 1 eax,	-0.124939
-0.553831	ecx, 1 eax,	-0.124939
-0.505022	i++. cmp eax,	-0.124939
-0.659869	PTR [esp+8] eax,	-0.124939
-0.358930	DWORD PTR[ecx+eax*4],ebx eax,	-0.124939
-0.358930	mov 2:8+esp eax,	-0.124939
-1.349041	code is distributed	-0.124939
-0.601734	compiled and distributed	-0.124939
-1.836006	to be distributed	-0.425969
-1.911156	function libraries distributed	-0.124939
-0.601160	compilers is generally	-0.124939
-0.901291	application is generally	-0.124939
-0.601734	important and generally	-0.124939
-1.578949	operations are generally	-0.124939
-1.072118	classes are generally	-0.124939
-2.184466	you can generally	-0.124939
-1.903711	You can generally	-0.124939
-0.586182	to 64-bit mode,	-0.124939
-0.586182	(In 64-bit mode,	-0.124939
-1.257284	in 32-bit mode,	-0.124939
-1.492379	64 bit mode,	-0.124939
-1.013441	32- bit mode,	-0.124939
-0.463677	in exclusive mode,	-0.124939
-0.941951	Windows and Linux.	-0.124939
-1.442975	as in Linux.	-0.124939
-0.600661	rarely in Linux.	-0.124939
-0.899202	compilers for Linux.	-0.124939
-1.368649	compiling for Linux.	-0.124939
-1.743575	and 64-bit Linux.	-0.124939
-1.247758	() { C1	-0.124939
-0.590870	F1() { C1	-0.124939
-0.590870	g() { C1	-0.124939
-0.571080	to class C1	-0.124939
-0.992175	}; class C1	-0.124939
-0.571080	Disp(); class C1	-0.124939
-0.571080	7.44 class C1	-0.124939
-1.742881	objects are instances	-0.124939
-1.319775	to all instances	-0.124939
-0.591385	Make all instances	-0.124939
-0.598830	has multiple instances	-0.124939
-1.608846	with many instances	-0.124939
-0.596842	more template instances	-0.124939
-0.358930	many renamed instances	-0.124939
-1.292858	function is called,	-0.249877
-1.792687	object is called,	-0.124939
-2.160809	will be called,	-0.124939
-0.599801	likely be called,	-0.124939
-1.202202	during the update	-0.124939
-0.601580	abusing the update	-0.124939
-2.012652	need to update	-0.124939
-0.902215	updating. The update	-0.124939
-0.601321	update, or update	-0.124939
-1.075834	make an update	-0.124939
-0.596480	important new update	-0.124939
-0.600931	2.0; x <=	-0.124939
-0.586782	&& i <=	-0.124939
-0.586782	2; i <=	-0.124939
-0.595771	interval 0 <=	-0.124939
-0.588003	1; n <=	-0.124939
-0.358930	- min) <=	-0.124939
-0.358930	Now 1.0 <=	-0.124939
-1.679403	point to integer.	-0.124939
-1.514889	be an integer.	-0.124939
-0.808850	as an integer.	-0.124939
-1.066712	the 32-bit integer.	-0.124939
-0.726852	the nearest integer.	-0.124939
-2.442122	by the body	-0.124939
-2.350540	because the body	-0.124939
-2.540459	the function body	-0.124939
-2.049467	the loop body	-0.124939
-0.978405	The loop body	-0.124939
-0.889575	if its body	-0.124939
-0.785828	the hardware definition	-0.124939
-0.120809	a hardware definition	-0.346788
-0.401762	The hardware definition	-0.124939
-2.566476	and the Java	-0.124939
-1.498102	implementations of Java	-0.124939
-1.076147	features of Java	-0.124939
-2.199445	used for Java	-0.124939
-1.533744	the best Java	-0.124939
-1.394740	The best Java	-0.124939
-1.299165	the so-called Java	-0.124939
-0.599303	e.g. Intel Math	-0.124939
-0.594129	AMD AMD Math	-0.124939
-0.415687	of Intel's Math	-0.124939
-0.415687	in Intel's Math	-0.124939
-0.550827	AMD Core Math	-0.124939
-0.444382	the "Intel Math	-0.124939
-0.314766	The "Intel Math	-0.124939
-2.278846	the compiler generates	-0.124939
-1.499649	a compiler generates	-0.124939
-2.049616	The compiler generates	-0.124939
-1.845546	Intel compiler generates	-0.124939
-1.154666	type conversion generates	-0.124939
-0.726852	from -128 generates	-0.124939
-0.463677	the sampling generates	-0.124939
-0.601625	CPUs for executing	-0.124939
-1.297788	optimization by executing	-0.124939
-1.283906	time on executing	-0.124939
-1.066275	spent on executing	-0.124939
-0.462827	and after executing	-0.124939
-0.358945	43 speculatively executing	-0.124939
-0.601696	FreeBSD and Open	-0.124939
-0.577552	class library. Open	-0.124939
-0.573289	optimize well. Open	-0.124939
-0.505002	8.42n, 2004. Open	-0.124939
-0.463622	database connections. Open	-0.124939
-0.358916	processing. Yeppp. Open	-0.124939
-0.358916	Locked mutexes. Open	-0.124939
-1.678005	size = 256;	-0.124939
-0.858444	i < 256;	-0.602060
-1.679056	kinds of optimizations.	-0.124939
-1.921889	and other optimizations.	-0.124939
-0.591577	prevents certain optimizations.	-0.124939
-1.343591	for further optimizations.	-0.124939
-0.129466	enables interprocedural optimizations.	-0.124939
-0.463641	to low-level optimizations.	-0.124939
-2.302993	{ // Cache	-0.124939
-1.305236	stored together Cache	-0.124939
-0.143399	important. 9.2 Cache	-0.124939
-0.143399	87 9.2 Cache	-0.124939
-0.143399	96 9.10 Cache	-0.124939
-0.143399	opposite). 9.10 Cache	-0.124939
-0.463659	Table 9.2. Cache	-0.124939
-1.835736	to be slower	-0.124939
-1.068944	link libraries slower	-0.124939
-1.409666	is much slower	-0.124939
-0.592215	always run slower	-0.124939
-1.308273	to execute slower	-0.124939
-0.358930	faster nor slower	-0.124939
-2.741289	It is free	-0.124939
-1.203028	malloc and free	-0.124939
-1.597981	available for free	-0.124939
-2.045746	do not free	-0.124939
-1.867650	only one free	-0.124939
-0.583164	see my free	-0.124939
-0.567184	it could free	-0.124939
-2.032984	the time consuming	-0.124939
-0.584597	is time consuming	-0.124939
-0.584597	be time consuming	-0.124939
-0.584597	these time consuming	-0.124939
-0.249852	the time- consuming	-0.124939
-0.249852	put time- consuming	-0.124939
-0.358973	functions. Time- consuming	-0.124939
-2.259027	is to hold	-0.124939
-1.073795	register to hold	-0.124939
-1.373044	enough to hold	-0.124939
-0.888607	vector can hold	-0.124939
-0.888607	CPU can hold	-0.124939
-0.888607	registers can hold	-0.124939
-0.594758	line can hold	-0.124939
-3.205415	of the memory,	-0.124939
-1.659261	variable in memory,	-0.124939
-1.438232	stored in memory,	-0.124939
-0.767212	dynamically allocated memory,	-0.124939
-0.358959	with segmented memory,	-0.124939
-0.708101	cache (see p.	-0.124939
-0.493713	templates (see p.	-0.124939
-0.493713	chains (see p.	-0.124939
-0.493713	prediction (see p.	-0.124939
-0.493713	throughput (see p.	-0.124939
-0.869965	thread-local storage p.	-0.124939
-0.806044	(see above, p.	-0.124939
-0.543538	c < SIZE;	-0.425969
-0.082218	r < SIZE;	-0.726999
-0.508490	r1 < SIZE;	-0.124939
-0.931338	in this case.	-0.124939
-1.365104	in each case.	-0.124939
-1.066712	the 32-bit case.	-0.124939
-0.573340	in either case.	-0.124939
-0.601625	calculations: for (	-0.124939
-0.090163	size Alignd (	-0.124939
-0.090163	arrays Alignd (	-0.124939
-0.090163	); Alignd (	-0.124939
-0.358945	, longdoublevalue (	-0.124939
-0.358945	, doublevalue (	-0.124939
-0.358945	follows: floatvalue (	-0.124939
-2.950576	can be expensive	-0.124939
-2.095392	is more expensive	-0.124939
-0.594883	development more expensive	-0.124939
-1.195356	time, but expensive	-0.124939
-1.281797	are so expensive	-0.124939
-0.597916	get very expensive	-0.124939
-1.425079	is quite expensive	-0.124939
-0.601708	sign and rounding	-0.124939
-1.734524	time than rounding	-0.124939
-2.578739	floating point rounding	-0.124939
-2.087647	by using rounding	-0.124939
-1.021153	speed between rounding	-0.124939
-1.447094	difference between rounding	-0.124939
-0.594605	this). Use rounding	-0.124939
-1.388823	See page 130	-0.425969
-1.635504	(see page 130	-0.124939
-1.220520	(See page 130	-0.124939
-0.505062	129 129 130	-0.124939
-0.659926	compiler ......................................................................... 130	-0.124939
-0.358959	the following: 130	-0.124939
-1.078508	user is far	-0.124939
-2.769457	in a far	-0.124939
-0.601696	pointers, and far	-0.124939
-0.591585	have values far	-0.124939
-1.314061	the keyword far	-0.124939
-1.321115	of course far	-0.124939
-0.463622	Far storage, far	-0.124939
-1.478742	in memory. They	-0.124939
-0.861360	global variables. They	-0.124939
-0.550770	three branches. They	-0.124939
-0.527270	available information. They	-0.124939
-0.659840	and 64-bit. They	-0.124939
-0.358916	very smart. They	-0.124939
-0.358916	often unreliable. They	-0.124939
-1.503146	kind of exceptions	-0.124939
-1.802101	check for exceptions	-0.124939
-0.601287	out if exceptions	-0.124939
-1.196088	without using exceptions	-0.124939
-0.505002	not throw exceptions	-0.124939
-0.463622	that thrown exceptions	-0.124939
-0.659840	// Catch exceptions	-0.124939
-2.550220	on the system,	-0.124939
-1.077814	smaller the system,	-0.124939
-1.204936	the operating system,	-0.124939
-0.528414	no operating system,	-0.124939
-0.528414	Windows operating system,	-0.124939
-0.766495	protected operating system,	-0.124939
-3.084463	of the absolute	-0.124939
-1.376089	take the absolute	-0.124939
-1.740795	calculate the absolute	-0.124939
-0.600621	DLL use absolute	-0.124939
-0.898151	contains no absolute	-0.124939
-0.597854	uses 32-bit absolute	-0.124939
-0.917534	to compare absolute	-0.124939
-1.796334	y = (a	-0.124939
-0.598928	b) = (a	-0.124939
-1.051422	} if (a	-0.124939
-0.592608	d; if (a	-0.124939
-0.592608	14.15b if (a	-0.124939
-0.592608	14.15a if (a	-0.124939
-0.358973	#define MAX(a,b) (a	-0.124939
-3.065668	in the machine	-0.124939
-2.465969	number of machine	-0.124939
-1.076235	transferred as machine	-0.124939
-0.599186	translated into machine	-0.124939
-0.887722	Java virtual machine	-0.124939
-1.804363	a few machine	-0.124939
-0.463622	the resulting machine	-0.124939
-0.901670	ecx = Induction	-0.124939
-0.600923	i; int Induction	-0.124939
-0.600437	9; } Induction	-0.124939
-1.186582	array elements Induction	-0.124939
-1.244949	integer expressions Induction	-0.124939
-0.835825	code motion Induction	-0.124939
-0.358916	} 70 Induction	-0.124939
-0.601808	slices to 120	-0.124939
-0.601696	95 and 120	-0.124939
-2.205398	See page 120	-0.124939
-1.743658	instruction set. 120	-0.124939
-0.505002	Conclusion .......................................................................................................... 120	-0.124939
-0.463622	vectors ....................................................... 120	-0.124939
-0.358916	allocated memory................................................................. 120	-0.124939
-2.732609	it is hardly	-0.124939
-2.322042	there is hardly	-0.124939
-1.197746	model is hardly	-0.124939
-0.902045	these are hardly	-0.124939
-1.076575	integers with hardly	-0.124939
-1.593033	This has hardly	-0.124939
-0.590848	there was hardly	-0.124939
-2.456351	and the CPUID	-0.124939
-2.469577	on the CPUID	-0.124939
-1.509623	when the CPUID	-0.124939
-1.662910	call the CPUID	-0.124939
-1.073265	The only CPUID	-0.124939
-1.203638	access the saved	-0.124939
-2.684408	can be saved	-0.124939
-2.253063	should be saved	-0.124939
-1.849370	must be saved	-0.124939
-0.601537	calls are saved	-0.124939
-0.899457	F1 has saved	-0.124939
-1.046379	that was saved	-0.124939
-2.760690	if the changes	-0.124939
-1.078679	independent of changes	-0.124939
-1.068505	the version changes	-0.124939
-1.067644	with some changes	-0.124939
-0.875425	The dispatcher changes	-0.124939
-0.863918	last index changes	-0.124939
-0.541201	keyword __fastcall changes	-0.124939
-0.601559	c are integers,	-0.124939
-1.629138	and 64-bit integers,	-0.124939
-0.586192	define 64-bit integers,	-0.124939
-0.550879	to 32-bit integers,	-0.124939
-1.328941	for 32-bit integers,	-0.124939
-0.550879	are 32-bit integers,	-0.124939
-0.806129	two 32-bit integers,	-0.124939
-0.902657	implemented a collection	-0.124939
-0.600488	consuming. A collection	-0.124939
-0.305104	and garbage collection	-0.124939
-0.306471	This garbage collection	-0.124939
-0.306471	start garbage collection	-0.124939
-0.358959	the Boost collection	-0.124939
-0.892843	my optimization manuals	-0.124939
-1.565647	of these manuals	-0.124939
-0.575188	in these manuals	-0.124939
-0.891245	the programming manuals	-0.124939
-0.575630	Literature Other manuals	-0.124939
-0.981929	The subsequent manuals	-0.124939
-0.940529	of five manuals	-0.124939
-1.619417	on the processor.	-0.301030
-1.363924	an Intel processor.	-0.124939
-1.415668	Pentium 4 processor.	-0.124939
-0.999662	the actual processor.	-0.124939
-0.527341	so-called soft processor.	-0.124939
-1.077311	Position-independent code Shared	-0.124939
-0.892697	load time. Shared	-0.124939
-0.886525	bit Linux Shared	-0.124939
-0.464548	explained below. Shared	-0.425969
-0.847368	in BSD Shared	-0.124939
-0.842113	local references. Shared	-0.124939
-0.680442	method of storing	-0.124939
-2.105041	used for storing	-0.124939
-0.598603	database for storing	-0.124939
-0.598603	buffers for storing	-0.124939
-1.651573	loop by storing	-0.124939
-1.523065	simply by storing	-0.124939
-0.601633	have. The developers	-0.124939
-1.494136	that make developers	-0.124939
-0.864008	and software developers	-0.124939
-1.021803	that software developers	-0.124939
-0.595260	time Some developers	-0.124939
-0.415687	problems. Software developers	-0.124939
-0.415687	swapping. Software developers	-0.124939
-0.105085	parm1, int parm2)	-0.669007
-0.900257	from time T	-0.124939
-0.600444	N; } T	-0.124939
-1.468031	the type T	-0.124939
-0.993919	of type T	-0.124939
-0.594336	& a, T	-0.124939
-1.722249	static inline T	-0.124939
-0.358930	{ protected: T	-0.124939
-1.764121	time to eliminate	-0.124939
-1.374143	fail to eliminate	-0.124939
-1.310097	compiler can eliminate	-0.124939
-1.057620	this can eliminate	-0.124939
-1.655630	we can eliminate	-0.124939
-1.735037	can also eliminate	-0.124939
-2.138143	power of 2:	-0.124939
-1.444146	powers of 2:	-0.124939
-1.060627	break; case 2:	-0.124939
-0.655336	in manual 2:	-0.602060
-1.165038	; parameter 2:	-0.124939
-1.939166	of a composite	-0.425969
-1.903054	has a composite	-0.124939
-0.898287	returning a composite	-0.124939
-0.601832	parameter of composite	-0.124939
-0.880322	simplest cases, composite	-0.124939
-0.764627	for transferring composite	-0.124939
-1.501580	program. The profilers	-0.124939
-0.601169	problems with profilers	-0.124939
-0.595235	code. Some profilers	-0.124939
-0.594239	CodeAnalyst. These profilers	-0.124939
-1.556829	are various profilers	-0.124939
-0.580765	CodeAnalyst. Unfortunately, profilers	-0.124939
-0.463622	also third-party profilers	-0.124939
-1.078532	But a highly	-0.124939
-0.601734	respects and highly	-0.124939
-1.832287	functions are highly	-0.124939
-1.006196	libraries are highly	-0.124939
-0.596006	applications are highly	-0.124939
-0.595412	consider making highly	-0.124939
-0.902362	again and again	-0.124939
-0.597491	nearby address again	-0.124939
-0.577566	reading them again	-0.124939
-0.971168	is interpreted again	-0.124939
-0.550825	0x4700. Reading again	-0.124939
-0.527309	times. Then again	-0.124939
-0.463622	is reused again	-0.124939
-0.601808	1 to 127	-0.124939
-1.596259	bigger than 127	-0.124939
-0.895943	AVX version 127	-0.124939
-0.557689	11.8 127 127	-0.124939
-0.505002	8 -128 127	-0.124939
-0.835825	1)sign 2exponent 127	-0.124939
-0.358916	33 11.8 127	-0.124939
-1.036691	in assembly language.	-0.124939
-0.711499	or assembly language.	-0.124939
-0.711499	use assembly language.	-0.124939
-0.887590	using assembly language.	-0.124939
-0.711499	need assembly language.	-0.124939
-0.557791	the D language.	-0.124939
-1.292120	hardware definition language.	-0.124939
-1.751569	to be aware	-0.425969
-1.413127	should be aware	-0.301030
-1.264847	therefore be aware	-0.124939
-0.358988	2 (be aware	-0.124939
-1.052591	intrinsic functions. Alternatively,	-0.124939
-1.128901	cache size. Alternatively,	-0.124939
-0.575600	own stack. Alternatively,	-0.124939
-1.351349	function returns. Alternatively,	-0.124939
-0.562994	such applications. Alternatively,	-0.124939
-0.527309	OMF format. Alternatively,	-0.124939
-0.726752	in Windows). Alternatively,	-0.124939
-2.579809	floating point capabilities	-0.124939
-1.412496	the optimization capabilities	-0.124939
-0.714224	the out-of-order capabilities	-0.124939
-0.092565	with out-of-order capabilities	-0.124939
-0.855483	vector processing capabilities	-0.124939
-0.894715	int)n < 4)	-0.124939
-1.524628	i += 4)	-0.124939
-0.469981	3) << 4)	-0.124939
-0.469981	as(a << 4)	-0.124939
-0.469981	(B << 4)	-0.124939
-0.463723	(level >= 4)	-0.425969
-2.628172	that the linker	-0.124939
-1.751136	by the linker	-0.124939
-2.303986	because the linker	-0.124939
-1.374156	allows the linker	-0.124939
-0.902277	address. The linker	-0.124939
-0.842244	both compiler, linker	-0.124939
-0.941314	bytes = int64_t	-0.124939
-0.901613	long or int64_t	-0.124939
-0.600933	256 int int64_t	-0.124939
-1.284995	64 2 int64_t	-0.124939
-0.882975	64 1 int64_t	-0.124939
-0.358930	-263 263-1 int64_t	-0.124939
-1.769111	number of bits.	-0.124939
-0.578349	to 64 bits.	-0.124939
-0.578349	uses 64 bits.	-0.124939
-0.596117	than 32 bits.	-0.124939
-1.267332	the extra bits.	-0.124939
-0.585987	the higher bits.	-0.124939
-2.104185	make the measurements	-0.124939
-1.298724	and that measurements	-0.124939
-1.774502	The time measurements	-0.124939
-0.595219	during time measurements	-0.124939
-0.899863	execute then measurements	-0.124939
-1.664744	and make measurements	-0.124939
-0.598195	do some measurements	-0.124939
-2.734595	that the representation	-0.124939
-0.902236	compilers). The representation	-0.124939
-2.579274	floating point representation	-0.124939
-1.194569	The integer representation	-0.124939
-0.819939	a binary representation	-0.124939
-0.669835	in binary representation	-0.124939
-0.469958	its binary representation	-0.124939
-0.584298	8.7 int SomeFunction	-0.124939
-0.584298	8.9b int SomeFunction	-0.124939
-0.584298	8.9a int SomeFunction	-0.124939
-0.584298	8.11b int SomeFunction	-0.124939
-0.584298	8.11a int SomeFunction	-0.124939
-0.598893	7.1 float SomeFunction	-0.124939
-0.596228	<malloc.h> void SomeFunction	-0.124939
-0.600227	or program size,	-0.124939
-0.599658	sets, cache size,	-0.124939
-1.700648	cache line size,	-0.124939
-0.577566	bits total size,	-0.124939
-0.570514	available. declaration size,	-0.124939
-0.971222	with fixed size,	-0.124939
-0.550770	table. Type size,	-0.124939
-2.151977	if it is.	-0.124939
-2.326793	the loop is.	-0.124939
-0.597491	array address is.	-0.124939
-0.586650	it actually is.	-0.124939
-0.580706	advantageous vectorization is.	-0.124939
-1.089607	template metaprogramming is.	-0.124939
-0.557737	compiler itself is.	-0.124939
-0.223129	vector algebra reductions:	-0.124939
-0.223129	point algebra reductions:	-0.124939
-0.223129	Integer algebra reductions:	-0.124939
-0.223129	Boolean algebra reductions:	-0.124939
-0.028027	XMM (vector) reductions:	-0.124939
-1.076466	user is waiting	-0.124939
-1.076466	thread is waiting	-0.124939
-1.741439	we are waiting	-0.124939
-1.058996	its time waiting	-0.124939
-0.595227	their time waiting	-0.124939
-1.600378	are often waiting	-0.124939
-0.594944	do while waiting	-0.124939
-2.129738	set is available,	-0.124939
-2.530616	to be available,	-0.124939
-2.106109	they are available,	-0.124939
-2.439997	instruction set available,	-0.124939
-1.071369	optimizing compilers available,	-0.124939
-1.686475	are also available,	-0.124939
-0.541201	are currently available,	-0.124939
-2.651624	the code automatically.	-0.124939
-0.594772	programming work automatically.	-0.124939
-0.886070	mechanism works automatically.	-0.124939
-0.580753	advantage comes automatically.	-0.124939
-1.216360	be vectorized automatically.	-0.124939
-0.577552	this alignment automatically.	-0.124939
-0.567222	not vectorize automatically.	-0.124939
-1.597728	only for powers	-0.124939
-1.077614	numbers are powers	-0.124939
-0.601099	defined as powers	-0.124939
-1.188295	of using powers	-0.425969
-0.579153	preferably using powers	-0.124939
-0.594816	means avoid powers	-0.124939
-1.497501	making a debug	-0.124939
-0.601096	executable: a debug	-0.124939
-1.829826	difficult to debug	-0.124939
-0.592487	testing contains debug	-0.124939
-0.562995	inserts temporary debug	-0.124939
-0.527325	The 17 debug	-0.124939
-0.358930	time. Uses debug	-0.124939
-0.902200	templates for polymorphism	-0.124939
-0.596393	functionality without polymorphism	-0.124939
-1.150455	the runtime polymorphism	-0.124939
-1.504225	the desired polymorphism	-0.124939
-0.358985	7.43a. Runtime polymorphism	-0.124939
-0.358985	73. Runtime polymorphism	-0.124939
-0.358930	7.43b. Compile-time polymorphism	-0.124939
-1.078529	Gnu and Clang	-0.124939
-0.899294	platforms. The Clang	-0.124939
-0.600158	Clang The Clang	-0.124939
-0.582073	Unix-like platforms. Clang	-0.124939
-0.475754	the Gnu, Clang	-0.425969
-0.336336	Microsoft, Gnu, Clang	-0.124939
-2.284175	that is measured	-0.124939
-1.729420	time is measured	-0.124939
-2.684725	It is measured	-0.124939
-1.073500	computer. The measured	-0.124939
-0.600158	16.2. The measured	-0.124939
-2.391918	should be measured	-0.124939
-0.585174	sizes were measured	-0.124939
-1.941333	code in details.	-0.124939
-0.595673	manual for details.	-0.124939
-0.594091	88 for details.	-0.124939
-0.594091	29 for details.	-0.124939
-0.594091	Documentation for details.	-0.124939
-0.594091	blog for details.	-0.124939
-2.454537	when the factor	-0.124939
-0.601586	Now, the factor	-0.124939
-1.478873	by a factor	-0.124939
-1.991532	} The factor	-0.124939
-1.822698	of each factor	-0.124939
-0.842322	a risk factor	-0.124939
-0.156183	_mm_storeu_si128((__m128i *)d, x);	-0.425969
-0.172691	_mm_store_si128((__m128i *)d, x);	-0.124939
-0.065809	int order(int x);	-0.124939
-0.358959	= _mm_hadd_ps(x, x);	-0.124939
-0.358959	x2*x, x2, x);	-0.124939
-3.067489	in the core.	-0.124939
-2.644737	the same core.	-0.124939
-0.600024	own CPU core.	-0.124939
-1.365029	in each core.	-0.124939
-0.235302	same processor core.	-0.124939
-2.729792	to the rules	-0.124939
-1.298634	though the rules	-0.124939
-1.077870	cycles. The rules	-0.124939
-1.625897	The same rules	-0.124939
-0.895944	the many rules	-0.124939
-0.591577	obey certain rules	-0.124939
-0.358930	same coding rules	-0.124939
-1.503396	terms of speed.	-0.124939
-0.902180	optimizing for speed.	-0.124939
-1.075692	important than speed.	-0.124939
-0.585956	hence higher speed.	-0.124939
-0.575557	or full speed.	-0.124939
-0.358916	expected real-time speed.	-0.124939
-0.358916	the single-thread speed.	-0.124939
-0.601837	obstacle to vectorization.	-0.124939
-0.899651	better at vectorization.	-0.124939
-0.496167	and automatic vectorization.	-0.124939
-0.889685	with automatic vectorization.	-0.124939
-0.588655	on automatic vectorization.	-0.124939
-0.416986	do automatic vectorization.	-0.124939
-1.696300	in registers anyway.	-0.124939
-0.592767	handling support anyway.	-0.124939
-0.588036	rarely needed anyway.	-0.124939
-1.387715	be loaded anyway.	-0.124939
-0.575605	be true anyway.	-0.124939
-0.143394	is restarted anyway.	-0.124939
-0.143394	and restarted anyway.	-0.124939
-1.354546	use the smallest	-0.301030
-1.989133	using the smallest	-0.124939
-0.898672	even the smallest	-0.124939
-1.498539	On the smallest	-0.124939
-0.898672	putting the smallest	-0.124939
-1.000010	is the responsibility	-0.970037
-0.585948	AVX, AVX2 Mathematical	-0.124939
-0.505042	string manipulation Mathematical	-0.124939
-0.249837	142 14.10 Mathematical	-0.124939
-0.249837	u[0]. 14.10 Mathematical	-0.124939
-0.659897	page 140). Mathematical	-0.124939
-0.143399	117 12.7 Mathematical	-0.124939
-0.143399	118 12.7 Mathematical	-0.124939
-0.598800	from 64-bit MMX	-0.124939
-0.543012	2 64 MMX	-0.124939
-0.921749	4 64 MMX	-0.124939
-0.968161	8 64 MMX	-0.124939
-0.543012	1 64 MMX	-0.124939
-1.067347	Header file MMX	-0.124939
-0.527334	The older MMX	-0.124939
-1.792168	from a reliable	-0.124939
-2.080515	make a reliable	-0.124939
-0.877920	often more reliable	-0.124939
-0.375059	gives more reliable	-0.124939
-2.158055	the most reliable	-0.124939
-1.591002	to get reliable	-0.124939
-2.395762	with the Borland	-0.124939
-1.595483	while the Borland	-0.124939
-0.601232	Combining the Borland	-0.124939
-0.599303	Gnu Intel Borland	-0.124939
-1.115719	64-bit Windows. Borland	-0.124939
-0.550787	9.0 CodeGear Borland	-0.124939
-0.358945	Studio 2005). Borland	-0.124939
-1.750104	in the sense	-0.903090
-1.416874	it makes sense	-0.124939
-2.858519	in the latest	-0.124939
-1.950719	for the latest	-0.425969
-1.585527	example, the latest	-0.124939
-1.293604	Use the latest	-0.124939
-0.899361	gets the latest	-0.124939
-0.601663	3 The latest	-0.124939
-1.788923	} // Now	-0.124939
-0.599014	0x3F800000; // Now	-0.124939
-1.106954	points to. Now	-0.124939
-0.940529	mentioned above. Now	-0.124939
-0.358930	+ d); Now	-0.124939
-0.358930	brutally interrupted. Now	-0.124939
-0.358930	= (s0+s1)+(s2+s3); Now	-0.124939
-1.073694	the execution units	-0.124939
-0.180174	with execution units	-0.124939
-0.708060	different execution units	-0.124939
-0.493688	128-bit execution units	-0.124939
-0.594312	vector. These units	-0.124939
-0.861605	chip. Such units	-0.124939
-1.436325	it to do.	-0.124939
-0.899449	thing to do.	-0.124939
-0.600236	jobs to do.	-0.124939
-0.901022	can not do.	-0.124939
-1.283354	they cannot do.	-0.124939
-0.590477	few programs do.	-0.124939
-0.358945	but event-counters do.	-0.124939
-1.693350	is the reciprocal	-0.425969
-1.545842	store the reciprocal	-0.124939
-1.075666	insert the reciprocal	-0.124939
-0.601042	by - reciprocal	-0.124939
-0.726852	fast approximate reciprocal	-0.124939
-0.358959	by xx-xx--x- reciprocal	-0.124939
-0.119958	StoreVector(void * d,	-0.602060
-0.559164	StoreVectorA(void * d,	-0.124939
-0.453992	b, c, d,	-0.301030
-0.353026	into multiple threads.	-0.221849
-0.597915	communicating between threads.	-0.124939
-0.358973	and stopping threads.	-0.124939
-1.379549	cases, the log	-0.124939
-0.601696	pow and log	-0.124939
-0.601612	password. The log	-0.124939
-1.740435	a[i] = log	-0.124939
-0.901584	off or log	-0.124939
-0.586701	usually requires log	-0.124939
-1.230041	innermost loop. log	-0.124939
-1.300457	stores the thousand	-0.124939
-2.050680	on a thousand	-0.124939
-1.495596	time a thousand	-0.124939
-1.071996	even a thousand	-0.124939
-0.599652	repeats a thousand	-0.124939
-1.299891	array of thousand	-0.124939
-1.378434	time in thousand	-0.124939
-0.902611	implementing a compile-time	-0.124939
-0.601696	if and compile-time	-0.124939
-0.601306	loops or compile-time	-0.124939
-1.076515	or with compile-time	-0.124939
-1.827257	known at compile-time	-0.124939
-1.067896	for any compile-time	-0.124939
-0.589585	language allows compile-time	-0.124939
-2.234967	is to remove	-0.124939
-1.972319	need to remove	-0.124939
-0.897888	linker to remove	-0.124939
-0.897888	Remember to remove	-0.124939
-0.601350	add or remove	-0.124939
-2.095347	You may remove	-0.124939
-0.463677	can add, remove	-0.124939
-0.601842	Hyperthreading is Intel's	-0.124939
-2.156420	version of Intel's	-0.124939
-0.902478	supplied in Intel's	-0.124939
-0.601179	supplied with Intel's	-0.124939
-0.584229	their CPUs. Intel's	-0.124939
-0.107193	are overriding Intel's	-0.425969
-1.473855	than by 16.	-0.124939
-0.782218	divisible by 16.	-0.124939
-2.015589	on page 16.	-0.124939
-0.527355	i modulo 16.	-0.124939
-0.725797	transferred in registers,	-0.124939
-1.068486	saved in registers,	-0.124939
-1.498882	only on registers,	-0.124939
-0.557757	older MMX registers,	-0.124939
-0.358959	and restoring registers,	-0.124939
-0.644568	function to transpose	-0.301030
-1.067597	time to transpose	-0.124939
-1.927212	takes to transpose	-0.124939
-0.893828	// call transpose	-0.124939
-2.171147	have to wait	-0.124939
-0.922808	has to wait	-0.602060
-0.601734	seconds and wait	-0.124939
-1.197814	function will wait	-0.124939
-0.594987	x must wait	-0.124939
-1.776084	any other number.	-0.124939
-2.579274	floating point number.	-0.124939
-1.187536	a 32-bit number.	-0.124939
-0.877662	8-bit signed number.	-0.124939
-0.309330	and model number.	-0.124939
-2.705685	that the break	-0.124939
-0.601592	Repeating the break	-0.124939
-2.064062	how to break	-0.124939
-1.998952	need to break	-0.124939
-1.734126	it will break	-0.124939
-0.463677	and press break	-0.124939
-2.339258	with a constant.	-0.124939
-0.601526	to are constant.	-0.124939
-0.901613	large or constant.	-0.124939
-2.643731	the same constant.	-0.124939
-0.599639	positive integer constant.	-0.124939
-1.741252	double precision constant.	-0.124939
-0.129482	the procedure linkage	-0.124939
-0.039023	a procedure linkage	-0.301030
-0.129482	called procedure linkage	-0.124939
-0.129482	ordinary procedure linkage	-0.124939
-1.043256	code if possible,	-0.124939
-0.589752	variables if possible,	-0.124939
-0.878813	implementation if possible,	-0.124939
-0.589752	avoided, if possible,	-0.124939
-0.589752	normalized, if possible,	-0.124939
-0.601134	branches as possible,	-0.124939
-0.188354	the bit scan	-0.425969
-0.528396	this bit scan	-0.124939
-0.188354	slow bit scan	-0.124939
-0.358988	BSF (bit scan	-0.124939
-0.825446	32 bit systems:	-0.124939
-0.135929	in 16-bit systems:	-0.249877
-2.095673	is more predictable	-0.124939
-1.345547	are more predictable	-0.124939
-2.158055	the most predictable	-0.124939
-1.184094	on how predictable	-0.124939
-0.662926	is poorly predictable	-0.124939
-0.415698	a poorly predictable	-0.124939
-0.810736	cout << "Hello	-0.425969
-0.029477	// Writes "Hello	-0.425969
-1.440811	way is equal	-0.124939
-0.899905	list[i] is equal	-0.124939
-0.600465	label is equal	-0.124939
-2.532621	to be equal	-0.124939
-0.600961	put an equal	-0.124939
-1.664236	is therefore equal	-0.124939
-0.601633	keyword. The CodeGear	-0.124939
-0.894444	parameters on CodeGear	-0.124939
-0.597715	(three on CodeGear	-0.124939
-0.599303	Mac Intel CodeGear	-0.124939
-0.593234	Borland / CodeGear	-0.124939
-0.358945	v. 9.0 CodeGear	-0.124939
-2.302826	code is compact	-0.124939
-1.431784	is more compact	-0.124939
-1.361029	and more compact	-0.124939
-1.320921	code more compact	-0.124939
-0.578305	made more compact	-0.124939
-1.858119	of this polynomial	-0.124939
-0.600451	C; } polynomial	-0.124939
-0.358976	// Calculate polynomial	-0.124939
-0.358976	8.23b. Calculate polynomial	-0.124939
-0.505078	n'th degree polynomial	-0.124939
-0.463659	parameters Vec4f polynomial	-0.124939
-1.642316	without the Common	-0.124939
-0.599639	elimin., integer Common	-0.124939
-0.584271	+= 2; Common	-0.124939
-0.957774	(vector) reductions: Common	-0.124939
-0.788900	Pointer elimination Common	-0.124939
-0.358930	or Verilog. Common	-0.124939
-1.875299	function that reads	-0.124939
-0.601321	writes or reads	-0.124939
-2.029405	a program reads	-0.124939
-1.372120	and later reads	-0.124939
-0.570592	Using unaligned reads	-0.124939
-0.463641	program afterwards reads	-0.124939
-1.197217	from memory plus	-0.124939
-1.491471	or double plus	-0.124939
-0.597500	base address plus	-0.124939
-1.765246	a constant plus	-0.124939
-0.592496	of list plus	-0.124939
-0.563032	preceding label plus	-0.124939
-0.635588	and manual 5:	-0.124939
-0.476052	in manual 5:	-0.425969
-0.448031	See manual 5:	-0.124939
-2.272064	order to increase	-0.124939
-1.980806	way to increase	-0.124939
-2.232085	you can increase	-0.124939
-1.634265	you cannot increase	-0.124939
-0.586696	modifications actually increase	-0.124939
-0.463659	possible minor increase	-0.124939
-0.598739	// C++ casting	-0.124939
-0.527389	than type casting	-0.124939
-0.527389	loaded type casting	-0.124939
-0.527389	C-style type casting	-0.124939
-0.527389	Constructor-style type casting	-0.124939
-0.550845	safer. Type casting	-0.124939
-0.598743	time, of course,	-0.124939
-0.598743	inefficient, of course,	-0.124939
-0.598743	practice, of course,	-0.124939
-0.598743	omitted, of course,	-0.124939
-0.598743	requires, of course,	-0.124939
-0.358988	faster. Of course,	-0.124939
-2.893396	in the scope	-0.124939
-2.072215	make the scope	-0.124939
-0.124770	beyond the scope	-0.602060
-1.379360	regardless of scope	-0.124939
-0.902849	shows the principle	-0.124939
-1.368861	calls. The principle	-0.124939
-0.600148	undetected. The principle	-0.124939
-0.601051	manually. This principle	-0.124939
-1.495661	use this principle	-0.124939
-2.644737	the same principle	-0.124939
-2.497550	and the throughput	-0.124939
-2.409717	by the throughput	-0.124939
-2.262239	than the throughput	-0.124939
-1.295246	increase the throughput	-0.124939
-0.165194	Execution unit throughput	-0.124939
-1.047060	time is spent	-0.124939
-0.900433	would have spent	-0.124939
-1.558948	the time spent	-0.124939
-1.902113	clock cycles spent	-0.124939
-1.677883	size = 16;	-0.124939
-1.122681	b / 16;	-0.124939
-0.822778	int)b / 16;	-0.124939
-0.791798	b % 16;	-0.124939
-0.692124	int)b % 16;	-0.124939
-0.557771	n <= 16;	-0.124939
-0.901080	name of Func	-0.124939
-1.077171	start of Func	-0.124939
-1.347111	first time Func	-0.124939
-1.738723	every time Func	-0.124939
-1.074107	return from Func	-0.124939
-0.596212	8.25 void Func	-0.124939
-2.131976	have to identify	-0.124939
-2.193987	order to identify	-0.124939
-1.926969	way to identify	-0.124939
-2.020538	how to identify	-0.124939
-1.363783	enough to identify	-0.124939
-0.597894	debugger to identify	-0.124939
-2.175360	which is 15	-0.124939
-0.902387	2 and 15	-0.124939
-1.783976	byte at 15	-0.124939
-1.010069	memory pool. 15	-0.124939
-0.527291	.......................................................................................... 150 15	-0.124939
-0.358930	page 90. 15	-0.124939
-0.598394	Division takes 14	-0.124939
-0.589123	Mac systems. 14	-0.124939
-0.577561	of optimization. 14	-0.124939
-0.557774	......................................................................... 130 14	-0.124939
-0.358930	interface framework........................................................................... 14	-0.124939
-0.358930	C++ language...................................................... 14	-0.124939
-2.276875	to do this.	-0.124939
-1.179505	to avoid this.	-0.124939
-1.646305	to see this.	-0.124939
-0.550807	always avoiding this.	-0.124939
-0.960773	example illustrates this.	-0.124939
-0.599639	variables, integer Register	-0.124939
-0.598852	elimin., float Register	-0.124939
-0.892838	assembly code. Register	-0.124939
-0.999599	is finished. Register	-0.124939
-0.527325	integer constants. Register	-0.124939
-0.659869	/ 4; Register	-0.124939
-2.102016	on a complex	-0.124939
-1.450106	is more complex	-0.124939
-1.026456	in more complex	-0.124939
-1.585206	or more complex	-0.124939
-0.600503	operations. A complex	-0.124939
-0.596904	suboptimal code. Intrinsic	-0.124939
-0.589105	Assembly name Intrinsic	-0.124939
-0.588003	summarized below. Intrinsic	-0.124939
-0.573292	machine instructions. Intrinsic	-0.124939
-0.557732	either case. Intrinsic	-0.124939
-0.358930	Table 12.3. Intrinsic	-0.124939
-0.902630	destructors to call.	-0.124939
-1.817760	the function call.	-0.124939
-2.268651	a function call.	-0.124939
-1.069591	on first call.	-0.124939
-1.059188	on every call.	-0.124939
-0.846612	you will notice	-0.425969
-0.200428	thing we notice	-0.425969
-0.129476	20 Copyright notice	-0.124939
-0.679417	i); // Add	-0.425969
-1.047534	application program. Add	-0.124939
-0.582068	local: 1. Add	-0.124939
-1.009562	the library. Add	-0.124939
-1.377392	CPU dispatching. Add	-0.124939
-0.536286	is branch prediction.	-0.124939
-0.352695	of branch prediction.	-0.124939
-0.536286	about branch prediction.	-0.124939
-0.536286	poor branch prediction.	-0.124939
-1.461529	the right prediction.	-0.124939
-2.428990	with the expected	-0.124939
-2.742189	It is expected	-0.124939
-2.002637	can be expected	-0.124939
-0.601548	set are expected	-0.124939
-2.258589	is to declare	-0.124939
-2.008648	recommended to declare	-0.124939
-1.075554	preferred to declare	-0.124939
-0.902438	directives and declare	-0.124939
-2.095347	You may declare	-0.124939
-2.088194	if you declare	-0.124939
-2.618889	for the application.	-0.124939
-2.396296	with the application.	-0.124939
-0.901460	fits the application.	-0.124939
-0.547413	the particular application.	-0.124939
-1.416213	a particular application.	-0.124939
-0.358973	an MFC application.	-0.124939
-1.667258	compile time here.	-0.124939
-0.599845	model used here.	-0.124939
-0.584246	not appropriate here.	-0.124939
-0.541279	few pitfalls here.	-0.124939
-0.505022	little odd here.	-0.124939
-0.463641	is 400 here.	-0.124939
-3.087114	of the largest	-0.124939
-2.276104	than the largest	-0.124939
-1.672415	whether the largest	-0.124939
-0.042749	the numerically largest	-0.425969
-0.090170	Find numerically largest	-0.124939
-2.761724	if the dispatched	-0.124939
-1.830049	If a dispatched	-0.124939
-0.901029	go to dispatched	-0.124939
-0.601028	Entry to dispatched	-0.124939
-0.601759	continue in dispatched	-0.124939
-1.177516	calls another dispatched	-0.124939
-1.330739	the data members.	-0.124939
-1.442365	of data members.	-0.124939
-1.124014	all data members.	-0.124939
-0.573684	modify data members.	-0.124939
-1.432410	child class members.	-0.124939
-1.294692	size that fits	-0.124939
-1.290518	version that fits	-0.124939
-2.321395	that it fits	-0.124939
-0.466199	on what fits	-0.425969
-0.659926	four float's fits	-0.124939
-0.601033	xx(-)x- - x-xxxx--x	-0.124939
-1.129082	Function inlining x-xxxx--x	-0.124939
-0.550765	- x-xxxx--x x-xxxx--x	-0.124939
-0.659869	= a&(b|c) x-xxxx--x	-0.124939
-0.358930	= a|(b&c) x-xxxx--x	-0.124939
-0.358930	true/false Loopunrolling x-xxxx--x	-0.124939
-1.501967	used for giving	-0.124939
-0.601247	CPU by giving	-0.124939
-1.203880	I am giving	-0.124939
-0.463659	} By giving	-0.124939
-0.358945	a subset, giving	-0.124939
-1.747088	floating point comparisons	-0.124939
-1.140558	Floating point comparisons	-0.425969
-0.872496	The two comparisons	-0.124939
-0.586493	Replacing two comparisons	-0.124939
-0.599303	131. Intel Performance	-0.124939
-0.896422	on C++ Performance	-0.124939
-0.859338	time. 4 Performance	-0.124939
-0.579633	22 4 Performance	-0.124939
-0.764640	the "Intel Performance	-0.124939
-0.358945	and "Integrated Performance	-0.124939
-0.567287	stack (see above,	-0.124939
-0.567287	stride (see above,	-0.124939
-1.793607	as explained above,	-0.124939
-0.560115	As explained above,	-0.124939
-0.789011	example 16.2 above,	-0.124939
-0.358959	function bodies above,	-0.124939
-1.793625	as explained above.	-0.124939
-0.560120	mechanisms explained above.	-0.124939
-0.858788	advice given above.	-0.124939
-0.358995	as mentioned above.	-0.124939
-0.358995	problems mentioned above.	-0.124939
-0.358995	methods mentioned above.	-0.124939
-0.570517	memory address. Pointer	-0.124939
-0.541251	Constant propagation Pointer	-0.124939
-0.463641	81). 77 Pointer	-0.124939
-0.463641	about rounding. Pointer	-0.124939
-0.358930	like sin. Pointer	-0.124939
-0.358930	be accessed. Pointer	-0.124939
-2.309830	is to detect	-0.124939
-1.077997	They can detect	-0.124939
-2.040979	it may detect	-0.124939
-0.600465	operator will detect	-0.124939
-0.922822	can automatically detect	-0.124939
-0.543460	should automatically detect	-0.124939
-2.030775	using the normal	-0.124939
-2.333245	as a normal	-0.124939
-1.203142	back to normal	-0.124939
-0.601708	nonzero and normal	-0.124939
-0.601179	writes with normal	-0.124939
-1.734524	time than normal	-0.124939
-0.587374	on compilers. Several	-0.124939
-1.106783	function libraries. Several	-0.124939
-0.505022	Internet forums Several	-0.124939
-0.358930	support SSE. Several	-0.124939
-0.358930	Library (OWL). Several	-0.124939
-0.358930	is Perl. Several	-0.124939
-2.833217	it is convenient	-0.124939
-2.718190	is a convenient	-0.124939
-2.446677	may be convenient	-0.124939
-1.781741	also be convenient	-0.124939
-1.182991	be more convenient	-0.124939
-0.594902	certainly more convenient	-0.124939
-0.902615	only to show	-0.124939
-1.543680	time and show	-0.124939
-0.600639	breakpoint and show	-0.124939
-1.291070	but only show	-0.124939
-0.505042	and 135 show	-0.124939
-0.505078	table 9.1 show	-0.124939
-0.600657	16 in column	-0.124939
-1.294058	lines in column	-0.124939
-0.901351	elements with column	-0.124939
-2.339023	= 0; column	-0.124939
-0.541245	are swapping column	-0.124939
-0.358945	the leftmost column	-0.124939
-0.040617	int parm2) {...}	-0.903090
-1.910976	function libraries Test	-0.124939
-1.377392	CPU dispatching. Test	-0.124939
-0.818578	hard disk. Test	-0.124939
-0.999662	branch mispredictions. Test	-0.124939
-0.143399	2 13.4 Test	-0.124939
-0.143399	decision. 13.4 Test	-0.124939
-0.902619	declaration of c1	-0.124939
-0.601721	r1 and c1	-0.124939
-1.544045	the class c1	-0.124939
-0.589822	7.28 class c1	-0.124939
-2.339023	= 0; c1	-0.124939
-0.505042	< r1; c1	-0.124939
-1.596796	d = x-	-0.124939
-1.829837	x x x-	-0.602060
-0.587834	xx x x-	-0.124939
-0.550882	= x- x-	-0.124939
-0.601412	measure // Number	-0.124939
-0.198114	element, bits Number	-0.425969
-0.582068	number 1. Number	-0.124939
-0.463659	this column. Number	-0.124939
-0.463659	libraries 113 Number	-0.124939
-2.734083	that the portability	-0.124939
-2.716990	is a portability	-0.124939
-1.713092	sake of portability	-0.124939
-0.601292	recommended if portability	-0.124939
-0.600560	compromise when portability	-0.124939
-0.505022	between efficiency, portability	-0.124939
-0.601412	<pmmintrin.h> // SSE3	-0.124939
-0.871480	double vectors SSE3	-0.124939
-0.659897	/arch:SSE2 -msse2 SSE3	-0.124939
-0.143399	set Suppl. SSE3	-0.124939
-0.143399	pmmintrin.h Suppl. SSE3	-0.124939
-0.358945	SSE2 emmintrin.h SSE3	-0.124939
-1.951811	compiler to evaluate	-0.124939
-1.733014	time to evaluate	-0.124939
-2.016049	able to evaluate	-0.124939
-1.043145	needs to evaluate	-0.124939
-0.891502	they always evaluate	-0.124939
-0.203840	Optimization in embedded	-0.425969
-0.601200	machines with embedded	-0.124939
-1.869730	in some embedded	-0.124939
-0.563072	in small embedded	-0.124939
-1.004418	for small embedded	-0.124939
-0.598219	manuals by Agner	-0.124939
-0.598219	copyrighted by Agner	-0.124939
-0.897771	example, using Agner	-0.124939
-0.599303	library Intel Agner	-0.124939
-0.764591	Vector class, Agner	-0.124939
-0.463659	platforms By Agner	-0.124939
-2.650052	to the availability	-0.124939
-2.476462	and the availability	-0.124939
-2.574069	for the availability	-0.124939
-0.900064	delay the availability	-0.124939
-0.600544	signaling the availability	-0.124939
-1.077993	application. The availability	-0.124939
-2.633058	// Example 13.1	-0.124939
-1.197332	in example 13.1	-0.124939
-0.541286	sets........................... 122 13.1	-0.124939
-0.463677	innermost loops. 13.1	-0.124939
-0.601851	pointer, a reference,	-0.124939
-0.826656	pointer or reference,	-0.124939
-0.358973	a non-const reference,	-0.124939
-3.141244	of the .NET	-0.124939
-2.642950	for the .NET	-0.124939
-0.600158	frameworks. The .NET	-0.124939
-0.600158	mouse. The .NET	-0.124939
-0.788977	Visual Basic .NET	-0.124939
-0.358959	in Microsoft's .NET	-0.124939
-1.048834	if (a !=	-0.124939
-0.527359	&& z !=	-0.124939
-0.527325	while (n !=	-0.124939
-0.764612	if (b !=	-0.124939
-0.358930	if (handle !=	-0.124939
-0.358930	while (*p !=	-0.124939
-0.884173	A few files,	-0.124939
-0.474601	files, resource files,	-0.124939
-0.474601	objects), resource files,	-0.124939
-0.567241	remote help files,	-0.124939
-0.314790	files, configuration files,	-0.124939
-0.314790	DLLs, configuration files,	-0.124939
-0.929254	and references Pointers	-0.124939
-0.515427	versus references Pointers	-0.124939
-1.203856	each thread. Pointers	-0.124939
-0.570559	valid address. Pointers	-0.124939
-0.143403	operations. 7.6 Pointers	-0.124939
-0.143403	33 7.6 Pointers	-0.124939
-1.888679	more than half	-0.124939
-1.691873	less than half	-0.124939
-0.600337	handled at half	-0.124939
-1.599773	is only half	-0.124939
-0.584718	of only half	-0.124939
-0.584718	modifying only half	-0.124939
-2.149375	used for converting	-0.124939
-1.433631	instructions for converting	-0.124939
-1.741439	we are converting	-0.124939
-0.900112	signed when converting	-0.124939
-0.597511	signed before converting	-0.124939
-0.505078	is implicitly converting	-0.124939
-0.600050	problem only occurs	-0.124939
-0.709221	an exception occurs	-0.124939
-0.584259	task switch occurs	-0.124939
-0.573334	same subexpression occurs	-0.124939
-0.828286	an interrupt occurs	-0.124939
-1.056309	0x80000000; // Set	-0.124939
-0.202557	InstructionSet(); // Set	-0.425969
-0.594301	116 // Set	-0.124939
-0.358973	Example 7.6. Set	-0.124939
-0.358973	Example 7.5. Set	-0.124939
-0.901278	exception is costly	-0.124939
-0.901278	conversion is costly	-0.124939
-0.601537	constructs are costly	-0.124939
-0.591226	are quite costly	-0.124939
-0.788939	are relatively costly	-0.124939
-0.835912	is extremely costly	-0.124939
-1.614382	on the newest	-0.301030
-2.263886	use the newest	-0.124939
-2.002652	using the newest	-0.124939
-1.077993	vectorization. The newest	-0.124939
-0.601625	standard for specifying	-0.124939
-0.601247	declared by specifying	-0.124939
-0.596407	int, without specifying	-0.124939
-1.349780	copy constructor specifying	-0.124939
-0.129469	standard C, specifying	-0.425969
-1.553576	branch that follows	-0.124939
-2.152077	if it follows	-0.124939
-1.579832	implemented as follows	-0.124939
-0.894062	vectorized as follows	-0.124939
-1.688553	function pointer follows	-0.124939
-0.358945	This closely follows	-0.124939
-1.679874	result of comparing	-0.124939
-1.522983	simply by comparing	-0.124939
-0.598229	doubles by comparing	-0.124939
-1.835427	efficient than comparing	-0.124939
-0.596617	Rather than comparing	-0.124939
-0.659926	loop counter, comparing	-0.124939
-2.556584	This is efficient,	-0.124939
-0.902387	fast and efficient,	-0.124939
-1.438267	code more efficient,	-0.124939
-0.599852	primitive, but efficient,	-0.124939
-0.596591	make pointers efficient,	-0.124939
-0.591206	also quite efficient,	-0.124939
-1.379269	generation of computers	-0.124939
-1.298757	and that computers	-0.124939
-1.493778	is because computers	-0.124939
-1.021794	all modern computers	-0.124939
-0.283265	more powerful computers	-0.124939
-2.014629	all the B	-0.124939
-1.854746	calculation of B	-0.124939
-1.395385	the four B	-0.124939
-0.505078	of A, B	-0.124939
-0.065807	= 1.1, B	-0.425969
-1.063981	system code. System	-0.124939
-0.143403	20 3.8 System	-0.124939
-0.143403	finish. 3.8 System	-0.124939
-0.143403	148 14.13 System	-0.124939
-0.143403	X. 14.13 System	-0.124939
-0.659926	everything else. System	-0.124939
-0.600673	series of five	-0.124939
-1.679174	up to five	-0.124939
-0.898699	then all five	-0.124939
-0.851807	has changed five	-0.124939
-1.822698	of each step	-0.124939
-1.860372	a single step	-0.124939
-1.671424	the next step	-0.124939
-1.245210	the second step	-0.124939
-0.885180	a second step	-0.124939
-0.358945	space 91 step	-0.124939
-1.379124	caching is poor	-0.124939
-1.503692	examples of poor	-0.124939
-1.378990	due to poor	-0.124939
-2.536056	may be poor	-0.124939
-0.899657	suffer from poor	-0.124939
-0.358930	well. Very poor	-0.124939
-2.237994	have to prefetch	-0.124939
-0.601622	data The prefetch	-0.124939
-1.077987	mechanism can prefetch	-0.124939
-0.597564	cache cannot prefetch	-0.124939
-0.595655	modern processors prefetch	-0.124939
-0.589596	to automatically prefetch	-0.124939
-0.900856	gives an 9	-0.124939
-0.598504	i * 9	-0.124939
-0.597878	varies between 9	-0.124939
-0.527325	pop ebx. 9	-0.124939
-0.505065	4, 6, 9	-0.124939
-0.463641	............................................................................. 84 9	-0.124939
-1.297506	time on deciding	-0.124939
-0.578103	parallelism when deciding	-0.124939
-0.338654	account when deciding	-0.602060
-0.856428	disadvantages when deciding	-0.124939
-1.900630	through a self-relative	-0.124939
-1.854746	calculation of self-relative	-0.124939
-0.902220	instruction for self-relative	-0.124939
-1.816276	to calculate self-relative	-0.124939
-0.185348	set supports self-relative	-0.124939
-0.601386	DynamicArray = (float	-0.124939
-0.575653	float square (float	-0.124939
-0.050300	float parabola (float	-0.602060
-0.463677	int lrintf (float	-0.124939
-1.641160	example, a Core	-0.124939
-1.686276	the Intel Core	-0.124939
-0.568104	bytes Intel Core	-0.124939
-0.568104	operands Intel Core	-0.124939
-0.568104	op. Intel Core	-0.124939
-0.594151	_mm_exp_pd AMD Core	-0.124939
-2.117736	in the debugger	-0.124939
-2.771340	in a debugger	-0.124939
-0.601643	optimization. The debugger	-0.124939
-0.899951	debugging. A debugger	-0.124939
-2.429279	with the ^	-0.124939
-1.194596	a a ^	-0.124939
-1.509659	= a ^	-0.124939
-0.898302	0 a ^	-0.124939
-0.541318	--------- ~a ^	-0.124939
-1.295382	same time regardless	-0.124939
-2.643731	the same regardless	-0.124939
-1.471454	most cases, regardless	-0.124939
-0.563014	be false regardless	-0.124939
-1.048834	in registers, regardless	-0.124939
-0.358930	same name, regardless	-0.124939
-1.939882	instead of truncation	-0.124939
-1.203142	changed to truncation	-0.124939
-1.374465	slower than truncation	-0.124939
-0.600612	integers use truncation	-0.124939
-0.600139	unfortunate because truncation	-0.124939
-0.788949	standard specifies truncation	-0.124939
-3.204253	of the base	-0.124939
-2.396378	to a base	-0.124939
-2.305736	as a base	-0.124939
-1.203171	whether to base	-0.124939
-0.600282	files, data base	-0.124939
-0.527313	the image base	-0.124939
-3.203093	of the result.	-0.124939
-2.643731	the same result.	-0.124939
-1.860126	a single result.	-0.124939
-0.885656	the calculated result.	-0.124939
-1.167172	a negative result.	-0.124939
-0.583193	low positive result.	-0.124939
-0.582086	dispatching: 1. How	-0.124939
-0.314776	compiler 8.1 How	-0.124939
-0.314776	66 8.1 How	-0.124939
-0.505062	four multiplications. How	-0.124939
-0.143403	16 3.1 How	-0.124939
-0.143403	consumers 3.1 How	-0.124939
-0.558388	a dependency chain.	-0.124939
-0.600492	long dependency chain.	-0.124939
-0.266909	loop-carried dependency chain.	-0.124939
-0.596197	File access Reading	-0.124939
-1.601384	the program. Reading	-0.124939
-0.575605	random access. Reading	-0.124939
-0.971266	lookup tables Reading	-0.124939
-0.505022	= 0x1C. Reading	-0.124939
-0.463641	from 0x4700. Reading	-0.124939
-2.076326	where the compilation	-0.124939
-0.601350	interpretation or compilation	-0.124939
-0.872990	that requires compilation	-0.124939
-0.473701	and just-in-time compilation	-0.124939
-0.336328	on just-in-time compilation	-0.124939
-0.473701	use just-in-time compilation	-0.124939
-0.292958	the hot spots	-0.124939
-0.290300	any hot spots	-0.124939
-0.121333	find hot spots	-0.124939
-0.290300	identifying hot spots	-0.124939
-2.678131	that the behavior	-0.124939
-1.441438	change the behavior	-0.124939
-0.601238	mimic the behavior	-0.124939
-0.902256	details. The behavior	-0.124939
-1.055250	the overflow behavior	-0.124939
-0.505062	This wasteful behavior	-0.124939
-0.601044	normal. This happens	-0.124939
-0.600040	this only happens	-0.124939
-0.898768	up, which happens	-0.124939
-0.879373	This typically happens	-0.124939
-0.588612	at what happens	-0.124939
-0.567216	where everything happens	-0.124939
-2.761207	if the 7	-0.124939
-1.783976	byte at 7	-0.124939
-0.888657	in Windows 7	-0.124939
-0.888739	compiler versions 7	-0.124939
-0.505022	process...................................................................................................... 25 7	-0.124939
-0.358930	control tool. 7	-0.124939
-0.589788	and page 87	-0.124939
-1.949840	See page 87	-0.124939
-0.550807	from Func 87	-0.124939
-0.527313	access ............................................................................................. 87	-0.124939
-0.527313	data ......................................................................................... 87	-0.124939
-0.358945	organization ................................................................................................... 87	-0.124939
-1.293719	in vector Type	-0.124939
-0.557752	following table. Type	-0.124939
-1.231135	as follows: Type	-0.124939
-0.249837	problem. 7.11 Type	-0.124939
-0.249837	38 7.11 Type	-0.124939
-0.463659	is safer. Type	-0.124939
-1.436266	in different places	-0.124939
-1.052053	at different places	-0.124939
-0.594442	at specific places	-0.124939
-0.592991	is four places	-0.124939
-0.588022	is n places	-0.124939
-0.585178	lies r places	-0.124939
-1.084824	the stack unwinding	-0.124939
-0.176512	of stack unwinding	-0.124939
-0.451584	The stack unwinding	-0.425969
-0.683941	called stack unwinding	-0.124939
-1.827706	preferably be static,	-0.124939
-0.159640	The keyword static,	-0.726999
-0.463696	executed. Without static,	-0.124939
-0.177060	it. I am	-0.124939
-0.687476	manuals. I am	-0.124939
-0.481008	use. I am	-0.124939
-0.481008	manual, I am	-0.124939
-0.481008	microcontrollers. I am	-0.124939
-1.297694	into a leaf	-0.425969
-1.074186	called a leaf	-0.124939
-0.202604	function. A leaf	-0.425969
-0.894839	distinction between leaf	-0.124939
-1.445935	operand is evaluated	-0.124939
-1.238710	not be evaluated	-0.425969
-1.761845	parameters are evaluated	-0.124939
-0.599693	|| are evaluated	-0.124939
-2.591166	is not evaluated	-0.124939
-2.074458	able to completely	-0.124939
-2.797786	can be completely	-0.124939
-2.446206	may be completely	-0.124939
-0.601335	slow or completely	-0.124939
-0.901273	used on completely	-0.124939
-0.575689	or fail completely	-0.124939
-0.902463	again and again.	-0.124939
-0.058766	and back again.	-0.124939
-0.764695	3 breakpoint again.	-0.124939
-1.378844	availability of powerful	-0.124939
-0.600729	tools have powerful	-0.124939
-1.041953	have more powerful	-0.124939
-0.877920	even more powerful	-0.124939
-0.589293	ever more powerful	-0.124939
-0.881726	actually quite powerful	-0.124939
-2.117736	in the form	-0.602060
-1.776322	any other form	-0.124939
-0.871548	model numbers form	-0.124939
-0.869929	in binary form	-0.124939
-2.832666	it is deallocated	-0.124939
-0.379642	allocated and deallocated	-0.425969
-2.106671	they are deallocated	-0.124939
-1.686714	are also deallocated	-0.124939
-0.589614	is automatically deallocated	-0.124939
-0.588005	way three times.	-0.124939
-0.580811	irregular response times.	-0.124939
-0.550787	changed five times.	-0.124939
-0.999730	a hundred times.	-0.124939
-0.241879	at inconvenient times.	-0.124939
-1.554197	useful in 32-	-0.124939
-1.202413	mode. The 32-	-0.124939
-1.013515	compiler for 32-	-0.425969
-0.600472	versions. A 32-	-0.124939
-0.582076	libraries. Supports 32-	-0.124939
-1.873818	a and edx	-0.124939
-0.900278	ecx and edx	-0.124939
-1.378730	value in edx	-0.124939
-0.601038	adds, not edx	-0.124939
-0.567956	Induction ; edx	-0.124939
-0.567956	[esp+12] ; edx	-0.124939
-1.164265	it cannot rule	-0.124939
-0.578593	compiler cannot rule	-0.425969
-0.169346	strict aliasing rule	-0.425969
-0.550832	to completely rule	-0.124939
-0.601763	iterations in one.	-0.124939
-1.235866	a new one.	-0.124939
-0.811286	the preceding one.	-0.124939
-1.432220	the previous one.	-0.124939
-1.961142	this is permissible	-0.124939
-2.952704	can be permissible	-0.124939
-0.601548	multiplication are permissible	-0.124939
-2.295231	is not permissible	-0.124939
-1.469007	are not permissible	-0.425969
-0.600897	assume the worst	-0.124939
-1.374427	gives the worst	-0.124939
-0.203887	cover the worst	-0.425969
-1.196642	etc. The worst	-0.124939
-1.073561	critical. The worst	-0.124939
-2.357995	is the job	-0.124939
-1.760940	do the job	-0.124939
-0.600891	done the job	-0.124939
-1.441345	divide the job	-0.124939
-1.785329	the best job	-0.124939
-0.828313	the background job	-0.124939
-0.599449	commercial compilers due	-0.124939
-0.585971	be higher due	-0.124939
-0.573319	large delay due	-0.124939
-0.842169	the future due	-0.124939
-0.463641	are unstable due	-0.124939
-0.463641	some differences due	-0.124939
-1.721167	y = 1.0;	-0.124939
-0.202010	factorial = 1.0;	-0.124939
-0.591610	list[i].a = 1.0;	-0.124939
-0.591610	temp->a = 1.0;	-0.124939
-1.950904	{ return 1.0;	-0.124939
-0.601598	clients that depend	-0.124939
-0.599682	iteration should depend	-0.124939
-1.593930	it doesn't depend	-0.124939
-0.589129	factorials don't depend	-0.124939
-0.589111	workaround methods depend	-0.124939
-0.575630	hardware-related details depend	-0.124939
-2.357995	is the biggest	-0.124939
-1.442501	fit the biggest	-0.124939
-0.203886	Finding the biggest	-0.425969
-0.601653	compact. The biggest	-0.124939
-1.533661	// Define biggest	-0.124939
-0.589142	looking name ?Func@@YAXQAHAAH@Z	-0.124939
-0.152951	PUBLIC ?Func@@YAXQAHAAH@Z ?Func@@YAXQAHAAH@Z	-0.124939
-0.764667	ret ALIGN ?Func@@YAXQAHAAH@Z	-0.124939
-0.065809	4 PUBLIC ?Func@@YAXQAHAAH@Z	-0.425969
-2.145199	because it defines	-0.124939
-1.436105	programming language defines	-0.124939
-1.102972	definition language defines	-0.124939
-0.589160	type __m128i defines	-0.124939
-0.463659	type __m128 defines	-0.124939
-0.358945	type __m128d defines	-0.124939
-0.933398	do not overlap.	-0.124939
-1.775795	and b overlap.	-0.124939
-0.563074	ranges now overlap.	-0.124939
-0.175452	Supports parallel processing,	-0.425969
-0.527341	and video processing,	-0.124939
-0.527313	are image processing,	-0.124939
-0.505042	processing, signal processing,	-0.124939
-0.463659	processing, sound processing,	-0.124939
-0.876719	} void SelectAddMul(short	-0.124939
-0.494409	branch void SelectAddMul(short	-0.124939
-0.494409	classes void SelectAddMul(short	-0.124939
-1.052810	inline void SelectAddMul(short	-0.124939
-0.494409	x);} void SelectAddMul(short	-0.124939
-0.494409	vectorized: void SelectAddMul(short	-0.124939
-0.601939	disturb the users	-0.124939
-0.869251	that many users	-0.124939
-0.584811	by many users	-0.124939
-1.066874	for software users	-0.124939
-0.588073	than end users	-0.124939
-0.583219	many computer users	-0.124939
-0.601721	(YMM), and soon	-0.124939
-0.601099	invalid as soon	-0.124939
-1.505040	you will soon	-0.124939
-0.594356	manual will soon	-0.124939
-0.541245	for Basic soon	-0.124939
-0.527313	5). As soon	-0.124939
-1.802607	using a six	-0.124939
-1.073178	contains only six	-0.124939
-2.064999	it takes six	-0.124939
-2.021078	the first six	-0.124939
-0.726859	is approximately six	-0.124939
-0.450718	are approximately six	-0.124939
-0.573337	150 16 Testing	-0.124939
-0.573337	15.1c). 16 Testing	-0.124939
-0.594486	Testing speed Testing	-0.124939
-0.358945	Example 14.7b. Testing	-0.124939
-0.358945	Example 14.7a. Testing	-0.124939
-0.358945	is rare. Testing	-0.124939
-0.778709	code. In general,	-0.124939
-0.535432	loop. In general,	-0.124939
-0.535432	fast. In general,	-0.124939
-0.535432	condition. In general,	-0.124939
-0.535432	counterparts. In general,	-0.124939
-0.535432	generators. In general,	-0.124939
-2.212958	is to roll	-0.124939
-1.939816	way to roll	-0.124939
-1.575696	useful to roll	-0.124939
-2.050313	want to roll	-0.124939
-1.851264	advantageous to roll	-0.124939
-0.895703	when we roll	-0.124939
-1.661698	intrinsic functions (i.e.	-0.124939
-1.285576	do so (i.e.	-0.124939
-1.067356	global variables (i.e.	-0.124939
-1.981310	of 2 (i.e.	-0.124939
-0.881768	return addresses (i.e.	-0.124939
-1.089782	same module (i.e.	-0.124939
-0.902387	ecx and edx,	-0.124939
-1.078263	is in edx,	-0.124939
-0.597341	eax, 8 edx,	-0.124939
-0.567166	eax, eax edx,	-0.124939
-0.557732	2:8+esp eax, edx,	-0.124939
-0.505022	edx, ecx, edx,	-0.124939
-0.804761	implementations of C++,	-0.124939
-0.598608	checking In C++,	-0.124939
-0.573300	language, e.g. C++,	-0.124939
-0.527341	include C, C++,	-0.124939
-0.358945	C#, managed C++,	-0.124939
-0.118262	LoadVector(cc + i);	-0.602060
-0.118262	LoadVector(bb + i);	-0.602060
-1.555360	members of mixed	-0.124939
-2.006093	cannot be mixed	-0.124939
-1.281976	objects have mixed	-0.124939
-1.060127	operands have mixed	-0.124939
-0.600472	optimization. A mixed	-0.124939
-0.463659	application integration, mixed	-0.124939
-0.601309	Or, if protection	-0.124939
-1.922863	and other protection	-0.124939
-0.691120	a copy protection	-0.124939
-0.483269	Some copy protection	-0.124939
-0.483269	Most copy protection	-0.124939
-0.483269	Many copy protection	-0.124939
-1.629333	the loop counter.	-0.124939
-1.045258	simple integer counter.	-0.124939
-0.590455	additional integer counter.	-0.124939
-0.663859	time stamp counter.	-0.124939
-2.761794	to the structure.	-0.124939
-2.609231	of a structure.	-0.124939
-2.396715	to a structure.	-0.124939
-0.804207	class or structure.	-0.124939
-0.600262	desired program structure.	-0.124939
-1.078569	int is 4.	-0.124939
-0.910867	a Pentium 4.	-0.124939
-0.431138	Intel Pentium 4.	-0.124939
-0.609837	old Pentium 4.	-0.124939
-0.358959	for Linux) 4.	-0.124939
-0.358959	compiler makers. 4.	-0.124939
-0.601615	computer for security	-0.124939
-0.598510	programs where security	-0.124939
-0.592992	is another security	-0.124939
-1.677598	the above security	-0.124939
-0.358930	a compelling security	-0.124939
-0.358930	third party security	-0.124939
-2.466304	number of branches.	-0.124939
-1.203066	calls and branches.	-0.124939
-1.492761	no other branches.	-0.124939
-0.598473	too many branches.	-0.124939
-0.587983	as three branches.	-0.124939
-0.726785	other nearby branches.	-0.124939
-1.168980	int 128 Is16vec8	-0.124939
-0.877615	* c; Is16vec8	-0.124939
-1.010069	vector c: Is16vec8	-0.124939
-1.010069	vector b: Is16vec8	-0.124939
-0.902948	of (0,0,0,0,0,0,0,0) Is16vec8	-0.124939
-0.902948	of (2,2,2,2,2,2,2,2) Is16vec8	-0.124939
-1.187873	of CPU cores.	-0.124939
-0.576947	different CPU cores.	-0.124939
-1.229667	multiple CPU cores.	-0.124939
-1.007819	between CPU cores.	-0.124939
-1.425287	with multiple cores.	-0.124939
-0.594672	multiple processor cores.	-0.124939
-1.445981	care of communication	-0.124939
-1.196339	methods for communication	-0.124939
-1.291325	needed for communication	-0.124939
-2.294695	is that communication	-0.124939
-0.600144	parallelism because communication	-0.124939
-0.594492	of necessary communication	-0.124939
-1.583053	only for avoiding	-0.124939
-1.196417	methods for avoiding	-0.124939
-1.177428	and by avoiding	-0.124939
-0.595232	error by avoiding	-0.124939
-0.595232	save by avoiding	-0.124939
-1.672626	not always avoiding	-0.124939
-1.679532	reference to anything	-0.124939
-1.739091	rely on anything	-0.124939
-1.734524	time than anything	-0.124939
-1.659755	to optimize anything	-0.124939
-0.588651	not cost anything	-0.124939
-0.726785	not alias anything	-0.124939
-0.901814	#endif // INSTRSET	-0.124939
-0.577605	preprocessing macro INSTRSET	-0.124939
-0.435065	} #if INSTRSET	-0.124939
-0.435065	set #if INSTRSET	-0.124939
-0.143403	SelectAddMul_SSE41 #elif INSTRSET	-0.124939
-0.143403	SelectAddMul_SSE2 #elif INSTRSET	-0.124939
-0.596197	Memory access Accessing	-0.124939
-1.024925	smart pointer. Accessing	-0.124939
-1.027673	back again. Accessing	-0.124939
-0.527291	or structures. Accessing	-0.124939
-0.505022	more compact. Accessing	-0.124939
-0.358930	variable. Efficiency Accessing	-0.124939
-1.641823	variables and internal	-0.124939
-1.491512	used for internal	-0.425969
-0.896185	PLT for internal	-0.124939
-0.601200	libraries with internal	-0.124939
-0.596224	can access internal	-0.124939
-1.453244	or by type-casting	-0.124939
-0.595222	class by type-casting	-0.124939
-0.595222	type by type-casting	-0.124939
-0.900129	of when type-casting	-0.124939
-0.567196	C- style type-casting	-0.124939
-0.764627	the C-style type-casting	-0.124939
-1.758104	by the requirements	-0.425969
-2.396029	with the requirements	-0.124939
-0.594293	process. These requirements	-0.124939
-1.413480	turn off requirements	-0.124939
-1.009617	the alignment requirements	-0.124939
-2.761794	to the profiler.	-0.124939
-1.497620	not a profiler.	-0.124939
-1.791249	using a profiler.	-0.124939
-1.056776	CPU- specific profiler.	-0.124939
-0.358959	a ready-made profiler.	-0.124939
-1.379609	Therefore, the __fastcall	-0.124939
-0.601300	Fastcall function __fastcall	-0.124939
-1.273415	The keyword __fastcall	-0.124939
-0.358945	__attribute(( fastcall)) __fastcall	-0.124939
-0.358945	function calling. __fastcall	-0.124939
-1.203224	cause a loss	-0.124939
-1.502783	overflow and loss	-0.124939
-1.077011	overflow or loss	-0.124939
-1.189252	hardly any loss	-0.124939
-0.595235	worry about loss	-0.124939
-1.776084	any other cleanup	-0.124939
-1.290064	of all cleanup	-0.124939
-1.396757	the necessary cleanup	-0.124939
-0.878544	of handling cleanup	-0.124939
-0.584266	that require cleanup	-0.124939
-0.314785	87 9.3 Functions	-0.124939
-0.314785	CPUs". 9.3 Functions	-0.124939
-0.249852	45 7.14 Functions	-0.124939
-0.249852	big. 7.14 Functions	-0.124939
-0.358973	v. 10.1.020. Functions	-0.124939
-0.595427	to error handling.	-0.124939
-0.507308	and exception handling.	-0.124939
-0.507308	on exception handling.	-0.124939
-0.730601	no exception handling.	-0.124939
-0.730601	structured exception handling.	-0.124939
-1.199909	C++ and Fortran	-0.124939
-0.600652	Pascal and Fortran	-0.124939
-0.902497	(except in Fortran	-0.124939
-0.358959	as versatile. Fortran	-0.124939
-0.358959	D, Pascal, Fortran	-0.124939
-3.205415	of the increment	-0.124939
-1.200036	CPU to increment	-0.124939
-0.601035	simply to increment	-0.124939
-2.327468	the loop increment	-0.124939
-0.889579	here about increment	-0.124939
-1.299527	libraries and drivers	-0.124939
-0.283282	and device drivers	-0.124939
-0.283282	in device drivers	-0.124939
-0.283282	64-bit device drivers	-0.124939
-0.283282	Critical device drivers	-0.124939
-0.806946	important to economize	-0.425969
-1.078279	library and economize	-0.124939
-1.742843	compile time. Templates	-0.124939
-1.246989	template parameter. Templates	-0.124939
-0.818642	= 10; Templates	-0.124939
-0.505042	cases. 7.28 Templates	-0.124939
-0.358945	template. 57 Templates	-0.124939
-0.463702	in row 28	-0.124939
-0.463702	from row 28	-0.124939
-0.550845	in column 28	-0.124939
-0.391163	with column 28	-0.124939
-0.463696	64-bit systems). 28	-0.124939
-0.601830	three to seven	-0.124939
-0.597724	expressions on seven	-0.124939
-0.597724	experiments on seven	-0.124939
-1.054638	to cause seven	-0.124939
-0.567219	of approximately seven	-0.124939
-2.953773	can be turned	-0.124939
-0.899459	STL vector turned	-0.124939
-0.458559	optimization options turned	-0.301030
-1.600959	order of inheritance	-0.124939
-0.849527	to multiple inheritance	-0.124939
-0.574453	as multiple inheritance	-0.124939
-0.849527	avoid multiple inheritance	-0.124939
-0.541288	7.38a. Multiple inheritance	-0.124939
-1.953176	way to overcome	-0.124939
-1.120623	how to overcome	-0.124939
-2.954844	can be overcome	-0.124939
-0.875098	difficult to maintain.	-0.124939
-1.367391	easier to maintain.	-0.124939
-0.601759	debug and maintain.	-0.124939
-0.725196	up to fourteen	-0.301030
-0.855398	systems and fourteen	-0.124939
-1.078721	Add to 122	-0.124939
-1.511142	See page 122	-0.425969
-0.358959	dispatch strategies........................................................................................ 122	-0.124939
-0.358959	instruction sets........................... 122	-0.124939
-0.584604	and time consuming.	-0.124939
-0.584604	are time consuming.	-0.124939
-0.584604	very time consuming.	-0.124939
-0.584604	particularly time consuming.	-0.124939
-0.505118	very time- consuming.	-0.124939
-0.601939	justify the method.	-0.124939
-1.858119	of this method.	-0.124939
-0.893761	every call method.	-0.124939
-0.596855	complicated template method.	-0.124939
-0.595623	Use simple method.	-0.124939
-1.704080	sake of backwards	-0.124939
-1.376200	sequence of backwards	-0.124939
-2.190893	are not backwards	-0.124939
-1.879760	are accessed backwards	-0.124939
-0.505091	the track backwards	-0.124939
-0.902849	mirror the remote	-0.124939
-2.339615	with a remote	-0.124939
-0.902615	Access to remote	-0.124939
-0.601150	Files on remote	-0.124939
-0.358945	automatic updates, remote	-0.124939
-2.282091	such as int,	-0.124939
-0.600961	declare an int,	-0.124939
-0.895067	2 2 int,	-0.124939
-0.571805	1 short int,	-0.124939
-0.571805	char, short int,	-0.124939
-0.601734	c2 and bc	-0.124939
-1.293758	in vector bc	-0.124939
-0.479824	c __m128i bc	-0.425969
-0.527357	inverted bit-mask: bc	-0.124939
-0.821431	and development tools.	-0.124939
-0.498222	advanced development tools.	-0.124939
-0.498222	powerful development tools.	-0.124939
-0.510583	standardized installation tools.	-0.124939
-0.510583	individual installation tools.	-0.124939
-0.960580	in one operation.	-0.124939
-1.860864	a single operation.	-0.124939
-0.440390	a shift operation.	-0.124939
-1.958315	in the future.	-0.124939
-0.358988	more distant future.	-0.124939
-0.902849	force the swapping	-0.124939
-1.741439	we are swapping	-0.124939
-0.600569	careful when swapping	-0.124939
-0.600335	excessive memory swapping	-0.124939
-0.563034	disk. Memory swapping	-0.124939
-2.467611	when the AVX512	-0.124939
-0.161684	8 512 AVX512	-0.124939
-0.161684	16 512 AVX512	-0.124939
-2.573306	is a considerable	-0.124939
-1.291536	takes a considerable	-0.124939
-0.898317	give a considerable	-0.124939
-0.599667	course a considerable	-0.124939
-0.600518	called. A considerable	-0.124939
-1.078820	remove the memset	-0.124939
-1.939792	use of memset	-0.124939
-1.503491	calls to memset	-0.124939
-0.601604	report that memset	-0.124939
-1.706168	the functions memset	-0.124939
-2.650470	to the rest	-0.124939
-2.476809	and the rest	-0.124939
-2.894008	in the rest	-0.124939
-2.628974	that the rest	-0.124939
-2.394379	by the rest	-0.124939
-1.250851	code version on,	-0.124939
-0.869002	advanced version on,	-0.124939
-1.237779	is running on,	-0.124939
-0.266247	options turned on,	-0.124939
-0.599392	example using Agner's	-0.124939
-1.643540	vector classes Agner's	-0.124939
-0.659897	page 107). Agner's	-0.124939
-0.358945	option -mveclibabi=acml. Agner's	-0.124939
-0.358945	amd_vrs4_expf amd_vrd2_exp Agner's	-0.124939
-2.566476	and the Digital	-0.124939
-0.601721	Borland and Digital	-0.124939
-0.527313	Codeplay Watcom Digital	-0.124939
-0.358945	7.1-4, 2008. Digital	-0.124939
-0.358945	vector intrinsics. Digital	-0.124939
-1.679917	about the third	-0.124939
-1.919817	and a third	-0.124939
-1.597805	code. The third	-0.124939
-0.600916	reliable than third	-0.124939
-0.358945	} 59 third	-0.124939
-0.591968	objects // Roll	-0.124939
-0.883136	c; // Roll	-0.124939
-0.202083	_mm_set1_epi16(2); // Roll	-0.425969
-0.591968	two(2,2,2,2,2,2,2,2); // Roll	-0.124939
-1.192133	test // Critical	-0.124939
-1.192133	b; // Critical	-0.124939
-0.505062	parts only. Critical	-0.124939
-0.726852	CPU brand. Critical	-0.124939
-0.505062	or C++. Critical	-0.124939
-0.052851	manual 5: "Calling	-0.823909
-1.504234	However, the CISC	-0.124939
-0.601734	RISC and CISC	-0.124939
-0.899294	resource. The CISC	-0.124939
-0.600158	ratio. The CISC	-0.124939
-1.296746	processors with CISC	-0.124939
-0.601721	units, and 22	-0.124939
-0.527313	throughput ....................................................................................... 22	-0.124939
-0.463659	chains ................................................................................................ 22	-0.124939
-0.358945	Context switches..................................................................................................... 22	-0.124939
-0.358945	Memory access....................................................................................................... 22	-0.124939
-0.902256	1. The AND	-0.124939
-0.897055	zero); // AND	-0.124939
-0.599032	110 // AND	-0.124939
-1.708097	the two AND	-0.124939
-1.152919	The bitwise AND	-0.124939
-0.504967	worth the effort	-0.425969
-1.206895	the optimization effort	-0.124939
-0.195047	your optimization effort	-0.425969
-1.882720	floating point numbers.	-0.124939
-0.583230	for negative numbers.	-0.124939
-1.082862	a thousand numbers.	-0.124939
-0.463677	use denormal numbers.	-0.124939
-0.900126	becoming more popular	-0.124939
-0.600472	tools. A popular	-0.124939
-0.599449	8 most popular	-0.124939
-0.597374	was less popular	-0.124939
-0.505042	tools. One popular	-0.124939
-0.901831	8; // SIZE	-0.124939
-1.110235	const int SIZE	-0.602060
-0.588049	256 && SIZE	-0.124939
-0.960830	identification (RTTI) Runtime	-0.124939
-0.249845	53 7.21 Runtime	-0.124939
-0.249845	effort. 7.21 Runtime	-0.124939
-0.358959	Example 7.43a. Runtime	-0.124939
-0.358959	page 73. Runtime	-0.124939
-0.596110	certain programming principles	-0.124939
-0.869892	The storage principles	-0.124939
-1.152627	the advanced principles	-0.124939
-0.583212	two main principles	-0.124939
-0.358945	software engineering principles	-0.124939
-1.769178	number of context	-0.425969
-0.601643	jobs. The context	-0.124939
-0.600488	switches A context	-0.124939
-0.358959	program. Frequent context	-0.124939
-0.601300	to function names.	-0.124939
-0.895596	and variable names.	-0.124939
-1.457882	in assembly names.	-0.124939
-0.594269	or common names.	-0.124939
-0.358945	of identifier names.	-0.124939
-1.503738	ways of reducing	-0.124939
-2.199445	used for reducing	-0.124939
-0.899617	better at reducing	-0.124939
-0.596407	operations without reducing	-0.124939
-1.145678	are actually reducing	-0.124939
-2.041077	that can benefit	-0.124939
-0.897438	registers can benefit	-0.124939
-0.899946	memory will benefit	-0.124939
-0.450752	that could benefit	-0.124939
-0.450752	code could benefit	-0.124939
-1.238710	not be worth	-0.425969
-1.230804	is rarely worth	-0.124939
-0.570580	Another alternative worth	-0.124939
-0.957995	is hardly worth	-0.124939
-1.906382	Gnu compiler manual.	-0.124939
-1.160063	of this manual.	-0.124939
-0.597239	this first manual.	-0.124939
-0.789011	the present manual.	-0.124939
-0.902193	operator that specifies	-0.124939
-1.473199	of software specifies	-0.124939
-0.194044	C/C++ standard specifies	-0.124939
-0.875508	volatile keyword specifies	-0.124939
-1.078203	used and searching	-0.124939
-0.600658	use time searching	-0.124939
-0.588050	for string searching	-0.124939
-0.505097	solution. Is searching	-0.124939
-0.358984	tree. Is searching	-0.124939
-1.274468	register stack versus	-0.124939
-0.806063	references Pointers versus	-0.124939
-0.143411	14.11 Static versus	-0.425969
-0.505062	overflow. Signed versus	-0.124939
-0.115625	and constant propagation	-0.124939
-0.524661	enable constant propagation	-0.124939
-0.527401	Microsoft Constant propagation	-0.124939
-1.771267	do the reduction	-0.124939
-2.067768	where the reduction	-0.124939
-1.795053	a particular reduction	-0.124939
-0.249852	call. Algebraic reduction	-0.124939
-0.249852	reductions. Algebraic reduction	-0.124939
-0.599691	taking cache effects	-0.124939
-0.515422	the negative effects	-0.124939
-0.515422	The negative effects	-0.124939
-0.583230	The positive effects	-0.124939
-0.358959	has side effects	-0.124939
-0.239118	y + 1.;	-0.301030
-0.555612	Func1(x) + 1.;	-0.124939
-0.555612	y.a + 1.;	-0.124939
-0.601663	analysis The live	-0.124939
-0.463720	when their live	-0.124939
-0.107205	because their live	-0.602060
-1.296347	access a multidimensional	-0.124939
-0.601119	Is a multidimensional	-0.124939
-0.203464	matrix or multidimensional	-0.425969
-0.600503	sizeof(list)); A multidimensional	-0.124939
-1.278252	takes to install	-0.425969
-0.600250	hours to install	-0.124939
-0.594998	user must install	-0.124939
-0.463696	saying please install	-0.124939
-1.503503	terms of development,	-0.124939
-0.899478	during program development,	-0.124939
-1.290946	of CPU development,	-0.124939
-1.473076	of software development,	-0.124939
-0.358945	easy GUI development,	-0.124939
-2.549475	on the strict	-0.124939
-0.601592	violates the strict	-0.124939
-1.995260	have a strict	-0.124939
-0.601635	requirements for strict	-0.124939
-1.477826	are less strict	-0.124939
-0.548696	{ for (c	-0.602060
-0.597101	rows for (c	-0.124939
-0.599151	b) + (c	-0.124939
-0.875490	146 below. Position-independent	-0.124939
-0.580794	loader. 2. Position-independent	-0.124939
-0.659926	by default. Position-independent	-0.124939
-0.143403	146 14.12 Position-independent	-0.124939
-0.143403	147 14.12 Position-independent	-0.124939
-1.078569	parallelism is obvious	-0.124939
-2.446677	may be obvious	-0.124939
-1.747279	would be obvious	-0.124939
-1.594425	be an obvious	-0.124939
-1.069123	do such obvious	-0.124939
-0.601160	diagonal is swapped	-0.124939
-0.601160	matrix[r][c] is swapped	-0.124939
-2.537215	may be swapped	-0.124939
-1.077647	b are swapped	-0.124939
-1.187167	or even swapped	-0.124939
-0.541245	resources .......................................................................................... 21	-0.124939
-0.527313	access ...................................................................................................... 21	-0.124939
-0.505078	heavily loaded. 21	-0.124939
-0.505042	databases ....................................................................................................... 21	-0.124939
-0.358945	Graphics ................................................................................................................. 21	-0.124939
-1.065709	the biggest vectors:	-0.124939
-0.015542	the eight-element vectors:	-0.726999
-1.077900	cycle. The OR	-0.124939
-0.901796	bc); // OR	-0.124939
-0.594641	zero. An OR	-0.124939
-1.027969	the bitwise OR	-0.124939
-0.358945	the EXCLUSIVE OR	-0.124939
-2.161231	{ // Array	-0.124939
-0.596665	100; // Array	-0.124939
-0.596665	256; // Array	-0.124939
-0.563064	large arrays. Array	-0.124939
-0.358973	Example 7.15a. Array	-0.124939
-0.898988	all other processes	-0.124939
-1.419075	between multiple processes	-0.124939
-0.586477	Run multiple processes	-0.124939
-0.598504	run many processes	-0.124939
-0.563041	of background processes	-0.124939
-1.503466	language is portable	-0.124939
-2.102304	for a portable	-0.124939
-1.936993	not be portable	-0.124939
-2.590455	is not portable	-0.124939
-0.835937	is fully portable	-0.124939
-2.003003	likely to consume	-0.124939
-0.601043	scanners to consume	-0.124939
-0.599225	framework can consume	-0.124939
-0.599225	database can consume	-0.124939
-0.600094	and functions consume	-0.124939
-0.505121	standards. Such schemes	-0.124939
-0.505121	identification. Such schemes	-0.124939
-0.152960	copy protection schemes	-0.602060
-0.597073	40 - 80	-0.124939
-0.597073	(27 - 80	-0.124939
-2.207566	See page 80	-0.124939
-0.593051	non-member functions. 80	-0.124939
-0.877676	simply put 80	-0.124939
-0.842201	and references. Arrays	-0.124939
-0.143403	38 7.10 Arrays	-0.124939
-0.143403	93. 7.10 Arrays	-0.124939
-0.659926	allocated dynamically. Arrays	-0.124939
-0.358959	unexpected behaviors. Arrays	-0.124939
-0.902220	it for lists	-0.124939
-0.601335	criteria or lists	-0.124939
-0.894970	following table lists	-0.124939
-0.586745	than linked lists	-0.124939
-0.358945	casting. Linked lists	-0.124939
-2.117839	in the event	-0.301030
-1.174572	the specific event	-0.124939
-0.358973	in meaningless event	-0.124939
-2.714764	in a computer.	-0.124939
-2.084101	on a computer.	-0.124939
-1.415763	Pentium 4 computer.	-0.124939
-0.593027	on another computer.	-0.124939
-0.659926	big mainframe computer.	-0.124939
-0.901252	bit code Static	-0.124939
-0.866189	linking are: Static	-0.124939
-0.570580	can not. Static	-0.124939
-0.249845	145 14.11 Static	-0.124939
-0.249845	limitation). 14.11 Static	-0.124939
-0.601863	Virtualization is becoming	-0.124939
-1.687397	compilers are becoming	-0.124939
-1.418552	processors are becoming	-0.124939
-0.894723	devices are becoming	-0.124939
-1.664377	is therefore becoming	-0.124939
-3.068403	in the select	-0.124939
-2.311209	possible to select	-0.124939
-1.195690	branches that select	-0.124939
-0.599937	directives that select	-0.124939
-1.061779	will always select	-0.124939
-2.655042	of a list,	-0.124939
-1.713897	element in list,	-0.124939
-2.281579	such as list,	-0.124939
-0.866177	A negative list,	-0.124939
-0.358945	same queue, list,	-0.124939
-0.600478	instruction is executed	-0.124939
-1.370806	branch is executed	-0.124939
-1.074453	latter is executed	-0.124939
-1.969865	cannot be executed	-0.124939
-1.195154	often be executed	-0.124939
-2.508029	on the actual	-0.124939
-2.262239	than the actual	-0.124939
-2.179228	at the actual	-0.124939
-1.442525	fit the actual	-0.124939
-1.156514	by their actual	-0.124939
-1.709266	In this case,	-0.124939
-0.572038	the latter case,	-0.124939
-0.971290	the general case,	-0.124939
-0.463677	// General case,	-0.124939
-1.285047	in performance over	-0.124939
-1.002094	the advantages over	-0.124939
-0.523808	several advantages over	-0.124939
-0.527334	of alloca over	-0.124939
-0.358959	to controversies over	-0.124939
-1.635493	with a realistic	-0.425969
-1.493584	get a realistic	-0.124939
-1.371492	A more realistic	-0.124939
-0.600503	considered. A realistic	-0.124939
-1.201165	size of abc	-0.301030
-0.577651	7.13 struct abc	-0.124939
-0.358973	int c;}; abc	-0.124939
-1.072389	A is finished.	-0.124939
-0.599784	addition is finished.	-0.124939
-1.072389	iteration is finished.	-0.124939
-0.599784	compilation is finished.	-0.124939
-0.601569	loop are finished.	-0.124939
-0.580730	the other hand,	-0.124939
-0.601734	_M_IX86 and _WIN64	-0.124939
-0.597198	platform not _WIN64	-0.124939
-0.597198	_WIN64 not _WIN64	-0.124939
-1.032629	bit platform _WIN64	-0.124939
-0.659926	_WIN64 _LP64 _WIN64	-0.124939
-1.937225	takes to recover	-0.124939
-2.031089	how to recover	-0.124939
-1.366857	able to recover	-0.425969
-0.896362	made to recover	-0.124939
-2.762335	to the console	-0.124939
-2.086970	for a console	-0.124939
-2.084282	use a console	-0.124939
-0.594534	file. A console	-0.124939
-0.594534	interface. A console	-0.124939
-2.497029	of the advice	-0.124939
-1.200866	follow the advice	-0.124939
-1.077962	order. The advice	-0.124939
-1.626291	The same advice	-0.124939
-1.490103	in different ways.	-0.124939
-1.294068	in two ways.	-0.124939
-0.872437	than two ways.	-0.124939
-0.597434	sets 4 ways.	-0.124939
-0.597369	kb, 8 ways.	-0.124939
-2.633058	// Example 16.2	-0.124939
-1.452415	in example 16.2	-0.124939
-1.601596	the program. 16.2	-0.124939
-0.463677	.................................................................... 155 16.2	-0.124939
-1.962224	inside the pow	-0.124939
-0.601721	sqrt and pow	-0.124939
-1.991532	} The pow	-0.124939
-2.082710	faster than pow	-0.124939
-0.358945	like sqrt, pow	-0.124939
-1.078569	microprocessors is split	-0.124939
-1.998952	need to split	-0.124939
-1.871679	advantageous to split	-0.124939
-2.391918	should be split	-0.124939
-0.590871	operation was split	-0.124939
-0.902066	factors are generated	-0.124939
-1.955364	the code generated	-0.425969
-0.591275	Object files generated	-0.124939
-0.463677	the comments generated	-0.124939
-0.504610	string is created	-0.425969
-1.638985	program that created	-0.124939
-0.601350	declared or created	-0.124939
-0.582113	be dynamically created	-0.124939
-2.039340	be a hundred	-0.124939
-1.922609	than a hundred	-0.124939
-0.898317	memory a hundred	-0.124939
-0.599667	*p+2 a hundred	-0.124939
-0.594334	but several hundred	-0.124939
-1.203413	time of 250	-0.124939
-0.601374	ns = 250	-0.124939
-1.375671	library function 250	-0.124939
-1.946934	more than 250	-0.124939
-0.358945	Certainly not! 250	-0.124939
-1.977732	lot of computing	-0.124939
-1.502871	memory and computing	-0.124939
-1.196378	register for computing	-0.124939
-1.295211	libraries for computing	-0.124939
-0.597385	have less computing	-0.124939
-1.940026	instead of pointers,	-0.124939
-1.475055	accessed through pointers,	-0.124939
-0.570551	violations, invalid pointers,	-0.124939
-0.557769	storage, far pointers,	-0.124939
-0.726819	function parameters, pointers,	-0.124939
-1.995781	way to limit	-0.124939
-0.591624	no certain limit	-0.124939
-0.100311	reasonable upper limit	-0.124939
-0.230978	not-too-big upper limit	-0.124939
-0.601721	page and 90	-0.124939
-2.206842	See page 90	-0.124939
-0.582076	Linux syntax 90	-0.124939
-0.527313	data ...................................................................................................... 90	-0.124939
-0.505042	allocation ...................................................................................... 90	-0.124939
-1.746056	needs to follow	-0.124939
-2.088118	if you follow	-0.124939
-0.600450	vectorization then follow	-0.124939
-1.425455	cache lines follow	-0.124939
-0.505042	case labels follow	-0.124939
-1.078509	called a loop-carried	-0.124939
-2.249293	is no loop-carried	-0.124939
-0.896645	has two loop-carried	-0.124939
-0.580742	are: No loop-carried	-0.124939
-0.573357	chains, especially loop-carried	-0.124939
-2.411824	a function library,	-0.124939
-1.169564	long vector library,	-0.124939
-1.266474	short vector library,	-0.124939
-1.633458	vector class library,	-0.124939
-1.361183	a static library,	-0.124939
-0.601951	ago, the recommendation	-0.124939
-0.978086	no specific recommendation	-0.124939
-0.978086	any specific recommendation	-0.124939
-0.463714	instructions. My recommendation	-0.124939
-0.463714	level. My recommendation	-0.124939
-1.781006	memory allocation Objects	-0.124939
-1.282905	of 2. Objects	-0.124939
-1.182459	is needed. Objects	-0.124939
-0.567173	often inefficient. Objects	-0.124939
-0.835912	garbage collection. Objects	-0.124939
-2.663990	is a compromise	-0.124939
-2.071214	be a compromise	-0.124939
-1.940357	necessary to compromise	-0.124939
-1.404529	that doesn't compromise	-0.124939
-0.659926	a viable compromise	-0.124939
-0.051830	the Digital Mars	-0.124939
-0.051830	and Digital Mars	-0.124939
-0.051830	Watcom Digital Mars	-0.124939
-0.051830	2008. Digital Mars	-0.124939
-0.051830	intrinsics. Digital Mars	-0.124939
-1.437712	value is already	-0.124939
-1.199371	string is already	-0.124939
-0.600471	CPU-type is already	-0.124939
-1.639051	program that already	-0.124939
-1.371265	that has already	-0.124939
-2.330862	there is nothing	-0.124939
-2.177361	There is nothing	-0.124939
-1.073767	set has nothing	-0.124939
-0.600149	loop because nothing	-0.124939
-0.599457	// do nothing	-0.124939
-0.887769	: c (a&&b)	-0.124939
-0.463659	(a+b)+c=a+(b+c) --xx----- (a&&b)	-0.124939
-0.463659	(a&&b)||(a&&!b)=a x--xx---- (a&&b)	-0.124939
-0.358945	x-xx----- 75 (a&&b)	-0.124939
-0.358945	= a&&b (a&&b)	-0.124939
-1.446435	calculating the physical	-0.124939
-2.466638	number of physical	-0.124939
-1.077557	limited by physical	-0.124939
-1.933767	a new physical	-0.124939
-0.592991	has four physical	-0.124939
-0.589757	i; if ((unsigned	-0.124939
-0.878823	}; if ((unsigned	-0.124939
-0.589757	479001600}; if ((unsigned	-0.124939
-0.589757	14.5b if ((unsigned	-0.124939
-0.589757	14.4b if ((unsigned	-0.124939
-2.712222	- - xxxxxxxxx	-0.124939
-0.463659	----x---x a/1=a xxxxxxxxx	-0.124939
-0.463659	x-xxxx-x- x-xxxxxxx xxxxxxxxx	-0.124939
-0.358945	x-xxxx--x Constantfolding xxxxxxxxx	-0.124939
-0.358945	xxxxxxxxx xxxxxxx-x xxxxxxxxx	-0.124939
-1.591608	to have constructors	-0.124939
-1.193009	before any constructors	-0.124939
-0.860789	The copy constructors	-0.124939
-0.516478	for copy constructors	-0.124939
-0.746046	no copy constructors	-0.124939
-1.203304	frequency is increased	-0.124939
-2.257490	can be increased	-0.124939
-1.177037	of CPUs increased	-0.124939
-1.766463	has been increased	-0.124939
-0.598713	advanced C++ programming,	-0.124939
-0.891208	of system programming,	-0.124939
-1.572854	assembly language programming,	-0.124939
-0.541268	language 11 programming,	-0.124939
-0.358945	and object-oriented programming,	-0.124939
-1.776798	any other factor.	-0.124939
-0.092566	the unroll factor.	-0.124939
-0.524148	loop unroll factor.	-0.124939
-0.557757	are available, i.e.	-0.124939
-0.143407	by 16, i.e.	-0.425969
-0.358959	is taken, i.e.	-0.124939
-0.358959	quadratic matrix, i.e.	-0.124939
-1.203304	f is nonzero	-0.124939
-0.601843	multiply a nonzero	-0.124939
-1.379072	values of nonzero	-0.124939
-1.480047	check if nonzero	-0.124939
-0.598384	1 if nonzero	-0.124939
-1.078474	cause of unacceptably	-0.124939
-0.601247	frustrated by unacceptably	-0.124939
-0.600719	sometimes have unacceptably	-0.124939
-0.858712	and sometimes unacceptably	-0.124939
-0.505042	might experience unacceptably	-0.124939
-1.251152	for each process.	-0.124939
-1.307927	software development process.	-0.124939
-0.871538	function dispatch process.	-0.124939
-0.818588	the update process.	-0.124939
-0.901796	counter // Calculate	-0.124939
-0.463659	Example 15.1c. Calculate	-0.124939
-0.463659	Example 15.1b. Calculate	-0.124939
-0.358945	Example 8.23b. Calculate	-0.124939
-0.358945	Example 15.1a. Calculate	-0.124939
-0.601412	14.21. // Only	-0.124939
-0.577593	LIBM library. Only	-0.124939
-0.563034	executable file. Only	-0.124939
-1.185937	cache line. Only	-0.124939
-0.764640	of ebx. Only	-0.124939
-2.321290	that it adds	-0.124939
-1.377179	This function adds	-0.124939
-0.586696	temp++ actually adds	-0.124939
-1.308610	type identification adds	-0.124939
-0.358945	sar ebx,1 adds	-0.124939
-0.119538	void test ()	-0.602060
-0.550845	void Func ()	-0.124939
-0.463696	void CriticalInnerFunction ()	-0.124939
-0.901796	slow // Division	-0.124939
-0.577632	these calculations. Division	-0.124939
-1.517534	clock cycles. Division	-0.124939
-0.982055	much faster. Division	-0.124939
-0.358945	it matters: Division	-0.124939
-3.205415	of the pitfalls	-0.124939
-0.203739	16.2 The pitfalls	-0.425969
-1.490907	most common pitfalls	-0.124939
-1.805101	a few pitfalls	-0.124939
-2.030263	a program package	-0.124939
-1.385514	the software package	-0.124939
-0.568109	a software package	-0.124939
-2.304478	than the equivalent	-0.124939
-1.446105	operator is equivalent	-0.124939
-1.202413	cases. The equivalent	-0.124939
-1.202032	directives are equivalent	-0.124939
-0.592212	at doing equivalent	-0.124939
-1.894488	important to understand	-0.124939
-1.813000	difficult to understand	-0.124939
-1.371235	easier to understand	-0.124939
-1.078241	read and understand	-0.124939
-1.484161	you don't understand	-0.124939
-1.643540	vector classes Fortunately,	-0.124939
-0.541268	not all. Fortunately,	-0.124939
-0.505042	cache sizes. Fortunately,	-0.124939
-0.358945	operator less. Fortunately,	-0.124939
-0.358945	+ d.y; Fortunately,	-0.124939
-2.190202	from the command	-0.124939
-2.365751	to a command	-0.124939
-2.067243	on a command	-0.124939
-1.781761	from a command	-0.124939
-0.600503	servicing. A command	-0.124939
-0.901768	b[i] = a[i];	-0.124939
-0.598091	error return a[i];	-0.124939
-0.463234	sum += a[i];	-0.124939
-0.534291	s0 += a[i];	-0.124939
-0.601945	justifies the relatively	-0.124939
-1.078569	condition is relatively	-0.124939
-2.655551	of a relatively	-0.124939
-1.430852	which are relatively	-0.124939
-0.898368	Branches are relatively	-0.124939
-1.184024	a high priority.	-0.124939
-0.538956	has high priority.	-0.124939
-0.639777	with low priority.	-0.124939
-0.450746	got low priority.	-0.124939
-0.828331	with lower priority.	-0.124939
-1.693443	of data files.	-0.124939
-0.896914	or library files.	-0.124939
-1.305179	the source files.	-0.124939
-0.575614	reading disk files.	-0.124939
-0.570601	and header files.	-0.124939
-2.255812	code is inefficient,	-0.124939
-2.502298	This is inefficient,	-0.124939
-1.835604	method is inefficient,	-0.124939
-1.425497	is quite inefficient,	-0.124939
-0.835998	is extremely inefficient,	-0.124939
-1.203638	follow the guidelines	-0.124939
-1.780299	of these guidelines	-0.124939
-1.987863	The following guidelines	-0.124939
-0.595260	logic. Some guidelines	-0.124939
-0.358945	etc. Accessibility guidelines	-0.124939
-0.151586	Intel Math Kernel	-0.124939
-0.069212	Intel's Math Kernel	-0.124939
-0.069212	"Intel Math Kernel	-0.425969
-0.601870	malloc) is necessarily	-0.124939
-1.817037	is not necessarily	-0.124939
-1.921013	are not necessarily	-0.124939
-1.842646	does not necessarily	-0.124939
-0.902412	use and returns	-0.124939
-2.540129	the function returns	-0.124939
-1.367437	function which returns	-0.124939
-0.582109	For unused returns	-0.124939
-0.505078	beginning. ret returns	-0.124939
-1.759356	or more jobs	-0.124939
-0.896676	do two jobs	-0.124939
-0.358972	necessary cleanup jobs	-0.124939
-0.358972	handling cleanup jobs	-0.124939
-0.358959	for foreground jobs	-0.124939
-1.119513	the class. Data	-0.124939
-0.527369	always work. Data	-0.124939
-0.463659	data together. Data	-0.124939
-0.463659	page 87). Data	-0.124939
-0.358945	memory areas. Data	-0.124939
-0.902412	layers and frameworks	-0.124939
-0.601537	machine are frameworks	-0.124939
-0.588027	such runtime frameworks	-0.124939
-0.871506	graphical interface frameworks	-0.124939
-0.580793	running. Such frameworks	-0.124939
-1.203686	see the excessive	-0.124939
-1.255723	into an excessive	-0.124939
-0.592613	avoid an excessive	-0.124939
-0.592613	Avoid an excessive	-0.124939
-0.600639	available use excessive	-0.124939
-2.780249	it is safer	-0.124939
-2.712507	It is safer	-0.124939
-0.902066	References are safer	-0.124939
-0.600488	seconds. A safer	-0.124939
-1.664236	is therefore safer	-0.124939
-1.744336	instruction set. Aligning	-0.124939
-0.143407	exp 12.8 Aligning	-0.124939
-0.143407	119 12.8 Aligning	-0.124939
-0.143407	access. 12.9 Aligning	-0.124939
-0.143407	120 12.9 Aligning	-0.124939
-0.524155	of out-of-order execution.	-0.124939
-0.372512	no out-of-order execution.	-0.124939
-0.372512	do out-of-order execution.	-0.124939
-0.372512	prevents out-of-order execution.	-0.124939
-0.573364	of parallel execution.	-0.124939
-0.203052	1024; int a[size],	-0.124939
-0.653900	i; float a[size],	-0.124939
-0.574719	1000; float a[size],	-0.124939
-1.754694	by the latency	-0.124939
-1.859233	as the latency	-0.124939
-1.704512	between the latency	-0.124939
-1.940873	has a latency	-0.124939
-0.901043	always to specify	-0.124939
-2.019296	recommended to specify	-0.124939
-1.499118	unless you specify	-0.124939
-1.286302	if we specify	-0.124939
-1.603115	as well specify	-0.124939
-0.849144	int i; for(i=0;	-0.346788
-2.180655	from the larger	-0.124939
-2.023694	using the larger	-0.124939
-2.334534	that is larger	-0.124939
-1.995260	have a larger	-0.124939
-0.878587	it allows larger	-0.124939
-2.281579	such as -(-a)	-0.124939
-0.901047	a*(b+c) - -(-a)	-0.124939
-2.363653	- n.a. -(-a)	-0.124939
-1.424850	the expression -(-a)	-0.124939
-0.589624	expressions like -(-a)	-0.124939
-1.021772	some cases. Multiple	-0.124939
-0.527341	are: 146 Multiple	-0.124939
-0.358945	a "function". Multiple	-0.124939
-0.358945	is exact. Multiple	-0.124939
-0.358945	Example 7.38a. Multiple	-0.124939
-0.902709	unit-testing is unfortunately	-0.124939
-1.026397	function, but unfortunately	-0.124939
-1.104277	functions, but unfortunately	-0.124939
-0.574866	readable but unfortunately	-0.124939
-0.850305	occur, but unfortunately	-0.124939
-2.177530	value of n!	-0.124939
-1.605443	{ // n!	-0.124939
-0.901193	value as n!	-0.124939
-0.595793	n 0 n!	-0.124939
-1.201101	efficiently if pieces	-0.124939
-0.563072	into small pieces	-0.124939
-0.563072	typically small pieces	-0.124939
-0.573340	joining identical pieces	-0.124939
-0.541266	only. Critical pieces	-0.124939
-1.457796	version of Basic	-0.124939
-1.712280	compiler for Basic	-0.124939
-0.474632	is Visual Basic	-0.124939
-0.677265	C#, Visual Basic	-0.124939
-0.557752	reliable solution. (In	-0.124939
-0.805964	avoid this. (In	-0.124939
-0.885102	user input. (In	-0.124939
-0.726819	be inlined. (In	-0.124939
-0.463659	edx, respectively. (In	-0.124939
-1.662513	the different microprocessors.	-0.124939
-1.440220	with other microprocessors.	-0.124939
-0.883527	most other microprocessors.	-0.124939
-1.193808	on most microprocessors.	-0.124939
-0.358959	on Intel/x86-compatible microprocessors.	-0.124939
-1.490214	in different modules.	-0.124939
-1.028464	from other modules.	-0.124939
-1.056493	any other modules.	-0.425969
-1.180815	and system modules.	-0.124939
-0.601421	_mm_load_ps(coef+i); // s	-0.124939
-0.557785	_mm_hadd_ps(x, x); s	-0.124939
-0.444395	int s; s	-0.124939
-0.314776	__m128 s; s	-0.124939
-0.358959	for(inti=0;i<16;i+=4){ //Loopby4 s	-0.124939
-3.018018	in the project	-0.124939
-2.643374	for the project	-0.124939
-1.803056	from a project	-0.124939
-0.864069	whole software project	-0.124939
-0.582111	typical software project	-0.124939
-0.902682	task is divided	-0.124939
-2.257490	can be divided	-0.425969
-0.901048	were not divided	-0.124939
-1.498214	is usually divided	-0.124939
-0.886791	function from www.agner.org/optimize/asmlib.zip.	-0.124939
-0.886791	Available from www.agner.org/optimize/asmlib.zip.	-0.124939
-0.500073	library at www.agner.org/optimize/asmlib.zip.	-0.124939
-1.794144	the library www.agner.org/optimize/asmlib.zip.	-0.124939
-0.190167	(Tuesday | Wednesday	-0.425969
-0.858773	Day == Wednesday	-0.124939
-0.828271	= 4, Wednesday	-0.124939
-0.463677	for Tuesday, Wednesday	-0.124939
-0.899744	suffer from mispredictions.	-0.124939
-0.359012	and branch mispredictions.	-0.124939
-0.806138	for branch mispredictions.	-0.124939
-0.550884	many branch mispredictions.	-0.124939
-0.601604	Software that relies	-0.124939
-2.652456	the code relies	-0.124939
-1.196925	your program relies	-0.124939
-0.875475	The mechanism relies	-0.124939
-0.358945	the MKL relies	-0.124939
-0.594084	pointers, etc. And	-0.124939
-1.106850	single precision. And	-0.124939
-0.999662	to maintain. And	-0.124939
-0.659897	table (PLT). And	-0.124939
-0.358945	languages. www.yeppp.info And	-0.124939
-1.052094	on different platforms,	-0.124939
-0.592842	browsers, different platforms,	-0.124939
-1.571055	for many platforms,	-0.124939
-0.597903	porting between platforms,	-0.124939
-0.593727	on Linux platforms,	-0.124939
-1.073773	bit to compare	-0.124939
-2.087133	want to compare	-0.124939
-0.899477	us to compare	-0.124939
-0.601746	error and compare	-0.124939
-0.594980	a[i+2] ; compare	-0.124939
-0.902682	reference is valid	-0.124939
-2.663990	is a valid	-0.124939
-2.396715	to a valid	-0.124939
-0.601832	bounds of valid	-0.124939
-1.078491	initialized to valid	-0.124939
-1.825514	piece of CPU-intensive	-0.124939
-1.076169	throughput of CPU-intensive	-0.124939
-0.601830	relate to CPU-intensive	-0.124939
-0.601635	language for CPU-intensive	-0.124939
-1.188976	for some CPU-intensive	-0.124939
-1.398293	the stack. Is	-0.124939
-0.295859	efficient solution. Is	-0.124939
-0.463677	binary tree. Is	-0.124939
-0.358959	page 38). Is	-0.124939
-1.324115	to do so.	-0.124939
-0.567237	or approximately so.	-0.124939
-0.358973	often excessively so.	-0.124939
-0.902668	cost is seen	-0.124939
-2.391448	should be seen	-0.124939
-2.590455	is not seen	-0.124939
-1.958746	I have seen	-0.124939
-0.541245	have ever seen	-0.124939
-0.579277	computing resources. Typically,	-0.124939
-1.333877	is enabled. Typically,	-0.124939
-1.252884	execution units. Typically,	-0.124939
-0.999662	the future. Typically,	-0.124939
-0.358945	memory caches. Typically,	-0.124939
-2.459278	by the 107	-0.124939
-2.206842	See page 107	-0.124939
-0.527313	vectorization ......................................................................................... 107	-0.124939
-0.358945	registers ................................................................. 107	-0.124939
-0.358945	registers .......................................................... 107	-0.124939
-1.200530	memory is contiguous	-0.124939
-2.155615	which is contiguous	-0.124939
-0.601200	preferably with contiguous	-0.124939
-1.659110	in one contiguous	-0.124939
-0.584237	two modules contiguous	-0.124939
-0.601357	pointer it gets	-0.124939
-0.599905	class which gets	-0.124939
-1.194072	generation class gets	-0.124939
-1.182336	end user gets	-0.124939
-0.588050	application programmer gets	-0.124939
-1.554655	series of manuals.	-0.124939
-0.601721	books and manuals.	-0.124939
-0.892872	my optimization manuals.	-0.124939
-1.149338	the subsequent manuals.	-0.124939
-0.940584	of five manuals.	-0.124939
-0.901089	definition. This tells	-0.124939
-1.180766	map file tells	-0.124939
-0.875508	const keyword tells	-0.124939
-0.589727	The profiler tells	-0.425969
-2.258808	is to wrap	-0.124939
-2.008743	recommended to wrap	-0.124939
-1.200507	guaranteed to wrap	-0.124939
-2.046642	do not wrap	-0.124939
-2.213674	the value wrap	-0.124939
-1.656544	or class separately	-0.124939
-1.191417	each object separately	-0.124939
-0.593692	or line separately	-0.124939
-0.879419	code branches separately	-0.124939
-0.577602	compile them separately	-0.124939
-0.836038	is pure __attribute((	-0.124939
-0.541266	function __fastcall __attribute((	-0.124939
-0.065809	__declspec( align(16)) __attribute((	-0.425969
-0.659926	__attribute(( const)) __attribute((	-0.124939
-0.601849	subtasks is necessary.	-0.124939
-2.536635	may be necessary.	-0.124939
-2.590455	is not necessary.	-0.124939
-1.904132	efficient than necessary.	-0.124939
-0.896038	checks where necessary.	-0.124939
-0.902682	CPUs is increasing	-0.124939
-1.076777	problem by increasing	-0.124939
-1.685143	for an increasing	-0.124939
-0.596762	seeing an increasing	-0.124939
-0.358959	a monotonically increasing	-0.124939
-0.677420	aligned by 16,	-0.425969
-1.784301	byte at 16,	-0.124939
-0.435069	than 8, 16,	-0.124939
-0.435069	4, 8, 16,	-0.124939
-1.435908	between different threads,	-0.124939
-1.457864	into multiple threads,	-0.124939
-0.773256	between multiple threads,	-0.124939
-0.894839	synchronization between threads,	-0.124939
-0.450732	program. 6 Development	-0.124939
-0.450732	24 6 Development	-0.124939
-1.164267	for details. Development	-0.124939
-0.358959	very old-fashioned. Development	-0.124939
-0.358959	IDE's (Integrated Development	-0.124939
-1.626010	that is AND'ed	-0.425969
-0.599784	cc[i]+2 is AND'ed	-0.124939
-0.599784	bb[i]*cc[i] is AND'ed	-0.124939
-1.959275	I have AND'ed	-0.124939
-0.731374	common subexpression elimination	-0.124939
-0.409810	Common subexpression elimination	-0.124939
-0.391182	propagation Pointer elimination	-0.124939
-0.391182	sin. Pointer elimination	-0.124939
-1.738200	but not all.	-0.124939
-0.873524	code at all.	-0.124939
-0.587025	supported at all.	-0.124939
-0.587025	offset at all.	-0.124939
-0.577639	reduce them all.	-0.124939
-2.622394	the compiler ..........................................................................................	-0.124939
-0.891270	System programming ..........................................................................................	-0.124939
-0.876623	system resources ..........................................................................................	-0.124939
-0.960773	and maintenance ..........................................................................................	-0.124939
-0.764640	data sequentially ..........................................................................................	-0.124939
-2.313731	use the upper	-0.124939
-0.444395	a reasonable upper	-0.124939
-0.314776	no reasonable upper	-0.124939
-0.902992	// Get upper	-0.124939
-0.358959	a not-too-big upper	-0.124939
-0.901236	and code addresses.	-0.124939
-1.074026	different memory addresses.	-0.124939
-0.864048	self- relative addresses.	-0.124939
-0.557769	32-bit absolute addresses.	-0.124939
-0.764591	at round addresses.	-0.124939
-2.021452	is a loop-invariant	-0.124939
-0.600665	elimination and loop-invariant	-0.124939
-0.600665	propagation, and loop-invariant	-0.124939
-0.596150	move out loop-invariant	-0.124939
-1.203408	addition to sum1	-0.124939
-1.198968	2) { sum1	-0.124939
-0.598094	summation variables sum1	-0.124939
-0.659897	float list[size], sum1	-0.124939
-0.358945	+= list[i+1];} sum1	-0.124939
-1.299012	-1 = ~a	-0.124939
-0.896067	a & ~a	-0.425969
-1.027735	a ^ ~a	-0.124939
-0.358959	(a&~b)|(~a&b)=a^b --------- ~a	-0.124939
-1.529269	induction variables Compilers	-0.124939
-0.596919	C++ code. Compilers	-0.124939
-0.877641	micro-op cache. Compilers	-0.124939
-1.212861	OS X Compilers	-0.124939
-0.550807	now overlap. Compilers	-0.124939
-0.358945	: "memory" );	-0.124939
-0.358945	3) <<6 );	-0.124939
-0.358945	int bb[size] );	-0.124939
-0.358945	int cc[size] );	-0.124939
-0.358945	int aa[size] );	-0.124939
-0.601824	goal of 18	-0.124939
-0.550847	1. Number 18	-0.124939
-0.463659	............................................................................. 158 18	-0.124939
-0.463659	installation .................................................................................................. 18	-0.124939
-0.358945	22). 159 18	-0.124939
-0.889546	something about them.	-0.124939
-1.877911	to avoid them.	-0.124939
-1.178712	that needs them.	-0.124939
-1.004434	before multiplying them.	-0.124939
-0.358945	that connect them.	-0.124939
-0.580236	is floating point.	-0.124939
-1.442716	to floating point.	-0.124939
-0.580236	as floating point.	-0.124939
-0.577651	values per point.	-0.124939
-0.527355	as entry point.	-0.124939
-1.521432	the time consumption	-0.124939
-1.696073	The time consumption	-0.124939
-0.584604	exact time consumption	-0.124939
-0.593985	low power consumption	-0.124939
-0.592255	member by 8.	-0.124939
-1.190454	divisible by 8.	-0.124939
-0.592255	index by 8.	-0.124939
-0.463714	if appropriate. 8.	-0.124939
-2.375458	If the key	-0.124939
-1.296347	like a key	-0.124939
-0.901211	pressing a key	-0.124939
-0.598791	index or key	-0.124939
-0.598791	move or key	-0.124939
-1.005721	for an explanation.	-0.124939
-0.592613	here's an explanation.	-0.124939
-1.343825	for further explanation.	-0.124939
-1.203894	a little explanation.	-0.124939
-0.601247	advantageous by itself.	-0.124939
-2.652456	the code itself.	-0.124939
-2.427962	the program itself.	-0.124939
-0.584238	the constructor itself.	-0.124939
-0.863987	the profiler itself.	-0.124939
-2.430174	to be updated	-0.124939
-2.799282	can be updated	-0.124939
-0.569819	not been updated	-0.124939
-0.358973	2014. Last updated	-0.124939
-0.600475	i will appear	-0.124939
-2.428385	the program appear	-0.124939
-0.670325	which they appear	-0.425969
-1.027768	the modules appear	-0.124939
-1.077900	way. The Codeplay	-0.124939
-0.847400	reasonably well. Codeplay	-0.124939
-0.541245	Constantfolding xxxxxxxxx Codeplay	-0.124939
-0.358945	1.4, 2005. Codeplay	-0.124939
-0.358945	The CodeGear, Codeplay	-0.124939
-0.598849	same object (except	-0.124939
-1.245102	integer expressions (except	-0.124939
-0.575636	previous iteration (except	-0.124939
-0.847400	two loops (except	-0.124939
-0.557734	point capabilities (except	-0.124939
-2.360070	If the combined	-0.124939
-2.067623	where the combined	-0.124939
-2.952704	can be combined	-0.124939
-1.202075	results are combined	-0.124939
-0.900812	Clang compiler combined	-0.124939
-1.503574	C++ is definitely	-0.124939
-0.581445	cases should definitely	-0.124939
-0.581445	framework should definitely	-0.124939
-0.862795	containers should definitely	-0.124939
-1.213010	lazy binding definitely	-0.124939
-0.601734	100 and jumps	-0.124939
-2.321395	that it jumps	-0.124939
-1.335610	a thread jumps	-0.124939
-0.249845	branches Eliminate jumps	-0.124939
-0.249845	1.; Eliminate jumps	-0.124939
-0.600231	right vector elements.	-0.124939
-1.656673	or class elements.	-0.124939
-1.221059	of array elements.	-0.124939
-0.584844	to array elements.	-0.124939
-1.322036	for finding elements.	-0.124939
-0.599873	across all .cpp	-0.124939
-1.001126	the multiple .cpp	-0.124939
-0.574453	compiling multiple .cpp	-0.124939
-0.574453	combining multiple .cpp	-0.124939
-1.162449	the current .cpp	-0.124939
-1.495836	with many features,	-0.124939
-1.352728	has many features,	-0.124939
-0.591619	advanced optimizing features,	-0.124939
-0.570549	full metaprogramming features,	-0.124939
-0.505062	better backup features,	-0.124939
-1.635314	position-independent code flag	-0.124939
-0.588631	the zero flag	-0.124939
-0.304916	the carry flag	-0.124939
-0.410940	i += 8)	-0.726999
-1.018348	(iset >= 8)	-0.124939
-2.428702	with the ever	-0.124939
-0.601759	invest in ever	-0.124939
-1.958746	I have ever	-0.124939
-0.886998	no exception ever	-0.124939
-0.957962	is hardly ever	-0.124939
-0.594310	directly // Writes	-0.124939
-0.202559	1" // Writes	-0.425969
-0.594310	p2->Hello(); // Writes	-0.124939
-0.876700	system resources Writes	-0.124939
-0.902412	9 and 13	-0.124939
-1.784085	byte at 13	-0.124939
-0.557752	.......................................................................................................... 120 13	-0.124939
-0.358945	files. 121 13	-0.124939
-0.358945	0.6 1.19 13	-0.124939
-1.078277	numbers in b[i]	-0.124939
-1.740701	a[i] = b[i]	-0.124939
-0.601298	checking if b[i]	-0.124939
-1.948511	i++) { b[i]	-0.124939
-1.565206	size; i++) b[i]	-0.124939
-1.729498	time is doubled.	-0.124939
-1.293144	registers is doubled.	-0.124939
-1.197800	frequency is doubled.	-0.124939
-2.591878	is not doubled.	-0.124939
-1.766776	has been doubled.	-0.124939
-1.078165	read and written	-0.124939
-2.391448	should be written	-0.124939
-0.598454	point value written	-0.124939
-0.590477	with programs written	-0.124939
-0.358945	a hand- written	-0.124939
-1.366520	of programming languages,	-0.124939
-1.018853	other programming languages,	-0.124939
-0.531157	multiple programming languages,	-0.124939
-0.771251	Some programming languages,	-0.124939
-0.358988	interpreted script languages,	-0.124939
-1.061930	new or malloc	-0.124939
-0.891514	delete or malloc	-0.124939
-0.596233	delete, or malloc	-0.124939
-1.706260	the functions malloc	-0.124939
-0.567220	delete (or malloc	-0.124939
-1.607021	program that runs	-0.124939
-1.192866	software that runs	-0.124939
-0.895555	thread that runs	-0.124939
-2.428809	the program runs	-0.124939
-0.597952	most software runs	-0.124939
-1.195087	a is true,	-0.124939
-1.625571	b is true,	-0.124939
-0.599784	0 is true,	-0.124939
-0.599784	|| is true,	-0.124939
-0.601134	count as true,	-0.124939
-1.446556	doing the division.	-0.124939
-0.601335	multiplication or division.	-0.124939
-1.594749	integer vector division.	-0.124939
-2.579274	floating point division.	-0.124939
-0.898288	and integer division.	-0.124939
-0.598928	Y = C;	-0.124939
-0.896848	x.c = C;	-0.124939
-0.599136	B*x + C;	-0.124939
-0.107201	A, B, C;	-0.124939
-0.505097	0.18 0.18 0.18	-0.124939
-0.358984	0.11 0.18 0.18	-0.124939
-0.505062	0.12 0.11 0.18	-0.124939
-0.505062	2 0.12 0.18	-0.124939
-0.358959	0.63 0.75 0.18	-0.124939
-0.599140	_WIN32 n.a. MS	-0.124939
-0.786235	to optimization MS	-0.425969
-0.557757	or int64_t MS	-0.124939
-0.527334	or uint64_t MS	-0.124939
-0.899878	cc); } #endif	-0.124939
-0.567187	ptr n; #endif	-0.124939
-0.726819	#define pure_function #endif	-0.124939
-0.358945	X __attribute__((aligned(16))) #endif	-0.124939
-0.358945	FUNCNAME SelectAddMul_AVX2 #endif	-0.124939
-3.141244	of the present	-0.124939
-3.017205	in the present	-0.124939
-0.601643	Fog The present	-0.124939
-1.077907	Optimizing for present	-0.124939
-0.901048	were not present	-0.124939
-1.292054	15.1b to 15.1c	-0.124939
-1.196924	15.1a to 15.1c	-0.124939
-0.600250	15.1d to 15.1c	-0.124939
-2.150167	in example 15.1c	-0.124939
-0.358973	to 151 15.1c	-0.124939
-0.974251	size = 1000;	-0.124939
-0.594040	ArraySize = 1000;	-0.124939
-0.594040	arraysize = 1000;	-0.124939
-2.180330	i < 1000;	-0.124939
-3.141244	of the strlen	-0.124939
-0.902154	tested the strlen	-0.124939
-0.463677	0.29 0.28 strlen	-0.124939
-0.358959	some examples: strlen	-0.124939
-0.358959	0.59 0.27 strlen	-0.124939
-2.302343	code is __asm	-0.124939
-0.601350	3; or __asm	-0.124939
-0.586734	ptr x; __asm	-0.124939
-0.143403	Intel/MASM syntax: __asm	-0.124939
-0.143403	Gnu/AT&T syntax: __asm	-0.124939
-0.333613	one clock cycle.	-0.301030
-0.192245	every clock cycle.	-0.124939
-1.784085	byte at 11	-0.124939
-0.895808	multiplication takes 11	-0.124939
-0.594461	mixed language 11	-0.124939
-0.575614	rarely needed. 11	-0.124939
-0.527313	..................................................................................................... 103 11	-0.124939
-0.601604	*.so) that belong	-0.124939
-0.599845	addresses all belong	-0.124939
-0.597057	functions often belong	-0.124939
-0.596158	stack always belong	-0.124939
-1.425455	cache lines belong	-0.124939
-0.598608	destroyed. In 50	-0.124939
-1.189676	conversion takes 50	-0.124939
-0.527313	types .............................................................................................. 50	-0.124939
-0.505042	parameters ............................................................................................... 50	-0.124939
-0.505042	code took 50	-0.124939
-0.600729	Environments) have facilities	-0.124939
-0.868136	using advanced facilities	-0.124939
-0.169344	If search facilities	-0.425969
-0.550826	have powerful facilities	-0.124939
-0.601037	1 - 5.	-0.124939
-1.455334	the constant 5.	-0.124939
-0.585158	j << 5.	-0.124939
-0.584245	VIA CPUs. 5.	-0.124939
-0.659897	for AVX. 5.	-0.124939
-1.201425	but is currently	-0.124939
-1.297440	version is currently	-0.124939
-1.077647	classes are currently	-0.124939
-1.471332	The method currently	-0.124939
-0.593307	Gnu manual currently	-0.124939
-0.590054	in multiplication here:	-0.124939
-0.567229	be mentioned here:	-0.124939
-0.541245	main principles here:	-0.124939
-0.541292	the pitfalls here:	-0.124939
-0.358945	error reporting here:	-0.124939
-1.021821	some cases. Does	-0.124939
-1.218434	instruction sets. Does	-0.124939
-0.416971	32-bit Windows. Does	-0.124939
-0.764627	an IDE. Does	-0.124939
-1.296696	problem with macros	-0.124939
-1.549619	used as macros	-0.124939
-1.175844	should avoid macros	-0.124939
-0.594623	48 Use macros	-0.124939
-0.358945	18.3. Predefined macros	-0.124939
-2.009244	You may prefer	-0.124939
-0.596330	programmer may prefer	-0.124939
-0.600847	solution you prefer	-0.124939
-0.600475	users will prefer	-0.124939
-0.895664	Here we prefer	-0.124939
-2.510779	of the divisor	-0.425969
-0.203383	Faster if divisor	-0.425969
-1.765686	a constant divisor	-0.124939
-0.249860	19 3.5 Program	-0.124939
-0.249860	process. 3.5 Program	-0.124939
-0.143412	16 3.3 Program	-0.124939
-0.143412	sections. 3.3 Program	-0.124939
-1.078590	processors is better.	-0.124939
-0.836056	will work better.	-0.124939
-0.567251	model work better.	-0.124939
-0.358973	is clearly better.	-0.124939
-1.713188	Linux and BSD,	-0.124939
-1.845003	based on BSD,	-0.124939
-0.808329	64-bit Linux, BSD,	-0.124939
-0.491948	(Windows, Linux, BSD,	-0.124939
-2.429279	with the bit-mask:	-0.124939
-0.601307	generate a bit-mask:	-0.425969
-0.659955	the inverted bit-mask:	-0.124939
-2.144306	power of two.	-0.124939
-0.601350	year or two.	-0.124939
-2.366105	rather than two.	-0.124939
-1.696117	will make two.	-0.124939
-1.283008	to start up,	-0.124939
-0.764667	be cleaned up,	-0.124939
-0.505062	computer starts up,	-0.124939
-0.505062	is filled up,	-0.124939
-0.527357	resources cleaned up.	-0.124939
-0.726852	program starts up.	-0.124939
-0.726852	be filled up.	-0.124939
-0.358959	be broken up.	-0.124939
-0.598009	for performance reasons.	-0.124939
-1.270793	for several reasons.	-0.124939
-0.567219	for usability reasons.	-0.124939
-0.463677	for marketing reasons.	-0.124939
-2.207566	See page 103	-0.124939
-0.505062	Hyperthreading ..................................................................................................... 103	-0.124939
-0.463677	execution ................................................................................................. 103	-0.124939
-0.659926	by writing: 103	-0.124939
-0.200158	4 2 Choosing	-0.425969
-0.515453	23 5 Choosing	-0.124939
-0.515453	website. 5 Choosing	-0.124939
-1.293013	the time slices	-0.124939
-0.584612	get time slices	-0.124939
-1.217793	of an exception.	-0.124939
-0.592622	throws an exception.	-0.124939
-0.593063	causes another exception.	-0.124939
-0.595643	using & enum	-0.124939
-0.594656	Enums An enum	-0.124939
-1.009703	multiple conditions enum	-0.124939
-0.358959	double, bool, enum	-0.124939
-2.030049	a program repeats	-0.124939
-1.868164	a loop repeats	-0.124939
-0.883134	This loop repeats	-0.124939
-0.598735	that also repeats	-0.124939
-2.412752	with the highest	-0.124939
-2.455157	when the highest	-0.124939
-1.073561	cycle. The highest	-0.124939
-0.899335	implemented. The highest	-0.124939
-1.062660	in matrix 96	-0.124939
-0.541266	sequentially .......................................................................................... 96	-0.124939
-0.463677	Strings ...................................................................................................................... 96	-0.124939
-0.358959	structures ............................................................. 96	-0.124939
-0.901072	going to recommend	-0.124939
-0.601050	teachers to recommend	-0.124939
-0.065813	programming textbooks recommend	-0.124939
-2.014021	likely to lead	-0.124939
-2.072892	This can lead	-0.124939
-0.596988	insight can lead	-0.124939
-0.596988	bottlenecks can lead	-0.124939
-1.063505	make an additional	-0.124939
-1.183342	making an additional	-0.124939
-2.623007	the compiler additional	-0.124939
-0.764664	for transferring additional	-0.124939
-2.249631	is no 51	-0.124939
-2.207566	See page 51	-0.124939
-0.358959	and classes............................................................................................ 51	-0.124939
-0.358959	(properties) ............................................................................ 51	-0.124939
-0.600231	2-dimensional vector 56	-0.124939
-0.527334	functions .............................................................................................. 56	-0.124939
-0.527334	operators ............................................................................................. 56	-0.124939
-0.358959	Bitfields ................................................................................................................... 56	-0.124939
-2.088254	be a type.	-0.124939
-1.845893	a different type.	-0.124939
-0.599665	each integer type.	-0.124939
-0.505062	a wrong type.	-0.124939
-1.997915	into a place	-0.124939
-2.030311	recommended to place	-0.124939
-1.248707	from one place	-0.124939
-0.591079	shifts one place	-0.124939
-2.780737	it is preferable	-0.124939
-1.297467	linking is preferable	-0.124939
-2.537796	may be preferable	-0.124939
-1.838243	is often preferable	-0.124939
-1.200065	CPU to overlap	-0.124939
-2.059274	able to overlap	-0.124939
-0.601478	capabilities can overlap	-0.124939
-2.046642	do not overlap	-0.124939
-0.332858	fit the eight-element	-0.726999
-1.027941	typically takes 40	-0.124939
-1.044358	division takes 40	-0.124939
-0.764695	int s; 40	-0.124939
-0.358973	Type conversions.................................................................................................... 40	-0.124939
-0.600940	to x 43	-0.124939
-1.511287	See page 43	-0.124939
-0.358973	switch statements............................................................................. 43	-0.124939
-0.855380	systems and sixteen	-0.425969
-0.601365	eight or sixteen	-0.124939
-0.573357	contain either sixteen	-0.124939
-2.200484	used for turning	-0.124939
-1.453382	or by turning	-0.124939
-0.889558	program by turning	-0.124939
-0.595241	just by turning	-0.124939
-0.593630	pointer at initialization.	-0.124939
-1.179307	library at initialization.	-0.124939
-1.279722	doesn't need initialization.	-0.124939
-1.396895	the necessary initialization.	-0.124939
-0.582094	PC platforms. Graphics	-0.124939
-0.505082	File input/output Graphics	-0.124939
-0.143407	access. 3.10 Graphics	-0.124939
-0.143407	21 3.10 Graphics	-0.124939
-2.076326	where the obstacles	-0.124939
-1.780529	of these obstacles	-0.124939
-0.595279	Some important obstacles	-0.124939
-1.490907	most common obstacles	-0.124939
-3.018834	in the asmlib	-0.124939
-1.850537	but the asmlib	-0.124939
-0.201516	set, using asmlib	-0.425969
-1.183969	of code. Furthermore,	-0.124939
-1.203680	is executed. Furthermore,	-0.124939
-0.463677	system crash. Furthermore,	-0.124939
-0.358959	in edx. Furthermore,	-0.124939
-1.613067	possible to obtain	-0.425969
-1.533673	you can obtain	-0.124939
-2.163251	value of ebx.	-0.124939
-1.596102	bit of ebx.	-0.124939
-0.601837	edx, to ebx.	-0.124939
-0.505082	and pop ebx.	-0.124939
-0.601350	prediction or estimate	-0.124939
-0.764667	a reasonable estimate	-0.124939
-0.358959	can roughly estimate	-0.124939
-0.358959	if our estimate	-0.124939
-1.431538	set is enabled	-0.124939
-1.460377	is always enabled	-0.124939
-0.577639	leave them enabled	-0.124939
-0.902488	efficient and enables	-0.124939
-0.202370	file. This enables	-0.425969
-1.053645	modules. This enables	-0.124939
-0.249860	option. 8.4 Obstacles	-0.124939
-0.249860	77 8.4 Obstacles	-0.124939
-0.143412	compilers. 8.3 Obstacles	-0.124939
-0.143412	74 8.3 Obstacles	-0.124939
-0.187819	int & r)	-0.425969
-0.526063	(int & r)	-0.124939
-0.526063	Sum3(S3 & r)	-0.124939
-0.601763	arranged in regular	-0.124939
-1.202421	data for regular	-0.124939
-0.600329	Internet at regular	-0.124939
-1.750278	a simple regular	-0.124939
-2.177673	value of m	-0.124939
-1.479376	the way m	-0.124939
-0.519822	template function, m	-0.124939
-0.751736	simple function, m	-0.124939
-0.596949	as code. Metaprogramming	-0.124939
-0.391163	150 15 Metaprogramming	-0.124939
-0.391163	90. 15 Metaprogramming	-0.124939
-0.764664	15 Metaprogramming Metaprogramming	-0.124939
-0.588078	following examples explain	-0.124939
-0.129476	Let me explain	-0.124939
-0.505082	libraries. To explain	-0.124939
-0.596860	takes time. Dispatch	-0.124939
-0.588041	128 below. Dispatch	-0.124939
-1.036647	different compilers. Dispatch	-0.124939
-0.358959	different times: Dispatch	-0.124939
-0.901193	platforms as well,	-0.124939
-0.886143	are optimized well,	-0.124939
-1.024994	is predicted well,	-0.124939
-0.505091	optimizes reasonably well,	-0.124939
-1.961230	this is sufficiently	-0.124939
-0.900264	arrays are sufficiently	-0.425969
-0.463696	This worked sufficiently	-0.124939
-0.588041	13.1 below. 126	-0.124939
-0.557757	127 127 126	-0.124939
-0.541266	maintenance .......................................................................................... 126	-0.124939
-0.505062	Implementation ..................................................................................................... 126	-0.124939
-1.078569	double is bad	-0.124939
-2.771340	in a bad	-0.124939
-1.503783	examples of bad	-0.124939
-0.579306	works particularly bad	-0.124939
-0.087492	static double p(double	-0.726999
-1.636532	that is said	-0.425969
-2.953773	can be said	-0.124939
-0.847518	often easier said	-0.124939
-2.761794	to the modulo	-0.124939
-1.379061	apply to modulo	-0.124939
-0.598942	as i modulo	-0.124939
-1.878042	to avoid modulo	-0.124939
-1.378333	files and databases	-0.124939
-0.177780	3.9 Other databases	-0.124939
-0.541288	to remote databases	-0.124939
-0.187298	_controlfp_s(&dummy, 0, _EM_OVERFLOW);	-0.425969
-0.065813	// _controlfp(0, _EM_OVERFLOW);	-0.425969
-0.550842	if protection against	-0.124939
-0.358959	must warn against	-0.124939
-0.358959	be weighed against	-0.124939
-0.358959	possible remedies against	-0.124939
-0.863983	register size. Vectorized	-0.124939
-0.764627	overloaded operators. Vectorized	-0.124939
-0.358959	Example 12.4b. Vectorized	-0.124939
-0.358959	} 112 Vectorized	-0.124939
-0.358959	1: printf("Beta"); break;	-0.124939
-0.358959	2: printf("Gamma"); break;	-0.124939
-0.358959	3: printf("Delta"); break;	-0.124939
-0.358959	0: printf("Alpha"); break;	-0.124939
-2.679031	that the loader	-0.124939
-2.425895	by the loader	-0.124939
-0.601250	loaded, the loader	-0.124939
-0.902488	linker and loader	-0.124939
-0.957972	model number. Failure	-0.124939
-0.249852	also deallocated. Failure	-0.124939
-0.358990	been deallocated. Failure	-0.124939
-0.659955	program flow. Failure	-0.124939
-1.582053	class is declared.	-0.124939
-1.586420	variable is declared.	-0.124939
-0.203665	MemberPointer is declared.	-0.124939
-1.977732	lot of resources,	-0.124939
-0.600585	require more resources,	-0.124939
-2.645746	the same resources,	-0.124939
-0.580801	to network resources,	-0.124939
-1.203304	a is true.	-0.124939
-1.077907	1 for true.	-0.124939
-2.391918	should be true.	-0.124939
-1.672351	not always true.	-0.124939
-1.378257	arrays and objects.	-0.124939
-1.748308	for all objects.	-0.124939
-1.656673	or class objects.	-0.124939
-0.593012	every four objects.	-0.124939
-1.189973	calculations in parallel.	-0.124939
-1.283417	run in parallel.	-0.124939
-1.429545	running in parallel.	-0.124939
-0.895959	things in parallel.	-0.124939
-1.077577	one by one,	-0.124939
-1.291118	and only one,	-0.124939
-1.460209	is always one,	-0.124939
-0.550810	label plus one,	-0.124939
-0.588425	14.12b int list[300];	-0.124939
-0.588425	14.13b int list[300];	-0.124939
-0.588425	14.13a int list[300];	-0.124939
-0.588425	14.12a int list[300];	-0.124939
-0.212168	< SIZE; r++)	-0.726999
-2.394046	a = parabola	-0.124939
-0.574719	8.3a float parabola	-0.124939
-0.574719	a;} float parabola	-0.124939
-0.574719	8.1b float parabola	-0.124939
-0.596673	x^2 // x^4	-0.124939
-0.596673	xx4(x4); // x^4	-0.124939
-0.596673	x2; // x^4	-0.124939
-0.358988	x^2, x^3, x^4	-0.124939
-1.299986	like a mouse	-0.124939
-0.601746	keyboard and mouse	-0.124939
-0.598791	press or mouse	-0.124939
-0.598791	keyboard or mouse	-0.124939
-0.539553	partial template specialization	-0.124939
-0.190878	Full template specialization	-0.425969
-0.539553	Partial template specialization	-0.124939
-2.327468	the loop index.	-0.124939
-1.795095	an array index.	-0.124939
-1.750278	a simple index.	-0.124939
-0.358959	a top-of-stack index.	-0.124939
-0.598009	system performance options.	-0.124939
-0.596936	good optimization options.	-0.124939
-1.167314	all relevant options.	-0.124939
-0.358959	performance monitoring options.	-0.124939
-0.627133	< SIZE; c++)	-0.425969
-0.107204	< r; c++)	-0.425969
-0.894310	data elements are.	-0.124939
-1.485063	to optimization are.	-0.124939
-1.202568	that they are.	-0.124939
-0.575274	what they are.	-0.124939
-2.538377	may be needed,	-0.124939
-2.019532	they are needed,	-0.124939
-0.203278	facilities are needed,	-0.124939
-1.713667	way of declaring	-0.124939
-1.510678	done by declaring	-0.124939
-0.889558	smaller by declaring	-0.124939
-0.595241	inlined by declaring	-0.124939
-3.070236	in the SVML	-0.124939
-0.578272	double Intel SVML	-0.124939
-0.578272	vmldExp2 Intel SVML	-0.124939
-0.578272	__svml_exp2 Intel SVML	-0.124939
-0.378907	(*.dll or *.so).	-0.425969
-0.065813	objects (*.dll, *.so).	-0.124939
-0.884409	u; if (u.i	-0.124939
-0.202215	v; if (u.i	-0.124939
-0.592619	143 if (u.i	-0.124939
-0.594503	and necessary support.	-0.124939
-1.054646	without AVX support.	-0.124939
-0.563041	with profiling support.	-0.124939
-0.358959	with C++0x support.	-0.124939
-0.902488	addition and subtraction	-0.124939
-0.143415	than addition, subtraction	-0.425969
-0.505118	integer addition, subtraction	-0.124939
-0.203514	two); // Multiply	-0.425969
-0.600963	7.42 int Multiply	-0.124939
-0.358973	&& b<c) Multiply	-0.124939
-0.249852	0) *(p++) |=	-0.124939
-0.249852	i--) *(p++) |=	-0.124939
-0.358973	x; *(int*)&x |=	-0.124939
-0.358973	2.0f; x.i |=	-0.124939
-1.497239	a memory pool.	-0.124939
-0.580856	or memory pool.	-0.124939
-1.391138	same memory pool.	-0.124939
-0.861670	one memory pool.	-0.124939
-1.298824	version that performs	-0.124939
-0.591550	code version performs	-0.124939
-0.594672	Core2 processor performs	-0.124939
-2.542668	and the "Intel	-0.124939
-1.872112	as the "Intel	-0.124939
-0.601653	(www.boost.org). The "Intel	-0.124939
-0.527373	optimization Intel: "Intel	-0.124939
-1.743000	compile time. Are	-0.124939
-0.527357	top-of-stack index. Are	-0.124939
-0.463677	too small. Are	-0.124939
-0.358959	list. 94 Are	-0.124939
-0.601643	operators The pre-increment	-0.124939
-0.900235	you use pre-increment	-0.124939
-1.429785	situations where pre-increment	-0.124939
-0.583249	you change pre-increment	-0.124939
-0.902438	object, and ownership	-0.124939
-0.835997	to transfer ownership	-0.124939
-0.358959	that transfers ownership	-0.124939
-0.358959	that looses ownership	-0.124939
-2.207566	See page 88	-0.124939
-0.358959	and delete). 88	-0.124939
-0.358959	together ...................................... 88	-0.124939
-0.358959	stored together...................................... 88	-0.124939
-0.314803	*(int*)&x |= 0x80000000;	-0.124939
-0.314803	x.i |= 0x80000000;	-0.124939
-0.143412	u.i ^= 0x80000000;	-0.124939
-0.143412	u.i[1] ^= 0x80000000;	-0.124939
-1.423891	it can move	-0.124939
-0.600889	container may move	-0.124939
-0.527355	a mouse move	-0.124939
-0.899892	c; } Can	-0.124939
-0.965102	at all. Can	-0.124939
-0.505062	since 2004. Can	-0.124939
-0.659926	the container. Can	-0.124939
-1.713584	way of defining	-0.124939
-2.200138	used for defining	-0.124939
-1.067817	problem by defining	-0.124939
-0.598239	overcome by defining	-0.124939
-1.978606	code that produces	-0.124939
-1.622805	program that produces	-0.124939
-0.583903	unsigned variable produces	-0.124939
-0.583903	signed variable produces	-0.124939
-0.601267	loss of precision,	-0.124939
-0.792893	or double precision,	-0.124939
-1.964794	have a non-inlined	-0.124939
-2.061375	make a non-inlined	-0.124939
-1.492015	making a non-inlined	-0.124939
-0.601072	module. This non-inlined	-0.124939
-3.087114	of the drawbacks	-0.124939
-0.203958	Overcoming the drawbacks	-0.425969
-0.902488	advantages and drawbacks	-0.124939
-2.021978	x) { __declspec(align(16))	-0.124939
-0.527357	Example 12.2 __declspec(align(16))	-0.124939
-0.659926	#define Alignd(X) __declspec(align(16))	-0.124939
-0.463677	Data alignment. __declspec(align(16))	-0.124939
-1.600693	bit of u.f	-0.124939
-1.377972	know that u.f	-0.124939
-2.303308	{ // u.f	-0.124939
-0.557771	1.0 <= u.f	-0.124939
-2.668872	for the commercial	-0.124939
-0.600488	VectorC A commercial	-0.124939
-1.286976	in many commercial	-0.124939
-0.463677	License, optional commercial	-0.124939
-0.881051	and write configuration	-0.124939
-0.806033	resource files, configuration	-0.124939
-0.463677	several drivers, configuration	-0.124939
-0.358959	of DLLs, configuration	-0.124939
-1.830883	on page 134	-0.124939
-1.871453	(see page 134	-0.124939
-0.659955	of range"; 134	-0.124939
-0.463696	checking .................................................................................................. 134	-0.124939
-0.601148	or code lines.	-0.124939
-0.672004	same cache lines.	-0.124939
-1.805347	a few lines.	-0.124939
-1.712280	compiler for restrictions	-0.124939
-0.592518	very few restrictions	-0.124939
-0.359669	are certain restrictions	-0.425969
-1.292328	x n.a. Constant	-0.124939
-0.588634	Borland Microsoft Constant	-0.124939
-0.358959	= 6.0f; Constant	-0.124939
-0.358959	few places. Constant	-0.124939
-0.675826	the heap manager	-0.124939
-0.242694	The heap manager	-0.124939
-0.597878	buffer, branch pattern	-0.124939
-0.028027	simple periodic pattern	-0.301030
-2.367622	because the x86-64	-0.124939
-0.379653	x86 and x86-64	-0.425969
-0.505082	_M_IX86 _M_IX86 x86-64	-0.124939
-1.809849	assume that *p+2	-0.124939
-0.898881	assuming that *p+2	-0.124939
-0.814099	and calculate *p+2	-0.124939
-0.555289	could calculate *p+2	-0.124939
-0.601746	Codeplay and Watcom	-0.124939
-0.415702	well. Open Watcom	-0.124939
-0.415702	2004. Open Watcom	-0.124939
-0.541303	xxxxxxxxx Codeplay Watcom	-0.124939
-2.101191	make a round	-0.124939
-0.601837	members to round	-0.124939
-0.593630	aligned at round	-0.124939
-1.179307	loaded at round	-0.124939
-1.759356	or more cores,	-0.124939
-0.600034	or CPU cores,	-0.124939
-0.598855	instructions, multiple cores,	-0.124939
-0.505062	got RISC cores,	-0.124939
-0.854794	branch that chooses	-0.425969
-2.030049	a program chooses	-0.124939
-0.596204	cache always chooses	-0.124939
-1.361216	program is running.	-0.124939
-2.107235	they are running.	-0.124939
-0.600586	itself when running.	-0.124939
-1.604316	code is serial	-0.124939
-0.065813	12.6 Transforming serial	-0.425969
-0.485804	elements from cc	-0.602060
-0.581066	b: from cc	-0.124939
-0.599041	call // Header	-0.124939
-0.599041	later // Header	-0.124939
-1.429968	Instruction set Header	-0.124939
-0.358973	Table 12.2. Header	-0.124939
-1.829377	functions that 150	-0.124939
-2.207566	See page 150	-0.124939
-0.541266	programming .......................................................................................... 150	-0.124939
-0.505062	Metaprogramming ....................................................................................................... 150	-0.124939
-0.598680	quite efficient thanks	-0.124939
-0.589631	data automatically thanks	-0.124939
-0.570559	very similar thanks	-0.124939
-0.828364	becomes fragmented thanks	-0.124939
-1.489048	x = 2.0;	-0.124939
-1.063310	(x = 2.0;	-0.124939
-0.594052	list[i].b = 2.0;	-0.124939
-0.594052	temp->b = 2.0;	-0.124939
-1.261749	into the pipeline	-0.124939
-1.498713	However, the pipeline	-0.124939
-1.802970	using a pipeline	-0.124939
-1.998912	unsigned int n)	-0.124939
-0.184356	factorial (int n)	-0.425969
-1.217117	SomeFunction (int n)	-0.124939
-0.335878	for user input.	-0.124939
-0.764700	or mouse input.	-0.124939
-2.622700	the compiler 8.1	-0.124939
-1.419473	in table 8.1	-0.124939
-0.588032	overflow. Table 8.1	-0.124939
-0.505062	.......................................................................................... 66 8.1	-0.124939
-0.595787	worst- case conditions.	-0.124939
-0.883077	other hardware conditions.	-0.124939
-0.764627	under worst-case conditions.	-0.124939
-0.358959	the best-case conditions.	-0.124939
-0.895481	more by choosing	-0.124939
-1.189033	obtained by choosing	-0.124939
-1.296622	account when choosing	-0.124939
-1.745240	the programmer choosing	-0.124939
-1.317906	on page 146	-0.425969
-0.866229	linking are: 146	-0.124939
-0.358973	dynamic libraries............................................................................ 146	-0.124939
-0.899148	Overloaded functions ..............................................................................................	-0.124939
-0.874332	return types ..............................................................................................	-0.124939
-0.872980	Optimization directives ..............................................................................................	-0.124939
-1.032688	cache control ..............................................................................................	-0.124939
-0.901616	intrinsic function _mm256_zeroupper()	-0.124939
-0.240664	then call _mm256_zeroupper()	-0.602060
-1.162419	the user. Making	-0.124939
-0.358983	120 13 Making	-0.124939
-0.358983	121 13 Making	-0.124939
-0.358973	significant improvements. Making	-0.124939
-1.077814	set the flush-to-zero	-0.124939
-1.202297	setting the flush-to-zero	-0.124939
-0.391182	7.6. Set flush-to-zero	-0.124939
-0.391182	7.5. Set flush-to-zero	-0.124939
-2.655551	of a Taylor	-0.124939
-2.282091	such as Taylor	-0.124939
-0.358959	Example 12.9b. Taylor	-0.124939
-0.358959	Example 12.9a. Taylor	-0.124939
-0.598537	FuncType * SelectAddMul_pointer	-0.124939
-0.563092	>= 2) SelectAddMul_pointer	-0.124939
-0.541324	>= 8) SelectAddMul_pointer	-0.124939
-0.505062	>= 5) SelectAddMul_pointer	-0.124939
-2.762335	to the dispatcher.	-0.124939
-2.430788	to a dispatcher.	-0.124939
-0.202132	Intel's CPU dispatcher.	-0.425969
-0.242697	the Gnu, Clang,	-0.301030
-0.255946	as Gnu, Clang,	-0.124939
-0.601734	14.8 and 14.9	-0.124939
-2.633058	// Example 14.9	-0.124939
-0.505062	................................... 141 14.9	-0.124939
-0.659926	(double)(signed int)u; 14.9	-0.124939
-1.498942	only on n,	-0.124939
-0.595542	compile-time constant n,	-0.124939
-0.430491	double x, n,	-0.425969
-2.633058	// Example 14.8	-0.124939
-2.149682	in example 14.8	-0.124939
-0.505062	another platform. 14.8	-0.124939
-0.505062	double..................................................................................... 140 14.8	-0.124939
-1.445564	risk of overflow,	-0.124939
-0.902240	checking for overflow,	-0.124939
-0.599665	violation, integer overflow,	-0.124939
-1.054638	to cause overflow,	-0.124939
-0.970717	< 100; x++)	-0.425969
-0.836014	<= n; x++)	-0.124939
-0.358973	0; i--, x++)	-0.124939
-1.298529	conditions are optimal.	-0.124939
-2.418943	is not optimal.	-0.124939
-2.081358	are not optimal.	-0.124939
-1.074139	far from optimal.	-0.124939
-0.028027	{ _mm_storeu_si128((__m128i *)d,	-0.301030
-0.358988	{ _mm_store_si128((__m128i *)d,	-0.124939
-2.656060	of a class,	-0.124939
-0.463733	Intel Vector class,	-0.124939
-0.463733	bits Vector class,	-0.124939
-1.137108	a derived class,	-0.124939
-0.600458	cos(x); } z	-0.124939
-0.588027	y && z	-0.124939
-0.659926	= cos(x); z	-0.124939
-0.659926	= sin(x); z	-0.124939
-0.378776	calculated in advance	-0.425969
-1.286883	needed in advance	-0.124939
-0.598480	know in advance	-0.124939
-0.387640	into vector c:	-0.249877
-1.636356	b is guaranteed	-0.124939
-0.601166	i&15 is guaranteed	-0.124939
-2.107235	they are guaranteed	-0.124939
-2.591878	is not guaranteed	-0.124939
-2.095347	You may think	-0.124939
-1.627670	and then think	-0.124939
-0.595428	but I think	-0.124939
-0.877660	I don't think	-0.124939
-1.662228	for an example.	-0.124939
-1.434834	with an example.	-0.124939
-1.691901	as an example.	-0.124939
-2.044913	in this example.	-0.124939
-1.202454	available. The older	-0.124939
-1.375076	compatibility with older	-0.124939
-1.201423	effect on older	-0.124939
-0.577653	tree. On older	-0.124939
-0.902696	as is commonly	-0.124939
-1.089572	The most commonly	-0.425969
-1.428284	are two commonly	-0.124939
-1.642308	up the queue	-0.124939
-1.078532	implement a queue	-0.124939
-0.600488	times. A queue	-0.124939
-0.659926	a FIFO queue	-0.124939
-0.601945	exiting the {}	-0.124939
-0.593543	it inside {}	-0.124939
-0.726852	< 5) {}	-0.124939
-0.358959	}; vector() {}	-0.124939
-1.733835	a + 1.0f;	-0.124939
-0.463243	list[i] += 1.0f;	-0.425969
-0.534305	15] += 1.0f;	-0.124939
-0.249860	pop ret ALIGN	-0.124939
-0.249860	$B2$3: ret ALIGN	-0.124939
-0.065813	to assembly: ALIGN	-0.425969
-0.599610	requires no modification	-0.124939
-1.328940	may need modification	-0.124939
-0.575702	therefore need modification	-0.124939
-1.048598	a certain modification	-0.124939
-0.570580	that similar solutions	-0.124939
-0.143407	it. Possible solutions	-0.124939
-0.143407	machines? Possible solutions	-0.124939
-0.358973	Such hybrid solutions	-0.124939
-0.085115	An optimization guide	-0.726999
-3.069318	in the appendix	-0.124939
-1.709401	in an appendix	-0.124939
-1.727959	as an appendix	-0.124939
-0.594670	classes. An appendix	-0.124939
-0.902277	lines. The 17	-0.124939
-0.550870	column. Number 17	-0.124939
-0.143407	................................................................................................ 157 17	-0.124939
-0.143407	normal. 157 17	-0.124939
-0.601951	apply the empty	-0.124939
-0.601653	specification. The empty	-0.124939
-1.602846	have an empty	-0.124939
-0.596772	While an empty	-0.124939
-0.378797	testing and maintenance	-0.124939
-0.203413	Test and maintenance	-0.124939
-1.200730	it with 1:	-0.124939
-1.060663	break; case 1:	-0.124939
-0.466207	; parameter 1:	-0.124939
-0.358983	needed. 11 Out	-0.124939
-0.358983	103 11 Out	-0.124939
-0.358973	a First-In-Last- Out	-0.124939
-0.358973	a First-In-First- Out	-0.124939
-2.074834	in a protected	-0.425969
-0.601050	switch to protected	-0.124939
-0.601050	switching to protected	-0.124939
-1.282894	memory allocation. Container	-0.124939
-0.143407	90 9.7 Container	-0.124939
-0.143407	alloca. 9.7 Container	-0.124939
-0.358973	Multiple threads? Container	-0.124939
-1.549699	used as alternatives	-0.124939
-2.193517	more efficient alternatives	-0.124939
-0.598529	the possible alternatives	-0.124939
-1.557117	are various alternatives	-0.124939
-1.977732	lot of modifications	-0.124939
-1.500354	improved by modifications	-0.124939
-0.871579	if your modifications	-0.124939
-0.584278	to require modifications	-0.124939
-1.026146	list[i] += i_div_3;	-0.124939
-0.534305	list[i+1] += i_div_3;	-0.124939
-0.534305	list[i+2] += i_div_3;	-0.124939
-1.596639	int i, i_div_3;	-0.124939
-1.829869	i = s;	-0.124939
-1.497007	short int s;	-0.124939
-0.463696	{ __m128 s;	-0.124939
-0.065813	the "worst case"	-0.124939
-0.143412	the "best case"	-0.124939
-0.143412	and "best case"	-0.124939
-2.171710	have to distinguish	-0.124939
-1.429363	sure to distinguish	-0.124939
-1.882264	important to distinguish	-0.124939
-1.364914	fail to distinguish	-0.124939
-0.601653	truncation. The missing	-0.124939
-2.248354	that are missing	-0.124939
-1.897496	functions are missing	-0.124939
-1.197925	data. A missing	-0.124939
-0.575694	2. Optimizing subroutines	-0.124939
-0.028027	2: "Optimizing subroutines	-0.602060
-0.541309	some development tools	-0.124939
-0.541309	Various development tools	-0.124939
-0.570573	better metaprogramming tools	-0.124939
-0.563064	offering profiling tools	-0.124939
-2.394046	a = 0x2710	-0.124939
-0.240918	from address 0x2710	-0.301030
-0.734286	the hot spot	-0.124939
-0.279837	a hot spot	-0.124939
-0.384569	or hot spot	-0.124939
-0.600168	templates. The powN	-0.124939
-0.600168	template. The powN	-0.124939
-1.499918	loop if powN	-0.124939
-1.071704	N> class powN	-0.124939
-1.872112	as the C-style	-0.124939
-2.290181	than the C-style	-0.124939
-1.077923	conversion // C-style	-0.124939
-0.585182	The old C-style	-0.124939
-1.343557	C++ language While	-0.124939
-0.885256	frame functions. While	-0.124939
-0.659926	assembly language". While	-0.124939
-0.358959	than others. While	-0.124939
-0.820845	c:2; }; Bitfield	-0.124939
-0.558993	abc; }; Bitfield	-0.124939
-0.585221	7.40b union Bitfield	-0.124939
-0.577651	7.40a struct Bitfield	-0.124939
-0.601043	something to clean	-0.124939
-0.901058	nothing to clean	-0.124939
-0.897923	and most clean	-0.124939
-0.594998	program must clean	-0.124939
-0.557771	option -fpic according	-0.124939
-0.957995	binary representation according	-0.124939
-0.505091	always behave according	-0.124939
-0.463677	100. Now, according	-0.124939
-2.303623	{ // Bounds	-0.124939
-0.143407	constant. 14.2 Bounds	-0.124939
-0.143407	132 14.2 Bounds	-0.124939
-0.358973	memory fragmentation. Bounds	-0.124939
-0.593084	} u; u.i	-0.124939
-1.149496	int n; u.i	-0.124939
-0.789042	if nonzero u.i	-0.124939
-2.430423	to a dramatic	-0.124939
-1.591838	much more dramatic	-0.124939
-1.806655	a very dramatic	-0.124939
-0.591247	have quite dramatic	-0.124939
-0.892577	without an IDE.	-0.124939
-0.596772	including an IDE.	-0.124939
-1.463202	its own IDE.	-0.124939
-1.164301	Visual Studio IDE.	-0.124939
-0.601643	smaller. The lengths	-0.124939
-1.856620	of different lengths	-0.124939
-0.598311	have variable lengths	-0.124939
-0.358959	to great lengths	-0.124939
-2.591878	is not expensive.	-0.124939
-1.209385	are very expensive.	-0.124939
-0.582119	also very expensive.	-0.124939
-1.477887	are less expensive.	-0.124939
-1.704169	sake of efficiency.	-0.124939
-1.297049	loss of efficiency.	-0.124939
-1.445991	difference in efficiency.	-0.124939
-1.226218	to improve efficiency.	-0.124939
-0.474620	163 20 Copyright	-0.124939
-0.474620	links. 20 Copyright	-0.124939
-0.358973	from www.agner.org/optimize. Copyright	-0.124939
-0.358973	of Denmark. Copyright	-0.124939
-1.300011	registers is extended	-0.124939
-2.952704	can be extended	-0.124939
-1.712939	registers are extended	-0.124939
-1.496435	with an extended	-0.124939
-0.599702	(total cache size)	-0.124939
-1.018291	i >= size)	-0.124939
-0.143407	/ (line size)	-0.124939
-0.143407	sets) (line size)	-0.124939
-0.463677	(MS) smmintrin.h (Gnu)	-0.124939
-0.358959	FMA4 fma4intrin.h (Gnu)	-0.124939
-0.358959	(MS) xopintrin.h (Gnu)	-0.124939
-0.358959	(MS) x86intrin.h (Gnu)	-0.124939
-0.902682	information is contained	-0.124939
-2.430423	to a contained	-0.124939
-0.601830	Pointers to contained	-0.124939
-0.806004	be completely contained	-0.124939
-1.600519	overhead of transferring	-0.124939
-0.899221	method for transferring	-0.124939
-0.600121	left for transferring	-0.124939
-0.601267	Windows by transferring	-0.124939
-1.558224	less efficient. Access	-0.124939
-0.143407	96 9.9 Access	-0.124939
-0.143407	www.agner.org/optimize/cppexamples.zip. 9.9 Access	-0.124939
-0.358973	data locally. Access	-0.124939
-1.359930	and for saving	-0.124939
-2.105319	used for saving	-0.124939
-0.598613	strategy for saving	-0.124939
-0.358988	stack frame, saving	-0.124939
-1.571055	for many years	-0.124939
-1.056308	take several years	-0.124939
-0.550826	a six years	-0.124939
-0.463677	or ten years	-0.124939
-0.588773	14.16a double y,	-0.124939
-0.588773	14.16b double y,	-0.124939
-0.916824	double x, y,	-0.124939
-0.692194	float x, y,	-0.124939
-0.901095	priority of structured	-0.124939
-0.601061	importance of structured	-0.124939
-1.705912	compatible with structured	-0.124939
-1.296586	relies on structured	-0.124939
-1.924868	the compiler documentation	-0.425969
-0.550832	to poor documentation	-0.124939
-0.505082	obsolete. Microprocessor documentation	-0.124939
-1.294669	() { CChild1	-0.124939
-0.589849	declaration class CChild1	-0.124939
-0.589849	versions: class CChild1	-0.124939
-0.463696	CChild2 Object2; CChild1	-0.124939
-1.057678	Digital Mars PGI	-0.124939
-0.463677	compilers. (The PGI	-0.124939
-0.358959	be tolerated. PGI	-0.124939
-0.358959	3.1, 2007. PGI	-0.124939
-0.577597	element. 100 As	-0.124939
-0.358959	= pow(x,n) As	-0.124939
-0.358959	predicted perfectly. As	-0.124939
-0.358959	* 5). As	-0.124939
-0.194907	type identification (RTTI)	-0.124939
-0.592264	precision by default,	-0.124939
-0.202143	binding by default,	-0.124939
-0.592264	not, by default,	-0.124939
-0.568386	} double xpow10(double	-0.124939
-0.197173	10 double xpow10(double	-0.425969
-0.568386	unrolled double xpow10(double	-0.124939
-0.031662	double a1, a2,	-0.425969
-0.031662	y, a1, a2,	-0.425969
-0.587034	flow at inconvenient	-0.124939
-0.587034	collector at inconvenient	-0.124939
-0.587034	unpredictably at inconvenient	-0.124939
-1.482484	but also inconvenient	-0.124939
-2.277802	to be expressed	-0.124939
-1.982697	can be expressed	-0.602060
-2.732989	if the bottleneck	-0.124939
-2.360320	If the bottleneck	-0.124939
-2.088435	be a bottleneck	-0.124939
-0.594482	Any specific bottleneck	-0.124939
-2.031024	using the directive	-0.124939
-1.203471	expect a directive	-0.124939
-0.584261	a #define directive	-0.124939
-0.358959	the __assume_aligned directive	-0.124939
-0.882657	CPU. If not,	-0.124939
-0.882657	factor. If not,	-0.124939
-1.267583	it does not,	-0.124939
-0.789094	Windows. Does not,	-0.124939
-2.616761	is a scarce	-0.124939
-0.804848	are a scarce	-0.124939
-0.585207	space were scarce	-0.124939
-0.886992	loop. Example 12.4b	-0.124939
-0.593936	expression. Example 12.4b	-0.124939
-1.042212	of example 12.4b	-0.124939
-1.920349	in example 12.4b	-0.124939
-3.143257	of the lrint	-0.124939
-2.301050	use the lrint	-0.124939
-0.378078	inline int lrint	-0.425969
-0.896704	have multiple versions.	-0.124939
-1.361247	in two versions.	-0.124939
-1.424839	the 64-bit versions.	-0.124939
-0.891139	and dynamic versions.	-0.124939
-1.351785	memory access .............................................................................................	-0.124939
-1.643746	vector classes .............................................................................................	-0.124939
-0.884192	Overloaded operators .............................................................................................	-0.124939
-1.169110	Integer multiplication .............................................................................................	-0.124939
-1.114483	by 16. Alignment	-0.124939
-0.314785	variables. 9.5 Alignment	-0.124939
-0.314785	88 9.5 Alignment	-0.124939
-0.358973	Table 7.2. Alignment	-0.124939
-1.203304	model is going	-0.124939
-0.601832	chance of going	-0.124939
-0.901048	am not going	-0.124939
-0.600577	penalty when going	-0.124939
-0.804228	overflow and underflow	-0.124939
-1.295611	generate an underflow	-0.124939
-2.580345	floating point underflow	-0.124939
-0.029478	their live ranges	-0.425969
-1.299464	loop and splitting	-0.124939
-1.073530	classes. The splitting	-0.124939
-0.600168	formalism. The splitting	-0.124939
-0.585191	instructions were splitting	-0.124939
-3.087114	of the user's	-0.124939
-0.601250	satisfies the user's	-0.124939
-0.601250	steal the user's	-0.124939
-0.588102	of end user's	-0.124939
-1.200097	and not __INTEL_COMPILER	-0.124939
-0.593740	Windows Linux __INTEL_COMPILER	-0.124939
-0.314785	not __INTEL_COMPILER __INTEL_COMPILER	-0.124939
-0.314785	Linux __INTEL_COMPILER __INTEL_COMPILER	-0.124939
-1.836140	to be cleaned	-0.124939
-1.298529	resources are cleaned	-0.124939
-0.876674	and resources cleaned	-0.124939
-1.300045	table is cached.	-0.124939
-1.937298	not be cached.	-0.124939
-2.418943	is not cached.	-0.124939
-2.081358	are not cached.	-0.124939
-0.601746	audio and video	-0.124939
-0.601365	audio or video	-0.124939
-0.129476	Aligning RGB video	-0.425969
-0.740358	elements in aa:	-0.425969
-0.598855	line number information.	-0.124939
-0.595551	publicly available information.	-0.124939
-1.675515	exception handling information.	-0.124939
-0.583267	new relevant information.	-0.124939
-1.171631	Container classes Whenever	-0.124939
-0.881761	never used. Whenever	-0.124939
-0.570580	big problem. Whenever	-0.124939
-0.527334	is better. Whenever	-0.124939
-2.762335	to the area	-0.124939
-0.796229	same memory area	-0.425969
-1.073934	static data area	-0.124939
-2.367622	because the consequence	-0.124939
-1.368982	calls. The consequence	-0.124939
-0.600168	wasted. The consequence	-0.124939
-0.505082	the unfortunate consequence	-0.124939
-0.588773	14.17b double a1,	-0.124939
-0.588773	14.17a double a1,	-0.124939
-0.129481	double y, a1,	-0.425969
-0.902682	dividend is unsigned.	-0.124939
-1.446238	converted to unsigned.	-0.124939
-1.299611	signed or unsigned.	-0.124939
-0.601303	zero-bits if unsigned.	-0.124939
-1.076636	operations with pointers.	-0.124939
-0.875562	for char pointers.	-0.124939
-0.172699	and invalid pointers.	-0.124939
-2.207566	See page 26	-0.124939
-1.296075	for floating 26	-0.124939
-0.358959	C++ constructs........................................................................ 26	-0.124939
-0.358959	variable storage............................................................................. 26	-0.124939
-1.289059	if possible. Smaller	-0.124939
-0.828271	the software. Smaller	-0.124939
-0.505062	optimize caching. Smaller	-0.124939
-0.358959	small microcontrollers: Smaller	-0.124939
-2.207566	See page 29	-0.124939
-0.557757	263-1 int64_t 29	-0.124939
-0.550826	swapping column 29	-0.124939
-0.358959	and operators............................................................................... 29	-0.124939
-1.203430	addition to sum2	-0.124939
-0.601734	sum1 and sum2	-0.124939
-1.423741	= 0, sum2	-0.124939
-0.659926	+= list[i]; sum2	-0.124939
-0.601399	u.i = (n	-0.124939
-0.899419	{ if (n	-0.124939
-0.594978	1.0; while (n	-0.124939
-0.902260	object for (b	-0.124939
-2.393659	a = (b	-0.124939
-0.899419	{ if (b	-0.124939
-0.598239	number by 2n	-0.124939
-0.895481	divide by 2n	-0.124939
-1.200730	it with 2n	-0.124939
-1.736003	less than 2n	-0.124939
-1.194452	or no idea	-0.124939
-0.726198	a good idea	-0.602060
-1.693575	of data ......................................................................................................	-0.124939
-0.892266	Function pointers ......................................................................................................	-0.124939
-0.891495	Network access ......................................................................................................	-0.124939
-0.858702	System database ......................................................................................................	-0.124939
-0.902506	written in C,	-0.124939
-0.194047	of standard C,	-0.425969
-0.567237	languages include C,	-0.124939
-0.601421	compiler // Same	-0.124939
-0.358959	Example 12.4c. Same	-0.124939
-0.358959	Example 12.4e. Same	-0.124939
-0.358959	Example 12.4d. Same	-0.124939
-1.237061	You can disable	-0.425969
-1.725157	you should disable	-0.124939
-0.463696	to test. disable	-0.124939
-2.549847	on the assumption	-0.124939
-0.601598	compiler, the assumption	-0.124939
-1.075922	such an assumption	-0.124939
-0.895651	make any assumption	-0.124939
-0.901305	x is treated	-0.124939
-1.820164	object is treated	-0.124939
-1.919401	is also treated	-0.124939
-0.882479	are simply treated	-0.124939
-0.587414	across compilers. Fastcall	-0.124939
-0.358959	#pragma optimize(...) Fastcall	-0.124939
-0.358959	pointers /vms Fastcall	-0.124939
-0.358959	CodeGear compiler). Fastcall	-0.124939
-0.871555	3-dimensional vectors RGB	-0.124939
-0.143413	12.9 Aligning RGB	-0.425969
-0.358973	square root, RGB	-0.124939
-0.601734	C# and avoids	-0.124939
-1.202327	way that avoids	-0.124939
-1.799258	and it avoids	-0.124939
-0.599870	time but avoids	-0.124939
-1.890838	compiler is prevented	-0.124939
-0.901318	F1 is prevented	-0.124939
-2.257921	can be prevented	-0.124939
-1.370422	Mathematical functions .......................................................................................	-0.124939
-1.226003	hardware platform .......................................................................................	-0.124939
-1.014212	optimal algorithm .......................................................................................	-0.124939
-0.806063	unit throughput .......................................................................................	-0.124939
-1.300214	feature is seldom	-0.124939
-0.902193	errors that seldom	-0.124939
-0.600362	separate from seldom	-0.124939
-1.041597	and put seldom	-0.124939
-0.503986	penalty for mixing	-0.124939
-1.201448	restrictions on mixing	-0.124939
-0.900147	problem when mixing	-0.124939
-0.249852	function. 7.12 Branches	-0.124939
-0.249852	40 7.12 Branches	-0.124939
-0.463696	example 15.1b. Branches	-0.124939
-0.358973	misprediction penalty. Branches	-0.124939
-0.601945	upon the double.	-0.124939
-2.175763	which is double.	-0.124939
-1.191386	of two double.	-0.124939
-0.358959	long long, double.	-0.124939
-2.633058	// Example 16.1	-0.124939
-1.367224	from example 16.1	-0.124939
-0.505062	speed.............................................................................................................. 153 16.1	-0.124939
-0.659926	(see below) 16.1	-0.124939
-0.203126	c; for (r	-0.425969
-0.378225	temp; for (r	-0.425969
-2.175763	which is 50%	-0.124939
-0.600059	by only 50%	-0.124939
-1.004316	is true 50%	-0.124939
-1.149503	be mispredicted 50%	-0.124939
-1.819695	in a suboptimal	-0.301030
-1.078534	leads to suboptimal	-0.124939
-0.255942	0.11 memcpy 16kB	-0.124939
-0.255942	0.12 memcpy 16kB	-0.124939
-0.255942	Processor memcpy 16kB	-0.124939
-0.255942	0.22 memcpy 16kB	-0.124939
-1.073477	to different tasks.	-0.124939
-1.178995	for simple tasks.	-0.124939
-0.589171	complicated mathematical tasks.	-0.124939
-0.358959	logically distinct tasks.	-0.124939
-2.762242	if the image	-0.124939
-0.601734	processing and image	-0.124939
-0.902066	Examples are image	-0.124939
-0.527357	vectors RGB image	-0.124939
-0.601951	determine the worst-case	-0.124939
-0.855537	when testing worst-case	-0.124939
-0.474620	time under worst-case	-0.124939
-0.474620	tested under worst-case	-0.124939
-0.575645	result, true (1)	-0.124939
-0.659926	this problem: (1)	-0.124939
-0.358959	data object: (1)	-0.124939
-0.358959	address. Step (1)	-0.124939
-2.718190	is a float,	-0.124939
-1.203208	representation of float,	-0.124939
-1.359811	Conversions between float,	-0.124939
-0.541286	as int, float,	-0.124939
-1.474384	for example 9.5	-0.124939
-0.589368	modify example 9.5	-0.124939
-1.018262	register variables. 9.5	-0.124939
-0.527355	...................................... 88 9.5	-0.124939
-1.031791	a[i] = Induction;	-0.124939
-0.202506	a[i+1] = Induction;	-0.124939
-1.203430	writing to uncached	-0.124939
-1.202075	code are uncached	-0.124939
-1.075893	than an uncached	-0.124939
-0.594656	store An uncached	-0.124939
-1.961616	into the individual	-0.124939
-1.641614	access to individual	-0.124939
-1.550713	than by individual	-0.124939
-1.169076	to identify individual	-0.124939
-1.445778	it to begin	-0.124939
-0.601611	names that begin	-0.124939
-1.077421	microprocessor can begin	-0.124939
-0.594987	array must begin	-0.124939
-1.061124	the user interface.	-0.124939
-0.640174	graphical user interface.	-0.124939
-2.633058	// Example 9.3	-0.124939
-0.597995	As table 9.3	-0.124939
-0.550810	................................................................................................... 87 9.3	-0.124939
-0.659926	VIA CPUs". 9.3	-0.124939
-1.716410	of this option.	-0.124939
-0.498153	on this option.	-0.124939
-0.557810	the -fpic option.	-0.124939
-2.676074	to the diagonal.	-0.124939
-2.179352	at the diagonal.	-0.124939
-0.504687	above the diagonal.	-0.425969
-0.902463	interfaces and interfaces	-0.124939
-1.225134	of user interfaces	-0.124939
-1.304315	graphical user interfaces	-0.124939
-0.591952	to hardware interfaces	-0.124939
-0.199547	of 4 floats	-0.124939
-1.257664	of four floats	-0.124939
-1.009638	of 100 floats	-0.124939
-1.594988	function to another.	-0.124939
-1.076136	object to another.	-0.124939
-1.077099	way or another.	-0.124939
-0.900849	thread than another.	-0.124939
-1.934509	unsigned int N>	-0.124939
-0.596746	IsPowerOf2, int N>	-0.124939
-0.204109	template <int N>	-0.124939
-0.249860	bytes. 7.19 Class	-0.124939
-0.249860	51 7.19 Class	-0.124939
-0.249860	performance. 7.18 Class	-0.124939
-0.249860	51 7.18 Class	-0.124939
-1.601596	the program. Small	-0.124939
-0.960830	in parallel. Small	-0.124939
-0.659926	be controlled. Small	-0.124939
-0.463677	vectorization favorable: Small	-0.124939
-0.601611	trick that N1	-0.124939
-0.890120	The constant N1	-0.124939
-0.584261	N: #define N1	-0.124939
-0.358959	powN<true,N-N1>::p(x); #undef N1	-0.124939
-1.679647	advantages of alloca	-0.124939
-0.600577	explicitly when alloca	-0.124939
-1.739673	in which alloca	-0.124939
-1.351688	function returns. alloca	-0.124939
-0.902488	alignment and aliasing.	-0.124939
-0.542628	no pointer aliasing.	-0.124939
-2.189133	can be eliminated	-0.425969
-1.746281	also be eliminated	-0.124939
-1.181371	sometimes be eliminated	-0.124939
-2.029113	that a detailed	-0.124939
-0.902240	documentation for detailed	-0.124939
-1.371435	A more detailed	-0.124939
-1.186496	which makes detailed	-0.124939
-0.589160	double 256 F32vec4	-0.124939
-0.620514	// x^4 F32vec4	-0.124939
-0.314796	x^3, x^4 F32vec4	-0.124939
-0.527355	four floats F32vec4	-0.124939
-0.601365	clear or mask	-0.124939
-1.175244	// Use mask	-0.124939
-0.191275	bit-mask: __m128i mask	-0.425969
-2.679031	that the original	-0.124939
-2.443050	when the original	-0.124939
-1.672415	whether the original	-0.124939
-0.601663	disadvantages. The original	-0.124939
-1.074790	because all caches	-0.124939
-1.064076	about how caches	-0.124939
-0.589656	guidelines. Most caches	-0.124939
-0.358959	each other's caches	-0.124939
-1.378918	fail to recognize	-0.124939
-1.638133	it will recognize	-0.124939
-1.671547	compiler will recognize	-0.124939
-1.726193	compilers will recognize	-0.124939
-0.314785	168.5 513 513	-0.124939
-0.314785	230.7 513 513	-0.124939
-0.358973	378.7 168.5 513	-0.124939
-0.358973	2048 230.7 513	-0.124939
-0.764695	7.29 Threads Threads	-0.124939
-0.143407	method. 7.29 Threads	-0.124939
-0.143407	Templates...............................................................................................................57 7.29 Threads	-0.124939
-0.358973	in Linux). Threads	-0.124939
-0.249860	functions. 7.27 Overloaded	-0.124939
-0.249860	56 7.27 Overloaded	-0.124939
-0.143412	); 7.26 Overloaded	-0.124939
-0.143412	56 7.26 Overloaded	-0.124939
-1.282979	of 2. Contentions	-0.124939
-0.463677	target buffer. Contentions	-0.124939
-0.358959	buffer (BTB). Contentions	-0.124939
-0.358959	my experiments. Contentions	-0.124939
-1.844988	method is illustrated	-0.124939
-0.601166	technique is illustrated	-0.124939
-2.953773	can be illustrated	-0.124939
-0.601122	checking, as illustrated	-0.124939
-0.260345	In other words,	-0.249877
-0.601653	risky. The returned	-0.124939
-2.257705	can be returned	-0.425969
-1.743200	objects are returned	-0.124939
-0.902277	one. The existing	-0.124939
-1.375136	compatibility with existing	-0.124939
-1.635815	to an existing	-0.124939
-0.596772	modify an existing	-0.124939
-1.715813	the code. Let's	-0.124939
-0.575645	of precision. Let's	-0.124939
-0.358959	4 rows. Let's	-0.124939
-0.358959	possible inputs. Let's	-0.124939
-0.503798	as it is,	-0.124939
-0.896674	than there is,	-0.124939
-1.590129	The reason is,	-0.124939
-0.730743	following example illustrates	-0.124939
-0.379812	pitfalls of unit-testing	-0.124939
-1.076806	performance by unit-testing	-0.124939
-0.601065	development. This unit-testing	-0.124939
-0.185411	i; for(i=0; i<300;	-0.301030
-0.358988	i_div_3; for(i=i_div_3=0; i<300;	-0.124939
-1.430470	* p) {return	-0.124939
-1.065737	& r) {return	-0.124939
-0.358959	int ReadB() {return	-0.124939
-0.358959	int Sum1() {return	-0.124939
-0.593275	a2 / b2;	-0.124939
-0.366835	a2, b1, b2;	-0.425969
-0.463696	public: B2 b2;	-0.124939
-0.563054	time applications. Remember	-0.124939
-0.541266	variable names. Remember	-0.124939
-0.764627	work better. Remember	-0.124939
-0.358959	and down. Remember	-0.124939
-1.202495	cases. The explicit	-0.124939
-1.183342	making an explicit	-0.124939
-0.596772	Insert an explicit	-0.124939
-2.531978	to make explicit	-0.124939
-2.251394	then the mirror	-0.124939
-1.300110	optimal to mirror	-0.124939
-2.095347	You may mirror	-0.124939
-0.595250	at its mirror	-0.124939
-2.530352	of a dedicated	-0.124939
-1.256291	than a dedicated	-0.124939
-1.950327	have a dedicated	-0.124939
-0.558896	virtual void Disp()	-0.425969
-0.558896	public: void Disp()	-0.425969
-1.052325	{ int r,	-0.425969
-0.588425	y=temp;} int r,	-0.124939
-0.588425	98 int r,	-0.124939
-2.218233	// Example 8.26a	-0.124939
-0.886992	; Example 8.26a	-0.124939
-1.320236	from example 8.26a	-0.124939
-0.589384	optimize example 8.26a	-0.124939
-1.078763	set a breakpoint	-0.124939
-0.172702	interrupt 3 breakpoint	-0.124939
-0.971398	a fixed breakpoint	-0.124939
-0.015542	a1, a2, b1,	-0.425969
-0.601611	register that appears	-0.124939
-0.601363	although it appears	-0.124939
-1.295632	if this appears	-0.124939
-0.594658	better processor appears	-0.124939
-0.601945	verifying the functionality	-0.124939
-0.591915	that add functionality	-0.124939
-1.504515	the desired functionality	-0.124939
-0.940759	a well-defined functionality	-0.124939
-1.433084	in other languages.	-0.124939
-1.123189	other programming languages.	-0.124939
-0.573374	various programming languages.	-0.124939
-0.358973	less well-known languages.	-0.124939
-0.601832	algorithm of sequential	-0.124939
-1.641971	accessed in sequential	-0.124939
-0.901371	statement with sequential	-0.124939
-0.463677	in non- sequential	-0.124939
-0.374138	manual at www.agner.org/optimize/cppexamples.zip	-0.124939
-0.587034	appendix at www.agner.org/optimize/cppexamples.zip	-0.124939
-0.597486	alignment. See www.agner.org/optimize/cppexamples.zip	-0.124939
-0.533136	of programming style.	-0.249877
-0.203741	9.6b. The MOVNTQ	-0.425969
-0.601430	*(__m64*)&source); // MOVNTQ	-0.124939
-1.289367	without cache MOVNTQ	-0.124939
-0.601856	assume is optimized.	-0.124939
-2.591166	is not optimized.	-0.124939
-0.597385	but less optimized.	-0.124939
-0.567208	always fully optimized.	-0.124939
-1.663887	and data .........................................................................................	-0.124939
-1.034765	Preprocessing directives .........................................................................................	-0.124939
-1.203709	Automatic vectorization .........................................................................................	-0.124939
-0.659926	optimization topics .........................................................................................	-0.124939
-0.599553	functions class CHello	-0.124939
-0.760355	: public CHello	-0.425969
-0.463696	C2 Object2; CHello	-0.124939
-2.222067	can be found	-0.425969
-1.849892	must be found	-0.124939
-0.573359	features rarely found	-0.124939
-1.550780	done by me	-0.124939
-0.143407	code. Let me	-0.124939
-0.143407	sets. Let me	-0.124939
-0.358973	have sent me	-0.124939
-2.190042	from the counts.	-0.124939
-1.068628	two clock counts.	-0.124939
-1.149417	the subsequent counts.	-0.124939
-0.764627	"worst case" counts.	-0.124939
-1.281142	The performance measurement	-0.124939
-1.529887	to put measurement	-0.124939
-1.504515	the desired measurement	-0.124939
-0.358959	method. Your measurement	-0.124939
-0.896704	through multiple layers	-0.124939
-0.597935	extra software layers	-0.124939
-0.594301	requires several layers	-0.124939
-0.589146	of separate layers	-0.124939
-1.691850	an error handler	-0.124939
-0.402194	the exception handler	-0.124939
-2.308871	that is coded	-0.124939
-2.528997	This is coded	-0.124939
-2.953773	can be coded	-0.124939
-2.302298	that are coded	-0.124939
-1.078241	small and changing	-0.124939
-1.189033	and by changing	-0.124939
-1.189033	multiplication by changing	-0.124939
-0.864055	last index changing	-0.124939
-3.018018	in the unit-test	-0.124939
-0.601598	unfortunately the unit-test	-0.124939
-2.102016	on a unit-test	-0.124939
-0.600717	under this unit-test	-0.124939
-1.379560	through the implicit	-0.124939
-0.601653	__fastcall. The implicit	-0.124939
-1.727959	as an implicit	-0.124939
-1.359433	has an implicit	-0.124939
-1.202816	faster and smaller.	-0.124939
-0.601548	thread are smaller.	-0.124939
-0.594658	800 bytes smaller.	-0.124939
-0.591275	make files smaller.	-0.124939
-2.117941	in the interval	-0.124939
-1.504805	the desired interval	-0.124939
-2.367333	because the 33	-0.124939
-0.835997	65 65 33	-0.124939
-0.463677	Enums ...................................................................................................................... 33	-0.124939
-0.358959	7.5 Booleans................................................................................................................... 33	-0.124939
-2.207566	See page 31	-0.124939
-0.580766	integer variables. 31	-0.124939
-0.818666	eax ebx, 31	-0.124939
-0.527334	63 63 31	-0.124939
-0.112450	= (unsigned int)b	-0.425969
-1.129162	x86 platforms. 3.	-0.124939
-0.563066	for interrupt 3.	-0.124939
-0.557800	at vectorization. 3.	-0.124939
-0.358959	anonymous namespace. 3.	-0.124939
-0.869965	takes 10 μs	-0.124939
-0.583248	only 5 μs	-0.124939
-0.358993	= 250 μs	-0.124939
-0.358993	not! 250 μs	-0.124939
-1.776798	any other module.	-0.124939
-0.885278	in another module.	-0.124939
-0.188124	from another module.	-0.124939
-0.861644	code. Dynamic cast	-0.124939
-0.541305	not. Static cast	-0.124939
-0.358959	int. Reinterpret cast	-0.124939
-0.358959	CPUs"). Const cast	-0.124939
-1.376472	stored as 8-bit	-0.124939
-1.068668	as an 8-bit	-0.425969
-1.655519	are using 8-bit	-0.124939
-0.864026	the size. Integers	-0.124939
-0.573350	Integer sizes Integers	-0.124939
-0.249852	classes. 7.2 Integers	-0.124939
-0.249852	26 7.2 Integers	-0.124939
-0.892266	Function pointers Calling	-0.124939
-1.119566	the class. Calling	-0.124939
-0.541266	CPUs. 5. Calling	-0.124939
-0.358959	calls exit. Calling	-0.124939
-0.129476	int lrint (double	-0.425969
-0.463696	double ipow (double	-0.124939
-0.358973	double IntegerPower (double	-0.124939
-0.559003	Saturday }; Weekdays	-0.124939
-0.559003	0x40 }; Weekdays	-0.124939
-0.314795	& enum Weekdays	-0.124939
-0.314795	conditions enum Weekdays	-0.124939
-0.601635	always for application-specific	-0.124939
-1.708582	to store application-specific	-0.124939
-0.882453	in optimizing application-specific	-0.124939
-1.089906	to define application-specific	-0.124939
-1.270940	and c first.	-0.124939
-0.577661	predictable operand first.	-0.124939
-0.570590	members come first.	-0.124939
-1.096261	the fastest first.	-0.124939
-3.205415	of the considerations	-0.124939
-1.298626	determined by considerations	-0.124939
-1.987985	The following considerations	-0.124939
-0.463677	the conflicting considerations	-0.124939
-1.067202	fraction 2 63	-0.124939
-1.181795	per element 63	-0.124939
-0.788977	a valid 63	-0.124939
-0.527334	element 63 63	-0.124939
-1.078590	double is represented	-0.124939
-2.257705	can be represented	-0.124939
-1.394548	in fact represented	-0.124939
-2.294402	order to force	-0.124939
-1.935753	You can force	-0.124939
-0.599206	come into force	-0.124939
-0.588032	Memory-hungry applications force	-0.124939
-1.373474	do this manually.	-0.124939
-2.277170	to do manually.	-0.124939
-0.866213	the reductions manually.	-0.124939
-0.463677	the parentheses manually.	-0.124939
-2.392389	should be identified	-0.124939
-0.599703	but are identified	-0.124939
-1.725179	objects are identified	-0.124939
-1.193417	Are objects identified	-0.124939
-1.294104	given in www.agner.org/optimize/cppexamples.zip.	-0.124939
-0.600666	templates in www.agner.org/optimize/cppexamples.zip.	-0.124939
-1.075743	manual at www.agner.org/optimize/cppexamples.zip.	-0.124939
-0.893982	pool. See www.agner.org/optimize/cppexamples.zip.	-0.124939
-1.940601	has a virus	-0.124939
-1.641614	access to virus	-0.124939
-0.902240	uncommon for virus	-0.124939
-0.358959	users. Firewalls, virus	-0.124939
-1.378333	arrays and structures.	-0.124939
-0.901701	classes or structures.	-0.124939
-0.376764	big data structures.	-0.124939
-1.532203	class library exp	-0.124939
-0.573312	directly: Library exp	-0.124939
-0.764627	4 floats exp	-0.124939
-0.527357	library exp exp	-0.124939
-1.963746	the size (in	-0.124939
-1.551182	The size (in	-0.124939
-0.886582	matrix line (in	-0.124939
-1.504848	clock frequency (in	-0.124939
-2.664521	is a pointer,	-0.124939
-0.601119	type, a pointer,	-0.124939
-0.570612	references, 'this' pointer,	-0.124939
-0.358973	an imported pointer,	-0.124939
-0.601863	bit is kept	-0.124939
-1.129012	preferably be kept	-0.124939
-1.934073	functions are kept	-0.124939
-0.599317	A; double Y	-0.124939
-1.565952	induction variable Y	-0.124939
-1.529343	induction variables Y	-0.124939
-0.358959	= Y; Y	-0.124939
-0.600586	optimizations when interprocedural	-0.124939
-2.277466	to do interprocedural	-0.124939
-0.314796	and enables interprocedural	-0.124939
-0.620514	This enables interprocedural	-0.124939
-0.898390	these are incompatible	-0.124939
-0.898390	options are incompatible	-0.124939
-2.653289	the code incompatible	-0.124939
-0.463696	to use, incompatible	-0.124939
-0.589179	or 256 bytes)	-0.124939
-0.077822	size (in bytes)	-0.425969
-0.172700	line (in bytes)	-0.124939
-2.731302	to the selected	-0.124939
-1.597459	have the selected	-0.124939
-2.302584	code is selected	-0.124939
-2.537796	may be selected	-0.124939
-1.445976	value is multiplied	-0.124939
-2.317559	should be multiplied	-0.124939
-1.881007	must be multiplied	-0.124939
-0.864055	an index multiplied	-0.124939
-0.600665	reliable and reproducible	-0.124939
-0.600665	accurate and reproducible	-0.124939
-0.600595	get more reproducible	-0.124939
-1.591145	to get reproducible	-0.124939
-1.743093	objects are normally	-0.124939
-2.046418	do not normally	-0.124939
-0.601058	else. This normally	-0.124939
-0.892059	Mac systems normally	-0.124939
-2.199791	used for constants.	-0.124939
-1.759356	or more constants.	-0.124939
-1.194620	for integer constants.	-0.124939
-0.527334	for defining constants.	-0.124939
-0.601148	cache, code cache,	-0.124939
-1.615969	of data cache,	-0.124939
-1.187149	level-1 data cache,	-0.124939
-2.646757	the same cache,	-0.124939
-0.601122	serves as entry	-0.124939
-0.887721	the common entry	-0.124939
-0.615793	the PLT entry	-0.124939
-0.435076	The PLT entry	-0.124939
-1.503927	performance is inferior	-0.124939
-1.712431	compilers are inferior	-0.124939
-0.600961	run an inferior	-0.124939
-0.594656	supports. An inferior	-0.124939
-2.430703	to be obsolete.	-0.124939
-0.599801	soon be obsolete.	-0.124939
-0.143412	1994. Mostly obsolete.	-0.124939
-0.143412	1997. Mostly obsolete.	-0.124939
-0.889006	calculations while simultaneously	-0.124939
-1.064799	multiple calculations simultaneously	-0.124939
-0.588638	applications running simultaneously	-0.124939
-0.541266	more jobs simultaneously	-0.124939
-0.726919	interrupt service routine	-0.124939
-0.090170	The initialization routine	-0.124939
-0.042749	an initialization routine	-0.425969
-1.202075	pointers are auto_ptr	-0.124939
-1.289824	from one auto_ptr	-0.124939
-0.527334	only one, auto_ptr	-0.124939
-0.358959	and shared_ptr. auto_ptr	-0.124939
-1.488659	A branch tree	-0.124939
-0.820038	a binary tree	-0.124939
-0.174300	A binary tree	-0.124939
-1.201642	compiler is unable	-0.425969
-1.520551	will be unable	-0.425969
-1.203780	is executed. Optimizes	-0.124939
-0.480994	automatic vectorization. Optimizes	-0.124939
-0.835998	calling conventions. Optimizes	-0.124939
-2.298496	floating point constants,	-0.124939
-0.592035	26 point constants,	-0.124939
-0.190176	constants, string constants,	-0.124939
-3.142249	of the techniques	-0.124939
-0.601598	trying the techniques	-0.124939
-1.988107	The following techniques	-0.124939
-0.589643	Using complicated techniques	-0.124939
-1.741141	function or otherwise	-0.124939
-0.599916	operator which otherwise	-0.124939
-1.580393	if they otherwise	-0.124939
-0.592245	that would otherwise	-0.124939
-0.143407	pointer. 7.9 Smart	-0.124939
-0.143407	pointers.......................................................................................................37 7.9 Smart	-0.124939
-0.463696	is deleted. Smart	-0.124939
-0.358973	for auto_ptr. Smart	-0.124939
-2.152128	if it opens	-0.124939
-2.411824	a function opens	-0.124939
-0.599916	WritePrivateProfileString, which opens	-0.124939
-2.440959	instruction set opens	-0.124939
-1.840400	may be modified	-0.425969
-2.302298	that are modified	-0.124939
-1.166516	are never modified	-0.124939
-1.042212	convert example 15.1a	-0.124939
-0.589384	reduces example 15.1a	-0.124939
-0.705227	compiler reduced 15.1a	-0.124939
-0.491956	compilers reduced 15.1a	-0.124939
-0.864036	x86-64 platforms. Comparison	-0.124939
-0.463696	Table 8.1. Comparison	-0.124939
-0.143407	optimization. 8.2 Comparison	-0.124939
-0.143407	66 8.2 Comparison	-0.124939
-2.833768	it is finished	-0.124939
-1.075232	threads have finished	-0.124939
-1.135750	it has finished	-0.425969
-2.733921	it is run.	-0.124939
-1.357947	program is run.	-0.124939
-2.122700	it can run.	-0.124939
-0.202393	Access data sequentially	-0.124939
-0.597542	necessarily stored sequentially	-0.124939
-1.353196	be accessed sequentially	-0.124939
-0.600362	manuals from Intel:	-0.124939
-0.596936	code optimization Intel:	-0.124939
-0.527357	Microprocessor documentation Intel:	-0.124939
-0.659926	produced regularly. Intel:	-0.124939
-2.044553	in this format.	-0.124939
-1.430953	long double format.	-0.124939
-1.272029	object file format.	-0.124939
-0.358959	to OMF format.	-0.124939
-1.424340	in C++ programs.	-0.124939
-0.593504	in optimized programs.	-0.124939
-0.575662	with 16-bit programs.	-0.124939
-0.567264	non-object oriented programs.	-0.124939
-1.241669	a non-sequential manner	-0.124939
-0.659926	a systematic manner	-0.124939
-0.358959	rather unconventional manner	-0.124939
-0.358959	a column-wise manner	-0.124939
-0.897917	how compilers work.	-0.124939
-1.672351	not always work.	-0.124939
-0.595279	on important work.	-0.124939
-0.586759	and microprocessors work.	-0.124939
-0.901671	long or uint64_t	-0.124939
-1.285130	64 2 uint64_t	-0.124939
-0.589142	int64_t 256 uint64_t	-0.124939
-0.358959	0 264-1 uint64_t	-0.124939
-1.051452	} if (level	-0.124939
-1.060500	else if (level	-0.124939
-0.202215	branches): if (level	-0.425969
-0.601763	well in tests	-0.124939
-0.902256	compilers The tests	-0.124939
-0.598224	make some tests	-0.124939
-0.598009	Most performance tests	-0.124939
-0.550826	three times. Then	-0.124939
-0.527334	profiling support. Then	-0.124939
-0.527334	-fpic option. Then	-0.124939
-0.358959	in F1? Then	-0.124939
-1.634136	where a soft	-0.124939
-1.296347	Such a soft	-0.124939
-0.577620	a so-called soft	-0.124939
-0.505082	and FPGA soft	-0.124939
-1.718638	b = 100,	-0.124939
-1.732057	c = 100,	-0.124939
-0.594052	min = 100,	-0.124939
-0.594052	NUMROWS = 100,	-0.124939
-0.957945	more reliable results.	-0.124939
-0.444408	and reproducible results.	-0.124939
-0.314785	get reproducible results.	-0.124939
-0.463696	produce undesired results.	-0.124939
-0.601843	tell a hyperthreading	-0.124939
-2.421807	to use hyperthreading	-0.124939
-0.599950	application. If hyperthreading	-0.124939
-1.470140	can avoid hyperthreading	-0.124939
-0.902463	expressions and operators.	-0.124939
-0.177782	and overloaded operators.	-0.124939
-0.726923	and decrement operators.	-0.124939
-2.383912	function is simpler	-0.124939
-1.200557	syntax is simpler	-0.124939
-1.409831	is much simpler	-0.124939
-1.407878	code becomes simpler	-0.124939
-2.580345	floating point format	-0.124939
-0.198226	intermediate file format	-0.124939
-1.461403	the right format	-0.124939
-1.887368	or a reasonable	-0.124939
-1.929670	if a reasonable	-0.124939
-1.196211	the only reasonable	-0.124939
-0.898203	when no reasonable	-0.124939
-2.428990	with the resolution	-0.124939
-0.588620	very high resolution	-0.124939
-0.871548	much higher resolution	-0.124939
-0.463677	with millisecond resolution	-0.124939
-1.072037	more integer units,	-0.124939
-0.594828	have execution units,	-0.124939
-1.415451	point addition units,	-0.124939
-0.563066	multiplexers, arithmetic units,	-0.124939
-2.633058	// Example 12.2	-0.124939
-0.595122	library function. 12.2	-0.124939
-0.541266	................................................................. 107 12.2	-0.124939
-0.527334	127 126 12.2	-0.124939
-0.387640	into vector b:	-0.249877
-0.600293	time-consuming data processing.	-0.124939
-0.573321	for parallel processing.	-0.124939
-0.527334	and image processing.	-0.124939
-0.505091	for multi-core processing.	-0.124939
-1.892159	and a well-defined	-0.124939
-1.635563	with a well-defined	-0.124939
-0.550881	overflow behavior well-defined	-0.124939
-0.377098	2 // Still	-0.425969
-0.202561	faster // Still	-0.425969
-0.597077	14 - 45	-0.124939
-0.597077	(20 - 45	-0.124939
-2.079281	int i; 45	-0.124939
-0.358973	7.13 Loops...................................................................................................................... 45	-0.124939
-0.485804	elements from bb	-0.602060
-0.581066	115 from bb	-0.124939
-1.549400	explained in detail	-0.124939
-1.549400	described in detail	-0.124939
-1.074829	in more detail	-0.124939
-3.206581	of the advices	-0.124939
-0.586750	developer.intel.com. Many advices	-0.124939
-0.550832	above security advices	-0.124939
-0.598708	71). The conclusion	-0.124939
-0.598708	utility. The conclusion	-0.124939
-0.598708	1.23456. The conclusion	-0.124939
-1.200584	to is deleted	-0.124939
-1.820257	object is deleted	-0.124939
-1.372373	and later deleted	-0.124939
-2.567757	and the 49	-0.124939
-1.951042	See page 49	-0.124939
-1.311608	(See page 49	-0.124939
-0.598454	counters, function parameters,	-0.124939
-0.598454	from), function parameters,	-0.124939
-1.063856	as template parameters,	-0.124939
-0.596949	vectorized code. Storing	-0.124939
-1.119619	the class. Storing	-0.124939
-1.231536	32-bit mode. Storing	-0.124939
-0.902463	0x2710 and (set)	-0.124939
-1.294468	we have (set)	-0.124939
-0.358973	the formula: (set)	-0.124939
-0.601746	parallelism and fine-grained	-0.124939
-0.901392	than with fine-grained	-0.124939
-0.358973	for exploiting fine-grained	-0.124939
-0.563087	simply zero. Execution	-0.124939
-0.143412	chain. 3.16 Execution	-0.124939
-0.143412	22 3.16 Execution	-0.124939
-0.845759	c = LoadVector(cc	-0.602060
-1.361555	at an arbitrary	-0.124939
-1.255811	into an arbitrary	-0.124939
-0.592632	just an arbitrary	-0.124939
-1.119619	the class. Which	-0.124939
-0.463696	case" values. Which	-0.124939
-0.463696	same effect. Which	-0.124939
-1.075677	compilers may behave	-0.124939
-0.599474	Different compilers behave	-0.124939
-1.061846	to always behave	-0.124939
-0.597187	with 64 bits,	-0.124939
-0.891329	is 32 bits,	-0.124939
-0.593034	to four bits,	-0.124939
-0.601746	platform-independent and compact.	-0.124939
-0.900164	data more compact.	-0.124939
-1.065336	slightly less compact.	-0.124939
-1.195743	class that behaves	-0.124939
-1.072883	object that behaves	-0.124939
-0.463714	30 Overflow behaves	-0.124939
-1.378408	processors and FPGA	-0.124939
-1.063534	and an FPGA	-0.124939
-1.709518	in an FPGA	-0.124939
-1.378333	processors and earlier	-0.124939
-0.901392	not with earlier	-0.124939
-0.595661	v.10.2 & earlier	-0.124939
-0.581640	(seconds < 5)	-0.124939
-0.581640	(0 < 5)	-0.124939
-1.018348	(iset >= 5)	-0.124939
-0.757840	The operators &,	-0.124939
-0.796552	bitwise operators &,	-0.425969
-0.599531	10 page 101	-0.124939
-0.541288	is necessary. 101	-0.124939
-0.358973	10 Multithreading.............................................................................................................. 101	-0.124939
-0.967435	the following reasons:	-0.301030
-1.530476	the elements consecutively	-0.124939
-1.826651	are stored consecutively	-0.124939
-1.879857	are accessed consecutively	-0.124939
-1.558224	less efficient. Extra	-0.124939
-0.588088	Misaligned data. Extra	-0.124939
-0.557791	4 processor. Extra	-0.124939
-1.803980	case of error.	-0.124939
-0.600971	provokes an error.	-0.124939
-1.061644	common programming error.	-0.124939
-2.953773	can be carried	-0.124939
-0.585191	tests were carried	-0.124939
-0.358973	longer loop- carried	-0.124939
-0.901505	smaller by reordering	-0.124939
-0.601065	ArrayOfStructures[100]; This reordering	-0.124939
-1.198772	make this reordering	-0.124939
-1.052679	to another platform.	-0.124939
-0.590109	on Mac platform.	-0.124939
-0.828349	a PC platform.	-0.124939
-2.059499	you are satisfied	-0.124939
-0.599714	who are satisfied	-0.124939
-2.191732	are not satisfied	-0.124939
-1.848049	It may catch	-0.124939
-1.589617	code will catch	-0.124939
-0.600465	F1(); } catch	-0.124939
-0.463696	/arch:SSE4.1 -mAVX /arch:AVX	-0.124939
-0.358973	linking (multithreaded) /arch:AVX	-0.124939
-0.358973	set (/arch:SSE2, /arch:AVX	-0.124939
-2.208291	See page 93	-0.124939
-0.575678	are containers 93	-0.124939
-0.505082	classes ..................................................................................................... 93	-0.124939
-1.460782	the virtual 53	-0.124939
-0.659955	functions ........................................................................................ 53	-0.124939
-0.358973	functions (methods)......................................................................... 53	-0.124939
-0.563074	__declspec(align(16)) X #else	-0.124939
-0.541303	"memory" ); #else	-0.124939
-0.463696	pure_function __attribute__((const)) #else	-0.124939
-1.063534	and an addition.	-0.124939
-1.063534	only an addition.	-0.124939
-2.580881	floating point addition.	-0.124939
-1.743157	compile time. Text	-0.124939
-1.106984	container classes. Text	-0.124939
-0.726886	9.8 Strings Text	-0.124939
-0.500756	systems with big-endian	-0.425969
-0.888612	platforms with big-endian	-0.124939
-0.659955	class B2; 54	-0.124939
-0.358973	(RTTI) ........................................................................... 54	-0.124939
-0.358973	Inheritance .............................................................................................................. 54	-0.124939
-0.601746	145 and 119	-0.124939
-0.577633	non-vector library. 119	-0.124939
-0.358973	for vectors........................................................................ 119	-0.124939
-1.369164	a && !a	-0.124939
-1.232379	a || !a	-0.124939
-0.659955	= a&&(b||c) !a	-0.124939
-1.075668	level of abstraction	-0.124939
-0.203767	layers of abstraction	-0.124939
-0.550889	8 bits each,	-0.124939
-1.065772	16 bits each,	-0.124939
-1.358951	32 bits each,	-0.124939
-0.463738	(level >= 11)	-0.425969
-0.463714	execution (chapter 11)	-0.124939
-1.078520	bytes of code).	-0.124939
-0.585182	produce binary code).	-0.124939
-0.358973	code (byte code).	-0.124939
-1.781342	memory allocation ......................................................................................	-0.124939
-0.764695	of unit-testing ......................................................................................	-0.124939
-0.659955	clock cycle? ......................................................................................	-0.124939
-2.360571	If the wrong	-0.124939
-0.601604	chosen the wrong	-0.124939
-2.431153	to a wrong	-0.124939
-0.844533	b = LoadVector(bb	-0.602060
-1.203473	integers to alias	-0.124939
-1.330287	does not alias	-0.124939
-1.670309	of memory blocks,	-0.124939
-0.873991	multiple memory blocks,	-0.124939
-1.050082	large memory blocks,	-0.124939
-0.596534	Take user feedback	-0.124939
-1.418315	the main feedback	-0.124939
-0.505082	features. User feedback	-0.124939
-0.519840	#else #define pure_function	-0.124939
-0.519840	__GNUC__ #define pure_function	-0.124939
-0.358988	double Func1(double) pure_function	-0.124939
-1.675195	a - a-a	-0.124939
-2.363795	- n.a. a-a	-0.124939
-0.463696	x-xxx---- a-(-b)=a+b a-a	-0.124939
-0.492390	long dependency chains.	-0.124939
-2.200138	used for prefetching	-0.124939
-0.876681	on automatic prefetching	-0.124939
-0.527373	while simultaneously prefetching	-0.124939
-1.203686	see the compiler-generated	-0.124939
-1.299464	libraries and compiler-generated	-0.124939
-0.541288	and understand compiler-generated	-0.124939
-0.600518	investment. A redesign	-0.124939
-0.249860	a complete redesign	-0.124939
-0.249860	A complete redesign	-0.124939
-0.249860	may behave differently	-0.124939
-0.249860	compilers behave differently	-0.124939
-0.505118	Overflow behaves differently	-0.124939
-1.627785	and then B,	-0.124939
-0.107204	int A, B,	-0.425969
-0.444421	vectorization. Optimizes reasonably	-0.124939
-0.314795	conventions. Optimizes reasonably	-0.124939
-0.505102	Studio optimizes reasonably	-0.124939
-0.957972	processor core. Two	-0.124939
-0.726886	using templates. Two	-0.124939
-0.358973	write _mm_add_epi16(a,b). Two	-0.124939
-0.463696	destructors .................................................................................. 55	-0.124939
-0.358973	give -2.0 55	-0.124939
-0.358973	Unions .................................................................................................................... 55	-0.124939
-0.361237	vector math libraries:	-0.301030
-0.897415	linked into projects	-0.124939
-1.485594	of C++ projects	-0.124939
-1.066971	that software projects	-0.124939
-0.308848	longdoublevalue ( 1)sign	-0.124939
-0.308848	doublevalue ( 1)sign	-0.124939
-0.308848	floatvalue ( 1)sign	-0.124939
-2.635642	// Example 14.6	-0.124939
-2.339209	= 0; 14.6	-0.124939
-0.505082	division...................................................................................................... 137 14.6	-0.124939
-2.397185	is the combination	-0.124939
-2.340331	with a combination	-0.124939
-0.541303	An OR combination	-0.124939
-1.201911	Intel function libraries,	-0.124939
-0.632619	dynamic link libraries,	-0.425969
-0.594672	volatile doesn't mean	-0.124939
-0.586019	(low numbers mean	-0.124939
-0.726886	square brackets mean	-0.124939
-2.208621	The compiler inserts	-0.124939
-0.597093	compiler often inserts	-0.124939
-1.288506	The profiler inserts	-0.124939
-0.204109	return _mm_loadu_si128((__m128i const*)p);	-0.425969
-0.358988	return _mm_load_si128((__m128i const*)p);	-0.124939
-1.900797	through a hidden	-0.124939
-2.392389	should be hidden	-0.124939
-1.229331	is actually hidden	-0.124939
-2.363867	- n.a. x*x*x*x*x*x*x*x	-0.124939
-0.143412	= ((a*x+b)*x+c)*x+d x*x*x*x*x*x*x*x	-0.124939
-0.143412	x ((a*x+b)*x+c)*x+d x*x*x*x*x*x*x*x	-0.124939
-0.500202	recover from errors.	-0.124939
-1.072480	prevent such errors.	-0.124939
-1.048613	of memory. One	-0.124939
-0.917638	development tools. One	-0.124939
-0.726886	only once. One	-0.124939
-0.483927	called square blocking	-0.124939
-0.483927	like square blocking	-0.124939
-0.358988	effort. Square blocking	-0.124939
-0.203516	unsigned // Faster	-0.425969
-0.505102	find elsewhere. Faster	-0.124939
-0.567243	some typical sources	-0.124939
-0.165192	are frequent sources	-0.425969
-0.601837	limited to well-tested	-0.124939
-0.901505	arrays by well-tested	-0.124939
-0.592533	collection contains well-tested	-0.124939
-0.563104	to small devices,	-0.124939
-0.828386	such small devices,	-0.124939
-1.260709	the smallest devices,	-0.124939
-2.580345	floating point multiplication,	-0.124939
-0.659955	addition, subtraction, multiplication,	-0.124939
-0.358973	operations (addition, multiplication,	-0.124939
-0.969572	the AVX part.	-0.425969
-0.590532	that particular part.	-0.124939
-0.901701	library or API	-0.124939
-1.927695	operating system API	-0.124939
-0.883084	use standard API	-0.124939
-1.731073	the program starts	-0.124939
-1.402960	the computer starts	-0.124939
-2.653289	the code only.	-0.124939
-0.587494	consuming parts only.	-0.124939
-0.463696	using multiplications only.	-0.124939
-0.591977	variables, loop counters,	-0.124939
-0.591977	intermediates, loop counters,	-0.124939
-1.425370	with multiple counters,	-0.124939
-1.494606	whole program execution,	-0.124939
-0.716752	of out-of-order execution,	-0.124939
-0.498973	doing out-of-order execution,	-0.124939
-2.339209	= 0; list[i+1]	-0.124939
-0.885198	+= i_div_3; list[i+1]	-0.124939
-0.358973	list[i] =0; list[i+1]	-0.124939
-2.762760	if the distance	-0.124939
-0.600717	call this distance	-0.124939
-0.463696	Variables whose distance	-0.124939
-2.638242	// Example 14.28	-0.124939
-1.452609	in example 14.28	-0.124939
-1.641671	pointers to zero,	-0.124939
-0.358973	truncation towards zero,	-0.124939
-0.358973	= select_gt(b, zero,	-0.124939
-2.339209	= 0; r1	-0.124939
-1.339325	// Loop r1	-0.124939
-1.326032	< SIZE; r1	-0.124939
-1.339325	// Loop r2	-0.124939
-0.726886	= r1; r2	-0.124939
-0.358973	= r1+1; r2	-0.124939
-0.463696	XOP ammintrin.h (MS)	-0.124939
-0.358973	all intrin.h (MS)	-0.124939
-0.358973	SSE4.2 nmmintrin.h (MS)	-0.124939
-1.641692	discussion of aligning	-0.124939
-0.601645	macro for aligning	-0.124939
-1.498440	compiler from aligning	-0.124939
-1.597925	option for assuming	-0.124939
-1.741714	we are assuming	-0.124939
-0.899722	prevented from assuming	-0.124939
-0.601411	Induction = r;	-0.124939
-0.589693	c < r;	-0.425969
-0.186385	Live range analysis	-0.124939
-0.358988	a thorough analysis	-0.124939
-1.813589	It may seem	-0.124939
-0.596345	syntax may seem	-0.124939
-1.293440	have tested seem	-0.124939
-1.703713	Linux and perhaps	-0.124939
-0.600677	decoding and perhaps	-0.124939
-0.573389	faster, except perhaps	-0.124939
-0.435082	An interrupt service	-0.124939
-0.435082	drivers, interrupt service	-0.124939
-0.358988	type. Interrupt service	-0.124939
-0.595292	Windows. Gnu Comes	-0.124939
-0.588648	available. Microsoft Comes	-0.124939
-0.358973	/ Embarcadero Comes	-0.124939
-0.576843	4 + esp	-0.124939
-0.576843	8 + esp	-0.124939
-0.576843	ENDP + esp	-0.124939
-1.345136	the new features.	-0.124939
-0.575265	desired new features.	-0.124939
-0.594686	than processor features.	-0.124939
-0.599217	^ b ---xx----	-0.124939
-0.143412	---x---xx (-a==-b)=(a==b) ---xx----	-0.124939
-0.143412	---xx--xx (-a==-b)=(a==b) ---xx----	-0.124939
-1.412257	Function parameters ...............................................................................................	-0.124939
-1.048621	of optimizing ...............................................................................................	-0.124939
-0.836030	and usability ...............................................................................................	-0.124939
-2.735619	that the C/C++	-0.124939
-0.601653	bad The C/C++	-0.124939
-0.764664	Open Watcom C/C++	-0.124939
-0.601411	-100+100+100 = 100.	-0.124939
-1.481640	i < 100.	-0.124939
-1.378040	a; int b;};	-0.124939
-0.499786	a; double b;};	-0.425969
-0.902206	task that consumes	-0.124939
-0.599927	overhead which consumes	-0.124939
-0.579333	framework still consumes	-0.124939
-0.885221	e.g. four numbers,	-0.124939
-1.263426	and model numbers,	-0.124939
-0.463696	Using hexadecimal numbers,	-0.124939
-0.836030	particularly critical. 129	-0.124939
-0.505082	17.4 129 129	-0.124939
-0.358973	128 17.4 129	-0.124939
-1.278291	takes to reload	-0.425969
-1.913241	necessary to reload	-0.124939
-1.446507	give the 124	-0.124939
-0.358973	Difficult cases........................................................................................................ 124	-0.124939
-0.358973	dispatching .................................................................................... 124	-0.124939
-1.173329	loop-invariant code motion	-0.124939
-0.202527	invariant code motion	-0.124939
-0.902687	run a speed-critical	-0.124939
-1.583148	only for speed-critical	-0.124939
-0.600131	preferable for speed-critical	-0.124939
-1.641692	list of numbers:	-0.124939
-2.580345	floating point numbers:	-0.124939
-1.009638	of 100 numbers:	-0.124939
-0.583239	next section (page	-0.124939
-0.570580	previous chapter (page	-0.124939
-0.527373	Table 8.1 (page	-0.124939
-0.902658	0 to 12.	-0.124939
-0.416988	in chapter 12.	-0.124939
-1.185004	will take 1000	-0.124939
-0.314795	program repeats 1000	-0.124939
-0.314795	also repeats 1000	-0.124939
-2.107235	they are long.	-0.124939
-0.588056	or too long.	-0.124939
-0.541333	sometimes unacceptably long.	-0.124939
-1.625025	of cache organization	-0.124939
-0.159910	9.2 Cache organization	-0.124939
-2.284872	that is slow,	-0.124939
-1.074473	A is slow,	-0.124939
-0.600485	(Division is slow,	-0.124939
-0.203942	operation is performed	-0.425969
-2.392860	should be performed	-0.124939
-2.771970	in a high-level	-0.124939
-0.594978	size, while high-level	-0.124939
-0.868180	an advanced high-level	-0.124939
-0.898137	memory in advance.	-0.124939
-1.074228	calculated in advance.	-0.124939
-1.288751	given in advance.	-0.124939
-1.046599	is fast anyway	-0.124939
-0.579321	the database anyway	-0.124939
-0.982113	by default anyway	-0.124939
-0.496095	link library (*.dll	-0.425969
-1.470693	dynamic libraries (*.dll	-0.124939
-1.444318	systems. The Intel-based	-0.124939
-1.497703	well as Intel-based	-0.124939
-0.764664	Linux, BSD, Intel-based	-0.124939
-1.939989	code and main()	-0.124939
-0.378078	} int main()	-0.425969
-0.901768	x4 = x2	-0.124939
-1.193303	{ double x2	-0.124939
-0.598893	1./2.09227E13}; float x2	-0.124939
-0.596115	compilers, system database,	-0.124939
-0.541288	a remote database,	-0.124939
-0.541288	queue, list, database,	-0.124939
-1.129236	x86 platforms. Works	-0.124939
-0.358973	(VML, MKL). Works	-0.124939
-0.358973	Primitives (IPP). Works	-0.124939
-1.554761	series of calculations:	-0.124939
-1.077937	loop for calculations:	-0.124939
-0.527355	to modulo calculations:	-0.124939
-1.878755	as the basis	-0.124939
-0.358973	Out (FIFO) basis	-0.124939
-0.358973	Out (FILO) basis	-0.124939
-1.077962	work. The updating	-0.124939
-1.274555	not need updating	-0.124939
-0.573371	updates. Automatic updating	-0.124939
-1.693708	of data manipulation	-0.124939
-0.597322	many bit manipulation	-0.124939
-1.038486	and string manipulation	-0.124939
-0.598940	32 = 28.	-0.124939
-0.896872	r = 28.	-0.124939
-0.896724	set number 28.	-0.124939
-0.599553	Devirtualization class C0	-0.124939
-1.459053	: public C0	-0.124939
-0.463696	C1 obj1; C0	-0.124939
-1.660230	to optimize access,	-0.124939
-0.550832	data base access,	-0.124939
-0.358973	or First-In-Last-Out access,	-0.124939
-1.554761	series of calculations,	-0.124939
-0.883677	when doing calculations,	-0.124939
-0.589183	heavy mathematical calculations,	-0.124939
-0.902260	directives for multi-core	-0.124939
-1.077099	CPUs or multi-core	-0.124939
-0.601168	core on multi-core	-0.124939
-0.599702	last cache level,	-0.124939
-1.272110	object file level,	-0.124939
-0.851824	lower priority level,	-0.124939
-0.600346	Look at Exception	-0.124939
-1.006199	error handling Exception	-0.124939
-1.332139	exception handling Exception	-0.124939
-0.902644	comes to optimization,	-0.124939
-1.494538	whole program optimization,	-0.124939
-0.463696	73 Without optimization,	-0.124939
-0.594122	propagation, etc. Whether	-0.124939
-0.836014	Boolean expressions. Whether	-0.124939
-0.358973	and Sum3. Whether	-0.124939
-2.351095	because the contents	-0.124939
-1.077814	copy the contents	-0.124939
-1.393951	the entire contents	-0.124939
-0.896707	These two books	-0.124939
-0.866303	the relevant books	-0.124939
-0.358973	Coriolis group books	-0.124939
-0.601863	static is removed	-0.124939
-2.537796	may be removed	-0.124939
-0.601065	unused. This removed	-0.124939
-2.015589	on page 164	-0.124939
-0.505082	notice .......................................................................................................... 164	-0.124939
-0.358973	See www.gnu.org/copyleft/fdl.html. 164	0.000000
-0.574732	8; float matrix[rows][columns];	-0.124939
-0.574732	32; float matrix[rows][columns];	-0.124939
-0.574732	50; float matrix[rows][columns];	-0.124939
-0.506698	a linked list.	-0.124939
-0.601663	series. The exponential	-0.124939
-0.065813	as logarithms, exponential	-0.425969
-0.601653	above. The generality	-0.124939
-1.077937	designed for generality	-0.124939
-1.182607	the full generality	-0.124939
-1.391086	time it takes.	-0.124939
-1.067271	each part takes.	-0.124939
-2.771970	in a multithreaded	-0.124939
-0.598622	simultaneously. In multithreaded	-0.124939
-0.591632	when optimizing multithreaded	-0.124939
-0.790361	= 1; list[i+2]	-0.425969
-0.885246	+= i_div_3; list[i+2]	-0.124939
-0.897690	Matrix size Total	-0.124939
-1.132698	of elements Total	-0.425969
-0.601368	do it explicitly.	-0.124939
-2.653289	the code explicitly.	-0.124939
-0.892930	this optimization explicitly.	-0.124939
-1.374123	In other programs,	-0.124939
-1.736825	In some programs,	-0.124939
-0.575678	make 16-bit programs,	-0.124939
-0.601368	well it optimizes	-0.124939
-2.623007	the compiler optimizes	-0.124939
-1.164301	Visual Studio optimizes	-0.124939
-0.563087	own profiling instruments	-0.124939
-0.314795	put measurement instruments	-0.124939
-0.314795	desired measurement instruments	-0.124939
-0.599050	point. // After	-0.124939
-0.599050	dispatcher. // After	-0.124939
-0.563087	of branch. After	-0.124939
-0.515453	many reductions involving	-0.124939
-0.515453	Most reductions involving	-0.124939
-0.563121	objects Conversions involving	-0.124939
-0.475755	point XMM (vector)	-0.124939
-0.475755	Integer XMM (vector)	-0.124939
-0.475755	Boolean XMM (vector)	-0.124939
-1.804987	has the unfortunate	-0.124939
-2.557445	This is unfortunate	-0.124939
-0.900914	uses an unfortunate	-0.124939
-0.080733	manual 2: "Optimizing	-0.602060
-2.762335	to the parameter,	-0.124939
-1.299986	like a parameter,	-0.124939
-2.412044	a function parameter,	-0.124939
-1.199078	recover from exceptions.	-0.124939
-1.196242	without using exceptions.	-0.124939
-0.591952	catching hardware exceptions.	-0.124939
-1.600975	avoid the time-	-0.124939
-0.894927	and very time-	-0.124939
-1.529954	to put time-	-0.124939
-0.028028	AMD Opteron K8	-0.124939
-2.059748	program is loaded.	-0.124939
-1.766776	has been loaded.	-0.124939
-0.463696	is heavily loaded.	-0.124939
-0.896243	as for (i=0;	-0.124939
-1.068951	0; for (i=0;	-0.124939
-0.598623	example, for (i=0;	-0.124939
-0.601759	14.23b and 14.30	-0.124939
-2.218233	// Example 14.30	-0.124939
-1.055252	} Example 14.30	-0.124939
-1.824637	y = b;}	-0.124939
-1.733762	a + b;}	-0.124939
-0.527355	ReadB() {return b;}	-0.124939
-0.601065	version). This wasteful	-0.124939
-1.372213	avoid this wasteful	-0.124939
-0.358973	is unnecessarily wasteful	-0.124939
-0.596682	error // Return	-0.124939
-0.596682	zero // Return	-0.124939
-0.596682	a[i]; // Return	-0.124939
-0.403295	inline void StoreVector(void	-0.602060
-0.172705	software. Smaller microcontrollers	-0.124939
-0.172705	caching. Smaller microcontrollers	-0.124939
-0.172705	microcontrollers: Smaller microcontrollers	-0.124939
-1.378753	strings in character	-0.124939
-0.601210	style with character	-0.124939
-0.601122	style as character	-0.124939
-1.498107	language is implemented.	-0.124939
-0.901318	15.1b is implemented.	-0.124939
-1.202161	pointers are implemented.	-0.124939
-0.845367	size. In fact,	-0.124939
-0.572240	language. In fact,	-0.124939
-0.572240	throw. In fact,	-0.124939
-0.587042	function at runtime.	-0.124939
-0.873557	than at runtime.	-0.124939
-1.174587	resolved at runtime.	-0.124939
-2.005959	a loop manually	-0.124939
-1.653614	be done manually	-0.124939
-0.835998	code motion manually	-0.124939
-0.601839	multiplication of xxn	-0.124939
-0.886995	s += xxn	-0.124939
-0.358973	+= x^n/n! xxn	-0.124939
-0.028028	operators &, |,	-0.301030
-2.635642	// Example 7.2	-0.124939
-0.575664	using classes. 7.2	-0.124939
-0.527355	storage............................................................................. 26 7.2	-0.124939
-1.203916	each thread. Thread-local	-0.124939
-0.567237	environment block. Thread-local	-0.124939
-0.550845	five times. Thread-local	-0.124939
-1.494538	whole program 81	-0.124939
-2.208291	See page 81	-0.124939
-0.358973	options ................................................................................... 81	-0.124939
-2.635642	// Example 7.1	-0.124939
-0.541303	of manuals. 7.1	-0.124939
-0.527355	constructs........................................................................ 26 7.1	-0.124939
-1.393717	the dispatcher signal	-0.124939
-0.550845	video processing, signal	-0.124939
-0.463696	for statistics, signal	-0.124939
-1.380867	as a circular	-0.602060
-1.288123	In example 7.4	-0.124939
-0.596140	...................................................................... 32 7.4	-0.124939
-1.052732	mathematical functions. 7.4	-0.124939
-1.996666	compiler to ignore	-0.124939
-2.095507	You may ignore	-0.124939
-1.741870	some cases ignore	-0.124939
-0.902463	directives and keywords	-0.124939
-1.068644	have many keywords	-0.124939
-0.358973	appropriate. Compiler-specific keywords	-0.124939
-2.635642	// Example 7.8	-0.124939
-0.463696	...................................................................................................... 37 7.8	-0.124939
-0.463696	has changed. 7.8	-0.124939
-0.592335	it only once.	-0.124939
-0.883854	done only once.	-0.124939
-0.597527	only called once.	-0.124939
-1.598253	union { 89	-0.124939
-1.511433	See page 89	-0.425969
-1.767302	{ int list[100];	-0.124939
-0.598893	7.16 float list[100];	-0.124939
-0.858788	b;}; S1 list[100];	-0.124939
-2.800032	can be considered	-0.124939
-2.447620	may be considered	-0.124939
-0.358988	not traditionally considered	-0.124939
-0.600671	GetProcessAffinityMask in Windows).	-0.124939
-0.600671	IsProcessorFeaturePresent in Windows).	-0.124939
-1.191309	for 64-bit Windows).	-0.124939
-1.270307	to compile for.	-0.124939
-0.828441	is intended for.	-0.124939
-1.776525	do the divisions	-0.124939
-0.601746	multiplications and divisions	-0.124939
-0.541288	exact. Multiple divisions	-0.124939
-1.069887	columns = 8;	-0.124939
-0.598940	TILESIZE = 8;	-0.124939
-1.049561	exponent : 8;	-0.124939
-0.601618	course that reflects	-0.124939
-0.901103	loop. This reflects	-0.124939
-1.431032	long double reflects	-0.124939
-1.171701	Container classes .....................................................................................................	-0.124939
-0.726886	10.1 Hyperthreading .....................................................................................................	-0.124939
-0.659955	13.5 Implementation .....................................................................................................	-0.124939
-1.379327	value that lies	-0.124939
-1.038577	The difference lies	-0.124939
-0.579361	this efficiency lies	-0.124939
-0.601759	logarithms and trigonometric	-0.124939
-0.182924	exponential functions, trigonometric	-0.425969
-1.296002	you to manipulate	-0.124939
-0.901072	us to manipulate	-0.124939
-0.601379	test or manipulate	-0.124939
-0.892400	23; // fractional	-0.124939
-0.596682	52; // fractional	-0.124939
-0.596682	63; // fractional	-0.124939
-0.593845	1 from -128	-0.124939
-1.062904	range from -128	-0.124939
-1.278204	char 8 -128	-0.124939
-1.836275	to be spaced	-0.425969
-1.202161	addresses are spaced	-0.124939
-1.075951	make an approximate	-0.124939
-0.549217	addition, fast approximate	-0.124939
-0.549217	reciprocal, fast approximate	-0.124939
-1.202953	integers in comparisons,	-0.124939
-1.077679	operands are comparisons,	-0.124939
-2.580345	floating point comparisons,	-0.124939
-0.726886	new features. User	-0.124939
-0.463696	be deleted. User	-0.124939
-0.358973	feedback seriously. User	-0.124939
-2.059261	if the dividend	-0.425969
-0.601256	changing the dividend	-0.124939
-0.593639	time at unpredictable	-0.124939
-0.886408	start at unpredictable	-0.124939
-1.644259	can cause unpredictable	-0.124939
-0.224568	inline __m128i LoadVector(void	-0.602060
-0.601267	step by step.	-0.124939
-1.671938	the next step.	-0.124939
-1.529144	the second step.	-0.124939
-0.599330	C; double Z	-0.124939
-1.566086	induction variable Z	-0.124939
-0.358973	+= Z; Z	-0.124939
-0.599714	clauses are separated	-0.124939
-0.599714	clause are separated	-0.124939
-2.592591	is not separated	-0.124939
-0.902463	9 and 64,	-0.124939
-0.901505	registers by 64,	-0.124939
-0.358973	16, 32, 64,	-0.124939
-1.299464	file and copies	-0.124939
-2.008708	code that copies	-0.124939
-0.358973	shr ebx,31 copies	-0.124939
-2.647771	the same brand.	-0.124939
-2.101316	the CPU brand.	-0.124939
-0.883624	by CPU brand.	-0.124939
-2.557445	This is annoying	-0.124939
-1.202617	schemes are annoying	-0.124939
-1.594492	be an annoying	-0.124939
-1.970566	is called CodeAnalyst.	-0.124939
-0.830088	and AMD CodeAnalyst.	-0.124939
-0.564027	use AMD CodeAnalyst.	-0.124939
-0.415716	160 19 Literature	-0.124939
-0.415716	162 19 Literature	-0.124939
-0.358988	of titles. Literature	-0.124939
-1.594044	useful to study	-0.124939
-1.907418	important to study	-0.124939
-0.866277	on my study	-0.124939
-1.619592	on the stack,	-0.301030
-0.723381	and garbage collection.	-0.124939
-0.378037	for garbage collection.	-0.124939
-0.378037	called garbage collection.	-0.124939
-1.770523	when it occurs,	-0.124939
-1.539731	before it occurs,	-0.124939
-0.589179	overflow never occurs,	-0.124939
-1.214030	the option -fno-pic	-0.124939
-0.560163	compiler option -fno-pic	-0.124939
-0.527384	x86 platform _M_IX86	-0.124939
-0.527384	x86-64 platform _M_IX86	-0.124939
-0.726919	platform _M_IX86 _M_IX86	-0.124939
-0.902696	bottleneck is elsewhere	-0.124939
-0.591654	seek information elsewhere	-0.124939
-0.582120	unpredictable errors elsewhere	-0.124939
-1.077099	way or bypassing	-0.124939
-1.076806	problem by bypassing	-0.124939
-2.623007	the compiler bypassing	-0.124939
-0.124739	0x2700 to 0x273F	-0.124939
-0.601746	134 and 135	-0.124939
-0.899906	DoThisThreeTimesAWeek(); } 135	-0.124939
-0.358973	at once................................... 135	-0.124939
-0.901601	factorial function looks	-0.124939
-1.296490	optimized code looks	-0.124939
-1.643951	vector classes looks	-0.124939
-0.585228	doubles: union {double	-0.124939
-0.637024	struct S1 {double	-0.425969
-1.502175	used for implementing	-0.124939
-0.594523	themselves. But implementing	-0.124939
-1.940312	instead of int.	-0.124939
-0.601837	float to int.	-0.124939
-1.060658	of type int.	-0.124939
-0.899689	takes memory space,	-0.124939
-1.624939	of cache space,	-0.124939
-0.575691	save RAM space,	-0.124939
-2.071099	that can skip	-0.124939
-2.095507	You may skip	-0.124939
-0.463696	explanation. Please skip	-0.124939
-1.365253	(See page 137	-0.124939
-0.557802	and rounding 137	-0.124939
-0.358973	Integer division...................................................................................................... 137	-0.124939
-1.186106	cache line. 132	-0.124939
-0.527355	topics ......................................................................................... 132	-0.124939
-0.463696	tables ................................................................................................. 132	-0.124939
-0.601839	needs of position-	-0.124939
-1.881564	This makes position-	-0.124939
-1.299415	the so-called position-	-0.124939
-1.596505	}; // Index	-0.124939
-0.065813	<< "Error: Index	-0.425969
-0.358973	#pragma optimize("a",on). Specifies	-0.124939
-0.358973	(Linux only). Specifies	-0.124939
-0.358973	or __attribute__((aligned(16))). Specifies	-0.124939
-2.497485	of the residual	-0.425969
-1.549913	until the residual	-0.124939
-1.258339	of vector operations,	-0.124939
-1.053066	for vector operations,	-0.124939
-0.599692	vector integer operations,	-0.124939
-1.203459	drawbacks of C++.	-0.124939
-1.202731	C or C++.	-0.124939
-1.057418	in compiled C++.	-0.124939
-0.600021	do other input/output	-0.124939
-0.596120	many file input/output	-0.124939
-0.505082	categories: File input/output	-0.124939
-0.600935	Most compiler packages	-0.124939
-0.864099	make software packages	-0.124939
-0.582127	bigger software packages	-0.124939
-0.601951	joining the operations:	-0.124939
-0.541303	two AND operations:	-0.124939
-0.527355	avoid modulo operations:	-0.124939
-1.034846	VIA processors. Explicit	-0.124939
-0.143412	96 9.11 Explicit	-0.124939
-0.143412	2001. 9.11 Explicit	-0.124939
-1.588811	for this purpose.	-0.124939
-1.694931	a specific purpose.	-0.124939
-1.795053	a particular purpose.	-0.124939
-0.585033	b2 * reciprocal_divisor;	-0.124939
-0.585033	b1 * reciprocal_divisor;	-0.124939
-0.358988	y1, y2, reciprocal_divisor;	-0.124939
-1.065754	values before compilation.	-0.124939
-0.615817	and just-in-time compilation.	-0.124939
-0.615817	use just-in-time compilation.	-0.124939
-0.601399	stride) = (number	-0.124939
-0.593275	size) / (number	-0.124939
-0.575664	size) % (number	-0.124939
-1.039649	have big endian	-0.124939
-0.540055	use big endian	-0.124939
-0.540055	On big endian	-0.124939
-1.875537	function that allocates	-0.124939
-1.077110	called, it allocates	-0.124939
-0.358973	ended queue) allocates	-0.124939
-2.015589	on page 136	-0.124939
-0.818669	order(int x); 136	-0.124939
-0.527355	multiplication ............................................................................................. 136	-0.124939
-0.899859	here. It reveals	-0.124939
-0.726886	assembly listing reveals	-0.124939
-0.358973	crystal ball reveals	-0.124939
-2.834319	it is filled	-0.124939
-2.430703	to be filled	-0.124939
-2.160809	will be filled	-0.124939
-0.901701	-O3 or (requires	-0.124939
-2.441280	instruction set (requires	-0.124939
-0.527355	and loader (requires	-0.124939
-1.685915	Some compilers offer	-0.124939
-1.541059	Most compilers offer	-0.124939
-0.859196	Other compilers offer	-0.124939
-0.726919	7.25 Bitfields Bitfields	-0.124939
-0.249860	integers. 7.25 Bitfields	-0.124939
-0.249860	55 7.25 Bitfields	-0.124939
-1.823157	} // At	-0.124939
-0.463696	user's computers. At	-0.124939
-0.358973	are dominating. At	-0.124939
-1.602917	have an up-to-date	-0.124939
-0.596781	choose an up-to-date	-0.124939
-0.897943	and most up-to-date	-0.124939
-0.563130	reasons before leaving	-0.124939
-0.481376	_mm256_zeroupper() before leaving	-0.425969
-1.164424	for details. Inheritance	-0.124939
-0.249860	54 7.22 Inheritance	-0.124939
-0.249860	implementations. 7.22 Inheritance	-0.124939
-2.428809	the program 153	-0.124939
-2.208291	See page 153	-0.124939
-0.358973	Testing speed.............................................................................................................. 153	-0.124939
-1.418890	a high degree	-0.124939
-1.078927	a typical degree	-0.124939
-0.358973	an n'th degree	-0.124939
-1.067998	x) { _mm_storeu_si128((__m128i	-0.602060
-0.873036	such optimizations automatically,	-0.124939
-0.541288	151 15.1c automatically,	-0.124939
-0.463696	to 11.1b automatically,	-0.124939
-1.193659	multidimensional array sequentially.	-0.124939
-1.181168	are accessed sequentially.	-0.124939
-0.249860	32 7.4 Enums	-0.124939
-0.249860	functions. 7.4 Enums	-0.124939
-0.358988	in disguise. Enums	-0.124939
-1.089968	the CPU. Algebraic	-0.124939
-0.940693	function call. Algebraic	-0.124939
-0.358973	complicated reductions. Algebraic	-0.124939
-1.379148	values of A,	-0.124939
-0.203052	x; int A,	-0.425969
-1.878755	as the operands.	-0.124939
-0.590911	evaluate both operands.	-0.124939
-1.041624	of Boolean operands.	-0.124939
-1.139557	i; for(i=0; i<100;	-0.124939
-0.903015	for (i=0; i<100;	-0.124939
-0.358973	i2; for(i=0,i2=0; i<100;	-0.124939
-0.505121	0.18 0.18 0.11	-0.124939
-0.359001	0.75 0.18 0.11	-0.124939
-0.505102	0.18 0.12 0.11	-0.124939
-1.067253	Core 2 0.12	-0.124939
-0.541318	0.12 0.18 0.12	-0.124939
-0.463696	0.57 0.44 0.12	-0.124939
-2.383948	is the nearest	-0.124939
-2.731806	to the nearest	-0.124939
-0.601844	Round to nearest	-0.124939
-0.575664	vector libraries. To	-0.124939
-0.999788	the future. To	-0.124939
-0.358973	is rebooted. To	-0.124939
-0.377991	x-- x x--	-0.425969
-1.049051	algebra reductions: x--	-0.124939
-1.533571	the C++ language,	-0.124939
-0.891320	software programming language,	-0.124939
-1.292120	hardware definition language,	-0.124939
-2.467399	when the 145	-0.124939
-2.208291	See page 145	-0.124939
-0.527355	functions ....................................................................................... 145	-0.124939
-2.208291	See page 140	-0.124939
-0.598893	is float 140	-0.124939
-0.358973	and double..................................................................................... 140	-0.124939
-2.208291	See page 141	-0.124939
-0.463696	or x64 141	-0.124939
-0.358973	integers ................................... 141	-0.124939
-1.075824	better than RISC	-0.124939
-0.597915	distinctions between RISC	-0.124939
-0.659955	have got RISC	-0.124939
-0.872998	future processors. Consider	-0.124939
-0.858744	of resources. Consider	-0.124939
-0.463696	in loops. Consider	-0.124939
-0.902649	storage of text	-0.124939
-0.851799	and handle text	-0.124939
-0.957945	for storing text	-0.124939
-0.584263	different compiler. Object	-0.124939
-0.527355	delete). 88 Object	-0.124939
-0.358973	now discontinued Object	-0.124939
-2.635642	// Example 14.10	-0.124939
-0.463696	......................... 142 14.10	-0.124939
-0.358973	by u[0]. 14.10	-0.124939
-2.635642	// Example 14.11	-0.124939
-0.505082	....................................................................................... 145 14.11	-0.124939
-0.358973	this limitation). 14.11	-0.124939
-0.557841	2 template <int	-0.124939
-0.557841	N template <int	-0.124939
-0.557841	m;} template <int	-0.124939
-0.563064	more iterations back.	-0.124939
-0.550832	four places back.	-0.124939
-0.541303	and written back.	-0.124939
-2.635642	// Example 8.4	-0.124939
-0.885198	this option. 8.4	-0.124939
-0.463696	....................................................................... 77 8.4	-0.124939
-2.635642	// Example 8.7	-0.124939
-0.659955	page 105. 8.7	-0.124939
-0.463696	.............................................................................................. 82 8.7	-0.124939
-0.837454	The assembly listing	-0.124939
-0.568003	Generate assembly listing	-0.124939
-1.276956	assembly output listing	-0.124939
-1.993509	are used twice	-0.124939
-0.597659	has const twice	-0.124939
-1.608098	is calculated twice	-0.124939
-1.503609	implementations of Pascal	-0.124939
-0.864036	major platforms. Pascal	-0.124939
-0.806067	of C++, Pascal	-0.124939
-2.953773	can be expected.	-0.124939
-1.201312	good as expected.	-0.124939
-1.042367	Cache contentions expected.	-0.124939
-0.577633	the performance. 14.4	-0.124939
-0.557813	129 130 14.4	-0.124939
-0.505082	once................................... 135 14.4	-0.124939
-1.501322	cc[]) { Vec16s	-0.124939
-1.621397	the class Vec16s	-0.124939
-0.358973	256 Vec32uc Vec16s	-0.124939
-0.875541	equally efficient. Simple	-0.124939
-1.065601	very fast. Simple	-0.124939
-0.358973	model fast=2 Simple	-0.124939
-0.065813	Software Developer’s Manual",	-0.425969
-0.358988	Architecture Programmer’s Manual",	-0.124939
-0.600677	cores and leave	-0.124939
-0.600677	520 and leave	-0.124939
-1.072162	program should leave	-0.124939
-2.257921	can be solved	-0.425969
-1.629290	compiler has solved	-0.124939
-2.557445	This is supplied	-0.124939
-1.934073	functions are supplied	-0.124939
-1.959098	I have supplied	-0.124939
-0.575678	demonstration purposes. Available	-0.124939
-0.575664	hardware access. Available	-0.124939
-0.550845	Intel Agner Available	-0.124939
-2.278797	code is translated	-0.124939
-0.601173	call is translated	-0.124939
-1.767089	has been translated	-0.124939
-0.586203	29 64-bit Linux:	-0.124939
-0.586203	__int64 64-bit Linux:	-0.124939
-0.358988	(Windows: /Gy, Linux:	-0.124939
-1.162419	the user. With	-0.124939
-0.541288	thousand numbers. With	-0.124939
-0.505082	next step. With	-0.124939
-0.557780	64-bit Linux. Has	-0.124939
-0.527355	Studio IDE. Has	-0.124939
-0.358973	C++ builder Has	-0.124939
-1.387452	you are overriding	-0.425969
-1.321517	that allows overriding	-0.124939
-0.535857	bytes AMD Opteron	-0.124939
-0.535857	operands AMD Opteron	-0.124939
-0.535857	op. AMD Opteron	-0.124939
-0.661769	and operating systems".	-0.124939
-2.412752	with the correct	-0.124939
-1.801186	has the correct	-0.124939
-0.601870	estimate is correct	-0.124939
-0.601148	efficient code caching.	-0.124939
-0.600356	about memory caching.	-0.124939
-1.660230	to optimize caching.	-0.124939
-1.445617	risk of overflow:	-0.124939
-2.580345	floating point overflow:	-0.124939
-0.527373	that avoids overflow:	-0.124939
-1.622805	program that scans	-0.124939
-0.599950	scanner that scans	-0.124939
-0.601322	length function scans	-0.124939
-0.967435	the following way:	-0.124939
-1.715986	the code. Sometimes	-0.124939
-1.042611	time consuming. Sometimes	-0.124939
-0.527355	simple tasks. Sometimes	-0.124939
-0.894756	Gnu 32-bit -fno-builtin	-0.124939
-1.636041	64 bit -fno-builtin	-0.124939
-0.593944	12 option -fno-builtin	-0.124939
-0.680433	enough to justify	-0.124939
-1.065687	can easily justify	-0.124939
-0.550639	On the contrary,	-0.124939
-0.833014	function calling conventions.	-0.124939
-0.475749	standard calling conventions.	-0.124939
-0.475749	5: calling conventions.	-0.124939
-1.711269	function. The initialization	-0.124939
-0.677013	has an initialization	-0.425969
-2.550220	on the Internet	-0.124939
-1.377480	through the Internet	-0.124939
-0.463714	www.amd.com. 163 Internet	-0.124939
-2.272649	order to cover	-0.124939
-0.601050	big to cover	-0.124939
-2.028762	does not cover	-0.124939
-0.541310	constructor itself. Constructors	-0.124939
-0.249860	}; 7.23 Constructors	-0.124939
-0.249860	54 7.23 Constructors	-0.124939
-0.597915	processors, between PC's	-0.124939
-2.021564	the first PC's	-0.124939
-0.591942	several standard PC's	-0.124939
-2.635642	// Example 7.21	-0.124939
-0.505082	........................................................................................ 53 7.21	-0.124939
-0.659955	the effort. 7.21	-0.124939
-1.077856	method that delays	-0.124939
-1.171749	and cause delays	-0.124939
-0.358973	cause severe delays	-0.124939
-0.107208	+ i, a);	-0.602060
-0.203843	b[i] and c[i]	-0.425969
-0.358988	+ b[i]; c[i]	-0.124939
-0.601645	handlers for cleaning	-0.124939
-1.371913	of time cleaning	-0.124939
-0.899722	prevented from cleaning	-0.124939
-2.646757	the same way,	-0.124939
-1.810592	the other way,	-0.124939
-0.599808	times one way,	-0.124939
-1.048613	RAM memory. Big	-0.124939
-0.541318	mainframe computer. Big	-0.124939
-0.358973	aware of. Big	-0.124939
-0.746805	set and ZMM	-0.425969
-0.358988	the 512-bit ZMM	-0.124939
-1.503874	table of coefficients	-0.124939
-0.065813	// Polynomial coefficients	-0.124939
-2.282605	such as DOS	-0.124939
-1.818633	operating systems DOS	-0.124939
-0.585182	very old DOS	-0.124939
-0.902297	case. The -fpie	-0.124939
-1.356034	the option -fpie	-0.124939
-0.971467	with option -fpie	-0.124939
-1.609280	with many labels	-0.124939
-1.627899	the case labels	-0.124939
-0.527355	with sequential labels	-0.124939
-0.430494	1, 2, 6,	-0.425969
-0.563087	3, 4, 6,	-0.124939
-0.505082	$B1$3: pop ret	-0.124939
-0.358973	ja $B2$3: ret	-0.124939
-0.358973	the beginning. ret	-0.124939
-1.038466	discussed below. Signed	-0.124939
-0.567228	integer overflow. Signed	-0.124939
-0.358973	Example 7.4. Signed	-0.124939
-0.601746	log, and logarithms	-0.124939
-2.282605	such as logarithms	-0.124939
-0.885257	function uses logarithms	-0.124939
-1.503574	array is stored.	-0.124939
-2.533292	to be stored.	-0.124939
-1.502649	variables are stored.	-0.124939
-0.567228	a standardized manner.	-0.124939
-1.241723	a non-sequential manner.	-0.124939
-0.563074	a random manner.	-0.124939
-0.582089	or size. Today,	-0.124939
-0.505082	too slow. Today,	-0.124939
-0.463696	mainframe computers. Today,	-0.124939
-1.203709	be the easiest	-0.124939
-0.600178	(/Oa). The easiest	-0.124939
-0.600178	delays. The easiest	-0.124939
-0.601746	push and pop	-0.124939
-0.726886	< 100. pop	-0.124939
-0.358973	jl $B1$3: pop	-0.124939
-1.455556	the constant 3.5	-0.124939
-0.557791	.................................................................................................... 19 3.5	-0.124939
-0.541288	update process. 3.5	-0.124939
-0.870008	the options -S	-0.124939
-0.143412	listing /FA -S	-0.124939
-0.143412	masm=intel /FA -S	-0.124939
-2.408169	function is inlined.	-0.124939
-2.430703	to be inlined.	-0.124939
-2.800032	can be inlined.	-0.124939
-0.553892	add add cmp	-0.124939
-0.553892	mov add cmp	-0.124939
-0.358988	increment i++. cmp	-0.124939
-0.600314	structure, data flow	-0.124939
-1.731073	the program flow	-0.124939
-0.876641	of #include directives.	-0.124939
-0.567263	Use OpenMP directives.	-0.124939
-0.358973	C++: Preprocessor directives.	-0.124939
-1.919581	is also deallocated.	-0.124939
-1.068716	has been deallocated.	-0.124939
-1.199768	be more (128	-0.124939
-1.743245	instruction set (128	-0.124939
-0.764664	be obsolete. Programmers	-0.124939
-0.463696	scan instruction. Programmers	-0.124939
-0.358973	macro expansions. Programmers	-0.124939
-1.920503	important to focus	-0.124939
-2.187814	is more focus	-0.124939
-1.025027	The main focus	-0.124939
-2.541119	the function definition.	-0.124939
-1.544339	the class definition.	-0.124939
-1.729641	a class definition.	-0.124939
-1.203709	follow the track	-0.124939
-0.975326	to keep track	-0.124939
-0.474628	and keep track	-0.124939
-0.600729	about this condition.	-0.124939
-1.160184	the error condition.	-0.124939
-0.841174	other error condition.	-0.124939
-0.601746	s2 and s3	-0.124939
-1.423864	= 0, s3	-0.124939
-0.358973	+= a[i+2]; s3	-0.124939
-1.423864	= 0, s2	-0.124939
-0.358973	+= a[i+1]; s2	-0.124939
-0.358973	s0, s1, s2	-0.124939
-1.191290	operations on contemporary	-0.124939
-0.597741	bytes on contemporary	-0.124939
-0.580837	core. Unfortunately, contemporary	-0.124939
-0.541288	compiler .......................................................................................... 66	-0.124939
-0.903015	+ 1.0f;} 66	-0.124939
-0.463696	optimize ............................................................................................ 66	-0.124939
-0.601863	bounds is probably	-0.124939
-1.744128	code can probably	-0.124939
-0.358973	a disassembly, probably	-0.124939
-1.940094	use of longjmp	-0.124939
-2.540789	the function longjmp	-0.124939
-1.739346	rely on longjmp	-0.124939
-0.028028	( 1)sign 2exponent	-0.124939
-1.226164	switch statement leads	-0.124939
-1.143215	automatic vectorization leads	-0.124939
-1.213010	lazy binding leads	-0.124939
-0.897670	Array size Alignd	-0.124939
-0.888766	aligned arrays Alignd	-0.124939
-0.541303	bb[size] ); Alignd	-0.124939
-0.899241	it for improving	-0.124939
-2.150300	used for improving	-0.124939
-0.901324	tips on improving	-0.124939
-0.898387	and cache sizes.	-0.124939
-0.886599	different matrix sizes.	-0.124939
-0.550832	of mixed sizes.	-0.124939
-0.847464	Program loading .......................................................................................................	-0.124939
-0.764664	15 Metaprogramming .......................................................................................................	-0.124939
-0.764664	Other databases .......................................................................................................	-0.124939
-0.601618	integer that holds	-0.124939
-0.595797	float type holds	-0.124939
-0.567228	array. eax holds	-0.124939
-1.503829	performance of competing	-0.124939
-1.298529	threads are competing	-0.124939
-0.600503	(MFC). A competing	-0.124939
-0.198232	your programming questions	-0.124939
-0.358988	to answer questions	-0.124939
-2.772600	in a register,	-0.124939
-1.758450	a vector register,	-0.124939
-1.053066	128-bit vector register,	-0.124939
-1.639416	user interface etc.,	-0.124939
-0.584272	another function, etc.,	-0.124939
-0.659955	just-in-time compilers, etc.,	-0.124939
-1.680156	call the ReadTSC	-0.124939
-2.540789	the function ReadTSC	-0.124939
-0.593052	or get ReadTSC	-0.124939
-0.820606	be replaced with:	-0.425969
-0.505102	}; Replace with:	-0.124939
-0.601768	usage in kernel	-0.124939
-1.927695	operating system kernel	-0.124939
-1.187637	in Linux kernel	-0.124939
-0.692616	and VIA CPUs").	-0.301030
-0.642660	int i, j;	-0.124939
-1.296939	have a natural	-0.124939
-0.884270	code contains natural	-0.124939
-0.577657	parallel calculations. Examples	-0.124939
-0.806044	explained above. Examples	-0.124939
-0.885198	is run. Examples	-0.124939
-1.929370	else { (iset	-0.124939
-0.358973	= &SelectAddMul_AVX2; (iset	-0.124939
-0.358973	= &SelectAddMul_SSE41; (iset	-0.124939
-1.297317	another function F2	-0.124939
-0.601267	thrown by F2	-0.124939
-1.807989	in case F2	-0.124939
-0.598806	key or moving	-0.124939
-0.598806	button or moving	-0.124939
-2.366894	rather than moving	-0.124939
-2.638242	// Example 9.6b.	-0.124939
-1.452609	in example 9.6b.	-0.425969
-0.659955	CPU only) -O3	-0.124939
-0.358973	-Ofast /O3 -O3	-0.124939
-0.358973	or /Ox -O3	-0.124939
-0.601374	is it unusual	-0.124939
-1.894983	is not unusual	-0.425969
-0.581595	to cache misses,	-0.124939
-0.581595	most cache misses,	-0.124939
-0.581595	executed, cache misses,	-0.124939
-1.637406	0 - Divide	-0.124939
-0.883068	and add Divide	-0.124939
-0.358973	(-a>-b)=(a<b) ---xx---x Divide	-0.124939
-2.067591	use a sorted	-0.124939
-1.199170	then a sorted	-0.124939
-1.074232	But a sorted	-0.124939
-0.601839	considerations of efficiency,	-0.124939
-0.599702	improve cache efficiency,	-0.124939
-0.894839	compromise between efficiency,	-0.124939
-2.371103	is the same.	-0.124939
-0.601256	always the same.	-0.124939
-1.077575	exactly the same.	-0.124939
-0.201081	template library (STL)	-0.124939
-1.099197	Template Library (STL)	-0.124939
-1.195086	to get rid	-0.124939
-0.764755	we get rid	-0.124939
-0.527408	don't get rid	-0.124939
-0.585182	and 10 ms	-0.124939
-0.557791	to 120 ms	-0.124939
-0.505082	typically 30 ms	-0.124939
-0.896707	has two arrays,	-0.124939
-0.594978	In large arrays,	-0.124939
-0.888416	no big arrays,	-0.124939
-1.530532	the elements matrix[r][c]	-0.124939
-1.404139	each element matrix[r][c]	-0.124939
-0.833210	Each element matrix[r][c]	-0.124939
-1.203228	program to issue	-0.124939
-1.199779	not an issue	-0.124939
-0.550845	a portability issue	-0.124939
-1.981197	way to solve	-0.124939
-0.901072	designed to solve	-0.124939
-2.028762	does not solve	-0.124939
-1.168244	a problem since	-0.124939
-0.789016	been updated since	-0.124939
-0.358973	clock pulses since	-0.124939
-0.899946	purposes is beyond	-0.124939
-0.600485	queries is beyond	-0.124939
-0.600485	coprocessors is beyond	-0.124939
-1.058108	becomes more readable	-0.124939
-0.594921	output more readable	-0.124939
-0.358988	not human readable	-0.124939
-1.445976	operand is infinity	-0.124939
-2.218481	will be infinity	-0.124939
-0.901701	zero or infinity	-0.124939
-1.977882	lot of bookkeeping	-0.124939
-1.858525	of this bookkeeping	-0.124939
-0.573343	explains why bookkeeping	-0.124939
-0.895481	by some formula	-0.124939
-0.579350	the safe formula	-0.124939
-1.461403	the right formula	-0.124939
-1.961729	into the technical	-0.124939
-1.776171	because of technical	-0.124939
-0.577639	inlining causes technical	-0.124939
-0.593740	set AVX instr.	-0.124939
-0.570596	set SSE4.1 instr.	-0.124939
-0.806044	Suppl. SSE3 instr.	-0.124939
-3.143257	of the specified	-0.124939
-2.550220	on the specified	-0.124939
-1.044290	are typically specified	-0.124939
-1.503829	ways of organizing	-0.124939
-1.202875	penalty for organizing	-0.124939
-1.076806	performance by organizing	-0.124939
-2.638242	// Example 9.5a	-0.124939
-1.920349	in example 9.5a	-0.124939
-0.589384	using example 9.5a	-0.124939
-0.601870	&& is false,	-0.124939
-0.896872	false = false,	-0.124939
-0.896872	!a = false,	-0.124939
-0.601746	free and open	-0.124939
-0.601478	inlining can open	-0.124939
-0.575671	Watcom Another open	-0.124939
-1.587071	the optimal decomposition	-0.124939
-0.358973	here: functional decomposition	-0.124939
-0.358973	decomposition. Functional decomposition	-0.124939
-0.601839	fallacy of measuring	-0.124939
-0.601746	spots and measuring	-0.124939
-1.674936	this by measuring	-0.124939
-0.143412	below. 3.7 File	-0.124939
-0.143412	20 3.7 File	-0.124939
-0.358988	these categories: File	-0.124939
-1.296615	allocation is negligible	-0.124939
-1.201465	handling is negligible	-0.124939
-1.503743	only a negligible	-0.124939
-1.974044	but it took	-0.124939
-1.200484	This code took	-0.124939
-0.588096	15.1c? We took	-0.124939
-0.598171	and so on.	-0.124939
-1.237779	is running on.	-0.124939
-0.965133	options turned on.	-0.124939
-0.873036	logical processors. Hyperthreading	-0.124939
-0.143412	101 10.1 Hyperthreading	-0.124939
-0.143412	(www.intel.com/technology/itj/). 10.1 Hyperthreading	-0.124939
-1.637406	0 - 30	-0.124939
-0.590097	of typically 30	-0.124939
-0.358973	page 142). 30	-0.124939
-1.072812	pointer which initially	-0.124939
-0.897496	Function pointer initially	-0.124939
-0.764664	PLT entry initially	-0.124939
-1.984707	do not occur.	-0.124939
-1.957490	does not occur.	-0.124939
-1.594287	it doesn't occur.	-0.124939
-0.828356	character arrays. Strings	-0.124939
-0.143412	93 9.8 Strings	-0.124939
-0.143412	needs. 9.8 Strings	-0.124939
-1.034862	Preprocessing directives Preprocessing	-0.124939
-0.143412	code. 7.32 Preprocessing	-0.124939
-0.143412	65 7.32 Preprocessing	-0.124939
-2.288736	possible to utilize	-0.124939
-2.272649	order to utilize	-0.124939
-0.567249	to fully utilize	-0.124939
-0.687840	vector of (0,0,0,0,0,0,0,0)	-0.301030
-0.583296	this example: 38	-0.124939
-0.505082	pointers .......................................................................................................... 38	-0.124939
-0.463696	Arrays ..................................................................................................................... 38	-0.124939
-1.940328	pointer or reference.	-0.124939
-1.191048	a const reference.	-0.124939
-0.659955	a null reference.	-0.124939
-0.463740	2 #define FUNCNAME	-0.124939
-0.463740	8 #define FUNCNAME	-0.124939
-0.463740	5 #define FUNCNAME	-0.124939
-0.601951	During the history	-0.124939
-1.598009	code. The history	-0.124939
-0.463696	the past history	-0.124939
-1.071704	}; class CChild2	-0.124939
-0.463696	CChild1 Object1; CChild2	-0.124939
-0.358973	&Object1; p1->Hello(); CChild2	-0.124939
-1.021646	the sign bit:	-0.425969
-0.516496	out sign bit:	-0.124939
-0.577682	various discussion forums	-0.124939
-0.505082	163 Internet forums	-0.124939
-0.463696	Several internet forums	-0.124939
-0.582137	for relative addressing	-0.124939
-0.391176	for self-relative addressing	-0.124939
-0.550863	supports self-relative addressing	-0.124939
-0.723898	size = 1024;	-0.301030
-2.282605	such as C#,	-0.124939
-1.715986	the code. C#,	-0.124939
-0.463696	in Java, C#,	-0.124939
-1.668677	rather than allocating	-0.124939
-0.463714	or re- allocating	-0.124939
-2.210442	so that a+b	-0.124939
-2.363795	- n.a. a+b	-0.124939
-1.048980	algebra reductions: a+b	-0.124939
-1.439668	should be taken	-0.602060
-1.874738	on the microprocessor.	-0.124939
-1.641117	type of microprocessor.	-0.124939
-2.540789	the function argument	-0.124939
-1.595385	to this argument	-0.124939
-1.626291	The same argument	-0.124939
-0.599956	} If Func1	-0.124939
-0.596228	8.21 void Func1	-0.124939
-1.787904	information about Func1	-0.124939
-1.014536	objects in Unix-like	-0.124939
-1.748688	for all Unix-like	-0.124939
-0.601046	--- - -----	-0.124939
-1.297678	x- x -----	-0.124939
-0.463696	x---- x---- -----	-0.124939
-1.769078	b * 2.5	-0.124939
-0.597384	............................................................................... 8 2.5	-0.124939
-1.036703	C++ compilers. 2.5	-0.124939
-1.924188	code and read-only	-0.124939
-0.600677	section and read-only	-0.124939
-2.302631	that are read-only	-0.124939
-2.771970	in a well-structured	-0.124939
-1.202867	clear and well-structured	-0.124939
-1.295198	a more well-structured	-0.124939
-0.596020	remaining bits represent	-0.124939
-0.842259	subsequent counts represent	-0.124939
-0.358973	to truly represent	-0.124939
-1.584195	to find elsewhere.	-0.124939
-0.940817	be found elsewhere.	-0.124939
-0.463696	be reused elsewhere.	-0.124939
-3.206581	of the micro-op	-0.124939
-2.340331	with a micro-op	-0.124939
-0.601365	cache or micro-op	-0.124939
-1.200584	one is best.	-0.124939
-1.297494	implementation is best.	-0.124939
-0.593538	one works best.	-0.124939
-0.601839	Instead of returning	-0.124939
-0.601267	manner by returning	-0.124939
-1.076238	deallocated when returning	-0.124939
-0.583239	Disadvantages are: Long	-0.124939
-1.106984	double precision. Long	-0.124939
-0.836030	of order. Long	-0.124939
-0.896872	(c2 = r1;	-0.124939
-0.896872	(r2 = r1;	-0.124939
-0.597876	c1 < r1;	-0.124939
-1.203285	requires a CPU-	-0.124939
-1.598412	functions. The CPU-	-0.124939
-2.531978	to make CPU-	-0.124939
-0.902658	jump to top	-0.124939
-0.984066	a ; top	-0.124939
-0.837425	ebx ; top	-0.124939
-1.920503	important to decide	-0.124939
-1.078297	factors that decide	-0.124939
-0.600889	We may decide	-0.124939
-0.739253	near each other.	-0.425969
-0.580059	neutralize each other.	-0.124939
-0.483927	a square brackets	-0.124939
-0.483927	The square brackets	-0.124939
-0.527376	the {} brackets	-0.124939
-0.505082	updated since 2004.	-0.124939
-0.358973	v. 8.42n, 2004.	-0.124939
-0.358973	v. 2.1.7, 2004.	-0.124939
-1.641680	count is odd	-0.124939
-0.600971	was an odd	-0.124939
-1.203894	a little odd	-0.124939
-2.635642	// Example 7.7	-0.124939
-0.527355	decrement operators. 7.7	-0.124939
-0.463696	............................................................................................ 36 7.7	-0.124939
-1.115838	C++ Compiler Documentation	-0.124939
-0.463696	GNU Free Documentation	-0.124939
-0.358973	OpenMP. www.openmp.org. Documentation	-0.124939
-0.545993	or error prone.	-0.124939
-0.192312	more error prone.	-0.124939
-0.600337	pow at compile-	-0.124939
-1.071871	but no compile-	-0.124939
-0.573350	should allow compile-	-0.124939
-0.595136	any function. Global	-0.124939
-0.577633	avoid it. Global	-0.124939
-1.351801	function returns. Global	-0.124939
-0.598009	These table lookups	-0.124939
-1.096362	and PLT lookups	-0.124939
-0.358973	of simultaneous lookups	-0.124939
-0.596950	Profile-guided optimization Whole	-0.124939
-1.259442	a program. Whole	-0.124939
-0.358973	optimization /Og Whole	-0.124939
-0.601399	(-a)*(-b) = a*b	-0.124939
-0.358973	= b+a a*b	-0.124939
-0.358973	= b+a, a*b	-0.124939
-2.643798	for the linker.	-0.124939
-2.180968	from the linker.	-0.124939
-1.067254	the dynamic linker.	-0.124939
-1.713456	sake of security.	-0.124939
-0.203918	relates to security.	-0.124939
-1.513627	the table lookup.	-0.124939
-1.516344	a table lookup.	-0.124939
-0.836109	vectorized table lookup.	-0.124939
-1.511433	See page 78	-0.425969
-1.009725	function library. 78	-0.124939
-1.635537	size is handled	-0.124939
-0.601173	triangle is handled	-0.124939
-2.392860	should be handled	-0.124939
-0.590953	Func1 (int a[],	-0.124939
-0.065813	void Func(int a[],	-0.425969
-0.601863	line is implicitly	-0.124939
-0.601315	memcpy function implicitly	-0.124939
-1.340181	are done implicitly	-0.124939
-0.747640	including the terminating	-0.425969
-0.597538	cleanup before terminating	-0.124939
-0.601051	not not _WIN32	-0.124939
-0.586002	Windows platform _WIN32	-0.124939
-0.505082	platform _WIN32 _WIN32	-0.124939
-0.902649	calculations of (2n	-0.124939
-1.626195	a * (2n	-0.124939
-0.890151	The constant (2n	-0.124939
-0.599950	test that measures	-0.124939
-0.599950	counter that measures	-0.124939
-1.288582	The profiler measures	-0.124939
-1.078241	additions and multiplications.	-0.124939
-1.194400	and no multiplications.	-0.124939
-1.052646	only four multiplications.	-0.124939
-0.597406	for less intensive	-0.124939
-0.143412	a computationally intensive	-0.124939
-0.143412	not computationally intensive	-0.124939
-2.800032	can be moved	-0.124939
-2.447620	may be moved	-0.124939
-0.901730	copied or moved	-0.124939
-0.601399	timediff[i] = ReadTSC()	-0.124939
-1.577824	long long ReadTSC()	-0.124939
-1.175244	// Use ReadTSC()	-0.124939
-1.439339	result is valid.	-0.124939
-0.899946	conversion is valid.	-0.124939
-1.437792	operand is valid.	-0.124939
-0.334881	int a[size], b[size];	-0.124939
-0.192687	float a[size], b[size];	-0.425969
-0.584263	Gnu compiler. Not	-0.124939
-0.580795	for vectorization Not	-0.124939
-0.358973	C++ builder. Not	-0.124939
-0.600594	achieved when none	-0.124939
-0.591386	expression, but none	-0.124939
-0.591386	15.1c, but none	-0.124939
-2.366500	rather than "what	-0.124939
-0.358973	typically thinks "what	-0.124939
-0.358973	the kind: "what	-0.124939
-0.726886	an addition. Comparing	-0.124939
-0.358973	condition clause. Comparing	-0.124939
-0.358973	Table 2.1. Comparing	-0.124939
-0.855525	vector processing instructions,	-0.124939
-0.527355	of sequential instructions,	-0.124939
-0.358973	and fence instructions,	-0.124939
-1.637711	instruction sets Microprocessor	-0.124939
-0.563064	wrong branch. Microprocessor	-0.124939
-0.764664	Mostly obsolete. Microprocessor	-0.124939
-0.892813	with template metaprogramming.	-0.124939
-1.328975	may need metaprogramming.	-0.124939
-1.129431	we need metaprogramming.	-0.124939
-0.901601	Intrinsic function Size	-0.124939
-1.831506	of elements Size	-0.124939
-0.575664	these classes. Size	-0.124939
-2.200484	used for metaprogramming,	-0.124939
-0.854189	with template metaprogramming,	-0.124939
-0.854189	C++ template metaprogramming,	-0.124939
-1.987650	compiler can bypass	-0.124939
-1.903850	You can bypass	-0.124939
-0.601379	Replace or bypass	-0.124939
-1.058347	for assembly output.	-0.124939
-1.573142	assembly language output.	-0.124939
-0.589177	produce Boolean output.	-0.124939
-0.879502	of microprocessor ...........................................................................................	-0.124939
-1.237764	point division ...........................................................................................	-0.124939
-0.871547	optimal platform ...........................................................................................	-0.124939
-1.299207	finding the numerically	-0.124939
-0.601604	finds the numerically	-0.124939
-0.358988	// Find numerically	-0.124939
-0.587434	an || expression.	-0.124939
-0.982090	the chosen expression.	-0.124939
-0.563084	an arithmetic expression.	-0.124939
-1.182780	Smart pointers ..........................................................................................................	-0.124939
-0.806089	Copyright notice ..........................................................................................................	-0.124939
-0.659955	12.10 Conclusion ..........................................................................................................	-0.124939
-1.992137	} The InstructionSet()	-0.124939
-0.600516	file for InstructionSet()	-0.425969
-0.879490	identical branches Eliminate	-0.124939
-1.139557	+ 1.; Eliminate	-0.124939
-0.789016	Eliminate jumps Eliminate	-0.124939
-0.601851	saving a backup	-0.124939
-0.875520	need better backup	-0.124939
-0.358973	prevent legitimate backup	-0.124939
-1.744336	instruction set. 13.6	-0.124939
-0.836030	65 65 13.6	-0.124939
-0.527355	..................................................................................................... 126 13.6	-0.124939
-1.589064	{ // Get	-0.124939
-0.596682	version // Get	-0.124939
-2.028502	does not throw	-0.124939
-1.041577	will never throw	-0.124939
-1.203916	can possibly throw	-0.124939
-1.744336	instruction set. More	-0.124939
-0.584309	to a[i] More	-0.124939
-0.567237	these problems. More	-0.124939
-0.505082	vectorization Devirtualization ---x-----	-0.124939
-0.463696	!(!a)=a x-xxxxxxx ---x-----	-0.124939
-0.463696	x-xx----- x--x----- ---x-----	-0.124939
-0.997298	{ return _mm_loadu_si128((__m128i	-0.301030
-1.593067	programming language Before	-0.124939
-1.169140	hot spots Before	-0.124939
-0.527355	mathematical tasks. Before	-0.124939
-1.462530	64-bit systems. Applications	-0.124939
-0.960888	user interface. Applications	-0.124939
-0.358973	page 141. Applications	-0.124939
-0.601046	12 - 25	-0.124939
-0.577633	reduced performance. 25	-0.124939
-0.358973	Development process...................................................................................................... 25	-0.124939
-2.429567	with the AVX-512	-0.124939
-0.314795	function. 12.2 AVX-512	-0.124939
-0.314795	107 12.2 AVX-512	-0.124939
-1.067253	fraction 2 23	-0.124939
-1.156438	of their 23	-0.124939
-0.505082	usability ............................................................................................... 23	-0.124939
-0.876130	cache will evict	-0.124939
-0.588371	18 will evict	-0.124939
-0.588371	17 will evict	-0.124939
-1.609889	the function. Copying	-0.124939
-0.591630	stack memory. Copying	-0.124939
-0.358973	not backwards. Copying	-0.124939
-1.072335	x; for (x	-0.124939
-0.598623	1.0; for (x	-0.124939
-0.598623	B; for (x	-0.124939
-1.045572	anything else being	-0.124939
-0.875527	of n being	-0.124939
-0.358973	data. That being	-0.124939
-0.601430	x^n // sum,	-0.124939
-2.021564	the first sum,	-0.124939
-1.529144	the second sum,	-0.124939
-0.601653	disadvantages: The unrolled	-0.124939
-0.599997	power, loop unrolled	-0.124939
-0.806044	be completely unrolled	-0.124939
-1.077410	cores is slow.	-0.124939
-0.601173	Truncation is slow.	-0.124939
-0.588076	was too slow.	-0.124939
-0.111572	in aa: StoreVector(aa	-0.602060
-2.635642	// Example 7.11	-0.124939
-1.162511	this problem. 7.11	-0.124939
-0.505082	..................................................................................................................... 38 7.11	-0.124939
-0.601951	enters the market	-0.124939
-0.902463	develop and market	-0.124939
-1.667325	The CPU market	-0.124939
-1.641911	vector of vectors,	-0.124939
-0.601768	division in vectors,	-0.124939
-0.589177	as Boolean vectors,	-0.124939
-0.593951	other allocated resource.	-0.124939
-1.119579	a limited resource.	-0.124939
-0.885198	a scarce resource.	-0.124939
-0.201410	"IA-32 Intel Architecture	-0.425969
-0.358988	AMD: "AMD64 Architecture	-0.124939
-2.635642	// Example 7.12	-0.124939
-1.267466	member function. 7.12	-0.124939
-0.527355	conversions.................................................................................................... 40 7.12	-0.124939
-1.296615	registers is limited.	-0.124939
-0.601173	allocations is limited.	-0.124939
-1.778525	is very limited.	-0.124939
-2.638242	// Example 11.3	-0.124939
-1.452609	in example 11.3	-0.124939
-0.601365	const or typedef	-0.124939
-1.060658	function type typedef	-0.124939
-0.593068	desired parameters typedef	-0.124939
-0.376907	range from 0x2700	-0.425969
-1.195121	from address 0x2700	-0.124939
-0.885293	c; }; Replace	-0.124939
-0.505082	running on. Replace	-0.124939
-0.358973	Example 7.34b. Replace	-0.124939
-0.463696	specific model. Instead,	-0.124939
-0.358973	July 2011). Instead,	-0.124939
-0.358973	for NOT. Instead,	-0.124939
-1.299464	libraries and frameworks,	-0.124939
-0.588063	big runtime frameworks,	-0.124939
-0.586761	large graphics frameworks,	-0.124939
-1.550385	x = *(p++)	-0.124939
-1.152930	!= 0) *(p++)	-0.124939
-0.358973	0; i--) *(p++)	-0.124939
-0.595142	low-power CPUs (Intel	-0.124939
-0.594122	-axSSE3, etc. (Intel	-0.124939
-0.659955	CPU only) (Intel	-0.124939
-1.899680	or a nearby	-0.124939
-1.792899	and other nearby	-0.124939
-0.592202	if other nearby	-0.124939
-0.588076	become too fragmented.	-0.124939
-0.916778	to become fragmented.	-0.124939
-0.916778	has become fragmented.	-0.124939
-1.940455	instead of truncation.	-0.124939
-0.203843	rounding and truncation.	-0.124939
-0.249860	manuals. 7.1 Different	-0.124939
-0.249860	26 7.1 Different	-0.124939
-0.358988	a number). Different	-0.124939
-1.442357	calculating the logarithm	-0.124939
-0.601256	static, the logarithm	-0.124939
-0.601256	Taking the logarithm	-0.124939
-1.078320	bit in Day	-0.124939
-0.533664	Wednesday || Day	-0.124939
-0.533664	Tuesday || Day	-0.124939
-2.334795	that is ported	-0.124939
-0.586023	is later ported	-0.124939
-0.563084	not easily ported	-0.124939
-1.202081	static or inline.	-0.124939
-1.842779	the function inline.	-0.124939
-2.407928	function is big.	-0.124939
-0.597959	become very big.	-0.124939
-1.302254	is too big.	-0.124939
-0.600953	allocation and deallocation	-0.425969
-0.358988	The allocation, deallocation	-0.124939
-0.338814	linkage table (PLT)	-0.124939
-2.635642	// Example 7.22	-0.124939
-0.505082	........................................................................... 54 7.22	-0.124939
-0.463696	alternative implementations. 7.22	-0.124939
-0.143412	22 3.14 Context	-0.124939
-0.143412	caching. 3.14 Context	-0.124939
-0.358988	be renewed. Context	-0.124939
-2.635642	// Example 7.23	-0.124939
-0.885293	c; }; 7.23	-0.124939
-0.505082	.............................................................................................................. 54 7.23	-0.124939
-0.902873	running the services	-0.124939
-0.586750	services. Many services	-0.124939
-0.828313	for background services	-0.124939
-2.635642	// Example 7.20	-0.124939
-0.575664	non-static access. 7.20	-0.124939
-0.505082	(methods)......................................................................... 53 7.20	-0.124939
-1.943745	this is extremely	-0.124939
-1.835786	method is extremely	-0.124939
-0.600485	abuse is extremely	-0.124939
-0.579364	is 8 kb	-0.124939
-1.119687	of 8 kb	-0.124939
-0.870031	is 512 kb	-0.124939
-0.889577	space by joining	-0.124939
-1.504604	avoided by joining	-0.124939
-0.595251	compact by joining	-0.124939
-1.503426	applies to decrement	-0.124939
-0.600677	increment and decrement	-0.124939
-0.600677	Increment and decrement	-0.124939
-0.601399	0x20 = 0x1C.	-0.124939
-0.599562	from set 0x1C.	-0.124939
-0.896714	set number 0x1C.	-0.124939
-0.504320	malloc and free.	-0.124939
-1.598425	available for free.	-0.124939
-2.226881	{ // Check	-0.124939
-0.599050	y=temp;} // Check	-0.124939
-0.580829	version. 2. Check	-0.124939
-1.940312	instead of double,	-0.124939
-1.364204	a 64-bit double,	-0.124939
-0.527373	int, float, double,	-0.124939
-0.996581	a simple periodic	-0.425969
-1.080385	A simple periodic	-0.124939
-2.635642	// Example 7.27	-0.124939
-0.593064	overloaded functions. 7.27	-0.124939
-0.527355	.............................................................................................. 56 7.27	-0.124939
-2.635642	// Example 7.24	-0.124939
-0.505082	.................................................................................. 55 7.24	-0.124939
-0.358973	page 53. 7.24	-0.124939
-2.304984	than the product	-0.124939
-0.597952	performing software product	-0.124939
-0.505082	A competing product	-0.124939
-2.635642	// Example 7.25	-0.124939
-0.836046	to integers. 7.25	-0.124939
-0.505082	.................................................................................................................... 55 7.25	-0.124939
-2.635642	// Example 7.28	-0.124939
-1.021871	simple cases. 7.28	-0.124939
-0.527355	............................................................................................. 56 7.28	-0.124939
-1.553799	pointers and references,	-0.124939
-0.541288	parameters, pointers, references,	-0.124939
-0.358973	systems: Pointers, references,	-0.124939
-0.692616	and VIA CPUs"	-0.301030
-0.598239	takes some experience	-0.124939
-1.646847	of programming experience	-0.124939
-0.570588	user might experience	-0.124939
-1.728830	efficient to determine	-0.124939
-2.251747	order to determine	-0.124939
-0.600264	Windows) to determine	-0.124939
-0.557841	template template <typename	-0.124939
-0.557841	checking template <typename	-0.124939
-0.557841	parameter: template <typename	-0.124939
-0.726886	/FA -S Generate	-0.124939
-0.463696	-openmp -static Generate	-0.124939
-0.358973	file /Fm Generate	-0.124939
-2.125332	it is certainly	-0.124939
-0.600485	seen, is certainly	-0.124939
-0.588060	8.1 below. Devirtualization	-0.124939
-1.203803	Automatic vectorization Devirtualization	-0.124939
-0.358973	Example 8.19. Devirtualization	-0.124939
-2.771970	in a pivot	-0.124939
-0.601122	use as pivot	-0.124939
-1.048980	a suitable pivot	-0.124939
-1.345766	by 16 __declspec(	-0.124939
-0.563084	__restrict __restrict __declspec(	-0.124939
-0.659955	__attribute(( aligned(16))) __declspec(	-0.124939
-1.804056	case of mispredictions	-0.124939
-0.581642	cause branch mispredictions	-0.124939
-0.581642	Provoke branch mispredictions	-0.124939
-0.642660	int i, a[100],	-0.124939
-2.467308	number of allocations	-0.124939
-0.600356	seven memory allocations	-0.124939
-1.362789	are many allocations	-0.124939
-0.595500	modules if necessary,	-0.124939
-0.595500	space, if necessary,	-0.124939
-0.595500	modified, if necessary,	-0.124939
-2.635642	// Example 9.4	-0.124939
-1.177671	library functions. 9.4	-0.124939
-0.527355	together...................................... 88 9.4	-0.124939
-1.960042	than a float.	-0.124939
-1.257664	of four float.	-0.124939
-0.789042	short int, float.	-0.124939
-0.369948	x + 1.0f;}	-0.124939
-0.576843	square(x) + 1.0f;}	-0.124939
-2.278797	code is indeed	-0.124939
-0.601173	8.21 is indeed	-0.124939
-0.601569	exceptions are indeed	-0.124939
-1.419567	in table 9.1	-0.124939
-1.351848	memory access 9.1	-0.124939
-0.550832	............................................................................................. 87 9.1	-0.124939
-1.497052	each other (not	-0.124939
-0.583244	versions tested (not	-0.124939
-0.659955	or NAN (not	-0.124939
-1.995426	have a built-in	-0.124939
-1.077962	instructions. The built-in	-0.124939
-0.505082	often inserts built-in	-0.124939
-1.600299	choice of n.	-0.124939
-0.866248	for positive n.	-0.124939
-0.358973	positive value, n.	-0.124939
-2.430788	to a complete	-0.124939
-1.197925	data. A complete	-0.124939
-0.592533	http://www.agner.org/optimize/asmlib.zip contains complete	-0.124939
-2.730229	x x (x)	-0.124939
-1.271995	x- x (x)	-0.124939
-0.883519	(x) x (x)	-0.124939
-0.573350	esp ebx ecx,	-0.124939
-0.550845	eax, edx, ecx,	-0.124939
-0.463696	?Func2@@YAXQAHAAH@Z ENDP ecx,	-0.124939
-0.203467	created or modified.	-0.425969
-2.592591	is not modified.	-0.124939
-0.172705	n.a. Constant folding	-0.124939
-0.172705	6.0f; Constant folding	-0.124939
-0.172705	places. Constant folding	-0.124939
-1.597961	where it expects	-0.124939
-1.639563	the user expects	-0.124939
-1.003315	The user expects	-0.124939
-0.892400	function // Call	-0.124939
-0.378052	... // Call	-0.425969
-2.686715	can be joined	-0.124939
-1.508359	will be joined	-0.124939
-1.372979	use vector classes,	-0.124939
-0.588067	use string classes,	-0.124939
-0.580822	well-tested functions, classes,	-0.124939
-1.638816	compilers can compute	-0.124939
-0.594998	code must compute	-0.124939
-1.058278	loop ; compute	-0.124939
-0.805387	matter of interpreting	-0.425969
-0.601655	framework for interpreting	-0.124939
-0.601759	SVML and LIBM	-0.124939
-0.564027	131. AMD LIBM	-0.124939
-0.564027	__vrd2_exp AMD LIBM	-0.124939
-1.310101	code that accesses	-0.124939
-0.591629	feature. All accesses	-0.124939
-2.762335	to the $B1$2	-0.124939
-0.855513	eax, 100 $B1$2	-0.124939
-0.463696	/ jl $B1$2	-0.124939
-1.261815	hard disk copying.	-0.124939
-0.563074	math. Memory copying.	-0.124939
-0.358973	preventing illegitimate copying.	-0.124939
-0.042750	Developer’s Manual", Volume	-0.124939
-0.090173	Programmer’s Manual", Volume	-0.124939
-2.686715	can be placed	-0.124939
-0.895071	then be placed	-0.124939
-1.850066	must be placed	-0.124939
-0.833037	the hot spot.	-0.124939
-0.800415	a hot spot.	-0.124939
-0.440684	this hot spot.	-0.124939
-2.610147	of a variable,	-0.124939
-0.601127	about a variable,	-0.124939
-1.939345	an integer variable,	-0.124939
-1.977882	lot of jumping	-0.124939
-2.200138	used for jumping	-0.124939
-0.587464	destructors after jumping	-0.124939
-1.496851	long time compared	-0.124939
-0.557802	following disadvantages compared	-0.124939
-0.358973	in duration compared	-0.124939
-1.503426	applies to 3-dimensional	-0.124939
-0.203467	video or 3-dimensional	-0.425969
-0.601839	destructor of x.	-0.124939
-1.299863	result in x.	-0.124939
-0.601168	restriction on x.	-0.124939
-0.601837	-(-a) to a.	-0.124939
-1.299572	results in a.	-0.124939
-0.599136	4) + a.	-0.124939
-0.601837	pre-increment to post-increment.	-0.124939
-0.601365	pre-increment or post-increment.	-0.124939
-1.904515	efficient than post-increment.	-0.124939
-2.136402	it is sufficient	-0.425969
-2.538377	may be sufficient	-0.124939
-2.834319	it is evicted	-0.124939
-2.430703	to be evicted	-0.124939
-2.160809	will be evicted	-0.124939
-0.601951	separating the flags	-0.124939
-0.588631	and zero flags	-0.124939
-0.463696	so-called partial flags	-0.124939
-0.601768	r in Sum2	-0.124939
-1.904515	efficient than Sum2	-0.124939
-0.358973	functions Sum1, Sum2	-0.124939
-0.687840	vector of (2,2,2,2,2,2,2,2)	-0.301030
-0.500139	DWORD PTR [edx]	-0.301030
-2.635642	// Example 7.14	-0.124939
-0.527373	Loops...................................................................................................................... 45 7.14	-0.124939
-0.505082	too big. 7.14	-0.124939
-2.635642	// Example 7.16	-0.124939
-0.541288	............................................................................................... 50 7.16	-0.124939
-0.835998	operating systems". 7.16	-0.124939
-2.635642	// Example 7.17	-0.124939
-1.113710	the object. 7.17	-0.124939
-0.541288	.............................................................................................. 50 7.17	-0.124939
-0.836924	to using templates.	-0.124939
-0.358988	as recursive templates.	-0.124939
-2.635642	// Example 7.13	-0.124939
-0.541288	different microprocessors. 7.13	-0.124939
-0.527355	statements............................................................................. 43 7.13	-0.124939
-2.635642	// Example 7.19	-0.124939
-0.577664	128 bytes. 7.19	-0.124939
-0.527355	............................................................................ 51 7.19	-0.124939
-0.784485	etc. But beware	-0.124939
-0.538721	b) But beware	-0.124939
-0.538721	inlined. But beware	-0.124939
-2.635642	// Example 7.18	-0.124939
-0.577633	on performance. 7.18	-0.124939
-0.527355	classes............................................................................................ 51 7.18	-0.124939
-1.362626	a graphics card	-0.124939
-0.659984	graphics accelerator card	-0.124939
-0.541310	extremely inefficient, (4)	-0.124939
-0.358988	and finally (4)	-0.124939
-0.896738	swap two elements:	-0.124939
-0.598545	two array elements:	-0.124939
-1.390027	be a viable	-0.124939
-0.505102	.......................................................................................................... 38 7.10	-0.124939
-0.659984	page 93. 7.10	-0.124939
-0.203497	two = _mm_set1_epi16(2);	-0.425969
-1.367887	container class templates,	-0.124939
-0.582119	Some STL templates,	-0.124939
-0.601759	bloat and complexity	-0.124939
-1.165086	the high complexity	-0.124939
-0.505118	130 14.4 511	-0.124939
-0.463714	14.4 511 511	-0.124939
-0.249867	37 7.8 Member	-0.124939
-0.249867	changed. 7.8 Member	-0.124939
-0.601655	utility for modifying	-0.124939
-0.601276	double by modifying	-0.124939
-1.551086	may have undesired	-0.124939
-0.580829	may produce undesired	-0.124939
-0.463714	................................................................................................................ 48 7.15	-0.124939
-0.358988	this respect. 7.15	-0.124939
-0.601663	GOT. The symbol	-0.124939
-0.855558	This so-called symbol	-0.124939
-1.778525	is very problematic	-0.124939
-1.197758	are particularly problematic	-0.124939
-1.867952	has to invest	-0.124939
-0.901086	worthwhile to invest	-0.124939
-1.078279	memset and memcpy,	-0.124939
-2.283118	such as memcpy,	-0.124939
-0.379663	Sum2 and Sum3	-0.124939
-0.557802	registers anyway. Pure	-0.124939
-0.541310	needs them. Pure	-0.124939
-2.834319	it is impossible	-0.124939
-2.302631	that are impossible	-0.124939
-0.201656	inheritance class B1;	-0.425969
-0.360309	a store forwarding	-0.425969
-0.463714	huge). Far storage,	-0.124939
-0.358988	have little-endian storage,	-0.124939
-1.368188	(see page 107).	-0.124939
-0.593296	(10000 / 64)	-0.124939
-0.358988	size (typically 64)	-0.124939
-1.782753	the table static.	-0.124939
-0.358988	a lookup-table static.	-0.124939
-0.541321	and _WIN64 _M_X64	-0.124939
-0.463714	_WIN64 _M_X64 _M_X64	-0.124939
-0.596244	<asmlib.h> void CriticalFunction();	-0.124939
-0.358988	= ReadTSC(); CriticalFunction();	-0.124939
-0.463714	--xx----- x-xxx---x x-xxx---x	-0.124939
-0.463714	x--x----- --xx----- x-xxx---x	-0.124939
-0.894153	platforms as shown	-0.124939
-0.597568	_mm_empty() as shown	-0.124939
-0.601759	documentation and lack	-0.124939
-1.818787	operating systems lack	-0.124939
-0.203497	y2 = a2	-0.124939
-0.203497	y1 = a1	-0.124939
-2.066092	(see page 16)	-0.124939
-1.524749	i += 16)	-0.124939
-2.610606	of a debugger.	-0.124939
-2.307908	with a debugger.	-0.124939
-1.900431	compiler is mostly	-0.124939
-1.203218	mode and mostly	-0.124939
-1.571049	const & a)	-0.124939
-0.550872	square (float a)	-0.124939
-0.358988	fld qword ptr	-0.124939
-0.358988	fistp dword ptr	-0.124939
-0.601663	__attribute__((fastcall)). The fastcall	-0.124939
-0.594677	names. Use fastcall	-0.124939
-2.467643	number of accumulators	-0.124939
-1.069709	use multiple accumulators	-0.124939
-0.797900	no pointer aliasing"	-0.124939
-0.143416	126 13.5 Implementation	-0.124939
-0.143416	elsewhere. 13.5 Implementation	-0.124939
-1.925440	the performance significantly	-0.124939
-0.595572	speeded up significantly	-0.124939
-0.505118	exploiting fine-grained parallelism.	-0.124939
-0.505102	contains natural parallelism.	-0.124939
-0.065815	3628800, 39916800, 479001600};	-0.124939
-0.463714	the book "Performance	-0.124939
-0.358988	Adolfy Hoisie: "Performance	-0.124939
-0.902709	x is type-casted	-0.124939
-1.202161	pointers are type-casted	-0.124939
-0.599344	*x; double x4	-0.124939
-0.598907	x^2 float x4	-0.124939
-0.958161	the clock frequency.	-0.124939
-0.902664	step of interpretation	-0.124939
-0.601379	compilation or interpretation	-0.124939
-1.700583	vector operations (chapter	-0.124939
-0.594853	Out-of-order execution (chapter	-0.124939
-0.188126	my own research,	-0.124939
-0.659984	Intensive Codes", SIAM	-0.124939
-0.358988	A. Hoisie, SIAM	-0.124939
-0.889076	Induction; ; a[i+1]	-0.124939
-1.010214	= Induction; a[i+1]	-0.124939
-0.541310	xxxxxxx-x xxxxxxxxx x-xxx----	-0.124939
-0.358988	---x----- x---x---x x-xxx----	-0.124939
-0.601379	buffer or send	-0.124939
-0.589188	please don't send	-0.124939
-0.536447	7.31b char string[100],	-0.124939
-0.536447	7.31a char string[100],	-0.124939
-0.891332	char 16 SSSE3	-0.124939
-0.594142	add, etc. SSSE3	-0.124939
-1.554814	types of expressions,	-0.124939
-2.580881	floating point expressions,	-0.124939
-0.896027	function version CriticalFunctionType	-0.124939
-0.463714	Function prototype CriticalFunctionType	-0.124939
-1.872100	(see page 71).	-0.124939
-1.311713	(See page 71).	-0.124939
-2.140987	stored in ASCII	-0.124939
-0.358988	a zero-terminated ASCII	-0.124939
-2.191732	are not overlapping	-0.124939
-1.075835	CPU from overlapping	-0.124939
-2.772600	in a computationally	-0.124939
-2.191732	are not computationally	-0.124939
-0.588084	Intel mechanism executes	-0.124939
-0.764700	Example 12.4b executes	-0.124939
-0.789055	the project window	-0.124939
-0.358988	the disassembly window	-0.124939
-0.601379	five or ten	-0.124939
-1.500012	critical function ten	-0.124939
-0.601439	aligned // Structure	-0.124939
-0.527376	bytes smaller. Structure	-0.124939
-1.679329	kinds of jobs.	-0.124939
-0.828356	for background jobs.	-0.124939
-0.463714	everybody. So please	-0.124939
-0.463714	messages saying please	-0.124939
-0.249867	55 7.24 Unions	-0.124939
-0.249867	53. 7.24 Unions	-0.124939
-0.463714	3B. developer.intel.com. AMD:	-0.124939
-0.659984	produced regularly. AMD:	-0.124939
-1.596893	d = ((a*x+b)*x+c)*x+d	-0.124939
-2.787423	x x ((a*x+b)*x+c)*x+d	-0.124939
-0.557818	more important. 9.2	-0.124939
-0.550854	......................................................................................... 87 9.2	-0.124939
-0.601870	kilobyte is 1024	-0.124939
-0.601844	sizes to 1024	-0.124939
-0.472680	it was programmed.	-0.124939
-0.463714	Example 12.5. Aligned	-0.124939
-0.358988	good performance). Aligned	-0.124939
-2.572959	on the past	-0.124939
-0.463714	programs. Writing past	-0.124939
-0.882502	allocated memory. 9.6	-0.124939
-0.541321	...................................................................................................... 90 9.6	-0.124939
-2.511249	of the object's	-0.124939
-0.366842	a2, b1, b2,	-0.425969
-0.600159	template because partial	-0.124939
-1.299540	the so-called partial	-0.124939
-0.203497	a+b+c+d = (a+b)+(c+d)	-0.425969
-2.195680	short int (16	-0.124939
-1.193387	vector size (16	-0.124939
-1.495276	the instruction xor	-0.124939
-0.575684	push mov xor	-0.124939
-0.527389	better. Remember again,	-0.124939
-0.836041	the logarithm again,	-0.124939
-0.527376	...................................................................................................................... 96 9.9	-0.124939
-0.527376	at www.agner.org/optimize/cppexamples.zip. 9.9	-0.124939
-1.544123	time and resolve	-0.124939
-0.600690	find and resolve	-0.124939
-2.572959	on the context.	-0.124939
-1.467901	the new context.	-0.124939
-0.741160	appropriate version (May	-0.425969
-1.511579	See page 131.	-0.124939
-2.763279	if the goal	-0.124939
-0.541321	more realistic goal	-0.124939
-1.078279	dispatching and discovered	-0.124939
-0.900474	programmers have discovered	-0.124939
-0.586658	16 float Exp(float	-0.124939
-0.586658	series float Exp(float	-0.124939
-0.505102	..................................................................................................... 93 9.8	-0.124939
-0.463714	specific needs. 9.8	-0.124939
-2.149690	n.a. n.a. _MSC_VER	-0.124939
-0.463714	data #ifdef _MSC_VER	-0.124939
-0.143416	...................................................................................... 156 16.3	-0.124939
-0.143416	large. 156 16.3	-0.124939
-0.463714	a 90% chance	-0.124939
-0.358988	a 50-50 chance	-0.124939
-2.954844	can be manipulated	-0.124939
-0.590917	CPUID was manipulated	-0.124939
-1.854996	calculation of c+b	-0.124939
-0.573379	the subexpression c+b	-0.124939
-1.296038	you to override	-0.124939
-0.901086	ability to override	-0.124939
-1.632286	not use branches,	-0.124939
-1.049075	can eliminate branches,	-0.124939
-1.437686	in multiple applications,	-0.124939
-1.069168	for such applications,	-0.124939
-1.959275	I have developed	-0.124939
-1.603216	as well developed	-0.124939
-0.541310	template method. 7.29	-0.124939
-0.358988	7.28 Templates...............................................................................................................57 7.29	-0.124939
-1.078756	execution of CriticalFunction.	-0.124939
-1.503641	calls to CriticalFunction.	-0.124939
-1.275333	This manual discusses	-0.124939
-0.583260	This section discusses	-0.124939
-0.358988	FUNCNAME SelectAddMul_SSE41 #elif	-0.124939
-0.358988	FUNCNAME SelectAddMul_SSE2 #elif	-0.124939
-0.541321	<<6 ); 7.26	-0.124939
-0.527376	................................................................................................................... 56 7.26	-0.124939
-0.065815	manual 4: "Instruction	-0.425969
-1.641931	b is 400	-0.124939
-0.601439	a[100]; // 400	-0.124939
-0.510599	x Loop invariant	-0.124939
-0.510599	compiler. Loop invariant	-0.124939
-0.201243	b*x*x + c*x	-0.425969
-2.653706	the code carefully	-0.124939
-0.599531	versions, each carefully	-0.124939
-0.600594	stack when CriticalInnerFunction	-0.124939
-0.596244	14.1c void CriticalInnerFunction	-0.124939
-0.358988	a/a=1 --------x a/1=a	-0.124939
-0.358988	a/a=1 ----x---x a/1=a	-0.124939
-2.022102	x) { __m128	-0.124939
-1.179643	The type __m128	-0.124939
-1.185249	the & operator;	-0.124939
-0.875559	the | operator;	-0.124939
-2.334442	as a subexpression.	-0.124939
-1.455667	the constant subexpression.	-0.124939
-1.203359	space is freed	-0.124939
-2.538377	may be freed	-0.124939
-0.541321	bitwise OR operator,	-0.124939
-0.358988	overloaded assignment operator,	-0.124939
-0.659984	= &Object2; p->Hello();	-0.124939
-0.358988	&Object1; p->NotPolymorphic(); p->Hello();	-0.124939
-1.366736	on Intel CPUs:	-0.124939
-1.646664	and VIA CPUs:	-0.124939
-1.173054	and all 0's	-0.124939
-1.574147	with all 0's	-0.124939
-2.719394	is a chip	-0.124939
-2.647771	the same chip	-0.124939
-0.550854	the ^ operator.	-0.124939
-0.358988	the sizeof operator.	-0.124939
-0.901936	process can proceed	-0.124939
-1.072584	should also proceed	-0.124939
-0.598870	version int CriticalFunction_386(int	-0.425969
-0.601759	workstations and scientific	-0.124939
-0.601773	niche in scientific	-0.124939
-1.300079	exponent is biased	-0.124939
-2.334442	as a biased	-0.124939
-2.719394	is a minor	-0.124939
-0.896096	a possible minor	-0.124939
-2.550593	on the screen.	-0.124939
-0.601610	refresh the screen.	-0.124939
-1.874816	on the market.	-0.124939
-0.143419	align(16)) __attribute(( aligned(16)))	-0.124939
-2.800783	can be justified	-0.124939
-2.448092	may be justified	-0.124939
-0.902658	or to exit	-0.124939
-0.527389	exit. Calling exit	-0.124939
-1.126117	y = cos(x);	-0.124939
-0.895681	or variable having	-0.124939
-0.463714	and p2 having	-0.124939
-1.448520	time. A for-loop	-0.124939
-0.594564	7.32b. A for-loop	-0.124939
-0.883113	1 1 char,	-0.124939
-0.659984	data types: char,	-0.124939
-0.463714	a*1=a (-a)*(-b)=a*b a/a=1	-0.124939
-0.358988	(-a)*(-b)=a*b ---xxx--- a/a=1	-0.124939
-0.159911	a serious legal	-0.124939
-1.776798	any other resource,	-0.124939
-0.885246	a scarce resource,	-0.124939
-1.129438	and automatic parallelization.	-0.124939
-0.538993	Use automatic parallelization.	-0.124939
-1.705306	way of keeping	-0.124939
-1.632755	cost of keeping	-0.124939
-0.463714	reliable. Event-based sampling:	-0.124939
-0.358988	line. Time-based sampling:	-0.124939
-2.638242	// Example 12.5.	-0.124939
-0.588074	F64vec4 Table 12.5.	-0.124939
-0.601072	advance. This reduces	-0.124939
-0.589667	that automatically reduces	-0.124939
-2.431153	to a non-member	-0.124939
-0.579351	all local non-member	-0.124939
-2.800783	can be vectorized,	-0.124939
-1.072470	still be vectorized,	-0.124939
-0.065815	{temp=x; x=y; y=temp;}	-0.124939
-0.143416	all disturbing influences	-0.124939
-0.143416	All disturbing influences	-0.124939
-1.844505	following example explains	-0.124939
-0.588076	expression better explains	-0.124939
-2.294812	order to emulate	-0.124939
-1.378482	library can emulate	-0.124939
-0.601379	three or four,	-0.124939
-1.675012	loop by four,	-0.124939
-0.570023	and I believe	-0.124939
-0.570023	expected. I believe	-0.124939
-1.378777	value in stdint.h	-0.124939
-1.351373	header file stdint.h	-0.124939
-0.449625	Common subexpression elimin.,	-0.124939
-1.169989	only one instance.	-0.124939
-0.198338	matrix void TransposeCopy(double	-0.425969
-0.600980	CPU, an insufficient	-0.124939
-0.600264	Compiler has insufficient	-0.124939
-0.902885	overcome the dangers	-0.124939
-2.467643	number of dangers	-0.124939
-1.743306	objects are aligned.	-0.124939
-0.463714	be optimally aligned.	-0.124939
-2.305238	than the external	-0.124939
-0.901412	link with external	-0.124939
-0.810753	cout << "Error:	-0.425969
-0.570613	tmmintrin.h SSE4.1 smmintrin.h	-0.124939
-0.505102	nmmintrin.h (MS) smmintrin.h	-0.124939
-0.588081	Big runtime frameworks.	-0.124939
-0.586032	and interface frameworks.	-0.124939
-0.143416	{ Sunday, Monday,	-0.124939
-0.143416	constants Sunday, Monday,	-0.124939
-0.806169	Mac OS X,	-0.124939
-0.600518	restrictions. A GNU	-0.124939
-0.358988	compiler price GNU	-0.124939
-0.599551	13.1 page 127.	-0.124939
-0.557810	-128 generates 127.	-0.124939
-1.029048	each version FuncType	-0.124939
-0.584698	selected version FuncType	-0.124939
-1.830355	call to C1::f	-0.124939
-1.065362	can call C1::f	-0.124939
-1.558356	less efficient. Splitting	-0.124939
-0.358988	this rule. Splitting	-0.124939
-1.113779	vector operations. Algorithms	-0.124939
-0.358988	and matrixes. Algorithms	-0.124939
-0.463714	-mssse3 -msse4.1 -mAVX	-0.124939
-0.358988	-msse4.1 /arch:SSE4.1 -mAVX	-0.124939
-1.540361	have to worry	-0.124939
-1.861111	a single instruction.	-0.124939
-1.162640	bit scan instruction.	-0.124939
-1.789532	} // x^2	-0.124939
-0.599059	x; // x^2	-0.124939
-2.954844	can be disabled	-0.124939
-2.107517	they are disabled	-0.124939
-2.762877	to the CPU-specific	-0.124939
-0.601569	counters are CPU-specific	-0.124939
-2.219225	// Example 8.26b	-0.124939
-0.887038	; Example 8.26b	-0.124939
-1.377837	set. The preprocessing	-0.124939
-1.543885	library has preprocessing	-0.124939
-1.490326	with different strides.	-0.124939
-0.971442	with fixed strides.	-0.124939
-0.902658	0 to 15.	-0.124939
-2.016065	on page 15.	-0.124939
-1.280499	takes to develop	-0.425969
-0.897703	}; // Full	-0.425969
-2.303938	{ // (N	-0.124939
-0.527376	#define N1 (N	-0.124939
-0.601439	c[arraysize]; // Enable	-0.124939
-0.463714	to 12.1a. Enable	-0.124939
-1.735433	in most cases:	-0.124939
-1.921266	the following cases:	-0.124939
-1.046615	code to non-AVX	-0.124939
-0.598952	a+b+c = a+(b+c)	-0.124939
-0.598952	(a+b)+c = a+(b+c)	-0.124939
-0.204030	moving the mouse.	-0.124939
-0.541310	- 5. www.amd.com.	-0.124939
-0.358988	15h Processors". www.amd.com.	-0.124939
-0.601844	-100 to -56	-0.124939
-1.883598	the result -56	-0.124939
-2.096095	is more difficult.	-0.124939
-0.594931	optimization more difficult.	-0.124939
-0.594142	/QaxSSE3, etc. -msse3	-0.124939
-0.463714	/openmp /MT -msse3	-0.124939
-0.764700	example 8.26a (32-bit	-0.124939
-0.358988	bigger segments (32-bit	-0.124939
-0.550863	the B values.	-0.124939
-0.764700	"best case" values.	-0.124939
-0.659984	/arch:SSE2 -msse2 /arch:SSE2	-0.124939
-0.358988	or double) /arch:SSE2	-0.124939
-1.068506	vector objects Vec8s	-0.124939
-0.550854	128 Is16vec8 Vec8s	-0.124939
-0.599567	MyChild> class CParent	-0.124939
-0.567255	2" Here CParent	-0.124939
-0.659984	page 78). Adding	-0.124939
-0.358988	wrap around. Adding	-0.124939
-0.358988	and 3B. developer.intel.com.	-0.124939
-0.358988	Reference Manual". developer.intel.com.	-0.124939
-0.143416	pointer -fomit- frame-	-0.124939
-0.143416	/Oy -fomit- frame-	-0.124939
-0.538986	library #include <stdio.h>	-0.124939
-0.538986	16.2 #include <stdio.h>	-0.124939
-0.527376	............................................................. 96 9.11	-0.124939
-0.659984	SIAM 2001. 9.11	-0.124939
-2.367910	because the relocations	-0.124939
-1.412668	will generate relocations	-0.124939
-0.378913	infinity or NAN	-0.124939
-0.527376	.......................................................................................... 96 9.10	-0.124939
-0.358988	is opposite). 9.10	-0.124939
-0.408234	Writes "Hello 2"	-0.124939
-1.188530	} return sum;	-0.124939
-1.423986	= 0, sum;	-0.124939
-0.563094	of vectors. 12.10	-0.124939
-0.557810	....................................................... 120 12.10	-0.124939
-1.600572	overhead of semaphores,	-0.124939
-2.283118	such as semaphores,	-0.124939
-0.541310	most microprocessors. Multiplication	-0.124939
-0.726919	the microprocessor. Multiplication	-0.124939
-0.601411	N1 = N&(N-1)	-0.124939
-1.076002	2 then N&(N-1)	-0.124939
-0.504548	compiled to assembly:	-0.425969
-0.065815	are produced regularly.	-0.124939
-0.902280	searching for vacant	-0.124939
-2.592591	is not vacant	-0.124939
-3.070236	in the early	-0.124939
-0.595297	compilation. Some early	-0.124939
-0.895680	contains any non-polymorphic	-0.124939
-0.358988	// Place non-polymorphic	-0.124939
-0.203497	C = 3.3;	-0.124939
-0.896068	to many users.	-0.124939
-0.597968	working software users.	-0.124939
-1.076565	must have extern	-0.124939
-0.600016	entry point extern	-0.124939
-2.305238	than the heap.	-0.124939
-1.630091	a memory heap.	-0.124939
-0.902297	format. The formats	-0.124939
-0.596137	standardized file formats	-0.124939
-2.800783	can be ruled	-0.124939
-1.970069	cannot be ruled	-0.124939
-0.902709	addresses is reused	-0.124939
-2.954844	can be reused	-0.124939
-0.573369	last vector. Organize	-0.124939
-0.358988	a bottleneck. Organize	-0.124939
-0.760395	: public CParent<CChild1>	-0.425969
-2.638242	// Example 14.12b	-0.124939
-2.150653	in example 14.12b	-0.124939
-0.593513	Small data types:	-0.124939
-0.593513	Larger data types:	-0.124939
-2.669772	for the FDIV	-0.124939
-0.601663	bug". The FDIV	-0.124939
-2.047374	before the decimal	-0.124939
-2.340689	with a decimal	-0.124939
-0.598907	1.f; float nfac	-0.124939
-1.229469	*= x; nfac	-0.124939
-0.861625	and network connections.	-0.124939
-0.579343	Open database connections.	-0.124939
-2.716424	in a PC.	-0.124939
-0.601134	required a PC.	-0.124939
-1.845121	based on hacks	-0.124939
-0.358988	than self-styled hacks	-0.124939
-0.595433	search times 24	-0.124939
-0.527376	algorithm ....................................................................................... 24	-0.124939
-0.597111	statements often suffer	-0.124939
-1.179656	can therefore suffer	-0.124939
-0.601759	catch, and throw.	-0.124939
-1.444854	function can throw.	-0.124939
-0.891122	same bits differently.	-0.124939
-0.593538	linking works differently.	-0.124939
-0.249867	16 __declspec( align(16))	-0.124939
-0.249867	aligned(16))) __declspec( align(16))	-0.124939
-1.124583	of each element,	-0.425969
-0.847496	Program loading Often,	-0.124939
-0.527389	or two. Often,	-0.124939
-0.726919	error condition. Replacing	-0.124939
-0.726946	function inline. Replacing	-0.124939
-0.201243	a*x*x*x + b*x*x	-0.425969
-0.759227	in assembly language".	-0.124939
-0.902488	efficient, and that's	-0.124939
-1.072696	threads, but that's	-0.124939
-0.505102	.................................................................................... 124 13.3	-0.124939
-0.659984	of programming. 13.3	-0.124939
-2.159114	there are inherent	-0.124939
-1.497348	not have inherent	-0.124939
-0.358988	-parallel -openmp -static	-0.124939
-0.358988	-m32 -m64 -static	-0.124939
-0.498972	bytes S1 ArrayOfStructures[100];	-0.124939
-0.498972	}; S1 ArrayOfStructures[100];	-0.124939
-0.541321	strategies........................................................................................ 122 13.2	-0.124939
-0.541310	source files. 13.2	-0.124939
-1.194723	The integer comparison	-0.124939
-0.505102	an approximate comparison	-0.124939
-0.601957	takes the hint	-0.124939
-1.503743	only a hint	-0.124939
-0.527376	.......................................................................................... 126 13.5	-0.124939
-0.505102	found elsewhere. 13.5	-0.124939
-0.598065	124 2 13.4	-0.124939
-0.358988	reliable decision. 13.4	-0.124939
-0.593068	......................................................................... 128 13.7	-0.124939
-0.505102	critical. 129 13.7	-0.124939
-0.505102	in kernel code"	-0.124939
-0.358988	name "position-independent code"	-0.124939
-0.601077	well, of course.	-0.124939
-0.601077	safe, of course.	-0.124939
-0.538986	vectorized #include <dvec.h>	-0.124939
-0.538986	114 #include <dvec.h>	-0.124939
-1.962455	inside the loop,	-0.124939
-1.190104	the while loop,	-0.124939
-0.877730	8-bit signed number,	-0.124939
-0.828381	CPU family number,	-0.124939
-1.600252	such a case:	-0.124939
-0.563094	to lower case:	-0.124939
-1.526780	avoided by rolling	-0.124939
-0.598258	8.26a by rolling	-0.124939
-1.601444	outside the loop:	-0.124939
-0.584315	Critical innermost loop:	-0.124939
-0.601411	x.f = 2.0f;	-0.124939
-1.073383	x + 2.0f;	-0.124939
-1.951444	See page 52.	-0.124939
-0.589867	7.35 page 52.	-0.124939
-1.909401	useful for supporting	-0.124939
-0.600141	tools for supporting	-0.124939
-0.902219	check that thrown	-0.124939
-0.557802	for exceptions thrown	-0.124939
-1.675012	this by invoking	-0.124939
-0.596448	program without invoking	-0.124939
-0.203497	FactorialTable[13] = {1,	-0.425969
-2.311633	possible to construct	-0.124939
-2.541119	the function construct	-0.124939
-2.781714	it is compiled.	-0.124939
-2.043817	program is compiled.	-0.124939
-0.573889	96 void transpose(double	-0.124939
-0.573889	9.5b void transpose(double	-0.124939
-0.249867	105. 8.7 Checking	-0.124939
-0.249867	82 8.7 Checking	-0.124939
-0.324686	common subexpression elimination,	-0.425969
-2.061867	(i = StringLength;	-0.124939
-1.596639	int i, StringLength;	-0.124939
-0.590118	web application integration,	-0.124939
-0.579343	development, database integration,	-0.124939
-1.805593	a few kilobytes	-0.124939
-0.505118	size Total kilobytes	-0.124939
-0.595656	size have got	-0.124939
-0.890375	sets have got	-0.124939
-0.601374	declare it locally	-0.124939
-0.876700	other resources locally	-0.124939
-0.601847	half of it,	-0.124939
-0.878652	logic allows it,	-0.124939
-0.901793	4 = 32.	-0.124939
-0.585192	integer, usually 32.	-0.124939
-2.397415	is the minimum	-0.124939
-0.596035	size, bits minimum	-0.124939
-2.532245	to make thread-specific	-0.124939
-0.575684	for containing thread-specific	-0.124939
-0.600973	8.12b int a[2];	-0.124939
-1.596639	int i, a[2];	-0.124939
-0.898855	address which can't	-0.124939
-0.598051	shared. You can't	-0.124939
-0.593079	vector parameters Vec4f	-0.124939
-0.358988	Vec4q Vec4uq Vec4f	-0.124939
-0.940748	function call. (2)	-0.124939
-0.726919	it occurs, (2)	-0.124939
-1.148331	the preceding paragraph	-0.124939
-0.498970	The preceding paragraph	-0.124939
-0.899966	file will remain	-0.124939
-1.270854	the diagonal remain	-0.124939
-0.314805	for virus scanners	-0.124939
-0.314805	Firewalls, virus scanners	-0.124939
-0.898993	This loop calculates	-0.124939
-0.599938	example, which calculates	-0.124939
-0.577662	or accessing databases,	-0.124939
-0.527376	network resources, databases,	-0.124939
-0.358988	Meyers: "Effective C++".	-0.124939
-0.358988	"More Effective C++".	-0.124939
-1.164393	Visual Studio 2008	-0.124939
-0.358988	Windows Server 2008	-0.124939
-0.818685	-ffunction- sections /Gy	-0.124939
-0.463714	ced functions) /Gy	-0.124939
-0.789055	= C; Assuming	-0.124939
-0.358988	it has. Assuming	-0.124939
-1.140288	a binary search,	-0.124939
-1.366695	a linear search,	-0.124939
-1.959300	Intel compiler .........................................................................	-0.124939
-1.843481	Gnu compiler .........................................................................	-0.124939
-1.630368	different C++ constructs	-0.124939
-0.596148	advanced programming constructs	-0.124939
-0.202269	platforms, different screen	-0.425969
-0.557810	further optimizations. Loops	-0.124939
-0.505102	microprocessors. 7.13 Loops	-0.124939
-0.596244	8.5a void Plus2	-0.124939
-1.230341	int a; Plus2	-0.124939
-1.675222	a - a*0	-0.124939
-2.363867	- n.a. a*0	-0.124939
-1.637434	0 - a*1	-0.124939
-2.363867	- n.a. a*1	-0.124939
-0.538986	classes #include "vectorclass.h"	-0.124939
-0.538986	dispatching #include "vectorclass.h"	-0.124939
-0.505139	int a[size], b[size],	-0.124939
-0.681230	float a[size], b[size],	-0.124939
-0.588786	coefficients double Table[100];	-0.124939
-0.588786	3.3; double Table[100];	-0.124939
-0.415730	following sections describe	-0.124939
-0.415730	subsequent sections describe	-0.124939
-1.445952	uses a GOT.	-0.124939
-0.601759	PLT and GOT.	-0.124939
-1.202536	cast The dynamic_cast	-0.124939
-0.597602	check makes dynamic_cast	-0.124939
-0.463742	program. 3 Finding	-0.124939
-0.463742	14 3 Finding	-0.124939
-0.065815	6, 24, 120,	-0.425969
-1.444732	one for uninitialized	-0.124939
-2.107517	they are uninitialized	-0.124939
-0.358988	page 81). 77	-0.124939
-0.358988	compiler ....................................................................... 77	-0.124939
-2.656570	of a string.	-0.124939
-0.358988	false vendor string.	-0.124939
-2.411630	- x 74	-0.124939
-0.358988	different compilers............................................................................. 74	-0.124939
-0.600472	C1::f } 73	-0.124939
-2.209017	See page 73	-0.124939
-0.505102	Object1; CChild2 Object2;	-0.124939
-0.463714	Object1; C2 Object2;	-0.124939
-2.305238	than the destination	-0.124939
-0.601759	source and destination	-0.124939
-0.989733	a table lookup:	-0.425969
-0.601759	73 and 72	-0.124939
-0.583280	+ a; 72	-0.124939
-0.595678	(Day & (Tuesday	-0.124939
-1.043047	The expression (Tuesday	-0.124939
-0.203054	a:4; int b:2;	-0.425969
-0.503904	*p = string;	-0.124939
-1.077966	required for putting	-0.124939
-0.901524	2 by putting	-0.124939
-0.601220	14.14a with 14.14b	-0.124939
-2.638242	// Example 14.14b	-0.124939
-1.642018	accessed in non-	-0.124939
-1.296629	relies on non-	-0.124939
-1.299952	15.1b to 15.1c.	-0.124939
-2.638242	// Example 15.1c.	-0.124939
-1.368188	(see page 73).	-0.124939
-1.635387	position-independent code .......................................................	-0.124939
-0.871593	3-dimensional vectors .......................................................	-0.124939
-0.463714	of semaphores, mutexes,	-0.124939
-0.463714	memory, windows, mutexes,	-0.124939
-1.078610	register is volatile.	-0.124939
-1.240218	be declared volatile.	-0.124939
-0.600648	DLLs use relocation.	-0.124939
-1.284978	that need relocation.	-0.124939
-0.190179	(a&b) | (~a&c)	-0.124939
-2.719394	is a 90%	-0.124939
-1.257822	that uses 90%	-0.124939
-0.600973	m> int MultiplyBy	-0.124939
-0.599963	parameter. If MultiplyBy	-0.124939
-2.191732	are not suited	-0.124939
-1.056924	is best suited	-0.124939
-0.159912	Architecture Software Developer’s	-0.425969
-0.065815	b1, b2, y1,	-0.124939
-2.638242	// Example 14.14a	-0.124939
-2.150653	in example 14.14a	-0.124939
-0.596546	and user settings	-0.124939
-0.463714	system color settings	-0.124939
-1.609955	the function. Compile	-0.124939
-0.557825	following: 130 Compile	-0.124939
-0.595151	intrinsic function. Provoke	-0.124939
-0.958021	to disk. Provoke	-0.124939
-1.709635	in an import	-0.124939
-0.892615	through an import	-0.124939
-1.633932	memory block turns	-0.124939
-0.563087	the prediction turns	-0.124939
-0.601759	0x3F00 and 0x4700.	-0.124939
-1.292713	read from 0x4700.	-0.124939
-0.505102	- ----- x----	-0.124939
-0.463714	----- x---- x----	-0.124939
-0.143416	applications. 2.8 Overcoming	-0.124939
-0.143416	14 2.8 Overcoming	-0.124939
-0.463714	a+0=a a*0=0 a*1=a	-0.124939
-0.358988	a*0=0 --xxxx-xx a*1=a	-0.124939
-0.541310	than necessary. Take	-0.124939
-0.726919	new features. Take	-0.124939
-1.693841	of data shuffling,	-0.124939
-0.358988	data conversion, shuffling,	-0.124939
-1.436659	with different priorities	-0.124939
-0.592884	assigning different priorities	-0.124939
-1.202351	out of range";	-0.124939
-0.588657	because various corrections	-0.124939
-0.527389	sent me corrections	-0.124939
-0.597406	also less safe.	-0.124939
-0.593969	program exception safe.	-0.124939
-2.130312	set is supported.	-0.124939
-2.592591	is not supported.	-0.124939
-1.275301	or an anonymous	-0.124939
-1.275301	into an anonymous	-0.124939
-2.408169	function is pure.	-0.124939
-2.533963	to be pure.	-0.124939
-1.270041	vector instructions SSE4.2	-0.124939
-0.463714	SSE4.1 smmintrin.h SSE4.2	-0.124939
-0.598110	__rdtsc(); return clock;	-0.124939
-1.577876	long long clock;	-0.124939
-1.920636	in example 12.4a	-0.124939
-0.589400	like example 12.4a	-0.124939
-0.201434	different size matrices,	-0.425969
-0.586782	may give inconsistent	-0.124939
-0.586776	click becomes inconsistent	-0.124939
-0.358988	= _mm_or_si128(c2, bc);	-0.124939
-0.358988	= _mm_andnot_si128(mask, bc);	-0.124939
-2.284267	is to join	-0.124939
-0.901086	better to join	-0.124939
-1.202351	out of range.	-0.124939
-0.203054	b:2; int c:2;	-0.425969
-0.828368	is fast. Value	-0.124939
-0.726919	is slow. Value	-0.124939
-0.599567	class: class CGrandParent	-0.124939
-1.459153	: public CGrandParent	-0.124939
-0.065815	= _mm_add_epi16(c, two);	-0.425969
-2.787423	x x --	-0.124939
-0.541310	- xxxxxxxxx --	-0.124939
-0.601870	check is bypassed	-0.124939
-2.954844	can be bypassed	-0.124939
-0.594334	of several drivers,	-0.124939
-0.358988	programming Device drivers,	-0.124939
-0.902709	other is -0	-0.124939
-0.901730	negative or -0	-0.124939
-1.130816	a graphics accelerator	-0.124939
-0.530683	or graphics accelerator	-0.124939
-0.575684	database access. 3.10	-0.124939
-0.541321	....................................................................................................... 21 3.10	-0.124939
-0.541321	................................................................................................................. 21 3.11	-0.124939
-0.726919	is best. 3.11	-0.124939
-1.974111	but it increases	-0.124939
-1.067183	hash table increases	-0.124939
-0.359006	...................................................................................................... 21 3.13	-0.124939
-0.359006	loaded. 21 3.13	-0.124939
-0.541310	access....................................................................................................... 22 3.14	-0.124939
-0.505102	memory caching. 3.14	-0.124939
-0.550863	multiple cores. 3.15	-0.124939
-0.541310	switches..................................................................................................... 22 3.15	-0.124939
-1.136968	dependency chain. 3.16	-0.124939
-0.541310	................................................................................................ 22 3.16	-0.124939
-0.600232	compilers make Sum1	-0.124939
-0.593076	three functions. Sum1	-0.124939
-0.065815	(a+b)+(c+d) a*b+a*c=a*(b+c) a*x*x*x	-0.425969
-0.129483	*(p++) |= 0x20;	-0.124939
-1.897635	divisible by TILESIZE	-0.124939
-2.064204	const int TILESIZE	-0.124939
-0.598339	reduce any expression,	-0.124939
-0.588071	an && expression,	-0.124939
-0.202751	biggest time consumers	-0.124939
-3.070236	in the CPU,	-0.124939
-1.018379	a slow CPU,	-0.124939
-1.191820	p = &Object1;	-0.124939
-0.598952	p1 = &Object1;	-0.124939
-0.892117	embedded systems .............................................................................	-0.124939
-1.465350	compiler does .............................................................................	-0.124939
-0.065815	// Approximate exp(x)	-0.425969
-0.504560	time of programming.	-0.124939
-0.601051	ReadTSC() - time1;	-0.124939
-1.577876	long long time1;	-0.124939
-0.552447	at certain events,	-0.124939
-0.552447	count certain events,	-0.124939
-2.059910	program is achieved	-0.124939
-1.077766	could be achieved	-0.124939
-0.538986	SSE2 #include <emmintrin.h>	-0.124939
-0.538986	141 #include <emmintrin.h>	-0.124939
-0.563094	all applications. 2.8	-0.124939
-0.550863	framework........................................................................... 14 2.8	-0.124939
-1.504329	find the answers	-0.124939
-1.052744	can get answers	-0.124939
-1.641117	cost of starting	-0.124939
-0.505118	language Before starting	-0.124939
-0.376661	also has disadvantages:	-0.124939
-0.567255	........................................................................................... 6 2.3	-0.124939
-0.789055	this manual. 2.3	-0.124939
-1.288604	control branch ahead	-0.124939
-1.618775	loop counter ahead	-0.124939
-2.408169	function is inserted	-0.124939
-1.294519	we have inserted	-0.124939
-1.308159	data cache. 2.2	-0.124939
-0.583267	....................................................................................... 5 2.2	-0.124939
-0.659984	/arch:SSE -msse /arch:SSE	-0.124939
-0.358988	float vectors) /arch:SSE	-0.124939
-0.871587	optimal platform 2.1	-0.124939
-0.583267	........................................................................................... 5 2.1	-0.124939
-1.661278	b + 2.0	-0.124939
-0.597876	u.f < 2.0	-0.124939
-0.065815	720, 5040, 40320,	-0.425969
-0.596756	9.1a int Func(int);	-0.124939
-0.596756	9.1b int Func(int);	-0.124939
-0.600495	threads will invalidate	-0.124939
-0.358988	may actively invalidate	-0.124939
-0.600189	sequentially. The opposite	-0.124939
-0.600189	most. The opposite	-0.124939
-0.601773	factor in itself,	-0.124939
-0.873026	the framework itself,	-0.124939
-0.563094	libraries........................................................................................ 12 2.7	-0.124939
-0.358988	are undocumented. 2.7	-0.124939
-1.727822	{ int a[1000];	-0.124939
-0.596756	89 int a[1000];	-0.124939
-0.585201	.................................................................................................... 10 2.6	-0.124939
-0.584283	another compiler. 2.6	-0.124939
-0.601276	Codes", by S.	-0.124939
-0.358988	processors. Henry S.	-0.124939
-1.335792	a thread environment	-0.124939
-0.589181	integrated development environment	-0.124939
-1.929452	else { F2(b);	-0.124939
-0.659984	float b[1000]; F2(b);	-0.124939
-2.145512	because it handles	-0.124939
-1.699413	the microprocessor handles	-0.124939
-0.855566	program optimization. 2.4	-0.124939
-0.567255	system......................................................................................... 6 2.4	-0.124939
-1.920624	important to note	-0.124939
-0.463714	manuals. Please note	-0.124939
-0.884297	consider whether others	-0.124939
-0.527389	optimized well, others	-0.124939
-0.594503	fit specific needs.	-0.124939
-0.940813	the user's needs.	-0.124939
-0.593296	eax / sar	-0.124939
-0.591953	shr add sar	-0.124939
-0.065815	__attribute__ ((visibility ("internal")))	-0.124939
-2.638242	// Example 8.15a	-0.124939
-2.150653	in example 8.15a	-0.124939
-1.702475	the memory footprint	-0.124939
-0.593773	larger memory footprint	-0.124939
-0.601759	14.12b and 14.13b	-0.124939
-2.638242	// Example 14.13b	-0.124939
-0.601379	scope or namespaces.	-0.124939
-1.535704	to using namespaces.	-0.124939
-1.936845	useful for preventing	-0.124939
-0.358988	without effectively preventing	-0.124939
-0.897108	"asmlib.h" // Lowest	-0.124939
-0.599059	&CriticalFunction_Dispatch; // Lowest	-0.124939
-0.358988	Thursday, Friday, Saturday	-0.124939
-0.358988	= 0x20, Saturday	-0.124939
-0.065815	= (a+b)+(c+d) a*b+a*c=a*(b+c)	-0.425969
-0.065815	different screen resolutions,	-0.124939
-0.550863	is 4. So	-0.124939
-0.358988	from everybody. So	-0.124939
-2.219225	// Example 9.6a	-0.124939
-0.887038	element Example 9.6a	-0.124939
-0.203497	a*b+a*c = a*(b+c)	-0.425969
-0.580837	reproducible. Such events	-0.124939
-0.563094	by random events	-0.124939
-0.601276	affected by __fastcall.	-0.124939
-1.655642	are using __fastcall.	-0.124939
-1.637434	0 - a+0	-0.124939
-2.363867	- n.a. a+0	-0.124939
-0.557810	real-time speed. Delays	-0.124939
-0.358988	Poor reproducibility. Delays	-0.124939
-0.463714	TR18015 Technical Report	-0.124939
-0.358988	18015, "Technical Report	-0.124939
-0.901793	aa[i] = (bb[i]	-0.124939
-0.591960	2) : (bb[i]	-0.124939
-2.130312	set is specified.	-0.124939
-2.441602	instruction set specified.	-0.124939
-2.412263	a function prototype	-0.124939
-1.645431	// Function prototype	-0.124939
-2.016065	on page 39	-0.124939
-0.358988	columns; j++) 39	-0.124939
-2.198558	For example, let's	-0.124939
-0.358988	the difference, let's	-0.124939
-0.203385	Day; if (Day	-0.124939
-1.937450	not be visible	-0.124939
-2.592591	is not visible	-0.124939
-0.597198	- 64 Kbytes	-0.124939
-0.589179	of 256 Kbytes	-0.124939
-2.719394	is a proxy	-0.124939
-1.077993	computer. The proxy	-0.124939
-0.600472	104 } Microprocessors	-0.124939
-1.032776	cache control Microprocessors	-0.124939
-1.831504	on page 105.	-0.124939
-0.879037	see page 105.	-0.124939
-0.595158	been accessed recently	-0.124939
-1.027958	the least recently	-0.124939
-1.300167	cost to creating	-0.124939
-0.601655	responsible for creating	-0.124939
-0.203497	j = order(i);	-0.124939
-0.897514	member pointer refers	-0.124939
-0.563101	Coarse-grained parallelism refers	-0.124939
-2.431153	to a floppy	-0.124939
-2.283118	such as floppy	-0.124939
-1.445670	risk of underflow.	-0.124939
-1.503047	overflow and underflow.	-0.124939
-0.527376	pointers ...................................................................................................... 37	-0.124939
-0.463714	be avoided. 37	-0.124939
-0.601773	known in 36	-0.124939
-0.463714	references ............................................................................................ 36	-0.124939
-0.571223	((C & 3)	-0.124939
-0.571223	((B & 3)	-0.124939
-1.069168	that such contrived	-0.124939
-1.806990	a very contrived	-0.124939
-0.550863	of branches. Manual	-0.124939
-0.659984	from www.intel.com. Manual	-0.124939
-1.065709	cache space. Excessive	-0.124939
-0.358988	too much. Excessive	-0.124939
-0.596244	7.12 void FuncA	-0.124939
-0.358988	calls alternately FuncA	-0.124939
-0.601759	email and web	-0.124939
-0.463714	database integration, web	-0.124939
-1.077452	overflow can occur,	-0.124939
-0.594686	error doesn't occur,	-0.124939
-2.074884	able to reorder	-0.124939
-1.948898	compiler may reorder	-0.124939
-0.601379	seconds or microseconds	-0.124939
-1.064766	functions take microseconds	-0.124939
-2.397415	is the Standard	-0.124939
-0.726919	to security. Standard	-0.124939
-3.207749	of the const_cast	-0.124939
-1.202536	cast The const_cast	-0.124939
-0.570607	ever used, though.	-0.124939
-0.358988	always optimal, though.	-0.124939
-0.359014	int64_t MS compiler:	-0.124939
-0.359014	uint64_t MS compiler:	-0.124939
-0.065815	void F3(bool y)	-0.425969
-0.463714	-m64 -static /MT	-0.124939
-0.358988	/arch:AVX /openmp /MT	-0.124939
-1.829850	object is overwritten,	-0.124939
-2.533963	to be overwritten,	-0.124939
-0.755400	DWORD PTR [esp+8]	-0.124939
-2.954844	can be annoyingly	-0.124939
-0.586782	would give annoyingly	-0.124939
-1.364669	i; float list[size];	-0.124939
-0.579360	100; S1 list[size];	-0.124939
-0.527376	optional commercial license	-0.124939
-0.463714	yes License license	-0.124939
-0.358988	* b2); y1	-0.124939
-0.358988	y1, y2; y1	-0.124939
-0.726919	* reciprocal_divisor; y2	-0.124939
-0.358988	/ b1; y2	-0.124939
-0.186387	elements: #define swapd(x,y)	-0.425969
-0.440683	if ((unsigned int)i	-0.124939
-0.598870	version int CriticalFunction_SSE2(int	-0.425969
-0.582617	is 2 GHz	-0.124939
-0.582617	a 2 GHz	-0.124939
-1.203359	a is false.	-0.124939
-0.600594	0's when false.	-0.124939
-0.585201	Windows. 10 Multithreading	-0.124939
-0.505102	necessary. 101 Multithreading	-0.124939
-0.751790	Intel CPUs. New	-0.124939
-0.751790	AMD CPUs. New	-0.124939
-0.200664	typeof(CriticalFunction) * CriticalFunctionDispatch(void)	-0.124939
-1.062801	for these methods.	-0.124939
-1.467961	CPU dispatch methods.	-0.124939
-0.600935	genuine compiler became	-0.124939
-0.550863	Basic soon became	-0.124939
-1.291214	and only if,	-0.124939
-1.672871	is advantageous if,	-0.124939
-0.527389	end user's computers.	-0.124939
-0.659984	big mainframe computers.	-0.124939
-0.789055	B, C; x.abc	-0.124939
-0.358988	Example 7.40c x.abc	-0.124939
-0.065815	156 16.3 Worst-case	-0.425969
-0.567249	simple expressions. Operations	-0.124939
-0.527376	and aliasing. Operations	-0.124939
-0.596967	the libraries named	-0.124939
-0.596634	256-bit registers named	-0.124939
-0.463714	a+0=a x-xxxxxx- a*0=0	-0.124939
-0.463714	0 a+0=a a*0=0	-0.124939
-0.601759	p1 and p2	-0.124939
-0.358988	* p2; p2	-0.124939
-0.902516	contained in p1	-0.124939
-0.358988	* p1; p1	-0.124939
-1.649093	for all major	-0.124939
-1.594994	on all major	-0.124939
-1.198501	programs use internet	-0.124939
-0.550854	forums Several internet	-0.124939
-0.585049	abc * p;	-0.124939
-0.585049	CHello * p;	-0.124939
-1.077010	inline int lrintf	-0.124939
-1.706306	the functions lrintf	-0.124939
-2.736133	that the resulting	-0.124939
-1.992137	} The resulting	-0.124939
-0.143416	function swapd(a[r][c], a[c][r]);	-0.124939
-0.143416	diagonal swapd(a[r][c], a[c][r]);	-0.124939
-0.826661	high precision math.	-0.124939
-0.562167	High precision math.	-0.124939
-0.901793	4 = 2048	-0.124939
-1.237252	512 512 2048	-0.124939
-0.897292	d + 3.5;	-0.124939
-1.769246	b * 3.5;	-0.124939
-0.601663	relocation. The DLLs	-0.124939
-0.594845	position. Windows DLLs	-0.124939
-1.712369	compiler for Unix	-0.124939
-0.598822	registers. 64-bit Unix	-0.124939
-1.520547	table lookup Lookup	-0.124939
-0.836041	table lookup. Lookup	-0.124939
-1.274626	template parameters differ	-0.124939
-0.541321	and drivers differ	-0.124939
-0.203497	level = InstructionSet();	-0.425969
-1.197889	}; void F1()	-0.124939
-0.573889	prototype: void F1()	-0.124939
-0.601072	safe. This safety	-0.124939
-0.541321	doesn't compromise safety	-0.124939
-0.601847	libraries of predefined	-0.124939
-0.594677	functions Use predefined	-0.124939
-2.101410	make a variable-size	-0.124939
-1.308747	to allocate variable-size	-0.124939
-0.595678	= & obj1;	-0.124939
-0.958021	{ C1 obj1;	-0.124939
-0.065815	Numerically Intensive Codes",	-0.124939
-1.433332	instructions are summarized	-0.124939
-1.194854	results are summarized	-0.124939
-0.601870	targets is small.	-0.124939
-1.038513	be too small.	-0.124939
-1.156539	error handling ................................................................................	-0.124939
-0.659984	time consumers ................................................................................	-0.124939
-1.803170	from a buffer.	-0.124939
-1.220112	branch target buffer.	-0.124939
-0.373985	100; float list[size],	-0.124939
-0.190751	InstructionSet() #include "asmlib.h"	-0.425969
-0.601847	occurrences of ArraySize	-0.124939
-2.064204	const int ArraySize	-0.124939
-0.598907	variables, float Live	-0.124939
-0.836041	register storage. Live	-0.124939
-0.358988	_mm_blendv_epi8(bc, c2, mask);	-0.124939
-0.358988	= _mm_and_si128(c2, mask);	-0.124939
-0.601220	names with suffixes	-0.124939
-0.594330	AVX. These suffixes	-0.124939
-2.460178	by the programmer.	-0.124939
-1.574439	the application programmer.	-0.124939
-0.463714	---xxx-x- a+0=a x-xxxxxx-	-0.124939
-0.358988	- x-xx----x x-xxxxxx-	-0.124939
-0.902687	given a name.	-0.124939
-2.647771	the same name.	-0.124939
-2.656570	of a third-party	-0.124939
-1.687073	are also third-party	-0.124939
-0.659984	= b*a (a+b)+c=a+(b+c)	-0.124939
-0.358988	a*b=b*a a+b+c=a+(b+c) (a+b)+c=a+(b+c)	-0.124939
-1.710763	functions for audio	-0.124939
-0.358988	produce streaming audio	-0.124939
-0.601134	expressions as arguments	-0.124939
-2.647771	the same arguments	-0.124939
-1.594559	be an infinite	-0.124939
-0.806100	for avoiding infinite	-0.124939
-2.228189	the program flow.	-0.124939
-1.259246	of program flow.	-0.124939
-0.596357	and even worse,	-0.124939
-0.557802	4. Even worse,	-0.124939
-1.781907	the cache miss	-0.124939
-1.578050	level-2 cache miss	-0.124939
-2.763279	if the unsafe	-0.124939
-0.601870	memcpy is unsafe	-0.124939
-0.966508	is optimized away.	-0.124939
-0.561167	or optimized away.	-0.124939
-1.446542	calculating the movements	-0.124939
-0.541321	the physical movements	-0.124939
-0.378965	x*x*x*x*x*x*x*x = ((x2)	-0.425969
-0.601844	Handles to windows,	-0.124939
-0.818671	allocated memory, windows,	-0.124939
-0.601844	response to pressing	-0.124939
-0.589672	tasks like pressing	-0.124939
-0.828368	vector register. Factors	-0.124939
-0.557810	vectorization is. Factors	-0.124939
-2.283118	such as price,	-0.124939
-1.418992	a high price,	-0.124939
-0.789055	Eliminate jumps Jumps	-0.124939
-0.527376	not optimized. Jumps	-0.124939
-1.075083	debugging and maintaining	-0.124939
-0.600690	verifying and maintaining	-0.124939
-0.505102	both operands. Nevertheless,	-0.124939
-0.659984	a PC. Nevertheless,	-0.124939
-0.601759	Graphics and sound	-0.124939
-0.550863	image processing, sound	-0.124939
-0.601759	resources and servers	-0.124939
-0.601176	useful on servers	-0.124939
-0.202312	a make utility.	-0.124939
-3.207749	of the executable.	-0.124939
-2.647771	the same executable.	-0.124939
-1.308255	cannot be controlled.	-0.124939
-1.174649	the specific literature	-0.124939
-0.971407	the general literature	-0.124939
-0.378965	SIZE = 512;	-0.425969
-0.595151	take extra precautions	-0.124939
-0.579343	take special precautions	-0.124939
-1.460652	there are smarter	-0.425969
-0.143416	Codes", SIAM 2001.	-0.124939
-0.143416	Hoisie, SIAM 2001.	-0.124939
-0.659984	page 73). Current	-0.124939
-0.358988	called accumulators. Current	-0.124939
-0.203944	effort is concentrated	-0.425969
-1.249960	i++) { aa[i]	-0.425969
-0.601134	Return a null	-0.124939
-0.901241	returning a null	-0.124939
-1.900431	compiler is capable	-0.124939
-1.077711	CPUs are capable	-0.124939
-0.600472	FuncB(i); } FuncC(i);	-0.124939
-0.659984	{ FuncA(i); FuncC(i);	-0.124939
-1.202905	reason for updating.	-0.124939
-0.358988	support. Hardware updating.	-0.124939
-0.601759	MOVNTPD and MOVNTDQ	-0.124939
-1.289421	without cache MOVNTDQ	-0.124939
-1.077993	memory. The renaming	-0.124939
-1.277550	of register renaming	-0.124939
-0.577666	GB. When considering	-0.124939
-0.541341	alternative worth considering	-0.124939
-2.834319	it is worthwhile	-0.124939
-2.538377	may be worthwhile	-0.124939
-0.463714	xxxxxxxxx x-xxx---- a-(-b)=a+b	-0.124939
-0.358988	-(-a)=a --xxxxxx- a-(-b)=a+b	-0.124939
-0.584939	virtual void f();	-0.425969
-1.172600	is allocated separately.	-0.124939
-0.557802	be measured separately.	-0.124939
-0.596251	regular access patterns	-0.124939
-0.527376	in regular patterns	-0.124939
-1.318096	on page 93.	-0.124939
-1.714773	only the lowest	-0.124939
-0.358988	// Error: lowest	-0.124939
-0.584293	<math.h> #define EXCEPTION_FLT_OVERFLOW	-0.124939
-0.579364	(GetExceptionCode() == EXCEPTION_FLT_OVERFLOW	-0.124939
-0.601859	defined a constructor,	-0.124939
-1.048687	The copy constructor,	-0.124939
-0.358988	Windows, Intel/MASM syntax:	-0.124939
-0.358988	Linux, Gnu/AT&T syntax:	-0.124939
-1.070628	b * 1.2;	-0.425969
-1.831504	on page 26.	-0.124939
-1.951444	See page 26.	-0.124939
-1.078873	set the parentheses	-0.124939
-1.708468	the two parentheses	-0.124939
-1.055325	an overflow check.	-0.124939
-0.582126	more syntax check.	-0.124939
-1.554814	series of experiments	-0.124939
-2.277763	to do experiments	-0.124939
-0.888793	order execution .................................................................................................	-0.124939
-0.971442	lookup tables .................................................................................................	-0.124939
-0.065815	4: "Instruction tables".	-0.124939
-0.593296	100 / jl	-0.124939
-0.726919	add cmp jl	-0.124939
-1.427003	the total computation	-0.124939
-0.659984	the overall computation	-0.124939
-1.661548	can make thread-local	-0.124939
-0.577679	variables. (See thread-local	-0.124939
-0.980467	up to date.	-0.124939
-1.995593	have a physics	-0.124939
-1.010214	a dedicated physics	-0.124939
-1.608207	is calculated first,	-0.124939
-0.591648	R values first,	-0.124939
-0.567294	register (see below)	-0.124939
-0.567294	counter (see below)	-0.124939
-1.379070	branch is eliminated.	-0.124939
-0.601569	transfer are eliminated.	-0.124939
-1.663892	consecutive elements c.load(cc+i);	-0.124939
-0.463714	{ b.load(bb+i); c.load(cc+i);	-0.124939
-0.358988	reductions: !(!a)=a x-xxxxxxx	-0.124939
-0.358988	x-xxxxxx- x-xxxx-x- x-xxxxxxx	-0.124939
-1.090029	the CPU. Unrolling	-0.124939
-0.358988	then FuncC. Unrolling	-0.124939
-1.073816	processor has hyperthreading.	-0.124939
-1.535704	to using hyperthreading.	-0.124939
-0.463714	is 1024 bytes,	-0.124939
-0.463714	= 8192 bytes,	-0.124939
-0.370142	link libraries (*.lib,	-0.425969
-1.978033	lot of irrelevant	-0.124939
-2.533963	to be irrelevant	-0.124939
-1.914928	must be careful	-0.124939
-0.589200	still needs careful	-0.124939
-0.593513	processing, data compression	-0.124939
-0.593513	decryption, data compression	-0.124939
-0.600674	if time intervals	-0.124939
-0.726919	at unpredictable intervals	-0.124939
-0.804644	{ for (c2	-0.425969
-0.600980	expects an immediate	-0.124939
-0.726919	user expects immediate	-0.124939
-0.958021	{ C1 Object1;	-0.124939
-0.527389	{ CChild1 Object1;	-0.124939
-0.505102	(addition, multiplication, etc.)	-0.124939
-0.358988	Mac OS, etc.)	-0.124939
-1.016485	32-bit and 64-bit.	-0.124939
-1.992137	} The indirect	-0.124939
-0.358988	called "Gnu indirect	-0.124939
-0.358988	0/a=0 ---x---xx (-a==-b)=(a==b)	-0.124939
-0.358988	0/a=0 ---xx--xx (-a==-b)=(a==b)	-0.124939
-1.196981	microprocessor has hyperthreading,	-0.124939
-1.535704	to using hyperthreading,	-0.124939
-2.384171	is the exponent,	-0.124939
-0.601610	bit, the exponent,	-0.124939
-1.525903	0) { FuncA(i);	-0.124939
-1.179632	2) { FuncA(i);	-0.124939
-1.446542	Unfortunately, the cross-platform	-0.124939
-1.713456	sake of cross-platform	-0.124939
-0.065815	#define swapd(x,y) {temp=x;	-0.425969
-0.593513	is data decomposition.	-0.124939
-1.602511	and data decomposition.	-0.124939
-0.877963	a template parameter:	-0.124939
-0.599938	profiler which determines	-0.124939
-1.031715	first operand determines	-0.124939
-0.897310	data members (properties)	-0.124939
-2.064204	const int ABC	-0.124939
-0.584293	example, #define ABC	-0.124939
-3.207749	of the comments	-0.124939
-1.805593	a few comments	-0.124939
-0.789073	7.10 Arrays .....................................................................................................................	-0.124939
-0.726919	19 Literature .....................................................................................................................	-0.124939
-2.834319	it is profitable	-0.124939
-2.533963	to be profitable	-0.124939
-0.836063	the logic behind	-0.124939
-0.505102	actually hidden behind	-0.124939
-0.659984	Agner Fog. Technical	-0.124939
-0.358988	ISO/IEC TR18015 Technical	-0.124939
-1.744506	instruction set. Neither	-0.124939
-0.358988	an hour. Neither	-0.124939
-1.949758	for each calculation.	-0.124939
-1.672195	the next calculation.	-0.124939
-0.359010	pure __attribute(( const))	-0.124939
-0.359010	const)) __attribute(( const))	-0.124939
-1.379133	easier to test,	-0.124939
-1.023597	program under test,	-0.124939
-0.143416	sections /Gy -ffunction-	-0.124939
-0.143416	functions) /Gy -ffunction-	-0.124939
-0.143416	-msse /arch:SSE -msse	-0.124939
-0.143416	vectors) /arch:SSE -msse	-0.124939
-0.897108	0; // Initialize	-0.124939
-0.599059	Constructor // Initialize	-0.124939
-0.598545	and array indices	-0.124939
-0.582147	by consecutive indices	-0.124939
-1.201579	4) { s0	-0.124939
-0.896806	a[100]; float s0	-0.124939
-0.726919	assembly listing /FA	-0.124939
-0.358988	- masm=intel /FA	-0.124939
-1.299007	version for marketing	-0.124939
-0.563094	no heavy marketing	-0.124939
-1.502978	parm1, int parm2);	-0.124939
-0.358988	return (*CriticalFunction)(parm1, parm2);	-0.124939
-0.463714	a+b+c=a+(b+c) (a+b)+c=a+(b+c) --xx-----	-0.124939
-0.463714	x-xx--xx- x--x----- --xx-----	-0.124939
-0.378965	rows = 20,	-0.425969
-1.078873	reflects the conflicting	-0.124939
-1.600816	are often conflicting	-0.124939
-1.481669	i < 20;	-0.124939
-1.300517	though the 61	-0.124939
-0.463714	handling ................................................................................ 61	-0.124939
-0.601276	uses by looking	-0.124939
-0.463714	The funny looking	-0.124939
-0.601220	efficiently with coarse-grained	-0.124939
-0.894863	distinguish between coarse-grained	-0.124939
-0.597657	mirror elements matrix[c][r]	-0.124939
-0.888134	with element matrix[c][r]	-0.124939
-0.463714	etc. -msse3 -mssse3	-0.124939
-0.358988	-msse3 /arch:SSE3 -mssse3	-0.124939
-1.600336	useful to isolate	-0.124939
-0.601759	identify and isolate	-0.124939
-0.892957	assembly code. Let	-0.124939
-0.570601	and sets. Let	-0.124939
-0.600675	task in question.	-0.124939
-0.900325	algorithm in question.	-0.124939
-1.077322	x // x^n	-0.124939
-0.593055	next four x^n	-0.124939
-0.601625	mechanism that treats	-0.124939
-1.558375	CPU dispatcher treats	-0.124939
-0.199056	Specific optimization topics	-0.124939
-0.570601	self-relative address. (3)	-0.124939
-0.358988	wrap around, (3)	-0.124939
-0.640694	a lookup table:	-0.425969
-0.902709	network is unstable	-0.124939
-0.601569	measurements are unstable	-0.124939
-1.065709	CPU cores. 60	-0.124939
-0.358988	Threads .................................................................................................................. 60	-0.124939
-2.467643	number of iterations.	-0.124939
-0.358988	and Newton-Raphson iterations.	-0.124939
-0.726919	range analysis Join	-0.124939
-0.659984	memory area. Join	-0.124939
-0.901793	aa[i] = bb[i]	-0.124939
-0.600594	1's when bb[i]	-0.124939
-2.251796	then the sampling	-0.124939
-0.463714	etc. Event-based sampling	-0.124939
-1.077237	(set) = (memory	-0.124939
-1.480638	the objects (memory	-0.124939
-0.902280	necessary for verifying	-0.124939
-0.358988	fine-tuning, testing, verifying	-0.124939
-0.557810	....................................................................................................... 19 3.6	-0.124939
-0.358988	is acceptable. 3.6	-0.124939
-0.541321	.................................................................................................. 18 3.4	-0.124939
-0.505102	standardized manner. 3.4	-0.124939
-1.042410	a[i] = log(b[i])	-0.425969
-0.505102	= 100. Now,	-0.124939
-0.358988	* sizeof(float)). Now,	-0.124939
-0.895136	2) 2 a+a+a+a=a*4	-0.124939
-0.358988	= ((x2)2)2 a+a+a+a=a*4	-0.124939
-1.379637	are in doubt	-0.124939
-2.250309	is no doubt	-0.124939
-1.368188	(see page 78).	-0.124939
-0.601759	u.f and v.f	-0.124939
-0.587459	u.f > v.f	-0.124939
-0.659984	a FIFO manner?	-0.124939
-0.358988	a FILO manner?	-0.124939
-2.366894	rather than generating	-0.124939
-0.596448	question without generating	-0.124939
-0.789055	low priority. Especially	-0.124939
-0.541310	round addresses. Especially	-0.124939
-1.295434	of compiler ....................................................................................................	-0.124939
-0.847477	Automatic updates ....................................................................................................	-0.124939
-0.596141	...................................................................................... 16 3.2	-0.124939
-0.358988	be improved. 3.2	-0.124939
-0.900549	(y) { F1(a);	-0.124939
-0.659984	int a[1000]; F1(a);	-0.124939
-0.595151	single function. Switch	-0.124939
-0.789073	two ways. Switch	-0.124939
-0.203497	zero = _mm_set1_epi16(0);	-0.425969
-1.635387	position-independent code everywhere	-0.124939
-1.107162	are scattered everywhere	-0.124939
-0.598952	(a&&c) = a&&(b||c)	-0.124939
-0.896896	(a&&b&&c) = a&&(b||c)	-0.124939
-0.601411	(b&c) = (a&b)	-0.124939
-1.423986	= 0, (a&b)	-0.124939
-0.596141	.................................................................................. 16 3.3	-0.124939
-0.358988	following sections. 3.3	-0.124939
-0.596141	................................................................................ 16 3.1	-0.124939
-0.659984	time consumers 3.1	-0.124939
-1.030466	that goes randomly	-0.124939
-1.107162	are scattered randomly	-0.124939
-0.527376	and structures. Useful	-0.124939
-0.358988	memory requirement. Useful	-0.124939
-0.573379	access................................................................................................................ 20 3.8	-0.124939
-0.358988	to finish. 3.8	-0.124939
-0.474644	...................................................................................................... 20 3.9	-0.124939
-0.474644	files). 20 3.9	-0.124939
-0.881110	compilers optimize ............................................................................................	-0.124939
-1.133395	and references ............................................................................................	-0.124939
-1.243857	a big mainframe	-0.124939
-0.566517	yesterday's big mainframe	-0.124939
-1.201642	test // (time	-0.124939
-0.601051	after) - (time	-0.124939
-0.577648	software optimization. Everything	-0.124939
-0.563094	same register. Everything	-0.124939
-0.901332	16 is required.	-0.124939
-0.601180	strictness is required.	-0.124939
-1.642940	out the theoretical	-0.124939
-0.601663	alternative. The theoretical	-0.124939
-0.601844	12.1b to 12.1a.	-0.124939
-2.638242	// Example 12.1a.	-0.124939
-3.070236	in the file,	-0.124939
-0.358988	the .exe file,	-0.124939
-0.567279	many hard working	-0.124939
-0.358988	using indexes, working	-0.124939
-0.601663	vectorize. The pragmas	-0.124939
-0.601134	hints as pragmas	-0.124939
-0.601844	principles to use.	-0.124939
-1.445724	not in use.	-0.124939
-1.830140	difficult to use,	-0.124939
-0.597261	about register use,	-0.124939
-0.597406	vectorization less favorable:	-0.124939
-0.861596	make vectorization favorable:	-0.124939
-0.596134	different system color	-0.124939
-0.527389	root, RGB color	-0.124939
-1.078800	stride is 8192	-0.124939
-0.601411	kb = 8192	-0.124939
-0.203497	mask = _mm_cmpgt_epi16(b,	-0.425969
-1.078610	microprocessors is lost.	-0.124939
-0.601569	settings are lost.	-0.124939
-0.143416	60 7.30 Exceptions	-0.124939
-0.143416	multithreading. 7.30 Exceptions	-0.124939
-3.207749	of the question	-0.124939
-1.078320	numbers in question	-0.124939
-1.553601	time and afterwards	-0.124939
-2.429234	the program afterwards	-0.124939
-1.351913	function returns. Every	-0.124939
-0.358988	the columns. Every	-0.124939
-1.078873	set the denormals-are-zero	-0.124939
-0.601759	flush-to-zero and denormals-are-zero	-0.124939
-2.541119	the function declaration.	-0.124939
-1.621514	the class declaration.	-0.124939
-1.493158	no other exceptions:	-0.124939
-0.587475	resume after exceptions:	-0.124939
-1.131335	difficult to read.	-0.124939
-0.583260	programming are: Non-static	-0.124939
-0.659984	one instance. Non-static	-0.124939
-2.656570	of a re-	-0.124939
-0.601379	piecewise or re-	-0.124939
-1.493028	dynamic library requiring	-0.124939
-0.586768	complex framework requiring	-0.124939
-0.541321	struct abc {int	-0.124939
-0.463714	struct Sab {int	-0.124939
-2.736133	that the branching	-0.124939
-1.711269	function. The branching	-0.124939
-1.293891	pointer has changed.	-0.124939
-1.330831	is never changed.	-0.124939
-1.976703	the object belongs	-0.124939
-0.527389	This normally belongs	-0.124939
-0.559755	(a&&b) || (a&&c)	-0.124939
-2.064204	const int NumberOfTests	-0.124939
-0.358988	// Repeat NumberOfTests	-0.124939
-0.902709	platform is obviously	-0.124939
-2.199341	then it obviously	-0.124939
-0.744336	may go undetected.	-0.124939
-0.515469	otherwise go undetected.	-0.124939
-0.917690	that runs alone	-0.124939
-0.358988	a stand alone	-0.124939
-2.442987	by the caller	-0.124939
-2.181124	from the caller	-0.124939
-1.235222	a better understanding	-0.124939
-0.358988	a basic understanding	-0.124939
-0.901936	process can influence	-0.124939
-1.593154	This has influence	-0.124939
-0.887814	: c x-xx-----	-0.124939
-0.550854	x-xxxx--x x-xxxx--x x-xx-----	-0.124939
-1.482625	is called. Lazy	-0.124939
-0.505102	unacceptably long. Lazy	-0.124939
-0.601439	returns // Volatile	-0.124939
-1.334199	is enabled. Volatile	-0.124939
-2.013164	need to lock	-0.124939
-0.358988	to temporarily lock	-0.124939
-0.902709	purposes is allowed.	-0.124939
-2.592591	is not allowed.	-0.124939
-2.193707	more efficient today	-0.124939
-0.596539	brand new today	-0.124939
-0.898128	d = (double)(signed	-0.425969
-2.772600	in a programmable	-0.124939
-0.600518	devices A programmable	-0.124939
-0.896389	have such checks.	-0.124939
-0.582126	bypassing syntax checks.	-0.124939
-1.520547	table lookup mechanisms	-0.124939
-0.358988	version. Updating mechanisms	-0.124939
-1.419661	in table 8.1.	-0.124939
-0.588074	- Table 8.1.	-0.124939
-2.014985	all the G	-0.124939
-1.395731	the four G	-0.124939
-1.276575	compiler Linux Align	-0.124939
-0.550863	Linux) 4. Align	-0.124939
-0.599151	(b + c)	-0.124939
-1.258877	b / c)	-0.124939
-1.201529	CriticalFunction = &CriticalFunction_386;	-0.124939
-1.192375	version return &CriticalFunction_386;	-0.124939
-1.069687	first two (three	-0.124939
-1.765477	the stack (three	-0.124939
-1.230655	else { goto	-0.124939
-2.191732	are not testing.	-0.124939
-0.899744	comes from testing.	-0.124939
-0.143416	the "override" feature.	-0.124939
-0.143416	this "override" feature.	-0.124939
-0.143416	The symbol interposition	-0.124939
-0.143416	so-called symbol interposition	-0.124939
-0.902219	routine that loads	-0.124939
-1.073877	application program loads	-0.124939
-0.065815	libraries (*.lib, *.a)	-0.425969
-0.592228	their CPU dispatchers	-0.124939
-0.592228	Many CPU dispatchers	-0.124939
-0.563094	one register. Registers	-0.124939
-0.505102	or reference. Registers	-0.124939
-0.594142	exceptions, etc. Event-based	-0.124939
-0.358988	less reliable. Event-based	-0.124939
-1.246284	the problems associated	-0.124939
-0.864110	programming errors associated	-0.124939
-2.088616	be a time-consumer	-0.124939
-1.065709	the biggest time-consumer	-0.124939
-1.366760	CPU detection mechanism.	-0.124939
-1.065643	branch prediction mechanism.	-0.124939
-0.199488	65 8 Optimizations	-0.425969
-0.902488	devices and machines	-0.124939
-0.818685	best Java machines	-0.124939
-1.269541	with this problem:	-0.124939
-0.595577	against this problem:	-0.124939
-0.591655	constructors, copy constructors,	-0.124939
-0.567255	to default constructors,	-0.124939
-0.570596	using references. References	-0.124939
-0.527376	wrong type. References	-0.124939
-0.203652	sets are mutually	-0.425969
-0.901239	coded as _mm_empty()	-0.124939
-1.308638	to execute _mm_empty()	-0.124939
-1.948898	compiler may report	-0.124939
-0.596965	Generate optimization report	-0.124939
-0.599888	remove all disturbing	-0.124939
-0.591629	conditions. All disturbing	-0.124939
-0.601773	increase in develop-	-0.124939
-1.473447	of software develop-	-0.124939
-1.911918	not be negative.	-0.124939
-1.072470	never be negative.	-0.124939
-0.567261	and search facilities,	-0.124939
-0.818699	for debugging facilities,	-0.124939
-1.379596	cause the creation	-0.124939
-0.601663	c) The creation	-0.124939
-1.593141	a compiler warning	-0.124939
-0.599623	get no warning	-0.124939
-2.064204	const int min	-0.124939
-0.861603	(i >= min	-0.124939
-0.550854	are constant. 14.2	-0.124939
-0.505102	................................................................................................. 132 14.2	-0.124939
-0.971407	to zero. 14.3	-0.124939
-0.527376	.................................................................................................. 134 14.3	-0.124939
-0.505102	......................................................................................... 132 14.1	-0.124939
-0.659984	optimization topics 14.1	-0.124939
-1.601239	See the vectorclass	-0.124939
-0.601051	http://www.agner.org/optimize/ - vectorclass	-0.124939
-0.726919	* reciprocal_divisor; 14.7	-0.124939
-0.463714	........................................................................................... 139 14.7	-0.124939
-0.601759	profiling and debugging.	-0.124939
-1.076667	incompatible with debugging.	-0.124939
-0.573889	8.26a void Func(int	-0.124939
-0.573889	8.26b void Func(int	-0.124939
-0.601870	by is (columns	-0.124939
-0.896137	j * (columns	-0.124939
-0.505102	............................................................................................. 136 14.5	-0.124939
-0.358988	page 96. 14.5	-0.124939
-1.503629	array is defined.	-0.124939
-2.954844	can be defined.	-0.124939
-0.463714	-msse3 -mssse3 -msse4.1	-0.124939
-0.358988	-mssse3 /arch:SSSE2 -msse4.1	-0.124939
-0.601439	x^8 // x^10	-0.124939
-1.067437	// return x^10	-0.124939
-0.664052	are many branches):	-0.425969
-0.595807	Friday) { DoThisThreeTimesAWeek();	-0.124939
-0.595807	Friday)) { DoThisThreeTimesAWeek();	-0.124939
-0.143416	-msse2 /arch:SSE2 -msse2	-0.124939
-0.143416	double) /arch:SSE2 -msse2	-0.124939
-1.585231	such as logarithms,	-0.425969
-1.067874	code by default.	-0.124939
-0.598258	everywhere by default.	-0.124939
-1.202905	time for WTL	-0.124939
-0.600518	(WTL). A WTL	-0.124939
-0.503947	_EM_OVERFLOW); // _controlfp(0,	-0.425969
-1.775139	time to load.	-0.124939
-1.412230	the work load.	-0.124939
-1.695890	a = select(b	-0.425969
-1.027958	high level framework.	-0.124939
-0.806132	the .NET framework.	-0.124939
-0.999851	exception handling. 8.6	-0.124939
-0.505102	................................................................................... 81 8.6	-0.124939
-0.203846	communication and synchronization	-0.425969
-1.203881	is executed. Without	-0.124939
-0.463714	} 73 Without	-0.124939
-1.710763	functions for millisecond	-0.124939
-0.601220	measured with millisecond	-0.124939
-0.899911	double, then sizeof(S1)	-0.124939
-0.557802	The factor sizeof(S1)	-0.124939
-2.772600	in a high-priority	-0.124939
-0.902488	core and high-priority	-0.124939
-0.894945	in software development.	-0.124939
-0.836075	and easy development.	-0.124939
-2.238870	have to push	-0.124939
-0.463714	12 $B1$1: push	-0.124939
-0.203923	Optimization of Numerically	-0.425969
-0.601844	CPUs to verify	-0.124939
-0.601759	maintain and verify	-0.124939
-0.543525	standardized allows us	-0.124939
-0.543525	biased allows us	-0.124939
-0.601759	sorting and searching,	-0.124939
-0.358988	as sorting, searching,	-0.124939
-1.200611	to is known.	-0.124939
-1.820350	object is known.	-0.124939
-0.463714	code.................................................................................. 148 14.13	-0.124939
-0.358988	OS X. 14.13	-0.124939
-0.527389	libraries............................................................................ 146 14.12	-0.124939
-0.358988	code. 147 14.12	-0.124939
-0.796254	same memory area.	-0.124939
-2.638242	// Example 14.19	-0.124939
-2.150653	in example 14.19	-0.124939
-2.366894	rather than rounding.	-0.124939
-1.059169	details about rounding.	-0.124939
-0.599151	row + column;	-0.124939
-0.358988	int row, column;	-0.124939
-1.925440	the performance dramatically	-0.124939
-0.463714	times 24 dramatically	-0.124939
-0.588099	static data. 148	-0.124939
-0.358988	Position-independent code.................................................................................. 148	-0.124939
-0.358988	long latencies. 8.5	-0.124939
-0.358988	by CPU.............................................................................81 8.5	-0.124939
-0.599059	polynomial // Polynomial	-0.124939
-0.599059	3.3; // Polynomial	-0.124939
-0.891758	not even temporarily.	-0.124939
-1.417826	at least temporarily.	-0.124939
-1.806990	a very obscure	-0.124939
-0.463714	to construct obscure	-0.124939
-2.638242	// Example 14.1c	-0.124939
-2.150653	in example 14.1c	-0.124939
-1.067271	fractional part 142	-0.124939
-0.358988	variables ......................... 142	-0.124939
-1.598023	option for "assume	-0.124939
-1.277166	compiler option "assume	-0.124939
-1.599124	arrays are properly	-0.124939
-0.726919	is deleted properly	-0.124939
-0.892959	software optimization issue.	-0.124939
-0.659984	serious legal issue.	-0.124939
-1.203359	computer is restarted	-0.124939
-0.601759	down and restarted	-0.124939
-0.600973	module2.cpp int Func2()	-0.124939
-1.181310	} void Func2()	-0.124939
-0.463714	x-xxxx--x x-xx----- x--x-----	-0.124939
-0.358988	(x) x-xx--xx- x--x-----	-0.124939
-0.600949	power than PCs.	-0.124939
-0.591958	than standard PCs.	-0.124939
-0.577648	this optimization. 8.2	-0.124939
-0.505102	............................................................................................ 66 8.2	-0.124939
-0.577653	about it. Possible	-0.124939
-0.358988	non-Intel machines? Possible	-0.124939
-0.601759	Compilers and IDE's	-0.124939
-0.589680	on. Most IDE's	-0.124939
-1.036759	PathScale compilers. 8.3	-0.124939
-0.463714	compilers............................................................................. 74 8.3	-0.124939
-2.800783	can be obtained.	-0.124939
-1.072470	easily be obtained.	-0.124939
-1.198608	software in C++:	-0.124939
-0.600675	metaprogramming in C++:	-0.124939
-1.126117	y = sin(x);	-0.124939
-1.203709	see the delay.	-0.124939
-1.684048	a long delay.	-0.124939
-1.291772	sum = 1.f;	-0.124939
-0.598952	nfac = 1.f;	-0.124939
-0.505102	optimization explicitly. Divisions	-0.124939
-0.659984	as additions. Divisions	-0.124939
-0.596964	time-critical code. 7.32	-0.124939
-0.567255	.............................................................................. 65 7.32	-0.124939
-1.596639	int i, largest_index	-0.124939
-0.358988	= absvalue; largest_index	-0.124939
-1.275010	64 bits wide,	-0.124939
-1.121882	16 bits wide,	-0.124939
-1.948757	i++) { list[i].a	-0.124939
-1.190798	for accessing list[i].a	-0.124939
-0.594686	particular processor model.	-0.124939
-1.056892	any specific model.	-0.124939
-0.567255	......................................................................................... 65 7.33	-0.124939
-0.463714	a discussion. 7.33	-0.124939
-0.379443	operations for manipulating	-0.425969
-0.143416	cores. 3.15 Dependency	-0.124939
-0.143416	22 3.15 Dependency	-0.124939
-1.353933	fast as additions.	-0.124939
-0.597568	unit as additions.	-0.124939
-1.289421	without cache MOVNTPD	-0.124939
-0.358988	instructions MOVNTPS, MOVNTPD	-0.124939
-1.049098	an integer. 158	-0.124939
-0.463714	systems ............................................................................. 158	-0.124939
-0.901793	x.a = A;	-0.124939
-0.897292	A + A;	-0.124939
-0.674215	a clock cycle?	-0.124939
-2.190361	from the server.	-0.124939
-0.596630	dedicated test server.	-0.124939
-0.505102	unit-testing ...................................................................................... 156	-0.124939
-0.358988	unreasonably large. 156	-0.124939
-0.601759	optimized and fine-tuned	-0.124939
-2.302631	that are fine-tuned	-0.124939
-0.463714	testing ................................................................................................ 157	-0.124939
-0.659984	than normal. 157	-0.124939
-1.670333	than to draw	-0.124939
-0.601057	here to draw	-0.124939
-0.575711	different test examples.	-0.124939
-0.851900	my test examples.	-0.124939
-1.107146	the derived class:	-0.124939
-0.358988	the grandparent class:	-0.124939
-1.942147	advantage of sharing	-0.124939
-1.298582	threads are sharing	-0.124939
-2.127747	want to 155	-0.124939
-0.358988	counters .................................................................... 155	-0.124939
-0.463714	.................................................................................................................. 60 7.30	-0.124939
-0.358988	of multithreading. 7.30	-0.124939
-1.433199	in other ways,	-0.124939
-0.597462	bytes, 4 ways,	-0.124939
-2.800783	can be predicted.	-0.124939
-2.318352	should be predicted.	-0.124939
-0.204114	for (i=0; i<n;	-0.124939
-2.059910	program is dividing	-0.124939
-0.597538	int before dividing	-0.124939
-1.923188	and other complications	-0.124939
-1.644259	can cause complications	-0.124939
-1.279003	compiled without AVX,	-0.124939
-0.573364	available, e.g. AVX,	-0.124939
-0.505102	so on. 7.31	-0.124939
-0.463714	................................................................................ 61 7.31	-0.124939
-1.495648	overflow and redo	-0.124939
-0.600690	_finite()) and redo	-0.124939
-0.601655	opportunities for parallelization	-0.124939
-1.317030	and automatic parallelization	-0.124939
-0.982176	array element. Matrix	-0.124939
-1.231371	as follows: Matrix	-0.124939
-1.079042	and destructors ..................................................................................	-0.124939
-1.169205	hot spots ..................................................................................	-0.124939
-0.994818	* c); a.store(aa+i);	-0.124939
-1.065774	in aa: a.store(aa+i);	-0.124939
-0.197775	(u.i & 0x7FFFFFFF)	-0.425969
-0.901936	threads can add,	-0.124939
-0.358988	SSE3 horizontal add,	-0.124939
-1.379070	branch is fed	-0.124939
-2.954844	can be fed	-0.124939
-3.207749	of the array,	-0.124939
-1.763277	in an array,	-0.124939
-0.899261	32 for AVX.	-0.124939
-0.600141	.R. for AVX.	-0.124939
-0.519854	compiler #define Alignd(X)	-0.124939
-0.519854	etc. #define Alignd(X)	-0.124939
-0.378965	c2 = _mm_add_epi16(c,	-0.425969
-1.639117	program that waits	-0.124939
-0.599887	loaded, but waits	-0.124939
-2.366894	rather than loops,	-0.124939
-0.594995	compile-time while loops,	-0.124939
-0.902280	bits for Tuesday,	-0.124939
-0.659984	Sunday, Monday, Tuesday,	-0.124939
-0.601773	case in loops.	-0.124939
-1.329014	critical innermost loops.	-0.124939
-0.601439	log(c[i]); // Increment	-0.124939
-0.463714	137, respectively. Increment	-0.124939
-2.533963	to be cached	-0.124939
-1.744278	data are cached	-0.124939
-0.197650	elimination, constant propagation,	-0.124939
-0.899606	or data exceeds	-0.124939
-0.589179	input never exceeds	-0.124939
-1.078320	found in Wikipedia	-0.124939
-1.036759	C++ compilers. Wikipedia	-0.124939
-0.855575	Worst-case testing ................................................................................................	-0.124939
-0.836052	Dependency chains ................................................................................................	-0.124939
-0.659984	the inverted mask.	-0.124939
-0.358988	thread affinity mask.	-0.124939
-1.074503	I will conclude	-0.124939
-1.179656	can therefore conclude	-0.124939
-2.800783	can be shared.	-0.124939
-1.970069	cannot be shared.	-0.124939
-0.601870	DLL is relocated	-0.124939
-0.601569	DLLs are relocated	-0.124939
-0.169354	of everything else.	-0.124939
-2.716424	in a FIFO	-0.124939
-1.633379	example, a FIFO	-0.124939
-0.397926	Math Kernel Library"	-0.124939
-1.202501	methods for dealing	-0.124939
-2.086119	you are dealing	-0.124939
-1.221470	public: void Hello()	-0.124939
-0.573889	Disp(); void Hello()	-0.124939
-0.463714	processing unit. Various	-0.124939
-0.358988	or network. Various	-0.124939
-1.191309	of 64-bit software,	-0.124939
-1.243779	of modern software,	-0.124939
-2.303938	{ // Overflow	-0.124939
-0.505102	142). 30 Overflow	-0.124939
-0.599431	calculated using multiplications	-0.124939
-0.890209	speed up multiplications	-0.124939
-0.895511	are some differences	-0.124939
-0.580814	2004. No differences	-0.124939
-0.203497	B = 2.2,	-0.425969
-2.647771	the same machine.	-0.124939
-0.887809	Java virtual machine.	-0.124939
-0.582119	use STL containers.	-0.124939
-0.659984	and deleting containers.	-0.124939
-1.611074	a branch tree.	-0.124939
-1.140288	a binary tree.	-0.124939
-0.358988	mentally flawed approach	-0.124939
-0.358988	well thought-through approach	-0.124939
-0.657465	be allocated dynamically.	-0.124939
-0.659984	* CriticalFunctionDispatch(void) __asm__	-0.124939
-0.358988	CriticalFunction (); __asm__	-0.124939
-1.039016	compilers have difficulties	-0.425969
-0.203846	creating and deleting	-0.124939
-2.102803	for a discussion.	-0.124939
-1.343902	for further discussion.	-0.124939
-0.726919	7.4 Enums ......................................................................................................................	-0.124939
-0.726919	9.8 Strings ......................................................................................................................	-0.124939
-0.595833	(char, short int)	-0.124939
-0.567243	char (or int)	-0.124939
-1.584061	{ if (y)	-0.124939
-0.895800	}; if (y)	-0.124939
-1.588933	for this purpose,	-0.124939
-1.695134	a specific purpose,	-0.124939
-0.463714	most sorting algorithms,	-0.124939
-0.358988	many encryption algorithms,	-0.124939
-1.201529	CriticalFunction = &CriticalFunction_SSE2;	-0.124939
-0.895225	supported return &CriticalFunction_SSE2;	-0.124939
-1.283811	virtual void Disp();	-0.124939
-0.659984	"Hello "; Disp();	-0.124939
-1.745908	time is consistent	-0.124939
-1.500470	improved by consistent	-0.124939
-0.601411	0+1.23456 = 1.23456.	-0.124939
-2.366894	rather than 1.23456.	-0.124939
-0.065815	} u, v;	-0.425969
-0.065815	= _mm_mullo_epi16 (b,	-0.425969
-1.360743	can often reveal	-0.124939
-0.573394	their implementations reveal	-0.124939
-1.078610	structure is created.	-0.124939
-2.107517	they are created.	-0.124939
-2.422342	to use denormal	-0.124939
-0.463714	than generating denormal	-0.124939
-2.392860	should be optional	-0.124939
-0.358988	Public License, optional	-0.124939
-0.172709	16kB unaligned op.	-0.124939
-0.594685	time. An experiment	-0.124939
-0.583263	of my experiment	-0.124939
-0.594997	$B2$2 ; Induction++;	-0.124939
-1.010214	= Induction; Induction++;	-0.124939
-1.435993	between different precisions	-0.124939
-2.580881	floating point precisions	-0.124939
-0.143416	124 13.3 Difficult	-0.124939
-0.143416	programming. 13.3 Difficult	-0.124939
-1.188501	on an interpreter	-0.124939
-0.596791	had an interpreter	-0.124939
-0.594083	linkage table (PLT).	-0.124939
-0.892547	}; int order(int	-0.124939
-0.596756	j; int order(int	-0.124939
-0.601847	consisting of digital	-0.124939
-0.550854	A complex digital	-0.124939
-0.241901	for(i=0; i<300; i++){	-0.425969
-0.463714	0.38 0.44 0.40	-0.124939
-0.358988	0.77 0.89 0.40	-0.124939
-0.129483	x, y, z;	-0.124939
-0.505102	division ........................................................................................... 139	-0.124939
-0.463714	/ c) 139	-0.124939
-0.358988	1.21 0.57 0.44	-0.124939
-0.358988	K8 0.38 0.44	-0.124939
-0.065815	(*SelectAddMul_pointer)(aa, bb, cc);	-0.425969
-1.032744	bit platform __GNUC__	-0.124939
-0.463714	8.22 #ifdef __GNUC__	-0.124939
-1.077876	line that covered	-0.124939
-1.443735	processors are covered	-0.124939
-0.713839	the old fashioned	-0.425969
-1.076667	arrays with alloca.	-0.124939
-1.073939	on using alloca.	-0.124939
-1.048654	RAM memory. Efficient	-0.124939
-0.726919	and truncation. Efficient	-0.124939
-1.074122	different memory spaces	-0.124939
-1.060000	cleaning up spaces	-0.124939
-1.165113	; parameter $B1$1:	-0.124939
-0.563094	2: 12 $B1$1:	-0.124939
-0.065815	5040, 40320, 362880,	-0.425969
-0.593296	address) / (line	-0.124939
-0.358988	of sets) (line	-0.124939
-0.601379	row or column.	-0.124939
-2.044913	in this column.	-0.124939
-1.457542	and more complex,	-0.124939
-1.406040	code more complex,	-0.124939
-0.875564	146 below. 3.7	-0.124939
-0.573379	....................................................... 20 3.7	-0.124939
-2.719394	is a cheap	-0.124939
-0.789055	are relatively cheap	-0.124939
-0.354899	of mathematical purity.	-0.124939
-0.463714	s(0.f, 0.f, 0.f,	-0.124939
-0.358988	F32vec4 s(0.f, 0.f,	-0.124939
-2.219225	// Example 14.23b	-0.124939
-0.593960	storage. Example 14.23b	-0.124939
-0.567294	variables (see below).	-0.124939
-0.567294	16 (see below).	-0.124939
-1.203315	requires a division,	-0.124939
-0.886620	Single precision division,	-0.124939
-0.601847	unit of received	-0.124939
-0.541331	A command received	-0.124939
-0.527376	a dramatic degradation	-0.124939
-0.463714	a slight degradation	-0.124939
-1.203709	need the "override"	-0.124939
-0.600729	implement this "override"	-0.124939
-2.638242	// Example 11.2b	-0.124939
-2.150653	in example 11.2b	-0.124939
-1.980147	address of matrix[j][0]	-0.124939
-0.659984	= order(i); matrix[j][0]	-0.124939
-0.505102	x-xxxxxxx ---x----- x--xx----	-0.124939
-0.358988	x--xx---- (a&&b)||(a&&!b)=a x--xx----	-0.124939
-0.595151	separate function. Sometimes,	-0.124939
-0.836041	hot spot. Sometimes,	-0.124939
-2.215448	have to distribute	-0.124939
-2.288937	possible to distribute	-0.124939
-2.194317	at the "worst	-0.124939
-0.601610	represent the "worst	-0.124939
-0.803845	variables are overdetermined	-0.124939
-0.359010	of 250 ms.	-0.124939
-0.359010	than 250 ms.	-0.124939
-1.950941	the same thing.	-0.124939
-2.155180	size of squares:	-0.124939
-1.748688	for all squares:	-0.124939
-2.257696	the time MemberPointer	-0.124939
-0.597538	c1 before MemberPointer	-0.124939
-0.796353	available from www.intel.com.	-0.124939
-0.563121	c1 += TILESIZE)	-0.124939
-0.563121	r1 += TILESIZE)	-0.124939
-2.467643	number of sources.	-0.124939
-0.575703	from unknown sources.	-0.124939
-2.762877	to the first-in-last-out	-0.124939
-2.772600	in a first-in-last-out	-0.124939
-0.065815	40320, 362880, 3628800,	-0.425969
-1.895126	is not uncommon	-0.425969
-1.300024	Such a coprocessor	-0.124939
-1.362626	a graphics coprocessor	-0.124939
-1.049561	fraction : 23;	-0.124939
-0.585207	n << 23;	-0.124939
-0.818671	for Linux. 82	-0.124939
-0.527376	directives .............................................................................................. 82	-0.124939
-1.485668	of C++ relates	-0.124939
-1.343713	C++ language relates	-0.124939
-0.866296	member pointer. 7.9	-0.124939
-0.358988	Member pointers.......................................................................................................37 7.9	-0.124939
-3.207749	of the alignment.	-0.124939
-0.541321	work. Data alignment.	-0.124939
-0.567261	as integers. 7.5	-0.124939
-0.527376	...................................................................................................................... 33 7.5	-0.124939
-0.981470	a good deal	-0.425969
-0.505102	loader (requires binutils	-0.124939
-0.358988	13.1, Requires binutils	-0.124939
-1.113779	vector operations. 7.6	-0.124939
-0.527376	Booleans................................................................................................................... 33 7.6	-0.124939
-0.391189	systems. 14 Specific	-0.124939
-0.391189	130 14 Specific	-0.124939
-0.595151	Pure function. __attribute__((const))	-0.124939
-0.726919	#define pure_function __attribute__((const))	-0.124939
-0.601569	background are unnecessary	-0.124939
-0.577653	program. Avoid unnecessary	-0.124939
-1.584873	{ float b[1000];	-0.124939
-0.586658	a[1000]; float b[1000];	-0.124939
-0.527376	operators............................................................................... 29 7.3	-0.124939
-0.527376	variables. 31 7.3	-0.124939
-1.549301	simply by performing	-0.124939
-0.588076	A better performing	-0.124939
-0.899137	done only once,	-0.124939
-0.885766	only calculated once,	-0.124939
-1.423986	= 0, s1	-0.124939
-0.917690	+= a[i]; s1	-0.124939
-2.441602	instruction set (called	-0.124939
-0.570596	if statements (called	-0.124939
-2.079375	int i; 84	-0.124939
-0.463714	does ............................................................................. 84	-0.124939
-0.143416	have extern "C"	-0.124939
-0.143416	point extern "C"	-0.124939
-0.898757	almost all respects	-0.124939
-1.287097	in many respects	-0.124939
-0.505102	Free Documentation License	-0.124939
-0.358988	no yes License	-0.124939
-0.597486	code. See ISO/IEC	-0.124939
-0.358988	optimization. en.wikipedia.org/wiki/Compiler_optimization. ISO/IEC	-0.124939
-1.549859	used as command-line	-0.124939
-0.900012	debugging. A command-line	-0.124939
-0.889076	array ; i++	-0.124939
-0.590120	post-increment operator i++	-0.124939
-1.872100	(see page 137).	-0.124939
-1.311713	(See page 137).	-0.124939
-1.962455	inside the template.	-0.124939
-2.647771	the same template.	-0.124939
-0.601439	constant. // General	-0.124939
-0.463714	price GNU General	-0.124939
-3.070236	in the container,	-0.124939
-1.861111	a single container,	-0.124939
-0.902297	Windows. The integrated	-0.124939
-0.601379	card or integrated	-0.124939
-2.059910	program is started.	-0.124939
-0.590917	CPU was started.	-0.124939
-1.367629	function which transposes	-0.124939
-1.844505	following example transposes	-0.124939
-2.732311	to the container.	-0.124939
-3.019650	in the container.	-0.124939
-0.493519	version return (*SelectAddMul_pointer)(aa,	-0.425969
-1.174242	It will crash	-0.124939
-0.594396	disabled will crash	-0.124939
-0.203497	A = 1.1,	-0.425969
-1.296038	you to reserve	-0.124939
-2.272844	order to reserve	-0.124939
-0.541310	integer division. Older	-0.124939
-0.358988	4, etc.). Older	-0.124939
-1.679151	pointer is deleted.	-0.124939
-2.533963	to be deleted.	-0.124939
-0.818671	and Linux. Asmlib	-0.124939
-0.541321	1.19 13 Asmlib	-0.124939
-0.885246	is run. Both	-0.124939
-0.726946	the linker. Both	-0.124939
-1.191820	p = &Object2;	-0.124939
-0.598952	p2 = &Object2;	-0.124939
-1.068638	{ int a:4;	-0.425969
-0.463714	a*0=0 a*1=a (-a)*(-b)=a*b	-0.124939
-0.358988	a*1=a x-xxxxx-x (-a)*(-b)=a*b	-0.124939
-2.954844	can be left	-0.124939
-0.597261	free register left	-0.124939
-0.358988	< c1+TILESIZE; c2++)	-0.124939
-0.358988	< r2; c2++)	-0.124939
-0.295881	the processor. Nested	-0.425969
-1.823289	} // ipow	-0.124939
-0.599344	loop double ipow	-0.124939
-1.939345	an integer comparison,	-0.124939
-0.659984	addition, subtraction, comparison,	-0.124939
-0.379276	versions are produced	-0.425969
-0.584939	public: void NotPolymorphic();	-0.124939
-0.901074	a+(b+c) - a*b+a*c	-0.124939
-2.363867	- n.a. a*b+a*c	-0.124939
-2.638242	// Example 11.1a	-0.124939
-2.150653	in example 11.1a	-0.124939
-0.594686	CPU doesn't support,	-0.124939
-0.593754	no AVX support,	-0.124939
-2.020838	if you forget	-0.124939
-1.947052	If you forget	-0.124939
-1.731205	with the inverted	-0.124939
-0.601844	11.1a to 11.1b	-0.124939
-2.638242	// Example 11.1b	-0.124939
-0.901793	8 = 80.	-0.124939
-2.209017	See page 80.	-0.124939
-0.896724	element number i.	-0.124939
-0.463714	the index, i.	-0.124939
-0.601072	written. This worked	-0.124939
-0.900474	They have worked	-0.124939
-1.076547	overflow is needed:	-0.124939
-0.601180	bookkeeping is needed:	-0.124939
-0.553912	164 1 Introduction	-0.124939
-0.553912	Contents 1 Introduction	-0.124939
-0.940813	for(i=0; i<300; i+=3){	-0.124939
-0.358988	for(i=0; i<301; i+=3){	-0.124939
-0.199552	ALIGN 4 PUBLIC	-0.425969
-0.143416	2 a+a+a+a=a*4 -(-a)=a	-0.124939
-0.143416	((x2)2)2 a+a+a+a=a*4 -(-a)=a	-0.124939
-0.314805	from Intel: "IA-32	-0.124939
-0.314805	documentation Intel: "IA-32	-0.124939
-1.361281	program is loaded,	-0.124939
-1.121605	(a&&b) || (a&&b&&c)	-0.124939
-0.533681	(a&&c) || (a&&b&&c)	-0.124939
-1.711997	= 0 a+0=a	-0.124939
-0.358988	a-(-b)=a+b ---xxx-x- a+0=a	-0.124939
-2.638242	// Example 7.15b	-0.124939
-1.071439	as example 7.15b	-0.124939
-0.599352	block size grows	-0.124939
-1.795324	an array grows	-0.124939
-0.601870	N&(N-1) is 0.	-0.124939
-1.288597	c < 0.	-0.124939
-0.065815	< r1+TILESIZE; r2++)	-0.425969
-0.601957	holds the index,	-0.124939
-0.726919	square brackets index,	-0.124939
-3.207749	of the time-consumers	-0.124939
-1.491213	most common time-consumers	-0.124939
-0.936772	is fast enough.	-0.124939
-0.549230	job fast enough.	-0.124939
-0.065815	24, 120, 720,	-0.425969
-1.204004	is quite tedious	-0.124939
-1.090088	be quite tedious	-0.124939
-0.594846	versions work correctly.	-0.124939
-0.593538	branches works correctly.	-0.124939
-0.601759	safe and flexible,	-0.124939
-0.358988	are universal, flexible,	-0.124939
-2.064204	const int ARRAYSIZE	-0.124939
-1.066748	(i < ARRAYSIZE	-0.124939
-1.092907	the source annotation	-0.124939
-0.538991	for source annotation	-0.124939
-0.550863	and edx, respectively.	-0.124939
-0.358988	and 137, respectively.	-0.124939
-0.570618	....................................................................................................................... 3 1.1	-0.124939
-0.527376	relevant information. 1.1	-0.124939
-2.066092	(see page 43).	-0.124939
-1.114525	(see p. 43).	-0.124939
-0.764700	u; u.i ^=	-0.124939
-0.358988	with u.i[1] ^=	-0.124939
-0.882135	is all 1's	-0.124939
-1.574147	with all 1's	-0.124939
-0.600648	we use hexadecimal	-0.124939
-0.589190	2. Using hexadecimal	-0.124939
-0.895283	i; } u,	-0.425969
-1.475628	or by extending	-0.124939
-0.598258	size by extending	-0.124939
-0.204114	&, |, ^,	-0.124939
-2.716424	in a systematic	-0.124939
-2.084644	use a systematic	-0.124939
-1.044303	& operator forces	-0.124939
-0.585228	The union forces	-0.124939
-1.445827	loop is rolled	-0.124939
-0.541310	a list, rolled	-0.124939
-1.090029	same module __attribute__	-0.124939
-0.659984	((visibility ("internal"))) __attribute__	-0.124939
-0.902516	written in Java,	-0.124939
-2.283118	such as Java,	-0.124939
-0.182928	__restrict #pragma ivdep	-0.124939
-1.503347	Intel and Gnu.	-0.124939
-1.706031	compatible with Gnu.	-0.124939
-0.601957	ends the recursion	-0.124939
-0.902297	implemented. The recursion	-0.124939
-1.377441	- a ^a	-0.124939
-0.601134	~a a ^a	-0.124939
-1.078756	rules of algebra,	-0.124939
-1.041663	of Boolean algebra,	-0.124939
-0.899711	to memory management	-0.124939
-0.575698	of heap management	-0.124939
-2.264929	There are lots	-0.124939
-0.601220	databases with lots	-0.124939
-0.463714	5. www.amd.com. 163	-0.124939
-0.463714	Literature ..................................................................................................................... 163	-0.124939
-0.358988	For team projects,	-0.124939
-0.358988	For one-man projects,	-0.124939
-0.463714	-static /MT 160	-0.124939
-0.358988	compiler options....................................................................................... 160	-0.124939
-0.601439	SSE3. // (This	-0.124939
-0.855591	induction variable. (This	-0.124939
-1.631273	Windows and Mac.	-0.124939
-1.703824	Linux and Mac.	-0.124939
-2.647771	the same chip.	-0.124939
-2.276223	the CPU chip.	-0.124939
-2.954844	can be wrapped	-0.124939
-2.107517	they are wrapped	-0.124939
-0.689273	be predicted perfectly	-0.124939
-0.065815	= _mm_cmpgt_epi16(b, zero);	-0.425969
-0.926796	have been added?	-0.425969
-0.463714	Effective C++". Addison-Wesley,	-0.124939
-0.358988	"Hacker's Delight". Addison-Wesley,	-0.124939
-2.167060	the loop counter,	-0.124939
-1.868438	a loop counter,	-0.124939
-1.364404	the static modifier	-0.124939
-0.463714	The fastcall modifier	-0.124939
-0.901730	-O3 or -Ofast	-0.124939
-0.463714	option) better: -Ofast	-0.124939
-1.745390	code to test.	-0.124939
-0.463714	to 155 test.	-0.124939
-1.203257	unable to respond	-0.124939
-1.154740	should never respond	-0.124939
-0.065815	of Numerically Intensive	-0.425969
-3.070236	in the planning	-0.124939
-0.463714	the early planning	-0.124939
-1.071745	}; class C2	-0.124939
-0.463714	C1 Object1; C2	-0.124939
-2.014985	all the R	-0.124939
-1.395731	the four R	-0.124939
-1.221619	and a release	-0.425969
-2.511249	of the fraction.	-0.124939
-0.601379	Wednesday or Friday	-0.124939
-0.358988	= 0x10, Friday	-0.124939
-0.847542	Some programming textbooks	-0.124939
-0.573398	Nowadays, programming textbooks	-0.124939
-2.533963	to be slower.	-0.124939
-2.429234	the program slower.	-0.124939
-0.541321	...................................................................................... 90 9.7	-0.124939
-0.463714	using alloca. 9.7	-0.124939
-0.901793	(c2 = c1;	-0.124939
-0.599567	7.14 class c1;	-0.124939
-0.165199	interpreters, just-in-time compilers,	-0.124939
-0.601759	-128, and subtracting	-0.124939
-0.901524	2n by subtracting	-0.124939
-2.227146	{ // Returns	-0.124939
-0.599059	etc. // Returns	-0.124939
-0.902297	it. The insight	-0.124939
-0.596539	This new insight	-0.124939
-0.358988	reductions: a+b=b+a a*b=b*a	-0.124939
-0.358988	reductions: a+b=b+a, a*b=b*a	-0.124939
-0.199961	r2 < r1+TILESIZE;	-0.425969
-0.601134	reductions as 0/a	-0.124939
-1.675222	a - 0/a	-0.124939
-0.596205	can only hope	-0.425969
-0.601439	7.45 // Portability	-0.124939
-0.550863	optimization. 14 Portability	-0.124939
-1.810770	the other compilers).	-0.124939
-0.505118	old DOS compilers).	-0.124939
-2.227146	{ // Catch	-0.124939
-1.789532	} // Catch	-0.124939
-1.947417	more than 99%	-0.124939
-0.505102	other programs, 99%	-0.124939
-0.152964	<< "Hello ";	-0.124939
-0.577666	1024; struct Sab	-0.124939
-0.505118	int b;}; Sab	-0.124939
-0.990899	a significant contribution	-0.124939
-0.505102	a negligible contribution	-0.124939
-1.300024	makes a distinction	-0.124939
-0.889675	an important distinction	-0.124939
-0.599059	result // Update	-0.124939
-0.599059	Y // Update	-0.124939
-0.994790	Gnu, Clang Supported	-0.124939
-0.358988	dvec.h vectorclass.h Supported	-0.124939
-0.143416	in develop- ment	-0.124939
-0.143416	software develop- ment	-0.124939
-1.058775	dispatcher function. typeof(CriticalFunction)	-0.124939
-0.358988	__asm__ ("CriticalFunction"); typeof(CriticalFunction)	-0.124939
-2.290672	than the ones	-0.124939
-1.077831	modify the ones	-0.124939
-2.043817	program is busy	-0.124939
-0.601180	she is busy	-0.124939
-1.937450	not be optimally	-0.124939
-1.166212	can run optimally	-0.124939
-0.541310	where necessary. Fast	-0.124939
-0.505102	Compiler-specific keywords Fast	-0.124939
-0.902297	details. The funny	-0.124939
-0.895511	does some funny	-0.124939
-0.579343	Optimizing database queries	-0.124939
-0.358988	cases. Database queries	-0.124939
-2.030263	a program saying	-0.124939
-0.463714	pop-up messages saying	-0.124939
-0.892336	higher than normal.	-0.124939
-0.596650	random than normal.	-0.124939
-0.408234	Writes "Hello 1"	-0.425969
-0.601870	hardware is updated.	-0.124939
-1.558375	CPU dispatcher updated.	-0.124939
-0.601663	purpose. The clumsy	-0.124939
-0.586804	functions look clumsy	-0.124939
-0.065815	= (double)(signed int)u;	-0.124939
-3.207749	of the trivial	-0.124939
-1.077966	loop for trivial	-0.124939
-0.065815	void transpose(double a[SIZE][SIZE])	-0.425969
-1.378173	SSE2 or x64	-0.124939
-0.593296	80x86 / x64	-0.124939
-1.069538	or 64-bit systems).	-0.124939
-0.585216	in x86 systems).	-0.124939
-1.047109	time is wasted	-0.425969
-0.463714	GNU General Public	-0.124939
-0.659984	Agner Fog. Public	-0.124939
-1.543509	an extra dummy	-0.124939
-0.591953	even add dummy	-0.124939
-1.379596	through the symbolic	-0.124939
-1.300024	makes a symbolic	-0.124939
-2.954844	can be fetched	-0.124939
-1.444222	instructions are fetched	-0.124939
-0.601411	Saturday = 0x40	-0.124939
-0.601379	64 or 0x40	-0.124939
-2.265160	n.a. - (a&b)|(a&c)	-0.124939
-0.358988	~(~a)=a x-xxxxx-- (a&b)|(a&c)	-0.124939
-1.904707	efficient than relocation,	-0.124939
-1.284978	that need relocation,	-0.124939
-1.036759	PathScale compilers. (The	-0.124939
-0.917714	an explanation. (The	-0.124939
-0.901849	1.2; // Mixing	-0.124939
-0.587454	commercial compilers. Mixing	-0.124939
-0.601374	iteration it decides	-0.124939
-1.201177	dispatcher function decides	-0.124939
-0.538986	(SSE2): #include <xmmintrin.h>	-0.124939
-0.538986	(SSE): #include <xmmintrin.h>	-0.124939
-0.598569	Func1(x) * Func1(x)	-0.124939
-1.950904	{ return Func1(x)	-0.124939
-0.065815	120, 720, 5040,	-0.425969
-1.444462	including the ability	-0.124939
-0.601610	loose the ability	-0.124939
-0.901524	multiplying by 3,	-0.124939
-1.129373	1, 2, 3,	-0.124939
-0.541321	.......................................................................................... 21 3.12	-0.124939
-0.541331	system modules. 3.12	-0.124939
-2.046866	do not 123	-0.124939
-0.463714	#define ABC 123	-0.124939
-1.679732	reference to provoke	-0.124939
-1.631088	This will provoke	-0.124939
-0.359006	well. Codeplay VectorC	-0.124939
-0.359006	2005. Codeplay VectorC	-0.124939
-1.073144	from other processes.	-0.124939
-1.486409	between multiple processes.	-0.124939
-0.143416	module __attribute__ ((visibility	-0.124939
-0.143416	("internal"))) __attribute__ ((visibility	-0.124939
-0.601870	chains is stronger	-0.124939
-0.594345	so much stronger	-0.124939
-1.444222	instructions are accessible	-0.124939
-2.191732	are not accessible	-0.124939
-0.601759	modularity and reusable	-0.124939
-0.902516	away in reusable	-0.124939
-0.902297	problems. The procedures	-0.124939
-0.557818	and far procedures	-0.124939
-0.601411	!b = !(a	-0.124939
-2.363867	- n.a. !(a	-0.124939
-0.359006	158 18 Overview	-0.124939
-0.359006	159 18 Overview	-0.124939
-0.358988	AMD SSE4A ammintrin.h	-0.124939
-0.358988	AMD XOP ammintrin.h	-0.124939
-2.366894	rather than sequences	-0.124939
-0.593951	or small sequences	-0.124939
-0.600690	start and stop	-0.124939
-0.900354	message and stop	-0.124939
-1.343902	for further expansions	-0.124939
-0.527376	as Taylor expansions	-0.124939
-1.057607	the sign bit.	-0.124939
-0.143416	vectors. 12.10 Conclusion	-0.124939
-0.143416	120 12.10 Conclusion	-0.124939
-0.598852	support static linking.	-0.124939
-0.891163	and dynamic linking.	-0.124939
-0.764700	an IDE. Free	-0.124939
-0.463714	A GNU Free	-0.124939
-0.601957	studying the bottlenecks	-0.124939
-0.598031	identify performance bottlenecks	-0.124939
-1.379133	due to interrupts	-0.124939
-1.316423	to generate interrupts	-0.124939
-1.375197	compatibility with legacy	-0.124939
-1.067861	with some legacy	-0.124939
-0.600314	far data segment	-0.124939
-0.898622	for one segment	-0.124939
-0.599150	platform n.a. __unix__	-0.124939
-0.659984	__unix__ __linux__ __unix__	-0.124939
-1.078279	address and attempts	-0.124939
-2.321603	that it attempts	-0.124939
-0.563094	integer vectors. Code	-0.124939
-0.463714	are eliminated. Code	-0.124939
-0.129483	}; Weekdays Day;	-0.425969
-0.202862	c2++) { swapd(a[r2][c2],a[c2][r2]);	-0.425969
-0.065815	return (*SelectAddMul_pointer)(aa, bb,	-0.425969
-1.197724	to save power.	-0.124939
-0.577648	the processing power.	-0.124939
-1.465857	a time consumer	-0.124939
-0.595259	annoying time consumer	-0.124939
-0.601319	put a parenthesis	-0.425969
-0.575684	reusable classes. Security	-0.124939
-0.789091	a computer. Security	-0.124939
-0.889644	with its limit,	-0.124939
-0.358988	an acceptable limit,	-0.124939
-0.900271	cannot use ~	-0.124939
-0.659984	|, ^, ~	-0.124939
-0.601322	transpose function swapd(a[r][c],	-0.124939
-0.836052	below diagonal swapd(a[r][c],	-0.124939
-0.596888	more time. Single	-0.124939
-1.308159	data cache. Single	-0.124939
-0.463714	1.00 0.25 0.28	-0.124939
-0.358988	0.35 0.29 0.28	-0.124939
-1.667044	that have Booleans	-0.124939
-0.463714	integers. 7.5 Booleans	-0.124939
-0.836041	Opteron K8 0.24	-0.124939
-0.463714	0.24 0.25 0.24	-0.124939
-0.249867	access 9.1 Caching	-0.124939
-0.249867	87 9.1 Caching	-0.124939
-1.199484	which may interfere	-0.124939
-0.600495	macro will interfere	-0.124939
-0.897514	frame- pointer -fomit-	-0.124939
-0.358988	frame /Oy -fomit-	-0.124939
-0.463714	K8 0.24 0.25	-0.124939
-0.659984	n.a. 1.00 0.25	-0.124939
-0.378965	bc = _mm_mullo_epi16	-0.425969
-1.642530	without the FMA4	-0.124939
-0.594162	(Gnu) AMD FMA4	-0.124939
-0.463714	Fog. Public distribution	-0.124939
-0.358988	allowed. Non-public distribution	-0.124939
-1.018369	/ b) >>	-0.124939
-0.358988	~, <<, >>	-0.124939
-2.074884	able to do,	-0.124939
-0.505102	expansions. Programmers do,	-0.124939
-0.601314	alias, if appropriate.	-0.124939
-0.598549	static where appropriate.	-0.124939
-0.541321	subsequent manuals. Please	-0.124939
-0.917714	an explanation. Please	-0.124939
-0.143416	21 3.12 Network	-0.124939
-0.143416	modules. 3.12 Network	-0.124939
-0.597345	compiler: unsigned __int64	-0.124939
-0.659984	MS compiler: __int64	-0.124939
-0.371933	misses, branch mispredictions,	-0.124939
-1.481669	i < rows;	-0.425969
-0.600314	keeping data together.	-0.124939
-0.586787	are linked together.	-0.124939
-2.288937	possible to organize	-0.124939
-1.999324	need to organize	-0.124939
-2.610606	of a bitfield	-0.124939
-0.601134	compose a bitfield	-0.124939
-2.638242	// Example 15.1b.	-0.124939
-2.150653	in example 15.1b.	-0.124939
-1.798694	when it sees	-0.124939
-2.623314	the compiler sees	-0.124939
-0.596888	development time. Interpreted	-0.124939
-0.358988	shell script. Interpreted	-0.124939
-2.623314	the compiler treat	-0.124939
-0.598749	these also treat	-0.124939
-0.065815	void TransposeCopy(double a[SIZE][SIZE],	-0.425969
-0.601411	a<c) = (a<b	-0.124939
-0.358988	----x---- !(a<b)=(a>=b) (a<b	-0.124939
-1.874816	on the processor).	-0.124939
-0.600750	misses have occurred.	-0.124939
-0.600264	overflow has occurred.	-0.124939
-2.543476	and the corresponding	-0.124939
-0.601610	supports the corresponding	-0.124939
-0.600787	SafeArray() { memset(a,	-0.124939
-1.514747	to zero memset(a,	-0.124939
-0.591774	x * m;}	-0.124939
-0.601759	occur and recovering	-0.124939
-1.202501	use for recovering	-0.124939
-0.900549	Weekdays { Sunday,	-0.124939
-0.858809	the constants Sunday,	-0.124939
-0.065815	are mutually incompatible.	-0.124939
-1.073383	*p + 2;}	-0.124939
-0.593954	*const_cast<int*>(&x) += 2;}	-0.124939
-2.059910	program is fast,	-0.124939
-0.358988	/fp:fast=2 -fp-model fast,	-0.124939
-1.048660	of optimizing University	-0.124939
-0.463714	Fog. Technical University	-0.124939
-1.503641	calls to log,	-0.124939
-0.358988	as pow, log,	-0.124939
-0.601625	(everything that begins	-0.124939
-0.958041	loop body begins	-0.124939
-2.219225	// Example 14.26	-0.124939
-1.055320	} Example 14.26	-0.124939
-2.219225	// Example 14.27	-0.124939
-1.055320	} Example 14.27	-0.124939
-1.854674	method is somewhat	-0.124939
-0.593538	It works somewhat	-0.124939
-0.662988	is poorly predictable.	-0.124939
-0.415733	are poorly predictable.	-0.124939
-0.450776	of addition, subtraction,	-0.124939
-0.450776	as addition, subtraction,	-0.124939
-2.638242	// Example 14.23	-0.124939
-2.150653	in example 14.23	-0.124939
-2.088616	be a slight	-0.124939
-1.347896	may cause slight	-0.124939
-0.594853	an execution unit.	-0.124939
-1.009714	graphics processing unit.	-0.124939
-0.601759	hacks and direct	-0.124939
-1.321517	that allows direct	-0.124939
-1.300620	get the generic	-0.124939
-1.920271	and a generic	-0.124939
-1.415637	point addition unit,	-0.124939
-1.009714	graphics processing unit,	-0.124939
-0.601870	handle is invalid.	-0.124939
-0.582119	then become invalid.	-0.124939
-0.601870	database is heavily	-0.124939
-1.037774	that rely heavily	-0.124939
-1.291214	but only self-	-0.124939
-1.342354	for calculating self-	-0.124939
-2.532245	to make log2	-0.124939
-1.196022	const double log2	-0.124939
-0.863655	performance monitor counters.	-0.124939
-0.567282	where execution speed,	-0.124939
-0.567282	while execution speed,	-0.124939
-0.559755	(a&&b) || (!a&&c)	-0.124939
-0.601663	zero. The []	-0.124939
-0.358988	// Safe []	-0.124939
-1.831504	on page 122.	-0.124939
-0.879037	see page 122.	-0.124939
-0.889924	appropriate error messages	-0.124939
-0.358988	nagging pop-up messages	-0.124939
-0.358988	void F2(float x[]);	-0.124939
-0.358988	void F1(int x[]);	-0.124939
-0.202136	(Intel CPU only)	-0.124939
-0.505102	11.1b automatically, although	-0.124939
-0.358988	work, 133 although	-0.124939
-1.934293	functions are primitive	-0.124939
-0.541310	a relatively primitive	-0.124939
-0.463714	by S. Goedecker	-0.124939
-0.358988	performance. Stefan Goedecker	-0.124939
-0.143416	122 13.2 Model-specific	-0.124939
-0.143416	files. 13.2 Model-specific	-0.124939
-0.358988	(RTTI) /GR– -fno-rtti	-0.124939
-0.358988	-fno-rtti /GR- -fno-rtti	-0.124939
-0.900433	want this initialization,	-0.124939
-0.358988	three clauses: initialization,	-0.124939
-0.600787	largest_abs) { largest_abs	-0.124939
-0.358988	int absvalue, largest_abs	-0.124939
-0.570613	use alternative implementations.	-0.124939
-0.818685	best Java implementations.	-0.124939
-0.107206	2, 6, 24,	-0.425969
-0.902488	performance and studying	-0.124939
-0.902280	but for studying	-0.124939
-2.066092	(see page 87).	-0.124939
-1.114525	(see p. 87).	-0.124939
-0.880397	cause cache contentions.	-0.124939
-0.590565	No cache contentions.	-0.124939
-1.071745	N> class SafeArray	-0.124939
-0.463714	Example 7.15b SafeArray	-0.124939
-2.016065	on page 58	-0.124939
-0.917690	do so. 58	-0.124939
-0.359010	is becoming increasingly	-0.124939
-0.567279	are becoming increasingly	-0.124939
-0.601134	measurements as accurate	-0.124939
-0.527389	is sufficiently accurate	-0.124939
-0.600604	invest more efforts	-0.124939
-1.412689	the optimization efforts	-0.124939
-1.731158	the program starts.	-0.425969
-0.592548	ebx contains i/2+r.	-0.124939
-0.789055	for computing i/2+r.	-0.124939
-2.732311	to the reader	-0.124939
-2.707124	that the reader	-0.124939
-0.601176	manual on usability,	-0.124939
-0.577662	development time, usability,	-0.124939
-0.527389	be true. template<>	-0.124939
-0.463714	the recursion template<>	-0.124939
-1.300517	includes the low-level	-0.124939
-1.641729	access to low-level	-0.124939
-2.118818	set is available:	-0.124939
-1.076547	SSE2 is available:	-0.124939
-0.505102	point overflow: _controlfp_s(&dummy,	-0.124939
-0.358988	status: _fpreset(); _controlfp_s(&dummy,	-0.124939
-0.568003	4 ; mangled	-0.124939
-0.568003	;alignby4 ; mangled	-0.124939
-0.143416	2 12.6 Transforming	-0.124939
-0.143416	113 12.6 Transforming	-0.124939
-0.600495	stride will contend	-0.124939
-1.470693	dynamic libraries contend	-0.124939
-0.463747	a garbage collector	-0.124939
-0.463747	time-consuming garbage collector	-0.124939
-0.440683	if ((unsigned int)n	-0.425969
-0.463714	are created. Far	-0.124939
-0.358988	be huge). Far	-0.124939
-0.203923	Table of factorials:	-0.124939
-1.731763	member functions ........................................................................................	-0.124939
-1.593632	intrinsic functions ........................................................................................	-0.124939
-0.601374	control it compares	-0.124939
-0.600444	100. It compares	-0.124939
-0.474647	example below shows.	-0.124939
-0.474647	7.15b below shows.	-0.124939
-0.899920	DoThisThreeTimesAWeek(); } By	-0.124939
-0.579360	Mac platforms By	-0.124939
-0.597983	known with certainty	-0.124939
-0.597983	predict with certainty	-0.124939
-0.358988	the rightmost 1-bit	-0.124939
-0.358988	Remove right-most 1-bit	-0.124939
-1.823289	} // Or	-0.124939
-0.557802	few parameters. Or	-0.124939
-0.527376	when running. Programs	-0.124939
-0.527376	worst-case conditions. Programs	-0.124939
-0.595678	single & operation,	-0.124939
-1.139273	a shift operation,	-0.124939
-0.379859	file is closed.	-0.425969
-0.557810	library. Open source.	-0.124939
-0.505102	and open source.	-0.124939
-0.359685	int sign :1;//signbit	-0.425969
-2.318352	should be avoided,	-0.124939
-1.970069	cannot be avoided,	-0.124939
-3.070236	in the book	-0.124939
-0.358988	2001. Advanced book	-0.124939
-1.078610	transfer is avoided.	-0.124939
-1.078237	definitely be avoided.	-0.124939
-0.601439	_mm_empty(); // EMMS	-0.124939
-1.296728	by an EMMS	-0.124939
-2.277763	to do immediately	-0.124939
-0.903037	be placed immediately	-0.124939
-0.187302	memset(a, 0, sizeof(a));	-0.124939
-1.368188	(see page 105).	-0.124939
-1.282146	the register usage	-0.124939
-0.358988	chapter "Register usage	-0.124939
-2.215448	have to fix	-0.124939
-1.200122	try to fix	-0.124939
-0.201433	a[SIZE][SIZE], double b[SIZE][SIZE])	-0.425969
-0.601759	debugger and press	-0.124939
-0.789055	a key press	-0.124939
-2.283118	such as sorting	-0.124939
-0.599479	as most sorting	-0.124939
-0.667144	shared objects (*.dll,	-0.425969
-0.587750	0.40 n.a. 1.00	-0.124939
-0.587750	0.24 n.a. 1.00	-0.124939
-0.505102	2 23 ,	-0.124939
-0.463714	2 52 ,	-0.124939
-0.600604	used more efficiently.	-0.124939
-1.065367	slightly less efficiently.	-0.124939
-2.375717	If the word	-0.124939
-2.772600	in a word	-0.124939
-0.586779	B1, public B2	-0.124939
-1.772885	{ public: B2	-0.124939
-0.505102	Multithreading.............................................................................................................. 101 10.1	-0.124939
-0.358988	2007 (www.intel.com/technology/itj/). 10.1	-0.124939
-0.917690	+= a[i]; Converting	-0.124939
-0.659984	go undetected. Converting	-0.124939
-2.431153	to a matrix.	-0.124939
-1.237252	512 512 matrix.	-0.124939
-0.901793	x.b = B;	-0.124939
-0.897292	A + B;	-0.124939
-0.592261	not doing divisions.	-0.124939
-0.567255	completely independent divisions.	-0.124939
-1.269307	long dependency chains,	-0.124939
-1.144860	loop-carried dependency chains,	-0.124939
-1.940245	use of coprocessors	-0.124939
-1.549859	used as coprocessors	-0.124939
-0.591305	y; ... x.a	-0.124939
-0.789055	B, C; x.a	-0.124939
-2.647771	the same effect.	-0.124939
-1.578281	has no effect.	-0.124939
-1.076179	times to keyboard	-0.124939
-0.601057	quickly to keyboard	-0.124939
-0.065815	362880, 3628800, 39916800,	-0.425969
-1.789532	} // Approximate	-0.124939
-0.599059	n+1; // Approximate	-0.124939
-0.202862	x++) { Table[x]	-0.425969
-0.492137	the const restriction	-0.124939
-0.463714	= B; x.c	-0.124939
-0.358988	+ 2.; x.c	-0.124939
-2.218839	will be misleading	-0.124939
-0.586782	sometimes give misleading	-0.124939
-0.600314	aligning data #ifdef	-0.124939
-0.358988	Example 8.22 #ifdef	-0.124939
-0.864104	Program installation ..................................................................................................	-0.124939
-1.124793	Bounds checking ..................................................................................................	-0.124939
-0.358988	Example 12.8a. Sum	-0.124939
-0.358988	Example 12.8b. Sum	-0.124939
-1.139588	+ 1.; x.b	-0.124939
-0.463714	= A; x.b	-0.124939
-0.947878	and VIA CPUs".	-0.124939
-0.593296	ebx,eax / shr	-0.124939
-0.575684	$B1$2: mov shr	-0.124939
-1.406637	64 bits each.	-0.124939
-1.513829	8 bytes each.	-0.124939
-0.541310	a/1=a xxxxxxxxx 0/a=0	-0.124939
-0.358988	a/1=a x-xxx-x-- 0/a=0	-0.124939
-1.526780	avoided by replacing	-0.124939
-0.598258	fact by replacing	-0.124939
-2.394046	a = 1.0f	-0.124939
-0.847496	0) ? 1.0f	-0.124939
-1.781935	of this manual,	-0.124939
-1.684514	In this manual,	-0.124939
-1.078279	fragmented and involve	-0.124939
-2.041137	it may involve	-0.124939
-1.061913	vector always Optimize	-0.124939
-1.276575	compiler Linux Optimize	-0.124939
-1.096433	sum += list[i];	-0.124939
-0.828418	sum1 += list[i];	-0.124939
-0.601439	1.f); // initialize	-0.124939
-0.505118	// sum, initialize	-0.124939
-0.541310	0 n! 117	-0.124939
-0.358988	for vectorization............................................................. 117	-0.124939
-0.601625	browsing that previously	-0.124939
-0.358988	was assigned previously	-0.124939
-0.892963	class libraries 113	-0.124939
-0.527376	classes ............................................................................................. 113	-0.124939
-0.483941	division, square root	-0.124939
-0.483941	Division, square root	-0.124939
-0.601759	algebra and statistics,	-0.124939
-1.710763	functions for statistics,	-0.124939
-0.588071	compiled three times,	-0.124939
-1.162677	long response times,	-0.124939
-0.902885	overcome the obstacle	-0.124939
-0.600980	often an obstacle	-0.124939
-0.550882	by Agner Fog.	-0.124939
-0.391189	By Agner Fog.	-0.124939
-0.527389	exp exp 12.8	-0.124939
-0.505102	vectors........................................................................ 119 12.8	-0.124939
-2.190361	from the IDE	-0.124939
-0.600980	Has an IDE	-0.124939
-0.575684	vector access. 12.9	-0.124939
-0.557810	memory................................................................. 120 12.9	-0.124939
-1.713667	way of removing	-0.124939
-1.549301	simply by removing	-0.124939
-0.575711	A test setup	-0.124939
-0.575711	simple test setup	-0.124939
-0.359006	platform _WIN64 _LP64	-0.124939
-0.359006	_LP64 _WIN64 _LP64	-0.124939
-1.750701	a simple type,	-0.124939
-0.592830	of known type,	-0.124939
-0.600787	16) { b.load(bb+i);	-0.124939
-1.663892	consecutive elements b.load(bb+i);	-0.124939
-2.367910	because the factorials	-0.124939
-1.082960	the reciprocal factorials	-0.124939
-3.207749	of the subroutine	-0.124939
-1.691906	a separate subroutine	-0.124939
-0.601759	2.6.30 and later.	-0.124939
-1.378173	SSE2 or later.	-0.124939
-0.541310	......................................................................................... 107 12.4	-0.124939
-0.541310	vector division. 12.4	-0.124939
-0.463714	........................................................................................ 109 12.5	-0.124939
-0.358988	next section. 12.5	-0.124939
-0.483950	use algebraic manipulations	-0.124939
-0.483950	because algebraic manipulations	-0.124939
-1.751519	of memory leaks	-0.124939
-0.593773	prevent memory leaks	-0.124939
-0.591958	C standard says	-0.124939
-0.358988	usage convention says	-0.124939
-1.067304	== 2 12.6	-0.124939
-0.463714	............................................................................................. 113 12.6	-0.124939
-1.368188	(see page 140).	-0.124939
-0.463714	vectorization............................................................. 117 12.7	-0.124939
-0.358988	called. 118 12.7	-0.124939
-1.067304	fraction 2 52	-0.124939
-0.593081	structure }; 52	-0.124939
-2.215448	have to obey	-0.124939
-1.867952	has to obey	-0.124939
-1.318096	on page 72.	-0.124939
-0.203497	a*b = b*a	-0.124939
-1.078610	containers is 95	-0.124939
-2.209017	See page 95	-0.124939
-0.541321	vector elements. 12.1	-0.124939
-0.463714	operations............................................................................................... 105 12.1	-0.124939
-2.763279	if the time-critical	-0.124939
-0.601773	longjmp in time-critical	-0.124939
-0.541310	distant future. 12.3	-0.124939
-0.541310	.......................................................... 107 12.3	-0.124939
-1.078578	implement a universal	-0.124939
-0.861589	time. No universal	-0.124939
-2.441602	instruction set Suppl.	-0.124939
-0.358988	SSE3 pmmintrin.h Suppl.	-0.124939
-1.296352	program will crash.	-0.124939
-1.180886	and system crash.	-0.124939
-0.358988	x-xxx---- a*b*c=a*(b*c) a+b+c+d	-0.124939
-0.358988	(a+b)+c=a+(b+c) a+b+c=c+b+a a+b+c+d	-0.124939
-0.902658	expect to 99	-0.124939
-0.527376	control .............................................................................................. 99	-0.124939
-2.303938	{ // Remove	-0.124939
-0.590118	Eliminate branches Remove	-0.124939
-0.861618	intermediate code, interpreters,	-0.124939
-0.505102	graphics frameworks, interpreters,	-0.124939
-0.902488	5 and 9.	-0.124939
-0.851867	element level 9.	-0.124939
-0.598400	constructor, if any,	-0.124939
-0.598400	destructor, if any,	-0.124939
-0.900271	must use thread-safe	-0.124939
-1.372557	functions. A thread-safe	-0.124939
-0.848465	x[]); void F3(bool	-0.124939
-0.573889	9.2b void F3(bool	-0.124939
-1.078320	file in exclusive	-0.124939
-1.077966	container for exclusive	-0.124939
-1.419661	in table 9.2.	-0.124939
-0.588074	SSE2 Table 9.2.	-0.124939
-1.481669	i < NumberOfTests;	-0.425969
-0.600495	counters will stay	-0.124939
-0.999851	not necessarily stay	-0.124939
-1.424981	the 64-bit extension	-0.124939
-0.579351	A further extension	-0.124939
-2.397415	is the "best	-0.124939
-0.601759	case" and "best	-0.124939
-0.597891	(without member functions)	-0.124939
-0.358988	unreferen- ced functions)	-0.124939
-1.078610	iteration is repeated	-0.124939
-0.902146	then be repeated	-0.124939
-1.365521	const int FactorialTable[13]	-0.425969
-0.875559	Wednesday | Friday)	-0.124939
-0.858825	Day == Friday)	-0.124939
-0.575708	polymorphic child function:	-0.124939
-0.764722	the lrint function:	-0.124939
-2.219225	// Example 8.21	-0.124939
-0.887038	loop. Example 8.21	-0.124939
-1.201529	CriticalFunction = &CriticalFunction_AVX;	-0.124939
-0.895225	supported return &CriticalFunction_AVX;	-0.124939
-1.113779	vector operations. 105	-0.124939
-0.358988	vector operations............................................................................................... 105	-0.124939
-2.623314	the compiler interpret	-0.124939
-1.627785	and then interpret	-0.124939
-1.129335	or 1. Writing	-0.124939
-0.527376	C++ programs. Writing	-0.124939
-0.143416	the FDIV bug	-0.124939
-0.143416	The FDIV bug	-0.124939
-0.503947	c); // Compare	-0.425969
-0.065815	swapd(x,y) {temp=x; x=y;	-0.425969
-0.601379	range"); or better,	-0.124939
-0.557802	needed. Even better,	-0.124939
-2.127747	want to flip	-0.124939
-1.077322	0x80000000; // flip	-0.124939
-1.118850	of four float's	-0.124939
-0.820854	or four float's	-0.124939
-0.976056	take several minutes	-0.124939
-0.564897	took several minutes	-0.124939
-0.600472	cc[i]); } 109	-0.124939
-0.659984	functions ........................................................................................ 109	-0.124939
-1.770277	b = (a+1)	-0.124939
-1.776058	c = (a+1)	-0.124939
-1.542343	for other reasons,	-0.124939
-0.596531	For these reasons,	-0.124939
-0.557802	table. Even better:	-0.124939
-0.358988	specific option) better:	-0.124939
-1.823200	of each method,	-0.124939
-1.298011	the simplest method,	-0.124939
-1.888478	the variable whose	-0.124939
-0.577679	stride. Variables whose	-0.124939
-0.856236	new and delete,	-0.124939
-1.820350	object is accessed,	-0.124939
-1.076547	element is accessed,	-0.124939
-0.203497	(a&b)|(a&c) = a&(b|c)	-0.124939
-1.936845	useful for assigning	-0.124939
-1.675012	this by assigning	-0.124939
-1.728773	is only 10%	-0.124939
-1.004422	is true 10%	-0.124939
-0.358988	books 1994. Mostly	-0.124939
-0.358988	Wesley 1997. Mostly	-0.124939
-0.910615	in manual 4:	-0.425969
-0.560108	temp / 4;	-0.124939
-0.560108	(a+1) / 4;	-0.124939
-0.873063	some microprocessors have.	-0.124939
-0.550854	end users have.	-0.124939
-1.713667	way of relieving	-0.124939
-2.200484	used for relieving	-0.124939
-1.077884	rows = 10,	-0.124939
-0.563101	double 8, 10,	-0.124939
-0.550863	ALIGN ?Func@@YAXQAHAAH@Z ENDP	-0.124939
-0.358988	4 ?Func2@@YAXQAHAAH@Z ENDP	-0.124939
-0.601439	seconds; // incremented	-0.124939
-1.767089	has been incremented	-0.124939
-0.858817	it calls. 48	-0.124939
-0.358988	Functions ................................................................................................................ 48	-0.124939
-0.579364	(Day == Tuesday	-0.124939
-0.575689	= 2, Tuesday	-0.124939
-1.608207	is calculated internally	-0.124939
-1.462040	is implemented internally	-0.124939
-0.864124	an unused fourth	-0.124939
-0.463714	columns. Every fourth	-0.124939
-0.598535	contain many tips	-0.124939
-1.189091	and some tips	-0.124939
-0.902280	higher for shared_ptr	-0.124939
-0.358988	by assignment. shared_ptr	-0.124939
-1.014570	Linux and BSD.	-0.124939
-0.504970	worth the effort.	-0.124939
-0.595570	systems available today.	-0.124939
-0.557810	for Java today.	-0.124939
-0.580829	meaning. 2. Put	-0.124939
-0.358988	in question: Put	-0.124939
-1.285309	version int CriticalFunction_AVX(int	-0.124939
-0.596756	127 int CriticalFunction_AVX(int	-0.124939
-0.600141	sqaure: for (r2	-0.124939
-0.600141	separately: for (r2	-0.124939
-1.651719	this by writing:	-0.124939
-0.895519	explicitly by writing:	-0.124939
-0.201656	B1; class B2;	-0.124939
-2.707124	that the overall	-0.124939
-0.902190	measuring the overall	-0.124939
-0.249867	object. 7.17 Structures	-0.124939
-0.249867	50 7.17 Structures	-0.124939
-1.378408	files and executables.	-0.124939
-0.600179	Use different executables.	-0.124939
-1.503629	language is inherently	-0.124939
-2.302631	that are inherently	-0.124939
-0.601844	questions to me.	-0.124939
-0.899744	one from me.	-0.124939
-1.300225	call a non-virtual	-0.124939
-1.498527	resources than non-virtual	-0.124939
-1.854674	method is safer.	-0.124939
-1.482484	but also safer.	-0.124939
-2.177816	value of b+c	-0.124939
-0.527389	c first. b+c	-0.124939
-0.143416	n.a. __unix__ __linux__	-0.124939
-0.143416	__linux__ __unix__ __linux__	-0.124939
-1.272284	int 16 -32768	-0.124939
-0.901817	a+b = b+a	-0.124939
-3.208921	of the factorials,	-0.124939
-1.504329	matter of convenience	-0.124939
-1.736242	less than 231.	-0.124939
-1.951190	{ return pow(x,10);	-0.124939
-0.597985	big software companies	-0.124939
-0.600505	systems will dominate	-0.124939
-0.594523	code, specific preferences	-0.124939
-0.861672	or #pragma optimize("a",on).	-0.124939
-0.580857	function #pragma optimize(...)	-0.124939
-0.505122	x--x----- ---x----- x---x---x	-0.124939
-0.836084	1)sign 2exponent 16383	-0.124939
-2.441924	instruction set extensions.	-0.124939
-0.527404	10 μs today,	-0.124939
-2.640857	// Example 14.5b	-0.124939
-2.640857	// Example 14.5a	-0.124939
-0.550887	constructor specifying otherwise.	-0.124939
-0.359002	sum for(inti=0;i<16;i+=4){ //Loopby4	-0.124939
-0.598586	(b1 * b2);	-0.124939
-0.359002	1./40320., 1./362880., 1./3628800.,	-0.124939
-1.995759	have a niche	-0.124939
-0.597888	int)i < 10)	-0.124939
-0.541332	x; __asm fistp	-0.124939
-0.582144	(In Windows, SetThreadAffinityMask,	-0.124939
-1.299114	systems are common,	-0.124939
-1.693974	of data (low	-0.124939
-0.597988	unfortunately very common.	-0.124939
-0.586057	allows bigger segments	-0.124939
-0.359002	5.5 Mac: Darwin8	-0.124939
-0.851896	element level 108	-0.124939
-0.463732	is fast, compact,	-0.124939
-2.743091	It is 102	-0.124939
-0.660012	an anonymous namespace.	-0.124939
-1.928707	of an update,	-0.124939
-1.502069	data. The similarity	-0.124939
-0.726952	on contemporary 106	-0.124939
-0.598354	outside any function)	-0.124939
-1.300256	Use a "move	-0.124939
-0.463732	"Gnu indirect function"	-0.124939
-0.359002	Wednesday, Thursday, Friday,	-0.124939
-0.463732	int (16 bits),	-0.124939
-0.541332	point division. Correction	-0.124939
-0.601674	Security. The vulnerability	-0.124939
-2.670223	for the reinstallation	-0.124939
-1.549388	simply by ignoring	-0.124939
-1.395846	the four sums	-0.124939
-0.890453	(N & N-1)==0	-0.124939
-0.359002	1., 1./2., 1./6.,	-0.124939
-0.580846	float b) {x	-0.124939
-0.982216	array element. Rather	-0.124939
-0.580851	have inefficient code-based	-0.124939
-0.877734	that model N+1	-0.124939
-2.568185	and the 512-bit	-0.124939
-2.640857	// Example 7.6.	-0.124939
-0.601394	violate or circumvent	-0.124939
-0.463732	xxxxxxxxx 0/a=0 ---x---xx	-0.124939
-1.214344	in my blog.	-0.124939
-0.359002	www.open- std.org/jtc1/sc22/wg21/docs/TR18015.pdf. OpenMP.	-0.124939
-0.584309	non-Intel CPUs. Includes	-0.124939
-0.567269	edx, eax $B2$2	-0.124939
-0.463732	have undesired effects.	-0.124939
-0.359002	Jr.: "Hacker's Delight".	-0.124939
-0.596162	make 32 AND-operations	-0.124939
-1.078334	optimizations in precompiled	-0.124939
-1.203346	because a typo	-0.124939
-2.066598	(see page 51).	-0.124939
-0.359002	Compiler Documentation". Included	-0.124939
-0.818712	restarted anyway. Updates	-0.124939
-0.359002	characters '?', '@'	-0.124939
-1.829000	member functions (methods)	-0.124939
-2.006334	a loop count.	-0.124939
-0.601056	-S - masm=intel	-0.124939
-0.595021	I must warn	-0.124939
-0.359002	time measurements: warm	-0.124939
-1.077217	has not noticed	-0.124939
-1.630418	different C++ constructs........................................................................	-0.124939
-2.265201	n.a. - a<<b<<c	-0.124939
-0.899732	as memory leak.	-0.124939
-0.570622	a similar utility	-0.124939
-1.078317	complicated and clumsy,	-0.124939
-0.601674	table. The 16-byte	-0.124939
-1.367423	to all zeroes.	-0.124939
-1.189149	for some caveats.	-0.124939
-0.563118	the label $B1$2:.	-0.124939
-0.601772	spell-checking and repagination	-0.124939
-1.679574	point to a[i+2]	-0.124939
-2.623621	the compiler recognizes	-0.124939
-2.593306	is not recognized	-0.124939
-0.359002	to _endthread() cleans	-0.124939
-1.044352	& operator (bitwise	-0.124939
-0.463732	of cross-platform portability.	-0.124939
-1.712749	be calculated independently.	-0.124939
-2.640857	// Example 9.5b	-0.124939
-1.775238	time to answer	-0.124939
-1.339392	the STL (Standard	-0.124939
-2.640857	// Example 13.2.	-0.124939
-0.903059	PTR [edx] adds,	-0.124939
-0.900290	or use objconv	-0.124939
-1.695336	a specific purpose:	-0.124939
-0.557829	are serious limitations	-0.124939
-0.601423	x10 = x8*x2;	-0.124939
-0.600742	overcome this limitation).	-0.124939
-0.527397	x 43 speculatively	-0.124939
-3.071155	in the disassembly	-0.124939
-0.599724	CPU cache (en.wikipedia.org/wiki/L2_cache).	-0.124939
-0.898255	when no attempt	-0.124939
-1.627842	and then 0+1.23456	-0.124939
-0.463732	- (time before)	-0.124939
-1.851491	of memory blocks.	-0.124939
-1.351454	header file timingtest.h	-0.124939
-0.902673	dispatching to C1::Disp()	-0.124939
-0.600016	special loop predictor.	-0.124939
-0.601423	8*1024/64 = 128.	-0.124939
-1.673175	not always accurate,	-0.124939
-0.527397	long, double. Misaligned	-0.124939
-2.568185	and the FAQ	-0.124939
-0.597540	been called before.	-0.124939
-3.208921	of the Xnu	-0.124939
-0.505122	else being initialized.	-0.124939
-2.278059	to do so).	-0.124939
-1.335851	the compiler. Remember,	-0.124939
-0.359002	-ffast-math /fp:fast /fp:fast=2	-0.124939
-0.593973	(e.g. option /MT).	-0.124939
-0.359002	library (VML, MKL).	-0.124939
-1.459254	: public B1	-0.124939
-2.295065	is that CParent::Hello()	-0.124939
-2.029566	that a user-defined	-0.124939
-0.580860	finding hot spots,	-0.124939
-1.855080	calculation of B.	-0.124939
-1.202540	methods for exploiting	-0.124939
-1.078566	(number of sets).	-0.124939
-1.625523	have been lost	-0.124939
-1.741233	a[i] = i+1;	-0.124939
-1.078631	thread is terminated.	-0.124939
-0.557829	Another serious burden	-0.124939
-0.660012	= a&&(b||c) (a&&!b)	-0.124939
-2.422609	to use SafeArray:	-0.124939
-0.885319	char 128 Is8vec16	-0.124939
-1.795590	a particular weakness	-0.124939
-0.598765	Standard C++ imple-	-0.124939
-0.527397	image processing. Yeppp.	-0.124939
-0.577673	problems. Avoid nested	-0.124939
-1.571132	const & source)	-0.124939
-0.505122	group books 1994.	-0.124939
-0.600272	operands has side	-0.124939
-1.848315	you have ample	-0.124939
-1.406732	64 bits (MMX),	-0.124939
-1.736242	less than 1/50	-0.124939
-0.505122	(multithreaded) /arch:AVX /openmp	-0.124939
-1.193115	because we forgot	-0.124939
-0.875592	The three clauses	-0.124939
-0.359002	mask. Poor reproducibility.	-0.124939
-1.550603	x = -abs(x);.	-0.124939
-0.601056	----- - x-xxx	-0.124939
-0.463732	ReadTSC(); CriticalFunction(); timediff[i]	-0.124939
-0.359002	be signed. Be	-0.124939
-0.579382	initialize sum for(inti=0;i<16;i+=4){	-0.124939
-1.691971	an error message.	-0.124939
-0.597547	subtask before coordination	-0.124939
-1.295275	a more distant	-0.124939
-0.900563	Weekdays { Sunday	-0.124939
-1.058375	label ; restore	-0.124939
-0.359002	a, sizeof(b)); 47	-0.124939
-0.463732	C++". Addison-Wesley, 1996.	-0.124939
-1.495252	available from www.agner.org/optimize.	-0.124939
-0.359002	< NUMROWS; row++)	-0.124939
-2.310817	is to resume	-0.124939
-0.359002	----- ~(~a)=a x-xxxxx--	-0.124939
-1.495154	same memory areas.	-0.124939
-0.359002	__asm fld qword	-0.124939
-1.201678	test // Repeat	-0.124939
-1.676091	exception handling /EHs-	-0.124939
-2.773232	in a union:	-0.124939
-0.599480	SelectAddMul example (12.4e)	-0.124939
-0.594350	low-priority thread steals	-0.124939
-0.359002	of ADC (add	-0.124939
-1.829945	object is moved,	-0.124939
-2.534636	to be moved.	-0.124939
-1.074155	different memory areas,	-0.124939
-0.463732	52 , longdoublevalue	-0.124939
-0.597077	a rather unconventional	-0.124939
-0.600946	modify x *const_cast<int*>(&x)	-0.124939
-1.635459	position-independent code (option	-0.124939
-0.463732	--xxxx-xx a*1=a x-xxxxx-x	-0.124939
-0.359002	or malloc. Handles	-0.124939
-1.679397	kinds of strange	-0.124939
-1.129391	to become invalid,	-0.124939
-0.463732	cache MOVNTDQ _mm_stream_si128	-0.124939
-0.359002	option (Windows: /Gy,	-0.124939
-0.527397	them enabled (there	-0.124939
-2.283633	such as sqrt	-0.124939
-0.601423	a.y = b.y	-0.124939
-1.052070	not support SSE.	-0.124939
-0.842362	the fraction bits:	-0.124939
-0.595826	64 0 264-1	-0.124939
-0.601164	than code generality.	-0.124939
-2.394433	a = sin(0.8);	-0.124939
-0.601772	esp+8 and esp+12	-0.124939
-0.577673	for adding bounds-checking	-0.124939
-0.601631	auto_ptr that owns	-0.124939
-0.597422	>= 0; i--,	-0.124939
-0.359002	or vice versa.	-0.124939
-2.541450	the function body.	-0.124939
-0.601394	wstring or CString	-0.124939
-0.764749	513 513 58.7	-0.124939
-0.601448	Serialize // Prevent	-0.124939
-0.858837	A limited "express"	-0.124939
-0.463732	and complexity (en.wikipedia.org/wiki/Standard_Template_Library).	-0.124939
-1.078555	bit to zero:	-0.124939
-1.049166	float x; *(int*)&x	-0.124939
-0.359002	code (option -fno-pic).	-0.124939
-0.599833	example, one tread	-0.124939
-0.597477	= 4 rows.	-0.124939
-0.601876	bus is saturated.	-0.124939
-1.203733	follow the rows,	-0.124939
-0.888492	result. An uncaught	-0.124939
-2.177794	by a macro,	-0.124939
-0.359002	year. Ignoring virtualization.	-0.124939
-0.601851	advised to seek	-0.124939
-2.657080	of a macro.	-0.124939
-1.377817	microprocessors are constructed.	-0.124939
-0.598863	all static data,	-0.124939
-1.181371	} void FuncB	-0.124939
-1.077896	option that limits	-0.124939
-0.764737	x^4 F32vec4 xx4(x4);	-0.124939
-2.719997	is a precious	-0.124939
-0.359002	AMD Family 15h	-0.124939
-0.597985	irrelevant software installed,	-0.124939
-2.534636	to be installed.	-0.124939
-1.497749	to floating point:	-0.124939
-0.601665	11.1 for IA-32/Intel64,	-0.124939
-0.463732	16 SSSE3 _mm_perm_epi8	-0.124939
-0.601056	int)(i - min)	-0.124939
-2.429856	with the LLVM	-0.124939
-2.640857	// Example 7.40a	-0.124939
-2.640857	// Example 7.40b	-0.124939
-2.640857	// Example 7.40c	-0.124939
-0.505122	must compute (FuncRow(i)*columns	-0.124939
-1.213160	OS X (Darwin)	-0.124939
-1.740246	we can see,	-0.124939
-0.601963	stress the importance	-0.124939
-1.673175	not always comparable	-0.124939
-1.200811	implemented with interpretation.	-0.124939
-0.901817	xn = x∙xn-1,	-0.124939
-0.550877	only happens rarely.	-0.124939
-0.580857	aliased #pragma optimize("a",	-0.124939
-0.599896	rounding, but neither	-0.124939
-1.417899	to predict correctly	-0.124939
-1.078566	(number of sets)	-0.124939
-1.597697	of function libraries........................................................................................	-0.124939
-0.902673	comes to mind.	-0.124939
-1.437307	supported instruction sets,	-0.124939
-0.601674	conventions. The dot	-0.124939
-0.599166	(a1*b2 + a2*b1)	-0.124939
-1.600962	are often mispredicted.	-0.124939
-1.888682	the variable Day.	-0.124939
-2.534636	to be mispredicted,	-0.124939
-1.203991	Automatic vectorization Good	-0.124939
-0.463732	// (time after)	-0.124939
-1.069830	const float OneOrTwo5[2]	-0.124939
-0.563110	are temporary intermediates,	-0.124939
-1.379294	p is incremented.	-0.124939
-1.767402	has been incremented,	-0.124939
-0.359002	this works, here's	-0.124939
-1.641803	size is insufficient.	-0.124939
-0.601394	he or she	-0.124939
-1.899809	or a not-too-big	-0.124939
-1.625110	of cache evictions	-0.124939
-1.756537	the sign bit,	-0.124939
-1.430117	Instruction set Prefetch	-0.124939
-0.899108	each CPU core).	-0.124939
-0.505131	__restrict __declspec( noalias)	-0.124939
-0.902300	Time for transposition	-0.124939
-0.599544	invalidate each other's	-0.124939
-1.012201	are provided below,	-0.124939
-0.463732	effectively preventing illegitimate	-0.124939
-0.858842	and prevent legitimate	-0.124939
-0.541332	header files. 121	-0.124939
-1.004489	the logical architecture	-0.124939
-0.598550	hold many renamed	-0.124939
-2.743091	It is unacceptable	-0.124939
-1.923513	and other hardware-related	-0.124939
-1.784518	byte at 12,	-0.124939
-0.463732	operations (chapter 12)	-0.124939
-0.601772	2005; and "More	-0.124939
-0.359002	Scott Meyers: "Effective	-0.124939
-0.601448	&SelectAddMul_dispatch; // Dispatcher	-0.124939
-0.597491	die. See www.gnu.org/copyleft/fdl.html.	-0.124939
-1.601190	example, the Boost	-0.124939
-0.463732	/ shr ebx,31	-0.124939
-0.550877	where security matters.	-0.124939
-0.588671	Or #include <ia32intrin.h>	-0.124939
-0.601851	operations to finish.	-0.124939
-1.198537	programs use inappropriate	-0.124939
-0.595824	{ case 0:	-0.124939
-0.601772	64 and IA-32	-0.124939
-0.601230	mixed with x87	-0.124939
-0.463732	a+b=b+a a*b=b*a a+b+c=a+(b+c)	-0.124939
-1.798598	b = 6.0f;	-0.124939
-0.527397	inside {} brackets.	-0.124939
-0.902673	set to NULL.	-0.124939
-0.901262	than as b*(2.0/3.0)	-0.124939
-0.601394	(.lib or .a),	-0.124939
-2.763419	to the tolerance	-0.124939
-0.550882	libraries Test Processor	-0.124939
-0.589204	disk cache. Files	-0.124939
-0.600296	usability, program compactness,	-0.124939
-0.901544	implemented by (partial)	-0.124939
-0.359002	tricks Michael Abrash:	-0.124939
-0.900339	from time T+1	-0.124939
-0.591976	1 : 0]	-0.124939
-0.359002	Addison-Wesley. Third Edition,	-0.124939
-2.151139	in example 14.7b	-0.124939
-0.961003	user interface. Otherwise	-0.124939
-1.069733	first two suggested	-0.124939
-2.640857	// Example 14.3a	-0.124939
-2.640857	// Example 14.3b	-0.124939
-1.195033	link library (DLL)	-0.124939
-0.836084	#define FUNCNAME SelectAddMul_SSE41	-0.124939
-1.192294	+ 2 thenaandbcannot	-0.124939
-0.359002	2006 (Red Hat).	-0.124939
-0.359002	-231 231-1 int32_t	-0.124939
-1.077996	or for issuing	-0.124939
-0.902723	member is unchanged	-0.124939
-1.398568	the stack. Deallocation	-0.124939
-0.871623	their own initiative	-0.124939
-1.077392	for code bloat	-0.124939
-0.567266	is inefficient. Division,	-0.124939
-0.541338	syntax 90 Gives	-0.124939
-1.594795	same as C-	-0.124939
-0.359002	7.29b floata; boolb=0;	-0.124939
-0.601772	Java and C#	-0.124939
-0.563114	or false (0);	-0.124939
-0.600990	exceeds an acceptable	-0.124939
-0.601772	FuncA and FuncB,	-0.124939
-0.463732	rightmost 1-bit removed.	-0.124939
-1.074533	this will trigger	-0.124939
-1.920422	and a basic	-0.124939
-0.359002	< columns; j++)	-0.124939
-0.899686	values at once...................................	-0.124939
-0.868292	for switch statements,	-0.124939
-2.834872	it is compiling.	-0.124939
-1.078566	sources of frustration	-0.124939
-1.198233	takes only 2-3	-0.124939
-0.601580	map are prone	-0.124939
-0.726968	3.14 Context switches.....................................................................................................	-0.124939
-1.133449	to go deeper	-0.124939
-0.587474	if(!(a || b))	-0.124939
-0.590137	one operator less.	-0.124939
-2.657080	of a "function".	-0.124939
-0.505122	simultaneous lookups Max.	-0.124939
-0.595696	T & operator[]	-0.124939
-2.130427	set is enabled:	-0.124939
-1.976890	the object owns.	-0.124939
-2.159356	there are wrapper	-0.124939
-0.764749	__INTEL_COMPILER __INTEL_COMPILER 161	-0.124939
-0.463732	_M_X64 _M_X64 162	-0.124939
-0.901946	pattern can be,	-0.124939
-0.570629	in linear algebra)	-0.124939
-1.203286	used to be.	-0.124939
-0.601448	work // Re-do	-0.124939
-0.601855	laws of algebra.	-0.124939
-1.499665	longer time slices.	-0.124939
-2.251997	then the transformation	-0.124939
-0.901867	x^4 // x^8	-0.124939
-0.598522	glibc version 2.11	-0.124939
-0.527404	test. disable power-save	-0.124939
-0.600055	two other situations:	-0.124939
-0.359002	a sensible balance	-0.124939
-0.660012	n.a. 1.00 0.35	-0.124939
-1.203386	model is over.	-0.124939
-1.295708	or an over-	-0.124939
-0.588094	b<c && a<c)	-0.124939
-1.824927	y = a1/b1	-0.124939
-0.591976	"m"(x) : "memory"	-0.124939
-0.359002	for IA-32/Intel64, 2009.	-0.124939
-1.870103	in some situations,	-0.124939
-0.828405	a false vendor	-0.124939
-0.359002	<float, 100> list;	-0.124939
-1.823421	} // Non-polymorphic	-0.124939
-1.680333	a good investment.	-0.124939
-0.595698	16-byte instructions MOVNTPS,	-0.124939
-1.742098	double precision (80	-0.124939
-0.900939	100; int matrix[NUMROWS][NUMCOLUMNS];	-0.124939
-0.359002	return prediction). 149	-0.124939
-0.940853	_mm_storeu_si128((__m128i *)d, x);}	-0.124939
-2.177960	value of cc[i]+2	-0.124939
-0.597210	32 64 Iu32vec2	-0.124939
-1.169276	int 128 Iu32vec4	-0.124939
-1.948839	i++) { time1	-0.124939
-1.341964	for making plug-ins	-0.124939
-0.589197	char 256 Vec32c	-0.124939
-0.600742	outside this interval,	-0.124939
-2.048481	at compile time?	-0.124939
-1.201578	SelectAddMul_pointer = &SelectAddMul_AVX2;	-0.124939
-0.550887	a Core i7	-0.124939
-1.308202	software development kit	-0.124939
-1.785855	the array i)	-0.124939
-0.550877	Vec4f polynomial (Vec4f	-0.124939
-2.568185	and the transitions	-0.124939
-0.527397	is cached. Usually	-0.124939
-0.359002	int u[2]} a[size];	-0.124939
-0.726952	Cache organization ...................................................................................................	-0.124939
-0.902673	element to x?"	-0.124939
-0.590560	has problems separating	-0.124939
-0.598871	32-bit number (the	-0.124939
-1.278281	}; void g()	-0.124939
-0.600943	programming, compiler technology,	-0.124939
-1.785855	the array 800	-0.124939
-2.006980	cannot be tolerated.	-0.124939
-0.598082	exceed 2 Gbytes.	-0.124939
-0.598586	CChild1 * p1;	-0.124939
-0.600983	typedef int CriticalFunctionType(int	-0.124939
-0.359002	is created, deleted,	-0.124939
-1.675250	a - a/1	-0.124939
-0.505122	of C++. Yet,	-0.124939
-0.597210	__int64 64 -263	-0.124939
-1.171578	it was assigned	-0.124939
-0.541332	principles here: functional	-0.124939
-0.601145	written as 2eee	-0.124939
-2.593306	is not human	-0.124939
-2.334742	as a plug-in	-0.124939
-0.660012	#include <xmmintrin.h> _mm_setcsr(_mm_getcsr()	-0.124939
-1.736242	less than 2-20,	-0.124939
-2.192153	are not yet	-0.124939
-0.601772	expansions and Newton-Raphson	-0.124939
-2.393331	should be regarded	-0.124939
-0.599544	draw each pixel	-0.124939
-1.335882	a thread affinity	-0.124939
-0.806134	?Func@@YAXQAHAAH@Z ?Func@@YAXQAHAAH@Z PROCNEAR	-0.124939
-0.599571	also page 119).	-0.124939
-2.573352	on the stack).	-0.124939
-0.359002	few decades ago,	-0.124939
-2.057727	you may reuse	-0.124939
-0.359002	any answer. Beginners	-0.124939
-0.601448	&SelectAddMul_SSE2; // Error:	-0.124939
-1.630175	a memory pool,	-0.124939
-1.625523	have been reordered,	-0.124939
-0.463732	23 , doublevalue	-0.124939
-0.359002	Performance Primitives (IPP).	-0.124939
-0.600355	project at hand.	-0.124939
-2.773232	in a hand-	-0.124939
-0.601772	(& and |)	-0.124939
-0.588096	by better standardization	-0.124939
-0.359002	planned solutions. Patches	-0.124939
-0.601772	structured and object-oriented	-0.124939
-0.597416	not call WriteFile	-0.124939
-0.463732	the symbolic link.	-0.124939
-1.468048	CPU dispatch strategies........................................................................................	-0.124939
-1.067237	of performance monitoring	-0.124939
-0.527397	case 1: printf("Beta");	-0.124939
-1.745898	efficient to re-use	-0.124939
-0.601876	copy is dead	-0.124939
-1.598213	cache. The Core2	-0.124939
-1.846652	a different meaning.	-0.124939
-1.795590	a particular meaning,	-0.124939
-0.541332	Last updated 2014-08-07.	-0.124939
-0.901262	internally as (int)&matrix[0][0]	-0.124939
-2.304254	{ // Safe	-0.124939
-0.599896	i but i*12,	-0.124939
-1.314512	the keyword __thread	-0.124939
-0.990969	target buffer (BTB).	-0.124939
-1.174070	has several meanings	-0.124939
-0.593331	order calculation capabilities.	-0.124939
-1.672453	the next paragraph.	-0.124939
-0.900939	}; int Sum2(S3	-0.124939
-0.359002	ISO/IEC TR 18015,	-0.124939
-0.600193	microprocessors, different alignments	-0.124939
-2.581419	floating point status:	-0.124939
-0.575704	vector classes. Including	-0.124939
-0.591983	e.g. every millisecond.	-0.124939
-0.858846	b;}; S1 list[100],	-0.124939
-0.601674	microprocessor The benchmark	-0.124939
-0.359002	in nn ifbit=1	-0.124939
-1.361929	this example, f(x)	-0.124939
-1.231450	as follows: floatvalue	-0.124939
-0.726968	class definition. Inlining	-0.124939
-0.570622	that seconds remains	-0.124939
-1.078891	under the worst-	-0.124939
-1.641813	list of titles.	-0.124939
-2.640857	// Example 11.2a	-0.124939
-0.961003	identification (RTTI) /GR–	-0.124939
-0.591318	public: ... ~C1();	-0.124939
-0.599166	operator + (vector	-0.124939
-1.078891	Obviously, the initial	-0.124939
-0.359002	compilers www.agner.org/ optimize/#vectorclass	-0.124939
-0.598586	powN<true,N/2>::p(x) * powN<true,N/2>::p(x);	-0.124939
-1.353774	are called accumulators.	-0.124939
-2.016540	on page 22.	-0.124939
-1.394777	in fact addressed	-0.124939
-1.181371	} void F0()	-0.124939
-1.055659	Intel-based Mac OS,	-0.124939
-0.505122	& earlier vmlsExp4	-0.124939
-1.077524	compile with -mcmodel=large,	-0.124939
-0.601851	-128 to +127.	-0.124939
-0.599358	b double precision:	-0.124939
-0.597888	n < 223	-0.124939
-1.545517	+ 1; x[1]	-0.124939
-0.902513	Goedecker and Adolfy	-0.124939
-1.077908	SIZE = 64;	-0.124939
-0.597985	many software products	-0.124939
-0.596154	Include file dvec.h	-0.124939
-1.470784	dynamic libraries (.dll	-0.124939
-0.594350	in several stages	-0.124939
-2.408411	function is InstructionSet().The	-0.124939
-2.155293	size of 64.	-0.124939
-0.359002	and USB sticks	-0.124939
-1.532072	multiple threads Parallelization	-0.124939
-0.593763	Each line covers	-0.124939
-0.595452	when I die.	-0.124939
-2.016540	on page 153.	-0.124939
-0.595826	16 0 65535	-0.124939
-0.463732	optimization report /Qopt-report	-0.124939
-2.029566	that a low-priority	-0.124939
-0.359002	Example 12.1b. Vectorization	-0.124939
-0.550877	Loopunrolling x-xxxx--x Profile-guided	-0.124939
-1.203255	constructors and destructors.	-0.124939
-0.594161	brushes, etc. Locked	-0.124939
-0.586043	efficiency, platform independence,	-0.124939
-0.891175	versus dynamic libraries............................................................................	-0.124939
-1.641200	cost of fine-tuning,	-0.124939
-1.798755	when it exits.	-0.124939
-2.328144	the loop exits,	-0.124939
-0.589197	about name mangling	-0.124939
-1.781254	the Gnu utilities	-0.124939
-0.660012	13.3 Difficult cases........................................................................................................	-0.124939
-1.767402	has been alleviated	-0.124939
-0.896768	with two decimals,	-0.124939
-0.593767	per matrix cell	-0.124939
-0.902233	operator that transfers	-0.124939
-0.601423	list[j].a = list[j].b	-0.124939
-0.660012	= order(i); list[j].a	-0.124939
-2.640857	// Example 12.4a.	-0.124939
-0.599480	than example 12.4a,	-0.124939
-0.893870	operation. For example,a	-0.124939
-2.393331	should be obeyed.	-0.124939
-1.298636	resources are sufficient,	-0.124939
-1.483245	the final product.	-0.124939
-0.573388	pop ebx restores	-0.124939
-0.359002	4.5.2, July 2011).	-0.124939
-0.463732	are inherently serial,	-0.124939
-2.534636	to be restored	-0.124939
-0.505131	code. C#, managed	-0.124939
-0.855613	YMM registers. Disadvantages	-0.124939
-0.550887	the expected real-time	-0.124939
-0.359002	1./720., 1./5040., 1./40320.,	-0.124939
-0.594534	processing speed exceeding	-0.124939
-0.726952	(-a==-b)=(a==b) ---xx---- (a+c==b+c)=(a==b)	-0.124939
-0.588095	-opt-report Table 18.2.	-0.124939
-0.359002	a menu click	-0.124939
-1.799852	c = 1.23456,	-0.124939
-0.589685	programs automatically download	-0.124939
-0.359002	-mveclibabi -fopenmp /Qopenmp	-0.124939
-0.601079	tiling. This technique	-0.124939
-1.299878	allocation and de-allocation	-0.124939
-1.741273	replaced by x<<3,	-0.124939
-1.785683	the best optimizer.	-0.124939
-0.595303	137 about division).	-0.124939
-0.902703	represent a monotonically	-0.124939
-2.341047	with a top-of-stack	-0.124939
-1.069746	or multiple configurations	-0.124939
-0.601772	on and off.	-0.124939
-0.359002	Vec16us Vec8i Vec8ui	-0.124939
-0.896043	binutils version 2.20	-0.124939
-0.599155	1.61 n.a. 2.23	-0.124939
-0.359002	Performance". www.open- std.org/jtc1/sc22/wg21/docs/TR18015.pdf.	-0.124939
-0.902673	references to relocate,	-0.124939
-0.601580	Beginners are advised	-0.124939
-0.902703	pressing a button	-0.124939
-1.461655	the right positions	-0.124939
-1.198871	However, this did	-0.124939
-0.359002	128 Iu16vec8 Vec8us	-0.124939
-0.588671	compilers. #include <excpt.h>	-0.124939
-1.503796	only a minimal	-0.124939
-1.096841	Math Kernel Library,	-0.124939
-0.463732	created. Far Systems	-0.124939
-0.601394	workday or more.	-0.124939
-1.187263	or even telling	-0.124939
-0.601772	strange and unexpected	-0.124939
-0.563110	make profiling feasible.	-0.124939
-0.901817	x4 = x2*x2;	-0.124939
-2.156723	version of Mathcad	-0.124939
-1.568651	the option -mveclibabi=acml.	-0.124939
-0.596558	very user friendly	-0.124939
-0.541332	background processes running,	-0.124939
-1.857152	of different targets	-0.124939
-0.888817	then calls exit.	-0.124939
-0.579368	of task switching.	-0.124939
-1.921450	the following alternatives:	-0.124939
-1.198743	Using vector operations...............................................................................................	-0.124939
-2.066598	(see page 27).	-0.124939
-1.412712	optimization options ...................................................................................	-0.124939
-0.897322	log(b[i]) + log(c[i]);	-0.124939
-0.601286	copied by assignment,	-0.124939
-0.601286	another by assignment.	-0.124939
-1.830219	difficult to diagnose.	-0.124939
-2.640857	// Example 8.9b	-0.124939
-1.203286	15.1a to 15.1c).	-0.124939
-0.527397	vector() {} vector(float	-0.124939
-1.199828	b; int c;};	-0.124939
-2.640857	// Example 8.9a	-0.124939
-1.075214	different instruction sets...........................	-0.124939
-1.115916	32-bit Windows. Integrates	-0.124939
-0.596260	problem void AddTwo(int	-0.124939
-0.359002	function scanf. Violation	-0.124939
-1.412300	the work evenly	-0.124939
-1.767402	has been identified,	-0.124939
-1.203504	Number of simultaneous	-0.124939
-1.078391	operations for incrementing	-0.124939
-1.021964	can become imprecise	-0.124939
-0.599341	See Intel Technology	-0.124939
-2.394433	a = -100,	-0.124939
-1.361926	the call p->f()	-0.124939
-0.359002	__cpuid(dummy, 0); DontSkip	-0.124939
-0.359002	1/n! 1., 1./2.,	-0.124939
-2.177794	by a blend	-0.124939
-0.897322	d + e	-0.124939
-0.902513	generality and flexibility	-0.124939
-0.359002	FuncType SelectAddMul, SelectAddMul_SSE2,	-0.124939
-0.597672	* const Greek[4]	-0.124939
-0.527412	frequency (in Windows:	-0.124939
-2.341047	with a wealth	-0.124939
-2.834872	it is correlated	-0.124939
-0.891382	carried out independently	-0.124939
-0.601448	endl; // Output	-0.124939
-0.836084	template <typename T,	-0.124939
-0.836084	template <typename T>	-0.124939
-0.660012	version FuncType SelectAddMul,	-0.124939
-0.527412	The empty throw()specification	-0.124939
-0.593325	16is calculated asa	-0.124939
-0.505122	containers 93 themselves.	-0.124939
-1.736242	less than ARRAYSIZE.	-0.124939
-0.550887	it defines electrical	-0.124939
-1.196063	const double A2	-0.124939
-0.591976	" : "=m"(n)	-0.124939
-0.902513	Goedecker and A.	-0.124939
-0.463732	development environment (IDE)	-0.124939
-0.463732	en.wikipedia.org/wiki/Compiler_optimization. ISO/IEC TR	-0.124939
-1.107117	function libraries. Numbers	-0.124939
-0.595012	use large amounts	-0.124939
-0.789093	syntax: __asm fld	-0.124939
-0.828405	is fast. Calculating	-0.124939
-0.359002	of Mathcad (v.	-0.124939
-0.861668	a whole polygon	-0.124939
-0.505122	x---- ----- ~(~a)=a	-0.124939
-1.274625	member pointers /vms	-0.124939
-0.600904	profiler may sample	-0.124939
-0.600395	questions from everybody.	-0.124939
-0.601380	consider it unwise	-0.124939
-1.615986	and operating systems").	-0.124939
-1.077896	object that looses	-0.124939
-0.660012	= &Object2; p2->Hello();	-0.124939
-2.640857	// Example 8.23b.	-0.124939
-0.599166	(c + d);	-0.124939
-0.527404	char pointers. 144	-0.124939
-0.591980	defines hardware circuits	-0.124939
-2.640857	// Example 14.1b	-0.124939
-1.149654	int n; 143	-0.124939
-2.640857	// Example 14.1a	-0.124939
-0.601876	a[i] is ecx+eax*4.	-0.124939
-0.505122	to int. Reinterpret	-0.124939
-0.961003	= 100, max	-0.124939
-0.505122	8.1 (page 77)	-0.124939
-0.596644	on test theory.	-0.124939
-0.359002	a "move constructor"	-0.124939
-0.594364	7 through 14,	-0.124939
-0.600534	(Not A Number)	-0.124939
-0.600505	cores will grow	-0.124939
-0.806134	bit systems: Pointers,	-0.124939
-1.021953	vector size. Unpredictable	-0.124939
-0.601394	GetTickCount or QueryPerformanceCounter	-0.124939
-1.446675	divide the workload	-0.124939
-0.601286	u[1] by u[0].	-0.124939
-2.648786	the same class).	-0.124939
-1.741990	we are seeing	-0.124939
-0.557834	anyway. Software distributors	-0.124939
-0.359002	1, 2A, 2B,	-0.124939
-0.359002	4.1.0, 2006 (Red	-0.124939
-0.595167	current CPUs optimally.	-0.124939
-2.100095	the data optimally,	-0.124939
-0.902680	point of view.	-0.124939
-0.563114	to heavy competition.	-0.124939
-1.018393	Compiler v. 8.42n,	-0.124939
-0.359002	optimization", Coriolis group	-0.124939
-2.127939	want to thank	-0.124939
-0.359002	std.org/jtc1/sc22/wg21/docs/TR18015.pdf. OpenMP. www.openmp.org.	-0.124939
-1.119701	is particularly interesting	-0.124939
-0.550877	a|(b&c) x-xxxx--x ~a&~b=~(a|b)	-0.124939
-1.129443	unused label ;eax=addressofa	-0.124939
-0.842344	the declaration "static"	-0.124939
-1.555892	multiple of 0x800	-0.124939
-1.935978	You can subtract	-0.124939
-1.075239	how this works,	-0.124939
-0.359002	(".type CriticalFunction, @gnu_indirect_function");	-0.124939
-1.542458	for other optimizations,	-0.124939
-0.601772	vectors and matrixes.	-0.124939
-0.600193	very different speeds.	-0.124939
-0.806124	Suppl. SSE3 tmmintrin.h	-0.124939
-0.359002	: x(0) {};	-0.124939
-1.745437	the programmer forgets	-0.124939
-0.550877	given above. 7.	-0.124939
-1.246978	a positive integer:	-0.124939
-1.395975	when compiling module2.cpp.	-0.124939
-0.589689	smallest members last:	-0.124939
-1.189728	64 64 14.0	-0.124939
-0.359002	SelectAddMul_SSE2, SelectAddMul_SSE41, SelectAddMul_AVX2,	-0.124939
-0.359002	time consumers. Choose	-0.124939
-0.599166	list[j].b + list[j].c;	-0.124939
-0.527397	r) {return r.a	-0.124939
-1.436077	many different places).	-0.124939
-0.902300	etc. for Linux)	-0.124939
-0.601394	Incrementing or decrementing	-0.124939
-0.594161	resolutions, etc. Accessibility	-0.124939
-1.777036	any other constructors.	-0.124939
-0.599358	x4*x4; double x10	-0.124939
-2.304254	{ // Generic	-0.124939
-0.359002	F64vec2 F32vec8 F64vec4	-0.124939
-0.601876	supposedly is system-independent,	-0.124939
-0.359002	{ 92 DynamicArray[i]	-0.124939
-2.278059	to do cross-module	-0.124939
-0.359002	compiler. Remember, therefore,	-0.124939
-0.599166	(FuncRow(i)*columns + FuncCol(i))	-0.124939
-0.601286	memory by requesting	-0.124939
-0.600325	remote data locally.	-0.124939
-2.640857	// Example 8.3a	-0.124939
-2.640857	// Example 12.4c.	-0.124939
-0.582149	here gives a+b=0,	-0.124939
-0.463732	regularly. AMD: "Software	-0.124939
-1.058403	an assembly listing.	-0.124939
-1.678705	variable in eax.	-0.124939
-0.866322	a computer game	-0.124939
-2.304254	{ // polynomial(x)	-0.124939
-0.598586	(bb[i] * cc[i]);	-0.124939
-0.563114	code line. Time-based	-0.124939
-0.601772	installation and uninstallation	-0.124939
-1.049121	algebra reductions: a+b=b+a	-0.124939
-2.736646	that the remaining	-0.124939
-0.601448	bit // u.d	-0.124939
-0.359002	2.11 ifunc branch).	-0.124939
-0.806134	7.11 Type conversions....................................................................................................	-0.124939
-1.277867	the system forbids	-0.124939
-2.016540	on page 107.	-0.124939
-0.567273	allocated block. Walking	-0.124939
-0.359002	standard 754 (1985).	-0.124939
-1.919761	is also de-allocated.	-0.124939
-0.764749	7.29 Threads ..................................................................................................................	-0.124939
-0.463732	or -Ofast /O3	-0.124939
-0.600065	uses CPU dispatching:	-0.124939
-0.899108	with CPU dispatching,	-0.124939
-1.200811	compiler with C++0x	-0.124939
-0.593316	a2*b1) / (b1*b2);	-0.124939
-0.359002	1./6.22702E9, 1./8.71782E10, 1./1.30767E12,	-0.124939
-0.359002	S1 list[100], *temp;	-0.124939
-0.601230	Systems with segmented	-0.124939
-0.600990	get an integral	-0.124939
-0.806124	on CodeGear compiler).	-0.124939
-0.591315	existing program. Weighing	-0.124939
-0.594348	code. These workaround	-0.124939
-0.359002	utilized appropriately. Users	-0.124939
-0.359002	security matters. Problems	-0.124939
-0.359002	-fopenmp /Qopenmp -m32	-0.124939
-0.505131	float, double, bool,	-0.124939
-0.463732	the generic branch,	-0.124939
-0.463732	/arch:SSSE2 -msse4.1 /arch:SSE4.1	-0.124939
-0.527397	testing worst-case performance:	-0.124939
-0.359002	and Enterprise editions).	-0.124939
-0.894110	at address [ecx+eax*4].	-0.124939
-0.359002	best optimizer. Borland/CodeGear/Embarcadero	-0.124939
-0.961003	and maintenance easier.	-0.124939
-2.213916	the value 1000.	-0.124939
-1.911646	of using ready	-0.124939
-1.959451	I have studied	-0.124939
-1.661379	b + 0.666666666666666666667;	-0.124939
-0.593970	Z += A2;	-0.124939
-1.299666	the so-called commpage.	-0.124939
-2.090031	time it uses.	-0.124939
-0.898145	template<> class powN<true,0>	-0.124939
-0.359002	addresses 0x2F00, 0x3700,	-0.124939
-0.359002	NUMCOLUMNS; column++) matrix[row][column]	-0.124939
-0.593975	mean good performance).	-0.124939
-0.570622	next chapter describes	-0.124939
-0.601423	lookup[2] = {2.6f,	-0.124939
-0.595169	and accessed non-sequentially	-0.124939
-1.736242	less than 1%	-0.124939
-0.359002	and non-recoverable errors;	-0.124939
-0.890453	(n & 1)	-0.124939
-1.431217	line size (typically	-0.124939
-0.601631	error that hackers	-0.124939
-0.359002	avoid hard-to-find errors,	-0.124939
-2.460478	by the formula:	-0.124939
-0.359002	this loop? Certainly	-0.124939
-0.864120	call statement occupies	-0.124939
-0.550887	with internal multi-threading,	-0.124939
-0.588671	9.6b. #include "xmmintrin.h"	-0.124939
-1.203386	space is occupied	-0.124939
-0.570629	matrix. My experimental	-0.124939
-1.592995	loop control condition:	-0.124939
-0.600355	happens at runtime).	-0.124939
-0.359002	r1, r2, c1,	-0.124939
-0.359002	Exception Specifications, Dr	-0.124939
-1.600625	overhead of switching	-0.124939
-1.708653	the two formulas	-0.124939
-1.078555	easy to trace	-0.124939
-1.190700	also called Single-Instruction-Multiple-Data	-0.124939
-0.463732	clauses: initialization, condition,	-0.124939
-0.600794	SafeArray { protected:	-0.124939
-0.601866	reveal a zigzag	-0.124939
-1.200177	on this topic,	-0.124939
-0.505122	seriously. User complaints	-0.124939
-0.891151	128 bits (XMM),	-0.124939
-2.412482	a function local:	-0.124939
-2.367289	rather than 20.	-0.124939
-0.550882	C, C++, D,	-0.124939
-0.589688	statements like throw(A,B,C)	-0.124939
-1.799425	and it fills	-0.124939
-1.341060	be made local.	-0.124939
-0.597417	allows less precise	-0.124939
-0.359002	Table 18.3. Predefined	-0.124939
-1.436134	and one local,	-0.124939
-1.880050	are accessed column-wise.	-0.124939
-0.601330	Another function __intel_cpu_features_init_x()	-0.124939
-0.505122	partial flags stall	-0.124939
-0.847539	i+=3){ list[i] =0;	-0.124939
-0.527397	platforms. Graphics accelerators	-0.124939
-0.463732	"Effective C++". Addison-Wesley.	-0.124939
-0.903059	sign bit: absvalue	-0.124939
-0.575704	than mov eax,0.	-0.124939
-0.359002	__asm fistp dword	-0.124939
-0.359002	of inte- ger	-0.124939
-1.078023	work. The recommendations	-0.124939
-1.693974	of data decomposition,	-0.124939
-0.563114	of jump targets.	-0.124939
-2.538960	may be undesired.	-0.124939
-0.891374	or 32 bytes).	-0.124939
-0.359002	for discussions. Turn	-0.124939
-2.640857	// Example 12.6.	-0.124939
-2.151139	in example 7.32b.	-0.124939
-0.599341	use Intel VTune,	-0.124939
-0.885295	the interval [1.0,	-0.124939
-0.598863	(called static if),	-0.124939
-0.601876	macro is referencing	-0.124939
-1.970825	is called VTune;	-0.124939
-0.359002	does incredibly stupid	-0.124939
-0.596461	precision without worrying	-0.124939
-0.597547	checked before storing.	-0.124939
-1.051283	and operators ......................................................................	-0.124939
-2.640857	// Example 7.29b	-0.124939
-0.359002	Warren, Jr.: "Hacker's	-0.124939
-0.527397	and objects. Storage	-0.124939
-0.359002	a constant: Unsigned	-0.124939
-0.660012	becoming increasingly blurred	-0.124939
-2.640857	// Example 7.29a	-0.124939
-1.999077	unsigned int dummy;	-0.124939
-2.283633	such as eliminating	-0.124939
-0.881145	may write FatalAppExitA(0,"Array	-0.124939
-1.076864	code by emulating	-0.124939
-0.598522	current version satisfies	-0.124939
-2.061916	(i = (int)n	-0.124939
-0.570626	the module with,	-0.124939
-0.359002	is utilized appropriately.	-0.124939
-0.359002	512 378.7 168.5	-0.124939
-0.598128	x8*x2; return x10;	-0.124939
-0.359002	513 58.7 168.3	-0.124939
-0.861651	that produce streaming	-0.124939
-0.895261	2 return (2.5f	-0.124939
-1.532146	difference between commas	-0.124939
-0.505122	y2, reciprocal_divisor; reciprocal_divisor	-0.124939
-3.208921	of the usual	-0.124939
-0.597477	recently 4 ?Func2@@YAXQAHAAH@Z	-0.124939
-0.588671	<excpt.h> #include <float.h>	-0.124939
-0.601772	100*16, and temp++	-0.124939
-2.086278	you are doing.	-0.124939
-0.601876	footprint is unreasonably	-0.124939
-0.601777	remarkably in popularity	-0.124939
-0.463732	is relocated (rebased)	-0.124939
-0.726952	add cmp ja	-0.124939
-0.891151	256 bits (YMM),	-0.124939
-0.359002	interval [1.0, 2.0)	-0.124939
-1.202835	software that dates	-0.124939
-1.596584	}; // Called	-0.124939
-2.363938	- n.a. (-a)*(-b)	-0.124939
-0.600983	matrix[NUMROWS][NUMCOLUMNS]; int row,	-0.124939
-0.527404	its mirror position	-0.124939
-1.203346	need a constructor.	-0.124939
-0.601772	2A and 2B.	-0.124939
-0.601866	(not a number).	-0.124939
-1.185330	logical processors (0,	-0.124939
-2.209744	See page 78.	-0.124939
-1.030502	the binary decimals	-0.124939
-0.589199	make separate executables	-0.124939
-0.601580	means are among	-0.124939
-0.588098	see below. Installing	-0.124939
-0.587477	(absvalue > largest_abs)	-0.124939
-2.151139	in example 8.15b.	-0.124939
-0.550882	use truncation towards	-0.124939
-1.380066	the multiplication b[i]*c[i],	-0.124939
-0.463732	where appropriate. Compiler-specific	-0.124939
-2.130427	set is maintained	-0.124939
-0.600614	calculation more efficient:	-0.124939
-1.178783	inline __m128i LoadVectorA(void	-0.124939
-0.359002	a[arraysize], b[arraysize], c[arraysize];	-0.124939
-1.642873	improve the performance,	-0.124939
-2.101628	make a sensible	-0.124939
-0.505131	/O3 -O3 Interprocedural	-0.124939
-0.592837	instructions. Function Assembly	-0.124939
-1.071785	N> class powN<true,N>	-0.124939
-2.265201	n.a. - a+b+c	-0.124939
-2.719997	is a compelling	-0.124939
-3.071155	in the profile.	-0.124939
-1.445242	elements are cumbersome	-0.124939
-1.784518	byte at 403	-0.124939
-0.550877	emmintrin.h SSE3 pmmintrin.h	-0.124939
-2.014120	likely to experience.	-0.124939
-2.030478	a program executable:	-0.124939
-1.503603	not a vector).	-0.124939
-0.541332	(Integrated Development Environments)	-0.124939
-1.394996	on non-Intel machines?	-0.124939
-0.601665	are for those	-0.124939
-0.601423	absvalue = a[i].u[1]	-0.124939
-1.107173	copy protection scheme	-0.124939
-2.670223	for the IDE,	-0.124939
-0.890453	(N & (N-1))	-0.124939
-1.937031	useful for investigating	-0.124939
-0.861672	or #pragma novector	-0.124939
-0.601423	max = 110;	-0.124939
-0.463732	for vacant spaces.	-0.124939
-0.359002	discrete icon signaling	-0.124939
-1.937603	not be passed	-0.124939
-0.588671	(Intel) #include <pmmintrin.h>	-0.124939
-2.022052	the first sub-vector.	-0.124939
-2.763419	to the IEEE	-0.124939
-0.879563	OR operator (|)	-0.124939
-0.541332	is relatively expensive,	-0.124939
-0.877734	that model N-1	-0.124939
-0.601580	sections are dominating	-0.124939
-0.541344	__fastcall __attribute(( fastcall))	-0.124939
-0.359002	been brutally interrupted.	-0.124939
-0.885319	128 128 17.4	-0.124939
-0.828405	in interpreted script	-0.124939
-2.394433	a = lookup[b];	-0.124939
-0.903059	return _mm_loadu_si128((__m128i const*)p);}	-0.124939
-0.359002	128 Iu8vec16 Vec16uc	-0.124939
-0.527397	floats F32vec4 xxn(x4,	-0.124939
-0.897322	r + i/2;	-0.124939
-1.289380	to integer According	-0.124939
-0.590138	and microprocessor microarchitecture.	-0.124939
-0.893870	version. For team	-0.124939
-0.587477	abs(u.f) > abs(v.f)	-0.124939
-0.463732	(); __asm__ (".type	-0.124939
-0.893870	version. For one-man	-0.124939
-1.677695	code for vectorization.............................................................	-0.124939
-0.726952	need metaprogramming. None	-0.124939
-0.584309	function #define MAX(a,b)	-0.124939
-1.220152	branch target buffer,	-0.124939
-1.740246	we can roughly	-0.124939
-1.824927	y = pow(x,n)	-0.124939
-0.660012	& 3) <<6	-0.124939
-3.071155	in the arrays:	-0.124939
-0.359002	Family 15h Processors".	-0.124939
-0.359002	profiling methods: Instrumentation:	-0.124939
-2.177960	value of i&15	-0.124939
-0.359002	or "__attribute__((visibility ("hidden")))".	-0.124939
-1.549939	used as buffers	-0.124939
-0.871620	2 AVX2 _mm256_i64gather_pd	-0.124939
-0.601380	referencing it twice.	-0.124939
-0.359002	systems. Today (2013)	-0.124939
-0.902897	tested the capability	-0.124939
-1.830426	call to _endthread()	-0.124939
-0.601423	polynomial(x) = 2.5*x^2	-0.124939
-0.601394	3"); or __debugbreak();.	-0.124939
-1.923513	and other system-	-0.124939
-1.076994	many function calls,	-0.124939
-0.601851	tempting to fine-	-0.124939
-1.751441	function library asmlib,	-0.124939
-0.601876	chapter is aiming	-0.124939
-1.625523	have been found,	-0.124939
-1.795590	a particular subtask	-0.124939
-0.601772	lrintf and lrint.	-0.124939
-0.598586	(b[i] * c[i]);	-0.124939
-0.359002	32 -231 231-1	-0.124939
-0.897532	Function pointer serves	-0.124939
-0.600072	about instruction latencies	-0.124939
-1.119709	a limited audience	-0.124939
-0.359002	(remove unreferen- ced	-0.124939
-0.873077	it becomes full.	-0.124939
-1.940599	instead of if.	-0.124939
-0.359002	user. Feature bloat.	-0.124939
-0.902318	library. The radical	-0.124939
-1.065754	cache space. Putting	-0.124939
-0.902318	pointers. The absence	-0.124939
-1.929534	else { FuncB(i);	-0.124939
-1.713751	way of solving	-0.124939
-3.208921	of the programmers'	-0.124939
-1.743471	compile time. Four	-0.124939
-1.855080	calculation of (a+b).	-0.124939
-0.527404	to contained objects?	-0.124939
-2.141071	stored in y.	-0.124939
-1.099295	array bounds violations,	-0.124939
-2.573352	on the processor)	-0.124939
-0.600534	Loops: A sourcebook	-0.124939
-0.550877	vectors SSE3 horizontal	-0.124939
-2.581419	floating point comparison.	-0.124939
-2.955917	can be broken	-0.124939
-0.550887	= (float *)alloca(n	-0.124939
-2.283633	such as spell-checking	-0.124939
-0.563122	>= (unsigned int)size)	-0.124939
-2.640857	// Example 7.34a.	-0.124939
-0.899156	requires only SSE).	-0.124939
-1.300547	though the CPU-type	-0.124939
-0.601772	synchronizing and communicating	-0.124939
-1.714844	only the even-numbered	-0.124939
-1.846652	a different meaning	-0.124939
-0.601423	a<<b<<c = a<<(b+c)	-0.124939
-2.654124	the code mixes	-0.124939
-0.598550	as many encryption	-0.124939
-0.595452	size. I tried	-0.124939
-1.069830	const float coef[16]	-0.124939
-0.902318	separately. The fallacy	-0.124939
-2.593306	is not referenced	-0.124939
-0.463732	#define EXCEPTION_FLT_OVERFLOW 0xC0000091L	-0.124939
-1.600320	such a formalism.	-0.124939
-1.804132	case of underflow:	-0.124939
-0.595013	align ; mark	-0.124939
-0.599227	set into sub-vectors	-0.124939
-0.557825	with compile-time polymorphism.	-0.124939
-1.601094	or the __assume_aligned	-0.124939
-0.557825	a compile-time polymorphism,	-0.124939
-1.150784	the runtime polymorphism:	-0.124939
-0.359002	the self-explaining menus	-0.124939
-1.365021	other compilers (Microsoft,	-0.124939
-0.359002	d, e, f,	-0.124939
-1.679366	up to date):	-0.124939
-0.597353	is unsigned Examples:	-0.124939
-1.289475	without cache MOVNTPS	-0.124939
-2.145616	because it lacks	-0.124939
-2.955917	can be arranged	-0.124939
-1.816588	to calculate (c+d)	-0.124939
-0.902723	ebx is pushed	-0.124939
-0.601772	disks and USB	-0.124939
-1.059216	than its brand,	-0.124939
-0.888823	intermediate result (b+c)	-0.124939
-0.601394	frame" or "frame	-0.124939
-0.855627	}; struct Sdouble	-0.124939
-0.591659	flexible, well tested,	-0.124939
-0.600983	p->b;} int Sum3(S3	-0.124939
-1.049121	a suitable duration.	-0.124939
-1.300643	within the lifetime	-0.124939
-0.359002	< &list[100]; temp++)	-0.124939
-2.640857	// Example 14.13c	-0.124939
-0.359002	objects numbered consecutively?	-0.124939
-2.640857	// Example 14.13a	-0.124939
-0.600904	chain may fill	-0.124939
-2.640857	// Example 8.15b	-0.124939
-1.032792	8 AVX2 _mm_i64gather_pd	-0.124939
-0.601772	GOT, and finally	-0.124939
-0.861659	chip. Such hybrid	-0.124939
-0.505122	int list[100]; Func1(list,	-0.124939
-0.861668	a whole workday	-0.124939
-0.902131	Templates are instantiated	-0.124939
-1.923513	and other flaws	-0.124939
-0.875615	for char pointers).	-0.124939
-0.570629	"generate map file"	-0.124939
-1.951190	{ return IntegerPower<10>(x);	-0.124939
-0.585224	versions were tested:	-0.124939
-1.078605	testing and analyzing	-0.124939
-0.599581	7.36 class S2	-0.124939
-0.599581	7.37 class S3	-0.124939
-2.363938	- n.a. (a+b)+c	-0.124939
-0.601423	list[] = {1.1,	-0.124939
-1.693974	of data elements,	-0.124939
-0.463732	child function: (static_cast<MyChild*>(this))->Disp();	-0.124939
-2.955917	can be cross-	-0.124939
-0.575698	functions (e.g. GetLogicalProcessorInformation	-0.124939
-2.291231	part of it).	-0.124939
-1.249749	be quite substantial.	-0.124939
-0.901817	Table[x] = A*x*x	-0.124939
-0.527397	smmintrin.h (Gnu) AES,	-0.124939
-1.299317	certain that u	-0.124939
-0.567277	a device driver.	-0.124939
-2.955917	can be combined.	-0.124939
-0.902513	advantages and disadvantages.	-0.124939
-0.598361	Obviously, we loose	-0.124939
-0.359002	heavy competition. Processors	-0.124939
-1.904898	efficient than investing	-0.124939
-1.466097	of its arguments.	-0.124939
-1.199024	library at www.agner.org/optimize/asmlib.zip	-0.124939
-2.304254	{ // Round	-0.124939
-1.533164	more complicated reductions.	-0.124939
-0.597988	admittedly very kludgy.	-0.124939
-0.855627	in Linux, sched_setaffinity).	-0.124939
-0.598354	get any answer.	-0.124939
-0.595302	utilizing its out-of-	-0.124939
-0.600983	blocking: int r1,	-0.124939
-0.567277	facilities, easy GUI	-0.124939
-0.601772	flush and fence	-0.124939
-0.359002	the sign, eee	-0.124939
-0.359002	a scalar (Scalar	-0.124939
-0.999913	exception handling. Omitting	-0.124939
-2.367289	rather than nine,	-0.124939
-0.359002	mov lea $B2$2:	-0.124939
-2.640857	// Example 7.10b	-0.124939
-0.599166	c.y + d.y;	-0.124939
-2.640857	// Example 7.10a	-0.124939
-0.902680	degree of randomness	-0.124939
-0.851891	low priority thread,	-0.124939
-0.894977	32-bit software development",	-0.124939
-0.601077	when not selected.	-0.124939
-0.600983	are: int BigArray[1024]	-0.124939
-0.878695	will see shortly.	-0.124939
-0.899123	of instruction latencies,	-0.124939
-0.359002	the broader perspective	-0.124939
-0.894896	with long latencies.	-0.124939
-0.359002	^, ~, <<,	-0.124939
-0.600983	7.18 int FuncRow(int);	-0.124939
-0.601448	multiple // versions:	-0.124939
-1.798598	b = 1.0E8,	-0.124939
-0.601772	12.4b and 12.4c	-0.124939
-0.527404	regularly. Intel: "Intel®	-0.124939
-0.726952	may seem illogical	-0.124939
-0.505131	"AMD64 Architecture Programmer’s	-0.124939
-0.463732	The clumsy AND-OR	-0.124939
-1.077273	s = (short	-0.124939
-0.588094	if(!a && !b)	-0.124939
-1.437723	in multiple versions,	-0.124939
-0.550887	with embedded microcontrollers.	-0.124939
-0.587477	-a > -b	-0.124939
-0.589691	integer expression -a	-0.124939
-0.359002	Prefetch PREFETCH _mm_prefetch	-0.124939
-0.588095	license Table 12.4.	-0.124939
-0.836084	template <typename MyChild>	-0.124939
-2.195854	short int bb[size]	-0.124939
-0.463732	of digital building	-0.124939
-0.601580	STL are universal,	-0.124939
-1.745898	efficient to pool	-0.124939
-1.203299	representation of &list[100]	-0.124939
-1.823421	} // Entry	-0.124939
-0.598920	inline float add_elements(__m128	-0.124939
-0.359002	or void. Returning	-0.124939
-0.660012	(i=0; i<n; ++i).	-0.124939
-2.064340	const int NUMROWS	-0.124939
-1.299122	sum = (s0+s1)+(s2+s3);	-0.124939
-0.550882	Intel Performance Primitives	-0.124939
-2.431519	to a narrow	-0.124939
-2.640857	// Example 12.4e.	-0.124939
-0.588099	of runtime DLL's	-0.124939
-0.359002	section 17.9: "Moving	-0.124939
-0.593563	elements inside sqaure:	-0.124939
-1.201578	SelectAddMul_pointer = &SelectAddMul_dispatch;	-0.124939
-0.359002	etc. Locked mutexes.	-0.124939
-0.764737	or mouse move.	-0.124939
-0.593316	xn / nfac;	-0.124939
-1.501997	intended for detecting	-0.124939
-0.359002	a conditional move,	-0.124939
-0.359002	Rick Booth: "Inner	-0.124939
-0.359002	multiple logically distinct	-0.124939
-0.601423	a&b&c&d = (a&b)&(c&d)	-0.124939
-0.463732	General Public License,	-0.124939
-0.899108	one CPU core,	-0.124939
-0.589688	classes like string,	-0.124939
-0.359002	0.30 4.5 0.82	-0.124939
-0.359002	// Print heading	-0.124939
-2.016540	on page 60.	-0.124939
-0.577670	to optimization. Prefetching	-0.124939
-0.588679	supports automatic vectorization,	-0.124939
-1.162614	the current position.	-0.124939
-0.359002	2 0.77 0.89	-0.124939
-0.828398	several iterations ahead.	-0.124939
-0.595696	list[i & 15]	-0.124939
-0.527404	a virus scanner	-0.124939
-2.429659	the program logic.	-0.124939
-0.359002	version 2.11 ifunc	-0.124939
-0.899204	Fastcall functions /Gr	-0.124939
-0.601423	n! = n∙(n-1)!.	-0.124939
-2.178003	} } Transposing	-0.124939
-1.550915	done by controlling	-0.124939
-0.601772	auto_ptr and shared_ptr.	-0.124939
-1.378484	files and databases.	-0.124939
-0.597547	experience before trying	-0.124939
-2.773232	in a multitasking	-0.124939
-1.158261	and multiplication (27	-0.124939
-1.158261	and multiplication (20	-0.124939
-0.601423	size) = (total	-0.124939
-2.640857	// Example 8.5b	-0.124939
-1.276203	program optimization /GL	-0.124939
-2.640857	// Example 8.5a	-0.124939
-2.291231	part of it)	-0.124939
-1.468048	CPU dispatch strategies	-0.124939
-1.300296	optimization is requested.	-0.124939
-0.601876	response is delayed	-0.124939
-0.592575	ArraySize; i++) List[i]++;	-0.124939
-0.597547	job before you.	-0.124939
-0.660012	Example 14.27 assumes	-0.124939
-2.581419	floating point variable:	-0.124939
-0.463732	Delight". Addison-Wesley, 2003.	-0.124939
-2.743091	It is assumed	-0.124939
-0.896524	Borland C++ builder.	-0.124939
-0.359002	&&, ||, !	-0.124939
-0.596160	in programming nowadays	-0.124939
-0.505122	56 7.28 Templates...............................................................................................................57	-0.124939
-0.567280	tested implement OneOrTwo5[b!=0]	-0.124939
-0.563126	trees, hash maps	-0.124939
-1.115907	in chapter 9.10,	-0.124939
-1.361516	in two steps.	-0.124939
-0.570632	old version. Updating	-0.124939
-0.601394	/QaxAVX or -axAVX.	-0.124939
-1.549388	simply by inverting	-0.124939
-0.601876	&list[100] is (int)(&list[100])	-0.124939
-2.394433	a = _mm_or_si128(c2,	-0.124939
-0.575704	instructions mov ebx,eax	-0.124939
-0.596903	(partial) template specialization.	-0.124939
-0.596903	non-recursing template specialization,	-0.124939
-2.955917	can be improved.	-0.124939
-1.203255	C++ and Fortran.	-0.124939
-0.359002	processing. Scott Meyers:	-0.124939
-1.824927	y = (a1*b2	-0.124939
-2.593306	is not evaluated,	-0.124939
-2.394433	a = OneOrTwo5[b!=0];	-0.124939
-0.591976	c1() : x(0)	-0.124939
-0.359002	earlier vmlsExp4 vmldExp2	-0.124939
-2.219197	will be rounded	-0.124939
-0.601448	(int)d; // Truncation	-0.124939
-1.258972	b / 1.2345;	-0.124939
-0.601777	short in duration	-0.124939
-0.550882	when type-casting pointers:	-0.124939
-0.601230	begin with _mm.	-0.124939
-0.600355	Nerds at Wikibooks.	-0.124939
-0.579368	and task switches;	-0.124939
-1.948995	compiler may interleave	-0.124939
-1.027952	not overlap. 27	-0.124939
-0.359002	0.11 1.21 0.57	-0.124939
-0.359002	clock; __cpuid(dummy, 0);	-0.124939
-1.445907	and to Eclipse	-0.124939
-0.901432	interfere with real	-0.124939
-0.599166	A*x*x + B*x	-0.124939
-0.359002	Dobbs Journal, 2002).	-0.124939
-1.300643	Use the "generate	-0.124939
-1.460714	is always true/false	-0.124939
-2.593306	is not detected	-0.124939
-0.505122	computer. Big supercomputers	-0.124939
-0.601286	pointers, by initializing	-0.124939
-1.203386	memory is mirrored	-0.124939
-1.528448	100; i++) matrix[FuncRow(i)][FuncCol(i)]	-0.124939
-0.601448	static_cast<float>(i); // Implicit	-0.124939
-0.597128	very often underestimate	-0.124939
-1.294479	with this rule.	-0.124939
-1.162601	end user. Installation	-0.124939
-0.902131	names are undocumented.	-0.124939
-0.601394	polygon or bitmap	-0.124939
-0.890453	(n & 0x7FFFFF)	-0.124939
-0.836084	Manual", Volume 2A	-0.124939
-2.334742	as a valuable	-0.124939
-0.764737	the C-style type-casting.	-0.124939
-1.438553	large data bases,	-0.124939
-0.599948	system which redirects	-0.124939
-0.883156	denormals-are-zero mode (SSE2):	-0.124939
-1.644422	can cause severe	-0.124939
-1.369455	the last member.	-0.124939
-1.069830	bit float vectors)	-0.124939
-0.359002	a funda- mentally	-0.124939
-0.601631	wires that connect	-0.124939
-0.359002	the responsi- bility	-0.124939
-0.563126	nfac *= n+1;	-0.124939
-0.577688	types (See Sutter:	-0.124939
-1.371607	A more primitive,	-0.124939
-0.902300	Time for transposing	-0.124939
-2.648786	the same computer,	-0.124939
-0.601876	values is closest	-0.124939
-0.902673	Loop to print	-0.124939
-2.763799	if the evaluation	-0.124939
-0.902300	ms for foreground	-0.124939
-0.359002	__attribute__((const)) (Linux only).	-0.124939
-1.203494	advantage to obtain,	-0.124939
-0.601772	tested and investigated	-0.124939
-0.898392	Calculate integer power,	-0.124939
-0.601320	7.8 if (handle	-0.124939
-1.018393	Compiler v. 11.1	-0.124939
-0.601448	T // Constructor	-0.124939
-0.527397	63 31 11.6	-0.124939
-0.359002	or removable media	-0.124939
-0.601674	copying. The benefits	-0.124939
-0.527397	65 33 11.8	-0.124939
-0.895261	2 return powN<(N	-0.124939
-2.763799	if the bias	-0.124939
-0.902723	information is utilized	-0.124939
-3.071155	in the MKL	-0.124939
-0.359002	64 -263 263-1	-0.124939
-0.527397	256 F32vec4 F64vec2	-0.124939
-1.458129	in assembly language",	-0.124939
-1.505094	Mac OS X.	-0.124939
-0.586799	level linking (remove	-0.124939
-0.594700	support processor X"	-0.124939
-0.601876	market is developing	-0.124939
-1.599188	arrays are aligned,	-0.124939
-1.452236	to write 2.0/3.0	-0.124939
-2.640857	// Example 7.31b	-0.124939
-2.640857	// Example 7.31a	-0.124939
-0.598586	powN<(N1&(N1-1))==0,N1>::p(x) * powN<true,N-N1>::p(x);	-0.124939
-0.550892	the Common Language	-0.124939
-0.586795	instructions becomes noticeable.	-0.124939
-0.895170	than 2 gigabytes	-0.124939
-0.588098	x; n >>=	-0.124939
-2.066598	(see page 103)	-0.124939
-2.334742	as a learning	-0.124939
-2.640857	// Example 7.43b.	-0.124939
-1.189149	for some links.	-0.124939
-0.600983	c; int UnusedFiller;	-0.124939
-2.341047	with a 50-50	-0.124939
-0.600862	slow, you know).	-0.124939
-0.527397	a detailed overview	-0.124939
-0.902723	F1 is supposed	-0.124939
-2.640857	// Example 14.4b	-0.124939
-2.640857	// Example 15.1a.	-0.124939
-0.895170	to 2 Mbytes.	-0.124939
-0.601777	problem in interactive	-0.124939
-0.463732	Studio 2008 version).	-0.124939
-0.359002	nn ifbit=1 bitofn	-0.124939
-0.541332	exception ever happens.	-0.124939
-1.078317	complicated and error-prone.	-0.124939
-0.463732	compilers. Wikipedia article	-0.124939
-0.896768	with two entries.	-0.124939
-1.072722	efficient, but risky.	-0.124939
-0.541332	a project built	-0.124939
-0.858846	child class. Members	-0.124939
-3.071155	in the majority	-0.124939
-0.601488	Studio can build	-0.124939
-0.873087	Static linking (multithreaded)	-0.124939
-1.139619	if ((unsigned int)(i	-0.124939
-0.573382	it rarely justifies	-0.124939
-2.640857	// Example 8.13a	-0.124939
-2.640857	// Example 8.13b	-0.124939
-0.660012	|, ^, ~,	-0.124939
-0.601963	reinvent the wheel.	-0.124939
-0.601851	years to come.	-0.124939
-0.593552	doesn't works (gcc	-0.124939
-0.601963	lacks the self-explaining	-0.124939
-2.057727	you may actively	-0.124939
-0.359002	-2.0, 4.4, 2.5};	-0.124939
-0.599166	vector(x + a.x,	-0.124939
-1.920745	important to weigh	-0.124939
-1.379631	cause the resource-hungry	-0.124939
-0.601230	extending with zero-bits	-0.124939
-1.838674	is often reorganized	-0.124939
-0.971507	= -1 (a&~b)|(~a&b)=a^b	-0.124939
-0.359002	block: 62 __try	-0.124939
-1.377844	and for minimizing	-0.124939
-0.601777	cheap, in relation	-0.124939
-0.601423	a[c][r] = b[r][c];	-0.124939
-0.601230	defined with enum,	-0.124939
-1.288282	In example 8.21,	-0.124939
-2.640857	// Example 14.15b	-0.124939
-1.302463	is too fine	-0.124939
-0.359002	free E-book Usability	-0.124939
-1.317085	and automatic CPU-dispatching	-0.124939
-0.601394	ger or double)	-0.124939
-1.630305	called from main,	-0.124939
-1.379377	certain to truly	-0.124939
-0.902318	separately. The allocation,	-0.124939
-1.838674	is often seen,	-0.124939
-0.359002	Foundation Classes (MFC).	-0.124939
-0.359002	strcat, strlen, sprintf,	-0.124939
-2.657080	of a double:	-0.124939
-0.359002	__asm__ (".type CriticalFunction,	-0.124939
-1.299590	loop and reorganize:	-0.124939
-0.601772	tortuous and convoluted	-0.124939
-0.599166	a[i] + b[i];	-0.124939
-0.573385	as e.g. .R.	-0.124939
-0.596461	used without restrictions.	-0.124939
-0.601876	manuals is copyrighted	-0.124939
-0.901759	disk or network.	-0.124939
-2.180477	i < arraysize;	-0.124939
-2.311845	possible to express	-0.124939
-0.590947	be both cheaper	-0.124939
-0.600378	case memory re-allocation	-0.124939
-0.899923	is then de-referenced	-0.124939
-1.047708	be used. Web	-0.124939
-0.902673	user to restart	-0.124939
-0.359002	runtime DLL's (dynamically	-0.124939
-0.601665	i++) for (j	-0.124939
-0.601394	Copying or clearing	-0.124939
-1.077996	than for auto_ptr.	-0.124939
-0.359002	/Oa -fno-alias Non-strict	-0.124939
-0.900939	1000; int List[ArraySize];	-0.124939
-1.214344	in my experiments.	-0.124939
-0.601876	file, is acceptable.	-0.124939
-1.550603	x = *(++p)	-0.124939
-0.359002	Vec8ui Vec4q Vec4uq	-0.124939
-0.901867	slow // Modulo	-0.124939
-0.359002	two suggested improvements).	-0.124939
-1.996880	compiler to vectorize,	-0.124939
-0.601855	comparison of doubles	-0.124939
-0.599969	hyperthreading. If so,	-0.124939
-0.359002	Example 7.43b. Compile-time	-0.124939
-3.208921	of the weekdays.	-0.124939
-0.600943	in compiler price	-0.124939
-0.601394	(SDK or PSDK).	-0.124939
-0.541332	fastcall)) __fastcall Noncached	-0.124939
-1.218771	of range printf(Greek[n]);	-0.124939
-0.505131	not modified. Unlike	-0.124939
-1.863858	the user interface,	-0.124939
-0.573382	Kernel Library (MKL	-0.124939
-1.444791	waiting for response.	-0.124939
-1.075980	than an MFC	-0.124939
-0.601077	SSE2 not supported");	-0.124939
-1.070879	misses, branch misprediction,	-0.124939
-2.568185	and the loader.	-0.124939
-0.883751	will calculate (1./1.2345)	-0.124939
-1.550603	x = array[++i]	-0.124939
-0.600983	module1.cpp int Func1(int	-0.124939
-0.463732	expects immediate responses	-0.124939
-0.601580	vendors are offering	-0.124939
-1.644363	vector classes Programming	-0.124939
-2.102394	on a First-In-Last-	-0.124939
-1.716331	the code. Inserting	-0.124939
-1.077273	(set) = (10000	-0.124939
-0.836097	optimizing away cpuid	-0.124939
-2.151139	in example 16.2.	-0.124939
-0.359002	test theory. Advice	-0.124939
-2.363938	- n.a. a+a+a+a	-0.124939
-0.851927	[edx] DWORD PTR[ecx+eax*4],ebx	-0.124939
-0.463732	8.26a (32-bit mode):	-0.124939
-0.601423	OneOrTwo5[2] = {1.0f,	-0.124939
-1.568393	is so kludgy	-0.124939
-1.038561	has three clauses:	-0.124939
-0.359002	is occupied throughout	-0.124939
-0.599358	x2*x2; double x8	-0.124939
-0.463732	variable. (This eliminates	-0.124939
-1.771727	would be straightforward.	-0.124939
-0.601772	it and create	-0.124939
-1.899809	or a non-const	-0.124939
-1.201033	obtained by dropping	-0.124939
-1.220168	Microsoft Visual Studio.	-0.124939
-0.594172	immintrin.h AMD SSE4A	-0.124939
-1.741574	function or friend	-0.124939
-0.601330	using function inlining,	-0.124939
-0.896720	use static linking,	-0.124939
-1.288282	In example 12.2,	-0.124939
-1.300113	allocation is unnecessarily	-0.124939
-0.902723	exception is caught	-0.124939
-0.359002	an if-else structure),	-0.124939
-1.203569	string is checked	-0.124939
-0.593970	s1 += a[i+1];	-0.124939
-0.601320	8.10a if (true)	-0.124939
-2.066598	(see page 107),	-0.124939
-2.066598	(see page 122)	-0.124939
-2.016540	on page 62.	-0.124939
-0.586048	compatibility, second source,	-0.124939
-2.016540	on page 96.	-0.124939
-0.881127	software was coded.	-0.124939
-1.171039	be optimized further.	-0.124939
-1.823421	} // Branch/loop	-0.124939
-2.177794	by a key?	-0.124939
-0.598586	*)alloca(n * sizeof(float));	-0.124939
-0.463732	--------x a/1=a x-xxx-x--	-0.124939
-0.901084	-- - xx	-0.124939
-0.359002	a unique key.	-0.124939
-1.162601	end user. Menus,	-0.124939
-0.961003	by default, conform	-0.124939
-0.896169	(columns * sizeof(float)).	-0.124939
-2.341047	with a password.	-0.124939
-0.359002	type casting. Linked	-0.124939
-1.371287	Intel vector classes):	-0.124939
-1.857152	of different compilers.............................................................................	-0.124939
-1.193096	before any transition	-0.124939
-0.527412	and subtraction (3	-0.124939
-0.359002	be obeyed. Copy	-0.124939
-0.580838	compiler, v. 10.1.020.	-0.124939
-0.463732	See ISO/IEC TR18015	-0.124939
-1.078631	what is happening.	-0.124939
-1.498998	or by keys	-0.124939
-1.004497	an overloaded assignment	-0.124939
-0.463732	65 7.33 Namespaces...........................................................................................................	-0.124939
-1.573985	different versions alternatingly	-0.124939
-0.994836	c, d, e,	-0.124939
-0.359002	v. 1.4, 2005.	-0.124939
-0.764737	by defining _mm_malloc	-0.124939
-0.594865	handler calls exit(),	-0.124939
-0.585222	memset(list, 0, sizeof(list));	-0.124939
-0.527397	controlled. Small hand-held	-0.124939
-1.626354	a * a;}	-0.124939
-0.896524	Borland C++ 5.82	-0.124939
-1.203346	within a year	-0.124939
-0.359002	the series: ex	-0.124939
-0.359002	1./24., 1./120., 1./720.,	-0.124939
-0.359002	instruction latencies, throughputs	-0.124939
-0.527397	i, i_div_3; for(i=i_div_3=0;	-0.124939
-0.557838	possible. Template meta-	-0.124939
-0.660012	{ goto CFALSE;	-0.124939
-1.929534	else { CFALSE:	-0.124939
-0.855613	in registers. Except	-0.124939
-0.359002	www.agner.org/ optimize/#vectorclass Include	-0.124939
-0.764749	be kept entirely	-0.124939
-0.601380	of it (&ArraySize)	-0.124939
-0.463732	in ASCII form.	-0.124939
-0.600505	and) will cut	-0.124939
-0.881127	software was developed.	-0.124939
-1.657097	the clock frequency,	-0.124939
-2.341047	with a lineage	-0.124939
-0.597673	neither faster nor	-0.124939
-1.951190	{ return x*x	-0.124939
-1.240224	your own error-handling	-0.124939
-0.577678	no clear correspondence	-0.124939
-1.298636	threads are areas	-0.124939
-0.600904	CPU may occasionally	-0.124939
-1.057926	of calculations forms	-0.124939
-0.359002	programming, modularity, reusability	-0.124939
-2.209744	See page 141.	-0.124939
-0.901432	structures with First-In-First-Out	-0.124939
-1.778704	is very old-fashioned.	-0.124939
-0.598128	#endif return n;}	-0.124939
-1.459342	to replace u[1]	-0.124939
-0.598269	give some indication	-0.124939
-0.789104	shift operation. x*8	-0.124939
-2.188162	is more complicated.	-0.124939
-1.495154	same memory block,	-0.124939
-0.359002	deque (doubly ended	-0.124939
-1.146920	for Windows, -msse2,	-0.124939
-0.601963	Choose the strongest	-0.124939
-1.920745	important to remember	-0.124939
-0.828405	source file. Keep	-0.124939
-0.557829	from disk. Memory-hungry	-0.124939
-2.429856	with the sizeof	-0.124939
-1.920422	and a server	-0.124939
-0.599948	(RTTI), which affects	-0.124939
-1.305475	the dispatch decision	-0.124939
-0.359002	2 0.63 0.75	-0.124939
-1.067355	Core 2 0.77	-0.124939
-1.137054	dependency chain. Nothing	-0.124939
-0.596903	2: template <bool	-0.124939
-2.719997	is a staircase	-0.124939
-0.601674	2.6f; The ?:	-0.124939
-0.899732	speed, memory economy	-0.124939
-0.359002	1./4.790016E8, 1./6.22702E9, 1./8.71782E10,	-0.124939
-1.951190	{ return ipow(x,10);	-0.124939
-0.463732	conversion, shuffling, packing,	-0.124939
-0.359002	Addison-Wesley, 2003. Contains	-0.124939
-1.634968	have an attribute	-0.124939
-0.557829	my free E-book	-0.124939
-2.394433	a = (int)d;	-0.124939
-0.588095	16 Table 7.2.	-0.124939
-0.359002	Now s0, s1,	-0.124939
-0.541338	10.1.020. Functions _intel_fast_memcpy	-0.124939
-0.463732	to windows, graphic	-0.124939
-1.710891	functions for vectors........................................................................	-0.124939
-2.640857	// Example 9.1a	-0.124939
-2.640857	// Example 9.1b	-0.124939
-1.999077	unsigned int absvalue,	-0.124939
-1.351454	header file mathimf.h	-0.124939
-2.314404	use the GetTickCount	-0.124939
-0.586057	for XMM registers;	-0.124939
-0.596981	static libraries (.lib	-0.124939
-2.102969	for a 2'nd	-0.124939
-1.641200	cost of verifying,	-0.124939
-0.902318	fast. The lesson	-0.124939
-0.359002	access. Sequential forward	-0.124939
-3.208921	of the original,	-0.124939
-0.902673	attempts to translate	-0.124939
-0.359002	to experience. Occasionally,	-0.124939
-0.570619	for significant improvements.	-0.124939
-0.901817	largest_abs = absvalue;	-0.124939
-1.373690	code section position-independent,	-0.124939
-2.177794	by a conditional	-0.124939
-0.588095	97 Table 9.1.	-0.124939
-0.889688	a stack frame,	-0.124939
-0.595307	"standard stack frame"	-0.124939
-0.557825	-128 127 int8_t	-0.124939
-1.114607	(see p. 104).	-0.124939
-0.359002	Vec4f Vec2d Vec8f	-0.124939
-0.359002	Vec16s Vec16us Vec8i	-0.124939
-0.359002	x^1, x^2, x^3,	-0.124939
-0.880384	Model-specific dispatching ....................................................................................	-0.124939
-1.023620	b ? 1.5f	-0.124939
-0.359002	optimization /GL --combine	-0.124939
-1.143308	4 AVX2 _mm256_i32gather_epi32	-0.124939
-0.885295	pointer aliasing. __declspec(noalias)	-0.124939
-1.177181	of CPUs unequally	-0.124939
-0.588096	(~a&c) | (b&c)	-0.124939
-1.878980	as the .exe	-0.124939
-0.557834	case 2: printf("Gamma");	-0.124939
-0.597952	2'nd order polynomial:	-0.124939
-0.901544	separated by commas.	-0.124939
-1.078317	used and popped	-0.124939
-0.894977	and software engineering	-0.124939
-0.902703	calculating a polynomial.	-0.124939
-1.601094	avoid the burdensome	-0.124939
-0.463732	IDE. Free trial	-0.124939
-1.408092	code becomes contiguous.	-0.124939
-1.063141	YMM registers .................................................................	-0.124939
-0.590135	programmer typically thinks	-0.124939
-0.598586	xxn * _mm_load_ps(coef+i);	-0.124939
-0.586052	for later maintenance.	-0.124939
-1.078771	installation of downloaded	-0.124939
-1.192429	version return (*CriticalFunction)(parm1,	-0.124939
-0.593316	a1 / b1;	-0.124939
-0.600355	aiming at explaining	-0.124939
-1.505073	loop count (ArraySize)	-0.124939
-2.393331	should be prepared	-0.124939
-1.009766	The so-called iterators	-0.124939
-0.586788	code" actually implies	-0.124939
-2.367289	rather than 200.	-0.124939
-0.359002	without jeopardizing safety,	-0.124939
-0.550877	a&(b|c) x-xxxx--x (a|b)&(a|c)	-0.124939
-0.359002	v. 4.1.0, 2006	-0.124939
-0.563110	Iss. 4, 2007	-0.124939
-0.359002	Copyright © 2004	-0.124939
-2.066598	(see page 53).	-0.124939
-1.803091	using a ready-made	-0.124939
-2.304254	{ // Detect	-0.124939
-2.283633	such as GetPrivateProfileString	-0.124939
-1.032820	by calling vector::reserve	-0.124939
-1.459254	: public CParent<CChild2>	-0.124939
-1.286394	of variable storage.............................................................................	-0.124939
-1.940741	necessary to query	-0.124939
-2.640857	// Example 7.33b	-0.124939
-0.601772	(new and delete).	-0.124939
-1.554844	faster to compose	-0.124939
-2.541450	the function prototype:	-0.124939
-0.600378	minimizing memory fragmentation.	-0.124939
-2.743091	It is OK,	-0.124939
-2.283633	such as sorting,	-0.124939
-0.660012	b2, y1, y2;	-0.124939
-1.171859	and cause fatal	-0.124939
-2.397645	is the scarcity	-0.124939
-0.764737	Mostly obsolete. Rick	-0.124939
-1.289475	without cache MOVNTI	-0.124939
-2.213916	the value -100+100+100	-0.124939
-0.902513	Structures and classes............................................................................................	-0.124939
-0.598556	constants, array initializer	-0.124939
-0.359002	and cryptography (www.intel.com).	-0.124939
-0.359002	1./5040., 1./40320., 1./362880.,	-0.124939
-2.219197	will be non-zero,	-0.124939
-0.868282	// constructor initializes	-0.124939
-0.601394	f(x) or g(x)	-0.124939
-1.499191	when you discover	-0.124939
-0.873084	will give -2.0	-0.124939
-2.066598	(see page 93).	-0.124939
-1.296571	optimized code (release	-0.124939
-1.200630	not on publicly	-0.124939
-1.083031	are highly optimized,	-0.124939
-0.902703	show a discrete	-0.124939
-0.359002	report /Qopt-report -opt-report	-0.124939
-1.625523	have been unsatisfied	-0.124939
-2.192153	are not safe,	-0.124939
-0.595307	and stack entries	-0.124939
-2.305491	than the other,	-0.124939
-0.901759	simultaneously or seemingly	-0.124939
-0.900953	through an imported	-0.124939
-2.593306	is not standardized.	-0.124939
-0.601851	modification to compensate	-0.124939
-0.463732	to test, maintain	-0.124939
-0.878688	functions like sin.	-0.124939
-0.359002	log, exp, sin,	-0.124939
-2.429856	with the rightmost	-0.124939
-1.593278	programming language ...............................................................................	-0.124939
-2.368199	because the insertion	-0.124939
-0.899627	public data object:	-0.124939
-0.594869	library versions instead.	-0.124939
-2.534636	to be saved.	-0.124939
-1.077908	bc = _mm_andnot_si128(mask,	-0.124939
-2.640857	// Example 8.11b	-0.124939
-2.640857	// Example 8.11a	-0.124939
-0.463732	Most IDE's (Integrated	-0.124939
-0.902723	order is opposite).	-0.124939
-0.463732	developer.intel.com. AMD: "AMD64	-0.124939
-0.601423	Friday = 0x20,	-0.124939
-1.929534	else { DTRUE:	-0.124939
-0.660012	{ goto DTRUE;	-0.124939
-1.078764	common to exchange	-0.124939
-0.898877	code, which supposedly	-0.124939
-1.030502	the binary digits.	-0.124939
-2.016540	on page 44.	-0.124939
-0.570619	seven significant digits,	-0.124939
-0.575704	external libraries. www.agner.org/optimize/#vectorclass	-0.124939
-0.601320	(approximately): if (absvalue	-0.124939
-2.151139	in example 8.23b	-0.124939
-0.587474	defined(__unix__) || defined(__GNUC__)	-0.124939
-0.594346	The common excuse	-0.124939
-0.836084	#define FUNCNAME SelectAddMul_SSE2	-0.124939
-1.321737	of course system-specific.	-0.124939
-0.575701	unpacking needed. Predictable	-0.124939
-1.021953	vector size. Later	-0.124939
-1.786042	C++ compilers www.agner.org/	-0.124939
-0.541338	by physical factors.	-0.124939
-0.580842	and >= operators).	-0.124939
-0.587474	(!a&&c) || (b&&c)	-0.124939
-0.597422	!= 0; 35	-0.124939
-2.178003	} } 34	-0.124939
-0.359002	The characters '?',	-0.124939
-0.359002	{1.1, 0.3, -2.0,	-0.124939
-1.226360	= (unsigned int)a	-0.124939
-0.902300	bits for holding	-0.124939
-1.692045	a separate module,	-0.124939
-0.527397	data processing. Running	-0.124939
-1.379728	take the hint,	-0.124939
-1.376722	depend on system-specific	-0.124939
-0.601423	iset = instrset_detect();	-0.124939
-0.601394	locally or remotely.	-0.124939
-0.897963	with most distributions	-0.124939
-0.359002	Booth: "Inner Loops:	-0.124939
-0.601631	excuse that "we	-0.124939
-0.600534	object. A little-known	-0.124939
-0.359002	and Adolfy Hoisie:	-0.124939
-0.599444	method using InstructionSet():	-0.124939
-0.359002	functions Encryption, decryption,	-0.124939
-1.214344	in my crystal	-0.124939
-0.463732	for millisecond resolution.	-0.124939
-0.550877	or completely absent	-0.124939
-0.660012	a PC. Similarly,	-0.124939
-2.648786	the same name,	-0.124939
-1.064252	largest element (approximately):	-0.124939
-1.203386	computer is reset	-0.124939
-0.902703	show a disassembly,	-0.124939
-0.726952	a natural ordering?	-0.124939
-0.901342	concentrated on arranging	-0.124939
-0.596980	insert optimization hints	-0.124939
-0.589681	test their functionality.	-0.124939
-0.601772	fetched and decoded	-0.124939
-0.505122	b ---xx---- a<<b<<c=a<<(b+c)	-0.124939
-1.888682	the variable __intel_cpu_feature_indicator	-0.124939
-0.359002	Technology Journal Vol.	-0.124939
-0.463732	diagonal remain unchanged.	-0.124939
-0.601876	1's is unchanged,	-0.124939
-1.300256	put a tag	-0.124939
-0.359002	<< 6); Or,	-0.124939
-0.359002	also included. Combining	-0.124939
-1.078555	cycles to fetch	-0.124939
-1.503570	{ for (c1	-0.124939
-0.601876	heap is reserved	-0.124939
-1.995759	have a balanced	-0.124939
-0.881846	actually quite convenient.	-0.124939
-0.595706	most processors (when	-0.124939
-0.596461	copying without effectively	-0.124939
-0.971486	empty throw() specification.	-0.124939
-1.302463	is too high.	-0.124939
-1.724534	have no native	-0.124939
-0.900893	purposes than rendering	-0.124939
-2.102394	on a First-In-First-	-0.124939
-1.585968	a cache line:	-0.124939
-0.601079	lost. This dilemma	-0.124939
-2.394433	a = (b*c)/d,	-0.124939
-0.600505	value will propagate	-0.124939
-0.660012	predicted perfectly varies	-0.124939
-1.370943	same cache line,	-0.124939
-1.639672	user interface framework...........................................................................	-0.124939
-0.875601	of n floats:	-0.124939
-1.577928	long long timediff[NumberOfTests];	-0.124939
-0.885304	e.g. four floats.	-0.124939
-1.114607	(see p. 22).	-0.124939
-0.601448	templates // Place	-0.124939
-0.588105	}; char abc;	-0.124939
-0.902723	integers is ambiguous	-0.124939
-1.435320	specific CPU models.	-0.124939
-0.463732	not 123 correspond	-0.124939
-0.600089	AMD only _mm_permutevar_ps	-0.124939
-2.640857	// Example 7.38b.	-0.124939
-0.901817	Table[x] = Y;	-0.124939
-0.600983	__declspec(align(64)) int BigArray[1024];	-0.124939
-0.601772	3A and 3B.	-0.124939
-1.921450	the following steps	-0.124939
-1.735172	time than looping	-0.124939
-2.394433	a = Func1(2);	-0.124939
-0.359002	funda- mentally flawed	-0.124939
-1.383432	to know about.	-0.124939
-0.359002	with nagging pop-up	-0.124939
-0.601665	Day for signifying	-0.124939
-0.597276	called register renaming.	-0.124939
-0.527404	by me manually,	-0.124939
-0.588671	9.3 #include <malloc.h>	-0.124939
-1.446675	doing the spell	-0.124939
-1.673175	not always sequential,	-0.124939
-0.601423	(temp = &list[0];	-0.124939
-0.660012	or NAN (Not	-0.124939
-0.598269	may some day	-0.124939
-0.595165	some extra complications.	-0.124939
-0.505122	dominating. At least,	-0.124939
-0.599341	AQtime, Intel VTune	-0.124939
-0.573382	Math Library __vrs4_expf	-0.124939
-2.712349	- - x-xx----x	-0.124939
-1.288659	The profiler identifies	-0.124939
-0.463732	---xxx--- a/a=1 --------x	-0.124939
-0.601423	(a|b)&(a|c) = a|(b&c)	-0.124939
-0.597412	hold 8 double's	-0.124939
-0.586801	use linked lists.	-0.124939
-0.359002	array initializer lists,	-0.124939
-2.955917	can be programmed	-0.124939
-2.154364	be used most.	-0.124939
-0.601448	154 // Print	-0.124939
-0.505122	branch. Microprocessor designers	-0.124939
-0.958070	processor core. Try	-0.124939
-0.597552	not stored contiguously	-0.124939
-2.640857	// Example 8.1b	-0.124939
-0.601448	y; // x,y	-0.124939
-2.640857	// Example 8.1a	-0.124939
-0.601394	-fwrapv or -fno-strict-overflow.	-0.124939
-0.463732	a re- usable	-0.124939
-0.601448	occurred. // Reset	-0.124939
-1.300547	increase the likelihood	-0.124939
-1.678923	when a fixed-size	-0.124939
-0.589691	is implementation dependent.	-0.124939
-0.463732	// Remove right-most	-0.124939
-0.359002	-1 (a&~b)|(~a&b)=a^b ---------	-0.124939
-0.599571	14.23 page 143.	-0.124939
-0.601772	(&& and ||).	-0.124939
-2.295018	order to facilitate	-0.124939
-0.359002	v. 3.1, 2007.	-0.124939
-2.640857	// Example 12.9b.	-0.124939
-0.894110	at address esp+8	-0.124939
-1.305490	stored together ......................................	-0.124939
-0.899732	speed, memory economy,	-0.124939
-0.789093	been updated lately.	-0.124939
-1.300005	array of structures:	-0.124939
-1.880050	are accessed row-wise,	-0.124939
-2.101628	make a lookup-table	-0.124939
-0.601598	can't be reached	-0.124939
-2.640857	// Example 8.16	-0.124939
-0.359002	(in Windows: __rdtsc()).	-0.124939
-0.895082	offset table (GOT).	-0.124939
-2.640857	// Example 8.17	-0.124939
-2.640857	// Example 8.18	-0.124939
-0.851886	on access. Sequential	-0.124939
-0.598053	numbers. You may,	-0.124939
-0.601855	techniques of multithreading.	-0.124939
-0.600401	runtime. Example 7.43	-0.124939
-2.640857	// Example 7.42	-0.124939
-2.534636	to be renewed.	-0.124939
-2.640857	// Example 7.45	-0.124939
-2.640857	// Example 7.44	-0.124939
-2.640857	// Example 7.4.	-0.124939
-0.550877	(2,2,2,2,2,2,2,2) Is16vec8 two(2,2,2,2,2,2,2,2);	-0.124939
-0.601394	for-loop or while-loop	-0.124939
-1.064114	compiled code. Compiled	-0.124939
-1.503920	table of 1/n!	-0.124939
-1.237298	512 512 378.7	-0.124939
-1.300113	counter is counting	-0.124939
-0.359002	supported fprintf(stderr, "\nError:	-0.124939
-0.764737	and underflow neutralize	-0.124939
-0.463732	= 2.0f; x.i	-0.124939
-0.359002	513 2056 38.1	-0.124939
-0.359002	511 2040 38.7	-0.124939
-0.878685	compiler allows "__attribute__((visibility("hidden")))".	-0.124939
-0.601772	games and animations	-0.124939
-0.359002	Optimization Reference Manual".	-0.124939
-0.902300	made for demonstration	-0.124939
-0.873081	of graphics cards,	-0.124939
-0.359002	1./39916800., 1./4.790016E8, 1./6.22702E9,	-0.124939
-0.601286	values by hand	-0.124939
-1.463530	its own caller,	-0.124939
-0.789104	8, 16, 32,	-0.124939
-0.593763	next line provokes	-0.124939
-1.529360	the second operand.	-0.124939
-2.532512	to make memory-hungry	-0.124939
-0.601876	message is provoked	-0.124939
-1.077273	columns = 32;	-0.124939
-0.601423	DynamicArray[i] = WhateverFunction(i);	-0.124939
-2.107799	they are unavoidable.	-0.124939
-0.836084	option -fno-pic apparently	-0.124939
-2.773232	in a DLL.	-0.124939
-0.359002	cache MOVNTPS _mm_stream_ps	-0.124939
-0.601851	applications to perform	-0.124939
-0.595013	ALIGN ; mark_end;	-0.124939
-2.066598	(see page 96).	-0.124939
-0.601448	x); // x^1,	-0.124939
-1.568651	the option -ftrapv,	-0.124939
-1.070094	in library libircmt.lib.	-0.124939
-0.593316	(1. / 1.2345);	-0.124939
-2.303067	code is repetitive.	-0.124939
-0.584313	n; switch (n)	-0.124939
-0.600682	critical time consumers.	-0.124939
-0.601866	ignore a request	-0.124939
-1.203299	representation of N:	-0.124939
-0.900563	= { "Alpha",	-0.124939
-0.598082	/ 2 (be	-0.124939
-0.601876	CPUID is artificially	-0.124939
-0.601394	game or animation.	-0.124939
-0.601772	pros and cons	-0.124939
-1.048689	a certain tolerance.	-0.124939
-0.359002	0x2F00, 0x3700, 0x3F00	-0.124939
-0.598586	anda * 17is	-0.124939
-1.059216	than its reputation.	-0.124939
-0.893443	char 64 Iu8vec8	-0.124939
-1.149667	pointer aliasing (/Oa).	-0.124939
-0.359002	minimum, maximum, saturated	-0.124939
-1.524341	32 bit offsets).	-0.124939
-0.575704	xor mov $B1$2:	-0.124939
-1.680333	a good knowledge	-0.124939
-1.090070	= 1.0; list[i].b	-0.124939
-1.277874	executable file stub.	-0.124939
-0.601394	Quine–McCluskey or Espresso)	-0.124939
-0.567277	"Software Optimization Guide	-0.124939
-0.601772	Library" and "Integrated	-0.124939
-0.601851	T+1 to T+6,	-0.124939
-0.895540	are some examples:	-0.124939
-0.595021	We must bear	-0.124939
-1.301907	} Here, log(2.0)	-0.124939
-2.265201	n.a. - andnot(a,a)	-0.124939
-2.640857	// Example 12.8a.	-0.124939
-0.359002	u.i &= 0x7FFFFFFF;	-0.124939
-0.359002	Windows, -msse2, -mavx,	-0.124939
-0.601394	"static" or "__attribute__((visibility	-0.124939
-1.004475	mov mov 2:8+esp	-0.124939
-0.359002	= _mm_blendv_epi8(bc, c2,	-0.124939
-1.660547	to optimize anything,	-0.124939
-0.598528	of clock pulses	-0.124939
-1.818943	operating systems disappears	-0.124939
-0.567277	are search requests	-0.124939
-1.801004	is less reliable.	-0.124939
-1.004467	as possible. Typically	-0.124939
-0.601286	destructor by constructing	-0.124939
-0.463732	not allowed. Non-public	-0.124939
-0.598082	below 2 GB,	-0.124939
-0.575698	shared. Any writable	-0.124939
-0.359002	r2, c1, c2;	-0.124939
-0.601394	stdint.h or inttypes.h	-0.124939
-0.600395	threads from attempting	-0.124939
-0.726952	it takes. Debugging.	-0.124939
-2.088706	by using indexes,	-0.124939
-0.660012	a+a+a+a=a*4 -(-a)=a --xxxxxx-	-0.124939
-1.300740	what the preprocessor	-0.124939
-0.594700	new processor enters	-0.124939
-0.359002	v. 4.5.2, July	-0.124939
-1.069830	const float lookup[2]	-0.124939
-0.359002	than -156. Surprisingly,	-0.124939
-0.896404	equally efficient because,	-0.124939
-0.596461	speed without jeopardizing	-0.124939
-1.875696	function that draws	-0.124939
-0.601772	ASP and UNIX	-0.124939
-0.593768	4 AVX _mm256_permutevar_ps	-0.124939
-0.601772	'@' and '$'	-0.124939
-0.588105	contrived examples exist.	-0.124939
-0.588095	168.3 Table 9.3.	-0.124939
-2.154364	be used freely	-0.124939
-0.596461	unit-test without taking	-0.124939
-0.557834	compare absolute values:	-0.124939
-0.359002	Automatic paralleli- zation	-0.124939
-0.599362	page size (4096).	-0.124939
-0.789093	6 Development process......................................................................................................	-0.124939
-0.463732	the G values,	-0.124939
-0.359002	operands: minimum, maximum,	-0.124939
-0.596374	contain useful discussions	-0.124939
-2.102394	use a #define,	-0.124939
-0.359002	point status: _fpreset();	-0.124939
-0.585224	differences were observed	-0.124939
-0.599480	reducing example 15.1d	-0.124939
-0.894977	whole software package,	-0.124939
-0.726952	than allocating piecewise	-0.124939
-0.599705	contains integer division:	-0.124939
-0.582149	8 columns unused.	-0.124939
-0.463732	b;}; Sab ab[size];	-0.124939
-2.071239	that can steal	-0.124939
-1.550603	x = array[i++]	-0.124939
-0.601772	CPLDs and FPGAs.	-0.124939
-0.887058	s += x^n/n!	-0.124939
-0.726952	3.7 File access................................................................................................................	-0.124939
-1.446324	converted to OMF	-0.124939
-0.851912	data fit nicely	-0.124939
-0.892326	under test finishes	-0.124939
-1.364457	the static keyword:	-0.124939
-1.364457	the static keyword,	-0.124939
-0.359002	1 0.5ns. 2GHz	-0.124939
-0.601772	compression and cryptography	-0.124939
-3.071155	in the Professional	-0.124939
-1.282202	the register keyword.	-0.124939
-2.283633	such as strcpy,	-0.124939
-2.640857	// Example 7.35b	-0.124939
-2.640857	// Example 7.35a	-0.124939
-2.177794	by a plain	-0.124939
-0.660012	#include <xmmintrin.h> _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);	-0.124939
-0.359002	cache MOVNTI _mm_stream_si32	-0.124939
-0.895170	than 2 GB.	-0.124939
-2.834872	it is advisable	-0.124939
-0.359002	C++ imple- mentations	-0.124939
-1.143308	4 AVX2 _mm_i32gather_ps	-0.124939
-0.601423	b+c = 100000001.23456.	-0.124939
-0.899779	element Example 9.6b	-0.124939
-1.394996	on non-Intel processors).	-0.124939
-2.393331	should be weighed	-0.124939
-0.573385	set, e.g. /arch:SSE2.	-0.124939
-0.359002	Builder 5, 2009).	-0.124939
-0.463732	have inserted UnusedFiller	-0.124939
-1.174070	has several flaws:	-0.124939
-0.463732	33 7.5 Booleans...................................................................................................................	-0.124939
-0.589207	the mathematical notion	-0.124939
-0.505131	CPUs (Intel Atom).	-0.124939
-0.597210	exceeds 64 kbytes.	-0.124939
-0.359002	compilers (Microsoft, Intel)	-0.124939
-1.861357	a single comparison:	-0.124939
-1.495252	available from Intel.	-0.124939
-0.836084	calling conventions. FreeBSD	-0.124939
-1.925587	the performance somewhat.	-0.124939
-1.974373	SSE2 instruction set:	-0.124939
-0.596260	Dispatcher void SelectAddMul_dispatch(short	-0.124939
-0.588671	16.1 #include <intrin.h>	-0.124939
-2.955917	can be reduced.	-0.124939
-0.359002	or NAN. Avoiding	-0.124939
-0.889953	other error reporting	-0.124939
-0.901432	associated with profiling,	-0.124939
-1.641813	discussion of profiling.	-0.124939
-1.193473	Are objects numbered	-0.124939
-0.463732	the planning phase	-0.124939
-0.593316	1. / (b1	-0.124939
-0.598765	Borland/CodeGear/Embarcadero C++ builder	-0.124939
-0.463732	fixed strides. Uncached	-0.124939
-2.397645	is the responsi-	-0.124939
-0.359002	programmer hasn't thought	-0.124939
-0.897011	math library (SVML).	-0.124939
-0.601056	x-xxx - xx(-)x-	-0.124939
-2.640857	// Example 8.23a.	-0.124939
-1.078222	rows are indexed	-0.124939
-1.072722	CPUs, but event-counters	-0.124939
-0.591308	happen quite often.	-0.124939
-0.892146	embedded systems Microcontrollers	-0.124939
-1.678923	when a genuine	-0.124939
-1.775238	time to calculate.	-0.124939
-0.541332	object-oriented programming, modularity,	-0.124939
-1.713546	sake of modularity.	-0.124939
-0.601851	instruction to localize	-0.124939
-2.127939	want to optimize,	-0.124939
-0.577673	on it. Instead	-0.124939
-0.836084	hot spot. Repeating	-0.124939
-0.961003	in parallel. Fine-grained	-0.124939
-0.359002	UNIX shell script.	-0.124939
-0.557825	storage p. 28)	-0.124939
-0.598763	will also work,	-0.124939
-0.806134	in column 28,	-0.124939
-1.767402	has been calculated.	-0.124939
-2.670223	for the "FDIV	-0.124939
-1.018393	C++ v. 3.1,	-0.124939
-1.200549	point code slower,	-0.124939
-2.429856	with the reciprocal:	-0.124939
-0.601394	__fastcall or __attribute__((fastcall)).	-0.124939
-2.195854	short int cc[size]	-0.124939
-0.359002	%1 \n fistpl	-0.124939
-0.359002	named MKL, VML	-0.124939
-0.359002	(e.g. Sandy Bridge)	-0.124939
-0.557825	file MMX mmintrin.h	-0.124939
-0.597944	types: long long,	-0.124939
-1.699544	the microprocessor wastes	-0.124939
-0.902723	division is inexact	-0.124939
-1.923513	and other odd-sized	-0.124939
-0.359002	1./3628800., 1./39916800., 1./4.790016E8,	-0.124939
-2.101628	make a thread-like	-0.124939
-0.601851	T to T+5,	-0.124939
-0.601772	OS and Itanium	-0.124939
-0.582140	problem: 1. Relocation.	-0.124939
-1.073222	between CPU brands,	-0.124939
-0.764749	513 513 2056	-0.124939
-0.597888	row < NUMROWS;	-0.124939
-0.359002	Instruction tables: Lists	-0.124939
-1.705059	1; } module2.cpp	-0.124939
-2.640857	// Example 12.8b.	-0.124939
-0.359002	1.00 0.35 0.29	-0.124939
-2.640857	// Example 14.18c	-0.124939
-0.573394	mmintrin.h SSE xmmintrin.h	-0.124939
-0.575707	Use square blocking:	-0.124939
-0.359002	0.82 0.59 0.27	-0.124939
-0.463732	0.25 0.28 0.22	-0.124939
-0.875598	4) | ((C	-0.124939
-0.588096	0x0F) | ((B	-0.124939
-0.901817	b[i] = Func(a[i]);	-0.124939
-2.030478	a program creates	-0.124939
-1.921450	the following work-around	-0.124939
-0.591321	for intermediate results,	-0.124939
-0.600960	(other than log)	-0.124939
-0.463732	x---x---x x-xxx---- a*b*c=a*(b*c)	-0.124939
-0.598354	than any non-vector	-0.124939
-0.601598	container be recycled?	-0.124939
-0.601230	(add with carry)	-0.124939
-0.590140	perhaps Mac OS.	-0.124939
-1.992339	} The FactorialTable	-0.124939
-0.584309	%10I64i", i, timediff[i]);	-0.124939
-1.745437	the programmer can.	-0.124939
-0.359002	(e.g. DEC, JNZ).	-0.124939
-0.878716	critical dependency chain,	-0.124939
-0.875609	in addition to)	-0.124939
-0.359002	(a+c==b+c)=(a==b) ----x---- !(a<b)=(a>=b)	-0.124939
-2.016540	on page 134.	-0.124939
-0.527404	table 9.3 shows,	-0.124939
-1.077144	whenever it feeds	-0.124939
-0.587477	list[i] > 1.0)	-0.124939
-0.563110	to general improvements	-0.124939
-1.767402	has been introduced	-0.124939
-0.557834	programs do. Hence,	-0.124939
-0.463732	set (called x86)	-0.124939
-2.640857	// Example 8.2a	-0.124939
-2.640857	// Example 8.2b	-0.124939
-0.594865	and calls alternately	-0.124939
-0.505122	Interrupt service routines	-0.124939
-2.541450	the function billions	-0.124939
-0.836084	Opteron K8 1.09	-0.124939
-0.601145	xn as x4∙xn-4.	-0.124939
-2.066598	(see page 103),	-0.124939
-0.583293	to around 1980	-0.124939
-2.151139	in example 14.7b,	-0.124939
-2.640857	// Example 14.7b.	-0.124939
-0.359002	next year. Ignoring	-0.124939
-2.192153	are not affected	-0.124939
-0.359002	+ 3.; x.d	-0.124939
-0.586793	} x; x.f	-0.124939
-2.640857	// Example 7.9b	-0.124939
-2.640857	// Example 7.9a	-0.124939
-1.604274	is simply identical.	-0.124939
-0.806134	exception occurs somewhere	-0.124939
-0.588094	!a && !b	-0.124939
-1.783204	the memory bus	-0.124939
-0.527404	First-In-First- Out (FIFO)	-0.124939
-0.596160	safe programming practice,	-0.124939
-2.151139	in example 8.24	-0.124939
-2.640857	// Example 8.25	-0.124939
-1.272273	object file disassembler.	-0.124939
-1.255482	Boolean operators &&,	-0.124939
-2.640857	// Example 8.20	-0.124939
-1.200472	calculated as (critical	-0.124939
-2.640857	// Example 8.22	-0.124939
-1.281040	are very smart.	-0.124939
-2.640857	// Example 12.9a.	-0.124939
-0.902300	16 for SSE2,	-0.124939
-0.599166	r.a + r.b;}	-0.124939
-0.577673	shift operations. Multiplying	-0.124939
-2.568185	and the post-increment	-0.124939
-1.788227	information about bugs,	-0.124939
-0.601394	C0::f or C1::f.	-0.124939
-0.359002	F32vec4 F64vec2 F32vec8	-0.124939
-0.359002	// Dispatcher. Will	-0.124939
-0.527397	warn against overkill.	-0.124939
-2.640857	// Example 8.3b	-0.124939
-0.900953	uses an ordinary	-0.124939
-0.596461	programming without paying	-0.124939
-0.359002	Intel Technology Journal	-0.124939
-2.394433	a = -1.0E8,	-0.124939
-0.463732	cause slight imprecision	-0.124939
-1.520691	compiler doesn't provide	-0.124939
-1.470278	integer operations in-between	-0.124939
-1.888682	the variable __intel_cpu_feature_indicator_x.	-0.124939
-1.197823	to save recovery	-0.124939
-1.099276	Template Library (WTL).	-0.124939
-0.359002	v. 2.7, 2.8.	-0.124939
-1.190133	each bit indicates	-0.124939
-0.660012	|= 0x20; 46	-0.124939
-1.099276	Template Library (WTL):	-0.124939
-0.541332	from mispredictions. 44	-0.124939
-0.818712	a reliable decision.	-0.124939
-0.463732	x86 systems). 42	-0.124939
-0.527397	a place indicated	-0.124939
-0.359002	Covers PC's, workstations	-0.124939
-0.359002	i<100; i++,i2+=2.0f)a[i]=i2; 41	-0.124939
-0.463732	512 2048 230.7	-0.124939
-1.409940	is much faster,	-0.124939
-0.847528	0) ? (cc[i]	-0.124939
-1.888682	the variable 85	-0.124939
-0.600296	-fwhole- program /Qipo	-0.124939
-0.660012	quite tedious indeed.	-0.124939
-1.489283	= 1; a[1]	-0.124939
-0.585224	(C << 6);	-0.124939
-0.601665	attack for hackers.	-0.124939
-0.902723	multiplication is exact.	-0.124939
-0.601665	column; for (row	-0.124939
-0.596558	less user friendly.	-0.124939
-0.958081	is poorly predictable,	-0.124939
-1.639672	user interface (OnIdle	-0.124939
-1.600962	are often abusing	-0.124939
-0.595165	leaf function. Leaf	-0.124939
-0.359002	"Beta", "Gamma", "Delta"	-0.124939
-0.601423	Thursday = 0x10,	-0.124939
-0.789104	7.14 Functions ................................................................................................................	-0.124939
-0.601394	C1::Disp() or C2::Disp()	-0.124939
-0.599227	goes into sleep	-0.124939
-0.359002	called Single-Instruction-Multiple-Data (SIMD)	-0.124939
-0.600990	(e.g. an if-else	-0.124939
-0.570635	int CriticalFunction ();	-0.124939
-2.086278	you are feeding	-0.124939
-0.463732	(-a)*(-b)=a*b a/a=1 ----x---x	-0.124939
-0.601631	hacks that violate	-0.124939
-2.209744	See page 34.	-0.124939
-0.359002	0.95 0.6 1.19	-0.124939
-0.463732	---x----- x--xx---- (a&&b)||(a&&!b)=a	-0.124939
-0.858828	in special mathe-	-0.124939
-2.283633	such as VHDL	-0.124939
-2.393331	should be postponed	-0.124939
-0.582149	systems gives rise	-0.124939
-1.999077	unsigned int u[2]}	-0.124939
-0.463732	and web browsing	-0.124939
-2.006334	a loop counter:	-0.124939
-0.359002	= {1.1, 0.3,	-0.124939
-0.586793	vector processors. Henry	-0.124939
-0.598361	mode, we encounter	-0.124939
-1.308230	data cache. Bit-fields	-0.124939
-0.601580	output are unacceptable.	-0.124939
-0.589681	if their live-ranges	-0.124939
-0.463732	0.89 0.40 0.30	-0.124939
-0.601448	Time // Serialize	-0.124939
-0.567277	Architectures Optimization Reference	-0.124939
-0.600682	also time consuming,	-0.124939
-0.577688	bugs, compatibility problems,	-0.124939
-0.895082	offset table (GOT)	-0.124939
-0.836084	Opteron K8 0.38	-0.124939
-0.359002	---xx---- (a+c==b+c)=(a==b) ----x----	-0.124939
-1.679225	pointer is created,	-0.124939
-0.359002	double matrix[SIZE][SIZE]; transpose(matrix);	-0.124939
-0.599724	prevent cache contention.	-0.124939
-0.598128	(N-1)) return powN<(N1&(N1-1))==0,N1>::p(x)	-0.124939
-0.601665	squares: for (r1	-0.124939
-0.601286	way by wrapping	-0.124939
-2.955917	can be omitted,	-0.124939
-0.527397	p) {return p->a	-0.124939
-0.999913	not necessarily newer.	-0.124939
-0.359002	hardware circuits consisting	-0.124939
-1.634968	have an estimated	-0.124939
-1.377878	CPU dispatching. Underestimating	-0.124939
-2.141071	stored in edx.	-0.124939
-0.601772	Y and Z.	-0.124939
-0.359002	and IA-32 Architectures	-0.124939
-0.601394	variables or hide	-0.124939
-0.971486	empty throw() specification	-0.124939
-0.359002	by fetching, decoding	-0.124939
-0.359002	calls exit(), abort(),	-0.124939
-0.900893	use than others.	-0.124939
-0.593970	Y += Z;	-0.124939
-1.466153	dynamically allocated memory.................................................................	-0.124939
-1.801124	also be considered.	-0.124939
-1.941385	code in general.	-0.124939
-2.640857	// Example 7.38a.	-0.124939
-2.265129	There are hundreds	-0.124939
-0.599358	; double Func2(double	-0.124939
-1.220168	Microsoft Visual studio	-0.124939
-2.239089	have to reinvent	-0.124939
-0.601855	that of yesterday's	-0.124939
-0.580838	(gcc v. 4.5.2,	-0.124939
-0.359002	?Func@@YAXQAHAAH@Z PROC NEAR	-0.124939
-0.601674	these. The CodeGear,	-0.124939
-0.891358	The file http://www.agner.org/optimize/asmlib.zip	-0.124939
-0.806124	on CodeGear compiler)	-0.124939
-0.463732	511 511 2040	-0.124939
-2.640857	// Example 7.43a.	-0.124939
-1.597505	implemented as recursive	-0.124939
-0.463732	not testing. Trying	-0.124939
-0.593763	with line 29.	-0.124939
-0.596461	F1 without returning.	-0.124939
-0.594365	memcpy(b, a, sizeof(b));	-0.124939
-0.594350	system thread scheduler.	-0.124939
-0.895082	following table summarizes	-0.124939
-0.902233	happen that (b*c)	-0.124939
-0.591659	that simply prints	-0.124939
-1.986758	of code optimization",	-0.124939
-0.527412	= parabola (2.0f);	-0.124939
-1.172660	i; ... list[i	-0.124939
-0.601665	Template for pow(x,N)	-0.124939
-1.454266	DWORD PTR [eax+400]	-0.124939
-2.367289	rather than -156.	-0.124939
-2.955917	can be speeded	-0.124939
-0.601423	x.d = y.d	-0.124939
-2.955917	can be used:	-0.124939
-1.888682	the variable m.	-0.124939
-0.600983	x, int m)	-0.124939
-0.901817	x.a = y.a	-0.124939
-0.901817	x.b = y.b	-0.124939
-0.901817	x.c = y.c	-0.124939
-0.359002	code optimization", Coriolis	-0.124939
-0.505122	32, 64, ...).	-0.124939
-0.903059	template <int m>	-0.124939
-0.359002	&& WriteFile(handle, ...))	-0.124939
-3.071155	in the oldest	-0.124939
-0.601772	registers and correspondingly	-0.124939
-1.073840	It has excellent	-0.124939
-1.201578	f = (float)i;	-0.124939
-3.071155	in the grandparent	-0.124939
-0.359002	Vec8i Vec8ui Vec4q	-0.124939
-0.463732	/ sar ebx,1	-0.124939
-0.901610	u; if (u.i[1]	-0.124939
-1.466097	of its simplicity.	-0.124939
-0.557834	processes simultaneously. Actually,	-0.124939
-0.505131	to 3-dimensional geometry	-0.124939
-0.359002	128 Is32vec4 Vec4i	-0.124939
-1.553310	one that saves	-0.124939
-0.601077	pointer not aliased	-0.124939
-0.359002	Vec2d Vec8f Vec4d	-0.124939
-0.359002	objects, respectively (MS	-0.124939
-0.359002	a thread-like scheduling	-0.124939
-0.575698	storage (e.g. PowerPC).	-0.124939
-2.250648	is no guarantee	-0.124939
-0.359002	is over. Virtualization	-0.124939
-1.785683	the best algorithm.	-0.124939
-1.061980	will always compete	-0.124939
-2.278059	to do searches	-0.124939
-1.202540	register for both,	-0.124939
-0.573397	using nontemporal writes.	-0.124939
-0.601394	*p or p->member	-0.124939
-1.959451	I have confirmed	-0.124939
-0.601772	units and hence	-0.124939
-0.591976	EXCEPTION_EXECUTE_HANDLER : EXCEPTION_CONTINUE_SEARCH)	-0.124939
-2.640857	// Example 14.21.	-0.124939
-0.601145	quite as versatile.	-0.124939
-0.601772	reusability and systematization	-0.124939
-0.600378	smaller memory footprint.	-0.124939
-1.078317	memset and memcpy:	-0.124939
-0.601286	section by summing	-0.124939
-0.600378	units, memory ports,	-0.124939
-0.505122	|| expression. Assume,	-0.124939
-0.601674	design. The ultimate	-0.124939
-1.300296	stack is organized.	-0.124939
-0.601772	searching and parsing	-0.124939
-0.527404	Denmark. Copyright ©	-0.124939
-1.078023	problem. The official	-0.124939
-1.297120	function. This fragmentation	-0.124939
-0.359002	clumsy AND-OR construction	-0.124939
-1.202204	code are modified,	-0.124939
-2.065292	it takes 40%	-0.124939
-0.580838	2008, v. 9.0	-0.124939
-0.359002	option -read_only_relocs suppress.	-0.124939
-1.078631	efficiency is reflected,	-0.124939
-0.601772	136 and 137,	-0.124939
-0.573391	error condition terminates	-0.124939
-0.557825	above, p. 26).	-0.124939
-1.388138	be predicted perfectly.	-0.124939
-0.596162	64 32 16.4	-0.124939
-0.359002	/GL --combine -fwhole-	-0.124939
-2.060072	program is terminated	-0.124939
-0.527397	2 63 .	-0.124939
-0.601077	forwards, not backwards.	-0.124939
-0.359002	"Alpha", "Beta", "Gamma",	-0.124939
-0.597988	-(-a) very often,	-0.124939
-0.527404	branch pattern history,	-0.124939
-0.586806	x; public: c1()	-0.124939
-2.368199	because the integer-to-float	-0.124939
-2.064340	const int arraysize	-0.124939
-0.505122	Vec32uc Vec16s Vec16us	-0.124939
-0.463732	--xxxxxx- a-(-b)=a+b ---xxx-x-	-0.124939
-0.594346	pointer, common subexpressions,	-0.124939
-1.083020	to remove unreferenced	-0.124939
-2.341047	with a non-recursing	-0.124939
-2.029536	Intel compiler puts	-0.124939
-0.359002	Mathcad (v. 15.0)	-0.124939
-1.379391	generation of identifier	-0.124939
-0.594527	this language gained	-0.124939
-0.463732	early planning stage	-0.124939
-0.505122	for(i=0,i2=0; i<100; i++,i2+=2.0f)a[i]=i2;	-0.124939
-1.637223	intermediate code (byte	-0.124939
-0.599010	from library asmlib..	-0.124939
-1.077996	or for combining	-0.124939
-0.601777	limited in scope.	-0.124939
-0.601286	ms by selecting	-0.124939
-1.347387	data structures .............................................................	-0.124939
-0.579365	mutexes, database connections,	-0.124939
-0.601665	*temp; for (temp	-0.124939
-1.915130	must be added.	-0.124939
-0.463732	discussion. 7.33 Namespaces	-0.124939
-1.627842	and then merge	-0.124939
-2.066598	(see page 142).	-0.124939
-2.283633	such as flush	-0.124939
-0.359002	JavaScript, PHP, ASP	-0.124939
-0.591659	not well documented.	-0.124939
-1.928282	operating system standards.	-0.124939
-0.660012	7.8 Member pointers.......................................................................................................37	-0.124939
-0.601665	module for correctness	-0.124939
-1.189982	class objects (rather	-0.124939
-0.463732	is -0 (zero	-0.124939
-0.600983	time int CriticalFunction_Dispatch(int	-0.124939
-3.071155	in the BTB	-0.124939
-0.903059	compilers offer profile-guided	-0.124939
-0.806134	?Func@@YAXQAHAAH@Z ?Func@@YAXQAHAAH@Z PROC	-0.124939
-1.070557	exponent + 0x3FF	-0.124939
-2.640857	// Example 7.32a	-0.124939
-0.595012	string; while (*p	-0.124939
-0.567273	are usability issues,	-0.124939
-1.626554	The same coding	-0.124939
-0.891567	x[]); void F2(float	-0.124939
-1.295473	of compiler options.......................................................................................	-0.124939
-1.070557	exponent + 0x3FFF	-0.124939
-0.463732	cmp jl $B1$3:	-0.124939
-1.741233	a[i] = 0.0;	-0.124939
-0.359002	example 12.4b, rewritten	-0.124939
-0.463732	as semaphores, mutexes	-0.124939
-0.871620	2 AVX2 _mm256_i64gather_epi32	-0.124939
-0.902680	point of attack	-0.124939
-0.463732	the external clock.	-0.124939
-0.463732	pow, log, exp,	-0.124939
-0.596264	user access rights.	-0.124939
-1.600962	are often fluctuating	-0.124939
-2.310817	is to combine	-0.124939
-0.875592	(a<b && b<c	-0.124939
-0.901946	solution can incur	-0.124939
-1.687193	are also included.	-0.124939
-1.202204	directives are compiler-specific.	-0.124939
-1.713546	sake of security,	-0.124939
-1.202577	cast The reinterpret_cast	-0.124939
-1.299666	the so-called CPU-dispatcher	-0.124939
-1.769413	b * 1.5f;	-0.124939
-0.600113	three functions Sum1,	-0.124939
-0.881863	configuration files (*.ini	-0.124939
-1.300113	integer is returned.	-0.124939
-0.359002	K8 1.09 1.25	-0.124939
-0.726952	0.18 0.11 1.21	-0.124939
-1.828475	a class (also	-0.124939
-0.563110	advanced prediction mechanisms.	-0.124939
-0.463732	Henry S. Warren,	-0.124939
-0.463732	j++) 39 matrix[i][j]	-0.124939
-0.601851	however, to pass	-0.124939
-1.468048	CPU dispatch mechanisms,	-0.124939
-0.550877	inlining x-xxxx--x Constantfolding	-0.124939
-1.201578	CriticalFunction = &CriticalFunction_Dispatch;	-0.124939
-2.593306	is not traditionally	-0.124939
-1.021970	simple cases. Database	-0.124939
-0.505122	to alias upon	-0.124939
-0.590148	- preferably isolated	-0.124939
-1.300449	do a thorough	-0.124939
-0.573391	EXCEPTION_FLT_OVERFLOW ? EXCEPTION_EXECUTE_HANDLER	-0.124939
-0.580842	AND operation isolates	-0.124939
-0.359002	my crystal ball	-0.124939
-0.599902	(Gnu) all intrin.h	-0.124939
-0.601448	10; // Convert	-0.124939
-2.283633	such as pow,	-0.124939
-1.453828	a common denominator	-0.124939
-0.563122	<= (unsigned int)(max	-0.124939
-0.527397	simple regular pattern,	-0.124939
-1.259134	be aware of.	-0.124939
-0.359002	i++,i2+=2.0f)a[i]=i2; 41 Float	-0.124939
-0.994811	efficient solution. Sort	-0.124939
-0.893282	and often excessively	-0.124939
-0.359002	been reordered, inlined,	-0.124939
-0.463732	be repeated 1024/4	-0.124939
-0.818720	i <= max)	-0.124939
-1.568651	the option /QaxAVX	-0.124939
-0.900458	why this delaying	-0.124939
-0.601394	/O2 or /Ox	-0.124939
-1.119734	stack frame /Oy	-0.124939
-0.601772	n and reorganize	-0.124939
-0.601394	internet or intranet	-0.124939
-0.587477	2 > v.i	-0.124939
-0.575704	register containing (2,2,2,2),	-0.124939
-1.539082	library functions directly:	-0.124939
-0.550892	(handle != INVALID_HANDLE_VALUE	-0.124939
-0.359002	C++ 5.82 (Embarcadero/CodeGear/Borland	-0.124939
-1.149667	pointer aliasing /Oa	-0.124939
-0.596980	Interprocedural optimization /Og	-0.124939
-2.209744	See page 54.	-0.124939
-0.601772	throughputs and micro-operation	-0.124939
-0.359002	operators &&, ||,	-0.124939
-0.599225	XOR b Bit	-0.124939
-1.283797	of possible inputs.	-0.124939
-2.213916	the value infinity,	-0.124939
-0.873084	inputs give infinity.	-0.124939
-0.567269	from fully utilizing	-0.124939
-1.750912	a simple solution,	-0.124939
-0.463732	less favorable: Larger	-0.124939
-1.767402	has been criticized	-0.124939
-1.077186	Intel or PathScale.	-0.124939
-0.575717	of algebraic reduction.	-0.124939
-2.670223	for the label.	-0.124939
-1.191089	be faster despite	-0.124939
-1.056983	for speed /O2	-0.124939
-1.878108	has to reinstall	-0.124939
-1.078891	under the framework,	-0.124939
-0.593083	vector, uses SSE3.	-0.124939
-1.795590	a particular situation,	-0.124939
-0.588113	% means modulo.	-0.124939
-2.394433	a = 5.0f;	-0.124939
-0.463732	int a[2]; a[0]	-0.124939
-1.878435	to avoid hard-to-find	-0.124939
-0.599444	memory, using new.	-0.124939
-0.359002	point -ffast-math /fp:fast	-0.124939
-1.915130	must be adjusted	-0.124939
-0.601580	programming are dominating.	-0.124939
-0.596461	same without discriminating	-0.124939
-0.902673	down to 36.	-0.124939
-0.359002	updated 2014-08-07. Contents	-0.124939
-0.359002	2.7, 2.8. Asmlib:	-0.124939
-0.601866	converts a zero-terminated	-0.124939
-0.902723	unit is pipelined,	-0.124939
-1.960315	than a polymorphous	-0.124939
-0.601772	_mm_malloc and _mm_free.	-0.124939
-1.372437	and later discovers	-0.124939
-0.463732	cache MOVNTPD _mm_stream_pd	-0.124939
-0.598892	containing multiple streams	-0.124939
-0.527404	cache MOVNTQ _mm_stream_pi	-0.124939
-0.896043	binutils version 2.20,	-0.124939
-0.600682	regular time intervals.	-0.124939
-1.365489	(See page 81).	-0.124939
-0.896054	45 clock cycles).	-0.124939
-0.598586	example,a * 16is	-0.124939
-1.554001	pointers and non-constant	-0.124939
-0.601876	(&ArraySize) is taken.	-0.124939
-1.078317	long and irregular	-0.124939
-1.146059	the linker extracts	-0.124939
-1.300113	address is taken,	-0.124939
-0.866322	in computer games	-0.124939
-1.798598	b = lrint(d);	-0.124939
-1.169276	int 128 Is32vec4	-0.124939
-1.185035	int 64 Is32vec2	-0.124939
-0.593969	of small microcontrollers:	-0.124939
-1.636894	function call (other	-0.124939
-0.359002	program /Qipo -ipo	-0.124939
-0.600983	a, int x[])	-0.124939
-0.599358	dummy; double a[arraysize],	-0.124939
-0.567291	case 3: printf("Delta");	-0.124939
-0.598188	default, so 1.2	-0.124939
-1.076245	1. This ends	-0.124939
-0.463732	to respond quickly	-0.124939
-0.582142	a syntax restriction,	-0.124939
-2.283633	such as email	-0.124939
-0.579375	x86 platforms (Windows,	-0.124939
-0.550877	list plus i*sizeof(S1).	-0.124939
-3.071155	in the end.	-0.124939
-0.892987	position-independent code. 147	-0.124939
-1.496948	a time packed	-0.124939
-0.591321	from addresses 0x2F00,	-0.124939
-1.423335	an option (Windows:	-0.124939
-0.587474	(a&&!b) || (!a&&b)	-0.124939
-0.601772	temp1 and temp2.	-0.124939
-0.359002	limited "express" edition	-0.124939
-2.110520	the critical stride,	-0.124939
-2.110520	the critical stride.	-0.124939
-0.359002	as (critical stride)	-0.124939
-1.678951	than to temporarily	-0.124939
-1.067068	for software teachers	-0.124939
-0.601772	starting and stopping	-0.124939
-2.213916	the value 0x2C	-0.124939
-1.194286	generation class (CGrandParent)	-0.124939
-0.600378	have memory caches.	-0.124939
-0.590947	are both positive.	-0.124939
-0.359002	obsolete. Rick Booth:	-0.124939
-0.660012	precision math. Libraries	-0.124939
-0.359002	so complicated? Because	-0.124939
-0.601448	2; // Find	-0.124939
-0.601394	2015 or 2016.	-0.124939
-1.049121	algebra reductions: !(!a)=a	-0.124939
-1.077850	infinity or NAN.	-0.124939
-0.527397	integer type. Interrupt	-0.124939
-2.534636	to be platform-independent	-0.124939
-0.359002	as strcpy, strcat,	-0.124939
-0.359002	a BSF (bit	-0.124939
-0.601145	arrays as required,	-0.124939
-0.575704	file containing numerical	-0.124939
-0.601674	mangling. The characters	-0.124939
-2.955917	can be made)	-0.124939
-0.902680	sizes of matrices.	-0.124939
-0.359002	Tuesday, Wednesday, Thursday,	-0.124939
-0.597891	their 32-bit counterparts.	-0.124939
-2.209744	See page 90.	-0.124939
-1.300453	float and double.....................................................................................	-0.124939
-1.743471	compile time. (Of	-0.124939
-1.533743	the C++ language......................................................	-0.124939
-0.359002	bits (rarely 64).	-0.124939
-3.071155	in the STL.	-0.124939
-1.597994	where it matters:	-0.124939
-1.594973	a & 0=	-0.124939
-0.601777	determined in advance,	-0.124939
-1.642013	variables and operators...............................................................................	-0.124939
-2.066598	(see page 84).	-0.124939
-0.463732	/ x64 (Visual	-0.124939
-0.588095	multiply-and-add Table 13.1.	-0.124939
-0.599444	allocation using new/delete	-0.124939
-2.640857	// Example 14.22b	-0.124939
-2.640857	// Example 14.22a	-0.124939
-1.803284	from a technological	-0.124939
-0.596368	performance even matters,	-0.124939
-0.599948	interpreter which interprets	-0.124939
-0.851886	on access. Run	-0.124939
-1.937031	useful for vectorizing	-0.124939
-1.784518	byte at 400,	-0.124939
-0.359002	C++ Performance". www.open-	-0.124939
-2.640857	// Example 15.1d.	-0.124939
-0.567269	the overflow. Taking	-0.124939
-2.534636	to be reloaded	-0.124939
-1.921450	the following features:	-0.124939
-0.601665	Usability for Nerds	-0.124939
-3.208921	of the user-written	-0.124939
-0.600479	The } 59	-0.124939
-0.359002	Library amd_vrs4_expf amd_vrd2_exp	-0.124939
-0.598586	2 * 5;	-0.124939
-1.746538	solution is clearly	-0.124939
-0.359002	aliasing /Oa -fno-alias	-0.124939
-0.591308	does quite ingenious	-0.124939
-0.463732	same template. 57	-0.124939
-1.453828	a common denominator:	-0.124939
-0.599358	#endif double Func1(double)	-0.124939
-0.567266	SSE2 (or later)	-0.124939
-0.601855	University of Denmark.	-0.124939
-1.357463	inline void StoreNTD(double	-0.124939
-0.505122	a float. (Both	-0.124939
-0.359002	C++ Builder 5,	-0.124939
-0.601777	chain in two:	-0.124939
-2.640857	// Example 14.18a	-0.124939
-2.640857	// Example 14.18b	-0.124939
-2.209744	See page 53.	-0.124939
-1.490871	c + two,	-0.124939
-0.855627	}; struct Slongdouble	-0.124939
-2.640857	// Example 9.2b	-0.124939
-2.640857	// Example 9.2a	-0.124939
-1.190855	64-bit mode. Much	-0.124939
-1.771727	would be evicted.	-0.124939
-0.601851	mechanism to advertise	-0.124939
-0.726952	(-a==-b)=(a==b) ---xx---- (-a>-b)=(a<b)	-0.124939
-0.601580	intervals are short.	-0.124939
-2.209744	See page 45.	-0.124939
-0.359002	1./362880., 1./3628800., 1./39916800.,	-0.124939
-0.599902	replace all occurrences	-0.124939
-0.463732	processing power. Connecting	-0.124939
-2.066598	(see page 134)	-0.124939
-1.288282	In example 12.1a,	-0.124939
-0.359002	memcpy, memmove, memset,	-0.124939
-0.601876	original is destroyed.	-0.124939
-0.598361	4, we have:	-0.124939
-2.088706	by using memset:	-0.124939
-0.828398	2, 4, etc.).	-0.124939
-1.419754	in table 9.2,	-0.124939
-0.359002	incredibly stupid things.	-0.124939
-0.900458	around this limitation	-0.124939
-2.640857	// Example 8.24.	-0.124939
-0.527404	to virus attacks	-0.124939
-2.640857	// Example 7.32b	-0.124939
-0.601674	doesn’t. The undocumented	-0.124939
-1.740246	we can surely	-0.124939
-0.527404	with 2n -1.	-0.124939
-1.298636	conditions are met:	-0.124939
-2.060072	program is shut	-0.124939
-1.921450	the following sections.	-0.124939
-0.359002	and unexpected behaviors.	-0.124939
-0.847528	reasonably well. Very	-0.124939
-0.573388	compilers allow assembly-like	-0.124939
-0.359002	software development", Addison-	-0.124939
-1.300740	are the following:	-0.124939
-0.902897	explain the difference,	-0.124939
-2.719997	is a bottleneck.	-0.124939
-0.575709	a logical sequence.	-0.124939
-1.294739	() { __declspec(__align(64))	-0.124939
-1.595531	The function rounds	-0.124939
-0.577670	of macro expansions.	-0.124939
-1.995759	have a temp1	-0.124939
-0.593316	(0x2710 / 0x40)	-0.124939
-2.177960	value of temp.	-0.124939
-0.600479	before) } printf("\nResults:");	-0.124939
-0.359002	at hand. Low-level	-0.124939
-0.595696	(A & 0x0F)	-0.124939
-0.961003	identification (RTTI) ...........................................................................	-0.124939
-1.951190	{ return powN<true,N/2>::p(x)	-0.124939
-0.359002	code (release version)	-0.124939
-0.897439	to b memcpy(b,	-0.124939
-0.359002	double a[arraysize], b[arraysize],	-0.124939
-2.763419	to the truth	-0.124939
-1.078891	support the ADX	-0.124939
-1.203710	loop of ADC	-0.124939
-1.920745	important to realize	-0.124939
-0.359002	specific purpose: Contain	-0.124939
-0.541338	and 13 objects,	-0.124939
-0.593083	int64_t 128 I64vec2	-0.124939
-2.538960	may be mitigated	-0.124939
-0.463732	-msse4.1 -mAVX -axSSE3,	-0.124939
-1.641786	pointers to objects)	-0.124939
-1.641198	available in 2015	-0.124939
-2.295065	is that r+i/2	-0.124939
-0.900339	this time lag.	-0.124939
-0.601772	clumsy and tedious.	-0.124939
-0.883158	microprocessor hardware design.	-0.124939
-0.597985	optimized software design,	-0.124939
-0.600395	interfaces from scratch.	-0.124939
-1.067355	Core 2 0.63	-0.124939
-0.463732	x-xxxxx-x (-a)*(-b)=a*b ---xxx---	-0.124939
-0.583285	processors. 5 Programmable	-0.124939
-2.736646	that the producer	-0.124939
-1.443092	than it says.	-0.124939
-0.505122	/Gy, Linux: -ffunction-sections)	-0.124939
-0.902723	block is re-allocated	-0.124939
-1.078334	bit in nn	-0.124939
-0.575698	flag (e.g. DEC,	-0.124939
-1.049135	in registers, whereas	-0.124939
-0.601448	int)u; // Faster,	-0.124939
-0.463732	function. __attribute__((const)) (Linux	-0.124939
-0.359002	* 0.5 ns	-0.124939
-0.359002	and "More Effective	-0.124939
-0.594172	ammintrin.h AMD XOP	-0.124939
-2.212333	= a XOR	-0.124939
-2.334742	as a stand	-0.124939
-0.359002	/fp:fast /fp:fast=2 -fp-model	-0.124939
-0.359002	a polymorphous class?	-0.124939
-1.203255	malloc and free)	-0.124939
-0.359002	1./8.71782E10, 1./1.30767E12, 1./2.09227E13};	-0.124939
-2.066598	(see page 135).	-0.124939
-0.575701	of disk caching,	-0.124939
-1.090091	// define fprintf	-0.124939
-1.078605	Sum2 and Sum3.	-0.124939
-2.045093	in this block:	-0.124939
-0.359002	doubled. Thin clients	-0.124939
-0.580838	C/C++ v. 1.4,	-0.124939
-0.359002	< NUMCOLUMNS; column++)	-0.124939
-0.600272	error has occurred	-0.124939
-0.586050	version control tool.	-0.124939
-3.208921	of the iterator	-0.124939
-0.600990	performing an illegal	-0.124939
-2.640857	// Example 8.6a	-0.124939
-2.640857	// Example 8.6b	-0.124939
-2.101628	make a zip	-0.124939
-2.640857	// Example 7.15a.	-0.124939
-1.947577	more than 33%	-0.124939
-2.534636	to be signed.	-0.124939
-1.300113	integer is signed,	-0.124939
-2.640857	// Example 7.5.	-0.124939
-0.557829	(s0+s1)+(s2+s3); Now s0,	-0.124939
-0.359002	language gained remarkably	-0.124939
-0.877742	embedded systems. Today	-0.124939
-0.601851	gone to great	-0.124939
-0.359002	files (*.ini files).	-0.124939
-0.505122	-mAVX /arch:AVX /QaxSSE3,	-0.124939
-1.004489	for details (www.agner.org/optimize/testp.zip).	-0.124939
-0.902300	addresses for everything,	-0.124939
-0.505122	disk copying. Security.	-0.124939
-0.359002	the C99 standard.	-0.124939
-0.563110	different profiling methods:	-0.124939
-0.359002	13 objects, respectively	-0.124939
-0.885295	Intel SVML v.10.3	-0.124939
-0.885295	Intel SVML v.10.2	-0.124939
-0.601394	blocking or tiling.	-0.124939
-1.009766	of 100 doubles:	-0.124939
-0.359002	than 200. Next,	-0.124939
-1.683556	most efficient alternative.	-0.124939
-0.557838	(Standard Template Library)	-0.124939
-1.004475	mov mov lea	-0.124939
-1.357463	inline void StoreVectorA(void	-0.124939
-1.915130	must be emphasized	-0.124939
-1.077850	(*.dll or *.so)	-0.124939
-0.577670	compiler optimization. en.wikipedia.org/wiki/Compiler_optimization.	-0.124939
-0.359002	AES, PCLMUL wmmintrin.h	-0.124939
-0.359002	as gates, flip-flops,	-0.124939
-0.601777	languages in Microsoft's	-0.124939
-1.277033	assembly output (/FAs	-0.124939
-2.763419	to the standards	-0.124939
-2.209744	See page 140.	-0.124939
-0.359002	float i2; for(i=0,i2=0;	-0.124939
-0.505122	the divisions (Division	-0.124939
-1.139619	i; for(i=0; i<301;	-0.124939
-0.359002	aliasing" (if valid)	-0.124939
-1.366783	on Intel CPU’s.	-0.124939
-1.948839	i++) { ab[i].b	-0.124939
-0.594864	accessing arrays forwards,	-0.124939
-0.359002	a Gauss elimination.	-0.124939
-1.600962	are often unreliable.	-0.124939
-1.078555	easy to port	-0.124939
-0.789104	from www.agner.org/optimize/asmlib.zip. Currently	-0.124939
-1.443810	which are cheap,	-0.124939
-1.096841	Math Kernel Library.	-0.124939
-0.359002	4, 2007 (www.intel.com/technology/itj/).	-0.124939
-0.899123	of instruction timing,	-0.124939
-0.585229	matrix 512 520	-0.124939
-0.597540	(also called properties)	-0.124939
-1.861357	a single result,	-0.124939
-1.503796	get a reply	-0.124939
-2.640857	// Example 14.17b	-0.124939
-1.049607	fraction : 52;	-0.124939
-0.601580	#) are costless	-0.124939
-0.971476	branch misprediction penalty.	-0.124939
-1.550915	done by fetching,	-0.124939
-0.550877	to normal afterwards.	-0.124939
-0.601423	ABC = 123;	-0.124939
-1.287047	data into groups	-0.124939
-1.951190	{ return _mm_load_si128((__m128i	-0.124939
-0.600760	who have sent	-0.124939
-0.573391	small loops (less	-0.124939
-0.359002	+ ia32intrin.h _mm_exp_ps	-0.124939
-2.534636	to be noticeable	-0.124939
-0.359002	ia32intrin.h _mm_exp_ps _mm_exp_pd	-0.124939
-0.570622	this address. Step	-0.124939
-0.527397	the directive __declspec(cpu_dispatch(...)).	-0.124939
-0.660012	a[size], b[size], c[size];	-0.124939
-0.601772	mask, and bb[i]*cc[i]	-0.124939
-0.505122	float list[100]; memset(list,	-0.124939
-3.071155	in the broader	-0.124939
-0.597888	column < NUMCOLUMNS;	-0.124939
-0.601772	commas and semicolons	-0.124939
-2.232540	you can toggle	-0.124939
-2.640857	// Example 14.7a.	-0.124939
-1.700752	vector operations Today's	-0.124939
-1.644422	can cause holes	-0.124939
-0.588095	AVX512 Table 12.1.	-0.124939
-0.359002	examples exist. Therefore	-0.124939
-1.573431	assembly language output,	-0.124939
-0.359002	loop initialisation i=0;	-0.124939
-1.861357	a single session.	-0.124939
-2.648786	the same algorithm,	-0.124939
-0.601056	2004 - 2014.	-0.124939
-0.593970	s2 += a[i+2];	-0.124939
-0.563110	<< 4, anda	-0.124939
-2.095827	You may deviate	-0.124939
-0.563110	security software. Background	-0.124939
-0.588671	versions #include "instrset_detect.cpp"	-0.124939
-0.599480	reduce example 12.1b	-0.124939
-1.452236	to write _mm_add_epi16(a,b).	-0.124939
-2.568185	and the EXCLUSIVE	-0.124939
-1.367463	from example 8.26b:	-0.124939
-0.598920	14.6 float list[16];	-0.124939
-0.541338	a strict formalism	-0.124939
-1.376079	CPUs with full-size	-0.124939
-2.773232	in a graceful	-0.124939
-1.798755	when it changes.	-0.124939
-1.099276	Template Library (ATL)	-0.124939
-0.359002	160 /Qparallel -parallel	-0.124939
-0.550887	<= 16; n++)	-0.124939
-1.743471	compile time. (Examples	-0.124939
-0.875598	Wednesday | Friday))	-0.124939
-0.580851	on, including relaxed	-0.124939
-0.359002	operator (bitwise and)	-0.124939
-1.055936	for AMD Family	-0.124939
-0.884321	not supported fprintf(stderr,	-0.124939
-2.151139	in example 16.1.	-0.124939
-2.122795	it can handle.	-0.124939
-0.596901	most time. Uses	-0.124939
-0.601394	creates or modifies	-0.124939
-0.881143	can optimize specifically	-0.124939
-1.379169	due to controversies	-0.124939
-0.596260	9.2a void F1(int	-0.124939
-3.071155	in the representation,	-0.124939
-0.505131	places back. Thus,	-0.124939
-0.589691	features, see http://www.agner.org/optimize/	-0.124939
-0.505131	65 13.6 80.9	-0.124939
-0.359002	64 14.0 80.8	-0.124939
-1.202968	clear and intelligible	-0.124939
-0.902897	utilize the computational	-0.124939
-0.601394	__declspec(align(16)) or __attribute__((aligned(16))).	-0.124939
-0.600682	Coarse time measurement.	-0.124939
-1.202204	addresses are obscured	-0.124939
-0.596549	between these considerations.	-0.124939
-2.030478	a program dictates	-0.124939
-0.601631	operation that crashes	-0.124939
-2.408411	function is 83	-0.124939
-0.359002	A Pragmatic Look	-0.124939
-0.583281	language", section 17.9:	-0.124939
-0.359002	gates, flip-flops, multiplexers,	-0.124939
-2.763419	to the next.	-0.124939
-0.898392	and integer representations	-0.124939
-2.013292	need to deallocate	-0.124939
-2.022165	x) { _mm_store_si128((__m128i	-0.124939
-1.065792	be eliminated completely.	-0.124939
-0.601580	conventions are different.	-0.124939
-1.193934	different compilers succeeded	-0.124939
-0.600760	etc.) have little-endian	-0.124939
-2.045093	in this chapter.	-0.124939
-0.359002	128 Is8vec16 Vec16c	-0.124939
-1.773116	CPU dispatching 125	-0.124939
-2.640857	// Example 14.16a	-0.124939
-0.601963	eliminating the if-branch	-0.124939
-1.046174	is based mainly	-0.124939
-2.016540	on page 132.	-0.124939
-0.595823	integer type size_t	-0.124939
-2.773232	in a FILO	-0.124939
-0.505122	to 12. Higher	-0.124939
-0.563118	[eax+4], ecx 86	-0.124939
-0.359002	// x,y coordinates	-0.124939
-0.598586	a1 * b2	-0.124939
-0.598586	a2 * b1	-0.124939
-3.071155	in the "Macro	-0.124939
-1.879374	a and b.	-0.124939
-1.831644	AMD and VIA.	-0.124939
-0.584311	It just happened	-0.124939
-0.836084	at runtime. Polymorphism	-0.124939
-0.596049	comparing bits 32-62.	-0.124939
-1.200549	loop-invariant code motion.	-0.124939
-0.567273	common purposes (www.boost.org).	-0.124939
-0.595012	0; while (seconds	-0.124939
-1.823421	} // continue	-0.124939
-1.499061	only on Intel/x86-compatible	-0.124939
-2.640857	// Example 7.26b	-0.124939
-0.359002	Dr Dobbs Journal,	-0.124939
-1.297687	C++ compiler (parallel	-0.124939
-2.640857	// Example 7.26a	-0.124939
-1.642873	improve the possibilities	-0.124939
-1.491306	with other subtasks	-0.124939
-0.601394	weakness or bottleneck,	-0.124939
-1.202540	data for analysis.	-0.124939
-0.806134	16 Testing speed..............................................................................................................	-0.124939
-0.463732	RGB color difference.	-0.124939
-0.463732	smmintrin.h SSE4.2 nmmintrin.h	-0.124939
-0.359002	with enum, const,	-0.124939
-0.601772	VML and SVML.	-0.124939
-1.904898	efficient than non-object	-0.124939
-0.601851	issue to catching	-0.124939
-0.600794	try { F1();	-0.124939
-2.192153	are not used).	-0.124939
-0.598522	kernel version 2.6.30	-0.124939
-0.599724	expensive cache contentions,	-0.124939
-0.601286	execution by causing	-0.124939
-1.200334	c++) { StoreNTD(&a[c][r],	-0.124939
-0.901631	for function F1.	-0.124939
-0.789093	syntax: __asm ("fldl	-0.124939
-2.640857	// Example 8.19.	-0.124939
-0.879592	another. Therefore, micro-	-0.124939
-2.083637	faster than 15.1b,	-0.124939
-0.359002	Example 8.20 module1.cpp	-0.124939
-0.601394	dispatching or memory-intensive	-0.124939
-0.601777	somewhere in F1?	-0.124939
-0.599227	effects into account.	-0.124939
-0.359002	the Xnu project.	-0.124939
-0.597985	new software project,	-0.124939
-0.894888	distinguish between recoverable	-0.124939
-1.272920	and Windows 3.x.	-0.124939
-1.023614	with bounds checking,	-0.124939
-0.359002	the spell checking.	-0.124939
-2.640857	// Example 8.10b	-0.124939
-2.640857	// Example 8.10a	-0.124939
-0.886234	fully optimized yet.	-0.124939
-0.359002	4.5 0.82 0.59	-0.124939
-0.900939	volatile int DontSkip;	-0.124939
-0.463732	a; Plus2 (&a);	-0.124939
-0.591315	whole program. During	-0.124939
-0.359002	---xx---- (-a>-b)=(a<b) ---xx---x	-0.124939
-2.057727	you may view	-0.124939
-0.563114	Borland's now discontinued	-0.124939
-0.818712	in Linux. Address	-0.124939
-2.340041	there is virtually	-0.124939
-2.593306	is not satisfactory.	-0.124939
-2.016540	on page 87.	-0.124939
-0.463732	-fp-model fast, -fp-	-0.124939
-1.289409	monitor counters ....................................................................	-0.124939
-2.200831	used for fetching	-0.124939
-1.364736	i; float i2;	-0.124939
-0.550877	(0,0,0,0,0,0,0,0) Is16vec8 zero(0,0,0,0,0,0,0,0);	-0.124939
-0.660012	graphics accelerator card.	-0.124939
-1.830426	call to Func1,	-0.124939
-0.463732	register usage convention	-0.124939
-0.573382	Windows Library (OWL).	-0.124939
-2.670223	for the <,	-0.124939
-2.283633	such as <.	-0.124939
-0.359002	include JavaScript, PHP,	-0.124939
-2.368199	because the non-reduced	-0.124939
-1.272273	object file level.	-0.124939
-1.783204	the memory released	-0.124939
-0.889678	type-casting its address:	-0.124939
-0.601772	Professional and Enterprise	-0.124939
-0.593083	uint64_t 128 Vec2uq	-0.124939
-1.044384	and switch statements.............................................................................	-0.124939
-1.134739	Windows, Linux, Mac,	-0.124939
-1.495252	available from www.agner.org/optimize/testp.zip.	-0.124939
-0.601394	string or CString.	-0.124939
-0.601851	happy to receive	-0.124939
-0.359002	v. 5.5 Mac:	-0.124939
-2.408411	function is expanded	-0.124939
-0.359002	and planned solutions.	-0.124939
-1.921450	the following solutions,	-0.124939
-0.598586	(a+1) * (a+1);	-0.124939
-2.640857	// Example 7.30b	-0.124939
-2.640857	// Example 7.30a	-0.124939
-0.359002	not supported"); return;	-0.124939
-1.911696	function libraries published	-0.124939
-0.575698	call (e.g. GetProcessAffinityMask	-0.124939
-0.601598	software be reinstalled	-0.124939
-0.589691	also see emulated	-0.124939
-0.971507	a hash map.	-0.124939
-0.463732	/MT 160 /Qparallel	-0.124939
-0.902513	exponent, and fffff	-0.124939
-0.847545	C#, Visual Basic,	-0.124939
-2.394433	a = OneOrTwo5[b	-0.124939
-0.601665	interpreter for Basic.	-0.124939
-1.203386	one is fastest.	-0.124939
-1.951190	{ return square(x)	-0.124939
-0.557829	index changes fastest:	-0.124939
-0.557825	inline T max(T	-0.124939
-1.302463	is too late.	-0.124939
-0.463732	7.15b SafeArray <float,	-0.124939
-1.078317	smaller and closer	-0.124939
-1.202577	cast The static_cast	-0.124939
-0.359002	their superior performance/price	-0.124939
-0.999913	a considerable improvement	-0.124939
-0.600355	table at runtime,	-0.124939
-0.359002	// x^1, x^2,	-0.124939
-1.796167	sign bit set).	-0.124939
-0.601079	kb. This corresponds	-0.124939
-0.359002	a discrete icon	-0.124939
-0.359002	__asm ("int 3");	-0.124939
-0.359002	1.09 1.25 1.61	-0.124939
-2.066598	(see page 38).	-0.124939
-0.563114	Alignd(X) X __attribute__((aligned(16)))	-0.124939
-1.115924	C++ Compiler Documentation".	-0.124939
-1.203273	done in connection	-0.124939
-0.541332	program runs satisfactorily	-0.124939
-0.359002	development kit (SDK	-0.124939
-1.076697	array with alloca:	-0.124939
-0.567266	very inefficient. Linear	-0.124939
-0.600983	14.13c int list[301];	-0.124939
-0.789115	14.12 Position-independent code..................................................................................	-0.124939
-1.272920	and Windows Server	-0.124939
-0.359002	file formats. Comments	-0.124939
-1.641200	cost of synchronizing	-0.124939
-0.601185	spend on redesigning	-0.124939
-1.077524	allocated with alloca,	-0.124939
-0.359002	windows, graphic brushes,	-0.124939
-1.951190	{ return _mm_cvtss_si32(_mm_load_ss(&x));}	-0.124939
-0.601777	bear in mind,	-0.124939
-0.806134	supports self-relative addressing.	-0.124939
-0.359002	before you. Optimized	-0.124939
-0.660012	for supporting multi-threaded	-0.124939
-2.640857	// Example 7.3.	-0.124939
-0.806134	by Agner Fog	-0.124939
-0.359002	label ;eax=addressofa ;edx=addressinr	-0.124939
-0.359002	//=2*A //=A*x*x+B*x+C //=DeltaY	-0.124939
-0.505122	int, float. Similar	-0.124939
-1.010250	memory pool. Alignment?	-0.124939
-1.642487	sure the startup	-0.124939
-0.359002	= {2.6f, 1.5f};	-0.124939
-1.553310	one that doesn’t.	-0.124939
-2.640857	// Example 7.39	-0.124939
-1.071488	convert example 12.8a	-0.124939
-0.601851	12.8a to 12.8b	-0.124939
-0.601394	x?" or "how	-0.124939
-2.151139	in example 7.35	-0.124939
-2.640857	// Example 7.37	-0.124939
-0.601230	begins with #)	-0.124939
-0.359002	defines electrical connections	-0.124939
-2.640857	// Example 7.36	-0.124939
-1.824927	y = MAX(f(x),	-0.124939
-2.541450	the function add_horizontal)	-0.124939
-1.146045	The trick violates	-0.124939
-0.601056	2.5*x^2 - 8*x	-0.124939
-0.597491	optimization. See www.agner.org/optimize	-0.124939
-1.295281	we may write:	-0.124939
-0.900494	often have exploited.	-0.124939
-0.601079	arguments. This closely	-0.124939
-0.601772	first and foremost,	-0.124939
-1.401984	most important remedy	-0.124939
-0.596152	highly system dependent	-0.124939
-0.600983	c1; int c1::*MemberPointer;	-0.124939
-0.597422	> 0; i--)	-0.124939
-0.601772	list[i].a and list[i].b.	-0.124939
-0.601394	remote or removable	-0.124939
-1.920745	important to ignore,	-0.124939
-0.505122	doesn't mean atomic.	-0.124939
-1.769413	b * 5).	-0.124939
-0.599571	above, page 87)	-0.124939
-2.335317	that is distributed.	-0.124939
-0.541344	binding definitely degrades	-0.124939
-0.359002	Example 7.3. Explain	-0.124939
-0.897011	math library (VML,	-0.124939
-0.359002	template <bool IsPowerOf2,	-0.124939
-0.601423	ab[i].b = Func(ab[i].a);	-0.124939
-1.416046	Pentium 4 (NetBurst)	-0.124939
-1.799425	and it understands	-0.124939
-0.598871	family number 6!	-0.124939
-2.180477	i < ArraySize;	-0.124939
-0.764737	version performs poorly.	-0.124939
-1.504329	matter of habit,	-0.124939
-0.601423	log2 = log(2.0);	-0.124939
-0.579382	A Pentium M	-0.124939
-1.657097	the clock period	-0.124939
-0.902513	generality and flexibility,	-0.124939
-0.463732	a first-in-last-out fashion.	-0.124939
-0.601772	2B, and 3A	-0.124939
-1.978184	lot of CPU-time	-0.124939
-0.359002	this block: 62	-0.124939
-0.359002	Language Runtime, CLR,	-0.124939
-0.359002	n.a. 2.23 0.95	-0.124939
-2.763419	to the exponent:	-0.124939
-1.369523	with vector operands:	-0.124939
-1.462738	64-bit systems. 67	-0.124939
-0.600479	sin(x); } 68	-0.124939
-1.545517	+ 1; 69	-0.124939
-1.202935	130 for details).	-0.124939
-0.359002	g++ v 4.0.1.	-0.124939
-2.467979	number of DLLs,	-0.124939
-0.550882	the structure. Incrementing	-0.124939
-1.776703	do the conversion.	-0.124939
-1.490437	in different browsers,	-0.124939
-1.077273	columns = 50;	-0.124939
-2.834872	it is unrealistic	-0.124939
-0.591980	on hardware identification.	-0.124939
-0.567273	take approximately 500	-0.124939
-1.140388	to choose between.	-0.124939
-2.239089	have to consult	-0.124939
-1.432480	the previous iteration.	-0.124939
-0.598354	In any event,	-0.124939
-1.427168	be very helpful	-0.124939
-1.925587	the performance costs.	-0.124939
-0.505131	access. Available protocols	-0.124939
-0.588681	constant reference instead:	-0.124939
-0.573388	smaller sizes (char,	-0.124939
-0.764749	vectorization. Optimizes moderately	-0.124939
-0.828413	* __restrict aa,	-0.124939
-0.359002	that (b*c) overflows,	-0.124939
-0.588095	AVX2 Table 12.3.	-0.124939
-1.767402	has been brutally	-0.124939
-0.599166	i + sign(i)	-0.124939
-0.600505	contentions will occur:	-0.124939
-0.557829	CodeGear Borland bcc,	-0.124939
-0.359002	* powN<true,N-N1>::p(x); #undef	-0.124939
-0.359002	as 2eee 1.fffff,	-0.124939
-0.359002	TR 18015, "Technical	-0.124939
-1.844584	following example converts	-0.124939
-0.896169	(columns * sizeof(float))	-0.124939
-0.577680	follows: struct Sfloat	-0.124939
-0.601448	is // erroneously	-0.124939
-0.505122	and multiplications. Subtractions	-0.124939
-0.598765	(Embarcadero/CodeGear/Borland C++ Builder	-0.124939
-0.593325	17is calculated as(a	-0.124939
-0.902513	i and shifts	-0.124939
-0.600260	Supports vector intrinsics	-0.124939
-0.359002	cmp ja $B2$3:	-0.124939
-0.463732	Vec4uq Vec4f Vec2d	-0.124939
-1.180957	and system breakdown.	-0.124939
-0.887055	many small subtasks,	-0.124939
-0.601772	x and y?"	-0.124939
-2.431519	to a driver	-0.124939
-0.505122	sets Microprocessor producers	-0.124939
-0.359002	128 I64vec2 Vec2q	-0.124939
-0.588094	into three parts:	-0.124939
-0.893443	char 64 Is8vec8	-0.124939
-1.202661	// function prototypes	-0.124939
-0.940803	of five manuals:	-0.124939
-0.601423	(int)(&list[100]) = (int)(&list[0])	-0.124939
-0.359002	memory footprint. If,	-0.124939
-0.359002	and micro-operation breakdowns	-0.124939
-2.295018	order to minimize	-0.124939
-0.359002	0 65535 uint16_t	-0.124939
-0.359002	16 -32768 32767	-0.124939
-1.678705	variable in parts,	-0.124939
-0.601674	SVML. The IPP	-0.124939
-0.463732	optimizing University courses	-0.124939
-0.463732	AMD FMA4 fma4intrin.h	-0.124939
-0.601777	All in all,	-0.124939
-2.541450	the function bodies	-0.124939
-0.883143	instruction add eax,1	-0.124939
-0.600325	Aligning data Loading	-0.124939
-0.601320	occur: if (SIZE	-0.124939
-0.359002	manipulation tricks Michael	-0.124939
-0.890452	to simple actions	-0.124939
-2.086278	you are risking	-0.124939
-2.397645	is the sign,	-0.124939
-0.596260	0xC0000091L void MathLoop()	-0.124939
-0.900203	with more heuristic	-0.124939
-0.901631	factorial function (n!)	-0.124939
-0.575707	reciprocal square root,	-0.124939
-3.071155	in the GOT,	-0.124939
-0.580838	(MKL v. 7.2).	-0.124939
-2.016540	on page 130.	-0.124939
-2.955917	can be ameliorated	-0.124939
-0.590554	such programs installed	-0.124939
-2.773232	in a Gauss	-0.124939
-0.583283	See my blog	-0.124939
-0.359002	Specifications, Dr Dobbs	-0.124939
-2.031398	using the fundamental	-0.124939
-1.454266	DWORD PTR [eax+4],	-0.124939
-0.598586	StoreNTD(double * dest,	-0.124939
-0.463732	CriticalFunctionDispatch(void) __asm__ ("CriticalFunction");	-0.124939
-0.359002	/ 1.2345; Change	-0.124939
-2.393331	should be scheduled	-0.124939
-1.055358	with option -fwrapv	-0.124939
-0.599272	for pointer conversions.	-0.124939
-0.601665	audience for educational	-0.124939
-0.597276	YMM register state.	-0.124939
-0.505122	; compute i/2	-0.124939
-0.359002	Library __vrs4_expf __vrd2_exp	-0.124939
-0.563114	with heavy traffic	-0.124939
-0.463732	= (memory address)	-0.124939
-1.162601	the user. Compatibility	-0.124939
-0.595823	doing type conversions:	-0.124939
-1.077743	values are confined	-0.124939
-0.601394	Delays or glitches	-0.124939
-0.601866	reveals a funda-	-0.124939
-0.599358	__declspec(__align(64)) double matrix[SIZE][SIZE];	-0.124939
-0.596260	version void FUNCNAME(short	-0.124939
-0.888161	with element matrix[c][r].	-0.124939
-0.660012	by writing: __declspec(align(64))	-0.124939
-0.359002	("fldl %1 \n	-0.124939
-1.798598	b = MultiplyBy<8>(10);	-0.124939
-0.583287	constant references accept	-0.124939
-0.601772	corrections and suggestions	-0.124939
-0.505122	performance. 25 Since	-0.124939
-0.866318	and negative impacts	-0.124939
-1.146931	allocated dynamically (with	-0.124939
-0.541332	this method. Your	-0.124939
-1.740246	we can learn	-0.124939
-0.463732	as price, compatibility,	-0.124939
-0.894785	c2 < c1+TILESIZE;	-0.124939
-1.376091	each time slice	-0.124939
-0.880388	14.5 Integer division......................................................................................................	-0.124939
-0.588105	set needed _mm_shuffle_epi8	-0.124939
-0.586052	& later __svml_expf4	-0.124939
-0.577673	use it. Complicated	-0.124939
-1.090070	= 1.0; temp->b	-0.124939
-0.598128	2.5}; return list[x];	-0.124939
-0.600794	temp++) { temp->a	-0.124939
-0.887797	eliminate common subexpressions	-0.124939
-1.350960	point operations (addition,	-0.124939
-0.595452	is, I guess,	-0.124939
-2.640857	// Example 12.1b.	-0.124939
-2.151139	in example 12.1b,	-0.124939
-1.201578	SelectAddMul_pointer = &SelectAddMul_SSE2;	-0.124939
-1.290574	on which imprecisions	-0.124939
-1.644363	vector classes (Intel)	-0.124939
-0.359002	later __svml_expf4 __svml_exp2	-0.124939
-0.600065	and CPU hardware.	-0.124939
-1.915130	must be followed	-0.124939
-0.359002	SelectAddMul_SSE41, SelectAddMul_AVX2, SelectAddMul_dispatch;	-0.124939
-0.896768	just two branches:	-0.124939
-0.879555	integer multiplication prior	-0.124939
-1.071353	to return a+1;.	-0.124939
-1.818943	operating systems (but	-0.124939
-1.375848	library function __intel_cpu_features_init()	-0.124939
-0.593969	some small low-power	-0.124939
-0.593970	sum2 += list[i+1];}	-0.124939
-0.575698	calls (e.g. IsProcessorFeaturePresent	-0.124939
-0.598888	variable two names,	-0.124939
-1.298636	conditions are satisfied.	-0.124939
-0.601674	be. The distinctions	-0.124939
-0.527404	for(i=i_div_3=0; i<300; i+=3,i_div_3++){	-0.124939
-1.274641	not need relocation	-0.124939
-2.065292	it takes hours	-0.124939
-0.901817	a+b = b+a,	-0.124939
-1.298636	conditions are satisfied:	-0.124939
-1.848170	It may neverthe-	-0.124939
-0.598920	static float list[]	-0.124939
-0.998295	the condition clause.	-0.124939
-1.256755	the container expandable,	-0.124939
-0.591975	IEEE standard 754	-0.124939
-0.359002	F32vec4 xxn(x4, x2*x,	-0.124939
-0.570626	GHz CPU. Should	-0.124939
-0.601876	memset is deprecated.	-0.124939
-0.359002	to facilitate porting	-0.124939
-0.573382	LIBM Library amd_vrs4_expf	-0.124939
-1.075980	than an hour.	-0.124939
-1.553310	one that discriminates	-0.124939
-0.601777	succeeded in applying	-0.124939
-0.601380	features it has.	-0.124939
-0.836084	1)sign 2exponent 1023	-0.124939
-0.359002	can handle. Waiting	-0.124939
-1.371287	Intel vector classes:	-0.124939
-0.660012	= &Object1; p1->Hello();	-0.124939
-2.394433	a = _mm_blendv_epi8(bc,	-0.124939
-2.640857	// Example 8.12a	-0.124939
-2.640857	// Example 8.12b	-0.124939
-0.463732	prototype CriticalFunctionType CriticalFunction_Dispatch;	-0.124939
-1.064114	compiled code. (Compile	-0.124939
-0.591315	big program. Frequent	-0.124939
-2.367289	rather than isolating	-0.124939
-2.743091	It is strongly	-0.124939
-2.188162	is more manageable	-0.124939
-0.890227	cleaned up include:	-0.124939
-1.049135	in registers, totaling	-0.124939
-0.601394	-Wstrict-overflow=2, or (5)	-0.124939
-1.185308	write instructions (MOVNT)	-0.124939
-0.789093	Visual Basic .NET,	-0.124939
-2.773232	in a column-wise	-0.124939
-0.527404	user. Making exception-safe	-0.124939
-0.505122	program 153 spends	-0.124939
-0.359002	0.40 0.30 4.5	-0.124939
-1.077030	} int Size()	-0.124939
-0.359002	Table 7.1. Sizes	-0.124939
-0.599166	y.d + 4.;	-0.124939
-1.803284	from a website.	-0.124939
-0.359002	user. Menus, buttons,	-0.124939
-1.367463	from example 9.5a:	-0.124939
-0.599544	between each call,	-0.124939
-0.900858	by compiler .......................................................................	-0.124939
-0.359002	1./120., 1./720., 1./5040.,	-0.124939
-0.600983	403 int ReadB()	-0.124939
-0.585227	out results printf("\n%2i	-0.124939
-0.588095	uint64_t Table 7.1.	-0.124939
-1.069746	or multiple elements?	-0.124939
-0.660012	pointer aliasing" (if	-0.124939
-2.538960	may be caused	-0.124939
-1.509631	cout << x.f;	-0.124939
-0.463732	x-xxx-x-- 0/a=0 ---xx--xx	-0.124939
-0.891362	char 16 XOP,	-0.124939
-0.601394	First-In-First-Out or First-In-Last-Out	-0.124939
-0.598586	FuncCol(i)) * sizeof(float)	-0.124939
-0.527397	necessary support. Hardware	-0.124939
-0.505122	bit manipulation tricks	-0.124939
-1.594973	a & a=	-0.124939
-0.593767	for matrix a:	-0.124939
-2.648786	the same directory	-0.124939
-0.359002	{ "Alpha", "Beta",	-0.124939
-1.200472	calculated as (b*2.0)/3.0	-0.124939
-0.359002	STL deque (doubly	-0.124939
-2.394433	a = CriticalFunction(b,	-0.124939
-0.359002	linking (remove unreferen-	-0.124939
-2.334742	as a scalar	-0.124939
-1.114607	(see p. 57).	-0.124939
-1.049607	fraction : 63;	-0.124939
-0.527397	parallel processing. Scott	-0.124939
-0.595012	cause large delays.	-0.124939
-0.601448	cpuid // Read	-0.124939
-0.573397	vectorization Automatic paralleli-	-0.124939
-0.359002	0 255 uint8_t	-0.124939
-0.806124	automatically detect opportunities	-0.124939
-0.563118	= 8, Thursday	-0.124939
-1.032792	8 AVX2 _mm_i32gather_epi32	-0.124939
-0.600760	designers have gone	-0.124939
-0.601394	overlapping or aliasing,	-0.124939
-1.188601	} return add_elements(s);	-0.124939
-1.377769	{...} // Dispatcher.	-0.124939
-0.901759	reference, or void.	-0.124939
-0.891779	An even worse	-0.124939
-1.200472	calculated as ((a+b)+c)+d.	-0.124939
-0.359002	Microsoft Foundation Classes	-0.124939
-0.359002	strcpy, strcat, strlen,	-0.124939
-0.901867	casting // Constructor-style	-0.124939
-0.601665	branches for correctness.	-0.124939
-1.585968	a cache miss.	-0.124939
-1.803284	from a higher-priority	-0.124939
-0.359002	software. Background services.	-0.124939
-1.633619	vector class library).	-0.124939
-1.454266	DWORD PTR [esp+4]	-0.124939
-1.951190	{ return N;	-0.124939
-0.598586	float * DynamicArray	-0.124939
-1.077273	s = _mm_hadd_ps(x,	-0.124939
-0.861643	(i >= N)	-0.124939
-2.763419	to the design	-0.124939
-1.902782	clock cycles (depending	-0.124939
-1.078631	parallelism is obvious.	-0.124939
-1.961406	this is obvious,	-0.124939
-0.596260	typedef void FuncType(short	-0.124939
-1.769413	b * (1.	-0.124939
-1.328971	be changed freely.	-0.124939
-0.903059	x (x) x-xx--xx-	-0.124939
-0.359002	{ StoreNTD(&a[c][r], b[r][c]);	-0.124939
-1.056951	no specific option)	-0.124939
-0.601145	advantageous as replacements	-0.124939
-0.896524	on C++ Performance".	-0.124939
-1.356405	the exception handler,	-0.124939
-0.601772	condition, and increment.	-0.124939
-0.900290	systems use segmentation	-0.124939
-0.601145	them as integers:	-0.124939
-0.359002	/Qparallel -parallel -openmp	-0.124939
-1.201184	stack. This behaviour	-0.124939
-0.601286	1 by XOR'ing	-0.124939
-1.598122	option for RTTI	-0.124939
-2.648786	the same divisor.	-0.124939
-1.198352	y + a.y);}	-0.124939
-0.901817	(r2 = r1+1;	-0.124939
-0.594350	other thread increments	-0.124939
-0.593979	by exception handlers	-0.124939
-1.201462	pointers or references:	-0.124939
-0.359002	compiler (parallel composer)	-0.124939
-2.304254	{ // 2-dimensional	-0.124939
-1.412738	will generate -128,	-0.124939
-0.359002	counter //=2*A //=A*x*x+B*x+C	-0.124939
-0.505122	optimizing multithreaded applications:	-0.124939
-0.601394	-S or /Fa	-0.124939
-0.600990	handle an unrecoverable	-0.124939
-0.726952	error condition. Things	-0.124939
-1.180964	map file /Fm	-0.124939
-2.568185	and the texts	-0.124939
-0.601855	iterations of redesign.	-0.124939
-0.588096	0x7FFFFF) | 0x3F800000;	-0.124939
-1.940599	instead of -fpic.	-0.124939
-0.902680	deal of research	-0.124939
-2.834872	it is servicing.	-0.124939
-0.463732	long clock; __cpuid(dummy,	-0.124939
-2.467979	number of rows/columns	-0.124939
-0.660012	= &Object1; p->NotPolymorphic();	-0.124939
-0.359002	and A. Hoisie,	-0.124939
-0.541332	through pointers, e.g.:	-0.124939
-1.077896	destructor that destroys	-0.124939
-0.582138	container. STL deque	-0.124939
-2.623621	the compiler knows	-0.124939
-0.601777	utilities in 2010.	-0.124939
-0.601665	14.00 for 80x86	-0.124939
-0.601772	saving and restoring	-0.124939
-0.599166	a1/b1 + a2/b2;	-0.124939
-0.598128	s); return _mm_cvtss_f32(s);	-0.124939
-1.568651	the option -read_only_relocs	-0.124939
-0.601772	forums and newsgroups	-0.124939
-1.071488	as example 12.4b,	-0.124939
-2.640857	// Example 12.4b.	-0.124939
-1.567184	32 bits (rarely	-0.124939
-2.394433	a = 10000,	-0.124939
-0.359002	order a[0], b[0],	-0.124939
-2.066598	(see page 72).	-0.124939
-1.680179	about the dimensions	-0.124939
-0.359002	Darwin8 g++ v	-0.124939
-0.660012	b2, y1, y2,	-0.124939
-0.527397	parallel. Small lightweight	-0.124939
-0.836084	+ esp ;alignby4	-0.124939
-0.583281	profilers are: Coarse	-0.124939
-0.601330	Branch/loop function vectorized:	-0.124939
-0.463732	is obviously influenced	-0.124939
-0.580838	bcc, v. 5.5	-0.124939
-0.563114	API's. Memory swapping.	-0.124939
-1.741990	we are breaking	-0.124939
-1.038574	discussed below. Cannot	-0.124939
-1.794435	the library libmmt.lib	-0.124939
-0.601665	precautions for speeding	-0.124939
-0.806134	of Func ;a	-0.124939
-0.601230	(e.g. with _finite())	-0.124939
-0.660012	data decomposition. Functional	-0.124939
-0.359002	superior performance/price ratio.	-0.124939
-0.463732	can proceed unattended.	-0.124939
-1.203286	number to reflect	-0.124939
-0.359002	Func ;a ;r	-0.124939
-0.359002	= {1.0f, 2.5f};	-0.124939
-2.209744	See page 61.	-0.124939
-0.359002	/Qopenmp -m32 -m64	-0.124939
-0.660012	is busy concentrating	-0.124939
-1.539551	bitwise operators (&	-0.124939
-0.570629	languages. My preference	-0.124939
-0.359002	list[100]; Func1(list, &list[8]);	-0.124939
-1.696876	in registers (6	-0.124939
-0.595012	be while (0	-0.124939
-0.891151	256 bits (YMM)	-0.124939
-2.640857	// Example 12.4d.	-0.124939
-0.577673	the copying process,	-0.124939
-1.078891	under the best-case	-0.124939
-0.599166	c.x + d.x;	-0.124939
-0.958059	(vector) reductions: a+b=b+a,	-0.124939
-1.696876	in registers (8	-0.124939
-0.901817	x.abc = (A	-0.124939
-0.875598	4) | (C	-0.124939
-0.597985	A software developer	-0.124939
-0.588096	A | (B	-0.124939
-0.359002	set Prefetch PREFETCH	-0.124939
-0.589197	called name mangling.	-0.124939
-1.478008	are less susceptible	-0.124939
-0.577673	improving performance. Stefan	-0.124939
-1.049613	flush-to-zero mode (SSE):	-0.124939
-1.300005	vectors of inte-	-0.124939
-0.463732	/MT -msse3 /arch:SSE3	-0.124939
-2.177960	value of sum.	-0.124939
-2.283633	such as ReadB	-0.124939
-0.601394	VHDL or Verilog.	-0.124939
-0.601876	N-1 is inferior.	-0.124939
-2.022052	the first dimension	-0.124939
-0.541332	than third party	-0.124939
-0.463732	Processors". www.amd.com. Advices	-0.124939
-1.059623	of error reporting.	-0.124939
-2.955917	can be wired	-0.124939
-2.640857	// Example 14.12a	-0.124939
-1.979358	takes to refresh	-0.124939
-0.601674	here: The inequality	-0.124939
-0.726952	using templates. Ready	-0.124939
-0.600193	having different types.	-0.124939
-1.798598	b = !a;	-0.124939
-0.563122	n; #if defined(__unix__)	-0.124939
-2.568185	and the destructor,	-0.124939
-2.101628	make a destructor.	-0.124939
-2.568185	and the wires	-0.124939
-2.314404	use the _mm_clflush	-0.124939
-0.901759	types or sizes?	-0.124939
-2.104895	make the SelectAddMul	-0.124939
-0.463732	x-xxxxxx- a*0=0 --xxxx-xx	-0.124939
-0.505122	43 7.13 Loops......................................................................................................................	-0.124939
-0.600983	set int iset	-0.124939
-3.071155	in the beginning.	-0.124939
-1.940741	necessary to adhere	-0.124939
-0.601423	clock = __rdtsc();	-0.124939
-0.593316	2.0 / 3.0;	-0.124939
-1.518024	clock cycles. Calculations	-0.124939
-0.587488	A longer loop-	-0.124939
-0.505122	for(i=0; i<100; i++)a[i]=2*i;	-0.124939
-0.601772	recoverable and non-recoverable	-0.124939
-1.578406	has no side-effects	-0.124939
-0.359002	stupid things. Looking	-0.124939
-1.075239	optimize this loop?	-0.124939
-0.359002	4.0.1. Gnu: Glibc	-0.124939
-0.599581	of class C1,	-0.124939
-0.593763	each line written.	-0.124939
-1.596584	}; // Partial	-0.124939
-1.854235	elements in a[]	-0.124939
-0.463732	by consistent modularity	-0.124939
-0.890473	unknown processors properly.	-0.124939
-0.589197	-fp- model fast=2	-0.124939
-2.195854	short int aa[size]	-0.124939
-0.586788	F2 actually throws	-0.124939
-0.601851	branches to feed	-0.124939
-3.071155	in the former	-0.124939
-0.359002	the "FDIV bug".	-0.124939
-1.463665	the execution considerably.	-0.124939
-0.557829	Some developers feel	-0.124939
-1.202407	problems that relate	-0.124939
-2.394433	a = b++;	-0.124939
-0.597417	other less well-known	-0.124939
-0.847523	a vector. 6.	-0.124939
-0.900953	Use an antivirus	-0.124939
-0.899365	be different sizes,	-0.124939
-1.961954	into the pipeline.	-0.124939
-0.359002	fistpl %0 "	-0.124939
-2.283633	such as gates,	-0.124939
-0.866322	in computer games.	-0.124939
-0.598128	134) return FactorialTable[n];	-0.124939
-0.596901	last time. Newer	-0.124939
-0.599358	inline double IntegerPower	-0.124939
-2.016540	on page 158.	-0.124939
-0.575698	operators (e.g. '>')	-0.124939
-0.588095	options Table 18.1.	-0.124939
-1.129391	to become obsolete	-0.124939
-1.299987	15.1b to 15.1c,	-0.124939
-0.598128	of return prediction).	-0.124939
-1.272311	int 32 -231	-0.124939
-0.870056	the feature information,	-0.124939
-0.601851	(a*b*c)+(c*b*a) to a*b*c*2.	-0.124939
-0.601230	users with nagging	-0.124939
-0.541332	cases. Multiple threads?	-0.124939
-2.151139	in example 7.22.	-0.124939
-0.764737	programming languages. www.yeppp.info	-0.124939
-0.597888	temp < &list[100];	-0.124939
-0.864132	32-bit Windows, Intel/MASM	-0.124939
-0.463732	Linux. 82 Keywords	-0.124939
-1.296148	program. This requires,	-0.124939
-0.589681	leaving their workplace	-0.124939
-1.119701	is particularly risky	-0.124939
-1.598122	option for "standard	-0.124939
-2.834872	it is cached,	-0.124939
-0.463732	& obj1; p->f();	-0.124939
-0.573388	on bounds checking).	-0.124939
-0.505122	a pivot search:	-0.124939
-0.601448	"instrset_detect.cpp" // instrset_detect	-0.124939
-1.496948	a time measure.	-0.124939
-0.902513	program and concentrate	-0.124939
-0.601230	additions with double's.	-0.124939
-0.557829	each case. Inlined	-0.124939
-0.601423	a+a+a+a = a*4	-0.124939
-0.866311	that supports this).	-0.124939
-0.598586	x2 * x2;	-0.124939
-0.359002	Common Language Runtime,	-0.124939
-0.359002	requires n-1 multiplications,	-0.124939
-0.875623	test data. That	-0.124939
-0.895723	When we reach	-0.124939
-0.359002	xxn(x4, x2*x, x2,	-0.124939
-2.712349	- - 76	-0.124939
-0.463732	c x-xx----- 75	-0.124939
-2.195854	short int 832	-0.124939
-0.999913	a considerable job,	-0.124939
-1.412785	the optimization job.	-0.124939
-1.042695	by 8. 71	-0.124939
-0.600479	temp; } 70	-0.124939
-0.597491	compilers. See www.openmp.org	-0.124939
-1.078317	up and down.	-0.124939
-0.463732	the programmer. 79	-0.124939
-0.902131	devices are CPLDs	-0.124939
-0.598556	or array coincides	-0.124939
-2.640857	// Example 8.14b	-0.124939
-0.789093	or key press.	-0.124939
-2.640857	// Example 8.14a	-0.124939
-2.101628	make a bit-mask	-0.124939
-1.038569	be too worried	-0.124939
-0.591976	"=m"(n) : "m"(x)	-0.124939
-0.601963	extending the sign-bit	-0.124939
-2.640857	// Example 7.33a	-0.124939
-0.894983	things very stupid.	-0.124939
-0.600378	is memory pooling.	-0.124939
-0.463732	objects (memory pooling)	-0.124939
-0.902233	thread that shares	-0.124939
-1.202968	clear and modular.	-0.124939
-0.550887	(bit scan forward)	-0.124939
-1.038561	has three advantages:	-0.124939
-0.601772	GetPrivateProfileString and WritePrivateProfileString	-0.124939
-0.557825	(Visual Studio 2005).	-0.124939
-2.422609	to use try,	-0.124939
-2.581419	floating point multiply-and-add	-0.124939
-0.567273	it writes only,	-0.124939
-1.502214	// This triangle	-0.124939
-0.463732	x-xx----x x-xxxxxx- x-xxxx-x-	-0.124939
-0.601448	lrint(d); // Rounding	-0.124939
-0.575698	algorithm (e.g. Quine–McCluskey	-0.124939
-0.586798	calculation requires n-1	-0.124939
-0.359002	the fundamental laws	-0.124939
-0.601851	initialize to x^0/0!	-0.124939
-0.901544	separated by semicolons,	-0.124939
-2.441924	instruction set (/arch:SSE2,	-0.124939
-1.201578	f = float(i);	-0.124939
-1.609569	with many decimals.	-0.124939
-0.359002	loop? Certainly not!	-0.124939
-1.830963	be stored together......................................	-0.124939
-0.593083	their uses (live	-0.124939
-2.581419	floating point -ffast-math	-0.124939
-1.600625	overhead of managing	-0.124939
-1.199828	b; int Sum1()	-0.124939
-0.601876	occurrence is rare.	-0.124939
-1.827910	preferably be responded	-0.124939
-0.577673	are adding -100	-0.124939
-0.599166	y.b + 2.;	-0.124939
-2.773232	in a pre-calculated	-0.124939
-0.463732	b*a (a+b)+c=a+(b+c) a+b+c=c+b+a	-0.124939
-1.776703	do the devirtualization	-0.124939
-1.679514	compilers and invoked	-0.124939
-1.194672	or two 128-	-0.124939
-2.151139	in example 9.5b.	-0.124939
-2.431519	to a printer	-0.124939
-0.359002	proceed unattended. Uninstallation	-0.124939
-0.584309	future CPUs. Half	-0.124939
-1.430117	Instruction set Important	-0.124939
-0.601772	Func1 and Func2	-0.124939
-0.463732	difference, let's say	-0.124939
-1.288282	In example 12.3a,	-0.124939
-1.478412	point variables .........................	-0.124939
-0.851907	two 128-bit reads.	-0.124939
-0.359002	1./6., 1./24., 1./120.,	-0.124939
-1.655765	are using unions	-0.124939
-0.359002	C++, D, Pascal,	-0.124939
-0.660012	dependency chains, namely	-0.124939
-2.100095	the data structure,	-0.124939
-0.588110	writing data. Multidimensional	-0.124939
-0.601145	C++ as 'this'.	-0.124939
-1.377844	compiling for AVX2,	-0.124939
-0.588671	<stdio.h> #include <asmlib.h>	-0.124939
-1.384381	in both 16-bit,	-0.124939
-0.901432	example with u.i[1]	-0.124939
-0.588096	unroll too much.	-0.124939
-1.566354	induction variable (eax)	-0.124939
-0.359002	#pragma optimize("a", on)	-0.124939
-0.601963	draw the attention	-0.124939
-3.071155	in the level-	-0.124939
-1.553714	time and maintainability	-0.124939
-1.475440	float f; f=i;	-0.124939
-1.070557	exponent + 0x7F	-0.124939
-1.741990	we are relying	-0.124939
-0.359002	the BIOS setup.	-0.124939
-0.557838	= 1, Monday	-0.124939
-1.888348	is an n'th	-0.124939
-0.570622	the chapter "Register	-0.124939
-0.660012	| (~a&c) a&b&c&d	-0.124939
-0.599544	within each clause	-0.124939
-2.006980	cannot be ignored	-0.124939
-2.102394	use a union,	-0.124939
-0.601674	advantages: The i<20	-0.124939
-0.359002	the <, <=,	-0.124939
-0.601772	division and relational	-0.124939
-1.075847	= x *x;	-0.124939
-0.550882	"Intel Performance Primitives"	-0.124939
-0.726952	Example 14.30 finds	-0.124939
-0.596250	or always false:	-0.124939
-0.593563	code inside square:	-0.124939
-0.359002	obeyed. Copy protection.	-0.124939
-0.601394	incremental or iterative	-0.124939
-0.591327	or shared objects),	-0.124939
-1.119701	is particularly tricky.	-0.124939
-0.601963	was the opposite:	-0.124939
-1.961954	into the for-loop:	-0.124939
-1.805117	has the complication	-0.124939
-2.394433	a = ++b;	-0.124939
-0.601866	half a square.	-0.124939
-0.359002	SelectAddMul, SelectAddMul_SSE2, SelectAddMul_SSE41,	-0.124939
-1.995759	have a strategy	-0.124939
-0.601286	negative by AND'ing	-0.124939
-0.596260	{}; void xplus2()	-0.124939
-0.594700	on processor X?"	-0.124939
-0.505131	at Exception Specifications,	-0.124939
-2.088797	be a million	-0.124939
-1.781679	memory allocation (new	-0.124939
-1.589054	for this reason.	-0.124939
-0.600742	For this reason,	-0.124939
-0.359002	-Ofast -mveclibabi -fopenmp	-0.124939
-0.584307	into smaller squares	-0.124939
-0.527397	are optimal. Best-case	-0.124939
-0.595303	reply about investigation	-0.124939
-1.299901	integer in disguise.	-0.124939
-0.598128	normal return route.	-0.124939
-1.038569	be too small,	-0.124939
-0.601674	reputation. The compactness	-0.124939
-2.328144	the loop overhead.	-0.124939
-0.881857	Loop counter //=2*A	-0.124939
-0.591976	1.5f : 2.6f;	-0.124939
-0.359002	results printf("\n%2i %10I64i",	-0.124939
-0.599579	are floating point-to-integer	-0.124939
-1.075214	this instruction set?".	-0.124939
-1.299102	loop. The loop-branch	-0.124939
-1.951190	{ return vector(x	-0.124939
-2.640857	// Example 8.8b	-0.124939
-1.370560	Mathematical functions Encryption,	-0.124939
-2.640857	// Example 8.8a	-0.124939
-0.550877	~a ^ ~b	-0.124939
-0.463732	many users. Firewalls,	-0.124939
-0.359002	---xx---- a<<b<<c=a<<(b+c) x-xxx--xx	-0.124939
-0.599166	(int)(&list[0]) + 100*16,	-0.124939
-0.359002	color difference. Newest	-0.124939
-1.460714	is always normalized,	-0.124939
-0.888808	in Windows MFC).	-0.124939
-1.169269	stack unwinding ..............................................................................	-0.124939
-1.376079	threads with widely	-0.124939
-1.771727	would be re-calculated	-0.124939
-1.202577	not. The advise	-0.124939
-0.589204	same cache. Multithreaded	-0.124939
-0.590130	mainstream next year.	-0.124939
-0.463732	set specified. Insert	-0.124939
-0.359002	function inlining. Reducible	-0.124939
-0.593091	elements }; vector()	-0.124939
-0.884288	A few decades	-0.124939
-0.601772	high and decreased	-0.124939
-1.297943	optimization by CPU.............................................................................81	-0.124939
-2.394433	a = (*CriticalFunction)(b,	-0.124939
-2.640857	// Example 12.7.	-0.124939
-0.596264	other access patterns.	-0.124939
-0.902513	develop and publish	-0.124939
-0.597672	to const definitions	-0.124939
-2.394433	a = Multiply(10,8);	-0.124939
-1.078631	overflow is "undefined".	-0.124939
-0.726952	is handled separately:	-0.124939
-1.454266	DWORD PTR [ecx+eax*4],ebx	-0.124939
-1.680275	call the std::unexpected()	-0.124939
-0.601423	a.x = b.x	-0.124939
-0.601855	thousands of people.	-0.124939
-0.359002	exceptions: __except (GetExceptionCode()	-0.124939
-1.568651	the option -mveclibabi=svml.	-0.124939
-1.200334	c++) { a[c][r]	-0.124939
-2.107799	they are uninitialized,	-0.124939
-2.467979	number of jumps,	-0.124939
-1.805840	a few places.	-0.124939
-0.463732	in scientific computing,	-0.124939
-0.359002	programming nowadays stress	-0.124939
-0.885319	char 128 Iu8vec16	-0.124939
-0.359002	b[0], a[1], b[1],	-0.124939
-1.777089	unless the strictness	-0.124939
-0.588095	Microsoft Table 2.1.	-0.124939
-0.594700	size, bytes alignment,	-0.124939
-0.580838	Asmlib: v. 2.00.	-0.124939
-0.541338	value wrap around.	-0.124939
-0.359002	common sub-expressions. Why	-0.124939
-0.828405	3.13 Memory access.......................................................................................................	-0.124939
-1.066783	(i < arraysize)	-0.124939
-0.595823	pointer type casting.	-0.124939
-0.595823	simple type casting,	-0.124939
-1.099295	array bounds violations	-0.124939
-0.463732	better: -Ofast -mveclibabi	-0.124939
-1.077186	new or malloc.	-0.124939
-0.359002	- 2014. Last	-0.124939
-1.077186	new or malloc)	-0.124939
-0.359002	{ _mm_stream_pi((__m64*)dest, *(__m64*)&source);	-0.124939
-0.597888	u < 231	-0.124939
-1.695336	a specific interval.	-0.124939
-0.601580	influences are removed,	-0.124939
-1.278281	}; void Func()	-0.124939
-1.048689	a certain interval:	-0.124939
-0.902525	algorithm in question:	-0.124939
-0.894785	c2 < r2;	-0.124939
-1.099295	array bounds violation,	-0.124939
-1.298636	methods are incremental	-0.124939
-0.601876	7.43b is admittedly	-0.124939
-0.597486	activates critical application-	-0.124939
-1.078555	idea to collect	-0.124939
-0.359002	sets. Covers PC's,	-0.124939
-0.877734	The name "position-independent	-0.124939
-2.460478	by the application,	-0.124939
-0.580860	and hot spots.	-0.124939
-2.180477	i < list.Size();	-0.124939
-2.573352	on the essential	-0.124939
-0.359002	int r1, r2,	-0.124939
-1.076864	multiply by xx-xx--x-	-0.124939
-0.601772	distribution and mirroring	-0.124939
-1.598122	option for "function	-0.124939
-1.625523	have been identified.	-0.124939
-0.359002	= MAX(f(x), g(x));	-0.124939
-1.177742	library functions. Time-	-0.124939
-0.590940	set was originally	-0.124939
-0.601876	lines is 8*1024/64	-0.124939
-0.359002	x-xxxx--x ~a&~b=~(a|b) --xxxx---	-0.124939
-0.806143	as follows (using	-0.124939
-2.460478	by the series:	-0.124939
-0.599166	y.c + 3.;	-0.124939
-0.587477	(u.i > v.i)	-0.124939
-2.394433	a = select_gt(b,	-0.124939
-1.377769	{...} // Prototype	-0.124939
-1.398568	the stack. String	-0.124939
-0.901432	called with IsPowerOf2	-0.124939
-0.359002	version 2.20, glibc	-0.124939
-2.670223	for the pros	-0.124939
-0.359002	special mathe- matical	-0.124939
-0.660012	members (properties) ............................................................................	-0.124939
-1.078631	element is stored?	-0.124939
-0.598763	allocation also tends	-0.124939
-2.190521	from the leftmost	-0.124939
-1.503423	Intel and Gnu).	-0.124939
-0.527397	12.9b. Taylor series,	-0.124939
-0.527397	a Taylor series.	-0.124939
-0.891151	512 bits (ZMM).	-0.124939
-0.359002	Table 18.1. Command	-0.124939
-0.580838	VectorC v. 2.1.7,	-0.124939
-0.660012	the effort. Square	-0.124939
-1.601190	example, the DelayFiveSeconds	-0.124939
-0.596989	show how tortuous	-0.124939
-1.063141	ZMM registers ..........................................................	-0.124939
-0.359002	buttons, dialog boxes,	-0.124939
-0.359002	= _mm_hadd_ps(s, s);	-0.124939
-0.599636	control no yes	-0.124939
-0.359002	Menus, buttons, dialog	-0.124939
-1.888348	is an integer).	-0.124939
-0.903059	&, |, ~.	-0.124939
-0.958059	(vector) reductions: ~(~a)	-0.124939
-0.359002	a quadratic matrix,	-0.124939
-0.541332	x-xxxxxxx xxxxxxxxx xxxxxxx-x	-0.124939
-0.588098	164 below. Those	-0.124939
-0.541344	current .cpp file)	-0.124939
-0.597889	on branch predictions	-0.124939
-0.583285	some positive value,	-0.124939
-1.444368	b, c; x[0]	-0.124939
-0.600794	source) { _mm_stream_pi((__m64*)dest,	-0.124939
-0.601145	OneOrTwo5[b!=0] as OneOrTwo5[(b!=0)	-0.124939
-0.601851	manipulated to fake	-0.124939
-0.891151	128 bits (XMM)	-0.124939
-1.197807	doing multiple logically	-0.124939
-0.600065	options. CPU vendors	-0.124939
-1.939572	an integer constant,	-0.124939
-0.600983	d; int i[2];	-0.124939
-1.911276	that you analyze	-0.124939
-2.955917	can be accomplished	-0.124939
-0.359002	use try, catch,	-0.124939
-0.359002	Third Edition, 2005;	-0.124939
-0.359002	(Gnu) AES, PCLMUL	-0.124939
-0.875606	const keyword wherever	-0.124939
-0.589682	on complicated criteria	-0.124939
-0.726952	7.22 Inheritance ..............................................................................................................	-0.124939
-0.590137	pre-increment operator ++i	-0.124939
-0.463732	is dividing repeatedly	-0.124939
-0.878688	functions like sqrt,	-0.124939
-0.901759	__restrict or __restrict__,	-0.124939
-0.601665	hardware for raising	-0.124939
-0.359002	file dvec.h vectorclass.h	-0.124939
-1.672453	the next section.	-0.124939
-2.654124	the code section,	-0.124939
-0.596268	variable method unfavorable,	-0.124939
-1.644363	vector classes 114	-0.124939
-0.726952	7.25 Bitfields ...................................................................................................................	-0.124939
-1.745437	the programmer hasn't	-0.124939
-0.463732	0.f, 0.f, 1.f);	-0.124939
-0.596165	xmmintrin.h SSE2 emmintrin.h	-0.124939
-1.673175	not always optimal,	-0.124939
-2.736646	that the occurrence	-0.124939
-0.359002	of fine-tuning, testing,	-0.124939
-1.510201	the preceding row.	-0.124939
-1.829000	member functions (methods).........................................................................	-0.124939
-2.367289	rather than self-styled	-0.124939
-0.599166	SVML + ia32intrin.h	-0.124939
-0.595696	powN<(N & N-1)==0,N>::p(x);	-0.124939
-1.960315	than a minute	-0.124939
-0.601851	simple to develop.	-0.124939
-0.764737	they are. Declare	-0.124939
-1.300643	get the exact	-0.124939
-1.714159	amount of RAM,	-0.124939
-0.902897	identify the circumstances	-0.124939
-0.898145	template<> class powN<true,1>	-0.124939
-0.601665	~ for NOT.	-0.124939
-1.077273	(set) = (0x2710	-0.124939
-1.032792	8 AVX2 _mm_i64gather_epi32	-0.124939
-0.593316	CodeGear / Embarcadero	-0.124939
-0.889960	thousand times lower;	-0.124939
-0.598704	have efficient table-based	-0.124939
-1.287773	to reduce (a*b*c)+(c*b*a)	-0.124939
-0.359002	2056 38.1 97	-0.124939
-1.816588	to calculate pow(x,10)	-0.124939
-0.359002	called VTune; AMD's	-0.124939
-0.902680	blocks of data",	-0.124939
-1.948839	i++) { 92	-0.124939
-0.901544	space by allowing	-0.124939
-0.660012	"Instruction tables". Tips	-0.124939
-0.601394	compiler, or vice	-0.124939
-0.598871	random number generators.	-0.124939
-0.601423	x8 = x4*x4;	-0.124939
-1.959451	I have tested.	-0.124939
-0.599166	b.x + c.x	-0.124939
-0.599166	b.y + c.y	-0.124939
-0.888493	soft processor activates	-0.124939
-0.550877	n places back,	-0.124939
-1.731018	only when activated	-0.124939
-0.359002	a polynomial. Scheduling	-0.124939
-2.640857	// Example 7.34b.	-0.124939
-0.660012	((visibility ("internal"))) Vectorize	-0.124939
-0.583283	by my comments,	-0.124939
-0.875638	two induction variables:	-0.124939
-0.570626	Important features 80386	-0.124939
-2.640857	// Example 14.16b	-0.124939
-0.463732	// Portability note:	-0.124939
-0.359002	example 13.1, Requires	-0.124939
-1.777089	unless the Pentium-II	-0.124939
-0.902723	product is Borland's	-0.124939
-0.563126	xxn *= xx4;	-0.124939
-0.463732	other exceptions: __except	-0.124939
-0.601855	methods of rounding,	-0.124939
-0.660012	serious legal issue,	-0.124939
-0.463732	of removing superfluous	-0.124939
-2.198736	For example, b*2.0/3.0	-0.124939
-2.219197	will be mainstream	-0.124939
-1.077273	s = _mm_hadd_ps(s,	-0.124939
-1.379477	example is Perl.	-0.124939
-2.640857	// Example 14.17a	-0.124939
-1.299987	15.1b to 15.1c?	-0.124939
-0.601665	72 for discussions.	-0.124939
-0.889688	called stack unwinding.	-0.124939
-0.541332	or __asm ("int	-0.124939
-0.902673	set to relax	-0.124939
-0.567266	set (or higher)	-0.124939
-0.463732	; i++ ;checkifi<100	-0.124939
-1.030488	are usually dealt	-0.124939
-1.077908	x*x*x*x*x*x*x*x = ((x2)2)2	-0.124939
-1.458129	in assembly language:	-0.124939
-1.767499	{ int dummy[4];	-0.124939
-0.899108	newest CPU model,	-0.124939
-1.786042	C++ compilers exist	-0.124939
-2.640857	// Example 14.15a	-0.124939
-0.851881	of structures (without	-0.124939
-0.359002	always true/false Loopunrolling	-0.124939
-1.379111	parameter is wrong,	-0.124939
-0.600943	that compiler makers	-0.124939
-1.503603	not a textbook	-0.124939
-1.049607	exponent : 15;	-0.124939
-0.899732	as memory leaks.	-0.124939
-0.359002	is unreasonably large.	-0.124939
-0.463732	affinity mask. Poor	-0.124939
-0.359002	p. 22). 159	-0.124939
-0.595165	ReadTSC function. 154	-0.124939
-0.463732	/GR– -fno-rtti /GR-	-0.124939
-0.600479	IntegerPower<10>(x); } 152	-0.124939
-0.828420	a GOT entry.	-0.124939
-1.076994	as function inlining.	-0.124939
-1.830426	call to Object1.Hello(),	-0.124939
-1.203286	15.1a to 151	-0.124939
-0.591659	and well thought-through	-0.124939
-0.902673	not to vectorize.	-0.124939
-0.601772	works and suggests	-0.124939
-1.735172	time than normally.	-0.124939
-1.143308	4 AVX2 _mm256_i32gather_ps	-0.124939
-0.585220	99 10 Multithreading..............................................................................................................	-0.124939
-0.359002	exit(), abort(), _endthread(),	-0.124939
-0.359002	uses (live ranges)	-0.124939
-0.589691	metaprogramming implementation analogous	-0.124939
-0.896768	The two summation	-0.124939
-0.893740	of operating system.........................................................................................	-0.124939
-0.895261	the return statement:	-0.124939
-0.596250	am always happy	-0.124939
-1.615986	and operating systems"	-0.124939
-0.577680	32-bit Linux, Gnu/AT&T	-0.124939
-0.891370	System programming Device	-0.124939
-0.601772	PC's and mainframes,	-0.124939
-2.763419	to the device.	-0.124939
-0.359002	the "Macro loops"	-0.124939
-0.463732	- vectorclass www.agner.org/optimize/#vectorclass.	-0.124939
-0.359002	__asm ("fldl %1	-0.124939
-0.505122	First-In-Last-Out access, sort	-0.124939
-0.359002	Example 7.29b floata;	-0.124939
-0.573397	tools. Automatic updates.	-0.124939
-0.876728	for automatic updates,	-0.124939
-0.557834	= log (b[i]	-0.124939
-2.088797	be a level-3	-0.124939
-1.018393	Compiler v. 14.00	-0.124939
-1.334306	is enabled. Few	-0.124939
-1.875696	function that detects	-0.124939
-1.154808	the name _alloca)	-0.124939
-0.359002	has occurred anywhere	-0.124939
-1.459254	: public B1,	-0.124939
-0.589693	references. Most importantly,	-0.124939
-0.764737	3.10 Graphics .................................................................................................................	-0.124939
-0.359002	a[0], b[0], a[1],	-0.124939
-0.598586	5 * 0.5	-0.124939
-1.162601	the user. Feature	-0.124939
-0.359002	2.23 0.95 0.6	-0.124939
-1.294479	with this mask,	-0.124939
-1.072925	100; float list[ARRAYSIZE];	-0.124939
-1.185035	int 64 Is16vec4	-0.124939
-1.146030	Intel processors. Details	-0.124939
-0.567273	Examples include JavaScript,	-0.124939
-0.961003	= 100, NUMCOLUMNS	-0.124939
-0.567266	four (or eight)	-0.124939
-0.998287	following way. First	-0.124939
-1.445723	risk of losing	-0.124939
-0.601423	time1 = ReadTSC();	-0.124939
-1.295071	multiple CPU cores:	-0.124939
-0.600943	and compiler makers.	-0.124939
-0.567277	metaprogramming. Don't panic	-0.124939
-0.601674	sin(0.8); The sin	-0.124939
-1.203386	computer is rebooted.	-0.124939
-1.032820	by calling WritePrivateProfileString,	-0.124939
-0.505122	} catch (...)	-0.124939
-1.923513	and other abuse	-0.124939
-0.600742	at this place.	-0.124939
-0.600983	FuncRow(int); int FuncCol(int);	-0.124939
-0.590137	modulo operator %.	-0.124939
-0.903059	VIA CPUs"). Const	-0.124939
-2.304254	{ // abs(u.f)	-0.124939
-0.567287	4. Instruction tables:	-0.124939
-0.894785	int)n < 13)	-0.124939
-0.359002	profiling feasible. Interference	-0.124939
-1.073605	at different times:	-0.124939
-0.601056	int)(max - min))	-0.124939
-0.847528	other resource conflicts.	-0.124939
-0.359002	also work, 133	-0.124939
-0.598188	metaprogramming so complicated?	-0.124939
-0.359002	any patch. 131	-0.124939
-0.588671	<float.h> #include <math.h>	-0.124939
-1.601914	the program. Application	-0.124939
-0.896099	the many people	-0.124939
-0.463732	as floppy disks	-0.124939
-0.567266	one parameter. Further	-0.124939
-0.902897	half the single-thread	-0.124939
-1.115950	and garbage collection,	-0.124939
-0.600479	i_div_3; } 138	-0.124939
-1.551216	virtual function tables.	-0.124939
-0.359002	\n fistpl %0	-0.124939
-0.563114	statement jump tables,	-0.124939
-1.156590	of their superior	-0.124939
-0.600614	become more powerful.	-0.124939
-2.670223	for the newsgroup	-0.124939
-1.075596	time you activate	-0.124939
-0.601394	C1 or C2,	-0.124939
-0.601230	supercomputers with massively	-0.124939
-2.177794	by a unique	-0.124939
-1.078555	costs to multithreading	-0.124939
-2.834872	it is unlikely	-0.124939
-0.601855	queue of pending	-0.124939
-1.735978	the order a[0],	-0.124939
-0.597210	= 64 kb.	-0.124939
-0.870056	is 512 kb,	-0.124939
-1.181378	is common practice	-0.124939
-1.308792	type identification (RTTI).	-0.124939
-1.308792	type identification (RTTI),	-0.124939
-0.600395	timingtest.h from www.agner.org/optimize/testp.zip	-0.124939
-0.660012	7.24 Unions ....................................................................................................................	-0.124939
-0.902513	portability and ease	-0.124939
-2.534636	to be resized	-0.124939
-0.858859	The Pentium Pro	-0.124939
-0.570632	may come unpredictably	-0.124939
-0.887067	and integers ...................................	-0.124939
-1.463644	function calls. Internal	-0.124939
-0.599705	than integer comparisons.	-0.124939
-0.463732	around, (3) trap	-0.124939
-2.955917	can be overridden	-0.124939
-0.359002	128 Iu32vec4 Vec4ui	-0.124939
-0.887797	eliminate common sub-expressions.	-0.124939
-0.541332	of valid addresses,	-0.124939
-1.073605	are different opinions	-0.124939
-1.902393	for different microprocessors,	-0.124939
-0.878690	the members individually.	-0.124939
-0.599166	p->a + p->b;}	-0.124939
-1.694032	The loop initialisation	-0.124939
-2.008958	code that matters	-0.124939
-0.359002	to fine- tune	-0.124939
-0.588095	nontemporal Table 18.3.	-0.124939
-1.947577	more than doubled	-0.124939
-1.781220	of these categories:	-0.124939
-0.726952	interrupt service routines,	-0.124939
-0.601855	perspective of usability.	-0.124939
-0.588096	_mm_setcsr(_mm_getcsr() | 0x8040);	-0.124939
-1.504376	are a couple	-0.124939
-1.586201	64-bit mode Parameter	-0.124939
-1.801124	also be huge).	-0.124939
-1.071488	as example 13.1,	-0.124939
-0.567277	division faster. Of	-0.124939
-0.505122	section (page 131)	-0.124939
-0.902513	itself and recompile	-0.124939
-2.304254	{ // Main	-0.124939
-0.463732	libraries named MKL,	-0.124939
-0.897322	log(b[i]) + log(c[i]);.	-0.124939
-0.505122	That being said,	-0.124939
-1.078566	(number of ways).	-0.124939
-0.583283	(In my tests,	-0.124939
-0.585220	facilities, binary trees,	-0.124939
-0.902703	it a template:	-0.124939
-0.359002	precision (80 bits).	-0.124939
-2.394433	a = FactorialTable[b];	-0.124939
-0.541338	been doubled. Thin	-0.124939
-1.951190	{ return _mm_cvtsd_si32(_mm_load_sd(&x));}	-0.124939
-0.359002	= (short int)i;	-0.124939
-1.959451	I have tried.	-0.124939
-0.601777	GetLogicalProcessorInformation in Windows)	-0.124939
-0.592284	multidimensional structure needed?	-0.124939
-0.557838	Volume 1, 2A,	-0.124939
-0.601145	now as follows.	-0.124939
-1.201578	f = static_cast<float>(i);	-0.124939
-2.045093	in this respect.	-0.124939
-1.408092	code becomes bulky	-0.124939
-0.600534	load. A light-weight	-0.124939
-0.505122	memory. One kilobyte	-0.124939
-1.297540	turn on correction	-0.124939
-0.601380	set it supports.	-0.124939
-2.276497	the CPU supports,	-0.124939
-0.847539	* temp; 104	-0.124939
-1.076864	performance by 5-10%	-0.124939
-0.591974	is 1 0.5ns.	-0.124939
-0.601876	modification is profitable.	-0.124939
-0.595826	8 0 255	-0.124939
-0.601674	straightforward. The MASM	-0.124939
-0.599571	at page 150.	-0.124939
-1.073244	CPUID instruction directly,	-0.124939
-0.601380	accessing it directly.	-0.124939
-0.600904	One may argue	-0.124939
-1.445431	problems and planned	-0.124939
-1.858930	of this capability:	-0.124939
-0.577673	its final destination,	-0.124939
-0.359002	and UNIX shell	-0.124939
-0.601394	(.dll or .so).	-0.124939
-1.901246	out of range");	-0.124939
-1.915130	must be reversed	-0.124939
-0.359002	0.3, -2.0, 4.4,	-0.124939
-1.594795	same as reflecting	-0.124939
-2.541450	the function scanf.	-0.124939
-1.830254	function in isolation	-0.124939
-1.298636	resources are limiting	-0.124939
-1.187680	a 32-bit (signed)	-0.124939
-2.581419	floating point exceptions,	-0.124939
-0.600682	reproducible time measurements:	-0.124939
-0.818720	// Now 1.0	-0.124939
-0.601394	(XMM or YMM)	-0.124939
-0.842338	data sets. Covers	-0.124939
-0.463732	Monday, Tuesday, Wednesday,	-0.124939
-1.463627	vector registers (XMM	-0.124939
-1.185035	int 64 Iu16vec4	-0.124939
-0.895709	need any patch.	-0.124939
-0.359002	of 1/n! 1.,	-0.124939
-2.593306	is not met	-0.124939
-0.505122	ammintrin.h (MS) xopintrin.h	-0.124939
-0.359002	FuncC(i); FuncB(i+1); FuncC(i+1);	-0.124939
-0.601866	(a&b)&(c&d) a ^0	-0.124939
-0.902897	over the C99	-0.124939
-1.169276	int 128 Iu16vec8	-0.124939
-0.505122	intrin.h (MS) x86intrin.h	-0.124939
-0.359002	shuffling, packing, unpacking	-0.124939
-0.601394	(/FAs or -fsource-asm).	-0.124939
-0.505122	user feedback seriously.	-0.124939
-0.359002	int BigArray[1024] __attribute__((aligned(64)));	-0.124939
-0.570626	address might clash	-0.124939
-0.463732	the first-in-last-out nature	-0.124939
-0.541338	the equivalent if(!(a	-0.124939
-0.600378	possible memory requirement.	-0.124939
-0.660012	SIAM 2001. Advanced	-0.124939
-2.955917	can be propagated	-0.124939
-1.300196	goes to C0::f	-0.124939
-0.597210	int64_t 64 I64vec1	-0.124939
-1.194286	generation class (CParent<>)	-0.124939
-0.527397	a bad dilemma.	-0.124939
-1.744676	instruction set. High	-0.124939
-0.359002	v 4.0.1. Gnu:	-0.124939
-0.601772	systems, and API's.	-0.124939
-0.598558	and possible workaround.	-0.124939
-1.018393	C++ v. 4.1.0,	-0.124939
-0.527404	// MOVNTQ _mm_empty();	-0.124939
-0.902525	things in parallel:	-0.124939
-1.201166	see if our	-0.124939
-2.461379	x - 8.0f)	-0.124939
-0.902723	etc. is considerable.	-0.124939
-0.359002	or "frame pointer".	-0.124939
-0.359002	Michael Abrash: "Zen	-0.124939
-1.618121	or reference parameters).	-0.124939
-0.999913	a considerable debate	-0.124939
-0.600534	Sutter: A Pragmatic	-0.124939
-0.463732	and 9. Multiplications	-0.124939
-0.600260	constant vector (1,2,3,4),	-0.124939
-0.505122	chapter (page 146).	-0.124939
-0.601777	comments, in green.	-0.124939
-0.359002	registers. Typical candidates	-0.124939
-0.660012	in C++: Preprocessor	-0.124939
-1.146059	multiple threads. Out-of-order	-0.124939
-0.595012	computer while he	-0.124939
-1.549388	used by thousands	-0.124939
-0.601866	e.g. a menu	-0.124939
-1.709452	In this chapter,	-0.124939
-0.601772	conversion and shuffling	-0.124939
-0.901817	(a&&b&&c) = a&&b	-0.124939
-0.599272	avoiding pointer arithmetics	-0.124939
-0.877734	16 256 Vec32uc	-0.124939
-0.580838	of variables. Move	-0.124939
-1.452236	to write if(!a	-0.124939
-0.601665	row++) for (column	-0.124939
-1.069733	by two gives:	-0.124939
-0.601851	rounded to 100000000.	-0.124939
-1.272273	object file formats.	-0.124939
-0.359002	Vol. 11, Iss.	-0.124939
-1.834649	it has incomplete	-0.124939
-0.359002	case 0: printf("Alpha");	-0.124939
-0.541332	59 third generations	-0.124939
-0.902723	integers is costless.	-0.124939
-0.359002	more heuristic guidelines.	-0.124939
-0.899766	or from knowing	-0.124939
-2.151139	in example 7.43b	-0.124939
-1.829967	i = 18,	-0.124939
-0.359002	time lag. Thinking	-0.124939
-0.842338	load address. Relocation	-0.124939
-0.577673	integer registers. Typical	-0.124939
-0.359002	-32768 32767 int16_t	-0.124939
-0.463732	Documentation License shall	-0.124939
-0.598586	CChild2 * p2;	-0.124939
-0.586806	a[N]; public: SafeArray()	-0.124939
-0.601772	opens and closes	-0.124939
-0.359002	0 232-1 uint32_t	-0.124939
-0.591976	1.0f : 2.5f;	-0.124939
-0.557838	temporary debug breakpoints	-0.124939
-1.078334	compiler in favor	-0.124939
-0.359002	Mac: Darwin8 g++	-0.124939
-0.660012	1 Introduction .......................................................................................................................	-0.124939
-1.034928	is still frustrated	-0.124939
-0.580838	Glibc v. 2.7,	-0.124939
-0.593970	s3 += a[i+3];	-0.124939
-1.379631	not the columns.	-0.124939
-0.557825	protected: T a[N];	-0.124939
-0.594161	sin, etc. Overriding	-0.124939
-0.597888	j < columns;	-0.124939
-0.359002	Example 7.41b a.x	-0.124939
-0.359002	+ d.x; a.y	-0.124939
-0.600325	Extra data conversion,	-0.124939
-1.600410	class is responsible	-0.124939
-0.359002	the for-loop: i++;	-0.124939
-0.660012	void F1() throw();	-0.124939
-0.541332	loop increment i++.	-0.124939
-0.589199	good development tools,	-0.124939
-1.185043	will take precedence,	-0.124939
-1.194881	then call __intel_cpu_features_init_x().	-0.124939
-0.601448	sizeof(float)); // (Some	-0.124939
-0.601772	reusable and well-	-0.124939
-0.600395	jump from a=a*2;	-0.124939
-1.295708	generate an interrupt,	-0.124939
-1.018393	C++ v. 7.1-4,	-0.124939
-0.601230	division with truncation,	-0.124939
-0.527397	ten years old.	-0.124939
-0.595826	32 0 232-1	-0.124939
-0.889678	in its API.	-0.124939
-1.179693	The type __m128d	-0.124939
-0.359002	development", Addison- Wesley	-0.124939
-2.066598	(see page 73)	-0.124939
-2.209744	See page 73.	-0.124939
-0.764737	x^4 F32vec4 s(0.f,	-0.124939
-3.071155	in the Active	-0.124939
-2.334742	as a subset,	-0.124939
-1.283797	of possible remedies	-0.124939
-2.568185	and the resultant	-0.124939
-3.071155	in the sequence,	-0.124939
-2.023798	compiler can safely	-0.124939
-0.595696	OneOrTwo5[b & 1];	-0.124939
-3.208921	of the kind:	-0.124939
-0.359002	a multitasking environment,	-0.124939
-0.599579	8 floating point).	-0.124939
-1.034910	Preprocessing directives (everything	-0.124939
-0.359002	Visual studio 2008,	-0.124939
-0.828413	* __restrict bb)	-0.124939
-0.359002	v. 7.1-4, 2008.	-0.124939
-0.600260	supports vector intrinsics,	-0.124939
-1.073804	for vector intrinsics.	-0.124939
-1.543623	an extra layer	-0.124939
-0.527404	First-In-Last- Out (FILO)	-0.124939
-0.575709	"function level linking"	-0.124939
-2.640857	// Example 14.2a	-0.124939
-2.640857	// Example 14.2b	-0.124939
-0.600473	FuncB, then FuncC.	-0.124939
-0.601772	brands and similarly	-0.124939
-0.463732	Server 2008 R2	-0.124939
-0.359002	of 0x800 apart.	-0.124939
-0.586045	integer vectors FMA3	-0.124939
-1.078317	structure and clarity	-0.124939
-0.589688	cases like these,	-0.124939
-1.706151	compatible with these.	-0.124939
-0.359002	CPU hardware. Porting	-0.124939
-1.767402	has been wasted.	-0.124939
-0.463732	as memcpy, memmove,	-0.124939
-1.454266	DWORD PTR [esp+12]	-0.124939
-0.902300	feature for reserving	-0.124939
-2.743091	It is tempting	-0.124939
-0.588094	or three levels	-0.124939
-0.596260	thread void DelayFiveSeconds()	-0.124939
-1.078555	intended to mimic	-0.124939
-0.359002	C++". Addison-Wesley. Third	-0.124939
-1.937031	useful for identifying	-0.124939
-0.595452	functions. I disagree	-0.124939
-2.283633	such as AQtime,	-0.124939
-2.640857	// Example 14.29	-0.124939
-2.640857	// Example 14.24	-0.124939
-2.640857	// Example 14.25	-0.124939
-1.162639	this problem. Vectors	-0.124939
-0.726952	fast approximate reciprocal,	-0.124939
-2.640857	// Example 14.20	-0.124939
-2.151139	in example 14.21	-0.124939
-2.239089	have to adapt	-0.124939
-1.934514	functions are unrelated	-0.124939
-3.071155	in the unit-	-0.124939
-0.836084	#define FUNCNAME SelectAddMul_AVX2	-0.124939
-1.596991	d = 1.6;	-0.124939
-0.550892	have spent fighting	-0.124939
-0.601772	_intel_fast_memcpy and __intel_new_strlen	-0.124939
-1.078605	x86 and ARM	-0.124939
-1.299901	result in a[i].	-0.124939
-1.237897	is running at,	-0.124939
-2.122795	it can overwrite	-0.124939
-0.359002	thread increments seconds.	-0.124939
-1.784518	byte at 399	-0.124939
-0.593768	wmmintrin.h AVX immintrin.h	-0.124939
-0.601866	afterwards a BSF	-0.124939
-0.900939	volatile int seconds;	-0.124939
-1.769413	b * 1.2f;	-0.124939
-0.601145	15.1c as intended,	-0.124939
-0.601394	window or makefile.	-0.124939
-0.598550	modifies many strings.	-0.124939
-1.199514	system may supply	-0.124939
-2.102394	use a queue.	-0.124939
-2.648786	the same queue,	-0.124939
-0.597540	was called from),	-0.124939
-1.710891	functions for distinguishing	-0.124939
-0.359002	(doubly ended queue)	-0.124939
-0.359002	the newsgroup comp.lang.asm.x86	-0.124939
-2.151139	in example 9.1b.	-0.124939
-2.195854	short int cc[]);	-0.124939
-0.961003	in parallel. Coarse-grained	-0.124939
-0.557834	is Intel's term	-0.124939
-0.359002	like string, wstring	-0.124939
-0.586796	override public symbols,	-0.124939
-0.527397	quite dramatic consequences.	-0.124939
-1.445723	risk of activating	-0.124939
-0.887058	sum1 += sum2;	-0.124939
-0.902723	threads is minimized.	-0.124939
-0.585224	elements were inserted,	-0.124939
-0.598128	here: return *(T*)0;	-0.124939
-2.151139	in example 7.30b.	-0.124939
-0.900781	frequency may vary	-0.124939
-2.640857	// Example 14.4a	-0.124939
-0.901946	data can exceed	-0.124939
-0.588094	INVALID_HANDLE_VALUE && WriteFile(handle,	-0.124939
-1.201578	SelectAddMul_pointer = &SelectAddMul_SSE41;	-0.124939
-0.359002	Journal Vol. 11,	-0.124939
-0.593757	already been allocated.	-0.124939
-1.115907	in chapter 11.	-0.124939
-0.902723	cost is minimized	-0.124939
-2.047090	do not alias,	-0.124939
-0.600479	2.0f; } 115	-0.124939
-1.366783	on Intel Atom	-0.124939
-0.359002	SafeArray <float, 100>	-0.124939
-0.359002	= instrset_detect(); 116	-0.124939
-1.076026	a); } 111	-0.124939
-0.463732	_mm_and_si128(c2, mask); 110	-0.124939
-0.965195	to wrap around,	-0.124939
-1.074457	x); } 112	-0.124939
-1.055358	with option -Wstrict-overflow=2,	-0.124939
-0.463732	FuncA(i); FuncC(i); FuncB(i+1);	-0.124939
-0.359002	vector operands: minimum,	-0.124939
-1.482729	is called. 118	-0.124939
-1.049607	exponent : 11;	-0.124939
-2.177794	by a constant:	-0.124939
-0.359002	S. Warren, Jr.:	-0.124939
-2.066598	(see page 70).	-0.124939
-0.590137	AND operator (&)	-0.124939
-1.630967	2; } list[300]	-0.124939
-1.255482	Boolean operators (&&	-0.124939
-1.823421	} // Default	-0.124939
-0.463732	will remain locked	-0.124939
-0.359002	1./2., 1./6., 1./24.,	-0.124939
-0.595698	machine instructions executed,	-0.124939
-2.534636	to be annoying.	-0.124939
-0.836084	linked list. 94	-0.124939
-1.045549	more space 91	-0.124939
-0.593970	temp += 9;	-0.124939
-0.359002	example 9.5a: 98	-0.124939
-0.590140	the Mac platform,	-0.124939
-0.900181	memory when exiting	-0.124939
-1.870103	in some rare	-0.124939
-0.601855	program of occupying	-0.124939
-1.077908	c2 = _mm_and_si128(c2,	-0.124939
-0.601772	format and getting	-0.124939
-0.902525	possible in Linux).	-0.124939
-2.640857	// Example 7.41a	-0.124939
-2.640857	// Example 7.41b	-0.124939
-0.563110	are zero. Zero	-0.124939
-0.585224	list[i] << endl;	-0.124939
-0.575704	0x40) % 0x20	-0.124939
-1.454266	DWORD PTR [eax],	-0.124939
-0.359002	Example 7.38b. Alternative	-0.124939
-0.877747	a Boolean NOT	-0.124939
-0.463732	be misleading reports	-0.124939
-0.594696	the big registration	-0.124939
-1.928707	of an error;	-0.124939
-0.601330	Fast function calling.	-0.124939
-0.557834	keyword far (arrays	-0.124939
-0.601394	__thread or __declspec(thread).	-0.124939
-1.626354	a * 2.5;	-0.124939
-2.375977	If the granularity	-0.124939
-0.764737	u; u.i &=	-0.124939
-2.955917	can be accessed.	-0.124939
-3.071155	in the BIOS	-0.124939
-0.902703	transposes a quadratic	-0.124939
-0.899923	first, then d+e,	-0.124939
-1.078631	r is re-loaded	-0.124939
-1.455778	the constant 2.5,	-0.124939
-0.884300	often contains writeable	-0.124939
-0.601394	new/delete or malloc/free	-0.124939
-0.764737	is enabled (single	-0.124939
-1.065761	feature called "Gnu	-0.124939
-0.463732	instruction xor eax,eax.	-0.124939
-0.601777	iterative in nature,	-0.124939
-0.463732	/arch:SSE3 -mssse3 /arch:SSSE2	-0.124939
-2.016540	on page 27.	-0.124939
-0.359002	Addison- Wesley 1997.	-0.124939
-0.593762	and classes Nowadays,	-0.124939
-0.575698	cache (e.g. Sandy	-0.124939
-0.550882	"Integrated Performance Primitives".	-0.124939
-1.240218	function name ;startofFunc	-0.124939
-1.299619	results in meaningless	-0.124939
-1.076955	x-- x ---	-0.124939
-0.597128	mechanisms often disturb	-0.124939
-1.504570	put the task-specific	-0.124939
-0.463732	network connections. Temporary	-0.124939
-1.076245	1. This '1'	-0.124939
-0.601423	DontSkip = dummy[0];	-0.124939
-1.502848	function and replaces	-0.124939
-1.201578	temp = a+1;	-0.124939
-2.192153	are not reproducible.	-0.124939
-1.267647	it does incredibly	-0.124939
-0.855622	a variable. Efficiency	-0.124939
-0.903059	is valid. Re-interpreting	-0.124939
-1.535804	to using inheritance.	-0.124939
-0.598892	Avoid multiple inheritance,	-0.124939
-0.505122	same brand. Future	-0.124939
-1.529360	the second sub-vector	-0.124939
-0.879563	OR operator (^)	-0.124939
-0.601876	'this' is incurred	-0.124939
-0.875592	(a<b && b<c)	-0.124939
-0.876723	is Microsoft Foundation	-0.124939
-1.810948	the other volumes	-0.124939
-0.588095	(Gnu) Table 12.2.	-0.124939

\4-grams:
-1.591110	then it is the	-0.124939
-0.989021	But it is the	-0.124939
-0.569881	Now it is the	-0.124939
-0.339206	rely on is the	-0.124939
-0.571713	machine code is the	-0.124939
-0.507582	// This is the	-0.124939
-0.454373	program. This is the	-0.124939
-0.645395	data. This is the	-0.124939
-0.454373	pointer. This is the	-0.124939
-0.454373	well. This is the	-0.124939
-0.454373	counts. This is the	-0.124939
-0.454373	other. This is the	-0.124939
-0.454373	compiled. This is the	-0.124939
-0.454373	declaration. This is the	-0.124939
-0.454373	de-allocated. This is the	-0.124939
-0.454373	happens. This is the	-0.124939
-0.677959	because this is the	-0.124939
-0.475067	If this is the	-0.124939
-0.475067	system this is the	-0.124939
-0.546003	arrays. It is the	-0.124939
-0.797392	core. It is the	-0.124939
-0.546003	course. It is the	-0.124939
-0.546003	diagnose. It is the	-0.124939
-0.546003	response. It is the	-0.124939
-0.546003	leaks. It is the	-0.124939
-0.507893	code, which is the	-0.124939
-0.507893	latency which is the	-0.124939
-0.507893	branch, which is the	-0.124939
-1.254475	instruction set is the	-0.124939
-0.797654	function pointer is the	-0.124939
-0.531856	simple array is the	-0.124939
-0.339206	1.fffff, where is the	-0.124939
-0.977304	a variable is the	-0.124939
-0.716021	memory access is the	-0.124939
-0.477654	executable. SSE2 is the	-0.124939
-0.747536	The stack is the	-0.124939
-0.339206	function calls is the	-0.124939
-0.535548	The result is the	-0.124939
-1.017051	a matrix is the	-0.124939
-0.558269	powerful solution is the	-0.124939
-0.339206	i<n; i++) is the	-0.124939
-0.511555	sorted list is the	-0.124939
-0.438690	A reference is the	-0.124939
-0.498530	where n is the	-0.124939
-0.477654	where r is the	-0.124939
-0.535548	used here is the	-0.124939
-0.339206	handle strings is the	-0.124939
-0.477654	of containers is the	-0.124939
-0.486627	light-weight alternative is the	-0.124939
-0.339206	template metaprogramming is the	-0.124939
-0.197858	clock cycle is the	-0.425969
-0.477654	Fine-grained parallelism is the	-0.124939
-0.339206	name ?Func@@YAXQAHAAH@Z is the	-0.124939
-0.339206	kind: "what is the	-0.124939
-0.339206	jl $B1$2 is the	-0.124939
-0.339206	worth considering is the	-0.124939
-0.339206	serious burden is the	-0.124939
-0.339206	sign, eee is the	-0.124939
-0.339206	and fffff is the	-0.124939
-0.339206	add eax,1 is the	-0.124939
-0.311853	should be of the	-0.124939
-0.258925	detection function of the	-0.124939
-0.258925	increasing function of the	-0.124939
-0.258925	staircase function of the	-0.124939
-0.235492	the code of the	-0.124939
-0.641312	the use of the	-0.124939
-0.339550	make use of the	-0.124939
-0.339550	better use of the	-0.124939
-0.235492	or more of the	-0.124939
-0.260534	time because of the	-0.124939
-0.260534	variables because of the	-0.124939
-0.260534	systems because of the	-0.124939
-0.260534	avoided because of the	-0.124939
-0.260534	inefficient because of the	-0.124939
-0.101976	discussed which of the	-0.124939
-0.101976	advance which of the	-0.124939
-0.311853	if all of the	-0.124939
-0.012540	is one of the	-0.124939
-0.065631	to one of the	-0.124939
-0.038761	only one of the	-0.124939
-0.018948	into one of the	-0.425969
-0.038761	choose one of the	-0.124939
-0.038761	Only one of the	-0.124939
-0.038761	signifying one of the	-0.124939
-0.311853	the class of the	-0.124939
-0.311853	after each of the	-0.124939
-0.153980	where most of the	-0.124939
-0.153980	way most of the	-0.124939
-0.153980	while most of the	-0.124939
-0.153980	But most of the	-0.124939
-0.153980	predicted most of the	-0.124939
-0.153980	runs most of the	-0.124939
-0.153980	obtain most of the	-0.124939
-0.153980	consumes most of the	-0.124939
-0.293939	the size of the	-0.176091
-0.384034	The size of the	-0.124939
-0.023685	a multiple of the	-0.329059
-0.280723	the object of the	-0.124939
-0.320606	an object of the	-0.301030
-0.099591	with many of the	-0.124939
-0.046947	has many of the	-0.124939
-0.099591	avoids many of the	-0.124939
-0.092573	same version of the	-0.124939
-0.466591	which version of the	-0.124939
-0.092573	each version of the	-0.124939
-0.092573	possible version of the	-0.124939
-0.092573	new version of the	-0.124939
-0.092573	specific version of the	-0.124939
-0.092573	optimized version of the	-0.124939
-0.092573	optimal version of the	-0.124939
-0.092573	better version of the	-0.124939
-0.017003	appropriate version of the	-0.221849
-0.092573	desired version of the	-0.124939
-0.028720	right version of the	-0.301030
-0.092573	final version of the	-0.124939
-0.092573	newer version of the	-0.124939
-0.043825	debug version of the	-0.124939
-0.103096	latest version of the	-0.124939
-0.092573	inferior version of the	-0.124939
-0.092573	command-line version of the	-0.124939
-0.185494	the value of the	-0.321233
-0.340757	The value of the	-0.124939
-0.205541	different value of the	-0.124939
-0.205541	final value of the	-0.124939
-0.090723	absolute value of the	-0.124939
-0.164675	in any of the	-0.124939
-0.164675	if any of the	-0.124939
-0.164675	use any of the	-0.124939
-0.252259	If any of the	-0.124939
-0.164675	but any of the	-0.124939
-0.059522	of some of the	-0.124939
-0.059522	to some of the	-0.124939
-0.059522	with some of the	-0.124939
-0.059522	described some of the	-0.124939
-0.059522	Even some of the	-0.124939
-0.059522	While some of the	-0.124939
-0.059522	describe some of the	-0.124939
-0.192914	the performance of the	-0.124939
-0.162692	the order of the	-0.249877
-0.193330	opposite order of the	-0.124939
-0.388848	dispatch branch of the	-0.124939
-0.185654	is member of the	-0.124939
-0.214059	a member of the	-0.301030
-0.185654	(not member of the	-0.124939
-0.152183	the address of the	-0.204120
-0.159399	return address of the	-0.124939
-0.235492	every call of the	-0.124939
-0.398864	are out of the	-0.124939
-0.398864	conversions out of the	-0.124939
-0.562012	moved out of the	-0.124939
-0.111474	the part of the	-0.124939
-0.188012	is part of the	-0.124939
-0.188012	a part of the	-0.124939
-0.111474	are part of the	-0.124939
-0.111474	as part of the	-0.124939
-0.052170	this part of the	-0.124939
-0.034075	same part of the	-0.301030
-0.111474	which part of the	-0.124939
-0.111474	but part of the	-0.124939
-0.052170	each part of the	-0.124939
-0.111474	static part of the	-0.124939
-0.111474	any part of the	-0.124939
-0.235766	critical part of the	-0.477121
-0.111474	important part of the	-0.124939
-0.111474	small part of the	-0.124939
-0.111474	optimized part of the	-0.124939
-0.111474	another part of the	-0.124939
-0.111474	particular part of the	-0.124939
-0.111474	significant part of the	-0.124939
-0.111474	time-consuming part of the	-0.124939
-0.111474	time-critical part of the	-0.124939
-0.111474	task-specific part of the	-0.124939
-0.355835	the bits of the	-0.124939
-0.106315	16 bits of the	-0.124939
-0.247397	n bits of the	-0.124939
-0.372717	the case of the	-0.124939
-0.235492	project. Some of the	-0.124939
-0.079652	more versions of the	-0.124939
-0.154066	different versions of the	-0.124939
-0.159116	multiple versions of the	-0.124939
-0.267718	two versions of the	-0.124939
-0.177269	special versions of the	-0.124939
-0.177269	CPU-specific versions of the	-0.124939
-0.054447	the result of the	-0.425969
-0.230921	The result of the	-0.124939
-0.235492	first element of the	-0.124939
-0.235492	do much of the	-0.124939
-0.098996	for overflow of the	-0.425969
-0.371413	to integers of the	-0.124939
-0.509077	processing power of the	-0.124939
-0.509077	computational power of the	-0.124939
-0.360418	the calculation of the	-0.124939
-0.235492	the parameters of the	-0.124939
-0.235492	worst problem of the	-0.124939
-0.399379	takes advantage of the	-0.124939
-0.399379	main advantage of the	-0.124939
-0.399379	took advantage of the	-0.124939
-0.235492	for support of the	-0.124939
-0.235492	only few of the	-0.124939
-0.340613	whole structure of the	-0.124939
-0.171426	the copy of the	-0.124939
-0.077309	non-inlined copy of the	-0.124939
-0.311853	the problems of the	-0.124939
-0.235492	address space of the	-0.124939
-0.137736	An implementation of the	-0.124939
-0.137736	good implementation of the	-0.124939
-0.219523	complicated implementation of the	-0.124939
-0.137736	typical implementation of the	-0.124939
-0.065259	4 Most of the	-0.124939
-0.065259	framework Most of the	-0.124939
-0.065259	resources. Most of the	-0.124939
-0.424264	are members of the	-0.124939
-0.299751	variable members of the	-0.124939
-0.299751	Non-static members of the	-0.124939
-0.584062	important disadvantage of the	-0.124939
-0.235492	the resources of the	-0.124939
-0.366414	the end of the	-0.124939
-0.235492	language runtime of the	-0.124939
-0.010726	the parts of the	-0.425969
-0.021724	when parts of the	-0.124939
-0.021724	make parts of the	-0.124939
-0.039480	different parts of the	-0.124939
-0.010726	other parts of the	-0.124939
-0.021724	used parts of the	-0.124939
-0.004259	critical parts of the	-0.346788
-0.021724	specific parts of the	-0.124939
-0.021724	certain parts of the	-0.124939
-0.021724	time-consuming parts of the	-0.124939
-0.021724	Critical parts of the	-0.124939
-0.021724	nearby parts of the	-0.124939
-0.345805	code instead of the	-0.124939
-0.345805	format instead of the	-0.124939
-0.345805	|) instead of the	-0.124939
-0.235492	unacceptable. Each of the	-0.124939
-0.311853	interprocedural optimizations of the	-0.124939
-0.311853	microprocessors. Many of the	-0.124939
-0.311853	the results of the	-0.124939
-0.235492	The operands of the	-0.124939
-0.181469	the start of the	-0.124939
-0.181469	during start of the	-0.124939
-0.590049	the overhead of the	-0.124939
-0.352118	extra overhead of the	-0.124939
-0.081324	during installation of the	-0.124939
-0.115178	an instance of the	-0.124939
-0.115178	one instance of the	-0.124939
-0.115178	each instance of the	-0.124939
-0.053782	new instance of the	-0.124939
-0.311853	the output of the	-0.124939
-0.235492	essential task of the	-0.124939
-0.467014	the efficiency of the	-0.124939
-0.648520	The efficiency of the	-0.124939
-0.294055	more discussion of the	-0.124939
-0.295788	further discussion of the	-0.124939
-0.190515	the offset of the	-0.124939
-0.809123	the sake of the	-0.124939
-0.190515	The effect of the	-0.124939
-0.235492	clock frequency of the	-0.124939
-0.101976	one iteration of the	-0.124939
-0.101976	every iteration of the	-0.124939
-0.235492	future models of the	-0.124939
-0.440460	The names of the	-0.124939
-0.340613	lazy loading of the	-0.124939
-0.440460	the reading of the	-0.124939
-0.235492	case situation of the	-0.124939
-0.561635	different implementations of the	-0.124939
-0.311853	different sizes of the	-0.124939
-0.101976	large fraction of the	-0.124939
-0.101976	small fraction of the	-0.124939
-0.531776	the length of the	-0.124939
-0.053524	the beginning of the	-0.191886
-0.311853	The declaration of the	-0.124939
-0.340613	consuming features of the	-0.124939
-0.709714	a waste of the	-0.124939
-0.362375	total waste of the	-0.124939
-0.081324	is independent of the	-0.124939
-0.235492	without help of the	-0.124939
-0.396636	detailed explanation of the	-0.124939
-0.235492	The logic of the	-0.124939
-0.189818	takes care of the	-0.124939
-0.391059	take care of the	-0.124939
-0.562009	The purpose of the	-0.124939
-0.330374	all instances of the	-0.124939
-0.227434	renamed instances of the	-0.124939
-0.440460	the body of the	-0.124939
-0.235492	the changes of the	-0.124939
-0.355740	the representation of the	-0.124939
-0.003726	the responsibility of the	-0.492916
-0.048002	the reciprocal of the	-0.425969
-0.235492	degree polynomial of the	-0.124939
-0.905460	the scope of the	-0.124939
-0.537936	the throughput of the	-0.124939
-0.311853	each step of the	-0.124939
-0.299113	cases, regardless of the	-0.124939
-0.299113	false regardless of the	-0.124939
-0.440460	just-in-time compilation of the	-0.124939
-0.370931	the behavior of the	-0.124939
-0.227434	The behavior of the	-0.124939
-0.235492	the job of the	-0.124939
-0.537936	the requirements of the	-0.124939
-0.007333	the rest of the	-0.124939
-0.440460	the latency of the	-0.124939
-0.235492	advanced facilities of the	-0.124939
-0.235492	or estimate of the	-0.124939
-0.065259	transfer ownership of the	-0.124939
-0.065259	transfers ownership of the	-0.124939
-0.065259	looses ownership of the	-0.124939
-0.190515	the drawbacks of the	-0.425969
-0.235492	no modification of the	-0.124939
-0.235492	by modifications of the	-0.124939
-0.235492	The lengths of the	-0.124939
-0.065259	is 50% of the	-0.124939
-0.065259	true 50% of the	-0.124939
-0.065259	mispredicted 50% of the	-0.124939
-0.440460	(in bytes) of the	-0.124939
-0.235492	the resolution of the	-0.124939
-0.440460	complete redesign of the	-0.124939
-0.235492	thorough analysis of the	-0.124939
-0.440460	the contents of the	-0.124939
-0.235492	The generality of the	-0.124939
-0.440460	keep track of the	-0.124939
-0.161797	get rid of the	-0.124939
-0.235492	optimal decomposition of the	-0.124939
-0.235492	www.openmp.org. Documentation of the	-0.124939
-0.065259	when none of the	-0.124939
-0.031405	but none of the	-0.425969
-0.235492	are indeed of the	-0.124939
-0.235492	But beware of the	-0.124939
-0.235492	uses 90% of the	-0.124939
-0.235492	the lowest of the	-0.124939
-0.311853	better understanding of the	-0.124939
-0.101976	than 99% of the	-0.124939
-0.101976	programs, 99% of the	-0.124939
-0.235492	further expansions of the	-0.124939
-0.101976	only 10% of the	-0.124939
-0.101976	true 10% of the	-0.124939
-0.235492	than 1/50 of the	-0.124939
-0.235492	logical architecture of the	-0.124939
-0.235492	and flexibility of the	-0.124939
-0.235492	binary decimals of the	-0.124939
-0.235492	metaprogramming. None of the	-0.124939
-0.235492	responsi- bility of the	-0.124939
-0.235492	the evaluation of the	-0.124939
-0.235492	the bias of the	-0.124939
-0.235492	detailed overview of the	-0.124939
-0.235492	good knowledge of the	-0.124939
-0.235492	(called x86) of the	-0.124939
-0.235492	do searches of the	-0.124939
-0.235492	and systematization of the	-0.124939
-0.235492	This fragmentation of the	-0.124939
-0.235492	mode. Much of the	-0.124939
-0.235492	use segmentation of the	-0.124939
-0.235492	the dimensions of the	-0.124939
-0.235492	about investigation of the	-0.124939
-0.235492	The compactness of the	-0.124939
-0.235492	first-in-last-out nature of the	-0.124939
-0.235492	and clarity of the	-0.124939
-0.506476	reduce a to the	-0.124939
-0.367748	compare it to the	-0.124939
-0.367748	redirects it to the	-0.124939
-0.505845	machine code to the	-0.124939
-0.402199	apply as to the	-0.124939
-1.443811	the compiler to the	-0.124939
-0.156115	for x to the	-0.124939
-0.156115	get x to the	-0.124939
-0.156115	Calculate x to the	-0.124939
-0.411782	static memory to the	-0.425969
-0.402199	the data to the	-0.124939
-0.402199	dispatching only to the	-0.124939
-0.311560	it point to the	-0.124939
-0.311560	will point to the	-0.124939
-0.128420	cannot point to the	-0.425969
-0.309940	at all to the	-0.124939
-0.513289	more integer to the	-0.124939
-0.440920	a pointer to the	-0.425969
-0.250193	or pointer to the	-0.124939
-0.107323	function pointer to the	-0.124939
-0.107323	Set pointer to the	-0.425969
-0.437890	first object to the	-0.124939
-0.649323	point number to the	-0.124939
-0.059254	keyword static to the	-0.425969
-0.273450	one call to the	-0.425969
-0.372456	any call to the	-0.124939
-0.372456	first call to the	-0.124939
-0.781119	of pointers to the	-0.124939
-0.427945	multiple pointers to the	-0.124939
-0.292719	have access to the	-0.124939
-0.292719	possible access to the	-0.124939
-0.292719	get access to the	-0.124939
-0.292719	gives access to the	-0.124939
-0.402199	adds 16 to the	-0.124939
-0.309940	new instructions to the	-0.124939
-0.402199	made available to the	-0.124939
-0.059254	a constant to the	-0.124939
-1.218128	is important to the	-0.124939
-0.711660	the calls to the	-0.124939
-0.309940	the execution to the	-0.124939
-0.510831	and known to the	-0.124939
-0.309940	not add to the	-0.124939
-0.309940	threads write to the	-0.124939
-0.127889	a parameter to the	-0.124939
-0.127889	implicit parameter to the	-0.124939
-0.375552	a reference to the	-0.124939
-0.579672	or reference to the	-0.124939
-0.402199	adding n to the	-0.124939
-0.280971	in addition to the	-0.124939
-0.280971	important addition to the	-0.124939
-0.309940	is transferred to the	-0.124939
-0.059254	well-defined interface to the	-0.124939
-0.319323	output goes to the	-0.124939
-0.319323	project goes to the	-0.124939
-0.402199	symbolic link to the	-0.124939
-0.566876	is made to the	-0.124939
-0.480970	it points to the	-0.124939
-0.256808	initially points to the	-0.124939
-0.402199	function go to the	-0.124939
-0.402199	little overhead to the	-0.124939
-0.038591	are relative to the	-0.124939
-0.038591	function relative to the	-0.124939
-0.018867	member relative to the	-0.425969
-0.038591	offset relative to the	-0.124939
-0.038591	addressed relative to the	-0.124939
-0.118149	threads writing to the	-0.425969
-0.309940	more clear to the	-0.124939
-0.309940	one iteration to the	-0.124939
-0.456899	artificially changed to the	-0.124939
-0.309940	automatic updates to the	-0.124939
-0.223737	calls directly to the	-0.124939
-0.223737	fed directly to the	-0.124939
-0.241809	is copied to the	-0.124939
-0.156115	been copied to the	-0.124939
-0.156115	contents copied to the	-0.124939
-0.566876	is similar to the	-0.124939
-0.456899	jumps back to the	-0.124939
-0.309940	a row to the	-0.124939
-0.197600	is added to the	-0.124939
-0.798283	also applies to the	-0.124939
-0.309940	pointer eax to the	-0.124939
-0.309940	adding throw() to the	-0.124939
-0.309940	the inputs to the	-0.124939
-0.059254	be distributed to the	-0.425969
-0.015027	is equal to the	-0.124939
-0.046739	be equal to the	-0.124939
-0.046739	therefore equal to the	-0.124939
-0.309940	or reads to the	-0.124939
-0.309940	leftmost column to the	-0.124939
-0.346911	delay due to the	-0.124939
-0.346911	differences due to the	-0.124939
-0.566876	be obvious to the	-0.124939
-0.402199	be swapped to the	-0.124939
-0.127889	certain limit to the	-0.124939
-0.127889	upper limit to the	-0.124939
-0.319323	always belong to the	-0.124939
-0.319323	lines belong to the	-0.124939
-0.309940	one place to the	-0.124939
-0.280971	efficient thanks to the	-0.124939
-0.280971	fragmented thanks to the	-0.124939
-0.437890	as alternatives to the	-0.124939
-0.309940	of modifications to the	-0.124939
-0.059254	-fpic according to the	-0.124939
-0.059254	representation according to the	-0.124939
-0.059254	behave according to the	-0.124939
-0.059254	Now, according to the	-0.124939
-0.437890	be extended to the	-0.124939
-0.309940	also inconvenient to the	-0.124939
-0.402199	is inferior to the	-0.124939
-0.127889	is annoying to the	-0.124939
-0.127889	are annoying to the	-0.124939
-0.402199	been translated to the	-0.124939
-0.437890	statement leads to the	-0.124939
-0.402199	time compared to the	-0.124939
-0.309940	parallelism refers to the	-0.124939
-0.402199	normally belongs to the	-0.124939
-0.309940	the caller to the	-0.124939
-0.127889	significant contribution to the	-0.124939
-0.127889	negligible contribution to the	-0.124939
-0.309940	error messages to the	-0.124939
-0.309940	64-bit extension to the	-0.124939
-0.309940	anyway. Updates to the	-0.124939
-0.309940	is unacceptable to the	-0.124939
-0.309940	integer According to the	-0.124939
-0.309940	is closest to the	-0.124939
-0.309940	default, conform to the	-0.124939
-0.309940	and closer to the	-0.124939
-0.309940	to adapt to the	-0.124939
-0.494557	inlined function and the	-0.124939
-0.444568	Gnu compiler and the	-0.124939
-0.069313	the CPU and the	-0.726999
-0.555073	code cache and the	-0.124939
-0.351413	level-2 cache and the	-0.124939
-0.630267	function library and the	-0.124939
-0.408347	of 2 and the	-0.124939
-0.314905	data elements and the	-0.124939
-0.860207	is called and the	-0.124939
-0.505269	the pointers and the	-0.124939
-0.408347	by 32 and the	-0.124939
-0.475891	source file and the	-0.124939
-0.505239	is 0 and the	-0.124939
-0.484088	physical processors and the	-0.124939
-0.372609	two times and the	-0.124939
-0.372609	256 times and the	-0.124939
-0.858073	for Windows and the	-0.124939
-0.314905	point calculations and the	-0.124939
-0.098627	the processor and the	-0.124939
-0.679275	assembly language and the	-0.124939
-0.314905	units, etc. and the	-0.124939
-0.463838	is allocated and the	-0.124939
-0.314905	integer parameters and the	-0.124939
-0.284449	first count and the	-0.124939
-0.452481	repeat count and the	-0.124939
-0.408347	the microprocessor and the	-0.124939
-0.478915	preceding branches and the	-0.124939
-0.314905	software development and the	-0.124939
-0.408347	class name and the	-0.124939
-0.314905	Unix applications and the	-0.124939
-0.408347	.NET framework and the	-0.124939
-0.314905	cycles later and the	-0.124939
-0.314905	and b, and the	-0.124939
-0.314905	optimization options and the	-0.124939
-0.575888	copy constructor and the	-0.124939
-0.226443	library function, and the	-0.124939
-0.226443	select function, and the	-0.124939
-0.444568	bytes smaller and the	-0.124939
-0.314905	the contentions and the	-0.124939
-0.463879	all platforms and the	-0.124939
-0.314905	the level-1 and the	-0.124939
-0.314905	fast math and the	-0.124939
-0.408347	on alignment and the	-0.124939
-0.715317	same thing and the	-0.124939
-0.715317	the program, and the	-0.124939
-0.314905	the compiler, and the	-0.124939
-0.129515	class declaration and the	-0.124939
-0.129515	"C" declaration and the	-0.124939
-0.314905	optimized away and the	-0.124939
-0.408347	inlined 15.1b and the	-0.124939
-0.408347	binary integer, and the	-0.124939
-0.314905	next vector, and the	-0.124939
-0.408347	the linker and the	-0.124939
-0.463879	as possible, and the	-0.124939
-0.314905	Basic .NET and the	-0.124939
-0.314905	is obvious and the	-0.124939
-0.314905	the latency and the	-0.124939
-0.314905	of resources, and the	-0.124939
-0.408347	for overflow, and the	-0.124939
-0.575888	in advance and the	-0.124939
-0.314905	millisecond resolution and the	-0.124939
-0.314905	four bits, and the	-0.124939
-0.314905	then B, and the	-0.124939
-0.314905	system API and the	-0.124939
-0.408347	file level, and the	-0.124939
-0.314905	the parameter, and the	-0.124939
-0.314905	the easiest and the	-0.124939
-0.408347	program flow and the	-0.124939
-0.314905	a hint and the	-0.124939
-0.314905	in itself, and the	-0.124939
-0.575888	the exponent, and the	-0.124939
-0.314905	deleted properly and the	-0.124939
-0.314905	doesn't support, and the	-0.124939
-0.314905	quite tedious and the	-0.124939
-0.314905	and statistics, and the	-0.124939
-0.314905	false (0); and the	-0.124939
-0.314905	are sufficient, and the	-0.124939
-0.314905	||, ! and the	-0.124939
-0.314905	second source, and the	-0.124939
-0.314905	to T+6, and the	-0.124939
-0.314905	is terminated and the	-0.124939
-0.314905	See www.agner.org/optimize and the	-0.124939
-0.314905	each call, and the	-0.124939
-0.314905	library libmmt.lib and the	-0.124939
-0.314905	copying process, and the	-0.124939
-0.314905	their workplace and the	-0.124939
-0.314905	See www.openmp.org and the	-0.124939
-0.314905	operator ++i and the	-0.124939
-0.314905	times lower; and the	-0.124939
-0.314905	operator (&) and the	-0.124939
-0.031228	to be in the	-0.124939
-0.064877	still be in the	-0.124939
-0.431821	manual or in the	-0.124939
-0.353511	make it in the	-0.124939
-1.213333	a function in the	-0.124939
-0.922880	The code in the	-0.124939
-0.436852	compiler-generated code in the	-0.124939
-0.239897	is not in the	-0.124939
-0.450560	to int in the	-0.124939
-0.459743	file than in the	-0.124939
-0.338460	the compiler in the	-0.124939
-0.369103	longer time in the	-0.124939
-0.350570	the data in the	-0.124939
-0.350570	to data in the	-0.124939
-0.180473	*(++p) because in the	-0.124939
-0.180473	array[++i] because in the	-0.124939
-0.311255	different functions in the	-0.124939
-0.311255	non-polymorphic functions in the	-0.124939
-0.064877	registers only in the	-0.124939
-0.064877	option only in the	-0.124939
-0.064877	comes only in the	-0.124939
-0.243725	each other in the	-0.124939
-0.473898	message loop in the	-0.124939
-0.182525	is used in the	-0.249877
-0.209798	are used in the	-0.425969
-0.231354	data used in the	-0.124939
-0.367068	the integer in the	-0.124939
-0.411389	an integer in the	-0.124939
-0.180473	is set in the	-0.124939
-0.180473	same set in the	-0.124939
-0.233802	an example in the	-0.124939
-0.338460	register size in the	-0.124939
-0.338460	data object in the	-0.124939
-0.233802	point number in the	-0.124939
-0.394595	the value in the	-0.124939
-0.277290	B value in the	-0.124939
-0.373312	the objects in the	-0.124939
-0.373312	are objects in the	-0.124939
-0.425640	the variable in the	-0.124939
-0.300784	other variable in the	-0.124939
-0.425640	global variable in the	-0.124939
-0.150280	precision variables in the	-0.124939
-0.271667	the table in the	-0.124939
-0.180473	a table in the	-0.124939
-0.309838	one way in the	-0.124939
-0.407340	any elements in the	-0.124939
-0.407340	subsequent elements in the	-0.124939
-0.907535	are stored in the	-0.124939
-0.427815	been stored in the	-0.124939
-0.427815	usually stored in the	-0.124939
-0.387243	usually called in the	-0.124939
-0.309838	function address in the	-0.124939
-0.233802	factor 4 in the	-0.124939
-0.306646	For example, in the	-0.124939
-0.338460	sign bit in the	-0.124939
-0.387243	vector registers in the	-0.124939
-0.233802	to test in the	-0.124939
-0.753944	be useful in the	-0.124939
-0.233802	handling even in the	-0.124939
-0.309838	primitive operations in the	-0.124939
-0.233802	to type in the	-0.124939
-0.233802	pending instructions in the	-0.124939
-0.385884	is available in the	-0.124939
-0.238127	also available in the	-0.124939
-0.238127	processors available in the	-0.124939
-0.238127	become available in the	-0.124939
-0.233802	minor error in the	-0.124939
-0.390903	multiple times in the	-0.124939
-0.387243	call stack in the	-0.124939
-1.078669	are accessed in the	-0.124939
-0.233802	incremented, while in the	-0.124939
-0.309838	static arrays in the	-0.124939
-0.011449	function calls in the	-0.249877
-0.497455	unused bytes in the	-0.124939
-0.226189	multiple threads in the	-0.124939
-0.328796	two threads in the	-0.124939
-0.233802	be necessary in the	-0.124939
-0.397024	new element in the	-0.124939
-0.397024	every element in the	-0.124939
-0.233802	definition language in the	-0.124939
-0.309838	function. But in the	-0.124939
-0.233802	kept small in the	-0.124939
-0.233802	cause overflow in the	-0.124939
-0.101354	handling option in the	-0.124939
-0.101354	unroll option in the	-0.124939
-0.031228	container classes in the	-0.124939
-0.064877	parent classes in the	-0.124939
-0.047727	This works in the	-0.124939
-0.119550	as explained in the	-0.249877
-0.164059	further explained in the	-0.124939
-0.694260	is implemented in the	-0.124939
-0.657700	be implemented in the	-0.124939
-0.497455	an advantage in the	-0.124939
-0.233802	profiling support in the	-0.124939
-0.534550	is supported in the	-0.124939
-0.510447	that run in the	-0.124939
-0.233802	in hardware in the	-0.124939
-0.271667	the values in the	-0.124939
-0.180473	G values in the	-0.124939
-0.338460	the information in the	-0.124939
-0.233802	clock cycles in the	-0.124939
-0.170550	and addresses in the	-0.124939
-0.170550	All addresses in the	-0.124939
-0.259458	relative addresses in the	-0.124939
-0.309838	stamp counter in the	-0.124939
-0.047727	a space in the	-0.124939
-0.047727	more space in the	-0.124939
-0.047727	much space in the	-0.124939
-0.047727	little space in the	-0.124939
-1.027591	CPU dispatching in the	-0.124939
-0.233802	files, preferably in the	-0.124939
-0.233802	you see in the	-0.124939
-0.309838	error handling in the	-0.124939
-0.233802	used members in the	-0.124939
-0.233802	the methods in the	-0.124939
-0.233802	the name in the	-0.124939
-0.233802	remains zero in the	-0.124939
-0.225112	is running in the	-0.124939
-0.225112	threads running in the	-0.124939
-0.225112	thread running in the	-0.124939
-0.309838	CPU dispatcher in the	-0.124939
-0.338460	the examples in the	-0.124939
-0.080929	dispatch mechanism in the	-0.425969
-0.101354	newer microprocessors in the	-0.124939
-0.101354	Modern microprocessors in the	-0.124939
-0.233802	use later in the	-0.124939
-0.393114	linked together in the	-0.124939
-0.253770	be declared in the	-0.124939
-0.364033	objects declared in the	-0.124939
-0.233802	function goes in the	-0.124939
-0.233802	power-save options in the	-0.124939
-0.233802	Func2 were in the	-0.124939
-0.233802	unused points in the	-0.124939
-0.393114	randomly around in the	-0.124939
-0.054534	cause contentions in the	-0.301030
-0.353511	and references in the	-0.124939
-0.233802	extra overhead in the	-0.124939
-0.233802	a change in the	-0.124939
-0.233802	point-to-integer conversions in the	-0.124939
-0.309838	one statement in the	-0.124939
-0.486055	as described in the	-0.124939
-0.220359	cases described in the	-0.124939
-0.220359	syntax described in the	-0.124939
-0.220359	further described in the	-0.124939
-0.362799	4 lines in the	-0.124939
-0.233802	graphics operation in the	-0.124939
-0.362799	as given in the	-0.124939
-0.233802	of S1 in the	-0.124939
-0.309838	registration database in the	-0.124939
-0.233802	identical constants in the	-0.124939
-0.555810	text strings in the	-0.124939
-0.233802	a macro in the	-0.124939
-0.233802	with 100 in the	-0.124939
-0.309838	The containers in the	-0.124939
-0.233802	different priority in the	-0.124939
-0.233802	function names in the	-0.124939
-0.309838	the rows in the	-0.124939
-0.233802	may fail in the	-0.124939
-0.233802	data structures in the	-0.124939
-0.170550	can occur in the	-0.124939
-0.076956	contentions occur in the	-0.425969
-0.233802	be improved in the	-0.124939
-0.437753	are discussed in the	-0.124939
-0.047727	forwarding delay in the	-0.425969
-0.101354	saved either in the	-0.124939
-0.101354	blocks, either in the	-0.124939
-0.180473	register except in the	-0.124939
-0.180473	representation, except in the	-0.124939
-0.233802	places back in the	-0.124939
-0.233802	can happen in the	-0.124939
-0.309838	go away in the	-0.124939
-0.476629	are provided in the	-0.124939
-0.233802	dependency chains in the	-0.124939
-0.338460	time-consumers mentioned in the	-0.124939
-0.259458	is included in the	-0.124939
-0.170550	are included in the	-0.124939
-0.170550	not included in the	-0.124939
-0.233802	into account in the	-0.124939
-0.309838	and algorithms in the	-0.124939
-0.309838	float additions in the	-0.124939
-0.233802	unknown factors in the	-0.124939
-0.629826	are listed in the	-0.124939
-0.319342	as listed in the	-0.124939
-0.233802	is interpreted in the	-0.124939
-0.047727	causes misses in the	-0.124939
-0.233802	named YMM in the	-0.124939
-0.233802	for free in the	-0.124939
-0.338460	was saved in the	-0.124939
-0.233802	of changes in the	-0.124939
-0.233802	execution units in the	-0.124939
-0.233802	the reciprocal in the	-0.124939
-0.101354	is spent in the	-0.124939
-0.101354	time spent in the	-0.124939
-0.309838	exception occurs in the	-0.124939
-0.233802	next step in the	-0.124939
-0.047727	hot spots in the	-0.124939
-0.233802	specific places in the	-0.124939
-0.233802	are evaluated in the	-0.124939
-0.101354	is portable in the	-0.124939
-0.101354	fully portable in the	-0.124939
-0.233802	to recover in the	-0.124939
-0.233802	the advice in the	-0.124939
-0.233802	is already in the	-0.124939
-0.180473	be seen in the	-0.124939
-0.180473	not seen in the	-0.124939
-0.233802	or key in the	-0.124939
-0.031228	they appear in the	-0.124939
-0.064877	modules appear in the	-0.124939
-0.233802	code flag in the	-0.124939
-0.233802	not present in the	-0.124939
-0.233802	one place in the	-0.124939
-0.233802	is serial in the	-0.124939
-0.233802	require modifications in the	-0.124939
-0.437753	are missing in the	-0.124939
-0.233802	different lengths in the	-0.124939
-0.047727	2. Contentions in the	-0.124939
-0.047727	buffer. Contentions in the	-0.124939
-0.047727	(BTB). Contentions in the	-0.124939
-0.047727	experiments. Contentions in the	-0.124939
-0.101354	a breakpoint in the	-0.124939
-0.101354	fixed breakpoint in the	-0.124939
-0.233802	that appears in the	-0.124939
-0.233802	exception handler in the	-0.124939
-0.233802	index changing in the	-0.124939
-0.309838	is kept in the	-0.124939
-0.233802	the techniques in the	-0.124939
-0.233802	an FPGA in the	-0.124939
-0.309838	stored consecutively in the	-0.124939
-0.233802	of abstraction in the	-0.124939
-0.233802	need updating in the	-0.124939
-0.233802	profiling instruments in the	-0.124939
-0.233802	unnecessarily wasteful in the	-0.124939
-0.101354	difference lies in the	-0.124939
-0.101354	efficiency lies in the	-0.124939
-0.233802	errors elsewhere in the	-0.124939
-0.309838	have supplied in the	-0.124939
-0.233802	cause delays in the	-0.124939
-0.233802	uses logarithms in the	-0.124939
-0.233802	system kernel in the	-0.124939
-0.233802	table (PLT) in the	-0.124939
-0.437753	as shown in the	-0.124939
-0.233802	be disabled in the	-0.124939
-0.101354	the relocations in the	-0.124939
-0.101354	generate relocations in the	-0.124939
-0.233802	it locally in the	-0.124939
-0.233802	the answers in the	-0.124939
-0.233802	is inserted in the	-0.124939
-0.233802	not visible in the	-0.124939
-0.233802	scattered everywhere in the	-0.124939
-0.233802	as pragmas in the	-0.124939
-0.233802	runs alone in the	-0.124939
-0.233802	biggest time-consumer in the	-0.124939
-0.047727	8 Optimizations in the	-0.425969
-0.233802	for parallelization in the	-0.124939
-0.233802	are overdetermined in the	-0.124939
-0.233802	or integrated in the	-0.124939
-0.233802	source annotation in the	-0.124939
-0.233802	assigned previously in the	-0.124939
-0.233802	necessarily stay in the	-0.124939
-0.233802	will dominate in the	-0.124939
-0.233802	The dot in the	-0.124939
-0.233802	been alleviated in the	-0.124939
-0.233802	right positions in the	-0.124939
-0.233802	libraries. Numbers in the	-0.124939
-0.233802	will grow in the	-0.124939
-0.233802	other flaws in the	-0.124939
-0.233802	is mirrored in the	-0.124939
-0.233802	classes Programming in the	-0.124939
-0.233802	chain. Nothing in the	-0.124939
-0.233802	stored contiguously in the	-0.124939
-0.233802	inserted UnusedFiller in the	-0.124939
-0.233802	(release version) in the	-0.124939
-0.233802	and foremost, in the	-0.124939
-0.233802	or glitches in the	-0.124939
-0.233802	branch predictions in the	-0.124939
-0.233802	occurred anywhere in the	-0.124939
-0.233802	be resized in the	-0.124939
-0.405666	all is for the	-0.124939
-0.405666	this or for the	-0.124939
-0.358944	the function for the	-0.124939
-0.358944	inlined function for the	-0.124939
-0.289493	64-bit code for the	-0.124939
-0.289493	intermediate code for the	-0.124939
-0.289493	identical code for the	-0.124939
-0.289493	build code for the	-0.124939
-0.263835	save time for the	-0.124939
-0.263835	saves time for the	-0.124939
-0.423288	prefetching data for the	-0.124939
-0.285715	a program for the	-0.124939
-0.462345	intrinsic functions for the	-0.124939
-1.017536	is used for the	-0.124939
-0.344014	times, one for the	-0.124939
-0.344014	branches: one for the	-0.124939
-0.119773	of cache for the	-0.124939
-0.119773	extra cache for the	-0.124939
-0.055771	instruction set for the	-0.124939
-0.285715	vector size for the	-0.124939
-0.372433	temporary object for the	-0.124939
-0.372433	linear array for the	-0.124939
-0.147460	it possible for the	-0.124939
-0.147460	therefore possible for the	-0.124939
-0.147460	rarely possible for the	-0.124939
-0.434215	bit version for the	-0.124939
-0.709898	induction variables for the	-0.124939
-0.434215	degrades performance for the	-0.124939
-0.285715	is called for the	-0.124939
-0.423288	vector register for the	-0.124939
-0.372433	using registers for the	-0.124939
-0.405666	the need for the	-0.124939
-0.285715	to test for the	-0.124939
-0.922029	is useful for the	-0.124939
-0.644247	header file for the	-0.124939
-0.447049	reorder instructions for the	-0.124939
-0.813461	are available for the	-0.124939
-0.372433	is important for the	-0.124939
-0.285715	too large for the	-0.124939
-0.358228	is compiled for the	-0.124939
-0.218269	and compiled for the	-0.124939
-0.218269	each compiled for the	-0.124939
-0.318789	programs compiled for the	-0.124939
-0.285715	too big for the	-0.124939
-0.456930	file" option for the	-0.124939
-0.112183	is good for the	-0.124939
-0.640384	highly optimized for the	-0.124939
-0.778990	to check for the	-0.124939
-0.451187	then check for the	-0.124939
-0.285715	efficient solution for the	-0.124939
-0.413504	make support for the	-0.124939
-0.659056	hardware support for the	-0.124939
-0.405666	with 1 for the	-0.124939
-0.372433	some information for the	-0.124939
-0.372433	source files for the	-0.124939
-0.285715	mentioned above for the	-0.124939
-0.285715	same space for the	-0.124939
-0.285715	code caching for the	-0.124939
-0.524043	exception handling for the	-0.124939
-0.405666	same name for the	-0.124939
-0.285715	a disadvantage for the	-0.124939
-0.372433	no difference for the	-0.124939
-0.434215	not needed for the	-0.124939
-0.017806	is difficult for the	-0.301030
-0.055771	more difficult for the	-0.124939
-0.372433	available options for the	-0.124939
-0.285715	most appropriate for the	-0.124939
-0.285715	a constructor for the	-0.124939
-0.285715	is relevant for the	-0.124939
-0.285715	the destructor for the	-0.124939
-0.285715	hide them for the	-0.124939
-0.814278	when compiling for the	-0.124939
-0.036374	it easier for the	-0.425969
-0.076075	reordering easier for the	-0.124939
-0.285715	exactly identical for the	-0.124939
-0.423288	stack, except for the	-0.124939
-0.285715	critical stride for the	-0.124939
-0.524043	big enough for the	-0.124939
-0.119773	is chosen for the	-0.124939
-0.119773	has chosen for the	-0.124939
-0.285715	is included for the	-0.124939
-0.285715	limiting factors for the	-0.124939
-0.285715	perform poorly for the	-0.124939
-0.272839	to wait for the	-0.124939
-0.285715	rare. Testing for the	-0.124939
-0.210327	expressions (except for the	-0.124939
-0.210327	iteration (except for the	-0.124939
-0.285715	page 51 for the	-0.124939
-0.524043	compiler documentation for the	-0.124939
-0.285715	square blocking for the	-0.124939
-0.285715	are competing for the	-0.124939
-0.055771	not unusual for the	-0.124939
-0.372433	best suited for the	-0.124939
-0.285715	a proxy for the	-0.124939
-0.285715	is consistent for the	-0.124939
-0.285715	are unnecessary for the	-0.124939
-0.285715	sufficiently accurate for the	-0.124939
-0.119773	will contend for the	-0.124939
-0.119773	libraries contend for the	-0.124939
-0.285715	the subroutine for the	-0.124939
-0.285715	specific preferences for the	-0.124939
-0.285715	division. Correction for the	-0.124939
-0.285715	the FAQ for the	-0.124939
-0.285715	is maintained for the	-0.124939
-0.285715	page 122) for the	-0.124939
-0.285715	registers. Except for the	-0.124939
-0.285715	to compensate for the	-0.124939
-0.285715	always compete for the	-0.124939
-0.285715	the standards for the	-0.124939
-0.285715	optimize specifically for the	-0.124939
-0.285715	// Prototype for the	-0.124939
-0.285715	on correction for the	-0.124939
-0.434204	code is that the	-0.124939
-0.182253	compiler is that the	-0.124939
-0.273864	this is that the	-0.124939
-0.182253	point is that the	-0.124939
-0.273864	set is that the	-0.124939
-0.182253	method is that the	-0.124939
-0.182253	language is that the	-0.124939
-0.182253	Linux is that the	-0.124939
-0.089958	disadvantage is that the	-0.124939
-0.182253	parameter is that the	-0.124939
-0.209042	reason is that the	-0.124939
-0.182253	linking is that the	-0.124939
-0.273864	here is that the	-0.124939
-0.273864	contentions is that the	-0.124939
-0.182253	inlining is that the	-0.124939
-0.182253	binding is that the	-0.124939
-0.273864	notice is that the	-0.124939
-0.182253	macros is that the	-0.124939
-0.182253	consequence is that the	-0.124939
-0.182253	assumption is that the	-0.124939
-0.273864	conclusion is that the	-0.124939
-0.182253	argument is that the	-0.124939
-0.409632	1 and that the	-0.124939
-0.947273	the code that the	-0.124939
-0.381886	the compiler that the	-0.124939
-0.351030	an instruction that the	-0.124939
-0.399299	the class that the	-0.124939
-0.406464	time so that the	-0.124939
-0.286320	bit so that the	-0.124939
-0.286320	pointers so that the	-0.124939
-0.286320	calculations so that the	-0.124939
-0.286320	mode so that the	-0.124939
-0.286320	start so that the	-0.124939
-0.286320	inlined so that the	-0.124939
-0.286320	unrolling so that the	-0.124939
-0.119980	organized so that the	-0.124939
-0.286320	integer, so that the	-0.124939
-0.286320	possible, so that the	-0.124939
-0.286320	compact so that the	-0.124939
-0.286320	0x2C so that the	-0.124939
-0.268109	so long that the	-0.124939
-0.341989	be sure that the	-0.124939
-0.341989	are sure that the	-0.124939
-0.297039	make sure that the	-0.124939
-0.185182	makes sure that the	-0.124939
-0.113686	the case that the	-0.124939
-0.113686	likely case that the	-0.124939
-0.173745	is important that the	-0.124939
-0.351030	other work that the	-0.124939
-0.494002	to avoid that the	-0.124939
-0.351030	runtime check that the	-0.124939
-0.351030	the problem that the	-0.124939
-0.494002	the advantage that the	-0.124939
-0.538539	is likely that the	-0.124939
-0.268109	can calculate that the	-0.124939
-0.268109	to copy that the	-0.124939
-0.432974	quite certain that the	-0.124939
-0.494002	so fast that the	-0.124939
-0.399299	the problems that the	-0.124939
-0.088731	can see that the	-0.124939
-0.494002	memory block that the	-0.124939
-0.268109	arbitrary name that the	-0.124939
-0.022816	the disadvantage that the	-0.301030
-0.921073	This means that the	-0.124939
-0.434268	function, means that the	-0.124939
-0.427964	it requires that the	-0.124939
-0.298070	and assume that the	-0.124939
-0.597628	can assume that the	-0.124939
-0.123950	generally assume that the	-0.124939
-0.298070	safely assume that the	-0.124939
-0.382606	special feature that the	-0.124939
-0.150465	operations require that the	-0.124939
-0.150465	instructions require that the	-0.124939
-0.150465	profilers require that the	-0.124939
-0.150465	MOVNTDQ require that the	-0.124939
-0.419740	of things that the	-0.124939
-0.268109	the reductions that the	-0.124939
-0.024769	The fact that the	-0.124939
-0.457641	} Assume that the	-0.124939
-0.204917	the possibility that the	-0.124939
-0.268109	this discussion that the	-0.124939
-0.164627	version. Note that the	-0.124939
-0.164627	system. Note that the	-0.124939
-0.164627	arrays. Note that the	-0.124939
-0.164627	details. Note that the	-0.124939
-0.164627	optimized. Note that the	-0.124939
-0.164627	disassembler. Note that the	-0.124939
-0.268109	occasionally predict that the	-0.124939
-0.268109	clock frequency that the	-0.124939
-0.268109	must consider that the	-0.124939
-0.268109	time delay that the	-0.124939
-0.351030	higher risk that the	-0.124939
-0.113686	function, provided that the	-0.124939
-0.113686	branches, provided that the	-0.124939
-0.446387	the sense that the	-0.124939
-0.494002	will notice that the	-0.124939
-0.268109	be expected that the	-0.124939
-0.268109	can detect that the	-0.124939
-0.268109	roughly estimate that the	-0.124939
-0.268109	be said that the	-0.124939
-0.494002	the assumption that the	-0.124939
-0.173745	will recognize that the	-0.124939
-0.351030	are assuming that the	-0.124939
-0.268109	90% chance that the	-0.124939
-0.494002	I believe that the	-0.124939
-0.351030	C; Assuming that the	-0.124939
-0.268109	with certainty that the	-0.124939
-0.351030	standard says that the	-0.124939
-0.268109	programmer forgets that the	-0.124939
-0.268109	seem illogical that the	-0.124939
-0.268109	is assumed that the	-0.124939
-0.268109	be emphasized that the	-0.124939
-0.268109	the complication that the	-0.124939
-0.268109	is unlikely that the	-0.124939
-0.268109	from knowing that the	-0.124939
-0.856820	this to be the	-0.124939
-1.477770	likely to be the	-0.124939
-0.884893	function may be the	-0.124939
-0.650088	of course be the	-0.124939
-0.818390	and b are the	-0.124939
-0.353023	this problem are the	-0.124939
-0.353023	allowed inputs are the	-0.124939
-0.353023	of algebra are the	-0.124939
-0.353023	storage principles are the	-0.124939
-0.345383	a vector or the	-0.124939
-0.695817	the loop or the	-0.124939
-0.760285	an array or the	-0.124939
-0.695817	one way or the	-0.124939
-0.492407	vector aligned or the	-0.124939
-0.345383	the positive or the	-0.124939
-0.446472	is overloaded or the	-0.124939
-0.345383	if possible, or the	-0.124939
-0.446472	or reference, or the	-0.124939
-0.345383	and searching, or the	-0.124939
-0.357344	to access it the	-0.124939
-0.443973	is that if the	-0.124939
-0.249724	predicted or if the	-0.124939
-0.249724	effects or if the	-0.124939
-0.249724	blocks, or if the	-0.124939
-0.249724	correct or if the	-0.124939
-0.249724	unstable or if the	-0.124939
-0.249724	initialization, or if the	-0.124939
-0.390783	a function if the	-0.124939
-0.314463	But not if the	-0.124939
-0.162560	variables than if the	-0.124939
-0.162560	form than if the	-0.124939
-0.203844	the compiler if the	-0.124939
-0.300686	the memory if the	-0.124939
-0.274377	member functions if the	-0.124939
-0.249673	and only if the	-0.124939
-0.249673	possible only if the	-0.124939
-0.358759	works only if the	-0.124939
-0.203844	blend instruction if the	-0.124939
-0.203844	floating point if the	-0.124939
-0.437790	the loop if the	-0.124939
-0.274417	no loop if the	-0.124939
-0.698419	be used if the	-0.124939
-0.203844	into one if the	-0.124939
-0.425510	an integer if the	-0.124939
-0.203844	than double if the	-0.124939
-0.154754	and efficient if the	-0.124939
-0.240152	most efficient if the	-0.124939
-0.154754	less efficient if the	-0.124939
-0.162560	not possible if the	-0.124939
-0.249673	only possible if the	-0.124939
-0.101403	of 2 if the	-0.249877
-0.090071	the performance if the	-0.124939
-0.090071	in performance if the	-0.124939
-0.274377	single branch if the	-0.124939
-0.203844	efficient way if the	-0.124939
-0.096884	is faster if the	-0.249877
-0.204037	goes faster if the	-0.124939
-0.534851	For example, if the	-0.124939
-0.203844	64-bit systems if the	-0.124939
-0.090071	is useful if the	-0.124939
-0.090071	be useful if the	-0.124939
-0.159040	cycles even if the	-0.124939
-0.159040	Intel, even if the	-0.124939
-0.159040	called, even if the	-0.124939
-0.159040	up, even if the	-0.124939
-0.159040	resources, even if the	-0.124939
-0.159040	execution, even if the	-0.124939
-0.203844	operating system if the	-0.124939
-0.274377	are available if the	-0.124939
-0.203844	filled up if the	-0.124939
-0.203844	an error if the	-0.124939
-0.274377	106 CPUs if the	-0.124939
-0.203844	works best if the	-0.124939
-0.274377	is necessary if the	-0.124939
-0.203844	array element if the	-0.124939
-0.182756	programmed. But if the	-0.124939
-0.182756	delay. But if the	-0.124939
-0.182756	miss. But if the	-0.124939
-0.203844	reduce speed if the	-0.124939
-0.476605	separate thread if the	-0.124939
-0.203844	64-bit integers if the	-0.124939
-0.437790	to check if the	-0.124939
-0.274417	input check if the	-0.124939
-0.042714	is advantageous if the	-0.124939
-0.162627	be advantageous if the	-0.124939
-0.090092	more advantageous if the	-0.124939
-0.090071	no problem if the	-0.124939
-0.090071	big problem if the	-0.124939
-0.203844	an advantage if the	-0.124939
-0.274377	bit mode if the	-0.124939
-0.274377	predicted well if the	-0.124939
-0.274377	very fast if the	-0.124939
-0.090071	without problems if the	-0.124939
-0.090071	cause problems if the	-0.124939
-0.526846	to see if the	-0.124939
-0.274377	software implementation if the	-0.124939
-0.203844	more complicated if the	-0.124939
-0.203844	above methods if the	-0.124939
-0.390783	a disadvantage if the	-0.124939
-0.203844	const reference if the	-0.124939
-0.203844	table lookup if the	-0.124939
-0.042704	not needed if the	-0.425969
-0.203844	using vectors if the	-0.124939
-0.203844	inconsistent results if the	-0.124939
-0.203844	the operands if the	-0.124939
-0.203844	runtime here if the	-0.124939
-0.042704	cache contentions if the	-0.425969
-0.203844	is predicted if the	-0.124939
-0.274377	cause errors if the	-0.124939
-0.274377	very inefficient if the	-0.124939
-0.203844	Linux platforms if the	-0.124939
-0.390783	be vectorized if the	-0.124939
-0.203844	usually inlined if the	-0.124939
-0.203844	loop further if the	-0.124939
-0.203844	hard disk if the	-0.124939
-0.203844	is obtained if the	-0.124939
-0.314463	less efficiently if the	-0.124939
-0.203844	CPU models if the	-0.124939
-0.203913	to fail if the	-0.124939
-0.338537	will fail if the	-0.124939
-0.203844	the target if the	-0.124939
-0.115871	program, especially if the	-0.124939
-0.115871	systems, especially if the	-0.124939
-0.115871	file, especially if the	-0.124939
-0.203844	the updates if the	-0.124939
-0.443973	may consider if the	-0.124939
-0.203844	function directly if the	-0.124939
-0.274377	the loops if the	-0.124939
-0.090071	can happen if the	-0.124939
-0.090071	will happen if the	-0.124939
-0.203844	doesn't matter if the	-0.124939
-0.203844	is pure if the	-0.124939
-0.203844	clock cycle if the	-0.124939
-0.203844	more frequent if the	-0.124939
-0.203844	needed, however, if the	-0.124939
-0.203844	operand. Likewise, if the	-0.124939
-0.203844	more compact if the	-0.124939
-0.203844	of course, if the	-0.124939
-0.203844	more complex if the	-0.124939
-0.203844	dispatching. Test if the	-0.124939
-0.203844	typically happens if the	-0.124939
-0.203844	be permissible if the	-0.124939
-0.203844	so (i.e. if the	-0.124939
-0.018463	be eliminated if the	-0.301030
-0.203844	is selected if the	-0.124939
-0.203844	severe delays if the	-0.124939
-0.203844	library (STL) if the	-0.124939
-0.203844	to determine if the	-0.124939
-0.203844	branch mispredictions if the	-0.124939
-0.203844	call WriteFile if the	-0.124939
-0.203844	bits (YMM) if the	-0.124939
-0.203844	the sign-bit if the	-0.124939
-0.203844	be ignored if the	-0.124939
-0.203844	bits (XMM) if the	-0.124939
-0.203844	a minute if the	-0.124939
-0.203844	is minimized if the	-0.124939
-0.404233	period and by the	-0.124939
-0.271751	system, not by the	-0.124939
-0.259252	rather than by the	-0.124939
-0.355443	once more by the	-0.124939
-0.941456	the loop by the	-0.124939
-0.614029	memory used by the	-0.124939
-0.271751	information stored by the	-0.124939
-0.202446	is called by the	-0.124939
-0.298938	functions called by the	-0.124939
-0.355443	memory address by the	-0.124939
-0.271751	function call by the	-0.124939
-0.387353	ruled out by the	-0.124939
-0.355443	and arrays by the	-0.124939
-0.460214	is done by the	-0.124939
-0.289686	and done by the	-0.124939
-0.289686	necessarily done by the	-0.124939
-0.868616	be calculated by the	-0.124939
-0.355443	typically implemented by the	-0.124939
-0.185558	is supported by the	-0.124939
-0.146783	and supported by the	-0.124939
-0.139379	are supported by the	-0.124939
-0.067220	if supported by the	-0.124939
-0.271751	inlined automatically by the	-0.124939
-0.271751	actually needed by the	-0.124939
-0.708352	be aligned by the	-0.124939
-0.135403	is divisible by the	-0.124939
-0.295888	be divisible by the	-0.124939
-0.088588	not divisible by the	-0.124939
-0.450061	address divisible by the	-0.124939
-0.298452	addresses divisible by the	-0.124939
-0.671838	is replaced by the	-0.124939
-0.959602	be replaced by the	-0.124939
-0.271751	be predicted by the	-0.124939
-0.298938	is limited by the	-0.124939
-0.202446	be limited by the	-0.124939
-0.404233	be obtained by the	-0.124939
-0.271751	to square by the	-0.124939
-0.271751	is converted by the	-0.124939
-0.271751	two additions by the	-0.124939
-0.084621	is determined by the	-0.124939
-0.283227	be determined by the	-0.124939
-0.026014	code generated by the	-0.124939
-0.053686	files generated by the	-0.124939
-0.053686	comments generated by the	-0.124939
-0.271751	be illustrated by the	-0.124939
-0.142330	is multiplied by the	-0.124939
-0.065360	be multiplied by the	-0.124939
-0.035044	be modified by the	-0.124939
-0.073165	never modified by the	-0.124939
-0.271751	done manually by the	-0.124939
-0.271751	of ArraySize by the	-0.124939
-0.271751	are relocated by the	-0.124939
-0.271751	a bitfield by the	-0.124939
-0.271751	and investigated by the	-0.124939
-0.271751	is caught by the	-0.124939
-0.271751	place indicated by the	-0.124939
-0.271751	obviously influenced by the	-0.124939
-0.271751	when activated by the	-0.124939
-0.387389	delete or with the	-0.124939
-0.404271	replace it with the	-0.124939
-0.273238	another function with the	-0.124939
-0.273238	pure function with the	-0.124939
-0.404271	each compiler with the	-0.124939
-0.614090	a CPU with the	-0.124939
-0.126127	this library with the	-0.425969
-0.271779	member variable with the	-0.124939
-0.053691	multiple bits with the	-0.124939
-0.414724	first processors with the	-0.124939
-0.073171	end up with the	-0.124939
-0.035046	keep up with the	-0.124939
-0.545404	be accessed with the	-0.124939
-0.460533	normally compiled with the	-0.124939
-0.312147	of threads with the	-0.124939
-0.312147	more threads with the	-0.124939
-0.298958	to compile with the	-0.124939
-0.202462	you compile with the	-0.124939
-0.271779	integer overflow with the	-0.124939
-0.387389	8-bit integers with the	-0.124939
-0.423208	is done with the	-0.124939
-0.396385	be done with the	-0.124939
-0.355476	always calculated with the	-0.124939
-0.414724	serious problem with the	-0.124939
-0.271779	different types with the	-0.124939
-0.355476	variables declared with the	-0.124939
-0.355476	and link with the	-0.124939
-0.271779	eight) points with the	-0.124939
-0.355476	funny things with the	-0.124939
-0.639649	not compatible with the	-0.124939
-0.431942	that comes with the	-0.124939
-0.355476	be vectorized with the	-0.124939
-0.283244	is obtained with the	-0.124939
-0.084626	be obtained with the	-0.124939
-0.271779	of N with the	-0.124939
-0.271779	representation directly with the	-0.124939
-0.175054	that come with the	-0.124939
-0.271779	can happen with the	-0.124939
-0.271779	libraries included with the	-0.124939
-0.271779	vector c2 with the	-0.124939
-0.271779	a DLL with the	-0.124939
-0.142340	by multiplying with the	-0.124939
-0.065364	before multiplying with the	-0.124939
-0.271779	vector bc with the	-0.124939
-0.271779	them separately with the	-0.124939
-0.816992	is AND'ed with the	-0.124939
-0.271779	compiler combined with the	-0.124939
-0.271779	PLT entry with the	-0.124939
-0.271779	some tests with the	-0.124939
-0.387389	not satisfied with the	-0.124939
-0.202462	Gnu Comes with the	-0.124939
-0.202462	Embarcadero Comes with the	-0.124939
-0.271779	duration compared with the	-0.124939
-0.355476	control Microprocessors with the	-0.124939
-0.271779	often conflicting with the	-0.124939
-0.271779	multiple configurations with the	-0.124939
-0.271779	been unsatisfied with the	-0.124939
-0.271779	12.4b, rewritten with the	-0.124939
-0.271779	array coincides with the	-0.124939
-0.271779	dividing repeatedly with the	-0.124939
-0.271779	spent fighting with the	-0.124939
-0.012934	rather than on the	-0.346788
-0.068902	interface than on the	-0.124939
-0.236368	Gnu compiler on the	-0.124939
-0.396078	CPU time on the	-0.124939
-0.236368	allocates memory on the	-0.124939
-0.236368	speed-critical program on the	-0.124939
-0.563259	depends only on the	-0.124939
-0.312897	inferior version on the	-0.124939
-0.236368	other objects on the	-0.124939
-0.031496	is stored on the	-0.124939
-0.031496	be stored on the	-0.124939
-0.010247	are stored on the	-0.301030
-0.031496	Variables stored on the	-0.124939
-0.341728	virtual processors on the	-0.124939
-0.129038	directives work on the	-0.124939
-0.171879	the calculations on the	-0.124939
-0.171879	do calculations on the	-0.124939
-0.171879	start calculations on the	-0.124939
-0.614800	works best on the	-0.124939
-0.236368	very much on the	-0.124939
-0.312897	possible overflow on the	-0.124939
-0.236368	be done on the	-0.124939
-0.312897	the parameters on the	-0.124939
-0.236368	is optimal on the	-0.124939
-0.236368	of space on the	-0.124939
-0.447944	when running on the	-0.124939
-0.281362	processes running on the	-0.124939
-0.048144	are transferred on the	-0.425969
-0.312897	all optimizations on the	-0.124939
-0.236368	rendering graphics on the	-0.124939
-0.236368	keep together on the	-0.124939
-0.236368	by storage on the	-0.124939
-0.430085	is based on the	-0.124939
-0.556405	are based on the	-0.124939
-0.304114	go based on the	-0.124939
-0.304114	chosen based on the	-0.124939
-0.312897	scattered around on the	-0.124939
-0.098800	vector depends on the	-0.124939
-0.098800	value depends on the	-0.124939
-0.172937	branch depends on the	-0.124939
-0.046596	calculation depends on the	-0.124939
-0.098800	application depends on the	-0.124939
-0.098800	addition depends on the	-0.124939
-0.098800	predicted depends on the	-0.124939
-0.098800	sum depends on the	-0.124939
-0.098800	gain depends on the	-0.124939
-0.236368	fully compatible on the	-0.124939
-0.089890	ways depending on the	-0.124939
-0.089890	dynamically depending on the	-0.124939
-0.016551	cycles, depending on the	-0.221849
-0.089890	integers, depending on the	-0.124939
-0.089890	64, depending on the	-0.124939
-0.089890	four, depending on the	-0.124939
-0.089890	meanings depending on the	-0.124939
-0.089890	move, depending on the	-0.124939
-0.236368	model comes on the	-0.124939
-0.117450	that rely on the	-0.124939
-0.444396	cannot rely on the	-0.124939
-0.278942	always rely on the	-0.124939
-0.228078	any effect on the	-0.124939
-0.228078	dramatic effect on the	-0.124939
-0.236368	chain, especially on the	-0.124939
-0.236368	is 15 on the	-0.124939
-0.203126	should depend on the	-0.124939
-0.203126	methods depend on the	-0.124939
-0.203126	details depend on the	-0.124939
-0.236368	negative list, on the	-0.124939
-0.236368	to compromise on the	-0.124939
-0.366256	MKL relies on the	-0.124939
-0.236368	processor appears on the	-0.124939
-0.312897	5 μs on the	-0.124939
-0.236368	more focus on the	-0.124939
-0.236368	discussion forums on the	-0.124939
-0.236368	or interpretation on the	-0.124939
-0.236368	has influence on the	-0.124939
-0.236368	optimization efforts on the	-0.124939
-0.236368	discussions. Turn on the	-0.124939
-0.236368	objects. Storage on the	-0.124939
-0.236368	is pushed on the	-0.124939
-0.236368	cycles (depending on the	-0.124939
-0.236368	are relying on the	-0.124939
-0.356633	needs to code the	-0.124939
-0.425990	same time as the	-0.124939
-0.657600	the same as the	-0.124939
-0.507639	compilers such as the	-0.124939
-0.507639	available, such as the	-0.124939
-0.507639	events, such as the	-0.124939
-1.150080	as long as the	-0.124939
-1.052882	is stored as the	-0.124939
-0.295979	as good as the	-0.124939
-0.329076	same precision as the	-0.124939
-1.065909	as well as the	-0.124939
-0.463780	source code, as the	-0.124939
-0.329076	same features as the	-0.124939
-0.329076	is chosen as the	-0.124939
-0.425990	as soon as the	-0.124939
-0.329076	time consumption as the	-0.124939
-0.329076	increasingly blurred as the	-0.124939
-0.329076	same directory as the	-0.124939
-1.539794	it is not the	-0.124939
-1.264365	this is not the	-0.124939
-0.565742	debugger is not the	-0.124939
-0.487421	time, but not the	-0.124939
-0.487421	complex, but not the	-0.124939
-0.347290	the rows, not the	-0.124939
-0.564948	more time than the	-0.124939
-0.450986	no more than the	-0.124939
-0.450986	much more than the	-0.124939
-0.272068	the CPU than the	-0.124939
-0.404663	library other than the	-0.124939
-0.272068	instruction set than the	-0.124939
-0.864868	more efficient than the	-0.124939
-0.786527	less efficient than the	-0.124939
-0.440101	is faster than the	-0.249877
-0.334270	often faster than the	-0.124939
-0.334270	much faster than the	-0.124939
-0.334270	increasing faster than the	-0.124939
-0.346551	is less than the	-0.425969
-0.233931	be less than the	-0.124939
-0.570154	8 rather than the	-0.124939
-0.404439	stack rather than the	-0.124939
-0.404439	factor rather than the	-0.124939
-0.404439	step rather than the	-0.124939
-0.404439	xxn rather than the	-0.124939
-0.404439	!b) rather than the	-0.124939
-0.404439	matters rather than the	-0.124939
-0.404439	at, rather than the	-0.124939
-0.272068	control instructions than the	-0.124939
-0.272068	to calculate than the	-0.124939
-0.207466	more resources than the	-0.124939
-0.545948	is better than the	-0.124939
-0.355827	usually higher than the	-0.124939
-0.047007	is bigger than the	-0.124939
-0.159743	be bigger than the	-0.124939
-0.159743	are bigger than the	-0.124939
-0.159743	loop bigger than the	-0.124939
-0.272068	other modules than the	-0.124939
-0.272068	units smaller than the	-0.124939
-0.272068	more safe than the	-0.124939
-0.202626	same priority than the	-0.124939
-0.202626	lower priority than the	-0.124939
-0.441211	be slower than the	-0.124939
-0.500684	more predictable than the	-0.124939
-0.272068	is larger than the	-0.124939
-0.272068	other input/output than the	-0.124939
-0.272068	memory footprint than the	-0.124939
-0.440990	sure to have the	-0.124939
-0.440990	convenient to have the	-0.124939
-0.415067	will not have the	-0.124939
-0.733928	do not have the	-0.124939
-0.494848	all libraries have the	-0.124939
-0.676487	bit systems have the	-0.124939
-0.741293	it doesn't have the	-0.124939
-0.678518	compiler doesn't have the	-0.124939
-0.547611	simply don't have the	-0.124939
-0.336651	these languages have the	-0.124939
-0.171392	Add to this the	-0.425969
-0.350980	to 122 this the	-0.124939
-0.776466	at the time the	-0.425969
-0.521193	before the time the	-0.124939
-0.076633	than each time the	-0.425969
-0.169749	value each time the	-0.124939
-0.169749	updates each time the	-0.124939
-0.342020	the first time the	-0.425969
-0.276808	done every time the	-0.124939
-0.276808	list every time the	-0.124939
-0.276808	branches every time the	-0.124939
-0.276808	loaded every time the	-0.124939
-0.276808	updates every time the	-0.124939
-0.276808	misprediction every time the	-0.124939
-0.415445	the next time the	-0.124939
-0.586369	as last time the	-0.124939
-0.435964	is to use the	-0.124939
-0.289835	you to use the	-0.124939
-0.289835	has to use the	-0.124939
-0.227651	possible to use the	-0.124939
-0.411106	faster to use the	-0.124939
-0.292607	how to use the	-0.124939
-0.289835	cases to use the	-0.124939
-0.289835	want to use the	-0.124939
-0.350997	advantageous to use the	-0.301030
-0.289835	likely to use the	-0.124939
-0.347951	recommended to use the	-0.124939
-0.411106	preferred to use the	-0.124939
-0.289835	prefer to use the	-0.124939
-0.269307	i and use the	-0.124939
-0.269307	local, and use the	-0.124939
-0.291730	instructions that use the	-0.124939
-0.291730	modules that use the	-0.124939
-0.107139	it can use the	-0.425969
-0.328940	compiler can use the	-0.425969
-0.300852	you can use the	-0.124939
-0.358768	You can use the	-0.124939
-0.328608	directly, or use the	-0.124939
-0.432183	will not use the	-0.124939
-0.432183	do not use the	-0.124939
-0.609842	Do not use the	-0.124939
-0.433640	you may use the	-0.124939
-0.320499	You may use the	-0.124939
-0.269307	loop will use the	-0.124939
-0.384166	compilers will use the	-0.124939
-0.249502	you do use the	-0.124939
-0.107075	best compilers use the	-0.124939
-0.107075	(Some compilers use the	-0.124939
-0.328608	they cannot use the	-0.124939
-0.358539	to always use the	-0.124939
-0.328608	several applications use the	-0.124939
-0.328608	systems normally use the	-0.124939
-0.249502	brackets mean use the	-0.124939
-0.249502	Intel CPUs: use the	-0.124939
-0.249502	2 thenaandbcannot use the	-0.124939
-0.249502	multiplications. Subtractions use the	-0.124939
-0.045974	mode or when the	-0.602060
-0.090777	power function when the	-0.124939
-0.090777	pow function when the	-0.124939
-0.276539	vectorized code when the	-0.124939
-0.155738	The time when the	-0.124939
-0.155738	long time when the	-0.124939
-0.155738	extra time when the	-0.124939
-0.276539	into memory when the	-0.124939
-0.205682	big program when the	-0.124939
-0.374498	size only when the	-0.124939
-0.302983	also used when the	-0.124939
-0.205682	put there when the	-0.124939
-0.316834	less efficient when the	-0.124939
-0.276539	be faster when the	-0.124939
-0.090777	is called when the	-0.124939
-0.090777	be called when the	-0.124939
-0.205682	32-bit systems when the	-0.124939
-0.205291	is useful when the	-0.124939
-0.205291	be useful when the	-0.124939
-0.384702	needed even when the	-0.124939
-0.205682	512 bits when the	-0.124939
-0.393611	vector operations when the	-0.124939
-0.205682	most cases when the	-0.124939
-0.205682	you want when the	-0.124939
-0.205682	is best when the	-0.124939
-0.205682	programming language when the	-0.124939
-0.205682	++i). But when the	-0.124939
-0.276539	the matrix when the	-0.124939
-0.043021	double precision when the	-0.425969
-0.276539	this problem when the	-0.124939
-0.205682	loop counter when the	-0.124939
-0.205682	several files when the	-0.124939
-0.205682	memory allocation when the	-0.124939
-0.205682	CPU-intensive programs when the	-0.124939
-0.205682	cause problems when the	-0.124939
-0.090777	goes automatically when the	-0.124939
-0.090777	update automatically when the	-0.124939
-0.205682	a disadvantage when the	-0.124939
-0.205682	language modules when the	-0.124939
-0.428574	is relevant when the	-0.124939
-0.043021	allocated dynamically when the	-0.124939
-0.205682	are inefficient when the	-0.124939
-0.205682	this task when the	-0.124939
-0.205682	is obtained when the	-0.124939
-0.360509	most efficiently when the	-0.124939
-0.028202	is initialized when the	-0.425969
-0.058363	be initialized when the	-0.124939
-0.205682	slower, especially when the	-0.124939
-0.360509	more fragmented when the	-0.124939
-0.205682	mouse inputs when the	-0.124939
-0.090777	is resolved when the	-0.124939
-0.090777	not resolved when the	-0.124939
-0.205682	an update when the	-0.124939
-0.205682	garbage collection when the	-0.124939
-0.205682	than truncation when the	-0.124939
-0.205682	different places when the	-0.124939
-0.163673	is deallocated when the	-0.124939
-0.163673	are deallocated when the	-0.124939
-0.205682	is increased when the	-0.124939
-0.205682	is deleted when the	-0.124939
-0.205682	is negligible when the	-0.124939
-0.205682	is freed when the	-0.124939
-0.205682	be bypassed when the	-0.124939
-0.205682	point precisions when the	-0.124939
-0.205682	four float's when the	-0.124939
-0.205682	the processor) when the	-0.124939
-0.205682	than 33% when the	-0.124939
-0.205682	memory released when the	-0.124939
-0.205682	and decreased when the	-0.124939
-0.251959	dispatched function then the	-0.124939
-0.404460	short time then the	-0.124939
-0.251959	or more then the	-0.124939
-0.377594	frame functions then the	-0.124939
-0.331558	be set then the	-0.124939
-0.167900	of 2 then the	-0.425969
-0.251959	enough registers then the	-0.124939
-0.251959	the case then the	-0.124939
-0.331558	thousand times then the	-0.124939
-0.251959	cache line then the	-0.124939
-0.331558	template parameters then the	-0.124939
-0.251959	set values then the	-0.124939
-0.251959	these methods then the	-0.124939
-0.251959	is high then the	-0.124939
-0.251959	innermost function, then the	-0.124939
-0.404460	this range then the	-0.124939
-0.331558	as index then the	-0.124939
-0.107958	a time, then the	-0.124939
-0.107958	any time, then the	-0.124939
-0.251959	has changed then the	-0.124939
-0.467169	to execute then the	-0.124939
-0.361700	same module then the	-0.124939
-0.251959	equally near then the	-0.124939
-0.251959	than once then the	-0.124939
-0.050633	shared object, then the	-0.124939
-0.251959	32-bit integers, then the	-0.124939
-0.251959	carry flag then the	-0.124939
-0.251959	is true, then the	-0.124939
-0.251959	the pipeline then the	-0.124939
-0.251959	and changing then the	-0.124939
-0.251959	is slow, then the	-0.124939
-0.251959	is false, then the	-0.124939
-0.331558	second sum, then the	-0.124939
-0.331558	64-bit double, then the	-0.124939
-0.251959	not vacant then the	-0.124939
-0.251959	different priorities then the	-0.124939
-0.251959	2 GHz then the	-0.124939
-0.251959	while loops, then the	-0.124939
-0.251959	chapter 9.10, then the	-0.124939
-0.251959	accessed row-wise, then the	-0.124939
-0.251959	to ignore, then the	-0.124939
-0.251959	writes only, then the	-0.124939
-0.251959	= 18, then the	-0.124939
-0.341335	single function from the	-0.124939
-0.341335	value than from the	-0.124939
-1.149031	the compiler from the	-0.124939
-0.341335	level-1 cache from the	-0.124939
-0.297000	this value from the	-0.124939
-0.297000	each value from the	-0.124939
-0.523696	to return from the	-0.124939
-0.497368	is called from the	-0.124939
-0.314422	are called from the	-0.124939
-0.497368	when called from the	-0.124939
-0.260083	A call from the	-0.124939
-0.260083	map file from the	-0.124939
-0.446858	also available from the	-0.124939
-0.388478	when accessed from the	-0.124939
-0.096139	is calculated from the	-0.301030
-0.260083	instruction sets from the	-0.124939
-0.260083	subtracting n from the	-0.124939
-0.260083	at runtime from the	-0.124939
-0.260083	are needed from the	-0.124939
-0.398553	99 read from the	-0.124939
-0.260083	9.5a goes from the	-0.124939
-0.260083	size right from the	-0.124939
-0.260083	and writing from the	-0.124939
-0.260083	more efficiently from the	-0.124939
-0.372188	is far from the	-0.124939
-0.398553	will benefit from the	-0.124939
-0.260083	are generated from the	-0.124939
-0.260083	ret returns from the	-0.124939
-0.260083	it gets from the	-0.124939
-0.260083	is removed from the	-0.124939
-0.260083	not separated from the	-0.124939
-0.260083	when returning from the	-0.124939
-0.260083	be evicted from the	-0.124939
-0.260083	no warning from the	-0.124939
-0.260083	be fetched from the	-0.124939
-0.260083	and popped from the	-0.124939
-0.260083	may deviate from the	-0.124939
-0.336380	reflecting it at the	-0.124939
-0.255970	stack memory at the	-0.124939
-0.086025	never used at the	-0.425969
-0.255970	other compilers at the	-0.124939
-0.255970	the variable at the	-0.124939
-0.265235	The elements at the	-0.124939
-0.265235	dummy elements at the	-0.124939
-0.255970	run faster at the	-0.124939
-0.539044	is done at the	-0.124939
-0.255970	clock cycles at the	-0.124939
-0.255970	Avoid branches at the	-0.124939
-0.255970	point multiplication at the	-0.124939
-0.255970	its name at the	-0.124939
-0.255970	to zero at the	-0.124939
-0.255970	table lookup at the	-0.124939
-0.115989	may look at the	-0.124939
-0.035329	you look at the	-0.301030
-0.115989	also look at the	-0.124939
-0.193402	Let's look at the	-0.124939
-0.115989	let's look at the	-0.124939
-0.255970	installation options at the	-0.124939
-0.255970	multiple things at the	-0.124939
-0.033493	was unknown at the	-0.124939
-0.006495	were unknown at the	-0.823909
-0.255970	one thing at the	-0.124939
-0.255970	at least at the	-0.124939
-0.255970	which counts at the	-0.124939
-0.255970	same DLL at the	-0.124939
-0.255970	will break at the	-0.124939
-0.255970	less popular at the	-0.124939
-0.255970	body begins at the	-0.124939
-0.255970	been lost at the	-0.124939
-0.255970	things. Looking at the	-0.124939
-0.824162	if it has the	-0.124939
-0.471839	but it has the	-0.124939
-0.119699	function. This has the	-0.425969
-0.285495	called. This has the	-0.124939
-0.285495	executed. This has the	-0.124939
-0.312991	CParent<CChild1> { has the	-0.124939
-0.626321	pointer. It has the	-0.124939
-0.312991	variable always has the	-0.124939
-0.800540	the microprocessor has the	-0.124939
-0.312991	in main has the	-0.124939
-0.312991	Function inlining has the	-0.124939
-0.312991	as position-independent has the	-0.124939
-0.312991	doesn't occur has the	-0.124939
-0.312991	main executable has the	-0.124939
-0.312991	shared_ptr. auto_ptr has the	-0.124939
-0.642609	is to make the	-0.124939
-0.355983	do to make the	-0.124939
-0.355983	array to make the	-0.124939
-0.474461	possible to make the	-0.124939
-0.355983	value to make the	-0.124939
-0.294447	order to make the	-0.221849
-0.697935	how to make the	-0.124939
-0.500901	useful to make the	-0.124939
-0.693665	want to make the	-0.124939
-0.355983	advantageous to make the	-0.124939
-0.355983	solution to make the	-0.124939
-0.355983	things to make the	-0.124939
-0.355983	forget to make the	-0.124939
-0.355983	tried to make the	-0.124939
-0.355983	tends to make the	-0.124939
-0.394609	problem and make the	-0.124939
-0.394609	conversions and make the	-0.124939
-0.411795	does not make the	-0.124939
-0.333133	this will make the	-0.124939
-0.333133	overflow will make the	-0.124939
-0.333133	b++; will make the	-0.124939
-0.114232	loops would make the	-0.124939
-0.114232	chain would make the	-0.124939
-0.269670	of course make the	-0.124939
-0.269670	time. Templates make the	-0.124939
-0.269670	or (5) make the	-0.124939
-0.261774	This is because the	-0.124939
-0.201191	may be because the	-0.124939
-0.201191	program or because the	-0.124939
-0.257638	simple function because the	-0.124939
-0.257638	frame function because the	-0.124939
-0.099392	execution time because the	-0.124939
-0.201191	store data because the	-0.124939
-0.201191	member functions because the	-0.124939
-0.201191	at all because the	-0.124939
-0.201191	level-2 cache because the	-0.124939
-0.201191	an integer because the	-0.124939
-0.201191	contained object because the	-0.124939
-0.228498	are efficient because the	-0.124939
-0.228498	less efficient because the	-0.124939
-0.201191	optimizations possible because the	-0.124939
-0.201191	optimized version because the	-0.124939
-0.073061	the performance because the	-0.124939
-0.201191	32-bit software because the	-0.124939
-0.201191	is long because the	-0.124939
-0.297372	is faster because the	-0.124939
-0.201191	quite often because the	-0.124939
-0.201191	unit- test because the	-0.124939
-0.201191	became available because the	-0.124939
-0.297372	hundred times because the	-0.124939
-0.201191	is large because the	-0.124939
-0.201191	function calls because the	-0.124939
-0.201191	correct result because the	-0.124939
-0.201191	not necessary because the	-0.124939
-0.201191	than 128 because the	-0.124939
-0.271257	optimal solution because the	-0.124939
-0.018271	64-bit mode because the	-0.124939
-0.201191	interactive programs because the	-0.124939
-0.201191	a microprocessor because the	-0.124939
-0.201191	be better because the	-0.124939
-0.201191	critical applications because the	-0.124939
-0.201191	be needed because the	-0.124939
-0.201191	simple types because the	-0.124939
-0.201191	uncached read because the	-0.124939
-0.201191	allocation process because the	-0.124939
-0.271257	the operands because the	-0.124939
-0.271257	n here because the	-0.124939
-0.335823	is inefficient because the	-0.124939
-0.201922	very inefficient because the	-0.124939
-0.201191	not copied because the	-0.124939
-0.201191	without -fpic because the	-0.124939
-0.201191	only occurs because the	-0.124939
-0.201191	column 28 because the	-0.124939
-0.271257	calculated twice because the	-0.124939
-0.201191	option -fpie because the	-0.124939
-0.201191	but i*12, because the	-0.124939
-0.201191	flags stall because the	-0.124939
-0.201191	not evaluated, because the	-0.124939
-0.201191	cache line, because the	-0.124939
-0.479583	automatically, and only the	-0.124939
-0.317450	dispatcher in only the	-0.124939
-0.481701	CPUs with only the	-0.124939
-0.317450	rely on only the	-0.124939
-0.497745	measures not only the	-0.124939
-0.448003	then use only the	-0.124939
-0.317450	by using only the	-0.124939
-0.317450	is initialized only the	-0.124939
-0.317450	linking includes only the	-0.124939
-0.317450	and insert only the	-0.124939
-0.317450	other processors, only the	-0.124939
-0.317450	simultaneously. Actually, only the	-0.124939
-0.317450	it understands only the	-0.124939
-0.227712	error code. If the	-0.124939
-0.227712	simplest code. If the	-0.124939
-0.185314	virtual function. If the	-0.124939
-0.277646	into memory. If the	-0.124939
-0.185314	be used. If the	-0.124939
-0.082847	data cache. If the	-0.124939
-0.082847	level-3 cache. If the	-0.124939
-0.252659	some systems. If the	-0.124939
-0.252659	not efficient. If the	-0.124939
-0.235869	instruction set. If the	-0.124939
-0.151233	each set. If the	-0.124939
-0.185314	function calls. If the	-0.124939
-0.185314	the object. If the	-0.124939
-0.252659	function library. If the	-0.124939
-0.185314	test purposes. If the	-0.124939
-0.185314	following way. If the	-0.124939
-0.252659	the CPU. If the	-0.124939
-0.252659	a problem. If the	-0.124939
-0.185314	sequential order. If the	-0.124939
-0.185314	was executed. If the	-0.124939
-0.185314	source file. If the	-0.124939
-0.185314	a register. If the	-0.124939
-0.252659	above table. If the	-0.124939
-0.082847	threads simultaneously. If the	-0.124939
-0.082847	seemingly simultaneously. If the	-0.124939
-0.252659	signed number. If the	-0.124939
-0.185314	or constant. If the	-0.124939
-0.185314	data members. If the	-0.124939
-0.185314	to maintain. If the	-0.124939
-0.185314	lower priority. If the	-0.124939
-0.185314	subexpression elimination If the	-0.124939
-0.185314	work better. If the	-0.124939
-0.185314	is declared. If the	-0.124939
-0.185314	See www.agner.org/optimize/cppexamples.zip. If the	-0.124939
-0.185314	an addition. If the	-0.124939
-0.185314	of code). If the	-0.124939
-0.185314	chapter 12. If the	-0.124939
-0.185314	too long. If the	-0.124939
-0.185314	the same. If the	-0.124939
-0.185314	is slow. If the	-0.124939
-0.185314	set 0x1C. If the	-0.124939
-0.185314	|= 0x20; If the	-0.124939
-0.185314	deleting containers. If the	-0.124939
-0.185314	250 ms. If the	-0.124939
-0.185314	page 105). If the	-0.124939
-0.185314	compile time? If the	-0.124939
-0.185314	same class). If the	-0.124939
-0.185314	above. 7. If the	-0.124939
-0.185314	page 62. If the	-0.124939
-0.185314	was coded. If the	-0.124939
-0.185314	a key? If the	-0.124939
-0.185314	more complicated. If the	-0.124939
-0.185314	nontemporal writes. If the	-0.124939
-0.185314	for analysis. If the	-0.124939
-0.185314	multiple elements? If the	-0.124939
-0.185314	or references: If the	-0.124939
-0.185314	the pipeline. If the	-0.124939
-0.185314	is stored? If the	-0.124939
-0.185314	+= sum2; If the	-0.124939
-0.185314	been allocated. If the	-0.124939
-0.554577	function in which the	-0.124939
-0.268805	code in which the	-0.124939
-0.351566	order in which the	-0.301030
-0.268805	brackets in which the	-0.124939
-0.334499	error code which the	-0.124939
-0.415049	rid of all the	-0.124939
-0.254019	file and all the	-0.124939
-0.254019	statement and all the	-0.124939
-0.415049	counters in all the	-0.124939
-0.394611	memory for all the	-0.124939
-0.394611	check for all the	-0.124939
-0.471564	means that all the	-0.124939
-0.224791	loop if all the	-0.124939
-0.224791	But if all the	-0.124939
-0.224791	runtime if all the	-0.124939
-0.401226	data with all the	-0.124939
-0.401226	line with all the	-0.124939
-0.441926	works on all the	-0.124939
-0.355760	first, then all the	-0.124939
-0.202595	double because all the	-0.124939
-0.202595	numbers because all the	-0.124939
-0.272013	and last all the	-0.124939
-0.272013	not load all the	-0.124939
-0.053726	by inlining all the	-0.425969
-0.272013	without checking all the	-0.124939
-0.272013	vector stores all the	-0.124939
-0.272013	or manipulate all the	-0.124939
-0.272013	not solve all the	-0.124939
-0.272013	to distribute all the	-0.124939
-0.272013	to pool all the	-0.124939
-0.297616	of all but the	-0.124939
-0.297616	contrived example, but the	-0.124939
-0.622839	intrinsic functions, but the	-0.124939
-0.439710	compile time, but the	-0.124939
-0.421423	quite efficient, but the	-0.124939
-0.297616	not edx but the	-0.124939
-0.595259	multiple threads, but the	-0.124939
-0.297616	on BSD, but the	-0.124939
-0.387009	as well, but the	-0.124939
-0.297616	by 64, but the	-0.124939
-0.297616	in vectors, but the	-0.124939
-0.297616	be vectorized, but the	-0.124939
-0.387009	can occur, but the	-0.124939
-0.297616	using hyperthreading, but the	-0.124939
-0.297616	a macro, but the	-0.124939
-0.297616	page 103), but the	-0.124939
-0.297616	particular situation, but the	-0.124939
-0.297616	or aliasing, but the	-0.124939
-0.654435	I have used the	-0.124939
-0.380363	have to set the	-0.124939
-0.380363	recommended to set the	-0.124939
-0.346956	in addition, set the	-0.124939
-0.421674	have to do the	-0.124939
-0.527177	possible to do the	-0.124939
-0.543772	how to do the	-0.124939
-0.650900	necessary to do the	-0.124939
-0.499636	able to do the	-0.124939
-0.386255	better to do the	-0.124939
-0.386255	safe to do the	-0.124939
-0.505670	algorithm can do the	-0.124939
-0.422262	compiler will do the	-0.124939
-0.298247	the program do the	-0.124939
-0.387783	therefore cannot do the	-0.124939
-0.298247	you must do the	-0.124939
-0.575373	disadvantage of using the	-0.124939
-0.402295	advantages of using the	-0.124939
-0.364003	possibility of using the	-0.124939
-0.362646	counter and using the	-0.124939
-0.252694	advantage in using the	-0.124939
-0.362646	tool for using the	-0.124939
-0.725602	you are using the	-0.124939
-0.367278	systems are using the	-0.124939
-0.285422	x by using the	-0.124939
-0.285422	functions by using the	-0.124939
-0.285422	set by using the	-0.124939
-0.285422	clock by using the	-0.124939
-0.285422	variables by using the	-0.124939
-0.285422	explicitly by using the	-0.124939
-0.285422	anything by using the	-0.124939
-0.285422	hidden by using the	-0.124939
-0.285422	necessary, by using the	-0.124939
-0.285422	segment by using the	-0.124939
-0.285327	restrictions on using the	-0.124939
-0.191513	Manual on using the	-0.124939
-0.252694	An array using the	-0.124939
-0.240032	exception without using the	-0.124939
-0.240032	directly without using the	-0.124939
-0.252694	algebraic expressions using the	-0.124939
-0.252694	single operation using the	-0.124939
-0.252694	is finished using the	-0.124939
-0.349976	you can double the	-0.124939
-0.349976	This would double the	-0.124939
-0.400046	right data into the	-0.124939
-0.261168	of i into the	-0.124939
-0.261168	as possible into the	-0.124939
-0.261168	a branch into the	-0.124939
-0.261168	flags register into the	-0.124939
-0.052068	fits best into the	-0.425969
-0.261168	memory block into the	-0.124939
-0.261168	be put into the	-0.124939
-0.482389	be linked into the	-0.124939
-0.261168	test feature into the	-0.124939
-0.246188	copies them into the	-0.124939
-0.246188	getting them into the	-0.124939
-0.342643	data fit into the	-0.124939
-0.261168	of N into the	-0.124939
-0.261168	instruments directly into the	-0.124939
-0.261168	go back into the	-0.124939
-0.261168	measurement instruments into the	-0.124939
-0.261168	is fed into the	-0.124939
-0.261168	go deeper into the	-0.124939
-0.261168	Windows. Integrates into the	-0.124939
-0.261168	fit nicely into the	-0.124939
-0.261168	to feed into the	-0.124939
-0.424252	test but also the	-0.124939
-0.424252	spot but also the	-0.124939
-0.355620	Note how efficient the	-0.124939
-0.378836	the function. In the	-0.124939
-0.121554	is faster. In the	-0.124939
-0.121554	much faster. In the	-0.124939
-0.378836	higher speed. In the	-0.124939
-0.290953	them all. In the	-0.124939
-0.290953	of two. In the	-0.124939
-0.290953	doesn't occur. In the	-0.124939
-0.290953	very big. In the	-0.124939
-0.290953	vendor string. In the	-0.124939
-0.290953	same name. In the	-0.124939
-0.290953	be obtained. In the	-0.124939
-0.290953	page 60. In the	-0.124939
-0.290953	(page 146). In the	-0.124939
-0.209837	arrays and where the	-0.124939
-0.209837	a program where the	-0.124939
-0.209837	instruction set where the	-0.124939
-0.158481	in cases where the	-0.124939
-0.182374	most cases where the	-0.124939
-0.182374	In cases where the	-0.124939
-0.182374	many cases where the	-0.124939
-0.182374	simple cases where the	-0.124939
-0.182374	special cases where the	-0.124939
-0.209837	carry) instructions where the	-0.124939
-0.209837	64-bit mode where the	-0.124939
-0.209837	data sets where the	-0.124939
-0.209837	memory model where the	-0.124939
-0.209837	obscure examples where the	-0.124939
-0.209837	learning process where the	-0.124939
-0.209837	4 computer where the	-0.124939
-0.209837	can predict where the	-0.124939
-0.570587	the situation where the	-0.124939
-0.277214	use situation where the	-0.124939
-0.277214	common situation where the	-0.124939
-0.209837	of templates where the	-0.124939
-0.290714	of situations where the	-0.124939
-0.461738	in situations where the	-0.124939
-0.209837	is determined where the	-0.124939
-0.209837	second step where the	-0.124939
-0.209837	addresses (i.e. where the	-0.124939
-0.209837	in Fortran where the	-0.124939
-0.209837	column-wise manner where the	-0.124939
-0.652212	the compiler takes the	-0.124939
-0.335909	of 2, so the	-0.124939
-0.335909	in thousand so the	-0.124939
-0.335909	specifies truncation so the	-0.124939
-0.335909	significant digits, so the	-0.124939
-0.352840	function will return the	-0.124939
-0.277532	cache in between the	-0.124939
-0.509974	in performance between the	-0.124939
-0.382215	the difference between the	-0.124939
-0.303033	minimal difference between the	-0.124939
-0.277532	graphics framework between the	-0.124939
-0.509974	and synchronization between the	-0.124939
-0.362462	important distinction between the	-0.124939
-0.277532	The similarity between the	-0.124939
-0.277532	the transitions between the	-0.124939
-0.277532	work evenly between the	-0.124939
-0.277532	were observed between the	-0.124939
-0.277532	for distinguishing between the	-0.124939
-0.546859	no virtual member the	-0.124939
-0.418687	of the way the	-0.124939
-0.418687	on the way the	-0.124939
-0.874886	is no way the	-0.124939
-0.590587	division is faster the	-0.124939
-0.449523	all. This makes the	-0.124939
-0.449523	stored. This makes the	-0.124939
-0.248200	but this makes the	-0.124939
-0.248200	functions only makes the	-0.124939
-0.372581	non-sequential which makes the	-0.124939
-0.248200	by one makes the	-0.124939
-0.356867	it also makes the	-0.124939
-0.248200	function call makes the	-0.124939
-0.248200	non-Intel processor makes the	-0.124939
-0.050040	This option makes the	-0.124939
-0.248200	integers simply makes the	-0.124939
-0.327047	dynamic linking makes the	-0.124939
-0.248200	of templates makes the	-0.124939
-0.248200	such checks makes the	-0.124939
-0.248200	memory blocks makes the	-0.124939
-0.248200	many instances makes the	-0.124939
-0.362620	the time before the	-0.124939
-0.252674	the program before the	-0.124939
-0.185326	the value before the	-0.124939
-0.185326	it takes before the	-0.124939
-0.185326	virtual table before the	-0.124939
-0.185326	misprediction long before the	-0.124939
-0.053443	is called before the	-0.124939
-0.053443	be called before the	-0.124939
-0.053443	usually called before the	-0.124939
-0.185326	of times before the	-0.124939
-0.362620	the stack before the	-0.124939
-0.185326	a check before the	-0.124939
-0.082852	is known before the	-0.124939
-0.082852	size known before the	-0.124939
-0.277662	desired values before the	-0.124939
-0.185326	pointer well before the	-0.124939
-0.185326	clock cycles before the	-0.124939
-0.185326	new addition before the	-0.124939
-0.185326	it comes before the	-0.124939
-0.185326	thread priority before the	-0.124939
-0.185326	one iteration before the	-0.124939
-0.185326	are resolved before the	-0.124939
-0.185326	address again before the	-0.124939
-0.185326	of B before the	-0.124939
-0.185326	be freed before the	-0.124939
-0.082852	do immediately before the	-0.124939
-0.082852	placed immediately before the	-0.124939
-0.185326	be restored before the	-0.124939
-1.230137	This is called the	-0.124939
-0.338624	of memory called the	-0.124939
-0.338624	special cache called the	-0.124939
-0.378734	contiguous memory. See the	-0.124939
-0.290869	across platforms. See the	-0.124939
-0.290869	sleep mode. See the	-0.124939
-0.290869	possible version. See the	-0.124939
-0.290869	control branch. See the	-0.124939
-0.290869	exception handling. See the	-0.124939
-0.533005	memory pool. See the	-0.124939
-0.290869	are doing. See the	-0.124939
-0.290869	directive __declspec(cpu_dispatch(...)). See the	-0.124939
-0.290869	is obvious. See the	-0.124939
-0.199217	code to call the	-0.124939
-0.088284	have to call the	-0.124939
-0.471001	takes to call the	-0.124939
-0.199217	want to call the	-0.124939
-0.199217	handler to call the	-0.124939
-0.199217	supposed to call the	-0.124939
-0.359519	F2 and call the	-0.124939
-0.391742	It can call the	-0.124939
-0.275110	you may call the	-0.124939
-0.275110	other modules call the	-0.124939
-0.275110	// Now call the	-0.124939
-0.224228	In this example, the	-0.124939
-0.477732	optimization. For example, the	-0.124939
-0.682219	dispatching. For example, the	-0.124939
-0.477732	development. For example, the	-0.124939
-0.477732	sources. For example, the	-0.124939
-0.100898	the above example, the	-0.124939
-0.353728	example shows first the	-0.124939
-0.353369	a single register the	-0.124939
-0.800358	We can take the	-0.124939
-0.315077	may not take the	-0.124939
-0.315077	and b take the	-0.124939
-0.315077	28. We take the	-0.124939
-0.315077	calculations usually take the	-0.124939
-0.315077	inputs. Let's take the	-0.124939
-0.857074	This is often the	-0.124939
-0.337977	This is how the	-0.124939
-0.504225	code and how the	-0.124939
-0.475965	comments about how the	-0.124939
-0.494507	may not need the	-0.124939
-0.492901	before we need the	-0.124939
-0.724223	it doesn't need the	-0.124939
-0.494507	that don't need the	-0.124939
-0.409086	example, to test the	-0.124939
-0.409086	necessary to test the	-0.124939
-0.409086	relevant to test the	-0.124939
-0.326328	you should test the	-0.124939
-0.419003	loop and without the	-0.124939
-0.267330	with or without the	-0.124939
-0.492692	shared object without the	-0.124939
-0.267330	new version without the	-0.124939
-0.267330	dynamic libraries without the	-0.124939
-0.267330	11.3 even without the	-0.124939
-0.267330	old processors without the	-0.124939
-0.267330	on CPUs without the	-0.124939
-0.267330	of calculations without the	-0.124939
-0.267330	be changed without the	-0.124939
-0.267330	code. (Compile without the	-0.124939
-0.344317	table for even the	-0.124939
-0.344317	some cases even the	-0.124939
-0.924072	you are sure the	-0.124939
-0.459187	to make sure the	-0.204120
-0.487326	then make sure the	-0.124939
-0.423321	that makes sure the	-0.124939
-0.423321	it makes sure the	-0.124939
-0.348897	variable. Make sure the	-0.124939
-0.453401	small and always the	-0.124939
-0.449950	seconds to access the	-0.124939
-0.449950	unable to access the	-0.124939
-0.420523	functions that access the	-0.124939
-0.324696	finally (4) access the	-0.124939
-0.198085	can shift out the	-0.124939
-0.198085	will shift out the	-0.124939
-0.134577	cannot rule out the	-0.124939
-0.139477	completely rule out the	-0.124939
-0.202906	to roll out the	-0.301030
-0.159974	we roll out the	-0.124939
-0.052521	by rolling out the	-0.425969
-0.445790	use in case the	-0.124939
-0.445790	operands in case the	-0.124939
-0.445790	errors in case the	-0.124939
-0.322602	the latter case the	-0.124939
-0.747438	In most cases the	-0.124939
-0.710276	In some cases the	-0.124939
-0.366350	to set up the	-0.124939
-0.473106	to speed up the	-0.124939
-0.217732	to look up the	-0.124939
-0.136252	first look up the	-0.124939
-0.136252	(3) look up the	-0.124939
-0.335895	to split up the	-0.124939
-0.255566	measurements: warm up the	-0.124939
-0.255566	_endthread() cleans up the	-0.124939
-0.255566	it fills up the	-0.124939
-0.255566	may fill up the	-0.124939
-0.255566	by summing up the	-0.124939
-0.485911	way of making the	-0.124939
-0.092862	solution of making the	-0.124939
-0.211137	advice of making the	-0.124939
-0.395947	code for making the	-0.124939
-0.378935	avoided by making the	-0.124939
-0.265286	either by making the	-0.124939
-0.265286	misses by making the	-0.124939
-0.378935	solved by making the	-0.124939
-0.265286	mispredictions by making the	-0.124939
-0.253296	different places making the	-0.124939
-0.415401	again two times the	-0.124939
-0.413799	then many times the	-0.124939
-0.784515	how many times the	-0.124939
-0.415401	and three times the	-0.124939
-0.636971	and you want the	-0.124939
-1.044590	If you want the	-0.124939
-0.403708	chain. We want the	-0.124939
-0.311160	you just want the	-0.124939
-0.318878	too much about the	-0.124939
-0.087441	all information about the	-0.124939
-0.197044	no information about the	-0.124939
-0.197044	No information about the	-0.124939
-0.197044	full information about the	-0.124939
-0.197044	added information about the	-0.124939
-0.292203	gets information about the	-0.124939
-0.197044	incomplete information about the	-0.124939
-0.241377	to care about the	-0.124939
-0.241377	but that's about the	-0.124939
-0.241377	hasn't thought about the	-0.124939
-0.286801	function that does the	-0.124939
-0.286801	constructor that does the	-0.124939
-0.423595	conversions. It does the	-0.124939
-0.299248	pointer which does the	-0.124939
-0.299248	static_cast operator does the	-0.124939
-0.299248	function __intel_cpu_features_init_x() does the	-0.124939
-0.255750	program, and while the	-0.124939
-0.255750	are used, while the	-0.124939
-0.255750	is called, while the	-0.124939
-0.255750	are integers, while the	-0.124939
-0.255750	press break while the	-0.124939
-0.255750	only once, while the	-0.124939
-0.255750	relatively expensive, while the	-0.124939
-0.255750	is unchanged, while the	-0.124939
-0.255750	for both, while the	-0.124939
-0.255750	as intended, while the	-0.124939
-0.352263	in BSD work the	-0.124939
-0.446801	program that calls the	-0.124939
-0.316560	statement that calls the	-0.124939
-0.306563	statement always calls the	-0.124939
-0.306563	example 16.2 calls the	-0.124939
-0.306563	the loader calls the	-0.124939
-0.341358	way to avoid the	-0.124939
-0.341358	want to avoid the	-0.124939
-0.341358	able to avoid the	-0.124939
-0.539110	ways to avoid the	-0.124939
-0.341358	unrolled to avoid the	-0.124939
-0.458912	you can avoid the	-0.124939
-0.271560	we can avoid the	-0.124939
-0.563195	You can avoid the	-0.124939
-0.365351	compiler may avoid the	-0.124939
-0.334963	if you avoid the	-0.124939
-0.352387	a word processor the	-0.124939
-0.304201	PathScale. 2. Use the	-0.124939
-0.304201	this option. Use the	-0.124939
-0.304201	are implemented. Use the	-0.124939
-0.304201	hot spot. Use the	-0.124939
-0.304201	assembly listing. Use the	-0.124939
-0.293219	double precision. But the	-0.124939
-0.293219	destination array. But the	-0.124939
-0.293219	an integer. But the	-0.124939
-0.293219	constant 5. But the	-0.124939
-0.293219	other languages. But the	-0.124939
-0.293219	the market. But the	-0.124939
-0.292904	the library through the	-0.124939
-1.067294	are accessed through the	-0.124939
-0.381226	code goes through the	-0.124939
-0.415167	variables go through the	-0.124939
-0.292904	download updates through the	-0.124939
-0.292904	will propagate through the	-0.124939
-0.409094	framework and compile the	-0.124939
-0.226771	that you compile the	-0.124939
-0.226771	First you compile the	-0.124939
-0.315507	If we compile the	-0.124939
-0.286598	problems that cause the	-0.124939
-0.808863	This can cause the	-0.124939
-0.679480	This may cause the	-0.124939
-0.147781	so will cause the	-0.124939
-0.147781	operators will cause the	-0.124939
-0.231678	0x2710 will cause the	-0.124939
-0.456042	others have done the	-0.124939
-0.559965	system, and therefore the	-0.124939
-0.114357	must be inside the	-0.124939
-0.114357	of memory inside the	-0.124939
-0.114357	is used inside the	-0.124939
-0.114357	making objects inside the	-0.124939
-0.114357	shared variable inside the	-0.124939
-0.012753	the branch inside the	-0.124939
-0.025892	a branch inside the	-0.124939
-0.025892	The branch inside the	-0.124939
-0.114357	size arrays inside the	-0.124939
-0.034877	the calculations inside the	-0.124939
-0.034877	on calculations inside the	-0.124939
-0.034877	point calculations inside the	-0.124939
-0.114357	a counter inside the	-0.124939
-0.120164	be declared inside the	-0.425969
-0.114357	overflow condition inside the	-0.124939
-0.170859	object defined inside the	-0.124939
-0.114357	function body inside the	-0.124939
-0.114357	what happens inside the	-0.124939
-0.114357	because nothing inside the	-0.124939
-0.114357	multiplication, etc.) inside the	-0.124939
-0.114357	than log) inside the	-0.124939
-0.548937	that is calculated the	-0.124939
-0.617463	is only calculated the	-0.124939
-0.695432	code that uses the	-0.124939
-0.416930	CPUs. It uses the	-0.124939
-0.321813	user never uses the	-0.124939
-0.665105	you can get the	-0.124939
-0.538898	will not get the	-0.124939
-0.435025	y will get the	-0.124939
-0.294234	will both get the	-0.124939
-0.294234	will typically get the	-0.124939
-0.789967	way to check the	-0.124939
-0.231295	program can check the	-0.124939
-0.231295	You can check the	-0.124939
-0.648000	is more advantageous the	-0.124939
-0.794862	processors that support the	-0.124939
-0.453405	Does not support the	-0.124939
-0.321443	processors will support the	-0.124939
-0.305293	(eax) which contains the	-0.124939
-0.396460	ecx now contains the	-0.124939
-0.305293	is. ecx contains the	-0.124939
-0.305293	and edx contains the	-0.124939
-0.203424	be efficient whether the	-0.124939
-0.203424	for sure whether the	-0.124939
-0.203424	made about whether the	-0.124939
-0.390136	to see whether the	-0.124939
-0.203424	table shows whether the	-0.124939
-0.203424	to know whether the	-0.124939
-0.203424	to predict whether the	-0.124939
-0.057826	that checks whether the	-0.124939
-0.057826	it checks whether the	-0.124939
-0.057826	dispatcher checks whether the	-0.124939
-0.203424	at compile-time whether the	-0.124939
-0.203424	operand determines whether the	-0.124939
-0.684402	ways of doing the	-0.124939
-0.389806	functions are doing the	-0.124939
-0.261071	accomplished by doing the	-0.124939
-0.261071	spend time doing the	-0.124939
-0.548886	compiler from doing the	-0.124939
-0.261071	in fact doing the	-0.124939
-0.261071	is busy doing the	-0.124939
-0.379316	is to run the	-0.124939
-0.379316	models to run the	-0.124939
-0.304260	If you run the	-0.124939
-0.775028	it will run the	-0.124939
-0.087277	than to calculate the	-0.124939
-0.027189	takes to calculate the	-0.124939
-0.087277	variables to calculate the	-0.124939
-0.087277	order to calculate the	-0.124939
-0.087277	want to calculate the	-0.124939
-0.087277	able to calculate the	-0.124939
-0.087277	recommended to calculate the	-0.124939
-0.087277	application to calculate the	-0.124939
-0.087277	care to calculate the	-0.124939
-0.087277	convenient to calculate the	-0.124939
-0.087277	safer to calculate the	-0.124939
-0.292165	and can calculate the	-0.124939
-0.327489	compiler to inline the	-0.124939
-0.327489	optimal to inline the	-0.124939
-0.319951	it cannot inline the	-0.124939
-0.435247	have to add the	-0.124939
-0.424294	add_elements(s); // add the	-0.124939
-0.525312	You may add the	-0.124939
-0.373328	module then add the	-0.124939
-0.286448	vector register, add the	-0.124939
-0.416231	have to store the	-0.124939
-0.127125	class and store the	-0.124939
-0.127125	(2,2,2,2), and store the	-0.124939
-0.127125	(1,2,3,4), and store the	-0.124939
-0.231941	we can store the	-0.124939
-0.307622	system may store the	-0.124939
-0.231941	compiler will store the	-0.124939
-0.231941	compiler might store the	-0.124939
-0.231941	Even better: store the	-0.124939
-0.332168	is needed. All the	-0.124939
-0.332168	cannot do. All the	-0.124939
-0.228174	time to copy the	-0.124939
-0.228174	useful to copy the	-0.124939
-0.466832	block and copy the	-0.124939
-0.470316	requirements of optimizing the	-0.124939
-0.431977	than by optimizing the	-0.124939
-0.236648	on how well the	-0.124939
-0.236648	checking how well the	-0.124939
-0.835866	pointer is simply the	-0.124939
-0.326392	than to write the	-0.124939
-0.224289	possible to write the	-0.124939
-0.224289	minutes to write the	-0.124939
-0.224289	attempting to write the	-0.124939
-0.454878	important to optimize the	-0.124939
-0.454878	information to optimize the	-0.124939
-0.313866	can often optimize the	-0.124939
-0.297776	we used above the	-0.124939
-0.297776	column 28 above the	-0.124939
-0.297776	elements matrix[c][r] above the	-0.124939
-0.297776	mirror position above the	-0.124939
-0.235725	next function. However, the	-0.124939
-0.235725	scarce resources. However, the	-0.124939
-0.235725	different purposes. However, the	-0.124939
-0.235725	data sets. However, the	-0.124939
-0.235725	are executed. However, the	-0.124939
-0.235725	its value. However, the	-0.124939
-0.235725	a debugger. However, the	-0.124939
-0.235725	next calculation. However, the	-0.124939
-0.348578	the recommendation was the	-0.124939
-0.454308	line in both the	-0.124939
-0.280299	supported by both the	-0.124939
-0.280299	twice because both the	-0.124939
-0.280299	time. Therefore, both the	-0.124939
-0.280299	and checks both the	-0.124939
-0.139731	long time unless the	-0.124939
-0.139731	induction variable unless the	-0.124939
-0.139731	the optimization unless the	-0.124939
-0.139731	32-bit systems unless the	-0.124939
-0.139731	point calculations unless the	-0.124939
-0.199878	32-bit mode unless the	-0.124939
-0.139731	exception handling unless the	-0.124939
-0.064270	is slow unless the	-0.124939
-0.064270	are slow unless the	-0.124939
-0.139731	not safe unless the	-0.124939
-0.139731	more clear unless the	-0.124939
-0.139731	than rounding unless the	-0.124939
-0.139731	(16 bits), unless the	-0.124939
-0.139731	integer constant, unless the	-0.124939
-0.139731	method unfavorable, unless the	-0.124939
-0.386636	In most cases, the	-0.124939
-0.485010	In many cases, the	-0.124939
-0.442455	In some cases, the	-0.124939
-0.344541	50 simple cases, the	-0.124939
-0.489773	possible to replace the	-0.124939
-0.214212	necessary to replace the	-0.124939
-0.214212	advantageous to replace the	-0.124939
-0.214212	expected to replace the	-0.124939
-1.111137	compiler may replace the	-0.124939
-0.341356	compilers will replace the	-0.124939
-0.260831	writeable data. Therefore, the	-0.124939
-0.260831	are called. Therefore, the	-0.124939
-0.260831	64-bit mode. Therefore, the	-0.124939
-0.260831	be critical. Therefore, the	-0.124939
-0.481828	time consuming. Therefore, the	-0.124939
-0.260831	relative addresses. Therefore, the	-0.124939
-0.539391	possible to see the	-0.124939
-0.383201	want to see the	-0.124939
-0.383201	fail to see the	-0.124939
-0.441758	user can see the	-0.124939
-0.472829	that it allows the	-0.124939
-0.242000	"undefined". This allows the	-0.124939
-0.242000	throw(); This allows the	-0.124939
-0.255399	-ffunction-sections) which allows the	-0.124939
-0.255399	const reference allows the	-0.124939
-0.255399	out-of-order mechanism allows the	-0.124939
-0.499826	which instruction sets the	-0.124939
-0.291031	above example sets the	-0.124939
-0.291031	function __intel_cpu_features_init() sets the	-0.124939
-0.291031	and similarly sets the	-0.124939
-0.349710	intermediate code like the	-0.124939
-0.270339	level-2 cache. Using the	-0.124939
-0.270339	least temporarily. Using the	-0.124939
-0.270339	page 105). Using the	-0.124939
-0.270339	this chapter. Using the	-0.124939
-0.270339	chapter 11. Using the	-0.124939
-0.348858	brand or model the	-0.124939
-0.328031	they can block the	-0.124939
-0.328031	can possibly block the	-0.124939
-0.353039	and to put the	-0.124939
-0.496797	advantageous to put the	-0.124939
-0.319788	code and put the	-0.124939
-0.219061	you may put the	-0.124939
-0.219061	they have put the	-0.124939
-0.061503	other then put the	-0.124939
-0.061503	bytes then put the	-0.124939
-0.061503	other, then put the	-0.124939
-0.351039	each iteration needs the	-0.124939
-0.267218	limitations to what the	-0.124939
-0.267218	example shows what the	-0.124939
-0.381446	to know what the	-0.124939
-0.052997	8.7 Checking what the	-0.425969
-0.599598	to avoid running the	-0.124939
-0.327744	resources. Consider running the	-0.124939
-0.487629	@gnu_indirect_function"); // Make the	-0.124939
-0.264066	object's class. Make the	-0.124939
-0.264066	the object. Make the	-0.124939
-0.264066	function returns. Make the	-0.124939
-0.264066	following alternatives: Make the	-0.124939
-0.451752	x, and last the	-0.124939
-0.691395	before and after the	-0.124939
-0.203633	member or after the	-0.124939
-0.203633	the check after the	-0.124939
-0.042668	clock cycles after the	-0.124939
-0.203633	then output after the	-0.124939
-0.203633	execute _mm_empty() after the	-0.124939
-0.203633	remain locked after the	-0.124939
-0.469293	Trying to read the	-0.124939
-0.280673	you may read the	-0.124939
-0.280673	If you read the	-0.124939
-0.366285	would only read the	-0.124939
-0.212506	code to give the	-0.124939
-0.212506	appropriate to give the	-0.124939
-0.094457	overflow and give the	-0.124939
-0.094457	underflow and give the	-0.124939
-0.215337	does not give the	-0.124939
-0.215337	instruction doesn't give the	-0.124939
-0.215337	subsequent counts give the	-0.124939
-0.547524	machine code becomes the	-0.124939
-0.490328	because it requires the	-0.124939
-0.344061	cache to load the	-0.124939
-0.344061	takes to load the	-0.124939
-0.344061	needs to load the	-0.124939
-0.277273	code will load the	-0.124939
-0.131736	way to control the	-0.124939
-0.131736	options to control the	-0.124939
-0.518439	has to assume the	-0.124939
-0.389387	avoided by calling the	-0.124939
-0.273309	more than calling the	-0.124939
-0.115501	array before calling the	-0.124939
-0.115501	_mm256_zeroupper() before calling the	-0.124939
-0.801767	following example shows the	-0.124939
-0.237400	possible to improve the	-0.124939
-0.155392	This can improve the	-0.124939
-0.240928	You can improve the	-0.124939
-0.155392	systems can improve the	-0.124939
-0.137335	did not improve the	-0.124939
-0.068512	that may improve the	-0.124939
-0.068512	This may improve the	-0.124939
-0.137238	you may improve the	-0.124939
-0.068512	this may improve the	-0.124939
-0.137335	not only improve the	-0.124939
-0.137335	can possibly improve the	-0.124939
-0.249356	i; } Here, the	-0.124939
-0.249356	1.; } Here, the	-0.124939
-0.203557	+ i; Here, the	-0.124939
-0.203557	+= x; Here, the	-0.124939
-0.203557	+ 3.5; Here, the	-0.124939
-0.203557	i++) List[i]++; Here, the	-0.124939
-0.203557	int c1::*MemberPointer; Here, the	-0.124939
-0.703660	compiler doesn't know the	-0.124939
-0.550224	condition will generate the	-0.124939
-0.554180	order is usually the	-0.124939
-0.417872	possible to reduce the	-0.124939
-0.327593	you can reduce the	-0.124939
-0.327593	switches can reduce the	-0.124939
-0.268970	compilers cannot reduce the	-0.124939
-0.451281	when it goes the	-0.124939
-0.319874	that always goes the	-0.124939
-0.314091	done to choose the	-0.124939
-0.162386	system and choose the	-0.124939
-0.162386	operations and choose the	-0.124939
-0.285304	we may choose the	-0.124939
-0.496977	You may choose the	-0.124939
-0.203557	compiler will choose the	-0.124939
-0.203557	will automatically choose the	-0.124939
-0.131033	microprocessor has made the	-0.124939
-0.131033	reordering has made the	-0.124939
-0.636273	the simple function, the	-0.124939
-0.489759	fail to start the	-0.124939
-0.318738	iterations and start the	-0.124939
-0.121389	faster the smaller the	-0.124939
-0.121389	advantageous the smaller the	-0.124939
-0.290466	systems. The smaller the	-0.124939
-0.579371	a parenthesis around the	-0.124939
-0.316811	the circumstances around the	-0.124939
-0.447593	shows which reductions the	-0.124939
-0.508764	predicted to go the	-0.124939
-0.351648	that have tested the	-0.124939
-0.610848	I have tested the	-0.124939
-0.696224	the CPU supports the	-0.124939
-0.188448	allowed to change the	-0.124939
-0.146436	loop can change the	-0.124939
-0.146436	library can change the	-0.124939
-0.146436	We can change the	-0.124939
-0.367324	compiler may change the	-0.124939
-0.256321	compiler will change the	-0.124939
-0.188448	if we change the	-0.124939
-0.259553	to turn off the	-0.124939
-0.180232	or log off the	-0.124939
-0.067765	by turning off the	-0.124939
-0.180232	will cut off the	-0.124939
-0.346236	or PSDK). Supports the	-0.124939
-0.448023	In 64-bit Windows, the	-0.124939
-0.057961	one that gives the	-0.124939
-0.057961	method that gives the	-0.124939
-0.057961	option that gives the	-0.124939
-0.203992	instruction set gives the	-0.124939
-0.203992	these two gives the	-0.124939
-0.203992	= N&(N-1) gives the	-0.124939
-0.282934	p and inlining the	-0.124939
-0.278798	avoided by inlining the	-0.124939
-0.278798	improved by inlining the	-0.124939
-0.170302	integer types Unfortunately, the	-0.124939
-0.170302	never called. Unfortunately, the	-0.124939
-0.170302	function calls. Unfortunately, the	-0.124939
-0.170302	these purposes. Unfortunately, the	-0.124939
-0.170302	CPU dispatching. Unfortunately, the	-0.124939
-0.170302	do this. Unfortunately, the	-0.124939
-0.170302	cross-platform portability. Unfortunately, the	-0.124939
-0.159615	is to find the	-0.124939
-0.147746	order to find the	-0.124939
-0.159615	bytes to find the	-0.124939
-0.159615	able to find the	-0.124939
-0.246077	difficult to find the	-0.124939
-0.147601	you cannot find the	-0.124939
-0.147601	call. (2) find the	-0.124939
-0.396376	sure to produce the	-0.124939
-0.278649	compiler will produce the	-0.124939
-0.278649	compiler should produce the	-0.124939
-0.170302	compiler by including the	-0.124939
-0.170302	and VIA including the	-0.124939
-0.036730	the strings including the	-0.425969
-0.170302	metaprogramming features, including the	-0.124939
-0.170302	on n, including the	-0.124939
-0.170302	same computer, including the	-0.124939
-0.144102	it is outside the	-0.124939
-0.090778	stack memory outside the	-0.124939
-0.090778	function but outside the	-0.124939
-0.090778	temporary variable outside the	-0.124939
-0.090778	extra operations outside the	-0.124939
-0.090778	last element outside the	-0.124939
-0.090778	for overflow outside the	-0.124939
-0.090778	be done outside the	-0.124939
-0.090778	calculations go outside the	-0.124939
-0.090778	can move outside the	-0.124939
-0.222396	it is still the	-0.124939
-0.222396	etc. is still the	-0.124939
-0.273802	that can prevent the	-0.124939
-0.273802	the code prevent the	-0.124939
-0.273802	This will prevent the	-0.124939
-0.345295	and no destructor the	-0.124939
-0.104985	and it prevents the	-0.124939
-0.104985	because it prevents the	-0.124939
-0.059903	memory. This prevents the	-0.124939
-0.059903	thread. This prevents the	-0.124939
-0.059903	one. This prevents the	-0.124939
-0.059903	volatile. This prevents the	-0.124939
-0.112584	write instruction prevents the	-0.124939
-0.112584	It also prevents the	-0.124939
-0.112584	integer division prevents the	-0.124939
-0.062570	code to tell the	-0.124939
-0.062570	possible to tell the	-0.124939
-0.062570	always to tell the	-0.124939
-0.062570	declaration to tell the	-0.124939
-0.062570	directive to tell the	-0.124939
-0.062570	prototype to tell the	-0.124939
-0.062570	forgot to tell the	-0.124939
-0.062570	novector to tell the	-0.124939
-0.112584	references then tell the	-0.124939
-0.347440	precision. Let's repeat the	-0.124939
-0.338343	reason to unroll the	-0.124939
-0.338343	worthwhile to unroll the	-0.124939
-0.122584	different CPUs. On the	-0.124939
-0.122584	the compiler. On the	-0.124939
-0.122584	few resources. On the	-0.124939
-0.122584	data structures. On the	-0.124939
-0.122584	more difficult. On the	-0.124939
-0.122584	using hyperthreading. On the	-0.124939
-0.122584	is profitable. On the	-0.124939
-0.122584	example 9.1b. On the	-0.124939
-0.485252	In 64-bit Linux, the	-0.124939
-0.307132	*= x; Note the	-0.124939
-0.307132	variable Day. Note the	-0.124939
-0.203962	dispatcher function. When the	-0.124939
-0.203962	cache size. When the	-0.124939
-0.203962	single precision. When the	-0.124939
-0.203962	call method. When the	-0.124939
-0.203962	branch mispredictions. When the	-0.124939
-0.267504	a function. Avoid the	-0.124939
-0.267504	solutions are: Avoid the	-0.124939
-0.267504	page 93. Avoid the	-0.124939
-0.187704	done by copying the	-0.124939
-0.187704	avoided by copying the	-0.124939
-0.187704	jump by copying the	-0.124939
-0.519088	used for accessing the	-0.124939
-0.098614	off or until the	-0.124939
-0.098614	valid only until the	-0.124939
-0.098614	a variable until the	-0.124939
-0.098614	the file until the	-0.124939
-0.098614	be loaded until the	-0.124939
-0.172717	and wait until the	-0.124939
-0.098614	but waits until the	-0.124939
-0.098614	is repeated until the	-0.124939
-0.098614	be postponed until the	-0.124939
-0.484572	row by adding the	-0.124939
-0.303768	needed before adding the	-0.124939
-0.304187	course, and causes the	-0.124939
-0.304187	and free) causes the	-0.124939
-0.342880	time than processing the	-0.124939
-0.028354	is to divide the	-0.124939
-0.340151	order to divide the	-0.124939
-0.091303	need to divide the	-0.124939
-0.091303	ways to divide the	-0.124939
-0.210012	library, you divide the	-0.124939
-0.531336	able to mix the	-0.124939
-0.047233	16 to fit the	-0.124939
-0.011335	eight to fit the	-0.726999
-0.047233	necessary, to fit the	-0.124939
-0.194286	sub-vectors that fit the	-0.124939
-0.835654	able to predict the	-0.124939
-0.390943	microprocessor can predict the	-0.124939
-0.140341	temp even though the	-0.124939
-0.140341	executed even though the	-0.124939
-0.140341	expressions, even though the	-0.124939
-0.140341	b)) even though the	-0.124939
-0.193581	track backwards though the	-0.124939
-0.685167	takes to execute the	-0.124939
-0.424483	microprocessor can execute the	-0.124939
-0.301718	interpreting or compiling the	-0.124939
-0.392054	module by compiling the	-0.124939
-0.390943	and then convert the	-0.124939
-0.300816	to first convert the	-0.124939
-0.402307	by at least the	-0.124939
-0.402307	supports at least the	-0.124939
-0.111214	a class containing the	-0.124939
-0.111214	simple class containing the	-0.124939
-0.261093	the line containing the	-0.124939
-0.136726	large to handle the	-0.425969
-0.226596	the performance during the	-0.124939
-0.226596	the computer during the	-0.124939
-0.226596	will change during the	-0.124939
-0.226596	be selected during the	-0.124939
-0.181251	loop that includes the	-0.124939
-0.285529	compilers. This includes the	-0.124939
-0.181251	language also includes the	-0.124939
-0.181251	this way includes the	-0.124939
-0.181251	map file includes the	-0.124939
-0.417123	is to insert the	-0.124939
-0.241360	time and insert the	-0.124939
-0.241360	hand and insert the	-0.124939
-1.104968	you may consider the	-0.124939
-0.342166	will be loading the	-0.124939
-0.216746	all be below the	-0.124939
-0.216746	row 28 below the	-0.124939
-0.094990	elements matrix[r][c] below the	-0.124939
-0.094990	element matrix[r][c] below the	-0.124939
-0.385488	optimize, and reading the	-0.124939
-0.296378	connection with reading the	-0.124939
-0.296378	which will delay the	-0.124939
-0.296378	operation doesn't delay the	-0.124939
-0.119548	instead of calculating the	-0.124939
-0.182343	used for calculating the	-0.124939
-0.106716	processor for calculating the	-0.124939
-0.106716	support for calculating the	-0.124939
-0.106716	intended for calculating the	-0.124939
-0.119548	faster than calculating the	-0.124939
-0.119548	implicitly when calculating the	-0.124939
-0.107371	is to enable the	-0.124939
-0.183122	recommended to enable the	-0.124939
-0.107371	options to enable the	-0.124939
-0.119224	mode or enable the	-0.124939
-0.119224	This may enable the	-0.124939
-0.176406	This will enable the	-0.124939
-0.119224	instruction sets enable the	-0.124939
-0.341086	module with, e.g. the	-0.124939
-0.369489	advantageous to keep the	-0.124939
-0.369489	preferable to keep the	-0.124939
-0.341625	compiler can align the	-0.124939
-0.341625	This will allow the	-0.124939
-0.340547	time and rarely the	-0.124939
-0.254519	the performance under the	-0.124939
-0.254519	are done under the	-0.124939
-0.254519	that runs under the	-0.124939
-0.109191	if you expect the	-0.124939
-0.109191	unless you expect the	-0.124939
-0.448771	you cannot expect the	-0.124939
-0.057465	all bits except the	-0.425969
-0.902334	The reason why the	-0.124939
-0.342297	to zero whenever the	-0.124939
-0.121250	gain by unrolling the	-0.124939
-0.121250	dramatically by unrolling the	-0.124939
-0.205214	Do not swap the	-0.124939
-0.028150	you cannot swap the	-0.425969
-0.058252	You cannot swap the	-0.124939
-0.323036	necessary to modify the	-0.124939
-0.323036	classes or modify the	-0.124939
-0.244853	and don't modify the	-0.124939
-0.042903	c); // Store the	-0.425969
-0.090513	bc); // Store the	-0.124939
-0.090513	mask); // Store the	-0.124939
-0.438849	the Gnu compiler, the	-0.124939
-0.204781	test and setting the	-0.124939
-0.231554	value by setting the	-0.124939
-0.231554	2.0) by setting the	-0.124939
-0.204781	benefit from setting the	-0.124939
-0.168494	accessed from within the	-0.124939
-0.168494	to data within the	-0.124939
-0.168494	used only within the	-0.124939
-0.168494	data members within the	-0.124939
-0.168494	become obsolete within the	-0.124939
-0.627222	you should apply the	-0.124939
-0.245328	clock cycles. Obviously, the	-0.124939
-0.245328	not needed. Obviously, the	-0.124939
-0.245328	.NET framework. Obviously, the	-0.124939
-0.539019	preferable to allocate the	-0.124939
-0.331369	how to implement the	-0.124939
-0.466910	difficult to implement the	-0.124939
-0.235386	child classes implement the	-0.124939
-0.437812	it has chosen the	-0.124939
-0.337852	generations classes contain the	-0.124939
-0.101937	is to help the	-0.124939
-0.101937	order to help the	-0.124939
-0.235386	we can help the	-0.124939
-0.375097	to optimize away the	-0.124939
-0.262328	can optimize away the	-0.124939
-0.020535	time to share the	-0.124939
-0.006738	threads can share the	-0.124939
-0.006738	c can share the	-0.124939
-0.006738	simultaneously can share the	-0.124939
-0.020535	different objects share the	-0.124939
-0.020535	the threads share the	-0.124939
-0.020535	data members share the	-0.124939
-0.020535	processors usually share the	-0.124939
-0.020535	row 28 share the	-0.124939
-0.341796	count is near the	-0.124939
-0.069078	times and stores the	-0.124939
-0.069078	matrix and stores the	-0.124939
-0.213147	the function stores the	-0.124939
-0.069078	It simply stores the	-0.124939
-0.069078	pointer simply stores the	-0.124939
-0.123795	used for finding the	-0.124939
-0.124176	useful for finding the	-0.124939
-0.123795	required for finding the	-0.124939
-0.150835	else than finding the	-0.124939
-0.339163	for most purposes the	-0.124939
-0.077108	have to vectorize the	-0.124939
-0.077108	want to vectorize the	-0.124939
-0.077108	unable to vectorize the	-0.124939
-0.114416	automatically and vectorize the	-0.124939
-0.259919	compiler will vectorize the	-0.124939
-0.114416	compilers don't vectorize the	-0.124939
-0.438636	have to include the	-0.124939
-0.284761	because it involves the	-0.124939
-0.191056	However, this involves the	-0.124939
-0.191056	method also involves the	-0.124939
-0.191056	a driver involves the	-0.124939
-0.339163	Newton-Raphson iterations. Here the	-0.124939
-0.477595	otherwise optimize across the	-0.124939
-0.336743	of code once the	-0.124939
-0.337478	should never interrupt the	-0.124939
-0.337478	loop where almost the	-0.124939
-0.338952	support for multiplying the	-0.124939
-0.301718	may slow down the	-0.124939
-0.204670	operations slow down the	-0.124939
-0.296650	parameters are exactly the	-0.124939
-0.222705	methods have exactly the	-0.124939
-0.222705	are doing exactly the	-0.124939
-0.333652	Later models had the	-0.124939
-0.491735	function to measure the	-0.124939
-0.612292	in one vector, the	-0.124939
-0.432763	forget to delete the	-0.124939
-0.333652	instruction sets. Likewise, the	-0.124939
-0.432763	(In 64-bit mode, the	-0.124939
-0.333652	need to update the	-0.124939
-0.248941	The compiler generates the	-0.124939
-0.248941	Intel compiler generates the	-0.124939
-0.266397	CPUs for executing the	-0.124939
-0.491126	and after executing the	-0.124939
-0.264972	do not free the	-0.124939
-0.264972	it could free the	-0.124939
-0.198580	register to hold the	-0.124939
-0.198580	enough to hold the	-0.124939
-0.432763	smaller the system, the	-0.124939
-0.264972	The dispatcher changes the	-0.124939
-0.264972	keyword __fastcall changes the	-0.124939
-0.431718	simply by storing the	-0.124939
-0.264972	mentioned above. Now the	-0.124939
-0.264972	+ d); Now the	-0.124939
-0.205293	is to remove the	-0.124939
-0.205293	Remember to remove the	-0.124939
-0.205684	You may remove the	-0.124939
-0.762936	time to transpose the	-0.124939
-0.331494	on how predictable the	-0.124939
-0.330528	from memory plus the	-0.124939
-0.036681	order to increase the	-0.124939
-0.036681	way to increase the	-0.124939
-0.076749	you can increase the	-0.124939
-0.076749	you cannot increase the	-0.124939
-0.076749	modifications actually increase the	-0.124939
-0.293938	have to identify the	-0.124939
-0.293938	debugger to identify the	-0.124939
-0.333433	local: 1. Add the	-0.124939
-0.361632	recommended to declare the	-0.124939
-0.251906	You may declare the	-0.124939
-0.108518	size that fits the	-0.124939
-0.108518	version that fits the	-0.124939
-0.604819	used for giving the	-0.124939
-0.427806	As explained above, the	-0.124939
-0.330528	it may detect the	-0.124939
-0.429015	time and show the	-0.124939
-0.331494	branch mispredictions. Test the	-0.124939
-0.498712	able to evaluate the	-0.124939
-0.860110	pointer or reference, the	-0.124939
-0.332463	more than half the	-0.124939
-0.362670	of only half the	-0.124939
-0.431444	instructions for converting the	-0.124939
-0.252712	declared by specifying the	-0.124939
-0.252712	int, without specifying the	-0.124939
-0.332463	This closely follows the	-0.124939
-0.330528	loop counter, comparing the	-0.124939
-0.332463	mechanism can prefetch the	-0.124939
-0.333433	executed. Without static, the	-0.124939
-0.331494	Testing speed Testing the	-0.124939
-0.523395	condition. In general, the	-0.124939
-0.330528	same module (i.e. the	-0.124939
-0.331494	methods for avoiding the	-0.124939
-0.361632	save by avoiding the	-0.124939
-0.422388	CPU to increment the	-0.124939
-0.062750	important to economize the	-0.301030
-0.093872	library and economize the	-0.124939
-0.226520	way to overcome the	-0.124939
-0.369659	how to overcome the	-0.124939
-0.326192	careful when swapping the	-0.124939
-0.598844	options turned on, the	-0.124939
-0.158414	ways of reducing the	-0.124939
-0.158414	used for reducing the	-0.124939
-0.158414	operations without reducing the	-0.124939
-0.021635	not be worth the	-0.425969
-0.093872	is rarely worth the	-0.124939
-0.093872	is hardly worth the	-0.124939
-0.327340	of software specifies the	-0.124939
-0.327340	bc); // OR the	-0.124939
-0.326192	following table lists the	-0.124939
-0.310373	directives that select the	-0.124939
-0.234251	will always select the	-0.124939
-0.326192	element in list, the	-0.124939
-0.234251	In this case, the	-0.124939
-0.438471	the latter case, the	-0.124939
-0.310373	the advantages over the	-0.124939
-0.234251	to controversies over the	-0.124939
-0.914159	the other hand, the	-0.124939
-0.423822	advantageous to split the	-0.124939
-0.328492	way to limit the	-0.124939
-0.093872	needs to follow the	-0.124939
-0.093872	if you follow the	-0.124939
-0.093872	vectorization then follow the	-0.124939
-0.093872	cache lines follow the	-0.124939
-0.326192	of CPUs increased the	-0.124939
-0.234251	executable file. Only the	-0.124939
-0.234251	of ebx. Only the	-0.124939
-0.326192	This function adds the	-0.124939
-0.326192	cache sizes. Fortunately, the	-0.124939
-0.221406	always to specify the	-0.124939
-0.158414	if we specify the	-0.124939
-0.158414	as well specify the	-0.124939
-0.479871	function, but unfortunately the	-0.124939
-0.326192	be inlined. (In the	-0.124939
-0.459853	want to compare the	-0.124939
-0.235180	the stack. Is the	-0.124939
-0.235180	page 38). Is the	-0.124939
-0.235180	is enabled. Typically, the	-0.124939
-0.235180	the future. Typically, the	-0.124939
-0.235180	end user gets the	-0.124939
-0.235180	application programmer gets the	-0.124939
-0.093872	definition. This tells the	-0.124939
-0.093872	map file tells the	-0.124939
-0.021635	The profiler tells the	-0.124939
-0.180738	is to wrap the	-0.124939
-0.180738	recommended to wrap the	-0.124939
-0.327340	problem by increasing the	-0.124939
-0.328492	C++ is definitely the	-0.124939
-0.319768	Linux and BSD, the	-0.124939
-0.011431	4 2 Choosing the	-0.425969
-0.023171	23 5 Choosing the	-0.124939
-0.023171	website. 5 Choosing the	-0.124939
-0.319768	recommended to place the	-0.124939
-0.056253	CPU to overlap the	-0.124939
-0.056253	able to overlap the	-0.124939
-0.120892	capabilities can overlap the	-0.124939
-0.471065	or by turning the	-0.124939
-0.043597	possible to obtain the	-0.425969
-0.075265	file. This enables the	-0.425969
-0.398806	Let me explain the	-0.124939
-0.209044	libraries. To explain the	-0.124939
-0.319768	be weighed against the	-0.124939
-0.165705	done by declaring the	-0.124939
-0.165705	inlined by declaring the	-0.124939
-0.319768	container may move the	-0.124939
-0.319768	the container. Can the	-0.124939
-0.322602	cache always chooses the	-0.124939
-0.281792	more by choosing the	-0.124939
-0.210140	the programmer choosing the	-0.124939
-0.322602	as is commonly the	-0.124939
-0.319768	overhead of transferring the	-0.124939
-0.321183	loop and splitting the	-0.124939
-0.321183	big problem. Whenever the	-0.124939
-0.209044	and it avoids the	-0.124939
-0.209044	time but avoids the	-0.124939
-0.319768	microprocessor can begin the	-0.124939
-0.920603	In other words, the	-0.124939
-0.821713	following example illustrates the	-0.124939
-0.209044	optimal to mirror the	-0.124939
-0.209044	You may mirror the	-0.124939
-0.414386	and by changing the	-0.124939
-0.209044	order to force the	-0.124939
-0.209044	Memory-hungry applications force the	-0.124939
-0.209044	if it opens the	-0.124939
-0.209044	instruction set opens the	-0.124939
-0.120892	threads have finished the	-0.124939
-0.027216	it has finished the	-0.124939
-0.309269	32-bit mode. Storing the	-0.124939
-0.309269	smaller by reordering the	-0.124939
-0.309269	while simultaneously prefetching the	-0.124939
-0.309269	call this distance the	-0.124939
-0.309269	compiler from aligning the	-0.124939
-0.065118	takes to reload the	-0.124939
-0.309269	73 Without optimization, the	-0.124939
-0.311111	Boolean expressions. Whether the	-0.124939
-0.309269	unused. This removed the	-0.124939
-0.309269	well it optimizes the	-0.124939
-0.436989	zero // Return the	-0.124939
-0.141753	size. In fact, the	-0.124939
-0.141753	throw. In fact, the	-0.124939
-0.309269	You may ignore the	-0.124939
-0.064769	course that reflects the	-0.124939
-0.064769	loop. This reflects the	-0.124939
-0.064769	long double reflects the	-0.124939
-0.076749	you to manipulate the	-0.124939
-0.076749	us to manipulate the	-0.124939
-0.170035	code that copies the	-0.124939
-0.170035	shr ebx,31 copies the	-0.124939
-0.076749	useful to study the	-0.124939
-0.076749	important to study the	-0.124939
-0.170035	problem by bypassing the	-0.124939
-0.170035	the compiler bypassing the	-0.124939
-0.309269	explanation. Please skip the	-0.124939
-0.309269	called, it allocates the	-0.124939
-0.141753	Most compilers offer the	-0.124939
-0.141753	Other compilers offer the	-0.124939
-0.309269	} // At the	-0.124939
-0.065118	_mm256_zeroupper() before leaving the	-0.425969
-0.170035	future processors. Consider the	-0.124939
-0.170035	in loops. Consider the	-0.124939
-0.401369	520 and leave the	-0.124939
-0.311111	the user. With the	-0.124939
-0.311111	the code. Sometimes the	-0.124939
-0.565664	enough to justify the	-0.124939
-0.701283	On the contrary, the	-0.124939
-0.076749	order to cover the	-0.124939
-0.076749	big to cover the	-0.124939
-0.309269	the same way, the	-0.124939
-0.170035	too slow. Today, the	-0.124939
-0.170035	mainframe computers. Today, the	-0.124939
-0.309269	important to focus the	-0.124939
-0.309269	bounds is probably the	-0.124939
-0.401369	it for improving the	-0.124939
-0.311111	array. eax holds the	-0.124939
-0.076749	key or moving the	-0.124939
-0.076749	button or moving the	-0.124939
-0.309269	clock pulses since the	-0.124939
-0.020535	purposes is beyond the	-0.124939
-0.020535	queries is beyond the	-0.124939
-0.020535	coprocessors is beyond the	-0.124939
-0.170035	ways of organizing the	-0.124939
-0.170035	performance by organizing the	-0.124939
-0.309269	inlining can open the	-0.124939
-0.170035	spots and measuring the	-0.124939
-0.170035	this by measuring the	-0.124939
-0.076749	possible to utilize the	-0.124939
-0.076749	order to utilize the	-0.124939
-0.311111	subsequent counts represent the	-0.124939
-0.076749	test that measures the	-0.124939
-0.076749	counter that measures the	-0.124939
-0.031178	compiler can bypass the	-0.124939
-0.031178	You can bypass the	-0.124939
-0.064769	Replace or bypass the	-0.124939
-0.460274	cache will evict the	-0.124939
-0.309269	the function. Copying the	-0.124939
-0.309269	develop and market the	-0.124939
-0.309269	July 2011). Instead, the	-0.124939
-0.436989	avoided by joining the	-0.124939
-0.436989	efficient to determine the	-0.124939
-0.036681	matter of interpreting the	-0.425969
-0.289002	utility for modifying the	-0.124939
-0.289002	operating systems lack the	-0.124939
-0.289002	programs. Writing past the	-0.124939
-0.376449	you to override the	-0.124939
-0.289002	or to exit the	-0.124939
-0.289002	or variable having the	-0.124939
-0.289002	advance. This reduces the	-0.124939
-0.289002	expression better explains the	-0.124939
-0.289002	order to emulate the	-0.124939
-0.289002	to 12.1a. Enable the	-0.124939
-0.289002	page 78). Adding the	-0.124939
-0.101178	last vector. Organize the	-0.124939
-0.101178	a bottleneck. Organize the	-0.124939
-0.289002	the while loop, the	-0.124939
-0.101178	this by invoking the	-0.124939
-0.101178	program without invoking the	-0.124939
-0.289002	example, which calculates the	-0.124939
-0.047649	program. 3 Finding the	-0.124939
-0.047649	14 3 Finding the	-0.124939
-0.101178	required for putting the	-0.124939
-0.101178	2 by putting the	-0.124939
-0.047649	applications. 2.8 Overcoming the	-0.124939
-0.047649	14 2.8 Overcoming the	-0.124939
-0.289002	than necessary. Take the	-0.124939
-0.101178	but it increases the	-0.124939
-0.101178	hash table increases the	-0.124939
-0.289002	may actively invalidate the	-0.124939
-0.289002	is 4. So the	-0.124939
-0.289002	both operands. Nevertheless, the	-0.124939
-0.289002	then FuncC. Unrolling the	-0.124939
-0.289002	profiler which determines the	-0.124939
-0.101178	the logic behind the	-0.124939
-0.101178	actually hidden behind the	-0.124939
-0.101178	useful to isolate the	-0.124939
-0.101178	identify and isolate the	-0.124939
-0.289002	necessary for verifying the	-0.124939
-0.289002	* sizeof(float)). Now, the	-0.124939
-0.101178	low priority. Especially the	-0.124939
-0.101178	round addresses. Especially the	-0.124939
-0.289002	dynamic library requiring the	-0.124939
-0.289002	process can influence the	-0.124939
-0.101178	routine that loads the	-0.124939
-0.101178	application program loads the	-0.124939
-0.376449	here to draw the	-0.124939
-0.101178	advantage of sharing the	-0.124939
-0.101178	threads are sharing the	-0.124939
-0.047649	overflow and redo the	-0.124939
-0.047649	_finite()) and redo the	-0.124939
-0.529749	creating and deleting the	-0.124939
-0.289002	line that covered the	-0.124939
-0.289002	hot spot. Sometimes, the	-0.124939
-0.376449	disabled will crash the	-0.124939
-0.376449	you to reserve the	-0.124939
-0.289002	is run. Both the	-0.124939
-0.529749	program is loaded, the	-0.124939
-0.376449	size by extending the	-0.124939
-0.101178	& operator forces the	-0.124939
-0.101178	The union forces the	-0.124939
-0.047649	start and stop the	-0.124939
-0.047649	message and stop the	-0.124939
-0.376449	possible to organize the	-0.124939
-0.289002	the compiler sees the	-0.124939
-0.289002	performance and studying the	-0.124939
-0.289002	control it compares the	-0.124939
-0.047649	have to fix the	-0.124939
-0.047649	try to fix the	-0.124939
-0.289002	it may involve the	-0.124939
-0.289002	simply by removing the	-0.124939
-0.289002	the compiler interpret the	-0.124939
-0.289002	want to flip the	-0.124939
-0.289002	For these reasons, the	-0.124939
-0.289002	used for relieving the	-0.124939
-0.289002	meaning. 2. Put the	-0.124939
-0.233324	simply by ignoring the	-0.124939
-0.233324	auto_ptr that owns the	-0.124939
-0.233324	option that limits the	-0.124939
-0.233324	the sign bit, the	-0.124939
-0.233324	user interface. Otherwise the	-0.124939
-0.233324	this will trigger the	-0.124939
-0.233324	work // Re-do the	-0.124939
-0.233324	has problems separating the	-0.124939
-0.233324	few decades ago, the	-0.124939
-0.233324	you may reuse the	-0.124939
-0.233324	vector classes. Including the	-0.124939
-0.233324	pop ebx restores the	-0.124939
-0.233324	or even telling the	-0.124939
-0.233324	is fast. Calculating the	-0.124939
-0.233324	want to thank the	-0.124939
-0.233324	time consumers. Choose the	-0.124939
-0.233324	the system forbids the	-0.124939
-0.233324	existing program. Weighing the	-0.124939
-0.233324	such as eliminating the	-0.124939
-0.233324	code by emulating the	-0.124939
-0.233324	current version satisfies the	-0.124939
-0.233324	means are among the	-0.124939
-0.233324	discrete icon signaling the	-0.124939
-0.233324	way of solving the	-0.124939
-0.233324	because it lacks the	-0.124939
-0.233324	Obviously, we loose the	-0.124939
-0.233324	exception handling. Omitting the	-0.124939
-0.233324	done by controlling the	-0.124939
-0.233324	experience before trying the	-0.124939
-0.233324	simply by inverting the	-0.124939
-0.233324	compiler may interleave the	-0.124939
-0.233324	it rarely justifies the	-0.124939
-0.233324	important to weigh the	-0.124939
-0.233324	user to restart the	-0.124939
-0.233324	is occupied throughout the	-0.124939
-0.233324	variable. (This eliminates the	-0.124939
-0.233324	obtained by dropping the	-0.124939
-0.233324	In example 12.2, the	-0.124939
-0.233324	an if-else structure), the	-0.124939
-0.233324	to experience. Occasionally, the	-0.124939
-0.233324	aiming at explaining the	-0.124939
-0.233324	bits for holding the	-0.124939
-0.233324	also included. Combining the	-0.124939
-0.233324	cycles to fetch the	-0.124939
-0.233324	destructor by constructing the	-0.124939
-0.233324	new processor enters the	-0.124939
-0.233324	that can steal the	-0.124939
-0.233324	or NAN. Avoiding the	-0.124939
-0.233324	instruction to localize the	-0.124939
-0.233324	hot spot. Repeating the	-0.124939
-0.233324	in column 28, the	-0.124939
-0.233324	in addition to) the	-0.124939
-0.233324	table 9.3 shows, the	-0.124939
-0.233324	programming without paying the	-0.124939
-0.233324	compiler doesn't provide the	-0.124939
-0.233324	integer operations in-between the	-0.124939
-0.233324	are often abusing the	-0.124939
-0.233324	way by wrapping the	-0.124939
-0.233324	CPU dispatching. Underestimating the	-0.124939
-0.233324	have to reinvent the	-0.124939
-0.233324	following table summarizes the	-0.124939
-0.233324	error condition terminates the	-0.124939
-0.233324	Intel compiler puts the	-0.124939
-0.233324	and then merge the	-0.124939
-0.233324	is to combine the	-0.124939
-0.233324	to alias upon the	-0.124939
-0.233324	AND operation isolates the	-0.124939
-0.233324	efficient solution. Sort the	-0.124939
-0.233324	n and reorganize the	-0.124939
-0.233324	be faster despite the	-0.124939
-0.233324	the linker extracts the	-0.124939
-0.233324	1. This ends the	-0.124939
-0.233324	so complicated? Because the	-0.124939
-0.233324	interpreter which interprets the	-0.124939
-0.233324	the overflow. Taking the	-0.124939
-0.233324	In example 12.1a, the	-0.124939
-0.233324	conditions are met: the	-0.124939
-0.233324	examples exist. Therefore the	-0.124939
-0.233324	operation that crashes the	-0.124939
-0.233324	need to deallocate the	-0.124939
-0.233324	whole program. During the	-0.124939
-0.233324	you may view the	-0.124939
-0.233324	The trick violates the	-0.124939
-0.233324	have to consult the	-0.124939
-0.233324	In any event, the	-0.124939
-0.233324	order to minimize the	-0.124939
-0.233324	in example 12.1b, the	-0.124939
-0.233324	succeeded in applying the	-0.124939
-0.233324	takes to refresh the	-0.124939
-0.233324	program and concentrate the	-0.124939
-0.233324	thread that shares the	-0.124939
-0.233324	dependency chains, namely the	-0.124939
-0.233324	Example 14.30 finds the	-0.124939
-0.233324	a = ++b; the	-0.124939
-0.233324	programming nowadays stress the	-0.124939
-0.233324	idea to collect the	-0.124939
-0.233324	they are. Declare the	-0.124939
-0.233324	to fine- tune the	-0.124939
-0.233324	(In my tests, the	-0.124939
-0.233324	of variables. Move the	-0.124939
-0.233324	opens and closes the	-0.124939
-0.233324	sin, etc. Overriding the	-0.124939
-0.233324	intended to mimic the	-0.124939
-0.233324	it can overwrite the	-0.124939
-0.233324	risk of activating the	-0.124939
-0.233324	memory when exiting the	-0.124939
-0.233324	mechanisms often disturb the	-0.124939
-0.233324	function and replaces the	-0.124939
-0.233324	is valid. Re-interpreting the	-0.124939
-0.586823	example, that a is	-0.124939
-0.188908	evaluated if a is	-0.124939
-0.571377	b when a is	-0.124939
-0.836789	it points to is	-0.124939
-0.435080	object pointed to is	-0.124939
-0.435080	variable pointed to is	-0.124939
-0.307846	target pointed to is	-0.124939
-0.519896	class objects and is	-0.124939
-0.457246	is big and is	-0.124939
-0.457246	OS support and is	-0.124939
-0.820948	CPU dispatching and is	-0.124939
-0.457246	syntax checking and is	-0.124939
-0.353898	it is, and is	-0.124939
-0.559480	you optimized for is	-0.124939
-0.833333	a function that is	-0.124939
-0.624568	A function that is	-0.124939
-0.393576	const function that is	-0.124939
-0.393576	every function that is	-0.124939
-0.736030	the code that is	-0.124939
-0.556908	for code that is	-0.124939
-0.556908	A code that is	-0.124939
-0.395352	All code that is	-0.124939
-0.395352	complicated code that is	-0.124939
-0.415079	The time that is	-0.124939
-0.320325	of memory that is	-0.124939
-0.516968	well-structured program that is	-0.124939
-0.702506	the one that is	-0.124939
-0.583562	and one that is	-0.124939
-0.585826	instruction set that is	-0.124939
-0.471537	or class that is	-0.124939
-0.128730	integer size that is	-0.301030
-0.691923	the library that is	-0.124939
-0.451891	or object that is	-0.124939
-0.483766	generic version that is	-0.124939
-0.078355	the value that is	-0.124939
-0.047004	a value that is	-0.124939
-0.451891	A variable that is	-0.124939
-0.320325	the table that is	-0.124939
-0.288232	on software that is	-0.124939
-0.288232	Security software that is	-0.124939
-0.525763	Remove branch that is	-0.124939
-0.060696	memory address that is	-0.425969
-0.451891	important method that is	-0.124939
-0.320325	the type that is	-0.124939
-0.320325	a constant that is	-0.124939
-0.210639	the expression that is	-0.124939
-0.210639	The expression that is	-0.124939
-0.210639	An expression that is	-0.124939
-0.210639	Any expression that is	-0.124939
-0.320325	to zero that is	-0.124939
-0.320325	an offset that is	-0.124939
-0.060696	the operand that is	-0.124939
-0.451891	sure everything that is	-0.124939
-0.320325	a measure that is	-0.124939
-0.320325	runtime polymorphism that is	-0.124939
-0.320325	stack unwinding that is	-0.124939
-0.320325	constant divisor that is	-0.124939
-0.585826	initialization routine that is	-0.124939
-0.320325	table (PLT) that is	-0.124939
-0.131276	optimization. Everything that is	-0.124939
-0.131276	register. Everything that is	-0.124939
-0.320325	vectors. Code that is	-0.124939
-0.408373	processors, and it is	-0.124939
-0.330182	is that it is	-0.249877
-0.616609	so that it is	-0.124939
-0.224065	high that it is	-0.124939
-0.213070	means that it is	-0.124939
-0.283647	optimizations that it is	-0.124939
-0.283647	expensive that it is	-0.124939
-0.283647	think that it is	-0.124939
-0.283647	argue that it is	-0.124939
-0.075276	or if it is	-0.301030
-0.093276	function if it is	-0.124939
-0.093276	not if it is	-0.124939
-0.093276	time if it is	-0.124939
-0.093276	memory if it is	-0.124939
-0.166392	only if it is	-0.124939
-0.093276	loop if it is	-0.124939
-0.093276	integer if it is	-0.124939
-0.093276	efficient if it is	-0.124939
-0.093276	call if it is	-0.124939
-0.093276	thread if it is	-0.124939
-0.093276	well if it is	-0.124939
-0.044138	cycles if it is	-0.124939
-0.093276	fast if it is	-0.124939
-0.093276	see if it is	-0.124939
-0.093276	global if it is	-0.124939
-0.093276	costs if it is	-0.124939
-0.093276	destructor if it is	-0.124939
-0.093276	efficiently if it is	-0.124939
-0.093276	consider if it is	-0.124939
-0.093276	style if it is	-0.124939
-0.093276	subroutine if it is	-0.124939
-0.231063	long as it is	-0.124939
-0.231063	distributed as it is	-0.124939
-0.251386	systems than it is	-0.124939
-0.251386	purposes than it is	-0.124939
-0.503794	each time it is	-0.124939
-0.503794	every time it is	-0.124939
-0.195344	used when it is	-0.124939
-0.326894	even when it is	-0.124939
-0.195344	compiled when it is	-0.124939
-0.195344	line when it is	-0.124939
-0.195344	running when it is	-0.124939
-0.195344	permissible when it is	-0.124939
-0.176690	code then it is	-0.124939
-0.176690	compilers then it is	-0.124939
-0.176690	available then it is	-0.124939
-0.176690	execution then it is	-0.124939
-0.176690	advantageous then it is	-0.124939
-0.176690	known then it is	-0.124939
-0.176690	automatically then it is	-0.124939
-0.176690	core then it is	-0.124939
-0.176690	index then it is	-0.124939
-0.176690	efficiency then it is	-0.124939
-0.176690	module then it is	-0.124939
-0.176690	manner then it is	-0.124939
-0.176690	arrays, then it is	-0.124939
-0.176690	segment then it is	-0.124939
-0.176690	dispatching, then it is	-0.124939
-0.176690	found, then it is	-0.124939
-0.176690	fine then it is	-0.124939
-0.176690	predictable, then it is	-0.124939
-0.176690	made) then it is	-0.124939
-0.176690	small, then it is	-0.124939
-0.176690	met then it is	-0.124939
-0.310726	variable because it is	-0.124939
-0.310726	times because it is	-0.124939
-0.310726	just because it is	-0.124939
-0.310726	inefficient because it is	-0.124939
-0.310726	executable because it is	-0.124939
-0.310726	Bridge) because it is	-0.124939
-0.310726	alloca, because it is	-0.124939
-0.191412	the CPU it is	-0.124939
-0.054932	branch. If it is	-0.124939
-0.054932	first. If it is	-0.124939
-0.054932	58 If it is	-0.124939
-0.259789	on which it is	-0.124939
-0.260949	well, but it is	-0.124939
-0.260949	pointer, but it is	-0.124939
-0.260949	applications, but it is	-0.124939
-0.260949	software, but it is	-0.124939
-0.260949	subtasks, but it is	-0.124939
-0.260949	expandable, but it is	-0.124939
-0.260949	wrong, but it is	-0.124939
-0.191412	the one it is	-0.124939
-0.159652	point where it is	-0.124939
-0.190920	cases where it is	-0.124939
-0.159652	cache, where it is	-0.124939
-0.159652	data", where it is	-0.124939
-0.197014	stack before it is	-0.124939
-0.197014	values before it is	-0.124939
-0.197014	temp before it is	-0.124939
-0.197014	misprediction before it is	-0.124939
-0.191412	most libraries it is	-0.124939
-0.144734	make sure it is	-0.124939
-0.163344	most cases it is	-0.124939
-0.163344	many cases it is	-0.124939
-0.275100	some cases it is	-0.425969
-0.191412	more important it is	-0.124939
-0.191412	on, while it is	-0.124939
-0.148048	CPU. But it is	-0.124939
-0.148048	checks. But it is	-0.124939
-0.148048	issue. But it is	-0.124939
-0.191412	and therefore it is	-0.124939
-0.026598	out whether it is	-0.124939
-0.026598	consider whether it is	-0.124939
-0.026598	evaluate whether it is	-0.124939
-0.013096	deciding whether it is	-0.124939
-0.026598	determine whether it is	-0.124939
-0.191412	size. However, it is	-0.124939
-0.085245	other cases, it is	-0.124939
-0.085245	some cases, it is	-0.124939
-0.191412	the microprocessor it is	-0.124939
-0.113623	applications. Therefore, it is	-0.124939
-0.113623	number. Therefore, it is	-0.124939
-0.113623	int. Therefore, it is	-0.124939
-0.113623	78 Therefore, it is	-0.124939
-0.113623	programmed. Therefore, it is	-0.124939
-0.113623	PCs. Therefore, it is	-0.124939
-0.113623	calculated. Therefore, it is	-0.124939
-0.191412	multithreaded applications it is	-0.124939
-0.191412	} Here, it is	-0.124939
-0.371807	even though it is	-0.124939
-0.085245	the program, it is	-0.124939
-0.085245	final program, it is	-0.124939
-0.191412	reason why it is	-0.124939
-0.285203	initiative whenever it is	-0.124939
-0.191412	is used, it is	-0.124939
-0.191412	again. Obviously, it is	-0.124939
-0.191412	} Here it is	-0.124939
-0.191412	overflow. Likewise, it is	-0.124939
-0.453591	is called, it is	-0.124939
-0.191412	interrupted. Now it is	-0.124939
-0.191412	bodies above, it is	-0.124939
-0.191412	In general, it is	-0.124939
-0.191412	In C++, it is	-0.124939
-0.191412	specific event it is	-0.124939
-0.191412	other hand, it is	-0.124939
-0.191412	classes Fortunately, it is	-0.124939
-0.085245	etc. And it is	-0.124939
-0.085245	maintain. And it is	-0.124939
-0.191412	between platforms, it is	-0.124939
-0.191412	script languages, it is	-0.124939
-0.191412	crash. Furthermore, it is	-0.124939
-0.191412	other words, it is	-0.124939
-0.259789	tasks. Sometimes it is	-0.124939
-0.191412	size. Today, it is	-0.124939
-0.259789	two. Often, it is	-0.124939
-0.191412	PC. Nevertheless, it is	-0.124939
-0.191412	modern software, it is	-0.124939
-0.191412	Boolean algebra, it is	-0.124939
-0.085245	team projects, it is	-0.124939
-0.085245	one-man projects, it is	-0.124939
-0.191412	each method, it is	-0.124939
-0.191412	is accessed, it is	-0.124939
-0.191412	can see, it is	-0.124939
-0.191412	the performance, it is	-0.124939
-0.191412	do. Hence, it is	-0.124939
-0.191412	software design, it is	-0.124939
-0.191412	or bottleneck, it is	-0.124939
-0.191412	software project, it is	-0.124939
-0.191412	of habit, it is	-0.124939
-0.191412	like these, it is	-0.124939
-0.191412	in nature, it is	-0.124939
-0.473404	and the function is	-0.124939
-0.287519	that the function is	-0.124939
-0.175159	if the function is	-0.124939
-0.069311	time the function is	-0.550907
-0.805766	when the function is	-0.124939
-0.336112	case the function is	-0.124939
-0.336112	times the function is	-0.124939
-0.336112	calls the function is	-0.124939
-0.336112	calculate the function is	-0.124939
-0.473404	unless the function is	-0.124939
-0.336112	When the function is	-0.124939
-0.816611	to a function is	-0.124939
-0.299194	that a function is	-0.124939
-0.161725	time a function is	-0.425969
-0.422305	when a function is	-0.124939
-0.299194	If a function is	-0.124939
-0.422305	Inlining a function is	-0.124939
-0.379698	call. The function is	-0.124939
-0.379698	microprocessors. The function is	-0.124939
-0.232379	compilers. This function is	-0.124939
-0.232379	dispatching. This function is	-0.124939
-0.232379	overflow. This function is	-0.124939
-0.612493	of this function is	-0.124939
-0.473093	body. A function is	-0.124939
-0.284777	any other function is	-0.124939
-0.649323	times each function is	-0.124939
-0.297174	the member function is	-0.124939
-0.528524	a member function is	-0.124939
-0.231876	static member function is	-0.124939
-0.297174	virtual member function is	-0.124939
-0.297174	polymorphic member function is	-0.124939
-0.397022	the critical function is	-0.124939
-0.522422	the template function is	-0.124939
-0.277033	the virtual function is	-0.124939
-0.404431	An inline function is	-0.124939
-0.596123	the dispatcher function is	-0.124939
-0.570141	a graphics function is	-0.124939
-0.284777	a linked function is	-0.124939
-0.612493	A frame function is	-0.124939
-0.404431	ivdep Assume function is	-0.124939
-0.452530	A pure function is	-0.124939
-0.422002	the dispatched function is	-0.124939
-0.117715	A leaf function is	-0.124939
-0.284777	the lrint function is	-0.124939
-0.284777	The InstructionSet() function is	-0.124939
-0.284777	a user-defined function is	-0.124939
-0.284777	The sin function is	-0.124939
-0.357545	time while if is	-0.124939
-0.356613	multiply j by is	-0.124939
-0.576761	surely rely on is	-0.124939
-0.676253	of the code is	-0.124939
-0.437300	that the code is	-0.124939
-0.326870	if the code is	-0.124939
-0.150168	when the code is	-0.249877
-0.767935	then the code is	-0.124939
-0.290064	If the code is	-0.124939
-0.404307	case the code is	-0.124939
-0.404307	whether the code is	-0.124939
-0.404307	All the code is	-0.124939
-0.156566	branch of code is	-0.124939
-0.403791	lot of code is	-0.124939
-0.851652	piece of code is	-0.124939
-0.444178	functions. The code is	-0.124939
-0.444178	compilers. The code is	-0.124939
-0.444178	3. The code is	-0.124939
-0.287284	The function code is	-0.124939
-0.425444	with this code is	-0.124939
-0.197468	of program code is	-0.124939
-0.087606	The program code is	-0.425969
-0.287284	maintaining such code is	-0.124939
-0.092888	the system code is	-0.124939
-0.282666	of intermediate code is	-0.124939
-0.401652	on intermediate code is	-0.124939
-0.330426	an intermediate code is	-0.425969
-0.735900	The above code is	-0.124939
-0.374350	The source code is	-0.124939
-0.120308	that your code is	-0.124939
-0.120308	before your code is	-0.124939
-0.479730	special position-independent code is	-0.124939
-0.425444	the machine code is	-0.124939
-0.211206	below. Position-independent code is	-0.124939
-0.211206	default. Position-independent code is	-0.124939
-0.374350	size. Vectorized code is	-0.124939
-0.374350	The built-in code is	-0.124939
-0.287284	the unsafe code is	-0.124939
-0.287284	script. Interpreted code is	-0.124939
-0.287284	it. Complicated code is	-0.124939
-0.354631	compilers, etc., as is	-0.124939
-0.354631	of vectors, as is	-0.124939
-0.150913	0 // This is	-0.124939
-0.068934	16; // This is	-0.425969
-0.150913	method. // This is	-0.124939
-0.150913	time1; // This is	-0.124939
-0.613011	1; } This is	-0.124939
-0.540730	the code. This is	-0.124939
-0.046696	the time. This is	-0.124939
-0.046696	a time. This is	-0.124939
-0.046696	first time. This is	-0.124939
-0.046696	extra time. This is	-0.124939
-0.227513	10 Gnu This is	-0.124939
-0.736459	the function. This is	-0.124939
-0.330474	intrinsic functions. This is	-0.124939
-0.227513	|| b; This is	-0.124939
-0.354357	into memory. This is	-0.124939
-0.354357	a program. This is	-0.124939
-0.176754	the data. This is	-0.124939
-0.267084	of data. This is	-0.124939
-0.465689	is called. This is	-0.124939
-0.427739	different CPUs. This is	-0.124939
-0.427739	innermost loop. This is	-0.124939
-0.302357	memory pointer. This is	-0.124939
-0.302357	both cases. This is	-0.124939
-0.227513	cache size. This is	-0.124939
-0.330474	child class. This is	-0.124939
-0.227513	to it. This is	-0.124939
-0.302357	local variable. This is	-0.124939
-0.302357	other purposes. This is	-0.124939
-0.227513	point instructions. This is	-0.124939
-0.227513	the vector. This is	-0.124939
-0.302357	as well. This is	-0.124939
-0.227513	function returns. This is	-0.124939
-0.427739	memory block. This is	-0.124939
-0.227513	same value. This is	-0.124939
-0.227513	operating system. This is	-0.124939
-0.227513	23 software. This is	-0.124939
-0.227513	cache line. This is	-0.124939
-0.227513	its parameters. This is	-0.124939
-0.302357	vector simultaneously. This is	-0.124939
-0.227513	Visual Studio This is	-0.124939
-0.227513	the processor. This is	-0.124939
-0.176754	of bits. This is	-0.124939
-0.176754	64 bits. This is	-0.124939
-0.302357	to do. This is	-0.124939
-0.302357	by 16. This is	-0.124939
-0.302357	hundred times. This is	-0.124939
-0.227513	stamp counter. This is	-0.124939
-0.227513	or structure. This is	-0.124939
-0.227513	Digital Mars This is	-0.124939
-0.227513	filled up. This is	-0.124939
-0.227513	all objects. This is	-0.124939
-0.227513	invalid pointers. This is	-0.124939
-0.227513	subsequent counts. This is	-0.124939
-0.227513	reproducible results. This is	-0.124939
-0.227513	point addition. This is	-0.124939
-0.227513	are long. This is	-0.124939
-0.227513	#include directives. This is	-0.124939
-0.427739	VIA CPUs"). This is	-0.124939
-0.227513	the same. This is	-0.124939
-0.227513	each other. This is	-0.124939
-0.227513	of truncation. This is	-0.124939
-0.227513	of x. This is	-0.124939
-0.227513	interface frameworks. This is	-0.124939
-0.227513	is compiled. This is	-0.124939
-0.227513	= 32. This is	-0.124939
-0.227513	function declaration. This is	-0.124939
-0.227513	by default. This is	-0.124939
-0.227513	than rounding. This is	-0.124939
-0.227513	even temporarily. This is	-0.124939
-0.227513	be predicted. This is	-0.124939
-0.227513	with alloca. This is	-0.124939
-0.227513	label $B1$2:. This is	-0.124939
-0.227513	called before. This is	-0.124939
-0.227513	also de-allocated. This is	-0.124939
-0.227513	address [ecx+eax*4]. This is	-0.124939
-0.227513	+ 0.666666666666666666667; This is	-0.124939
-0.227513	vacant spaces. This is	-0.124939
-0.227513	of if. This is	-0.124939
-0.227513	template specialization. This is	-0.124939
-0.227513	ever happens. This is	-0.124939
-0.227513	0; 35 This is	-0.124939
-0.227513	64 kbytes. This is	-0.124939
-0.227513	library (SVML). This is	-0.124939
-0.227513	quite often. This is	-0.124939
-0.227513	plus i*sizeof(S1). This is	-0.124939
-0.227513	or CString. This is	-0.124939
-0.227513	is deprecated. This is	-0.124939
-0.227513	as ((a+b)+c)+d. This is	-0.124939
-0.227513	time measure. This is	-0.124939
-0.227513	of usability. This is	-0.124939
-0.227513	xor eax,eax. This is	-0.124939
-0.246638	of an int is	-0.124939
-0.246638	while an int is	-0.124939
-0.589481	A short int is	-0.124939
-0.881175	If the compiler is	-0.124939
-0.956103	but the compiler is	-0.124939
-0.187710	where the compiler is	-0.425969
-0.761612	example, the compiler is	-0.124939
-0.525585	Therefore the compiler is	-0.124939
-0.517223	solution. The compiler is	-0.124939
-0.517223	big. The compiler is	-0.124939
-0.517223	x. The compiler is	-0.124939
-0.417641	well. This compiler is	-0.124939
-0.515213	the Intel compiler is	-0.124939
-0.556704	The Intel compiler is	-0.124939
-0.336670	the C++ compiler is	-0.124939
-0.336670	Gnu C++ compiler is	-0.124939
-1.133370	The Gnu compiler is	-0.124939
-0.589628	The Clang compiler is	-0.124939
-0.322384	Digital Mars compiler is	-0.124939
-0.536782	address of x is	-0.124939
-0.356007	is that x is	-0.124939
-0.512736	purpose of this is	-0.124939
-0.706580	reason for this is	-0.124939
-0.437379	compiler that this is	-0.124939
-0.319012	or if this is	-0.124939
-0.319012	know if this is	-0.124939
-0.309560	long as this is	-0.124939
-0.401729	learn from this is	-0.124939
-0.375073	position-independent because this is	-0.124939
-0.375073	0x80000000; because this is	-0.124939
-0.401729	size. If this is	-0.124939
-0.235598	course, but this is	-0.124939
-0.235598	library, but this is	-0.124939
-0.235598	occurs, but this is	-0.124939
-0.235598	factorials, but this is	-0.124939
-0.235598	2-20, but this is	-0.124939
-0.235598	-ftrapv, but this is	-0.124939
-0.309560	preceding example, this is	-0.124939
-0.309560	to test this is	-0.124939
-0.309560	operating system this is	-0.124939
-0.456365	maintenance. However, this is	-0.124939
-0.127763	variables. Obviously, this is	-0.124939
-0.127763	finished. Obviously, this is	-0.124939
-0.309560	but unfortunately this is	-0.124939
-0.830203	of the time is	-0.425969
-0.327353	resources. This time is	-0.124939
-0.327353	measurement. If time is	-0.124939
-0.492555	and much time is	-0.124939
-0.101121	the calculation time is	-0.124939
-0.327353	The conversion time is	-0.124939
-0.426648	the response time is	-0.124939
-0.327353	parameter. No time is	-0.124939
-0.327353	The measured time is	-0.124939
-0.327353	processor. Extra time is	-0.124939
-0.327353	overall computation time is	-0.124939
-0.460491	method you use is	-0.124939
-0.169756	value of A is	-0.124939
-0.154297	calculation of A is	-0.124939
-0.038902	-0 } It is	-0.124939
-0.038902	109 } It is	-0.124939
-0.081635	intrinsic functions It is	-0.124939
-0.081635	an object It is	-0.124939
-0.081635	two libraries It is	-0.124939
-0.019016	the code. It is	-0.124939
-0.019016	integer code. It is	-0.124939
-0.019016	extra code. It is	-0.124939
-0.019016	source code. It is	-0.124939
-0.085031	long time. It is	-0.124939
-0.097931	longer time. It is	-0.124939
-0.081635	function pointers It is	-0.124939
-0.081635	maps etc. It is	-0.124939
-0.084463	intrinsic functions. It is	-0.124939
-0.084463	unreferenced functions. It is	-0.124939
-0.084463	the memory. It is	-0.124939
-0.084463	allocated memory. It is	-0.124939
-0.114099	is used. It is	-0.124939
-0.048684	are used. It is	-0.124939
-0.048684	longer used. It is	-0.124939
-0.048684	seldom used. It is	-0.124939
-0.081635	64-bit systems. It is	-0.124939
-0.081635	these data. It is	-0.124939
-0.081635	instruction set. It is	-0.124939
-0.081635	VIA processors. It is	-0.124939
-0.081635	be called. It is	-0.124939
-0.081635	same compiler. It is	-0.124939
-0.081635	registers are: It is	-0.124939
-0.081635	the loop. It is	-0.124939
-0.421203	a pointer. It is	-0.124939
-0.172512	'this' pointer. It is	-0.124939
-0.081635	best cases. It is	-0.124939
-0.081635	induction variables. It is	-0.124939
-0.081635	function calls. It is	-0.124939
-0.081635	shared object. It is	-0.124939
-0.038902	integer calculations. It is	-0.124939
-0.038902	mathematical calculations. It is	-0.124939
-0.081635	full optimization. It is	-0.124939
-0.081635	improve performance. It is	-0.124939
-0.081635	each thread. It is	-0.124939
-0.081635	data structures It is	-0.124939
-0.081635	or references. It is	-0.124939
-0.081635	in Windows. It is	-0.124939
-0.084463	to integers. It is	-0.124939
-0.084463	signed integers. It is	-0.124939
-0.038902	it to. It is	-0.124939
-0.038902	apply to. It is	-0.124939
-0.081635	not critical. It is	-0.124939
-0.081635	was executed. It is	-0.124939
-0.081635	software faster. It is	-0.124939
-0.038902	cache problems. It is	-0.124939
-0.038902	alignment problems. It is	-0.124939
-0.081635	point expressions. It is	-0.124939
-0.081635	the arrays. It is	-0.124939
-0.081635	element zero. It is	-0.124939
-0.081635	assembly language. It is	-0.124939
-0.081635	it is. It is	-0.124939
-0.081635	code automatically. It is	-0.124939
-0.038902	the core. It is	-0.124939
-0.038902	same core. It is	-0.124939
-0.081635	automatic vectorization. It is	-0.124939
-0.081635	to do. It is	-0.124939
-0.081635	precision constant. It is	-0.124939
-0.081635	data members. It is	-0.124939
-0.081635	response times. It is	-0.124939
-0.081635	program structure. It is	-0.124939
-0.081635	a profiler. It is	-0.124939
-0.081635	exception handling. It is	-0.124939
-0.081635	= a[i]; It is	-0.124939
-0.081635	out-of-order execution. It is	-0.124939
-0.081635	mouse input. It is	-0.124939
-0.081635	own IDE. It is	-0.124939
-0.081635	dynamic versions. It is	-0.124939
-0.081635	number information. It is	-0.124939
-0.081635	user interface. It is	-0.124939
-0.081635	of unit-testing It is	-0.124939
-0.081635	programming style. It is	-0.124939
-0.081635	the counts. It is	-0.124939
-0.081635	files smaller. It is	-0.124939
-0.081635	16-bit programs. It is	-0.124939
-0.081635	particular part. It is	-0.124939
-0.081635	Cache organization It is	-0.124939
-0.133778	specific purpose. It is	-0.124939
-0.081635	non-sequential manner. It is	-0.124939
-0.081635	on x. It is	-0.124939
-0.081635	new context. It is	-0.124939
-0.081635	are aligned. It is	-0.124939
-0.081635	and throw. It is	-0.124939
-0.081635	of course. It is	-0.124939
-0.081635	page 73). It is	-0.124939
-0.081635	has disadvantages: It is	-0.124939
-0.081635	optimized away. It is	-0.124939
-0.081635	data decomposition. It is	-0.124939
-0.081635	are lost. It is	-0.124939
-0.081635	to read. It is	-0.124939
-0.081635	data. 148 It is	-0.124939
-0.081635	with Gnu. It is	-0.124939
-0.081635	dispatcher updated. It is	-0.124939
-0.081635	below shows. It is	-0.124939
-0.081635	more efficiently. It is	-0.124939
-0.081635	doing divisions. It is	-0.124939
-0.081635	page 72. It is	-0.124939
-0.081635	error message. It is	-0.124939
-0.081635	final product. It is	-0.124939
-0.081635	and off. It is	-0.124939
-0.081635	to diagnose. It is	-0.124939
-0.081635	Feature bloat. It is	-0.124939
-0.081635	compile-time polymorphism. It is	-0.124939
-0.081635	mouse move. It is	-0.124939
-0.081635	iterations ahead. It is	-0.124939
-0.081635	dispatch strategies It is	-0.124939
-0.081635	C-style type-casting. It is	-0.124939
-0.081635	for response. It is	-0.124939
-0.081635	is happening. It is	-0.124939
-0.081635	not standardized. It is	-0.124939
-0.081635	quite convenient. It is	-0.124939
-0.081635	or animation. It is	-0.124939
-0.081635	<xmmintrin.h> _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON); It is	-0.124939
-0.081635	programmer can. It is	-0.124939
-0.081635	tedious indeed. It is	-0.124939
-0.081635	for hackers. It is	-0.124939
-0.081635	user friendly. It is	-0.124939
-0.081635	page 54. It is	-0.124939
-0.081635	these considerations. It is	-0.124939
-0.081635	performs poorly. It is	-0.124939
-0.081635	first-in-last-out fashion. It is	-0.124939
-0.081635	choose between. It is	-0.124939
-0.081635	page 130. It is	-0.124939
-0.081635	p. 57). It is	-0.124939
-0.081635	for correctness. It is	-0.124939
-0.081635	page 61. It is	-0.124939
-0.081635	or sizes? It is	-0.124939
-0.081635	to a*b*c*2. It is	-0.124939
-0.081635	with double's. It is	-0.124939
-0.081635	memory pooling. It is	-0.124939
-0.081635	many decimals. It is	-0.124939
-0.081635	to develop. It is	-0.124939
-0.081635	memory leaks. It is	-0.124939
-0.081635	is costless. It is	-0.124939
-0.081635	a queue. It is	-0.124939
-1.170221	of the memory is	-0.124939
-0.552930	distance in memory is	-0.124939
-0.553939	The static memory is	-0.124939
-0.758920	The allocated memory is	-0.124939
-0.488788	of static data is	-0.124939
-0.347272	on input data is	-0.124939
-0.448857	make thread-specific data is	-0.124939
-0.347272	containing numerical data is	-0.124939
-0.620873	of the program is	-0.124939
-0.448853	if the program is	-0.124939
-0.460765	time the program is	-0.124939
-0.114552	when the program is	-0.191886
-0.326862	but the program is	-0.124939
-0.516469	before the program is	-0.124939
-0.133380	while the program is	-0.425969
-0.326862	after the program is	-0.124939
-0.326862	When the program is	-0.124939
-0.326862	until the program is	-0.124939
-0.657436	of a program is	-0.124939
-0.612896	in a program is	-0.124939
-0.478769	called. The program is	-0.124939
-0.623730	the test program is	-0.124939
-0.279475	highly optimized program is	-0.124939
-0.177776	console mode program is	-0.124939
-0.279475	the calling program is	-0.124939
-0.279475	computationally intensive program is	-0.124939
-0.487560	calls other functions is	-0.124939
-0.504668	and member functions is	-0.124939
-0.726195	in member functions is	-0.124939
-0.487560	of these functions is	-0.124939
-1.242701	of the CPU is	-0.124939
-0.352645	4 (NetBurst) CPU is	-0.124939
-0.850226	and the other is	-0.124939
-0.556866	from each other is	-0.124939
-0.900192	bit scan instruction is	-0.124939
-0.446767	but the point is	-0.124939
-0.656528	to floating point is	-0.425969
-0.446767	the decimal point is	-0.124939
-0.666944	if the loop is	-0.124939
-1.073021	inside the loop is	-0.124939
-0.520816	unless the loop is	-0.124939
-1.231315	of a loop is	-0.124939
-0.769957	the while loop is	-0.124939
-1.118780	the innermost loop is	-0.124939
-0.240285	new compiler which is	-0.124939
-0.240285	stack memory which is	-0.124939
-0.488034	'this' pointer which is	-0.124939
-0.240285	point library which is	-0.124939
-0.240285	shared object which is	-0.124939
-0.240285	a variable which is	-0.124939
-0.317573	memory address which is	-0.124939
-0.240285	single bit which is	-0.124939
-0.240285	debugging support which is	-0.124939
-0.240285	complicated process which is	-0.124939
-0.240285	shift operation which is	-0.124939
-0.317573	intermediate code, which is	-0.124939
-0.240285	C++ compiler, which is	-0.124939
-0.240285	special trick which is	-0.124939
-0.240285	32-bit integers, which is	-0.124939
-0.240285	line size, which is	-0.124939
-0.240285	a latency which is	-0.124939
-0.048776	is true, which is	-0.124939
-0.317573	start up, which is	-0.124939
-0.448169	the stack, which is	-0.124939
-0.240285	family number, which is	-0.124939
-0.240285	a division, which is	-0.124939
-0.240285	integer comparison, which is	-0.124939
-0.240285	loop counter, which is	-0.124939
-0.240285	garbage collector which is	-0.124939
-0.103732	& operation, which is	-0.124939
-0.103732	shift operation, which is	-0.124939
-0.240285	library (DLL) which is	-0.124939
-0.240285	by x<<3, which is	-0.124939
-0.240285	generic branch, which is	-0.124939
-0.240285	library asmlib, which is	-0.124939
-0.240285	compile-time polymorphism, which is	-0.124939
-0.240285	for everything, which is	-0.124939
-0.240285	language output, which is	-0.124939
-0.240285	Basic .NET, which is	-0.124939
-0.240285	a bit-mask which is	-0.124939
-0.240285	constant 2.5, which is	-0.124939
-0.521517	used at all is	-0.124939
-0.345331	by Intel but is	-0.124939
-0.345331	of C++ but is	-0.124939
-0.446406	Atom processors, but is	-0.124939
-0.345331	under test, but is	-0.124939
-1.257257	that is used is	-0.124939
-0.344256	or if one is	-0.124939
-0.242158	of which one is	-0.124939
-0.242158	out which one is	-0.124939
-0.631005	the preceding one is	-0.124939
-0.532192	how a cache is	-0.124939
-1.067389	the code cache is	-0.124939
-0.421017	data A cache is	-0.124939
-0.382580	The data cache is	-0.124939
-0.606015	level-1 data cache is	-0.124939
-1.106237	the same cache is	-0.124939
-0.843255	the level-2 cache is	-0.124939
-0.446420	The level-2 cache is	-0.124939
-0.499214	entire level-1 cache is	-0.124939
-0.544742	if the integer is	-0.124939
-0.515794	whether an integer is	-0.124939
-0.515794	When an integer is	-0.124939
-0.339563	the even integer is	-0.124939
-0.478145	s; An integer is	-0.124939
-0.372935	the instruction set is	-0.124939
-0.182124	This instruction set is	-0.124939
-0.313179	which instruction set is	-0.124939
-0.096390	SSE2 instruction set is	-0.284640
-0.213812	32 instruction set is	-0.124939
-0.208247	AVX instruction set is	-0.249877
-0.099916	later instruction set is	-0.249877
-0.313179	higher instruction set is	-0.124939
-0.352090	x86 instruction set is	-0.124939
-0.093879	SSE4.1 instruction set is	-0.124939
-0.213812	newest instruction set is	-0.124939
-0.213812	AVX512 instruction set is	-0.124939
-0.352090	CISC instruction set is	-0.124939
-0.213812	later) instruction set is	-0.124939
-0.928810	of the class is	-0.124939
-1.131412	of a class is	-0.124939
-0.622621	function or class is	-0.124939
-0.674374	structure or class is	-0.124939
-0.370652	52 or class is	-0.124939
-0.407648	a template class is	-0.124939
-0.287217	above template class is	-0.124939
-0.504304	The child class is	-0.124939
-0.824204	a derived class is	-0.124939
-0.318870	a base class is	-0.124939
-1.249074	have to do is	-0.124939
-1.264642	you can do is	-0.124939
-0.333202	note: This example is	-0.124939
-0.558095	in this example is	-0.124939
-0.333202	executed. An example is	-0.124939
-0.135400	this. My example is	-0.124939
-0.135400	example. My example is	-0.124939
-0.523140	from different compilers is	-0.124939
-1.017718	float and double is	-0.124939
-0.345901	complications. A double is	-0.124939
-0.634194	a 64-bit double is	-0.124939
-1.211113	if the size is	-0.124939
-0.404898	Optimizing for size is	-0.124939
-0.312121	where cache size is	-0.124939
-0.418068	The integer size is	-0.124939
-0.418068	particular integer size is	-0.124939
-0.459962	vector register size is	-0.124939
-0.312121	sure its size is	-0.124939
-0.312121	a specific size is	-0.124939
-0.922449	cache line size is	-0.124939
-0.312121	a smaller size is	-0.124939
-0.312121	the RAM size is	-0.124939
-0.511138	of the pointer is	-0.124939
-0.323409	when the pointer is	-0.124939
-0.323409	before the pointer is	-0.124939
-0.306642	pointer. The pointer is	-0.124939
-0.847089	the function pointer is	-0.124939
-0.437404	member function pointer is	-0.124939
-0.306642	member. This pointer is	-0.124939
-0.398124	and this pointer is	-0.124939
-0.433467	arithmetic A pointer is	-0.124939
-0.398124	aligned(16))) Assume pointer is	-0.124939
-0.446888	a smart pointer is	-0.124939
-0.446888	A smart pointer is	-0.124939
-0.281202	value of b is	-0.124939
-0.281202	offset of b is	-0.124939
-0.526583	time and b is	-0.124939
-0.310268	assume that b is	-0.124939
-0.156230	not if b is	-0.124939
-0.071127	b if b is	-0.425969
-0.059300	only when b is	-0.124939
-0.059300	efficient when b is	-0.124939
-0.059300	implementation when b is	-0.124939
-0.059300	however, when b is	-0.124939
-0.429006	if the library is	-0.124939
-0.429006	when the library is	-0.124939
-0.606624	from the library is	-0.124939
-0.329999	long vector library is	-0.124939
-0.296820	a dynamic library is	-0.124939
-0.137793	number when i is	-0.124939
-0.137793	invalid when i is	-0.124939
-0.340793	15. If i is	-0.124939
-0.340793	loop counter i is	-0.124939
-0.493361	of the object is	-0.124939
-0.493361	if the object is	-0.124939
-0.128495	when the object is	-0.124939
-0.311788	If the object is	-0.124939
-0.311788	sure the object is	-0.124939
-0.311788	met: the object is	-0.124939
-0.654870	variable or object is	-0.124939
-0.447427	time an object is	-0.124939
-0.447427	whenever an object is	-0.124939
-0.276817	that no object is	-0.124939
-0.508754	a new object is	-0.124939
-0.416408	a shared object is	-0.124939
-0.276817	reasons: Each object is	-0.124939
-0.508754	the local object is	-0.124939
-0.276817	the original object is	-0.124939
-1.134847	floating point number is	-0.124939
-0.351090	a higher number is	-0.124939
-0.355293	the word static is	-0.124939
-0.253730	matter and there is	-0.124939
-0.253730	common, and there is	-0.124939
-0.228217	way that there is	-0.124939
-0.331366	assume that there is	-0.124939
-0.228217	feature that there is	-0.124939
-0.331366	aware that there is	-0.124939
-0.228217	complex, that there is	-0.124939
-0.177419	or if there is	-0.124939
-0.177419	program if there is	-0.124939
-0.177419	integer if there is	-0.124939
-0.177419	thread if there is	-0.124939
-0.177419	bigger if there is	-0.124939
-0.177419	smaller if there is	-0.124939
-0.177419	exponent if there is	-0.124939
-0.177419	especially if there is	-0.124939
-0.177419	consider if there is	-0.124939
-0.177419	i.e. if there is	-0.124939
-0.177419	accumulators if there is	-0.124939
-0.130730	convenience - there is	-0.124939
-0.060465	efficient when there is	-0.124939
-0.060465	critical when there is	-0.124939
-0.015639	time then there is	-0.124939
-0.031861	memory then there is	-0.124939
-0.031861	functions then there is	-0.124939
-0.031861	two then there is	-0.124939
-0.031861	performance then there is	-0.124939
-0.031861	error then there is	-0.124939
-0.031861	etc. then there is	-0.124939
-0.031861	elsewhere then there is	-0.124939
-0.211079	negligible because there is	-0.124939
-0.252957	running. If there is	-0.124939
-0.252957	calculate. If there is	-0.124939
-0.196777	precision, but there is	-0.124939
-0.196777	bases, but there is	-0.124939
-0.189556	operations where there is	-0.124939
-0.130730	this case there is	-0.124939
-0.288860	pointer. But there is	-0.124939
-0.130730	and whether there is	-0.124939
-0.039360	default unless there is	-0.124939
-0.039360	object, unless there is	-0.124939
-0.039360	manually unless there is	-0.124939
-0.084517	most cases, there is	-0.124939
-0.097577	some cases, there is	-0.124939
-0.130730	reason why there is	-0.124939
-0.130730	(Of course there is	-0.124939
-0.130730	are used, there is	-0.124939
-0.130730	the diagonal there is	-0.124939
-0.130730	64-bit systems, there is	-0.124939
-0.130730	cases, however, there is	-0.124939
-0.130730	In general, there is	-0.124939
-0.130730	but unfortunately there is	-0.124939
-0.189556	caches. Typically, there is	-0.124939
-0.130730	is enabled there is	-0.124939
-0.411461	Windows and C++ is	-0.124939
-0.493800	Development in C++ is	-0.124939
-0.317414	disadvantages when C++ is	-0.124939
-0.317414	platforms. However, C++ is	-0.124939
-0.317414	function libraries. C++ is	-0.124939
-0.317414	several reasons. C++ is	-0.124939
-0.317414	page 15. C++ is	-0.124939
-0.317414	14 Portability C++ is	-0.124939
-0.179999	is double There is	-0.124939
-0.179999	different functions. There is	-0.124939
-0.179999	{ ... There is	-0.124939
-0.179999	the systems. There is	-0.124939
-0.246460	logical processors. There is	-0.124939
-0.179999	Development process There is	-0.124939
-0.179999	was called. There is	-0.124939
-0.179999	free are: There is	-0.124939
-0.179999	vector size. There is	-0.124939
-0.179999	same object. There is	-0.124939
-0.179999	different way. There is	-0.124939
-0.179999	internal references. There is	-0.124939
-0.179999	function returns. There is	-0.124939
-0.179999	memory allocation. There is	-0.124939
-0.179999	becomes inefficient. There is	-0.124939
-0.179999	try block. There is	-0.124939
-0.179999	template parameter. There is	-0.124939
-0.179999	maximum value. There is	-0.124939
-0.179999	control branch. There is	-0.124939
-0.179999	four parameters. There is	-0.124939
-0.179999	higher bits. There is	-0.124939
-0.179999	comes automatically. There is	-0.124939
-0.179999	unit throughput There is	-0.124939
-0.179999	time- consuming. There is	-0.124939
-0.179999	out-of-order execution. There is	-0.124939
-0.179999	AVX support. There is	-0.124939
-0.179999	the screen. There is	-0.124939
-0.179999	application programmer. There is	-0.124939
-0.179999	is created. There is	-0.124939
-0.179999	p. 43). There is	-0.124939
-0.179999	and Gnu. There is	-0.124939
-0.179999	12.10 Conclusion There is	-0.124939
-0.179999	p. 87). There is	-0.124939
-0.179999	be recycled? There is	-0.124939
-0.179999	as x4∙xn-4. There is	-0.124939
-0.179999	7.33 Namespaces There is	-0.124939
-0.179999	is returned. There is	-0.124939
-0.179999	or .so). There is	-0.124939
-0.528211	of the array is	-0.124939
-0.411846	which the array is	-0.124939
-0.576670	of each array is	-0.124939
-0.576670	a simple array is	-0.124939
-0.476512	Arrays An array is	-0.124939
-0.684867	or multidimensional array is	-0.124939
-0.315333	a fixed-size array is	-0.124939
-0.355450	2eee 1.fffff, where is	-0.124939
-0.138409	Each code version is	-0.425969
-0.429967	The 64-bit version is	-0.124939
-0.332254	The Windows version is	-0.124939
-0.332254	directly compiled version is	-0.124939
-1.186745	that the value is	-0.124939
-0.520944	then the value is	-0.124939
-0.520944	unless the value is	-0.124939
-0.485316	that a value is	-0.124939
-0.498904	. The value is	-0.124939
-0.721622	that each value is	-0.124939
-0.415609	then its value is	-0.124939
-0.109965	number of objects is	-0.124939
-0.356375	if the variable is	-0.124939
-0.356375	If the variable is	-0.124939
-0.142615	which the variable is	-0.124939
-0.356375	sure the variable is	-0.124939
-0.713143	of a variable is	-0.124939
-0.297524	that a variable is	-0.425969
-0.551261	function or variable is	-0.124939
-0.464046	data A variable is	-0.124939
-0.337931	to do so is	-0.124939
-0.993976	floating point variables is	-0.124939
-0.709182	point register variables is	-0.124939
-0.378794	integer register variables is	-0.124939
-1.747882	power of 2 is	-0.124939
-0.426907	that the table is	-0.124939
-0.426907	if the table is	-0.124939
-0.728877	calculate the table is	-0.124939
-0.482183	so-called virtual table is	-0.124939
-0.748534	a lookup table is	-0.124939
-0.492094	but the performance is	-0.124939
-0.492094	case, the performance is	-0.124939
-0.468190	processors. The performance is	-0.124939
-0.309574	good code performance is	-0.124939
-0.309574	time when performance is	-0.124939
-0.025322	The best performance is	-0.602060
-0.346899	the application software is	-0.124939
-0.346899	of CPU-intensive software is	-0.124939
-0.349541	the storage order is	-0.124939
-0.349541	The link order is	-0.124939
-0.526293	if the branch is	-0.124939
-0.469394	function and branch is	-0.124939
-0.415696	The if branch is	-0.124939
-0.423680	loop control branch is	-0.124939
-0.320821	the wrong branch is	-0.124939
-0.643501	a data member is	-0.124939
-0.382216	class data member is	-0.124939
-0.572192	in this way is	-0.124939
-1.007890	the same way is	-0.124939
-0.189230	the other way is	-0.425969
-0.405830	The first way is	-0.124939
-0.312874	The second way is	-0.124939
-0.312874	most compatible way is	-0.124939
-0.896291	number of elements is	-0.124939
-0.455734	types of elements is	-0.124939
-0.325379	If this address is	-0.124939
-0.325379	if its address is	-0.124939
-0.336307	the target address is	-0.124939
-0.232108	The target address is	-0.124939
-0.325379	variable whose address is	-0.124939
-0.569657	intrinsic function call is	-0.124939
-0.716042	the carry bit is	-0.124939
-0.509614	opposite of register is	-0.124939
-0.337899	memory. A register is	-0.124939
-0.524925	each vector register is	-0.124939
-0.409151	level of optimization is	-0.124939
-0.288356	degree of optimization is	-0.124939
-0.415301	18 software optimization is	-0.124939
-0.320503	when interprocedural optimization is	-0.124939
-0.320503	program 81 optimization is	-0.124939
-0.583204	or function libraries is	-0.124939
-0.508275	Templates A template is	-0.124939
-0.062951	The powN template is	-0.124939
-0.226038	number of registers is	-0.124939
-0.318625	advantageous because registers is	-0.124939
-0.449591	the integer registers is	-0.124939
-0.318625	of available registers is	-0.124939
-0.457640	using smart pointers is	-0.124939
-0.559096	or the user is	-0.124939
-0.434416	when a user is	-0.124939
-0.874839	the end user is	-0.124939
-0.302063	bits. The method is	-0.124939
-0.302063	language". The method is	-0.124939
-0.142650	called. This method is	-0.124939
-0.142650	CPUs. This method is	-0.124939
-0.142650	compiler. This method is	-0.124939
-0.142650	library. This method is	-0.124939
-0.142650	16. This method is	-0.124939
-0.142650	finished. This method is	-0.124939
-0.142650	versions. This method is	-0.124939
-0.142650	2.0 This method is	-0.124939
-0.142650	efficiently. This method is	-0.124939
-0.142650	added. This method is	-0.124939
-0.227266	of this method is	-0.124939
-0.227266	because this method is	-0.124939
-0.227266	but this method is	-0.124939
-0.167811	discussed which method is	-0.124939
-0.167811	consider which method is	-0.124939
-0.212540	more general method is	-0.124939
-0.325091	column. The access is	-0.124939
-0.360636	that memory access is	-0.124939
-0.360636	if memory access is	-0.124939
-0.478303	Optimizing file access is	-0.124939
-0.343290	sizeof(S1) = 16 is	-0.124939
-0.526675	alignment by 16 is	-0.124939
-0.334810	mode if SSE2 is	-0.124939
-0.334810	if possible. SSE2 is	-0.124939
-0.334810	same executable. SSE2 is	-0.124939
-0.332961	resources. The system is	-0.124939
-1.220974	the operating system is	-0.124939
-0.516036	X operating system is	-0.124939
-0.367998	before the file is	-0.124939
-0.367998	sure the file is	-0.124939
-0.511434	to a file is	-0.124939
-0.353773	Template meta- programming is	-0.124939
-0.353283	to 1024 bits is	-0.124939
-0.572128	of vector operations is	-0.124939
-0.500446	bb[i] > 0 is	-0.124939
-0.481535	a composite type is	-0.124939
-0.303967	of composite type is	-0.124939
-0.484593	in this case is	-0.124939
-0.322969	The simplest case is	-0.124939
-0.471409	The worst case is	-0.124939
-0.892421	generation of processors is	-0.124939
-0.430859	in Intel processors is	-0.124939
-0.762245	standard PC processors is	-0.124939
-0.459527	if the constant is	-0.124939
-0.150296	by a constant is	-0.903090
-0.483953	then the error is	-0.124939
-0.452064	kind of error is	-0.124939
-0.415238	and this error is	-0.124939
-0.729404	the residual error is	-0.124939
-0.860758	to the stack is	-0.124939
-0.291746	function. The stack is	-0.124939
-0.291746	below. The stack is	-0.124939
-0.450034	the register stack is	-0.124939
-0.318953	point register stack is	-0.124939
-0.502233	speed of CPUs is	-0.124939
-0.341768	with old CPUs is	-0.124939
-0.353206	in character arrays is	-0.124939
-0.575623	virtual function calls is	-0.124939
-0.353324	The fastest execution is	-0.124939
-0.629759	and the result is	-0.124939
-0.444237	because the result is	-0.124939
-0.444237	sure the result is	-0.124939
-0.453783	2. The result is	-0.124939
-0.285842	the first result is	-0.124939
-0.285842	the second result is	-0.124939
-0.285842	the 33 result is	-0.124939
-0.222389	if the processor is	-0.124939
-0.280768	whether the processor is	-0.124939
-0.305469	one logical processor is	-0.124939
-0.558830	a soft processor is	-0.124939
-0.352833	than 127 bytes is	-0.124939
-0.439937	use of threads is	-0.124939
-0.623185	communication between threads is	-0.124939
-0.724201	the array element is	-0.124939
-0.194209	the first element is	-0.124939
-0.128330	The C++ language is	-0.124939
-0.331529	the programming language is	-0.124939
-0.555033	of programming language is	-0.124939
-0.331529	which programming language is	-0.124939
-0.447830	in assembly language is	-0.124939
-0.264651	hardware definition language is	-0.124939
-0.429563	precision. The speed is	-0.124939
-0.401309	Optimizing for speed is	-0.124939
-0.282405	version if speed is	-0.124939
-0.118641	avoided when speed is	-0.124939
-0.118641	preferred when speed is	-0.124939
-0.118641	code where speed is	-0.124939
-0.118641	areas where speed is	-0.124939
-0.354059	d+e, then c is	-0.124939
-0.103644	3.1 How much is	-0.425969
-0.501687	when a thread is	-0.124939
-0.460608	where one thread is	-0.124939
-0.480662	while another thread is	-0.124939
-0.338296	subtraction, multiplication, etc. is	-0.124939
-0.338296	semaphores, mutexes, etc. is	-0.124939
-0.137496	overflow. The exception is	-0.124939
-0.137496	anyway. The exception is	-0.124939
-0.716322	that is allocated is	-0.124939
-0.339332	has been allocated is	-0.124939
-0.703901	case of overflow is	-0.124939
-0.326068	15 Integer overflow is	-0.124939
-0.326068	protection against overflow is	-0.124939
-0.762980	and unsigned integers is	-0.124939
-0.628763	with unsigned integers is	-0.124939
-0.620788	assembly output option is	-0.124939
-0.338946	The -fpie option is	-0.124939
-0.569121	of the matrix is	-0.124939
-0.360150	and the matrix is	-0.124939
-0.293013	in a matrix is	-0.425969
-0.290374	if a matrix is	-0.124939
-0.290374	Transposing a matrix is	-0.124939
-0.539347	Therefore, 64-bit Linux is	-0.124939
-0.504882	in 32-bit Linux is	-0.124939
-0.338575	only if AVX is	-0.124939
-0.338575	operating system. AVX is	-0.124939
-0.835546	or vector classes is	-0.124939
-0.686753	long double precision is	-0.124939
-0.686753	Long double precision is	-0.124939
-0.420263	cache. Single precision is	-0.124939
-0.351638	The last line is	-0.124939
-0.352282	that already works is	-0.124939
-0.495355	code is optimized is	-0.124939
-0.202448	below. This manual is	-0.124939
-0.202448	Introduction This manual is	-0.124939
-0.202448	158. This manual is	-0.124939
-0.123777	the present manual is	-0.124939
-0.123777	The present manual is	-0.124939
-0.353106	Linux. Address calculation is	-0.124939
-0.353681	CPU brand check is	-0.124939
-0.279460	if the problem is	-0.124939
-0.279460	If the problem is	-0.124939
-0.279460	solving the problem is	-0.124939
-0.397818	units. The problem is	-0.124939
-0.209919	to this problem is	-0.425969
-0.259587	solve this problem is	-0.124939
-0.069074	then the solution is	-0.124939
-0.252691	mark_end; This solution is	-0.124939
-0.151250	of this solution is	-0.124939
-0.151250	Furthermore, this solution is	-0.124939
-0.185342	see which solution is	-0.124939
-0.185342	The best solution is	-0.124939
-0.252691	the optimal solution is	-0.124939
-0.185342	more complicated solution is	-0.124939
-0.252691	An alternative solution is	-0.124939
-0.185342	more powerful solution is	-0.124939
-0.185342	most clean solution is	-0.124939
-0.185342	only reasonable solution is	-0.124939
-0.185342	No universal solution is	-0.124939
-0.508540	If the container is	-0.124939
-0.617407	array or container is	-0.124939
-0.561732	using bitwise operators is	-0.124939
-0.353525	(i=0; i<n; i++) is	-0.124939
-0.431540	if the list is	-0.124939
-0.481293	Such a list is	-0.124939
-0.134490	A linked list is	-0.124939
-0.659202	a sorted list is	-0.124939
-0.353843	which quite likely is	-0.124939
-0.175709	class or structure is	-0.124939
-0.351030	(1985). This standard is	-0.124939
-0.541429	when the hardware is	-0.124939
-0.334069	b + 1 is	-0.124939
-0.334069	b & 1 is	-0.124939
-0.454239	mode. 16-bit mode is	-0.124939
-0.572998	elements to store is	-0.124939
-0.351799	these two values is	-0.124939
-0.352826	fraction. The sign is	-0.124939
-0.520205	This non-inlined copy is	-0.124939
-0.434144	sure the information is	-0.124939
-0.335587	known. This information is	-0.124939
-0.432340	of memory addresses is	-0.124939
-0.334148	calculate self-relative addresses is	-0.124939
-0.444957	the loop counter is	-0.124939
-0.283325	clock cycles counter is	-0.124939
-0.519914	performance monitor counter is	-0.124939
-0.519914	clock cycle counter is	-0.124939
-0.096018	the loop count is	-0.249877
-0.383366	The first count is	-0.124939
-0.085376	the repeat count is	-0.124939
-0.168241	high repeat count is	-0.124939
-0.168241	typical repeat count is	-0.124939
-0.334350	the memory allocation is	-0.124939
-0.540653	dynamic memory allocation is	-0.124939
-0.483745	Dynamic memory allocation is	-0.124939
-0.350758	An uncached write is	-0.124939
-0.453375	to these problems is	-0.124939
-0.295444	heap. The space is	-0.124939
-0.308790	the memory space is	-0.124939
-0.308790	This memory space is	-0.124939
-0.308790	Extra memory space is	-0.124939
-0.694367	If the microprocessor is	-0.124939
-0.485278	cases the microprocessor is	-0.124939
-0.349957	contains several branches is	-0.124939
-0.350152	The & operator is	-0.124939
-0.066930	or overloaded operator is	-0.124939
-0.066930	an overloaded operator is	-0.124939
-0.066930	An overloaded operator is	-0.124939
-0.242963	The dynamic_cast operator is	-0.124939
-0.320775	The const_cast operator is	-0.124939
-0.242963	The reinterpret_cast operator is	-0.124939
-0.387174	and the multiplication is	-0.124939
-0.387174	while the multiplication is	-0.124939
-0.330161	heavy graphics application is	-0.124939
-0.330161	A WTL application is	-0.124939
-0.243149	when code caching is	-0.124939
-0.243149	where code caching is	-0.124939
-0.256978	addresses. If caching is	-0.124939
-0.256978	data without caching is	-0.124939
-0.256978	87). Data caching is	-0.124939
-0.256978	memory. Efficient caching is	-0.124939
-0.470228	the instruction sets is	-0.124939
-0.470228	compatible instruction sets is	-0.124939
-0.496771	of the expression is	-0.124939
-0.310602	d; This expression is	-0.124939
-0.310602	that some expression is	-0.124939
-0.273787	a vector implementation is	-0.124939
-0.273787	about which implementation is	-0.124939
-0.366000	the software implementation is	-0.124939
-0.366000	a software implementation is	-0.124939
-0.425105	more complicated implementation is	-0.124939
-0.609619	the exception handling is	-0.124939
-0.430993	If exception handling is	-0.124939
-0.056533	handling Exception handling is	-0.124939
-0.565710	class data members is	-0.124939
-0.577165	large memory model is	-0.124939
-0.265751	known CPU model is	-0.124939
-0.265751	particular CPU model is	-0.124939
-0.426984	each processor model is	-0.124939
-0.468918	this memory block is	-0.124939
-0.668189	bigger memory block is	-0.124939
-0.526879	the function name is	-0.124939
-0.294213	that the conversion is	-0.124939
-0.294213	example, the conversion is	-0.124939
-0.284064	data. The disadvantage is	-0.124939
-0.284064	starts. The disadvantage is	-0.124939
-0.284064	avoided. The disadvantage is	-0.124939
-0.420433	types. A disadvantage is	-0.124939
-0.420433	slower. Another disadvantage is	-0.124939
-0.557205	integer to zero is	-0.124939
-0.307000	fast that what is	-0.124939
-0.470274	based on what is	-0.124939
-0.307000	the reader what is	-0.124939
-0.328146	of the parameter is	-0.124939
-0.358044	if a parameter is	-0.124939
-0.462513	a function parameter is	-0.124939
-0.050185	the size parameter is	-0.124939
-0.373802	the template parameter is	-0.124939
-0.327937	0. The division is	-0.124939
-0.538098	microprocessor. Integer division is	-0.124939
-1.483308	pointer or reference is	-0.124939
-0.328564	to. A reference is	-0.124939
-0.537190	if the source is	-0.124939
-0.329402	switching. This cost is	-0.124939
-0.464225	This extra cost is	-0.124939
-0.300155	cycles. The reason is	-0.124939
-0.300155	well. The reason is	-0.124939
-0.300155	4. The reason is	-0.124939
-0.300155	tested. The reason is	-0.124939
-0.282970	fact that n is	-0.124939
-0.118835	than when n is	-0.124939
-0.118835	serious when n is	-0.124939
-0.282970	back, where n is	-0.124939
-0.283363	of the string is	-0.124939
-0.111590	time a string is	-0.425969
-0.283363	of each string is	-0.124939
-0.556655	The register keyword is	-0.124939
-0.304254	the inline keyword is	-0.124939
-0.304254	the __fastcall keyword is	-0.124939
-0.616786	and table lookup is	-0.124939
-0.435730	Unfortunately, table lookup is	-0.124939
-0.348171	operand of && is	-0.124939
-0.543646	of the difference is	-0.124939
-0.462531	functions. The difference is	-0.124939
-0.641298	the preceding addition is	-0.124939
-0.304254	route. This mechanism is	-0.124939
-0.481155	function dispatch mechanism is	-0.124939
-0.556655	stack unwinding mechanism is	-0.124939
-0.348400	operand of || is	-0.124939
-0.451552	kind of optimizations is	-0.124939
-0.449336	specific graphics framework is	-0.124939
-0.202084	of static linking is	-0.124939
-0.202084	that static linking is	-0.124939
-0.202084	if static linking is	-0.124939
-0.202084	when static linking is	-0.124939
-0.418780	if dynamic linking is	-0.124939
-0.115799	of modern microprocessors is	-0.124939
-0.300276	with older microprocessors is	-0.124939
-0.061220	the work load is	-0.124939
-0.453622	which we assume is	-0.124939
-0.548213	floating point numbers is	-0.124939
-0.321237	choice of platform is	-0.124939
-0.321237	a different platform is	-0.124939
-0.348402	called, a dispatch is	-0.124939
-0.764969	the user interface is	-0.124939
-0.470480	possible user interface is	-0.124939
-0.347090	only when AVX2 is	-0.124939
-0.538561	log on process is	-0.124939
-0.294042	GOT lookup process is	-0.124939
-0.294042	this delaying process is	-0.124939
-0.295019	to by r is	-0.124939
-0.295019	sequence, where r is	-0.124939
-0.295019	clear whether r is	-0.124939
-0.382322	type of storage is	-0.124939
-0.382322	binary data storage is	-0.124939
-0.416352	thread. Thread-local storage is	-0.124939
-0.123255	into a union is	-0.124939
-0.123255	Using a union is	-0.124939
-0.446056	Unions A union is	-0.124939
-0.347574	recognize that 10 is	-0.124939
-0.247047	indirect function feature is	-0.124939
-0.325664	2010. This feature is	-0.124939
-0.032594	but this feature is	-0.124939
-0.067834	so this feature is	-0.124939
-0.290958	destructors A constructor is	-0.124939
-0.666728	A copy constructor is	-0.124939
-0.412590	A default constructor is	-0.124939
-0.349867	array element a[i] is	-0.124939
-0.347821	declared with #define is	-0.124939
-0.349575	number of points is	-0.124939
-0.348405	A context switch is	-0.124939
-1.008659	out of range is	-0.124939
-0.196661	method used here is	-0.124939
-0.196661	using static here is	-0.124939
-0.196661	the speed here is	-0.124939
-0.196661	The problem here is	-0.124939
-0.087293	const_cast operator here is	-0.124939
-0.087293	?: operator here is	-0.124939
-0.196661	www.yeppp.info And here is	-0.124939
-0.510522	the CPU core is	-0.124939
-0.815537	The code section is	-0.124939
-0.445331	The data section is	-0.124939
-0.497058	level-1 cache contentions is	-0.124939
-0.316906	of such contentions is	-0.124939
-0.264032	when the computer is	-0.124939
-0.112252	until the computer is	-0.124939
-0.260533	on one computer is	-0.124939
-0.346441	different type conversions is	-0.124939
-0.709088	prevent such errors is	-0.124939
-0.347751	faster when columns is	-0.124939
-0.090325	i to p is	-0.124939
-0.090325	added to p is	-0.124939
-0.275155	clear that p is	-0.124939
-0.204506	to by p is	-0.124939
-0.204506	read before p is	-0.124939
-0.204506	fast whether p is	-0.124939
-0.136160	that the syntax is	-0.124939
-0.136160	but the syntax is	-0.124939
-0.217621	Unfortunately, the syntax is	-0.124939
-0.366038	are: The syntax is	-0.124939
-0.587189	of the STL is	-0.124939
-0.371228	However, the STL is	-0.124939
-0.283436	153. A profiler is	-0.124939
-0.283436	CPUs. Intel's profiler is	-0.124939
-0.283436	VTune; AMD's profiler is	-0.124939
-0.335285	if the index is	-0.124939
-0.255060	Check that index is	-0.124939
-0.273956	the array index is	-0.124939
-0.437118	an array index is	-0.124939
-0.402978	of function inlining is	-0.124939
-0.283674	and function inlining is	-0.124939
-0.326830	if the network is	-0.124939
-0.224636	where the network is	-0.124939
-0.798090	(2n / b) is	-0.124939
-0.347360	such a response is	-0.124939
-0.345964	number of lines is	-0.124939
-0.059349	the same operation is	-0.425969
-0.439669	of bounds checking is	-0.124939
-0.458758	fragmentation. Bounds checking is	-0.124939
-0.434891	where a task is	-0.124939
-0.307704	a given task is	-0.124939
-0.524406	frequency is limited is	-0.124939
-0.345791	A little math is	-0.124939
-0.344301	network or database is	-0.124939
-0.446516	table of constants is	-0.124939
-0.346539	If a bool is	-0.124939
-0.732780	standard stack frame is	-0.124939
-0.601816	and the destructor is	-0.124939
-0.274586	owns. A destructor is	-0.124939
-0.274586	A virtual destructor is	-0.124939
-0.275538	question when efficiency is	-0.124939
-0.275538	of program efficiency is	-0.124939
-0.275538	The highest efficiency is	-0.124939
-0.307704	choice of algorithm is	-0.124939
-0.307704	The following algorithm is	-0.124939
-0.345593	to handle strings is	-0.124939
-0.116415	of the exponent is	-0.124939
-0.054318	when the exponent is	-0.124939
-0.090583	numbers. The exponent is	-0.124939
-0.090583	digits. The exponent is	-0.124939
-0.126896	loop. Another possibility is	-0.124939
-0.126896	GOT. Another possibility is	-0.124939
-0.345593	of these conditions is	-0.124939
-0.343989	optimal. Best-case testing is	-0.124939
-0.484247	that the alignment is	-0.124939
-0.346397	the cross-platform compatibility is	-0.124939
-0.343589	because the macro is	-0.124939
-0.148729	when an operand is	-0.124939
-0.032684	If one operand is	-0.124939
-0.007943	the second operand is	-0.249877
-0.321229	data. The effect is	-0.124939
-0.321229	post-increment. The effect is	-0.124939
-0.270072	why this effect is	-0.124939
-0.262341	set of containers is	-0.124939
-0.262341	ready made containers is	-0.124939
-0.344059	the STL containers is	-0.124939
-0.629785	the same priority is	-0.124939
-0.265341	the clock frequency is	-0.124939
-0.408454	The clock frequency is	-0.124939
-0.364416	CPU clock frequency is	-0.124939
-0.261982	Here the iteration is	-0.124939
-0.374648	for each iteration is	-0.124939
-0.261982	the preceding iteration is	-0.124939
-0.226607	N-1)==0 if N is	-0.124939
-0.226607	removed. If N is	-0.124939
-0.226607	pow(x,N) where N is	-0.124939
-0.226607	General case, N is	-0.124939
-0.344492	most important thing is	-0.124939
-0.342759	if the handle is	-0.124939
-0.532165	called the heap is	-0.124939
-0.791619	#pragma vector nontemporal is	-0.124939
-0.502933	of array bounds is	-0.124939
-1.398576	can be improved is	-0.124939
-0.298518	structure. The situation is	-0.124939
-0.546452	worst case situation is	-0.124939
-0.543656	An error message is	-0.124939
-0.192799	time. The delay is	-0.124939
-0.192799	line. The delay is	-0.124939
-0.254917	ms. This delay is	-0.124939
-0.513565	if the condition is	-0.124939
-0.254532	testing a condition is	-0.124939
-0.471400	loop control condition is	-0.124939
-0.254532	the different cores is	-0.124939
-0.198995	multiple CPU cores is	-0.425969
-0.295969	The result ebx is	-0.124939
-0.295969	code. Register ebx is	-0.124939
-0.297242	address of list[i] is	-0.124939
-0.297242	the expression list[i] is	-0.124939
-0.512877	A switch statements is	-0.124939
-0.340610	class? This chapter is	-0.124939
-0.783096	branch target buffer is	-0.124939
-0.480298	Excessive loop unrolling is	-0.124939
-0.291982	of times CriticalFunction is	-0.124939
-0.291982	on whether CriticalFunction is	-0.124939
-0.443064	and the fraction is	-0.124939
-0.343719	the row length is	-0.124939
-0.393906	bit of f is	-0.124939
-0.090850	{ // f is	-0.124939
-0.090850	30 // f is	-0.124939
-0.205872	sum, then f is	-0.124939
-0.442410	the misprediction penalty is	-0.124939
-0.291060	The function F1 is	-0.124939
-0.291060	without returning. F1 is	-0.124939
-0.245758	A simple alternative is	-0.124939
-0.324120	fragmented. An alternative is	-0.124939
-0.245758	A light-weight alternative is	-0.124939
-0.384074	the critical stride is	-0.425969
-0.447567	The critical stride is	-0.124939
-0.342680	transfer for 'this' is	-0.124939
-0.340610	elements per row is	-0.124939
-0.499813	where template metaprogramming is	-0.124939
-0.481011	a hash map is	-0.124939
-0.338612	objects they contain is	-0.124939
-0.623436	programmable logic device is	-0.124939
-0.208367	of parameter transfer is	-0.124939
-0.234819	mode Parameter transfer is	-0.124939
-0.403573	big memory blocks is	-0.124939
-0.370493	writing big blocks is	-0.124939
-0.149297	in example 15.1b is	-0.124939
-0.397482	of the latter is	-0.124939
-0.101894	systems. The latter is	-0.124939
-0.101894	mode. The latter is	-0.124939
-0.531190	of dependency chains is	-0.124939
-0.338612	a particular brand is	-0.124939
-0.893991	below the diagonal is	-0.124939
-0.284125	for different purposes is	-0.124939
-0.284125	for educational purposes is	-0.124939
-0.340328	the user. Time is	-0.124939
-0.119575	b; // everything is	-0.124939
-0.119575	1.2; // everything is	-0.124939
-0.339755	calculation capabilities. Here is	-0.124939
-0.341476	OpenMP directives. OpenMP is	-0.124939
-0.162389	a clock cycle is	-0.301030
-0.512373	possible pointer aliasing is	-0.124939
-0.312661	www.agner.org/optimize/testp.zip. This tool is	-0.124939
-0.102225	and development tool is	-0.124939
-0.102225	popular development tool is	-0.124939
-0.338689	memset and memcpy is	-0.124939
-0.045785	where the parallelism is	-0.124939
-0.222013	parallel. Fine-grained parallelism is	-0.124939
-0.276723	if because #if is	-0.124939
-0.276723	source code. #if is	-0.124939
-0.276166	but this unit is	-0.124939
-0.276166	The time unit is	-0.124939
-0.338047	where each label is	-0.124939
-0.336766	number of iterations is	-0.124939
-0.475180	a branch misprediction is	-0.124939
-0.523088	of lazy binding is	-0.124939
-0.336766	The theoretical background is	-0.124939
-0.523182	A dependency chain is	-0.124939
-0.337406	of complicated algorithms is	-0.124939
-0.337406	of possible inputs is	-0.124939
-0.338047	packages and who is	-0.124939
-0.435624	then the DLL is	-0.124939
-0.338689	of memory required is	-0.124939
-0.338047	the keyword volatile is	-0.124939
-0.745166	of cache misses is	-0.124939
-0.510787	terminated. The purpose is	-0.124939
-0.720263	compiled without -fpic is	-0.124939
-0.335132	C++. Yet, D is	-0.124939
-0.335132	each value xn is	-0.124939
-1.018480	new and delete is	-0.124939
-0.266143	the code itself is	-0.124939
-0.266143	the device itself is	-0.124939
-0.704673	of context switches is	-0.124939
-0.512448	sum. The trick is	-0.124939
-0.380049	the loop body is	-0.124939
-0.266143	if its body is	-0.124939
-0.501096	the compiler generates is	-0.124939
-0.334404	without using exceptions is	-0.124939
-0.885911	when the CPUID is	-0.124939
-0.335861	of five manuals is	-0.124939
-0.432661	the type T is	-0.124939
-0.474061	its binary representation is	-0.124939
-0.433573	73. Runtime polymorphism is	-0.124939
-0.432661	when the factor is	-0.124939
-0.335861	innermost loop. log is	-0.124939
-0.332963	manually. This principle is	-0.124939
-0.108125	first time Func is	-0.124939
-0.108125	every time Func is	-0.124939
-0.050817	thing we notice is	-0.425969
-0.332118	recommended if portability is	-0.124939
-0.757713	in the debugger is	-0.124939
-0.331275	the image base is	-0.124939
-0.332118	where the compilation is	-0.124939
-0.332118	looking name ?Func@@YAXQAHAAH@Z is	-0.124939
-0.334657	preprocessing macro INSTRSET is	-0.124939
-0.326931	as versatile. Fortran is	-0.124939
-0.327933	order of inheritance is	-0.124939
-0.326931	disk. Memory swapping is	-0.124939
-0.328937	report that memset is	-0.124939
-0.081335	your optimization effort is	-0.425969
-0.810908	and constant propagation is	-0.124939
-0.424562	reductions. Algebraic reduction is	-0.124939
-0.161510	size of abc is	-0.124939
-0.423311	instructions. My recommendation is	-0.124939
-0.326931	a program package is	-0.124939
-0.423311	handling cleanup jobs is	-0.124939
-0.326931	value of n! is	-0.124939
-0.598080	version of Basic is	-0.124939
-0.477951	new or malloc is	-0.124939
-0.326931	in example 15.1c is	-0.124939
-0.326931	problem with macros is	-0.124939
-0.328937	solution you prefer is	-0.124939
-0.021672	of the divisor is	-0.124939
-0.021672	Faster if divisor is	-0.425969
-0.767534	the time slices is	-0.124939
-0.321731	Enums An enum is	-0.124939
-0.320497	if our estimate is	-0.124939
-0.121122	the way m is	-0.124939
-0.056352	template function, m is	-0.124939
-0.056352	simple function, m is	-0.124939
-0.500574	partial template specialization is	-0.124939
-0.320497	situations where pre-increment is	-0.124939
-0.322969	object, and ownership is	-0.124939
-0.092226	assume that *p+2 is	-0.124939
-0.092226	assuming that *p+2 is	-0.124939
-0.321731	14.8 and 14.9 is	-0.124939
-0.320497	a certain modification is	-0.124939
-0.321731	loop if powN is	-0.124939
-0.092226	if the bottleneck is	-0.124939
-0.092226	If the bottleneck is	-0.124939
-0.321731	static data area is	-0.124939
-0.416828	wasted. The consequence is	-0.124939
-0.321731	such an assumption is	-0.124939
-0.321731	across compilers. Fastcall is	-0.124939
-0.320497	address. Step (1) is	-0.124939
-0.320497	explicitly when alloca is	-0.124939
-0.452123	when the original is	-0.124939
-0.209468	performance by unit-testing is	-0.124939
-0.209468	development. This unit-testing is	-0.124939
-0.320497	the desired interval is	-0.124939
-0.416828	not! 250 μs is	-0.124939
-0.470248	line (in bytes) is	-0.124939
-0.320497	application. If hyperthreading is	-0.124939
-0.209468	floating point format is	-0.124939
-0.399461	intermediate file format is	-0.124939
-0.141967	71). The conclusion is	-0.124939
-0.141967	utility. The conclusion is	-0.124939
-0.620148	layers of abstraction is	-0.124939
-0.309980	of data manipulation is	-0.124939
-0.065208	if the dividend is	-0.124939
-0.309980	table of coefficients is	-0.124939
-0.309980	with sequential labels is	-0.124939
-0.309980	The main focus is	-0.124939
-0.309980	the function longjmp is	-0.124939
-0.566948	template library (STL) is	-0.124939
-0.404236	each element matrix[r][c] is	-0.124939
-0.309980	explains why bookkeeping is	-0.124939
-0.309980	logical processors. Hyperthreading is	-0.124939
-0.311587	so that a+b is	-0.124939
-0.309980	to this argument is	-0.124939
-0.311587	the kind: "what is	-0.124939
-0.309980	The CPU market is	-0.124939
-0.566948	in example 11.3 is	-0.124939
-0.311587	x = *(p++) is	-0.124939
-0.170379	performing software product is	-0.124939
-0.170379	A competing product is	-0.124939
-0.311587	number of allocations is	-0.124939
-0.309980	/ jl $B1$2 is	-0.124939
-0.289680	more realistic goal is	-0.124939
-0.289680	stack when CriticalInnerFunction is	-0.124939
-0.289680	2" Here CParent is	-0.124939
-0.289680	2 then N&(N-1) is	-0.124939
-0.289680	The integer comparison is	-0.124939
-0.289680	a linear search, is	-0.124939
-0.377279	the memory footprint is	-0.124939
-0.289680	computer. The proxy is	-0.124939
-0.289680	doesn't compromise safety is	-0.124939
-0.289680	alternative worth considering is	-0.124939
-0.289680	an hour. Neither is	-0.124939
-0.289680	the .exe file, is	-0.124939
-0.289680	function. The branching is	-0.124939
-0.377279	so-called symbol interposition is	-0.124939
-0.289680	in example 14.1c is	-0.124939
-0.289680	address of matrix[j][0] is	-0.124939
-0.101398	the time MemberPointer is	-0.124939
-0.101398	c1 before MemberPointer is	-0.124939
-0.377279	with all 1's is	-0.124939
-0.289680	SSE3. // (This is	-0.124939
-0.289680	Wednesday or Friday is	-0.124939
-0.289680	Optimizing database queries is	-0.124939
-0.289680	identify performance bottlenecks is	-0.124939
-0.377279	of a bitfield is	-0.124939
-0.289680	use of coprocessors is	-0.124939
-0.377279	constructor, if any, is	-0.124939
-0.377279	loop. Example 8.21 is	-0.124939
-0.377279	The FDIV bug is	-0.124939
-0.233921	when no attempt is	-0.124939
-0.233921	Another serious burden is	-0.124939
-0.233921	them enabled (there is	-0.124939
-0.233921	with the LLVM is	-0.124939
-0.233921	he or she is	-0.124939
-0.233921	in example 14.7b is	-0.124939
-0.233921	value of cc[i]+2 is	-0.124939
-0.233921	tiling. This technique is	-0.124939
-0.233921	of different targets is	-0.124939
-0.233921	The empty throw()specification is	-0.124939
-0.233921	bit // u.d is	-0.124939
-0.233921	a constant: Unsigned is	-0.124939
-0.233921	that model N-1 is	-0.124939
-0.233921	value of i&15 is	-0.124939
-0.233921	though the CPU-type is	-0.124939
-0.233921	the sign, eee is	-0.124939
-0.233921	12.4b and 12.4c is	-0.124939
-0.233921	representation of &list[100] is	-0.124939
-0.233921	(int)d; // Truncation is	-0.124939
-0.233921	support processor X" is	-0.124939
-0.233921	is often seen, is	-0.124939
-0.233921	case memory re-allocation is	-0.124939
-0.233921	of it (&ArraySize) is	-0.124939
-0.233921	shift operation. x*8 is	-0.124939
-0.233921	loop count (ArraySize) is	-0.124939
-0.233921	f(x) or g(x) is	-0.124939
-0.233921	code, which supposedly is	-0.124939
-0.233921	for-loop or while-loop is	-0.124939
-0.233921	games and animations is	-0.124939
-0.233921	} Here, log(2.0) is	-0.124939
-0.233921	stdint.h or inttypes.h is	-0.124939
-0.233921	x = array[i++] is	-0.124939
-0.233921	the memory bus is	-0.124939
-0.233921	C1::Disp() or C2::Disp() is	-0.124939
-0.233921	is over. Virtualization is	-0.124939
-0.233921	*p or p->member is	-0.124939
-0.233921	Mathcad (v. 15.0) is	-0.124939
-0.233921	b * 1.5f; is	-0.124939
-0.233921	limited "express" edition is	-0.124939
-0.233921	the divisions (Division is	-0.124939
-0.233921	mask, and bb[i]*cc[i] is	-0.124939
-0.233921	integer type size_t is	-0.124939
-0.233921	at runtime. Polymorphism is	-0.124939
-0.233921	with other subtasks is	-0.124939
-0.233921	exponent, and fffff is	-0.124939
-0.233921	most important remedy is	-0.124939
-0.233921	above, page 87) is	-0.124939
-0.233921	instruction add eax,1 is	-0.124939
-0.233921	stack. This behaviour is	-0.124939
-0.233921	languages. My preference is	-0.124939
-0.233921	// This triangle is	-0.124939
-0.233921	lrint(d); // Rounding is	-0.124939
-0.233921	loop. The loop-branch is	-0.124939
-0.233921	unless the strictness is	-0.124939
-0.233921	common sub-expressions. Why is	-0.124939
-0.233921	new or malloc) is	-0.124939
-0.233921	distribution and mirroring is	-0.124939
-0.233921	that the occurrence is	-0.124939
-0.233921	set (or higher) is	-0.124939
-0.233921	and other abuse is	-0.124939
-0.233921	memory. One kilobyte is	-0.124939
-0.233921	in example 7.43b is	-0.124939
-0.233921	load address. Relocation is	-0.124939
-0.233921	in example 14.21 is	-0.124939
-0.233921	If the granularity is	-0.124939
-0.233921	1. This '1' is	-0.124939
-0.764567	pointed to is a	-0.124939
-0.957468	value that is a	-0.124939
-0.967717	expression that is a	-0.124939
-0.508234	divisor that is a	-0.124939
-1.147938	if it is a	-0.124939
-1.186663	because it is a	-0.124939
-0.990159	member function is a	-0.124939
-0.507763	frame function is a	-0.124939
-0.507763	pure function is a	-0.124939
-0.731363	leaf function is a	-0.124939
-0.547984	Complicated code is a	-0.124939
-0.721174	time. This is a	-0.124939
-0.423047	vector. This is a	-0.124939
-0.423047	Studio This is a	-0.124939
-0.423047	16. This is a	-0.124939
-0.423047	counter. This is a	-0.124939
-0.423047	Mars This is a	-0.124939
-0.423047	up. This is a	-0.124939
-0.423047	pointers. This is a	-0.124939
-0.423047	32. This is a	-0.124939
-0.423047	alloca. This is a	-0.124939
-0.423047	if. This is a	-0.124939
-0.423047	i*sizeof(S1). This is a	-0.124939
-0.423047	((a+b)+c)+d. This is a	-0.124939
-0.423047	eax,eax. This is a	-0.124939
-0.396745	This compiler is a	-0.124939
-0.696582	Intel compiler is a	-0.124939
-0.396745	Gnu compiler is a	-0.124939
-0.396745	Clang compiler is a	-0.124939
-0.550133	that this is a	-0.124939
-0.319435	you use is a	-0.124939
-0.577319	x. It is a	-0.124939
-0.470276	in memory is a	-0.124939
-0.470276	input data is a	-0.124939
-0.540762	up, which is a	-0.124939
-0.518116	A cache is a	-0.124939
-0.362424	this example is a	-0.124939
-0.108160	My example is a	-0.124939
-0.358951	the size is a	-0.124939
-0.358951	cache size is a	-0.124939
-0.358951	its size is a	-0.124939
-0.398815	when b is a	-0.124939
-0.645472	if there is a	-0.124939
-0.510082	then there is a	-0.124939
-0.146632	If there is a	-0.124939
-0.520116	but there is a	-0.124939
-0.092067	unless there is a	-0.124939
-0.369659	however, there is a	-0.124939
-0.369659	Typically, there is a	-0.124939
-0.358467	process There is a	-0.124939
-0.358467	size. There is a	-0.124939
-0.358467	inefficient. There is a	-0.124939
-0.358467	block. There is a	-0.124939
-0.358467	consuming. There is a	-0.124939
-0.358467	support. There is a	-0.124939
-0.358467	programmer. There is a	-0.124939
-0.358467	Gnu. There is a	-0.124939
-0.358467	Conclusion There is a	-0.124939
-0.358467	recycled? There is a	-0.124939
-0.501426	each array is a	-0.124939
-0.584186	do so is a	-0.124939
-0.450686	A register is a	-0.124939
-0.639684	powN template is a	-0.124939
-0.482469	because registers is a	-0.124939
-0.670340	memory access is a	-0.124939
-0.287611	simplest case is a	-0.124939
-0.287611	worst case is a	-0.124939
-0.539530	the constant is a	-0.124939
-0.327069	the stack is a	-0.124939
-0.461046	The stack is a	-0.124939
-0.828202	programming language is a	-0.124939
-0.060574	How much is a	-0.425969
-0.362424	the matrix is a	-0.124939
-0.264150	a matrix is a	-0.124939
-0.537251	optimal solution is a	-0.124939
-0.689830	linked list is a	-0.124939
-0.319435	quite likely is a	-0.124939
-0.726797	or structure is a	-0.124939
-0.482469	monitor counter is a	-0.124939
-0.500251	If caching is a	-0.124939
-0.670340	when n is a	-0.124939
-0.450686	whether r is a	-0.124939
-0.468092	A union is a	-0.124939
-0.319435	context switch is a	-0.124939
-0.539474	operator here is a	-0.124939
-0.383259	And here is a	-0.124939
-0.319435	when columns is a	-0.124939
-0.355517	that p is a	-0.124939
-0.355517	whether p is a	-0.124939
-0.788914	the exponent is a	-0.124939
-0.450686	each iteration is a	-0.124939
-0.159422	if N is a	-0.124939
-0.159422	If N is a	-0.124939
-0.159422	where N is a	-0.124939
-0.413971	case situation is a	-0.124939
-0.450686	control condition is a	-0.124939
-0.319435	switch statements is a	-0.124939
-0.191413	critical stride is a	-0.425969
-0.319435	per row is a	-0.124939
-0.319435	logic device is a	-0.124939
-0.319435	user. Time is a	-0.124939
-0.319435	capabilities. Here is a	-0.124939
-0.319435	directives. OpenMP is a	-0.124939
-0.319435	dependency chain is a	-0.124939
-0.130987	code itself is a	-0.124939
-0.130987	device itself is a	-0.124939
-0.319435	type T is a	-0.124939
-0.319435	the factor is a	-0.124939
-0.319435	loop. log is a	-0.124939
-0.319435	Memory swapping is a	-0.124939
-0.319435	Algebraic reduction is a	-0.124939
-0.584186	of abc is a	-0.124939
-0.319435	you prefer is a	-0.124939
-0.120420	if divisor is a	-0.425969
-0.060574	that *p+2 is a	-0.425969
-0.319435	desired interval is a	-0.124939
-0.319435	(in bytes) is a	-0.124939
-0.319435	of abstraction is a	-0.124939
-0.319435	library (STL) is a	-0.124939
-0.319435	Here CParent is a	-0.124939
-0.319435	FDIV bug is a	-0.124939
-0.319435	the LLVM is a	-0.124939
-0.777656	0 = a a	-0.124939
-0.534830	false = a a	-0.124939
-0.534830	^0 = a a	-0.124939
-0.356842	& a= a a	-0.124939
-0.406312	a function of a	-0.124939
-0.572898	linear function of a	-0.124939
-0.412661	code because of a	-0.124939
-0.412661	b because of a	-0.124939
-0.412661	intended because of a	-0.124939
-0.844959	member functions of a	-0.124939
-0.415633	newest CPU of a	-0.124939
-0.072626	innermost loop of a	-0.425969
-0.159884	message loop of a	-0.124939
-0.320771	an integer of a	-0.124939
-0.498919	the example of a	-0.124939
-0.870373	the size of a	-0.301030
-0.320771	a pointer of a	-0.124939
-0.452809	an object of a	-0.124939
-0.256161	new object of a	-0.124939
-0.109461	An object of a	-0.124939
-0.228844	which version of a	-0.124939
-1.101143	the value of a	-0.124939
-0.506307	Returning objects of a	-0.124939
-0.517629	overall performance of a	-0.124939
-0.480998	a member of a	-0.124939
-0.345894	data member of a	-0.124939
-0.328156	the elements of a	-0.124939
-0.328156	all elements of a	-0.124939
-1.672135	the address of a	-0.124939
-0.533253	each bit of a	-0.124939
-0.643016	moved out of a	-0.124939
-0.169892	jumping out of a	-0.124939
-0.543637	is part of a	-0.124939
-0.386160	not part of a	-0.124939
-0.386160	If part of a	-0.124939
-0.386423	critical part of a	-0.367977
-0.386160	access part of a	-0.124939
-0.182471	32 bits of a	-0.301030
-0.512400	return type of a	-0.124939
-1.045718	different versions of a	-0.124939
-0.833641	multiple versions of a	-0.124939
-0.657549	two versions of a	-0.124939
-0.931566	the speed of a	-0.124939
-0.485338	positive overflow of a	-0.124939
-0.415633	the uses of a	-0.124939
-0.554920	full advantage of a	-0.124939
-0.452495	logic structure of a	-0.124939
-0.395932	the values of a	-0.425969
-0.060758	the sign of a	-0.124939
-0.383277	are members of a	-0.124939
-0.268624	data members of a	-0.124939
-0.113866	Data members of a	-0.124939
-0.415633	the development of a	-0.124939
-0.524049	the disadvantage of a	-0.124939
-0.774901	the end of a	-0.124939
-0.400774	to end of a	-0.124939
-1.161891	different parts of a	-0.124939
-0.519758	integer types of a	-0.124939
-0.484919	function instead of a	-0.124939
-0.484919	typedef instead of a	-0.124939
-0.415633	all optimizations of a	-0.124939
-0.484416	live range of a	-0.124939
-0.320771	the modules of a	-0.124939
-0.320771	The change of a	-0.124939
-0.531580	Each instance of a	-0.124939
-0.415633	assembly output of a	-0.124939
-1.000640	The efficiency of a	-0.124939
-0.415633	a sum of a	-0.124939
-0.870555	the offset of a	-0.124939
-0.235248	the length of a	-0.124939
-0.079996	The length of a	-0.425969
-1.406737	the beginning of a	-0.124939
-0.320771	The transfer of a	-0.124939
-0.133794	conversion Conversion of a	-0.124939
-0.586648	the body of a	-0.124939
-0.415633	A collection of a	-0.124939
-1.060475	the scope of a	-0.124939
-0.226903	the form of a	-0.124939
-0.320771	inefficient. Objects of a	-0.124939
-0.320771	calculations. Division of a	-0.124939
-0.586648	the latency of a	-0.124939
-0.710578	small pieces of a	-0.124939
-0.586648	complete redesign of a	-0.124939
-0.452495	the combination of a	-0.124939
-0.320771	disassembly window of a	-0.124939
-0.320771	the dangers of a	-0.124939
-0.415633	slow. Value of a	-0.124939
-0.415633	The creation of a	-0.124939
-0.131420	12.8a. Sum of a	-0.124939
-0.131420	12.8b. Sum of a	-0.124939
-0.320771	self-explaining menus of a	-0.124939
-0.320771	The benefits of a	-0.124939
-0.320771	the insertion of a	-0.124939
-0.320771	mathematical notion of a	-0.124939
-0.320771	extra layer of a	-0.124939
-0.762402	convert it to a	-0.124939
-1.108751	the code to a	-0.124939
-0.549772	to point to a	-0.124939
-0.420107	an integer to a	-0.124939
-0.296627	unsigned integer to a	-0.124939
-0.296627	signed integer to a	-0.124939
-0.335319	one class to a	-0.124939
-0.426373	a pointer to a	-0.124939
-0.350976	'this' pointer to a	-0.124939
-0.472316	an object to a	-0.124939
-0.335319	composite objects to a	-0.124939
-0.485537	a call to a	-0.124939
-0.485537	function call to a	-0.124939
-0.344923	A call to a	-0.124939
-0.344923	each call to a	-0.124939
-0.485537	single call to a	-0.124939
-0.545317	forward access to a	-0.124939
-0.335319	the file to a	-0.124939
-0.530308	Multiple calls to a	-0.124939
-0.136070	the thread to a	-0.124939
-0.136070	a thread to a	-0.124939
-0.526781	as parameters to a	-0.124939
-0.433808	an application to a	-0.124939
-0.379707	or reference to a	-0.124939
-0.330273	relative reference to a	-0.124939
-0.433808	The link to a	-0.124939
-0.969456	initially points to a	-0.124939
-0.335319	unused columns to a	-0.124939
-0.501302	or writing to a	-0.124939
-0.706821	be changed to a	-0.124939
-0.433808	main executable to a	-0.124939
-0.734838	is copied to a	-0.124939
-0.613873	is similar to a	-0.124939
-0.820700	is added to a	-0.124939
-0.378283	be added to a	-0.124939
-0.473266	be applied to a	-0.124939
-0.099071	when applied to a	-0.124939
-0.093601	is converted to a	-0.124939
-0.233454	be converted to a	-0.124939
-0.433808	microprocessor jump to a	-0.124939
-0.613873	the linker to a	-0.124939
-0.433808	is equivalent to a	-0.124939
-0.335319	be updated to a	-0.124939
-0.335319	resources Writes to a	-0.124939
-0.062728	to lead to a	-0.124939
-0.019919	can lead to a	-0.124939
-0.335319	the loader to a	-0.124939
-0.472316	binding leads to a	-0.124939
-0.136070	is type-casted to a	-0.124939
-0.136070	are type-casted to a	-0.124939
-0.335319	never respond to a	-0.124939
-0.335319	Non-public distribution to a	-0.124939
-0.335319	always comparable to a	-0.124939
-0.335319	< 223 to a	-0.124939
-0.335319	> -b to a	-0.124939
-0.335319	are confined to a	-0.124939
-0.556844	all functions and a	-0.124939
-0.701311	parent class and a	-0.124939
-0.347826	debug version and a	-0.124939
-0.553277	existing systems and a	-0.124939
-0.637942	the user and a	-0.124939
-0.494692	VIA processor and a	-0.124939
-0.878577	programming language and a	-0.124939
-0.637942	memory block and a	-0.124939
-0.347826	function parameter and a	-0.124939
-0.347826	instruction set, and a	-0.124939
-0.347826	point addition, and a	-0.124939
-0.449558	the object, and a	-0.124939
-0.449558	e.g. C++, and a	-0.124939
-0.449558	program development, and a	-0.124939
-0.449558	multiple cores, and a	-0.124939
-0.347826	well-defined functionality and a	-0.124939
-0.449558	remote database, and a	-0.124939
-0.347826	table (PLT) and a	-0.124939
-0.347826	64 Kbytes and a	-0.124939
-0.347826	processes running, and a	-0.124939
-0.347826	heavy traffic and a	-0.124939
-0.291338	matrix a in a	-0.124939
-0.149497	function or in a	-0.124939
-0.149497	code or in a	-0.124939
-0.149497	flag or in a	-0.124939
-0.431026	store it in a	-0.124939
-0.121405	a function in a	-0.301030
-0.379308	dealing with in a	-0.124939
-0.471021	the code in a	-0.124939
-0.359885	or not in a	-0.124939
-0.359885	but not in a	-0.124939
-0.284130	library than in a	-0.124939
-0.284130	less than in a	-0.124939
-0.224345	rather than in a	-0.124939
-0.284130	device than in a	-0.124939
-0.121684	avoid this in a	-0.124939
-0.121684	like this in a	-0.124939
-0.379308	main memory in a	-0.124939
-0.567995	all data in a	-0.124939
-0.402965	received data in a	-0.124939
-0.582887	the program in a	-0.124939
-0.533824	the same in a	-0.124939
-0.359885	suitable functions in a	-0.124939
-0.359885	internal functions in a	-0.124939
-0.517661	a loop in a	-0.124939
-0.442151	functions, but in a	-0.124939
-1.005548	is used in a	-0.124939
-0.999244	are used in a	-0.124939
-0.443361	only used in a	-0.124939
-0.291338	number one in a	-0.124939
-0.654573	the integer in a	-0.124939
-0.533824	and b in a	-0.124939
-0.379308	desired version in a	-0.124939
-0.720095	of objects in a	-0.124939
-0.416277	A variable in a	-0.124939
-0.416277	public variable in a	-0.124939
-0.291338	does so in a	-0.124939
-0.290869	used variables in a	-0.124939
-0.290869	most variables in a	-0.124939
-0.290869	public variables in a	-0.124939
-0.290869	Storing variables in a	-0.124939
-0.431026	making software in a	-0.124939
-0.380249	of elements in a	-0.301030
-0.366020	for elements in a	-0.124939
-0.413093	executed faster in a	-0.124939
-0.591085	is stored in a	-0.124939
-0.343098	be stored in a	-0.249877
-0.527557	are stored in a	-0.124939
-0.332798	pointer stored in a	-0.124939
-0.332798	never stored in a	-0.124939
-0.213469	is called in a	-0.124939
-0.213469	actually called in a	-0.124939
-0.895439	For example, in a	-0.124939
-0.379308	the first in a	-0.124939
-0.379308	file access in a	-0.124939
-0.413093	entire file in a	-0.124939
-0.264581	same bits in a	-0.124939
-0.264581	multiple bits in a	-0.124939
-0.264581	small bits in a	-0.124939
-0.441664	look up in a	-0.124939
-0.453195	many times in a	-0.124939
-0.138491	are accessed in a	-0.425969
-0.148941	or accessed in a	-0.124939
-0.068117	objects accessed in a	-0.124939
-0.056594	non-Intel CPUs in a	-0.425969
-0.431026	the calculations in a	-0.124939
-0.235810	the result in a	-0.124939
-0.609667	unused bytes in a	-0.124939
-0.453195	different threads in a	-0.124939
-0.441477	Each element in a	-0.124939
-0.625535	largest element in a	-0.124939
-0.267837	all done in a	-0.124939
-0.267837	usually done in a	-0.124939
-0.093749	the calculation in a	-0.425969
-0.259447	is implemented in a	-0.425969
-0.616132	be implemented in a	-0.124939
-0.291338	is likely in a	-0.124939
-0.442151	should run in a	-0.124939
-0.625821	the values in a	-0.124939
-0.413093	application-specific information in a	-0.124939
-0.379308	be fast in a	-0.124939
-0.291338	and branches in a	-0.124939
-0.291338	level, typically in a	-0.124939
-0.291338	more complicated in a	-0.124939
-0.291338	the programmer in a	-0.124939
-0.291338	a lookup in a	-0.124939
-0.379308	come last in a	-0.124939
-0.282940	are declared in a	-0.124939
-0.402012	objects declared in a	-0.124939
-0.291338	by piece in a	-0.124939
-0.291338	is smaller in a	-0.124939
-0.291338	provoked here in a	-0.124939
-0.067502	of columns in a	-0.602060
-0.442151	16 lines in a	-0.124939
-0.465040	all strings in a	-0.124939
-0.329999	store strings in a	-0.124939
-0.213469	multiple conditions in a	-0.124939
-0.213469	error conditions in a	-0.124939
-0.291338	different tasks in a	-0.124939
-0.291338	be obtained in a	-0.124939
-0.291338	overwritten, possibly in a	-0.124939
-0.379308	between rows in a	-0.124939
-0.533824	error message in a	-0.124939
-0.125945	is defined in a	-0.124939
-0.291338	the sequence in a	-0.124939
-0.379308	put something in a	-0.124939
-0.291338	be invalid in a	-0.124939
-0.291338	is organized in a	-0.124939
-0.291338	transferring 'this' in a	-0.124939
-0.291338	to implement in a	-0.124939
-0.684512	is included in a	-0.124939
-0.291338	of algebra in a	-0.124939
-0.413093	are saved in a	-0.124939
-0.379308	completely contained in a	-0.124939
-0.121684	be coded in a	-0.124939
-0.121684	are coded in a	-0.124939
-0.291338	standard PC's in a	-0.124939
-0.291338	be handled in a	-0.124939
-0.291338	as pivot in a	-0.124939
-0.533824	be placed in a	-0.124939
-0.291338	also proceed in a	-0.124939
-0.291338	a typo in a	-0.124939
-0.291338	than investing in a	-0.124939
-0.291338	completely absent in a	-0.124939
-0.291338	be programmed in a	-0.124939
-0.291338	test finishes in a	-0.124939
-0.291338	are indexed in a	-0.124939
-0.291338	and semicolons in a	-0.124939
-0.291338	be scheduled in a	-0.124939
-0.291338	cycles. Calculations in a	-0.124939
-0.337583	use that for a	-0.124939
-0.533416	right function for a	-0.124939
-0.817165	to use for a	-0.124939
-0.618183	of memory for a	-0.124939
-0.436651	function library for a	-0.124939
-0.618183	code branch for a	-0.124939
-0.436651	page 16 for a	-0.124939
-0.744218	header file for a	-0.124939
-0.918625	is compiled for a	-0.124939
-0.337583	per element for a	-0.124939
-0.529330	each optimized for a	-0.124939
-0.678528	a container for a	-0.124939
-0.337583	best implementation for a	-0.124939
-0.618183	exception handling for a	-0.124939
-0.337583	table lookup for a	-0.124939
-0.337583	hardware platform for a	-0.124939
-1.022022	when compiling for a	-0.124939
-0.337583	page 3 for a	-0.124939
-0.618183	big enough for a	-0.124939
-0.678528	is designed for a	-0.124939
-0.337583	The inputs for a	-0.124939
-0.958176	to wait for a	-0.124939
-0.436651	the principle for a	-0.124939
-0.337583	page 87 for a	-0.124939
-0.436651	page 90 for a	-0.124939
-0.337583	a directive for a	-0.124939
-0.618183	memory area for a	-0.124939
-0.337583	page 49 for a	-0.124939
-0.337583	except perhaps for a	-0.124939
-0.337583	130 Compile for a	-0.124939
-0.436651	and fine-tuned for a	-0.124939
-0.337583	be wired for a	-0.124939
-1.157860	reason is that a	-0.124939
-1.274742	the code that a	-0.124939
-1.082016	the compiler that a	-0.124939
-0.512072	these compilers that a	-0.124939
-0.339553	for example, that a	-0.124939
-1.369290	make sure that a	-0.124939
-1.217170	makes sure that a	-0.124939
-0.621950	so small that a	-0.124939
-0.343378	is certain that a	-0.124939
-0.343378	be certain that a	-0.124939
-0.631589	This means that a	-0.124939
-0.439127	16) shows that a	-0.124939
-0.373695	cannot know that a	-0.124939
-0.373695	would know that a	-0.124939
-0.535690	may require that a	-0.124939
-0.222185	code Assume that a	-0.124939
-0.222185	access. Assume that a	-0.124939
-0.222185	general. Assume that a	-0.124939
-0.904437	the possibility that a	-0.124939
-0.439127	often happen that a	-0.124939
-0.439127	keyword specifies that a	-0.124939
-0.339553	keyword tells that a	-0.124939
-0.339553	it unusual that a	-0.124939
-0.439127	convention says that a	-0.124939
-0.339553	planning stage that a	-0.124939
-0.339553	developers feel that a	-0.124939
-0.811893	this to be a	-0.124939
-1.314807	likely to be a	-0.124939
-1.367109	it can be a	-0.124939
-1.582709	This can be a	-0.124939
-0.557496	implementation can be a	-0.124939
-0.557496	constructor can be a	-0.124939
-0.557496	loading can be a	-0.124939
-0.557496	chip can be a	-0.124939
-1.246377	it may be a	-0.124939
-0.915067	there may be a	-0.124939
-1.011977	There may be a	-0.124939
-0.509901	solution may be a	-0.124939
-0.509901	frequency may be a	-0.124939
-0.509901	compilation may be a	-0.124939
-0.138915	not only be a	-0.425969
-0.566482	parameter should be a	-0.124939
-1.108806	can also be a	-0.124939
-0.697285	may also be a	-0.124939
-0.433274	may even be a	-0.124939
-0.334893	should always be a	-0.124939
-0.548623	framework must be a	-0.124939
-0.512161	will therefore be a	-0.124939
-0.344667	may preferably be a	-0.124939
-0.458811	should preferably be a	-0.249877
-0.613065	of course be a	-0.124939
-0.482589	this might be a	-0.124939
-0.471733	function could be a	-0.124939
-1.078057	that there are a	-0.124939
-0.756319	But there are a	-0.124939
-0.947848	However, there are a	-0.124939
-0.539209	etc. There are a	-0.124939
-0.539209	check. There are a	-0.124939
-0.539209	-abs(x);. There are a	-0.124939
-0.141167	register. Registers are a	-0.124939
-0.141167	reference. Registers are a	-0.124939
-0.116977	{ a = a	-1.028029
-0.420690	&& a = a	-0.124939
-0.420690	| a = a	-0.124939
-0.500519	y; x = a	-0.124939
-0.578015	b; b = a	-0.124939
-0.409792	a; b = a	-0.124939
-0.409792	(2.0f); b = a	-0.124939
-0.098947	| 0 = a	-0.124939
-0.366048	to c = a	-0.124939
-0.366048	division c = a	-0.124939
-0.145548	d; c = a	-0.124939
-0.371449	expression y = a	-0.124939
-0.187034	y; y = a	-0.602060
-0.161225	b; d = a	-0.124939
-0.410311	|| false = a	-0.124939
-0.578780	; ecx = a	-0.124939
-0.227303	- -(-a) = a	-0.124939
-0.227303	n.a. -(-a) = a	-0.124939
-0.130030	- a*1 = a	-0.124939
-0.130030	n.a. a*1 = a	-0.124939
-0.130030	- a+0 = a	-0.124939
-0.130030	n.a. a+0 = a	-0.124939
-0.316487	|| (!a&&c) = a	-0.124939
-0.316487	- a/1 = a	-0.124939
-0.316487	|| (b&&c) = a	-0.124939
-0.316487	|| (!a&&b) = a	-0.124939
-0.316487	^ ~b = a	-0.124939
-0.316487	reductions: ~(~a) = a	-0.124939
-0.316487	a ^0 = a	-0.124939
-0.062828	compile time or a	-0.124939
-0.336074	the same or a	-0.124939
-0.368307	has one or a	-0.124939
-0.368307	only one or a	-0.124939
-0.368307	just one or a	-0.124939
-0.865436	a pointer or a	-0.124939
-0.477323	simple pointer or a	-0.124939
-0.434756	function library or a	-0.124939
-0.336074	import table or a	-0.124939
-0.770645	multiple CPUs or a	-0.124939
-0.615308	command line or a	-0.124939
-0.434756	sorted list or a	-0.124939
-0.615308	a reference or a	-0.124939
-0.336074	different module or a	-0.124939
-0.336074	runtime DLL or a	-0.124939
-0.102833	binary tree or a	-0.425969
-0.336074	use objconv or a	-0.124939
-0.548522	then make it a	-0.124939
-0.356722	or give it a	-0.124939
-0.194086	make the function a	-0.425969
-1.037552	Make the function a	-0.124939
-0.554059	giving the function a	-0.124939
-0.812288	Call critical function a	-0.124939
-0.699770	means that if a	-0.124939
-1.192256	of 2 if a	-0.124939
-1.215876	is faster if a	-0.124939
-0.411912	For example, if a	-0.124939
-0.553366	works even if a	-0.124939
-0.523234	conflicts. But if a	-0.124939
-0.332307	be optimized if a	-0.124939
-0.526185	can check if a	-0.124939
-0.760443	be advantageous if a	-0.124939
-0.862861	to see if a	-0.124939
-0.430033	quite inefficient if a	-0.124939
-0.332307	complicated algorithm if a	-0.124939
-0.332307	can occur if a	-0.124939
-0.430033	unroll loops if a	-0.124939
-0.332307	is significant if a	-0.124939
-0.135116	be invalid if a	-0.124939
-0.135116	becomes invalid if a	-0.124939
-0.332307	stored (or if a	-0.124939
-0.062325	be evaluated if a	-0.425969
-0.311620	function is by a	-0.124939
-0.404278	replace it by a	-0.124939
-0.508532	ways than by a	-0.124939
-0.707106	memory used by a	-0.124939
-0.460752	replaced i by a	-0.124939
-0.311620	integer variable by a	-0.124939
-0.081279	the branch by a	-0.124939
-0.081279	a branch by a	-0.124939
-0.081279	predictable branch by a	-0.124939
-0.988233	be calculated by a	-0.124939
-0.311620	loop counter by a	-0.124939
-0.459258	integer multiplication by a	-0.124939
-0.034199	than division by a	-0.124939
-0.071322	point division by a	-0.124939
-0.016763	Integer division by a	-0.425969
-0.234846	be replaced by a	-0.124939
-0.311620	a database by a	-0.124939
-0.311620	array initialized by a	-0.124939
-1.112332	be improved by a	-0.124939
-0.440147	can multiply by a	-0.124939
-0.714954	be determined by a	-0.124939
-0.081279	// Division by a	-0.124939
-0.081279	faster. Division by a	-0.124939
-0.081279	matters: Division by a	-0.124939
-0.400973	are identified by a	-0.124939
-0.282150	objects identified by a	-0.124939
-0.492006	index multiplied by a	-0.124939
-0.059489	be spaced by a	-0.425969
-0.311620	// Modulo by a	-0.124939
-0.311620	operations. Multiplying by a	-0.124939
-0.280582	a function with a	-0.124939
-0.280582	simple function with a	-0.124939
-0.286544	log on with a	-0.124939
-0.454609	this code with a	-0.124939
-0.373445	specialization, not with a	-0.124939
-0.647253	a CPU with a	-0.124939
-0.286544	the other with a	-0.124939
-0.112391	A loop with a	-0.124939
-0.373445	an integer with a	-0.124939
-0.092730	a class with a	-0.124939
-0.286544	kb size with a	-0.124939
-0.210791	&& b with a	-0.124939
-0.210791	|| b with a	-0.124939
-0.241504	function library with a	-0.124939
-0.406759	linear array with a	-0.124939
-0.373445	distinguish elements with a	-0.124939
-0.286544	function call with a	-0.124939
-0.616258	on processors with a	-0.124939
-0.210791	a constant with a	-0.124939
-0.210791	single constant with a	-0.124939
-0.373445	many times with a	-0.124939
-0.406759	is accessed with a	-0.124939
-0.652736	on CPUs with a	-0.124939
-0.286544	high-level language with a	-0.124939
-0.406759	of integers with a	-0.124939
-0.463084	be done with a	-0.124939
-0.286544	linear list with a	-0.124939
-0.373445	test run with a	-0.124939
-0.286544	platform. However, with a	-0.124939
-0.286544	data members with a	-0.124939
-0.286544	A model with a	-0.124939
-0.286544	>> n with a	-0.124939
-0.286544	always end with a	-0.124939
-0.373445	a platform with a	-0.124939
-0.120056	or modules with a	-0.124939
-0.120056	critical modules with a	-0.124939
-0.286544	be tested with a	-0.124939
-0.406759	old computer with a	-0.124939
-0.823127	is compatible with a	-0.124939
-0.442843	backwards compatibility with a	-0.124939
-0.474728	doubt obtained with a	-0.124939
-0.472880	when multiplying with a	-0.124939
-0.286544	of Func with a	-0.124939
-0.286544	of communication with a	-0.124939
-0.286544	style type-casting with a	-0.124939
-0.286544	be performed with a	-0.124939
-0.286544	or moved with a	-0.124939
-0.286544	optimizations. Loops with a	-0.124939
-0.286544	4 ways, with a	-0.124939
-0.286544	to trace with a	-0.124939
-0.286544	calling vector::reserve with a	-0.124939
-0.286544	be reached with a	-0.124939
-0.407036	it is on a	-0.124939
-0.407036	same function on a	-0.124939
-0.313847	very long on a	-0.124939
-0.313847	a file on a	-0.124939
-0.443144	the processors on a	-0.124939
-0.059800	are accessed on a	-0.124939
-0.730389	to work on a	-0.124939
-0.313847	cross- compiled on a	-0.124939
-0.313847	multiple threads on a	-0.124939
-0.829858	works best on a	-0.124939
-0.313847	64 matrix on a	-0.124939
-0.407036	preferably implemented on a	-0.124939
-0.443144	still run on a	-0.124939
-0.443144	sufficiently fast on a	-0.124939
-0.313847	anything else on a	-0.124939
-0.313847	point addition on a	-0.124939
-0.573962	be tested on a	-0.124939
-0.882786	cannot rely on a	-0.124939
-0.313847	unit, either on a	-0.124939
-0.313847	were measured on a	-0.124939
-0.313847	software package on a	-0.124939
-0.313847	particularly bad on a	-0.124939
-0.407036	250 μs on a	-0.124939
-0.573962	is performed on a	-0.124939
-0.313847	typically specified on a	-0.124939
-0.313847	example 9.5a on a	-0.124939
-0.313847	cache miss on a	-0.124939
-0.313847	predicted perfectly on a	-0.124939
-0.313847	a tag on a	-0.124939
-0.313847	runs satisfactorily on a	-0.124939
-0.313847	Boolean NOT on a	-0.124939
-0.281017	parameter, or as a	-0.124939
-0.281017	recognizes it as a	-0.124939
-0.281017	size, not as a	-0.124939
-0.366703	buffer than as a	-0.124939
-1.021859	the same as a	-0.124939
-0.412408	be used as a	-0.124939
-0.281017	/ b as a	-0.124939
-0.459330	type such as a	-0.124939
-0.459330	applications such as a	-0.124939
-0.221300	as efficient as a	-0.301030
-0.135597	is stored as a	-0.124939
-0.281017	quite often as a	-0.124939
-0.281017	oriented programming as a	-0.124939
-0.366703	development work as a	-0.124939
-0.366703	be compiled as a	-0.124939
-0.281017	C language as a	-0.124939
-0.281017	be done as a	-0.124939
-0.073969	is implemented as a	-0.425969
-0.147475	be implemented as a	-0.301030
-0.163176	often implemented as a	-0.124939
-0.798611	as fast as a	-0.124939
-0.091545	same name as a	-0.124939
-0.281017	thousand numbers as a	-0.124939
-0.468629	index, just as a	-0.124939
-0.281017	in STL as a	-0.124939
-0.281017	is intended as a	-0.124939
-0.281017	is given as a	-0.124939
-0.281017	the offset as a	-0.124939
-0.281017	may occur as a	-0.124939
-0.431931	linked either as a	-0.124939
-0.281017	uses ebx as a	-0.124939
-0.260479	be organized as a	-0.124939
-0.260479	registers organized as a	-0.124939
-0.515942	is provided as a	-0.124939
-0.366703	be interpreted as a	-0.124939
-0.281017	in edx as a	-0.124939
-0.281017	will appear as a	-0.124939
-0.281017	a queue as a	-0.124939
-0.270267	be expressed as a	-0.124939
-0.281017	an FPGA as a	-0.124939
-0.281017	get ReadTSC as a	-0.124939
-0.281017	or microseconds as a	-0.124939
-0.366703	implemented internally as a	-0.124939
-0.281017	be regarded as a	-0.124939
-0.281017	by assignment, as a	-0.124939
-0.757503	This is not a	-0.124939
-0.655133	this is not a	-0.124939
-1.205383	It is not a	-0.124939
-0.757503	union is not a	-0.124939
-0.523194	N is not a	-0.124939
-0.523194	tool is not a	-0.124939
-0.345249	(Scalar means not a	-0.124939
-0.738029	n.a. n.a. - a	-0.425969
-0.857479	Func2() { int a	-0.124939
-0.334678	take more than a	-0.124939
-1.027428	more efficient than a	-0.124939
-0.929649	less efficient than a	-0.124939
-1.110303	is faster than a	-0.124939
-0.682785	be faster than a	-0.124939
-0.478086	called faster than a	-0.124939
-0.526486	write less than a	-0.124939
-0.486421	use rather than a	-0.124939
-0.486421	template rather than a	-0.124939
-0.486421	implementation rather than a	-0.124939
-0.486421	parameter rather than a	-0.124939
-0.486421	blocks rather than a	-0.124939
-0.317916	more bits than a	-0.124939
-0.514262	memory resources than a	-0.124939
-0.228078	is longer than a	-0.124939
-0.228078	be longer than a	-0.124939
-0.317916	more time-consuming than a	-0.124939
-0.317916	is lower than a	-0.124939
-0.354142	much slower than a	-0.124939
-0.354142	run slower than a	-0.124939
-0.317916	is simpler than a	-0.124939
-0.317916	and verify than a	-0.124939
-0.514250	} else { a	-0.823909
-0.209426	if (b) { a	-0.726999
-0.338491	if (true) { a	-0.124939
-0.410997	is to have a	-0.124939
-0.410997	important to have a	-0.124939
-0.398750	loop and have a	-0.124939
-0.499621	processors that have a	-0.124939
-0.335200	system may have a	-0.124939
-0.335200	processor may have a	-0.124939
-0.335200	unit-test may have a	-0.124939
-0.407635	member functions have a	-0.124939
-0.407635	Inlined functions have a	-0.124939
-0.457572	Intel compilers have a	-0.124939
-0.977733	Some compilers have a	-0.124939
-0.117475	systems also have a	-0.124939
-0.708182	Do objects have a	-0.124939
-0.464700	numbers, we have a	-0.124939
-0.614386	if elements have a	-0.124939
-0.434147	Some systems have a	-0.124939
-0.398750	may even have a	-0.124939
-0.464700	Many CPUs have a	-0.124939
-0.222209	class must have a	-0.124939
-0.222209	task must have a	-0.124939
-0.307149	the thread have a	-0.124939
-0.398750	therefore preferably have a	-0.124939
-0.307149	capabilities still have a	-0.124939
-0.307149	development models have a	-0.124939
-0.328031	this every time a	-0.124939
-0.328031	example every time a	-0.124939
-0.133754	block every time a	-0.425969
-0.328031	updated every time a	-0.124939
-0.441110	The next time a	-0.124939
-0.341129	(methods) Each time a	-0.124939
-0.341129	returns. Every time a	-0.124939
-0.523207	is to use a	-0.124939
-0.377079	program to use a	-0.124939
-0.148839	efficient to use a	-0.124939
-0.530646	need to use a	-0.124939
-0.862325	advantageous to use a	-0.124939
-0.379165	recommended to use a	-0.124939
-0.530646	optimal to use a	-0.124939
-0.530646	preferred to use a	-0.124939
-0.530646	safer to use a	-0.124939
-0.457508	interface and use a	-0.124939
-0.914705	you can use a	-0.124939
-0.468998	interface can use a	-0.124939
-0.487123	Do not use a	-0.124939
-0.785956	you may use a	-0.124939
-0.285383	element then use a	-0.124939
-0.119660	basis then use a	-0.124939
-0.289003	point variables use a	-0.124939
-0.376450	vector operations use a	-0.124939
-0.376450	functions must use a	-0.124939
-0.289003	as well use a	-0.124939
-0.376450	software applications use a	-0.124939
-0.289003	some programmers use a	-0.124939
-0.289003	applications. Alternatively, use a	-0.124939
-0.056253	version (May use a	-0.425969
-0.466040	efficient than when a	-0.124939
-0.539653	branch only when a	-0.124939
-0.466040	be used when a	-0.124939
-0.330732	evaluate b when a	-0.124939
-0.330732	For example, when a	-0.124939
-0.330732	inconvenient times when a	-0.124939
-0.330732	be advantageous when a	-0.124939
-0.330732	delay comes when a	-0.124939
-0.330732	object. Likewise, when a	-0.124939
-0.330732	This happens when a	-0.124939
-0.330732	re- allocating when a	-0.124939
-0.330732	in popularity when a	-0.124939
-0.349407	is small then a	-0.124939
-0.244865	limited range then a	-0.124939
-0.244865	narrow range then a	-0.124939
-0.349407	the container, then a	-0.124939
-0.446443	call it from a	-0.124939
-0.446443	send data from a	-0.124939
-0.468983	the value from a	-0.124939
-0.332885	a value from a	-0.124939
-0.875254	when called from a	-0.124939
-0.558917	are available from a	-0.124939
-0.396736	easily available from a	-0.124939
-1.012132	is calculated from a	-0.124939
-0.316295	is known from a	-0.124939
-0.316295	not optimal from a	-0.124939
-0.316295	steals resources from a	-0.124939
-0.682499	to read from a	-0.124939
-0.316295	for response from a	-0.124939
-0.410072	or comes from a	-0.124939
-0.909747	to recover from a	-0.124939
-0.316295	const restriction from a	-0.124939
-0.519784	re-loaded from memory a	-0.124939
-0.353821	how much memory a	-0.124939
-0.307756	responded to at a	-0.124939
-0.307756	may be at a	-0.124939
-0.121541	eight elements at a	-0.425969
-0.252854	is stored at a	-0.124939
-0.108279	i.e. stored at a	-0.425969
-0.307756	small bit at a	-0.124939
-0.307756	32 bits at a	-0.124939
-0.307756	16 bytes at a	-0.124939
-0.058947	one line at a	-0.124939
-0.307756	four numbers at a	-0.124939
-0.307756	small piece at a	-0.124939
-0.471100	typically loaded at a	-0.124939
-0.307756	BSD comes at a	-0.124939
-0.307756	one square at a	-0.124939
-0.307756	few kilobytes at a	-0.124939
-0.307756	by looking at a	-0.124939
-0.472709	library that has a	-0.124939
-0.248303	the code has a	-0.124939
-0.161439	The code has a	-0.124939
-0.161439	This code has a	-0.124939
-0.161439	all code has a	-0.124939
-0.485396	cache. This has a	-0.124939
-0.344821	list[i]; This has a	-0.124939
-0.344821	bytes). This has a	-0.124939
-0.297496	unit-test but has a	-0.124939
-0.386862	32-bit integer has a	-0.124939
-0.386862	polymorphic class has a	-0.124939
-1.005196	This library has a	-0.124939
-0.447466	shared object has a	-0.124939
-0.297496	where static has a	-0.124939
-0.634718	the user has a	-0.124939
-0.421264	a processor has a	-0.124939
-0.297496	the application has a	-0.124939
-0.297496	the parameter has a	-0.124939
-0.544645	static keyword has a	-0.124939
-0.297496	dependency chain has a	-0.124939
-0.297496	heap manager has a	-0.124939
-0.297496	the reader has a	-0.124939
-0.211689	is to make a	-0.124939
-0.326790	- to make a	-0.124939
-0.326790	has to make a	-0.124939
-0.133357	efficient to make a	-0.124939
-0.448790	possible to make a	-0.124939
-0.460666	takes to make a	-0.124939
-0.786497	order to make a	-0.124939
-0.405971	how to make a	-0.124939
-0.460666	useful to make a	-0.124939
-0.326790	sure to make a	-0.124939
-0.326790	needs to make a	-0.124939
-0.326790	ways to make a	-0.124939
-0.326790	safe to make a	-0.124939
-0.326790	convenient to make a	-0.124939
-0.460666	effort to make a	-0.124939
-0.326790	preferable to make a	-0.124939
-0.326790	idea to make a	-0.124939
-0.326790	sufficient to make a	-0.124939
-0.395310	case and make a	-0.124939
-0.395310	truncation and make a	-0.124939
-0.721760	you can make a	-0.124939
-0.582406	Do not make a	-0.124939
-0.385553	Then you make a	-0.124939
-0.458103	I will make a	-0.124939
-0.270371	format. Alternatively, make a	-0.124939
-0.347858	assembly language because a	-0.124939
-0.347858	the values because a	-0.124939
-0.347858	be vectorized, because a	-0.124939
-0.347858	in advance, because a	-0.124939
-0.652674	there is only a	-0.124939
-0.409867	keyword is only a	-0.124939
-0.409867	penalty is only a	-0.124939
-0.518581	include not only a	-0.124939
-0.470338	that use only a	-0.124939
-0.733027	function has only a	-0.124939
-0.499715	it takes only a	-0.124939
-0.470338	that contains only a	-0.124939
-0.303613	smart pointer. If a	-0.124939
-0.394388	this problem. If a	-0.124939
-0.303613	are inefficient. If a	-0.124939
-0.394388	linkage table. If a	-0.124939
-0.394388	an integer. If a	-0.124939
-0.394388	32-bit number. If a	-0.124939
-0.303613	dependency chain. If a	-0.124939
-0.303613	the future. If a	-0.124939
-0.394388	other factor. If a	-0.124939
-0.058360	AVX part. If a	-0.124939
-0.303613	to read. If a	-0.124939
-0.303613	be obtained. If a	-0.124939
-0.303613	and BSD. If a	-0.124939
-0.303613	set extensions. If a	-0.124939
-0.303613	= lookup[b]; If a	-0.124939
-0.303613	of ways). If a	-0.124939
-0.772803	models on which a	-0.124939
-0.352879	address at which a	-0.124939
-0.127838	b[size]; // set a	-0.425969
-0.347960	debugger cannot set a	-0.124939
-1.078644	have to do a	-0.124939
-0.534921	good to do a	-0.124939
-0.534921	cycles to do a	-0.124939
-0.730051	compilers can do a	-0.124939
-0.506979	CPUs can do a	-0.124939
-0.382596	15.0) is using a	-0.124939
-0.429425	advantage of using a	-0.124939
-0.627332	disadvantage of using a	-0.124939
-0.556683	trick of using a	-0.124939
-0.476748	alternatives to using a	-0.124939
-0.349380	or by using a	-0.124939
-0.349380	2 by using a	-0.124939
-0.349380	systems by using a	-0.124939
-0.551828	speed by using a	-0.124939
-0.349380	simply by using a	-0.124939
-0.349380	obtained by using a	-0.124939
-0.551828	improved by using a	-0.124939
-0.349380	guidelines by using a	-0.124939
-0.294021	efficient as using a	-0.124939
-0.294021	rather than using a	-0.124939
-0.355788	Example 8.4 double a	-0.124939
-0.460285	the array size a	-0.124939
-0.574400	through function pointer a	-0.124939
-0.334007	frame function into a	-0.124939
-0.253997	allocated memory into a	-0.124939
-0.390203	Loading data into a	-0.124939
-0.253997	of b into a	-0.124939
-0.253997	allocated array into a	-0.124939
-0.253997	simple variables into a	-0.124939
-0.253997	consuming calculations into a	-0.124939
-0.085579	.cpp files into a	-0.425969
-0.253997	structure y into a	-0.124939
-0.253997	objects together into a	-0.124939
-0.253997	a task into a	-0.124939
-0.412691	copying them into a	-0.124939
-0.334007	not fit into a	-0.124939
-0.108688	it fits into a	-0.124939
-0.108688	float's fits into a	-0.124939
-0.253997	be turned into a	-0.124939
-0.253997	put 80 into a	-0.124939
-0.253997	be combined into a	-0.124939
-0.253997	some formula into a	-0.124939
-0.470519	be joined into a	-0.124939
-0.108688	be wrapped into a	-0.124939
-0.108688	are wrapped into a	-0.124939
-0.253997	preferably isolated into a	-0.124939
-0.253997	time packed into a	-0.124939
-0.355590	Example 8.18 float a	-0.124939
-0.425429	C++ is also a	-0.124939
-0.425429	so is also a	-0.124939
-0.425429	buffer is also a	-0.124939
-0.425429	while-loop is also a	-0.124939
-0.328350	programs and also a	-0.124939
-0.328350	and possibly also a	-0.124939
-0.399478	costs to such a	-0.124939
-0.434937	reorganized in such a	-0.124939
-0.399478	user if such a	-0.124939
-0.399478	not have such a	-0.124939
-0.434937	to do such a	-0.124939
-0.307739	Programs using such a	-0.124939
-0.307739	parenthesis around such a	-0.124939
-0.307739	example illustrates such a	-0.124939
-0.307739	to justify such a	-0.124939
-0.307739	may supply such a	-0.124939
-0.356510	different processors. In a	-0.124939
-1.323603	of the array a	-0.124939
-0.794448	be cases where a	-0.124939
-0.300169	a solution where a	-0.124939
-0.300169	memory space where a	-0.124939
-0.196772	the situation where a	-0.124939
-0.237699	A situation where a	-0.124939
-0.237699	any situation where a	-0.124939
-0.367259	be situations where a	-0.124939
-0.367259	are situations where a	-0.124939
-0.300169	multiple inheritance where a	-0.124939
-0.300169	sequential instructions, where a	-0.124939
-0.516216	task that takes a	-0.124939
-0.235667	to integer takes a	-0.124939
-0.235667	an integer takes a	-0.124939
-0.467799	pointer typically takes a	-0.124939
-0.332019	garbage collection takes a	-0.124939
-0.356046	same address so a	-0.124939
-0.414758	new and return a	-0.124939
-0.414758	the function return a	-0.124939
-0.570558	else { return a	-0.124939
-0.404716	b) { return a	-0.124939
-0.404716	(b) { return a	-0.124939
-0.570558	a) { return a	-0.124939
-0.432761	3; } return a	-0.124939
-0.056777	* 2; return a	-0.425969
-0.056777	* 3; return a	-0.124939
-0.341580	unused bytes between a	-0.124939
-0.858762	the difference between a	-0.124939
-0.436378	The difference between a	-0.124939
-0.550697	with the way a	-0.124939
-0.348626	predict which way a	-0.124939
-0.560505	loaded. This makes a	-0.124939
-0.326550	The compiler makes a	-0.124939
-0.326550	class. It makes a	-0.124939
-0.326550	installation program makes a	-0.124939
-0.326550	section position-independent, makes a	-0.124939
-1.053289	function is called a	-0.124939
-0.489678	functions is called a	-0.124939
-0.489678	one is called a	-0.124939
-0.558439	from memory address a	-0.124939
-0.298017	time to call a	-0.124939
-0.232359	takes to call a	-0.124939
-0.298017	need to call a	-0.124939
-0.298017	needs to call a	-0.124939
-0.351478	files For example, a	-0.124939
-0.351478	resources. For example, a	-0.124939
-0.351478	variable. For example, a	-0.124939
-0.494625	expressions. For example, a	-0.124939
-0.351478	units. For example, a	-0.124939
-0.351478	core. For example, a	-0.124939
-0.494625	constants. For example, a	-0.124939
-0.351478	question. For example, a	-0.124939
-0.351478	matrix. For example, a	-0.124939
-0.351478	algebra. For example, a	-0.124939
-0.351478	reduction. For example, a	-0.124939
-0.527745	work to take a	-0.124939
-0.488379	objects that take a	-0.124939
-0.505453	branches may take a	-0.124939
-0.323546	These conversions take a	-0.124939
-0.323546	and logarithms take a	-0.124939
-0.727636	This is often a	-0.124939
-0.505532	there is often a	-0.124939
-0.337782	e.g. how often a	-0.124939
-0.980819	to know how a	-0.124939
-0.347083	no idea how a	-0.124939
-0.328470	you only need a	-0.124939
-0.471072	it doesn't need a	-0.124939
-0.334411	class doesn't need a	-0.124939
-0.711335	you don't need a	-0.124939
-0.850072	how to test a	-0.124939
-0.299037	hundred or even a	-0.124939
-0.299037	search, or even a	-0.124939
-0.335923	you have even a	-0.124939
-0.304513	possible to access a	-0.124939
-0.126097	faster to access a	-0.124939
-0.304513	steps to access a	-0.124939
-0.634025	If you access a	-0.124939
-0.502554	to roll out a	-0.124939
-0.702257	~a = 0 a	-0.124939
-0.702257	^a = 0 a	-0.124939
-0.841328	in the case a	-0.124939
-1.078132	- a & a	-0.124939
-0.353882	giving each constant a	-0.124939
-0.322821	that make up a	-0.124939
-0.322821	for setting up a	-0.124939
-0.132082	efficient. Splitting up a	-0.124939
-0.132082	rule. Splitting up a	-0.124939
-0.436173	facilities for making a	-0.124939
-0.636624	you are making a	-0.124939
-0.432745	or by making a	-0.124939
-0.432745	division by making a	-0.124939
-0.208173	faster than making a	-0.124939
-0.208173	rather than making a	-0.124939
-0.889769	compiler from making a	-0.124939
-0.281879	for actually making a	-0.124939
-1.266544	If you want a	-0.124939
-0.578674	additional information about a	-0.124939
-0.341536	32 bits while a	-0.124939
-0.341536	frame function, while a	-0.124939
-0.353438	name ;startofFunc ; a	-0.124939
-0.442175	which then calls a	-0.124939
-0.341975	AVX support calls a	-0.124939
-0.305708	copying it Use a	-0.124939
-0.305708	for example: Use a	-0.124939
-0.305708	satisfied: 1. Use a	-0.124939
-0.126492	16 3.2 Use a	-0.124939
-0.126492	improved. 3.2 Use a	-0.124939
-0.353186	doubt how big a	-0.124939
-0.329077	function names. But a	-0.124939
-0.329077	by itself. But a	-0.124939
-0.329077	its simplicity. But a	-0.124939
-0.133683	the function through a	-0.124939
-0.133683	a function through a	-0.124939
-0.046349	child class through a	-0.124939
-0.046349	generation class through a	-0.124939
-0.046349	derived class through a	-0.124939
-0.157188	accesses b through a	-0.124939
-0.133683	or object through a	-0.124939
-0.133683	an object through a	-0.124939
-0.157188	a variable through a	-0.124939
-0.157188	is called through a	-0.124939
-0.157188	own address through a	-0.124939
-0.303993	is accessed through a	-0.124939
-0.303993	necessarily accessed through a	-0.124939
-0.243117	must go through a	-0.124939
-0.157188	the GOT through a	-0.124939
-0.157188	extra jump through a	-0.124939
-0.157188	the caller through a	-0.124939
-0.157188	block. Walking through a	-0.124939
-0.157188	than looping through a	-0.124939
-0.157188	be propagated through a	-0.124939
-0.121893	a = a, a	-0.124939
-0.121893	true = a, a	-0.124939
-0.056684	-1 = a, a	-0.124939
-0.779897	possible to compile a	-0.124939
-1.067030	in a matrix a	-0.124939
-0.339222	writes to matrix a	-0.124939
-0.532896	had not been a	-0.124939
-0.357516	it may cause a	-0.124939
-0.357516	This may cause a	-0.124939
-0.248706	members may cause a	-0.124939
-0.706743	0x2710 will cause a	-0.124939
-0.457247	I have done a	-0.124939
-1.445062	It is therefore a	-0.124939
-0.359237	it is inside a	-0.124939
-0.274878	the table inside a	-0.124939
-0.073805	objects declared inside a	-0.425969
-0.073805	Variables declared inside a	-0.425969
-0.359237	is defined inside a	-0.124939
-0.411588	application that uses a	-0.124939
-0.496379	the compiler uses a	-0.124939
-0.362041	the program uses a	-0.124939
-0.252224	The program uses a	-0.124939
-0.352739	particular application uses a	-0.124939
-0.352739	This implementation uses a	-0.124939
-0.269520	it still uses a	-0.124939
-0.353475	systems". The parameters a	-0.124939
-0.362281	possible to get a	-0.124939
-0.607779	order to get a	-0.124939
-0.257078	elsewhere and get a	-0.124939
-0.051434	you may get a	-0.124939
-0.257078	Users should get a	-0.124939
-0.257078	will soon get a	-0.124939
-0.257078	inefficient, (4) get a	-0.124939
-0.874520	a; int b; a	-0.124939
-0.456709	a; double b; a	-0.124939
-0.122213	int a, b; a	-0.425969
-0.088288	double a, b; a	-0.602060
-0.036671	float a, b; a	-0.602060
-0.376321	a; bool b; a	-0.124939
-0.137241	and have implemented a	-0.124939
-0.137241	I have implemented a	-0.124939
-0.353534	comparisons. The solution a	-0.124939
-0.896370	processors that support a	-0.124939
-0.058780	the software contains a	-0.124939
-0.398043	code often contains a	-0.124939
-0.306576	the expression contains a	-0.124939
-0.353485	When considering whether a	-0.124939
-0.517537	you are doing a	-0.124939
-0.400872	programs to run a	-0.124939
-0.400872	prefer to run a	-0.124939
-0.541358	processors can calculate a	-0.124939
-0.534150	likely to inline a	-0.124939
-0.433277	size then add a	-0.124939
-0.334896	When we add a	-0.124939
-0.136178	sizeof(a)); // copy a	-0.124939
-0.136178	0.0; // copy a	-0.124939
-0.352465	best job optimizing a	-0.124939
-0.105560	It is simply a	-0.425969
-0.245308	structure is simply a	-0.124939
-0.105560	difference is simply a	-0.425969
-0.547394	i++) { ... a	-0.124939
-0.477675	may be quite a	-0.124939
-0.060118	can take quite a	-0.425969
-1.131738	registers are used. a	-0.124939
-0.782059	easier to write a	-0.124939
-0.333222	if you write a	-0.124939
-0.843113	want to optimize a	-0.124939
-0.352407	x86 CPUs. However, a	-0.124939
-0.455058	In simple cases, a	-0.124939
-0.274272	possible to replace a	-0.124939
-0.656937	compiler can replace a	-0.124939
-0.111133	you cannot replace a	-0.124939
-0.111133	You cannot replace a	-0.124939
-0.342275	can automatically replace a	-0.124939
-0.644291	64-bit Windows allows a	-0.124939
-0.330756	dispatcher then sets a	-0.124939
-0.330756	initialization routine sets a	-0.124939
-0.385523	in the expression a	-0.124939
-0.542721	while the expression a	-0.124939
-0.439784	time. The expression a	-0.124939
-0.350925	twice for handling a	-0.124939
-0.358304	simple things like a	-0.124939
-0.274109	also treated like a	-0.124939
-0.504145	that behaves like a	-0.124939
-0.274109	is expanded like a	-0.124939
-0.274109	simple actions like a	-0.124939
-0.330237	each element __m128i a	-0.124939
-0.330237	AND operations: __m128i a	-0.124939
-0.351358	} } Using a	-0.124939
-0.131552	recommended to put a	-0.124939
-0.321180	like to put a	-0.124939
-0.271537	unless you put a	-0.124939
-0.355183	threads. Don't put a	-0.124939
-0.330524	1.0f; This needs a	-0.124939
-0.330524	a loop needs a	-0.124939
-0.499572	b * c; a	-0.124939
-0.271411	b / c; a	-0.124939
-0.552128	a, b, c; a	-0.425969
-0.271411	b % c; a	-0.124939
-0.328929	called, or what a	-0.124939
-0.425806	can change what a	-0.124939
-0.351198	counters before running a	-0.124939
-0.742288	true a && a	-0.124939
-0.326427	expression b && a	-0.124939
-0.800153	a, a | a	-0.124939
-0.102187	{ // Make a	-0.602060
-0.069133	_mm_set1_epi16(0); // Make a	-0.425969
-0.151394	zero(0,0,0,0,0,0,0,0); // Make a	-0.124939
-0.209955	C++0x support. Make a	-0.124939
-0.209955	and operators. Make a	-0.124939
-0.305648	Example 7.10b char a	-0.124939
-0.305648	Example 8.17 char a	-0.124939
-0.305648	Example 7.9b char a	-0.124939
-0.531447	pointer is needed a	-0.124939
-0.324626	dispatcher should give a	-0.124939
-0.324626	can still give a	-0.124939
-0.324289	a loop becomes a	-0.124939
-0.420014	that caching becomes a	-0.124939
-0.395633	block. This requires a	-0.124939
-0.510913	two pointers requires a	-0.124939
-0.278082	such processors requires a	-0.124939
-0.278082	Event-based sampling requires a	-0.124939
-0.450145	time to load a	-0.124939
-0.450145	writes to load a	-0.124939
-0.349650	fast as calling a	-0.124939
-1.417269	following example shows a	-0.124939
-0.323683	(page 131) shows a	-0.124939
-0.308368	want to generate a	-0.124939
-0.308368	likely to generate a	-0.124939
-0.049932	0 and generate a	-0.425969
-0.611752	it will generate a	-0.124939
-0.348965	propagation and reduce a	-0.124939
-0.473770	is to choose a	-0.124939
-0.980170	You may choose a	-0.124939
-0.321159	I have made a	-0.124939
-0.321159	I once made a	-0.124939
-0.432327	classes is just a	-0.124939
-0.380464	cache in just a	-0.124939
-0.292281	even when just a	-0.124939
-0.349455	lookup or require a	-0.124939
-0.242418	does not require a	-0.124939
-0.320123	measurements may require a	-0.124939
-0.242418	global arrays require a	-0.124939
-0.242418	non-constant references require a	-0.124939
-0.491525	possible to start a	-0.124939
-0.319928	it can start a	-0.124939
-0.347577	model N supports a	-0.124939
-1.052479	number of columns a	-0.124939
-0.444198	unequally can become a	-0.124939
-0.463491	way has become a	-0.124939
-0.348507	clock. This gives a	-0.124939
-0.347572	request for inlining a	-0.124939
-0.347068	!(a || b) a	-0.124939
-0.222433	either way. Such a	-0.124939
-0.222433	soft processor. Such a	-0.124939
-0.222433	definition language. Such a	-0.124939
-0.222433	the market. Such a	-0.124939
-0.222433	computer games. Such a	-0.124939
-0.348314	preceding paragraph described a	-0.124939
-0.448600	Boolean operators produce a	-0.124939
-0.347566	you are including a	-0.124939
-0.401803	may be given a	-0.124939
-0.309620	not been given a	-0.124939
-0.346759	will make temp a	-0.124939
-0.958515	b, c, d; a	-0.124939
-0.486237	You can save a	-0.124939
-0.347559	if this prevents a	-0.124939
-0.834755	way to tell a	-0.124939
-0.193477	have to unroll a	-0.124939
-0.193477	necessary to unroll a	-0.124939
-0.193477	advantage to unroll a	-0.124939
-0.244115	will usually unroll a	-0.124939
-0.345258	faster because testing a	-0.124939
-0.325378	reading or writing a	-0.124939
-0.241615	Reading or writing a	-0.124939
-0.346117	of profiling. When a	-0.124939
-0.345258	implicitly when copying a	-0.124939
-0.365724	code for accessing a	-0.124939
-0.102153	time than accessing a	-0.124939
-0.102153	efficient than accessing a	-0.124939
-0.235973	aliasing When accessing a	-0.124939
-0.488379	must wait until a	-0.124939
-0.363420	address by adding a	-0.124939
-0.363420	calculated by adding a	-0.124939
-0.345544	mispredicted, which causes a	-0.124939
-0.944251	able to predict a	-0.124939
-0.125137	true = true a	-0.124939
-0.125137	!a = true a	-0.124939
-0.484869	debugger can execute a	-0.124939
-0.881558	specialization for N a	-0.124939
-0.552278	(or at least a	-0.124939
-0.390925	possible to insert a	-0.124939
-0.274486	Remember to insert a	-0.124939
-0.414525	memory and insert a	-0.124939
-0.123715	memory without loading a	-0.124939
-0.123715	double without loading a	-0.124939
-0.502912	variables for calculating a	-0.124939
-0.298582	to begin calculating a	-0.124939
-0.343138	time to e.g. a	-0.124939
-0.095017	> b ? a	-0.425969
-0.344485	programmer has defined a	-0.124939
-0.412626	you can expect a	-0.124939
-0.036866	You cannot expect a	-0.301030
-0.994451	is of course a	-0.124939
-0.247187	is useful whenever a	-0.124939
-0.247187	extra cost whenever a	-0.124939
-0.247187	(PLT). And whenever a	-0.124939
-0.442608	recommended to modify a	-0.124939
-0.292318	way of setting a	-0.124939
-0.461420	core by setting a	-0.124939
-0.206578	integer is within a	-0.124939
-0.206578	of zero within a	-0.124939
-0.206578	be irrelevant within a	-0.124939
-0.206578	by keys within a	-0.124939
-0.341949	member functions counts a	-0.124939
-0.292318	On many processors, a	-0.124939
-0.292318	On older processors, a	-0.124939
-0.342688	} } Obviously, a	-0.124939
-0.205490	than to allocate a	-0.124939
-0.205490	necessary to allocate a	-0.124939
-0.205490	delete to allocate a	-0.124939
-0.206039	string classes allocate a	-0.124939
-0.343428	I have added a	-0.124939
-0.344539	and they waste a	-0.124939
-0.431470	matrix // define a	-0.124939
-0.291660	you may define a	-0.124939
-0.242107	efficient to implement a	-0.124939
-0.104396	possible to implement a	-0.124939
-0.340093	data should contain a	-0.124939
-0.624554	reads or writes a	-0.124939
-0.440834	faster to transfer a	-0.124939
-0.507451	easily optimize away a	-0.124939
-0.440834	We can multiply a	-0.124939
-0.373051	This function stores a	-0.124939
-0.286222	Gnu mechanism stores a	-0.124939
-0.341729	purpose of finding a	-0.124939
-0.626917	compiler will vectorize a	-0.124939
-0.312131	be to include a	-0.124939
-0.235726	instruction sets include a	-0.124939
-0.235726	compiler packages include a	-0.124939
-0.441349	an integer addition, a	-0.124939
-0.340910	is unchanged across a	-0.124939
-0.339615	that previously required a	-0.124939
-0.696477	may slow down a	-0.124939
-0.335871	if it had a	-0.124939
-0.131897	b / 10; a	-0.124939
-0.131897	int)b / 10; a	-0.124939
-0.070359	b % 10; a	-0.124939
-0.070359	int)b % 10; a	-0.124939
-0.335871	different type. Likewise, a	-0.124939
-0.336912	manager can spend a	-0.124939
-0.996077	function is called, a	-0.124939
-0.617897	and after executing a	-0.124939
-0.335871	kind of exceptions a	-0.124939
-0.455864	time to transpose a	-0.124939
-0.323256	takes to transpose a	-0.124939
-0.333334	Repeating the break a	-0.124939
-0.333334	how to break a	-0.124939
-0.332732	base address plus a	-0.124939
-0.333937	b / 16; a	-0.124939
-0.333937	b % 16; a	-0.124939
-0.517891	enough to identify a	-0.124939
-0.333334	breakpoint and show a	-0.124939
-0.253437	but only show a	-0.124939
-0.721698	needs to evaluate a	-0.124939
-0.333334	a non-const reference, a	-0.124939
-0.470423	is only half a	-0.124939
-0.254441	used for converting a	-0.124939
-0.186840	we are converting a	-0.124939
-0.186840	is implicitly converting a	-0.124939
-0.186410	branch that follows a	-0.124939
-0.186410	if it follows a	-0.124939
-0.186410	function pointer follows a	-0.124939
-0.332732	whether to base a	-0.124939
-0.334541	lookup tables Reading a	-0.124939
-0.334541	model numbers form a	-0.124939
-0.186410	type __m128i defines a	-0.124939
-0.186410	type __m128 defines a	-0.124939
-0.186410	type __m128d defines a	-0.124939
-0.332732	* c; Is16vec8 a	-0.124939
-0.186410	or structures. Accessing a	-0.124939
-0.186410	more compact. Accessing a	-0.124939
-0.186410	variable. Efficiency Accessing a	-0.124939
-0.062283	takes to install a	-0.124939
-0.159302	user must install a	-0.124939
-0.072387	framework can consume a	-0.124939
-0.072387	database can consume a	-0.124939
-0.159302	and functions consume a	-0.124939
-0.922668	the other hand, a	-0.124939
-0.329089	program that created a	-0.124939
-0.330524	case labels follow a	-0.124939
-0.235662	use and returns a	-0.124939
-0.235662	function which returns a	-0.124939
-0.603462	efficient solution. Is a	-0.124939
-0.329806	computing resources. Typically, a	-0.124939
-0.329806	-1 = ~a a	-0.124939
-0.329806	Here we prefer a	-0.124939
-0.418158	a loop repeats a	-0.124939
-0.417061	overcome by defining a	-0.124939
-0.179084	code that produces a	-0.124939
-0.056545	unsigned variable produces a	-0.124939
-0.056545	signed variable produces a	-0.124939
-0.418158	and calculate *p+2 a	-0.124939
-0.282776	obtained by choosing a	-0.124939
-0.210974	account when choosing a	-0.124939
-0.454048	strategy for saving a	-0.124939
-0.210292	never used. Whenever a	-0.124939
-0.210292	is better. Whenever a	-0.124939
-0.321918	int Sum1() {return a	-0.124939
-0.210292	Function pointers Calling a	-0.124939
-0.210292	the class. Calling a	-0.124939
-0.322799	You can force a	-0.124939
-0.418158	type, a pointer, a	-0.124939
-0.322799	a function opens a	-0.124939
-0.403964	syntax may seem a	-0.124939
-0.311367	framework still consumes a	-0.124939
-0.311367	the compiler optimizes a	-0.124939
-0.439806	a[i]; // Return a	-0.124939
-0.312514	// Example 7.2 a	-0.124939
-0.311367	some cases ignore a	-0.124939
-0.236047	may be considered a	-0.124939
-0.171049	not traditionally considered a	-0.124939
-0.312514	addresses are spaced a	-0.124939
-0.341319	used for implementing a	-0.124939
-0.171049	themselves. But implementing a	-0.124939
-0.311367	(Linux only). Specifies a	-0.124939
-0.311367	here. It reveals a	-0.124939
-0.311367	instruction set (requires a	-0.124939
-0.311367	is float 140 a	-0.124939
-0.311367	program should leave a	-0.124939
-0.171049	thousand numbers. With a	-0.124939
-0.171049	next step. With a	-0.124939
-0.312514	length function scans a	-0.124939
-0.311367	can easily justify a	-0.124939
-0.171049	integer that holds a	-0.124939
-0.171049	float type holds a	-0.124939
-0.311367	has two arrays, a	-0.124939
-0.077157	false = false, a	-0.124939
-0.077157	!a = false, a	-0.124939
-0.171049	remaining bits represent a	-0.124939
-0.171049	to truly represent a	-0.124939
-0.171049	Instead of returning a	-0.124939
-0.171049	manner by returning a	-0.124939
-0.312514	cleanup before terminating a	-0.124939
-0.439806	space by joining a	-0.124939
-0.439806	seen, is certainly a	-0.124939
-0.403964	8.21 is indeed a	-0.124939
-0.311367	or NAN (not a	-0.124939
-0.311367	where it expects a	-0.124939
-0.311367	compilers can compute a	-0.124939
-0.291004	add, etc. SSSE3 a	-0.124939
-0.291004	Intel mechanism executes a	-0.124939
-0.291004	I have developed a	-0.124939
-0.378899	cost of keeping a	-0.124939
-0.291004	library can emulate a	-0.124939
-0.291004	function inline. Replacing a	-0.124939
-0.291004	language Before starting a	-0.124939
-0.291004	and only if, a	-0.124939
-0.291004	and drivers differ a	-0.124939
-0.101826	response to pressing a	-0.124939
-0.101826	tasks like pressing a	-0.124939
-0.378899	verifying and maintaining a	-0.124939
-0.291004	{ b.load(bb+i); c.load(cc+i); a	-0.124939
-0.291004	the CPU. Unrolling a	-0.124939
-0.291004	time and afterwards a	-0.124939
-0.101826	need to lock a	-0.124939
-0.101826	to temporarily lock a	-0.124939
-0.291004	their implementations reveal a	-0.124939
-0.533240	x, y, z; a	-0.124939
-0.101826	function which transposes a	-0.124939
-0.101826	following example transposes a	-0.124939
-0.378899	{ // Returns a	-0.124939
-0.291004	when it sees a	-0.124939
-0.291004	the compiler treat a	-0.124939
-0.291004	to make log2 a	-0.124939
-0.291004	but for studying a	-0.124939
-0.378899	fact by replacing a	-0.124939
-0.291004	fragmented and involve a	-0.124939
-0.291004	a simple type, a	-0.124939
-0.291004	or 1. Writing a	-0.124939
-0.291004	this by assigning a	-0.124939
-0.291004	way of relieving a	-0.124939
-0.235084	irrelevant software installed, a	-0.124939
-0.235084	syntax 90 Gives a	-0.124939
-0.235084	efficient to re-use a	-0.124939
-0.235084	class definition. Inlining a	-0.124939
-0.235084	operations for incrementing a	-0.124939
-0.235084	Incrementing or decrementing a	-0.124939
-0.235084	memory by requesting a	-0.124939
-0.235084	call statement occupies a	-0.124939
-0.235084	see below. Installing a	-0.124939
-0.235084	a program executable: a	-0.124939
-0.235084	a&b&c&d = (a&b)&(c&d) a	-0.124939
-0.235084	} } Transposing a	-0.124939
-0.235084	// Example 8.5b a	-0.124939
-0.235084	not modified. Unlike a	-0.124939
-0.235084	it and create a	-0.124939
-0.235084	of calculations forms a	-0.124939
-0.235084	faster to compose a	-0.124939
-0.235084	function that draws a	-0.124939
-0.235084	whenever it feeds a	-0.124939
-0.235084	// Example 8.2b a	-0.124939
-0.235084	// Example 8.3b a	-0.124939
-0.235084	each bit indicates a	-0.124939
-0.235084	solution can incur a	-0.124939
-0.235084	however, to pass a	-0.124939
-0.235084	has to reinstall a	-0.124939
-0.235084	The function rounds a	-0.124939
-0.235084	// Example 8.10b a	-0.124939
-0.235084	used for fetching a	-0.124939
-0.235084	spend on redesigning a	-0.124939
-0.235084	= {2.6f, 1.5f}; a	-0.124939
-0.235084	following example converts a	-0.124939
-0.235084	b = MultiplyBy<8>(10); a	-0.124939
-0.235084	rather than isolating a	-0.124939
-0.235084	a & a= a	-0.124939
-0.235084	= {1.0f, 2.5f}; a	-0.124939
-0.235084	overhead of managing a	-0.124939
-0.235084	---xx---- a<<b<<c=a<<(b+c) x-xxx--xx a	-0.124939
-0.235084	develop and publish a	-0.124939
-0.235084	x-xxxx--x ~a&~b=~(a|b) --xxxx--- a	-0.124939
-0.235084	amount of RAM, a	-0.124939
-0.235084	time you activate a	-0.124939
-0.235084	program of occupying a	-0.124939
-0.582950	} This is of	-0.124939
-0.582950	objects. This is of	-0.124939
-0.356785	already works is of	-0.124939
-0.356785	and animations is of	-0.124939
-0.357594	speed exceeding that of	-0.124939
-0.595277	table may be of	-0.124939
-0.593404	b should be of	-0.124939
-0.460956	calls. These are of	-0.124939
-0.065441	a table // of	-0.425969
-0.589208	whenever a function of	-0.124939
-0.064503	a linear function of	-0.124939
-1.009119	CPU detection function of	-0.124939
-0.348797	monotonically increasing function of	-0.124939
-0.140285	Library exp function of	-0.124939
-0.140285	floats exp function of	-0.124939
-0.348797	a staircase function of	-0.124939
-0.597984	prefetching the code of	-0.124939
-1.577688	then you may of	-0.124939
-0.566133	at the time of	-0.301030
-0.496683	estimated calculation time of	-0.124939
-0.018849	that the use of	-0.124939
-0.006193	by the use of	-0.124939
-0.009322	with the use of	-0.124939
-0.018849	makes the use of	-0.124939
-0.018849	prevents the use of	-0.124939
-0.018849	Avoid the use of	-0.124939
-0.009322	economize the use of	-0.124939
-0.018849	Especially the use of	-0.124939
-0.059195	program. The use of	-0.124939
-0.059195	purposes. The use of	-0.124939
-0.059195	vector. The use of	-0.124939
-0.059195	threads. The use of	-0.124939
-0.309523	can make use of	-0.124939
-0.309523	to efficient use of	-0.124939
-0.309523	avoid any use of	-0.124939
-0.309523	make better use of	-0.124939
-0.309523	The explicit use of	-0.124939
-0.309523	space. Excessive use of	-0.124939
-1.133905	one or more of	-0.124939
-1.804693	of the program of	-0.124939
-0.088381	as a vector of	-0.124939
-0.352391	into a vector of	-0.124939
-0.027509	Make a vector of	-0.602060
-0.229376	128 bit vector of	-0.124939
-0.443654	non-AVX code because of	-0.124939
-0.487411	compile time because of	-0.124939
-0.443654	< b because of	-0.124939
-0.314226	register variables because of	-0.124939
-0.314226	some systems because of	-0.124939
-0.443654	subsequent times because of	-0.124939
-0.314226	half speed because of	-0.124939
-0.314226	function parameters because of	-0.124939
-0.407506	efficient solution because of	-0.124939
-0.314226	than intended because of	-0.124939
-0.574652	be avoided because of	-0.124939
-0.780258	is inefficient because of	-0.124939
-0.314226	hard disk because of	-0.124939
-0.314226	are preferred because of	-0.124939
-0.314226	fail completely because of	-0.124939
-0.170213	the member functions of	-0.124939
-0.454070	The member functions of	-0.124939
-0.553715	give a CPU of	-0.124939
-0.647938	the newest CPU of	-0.124939
-0.352695	a possible point of	-0.124939
-0.352695	a technological point of	-0.124939
-1.123212	in a loop of	-0.124939
-0.312396	critical innermost loop of	-0.425969
-0.447397	the message loop of	-0.124939
-0.455488	is discussed which of	-0.124939
-0.352512	in advance which of	-0.124939
-0.539689	access to all of	-0.124939
-0.543397	efficient if all of	-0.124939
-0.092928	This is one of	-0.425969
-0.211311	Polymorphism is one of	-0.124939
-0.310038	pointer to one of	-0.124939
-0.211311	identical to one of	-0.124939
-0.211311	belong to one of	-0.124939
-0.529164	example, only one of	-0.124939
-0.360004	read into one of	-0.124939
-0.360004	0x273F into one of	-0.124939
-0.316149	may choose one of	-0.124939
-0.316149	line. Only one of	-0.124939
-0.316149	for signifying one of	-0.124939
-0.558239	is a cache of	-0.124939
-0.922051	level-1 data cache of	-0.124939
-0.824195	a level-2 cache of	-0.124939
-0.460943	NULL. There should of	-0.124939
-0.584065	declaring an integer of	-0.124939
-0.059544	use a set of	-0.124939
-0.312012	calculate which set of	-0.124939
-0.312012	commonly used set of	-0.124939
-0.312012	is one set of	-0.124939
-0.312012	for each set of	-0.124939
-0.312012	a particular set of	-0.124939
-0.312012	its own set of	-0.124939
-0.312012	a typical set of	-0.124939
-0.312012	a suitable set of	-0.124939
-0.059544	a realistic set of	-0.124939
-0.563025	if the class of	-0.124939
-0.351248	know what class of	-0.124939
-0.351306	structure or each of	-0.124939
-0.453961	and after each of	-0.124939
-0.442216	at the example of	-0.124939
-0.075562	for an example of	-0.425969
-0.167098	shows an example of	-0.124939
-0.306578	object where most of	-0.124939
-0.306578	one way most of	-0.124939
-0.306578	Windows, while most of	-0.124939
-0.306578	versa. But most of	-0.124939
-0.306578	application uses most of	-0.124939
-0.306578	to run most of	-0.124939
-0.306578	are predicted most of	-0.124939
-0.126780	programs spend most of	-0.124939
-0.126780	applications spend most of	-0.124939
-0.306578	software runs most of	-0.124939
-0.306578	can obtain most of	-0.124939
-0.306578	that consumes most of	-0.124939
-0.306578	153 spends most of	-0.124939
-0.093083	of the size of	-0.124939
-0.093083	for the size of	-0.124939
-0.233403	if the size of	-0.124939
-0.103444	by the size of	-0.425969
-0.093083	as the size of	-0.124939
-0.028866	when the size of	-0.301030
-0.093083	because the size of	-0.124939
-0.093083	If the size of	-0.124939
-0.166165	where the size of	-0.124939
-0.093083	example, the size of	-0.124939
-0.093083	unless the size of	-0.124939
-0.093083	fit the size of	-0.124939
-0.093083	increase the size of	-0.124939
-0.093083	half the size of	-0.124939
-0.093083	Return the size of	-0.124939
-0.093083	increases the size of	-0.124939
-0.049373	efficient. The size of	-0.124939
-0.105087	units. The size of	-0.124939
-0.105087	elements. The size of	-0.124939
-0.105087	module. The size of	-0.124939
-0.105087	reasons: The size of	-0.124939
-0.105087	i. The size of	-0.124939
-0.117458	a line size of	-0.124939
-0.219452	The maximum size of	-0.124939
-0.219452	// Define size of	-0.124939
-0.219452	The total size of	-0.124939
-0.045357	the combined size of	-0.124939
-0.045357	elements Total size of	-0.425969
-1.349063	to a pointer of	-0.124939
-0.523485	example, a library of	-0.124939
-0.005000	is a multiple of	-0.823909
-0.012610	by a multiple of	-0.425969
-0.025596	size a multiple of	-0.124939
-0.025596	spaced a multiple of	-0.124939
-0.526047	where the object of	-0.124939
-0.115376	to an object of	-0.124939
-0.115376	in an object of	-0.425969
-0.272950	on an object of	-0.124939
-0.272950	as an object of	-0.124939
-0.388918	accessing an object of	-0.124939
-0.557778	a new object of	-0.124939
-0.220964	declared. An object of	-0.124939
-0.220964	Inheritance An object of	-0.124939
-0.006968	is the number of	-0.425969
-0.014050	to the number of	-0.124939
-0.004633	and the number of	-0.124939
-0.006968	that the number of	-0.124939
-0.014050	or the number of	-0.124939
-0.003470	if the number of	-0.249877
-0.006968	by the number of	-0.124939
-0.014050	on the number of	-0.124939
-0.014050	as the number of	-0.124939
-0.006968	than the number of	-0.124939
-0.006968	when the number of	-0.124939
-0.014050	make the number of	-0.124939
-0.023319	If the number of	-0.221849
-0.014050	double the number of	-0.124939
-0.004633	where the number of	-0.124939
-0.014050	between the number of	-0.124939
-0.014050	making the number of	-0.124939
-0.014050	Therefore, the number of	-0.124939
-0.014050	reduce the number of	-0.124939
-0.014050	reducing the number of	-0.124939
-0.014050	measures the number of	-0.124939
-0.044941	in a number of	-0.124939
-0.021889	are a number of	-0.124939
-0.044941	from a number of	-0.124939
-0.009535	program. The number of	-0.124939
-0.009535	available. The number of	-0.124939
-0.009535	system. The number of	-0.124939
-0.009535	systems: The number of	-0.124939
-0.009535	8. The number of	-0.124939
-0.009535	small. The number of	-0.124939
-0.009535	27 The number of	-0.124939
-0.011145	512; // number of	-0.425969
-0.022584	64; // number of	-0.124939
-0.122493	use this number of	-0.124939
-0.034331	a variable number of	-0.124939
-0.034331	A variable number of	-0.124939
-0.071612	very large number of	-0.124939
-0.071612	The optimal number of	-0.124939
-0.034331	a limited number of	-0.124939
-0.034331	A limited number of	-0.124939
-0.007398	The maximum number of	-0.124939
-0.071612	a reduced number of	-0.124939
-0.007398	the total number of	-0.301030
-0.071612	a realistic number of	-0.124939
-0.007398	an excessive number of	-0.124939
-0.071612	the 107 number of	-0.124939
-0.016827	an increasing number of	-0.124939
-0.071612	an extended number of	-0.124939
-0.071612	lookups Max. number of	-0.124939
-0.071612	an integral number of	-0.124939
-0.615687	to an array of	-0.124939
-0.697645	as an array of	-0.124939
-0.435006	feeding an array of	-0.124939
-0.332231	Make dynamic array of	-0.124939
-0.332231	// Make array of	-0.124939
-0.893236	libraries for many of	-0.124939
-0.532997	IDE with many of	-0.124939
-0.388823	D has many of	-0.124939
-0.388823	Pascal has many of	-0.124939
-0.331474	and avoids many of	-0.124939
-0.184015	the same version of	-0.124939
-0.083891	time which version of	-0.124939
-0.083891	known which version of	-0.124939
-0.083891	testing which version of	-0.124939
-0.083891	certainty which version of	-0.124939
-0.276040	of each version of	-0.124939
-0.184015	best possible version of	-0.124939
-0.410063	the new version of	-0.124939
-0.339853	the SSE2 version of	-0.124939
-0.184015	// specific version of	-0.124939
-0.360649	the optimized version of	-0.124939
-0.184015	the optimal version of	-0.124939
-0.184015	a better version of	-0.124939
-0.184015	years old version of	-0.124939
-0.146223	the appropriate version of	-0.726999
-0.114984	The appropriate version of	-0.124939
-0.360649	the desired version of	-0.124939
-0.016995	the right version of	-0.602060
-0.184015	the final version of	-0.124939
-0.184015	a future version of	-0.124939
-0.184015	a newer version of	-0.124939
-0.184015	the interpreted version of	-0.124939
-0.188954	17 debug version of	-0.124939
-0.188954	Uses debug version of	-0.124939
-0.016995	the latest version of	-0.301030
-0.184015	most popular version of	-0.124939
-0.251142	An inferior version of	-0.124939
-0.184015	A command-line version of	-0.124939
-0.360649	a release version of	-0.124939
-0.257405	of the value of	-0.124939
-0.149499	that the value of	-0.249877
-0.028925	if the value of	-0.346788
-0.168876	If the value of	-0.124939
-0.168876	so the value of	-0.124939
-0.168876	sure the value of	-0.124939
-0.257405	calculate the value of	-0.124939
-0.168876	after the value of	-0.124939
-0.076281	read the value of	-0.124939
-0.168876	know the value of	-0.124939
-0.168876	change the value of	-0.124939
-0.168876	gives the value of	-0.124939
-0.168876	hold the value of	-0.124939
-0.168876	restores the value of	-0.124939
-0.103623	explanation. The value of	-0.124939
-0.103623	counts. The value of	-0.124939
-0.103623	false. The value of	-0.124939
-0.174440	each different value of	-0.124939
-0.174440	the integer value of	-0.124939
-0.222327	and each value of	-0.124939
-0.222327	because each value of	-0.124939
-0.078520	the new value of	-0.124939
-0.078520	a new value of	-0.124939
-0.174440	the binary value of	-0.124939
-0.174440	possible negative value of	-0.124939
-0.174440	the preceding value of	-0.124939
-0.174440	the final value of	-0.124939
-0.137880	the absolute value of	-0.425969
-0.174440	the initial value of	-0.124939
-0.102036	fragmented when objects of	-0.425969
-0.331999	to store objects of	-0.124939
-0.331999	and similar objects of	-0.124939
-0.331999	void. Returning objects of	-0.124939
-0.405737	speed in any of	-0.124939
-0.405737	true, if any of	-0.124939
-0.492898	bypassed by any of	-0.124939
-0.312799	can use any of	-0.124939
-0.128826	units. If any of	-0.124939
-0.128826	(RTTI) If any of	-0.124939
-0.312799	used, but any of	-0.124939
-0.441733	microprocessors without any of	-0.124939
-0.440428	care of some of	-0.124939
-0.459551	programmers to some of	-0.124939
-0.459551	mechanisms, and some of	-0.124939
-0.440428	comes with some of	-0.124939
-0.311829	have described some of	-0.124939
-0.311829	common. Even some of	-0.124939
-0.311829	others. While some of	-0.124939
-0.311829	sections describe some of	-0.124939
-0.280879	to a table of	-0.124939
-0.280879	and a table of	-0.124939
-0.280879	as a table of	-0.124939
-0.399303	from a table of	-0.124939
-0.280879	has a table of	-0.124939
-0.481587	104). The table of	-0.124939
-0.309336	{ // table of	-0.124939
-0.309336	to make table of	-0.124939
-0.381334	with the performance of	-0.124939
-0.381334	where the performance of	-0.124939
-0.381334	about the performance of	-0.124939
-0.381334	compare the performance of	-0.124939
-0.381334	influence the performance of	-0.124939
-0.309852	2.1. Comparing performance of	-0.124939
-0.309852	the overall performance of	-0.124939
-0.309852	The benchmark performance of	-0.124939
-0.222518	that the order of	-0.124939
-0.222518	check the order of	-0.124939
-0.222518	change the order of	-0.124939
-0.062302	swap the order of	-0.602060
-0.222518	swapping the order of	-0.124939
-0.216921	parameter. The order of	-0.124939
-0.216921	Booleans The order of	-0.124939
-0.297552	The opposite order of	-0.124939
-0.625189	a new branch of	-0.124939
-0.341241	each particular branch of	-0.124939
-0.341241	the dispatch branch of	-0.124939
-0.302749	that is member of	-0.124939
-0.293146	be a member of	-0.124939
-0.122295	function a member of	-0.425969
-0.293146	accessing a member of	-0.124939
-0.293146	Accessing a member of	-0.124939
-0.841074	a data member of	-0.124939
-0.393324	the polymorphic member of	-0.124939
-0.302749	other (not member of	-0.124939
-0.566449	in the way of	-0.124939
-0.358496	Unfortunately, the way of	-0.124939
-0.716095	is a way of	-0.124939
-0.274124	The C++ way of	-0.124939
-0.203793	an efficient way of	-0.124939
-0.203793	very efficient way of	-0.124939
-0.358321	a useful way of	-0.124939
-0.274124	A simple way of	-0.124939
-0.274124	a common way of	-0.124939
-0.358321	a good way of	-0.124939
-0.274124	a convenient way of	-0.124939
-0.274124	but efficient, way of	-0.124939
-0.274124	a portable way of	-0.124939
-0.532238	adds the elements of	-0.124939
-0.460101	on all elements of	-0.124939
-0.480131	of array elements of	-0.124939
-0.326374	read four elements of	-0.124939
-0.326374	with N elements of	-0.124939
-0.064316	of the address of	-0.124939
-0.020398	to the address of	-0.602060
-0.064316	that the address of	-0.124939
-0.030968	up the address of	-0.425969
-0.015208	contains the address of	-0.726999
-0.030968	calculate the address of	-0.124939
-0.064316	simply the address of	-0.124939
-0.064316	unless the address of	-0.124939
-0.064316	Here, the address of	-0.124939
-0.064316	find the address of	-0.124939
-0.030968	calculating the address of	-0.425969
-0.064316	tells the address of	-0.124939
-0.064316	So the address of	-0.124939
-0.091271	address. The address of	-0.124939
-0.091271	here. The address of	-0.124939
-0.206970	the return address of	-0.124939
-0.651856	on every call of	-0.124939
-0.449494	if each bit of	-0.124939
-0.252473	the sign bit of	-0.249877
-0.317434	set sign bit of	-0.124939
-0.217194	down sign bit of	-0.124939
-0.217194	Set sign bit of	-0.124939
-0.217194	flip sign bit of	-0.124939
-0.287938	least significant bit of	-0.124939
-0.511233	when the optimization of	-0.124939
-0.084623	advices on optimization of	-0.124939
-0.084623	book on optimization of	-0.124939
-0.084623	Advices on optimization of	-0.124939
-0.354231	_mm_add_epi16(a,b). Two libraries of	-0.124939
-0.486220	specifying that pointers of	-0.124939
-0.492439	that two pointers of	-0.124939
-0.354540	multiple versions even of	-0.124939
-0.435128	shows, the method of	-0.124939
-0.389135	back. The method of	-0.124939
-0.389135	count. The method of	-0.124939
-0.307882	A newer method of	-0.124939
-0.307882	old C-style method of	-0.124939
-0.307882	The original method of	-0.124939
-0.017934	index is out of	-0.301030
-0.196597	languages are out of	-0.124939
-0.196597	simultaneously or out of	-0.124939
-0.196597	0 if out of	-0.124939
-0.196597	is not out of	-0.124939
-0.087268	execute instructions out of	-0.124939
-0.087268	executing instructions out of	-0.124939
-0.196597	the conversions out of	-0.124939
-0.196597	FatalAppExitA(0,"Array index out of	-0.124939
-0.056189	// Index out of	-0.124939
-0.027186	"Error: Index out of	-0.425969
-0.041446	be moved out of	-0.124939
-0.196597	n being out of	-0.124939
-0.087268	for jumping out of	-0.124939
-0.087268	after jumping out of	-0.124939
-0.196597	are breaking out of	-0.124939
-0.353284	a zip file of	-0.124939
-0.026622	only the part of	-0.124939
-0.013107	that is part of	-0.124939
-0.013107	parameter is part of	-0.124939
-0.013107	is a part of	-0.124939
-0.013107	often a part of	-0.124939
-0.026622	(Darwin) are part of	-0.124939
-0.026622	included as part of	-0.124939
-0.026622	is not part of	-0.124939
-0.013107	that this part of	-0.124939
-0.013107	on this part of	-0.124939
-0.026622	time. A part of	-0.124939
-0.002879	the same part of	-0.602060
-0.026622	cases: If part of	-0.124939
-0.026622	see which part of	-0.124939
-0.026622	reasons, but part of	-0.124939
-0.046791	in each part of	-0.124939
-0.046791	times each part of	-0.124939
-0.026622	a static part of	-0.124939
-0.026622	include any part of	-0.124939
-0.018120	the critical part of	-0.602060
-0.007899	a critical part of	-0.124939
-0.007899	same critical part of	-0.124939
-0.000980	most critical part of	-0.726999
-0.026622	you access part of	-0.124939
-0.026622	an important part of	-0.124939
-0.026622	a large part of	-0.124939
-0.026622	a small part of	-0.124939
-0.026622	The optimized part of	-0.124939
-0.026622	and another part of	-0.124939
-0.026622	a particular part of	-0.124939
-0.026622	most significant part of	-0.124939
-0.026622	most time-consuming part of	-0.124939
-0.026622	program (or part of	-0.124939
-0.026622	the time-critical part of	-0.124939
-0.026622	the task-specific part of	-0.124939
-0.201625	all the bits of	-0.124939
-0.201625	interpret the bits of	-0.124939
-0.252791	or 16 bits of	-0.124939
-0.252791	lower 16 bits of	-0.124939
-0.257085	example 32 bits of	-0.124939
-0.257085	accessing 32 bits of	-0.124939
-0.109791	upper 32 bits of	-0.124939
-0.270307	significant n bits of	-0.124939
-0.270307	the individual bits of	-0.124939
-0.846306	the vector operations of	-0.124939
-0.203832	of the type of	-0.124939
-0.203832	and the type of	-0.124939
-0.203832	on the type of	-0.124939
-0.203832	where the type of	-0.124939
-0.203832	Re-interpreting the type of	-0.124939
-0.258983	size and type of	-0.124939
-0.387000	declaration. The type of	-0.124939
-0.258983	for each type of	-0.124939
-0.258983	with any type of	-0.124939
-0.258983	The return type of	-0.124939
-0.258983	the appropriate type of	-0.124939
-0.048345	In the case of	-0.249877
-0.180000	function in case of	-0.124939
-0.104747	time in case of	-0.124939
-0.104747	program in case of	-0.124939
-0.104747	way in case of	-0.124939
-0.049223	up in case of	-0.425969
-0.049223	exception in case of	-0.124939
-0.104747	integers in case of	-0.124939
-0.104747	numbers in case of	-0.124939
-0.104747	everything in case of	-0.124939
-0.104747	justified in case of	-0.124939
-0.323540	all possible cases of	-0.124939
-0.061137	7.31 Other cases of	-0.425969
-0.323540	some rare cases of	-0.124939
-0.353802	Xnu project. Some of	-0.124939
-0.426794	applies to arrays of	-0.124939
-0.329719	can make arrays of	-0.124939
-0.329719	a few arrays of	-0.124939
-0.522846	overlap the calculations of	-0.124939
-0.340775	the necessary calculations of	-0.124939
-0.033295	or more versions of	-0.425969
-0.096667	the different versions of	-0.425969
-0.039607	The different versions of	-0.124939
-0.083193	with different versions of	-0.124939
-0.083193	If different versions of	-0.124939
-0.083193	two different versions of	-0.124939
-0.336297	make multiple versions of	-0.124939
-0.232101	making multiple versions of	-0.124939
-0.232101	generate multiple versions of	-0.124939
-0.130356	are two versions of	-0.124939
-0.130356	make two versions of	-0.124939
-0.151932	advertise new versions of	-0.124939
-0.151932	Hyperthreading Some versions of	-0.124939
-0.151932	includes optimized versions of	-0.124939
-0.151932	make special versions of	-0.124939
-0.151932	in newer versions of	-0.124939
-0.151932	The latest versions of	-0.124939
-0.151932	the CPU-specific versions of	-0.124939
-0.151932	necessary. Fast versions of	-0.124939
-0.158011	block the execution of	-0.425969
-0.329816	occurs during execution of	-0.124939
-0.282855	for the result of	-0.124939
-0.075485	on the result of	-0.602060
-0.282855	when the result of	-0.124939
-0.282855	see the result of	-0.124939
-0.282855	needs the result of	-0.124939
-0.081574	as a result of	-0.124939
-0.258446	fast. The result of	-0.124939
-0.258446	<. The result of	-0.124939
-0.442176	the intermediate result of	-0.124939
-0.534918	takes 8 bytes of	-0.124939
-0.328876	64 consecutive bytes of	-0.124939
-0.328876	than 65 bytes of	-0.124939
-0.894065	the first element of	-0.124939
-0.186506	than the speed of	-0.124939
-0.186506	while the speed of	-0.124939
-0.186506	improve the speed of	-0.124939
-0.186506	Testing the speed of	-0.124939
-0.186506	measures the speed of	-0.124939
-0.429966	data. The speed of	-0.124939
-0.282692	The high speed of	-0.124939
-0.353502	that do much of	-0.124939
-0.352016	memory ports, etc. of	-0.124939
-0.429455	check for overflow of	-0.425969
-0.313648	result. An overflow of	-0.124939
-0.313648	A positive overflow of	-0.124939
-0.376692	types to integers of	-0.124939
-0.376692	or two integers of	-0.124939
-0.289200	each, four integers of	-0.124939
-0.289200	each, eight integers of	-0.124939
-0.289200	cannot multiply integers of	-0.124939
-0.289200	either sixteen integers of	-0.124939
-0.012782	to the power of	-0.301030
-0.000078	is a power of	-0.726999
-0.000311	be a power of	-0.425969
-0.000415	by a power of	-0.602060
-0.001247	not a power of	-0.124939
-0.001247	matrix a power of	-0.124939
-0.001247	been a power of	-0.124939
-0.001247	columns a power of	-0.124939
-0.001247	N a power of	-0.124939
-0.008804	a high power of	-0.425969
-0.036339	high processing power of	-0.124939
-0.036339	the computational power of	-0.124939
-0.132905	a common cause of	-0.124939
-0.132905	most common cause of	-0.124939
-0.325381	a frequent cause of	-0.124939
-0.352251	holds a precision of	-0.124939
-0.072573	is the calculation of	-0.124939
-0.034772	and the calculation of	-0.124939
-0.072573	in the calculation of	-0.124939
-0.072573	for the calculation of	-0.124939
-0.072573	efficient the calculation of	-0.124939
-0.072573	before the calculation of	-0.124939
-0.072573	out the calculation of	-0.124939
-0.072573	up the calculation of	-0.124939
-0.072573	start the calculation of	-0.124939
-0.072573	specifies the calculation of	-0.124939
-0.072573	case, the calculation of	-0.124939
-0.072573	begin the calculation of	-0.124939
-0.072573	finished the calculation of	-0.124939
-0.103292	calls. The calculation of	-0.124939
-0.103292	polynomial The calculation of	-0.124939
-0.103292	127. The calculation of	-0.124939
-0.103292	supported. The calculation of	-0.124939
-0.337197	if the uses of	-0.124939
-0.337197	Four typical uses of	-0.124939
-0.518730	for the parameters of	-0.124939
-0.352670	The worst problem of	-0.124939
-0.436842	The alternative solution of	-0.124939
-0.337735	The radical solution of	-0.124939
-0.266957	gives the advantage of	-0.124939
-0.012112	} The advantage of	-0.124939
-0.012112	program. The advantage of	-0.124939
-0.012112	cache. The advantage of	-0.124939
-0.012112	compilers. The advantage of	-0.124939
-0.012112	enabled. The advantage of	-0.124939
-0.012112	faster. The advantage of	-0.124939
-0.012112	iterations. The advantage of	-0.124939
-0.012112	m. The advantage of	-0.124939
-0.107926	that takes advantage of	-0.124939
-0.006014	to take advantage of	-0.124939
-0.001995	can take advantage of	-0.301030
-0.107926	The main advantage of	-0.124939
-0.107926	take maximum advantage of	-0.124939
-0.107926	the full advantage of	-0.124939
-0.107926	We took advantage of	-0.124939
-0.352538	system for support of	-0.124939
-0.351264	where only few of	-0.124939
-0.084584	is a list of	-0.124939
-0.189727	for a list of	-0.124939
-0.214873	set?". A list of	-0.124939
-0.214873	a long list of	-0.124939
-0.019254	a negative list of	-0.124939
-0.153947	a positive list of	-0.124939
-0.214873	the smallest list of	-0.124939
-0.352428	to 15.1c would of	-0.124939
-0.320443	floats A structure of	-0.124939
-0.320443	the whole structure of	-0.124939
-0.320443	the logic structure of	-0.124939
-0.074411	that the values of	-0.124939
-0.164262	on the values of	-0.124939
-0.164262	make the values of	-0.124939
-0.164262	show the values of	-0.124939
-0.422930	}; The values of	-0.124939
-0.333349	message systems. All of	-0.124939
-0.333349	formats. Comments All of	-0.124939
-0.500096	test the sign of	-0.124939
-0.500096	change the sign of	-0.124939
-0.373973	use the copy of	-0.124939
-0.286976	an unused copy of	-0.124939
-0.212973	a non-inlined copy of	-0.425969
-0.286976	a backup copy of	-0.124939
-0.169254	calculate the addresses of	-0.124939
-0.169254	control the addresses of	-0.124939
-0.169254	includes the addresses of	-0.124939
-0.169254	calculating the addresses of	-0.124939
-0.430712	object. The allocation of	-0.124939
-0.332849	it involves allocation of	-0.124939
-0.501030	with the problems of	-0.124939
-0.332100	susceptible to problems of	-0.124939
-0.350932	larger address space of	-0.124939
-0.037663	if a lot of	-0.124939
-0.037663	with a lot of	-0.124939
-0.037663	use a lot of	-0.124939
-0.012195	do a lot of	-0.124939
-0.018423	take a lot of	-0.124939
-0.037663	cause a lot of	-0.124939
-0.018423	uses a lot of	-0.124939
-0.037663	get a lot of	-0.124939
-0.037663	contains a lot of	-0.124939
-0.037663	require a lot of	-0.124939
-0.037663	save a lot of	-0.124939
-0.037663	waste a lot of	-0.124939
-0.037663	spend a lot of	-0.124939
-0.018423	consume a lot of	-0.124939
-0.037663	consumes a lot of	-0.124939
-0.037663	installed, a lot of	-0.124939
-0.037663	RAM, a lot of	-0.124939
-0.021116	functions. A lot of	-0.124939
-0.021116	vectors. A lot of	-0.124939
-0.545845	of the multiplication of	-0.124939
-0.256785	standard. An implementation of	-0.124939
-0.337361	a good implementation of	-0.124939
-0.409029	a hardware implementation of	-0.124939
-0.193877	a complicated implementation of	-0.124939
-0.193877	most complicated implementation of	-0.124939
-0.256785	A typical implementation of	-0.124939
-0.311101	procedure 4 Most of	-0.124939
-0.311101	interface framework Most of	-0.124939
-0.311101	limited resources. Most of	-0.124939
-0.279724	containing the members of	-0.124939
-0.091812	that are members of	-0.124939
-0.091812	they are members of	-0.124939
-0.208386	class with members of	-0.124939
-0.378919	The data members of	-0.124939
-0.208386	saved variable members of	-0.124939
-0.091812	class. Data members of	-0.124939
-0.091812	together. Data members of	-0.124939
-0.208386	instance. Non-static members of	-0.124939
-0.350155	than other methods of	-0.124939
-0.425651	during the development of	-0.124939
-0.328804	and easy development of	-0.124939
-0.308622	within a block of	-0.124939
-0.308622	one big block of	-0.124939
-0.308622	its own block of	-0.124939
-0.483348	is the name of	-0.124939
-0.425432	www.agner.org/optimize/asmlib.zip. The name of	-0.124939
-0.351627	of the needs of	-0.124939
-0.515226	i++)a[i]=2*i; The conversion of	-0.124939
-0.367238	have the disadvantage of	-0.124939
-0.171413	below. The disadvantage of	-0.124939
-0.171413	called. The disadvantage of	-0.124939
-0.171413	Windows. The disadvantage of	-0.124939
-0.171413	array. The disadvantage of	-0.124939
-0.131315	107. A disadvantage of	-0.124939
-0.131315	form. A disadvantage of	-0.124939
-0.069981	most important disadvantage of	-0.124939
-0.069981	An important disadvantage of	-0.124939
-0.131315	time. Another disadvantage of	-0.124939
-0.131315	itself. Another disadvantage of	-0.124939
-0.153445	The biggest disadvantage of	-0.124939
-0.493607	of a parameter of	-0.124939
-0.286992	a useful source of	-0.124939
-0.286992	a common source of	-0.124939
-0.286992	a frequent source of	-0.124939
-0.286992	a valuable source of	-0.124939
-0.023871	and the cost of	-0.124939
-0.023871	if the cost of	-0.124939
-0.023871	at the cost of	-0.124939
-0.023871	But the cost of	-0.124939
-0.023871	Avoiding the cost of	-0.124939
-0.023871	Underestimating the cost of	-0.124939
-0.036317	thread. The cost of	-0.124939
-0.036317	60 The cost of	-0.124939
-0.036317	defined. The cost of	-0.124939
-0.036317	applications: The cost of	-0.124939
-0.232576	high overhead cost of	-0.124939
-0.350746	shares the resources of	-0.124939
-0.517235	scans a string of	-0.124939
-0.296532	of the end of	-0.124939
-0.470392	in the end of	-0.124939
-0.296532	at the end of	-0.124939
-0.296532	See the end of	-0.124939
-0.296532	past the end of	-0.124939
-0.209956	point to end of	-0.124939
-0.209956	compare with end of	-0.124939
-0.209956	; mark end of	-0.124939
-0.059238	90 for examples of	-0.124939
-0.059238	103 for examples of	-0.124939
-0.059238	www.agner.org/optimize/cppexamples.zip for examples of	-0.124939
-0.209378	find more examples of	-0.124939
-0.209378	seen many examples of	-0.124939
-0.209378	provided several examples of	-0.124939
-0.209378	www.agner.org/optimize/cppexamples.zip contains examples of	-0.124939
-0.209378	a[i] More examples of	-0.124939
-0.349990	math allow addition of	-0.124939
-0.452042	*.so). The mechanism of	-0.124939
-0.338845	{ // Table of	-0.124939
-0.338845	n! // Table of	-0.124939
-0.349591	common language runtime of	-0.124939
-0.306124	as a means of	-0.124939
-0.058716	one by means of	-0.425969
-0.154713	the first byte of	-0.124939
-0.304617	cycles per byte of	-0.124939
-0.009731	than the parts of	-0.124939
-0.009731	optimize the parts of	-0.124939
-0.019685	possible when parts of	-0.124939
-0.019685	and make parts of	-0.124939
-0.004838	the different parts of	-0.124939
-0.004838	in different parts of	-0.124939
-0.002412	between different parts of	-0.124939
-0.009731	to other parts of	-0.124939
-0.009731	affects other parts of	-0.124939
-0.019685	most used parts of	-0.124939
-0.003866	the critical parts of	-0.124939
-0.003866	in critical parts of	-0.124939
-0.003866	or critical parts of	-0.124939
-0.003866	most critical parts of	-0.124939
-0.003866	less critical parts of	-0.124939
-0.019685	system- specific parts of	-0.124939
-0.019685	that certain parts of	-0.124939
-0.019685	most time-consuming parts of	-0.124939
-0.019685	brand. Critical parts of	-0.124939
-0.019685	other nearby parts of	-0.124939
-0.186410	number and types of	-0.124939
-0.179507	using different types of	-0.124939
-0.080544	two different types of	-0.425969
-0.179507	CPUs, different types of	-0.124939
-0.186410	reduce other types of	-0.124939
-0.279004	defining integer types of	-0.124939
-0.186410	the two types of	-0.124939
-0.186410	reduce some types of	-0.124939
-0.008314	inline function instead of	-0.124939
-0.008314	built-in code instead of	-0.124939
-0.008314	short int instead of	-0.124939
-0.008314	test data instead of	-0.124939
-0.008314	has i instead of	-0.124939
-0.008314	were float instead of	-0.124939
-0.008314	the object instead of	-0.124939
-0.008314	lookup table instead of	-0.124939
-0.008314	in registers instead of	-0.124939
-0.008314	handling system instead of	-0.124939
-0.008314	using references instead of	-0.124939
-0.008314	monitor counters instead of	-0.124939
-0.008314	with templates instead of	-0.124939
-0.008314	use #if instead of	-0.124939
-0.004137	using rounding instead of	-0.124939
-0.004137	Use rounding instead of	-0.124939
-0.008314	Use macros instead of	-0.124939
-0.008314	file format instead of	-0.124939
-0.008314	option -fpie instead of	-0.124939
-0.008314	or typedef instead of	-0.124939
-0.008314	(or int) instead of	-0.124939
-0.008314	and |) instead of	-0.124939
-0.349140	are unacceptable. Each of	-0.124939
-0.420296	off all optimizations of	-0.124939
-0.324515	do interprocedural optimizations of	-0.124939
-0.323280	the code. Many of	-0.124939
-0.323280	other microprocessors. Many of	-0.124939
-0.417965	have four numbers of	-0.124939
-0.322644	have eight numbers of	-0.124939
-0.330629	use the vectors of	-0.124939
-0.251185	time in vectors of	-0.124939
-0.330629	calculations on vectors of	-0.124939
-0.251185	// Define vectors of	-0.124939
-0.251185	set (128 vectors of	-0.124939
-0.023312	after the piece of	-0.124939
-0.023312	insert the piece of	-0.124939
-0.004563	of a piece of	-0.124939
-0.004563	if a piece of	-0.124939
-0.002276	make a piece of	-0.425969
-0.004563	If a piece of	-0.124939
-0.004563	how a piece of	-0.124939
-0.004563	optimize a piece of	-0.124939
-0.004563	generate a piece of	-0.124939
-0.004563	optimizes a piece of	-0.124939
-0.004563	studying a piece of	-0.124939
-0.011500	the same piece of	-0.124939
-0.047946	a critical piece of	-0.124939
-0.169587	a small piece of	-0.124939
-0.047946	a particular piece of	-0.124939
-0.348353	are: The process of	-0.124939
-0.130729	of the advantages of	-0.124939
-0.005541	used. The advantages of	-0.124939
-0.011154	user. The advantages of	-0.124939
-0.011154	pointers. The advantages of	-0.124939
-0.011154	style. The advantages of	-0.124939
-0.011154	operands. The advantages of	-0.124939
-0.011154	1.0f;} The advantages of	-0.124939
-0.011154	dynamically. The advantages of	-0.124939
-0.011154	bits). The advantages of	-0.124939
-0.112494	the above advantages of	-0.124939
-0.416246	OR the results of	-0.124939
-0.469809	sizes. The results of	-0.124939
-0.415121	time. The storage of	-0.124939
-0.415121	make thread-local storage of	-0.124939
-0.053322	are different ways of	-0.124939
-0.025843	several different ways of	-0.124939
-0.184834	other possible ways of	-0.124939
-0.184834	have fast ways of	-0.124939
-0.418645	show various ways of	-0.124939
-0.039365	are smarter ways of	-0.124939
-0.348022	operands The operands of	-0.124939
-0.104391	is the range of	-0.124939
-0.104391	limit the range of	-0.124939
-0.242093	underflow. The range of	-0.124939
-0.242093	the same range of	-0.124939
-0.242093	The live range of	-0.124939
-0.535085	at the start of	-0.124939
-0.292059	name ; start of	-0.124939
-0.292059	framework, during start of	-0.124939
-0.488772	all the modules of	-0.124939
-0.347921	The execution core of	-0.124939
-0.028581	that the overhead of	-0.124939
-0.028581	avoid the overhead of	-0.124939
-0.028581	involves the overhead of	-0.124939
-0.028581	invoking the overhead of	-0.124939
-0.014056	function. The overhead of	-0.425969
-0.028581	are: The overhead of	-0.124939
-0.028581	threads. The overhead of	-0.124939
-0.218526	the extra overhead of	-0.124939
-0.186097	the large overhead of	-0.124939
-0.348190	updating. The change of	-0.124939
-0.403895	use. The installation of	-0.124939
-0.119314	both during installation of	-0.124939
-0.119314	itself, during installation of	-0.124939
-0.061514	in the choice of	-0.124939
-0.029669	that the choice of	-0.124939
-0.061514	Today, the choice of	-0.124939
-0.021036	platform The choice of	-0.124939
-0.021036	applications. The choice of	-0.124939
-0.021036	values. The choice of	-0.124939
-0.021036	accelerators The choice of	-0.124939
-0.021036	algorithm. The choice of	-0.124939
-0.117145	a suitable choice of	-0.124939
-0.448452	with an index of	-0.124939
-0.171695	whenever an instance of	-0.124939
-0.422615	than one instance of	-0.124939
-0.171695	with each instance of	-0.124939
-0.036986	a new instance of	-0.425969
-0.171695	the next instance of	-0.124939
-0.171695	classes. Each instance of	-0.124939
-0.400463	compile the output of	-0.124939
-0.857941	the assembly output of	-0.124939
-0.347530	are declared outside of	-0.124939
-0.345254	the essential task of	-0.124939
-0.171840	of the costs of	-0.124939
-0.171840	on the costs of	-0.124939
-0.171840	avoiding the costs of	-0.124939
-0.053800	exception. The costs of	-0.124939
-0.026067	1.1 The costs of	-0.425969
-0.771266	call the destructor of	-0.124939
-0.006519	8 2.5 Choice of	-0.124939
-0.006519	compilers. 2.5 Choice of	-0.124939
-0.006519	6 2.3 Choice of	-0.124939
-0.006519	manual. 2.3 Choice of	-0.124939
-0.006519	cache. 2.2 Choice of	-0.124939
-0.006519	5 2.2 Choice of	-0.124939
-0.006519	platform 2.1 Choice of	-0.124939
-0.006519	5 2.1 Choice of	-0.124939
-0.006519	12 2.7 Choice of	-0.124939
-0.006519	undocumented. 2.7 Choice of	-0.124939
-0.006519	10 2.6 Choice of	-0.124939
-0.006519	compiler. 2.6 Choice of	-0.124939
-0.006519	optimization. 2.4 Choice of	-0.124939
-0.006519	6 2.4 Choice of	-0.124939
-0.072979	on the efficiency of	-0.124939
-0.072979	between the efficiency of	-0.124939
-0.034958	etc. The efficiency of	-0.124939
-0.017128	7 The efficiency of	-0.425969
-0.034958	Loops The efficiency of	-0.124939
-0.160748	the relative efficiency of	-0.124939
-0.345254	defines an algorithm of	-0.124939
-0.310028	calculates the sum of	-0.124939
-0.310028	is a sum of	-0.124939
-0.345958	types or strings of	-0.124939
-0.314384	and the possibility of	-0.124939
-0.314384	about the possibility of	-0.124939
-0.314384	offer the possibility of	-0.124939
-0.236888	very obscure possibility of	-0.124939
-0.054483	See the discussion of	-0.124939
-0.012993	for a discussion of	-0.124939
-0.026387	120 for discussion of	-0.124939
-0.026387	93 for discussion of	-0.124939
-0.054483	for more discussion of	-0.124939
-0.054483	prone. A discussion of	-0.124939
-0.012993	a further discussion of	-0.124939
-0.004288	for further discussion of	-0.124939
-0.345259	allows a maximum of	-0.124939
-0.304885	automatically. The alignment of	-0.124939
-0.304885	__attribute__((aligned(16))). Specifies alignment of	-0.124939
-0.226321	to the offset of	-0.124939
-0.226321	if the offset of	-0.124939
-0.226321	stores the offset of	-0.124939
-0.235530	}; The offset of	-0.124939
-0.187190	the first operand of	-0.124939
-0.000082	for the sake of	-0.159701
-0.236616	then the effect of	-0.124939
-0.190793	{ The effect of	-0.124939
-0.190793	loop. The effect of	-0.124939
-0.190793	manually. The effect of	-0.124939
-0.002329	and the amount of	-0.124939
-0.002329	that the amount of	-0.124939
-0.002329	when the amount of	-0.124939
-0.002329	increases the amount of	-0.124939
-0.002329	reserve the amount of	-0.124939
-0.002329	minimize the amount of	-0.124939
-0.014164	the total amount of	-0.124939
-0.014164	a significant amount of	-0.124939
-0.003498	the required amount of	-0.124939
-0.014164	an equal amount of	-0.124939
-0.014164	a considerable amount of	-0.124939
-0.014164	an insufficient amount of	-0.124939
-0.345259	takes extra time, of	-0.124939
-0.344561	this wasteful copying of	-0.124939
-0.344910	most frequent causes of	-0.124939
-0.343378	a balanced mix of	-0.124939
-0.301347	The high priority of	-0.124939
-0.391597	the low priority of	-0.124939
-0.963094	The clock frequency of	-0.124939
-0.444893	in one iteration of	-0.124939
-0.301347	for every iteration of	-0.124939
-0.344132	for future models of	-0.124939
-0.125272	addresses. The names of	-0.124939
-0.125272	for. The names of	-0.124939
-0.003030	The different kinds of	-0.124939
-0.003030	do different kinds of	-0.124939
-0.003030	two different kinds of	-0.124939
-0.003030	doing different kinds of	-0.124939
-0.003030	mix different kinds of	-0.124939
-0.015366	than other kinds of	-0.124939
-0.015366	cause all kinds of	-0.124939
-0.015366	the two kinds of	-0.124939
-0.015366	are four kinds of	-0.124939
-0.015366	make certain kinds of	-0.124939
-0.003791	7.1 Different kinds of	-0.425969
-0.392018	(en.wikipedia.org/wiki/L2_cache). The details of	-0.124939
-0.301689	the technical details of	-0.124939
-0.262715	an extra level of	-0.124939
-0.262715	A higher level of	-0.124939
-0.262715	the highest level of	-0.124939
-0.444254	paragraph. The target of	-0.124939
-0.342803	outside the bounds of	-0.124939
-0.109023	requires the loading of	-0.124939
-0.109023	involve the loading of	-0.124939
-0.254932	of lazy loading of	-0.124939
-0.123659	case the reading of	-0.124939
-0.123659	off the reading of	-0.124939
-0.632976	worst case situation of	-0.124939
-0.021140	two different implementations of	-0.124939
-0.144962	run. Some implementations of	-0.124939
-0.043362	most common implementations of	-0.124939
-0.043362	All common implementations of	-0.124939
-0.091539	code. Most implementations of	-0.124939
-0.091539	particularly slow implementations of	-0.124939
-0.091539	Some early implementations of	-0.124939
-0.311132	The first generation of	-0.124939
-0.150425	each new generation of	-0.124939
-0.033008	the next generation of	-0.124939
-0.307739	the second generation of	-0.124939
-0.150425	or compile-time generation of	-0.124939
-0.446494	and different sizes of	-0.124939
-0.296465	on all sizes of	-0.124939
-0.031442	without the risk of	-0.124939
-0.015437	involves the risk of	-0.124939
-0.102110	Faster, but risk of	-0.124939
-0.010230	is no risk of	-0.124939
-0.292282	a large fraction of	-0.124939
-0.292282	a small fraction of	-0.124939
-0.234607	in the sequence of	-0.124939
-0.150194	if the sequence of	-0.124939
-0.118644	on a sequence of	-0.124939
-0.118644	doing a sequence of	-0.124939
-0.133610	CPUs. The sequence of	-0.124939
-0.133610	a long sequence of	-0.124939
-0.011238	to the length of	-0.124939
-0.011238	if the length of	-0.124939
-0.011238	then the length of	-0.124939
-0.011238	If the length of	-0.124939
-0.011238	unless the length of	-0.124939
-0.011238	adding the length of	-0.124939
-0.034627	doubled. The length of	-0.124939
-0.034627	started. The length of	-0.124939
-0.291880	version. The penalty of	-0.124939
-0.379973	a misprediction penalty of	-0.124939
-0.145502	permissible for reasons of	-0.425969
-0.008386	of the beginning of	-0.124939
-0.004173	to the beginning of	-0.726999
-0.016937	that the beginning of	-0.124939
-0.016937	with the beginning of	-0.124939
-0.016937	into the beginning of	-0.124939
-0.004180	is a matter of	-0.425969
-0.001667	simply a matter of	-0.346788
-0.008401	just a matter of	-0.124939
-0.291078	} The declaration of	-0.124939
-0.291078	the full declaration of	-0.124939
-0.044549	or the series of	-0.124939
-0.007115	is a series of	-0.124939
-0.003543	in a series of	-0.425969
-0.007115	through a series of	-0.124939
-0.007115	made a series of	-0.124939
-0.007115	executes a series of	-0.124939
-0.044549	notice This series of	-0.124939
-0.044549	in this series of	-0.124939
-0.245774	of the features of	-0.124939
-0.324139	the optimization features of	-0.124939
-0.245774	time- consuming features of	-0.124939
-0.003543	is a waste of	-0.124939
-0.007115	and a waste of	-0.124939
-0.003543	be a waste of	-0.124939
-0.007115	cause a waste of	-0.124939
-0.044549	frustration and waste of	-0.124939
-0.044549	a big waste of	-0.124939
-0.044549	a total waste of	-0.124939
-0.020611	3. The microarchitecture of	-0.124939
-0.000315	3: "The microarchitecture of	-1.028029
-0.101879	it is independent of	-0.124939
-0.101879	11.3 is independent of	-0.124939
-0.235228	is almost independent of	-0.124939
-0.718575	constructors and destructors of	-0.124939
-0.005921	function in terms of	-0.124939
-0.005921	- in terms of	-0.124939
-0.002950	cost in terms of	-0.425969
-0.002950	costs in terms of	-0.124939
-0.005921	costless in terms of	-0.124939
-0.005921	Thinking in terms of	-0.124939
-0.341275	instructions without help of	-0.124939
-0.340276	class. The transfer of	-0.124939
-0.284584	of copying blocks of	-0.124939
-0.284584	17.9: "Moving blocks of	-0.124939
-0.035218	for an explanation of	-0.221849
-0.114795	more detailed explanation of	-0.124939
-0.035081	with different brands of	-0.124939
-0.035081	seven different brands of	-0.124939
-0.035081	treats different brands of	-0.124939
-0.115092	for other brands of	-0.124939
-0.115092	on all brands of	-0.124939
-0.115092	of competing brands of	-0.124939
-0.438783	on any brand of	-0.124939
-0.340276	faster. The logic of	-0.124939
-0.119537	book "Performance Optimization of	-0.124939
-0.119537	Hoisie: "Performance Optimization of	-0.124939
-0.012505	that takes care of	-0.425969
-0.025381	compiler takes care of	-0.124939
-0.009345	to take care of	-0.124939
-0.009345	can take care of	-0.124939
-0.338548	save one unit of	-0.124939
-0.011473	is a kind of	-0.124939
-0.011473	also a kind of	-0.124939
-0.007615	make this kind of	-0.124939
-0.007615	supports this kind of	-0.124939
-0.007615	prevent this kind of	-0.124939
-0.023258	a different kind of	-0.124939
-0.023258	explicitly what kind of	-0.124939
-0.023258	even worse kind of	-0.124939
-0.436460	for several iterations of	-0.124939
-0.337989	prediction and misprediction of	-0.124939
-0.523833	allow lazy binding of	-0.124939
-0.337431	about the chain of	-0.124939
-0.777406	the x86 family of	-0.124939
-0.131886	floating point Conversion of	-0.124939
-0.131886	registers used. Conversion of	-0.124939
-0.060956	integer conversion Conversion of	-0.124939
-0.060956	float conversion Conversion of	-0.124939
-0.131886	floating point. Conversion of	-0.124939
-0.338548	to produce tables of	-0.124939
-0.340228	are used. Conversions of	-0.124939
-0.154060	for the purpose of	-0.124939
-0.094850	// The purpose of	-0.124939
-0.094850	y. The purpose of	-0.124939
-0.094850	new. The purpose of	-0.124939
-0.283083	true. The trick of	-0.124939
-0.283083	(b1*b2); The trick of	-0.124939
-0.266547	advance. The disadvantages of	-0.124939
-0.266547	there are disadvantages of	-0.124939
-0.154060	objects are instances of	-0.124939
-0.070234	to all instances of	-0.124939
-0.070234	Make all instances of	-0.124939
-0.154060	many renamed instances of	-0.124939
-0.113137	by the body of	-0.124939
-0.113137	because the body of	-0.124939
-0.335700	if the changes of	-0.124939
-0.266006	implemented a collection of	-0.124939
-0.266006	consuming. A collection of	-0.124939
-0.080179	to be aware of	-0.124939
-0.178590	therefore be aware of	-0.124939
-0.154478	2 (be aware of	-0.124939
-0.492568	the out-of-order capabilities of	-0.124939
-0.154060	that the representation of	-0.124939
-0.154060	compilers). The representation of	-0.124939
-0.154060	The integer representation of	-0.124939
-0.239307	in binary representation of	-0.124939
-0.026684	only for powers of	-0.124939
-0.026684	numbers are powers of	-0.124939
-0.026684	defined as powers of	-0.124939
-0.004335	of using powers of	-0.425969
-0.008714	preferably using powers of	-0.124939
-0.026684	means avoid powers of	-0.124939
-0.613392	by a factor of	-0.124939
-0.091247	to the rules of	-0.124939
-0.091247	though the rules of	-0.124939
-0.206907	the many rules of	-0.124939
-0.000529	is the responsibility of	-0.970037
-0.107142	is the reciprocal of	-0.425969
-0.331932	n'th degree polynomial of	-0.124939
-0.412810	than type casting of	-0.124939
-0.254107	safer. Type casting of	-0.124939
-0.014348	in the scope of	-0.124939
-0.014348	make the scope of	-0.124939
-0.004730	beyond the scope of	-0.602060
-0.108508	calls. The principle of	-0.124939
-0.108508	undetected. The principle of	-0.124939
-0.108508	and the throughput of	-0.124939
-0.108508	by the throughput of	-0.124939
-0.108508	increase the throughput of	-0.124939
-0.128107	measure // Number of	-0.124939
-0.028661	element, bits Number of	-0.425969
-0.128107	libraries 113 Number of	-0.124939
-0.006081	to the availability of	-0.124939
-0.006081	and the availability of	-0.124939
-0.006081	for the availability of	-0.124939
-0.006081	delay the availability of	-0.124939
-0.006081	signaling the availability of	-0.124939
-0.031296	application. The availability of	-0.124939
-0.469691	modifying only half of	-0.124939
-0.252882	of each step of	-0.124939
-0.332666	a second step of	-0.124939
-0.031296	same time regardless of	-0.124939
-0.031296	the same regardless of	-0.124939
-0.031296	most cases, regardless of	-0.124939
-0.031296	be false regardless of	-0.124939
-0.031296	in registers, regardless of	-0.124939
-0.031296	same name, regardless of	-0.124939
-0.191621	on just-in-time compilation of	-0.124939
-0.191621	use just-in-time compilation of	-0.124939
-0.038650	that the behavior of	-0.124939
-0.038650	change the behavior of	-0.124939
-0.038650	mimic the behavior of	-0.124939
-0.128107	details. The behavior of	-0.124939
-0.252882	in vector Type of	-0.124939
-0.252882	as follows: Type of	-0.124939
-0.012505	in the form of	-0.301030
-0.128107	any other form of	-0.124939
-0.051066	strict aliasing rule of	-0.425969
-0.488071	is the job of	-0.124939
-0.026041	by the requirements of	-0.124939
-0.053745	with the requirements of	-0.124939
-0.037835	cause a loss of	-0.124939
-0.037835	overflow and loss of	-0.124939
-0.037835	overflow or loss of	-0.124939
-0.037835	hardly any loss of	-0.124939
-0.037835	worry about loss of	-0.124939
-0.328455	of all cleanup of	-0.124939
-0.327582	force the swapping of	-0.124939
-0.007308	to the rest of	-0.124939
-0.007308	and the rest of	-0.124939
-0.007308	in the rest of	-0.124939
-0.007308	that the rest of	-0.124939
-0.007308	by the rest of	-0.124939
-0.327582	the advanced principles of	-0.124939
-0.101850	the negative effects of	-0.124939
-0.101850	The negative effects of	-0.124939
-0.464126	100; // Array of	-0.124939
-0.327582	criteria or lists of	-0.124939
-0.856427	in the event of	-0.124939
-0.328455	order. The advice of	-0.124939
-0.424123	no specific recommendation of	-0.124939
-0.327582	often inefficient. Objects of	-0.124939
-0.329329	these calculations. Division of	-0.124939
-0.034627	16.2 The pitfalls of	-0.425969
-0.158980	most common pitfalls of	-0.124939
-0.461745	This is inefficient, of	-0.124939
-0.330054	by the latency of	-0.124939
-0.227182	as the latency of	-0.124939
-0.037835	efficiently if pieces of	-0.124939
-0.018505	into small pieces of	-0.124939
-0.018505	typically small pieces of	-0.124939
-0.037835	joining identical pieces of	-0.124939
-0.037835	only. Critical pieces of	-0.124939
-0.330054	the time consumption of	-0.124939
-0.227182	The time consumption of	-0.124939
-0.328455	using advanced facilities of	-0.124939
-0.472689	get time slices of	-0.124939
-0.321138	prediction or estimate of	-0.124939
-0.322213	is predicted well, of	-0.124939
-0.121325	to transfer ownership of	-0.124939
-0.121325	that transfers ownership of	-0.124939
-0.121325	that looses ownership of	-0.124939
-0.015366	of the drawbacks of	-0.124939
-0.007615	Overcoming the drawbacks of	-0.425969
-0.047832	advantages and drawbacks of	-0.124939
-0.322213	up the queue of	-0.124939
-0.321138	requires no modification of	-0.124939
-0.092368	needed. 11 Out of	-0.124939
-0.092368	103 11 Out of	-0.124939
-0.321138	improved by modifications of	-0.124939
-0.321138	smaller. The lengths of	-0.124939
-0.047832	by 16. Alignment of	-0.124939
-0.023258	variables. 9.5 Alignment of	-0.124939
-0.023258	88 9.5 Alignment of	-0.124939
-0.047832	Table 7.2. Alignment of	-0.124939
-0.092368	classes. The splitting of	-0.124939
-0.092368	formalism. The splitting of	-0.124939
-0.322213	to the area of	-0.124939
-0.209840	because the consequence of	-0.124939
-0.281438	calls. The consequence of	-0.124939
-0.121325	which is 50% of	-0.124939
-0.121325	is true 50% of	-0.124939
-0.121325	be mispredicted 50% of	-0.124939
-0.321138	verifying the functionality of	-0.124939
-0.210672	requires several layers of	-0.124939
-0.210672	of separate layers of	-0.124939
-0.322213	the size. Integers of	-0.124939
-0.321138	the conflicting considerations of	-0.124939
-0.075192	size (in bytes) of	-0.124939
-0.416090	of the techniques of	-0.124939
-0.047832	x86-64 platforms. Comparison of	-0.124939
-0.047832	Table 8.1. Comparison of	-0.124939
-0.023258	optimization. 8.2 Comparison of	-0.124939
-0.023258	66 8.2 Comparison of	-0.124939
-0.321138	with the resolution of	-0.124939
-0.310605	case" values. Which of	-0.124939
-0.077009	a complete redesign of	-0.124939
-0.077009	A complete redesign of	-0.124939
-0.065023	is the combination of	-0.124939
-0.065023	with a combination of	-0.124939
-0.065023	An OR combination of	-0.124939
-0.065023	some typical sources of	-0.124939
-0.015366	are frequent sources of	-0.124939
-0.310605	a thorough analysis of	-0.124939
-0.310605	updates. Automatic updating of	-0.124939
-0.077009	because the contents of	-0.124939
-0.077009	copy the contents of	-0.124939
-0.312005	above. The generality of	-0.124939
-0.312005	on my study of	-0.124939
-0.065023	stride) = (number of	-0.124939
-0.065023	size) / (number of	-0.124939
-0.065023	size) % (number of	-0.124939
-0.170682	a high degree of	-0.124939
-0.170682	a typical degree of	-0.124939
-0.312005	that allows overriding of	-0.124939
-0.077009	to keep track of	-0.124939
-0.077009	and keep track of	-0.124939
-0.020611	to get rid of	-0.124939
-0.020611	we get rid of	-0.124939
-0.020611	don't get rid of	-0.124939
-0.310605	the optimal decomposition of	-0.124939
-0.065023	During the history of	-0.124939
-0.065023	code. The history of	-0.124939
-0.065023	the past history of	-0.124939
-0.065023	for relative addressing of	-0.124939
-0.031296	for self-relative addressing of	-0.124939
-0.031296	supports self-relative addressing of	-0.124939
-0.065023	jump to top of	-0.124939
-0.031296	a ; top of	-0.124939
-0.031296	ebx ; top of	-0.124939
-0.310605	OpenMP. www.openmp.org. Documentation of	-0.124939
-0.065023	achieved when none of	-0.124939
-0.031296	expression, but none of	-0.124939
-0.031296	15.1c, but none of	-0.124939
-0.065023	Intrinsic function Size of	-0.124939
-0.065023	of elements Size of	-0.124939
-0.065023	these classes. Size of	-0.124939
-0.438783	Taking the logarithm of	-0.124939
-0.036800	allocation and deallocation of	-0.124939
-0.170682	seven memory allocations of	-0.124939
-0.170682	are many allocations of	-0.124939
-0.310605	exceptions are indeed of	-0.124939
-0.461114	b) But beware of	-0.124939
-0.290277	the high complexity of	-0.124939
-0.290277	documentation and lack of	-0.124939
-0.290277	the disassembly window of	-0.124939
-0.290277	aligned // Structure of	-0.124939
-0.290277	if the goal of	-0.124939
-0.290277	a 50-50 chance of	-0.124939
-0.290277	overcome the dangers of	-0.124939
-0.290277	an approximate comparison of	-0.124939
-0.290277	that uses 90% of	-0.124939
-0.101591	is fast. Value of	-0.124939
-0.101591	is slow. Value of	-0.124939
-0.101591	control branch ahead of	-0.124939
-0.101591	loop counter ahead of	-0.124939
-0.378010	most. The opposite of	-0.124939
-0.101591	calculating the movements of	-0.124939
-0.101591	the physical movements of	-0.124939
-0.101591	compiler is capable of	-0.124939
-0.101591	CPUs are capable of	-0.124939
-0.290277	only the lowest of	-0.124939
-0.290277	no heavy marketing of	-0.124939
-0.101591	a better understanding of	-0.124939
-0.101591	a basic understanding of	-0.124939
-0.101591	cause the creation of	-0.124939
-0.101591	c) The creation of	-0.124939
-0.290277	and automatic parallelization of	-0.124939
-0.290277	a dramatic degradation of	-0.124939
-0.023258	a good deal of	-0.124939
-0.101591	There are lots of	-0.124939
-0.101591	databases with lots of	-0.124939
-0.101591	more than 99% of	-0.124939
-0.101591	other programs, 99% of	-0.124939
-0.047832	158 18 Overview of	-0.124939
-0.047832	159 18 Overview of	-0.124939
-0.101591	rather than sequences of	-0.124939
-0.101591	or small sequences of	-0.124939
-0.290277	for further expansions of	-0.124939
-0.047832	access 9.1 Caching of	-0.124939
-0.047832	87 9.1 Caching of	-0.124939
-0.290277	Fog. Technical University of	-0.124939
-0.101591	Example 12.8a. Sum of	-0.124939
-0.101591	Example 12.8b. Sum of	-0.124939
-0.290277	overcome the obstacle of	-0.124939
-0.378010	because algebraic manipulations of	-0.124939
-0.290277	A further extension of	-0.124939
-0.101591	is only 10% of	-0.124939
-0.101591	is true 10% of	-0.124939
-0.290277	columns. Every fourth of	-0.124939
-0.234446	Security. The vulnerability of	-0.124939
-0.234446	less than 1/50 of	-0.124939
-0.234446	stress the importance of	-0.124939
-0.234446	Time for transposition of	-0.124939
-0.234446	the logical architecture of	-0.124939
-0.234446	then the transformation of	-0.124939
-0.234446	by better standardization of	-0.124939
-0.234446	allocation and de-allocation of	-0.124939
-0.234446	function scanf. Violation of	-0.124939
-0.234446	generality and flexibility of	-0.124939
-0.234446	with a wealth of	-0.124939
-0.234446	carried out independently of	-0.124939
-0.234446	use large amounts of	-0.124939
-0.234446	installation and uninstallation of	-0.124939
-0.234446	the binary decimals of	-0.124939
-0.234446	need metaprogramming. None of	-0.124939
-0.234446	pointers. The absence of	-0.124939
-0.234446	separately. The fallacy of	-0.124939
-0.234446	the self-explaining menus of	-0.124939
-0.234446	within the lifetime of	-0.124939
-0.234446	the broader perspective of	-0.124939
-0.234446	the responsi- bility of	-0.124939
-0.234446	if the evaluation of	-0.124939
-0.234446	copying. The benefits of	-0.124939
-0.234446	if the bias of	-0.124939
-0.234446	than 2 gigabytes of	-0.124939
-0.234446	a detailed overview of	-0.124939
-0.234446	child class. Members of	-0.124939
-0.234446	in the majority of	-0.124939
-0.234446	with a lineage of	-0.124939
-0.234446	give some indication of	-0.124939
-0.234446	is the scarcity of	-0.124939
-0.234446	are not safe, of	-0.124939
-0.234446	because the insertion of	-0.124939
-0.234446	with most distributions of	-0.124939
-0.234446	hold 8 double's of	-0.124939
-0.234446	pros and cons of	-0.124939
-0.234446	a good knowledge of	-0.124939
-0.234446	the mathematical notion of	-0.124939
-0.234446	on it. Instead of	-0.124939
-0.234446	Instruction tables: Lists of	-0.124939
-0.234446	set (called x86) of	-0.124939
-0.234446	the function billions of	-0.124939
-0.234446	safe programming practice, of	-0.124939
-0.234446	data cache. Bit-fields of	-0.124939
-0.234446	can be omitted, of	-0.124939
-0.234446	hardware circuits consisting of	-0.124939
-0.234446	There are hundreds of	-0.124939
-0.234446	to do searches of	-0.124939
-0.234446	reusability and systematization of	-0.124939
-0.234446	function. This fragmentation of	-0.124939
-0.234446	64-bit mode. Much of	-0.124939
-0.234446	replace all occurrences of	-0.124939
-0.234446	data into groups of	-0.124939
-0.234446	can cause holes of	-0.124939
-0.234446	Table 7.1. Sizes of	-0.124939
-0.234446	to the design of	-0.124939
-0.234446	systems use segmentation of	-0.124939
-0.234446	about the dimensions of	-0.124939
-0.234446	program. This requires, of	-0.124939
-0.234446	the fundamental laws of	-0.124939
-0.234446	draw the attention of	-0.124939
-0.234446	time and maintainability of	-0.124939
-0.234446	reply about investigation of	-0.124939
-0.234446	reputation. The compactness of	-0.124939
-0.234446	not. The advise of	-0.124939
-0.234446	portability and ease of	-0.124939
-0.234446	are a couple of	-0.124939
-0.234446	the first-in-last-out nature of	-0.124939
-0.234446	Michael Abrash: "Zen of	-0.124939
-0.234446	used by thousands of	-0.124939
-0.234446	compiler in favor of	-0.124939
-0.234446	an extra layer of	-0.124939
-0.234446	structure and clarity of	-0.124939
-0.234446	or three levels of	-0.124939
-0.234446	this problem. Vectors of	-0.124939
-0.234446	be misleading reports of	-0.124939
-0.586598	important it is to	-0.124939
-0.522458	of this is to	-0.124939
-0.522458	test this is to	-0.124939
-0.503435	thread-specific data is to	-0.124939
-0.532175	while loop is to	-0.124939
-0.138358	to do is to	-0.124939
-0.138358	can do is to	-0.124939
-0.543507	code performance is to	-0.124939
-0.442961	CPU-intensive software is to	-0.124939
-0.300417	first way is to	-0.124939
-0.300417	second way is to	-0.124939
-0.300417	compatible way is to	-0.124939
-0.345834	software optimization is to	-0.124939
-0.345834	81 optimization is to	-0.124939
-0.342599	smart pointers is to	-0.124939
-0.562994	general method is to	-0.124939
-0.733955	this case is to	-0.124939
-0.503435	of error is to	-0.124939
-0.342599	is optimized is to	-0.124939
-0.355382	the problem is to	-0.124939
-0.060839	this problem is to	-0.301030
-0.379771	the solution is to	-0.124939
-0.265929	best solution is to	-0.124939
-0.265929	complicated solution is to	-0.124939
-0.265929	alternative solution is to	-0.124939
-0.265929	clean solution is to	-0.124939
-0.265929	reasonable solution is to	-0.124939
-0.442961	the container is to	-0.124939
-0.627804	table lookup is to	-0.124939
-0.785608	operator here is to	-0.124939
-0.342599	such errors is to	-0.124939
-0.342599	is limited is to	-0.124939
-0.063692	Another possibility is to	-0.124939
-0.342599	important thing is to	-0.124939
-0.104097	CPU cores is to	-0.425969
-0.241284	simple alternative is to	-0.124939
-0.241284	An alternative is to	-0.124939
-0.342599	pointer aliasing is to	-0.124939
-0.342599	The purpose is to	-0.124939
-0.342599	and delete is to	-0.124939
-0.342599	The trick is to	-0.124939
-0.342599	compiler generates is to	-0.124939
-0.342599	using exceptions is to	-0.124939
-0.342599	My recommendation is to	-0.124939
-0.342599	cleanup jobs is to	-0.124939
-0.342599	realistic goal is to	-0.124939
-0.342599	compromise safety is to	-0.124939
-0.342599	performance bottlenecks is to	-0.124939
-0.106429	// set a to	-0.425969
-0.458460	we add a to	-0.124939
-0.065285	// copy a to	-0.425969
-0.354854	and reduce a to	-0.124939
-0.247711	are converting a to	-0.124939
-0.247711	implicitly converting a to	-0.124939
-0.354854	we prefer a to	-0.124939
-0.554345	the function and to	-0.124939
-0.978990	for Windows and to	-0.124939
-0.106174	terminating zero and to	-0.425969
-0.353503	these obstacles and to	-0.124939
-0.353503	integer operations, and to	-0.124939
-0.353503	separate module, and to	-0.124939
-0.577224	solution would be to	-0.124939
-0.354593	to repeat or to	-0.124939
-0.354593	the console or to	-0.124939
-0.453203	we want it to	-0.124939
-0.350707	This allows it to	-0.124939
-0.140875	then convert it to	-0.124939
-0.140875	must convert it to	-0.124939
-0.350707	than comparing it to	-0.124939
-0.350707	and compare it to	-0.124939
-0.350707	which redirects it to	-0.124939
-0.587246	want the function to	-0.124939
-0.587168	Specifies a function to	-0.124939
-0.168504	{ // function to	-0.124939
-0.076130	matrix // function to	-0.425969
-0.522038	use this function to	-0.124939
-0.346227	decides which function to	-0.124939
-0.487340	from one function to	-0.124939
-1.228349	a member function to	-0.124939
-0.346227	// Critical function to	-0.124939
-1.480588	of the code to	-0.124939
-1.026990	in the code to	-0.124939
-0.535818	before the code to	-0.124939
-0.535818	want the code to	-0.124939
-0.535818	tune the code to	-0.124939
-1.381424	piece of code to	-0.124939
-0.858809	the critical code to	-0.124939
-0.515580	add extra code to	-0.124939
-0.401769	need assembly code to	-0.124939
-0.401769	inline assembly code to	-0.124939
-0.063049	from AVX code to	-0.425969
-0.496407	as machine code to	-0.124939
-0.354690	designed so as to	-0.124939
-0.354690	rules apply as to	-0.124939
-0.354262	the compiler not to	-0.124939
-0.354262	Be sure not to	-0.124939
-0.357628	and maintenance - to	-0.124939
-0.136397	the program than to	-0.124939
-0.136397	a program than to	-0.124939
-0.336354	+ b than to	-0.124939
-0.336354	compiler optimization than to	-0.124939
-0.336354	| operations than to	-0.124939
-0.435107	each thread than to	-0.124939
-0.336354	a container than to	-0.124939
-0.336354	memory block than to	-0.124939
-0.336354	accessed recently than to	-0.124939
-0.336354	or bitmap than to	-0.124939
-0.336354	write 2.0/3.0 than to	-0.124939
-0.336354	(memory pooling) than to	-0.124939
-0.101138	for the compiler to	-0.124939
-0.677011	by the compiler to	-0.124939
-0.564040	on the compiler to	-0.124939
-0.097295	allows the compiler to	-0.124939
-0.655529	tell the compiler to	-0.124939
-0.097295	enable the compiler to	-0.124939
-0.400256	allow the compiler to	-0.124939
-0.400256	expect the compiler to	-0.124939
-0.155562	enables the compiler to	-0.124939
-0.400256	forces the compiler to	-0.124939
-0.160466	expect a compiler to	-0.124939
-0.313778	a Windows compiler to	-0.124939
-0.313778	a particular compiler to	-0.124939
-0.352253	template for x to	-0.124939
-0.352253	to get x to	-0.124939
-0.352253	15.1a. Calculate x to	-0.124939
-0.352253	constructor initializes x to	-0.124939
-0.041970	that allows you to	-0.124939
-0.088433	compiler allows you to	-0.124939
-0.140470	that allow you to	-0.124939
-0.140470	systems allow you to	-0.124939
-0.448835	registers that have to	-0.124939
-0.618092	do not have to	-0.124939
-0.500396	does not have to	-0.124939
-0.248307	you may have to	-0.124939
-0.248307	program may have to	-0.124939
-0.248307	You may have to	-0.124939
-0.248307	etc. may have to	-0.124939
-0.228909	that you have to	-0.124939
-0.043299	then you have to	-0.124939
-0.145497	systems you have to	-0.124939
-0.145497	case you have to	-0.124939
-0.145497	All you have to	-0.124939
-0.145497	optimizations you have to	-0.124939
-0.145497	Here you have to	-0.124939
-0.145497	execution, you have to	-0.124939
-0.145497	anything, you have to	-0.124939
-0.439858	we will have to	-0.124939
-0.354493	The data have to	-0.124939
-0.473156	All functions have to	-0.124939
-0.354493	we do have to	-0.124939
-0.413597	that we have to	-0.124939
-0.034968	} You have to	-0.124939
-0.034968	functions You have to	-0.124939
-0.034968	handling. You have to	-0.124939
-0.034968	manual. You have to	-0.124939
-0.034968	72. You have to	-0.124939
-0.034968	job. You have to	-0.124939
-0.270968	micro- processors have to	-0.124939
-0.270968	address calculations have to	-0.124939
-0.270968	different versions have to	-0.124939
-0.211236	it doesn't have to	-0.124939
-0.443983	compiler doesn't have to	-0.124939
-0.354493	we would have to	-0.124939
-0.270968	five values have to	-0.124939
-0.023712	you don't have to	-0.124939
-0.023712	we don't have to	-0.124939
-0.270968	all caches have to	-0.124939
-0.454123	you want this to	-0.124939
-0.351434	and expect this to	-0.124939
-0.351434	ebx,1 adds this to	-0.124939
-0.571987	have the time to	-0.124939
-0.717927	takes more time to	-0.124939
-0.399602	take more time to	-0.124939
-0.374745	40% more time to	-0.124939
-0.374003	the same time to	-0.124939
-0.325217	obviously takes time to	-0.124939
-0.522881	a long time to	-0.124939
-0.312027	as long time to	-0.124939
-0.312027	too long time to	-0.124939
-1.233231	at compile time to	-0.124939
-0.904906	takes longer time to	-0.124939
-0.425680	the response time to	-0.124939
-0.482950	cause the memory to	-0.124939
-0.482950	causes the memory to	-0.124939
-0.549993	swapping of memory to	-0.124939
-0.049844	from static memory to	-0.425969
-0.336586	to swap memory to	-0.124939
-0.568987	should look at to	-0.124939
-0.586636	converting the data to	-0.124939
-0.353116	to organize data to	-0.124939
-1.456742	of the program to	-0.124939
-0.555887	want the program to	-0.124939
-0.555887	modify the program to	-0.124939
-0.509827	in your program to	-0.124939
-0.482218	iteration that has to	-0.124939
-0.519727	Therefore, it has to	-0.124939
-0.508405	System code has to	-0.124939
-0.588432	the compiler has to	-0.124939
-0.554703	The compiler has to	-0.124939
-0.351182	A compiler has to	-0.124939
-0.705004	The program has to	-0.124939
-0.680340	the pointer has to	-0.124939
-0.305103	because b has to	-0.124939
-0.322343	the user has to	-0.124939
-0.221086	a user has to	-0.124939
-0.305103	array element has to	-0.124939
-0.305103	cache line has to	-0.124939
-0.305103	rounding mode has to	-0.124939
-0.305103	each addition has to	-0.124939
-0.305103	user actually has to	-0.124939
-0.305103	the offset has to	-0.124939
-0.396225	then F1 has to	-0.124939
-0.305103	user who has to	-0.124939
-0.352803	this example only to	-0.124939
-0.455858	CPU dispatching only to	-0.124939
-0.866157	for the CPU to	-0.124939
-0.518899	want the CPU to	-0.124939
-0.518899	allows the CPU to	-0.124939
-0.518899	tells the CPU to	-0.124939
-0.357181	scan forward) instruction to	-0.124939
-0.319567	sure to point to	-0.124939
-0.319567	edx = point to	-0.124939
-0.319567	makes it point to	-0.124939
-0.319567	it will point to	-0.124939
-0.236541	from floating point to	-0.301030
-0.060592	types cannot point to	-0.425969
-0.131030	objects they point to	-0.124939
-0.131030	texts they point to	-0.124939
-0.319567	Induction++; ; point to	-0.124939
-0.521626	visible at all to	-0.124939
-1.083497	can be used to	-0.124939
-0.345523	than it used to	-0.124939
-0.345523	to get used to	-0.124939
-0.582719	cause the cache to	-0.124939
-0.506838	Comparing an integer to	-0.124939
-0.506838	Converting an integer to	-0.124939
-0.467789	one more integer to	-0.124939
-0.135022	conversions from integer to	-0.124939
-0.135022	Conversion from integer to	-0.124939
-0.488185	the unsigned integer to	-0.124939
-0.699081	a signed integer to	-0.124939
-0.351490	options are set to	-0.124939
-0.351490	its pointer set to	-0.124939
-0.355114	to one class to	-0.124939
-1.075230	you can do to	-0.124939
-0.521558	programmer can do to	-0.124939
-0.561637	cases, for example to	-0.124939
-0.505486	of C++ compilers to	-0.124939
-1.262831	different C++ compilers to	-0.124939
-0.283006	float or double to	-0.301030
-0.500518	with fixed size to	-0.124939
-0.338444	is a pointer to	-0.124939
-0.149641	to a pointer to	-0.124939
-0.233789	and a pointer to	-0.124939
-0.233789	when a pointer to	-0.124939
-0.233789	has a pointer to	-0.124939
-0.233789	make a pointer to	-0.124939
-0.338444	return a pointer to	-0.124939
-0.645092	through a pointer to	-0.124939
-0.233789	stores a pointer to	-0.124939
-0.233789	converting a pointer to	-0.124939
-0.233789	returns a pointer to	-0.124939
-0.233789	Returns a pointer to	-0.124939
-0.270186	reference or pointer to	-0.124939
-0.433207	a function pointer to	-0.425969
-0.435403	its 'this' pointer to	-0.124939
-0.053449	// Set pointer to	-0.425969
-0.064819	element in b to	-0.425969
-0.741774	value of i to	-0.124939
-0.433293	conversion of i to	-0.124939
-0.340824	to add i to	-0.124939
-0.340824	by type-casting i to	-0.124939
-0.354818	to convert float to	-0.124939
-0.570049	such an object to	-0.124939
-0.344715	from one object to	-0.124939
-0.344715	the first object to	-0.124939
-0.529992	want a number to	-0.124939
-0.453727	floating point number to	-0.124939
-0.340497	false model number to	-0.124939
-0.064653	the keyword static to	-0.425969
-0.300329	is more efficient to	-0.234083
-0.123538	be more efficient to	-0.124939
-0.296841	often more efficient to	-0.124939
-0.420392	much more efficient to	-0.124939
-0.470853	slightly more efficient to	-0.124939
-1.248238	of the array to	-0.124939
-0.501000	set an array to	-0.124939
-0.501000	setting an array to	-0.124939
-0.001553	it is possible to	-0.159701
-0.008847	It is possible to	-0.287666
-0.009161	may be possible to	-0.124939
-0.018519	should be possible to	-0.124939
-0.018519	less be possible to	-0.124939
-0.009161	might be possible to	-0.425969
-0.065857	make it possible to	-0.124939
-0.009445	makes it possible to	-0.124939
-0.039080	was it possible to	-0.124939
-0.103827	is not possible to	-0.124939
-0.090302	therefore not possible to	-0.124939
-0.028083	is also possible to	-0.124939
-0.028083	is often possible to	-0.124939
-0.125211	not always possible to	-0.124939
-0.125211	is sometimes possible to	-0.124939
-0.539365	deciding which version to	-0.124939
-0.355882	unused fourth value to	-0.124939
-0.459255	transferring composite objects to	-0.124939
-0.165030	that it takes to	-0.124939
-0.070310	than it takes to	-0.124939
-0.010491	time it takes to	-0.321233
-0.581836	forces the variable to	-0.124939
-0.335465	cause other variables to	-0.124939
-0.335465	setting these variables to	-0.124939
-0.537900	need induction variables to	-0.124939
-0.335465	Windows, allow variables to	-0.124939
-0.353511	it must return to	-0.124939
-0.340449	// add 2 to	-0.124939
-0.063409	// Add 2 to	-0.425969
-0.495671	expect the table to	-0.124939
-0.495671	copies the table to	-0.124939
-0.720087	a virtual table to	-0.124939
-0.562529	cause the software to	-0.124939
-0.488387	common for software to	-0.124939
-0.001955	it in order to	-0.124939
-0.001955	code in order to	-0.124939
-0.000976	data in order to	-0.124939
-0.001955	other in order to	-0.124939
-0.001955	set in order to	-0.124939
-0.001955	size in order to	-0.124939
-0.001955	i in order to	-0.124939
-0.001955	variables in order to	-0.124939
-0.001955	2 in order to	-0.124939
-0.001955	order in order to	-0.124939
-0.001955	elements in order to	-0.124939
-0.001955	const in order to	-0.124939
-0.001955	8 in order to	-0.124939
-0.001955	unsigned in order to	-0.124939
-0.001955	operations in order to	-0.124939
-0.000976	times in order to	-0.124939
-0.001955	big in order to	-0.124939
-0.001955	element in order to	-0.124939
-0.001955	information in order to	-0.124939
-0.001955	addresses in order to	-0.124939
-0.001955	end in order to	-0.124939
-0.001955	needed in order to	-0.124939
-0.001955	together in order to	-0.124939
-0.001955	bool in order to	-0.124939
-0.001955	conditions in order to	-0.124939
-0.001955	right in order to	-0.124939
-0.001955	cores in order to	-0.124939
-0.001955	input in order to	-0.124939
-0.001955	blocks in order to	-0.124939
-0.001955	low in order to	-0.124939
-0.001955	algorithms in order to	-0.124939
-0.001955	purpose in order to	-0.124939
-0.001955	itself in order to	-0.124939
-0.001955	principles in order to	-0.124939
-0.001955	package in order to	-0.124939
-0.001955	is, in order to	-0.124939
-0.001955	bookkeeping in order to	-0.124939
-0.001955	influences in order to	-0.124939
-0.001955	experiments in order to	-0.124939
-0.001955	ment in order to	-0.124939
-0.001955	randomness in order to	-0.124939
-0.001955	de-referenced in order to	-0.124939
-0.001955	phase in order to	-0.124939
-0.001955	(GOT) in order to	-0.124939
-0.001955	sizeof(float) in order to	-0.124939
-0.031687	} In order to	-0.124939
-0.031687	B. In order to	-0.124939
-0.031687	system-specific. In order to	-0.124939
-0.501357	which code branch to	-0.124939
-0.470348	is a way to	-0.124939
-0.333882	shows a way to	-0.124939
-0.090622	parallelism. The way to	-0.124939
-0.090622	factors. The way to	-0.124939
-0.090622	the only way to	-0.124939
-0.090622	The only way to	-0.124939
-0.150216	is no way to	-0.425969
-0.205278	even faster way to	-0.124939
-0.276064	very useful way to	-0.124939
-0.011045	the best way to	-0.124939
-0.016674	The best way to	-0.124939
-0.276064	A good way to	-0.124939
-0.276064	The safe way to	-0.124939
-0.205278	The simplest way to	-0.124939
-0.042951	no easy way to	-0.124939
-0.205278	The typical way to	-0.124939
-0.042951	the fastest way to	-0.124939
-0.042951	The easiest way to	-0.124939
-1.626659	number of elements to	-0.124939
-0.489351	sets all elements to	-0.124939
-0.437004	it is faster to	-0.124939
-0.306513	It is faster to	-0.124939
-0.437004	operand is faster to	-0.124939
-0.437004	tool is faster to	-0.124939
-0.548708	is often faster to	-0.124939
-0.389683	be even faster to	-0.124939
-0.389683	usually much faster to	-0.124939
-0.299791	is usually faster to	-0.124939
-0.207666	replace the call to	-0.124939
-0.207666	inlining the call to	-0.124939
-0.207666	removing the call to	-0.124939
-0.104992	that a call to	-0.124939
-0.104992	across a call to	-0.124939
-0.564963	a function call to	-0.124939
-0.503126	each function call to	-0.124939
-0.321708	driver. A call to	-0.124939
-0.067103	than one call to	-0.124939
-0.032257	only one call to	-0.124939
-0.243743	because each call to	-0.124939
-0.243743	make any call to	-0.124939
-0.404613	the first call to	-0.124939
-0.049331	a single call to	-0.124939
-0.243743	// Virtual call to	-0.124939
-0.380463	used, for example, to	-0.124939
-0.380463	be, for example, to	-0.124939
-0.581060	fraction. For example, to	-0.124939
-0.699984	the sign bit to	-0.124939
-0.630486	set sign bit to	-0.124939
-0.545425	setting a register to	-0.124939
-0.618925	an extra register to	-0.124939
-0.337972	new physical register to	-0.124939
-0.028899	example of how to	-0.124939
-0.028899	examples of how to	-0.124939
-0.009527	130 for how to	-0.124939
-0.019267	120 for how to	-0.124939
-0.009527	122 for how to	-0.124939
-0.019267	107 for how to	-0.124939
-0.019267	www.agner.org/optimize/cppexamples.zip for how to	-0.124939
-0.255230	Advice on how to	-0.124939
-0.243185	Tips about how to	-0.124939
-0.011797	example shows how to	-0.204120
-0.198034	to know how to	-0.124939
-0.034300	is discussed how to	-0.124939
-0.157244	that specifies how to	-0.124939
-0.157244	C++ programming, how to	-0.124939
-0.157244	example illustrates how to	-0.124939
-0.071544	manual discusses how to	-0.124939
-0.071544	section discusses how to	-0.124939
-0.354245	// Use template to	-0.124939
-0.566566	uses XMM registers to	-0.124939
-0.031329	without the need to	-0.124939
-0.136263	functions that need to	-0.124939
-0.136263	files that need to	-0.124939
-0.136263	resources that need to	-0.124939
-0.444308	does not need to	-0.124939
-0.179442	that may need to	-0.124939
-0.179442	array may need to	-0.124939
-0.179442	You may need to	-0.124939
-0.079374	then you need to	-0.124939
-0.079374	If you need to	-0.124939
-0.079374	so you need to	-0.124939
-0.079374	words, you need to	-0.124939
-0.029440	is no need to	-0.204120
-0.047885	- no need to	-0.124939
-0.178244	case we need to	-0.124939
-0.178244	cores, we need to	-0.124939
-0.169950	registers. You need to	-0.124939
-0.169950	dynamic libraries need to	-0.124939
-0.169950	object files need to	-0.124939
-0.054427	table of pointers to	-0.301030
-0.386089	programmer that pointers to	-0.124939
-0.270782	data or pointers to	-0.124939
-0.386089	(rather than pointers to	-0.124939
-0.270782	allows multiple pointers to	-0.124939
-0.270782	block. Any pointers to	-0.124939
-0.270782	to keep pointers to	-0.124939
-0.270782	by setting pointers to	-0.124939
-0.270782	by initializing pointers to	-0.124939
-0.518401	telling the user to	-0.124939
-0.518401	forbids the user to	-0.124939
-0.706864	It is useful to	-0.124939
-1.008625	can be useful to	-0.124939
-0.150949	may be useful to	-0.221849
-0.392455	102 also useful to	-0.124939
-0.626046	be very useful to	-0.124939
-0.275655	is often useful to	-0.124939
-0.124584	it is sure to	-0.124939
-0.124584	This is sure to	-0.124939
-0.100576	they are sure to	-0.124939
-0.231695	processors are sure to	-0.124939
-0.231695	arguments are sure to	-0.124939
-0.389899	executables. Make sure to	-0.124939
-0.500829	of which method to	-0.124939
-0.786994	#pragma vector always to	-0.124939
-0.342013	Remember, therefore, always to	-0.124939
-0.347227	makes the access to	-0.124939
-0.378517	give you access to	-0.124939
-0.264964	threads have access to	-0.124939
-0.264964	fastest possible access to	-0.124939
-0.264964	to get access to	-0.124939
-0.264964	for fast access to	-0.124939
-0.264964	which gives access to	-0.124939
-0.378517	with network access to	-0.124939
-0.264964	subset, giving access to	-0.124939
-0.264964	allows direct access to	-0.124939
-0.264964	Sequential forward access to	-0.124939
-0.526793	loop by 16 to	-0.124939
-0.343367	actually adds 16 to	-0.124939
-0.139058	block turns out to	-0.124939
-0.139058	prediction turns out to	-0.124939
-0.682706	the operating system to	-0.124939
-0.542981	write the file to	-0.124939
-0.353358	all other bits to	-0.124939
-0.353454	for disk operations to	-0.124939
-0.138655	integers from 0 to	-0.124939
-0.138655	interval from 0 to	-0.124939
-0.353641	the same type to	-0.124939
-1.351250	in some cases to	-0.124939
-0.352519	compact, and simple to	-0.124939
-0.456142	adding new instructions to	-0.124939
-0.563621	that are available to	-0.124939
-0.343249	be made available to	-0.124939
-0.181652	adding a constant to	-0.425969
-0.247126	that are up to	-0.124939
-0.247126	the code up to	-0.124939
-0.247126	currently not up to	-0.124939
-0.355487	be set up to	-0.124939
-0.399880	may take up to	-0.124939
-0.247126	it count up to	-0.124939
-0.106217	systems allow up to	-0.124939
-0.106217	Mac allow up to	-0.124939
-0.247126	vector turned up to	-0.124939
-0.247126	tested (not up to	-0.124939
-0.247126	CPU dispatchers up to	-0.124939
-0.247126	registers, totaling up to	-0.124939
-0.487574	Number of times to	-0.124939
-0.260210	long response times to	-0.124939
-0.081895	arrays and want to	-0.124939
-0.019072	you may want to	-0.124939
-0.128249	and you want to	-0.124939
-0.088976	that you want to	-0.124939
-0.012520	if you want to	-0.124939
-0.066556	code you want to	-0.124939
-0.032004	when you want to	-0.124939
-0.066556	program you want to	-0.124939
-0.088976	If you want to	-0.124939
-0.134947	where you want to	-0.124939
-0.066556	example, you want to	-0.124939
-0.066556	Whether you want to	-0.124939
-0.048781	function we want to	-0.124939
-0.023706	if we want to	-0.124939
-0.048781	If we want to	-0.124939
-0.081895	manuals. I want to	-0.124939
-0.134072	compilers. We want to	-0.124939
-0.081895	we still want to	-0.124939
-0.039020	developers who want to	-0.124939
-0.039020	those who want to	-0.124939
-0.006916	it is important to	-0.124939
-0.150368	It is important to	-0.204120
-0.066605	performance is important to	-0.124939
-0.321854	even more important to	-0.124939
-0.135786	is very important to	-0.124939
-0.169333	is therefore important to	-0.124939
-0.169333	is too important to	-0.124939
-0.456619	many different CPUs to	-0.124939
-0.063427	are sufficiently large to	-0.425969
-0.340747	some heavy work to	-0.124939
-0.340747	the reinstallation work to	-0.124939
-0.116136	between the calls to	-0.124939
-0.116136	avoid the calls to	-0.124939
-0.275139	number of calls to	-0.124939
-0.487044	61 function calls to	-0.124939
-0.275139	loops by calls to	-0.124939
-0.275139	contains no calls to	-0.124939
-0.275139	program contains calls to	-0.124939
-0.275139	"function". Multiple calls to	-0.124939
-0.352464	with other calculations to	-0.124939
-0.816095	down the execution to	-0.124939
-0.649764	the final result to	-0.124939
-0.352905	a hyperthreading processor to	-0.124939
-0.518685	D is compiled to	-0.124939
-0.317552	are first compiled to	-0.124939
-0.317552	Example 8.26a compiled to	-0.124939
-0.317552	Example 8.26b compiled to	-0.124939
-0.352905	string of bytes to	-0.124939
-0.517408	arrays very big to	-0.124939
-0.025344	it is necessary to	-0.124939
-0.029210	It is necessary to	-0.124939
-0.090425	not be necessary to	-0.124939
-0.042863	may be necessary to	-0.124939
-0.143984	makes it necessary to	-0.124939
-0.250356	is not necessary to	-0.124939
-0.031772	is often necessary to	-0.124939
-0.013835	is therefore necessary to	-0.301030
-0.143984	is rarely necessary to	-0.124939
-0.340266	the nearest element to	-0.124939
-0.340266	extra dummy element to	-0.124939
-0.138869	of execution speed to	-0.425969
-0.351560	example is specific to	-0.124939
-0.294502	It is common to	-0.425969
-0.326246	is more common to	-0.124939
-0.511771	fix the thread to	-0.124939
-0.520681	lock a thread to	-0.124939
-0.352946	time slices allocated to	-0.124939
-0.454726	are too small to	-0.124939
-0.442512	Conversion of integers to	-0.124939
-0.442512	two 32-bit integers to	-0.124939
-0.417326	of unsigned integers to	-0.124939
-0.417326	convert unsigned integers to	-0.124939
-0.505783	It is good to	-0.124939
-0.339394	is not good to	-0.124939
-0.353323	said than done to	-0.124939
-0.549781	from single precision to	-0.124939
-0.572775	entire cache line to	-0.124939
-0.418778	four function parameters to	-0.124939
-0.295626	passed as parameters to	-0.124939
-0.384565	of four parameters to	-0.124939
-0.057216	to fourteen parameters to	-0.425969
-0.020680	it is advantageous to	-0.301030
-0.142072	It is advantageous to	-0.124939
-0.032994	can be advantageous to	-0.124939
-0.183411	may be advantageous to	-0.124939
-0.107613	also be advantageous to	-0.124939
-0.107613	cases be advantageous to	-0.124939
-0.183411	therefore be advantageous to	-0.124939
-0.087581	is not advantageous to	-0.301030
-0.139920	is less advantageous to	-0.124939
-0.139920	almost always advantageous to	-0.124939
-0.342731	which is known to	-0.124939
-0.138399	result is known to	-0.425969
-0.342731	process is known to	-0.124939
-0.294332	constant and known to	-0.124939
-0.457506	Fortunately, the solution to	-0.124939
-0.308167	A simple solution to	-0.124939
-0.308167	The standard solution to	-0.124939
-0.308167	a better solution to	-0.124939
-0.354796	is an advantage to	-0.124939
-0.499245	be an advantage to	-0.124939
-0.309103	is no advantage to	-0.124939
-0.309103	a specific advantage to	-0.124939
-0.038940	} // Function to	-0.602060
-0.129174	functions // Function to	-0.124939
-0.129174	classes // Function to	-0.124939
-0.129174	SSE4.1 // Function to	-0.124939
-0.129174	); // Function to	-0.124939
-0.129174	const*)p);} // Function to	-0.124939
-0.013972	loop by eight to	-0.726999
-0.226267	when deciding whether to	-0.124939
-0.306895	it decides whether to	-0.124939
-0.031059	for is likely to	-0.124939
-0.269766	it is likely to	-0.124939
-0.015252	code is likely to	-0.124939
-0.031059	compiler is likely to	-0.124939
-0.031059	this is likely to	-0.124939
-0.031059	program is likely to	-0.124939
-0.031059	which is likely to	-0.124939
-0.031059	address is likely to	-0.124939
-0.031059	user is likely to	-0.124939
-0.031059	method is likely to	-0.124939
-0.031059	system is likely to	-0.124939
-0.031059	problem is likely to	-0.124939
-0.031059	model is likely to	-0.124939
-0.031059	platform is likely to	-0.124939
-0.031059	here is likely to	-0.124939
-0.031059	brand is likely to	-0.124939
-0.031059	comparison is likely to	-0.124939
-0.031059	87) is likely to	-0.124939
-0.061050	data are likely to	-0.124939
-0.061050	is more likely to	-0.124939
-0.061050	is also likely to	-0.124939
-0.187013	is very likely to	-0.124939
-0.061050	are less likely to	-0.124939
-0.061050	and therefore likely to	-0.124939
-0.061050	are equally likely to	-0.124939
-0.540488	in the structure to	-0.124939
-0.350956	around. Adding 1 to	-0.124939
-0.350634	do not add to	-0.124939
-0.335622	use this information to	-0.124939
-0.335622	adds extra information to	-0.124939
-0.351698	When used simply to	-0.124939
-0.002280	compiler is able to	-0.124939
-0.004571	microprocessor is able to	-0.124939
-0.002797	to be able to	-0.301030
-0.002096	not be able to	-0.124939
-0.002797	may be able to	-0.124939
-0.004202	will be able to	-0.124939
-0.000284	compilers are able to	-0.204120
-0.000853	microprocessors are able to	-0.124939
-0.006875	are not able to	-0.124939
-0.006875	usually not able to	-0.124939
-0.013861	not always able to	-0.124939
-0.013861	are actually able to	-0.124939
-0.006875	they were able to	-0.124939
-0.006875	tested were able to	-0.124939
-0.013861	are sometimes able to	-0.124939
-0.201843	1 is certain to	-0.124939
-0.201843	#define is certain to	-0.124939
-0.270690	is not certain to	-0.124939
-0.270690	is therefore certain to	-0.124939
-0.270690	instruction was certain to	-0.124939
-0.498353	is almost certain to	-0.124939
-0.943321	few clock cycles to	-0.124939
-0.472719	5 clock cycles to	-0.124939
-0.674220	hundred clock cycles to	-0.124939
-0.432406	causing return addresses to	-0.124939
-0.334201	translate these addresses to	-0.124939
-0.351800	while seconds count to	-0.124939
-0.351456	requiring many files to	-0.124939
-0.000206	it is recommended to	-0.271067
-0.000103	It is recommended to	-0.425969
-0.017241	is not recommended to	-0.124939
-0.031333	is also recommended to	-0.124939
-0.147737	is therefore recommended to	-0.124939
-0.031333	is strongly recommended to	-0.124939
-0.350841	the threads write to	-0.124939
-0.350000	expect 64-bit programs to	-0.124939
-0.130745	it is optimal to	-0.124939
-0.206515	not be optimal to	-0.124939
-0.304025	may be optimal to	-0.124939
-0.831234	is not optimal to	-0.124939
-0.532895	efficient memory space to	-0.124939
-0.665133	the heap space to	-0.124939
-0.180395	is a lot to	-0.425969
-0.494599	often a lot to	-0.124939
-0.331951	of overflow Integer to	-0.124939
-0.331951	further discussion. Integer to	-0.124939
-0.429146	2" The dispatching to	-0.124939
-0.556095	supports CPU dispatching to	-0.124939
-0.350050	the two branches to	-0.124939
-0.427459	such an application to	-0.124939
-0.330251	a typical application to	-0.124939
-0.350898	the && expression to	-0.124939
-0.821450	is more complicated to	-0.124939
-0.350303	who would like to	-0.124939
-0.565818	align data members to	-0.124939
-0.517192	use these methods to	-0.124939
-0.349958	the data block to	-0.124939
-0.132095	&& a needs to	-0.124939
-0.156769	function that needs to	-0.124939
-0.085130	work that needs to	-0.124939
-0.085130	destructor that needs to	-0.124939
-0.053635	and it needs to	-0.124939
-0.221140	because it needs to	-0.124939
-0.029450	the compiler needs to	-0.124939
-0.132095	&& b needs to	-0.124939
-0.132095	executable file needs to	-0.124939
-0.132095	one constant needs to	-0.124939
-0.132095	positive list needs to	-0.124939
-0.132095	as ReadB needs to	-0.124939
-0.494372	if the conversion to	-0.124939
-0.329010	integers before conversion to	-0.124939
-0.462602	as a parameter to	-0.124939
-0.328211	an implicit parameter to	-0.124939
-0.528050	floating point division to	-0.124939
-0.309893	returns a reference to	-0.124939
-0.076152	pointer or reference to	-0.329059
-0.152675	a relative reference to	-0.124939
-0.152675	// Return reference to	-0.124939
-0.152675	a null reference to	-0.124939
-0.428593	virtually no cost to	-0.124939
-0.268690	no performance cost to	-0.124939
-0.383363	no extra cost to	-0.124939
-0.268690	a large cost to	-0.124939
-0.351733	large overhead cost to	-0.124939
-0.008016	is no reason to	-0.346788
-0.827985	the CPU dispatcher to	-0.124939
-0.325860	// add n to	-0.124939
-0.325860	by adding n to	-0.124939
-0.349402	zero-terminated ASCII string to	-0.124939
-0.018482	of the programmer to	-0.903090
-0.028026	for the programmer to	-0.249877
-0.348286	12.4b executes three to	-0.124939
-0.121885	may be better to	-0.124939
-0.746604	the static keyword to	-0.124939
-0.348509	the resource-hungry applications to	-0.124939
-0.348286	Don't change && to	-0.124939
-0.369890	code in addition to	-0.124939
-0.369890	doing an addition to	-0.124939
-0.283631	most important addition to	-0.124939
-0.283631	do another addition to	-0.124939
-0.349402	the update mechanism to	-0.124939
-0.351195	Metaprogramming Metaprogramming means to	-0.124939
-0.349905	convert these types to	-0.124939
-0.144402	that is difficult to	-0.124939
-0.262756	It is difficult to	-0.425969
-0.144402	which is difficult to	-0.124939
-0.032989	long and difficult to	-0.124939
-0.032989	bulky and difficult to	-0.124939
-0.021712	can be difficult to	-0.124939
-0.010720	may be difficult to	-0.124939
-0.032989	and are difficult to	-0.124939
-0.032989	that are difficult to	-0.124939
-0.068692	the code difficult to	-0.124939
-0.119212	and more difficult to	-0.124939
-0.068692	are very difficult to	-0.124939
-0.068692	and therefore difficult to	-0.124939
-0.068692	is quite difficult to	-0.124939
-0.068692	is slow, difficult to	-0.124939
-0.744843	m is transferred to	-0.124939
-0.742097	data are aligned to	-0.124939
-0.349231	or easy linking to	-0.124939
-1.236075	floating point numbers to	-0.124939
-0.348491	on runtime dispatch to	-0.124939
-0.060935	a well-defined interface to	-0.425969
-0.849206	the installation process to	-0.124939
-0.246458	the time goes to	-0.124939
-0.246458	The output goes to	-0.124939
-0.246458	software project goes to	-0.124939
-0.246458	call p->f() goes to	-0.124939
-0.246458	than 1% goes to	-0.124939
-0.631714	You may choose to	-0.124939
-0.363011	developer may choose to	-0.124939
-0.295078	Whether you choose to	-0.124939
-0.414924	appropriate compiler options to	-0.124939
-0.320200	have various options to	-0.124939
-0.184733	are several ways to	-0.124939
-0.069264	are various ways to	-0.249877
-0.121218	describe various ways to	-0.124939
-0.039347	are three ways to	-0.124939
-0.416185	differently. The link to	-0.124939
-0.321214	a symbolic link to	-0.124939
-0.229315	dispatch is made to	-0.124939
-0.229315	attempt is made to	-0.124939
-0.348475	keyword wherever appropriate to	-0.124939
-0.121257	object it points to	-0.124939
-0.121257	call it points to	-0.124939
-0.197515	a pointer points to	-0.124939
-0.137673	p always points to	-0.124939
-0.137673	pointer actually points to	-0.124939
-0.295519	what r points to	-0.124939
-0.137673	object p points to	-0.124939
-0.041225	which initially points to	-0.124939
-0.041225	pointer initially points to	-0.124939
-0.041225	entry initially points to	-0.124939
-0.348475	needs to switch to	-0.124939
-0.071945	before you start to	-0.124939
-0.034484	Before you start to	-0.124939
-0.348475	CPU will start to	-0.124939
-0.348475	is necessary here to	-0.124939
-0.212260	is more relevant to	-0.124939
-0.212260	is also relevant to	-0.124939
-0.212260	line options relevant to	-0.124939
-0.212260	are hardly relevant to	-0.124939
-0.212260	and keywords relevant to	-0.124939
-0.212260	all respects relevant to	-0.124939
-0.317255	are two things to	-0.124939
-0.317255	quite ingenious things to	-0.124939
-0.316133	{ // go to	-0.124939
-0.316133	the function go to	-0.124939
-0.347260	is simply predicted to	-0.124939
-0.316414	have more references to	-0.124939
-0.316414	calls. Internal references to	-0.124939
-0.468387	need extra overhead to	-0.124939
-0.318099	very little overhead to	-0.124939
-0.204316	code are relative to	-0.124939
-0.204316	each function relative to	-0.124939
-0.090252	the member relative to	-0.124939
-0.090252	data member relative to	-0.124939
-0.204316	the offset relative to	-0.124939
-0.204316	fact addressed relative to	-0.124939
-0.347814	add unused columns to	-0.124939
-0.600006	it is intended to	-0.124939
-0.424598	interposition is intended to	-0.124939
-0.371777	examples are intended to	-0.124939
-0.131992	Use a profiler to	-0.425969
-0.823745	{ // Loop to	-0.124939
-0.315071	Example 8.23a. Loop to	-0.124939
-0.137415	it is inefficient to	-0.425969
-0.347418	an immediate response to	-0.124939
-0.551068	of cache lines to	-0.124939
-0.404332	algorithm that comes to	-0.124939
-0.404332	when it comes to	-0.124939
-0.524525	code is limited to	-0.124939
-0.464666	plus the costs to	-0.124939
-0.275583	kinds of costs to	-0.124939
-0.275583	inherent performance costs to	-0.124939
-0.475113	about the destructor to	-0.124939
-0.710686	have a destructor to	-0.124939
-0.359341	it is safe to	-0.124939
-0.505597	not be safe to	-0.124939
-0.359341	is more safe to	-0.124939
-0.344095	vectors requires alignment to	-0.124939
-0.430070	define a macro to	-0.124939
-0.556384	// Define macro to	-0.124939
-0.344485	you want them to	-0.124939
-0.235537	we are writing to	-0.124939
-0.801674	Reading or writing to	-0.124939
-0.101993	more threads writing to	-0.124939
-0.101993	multiple threads writing to	-0.124939
-0.096638	can be reduced to	-0.124939
-0.524202	it more clear to	-0.124939
-0.444372	give higher priority to	-0.124939
-0.505058	from one iteration to	-0.124939
-0.132226	which processor models to	-0.124939
-0.301356	1 is changed to	-0.124939
-0.305160	to be changed to	-0.124939
-0.305160	may be changed to	-0.124939
-0.226671	is artificially changed to	-0.124939
-0.227965	It may fail to	-0.124939
-0.164082	companies often fail to	-0.124939
-0.164082	because they fail to	-0.124939
-0.074338	and therefore fail to	-0.124939
-0.074338	may therefore fail to	-0.124939
-0.164082	software products fail to	-0.124939
-0.552170	The first thing to	-0.124939
-0.301740	an obvious thing to	-0.124939
-0.540698	of data structures to	-0.124939
-0.532258	cause the heap to	-0.124939
-0.085802	can be initialized to	-0.124939
-0.254982	have been initialized to	-0.124939
-0.484675	want the executable to	-0.124939
-0.672349	the main executable to	-0.124939
-0.343275	such a subexpression to	-0.124939
-0.341894	install automatic updates to	-0.124939
-0.254982	Make calls directly to	-0.124939
-0.254982	instructions write directly to	-0.124939
-0.254982	be fed directly to	-0.124939
-0.312875	object is copied to	-0.124939
-0.213570	parameter is copied to	-0.124939
-0.216766	has been copied to	-0.124939
-0.216766	entire contents copied to	-0.124939
-0.342354	vector register sizes to	-0.124939
-0.044490	that is easier to	-0.124939
-0.044490	It is easier to	-0.124939
-0.044490	15.1b is easier to	-0.124939
-0.150033	manageable and easier to	-0.124939
-0.310565	is often easier to	-0.124939
-0.150033	is just easier to	-0.124939
-0.108906	to is identical to	-0.124939
-0.108906	p is identical to	-0.124939
-0.334739	BSD are identical to	-0.124939
-0.343275	reduced from 20 to	-0.124939
-0.297718	do not expect to	-0.124939
-0.297718	that we expect to	-0.124939
-0.121312	which is similar to	-0.124939
-0.121312	template is similar to	-0.124939
-0.205909	the result back to	-0.124939
-0.205909	the priority back to	-0.124939
-0.205909	and jumps back to	-0.124939
-0.205909	that dates back to	-0.124939
-0.290239	to set seconds to	-0.124939
-0.377963	take several seconds to	-0.124939
-0.734220	in the sequence to	-0.124939
-0.342236	also has something to	-0.124939
-0.508911	any performance penalty to	-0.124939
-0.342236	have special reasons to	-0.124939
-0.341731	of software programmers to	-0.124939
-0.341731	A little-known alternative to	-0.124939
-0.246233	the program happen to	-0.124939
-0.246233	several variables happen to	-0.124939
-0.246233	big matrix happen to	-0.124939
-0.133753	may be enough to	-0.124939
-0.061748	not long enough to	-0.124939
-0.061748	just long enough to	-0.124939
-0.589743	is big enough to	-0.124939
-0.133753	is small enough to	-0.124939
-0.133753	is rarely enough to	-0.124939
-0.029716	does not apply to	-0.124939
-0.061615	here may apply to	-0.124939
-0.061615	advices may apply to	-0.124939
-0.133440	not always apply to	-0.124939
-0.192661	same rules apply to	-0.124939
-0.340722	of a row to	-0.124939
-0.341226	a variable declaration to	-0.124939
-0.341226	add new features to	-0.124939
-0.023608	that is added to	-0.124939
-0.023608	integer is added to	-0.124939
-0.023608	c is added to	-0.124939
-0.023608	members is added to	-0.124939
-0.023608	f is added to	-0.124939
-0.286752	can be added to	-0.124939
-0.031388	It is easy to	-0.124939
-0.065223	error is easy to	-0.124939
-0.342093	in test situations to	-0.124939
-0.439502	reads or writes to	-0.124939
-0.234895	This function writes to	-0.124939
-0.234895	causes all writes to	-0.124939
-0.024154	functions. This applies to	-0.124939
-0.024154	manner. This applies to	-0.124939
-0.011909	The same applies to	-0.124939
-0.015953	This also applies to	-0.124939
-0.015953	Linux also applies to	-0.124939
-0.015953	operators also applies to	-0.124939
-0.049732	same advice applies to	-0.124939
-0.002520	can be applied to	-0.124939
-0.002520	only be applied to	-0.425969
-0.004039	results when applied to	-0.124939
-0.001006	static, when applied to	-0.726999
-0.594962	constructors and destructors to	-0.124939
-0.284210	classes with destructors to	-0.124939
-0.147026	of example 15.1b to	-0.124939
-0.067321	from example 15.1b to	-0.124939
-0.147026	convert example 15.1b to	-0.124939
-0.151878	compiler reduced 15.1b to	-0.124939
-0.339293	array pointer eax to	-0.124939
-0.342093	you don't care to	-0.124939
-0.340023	Linux The procedure to	-0.124939
-0.338140	by adding throw() to	-0.124939
-0.174362	compiler may try to	-0.124939
-0.174362	producer will try to	-0.124939
-0.174362	hyperthreading, then try to	-0.124939
-0.174362	Should we try to	-0.124939
-0.067348	integer is converted to	-0.124939
-0.067348	class is converted to	-0.124939
-0.013421	to be converted to	-0.124939
-0.004428	can be converted to	-0.301030
-0.056368	number when converted to	-0.124939
-0.067348	of object pointed to	-0.124939
-0.067348	The object pointed to	-0.124939
-0.013421	the value pointed to	-0.425969
-0.013421	the variable pointed to	-0.124939
-0.056368	the target pointed to	-0.124939
-0.337514	using advanced algorithms to	-0.124939
-0.337514	is OK, however, to	-0.124939
-0.116707	dispatchers are designed to	-0.124939
-0.116707	(MOVNT) are designed to	-0.124939
-0.436563	all the inputs to	-0.124939
-0.173525	it is preferred to	-0.124939
-0.173525	It is preferred to	-0.124939
-0.222084	may be preferred to	-0.124939
-0.340023	operator %. Conversion to	-0.124939
-0.276246	loop count down to	-0.124939
-0.276246	consumption was down to	-0.124939
-0.275704	array ; jump to	-0.124939
-0.275704	the microprocessor jump to	-0.124939
-0.348742	container are allowed to	-0.124939
-0.490825	is not allowed to	-0.124939
-1.018804	new and delete to	-0.124939
-0.052937	to be distributed to	-0.425969
-0.501211	a compiler generates to	-0.124939
-0.334526	from time T to	-0.124939
-0.402432	by the linker to	-0.124939
-0.283259	allows the linker to	-0.124939
-0.348009	during time measurements to	-0.124939
-0.265611	do some measurements to	-0.124939
-0.432814	Now, the factor to	-0.124939
-0.334526	from 64-bit MMX to	-0.124939
-0.338090	it makes sense to	-0.124939
-0.024154	way is equal to	-0.124939
-0.024154	list[i] is equal to	-0.124939
-0.024154	label is equal to	-0.124939
-0.076910	to be equal to	-0.124939
-0.076910	is therefore equal to	-0.124939
-0.333044	writes or reads to	-0.124939
-0.186362	application program. Add to	-0.124939
-0.186362	the library. Add to	-0.124939
-0.186362	CPU dispatching. Add to	-0.124939
-0.185773	It is expected to	-0.124939
-0.443333	can be expected to	-0.124939
-0.185773	set are expected to	-0.124939
-0.127939	it is convenient to	-0.124939
-0.186362	may be convenient to	-0.124939
-0.059275	be more convenient to	-0.124939
-0.059275	certainly more convenient to	-0.124939
-0.332219	the leftmost column to	-0.124939
-0.332219	sake of portability to	-0.124939
-0.333044	each thread. Pointers to	-0.124939
-0.253883	signed when converting to	-0.124939
-0.253883	signed before converting to	-0.124939
-0.332219	is extremely costly to	-0.124939
-0.609562	more powerful computers to	-0.124939
-0.758037	in the debugger to	-0.124939
-0.480083	is not permissible to	-0.124939
-0.031251	commercial compilers due to	-0.124939
-0.031251	be higher due to	-0.124939
-0.031251	large delay due to	-0.124939
-0.031251	the future due to	-0.124939
-0.031251	are unstable due to	-0.124939
-0.031251	some differences due to	-0.124939
-0.332219	is in edx, to	-0.124939
-0.048019	worth the effort to	-0.425969
-0.327051	software engineering principles to	-0.124939
-0.101724	may be obvious to	-0.124939
-0.101724	would be obvious to	-0.124939
-0.235597	may be swapped to	-0.124939
-0.235597	or even swapped to	-0.124939
-0.235597	not be portable to	-0.124939
-0.235597	is not portable to	-0.124939
-0.235597	no certain limit to	-0.124939
-0.547648	reasonable upper limit to	-0.124939
-0.101724	there is nothing to	-0.124939
-0.101724	There is nothing to	-0.124939
-0.598304	can be increased to	-0.124939
-0.234807	operator is equivalent to	-0.124939
-0.234807	directives are equivalent to	-0.124939
-0.423461	necessary cleanup jobs to	-0.124939
-0.044494	it is safer to	-0.124939
-0.044494	It is safer to	-0.124939
-0.094073	References are safer to	-0.124939
-0.094073	is therefore safer to	-0.124939
-0.329010	the expression -(-a) to	-0.124939
-0.423461	can be updated to	-0.124939
-0.329010	the program appear to	-0.124939
-0.329992	system resources Writes to	-0.124939
-0.037780	*.so) that belong to	-0.124939
-0.037780	addresses all belong to	-0.124939
-0.037780	functions often belong to	-0.124939
-0.037780	stack always belong to	-0.124939
-0.037780	cache lines belong to	-0.124939
-0.072167	You may prefer to	-0.124939
-0.072167	programmer may prefer to	-0.124939
-0.158764	users will prefer to	-0.124939
-0.767865	the time slices to	-0.124939
-0.047762	likely to lead to	-0.124939
-0.015344	This can lead to	-0.124939
-0.015344	insight can lead to	-0.124939
-0.015344	bottlenecks can lead to	-0.124939
-0.415439	shifts one place to	-0.124939
-0.178615	it is preferable to	-0.124939
-0.121160	may be preferable to	-0.124939
-0.121160	is often preferable to	-0.124939
-0.121160	where the obstacles to	-0.124939
-0.121160	Some important obstacles to	-0.124939
-0.121160	most common obstacles to	-0.124939
-0.023225	option. 8.4 Obstacles to	-0.124939
-0.023225	77 8.4 Obstacles to	-0.124939
-0.023225	compilers. 8.3 Obstacles to	-0.124939
-0.023225	74 8.3 Obstacles to	-0.124939
-0.452283	by the loader to	-0.124939
-0.047762	model number. Failure to	-0.124939
-0.023225	also deallocated. Failure to	-0.124939
-0.023225	been deallocated. Failure to	-0.124939
-0.047762	program flow. Failure to	-0.124939
-0.320615	you change pre-increment to	-0.124939
-0.047762	quite efficient thanks to	-0.124939
-0.047762	data automatically thanks to	-0.124939
-0.047762	very similar thanks to	-0.124939
-0.047762	becomes fragmented thanks to	-0.124939
-0.023225	b is guaranteed to	-0.124939
-0.023225	i&15 is guaranteed to	-0.124939
-0.047762	they are guaranteed to	-0.124939
-0.047762	is not guaranteed to	-0.124939
-0.415439	may need modification to	-0.124939
-0.415439	machines? Possible solutions to	-0.124939
-0.056368	in an appendix to	-0.124939
-0.056368	as an appendix to	-0.124939
-0.121160	classes. An appendix to	-0.124939
-0.121160	used as alternatives to	-0.124939
-0.121160	the possible alternatives to	-0.124939
-0.121160	are various alternatives to	-0.124939
-0.320615	lot of modifications to	-0.124939
-0.320615	better metaprogramming tools to	-0.124939
-0.047762	option -fpic according to	-0.124939
-0.047762	binary representation according to	-0.124939
-0.047762	always behave according to	-0.124939
-0.047762	100. Now, according to	-0.124939
-0.320615	to great lengths to	-0.124939
-0.121160	registers is extended to	-0.124939
-0.121160	can be extended to	-0.124939
-0.121160	registers are extended to	-0.124939
-0.210469	less efficient. Access to	-0.124939
-0.210469	data locally. Access to	-0.124939
-0.320615	for many years to	-0.124939
-0.321820	but also inconvenient to	-0.124939
-0.320615	the __assume_aligned directive to	-0.124939
-0.209536	model is going to	-0.124939
-0.209536	am not going to	-0.124939
-0.011909	a good idea to	-0.124939
-0.321820	interfaces and interfaces to	-0.124939
-0.321820	// Use mask to	-0.124939
-0.209536	variable names. Remember to	-0.124939
-0.209536	and down. Remember to	-0.124939
-0.209536	although it appears to	-0.124939
-0.209536	if this appears to	-0.124939
-0.320615	that add functionality to	-0.124939
-0.729819	the exception handler to	-0.124939
-0.210469	performance is inferior to	-0.124939
-0.210469	compilers are inferior to	-0.124939
-0.320615	from one auto_ptr to	-0.124939
-0.011457	compiler is unable to	-0.124939
-0.011457	will be unable to	-0.124939
-0.023225	convert example 15.1a to	-0.124939
-0.023225	reduces example 15.1a to	-0.124939
-0.023225	compiler reduced 15.1a to	-0.124939
-0.023225	compilers reduced 15.1a to	-0.124939
-0.320615	a systematic manner to	-0.124939
-0.460793	1.23456. The conclusion to	-0.124939
-0.310095	floating point multiplication, to	-0.124939
-0.310095	have tested seem to	-0.124939
-0.402391	range from -128 to	-0.124939
-0.460793	changing the dividend to	-0.124939
-0.170435	This is annoying to	-0.124939
-0.170435	schemes are annoying to	-0.124939
-0.310095	assembly output listing to	-0.124939
-0.235334	call is translated to	-0.124939
-0.170435	has been translated to	-0.124939
-0.311664	12 option -fno-builtin to	-0.124939
-0.064926	switch statement leads to	-0.124939
-0.064926	automatic vectorization leads to	-0.124939
-0.064926	lazy binding leads to	-0.124939
-0.567156	your programming questions to	-0.124939
-0.310095	a portability issue to	-0.124939
-0.310095	the function argument to	-0.124939
-0.310095	We may decide to	-0.124939
-0.310095	be completely unrolled to	-0.124939
-0.015344	range from 0x2700 to	-0.425969
-0.064926	from address 0x2700 to	-0.124939
-0.064926	that is ported to	-0.124939
-0.064926	is later ported to	-0.124939
-0.064926	not easily ported to	-0.124939
-0.310095	takes some experience to	-0.124939
-0.438098	modified, if necessary, to	-0.124939
-0.460793	function // Call to	-0.124939
-0.310095	feature. All accesses to	-0.124939
-0.170435	long time compared to	-0.124939
-0.170435	following disadvantages compared to	-0.124939
-0.036755	it is sufficient to	-0.124939
-0.289790	it is impossible to	-0.124939
-0.101433	x is type-casted to	-0.124939
-0.101433	pointers are type-casted to	-0.124939
-0.289790	CPUID was manipulated to	-0.124939
-0.289790	the code carefully to	-0.124939
-0.289790	number of dangers to	-0.124939
-0.377414	for virus scanners to	-0.124939
-0.377414	assigning different priorities to	-0.124939
-0.289790	can get answers to	-0.124939
-0.289790	a function prototype to	-0.124939
-0.289790	of 256 Kbytes to	-0.124939
-0.289790	Coarse-grained parallelism refers to	-0.124939
-0.289790	functions take microseconds to	-0.124939
-0.289790	take extra precautions to	-0.124939
-0.101433	it is worthwhile to	-0.124939
-0.101433	may be worthwhile to	-0.124939
-0.289790	it is profitable to	-0.124939
-0.377414	Constructor // Initialize to	-0.124939
-0.101433	the object belongs to	-0.124939
-0.101433	This normally belongs to	-0.124939
-0.377414	from the caller to	-0.124939
-0.289790	returns // Volatile to	-0.124939
-0.047762	standardized allows us to	-0.124939
-0.047762	biased allows us to	-0.124939
-0.101433	mentally flawed approach to	-0.124939
-0.101433	well thought-through approach to	-0.124939
-0.101433	of C++ relates to	-0.124939
-0.101433	C++ language relates to	-0.124939
-0.289790	in example 11.1a to	-0.124939
-0.047762	if you forget to	-0.124939
-0.047762	If you forget to	-0.124939
-0.289790	should never respond to	-0.124939
-0.101433	a significant contribution to	-0.124939
-0.101433	a negligible contribution to	-0.124939
-0.047762	including the ability to	-0.124939
-0.047762	loose the ability to	-0.124939
-0.101433	address and attempts to	-0.124939
-0.101433	that it attempts to	-0.124939
-0.377414	annoying time consumer to	-0.124939
-0.289790	allowed. Non-public distribution to	-0.124939
-0.289790	appropriate error messages to	-0.124939
-0.289790	used as coprocessors to	-0.124939
-0.289790	// sum, initialize to	-0.124939
-0.289790	often an obstacle to	-0.124939
-0.289790	the 64-bit extension to	-0.124939
-0.047762	take several minutes to	-0.124939
-0.047762	took several minutes to	-0.124939
-0.289790	has been incremented to	-0.124939
-0.234018	restarted anyway. Updates to	-0.124939
-0.234018	are serious limitations to	-0.124939
-0.234018	because we forgot to	-0.124939
-0.234018	or malloc. Handles to	-0.124939
-0.234018	for adding bounds-checking to	-0.124939
-0.234018	not always comparable to	-0.124939
-0.234018	It is unacceptable to	-0.124939
-0.234018	from time T+1 to	-0.124939
-0.234018	map are prone to	-0.124939
-0.234018	as a plug-in to	-0.124939
-0.234018	n < 223 to	-0.124939
-0.234018	Beginners are advised to	-0.124939
-0.234018	consider it unwise to	-0.124939
-0.234018	a "move constructor" to	-0.124939
-0.234018	overhead of switching to	-0.124939
-0.234018	statements like throw(A,B,C) to	-0.124939
-0.234018	elements are cumbersome to	-0.124939
-0.234018	or #pragma novector to	-0.124939
-0.234018	to integer According to	-0.124939
-0.234018	tested the capability to	-0.124939
-0.234018	size. I tried to	-0.124939
-0.234018	{ // Round to	-0.124939
-0.234018	-a > -b to	-0.124939
-0.234018	} // Entry to	-0.124939
-0.234018	will be rounded to	-0.124939
-0.234018	values is closest to	-0.124939
-0.234018	F1 is supposed to	-0.124939
-0.234018	cheap, in relation to	-0.124939
-0.234018	expects immediate responses to	-0.124939
-0.234018	by default, conform to	-0.124939
-0.234018	not 123 correspond to	-0.124939
-0.234018	the following steps to	-0.124939
-0.234018	processor core. Try to	-0.124939
-0.234018	threads from attempting to	-0.124939
-0.234018	reducing example 15.1d to	-0.124939
-0.234018	it is advisable to	-0.124939
-0.234018	systems gives rise to	-0.124939
-0.234018	empty throw() specification to	-0.124939
-0.234018	not testing. Trying to	-0.124939
-0.234018	10; // Convert to	-0.124939
-0.234018	i++,i2+=2.0f)a[i]=i2; 41 Float to	-0.124939
-0.234018	to respond quickly to	-0.124939
-0.234018	for software teachers to	-0.124939
-0.234018	2 * 5; to	-0.124939
-0.234018	easy to port to	-0.124939
-0.234018	reduce example 12.1b to	-0.124939
-0.234018	It just happened to	-0.124939
-0.234018	smaller and closer to	-0.124939
-0.234018	kb. This corresponds to	-0.124939
-0.234018	convert example 12.8a to	-0.124939
-0.234018	Language Runtime, CLR, to	-0.124939
-0.234018	you are risking to	-0.124939
-0.234018	values are confined to	-0.124939
-0.234018	integer multiplication prior to	-0.124939
-0.234018	it takes hours to	-0.124939
-0.234018	designers have gone to	-0.124939
-0.234018	are less susceptible to	-0.124939
-0.234018	necessary to adhere to	-0.124939
-0.234018	problems that relate to	-0.124939
-0.234018	GetPrivateProfileString and WritePrivateProfileString to	-0.124939
-0.234018	preferably be responded to	-0.124939
-0.234018	are adding -100 to	-0.124939
-0.234018	allocation also tends to	-0.124939
-0.234018	to reduce (a*b*c)+(c*b*a) to	-0.124939
-0.234018	metaprogramming implementation analogous to	-0.124939
-0.234018	am always happy to	-0.124939
-0.234018	is common practice to	-0.124939
-0.234018	GetLogicalProcessorInformation in Windows) to	-0.124939
-0.234018	jump from a=a*2; to	-0.124939
-0.234018	It is tempting to	-0.124939
-0.234018	have to adapt to	-0.124939
-0.234018	functions are unrelated to	-0.124939
-0.234018	Example 7.38b. Alternative to	-0.124939
-0.895990	as it is and	-0.124939
-0.569972	uses of a and	-0.124939
-0.197510	values of a and	-0.425969
-0.582704	area for a and	-0.124939
-0.435747	faster if a and	-0.124939
-0.699005	example, if a and	-0.124939
-0.435747	even if a and	-0.124939
-0.435747	optimized if a and	-0.124939
-0.508139	number. If a and	-0.124939
-0.508139	factor. If a and	-0.124939
-0.583825	80 into a and	-0.124939
-0.351103	the array a and	-0.124939
-0.494103	bytes between a and	-0.124939
-0.805518	by making a and	-0.124939
-0.351103	The parameters a and	-0.124939
-0.351103	are used. a and	-0.124939
-0.351103	two arrays, a and	-0.124939
-0.351103	by joining a and	-0.124939
-0.351103	= MultiplyBy<8>(10); a and	-0.124939
-0.568138	pointer points to and	-0.124939
-0.357062	to delete it and	-0.124939
-0.877008	inline the function and	-0.124939
-0.524699	inline this function and	-0.124939
-0.489797	by one function and	-0.124939
-0.780884	times each function and	-0.124939
-0.547788	a critical function and	-0.124939
-0.524699	by another function and	-0.124939
-0.737264	the dispatcher function and	-0.124939
-0.638281	the inlined function and	-0.124939
-0.461316	compile- time if and	-0.124939
-0.354277	multiple versions with and	-0.124939
-0.809055	code compiled with and	-0.124939
-0.537801	can turn on and	-0.124939
-1.579252	of the code and	-0.124939
-1.479753	in the code and	-0.124939
-0.811936	optimize the code and	-0.124939
-0.433615	size of code and	-0.124939
-0.433615	amount of code and	-0.124939
-0.164799	Caching of code and	-0.425969
-0.232262	time when code and	-0.124939
-0.232262	CriticalFunction when code and	-0.124939
-0.840730	the same code and	-0.124939
-0.840730	floating point code and	-0.124939
-0.720239	any extra code and	-0.124939
-0.459137	directly compiled code and	-0.124939
-0.633665	on intermediate code and	-0.124939
-0.866581	an intermediate code and	-0.124939
-0.325666	to binary code and	-0.124939
-0.364531	make position-independent code and	-0.124939
-0.364531	uses position-independent code and	-0.124939
-0.364531	burdensome position-independent code and	-0.124939
-0.479122	into machine code and	-0.124939
-0.325666	and well-structured code and	-0.124939
-0.325666	the startup code and	-0.124939
-0.595005	before the compiler and	-0.124939
-1.273159	The Gnu compiler and	-0.124939
-0.352494	the best compiler and	-0.124939
-0.357659	there between x and	-0.124939
-0.839365	of the time and	-0.124939
-0.278517	at a time and	-0.124939
-0.749078	ahead of time and	-0.124939
-1.308908	at compile time and	-0.124939
-0.618513	The development time and	-0.124939
-0.337756	take installation time and	-0.124939
-0.593300	function to use and	-0.124939
-0.456540	code cache use and	-0.124939
-0.459325	therefore becoming more and	-0.124939
-0.934317	calculation of A and	-0.124939
-0.548611	versions of memory and	-0.124939
-0.298571	in static memory and	-0.124939
-0.335693	much less memory and	-0.124939
-0.335693	most common memory and	-0.124939
-0.614584	the main memory and	-0.124939
-0.493470	where RAM memory and	-0.124939
-0.335693	to uncached memory and	-0.124939
-0.859823	Therefore, the data and	-0.124939
-0.698132	of test data and	-0.124939
-0.346415	store intermediate data and	-0.124939
-0.447775	containing thread-specific data and	-0.124939
-1.466345	of a program and	-0.124939
-0.351887	the final program and	-0.124939
-1.431124	programmer to make and	-0.124939
-0.323644	accesses to functions and	-0.124939
-0.419211	several different functions and	-0.124939
-0.456390	for all functions and	-0.124939
-0.648530	the critical functions and	-0.124939
-0.323644	optimize both functions and	-0.124939
-0.527876	different intrinsic functions and	-0.124939
-0.456390	and string functions and	-0.124939
-0.132347	of public functions and	-0.124939
-0.132347	All public functions and	-0.124939
-0.456390	pure. Virtual functions and	-0.124939
-0.323644	between leaf functions and	-0.124939
-0.977561	in the CPU and	-0.124939
-0.180743	by the CPU and	-0.425969
-0.180743	both the CPU and	-0.425969
-0.202362	bit scan instruction and	-0.124939
-1.645971	the floating point and	-0.124939
-1.190807	out the loop and	-0.124939
-0.792867	unroll the loop and	-0.124939
-0.792867	unrolling the loop and	-0.124939
-0.983056	unroll a loop and	-0.124939
-1.152167	the innermost loop and	-0.124939
-0.193036	it is used and	-0.124939
-0.348492	no longer used and	-0.124939
-0.498694	will have one and	-0.124939
-0.384028	the code cache and	-0.124939
-0.343158	that code cache and	-0.124939
-0.335439	such as cache and	-0.124939
-0.882577	level-1 data cache and	-0.124939
-1.144574	the level-2 cache and	-0.124939
-0.442686	128 bit integer and	-0.124939
-0.723619	an unsigned integer and	-0.124939
-0.342380	to mix integer and	-0.124939
-0.342380	registers (6 integer and	-0.124939
-0.585061	way. See page and	-0.124939
-0.732687	this instruction set and	-0.124939
-0.689707	AVX instruction set and	-0.425969
-0.732687	latest instruction set and	-0.124939
-0.340363	AVX-512 instruction set and	-0.425969
-0.330285	as list, set and	-0.124939
-0.633712	the same class and	-0.124939
-0.432229	a parent class and	-0.124939
-0.305717	of parent class and	-0.124939
-1.233153	compiler to do and	-0.124939
-0.575211	compiler can do and	-0.124939
-0.309922	come with compilers and	-0.124939
-0.309922	use only compilers and	-0.124939
-0.488265	The Intel compilers and	-0.124939
-0.120128	different C++ compilers and	-0.903090
-0.566844	of how compilers and	-0.124939
-0.402177	very good compilers and	-0.124939
-0.309922	with Intel's compilers and	-0.124939
-1.313496	you are using and	-0.124939
-0.140441	of float, double and	-0.124939
-0.140441	between float, double and	-0.124939
-0.586828	on the size and	-0.124939
-0.452645	optimizing for size and	-0.124939
-0.559437	by the Intel and	-0.124939
-0.132427	one from Intel and	-0.124939
-0.132427	Available from Intel and	-0.124939
-0.061185	The Microsoft, Intel and	-0.124939
-0.061185	with Microsoft, Intel and	-0.124939
-0.061185	from Microsoft, Intel and	-0.124939
-0.061185	(i.e. Microsoft, Intel and	-0.124939
-0.323893	The Gnu, Intel and	-0.124939
-1.193997	the function pointer and	-0.124939
-0.510688	inexact if b and	-0.124939
-0.336458	example, a, b and	-0.124939
-0.336458	and add b and	-0.124939
-0.062880	// Multiply b and	-0.425969
-0.492436	separate function library and	-0.124939
-0.492436	asmlib function library and	-0.124939
-0.343928	most efficient library and	-0.124939
-0.350613	this to i and	-0.124939
-0.350613	also eliminate i and	-0.124939
-0.469037	256 bit float and	-0.124939
-0.062407	Don't mix float and	-0.124939
-0.332924	// Mixing float and	-0.124939
-0.332924	code mixes float and	-0.124939
-0.713264	loop by two and	-0.124939
-0.561497	of the number and	-0.124939
-1.302553	If the number and	-0.124939
-0.505023	storage of static and	-0.124939
-0.138700	in both static and	-0.124939
-0.138700	with both static and	-0.124939
-0.083075	optimization of C++ and	-0.301030
-0.524163	processing in C++ and	-0.124939
-0.589600	inlining more efficient and	-0.124939
-0.990389	are less efficient and	-0.124939
-0.856364	to an array and	-0.124939
-0.354662	2 if possible and	-0.124939
-0.757170	a debug version and	-0.124939
-0.884507	calculate the value and	-0.124939
-0.495056	all class objects and	-0.124939
-0.336796	align large objects and	-0.124939
-0.723297	all allocated objects and	-0.124939
-0.336796	to declare objects and	-0.124939
-0.348556	Floating point variables and	-0.425969
-0.292168	that all variables and	-0.124939
-0.292168	contains many variables and	-0.124939
-0.380325	used. Such variables and	-0.124939
-0.380325	other local variables and	-0.124939
-0.121965	all non-static variables and	-0.124939
-0.121965	All non-static variables and	-0.124939
-0.292168	access internal variables and	-0.124939
-0.056715	7.2 Integers variables and	-0.124939
-0.710623	call and return and	-0.124939
-1.672334	power of 2 and	-0.124939
-0.345547	takes between 2 and	-0.124939
-0.648311	an import table and	-0.124939
-0.448839	analyzing program performance and	-0.124939
-0.448839	very good performance and	-0.124939
-0.349261	in sequential order and	-0.124939
-0.349261	a natural order and	-0.124939
-0.125386	be very long and	-0.124939
-0.340775	give annoyingly long and	-0.124939
-0.554974	Windows and 32-bit and	-0.124939
-0.389829	capabilities for 32-bit and	-0.124939
-0.389829	executables for 32-bit and	-0.124939
-0.361631	performance between 32-bit and	-0.124939
-0.276849	libraries support 32-bit and	-0.124939
-0.276849	(2013) both 32-bit and	-0.124939
-0.276849	It supports 32-bit and	-0.124939
-0.116728	not. Supports 32-bit and	-0.124939
-0.116728	code). Supports 32-bit and	-0.124939
-0.276849	platforms, including 32-bit and	-0.124939
-0.276849	and Linux, 32-bit and	-0.124939
-0.276849	OS X, 32-bit and	-0.124939
-0.276849	both 16-bit, 32-bit and	-0.124939
-0.354017	of that branch and	-0.124939
-0.458698	the first way and	-0.124939
-0.457687	multiple data elements and	-0.124939
-0.610520	function calls faster and	-0.124939
-0.333550	threads becomes faster and	-0.124939
-0.333550	is generally faster and	-0.124939
-0.333550	software packages faster and	-0.124939
-0.607570	the time before and	-0.124939
-0.429634	your program before and	-0.124939
-0.331989	stamp counter before and	-0.124939
-0.331989	clock count before and	-0.124939
-0.420518	function is called and	-0.301030
-0.507510	destructors are called and	-0.124939
-0.339239	different code address and	-0.124939
-0.535691	arbitrary memory address and	-0.124939
-0.339239	by their address and	-0.124939
-0.548066	while Pentium 4 and	-0.124939
-0.537255	overlap the call and	-0.124939
-0.345850	overhead of call and	-0.124939
-0.352819	would be 8 and	-0.124939
-0.353262	between 8 bit and	-0.124939
-0.353070	is reflected, first and	-0.124939
-1.140584	in a register and	-0.124939
-0.353529	Intel: "Intel 64 and	-0.124939
-0.721145	Intel function libraries and	-0.124939
-0.501628	graphics function libraries and	-0.124939
-0.319600	long vector libraries and	-0.124939
-0.319600	try different libraries and	-0.124939
-0.319600	all runtime libraries and	-0.124939
-0.937069	floating point registers and	-0.124939
-0.285383	both the pointers and	-0.124939
-0.372027	well as pointers and	-0.124939
-0.285383	analyze all pointers and	-0.124939
-0.523469	of using pointers and	-0.124939
-0.433748	with member pointers and	-0.124939
-0.285383	arguments while pointers and	-0.124939
-0.285383	Relocation. All pointers and	-0.124939
-0.285383	the link pointers and	-0.124939
-0.285383	This includes pointers and	-0.124939
-0.519983	before the test and	-0.124939
-0.818627	time a new and	-0.124939
-0.164975	memory with new and	-0.124939
-0.164975	object with new and	-0.124939
-0.164975	allocation with new and	-0.124939
-0.164975	dynamically with new and	-0.124939
-0.283494	to using new and	-0.124939
-0.283494	CString uses new and	-0.124939
-0.283494	the operators new and	-0.124939
-0.283494	alloca over new and	-0.124939
-0.620962	in 64-bit systems and	-0.124939
-0.186615	in 32-bit systems and	-0.124939
-0.574626	64-bit operating systems and	-0.124939
-0.157609	32-bit operating systems and	-0.124939
-0.283494	with existing systems and	-0.124939
-0.862639	to the user and	-0.124939
-0.517314	for the user and	-0.124939
-0.453953	should avoid these and	-0.124939
-0.506122	than memory access and	-0.124939
-0.324073	when CPU access and	-0.124939
-0.680817	put file access and	-0.124939
-0.324073	with non-sequential access and	-0.124939
-0.456963	optimized for SSE2 and	-0.124939
-1.260467	the operating system and	-0.124939
-0.525300	compiler, operating system and	-0.124939
-0.343239	aligned by 32 and	-0.124939
-0.343239	8, 16, 32 and	-0.124939
-0.313233	the library file and	-0.124939
-0.313233	or C++ file and	-0.124939
-0.313233	one source file and	-0.124939
-0.245949	the executable file and	-0.124939
-0.540016	64-bit vector operations and	-0.124939
-1.029213	floating point operations and	-0.124939
-0.332321	comparison, bit operations and	-0.124939
-0.393772	one is 0 and	-0.124939
-0.096213	b to 0 and	-0.425969
-0.411111	values than 0 and	-0.425969
-0.393772	i < 0 and	-0.124939
-0.556804	specifying the type and	-0.124939
-0.481598	the function type and	-0.124939
-0.352663	worst possible case and	-0.124939
-1.345575	in some cases and	-0.124939
-0.553243	for different processors and	-0.124939
-0.302342	between simple processors and	-0.124939
-0.392822	processors. AMD processors and	-0.124939
-0.392822	of physical processors and	-0.124939
-0.302342	on older processors and	-0.124939
-0.302342	see emulated processors and	-0.124939
-0.340834	(ArraySize) is constant and	-0.124939
-0.340834	double precision constant and	-0.124939
-0.467677	can set up and	-0.124939
-0.331929	frequency goes up and	-0.124939
-0.467677	for cleaning up and	-0.124939
-0.814475	the residual error and	-0.124939
-0.377451	way two times and	-0.124939
-0.289821	= 256 times and	-0.124939
-0.289821	repeats 20 times and	-0.124939
-0.289821	at random times and	-0.124939
-0.531176	repeats 1000 times and	-0.124939
-0.289821	at unpredictable times and	-0.124939
-0.289821	function ten times and	-0.124939
-0.794063	on the stack and	-0.124939
-0.026513	Microsoft, Intel, Gnu and	-0.301030
-0.352901	is so important and	-0.124939
-0.340760	by most CPUs and	-0.124939
-0.340760	all 64-bit CPUs and	-0.124939
-0.436125	alignment of arrays and	-0.124939
-0.383832	several large arrays and	-0.124939
-0.122929	that big arrays and	-0.124939
-0.122929	have big arrays and	-0.124939
-0.295029	simple variables, arrays and	-0.124939
-0.295029	4. Align arrays and	-0.124939
-0.321900	performance. The Windows and	-0.124939
-0.265861	compiler for Windows and	-0.124939
-0.175761	library for Windows and	-0.124939
-0.175761	compiling for Windows and	-0.124939
-0.131127	and 64-bit Windows and	-0.124939
-0.351357	for 32-bit Windows and	-0.124939
-0.243904	64 bit Windows and	-0.124939
-0.049356	in both Windows and	-0.124939
-0.243904	Linux, BSD, Windows and	-0.124939
-0.669498	of function calls and	-0.124939
-0.469744	pure function calls and	-0.124939
-0.317142	has many calls and	-0.124939
-0.317142	of jumps, calls and	-0.124939
-1.220353	floating point calculations and	-0.124939
-0.499818	these two versions and	-0.124939
-0.498904	to out-of-order execution and	-0.124939
-0.394911	by the processor and	-0.124939
-0.394911	on the processor and	-0.124939
-0.327418	or VIA processor and	-0.124939
-1.101099	code is compiled and	-0.124939
-0.512146	function is big and	-0.124939
-0.338412	compiled code big and	-0.124939
-0.558768	Define multiple threads and	-0.124939
-0.575531	have the best and	-0.124939
-0.594301	of programming language and	-0.124939
-0.354572	C++ programming language and	-0.124939
-0.354572	particular programming language and	-0.124939
-0.408528	or assembly language and	-0.124939
-0.576154	using assembly language and	-0.124939
-0.533065	for execution speed and	-0.124939
-0.353431	a, b, c and	-0.124939
-0.325640	speed between single and	-0.124939
-0.325640	not mix single and	-0.124939
-0.325640	for mixing single and	-0.124939
-0.349904	arithmetic units, etc. and	-0.124939
-0.166169	all on AMD and	-0.124939
-0.166169	performance on AMD and	-0.124939
-0.166169	well on AMD and	-0.124939
-0.178006	such as AMD and	-0.124939
-0.178006	Supports both AMD and	-0.124939
-0.003204	of Intel, AMD and	-1.079181
-0.029726	for Intel, AMD and	-0.124939
-0.029726	from Intel, AMD and	-0.124939
-0.029726	both Intel, AMD and	-0.124939
-0.478516	block is allocated and	-0.124939
-0.121789	sizes are allocated and	-0.425969
-0.145438	count is small and	-0.124939
-0.514682	with a small and	-0.124939
-0.378419	because the overflow and	-0.124939
-0.357741	case of overflow and	-0.124939
-0.248881	problems of overflow and	-0.124939
-0.850034	check for overflow and	-0.124939
-0.264889	around on overflow and	-0.124939
-0.378419	generate an overflow and	-0.124939
-0.264889	much about overflow and	-0.124939
-0.264889	inputs give overflow and	-0.124939
-0.424579	size of integers and	-0.124939
-0.037694	conversions between integers and	-0.425969
-0.078972	Conversions between integers and	-0.124939
-0.424579	of 32-bit integers and	-0.124939
-0.565767	transposes a matrix and	-0.124939
-0.208722	versions of Linux and	-0.124939
-0.117856	data in Linux and	-0.124939
-0.117856	introduced in Linux and	-0.124939
-0.117856	overridden in Linux and	-0.124939
-0.440923	and 64-bit Linux and	-0.124939
-0.312197	for 64-bit Linux and	-0.124939
-0.208722	/MT). In Linux and	-0.124939
-0.207567	In 32-bit Linux and	-0.124939
-0.207567	between 32-bit Linux and	-0.124939
-0.208722	also supports Linux and	-0.124939
-0.028538	for Windows, Linux and	-0.425969
-0.059083	64-bit Windows, Linux and	-0.124939
-0.844544	for the AVX and	-0.124939
-0.309848	use of classes and	-0.124939
-0.500528	using vector classes and	-0.124939
-0.309848	into C++ classes and	-0.124939
-0.495758	containing container classes and	-0.124939
-0.453714	how this works and	-0.124939
-0.350727	each carefully optimized and	-0.124939
-0.517533	complicated address calculation and	-0.124939
-0.498495	six integer parameters and	-0.124939
-0.417762	ignore the problem and	-0.124939
-0.417762	fix the problem and	-0.124939
-0.674362	with AVX support and	-0.124939
-0.335678	requires OS support and	-0.124939
-0.349762	free. These operators and	-0.124939
-0.350834	element to list and	-0.124939
-0.467893	alignment of structure and	-0.124939
-0.413708	own data structure and	-0.124939
-0.319223	the logical structure and	-0.124939
-0.773051	able to inline and	-0.124939
-0.807943	0 or 1 and	-0.124939
-0.631795	32 bit mode and	-0.124939
-0.392190	for 16-bit mode and	-0.124939
-0.058106	to protected mode and	-0.425969
-0.352172	corrections for sign and	-0.124939
-0.543183	Increment loop counter and	-0.124939
-0.331674	an integer counter and	-0.124939
-0.547544	The first count and	-0.124939
-0.146627	maximum repeat count and	-0.124939
-0.369642	fixed repeat count and	-0.124939
-0.350240	large data files and	-0.124939
-0.381757	of object files and	-0.124939
-0.200000	store help files and	-0.124939
-0.295886	files, help files and	-0.124939
-0.267456	connections. Open files and	-0.124939
-0.350240	drivers, configuration files and	-0.124939
-0.467395	arrays is fast and	-0.124939
-0.500474	and for fast and	-0.124939
-0.365685	integers. The allocation and	-0.124939
-0.280181	optimize register allocation and	-0.124939
-0.280181	of dynamic allocation and	-0.124939
-0.280181	The frequent allocation and	-0.124939
-0.280181	finished. Register allocation and	-0.124939
-0.348321	in C++ programs and	-0.124939
-0.379229	of the problems and	-0.124939
-0.061248	of compatibility problems and	-0.425969
-0.213298	and compatibility problems and	-0.124939
-0.324350	of resource problems and	-0.124939
-0.187595	of usability problems and	-0.124939
-0.187595	problems, usability problems and	-0.124939
-0.512882	up cache space and	-0.124939
-0.616384	the CPU dispatching and	-0.124939
-0.435466	for CPU dispatching and	-0.124939
-0.435466	on CPU dispatching and	-0.124939
-0.551110	by the microprocessor and	-0.124939
-0.602424	a dedicated microprocessor and	-0.124939
-0.268152	number of branches and	-0.124939
-0.268152	target of branches and	-0.124939
-0.379851	with many branches and	-0.124939
-0.291781	with preceding branches and	-0.124939
-0.449689	make a multiplication and	-0.124939
-0.348346	to 12.8b automatically and	-0.124939
-0.518112	makes code caching and	-0.124939
-0.469062	supported instruction sets and	-0.124939
-0.469062	what instruction sets and	-0.124939
-0.414556	code more complicated and	-0.124939
-0.414556	elements more complicated and	-0.124939
-0.307828	is extremely complicated and	-0.124939
-0.567614	structured exception handling and	-0.124939
-0.519949	can look like and	-0.124939
-0.348215	various optimization methods and	-0.124939
-0.270249	differently on signed and	-0.124939
-0.270249	between using signed and	-0.124939
-0.114434	conversion between signed and	-0.124939
-0.114434	Conversions between signed and	-0.124939
-0.270249	to mix signed and	-0.124939
-0.515177	specific CPU model and	-0.124939
-0.534210	structured software development and	-0.124939
-0.467634	own memory block and	-0.124939
-0.666159	bigger memory block and	-0.124939
-0.597612	child class name and	-0.124939
-0.326681	any brand name and	-0.124939
-0.349087	This data conversion and	-0.124939
-0.449869	load is high and	-0.124939
-0.487161	variables to zero and	-0.124939
-0.058467	the terminating zero and	-0.425969
-0.423174	platforms. The Microsoft and	-0.124939
-0.423174	Clang, Intel, Microsoft and	-0.124939
-0.640789	a function parameter and	-0.124939
-0.348981	reductions involving division and	-0.124939
-0.348981	means that source and	-0.124939
-0.349284	program starts running and	-0.124939
-0.704607	on network resources and	-0.124939
-0.448877	loop by n and	-0.124939
-0.515170	produces a string and	-0.124939
-0.346973	are becoming better and	-0.124939
-0.346658	for Unix applications and	-0.124939
-0.346344	Boolean operators && and	-0.124939
-0.325175	time than addition and	-0.124939
-1.044890	floating point addition and	-0.124939
-0.347420	<, <=, > and	-0.124939
-0.417725	types of expressions and	-0.124939
-0.322452	the simplest expressions and	-0.124939
-0.323790	used to read and	-0.124939
-0.323790	WritePrivateProfileString to read and	-0.124939
-0.426991	to be read and	-0.124939
-0.322085	of #include directives and	-0.124939
-0.322085	18.2. Compiler directives and	-0.124939
-0.347220	for all public and	-0.124939
-0.415651	load the framework and	-0.124939
-0.673375	The .NET framework and	-0.124939
-0.723524	using static linking and	-0.124939
-0.115472	3.6 Dynamic linking and	-0.425969
-0.515542	all modern microprocessors and	-0.124939
-0.547108	floating point numbers and	-0.425969
-0.521571	the hardware platform and	-0.124939
-0.347725	clock cycles later and	-0.124939
-0.347725	software project together and	-0.124939
-0.525922	to 128-bit XMM and	-0.124939
-0.762532	the user interface and	-0.124939
-0.762532	graphical user interface and	-0.124939
-0.348814	have become bigger and	-0.124939
-0.447600	operations on vectors and	-0.124939
-0.345194	one for AVX2 and	-0.124939
-0.346757	use of << and	-0.124939
-0.251438	Supports all x86 and	-0.425969
-0.294087	library. Supports x86 and	-0.124939
-0.447726	software development process and	-0.124939
-0.872658	of the advantages and	-0.124939
-0.321956	type has advantages and	-0.124939
-0.349814	a and b, and	-0.124939
-0.447246	about data storage and	-0.124939
-1.102010	relevant optimization options and	-0.124939
-0.374634	the copy constructor and	-0.124939
-0.374634	no copy constructor and	-0.124939
-0.349309	bits of a[i] and	-0.124939
-0.377291	inline the function, and	-0.124939
-0.289690	the library function, and	-0.124939
-0.289690	the select function, and	-0.124939
-0.946978	of the operands and	-0.124939
-0.445336	use an advanced and	-0.124939
-1.004095	out of range and	-0.124939
-0.532794	takes to start and	-0.124939
-0.344883	tested library modules and	-0.124939
-0.410913	proxy is smaller and	-0.124939
-0.289690	the code smaller and	-0.124939
-0.289690	8 bytes smaller and	-0.124939
-0.315413	routines, system core and	-0.124939
-0.315413	dedicated microprocessor core and	-0.124939
-0.347037	of jumping around and	-0.124939
-0.314622	typically between 5 and	-0.124939
-0.314622	by 3, 5 and	-0.124939
-1.053914	the code section and	-0.124939
-0.344913	be further tested and	-0.124939
-0.346611	removed the contentions and	-0.124939
-0.407996	0 for positive and	-0.124939
-0.407996	has both positive and	-0.124939
-0.346186	arrays in C and	-0.124939
-0.316206	names, one global and	-0.124939
-0.316206	26. Avoid global and	-0.124939
-0.485568	avoid the conversions and	-0.124939
-0.445352	the if statement and	-0.124939
-0.346751	turn it off and	-0.124939
-0.314504	thing as p and	-0.124939
-0.314504	optimizing away p and	-0.124939
-0.529892	of programming languages and	-0.124939
-0.345847	procedures for installation and	-0.124939
-0.346751	may vary dynamically and	-0.124939
-0.511894	do function inlining and	-0.124939
-0.344983	accessing databases, network and	-0.124939
-0.485620	and a slow and	-0.124939
-0.310803	seldom used functions, and	-0.124939
-0.310803	inheritance, virtual functions, and	-0.124939
-0.344503	organized into lines and	-0.124939
-1.012391	order to find and	-0.124939
-0.309916	some syntax checking and	-0.124939
-0.620018	with bounds checking and	-0.124939
-0.490931	to other platforms and	-0.124939
-0.241921	about which platforms and	-0.124939
-0.241921	on all platforms and	-0.124939
-0.241921	and ARM platforms and	-0.124939
-0.562845	both the level-1 and	-0.124939
-0.522519	inputs is limited and	-0.124939
-0.344511	for fast math and	-0.124939
-0.343997	stack. String constants and	-0.124939
-0.343484	constant = shift and	-0.124939
-0.446021	This is safe and	-0.124939
-0.346056	economy, cache efficiency and	-0.124939
-0.484981	time. Text strings and	-0.124939
-0.349255	gain by testing and	-0.124939
-0.266642	also makes testing and	-0.124939
-0.266642	of development, testing and	-0.124939
-0.302831	restrictions on alignment and	-0.124939
-0.302831	about pointer alignment and	-0.124939
-0.441213	eax with 100 and	-0.124939
-0.269428	the stack Variables and	-0.124939
-0.269428	in memory. Variables and	-0.124939
-0.352627	variable storage Variables and	-0.124939
-0.341761	statistics, signal processing and	-0.124939
-0.367388	a more clear and	-0.124939
-0.256370	software more clear and	-0.124939
-0.234096	code less clear and	-0.124939
-0.234096	for making clear and	-0.124939
-0.343334	stored in x, and	-0.124939
-0.824091	Intel-based Mac OS and	-0.124939
-0.166515	between function names and	-0.124939
-0.166515	about function names and	-0.124939
-0.166515	define function names and	-0.124939
-0.226054	CPU brand names and	-0.124939
-0.343933	CPU time, RAM and	-0.124939
-0.058011	number of rows and	-0.425969
-0.096423	the same thing and	-0.124939
-0.440881	instances of structures and	-0.124939
-0.341936	that seldom occur and	-0.124939
-0.343892	things very smart and	-0.124939
-0.285066	with the SSE and	-0.124939
-0.285066	support the SSE and	-0.124939
-0.295685	we are reading and	-0.124939
-0.295685	spent on reading and	-0.124939
-0.500596	vector. The simplest and	-0.124939
-0.731951	an error message and	-0.124939
-0.377560	appropriate error message and	-0.124939
-0.507864	the CPU cores and	-0.124939
-0.340636	for array sizes and	-0.124939
-0.296270	on the PathScale and	-0.124939
-0.296270	Microsoft, Intel, PathScale and	-0.124939
-0.093580	of Linux, BSD and	-0.124939
-0.093580	in Linux, BSD and	-0.124939
-0.093580	64-bit Linux, BSD and	-0.124939
-0.166752	Windows, Linux, BSD and	-0.124939
-0.352159	of the program, and	-0.124939
-0.313243	by the program, and	-0.124939
-0.213862	execute the program, and	-0.124939
-0.289831	prior to SSE4.1 and	-0.124939
-0.289831	one for SSE4.1 and	-0.124939
-0.338840	a memory buffer and	-0.124939
-0.338840	value of seconds and	-0.124939
-0.437338	with the compiler, and	-0.124939
-0.478129	can be invalid and	-0.124939
-0.338840	for file input and	-0.124939
-0.322794	for many programmers and	-0.124939
-0.244650	for assembly programmers and	-0.124939
-0.244650	for advanced programmers and	-0.124939
-1.183482	the critical stride and	-0.124939
-1.011893	SSE2 instruction set, and	-0.124939
-0.245223	Pentium 4 processors, and	-0.124939
-0.245223	and VIA processors, and	-0.124939
-0.245223	on future processors, and	-0.124939
-0.443623	size doesn't matter and	-0.124939
-0.289197	or class declaration and	-0.124939
-0.289197	extern "C" declaration and	-0.124939
-0.439124	many optimization features and	-0.124939
-0.341691	have been added and	-0.124939
-0.338228	is OS independent and	-0.124939
-0.338228	simply optimized away and	-0.124939
-0.506766	to example 15.1b and	-0.124939
-0.285566	an inlined 15.1b and	-0.124939
-0.871160	constant = multiply and	-0.124939
-1.266083	for an explanation and	-0.124939
-0.441442	between CPU brands and	-0.124939
-0.888654	below the diagonal and	-0.124939
-0.340600	to reload *p and	-0.124939
-0.339017	floating point addition, and	-0.124939
-0.191143	64-bit. Supports OpenMP and	-0.124939
-0.040487	parallel processing, OpenMP and	-0.425969
-0.191143	page 107), OpenMP and	-0.124939
-0.282088	should be standardized and	-0.124939
-0.282088	is fully standardized and	-0.124939
-0.272889	functions of parent and	-0.124939
-0.181463	Members of parent and	-0.124939
-0.235482	of both parent and	-0.124939
-0.434283	languages, operating systems, and	-0.124939
-0.335698	0 for false and	-0.124939
-0.616271	on a PC and	-0.124939
-0.435391	between coarse-grained parallelism and	-0.124939
-0.337465	choose between c2 and	-0.124939
-0.334817	rules for prediction and	-0.124939
-0.334817	overlap the iterations and	-0.124939
-0.590149	to an integer, and	-0.124939
-0.274128	biased binary integer, and	-0.124939
-0.335698	literature on algorithms and	-0.124939
-0.436502	through the PLT and	-0.124939
-0.096804	mix of additions and	-0.124939
-0.096804	combination of additions and	-0.124939
-0.221567	by n additions and	-0.124939
-0.274894	loaded into ecx and	-0.124939
-0.274894	registers eax, ecx and	-0.124939
-0.334817	parameters, local variables, and	-0.124939
-0.335698	always accurate, however, and	-0.124939
-0.334817	programming languages, profiling and	-0.124939
-0.222245	code is fragmented and	-0.124939
-0.222245	to be fragmented and	-0.124939
-0.296105	to become fragmented and	-0.124939
-0.294498	The CPU family and	-0.124939
-0.220890	on its family and	-0.124939
-0.220890	its brand, family and	-0.124939
-0.274894	number of devices and	-0.124939
-0.274894	Accessing system devices and	-0.124939
-0.238860	of the GOT and	-0.124939
-0.173470	not use GOT and	-0.124939
-0.173470	no effect. GOT and	-0.124939
-0.173470	-read_only_relocs suppress. GOT and	-0.124939
-0.335698	sound processing Memory and	-0.124939
-0.336580	is shut down and	-0.124939
-0.411940	of cache misses and	-0.124939
-0.290466	code, cache misses and	-0.124939
-0.333467	compiled with -fpic and	-0.124939
-0.533279	into the carry and	-0.124939
-0.275798	used for debugging and	-0.124939
-0.205053	turn off debugging and	-0.124939
-0.205053	of verifying, debugging and	-0.124939
-0.333467	the next vector, and	-0.124939
-0.264959	of the object, and	-0.124939
-0.264959	the allocated object, and	-0.124939
-0.334471	should be allowed and	-0.124939
-0.347220	test program itself and	-0.124939
-0.264959	the application itself and	-0.124939
-0.335477	including linear algebra and	-0.124939
-0.332465	as task switches and	-0.124939
-0.769022	code is distributed and	-0.124939
-0.487284	in 32-bit mode, and	-0.124939
-0.264103	in exclusive mode, and	-0.124939
-0.431486	features of Java and	-0.124939
-0.333467	It is free and	-0.124939
-0.430231	development more expensive and	-0.124939
-0.112579	speed between rounding and	-0.124939
-0.112579	difference between rounding and	-0.124939
-0.400118	the operating system, and	-0.124939
-0.281499	no operating system, and	-0.124939
-0.488836	to 32-bit integers, and	-0.124939
-0.264103	is interpreted again and	-0.124939
-0.264103	is reused again and	-0.124939
-0.607054	by the linker and	-0.124939
-0.264959	both compiler, linker and	-0.124939
-0.335477	difficult to debug and	-0.124939
-0.088035	the Gnu, Clang and	-0.425969
-0.667385	gives more reliable and	-0.124939
-0.469780	while the Borland and	-0.124939
-0.501569	128-bit execution units and	-0.124939
-0.333467	and restoring registers, and	-0.124939
-0.245761	function to transpose and	-0.425969
-0.112067	variables if possible, and	-0.124939
-0.112067	implementation if possible, and	-0.124939
-0.112067	avoided, if possible, and	-0.124939
-0.127298	branches as possible, and	-0.124939
-0.251087	code is compact and	-0.124939
-0.543731	is more compact and	-0.124939
-0.331674	Using unaligned reads and	-0.124939
-0.496978	inefficient, of course, and	-0.124939
-0.514295	how to identify and	-0.124939
-0.692909	is more complex and	-0.124939
-0.107645	time. 4 Performance and	-0.124939
-0.107645	22 4 Performance and	-0.124939
-0.107645	2 13.4 Test and	-0.124939
-0.107645	decision. 13.4 Test and	-0.124939
-0.251087	compromise when portability and	-0.124939
-0.251087	between efficiency, portability and	-0.124939
-0.496978	time to evaluate and	-0.124939
-0.332840	Visual Basic .NET and	-0.124939
-0.252055	versus references Pointers and	-0.124939
-0.082643	operations. 7.6 Pointers and	-0.124939
-0.082643	33 7.6 Pointers and	-0.124939
-0.330510	constructs are costly and	-0.124939
-0.253026	code more efficient, and	-0.124939
-0.253026	make pointers efficient, and	-0.124939
-0.331674	generation of computers and	-0.124939
-0.330510	of A, B and	-0.124939
-0.252055	varies between 9 and	-0.124939
-0.252055	4, 6, 9 and	-0.124939
-0.507110	the Intel Core and	-0.124939
-0.329350	in a debugger and	-0.124939
-0.251087	instead of truncation and	-0.124939
-0.251087	changed to truncation and	-0.124939
-0.742350	the hot spots and	-0.124939
-0.330510	in Windows 7 and	-0.124939
-0.330510	actually quite powerful and	-0.124939
-0.039656	compiler for 32- and	-0.425969
-0.186461	libraries. Supports 32- and	-0.124939
-0.330510	processing, sound processing, and	-0.124939
-0.329350	many computer users and	-0.124939
-0.251087	language, e.g. C++, and	-0.124939
-0.251087	C#, managed C++, and	-0.124939
-0.332840	methods for communication and	-0.124939
-0.253026	parallelism because communication and	-0.124939
-0.325026	D, Pascal, Fortran and	-0.124939
-0.325026	of the increment and	-0.124939
-0.326405	are accessed backwards and	-0.124939
-0.325026	excessive memory swapping and	-0.124939
-0.157938	use of memset and	-0.124939
-0.157938	calls to memset and	-0.124939
-0.157938	the functions memset and	-0.124939
-0.325026	becoming more popular and	-0.124939
-0.327788	for string searching and	-0.124939
-0.368885	and constant propagation and	-0.124939
-0.225963	enable constant propagation and	-0.124939
-0.233495	during program development, and	-0.124939
-0.233495	of software development, and	-0.124939
-0.326405	parallelism is obvious and	-0.124939
-0.325026	casting. Linked lists and	-0.124939
-0.326405	like sqrt, pow and	-0.124939
-0.325026	storage, far pointers, and	-0.124939
-0.325026	memory allocation Objects and	-0.124939
-0.093598	to have constructors and	-0.124939
-0.029014	The copy constructors and	-0.124939
-0.029014	for copy constructors and	-0.124939
-0.029014	no copy constructors and	-0.124939
-0.422654	1 if nonzero and	-0.124939
-0.780333	a software package and	-0.124939
-0.458267	difficult to understand and	-0.124939
-0.325026	is quite inefficient, and	-0.124939
-0.325026	for foreground jobs and	-0.124939
-0.491514	between the latency and	-0.124939
-0.326405	on Linux platforms, and	-0.124939
-0.325026	code branches separately and	-0.124939
-0.458267	common subexpression elimination and	-0.124939
-0.326405	summation variables sum1 and	-0.124939
-0.325026	C++ code. Compilers and	-0.124939
-0.326405	The CodeGear, Codeplay and	-0.124939
-0.233495	advanced optimizing features, and	-0.124939
-0.233495	better backup features, and	-0.124939
-0.325026	the zero flag and	-0.124939
-0.234612	numbers in b[i] and	-0.124939
-0.234612	checking if b[i] and	-0.124939
-0.092455	delete or malloc and	-0.124939
-0.092455	delete, or malloc and	-0.124939
-0.093598	the functions malloc and	-0.124939
-0.093598	delete (or malloc and	-0.124939
-0.491514	0 is true, and	-0.124939
-0.318619	File input/output Graphics and	-0.124939
-0.322023	of these obstacles and	-0.124939
-0.322023	value of m and	-0.124939
-0.318619	lot of resources, and	-0.124939
-0.318619	is always one, and	-0.124939
-0.637981	facilities are needed, and	-0.124939
-0.318619	in the SVML and	-0.124939
-0.017906	than addition, subtraction and	-0.425969
-0.036581	integer addition, subtraction and	-0.124939
-0.582687	or double precision, and	-0.124939
-0.318619	know that u.f and	-0.124939
-0.412958	on page 134 and	-0.124939
-0.415070	could calculate *p+2 and	-0.124939
-0.208376	or more cores, and	-0.124939
-0.208376	instructions, multiple cores, and	-0.124939
-0.637981	into the pipeline and	-0.124939
-0.417192	7.6. Set flush-to-zero and	-0.124939
-0.320318	in example 14.8 and	-0.124939
-0.208376	checking for overflow, and	-0.124939
-0.208376	violation, integer overflow, and	-0.124939
-0.091400	calculated in advance and	-0.124939
-0.582687	the "worst case" and	-0.124939
-0.209694	a = 0x2710 and	-0.124939
-0.688689	from address 0x2710 and	-0.124939
-0.469123	the hot spot and	-0.124939
-0.318619	stack frame, saving and	-0.124939
-0.415070	importance of structured and	-0.124939
-0.320318	to poor documentation and	-0.124939
-0.320318	it does not, and	-0.124939
-0.412958	of example 12.4b and	-0.124939
-0.318619	generate an underflow and	-0.124939
-0.320318	less than 2n and	-0.124939
-0.092312	function. 7.12 Branches and	-0.124939
-0.092312	40 7.12 Branches and	-0.124939
-0.279713	of user interfaces and	-0.124939
-0.208376	to hardware interfaces and	-0.124939
-0.318619	each other's caches and	-0.124939
-0.582687	as it is, and	-0.124939
-0.585813	interrupt 3 breakpoint and	-0.124939
-0.318619	a well-defined functionality and	-0.124939
-0.209694	through multiple layers and	-0.124939
-0.209694	extra software layers and	-0.124939
-0.318619	induction variables Y and	-0.124939
-0.318619	pointers are auto_ptr and	-0.124939
-0.585813	constants, string constants, and	-0.124939
-0.320318	WritePrivateProfileString, which opens and	-0.124939
-0.320318	the right format and	-0.124939
-0.318619	with millisecond resolution and	-0.124939
-0.318619	point addition units, and	-0.124939
-0.399983	(See page 49 and	-0.124939
-0.308147	to four bits, and	-0.124939
-0.310359	are accessed consecutively and	-0.124939
-0.310359	execution (chapter 11) and	-0.124939
-0.310359	and then B, and	-0.124939
-0.308147	effort. Square blocking and	-0.124939
-0.308147	operating system API and	-0.124939
-0.308147	// Loop r1 and	-0.124939
-0.310359	// Loop r2 and	-0.124939
-0.308147	is fast anyway and	-0.124939
-0.169492	compilers, system database, and	-0.124939
-0.169492	a remote database, and	-0.124939
-0.308147	when doing calculations, and	-0.124939
-0.169492	last cache level, and	-0.124939
-0.169492	object file level, and	-0.124939
-0.308147	the relevant books and	-0.124939
-0.169492	designed for generality and	-0.124939
-0.169492	the full generality and	-0.124939
-0.308147	to the parameter, and	-0.124939
-0.308147	have many keywords and	-0.124939
-0.435484	the option -fno-pic and	-0.124939
-0.399983	x86-64 platform _M_IX86 and	-0.124939
-0.308147	seek information elsewhere and	-0.124939
-0.308147	vector integer operations, and	-0.124939
-0.399983	bigger software packages and	-0.124939
-0.308147	on page 136 and	-0.124939
-0.308147	151 15.1c automatically, and	-0.124939
-0.308147	See page 145 and	-0.124939
-0.308147	distinctions between RISC and	-0.124939
-0.308147	of C++, Pascal and	-0.124939
-0.076530	}; 7.23 Constructors and	-0.124939
-0.076530	54 7.23 Constructors and	-0.124939
-0.308147	processors, between PC's and	-0.124939
-0.169492	such as DOS and	-0.124939
-0.169492	operating systems DOS and	-0.124939
-0.308147	Example 7.4. Signed and	-0.124939
-0.308147	such as logarithms and	-0.124939
-0.310359	be the easiest and	-0.124939
-0.169492	structure, data flow and	-0.124939
-0.339017	the program flow and	-0.124939
-0.308147	s0, s1, s2 and	-0.124939
-0.308147	another function, etc., and	-0.124939
-0.308147	thrown by F2 and	-0.124939
-0.308147	not human readable and	-0.124939
-0.308147	here: functional decomposition and	-0.124939
-0.308147	Several internet forums and	-0.124939
-0.308147	} If Func1 and	-0.124939
-0.308147	count is odd and	-0.124939
-0.308147	as Boolean vectors, and	-0.124939
-0.310359	The allocation, deallocation and	-0.124939
-0.698522	linkage table (PLT) and	-0.124939
-0.308147	systems: Pointers, references, and	-0.124939
-0.141413	6.0f; Constant folding and	-0.124939
-0.141413	places. Constant folding and	-0.124939
-0.064556	r in Sum2 and	-0.124939
-0.064556	efficient than Sum2 and	-0.124939
-0.064556	functions Sum1, Sum2 and	-0.124939
-0.287931	bytes smaller. Structure and	-0.124939
-0.287931	n.a. n.a. _MSC_VER and	-0.124939
-0.287931	the & operator; and	-0.124939
-0.287931	counters are CPU-specific and	-0.124939
-0.023099	takes to develop and	-0.124939
-0.287931	most microprocessors. Multiplication and	-0.124939
-0.287931	in example 14.12b and	-0.124939
-0.287931	than self-styled hacks and	-0.124939
-0.287931	only a hint and	-0.124939
-0.375140	the preceding paragraph and	-0.124939
-0.375140	Firewalls, virus scanners and	-0.124939
-0.287931	See page 73 and	-0.124939
-0.287931	system color settings and	-0.124939
-0.287931	sent me corrections and	-0.124939
-0.287931	click becomes inconsistent and	-0.124939
-0.287931	cost of starting and	-0.124939
-0.287931	factor in itself, and	-0.124939
-0.287931	- 64 Kbytes and	-0.124939
-0.100830	cost to creating and	-0.124939
-0.100830	responsible for creating and	-0.124939
-0.287931	calls alternately FuncA and	-0.124939
-0.287931	to be overwritten, and	-0.124939
-0.287931	is advantageous if, and	-0.124939
-0.287931	contained in p1 and	-0.124939
-0.287931	the functions lrintf and	-0.124939
-0.287931	functions for audio and	-0.124939
-0.287931	a high price, and	-0.124939
-0.287931	of register renaming and	-0.124939
-0.375140	processing, data compression and	-0.124939
-0.047495	is the exponent, and	-0.124939
-0.047495	bit, the exponent, and	-0.124939
-0.287931	fine-tuning, testing, verifying and	-0.124939
-0.047495	60 7.30 Exceptions and	-0.124939
-0.047495	multithreading. 7.30 Exceptions and	-0.124939
-0.287931	constructors, copy constructors, and	-0.124939
-0.287931	have to push and	-0.124939
-0.287931	as sorting, searching, and	-0.124939
-0.287931	is deleted properly and	-0.124939
-0.287931	for accessing list[i].a and	-0.124939
-0.287931	instructions MOVNTPS, MOVNTPD and	-0.124939
-0.287931	137, respectively. Increment and	-0.124939
-0.527887	elimination, constant propagation, and	-0.124939
-0.527887	Math Kernel Library" and	-0.124939
-0.287931	speed up multiplications and	-0.124939
-0.287931	should be optional and	-0.124939
-0.287931	bit platform __GNUC__ and	-0.124939
-0.375140	storage. Example 14.23b and	-0.124939
-0.287931	in many respects and	-0.124939
-0.287931	CPU doesn't support, and	-0.124939
-0.375140	be quite tedious and	-0.124939
-0.375140	use a systematic and	-0.124939
-0.100830	to memory management and	-0.124939
-0.100830	of heap management and	-0.124939
-0.287931	functions look clumsy and	-0.124939
-0.287931	instructions are fetched and	-0.124939
-0.287931	#define ABC 123 and	-0.124939
-0.287931	away in reusable and	-0.124939
-0.287931	as Taylor expansions and	-0.124939
-0.287931	due to interrupts and	-0.124939
-0.287931	Fog. Public distribution and	-0.124939
-0.287931	calls to log, and	-0.124939
-0.100830	by S. Goedecker and	-0.124939
-0.100830	performance. Stefan Goedecker and	-0.124939
-0.287931	measurements as accurate and	-0.124939
-0.287931	such as sorting and	-0.124939
-0.375140	times to keyboard and	-0.124939
-0.047495	division, square root and	-0.124939
-0.047495	Division, square root and	-0.124939
-0.287931	algebra and statistics, and	-0.124939
-0.375140	prevent memory leaks and	-0.124939
-0.287931	See page 95 and	-0.124939
-0.527887	new and delete, and	-0.124939
-0.375140	element is accessed, and	-0.124939
-0.047495	object. 7.17 Structures and	-0.124939
-0.047495	50 7.17 Structures and	-0.124939
-0.232382	systems are common, and	-0.124939
-0.232382	is fast, compact, and	-0.124939
-0.232382	characters '?', '@' and	-0.124939
-0.232382	different memory areas, and	-0.124939
-0.232382	kinds of strange and	-0.124939
-0.232382	to become invalid, and	-0.124939
-0.232382	such as sqrt and	-0.124939
-0.232382	esp+8 and esp+12 and	-0.124939
-0.232382	xn = x∙xn-1, and	-0.124939
-0.232382	of cache evictions and	-0.124939
-0.232382	usability, program compactness, and	-0.124939
-0.232382	for code bloat and	-0.124939
-0.232382	Java and C# and	-0.124939
-0.232382	or false (0); and	-0.124939
-0.232382	sources of frustration and	-0.124939
-0.232382	in some situations, and	-0.124939
-0.232382	programming, compiler technology, and	-0.124939
-0.232382	microprocessors, different alignments and	-0.124939
-0.232382	efficiency, platform independence, and	-0.124939
-0.232382	resources are sufficient, and	-0.124939
-0.232382	background processes running, and	-0.124939
-0.232382	1, 2A, 2B, and	-0.124939
-0.232382	here gives a+b=0, and	-0.124939
-0.232382	avoid hard-to-find errors, and	-0.124939
-0.232382	clauses: initialization, condition, and	-0.124939
-0.232382	and one local, and	-0.124939
-0.232382	difference between commas and	-0.124939
-0.232382	256 bits (YMM), and	-0.124939
-0.232382	sections are dominating and	-0.124939
-0.232382	such as spell-checking and	-0.124939
-0.232382	flexible, well tested, and	-0.124939
-0.232382	low priority thread, and	-0.124939
-0.232382	&&, ||, ! and	-0.124939
-0.232382	Manual", Volume 2A and	-0.124939
-0.232382	Time for transposing and	-0.124939
-0.232382	arrays are aligned, and	-0.124939
-0.232382	be both cheaper and	-0.124939
-0.232382	compatibility, second source, and	-0.124939
-0.232382	by defining _mm_malloc and	-0.124939
-0.232382	instruction latencies, throughputs and	-0.124939
-0.232382	programming, modularity, reusability and	-0.124939
-0.232382	speed, memory economy and	-0.124939
-0.232382	10.1.020. Functions _intel_fast_memcpy and	-0.124939
-0.232382	such as GetPrivateProfileString and	-0.124939
-0.232382	will be non-zero, and	-0.124939
-0.232382	to test, maintain and	-0.124939
-0.232382	a separate module, and	-0.124939
-0.232382	integers is ambiguous and	-0.124939
-0.232382	not always sequential, and	-0.124939
-0.232382	AQtime, Intel VTune and	-0.124939
-0.232382	at address esp+8 and	-0.124939
-0.232382	values by hand and	-0.124939
-0.232382	its own caller, and	-0.124939
-0.232382	0x2F00, 0x3700, 0x3F00 and	-0.124939
-0.232382	T+1 to T+6, and	-0.124939
-0.232382	the G values, and	-0.124939
-0.232382	in the Professional and	-0.124939
-0.232382	calling conventions. FreeBSD and	-0.124939
-0.232382	want to optimize, and	-0.124939
-0.232382	named MKL, VML and	-0.124939
-0.232382	between CPU brands, and	-0.124939
-0.232382	Interrupt service routines and	-0.124939
-0.232382	Covers PC's, workstations and	-0.124939
-0.232382	by fetching, decoding and	-0.124939
-0.232382	to 3-dimensional geometry and	-0.124939
-0.232382	program is terminated and	-0.124939
-0.232382	pointer, common subexpressions, and	-0.124939
-0.232382	such as flush and	-0.124939
-0.232382	JavaScript, PHP, ASP and	-0.124939
-0.232382	are usability issues, and	-0.124939
-0.232382	as semaphores, mutexes and	-0.124939
-0.232382	are often fluctuating and	-0.124939
-0.232382	CPU dispatch mechanisms, and	-0.124939
-0.232382	register containing (2,2,2,2), and	-0.124939
-0.232382	the value infinity, and	-0.124939
-0.232382	in computer games and	-0.124939
-0.232382	such as email and	-0.124939
-0.232382	to be platform-independent and	-0.124939
-0.232382	around this limitation and	-0.124939
-0.232382	to virus attacks and	-0.124939
-0.232382	have a temp1 and	-0.124939
-0.232382	(Standard Template Library) and	-0.124939
-0.232382	matrix 512 520 and	-0.124939
-0.232382	Template Library (ATL) and	-0.124939
-0.232382	kernel version 2.6.30 and	-0.124939
-0.232382	faster than 15.1b, and	-0.124939
-0.232382	distinguish between recoverable and	-0.124939
-0.232382	software be reinstalled and	-0.124939
-0.232382	cost of synchronizing and	-0.124939
-0.232382	optimization. See www.agner.org/optimize and	-0.124939
-0.232382	highly system dependent and	-0.124939
-0.232382	the clock period and	-0.124939
-0.232382	2B, and 3A and	-0.124939
-0.232382	access. Available protocols and	-0.124939
-0.232382	Supports vector intrinsics and	-0.124939
-0.232382	in the GOT, and	-0.124939
-0.232382	with heavy traffic and	-0.124939
-0.232382	is more manageable and	-0.124939
-0.232382	between each call, and	-0.124939
-0.232382	will generate -128, and	-0.124939
-0.232382	the library libmmt.lib and	-0.124939
-0.232382	(e.g. with _finite()) and	-0.124939
-0.232382	bitwise operators (& and	-0.124939
-0.232382	the copying process, and	-0.124939
-0.232382	has no side-effects and	-0.124939
-0.232382	by consistent modularity and	-0.124939
-0.232382	be different sizes, and	-0.124939
-0.232382	leaving their workplace and	-0.124939
-0.232382	compilers. See www.openmp.org and	-0.124939
-0.232382	devices are CPLDs and	-0.124939
-0.232382	memory allocation (new and	-0.124939
-0.232382	into smaller squares and	-0.124939
-0.232382	(int)(&list[0]) + 100*16, and	-0.124939
-0.232382	array bounds violations and	-0.124939
-0.232382	for the pros and	-0.124939
-0.232382	show how tortuous and	-0.124939
-0.232382	use try, catch, and	-0.124939
-0.232382	Third Edition, 2005; and	-0.124939
-0.232382	pre-increment operator ++i and	-0.124939
-0.232382	thousand times lower; and	-0.124939
-0.232382	PC's and mainframes, and	-0.124939
-0.232382	First-In-Last-Out access, sort and	-0.124939
-0.232382	with this mask, and	-0.124939
-0.232382	as floppy disks and	-0.124939
-0.232382	statement jump tables, and	-0.124939
-0.232382	code becomes bulky and	-0.124939
-0.232382	constant vector (1,2,3,4), and	-0.124939
-0.232382	avoiding pointer arithmetics and	-0.124939
-0.232382	division with truncation, and	-0.124939
-0.232382	AND operator (&) and	-0.124939
-0.232382	Boolean operators (&& and	-0.124939
-0.232382	of an error; and	-0.124939
-1.038427	the loop is in	-0.124939
-0.572377	A pointer is in	-0.124939
-0.537982	whose address is in	-0.124939
-0.461444	to matrix a in	-0.124939
-0.561223	64-bit systems and in	-0.124939
-0.354180	Linux platforms, and in	-0.124939
-0.457604	cache level, and in	-0.124939
-0.354180	high price, and in	-0.124939
-0.354180	than 15.1b, and in	-0.124939
-1.487372	likely to be in	-0.124939
-1.014761	guaranteed to be in	-0.124939
-0.499462	would still be in	-0.124939
-0.495940	stack and are in	-0.124939
-1.359752	If you are in	-0.124939
-0.517737	C++ program are in	-0.124939
-0.569260	14.7b, we are in	-0.124939
-0.798906	because they are in	-0.124939
-0.798906	but they are in	-0.124939
-0.844429	PathScale compilers can in	-0.124939
-0.571227	a function or in	-0.124939
-0.495527	system code or in	-0.124939
-0.352127	compiler manual or in	-0.124939
-0.352127	carry flag or in	-0.124939
-0.796868	to make it in	-0.124939
-0.354008	and store it in	-0.124939
-0.457386	or write it in	-0.124939
-0.354008	should disable it in	-0.124939
-0.636596	of the function in	-0.124939
-1.016221	in the function in	-0.124939
-0.530014	from the function in	-0.124939
-0.891283	inside the function in	-0.124939
-0.816832	of a function in	-0.124939
-0.492183	to a function in	-0.425969
-0.853983	as a function in	-0.124939
-0.843725	If a function in	-0.124939
-0.468567	call a function in	-0.124939
-0.468567	calls a function in	-0.124939
-0.468567	Whenever a function in	-0.124939
-0.337774	very time-consuming function in	-0.124939
-0.960834	CPU detection function in	-0.124939
-0.618546	the strlen function in	-0.124939
-0.337774	the std::unexpected() function in	-0.124939
-0.458594	are dealing with in	-0.124939
-0.354960	usually dealt with in	-0.124939
-0.910543	of the code in	-0.301030
-0.735886	to the code in	-0.124939
-0.911977	If the code in	-0.124939
-0.510461	replace the code in	-0.124939
-0.510461	change the code in	-0.124939
-0.510461	organize the code in	-0.124939
-0.708861	piece of code in	-0.124939
-0.470928	range of code in	-0.124939
-0.397944	} The code in	-0.124939
-0.560673	time. The code in	-0.124939
-0.397944	calculations. The code in	-0.124939
-0.397944	dispatching. The code in	-0.124939
-0.397944	_mm_cvtsd_si32(_mm_load_sd(&x));} The code in	-0.124939
-0.842415	the same code in	-0.124939
-0.135194	Making critical code in	-0.425969
-0.501698	the above code in	-0.124939
-0.326180	hardware definition code in	-0.124939
-0.422373	the compiler-generated code in	-0.124939
-0.800756	same way as in	-0.124939
-1.173486	as well as in	-0.124939
-0.346903	dispatching explicitly as in	-0.124939
-0.346903	in memory, as in	-0.124939
-0.346903	same principle as in	-0.124939
-0.346903	multiple counters, as in	-0.124939
-0.346903	a union, as in	-0.124939
-1.099844	it is not in	-0.124939
-0.564450	interface is not in	-0.124939
-0.447126	hyperthreading or not in	-0.124939
-0.563873	GB, but not in	-0.124939
-0.447126	in registers, not in	-0.124939
-0.345902	systems (but not in	-0.124939
-0.357539	quite expensive - in	-0.124939
-0.448885	double to int in	-0.124939
-0.576685	uint16_t unsigned int in	-0.124939
-0.886894	unsigned short int in	-0.124939
-0.528098	type short int in	-0.124939
-0.528098	int8_t short int in	-0.124939
-0.347294	32767 int16_t int in	-0.124939
-0.332633	library functions than in	-0.124939
-0.332633	dynamic library than in	-0.124939
-0.561137	C++ faster than in	-0.124939
-0.545265	times less than in	-0.124939
-0.189408	memory rather than in	-0.425969
-1.172452	registers rather than in	-0.124939
-0.332633	separate file than in	-0.124939
-0.026620	64-bit Linux than in	-0.602060
-0.135220	64-bit mode than in	-0.124939
-0.135220	bit mode than in	-0.124939
-0.332633	logic device than in	-0.124939
-1.177866	by the compiler in	-0.124939
-1.516453	the Intel compiler in	-0.124939
-1.416413	the Gnu compiler in	-0.124939
-0.357861	to store x in	-0.124939
-1.161509	the compiler may in	-0.124939
-0.527946	of compiler may in	-0.124939
-0.576844	cycles. It may in	-0.124939
-0.512770	a program may in	-0.124939
-0.349023	int declaration may in	-0.124939
-0.542187	cannot avoid this in	-0.124939
-0.353396	implemented like this in	-0.124939
-1.190820	at a time in	-0.124939
-0.768138	waste of time in	-0.124939
-0.345050	only one time in	-0.124939
-1.014729	a long time in	-0.124939
-0.795508	of its time in	-0.124939
-0.943389	takes longer time in	-0.124939
-0.591978	version to use in	-0.124939
-0.351606	optimizing CPU use in	-0.124939
-0.454341	economize resource use in	-0.124939
-0.356444	loop exits, when in	-0.124939
-0.648112	the main memory in	-0.124939
-0.353011	for reserving memory in	-0.124939
-0.563572	processing the data in	-0.124939
-0.531015	set of data in	-0.124939
-0.662523	pointers to data in	-0.124939
-0.536656	functions and data in	-0.124939
-0.498244	} The data in	-0.124939
-0.330210	the same data in	-0.124939
-0.295093	of all data in	-0.124939
-0.295093	on all data in	-0.124939
-0.330210	of received data in	-0.124939
-0.330210	on arranging data in	-0.124939
-0.573446	run the program in	-0.124939
-0.573446	stop the program in	-0.124939
-0.641629	the entire program in	-0.124939
-0.015196	the result vector in	-0.726999
-0.354953	will look different in	-0.124939
-0.493493	than pointers because in	-0.124939
-0.350663	= *(++p) because in	-0.124939
-0.350663	= array[++i] because in	-0.124939
-1.027668	are the same in	-0.124939
-0.584201	write the same in	-0.124939
-0.494890	order of functions in	-0.124939
-0.435517	the different functions in	-0.124939
-0.336680	information about functions in	-0.124939
-0.474184	96). Virtual functions in	-0.124939
-0.336680	all suitable functions in	-0.124939
-0.336680	and internal functions in	-0.124939
-0.336680	Place non-polymorphic functions in	-0.124939
-0.349314	in registers only in	-0.124939
-0.349314	this option only in	-0.124939
-0.349314	size comes only in	-0.124939
-0.250095	for each other in	-0.124939
-0.240417	near each other in	-0.301030
-0.459497	a decimal point in	-0.124939
-1.035491	of the loop in	-0.124939
-0.776886	unrolling the loop in	-0.124939
-0.534390	vectorize the loop in	-0.124939
-0.551001	that a loop in	-0.124939
-0.650737	} The loop in	-0.124939
-0.457806	1000. The loop in	-0.124939
-0.752011	the while loop in	-0.124939
-0.329130	The c loop in	-0.124939
-0.426058	a message loop in	-0.124939
-0.545620	another function which in	-0.124939
-0.721847	intrinsic functions, but in	-0.124939
-0.341643	32-bit systems, but in	-0.124939
-0.341643	general case, but in	-0.124939
-0.341643	system programming, but in	-0.124939
-0.341643	as required, but in	-0.124939
-0.372230	and is used in	-0.124939
-0.372230	code is used in	-0.124939
-0.523755	which is used in	-0.124939
-0.523755	cache is used in	-0.124939
-0.372230	standard is used in	-0.124939
-0.372230	mode is used in	-0.124939
-0.372230	feature is used in	-0.124939
-0.988397	can be used in	-0.124939
-0.699313	cannot be used in	-0.124939
-0.407042	that are used in	-0.346788
-0.429723	functions are used in	-0.124939
-0.303843	they are used in	-0.124939
-0.303843	processors are used in	-0.124939
-0.278633	or data used in	-0.124939
-0.278633	is only used in	-0.124939
-0.629312	is also used in	-0.124939
-0.363802	The method used in	-0.124939
-0.278633	are now used in	-0.124939
-0.278633	systems Microcontrollers used in	-0.124939
-0.355237	is number one in	-0.124939
-0.753439	from the cache in	-0.124939
-0.520820	uses the cache in	-0.124939
-0.640878	of the integer in	-0.124939
-0.451458	using the integer in	-0.124939
-0.675032	of an integer in	-0.124939
-0.473230	simply an integer in	-0.124939
-0.473230	convert an integer in	-0.124939
-0.448697	Friday is set in	-0.124939
-0.347145	the same set in	-0.124939
-0.589435	highest instruction set in	-0.124939
-0.786965	the derived class in	-0.124939
-0.749003	a parent class in	-0.124939
-0.523866	as an example in	-0.124939
-0.555999	default integer size in	-0.124939
-0.509024	the register size in	-0.124939
-1.056505	cache line size in	-0.124939
-0.549971	as a pointer in	-0.124939
-1.268262	through a pointer in	-0.124939
-0.441891	The this pointer in	-0.124939
-0.752762	implicit 'this' pointer in	-0.124939
-0.899294	a and b in	-0.124939
-0.973375	bit of i in	-0.124939
-0.498910	i to float in	-0.124939
-0.572824	move the object in	-0.124939
-0.631488	the data object in	-0.124939
-0.506199	store each object in	-0.124939
-1.157814	floating point number in	-0.124939
-1.607804	is more efficient in	-0.124939
-0.721942	and more efficient in	-0.124939
-0.721942	code more efficient in	-0.124939
-0.502109	calling more efficient in	-0.124939
-0.951303	are less efficient in	-0.124939
-0.551488	this is possible in	-0.124939
-1.736645	It is possible in	-0.124939
-0.492020	then the version in	-0.124939
-0.641415	the desired version in	-0.124939
-0.801187	of the value in	-0.124939
-0.548126	use the value in	-0.124939
-0.326305	transferred by value in	-0.124939
-0.326305	value maximum value in	-0.124939
-0.326305	four B value in	-0.124939
-0.326305	four R value in	-0.124939
-0.757028	all the objects in	-0.124939
-0.152995	movements of objects in	-0.124939
-0.287817	there are objects in	-0.124939
-0.338932	in shared objects in	-0.124939
-0.338932	64-bit shared objects in	-0.124939
-0.287817	of graphics objects in	-0.124939
-0.078433	code Shared objects in	-0.124939
-0.078433	time. Shared objects in	-0.124939
-0.037448	below. Shared objects in	-0.124939
-0.078433	BSD Shared objects in	-0.124939
-0.078433	references. Shared objects in	-0.124939
-0.652962	of the variable in	-0.124939
-0.459232	away the variable in	-0.124939
-0.649451	to a variable in	-0.124939
-0.408021	from a variable in	-0.124939
-0.408021	access a variable in	-0.124939
-0.447925	pointer. A variable in	-0.124939
-0.290083	some other variable in	-0.124939
-0.056411	a register variable in	-0.124939
-0.531634	a public variable in	-0.124939
-0.181477	a global variable in	-0.124939
-0.355393	or does so in	-0.124939
-0.292818	not on variables in	-0.124939
-0.826884	floating point variables in	-0.124939
-0.381122	often used variables in	-0.124939
-0.292818	that most variables in	-0.124939
-0.257390	point register variables in	-0.425969
-0.056809	single precision variables in	-0.425969
-0.381122	have public variables in	-0.124939
-0.381122	and local variables in	-0.124939
-0.292818	class. Storing variables in	-0.124939
-1.065077	power of 2 in	-0.124939
-0.478920	multiplication by 2 in	-0.124939
-0.879072	calculate the table in	-0.124939
-0.495383	store the table in	-0.124939
-0.825053	from a table in	-0.124939
-0.354767	can improve performance in	-0.124939
-0.431289	with making software in	-0.124939
-0.062459	1. Optimizing software in	-0.124939
-0.333310	make memory-hungry software in	-0.124939
-0.149512	in the order in	-0.602060
-0.265613	usually the order in	-0.124939
-0.265613	reflects the order in	-0.124939
-0.265613	controlling the order in	-0.124939
-0.459106	51). The order in	-0.124939
-0.309868	a non-sequential order in	-0.124939
-0.458328	the if branch in	-0.124939
-0.528100	than one way in	-0.124939
-0.449902	a safe way in	-0.124939
-0.443987	of the elements in	-0.124939
-0.231067	number of elements in	-0.271067
-0.326475	Number of elements in	-0.124939
-0.321354	requests for elements in	-0.124939
-0.243447	alias any elements in	-0.124939
-0.243447	C++ language elements in	-0.124939
-0.401300	first eight elements in	-0.124939
-0.451779	eight consecutive elements in	-0.726999
-0.243447	all subsequent elements in	-0.124939
-0.624918	function calls faster in	-0.124939
-0.341100	run slightly faster in	-0.124939
-0.341100	be executed faster in	-0.124939
-0.355674	be declared const in	-0.124939
-0.182922	that is stored in	-0.124939
-0.310162	it is stored in	-0.124939
-0.081901	variable is stored in	-0.124939
-0.081901	result is stored in	-0.124939
-0.212870	advance and stored in	-0.124939
-0.064238	to be stored in	-0.124939
-0.064238	can be stored in	-0.124939
-0.139655	may be stored in	-0.124939
-0.174731	will be stored in	-0.425969
-0.445704	should be stored in	-0.124939
-0.064238	cannot be stored in	-0.425969
-0.130801	that are stored in	-0.124939
-0.128838	data are stored in	-0.124939
-0.211165	class are stored in	-0.124939
-0.130801	objects are stored in	-0.124939
-0.128838	variables are stored in	-0.124939
-0.130801	elements are stored in	-0.124939
-0.130801	numbers are stored in	-0.124939
-0.130801	constants are stored in	-0.124939
-0.151021	a pointer stored in	-0.124939
-0.093521	The objects stored in	-0.124939
-0.044248	for objects stored in	-0.124939
-0.151021	have been stored in	-0.124939
-0.151021	are typically stored in	-0.124939
-0.151021	is never stored in	-0.124939
-0.151021	are usually stored in	-0.124939
-0.829869	CriticalFunction is called in	-0.124939
-0.338858	are actually called in	-0.124939
-0.620619	is usually called in	-0.124939
-0.347333	the function address in	-0.124939
-0.347333	any other address in	-0.124939
-0.353772	a factor 4 in	-0.124939
-0.457094	divisible by 8 in	-0.124939
-0.684724	code. For example, in	-0.124939
-0.479295	overflow. For example, in	-0.124939
-0.479295	tasks. For example, in	-0.124939
-0.479295	post-increment. For example, in	-0.124939
-0.479295	reporting. For example, in	-0.124939
-0.300741	using each bit in	-0.124939
-0.300741	next each bit in	-0.124939
-1.403758	the sign bit in	-0.124939
-0.521663	a to unsigned in	-0.124939
-0.576717	is the first in	-0.124939
-0.345917	expression, or first in	-0.124939
-0.582995	distribute function libraries in	-0.124939
-0.551921	but in registers in	-0.124939
-0.883215	the vector registers in	-0.124939
-0.472909	fourteen integer registers in	-0.124939
-0.997864	accessed through pointers in	-0.124939
-0.573242	code to test in	-0.124939
-0.827596	This is useful in	-0.124939
-0.289168	can be useful in	-0.124939
-0.568097	is also useful in	-0.124939
-0.283717	is less useful in	-0.124939
-0.354083	exception handling even in	-0.124939
-0.545710	elimination. The method in	-0.124939
-0.344447	function calling method in	-0.124939
-0.726930	put file access in	-0.124939
-0.483921	and network access in	-0.124939
-0.352976	element number 16 in	-0.124939
-0.511086	opens a file in	-0.124939
-0.333007	old data file in	-0.124939
-0.333007	the entire file in	-0.124939
-0.295822	number of bits in	-0.124939
-0.541691	the same bits in	-0.124939
-0.445889	set multiple bits in	-0.124939
-0.656950	and 64 bits in	-0.124939
-0.414687	is 32 bits in	-0.124939
-0.414687	use 32 bits in	-0.124939
-0.295822	writing small bits in	-0.124939
-0.443587	kinds of operations in	-0.124939
-0.343096	are primitive operations in	-0.124939
-0.353829	by element 0 in	-0.124939
-0.353393	than to type in	-0.124939
-0.568922	often the case in	-0.124939
-0.355578	it is short in	-0.124939
-0.352173	is quite simple in	-0.124939
-0.352728	of pending instructions in	-0.124939
-0.331800	and is available in	-0.124939
-0.331800	function is available in	-0.124939
-0.467500	which is available in	-0.124939
-0.255608	to be available in	-0.124939
-0.353454	libraries are available in	-0.124939
-0.263261	registers are available in	-0.124939
-0.473175	is also available in	-0.124939
-0.335945	point registers available in	-0.124939
-0.473175	logical processors available in	-0.124939
-0.255608	will become available in	-0.124939
-0.731351	to look up in	-0.124939
-0.667540	to clean up in	-0.124939
-0.430321	are cleaned up in	-0.124939
-0.352704	a minor error in	-0.124939
-0.320790	or multiple times in	-0.124939
-0.519782	used many times in	-0.124939
-0.131426	package several times in	-0.124939
-0.131426	alternatingly several times in	-0.124939
-0.774330	on the stack in	-0.425969
-0.330571	the call stack in	-0.124939
-0.352220	can read about in	-0.124939
-0.566885	object is accessed in	-0.124939
-0.451811	data are accessed in	-0.124939
-0.579490	objects are accessed in	-0.124939
-0.156455	elements are accessed in	-0.124939
-0.283994	addresses are accessed in	-0.124939
-0.403399	rows are accessed in	-0.124939
-0.249679	memory or accessed in	-0.124939
-0.050274	Are objects accessed in	-0.425969
-0.348481	treats non-Intel CPUs in	-0.124939
-0.348481	treat non-Intel CPUs in	-0.124939
-0.352049	been incremented, while in	-0.124939
-0.501060	use of arrays in	-0.124939
-0.440894	as static arrays in	-0.124939
-0.532905	intended to work in	-0.124939
-0.440278	does not work in	-0.124939
-0.495904	and 32-bit Windows in	-0.124939
-0.512554	of function calls in	-0.124939
-0.512554	and function calls in	-0.124939
-0.364297	on function calls in	-0.124939
-0.364297	nested function calls in	-0.124939
-0.487140	redo the calculations in	-0.124939
-0.316970	simple integer calculations in	-0.124939
-0.190596	doing multiple calculations in	-0.124939
-0.456697	return the result in	-0.124939
-0.456697	store the result in	-0.124939
-0.456697	stores the result in	-0.124939
-0.307207	stores this result in	-0.124939
-0.307207	; store result in	-0.124939
-0.565904	function is compiled in	-0.124939
-0.537227	is 4 bytes in	-0.124939
-0.521833	and 8 bytes in	-0.124939
-0.526720	of unused bytes in	-0.124939
-0.526720	4 unused bytes in	-0.124939
-0.516917	made very big in	-0.124939
-0.445822	between different threads in	-0.124939
-0.502389	running multiple threads in	-0.124939
-0.141352	run two threads in	-0.124939
-0.523186	can be necessary in	-0.124939
-0.221143	access an element in	-0.124939
-0.044916	to each element in	-0.425969
-0.044916	AND each element in	-0.425969
-0.044916	Compare each element in	-0.425969
-0.351908	each array element in	-0.124939
-0.221143	a new element in	-0.124939
-0.221143	for every element in	-0.124939
-0.294799	list. Each element in	-0.124939
-0.156359	numerically largest element in	-0.124939
-0.992721	hardware definition language in	-0.124939
-0.440342	called function. But in	-0.124939
-0.340519	and b. But in	-0.124939
-0.534058	the execution speed in	-0.124939
-0.511312	to the thread in	-0.124939
-1.097371	a separate thread in	-0.124939
-0.644559	trigonometric functions, etc. in	-0.124939
-0.399967	catch an exception in	-0.124939
-0.399967	raising an exception in	-0.124939
-0.520452	that are allocated in	-0.124939
-0.351577	be kept small in	-0.124939
-0.455557	doesn't cause overflow in	-0.124939
-0.442204	or 64-bit integers in	-0.124939
-0.442204	use 32-bit integers in	-0.124939
-0.919944	and unsigned integers in	-0.124939
-0.406171	of signed integers in	-0.124939
-0.338751	exception handling option in	-0.124939
-0.338751	loop unroll option in	-0.124939
-0.550034	implementing a matrix in	-0.124939
-0.619177	512 512 matrix in	-0.124939
-0.352256	identical to Linux in	-0.124939
-0.398650	that container classes in	-0.124939
-0.398650	make container classes in	-0.124939
-0.323990	multiple parent classes in	-0.124939
-0.760988	it is done in	-0.124939
-0.770242	preferably be done in	-0.124939
-0.312696	is all done in	-0.124939
-0.312696	is usually done in	-0.124939
-0.593108	the same precision in	-0.124939
-0.686383	and double precision in	-0.124939
-0.480328	Using double precision in	-0.124939
-0.572396	new cache line in	-0.124939
-0.344796	optimization. This works in	-0.124939
-0.238773	addresses. This works in	-0.124939
-0.189200	factors are explained in	-0.124939
-0.189200	mangling are explained in	-0.124939
-0.358661	space, as explained in	-0.124939
-0.358661	classes, as explained in	-0.124939
-0.358661	branches, as explained in	-0.124939
-0.358661	use, as explained in	-0.124939
-0.358661	ways, as explained in	-0.124939
-0.358661	linking, as explained in	-0.124939
-0.248708	is further explained in	-0.124939
-0.532034	b) is calculated in	-0.124939
-1.393066	can be calculated in	-0.124939
-0.323316	it has calculated in	-0.124939
-0.511202	redo the calculation in	-0.124939
-0.511202	Re-do the calculation in	-0.124939
-0.477000	for address calculation in	-0.124939
-0.687334	This is advantageous in	-0.124939
-0.480919	table is advantageous in	-0.124939
-0.250233	class is implemented in	-0.124939
-0.107338	version is implemented in	-0.425969
-0.275567	can be implemented in	-0.124939
-0.246861	day be implemented in	-0.124939
-0.379957	etc. are implemented in	-0.124939
-0.222382	for programs implemented in	-0.124939
-0.352135	a usability problem in	-0.124939
-0.352943	implicit pointer known in	-0.124939
-0.358217	very efficient solution in	-0.124939
-0.358217	An efficient solution in	-0.124939
-0.322416	a viable solution in	-0.124939
-0.354732	only an advantage in	-0.124939
-0.354732	gives an advantage in	-0.124939
-0.309017	no such advantage in	-0.124939
-0.309017	any speed advantage in	-0.124939
-0.351974	and profiling support in	-0.124939
-0.974650	set is supported in	-0.124939
-0.539064	AVX is supported in	-0.124939
-0.382973	AVX2 is supported in	-0.124939
-0.352600	variables is eight in	-0.124939
-0.351768	elements in list in	-0.124939
-0.586286	bits is likely in	-0.124939
-0.515186	making the structure in	-0.124939
-0.335723	clear program structure in	-0.124939
-0.212611	threads that run in	-0.124939
-0.212611	services that run in	-0.124939
-0.428905	family can run in	-0.124939
-0.289799	process should run in	-0.124939
-0.289799	of each run in	-0.124939
-0.351495	implemented in hardware in	-0.124939
-0.416102	store the values in	-0.124939
-0.416102	insert the values in	-0.124939
-0.317573	four G values in	-0.124939
-0.349801	application program. All in	-0.124939
-0.351437	have worked well in	-0.124939
-0.413125	store the information in	-0.124939
-0.318753	contains debug information in	-0.124939
-0.318753	store application-specific information in	-0.124939
-0.581788	2 clock cycles in	-0.124939
-0.284252	pointers and addresses in	-0.124939
-0.284252	code. All addresses in	-0.124939
-0.209506	generate relative addresses in	-0.124939
-0.209506	self- relative addresses in	-0.124939
-0.284252	to round addresses in	-0.124939
-0.608987	performance monitor counter in	-0.124939
-0.940081	time stamp counter in	-0.124939
-0.332606	may be fast in	-0.124939
-0.332606	operations are fast in	-0.124939
-1.456370	dynamic memory allocation in	-0.124939
-0.351502	other thread. However, in	-0.124939
-0.515553	This is optimal in	-0.124939
-0.684907	may be optimal in	-0.124939
-0.295236	occupies a space in	-0.124939
-0.418260	up more space in	-0.124939
-0.295236	too much space in	-0.124939
-0.295236	takes little space in	-0.124939
-0.588612	differ a lot in	-0.124939
-0.285799	explicit CPU dispatching in	-0.124939
-0.119802	13.6 CPU dispatching in	-0.425969
-0.119802	13.7 CPU dispatching in	-0.425969
-0.285799	13.2. CPU dispatching in	-0.124939
-0.492732	implement a microprocessor in	-0.124939
-0.349699	calls and branches in	-0.124939
-0.349491	priority level, typically in	-0.124939
-0.351786	few files, preferably in	-0.124939
-0.532703	the code automatically in	-0.124939
-0.290696	this optimization automatically in	-0.124939
-0.290696	vector operations automatically in	-0.124939
-0.290696	nontemporal writes automatically in	-0.124939
-0.350642	that you see in	-0.124939
-0.101657	the hardware implementation in	-0.425969
-0.820420	is more complicated in	-0.124939
-0.483934	as error handling in	-0.124939
-0.539672	to exception handling in	-0.124939
-0.350209	often used members in	-0.124939
-0.451562	Using the methods in	-0.124939
-0.513008	modifying the name in	-0.124939
-0.348781	seconds remains zero in	-0.124939
-0.492503	for integer division in	-0.124939
-0.117706	is no cost in	-0.425969
-0.308104	without any cost in	-0.124939
-0.505682	code is running in	-0.124939
-0.178981	that are running in	-0.124939
-0.178981	repagination are running in	-0.124939
-0.429826	disappears when running in	-0.124939
-0.231275	operating system running in	-0.124939
-0.231275	Two threads running in	-0.124939
-0.231275	higher-priority thread running in	-0.124939
-0.326202	// make dispatcher in	-0.124939
-0.771486	the CPU dispatcher in	-0.124939
-0.576576	puts the programmer in	-0.124939
-0.542959	also a lookup in	-0.124939
-1.010238	in the end in	-0.124939
-0.430864	See the examples in	-0.124939
-0.430864	documented. The examples in	-0.124939
-0.395723	The code examples in	-0.124939
-0.227595	be a difference in	-0.124939
-0.075567	is no difference in	-0.249877
-0.139216	simply no difference in	-0.124939
-0.227595	no big difference in	-0.124939
-0.331060	CPU dispatch mechanism in	-0.425969
-0.304036	CPU detection mechanism in	-0.124939
-0.427979	map is needed in	-0.124939
-0.396752	is not needed in	-0.425969
-0.263835	longer than needed in	-0.124939
-0.263835	of memory needed in	-0.124939
-0.326666	often true last in	-0.124939
-0.326666	objects come last in	-0.124939
-0.005969	to be transferred in	-0.249877
-0.024386	would be transferred in	-0.124939
-0.055694	parameters are transferred in	-0.669007
-0.350414	one byte longer in	-0.124939
-0.390061	Comparison of optimizations in	-0.124939
-0.300099	CPU- specific optimizations in	-0.124939
-0.300099	can improve optimizations in	-0.124939
-0.636990	such a framework in	-0.124939
-0.350812	systems. A look in	-0.124939
-0.323958	All newer microprocessors in	-0.124939
-0.488831	operators Modern microprocessors in	-0.124939
-0.216672	that the numbers in	-0.124939
-0.216672	hold the numbers in	-0.124939
-0.297103	generating denormal numbers in	-0.124939
-0.348720	to use later in	-0.124939
-0.330914	many objects together in	-0.124939
-0.293494	are stored together in	-0.124939
-0.293494	always stored together in	-0.124939
-0.330914	then linked together in	-0.124939
-0.251423	be joined together in	-0.124939
-0.623113	preferably be declared in	-0.124939
-0.574363	they are declared in	-0.124939
-0.566850	and objects declared in	-0.124939
-0.274359	any objects declared in	-0.124939
-0.350969	piece by piece in	-0.124939
-0.491280	microprocessor doesn't know in	-0.124939
-0.450757	p and r in	-0.124939
-0.246669	do. This results in	-0.124939
-0.246669	the four results in	-0.124939
-0.370543	store intermediate results in	-0.124939
-0.246669	the thousand results in	-0.124939
-0.246669	My experimental results in	-0.124939
-0.347890	polymorphic function goes in	-0.124939
-0.347595	disable power-save options in	-0.124939
-0.347890	and Func2 were in	-0.124939
-0.347280	in all operands in	-0.124939
-0.349459	few unused points in	-0.124939
-0.515494	insert a switch in	-0.124939
-0.488798	switches is smaller in	-0.124939
-0.348212	is provoked here in	-0.124939
-0.180591	and scattered around in	-0.124939
-0.080976	are scattered around in	-0.425969
-0.180591	functions scattered around in	-0.124939
-0.235684	scattered randomly around in	-0.124939
-0.190613	to do things in	-0.124939
-0.534871	and algebraic reductions in	-0.124939
-0.927017	should be tested in	-0.124939
-0.409076	not been tested in	-0.124939
-0.065244	to cause contentions in	-0.124939
-0.065244	and cause contentions in	-0.124939
-0.065244	can cause contentions in	-0.124939
-0.081310	9.10 Cache contentions in	-0.425969
-0.388813	pointers and references in	-0.124939
-0.341636	mostly relative references in	-0.124939
-0.260333	use absolute references in	-0.124939
-0.260333	of self-relative references in	-0.124939
-0.512595	no extra overhead in	-0.124939
-0.347585	of a change in	-0.124939
-0.346182	floating point-to-integer conversions in	-0.124939
-0.406276	The if statement in	-0.124939
-0.313234	only one statement in	-0.124939
-0.099587	cause of errors in	-0.124939
-0.099587	source of errors in	-0.124939
-0.229024	preventing program errors in	-0.124939
-0.177650	for such errors in	-0.124939
-0.268187	prevent such errors in	-0.124939
-0.086026	number of columns in	-0.602060
-0.042756	rows and columns in	-0.425969
-0.204146	multiplication by columns in	-0.124939
-0.447921	and other languages in	-0.124939
-0.402767	space. The syntax in	-0.124939
-0.369747	the C++ syntax in	-0.124939
-0.283514	inline assembly syntax in	-0.124939
-1.449686	can be avoided in	-0.124939
-0.447824	but quite inefficient in	-0.124939
-0.275365	method is described in	-0.124939
-0.125409	algorithms are described in	-0.124939
-0.064404	code, as described in	-0.124939
-0.020425	CPUs, as described in	-0.301030
-0.125409	the cases described in	-0.124939
-0.125409	the syntax described in	-0.124939
-0.125409	are further described in	-0.124939
-0.248138	different cache lines in	-0.124939
-0.203075	four cache lines in	-0.124939
-0.221041	the 4 lines in	-0.124939
-0.221041	to 16 lines in	-0.124939
-0.344969	Each graphics operation in	-0.124939
-0.347944	then the instance in	-0.124939
-0.285479	function is given in	-0.124939
-0.285479	can be given in	-0.124939
-0.210960	classes are given in	-0.124939
-0.210960	details are given in	-0.124939
-0.213264	access, as given in	-0.124939
-0.344379	of the task in	-0.124939
-0.344775	overloaded or limited in	-0.124939
-0.499784	against the costs in	-0.124939
-0.309625	STL also costs in	-0.124939
-0.446708	instance of S1 in	-0.124939
-0.308528	of register temp in	-0.124939
-0.308528	will save temp in	-0.124939
-0.307070	the system database in	-0.124939
-0.307070	big registration database in	-0.124939
-0.345173	All identical constants in	-0.124939
-0.346366	instead of bool in	-0.124939
-0.344775	for this shift in	-0.124939
-0.449225	15.1b and d in	-0.124939
-0.058901	into the algorithm in	-0.124939
-0.037657	store all strings in	-0.124939
-0.175368	to store strings in	-0.124939
-0.175368	of storing strings in	-0.124939
-0.078892	of text strings in	-0.124939
-0.078892	handle text strings in	-0.124939
-0.384081	testing multiple conditions in	-0.124939
-0.269242	from error conditions in	-0.124939
-0.269242	under worst-case conditions in	-0.124939
-1.070962	to the right in	-0.124939
-0.483257	to a macro in	-0.124939
-0.443273	i with 100 in	-0.124939
-0.445424	between different tasks in	-0.124939
-0.343272	specifying parallel processing in	-0.124939
-0.301073	wheel. The containers in	-0.124939
-0.301073	these example containers in	-0.124939
-0.343368	widely different priority in	-0.124939
-0.512195	possibly be obtained in	-0.124939
-0.300237	set of counters in	-0.124939
-0.899925	performance monitor counters in	-0.124939
-0.343368	is overwritten, possibly in	-0.124939
-0.527894	The function names in	-0.124939
-0.343830	non- standardized details in	-0.124939
-0.484632	make the rows in	-0.124939
-0.301912	distance between rows in	-0.124939
-0.446849	example may fail in	-0.124939
-0.341985	Some applications (e.g. in	-0.124939
-0.445679	works by compiling in	-0.124939
-0.402524	memcpy, at least in	-0.124939
-0.402524	do, at least in	-0.124939
-0.540108	multiple data structures in	-0.124939
-0.272857	overflow can occur in	-0.124939
-0.248136	Overflow may occur in	-0.124939
-0.181437	break will occur in	-0.124939
-0.038756	when contentions occur in	-0.425969
-0.217296	is inefficient, especially in	-0.124939
-0.217296	of precision, especially in	-0.124939
-0.217296	scarce resource, especially in	-0.124939
-0.217296	than relocation, especially in	-0.124939
-1.398417	can be improved in	-0.124939
-0.285964	methods are discussed in	-0.124939
-0.285964	time-consumers are discussed in	-0.124939
-0.343481	ReadTSC listed below in	-0.124939
-0.414961	an error message in	-0.124939
-0.057359	store forwarding delay in	-0.425969
-0.342475	a scarce resource in	-0.124939
-0.788281	number of cores in	-0.124939
-0.296169	be saved either in	-0.124939
-0.296169	memory blocks, either in	-0.124939
-0.068163	function is defined in	-0.425969
-0.181784	have been defined in	-0.124939
-0.038818	Vector classes defined in	-0.124939
-0.340970	programs but rarely in	-0.124939
-0.255561	a register except in	-0.124939
-0.255561	and underflow except in	-0.124939
-0.255561	the representation, except in	-0.124939
-0.340304	"Macro loops" chapter in	-0.124939
-0.343062	r places back in	-0.124939
-0.440071	containers class templates in	-0.124939
-0.479923	The loop unrolling in	-0.124939
-0.291830	versions of CriticalFunction in	-0.124939
-0.291830	call to CriticalFunction in	-0.124939
-0.509212	of the sequence in	-0.124939
-0.291338	Don't put something in	-0.124939
-0.291338	code. Storing something in	-0.124939
-0.685732	would be invalid in	-0.124939
-0.506784	of user input in	-0.124939
-0.442848	that is organized in	-0.124939
-0.342509	by transferring 'this' in	-0.124939
-0.506904	lot to gain in	-0.124939
-0.232750	priority. The gain in	-0.124939
-0.168209	substantial. This gain in	-0.124939
-0.232750	much you gain in	-0.124939
-0.168209	relatively small gain in	-0.124939
-0.502506	same can happen in	-0.124939
-0.339754	be considered metaprogramming in	-0.124939
-0.340854	we can define in	-0.124939
-0.789011	difficult to implement in	-0.124939
-0.342552	four consecutive terms in	-0.124939
-0.339491	into multiple blocks in	-0.124939
-0.283893	be put away in	-0.124939
-0.283893	to go away in	-0.124939
-0.339491	load is low in	-0.124939
-0.388922	sets is provided in	-0.124939
-0.181515	Examples are provided in	-0.124939
-0.181515	parsing are provided in	-0.124939
-0.478046	registers by default in	-0.124939
-0.530788	Long dependency chains in	-0.124939
-0.339491	for general purposes in	-0.124939
-0.235571	vector operations mentioned in	-0.124939
-0.235571	the time-consumers mentioned in	-0.124939
-0.235571	the ones mentioned in	-0.124939
-0.055581	157 17 Optimization in	-0.425969
-0.340713	clean up everything in	-0.124939
-0.338273	instead of (or in	-0.124939
-0.117107	code is included in	-0.124939
-0.117107	time is included in	-0.124939
-0.131226	functions are included in	-0.124939
-0.131226	is not included in	-0.124939
-0.131226	License license included in	-0.124939
-0.336428	doing two iterations in	-0.124939
-0.544305	prediction into account in	-0.124939
-0.522664	the dependency chain in	-0.124939
-0.275354	flow and algorithms in	-0.124939
-0.359815	several different algorithms in	-0.124939
-0.276539	four float additions in	-0.124939
-0.276539	do four additions in	-0.124939
-0.338477	many unknown factors in	-0.124939
-0.052555	functions are listed in	-0.124939
-0.052555	results are listed in	-0.124939
-0.052555	suffixes are listed in	-0.124939
-0.052555	latencies are listed in	-0.124939
-0.092154	set, as listed in	-0.124939
-0.092154	the instructions listed in	-0.124939
-0.337110	algebraic reductions explicitly in	-0.124939
-0.474774	time is interpreted in	-0.124939
-0.505528	cannot be determined in	-0.124939
-0.054319	size causes misses in	-0.425969
-0.338477	registers named YMM in	-0.124939
-0.336398	the specific purpose in	-0.124939
-0.249161	object without -fpic in	-0.124939
-0.249161	compiling without -fpic in	-0.124939
-0.334068	vector registers had in	-0.124939
-0.334844	in table 19 in	-0.124939
-0.348404	'$' are allowed in	-0.124939
-0.490356	is not allowed in	-0.124939
-0.335620	is calling itself in	-0.124939
-0.615925	rules of algebra in	-0.124939
-0.334844	available for free in	-0.124939
-0.334068	can be expensive in	-0.124939
-0.334068	// Catch exceptions in	-0.124939
-0.303263	should be saved in	-0.124939
-0.205906	calls are saved in	-0.124939
-0.205906	that was saved in	-0.124939
-0.334844	independent of changes in	-0.124939
-0.470603	that is measured in	-0.124939
-0.334068	a risk factor in	-0.124939
-0.503937	different execution units in	-0.124939
-0.500780	insert the reciprocal in	-0.124939
-0.334545	possible minor increase in	-0.124939
-0.470018	time is spent in	-0.124939
-0.470018	the time spent in	-0.124939
-0.467554	an exception occurs in	-0.124939
-0.252194	an interrupt occurs in	-0.124939
-0.430575	implemented as follows in	-0.124939
-0.331840	the next step in	-0.124939
-0.294202	any hot spots in	-0.124939
-0.294202	identifying hot spots in	-0.124939
-0.330942	at specific places in	-0.124939
-0.429448	|| are evaluated in	-0.124939
-0.039724	allocated and deallocated in	-0.425969
-0.186842	are also deallocated in	-0.124939
-0.331840	multiplication are permissible in	-0.124939
-0.428324	by many users in	-0.124939
-0.429448	is approximately six in	-0.124939
-0.599457	systems and fourteen in	-0.124939
-0.326602	certain programming principles in	-0.124939
-0.424232	do the reduction in	-0.124939
-0.235379	language is portable in	-0.124939
-0.235379	is fully portable in	-0.124939
-0.326602	than linked lists in	-0.124939
-0.523349	how to recover in	-0.124939
-0.679158	of the advice in	-0.124939
-0.475818	value is already in	-0.124939
-0.327669	are available, i.e. in	-0.124939
-0.480455	the software package in	-0.124939
-0.158581	cost is seen in	-0.124939
-0.158581	should be seen in	-0.124939
-0.158581	is not seen in	-0.124939
-0.327669	two modules contiguous in	-0.124939
-0.326602	or class separately in	-0.124939
-0.326602	very old-fashioned. Development in	-0.124939
-0.422900	index or key in	-0.124939
-0.034552	which they appear in	-0.425969
-0.158581	the modules appear in	-0.124939
-0.236244	two loops (except in	-0.124939
-0.236244	point capabilities (except in	-0.124939
-0.326602	position-independent code flag in	-0.124939
-0.234516	should be written in	-0.124939
-0.234516	with programs written in	-0.124939
-0.327669	were not present in	-0.124939
-0.414888	from one place in	-0.124939
-0.043637	systems and sixteen in	-0.425969
-0.320172	is always enabled in	-0.124939
-0.587969	code is serial in	-0.124939
-0.320172	to require modifications in	-0.124939
-0.092154	that are missing in	-0.124939
-0.092154	functions are missing in	-0.124939
-0.047703	2. Optimizing subroutines in	-0.124939
-0.005049	2: "Optimizing subroutines in	-0.602060
-0.320172	of different lengths in	-0.124939
-0.209279	information is contained in	-0.124939
-0.209279	be completely contained in	-0.124939
-0.320172	floating point underflow in	-0.124939
-0.587969	can be prevented in	-0.124939
-0.047703	of 2. Contentions in	-0.124939
-0.047703	target buffer. Contentions in	-0.124939
-0.047703	buffer (BTB). Contentions in	-0.124939
-0.047703	my experiments. Contentions in	-0.124939
-0.056308	method is illustrated in	-0.124939
-0.056308	technique is illustrated in	-0.124939
-0.121020	checking, as illustrated in	-0.124939
-0.043637	can be returned in	-0.124939
-0.320172	than there is, in	-0.124939
-0.209279	set a breakpoint in	-0.124939
-0.209279	a fixed breakpoint in	-0.124939
-0.321486	register that appears in	-0.124939
-0.051742	can be found in	-0.124939
-0.121020	features rarely found in	-0.124939
-0.728683	the exception handler in	-0.124939
-0.210297	can be coded in	-0.124939
-0.210297	that are coded in	-0.124939
-0.320172	last index changing in	-0.124939
-0.209279	bit is kept in	-0.124939
-0.209279	functions are kept in	-0.124939
-0.414888	trying the techniques in	-0.124939
-0.209279	necessarily stored sequentially in	-0.124939
-0.209279	be accessed sequentially in	-0.124939
-0.320172	is much simpler in	-0.124939
-0.235091	described in detail in	-0.124939
-0.170226	in more detail in	-0.124939
-0.309663	above security advices in	-0.124939
-0.401856	and an FPGA in	-0.124939
-0.170226	the elements consecutively in	-0.124939
-0.170226	are stored consecutively in	-0.124939
-0.619500	layers of abstraction in	-0.124939
-0.309663	Variables whose distance in	-0.124939
-0.309663	by default anyway in	-0.124939
-0.309663	not need updating in	-0.124939
-0.309663	own profiling instruments in	-0.124939
-0.309663	is unnecessarily wasteful in	-0.124939
-0.170226	The difference lies in	-0.124939
-0.170226	this efficiency lies in	-0.124939
-0.309663	unpredictable errors elsewhere in	-0.124939
-0.309663	better than RISC in	-0.124939
-0.170226	functions are supplied in	-0.124939
-0.170226	I have supplied in	-0.124939
-0.309663	several standard PC's in	-0.124939
-0.309663	and cause delays in	-0.124939
-0.309663	function uses logarithms in	-0.124939
-0.309663	rely on longjmp in	-0.124939
-0.309663	operating system kernel in	-0.124939
-0.309663	lot of bookkeeping in	-0.124939
-0.309663	the right formula in	-0.124939
-0.309663	the {} brackets in	-0.124939
-0.309663	should be handled in	-0.124939
-0.702255	linkage table (PLT) in	-0.124939
-0.309663	use as pivot in	-0.124939
-0.141871	can be placed in	-0.124939
-0.141871	then be placed in	-0.124939
-0.376909	has to invest in	-0.124939
-0.530403	Sum2 and Sum3 in	-0.124939
-0.047703	platforms as shown in	-0.124939
-0.047703	_mm_empty() as shown in	-0.124939
-0.289377	should also proceed in	-0.124939
-0.376909	may be justified in	-0.124939
-0.376909	all disturbing influences in	-0.124939
-0.289377	can be disabled in	-0.124939
-0.101300	because the relocations in	-0.124939
-0.101300	will generate relocations in	-0.124939
-0.289377	in kernel code" in	-0.124939
-0.289377	declare it locally in	-0.124939
-0.289377	parameter. If MultiplyBy in	-0.124939
-0.289377	find the answers in	-0.124939
-0.289377	function is inserted in	-0.124939
-0.289377	Poor reproducibility. Delays in	-0.124939
-0.289377	is not visible in	-0.124939
-0.376909	results are summarized in	-0.124939
-0.289377	to do experiments in	-0.124939
-0.289377	are scattered everywhere in	-0.124939
-0.289377	hints as pragmas in	-0.124939
-0.289377	that runs alone in	-0.124939
-0.289377	the biggest time-consumer in	-0.124939
-0.023197	65 8 Optimizations in	-0.425969
-0.289377	opportunities for parallelization in	-0.124939
-0.289377	processors are covered in	-0.124939
-0.289377	a slight degradation in	-0.124939
-0.530403	variables are overdetermined in	-0.124939
-0.289377	card or integrated in	-0.124939
-0.376909	for source annotation in	-0.124939
-0.376909	software develop- ment in	-0.124939
-0.289377	invest more efforts in	-0.124939
-0.289377	Remove right-most 1-bit in	-0.124939
-0.289377	chapter "Register usage in	-0.124939
-0.289377	was assigned previously in	-0.124939
-0.289377	not necessarily stay in	-0.124939
-0.289377	Wednesday | Friday) in	-0.124939
-0.289377	in question: Put in	-0.124939
-0.233654	systems will dominate in	-0.124939
-0.233654	have a niche in	-0.124939
-0.233654	(In Windows, SetThreadAffinityMask, in	-0.124939
-0.233654	make 32 AND-operations in	-0.124939
-0.233654	because a typo in	-0.124939
-0.233654	is not recognized in	-0.124939
-0.233654	conventions. The dot in	-0.124939
-0.233654	1 : 0] in	-0.124939
-0.233654	the Gnu utilities in	-0.124939
-0.233654	has been alleviated in	-0.124939
-0.233654	the right positions in	-0.124939
-0.233654	function libraries. Numbers in	-0.124939
-0.233654	cores will grow in	-0.124939
-0.233654	supposedly is system-independent, in	-0.124939
-0.233654	the two formulas in	-0.124939
-0.233654	can be arranged in	-0.124939
-0.233654	and other flaws in	-0.124939
-0.233654	functions (e.g. GetLogicalProcessorInformation in	-0.124939
-0.233654	efficient than investing in	-0.124939
-0.233654	degree of randomness in	-0.124939
-0.233654	memory is mirrored in	-0.124939
-0.233654	is often reorganized in	-0.124939
-0.233654	is then de-referenced in	-0.124939
-0.233654	vector classes Programming in	-0.124939
-0.233654	and a server in	-0.124939
-0.233654	dependency chain. Nothing in	-0.124939
-0.233654	or completely absent in	-0.124939
-0.233654	fetched and decoded in	-0.124939
-0.233654	can be programmed in	-0.124939
-0.233654	not stored contiguously in	-0.124939
-0.233654	numbers. You may, in	-0.124939
-0.233654	We must bear in	-0.124939
-0.233654	equally efficient because, in	-0.124939
-0.233654	under test finishes in	-0.124939
-0.233654	have inserted UnusedFiller in	-0.124939
-0.233654	the planning phase in	-0.124939
-0.233654	rows are indexed in	-0.124939
-0.233654	} The FactorialTable in	-0.124939
-0.233654	to general improvements in	-0.124939
-0.233654	has been introduced in	-0.124939
-0.233654	exception occurs somewhere in	-0.124939
-0.233654	cause slight imprecision in	-0.124939
-0.233654	user interface (OnIdle in	-0.124939
-0.233654	offset table (GOT) in	-0.124939
-0.233654	a thread-like scheduling in	-0.124939
-0.233654	clumsy AND-OR construction in	-0.124939
-0.233654	default, so 1.2 in	-0.124939
-0.233654	code (release version) in	-0.124939
-0.233654	of the iterator in	-0.124939
-0.233654	language gained remarkably in	-0.124939
-0.233654	which are cheap, in	-0.124939
-0.233654	#) are costless in	-0.124939
-0.233654	commas and semicolons in	-0.124939
-0.233654	addresses are obscured in	-0.124939
-0.233654	and integer representations in	-0.124939
-0.233654	different compilers succeeded in	-0.124939
-0.233654	eliminating the if-branch in	-0.124939
-0.233654	} // continue in	-0.124939
-0.233654	call (e.g. GetProcessAffinityMask in	-0.124939
-0.233654	a considerable improvement in	-0.124939
-0.233654	first and foremost, in	-0.124939
-0.233654	lot of CPU-time in	-0.124939
-0.233654	optimizing University courses in	-0.124939
-0.233654	should be scheduled in	-0.124939
-0.233654	; compute i/2 in	-0.124939
-0.233654	Delays or glitches in	-0.124939
-0.233654	calls (e.g. IsProcessorFeaturePresent in	-0.124939
-0.233654	FuncCol(i)) * sizeof(float) in	-0.124939
-0.233654	number of rows/columns in	-0.124939
-0.233654	clock cycles. Calculations in	-0.124939
-0.233654	incremental or iterative in	-0.124939
-0.233654	on branch predictions in	-0.124939
-0.233654	by my comments, in	-0.124939
-0.233654	has occurred anywhere in	-0.124939
-0.233654	to be resized in	-0.124939
-0.233654	can be overridden in	-0.124939
-0.233654	time lag. Thinking in	-0.124939
-0.233654	_intel_fast_memcpy and __intel_new_strlen in	-0.124939
-0.233654	the other volumes in	-0.124939
-1.263973	} }; // The	-0.124939
-0.355602	both loops // The	-0.124939
-0.785942	bit of x The	-0.124939
-0.356794	void xplus2() { The	-0.124939
-0.770151	} } } The	-0.124939
-0.258493	... } } The	-0.124939
-0.544378	1; } } The	-0.124939
-0.485217	2; } } The	-0.124939
-0.139010	a); } } The	-0.124939
-0.485217	a.store(aa+i); } } The	-0.124939
-0.344691	Induction++; } } The	-0.124939
-0.344691	i/2; } } The	-0.124939
-0.114896	return 0; } The	-0.124939
-1.075569	+ 1; } The	-0.124939
-0.718454	= 2; } The	-0.124939
-0.466484	+ 3; } The	-0.124939
-0.670691	+ 1.; } The	-0.124939
-0.543068	= 2.0; } The	-0.124939
-0.543068	+= 1.0f; } The	-0.124939
-0.296603	variable Z } The	-0.124939
-0.296603	return pow(x,10); } The	-0.124939
-0.296603	> abs(v.f) } The	-0.124939
-0.296603	range printf(Greek[n]); } The	-0.124939
-0.296603	= Func(a[i]); } The	-0.124939
-0.296603	i, timediff[i]); } The	-0.124939
-0.296603	return list[x]; } The	-0.124939
-0.355720	optimization. Prefetching data The	-0.124939
-0.784045	14.10 Mathematical functions The	-0.124939
-0.641010	7.26 Overloaded functions The	-0.124939
-0.349397	Use fastcall functions The	-0.124939
-1.262448	different C++ compilers The	-0.124939
-0.505307	all C++ compilers The	-0.124939
-0.355568	Mac platform. Intel The	-0.124939
-0.354682	4; Register variables The	-0.124939
-0.549811	elements in table The	-0.124939
-1.000269	faster if unsigned The	-0.124939
-0.473458	improve the code. The	-0.124939
-0.418091	sequences of code. The	-0.124939
-0.178635	floating point code. The	-0.124939
-0.367807	an intermediate code. The	-0.124939
-0.564651	same source code. The	-0.124939
-0.281923	position- independent code. The	-0.124939
-0.281923	and Fortran code. The	-0.124939
-0.367807	optimizing application-specific code. The	-0.124939
-0.281923	in precompiled code. The	-0.124939
-0.651971	of the time. The	-0.124939
-0.577533	amount of time. The	-0.124939
-0.888415	the same time. The	-0.124939
-0.438273	takes extra time. The	-0.124939
-0.753056	at compile time. The	-0.124939
-0.529039	at load time. The	-0.124939
-0.288594	to save time. The	-0.124939
-0.529039	the user's time. The	-0.124939
-0.693902	and YMM registers The	-0.124939
-0.693902	and ZMM registers The	-0.124939
-0.549982	of the function. The	-0.124939
-0.382148	to the function. The	-0.124939
-0.113562	from the function. The	-0.124939
-0.358935	class member function. The	-0.124939
-0.160079	the critical function. The	-0.124939
-0.230927	the new function. The	-0.124939
-0.230927	the desired function. The	-0.124939
-0.306416	an inlined function. The	-0.124939
-0.230927	error message function. The	-0.124939
-0.230927	a polymorphic function. The	-0.124939
-0.230927	examples: strlen function. The	-0.124939
-0.312636	base access, etc. The	-0.124939
-0.312636	other way, etc. The	-0.124939
-0.312636	resources, databases, etc. The	-0.124939
-0.312636	database connections, etc. The	-0.124939
-0.454975	64 bit Linux The	-0.124939
-0.337663	sign bit }; The	-0.124939
-0.436751	{return b;} }; The	-0.124939
-0.225451	the library functions. The	-0.124939
-0.225451	for library functions. The	-0.124939
-0.232800	the two functions. The	-0.124939
-0.240037	virtual member functions. The	-0.124939
-0.346410	non-static member functions. The	-0.124939
-0.240037	non-polymorphic member functions. The	-0.124939
-0.308645	of virtual functions. The	-0.124939
-0.232800	intrinsic hardware functions. The	-0.124939
-0.232800	the polymorphic functions. The	-0.124939
-0.232800	and trigonometric functions. The	-0.124939
-0.453259	and decrement operators The	-0.124939
-0.447523	of the memory. The	-0.124939
-0.508289	not in memory. The	-0.124939
-0.317095	the code memory. The	-0.124939
-0.408691	of the program. The	-0.124939
-0.262563	by the program. The	-0.124939
-0.262563	crashes the program. The	-0.124939
-0.270035	of a program. The	-0.124939
-0.238619	console mode program. The	-0.124939
-0.544233	the application program. The	-0.124939
-0.333265	framework is used. The	-0.124939
-0.469503	linking is used. The	-0.124939
-0.333265	alloca is used. The	-0.124939
-0.298886	are not used. The	-0.124939
-0.642074	Choice of microprocessor The	-0.124939
-0.641220	Join identical branches The	-0.124939
-0.350158	to date. Mac The	-0.124939
-0.114400	in the cache. The	-0.124939
-0.109225	invalidate the cache. The	-0.124939
-0.039884	the code cache. The	-0.124939
-0.546020	the data cache. The	-0.124939
-0.446907	the level-2 cache. The	-0.124939
-0.237689	the level-1 cache. The	-0.124939
-0.152729	same level-1 cache. The	-0.124939
-0.255499	or micro-op cache. The	-0.124939
-0.446588	and 64-bit systems. The	-0.124939
-0.561808	in 64-bit systems. The	-0.124939
-0.499797	and operating systems. The	-0.124939
-0.235077	for Linux systems. The	-0.124939
-0.235077	on bigger systems. The	-0.124939
-0.235077	to BSD systems. The	-0.124939
-0.235077	and Itanium systems. The	-0.124939
-0.810090	b + c; The	-0.124939
-0.453913	is more efficient. The	-0.124939
-0.321818	calls more efficient. The	-0.124939
-0.263084	caching very efficient. The	-0.124939
-0.378562	it less efficient. The	-0.124939
-0.677948	caching less efficient. The	-0.124939
-0.799181	as explained below. The	-0.124939
-0.344449	as described below. The	-0.124939
-0.262665	are given below. The	-0.124939
-0.262665	the sections below. The	-0.124939
-0.262665	example 14.19 below. The	-0.124939
-0.321648	prefetch the data. The	-0.124939
-0.189814	addressing of data. The	-0.124939
-0.226290	gigabytes of data. The	-0.124939
-0.209403	odd-sized vector data. The	-0.124939
-0.209403	most used data. The	-0.124939
-0.399362	of test data. The	-0.124939
-0.209403	and read-only data. The	-0.124939
-0.641218	Function return types The	-0.124939
-0.114546	available instruction set. The	-0.124939
-0.270568	AVX instruction set. The	-0.124939
-0.270568	given instruction set. The	-0.124939
-0.270568	lower instruction set. The	-0.124939
-0.270568	specified instruction set. The	-0.124939
-0.425988	seven different compilers. The	-0.124939
-0.425988	modern C++ compilers. The	-0.124939
-0.301045	and Clang compilers. The	-0.124939
-0.578674	on Intel processors. The	-0.124939
-0.360807	most newer processors. The	-0.124939
-0.276170	in PC processors. The	-0.124939
-0.276170	on contemporary processors. The	-0.124939
-0.950090	of hardware platform The	-0.124939
-0.908437	be stored together The	-0.124939
-0.691369	function is called. The	-0.124939
-0.348991	CriticalInnerFunction is called. The	-0.124939
-0.235189	objects are called. The	-0.124939
-0.235189	destructors are called. The	-0.124939
-0.354120	are never called. The	-0.124939
-0.318217	brands of CPUs. The	-0.124939
-0.318217	on AMD CPUs. The	-0.124939
-0.240824	with old CPUs. The	-0.124939
-0.240824	all modern CPUs. The	-0.124939
-0.240824	with earlier CPUs. The	-0.124939
-0.448628	of Boolean operands The	-0.124939
-0.538135	of the compiler. The	-0.124939
-0.447398	optimize across modules The	-0.124939
-0.286264	than pointers are: The	-0.124939
-0.524995	memory allocation are: The	-0.124939
-0.286264	function inlining are: The	-0.124939
-0.522991	inside the loop. The	-0.124939
-0.331065	exit the loop. The	-0.124939
-0.235045	the test loop. The	-0.124939
-0.555745	critical innermost loop. The	-0.124939
-0.235045	an infinite loop. The	-0.124939
-0.517285	making a pointer. The	-0.124939
-0.316252	a hidden pointer. The	-0.124939
-0.345986	7.11 Type conversions The	-0.124939
-0.327598	and Windows platforms. The	-0.124939
-0.225243	on Windows platforms. The	-0.124939
-0.635780	3.3 Program installation The	-0.124939
-0.254691	of the cases. The	-0.124939
-0.577356	in most cases. The	-0.124939
-0.254691	in such cases. The	-0.124939
-0.577356	in simple cases. The	-0.124939
-0.573103	0 and 1. The	-0.124939
-0.801722	0 or 1. The	-0.124939
-0.508350	about. Function inlining The	-0.124939
-0.344894	of variable size. The	-0.124939
-0.798028	power of 2. The	-0.124939
-0.248863	will be 2. The	-0.124939
-0.084413	i by 2. The	-0.124939
-0.343968	for these variables. The	-0.124939
-0.343741	of allocated resources. The	-0.124939
-0.451951	structure or class. The	-0.124939
-0.242624	the same class. The	-0.124939
-0.242624	a container class. The	-0.124939
-0.242624	// parent class. The	-0.124939
-0.362930	that it calls. The	-0.124939
-0.191026	of function calls. The	-0.124939
-0.191026	or function calls. The	-0.124939
-0.191026	multiple function calls. The	-0.124939
-0.284723	pure function calls. The	-0.124939
-0.185532	graphical interface calls. The	-0.124939
-0.792989	the optimal algorithm The	-0.124939
-0.303897	and tested it. The	-0.124939
-0.303897	to execute it. The	-0.124939
-0.200095	bigger vector registers. The	-0.124939
-0.200095	special vector registers. The	-0.124939
-0.350441	256-bit YMM registers. The	-0.124939
-0.151933	in 32-bit mode. The	-0.124939
-0.310662	64 bit mode. The	-0.124939
-0.345715	is 12 bytes. The	-0.124939
-0.561687	to the object. The	-0.124939
-0.267622	a global object. The	-0.124939
-0.267622	an anonymous object. The	-0.124939
-0.688153	of the library. The	-0.124939
-0.429795	separate function library. The	-0.124939
-0.101992	in the calculations. The	-0.124939
-0.101992	do the calculations. The	-0.124939
-0.311904	the integer calculations. The	-0.124939
-0.235536	from overlapping calculations. The	-0.124939
-0.338869	core clock cycles. The	-0.124939
-0.338869	100 clock cycles. The	-0.124939
-0.477191	20 clock cycles. The	-0.124939
-0.394737	doing arithmetic operations. The	-0.124939
-0.303897	Single-Instruction-Multiple-Data (SIMD) operations. The	-0.124939
-0.234840	on that variable. The	-0.124939
-0.047896	a register variable. The	-0.124939
-0.311076	an induction variable. The	-0.124939
-0.343033	options prevent optimization. The	-0.124939
-0.303897	on program performance. The	-0.124939
-0.303897	for best performance. The	-0.124939
-0.261207	multiple dynamic libraries. The	-0.124939
-0.261207	very large libraries. The	-0.124939
-0.261207	vector math libraries. The	-0.124939
-0.800024	for the stack. The	-0.124939
-0.665686	set if possible. The	-0.124939
-0.424068	good as possible. The	-0.124939
-0.453896	optimization is needed. The	-0.124939
-0.299604	only when needed. The	-0.124939
-0.261207	structures and classes. The	-0.124939
-0.261207	only for classes. The	-0.124939
-0.389989	well-tested container classes. The	-0.124939
-0.343173	stop the thread. The	-0.124939
-0.527235	many different purposes. The	-0.124939
-0.483787	for other purposes. The	-0.124939
-0.483787	for test purposes. The	-0.124939
-0.261207	performance and precision. The	-0.124939
-0.261207	floating point precision. The	-0.124939
-0.261207	of losing precision. The	-0.124939
-0.373644	by memory access. The	-0.124939
-0.261207	at each access. The	-0.124939
-0.261207	at every access. The	-0.124939
-0.627202	loop control condition The	-0.124939
-0.470106	the AVX instructions. The	-0.124939
-0.253746	and string instructions. The	-0.124939
-0.253746	the subsequent instructions. The	-0.124939
-0.343867	e + f; The	-0.124939
-0.575372	the following way. The	-0.124939
-0.333705	very inefficient way. The	-0.124939
-0.470106	a suboptimal way. The	-0.124939
-0.333705	to the vector. The	-0.124939
-0.253746	full size vector. The	-0.124939
-0.470106	elements per vector. The	-0.124939
-0.385024	smaller as well. The	-0.124939
-0.296000	version performs well. The	-0.124939
-0.922107	automatic CPU dispatching. The	-0.124939
-0.738456	and switch statements The	-0.124939
-0.289660	calculate its address. The	-0.124939
-0.289660	32-bit (signed) address. The	-0.124939
-0.634488	these instruction sets. The	-0.124939
-0.289660	different instructions sets. The	-0.124939
-0.151749	loop or not. The	-0.124939
-0.151749	2 or not. The	-0.124939
-0.151749	advantageous or not. The	-0.124939
-0.236496	aligned or not. The	-0.124939
-0.378511	reduce this problem. The	-0.124939
-0.245421	encounter another problem. The	-0.124939
-0.245421	another security problem. The	-0.124939
-0.341802	operating systems. 3 The	-0.124939
-0.204853	for each version. The	-0.124939
-0.204853	the 32-bit version. The	-0.124939
-0.204853	the alternative version. The	-0.124939
-0.204853	an up-to-date version. The	-0.124939
-0.968114	the end user. The	-0.124939
-0.535595	the function returns. The	-0.124939
-0.381175	and 64-bit Windows. The	-0.124939
-0.381175	in 64-bit Windows. The	-0.124939
-0.161407	a non-sequential order. The	-0.124939
-0.438797	in random order. The	-0.124939
-0.338016	for dynamic allocation. The	-0.124939
-0.370680	of 64-bit integers. The	-0.124939
-0.284279	eight 16-bit integers. The	-0.124939
-0.479540	set is enabled. The	-0.124939
-0.339291	the future. 6 The	-0.124939
-0.338016	is quite inefficient. The	-0.124939
-0.180858	speed is critical. The	-0.124939
-0.272142	caching is critical. The	-0.124939
-0.438797	can be critical. The	-0.124939
-0.017494	set is available. The	-0.124939
-0.190668	function libraries available. The	-0.124939
-0.518657	branch is executed. The	-0.124939
-0.234956	three times faster. The	-0.124939
-0.234956	address calculation faster. The	-0.124939
-0.234956	code execute faster. The	-0.124939
-0.369994	investigating performance problems. The	-0.124939
-0.283717	user. Installation problems. The	-0.124939
-0.760785	the template parameter. The	-0.124939
-0.437995	case of overflow. The	-0.124939
-0.190668	a single element. The	-0.124939
-0.190668	the matrix element. The	-0.124939
-0.370680	cycles per element. The	-0.124939
-0.190668	suitable pivot element. The	-0.124939
-0.437195	for register storage. The	-0.124939
-0.221672	change the value. The	-0.124939
-0.221672	function return value. The	-0.124939
-0.221672	the calculated value. The	-0.124939
-0.336887	an input file. The	-0.124939
-0.505930	in a register. The	-0.124939
-0.359579	a vector register. The	-0.124939
-0.336887	values at once The	-0.124939
-0.275780	in the system. The	-0.124939
-0.445781	an operating system. The	-0.124939
-0.408864	is very fast. The	-0.124939
-0.275160	accessed quite fast. The	-0.124939
-0.333664	size execution units. The	-0.124939
-0.333664	full-size execution units. The	-0.124939
-0.336173	with that branch. The	-0.124939
-0.221672	in an array. The	-0.124939
-0.221672	in another array. The	-0.124939
-0.221672	a normal array. The	-0.124939
-0.497248	up cache space. The	-0.124939
-0.473488	array to zero. The	-0.124939
-0.603035	same cache line. The	-0.124939
-0.275160	a matrix line. The	-0.124939
-0.275160	like adding vectors. The	-0.124939
-0.275160	two 128-bit vectors. The	-0.124939
-0.275160	in large applications. The	-0.124939
-0.275160	for Windows applications. The	-0.124939
-1.077819	Mac OS X The	-0.124939
-0.432938	in the table. The	-0.124939
-0.334626	most up-to-date solution. The	-0.124939
-0.611020	Windows and Linux. The	-0.124939
-0.265783	point to integer. The	-0.124939
-0.628969	as an integer. The	-0.124939
-0.265091	kinds of optimizations. The	-0.124939
-0.488937	enables interprocedural optimizations. The	-0.124939
-0.667537	in this case. The	-0.124939
-0.265091	the 32-bit case. The	-0.124939
-0.334626	an Intel processor. The	-0.124939
-0.395613	number of bits. The	-0.124939
-0.278067	uses 64 bits. The	-0.124939
-0.206979	the extra bits. The	-0.124939
-0.265091	the loop is. The	-0.124939
-0.265091	compiler itself is. The	-0.124939
-0.205771	programming work automatically. The	-0.124939
-0.205771	this alignment automatically. The	-0.124939
-0.205771	not vectorize automatically. The	-0.124939
-0.335438	Unix-like platforms. Clang The	-0.124939
-0.265091	code in details. The	-0.124939
-0.412416	blog for details. The	-0.124939
-0.206374	obstacle to vectorization. The	-0.124939
-0.261173	with automatic vectorization. The	-0.124939
-0.261173	on automatic vectorization. The	-0.124939
-0.333814	handling support anyway. The	-0.124939
-0.379581	thing to do. The	-0.124939
-0.265783	can not do. The	-0.124939
-0.285266	into multiple threads. The	-0.124939
-0.205771	communicating between threads. The	-0.124939
-0.766716	and model number. The	-0.124939
-0.330689	with a constant. The	-0.124939
-0.606892	32 bit systems: The	-0.124939
-0.428008	// Calculate polynomial The	-0.124939
-0.330689	always avoiding this. The	-0.124939
-0.330689	on first call. The	-0.124939
-0.331629	the right prediction. The	-0.124939
-0.277798	with the application. The	-0.124939
-0.252802	the particular application. The	-0.124939
-0.185436	an MFC application. The	-0.124939
-0.252802	model used here. The	-0.124939
-0.252802	little odd here. The	-0.124939
-0.330689	child class members. The	-0.124939
-0.465983	as mentioned above. The	-0.124939
-0.331629	low positive result. The	-0.124939
-0.252018	process...................................................................................................... 25 7 The	-0.124939
-0.252018	control tool. 7 The	-0.124939
-0.744760	of stack unwinding The	-0.124939
-0.331629	3 breakpoint again. The	-0.124939
-0.252018	iterations in one. The	-0.124939
-0.467266	a new one. The	-0.124939
-0.606892	time stamp counter. The	-0.124939
-0.331629	of a structure. The	-0.124939
-0.467266	class or structure. The	-0.124939
-0.191121	a Pentium 4. The	-0.124939
-0.191121	old Pentium 4. The	-0.124939
-0.252018	calls and branches. The	-0.124939
-0.252018	other nearby branches. The	-0.124939
-0.326352	to the profiler. The	-0.124939
-0.480099	easier to maintain. The	-0.124939
-0.460070	advanced development tools. The	-0.124939
-0.326352	for negative numbers. The	-0.124939
-0.326352	in assembly names. The	-0.124939
-0.326352	this first manual. The	-0.124939
-0.221481	in a computer. The	-0.124939
-0.158479	Pentium 4 computer. The	-0.124939
-0.158479	on another computer. The	-0.124939
-0.480099	addition is finished. The	-0.124939
-0.310496	in two ways. The	-0.124939
-0.234354	kb, 8 ways. The	-0.124939
-0.234354	the program. 16.2 The	-0.124939
-0.234354	.................................................................... 155 16.2 The	-0.124939
-0.327468	faster than pow The	-0.124939
-0.654272	sum += a[i]; The	-0.124939
-0.422588	has high priority. The	-0.124939
-0.480099	of out-of-order execution. The	-0.124939
-0.326352	on Intel/x86-compatible microprocessors. The	-0.124939
-0.438637	library at www.agner.org/optimize/asmlib.zip. The	-0.124939
-0.234354	the library www.agner.org/optimize/asmlib.zip. The	-0.124939
-0.480099	for branch mispredictions. The	-0.124939
-0.744694	to do so. The	-0.124939
-0.326352	and code addresses. The	-0.124939
-0.326352	that connect them. The	-0.124939
-0.460070	is floating point. The	-0.124939
-0.226596	member by 8. The	-0.124939
-0.329312	divisible by 8. The	-0.124939
-0.234354	for further explanation. The	-0.124939
-0.234354	a little explanation. The	-0.124939
-0.234354	or class elements. The	-0.124939
-0.310496	to array elements. The	-0.124939
-0.180799	time is doubled. The	-0.124939
-0.180799	frequency is doubled. The	-0.124939
-0.326352	multiplication or division. The	-0.124939
-0.128211	one clock cycle. The	-0.124939
-0.210028	every clock cycle. The	-0.124939
-0.326352	for AVX. 5. The	-0.124939
-0.326352	the pitfalls here: The	-0.124939
-0.319926	is clearly better. The	-0.124939
-0.319926	be broken up. The	-0.124939
-0.319926	for usability reasons. The	-0.124939
-0.640711	of an exception. The	-0.124939
-0.319926	be a type. The	-0.124939
-0.280608	pointer at initialization. The	-0.124939
-0.209136	the necessary initialization. The	-0.124939
-0.280608	bit of ebx. The	-0.124939
-0.209136	edx, to ebx. The	-0.124939
-0.319926	double is bad The	-0.124939
-0.209136	a is true. The	-0.124939
-0.209136	not always true. The	-0.124939
-0.319926	or class objects. The	-0.124939
-0.209136	the loop index. The	-0.124939
-0.209136	an array index. The	-0.124939
-0.027226	(*.dll or *.so). The	-0.124939
-0.269069	objects (*.dll, *.so). The	-0.124939
-0.209136	or code lines. The	-0.124939
-0.398948	same cache lines. The	-0.124939
-0.585090	program is running. The	-0.124939
-0.728053	for user input. The	-0.124939
-0.321301	to a dispatcher. The	-0.124939
-0.319926	far from optimal. The	-0.124939
-0.451350	as an example. The	-0.124939
-0.321301	a + 1.0f; The	-0.124939
-0.414582	loss of efficiency. The	-0.124939
-0.319926	the 64-bit versions. The	-0.124939
-0.414582	are not cached. The	-0.124939
-0.209136	dividend is unsigned. The	-0.124939
-0.209136	signed or unsigned. The	-0.124939
-0.209136	operations with pointers. The	-0.124939
-0.398948	and invalid pointers. The	-0.124939
-0.319926	of two double. The	-0.124939
-0.080090	to the diagonal. The	-0.124939
-0.038201	above the diagonal. The	-0.124939
-0.416293	object to another. The	-0.124939
-0.728053	no pointer aliasing. The	-0.124939
-0.229359	of programming style. The	-0.124939
-0.319926	two clock counts. The	-0.124939
-0.319926	thread are smaller. The	-0.124939
-0.120942	x86 platforms. 3. The	-0.124939
-0.120942	for interrupt 3. The	-0.124939
-0.120942	anonymous namespace. 3. The	-0.124939
-0.120942	any other module. The	-0.124939
-0.110443	in another module. The	-0.124939
-0.186782	from another module. The	-0.124939
-0.047670	code. Dynamic cast The	-0.124939
-0.047670	not. Static cast The	-0.124939
-0.047670	int. Reinterpret cast The	-0.124939
-0.047670	CPUs"). Const cast The	-0.124939
-0.319926	the parentheses manually. The	-0.124939
-0.640711	program is run. The	-0.124939
-0.209136	long double format. The	-0.124939
-0.209136	object file format. The	-0.124939
-0.319926	in optimized programs. The	-0.124939
-0.120942	how compilers work. The	-0.124939
-0.120942	on important work. The	-0.124939
-0.120942	and microprocessors work. The	-0.124939
-0.309422	platform-independent and compact. The	-0.124939
-0.136106	the following reasons: The	-0.124939
-0.170110	case of error. The	-0.124939
-0.170110	common programming error. The	-0.124939
-0.309422	non-vector library. 119 The	-0.124939
-0.309422	code (byte code). The	-0.124939
-0.565941	recover from errors. The	-0.124939
-0.309422	using multiplications only. The	-0.124939
-0.565941	Live range analysis The	-0.124939
-0.309422	than processor features. The	-0.124939
-0.141799	memory in advance. The	-0.124939
-0.141799	given in advance. The	-0.124939
-0.309422	set number 28. The	-0.124939
-0.565941	time it takes. The	-0.124939
-0.170110	without using exceptions. The	-0.124939
-0.170110	catching hardware exceptions. The	-0.124939
-0.076779	language is implemented. The	-0.124939
-0.076779	15.1b is implemented. The	-0.124939
-0.309422	only called once. The	-0.124939
-0.309422	for 64-bit Windows). The	-0.124939
-0.309422	to compile for. The	-0.124939
-0.309422	the second step. The	-0.124939
-0.401559	by CPU brand. The	-0.124939
-0.437195	for garbage collection. The	-0.124939
-0.309422	DoThisThreeTimesAWeek(); } 135 The	-0.124939
-0.311214	a particular purpose. The	-0.124939
-0.170110	values before compilation. The	-0.124939
-0.234956	use just-in-time compilation. The	-0.124939
-0.565941	are accessed sequentially. The	-0.124939
-0.309422	as the operands. The	-0.124939
-0.170110	more iterations back. The	-0.124939
-0.170110	and written back. The	-0.124939
-0.309422	can be expected. The	-0.124939
-0.701662	and operating systems". The	-0.124939
-0.437195	5: calling conventions. The	-0.124939
-0.309422	variables are stored. The	-0.124939
-0.569180	has been deallocated. The	-0.124939
-0.309422	different matrix sizes. The	-0.124939
-0.036695	in example 9.6b. The	-0.425969
-0.437195	always the same. The	-0.124939
-0.401559	do not occur. The	-0.124939
-0.619010	more error prone. The	-0.124939
-0.234956	from the linker. The	-0.124939
-0.170110	the dynamic linker. The	-0.124939
-0.309422	and no multiplications. The	-0.124939
-0.401559	may need metaprogramming. The	-0.124939
-0.170110	for assembly output. The	-0.124939
-0.170110	produce Boolean output. The	-0.124939
-0.309422	an arithmetic expression. The	-0.124939
-0.170110	other allocated resource. The	-0.124939
-0.170110	a limited resource. The	-0.124939
-0.565941	rounding and truncation. The	-0.124939
-0.309422	function is big. The	-0.124939
-0.309422	of four float. The	-0.124939
-0.224432	x + 1.0f;} The	-0.124939
-0.141799	square(x) + 1.0f;} The	-0.124939
-0.309422	choice of n. The	-0.124939
-0.309422	preventing illegitimate copying. The	-0.124939
-0.309422	result in x. The	-0.124939
-0.309422	pre-increment or post-increment. The	-0.124939
-0.309422	as recursive templates. The	-0.124939
-0.530004	(see page 107). The	-0.124939
-0.376629	elsewhere. 13.5 Implementation The	-0.124939
-0.101225	exploiting fine-grained parallelism. The	-0.124939
-0.101225	contains natural parallelism. The	-0.124939
-0.530004	the clock frequency. The	-0.124939
-0.376629	(See page 71). The	-0.124939
-0.289148	for background jobs. The	-0.124939
-0.289148	on the context. The	-0.124939
-0.101225	the ^ operator. The	-0.124939
-0.101225	the sizeof operator. The	-0.124939
-0.376629	Use automatic parallelization. The	-0.124939
-0.101225	reliable. Event-based sampling: The	-0.124939
-0.101225	line. Time-based sampling: The	-0.124939
-0.530004	only one instance. The	-0.124939
-0.289148	Big runtime frameworks. The	-0.124939
-0.101225	13.1 page 127. The	-0.124939
-0.101225	-128 generates 127. The	-0.124939
-0.289148	a single instruction. The	-0.124939
-0.289148	in most cases: The	-0.124939
-0.530004	moving the mouse. The	-0.124939
-0.376629	is more difficult. The	-0.124939
-0.289148	the B values. The	-0.124939
-0.530004	Writes "Hello 2" The	-0.124939
-0.101225	than the heap. The	-0.124939
-0.101225	a memory heap. The	-0.124939
-0.289148	linking works differently. The	-0.124939
-0.530004	in assembly language". The	-0.124939
-0.376629	See page 52. The	-0.124939
-0.289148	microprocessors. 7.13 Loops The	-0.124939
-0.289148	PLT and GOT. The	-0.124939
-0.289148	of a string. The	-0.124939
-0.289148	register is volatile. The	-0.124939
-0.289148	DLLs use relocation. The	-0.124939
-0.289148	is not supported. The	-0.124939
-0.530004	out of range. The	-0.124939
-0.530004	time of programming. The	-0.124939
-0.530004	also has disadvantages: The	-0.124939
-0.289148	the user's needs. The	-0.124939
-0.289148	affected by __fastcall. The	-0.124939
-0.289148	set is specified. The	-0.124939
-0.101225	risk of underflow. The	-0.124939
-0.101225	overflow and underflow. The	-0.124939
-0.289148	0's when false. The	-0.124939
-0.289148	Windows. 10 Multithreading The	-0.124939
-0.289148	CPU dispatch methods. The	-0.124939
-0.289148	targets is small. The	-0.124939
-0.530004	a make utility. The	-0.124939
-0.530004	cannot be controlled. The	-0.124939
-0.101225	reason for updating. The	-0.124939
-0.101225	support. Hardware updating. The	-0.124939
-0.101225	is allocated separately. The	-0.124939
-0.101225	be measured separately. The	-0.124939
-0.376629	See page 26. The	-0.124939
-0.530004	data members (properties) The	-0.124939
-0.289148	CPU cores. 60 The	-0.124939
-0.289148	number of iterations. The	-0.124939
-0.376629	16 is required. The	-0.124939
-0.289148	not in use. The	-0.124939
-0.289148	the class declaration. The	-0.124939
-0.376629	may go undetected. The	-0.124939
-0.289148	is enabled. Volatile The	-0.124939
-0.289148	purposes is allowed. The	-0.124939
-0.289148	in table 8.1. The	-0.124939
-0.289148	(b + c) The	-0.124939
-0.289148	CPU detection mechanism. The	-0.124939
-0.047670	not be negative. The	-0.124939
-0.047670	never be negative. The	-0.124939
-0.289148	can be defined. The	-0.124939
-0.289148	the work load. The	-0.124939
-0.289148	high level framework. The	-0.124939
-0.530004	same memory area. The	-0.124939
-0.289148	than standard PCs. The	-0.124939
-0.376629	my test examples. The	-0.124939
-0.376629	can be predicted. The	-0.124939
-0.289148	the inverted mask. The	-0.124939
-0.101225	the same machine. The	-0.124939
-0.101225	Java virtual machine. The	-0.124939
-0.530004	be allocated dynamically. The	-0.124939
-0.289148	rather than 1.23456. The	-0.124939
-0.530004	linkage table (PLT). The	-0.124939
-0.289148	row or column. The	-0.124939
-0.376629	16 (see below). The	-0.124939
-0.289148	from unknown sources. The	-0.124939
-0.376629	(See page 137). The	-0.124939
-0.289148	inside the template. The	-0.124939
-0.289148	CPU was started. The	-0.124939
-0.101225	8 = 80. The	-0.124939
-0.101225	See page 80. The	-0.124939
-0.289148	element number i. The	-0.124939
-0.101225	N&(N-1) is 0. The	-0.124939
-0.101225	c < 0. The	-0.124939
-0.289148	versions work correctly. The	-0.124939
-0.101225	....................................................................................................................... 3 1.1 The	-0.124939
-0.101225	relevant information. 1.1 The	-0.124939
-0.289148	(see page 43). The	-0.124939
-0.376629	Windows and Mac. The	-0.124939
-0.530004	of the fraction. The	-0.124939
-0.101225	the other compilers). The	-0.124939
-0.101225	old DOS compilers). The	-0.124939
-0.101225	from other processes. The	-0.124939
-0.101225	between multiple processes. The	-0.124939
-0.530004	the sign bit. The	-0.124939
-0.289148	and dynamic linking. The	-0.124939
-0.289148	reusable classes. Security The	-0.124939
-0.289148	integers. 7.5 Booleans The	-0.124939
-0.289148	are linked together. The	-0.124939
-0.289148	then become invalid. The	-0.124939
-0.047670	on page 122. The	-0.124939
-0.047670	see page 122. The	-0.124939
-0.023181	the program starts. The	-0.124939
-0.101225	ebx contains i/2+r. The	-0.124939
-0.101225	for computing i/2+r. The	-0.124939
-0.376629	example below shows. The	-0.124939
-0.023181	file is closed. The	-0.124939
-0.289148	transfer is avoided. The	-0.124939
-0.101225	64 bits each. The	-0.124939
-0.101225	8 bytes each. The	-0.124939
-0.289148	2.6.30 and later. The	-0.124939
-0.530004	(see page 140). The	-0.124939
-0.289148	vector operations. 105 The	-0.124939
-0.376629	(a+1) / 4; The	-0.124939
-0.289148	end users have. The	-0.124939
-0.530004	Linux and BSD. The	-0.124939
-0.233453	(see page 51). The	-0.124939
-0.233453	a loop count. The	-0.124939
-0.233453	be calculated independently. The	-0.124939
-0.233453	CPU cache (en.wikipedia.org/wiki/L2_cache). The	-0.124939
-0.233453	else being initialized. The	-0.124939
-0.233453	a[i] = i+1; The	-0.124939
-0.233453	thread is terminated. The	-0.124939
-0.233453	than code generality. The	-0.124939
-0.233453	a = sin(0.8); The	-0.124939
-0.233453	and complexity (en.wikipedia.org/wiki/Standard_Template_Library). The	-0.124939
-0.233453	year. Ignoring virtualization. The	-0.124939
-0.233453	to be installed. The	-0.124939
-0.233453	implemented with interpretation. The	-0.124939
-0.233453	only happens rarely. The	-0.124939
-0.233453	size is insufficient. The	-0.124939
-0.233453	each CPU core). The	-0.124939
-0.233453	used to be. The	-0.124939
-0.233453	two other situations: The	-0.124939
-0.233453	also page 119). The	-0.124939
-0.233453	the next paragraph. The	-0.124939
-0.233453	e.g. every millisecond. The	-0.124939
-0.233453	constructors and destructors. The	-0.124939
-0.233453	137 about division). The	-0.124939
-0.233453	(see page 27). The	-0.124939
-0.233453	char pointers. 144 The	-0.124939
-0.233453	a[i] is ecx+eax*4. The	-0.124939
-0.233453	current CPUs optimally. The	-0.124939
-0.233453	when compiling module2.cpp. The	-0.124939
-0.233453	variable in eax. The	-0.124939
-0.233453	a2*b1) / (b1*b2); The	-0.124939
-0.233453	testing worst-case performance: The	-0.124939
-0.233453	the value 1000. The	-0.124939
-0.233453	happens at runtime). The	-0.124939
-0.233453	rather than 20. The	-0.124939
-0.233453	platforms. Graphics accelerators The	-0.124939
-0.233453	than mov eax,0. The	-0.124939
-0.233453	checked before storing. The	-0.124939
-0.233453	in example 8.15b. The	-0.124939
-0.233453	not a vector). The	-0.124939
-0.233453	profiling methods: Instrumentation: The	-0.124939
-0.233453	stored in y. The	-0.124939
-0.233453	requires only SSE). The	-0.124939
-0.233453	such a formalism. The	-0.124939
-0.233453	a suitable duration. The	-0.124939
-0.233453	advantages and disadvantages. The	-0.124939
-0.233453	admittedly very kludgy. The	-0.124939
-0.233453	in Linux, sched_setaffinity). The	-0.124939
-0.233453	will see shortly. The	-0.124939
-0.233453	files and databases. The	-0.124939
-0.233453	when type-casting pointers: The	-0.124939
-0.233453	Nerds at Wikibooks. The	-0.124939
-0.233453	not overlap. 27 The	-0.124939
-0.233453	instructions becomes noticeable. The	-0.124939
-0.233453	slow, you know). The	-0.124939
-0.233453	complicated and error-prone. The	-0.124939
-0.233453	efficient, but risky. The	-0.124939
-0.233453	reinvent the wheel. The	-0.124939
-0.233453	of the weekdays. The	-0.124939
-0.233453	in example 16.2. The	-0.124939
-0.233453	would be straightforward. The	-0.124939
-0.233453	be optimized further. The	-0.124939
-0.233453	with a password. The	-0.124939
-0.233453	(see p. 104). The	-0.124939
-0.233453	code becomes contiguous. The	-0.124939
-0.233453	library versions instead. The	-0.124939
-0.233453	the binary digits. The	-0.124939
-0.233453	on page 44. The	-0.124939
-0.233453	by physical factors. The	-0.124939
-0.233453	and >= operators). The	-0.124939
-0.233453	diagonal remain unchanged. The	-0.124939
-0.233453	empty throw() specification. The	-0.124939
-0.233453	e.g. four floats. The	-0.124939
-0.233453	called register renaming. The	-0.124939
-0.233453	be used most. The	-0.124939
-0.233453	is implementation dependent. The	-0.124939
-0.233453	14.23 page 143. The	-0.124939
-0.233453	(&& and ||). The	-0.124939
-0.233453	(in Windows: __rdtsc()). The	-0.124939
-0.233453	(1. / 1.2345); The	-0.124939
-0.233453	code is repetitive. The	-0.124939
-0.233453	a certain tolerance. The	-0.124939
-0.233453	than its reputation. The	-0.124939
-0.233453	pointer aliasing (/Oa). The	-0.124939
-0.233453	it takes. Debugging. The	-0.124939
-0.233453	CPLDs and FPGAs. The	-0.124939
-0.233453	the register keyword. The	-0.124939
-0.233453	b+c = 100000001.23456. The	-0.124939
-0.233453	set, e.g. /arch:SSE2. The	-0.124939
-0.233453	has several flaws: The	-0.124939
-0.233453	CPUs (Intel Atom). The	-0.124939
-0.233453	the performance somewhat. The	-0.124939
-0.233453	storage p. 28) The	-0.124939
-0.233453	__fastcall or __attribute__((fastcall)). The	-0.124939
-0.233453	on page 134. The	-0.124939
-0.233453	r.a + r.b;} The	-0.124939
-0.233453	not necessarily newer. The	-0.124939
-0.233453	the variable m. The	-0.124939
-0.233453	the best algorithm. The	-0.124939
-0.233453	2 63 . The	-0.124939
-0.233453	not well documented. The	-0.124939
-0.233453	memory, using new. The	-0.124939
-0.233453	in the end. The	-0.124939
-0.233453	2015 or 2016. The	-0.124939
-0.233453	(see page 84). The	-0.124939
-0.233453	the following features: The	-0.124939
-0.233453	with 2n -1. The	-0.124939
-0.233453	value of temp. The	-0.124939
-0.233453	clumsy and tedious. The	-0.124939
-0.233453	microprocessor hardware design. The	-0.124939
-0.233453	disk copying. Security. The	-0.124939
-0.233453	most efficient alternative. The	-0.124939
-0.233453	a Gauss elimination. The	-0.124939
-0.233453	Math Kernel Library. The	-0.124939
-0.233453	to normal afterwards. The	-0.124939
-0.233453	to the next. The	-0.124939
-0.233453	AMD and VIA. The	-0.124939
-0.233453	common purposes (www.boost.org). The	-0.124939
-0.233453	VML and SVML. The	-0.124939
-0.233453	a; Plus2 (&a); The	-0.124939
-0.233453	is not satisfactory. The	-0.124939
-0.233453	such as <. The	-0.124939
-0.233453	one is fastest. The	-0.124939
-0.233453	by Agner Fog The	-0.124939
-0.233453	one that doesn’t. The	-0.124939
-0.233453	that is distributed. The	-0.124939
-0.233453	family number 6! The	-0.124939
-0.233453	64-bit systems. 67 The	-0.124939
-0.233453	130 for details). The	-0.124939
-0.233453	do the conversion. The	-0.124939
-0.233453	the performance costs. The	-0.124939
-0.233453	to return a+1;. The	-0.124939
-0.233453	conditions are satisfied. The	-0.124939
-0.233453	cause large delays. The	-0.124939
-0.233453	vector class library). The	-0.124939
-0.233453	be changed freely. The	-0.124939
-0.233453	condition, and increment. The	-0.124939
-0.233453	optimizing multithreaded applications: The	-0.124939
-0.233453	(see page 72). The	-0.124939
-0.233453	superior performance/price ratio. The	-0.124939
-0.233453	called name mangling. The	-0.124939
-0.233453	value of sum. The	-0.124939
-0.233453	2.0 / 3.0; The	-0.124939
-0.233453	for(i=0; i<100; i++)a[i]=2*i; The	-0.124939
-0.233453	the "FDIV bug". The	-0.124939
-0.233453	by 8. 71 The	-0.124939
-0.233453	clear and modular. The	-0.124939
-0.233453	has three advantages: The	-0.124939
-0.233453	two 128-bit reads. The	-0.124939
-0.233453	1.5f : 2.6f; The	-0.124939
-0.233453	and hot spots. The	-0.124939
-0.233453	a Taylor series. The	-0.124939
-0.233453	&, |, ~. The	-0.124939
-0.233453	the preceding row. The	-0.124939
-0.233453	I have tested. The	-0.124939
-0.233453	not to vectorize. The	-0.124939
-0.233453	- vectorclass www.agner.org/optimize/#vectorclass. The	-0.124939
-0.233453	virtual function tables. The	-0.124939
-0.233453	become more powerful. The	-0.124939
-0.233453	than integer comparisons. The	-0.124939
-0.233453	precision (80 bits). The	-0.124939
-0.233453	{ return _mm_cvtsd_si32(_mm_load_sd(&x));} The	-0.124939
-0.233453	I have tried. The	-0.124939
-0.233453	now as follows. The	-0.124939
-0.233453	accessing it directly. The	-0.124939
-0.233453	or "frame pointer". The	-0.124939
-0.233453	or reference parameters). The	-0.124939
-0.233453	ten years old. The	-0.124939
-0.233453	compatible with these. The	-0.124939
-0.233453	has been wasted. The	-0.124939
-0.233453	in example 7.30b. The	-0.124939
-0.233453	(see page 70). The	-0.124939
-0.357132	at all is for	-0.124939
-0.961177	This manual is for	-0.124939
-0.357132	My preference is for	-0.124939
-0.582101	well-structured code and for	-0.124939
-0.354338	an array and for	-0.124939
-0.560587	link pointers and for	-0.124939
-0.354338	execution speed and for	-0.124939
-0.354338	optimization features and for	-0.124939
-0.354338	local variables, and for	-0.124939
-0.357643	to use that for	-0.124939
-0.502178	subsequent manuals are for	-0.124939
-0.353932	reflect this or for	-0.124939
-0.353932	program optimization or for	-0.124939
-0.353932	for recovering or for	-0.124939
-0.142675	to use it for	-0.124939
-0.142675	can use it for	-0.124939
-0.878477	calling the function for	-0.124939
-1.036011	of a function for	-0.124939
-0.947399	in a function for	-0.124939
-0.553552	have a function for	-0.124939
-0.639942	the template function for	-0.124939
-0.639942	the inlined function for	-0.124939
-0.348850	the right function for	-0.124939
-0.639942	the strlen function for	-0.124939
-0.573244	operator. The code for	-0.124939
-0.760187	the program code for	-0.124939
-0.340868	and 64-bit code for	-0.124939
-0.750947	any extra code for	-0.124939
-0.156580	and assembly code for	-0.124939
-0.554505	and intermediate code for	-0.124939
-0.440781	the optimal code for	-0.124939
-0.340868	exactly identical code for	-0.124939
-0.063464	Transforming serial code for	-0.124939
-0.340868	can build code for	-0.124939
-0.693398	the same as for	-0.124939
-0.356291	Intel CPUs, not for	-0.124939
-0.352881	single precision than for	-0.124939
-0.352881	cache contentions than for	-0.124939
-0.352881	for shared_ptr than for	-0.124939
-0.505370	Basic. A compiler for	-0.124939
-0.825064	the Intel compiler for	-0.124939
-0.344472	PathScale C++ compiler for	-0.124939
-0.344472	PGI C++ compiler for	-0.124939
-0.725075	the Gnu compiler for	-0.124939
-0.492526	or Microsoft compiler for	-0.124939
-0.335037	open source compiler for	-0.124939
-0.335037	for your compiler for	-0.124939
-0.335037	or PathScale compiler for	-0.124939
-0.335037	A commercial compiler for	-0.124939
-0.335037	a cheap compiler for	-0.124939
-0.539821	availability of x for	-0.124939
-1.372295	int cc[]) { for	-0.124939
-0.109191	SIZE; r++) { for	-0.602060
-0.627503	+= TILESIZE) { for	-0.124939
-0.063672	r1+TILESIZE; r2++) { for	-0.425969
-0.342443	__restrict bb) { for	-0.124939
-0.356221	am using this for	-0.124939
-0.781417	waste of time for	-0.124939
-0.642221	The development time for	-0.124939
-0.350015	to save time for	-0.124939
-0.350015	that saves time for	-0.124939
-0.557094	branch to use for	-0.124939
-0.557094	lines to use for	-0.124939
-0.557094	cumbersome to use for	-0.124939
-0.576987	function can use for	-0.124939
-0.745670	allocation of memory for	-0.124939
-0.516256	block of memory for	-0.124939
-0.449323	too much data for	-0.124939
-0.347640	too little data for	-0.124939
-0.489298	automatically prefetch data for	-0.124939
-0.347640	for prefetching data for	-0.124939
-0.587651	down a program for	-0.124939
-0.349406	size is different for	-0.124939
-0.451555	will be different for	-0.124939
-0.491747	prediction are different for	-0.124939
-1.062175	are the same for	-0.124939
-0.318562	library have functions for	-0.124939
-0.318562	as efficient functions for	-0.124939
-0.072313	contains many functions for	-0.124939
-0.159120	Includes many functions for	-0.124939
-0.318562	the necessary functions for	-0.124939
-0.521112	with intrinsic functions for	-0.124939
-0.318562	contains various functions for	-0.124939
-0.412887	than frame functions for	-0.124939
-0.142111	12.7 Mathematical functions for	-0.124939
-0.482919	12.3. Intrinsic functions for	-0.124939
-0.318562	or QueryPerformanceCounter functions for	-0.124939
-0.316617	is used only for	-0.124939
-0.316617	be used only for	-0.124939
-0.216545	are used only for	-0.124939
-0.327329	this method only for	-0.124939
-0.243175	and works only for	-0.124939
-0.243175	code works only for	-0.124939
-0.243175	compiler works only for	-0.124939
-0.243175	method works only for	-0.124939
-0.423807	the dispatching only for	-0.124939
-0.327329	is allowed only for	-0.124939
-0.353944	has no instruction for	-0.124939
-0.353944	inline assembly instruction for	-0.124939
-0.856210	use a loop for	-0.124939
-0.532737	x^10 // loop for	-0.124939
-0.349584	// Main loop for	-0.124939
-0.352863	hot spots, but for	-0.124939
-0.352863	scientific computing, but for	-0.124939
-0.524533	that is used for	-0.124939
-0.250980	It is used for	-0.124939
-0.250980	table is used for	-0.124939
-0.250980	thread is used for	-0.124939
-0.250980	space is used for	-0.124939
-0.068699	operator is used for	-0.124939
-0.360440	process is used for	-0.124939
-0.250980	frame is used for	-0.124939
-0.250980	INSTRSET is used for	-0.124939
-0.250980	longjmp is used for	-0.124939
-0.229274	popular and used for	-0.124939
-0.312818	can be used for	-0.191886
-0.300472	may be used for	-0.124939
-0.195060	also be used for	-0.124939
-0.357906	cannot be used for	-0.124939
-0.594567	functions are used for	-0.124939
-0.420959	which are used for	-0.124939
-0.420959	Threads are used for	-0.124939
-0.229274	ipow(x,10); // used for	-0.124939
-0.046986	are not used for	-0.425969
-0.304449	The time used for	-0.124939
-0.304449	definitions when used for	-0.124939
-0.229274	the CPU used for	-0.124939
-0.525547	is also used for	-0.124939
-0.580957	are often used for	-0.124939
-0.229274	cache space used for	-0.124939
-0.229274	The algorithms used for	-0.124939
-0.229274	method currently used for	-0.124939
-0.395104	program, and one for	-0.124939
-0.395104	SSE4.1 and one for	-0.124939
-0.333792	the program, one for	-0.124939
-0.333792	instruction set, one for	-0.124939
-0.333792	three times, one for	-0.124939
-0.333792	three parts: one for	-0.124939
-0.333792	two branches: one for	-0.124939
-0.565177	levels of cache for	-0.124939
-0.352097	an extra cache for	-0.124939
-1.114175	the instruction set for	-0.124939
-0.855793	supported instruction set for	-0.124939
-1.265284	structure or class for	-0.124939
-0.280037	and Intel compilers for	-0.124939
-0.356124	are used most for	-0.124939
-0.875348	the vector size for	-0.124939
-0.654604	a to b for	-0.124939
-0.572245	C function library for	-0.124939
-0.810763	user interface library for	-0.124939
-0.350322	makes intermediate object for	-0.124939
-0.350322	a temporary object for	-0.124939
-0.349751	is for C++ for	-0.124939
-0.451992	such as C++ for	-0.124939
-0.452904	This is efficient for	-0.124939
-0.835307	are most efficient for	-0.124939
-0.349586	the same array for	-0.124939
-0.808429	a linear array for	-0.124939
-0.989161	make it possible for	-0.124939
-0.497179	standardized as possible for	-0.124939
-0.338269	is therefore possible for	-0.124939
-0.338269	is rarely possible for	-0.124939
-0.430317	a 64-bit version for	-0.124939
-0.332534	32- bit version for	-0.124939
-0.488934	each new version for	-0.124939
-0.332534	set, another version for	-0.124939
-0.332534	a separate version for	-0.124939
-0.355768	of temporary objects for	-0.124939
-1.097781	of a variable for	-0.124939
-0.553821	same induction variable for	-0.124939
-0.330919	use induction variables for	-0.124939
-0.330919	make induction variables for	-0.124939
-0.466296	point induction variables for	-0.124939
-0.214626	elements Induction variables for	-0.124939
-0.214626	expressions Induction variables for	-0.124939
-0.214626	motion Induction variables for	-0.124939
-0.499197	a hash table for	-0.124939
-0.425644	a good performance for	-0.124939
-0.328799	selecting optimize performance for	-0.124939
-0.328799	almost identical performance for	-0.124939
-0.328799	Very poor performance for	-0.124939
-0.328799	definitely degrades performance for	-0.124939
-0.570569	optimizing the software for	-0.124939
-0.244256	A code branch for	-0.124939
-0.244256	any code branch for	-0.124939
-1.367877	function is called for	-0.124939
-0.359058	sum = 0; for	-0.124939
-0.550992	largest_index = 0; for	-0.124939
-0.592518	identical. For example, for	-0.124939
-0.522263	Convert to unsigned for	-0.124939
-0.513798	256-bit vector register for	-0.124939
-0.381831	the same register for	-0.124939
-0.330634	a temporary register for	-0.124939
-0.503065	various function libraries for	-0.124939
-0.503065	Optimized function libraries for	-0.124939
-0.416058	include standard libraries for	-0.124939
-0.321112	below. Many libraries for	-0.124939
-0.321112	contains well-tested libraries for	-0.124939
-0.354612	// Function template for	-0.124939
-0.345282	from using registers for	-0.124939
-1.183965	the XMM registers for	-0.124939
-0.511092	eliminates the need for	-0.124939
-0.336999	data. The need for	-0.124939
-1.356190	is no need for	-0.124939
-0.849081	how to test for	-0.124939
-0.253662	It is useful for	-0.124939
-0.165822	program is useful for	-0.124939
-0.165822	which is useful for	-0.124939
-0.151762	method is useful for	-0.124939
-0.165822	throw()specification is useful for	-0.124939
-0.461870	can be useful for	-0.124939
-0.881440	may be useful for	-0.124939
-0.051273	which are useful for	-0.124939
-0.051273	libraries are useful for	-0.124939
-0.051273	directives are useful for	-0.124939
-0.051273	profilers are useful for	-0.124939
-0.051273	Threads are useful for	-0.124939
-0.051273	References are useful for	-0.124939
-0.051273	~ are useful for	-0.124939
-0.188212	is most useful for	-0.124939
-0.399770	is also useful for	-0.124939
-0.217417	and very useful for	-0.124939
-0.317715	be very useful for	-0.124939
-0.326461	most cases, even for	-0.124939
-0.326461	never occurs, even for	-0.124939
-0.326461	a time-consumer even for	-0.124939
-0.326461	response times, even for	-0.124939
-0.546296	choose this method for	-0.124939
-0.344939	The preferred method for	-0.124939
-0.568448	but not always for	-0.124939
-0.527479	structures by 16 for	-0.124939
-0.343816	See page 16 for	-0.124939
-1.604815	the operating system for	-0.124939
-0.344529	See page 32 for	-0.124939
-0.344529	SSE2, preferably 32 for	-0.124939
-0.494736	closes the file for	-0.124939
-0.351558	a header file for	-0.124939
-0.494736	appropriate header file for	-0.124939
-0.098660	// Header file for	-0.425969
-0.490834	of the bits for	-0.124939
-0.343701	has enough bits for	-0.124939
-0.318334	use integer operations for	-0.124939
-0.130631	Using integer operations for	-0.425969
-0.444569	which is 0 for	-0.124939
-0.343874	the value 0 for	-0.124939
-0.354370	many different cases for	-0.124939
-0.293043	kind of instructions for	-0.124939
-0.293043	are no instructions for	-0.124939
-0.122260	of extra instructions for	-0.124939
-0.122260	few extra instructions for	-0.124939
-0.293043	are intrinsic instructions for	-0.124939
-0.381397	may reorder instructions for	-0.124939
-0.293043	the ADX instructions for	-0.124939
-0.136350	compiler is available for	-0.124939
-0.336204	edition is available for	-0.124939
-0.305824	operations are available for	-0.124939
-0.305824	versions are available for	-0.124939
-0.305824	templates are available for	-0.124939
-0.305824	frameworks are available for	-0.124939
-0.264898	extra register available for	-0.124939
-0.347147	integer registers available for	-0.124939
-0.264898	library. Only available for	-0.124939
-0.819641	the residual error for	-0.124939
-0.222749	the response times for	-0.124939
-0.500535	long response times for	-0.124939
-0.222749	longer response times for	-0.124939
-1.477284	on the stack for	-0.124939
-1.544226	It is important for	-0.124939
-0.878136	is very important for	-0.124939
-0.353784	than other CPUs for	-0.124939
-0.352897	are too large for	-0.124939
-0.533856	impossible to work for	-0.124939
-0.514495	method doesn't work for	-0.124939
-0.386171	multiple code versions for	-0.124939
-0.529229	in different versions for	-0.124939
-0.232909	in multiple versions for	-0.602060
-0.296934	have several versions for	-0.124939
-0.353328	dedicated physics processor for	-0.124939
-0.448739	that is compiled for	-0.124939
-0.313691	code is compiled for	-0.124939
-0.273971	file and compiled for	-0.124939
-0.614648	mixing code compiled for	-0.124939
-0.273971	necessary, each compiled for	-0.124939
-0.115730	and programs compiled for	-0.124939
-0.115730	in programs compiled for	-0.124939
-0.455755	is too big for	-0.124939
-0.496728	solution is best for	-0.124939
-0.568522	unit-testing is necessary for	-0.124939
-1.059296	is not necessary for	-0.124939
-0.521106	cycles per element for	-0.124939
-0.564155	of assembly language for	-0.124939
-0.533153	optimally. The speed for	-0.124939
-0.667509	{ int i; for	-0.124939
-0.160619	0; int i; for	-0.425969
-0.418286	1.0; int i; for	-0.124939
-0.418286	list[100]; int i; for	-0.124939
-0.418286	7.30a int i; for	-0.124939
-0.993342	It is common for	-0.124939
-0.338954	(/arch:SSE2, /arch:AVX etc. for	-0.124939
-0.338954	-msse2, -mavx, etc. for	-0.124939
-0.779554	possible to compile for	-0.124939
-0.353762	// Enable exception for	-0.124939
-0.546464	will be allocated for	-0.124939
-0.341176	on the option for	-0.124939
-0.480366	Use the option for	-0.124939
-0.086922	have an option for	-0.124939
-0.054969	has an option for	-0.124939
-0.286415	a compiler option for	-0.124939
-0.286415	same compiler option for	-0.124939
-0.243983	map file" option for	-0.124939
-0.119085	it is good for	-0.425969
-0.059798	languages are good for	-0.124939
-1.108922	in a matrix for	-0.124939
-0.455316	loss of precision for	-0.124939
-0.533782	code that works for	-0.124939
-0.373983	cache is optimized for	-0.124939
-0.343007	examples are optimized for	-0.124939
-0.261469	are not optimized for	-0.124939
-0.261469	that you optimized for	-0.124939
-0.261469	function, each optimized for	-0.124939
-0.105957	are highly optimized for	-0.124939
-0.343007	builder. Not optimized for	-0.124939
-0.057532	See the manual for	-0.124939
-0.545223	the compiler manual for	-0.124939
-0.481515	for this manual for	-0.124939
-0.297823	the vectorclass manual for	-0.124939
-0.353182	i, a[100], b; for	-0.124939
-0.234114	bypass the check for	-0.124939
-0.212171	has to check for	-0.124939
-0.311117	way to check for	-0.124939
-0.212171	how to check for	-0.124939
-0.212171	calls to check for	-0.124939
-0.338858	does not check for	-0.124939
-0.169385	must then check for	-0.124939
-0.017896	is no check for	-0.425969
-0.017896	have no check for	-0.425969
-0.169385	doesn't automatically check for	-0.124939
-0.169385	no automatic check for	-0.124939
-0.169385	We might check for	-0.124939
-0.169385	A missing check for	-0.124939
-0.169385	problem: (1) check for	-0.124939
-0.353552	cores are advantageous for	-0.124939
-0.922836	most efficient solution for	-0.124939
-0.420609	choosing a container for	-0.124939
-0.420609	lock a container for	-0.124939
-0.322635	use one container for	-0.124939
-0.082781	that have support for	-0.124939
-0.082781	compilers have support for	-0.124939
-0.082781	set has support for	-0.124939
-0.082781	system has support for	-0.124939
-0.185147	to make support for	-0.124939
-0.185147	has some support for	-0.124939
-0.017080	has hardware support for	-0.124939
-0.185147	need better support for	-0.124939
-0.039421	turn off support for	-0.124939
-0.185147	have inherent support for	-0.124939
-0.185147	has excellent support for	-0.124939
-0.415468	using overloaded operators for	-0.124939
-0.163660	Use bitwise operators for	-0.425969
-0.649583	< rows; i++) for	-0.124939
-0.454366	is a standard for	-0.124939
-0.455068	the microprocessor hardware for	-0.124939
-0.130677	positive and 1 for	-0.124939
-0.130677	false and 1 for	-0.124939
-0.318477	b with 1 for	-0.124939
-0.334729	size and optimizing for	-0.124939
-0.334729	choice between optimizing for	-0.124939
-0.335829	save some information for	-0.124939
-0.335829	save recovery information for	-0.124939
-0.582708	80 clock cycles for	-0.124939
-0.352023	b[size]; // ... for	-0.124939
-0.405219	int i; ... for	-0.124939
-0.251798	b[size], i; ... for	-0.124939
-0.268929	x); 136 ... for	-0.124939
-0.268929	i, j; ... for	-0.124939
-0.268929	int List[ArraySize]; ... for	-0.124939
-0.334509	full 64-bit addresses for	-0.124939
-0.334509	calculate element addresses for	-0.124939
-0.470430	different source files for	-0.124939
-0.333942	12.2. Header files for	-0.124939
-0.536592	are not recommended for	-0.124939
-1.462026	dynamic memory allocation for	-0.124939
-0.841991	want to optimize for	-0.124939
-0.351937	disadvantages mentioned above for	-0.124939
-0.454041	no caching problems for	-0.124939
-0.541013	that is optimal for	-0.124939
-0.351054	the same space for	-0.124939
-0.953397	in some cases, for	-0.124939
-0.350598	test all branches for	-0.124939
-0.538461	f = 1; for	-0.124939
-0.935436	b + 1; for	-0.124939
-0.519460	in code caching for	-0.124939
-0.351297	the best implementation for	-0.124939
-0.176547	disable exception handling for	-0.124939
-0.289128	commonly used methods for	-0.124939
-0.289128	more useful methods for	-0.124939
-0.289128	are various methods for	-0.124939
-0.289128	and suggests methods for	-0.124939
-0.350284	should be separate for	-0.124939
-0.544111	one memory block for	-0.124939
-0.329095	a small block for	-0.124939
-0.308412	a different name for	-0.124939
-0.826719	the same name for	-0.124939
-0.308412	the local name for	-0.124939
-0.359167	int r, c; for	-0.425969
-0.519820	often a disadvantage for	-0.124939
-0.350107	be annoyingly high for	-0.124939
-0.819596	a to zero for	-0.124939
-0.350843	to reserve resources for	-0.124939
-0.363770	code. The reason for	-0.124939
-0.363770	end. The reason for	-0.124939
-0.363770	directly. The reason for	-0.124939
-0.285530	compelling security reason for	-0.124939
-0.559638	virtual table lookup for	-0.124939
-0.452929	complete code examples for	-0.124939
-0.784063	makes no difference for	-0.124939
-0.328409	// Time difference for	-0.124939
-0.281744	code is needed for	-0.124939
-0.281744	time is needed for	-0.124939
-0.346542	may be needed for	-0.124939
-0.955297	is not needed for	-0.124939
-0.264397	extra work needed for	-0.124939
-0.348906	between two expressions for	-0.124939
-0.288201	it is difficult for	-0.124939
-0.351645	It is difficult for	-0.425969
-0.369099	is more difficult for	-0.124939
-0.100458	the OpenMP directives for	-0.124939
-0.702674	large runtime framework for	-0.124939
-0.553807	specify static linking for	-0.124939
-0.599564	Table[100]; int x; for	-0.124939
-0.218153	i; float x; for	-0.124939
-0.218153	j; float x; for	-0.124939
-0.955115	of hardware platform for	-0.124939
-0.322753	functions is higher for	-0.124939
-0.322753	costs are higher for	-0.124939
-0.641493	compiler cannot know for	-0.124939
-0.349180	get reliable results for	-0.124939
-0.415556	specify the options for	-0.124939
-0.320708	the available options for	-0.124939
-0.333704	have a feature for	-0.124939
-0.230059	such a feature for	-0.124939
-0.931311	can be made for	-0.124939
-0.320708	function library made for	-0.124939
-0.348884	is most appropriate for	-0.124939
-0.348151	function a constructor for	-0.124939
-0.495855	it is relevant for	-0.124939
-0.488417	of this section for	-0.124939
-0.803980	off the computer for	-0.124939
-0.015182	a good choice for	-0.124939
-0.047236	very good choice for	-0.124939
-0.230796	the optimal choice for	-0.124939
-0.448275	used in STL for	-0.124939
-0.173029	function is intended for	-0.124939
-0.173029	This is intended for	-0.124939
-0.262502	It is intended for	-0.124939
-0.173029	handling is intended for	-0.124939
-0.173029	feature is intended for	-0.124939
-0.221607	that are intended for	-0.124939
-0.158588	is not intended for	-0.124939
-0.158588	processing unit intended for	-0.124939
-0.348456	transfer is avoided for	-0.124939
-0.551737	any cache lines for	-0.124939
-0.111086	have one instance for	-0.124939
-0.111086	make one instance for	-0.124939
-0.111086	get one instance for	-0.124939
-0.052001	needs one instance for	-0.425969
-0.128514	is no checking for	-0.124939
-0.128514	have no checking for	-0.124939
-0.486205	may be inlined for	-0.124939
-0.446107	use a database for	-0.124939
-0.771570	call the destructor for	-0.124939
-0.133589	opens the possibility for	-0.124939
-0.327517	open the possibility for	-0.124939
-0.631238	// Define macro for	-0.124939
-0.345046	or hide them for	-0.124939
-0.344627	have separate containers for	-0.124939
-0.262812	dispatching are: Optimizing for	-0.124939
-0.262812	is critical. Optimizing for	-0.124939
-0.262812	for speed. Optimizing for	-0.124939
-0.633134	loop through rows for	-0.124939
-0.053813	registers when compiling for	-0.124939
-0.053813	Studio when compiling for	-0.124939
-0.053813	strict when compiling for	-0.124939
-0.053813	-fno-pic when compiling for	-0.124939
-0.053813	Vec16s when compiling for	-0.124939
-0.053813	Eclipse when compiling for	-0.124939
-0.541618	and data structures for	-0.124939
-0.343342	a precious resource for	-0.124939
-0.085960	c; double temp; for	-0.425969
-0.255682	float register temp; for	-0.124939
-0.051317	makes it easier for	-0.425969
-0.256328	this reordering easier for	-0.124939
-0.443898	is exactly identical for	-0.124939
-0.902982	of the program, for	-0.124939
-0.217703	same time, except for	-0.124939
-0.217703	same object, except for	-0.124939
-0.217703	16-bit programs, except for	-0.124939
-0.217703	the stack, except for	-0.124939
-0.290818	classes and templates for	-0.124939
-0.290818	150. Using templates for	-0.124939
-0.343540	"xmmintrin.h" // header for	-0.124939
-0.205488	is a penalty for	-0.124939
-0.205488	is no penalty for	-0.124939
-0.205145	no performance penalty for	-0.124939
-0.205145	51 performance penalty for	-0.124939
-0.342672	once. The reasons for	-0.124939
-0.341805	a software module for	-0.124939
-0.520491	It is used, for	-0.124939
-0.291975	are no checks for	-0.124939
-0.291975	make explicit checks for	-0.124939
-0.934655	The critical stride for	-0.124939
-0.342672	See page 3 for	-0.124939
-0.341372	meaningless event counts for	-0.124939
-0.182397	is big enough for	-0.124939
-0.341805	libraries have features for	-0.124939
-0.341853	temp = 3; for	-0.124939
-0.403779	C++ is chosen for	-0.124939
-0.370684	compiler has chosen for	-0.124939
-0.874477	that all destructors for	-0.124939
-0.926103	of parameter transfer for	-0.124939
-0.235712	needs. The search for	-0.124939
-0.235712	Some programs search for	-0.124939
-0.235712	use binary search for	-0.124939
-0.285126	Table 9.1. Time for	-0.124939
-0.285126	Table 9.3. Time for	-0.124939
-0.741962	can be mispredicted for	-0.124939
-0.313735	a test tool for	-0.124939
-0.313735	my test tool for	-0.124939
-0.487141	checking is included for	-0.124939
-0.339216	r2 and c2 for	-0.124939
-0.438027	graphics processing unit for	-0.124939
-0.105497	specific calling conventions for	-0.124939
-0.002123	5: "Calling conventions for	-0.823909
-0.056462	5. Calling conventions for	-0.124939
-0.339754	precautions to account for	-0.124939
-0.437352	of different algorithms for	-0.124939
-0.277180	makes a PLT for	-0.124939
-0.827719	GOT and PLT for	-0.124939
-0.276248	executed only once for	-0.124939
-0.276248	function. Compile once for	-0.124939
-0.097307	CPU is designed for	-0.124939
-0.097307	STL is designed for	-0.124939
-0.222908	was never designed for	-0.124939
-0.338142	program. The inputs for	-0.124939
-0.339216	are limiting factors for	-0.124939
-0.097307	math is required for	-0.124939
-0.097307	manipulation is required for	-0.124939
-0.222908	debugging if required for	-0.124939
-0.438703	and a GOT for	-0.124939
-0.336459	to perform poorly for	-0.124939
-0.335239	are not suitable for	-0.124939
-0.337070	metaprogramming // Template for	-0.124939
-0.337070	int i, a[100]; for	-0.124939
-0.614879	in 32-bit mode, for	-0.124939
-0.016529	See page 130 for	-0.124939
-0.033711	(see page 130 for	-0.124939
-0.033711	(See page 130 for	-0.124939
-0.266132	95 and 120 for	-0.124939
-0.266132	See page 120 for	-0.124939
-0.335849	with some changes for	-0.124939
-0.335849	again and again for	-0.124939
-0.335239	the optimization capabilities for	-0.124939
-0.013143	user is waiting for	-0.124939
-0.013143	thread is waiting for	-0.124939
-0.026696	we are waiting for	-0.124939
-0.013143	its time waiting for	-0.124939
-0.013143	their time waiting for	-0.124939
-0.026696	are often waiting for	-0.124939
-0.026696	do while waiting for	-0.124939
-0.337070	cycles. The rules for	-0.124939
-0.033863	have to wait for	-0.124939
-0.010997	has to wait for	-0.301030
-0.253590	shows the principle for	-0.124939
-0.253590	use this principle for	-0.124939
-0.763725	can be expected for	-0.124939
-0.332810	on C++ Performance for	-0.124939
-0.432437	also be convenient for	-0.124939
-0.332104	r1 and c1 for	-0.124939
-0.429778	See page 87 for	-0.124939
-0.085354	are not permissible for	-0.425969
-0.720162	factorial = 1.0; for	-0.124939
-0.332810	is rare. Testing for	-0.124939
-0.334226	turn off requirements for	-0.124939
-0.227263	in device drivers for	-0.124939
-0.227263	64-bit device drivers for	-0.124939
-0.047964	See page 122 for	-0.425969
-0.329432	c2 and bc for	-0.124939
-0.235938	used and searching for	-0.124939
-0.235938	use time searching for	-0.124939
-0.037852	the biggest vectors: for	-0.124939
-0.002272	the eight-element vectors: for	-0.726999
-0.328591	See page 80 for	-0.124939
-0.235260	page and 90 for	-0.124939
-0.235260	See page 90 for	-0.124939
-0.424336	level. My recommendation for	-0.124939
-0.328591	14.21. // Only for	-0.124939
-0.327752	See page 107 for	-0.124939
-0.327752	OS X Compilers for	-0.124939
-0.159618	same object (except for	-0.124939
-0.159618	integer expressions (except for	-0.124939
-0.159618	previous iteration (except for	-0.124939
-0.328591	Environments) have facilities for	-0.124939
-0.321305	See page 103 for	-0.124939
-0.321305	See page 51 for	-0.124939
-0.418876	linking is preferable for	-0.124939
-0.587634	See page 43 for	-0.124939
-0.038291	Full template specialization for	-0.425969
-0.080288	Partial template specialization for	-0.124939
-0.321305	See page 88 for	-0.124939
-0.472926	the heap manager for	-0.124939
-0.321305	See page 150 for	-0.124939
-0.002841	An optimization guide for	-0.249877
-0.416298	Various development tools for	-0.124939
-0.043749	the compiler documentation for	-0.124939
-0.321305	expect a directive for	-0.124939
-0.043749	same memory area for	-0.124939
-0.321305	See page 29 for	-0.124939
-0.321305	of 100 floats for	-0.124939
-0.308314	appendix at www.agner.org/optimize/cppexamples.zip for	-0.124939
-0.209937	alignment. See www.agner.org/optimize/cppexamples.zip for	-0.124939
-0.321305	See page 31 for	-0.124939
-0.322339	int i; 45 for	-0.124939
-0.403224	See page 49 for	-0.124939
-0.310769	10 page 101 for	-0.124939
-0.310769	See page 93 for	-0.124939
-0.310769	145 and 119 for	-0.124939
-0.439003	of memory blocks, for	-0.124939
-0.403224	like square blocking for	-0.124939
-0.312115	Induction = r; for	-0.124939
-0.310769	faster, except perhaps for	-0.124939
-0.310769	of cache organization for	-0.124939
-0.312115	loop for calculations: for	-0.124939
-0.312115	as the basis for	-0.124939
-0.310769	See page 81 for	-0.124939
-0.036814	See page 89 for	-0.425969
-0.310769	See page 153 for	-0.124939
-0.310769	See page 140 for	-0.124939
-0.310769	See page 141 for	-0.124939
-0.312115	are used twice for	-0.124939
-0.310769	threads are competing for	-0.124939
-0.036814	is not unusual for	-0.425969
-0.170761	and 10 ms for	-0.124939
-0.170761	typically 30 ms for	-0.124939
-0.310769	C++ Compiler Documentation for	-0.124939
-0.310769	and PLT lookups for	-0.124939
-0.036814	See page 78 for	-0.425969
-0.310769	enters the market for	-0.124939
-0.312115	bit in Day for	-0.124939
-0.136373	and VIA CPUs" for	-0.425969
-0.403224	of a variable, for	-0.124939
-0.312115	may be sufficient for	-0.124939
-0.290433	graphics accelerator card for	-0.124939
-0.290433	number of accumulators for	-0.124939
-0.378201	can be justified for	-0.124939
-0.290433	= 0, sum; for	-0.124939
-0.290433	inside the loop, for	-0.124939
-0.290433	Critical innermost loop: for	-0.124939
-0.290433	int i, StringLength; for	-0.124939
-0.290433	half of it, for	-0.124939
-0.290433	int i, a[2]; for	-0.124939
-0.290433	73 and 72 for	-0.124939
-0.101642	are not suited for	-0.124939
-0.101642	is best suited for	-0.124939
-0.290433	following: 130 Compile for	-0.124939
-0.290433	because various corrections for	-0.124939
-0.023268	// Approximate exp(x) for	-0.425969
-0.378201	at certain events, for	-0.124939
-0.290433	is a proxy for	-0.124939
-0.290433	the specific literature for	-0.124939
-0.290433	take special precautions for	-0.124939
-0.290433	and structures. Useful for	-0.124939
-0.290433	a compiler warning for	-0.124939
-0.290433	int row, column; for	-0.124939
-0.290433	times 24 dramatically for	-0.124939
-0.290433	Compilers and IDE's for	-0.124939
-0.378201	nfac = 1.f; for	-0.124939
-0.101642	optimized and fine-tuned for	-0.124939
-0.101642	that are fine-tuned for	-0.124939
-0.290433	program that waits for	-0.124939
-0.290433	time is consistent for	-0.124939
-0.378201	had an interpreter for	-0.124939
-0.290433	different memory spaces for	-0.124939
-0.290433	for all squares: for	-0.124939
-0.023268	is not uncommon for	-0.124939
-0.290433	background are unnecessary for	-0.124939
-0.290433	int i; 84 for	-0.124939
-0.290433	free register left for	-0.124939
-0.290433	so much stronger for	-0.124939
-0.290433	problems. The procedures for	-0.124939
-0.290433	cannot use ~ for	-0.124939
-0.290433	is sufficiently accurate for	-0.124939
-0.101642	stride will contend for	-0.124939
-0.101642	dynamic libraries contend for	-0.124939
-0.290433	A + B; for	-0.124939
-0.290433	compiler Linux Optimize for	-0.124939
-0.290433	of the subroutine for	-0.124939
-0.234583	code, specific preferences for	-0.124939
-0.234583	point division. Correction for	-0.124939
-0.234583	a similar utility for	-0.124939
-0.234583	and the FAQ for	-0.124939
-0.234583	< NUMROWS; row++) for	-0.124939
-0.234583	pattern can be, for	-0.124939
-0.234583	outside this interval, for	-0.124939
-0.234583	with two decimals, for	-0.124939
-0.234583	per matrix cell for	-0.124939
-0.234583	S1 list[100], *temp; for	-0.124939
-0.234583	use Intel VTune, for	-0.124939
-0.234583	make separate executables for	-0.124939
-0.234583	set is maintained for	-0.124939
-0.234583	for the IDE, for	-0.124939
-0.234583	used as buffers for	-0.124939
-0.234583	a limited audience for	-0.124939
-0.234583	Loops: A sourcebook for	-0.124939
-0.234583	a different meaning for	-0.124939
-0.234583	elements inside sqaure: for	-0.124939
-0.234583	response is delayed for	-0.124939
-0.234583	Compiler v. 11.1 for	-0.124939
-0.234583	free E-book Usability for	-0.124939
-0.234583	as e.g. .R. for	-0.124939
-0.234583	(see page 122) for	-0.124939
-0.234583	in registers. Except for	-0.124939
-0.234583	should be prepared for	-0.124939
-0.234583	modification to compensate for	-0.124939
-0.234583	heap is reserved for	-0.124939
-0.234583	long long timediff[NumberOfTests]; for	-0.124939
-0.234583	ignore a request for	-0.124939
-0.234583	"Software Optimization Guide for	-0.124939
-0.234583	are search requests for	-0.124939
-0.234583	the static keyword, for	-0.124939
-0.234583	will always compete for	-0.124939
-0.234583	|| expression. Assume, for	-0.124939
-0.234583	point of attack for	-0.124939
-0.234583	internet or intranet for	-0.124939
-0.234583	has been criticized for	-0.124939
-0.234583	precision math. Libraries for	-0.124939
-0.234583	before) } printf("\nResults:"); for	-0.124939
-0.234583	to the standards for	-0.124939
-0.234583	can optimize specifically for	-0.124939
-0.234583	CPU dispatching 125 for	-0.124939
-0.234583	improve the possibilities for	-0.124939
-0.234583	be very helpful for	-0.124939
-0.234583	// function prototypes for	-0.124939
-0.234583	memory footprint. If, for	-0.124939
-0.234583	and micro-operation breakdowns for	-0.124939
-0.234583	variable in parts, for	-0.124939
-0.234583	See my blog for	-0.124939
-0.234583	corrections and suggestions for	-0.124939
-0.234583	can handle. Waiting for	-0.124939
-0.234583	automatically detect opportunities for	-0.124939
-0.234583	advantageous as replacements for	-0.124939
-0.234583	by exception handlers for	-0.124939
-0.234583	-S or /Fa for	-0.124939
-0.234583	can be wired for	-0.124939
-0.234583	In example 12.3a, for	-0.124939
-0.234583	have a strategy for	-0.124939
-0.234583	is handled separately: for	-0.124939
-0.234583	{...} // Prototype for	-0.124939
-0.234583	C++ compilers exist for	-0.124939
-0.234583	and operating systems" for	-0.124939
-0.234583	Compiler v. 14.00 for	-0.124939
-0.234583	the name _alloca) for	-0.124939
-0.234583	more than doubled for	-0.124939
-0.234583	turn on correction for	-0.124939
-0.234583	performance by 5-10% for	-0.124939
-0.234583	registers. Typical candidates for	-0.124939
-0.234583	class is responsible for	-0.124939
-0.234583	the newsgroup comp.lang.asm.x86 for	-0.124939
-0.234583	is Intel's term for	-0.124939
-0.510180	this code is that	-0.124939
-0.227714	intermediate code is that	-0.301030
-1.152490	Intel compiler is that	-0.124939
-0.523379	for this is that	-0.124939
-0.523379	from this is that	-0.124939
-0.504996	static data is that	-0.124939
-0.510496	the point is that	-0.124939
-0.548103	same cache is that	-0.124939
-1.263720	instruction set is that	-0.124939
-0.483814	64-bit double is that	-0.124939
-0.531117	vector library is that	-0.124939
-0.974740	this method is that	-0.124939
-0.915929	the result is that	-0.124939
-0.784698	definition language is that	-0.124939
-0.444319	32-bit Linux is that	-0.124939
-0.543972	The problem is that	-0.124939
-0.016126	The disadvantage is that	-0.602060
-0.050294	A disadvantage is that	-0.124939
-0.050294	Another disadvantage is that	-0.124939
-0.527265	function parameter is that	-0.124939
-0.015101	The reason is that	-0.249877
-0.343676	of optimizations is that	-0.124939
-0.999712	static linking is that	-0.124939
-0.483814	data storage is that	-0.124939
-0.403218	static here is that	-0.124939
-0.403218	problem here is that	-0.124939
-0.138694	cache contentions is that	-0.124939
-0.138694	such contentions is that	-0.124939
-0.629884	function inlining is that	-0.124939
-0.483814	made containers is that	-0.124939
-0.343676	be improved is that	-0.124939
-0.343676	lazy binding is that	-0.124939
-0.343676	complicated algorithms is that	-0.124939
-0.343676	keyword volatile is that	-0.124939
-0.063834	we notice is that	-0.124939
-0.343676	with macros is that	-0.124939
-0.343676	The consequence is that	-0.124939
-0.343676	an assumption is that	-0.124939
-0.063834	The conclusion is that	-0.124939
-0.343676	this argument is that	-0.124939
-0.503617	past history of that	-0.124939
-0.522067	becomes faster and that	-0.124939
-0.552927	1000 times and that	-0.124939
-0.355378	or 1 and that	-0.124939
-0.355378	vary dynamically and that	-0.124939
-0.459125	software development, and that	-0.124939
-0.357394	writing style are that	-0.124939
-1.011285	inside the function that	-0.124939
-0.163480	is a function that	-0.124939
-0.911447	to a function that	-0.124939
-0.759188	in a function that	-0.124939
-0.467536	using a function that	-0.124939
-0.467536	while a function that	-0.124939
-0.537469	exceptions. The function that	-0.124939
-0.220601	loop A function that	-0.124939
-0.220601	local A function that	-0.124939
-0.220601	destructor. A function that	-0.124939
-0.336102	a const function that	-0.124939
-0.434791	that every function that	-0.124939
-0.675287	a graphics function that	-0.124939
-0.336102	// Any function that	-0.124939
-0.953858	CPU detection function that	-0.124939
-0.336102	own error-handling function that	-0.124939
-1.105075	is compatible with that	-0.124939
-0.459951	doing optimizations on that	-0.124939
-0.356029	optimization effort on that	-0.124939
-0.730228	is the code that	-0.124939
-1.074573	of the code that	-0.124939
-0.730228	into the code that	-0.124939
-0.507084	check the code that	-0.124939
-0.507084	However, the code that	-0.124939
-0.507084	copying the code that	-0.124939
-0.507084	study the code that	-0.124939
-0.328592	shows a code that	-0.124939
-0.328592	insert a code that	-0.124939
-0.703219	piece of code that	-0.124939
-0.664912	pieces of code that	-0.124939
-0.556488	members. The code that	-0.124939
-0.229913	and for code that	-0.124939
-0.229913	choice for code that	-0.124939
-0.229913	time. A code that	-0.124939
-0.229913	says. A code that	-0.124939
-0.321306	to make code that	-0.124939
-0.321306	is other code that	-0.124939
-0.321306	149 All code that	-0.124939
-0.321306	to optimize code that	-0.124939
-0.321306	a complicated code that	-0.124939
-0.321306	here. Any code that	-0.124939
-0.472927	a loop-invariant code that	-0.124939
-0.321306	for improving code that	-0.124939
-0.420946	tell the compiler that	-0.124939
-0.548441	tells the compiler that	-0.124939
-0.448940	using a compiler that	-0.124939
-0.448940	Use a compiler that	-0.124939
-0.519399	possible. A compiler that	-0.124939
-0.579751	loop. The time that	-0.124939
-1.083037	the same time that	-0.124939
-1.539064	recommended to use that	-0.124939
-0.863516	part of memory that	-0.124939
-0.564534	pieces of data that	-0.124939
-0.351017	threads, while data that	-0.124939
-0.351017	including local data that	-0.124939
-0.994137	of the program that	-0.124939
-1.097867	in the program that	-0.124939
-0.524948	update the program that	-0.124939
-1.317553	of a program that	-0.124939
-0.462112	a C++ program that	-0.124939
-0.293461	a test program that	-0.124939
-0.293461	small test program that	-0.124939
-0.327852	a Windows program that	-0.124939
-0.327852	more well-structured program that	-0.124939
-0.327852	an antivirus program that	-0.124939
-0.161352	for the functions that	-0.124939
-0.161352	make the functions that	-0.124939
-0.073225	all the functions that	-0.425969
-0.161352	containing the functions that	-0.124939
-0.161352	implement the functions that	-0.124939
-0.161352	extracts the functions that	-0.124939
-0.161352	collect the functions that	-0.124939
-0.452784	even of functions that	-0.124939
-0.307003	conventions for functions that	-0.124939
-0.398569	efficiently if functions that	-0.124939
-0.307003	return from functions that	-0.124939
-0.433951	some other functions that	-0.124939
-0.513132	any member functions that	-0.124939
-0.307003	If several functions that	-0.124939
-0.307003	a few functions that	-0.124939
-0.521040	of mathematical functions that	-0.124939
-0.538033	type of CPU that	-0.124939
-0.457581	insert an instruction that	-0.124939
-0.354162	particularly slow instruction that	-0.124939
-0.532464	example, a loop that	-0.124939
-0.532464	processors, a loop that	-0.124939
-0.346606	inside another loop that	-0.124939
-1.181223	the innermost loop that	-0.124939
-0.357071	and generally used that	-0.124939
-0.124478	is the one that	-0.124939
-0.124478	be the one that	-0.124939
-0.124478	like the one that	-0.124939
-0.124478	find the one that	-0.124939
-0.512234	product is one that	-0.124939
-0.389689	set and one that	-0.124939
-0.389689	brands, and one that	-0.124939
-0.543683	the only one that	-0.124939
-0.327218	detection function, one that	-0.124939
-0.567838	also a cache that	-0.124939
-0.585005	fact an integer that	-0.124939
-1.115402	the instruction set that	-0.124939
-0.578125	an instruction set that	-0.124939
-0.810007	about the class that	-0.124939
-1.040023	function or class that	-0.124939
-0.148949	a container class that	-0.124939
-0.553231	choose the compilers that	-0.124939
-0.339644	only for compilers that	-0.124939
-0.339644	work on compilers that	-0.124939
-0.339644	accessible from compilers that	-0.124939
-0.439241	tell these compilers that	-0.124939
-0.338520	smallest data size that	-0.124939
-0.273613	an integer size that	-0.124939
-0.073558	smallest integer size that	-0.301030
-0.804608	of the library that	-0.124939
-0.490148	than the library that	-0.124939
-0.423789	another function library that	-0.124939
-0.423789	standard function library that	-0.124939
-0.423789	up-to-date function library that	-0.124939
-0.486395	another. The object that	-0.124939
-0.507706	array or object that	-0.124939
-0.570929	is an object that	-0.124939
-0.469008	call the version that	-0.124939
-0.135305	takes. The version that	-0.124939
-0.135305	brand. The version that	-0.124939
-0.332903	make one version that	-0.124939
-0.332903	a generic version that	-0.124939
-0.502122	on the value that	-0.124939
-0.182208	from the value that	-0.124939
-0.502122	Here, the value that	-0.124939
-0.292864	from a value that	-0.124939
-0.292864	constant a value that	-0.124939
-1.135129	variables and objects that	-0.124939
-0.452507	for big objects that	-0.124939
-0.837432	of the variable that	-0.124939
-0.978984	to a variable that	-0.124939
-0.526392	thread. A variable that	-0.124939
-0.149848	used in so that	-0.124939
-0.149848	around it so that	-0.124939
-0.149848	thread function so that	-0.124939
-0.068493	the code so that	-0.124939
-0.068493	critical code so that	-0.124939
-0.068493	of time so that	-0.124939
-0.068493	compile time so that	-0.124939
-0.149848	128-bit vector so that	-0.124939
-0.149848	b different so that	-0.124939
-0.149848	this example so that	-0.124939
-0.149848	function call so that	-0.124939
-0.149848	register less so that	-0.124939
-0.149848	sign bit so that	-0.124939
-0.149848	through pointers so that	-0.124939
-0.149848	64-bit operations so that	-0.124939
-0.149848	the calculations so that	-0.124939
-0.149848	separate threads so that	-0.124939
-0.149848	any exception so that	-0.124939
-0.149848	32-bit mode so that	-0.124939
-0.149848	reliable source so that	-0.124939
-0.149848	the start so that	-0.124939
-0.149848	be negative so that	-0.124939
-0.149848	code section so that	-0.124939
-0.149848	this statement so that	-0.124939
-0.149848	are inlined so that	-0.124939
-0.149848	a macro so that	-0.124939
-0.149848	by 100 so that	-0.124939
-0.149848	is changed so that	-0.124939
-0.149848	are identical so that	-0.124939
-0.149848	loop unrolling so that	-0.124939
-0.032898	be organized so that	-0.425969
-0.149848	an integer, so that	-0.124939
-0.149848	if possible, so that	-0.124939
-0.149848	more compact so that	-0.124939
-0.149848	explained above, so that	-0.124939
-0.149848	example 9.5 so that	-0.124939
-0.149848	example 12.4a so that	-0.124939
-0.149848	reciprocal factorials so that	-0.124939
-0.149848	task switches; so that	-0.124939
-0.149848	value 0x2C so that	-0.124939
-0.419268	choose the variables that	-0.124939
-0.419268	intended for variables that	-0.124939
-0.323690	sure that variables that	-0.124939
-0.323690	add counter variables that	-0.124939
-0.323690	for initialized variables that	-0.124939
-0.323690	for uninitialized variables that	-0.124939
-0.857552	of the table that	-0.124939
-0.355825	The highest performance that	-0.124939
-0.525028	lineage of software that	-0.124939
-0.334812	wasted on software that	-0.124939
-0.612910	to make software that	-0.124939
-0.334812	computer. Security software that	-0.124939
-0.356039	is so long that	-0.124939
-0.348969	is a branch that	-0.124939
-0.491140	has a branch that	-0.124939
-0.348969	example, a branch that	-0.124939
-0.452051	kind of branch that	-0.124939
-0.232719	way. A branch that	-0.124939
-0.232719	course. A branch that	-0.124939
-0.232719	mispredicted. A branch that	-0.124939
-0.232719	changes. A branch that	-0.124939
-0.302368	branches Remove branch that	-0.124939
-0.060612	in a way that	-0.124939
-0.215232	such a way that	-0.124939
-0.355838	user interface elements that	-0.124939
-0.299968	a memory address that	-0.425969
-0.548251	Assume, for example, that	-0.124939
-0.355022	the logical register that	-0.124939
-0.338028	system or libraries that	-0.124939
-0.750544	and function libraries that	-0.124939
-0.519123	general function libraries that	-0.124939
-0.354988	for saving registers that	-0.124939
-0.355153	things with pointers that	-0.124939
-0.458279	a performance test that	-0.124939
-0.336348	in all systems that	-0.124939
-0.502861	different operating systems that	-0.124939
-0.723190	old operating systems that	-0.124939
-0.070128	cannot be sure that	-0.124939
-0.070128	never be sure that	-0.124939
-0.301471	you are sure that	-0.124939
-0.242690	to make sure that	-0.301030
-0.272627	and make sure that	-0.124939
-0.181251	can make sure that	-0.124939
-0.181251	must make sure that	-0.124939
-0.181251	Therefore, make sure that	-0.124939
-0.025402	This makes sure that	-0.602060
-0.081152	system makes sure that	-0.124939
-0.081152	reference makes sure that	-0.124939
-0.081152	keyword makes sure that	-0.124939
-0.081152	product makes sure that	-0.124939
-0.022141	of making sure that	-0.124939
-0.045472	by making sure that	-0.124939
-0.473037	choose the method that	-0.124939
-0.335844	most important method that	-0.124939
-0.335844	an unfortunate method that	-0.124939
-0.543433	access a file that	-0.124939
-0.576443	i = 0 that	-0.124939
-0.567069	choose the type that	-0.124939
-0.818075	in the case that	-0.124939
-0.344185	the likely case that	-0.124939
-0.293670	rely on instructions that	-0.124939
-0.445455	have vector instructions that	-0.124939
-0.293670	application- specific instructions that	-0.124939
-0.293670	are single instructions that	-0.124939
-0.293670	a few instructions that	-0.124939
-0.293670	have certain instructions that	-0.124939
-0.293670	define application-specific instructions that	-0.124939
-0.360385	on the processors that	-0.124939
-0.208235	generation of processors that	-0.425969
-0.116178	time on processors that	-0.425969
-0.275824	on some processors that	-0.124939
-0.204756	the first processors that	-0.124939
-0.301825	The first processors that	-0.124939
-0.360385	all unknown processors that	-0.124939
-0.578068	be a constant that	-0.124939
-0.354155	a common error that	-0.124939
-0.599349	It is important that	-0.301030
-0.342773	compatible with CPUs that	-0.124939
-0.342773	with all CPUs that	-0.124939
-0.353696	is so large that	-0.124939
-0.502766	number of arrays that	-0.124939
-0.442378	apply to arrays that	-0.124939
-0.341851	is other work that	-0.124939
-0.341851	deleted. User work that	-0.124939
-0.873363	order to avoid that	-0.124939
-0.522123	situations to avoid that	-0.124939
-0.353980	time, any processor that	-0.124939
-0.137913	is so big that	-0.124939
-0.137913	are so big that	-0.124939
-0.341292	counts for threads that	-0.124939
-0.542148	into multiple threads that	-0.124939
-0.328639	also a language that	-0.124939
-0.541785	a programming language that	-0.124939
-0.328639	device. Any language that	-0.124939
-0.522538	than a thread that	-0.124939
-0.340573	doubled. A thread that	-0.124939
-0.137441	is so small that	-0.124939
-0.137441	are so small that	-0.124939
-0.763037	Use the option that	-0.124939
-0.535845	specify an option that	-0.124939
-0.326909	without any option that	-0.124939
-0.554734	example container classes that	-0.124939
-0.652926	then the line that	-0.124939
-0.925928	the cache line that	-0.124939
-0.676241	a cache line that	-0.124939
-0.553039	operators. Function parameters that	-0.124939
-0.554422	code to check that	-0.124939
-0.339166	a runtime check that	-0.124939
-0.542004	avoid the problem that	-0.124939
-0.337884	version causes problem that	-0.124939
-0.544059	more efficient solution that	-0.124939
-0.613349	use a container that	-0.124939
-0.433462	considered a container that	-0.124939
-0.103380	has the advantage that	-0.124939
-0.321390	come from operators that	-0.124939
-0.321390	that all operators that	-0.124939
-0.321390	1, but operators that	-0.124939
-0.348123	it is likely that	-0.124939
-0.588861	is very likely that	-0.124939
-0.353211	a parallel structure that	-0.124939
-0.541747	we can calculate that	-0.124939
-0.499773	block to copy that	-0.124939
-0.456847	only CPUID information that	-0.124939
-0.437280	it is certain that	-0.124939
-0.286687	cannot be certain that	-0.124939
-0.437280	you are certain that	-0.124939
-0.286687	is quite certain that	-0.124939
-0.525727	is almost certain that	-0.124939
-1.294770	few clock cycles that	-0.124939
-0.334986	pointers or addresses that	-0.124939
-0.334986	no absolute addresses that	-0.124939
-0.646504	is a counter that	-0.124939
-0.164905	maximum loop count that	-0.425969
-0.352524	connections. Temporary files that	-0.124939
-0.649972	is therefore recommended that	-0.124939
-0.135607	are so fast that	-0.124939
-0.135607	developing so fast that	-0.124939
-0.135503	if I write that	-0.124939
-0.135503	If I write that	-0.124939
-0.462533	use in programs that	-0.124939
-0.407162	high for programs that	-0.124939
-0.313949	are making programs that	-0.124939
-0.449975	involves the problems that	-0.124939
-0.386070	other resource problems that	-0.124939
-0.296851	for finding problems that	-0.124939
-0.446859	important usability problems that	-0.124939
-0.494915	(requires a microprocessor that	-0.124939
-0.456520	lot of branches that	-0.124939
-0.294391	are making branches that	-0.124939
-0.294391	size. Unpredictable branches that	-0.124939
-0.294391	needed. Predictable branches that	-0.124939
-0.331698	function or operator that	-0.124939
-0.429270	type casting operator that	-0.124939
-0.351443	a second application that	-0.124939
-0.097953	compiler can see that	-0.425969
-0.404239	compiler will see that	-0.124939
-0.571318	while the expression that	-0.124939
-0.347242	mask. The expression that	-0.124939
-0.082533	is an expression that	-0.425969
-0.347242	variables An expression that	-0.124939
-0.240688	counter. Any expression that	-0.124939
-0.240688	a loop-invariant expression that	-0.124939
-0.351028	is so complicated that	-0.124939
-0.567428	two data members that	-0.124939
-0.350974	is a model that	-0.124939
-0.670219	a memory block that	-0.124939
-0.470199	any memory block that	-0.124939
-0.350974	an arbitrary name that	-0.124939
-0.045973	has the disadvantage that	-0.602060
-0.038372	is so high that	-0.425969
-0.080465	be so high that	-0.124939
-0.559769	register to zero that	-0.124939
-0.454158	are allocated resources that	-0.124939
-0.352136	the same reason that	-0.124939
-0.975174	a CPU dispatcher that	-0.124939
-0.577697	to the programmer that	-0.124939
-0.423373	advantage in applications that	-0.124939
-0.326981	advantageous for applications that	-0.124939
-1.039899	CPU dispatch mechanism that	-0.124939
-0.117635	member function means that	-0.124939
-0.014977	etc. This means that	-0.124939
-0.014977	cases. This means that	-0.124939
-0.014977	units. This means that	-0.124939
-0.014977	ways. This means that	-0.124939
-0.014977	execution. This means that	-0.124939
-0.014977	cycle. This means that	-0.124939
-0.014977	28. This means that	-0.124939
-0.054847	const variable means that	-0.124939
-0.054847	global variable means that	-0.124939
-0.117635	to 10 means that	-0.124939
-0.117635	non-member function, means that	-0.124939
-0.117635	Aligned operands means that	-0.124939
-0.117635	decomposition here means that	-0.124939
-0.452165	often write expressions that	-0.124939
-0.350081	has preprocessing directives that	-0.124939
-0.395993	but it requires that	-0.124939
-0.395993	option. This requires that	-0.124939
-0.278356	hardware often requires that	-0.124939
-0.511382	This method requires that	-0.124939
-0.301298	doing the optimizations that	-0.124939
-0.301298	the compiler optimizations that	-0.124939
-0.551384	from making optimizations that	-0.124939
-0.324196	a software framework that	-0.124939
-0.649698	large runtime framework that	-0.124939
-0.643267	with old microprocessors that	-0.124939
-0.153259	compiler to assume that	-0.124939
-0.082152	permissible to assume that	-0.124939
-0.048078	problem and assume that	-0.124939
-0.005727	you can assume that	-0.124939
-0.005727	You can assume that	-0.124939
-0.048078	overflow or assume that	-0.124939
-0.048078	can you assume that	-0.124939
-0.096121	If we assume that	-0.124939
-0.023374	you cannot assume that	-0.124939
-0.023374	You cannot assume that	-0.124939
-0.048078	compiler would assume that	-0.124939
-0.011530	can generally assume that	-0.425969
-0.048078	compiler makers assume that	-0.124939
-0.048078	can safely assume that	-0.124939
-0.592457	The table shows that	-0.124939
-0.323911	page 16) shows that	-0.124939
-0.225832	do not know that	-0.124939
-0.300360	If you know that	-0.124939
-0.225832	that we know that	-0.124939
-0.425075	compiler cannot know that	-0.124939
-0.225832	who would know that	-0.124939
-0.225832	(Microsoft, Intel) know that	-0.124939
-0.322948	also other advantages that	-0.124939
-0.322948	are specific advantages that	-0.124939
-0.551528	various optimization options that	-0.124939
-0.384962	has the feature that	-0.124939
-0.295950	the special feature that	-0.124939
-0.295950	symbol interposition feature that	-0.124939
-0.491142	a default constructor that	-0.124939
-0.292860	This may require that	-0.124939
-0.219506	vector operations require that	-0.124939
-0.219506	these instructions require that	-0.124939
-0.219506	Some applications require that	-0.124939
-0.219506	Some profilers require that	-0.124939
-0.219506	and MOVNTDQ require that	-0.124939
-0.319162	for all modules that	-0.124939
-0.413633	all .cpp modules that	-0.124939
-0.376771	couple of things that	-0.124939
-0.289265	reveals three things that	-0.124939
-0.289265	often reveal things that	-0.124939
-0.450266	All the reductions that	-0.124939
-0.347730	that each statement that	-0.124939
-0.409394	catch programming errors that	-0.124939
-0.315748	for detecting errors that	-0.124939
-0.408925	than other languages that	-0.124939
-0.484776	from programming languages that	-0.124939
-0.527901	include a profiler that	-0.124939
-0.346988	an illegal operation that	-0.124939
-0.222324	of the fact that	-0.124939
-0.045837	way. The fact that	-0.124939
-0.045837	underflow. The fact that	-0.124939
-0.045837	bit. The fact that	-0.124939
-0.045837	20. The fact that	-0.124939
-0.347069	portable to platforms that	-0.124939
-0.309265	a single task that	-0.124939
-0.309265	checking. Any task that	-0.124939
-0.346837	one for constants that	-0.124939
-0.191762	be a destructor that	-0.124939
-0.191762	with a destructor that	-0.124939
-0.285637	have a destructor that	-0.124939
-0.214883	exception-safe code Assume that	-0.124939
-0.214883	} } Assume that	-0.124939
-0.214883	memory access. Assume that	-0.124939
-0.214883	of speed. Assume that	-0.124939
-0.214883	in general. Assume that	-0.124939
-0.346373	the first algorithm that	-0.124939
-0.314509	of the possibility that	-0.124939
-0.129385	out the possibility that	-0.124939
-0.237201	the theoretical possibility that	-0.124939
-0.347663	from this discussion that	-0.124939
-0.346662	satisfied. The conditions that	-0.124939
-0.346163	with an offset that	-0.124939
-0.076488	and 1. Note that	-0.124939
-0.076488	desired version. Note that	-0.124939
-0.076488	Windows system. Note that	-0.124939
-0.076488	character arrays. Note that	-0.124939
-0.076488	for details. Note that	-0.124939
-0.076488	an explanation. Note that	-0.124939
-0.076488	less optimized. Note that	-0.124939
-0.076488	optimized away. Note that	-0.124939
-0.076488	file disassembler. Note that	-0.124939
-0.076488	patch. 131 Note that	-0.124939
-0.058876	put the operand that	-0.425969
-0.346163	of other tasks that	-0.124939
-0.176553	more efficient. Variables that	-0.124939
-0.242446	static storage Variables that	-0.124939
-0.176553	effects are: Variables that	-0.124939
-0.176553	temporary storage. Variables that	-0.124939
-0.079366	functions. 9.4 Variables that	-0.124939
-0.079366	88 9.4 Variables that	-0.124939
-0.058747	it is clear that	-0.124939
-0.345378	may occasionally predict that	-0.124939
-0.560772	actual clock frequency that	-0.124939
-0.345108	an extra iteration that	-0.124939
-0.302232	brands or models that	-0.124939
-0.302232	all newer models that	-0.124939
-0.485421	it is true that	-0.124939
-0.633703	functions have names that	-0.124939
-0.345378	also other details that	-0.124939
-0.302721	chains. Another thing that	-0.124939
-0.302721	The third thing that	-0.124939
-0.543040	other data structures that	-0.124939
-0.445297	we must consider that	-0.124939
-0.344452	the time delay that	-0.124939
-0.344159	FPGA soft cores that	-0.124939
-0.343865	value of ebx that	-0.124939
-0.342057	series of statements that	-0.124939
-0.342378	a zigzag course that	-0.124939
-0.537494	is a risk that	-0.124939
-0.293434	a higher risk that	-0.124939
-0.342378	the loop buffer that	-0.124939
-0.292573	there is something that	-0.124939
-0.292573	is certainly something that	-0.124939
-0.583622	the clock counts that	-0.124939
-0.291713	"best case" counts that	-0.124939
-0.432730	it can happen that	-0.124939
-0.292573	can often happen that	-0.124939
-0.479528	primitive programming style that	-0.124939
-0.286094	member function, provided that	-0.124939
-0.286094	use branches, provided that	-0.124939
-0.236576	sure that everything that	-0.124939
-0.236576	make sure everything that	-0.124939
-0.236576	to eliminate everything that	-0.124939
-0.339112	column-wise. Assume now that	-0.124939
-0.795770	take into account that	-0.124939
-0.297520	up the factors that	-0.124939
-0.046022	are several factors that	-0.425969
-0.339112	the compiler explicitly that	-0.124939
-0.336794	gives a measure that	-0.124939
-0.337248	to disk. Software that	-0.124939
-0.336794	used the trick that	-0.124939
-0.337248	have some disadvantages that	-0.124939
-0.337702	has multiple instances that	-0.124939
-0.336341	are so expensive that	-0.124939
-0.255866	should be aware that	-0.425969
-0.336794	the runtime polymorphism that	-0.124939
-0.001688	in the sense that	-0.301030
-0.502651	requires, of course, that	-0.124939
-0.050984	you will notice that	-0.124939
-0.765698	can be expected that	-0.124939
-0.333198	They can detect that	-0.124939
-0.333723	table 9.1 show that	-0.124939
-0.050984	standard C, specifying that	-0.124939
-0.518387	called stack unwinding that	-0.124939
-0.329459	any other cleanup that	-0.124939
-0.102148	87 9.3 Functions that	-0.124939
-0.102148	CPUs". 9.3 Functions that	-0.124939
-0.441210	C/C++ standard specifies that	-0.124939
-0.235960	volatile keyword specifies that	-0.124939
-0.329459	allocated dynamically. Arrays that	-0.124939
-0.328835	it for lists that	-0.124939
-0.161972	in the event that	-0.425969
-0.328835	garbage collection. Objects that	-0.124939
-0.329459	memory areas. Data that	-0.124939
-0.330084	machine are frameworks that	-0.124939
-0.330710	const keyword tells that	-0.124939
-0.329459	have powerful facilities that	-0.124939
-0.330710	a constant divisor that	-0.124939
-0.282283	teachers to recommend that	-0.124939
-0.401148	programming textbooks recommend that	-0.124939
-0.322373	can roughly estimate that	-0.124939
-0.323141	can be said that	-0.124939
-0.047996	You may think that	-0.124939
-0.047996	and then think that	-0.124939
-0.047996	but I think that	-0.124939
-0.047996	I don't think that	-0.124939
-0.323911	more efficient alternatives that	-0.124939
-0.322373	offering profiling tools that	-0.124939
-0.676960	a hot spot that	-0.124939
-0.322373	have variable lengths that	-0.124939
-0.323141	the unfortunate consequence that	-0.124939
-0.092641	on the assumption that	-0.124939
-0.092641	compiler, the assumption that	-0.124939
-0.036907	it will recognize that	-0.124939
-0.036907	compiler will recognize that	-0.124939
-0.036907	compilers will recognize that	-0.124939
-0.323141	time applications. Remember that	-0.124939
-0.322373	of the considerations that	-0.124939
-0.075367	an initialization routine that	-0.124939
-0.322373	only one, auto_ptr that	-0.124939
-0.171263	we are assuming that	-0.124939
-0.171263	prevented from assuming that	-0.124939
-0.311811	#pragma optimize("a",on). Specifies that	-0.124939
-0.311811	crystal ball reveals that	-0.124939
-0.311811	with many labels that	-0.124939
-0.311811	scan instruction. Programmers that	-0.124939
-0.311811	another function F2 that	-0.124939
-0.312811	is it unusual that	-0.124939
-0.065251	64-bit systems. Applications that	-0.124939
-0.065251	user interface. Applications that	-0.124939
-0.065251	page 141. Applications that	-0.124939
-0.707579	linkage table (PLT) that	-0.124939
-0.311811	services. Many services that	-0.124939
-0.236296	{ // Check that	-0.124939
-0.171263	version. 2. Check that	-0.124939
-0.142519	etc. But beware that	-0.124939
-0.142519	inlined. But beware that	-0.124939
-0.291427	better. Remember again, that	-0.124939
-0.101963	dispatching and discovered that	-0.124939
-0.101963	programmers have discovered that	-0.124939
-0.291427	a 90% chance that	-0.124939
-0.291427	is a chip that	-0.124939
-0.047996	and I believe that	-0.124939
-0.047996	expected. I believe that	-0.124939
-0.101963	vector operations. Algorithms that	-0.124939
-0.101963	and matrixes. Algorithms that	-0.124939
-0.291427	based on hacks that	-0.124939
-0.101963	= C; Assuming that	-0.124939
-0.101963	it has. Assuming that	-0.124939
-0.101963	important to note that	-0.124939
-0.101963	manuals. Please note that	-0.124939
-0.291427	by random events that	-0.124939
-0.101963	simple expressions. Operations that	-0.124939
-0.101963	and aliasing. Operations that	-0.124939
-0.101963	vector register. Factors that	-0.124939
-0.101963	vectorization is. Factors that	-0.124939
-0.291427	useful on servers that	-0.124939
-0.101963	software optimization. Everything that	-0.124939
-0.101963	same register. Everything that	-0.124939
-0.291427	compiler may report that	-0.124939
-0.291427	CPUs to verify that	-0.124939
-0.291427	and other complications that	-0.124939
-0.291427	can therefore conclude that	-0.124939
-0.291427	cleaning up spaces that	-0.124939
-0.379418	and more complex, that	-0.124939
-0.023335	can only hope that	-0.124939
-0.379418	modify the ones that	-0.124939
-0.291427	a program saying that	-0.124939
-0.291427	integer vectors. Code that	-0.124939
-0.379418	known with certainty that	-0.124939
-0.291427	worst-case conditions. Programs that	-0.124939
-0.101963	C standard says that	-0.124939
-0.101963	usage convention says that	-0.124939
-0.291427	and then interpret that	-0.124939
-0.235457	has not noticed that	-0.124939
-0.235457	for making plug-ins that	-0.124939
-0.235457	processing speed exceeding that	-0.124939
-0.235457	the programmer forgets that	-0.124939
-0.235457	set into sub-vectors that	-0.124939
-0.235457	may seem illogical that	-0.124939
-0.235457	a virus scanner that	-0.124939
-0.235457	Example 14.27 assumes that	-0.124939
-0.235457	It is assumed that	-0.124939
-0.235457	is so kludgy that	-0.124939
-0.235457	important to remember that	-0.124939
-0.235457	header file mathimf.h that	-0.124939
-0.235457	The so-called iterators that	-0.124939
-0.235457	when you discover that	-0.124939
-0.235457	The common excuse that	-0.124939
-0.235457	increase the likelihood that	-0.124939
-0.235457	Quine–McCluskey or Espresso) that	-0.124939
-0.235457	and web browsing that	-0.124939
-0.235457	is no guarantee that	-0.124939
-0.235457	early planning stage that	-0.124939
-0.235457	the so-called CPU-dispatcher that	-0.124939
-0.235457	and later discovers that	-0.124939
-0.235457	important to realize that	-0.124939
-0.235457	doubled. Thin clients that	-0.124939
-0.235457	must be emphasized that	-0.124939
-0.235457	(*.dll or *.so) that	-0.124939
-0.235457	a strict formalism that	-0.124939
-0.235457	a program dictates that	-0.124939
-0.235457	bear in mind, that	-0.124939
-0.235457	it is unrealistic that	-0.124939
-0.235457	eliminate common subexpressions that	-0.124939
-0.235457	is, I guess, that	-0.124939
-0.235457	error condition. Things that	-0.124939
-0.235457	the compiler knows that	-0.124939
-0.235457	and the wires that	-0.124939
-0.235457	Some developers feel that	-0.124939
-0.235457	Linux. 82 Keywords that	-0.124939
-0.235457	difference, let's say that	-0.124939
-0.235457	has the complication that	-0.124939
-0.235457	costs to multithreading that	-0.124939
-0.235457	it is unlikely that	-0.124939
-0.235457	One may argue that	-0.124939
-0.235457	or from knowing that	-0.124939
-0.235457	Preprocessing directives (everything that	-0.124939
-0.529729	prefer a to be	-0.124939
-0.436786	a function to be	-0.124939
-0.436786	member function to be	-0.124939
-1.074295	the code to be	-0.124939
-1.549223	the compiler to be	-0.124939
-0.421702	that have to be	-0.124939
-0.718508	may have to be	-0.124939
-0.421702	data have to be	-0.124939
-0.421702	processors have to be	-0.124939
-0.421702	calculations have to be	-0.124939
-0.421702	versions have to be	-0.124939
-0.421702	values have to be	-0.124939
-0.233977	want this to be	-0.124939
-0.233977	expect this to be	-0.124939
-0.774426	the memory to be	-0.124939
-0.357247	that has to be	-0.124939
-0.357247	pointer has to be	-0.124939
-0.357247	line has to be	-0.124939
-0.357247	mode has to be	-0.124939
-0.357247	offset has to be	-0.124939
-0.483677	a number to be	-0.124939
-0.328860	the variable to be	-0.124939
-0.294159	other variables to be	-0.124939
-0.294159	allow variables to be	-0.124939
-0.659627	the table to be	-0.124939
-0.425721	the software to be	-0.124939
-0.258548	that need to be	-0.124939
-0.344793	not need to be	-0.124939
-0.258548	may need to be	-0.124939
-0.344793	libraries need to be	-0.124939
-0.344793	files need to be	-0.124939
-0.736957	is sure to be	-0.124939
-0.061860	turns out to be	-0.425969
-1.439867	you want to be	-0.124939
-0.328860	cache line to be	-0.124939
-0.124889	function parameters to be	-0.124939
-0.124889	four parameters to be	-0.124939
-0.057971	fourteen parameters to be	-0.425969
-0.028019	is known to be	-0.124939
-0.210802	is likely to be	-0.124939
-0.207393	are likely to be	-0.124939
-0.207393	very likely to be	-0.124939
-0.207393	less likely to be	-0.124939
-0.207393	therefore likely to be	-0.124939
-0.207393	equally likely to be	-0.124939
-0.369342	is certain to be	-0.124939
-0.257883	not certain to be	-0.124939
-0.257883	therefore certain to be	-0.124939
-0.425721	return addresses to be	-0.124939
-0.328860	many files to be	-0.124939
-0.235124	that needs to be	-0.124939
-0.302853	file needs to be	-0.124939
-0.302853	constant needs to be	-0.124939
-0.302853	list needs to be	-0.124939
-0.328860	point division to be	-0.124939
-1.475516	the programmer to be	-0.124939
-0.659627	is intended to be	-0.124939
-0.328860	the heap to be	-0.124939
-0.425721	the executable to be	-0.124939
-0.328860	the sequence to be	-0.124939
-0.084606	program happen to be	-0.124939
-0.084606	variables happen to be	-0.124939
-0.084606	matrix happen to be	-0.124939
-0.146537	long enough to be	-0.124939
-0.233977	be expected to be	-0.124939
-0.233977	are expected to be	-0.124939
-0.073759	is guaranteed to be	-0.124939
-0.162660	not guaranteed to be	-0.124939
-0.328860	metaprogramming tools to be	-0.124939
-0.425721	is going to be	-0.124939
-0.134019	it appears to be	-0.124939
-0.134019	this appears to be	-0.124939
-0.328860	function argument to be	-0.124939
-0.328860	of dangers to be	-0.124939
-0.328860	just happened to be	-0.124939
-0.035634	pointed to can be	-0.124939
-0.228298	code and can be	-0.124939
-0.206415	code that can be	-0.124939
-0.126850	cache that can be	-0.124939
-0.126850	way that can be	-0.124939
-0.126850	instructions that can be	-0.124939
-0.126850	language that can be	-0.124939
-0.058810	count that can be	-0.425969
-0.206415	applications that can be	-0.124939
-0.126850	expressions that can be	-0.124939
-0.126850	advantages that can be	-0.124939
-0.058810	things that can be	-0.124939
-0.126850	thing that can be	-0.124939
-0.206415	something that can be	-0.124939
-0.126850	alternatives that can be	-0.124939
-0.126850	chip that can be	-0.124939
-0.196952	and it can be	-0.124939
-0.188335	that it can be	-0.124939
-0.255288	then it can be	-0.124939
-0.152618	because it can be	-0.124939
-0.289148	but it can be	-0.124939
-0.075582	cases it can be	-0.124939
-0.167149	But it can be	-0.124939
-0.167149	least, it can be	-0.124939
-0.198977	the function can be	-0.124939
-0.198977	frame function can be	-0.124939
-0.198977	exponential function can be	-0.124939
-0.176968	the code can be	-0.124939
-0.117991	of code can be	-0.124939
-0.195795	The code can be	-0.124939
-0.117991	this code can be	-0.124939
-0.117991	same code can be	-0.124939
-0.117991	above code can be	-0.124939
-0.026010	} This can be	-0.301030
-0.154527	code. This can be	-0.124939
-0.083228	memory. This can be	-0.124939
-0.083228	x; This can be	-0.124939
-0.083228	pointer. This can be	-0.124939
-0.083228	operations. This can be	-0.124939
-0.083228	variable. This can be	-0.124939
-0.083228	fast. This can be	-0.124939
-0.083228	important. This can be	-0.124939
-0.083228	times. This can be	-0.124939
-0.083228	process. This can be	-0.124939
-0.083228	b2; This can be	-0.124939
-0.083228	only. This can be	-0.124939
-0.083228	CPUs"). This can be	-0.124939
-0.083228	free. This can be	-0.124939
-0.154527	modified. This can be	-0.124939
-0.083228	saturated. This can be	-0.124939
-0.083228	it). This can be	-0.124939
-0.083228	scheduler. This can be	-0.124939
-0.083228	32-62. This can be	-0.124939
-0.083228	place. This can be	-0.124939
-0.164369	case x can be	-0.124939
-0.251885	how this can be	-0.124939
-0.164369	load time can be	-0.124939
-0.164369	cache use can be	-0.124939
-0.246269	does It can be	-0.124939
-0.246269	operations. It can be	-0.124939
-0.246269	CPU. It can be	-0.124939
-0.228298	public data can be	-0.124939
-0.331469	each vector can be	-0.124939
-0.331469	The same can be	-0.124939
-0.138205	intrinsic functions can be	-0.124939
-0.138205	missing functions can be	-0.124939
-0.164369	prefetch instruction can be	-0.124939
-0.361605	the loop can be	-0.124939
-0.068897	CPU which can be	-0.124939
-0.068897	i which can be	-0.124939
-0.068897	register which can be	-0.124939
-0.068897	instructions which can be	-0.124939
-0.068897	references, which can be	-0.124939
-0.068897	attribute which can be	-0.124939
-0.068897	YMM) which can be	-0.124939
-0.074454	to integer can be	-0.124939
-0.074454	an integer can be	-0.124939
-0.251885	the set can be	-0.124939
-0.164369	template class can be	-0.124939
-0.035634	this example can be	-0.124939
-0.349715	these compilers can be	-0.124939
-0.074454	variable size can be	-0.124939
-0.074454	>= size can be	-0.124939
-0.048192	a pointer can be	-0.124939
-0.048192	A pointer can be	-0.124939
-0.048192	link pointer can be	-0.124939
-0.164369	on b can be	-0.124939
-0.082244	dynamic library can be	-0.124939
-0.107932	interface library can be	-0.124939
-0.048192	the object can be	-0.124939
-0.048192	shared object can be	-0.124939
-0.048192	existing object can be	-0.124939
-0.099317	simple array can be	-0.124939
-0.099317	large array can be	-0.124939
-0.099317	An array can be	-0.124939
-0.077554	of objects can be	-0.124939
-0.077554	when objects can be	-0.124939
-0.077554	many objects can be	-0.124939
-0.077554	new objects can be	-0.124939
-0.048192	a variable can be	-0.124939
-0.048192	A variable can be	-0.124939
-0.048192	induction variable can be	-0.124939
-0.138205	of variables can be	-0.124939
-0.138205	Integer variables can be	-0.124939
-0.164369	of 2 can be	-0.124939
-0.228298	The performance can be	-0.124939
-0.220088	A branch can be	-0.124939
-0.138205	optimal branch can be	-0.124939
-0.164369	are stored can be	-0.124939
-0.074454	the address can be	-0.124939
-0.074454	target address can be	-0.124939
-0.164369	carry bit can be	-0.124939
-0.074454	same register can be	-0.124939
-0.074454	XMM register can be	-0.124939
-0.035634	Function libraries can be	-0.425969
-0.074454	invalid pointers can be	-0.124939
-0.074454	Smart pointers can be	-0.124939
-0.228298	and they can be	-0.124939
-0.017451	This method can be	-0.124939
-0.035634	same method can be	-0.124939
-0.035634	similar method can be	-0.124939
-0.164369	data access can be	-0.124939
-0.035634	operating system can be	-0.124939
-0.164369	a file can be	-0.124939
-0.164369	oriented programming can be	-0.124939
-0.074454	of operations can be	-0.124939
-0.074454	Boolean operations can be	-0.124939
-0.164369	composite type can be	-0.124939
-0.251885	non-Intel processors can be	-0.124939
-0.164369	processors available can be	-0.124939
-0.074454	a constant can be	-0.124939
-0.074454	A constant can be	-0.124939
-0.164369	the stack can be	-0.124939
-0.264177	Intel CPUs can be	-0.124939
-0.164369	and arrays can be	-0.124939
-0.164369	caches work can be	-0.124939
-0.164369	function calls can be	-0.124939
-0.164369	the result can be	-0.124939
-0.164369	unused bytes can be	-0.124939
-0.164369	branches inside can be	-0.124939
-0.074454	This problem can be	-0.124939
-0.074454	safety problem can be	-0.124939
-0.164369	sorted list can be	-0.124939
-0.164369	the hardware can be	-0.124939
-0.164369	unwinding information can be	-0.124939
-0.164369	{ ... can be	-0.124939
-0.011556	loop counter can be	-0.602060
-0.035634	stamp counter can be	-0.124939
-0.331469	memory allocation can be	-0.124939
-0.164369	then both can be	-0.124939
-0.164369	oriented programs can be	-0.124939
-0.164369	memory space can be	-0.124939
-0.164369	automatic dispatching can be	-0.124939
-0.164369	that branches can be	-0.124939
-0.164369	the multiplication can be	-0.124939
-0.074454	instruction sets can be	-0.124939
-0.074454	32 sets can be	-0.124939
-0.164369	mixed implementation can be	-0.124939
-0.164369	exception handling can be	-0.124939
-0.164369	data members can be	-0.124939
-0.164369	or reference can be	-0.124939
-0.164369	register keyword can be	-0.124939
-0.164369	table lookup can be	-0.124939
-0.164369	WTL applications can be	-0.124939
-0.138205	dispatching mechanism can be	-0.124939
-0.138205	dispatch mechanism can be	-0.124939
-0.164369	point numbers can be	-0.124939
-0.361605	A union can be	-0.124939
-0.164369	copy constructor can be	-0.124939
-0.164369	code section can be	-0.124939
-0.164369	cache contentions can be	-0.124939
-0.228298	These conversions can be	-0.124939
-0.228298	general statement can be	-0.124939
-0.164369	programming languages can be	-0.124939
-0.164369	These costs can be	-0.124939
-0.164369	two constants can be	-0.124939
-0.164369	the offset can be	-0.124939
-0.164369	This effect can be	-0.124939
-0.164369	These counters can be	-0.124939
-0.164369	program loading can be	-0.124939
-0.074454	the condition can be	-0.124939
-0.074454	if condition can be	-0.124939
-0.164369	critical stride can be	-0.124939
-0.164369	how metaprogramming can be	-0.124939
-0.164369	hash map can be	-0.124939
-0.035634	dependency chains can be	-0.124939
-0.361605	test tool can be	-0.124939
-0.164369	Lazy binding can be	-0.124939
-0.164369	a DLL can be	-0.124939
-0.164369	Lookup tables can be	-0.124939
-0.164369	data sections can be	-0.124939
-0.138205	variables. They can be	-0.124939
-0.138205	branches. They can be	-0.124939
-0.164369	if exceptions can be	-0.124939
-0.164369	these manuals can be	-0.124939
-0.164369	Such units can be	-0.124939
-0.164369	this polynomial can be	-0.124939
-0.164369	example 13.1 can be	-0.124939
-0.164369	address. Pointers can be	-0.124939
-0.164369	wasteful behavior can be	-0.124939
-0.164369	and edx can be	-0.124939
-0.164369	background job can be	-0.124939
-0.164369	of abc can be	-0.124939
-0.035634	upper limit can be	-0.124939
-0.164369	following guidelines can be	-0.124939
-0.164369	reasonable estimate can be	-0.124939
-0.164369	code. Metaprogramming can be	-0.124939
-0.164369	example 12.4b can be	-0.124939
-0.164369	sizes Integers can be	-0.124939
-0.164369	following techniques can be	-0.124939
-0.164369	higher resolution can be	-0.124939
-0.164369	C++ projects can be	-0.124939
-0.164369	example 14.28 can be	-0.124939
-0.164369	Multiple divisions can be	-0.124939
-0.164369	and s3 can be	-0.124939
-0.164369	interface etc., can be	-0.124939
-0.164369	arrays. Strings can be	-0.124939
-0.164369	are read-only can be	-0.124939
-0.164369	subexpression c+b can be	-0.124939
-0.164369	same chip can be	-0.124939
-0.164369	The formats can be	-0.124939
-0.164369	cache miss can be	-0.124939
-0.164369	jumps Jumps can be	-0.124939
-0.164369	two parentheses can be	-0.124939
-0.164369	result (b+c) can be	-0.124939
-0.164369	This dilemma can be	-0.124939
-0.164369	following work-around can be	-0.124939
-0.164369	example 8.24 can be	-0.124939
-0.164369	time. (Examples can be	-0.124939
-0.164369	= !a; can be	-0.124939
-0.164369	zero. Zero can be	-0.124939
-0.049562	it may not be	-0.124939
-0.081105	compiler may not be	-0.602060
-0.032405	It may not be	-0.124939
-0.105518	memory may not be	-0.124939
-0.105518	functions may not be	-0.124939
-0.105518	cache may not be	-0.124939
-0.105518	alloca may not be	-0.124939
-0.105518	exit may not be	-0.124939
-0.105518	sticks may not be	-0.124939
-0.591394	it will not be	-0.124939
-0.373780	code will not be	-0.124939
-0.373780	program will not be	-0.124939
-0.134878	that should not be	-0.425969
-0.324954	class need not be	-0.124939
-0.478109	should therefore not be	-0.124939
-0.324954	it might not be	-0.124939
-0.280155	powerful and may be	-0.124939
-0.058815	variables that may be	-0.425969
-0.126862	instructions that may be	-0.124939
-0.126862	cleanup that may be	-0.124939
-0.049621	and it may be	-0.425969
-0.059253	then it may be	-0.191886
-0.024102	but it may be	-0.124939
-0.181077	example, it may be	-0.124939
-0.105652	optimization it may be	-0.124939
-0.105652	case it may be	-0.124939
-0.105652	But it may be	-0.124939
-0.105652	arrays, it may be	-0.124939
-0.105652	it, it may be	-0.124939
-0.165528	same function may be	-0.124939
-0.165528	critical function may be	-0.124939
-0.280155	error code may be	-0.124939
-0.231591	memory. This may be	-0.124939
-0.231591	72 This may be	-0.124939
-0.231591	reduced. This may be	-0.124939
-0.231591	45. This may be	-0.124939
-0.940969	the compiler may be	-0.124939
-0.208751	extra time may be	-0.124939
-0.171252	pointer. It may be	-0.124939
-0.171252	vector. It may be	-0.124939
-0.171252	space. It may be	-0.124939
-0.171252	anyway. It may be	-0.124939
-0.171252	here. It may be	-0.124939
-0.171252	predictable. It may be	-0.124939
-0.171252	profile. It may be	-0.124939
-0.171252	high. It may be	-0.124939
-0.171252	unavoidable. It may be	-0.124939
-0.452539	the program may be	-0.124939
-0.320804	results, which may be	-0.124939
-0.208751	security, but may be	-0.124939
-0.208751	An integer may be	-0.124939
-0.306827	future compilers may be	-0.124939
-0.208751	smart pointer may be	-0.124939
-0.208751	interface library may be	-0.124939
-0.043547	then there may be	-0.124939
-0.043547	because there may be	-0.124939
-0.043547	so there may be	-0.124939
-0.043547	However, there may be	-0.124939
-0.074925	dispatching There may be	-0.124939
-0.074925	user. There may be	-0.124939
-0.074925	faster. There may be	-0.124939
-0.074925	36. There may be	-0.124939
-0.074925	inheritance. There may be	-0.124939
-0.208751	Global variables may be	-0.124939
-0.280155	this table may be	-0.124939
-0.280155	use pointers may be	-0.124939
-0.208751	or they may be	-0.124939
-0.091952	This method may be	-0.124939
-0.091952	vector method may be	-0.124939
-0.208751	network access may be	-0.124939
-0.208751	few times may be	-0.124939
-0.208751	64-bit Windows may be	-0.124939
-0.208751	program execution may be	-0.124939
-0.253303	virtual processor may be	-0.124939
-0.165528	M processor may be	-0.124939
-0.208751	www.agner.org/optimize/cppexamples.zip. These may be	-0.124939
-0.208751	A calculation may be	-0.124939
-0.208751	efficient solution may be	-0.124939
-0.208751	repeat count may be	-0.124939
-0.208751	memory allocation may be	-0.124939
-0.208751	and multiplication may be	-0.124939
-0.208751	following methods may be	-0.124939
-0.208751	or reference may be	-0.124939
-0.208751	unwinding mechanism may be	-0.124939
-0.091952	simple constructor may be	-0.124939
-0.091952	copy constructor may be	-0.124939
-0.208751	Some modules may be	-0.124939
-0.208751	a network may be	-0.124939
-0.398352	clock frequency may be	-0.124939
-0.208751	The usability may be	-0.124939
-0.208751	memory. They may be	-0.124939
-0.208751	just-in-time compilation may be	-0.124939
-0.208751	parameter. Templates may be	-0.124939
-0.208751	binary tree may be	-0.124939
-0.208751	Bitfields Bitfields may be	-0.124939
-0.208751	* 2.5 may be	-0.124939
-0.208751	the tolerance may be	-0.124939
-0.340498	of a will be	-0.124939
-0.925429	then it will be	-0.124939
-0.387546	virtual function will be	-0.124939
-0.465549	the code will be	-0.124939
-0.293281	resulting code will be	-0.124939
-0.293281	resultant code will be	-0.124939
-0.457060	functionality. This will be	-0.124939
-0.445417	program, you will be	-0.124939
-0.340498	No memory will be	-0.124939
-0.275570	the program will be	-0.124939
-0.183634	application program will be	-0.124939
-0.183634	entire program will be	-0.124939
-0.479433	the cache will be	-0.124939
-0.259389	negative integer will be	-0.124939
-0.259389	same class will be	-0.124939
-0.340498	of b will be	-0.124939
-0.259389	and there will be	-0.124939
-0.259389	} There will be	-0.124939
-0.259389	linear array will be	-0.124939
-0.259389	and objects will be	-0.124939
-0.259389	which variables will be	-0.124939
-0.479433	a branch will be	-0.124939
-0.340498	the user will be	-0.124939
-0.070528	the result will be	-0.124939
-0.070528	The result will be	-0.124939
-0.070528	final result will be	-0.124939
-0.259389	the speed will be	-0.124939
-0.259389	cache line will be	-0.124939
-0.259389	this multiplication will be	-0.124939
-0.259389	Code caching will be	-0.124939
-0.259389	code section will be	-0.124939
-0.479433	in main will be	-0.124939
-0.340498	& operation will be	-0.124939
-0.259389	whether vectorization will be	-0.124939
-0.259389	only constants will be	-0.124939
-0.259389	template instances will be	-0.124939
-0.259389	to 0x273F will be	-0.124939
-0.259389	constant 3.5 will be	-0.124939
-0.259389	new today will be	-0.124939
-0.259389	static modifier will be	-0.124939
-0.259389	of b+c will be	-0.124939
-0.259389	page 103) will be	-0.124939
-0.259389	example, b*2.0/3.0 will be	-0.124939
-0.500285	code can then be	-0.124939
-0.355542	operation will then be	-0.124939
-0.350425	which can only be	-0.124939
-0.350425	otherwise can only be	-0.124939
-0.378411	will not only be	-0.124939
-0.378411	would not only be	-0.124939
-0.344788	unrolling should only be	-0.124939
-0.357847	d would all be	-0.124939
-0.219056	where it should be	-0.124939
-0.242133	System code should be	-0.124939
-0.156381	calculations. This should be	-0.124939
-0.298635	but you should be	-0.124939
-0.298635	Here, you should be	-0.124939
-0.167833	available. It should be	-0.124939
-0.167833	tools. It should be	-0.124939
-0.349090	The program should be	-0.124939
-0.156381	or class should be	-0.124939
-0.156381	this example should be	-0.124939
-0.156381	and b should be	-0.124939
-0.219056	commas. There should be	-0.124939
-0.034138	multidimensional array should be	-0.124939
-0.342310	software. You should be	-0.124939
-0.156381	The table should be	-0.124939
-0.156381	software performance should be	-0.124939
-0.156381	All software should be	-0.124939
-0.156381	loop branch should be	-0.124939
-0.242133	The test should be	-0.124939
-0.156381	Web systems should be	-0.124939
-0.156381	or method should be	-0.124939
-0.156381	a constant should be	-0.124939
-0.071189	Big arrays should be	-0.124939
-0.071189	Multidimensional arrays should be	-0.124939
-0.071189	multiple versions should be	-0.124939
-0.071189	three versions should be	-0.124939
-0.156381	16 bytes should be	-0.124939
-0.156381	boxes, etc. should be	-0.124939
-0.156381	of programs should be	-0.124939
-0.156381	These problems should be	-0.124939
-0.156381	the dispatching should be	-0.124939
-0.156381	template parameter should be	-0.124939
-0.156381	and resources should be	-0.124939
-0.008286	used together should be	-0.726999
-0.156381	intermediate results should be	-0.124939
-0.156381	Thread-local storage should be	-0.124939
-0.156381	few lines should be	-0.124939
-0.156381	and output should be	-0.124939
-0.219056	inside containers should be	-0.124939
-0.071189	for updates should be	-0.124939
-0.071189	program updates should be	-0.124939
-0.156381	This penalty should be	-0.124939
-0.156381	clock counts should be	-0.124939
-0.219056	software developers should be	-0.124939
-0.156381	Accessibility guidelines should be	-0.124939
-0.071189	A queue should be	-0.124939
-0.071189	FIFO queue should be	-0.124939
-0.156381	following considerations should be	-0.124939
-0.156381	are modified should be	-0.124939
-0.156381	User feedback should be	-0.124939
-0.156381	mathematical calculations, should be	-0.124939
-0.156381	file formats should be	-0.124939
-0.156381	and servers should be	-0.124939
-0.156381	bits wide, should be	-0.124939
-0.156381	any function) should be	-0.124939
-0.156381	solutions. Patches should be	-0.124939
-0.156381	User complaints should be	-0.124939
-0.156381	protection scheme should be	-0.124939
-0.156381	which imprecisions should be	-0.124939
-0.156381	.cpp file) should be	-0.124939
-0.156381	or malloc/free should be	-0.124939
-0.084106	it can also be	-0.124939
-0.097295	It can also be	-0.124939
-0.084106	variables can also be	-0.124939
-0.084106	branch can also be	-0.124939
-0.084106	systems can also be	-0.124939
-0.084106	classes can also be	-0.124939
-0.084106	parameter can also be	-0.124939
-0.084106	union can also be	-0.124939
-0.084106	pattern can also be	-0.124939
-0.084106	(arrays can also be	-0.124939
-0.259992	There may also be	-0.124939
-0.259992	map may also be	-0.124939
-0.207304	function should also be	-0.124939
-0.207304	video should also be	-0.124939
-0.280336	coprocessor might also be	-0.124939
-0.356987	that all software be	-0.124939
-0.199130	inlined or cannot be	-0.124939
-0.329496	that it cannot be	-0.124939
-0.226741	if it cannot be	-0.124939
-0.308390	the function cannot be	-0.124939
-0.199130	intermediate code cannot be	-0.124939
-0.556808	then you cannot be	-0.124939
-0.199130	MOVNTQ instruction cannot be	-0.124939
-0.199130	function which cannot be	-0.124939
-0.199130	final size cannot be	-0.124939
-0.199130	An object cannot be	-0.124939
-0.199130	A variable cannot be	-0.124939
-0.372093	thread. You cannot be	-0.124939
-0.199130	memory address cannot be	-0.124939
-0.199130	program optimization cannot be	-0.124939
-0.226741	because they cannot be	-0.124939
-0.226741	where they cannot be	-0.124939
-0.199130	complicated cases cannot be	-0.124939
-0.199130	access times cannot be	-0.124939
-0.199130	Intel CPUs cannot be	-0.124939
-0.199130	it work cannot be	-0.124939
-0.199130	the name cannot be	-0.124939
-0.199130	network resources cannot be	-0.124939
-0.199130	Table lookup cannot be	-0.124939
-0.199130	dynamic linking cannot be	-0.124939
-0.199130	point operands cannot be	-0.124939
-0.199130	is loaded cannot be	-0.124939
-0.356842	may neverthe- less be	-0.124939
-0.259693	objects can often be	-0.124939
-0.259693	operation can often be	-0.124939
-0.259693	queries can often be	-0.124939
-0.467859	language will often be	-0.124939
-0.346756	denominator can even be	-0.124939
-0.488073	memory may even be	-0.124939
-0.460074	size should always be	-0.124939
-0.994291	in most cases be	-0.124939
-0.653099	in some cases be	-0.124939
-0.164102	that a must be	-0.124939
-0.164102	framework that must be	-0.124939
-0.313577	However, you must be	-0.124939
-0.164102	manually. It must be	-0.124939
-0.164102	MOVNTQ instruction must be	-0.124939
-0.164102	cannot do must be	-0.124939
-0.164102	of i must be	-0.124939
-0.164102	an object must be	-0.124939
-0.164102	carry bit must be	-0.124939
-0.164102	because they must be	-0.124939
-0.164102	multiple threads must be	-0.124939
-0.164102	inequality sign must be	-0.124939
-0.164102	interface framework must be	-0.124939
-0.164102	XMM vectors must be	-0.124939
-0.164102	copy constructor must be	-0.124939
-0.164102	137 errors must be	-0.124939
-0.164102	This index must be	-0.124939
-0.164102	// SIZE must be	-0.124939
-0.164102	The pragmas must be	-0.124939
-0.164102	if any, must be	-0.124939
-0.164102	for correctness must be	-0.124939
-0.275269	It can therefore be	-0.124939
-0.275269	allocation can therefore be	-0.124939
-0.392197	examples will therefore be	-0.124939
-0.505794	You should therefore be	-0.124939
-0.359482	binding should therefore be	-0.124939
-0.535029	Can the container be	-0.124939
-0.075558	when it would be	-0.124939
-0.075558	but it would be	-0.124939
-0.231451	the compiler would be	-0.124939
-0.335472	because this would be	-0.124939
-0.167090	The loop would be	-0.124939
-0.167090	{} which would be	-0.124939
-0.167090	induction variable would be	-0.124939
-0.167090	cache line would be	-0.124939
-0.167090	the parameters would be	-0.124939
-0.167090	ultimate solution would be	-0.124939
-0.167090	to metaprogramming would be	-0.124939
-0.231451	particular reduction would be	-0.124939
-0.167090	they otherwise would be	-0.124939
-0.167090	the logarithm would be	-0.124939
-0.167090	then sizeof(S1) would be	-0.124939
-0.354732	will most likely be	-0.124939
-0.165924	dimension may preferably be	-0.124939
-0.018788	function should preferably be	-0.124939
-0.018788	loop should preferably be	-0.124939
-0.018788	object should preferably be	-0.124939
-0.006173	objects should preferably be	-0.301030
-0.018788	test should preferably be	-0.124939
-0.018788	list should preferably be	-0.124939
-0.018788	counter should preferably be	-0.124939
-0.018788	count should preferably be	-0.124939
-0.018788	statements should preferably be	-0.124939
-0.018788	unrolling should preferably be	-0.124939
-0.018788	device should preferably be	-0.124939
-0.018788	interrupt should preferably be	-0.124939
-0.110398	should therefore preferably be	-0.124939
-0.128291	i can never be	-0.124939
-0.128291	We can never be	-0.124939
-0.439536	a will never be	-0.124939
-0.454849	set may actually be	-0.124939
-0.552009	may in fact be	-0.124939
-0.102450	optimization can sometimes be	-0.124939
-0.102450	conversions can sometimes be	-0.124939
-0.102450	Divisions can sometimes be	-0.124939
-0.102450	139 can sometimes be	-0.124939
-0.257714	loop can still be	-0.124939
-0.257714	above can still be	-0.124939
-0.277157	0x273F would still be	-0.124939
-0.361866	that can possibly be	-0.124939
-0.252088	code can possibly be	-0.124939
-0.229015	size may possibly be	-0.124939
-0.229015	methods could possibly be	-0.124939
-0.367478	should of course be	-0.124939
-0.367478	would of course be	-0.124939
-0.327282	Or it might be	-0.124939
-0.248396	then this might be	-0.124939
-0.248396	solution. It might be	-0.124939
-0.237754	the function could be	-0.124939
-0.237754	the portability could be	-0.124939
-0.237754	that r+i/2 could be	-0.124939
-0.341539	code can now be	-0.124939
-0.341593	that can easily be	-0.124939
-0.097986	problem cannot easily be	-0.124939
-0.097986	algorithms, cannot easily be	-0.124939
-0.434670	manual will soon be	-0.124939
-0.047225	cases should definitely be	-0.124939
-0.047225	framework should definitely be	-0.124939
-0.047225	containers should definitely be	-0.124939
-0.325042	c; } Can be	-0.124939
-0.314416	code can probably be	-0.124939
-0.293913	address which can't be	-0.124939
-0.237641	may some day be	-0.124939
-0.237641	// Dispatcher. Will be	-0.124939
-0.850358	they point to are	-0.124939
-0.655592	the stack and are	-0.124939
-0.356786	cache space and are	-0.124939
-0.356786	to evaluate and are	-0.124939
-0.927535	of code that are	-0.124939
-0.230851	of data that are	-0.124939
-0.230851	while data that are	-0.124939
-0.943486	the program that are	-0.124939
-0.326965	the functions that are	-0.124939
-0.200984	of functions that are	-0.124939
-0.200984	for functions that are	-0.124939
-0.200984	if functions that are	-0.124939
-0.200984	other functions that are	-0.124939
-0.200984	several functions that are	-0.124939
-0.200984	few functions that are	-0.124939
-0.487733	the compilers that are	-0.124939
-0.418463	and objects that are	-0.124939
-0.259524	the variables that are	-0.124939
-0.259524	for variables that are	-0.124939
-0.259524	that variables that are	-0.124939
-0.100264	function libraries that are	-0.124939
-0.323044	with pointers that are	-0.124939
-0.502279	specific instructions that are	-0.124939
-0.418463	with CPUs that are	-0.124939
-0.418463	to arrays that are	-0.124939
-0.323044	Function parameters that are	-0.124939
-0.471480	for programs that are	-0.124939
-0.475393	making branches that are	-0.124939
-0.323044	data members that are	-0.124939
-0.323044	for constants that are	-0.124939
-0.323044	other tasks that are	-0.124939
-0.039744	efficient. Variables that are	-0.124939
-0.039744	storage Variables that are	-0.124939
-0.039744	are: Variables that are	-0.124939
-0.039744	storage. Variables that are	-0.124939
-0.019418	9.4 Variables that are	-0.425969
-0.061069	9.3 Functions that are	-0.425969
-0.323044	dynamically. Arrays that are	-0.124939
-0.323044	for lists that are	-0.124939
-0.323044	collection. Objects that are	-0.124939
-0.323044	areas. Data that are	-0.124939
-0.323044	variable lengths that are	-0.124939
-0.323044	the considerations that are	-0.124939
-0.455576	interface. Applications that are	-0.124939
-0.418463	matrixes. Algorithms that are	-0.124939
-0.323044	random events that are	-0.124939
-0.132153	expressions. Operations that are	-0.124939
-0.132153	aliasing. Operations that are	-0.124939
-0.323044	up spaces that are	-0.124939
-0.323044	the ones that are	-0.124939
-0.323044	so-called iterators that are	-0.124939
-0.369265	inside a function are	-0.425969
-0.457793	of any function are	-0.124939
-0.354329	an overloaded function are	-0.124939
-1.138709	in the code are	-0.124939
-0.780027	the program code are	-0.124939
-0.901595	the critical code are	-0.124939
-0.534878	instruction that you are	-0.124939
-0.323526	that if you are	-0.124939
-0.323526	only if you are	-0.124939
-0.323526	loop if you are	-0.124939
-0.323526	unsigned if you are	-0.124939
-0.323526	section if you are	-0.124939
-0.323526	organized if you are	-0.124939
-0.323526	explanation if you are	-0.124939
-0.323526	anyway if you are	-0.124939
-0.323526	aliasing" if you are	-0.124939
-0.390885	the code you are	-0.124939
-0.751085	long as you are	-0.124939
-0.550443	the compiler you are	-0.124939
-0.300867	first when you are	-0.124939
-0.300867	counters when you are	-0.124939
-0.300867	readable when you are	-0.124939
-0.562272	supports then you are	-0.124939
-0.336463	systems. If you are	-0.124939
-0.336463	set. If you are	-0.124939
-0.336463	u; If you are	-0.124939
-0.336463	branches. If you are	-0.124939
-0.473885	results. If you are	-0.124939
-0.336463	methods. If you are	-0.124939
-0.336463	(www.intel.com). If you are	-0.124939
-0.425620	make sure you are	-0.124939
-0.390885	of whether you are	-0.124939
-0.124884	CPUs unless you are	-0.425969
-0.300867	X, unless you are	-0.124939
-0.450546	know what you are	-0.124939
-0.550443	vector library, you are	-0.124939
-0.577397	and the data are	-0.124939
-0.199074	if the data are	-0.301030
-0.409372	when the data are	-0.124939
-0.409372	where the data are	-0.124939
-0.520664	code and data are	-0.124939
-0.320610	require that data are	-0.124939
-0.415433	poor if data are	-0.124939
-0.060736	efficiently when data are	-0.425969
-0.320610	in which data are	-0.124939
-0.452276	because static data are	-0.124939
-1.572600	of the program are	-0.124939
-0.845216	compile the program are	-0.124939
-0.490215	well-structured C++ program are	-0.124939
-0.886527	console mode program are	-0.124939
-0.170752	of the functions are	-0.124939
-0.381120	memory. The functions are	-0.124939
-0.122184	these two functions are	-0.124939
-0.122184	These two functions are	-0.124939
-0.813077	Virtual member functions are	-0.124939
-0.415052	Unfortunately, these functions are	-0.124939
-0.292817	compiler. Some functions are	-0.124939
-0.292817	most important functions are	-0.124939
-0.122184	commpage. These functions are	-0.124939
-0.122184	_mm. These functions are	-0.124939
-0.444246	If virtual functions are	-0.124939
-0.555627	and mathematical functions are	-0.124939
-0.394468	advanced mathematical functions are	-0.124939
-0.292817	optimized. Library functions are	-0.124939
-0.415052	functions Virtual functions are	-0.124939
-0.268887	code. Intrinsic functions are	-0.124939
-0.268887	instructions. Intrinsic functions are	-0.124939
-0.381120	compiler). Fastcall functions are	-0.124939
-0.292817	program. Small functions are	-0.124939
-0.292817	function. Sometimes, functions are	-0.124939
-0.292817	function. Leaf functions are	-0.124939
-0.525702	near each other are	-0.425969
-1.347401	inside the loop are	-0.124939
-0.494030	costly and which are	-0.124939
-0.336082	that functions which are	-0.124939
-0.336082	profilers available which are	-0.124939
-0.336082	and directives which are	-0.124939
-0.336082	three conditions which are	-0.124939
-0.336082	MMX registers, which are	-0.124939
-0.336082	point comparisons, which are	-0.124939
-0.357268	specific order but are	-0.124939
-0.794683	and data cache are	-0.124939
-1.223627	the level-2 cache are	-0.124939
-0.983599	the level-1 cache are	-0.124939
-1.068369	AVX-512 instruction set are	-0.124939
-0.721186	in a class are	-0.124939
-0.828684	inside a class are	-0.124939
-0.532500	and child class are	-0.124939
-0.592164	a derived class are	-0.124939
-0.374247	and derived class are	-0.124939
-0.519867	reductions the compilers are	-0.124939
-0.317356	compiler. The compilers are	-0.124939
-0.317356	so. The compilers are	-0.124939
-0.307535	Fortunately, all compilers are	-0.124939
-0.307535	Some 64-bit compilers are	-0.124939
-0.784490	Most C++ compilers are	-0.124939
-0.399227	and Gnu compilers are	-0.124939
-0.549184	time. Some compilers are	-0.124939
-0.307535	Some common compilers are	-0.124939
-0.307535	Unfortunately, few compilers are	-0.124939
-0.307535	and Watcom compilers are	-0.124939
-0.307535	73). Current compilers are	-0.124939
-0.307535	enabled. Few compilers are	-0.124939
-0.356893	of 256-bit size are	-0.124939
-0.646414	a and b are	-0.124939
-0.758683	of each object are	-0.124939
-0.199085	order and there are	-0.124939
-0.199085	limited and there are	-0.124939
-0.199085	dominating and there are	-0.124939
-0.201595	and that there are	-0.124939
-0.297876	assume that there are	-0.124939
-0.201595	Note that there are	-0.124939
-0.297876	aware that there are	-0.124939
-0.201595	discovered that there are	-0.124939
-0.201595	discover that there are	-0.124939
-0.236885	used if there are	-0.124939
-0.236885	cache if there are	-0.124939
-0.102488	pointers if there are	-0.425969
-0.236885	programs if there are	-0.124939
-0.102488	safe if there are	-0.124939
-0.236885	separately if there are	-0.124939
-0.236885	calls, if there are	-0.124939
-0.262945	blocks than there are	-0.124939
-0.288550	differently because there are	-0.124939
-0.175009	mode. If there are	-0.124939
-0.078748	again. If there are	-0.124939
-0.232537	code, but there are	-0.124939
-0.232537	devices, but there are	-0.124939
-0.262945	however, where there are	-0.124939
-0.156632	function. But there are	-0.124939
-0.156632	integers. But there are	-0.124939
-0.087071	possible. However, there are	-0.124939
-0.087071	120 However, there are	-0.124939
-0.087071	automatically. However, there are	-0.124939
-0.087071	are. However, there are	-0.124939
-0.740066	some cases, there are	-0.124939
-0.194107	ArrayOfStructures[100]; Here, there are	-0.124939
-0.194107	all. Fortunately, there are	-0.124939
-0.262945	units. Typically, there are	-0.124939
-0.194107	be avoided, there are	-0.124939
-0.066921	of compiler There are	-0.124939
-0.066921	by compiler There are	-0.124939
-0.146065	mathematical code. There are	-0.124939
-0.066921	same time. There are	-0.124939
-0.066921	extra time. There are	-0.124939
-0.146065	memcpy function. There are	-0.124939
-0.146065	size, etc. There are	-0.124939
-0.146065	less efficient. There are	-0.124939
-0.146065	explained below. There are	-0.124939
-0.207161	future processors. There are	-0.124939
-0.146065	for vectors There are	-0.124939
-0.146065	other resources. There are	-0.124939
-0.146065	function calls. There are	-0.124939
-0.146065	support it. There are	-0.124939
-0.146065	of registers. There are	-0.124939
-0.146065	in performance. There are	-0.124939
-0.146065	control instructions. There are	-0.124939
-0.146065	same address. There are	-0.124939
-0.146065	or not. There are	-0.124939
-0.146065	is enabled. There are	-0.124939
-0.146065	Boolean expressions. There are	-0.124939
-0.146065	unaligned arrays. There are	-0.124939
-0.146065	point vectors. There are	-0.124939
-0.146065	CPU core. There are	-0.124939
-0.146065	multiple threads. There are	-0.124939
-0.146065	present manual. There are	-0.124939
-0.146065	by 8. There are	-0.124939
-0.146065	(*.dll, *.so). There are	-0.124939
-0.146065	not optimal. There are	-0.124939
-0.146065	and maintenance There are	-0.124939
-0.146065	code explicitly. There are	-0.124939
-0.146065	in Windows). There are	-0.124939
-0.146065	AMD CodeAnalyst. There are	-0.124939
-0.146065	following way: There are	-0.124939
-0.146065	to security. There are	-0.124939
-0.146065	very limited. There are	-0.124939
-0.146065	number 0x1C. There are	-0.124939
-0.146065	Memory copying. There are	-0.124939
-0.146065	to post-increment. There are	-0.124939
-0.146065	overflow check. There are	-0.124939
-0.146065	"Instruction tables". There are	-0.124939
-0.146065	save power. There are	-0.124939
-0.146065	= -abs(x);. There are	-0.124939
-0.146065	it uses. There are	-0.124939
-0.146065	and 2B. There are	-0.124939
-0.146065	than normally. There are	-0.124939
-0.146065	floating point). There are	-0.124939
-0.356886	that the objects are	-0.124939
-0.502163	if the objects are	-0.124939
-0.799876	Variables and objects are	-0.124939
-0.370008	time. The objects are	-0.124939
-0.036187	manner? If objects are	-0.425969
-0.075665	consecutively? If objects are	-0.124939
-0.375211	all allocated objects are	-0.124939
-0.375211	dynamically allocated objects are	-0.124939
-0.473061	when shared objects are	-0.124939
-0.283728	for local objects are	-0.124939
-0.283728	The so-called objects are	-0.124939
-0.522229	Linux Shared objects are	-0.124939
-0.370008	cases, composite objects are	-0.124939
-0.489836	now that we are	-0.124939
-0.281378	the time we are	-0.124939
-0.260737	here because we are	-0.124939
-0.260737	9.5 because we are	-0.124939
-0.281378	cases where we are	-0.124939
-0.281378	this example, we are	-0.124939
-0.281378	these examples we are	-0.124939
-0.281378	language". While we are	-0.124939
-0.367143	F1? Then we are	-0.124939
-0.281378	example 7.4 we are	-0.124939
-0.281378	problem since we are	-0.124939
-0.281378	PC. Similarly, we are	-0.124939
-0.281378	example 14.7b, we are	-0.124939
-0.281378	200. Next, we are	-0.124939
-0.404559	commonly used variables are	-0.124939
-0.502081	for register variables are	-0.124939
-0.311848	understand how variables are	-0.124939
-0.216397	true. Boolean variables are	-0.124939
-0.216397	overdetermined Boolean variables are	-0.124939
-0.216397	invalid. Boolean variables are	-0.124939
-0.498093	} Induction variables are	-0.124939
-0.404559	function. Global variables are	-0.124939
-0.579393	in the table are	-0.124939
-0.355718	supporting multi-threaded software are	-0.124939
-0.257173	and the elements are	-0.124939
-0.257173	that the elements are	-0.124939
-0.368424	if the elements are	-0.124939
-0.257173	which the elements are	-0.124939
-0.313413	only when elements are	-0.124939
-0.313413	"how many elements are	-0.124939
-0.313413	accessing container elements are	-0.124939
-0.525637	the objects stored are	-0.124939
-1.458983	the sign bit are	-0.124939
-0.823198	obstacles to optimization are	-0.124939
-0.316778	Gnu function libraries are	-0.124939
-0.316778	best function libraries are	-0.124939
-0.130125	These function libraries are	-0.124939
-0.316778	common function libraries are	-0.124939
-0.316778	Many function libraries are	-0.124939
-0.344132	when Intel libraries are	-0.124939
-0.334821	The dynamic libraries are	-0.124939
-0.334821	more dynamic libraries are	-0.124939
-0.344132	the standard libraries are	-0.124939
-0.262402	efficient. Dynamic libraries are	-0.124939
-0.262402	special purpose libraries are	-0.124939
-0.262402	and LIBM libraries are	-0.124939
-0.745598	stored in registers are	-0.124939
-0.409475	XMM vector registers are	-0.124939
-0.646943	floating point registers are	-0.124939
-0.022095	point stack registers are	-0.301030
-0.142020	the XMM registers are	-0.301030
-0.070124	if XMM registers are	-0.124939
-0.153793	128-bit XMM registers are	-0.124939
-0.367970	The YMM registers are	-0.124939
-0.328749	or if pointers are	-0.124939
-0.496095	way member pointers are	-0.124939
-0.425582	of smart pointers are	-0.124939
-0.483519	deleted. Smart pointers are	-0.124939
-0.556607	and operating systems are	-0.124939
-0.352373	two operating systems are	-0.124939
-0.495870	64-bit operating systems are	-0.124939
-0.352373	some operating systems are	-0.124939
-0.352373	contemporary operating systems are	-0.124939
-0.446834	size, because these are	-0.124939
-0.446834	instructions, but these are	-0.124939
-0.164907	thing and they are	-0.124939
-0.164907	programmers and they are	-0.124939
-0.388215	so that they are	-0.124939
-0.127437	the function they are	-0.124939
-0.144463	or if they are	-0.124939
-0.144463	even if they are	-0.124939
-0.074673	values if they are	-0.124939
-0.074673	together if they are	-0.124939
-0.074673	errors if they are	-0.124939
-0.074673	expensive if they are	-0.124939
-0.074673	cheap if they are	-0.124939
-0.127437	integers - they are	-0.124939
-0.127437	every time they are	-0.124939
-0.028527	of when they are	-0.124939
-0.028527	only when they are	-0.124939
-0.028527	counters when they are	-0.124939
-0.028527	stronger when they are	-0.124939
-0.188359	efficient because they are	-0.124939
-0.188359	performance because they are	-0.124939
-0.147930	in which they are	-0.249877
-0.059061	a, but they are	-0.124939
-0.059061	integers, but they are	-0.124939
-0.185787	situation where they are	-0.124939
-0.127437	stages before they are	-0.124939
-0.127437	on how they are	-0.124939
-0.127437	most cases they are	-0.124939
-0.185787	of whether they are	-0.124939
-0.127437	the programs they are	-0.124939
-0.127437	pointers unless they are	-0.124939
-0.127437	calculations whenever they are	-0.124939
-0.546056	and memory access are	-0.124939
-1.175982	object oriented programming are	-0.124939
-0.807415	and 64 bits are	-0.124939
-0.451351	(when vector operations are	-0.124939
-0.440466	Floating point operations are	-0.124939
-0.460006	because integer operations are	-0.124939
-0.271463	and these operations are	-0.124939
-0.114858	operators Integer operations are	-0.124939
-0.114858	size. Integer operations are	-0.124939
-0.073105	variables. Vector operations are	-0.124939
-0.073105	sets. Vector operations are	-0.124939
-0.073105	(ZMM). Vector operations are	-0.124939
-0.355093	Pointer arithmetic operations are	-0.124939
-0.355304	best. These cases are	-0.124939
-0.294193	pipeline where instructions are	-0.124939
-0.294193	though. Some instructions are	-0.124939
-0.215058	cache. These instructions are	-0.124939
-0.215058	lookup. These instructions are	-0.124939
-0.277448	nontemporal write instructions are	-0.425969
-0.416877	on executing instructions are	-0.124939
-0.294634	and vector processors are	-0.124939
-0.539601	for different processors are	-0.124939
-0.383348	earlier Intel processors are	-0.124939
-0.383348	and AMD processors are	-0.124939
-0.294634	The x86 processors are	-0.124939
-0.666047	standard PC processors are	-0.124939
-0.294634	time. Newer processors are	-0.124939
-0.163708	resources. Modern CPUs are	-0.124939
-0.163708	parallel. Modern CPUs are	-0.124939
-0.163708	temp2. Modern CPUs are	-0.124939
-0.054165	that the arrays are	-0.124939
-0.116061	when the arrays are	-0.124939
-0.054165	sure the arrays are	-0.425969
-0.054165	whether the arrays are	-0.124939
-0.256219	efficient when arrays are	-0.124939
-0.256219	6. If arrays are	-0.124939
-0.256219	inefficient. Linear arrays are	-0.124939
-0.537202	compilers for Windows are	-0.124939
-0.854895	and function calls are	-0.124939
-0.473702	to the calculations are	-0.124939
-0.399606	these address calculations are	-0.124939
-0.563093	double precision calculations are	-0.124939
-0.307842	are: All calculations are	-0.124939
-0.307842	that certain calculations are	-0.124939
-0.062172	CPUs. New versions are	-0.425969
-0.331172	Free trial versions are	-0.124939
-0.804294	if the threads are	-0.124939
-0.432468	that different threads are	-0.124939
-0.697496	if multiple threads are	-0.124939
-0.483211	If two threads are	-0.124939
-0.305895	and high-priority threads are	-0.124939
-1.082934	b and c are	-0.124939
-0.341030	system calls. These are	-0.124939
-0.341030	improve efficiency. These are	-0.124939
-0.457559	task or thread are	-0.124939
-0.623852	trigonometric functions, etc. are	-0.124939
-0.340545	Sunday, Monday, etc. are	-0.124939
-0.457772	If two integers are	-0.124939
-0.762904	predefined vector classes are	-0.124939
-0.748634	of container classes are	-0.124939
-0.480035	threads? Container classes are	-0.124939
-0.283501	where the parameters are	-0.124939
-0.270658	Simple function parameters are	-0.124939
-0.080603	floating point parameters are	-0.124939
-0.080603	Floating point parameters are	-0.124939
-0.147736	two integer parameters are	-0.124939
-0.147736	compiler) integer parameters are	-0.124939
-0.097798	the template parameters are	-0.124939
-0.246058	first four parameters are	-0.124939
-0.139031	parameters Function parameters are	-0.124939
-0.063976	memory. Function parameters are	-0.124939
-0.139031	__fastcall. Function parameters are	-0.124939
-0.179655	that macro parameters are	-0.124939
-0.986365	to this problem are	-0.124939
-0.353978	an STL container are	-0.124939
-0.417185	vectors. The operators are	-0.124939
-0.609086	the bitwise operators are	-0.124939
-0.689686	The bitwise operators are	-0.124939
-1.128128	class or structure are	-0.124939
-0.228751	called. The values are	-0.124939
-0.228751	array. The values are	-0.124939
-0.319157	the key values are	-0.124939
-0.369666	that the addresses are	-0.124939
-0.369666	because the addresses are	-0.124939
-0.301488	itself. Function addresses are	-0.124939
-0.451223	because relative addresses are	-0.124939
-0.301211	necessary library files are	-0.124939
-0.301211	The intermediate files are	-0.124939
-0.426210	All source files are	-0.124939
-0.391429	the header files are	-0.124939
-0.455523	fail if both are	-0.124939
-0.455655	All these problems are	-0.124939
-0.332256	if the branches are	-0.124939
-0.332256	the dispatch branches are	-0.124939
-0.862127	subtraction and multiplication are	-0.124939
-0.471943	if instruction sets are	-0.124939
-0.471943	when instruction sets are	-0.124939
-0.352246	of its members are	-0.124939
-0.364130	of these methods are	-0.124939
-0.253845	that these methods are	-0.124939
-0.355467	memory. These methods are	-0.124939
-0.271771	most development methods are	-0.124939
-0.271771	and similar methods are	-0.124939
-0.351790	ease of development are	-0.124939
-0.268892	predict which resources are	-0.124939
-0.268892	removed, all resources are	-0.124939
-0.351978	sure allocated resources are	-0.124939
-0.268892	the shared resources are	-0.124939
-0.539999	on network resources are	-0.124939
-0.350965	However, such applications are	-0.124939
-0.462910	version. The examples are	-0.124939
-0.425192	All these examples are	-0.124939
-0.352182	other protection means are	-0.124939
-0.350814	&& and || are	-0.124939
-0.350708	cases. Integer expressions are	-0.124939
-0.364003	of these directives are	-0.124939
-0.278799	Fortran. These directives are	-0.124939
-0.364003	runtime. #define directives are	-0.124939
-0.364003	compiled. #if directives are	-0.124939
-0.514656	Microsoft's .NET framework are	-0.124939
-0.313595	brands of microprocessors are	-0.124939
-0.236953	of Intel microprocessors are	-0.124939
-0.236953	the way microprocessors are	-0.124939
-0.394298	The modern microprocessors are	-0.124939
-0.228509	prediction. Modern microprocessors are	-0.124939
-0.228509	mechanisms. Modern microprocessors are	-0.124939
-0.448888	all the numbers are	-0.124939
-0.492058	Floating point numbers are	-0.124939
-0.388718	and model numbers are	-0.124939
-0.577603	are used together are	-0.124939
-0.350234	256-bit YMM vectors are	-0.124939
-0.452953	a and r are	-0.124939
-0.355216	if the results are	-0.124939
-0.202340	compilers. The results are	-0.124939
-0.202340	optimizations. The results are	-0.124939
-0.403980	and intermediate results are	-0.124939
-0.641992	of variable storage are	-0.124939
-0.519256	Many optimization options are	-0.124939
-0.322149	if certain options are	-0.124939
-0.443064	that the operands are	-0.124939
-0.241327	if the operands are	-0.124939
-0.704571	which the modules are	-0.124939
-0.538102	Many algebraic reductions are	-0.124939
-0.853353	Pointers and references are	-0.124939
-0.349383	B and C are	-0.124939
-0.512216	checks. These conversions are	-0.124939
-0.565190	other programming languages are	-0.124939
-0.256960	while high-level languages are	-0.124939
-0.256960	time. Interpreted languages are	-0.124939
-0.256960	hand. Low-level languages are	-0.124939
-0.793218	in the STL are	-0.124939
-0.348114	128. These lines are	-0.124939
-0.448950	in the output are	-0.124939
-0.449993	another. These costs are	-0.124939
-0.287168	Whether the constants are	-0.124939
-0.044557	floating point constants are	-0.124939
-0.287168	the two constants are	-0.124939
-0.214694	constants. Integer constants are	-0.124939
-0.488646	classes. Text strings are	-0.124939
-0.043030	the following conditions are	-0.124939
-0.205734	if certain conditions are	-0.124939
-0.205734	the caching conditions are	-0.124939
-0.205734	www.agner.org/optimize. Copyright conditions are	-0.124939
-0.448279	many standard tasks are	-0.124939
-0.799839	parent and child are	-0.124939
-1.117985	performance monitor counters are	-0.124939
-0.490135	the function names are	-0.124939
-0.303217	libircmt.lib. Function names are	-0.124939
-0.346004	parameter. Further details are	-0.124939
-0.139538	that the rows are	-0.124939
-0.064189	if the rows are	-0.124939
-0.345429	arrays or structures are	-0.124939
-0.344423	If frequent updates are	-0.124939
-0.344840	with multiple cores are	-0.124939
-0.345676	if alternative implementations are	-0.124939
-0.095172	of different sizes are	-0.425969
-0.344840	and Open BSD are	-0.124939
-0.256259	template metaprogramming, loops are	-0.124939
-0.051307	processor. Nested loops are	-0.425969
-0.342990	ways. Switch statements are	-0.124939
-0.443741	container class templates are	-0.124939
-0.344134	allocated in sequence are	-0.124939
-0.483182	The clock counts are	-0.124939
-0.343676	set and map are	-0.124939
-0.341497	software writing style are	-0.124939
-0.180095	that all destructors are	-0.124939
-0.753287	and parameter transfer are	-0.124939
-0.484505	above the diagonal are	-0.124939
-0.543562	below the diagonal are	-0.124939
-0.342004	for special purposes are	-0.124939
-0.342004	assembly language. Here are	-0.124939
-0.499156	for branch prediction are	-0.124939
-0.441381	function calling conventions are	-0.124939
-0.439235	in the background are	-0.124939
-0.339923	microprocessor. These algorithms are	-0.124939
-0.340491	all the additions are	-0.124939
-0.339923	only allowed inputs are	-0.124939
-0.340207	below. Those who are	-0.124939
-0.363377	all the factors are	-0.124939
-0.278285	libraries. These factors are	-0.124939
-0.278038	Verilog. Common devices are	-0.124939
-0.278038	Small hand-held devices are	-0.124939
-0.340207	together Cache misses are	-0.124939
-0.046060	and PLT tables are	-0.425969
-0.297789	lookup. Lookup tables are	-0.124939
-0.337584	IDE's for D are	-0.124939
-0.337584	that you measure are	-0.124939
-0.337584	the above sections are	-0.124939
-0.619418	rules of algebra are	-0.124939
-0.436247	renewed. Context switches are	-0.124939
-0.436652	implementations of Java are	-0.124939
-0.337262	that thrown exceptions are	-0.124939
-0.338230	Java virtual machine are	-0.124939
-0.207601	my optimization manuals are	-0.124939
-0.278799	of these manuals are	-0.124939
-0.207601	The subsequent manuals are	-0.124939
-0.207601	program. The profilers are	-0.124939
-0.207601	CodeAnalyst. These profilers are	-0.124939
-0.207601	CodeAnalyst. Unfortunately, profilers are	-0.124939
-0.816196	with out-of-order capabilities are	-0.124939
-0.337584	and that measurements are	-0.124939
-0.337262	vector. These units are	-0.124939
-0.337907	pow and log are	-0.124939
-0.184993	floating point comparisons are	-0.124939
-0.051064	Floating point comparisons are	-0.124939
-0.255017	process. These requirements are	-0.124939
-0.255017	the alignment requirements are	-0.124939
-0.426821	Pascal and Fortran are	-0.124939
-0.495661	and device drivers are	-0.124939
-0.236544	= 10; Templates are	-0.124939
-0.236544	template. 57 Templates are	-0.124939
-0.329740	The storage principles are	-0.124939
-0.148539	standards. Such schemes are	-0.124939
-0.009558	copy protection schemes are	-0.301030
-0.236544	and references. Arrays are	-0.124939
-0.236544	unexpected behaviors. Arrays are	-0.124939
-0.331073	before any constructors are	-0.124939
-0.329740	logic. Some guidelines are	-0.124939
-0.159857	such runtime frameworks are	-0.124939
-0.159857	graphical interface frameworks are	-0.124939
-0.159857	running. Such frameworks are	-0.124939
-0.330184	low power consumption are	-0.124939
-0.048172	If search facilities are	-0.425969
-0.329740	used as macros are	-0.124939
-0.323264	Such hybrid solutions are	-0.124939
-0.323264	sum1 and sum2 are	-0.124939
-0.211495	example 15.1b. Branches are	-0.124939
-0.211495	misprediction penalty. Branches are	-0.124939
-0.323264	guidelines. Most caches are	-0.124939
-0.211073	7.29 Threads Threads are	-0.124939
-0.211073	in Linux). Threads are	-0.124939
-0.323264	Most performance tests are	-0.124939
-0.313392	code and main() are	-0.124939
-0.312681	multiplications and divisions are	-0.124939
-0.312681	in disguise. Enums are	-0.124939
-0.313392	constructor itself. Constructors are	-0.124939
-0.036984	b[i] and c[i] are	-0.124939
-0.171683	parallel calculations. Examples are	-0.124939
-0.171683	explained above. Examples are	-0.124939
-0.312681	These table lookups are	-0.124939
-0.535432	Sum2 and Sum3 are	-0.124939
-0.380434	All disturbing influences are	-0.124939
-0.292257	advanced programming constructs are	-0.124939
-0.292257	and user settings are	-0.124939
-0.292257	optimized well, others are	-0.124939
-0.292257	relocation. The DLLs are	-0.124939
-0.292257	AVX. These suffixes are	-0.124939
-0.292257	the same arguments are	-0.124939
-0.292257	if time intervals are	-0.124939
-0.292257	u.f and v.f are	-0.124939
-0.380434	Many CPU dispatchers are	-0.124939
-0.102231	one register. Registers are	-0.124939
-0.102231	or reference. Registers are	-0.124939
-0.102231	using references. References are	-0.124939
-0.102231	wrong type. References are	-0.124939
-0.292257	(char, short int) are	-0.124939
-0.292257	most sorting algorithms, are	-0.124939
-0.292257	of my experiment are	-0.124939
-0.292257	post-increment operator i++ are	-0.124939
-0.292257	most common time-consumers are	-0.124939
-0.292257	and far procedures are	-0.124939
-0.292257	|, ^, ~ are	-0.124939
-0.236187	spell-checking and repagination are	-0.124939
-0.236187	The three clauses are	-0.124939
-0.236187	OS X (Darwin) are	-0.124939
-0.236187	operations (chapter 12) are	-0.124939
-0.236187	any answer. Beginners are	-0.124939
-0.236187	about name mangling are	-0.124939
-0.236187	anyway. Software distributors are	-0.124939
-0.236187	work. The recommendations are	-0.124939
-0.236187	about instruction latencies are	-0.124939
-0.236187	'@' and '$' are	-0.124939
-0.236187	searching and parsing are	-0.124939
-0.236187	pointers to objects) are	-0.124939
-0.236187	(also called properties) are	-0.124939
-0.236187	ABC = 123; are	-0.124939
-0.236187	begins with #) are	-0.124939
-0.236187	each time slice are	-0.124939
-0.236187	write instructions (MOVNT) are	-0.124939
-0.236187	operators (e.g. '>') are	-0.124939
-0.236187	within each clause are	-0.124939
-0.236187	options. CPU vendors are	-0.124939
-0.236187	and 9. Multiplications are	-0.124939
-0.586010	object pointed to can	-0.124939
-0.586010	variable pointed to can	-0.124939
-0.585425	binary code and can	-0.124939
-0.357222	accessed consecutively and can	-0.124939
-0.751957	for code that can	-0.124939
-0.519952	loop-invariant code that can	-0.124939
-0.775829	a compiler that can	-0.124939
-0.790718	test program that can	-0.124939
-0.337852	a cache that can	-0.124939
-0.337852	highest performance that can	-0.124939
-0.544541	of branch that can	-0.124939
-0.881969	a way that can	-0.124939
-0.524852	application-specific instructions that can	-0.124939
-0.475793	programming language that can	-0.124939
-0.337852	parallel structure that can	-0.124939
-0.063065	loop count that can	-0.425969
-0.496578	Predictable branches that can	-0.124939
-0.136869	in applications that can	-0.124939
-0.136869	for applications that can	-0.124939
-0.337852	write expressions that can	-0.124939
-0.436988	specific advantages that can	-0.124939
-0.238772	three things that can	-0.124939
-0.238772	reveal things that can	-0.124939
-0.337852	a profiler that can	-0.124939
-0.436988	third thing that can	-0.124939
-0.136869	is something that can	-0.124939
-0.136869	certainly something that can	-0.124939
-0.103179	several factors that can	-0.124939
-0.337852	efficient alternatives that can	-0.124939
-0.337852	function F2 that can	-0.124939
-0.337852	a chip that can	-0.124939
-0.337852	or Espresso) that can	-0.124939
-0.369735	functions and it can	-0.124939
-0.369735	order and it can	-0.124939
-0.520224	calls and it can	-0.124939
-0.369735	cores, and it can	-0.124939
-0.981218	is that it can	-0.124939
-0.402253	so that it can	-0.221849
-0.422079	expression that it can	-0.124939
-0.896114	means that it can	-0.124939
-0.422079	knows that it can	-0.124939
-0.576743	arrays if it can	-0.124939
-0.508106	data than it can	-0.124939
-0.552383	program then it can	-0.124939
-0.552383	T+5, then it can	-0.124939
-0.498174	is because it can	-0.124939
-0.498174	constants because it can	-0.124939
-0.498174	one, because it can	-0.124939
-0.468637	cycles, but it can	-0.124939
-0.468637	threads, but it can	-0.124939
-0.468637	hint, but it can	-0.124939
-0.321761	C1, so it can	-0.124939
-0.533715	compilation before it can	-0.124939
-0.411322	some cases it can	-0.425969
-0.507217	resources. But it can	-0.124939
-0.545368	another. Therefore, it can	-0.124939
-0.416866	and what it can	-0.124939
-0.732766	is called, it can	-0.124939
-0.321761	even worse, it can	-0.124939
-0.321761	= (b*c)/d, it can	-0.124939
-0.321761	At least, it can	-0.124939
-0.881615	unless the function can	-0.124939
-0.590687	exceptions a function can	-0.124939
-0.528705	// this function can	-0.124939
-0.493492	that one function can	-0.124939
-0.453146	the calling function can	-0.124939
-0.766997	A frame function can	-0.124939
-0.350663	The exponential function can	-0.124939
-0.700058	of the code can	-0.221849
-1.114724	if the code can	-0.124939
-0.453447	then the code can	-0.124939
-0.838243	pieces of code can	-0.124939
-0.523476	branches The code can	-0.124939
-0.523476	know). The code can	-0.124939
-0.497675	overflow, this code can	-0.124939
-0.884639	the same code can	-0.124939
-0.861150	The above code can	-0.124939
-0.146893	3; } This can	-0.425969
-0.370531	break; } This can	-0.124939
-0.398768	the code. This can	-0.124939
-0.280472	non-AVX code. This can	-0.124939
-0.467691	program memory. This can	-0.124939
-0.401318	*= x; This can	-0.124939
-0.401318	member pointer. This can	-0.124939
-0.309228	integer operations. This can	-0.124939
-0.401318	the variable. This can	-0.124939
-0.446637	the stack. This can	-0.124939
-0.280472	their stack. This can	-0.124939
-0.309228	very fast. This can	-0.124939
-0.565590	is important. This can	-0.124939
-0.401318	inconvenient times. This can	-0.124939
-0.309228	dispatch process. This can	-0.124939
-0.309228	data files. This can	-0.124939
-0.309228	be cached. This can	-0.124939
-0.309228	/ b2; This can	-0.124939
-0.309228	code only. This can	-0.124939
-0.565590	VIA CPUs"). This can	-0.124939
-0.309228	and free. This can	-0.124939
-0.059154	or modified. This can	-0.124939
-0.309228	is defined. This can	-0.124939
-0.309228	is saturated. This can	-0.124939
-0.309228	of (a+b). This can	-0.124939
-0.309228	of it). This can	-0.124939
-0.309228	thread scheduler. This can	-0.124939
-0.309228	bits 32-62. This can	-0.124939
-0.309228	access patterns. This can	-0.124939
-0.309228	this place. This can	-0.124939
-0.808577	then the compiler can	-0.124939
-0.871095	but the compiler can	-0.124939
-0.492068	so the compiler can	-0.124939
-0.705409	example, the compiler can	-0.124939
-0.492068	cases the compiler can	-0.124939
-0.881501	what the compiler can	-0.124939
-0.705409	Here, the compiler can	-0.124939
-0.492068	Likewise, the compiler can	-0.124939
-0.492068	12.1a, the compiler can	-0.124939
-0.509292	of a compiler can	-0.124939
-0.441343	inlining The compiler can	-0.124939
-0.441343	object. The compiler can	-0.124939
-0.441343	variable. The compiler can	-0.124939
-0.441343	executed. The compiler can	-0.124939
-0.441343	84). The compiler can	-0.124939
-0.441343	a+1;. The compiler can	-0.124939
-0.470467	result. A compiler can	-0.124939
-1.181870	The Intel compiler can	-0.124939
-0.556105	The Gnu compiler can	-0.425969
-0.292545	a good compiler can	-0.124939
-0.590783	A good compiler can	-0.124939
-0.382368	an optimizing compiler can	-0.124939
-0.285458	An optimizing compiler can	-0.124939
-0.311153	a just-in-time compiler can	-0.124939
-0.358256	former case x can	-0.124939
-0.382718	to and you can	-0.124939
-0.382718	code and you can	-0.124939
-0.382718	operator; and you can	-0.124939
-0.489834	requires that you can	-0.124939
-0.348027	options that you can	-0.124939
-0.348027	thing that you can	-0.124939
-0.348027	think that you can	-0.124939
-0.348027	unrealistic that you can	-0.124939
-0.459004	necessary if you can	-0.124939
-0.459004	good if you can	-0.124939
-0.459004	global if you can	-0.124939
-0.430805	issue, as you can	-0.124939
-0.538772	code then you can	-0.124939
-0.382769	functions then you can	-0.124939
-0.382769	cache then you can	-0.124939
-0.382769	independent then you can	-0.124939
-0.382769	integer, then you can	-0.124939
-0.382769	meaning, then you can	-0.124939
-0.382769	so, then you can	-0.124939
-0.395669	fastest because you can	-0.124939
-0.508583	programs. If you can	-0.124939
-0.412881	system, but you can	-0.124939
-0.278109	most compilers you can	-0.124939
-0.429187	Internet where you can	-0.124939
-0.395669	bits, so you can	-0.124939
-0.676891	for example, you can	-0.124939
-0.117163	and how you can	-0.124939
-0.117163	shows how you can	-0.124939
-0.363164	most cases you can	-0.124939
-0.278109	vector, while you can	-0.124939
-0.278109	(In Windows you can	-0.124939
-0.117163	most cases, you can	-0.124939
-0.117163	such cases, you can	-0.124939
-0.510960	which optimizations you can	-0.124939
-0.395669	blog. Here, you can	-0.124939
-0.117163	of things you can	-0.124939
-0.117163	various things you can	-0.124939
-0.363164	In Windows, you can	-0.124939
-0.490342	functions. Alternatively, you can	-0.124939
-0.177295	In general, you can	-0.124939
-0.278109	NOT. Instead, you can	-0.124939
-0.278109	| operator; you can	-0.124939
-0.278109	this reason, you can	-0.124939
-0.533833	loop if this can	-0.124939
-0.496503	stride then this can	-0.124939
-0.496503	shows how this can	-0.124939
-0.357642	The load time can	-0.124939
-0.461867	data cache use can	-0.124939
-0.925392	longer time. It can	-0.124939
-0.333928	compiler does It can	-0.124939
-0.333928	System database It can	-0.124939
-0.333928	input/output operations. It can	-0.124939
-0.333928	the CPU. It can	-0.124939
-0.481684	containing integers. It can	-0.124939
-0.333928	template parameter. It can	-0.124939
-0.333928	for Linux. It can	-0.124939
-0.333928	point numbers. It can	-0.124939
-0.333928	and list[i].b. It can	-0.124939
-0.997785	in static memory can	-0.124939
-0.753172	from RAM memory can	-0.124939
-1.236584	code and data can	-0.124939
-0.457775	and public data can	-0.124939
-0.571541	calculations. The program can	-0.124939
-0.354175	the 7 program can	-0.124939
-0.440096	of each vector can	-0.124939
-0.311582	then each vector can	-0.124939
-0.468801	cache. The same can	-0.124939
-0.468801	reads. The same can	-0.124939
-0.493563	while other functions can	-0.124939
-0.974694	Using intrinsic functions can	-0.124939
-0.350714	The missing functions can	-0.124939
-1.182110	because the CPU can	-0.124939
-0.570733	renaming. The CPU can	-0.124939
-0.357714	The prefetch instruction can	-0.124939
-0.648466	inside the loop can	-0.124939
-0.565604	not. The loop can	-0.124939
-0.329279	the CPU which can	-0.124939
-0.602474	of i which can	-0.124939
-0.329279	vector register which can	-0.124939
-0.329279	conversion instructions which can	-0.124939
-0.329279	and references, which can	-0.124939
-0.329279	OR operator, which can	-0.124939
-0.329279	an attribute which can	-0.124939
-0.329279	n-1 multiplications, which can	-0.124939
-0.329279	or YMM) which can	-0.124939
-0.772519	double to integer can	-0.124939
-0.580320	or an integer can	-0.124939
-0.348565	in the set can	-0.124939
-1.371795	AVX instruction set can	-0.124939
-0.575459	low instruction set can	-0.124939
-0.524803	A template class can	-0.124939
-0.574958	in this example can	-0.425969
-0.484820	and other compilers can	-0.124939
-0.886565	and Intel compilers can	-0.124939
-0.408962	of these compilers can	-0.124939
-0.812328	compiler. Some compilers can	-0.124939
-0.445237	because optimizing compilers can	-0.124939
-0.372203	storage Most compilers can	-0.124939
-0.372203	reductions Most compilers can	-0.124939
-0.372203	reduction Most compilers can	-0.124939
-0.060016	and PathScale compilers can	-0.124939
-0.315400	optimize Modern compilers can	-0.124939
-0.352303	of variable size can	-0.124939
-0.352303	i >= size can	-0.124939
-0.587196	Likewise, a pointer can	-0.124939
-0.489166	conversion A pointer can	-0.124939
-0.449202	No link pointer can	-0.124939
-0.356990	check on b can	-0.124939
-1.143457	vector class library can	-0.124939
-0.067343	A dynamic library can	-0.602060
-0.244826	same dynamic library can	-0.124939
-0.757905	user interface library can	-0.124939
-0.357096	noticed that i can	-0.124939
-1.001336	if the object can	-0.124939
-1.133342	a shared object can	-0.124939
-0.447175	The existing object can	-0.124939
-1.060665	of an array can	-0.124939
-0.620674	a simple array can	-0.124939
-0.338887	A large array can	-0.124939
-0.511081	27. An array can	-0.124939
-1.188310	number of objects can	-0.124939
-0.480644	on when objects can	-0.124939
-0.489340	and class objects can	-0.124939
-0.430672	containing many objects can	-0.124939
-0.332817	and new objects can	-0.124939
-1.011073	that a variable can	-0.124939
-0.527172	expensive. A variable can	-0.124939
-0.928552	the induction variable can	-0.124939
-0.303812	is that we can	-0.124939
-0.360609	so that we can	-0.124939
-0.303812	information that we can	-0.124939
-0.351524	cycles, then we can	-0.124939
-0.351524	C2, then we can	-0.124939
-0.442342	faster because we can	-0.124939
-0.278018	x so we can	-0.124939
-0.278018	bytes, so we can	-0.124939
-0.281360	64-bit systems we can	-0.124939
-0.281360	of constants we can	-0.124939
-0.367121	a2/b2; Here we can	-0.124939
-0.281360	pow(x,n) As we can	-0.124939
-0.281360	The lesson we can	-0.124939
-0.343099	number of variables can	-0.124939
-0.343099	158 Integer variables can	-0.124939
-0.547385	of induction variables can	-0.124939
-1.768999	power of 2 can	-0.124939
-0.232591	Read time You can	-0.124939
-0.232591	if unsigned You can	-0.124939
-0.232591	instruction code. You can	-0.124939
-0.308396	of time. You can	-0.124939
-0.232591	member functions. You can	-0.124939
-0.308396	more efficient. You can	-0.124939
-0.232591	the compiler. You can	-0.124939
-0.435818	clock cycles. You can	-0.124939
-0.232591	with references. You can	-0.124939
-0.232591	by 16. You can	-0.124939
-0.232591	the result. You can	-0.124939
-0.232591	preceding one. You can	-0.124939
-0.308396	not overlap. You can	-0.124939
-0.232591	positive n. You can	-0.124939
-0.232591	+ a. You can	-0.124939
-0.232591	155 test. You can	-0.124939
-0.232591	it twice. You can	-0.124939
-0.232591	Print heading You can	-0.124939
-0.232591	variable __intel_cpu_feature_indicator_x. You can	-0.124939
-0.232591	into account. You can	-0.124939
-0.232591	GOT entry. You can	-0.124939
-0.232591	or makefile. You can	-0.124939
-0.529362	examples. The table can	-0.124939
-0.704282	A hash table can	-0.124939
-0.768751	gain in performance can	-0.124939
-0.526853	efficient. The performance can	-0.124939
-0.557811	updating of software can	-0.124939
-0.425005	} A branch can	-0.124939
-0.425005	b; A branch can	-0.124939
-0.342188	the optimal branch can	-0.124939
-1.041359	data are stored can	-0.124939
-0.586944	if the address can	-0.124939
-0.710809	the target address can	-0.124939
-0.500618	The carry bit can	-0.124939
-0.530816	The same register can	-0.124939
-0.347485	128-bit XMM register can	-0.124939
-0.757927	level of optimization can	-0.124939
-0.139704	code Function libraries can	-0.124939
-0.139704	libraries Function libraries can	-0.124939
-0.921716	the vector registers can	-0.124939
-0.560287	in XMM registers can	-0.124939
-0.346441	with invalid pointers can	-0.124939
-0.509011	auto_ptr. Smart pointers can	-0.124939
-0.530742	The 64-bit systems can	-0.124939
-0.571362	these operating systems can	-0.124939
-0.583193	way the user can	-0.124939
-0.526323	sizes, and they can	-0.124939
-0.541151	critical because they can	-0.124939
-0.498909	all. This method can	-0.124939
-0.498909	executables. This method can	-0.124939
-0.326842	The same method can	-0.124939
-0.326842	A similar method can	-0.124939
-0.355246	if data access can	-0.124939
-0.938821	the operating system can	-0.425969
-0.544401	writes a file can	-0.124939
-0.552774	Object oriented programming can	-0.124939
-1.409373	the critical part can	-0.124939
-0.445906	sequence of operations can	-0.124939
-0.344935	The Boolean operations can	-0.124939
-0.875533	a composite type can	-0.124939
-0.458370	These new instructions can	-0.124939
-0.432300	These virtual processors can	-0.124939
-0.334116	processors. Many processors can	-0.124939
-0.611592	on non-Intel processors can	-0.124939
-0.652592	logical processors available can	-0.124939
-1.429928	by a constant can	-0.124939
-0.343827	subexpression. A constant can	-0.124939
-0.574792	detects an error can	-0.124939
-0.578390	for the stack can	-0.124939
-0.472345	current Intel CPUs can	-0.124939
-0.320895	modern x86 CPUs can	-0.124939
-0.415788	because modern CPUs can	-0.124939
-0.499013	CPU Modern CPUs can	-0.124939
-0.458266	Objects and arrays can	-0.124939
-0.354488	how caches work can	-0.124939
-0.577265	Even function calls can	-0.124939
-0.354328	if intermediate calculations can	-0.124939
-0.584691	that the result can	-0.124939
-0.553765	that the processor can	-0.124939
-0.807668	of unused bytes can	-0.124939
-0.341782	and that threads can	-0.124939
-0.791919	if multiple threads can	-0.124939
-1.082878	b and c can	-0.124939
-0.430569	that one thread can	-0.124939
-0.449253	interface, another thread can	-0.124939
-0.449253	threads. Each thread can	-0.124939
-0.304475	a third thread can	-0.124939
-0.304475	a high-priority thread can	-0.124939
-0.424035	big that overflow can	-0.124939
-0.327511	that no overflow can	-0.124939
-0.327511	An array overflow can	-0.124939
-0.519564	allocation. Container classes can	-0.124939
-0.850554	Each cache line can	-0.124939
-0.354731	no branches inside can	-0.124939
-0.437593	caching. This problem can	-0.124939
-0.338333	This safety problem can	-0.124939
-0.437925	JNZ). This solution can	-0.124939
-0.486057	But this solution can	-0.124939
-0.820060	a sorted list can	-0.124939
-0.543546	because the hardware can	-0.124939
-0.353845	stack unwinding information can	-0.124939
-0.548556	max) { ... can	-0.124939
-0.715910	the loop counter can	-0.124939
-0.271410	a loop counter can	-0.425969
-0.823715	time stamp counter can	-0.124939
-1.128956	dynamic memory allocation can	-0.124939
-1.076664	Dynamic memory allocation can	-0.124939
-0.352995	method described above can	-0.124939
-0.352501	swapped then both can	-0.124939
-0.352184	object oriented programs can	-0.124939
-0.559044	of memory space can	-0.124939
-0.352525	The automatic dispatching can	-0.124939
-0.784262	that the microprocessor can	-0.124939
-0.413656	then the microprocessor can	-0.124939
-0.413656	well the microprocessor can	-0.124939
-0.352110	is that branches can	-0.124939
-0.801461	then the multiplication can	-0.124939
-0.834239	if the application can	-0.124939
-0.547012	various instruction sets can	-0.124939
-0.429110	the 32 sets can	-0.124939
-0.352397	A mixed implementation can	-0.124939
-0.572875	why exception handling can	-0.124939
-0.568198	its data members can	-0.124939
-0.517043	A template parameter can	-0.124939
-1.544310	pointer or reference can	-0.124939
-0.819895	that the programmer can	-0.124939
-0.462871	reasons. The programmer can	-0.124939
-0.644808	The register keyword can	-0.124939
-0.561859	of table lookup can	-0.124939
-0.350919	for WTL applications can	-0.124939
-0.104105	zero } We can	-0.124939
-0.104105	be used. We can	-0.124939
-0.104105	level-1 cache. We can	-0.124939
-0.104105	to zero We can	-0.124939
-0.104105	the compiler. We can	-0.124939
-0.104105	point number. We can	-0.124939
-0.104105	identifier names. We can	-0.124939
-0.104105	of u.f We can	-0.124939
-0.104105	to 15.1c. We can	-0.124939
-0.104105	sign bit. We can	-0.124939
-0.104105	some caveats. We can	-0.124939
-0.104105	64, ...). We can	-0.124939
-0.104105	(e.g. PowerPC). We can	-0.124939
-0.104105	bit set). We can	-0.124939
-0.104105	as 'this'. We can	-0.124939
-0.559914	out-of-order execution mechanism can	-0.124939
-0.306073	The dispatching mechanism can	-0.124939
-0.915809	CPU dispatch mechanism can	-0.124939
-0.350268	an extra framework can	-0.124939
-0.518636	chains Modern microprocessors can	-0.124939
-1.243568	floating point numbers can	-0.124939
-0.992844	graphical user interface can	-0.124939
-0.417303	the development process can	-0.124939
-0.772072	the installation process can	-0.124939
-0.296809	structure or union can	-0.124939
-0.216509	space. A union can	-0.124939
-0.216509	example. A union can	-0.124939
-0.545162	The copy constructor can	-0.124939
-1.076413	the code section can	-0.124939
-1.046623	I have tested can	-0.124939
-0.538589	the cache contentions can	-0.124939
-0.315803	to float conversions can	-0.124939
-0.465145	time. These conversions can	-0.124939
-0.315665	empty throw() statement can	-0.124939
-0.315665	No general statement can	-0.124939
-0.316215	the same errors can	-0.124939
-0.316215	because serious errors can	-0.124939
-0.778014	other programming languages can	-0.124939
-0.512159	function. Function inlining can	-0.124939
-0.347760	complex digital operation can	-0.124939
-0.310120	Booleans as output can	-0.124939
-0.310120	the compiler output can	-0.124939
-0.449965	CPUs. These costs can	-0.124939
-0.346967	data. A database can	-0.124939
-0.449111	between two constants can	-0.124939
-0.347135	a simple algorithm can	-0.124939
-0.346416	members. This alignment can	-0.124939
-0.544058	because the offset can	-0.124939
-0.347505	effects. This effect can	-0.124939
-0.345577	etc. These counters can	-0.124939
-0.529609	allocation. The heap can	-0.124939
-0.344802	But program loading can	-0.124939
-0.596540	if the condition can	-0.124939
-0.298261	the if condition can	-0.124939
-0.344802	with four cores can	-0.124939
-0.345871	the same generation can	-0.124939
-0.790229	branch target buffer can	-0.124939
-0.939689	The critical stride can	-0.124939
-0.342938	explain how metaprogramming can	-0.124939
-0.691919	A hash map can	-0.124939
-0.344137	such dependency chains can	-0.124939
-0.344137	Such dependency chains can	-0.124939
-0.313845	Studio. This tool can	-0.124939
-0.281891	The test tool can	-0.124939
-0.400634	My test tool can	-0.124939
-0.439535	called. Lazy binding can	-0.124939
-0.781873	the x86 family can	-0.124939
-0.682935	in a DLL can	-0.124939
-0.439900	lookup Lookup tables can	-0.124939
-0.337540	read-only data sections can	-0.124939
-0.495653	Frequent context switches can	-0.124939
-0.517436	free. Visual Studio can	-0.124939
-0.207818	global variables. They can	-0.124939
-0.207818	three branches. They can	-0.124939
-0.207818	very smart. They can	-0.124939
-0.337210	out if exceptions can	-0.124939
-0.712065	and garbage collection can	-0.124939
-0.437011	in these manuals can	-0.124939
-0.816042	with out-of-order capabilities can	-0.124939
-0.337540	execute then measurements can	-0.124939
-0.337210	chip. Such units can	-0.124939
-0.334061	of this polynomial can	-0.124939
-0.766223	in example 13.1 can	-0.124939
-0.334825	valid address. Pointers can	-0.124939
-0.334061	debugging. A debugger can	-0.124939
-0.335208	This wasteful behavior can	-0.124939
-0.432710	ecx and edx can	-0.124939
-0.334061	the background job can	-0.124939
-0.858961	size of abc can	-0.124939
-0.273903	reasonable upper limit can	-0.124939
-0.182284	not-too-big upper limit can	-0.124939
-0.329690	The following guidelines can	-0.124939
-0.330598	have ever seen can	-0.124939
-0.323214	a reasonable estimate can	-0.124939
-0.323214	as code. Metaprogramming can	-0.124939
-0.775185	The heap manager can	-0.124939
-0.152465	simple periodic pattern can	-0.124939
-0.418676	in example 12.4b can	-0.124939
-0.323773	Integer sizes Integers can	-0.124939
-0.323773	applications running simultaneously can	-0.124939
-0.323214	The following techniques can	-0.124939
-0.323214	operator which otherwise can	-0.124939
-0.323214	much higher resolution can	-0.124939
-0.313360	investment. A redesign can	-0.124939
-0.312632	of C++ projects can	-0.124939
-0.571753	in example 14.28 can	-0.124939
-0.312632	exact. Multiple divisions can	-0.124939
-0.312632	s2 and s3 can	-0.124939
-0.312632	user interface etc., can	-0.124939
-0.312632	character arrays. Strings can	-0.124939
-0.312632	that are read-only can	-0.124939
-0.292211	size (typically 64) can	-0.124939
-0.292211	the subexpression c+b can	-0.124939
-0.292211	the same chip can	-0.124939
-0.292211	format. The formats can	-0.124939
-0.380378	level-2 cache miss can	-0.124939
-0.292211	Eliminate jumps Jumps can	-0.124939
-0.292211	the two parentheses can	-0.124939
-0.292211	instruction set. Neither can	-0.124939
-0.292211	optimization explicitly. Divisions can	-0.124939
-0.292211	/ c) 139 can	-0.124939
-0.292211	The fastcall modifier can	-0.124939
-0.292211	This new insight can	-0.124939
-0.292211	cases. Database queries can	-0.124939
-0.292211	studying the bottlenecks can	-0.124939
-0.292211	~, <<, >> can	-0.124939
-0.236146	example, one tread can	-0.124939
-0.236146	intermediate result (b+c) can	-0.124939
-0.236146	of CPUs unequally can	-0.124939
-0.236146	lost. This dilemma can	-0.124939
-0.236146	what the preprocessor can	-0.124939
-0.236146	the following work-around can	-0.124939
-0.236146	in example 8.24 can	-0.124939
-0.236146	in the BTB can	-0.124939
-0.236146	a common denominator can	-0.124939
-0.236146	aliasing" (if valid) can	-0.124939
-0.236146	compile time. (Examples can	-0.124939
-0.236146	such programs installed can	-0.124939
-0.236146	b = !a; can	-0.124939
-0.236146	conversion and shuffling can	-0.124939
-0.236146	are zero. Zero can	-0.124939
-0.236146	keyword far (arrays can	-0.124939
-0.600264	goes in the //	-0.124939
-0.358080	if powN is //	-0.124939
-0.503149	// loop for //	-0.124939
-0.356536	// Virtual function //	-0.124939
-0.356536	// instrset_detect function //	-0.124939
-0.043730	replace this by //	-0.823909
-1.333843	be replaced by //	-0.124939
-1.029714	in Gnu compiler //	-0.124939
-0.065223	for small x //	-0.124939
-0.354377	// square x //	-0.124939
-0.339005	coef[16] = { //	-0.124939
-0.258150	class vector { //	-0.124939
-0.696000	size; i++) { //	-0.124939
-0.536618	NumberOfTests; i++) { //	-0.124939
-0.381264	arraysize; i++) { //	-0.124939
-0.381264	list.Size(); i++) { //	-0.124939
-0.717536	} else { //	-0.124939
-0.772273	const x) { //	-0.124939
-0.895270	& x) { //	-0.124939
-0.897212	p(double x) { //	-0.124939
-0.422041	(double x) { //	-0.124939
-0.300121	< 0) { //	-0.124939
-0.300121	> 0) { //	-0.124939
-0.503696	== 0) { //	-0.124939
-0.067067	int cc[]) { //	-0.301030
-0.385882	* 2) { //	-0.124939
-0.258150	int parm2) { //	-0.124939
-0.137232	< 4) { //	-0.124939
-0.063218	>= 4) { //	-0.425969
-0.395897	CriticalInnerFunction () { //	-0.124939
-0.012337	+= 8) { //	-0.726999
-0.941065	SIZE; r++) { //	-0.124939
-0.105085	r; c++) { //	-0.425969
-0.200584	(int n) { //	-0.425969
-0.258150	< 5) { //	-0.124939
-0.051601	>= 11) { //	-0.425969
-0.258150	* CriticalFunctionDispatch(void) { //	-0.124939
-0.051601	& 0x7FFFFFFF) { //	-0.124939
-0.477378	+= TILESIZE) { //	-0.124939
-0.051601	transpose(double a[SIZE][SIZE]) { //	-0.124939
-0.258150	array i) { //	-0.124939
-0.258150	62 __try { //	-0.124939
-0.258150	: EXCEPTION_CONTINUE_SEARCH) { //	-0.124939
-0.258150	>= N) { //	-0.124939
-0.258150	< arraysize) { //	-0.124939
-0.258150	> v.i) { //	-0.124939
-0.258150	< 13) { //	-0.124939
-1.269981	} } } //	-0.124939
-0.533733	c[i]); } } //	-0.124939
-0.316434	for multiplication } //	-0.124939
-0.719171	*)d, x); } //	-0.124939
-0.039167	_mm_loadu_si128((__m128i const*)p); } //	-0.425969
-0.082219	_mm_load_si128((__m128i const*)p); } //	-0.124939
-0.316434	&Object2; p->Hello(); } //	-0.124939
-0.316434	return clock; } //	-0.124939
-0.410244	= &CriticalFunction_386; } //	-0.124939
-0.410244	return &CriticalFunction_SSE2; } //	-0.124939
-0.578681	bb, cc); } //	-0.124939
-0.316434	return x10; } //	-0.124939
-0.316434	supported"); return; } //	-0.124939
-0.316434	return _mm_cvtss_f32(s); } //	-0.124939
-0.316434	& N-1)==0,N>::p(x); } //	-0.124939
-0.316434	return *(T*)0; } //	-0.124939
-0.461079	// constant data //	-0.124939
-0.780013	for intrinsic functions //	-0.124939
-0.478051	SSE2 intrinsic functions //	-0.124939
-1.102790	cache line size //	-0.124939
-0.654210	that have multiple //	-0.124939
-0.460466	desired function version //	-0.124939
-0.501296	possible vector objects //	-0.124939
-1.068765	power of 2 //	-0.425969
-0.786457	INSTRSET == 2 //	-0.124939
-0.177905	use a table //	-0.425969
-0.653728	Loop with branch //	-0.124939
-0.653815	// swap elements //	-0.124939
-0.649244	constant is faster //	-0.425969
-0.781762	on first call //	-0.124939
-1.728956	i = 0; //	-0.124939
-0.495149	{ return 0; //	-0.124939
-0.580443	test sign bit //	-0.124939
-0.305730	faster if unsigned //	-0.425969
-0.467223	2 bytes. first //	-0.124939
-0.643315	4 bytes. first //	-0.124939
-0.467223	8 bytes. first //	-0.124939
-0.574943	improving the code. //	-0.124939
-1.555910	at compile time. //	-0.124939
-0.470540	function to test //	-0.124939
-0.470540	times to test //	-0.124939
-0.327991	for each test //	-0.124939
-0.327991	Time before test //	-0.124939
-0.537942	#endif // SSE2 //	-0.124939
-0.491517	x to 0 //	-0.124939
-0.568959	N = 0 //	-0.124939
-0.354239	to provoke error //	-0.124939
-0.354563	Repeat NumberOfTests times //	-0.124939
-0.000474	the code. Example: //	-0.425969
-0.000948	of code. Example: //	-0.124939
-0.000948	extra code. Example: //	-0.124939
-0.003804	same time. Example: //	-0.124939
-0.001898	called function. Example: //	-0.124939
-0.001898	pure function. Example: //	-0.124939
-0.001898	in memory. Example: //	-0.124939
-0.001898	static memory. Example: //	-0.124939
-0.003804	are used. Example: //	-0.124939
-0.003804	is called. Example: //	-0.124939
-0.003804	a loop. Example: //	-0.124939
-0.000948	of 2. Example: //	-0.425969
-0.003804	consecutive variables. Example: //	-0.124939
-0.003804	function calls. Example: //	-0.124939
-0.003804	XMM registers. Example: //	-0.124939
-0.003804	float variable. Example: //	-0.124939
-0.003804	is needed. Example: //	-0.124939
-0.003804	detailed instructions. Example: //	-0.124939
-0.003804	non-sequential order. Example: //	-0.124939
-0.003804	jumps to. Example: //	-0.124939
-0.001898	for overflow. Example: //	-0.124939
-0.001898	cause overflow. Example: //	-0.124939
-0.003804	previous value. Example: //	-0.124939
-0.003804	previous branch. Example: //	-0.124939
-0.003804	same constant. Example: //	-0.124939
-0.003804	branch prediction. Example: //	-0.124939
-0.003804	calculated result. Example: //	-0.124939
-0.001898	loop counter. Example: //	-0.124939
-0.001898	integer counter. Example: //	-0.124939
-0.003804	single operation. Example: //	-0.124939
-0.003804	is finished. Example: //	-0.124939
-0.003804	different ways. Example: //	-0.124939
-0.003804	parallel execution. Example: //	-0.124939
-0.003804	array elements. Example: //	-0.124939
-0.003804	only once. Example: //	-0.124939
-0.003804	is limited. Example: //	-0.124939
-0.003804	lookup-table static. Example: //	-0.124939
-0.003804	is known. Example: //	-0.124939
-0.003804	same thing. Example: //	-0.124939
-0.003804	independent divisions. Example: //	-0.124939
-0.003804	or later. Example: //	-0.124939
-0.003804	all zeroes. Example: //	-0.124939
-0.003804	be undesired. Example: //	-0.124939
-0.003804	bit offsets). Example: //	-0.124939
-0.003804	loop overhead. Example: //	-0.124939
-0.003804	members individually. Example: //	-0.124939
-0.354309	12.5. Aligned arrays //	-0.124939
-0.533785	above doesn't work //	-0.124939
-0.354661	// Store result //	-0.124939
-0.806609	6 unused bytes //	-0.124939
-0.353158	#include <ia32intrin.h> etc. //	-0.124939
-0.098249	columns in matrix //	-0.425969
-0.313042	// define matrix //	-0.124939
-0.313042	to transpose matrix //	-0.124939
-0.988302	Define vector classes //	-0.124939
-0.145135	function } }; //	-0.124939
-0.145135	operator } }; //	-0.124939
-0.145135	x; } }; //	-0.124939
-0.145135	1.0; } }; //	-0.124939
-0.145135	N1 } }; //	-0.124939
-0.145135	powN<true,N/2>::p(x); } }; //	-0.124939
-0.145135	(static_cast<MyChild*>(this))->Disp(); } }; //	-0.124939
-0.234052	and perhaps }; //	-0.124939
-0.234052	void NotPolymorphic(); }; //	-0.124939
-0.234052	+ 4.; }; //	-0.124939
-0.500231	399 int b; //	-0.124939
-0.438047	{ double b; //	-0.124939
-0.957563	double a, b; //	-0.124939
-0.461514	i, a, b; //	-0.124939
-0.352808	to prevent optimizing //	-0.124939
-0.060305	b, c; ... //	-0.425969
-0.317487	void CriticalFunction(); ... //	-0.124939
-0.612010	// Loop counter //	-0.124939
-0.946585	time stamp counter //	-0.124939
-0.351662	// sum operator //	-0.124939
-0.135235	one : 1; //	-0.124939
-0.135235	sign : 1; //	-0.124939
-0.401965	Integer size conversion //	-0.124939
-0.309751	/ unsigned conversion //	-0.124939
-0.456634	Implicit type conversion //	-0.124939
-0.591378	a, b, c; //	-0.124939
-0.559945	Initialize to zero //	-0.124939
-0.339585	coefficients // Table //	-0.124939
-0.339585	A2; // Table //	-0.124939
-0.596281	and Gnu compilers. //	-0.124939
-0.325967	to Microsoft compilers. //	-0.124939
-0.350926	of S1 aligned //	-0.124939
-0.349880	x * x; //	-0.124939
-1.124020	size = 100; //	-0.124939
-0.557834	2.20 or later //	-0.124939
-0.349115	SSE4.1 // AVX2 //	-0.124939
-0.414379	constructor // constructor //	-0.124939
-0.451130	// default constructor //	-0.124939
-0.533221	a[i].u[1] * 2; //	-0.124939
-0.349591	functions go here //	-0.124939
-0.768317	short int a; //	-0.124939
-0.041161	Take the example: //	-0.124939
-0.003285	time. For example: //	-0.124939
-0.003285	etc. For example: //	-0.124939
-0.003285	cases. For example: //	-0.124939
-0.001639	to. For example: //	-0.425969
-0.003285	structure. For example: //	-0.124939
-0.003285	sizes. For example: //	-0.124939
-0.003285	lookup. For example: //	-0.124939
-0.003285	valid. For example: //	-0.124939
-0.003285	predictable. For example: //	-0.124939
-0.003285	combined. For example: //	-0.124939
-0.003285	completely. For example: //	-0.124939
-0.041161	the following example: //	-0.124939
-0.041161	ARRAYSIZE. Another example: //	-0.124939
-0.098219	This is slow //	-0.124939
-0.116346	b; int d; //	-0.124939
-0.116346	7 int d; //	-0.124939
-0.506925	c + d; //	-0.124939
-0.634360	loop through rows //	-0.124939
-0.344538	// Called directly //	-0.124939
-0.491881	c2; double temp; //	-0.124939
-0.344257	outside both loops //	-0.124939
-0.292370	SSE2 // SSE4.1 //	-0.124939
-0.292370	vectorized with SSE4.1 //	-0.124939
-0.442836	polymorphism with templates //	-0.124939
-0.018573	the code to: //	-0.124939
-0.001822	reduce this to: //	-0.425969
-0.003652	change this to: //	-0.124939
-0.003652	changing this to: //	-0.124939
-0.003652	Change this to: //	-0.124939
-0.018573	be optimized to: //	-0.124939
-0.018573	be reduced to: //	-0.124939
-0.004569	be changed to: //	-0.425969
-0.502845	using template metaprogramming //	-0.124939
-0.341384	for // multiply //	-0.124939
-0.624809	columns below diagonal //	-0.124939
-0.481124	optimizing // Time //	-0.124939
-0.502172	float x, y; //	-0.124939
-0.338847	to align arrays. //	-0.124939
-0.339995	// SSE3 required //	-0.124939
-0.339612	a different array. //	-0.124939
-0.495218	want to measure //	-0.124939
-0.003384	look like this: //	-0.602060
-0.003384	looks like this: //	-0.602060
-0.485291	int)a / 10; //	-0.124939
-0.528159	organized as follows: //	-0.124939
-0.129780	b * c); //	-0.425969
-0.033853	_mm_mullo_epi16 (b, c); //	-0.425969
-0.436896	public: int a[100]; //	-0.124939
-0.338214	size = 256; //	-0.124939
-0.435258	of type T //	-0.124939
-0.436349	powers of 2: //	-0.124939
-0.336908	template metaprogramming is. //	-0.124939
-0.746895	manual for details. //	-0.124939
-0.337343	x2*x, x2, x); //	-0.124939
-0.333330	positive integer constant. //	-0.124939
-0.333330	C; } polynomial //	-0.124939
-0.241485	C-style type casting //	-0.124939
-0.241485	Constructor-style type casting //	-0.124939
-0.334337	int)b / 16; //	-0.124939
-0.334337	int)b % 16; //	-0.124939
-0.000843	int parm2) {...} //	-0.301030
-0.333833	// Example 13.1 //	-0.124939
-0.001687	LoadVector(cc + i); //	-0.301030
-0.001687	LoadVector(bb + i); //	-0.602060
-0.328966	Use simple method. //	-0.124939
-0.328966	error return a[i]; //	-0.124939
-0.329564	For unused returns //	-0.124939
-0.601886	{ // n! //	-0.124939
-0.426600	function from www.agner.org/optimize/asmlib.zip. //	-0.124939
-0.236044	int cc[size] ); //	-0.124939
-0.236044	int aa[size] ); //	-0.124939
-0.328966	as entry point. //	-0.124939
-0.329564	byte at 13 //	-0.124939
-0.236044	cc); } #endif //	-0.124939
-0.236044	FUNCNAME SelectAddMul_AVX2 #endif //	-0.124939
-0.322501	by writing: 103 //	-0.124939
-0.011515	_controlfp_s(&dummy, 0, _EM_OVERFLOW); //	-0.425969
-0.011515	// _controlfp(0, _EM_OVERFLOW); //	-0.425969
-0.167004	x^2 // x^4 //	-0.124939
-0.167004	x2; // x^4 //	-0.124939
-0.056625	*(int*)&x |= 0x80000000; //	-0.124939
-0.056625	x.i |= 0x80000000; //	-0.124939
-0.179294	u.i ^= 0x80000000; //	-0.124939
-0.323238	to the dispatcher. //	-0.124939
-0.322501	it with 1: //	-0.124939
-0.323238	converted to unsigned. //	-0.124939
-0.322501	induction variable Y //	-0.124939
-0.312895	provokes an error. //	-0.124939
-0.171324	__declspec(align(16)) X #else //	-0.124939
-0.171324	"memory" ); #else //	-0.124939
-0.065274	list of numbers: //	-0.124939
-0.065274	floating point numbers: //	-0.124939
-0.065274	of 100 numbers: //	-0.124939
-0.171324	series of calculations: //	-0.124939
-0.171324	to modulo calculations: //	-0.124939
-0.236366	TILESIZE = 8; //	-0.124939
-0.171324	exponent : 8; //	-0.124939
-0.171324	joining the operations: //	-0.124939
-0.171324	avoid modulo operations: //	-0.124939
-0.311936	risk of overflow: //	-0.124939
-0.707891	the following way: //	-0.124939
-0.570490	// Polynomial coefficients //	-0.124939
-0.015422	be replaced with: //	-0.425969
-0.065274	}; Replace with: //	-0.124939
-0.311936	0 - 30 //	-0.124939
-0.311936	this example: 38 //	-0.124939
-0.065455	the sign bit: //	-0.425969
-0.020687	int a[size], b[size]; //	-0.124939
-0.010220	float a[size], b[size]; //	-0.425969
-0.023343	two = _mm_set1_epi16(2); //	-0.425969
-0.291547	data #ifdef _MSC_VER //	-0.124939
-0.534190	{temp=x; x=y; y=temp;} //	-0.124939
-0.379564	} // x^2 //	-0.124939
-0.379564	library #include <stdio.h> //	-0.124939
-0.534190	C = 3.3; //	-0.124939
-0.048013	vectorized #include <dvec.h> //	-0.124939
-0.048013	114 #include <dvec.h> //	-0.124939
-0.102002	such a case: //	-0.124939
-0.102002	to lower case: //	-0.124939
-0.291547	outside the loop: //	-0.124939
-0.048013	classes #include "vectorclass.h" //	-0.124939
-0.048013	dispatching #include "vectorclass.h" //	-0.124939
-0.023343	a table lookup: //	-0.425969
-0.102002	= _mm_or_si128(c2, bc); //	-0.124939
-0.102002	= _mm_andnot_si128(mask, bc); //	-0.124939
-0.023343	= _mm_add_epi16(c, two); //	-0.425969
-0.291547	divisible by TILESIZE //	-0.124939
-0.291547	ReadTSC() - time1; //	-0.124939
-0.379564	SSE2 #include <emmintrin.h> //	-0.124939
-0.379564	diagonal swapd(a[r][c], a[c][r]); //	-0.124939
-0.023343	level = InstructionSet(); //	-0.425969
-0.023343	InstructionSet() #include "asmlib.h" //	-0.124939
-0.291547	_mm_blendv_epi8(bc, c2, mask); //	-0.124939
-0.023343	SIZE = 512; //	-0.425969
-0.023343	b * 1.2; //	-0.124939
-0.291547	consecutive elements c.load(cc+i); //	-0.124939
-0.534190	a template parameter: //	-0.124939
-0.291547	parm1, int parm2); //	-0.124939
-0.291547	x // x^n //	-0.124939
-0.023343	a lookup table: //	-0.425969
-0.023343	zero = _mm_set1_epi16(0); //	-0.425969
-0.102002	x^8 // x^10 //	-0.124939
-0.102002	// return x^10 //	-0.124939
-0.291547	the derived class: //	-0.124939
-0.102002	fraction : 23; //	-0.124939
-0.102002	n << 23; //	-0.124939
-0.048013	overflow is needed: //	-0.124939
-0.048013	bookkeeping is needed: //	-0.124939
-0.023343	= _mm_cmpgt_epi16(b, zero); //	-0.124939
-0.534190	<< "Hello "; //	-0.124939
-0.023343	Writes "Hello 1" //	-0.425969
-0.534190	= (double)(signed int)u; //	-0.124939
-0.291547	overflow has occurred. //	-0.124939
-0.291547	*const_cast<int*>(&x) += 2;} //	-0.124939
-0.048013	set is available: //	-0.124939
-0.048013	SSE2 is available: //	-0.124939
-0.534190	memset(a, 0, sizeof(a)); //	-0.124939
-0.291547	consecutive elements b.load(bb+i); //	-0.124939
-0.291547	the lrint function: //	-0.124939
-0.235562	to use SafeArray: //	-0.124939
-0.235562	in a union: //	-0.124939
-0.235562	the fraction bits: //	-0.124939
-0.235562	bit to zero: //	-0.124939
-0.235562	x^4 F32vec4 xx4(x4); //	-0.124939
-0.235562	to floating point: //	-0.124939
-0.235562	set is enabled: //	-0.124939
-0.235562	x^4 // x^8 //	-0.124939
-0.235562	<float, 100> list; //	-0.124939
-0.235562	b double precision: //	-0.124939
-0.235562	SIZE = 64; //	-0.124939
-0.235562	log(b[i]) + log(c[i]); //	-0.124939
-0.235562	= &Object2; p2->Hello(); //	-0.124939
-0.235562	(".type CriticalFunction, @gnu_indirect_function"); //	-0.124939
-0.235562	a positive integer: //	-0.124939
-0.235562	smallest members last: //	-0.124939
-0.235562	Z += A2; //	-0.124939
-0.235562	9.6b. #include "xmmintrin.h" //	-0.124939
-0.235562	loop control condition: //	-0.124939
-0.235562	calculation more efficient: //	-0.124939
-0.235562	a[arraysize], b[arraysize], c[arraysize]; //	-0.124939
-0.235562	(Intel) #include <pmmintrin.h> //	-0.124939
-0.235562	return _mm_loadu_si128((__m128i const*)p);} //	-0.124939
-0.235562	in the arrays: //	-0.124939
-0.235562	case of underflow: //	-0.124939
-0.235562	the runtime polymorphism: //	-0.124939
-0.235562	is unsigned Examples: //	-0.124939
-0.235562	SelectAddMul_pointer = &SelectAddMul_dispatch; //	-0.124939
-0.235562	floating point variable: //	-0.124939
-0.235562	nfac *= n+1; //	-0.124939
-0.235562	T // Constructor //	-0.124939
-0.235562	nn ifbit=1 bitofn //	-0.124939
-0.235562	of a double: //	-0.124939
-0.235562	loop and reorganize: //	-0.124939
-0.235562	two suggested improvements). //	-0.124939
-0.235562	optimizing away cpuid //	-0.124939
-0.235562	*)alloca(n * sizeof(float)); //	-0.124939
-0.235562	Intel vector classes): //	-0.124939
-0.235562	{ return ipow(x,10); //	-0.124939
-0.235562	a = (int)d; //	-0.124939
-0.235562	2'nd order polynomial: //	-0.124939
-0.235562	xxn * _mm_load_ps(coef+i); //	-0.124939
-0.235562	defined(__unix__) || defined(__GNUC__) //	-0.124939
-0.235562	method using InstructionSet(): //	-0.124939
-0.235562	__declspec(align(64)) int BigArray[1024]; //	-0.124939
-0.235562	array of structures: //	-0.124939
-0.235562	// Example 7.45 //	-0.124939
-0.235562	(2,2,2,2,2,2,2,2) Is16vec8 two(2,2,2,2,2,2,2,2); //	-0.124939
-0.235562	DynamicArray[i] = WhateverFunction(i); //	-0.124939
-0.235562	u.i &= 0x7FFFFFFF; //	-0.124939
-0.235562	compare absolute values: //	-0.124939
-0.235562	the static keyword: //	-0.124939
-0.235562	a single comparison: //	-0.124939
-0.235562	SSE2 instruction set: //	-0.124939
-0.235562	with the reciprocal: //	-0.124939
-0.235562	Template Library (WTL): //	-0.124939
-0.235562	a loop counter: //	-0.124939
-0.235562	Time // Serialize //	-0.124939
-0.235562	can be used: //	-0.124939
-0.235562	// Example 14.21. //	-0.124939
-0.235562	memset and memcpy: //	-0.124939
-0.235562	from library asmlib.. //	-0.124939
-0.235562	a[i] = 0.0; //	-0.124939
-0.235562	CriticalFunction = &CriticalFunction_Dispatch; //	-0.124939
-0.235562	vector, uses SSE3. //	-0.124939
-0.235562	b = lrint(d); //	-0.124939
-0.235562	a common denominator: //	-0.124939
-0.235562	chain in two: //	-0.124939
-0.235562	4, we have: //	-0.124939
-0.235562	by using memset: //	-0.124939
-0.235562	// define fprintf //	-0.124939
-0.235562	fraction : 52; //	-0.124939
-0.235562	versions #include "instrset_detect.cpp" //	-0.124939
-0.235562	// x,y coordinates //	-0.124939
-0.235562	(0,0,0,0,0,0,0,0) Is16vec8 zero(0,0,0,0,0,0,0,0); //	-0.124939
-0.235562	type-casting its address: //	-0.124939
-0.235562	index changes fastest: //	-0.124939
-0.235562	array with alloca: //	-0.124939
-0.235562	//=2*A //=A*x*x+B*x+C //=DeltaY //	-0.124939
-0.235562	to the exponent: //	-0.124939
-0.235562	constant reference instead: //	-0.124939
-0.235562	doing type conversions: //	-0.124939
-0.235562	with element matrix[c][r]. //	-0.124939
-0.235562	SelectAddMul_pointer = &SelectAddMul_SSE2; //	-0.124939
-0.235562	SelectAddMul_SSE41, SelectAddMul_AVX2, SelectAddMul_dispatch; //	-0.124939
-0.235562	Intel vector classes: //	-0.124939
-0.235562	prototype CriticalFunctionType CriticalFunction_Dispatch; //	-0.124939
-0.235562	cout << x.f; //	-0.124939
-0.235562	for matrix a: //	-0.124939
-0.235562	fraction : 63; //	-0.124939
-0.235562	} return add_elements(s); //	-0.124939
-0.235562	them as integers: //	-0.124939
-0.235562	0x7FFFFF) | 0x3F800000; //	-0.124939
-0.235562	through pointers, e.g.: //	-0.124939
-0.235562	134) return FactorialTable[n]; //	-0.124939
-0.235562	in example 7.22. //	-0.124939
-0.235562	& obj1; p->f(); //	-0.124939
-0.235562	a pivot search: //	-0.124939
-0.235562	x2 * x2; //	-0.124939
-0.235562	initialize to x^0/0! //	-0.124939
-0.235562	in example 9.5b. //	-0.124939
-0.235562	or always false: //	-0.124939
-0.235562	code inside square: //	-0.124939
-0.235562	half a square. //	-0.124939
-0.235562	{ _mm_stream_pi((__m64*)dest, *(__m64*)&source); //	-0.124939
-0.235562	a certain interval: //	-0.124939
-0.235562	0.f, 0.f, 1.f); //	-0.124939
-0.235562	two induction variables: //	-0.124939
-0.235562	xxn *= xx4; //	-0.124939
-0.235562	exponent : 15; //	-0.124939
-0.235562	ReadTSC function. 154 //	-0.124939
-0.235562	the return statement: //	-0.124939
-0.235562	it a template: //	-0.124939
-0.235562	f = static_cast<float>(i); //	-0.124939
-0.235562	of this capability: //	-0.124939
-0.235562	int BigArray[1024] __attribute__((aligned(64))); //	-0.124939
-0.235562	// MOVNTQ _mm_empty(); //	-0.124939
-0.235562	by two gives: //	-0.124939
-0.235562	volatile int seconds; //	-0.124939
-0.235562	b * 1.2f; //	-0.124939
-0.235562	short int cc[]); //	-0.124939
-0.235562	= instrset_detect(); 116 //	-0.124939
-0.235562	_mm_and_si128(c2, mask); 110 //	-0.124939
-0.235562	exponent : 11; //	-0.124939
-0.235562	list[i] << endl; //	-0.124939
-0.235562	a * 2.5; //	-0.124939
-0.572465	changed to a =	-0.124939
-0.516698	critical function a =	-0.124939
-0.338661	{ int a =	-0.124939
-0.004729	else { a =	-0.522879
-0.005919	(b) { a =	-0.726999
-0.024178	(true) { a =	-0.124939
-0.559123	we have a =	-0.124939
-0.338661	8.4 double a =	-0.124939
-0.338661	function pointer a =	-0.124939
-0.338661	8.18 float a =	-0.124939
-0.338661	memory address a =	-0.124939
-1.216293	For example, a =	-0.124939
-0.338661	the case a =	-0.124939
-0.338661	a & a =	-0.124939
-0.053302	int b; a =	-0.124939
-0.006316	a, b; a =	-0.425969
-0.053302	bool b; a =	-0.124939
-0.338661	The solution a =	-0.124939
-0.338661	{ ... a =	-0.124939
-0.103336	the expression a =	-0.124939
-0.137124	element __m128i a =	-0.124939
-0.137124	operations: __m128i a =	-0.124939
-0.049781	* c; a =	-0.124939
-0.049781	/ c; a =	-0.124939
-0.024178	b, c; a =	-0.425969
-0.049781	% c; a =	-0.124939
-0.438005	a && a =	-0.124939
-0.338661	a | a =	-0.124939
-0.086451	7.10b char a =	-0.124939
-0.086451	8.17 char a =	-0.124939
-0.086451	7.9b char a =	-0.124939
-0.338661	c, d; a =	-0.124939
-0.030439	/ 10; a =	-0.124939
-0.030439	% 10; a =	-0.124939
-0.137124	/ 16; a =	-0.124939
-0.137124	% 16; a =	-0.124939
-0.338661	c; Is16vec8 a =	-0.124939
-0.338661	Example 7.2 a =	-0.124939
-0.338661	float 140 a =	-0.124939
-0.338661	b.load(bb+i); c.load(cc+i); a =	-0.124939
-0.338661	y, z; a =	-0.124939
-0.338661	1. Writing a =	-0.124939
-0.338661	Example 8.2b a =	-0.124939
-0.338661	Example 8.3b a =	-0.124939
-0.338661	Example 8.10b a =	-0.124939
-0.338661	{2.6f, 1.5f}; a =	-0.124939
-0.338661	{1.0f, 2.5f}; a =	-0.124939
-0.138584	to int x =	-0.124939
-0.138584	reduce int x =	-0.124939
-0.041468	efficient than x =	-0.124939
-0.087317	faster than x =	-0.124939
-0.063788	For example, x =	-0.124939
-0.343324	- 2, x =	-0.124939
-0.343324	x, y; x =	-0.124939
-0.065235	const double A =	-0.425969
-0.001539	const int size =	-0.505150
-0.121236	a, b; b =	-0.124939
-0.121236	0, b; b =	-0.124939
-0.290015	be 1 b =	-0.124939
-0.056401	b: __m128i b =	-0.425969
-0.121236	double c; b =	-0.124939
-0.121236	b, c; b =	-0.124939
-0.290015	= 0, b =	-0.124939
-0.290015	= a; b =	-0.124939
-0.290015	b: Is16vec8 b =	-0.124939
-0.290015	= -100, b =	-0.124939
-0.290015	= -1.0E8, b =	-0.124939
-0.290015	parabola (2.0f); b =	-0.124939
-0.290015	= 5.0f; b =	-0.124939
-0.290015	= Multiply(10,8); b =	-0.124939
-0.290015	= a+1; b =	-0.124939
-0.397829	eax = i =	-0.124939
-0.279756	work int i =	-0.124939
-0.279756	for example i =	-0.124939
-0.001222	for (int i =	-1.238882
-0.279756	s; 40 i =	-0.124939
-0.064782	(2,2,2,2,2,2,2,2) __m128i two =	-0.425969
-0.357102	= dummy[0]; clock =	-0.124939
-0.348060	8 * 4 =	-0.124939
-0.348060	8192 / 4 =	-0.124939
-0.347936	10 * 8 =	-0.124939
-0.347936	kb / 8 =	-0.124939
-0.355498	64) % 32 =	-0.124939
-0.334589	a & 0 =	-0.124939
-0.062630	a | 0 =	-0.425969
-0.061051	Multiply by constant =	-0.124939
-0.019412	Divide by constant =	-0.602060
-0.138462	i); // result =	-0.124939
-0.138462	c.load(cc+i); // result =	-0.124939
-0.047741	4 4 bytes =	-0.425969
-0.023215	unlimited 4 bytes =	-0.425969
-0.044669	unlimited 8 bytes =	-0.425969
-0.244280	= 2048 bytes =	-0.124939
-0.192176	changed to c =	-0.124939
-0.192176	0) { c =	-0.124939
-0.192176	overlap. If c =	-0.124939
-0.192176	= b; c =	-0.124939
-0.040669	c: __m128i c =	-0.425969
-0.192176	fast division c =	-0.124939
-0.040669	c, d; c =	-0.425969
-0.192176	* temp; c =	-0.124939
-0.192176	c: Is16vec8 c =	-0.124939
-0.192176	= 100, c =	-0.124939
-0.192176	* 3.5; c =	-0.124939
-0.192176	= 1.0E8, c =	-0.124939
-0.192176	{ CFALSE: c =	-0.124939
-0.192176	* (a+1); c =	-0.124939
-0.001624	b for (i =	-0.124939
-0.001624	0; for (i =	-0.124939
-0.000270	i; for (i =	-0.602060
-0.001624	b; for (i =	-0.124939
-0.000405	... for (i =	-0.726999
-0.000811	1; for (i =	-0.124939
-0.001624	zero for (i =	-0.124939
-0.000811	x; for (i =	-0.425969
-0.001624	temp; for (i =	-0.124939
-0.001624	3; for (i =	-0.124939
-0.001624	a[100]; for (i =	-0.124939
-0.001624	45 for (i =	-0.124939
-0.001624	r; for (i =	-0.124939
-0.001624	loop: for (i =	-0.124939
-0.001624	StringLength; for (i =	-0.124939
-0.001624	a[2]; for (i =	-0.124939
-0.001624	84 for (i =	-0.124939
-0.001624	timediff[NumberOfTests]; for (i =	-0.124939
-0.001624	printf("\nResults:"); for (i =	-0.124939
-0.010792	else { y =	-0.425969
-0.010792	(b) { y =	-0.425969
-0.094929	{ double y =	-0.124939
-0.094929	// return y =	-0.124939
-0.094929	the expression y =	-0.124939
-0.094929	+ c; y =	-0.124939
-0.094929	= a; y =	-0.124939
-0.094929	: b) y =	-0.124939
-0.010792	d, y; y =	-0.124939
-0.021859	100, y; y =	-0.124939
-0.021859	1.23456, y; y =	-0.124939
-0.021859	b1, b2; y =	-0.124939
-0.094929	may write: y =	-0.124939
-0.061912	(0,0,0,0,0,0,0,0) __m128i zero =	-0.425969
-0.328082	vector. If n =	-0.124939
-0.328082	for (int n =	-0.124939
-0.233531	16 1 byte =	-0.124939
-0.233531	32 1 byte =	-0.124939
-0.322691	r) { r =	-0.124939
-0.322691	example when r =	-0.124939
-0.001148	i++) { a[i] =	-0.191886
-0.008100	2) { a[i] =	-0.124939
-0.069470	loop ; a[i] =	-0.124939
-0.033347	2; i++) a[i] =	-0.124939
-0.033347	size; i++) a[i] =	-0.124939
-0.069470	multiplication here: a[i] =	-0.124939
-0.069470	avoids overflow: a[i] =	-0.124939
-0.069470	safe formula a[i] =	-0.124939
-0.060444	= 2.2, C =	-0.425969
-0.055800	= 20, columns =	-0.124939
-0.285914	= 10, columns =	-0.124939
-0.257489	C0 * p =	-0.124939
-0.257489	int i; p =	-0.124939
-0.257489	p->NotPolymorphic(); p->Hello(); p =	-0.124939
-0.257489	* p; p =	-0.124939
-0.348237	!(a < b) =	-0.124939
-0.244502	i++) { temp =	-0.124939
-0.132016	b, temp; temp =	-0.124939
-0.132016	c, temp; temp =	-0.124939
-0.132016	a[100], temp; temp =	-0.124939
-0.091630	0) { d =	-0.124939
-0.091630	14.20 double d =	-0.124939
-0.021160	c*x + d =	-0.124939
-0.043403	& b; d =	-0.124939
-0.043403	&& b; d =	-0.124939
-0.009277	double d; d =	-0.301030
-0.091630	{ DTRUE: d =	-0.124939
-0.215234	a[i+3]; } sum =	-0.124939
-0.094418	x; float sum =	-0.124939
-0.094418	a[100]; float sum =	-0.124939
-0.215234	int i, sum =	-0.124939
-0.215234	float list[size], sum =	-0.124939
-0.346601	; shift right =	-0.124939
-0.302822	a && true =	-0.124939
-0.302822	a || true =	-0.124939
-0.185953	specialization for N =	-0.124939
-0.022577	const int rows =	-0.301030
-0.058293	library int level =	-0.425969
-0.546866	i<300; i++){ list[i] =	-0.124939
-0.388405	i<301; i+=3){ list[i] =	-0.124939
-0.206614	CriticalFunctionType * CriticalFunction =	-0.124939
-0.206614	Generic version CriticalFunction =	-0.124939
-0.091134	SSE2 supported CriticalFunction =	-0.124939
-0.091134	AVX supported CriticalFunction =	-0.124939
-0.343368	DelayFiveSeconds() { seconds =	-0.124939
-0.206769	int i, f =	-0.124939
-0.206769	= (float)i; f =	-0.124939
-0.206769	= float(i); f =	-0.124939
-0.206769	f; f=i; f =	-0.124939
-0.040767	p) { *p =	-0.425969
-0.040767	char string[100], *p =	-0.425969
-0.055905	x, n, factorial =	-0.425969
-0.341898	85 ; eax =	-0.124939
-0.277917	a && false =	-0.124939
-0.277917	a || false =	-0.124939
-0.046109	c __m128i c2 =	-0.425969
-0.223961	the bit-mask: c2 =	-0.124939
-0.117176	stack ; ecx =	-0.124939
-0.117176	;edx=addressinr ; ecx =	-0.124939
-0.054648	i++) { j =	-0.425969
-0.029486	a & -1 =	-0.425969
-0.029486	a | -1 =	-0.425969
-0.132275	a ^ -1 =	-0.124939
-0.267730	value as xn =	-0.124939
-0.267730	{ float xn =	-0.124939
-0.338624	i; int Induction =	-0.124939
-0.051033	= 1.1, B =	-0.425969
-0.108871	Induction ; edx =	-0.124939
-0.108871	[esp+12] ; edx =	-0.124939
-0.034804	c __m128i bc =	-0.425969
-0.159922	inverted bit-mask: bc =	-0.124939
-0.015125	const int SIZE =	-0.301030
-0.007152	{ for (c =	-0.602060
-0.021818	rows for (c =	-0.124939
-0.159922	such as -(-a) =	-0.124939
-0.159922	a*(b+c) - -(-a) =	-0.124939
-0.159922	- n.a. -(-a) =	-0.124939
-0.329901	value as n! =	-0.124939
-0.160201	_mm_hadd_ps(x, x); s =	-0.124939
-0.072755	int s; s =	-0.124939
-0.072755	__m128 s; s =	-0.124939
-0.330313	= 4, Wednesday =	-0.124939
-0.330313	float list[size], sum1 =	-0.124939
-0.034804	a & ~a =	-0.425969
-0.159922	a ^ ~a =	-0.124939
-0.236980	i++) { b[i] =	-0.124939
-0.236980	size; i++) b[i] =	-0.124939
-0.048136	FuncType * SelectAddMul_pointer =	-0.124939
-0.048136	>= 2) SelectAddMul_pointer =	-0.124939
-0.048136	>= 8) SelectAddMul_pointer =	-0.124939
-0.048136	>= 5) SelectAddMul_pointer =	-0.124939
-0.122045	cos(x); } z =	-0.124939
-0.122045	= cos(x); z =	-0.124939
-0.122045	= sin(x); z =	-0.124939
-0.323423	int n; u.i =	-0.124939
-0.418935	sets) (line size) =	-0.124939
-0.323423	= 0, sum2 =	-0.124939
-0.005733	c; for (r =	-0.425969
-0.005733	temp; for (r =	-0.425969
-0.323423	trick that N1 =	-0.124939
-0.043958	bit-mask: __m128i mask =	-0.425969
-0.323423	A; double Y =	-0.124939
-0.065444	0x2710 and (set) =	-0.124939
-0.065444	we have (set) =	-0.124939
-0.065444	the formula: (set) =	-0.124939
-0.171757	a && !a =	-0.124939
-0.171757	a || !a =	-0.124939
-0.065444	a - a-a =	-0.124939
-0.065444	- n.a. a-a =	-0.124939
-0.065444	x-xxx---- a-(-b)=a+b a-a =	-0.124939
-0.065444	- n.a. x*x*x*x*x*x*x*x =	-0.124939
-0.031490	= ((a*x+b)*x+c)*x+d x*x*x*x*x*x*x*x =	-0.124939
-0.031490	x ((a*x+b)*x+c)*x+d x*x*x*x*x*x*x*x =	-0.124939
-0.171757	= 0; list[i+1] =	-0.124939
-0.171757	list[i] =0; list[i+1] =	-0.124939
-0.171757	{ double x2 =	-0.124939
-0.171757	1./2.09227E13}; float x2 =	-0.124939
-0.036997	= 1; list[i+2] =	-0.425969
-0.312835	C; double Z =	-0.124939
-0.313495	+ b[i]; c[i] =	-0.124939
-0.312835	= 0, s3 =	-0.124939
-0.312835	= 0, s2 =	-0.124939
-0.171757	- n.a. a+b =	-0.124939
-0.171757	algebra reductions: a+b =	-0.124939
-0.171757	= b+a a*b =	-0.124939
-0.171757	= b+a, a*b =	-0.124939
-0.020738	x; for (x =	-0.124939
-0.020738	1.0; for (x =	-0.124939
-0.020738	B; for (x =	-0.124939
-0.405783	is 8 kb =	-0.124939
-0.102279	*x; double x4 =	-0.124939
-0.102279	x^2 float x4 =	-0.124939
-0.102279	Induction; ; a[i+1] =	-0.124939
-0.102279	= Induction; a[i+1] =	-0.124939
-0.292405	1.f; float nfac =	-0.124939
-0.102279	a - a*0 =	-0.124939
-0.102279	- n.a. a*0 =	-0.124939
-0.102279	0 - a*1 =	-0.124939
-0.102279	- n.a. a*1 =	-0.124939
-0.292405	const int TILESIZE =	-0.124939
-0.292405	= 0x20, Saturday =	-0.124939
-0.102279	0 - a+0 =	-0.124939
-0.102279	- n.a. a+0 =	-0.124939
-0.102279	* b2); y1 =	-0.124939
-0.102279	y1, y2; y1 =	-0.124939
-0.102279	* reciprocal_divisor; y2 =	-0.124939
-0.102279	/ b1; y2 =	-0.124939
-0.102279	B, C; x.abc =	-0.124939
-0.102279	Example 7.40c x.abc =	-0.124939
-0.292405	* p2; p2 =	-0.124939
-0.292405	* p1; p1 =	-0.124939
-0.292405	const int ArraySize =	-0.124939
-0.023401	i++) { aa[i] =	-0.124939
-0.023401	{ for (c2 =	-0.124939
-0.292405	const int ABC =	-0.124939
-0.292405	a[100]; float s0 =	-0.124939
-0.535691	(a&&b) || (a&&c) =	-0.124939
-0.292405	const int NumberOfTests =	-0.124939
-0.292405	const int min =	-0.124939
-0.292405	The factor sizeof(S1) =	-0.124939
-0.102279	int i, largest_index =	-0.124939
-0.102279	= absvalue; largest_index =	-0.124939
-0.292405	i++) { list[i].a =	-0.124939
-0.292405	= order(i); matrix[j][0] =	-0.124939
-0.292405	= 0, s1 =	-0.124939
-0.102279	a+(b+c) - a*b+a*c =	-0.124939
-0.102279	- n.a. a*b+a*c =	-0.124939
-0.048136	(a&&b) || (a&&b&&c) =	-0.124939
-0.048136	(a&&c) || (a&&b&&c) =	-0.124939
-0.292405	const int ARRAYSIZE =	-0.124939
-0.048136	- a ^a =	-0.124939
-0.048136	~a a ^a =	-0.124939
-0.292405	= 0x10, Friday =	-0.124939
-0.102279	reductions as 0/a =	-0.124939
-0.102279	a - 0/a =	-0.124939
-0.102279	n.a. - (a&b)|(a&c) =	-0.124939
-0.102279	~(~a)=a x-xxxxx-- (a&b)|(a&c) =	-0.124939
-0.292405	const double log2 =	-0.124939
-0.535691	(a&&b) || (!a&&c) =	-0.124939
-0.102279	largest_abs) { largest_abs =	-0.124939
-0.102279	int absvalue, largest_abs =	-0.124939
-0.102279	y; ... x.a =	-0.124939
-0.102279	B, C; x.a =	-0.124939
-0.023401	x++) { Table[x] =	-0.124939
-0.102279	= B; x.c =	-0.124939
-0.102279	+ 2.; x.c =	-0.124939
-0.102279	+ 1.; x.b =	-0.124939
-0.102279	= A; x.b =	-0.124939
-0.102279	x-xxx---- a*b*c=a*(b*c) a+b+c+d =	-0.124939
-0.102279	(a+b)+c=a+(b+c) a+b+c=c+b+a a+b+c+d =	-0.124939
-0.023401	const int FactorialTable[13] =	-0.425969
-0.292405	= 2, Tuesday =	-0.124939
-0.048136	sqaure: for (r2 =	-0.124939
-0.048136	separately: for (r2 =	-0.124939
-0.292405	c first. b+c =	-0.124939
-0.236316	float b) {x =	-0.124939
-0.236316	n.a. - a<<b<<c =	-0.124939
-0.236316	and then 0+1.23456 =	-0.124939
-0.236316	ReadTSC(); CriticalFunction(); timediff[i] =	-0.124939
-0.236316	Weekdays { Sunday =	-0.124939
-0.236316	const float OneOrTwo5[2] =	-0.124939
-0.236316	b<c && a<c) =	-0.124939
-0.236316	i++) { time1 =	-0.124939
-0.236316	a - a/1 =	-0.124939
-0.236316	+ 1; x[1] =	-0.124939
-0.236316	= order(i); list[j].a =	-0.124939
-0.236316	__cpuid(dummy, 0); DontSkip =	-0.124939
-0.236316	* const Greek[4] =	-0.124939
-0.236316	const double A2 =	-0.124939
-0.236316	= 100, max =	-0.124939
-0.236316	x4*x4; double x10 =	-0.124939
-0.236316	{ 92 DynamicArray[i] =	-0.124939
-0.236316	{ // polynomial(x) =	-0.124939
-0.236316	NUMCOLUMNS; column++) matrix[row][column] =	-0.124939
-0.236316	sign bit: absvalue =	-0.124939
-0.236316	y2, reciprocal_divisor; reciprocal_divisor =	-0.124939
-0.236316	- n.a. (-a)*(-b) =	-0.124939
-0.236316	n.a. - a+b+c =	-0.124939
-0.236316	const float coef[16] =	-0.124939
-0.236316	- n.a. (a+b)+c =	-0.124939
-0.236316	const int NUMROWS =	-0.124939
-0.236316	&list[100] is (int)(&list[100]) =	-0.124939
-0.236316	i++) for (j =	-0.124939
-0.236316	- n.a. a+a+a+a =	-0.124939
-0.236316	x2*x2; double x8 =	-0.124939
-0.236316	(~a&c) | (b&c) =	-0.124939
-0.236316	a&(b|c) x-xxxx--x (a|b)&(a|c) =	-0.124939
-0.236316	the value -100+100+100 =	-0.124939
-0.236316	(!a&&c) || (b&&c) =	-0.124939
-0.236316	{ for (c1 =	-0.124939
-0.236316	= 1.0; list[i].b =	-0.124939
-0.236316	n.a. - andnot(a,a) =	-0.124939
-0.236316	const float lookup[2] =	-0.124939
-0.236316	+ 3.; x.d =	-0.124939
-0.236316	} x; x.f =	-0.124939
-0.236316	!a && !b =	-0.124939
-0.236316	= 1; a[1] =	-0.124939
-0.236316	column; for (row =	-0.124939
-0.236316	squares: for (r1 =	-0.124939
-0.236316	const int arraysize =	-0.124939
-0.236316	*temp; for (temp =	-0.124939
-0.236316	be repeated 1024/4 =	-0.124939
-0.236316	int a[2]; a[0] =	-0.124939
-0.236316	(a&&!b) || (!a&&b) =	-0.124939
-0.236316	as (critical stride) =	-0.124939
-0.236316	* 0.5 ns =	-0.124939
-0.236316	i++) { ab[i].b =	-0.124939
-0.236316	(columns * sizeof(float)) =	-0.124939
-0.236316	= 1.0; temp->b =	-0.124939
-0.236316	temp++) { temp->a =	-0.124939
-0.236316	static float list[] =	-0.124939
-0.236316	= 8, Thursday =	-0.124939
-0.236316	float * DynamicArray =	-0.124939
-0.236316	set int iset =	-0.124939
-0.236316	= 1, Monday =	-0.124939
-0.236316	| (~a&c) a&b&c&d =	-0.124939
-0.236316	~a ^ ~b =	-0.124939
-0.236316	c++) { a[c][r] =	-0.124939
-0.236316	lines is 8*1024/64 =	-0.124939
-0.236316	called with IsPowerOf2 =	-0.124939
-0.236316	(vector) reductions: ~(~a) =	-0.124939
-0.236316	b, c; x[0] =	-0.124939
-0.236316	= 100, NUMCOLUMNS =	-0.124939
-0.236316	(a&b)&(c&d) a ^0 =	-0.124939
-0.236316	row++) for (column =	-0.124939
-0.236316	Example 7.41b a.x =	-0.124939
-0.236316	+ d.x; a.y =	-0.124939
-0.236316	2; } list[300] =	-0.124939
-0.236316	0x40) % 0x20 =	-0.124939
-0.357566	library asmlib.. // or	-0.124939
-1.291773	when the function or	-0.124939
-0.818485	inline the function or	-0.124939
-0.557701	Put the function or	-0.124939
-1.135297	of a function or	-0.124939
-0.896646	the same function or	-0.124939
-0.341995	that no function or	-0.124939
-0.528961	test each function or	-0.124939
-0.442200	with any function or	-0.124939
-1.209976	a member function or	-0.124939
-0.786942	a single function or	-0.124939
-0.442200	at every function or	-0.124939
-0.341995	storage. No function or	-0.124939
-0.744743	the frame function or	-0.124939
-0.341995	or friend function or	-0.124939
-0.461580	be represented with or	-0.124939
-0.500960	in system code or	-0.124939
-0.553801	use assembly code or	-0.124939
-0.498881	use vectorized code or	-0.124939
-0.587703	an unsigned int or	-0.124939
-0.356788	to reflect this or	-0.124939
-0.764236	at compile time or	-0.425969
-0.846491	allocation of memory or	-0.124939
-0.443276	variables in memory or	-0.124939
-0.628286	around in memory or	-0.124939
-0.356848	to exchange data or	-0.124939
-0.590186	behind the program or	-0.124939
-0.528898	size of program or	-0.124939
-0.567252	initialization. The program or	-0.124939
-0.862811	in a vector or	-0.124939
-1.276463	from the same or	-0.124939
-0.523383	and virtual functions or	-0.124939
-0.470760	use intrinsic functions or	-0.124939
-0.470760	using intrinsic functions or	-0.124939
-0.347123	identify individual functions or	-0.124939
-0.655665	a particular CPU or	-0.124939
-0.839363	before the loop or	-0.124939
-0.986785	outside the loop or	-0.124939
-0.578545	vectorize a loop or	-0.124939
-0.591081	keyword is used or	-0.124939
-0.518171	file and one or	-0.124939
-0.456456	program has one or	-0.124939
-0.539046	from only one or	-0.124939
-0.419271	by using one or	-0.124939
-0.323693	but read one or	-0.124939
-0.419271	to just one or	-0.124939
-0.323693	and enable one or	-0.124939
-0.323693	and 22 one or	-0.124939
-0.323693	integer units, one or	-0.124939
-0.323693	purpose: Contain one or	-0.124939
-1.167475	the code cache or	-0.124939
-0.891652	this instruction set or	-0.124939
-0.270188	of the class or	-0.124939
-0.366331	to the class or	-0.124939
-0.469325	of a class or	-0.425969
-0.549373	in a class or	-0.124939
-0.549373	into a class or	-0.124939
-0.327678	the object's class or	-0.124939
-0.799959	with other compilers or	-0.124939
-0.501604	the code size or	-0.124939
-0.027419	Gnu, Clang, Intel or	-0.124939
-0.761689	of the pointer or	-0.124939
-0.426910	is a pointer or	-0.124939
-0.301736	it a pointer or	-0.124939
-0.301736	with a pointer or	-0.124939
-0.426910	return a pointer or	-0.124939
-0.217041	through a pointer or	-0.249877
-0.301736	transfer a pointer or	-0.124939
-0.301736	Unlike a pointer or	-0.124939
-0.301736	pass a pointer or	-0.124939
-0.381002	elimination A pointer or	-0.124939
-0.113253	return any pointer or	-0.124939
-0.113253	making any pointer or	-0.124939
-0.266876	A const pointer or	-0.124939
-0.266876	4 4 pointer or	-0.124939
-0.266876	8 8 pointer or	-0.124939
-0.266876	a simple pointer or	-0.124939
-0.266876	6 integer, pointer or	-0.124939
-0.266876	The returned pointer or	-0.124939
-0.266876	a variable, pointer or	-0.124939
-0.846397	a function library or	-0.124939
-0.350940	a graphics library or	-0.124939
-0.288043	is a float or	-0.124939
-0.288043	to a float or	-0.124939
-0.320055	Conversions of float or	-0.124939
-0.291254	bytes = float or	-0.425969
-0.131188	conversion from float or	-0.124939
-0.131188	conversions from float or	-0.124939
-0.320055	registers (8 float or	-0.124939
-0.314563	that is two or	-0.124939
-0.314563	may be two or	-0.124939
-0.523819	there are two or	-0.124939
-0.649072	There are two or	-0.124939
-0.294175	to have two or	-0.124939
-0.294175	CPUs have two or	-0.124939
-0.774709	chooses between two or	-0.124939
-0.407924	for doing two or	-0.124939
-0.314563	set. Make two or	-0.124939
-1.017067	of the object or	-0.124939
-0.350821	distributed as object or	-0.124939
-0.340162	Access to static or	-0.124939
-0.439893	inline or static or	-0.124939
-0.340162	rely on static or	-0.124939
-0.439893	all functions static or	-0.124939
-0.350327	in compiled C++ or	-0.124939
-0.350327	in C, C++ or	-0.124939
-0.800084	if the array or	-0.124939
-0.868798	of an array or	-0.124939
-0.491107	copying an array or	-0.124939
-0.332794	fixed size array or	-0.124939
-0.332794	allocation Any array or	-0.124939
-0.523315	small as possible or	-0.124939
-0.503170	Accessing a variable or	-0.124939
-0.503170	treat a variable or	-0.124939
-0.337018	that no variable or	-0.124939
-0.435941	global const variable or	-0.124939
-0.500719	avoid global variables or	-0.124939
-1.762206	power of 2 or	-0.124939
-0.652846	an import table or	-0.124939
-0.503069	out of order or	-0.124939
-0.456135	unsigned long long or	-0.124939
-0.456135	int32_t long long or	-0.124939
-0.586956	running in 32-bit or	-0.124939
-0.547650	first data member or	-0.124939
-0.341450	a different way or	-0.124939
-0.344908	in one way or	-0.124939
-0.344908	randomly one way or	-0.124939
-0.356281	a #define, const or	-0.124939
-0.355131	1, 2, 4 or	-0.124939
-0.576354	destructor to call or	-0.124939
-0.510245	or 16 8 or	-0.124939
-0.347289	the lower 8 or	-0.124939
-0.355398	the entire 64 or	-0.124939
-0.778042	whole program optimization or	-0.124939
-0.354968	(dynamically linked libraries or	-0.124939
-0.328318	than by pointers or	-0.124939
-0.295584	accessed through pointers or	-0.124939
-0.328318	may contain pointers or	-0.124939
-0.354699	>> can test or	-0.124939
-0.150559	allocated with new or	-0.124939
-0.336336	dynamically (with new or	-0.124939
-0.354683	in 16-bit systems or	-0.124939
-0.521144	is file access or	-0.124939
-0.354429	be 8, 16 or	-0.124939
-0.674519	when the SSE2 or	-0.425969
-0.826360	unless the SSE2 or	-0.124939
-0.794902	enable the SSE2 or	-0.124939
-0.399314	Only for SSE2 or	-0.124939
-0.399314	instruction set SSE2 or	-0.124939
-0.652835	be ruled out or	-0.124939
-1.613927	the operating system or	-0.124939
-0.055835	to be 0 or	-0.124939
-0.041246	value than 0 or	-0.124939
-0.258379	values than 0 or	-0.425969
-0.286153	is always 0 or	-0.124939
-0.355896	macros with short or	-0.124939
-0.354202	single assembly instructions or	-0.124939
-0.354734	libraries. Use Gnu or	-0.124939
-0.806326	the most important or	-0.124939
-0.085100	with multiple CPUs or	-0.124939
-0.085100	use multiple CPUs or	-0.124939
-0.085100	Using multiple CPUs or	-0.124939
-0.534792	functions, inline assembly or	-0.124939
-0.481299	count is large or	-0.124939
-0.531021	is very large or	-0.124939
-0.567524	If the arrays or	-0.124939
-1.240582	floating point calculations or	-0.124939
-0.519999	is 128 bytes or	-0.124939
-0.519075	increase the speed or	-0.124939
-0.432091	software for speed or	-0.124939
-0.477164	optimizing execution speed or	-0.124939
-0.305614	with reduced speed or	-0.124939
-0.396855	at half speed or	-0.124939
-0.353571	done with single or	-0.124939
-0.576460	an Intel, AMD or	-0.124939
-0.808927	of an exception or	-0.124939
-0.358229	it is small or	-0.124939
-0.358229	function is small or	-0.124939
-0.406908	is very small or	-0.124939
-0.406908	time too small or	-0.124939
-0.461025	if an overflow or	-0.124939
-0.423464	would cause overflow or	-0.124939
-0.327054	to ignore overflow or	-0.124939
-0.340220	they are integers or	-0.124939
-0.340220	eight 16-bit integers or	-0.124939
-0.137388	78). A matrix or	-0.124939
-0.137388	needed? A matrix or	-0.124939
-0.562842	Enable the AVX or	-0.124939
-0.339666	compiled for AVX or	-0.124939
-0.339089	data into classes or	-0.124939
-0.535485	own container classes or	-0.124939
-0.468660	two double precision or	-0.124939
-0.468660	four double precision or	-0.124939
-0.377697	using single precision or	-0.124939
-0.377697	constant single precision or	-0.124939
-0.103435	a command line or	-0.124939
-0.650254	the compiler manual or	-0.124939
-0.570392	will be advantageous or	-0.124939
-0.825230	use a container or	-0.124939
-0.649208	operations involves eight or	-0.124939
-0.352298	loop with few or	-0.124939
-0.948612	a linked list or	-0.124939
-0.773641	a sorted list or	-0.124939
-0.478067	of the structure or	-0.124939
-0.036204	in a structure or	-0.425969
-0.075702	as a structure or	-0.124939
-0.075702	big a structure or	-0.124939
-0.075702	define a structure or	-0.124939
-0.167445	array of structure or	-0.124939
-0.167445	arrays of structure or	-0.124939
-0.211932	thread. This structure or	-0.124939
-0.211932	the same structure or	-0.124939
-0.211932	a class, structure or	-0.124939
-0.353301	critical functions inline or	-0.124939
-0.352088	that doesn't add or	-0.124939
-0.489954	in 64-bit mode or	-0.425969
-0.297699	In 64-bit mode or	-0.124939
-0.421533	Use 64-bit mode or	-0.124939
-0.352828	to valid values or	-0.124939
-0.352512	time loading files or	-0.124939
-0.352051	causes technical problems or	-0.124939
-0.516846	save cache space or	-0.124939
-0.856813	automatic CPU dispatching or	-0.124939
-0.454114	has many branches or	-0.124939
-0.351183	code involves multiplication or	-0.124939
-0.644674	the code automatically or	-0.124939
-0.479196	be an expression or	-0.124939
-0.466778	propagation An expression or	-0.124939
-0.171739	non-static data members or	-0.124939
-0.351090	inefficient code-based methods or	-0.124939
-0.271848	can be signed or	-0.124939
-0.271848	64-bit integer, signed or	-0.124939
-0.114992	2 int, signed or	-0.124939
-0.114992	short int, signed or	-0.124939
-0.271848	1 char, signed or	-0.124939
-0.351222	no try block or	-0.124939
-0.328384	conversion takes zero or	-0.124939
-0.328384	a was zero or	-0.124939
-0.453969	made with Microsoft or	-0.124939
-0.234282	is a reference or	-0.124939
-0.234282	Use a reference or	-0.124939
-0.350694	such as string or	-0.124939
-0.349977	may be three or	-0.124939
-0.560857	on table lookup or	-0.124939
-1.032657	of different types or	-0.124939
-0.326695	have mixed types or	-0.124939
-0.809245	floating point expressions or	-0.124939
-0.326695	buffer and read or	-0.124939
-0.326695	Do not read or	-0.124939
-0.631237	arrays are aligned or	-0.124939
-0.682533	#pragma vector aligned or	-0.124939
-0.301568	are properly aligned or	-0.124939
-0.642743	class is declared or	-0.124939
-0.321479	cores. A process or	-0.124939
-0.770290	the installation process or	-0.124939
-0.349826	give misleading results or	-0.124939
-0.450985	conversion. The constructor or	-0.124939
-0.348408	loading of modules or	-0.124939
-0.317448	u.d is negative or	-0.124939
-0.317448	both are negative or	-0.124939
-0.548570	cannot be predicted or	-0.124939
-0.739120	library is loaded or	-0.124939
-0.348370	whether the positive or	-0.124939
-0.261831	often be C or	-0.124939
-0.343443	together with C or	-0.124939
-0.261831	a separate C or	-0.124939
-0.261831	choose either C or	-0.124939
-1.169747	to turn off or	-0.124939
-0.315928	turn them off or	-0.124939
-0.348123	// Windows syntax or	-0.124939
-0.347918	by their index or	-0.124939
-0.709396	if the network or	-0.124939
-0.347625	setup but slow or	-0.124939
-0.347843	in library functions, or	-0.124939
-0.347055	supporting multiple platforms or	-0.124939
-0.346355	to each task or	-0.124939
-0.487518	cannot be inlined or	-0.124939
-0.348225	whether to repeat or	-0.124939
-0.346148	you can clear or	-0.124939
-0.329523	a hard disk or	-0.124939
-0.302710	network is overloaded or	-0.124939
-0.302710	cannot be overloaded or	-0.124939
-0.344821	to always true or	-0.124939
-0.301972	methods with little or	-0.124939
-0.301972	programmers have little or	-0.124939
-0.737359	it is initialized or	-0.124939
-0.286163	and the SSE or	-0.124939
-0.286163	has the SSE or	-0.124939
-0.123905	faster than reading or	-0.124939
-0.123905	rather than reading or	-0.124939
-0.673219	number of cores or	-0.124939
-0.830398	multiple CPU cores or	-0.124939
-0.673849	can be copied or	-0.124939
-0.297935	created, deleted, copied or	-0.124939
-0.773145	Windows, Linux, BSD or	-0.124939
-0.344439	a multithreaded program, or	-0.124939
-0.344144	compile- time loops or	-0.124939
-0.342360	functions, classes, templates or	-0.124939
-0.342360	a static buffer or	-0.124939
-0.342360	rather than seconds or	-0.124939
-0.342036	a Linux compiler, or	-0.124939
-0.342683	a different module or	-0.124939
-0.113680	for user input or	-0.124939
-0.442659	for each row or	-0.124939
-0.343007	a link map or	-0.124939
-0.342341	__asm int 3; or	-0.124939
-0.341264	with normal writes or	-0.124939
-0.373655	specific CPU brands or	-0.124939
-0.286716	processors. Other brands or	-0.124939
-0.340547	of unknown brand or	-0.124939
-0.342341	object by *p or	-0.124939
-0.339094	8, 10, 12 or	-0.124939
-0.338693	with a prediction or	-0.124939
-0.737448	to an integer, or	-0.124939
-0.339094	is called once or	-0.124939
-0.277422	__declspec(noalias) or __restrict or	-0.124939
-0.277422	the keyword __restrict or	-0.124939
-0.438045	a runtime DLL or	-0.124939
-0.331480	new and delete or	-0.124939
-0.502905	to class C1 or	-0.124939
-0.436212	will be called, or	-0.124939
-0.336321	important new update or	-0.124939
-0.620138	to be slower or	-0.124939
-0.336777	functionality without polymorphism or	-0.124939
-0.337234	can add, remove or	-0.124939
-0.519161	code if possible, or	-0.124939
-0.254187	function that reads or	-0.124939
-0.254187	program afterwards reads or	-0.124939
-0.335295	regardless of scope or	-0.124939
-0.253747	pointer, a reference, or	-0.124939
-0.639434	pointer or reference, or	-0.124939
-0.333178	up to five or	-0.124939
-0.128496	File access Reading or	-0.124939
-0.128496	the program. Reading or	-0.124939
-0.128496	random access. Reading or	-0.124939
-0.128496	= 0x1C. Reading or	-0.124939
-0.333706	that requires compilation or	-0.124939
-0.328815	function calling. __fastcall or	-0.124939
-0.328815	Files on remote or	-0.124939
-0.329443	has side effects or	-0.124939
-0.425664	Run multiple processes or	-0.124939
-0.330702	to the console or	-0.124939
-0.048076	string is created or	-0.425969
-0.484510	be a hundred or	-0.124939
-0.478071	to a command or	-0.124939
-0.709962	by the latency or	-0.124939
-0.329443	for Tuesday, Wednesday or	-0.124939
-0.425664	pressing a key or	-0.124939
-0.751178	the carry flag or	-0.124939
-0.323901	do not overlap or	-0.124939
-0.454638	they are needed, or	-0.124939
-0.322353	you use pre-increment or	-0.124939
-0.322353	a mouse move or	-0.124939
-0.322353	Data alignment. __declspec(align(16)) or	-0.124939
-0.848464	simple periodic pattern or	-0.124939
-0.323126	Any specific bottleneck or	-0.124939
-0.043853	Aligning RGB video or	-0.425969
-0.323901	by only 50% or	-0.124939
-0.322353	result, true (1) or	-0.124939
-0.322353	code are uncached or	-0.124939
-0.323901	to use, incompatible or	-0.124939
-0.210545	multiple calculations simultaneously or	-0.124939
-0.210545	more jobs simultaneously or	-0.124939
-0.121708	A branch tree or	-0.124939
-0.110941	a binary tree or	-0.124939
-0.187376	A binary tree or	-0.124939
-0.322353	to use hyperthreading or	-0.124939
-0.440377	32 bits each, or	-0.124939
-0.440377	large memory blocks, or	-0.124939
-0.311791	using templates. Two or	-0.124939
-0.404490	called square blocking or	-0.124939
-0.015416	link library (*.dll or	-0.425969
-0.065247	dynamic libraries (*.dll or	-0.124939
-0.312798	queue, list, database, or	-0.124939
-0.311791	a function parameter, or	-0.124939
-0.312798	for storing text or	-0.124939
-0.311791	estimate is correct or	-0.124939
-0.311791	the options -S or	-0.124939
-0.311791	be more (128 or	-0.124939
-0.171254	CPU only) -O3 or	-0.124939
-0.171254	or /Ox -O3 or	-0.124939
-0.065247	operand is infinity or	-0.124939
-0.065247	will be infinity or	-0.124939
-0.065247	zero or infinity or	-0.124939
-0.312798	function returns. Global or	-0.124939
-0.311791	not backwards. Copying or	-0.124939
-0.311791	running on. Replace or	-0.124939
-0.312798	framework for interpreting or	-0.124939
-0.291409	a graphics card or	-0.124939
-0.291409	step of interpretation or	-0.124939
-0.291409	are not overlapping or	-0.124939
-0.291409	the project window or	-0.124939
-0.291409	vector size (16 or	-0.124939
-0.291409	overloaded assignment operator, or	-0.124939
-0.379395	time. A for-loop or	-0.124939
-0.291409	header file stdint.h or	-0.124939
-0.291409	bigger segments (32-bit or	-0.124939
-0.291409	8-bit signed number, or	-0.124939
-0.291409	other resources locally or	-0.124939
-0.291409	a binary search, or	-0.124939
-0.291409	they are uninitialized or	-0.124939
-0.291409	an && expression, or	-0.124939
-0.291409	real-time speed. Delays or	-0.124939
-0.291409	programs use internet or	-0.124939
-0.291409	produce streaming audio or	-0.124939
-0.291409	by consecutive indices or	-0.124939
-0.291409	network is unstable or	-0.124939
-0.023334	libraries (*.lib, *.a) or	-0.425969
-0.291409	sorting and searching, or	-0.124939
-0.291409	in an array, or	-0.124939
-0.291409	for this purpose, or	-0.124939
-0.291409	a graphics coprocessor or	-0.124939
-0.291409	use for recovering or	-0.124939
-0.291409	want this initialization, or	-0.124939
-0.291409	a key press or	-0.124939
-0.379395	quickly to keyboard or	-0.124939
-0.291409	structure }; 52 or	-0.124939
-0.533948	new and delete, or	-0.124939
-0.235441	of an update, or	-0.124939
-0.235441	or use objconv or	-0.124939
-0.235441	dispatching to C1::Disp() or	-0.124939
-0.235441	a particular weakness or	-0.124939
-0.235441	element to x?" or	-0.124939
-0.235441	draw each pixel or	-0.124939
-0.235441	the keyword __thread or	-0.124939
-0.235441	this example, f(x) or	-0.124939
-0.235441	dynamic libraries (.dll or	-0.124939
-0.235441	binutils version 2.20 or	-0.124939
-0.235441	pressing a button or	-0.124939
-0.235441	can become imprecise or	-0.124939
-0.235441	a whole polygon or	-0.124939
-0.235441	the data optimally, or	-0.124939
-0.235441	the declaration "static" or	-0.124939
-0.235441	a computer game or	-0.124939
-0.235441	of inte- ger or	-0.124939
-0.235441	a whole workday or	-0.124939
-0.235441	compiler to vectorize, or	-0.124939
-0.235441	misses, branch misprediction, or	-0.124939
-0.235441	within a year or	-0.124939
-0.235441	structures with First-In-First-Out or	-0.124939
-0.235441	use the GetTickCount or	-0.124939
-0.235441	static libraries (.lib or	-0.124939
-0.235441	"standard stack frame" or	-0.124939
-0.235441	pointer aliasing. __declspec(noalias) or	-0.124939
-0.235441	computer is reset or	-0.124939
-0.235441	than allocating piecewise or	-0.124939
-0.235441	a program creates or	-0.124939
-0.235441	hacks that violate or	-0.124939
-0.235441	such as VHDL or	-0.124939
-0.235441	been reordered, inlined, or	-0.124939
-0.235441	the option /QaxAVX or	-0.124939
-0.235441	for speed /O2 or	-0.124939
-0.235441	allocation using new/delete or	-0.124939
-0.235441	memcpy, memmove, memset, or	-0.124939
-0.235441	available in 2015 or	-0.124939
-0.235441	integer is signed, or	-0.124939
-0.235441	assembly output (/FAs or	-0.124939
-0.235441	with enum, const, or	-0.124939
-0.235441	__asm ("int 3"); or	-0.124939
-0.235441	development kit (SDK or	-0.124939
-0.235441	the structure. Incrementing or	-0.124939
-0.235441	with option -fwrapv or	-0.124939
-0.235441	algorithm (e.g. Quine–McCluskey or	-0.124939
-0.235441	to a printer or	-0.124939
-0.235441	compiling for AVX2, or	-0.124939
-0.235441	methods are incremental or	-0.124939
-0.235441	on complicated criteria or	-0.124939
-0.235441	unless the Pentium-II or	-0.124939
-0.235441	timingtest.h from www.agner.org/optimize/testp.zip or	-0.124939
-0.235441	of valid addresses, or	-0.124939
-0.235441	CPUID instruction directly, or	-0.124939
-0.235441	out of range"); or	-0.124939
-0.235441	vector registers (XMM or	-0.124939
-0.235441	goes to C0::f or	-0.124939
-0.235441	computer while he or	-0.124939
-0.235441	like string, wstring or	-0.124939
-0.235441	with option -Wstrict-overflow=2, or	-0.124939
-0.358523	hour. Neither is it	-0.124939
-1.824773	the address of it	-0.124939
-0.494265	the compiler and it	-0.124939
-0.950402	a time and it	-0.124939
-0.561844	intrinsic functions and it	-0.124939
-0.916523	code cache and it	-0.124939
-0.453851	sequential order and it	-0.124939
-0.351219	some cases and it	-0.124939
-0.127749	function calls and it	-0.124939
-0.497863	the function, and it	-0.124939
-0.494265	VIA processors, and it	-0.124939
-0.351219	accurate, however, and it	-0.124939
-0.351219	with -fpic and it	-0.124939
-0.351219	quite inefficient, and it	-0.124939
-0.453851	more cores, and it	-0.124939
-0.453851	multiple layers and it	-0.124939
-0.351219	often fluctuating and it	-0.124939
-0.360507	code is that it	-0.124939
-0.382292	data is that it	-0.124939
-0.538088	set is that it	-0.124939
-0.382292	double is that it	-0.124939
-0.382292	problem is that it	-0.124939
-0.382292	optimizations is that it	-0.124939
-0.382292	storage is that it	-0.124939
-0.382292	algorithms is that it	-0.124939
-0.382292	volatile is that it	-0.124939
-0.538088	notice is that it	-0.124939
-1.174080	the code that it	-0.124939
-0.417062	same time that it	-0.124939
-0.717022	the functions that it	-0.425969
-0.315512	it so that it	-0.124939
-0.315512	function so that it	-0.124939
-0.445387	time so that it	-0.124939
-0.315512	less so that it	-0.124939
-0.315512	exception so that it	-0.124939
-0.315512	source so that it	-0.124939
-0.315512	section so that it	-0.124939
-0.315512	statement so that it	-0.124939
-0.315512	100 so that it	-0.124939
-0.315512	changed so that it	-0.124939
-0.315512	above, so that it	-0.124939
-0.315512	9.5 so that it	-0.124939
-0.315512	switches; so that it	-0.124939
-0.825979	are sure that it	-0.124939
-0.588768	so small that it	-0.124939
-0.588768	the advantage that it	-0.124939
-0.532379	loop-invariant expression that it	-0.124939
-0.026025	so high that it	-0.301030
-0.303479	function means that it	-0.124939
-0.673096	This means that it	-0.124939
-0.125753	variable means that it	-0.425969
-0.303479	10 means that it	-0.124939
-0.454050	compiler optimizations that it	-0.124939
-0.543213	or assume that it	-0.124939
-0.417062	table shows that it	-0.124939
-0.321919	so expensive that it	-0.124939
-0.321919	9.1 show that it	-0.124939
-0.060915	the event that it	-0.124939
-0.473797	I think that it	-0.124939
-0.321919	unfortunate consequence that it	-0.124939
-0.321919	program saying that it	-0.124939
-0.321919	so kludgy that it	-0.124939
-0.321919	later discovers that it	-0.124939
-0.321919	compiler knows that it	-0.124939
-0.321919	may argue that it	-0.124939
-0.095480	small or if it	-0.301030
-0.389429	pattern or if it	-0.124939
-0.578809	a function if it	-0.124939
-0.410331	object as if it	-0.124939
-0.756448	but not if it	-0.124939
-0.410331	extra time if it	-0.124939
-0.446725	RAM memory if it	-0.124939
-0.586422	works only if it	-0.124939
-0.415481	needed only if it	-0.124939
-0.726278	a loop if it	-0.124939
-0.633579	an integer if it	-0.124939
-0.502380	more efficient if it	-0.124939
-0.410331	possible branch if it	-0.124939
-0.316504	function call if it	-0.124939
-0.465340	member pointers if it	-0.124939
-0.316504	large arrays if it	-0.124939
-0.719348	separate thread if it	-0.124939
-0.504717	must check if it	-0.124939
-0.410331	quite well if it	-0.124939
-0.060169	clock cycles if it	-0.425969
-0.410331	calculated fast if it	-0.124939
-0.811426	to see if it	-0.124939
-0.410331	variable global if it	-0.124939
-0.316504	switch statement if it	-0.124939
-0.316504	the costs if it	-0.124939
-0.316504	a destructor if it	-0.124939
-0.480665	only safe if it	-0.124939
-0.756448	most efficiently if it	-0.124939
-0.663793	may consider if it	-0.124939
-0.316504	error message if it	-0.124939
-0.316504	programming style if it	-0.124939
-0.316504	time consumer if it	-0.124939
-0.316504	separate subroutine if it	-0.124939
-1.238113	as long as it	-0.124939
-0.545000	and stored as it	-0.124939
-0.497194	is distributed as it	-0.124939
-0.353324	be executed as it	-0.124939
-0.759802	cycles more than it	-0.124939
-0.524533	implies more than it	-0.124939
-0.346867	more data than it	-0.124939
-0.346867	such systems than it	-0.124939
-0.488227	less important than it	-0.124939
-0.560732	as bigger than it	-0.124939
-0.636073	other purposes than it	-0.124939
-0.286690	is the time it	-0.124939
-0.674177	of the time it	-0.124939
-0.120106	to the time it	-0.425969
-0.406952	and the time it	-0.124939
-0.406952	with the time it	-0.124939
-0.157458	than the time it	-0.602060
-0.225831	this the time it	-0.425969
-0.541962	at the time it	-0.425969
-0.286690	includes the time it	-0.124939
-0.286690	Consider the time it	-0.124939
-0.286690	measuring the time it	-0.124939
-0.185690	time. The time it	-0.124939
-0.185690	installation The time it	-0.124939
-0.185690	bytes. The time it	-0.124939
-0.185690	calculations. The time it	-0.124939
-0.185690	prediction. The time it	-0.124939
-0.185690	doubled. The time it	-0.124939
-0.185690	run. The time it	-0.124939
-0.185690	tolerance. The time it	-0.124939
-0.185690	costs. The time it	-0.124939
-0.498591	memory each time it	-0.124939
-0.484861	how long time it	-0.124939
-0.215438	how much time it	-0.124939
-0.535653	re-allocated every time it	-0.124939
-0.536464	as last time it	-0.124939
-1.058126	reason to use it	-0.124939
-1.226185	you can use it	-0.124939
-0.322243	object x when it	-0.124939
-0.096245	mispredicted only when it	-0.124939
-0.454489	is used when it	-0.124939
-0.309861	mechanism even when it	-0.124939
-0.309861	inlined even when it	-0.124939
-0.309861	space, even when it	-0.124939
-0.322243	is compiled when it	-0.124939
-0.322243	by line when it	-0.124939
-0.322243	process running when it	-0.124939
-0.322243	many advantages when it	-0.124939
-0.322243	error message when it	-0.124939
-0.322243	is costly when it	-0.124939
-0.322243	is permissible when it	-0.124939
-0.322243	motion manually when it	-0.124939
-0.415442	a code then it	-0.124939
-0.345534	the program then it	-0.124939
-0.393162	library functions then it	-0.124939
-0.345534	instruction set then it	-0.124939
-0.263563	different compilers then it	-0.124939
-0.263563	object static then it	-0.124939
-0.486382	is available then it	-0.124939
-0.263563	clean up then it	-0.124939
-0.263563	is large then it	-0.124939
-0.263563	program execution then it	-0.124939
-0.263563	are necessary then it	-0.124939
-0.263563	not advantageous then it	-0.124939
-0.263563	a problem then it	-0.124939
-0.263563	already known then it	-0.124939
-0.263563	clock cycles then it	-0.124939
-0.263563	73) automatically then it	-0.124939
-0.263563	CPU core then it	-0.124939
-0.345534	array index then it	-0.124939
-0.263563	CPU efficiency then it	-0.124939
-0.263563	same resource then it	-0.124939
-0.376699	separate module then it	-0.124939
-0.263563	the debugger then it	-0.124939
-0.052438	version on, then it	-0.425969
-0.486382	If not, then it	-0.124939
-0.263563	non-sequential manner then it	-0.124939
-0.263563	big arrays, then it	-0.124939
-0.263563	(see below) then it	-0.124939
-0.263563	one segment then it	-0.124939
-0.263563	μs today, then it	-0.124939
-0.263563	been identified, then it	-0.124939
-0.263563	CPU dispatching, then it	-0.124939
-0.263563	been found, then it	-0.124939
-0.263563	too fine then it	-0.124939
-0.263563	to T+5, then it	-0.124939
-0.263563	poorly predictable, then it	-0.124939
-0.263563	be made) then it	-0.124939
-0.263563	is obvious, then it	-0.124939
-0.263563	too small, then it	-0.124939
-0.263563	not met then it	-0.124939
-1.487386	order to make it	-0.124939
-0.576029	advisable to make it	-0.124939
-0.555388	package and make it	-0.124939
-0.420301	instructions that make it	-0.124939
-0.420301	conditions that make it	-0.124939
-0.509180	parameters then make it	-0.124939
-0.943673	This is because it	-0.124939
-0.230694	member function because it	-0.425969
-0.368463	optimized code because it	-0.124939
-0.257203	Intel compiler because it	-0.124939
-0.420247	first time because it	-0.124939
-0.257203	can do because it	-0.124939
-0.257203	link library because it	-0.124939
-0.420247	very efficient because it	-0.124939
-0.337864	induction variable because it	-0.124939
-0.368463	uses pointers because it	-0.124939
-0.257203	is useful because it	-0.124939
-0.257203	cleaning up because it	-0.124939
-0.368463	eight times because it	-0.124939
-0.051454	not optimal because it	-0.124939
-0.257203	no cost because it	-0.124939
-0.257203	compiler mechanism because it	-0.124939
-0.257203	function just because it	-0.124939
-0.663737	is inefficient because it	-0.124939
-0.257203	different platforms because it	-0.124939
-0.257203	other constants because it	-0.124939
-0.257203	main executable because it	-0.124939
-0.257203	inherently parallel because it	-0.124939
-0.257203	several seconds because it	-0.124939
-0.051454	time consuming because it	-0.425969
-0.257203	be poor because it	-0.124939
-0.257203	multiple processes because it	-0.124939
-0.257203	plus one, because it	-0.124939
-0.257203	becomes simpler because it	-0.124939
-0.257203	particularly interesting because it	-0.124939
-0.257203	accessed non-sequentially because it	-0.124939
-0.257203	Sandy Bridge) because it	-0.124939
-0.257203	with alloca, because it	-0.124939
-0.257203	particularly risky because it	-0.124939
-1.057188	for the CPU it	-0.124939
-0.350668	a branch. If it	-0.124939
-0.453153	come first. If it	-0.124939
-0.350668	so. 58 If it	-0.124939
-0.533536	processors on which it	-0.124939
-0.353863	the array, which it	-0.124939
-0.398306	most cases, but it	-0.124939
-0.207182	the function, but it	-0.124939
-0.207182	latter function, but it	-0.124939
-0.514401	instruction set, but it	-0.124939
-0.398306	other CPUs, but it	-0.124939
-0.280119	clock cycles, but it	-0.124939
-0.561199	multiple threads, but it	-0.124939
-0.365610	reasonably well, but it	-0.124939
-0.365610	a pointer, but it	-0.124939
-0.280119	logarithm again, but it	-0.124939
-0.365610	such applications, but it	-0.124939
-0.280119	64-bit software, but it	-0.124939
-0.280119	simplest method, but it	-0.124939
-0.280119	CPU core, but it	-0.124939
-0.280119	the hint, but it	-0.124939
-0.280119	with profiling, but it	-0.124939
-0.280119	simple solution, but it	-0.124939
-0.280119	syntax restriction, but it	-0.124939
-0.280119	disk caching, but it	-0.124939
-0.280119	small subtasks, but it	-0.124939
-0.280119	container expandable, but it	-0.124939
-0.280119	considerable job, but it	-0.124939
-0.280119	code section, but it	-0.124939
-0.280119	is wrong, but it	-0.124939
-0.540312	than the one it	-0.124939
-0.892789	which instruction set it	-0.124939
-0.888259	is to do it	-0.124939
-0.565398	with the pointer it	-0.124939
-0.874842	that the object it	-0.124939
-0.306222	the point where it	-0.124939
-0.306222	public variable where it	-0.124939
-0.524979	be cases where it	-0.124939
-0.373092	are cases where it	-0.124939
-0.147656	few cases where it	-0.124939
-0.306222	= false where it	-0.124939
-0.306222	bit mode, where it	-0.124939
-0.306222	data cache, where it	-0.124939
-0.306222	of data", where it	-0.124939
-0.593857	has the value it	-0.124939
-0.356825	class C1, so it	-0.124939
-0.381406	pointer and makes it	-0.124939
-0.457725	and it makes it	-0.124939
-0.438261	doubled. This makes it	-0.124939
-0.438261	occurred. This makes it	-0.124939
-0.438261	local. This makes it	-0.124939
-0.433389	default, which makes it	-0.124939
-0.293050	class library makes it	-0.124939
-0.293050	Using pointers makes it	-0.124939
-0.381406	Dynamic linking makes it	-0.124939
-0.293050	static declaration makes it	-0.124939
-0.298845	an Intel before it	-0.124939
-0.547030	the stack before it	-0.124939
-0.298845	for overflow before it	-0.124939
-0.423057	actual values before it	-0.124939
-0.298845	of temp before it	-0.124939
-0.298845	the misprediction before it	-0.124939
-0.298845	or compilation before it	-0.124939
-0.298845	calculate (c+d) before it	-0.124939
-0.298845	second sub-vector before it	-0.124939
-0.440631	compiler and call it	-0.124939
-0.138902	After first call it	-0.124939
-0.562253	unit. For example, it	-0.124939
-0.562253	modularity. For example, it	-0.124939
-0.355923	the best optimization it	-0.124939
-0.355971	in most libraries it	-0.124939
-1.482832	to make sure it	-0.124939
-0.496983	you make sure it	-0.124939
-0.713471	then make sure it	-0.124939
-0.570448	than to access it	-0.124939
-0.428814	in this case it	-0.124939
-0.573893	In this case it	-0.124939
-0.657397	In most cases it	-0.124939
-0.385044	In many cases it	-0.124939
-0.295934	In some cases it	-0.425969
-0.296016	more complex cases it	-0.124939
-0.501818	zero than making it	-0.124939
-0.869466	what you want it	-0.124939
-0.532425	that we want it	-0.124939
-0.523473	the more important it	-0.124939
-0.354918	running on, while it	-0.124939
-0.553318	and the work it	-0.124939
-0.306423	same resources. But it	-0.124939
-0.306423	modern CPU. But it	-0.124939
-0.306423	function parameter. But it	-0.124939
-0.306423	such checks. But it	-0.124939
-0.306423	optimization issue. But it	-0.124939
-0.564047	memory and therefore it	-0.124939
-0.279874	find out whether it	-0.124939
-0.513981	may consider whether it	-0.124939
-0.279874	to evaluate whether it	-0.124939
-0.469809	when deciding whether it	-0.425969
-0.279874	to determine whether it	-0.124939
-0.436674	function and calculate it	-0.124939
-0.337602	compiler may calculate it	-0.124939
-0.523545	*p+2 and store it	-0.124939
-0.500175	see how well it	-0.124939
-0.432940	value and write it	-0.124939
-0.612559	read or write it	-0.124939
-0.353488	the size. However, it	-0.124939
-0.352796	multiplications. How was it	-0.124939
-0.334044	In other cases, it	-0.124939
-1.132091	In some cases, it	-0.124939
-0.575721	fits the microprocessor it	-0.124939
-0.333310	variable or replace it	-0.124939
-0.333310	predictable then replace it	-0.124939
-0.186119	internal references. Therefore, it	-0.124939
-0.186119	points to. Therefore, it	-0.124939
-0.186119	different applications. Therefore, it	-0.124939
-0.186119	other number. Therefore, it	-0.124939
-0.186119	is declared. Therefore, it	-0.124939
-0.253599	than another. Therefore, it	-0.124939
-0.186119	type int. Therefore, it	-0.124939
-0.186119	library. 78 Therefore, it	-0.124939
-0.186119	was programmed. Therefore, it	-0.124939
-0.186119	than PCs. Therefore, it	-0.124939
-0.186119	been calculated. Therefore, it	-0.124939
-0.520353	iteration. This allows it	-0.124939
-0.330078	do and what it	-0.124939
-0.427243	cannot change what it	-0.124939
-0.351654	In multithreaded applications it	-0.124939
-0.327739	an object after it	-0.124939
-0.327739	is accessed after it	-0.124939
-0.351398	class or give it	-0.124939
-1.412332	the loop control it	-0.124939
-0.540443	&list[8]); } Here, it	-0.124939
-0.350075	#if directives around it	-0.124939
-0.349905	RTTI then turn it	-0.124939
-0.550731	when in fact it	-0.124939
-0.449869	efficient, and sometimes it	-0.124939
-0.449869	CPU and prevent it	-0.124939
-0.534932	compiling. This prevents it	-0.124939
-0.450420	We can tell it	-0.124939
-0.347358	waste of time, it	-0.124939
-0.448670	instead of copying it	-0.124939
-0.347358	fast as accessing it	-0.124939
-0.347827	code and divide it	-0.124939
-0.487581	After each iteration it	-0.124939
-0.370501	returns even though it	-0.124939
-0.370501	nine, even though it	-0.124939
-0.394270	231 then convert it	-0.124939
-0.303517	compiler must convert it	-0.124939
-0.345547	that I consider it	-0.124939
-0.801712	of the program, it	-0.124939
-0.298930	the final program, it	-0.124939
-0.999328	the reason why it	-0.124939
-0.248066	clock cycles whenever it	-0.124939
-0.248066	be mispredicted whenever it	-0.124939
-0.248066	own initiative whenever it	-0.124939
-0.523228	allocation is used, it	-0.124939
-0.344222	back again. Obviously, it	-0.124939
-0.344071	and other features it	-0.124939
-0.342608	you should multiply it	-0.124939
-0.342608	} } Here it	-0.124939
-0.437480	than to delete it	-0.124939
-0.338030	generating overflow. Likewise, it	-0.124939
-0.188765	function is called, it	-0.124939
-0.156980	object is called, it	-0.124939
-0.338243	brutally interrupted. Now it	-0.124939
-0.472046	is to declare it	-0.124939
-0.334875	CPU by giving it	-0.124939
-0.334875	function bodies above, it	-0.124939
-0.433251	efficient than comparing it	-0.124939
-0.526402	generators. In general, it	-0.124939
-0.335122	checking In C++, it	-0.124939
-0.335369	reference to anything it	-0.124939
-0.330789	the specific event it	-0.124939
-0.931066	the other hand, it	-0.124939
-0.330789	declared or created it	-0.124939
-0.330496	vector classes Fortunately, it	-0.124939
-0.486015	readable but unfortunately it	-0.124939
-0.237032	pointers, etc. And it	-0.124939
-0.237032	to maintain. And it	-0.124939
-0.330789	porting between platforms, it	-0.124939
-0.330496	error and compare it	-0.124939
-0.330496	interpreted script languages, it	-0.124939
-0.324008	system crash. Furthermore, it	-0.124939
-0.472724	smaller by declaring it	-0.124939
-0.324008	a derived class, it	-0.124939
-0.324369	you should disable it	-0.124939
-0.927762	In other words, it	-0.124939
-0.313407	comes to optimization, it	-0.124939
-0.172033	time consuming. Sometimes it	-0.124939
-0.172033	simple tasks. Sometimes it	-0.124939
-0.313877	or size. Today, it	-0.124939
-0.313407	In large arrays, it	-0.124939
-0.313407	an integer variable, it	-0.124939
-0.102455	Program loading Often, it	-0.124939
-0.102455	or two. Often, it	-0.124939
-0.292950	logic allows it, it	-0.124939
-0.292950	and even worse, it	-0.124939
-0.292950	a PC. Nevertheless, it	-0.124939
-0.292950	of modern software, it	-0.124939
-0.292950	of Boolean algebra, it	-0.124939
-0.102455	For team projects, it	-0.124939
-0.102455	For one-man projects, it	-0.124939
-0.292950	11.1b automatically, although it	-0.124939
-0.292950	few parameters. Or it	-0.124939
-0.292950	of each method, it	-0.124939
-0.381284	object is accessed, it	-0.124939
-0.236795	the compiler recognizes it	-0.124939
-0.236795	we can see, it	-0.124939
-0.236795	is cached. Usually it	-0.124939
-0.236795	macro is referencing it	-0.124939
-0.236795	improve the performance, it	-0.124939
-0.236795	system which redirects it	-0.124939
-0.236795	a = (b*c)/d, it	-0.124939
-0.236795	dominating. At least, it	-0.124939
-0.236795	as possible. Typically it	-0.124939
-0.236795	programs do. Hence, it	-0.124939
-0.236795	optimized software design, it	-0.124939
-0.236795	weakness or bottleneck, it	-0.124939
-0.236795	new software project, it	-0.124939
-0.236795	matter of habit, it	-0.124939
-0.236795	All in all, it	-0.124939
-0.236795	1 by XOR'ing it	-0.124939
-0.236795	negative by AND'ing it	-0.124939
-0.236795	references. Most importantly, it	-0.124939
-0.236795	same as reflecting it	-0.124939
-0.236795	cases like these, it	-0.124939
-0.236795	iterative in nature, it	-0.124939
-1.525139	This is the function	-0.124939
-1.048599	value of the function	-0.425969
-1.001077	performance of the function	-0.124939
-1.479861	address of the function	-0.124939
-0.574435	changes of the function	-0.124939
-0.574435	scope of the function	-0.124939
-0.193579	static to the function	-0.124939
-1.067862	call to the function	-0.124939
-0.551739	available to the function	-0.124939
-0.551739	throw() to the function	-0.124939
-0.566531	function and the function	-0.124939
-0.834720	function, and the function	-0.124939
-0.489136	available in the function	-0.425969
-0.851857	declared in the function	-0.124939
-0.575688	dot in the function	-0.124939
-0.583321	consistent for the function	-0.124939
-0.788957	compiler that the function	-0.124939
-0.354859	disadvantage that the function	-0.124939
-0.788957	means that the function	-0.124939
-0.788957	possibility that the function	-0.124939
-0.788957	provided that the function	-0.124939
-0.547676	reference, or the function	-0.124939
-1.056755	advantageous if the function	-0.124939
-0.560273	reference if the function	-0.124939
-0.560273	lookup if the function	-0.124939
-0.560273	pure if the function	-0.124939
-0.860849	replaced by the function	-0.124939
-1.014971	obtained with the function	-0.124939
-0.561507	soon as the function	-0.124939
-1.306122	faster than the function	-0.124939
-0.126394	each time the function	-0.602060
-0.091520	first time the function	-0.425969
-0.206451	every time the function	-0.301030
-0.207621	next time the function	-0.124939
-0.494243	used when the function	-0.124939
-0.494243	even when the function	-0.124939
-0.708970	automatically when the function	-0.124939
-0.885061	initialized when the function	-0.124939
-0.180309	deallocated when the function	-0.425969
-0.494243	freed when the function	-0.124939
-0.576892	returning from the function	-0.124939
-1.331472	look at the function	-0.124939
-1.044849	to make the function	-0.425969
-1.005960	mode because the function	-0.124939
-0.564500	functions, but the function	-0.124939
-0.571190	(i.e. where the function	-0.124939
-0.745276	called before the function	-0.124939
-0.460372	stack before the function	-0.124939
-0.460372	freed before the function	-0.124939
-0.460372	restored before the function	-0.124939
-1.255193	to call the function	-0.124939
-0.839205	in case the function	-0.124939
-0.984868	look up the function	-0.124939
-0.737964	many times the function	-0.124939
-0.868273	you want the function	-0.124939
-1.246933	information about the function	-0.124939
-0.759001	that calls the function	-0.124939
-0.479357	arrays inside the function	-0.124939
-0.176649	declared inside the function	-0.124939
-1.455923	to calculate the function	-0.124939
-0.349466	to inline the function	-0.124939
-0.242427	cannot inline the function	-0.124939
-0.490519	optimization unless the function	-0.124939
-0.490519	clear unless the function	-0.124939
-0.536131	reference allows the function	-0.124939
-0.128812	class. Make the function	-0.124939
-0.128812	object. Make the function	-0.124939
-0.128812	returns. Make the function	-0.124939
-0.128812	alternatives: Make the function	-0.124939
-0.305109	than calling the function	-0.124939
-0.431417	before calling the function	-0.124939
-0.519849	method. When the function	-0.124939
-0.485319	93. Avoid the function	-0.124939
-0.546970	variable until the function	-0.124939
-0.344765	optimize across the function	-0.124939
-0.139033	dispatcher changes the function	-0.124939
-0.139033	__fastcall changes the function	-0.124939
-0.445692	may declare the function	-0.124939
-0.344765	for giving the function	-0.124939
-0.631991	by declaring the function	-0.124939
-0.344765	2. Put the function	-0.124939
-0.344765	are. Declare the function	-0.124939
-0.235997	function is a function	-0.602060
-1.308265	This is a function	-0.124939
-0.921291	example is a function	-0.124939
-0.931893	out of a function	-0.124939
-0.547224	type of a function	-0.124939
-0.931893	versions of a function	-0.124939
-0.799572	end of a function	-0.124939
-0.506370	object to a function	-0.124939
-0.506370	objects to a function	-0.124939
-0.506370	link to a function	-0.124939
-0.506370	executable to a function	-0.124939
-0.952245	applied to a function	-0.124939
-1.067578	variables in a function	-0.124939
-0.563694	branches in a function	-0.124939
-0.563694	piece in a function	-0.124939
-0.691238	know that a function	-0.124939
-0.790684	Assume that a function	-0.124939
-0.483342	says that a function	-0.124939
-0.556203	module or a function	-0.124939
-0.489784	or as a function	-0.124939
-0.589218	implemented as a function	-0.124939
-0.701683	name as a function	-0.124939
-0.489784	assignment, as a function	-0.124939
-0.528400	rather than a function	-0.124939
-0.958800	may have a function	-0.124939
-0.422451	next time a function	-0.124939
-0.422451	Every time a function	-0.124939
-0.564105	must use a function	-0.124939
-0.541589	comes when a function	-0.124939
-0.437142	much memory a function	-0.124939
-1.474907	to make a function	-0.124939
-0.439708	problem. If a function	-0.124939
-0.622836	part. If a function	-0.124939
-0.439708	BSD. If a function	-0.124939
-0.547533	as using a function	-0.124939
-0.679387	difference between a function	-0.124939
-1.027316	to call a function	-0.124939
-0.124745	Splitting up a function	-0.124939
-0.437142	function, while a function	-0.124939
-0.437142	support calls a function	-0.124939
-0.654972	function through a function	-0.124939
-0.460518	called through a function	-0.124939
-0.460518	address through a function	-0.124939
-0.300698	table inside a function	-0.124939
-0.358832	declared inside a function	-0.425969
-0.496754	expression contains a function	-0.124939
-0.337975	to inline a function	-0.124939
-0.518593	can replace a function	-0.124939
-0.136908	then sets a function	-0.124939
-0.136908	routine sets a function	-0.124939
-0.437142	or what a function	-0.124939
-0.337975	for inlining a function	-0.124939
-0.475961	useful whenever a function	-0.124939
-0.337975	} Obviously, a function	-0.124939
-0.337975	of exceptions a function	-0.124939
-0.437142	used. Whenever a function	-0.124939
-0.437142	pointers Calling a function	-0.124939
-0.337975	only). Specifies a function	-0.124939
-0.337975	inline. Replacing a function	-0.124939
-0.337975	by replacing a function	-0.124939
-0.337975	definition. Inlining a function	-0.124939
-0.337975	and publish a function	-0.124939
-0.065368	table // of function	-0.425969
-1.055667	excessive number of function	-0.124939
-1.091022	The disadvantage of function	-0.124939
-1.447998	The advantages of function	-0.124939
-0.180869	2.6 Choice of function	-0.124939
-0.355502	lazy binding of function	-0.124939
-0.355502	the chain of function	-0.124939
-0.523537	platforms. Comparison of function	-0.124939
-0.463115	these addresses to function	-0.124939
-0.568706	Virtual functions and function	-0.124939
-0.481846	only compilers and function	-0.124939
-0.481846	Intel compilers and function	-0.124939
-0.537509	Register allocation and function	-0.124939
-0.441559	of branches and function	-0.124939
-0.312670	many branches and function	-0.124939
-1.438098	} } The function	-0.124939
-0.819138	compile time. The function	-0.124939
-1.081075	the function. The function	-0.124939
-0.531576	14.19 below. The function	-0.124939
-0.352567	first call. The function	-0.124939
-0.352567	Intel/x86-compatible microprocessors. The function	-0.124939
-0.647236	following reasons: The function	-0.124939
-0.352567	library. 119 The function	-0.124939
-0.455558	using exceptions. The function	-0.124939
-0.352567	one instance. The function	-0.124939
-1.633510	be used for function	-0.124939
-0.461979	recovery information for function	-0.124939
-0.881878	a[SIZE][SIZE]) { // function	-0.124939
-0.128632	in matrix // function	-0.425969
-0.355172	int cc[]); // function	-0.124939
-0.355686	other compilers or function	-0.124939
-0.355686	many branches or function	-0.124939
-0.355686	try block or function	-0.124939
-0.503925	time spent on function	-0.124939
-0.878109	optimizations such as function	-0.124939
-0.354894	vector objects as function	-0.124939
-0.499381	the variable as function	-0.124939
-0.431039	functions // This function	-0.124939
-0.431039	required // This function	-0.124939
-0.978360	} } This function	-0.124939
-0.452255	different compilers. This function	-0.124939
-0.349959	CPU dispatching. This function	-0.124939
-0.349959	for overflow. This function	-0.124939
-0.510661	performance of this function	-0.124939
-0.510661	name of this function	-0.124939
-0.348914	0 // this function	-0.124939
-0.978944	can use this function	-0.124939
-0.348914	to inline this function	-0.124939
-0.347298	innermost loop A function	-0.124939
-0.347298	functions local A function	-0.124939
-0.347298	mutually incompatible. A function	-0.124939
-0.347298	function body. A function	-0.124939
-0.347298	a destructor. A function	-0.124939
-0.357676	AVX2 Mathematical vector function	-0.124939
-0.557875	details that make function	-0.124939
-0.569114	modifier can make function	-0.124939
-0.577226	using a different function	-0.124939
-0.579743	performance of different function	-0.124939
-0.564864	that the same function	-0.124939
-0.831633	If the same function	-0.124939
-0.975971	using the same function	-0.124939
-0.564864	calculate the same function	-0.124939
-0.858672	call any other function	-0.124939
-0.357480	function decides which function	-0.124939
-0.452109	sure that one function	-0.124939
-0.452109	created by one function	-0.124939
-0.764865	transferred from one function	-0.124939
-0.524773	recommend that no function	-0.124939
-0.813389	address of each function	-0.124939
-0.817522	once for each function	-0.124939
-0.435063	code at each function	-0.124939
-0.336319	to test each function	-0.124939
-0.237958	of times each function	-0.124939
-0.343756	many times each function	-0.124939
-0.357313	be able do function	-0.124939
-0.502792	used that most function	-0.124939
-0.590883	further by using function	-0.124939
-1.004178	that the Intel function	-0.124939
-0.531036	function in Intel function	-0.124939
-0.764342	using an Intel function	-0.124939
-0.343613	well optimized Intel function	-0.124939
-0.695074	call the library function	-0.124939
-0.485715	economize the library function	-0.124939
-0.296181	is a library function	-0.124939
-0.296181	as a library function	-0.124939
-0.429378	register. The library function	-0.124939
-0.429378	undocumented Intel library function	-0.124939
-0.461115	separately through multiple function	-0.124939
-1.109023	there are many function	-0.124939
-0.457875	application with many function	-0.124939
-0.457875	applications with many function	-0.124939
-0.452700	outside of any function	-0.124939
-0.452700	interfere with any function	-0.124939
-0.356537	clear correspondence between function	-0.124939
-0.409079	if the member function	-0.124939
-0.234085	and a member function	-0.124939
-0.234085	or a member function	-0.124939
-0.514912	as a member function	-0.124939
-0.234085	Make a member function	-0.124939
-0.234085	Calling a member function	-0.124939
-0.234085	force a member function	-0.124939
-0.322682	pointer or member function	-0.124939
-0.517401	a class member function	-0.124939
-0.244557	But each member function	-0.124939
-0.021290	A static member function	-0.301030
-0.244557	A const member function	-0.124939
-0.574802	a virtual member function	-0.124939
-0.244557	static Assume member function	-0.124939
-0.110603	a non-static member function	-0.425969
-0.322682	a polymorphic member function	-0.124939
-0.524932	to a const function	-0.124939
-0.578829	stack. This makes function	-0.124939
-0.349112	stack frame makes function	-0.124939
-1.125001	of the critical function	-0.124939
-0.448274	time the critical function	-0.124939
-0.168698	calls the critical function	-0.124939
-0.448274	When the critical function	-0.124939
-0.680268	of a critical function	-0.124939
-0.058566	// Call critical function	-0.124939
-0.648408	of the template function	-0.124939
-0.456312	example, the template function	-0.124939
-0.491596	than the simple function	-0.124939
-0.563413	calling a simple function	-0.124939
-0.564778	107). The Gnu function	-0.124939
-0.580328	without information about function	-0.124939
-0.535597	despite the extra function	-0.124939
-0.354802	for details. Use function	-0.124939
-0.565927	code. The best function	-0.124939
-0.463933	for a single function	-0.124939
-0.660328	only a single function	-0.124939
-0.463933	just a single function	-0.124939
-0.341671	in vectors. These function	-0.124939
-0.341671	Performance Primitives". These function	-0.124939
-0.447694	of the virtual function	-0.124939
-0.264240	when the virtual function	-0.124939
-0.264240	avoiding the virtual function	-0.124939
-0.335584	of a virtual function	-0.124939
-0.335584	for a virtual function	-0.124939
-0.342209	misprediction of virtual function	-0.124939
-0.342209	Call to virtual function	-0.124939
-0.342209	tables, and virtual function	-0.124939
-0.260808	the inefficient virtual function	-0.124939
-0.489564	critical function through function	-0.124939
-0.342343	to data through function	-0.124939
-0.498927	best. Some common function	-0.124939
-0.777715	in the thread function	-0.124939
-0.524728	of the power function	-0.124939
-0.340143	compilers and optimized function	-0.124939
-0.623081	the best optimized function	-0.124939
-0.354233	the dispatcher 128 function	-0.124939
-0.498010	allows only four function	-0.124939
-0.459069	deleted by another function	-0.124939
-0.296731	by making another function	-0.124939
-0.114927	F1 calls another function	-0.124939
-0.296731	file. Use another function	-0.124939
-0.321915	macro as inline function	-0.124939
-0.417057	using an inline function	-0.124939
-0.321915	functions An inline function	-0.124939
-0.337086	recommend that every function	-0.124939
-0.436026	breakpoints at every function	-0.124939
-0.435642	have a standard function	-0.124939
-0.336780	compiler includes standard function	-0.124939
-0.428982	calling the intrinsic function	-0.124939
-0.331467	that each intrinsic function	-0.124939
-1.063135	in a separate function	-0.124939
-0.871090	into a separate function	-0.124939
-1.398145	There are various function	-0.124939
-0.366634	that the dispatcher function	-0.124939
-0.366634	calls the dispatcher function	-0.124939
-0.372431	initialized. The dispatcher function	-0.124939
-0.285714	conditions. A dispatcher function	-0.124939
-0.351393	handling information. Each function	-0.124939
-0.266807	to a graphics function	-0.124939
-0.302052	unit. Various graphics function	-0.124939
-0.351187	speed-critical functions. Many function	-0.124939
-0.820260	of a linked function	-0.124939
-0.420368	which the calling function	-0.124939
-0.324572	object. The calling function	-0.124939
-0.350799	Asmlib My own function	-0.124939
-1.233676	to the appropriate function	-0.124939
-0.642105	the Gnu C function	-0.124939
-1.157362	to the desired function	-0.124939
-0.348379	separate storage. No function	-0.124939
-0.310981	the Intel math function	-0.124939
-0.310981	best optimized math function	-0.124939
-0.224151	of the inlined function	-0.124939
-0.326217	to the inlined function	-0.124939
-0.094446	inlining the frame function	-0.124939
-0.094446	turning the frame function	-0.124939
-0.287894	than a frame function	-0.124939
-0.094446	functions. A frame function	-0.124939
-0.094446	exception. A frame function	-0.124939
-0.276926	throw() throw() Assume function	-0.124939
-0.276926	__attribute(( const)) Assume function	-0.124939
-0.276926	#pragma ivdep Assume function	-0.124939
-0.557646	find the right function	-0.124939
-0.443449	"asmlib.h" // Define function	-0.124939
-0.443449	SelectAddMul_dispatch; // Define function	-0.124939
-0.346471	Intel CPU’s. Another function	-0.124939
-0.488008	of an overloaded function	-0.124939
-0.346115	}; // Any function	-0.124939
-0.344856	The string length function	-0.124939
-0.151263	is a linear function	-0.425969
-0.433715	ability to define function	-0.124939
-0.433715	fprintf // define function	-0.124939
-0.536072	If the latter function	-0.124939
-0.482224	A very time-consuming function	-0.124939
-0.214530	to a pure function	-0.124939
-0.152460	functions A pure function	-0.124939
-0.152460	code containing pure function	-0.124939
-0.152460	that contain pure function	-0.124939
-0.152460	it involves pure function	-0.124939
-0.286974	of the factorial function	-0.124939
-0.286974	the integer factorial function	-0.124939
-0.406963	to optimize across function	-0.124939
-0.373634	making optimizations across function	-0.124939
-0.440989	use the memcpy function	-0.124939
-0.109269	the CPU detection function	-0.124939
-0.097877	Intel CPU detection function	-0.124939
-0.719542	call a polymorphic function	-0.124939
-0.441210	here // Virtual function	-0.124939
-0.440324	justified for general function	-0.124939
-0.338123	predicted well. Even function	-0.124939
-0.476165	used for storing function	-0.124939
-0.338523	// call transpose function	-0.124939
-0.255566	Assembly name Intrinsic function	-0.124939
-0.255566	summarized below. Intrinsic function	-0.124939
-0.129198	if the dispatched function	-0.124939
-0.129198	If a dispatched function	-0.124939
-0.187802	Entry to dispatched function	-0.124939
-0.129198	calls another dispatched function	-0.124939
-0.334967	support SSE. Several function	-0.124939
-0.501424	116 // Set function	-0.124939
-0.054010	into a leaf function	-0.425969
-0.028849	function. A leaf function	-0.425969
-0.427879	test // Critical function	-0.124939
-0.237090	inside the pow function	-0.124939
-0.237090	} The pow function	-0.124939
-0.330861	a monotonically increasing function	-0.124939
-0.102644	of the strlen function	-0.124939
-0.102644	tested the strlen function	-0.124939
-0.420621	but the asmlib function	-0.124939
-0.324098	make a round function	-0.124939
-0.420199	of the lrint function	-0.124939
-0.324437	#pragma optimize(...) Fastcall function	-0.124939
-0.211555	directly: Library exp function	-0.124939
-0.211555	4 floats exp function	-0.124939
-0.313494	the virtual 53 function	-0.124939
-0.313494	library or API function	-0.124939
-0.406599	intermediates, loop counters, function	-0.124939
-0.313935	series. The exponential function	-0.124939
-0.406599	choose an up-to-date function	-0.124939
-0.313494	equally efficient. Simple function	-0.124939
-0.313935	} The InstructionSet() function	-0.124939
-0.293034	} The indirect function	-0.124939
-0.293034	though the 61 function	-0.124939
-0.381386	possible to distribute function	-0.124939
-0.293034	Compiler-specific keywords Fast function	-0.124939
-0.048225	4 ; mangled function	-0.124939
-0.048225	;alignby4 ; mangled function	-0.124939
-0.293034	vector always Optimize function	-0.124939
-0.293034	functions. A thread-safe function	-0.124939
-0.236869	that a user-defined function	-0.124939
-0.236869	problems. Avoid nested function	-0.124939
-0.236869	function or friend function	-0.124939
-0.236869	} // Branch/loop function	-0.124939
-0.236869	your own error-handling function	-0.124939
-0.236869	is a staircase function	-0.124939
-0.236869	before you. Optimized function	-0.124939
-0.236869	"instrset_detect.cpp" // instrset_detect function	-0.124939
-0.236869	call the std::unexpected() function	-0.124939
-0.236869	example, the DelayFiveSeconds function	-0.124939
-0.236869	sin(0.8); The sin function	-0.124939
-0.236869	was called from), function	-0.124939
-1.702325	so that the if	-0.124939
-0.503747	is how the if	-0.124939
-0.358020	while loop, the if	-0.124939
-0.830218	same thing and if	-0.124939
-0.357038	function, etc., and if	-0.124939
-0.656189	Pentium 4. The if	-0.124939
-0.357086	} 135 The if	-0.124939
-1.112299	code is that if	-0.124939
-0.569248	library is that if	-0.124939
-0.642995	This means that if	-0.124939
-0.129150	_controlfp(0, _EM_OVERFLOW); // if	-0.425969
-0.338762	is used or if	-0.124939
-0.338762	ruled out or if	-0.124939
-0.438132	very large or if	-0.124939
-0.075121	is small or if	-0.425969
-0.166011	very small or if	-0.124939
-0.338762	valid values or if	-0.124939
-0.438132	are negative or if	-0.124939
-0.338762	be predicted or if	-0.124939
-0.338762	library functions, or if	-0.124939
-0.338762	side effects or if	-0.124939
-0.338762	not overlap or if	-0.124939
-0.338762	periodic pattern or if	-0.124939
-0.338762	memory blocks, or if	-0.124939
-0.338762	is correct or if	-0.124939
-0.338762	is unstable or if	-0.124939
-0.338762	this initialization, or if	-0.124939
-0.338762	valid addresses, or if	-0.124939
-0.576957	inline a function if	-0.124939
-0.576957	inlining a function if	-0.124939
-1.277637	vectorize the code if	-0.124939
-0.717024	an error code if	-0.124939
-0.354719	is dead code if	-0.124939
-1.375990	the same as if	-0.124939
-0.355817	or object as if	-0.124939
-0.425450	used, but not if	-0.124939
-0.425450	memory, but not if	-0.124939
-0.425450	float, but not if	-0.124939
-0.352248	n. But not if	-0.124939
-0.357446	operator[] (unsigned int if	-0.124939
-1.515724	more efficient than if	-0.124939
-0.353630	are variables than if	-0.124939
-0.353630	binary form than if	-0.124939
-0.598410	from the compiler if	-0.124939
-0.570172	20; i++) { if	-0.124939
-0.028509	bool b) { if	-0.726999
-0.582360	== 0) { if	-0.124939
-0.260111	!= 0) { if	-0.124939
-0.903005	(int n) { if	-0.124939
-0.620455	F3(bool y) { if	-0.124939
-0.872621	no extra time if	-0.124939
-0.458887	allow compile- time if	-0.124939
-0.751875	= 0; } if	-0.124939
-0.774606	* 3; } if	-0.124939
-0.455106	return &CriticalFunction_AVX; } if	-0.124939
-1.188650	of the memory if	-0.124939
-0.557796	sequentially in memory if	-0.124939
-0.745749	from RAM memory if	-0.124939
-1.459134	in the program if	-0.124939
-0.854544	virtual member functions if	-0.124939
-0.533518	Avoid virtual functions if	-0.124939
-0.499548	thread, and only if	-0.124939
-0.499548	automatically but only if	-0.124939
-0.331095	are possible only if	-0.124939
-0.418260	this works only if	-0.124939
-0.418260	14.13b works only if	-0.124939
-0.331095	can run only if	-0.124939
-0.331095	such methods only if	-0.124939
-0.331095	is needed only if	-0.124939
-0.331095	to F1 only if	-0.124939
-0.357645	a blend instruction if	-0.124939
-1.358244	to floating point if	-0.124939
-0.782333	before the loop if	-0.124939
-0.537498	after the loop if	-0.124939
-0.908675	outside the loop if	-0.124939
-1.108509	of a loop if	-0.124939
-0.516645	out a loop if	-0.124939
-0.333442	is no loop if	-0.124939
-0.333442	; repeat loop if	-0.124939
-0.333442	avoiding infinite loop if	-0.124939
-0.357122	me manually, but if	-0.124939
-0.961461	can be used if	-0.124939
-1.060826	may be used if	-0.124939
-0.468995	only be used if	-0.124939
-0.468995	still be used if	-0.124939
-0.548491	joined into one if	-0.124939
-1.168343	the code cache if	-0.124939
-0.763361	of an integer if	-0.124939
-0.526600	as an integer if	-0.124939
-0.738838	a signed integer if	-0.124939
-1.731038	SSE2 instruction set if	-0.124939
-1.254730	later instruction set if	-0.124939
-0.443815	it, for example if	-0.124939
-0.443815	parts, for example if	-0.124939
-0.460364	rather than double if	-0.124939
-0.565232	efficient integer size if	-0.124939
-0.574885	Set function pointer if	-0.124939
-0.352062	a & b if	-0.124939
-0.352062	a | b if	-0.124939
-0.356368	re- usable library if	-0.124939
-0.356429	making them static if	-0.124939
-0.334377	compact and efficient if	-0.124939
-0.863016	much more efficient if	-0.124939
-0.466272	is most efficient if	-0.124939
-0.664011	are most efficient if	-0.124939
-1.321391	is less efficient if	-0.124939
-1.135730	is not possible if	-0.124939
-0.063964	is only possible if	-0.124939
-0.356571	the static version if	-0.124939
-0.460451	remove any objects if	-0.124939
-0.355829	than one variable if	-0.124939
-0.356105	and static variables if	-0.124939
-0.525488	power of 2 if	-0.522879
-0.667493	powers of 2 if	-0.124939
-0.826327	a lookup table if	-0.124939
-1.272871	improve the performance if	-0.124939
-0.529524	improvement in performance if	-0.124939
-0.348974	best possible branch if	-0.124939
-0.348974	a single branch if	-0.124939
-0.502657	more efficient way if	-0.124939
-0.615559	This is faster if	-0.124939
-0.615559	method is faster if	-0.124939
-0.434921	access is faster if	-0.124939
-0.573476	constant is faster if	-0.425969
-0.293744	vector goes faster if	-0.124939
-0.013551	// Still faster if	-0.726999
-0.571265	virtual function call if	-0.124939
-0.390372	vector. For example, if	-0.124939
-0.390372	operation. For example, if	-0.124939
-0.390372	factor. For example, if	-0.124939
-0.390372	conditions. For example, if	-0.124939
-0.390372	structures. For example, if	-0.124939
-0.390372	frequency. For example, if	-0.124939
-0.390372	exits. For example, if	-0.124939
-0.390372	modulo. For example, if	-0.124939
-0.390372	minimized. For example, if	-0.124939
-0.523095	dividend to unsigned if	-0.124939
-1.154336	in a register if	-0.124939
-0.103078	of function pointers if	-0.425969
-0.508769	of member pointers if	-0.124939
-0.544695	use 64-bit systems if	-0.124939
-1.024291	to the user if	-0.124939
-1.007478	method is useful if	-0.124939
-1.382866	may be useful if	-0.124939
-0.267699	different arrays even if	-0.124939
-0.267699	(|) works even if	-0.124939
-0.267699	clock cycles even if	-0.124939
-0.267699	an Intel, even if	-0.124939
-0.267699	be mispredicted even if	-0.124939
-0.267699	be called, even if	-0.124939
-0.267699	starts up, even if	-0.124939
-0.267699	more resources, even if	-0.124939
-0.267699	program execution, even if	-0.124939
-0.267699	(b*c) overflows, even if	-0.124939
-0.267699	exception handler, even if	-0.124939
-0.558796	avoid this method if	-0.124939
-0.355584	be left out if	-0.124939
-0.584373	protected operating system if	-0.124939
-0.355120	// return 0 if	-0.124939
-0.842519	is the case if	-0.124939
-1.007002	registers are available if	-0.124939
-0.630900	are only available if	-0.124939
-0.355186	be filled up if	-0.124939
-0.574556	detect an error if	-0.124939
-0.354912	can be important if	-0.124939
-0.443521	several different CPUs if	-0.124939
-0.343043	contemporary 106 CPUs if	-0.124939
-0.354073	compile time while if	-0.124939
-0.458031	of large arrays if	-0.124939
-0.552426	than 64-bit Windows if	-0.124939
-0.354823	the same result if	-0.124939
-0.538329	vectorization works best if	-0.124939
-0.568899	This is necessary if	-0.124939
-0.528837	are not necessary if	-0.124939
-0.543497	an array element if	-0.124939
-0.294576	clock cycles. But if	-0.124939
-0.294576	be obsolete. But if	-0.124939
-0.294576	was programmed. But if	-0.124939
-0.294576	the delay. But if	-0.124939
-0.294576	cache miss. But if	-0.124939
-0.294576	resource conflicts. But if	-0.124939
-0.354436	actually reduce speed if	-0.124939
-0.590749	7.20 int i; if	-0.124939
-0.177817	a separate thread if	-0.124939
-0.715586	use 64-bit integers if	-0.124939
-0.353956	source annotation option if	-0.124939
-0.522059	precision is good if	-0.124939
-0.551677	use single precision if	-0.124939
-0.353473	the memset line if	-0.124939
-0.867466	can be optimized if	-0.124939
-0.338942	float b[1000]; }; if	-0.124939
-0.338942	"Gamma", "Delta" }; if	-0.124939
-0.239310	y; bool b; if	-0.124939
-0.239310	z; bool b; if	-0.124939
-0.322593	have to check if	-0.124939
-0.322593	need to check if	-0.124939
-0.322593	necessary to check if	-0.124939
-0.410562	We can check if	-0.124939
-0.258405	{ // check if	-0.124939
-0.477800	does not check if	-0.124939
-0.258405	function must check if	-0.124939
-0.258405	as input check if	-0.124939
-0.458362	function is advantageous if	-0.124939
-0.458362	method is advantageous if	-0.124939
-0.444006	not be advantageous if	-0.124939
-0.629405	may be advantageous if	-0.124939
-0.543373	is more advantageous if	-0.124939
-0.338153	is no problem if	-0.124939
-0.338153	very big problem if	-0.124939
-0.546218	not an advantage if	-0.124939
-0.352699	// always 1 if	-0.124939
-0.708814	64 bit mode if	-0.124939
-0.434869	the denormals-are-zero mode if	-0.124939
-0.958883	have other values if	-0.124939
-0.335617	predicted quite well if	-0.124939
-0.483267	usually predicted well if	-0.124939
-0.761640	hundred clock cycles if	-0.124939
-0.525601	2-3 clock cycles if	-0.124939
-0.827071	int i; ... if	-0.124939
-0.334947	float list[size]; ... if	-0.124939
-0.536798	therefore not recommended if	-0.124939
-0.334140	is very fast if	-0.124939
-0.334140	is calculated fast if	-0.124939
-0.352904	function F1. However, if	-0.124939
-0.351901	than 32-bit programs if	-0.124939
-0.333183	compilers without problems if	-0.124939
-0.431130	can cause problems if	-0.124939
-0.226968	pointer if else if	-0.124939
-0.329783	else if else if	-0.124939
-0.578486	&CriticalFunction_AVX; } else if	-0.124939
-0.351843	the first application if	-0.124939
-0.351675	a loop automatically if	-0.124939
-0.327301	at to see if	-0.124939
-0.327301	result to see if	-0.124939
-0.327301	measurements to see if	-0.124939
-0.327301	listing to see if	-0.124939
-0.331560	simplest possible implementation if	-0.124939
-0.713783	the software implementation if	-0.124939
-0.562205	little more complicated if	-0.124939
-0.351493	the above methods if	-0.124939
-0.295099	be a disadvantage if	-0.124939
-0.295099	at a disadvantage if	-0.124939
-0.494055	value is zero if	-0.124939
-0.351754	returns. But what if	-0.124939
-0.747090	a const reference if	-0.124939
-0.561467	a table lookup if	-0.124939
-0.493910	than at runtime if	-0.124939
-0.427038	is not needed if	-0.425969
-0.539830	also stored together if	-0.124939
-0.350863	code becomes bigger if	-0.124939
-0.349905	by using vectors if	-0.124939
-0.350438	I don't know if	-0.124939
-0.350149	give inconsistent results if	-0.124939
-0.349358	inside one function, if	-0.124939
-0.556900	swap the operands if	-0.124939
-0.348903	in separate modules if	-0.124939
-0.349358	code becomes smaller if	-0.124939
-0.349814	at runtime here if	-0.124939
-0.490456	skip this section if	-0.124939
-0.354484	be cache contentions if	-0.124939
-0.354484	cause cache contentions if	-0.124939
-0.491123	address is predicted if	-0.124939
-0.349117	resources than C if	-0.124939
-0.318443	a variable global if	-0.124939
-0.318443	make variables global if	-0.124939
-0.766091	a switch statement if	-0.124939
-0.316028	can cause errors if	-0.124939
-0.316028	cause fatal errors if	-0.124939
-0.461630	be very inefficient if	-0.124939
-0.406368	be quite inefficient if	-0.124939
-0.347814	overflow by checking if	-0.124939
-0.347412	for Linux platforms if	-0.124939
-0.319697	can be vectorized if	-0.124939
-0.319697	also be vectorized if	-0.124939
-0.539478	about the costs if	-0.124939
-0.346830	is usually inlined if	-0.124939
-0.960656	b, c, d; if	-0.124939
-0.530750	make a destructor if	-0.124939
-0.454770	not be safe if	-0.124939
-0.244363	is only safe if	-0.124939
-0.244363	not thread safe if	-0.124939
-0.244363	is exception safe if	-0.124939
-0.347024	the loop further if	-0.124939
-0.346830	and complicated algorithm if	-0.124939
-0.518809	from the exponent if	-0.124939
-0.532889	the hard disk if	-0.124939
-0.930017	performance is obtained if	-0.124939
-0.125595	accessed most efficiently if	-0.124939
-0.125376	works most efficiently if	-0.124939
-0.381788	works less efficiently if	-0.124939
-0.345507	specific CPU models if	-0.124939
-0.228573	likely to fail if	-0.124939
-0.063690	code will fail if	-0.124939
-0.063690	It will fail if	-0.124939
-0.063690	trick will fail if	-0.124939
-0.485354	buffer can occur if	-0.124939
-0.511723	predict the target if	-0.124939
-0.218265	a program, especially if	-0.124939
-0.218265	32-bit systems, especially if	-0.124939
-0.218265	the file, especially if	-0.124939
-0.218265	time consuming, especially if	-0.124939
-0.344054	need the updates if	-0.124939
-0.373686	you may consider if	-0.124939
-0.217898	we will consider if	-0.124939
-0.290956	you must consider if	-0.124939
-0.344790	the function directly if	-0.124939
-1.109870	an error message if	-0.124939
-0.344054	calculations in parallel if	-0.124939
-0.345528	calculation becomes easier if	-0.124939
-0.298030	with the loops if	-0.124939
-0.298030	will unroll loops if	-0.124939
-0.781817	i; } u; if	-0.124939
-0.354470	i[2]; } u; if	-0.124939
-0.342585	delay is significant if	-0.124939
-0.584922	would be invalid if	-0.124939
-0.380580	counter becomes invalid if	-0.124939
-0.444641	cache is organized if	-0.124939
-0.848939	lot to gain if	-0.124939
-0.612832	errors can happen if	-0.124939
-0.292856	This will happen if	-0.124939
-0.445321	it doesn't matter if	-0.124939
-0.686263	oriented programming style if	-0.124939
-0.342287	of some help if	-0.124939
-0.342585	the following explanation if	-0.124939
-0.627203	function is pure if	-0.124939
-0.341094	is stored (or if	-0.124939
-0.814363	one clock cycle if	-0.124939
-0.339571	are more frequent if	-0.124939
-0.339905	the $B1$2 label if	-0.124939
-0.339238	floating point variables, if	-0.124939
-0.339571	be needed, however, if	-0.124939
-0.439570	such small devices if	-0.124939
-0.339571	prefetch data explicitly if	-0.124939
-0.478616	use lookup tables if	-0.124939
-0.337621	removed after debugging if	-0.124939
-0.336863	second operand. Likewise, if	-0.124939
-0.336863	time, but expensive if	-0.124939
-0.336863	language allows compile-time if	-0.124939
-0.725192	is more compact if	-0.124939
-0.503416	omitted, of course, if	-0.124939
-0.703060	is more complex if	-0.124939
-0.333716	is to detect if	-0.124939
-0.334155	CPU dispatching. Test if	-0.124939
-0.432348	conversion is costly if	-0.124939
-0.333716	caching is poor if	-0.124939
-0.333716	This typically happens if	-0.124939
-0.050972	not be evaluated if	-0.425969
-0.334155	can be permissible if	-0.124939
-0.333716	do so (i.e. if	-0.124939
-0.926509	the other hand, if	-0.124939
-0.329870	is taken, i.e. if	-0.124939
-0.329348	each object separately if	-0.124939
-0.329870	count as true, if	-0.124939
-0.418256	therefore need modification if	-0.124939
-0.038394	can be eliminated if	-0.425969
-0.080513	also be eliminated if	-0.124939
-0.322878	code is selected if	-0.124939
-0.322878	very high resolution if	-0.124939
-0.036950	unsigned // Faster if	-0.425969
-0.312303	the database anyway if	-0.124939
-0.313140	// Example 7.8 if	-0.124939
-0.312303	save RAM space, if	-0.124939
-0.312303	cause severe delays if	-0.124939
-0.312303	use of longjmp if	-0.124939
-0.571156	your programming questions if	-0.124939
-0.571156	template library (STL) if	-0.124939
-0.406160	y=temp;} // Check if	-0.124939
-0.441066	Windows) to determine if	-0.124939
-0.405124	cause branch mispredictions if	-0.124939
-0.291897	use multiple accumulators if	-0.124939
-0.534802	no pointer aliasing" if	-0.124939
-0.534802	3628800, 39916800, 479001600}; if	-0.124939
-0.291897	The copy constructor, if	-0.124939
-0.291897	u.f > v.f if	-0.124939
-0.023367	are many branches): if	-0.425969
-0.023367	} u, v; if	-0.425969
-0.291897	are relatively cheap if	-0.124939
-0.023367	}; Weekdays Day; if	-0.425969
-0.379994	a time consumer if	-0.124939
-0.379994	should be avoided, if	-0.124939
-0.291897	a separate subroutine if	-0.124939
-0.379994	of memory leaks if	-0.124939
-0.235870	// Example 14.5b if	-0.124939
-0.235870	(N & N-1)==0 if	-0.124939
-0.235870	not call WriteFile if	-0.124939
-0.235870	int n; 143 if	-0.124939
-0.235870	(Not A Number) if	-0.124939
-0.235870	many function calls, if	-0.124939
-0.235870	// Example 14.4b if	-0.124939
-0.235870	extending with zero-bits if	-0.124939
-0.235870	// Example 14.15b if	-0.124939
-0.235870	largest element (approximately): if	-0.124939
-0.235870	<< 6); Or, if	-0.124939
-0.235870	division is inexact if	-0.124939
-0.235870	code are modified, if	-0.124939
-0.235870	must be adjusted if	-0.124939
-0.235870	// Example 8.10a if	-0.124939
-0.235870	table at runtime, if	-0.124939
-0.235870	contentions will occur: if	-0.124939
-0.235870	256 bits (YMM) if	-0.124939
-0.235870	and the destructor, if	-0.124939
-0.235870	extending the sign-bit if	-0.124939
-0.235870	cannot be ignored if	-0.124939
-0.235870	is always normalized, if	-0.124939
-0.235870	they are uninitialized, if	-0.124939
-0.235870	128 bits (XMM) if	-0.124939
-0.235870	__restrict or __restrict__, if	-0.124939
-0.235870	than a minute if	-0.124939
-0.235870	// Example 14.15a if	-0.124939
-0.235870	100; float list[ARRAYSIZE]; if	-0.124939
-0.235870	metaprogramming. Don't panic if	-0.124939
-0.235870	must be reversed if	-0.124939
-0.235870	"function level linking" if	-0.124939
-0.235870	cost is minimized if	-0.124939
-0.235870	do not alias, if	-0.124939
-1.518283	a function is by	-0.124939
-0.159771	value pointed to by	-0.124939
-0.356384	if possible and by	-0.124939
-0.502687	static linking and by	-0.124939
-0.356384	become invalid, and by	-0.124939
-0.356384	clock period and by	-0.124939
-0.568445	frame function or by	-0.124939
-0.349774	unsigned int or by	-0.124939
-0.517216	or static or by	-0.124939
-0.741623	single precision or by	-0.124939
-0.452021	installation process or by	-0.124939
-0.349774	the latency or by	-0.124939
-0.349774	consecutive indices or by	-0.124939
-0.349774	is signed, or by	-0.124939
-0.461283	then replace it by	-0.124939
-0.357077	should multiply it by	-0.124939
-0.825921	a single function by	-0.124939
-0.128979	a leaf function by	-0.425969
-0.569111	the intermediate code by	-0.124939
-0.837241	use position-independent code by	-0.124939
-0.458509	and compiler-generated code by	-0.124939
-0.357447	operating system, not by	-0.124939
-0.706498	code rather than by	-0.124939
-0.706498	8 rather than by	-0.124939
-0.492734	system rather than by	-0.124939
-0.492734	container rather than by	-0.124939
-0.492734	units rather than by	-0.124939
-0.492734	tools, rather than by	-0.124939
-0.342968	vector classes than by	-0.124939
-0.342968	other ways than by	-0.124939
-0.342968	best algorithm than by	-0.124939
-0.357851	a different compiler by	-0.124939
-0.787144	bit of x by	-0.124939
-1.017583	advantage of this by	-0.124939
-0.333531	tell it this by	-0.124939
-0.748873	can do this by	-0.124939
-0.333531	It does this by	-0.124939
-0.738266	can avoid this by	-0.124939
-0.062469	may replace this by	-0.726999
-0.102337	will replace this by	-0.124939
-0.333531	can improve this by	-0.124939
-0.333531	have confirmed this by	-0.124939
-0.566175	obtain much more by	-0.124939
-0.355091	(rebased) once more by	-0.124939
-0.565184	functions in memory by	-0.124939
-1.440662	in the program by	-0.124939
-0.817985	the whole program by	-0.124939
-0.357349	for speed-critical functions by	-0.124939
-0.594331	up the CPU by	-0.124939
-0.168436	out the loop by	-0.124939
-0.493081	Unrolling the loop by	-0.124939
-0.415497	optimize this loop by	-0.124939
-0.009166	Roll out loop by	-0.522879
-1.041788	the innermost loop by	-0.124939
-0.549859	that is used by	-0.124939
-1.449591	that are used by	-0.124939
-0.523688	manuals are used by	-0.124939
-0.425524	the time used by	-0.124939
-0.133969	the memory used by	-0.124939
-0.133969	data memory used by	-0.124939
-0.848468	are often used by	-0.124939
-0.328703	that was used by	-0.124939
-0.384322	files into one by	-0.124939
-0.384322	modules into one by	-0.124939
-0.349462	were inserted, one by	-0.124939
-0.577516	contrary, you should by	-0.124939
-0.461159	f is set by	-0.124939
-0.813155	its child class by	-0.124939
-0.547216	modify a double by	-0.124939
-0.356942	a longer size by	-0.124939
-0.346676	has replaced i by	-0.124939
-0.064226	to divide i by	-0.425969
-0.582742	Accessing an object by	-0.124939
-1.161114	floating point number by	-0.124939
-0.937202	the CPU clock by	-0.124939
-0.899639	the absolute value by	-0.124939
-0.356139	an integer variable by	-0.124939
-0.501405	and global variables by	-0.124939
-1.703805	power of 2 by	-0.124939
-0.451116	reduced to 2 by	-0.124939
-0.356029	// align table by	-0.124939
-0.602844	improve the performance by	-0.124939
-0.342816	of measuring performance by	-0.124939
-0.549273	replace the branch by	-0.124939
-0.553516	replace a branch by	-0.124939
-0.342260	poorly predictable branch by	-0.124939
-0.356747	its b member by	-0.124939
-0.356503	and intelligible way by	-0.124939
-0.356624	member functions faster by	-0.124939
-0.356620	for information stored by	-0.124939
-0.566378	which is called by	-0.124939
-0.137767	all functions called by	-0.124939
-0.137767	library functions called by	-0.124939
-0.802093	particular memory address by	-0.124939
-0.348608	calculate each address by	-0.124939
-0.993545	a function call by	-0.124939
-0.486914	lot of optimization by	-0.124939
-0.048901	Obstacles to optimization by	-0.249877
-1.343175	transferred in registers by	-0.124939
-0.487593	512-bit ZMM registers by	-0.124939
-0.795933	in 64-bit systems by	-0.124939
-0.355337	for exclusive access by	-0.124939
-0.615280	be ruled out by	-0.124939
-0.136304	is rolled out by	-0.124939
-0.136304	list, rolled out by	-0.124939
-0.544583	created a file by	-0.124939
-0.458879	a different type by	-0.124939
-0.458408	avoid this error by	-0.124939
-0.953598	it is accessed by	-0.124939
-0.062301	is not accessed by	-0.425969
-0.443052	objects and arrays by	-0.124939
-0.342671	to replace arrays by	-0.124939
-0.498889	in 32-bit Windows by	-0.124939
-0.354838	that delays execution by	-0.124939
-0.355037	a better result by	-0.124939
-0.546314	to 16 bytes by	-0.124939
-0.540907	double the speed by	-0.124939
-0.139747	gain in speed by	-0.425969
-1.148879	check for overflow by	-0.124939
-0.130850	This is done by	-0.124939
-0.319009	Relocation is done by	-0.124939
-0.253816	standardized and done by	-0.124939
-1.007673	can be done by	-0.124939
-0.402400	should be done by	-0.124939
-0.253816	it has done by	-0.124939
-0.253816	15.1c was done by	-0.124939
-0.253816	not necessarily done by	-0.124939
-0.577272	are double precision by	-0.124939
-0.339933	replace this line by	-0.124939
-0.339933	and interpreted line by	-0.124939
-0.719323	optimization. This works by	-0.124939
-0.519979	often be optimized by	-0.124939
-0.257166	can be calculated by	-0.124939
-0.255117	could be calculated by	-0.124939
-0.457041	a function uses by	-0.124939
-0.497510	auto_ptr to another by	-0.124939
-0.537099	therefore not advantageous by	-0.124939
-0.517656	Branches are implemented by	-0.124939
-0.339455	is typically implemented by	-0.124939
-0.527583	reduce the problem by	-0.124939
-0.392267	avoid this problem by	-0.124939
-0.392267	solved this problem by	-0.124939
-0.089622	set is supported by	-0.346788
-0.135732	C++ is supported by	-0.124939
-0.217104	AVX is supported by	-0.124939
-0.261947	Linux and supported by	-0.124939
-0.055380	functions are supported by	-0.124939
-0.055380	registers are supported by	-0.124939
-0.055380	directives are supported by	-0.124939
-0.085966	available if supported by	-0.124939
-0.085966	__restrict__, if supported by	-0.124939
-0.818557	0 or 1 by	-0.124939
-0.353392	the table values by	-0.124939
-0.230296	point number simply by	-0.124939
-0.230296	an error simply by	-0.124939
-0.230296	is done simply by	-0.124939
-0.230296	is implemented simply by	-0.124939
-0.230296	point numbers simply by	-0.124939
-0.230296	be copied simply by	-0.124939
-0.230296	CPU brand simply by	-0.124939
-0.230296	is measured simply by	-0.124939
-0.230296	performance significantly simply by	-0.124939
-1.017146	a loop counter by	-0.124939
-0.535268	saving memory space by	-0.124939
-0.702179	of cache space by	-0.124939
-0.473135	avoid the multiplication by	-0.124939
-0.122900	} The multiplication by	-0.124939
-0.122900	element. The multiplication by	-0.124939
-0.383726	replace integer multiplication by	-0.124939
-0.352062	often inlined automatically by	-0.124939
-0.463954	number is zero by	-0.124939
-0.525580	initialized to zero by	-0.124939
-0.044704	faster than division by	-0.425969
-0.566636	Floating point division by	-0.124939
-0.215564	eliminate one division by	-0.124939
-0.193609	2 Integer division by	-0.124939
-0.193609	time. Integer division by	-0.124939
-0.193609	processor). Integer division by	-0.124939
-0.193609	division: Integer division by	-0.124939
-0.351740	is actually needed by	-0.124939
-1.459180	parameters are transferred by	-0.124939
-0.065635	to be aligned by	-0.124939
-0.065635	should be aligned by	-0.124939
-0.065635	must be aligned by	-0.124939
-0.503650	arrays are aligned by	-0.124939
-0.237167	is typically aligned by	-0.124939
-0.237167	are preferably aligned by	-0.124939
-0.350548	sense to dispatch by	-0.124939
-0.643916	class is declared by	-0.124939
-0.350548	incremented every second by	-0.124939
-0.351654	background calculations piece by	-0.124939
-0.000506	that is divisible by	-0.124939
-0.001521	which is divisible by	-0.124939
-0.000760	count is divisible by	-0.425969
-0.004580	to be divisible by	-0.124939
-0.004580	must be divisible by	-0.124939
-0.002284	is not divisible by	-0.425969
-0.009209	Array size divisible by	-0.124939
-0.000365	an address divisible by	-0.221849
-0.001140	to addresses divisible by	-0.124939
-0.002284	have addresses divisible by	-0.124939
-0.002284	memory addresses divisible by	-0.124939
-0.349984	up significantly just by	-0.124939
-0.320388	variable even smaller by	-0.124939
-0.320388	be made smaller by	-0.124939
-0.740845	specific CPU core by	-0.124939
-0.491459	incremented to 5 by	-0.124939
-0.018284	function is replaced by	-0.124939
-0.018284	m is replaced by	-0.124939
-0.018284	x*8 is replaced by	-0.124939
-0.057337	possible, and replaced by	-0.124939
-0.126224	can be replaced by	-0.726999
-0.061306	may be replaced by	-0.124939
-0.061306	will be replaced by	-0.124939
-0.061306	sometimes be replaced by	-0.124939
-0.057337	parameters are replaced by	-0.124939
-0.057337	have been replaced by	-0.124939
-0.057337	its parameters replaced by	-0.124939
-0.349199	and not negative by	-0.124939
-0.491099	conclude this section by	-0.124939
-0.549631	to be predicted by	-0.124939
-0.490808	Avoid the conversions by	-0.124939
-0.349280	optional and off by	-0.124939
-0.450700	multiplying the index by	-0.124939
-0.039302	can be avoided by	-0.191886
-0.360571	should be avoided by	-0.124939
-0.108511	sometimes be avoided by	-0.124939
-0.348199	of this fact by	-0.124939
-0.291368	CPU is limited by	-0.124939
-0.291368	performance is limited by	-0.124939
-0.275982	to be limited by	-0.124939
-0.488797	to be inlined by	-0.124939
-0.448668	replace a database by	-0.124939
-0.775326	and the destructor by	-0.124939
-0.448470	we may save by	-0.124939
-0.347436	the code further by	-0.124939
-0.348220	may improve efficiency by	-0.124939
-0.348063	and you unroll by	-0.124939
-0.346557	operations when alignment by	-0.124939
-0.346389	7.34b. Replace macro by	-0.124939
-0.307629	You can divide by	-0.124939
-0.307629	right = divide by	-0.124939
-0.385104	performance is obtained by	-0.124939
-0.237572	microprocessors is obtained by	-0.124939
-0.237572	interface is obtained by	-0.124939
-0.385269	sometimes be obtained by	-0.124939
-0.508753	achieved more efficiently by	-0.124939
-1.047313	can be changed by	-0.124939
-0.345899	call to square by	-0.124939
-0.345534	and big structures by	-0.124939
-0.345122	An array initialized by	-0.124939
-0.044814	can be improved by	-0.301030
-0.043246	will be improved by	-0.124939
-0.043246	probably be improved by	-0.124939
-0.344923	times faster either by	-0.124939
-0.738626	object is copied by	-0.124939
-0.387716	elements // align by	-0.124939
-0.387716	esp ; align by	-0.124939
-0.344923	replace such loops by	-0.124939
-0.343538	in a module by	-0.124939
-0.383723	lot to gain by	-0.124939
-0.236588	nothing to gain by	-0.124939
-0.326776	insight you gain by	-0.124939
-0.443871	of each row by	-0.124939
-0.313238	it can multiply by	-0.124939
-0.162232	constant = multiply by	-0.124939
-0.130432	and lazy binding by	-0.425969
-0.488655	14.7b is converted by	-0.124939
-0.440396	just two additions by	-0.124939
-0.340562	was originally designed by	-0.124939
-0.340292	to multiply j by	-0.124939
-0.277878	vectorize code explicitly by	-0.124939
-0.277878	the alignment explicitly by	-0.124939
-0.278580	ways of multiplying by	-0.124939
-0.278580	faster than multiplying by	-0.124939
-0.340022	eliminate this jump by	-0.124939
-0.117774	time is determined by	-0.124939
-0.117774	slices is determined by	-0.124939
-0.233329	can be determined by	-0.124939
-0.149141	cases be determined by	-0.124939
-0.132259	is often determined by	-0.124939
-0.518665	Provoke cache misses by	-0.124939
-0.711672	of context switches by	-0.124939
-0.337987	Literature Other manuals by	-0.124939
-0.504692	made more compact by	-0.124939
-0.433325	Replacing two comparisons by	-0.124939
-0.334579	space 91 step by	-0.124939
-0.334934	not alias anything by	-0.124939
-0.465413	avoid multiple inheritance by	-0.124939
-0.330273	can be overcome by	-0.124939
-0.021816	the code generated by	-0.425969
-0.094726	Object files generated by	-0.124939
-0.094726	the comments generated by	-0.124939
-0.330273	be dynamically created by	-0.124939
-0.329851	instead of pointers, by	-0.124939
-0.603547	can be increased by	-0.124939
-0.159902	slow // Division by	-0.124939
-0.159902	much faster. Division by	-0.124939
-0.159902	it matters: Division by	-0.124939
-0.329851	of these guidelines by	-0.124939
-0.330273	results are combined by	-0.124939
-0.323893	&& b<c) Multiply by	-0.124939
-0.323893	Windows. Does not, by	-0.124939
-0.092862	number by 2n by	-0.124939
-0.092862	divide by 2n by	-0.124939
-0.592423	can be prevented by	-0.124939
-0.324413	can be illustrated by	-0.124939
-0.323893	objects are returned by	-0.124939
-0.418873	optimize example 8.26a by	-0.124939
-0.048129	should be identified by	-0.124939
-0.023398	but are identified by	-0.124939
-0.023398	objects are identified by	-0.124939
-0.048129	Are objects identified by	-0.124939
-0.048129	value is multiplied by	-0.124939
-0.023398	should be multiplied by	-0.124939
-0.023398	must be multiplied by	-0.124939
-0.048129	an index multiplied by	-0.124939
-0.027445	may be modified by	-0.425969
-0.122029	are never modified by	-0.124939
-0.323373	rather unconventional manner by	-0.124939
-0.323373	can avoid hyperthreading by	-0.124939
-0.312787	and later deleted by	-0.124939
-0.312787	should be hidden by	-0.124939
-0.312787	pointers to zero, by	-0.124939
-0.312787	be done manually by	-0.124939
-0.036993	to be spaced by	-0.425969
-0.077433	clauses are separated by	-0.124939
-0.077433	clause are separated by	-0.124939
-0.036993	can be solved by	-0.425969
-0.065435	0 - Divide by	-0.124939
-0.065435	and add Divide by	-0.124939
-0.065435	(-a>-b)=(a<b) ---xx---x Divide by	-0.124939
-0.313463	to 120 ms by	-0.124939
-0.405723	Provoke branch mispredictions by	-0.124939
-0.441717	space, if necessary, by	-0.124939
-0.292359	for exceptions thrown by	-0.124939
-0.292359	check is bypassed by	-0.124939
-0.535610	Numerically Intensive Codes", by	-0.124939
-0.292359	occurrences of ArraySize by	-0.124939
-0.292359	position-independent code everywhere by	-0.124939
-0.292359	compiler Linux Align by	-0.124939
-0.292359	the performance dramatically by	-0.124939
-0.292359	int before dividing by	-0.124939
-0.292359	DLLs are relocated by	-0.124939
-0.292359	A command received by	-0.124939
-0.292359	block size grows by	-0.124939
-0.292359	far data segment by	-0.124939
-0.380559	compose a bitfield by	-0.124939
-0.236276	multiple threads Parallelization by	-0.124939
-0.236276	interval [1.0, 2.0) by	-0.124939
-0.236276	tested and investigated by	-0.124939
-0.236276	manuals is copyrighted by	-0.124939
-0.236276	slow // Modulo by	-0.124939
-0.236276	comparison of doubles by	-0.124939
-0.236276	exception is caught by	-0.124939
-0.236276	to replace u[1] by	-0.124939
-0.236276	Automatic paralleli- zation by	-0.124939
-0.236276	are not affected by	-0.124939
-0.236276	shift operations. Multiplying by	-0.124939
-0.236276	a place indicated by	-0.124939
-0.236276	may be mitigated by	-0.124939
-0.236276	function libraries published by	-0.124939
-0.236276	can be ameliorated by	-0.124939
-0.236276	must be followed by	-0.124939
-0.236276	may be caused by	-0.124939
-0.236276	is obviously influenced by	-0.124939
-0.236276	can be accomplished by	-0.124939
-0.236276	only when activated by	-0.124939
-0.236276	is still frustrated by	-0.124939
-0.458740	linked list or with	-0.124939
-0.652194	and delete or with	-0.124939
-0.355075	without polymorphism or with	-0.124939
-0.458297	and write it with	-0.124939
-0.458297	or replace it with	-0.124939
-0.354726	by XOR'ing it with	-0.124939
-0.354726	by AND'ing it with	-0.124939
-0.593554	Replacing a function with	-0.124939
-0.457737	a simple function with	-0.124939
-0.534174	making another function with	-0.124939
-0.535602	a pure function with	-0.124939
-0.503994	requires log on with	-0.124939
-1.784009	in the code with	-0.124939
-0.516841	running this code with	-0.124939
-0.351811	Vector class code with	-0.124939
-0.530435	highly optimized code with	-0.124939
-0.351811	the user-written code with	-0.124939
-0.355661	but possibly not with	-0.124939
-0.355661	template specialization, not with	-0.124939
-0.355590	with signed than with	-0.124939
-0.355590	coarse-grained parallelism than with	-0.124939
-0.561689	requires a compiler with	-0.124939
-0.352123	to each compiler with	-0.124939
-0.352123	the Borland compiler with	-0.124939
-0.352123	user friendly compiler with	-0.124939
-0.717642	to optimize this with	-0.124939
-0.354987	me explain this with	-0.124939
-0.582381	de-allocation of memory with	-0.124939
-0.880767	Organize the data with	-0.124939
-0.885820	compile the program with	-0.124939
-0.354130	reason. A program with	-0.124939
-0.524815	speed of functions with	-0.124939
-0.561681	dispatching works only with	-0.124939
-0.305555	or a CPU with	-0.124939
-0.305555	on a CPU with	-0.124939
-0.305555	need a CPU with	-0.124939
-0.580544	or the other with	-0.124939
-1.022705	in a loop with	-0.124939
-0.532823	inside a loop with	-0.124939
-0.243552	well. A loop with	-0.124939
-0.243552	prediction. A loop with	-0.124939
-1.266942	also be used with	-0.124939
-0.585822	libraries are used with	-0.124939
-0.580252	divide an integer with	-0.124939
-0.496354	mix simple integer with	-0.124939
-0.734240	is a class with	-0.124939
-0.734240	into a class with	-0.124939
-1.218354	structure or class with	-0.124939
-1.296109	you can do with	-0.124939
-0.239189	the above example with	-0.124939
-0.356831	8 kb size with	-0.124939
-0.637108	a && b with	-0.124939
-0.347398	a || b with	-0.124939
-0.347398	have AND'ed b with	-0.124939
-0.599209	a function library with	-0.124939
-0.424066	A function library with	-0.124939
-0.424066	math function library with	-0.124939
-0.062849	use this library with	-0.425969
-0.357072	than comparing i with	-0.124939
-0.582615	construct an object with	-0.124939
-0.567708	like an array with	-0.124939
-0.794281	a linear array with	-0.124939
-0.344615	a variable-size array with	-0.124939
-0.748972	a debug version with	-0.124939
-0.643498	a release version with	-0.124939
-0.760032	dynamically allocated objects with	-0.124939
-0.355984	class member variable with	-0.124939
-0.355500	error can return with	-0.124939
-1.127253	by a table with	-0.124939
-0.557709	vulnerability of software with	-0.124939
-0.348724	swap these elements with	-0.124939
-0.348724	to distinguish elements with	-0.124939
-0.881139	point is faster with	-0.124939
-1.149151	will be stored with	-0.124939
-0.576213	example is called with	-0.124939
-0.348065	// erroneously called with	-0.124939
-0.944746	a Pentium 4 with	-0.124939
-0.993103	a function call with	-0.124939
-0.584894	Use function libraries with	-0.124939
-0.536133	parameters. A template with	-0.124939
-0.133715	portable to systems with	-0.124939
-0.133715	ported to systems with	-0.124939
-0.327909	thread in systems with	-0.124939
-0.327909	fully utilize systems with	-0.124939
-0.355507	blocks. A method with	-0.124939
-0.459497	were carried out with	-0.124939
-0.458169	on a system with	-0.124939
-0.355352	j * 32 with	-0.124939
-0.242464	out multiple bits with	-0.124939
-0.242464	toggle multiple bits with	-0.124939
-0.334985	and return operations with	-0.124939
-0.334985	This makes operations with	-0.124939
-0.433390	do arithmetic operations with	-0.124939
-0.499593	Define function type with	-0.124939
-0.570884	commonly the case with	-0.124939
-0.293849	best on processors with	-0.124939
-0.293849	avoided on processors with	-0.124939
-0.658669	The first processors with	-0.124939
-0.712747	standard PC processors with	-0.124939
-0.313881	Small lightweight processors with	-0.124939
-0.652515	are only available with	-0.124939
-1.393486	by a constant with	-0.124939
-0.333119	an integer constant with	-0.124939
-0.333119	a single constant with	-0.124939
-0.333689	we end up with	-0.124939
-0.135555	to keep up with	-0.124939
-0.135555	always keep up with	-0.124939
-0.548209	function many times with	-0.124939
-0.343761	function 250 times with	-0.124939
-0.524478	It is accessed with	-0.124939
-0.721321	can be accessed with	-0.124939
-0.371653	should be accessed with	-0.124939
-0.124467	function for CPUs with	-0.124939
-0.124467	version for CPUs with	-0.124939
-0.218063	function on CPUs with	-0.124939
-0.218063	only on CPUs with	-0.124939
-0.299616	compiler. Use CPUs with	-0.124939
-0.299616	division. Older CPUs with	-0.124939
-0.486384	examples of arrays with	-0.124939
-0.428088	make aligned arrays with	-0.124939
-0.330754	allocate variable-size arrays with	-0.124939
-0.519705	function to work with	-0.124939
-0.330544	this may work with	-0.124939
-0.716359	it doesn't work with	-0.124939
-0.457711	mix mathematical calculations with	-0.124939
-1.194887	in multiple versions with	-0.124939
-0.354440	Core i7 processor with	-0.124939
-0.681276	program is compiled with	-0.124939
-0.405057	to be compiled with	-0.124939
-0.371868	code are compiled with	-0.124939
-0.263505	of code compiled with	-0.124939
-0.376623	mixing code compiled with	-0.124939
-0.371868	than when compiled with	-0.124939
-0.285253	are normally compiled with	-0.124939
-0.383161	number of threads with	-0.124939
-0.383161	no more threads with	-0.124939
-0.383161	in other threads with	-0.124939
-0.294482	divided into threads with	-0.124939
-0.468951	running two threads with	-0.124939
-0.383161	into separate threads with	-0.124939
-0.457505	advanced high-level language with	-0.124939
-1.128921	a separate thread with	-0.124939
-0.135877	is to compile with	-0.124939
-0.476914	when you compile with	-0.124939
-0.327522	it has allocated with	-0.124939
-0.133591	memory Memory allocated with	-0.124939
-0.133591	include: Memory allocated with	-0.124939
-0.498461	trap integer overflow with	-0.124939
-0.461583	addition of integers with	-0.124939
-0.656640	use 64-bit integers with	-0.124939
-0.423975	as 8-bit integers with	-0.124939
-0.521932	and 32-bit Linux with	-0.124939
-0.353600	are wrapper classes with	-0.124939
-0.385512	allocation is done with	-0.124939
-0.385512	multiplication is done with	-0.124939
-0.128496	can be done with	-0.346788
-0.295611	operations are done with	-0.124939
-0.469018	calculations are done with	-0.124939
-0.500147	the command line with	-0.124939
-0.457271	This method works with	-0.124939
-0.563194	to be calculated with	-0.124939
-0.339186	are always calculated with	-0.124939
-0.514822	which is implemented with	-0.124939
-1.441169	can be implemented with	-0.124939
-0.483214	languages are implemented with	-0.124939
-0.310702	this calculation implemented with	-0.124939
-0.699363	is a problem with	-0.124939
-0.435943	unchanged. The problem with	-0.124939
-0.294897	} A problem with	-0.124939
-0.294897	double. Another problem with	-0.124939
-0.294897	most serious problem with	-0.124939
-1.005011	it is known with	-0.124939
-0.353981	Example 12.6. Function with	-0.124939
-0.353541	a linear list with	-0.124939
-0.474705	code may run with	-0.124939
-0.337060	a test run with	-0.124939
-0.353175	platforms. Works well with	-0.124939
-0.353313	from different addresses with	-0.124939
-1.183985	the loop counter with	-0.124939
-1.471232	dynamic memory allocation with	-0.124939
-0.353032	PC platform. However, with	-0.124939
-0.517277	useful in programs with	-0.124939
-0.352553	Some common problems with	-0.124939
-0.578887	c: CPU dispatching with	-0.124939
-0.135122	chain. A microprocessor with	-0.124939
-0.135122	counter. A microprocessor with	-0.124939
-0.352960	single container, preferably with	-0.124939
-0.352039	systems"). An application with	-0.124939
-0.495832	thing. An expression with	-0.124939
-0.568117	accesses data members with	-0.124939
-0.351684	efficient table-based methods with	-0.124939
-0.352280	of comparing signed with	-0.124939
-0.351585	inferior. A model with	-0.124939
-0.495064	means integer division with	-0.124939
-0.351051	b) >> n with	-0.124939
-0.352022	must always end with	-0.124939
-0.327643	on mathematical applications with	-0.124939
-0.327643	some CPU-intensive applications with	-0.124939
-0.424837	do an addition with	-0.124939
-1.058729	floating point addition with	-0.124939
-1.080948	of different types with	-0.124939
-0.453582	for such optimizations with	-0.124939
-0.323474	on a platform with	-0.124939
-0.323474	standard PC platform with	-0.124939
-0.819169	AVX or later with	-0.124939
-0.453079	be linked together with	-0.124939
-0.324172	for variables declared with	-0.124939
-0.324172	A macro declared with	-0.124939
-0.322535	need to link with	-0.124939
-0.322535	-fno-pic and link with	-0.124939
-0.322047	shared object made with	-0.124939
-0.322047	into projects made with	-0.124939
-0.350510	(or eight) points with	-0.124939
-0.319826	templates or modules with	-0.124939
-0.319826	most critical modules with	-0.124939
-0.451406	of the core with	-0.124939
-0.841696	to do things with	-0.124939
-0.318572	some funny things with	-0.124939
-1.013926	should be tested with	-0.124939
-0.530189	in a computer with	-0.124939
-0.289254	used. A computer with	-0.124939
-0.289254	an old computer with	-0.124939
-0.005698	that is compatible with	-0.124939
-0.017324	processor is compatible with	-0.124939
-0.011472	not be compatible with	-0.124939
-0.023256	will be compatible with	-0.124939
-0.093946	are not compatible with	-0.124939
-0.073874	not even compatible with	-0.124939
-0.073874	and highly compatible with	-0.124939
-0.125038	not backwards compatible with	-0.124939
-0.073874	is mostly compatible with	-0.124939
-0.456791	a switch statement with	-0.124939
-0.323939	A switch statement with	-0.124939
-0.922220	be allocated dynamically with	-0.124939
-0.500942	branch // Loop with	-0.124939
-0.316308	Example 12.4a. Loop with	-0.124939
-0.449992	on a network with	-0.124939
-0.330386	library that comes with	-0.124939
-0.250983	The compiler comes with	-0.124939
-0.250983	source. It comes with	-0.124939
-0.250983	(STL) which comes with	-0.124939
-0.438473	on other platforms with	-0.124939
-0.310374	most common platforms with	-0.124939
-0.690709	cannot be vectorized with	-0.124939
-0.310535	Same example, vectorized with	-0.124939
-0.347055	express any algorithm with	-0.124939
-0.176760	that the compatibility with	-0.124939
-0.076042	sake of compatibility with	-0.124939
-0.168286	requirements of compatibility with	-0.124939
-0.176760	set when compatibility with	-0.124939
-0.176760	of backwards compatibility with	-0.124939
-0.347468	desired polymorphism effect with	-0.124939
-0.553190	compiler to predict with	-0.124939
-0.302445	that is obtained with	-0.124939
-0.479246	performance is obtained with	-0.124939
-0.087623	can be obtained with	-0.425969
-0.195331	no doubt obtained with	-0.124939
-0.508531	works more efficiently with	-0.124939
-0.634603	functions have names with	-0.124939
-0.486898	value of N with	-0.124939
-0.345092	63 number (e.g. with	-0.124939
-0.376238	big data structures with	-0.124939
-0.376238	advanced data structures with	-0.124939
-0.344956	point representation directly with	-0.124939
-0.446213	constants are defined with	-0.124939
-0.067934	elements that come with	-0.124939
-0.067934	libraries that come with	-0.124939
-0.067934	mathimf.h that come with	-0.124939
-0.182258	a circular buffer with	-0.425969
-0.727085	errors can happen with	-0.124939
-0.686836	fashioned C style with	-0.124939
-0.442070	mix nontemporal writes with	-0.124939
-0.534446	loop-carried dependency chains with	-0.124939
-0.341622	It compares eax with	-0.124939
-0.340703	The libraries included with	-0.124939
-0.340401	in vector c2 with	-0.124939
-0.440193	do two additions with	-0.124939
-0.478052	make a DLL with	-0.124939
-0.439813	on small devices with	-0.124939
-0.175688	done by multiplying with	-0.124939
-0.175688	2 when multiplying with	-0.124939
-0.145274	double before multiplying with	-0.124939
-0.145274	precision before multiplying with	-0.124939
-0.730460	can be determined with	-0.124939
-0.337118	point calculations. Even with	-0.124939
-0.350300	7.43a. Runtime polymorphism with	-0.124939
-0.267506	7.43b. Compile-time polymorphism with	-0.124939
-0.474785	time is measured with	-0.124939
-0.432116	8.23b. Calculate polynomial with	-0.124939
-0.432614	name of Func with	-0.124939
-0.334367	hard disk. Test with	-0.124939
-0.612820	more powerful computers with	-0.124939
-0.333970	disturb the users with	-0.124939
-0.333970	cannot be mixed with	-0.124939
-0.335162	care of communication with	-0.124939
-0.334367	C- style type-casting with	-0.124939
-0.330544	in vector bc with	-0.124939
-0.102469	diagonal is swapped with	-0.124939
-0.102469	matrix[r][c] is swapped with	-0.124939
-0.342323	{ // Array with	-0.124939
-0.236834	Example 7.15a. Array with	-0.124939
-0.329599	a[i+2] ; compare with	-0.124939
-0.102329	memory is contiguous with	-0.124939
-0.102329	which is contiguous with	-0.124939
-0.329599	compile them separately with	-0.124939
-0.010765	that is AND'ed with	-0.425969
-0.021803	cc[i]+2 is AND'ed with	-0.124939
-0.021803	bb[i]*cc[i] is AND'ed with	-0.124939
-0.330071	Clang compiler combined with	-0.124939
-0.329599	should avoid macros with	-0.124939
-0.323125	files and databases with	-0.124939
-0.323706	Example 12.4b. Vectorized with	-0.124939
-0.323125	swapping column 29 with	-0.124939
-0.323125	names that begin with	-0.124939
-0.211441	double is represented with	-0.124939
-0.402522	can be represented with	-0.124939
-0.056709	these are incompatible with	-0.124939
-0.056709	options are incompatible with	-0.124939
-0.121951	the code incompatible with	-0.124939
-0.418565	the PLT entry with	-0.124939
-0.323125	make some tests with	-0.124939
-0.323706	overflow behavior well-defined with	-0.124939
-0.031465	you are satisfied with	-0.124939
-0.031465	who are satisfied with	-0.124939
-0.065389	are not satisfied with	-0.124939
-0.065389	Windows. Gnu Comes with	-0.124939
-0.065389	available. Microsoft Comes with	-0.124939
-0.065389	/ Embarcadero Comes with	-0.124939
-0.313302	should be performed with	-0.124939
-0.171618	(VML, MKL). Works with	-0.124939
-0.171618	Primitives (IPP). Works with	-0.124939
-0.313302	This is supplied with	-0.124939
-0.313302	copied or moved with	-0.124939
-0.313302	in duration compared with	-0.124939
-0.292128	that are impossible with	-0.124939
-0.292128	can be manipulated with	-0.124939
-0.292128	further optimizations. Loops with	-0.124939
-0.292128	in example 14.14a with	-0.124939
-0.102190	104 } Microprocessors with	-0.124939
-0.102190	cache control Microprocessors with	-0.124939
-0.292128	in regular patterns with	-0.124939
-0.292128	are often conflicting with	-0.124939
-0.292128	using indexes, working with	-0.124939
-0.102190	the problems associated with	-0.124939
-0.102190	programming errors associated with	-0.124939
-0.292128	devices and machines with	-0.124939
-0.292128	bytes, 4 ways, with	-0.124939
-0.292128	can cause complications with	-0.124939
-0.102190	methods for dealing with	-0.124939
-0.102190	you are dealing with	-0.124939
-0.380276	or by extending with	-0.124939
-0.102190	which may interfere with	-0.124939
-0.102190	macro will interfere with	-0.124939
-0.292128	(everything that begins with	-0.124939
-0.292128	Has an IDE with	-0.124939
-0.236073	Compiler Documentation". Included with	-0.124939
-0.236073	subtask before coordination with	-0.124939
-0.236073	SelectAddMul example (12.4e) with	-0.124939
-0.236073	of ADC (add with	-0.124939
-0.236073	Example 12.1b. Vectorization with	-0.124939
-0.236073	or multiple configurations with	-0.124939
-0.236073	created. Far Systems with	-0.124939
-0.236073	it is correlated with	-0.124939
-0.236073	7 through 14, with	-0.124939
-0.236073	security matters. Problems with	-0.124939
-0.236073	easy to trace with	-0.124939
-0.236073	heavy competition. Processors with	-0.124939
-0.236073	computer. Big supercomputers with	-0.124939
-0.236073	a project built with	-0.124939
-0.236073	by calling vector::reserve with	-0.124939
-0.236073	have been unsatisfied with	-0.124939
-0.236073	can't be reached with	-0.124939
-0.236073	is -0 (zero with	-0.124939
-0.236073	example 12.4b, rewritten with	-0.124939
-0.236073	containing multiple streams with	-0.124939
-0.236073	done in connection with	-0.124939
-0.236073	or array coincides with	-0.124939
-0.236073	compilers and invoked with	-0.124939
-0.236073	is dividing repeatedly with	-0.124939
-0.236073	to calculate pow(x,10) with	-0.124939
-0.236073	are usually dealt with	-0.124939
-0.236073	address might clash with	-0.124939
-0.236073	functions. I disagree with	-0.124939
-0.236073	have spent fighting with	-0.124939
-0.897930	than it is on	-0.124939
-0.358114	main focus is on	-0.124939
-0.549182	older processors and on	-0.124939
-0.783451	of this function on	-0.124939
-0.952647	the same function on	-0.124939
-0.492721	processors, but not on	-0.124939
-0.492721	CPUs, but not on	-0.124939
-0.455682	on registers, not on	-0.124939
-0.352664	own research, not on	-0.124939
-0.512179	register rather than on	-0.124939
-0.171989	registers rather than on	-0.726999
-0.635730	integer expressions than on	-0.124939
-0.346691	user interface than on	-0.124939
-1.451359	the Gnu compiler on	-0.124939
-1.512887	of the time on	-0.124939
-0.450449	more CPU time on	-0.124939
-0.805403	of its time on	-0.124939
-0.515846	their execution time on	-0.124939
-0.450449	not spend time on	-0.124939
-0.462078	the resource use on	-0.124939
-0.357644	software. For more on	-0.124939
-0.357458	that allocates memory on	-0.124939
-0.357362	a speed-critical program on	-0.124939
-0.334564	will work only on	-0.124939
-0.543466	mechanism works only on	-0.124939
-0.062627	predicted well only on	-0.425969
-0.334564	is optimal only on	-0.124939
-0.135831	that depends only on	-0.124939
-0.135831	value depends only on	-0.124939
-0.334564	been tested only on	-0.124939
-0.524399	not at all on	-0.124939
-0.357408	model numbers, but on	-0.124939
-0.590496	even be used on	-0.124939
-0.353795	frameworks typically used on	-0.124939
-0.356639	type. The example on	-0.124939
-0.565534	the integer size on	-0.124939
-0.587489	constructing the object on	-0.124939
-0.356763	is generally possible on	-0.124939
-0.643854	the advanced version on	-0.124939
-0.453380	an inferior version on	-0.124939
-0.356735	with other objects on	-0.124939
-0.464835	degradation of performance on	-0.124939
-0.497712	CPUs. The performance on	-0.124939
-0.026467	has reduced performance on	-0.301030
-0.524681	for very long on	-0.124939
-0.944274	it is stored on	-0.124939
-1.093288	will be stored on	-0.124939
-0.167535	function are stored on	-0.425969
-0.443864	parameters are stored on	-0.124939
-0.319728	26). Variables stored on	-0.124939
-1.376080	function is called on	-0.124939
-0.576214	need to test on	-0.124939
-1.001849	This is useful on	-0.124939
-0.355720	many applications even on	-0.124939
-0.544880	writing a file on	-0.124939
-0.891110	use vector operations on	-0.124939
-0.325602	to do operations on	-0.124939
-0.325602	and mathematical operations on	-0.124939
-0.325602	float. Similar operations on	-0.124939
-1.359917	in some cases on	-0.124939
-0.432521	of the processors on	-0.124939
-0.500174	list of processors on	-0.124939
-0.432521	other virtual processors on	-0.124939
-0.444581	is less important on	-0.124939
-0.343884	are particularly important on	-0.124939
-0.346905	objects are accessed on	-0.425969
-0.535607	for inline assembly on	-0.124939
-0.344943	compiler to work on	-0.124939
-0.344943	sure to work on	-0.124939
-0.286662	Keywords that work on	-0.124939
-0.373590	may not work on	-0.124939
-0.286662	make this work on	-0.124939
-0.120096	Gnu directives work on	-0.124939
-0.120096	Microsoft directives work on	-0.124939
-0.474048	finished the calculations on	-0.124939
-0.457423	to do calculations on	-0.124939
-0.308078	when doing calculations on	-0.124939
-0.308078	to start calculations on	-0.124939
-0.308078	doing parallel calculations on	-0.124939
-0.355015	be cross- compiled on	-0.124939
-0.354749	typically 64 bytes on	-0.124939
-0.563630	Running multiple threads on	-0.124939
-0.305970	to work best on	-0.124939
-0.055459	that works best on	-0.124939
-0.055459	"what works best on	-0.124939
-0.535002	6! The speed on	-0.124939
-0.354779	depends very much on	-0.124939
-0.340919	of possible overflow on	-0.124939
-0.340919	for buffer overflow on	-0.124939
-0.650568	64 64 matrix on	-0.124939
-0.815631	of container classes on	-0.124939
-0.847271	preferably be done on	-0.124939
-0.457507	regardless of precision on	-0.124939
-0.749034	one that works on	-0.124939
-0.339937	It also works on	-0.124939
-0.354578	not a manual on	-0.124939
-0.217260	method is explained on	-0.124939
-0.311952	storage are explained on	-0.124939
-0.186733	code, as explained on	-0.124939
-0.186733	processors, as explained on	-0.124939
-0.186733	static, as explained on	-0.124939
-0.186733	precision, as explained on	-0.124939
-0.186733	execution, as explained on	-0.124939
-0.186733	operations, as explained on	-0.124939
-0.186733	templates, as explained on	-0.124939
-0.186733	AVX, as explained on	-0.124939
-0.186733	statements, as explained on	-0.124939
-0.186733	frequency, as explained on	-0.124939
-0.186733	stride, as explained on	-0.124939
-0.186733	contentions, as explained on	-0.124939
-0.008217	for reasons explained on	-0.726999
-0.498543	Storing the parameters on	-0.124939
-0.339215	stack (three parameters on	-0.124939
-0.339529	This extra check on	-0.124939
-0.339529	a bounds check on	-0.124939
-0.339529	modification if implemented on	-0.124939
-0.339529	is preferably implemented on	-0.124939
-0.354270	the fastest solution on	-0.124939
-1.323867	set is supported on	-0.124939
-0.418438	standardized and supported on	-0.124939
-0.323024	currently only supported on	-0.124939
-0.456814	to decrement operators on	-0.124939
-0.321539	can then run on	-0.124939
-0.321539	Can only run on	-0.124939
-0.321539	can still run on	-0.124939
-0.125645	always work well on	-0.124939
-0.125645	doesn't work well on	-0.124939
-0.125645	that works well on	-0.124939
-0.125645	it works well on	-0.124939
-0.526135	4 clock cycles on	-0.124939
-0.526135	11 clock cycles on	-0.124939
-0.353418	but don't count on	-0.124939
-0.062725	scans all files on	-0.425969
-0.316737	runs quite fast on	-0.124939
-0.316737	are particularly fast on	-0.124939
-0.316737	worked sufficiently fast on	-0.124939
-0.542964	solution is optimal on	-0.124939
-0.352742	amount of space on	-0.124939
-0.497740	than anything else on	-0.124939
-0.558260	// CPU dispatching on	-0.124939
-0.333585	it makes dispatching on	-0.124939
-0.579736	code is running on	-0.124939
-0.141332	only when running on	-0.124939
-0.141332	set when running on	-0.124939
-0.141332	libraries when running on	-0.124939
-0.269067	other processes running on	-0.124939
-0.351389	processor performs better on	-0.124939
-0.495205	set. The examples on	-0.124939
-1.182854	floating point addition on	-0.124939
-0.453631	various algebraic expressions on	-0.124939
-1.205602	parameters are transferred on	-0.124939
-0.451576	r are transferred on	-0.124939
-0.422085	prevents all optimizations on	-0.124939
-0.325949	from doing optimizations on	-0.124939
-0.351038	than rendering graphics on	-0.124939
-0.350933	to keep together on	-0.124939
-0.359810	uses the dispatch on	-0.124939
-0.359810	implement the dispatch on	-0.124939
-0.350206	replaced by storage on	-0.124939
-0.044264	system is based on	-0.124939
-0.108965	manual is based on	-0.124939
-0.023008	should be based on	-0.124939
-0.002810	that are based on	-0.124939
-0.002810	methods are based on	-0.124939
-0.002810	framework are based on	-0.124939
-0.002810	Java are based on	-0.124939
-0.002810	Fortran are based on	-0.124939
-0.001403	schemes are based on	-0.124939
-0.002810	recommendations are based on	-0.124939
-0.023008	unknown CPU based on	-0.124939
-0.023008	or C++ based on	-0.124939
-0.023008	A language based on	-0.124939
-0.023008	CPU dispatcher based on	-0.124939
-0.023008	level framework based on	-0.124939
-0.023008	will go based on	-0.124939
-0.023008	be chosen based on	-0.124939
-0.350716	specific CPU feature on	-0.124939
-0.349635	same processor core on	-0.124939
-0.530233	etc. scattered around on	-0.124939
-0.318837	not wrap around on	-0.124939
-0.262364	do more reductions on	-0.124939
-0.262364	most simple reductions on	-0.124939
-0.303529	make algebraic reductions on	-0.124939
-0.303529	any algebraic reductions on	-0.124939
-0.057362	to use depends on	-0.124939
-0.057362	each vector depends on	-0.124939
-0.057362	a loop depends on	-0.124939
-0.106505	each value depends on	-0.124939
-0.013646	control branch depends on	-0.124939
-0.013646	each calculation depends on	-0.425969
-0.057362	final application depends on	-0.124939
-0.057362	each addition depends on	-0.124939
-0.057362	be predicted depends on	-0.124939
-0.057362	of sum depends on	-0.124939
-0.057362	The gain depends on	-0.124939
-0.057362	the truth depends on	-0.124939
-0.319734	should be tested on	-0.124939
-0.350318	are fully compatible on	-0.124939
-0.011549	function name depending on	-0.124939
-0.011549	various ways depending on	-0.124939
-0.011549	frequency dynamically depending on	-0.124939
-0.000456	clock cycles, depending on	-0.823909
-0.011549	the memory, depending on	-0.124939
-0.011549	32-bit integers, depending on	-0.124939
-0.011549	and 64, depending on	-0.124939
-0.011549	or four, depending on	-0.124939
-0.011549	several meanings depending on	-0.124939
-0.011549	example 12.4a, depending on	-0.124939
-0.011549	conditional move, depending on	-0.124939
-0.011549	following solutions, depending on	-0.124939
-0.574525	preferably be avoided on	-0.124939
-0.392871	is to turn on	-0.124939
-0.729404	recommended to turn on	-0.124939
-0.306462	using and turn on	-0.124939
-0.230966	you can turn on	-0.124939
-0.230966	Do not turn on	-0.124939
-0.349042	the methods described on	-0.124939
-0.638558	floating point operation on	-0.124939
-0.348527	new model comes on	-0.124939
-0.014430	is to rely on	-0.124939
-0.014430	convenient to rely on	-0.124939
-0.048698	compilers that rely on	-0.124939
-0.048698	optimizations that rely on	-0.124939
-0.007155	you can rely on	-0.124939
-0.029356	threads should rely on	-0.124939
-0.009566	function cannot rely on	-0.124939
-0.009566	you cannot rely on	-0.124939
-0.009566	You cannot rely on	-0.124939
-0.029356	cannot always rely on	-0.124939
-0.029356	branch must rely on	-0.124939
-0.029356	possible. Don't rely on	-0.124939
-0.029356	can surely rely on	-0.124939
-0.515177	divisions are given on	-0.124939
-0.347083	for certain tasks on	-0.124939
-0.237404	hardly any effect on	-0.124939
-0.237404	no negative effect on	-0.124939
-0.237404	a significant effect on	-0.124939
-0.237404	very dramatic effect on	-0.124939
-0.346422	should work efficiently on	-0.124939
-0.132665	of processor models on	-0.425969
-0.428967	systems" for details on	-0.124939
-0.303277	gives more details on	-0.124939
-0.345642	dependency chain, especially on	-0.124939
-0.469340	threads is discussed on	-0.124939
-0.299016	devices, as discussed on	-0.124939
-0.345468	is explained below on	-0.124939
-0.492324	linker. The delay on	-0.124939
-0.345121	processing unit, either on	-0.124939
-0.344947	; save ebx on	-0.124939
-0.344136	actually doing something on	-0.124939
-0.726544	a clock cycle on	-0.124939
-0.300689	A clock cycle on	-0.124939
-0.425514	one clock cycle on	-0.124939
-0.440012	method is fastest on	-0.124939
-0.530429	conditions are listed on	-0.124939
-0.338178	time you spend on	-0.124939
-0.337909	and make measurements on	-0.124939
-0.337641	sizes were measured on	-0.124939
-0.207802	cases, the log on	-0.124939
-0.207802	password. The log on	-0.124939
-0.207802	usually requires log on	-0.124939
-0.365843	time is spent on	-0.124939
-0.365843	the time spent on	-0.124939
-0.187466	clock cycles spent on	-0.124939
-0.334799	which is 15 on	-0.124939
-0.334488	time than normal on	-0.124939
-0.031508	clients that depend on	-0.124939
-0.031508	iteration should depend on	-0.124939
-0.031508	it doesn't depend on	-0.124939
-0.031508	factorials don't depend on	-0.124939
-0.031508	workaround methods depend on	-0.124939
-0.031508	hardware-related details depend on	-0.124939
-0.478802	the optimization effort on	-0.124939
-0.330113	A negative list, on	-0.124939
-0.330482	necessary to compromise on	-0.124939
-0.795007	a software package on	-0.124939
-0.038094	Software that relies on	-0.124939
-0.038094	the code relies on	-0.124939
-0.038094	your program relies on	-0.124939
-0.038094	The mechanism relies on	-0.124939
-0.038094	the MKL relies on	-0.124939
-0.211637	takes time. Dispatch on	-0.124939
-0.211637	different times: Dispatch on	-0.124939
-0.323631	works particularly bad on	-0.124939
-0.419195	(see page 134 on	-0.124939
-0.048163	compiler for restrictions on	-0.124939
-0.048163	very few restrictions on	-0.124939
-0.011549	are certain restrictions on	-0.124939
-0.324086	better processor appears on	-0.124939
-0.211285	only 5 μs on	-0.124939
-0.283143	= 250 μs on	-0.124939
-0.324086	or 256 bytes) on	-0.124939
-0.323631	well in tests on	-0.124939
-0.406768	explained in detail on	-0.124939
-0.313039	developer.intel.com. Many advices on	-0.124939
-0.236983	may behave differently on	-0.124939
-0.171855	Overflow behaves differently on	-0.124939
-0.037015	operation is performed on	-0.124939
-0.313039	of titles. Literature on	-0.124939
-0.313039	is more focus on	-0.124939
-0.313039	are typically specified on	-0.124939
-0.406034	using example 9.5a on	-0.124939
-0.313039	various discussion forums on	-0.124939
-0.313039	and zero flags on	-0.124939
-0.292599	compilation or interpretation on	-0.124939
-0.102342	TR18015 Technical Report on	-0.124939
-0.102342	18015, "Technical Report on	-0.124939
-0.292599	from www.intel.com. Manual on	-0.124939
-0.380853	the cache miss on	-0.124939
-0.292599	the general literature on	-0.124939
-0.023414	effort is concentrated on	-0.124939
-0.292599	series of experiments on	-0.124939
-0.292599	This has influence on	-0.124939
-0.292599	first two (three on	-0.124939
-0.380853	It will crash on	-0.124939
-0.536030	be predicted perfectly on	-0.124939
-0.292599	can run optimally on	-0.124939
-0.023414	time is wasted on	-0.124939
-0.292599	that rely heavily on	-0.124939
-0.292599	the optimization efforts on	-0.124939
-0.292599	2001. Advanced book on	-0.124939
-0.536030	the const restriction on	-0.124939
-0.292599	from the IDE on	-0.124939
-0.380853	use algebraic manipulations on	-0.124939
-0.292599	counters will stay on	-0.124939
-0.102342	contain many tips on	-0.124939
-0.102342	and some tips on	-0.124939
-0.236487	are provided below, on	-0.124939
-0.236487	disk cache. Files on	-0.124939
-0.236487	for discussions. Turn on	-0.124939
-0.236487	and objects. Storage on	-0.124939
-0.236487	ebx is pushed on	-0.124939
-0.236487	compilers. Wikipedia article on	-0.124939
-0.236487	test theory. Advice on	-0.124939
-0.236487	put a tag on	-0.124939
-0.236487	runtime. Example 7.43 on	-0.124939
-0.236487	is based mainly on	-0.124939
-0.236487	program runs satisfactorily on	-0.124939
-0.236487	and negative impacts on	-0.124939
-0.236487	clock cycles (depending on	-0.124939
-0.236487	deal of research on	-0.124939
-0.236487	is busy concentrating on	-0.124939
-0.236487	Processors". www.amd.com. Advices on	-0.124939
-0.236487	we are relying on	-0.124939
-0.236487	the BIOS setup. on	-0.124939
-0.236487	not a textbook on	-0.124939
-0.236487	are different opinions on	-0.124939
-0.236487	a Boolean NOT on	-0.124939
-0.236487	'this' is incurred on	-0.124939
-1.387290	This is the code	-0.124939
-0.957767	which is the code	-0.124939
-0.947859	version of the code	-0.602060
-0.532170	part of the code	-0.279841
-0.867199	Most of the code	-0.124939
-0.365671	parts of the code	-0.234083
-0.519367	fragmentation of the code	-0.124939
-0.519367	compactness of the code	-0.124939
-0.843106	addition to the code	-0.124939
-0.571033	modifications to the code	-0.124939
-0.548207	obvious and the code	-0.124939
-0.548207	overflow, and the code	-0.124939
-0.548207	tedious and the code	-0.124939
-0.744619	other in the code	-0.124939
-0.744619	values in the code	-0.124939
-0.114442	addresses in the code	-0.301030
-0.229131	space in the code	-0.301030
-0.515636	dispatching in the code	-0.124939
-0.920710	contentions in the code	-0.124939
-0.515636	references in the code	-0.124939
-0.515636	happen in the code	-0.124939
-0.515636	chains in the code	-0.124939
-0.744619	breakpoint in the code	-0.124939
-0.515636	instruments in the code	-0.124939
-0.185405	relocations in the code	-0.425969
-0.515636	pragmas in the code	-0.124939
-0.515636	parallelization in the code	-0.124939
-1.389807	is that the code	-0.124939
-1.275235	so that the code	-0.124939
-1.237098	sure that the code	-0.124939
-1.010094	require that the code	-0.124939
-0.544847	notice that the code	-0.124939
-0.540215	double if the code	-0.124939
-0.540215	see if the code	-0.124939
-0.540215	vectors if the code	-0.124939
-0.540215	vectorized if the code	-0.124939
-0.540215	efficiently if the code	-0.124939
-0.915089	especially if the code	-0.124939
-0.574939	square by the code	-0.124939
-0.985682	resources than the code	-0.124939
-0.519269	relevant when the code	-0.124939
-0.519269	obtained when the code	-0.124939
-0.519269	especially when the code	-0.124939
-0.519269	fragmented when the code	-0.124939
-0.519269	precisions when the code	-0.124939
-0.496466	functions then the code	-0.124939
-0.496466	parameters then the code	-0.124939
-0.712621	time, then the code	-0.124939
-0.496466	double, then the code	-0.124939
-0.682358	look at the code	-0.425969
-0.616669	to make the code	-0.425969
-0.621337	and make the code	-0.124939
-0.569804	better because the code	-0.124939
-0.549267	Actually, only the code	-0.124939
-0.534114	purposes. If the code	-0.124939
-0.534114	order. If the code	-0.124939
-0.534114	12. If the code	-0.124939
-0.558197	vectorized, but the code	-0.124939
-0.513184	feature into the code	-0.124939
-0.513184	directly into the code	-0.124939
-0.383759	This makes the code	-0.124939
-0.268994	this makes the code	-0.124939
-0.268994	one makes the code	-0.124939
-0.268994	also makes the code	-0.124939
-0.383759	option makes the code	-0.124939
-0.268994	templates makes the code	-0.124939
-0.268994	checks makes the code	-0.124939
-0.841166	immediately before the code	-0.124939
-0.545665	are sure the code	-0.124939
-0.825095	in case the code	-0.124939
-1.063346	by making the code	-0.124939
-0.853203	you want the code	-0.124939
-0.479016	to check the code	-0.124939
-0.932834	checks whether the code	-0.124939
-0.439935	needed. All the code	-0.124939
-0.346381	to optimize the code	-0.124939
-0.240014	often optimize the code	-0.124939
-0.533399	debugger. However, the code	-0.124939
-0.557969	handling unless the code	-0.124939
-0.526656	will replace the code	-0.124939
-0.526656	addresses. Therefore, the code	-0.124939
-0.532541	x; Here, the code	-0.124939
-0.528462	will change the code	-0.124939
-0.781949	by copying the code	-0.124939
-0.110620	to vectorize the code	-0.124939
-0.103633	and vectorize the code	-0.124939
-0.103633	will vectorize the code	-0.124939
-0.103633	don't vectorize the code	-0.124939
-0.439935	above. Now the code	-0.124939
-0.340196	problem. Whenever the code	-0.124939
-0.340196	simultaneously prefetching the code	-0.124939
-0.623181	to study the code	-0.124939
-0.340196	the contrary, the code	-0.124939
-0.340196	This reduces the code	-0.124939
-0.340196	to organize the code	-0.124939
-0.340196	and reorganize the code	-0.124939
-0.340196	fine- tune the code	-0.124939
-1.613174	part of a code	-0.124939
-0.461671	on which a code	-0.124939
-0.461671	example shows a code	-0.124939
-0.357382	can execute a code	-0.124939
-0.502855	and insert a code	-0.124939
-0.873749	combined size of code	-0.124939
-0.245540	new branch of code	-0.124939
-0.245540	particular branch of code	-0.124939
-0.854276	A lot of code	-0.124939
-0.080510	the piece of code	-0.124939
-0.136209	a piece of code	-0.204120
-0.270369	same piece of code	-0.124939
-0.179421	critical piece of code	-0.124939
-0.767084	the range of code	-0.124939
-0.568176	total amount of code	-0.124939
-0.475634	two kinds of code	-0.124939
-0.475634	certain kinds of code	-0.124939
-1.234767	in terms of code	-0.124939
-0.495835	small pieces of code	-0.124939
-0.352348	Critical pieces of code	-0.124939
-0.350696	automatic parallelization of code	-0.124939
-0.064749	9.1 Caching of code	-0.425969
-0.350696	Abrash: "Zen of code	-0.124939
-0.578995	ReadB needs to code	-0.124939
-0.357930	cache efficiency and code	-0.124939
-0.938797	function names and code	-0.124939
-0.358712	slight degradation in code	-0.124939
-0.568971	timediff[i]); } The code	-0.124939
-0.434359	same time. The code	-0.124939
-0.434359	save time. The code	-0.124939
-0.548861	trigonometric functions. The code	-0.124939
-0.344031	identical branches The code	-0.124939
-0.913156	of data. The code	-0.124939
-0.484305	Clang compilers. The code	-0.124939
-0.344031	stored together The code	-0.124939
-0.505511	integer calculations. The code	-0.124939
-0.484305	every access. The code	-0.124939
-0.344031	CPU dispatching. The code	-0.124939
-0.344031	OS X The code	-0.124939
-0.484305	vectorize automatically. The code	-0.124939
-0.344031	class members. The code	-0.124939
-0.484305	interrupt 3. The code	-0.124939
-0.444766	sizeof operator. The code	-0.124939
-0.344031	is specified. The code	-0.124939
-0.344031	is allowed. The code	-0.124939
-0.630570	page 122. The code	-0.124939
-0.344031	you know). The code	-0.124939
-0.344031	becomes contiguous. The code	-0.124939
-0.344031	following features: The code	-0.124939
-0.344031	and tedious. The code	-0.124939
-0.344031	return _mm_cvtsd_si32(_mm_load_sd(&x));} The code	-0.124939
-0.548000	features and for code	-0.124939
-1.037353	good choice for code	-0.124939
-0.357151	been criticized for code	-0.124939
-0.504519	very likely that code	-0.124939
-0.526466	individual functions or code	-0.124939
-0.569759	instance. The function code	-0.124939
-0.358260	or later with code	-0.124939
-0.358302	titles. Literature on code	-0.124939
-0.781764	} } This code	-0.124939
-0.684129	1; } This code	-0.124939
-0.353360	return n;} This code	-0.124939
-0.353360	example 16.1. This code	-0.124939
-0.504296	higher priority than code	-0.124939
-0.529701	problem with this code	-0.124939
-0.351324	on which this code	-0.124939
-0.351324	are running this code	-0.124939
-0.351324	cause overflow, this code	-0.124939
-0.535980	possible or when code	-0.124939
-0.535980	execution time when code	-0.124939
-0.353131	execute CriticalFunction when code	-0.124939
-0.809886	same time. A code	-0.124939
-0.352632	works correctly. A code	-0.124939
-0.352632	it says. A code	-0.124939
-1.556011	of the program code	-0.124939
-1.297824	in the program code	-0.124939
-0.521685	piece of program code	-0.124939
-0.478383	compilation. The program code	-0.124939
-0.478383	interpretation. The program code	-0.124939
-0.598037	means to make code	-0.124939
-1.024163	to a different code	-0.124939
-0.832122	because the same code	-0.124939
-0.565128	shows the same code	-0.124939
-0.832122	produce the same code	-0.124939
-1.323286	share the same code	-0.124939
-0.656985	there is other code	-0.124939
-1.366555	the floating point code	-0.124939
-0.997841	on floating point code	-0.124939
-0.362679	makes floating point code	-0.124939
-0.991547	list of which code	-0.124939
-0.462930	Check that all code	-0.124939
-0.462930	verify that all code	-0.124939
-0.346672	can call all code	-0.124939
-0.346672	vectorization Not all code	-0.124939
-0.502899	12.7. Vector class code	-0.124939
-0.503419	to make multiple code	-0.124939
-1.484217	32-bit and 64-bit code	-0.124939
-0.357272	and maintaining such code	-0.124939
-0.577725	and less efficient code	-0.124939
-0.952795	in situations where code	-0.124939
-0.356841	and run any code	-0.124939
-0.585028	lines. This makes code	-0.124939
-0.784268	of the critical code	-0.124939
-0.512500	by the critical code	-0.124939
-0.061546	13 Making critical code	-0.425969
-0.561379	code 64 bit code	-0.124939
-0.554142	directives 32 bit code	-0.124939
-0.479491	that the system code	-0.124939
-0.340541	therefore the system code	-0.124939
-0.434714	function in system code	-0.124939
-0.503132	or the error code	-0.124939
-0.476291	with an error code	-0.124939
-0.476291	return an error code	-0.124939
-0.355430	useful discussions about code	-0.124939
-0.523722	generates no extra code	-0.124939
-0.364540	generate any extra code	-0.124939
-0.688987	produce any extra code	-0.124939
-0.310733	actually add extra code	-0.124939
-0.310733	compiler inserts extra code	-0.124939
-0.056218	C++ and assembly code	-0.425969
-0.529329	to use assembly code	-0.124939
-0.376154	compilers need assembly code	-0.124939
-0.056218	the following assembly code	-0.425969
-0.438508	use inline assembly code	-0.124939
-0.295610	that the compiled code	-0.124939
-0.295610	makes the compiled code	-0.124939
-0.466347	C++, directly compiled code	-0.124939
-0.354841	economy and small code	-0.124939
-0.355042	recommendation for good code	-0.124939
-0.137837	going from AVX code	-0.124939
-0.137837	transition from AVX code	-0.124939
-0.287221	run the optimized code	-0.124939
-0.287221	12.2, the optimized code	-0.124939
-0.390077	output. The optimized code	-0.124939
-0.390077	the fully optimized code	-0.124939
-0.462756	making highly optimized code	-0.124939
-0.354210	function or every code	-0.124939
-0.353699	prediction). 149 All code	-0.124939
-0.354966	interprets the intermediate code	-0.124939
-0.200569	disadvantage of intermediate code	-0.124939
-0.270527	code and intermediate code	-0.124939
-0.296596	distributed. The intermediate code	-0.124939
-0.042138	based on intermediate code	-0.124939
-0.127988	with an intermediate code	-0.124939
-0.127988	use an intermediate code	-0.124939
-0.127988	used an intermediate code	-0.124939
-0.059296	using an intermediate code	-0.425969
-1.123736	compiler to optimize code	-0.124939
-0.497833	explain the above code	-0.124939
-0.195670	a[i]; The above code	-0.124939
-0.195670	sources. The above code	-0.124939
-0.195670	rarely. The above code	-0.124939
-0.283609	is. This above code	-0.124939
-0.545270	produce the optimal code	-0.124939
-0.334200	produce less optimal code	-0.124939
-0.507673	using a particular code	-0.124939
-0.507673	where a particular code	-0.124939
-0.536090	in 32-bit Mac code	-0.124939
-0.496570	such a complicated code	-0.124939
-0.507039	make the source code	-0.124939
-0.330317	code). The source code	-0.124939
-0.326322	installation time. Each code	-0.124939
-0.326322	at initialization. Each code	-0.124939
-0.324985	remember that your code	-0.124939
-0.324985	years before your code	-0.124939
-0.350948	compiled to binary code	-0.124939
-0.350393	the most advanced code	-0.124939
-0.137631	off the position-independent code	-0.124939
-0.019752	linking and position-independent code	-0.124939
-0.040446	often use position-independent code	-0.124939
-0.040446	systems use position-independent code	-0.124939
-0.085050	X make position-independent code	-0.124939
-0.085050	not using position-independent code	-0.124939
-0.085050	objects without position-independent code	-0.124939
-0.085050	compiler uses position-independent code	-0.124939
-0.085050	for special position-independent code	-0.124939
-0.085050	the burdensome position-independent code	-0.124939
-0.276966	automatically in vectorized code	-0.124939
-0.276966	prone. The vectorized code	-0.124939
-0.276966	to use vectorized code	-0.124939
-0.346554	400 here. Any code	-0.124939
-0.446993	make exactly identical code	-0.124939
-0.482908	makes position- independent code	-0.124939
-0.515821	possible to vectorize code	-0.124939
-0.539704	the hardware definition code	-0.124939
-0.155370	in the machine code	-0.124939
-0.155370	transferred as machine code	-0.124939
-0.155370	translated into machine code	-0.124939
-0.155370	the resulting machine code	-0.124939
-0.255551	system code. System code	-0.124939
-0.255551	everything else. System code	-0.124939
-0.160372	146 below. Position-independent code	-0.124939
-0.160372	by default. Position-independent code	-0.124939
-0.223670	147 14.12 Position-independent code	-0.124939
-0.233004	is a loop-invariant code	-0.124939
-0.044906	elimination and loop-invariant code	-0.124939
-0.044906	propagation, and loop-invariant code	-0.124939
-0.094996	move out loop-invariant code	-0.124939
-0.211797	register size. Vectorized code	-0.124939
-0.211797	overloaded operators. Vectorized code	-0.124939
-0.044066	12.6 Transforming serial code	-0.425969
-0.211797	restrictions on mixing code	-0.124939
-0.211797	problem when mixing code	-0.124939
-0.324750	method. Your measurement code	-0.124939
-0.420297	of data cache, code	-0.124939
-0.172271	see the compiler-generated code	-0.124939
-0.172271	libraries and compiler-generated code	-0.124939
-0.407103	used for improving code	-0.124939
-0.313902	clear and well-structured code	-0.124939
-0.172271	instructions. The built-in code	-0.124939
-0.172271	often inserts built-in code	-0.124939
-0.314207	http://www.agner.org/optimize/asmlib.zip contains complete code	-0.124939
-0.048280	x Loop invariant code	-0.124939
-0.048280	compiler. Loop invariant code	-0.124939
-0.537473	code to non-AVX code	-0.124939
-0.293422	} The resulting code	-0.124939
-0.293422	if the unsafe code	-0.124939
-0.293422	the linker. Both code	-0.124939
-0.293422	shell script. Interpreted code	-0.124939
-0.237210	copy is dead code	-0.124939
-0.237210	Studio can build code	-0.124939
-0.237210	of the user-written code	-0.124939
-0.237210	sure the startup code	-0.124939
-0.237210	use it. Complicated code	-0.124939
-0.237210	user. Making exception-safe code	-0.124939
-0.237210	and the resultant code	-0.124939
-1.341260	member function is as	-0.124939
-0.952954	overloaded operator is as	-0.124939
-0.656393	instruction sets is as	-0.124939
-0.248926	A destructor is as	-0.124939
-0.248926	virtual destructor is as	-0.124939
-0.596136	executable to be as	-0.124939
-0.594758	etc. should be as	-0.124939
-0.524644	header files are as	-0.124939
-0.357132	operator i++ are as	-0.124939
-0.357527	function parameter, or as	-0.124939
-0.358263	compiler recognizes it as	-0.124939
-1.053371	CPU detection function as	-0.124939
-0.956856	the same code as	-0.124939
-0.357530	fixed size, not as	-0.124939
-0.593583	(b*2.0)/3.0 rather than as	-0.124939
-0.355780	circular buffer than as	-0.124939
-0.358296	to access x as	-0.124939
-1.339575	then you may as	-0.124939
-0.547865	but you may as	-0.124939
-0.503178	dispatcher should have as	-0.124939
-1.084426	the same time as	-0.124939
-0.873370	no extra time as	-0.124939
-0.357658	matrix for use as	-0.124939
-0.587986	organizing the data as	-0.124939
-0.457955	as much data as	-0.124939
-1.130392	of the program as	-0.124939
-0.461732	a 256-bit vector as	-0.124939
-0.044872	is the same as	-0.191886
-0.522458	does the same as	-0.124939
-0.504645	becomes the same as	-0.124939
-0.502894	common string functions as	-0.124939
-0.357366	optimizations automatically, but as	-0.124939
-0.537761	counter is used as	-0.124939
-0.537761	bool is used as	-0.124939
-0.952618	can be used as	-0.124939
-0.633245	may be used as	-0.124939
-0.971380	also be used as	-0.124939
-0.425582	directives when used as	-0.124939
-0.848621	are often used as	-0.124939
-1.276099	the level-2 cache as	-0.124939
-0.589870	try to do as	-0.124939
-0.352699	routine should do as	-0.124939
-0.352439	type and size as	-0.124939
-0.352439	units same size as	-0.124939
-0.357103	a / b as	-0.124939
-0.877735	variable or object as	-0.124939
-0.356541	in 36 C++ as	-0.124939
-0.130095	member function such as	-0.124939
-0.014285	using functions such as	-0.124939
-0.003527	mathematical functions such as	-0.249877
-0.014285	C functions such as	-0.124939
-0.014285	math functions such as	-0.124939
-0.014285	memory-intensive functions such as	-0.124939
-0.130095	Good compilers such as	-0.124939
-0.130095	integer operations such as	-0.124939
-0.130095	composite type such as	-0.124939
-0.130095	special cases such as	-0.124939
-0.130095	of CPUs such as	-0.124939
-0.130095	used branches such as	-0.124939
-0.130095	other applications such as	-0.124939
-0.130095	simple types such as	-0.124939
-0.060194	other optimizations such as	-0.124939
-0.060194	do optimizations such as	-0.124939
-0.130095	algebraic reductions such as	-0.124939
-0.060194	compiled languages such as	-0.124939
-0.060194	includes languages such as	-0.124939
-0.029055	for tasks such as	-0.124939
-0.029055	standard tasks such as	-0.124939
-0.029055	Other tasks such as	-0.124939
-0.029055	trivial tasks such as	-0.124939
-0.130095	long time, such as	-0.124939
-0.130095	building blocks such as	-0.124939
-0.130095	of purposes such as	-0.124939
-0.130095	mathematical iterations such as	-0.124939
-0.130095	of vector, such as	-0.124939
-0.130095	segmented memory, such as	-0.124939
-0.130095	third-party profilers such as	-0.124939
-0.130095	also available, such as	-0.124939
-0.130095	between threads, such as	-0.124939
-0.130095	programming languages, such as	-0.124939
-0.130095	same resources, such as	-0.124939
-0.130095	of overflow, such as	-0.124939
-0.130095	by considerations such as	-0.124939
-0.130095	in comparisons, such as	-0.124939
-0.130095	definition language, such as	-0.124939
-0.130095	string classes, such as	-0.124939
-0.130095	STL templates, such as	-0.124939
-0.130095	other resource, such as	-0.124939
-0.130095	data shuffling, such as	-0.124939
-0.130095	certain events, such as	-0.124939
-0.130095	with suffixes such as	-0.124939
-0.130095	inherently serial, such as	-0.124939
-0.130095	automatic vectorization, such as	-0.124939
-0.130095	to obtain, such as	-0.124939
-0.130095	removable media such as	-0.124939
-0.130095	table 9.2, such as	-0.124939
-0.130095	feature information, such as	-0.124939
-0.026658	is as efficient as	-0.602060
-0.085450	therefore as efficient as	-0.124939
-0.040627	exactly as efficient as	-0.124939
-0.201541	the previous value as	-0.124939
-0.501888	allow vector objects as	-0.124939
-0.568844	transferring the variable as	-0.124939
-0.343802	floating point variable as	-0.124939
-0.929271	the induction variable as	-0.124939
-0.356644	be designed so as	-0.124939
-0.343242	for multiple variables as	-0.124939
-0.352641	with Boolean variables as	-0.124939
-0.352641	have Boolean variables as	-0.124939
-0.071985	time as long as	-0.124939
-0.071985	program as long as	-0.124939
-0.071985	but as long as	-0.124939
-0.071985	variables as long as	-0.124939
-0.071985	calculations as long as	-0.124939
-0.071985	significant as long as	-0.124939
-0.071985	integers, as long as	-0.124939
-0.109097	the same way as	-0.124939
-0.343476	i is stored as	-0.124939
-0.343476	sign is stored as	-0.124939
-0.343476	exponent is stored as	-0.124939
-0.343476	fraction is stored as	-0.124939
-0.414308	distributed and stored as	-0.124939
-0.995570	variables are stored as	-0.124939
-0.459193	occur quite often as	-0.124939
-1.225277	is not always as	-0.124939
-1.176650	object oriented programming as	-0.124939
-0.574480	These are available as	-0.124939
-0.355123	takes six times as	-0.124939
-0.590414	150 you want as	-0.124939
-0.567944	align the arrays as	-0.124939
-0.342571	software development work as	-0.124939
-0.342571	as little work as	-0.124939
-1.247078	floating point calculations as	-0.124939
-0.807252	that is compiled as	-0.124939
-0.482289	possibly be compiled as	-0.124939
-0.354389	low-level C language as	-0.124939
-0.458293	do as much as	-0.124939
-0.354329	the same thread as	-0.124939
-0.354095	be as small as	-0.124939
-0.354458	operators using integers as	-0.124939
-0.059950	always as good as	-0.124939
-0.059950	optimized as good as	-0.124939
-0.059950	optimize as good as	-0.124939
-0.059950	cached as good as	-0.124939
-0.810999	and 64-bit Linux as	-0.124939
-1.365778	can be done as	-0.124939
-0.137622	operations are therefore as	-0.124939
-0.137622	Constructors are therefore as	-0.124939
-0.650110	the same precision as	-0.124939
-0.457446	compiler. Not optimized as	-0.124939
-0.517728	expression is calculated as	-0.124939
-0.721883	can be calculated as	-0.124939
-0.409504	will be calculated as	-0.124939
-0.837315	want to get as	-0.124939
-0.354413	are particular advantageous as	-0.124939
-0.339243	code is implemented as	-0.124939
-0.339243	software is implemented as	-0.124939
-0.634110	can be implemented as	-0.124939
-0.121802	should be implemented as	-0.425969
-0.291688	easily be implemented as	-0.124939
-0.256809	which are implemented as	-0.124939
-0.256809	loops are implemented as	-0.124939
-0.234617	is often implemented as	-0.124939
-0.137162	of error known as	-0.124939
-0.137162	programming error known as	-0.124939
-0.100959	program as well as	-0.124939
-0.100959	functions as well as	-0.124939
-0.100959	Linux as well as	-0.124939
-0.100959	framework as well as	-0.124939
-0.100959	reading as well as	-0.124939
-0.100959	users as well as	-0.124939
-0.100959	enum as well as	-0.124939
-0.100959	R2 as well as	-0.124939
-0.353360	and therefore count as	-0.124939
-0.352672	is not quite as	-0.124939
-0.034603	is as fast as	-0.124939
-0.034603	are as fast as	-0.124939
-0.034603	therefore as fast as	-0.124939
-0.011231	just as fast as	-0.124939
-0.647715	Does not optimize as	-0.124939
-0.352353	as few branches as	-0.124939
-0.188283	the same name as	-0.425969
-0.567067	child class name as	-0.124939
-0.351561	interpret that string as	-0.124939
-0.350945	references accept expressions as	-0.124939
-0.449681	which is transferred as	-0.124939
-0.304782	and then transferred as	-0.124939
-0.304782	are always transferred as	-0.124939
-0.515019	the .NET framework as	-0.124939
-0.350761	of thousand numbers as	-0.124939
-0.744569	that are declared as	-0.124939
-0.514993	storing intermediate results as	-0.124939
-0.350331	measured results were as	-0.124939
-0.267324	may be just as	-0.124939
-0.267324	calculations are just as	-0.124939
-0.267324	a vector just as	-0.124939
-0.267324	brackets index, just as	-0.124939
-0.349853	may be smaller as	-0.124939
-0.349319	such obvious reductions as	-0.124939
-0.535379	compiled programming languages as	-0.124939
-0.450688	matrix in STL as	-0.124939
-0.830967	It is intended as	-0.124939
-0.281149	for optimizing code, as	-0.124939
-0.281149	the source code, as	-0.124939
-0.281149	for CPU-intensive code, as	-0.124939
-0.403042	for different platforms as	-0.124939
-0.621460	to other platforms as	-0.124939
-0.449588	class is given as	-0.124939
-0.483277	now be vectorized as	-0.124939
-0.310755	is indeed vectorized as	-0.124939
-0.544311	code the offset as	-0.124939
-0.346534	7.34a. Use macro as	-0.124939
-0.346846	by comparing them as	-0.124939
-1.049759	the same thing as	-0.124939
-0.446268	expressions may occur as	-0.124939
-0.345221	applies to reading as	-0.124939
-0.051332	be implemented either as	-0.425969
-0.256420	be linked either as	-0.124939
-0.344853	It uses ebx as	-0.124939
-0.446501	etc. are defined as	-0.124939
-0.343259	is not significant as	-0.124939
-0.444302	it becomes invalid as	-0.124939
-0.361736	can be organized as	-0.124939
-0.277682	lines are organized as	-0.124939
-0.206652	memory if organized as	-0.124939
-0.206652	point registers organized as	-0.124939
-1.027570	SSE2 instruction set, as	-0.124939
-0.344067	on non-Intel processors, as	-0.124939
-0.445067	coding rules apply as	-0.124939
-0.343663	the same features as	-0.124939
-0.687753	fashioned C style as	-0.124939
-0.481488	language is chosen as	-0.124939
-0.210868	language is provided as	-0.124939
-0.210868	contain is provided as	-0.124939
-0.341990	be as standardized as	-0.124939
-0.340909	are usually included as	-0.124939
-0.340157	inheritance is now as	-0.124939
-0.340408	the same unit as	-0.124939
-0.046149	of modern CPUs, as	-0.425969
-0.298426	or multi-core CPUs, as	-0.124939
-0.340408	instead of j as	-0.124939
-0.340658	many different factors as	-0.124939
-0.340157	CPU dispatching explicitly as	-0.124939
-0.395520	i is interpreted as	-0.124939
-0.277996	will be interpreted as	-0.124939
-0.363818	operator is exactly as	-0.124939
-0.363818	Enums are exactly as	-0.124939
-0.337813	will calculate xn as	-0.124939
-0.484336	code is distributed as	-0.124939
-0.207953	compiled and distributed as	-0.124939
-0.207953	function libraries distributed as	-0.124939
-0.436939	to 64-bit mode, as	-0.124939
-0.475348	variable in memory, as	-0.124939
-0.436939	on the system, as	-0.124939
-0.436581	and 64-bit integers, as	-0.124939
-0.337813	make the measurements as	-0.124939
-0.335036	the same principle as	-0.124939
-0.335366	preferably be static, as	-0.124939
-0.334706	value in edx as	-0.124939
-0.334376	for software users as	-0.124939
-0.254578	invalid as soon as	-0.124939
-0.254578	5). As soon as	-0.124939
-0.427148	cannot be executed as	-0.124939
-0.495891	exact time consumption as	-0.124939
-0.330786	i will appear as	-0.124939
-0.330394	point value written as	-0.124939
-0.465044	15.1d to 15.1c as	-0.124939
-0.324004	be cleaned up, as	-0.124939
-0.324004	double, bool, enum as	-0.124939
-0.591736	loss of precision, as	-0.124939
-0.324004	implement a queue as	-0.124939
-0.011546	to be expressed as	-0.124939
-0.003815	can be expressed as	-0.124939
-0.324487	compiler // Same as	-0.124939
-0.056763	x is treated as	-0.124939
-0.056763	object is treated as	-0.124939
-0.122076	are simply treated as	-0.124939
-0.093036	that is coded as	-0.124939
-0.093036	This is coded as	-0.124939
-0.402760	can be represented as	-0.124939
-0.211595	in fact represented as	-0.124939
-0.419660	accurate and reproducible as	-0.124939
-0.312932	as template parameters, as	-0.124939
-0.405903	in an FPGA as	-0.124939
-0.405903	such small devices, as	-0.124939
-0.312932	with multiple counters, as	-0.124939
-0.405903	doing out-of-order execution, as	-0.124939
-0.312932	to optimize access, as	-0.124939
-0.312932	of cache space, as	-0.124939
-0.405903	of vector operations, as	-0.124939
-0.312932	just-in-time compilers, etc., as	-0.124939
-0.312932	or get ReadTSC as	-0.124939
-0.312932	used for metaprogramming, as	-0.124939
-0.312932	vector of vectors, as	-0.124939
-0.312932	use vector classes, as	-0.124939
-0.292497	container class templates, as	-0.124939
-0.292497	can eliminate branches, as	-0.124939
-0.292497	as well developed as	-0.124939
-0.292497	reproducible. Such events as	-0.124939
-0.292497	seconds or microseconds as	-0.124939
-0.292497	about register use, as	-0.124939
-0.292497	coded as _mm_empty() as	-0.124939
-0.292497	in other ways, as	-0.124939
-0.292497	compiled without AVX, as	-0.124939
-0.292497	data are cached as	-0.124939
-0.292497	that have Booleans as	-0.124939
-0.102309	is calculated internally as	-0.124939
-0.102309	is implemented internally as	-0.124939
-0.236398	complicated and clumsy, as	-0.124939
-0.236398	for switch statements, as	-0.124939
-0.236398	are not yet as	-0.124939
-0.236398	should be regarded as	-0.124939
-0.236398	a memory pool, as	-0.124939
-0.236398	copied by assignment, as	-0.124939
-0.236398	for other optimizations, as	-0.124939
-0.236398	becoming increasingly blurred as	-0.124939
-0.236398	not be passed as	-0.124939
-0.236398	Function pointer serves as	-0.124939
-0.236398	of data elements, as	-0.124939
-0.236398	tested implement OneOrTwo5[b!=0] as	-0.124939
-0.236398	use static linking, as	-0.124939
-0.236398	the clock frequency, as	-0.124939
-0.236398	insert optimization hints as	-0.124939
-0.236398	unit is pipelined, as	-0.124939
-0.236398	the critical stride, as	-0.124939
-0.236398	expensive cache contentions, as	-0.124939
-0.236398	with bounds checking, as	-0.124939
-0.236398	factorial function (n!) as	-0.124939
-0.236398	the same directory as	-0.124939
-0.236398	use a union, as	-0.124939
-0.236398	serious legal issue, as	-0.124939
-0.236398	and garbage collection, as	-0.124939
-0.236398	Server 2008 R2 as	-0.124939
-0.501822	objects and is not	-0.124939
-1.111974	code that is not	-0.124939
-0.642006	that it is not	-0.124939
-0.795261	if it is not	-0.124939
-0.356522	when it is not	-0.124939
-0.888756	then it is not	-0.124939
-0.742628	If it is not	-0.124939
-0.458996	which it is not	-0.124939
-0.581406	but it is not	-0.124939
-0.831093	where it is not	-0.124939
-0.458996	hand, it is not	-0.124939
-0.458996	Furthermore, it is not	-0.124939
-0.458996	Today, it is not	-0.124939
-0.458996	software, it is not	-0.124939
-1.327085	the function is not	-0.124939
-0.738612	a function is not	-0.124939
-0.514081	linked function is not	-0.124939
-1.112517	the code is not	-0.124939
-0.492752	function code is not	-0.124939
-0.706527	system code is not	-0.124939
-0.492752	built-in code is not	-0.124939
-0.559217	code. This is not	-0.124939
-0.559217	usability. This is not	-0.124939
-0.934547	The compiler is not	-0.124939
-0.325299	as this is not	-0.124939
-0.404480	but this is not	-0.124939
-0.325299	example, this is not	-0.124939
-0.325299	However, this is not	-0.124939
-0.458638	Obviously, this is not	-0.124939
-0.325299	unfortunately this is not	-0.124939
-0.745966	of A is not	-0.124939
-0.708673	} It is not	-0.124939
-0.494062	libraries It is not	-0.124939
-0.494062	loop. It is not	-0.124939
-0.494062	thread. It is not	-0.124939
-0.494062	structures It is not	-0.124939
-0.708673	problems. It is not	-0.124939
-0.494062	zero. It is not	-0.124939
-0.494062	profiler. It is not	-0.124939
-0.494062	programs. It is not	-0.124939
-0.494062	lost. It is not	-0.124939
-0.494062	standardized. It is not	-0.124939
-0.494062	poorly. It is not	-0.124939
-0.687127	member functions is not	-0.124939
-0.491988	C++ but is not	-0.124939
-1.732271	instruction set is not	-0.124939
-0.770624	integer size is not	-0.124939
-0.687127	when i is not	-0.124939
-0.477736	new object is not	-0.124939
-0.477736	original object is not	-0.124939
-0.423194	higher number is not	-0.124939
-0.852556	the array is not	-0.124939
-0.844576	of objects is not	-0.124939
-0.811107	the table is not	-0.124939
-0.760255	the performance is not	-0.124939
-0.493287	this address is not	-0.124939
-0.460731	PC processors is not	-0.124939
-0.480791	the error is not	-0.124939
-0.423194	old CPUs is not	-0.124939
-0.526139	the processor is not	-0.124939
-0.333086	logical processor is not	-0.124939
-0.655305	double precision is not	-0.124939
-1.064255	repeat count is not	-0.124939
-0.326837	several branches is not	-0.124939
-0.414973	exception handling is not	-0.124939
-0.414973	Exception handling is not	-0.124939
-0.326837	function name is not	-0.124939
-0.460731	__fastcall keyword is not	-0.124939
-0.597904	user interface is not	-0.124939
-0.101018	a union is not	-0.124939
-0.232892	copy constructor is not	-0.124939
-0.232892	default constructor is not	-0.124939
-0.326837	of points is not	-0.124939
-0.423194	data section is not	-0.124939
-0.480791	one computer is not	-0.124939
-0.734487	to p is not	-0.124939
-0.597904	the STL is not	-0.124939
-0.503483	that index is not	-0.124939
-0.326837	these conditions is not	-0.124939
-0.326837	the alignment is not	-0.124939
-0.326837	cross-platform compatibility is not	-0.124939
-0.361929	second operand is not	-0.124939
-0.503483	case, N is not	-0.124939
-0.326837	loop unrolling is not	-0.124939
-0.326837	row length is not	-0.124939
-0.460731	This tool is not	-0.124939
-0.326837	of iterations is not	-0.124939
-0.326837	memory required is not	-0.124939
-0.326837	cache misses is not	-0.124939
-0.326837	the debugger is not	-0.124939
-0.326837	image base is not	-0.124939
-0.326837	constant propagation is not	-0.124939
-0.326837	program package is not	-0.124939
-0.705279	the divisor is not	-0.124939
-0.326837	compilers. Fastcall is not	-0.124939
-0.326837	Step (1) is not	-0.124939
-0.326837	If hyperthreading is not	-0.124939
-0.423194	file format is not	-0.124939
-0.326837	and mirroring is not	-0.124939
-0.326837	This '1' is not	-0.124939
-0.357199	than 2n and not	-0.124939
-0.357199	human readable and not	-0.124939
-0.357199	n.a. _MSC_VER and not	-0.124939
-0.357199	platform __GNUC__ and not	-0.124939
-0.713399	functions that are not	-0.124939
-0.479041	lengths that are not	-0.124939
-0.479041	Applications that are not	-0.124939
-0.479041	events that are not	-0.124939
-1.041067	if you are not	-0.124939
-0.462434	as you are not	-0.124939
-0.834065	when you are not	-0.124939
-0.931922	If you are not	-0.124939
-0.802749	and data are not	-0.124939
-0.536970	Fastcall functions are not	-0.124939
-0.468381	Watcom compilers are not	-0.124939
-0.468381	Current compilers are not	-0.124939
-0.961878	If there are not	-0.124939
-0.404056	The objects are not	-0.124939
-0.569592	allocated objects are not	-0.124939
-0.404056	shared objects are not	-0.124939
-0.771806	function libraries are not	-0.124939
-0.394756	standard libraries are not	-0.124939
-0.394756	LIBM libraries are not	-0.124939
-0.375318	operating systems are not	-0.124939
-0.742918	and they are not	-0.124939
-0.742918	but they are not	-0.124939
-0.519751	vector operations are not	-0.124939
-0.151019	write instructions are not	-0.124939
-0.508352	Intel processors are not	-0.124939
-0.815038	point parameters are not	-0.124939
-0.493605	network resources are not	-0.124939
-0.509903	modern microprocessors are not	-0.124939
-0.461026	model numbers are not	-0.124939
-0.327054	algebraic reductions are not	-0.124939
-0.327054	These conversions are not	-0.124939
-0.423464	function names are not	-0.124939
-0.327054	in sequence are not	-0.124939
-0.101061	PLT tables are not	-0.425969
-0.327054	for D are not	-0.124939
-0.461026	The profilers are not	-0.124939
-0.327054	sorting algorithms, are not	-0.124939
-0.327054	(e.g. '>') are not	-0.124939
-0.591304	what it can not	-0.124939
-0.357685	A redesign can not	-0.124939
-0.538705	reduced speed or not	-0.124939
-0.357274	use hyperthreading or not	-0.124939
-0.526587	linking and by not	-0.124939
-0.526339	__GNUC__ and not not	-0.124939
-1.540472	tell the compiler not	-0.124939
-0.899972	that it may not	-0.124939
-1.377093	then it may not	-0.124939
-0.385315	the compiler may not	-0.301030
-1.023944	The compiler may not	-0.124939
-0.443817	arrays It may not	-0.124939
-0.443817	registers. It may not	-0.124939
-0.443817	objects? It may not	-0.124939
-0.420113	allocated memory may not	-0.124939
-0.324368	inlined functions may not	-0.124939
-0.324368	level-1 cache may not	-0.124939
-0.457373	current compilers may not	-0.124939
-0.420113	of pointers may not	-0.124939
-0.324368	The user may not	-0.124939
-0.778458	operating system may not	-0.124939
-0.324368	returns. alloca may not	-0.124939
-0.324368	Calling exit may not	-0.124939
-0.324368	USB sticks may not	-0.124939
-0.462727	64-bit. They have not	-0.124939
-0.358187	large expressions when not	-0.124939
-0.579605	and it will not	-0.124939
-0.410871	that it will not	-0.124939
-0.410871	but it will not	-0.124939
-0.906062	the code will not	-0.124939
-0.488845	memory. It will not	-0.124939
-0.517058	The program will not	-0.124939
-0.496294	the compilers will not	-0.124939
-0.496294	The compilers will not	-0.124939
-0.488845	But we will not	-0.124939
-0.332472	me. You will not	-0.124939
-0.332472	code 16 will not	-0.124939
-0.575835	when it has not	-0.124939
-0.976502	The compiler has not	-0.124939
-0.819949	class library has not	-0.124939
-0.441195	some cases, but not	-0.124939
-0.460352	compile- time, but not	-0.124939
-0.405242	is used, but not	-0.124939
-0.405242	AMD processors, but not	-0.124939
-0.441195	Intel CPUs, but not	-0.124939
-0.312399	or 8, but not	-0.124939
-0.312399	in memory, but not	-0.124939
-0.441195	and efficient, but not	-0.124939
-0.312399	a float, but not	-0.124939
-0.405242	multiple applications, but not	-0.124939
-0.312399	more complex, but not	-0.124939
-0.312399	or .a), but not	-0.124939
-0.312399	2 GB, but not	-0.124939
-0.312399	be noticeable but not	-0.124939
-0.063629	expression that should not	-0.425969
-0.559414	because you should not	-0.124939
-0.787290	CPU dispatcher should not	-0.124939
-0.342120	performance measurement should not	-0.124939
-0.597179	lowest instruction set not	-0.124939
-0.143305	systems that do not	-0.124939
-0.143305	microprocessors that do not	-0.124939
-0.226255	languages that do not	-0.124939
-0.143305	cores that do not	-0.124939
-0.107102	most compilers do not	-0.124939
-0.107102	why compilers do not	-0.124939
-0.189706	and we do not	-0.124939
-0.189706	that we do not	-0.124939
-0.249579	point variables do not	-0.124939
-0.107102	function libraries do not	-0.124939
-0.107102	Intel libraries do not	-0.124939
-0.249579	that pointers do not	-0.124939
-0.249579	32-bit systems do not	-0.124939
-0.249579	because they do not	-0.124939
-0.249579	integer operations do not	-0.124939
-0.249579	these directives do not	-0.124939
-0.249579	bigger vectors do not	-0.124939
-0.249579	when contentions do not	-0.124939
-0.249579	relative references do not	-0.124939
-0.249579	These conversions do not	-0.124939
-0.249579	STL containers do not	-0.124939
-0.249579	many programmers do not	-0.124939
-0.249579	overlap. Compilers do not	-0.124939
-0.021622	live ranges do not	-0.602060
-0.249579	have studied do not	-0.124939
-0.249579	their live-ranges do not	-0.124939
-0.249579	(live ranges) do not	-0.124939
-0.461949	aligned Assume pointer not	-0.124939
-0.356472	a class need not	-0.124939
-0.356425	signed. Be sure not	-0.124939
-0.460236	Instruction set SSE2 not	-0.124939
-0.310864	that it does not	-0.124939
-0.117017	Assume function does not	-0.124939
-0.054579	profiler. This does not	-0.124939
-0.054579	point. This does not	-0.124939
-0.151400	The compiler does not	-0.124939
-0.151400	This compiler does not	-0.124939
-0.151400	Microsoft compiler does not	-0.124939
-0.117017	However, this does not	-0.124939
-0.194631	check. It does not	-0.124939
-0.117017	the loop does not	-0.124939
-0.035614	the pointer does not	-0.124939
-0.035614	a pointer does not	-0.124939
-0.035614	specific pointer does not	-0.124939
-0.117017	IPP library does not	-0.124939
-0.117017	the object does not	-0.124939
-0.026433	of 2 does not	-0.124939
-0.117017	is long does not	-0.124939
-0.117017	or thread does not	-0.124939
-0.117017	This manual does not	-0.124939
-0.117017	the list does not	-0.124939
-0.117017	CPU dispatcher does not	-0.124939
-0.117017	The programmer does not	-0.124939
-0.117017	pointer aliasing does not	-0.124939
-0.117017	other hand, does not	-0.124939
-0.117017	the unit-test does not	-0.124939
-0.117017	same argument does not	-0.124939
-0.117017	Example 14.26 does not	-0.124939
-0.355498	value, n. But not	-0.124939
-1.225659	It is therefore not	-0.124939
-0.413089	microprocessor and therefore not	-0.124939
-0.413089	dependent and therefore not	-0.124939
-0.745012	You should therefore not	-0.124939
-0.498821	scratch. This would not	-0.124939
-0.570440	X" is simply not	-0.124939
-0.353656	If seconds was not	-0.124939
-0.352779	scalar (Scalar means not	-0.124939
-0.494756	32 bit platform not	-0.124939
-0.561744	compiler is usually not	-0.124939
-0.532052	problem that were not	-0.124939
-0.323361	different tasks were not	-0.124939
-0.050806	a time. Do not	-0.124939
-0.050806	member function. Do not	-0.124939
-0.050806	are used. Do not	-0.124939
-0.050806	less efficient. Do not	-0.124939
-0.050806	instruction set. Do not	-0.124939
-0.050806	memory allocation. Do not	-0.124939
-0.050806	memory block. Do not	-0.124939
-0.050806	certain optimizations. Do not	-0.124939
-0.050806	linked list. Do not	-0.124939
-0.050806	scarce resource. Do not	-0.124939
-0.050806	+ column; Do not	-0.124939
-0.050806	Enterprise editions). Do not	-0.124939
-0.347083	set, but possibly not	-0.124939
-0.347185	this function, though not	-0.124939
-0.445730	cases it might not	-0.124939
-0.343310	test should include not	-0.124939
-0.441718	for Intel CPUs, not	-0.124939
-0.338924	If columns had not	-0.124939
-0.438443	classes are generally not	-0.124939
-0.747133	the operating system, not	-0.124939
-0.338924	with fixed size, not	-0.124939
-0.400265	saved in registers, not	-0.124939
-0.268823	only on registers, not	-0.124939
-0.314566	use. I am not	-0.124939
-0.314566	microcontrollers. I am not	-0.124939
-0.429012	platform not _WIN64 not	-0.124939
-0.428864	version is currently not	-0.124939
-0.095080	some cases. Does not	-0.124939
-0.095080	instruction sets. Does not	-0.124939
-0.233120	32-bit Windows. Does not	-0.124939
-0.095080	an IDE. Does not	-0.124939
-0.314251	Studio IDE. Has not	-0.124939
-0.314251	in a register, not	-0.124939
-0.314440	The profiler measures not	-0.124939
-0.538058	my own research, not	-0.124939
-0.293755	containers is 95 not	-0.124939
-0.237503	PTR [edx] adds, not	-0.124939
-0.237503	follow the rows, not	-0.124939
-0.237503	However, this did not	-0.124939
-0.237503	non-recursing template specialization, not	-0.124939
-0.237503	accessing arrays forwards, not	-0.124939
-0.237503	operating systems (but not	-0.124939
-0.237503	will take precedence, not	-0.124939
-0.643511	intrinsic functions // This	-0.124939
-0.453159	= 0 // This	-0.124939
-0.350673	SSE3 required // This	-0.124939
-0.140864	/ 16; // This	-0.124939
-0.140864	% 16; // This	-0.124939
-0.350673	simple method. // This	-0.124939
-0.350673	- time1; // This	-0.124939
-0.350673	a square. // This	-0.124939
-0.812453	} } } This	-0.124939
-0.514970	range } } This	-0.124939
-0.669590	= 1; } This	-0.124939
-0.936172	+ 1; } This	-0.124939
-0.328962	return f; } This	-0.124939
-0.334810	/ 3; } This	-0.124939
-0.334810	% 3; } This	-0.124939
-0.328962	printf("Delta"); break; } This	-0.124939
-0.328962	} FuncC(i); } This	-0.124939
-0.328962	four sums } This	-0.124939
-0.328962	FuncB(i+1); FuncC(i+1); } This	-0.124939
-0.594131	of the code. This	-0.124939
-0.427141	the intermediate code. This	-0.124939
-0.329996	to non-AVX code. This	-0.124939
-0.753556	of the time. This	-0.124939
-0.854122	at a time. This	-0.124939
-0.329713	the first time. This	-0.124939
-0.497512	any extra time. This	-0.124939
-0.354996	systems. 10 Gnu This	-0.124939
-0.350331	of the function. This	-0.425969
-0.305019	call the function. This	-0.124939
-0.305019	inside the function. This	-0.124939
-0.702729	the dispatcher function. This	-0.124939
-0.353523	number 2, etc. This	-0.124939
-0.526279	other member functions. This	-0.124939
-0.419896	the virtual functions. This	-0.124939
-0.457137	so-called intrinsic functions. This	-0.124939
-0.353962	a || b; This	-0.124939
-0.574772	in the memory. This	-0.124939
-0.467500	temp in memory. This	-0.124939
-0.526568	in program memory. This	-0.124939
-0.374212	library into memory. This	-0.124939
-0.407587	in RAM memory. This	-0.124939
-0.563599	of the program. This	-0.124939
-0.855483	of a program. This	-0.124939
-0.284893	a C++ program. This	-0.124939
-0.284893	the final program. This	-0.124939
-0.330531	level- 1 cache. This	-0.124939
-0.755730	the level-2 cache. This	-0.124939
-0.547194	comparisons more efficient. This	-0.124939
-0.350894	page 8 below. This	-0.124939
-0.452106	on the data. This	-0.124939
-0.380849	block of data. This	-0.124939
-0.712116	addressing of data. This	-0.124939
-0.577029	FMA4 instruction set. This	-0.124939
-0.460035	for different compilers. This	-0.124939
-0.422556	with other compilers. This	-0.124939
-0.420838	matrix[i][j] += x; This	-0.124939
-0.702882	factorial *= x; This	-0.124939
-0.567147	function is called. This	-0.124939
-0.591774	is never called. This	-0.124939
-0.408884	for different CPUs. This	-0.124939
-0.288154	support different CPUs. This	-0.124939
-0.414424	the Intel compiler. This	-0.124939
-0.319798	Intel C++ compiler. This	-0.124939
-0.248202	the innermost loop. This	-0.124939
-0.318182	a memory pointer. This	-0.124939
-0.412414	the member pointer. This	-0.124939
-0.660748	and Windows platforms. This	-0.124939
-0.753112	all x86 platforms. This	-0.124939
-0.717062	in most cases. This	-0.124939
-0.315598	in both cases. This	-0.124939
-0.285115	point is 1. This	-0.124939
-0.371700	N = 1. This	-0.124939
-0.720019	0 or 1. This	-0.124939
-0.511111	level-1 cache size. This	-0.124939
-0.488621	for register variables. This	-0.124939
-0.346695	or network resources. This	-0.124939
-0.507354	structure or class. This	-0.124939
-0.360595	its child class. This	-0.124939
-0.275996	the derived class. This	-0.124939
-0.637214	c + d; This	-0.124939
-0.447463	pointer to it. This	-0.124939
-0.447463	scarcity of registers. This	-0.124939
-0.448751	typically 64 bytes. This	-0.124939
-0.732787	the shared object. This	-0.124939
-0.798669	of the library. This	-0.124939
-0.346983	the actual calculations. This	-0.124939
-0.346168	of integer operations. This	-0.124939
-0.398224	on the variable. This	-0.124939
-0.306723	a local variable. This	-0.124939
-0.560078	whole program optimization. This	-0.124939
-0.306165	offer profile-guided optimization. This	-0.124939
-0.304453	on the stack. This	-0.124939
-0.253683	up the stack. This	-0.124939
-0.227748	each their stack. This	-0.124939
-0.536072	library if possible. This	-0.124939
-0.345114	space than needed. This	-0.124939
-0.447506	for each thread. This	-0.124939
-0.281064	into each thread. This	-0.124939
-0.486410	by another thread. This	-0.124939
-0.554122	for other purposes. This	-0.124939
-0.393430	all these purposes. This	-0.124939
-0.631188	floating point instructions. This	-0.124939
-0.445169	into the vector. This	-0.124939
-0.387567	languages as well. This	-0.124939
-0.298070	Optimizes very well. This	-0.124939
-0.547245	the CPU dispatching. This	-0.124939
-0.343436	of the problem. This	-0.124939
-1.226495	the function returns. This	-0.124939
-0.626146	in random order. This	-0.124939
-1.136318	dynamic memory allocation. This	-0.124939
-0.263945	the memory block. This	-0.124939
-0.263945	bigger memory block. This	-0.124939
-0.757899	Func is executed. This	-0.124939
-0.625586	check for overflow. This	-0.124939
-0.339952	the same value. This	-0.124939
-0.090793	single object file. This	-0.425969
-0.339625	same logical register. This	-0.124939
-0.958150	the operating system. This	-0.124939
-0.499136	are very fast. This	-0.124939
-0.339625	point multiplication units. This	-0.124939
-0.439628	in the array. This	-0.124939
-0.339299	their 23 software. This	-0.124939
-0.871065	a cache line. This	-0.124939
-0.339625	in the vectors. This	-0.124939
-0.336924	and its parameters. This	-0.124939
-0.452520	speed is important. This	-0.124939
-0.284475	portability is important. This	-0.124939
-0.267679	a vector simultaneously. This	-0.124939
-0.350510	eight threads simultaneously. This	-0.124939
-0.861948	Microsoft Visual Studio This	-0.124939
-0.869745	on the processor. This	-0.124939
-0.397147	number of bits. This	-0.124939
-0.279236	to 64 bits. This	-0.124939
-0.207972	than 32 bits. This	-0.124939
-0.337294	it actually is. This	-0.124939
-0.337294	important than speed. This	-0.124939
-0.382047	jobs to do. This	-0.124939
-0.267679	but event-counters do. This	-0.124939
-0.408599	than by 16. This	-0.124939
-0.267364	i modulo 16. This	-0.124939
-0.612576	20 Copyright notice This	-0.124939
-0.503506	of data members. This	-0.124939
-0.869337	and back again. This	-0.124939
-0.254162	a hundred times. This	-0.124939
-0.470791	at inconvenient times. This	-0.124939
-0.611761	the preceding one. This	-0.124939
-0.611761	time stamp counter. This	-0.124939
-0.611761	class or structure. This	-0.124939
-0.329408	a ready-made profiler. This	-0.124939
-0.329408	Gnu compiler manual. This	-0.124939
-0.329408	loop are finished. This	-0.124939
-0.329918	sets 4 ways. This	-0.124939
-0.497064	intrinsics. Digital Mars This	-0.124939
-0.329408	function dispatch process. This	-0.124939
-0.329408	of data files. This	-0.124939
-0.484460	do out-of-order execution. This	-0.124939
-0.159722	in different modules. This	-0.124939
-0.062395	any other modules. This	-0.124939
-0.181962	code at all. This	-0.124939
-0.181962	offset at all. This	-0.124939
-0.329408	32-bit absolute addresses. This	-0.124939
-0.329408	before multiplying them. This	-0.124939
-0.329408	values per point. This	-0.124939
-0.477927	registers is doubled. This	-0.124939
-0.743850	every clock cycle. This	-0.124939
-0.322937	be filled up. This	-0.124939
-0.322937	for marketing reasons. This	-0.124939
-0.322937	for all objects. This	-0.124939
-0.591816	same cache lines. This	-0.124939
-0.471969	15] += 1.0f; This	-0.124939
-0.322937	have multiple versions. This	-0.124939
-0.322937	not be cached. This	-0.124939
-0.323565	zero-bits if unsigned. This	-0.124939
-0.591816	and invalid pointers. This	-0.124939
-0.647038	on this option. This	-0.124939
-0.323565	a2 / b2; This	-0.124939
-0.418330	other programming languages. This	-0.124939
-0.322937	the subsequent counts. This	-0.124939
-0.322937	faster and smaller. This	-0.124939
-0.673964	from another module. This	-0.124939
-0.322937	do this manually. This	-0.124939
-0.419112	get reproducible results. This	-0.124939
-0.312361	floating point addition. This	-0.124939
-0.312361	the code only. This	-0.124939
-0.312361	they are long. This	-0.124939
-0.462217	calculated in advance. This	-0.124939
-0.405196	32 = 28. This	-0.124939
-0.171529	program is loaded. This	-0.124939
-0.171529	has been loaded. This	-0.124939
-0.312361	in compiled C++. This	-0.124939
-0.312361	efficient code caching. This	-0.124939
-0.312361	array is stored. This	-0.124939
-0.312361	a random manner. This	-0.124939
-0.312361	of #include directives. This	-0.124939
-0.171529	the function definition. This	-0.124939
-0.236604	the class definition. This	-0.124939
-0.136689	and VIA CPUs"). This	-0.124939
-0.441144	is the same. This	-0.124939
-0.312361	a null reference. This	-0.124939
-0.462217	neutralize each other. This	-0.124939
-0.171529	become too fragmented. This	-0.124939
-0.236604	has become fragmented. This	-0.124939
-0.312361	instead of truncation. This	-0.124939
-0.171529	static or inline. This	-0.124939
-0.342030	the function inline. This	-0.124939
-0.571261	malloc and free. This	-0.124939
-0.036955	created or modified. This	-0.425969
-0.312361	destructor of x. This	-0.124939
-0.312361	results in a. This	-0.124939
-0.291953	the table static. This	-0.124939
-0.291953	and interface frameworks. This	-0.124939
-0.534899	moving the mouse. This	-0.124939
-0.380061	bytes S1 ArrayOfStructures[100]; This	-0.124939
-0.380061	it is compiled. This	-0.124939
-0.291953	4 = 32. This	-0.124939
-0.291953	+ a; 72 This	-0.124939
-0.291953	be declared volatile. This	-0.124939
-0.291953	also less safe. This	-0.124939
-0.291953	to be pure. This	-0.124939
-0.534899	out of range. This	-0.124939
-0.291953	u.f < 2.0 This	-0.124939
-0.291953	microprocessors is lost. This	-0.124939
-0.291953	the function declaration. This	-0.124939
-0.291953	is never changed. This	-0.124939
-0.380061	the "override" feature. This	-0.124939
-0.291953	array is defined. This	-0.124939
-0.380061	code by default. This	-0.124939
-0.291953	in software development. This	-0.124939
-0.380061	object is known. This	-0.124939
-0.291953	rather than rounding. This	-0.124939
-0.291953	not even temporarily. This	-0.124939
-0.380061	should be predicted. This	-0.124939
-0.534899	of everything else. This	-0.124939
-0.291953	arrays with alloca. This	-0.124939
-0.380061	of 250 ms. This	-0.124939
-0.380061	(see page 137). This	-0.124939
-0.291953	the index, i. This	-0.124939
-0.380061	164 1 Introduction This	-0.124939
-0.380061	higher than normal. This	-0.124939
-0.291953	misses have occurred. This	-0.124939
-0.291953	slightly less efficiently. This	-0.124939
-0.380061	sum += list[i]; This	-0.124939
-0.291953	Use different executables. This	-0.124939
-0.235919	less than 231. This	-0.124939
-0.235919	have undesired effects. This	-0.124939
-0.235919	the label $B1$2:. This	-0.124939
-0.235919	been called before. This	-0.124939
-0.235919	bus is saturated. This	-0.124939
-0.235919	it is compiling. This	-0.124939
-0.235919	longer time slices. This	-0.124939
-0.235919	exceed 2 Gbytes. This	-0.124939
-0.235919	of task switching. This	-0.124939
-0.235919	point of view. This	-0.124939
-0.235919	2.11 ifunc branch). This	-0.124939
-0.235919	standard 754 (1985). This	-0.124939
-0.235919	is also de-allocated. This	-0.124939
-0.235919	at address [ecx+eax*4]. This	-0.124939
-0.235919	b + 0.666666666666666666667; This	-0.124939
-0.235919	be made local. This	-0.124939
-0.235919	or 32 bytes). This	-0.124939
-0.235919	for vacant spaces. This	-0.124939
-0.235919	it becomes full. This	-0.124939
-0.235919	instead of if. This	-0.124939
-0.235919	calculation of (a+b). This	-0.124939
-0.235919	part of it). This	-0.124939
-0.235919	be quite substantial. This	-0.124939
-0.235919	of its arguments. This	-0.124939
-0.235919	/QaxAVX or -axAVX. This	-0.124939
-0.235919	(partial) template specialization. This	-0.124939
-0.235919	the last member. This	-0.124939
-0.235919	Studio 2008 version). This	-0.124939
-0.235919	exception ever happens. This	-0.124939
-0.235919	with two entries. This	-0.124939
-0.235919	Microsoft Visual Studio. This	-0.124939
-0.235919	#endif return n;} This	-0.124939
-0.235919	!= 0; 35 This	-0.124939
-0.235919	test their functionality. This	-0.124939
-0.235919	ALIGN ; mark_end; This	-0.124939
-0.235919	page size (4096). This	-0.124939
-0.235919	8 columns unused. This	-0.124939
-0.235919	exceeds 64 kbytes. This	-0.124939
-0.235919	can be reduced. This	-0.124939
-0.235919	math library (SVML). This	-0.124939
-0.235919	happen quite often. This	-0.124939
-0.235919	(e.g. DEC, JNZ). This	-0.124939
-0.235919	system thread scheduler. This	-0.124939
-0.235919	must be added. This	-0.124939
-0.235919	the external clock. This	-0.124939
-0.235919	list plus i*sizeof(S1). This	-0.124939
-0.235919	See page 45. This	-0.124939
-0.235919	interfaces from scratch. This	-0.124939
-0.235919	a polymorphous class? This	-0.124939
-0.235919	(see page 135). This	-0.124939
-0.235919	blocking or tiling. This	-0.124939
-0.235919	in example 16.1. This	-0.124939
-0.235919	comparing bits 32-62. This	-0.124939
-0.235919	on page 87. This	-0.124939
-0.235919	available from www.agner.org/optimize/testp.zip. This	-0.124939
-0.235919	string or CString. This	-0.124939
-0.235919	the previous iteration. This	-0.124939
-0.235919	(MKL v. 7.2). This	-0.124939
-0.235919	YMM register state. This	-0.124939
-0.235919	memset is deprecated. This	-0.124939
-0.235919	calculated as ((a+b)+c)+d. This	-0.124939
-0.235919	compiler (parallel composer) This	-0.124939
-0.235919	instead of -fpic. This	-0.124939
-0.235919	utilities in 2010. This	-0.124939
-0.235919	each line written. This	-0.124939
-0.235919	on page 158. This	-0.124939
-0.235919	a time measure. This	-0.124939
-0.235919	normal return route. This	-0.124939
-0.235919	in Windows MFC). This	-0.124939
-0.235919	other access patterns. This	-0.124939
-0.235919	overflow is "undefined". This	-0.124939
-0.235919	the option -mveclibabi=svml. This	-0.124939
-0.235919	// Portability note: This	-0.124939
-0.235919	at this place. This	-0.124939
-0.235919	= 64 kb. This	-0.124939
-0.235919	perspective of usability. This	-0.124939
-0.235919	log(b[i]) + log(c[i]);. This	-0.124939
-0.235919	= (short int)i; This	-0.124939
-0.235919	(/FAs or -fsource-asm). This	-0.124939
-0.235919	void F1() throw(); This	-0.124939
-0.235919	instruction xor eax,eax. This	-0.124939
-1.118521	a = a -	-0.124939
-0.529607	0 = a -	-0.124939
-0.148623	-(-a) = a -	-0.124939
-0.148623	a*1 = a -	-0.124939
-0.148623	a+0 = a -	-0.124939
-0.376349	a/1 = a -	-0.124939
-0.376349	~(~a) = a -	-0.124939
-0.825590	{ return a -	-0.124939
-0.673827	3; return a -	-0.124939
-0.725749	= multiply by -	-0.124939
-0.154608	- - - -	-1.185637
-0.331775	x - - -	-0.698970
-0.650034	n.a. - - -	-0.903090
-0.189073	-- - - -	-0.124939
-0.153574	- x - -	-0.926571
-0.434965	x x - -	-0.669007
-0.219014	-1 x - -	-0.124939
-0.328225	- n.a. - -	-0.823909
-0.231408	x n.a. - -	-0.124939
-0.134386	n.a. n.a. - -	-1.447158
-0.245199	x -- - -	-0.124939
-0.178918	a+a+a+a=a*4 -(-a)=a - -	-0.124939
-0.239374	= a x -	-0.124939
-0.250153	- - x -	-1.000000
-0.596214	x - x -	-0.778151
-0.069002	n.a. - x -	-0.602060
-0.496101	- x x -	-0.903090
-0.676815	x x x -	-1.185637
-0.267438	n.a. x x -	-0.425969
-0.361198	(x) x x -	-0.124939
-0.082012	n.a. n.a. x -	-0.602060
-0.436210	(2.5f * x -	-0.124939
-0.239374	= -1 x -	-0.124939
-0.239374	Devirtualization ---x----- x -	-0.124939
-0.239374	x-xxx---x x-xxx---x x -	-0.124939
-1.828626	of the program -	-0.124939
-0.435794	a - n.a. -	-0.425969
-0.218261	n.a. - n.a. -	-0.726999
-0.299623	0 - n.a. -	-0.602060
-0.356613	-1 - n.a. -	-0.124939
-0.356613	a&(b|c) - n.a. -	-0.124939
-0.356613	a<<(b+c) - n.a. -	-0.124939
-0.197485	n.a. x n.a. -	-0.124939
-0.030973	- n.a. n.a. -	-0.784991
-0.173997	- reciprocal n.a. -	-0.124939
-0.655913	((x2) 2) 2 -	-0.124939
-0.140346	double takes 4 -	-0.124939
-0.140346	Multiplication takes 4 -	-0.124939
-0.758961	cache of 8 -	-0.124939
-0.251025	~a = 0 -	-0.124939
-0.150370	a-a = 0 -	-0.124939
-0.074168	a*0 = 0 -	-0.124939
-0.251025	^a = 0 -	-0.124939
-0.251025	0/a = 0 -	-0.124939
-0.163666	andnot(a,a) = 0 -	-0.124939
-0.260729	test bits 0 -	-0.124939
-0.260729	takes typically 0 -	-0.124939
-0.260729	& 0= 0 -	-0.124939
-0.565763	use unsigned integers -	-0.124939
-0.354296	Manual", Volume 1 -	-0.124939
-0.471779	penalty of 10 -	-0.124939
-0.323363	detected until 10 -	-0.124939
-0.349425	(a >= b) -	-0.124939
-0.348709	code is inlined -	-0.124939
-0.248420	processors, and 3 -	-0.124939
-0.248420	addition takes 3 -	-0.124939
-0.248420	may take 3 -	-0.124939
-0.341458	is approximately 12 -	-0.124939
-0.687503	-1 = -1 -	-0.124939
-0.339006	is quite expensive -	-0.124939
-0.335929	Division takes 14 -	-0.124939
-0.331454	conversion takes 50 -	-0.124939
-0.420842	division takes 40 -	-0.124939
-0.682819	testing and maintenance -	-0.124939
-0.407633	listing /FA -S -	-0.124939
-0.314329	x- x ----- -	-0.124939
-0.314492	(-a)*(-b) = a*b -	-0.124939
-0.314329	timediff[i] = ReadTSC() -	-0.124939
-0.463451	n.a. Constant folding -	-0.124939
-0.048338	a+b+c = a+(b+c) -	-0.124939
-0.048338	(a+b)+c = a+(b+c) -	-0.124939
-0.102738	x x -- -	-0.124939
-0.102738	- xxxxxxxxx -- -	-0.124939
-0.023497	a*b+a*c = a*(b+c) -	-0.124939
-0.382361	((x2)2)2 a+a+a+a=a*4 -(-a)=a -	-0.124939
-0.538188	a*b = b*a -	-0.124939
-0.538188	(a&b)|(a&c) = a&(b|c) -	-0.124939
-0.237568	matter of convenience -	-0.124939
-0.237568	----- - x-xxx -	-0.124939
-0.237568	// (time after) -	-0.124939
-0.237568	(i = (int)n -	-0.124939
-0.237568	polynomial(x) = 2.5*x^2 -	-0.124939
-0.237568	a<<b<<c = a<<(b+c) -	-0.124939
-0.237568	and multiplication (27 -	-0.124939
-0.237568	and multiplication (20 -	-0.124939
-0.237568	if ((unsigned int)(i -	-0.124939
-0.237568	and subtraction (3 -	-0.124939
-0.237568	Copyright © 2004 -	-0.124939
-0.237568	x-xxx - xx(-)x- -	-0.124939
-0.237568	<= (unsigned int)(max -	-0.124939
-0.237568	features, see http://www.agner.org/optimize/ -	-0.124939
-0.237568	a+a+a+a = a*4 -	-0.124939
-0.237568	x-- x --- -	-0.124939
-1.165382	expression that is an	-0.124939
-1.890544	if it is an	-0.124939
-1.022762	a program is an	-0.124939
-0.834849	smart pointer is an	-0.124939
-0.095485	if b is an	-0.301030
-1.366173	then there is an	-0.124939
-0.529041	case there is an	-0.124939
-0.529041	But there is an	-0.124939
-0.943671	cases, there is an	-0.124939
-0.553472	libraries. C++ is an	-0.124939
-0.559908	... There is an	-0.124939
-0.559908	throughput There is an	-0.124939
-0.900462	the processor is an	-0.124939
-0.772997	loop counter is an	-0.124939
-0.352953	the source is an	-0.124939
-0.749503	when n is an	-0.124939
-0.352953	that 10 is an	-0.124939
-0.896231	the exponent is an	-0.124939
-1.643662	the size of an	-0.124939
-1.216569	The size of an	-0.124939
-1.121456	the address of an	-0.425969
-0.831127	the bits of an	-0.124939
-1.127260	the type of an	-0.124939
-0.325018	in case of an	-0.221849
-1.371134	different versions of an	-0.124939
-0.519216	An overflow of an	-0.124939
-0.534568	unused copy of an	-0.124939
-1.134998	the end of an	-0.124939
-0.769411	the range of an	-0.124939
-0.532327	used. Conversion of an	-0.124939
-0.814208	the throughput of an	-0.124939
-1.051564	the availability of an	-0.124939
-0.645306	just-in-time compilation of an	-0.124939
-0.351587	the event of an	-0.124939
-0.351587	the functionality of an	-0.124939
-0.458426	console or to an	-0.124939
-1.667688	a pointer to an	-0.124939
-0.754190	point number to an	-0.124939
-0.522793	first compiled to an	-0.124939
-0.354828	are aligned to an	-0.124939
-0.460019	always points to an	-0.124939
-0.460019	actually points to an	-0.124939
-0.816209	same applies to an	-0.124939
-1.073612	be converted to an	-0.124939
-0.354828	add functionality to an	-0.124939
-0.754190	reduced 15.1a to an	-0.124939
-0.354828	adding bounds-checking to an	-0.124939
-0.357310	a multiplication and an	-0.124939
-0.461580	microprocessor core and an	-0.124939
-0.461580	exclusive mode, and an	-0.124939
-0.751636	a pointer in an	-0.124939
-1.373744	of elements in an	-0.124939
-1.034732	objects stored in an	-0.124939
-0.454946	or first in an	-0.124939
-0.550031	of bits in an	-0.124939
-0.844006	largest element in an	-0.124939
-0.530847	each run in an	-0.124939
-0.352084	memory allocation in an	-0.124939
-0.352084	a microprocessor in an	-0.124939
-0.454946	true last in an	-0.124939
-0.141723	stored together in an	-0.425969
-0.710978	are provided in an	-0.124939
-0.352084	question: Put in an	-0.124939
-0.352084	thread-like scheduling in an	-0.124939
-0.581480	and used for an	-0.124939
-0.453015	page 32 for an	-0.124939
-0.350560	be allocated for an	-0.124939
-0.989283	page 130 for an	-0.124939
-0.350560	page 80 for an	-0.124939
-0.350560	page 43 for an	-0.124939
-0.350560	page 81 for an	-0.124939
-0.064731	page 89 for an	-0.124939
-0.064731	page 78 for an	-0.425969
-0.064731	VIA CPUs" for an	-0.425969
-0.863766	cannot assume that an	-0.124939
-0.357770	program dictates that an	-0.124939
-1.061948	known to be an	-0.124939
-0.826212	appears to be an	-0.124939
-0.561922	argument to be an	-0.124939
-1.563473	it can be an	-0.124939
-1.779073	This can be an	-0.124939
-0.584703	programming can be an	-0.124939
-0.583671	array will be an	-0.124939
-0.847658	may also be an	-0.124939
-0.569130	which would be an	-0.124939
-1.572670	should preferably be an	-0.124939
-0.649221	command line or an	-0.124939
-0.456834	an expression or an	-0.124939
-0.353573	link map or an	-0.124939
-0.353573	an integer, or an	-0.124939
-0.353573	assignment operator, or an	-0.124939
-0.456648	etc., and if an	-0.124939
-0.951810	to check if an	-0.124939
-0.353426	But what if an	-0.124939
-0.871395	will fail if an	-0.124939
-0.353426	A Number) if an	-0.124939
-0.553553	process or by an	-0.124939
-0.464983	be calculated by an	-0.425969
-0.353004	command received by an	-0.124939
-0.353004	be followed by an	-0.124939
-0.490442	list or with an	-0.124939
-0.528545	the code with an	-0.124939
-0.450366	explain this with an	-0.124939
-0.348466	can return with an	-0.124939
-0.702756	be accessed with an	-0.124939
-0.815282	is done with an	-0.124939
-0.511958	is implemented with an	-0.124939
-0.450366	PC platform with an	-0.124939
-0.353782	is called on an	-0.124939
-0.355870	is running on an	-0.124939
-0.562218	when running on an	-0.124939
-1.339162	are based on an	-0.124939
-0.336997	access x as an	-0.124939
-0.435914	the data as an	-0.124939
-0.163717	is used as an	-0.425969
-0.474618	point variable as an	-0.124939
-0.773160	same way as an	-0.124939
-0.336997	are available as an	-0.124939
-0.474618	is transferred as an	-0.124939
-0.617065	is provided as an	-0.124939
-0.435914	is interpreted as an	-0.124939
-0.956800	be expressed as an	-0.124939
-0.677244	is treated as an	-0.124939
-0.617065	is coded as an	-0.124939
-0.435914	be represented as an	-0.124939
-0.336997	function (n!) as an	-0.124939
-0.823320	This is not an	-0.124939
-1.241715	this is not an	-0.124939
-0.195451	processor is not an	-0.124939
-1.041626	take more than an	-0.124939
-0.354271	more expensive than an	-0.124939
-0.354271	more compact than an	-0.124939
-0.370897	user will have an	-0.124939
-0.370897	processor will have an	-0.124939
-0.527488	Some compilers have an	-0.425969
-0.359078	Most compilers have an	-0.124939
-0.359078	Many compilers have an	-0.124939
-0.502546	F1 also have an	-0.124939
-0.730863	then we have an	-0.124939
-0.435229	don't even have an	-0.124939
-0.988758	compiler doesn't have an	-0.124939
-0.336451	for Linux have an	-0.124939
-0.580190	called every time an	-0.124939
-0.878563	inefficient to use an	-0.124939
-1.134721	You may use an	-0.124939
-0.550570	option then use an	-0.124939
-0.348520	programming languages use an	-0.124939
-0.348520	overkill. Don't use an	-0.124939
-0.357948	some microprocessors when an	-0.124939
-0.358029	declared volatile then an	-0.124939
-0.150527	be stored at an	-0.425969
-0.444183	to start at an	-0.124939
-0.126023	be loaded at an	-0.124939
-0.343568	must begin at an	-0.124939
-1.071153	if it has an	-0.124939
-0.796471	the compiler has an	-0.124939
-0.653095	Intel compiler has an	-0.124939
-0.794342	The program has an	-0.124939
-0.547101	or library has an	-0.124939
-0.342871	functions. Sum1 has an	-0.124939
-0.773778	you can make an	-0.124939
-0.677739	We can make an	-0.124939
-0.519251	counter then make an	-0.124939
-0.357949	an issue because an	-0.124939
-0.989080	This is only an	-0.124939
-0.529225	multiplication but only an	-0.124939
-0.453583	pointers requires only an	-0.124939
-0.357805	cleaned up. If an	-0.124939
-0.357686	of Pascal used an	-0.124939
-0.548785	way to set an	-0.124939
-1.428182	possible to do an	-0.124939
-0.353014	vector register, do an	-0.124939
-0.984773	disadvantage of using an	-0.124939
-0.478442	reason for using an	-0.124939
-0.621489	you are using an	-0.425969
-0.575105	this by using an	-0.124939
-0.338509	or class into an	-0.124939
-0.338509	of software into an	-0.124939
-0.338509	is compiled into an	-0.124939
-0.497525	be loaded into an	-0.124939
-0.338509	by one, into an	-0.124939
-0.357367	by making i an	-0.124939
-0.447075	reference to such an	-0.124939
-0.345861	doesn't make such an	-0.124939
-0.345861	hardware. Porting such an	-0.124939
-0.356372	function may return an	-0.124939
-0.451238	faster and makes an	-0.124939
-0.349155	the linker makes an	-0.124939
-0.582473	lookup is often an	-0.124939
-0.537102	we don't need an	-0.124939
-0.346439	command-line versions without an	-0.124939
-0.346439	in applications without an	-0.124939
-0.465381	is to access an	-0.124939
-0.662607	order to access an	-0.124939
-0.322907	two and making an	-0.124939
-0.788369	avoided by making an	-0.124939
-0.327721	compiler from making an	-0.124939
-1.095840	how many times an	-0.124939
-0.355217	any assumption about an	-0.124939
-0.355081	bits wide, while an	-0.124939
-0.459009	as you avoid an	-0.124939
-0.342511	cards, etc. Use an	-0.124939
-0.442850	of data. Use an	-0.124939
-0.355175	or C1::f. But an	-0.124939
-0.442683	DLL goes through an	-0.124939
-0.342379	from main through an	-0.124939
-0.339243	Mac code uses an	-0.124939
-0.339243	This feature uses an	-0.124939
-1.083602	order to get an	-0.124939
-0.354433	to check whether an	-0.124939
-0.498046	microprocessor is doing an	-0.124939
-0.903694	it will run an	-0.124939
-0.353655	loop or add an	-0.124939
-0.570024	enum is simply an	-0.124939
-0.353013	example 11.2b was an	-0.124939
-1.053585	in most cases, an	-0.124939
-0.938817	compiler can replace an	-0.124939
-0.647753	that behaves like an	-0.124939
-0.352740	a function. Using an	-0.124939
-0.496453	threads and put an	-0.124939
-0.904728	because it needs an	-0.124939
-0.494730	importantly, it requires an	-0.124939
-0.351445	page 58 shows an	-0.124939
-0.309230	system to generate an	-0.124939
-0.309230	able to generate an	-0.124939
-0.255015	which will generate an	-0.124939
-0.255015	linker will generate an	-0.124939
-0.255015	c+b will generate an	-0.124939
-0.351049	you should choose an	-0.124939
-0.514981	This is just an	-0.124939
-0.316850	of code gives an	-0.124939
-0.316850	library, SSE4.1 gives an	-0.124939
-0.349034	application software. Such an	-0.124939
-0.551001	is in fact an	-0.124939
-0.349034	32-bit Windows, including an	-0.124939
-0.347618	arithmetic operations. When an	-0.124939
-0.347319	press. 19 Avoid an	-0.124939
-0.347319	such as copying an	-0.124939
-0.307582	cost to accessing an	-0.124939
-0.307582	or when accessing an	-0.124939
-0.542075	improved by adding an	-0.124939
-0.347419	the write causes an	-0.124939
-0.449671	when you divide an	-0.124939
-0.346250	a branch (e.g. an	-0.124939
-0.448110	We can convert an	-0.124939
-0.777185	way to handle an	-0.124939
-0.525189	risking to insert an	-0.124939
-0.293725	be called whenever an	-0.124939
-0.293725	are declared whenever an	-0.124939
-0.344254	the function modify an	-0.124939
-0.344511	array or setting an	-0.124939
-0.734300	to optimize away an	-0.124939
-0.338256	first PC's had an	-0.124939
-0.335099	a constant plus an	-0.124939
-0.335309	if you declare an	-0.124939
-0.335099	operator will detect an	-0.124939
-0.434058	programming language defines an	-0.124939
-0.335518	smart pointer. Accessing an	-0.124939
-0.428043	simply to increment an	-0.124939
-0.330717	that it adds an	-0.124939
-0.331216	unless you specify an	-0.124939
-0.324533	way of declaring an	-0.124939
-0.324227	frame functions. While an	-0.124939
-0.313620	code will catch an	-0.124939
-0.313620	the dispatcher signal an	-0.124939
-0.313620	C++ builder Has an	-0.124939
-0.313620	program to issue an	-0.124939
-0.313620	condition clause. Comparing an	-0.124939
-0.313620	can possibly throw an	-0.124939
-0.406755	the user expects an	-0.124939
-0.293154	the function construct an	-0.124939
-0.293154	a slow CPU, an	-0.124939
-0.293154	defined a constructor, an	-0.124939
-0.293154	input never exceeds an	-0.124939
-0.293154	simply by performing an	-0.124939
-0.293154	This will provoke an	-0.124939
-0.293154	+= a[i]; Converting an	-0.124939
-0.381533	avoided by replacing an	-0.124939
-0.236974	this works, here's an	-0.124939
-0.236974	or for issuing an	-0.124939
-0.236974	we are seeing an	-0.124939
-0.236974	next line provokes an	-0.124939
-0.236974	you are feeding an	-0.124939
-0.236974	that simply prints an	-0.124939
-0.236974	F2 actually throws an	-0.124939
-0.236974	set specified. Insert an	-0.124939
-0.236974	manipulated to fake an	-0.124939
-0.236974	hardware for raising an	-0.124939
-0.236974	function that detects an	-0.124939
-0.901791	or double to int	-0.124939
-0.358186	* 5; to int	-0.124939
-0.625155	4 bytes = int	-0.124939
-0.165024	= float or int	-0.124939
-0.864204	size of an int	-0.124939
-0.535752	fail if an int	-0.124939
-0.354408	wide, while an int	-0.124939
-0.593408	256 short int int	-0.124939
-0.851610	struct S1 { int	-0.124939
-0.328960	{ struct { int	-0.124939
-1.276108	* p) { int	-0.124939
-0.194543	& r) { int	-0.425969
-0.425846	struct Bitfield { int	-0.124939
-0.061874	int main() { int	-0.425969
-0.328960	long ReadTSC() { int	-0.124939
-0.134051	int Func2() { int	-0.124939
-0.134051	void Func2() { int	-0.124939
-0.601876	if (y) { int	-0.124939
-0.061874	double b[SIZE][SIZE]) { int	-0.425969
-0.328960	int x[]) { int	-0.124939
-0.549539	only first time int	-0.124939
-0.352591	(*CriticalFunction)(parm1, parm2); } int	-0.124939
-0.455589	return &CriticalFunction_386; } int	-0.124939
-0.352591	0, sizeof(a)); } int	-0.124939
-0.357267	to nearest integer int	-0.124939
-0.892870	supported instruction set int	-0.124939
-0.105864	using asmlib library int	-0.425969
-0.102332	// SSE2 version int	-0.425969
-0.610440	// AVX version int	-0.124939
-0.062486	// Lowest version int	-0.425969
-0.460334	32 2 2 int	-0.124939
-0.482736	Linux: unsigned long int	-0.124939
-0.342895	16-bit systems: long int	-0.124939
-0.342895	64-bit Linux: long int	-0.124939
-0.333568	However, the const int	-0.124939
-0.080867	c1 { const int	-0.124939
-0.080867	MathLoop() { const int	-0.124939
-0.180317	EMMS } const int	-0.124939
-0.286001	and static const int	-0.124939
-0.286001	factorials: static const int	-0.124939
-0.180317	Integer constant const int	-0.124939
-0.180317	int i; const int	-0.124939
-0.180317	Automatic vectorization const int	-0.124939
-0.180317	order(int x); const int	-0.124939
-0.180317	__attribute__((aligned(16))) #endif const int	-0.124939
-0.180317	Example 14.8 const int	-0.124939
-0.180317	example 16.1 const int	-0.124939
-0.180317	Example 14.30 const int	-0.124939
-0.180317	Example 9.5a const int	-0.124939
-0.180317	Example 11.3 const int	-0.124939
-0.180317	Example 9.4 const int	-0.124939
-0.180317	Example 7.17 const int	-0.124939
-0.038554	int Func(int); const int	-0.425969
-0.180317	Example 9.6a const int	-0.124939
-0.180317	Example 11.2b const int	-0.124939
-0.180317	of squares: const int	-0.124939
-0.180317	of factorials: const int	-0.124939
-0.180317	Example 14.5a const int	-0.124939
-0.180317	Example 11.2a const int	-0.124939
-0.180317	Example 7.33b const int	-0.124939
-0.180317	Example 7.33a const int	-0.124939
-0.180317	int FuncCol(int); const int	-0.124939
-0.180317	Example 14.4a const int	-0.124939
-0.537972	Vec8us 32 4 int	-0.124939
-0.983007	sum = 0; int	-0.124939
-0.551316	sum2 = 0; int	-0.124939
-0.551316	largest_abs = 0; int	-0.124939
-0.298260	i to unsigned int	-0.124939
-0.345228	i an unsigned int	-0.124939
-0.259578	2 int unsigned int	-0.124939
-0.054889	Sdouble { unsigned int	-0.124939
-0.054889	Slongdouble { unsigned int	-0.124939
-0.054889	Sfloat { unsigned int	-0.124939
-0.284979	32 4 unsigned int	-0.124939
-0.040503	fractional part unsigned int	-0.425969
-0.191232	{double d; unsigned int	-0.124939
-0.191232	(double x, unsigned int	-0.124939
-0.191232	float f; unsigned int	-0.124939
-0.371535	16-bit systems: unsigned int	-0.124939
-0.191232	and normal unsigned int	-0.124939
-0.191232	= 1000; unsigned int	-0.124939
-0.191232	Example 7.7 unsigned int	-0.124939
-0.191232	Example 7.25 unsigned int	-0.124939
-0.191232	part 142 unsigned int	-0.124939
-0.191232	u[2]} a[size]; unsigned int	-0.124939
-0.191232	<typename T, unsigned int	-0.124939
-0.191232	+ 0x3FF unsigned int	-0.124939
-0.191232	+ 0x3FFF unsigned int	-0.124939
-0.191232	Example 14.22b unsigned int	-0.124939
-0.191232	Example 14.22a unsigned int	-0.124939
-0.191232	65535 uint16_t unsigned int	-0.124939
-0.191232	+ 0x7F unsigned int	-0.124939
-0.536691	8 128 SSE2 int	-0.124939
-0.058940	other than short int	-0.124939
-0.058940	S1 { short int	-0.124939
-0.058940	int. A short int	-0.124939
-0.058940	by using short int	-0.124939
-0.058940	16 4 short int	-0.124939
-0.058940	16 8 short int	-0.124939
-0.018772	4 unsigned short int	-0.124939
-0.018772	8 unsigned short int	-0.124939
-0.018772	uint8_t unsigned short int	-0.124939
-0.058940	128 SSE2 short int	-0.124939
-0.058940	of type short int	-0.124939
-0.014002	int i; short int	-0.425969
-0.058940	unsigned 256 short int	-0.124939
-0.058940	unsigned char short int	-0.124939
-0.058940	256 AVX2 short int	-0.124939
-0.000681	int bb[], short int	-0.778151
-0.000681	int aa[], short int	-1.079181
-0.006168	Alignd ( short int	-0.124939
-0.058940	64 MMX short int	-0.124939
-0.058940	at 11 short int	-0.124939
-0.058940	Example 7.22 short int	-0.124939
-0.058940	127 int8_t short int	-0.124939
-0.535372	line doesn't work int	-0.124939
-1.046923	{ int i; int	-0.124939
-1.183157	SomeFunction (int a, int	-0.124939
-1.072270	and unsigned integers int	-0.124939
-0.458206	4 256 AVX int	-0.124939
-0.439037	+ b;} }; int	-0.124939
-0.339482	int UnusedFiller; }; int	-0.124939
-0.194282	a; int b; int	-0.124939
-0.621740	a; double b; int	-0.124939
-0.422389	<emmintrin.h> static inline int	-0.124939
-0.422389	14.19 static inline int	-0.124939
-0.422389	_mm_cvtss_si32(_mm_load_ss(&x));} static inline int	-0.124939
-0.353636	39916800, 479001600}; ... int	-0.124939
-0.455344	256 unsigned 256 int	-0.124939
-0.499228	b; int c; int	-0.124939
-0.438844	B2 { public: int	-0.124939
-0.438844	S2 { public: int	-0.124939
-0.438844	S3 { public: int	-0.124939
-0.061461	}; Bitfield x; int	-0.425969
-0.760836	size = 100; int	-0.124939
-0.393289	NUMCOLUMNS = 100; int	-0.124939
-0.515301	16 256 AVX2 int	-0.124939
-0.350697	compilers will reduce int	-0.124939
-0.641497	memory allocation are: int	-0.124939
-0.238762	{ int a; int	-0.124939
-0.238762	public: int a; int	-0.124939
-0.210959	14.2a float a; int	-0.124939
-0.210959	14.2b float a; int	-0.124939
-0.093671	abc {int a; int	-0.124939
-0.093671	Sab {int a; int	-0.124939
-0.519472	{ double d; int	-0.124939
-0.346727	Multiply (int x, int	-0.124939
-0.027729	{ float f; int	-0.970037
-0.437635	i; } u; int	-0.425969
-0.005744	int CriticalFunction_386(int parm1, int	-0.425969
-0.005744	int CriticalFunction_SSE2(int parm1, int	-0.425969
-0.005744	int CriticalFunction_AVX(int parm1, int	-0.425969
-0.023445	int CriticalFunctionType(int parm1, int	-0.124939
-0.023445	int CriticalFunction_Dispatch(int parm1, int	-0.124939
-0.341052	& operator[] (unsigned int	-0.124939
-0.278622	Explain volatile volatile int	-0.124939
-0.278622	int dummy[4]; volatile int	-0.124939
-0.438105	NumberOfTests = 10; int	-0.124939
-0.279526	short int a[100]; int	-0.124939
-0.240189	7.26b float a[100]; int	-0.124939
-0.240189	7.26a float a[100]; int	-0.124939
-0.338153	AVX version 127 int	-0.124939
-0.497012	4 64 MMX int	-0.124939
-0.952655	in 16-bit systems: int	-0.124939
-0.335450	size = 16; int	-0.124939
-0.335224	byte at 7 int	-0.124939
-0.727267	factorial = 1.0; int	-0.124939
-0.005102	} void SelectAddMul(short int	-0.124939
-0.005102	branch void SelectAddMul(short int	-0.124939
-0.005102	classes void SelectAddMul(short int	-0.124939
-0.005102	inline void SelectAddMul(short int	-0.124939
-0.005102	x);} void SelectAddMul(short int	-0.124939
-0.005102	vectorized: void SelectAddMul(short int	-0.124939
-0.604986	{ // n! int	-0.124939
-0.331883	size = 1000; int	-0.124939
-0.228624	ArraySize = 1000; int	-0.124939
-0.330617	code is __asm int	-0.124939
-0.011564	14.12b int list[300]; int	-0.124939
-0.011564	14.13b int list[300]; int	-0.124939
-0.011564	14.13a int list[300]; int	-0.124939
-0.011564	14.12a int list[300]; int	-0.124939
-0.324459	public: B2 b2; int	-0.124939
-0.020777	8; float matrix[rows][columns]; int	-0.124939
-0.020777	32; float matrix[rows][columns]; int	-0.124939
-0.020777	50; float matrix[rows][columns]; int	-0.124939
-0.313954	union { 89 int	-0.124939
-0.313523	b;}; S1 list[100]; int	-0.124939
-0.313954	// Example 14.10 int	-0.124939
-0.313954	// Example 14.11 int	-0.124939
-0.313954	// Example 8.7 int	-0.124939
-0.313954	// Example 7.21 int	-0.124939
-0.711852	int i, j; int	-0.124939
-0.136918	size = 1024; int	-0.425969
-0.065574	Func1 (int a[], int	-0.124939
-0.015489	void Func(int a[], int	-0.425969
-0.313523	desired parameters typedef int	-0.124939
-0.313954	// Example 7.23 int	-0.124939
-0.313954	// Example 7.20 int	-0.124939
-0.313954	// Example 7.19 int	-0.124939
-0.313954	// Example 7.18 int	-0.124939
-0.536840	{temp=x; x=y; y=temp;} int	-0.124939
-0.293061	// Example 14.12b int	-0.124939
-0.048229	coefficients double Table[100]; int	-0.124939
-0.048229	3.3; double Table[100]; int	-0.124939
-0.023445	a:4; int b:2; int	-0.425969
-0.536840	*p = string; int	-0.124939
-0.293061	// Example 14.13b int	-0.124939
-0.293061	100; S1 list[size]; int	-0.124939
-0.381420	abc * p; int	-0.124939
-0.381420	point extern "C" int	-0.124939
-0.023445	{ int a:4; int	-0.425969
-0.293061	7.14 class c1; int	-0.124939
-0.536840	x * m;} int	-0.124939
-0.293061	*p + 2;} int	-0.124939
-0.236893	// Example 14.3a int	-0.124939
-0.236893	// Example 14.3b int	-0.124939
-0.236893	100; int matrix[NUMROWS][NUMCOLUMNS]; int	-0.124939
-0.236893	// Example 8.9b int	-0.124939
-0.236893	// Example 8.9a int	-0.124939
-0.236893	// Example 14.1b int	-0.124939
-0.236893	// Example 14.1a int	-0.124939
-0.236893	byte at 403 int	-0.124939
-0.236893	max = 110; int	-0.124939
-0.236893	// Example 14.13c int	-0.124939
-0.236893	// Example 14.13a int	-0.124939
-0.236893	7.18 int FuncRow(int); int	-0.124939
-0.236893	// Example 8.13a int	-0.124939
-0.236893	// Example 8.13b int	-0.124939
-0.236893	// Example 9.1a int	-0.124939
-0.236893	// Example 9.1b int	-0.124939
-0.236893	// Example 8.11b int	-0.124939
-0.236893	// Example 8.11a int	-0.124939
-0.236893	// Example 7.42 int	-0.124939
-0.236893	b;}; Sab ab[size]; int	-0.124939
-0.236893	Dispatcher void SelectAddMul_dispatch(short int	-0.124939
-0.236893	1; } module2.cpp int	-0.124939
-0.236893	Use square blocking: int	-0.124939
-0.236893	template <int m> int	-0.124939
-0.236893	// Example 8.6a int	-0.124939
-0.236893	// Example 8.6b int	-0.124939
-0.236893	14.6 float list[16]; int	-0.124939
-0.236893	Example 8.20 module1.cpp int	-0.124939
-0.236893	// Example 7.30b int	-0.124939
-0.236893	// Example 7.30a int	-0.124939
-0.236893	14.13c int list[301]; int	-0.124939
-0.236893	template <bool IsPowerOf2, int	-0.124939
-0.236893	* __restrict aa, int	-0.124939
-0.236893	version void FUNCNAME(short int	-0.124939
-0.236893	by writing: __declspec(align(64)) int	-0.124939
-0.236893	// Example 8.12a int	-0.124939
-0.236893	// Example 8.12b int	-0.124939
-0.236893	typedef void FuncType(short int	-0.124939
-0.236893	// Example 14.12a int	-0.124939
-0.236893	// Example 8.14b int	-0.124939
-0.236893	// Example 8.14a int	-0.124939
-0.236893	p->a + p->b;} int	-0.124939
-0.236893	-32768 32767 int16_t int	-0.124939
-0.236893	d = 1.6; int	-0.124939
-0.236893	byte at 399 int	-0.124939
-0.236893	example 9.5a: 98 int	-0.124939
-0.078070	no more time than	-0.124939
-0.201901	takes more time than	-0.124939
-0.281525	take more time than	-0.124939
-0.078070	much more time than	-0.124939
-0.173319	consume more time than	-0.124939
-0.303131	takes longer time than	-0.124939
-0.177666	take longer time than	-0.124939
-0.051546	much longer time than	-0.301030
-0.856000	faster to use than	-0.124939
-0.856000	safer to use than	-0.124939
-0.808835	there is more than	-0.124939
-0.434140	accessed in more than	-0.124939
-0.464692	register for more than	-0.124939
-0.398743	increased by more than	-0.124939
-0.434140	functions have more than	-0.124939
-0.307144	run at more than	-0.124939
-0.470431	is no more than	-0.124939
-0.434140	or do more than	-0.124939
-0.317175	to take more than	-0.124939
-0.317175	can take more than	-0.124939
-0.447631	may take more than	-0.124939
-0.512003	often much more than	-0.124939
-0.452982	program uses more than	-0.124939
-0.058860	clock cycles more than	-0.124939
-0.307144	CPUs was more than	-0.124939
-0.307144	is actually more than	-0.124939
-0.307144	to load more than	-0.124939
-0.307144	can go more than	-0.124939
-0.307144	subexpression occurs more than	-0.124939
-0.307144	cannot prefetch more than	-0.124939
-0.307144	some programs, more than	-0.124939
-0.307144	actually implies more than	-0.124939
-0.357872	sample more data than	-0.124939
-1.444604	in the program than	-0.124939
-0.586061	load a program than	-0.124939
-0.565377	optimizing library functions than	-0.124939
-0.594844	to the CPU than	-0.124939
-0.347461	any size other than	-0.124939
-0.347461	a library other than	-0.124939
-0.347461	of sizes other than	-0.124939
-0.347461	class c1 other than	-0.124939
-0.893171	higher instruction set than	-0.124939
-1.214811	a + b than	-0.124939
-1.001944	a dynamic library than	-0.124939
-0.349471	is more efficient than	-0.182931
-0.407046	and more efficient than	-0.124939
-0.407046	are more efficient than	-0.124939
-0.407046	caching more efficient than	-0.124939
-0.120130	sometimes more efficient than	-0.124939
-0.455890	slightly more efficient than	-0.124939
-0.108200	is less efficient than	-0.124939
-0.278106	be less efficient than	-0.124939
-0.185685	input less efficient than	-0.124939
-0.041127	no other value than	-0.425969
-0.086559	any other value than	-0.124939
-0.872697	the previous value than	-0.124939
-0.356916	operands are variables than	-0.124939
-0.356868	goes another way than	-0.124939
-0.079130	function is faster than	-0.124939
-0.266110	This is faster than	-0.124939
-0.442202	It is faster than	-0.124939
-0.266110	point is faster than	-0.124939
-0.175963	2 is faster than	-0.124939
-0.175963	file is faster than	-0.124939
-0.380109	constant is faster than	-0.425969
-0.079130	implementation is faster than	-0.425969
-0.175963	blocks is faster than	-0.124939
-0.175963	15.1c is faster than	-0.124939
-0.175963	(This is faster than	-0.124939
-0.175963	Unsigned is faster than	-0.124939
-0.175963	14.21 is faster than	-0.124939
-0.178684	to be faster than	-0.124939
-0.269461	may be faster than	-0.124939
-0.076947	operations are faster than	-0.124939
-0.076947	arrays are faster than	-0.124939
-0.170526	and C++ faster than	-0.124939
-0.170526	83 called faster than	-0.124939
-0.340545	is often faster than	-0.124939
-0.142059	many times faster than	-0.124939
-0.142059	seven times faster than	-0.124939
-0.235439	calculated much faster than	-0.124939
-0.170526	are calculated faster than	-0.124939
-0.371340	will run faster than	-0.124939
-0.170526	code execute faster than	-0.124939
-0.170526	a little faster than	-0.124939
-0.170526	is increasing faster than	-0.124939
-0.170526	// ipow faster than	-0.124939
-0.317392	function is less than	-0.124939
-0.317392	class is less than	-0.124939
-0.317392	value is less than	-0.124939
-0.317392	delay is less than	-0.124939
-0.317392	μs is less than	-0.124939
-0.088774	to be less than	-0.124939
-0.200483	fact be less than	-0.124939
-0.264091	is not less than	-0.124939
-0.264091	run at less than	-0.124939
-0.264091	million times less than	-0.124939
-0.264091	files while less than	-0.124939
-0.264091	or write less than	-0.124939
-0.264091	relative difference less than	-0.124939
-0.003295	the code rather than	-0.124939
-0.003295	point code rather than	-0.124939
-0.001644	compile time rather than	-0.425969
-0.006614	full use rather than	-0.124939
-0.001644	in memory rather than	-0.425969
-0.006614	64-bit integer rather than	-0.124939
-0.006614	use float rather than	-0.124939
-0.006614	existing object rather than	-0.124939
-0.006614	one array rather than	-0.124939
-0.003295	by 8 rather than	-0.124939
-0.003295	constant 8 rather than	-0.124939
-0.006614	a register rather than	-0.124939
-0.006614	class template rather than	-0.124939
-0.000263	in registers rather than	-0.522879
-0.006614	using pointers rather than	-0.124939
-0.006614	cache access rather than	-0.124939
-0.006614	operating system rather than	-0.124939
-0.006614	64 bits rather than	-0.124939
-0.006614	get 0 rather than	-0.124939
-0.006614	six instructions rather than	-0.124939
-0.006614	present processors rather than	-0.124939
-0.006614	10 times rather than	-0.124939
-0.006614	the stack rather than	-0.124939
-0.006614	API calls rather than	-0.124939
-0.006614	the container rather than	-0.124939
-0.006614	flush-to-zero mode rather than	-0.124939
-0.006614	clock cycles rather than	-0.124939
-0.006614	with sets rather than	-0.124939
-0.006614	software implementation rather than	-0.124939
-0.006614	template parameter rather than	-0.124939
-0.006614	integer expressions rather than	-0.124939
-0.006614	static linking rather than	-0.124939
-0.006614	using references rather than	-0.124939
-0.006614	is loaded rather than	-0.124939
-0.006614	one operation rather than	-0.124939
-0.006614	result 100 rather than	-0.124939
-0.006614	processor models rather than	-0.124939
-0.006614	the beginning rather than	-0.124939
-0.006614	big blocks rather than	-0.124939
-0.006614	to memcpy rather than	-0.124939
-0.006614	each factor rather than	-0.124939
-0.006614	execution units rather than	-0.124939
-0.006614	single step rather than	-0.124939
-0.006614	in advance rather than	-0.124939
-0.006614	towards zero, rather than	-0.124939
-0.006614	of xxn rather than	-0.124939
-0.006614	and frameworks, rather than	-0.124939
-0.006614	result -56 rather than	-0.124939
-0.006614	calculated once, rather than	-0.124939
-0.006614	&& !b) rather than	-0.124939
-0.006614	electrical connections rather than	-0.124939
-0.006614	as (b*2.0)/3.0 rather than	-0.124939
-0.006614	using unions rather than	-0.124939
-0.006614	processor X?" rather than	-0.124939
-0.006614	that matters rather than	-0.124939
-0.006614	CPU supports, rather than	-0.124939
-0.006614	development tools, rather than	-0.124939
-0.006614	running at, rather than	-0.124939
-0.356278	on compiler optimization than	-0.124939
-0.356095	on such systems than	-0.124939
-0.355831	a separate file than	-0.124939
-0.355878	uses more bits than	-0.124939
-0.355896	and | operations than	-0.124939
-0.355694	cache control instructions than	-0.124939
-0.297166	is more important than	-0.124939
-0.297166	are more important than	-0.124939
-0.431163	become less important than	-0.124939
-0.481634	to one thread than	-0.124939
-0.502708	for each thread than	-0.124939
-0.355427	and computing power than	-0.124939
-0.077460	in 64-bit Linux than	-0.602060
-0.845517	or vector classes than	-0.124939
-0.553089	for single precision than	-0.124939
-0.562555	re-use a container than	-0.124939
-0.582325	faster to calculate than	-0.124939
-1.279985	in 64-bit mode than	-0.124939
-0.711252	64 bit mode than	-0.124939
-0.120603	have other values than	-0.425969
-0.055199	no other values than	-0.425969
-0.585351	more clock cycles than	-0.124939
-0.497327	allocate more space than	-0.124939
-0.497989	optimize anything else than	-0.124939
-0.353018	faster with signed than	-0.124939
-0.572057	big memory block than	-0.124939
-0.562769	down to zero than	-0.124939
-0.035463	use more resources than	-0.124939
-0.035463	take more resources than	-0.124939
-0.017369	much more resources than	-0.425969
-0.035463	slightly more resources than	-0.124939
-0.216134	more memory resources than	-0.124939
-0.409831	and other resources than	-0.124939
-0.216134	less computing resources than	-0.124939
-0.126875	set is better than	-0.124939
-0.126875	version is better than	-0.124939
-0.470126	actually be better than	-0.124939
-0.470136	on integer expressions than	-0.124939
-0.333727	reducing integer expressions than	-0.124939
-0.305022	that is longer than	-0.124939
-0.305022	should be longer than	-0.124939
-0.305022	the matrix longer than	-0.124939
-0.995794	the user interface than	-0.124939
-0.420747	are much higher than	-0.124939
-0.324876	is usually higher than	-0.124939
-0.046064	program is bigger than	-0.124939
-0.046064	matrix is bigger than	-0.124939
-0.046064	parameter is bigger than	-0.124939
-0.156087	may be bigger than	-0.124939
-0.156087	that are bigger than	-0.124939
-0.156087	treated as bigger than	-0.124939
-0.156087	innermost loop bigger than	-0.124939
-0.156087	for arrays bigger than	-0.124939
-0.156087	total offset bigger than	-0.124939
-0.156087	2. Objects bigger than	-0.124939
-0.351487	in other ways than	-0.124939
-0.452934	in other modules than	-0.124939
-0.350653	execution units smaller than	-0.124939
-0.786167	level-2 cache contentions than	-0.124939
-0.891279	an array index than	-0.124939
-0.450569	therefore more safe than	-0.124939
-0.348358	the best algorithm than	-0.124939
-0.488189	the same priority than	-0.124939
-0.346840	has higher priority than	-0.124939
-0.346840	with lower priority than	-0.124939
-0.562458	higher clock frequency than	-0.124939
-0.509818	cached more efficiently than	-0.124939
-0.125931	with more RAM than	-0.124939
-0.125931	allocate more RAM than	-0.124939
-0.880620	a circular buffer than	-0.124939
-0.628893	programmable logic device than	-0.124939
-0.482964	more memory blocks than	-0.124939
-0.342957	dynamic_cast more time-consuming than	-0.124939
-0.055964	for other purposes than	-0.124939
-0.441225	with coarse-grained parallelism than	-0.124939
-0.341105	error is lower than	-0.124939
-0.341105	data more random than	-0.124939
-0.193208	to be slower than	-0.124939
-0.065677	link libraries slower than	-0.124939
-0.065677	is much slower than	-0.124939
-0.065677	always run slower than	-0.124939
-0.065677	to execute slower than	-0.124939
-0.065677	faster nor slower than	-0.124939
-0.437935	is more expensive than	-0.124939
-0.476828	often more reliable than	-0.124939
-0.109161	is more predictable than	-0.124939
-0.109161	are more predictable than	-0.124939
-0.506200	and more compact than	-0.124939
-0.335902	in binary form than	-0.124939
-0.331061	that is larger than	-0.124939
-0.324787	often easier said than	-0.124939
-0.324787	be a bottleneck than	-0.124939
-0.420359	function is simpler than	-0.124939
-0.314239	find elsewhere. Faster than	-0.124939
-0.313950	do other input/output than	-0.124939
-0.381919	larger memory footprint than	-0.124939
-0.293468	been accessed recently than	-0.124939
-0.293468	maintain and verify than	-0.124939
-0.293468	higher for shared_ptr than	-0.124939
-0.237251	array element. Rather than	-0.124939
-0.237251	polygon or bitmap than	-0.124939
-0.237251	to write 2.0/3.0 than	-0.124939
-0.237251	class objects (rather than	-0.124939
-0.237251	function call (other than	-0.124939
-0.237251	small loops (less than	-0.124939
-0.237251	objects (memory pooling) than	-0.124939
-1.346004	some of the compiler	-0.124939
-0.583479	help of the compiler	-0.124939
-0.580858	known to the compiler	-0.124939
-0.562462	hint and the compiler	-0.124939
-0.562462	www.openmp.org and the compiler	-0.124939
-0.854129	listed in the compiler	-0.124939
-0.198968	Optimizations in the compiler	-0.124939
-0.346166	possible for the compiler	-0.425969
-0.455034	difficult for the compiler	-0.425969
-0.521483	options for the compiler	-0.124939
-0.346166	easier for the compiler	-0.425969
-1.032918	is that the compiler	-0.124939
-0.522236	code that the compiler	-0.124939
-1.165572	so that the compiler	-0.124939
-1.135659	sure that the compiler	-0.124939
-0.522236	name that the compiler	-0.124939
-0.998076	assume that the compiler	-0.124939
-0.522236	illogical that the compiler	-0.124939
-0.847431	check if the compiler	-0.124939
-0.573339	here if the compiler	-0.124939
-0.528873	implemented by the compiler	-0.124939
-0.528873	converted by the compiler	-0.124939
-0.459776	generated by the compiler	-0.124939
-0.482324	rely on the compiler	-0.124939
-0.815998	good as the compiler	-0.124939
-0.521410	index then the compiler	-0.124939
-0.521410	module then the compiler	-0.124939
-0.521410	once then the compiler	-0.124939
-0.573176	warning from the compiler	-0.124939
-0.577531	Looking at the compiler	-0.124939
-1.575262	to make the compiler	-0.124939
-0.741913	function because the compiler	-0.124939
-0.741913	efficient because the compiler	-0.124939
-0.514037	possible because the compiler	-0.124939
-0.741913	inefficient because the compiler	-0.124939
-0.535744	systems. If the compiler	-0.124939
-0.535744	efficient. If the compiler	-0.124939
-0.535744	maintain. If the compiler	-0.124939
-0.397588	threads, but the compiler	-0.124939
-0.397588	BSD, but the compiler	-0.124939
-0.397588	103), but the compiler	-0.124939
-0.397588	aliasing, but the compiler	-0.124939
-1.144859	cases where the compiler	-0.124939
-0.766604	situations where the compiler	-0.124939
-0.502015	2, so the compiler	-0.124939
-0.564467	simply makes the compiler	-0.124939
-0.571547	comes before the compiler	-0.124939
-0.446043	platforms. See the compiler	-0.124939
-0.446043	obvious. See the compiler	-0.124939
-0.763589	For example, the compiler	-0.124939
-0.714031	above example, the compiler	-0.124939
-1.192032	make sure the compiler	-0.124939
-0.687423	some cases the compiler	-0.124939
-0.524127	5. But the compiler	-0.124939
-0.549715	predict whether the compiler	-0.124939
-0.625914	how well the compiler	-0.124939
-0.375556	many cases, the compiler	-0.124939
-0.375556	simple cases, the compiler	-0.124939
-0.265060	it allows the compiler	-0.124939
-0.112615	This allows the compiler	-0.425969
-0.223129	know what the compiler	-0.124939
-0.097390	Checking what the compiler	-0.425969
-0.776974	to give the compiler	-0.124939
-0.398742	List[i]++; Here, the compiler	-0.124939
-0.398742	c1::*MemberPointer; Here, the compiler	-0.124939
-0.341618	simple function, the compiler	-0.124939
-0.534440	calls. Unfortunately, the compiler	-0.124939
-0.240766	can prevent the compiler	-0.124939
-0.240766	will prevent the compiler	-0.124939
-0.312366	it prevents the compiler	-0.124939
-0.306661	This prevents the compiler	-0.425969
-0.213165	also prevents the compiler	-0.124939
-0.213165	division prevents the compiler	-0.124939
-0.003298	to tell the compiler	-0.425969
-0.027109	then tell the compiler	-0.124939
-0.302127	may enable the compiler	-0.124939
-0.302127	will enable the compiler	-0.124939
-0.302127	sets enable the compiler	-0.124939
-0.341618	will allow the compiler	-0.124939
-0.480975	cannot expect the compiler	-0.124939
-0.341618	reason why the compiler	-0.124939
-0.480975	can help the compiler	-0.124939
-0.341618	sets. Likewise, the compiler	-0.124939
-0.341618	or reference, the compiler	-0.124939
-0.341618	in list, the compiler	-0.124939
-0.341618	other hand, the compiler	-0.124939
-0.480975	to specify the compiler	-0.124939
-0.502015	This tells the compiler	-0.124939
-0.063563	This enables the compiler	-0.425969
-0.341618	Without optimization, the compiler	-0.124939
-0.625914	In fact, the compiler	-0.124939
-0.341618	code. Sometimes the compiler	-0.124939
-0.341618	78). Adding the compiler	-0.124939
-0.441725	by invoking the compiler	-0.124939
-0.441725	operator forces the compiler	-0.124939
-0.341618	example 12.1a, the compiler	-0.124939
-0.341618	exist. Therefore the compiler	-0.124939
-0.341618	example 12.1b, the compiler	-0.124939
-0.341618	= ++b; the compiler	-0.124939
-0.596155	output of a compiler	-0.124939
-0.586314	code that a compiler	-0.124939
-1.613410	to use a compiler	-0.124939
-1.336728	by using a compiler	-0.124939
-1.367963	For example, a compiler	-0.124939
-0.537649	1. Use a compiler	-0.124939
-0.558875	(4) get a compiler	-0.124939
-0.522936	This requires a compiler	-0.124939
-0.240741	cannot expect a compiler	-0.425969
-1.154771	The choice of compiler	-0.124939
-0.181194	2.5 Choice of compiler	-0.124939
-0.065617	18 Overview of compiler	-0.124939
-0.504620	assembly programmers and compiler	-0.124939
-0.541164	license included in compiler	-0.124939
-0.586705	} } The compiler	-0.124939
-0.399151	1; } The compiler	-0.124939
-0.399151	3; } The compiler	-0.124939
-0.399151	1.; } The compiler	-0.124939
-0.775184	library functions. The compiler	-0.124939
-0.723004	is called. The compiler	-0.124939
-0.333264	across modules The compiler	-0.124939
-0.333264	Function inlining The compiler	-0.124939
-0.702004	by 2. The compiler	-0.124939
-0.469503	anonymous object. The compiler	-0.124939
-0.489981	induction variable. The compiler	-0.124939
-0.469503	each access. The compiler	-0.124939
-0.333264	+ f; The compiler	-0.124939
-0.469503	another problem. The compiler	-0.124939
-0.609979	is enabled. The compiler	-0.124939
-0.333264	is executed. The compiler	-0.124939
-0.333264	up-to-date solution. The compiler	-0.124939
-0.333264	and Linux. The compiler	-0.124939
-0.431232	an integer. The compiler	-0.124939
-0.669115	automatic vectorization. The compiler	-0.124939
-0.669115	multiple threads. The compiler	-0.124939
-0.431232	used here. The compiler	-0.124939
-0.431232	in one. The compiler	-0.124939
-0.333264	or division. The compiler	-0.124939
-0.431232	necessary initialization. The compiler	-0.124939
-0.333264	+ 1.0f; The compiler	-0.124939
-0.669115	another module. The compiler	-0.124939
-0.333264	is big. The compiler	-0.124939
-0.609979	+ 1.0f;} The compiler	-0.124939
-0.333264	in x. The compiler	-0.124939
-0.333264	/ 4; The compiler	-0.124939
-0.333264	= i+1; The compiler	-0.124939
-0.333264	methods: Instrumentation: The compiler	-0.124939
-0.333264	/ 1.2345); The compiler	-0.124939
-0.333264	e.g. /arch:SSE2. The compiler	-0.124939
-0.333264	page 84). The compiler	-0.124939
-0.333264	of temp. The compiler	-0.124939
-0.333264	Plus2 (&a); The compiler	-0.124939
-0.333264	return a+1;. The compiler	-0.124939
-0.333264	page 72). The compiler	-0.124939
-0.333264	/ 3.0; The compiler	-0.124939
-0.358703	I guess, that compiler	-0.124939
-0.342722	to optimization by compiler	-0.124939
-0.854324	to rely on compiler	-0.124939
-0.356996	Wikipedia article on compiler	-0.124939
-0.460931	very well. This compiler	-0.124939
-0.356800	(parallel composer) This compiler	-0.124939
-0.347862	is possible. A compiler	-0.124939
-0.347862	same result. A compiler	-0.124939
-0.347862	avoided. 37 A compiler	-0.124939
-0.347862	for Basic. A compiler	-0.124939
-0.347862	polynomial. Scheduling A compiler	-0.124939
-0.866562	with a different compiler	-0.124939
-1.069548	using the same compiler	-0.124939
-0.956589	to predict which compiler	-0.124939
-0.503323	expressions, but no compiler	-0.124939
-0.549171	belong to each compiler	-0.124939
-0.119065	of the Intel compiler	-0.124939
-0.190154	and the Intel compiler	-0.124939
-0.084752	in the Intel compiler	-0.124939
-0.319883	that the Intel compiler	-0.124939
-0.190154	on the Intel compiler	-0.124939
-0.283641	use the Intel compiler	-0.124939
-0.190154	If the Intel compiler	-0.124939
-0.190154	tests, the Intel compiler	-0.124939
-0.121230	dispatching in Intel compiler	-0.124939
-0.079808	} The Intel compiler	-0.124939
-0.079808	Intel The Intel compiler	-0.124939
-0.079808	systems. The Intel compiler	-0.124939
-0.079808	not. The Intel compiler	-0.124939
-0.079808	this. The Intel compiler	-0.124939
-0.079808	required. The Intel compiler	-0.124939
-0.079808	122. The Intel compiler	-0.124939
-0.079808	details). The Intel compiler	-0.124939
-0.411322	tests on Intel compiler	-0.124939
-0.460064	Intel compiler Intel compiler	-0.124939
-0.049947	compiler Windows Intel compiler	-0.425969
-0.021493	compiler Linux Intel compiler	-0.301030
-0.990018	of the C++ compiler	-0.124939
-0.531317	compilers. Intel C++ compiler	-0.124939
-0.433954	The Gnu C++ compiler	-0.124939
-0.433954	platforms. PathScale C++ compiler	-0.124939
-0.433954	tolerated. PGI C++ compiler	-0.124939
-0.586800	is a new compiler	-0.124939
-0.588707	precision. The following compiler	-0.124939
-0.064053	to the Gnu compiler	-0.124939
-0.064053	and the Gnu compiler	-0.124939
-0.221309	with the Gnu compiler	-0.124939
-0.139215	than the Gnu compiler	-0.124939
-0.139215	only the Gnu compiler	-0.124939
-0.139215	replace the Gnu compiler	-0.124939
-0.139215	Here, the Gnu compiler	-0.124939
-0.015972	dispatching in Gnu compiler	-0.124939
-0.100691	CPUs. The Gnu compiler	-0.124939
-0.100691	calls. The Gnu compiler	-0.124939
-0.100691	version. The Gnu compiler	-0.124939
-0.100691	vectorization. The Gnu compiler	-0.124939
-0.100691	3. The Gnu compiler	-0.124939
-0.100691	Mac. The Gnu compiler	-0.124939
-0.015972	compiler Windows Gnu compiler	-0.602060
-0.459418	for a Windows compiler	-0.124939
-0.580836	with the best compiler	-0.124939
-0.533218	because a good compiler	-0.124939
-0.250346	} A good compiler	-0.124939
-0.250346	operation. A good compiler	-0.124939
-0.250346	index. A good compiler	-0.124939
-0.035681	that an optimizing compiler	-0.124939
-0.035681	then an optimizing compiler	-0.124939
-0.035681	because an optimizing compiler	-0.124939
-0.035681	But an optimizing compiler	-0.124939
-0.035681	cases, an optimizing compiler	-0.124939
-0.045079	program. An optimizing compiler	-0.124939
-0.045079	used. An optimizing compiler	-0.124939
-0.045079	Devirtualization An optimizing compiler	-0.124939
-0.045079	pointers). An optimizing compiler	-0.124939
-0.217792	A good optimizing compiler	-0.124939
-0.581107	expect a particular compiler	-0.124939
-0.353429	and maintain. Most compiler	-0.124939
-0.376515	} The Microsoft compiler	-0.124939
-0.289056	Intel or Microsoft compiler	-0.124939
-0.289056	// If Microsoft compiler	-0.124939
-0.289056	explanation. (The Microsoft compiler	-0.124939
-0.352944	Another open source compiler	-0.124939
-0.352059	of code. Each compiler	-0.124939
-0.454639	manual for your compiler	-0.124939
-0.351051	3. Use appropriate compiler	-0.124939
-0.346237	Intel or PathScale compiler	-0.124939
-0.483344	where the chosen compiler	-0.124939
-0.341664	hand, a just-in-time compiler	-0.124939
-0.113983	platforms. The Clang compiler	-0.124939
-0.113983	Clang The Clang compiler	-0.124939
-0.477505	Combining the Borland compiler	-0.124939
-0.335864	keyword. The CodeGear compiler	-0.124939
-0.500107	the Digital Mars compiler	-0.124939
-0.331475	assembly language programming, compiler	-0.124939
-0.331572	way. The Codeplay compiler	-0.124939
-0.160626	_WIN32 n.a. MS compiler	-0.124939
-0.034935	to optimization MS compiler	-0.425969
-0.324973	VectorC A commercial compiler	-0.124939
-0.325093	compilers. (The PGI compiler	-0.124939
-0.293848	a stand alone compiler	-0.124939
-0.293848	is a cheap compiler	-0.124939
-0.237584	very user friendly compiler	-0.124939
-0.237584	when a genuine compiler	-0.124939
-1.578520	a = a x	-0.124939
-0.874633	The address of x	-0.124939
-0.691233	sign bit of x	-0.124939
-0.656954	the reading of x	-0.124939
-1.081886	the availability of x	-0.124939
-0.504952	add 2 to x	-0.124939
-0.358682	Function template for x	-0.124939
-0.595450	result is that x	-0.124939
-0.594790	z; a = x	-0.124939
-0.142628	double x2 = x	-0.124939
-0.142628	float x2 = x	-0.124939
-0.550365	don't depend on x	-0.124939
-0.680610	- - - x	-0.793946
-0.456850	x - - x	-0.903090
-0.391245	-(-a)=a - - x	-0.124939
-0.216123	a x - x	-0.124939
-0.530128	- x - x	-0.726999
-0.200189	x x - x	-1.204120
-0.216123	---x----- x - x	-0.124939
-0.216123	x-xxx---x x - x	-0.124939
-0.695106	- n.a. - x	-0.602060
-0.460638	5; to int x	-0.124939
-0.356569	will reduce int x	-0.124939
-0.872067	more efficient than x	-0.425969
-1.651746	is faster than x	-0.124939
-0.368929	- - x x	-0.689210
-0.127194	x - x x	-1.238882
-0.172347	- x x x	-1.028029
-0.118412	x x x x	-1.436693
-0.351039	n.a. x x x	-0.124939
-0.030984	x- x x x	-0.124939
-0.100494	74 x x x	-0.124939
-0.341098	- n.a. x x	-0.425969
-0.385243	x n.a. x x	-0.124939
-0.085826	x x- x x	-0.602060
-0.286634	x (x) x x	-0.124939
-0.133357	x 74 x x	-0.124939
-1.085893	a - n.a. x	-0.124939
-0.648805	n.a. - n.a. x	-0.425969
-1.130899	0 - n.a. x	-0.124939
-0.647929	n.a. x n.a. x	-0.124939
-0.334970	a*b=b*a x n.a. x	-0.124939
-1.172876	- n.a. n.a. x	-0.602060
-0.588302	for the object x	-0.124939
-0.384920	return x * x	-0.425969
-0.339852	return (2.5f * x	-0.124939
-0.339852	- 8.0f) * x	-0.124939
-0.594590	x) { return x	-0.602060
-0.444631	m) { return x	-0.124939
-0.357040	are there between x	-0.124939
-0.198188	(x = 0; x	-0.425969
-0.562572	used. For example, x	-0.124939
-0.562572	efficiency. For example, x	-0.124939
-0.843008	order to access x	-0.124939
-0.356124	the former case x	-0.124939
-0.103907	exp(x) for small x	-0.425969
-0.569492	template to get x	-0.124939
-0.576971	compiler to store x	-0.124939
-0.530827	y *= x; x	-0.124939
-0.347217	(int)n - 2, x	-0.124939
-0.347217	multiply // square x	-0.124939
-0.632324	function can modify x	-0.124939
-0.504734	f, x, y; x	-0.124939
-0.687606	-1 = -1 x	-0.124939
-0.005973	x x x- x	-0.602060
-0.018172	xx x x- x	-0.124939
-0.077748	= x- x- x	-0.124939
-0.331713	Example 15.1a. Calculate x	-0.124939
-0.665398	Common subexpression elimination x	-0.124939
-0.490133	(x = 2.0; x	-0.124939
-0.015526	x-- x x-- x	-0.124939
-0.065742	algebra reductions: x-- x	-0.124939
-0.314416	vectorization Devirtualization ---x----- x	-0.124939
-0.143302	x x (x) x	-0.124939
-0.143302	x- x (x) x	-0.124939
-0.293913	--xx----- x-xxx---x x-xxx---x x	-0.124939
-0.293913	- x 74 x	-0.124939
-0.293913	reductions: a+b=b+a, a*b=b*a x	-0.124939
-0.237641	-- - xx x	-0.124939
-0.237641	// constructor initializes x	-0.124939
-0.357876	quite powerful and may	-0.124939
-0.357876	is ambiguous and may	-0.124939
-0.389017	initialized variables that may	-0.124939
-0.389017	uninitialized variables that may	-0.124939
-0.551459	few instructions that may	-0.124939
-0.458599	other advantages that may	-0.124939
-0.354964	other cleanup that may	-0.124939
-0.354964	condition. Things that may	-0.124939
-0.483403	however, and it may	-0.124939
-0.483403	fluctuating and it may	-0.124939
-1.303104	is that it may	-0.124939
-1.375655	so that it may	-0.124939
-0.533900	kludgy that it may	-0.124939
-0.386926	functions then it may	-0.124939
-0.386926	necessary then it may	-0.124939
-0.386926	problem then it may	-0.124939
-0.386926	resource then it may	-0.124939
-0.386926	not, then it may	-0.124939
-0.386926	below) then it may	-0.124939
-0.386926	today, then it may	-0.124939
-0.386926	identified, then it may	-0.124939
-0.386926	obvious, then it may	-0.124939
-0.182526	consuming because it may	-0.124939
-0.503452	non-sequentially because it may	-0.124939
-0.434713	cases, but it may	-0.124939
-0.615244	function, but it may	-0.124939
-0.434713	set, but it may	-0.124939
-0.434713	job, but it may	-0.124939
-0.061803	For example, it may	-0.124939
-0.328434	best optimization it may	-0.124939
-0.750171	this case it may	-0.124939
-0.513350	parameter. But it may	-0.124939
-0.328434	large arrays, it may	-0.124939
-0.328434	allows it, it may	-0.124939
-0.566115	function. The function may	-0.124939
-0.948806	the same function may	-0.124939
-1.149078	the critical function may	-0.124939
-0.501993	the error code may	-0.124939
-0.721750	the compiled code may	-0.124939
-0.515664	RAM memory. This may	-0.124939
-0.341965	back again. This may	-0.124939
-0.481453	different modules. This may	-0.124939
-0.341965	marketing reasons. This may	-0.124939
-0.442162	function inline. This may	-0.124939
-0.341965	a; 72 This may	-0.124939
-0.341965	of range. This may	-0.124939
-0.341965	becomes full. This may	-0.124939
-0.341965	two entries. This may	-0.124939
-0.341965	be reduced. This may	-0.124939
-0.341965	page 45. This may	-0.124939
-0.741829	and the compiler may	-0.124939
-0.855306	then the compiler may	-0.124939
-0.925530	but the compiler may	-0.124939
-0.741829	cases, the compiler may	-0.124939
-0.513987	reference, the compiler may	-0.124939
-0.513987	hand, the compiler may	-0.124939
-0.513987	fact, the compiler may	-0.124939
-0.518434	example, a compiler may	-0.124939
-0.480933	choice of compiler may	-0.124939
-0.132834	} The compiler may	-0.124939
-0.331408	access. The compiler may	-0.124939
-0.331408	f; The compiler may	-0.124939
-0.331408	initialization. The compiler may	-0.124939
-0.331408	1.0f; The compiler may	-0.124939
-0.331408	1.0f;} The compiler may	-0.124939
-0.331408	4; The compiler may	-0.124939
-0.331408	i+1; The compiler may	-0.124939
-0.331408	(&a); The compiler may	-0.124939
-0.480933	Scheduling A compiler may	-0.124939
-1.018287	An optimizing compiler may	-0.124939
-0.541077	features, and you may	-0.124939
-0.302778	purpose, or you may	-0.124939
-0.287389	are then you may	-0.124939
-0.407875	code then you may	-0.124939
-0.287389	program then you may	-0.124939
-0.287389	pointer then you may	-0.124939
-0.287389	works then you may	-0.124939
-0.287389	structure then you may	-0.124939
-0.287389	... then you may	-0.124939
-0.287389	code, then you may	-0.124939
-0.287389	instance then you may	-0.124939
-0.287389	changes then you may	-0.124939
-0.287389	efficiency, then you may	-0.124939
-0.287389	y?" then you may	-0.124939
-0.446886	automatically but you may	-0.124939
-0.651711	For example, you may	-0.124939
-0.393359	some cases you may	-0.124939
-0.393359	in Windows, you may	-0.124939
-0.302778	to code, you may	-0.124939
-0.393359	64-bit systems, you may	-0.124939
-0.302778	composite object, you may	-0.124939
-0.118218	size. Alternatively, you may	-0.124939
-0.118218	stack. Alternatively, you may	-0.124939
-0.118218	returns. Alternatively, you may	-0.124939
-0.118218	Windows). Alternatively, you may	-0.124939
-0.302778	latter case, you may	-0.124939
-0.302778	executed. Furthermore, you may	-0.124939
-0.302778	In fact, you may	-0.124939
-0.302778	Even better, you may	-0.124939
-0.302778	jeopardizing safety, you may	-0.124939
-0.554086	function because this may	-0.124939
-0.355681	Mac systems, this may	-0.124939
-0.526022	The extra time may	-0.124939
-0.313669	clearing arrays It may	-0.124939
-0.949634	a pointer. It may	-0.124939
-0.313669	vector registers. It may	-0.124939
-0.313669	clock cycles. It may	-0.124939
-0.313669	one vector. It may	-0.124939
-0.313669	the branch. It may	-0.124939
-0.406815	disk space. It may	-0.124939
-0.313669	legacy software. It may	-0.124939
-0.313669	true anyway. It may	-0.124939
-0.406815	time here. It may	-0.124939
-0.313669	previous one. It may	-0.124939
-0.313669	approximately so. It may	-0.124939
-0.313669	poorly predictable. It may	-0.124939
-0.313669	also safer. It may	-0.124939
-0.313669	the profile. It may	-0.124939
-0.313669	contained objects? It may	-0.124939
-0.313669	too high. It may	-0.124939
-0.313669	are unavoidable. It may	-0.124939
-0.778342	The allocated memory may	-0.124939
-0.521479	of RAM memory may	-0.124939
-1.116331	of the program may	-0.124939
-0.579047	redesigning a program may	-0.124939
-0.564815	true. The program may	-0.124939
-0.357771	of inlined functions may	-0.124939
-0.594780	then the CPU may	-0.124939
-0.346989	Pentium CPUs which may	-0.124939
-0.346989	unpredictable intervals which may	-0.124939
-0.346989	is moved, which may	-0.124939
-0.346989	intermediate results, which may	-0.124939
-0.357705	of security, but may	-0.124939
-1.021109	the level-1 cache may	-0.124939
-0.502923	operations An integer may	-0.124939
-1.064234	CISC instruction set may	-0.124939
-0.937392	the above example may	-0.124939
-0.827890	that the compilers may	-0.124939
-0.450876	though future compilers may	-0.124939
-0.450876	that current compilers may	-0.124939
-1.106889	cache line size may	-0.124939
-0.785701	A smart pointer may	-0.124939
-0.830626	user interface library may	-0.124939
-0.564812	used, then there may	-0.124939
-0.480579	problematic because there may	-0.124939
-0.341331	parameter, so there may	-0.124939
-0.530508	critical. However, there may	-0.124939
-0.328849	Model-specific dispatching There may	-0.124939
-0.328849	end user. There may	-0.124939
-0.328849	much faster. There may	-0.124939
-0.328849	2 Mbytes. There may	-0.124939
-0.328849	to 36. There may	-0.124939
-0.328849	using inheritance. There may	-0.124939
-0.461209	An allocated array may	-0.124939
-0.547382	cycles that we may	-0.124939
-0.430197	the case we may	-0.124939
-0.332438	the future we may	-0.124939
-0.332438	be available, we may	-0.124939
-0.332438	of algebra, we may	-0.124939
-0.460988	it. Global variables may	-0.124939
-0.179415	error. // You may	-0.124939
-0.245779	a time. You may	-0.124939
-0.179415	memory used. You may	-0.124939
-0.245779	less efficient. You may	-0.124939
-0.179415	guidelines below. You may	-0.124939
-0.179415	is called. You may	-0.124939
-0.179415	'this' pointer. You may	-0.124939
-0.245779	vector operations. You may	-0.124939
-0.179415	not needed. You may	-0.124939
-0.179415	base classes. You may	-0.124939
-0.179415	double precision. You may	-0.124939
-0.179415	graceful way. You may	-0.124939
-0.179415	per vector. You may	-0.124939
-0.179415	to zero. You may	-0.124939
-0.179415	needed anyway. You may	-0.124939
-0.179415	the application. You may	-0.124939
-0.038391	CPU cores. You may	-0.124939
-0.179415	other modules. You may	-0.124939
-0.179415	program itself. You may	-0.124939
-0.179415	not expensive. You may	-0.124939
-0.179415	a debugger. You may	-0.124939
-0.179415	page 52. You may	-0.124939
-0.179415	in question. You may	-0.124939
-0.179415	processor model. You may	-0.124939
-0.179415	mutually incompatible. You may	-0.124939
-0.179415	available today. You may	-0.124939
-0.179415	level 108 You may	-0.124939
-0.179415	or -fno-strict-overflow. You may	-0.124939
-0.179415	cache contention. You may	-0.124939
-0.179415	(rarely 64). You may	-0.124939
-0.179415	not used). You may	-0.124939
-0.179415	bad dilemma. You may	-0.124939
-0.349864	in this table may	-0.124939
-0.349864	hand- written table may	-0.124939
-0.769939	casting of pointers may	-0.124939
-0.347032	typically use pointers may	-0.124939
-0.356000	but other systems may	-0.124939
-0.501110	starts. The user may	-0.124939
-0.356322	needed, or they may	-0.124939
-0.575731	MFC). This method may	-0.124939
-0.346285	short vector method may	-0.124939
-0.500839	The network access may	-0.124939
-0.721782	that the system may	-0.124939
-0.108324	The operating system may	-0.124939
-0.355613	a few times may	-0.124939
-0.553586	of 64-bit Windows may	-0.124939
-0.564090	many function calls may	-0.124939
-0.343252	Functions Function calls may	-0.124939
-0.355188	error. The calculations may	-0.124939
-0.459135	in program execution may	-0.124939
-0.134627	The virtual processor may	-0.124939
-0.134627	A virtual processor may	-0.124939
-0.330769	Pentium M processor may	-0.124939
-0.354889	in www.agner.org/optimize/cppexamples.zip. These may	-0.124939
-0.521465	contrary, each thread may	-0.124939
-0.354761	pattern history, etc. may	-0.124939
-0.354814	motion A calculation may	-0.124939
-0.926983	most efficient solution may	-0.124939
-0.534565	because the container may	-0.124939
-0.570319	low repeat count may	-0.124939
-1.406280	Dynamic memory allocation may	-0.124939
-0.353060	optimal. The branches may	-0.124939
-0.518549	addition and multiplication may	-0.124939
-0.353089	A C++ implementation may	-0.124939
-0.353003	and class members may	-0.124939
-0.455696	The following methods may	-0.124939
-1.545901	pointer or reference may	-0.124939
-0.352369	example, a programmer may	-0.124939
-0.352609	be annoying. We may	-0.124939
-0.646562	stack unwinding mechanism may	-0.124939
-0.747702	floating point expressions may	-0.124939
-0.327498	but such expressions may	-0.124939
-0.494551	the runtime framework may	-0.124939
-0.644002	log on process may	-0.124939
-0.321136	A simple constructor may	-0.124939
-0.730560	A copy constructor may	-0.124939
-0.350355	up. Some modules may	-0.124939
-0.350722	advice given here may	-0.124939
-0.492453	the data section may	-0.124939
-0.492053	x The syntax may	-0.124939
-0.451748	then the profiler may	-0.124939
-0.451144	in a network may	-0.124939
-0.821691	the clock frequency may	-0.124939
-0.696752	The clock frequency may	-0.124939
-0.345522	time consuming updates may	-0.124939
-0.344412	const int declaration may	-0.124939
-0.693886	A hash map may	-0.124939
-0.342939	reads and writes may	-0.124939
-0.490121	The program logic may	-0.124939
-0.342939	calls. The usability may	-0.124939
-0.529449	long dependency chain may	-0.124939
-0.338911	in memory. They may	-0.124939
-0.497662	This garbage collection may	-0.124939
-0.338604	have. The developers may	-0.124939
-0.437934	The time measurements may	-0.124939
-0.483129	and just-in-time compilation may	-0.124939
-0.496689	Critical device drivers may	-0.124939
-0.331120	template parameter. Templates may	-0.124939
-0.324416	that similar solutions may	-0.124939
-0.324416	function returns. alloca may	-0.124939
-0.324416	under this unit-test may	-0.124939
-0.675070	A binary tree may	-0.124939
-0.313804	of the advices may	-0.124939
-0.313804	only once. One may	-0.124939
-0.313804	7.25 Bitfields Bitfields may	-0.124939
-0.314142	b * 2.5 may	-0.124939
-0.406983	not computationally intensive may	-0.124939
-0.293330	exit. Calling exit may	-0.124939
-0.293330	{ // Overflow may	-0.124939
-0.381748	A test setup may	-0.124939
-0.237129	to the tolerance may	-0.124939
-0.237129	and USB sticks may	-0.124939
-0.237129	A software developer may	-0.124939
-0.237129	the first dimension may	-0.124939
-0.237129	OR operator (^) may	-0.124939
-0.352933	points to and you	-0.124939
-0.553439	critical function and you	-0.124939
-0.580483	point code and you	-0.124939
-0.564376	different functions and you	-0.124939
-0.712921	function library and you	-0.124939
-0.456022	less efficient and you	-0.124939
-0.518482	non-sequential access and you	-0.124939
-0.352933	exception handling and you	-0.124939
-0.456022	optimizing features, and you	-0.124939
-0.352933	fast anyway and you	-0.124939
-0.352933	is odd and you	-0.124939
-0.352933	& operator; and you	-0.124939
-0.352933	always sequential, and you	-0.124939
-0.588505	containers is that you	-0.124939
-1.306622	the code that you	-0.124939
-1.101330	the compiler that you	-0.124939
-0.445312	slow instruction that you	-0.124939
-0.631408	instruction set that you	-0.124939
-0.819578	code so that you	-0.124939
-0.558299	call so that you	-0.124939
-0.445312	of arrays that you	-0.124939
-0.344464	any processor that you	-0.124939
-0.304903	This requires that you	-0.124939
-0.304903	method requires that you	-0.124939
-0.344464	optimization options that you	-0.124939
-0.528051	speed. Assume that you	-0.124939
-0.445312	Another thing that you	-0.124939
-0.344464	of statements that you	-0.124939
-0.138940	clock counts that you	-0.124939
-0.138940	case" counts that you	-0.124939
-0.344464	of course, that you	-0.124939
-0.506139	then think that you	-0.124939
-0.344464	is unrealistic that you	-0.124939
-0.344464	let's say that you	-0.124939
-0.358602	set. Neither can you	-0.124939
-0.358235	this purpose, or you	-0.124939
-0.416199	thing and if you	-0.124939
-0.674369	is that if you	-0.124939
-0.416199	same as if you	-0.124939
-0.526726	methods only if you	-0.124939
-0.849000	the loop if you	-0.124939
-0.060820	for example if you	-0.124939
-0.321226	integer size if you	-0.124939
-0.321226	lookup table if you	-0.124939
-1.226600	For example, if you	-0.124939
-0.321226	to unsigned if you	-0.124939
-0.321226	be important if you	-0.124939
-0.416199	different CPUs if you	-0.124939
-0.416199	not necessary if you	-0.124939
-0.321226	annotation option if you	-0.124939
-0.321226	is good if you	-0.124939
-0.321226	single precision if you	-0.124939
-0.321226	memset line if you	-0.124939
-0.321226	this section if you	-0.124939
-0.321226	than C if you	-0.124939
-0.416199	variables global if you	-0.124939
-0.587487	be vectorized if you	-0.124939
-0.321226	is organized if you	-0.124939
-0.321226	some help if you	-0.124939
-0.321226	following explanation if you	-0.124939
-0.321226	small devices if you	-0.124939
-0.321226	database anyway if you	-0.124939
-0.321226	programming questions if you	-0.124939
-0.321226	pointer aliasing" if you	-0.124939
-0.321226	memory leaks if you	-0.124939
-0.321226	be adjusted if you	-0.124939
-0.321226	Don't panic if you	-0.124939
-1.284113	that the code you	-0.124939
-1.494101	piece of code you	-0.124939
-0.302777	as long as you	-0.124939
-0.453069	As soon as you	-0.124939
-0.350602	and clumsy, as you	-0.124939
-0.350602	legal issue, as you	-0.124939
-1.416915	for the compiler you	-0.124939
-1.145603	by the compiler you	-0.124939
-1.183342	than the time you	-0.124939
-0.578400	maintain. The time you	-0.124939
-0.543752	The first time you	-0.124939
-0.441079	position-independent code when you	-0.124939
-0.341105	to do when you	-0.124939
-0.624927	for example when you	-0.124939
-0.341105	comes first when you	-0.124939
-0.441079	than signed when you	-0.124939
-0.441079	the counters when you	-0.124939
-0.341105	more readable when you	-0.124939
-0.341105	array indices when you	-0.124939
-0.394155	members are then you	-0.124939
-0.205395	the code then you	-0.124939
-0.205395	of code then you	-0.124939
-0.361759	a program then you	-0.124939
-0.411306	virtual functions then you	-0.124939
-0.276954	big loop then you	-0.124939
-0.276954	the cache then you	-0.124939
-0.361759	smart pointer then you	-0.124939
-0.276954	the object then you	-0.124939
-0.276954	of calculations then you	-0.124939
-0.276954	profiler works then you	-0.124939
-0.276954	data structure then you	-0.124939
-0.276954	b[1], ... then you	-0.124939
-0.276954	your application then you	-0.124939
-0.276954	exception handling then you	-0.124939
-0.276954	preceding addition then you	-0.124939
-0.276954	64-bit vectors then you	-0.124939
-0.276954	CPU supports then you	-0.124939
-0.276954	the code, then you	-0.124939
-0.276954	one instance then you	-0.124939
-0.276954	specific models then you	-0.124939
-0.276954	instruction set, then you	-0.124939
-0.276954	are independent then you	-0.124939
-0.276954	an integer, then you	-0.124939
-0.276954	version changes then you	-0.124939
-0.508986	If not, then you	-0.124939
-0.276954	four numbers, then you	-0.124939
-0.276954	cache efficiency, then you	-0.124939
-0.276954	parameters differ then you	-0.124939
-0.276954	acceptable limit, then you	-0.124939
-0.276954	particular meaning, then you	-0.124939
-0.276954	If so, then you	-0.124939
-0.276954	same algorithm, then you	-0.124939
-0.276954	and y?" then you	-0.124939
-0.589614	running a program you	-0.124939
-0.454343	Boolean operands because you	-0.124939
-0.351608	is fastest because you	-0.124939
-0.351608	of course, because you	-0.124939
-0.357840	runtime, if only you	-0.124939
-0.273789	exception. 64 If you	-0.124939
-0.274245	point code. If you	-0.124939
-0.274245	application-specific code. If you	-0.124939
-0.390015	static memory. If you	-0.124939
-0.357915	64-bit systems. If you	-0.124939
-0.357915	equally efficient. If you	-0.124939
-0.600769	instruction set. If you	-0.124939
-0.357915	point library. If you	-0.124939
-0.273789	clock cycles. If you	-0.124939
-0.357915	different thread. If you	-0.124939
-0.273789	= u; If you	-0.124939
-0.273789	endian storage. If you	-0.124939
-0.273789	page 16. If you	-0.124939
-0.273789	many branches. If you	-0.124939
-0.273789	at www.agner.org/optimize/asmlib.zip. If you	-0.124939
-0.273789	oriented programs. If you	-0.124939
-0.115667	reliable results. If you	-0.124939
-0.115667	reproducible results. If you	-0.124939
-0.273789	from errors. If you	-0.124939
-0.273789	these methods. If you	-0.124939
-0.273789	a macro. If you	-0.124939
-0.273789	or __debugbreak();. If you	-0.124939
-0.273789	cryptography (www.intel.com). If you	-0.124939
-0.273789	natural ordering? If you	-0.124939
-0.273789	systems). 42 If you	-0.124939
-0.273789	logical sequence. If you	-0.124939
-0.273789	} 152 If you	-0.124939
-0.448347	arrays automatically but you	-0.124939
-0.346868	operating system, but you	-0.124939
-0.346868	this manual, but you	-0.124939
-0.346868	known type, but you	-0.124939
-0.503200	On most compilers you	-0.124939
-0.345390	the function where you	-0.124939
-0.345390	general case where you	-0.124939
-0.345390	the Internet where you	-0.124939
-0.344025	in C++ so you	-0.124939
-0.344025	the code, so you	-0.124939
-0.344025	32 bits, so you	-0.124939
-0.356773	optimal algorithm before you	-0.124939
-0.260811	decimals, for example, you	-0.124939
-0.260811	If, for example, you	-0.124939
-0.260811	12.3a, for example, you	-0.124939
-0.798825	dispatching. For example, you	-0.124939
-0.546806	are. For example, you	-0.124939
-0.515242	like and how you	-0.124939
-0.556485	39 shows how you	-0.124939
-0.356146	big endian systems you	-0.124939
-1.052506	you are sure you	-0.124939
-0.336999	are not sure you	-0.124939
-0.860610	and make sure you	-0.124939
-0.356179	effect. Which method you	-0.124939
-1.068387	In this case you	-0.124939
-1.042404	in most cases you	-0.124939
-1.377873	In some cases you	-0.124939
-0.355393	one vector, while you	-0.124939
-0.355390	input. (In Windows you	-0.124939
-0.501672	1. How much you	-0.124939
-0.354705	class. Which solution you	-0.124939
-0.620209	regardless of whether you	-0.124939
-0.338644	no difference whether you	-0.124939
-0.353815	standard operations. All you	-0.124939
-0.353817	fastest first. However, you	-0.124939
-0.353539	kind of problems you	-0.124939
-0.234104	do so unless you	-0.124939
-0.234104	to & unless you	-0.124939
-0.047776	non-Intel CPUs unless you	-0.425969
-0.310199	flush-to-zero mode unless you	-0.124939
-0.234104	be avoided unless you	-0.124939
-0.234104	OS X, unless you	-0.124939
-0.234104	as b*(2.0/3.0) unless you	-0.124939
-1.071412	In most cases, you	-0.124939
-0.334369	In such cases, you	-0.124939
-0.261930	instruction set. Therefore, you	-0.124939
-0.261930	to it. Therefore, you	-0.124939
-0.483658	time consuming. Therefore, you	-0.124939
-0.261930	an exception. Therefore, you	-0.124939
-0.261930	different strides. Therefore, you	-0.124939
-0.261930	or namespaces. Therefore, you	-0.124939
-0.349378	code that allows you	-0.124939
-0.349378	container that allows you	-0.124939
-0.405564	Intel compiler allows you	-0.124939
-0.309173	compiler does what you	-0.124939
-0.436860	you know what you	-0.124939
-0.309173	measure exactly what you	-0.124939
-0.454650	file will give you	-0.124939
-0.133274	of which optimizations you	-0.124939
-0.133274	and which optimizations you	-0.124939
-0.297073	{ ... Here, you	-0.124939
-0.297073	from testing. Here, you	-0.124939
-0.297073	my blog. Here, you	-0.124939
-0.413888	lots of things you	-0.124939
-0.319368	are various things you	-0.124939
-0.316956	example, in Windows, you	-0.124939
-0.316956	short. In Windows, you	-0.124939
-0.349185	start to code, you	-0.124939
-0.347889	references are: When you	-0.124939
-0.348086	stay on until you	-0.124939
-0.299173	instructions that allow you	-0.124939
-0.423494	32-bit systems allow you	-0.124939
-0.299312	a C++ program, you	-0.124939
-0.299312	in your program, you	-0.124939
-0.445275	by the compiler, you	-0.124939
-0.344688	CPU dispatching. Obviously, you	-0.124939
-0.287088	{ ... Here you	-0.124939
-0.287088	suboptimal way. Here you	-0.124939
-0.278880	In most systems, you	-0.124939
-0.512278	In 64-bit systems, you	-0.124939
-0.338926	a composite object, you	-0.124939
-0.338687	is false. Likewise, you	-0.124939
-0.108242	intrinsic functions. Alternatively, you	-0.124939
-0.108242	cache size. Alternatively, you	-0.124939
-0.108242	own stack. Alternatively, you	-0.124939
-0.108242	function returns. Alternatively, you	-0.124939
-0.108242	in Windows). Alternatively, you	-0.124939
-0.172458	loop. In general, you	-0.124939
-0.172458	fast. In general, you	-0.124939
-0.172458	counterparts. In general, you	-0.124939
-0.606283	the latter case, you	-0.124939
-0.102694	long vector library, you	-0.124939
-0.102694	short vector library, you	-0.124939
-0.324644	is executed. Furthermore, you	-0.124939
-0.324644	functions that 150 you	-0.124939
-0.230970	In other words, you	-0.124939
-0.324847	profiling support. Then you	-0.124939
-0.314028	the smallest devices, you	-0.124939
-0.407260	of out-of-order execution, you	-0.124939
-0.443387	(Division is slow, you	-0.124939
-0.172331	propagation, etc. Whether you	-0.124939
-0.172331	and Sum3. Whether you	-0.124939
-0.463262	language. In fact, you	-0.124939
-0.713115	On the contrary, you	-0.124939
-0.172331	hot spots Before you	-0.124939
-0.172331	mathematical tasks. Before you	-0.124939
-0.314028	for NOT. Instead, you	-0.124939
-0.293542	the | operator; you	-0.124939
-0.293542	a specific purpose, you	-0.124939
-0.293542	it. The insight you	-0.124939
-0.293542	needed. Even better, you	-0.124939
-0.237316	In example 8.21, you	-0.124939
-0.237316	without jeopardizing safety, you	-0.124939
-0.237316	to optimize anything, you	-0.124939
-0.237316	For this reason, you	-0.124939
-0.237316	following way. First you	-0.124939
-0.357178	const Greek[4] = {	-0.124939
-0.357178	float coef[16] = {	-0.124939
-0.357838	7.41a class vector {	-0.124939
-0.041442	< 100; i++) {	-0.669007
-0.059761	< size; i++) {	-0.124939
-0.147435	< n; i++) {	-0.124939
-0.093725	< 256; i++) {	-0.124939
-0.093725	< 1000; i++) {	-0.124939
-0.093725	< 20; i++) {	-0.124939
-0.231259	< rows; i++) {	-0.124939
-0.021604	< NumberOfTests; i++) {	-0.124939
-0.093725	< arraysize; i++) {	-0.124939
-0.093725	< list.Size(); i++) {	-0.124939
-0.325851	else if else {	-0.124939
-0.005005	} } else {	-0.124939
-0.010068	0; } else {	-0.124939
-0.010068	b; } else {	-0.124939
-0.003330	1; } else {	-0.301030
-0.010068	lookup } else {	-0.124939
-0.005005	2; } else {	-0.425969
-0.010068	1.; } else {	-0.124939
-0.010068	nonzero } else {	-0.124939
-0.010068	134 } else {	-0.124939
-0.010068	range"; } else {	-0.124939
-0.010068	FuncA(i); } else {	-0.124939
-0.005005	F1(a); } else {	-0.124939
-0.010068	&CriticalFunction_SSE2; } else {	-0.124939
-0.010068	69 } else {	-0.124939
-0.063191	} 34 else {	-0.124939
-0.063191	} 68 else {	-0.124939
-0.002262	(float const x) {	-0.124939
-0.001130	(double const x) {	-0.124939
-0.000188	const & x) {	-0.301030
-0.003398	SomeFunction (int x) {	-0.124939
-0.003398	MultiplyBy (int x) {	-0.124939
-0.000753	parabola (float x) {	-0.602060
-0.000423	double p(double x) {	-0.425969
-0.000423	double xpow10(double x) {	-0.425969
-0.006822	IntegerPower (double x) {	-0.124939
-0.001695	float Exp(float x) {	-0.124939
-0.006822	int Func1(int x) {	-0.124939
-0.006822	double Func2(double x) {	-0.124939
-0.147712	y) { union {	-0.124939
-0.147712	Example 14.28 union {	-0.124939
-0.147712	Example 14.23b union {	-0.124939
-0.147712	Example 14.26 union {	-0.124939
-0.147712	Example 14.27 union {	-0.124939
-0.147712	Example 14.23 union {	-0.124939
-0.147712	Example 7.39 union {	-0.124939
-0.147712	Example 14.29 union {	-0.124939
-0.147712	Example 14.24 union {	-0.124939
-0.147712	Example 14.25 union {	-0.124939
-0.223548	const & b) {	-0.124939
-0.011061	a, bool b) {	-0.726999
-0.154899	__declspec(align(16)) struct S1 {	-0.124939
-0.154899	14.9 struct S1 {	-0.124939
-0.154899	7.35b struct S1 {	-0.124939
-0.154899	7.35a struct S1 {	-0.124939
-0.347779	Bitfield { struct {	-0.124939
-0.083817	(u.i[1] < 0) {	-0.124939
-0.136240	(n > 0) {	-0.124939
-0.049492	2 == 0) {	-0.124939
-0.049492	128 == 0) {	-0.124939
-0.049492	(a == 0) {	-0.124939
-0.115038	(b == 0) {	-0.124939
-0.062799	(a != 0) {	-0.124939
-0.062799	(n != 0) {	-0.124939
-0.062799	(b != 0) {	-0.124939
-0.341278	F0() { try {	-0.124939
-0.024170	int * p) {	-0.124939
-0.005917	const * p) {	-0.726999
-0.011917	(int * p) {	-0.425969
-0.000357	short int cc[]) {	-0.550907
-0.176168	v.i * 2) {	-0.124939
-0.016396	i += 2) {	-0.124939
-0.000933	{ if (b) {	-0.726999
-0.003746	} if (b) {	-0.124939
-0.001869	b; if (b) {	-0.425969
-0.504972	7.44 class C1 {	-0.124939
-1.346477	parm1, int parm2) {	-0.124939
-0.155450	int)n < 4) {	-0.124939
-0.155450	i += 4) {	-0.124939
-0.033962	(level >= 4) {	-0.425969
-0.433852	7.28 class c1 {	-0.124939
-0.004080	void test () {	-0.124939
-0.038181	void Func () {	-0.124939
-0.038181	void CriticalInnerFunction () {	-0.124939
-0.005365	i += 8) {	-0.726999
-0.038497	int & r) {	-0.425969
-0.080741	(int & r) {	-0.124939
-0.002865	< SIZE; r++) {	-0.425969
-0.011575	< SIZE; c++) {	-0.124939
-0.011575	< r; c++) {	-0.425969
-0.048275	unsigned int n) {	-0.124939
-0.007682	factorial (int n) {	-0.425969
-0.015503	SomeFunction (int n) {	-0.124939
-0.044094	< 100; x++) {	-0.425969
-0.324720	N> class powN {	-0.124939
-0.211963	7.40b union Bitfield {	-0.124939
-0.211963	7.40a struct Bitfield {	-0.124939
-0.324475	i >= size) {	-0.124939
-0.005749	virtual void Disp() {	-0.425969
-0.005749	public: void Disp() {	-0.425969
-0.122375	functions class CHello {	-0.124939
-0.027515	: public CHello {	-0.425969
-0.093176	& enum Weekdays {	-0.124939
-0.093176	conditions enum Weekdays {	-0.124939
-0.407055	(seconds < 5) {	-0.124939
-0.037088	(level >= 11) {	-0.425969
-0.037088	} int main() {	-0.425969
-0.172252	Devirtualization class C0 {	-0.124939
-0.172252	: public C0 {	-0.124939
-0.313863	long long ReadTSC() {	-0.124939
-0.293385	i += 16) {	-0.124939
-0.102595	const & a) {	-0.124939
-0.102595	square (float a) {	-0.124939
-0.023467	: public CParent<CChild1> {	-0.124939
-0.102595	class: class CGrandParent {	-0.124939
-0.102595	: public CGrandParent {	-0.124939
-0.023467	void F3(bool y) {	-0.124939
-0.537408	typeof(CriticalFunction) * CriticalFunctionDispatch(void) {	-0.124939
-0.381816	}; void F1() {	-0.124939
-0.102595	module2.cpp int Func2() {	-0.124939
-0.102595	} void Func2() {	-0.124939
-0.023467	(u.i & 0x7FFFFFFF) {	-0.425969
-0.048275	public: void Hello() {	-0.124939
-0.048275	Disp(); void Hello() {	-0.124939
-0.048275	{ if (y) {	-0.124939
-0.048275	}; if (y) {	-0.124939
-0.048275	c1 += TILESIZE) {	-0.124939
-0.048275	r1 += TILESIZE) {	-0.124939
-0.102595	< c1+TILESIZE; c2++) {	-0.124939
-0.102595	< r2; c2++) {	-0.124939
-0.023467	< r1+TILESIZE; r2++) {	-0.425969
-0.023467	void transpose(double a[SIZE][SIZE]) {	-0.425969
-0.293385	N> class SafeArray {	-0.124939
-0.023467	a[SIZE][SIZE], double b[SIZE][SIZE]) {	-0.425969
-0.293385	B1, public B2 {	-0.124939
-0.293385	Day == Friday) {	-0.124939
-0.237177	int)i < 10) {	-0.124939
-0.237177	: public B1 {	-0.124939
-0.237177	const & source) {	-0.124939
-0.237177	the array i) {	-0.124939
-0.237177	}; void g() {	-0.124939
-0.237177	} void F0() {	-0.124939
-0.237177	template<> class powN<true,0> {	-0.124939
-0.237177	(absvalue > largest_abs) {	-0.124939
-0.237177	N> class powN<true,N> {	-0.124939
-0.237177	>= (unsigned int)size) {	-0.124939
-0.237177	}; struct Sdouble {	-0.124939
-0.237177	< &list[100]; temp++) {	-0.124939
-0.237177	7.36 class S2 {	-0.124939
-0.237177	7.37 class S3 {	-0.124939
-0.237177	block: 62 __try {	-0.124939
-0.237177	8.10a if (true) {	-0.124939
-0.237177	: public CParent<CChild2> {	-0.124939
-0.237177	n; switch (n) {	-0.124939
-0.237177	list[i] > 1.0) {	-0.124939
-0.237177	x, int m) {	-0.124939
-0.237177	&& WriteFile(handle, ...)) {	-0.124939
-0.237177	EXCEPTION_EXECUTE_HANDLER : EXCEPTION_CONTINUE_SEARCH) {	-0.124939
-0.237177	i <= max) {	-0.124939
-0.237177	a, int x[]) {	-0.124939
-0.237177	}; struct Slongdouble {	-0.124939
-0.237177	<= 16; n++) {	-0.124939
-0.237177	Wednesday | Friday)) {	-0.124939
-0.237177	follows: struct Sfloat {	-0.124939
-0.237177	0xC0000091L void MathLoop() {	-0.124939
-0.237177	} int Size() {	-0.124939
-0.237177	(i >= N) {	-0.124939
-0.237177	{}; void xplus2() {	-0.124939
-0.237177	(i < arraysize) {	-0.124939
-0.237177	}; void Func() {	-0.124939
-0.237177	(u.i > v.i) {	-0.124939
-0.237177	template<> class powN<true,1> {	-0.124939
-0.237177	} catch (...) {	-0.124939
-0.237177	int)n < 13) {	-0.124939
-0.237177	int)(max - min)) {	-0.124939
-0.237177	a[N]; public: SafeArray() {	-0.124939
-0.237177	* __restrict bb) {	-0.124939
-0.237177	thread void DelayFiveSeconds() {	-0.124939
-1.331632	solution is to have	-0.124939
-1.515691	more efficient to have	-0.124939
-1.032824	are sure to have	-0.124939
-1.587195	is important to have	-0.124939
-0.865399	not necessary to have	-0.124939
-0.459439	not good to have	-0.124939
-0.799861	is certain to have	-0.124939
-0.459439	are allowed to have	-0.124939
-0.522430	be convenient to have	-0.124939
-0.653286	is sufficient to have	-0.124939
-0.657990	scan instruction and have	-0.124939
-0.914088	the loop and have	-0.124939
-0.731534	the functions that have	-0.425969
-0.528861	from compilers that have	-0.124939
-0.350766	saving registers that have	-0.124939
-0.493636	all systems that have	-0.124939
-0.561355	some processors that have	-0.124939
-0.245577	all operators that have	-0.124939
-0.245577	but operators that have	-0.124939
-0.245577	in programs that have	-0.124939
-0.245577	making programs that have	-0.124939
-0.350766	many labels that have	-0.124939
-0.350766	instruction. Programmers that have	-0.124939
-1.480580	that it can have	-0.124939
-0.590392	cached. This can have	-0.124939
-1.252439	then you can have	-0.124939
-0.562928	so you can have	-0.124939
-0.499168	virtual processors can have	-0.124939
-1.958416	of the code have	-0.124939
-0.969667	it will not have	-0.124939
-0.821639	that do not have	-0.124939
-0.614904	we do not have	-0.124939
-0.434489	systems do not have	-0.124939
-0.434489	vectors do not have	-0.124939
-0.434489	containers do not have	-0.124939
-0.898798	compiler does not have	-0.124939
-0.533268	programmer does not have	-0.124939
-1.191197	by the compiler have	-0.124939
-0.573856	cases you may have	-0.124939
-0.503773	The program may have	-0.124939
-0.848446	cores. You may have	-0.124939
-0.342832	other systems may have	-0.124939
-0.833198	operating system may have	-0.124939
-0.702069	virtual processor may have	-0.124939
-0.342832	history, etc. may have	-0.124939
-0.443255	point expressions may have	-0.124939
-0.342832	this unit-test may have	-0.124939
-0.725025	so that you have	-0.124939
-0.503965	Assume that you have	-0.124939
-0.527013	important if you have	-0.124939
-0.527013	precision if you have	-0.124939
-0.498980	soon as you have	-0.124939
-0.508444	calculations then you have	-0.124939
-0.508444	vectors then you have	-0.124939
-0.508444	numbers, then you have	-0.124939
-0.324936	endian systems you have	-0.124939
-0.324936	this case you have	-0.124939
-0.324936	operations. All you have	-0.124939
-0.409478	mode unless you have	-0.124939
-0.409478	avoided unless you have	-0.124939
-0.594361	which optimizations you have	-0.124939
-0.420821	way. Here you have	-0.124939
-0.851162	In general, you have	-0.124939
-0.324936	out-of-order execution, you have	-0.124939
-0.324936	optimize anything, you have	-0.124939
-0.344996	memory and will have	-0.124939
-0.938036	then you will have	-0.124939
-0.506911	four, we will have	-0.124939
-0.445983	end user will have	-0.124939
-0.344996	a processor will have	-0.124939
-0.344996	the loader will have	-0.124939
-0.535201	data. The data have	-0.124939
-0.354963	RGB image data have	-0.124939
-1.826522	of the program have	-0.124939
-0.429794	Windows if functions have	-0.124939
-0.332116	intrinsic vector functions have	-0.124939
-0.524626	Intel library functions have	-0.124939
-0.547113	Non-static member functions have	-0.124939
-0.467932	that these functions have	-0.124939
-0.332116	CPU- specific functions have	-0.124939
-0.332116	unwinding. All functions have	-0.124939
-0.467932	style string functions have	-0.124939
-0.332116	case. Inlined functions have	-0.124939
-0.540275	you can only have	-0.124939
-0.492361	measurement code should have	-0.124939
-0.349849	memory block should have	-0.124939
-0.809185	CPU dispatcher should have	-0.124939
-0.798038	languages that do have	-0.124939
-0.499705	But we do have	-0.124939
-0.667073	while other compilers have	-0.124939
-0.480947	compiler Intel compilers have	-0.124939
-0.470968	All C++ compilers have	-0.124939
-0.671437	Most C++ compilers have	-0.124939
-0.213273	optimization Some compilers have	-0.124939
-0.093674	systems. Some compilers have	-0.425969
-0.213273	compilers. Some compilers have	-0.124939
-0.213273	directives Some compilers have	-0.124939
-0.312502	compiler. Some compilers have	-0.124939
-0.213273	order. Some compilers have	-0.124939
-0.213273	places). Some compilers have	-0.124939
-0.531859	loop. Most compilers have	-0.124939
-0.304088	slower. Many compilers have	-0.124939
-0.503021	and code size have	-0.124939
-0.461843	published by Intel have	-0.124939
-0.901915	a and b have	-0.124939
-1.293767	vector class library have	-0.124939
-0.340581	The compilers also have	-0.124939
-0.063426	Some systems also have	-0.425969
-0.340581	by F1 also have	-0.124939
-0.085460	before all objects have	-0.124939
-0.040631	after all objects have	-0.425969
-0.135454	key. Do objects have	-0.124939
-0.135454	map. Do objects have	-0.124939
-0.547478	multithreading that we have	-0.124939
-0.394060	times then we have	-0.124939
-0.394060	10000, then we have	-0.124939
-0.332523	} Here, we have	-0.124939
-0.332523	hexadecimal numbers, we have	-0.124939
-0.461116	__declspec(thread). Such variables have	-0.124939
-0.324168	111 } You have	-0.124939
-0.324168	intrinsic functions You have	-0.124939
-0.324168	error handling. You have	-0.124939
-0.324168	this manual. You have	-0.124939
-0.324168	page 72. You have	-0.124939
-0.324168	optimization job. You have	-0.124939
-0.063617	used if elements have	-0.124939
-0.481542	after all elements have	-0.124939
-0.347950	vector size often have	-0.124939
-0.347950	that hackers often have	-0.124939
-0.739373	and function libraries have	-0.124939
-0.512532	most function libraries have	-0.124939
-0.331080	not all libraries have	-0.124939
-0.331080	All these libraries have	-0.124939
-0.356241	stack. These registers have	-0.124939
-0.063048	64 bit systems have	-0.124939
-0.503948	applications. Some systems have	-0.124939
-0.356390	only after they have	-0.124939
-0.487807	It may even have	-0.124939
-0.346564	you don't even have	-0.124939
-0.355734	The AVX instructions have	-0.124939
-0.355926	Therefore, micro- processors have	-0.124939
-0.213997	function that I have	-0.124939
-0.151998	no compiler I have	-0.124939
-0.003680	the compilers I have	-0.522879
-0.018719	The compilers I have	-0.124939
-0.018719	different compilers I have	-0.124939
-0.151998	the examples I have	-0.124939
-0.069383	b; Here, I have	-0.124939
-0.069383	1]; Here, I have	-0.124939
-0.151998	is called. I have	-0.124939
-0.151998	in performance. I have	-0.124939
-0.151998	next element. I have	-0.124939
-0.151998	initialized arrays. I have	-0.124939
-0.151998	model number. I have	-0.124939
-0.151998	to call. I have	-0.124939
-0.151998	new one. I have	-0.124939
-0.151998	reductions manually. I have	-0.124939
-0.151998	own research, I have	-0.124939
-0.151998	maintenance easier. I have	-0.124939
-0.151998	particularly tricky. I have	-0.124939
-0.151998	this chapter, I have	-0.124939
-0.458170	newer Intel CPUs have	-0.124939
-0.310846	bytes. Some CPUs have	-0.124939
-0.310846	counters Many CPUs have	-0.124939
-0.403319	Most modern CPUs have	-0.124939
-0.310846	accumulators. Current CPUs have	-0.124939
-0.536492	optimization, it does have	-0.124939
-0.429835	The functions must have	-0.124939
-0.332149	container class must have	-0.124939
-0.332149	This task must have	-0.124939
-0.459073	runtime address calculations have	-0.124939
-0.568429	if different versions have	-0.124939
-0.331673	that it doesn't have	-0.425969
-0.665674	because it doesn't have	-0.124939
-0.273473	the compiler doesn't have	-0.124939
-0.356046	The compiler doesn't have	-0.124939
-0.330276	28) The threads have	-0.124939
-0.427491	no other threads have	-0.124939
-0.330276	when all threads have	-0.124939
-0.778713	in the thread have	-0.124939
-0.499467	compiler for Linux have	-0.124939
-0.337929	time you would have	-0.124939
-0.337929	then we would have	-0.124939
-0.354084	all five values have	-0.124939
-0.353592	and destination both have	-0.124939
-0.314188	Text strings typically have	-0.124939
-0.314188	such devices typically have	-0.124939
-0.314188	Software developers typically have	-0.124939
-0.575091	we should preferably have	-0.124939
-0.864159	should therefore preferably have	-0.124939
-0.472785	SSE2 instruction sets have	-0.124939
-0.472785	CISC instruction sets have	-0.124939
-0.115881	that you don't have	-0.425969
-0.274405	Therefore, you don't have	-0.124939
-0.077742	that we don't have	-0.425969
-0.261853	so we don't have	-0.124939
-0.237571	I simply don't have	-0.124939
-0.352836	These different methods have	-0.124939
-0.352226	small embedded applications have	-0.124939
-0.496039	but the examples have	-0.124939
-0.326519	possible. Smaller microprocessors have	-0.124939
-0.326519	operations Today's microprocessors have	-0.124939
-0.558341	that the operands have	-0.124939
-0.734432	if the operands have	-0.124939
-0.293738	expressions where operands have	-0.124939
-0.349779	However, these languages have	-0.124939
-0.348537	a framework sometimes have	-0.124939
-0.348475	processing capabilities still have	-0.124939
-0.346887	software development models have	-0.124939
-0.344587	the variables might have	-0.124939
-0.293756	constructs Most programmers have	-0.124939
-0.293756	program. Many programmers have	-0.124939
-0.780239	above the diagonal have	-0.124939
-0.343205	constant N1 could have	-0.124939
-0.441148	if the inputs have	-0.124939
-0.784919	the x86 family have	-0.124939
-0.341267	many people who have	-0.124939
-0.519842	thousand cache misses have	-0.124939
-0.268844	available information. They have	-0.124939
-0.268844	and 64-bit. They have	-0.124939
-0.187728	and that computers have	-0.124939
-0.187728	is because computers have	-0.124939
-0.187728	all modern computers have	-0.124939
-0.753479	the hot spots have	-0.124939
-0.420433	some development tools have	-0.124939
-0.324625	because all caches have	-0.124939
-0.314008	that software projects have	-0.124939
-0.020804	software. Smaller microcontrollers have	-0.124939
-0.020804	caching. Smaller microcontrollers have	-0.124939
-0.020804	microcontrollers: Smaller microcontrollers have	-0.124939
-0.293524	shared. You can't have	-0.124939
-0.293524	consider whether others have	-0.124939
-0.293524	Mac OS, etc.) have	-0.124939
-0.237299	(Integrated Development Environments) have	-0.124939
-0.237299	branch. Microprocessor designers have	-0.124939
-0.237299	C++ imple- mentations have	-0.124939
-0.237299	function in isolation have	-0.124939
-1.174161	the performance of this	-0.124939
-1.114320	The calculation of this	-0.124939
-1.054239	The advantage of this	-0.124939
-0.452826	take advantage of this	-0.124939
-0.455688	The name of this	-0.124939
-1.186026	the cost of this	-0.124939
-1.138955	the end of this	-0.124939
-0.967590	The costs of this	-0.124939
-0.667732	a discussion of this	-0.124939
-0.886268	further discussion of this	-0.124939
-0.554274	slow implementations of this	-0.124939
-1.100738	an explanation of this	-0.124939
-0.981881	takes care of this	-0.124939
-0.864374	The purpose of this	-0.124939
-0.183877	the scope of this	-0.301030
-0.562997	add a to this	-0.124939
-0.312494	simple solution to this	-0.124939
-0.312494	standard solution to this	-0.124939
-0.248097	library. Add to this	-0.124939
-0.248097	dispatching. Add to this	-0.124939
-0.355595	Possible solutions to this	-0.124939
-0.042475	an appendix to this	-0.425969
-0.089560	An appendix to this	-0.124939
-0.355595	The conclusion to this	-0.124939
-0.721248	is used and this	-0.124939
-0.523785	unsigned integer and this	-0.124939
-0.460611	an integer, and this	-0.124939
-0.356548	is accessed, and this	-0.124939
-0.356548	value infinity, and this	-0.124939
-0.525163	15.1b, and in this	-0.124939
-1.227597	The code in this	-0.124939
-0.555470	The data in this	-0.124939
-0.994523	the loop in this	-0.124939
-0.521206	case, but in this	-0.124939
-0.345671	to float in this	-0.124939
-0.446836	other address in this	-0.124939
-0.345671	function libraries in this	-0.124939
-0.345671	element 0 in this	-0.124939
-0.345671	32-bit Windows in this	-0.124939
-0.696463	efficient solution in this	-0.124939
-0.446836	exception handling in this	-0.124939
-0.486572	The examples in this	-0.124939
-0.525163	is needed in this	-0.124939
-0.446836	if statement in this	-0.124939
-0.540488	by columns in this	-0.124939
-1.076674	as described in this	-0.124939
-0.529156	will occur in this	-0.124939
-0.633749	error message in this	-0.124939
-0.345671	can define in this	-0.124939
-0.345671	Catch exceptions in this	-0.124939
-0.345671	is measured in this	-0.124939
-0.345671	the reduction in this	-0.124939
-0.492677	as illustrated in this	-0.124939
-0.345671	If MultiplyBy in this	-0.124939
-0.345671	: 0] in this	-0.124939
-0.345671	two formulas in this	-0.124939
-0.345671	so 1.2 in this	-0.124939
-0.345671	other volumes in this	-0.124939
-0.358451	reference parameters). The this	-0.124939
-0.949005	a function for this	-0.124939
-0.567831	optimal code for this	-0.124939
-0.239451	The reason for this	-0.124939
-0.352616	The reasons for this	-0.124939
-0.352616	be mispredicted for this	-0.124939
-0.496208	never designed for this	-0.124939
-0.352616	the basis for this	-0.124939
-0.352616	dispatching 125 for this	-0.124939
-0.352616	than doubled for this	-0.124939
-1.154401	the compiler that this	-0.124939
-0.539071	almost certain that this	-0.124939
-0.461362	to note that this	-0.124939
-0.462974	to 0 // this	-0.124939
-0.358507	can tell it this	-0.124939
-0.579684	functions, or if this	-0.124939
-0.814344	a loop if this	-0.124939
-0.353495	loop automatically if this	-0.124939
-0.353495	don't know if this	-0.124939
-0.353495	lookup tables if this	-0.124939
-0.772363	on processors with this	-0.124939
-0.531796	The problem with this	-0.124939
-0.994572	is AND'ed with this	-0.124939
-0.455742	for dealing with this	-0.124939
-0.352712	I disagree with this	-0.124939
-0.353892	For more on this	-0.124939
-0.142168	to turn on this	-0.425969
-0.353892	make measurements on this	-0.124939
-1.257404	as long as this	-0.124939
-1.120136	do not have this	-0.124939
-0.582264	general, you have this	-0.124939
-0.584407	than to use this	-0.124939
-0.441781	compiler can use this	-0.425969
-0.651336	You can use this	-0.124939
-0.542440	can then use this	-0.124939
-0.687611	You should use this	-0.124939
-0.481091	and always use this	-0.124939
-0.441832	not normally use this	-0.124939
-0.352730	array elements then this	-0.124939
-0.352730	critical stride then this	-0.124939
-0.647558	clock cycles, then this	-0.124939
-0.355058	be clear from this	-0.124939
-0.355058	can learn from this	-0.124939
-0.582992	already known at this	-0.124939
-1.334775	how to make this	-0.124939
-0.563010	list and make this	-0.124939
-0.560233	compiler can make this	-0.124939
-0.761885	do not make this	-0.124939
-0.933341	member function because this	-0.124939
-0.477373	of pointers because this	-0.124939
-0.339002	= 0 because this	-0.124939
-0.339002	VIA processors because this	-0.124939
-0.339002	always position-independent because this	-0.124939
-0.339002	time-consuming tasks because this	-0.124939
-0.339002	^= 0x80000000; because this	-0.124939
-0.354328	cache size. If this	-0.124939
-0.354328	load address. If this	-0.124939
-0.785549	models on which this	-0.124939
-0.409226	efficient code, but this	-0.124939
-0.315614	of course, but this	-0.124939
-0.315614	function library, but this	-0.124939
-0.409226	imported pointer, but this	-0.124939
-0.315614	it occurs, but this	-0.124939
-0.315614	template metaprogramming, but this	-0.124939
-0.315614	addition unit, but this	-0.124939
-0.315614	the factorials, but this	-0.124939
-0.315614	than 2-20, but this	-0.124939
-0.315614	with -mcmodel=large, but this	-0.124939
-0.315614	memory block, but this	-0.124939
-0.315614	option -ftrapv, but this	-0.124939
-0.315614	public symbols, but this	-0.124939
-0.973763	necessary to do this	-0.124939
-1.135518	able to do this	-0.124939
-0.532055	safer to do this	-0.124939
-0.582516	you can do this	-0.124939
-0.675356	compilers will do this	-0.124939
-0.656785	I am using this	-0.124939
-0.213930	} } In this	-0.124939
-0.213930	c; } In this	-0.124939
-0.380322	system code. In this	-0.124939
-0.292166	+= b; In this	-0.124939
-0.292166	the resources. In this	-0.124939
-0.380322	single-thread speed. In this	-0.124939
-0.292166	dependency chains. In this	-0.124939
-0.292166	-2.0 55 In this	-0.124939
-0.292166	reused elsewhere. In this	-0.124939
-0.292166	page 71). In this	-0.124939
-0.292166	clock cycle? In this	-0.124939
-0.292166	same divisor. In this	-0.124939
-0.292166	MAX(f(x), g(x)); In this	-0.124939
-0.357018	automatic prefetching so this	-0.124939
-0.356575	I will call this	-0.124939
-0.356861	operating systems". For this	-0.124939
-0.356604	the preceding example, this	-0.124939
-0.551337	12.4b shows how this	-0.124939
-0.962741	to know how this	-0.124939
-0.339522	chapter describes how this	-0.124939
-0.576982	way to test this	-0.124939
-0.586034	an operating system this	-0.124939
-1.418017	In some cases this	-0.124939
-1.251660	that you want this	-0.124939
-0.344537	you don't want this	-0.124939
-0.355437	too worried about this	-0.124939
-0.500626	value. It does this	-0.124939
-0.483950	possible to avoid this	-0.124939
-0.791918	ways to avoid this	-0.124939
-0.282045	You can avoid this	-0.124939
-0.595816	You may avoid this	-0.124939
-0.387354	You cannot avoid this	-0.124939
-0.342635	CPU time. But this	-0.124939
-0.342635	Java today. But this	-0.124939
-0.501815	data object through this	-0.124939
-0.534517	CPUs that support this	-0.124939
-0.779167	able to inline this	-0.124939
-0.458455	able to optimize this	-0.124939
-0.458455	try to optimize this	-0.124939
-0.317097	compiler will optimize this	-0.124939
-0.300138	is used. However, this	-0.124939
-0.300138	actual processor. However, this	-0.124939
-0.300138	program flow. However, this	-0.124939
-0.300138	later maintenance. However, this	-0.124939
-0.056932	compiler may replace this	-0.726999
-0.087329	You may replace this	-0.124939
-0.343387	compiler will replace this	-0.124939
-0.353037	be implemented like this	-0.124939
-0.499253	we are running this	-0.124939
-0.352272	the same after this	-0.124939
-0.455071	need only read this	-0.124939
-0.569612	next example shows this	-0.124939
-0.803631	You can improve this	-0.124939
-0.457663	lengths to reduce this	-0.124939
-0.057390	compiler may reduce this	-0.425969
-0.497860	processors, and choose this	-0.124939
-0.319305	to work around this	-0.124939
-0.319305	various ways around this	-0.124939
-0.469635	Microsoft compiler supports this	-0.124939
-0.450073	The CPU supports this	-0.124939
-0.642794	compiler may change this	-0.124939
-0.349224	functions. 80 Unfortunately, this	-0.124939
-0.450805	i is outside this	-0.124939
-0.348462	rebooted. To prevent this	-0.124939
-0.346971	multiplication b[i]*c[i], though this	-0.124939
-0.346971	executing instructions during this	-0.124939
-0.345892	performs best under this	-0.124939
-0.346070	instruction and expect this	-0.124939
-0.765452	The reason why this	-0.124939
-0.299001	no explanation why this	-0.124939
-0.293691	all variables. Obviously, this	-0.124939
-0.293691	is finished. Obviously, this	-0.124939
-0.543580	order to implement this	-0.124939
-0.343351	PTR [ecx+eax*4],ebx stores this	-0.124939
-0.341069	and Mac systems, this	-0.124939
-0.516747	Windows operating system, this	-0.124939
-0.714820	compiler can eliminate this	-0.124939
-0.335405	faster. Of course, this	-0.124939
-0.335405	I am giving this	-0.124939
-0.818414	how to overcome this	-0.124939
-0.331209	Add to 122 this	-0.124939
-0.331397	saying please install this	-0.124939
-0.331020	sar ebx,1 adds this	-0.124939
-0.486765	occur, but unfortunately this	-0.124939
-0.324525	in edx. Furthermore, this	-0.124939
-0.594029	Let me explain this	-0.124939
-0.324525	possible remedies against this	-0.124939
-0.324757	to cause overflow, this	-0.124939
-0.420309	multiplication by changing this	-0.124939
-0.313911	You may skip this	-0.124939
-0.313911	user's computers. At this	-0.124939
-0.314213	compiler has solved this	-0.124939
-0.077651	way to solve this	-0.124939
-0.077651	designed to solve this	-0.124939
-0.293431	the microprocessor handles this	-0.124939
-0.293431	I will conclude this	-0.124939
-0.237218	You can subtract this	-0.124939
-0.237218	very often underestimate this	-0.124939
-0.237218	I have confirmed this	-0.124939
-0.237218	/ 1.2345; Change this	-0.124939
-0.237218	number to reflect this	-0.124939
-1.576724	This is the time	-0.124939
-0.760064	most of the time	-0.124939
-1.036786	value of the time	-0.425969
-0.838617	fraction of the time	-0.124939
-0.568628	lengths of the time	-0.124939
-0.985738	50% of the time	-0.124939
-0.197225	99% of the time	-0.425969
-0.568628	1/50 of the time	-0.124939
-1.214775	equal to the time	-0.124939
-0.583384	compared to the time	-0.124939
-0.575574	contentions and the time	-0.124939
-0.575574	call, and the time	-0.124939
-0.592239	frequent if the time	-0.124939
-0.989473	obtained with the time	-0.124939
-0.570054	compared with the time	-0.124939
-0.589740	efforts on the time	-0.124939
-0.747063	more than the time	-0.124939
-0.229503	less than the time	-0.301030
-0.558853	don't have the time	-0.124939
-0.042161	to this the time	-0.425969
-0.088860	122 this the time	-0.124939
-0.388533	name at the time	-0.124939
-0.045054	unknown at the time	-0.425969
-0.388533	popular at the time	-0.124939
-0.388533	lost at the time	-0.124939
-0.566622	not only the time	-0.124939
-0.645601	but also the time	-0.124939
-0.582604	cycles before the time	-0.124939
-1.497617	to calculate the time	-0.124939
-0.519381	may read the time	-0.124939
-0.530323	way includes the time	-0.124939
-0.769803	and stores the time	-0.124939
-0.530323	can increase the time	-0.124939
-0.494984	for reducing the time	-0.124939
-0.454506	processors. Consider the time	-0.124939
-0.454506	by measuring the time	-0.124939
-0.351736	addition to) the time	-0.124939
-0.596809	graphics function is time	-0.124939
-1.315726	can be a time	-0.124939
-0.593041	microseconds as a time	-0.124939
-0.137143	elements at a time	-0.124939
-0.338724	bytes at a time	-0.124939
-0.476991	line at a time	-0.124939
-0.338724	numbers at a time	-0.124939
-0.338724	piece at a time	-0.124939
-1.045080	a lot of time	-0.124939
-0.888076	a waste of time	-0.124939
-0.449434	and waste of time	-0.124939
-0.142819	branch ahead of time	-0.124939
-0.142819	counter ahead of time	-0.124939
-0.725838	more complicated and time	-0.124939
-0.552227	user's time. The time	-0.124939
-0.798067	a program. The time	-0.124939
-0.525407	given below. The time	-0.124939
-0.525407	test loop. The time	-0.124939
-0.348471	Program installation The time	-0.124939
-0.348471	12 bytes. The time	-0.124939
-0.738419	the calculations. The time	-0.124939
-0.348471	right prediction. The time	-0.124939
-0.348471	to maintain. The time	-0.124939
-0.639201	is doubled. The time	-0.124939
-0.348471	user input. The time	-0.124939
-0.639201	programming style. The time	-0.124939
-0.348471	is run. The time	-0.124939
-0.348471	Ignoring virtualization. The time	-0.124939
-0.348471	Windows: __rdtsc()). The time	-0.124939
-0.348471	certain tolerance. The time	-0.124939
-0.348471	performance costs. The time	-0.124939
-0.899726	system can be time	-0.124939
-0.787738	these methods are time	-0.124939
-0.358544	high resolution if time	-0.124939
-0.358395	network resources. This time	-0.124939
-0.355863	instructions during this time	-0.124939
-0.355863	often underestimate this time	-0.124939
-0.526139	Other programs use time	-0.124939
-0.420952	programs use more time	-0.124939
-0.291511	takes no more time	-0.124939
-0.291511	take no more time	-0.124939
-0.123931	that takes more time	-0.124939
-0.123931	it takes more time	-0.124939
-0.123931	list takes more time	-0.124939
-0.123931	conversion takes more time	-0.124939
-0.245621	and take more time	-0.124939
-0.353557	may take more time	-0.124939
-0.245621	functions take more time	-0.124939
-0.245621	sometimes take more time	-0.124939
-0.164730	takes much more time	-0.425969
-0.325041	to consume more time	-0.124939
-0.325041	takes 40% more time	-0.124939
-0.355165	to sum1 from time	-0.124939
-0.355165	to sum2 from time	-0.124939
-1.485904	in the same time	-0.124939
-0.777145	at the same time	-0.124939
-0.194224	take the same time	-0.124939
-1.250961	of the CPU time	-0.124939
-0.354485	spend more CPU time	-0.124939
-0.357944	time measurement. If time	-0.124939
-0.584286	mispredicted only one time	-0.124939
-0.062907	rather than each time	-0.425969
-0.336667	from memory each time	-0.124939
-0.336667	the value each time	-0.124939
-0.435500	switches after each time	-0.124939
-0.336667	for updates each time	-0.124939
-0.357187	of course also time	-0.124939
-0.357180	it obviously takes time	-0.124939
-1.016952	which is very time	-0.124939
-0.462887	takes a long time	-0.124939
-0.133878	take a long time	-0.124939
-0.462887	quite a long time	-0.124939
-0.543007	times as long time	-0.124939
-0.473917	a very long time	-0.124939
-0.310334	measure how long time	-0.124939
-0.310334	takes too long time	-0.124939
-0.460979	access are critical time	-0.124939
-0.639381	for the first time	-0.124939
-0.450490	only the first time	-0.124939
-0.450490	calculated the first time	-0.124939
-0.450490	until the first time	-0.124939
-0.530404	performance: The first time	-0.124939
-0.317555	called only first time	-0.124939
-0.460059	problematic because these time	-0.124939
-0.460225	in a short time	-0.124939
-0.082474	most of its time	-0.124939
-0.321632	structure. The extra time	-0.124939
-0.379680	takes no extra time	-0.124939
-0.475795	take no extra time	-0.124939
-0.400187	on the execution time	-0.124939
-0.400187	give the execution time	-0.124939
-0.320120	of their execution time	-0.124939
-0.585448	the total execution time	-0.124939
-0.411571	users and much time	-0.124939
-0.072163	and how much time	-0.425969
-0.158753	measure how much time	-0.124939
-0.258176	value at compile time	-0.124939
-0.258176	calculations at compile time	-0.124939
-0.369721	calculated at compile time	-0.124939
-0.366032	known at compile time	-0.204120
-0.209080	resolved at compile time	-0.124939
-0.258176	(1./1.2345) at compile time	-0.124939
-0.511681	if the calculation time	-0.124939
-0.511681	but the calculation time	-0.124939
-0.326044	an estimated calculation time	-0.124939
-0.521046	thread will get time	-0.124939
-0.162193	do this every time	-0.124939
-0.162193	for example every time	-0.124939
-0.162193	are called every time	-0.124939
-0.162193	is done every time	-0.124939
-0.162193	the list every time	-0.124939
-0.162193	of branches every time	-0.124939
-0.035228	memory block every time	-0.425969
-0.162193	be loaded every time	-0.124939
-0.162193	for updates every time	-0.124939
-0.162193	a misprediction every time	-0.124939
-0.162193	are evaluated every time	-0.124939
-0.162193	be updated every time	-0.124939
-0.162193	is re-allocated every time	-0.124939
-0.162193	be re-calculated every time	-0.124939
-0.543324	until the next time	-0.124939
-0.517792	returns. The next time	-0.124939
-0.749752	most of their time	-0.124939
-0.134844	automatically. The development time	-0.124939
-0.134844	application. The development time	-0.124939
-0.518618	integer. The conversion time	-0.124939
-0.134155	same as last time	-0.124939
-0.134155	way as last time	-0.124939
-0.116500	that takes longer time	-0.124939
-0.116500	It takes longer time	-0.124939
-0.116500	multiplication takes longer time	-0.124939
-0.409252	to take longer time	-0.124939
-0.205385	by making longer time	-0.124939
-0.018575	takes much longer time	-0.602060
-0.351892	functions (methods) Each time	-0.124939
-0.351999	it. The load time	-0.124939
-0.349920	should take installation time	-0.124939
-0.037514	that the response time	-0.124939
-0.149051	because the response time	-0.124939
-0.078577	If the response time	-0.124939
-0.078577	test the response time	-0.124939
-0.349073	template parameter. No time	-0.124939
-0.348529	can be particularly time	-0.124939
-0.525335	is to save time	-0.124939
-0.786838	using the so-called time	-0.124939
-0.347112	CPU core during time	-0.124939
-0.268821	does not spend time	-0.124939
-0.268821	will never spend time	-0.124939
-0.438180	16.2. The measured time	-0.124939
-0.104372	Finding the biggest time	-0.425969
-0.429069	likely to consume time	-0.124939
-0.331252	for details. Development time	-0.124939
-0.324754	Internet at regular time	-0.124939
-0.324929	get more reproducible time	-0.124939
-0.314362	simply zero. Execution time	-0.124939
-0.314134	4 processor. Extra time	-0.124939
-0.314362	be an annoying time	-0.124939
-0.172383	but no compile- time	-0.124939
-0.172383	should allow compile- time	-0.124939
-0.293644	the overall computation time	-0.124939
-0.293644	function returns. Every time	-0.124939
-0.382134	etc. // Returns time	-0.124939
-0.382134	in develop- ment time	-0.124939
-0.237405	interfere with real time	-0.124939
-0.237405	one that saves time	-0.124939
-0.237405	cpuid // Read time	-0.124939
-0.237405	profilers are: Coarse time	-0.124939
-0.237405	get the exact time	-0.124939
-0.597649	said that the use	-0.124939
-0.561546	obtained by the use	-0.124939
-0.561546	additions by the use	-0.124939
-0.561546	bitfield by the use	-0.124939
-0.576499	directly with the use	-0.124939
-0.576499	rewritten with the use	-0.124939
-0.862459	This makes the use	-0.124939
-0.833635	it prevents the use	-0.124939
-0.502517	function. Avoid the use	-0.124939
-0.241189	to economize the use	-0.425969
-0.461363	addresses. Especially the use	-0.124939
-0.599650	absent in a use	-0.124939
-0.447343	performance is to use	-0.124939
-0.405891	problem is to use	-0.124939
-0.883307	solution is to use	-0.124939
-0.447343	lookup is to use	-0.124939
-0.447343	generates is to use	-0.124939
-0.447343	recommendation is to use	-0.124939
-0.543223	which function to use	-0.124939
-0.539751	optimization than to use	-0.124939
-0.836056	allows you to use	-0.124939
-0.809003	the program to use	-0.124939
-0.957318	compiler has to use	-0.124939
-0.803127	more efficient to use	-0.425969
-1.387641	is possible to use	-0.124939
-0.660023	it possible to use	-0.124939
-0.334861	which version to use	-0.124939
-0.334861	code branch to use	-0.124939
-0.819524	The way to use	-0.124939
-0.760366	is faster to use	-0.124939
-0.427554	often faster to use	-0.124939
-0.921529	of how to use	-0.124939
-0.887801	for how to use	-0.124939
-0.921529	shows how to use	-0.124939
-0.463739	illustrates how to use	-0.124939
-0.935971	you need to use	-0.124939
-1.088984	no need to use	-0.124939
-0.334861	which method to use	-0.124939
-0.334861	some cases to use	-0.124939
-0.822779	may want to use	-0.124939
-1.055063	is necessary to use	-0.124939
-0.504328	rarely necessary to use	-0.124939
-0.197504	is advantageous to use	-0.249877
-0.210871	not advantageous to use	-0.124939
-0.261188	less advantageous to use	-0.124939
-0.261188	always advantageous to use	-0.124939
-0.809003	deciding whether to use	-0.124939
-1.581293	is likely to use	-0.124939
-0.325988	is recommended to use	-0.550907
-0.487478	not recommended to use	-0.124939
-0.478178	is optimal to use	-0.124939
-0.478178	be optimal to use	-0.124939
-0.179463	no reason to use	-0.124939
-0.471689	you choose to use	-0.124939
-0.062667	is inefficient to use	-0.124939
-0.334861	cache lines to use	-0.124939
-0.471689	is safe to use	-0.124939
-0.519838	often easier to use	-0.124939
-0.433234	we expect to use	-0.124939
-0.334861	special reasons to use	-0.124939
-0.342766	is preferred to use	-0.124939
-0.237182	be preferred to use	-0.124939
-0.422339	is safer to use	-0.124939
-0.298305	are safer to use	-0.124939
-0.672581	may prefer to use	-0.124939
-0.334861	is profitable to use	-0.124939
-0.334861	it unwise to use	-0.124939
-0.334861	are cumbersome to use	-0.124939
-0.537791	a loop and use	-0.124939
-0.460769	eliminate i and use	-0.124939
-0.655365	user interface and use	-0.124939
-0.356672	it off and use	-0.124939
-0.356672	one local, and use	-0.124939
-1.053847	the program. The use	-0.124939
-0.501825	other purposes. The use	-0.124939
-0.501825	size vector. The use	-0.124939
-0.721471	multiple threads. The use	-0.124939
-0.358558	a matrix for use	-0.124939
-1.383091	the code that use	-0.124939
-0.551611	on instructions that use	-0.124939
-0.355060	container classes that use	-0.124939
-0.458721	all modules that use	-0.124939
-0.355060	to platforms that use	-0.124939
-0.499614	systems. Applications that use	-0.124939
-1.276893	that it can use	-0.124939
-0.541181	called, it can use	-0.124939
-0.542030	calling function can use	-0.124939
-0.843864	The compiler can use	-0.124939
-0.170183	Gnu compiler can use	-0.425969
-0.784900	optimizing compiler can use	-0.124939
-0.782059	if you can use	-0.124939
-0.684365	how you can use	-0.124939
-0.479071	Windows, you can use	-0.124939
-0.479071	Alternatively, you can use	-0.124939
-0.479071	reason, you can use	-0.124939
-0.552656	Intel compilers can use	-0.124939
-0.550722	systems we can use	-0.124939
-0.515884	time You can use	-0.124939
-0.515884	test. You can use	-0.124939
-0.441619	The programmer can use	-0.124939
-0.341533	user interface can use	-0.124939
-0.480858	or union can use	-0.124939
-0.502600	assembly code or use	-0.124939
-0.357199	instruction directly, or use	-0.124939
-0.749318	it will not use	-0.124939
-0.462465	It will not use	-0.124939
-0.774196	libraries do not use	-0.124939
-0.532849	they do not use	-0.124939
-0.568920	object does not use	-0.124939
-0.231453	set. Do not use	-0.124939
-0.231453	allocation. Do not use	-0.124939
-0.231453	block. Do not use	-0.124939
-0.231453	optimizations. Do not use	-0.124939
-0.231453	list. Do not use	-0.124939
-0.231453	resource. Do not use	-0.124939
-0.499288	or you may use	-0.124939
-1.149394	then you may use	-0.124939
-0.499288	systems, you may use	-0.124939
-0.446554	efficient. You may use	-0.124939
-0.446554	precision. You may use	-0.124939
-0.446554	application. You may use	-0.124939
-0.446554	expensive. You may use	-0.124939
-0.446554	used). You may use	-0.124939
-0.342873	runtime framework may use	-0.124939
-0.356324	Which method you use	-0.124939
-0.460326	difference whether you use	-0.124939
-0.489271	but this will use	-0.124939
-0.518997	The loop will use	-0.124939
-0.914727	Some compilers will use	-0.124939
-0.961797	Most compilers will use	-0.124939
-0.347621	class library will use	-0.124939
-0.451747	added and then use	-0.124939
-0.451747	calculations, and then use	-0.124939
-0.478237	compiler can then use	-0.124939
-0.339630	one element then use	-0.124939
-0.339630	output option then use	-0.124939
-0.622096	is used, then use	-0.124939
-0.137428	(FIFO) basis then use	-0.124939
-0.137428	(FILO) basis then use	-0.124939
-0.574144	application can make use	-0.124939
-0.357867	than optimizing CPU use	-0.124939
-0.357667	above examples all use	-0.124939
-0.561605	Both code cache use	-0.124939
-0.805818	and data cache use	-0.124939
-0.384627	operations. You should use	-0.124939
-0.384627	overlap. You should use	-0.124939
-0.349884	rights. Software should use	-0.124939
-0.357609	If you do use	-0.124939
-0.357431	directives. For example use	-0.124939
-0.353307	The best compilers use	-0.124939
-0.353307	// (Some compilers use	-0.124939
-0.578524	compiler can also use	-0.124939
-0.357344	obstacles to efficient use	-0.124939
-0.356993	should avoid any use	-0.124939
-0.539070	easier if we use	-0.124939
-1.089305	Floating point variables use	-0.124939
-0.558033	here. You cannot use	-0.124939
-0.768472	that they cannot use	-0.124939
-0.593665	supported. For example, use	-0.124939
-0.356375	Mac systems often use	-0.124939
-0.460451	container class libraries use	-0.124939
-0.346913	3.x. These systems use	-0.124939
-0.346913	in Unix-like systems use	-0.124939
-0.473907	is to always use	-0.124939
-0.435264	count and always use	-0.124939
-0.435264	process should always use	-0.124939
-0.560570	The vector operations use	-0.124939
-0.547398	the integer operations use	-0.124939
-0.355938	function libraries available use	-0.124939
-0.505533	For Intel CPUs use	-0.124939
-0.444785	for AMD CPUs use	-0.124939
-0.444442	Mathematical functions must use	-0.124939
-0.343774	Multithreaded programs must use	-0.124939
-0.984236	if the threads use	-0.124939
-0.458837	numbers to integers use	-0.124939
-0.538037	standard container classes use	-0.124939
-0.440876	of string classes use	-0.124939
-0.842099	may as well use	-0.124939
-0.298095	But many programs use	-0.124939
-0.298095	many common programs use	-0.124939
-0.298095	Some application programs use	-0.124939
-0.298095	time. Other programs use	-0.124939
-0.431346	structures that typically use	-0.124939
-0.431346	calculations will typically use	-0.124939
-0.749385	function should never use	-0.124939
-0.352378	to make better use	-0.124939
-0.602087	Many software applications use	-0.124939
-0.329073	when several applications use	-0.124939
-0.536758	modern programming languages use	-0.124939
-0.347050	solution. Many containers use	-0.124939
-0.346805	server in full use	-0.124939
-0.299317	economize the resource use	-0.124939
-0.299317	to economize resource use	-0.124939
-0.447561	loop. Some implementations use	-0.124939
-0.344769	example, some programmers use	-0.124939
-0.343295	against overkill. Don't use	-0.124939
-0.441182	within the DLL use	-0.124939
-0.339214	such applications. Alternatively, use	-0.124939
-0.324754	cases. The explicit use	-0.124939
-0.211935	do not normally use	-0.124939
-0.211935	Mac systems normally use	-0.124939
-0.314134	square brackets mean use	-0.124939
-0.314134	the future. To use	-0.124939
-0.023484	appropriate version (May use	-0.425969
-0.293644	on Intel CPUs: use	-0.124939
-0.293644	it occurs, (2) use	-0.124939
-0.293644	cache space. Excessive use	-0.124939
-0.293644	position. Windows DLLs use	-0.124939
-0.293644	best Java machines use	-0.124939
-0.293644	such as Java, use	-0.124939
-0.237405	+ 2 thenaandbcannot use	-0.124939
-0.237405	and stack entries use	-0.124939
-0.237405	a float. (Both use	-0.124939
-0.237405	and multiplications. Subtractions use	-0.124939
-0.805054	but not the more	-0.124939
-0.358620	the system, the more	-0.124939
-0.581412	zero that is more	-0.124939
-1.446533	that it is more	-0.124939
-1.538239	then it is more	-0.124939
-1.131510	where it is more	-0.124939
-0.562450	while it is more	-0.124939
-0.562450	Obviously, it is more	-0.124939
-0.581035	Vectorized code is more	-0.124939
-0.994135	The compiler is more	-0.124939
-0.978777	code. It is more	-0.124939
-0.900176	time. It is more	-0.124939
-0.978777	used. It is more	-0.124939
-0.533862	Windows. It is more	-0.124939
-0.775964	integers. It is more	-0.124939
-0.533862	throw. It is more	-0.124939
-0.533862	pooling. It is more	-0.124939
-0.533862	queue. It is more	-0.124939
-0.509713	numerical data is more	-0.124939
-0.573987	calling program is more	-0.124939
-0.577260	polymorphism, which is more	-0.124939
-0.810024	template class is more	-0.124939
-1.155109	that there is more	-0.124939
-1.450588	if there is more	-0.124939
-0.636183	data member is more	-0.124939
-0.346924	function libraries is more	-0.124939
-0.509713	file access is more	-0.124939
-0.346924	vector operations is more	-0.124939
-0.636183	composite type is more	-0.124939
-0.448417	64-bit Linux is more	-0.124939
-0.346924	Address calculation is more	-0.124939
-0.834643	the solution is more	-0.124939
-0.346924	bitwise operators is more	-0.124939
-0.346924	uncached write is more	-0.124939
-0.789470	one operand is more	-0.124939
-0.448417	The situation is more	-0.124939
-0.488305	Parameter transfer is more	-0.124939
-0.448417	memory blocks is more	-0.124939
-0.699276	The latter is more	-0.124939
-0.448417	code. #if is more	-0.124939
-0.346924	where pre-increment is more	-0.124939
-0.346924	= *(p++) is more	-0.124939
-0.346924	= array[i++] is more	-0.124939
-1.178046	lead to a more	-0.124939
-0.581927	same in a more	-0.124939
-1.039808	implemented in a more	-0.124939
-0.581927	likely in a more	-0.124939
-0.357418	90 Gives a more	-0.124939
-0.355218	becoming more and more	-0.124939
-0.521832	generally faster and more	-0.124939
-0.355218	become bigger and more	-0.124939
-0.499834	code smaller and more	-0.124939
-0.521832	less clear and more	-0.124939
-0.458922	32-bit mode, and more	-0.124939
-0.355218	more expensive and more	-0.124939
-0.355218	both cheaper and more	-0.124939
-0.572497	is accessed in more	-0.124939
-0.462189	b. But in more	-0.124939
-0.566192	is described in more	-0.124939
-0.546054	array and for more	-0.124939
-0.756873	same register for more	-0.124939
-0.355895	page 31 for more	-0.124939
-0.355895	and 119 for more	-0.124939
-0.355895	specific literature for more	-0.124939
-1.551905	it may be more	-0.124939
-0.810736	It may be more	-0.124939
-0.523224	may possibly be more	-0.124939
-0.580177	Leaf functions are more	-0.124939
-0.588497	Fortunately, there are more	-0.124939
-0.354473	of development are more	-0.124939
-0.354473	Context switches are more	-0.124939
-0.822638	point comparisons are more	-0.124939
-0.354473	(chapter 12) are more	-0.124939
-0.300750	and one or more	-0.124939
-0.300750	using one or more	-0.124939
-0.300750	read one or more	-0.124939
-0.300750	enable one or more	-0.124939
-0.066605	is two or more	-0.124939
-0.032027	are two or more	-0.124939
-0.032027	have two or more	-0.124939
-0.066605	between two or more	-0.124939
-0.066605	doing two or more	-0.124939
-0.066605	Make two or more	-0.124939
-0.343010	128 bytes or more	-0.124939
-0.343010	point expressions or more	-0.124939
-0.343010	templates. Two or more	-0.124939
-0.570173	pointers makes it more	-0.124939
-1.355565	be replaced by more	-0.124939
-0.357048	be increased by more	-0.124939
-0.356913	powerful computers with more	-0.124939
-0.722094	are satisfied with more	-0.124939
-0.333220	make the code more	-0.124939
-1.225376	makes the code more	-0.124939
-0.923639	floating point code more	-0.124939
-0.451490	the source code more	-0.124939
-0.349354	position- independent code more	-0.124939
-0.544046	loader will have more	-0.124939
-0.561590	if functions have more	-0.124939
-0.497745	developers typically have more	-0.124939
-0.563156	framework may use more	-0.124939
-0.522572	many programs use more	-0.124939
-0.345126	Instruction sets A more	-0.124939
-0.345126	is enabled. A more	-0.124939
-0.345126	garbage collection. A more	-0.124939
-0.345126	to date. A more	-0.124939
-0.345126	"__attribute__((visibility ("hidden")))". A more	-0.124939
-0.345126	and _mm_free. A more	-0.124939
-0.462435	will run at more	-0.124939
-0.588496	making the data more	-0.124939
-0.354963	for making data more	-0.124939
-1.063759	make the program more	-0.124939
-0.895132	compiler to make more	-0.124939
-0.882319	cache is used more	-0.124939
-0.357629	by adding one more	-0.124939
-0.582208	priority is no more	-0.124939
-0.563193	preferably have no more	-0.124939
-0.532974	object takes no more	-0.124939
-0.492353	calculations take no more	-0.124939
-1.138963	compiler to do more	-0.124939
-1.258847	able to do more	-0.124939
-0.450883	order or do more	-0.124939
-0.517586	way that takes more	-0.124939
-0.580054	Often, it takes more	-0.124939
-0.333095	linked list takes more	-0.124939
-0.489738	integer-to-float conversion takes more	-0.124939
-0.333095	runtime DLL takes more	-0.124939
-0.356872	instructions SSE4.1 some more	-0.124939
-0.460790	of making software more	-0.124939
-0.524140	individual array elements more	-0.124939
-0.356879	of software. For more	-0.124939
-0.499418	process to take more	-0.124939
-0.302133	itself and take more	-0.124939
-0.538149	it can take more	-0.124939
-0.339821	calculations may take more	-0.124939
-0.339821	process may take more	-0.124939
-0.427440	mathematical functions take more	-0.124939
-0.302133	in C++ take more	-0.124939
-0.302133	can sometimes take more	-0.124939
-1.101356	It is often more	-0.124939
-0.516009	system is often more	-0.124939
-0.356316	makes detailed optimization more	-0.124939
-0.346564	it is even more	-0.124939
-0.447962	cases. An even more	-0.124939
-0.459788	loop takes up more	-0.124939
-0.856451	makes function calls more	-0.124939
-0.325392	it is much more	-0.124939
-0.325392	effect is much more	-0.124939
-0.250407	where a much more	-0.124939
-0.312270	memory takes much more	-0.124939
-0.312270	often takes much more	-0.124939
-0.250407	typically take much more	-0.124939
-0.250407	are often much more	-0.124939
-0.250407	typically uses much more	-0.124939
-0.250407	be made much more	-0.124939
-0.250407	can obtain much more	-0.124939
-0.555962	and is therefore more	-0.124939
-0.544429	make and therefore more	-0.124939
-0.354849	101 Multithreading works more	-0.124939
-1.511443	can be calculated more	-0.124939
-0.754383	the address calculation more	-0.124939
-0.310816	big and uses more	-0.124939
-0.310816	but it uses more	-0.124939
-0.310816	an int uses more	-0.124939
-0.676966	the program uses more	-0.124939
-0.569062	ways to get more	-0.124939
-0.581245	SSSE3 a few more	-0.124939
-0.606020	few clock cycles more	-0.425969
-0.353409	non-Intel CPUs was more	-0.124939
-0.671749	makes data caching more	-0.124939
-0.572172	code makes caching more	-0.124939
-0.352836	makes program development more	-0.124939
-0.275438	The code becomes more	-0.124939
-0.392859	memory space becomes more	-0.124939
-0.530093	time is actually more	-0.124939
-0.567063	need to load more	-0.124939
-0.516421	make function calling more	-0.124939
-1.071331	can be made more	-0.124939
-0.493814	slower or require more	-0.124939
-0.350214	that can go more	-0.124939
-0.451889	computers have become more	-0.124939
-0.286464	because it gives more	-0.124939
-0.286464	it often gives more	-0.124939
-0.286464	VIA CPUs" gives more	-0.124939
-0.349724	This makes inlining more	-0.124939
-0.349463	can also find more	-0.124939
-0.988376	the assembly output more	-0.124939
-0.403822	zero is sometimes more	-0.124939
-0.403822	macros are sometimes more	-0.124939
-0.346887	range is possibly more	-0.124939
-0.314278	with a little more	-0.124939
-0.314278	becomes a little more	-0.124939
-0.434730	Try to allocate more	-0.124939
-0.248325	Does not allocate more	-0.124939
-0.248325	to even allocate more	-0.124939
-0.130895	it is slightly more	-0.124939
-0.130895	latter is slightly more	-0.124939
-0.060535	takes only slightly more	-0.124939
-0.152781	make Sum1 slightly more	-0.124939
-0.341160	relocated (rebased) once more	-0.124939
-0.338912	very well spend more	-0.124939
-0.735430	floating point comparisons more	-0.124939
-0.335649	same subexpression occurs more	-0.124939
-0.335790	cache cannot prefetch more	-0.124939
-0.428970	scanners to consume more	-0.124939
-0.343262	devices are becoming more	-0.124939
-0.237571	is therefore becoming more	-0.124939
-0.331121	invest in ever more	-0.124939
-0.314008	In some programs, more	-0.124939
-0.574255	rather than allocating more	-0.124939
-0.628416	it is certainly more	-0.124939
-0.381987	worthwhile to invest more	-0.124939
-0.293524	check makes dynamic_cast more	-0.124939
-0.293524	could be achieved more	-0.124939
-0.293524	to be cached more	-0.124939
-0.293524	method is somewhat more	-0.124939
-0.237299	profiler may sample more	-0.124939
-0.237299	code" actually implies more	-0.124939
-0.237299	it takes 40% more	-0.124939
-0.882320	be aware of when	-0.124939
-0.658239	keep track of when	-0.124939
-0.521223	virtual functions or when	-0.124939
-0.353405	as possible or when	-0.124939
-0.110776	64-bit mode or when	-0.602060
-0.357003	the power function when	-0.124939
-0.461189	the pow function when	-0.124939
-0.462577	for details on when	-0.124939
-0.570177	without position-independent code when	-0.124939
-0.501769	in vectorized code when	-0.124939
-0.357891	level-2 cache as when	-0.124939
-1.521705	more efficient than when	-0.124939
-0.586270	run faster than when	-0.124939
-0.354144	array index than when	-0.124939
-0.358047	stand alone compiler when	-0.124939
-0.358353	the object x when	-0.124939
-0.574091	virtualization. The time when	-0.124939
-1.027465	a long time when	-0.124939
-0.851436	no extra time when	-0.124939
-0.745188	the execution time when	-0.124939
-0.348645	develop- ment time when	-0.124939
-0.852359	free the memory when	-0.124939
-0.458348	loaded into memory when	-0.124939
-0.357525	a big program when	-0.124939
-0.316020	evaluate a only when	-0.124939
-0.495941	precedence, not only when	-0.124939
-0.721653	be used only when	-0.124939
-0.316020	register size only when	-0.124939
-0.316020	new branch only when	-0.124939
-0.316020	use AVX only when	-0.124939
-0.316020	be loaded only when	-0.124939
-0.316020	is chosen only when	-0.124939
-0.316020	2 applies only when	-0.124939
-0.105229	is mispredicted only when	-0.602060
-0.316020	is evaluated only when	-0.124939
-0.316020	the services only when	-0.124939
-0.873116	process is used when	-0.124939
-1.394899	may be used when	-0.124939
-0.810310	is also used when	-0.124939
-0.578653	possible instruction set when	-0.124939
-0.578653	newer instruction set when	-0.124939
-0.594558	thing to do when	-0.124939
-0.444541	program, for example when	-0.124939
-0.444541	loop, for example when	-0.124939
-0.357203	the default size when	-0.124939
-0.357263	to evaluate b when	-0.124939
-0.064903	large positive number when	-0.124939
-0.357514	simply put there when	-0.124939
-0.559898	main, but also when	-0.124939
-0.439952	1.5f; is efficient when	-0.124939
-0.816281	code more efficient when	-0.124939
-0.556490	becomes more efficient when	-0.124939
-1.335384	is less efficient when	-0.124939
-1.158684	is not possible when	-0.124939
-1.232639	powers of 2 when	-0.124939
-0.587828	case is faster when	-0.124939
-0.516723	will be faster when	-0.124939
-0.576870	destructor is called when	-0.124939
-0.797489	must be called when	-0.124939
-0.356584	code is critical when	-0.124939
-0.593401	valid. For example, when	-0.124939
-0.356122	parameter comes first when	-0.124939
-0.355941	most other libraries when	-0.124939
-0.558110	bit vector registers when	-0.124939
-0.576496	things to test when	-0.124939
-1.310730	in 32-bit systems when	-0.124939
-0.929077	This is useful when	-0.124939
-1.569098	can be useful when	-0.124939
-0.550436	operations are useful when	-0.124939
-0.503792	is very useful when	-0.124939
-0.292633	into memory even when	-0.124939
-0.292633	different objects even when	-0.124939
-0.292633	dispatch mechanism even when	-0.124939
-0.292633	are needed even when	-0.124939
-0.292633	always inlined even when	-0.124939
-0.292633	be used, even when	-0.124939
-0.292633	by default, even when	-0.124939
-0.292633	memory space, even when	-0.124939
-0.536933	single executable file when	-0.124939
-0.459269	and 512 bits when	-0.124939
-0.487413	with vector operations when	-0.124939
-0.487413	using vector operations when	-0.124939
-1.095764	in most cases when	-0.124939
-0.355363	at inconvenient times when	-0.124939
-1.013426	to the stack when	-0.124939
-0.880186	what you want when	-0.124939
-0.354967	methods also work when	-0.124939
-0.568256	and is compiled when	-0.124939
-0.499223	It is best when	-0.124939
-1.109799	is not necessary when	-0.124939
-0.568915	preferred programming language when	-0.124939
-0.355041	i<n; ++i). But when	-0.124939
-0.537466	transpose the matrix when	-0.124939
-0.809950	transpose a matrix when	-0.124939
-0.712937	and double precision when	-0.124939
-0.496658	than double precision when	-0.124939
-0.354306	line by line when	-0.124939
-0.842603	therefore be advantageous when	-0.124939
-0.820631	is a problem when	-0.124939
-0.539486	have this problem when	-0.124939
-0.582117	time to calculate when	-0.124939
-0.570465	as loop counter when	-0.124939
-0.353487	load several files when	-0.124939
-1.475500	dynamic memory allocation when	-0.124939
-0.352809	of CPU-intensive programs when	-0.124939
-0.456227	schemes cause problems when	-0.124939
-0.331861	it goes automatically when	-0.124939
-0.331861	or update automatically when	-0.124939
-0.352826	a different implementation when	-0.124939
-0.428690	faster than signed when	-0.124939
-0.331234	// Use signed when	-0.124939
-0.520604	is a disadvantage when	-0.124939
-0.352380	background process running when	-0.124939
-1.012503	in the end when	-0.124939
-0.351347	skip large expressions when	-0.124939
-0.453937	name. #define directives when	-0.124939
-0.351515	do cross-module optimizations when	-0.124939
-0.454225	on some microprocessors when	-0.124939
-0.350565	for each process when	-0.124939
-0.351318	has many advantages when	-0.124939
-0.350816	produce 32 results when	-0.124939
-0.349924	assembly language modules when	-0.124939
-0.212837	size is relevant when	-0.124939
-0.212837	speed is relevant when	-0.124939
-0.290204	possibly be relevant when	-0.124939
-0.225266	be allocated dynamically when	-0.425969
-0.849855	definitely be avoided when	-0.124939
-0.348864	comparisons are inefficient when	-0.124939
-0.348759	considerable delay comes when	-0.124939
-0.347809	for this task when	-0.124939
-0.546880	efficiency is obtained when	-0.124939
-0.303248	of the counters when	-0.124939
-0.912031	performance monitor counters when	-0.124939
-0.823745	works most efficiently when	-0.124939
-0.198290	much less efficiently when	-0.124939
-0.198290	somewhat less efficiently when	-0.124939
-0.243025	that is initialized when	-0.124939
-0.243025	table is initialized when	-0.124939
-0.409051	to be initialized when	-0.124939
-0.345799	code slower, especially when	-0.124939
-1.112765	an error message when	-0.124939
-0.345799	static library, except when	-0.124939
-0.344499	to execute CriticalFunction when	-0.124939
-0.511230	a performance penalty when	-0.124939
-0.344029	list[i] is invalid when	-0.124939
-0.440631	and fine-grained parallelism when	-0.124939
-0.149491	take into account when	-0.124939
-0.078951	problems into account when	-0.124939
-0.024756	taken into account when	-0.301030
-0.340554	is inefficient, however, when	-0.124939
-0.224376	becomes more fragmented when	-0.124939
-0.298632	space becomes fragmented when	-0.124939
-0.298632	easily become fragmented when	-0.124939
-0.340554	and mouse inputs when	-0.124939
-0.488071	version is preferred when	-0.124939
-0.340554	the space explicitly when	-0.124939
-0.420296	library is resolved when	-0.124939
-0.268697	is not resolved when	-0.124939
-0.337979	shared object. Likewise, when	-0.124939
-0.437702	the program itself when	-0.124939
-0.338199	possibly more serious when	-0.124939
-0.865493	Microsoft Visual Studio when	-0.124939
-0.351287	over the disadvantages when	-0.124939
-0.268322	overcome these disadvantages when	-0.124939
-0.337979	make an update when	-0.124939
-0.497079	start garbage collection when	-0.124939
-0.433508	exception is costly when	-0.124939
-0.335079	slower than truncation when	-0.124939
-0.334824	normal. This happens when	-0.124939
-0.433187	at different places when	-0.124939
-0.007032	The keyword static, when	-0.726999
-0.187768	it is deallocated when	-0.124939
-0.187768	they are deallocated when	-0.124939
-0.187768	is automatically deallocated when	-0.124939
-0.335079	this is permissible when	-0.124939
-0.330748	are less strict when	-0.124939
-0.330748	a viable compromise when	-0.124939
-0.330445	frequency is increased when	-0.124939
-0.465649	easier to understand when	-0.124939
-0.323959	much more dramatic when	-0.124939
-0.324332	come into force when	-0.124939
-0.419603	syntax is simpler when	-0.124939
-0.406431	to is deleted when	-0.124939
-0.313358	code motion manually when	-0.124939
-0.627077	compiler option -fno-pic when	-0.124939
-0.313358	the class Vec16s when	-0.124939
-0.406431	becomes more readable when	-0.124939
-0.406431	allocation is negligible when	-0.124939
-0.313358	or re- allocating when	-0.124939
-0.313358	information about Func1 when	-0.124939
-0.172009	memcpy function implicitly when	-0.124939
-0.172009	are done implicitly when	-0.124939
-0.406431	will be evicted when	-0.124939
-0.292904	space is freed when	-0.124939
-0.381227	and all 0's when	-0.124939
-0.292904	can be bypassed when	-0.124939
-0.292904	program is achieved when	-0.124939
-0.292904	must be careful when	-0.124939
-0.292904	and array indices when	-0.124939
-0.292904	memory requirement. Useful when	-0.124939
-0.292904	of the question when	-0.124939
-0.292904	floating point precisions when	-0.124939
-0.381227	is all 1's when	-0.124939
-0.292904	chains is stronger when	-0.124939
-0.381227	or four float's when	-0.124939
-0.236755	the loop exits, when	-0.124939
-0.236755	remarkably in popularity when	-0.124939
-0.236755	on the processor) when	-0.124939
-0.236755	and to Eclipse when	-0.124939
-0.236755	operating systems disappears when	-0.124939
-0.236755	more than 33% when	-0.124939
-0.236755	the memory released when	-0.124939
-0.236755	high and decreased when	-0.124939
-0.236755	to const definitions when	-0.124939
-1.846238	the value of A	-0.124939
-0.681729	the calculation of A	-0.301030
-0.355717	double Z = A	-0.124939
-0.459556	C; x.abc = A	-0.124939
-0.355717	double A2 = A	-0.124939
-1.031167	in Gnu compiler A	-0.124939
-1.036190	1; } } A	-0.124939
-0.452073	: b; } A	-0.124939
-1.160584	+ 2; } A	-0.124939
-0.641830	+= 1.0f; } A	-0.124939
-1.089064	Alignment of data A	-0.124939
-1.237794	code and data A	-0.124939
-0.520319	instead of functions A	-0.124939
-0.354186	them. Pure functions A	-0.124939
-1.252610	the innermost loop A	-0.124939
-0.310237	of const double A	-0.124939
-0.310237	variables const double A	-0.124939
-0.355703	the critical code. A	-0.124939
-0.674439	of the time. A	-0.124939
-0.756556	at a time. A	-0.124939
-0.596384	amount of time. A	-0.124939
-0.354475	the same time. A	-0.124939
-0.771694	at compile time. A	-0.124939
-0.298183	total calculation time. A	-0.124939
-0.756698	7.9 Smart pointers A	-0.124939
-0.138599	one other function. A	-0.124939
-0.138599	any other function. A	-0.124939
-0.365173	or member functions. A	-0.124939
-0.513786	non-static member functions. A	-0.124939
-0.402873	useful mathematical functions. A	-0.124939
-0.283594	from string functions. A	-0.124939
-0.369845	and frame functions. A	-0.124939
-0.283594	use thread-safe functions. A	-0.124939
-0.716130	a = b; A	-0.124939
-0.353458	of main memory. A	-0.124939
-0.524084	precision is used. A	-0.124939
-0.581358	is never used. A	-0.124939
-0.581358	no longer used. A	-0.124939
-0.352540	13.1. Instruction sets A	-0.124939
-0.527367	the 64-bit systems. A	-0.124939
-0.428184	some embedded systems. A	-0.124939
-0.517639	Pointer type conversion A	-0.124939
-0.423445	organizing the data. A	-0.124939
-0.491231	list of data. A	-0.124939
-0.285829	storing user data. A	-0.124939
-0.285829	or input data. A	-0.124939
-0.577705	each instruction set. A	-0.124939
-0.515542	with Intel processors. A	-0.124939
-0.350273	constants. Register storage A	-0.124939
-0.517982	that are called. A	-0.124939
-0.543632	or a pointer. A	-0.124939
-0.347969	and unsigned variables. A	-0.124939
-0.347716	Make functions local A	-0.124939
-0.346858	that calls it. A	-0.124939
-0.346858	copied into registers. A	-0.124939
-0.875777	in 64-bit mode. A	-0.124939
-0.346858	for each object. A	-0.124939
-0.346858	a static library. A	-0.124939
-0.346858	sequences of operations. A	-0.124939
-0.346717	needs careful optimization. A	-0.124939
-0.452753	difference in performance. A	-0.124939
-0.306981	for improved performance. A	-0.124939
-0.346012	of static libraries. A	-0.124939
-1.125832	on the stack. A	-0.124939
-0.345860	it is possible. A	-0.124939
-0.346165	into one thread. A	-0.124939
-0.632476	the AVX instructions. A	-0.124939
-0.345015	the other way. A	-0.124939
-0.217502	is predicted well. A	-0.124939
-0.217502	not predicted well. A	-0.124939
-0.343639	a different address. A	-0.124939
-0.723586	Constructors and destructors A	-0.124939
-0.543292	optimization is enabled. A	-0.124939
-0.264465	it points to. A	-0.124939
-0.377869	r points to. A	-0.124939
-0.447845	allocated memory block. A	-0.124939
-0.286422	the next block. A	-0.124939
-0.442668	is particularly critical. A	-0.124939
-0.627356	Performance and usability A	-0.124939
-0.340329	an output file. A	-0.124939
-0.440387	accessed. Pointer arithmetic A	-0.124939
-0.340555	Programmable logic devices A	-0.124939
-0.097754	same memory space. A	-0.124939
-0.097754	takes memory space. A	-0.124939
-0.479822	of cache space. A	-0.124939
-0.340103	byte of zero. A	-0.124939
-0.340103	of your software. A	-0.124939
-0.340329	organized into vectors. A	-0.124939
-0.337723	of template parameters. A	-0.124939
-0.433040	efficiency is important. A	-0.124939
-0.268166	becoming increasingly important. A	-0.124939
-0.436826	3.14 Context switches A	-0.124939
-0.350834	the hard disk. A	-0.124939
-0.267947	a floppy disk. A	-0.124939
-0.882415	in this case. A	-0.124939
-0.337980	templates for polymorphism A	-0.124939
-0.337980	or full speed. A	-0.124939
-0.334570	on every call. A	-0.124939
-0.389853	of branch prediction. A	-0.124939
-0.273666	about branch prediction. A	-0.124939
-0.504678	all data members. A	-0.124939
-0.432868	mechanisms explained above. A	-0.124939
-0.334867	the same result. A	-0.124939
-0.854111	loop-carried dependency chain. A	-0.124939
-0.613015	at inconvenient times. A	-0.124939
-0.433241	simple integer counter. A	-0.124939
-0.334867	no other branches. A	-0.124939
-0.500805	between CPU cores. A	-0.124939
-0.427387	using a profiler. A	-0.124939
-0.330547	cases. 7.28 Templates A	-0.124939
-0.496060	and time consuming. A	-0.124939
-0.330193	justify the method. A	-0.124939
-0.465305	and development tools. A	-0.124939
-0.604854	in one operation. A	-0.124939
-0.486088	loop unroll factor. A	-0.124939
-0.604190	for each process. A	-0.124939
-0.330193	is not necessary. A	-0.124939
-0.427387	sin. Pointer elimination A	-0.124939
-0.330547	for finding elements. A	-0.124939
-0.330547	is not doubled. A	-0.124939
-0.330193	be mentioned here: A	-0.124939
-0.648671	of an exception. A	-0.124939
-0.324146	doesn't need initialization. A	-0.124939
-0.419294	access. 3.10 Graphics A	-0.124939
-0.324146	a simple index. A	-0.124939
-0.324146	a few lines. A	-0.124939
-0.323710	other hardware conditions. A	-0.124939
-0.456480	for an example. A	-0.124939
-0.092936	are very expensive. A	-0.124939
-0.092936	also very expensive. A	-0.124939
-0.323710	in two versions. A	-0.124939
-0.323710	logically distinct tasks. A	-0.124939
-0.679992	graphical user interface. A	-0.124939
-0.592085	of 4 floats A	-0.124939
-0.592893	Access data sequentially A	-0.124939
-0.710834	long dependency chains. A	-0.124939
-0.626579	invariant code motion A	-0.124939
-0.442160	and garbage collection. A	-0.124939
-0.313116	instead of int. A	-0.124939
-0.313116	a const reference. A	-0.124939
-0.626579	more error prone. A	-0.124939
-0.380944	53. 7.24 Unions A	-0.124939
-0.292673	the constant subexpression. A	-0.124939
-0.292673	same bits differently. A	-0.124939
-0.380944	well, of course. A	-0.124939
-0.292673	be avoided. 37 A	-0.124939
-0.536160	up to date. A	-0.124939
-0.536160	(see page 78). A	-0.124939
-0.102365	profiling and debugging. A	-0.124939
-0.102365	incompatible with debugging. A	-0.124939
-0.292673	time to load. A	-0.124939
-0.380944	variables (see below). A	-0.124939
-0.380944	is fast enough. A	-0.124939
-0.292673	branches works correctly. A	-0.124939
-0.380944	well. Codeplay VectorC A	-0.124939
-0.292673	support static linking. A	-0.124939
-0.536160	are mutually incompatible. A	-0.124939
-0.536160	performance monitor counters. A	-0.124939
-0.536160	and VIA CPUs". A	-0.124939
-0.292673	0 n! 117 A	-0.124939
-0.236552	of memory blocks. A	-0.124939
-0.236552	to do so). A	-0.124939
-0.236552	to be moved. A	-0.124939
-0.236552	the function body. A	-0.124939
-0.236552	are often mispredicted. A	-0.124939
-0.236552	the object owns. A	-0.124939
-0.236552	a good investment. A	-0.124939
-0.236552	on page 153. A	-0.124939
-0.236552	any other constructors. A	-0.124939
-0.236552	on page 107. A	-0.124939
-0.236552	of jump targets. A	-0.124939
-0.236552	in example 7.32b. A	-0.124939
-0.236552	need a constructor. A	-0.124939
-0.236552	the first sub-vector. A	-0.124939
-0.236552	and microprocessor microarchitecture. A	-0.124939
-0.236552	or "__attribute__((visibility ("hidden")))". A	-0.124939
-0.236552	a device driver. A	-0.124939
-0.236552	types (See Sutter: A	-0.124939
-0.236552	Foundation Classes (MFC). A	-0.124939
-0.236552	used without restrictions. A	-0.124939
-0.236552	memset(list, 0, sizeof(list)); A	-0.124939
-0.236552	in ASCII form. A	-0.124939
-0.236552	software was developed. A	-0.124939
-0.236552	Booth: "Inner Loops: A	-0.124939
-0.236552	for millisecond resolution. A	-0.124939
-0.236552	or NAN (Not A	-0.124939
-0.236552	some extra complications. A	-0.124939
-0.236552	use linked lists. A	-0.124939
-0.236552	1 0.5ns. 2GHz A	-0.124939
-0.236552	Template Library (WTL). A	-0.124939
-0.236552	|= 0x20; 46 A	-0.124939
-0.236552	also be considered. A	-0.124939
-0.236552	limited in scope. A	-0.124939
-0.236552	inputs give infinity. A	-0.124939
-0.236552	_mm_malloc and _mm_free. A	-0.124939
-0.236552	(&ArraySize) is taken. A	-0.124939
-0.236552	than it says. A	-0.124939
-0.236552	for details (www.agner.org/optimize/testp.zip). A	-0.124939
-0.236552	when it changes. A	-0.124939
-0.236552	interpreter for Basic. A	-0.124939
-0.236552	often have exploited. A	-0.124939
-0.236552	it is servicing. A	-0.124939
-0.236552	N-1 is inferior. A	-0.124939
-0.236552	having different types. A	-0.124939
-0.236552	make a destructor. A	-0.124939
-0.236552	for this reason. A	-0.124939
-0.236552	this instruction set?". A	-0.124939
-0.236552	a specific interval. A	-0.124939
-0.236552	a polynomial. Scheduling A	-0.124939
-0.236552	i_div_3; } 138 A	-0.124939
-0.236552	multidimensional structure needed? A	-0.124939
-0.236552	thread increments seconds. A	-0.124939
-0.598543	Value of a will	-0.124939
-0.877834	certain that a will	-0.124939
-0.826741	static memory and will	-0.124939
-0.358375	<< x.f; // will	-0.124939
-0.497064	-fpic and it will	-0.124939
-0.497064	inefficient, and it will	-0.124939
-0.586738	sure that it will	-0.124939
-0.478305	set then it will	-0.124939
-0.478305	cycles then it will	-0.124939
-0.478305	debugger then it will	-0.124939
-0.176387	on, then it will	-0.425969
-0.575713	section, but it will	-0.124939
-0.692190	first call it will	-0.124939
-0.791881	this case it will	-0.124939
-0.562000	references. Therefore, it will	-0.124939
-0.343761	or created it will	-0.124939
-0.343761	in all, it will	-0.124939
-0.544211	The library function will	-0.124939
-0.826136	a virtual function will	-0.124939
-0.521079	The dispatcher function will	-0.124939
-0.354705	the DelayFiveSeconds function will	-0.124939
-1.110903	then the code will	-0.124939
-0.568715	but the code will	-0.124939
-0.568715	Now the code will	-0.124939
-0.576654	specified. The code will	-0.124939
-0.506720	n;} This code will	-0.124939
-0.752033	the optimized code will	-0.124939
-0.549087	The above code will	-0.124939
-0.347659	This above code will	-0.124939
-0.344864	The resulting code will	-0.124939
-0.344864	the resultant code will	-0.124939
-0.520694	dispatcher function. This will	-0.124939
-0.442173	class definition. This will	-0.124939
-0.341973	null reference. This will	-0.124939
-0.442173	or inline. This will	-0.124939
-0.341973	never changed. This will	-0.124939
-0.341973	time slices. This will	-0.124939
-0.341973	or -axAVX. This will	-0.124939
-0.341973	their functionality. This will	-0.124939
-0.341973	size (4096). This will	-0.124939
-0.341973	page 87. This will	-0.124939
-0.341973	of -fpic. This will	-0.124939
-0.558708	whether the compiler will	-0.124939
-0.820324	cases, the compiler will	-0.124939
-0.558708	++b; the compiler will	-0.124939
-0.428486	called. The compiler will	-0.124939
-0.428486	enabled. The compiler will	-0.124939
-0.428486	vectorization. The compiler will	-0.124939
-0.428486	division. The compiler will	-0.124939
-0.428486	1.2345); The compiler will	-0.124939
-0.428486	/arch:SSE2. The compiler will	-0.124939
-0.428486	3.0; The compiler will	-0.124939
-0.328891	predict which compiler will	-0.124939
-0.716163	the Gnu compiler will	-0.124939
-0.234038	A good compiler will	-0.124939
-0.498167	an optimizing compiler will	-0.124939
-0.525105	clumsy, as you will	-0.124939
-0.627150	the compiler you will	-0.124939
-0.516133	addition then you will	-0.124939
-0.516133	models then you will	-0.124939
-0.516133	differ then you will	-0.124939
-0.481860	course, because you will	-0.124939
-0.481860	code, so you will	-0.124939
-0.442534	your program, you will	-0.124939
-0.342260	the compiler, you will	-0.124939
-0.354588	used and this will	-0.124939
-0.354588	integer and this will	-0.124939
-0.572765	-mcmodel=large, but this will	-0.124939
-0.350013	compilers. // It will	-0.124939
-0.496736	in memory. It will	-0.124939
-0.350013	operating system. It will	-0.124939
-0.350013	both positive. It will	-0.124939
-0.578835	amounts of memory will	-0.124939
-0.354991	time. No memory will	-0.124939
-0.841217	and the program will	-0.124939
-0.570022	Otherwise the program will	-0.124939
-0.561140	below. The program will	-0.124939
-0.697104	The application program will	-0.124939
-0.634304	the entire program will	-0.124939
-0.574493	independently. The CPU will	-0.124939
-0.874674	that the loop will	-0.124939
-0.561553	loop. The loop will	-0.124939
-0.449053	then this loop will	-0.124939
-0.347427	the whole loop will	-0.124939
-0.636338	of i which will	-0.124939
-0.347003	program optimization, which will	-0.124939
-0.347003	to -56 which will	-0.124939
-0.347003	in a[] which will	-0.124939
-0.357716	register variables, but will	-0.124939
-0.522617	that the cache will	-0.124939
-0.522617	28, the cache will	-0.124939
-0.357447	A negative integer will	-0.124939
-0.656759	the same class will	-0.124939
-0.500447	on, the compilers will	-0.124939
-0.457274	precision. The compilers will	-0.124939
-0.632598	while other compilers will	-0.124939
-0.409744	Fortunately, most compilers will	-0.124939
-0.376208	that some compilers will	-0.124939
-0.365387	unrolling Some compilers will	-0.124939
-0.365387	line. Some compilers will	-0.124939
-0.365387	division. Some compilers will	-0.124939
-0.365387	two. Some compilers will	-0.124939
-0.376208	all good compilers will	-0.124939
-0.409744	All optimizing compilers will	-0.124939
-0.240581	memory. Most compilers will	-0.124939
-0.240581	cache. Most compilers will	-0.124939
-0.240581	variable. Most compilers will	-0.124939
-0.240581	sets. Most compilers will	-0.124939
-0.240581	47 Most compilers will	-0.124939
-0.288805	processor). Optimizing compilers will	-0.124939
-0.376208	that future compilers will	-0.124939
-0.520367	Value of b will	-0.124939
-1.561420	a and b will	-0.124939
-1.293138	vector class library will	-0.124939
-1.072698	value of i will	-0.124939
-0.549767	areas, and there will	-0.124939
-0.357482	Func(ab[i].a); } There will	-0.124939
-0.830202	a linear array will	-0.124939
-0.461384	and this value will	-0.124939
-1.156555	variables and objects will	-0.124939
-0.534711	result then we will	-0.124939
-0.437753	1.23456. But we will	-0.124939
-0.338460	by four, we will	-0.124939
-0.338460	back. Thus, we will	-0.124939
-1.036887	to do so will	-0.124939
-0.356859	predict which variables will	-0.124939
-0.357181	to me. You will	-0.124939
-0.465807	way a branch will	-0.124939
-0.465807	Such a branch will	-0.124939
-0.524876	these eight elements will	-0.124939
-0.546245	the 64-bit systems will	-0.124939
-0.572564	and the user will	-0.124939
-0.914302	the end user will	-0.124939
-0.355821	the code 16 will	-0.124939
-0.446975	closed. The file will	-0.124939
-0.775891	appropriate header file will	-0.124939
-0.573191	time of programming will	-0.124939
-0.355847	brand. Future processors will	-0.124939
-0.333774	or not. I will	-0.124939
-0.333774	model. Instead, I will	-0.124939
-0.333774	0x800 apart. I will	-0.124939
-1.252267	floating point calculations will	-0.124939
-0.838008	and the result will	-0.124939
-0.508146	profiler. The result will	-0.124939
-0.606750	the final result will	-0.124939
-0.458957	Such a processor will	-0.124939
-0.535534	because the threads will	-0.124939
-0.355058	the preferred language will	-0.124939
-0.563919	and the speed will	-0.124939
-0.484468	then each thread will	-0.124939
-0.484468	with another thread will	-0.124939
-0.484468	simultaneously. Each thread will	-0.124939
-0.499499	An integer overflow will	-0.124939
-0.852540	a cache line will	-0.124939
-0.354885	and my manual will	-0.124939
-0.651413	a & b; will	-0.124939
-0.457595	multiple overloaded operators will	-0.124939
-0.353124	The [] operator will	-0.124939
-0.353004	cases this multiplication will	-0.124939
-0.353230	eliminated. Code caching will	-0.124939
-0.518077	next processor model will	-0.124939
-0.353215	c; Here, y will	-0.124939
-0.455330	the above examples will	-0.124939
-0.351172	that such feature will	-0.124939
-0.350195	the same core will	-0.124939
-1.082601	the code section will	-0.124939
-0.785998	level-2 cache contentions will	-0.124939
-0.228682	version in main will	-0.124939
-0.228682	instance in main will	-0.124939
-0.313965	a. This operation will	-0.124939
-0.313965	the & operation will	-0.124939
-0.348818	correctly whether vectorization will	-0.124939
-0.348402	containing only constants will	-0.124939
-0.449060	scope. A macro will	-0.124939
-0.346657	core). The counters will	-0.124939
-0.446926	uncaught overflow condition will	-0.124939
-0.797465	number of cores will	-0.124939
-0.344543	assume that F1 will	-0.124939
-1.207386	the critical stride will	-0.124939
-0.516662	143. The trick will	-0.124939
-0.338928	more template instances will	-0.124939
-0.338482	1 to 127 will	-0.124939
-0.338779	address. The linker will	-0.124939
-0.434030	that the break will	-0.124939
-0.433814	that many users will	-0.124939
-0.330939	micro-op cache. Compilers will	-0.124939
-0.331144	1. Number 18 will	-0.124939
-0.457479	that the loader will	-0.124939
-0.778679	The heap manager will	-0.124939
-0.324698	column. Number 17 will	-0.124939
-0.152817	from address 0x2710 will	-0.425969
-0.573937	in example 14.28 will	-0.124939
-0.313834	14.23b and 14.30 will	-0.124939
-0.712629	0x2700 to 0x273F will	-0.124939
-0.314162	the constant 3.5 will	-0.124939
-0.293357	calculation of c+b will	-0.124939
-0.293357	they are disabled will	-0.124939
-0.293357	brand new today will	-0.124939
-0.293357	the static modifier will	-0.124939
-0.293357	value of b+c will	-0.124939
-0.237153	a = OneOrTwo5[b!=0]; will	-0.124939
-0.237153	(see page 103) will	-0.124939
-0.237153	that the producer will	-0.124939
-0.237153	operator (bitwise and) will	-0.124939
-0.237153	a = b++; will	-0.124939
-0.237153	For example, b*2.0/3.0 will	-0.124939
-0.576756	polymorphic function. The }	-0.124939
-0.566492	to virtual function }	-0.124939
-0.423216	} } } }	-0.124939
-0.311888	elements } } }	-0.124939
-0.616599	... } } }	-0.124939
-0.128528	swapd(a[r2][c2],a[c2][r2]); } } }	-0.124939
-0.311888	b[r][c]; } } }	-0.124939
-0.311888	transpose(matrix); } } }	-0.124939
-0.311888	b[r][c]); } } }	-0.124939
-0.311888	i++; } } }	-0.124939
-0.264018	swap elements } }	-0.124939
-0.705005	= i; } }	-0.124939
-0.219992	// ... } }	-0.124939
-0.219992	{ ... } }	-0.124939
-0.219992	FactorialTable[b]; ... } }	-0.124939
-0.143449	- 1; } }	-0.124939
-0.703678	+ 1; } }	-0.124939
-0.419731	+ 2; } }	-0.124939
-0.264018	of range } }	-0.124939
-0.264018	to 5 } }	-0.124939
-0.172278	i, a); } }	-0.425969
-0.112248	{ F2(b); } }	-0.124939
-0.112248	b[1000]; F2(b); } }	-0.124939
-0.112248	c); a.store(aa+i); } }	-0.124939
-0.112248	aa: a.store(aa+i); } }	-0.124939
-0.264018	Induction; Induction++; } }	-0.124939
-0.052507	{ swapd(a[r2][c2],a[c2][r2]); } }	-0.425969
-0.264018	+ i/2; } }	-0.124939
-0.264018	* c[i]); } }	-0.124939
-0.264018	= b[r][c]; } }	-0.124939
-0.264018	goto CFALSE; } }	-0.124939
-0.264018	goto DTRUE; } }	-0.124939
-0.264018	matrix[SIZE][SIZE]; transpose(matrix); } }	-0.124939
-0.264018	StoreNTD(&a[c][r], b[r][c]); } }	-0.124939
-0.264018	for-loop: i++; } }	-0.124939
-0.655423	// swap elements }	-0.124939
-0.566623	c = 0; }	-0.124939
-0.566623	d = 0; }	-0.124939
-0.102341	... return 0; }	-0.425969
-0.544923	Output array element }	-0.124939
-0.107408	largest_index = i; }	-0.124939
-0.107408	matrix[j][0] = i; }	-0.124939
-0.500824	unsigned int i; }	-0.124939
-0.036170	f; int i; }	-0.492916
-0.682910	a = b; }	-0.124939
-0.339576	a : b; }	-0.124939
-0.353801	WhateverFunction(i); // ... }	-0.124939
-0.425012	(...) { ... }	-0.124939
-0.270397	C1 x; ... }	-0.124939
-0.270397	= Func1(2); ... }	-0.124939
-0.270397	= log(2.0); ... }	-0.124939
-0.270397	= FactorialTable[b]; ... }	-0.124939
-0.353102	array index operator }	-0.124939
-0.280161	c = 1; }	-0.124939
-0.280161	d = 1; }	-0.124939
-0.034661	a - 1; }	-0.425969
-0.095577	a + 1; }	-0.249877
-0.218303	b + 1; }	-0.124939
-0.112616	x*x + 1; }	-0.124939
-0.034661	cout << 1; }	-0.425969
-0.159161	n >>= 1; }	-0.124939
-0.352978	used for multiplication }	-0.124939
-0.331373	a = c; }	-0.124939
-0.331373	1; return c; }	-0.124939
-0.495862	f is zero }	-0.124939
-0.455063	// Table lookup }	-0.124939
-0.351530	{ return x; }	-0.124939
-0.064256	list[i+2] = 2; }	-0.124939
-0.021522	r + 2; }	-0.124939
-0.010628	*p + 2; }	-0.124939
-0.021522	b[i] + 2; }	-0.124939
-0.021522	bb[i] + 2; }	-0.124939
-0.298885	a * 2; }	-0.425969
-0.027074	cout << 2; }	-0.425969
-1.018279	out of range }	-0.124939
-0.492633	count to 5 }	-0.124939
-0.452363	if both positive }	-0.124939
-0.347993	n to exponent }	-0.124939
-0.634228	a[i] = temp; }	-0.124939
-0.346017	i; return f; }	-0.124939
-0.152657	9 + 3; }	-0.124939
-0.304326	a * 3; }	-0.124939
-0.152657	i / 3; }	-0.124939
-0.152657	i % 3; }	-0.124939
-0.343059	} return y; }	-0.124939
-0.343180	// n factorial }	-0.124939
-0.028484	_mm_storeu_si128((__m128i *)d, x); }	-0.124939
-0.058967	_mm_store_si128((__m128i *)d, x); }	-0.124939
-0.335293	{ return 1.0; }	-0.124939
-0.128901	y + 1.; }	-0.124939
-0.130896	Func1(x) + 1.; }	-0.124939
-0.331120	f is nonzero }	-0.124939
-0.330909	B*x + C; }	-0.124939
-0.324935	3: printf("Delta"); break; }	-0.124939
-0.324416	of range"; 134 }	-0.124939
-0.209822	list[i].b = 2.0; }	-0.124939
-0.209822	temp->b = 2.0; }	-0.124939
-0.075655	list[i] += 1.0f; }	-0.124939
-0.457438	list[i+2] += i_div_3; }	-0.124939
-0.324416	powN<true,N-N1>::p(x); #undef N1 }	-0.124939
-0.015501	return _mm_loadu_si128((__m128i const*)p); }	-0.425969
-0.065627	return _mm_load_si128((__m128i const*)p); }	-0.124939
-0.313804	induction variable Z }	-0.124939
-0.006821	+ i, a); }	-0.301030
-0.293330	= &Object2; p->Hello(); }	-0.124939
-0.537311	y = cos(x); }	-0.124939
-0.293330	call to C1::f }	-0.124939
-0.293330	} return sum; }	-0.124939
-0.293330	x + 2.0f; }	-0.124939
-0.537311	out of range"; }	-0.124939
-0.293330	__rdtsc(); return clock; }	-0.124939
-0.293330	negative or -0 }	-0.124939
-0.102577	else { F2(b); }	-0.124939
-0.102577	float b[1000]; F2(b); }	-0.124939
-0.293330	FuncB(i); } FuncC(i); }	-0.124939
-0.381748	0) { FuncA(i); }	-0.124939
-0.293330	return (*CriticalFunction)(parm1, parm2); }	-0.124939
-0.293330	next four x^n }	-0.124939
-0.102577	(y) { F1(a); }	-0.124939
-0.102577	int a[1000]; F1(a); }	-0.124939
-0.102577	CriticalFunction = &CriticalFunction_386; }	-0.124939
-0.102577	version return &CriticalFunction_386; }	-0.124939
-0.048267	Friday) { DoThisThreeTimesAWeek(); }	-0.124939
-0.048267	Friday)) { DoThisThreeTimesAWeek(); }	-0.124939
-0.537311	y = sin(x); }	-0.124939
-0.102577	* c); a.store(aa+i); }	-0.124939
-0.102577	in aa: a.store(aa+i); }	-0.124939
-0.102577	CriticalFunction = &CriticalFunction_SSE2; }	-0.124939
-0.102577	supported return &CriticalFunction_SSE2; }	-0.124939
-0.293330	= Induction; Induction++; }	-0.124939
-0.023463	(*SelectAddMul_pointer)(aa, bb, cc); }	-0.124939
-0.023463	c2++) { swapd(a[r2][c2],a[c2][r2]); }	-0.425969
-0.293330	_mm_empty(); // EMMS }	-0.124939
-0.537311	memset(a, 0, sizeof(a)); }	-0.124939
-0.102577	CriticalFunction = &CriticalFunction_AVX; }	-0.124939
-0.102577	supported return &CriticalFunction_AVX; }	-0.124939
-0.293330	cc[i]); } 109 }	-0.124939
-0.237129	{ return pow(x,10); }	-0.124939
-0.237129	the four sums }	-0.124939
-0.237129	- (time before) }	-0.124939
-0.237129	powN<true,N/2>::p(x) * powN<true,N/2>::p(x); }	-0.124939
-0.237129	list[j].b + list[j].c; }	-0.124939
-0.237129	(bb[i] * cc[i]); }	-0.124939
-0.237129	x8*x2; return x10; }	-0.124939
-0.237129	r + i/2; }	-0.124939
-0.237129	abs(u.f) > abs(v.f) }	-0.124939
-0.237129	(b[i] * c[i]); }	-0.124939
-0.237129	else { FuncB(i); }	-0.124939
-0.237129	{ return IntegerPower<10>(x); }	-0.124939
-0.237129	child function: (static_cast<MyChild*>(this))->Disp(); }	-0.124939
-0.237129	a[c][r] = b[r][c]; }	-0.124939
-0.237129	of range printf(Greek[n]); }	-0.124939
-0.237129	{ goto CFALSE; }	-0.124939
-0.237129	{ goto DTRUE; }	-0.124939
-0.237129	b[i] = Func(a[i]); }	-0.124939
-0.237129	%10I64i", i, timediff[i]); }	-0.124939
-0.237129	double matrix[SIZE][SIZE]; transpose(matrix); }	-0.124939
-0.237129	try { F1(); }	-0.124939
-0.237129	not supported"); return; }	-0.124939
-0.237129	ab[i].b = Func(ab[i].a); }	-0.124939
-0.237129	+ 1; 69 }	-0.124939
-0.237129	2.5}; return list[x]; }	-0.124939
-0.237129	{ return N; }	-0.124939
-0.237129	{ StoreNTD(&a[c][r], b[r][c]); }	-0.124939
-0.237129	s); return _mm_cvtss_f32(s); }	-0.124939
-0.237129	list[100]; Func1(list, &list[8]); }	-0.124939
-0.237129	d; int i[2]; }	-0.124939
-0.237129	powN<(N & N-1)==0,N>::p(x); }	-0.124939
-0.237129	* temp; 104 }	-0.124939
-0.237129	FuncC(i); FuncB(i+1); FuncC(i+1); }	-0.124939
-0.237129	s3 += a[i+3]; }	-0.124939
-0.237129	the for-loop: i++; }	-0.124939
-0.237129	here: return *(T*)0; }	-0.124939
-0.237129	a); } 111 }	-0.124939
-0.237129	temp += 9; }	-0.124939
-0.574462	this pointer is then	-0.124939
-0.462687	result ebx is then	-0.124939
-0.353426	of A and then	-0.124939
-0.456648	precision constant and then	-0.124939
-0.497336	data structure and then	-0.124939
-0.353426	CPU model and then	-0.124939
-0.499925	to zero and then	-0.124939
-0.353426	a string and then	-0.124939
-0.648931	error message and then	-0.124939
-0.353426	been added and then	-0.124939
-0.353426	a PC and then	-0.124939
-0.353426	doing calculations, and then	-0.124939
-0.353426	gives a+b=0, and then	-0.124939
-0.721124	The values are then	-0.124939
-0.523707	intermediate files are then	-0.124939
-0.356495	its members are then	-0.124939
-1.374162	the code can then	-0.124939
-1.299149	The compiler can then	-0.124939
-0.538291	Each thread can then	-0.124939
-0.526219	another dispatched function then	-0.124939
-1.812025	in the code then	-0.124939
-0.536547	of a code then	-0.124939
-1.482500	piece of code then	-0.124939
-0.456382	the CPU time then	-0.124939
-0.353216	a short time then	-0.124939
-1.420869	at compile time then	-0.124939
-0.579020	bytes or more then	-0.124939
-0.462033	This operation will then	-0.124939
-0.832751	around in memory then	-0.124939
-1.785288	of the program then	-0.124939
-1.485429	of a program then	-0.124939
-0.549186	in library functions then	-0.124939
-0.838547	virtual member functions then	-0.124939
-0.524280	avoid virtual functions then	-0.124939
-0.449424	to frame functions then	-0.124939
-0.581126	than the other then	-0.124939
-0.357505	a big loop then	-0.124939
-0.548336	API function which then	-0.124939
-1.044345	than the cache then	-0.124939
-0.357573	Each thread should then	-0.124939
-0.648125	can be set then	-0.124939
-0.593754	particular instruction set then	-0.124939
-0.524952	with different compilers then	-0.124939
-0.879132	the vector size then	-0.124939
-1.343949	to a pointer then	-0.124939
-0.533401	their smart pointer then	-0.124939
-0.578732	Intel function library then	-0.124939
-0.501764	unroll by two then	-0.124939
-0.587620	deleting the object then	-0.124939
-0.357284	an odd number then	-0.124939
-0.356897	local object static then	-0.124939
-0.831403	power of 2 then	-0.301030
-0.586062	for the performance then	-0.124939
-0.523666	the array elements then	-0.124939
-1.062351	the above example, then	-0.124939
-0.355815	not enough registers then	-0.124939
-0.843737	is the case then	-0.124939
-0.456362	set is available then	-0.124939
-0.456362	inttypes.h is available then	-0.124939
-0.719027	to clean up then	-0.124939
-0.575245	of an error then	-0.124939
-0.630808	a thousand times then	-0.124939
-0.630808	repeats 1000 times then	-0.124939
-0.499260	object is large then	-0.124939
-0.458919	calling function must then	-0.124939
-0.823568	sequence of calculations then	-0.124939
-0.458764	during program execution then	-0.124939
-0.501624	for a result then	-0.124939
-0.521348	first 128 bytes then	-0.124939
-0.355031	updates are necessary then	-0.124939
-0.354832	only one element then	-0.124939
-0.354295	abort(), _endthread(), etc. then	-0.124939
-0.553162	throw an exception then	-0.124939
-0.546062	elements is small then	-0.124939
-0.651178	assembly output option then	-0.124939
-0.575549	used cache line then	-0.124939
-0.354374	a profiler works then	-0.124939
-0.339320	type of parameters then	-0.124939
-0.523339	as template parameters then	-0.124939
-1.082966	is not advantageous then	-0.124939
-0.869278	is a problem then	-0.124939
-0.650890	is already known then	-0.124939
-0.498308	without AVX support then	-0.124939
-0.457355	other data structure then	-0.124939
-0.353661	different set values then	-0.124939
-0.869190	10 clock cycles then	-0.124939
-0.353483	a[1], b[1], ... then	-0.124939
-0.353256	floating point counter then	-0.124939
-0.579440	using CPU dispatching then	-0.124939
-0.352620	for your application then	-0.124939
-0.352441	page 73) automatically then	-0.124939
-0.573614	using exception handling then	-0.124939
-0.751946	of these methods then	-0.124939
-0.352316	the old block then	-0.124939
-0.454889	objects is high then	-0.124939
-0.565966	A CPU dispatcher then	-0.124939
-0.645894	the preceding addition then	-0.124939
-0.350740	the 64-bit vectors then	-0.124939
-0.350112	the innermost function, then	-0.124939
-0.293553	in this range then	-0.124939
-0.293553	a limited range then	-0.124939
-0.293553	a narrow range then	-0.124939
-0.513892	one CPU core then	-0.124939
-0.349693	pointers or references then	-0.124939
-0.704858	the CPU supports then	-0.124939
-0.316278	j as index then	-0.124939
-0.804009	an array index then	-0.124939
-0.956047	of the code, then	-0.124939
-0.551213	only one instance then	-0.124939
-0.348239	you want vectorization then	-0.124939
-0.348441	on CPU efficiency then	-0.124939
-0.307337	at a time, then	-0.124939
-0.307337	at any time, then	-0.124939
-0.346283	or specific models then	-0.124939
-0.447784	pointer has changed then	-0.124939
-0.361117	time to execute then	-0.124939
-0.361117	microseconds to execute then	-0.124939
-0.345295	the same resource then	-0.124939
-0.343613	an Intel compiler, then	-0.124939
-0.523126	the same module then	-0.124939
-0.247603	any other module then	-0.124939
-0.247603	a separate module then	-0.124939
-0.305515	compiler is used, then	-0.124939
-0.305515	expression is used, then	-0.124939
-1.204936	the critical stride then	-0.124939
-0.551119	particular instruction set, then	-0.124939
-0.342487	additions are independent then	-0.124939
-0.343227	are equally near then	-0.124939
-0.535356	long dependency chains then	-0.124939
-0.740884	is an integer, then	-0.124939
-0.340464	more than once then	-0.124939
-0.154781	5 clock cycles, then	-0.124939
-0.340671	not declared volatile then	-0.124939
-0.113738	the shared object, then	-0.124939
-0.113738	a shared object, then	-0.124939
-0.338111	the version changes then	-0.124939
-0.496613	are 32-bit integers, then	-0.124939
-0.433401	is poorly predictable then	-0.124939
-0.766978	in the debugger then	-0.124939
-0.102506	code version on, then	-0.124939
-0.102506	advanced version on, then	-0.124939
-0.330991	b are swapped then	-0.124939
-0.755234	the carry flag then	-0.124939
-0.496192	|| is true, then	-0.124939
-0.648986	into the pipeline then	-0.124939
-0.324258	compile-time constant n, then	-0.124939
-0.092969	CPU. If not, then	-0.124939
-0.092969	factor. If not, then	-0.124939
-0.323859	small and changing then	-0.124939
-0.323859	a non-sequential manner then	-0.124939
-0.313262	e.g. four numbers, then	-0.124939
-0.442355	A is slow, then	-0.124939
-0.171963	Out (FIFO) basis then	-0.124939
-0.171963	Out (FILO) basis then	-0.124939
-0.313262	bottleneck is elsewhere then	-0.124939
-0.313262	times one way, then	-0.124939
-0.313262	improve cache efficiency, then	-0.124939
-0.313262	no big arrays, then	-0.124939
-0.313780	&& is false, then	-0.124939
-0.171963	the first sum, then	-0.124939
-0.171963	the second sum, then	-0.124939
-0.171963	instead of double, then	-0.124939
-0.171963	a 64-bit double, then	-0.124939
-0.292812	is not vacant then	-0.124939
-0.381114	with different priorities then	-0.124939
-0.381114	is 2 GHz then	-0.124939
-0.292812	template parameters differ then	-0.124939
-0.102410	is calculated first, then	-0.124939
-0.102410	R values first, then	-0.124939
-0.381114	register (see below) then	-0.124939
-0.292812	microprocessor has hyperthreading, then	-0.124939
-0.292812	compile-time while loops, then	-0.124939
-0.292812	in the container, then	-0.124939
-0.292812	no AVX support, then	-0.124939
-0.292812	for one segment then	-0.124939
-0.292812	an acceptable limit, then	-0.124939
-0.236674	10 μs today, then	-0.124939
-0.236674	FuncA and FuncB, then	-0.124939
-0.236674	a particular meaning, then	-0.124939
-0.236674	has been identified, then	-0.124939
-0.236674	with CPU dispatching, then	-0.124939
-0.236674	have been found, then	-0.124939
-0.236674	in chapter 9.10, then	-0.124939
-0.236674	is too fine then	-0.124939
-0.236674	hyperthreading. If so, then	-0.124939
-0.236674	than the other, then	-0.124939
-0.236674	are accessed row-wise, then	-0.124939
-0.236674	T to T+5, then	-0.124939
-0.236674	is poorly predictable, then	-0.124939
-0.236674	can be made) then	-0.124939
-0.236674	the same algorithm, then	-0.124939
-0.236674	important to ignore, then	-0.124939
-0.236674	x and y?" then	-0.124939
-0.236674	this is obvious, then	-0.124939
-0.236674	option for RTTI then	-0.124939
-0.236674	a = 10000, then	-0.124939
-0.236674	it writes only, then	-0.124939
-0.236674	be too small, then	-0.124939
-0.236674	u < 231 then	-0.124939
-0.236674	C1 or C2, then	-0.124939
-0.236674	is not met then	-0.124939
-0.236674	i = 18, then	-0.124939
-0.236674	first, then d+e, then	-0.124939
-0.462583	Microsoft compilers. // It	-0.124939
-0.355136	or -0 } It	-0.124939
-0.355136	} 109 } It	-0.124939
-0.998944	Using intrinsic functions It	-0.124939
-0.582925	of an object It	-0.124939
-0.355803	in two libraries It	-0.124939
-1.164315	of the code. It	-0.124939
-0.330488	on integer code. It	-0.124939
-0.755615	any extra code. It	-0.124939
-0.465707	the source code. It	-0.124939
-0.321945	a long time. It	-0.124939
-0.796357	no extra time. It	-0.124939
-0.160289	takes longer time. It	-0.124939
-0.148183	take longer time. It	-0.124939
-0.502176	and function pointers It	-0.124939
-1.052706	the compiler does It	-0.124939
-0.354985	or clearing arrays It	-0.124939
-0.354187	hash maps etc. It	-0.124939
-0.457677	used intrinsic functions. It	-0.124939
-0.457677	optimized mathematical functions. It	-0.124939
-0.324592	remove unreferenced functions. It	-0.124939
-0.639372	in the memory. It	-0.124939
-0.737326	stored in memory. It	-0.124939
-0.583912	dynamically allocated memory. It	-0.124939
-0.397496	set is used. It	-0.124939
-0.397496	stack is used. It	-0.124939
-0.750571	they are used. It	-0.124939
-0.523326	no longer used. It	-0.124939
-0.285301	is seldom used. It	-0.124939
-1.163868	in 64-bit systems. It	-0.124939
-0.352195	on these data. It	-0.124939
-0.577722	necessary instruction set. It	-0.124939
-0.812333	and VIA processors. It	-0.124939
-0.350482	to be called. It	-0.124939
-0.452452	different Intel CPUs. It	-0.124939
-0.349810	the same compiler. It	-0.124939
-0.349274	vector registers are: It	-0.124939
-0.555622	during the loop. It	-0.124939
-0.138364	to a pointer. It	-0.124939
-0.138364	from a pointer. It	-0.124939
-0.138364	through a pointer. It	-0.124939
-0.138364	like a pointer. It	-0.124939
-0.313729	a 'this' pointer. It	-0.124939
-0.349014	the best cases. It	-0.124939
-0.347990	with induction variables. It	-0.124939
-0.347992	to another class. It	-0.124939
-0.637252	3.8 System database It	-0.124939
-0.558105	library function calls. It	-0.124939
-0.493803	the vector registers. It	-0.124939
-0.509643	a shared object. It	-0.124939
-0.346876	Gnu C library. It	-0.124939
-0.399191	64-bit integer calculations. It	-0.124939
-0.456885	do mathematical calculations. It	-0.124939
-0.827232	20 clock cycles. It	-0.124939
-0.346876	file input/output operations. It	-0.124939
-0.346736	of full optimization. It	-0.124939
-0.346876	to improve performance. It	-0.124939
-0.881614	for each thread. It	-0.124939
-0.795339	large data structures It	-0.124939
-0.345032	in one vector. It	-0.124939
-0.343477	pointers or references. It	-0.124939
-0.727134	in the CPU. It	-0.124939
-0.343838	database in Windows. It	-0.124939
-0.442940	applied to integers. It	-0.124939
-0.237038	than signed integers. It	-0.124939
-0.237038	vector containing integers. It	-0.124939
-0.286611	want it to. It	-0.124939
-0.286611	to apply to. It	-0.124939
-0.342383	are not critical. It	-0.124939
-0.342582	compiler became available. It	-0.124939
-0.442186	it was executed. It	-0.124939
-0.342582	their software faster. It	-0.124939
-0.286436	no cache problems. It	-0.124939
-0.286436	of alignment problems. It	-0.124939
-0.971465	a template parameter. It	-0.124939
-0.897321	floating point expressions. It	-0.124939
-0.623901	the previous value. It	-0.124939
-0.960294	the operating system. It	-0.124939
-0.623043	of the arrays. It	-0.124939
-0.340123	of the branch. It	-0.124939
-0.278547	of storage space. It	-0.124939
-0.278547	and disk space. It	-0.124939
-0.340123	extra element zero. It	-0.124939
-0.340123	some legacy software. It	-0.124939
-0.337997	a better solution. It	-0.124939
-0.436852	compiling for Linux. It	-0.124939
-0.509382	or assembly language. It	-0.124939
-0.337997	if it is. It	-0.124939
-0.338251	the code automatically. It	-0.124939
-0.267962	in the core. It	-0.124939
-0.267962	the same core. It	-0.124939
-0.756470	and automatic vectorization. It	-0.124939
-0.337743	be true anyway. It	-0.124939
-0.476341	it to do. It	-0.124939
-0.334590	double precision constant. It	-0.124939
-0.334590	to see this. It	-0.124939
-0.254971	compile time here. It	-0.124939
-0.254971	not appropriate here. It	-0.124939
-0.504708	modify data members. It	-0.124939
-0.334884	irregular response times. It	-0.124939
-0.334884	the previous one. It	-0.124939
-0.334884	desired program structure. It	-0.124939
-0.427413	not a profiler. It	-0.124939
-0.485611	no exception handling. It	-0.124939
-0.427413	standardized installation tools. It	-0.124939
-0.604228	floating point numbers. It	-0.124939
-0.330213	b[i] = a[i]; It	-0.124939
-0.485611	prevents out-of-order execution. It	-0.124939
-0.330213	or approximately so. It	-0.124939
-0.323730	or mouse input. It	-0.124939
-0.323730	its own IDE. It	-0.124939
-0.323730	and dynamic versions. It	-0.124939
-0.323730	line number information. It	-0.124939
-0.680037	the user interface. It	-0.124939
-0.592921	pitfalls of unit-testing It	-0.124939
-0.927291	of programming style. It	-0.124939
-0.323730	from the counts. It	-0.124939
-0.323730	make files smaller. It	-0.124939
-0.323730	to do manually. It	-0.124939
-0.323730	with 16-bit programs. It	-0.124939
-0.313696	that particular part. It	-0.124939
-0.572668	i < 100. It	-0.124939
-0.572668	9.2 Cache organization It	-0.124939
-0.171902	for this purpose. It	-0.124939
-0.171902	a specific purpose. It	-0.124939
-0.572668	are accessed sequentially. It	-0.124939
-0.313136	a non-sequential manner. It	-0.124939
-0.313136	restriction on x. It	-0.124939
-0.292691	the new context. It	-0.124939
-0.292691	objects are aligned. It	-0.124939
-0.292691	catch, and throw. It	-0.124939
-0.380966	safe, of course. It	-0.124939
-0.536192	(see page 73). It	-0.124939
-0.536192	also has disadvantages: It	-0.124939
-0.292691	from a buffer. It	-0.124939
-0.380966	or optimized away. It	-0.124939
-0.536192	a make utility. It	-0.124939
-0.292691	more syntax check. It	-0.124939
-0.380966	is data decomposition. It	-0.124939
-0.292691	settings are lost. It	-0.124939
-0.536192	difficult to read. It	-0.124939
-0.292691	static data. 148 It	-0.124939
-0.292691	program is started. It	-0.124939
-0.292691	compatible with Gnu. It	-0.124939
-0.292691	CPU dispatcher updated. It	-0.124939
-0.380966	are poorly predictable. It	-0.124939
-0.380966	7.15b below shows. It	-0.124939
-0.292691	and open source. It	-0.124939
-0.292691	used more efficiently. It	-0.124939
-0.292691	not doing divisions. It	-0.124939
-0.536192	on page 72. It	-0.124939
-0.292691	but also safer. It	-0.124939
-0.236568	an error message. It	-0.124939
-0.236568	the final product. It	-0.124939
-0.236568	on and off. It	-0.124939
-0.236568	difficult to diagnose. It	-0.124939
-0.236568	in the profile. It	-0.124939
-0.236568	user. Feature bloat. It	-0.124939
-0.236568	to contained objects? It	-0.124939
-0.236568	with compile-time polymorphism. It	-0.124939
-0.236568	or mouse move. It	-0.124939
-0.236568	several iterations ahead. It	-0.124939
-0.236568	CPU dispatch strategies It	-0.124939
-0.236568	the C-style type-casting. It	-0.124939
-0.236568	waiting for response. It	-0.124939
-0.236568	what is happening. It	-0.124939
-0.236568	is not standardized. It	-0.124939
-0.236568	actually quite convenient. It	-0.124939
-0.236568	is too high. It	-0.124939
-0.236568	they are unavoidable. It	-0.124939
-0.236568	game or animation. It	-0.124939
-0.236568	#include <xmmintrin.h> _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON); It	-0.124939
-0.236568	on non-Intel processors). It	-0.124939
-0.236568	the programmer can. It	-0.124939
-0.236568	quite tedious indeed. It	-0.124939
-0.236568	attack for hackers. It	-0.124939
-0.236568	less user friendly. It	-0.124939
-0.236568	See page 54. It	-0.124939
-0.236568	for the label. It	-0.124939
-0.236568	are both positive. It	-0.124939
-0.236568	between these considerations. It	-0.124939
-0.236568	list[i].a and list[i].b. It	-0.124939
-0.236568	doesn't mean atomic. It	-0.124939
-0.236568	version performs poorly. It	-0.124939
-0.236568	a first-in-last-out fashion. It	-0.124939
-0.236568	to choose between. It	-0.124939
-0.236568	on page 130. It	-0.124939
-0.236568	for pointer conversions. It	-0.124939
-0.236568	(see p. 57). It	-0.124939
-0.236568	branches for correctness. It	-0.124939
-0.236568	See page 61. It	-0.124939
-0.236568	types or sizes? It	-0.124939
-0.236568	(a*b*c)+(c*b*a) to a*b*c*2. It	-0.124939
-0.236568	additions with double's. It	-0.124939
-0.236568	is memory pooling. It	-0.124939
-0.236568	with many decimals. It	-0.124939
-0.236568	simple to develop. It	-0.124939
-0.236568	as memory leaks. It	-0.124939
-0.236568	comments, in green. It	-0.124939
-0.236568	integers is costless. It	-0.124939
-0.236568	use a queue. It	-0.124939
-0.005368	this by // Example	-0.124939
-0.027527	replaced by // Example	-0.124939
-0.199622	the code. // Example	-0.124939
-0.199622	compile time. // Example	-0.124939
-0.000834	code. Example: // Example	-0.124939
-0.003347	time. Example: // Example	-0.124939
-0.001670	function. Example: // Example	-0.124939
-0.001670	memory. Example: // Example	-0.124939
-0.003347	used. Example: // Example	-0.124939
-0.003347	called. Example: // Example	-0.124939
-0.003347	loop. Example: // Example	-0.124939
-0.001670	2. Example: // Example	-0.124939
-0.003347	variables. Example: // Example	-0.124939
-0.003347	calls. Example: // Example	-0.124939
-0.003347	registers. Example: // Example	-0.124939
-0.003347	variable. Example: // Example	-0.124939
-0.003347	needed. Example: // Example	-0.124939
-0.003347	instructions. Example: // Example	-0.124939
-0.003347	order. Example: // Example	-0.124939
-0.003347	to. Example: // Example	-0.124939
-0.001670	overflow. Example: // Example	-0.124939
-0.003347	value. Example: // Example	-0.124939
-0.003347	branch. Example: // Example	-0.124939
-0.003347	constant. Example: // Example	-0.124939
-0.003347	prediction. Example: // Example	-0.124939
-0.003347	result. Example: // Example	-0.124939
-0.001670	counter. Example: // Example	-0.124939
-0.003347	operation. Example: // Example	-0.124939
-0.003347	finished. Example: // Example	-0.124939
-0.003347	ways. Example: // Example	-0.124939
-0.003347	execution. Example: // Example	-0.124939
-0.003347	elements. Example: // Example	-0.124939
-0.003347	once. Example: // Example	-0.124939
-0.003347	limited. Example: // Example	-0.124939
-0.003347	static. Example: // Example	-0.124939
-0.003347	known. Example: // Example	-0.124939
-0.003347	thing. Example: // Example	-0.124939
-0.003347	divisions. Example: // Example	-0.124939
-0.003347	later. Example: // Example	-0.124939
-0.003347	zeroes. Example: // Example	-0.124939
-0.003347	undesired. Example: // Example	-0.124939
-0.003347	offsets). Example: // Example	-0.124939
-0.003347	overhead. Example: // Example	-0.124939
-0.003347	individually. Example: // Example	-0.124939
-0.159996	size conversion // Example	-0.124939
-0.159996	unsigned conversion // Example	-0.124939
-0.269414	Gnu compilers. // Example	-0.124939
-0.010802	the example: // Example	-0.124939
-0.000890	For example: // Example	-0.124939
-0.010802	following example: // Example	-0.124939
-0.010802	Another example: // Example	-0.124939
-0.016306	code to: // Example	-0.124939
-0.003213	this to: // Example	-0.124939
-0.016306	optimized to: // Example	-0.124939
-0.016306	reduced to: // Example	-0.124939
-0.008076	changed to: // Example	-0.124939
-0.199622	align arrays. // Example	-0.124939
-0.199622	different array. // Example	-0.124939
-0.004468	like this: // Example	-0.124939
-0.199622	as follows: // Example	-0.124939
-0.199622	of 2: // Example	-0.124939
-0.199622	metaprogramming is. // Example	-0.124939
-0.199622	for details. // Example	-0.124939
-0.199622	from www.agner.org/optimize/asmlib.zip. // Example	-0.124939
-0.199622	writing: 103 // Example	-0.124939
-0.199622	with 1: // Example	-0.124939
-0.199622	to unsigned. // Example	-0.124939
-0.056917	of numbers: // Example	-0.124939
-0.056917	point numbers: // Example	-0.124939
-0.056917	100 numbers: // Example	-0.124939
-0.088441	of calculations: // Example	-0.124939
-0.088441	modulo calculations: // Example	-0.124939
-0.088441	the operations: // Example	-0.124939
-0.088441	modulo operations: // Example	-0.124939
-0.199622	of overflow: // Example	-0.124939
-0.199622	following way: // Example	-0.124939
-0.027527	replaced with: // Example	-0.124939
-0.056917	Replace with: // Example	-0.124939
-0.199622	example: 38 // Example	-0.124939
-0.041973	sign bit: // Example	-0.124939
-0.088441	a case: // Example	-0.124939
-0.088441	lower case: // Example	-0.124939
-0.199622	the loop: // Example	-0.124939
-0.041973	table lookup: // Example	-0.124939
-0.199622	template parameter: // Example	-0.124939
-0.041973	lookup table: // Example	-0.124939
-0.199622	derived class: // Example	-0.124939
-0.041973	is needed: // Example	-0.124939
-0.041973	is available: // Example	-0.124939
-0.199622	lrint function: // Example	-0.124939
-0.199622	use SafeArray: // Example	-0.124939
-0.199622	a union: // Example	-0.124939
-0.199622	fraction bits: // Example	-0.124939
-0.199622	to zero: // Example	-0.124939
-0.199622	floating point: // Example	-0.124939
-0.199622	is enabled: // Example	-0.124939
-0.199622	double precision: // Example	-0.124939
-0.199622	positive integer: // Example	-0.124939
-0.199622	members last: // Example	-0.124939
-0.199622	control condition: // Example	-0.124939
-0.199622	more efficient: // Example	-0.124939
-0.199622	the arrays: // Example	-0.124939
-0.199622	of underflow: // Example	-0.124939
-0.199622	runtime polymorphism: // Example	-0.124939
-0.199622	unsigned Examples: // Example	-0.124939
-0.199622	point variable: // Example	-0.124939
-0.199622	a double: // Example	-0.124939
-0.199622	and reorganize: // Example	-0.124939
-0.199622	suggested improvements). // Example	-0.124939
-0.199622	vector classes): // Example	-0.124939
-0.199622	order polynomial: // Example	-0.124939
-0.199622	using InstructionSet(): // Example	-0.124939
-0.199622	of structures: // Example	-0.124939
-0.199622	absolute values: // Example	-0.124939
-0.199622	static keyword: // Example	-0.124939
-0.199622	single comparison: // Example	-0.124939
-0.199622	instruction set: // Example	-0.124939
-0.199622	the reciprocal: // Example	-0.124939
-0.199622	Library (WTL): // Example	-0.124939
-0.199622	loop counter: // Example	-0.124939
-0.199622	be used: // Example	-0.124939
-0.199622	and memcpy: // Example	-0.124939
-0.199622	common denominator: // Example	-0.124939
-0.199622	in two: // Example	-0.124939
-0.199622	we have: // Example	-0.124939
-0.199622	using memset: // Example	-0.124939
-0.199622	its address: // Example	-0.124939
-0.199622	changes fastest: // Example	-0.124939
-0.199622	with alloca: // Example	-0.124939
-0.199622	the exponent: // Example	-0.124939
-0.199622	reference instead: // Example	-0.124939
-0.199622	type conversions: // Example	-0.124939
-0.199622	element matrix[c][r]. // Example	-0.124939
-0.199622	vector classes: // Example	-0.124939
-0.199622	matrix a: // Example	-0.124939
-0.199622	as integers: // Example	-0.124939
-0.199622	pointers, e.g.: // Example	-0.124939
-0.199622	example 7.22. // Example	-0.124939
-0.199622	pivot search: // Example	-0.124939
-0.199622	example 9.5b. // Example	-0.124939
-0.199622	always false: // Example	-0.124939
-0.199622	inside square: // Example	-0.124939
-0.199622	certain interval: // Example	-0.124939
-0.199622	induction variables: // Example	-0.124939
-0.199622	return statement: // Example	-0.124939
-0.199622	a template: // Example	-0.124939
-0.199622	this capability: // Example	-0.124939
-0.199622	two gives: // Example	-0.124939
-0.199622	* 1.2f; // Example	-0.124939
-0.589565	i; } } Example	-0.124939
-0.352969	both positive } Example	-0.124939
-0.352969	to exponent } Example	-0.124939
-0.344049	(32-bit mode): ; Example	-0.124939
-0.344049	example 8.26b: ; Example	-0.124939
-0.235736	Time per element Example	-0.124939
-0.582725	of the loop. Example	-0.124939
-0.658141	inside the loop. Example	-0.124939
-0.791695	with big-endian storage. Example	-0.124939
-0.444342	than at runtime. Example	-0.124939
-0.314737	the chosen expression. Example	-0.124939
-0.356974	asmlib.. // or from	-0.124939
-0.356974	data optimally, or from	-0.124939
-0.501173	and call it from	-0.124939
-0.356178	and prevent it from	-0.124939
-0.356178	This prevents it from	-0.124939
-0.800937	a library function from	-0.124939
-0.830529	a single function from	-0.124939
-0.892064	is the code from	-0.124939
-0.159233	following assembly code from	-0.425969
-0.525343	previous value than from	-0.124939
-1.667728	is faster than from	-0.124939
-0.186477	prevent the compiler from	-0.124939
-0.065317	prevents the compiler from	-0.346788
-0.520245	invoking the compiler from	-0.124939
-0.454766	Copying constant data from	-0.124939
-0.351942	access Accessing data from	-0.124939
-0.351942	or send data from	-0.124939
-0.428349	aligned integer vector from	-0.124939
-0.339466	unaligned integer vector from	-0.602060
-0.170109	is called only from	-0.124939
-0.076779	function called only from	-0.425969
-0.543715	prevent the CPU from	-0.124939
-0.191807	prevents the CPU from	-0.124939
-0.551895	Intel and one from	-0.124939
-0.353737	currently available, one from	-0.124939
-1.252197	the level-2 cache from	-0.124939
-1.003124	the level-1 cache from	-0.124939
-0.357538	the latest compilers from	-0.124939
-0.357423	to convert b from	-0.124939
-0.552619	reading the value from	-0.124939
-0.552619	reload the value from	-0.124939
-0.499154	Reading a value from	-0.124939
-0.431358	subtract this value from	-0.124939
-0.517930	calculate each value from	-0.124939
-0.577049	fetch the variable from	-0.124939
-0.575332	writing a variable from	-0.124939
-0.241670	is to return from	-0.124939
-0.241670	recommended to return from	-0.124939
-0.443883	label ; return from	-0.124939
-0.580279	copy the table from	-0.124939
-0.518834	take the elements from	-0.124939
-0.257900	eight consecutive elements from	-0.602060
-0.429214	and is called from	-0.124939
-0.606937	that is called from	-0.124939
-0.606937	it is called from	-0.124939
-0.260643	can be called from	-0.124939
-0.372913	may be called from	-0.124939
-0.260643	cannot be called from	-0.124939
-0.441329	class are called from	-0.124939
-0.076028	compiler when called from	-0.124939
-0.076028	only when called from	-0.124939
-0.076028	also when called from	-0.124939
-0.633737	is also called from	-0.124939
-0.460565	address. A call from	-0.124939
-0.522547	a map file from	-0.124939
-0.424166	it is available from	-0.124939
-0.599359	which is available from	-0.124939
-0.436184	software are available from	-0.124939
-0.436184	tasks are available from	-0.124939
-0.521760	is also available from	-0.124939
-0.284394	are always available from	-0.124939
-0.284394	be easily available from	-0.124939
-0.284394	Kernel Library, available from	-0.124939
-0.732253	variable is accessed from	-0.124939
-0.402283	can be accessed from	-0.124939
-0.321562	even when accessed from	-0.124939
-0.355215	or 0x40 bytes from	-0.124939
-0.545900	prevent two threads from	-0.124939
-0.354950	are the integers from	-0.124939
-0.127648	value is calculated from	-0.124939
-0.309208	xn is calculated from	-0.124939
-0.309208	n! is calculated from	-0.124939
-0.576265	condition is known from	-0.124939
-0.354436	It requires support from	-0.124939
-0.354302	the entire list from	-0.124939
-0.353867	and subtracting 1 from	-0.124939
-0.472897	Mixing object files from	-0.124939
-0.335742	or resource files from	-0.124939
-0.996215	is not optimal from	-0.124939
-0.845383	different instruction sets from	-0.124939
-0.352660	used functions separate from	-0.124939
-0.844752	the memory block from	-0.124939
-0.458181	well. The conversion from	-0.124939
-0.310853	mode. A conversion from	-0.124939
-0.310853	truncation. Efficient conversion from	-0.124939
-0.352632	thread steals resources from	-0.124939
-0.352112	by subtracting n from	-0.124939
-0.495574	transferred at runtime from	-0.124939
-0.455292	that are needed from	-0.124939
-0.482537	ownership is transferred from	-0.124939
-0.328062	copied or transferred from	-0.124939
-0.298366	want to read from	-0.124939
-0.298366	cycles to read from	-0.124939
-0.262176	when we read from	-0.124939
-0.262176	program had read from	-0.124939
-0.262176	to 99 read from	-0.124939
-0.351781	library functions linked from	-0.124939
-0.454562	family of microprocessors from	-0.124939
-0.351366	useful for calling from	-0.124939
-0.350966	example 9.5a goes from	-0.124939
-0.293804	integers which range from	-0.124939
-0.056952	the address range from	-0.425969
-0.805961	to be loaded from	-0.124939
-0.286117	language, all conversions from	-0.124939
-0.119911	to avoid conversions from	-0.124939
-0.119911	cannot avoid conversions from	-0.124939
-0.349291	waiting for response from	-0.124939
-0.989183	four cache lines from	-0.124939
-0.314129	initialized or comes from	-0.124939
-0.314129	main feedback comes from	-0.124939
-0.347470	array size right from	-0.124939
-0.637749	reading and writing from	-0.124939
-0.347641	has been reduced from	-0.124939
-0.347641	should be clear from	-0.124939
-0.509667	calculated more efficiently from	-0.124939
-0.346892	and variable names from	-0.124939
-0.345604	; restore ebx from	-0.124939
-0.512842	14.1c is copied from	-0.124939
-0.248140	possible to come from	-0.124939
-0.248140	uninitialized or come from	-0.124939
-0.248140	if they come from	-0.124939
-0.279160	and integers Conversion from	-0.124939
-0.279160	is enabled. Conversion from	-0.124939
-0.340952	has a jump from	-0.124939
-0.208220	user is far from	-0.124939
-0.208220	have values far from	-0.124939
-0.208220	of course far from	-0.124939
-0.477017	must be saved from	-0.124939
-0.338743	the programming manuals from	-0.124939
-0.338587	0x4700. Reading again from	-0.124939
-0.255350	a program reads from	-0.124939
-0.255350	and later reads from	-0.124939
-0.018667	that can benefit from	-0.124939
-0.018667	registers can benefit from	-0.124939
-0.038173	memory will benefit from	-0.124939
-0.018667	that could benefit from	-0.124939
-0.018667	code could benefit from	-0.124939
-0.053580	takes to recover from	-0.124939
-0.025964	able to recover from	-0.124939
-0.053580	made to recover from	-0.124939
-0.331533	factors are generated from	-0.124939
-0.330889	has been increased from	-0.124939
-0.331104	beginning. ret returns from	-0.124939
-0.331319	pointer it gets from	-0.124939
-0.331104	addition to sum1 from	-0.124939
-0.324660	penalty when going from	-0.124939
-0.324396	addition to sum2 from	-0.124939
-0.093087	compiler is prevented from	-0.124939
-0.093087	F1 is prevented from	-0.124939
-0.420478	graphical user interfaces from	-0.124939
-0.739587	in the interval from	-0.124939
-0.928419	into vector b: from	-0.124939
-0.313785	static is removed from	-0.124939
-0.314129	is not separated from	-0.124939
-0.313785	for details. Inheritance from	-0.124939
-0.172215	demonstration purposes. Available from	-0.124939
-0.172215	Intel Agner Available from	-0.124939
-0.313785	to answer questions from	-0.124939
-0.314129	deallocated when returning from	-0.124939
-0.313785	// Use ReadTSC() from	-0.124939
-0.406959	to be evicted from	-0.124939
-0.102571	statements often suffer from	-0.124939
-0.102571	can therefore suffer from	-0.124939
-0.293311	get no warning from	-0.124939
-0.293311	can be fetched from	-0.124939
-0.102571	instructions are accessible from	-0.124939
-0.102571	are not accessible from	-0.124939
-0.293311	occur and recovering from	-0.124939
-0.537278	the const restriction from	-0.124939
-0.237112	header file timingtest.h from	-0.124939
-0.237112	is not referenced from	-0.124939
-0.237112	before any transition from	-0.124939
-0.237112	used and popped from	-0.124939
-0.237112	You may deviate from	-0.124939
-0.237112	we can learn from	-0.124939
-0.237112	profiling feasible. Interference from	-0.124939
-0.237112	2.0f; } 115 from	-0.124939
-0.237112	r is re-loaded from	-0.124939
-1.960193	part of the memory	-0.124939
-0.123251	ownership of the memory	-0.301030
-0.587103	segmentation of the memory	-0.124939
-0.595257	properly and the memory	-0.124939
-0.596532	solution for the memory	-0.124939
-1.769814	is that the memory	-0.124939
-1.581809	so that the memory	-0.124939
-1.552417	divisible by the memory	-0.124939
-0.885169	time because the memory	-0.124939
-0.547043	can cause the memory	-0.124939
-0.460593	free) causes the memory	-0.124939
-0.142663	not free the memory	-0.124939
-0.142663	could free the memory	-0.124939
-0.460593	priority. Especially the memory	-0.124939
-0.358739	important remedy is memory	-0.124939
-1.066669	size of a memory	-0.124939
-0.580209	file in a memory	-0.124939
-0.199661	strings in a memory	-0.124939
-0.591864	edx as a memory	-0.124939
-0.570082	allocating when a memory	-0.124939
-0.107819	stored at a memory	-0.602060
-0.459385	that holds a memory	-0.124939
-0.355583	of managing a memory	-0.124939
-0.839282	a part of memory	-0.124939
-0.568985	A part of memory	-0.124939
-0.582661	optimized versions of memory	-0.124939
-0.141501	The allocation of memory	-0.124939
-0.141501	involves allocation of memory	-0.124939
-0.496380	big block of memory	-0.124939
-0.578965	small piece of memory	-0.124939
-0.531838	same range of memory	-0.124939
-0.352740	an index of memory	-0.124939
-0.298482	the amount of memory	-0.124939
-0.499060	required amount of memory	-0.124939
-0.352740	wasteful copying of memory	-0.124939
-0.933698	the risk of memory	-0.124939
-0.352740	the swapping of memory	-0.124939
-0.647577	and deallocation of memory	-0.124939
-0.352740	and de-allocation of memory	-0.124939
-0.352740	large amounts of memory	-0.124939
-0.540577	overhead cost to memory	-0.124939
-0.504548	write directly to memory	-0.124939
-0.526685	CPU access and memory	-0.124939
-0.354992	store x in memory	-0.124939
-0.553309	of functions in memory	-0.124939
-0.991325	a variable in memory	-0.124939
-0.569811	on variables in memory	-0.124939
-1.308728	is stored in memory	-0.124939
-0.341443	scattered around in memory	-0.124939
-0.458634	accessed sequentially in memory	-0.124939
-0.354992	whose distance in memory	-0.124939
-0.502707	large libraries. The memory	-0.124939
-0.357275	the stack. The memory	-0.124939
-0.357275	table (PLT). The memory	-0.124939
-0.726102	is likely that memory	-0.124939
-0.358297	a container or memory	-0.124939
-0.358542	this method if memory	-0.124939
-0.562144	latency or by memory	-0.124939
-0.358361	mathematical calculations with memory	-0.124939
-0.065527	error known as memory	-0.124939
-0.462956	these purposes. This memory	-0.124939
-0.358217	a bottleneck than memory	-0.124939
-0.503937	modern computers have memory	-0.124939
-0.576862	block, but this memory	-0.124939
-0.539314	DLL takes more memory	-0.124939
-0.500454	even allocate more memory	-0.124939
-0.773992	the value from memory	-0.124939
-0.764174	to read from memory	-0.124939
-0.349578	be loaded from memory	-0.124939
-0.349578	is re-loaded from memory	-0.124939
-1.256219	code and data memory	-0.124939
-0.245965	will use different memory	-0.124939
-0.245965	threads use different memory	-0.124939
-0.709664	around at different memory	-0.124939
-0.752118	to the same memory	-0.124939
-1.014275	in the same memory	-0.124939
-0.851149	use the same memory	-0.425969
-0.738971	share the same memory	-0.124939
-0.568747	strings in one memory	-0.124939
-0.353922	queue) allocates one memory	-0.124939
-0.352784	it) load into memory	-0.124939
-0.518264	are loaded into memory	-0.124939
-0.546689	method with multiple memory	-0.124939
-0.351923	to keep multiple memory	-0.124939
-0.219230	table in static memory	-0.124939
-0.496092	stored in static memory	-0.124939
-0.219230	something in static memory	-0.124939
-0.495178	memory. The static memory	-0.124939
-0.122842	data from static memory	-0.124939
-0.122842	table from static memory	-0.124939
-0.122842	list from static memory	-0.124939
-0.122842	copied from static memory	-0.124939
-1.366063	the most efficient memory	-0.124939
-0.357227	the maximum possible memory	-0.124939
-0.351042	constant always takes memory	-0.124939
-0.351042	directive never takes memory	-0.124939
-0.356986	that destroys any memory	-0.124939
-0.460810	have much less memory	-0.124939
-0.356635	some cases take memory	-0.124939
-0.350737	allocate a new memory	-0.425969
-0.104075	uses of dynamic memory	-0.124939
-0.104075	cost of dynamic memory	-0.124939
-0.179201	advantages of dynamic memory	-0.124939
-0.104075	costs of dynamic memory	-0.124939
-0.104075	disadvantages of dynamic memory	-0.124939
-0.202185	associated with dynamic memory	-0.124939
-0.013684	to use dynamic memory	-0.124939
-0.027813	libraries use dynamic memory	-0.124939
-0.013684	classes use dynamic memory	-0.124939
-0.027813	Java, use dynamic memory	-0.124939
-0.202185	of using dynamic memory	-0.124939
-0.202185	container without dynamic memory	-0.124939
-0.027813	to avoid dynamic memory	-0.425969
-0.057531	and avoid dynamic memory	-0.124939
-0.202185	purposes. All dynamic memory	-0.124939
-0.202185	classes Whenever dynamic memory	-0.124939
-0.582102	object in case memory	-0.124939
-0.341637	memory to stack memory	-0.124939
-0.236296	table to stack memory	-0.124939
-0.333199	stored in stack memory	-0.124939
-0.355599	page 87 about memory	-0.124939
-0.524495	in a large memory	-0.124939
-0.429688	allocations of large memory	-0.124939
-0.332032	Gbytes. This large memory	-0.124939
-0.342920	deallocation of big memory	-0.124939
-0.482770	in one big memory	-0.124939
-0.524716	calculate how much memory	-0.124939
-1.126158	the most common memory	-0.124939
-0.606157	wrap the allocated memory	-0.124939
-0.125619	error. The allocated memory	-0.124939
-0.125619	collection. The allocated memory	-0.124939
-0.303074	its own allocated memory	-0.124939
-0.730410	Aligning dynamically allocated memory	-0.124939
-0.354612	wait for another memory	-0.124939
-0.956564	for a particular memory	-0.124939
-0.910029	that a particular memory	-0.124939
-0.959353	has its own memory	-0.124939
-0.078985	the new bigger memory	-0.124939
-0.037699	a new bigger memory	-0.425969
-0.553701	of the old memory	-0.124939
-0.453307	has a smaller memory	-0.124939
-0.405157	for the main memory	-0.124939
-0.405157	than the main memory	-0.124939
-0.184866	system code. Dynamic memory	-0.124939
-0.126632	memory allocation Dynamic memory	-0.124939
-0.126632	memory allocation. Dynamic memory	-0.124939
-0.126632	caching inefficient. Dynamic memory	-0.124939
-0.126632	systems). 28 Dynamic memory	-0.124939
-0.126632	optimization are. Dynamic memory	-0.124939
-0.126632	is limited. Dynamic memory	-0.124939
-0.058717	memory. 9.6 Dynamic memory	-0.124939
-0.058717	90 9.6 Dynamic memory	-0.124939
-0.451121	compile time. No memory	-0.124939
-0.786974	way to prevent memory	-0.124939
-0.125921	ebx. 9 Optimizing memory	-0.124939
-0.125921	84 9 Optimizing memory	-0.124939
-0.382664	amount of RAM memory	-0.124939
-0.099570	data from RAM memory	-0.124939
-0.099570	variable from RAM memory	-0.124939
-0.228978	1980 where RAM memory	-0.124939
-0.491965	system to swap memory	-0.124939
-0.331386	to cause seven memory	-0.124939
-0.331531	see the excessive memory	-0.124939
-0.331242	have a larger memory	-0.124939
-0.331386	in one contiguous memory	-0.124939
-0.420582	loaded at round memory	-0.124939
-0.457884	used for saving memory	-0.124939
-0.324744	writing to uncached memory	-0.124939
-0.324744	have execution units, memory	-0.124939
-0.443518	at an arbitrary memory	-0.124939
-0.314125	less efficient. Extra memory	-0.124939
-0.314125	function that allocates memory	-0.124939
-0.048311	where execution speed, memory	-0.124939
-0.048311	while execution speed, memory	-0.124939
-0.237397	and for minimizing memory	-0.124939
-0.237397	fixed strides. Uncached memory	-0.124939
-0.237397	feature for reserving memory	-0.124939
-0.358591	be responded to at	-0.124939
-0.894520	method may be at	-0.124939
-0.358040	is loaded or at	-0.124939
-0.461581	and calculate it at	-0.124939
-0.357311	as reflecting it at	-0.124939
-0.358355	virtual 53 function at	-0.124939
-0.938355	be aligned by at	-0.124939
-0.357193	no extra code at	-0.124939
-0.357193	inserts extra code at	-0.124939
-0.462437	speed or not at	-0.124939
-0.198457	time rather than at	-0.124939
-0.598730	to the compiler at	-0.124939
-0.358040	to consume time at	-0.124939
-0.724122	to stack memory at	-0.124939
-0.462284	the class has at	-0.124939
-0.583460	frameworks are used at	-0.124939
-0.064705	are never used at	-0.425969
-0.801976	with other compilers at	-0.124939
-0.357387	126 Make pointer at	-0.124939
-0.703101	the function library at	-0.124939
-0.490654	The function library at	-0.124939
-0.341819	call. Load library at	-0.124939
-0.489073	the asmlib library at	-0.124939
-0.524534	much as possible at	-0.124939
-0.461338	by its value at	-0.124939
-0.584695	write the variable at	-0.124939
-1.122707	calculate the table at	-0.124939
-0.334717	diagonal. The elements at	-0.124939
-0.298206	handle eight elements at	-0.124939
-0.298206	handles eight elements at	-0.124939
-0.334717	add dummy elements at	-0.124939
-0.502130	applications run faster at	-0.124939
-0.944510	it is stored at	-0.124939
-1.067730	should be stored at	-0.124939
-0.500533	preferably be stored at	-0.124939
-0.319873	is then stored at	-0.124939
-0.060634	16, i.e. stored at	-0.425969
-0.559805	The memory address at	-0.124939
-0.356393	a small bit at	-0.124939
-0.567438	double 32 bits at	-0.124939
-0.355540	specific optimization instructions at	-0.124939
-0.574792	calculations are available at	-0.124939
-1.494518	on the stack at	-0.124939
-0.781263	function that calls at	-0.124939
-0.355131	doing some calculations at	-0.124939
-0.546923	test 16 bytes at	-0.124939
-0.355034	that are best at	-0.124939
-0.354695	/ b) etc. at	-0.124939
-0.521433	not very good at	-0.124939
-0.437261	inlining is done at	-0.124939
-0.437261	C2::Disp() is done at	-0.124939
-0.773202	to be done at	-0.124939
-0.812303	calculations are done at	-0.124939
-0.137748	code one line at	-0.124939
-0.137748	than one line at	-0.124939
-0.047499	to this manual at	-0.301030
-0.587594	metaprogramming, as explained at	-0.124939
-0.552228	coefficients is calculated at	-0.124939
-0.563831	cannot be calculated at	-0.124939
-0.321752	it is known at	-0.124939
-0.191539	objects is known at	-0.124939
-0.085295	elements is known at	-0.425969
-0.191539	n is known at	-0.124939
-0.191539	divisor is known at	-0.124939
-0.155452	cannot be known at	-0.124939
-0.004378	is not known at	-0.823909
-0.022344	are not known at	-0.124939
-0.155452	an integer known at	-0.124939
-0.318430	the size known at	-0.124939
-0.155452	a constant known at	-0.124939
-0.318430	is already known at	-0.124939
-0.458116	are not supported at	-0.124939
-0.475641	thread may run at	-0.124939
-0.515590	thread will run at	-0.124939
-0.062882	checking multiple values at	-0.124939
-0.585166	counting clock cycles at	-0.124939
-0.353875	calculating row addresses at	-0.124939
-0.352998	8. Avoid branches at	-0.124939
-0.647911	floating point multiplication at	-0.124939
-0.352567	than its name at	-0.124939
-0.562456	seconds to zero at	-0.124939
-0.425590	better and better at	-0.124939
-0.328755	compilers are better at	-0.124939
-0.563165	Avoid table lookup at	-0.124939
-0.016209	bytes. first byte at	-0.221849
-0.082160	unused bytes byte at	-0.124939
-0.242156	at 1 byte at	-0.124939
-0.003130	0, last byte at	-0.124939
-0.004704	8, last byte at	-0.124939
-0.009459	16, last byte at	-0.124939
-0.009459	12, last byte at	-0.124939
-0.009459	400, last byte at	-0.124939
-0.082160	at 15 byte at	-0.124939
-0.747985	m is transferred at	-0.124939
-0.746779	data are aligned at	-0.124939
-0.163919	should not look at	-0.124939
-0.320653	You may look at	-0.124939
-0.048077	if you look at	-0.124939
-0.048077	If you look at	-0.124939
-0.048077	When you look at	-0.124939
-0.163919	you should look at	-0.124939
-0.163919	may also look at	-0.124939
-0.074271	code. Let's look at	-0.124939
-0.074271	rows. Let's look at	-0.124939
-0.163919	example, let's look at	-0.124939
-0.453984	with four numbers at	-0.124939
-0.645844	a small piece at	-0.124939
-0.350861	all installation options at	-0.124939
-0.493509	has to start at	-0.124939
-0.321264	collection may start at	-0.124939
-0.154303	be scattered around at	-0.425969
-0.350259	do multiple things at	-0.124939
-0.349980	doing equivalent reductions at	-0.124939
-0.472406	to be loaded at	-0.124939
-0.472406	can be loaded at	-0.124939
-0.485352	libraries are loaded at	-0.124939
-0.262946	is typically loaded at	-0.124939
-0.349841	model N+1 supports at	-0.124939
-0.349025	and BSD comes at	-0.124939
-0.347598	or no offset at	-0.124939
-0.164815	that was unknown at	-0.124939
-0.005521	that were unknown at	-0.823909
-0.346659	handle one square at	-0.124939
-0.346855	than one thing at	-0.124939
-0.554104	cache, at least at	-0.124939
-0.486703	collection can occur at	-0.124939
-0.344228	CPU, which counts at	-0.124939
-0.631858	can be added at	-0.124939
-0.342615	entire library (or at	-0.124939
-0.340752	the same DLL at	-0.124939
-0.132559	if is resolved at	-0.124939
-0.132559	#if is resolved at	-0.124939
-0.070790	is always resolved at	-0.124939
-0.070790	are always resolved at	-0.124939
-0.680253	stored in memory, at	-0.124939
-0.335402	it will break at	-0.124939
-0.335211	where everything happens at	-0.124939
-0.335402	is not evaluated at	-0.124939
-0.330828	was less popular at	-0.124939
-0.331055	sqrt and pow at	-0.124939
-0.428182	for the project at	-0.124939
-0.211909	128 below. Dispatch at	-0.124939
-0.211909	different compilers. Dispatch at	-0.124939
-0.324895	in the appendix at	-0.124939
-0.324336	array must begin at	-0.124939
-0.324336	the same cache, at	-0.124939
-0.406887	through the Internet at	-0.124939
-0.574405	the program flow at	-0.124939
-0.406887	size is handled at	-0.124939
-0.293256	memset and memcpy, at	-0.124939
-0.293256	a few kilobytes at	-0.124939
-0.293256	not be visible at	-0.124939
-0.293256	uses by looking at	-0.124939
-0.293256	with element matrix[c][r] at	-0.124939
-0.293256	to generate interrupts at	-0.124939
-0.293256	able to do, at	-0.124939
-0.293256	loop body begins at	-0.124939
-0.381658	time-consuming garbage collector at	-0.124939
-0.237064	have been lost at	-0.124939
-0.237064	chapter is aiming at	-0.124939
-0.237064	Templates are instantiated at	-0.124939
-0.237064	will calculate (1./1.2345) at	-0.124939
-0.237064	the dispatch decision at	-0.124939
-0.237064	Usability for Nerds at	-0.124939
-0.237064	A Pragmatic Look at	-0.124939
-0.237064	not need relocation at	-0.124939
-0.237064	stupid things. Looking at	-0.124939
-0.237064	may come unpredictably at	-0.124939
-0.237064	temporary debug breakpoints at	-0.124939
-1.021263	use of the data	-0.124939
-1.533275	address of the data	-0.124939
-1.208389	instance of the data	-0.124939
-0.863633	efficiency of the data	-0.124939
-0.581883	sizes of the data	-0.124939
-0.581883	analysis of the data	-0.124939
-0.199253	cache and the data	-0.124939
-0.376163	contentions in the data	-0.425969
-0.839665	than if the data	-0.124939
-0.987210	efficient if the data	-0.124939
-1.182273	faster if the data	-0.124939
-0.569191	CPUs if the data	-0.124939
-0.593270	efficiently when the data	-0.124939
-1.712277	to make the data	-0.124939
-0.580823	possible into the data	-0.124939
-1.391668	cases where the data	-0.124939
-0.577644	which makes the data	-0.124939
-0.519744	(4) access the data	-0.124939
-0.565652	split up the data	-0.124939
-1.140601	by making the data	-0.124939
-0.386512	data. Therefore, the data	-0.124939
-0.386512	consuming. Therefore, the data	-0.124939
-0.714897	the smaller the data	-0.124939
-0.353794	than processing the data	-0.124939
-0.551959	you divide the data	-0.124939
-0.353794	for converting the data	-0.124939
-0.353794	by reordering the data	-0.124939
-0.353794	from aligning the data	-0.124939
-0.649659	to manipulate the data	-0.124939
-0.457115	by organizing the data	-0.124939
-0.141824	vector. Organize the data	-0.124939
-0.141824	bottleneck. Organize the data	-0.124939
-0.895550	data. This is data	-0.124939
-0.597957	offset of a data	-0.124939
-0.357727	simple cases, a data	-0.124939
-0.525520	for accessing a data	-0.124939
-0.503339	structures. Accessing a data	-0.124939
-0.841867	realistic set of data	-0.124939
-1.824941	the size of data	-0.124939
-0.566320	and type of data	-0.124939
-1.134594	the case of data	-0.124939
-1.037264	a lot of data	-0.124939
-0.499080	per byte of data	-0.124939
-0.536079	if pieces of data	-0.124939
-0.065262	16. Alignment of data	-0.124939
-0.031406	9.5 Alignment of data	-0.124939
-0.065262	7.2. Alignment of data	-0.124939
-0.651406	the contents of data	-0.124939
-0.473687	or pointers to data	-0.124939
-0.473687	Any pointers to data	-0.124939
-0.462453	Internal references to data	-0.124939
-0.066401	of code and data	-0.124939
-0.144820	when code and data	-0.425969
-0.456583	cache use and data	-0.124939
-0.831942	public functions and data	-0.124939
-0.923159	code cache and data	-0.124939
-0.353375	code caching and data	-0.124939
-0.353375	on algorithms and data	-0.124939
-0.353375	functional decomposition and data	-0.124939
-0.585051	Func(a[i]); } The data	-0.124939
-0.558236	vector data. The data	-0.124939
-0.459850	array index. The data	-0.124939
-0.355949	members (properties) The data	-0.124939
-0.459850	multiple processes. The data	-0.124939
-0.550579	applications require that data	-0.124939
-0.502478	of program or data	-0.124939
-0.357111	code size or data	-0.124939
-0.357257	data explicitly if data	-0.124939
-0.357257	is poor if data	-0.124939
-0.358354	the vectors. This data	-0.124939
-0.358056	may sample more data	-0.124939
-0.106573	less efficiently when data	-0.425969
-0.598581	accessing the same data	-0.124939
-0.580652	lists and other data	-0.124939
-0.544639	database, or other data	-0.124939
-1.380757	order in which data	-0.124939
-0.756821	size of all data	-0.124939
-0.818214	operations on all data	-0.124939
-0.346733	by copying all data	-0.124939
-0.346733	to contain all data	-0.124939
-0.953360	most often used data	-0.124939
-1.270966	of a class data	-0.124939
-0.521081	the parent class data	-0.124939
-0.495149	merge the multiple data	-0.124939
-0.351855	performed on multiple data	-0.124939
-0.357106	by allowing two data	-0.124939
-0.509077	advantage of static data	-0.124939
-0.526149	tables. The static data	-0.124939
-0.346487	problems because static data	-0.124939
-0.357173	a structure where data	-0.124939
-0.301070	memory. This makes data	-0.124939
-0.301070	program. This makes data	-0.124939
-0.301070	class. This makes data	-0.124939
-0.301070	needed. This makes data	-0.124939
-0.301070	order. This makes data	-0.124939
-0.301070	bits. This makes data	-0.124939
-0.301070	fragmented. This makes data	-0.124939
-0.452191	stack, which makes data	-0.124939
-1.166881	before the first data	-0.124939
-0.361413	set of test data	-0.124939
-0.476068	data. The test data	-0.124939
-0.344737	}; // constant data	-0.124939
-0.344737	memory. Copying constant data	-0.124939
-0.797105	useful for making data	-0.124939
-0.355485	cases, but its data	-0.124939
-0.355503	page 26 about data	-0.124939
-0.112897	contentions in large data	-0.425969
-0.375911	optimized for large data	-0.124939
-0.288561	applications with large data	-0.124939
-0.288561	calculations on large data	-0.124939
-0.640451	for very large data	-0.124939
-0.288561	server. Use large data	-0.124939
-0.355371	multiple threads, while data	-0.124939
-0.295479	that have big data	-0.124939
-0.418583	you have big data	-0.124939
-0.486404	to very big data	-0.124939
-0.442581	get as much data	-0.124939
-0.481912	has too much data	-0.124939
-0.576597	space to store data	-0.124939
-0.650074	to store intermediate data	-0.124939
-0.474622	access a public data	-0.124939
-0.597076	functions and public data	-0.124939
-0.558140	thread its own data	-0.124939
-0.351032	disadvantage of binary data	-0.124939
-0.351032	a plain old data	-0.124939
-0.350493	for more advanced data	-0.124939
-0.641465	in the level-1 data	-0.124939
-0.326899	than the level-1 data	-0.124939
-0.326899	where the level-1 data	-0.124939
-0.245390	is a level-1 data	-0.124939
-0.348466	data, including local data	-0.124939
-0.558258	putting the right data	-0.124939
-0.638026	reading and writing data	-0.124939
-0.346805	generates too little data	-0.124939
-0.633983	compilers will align data	-0.124939
-0.344579	function cannot modify data	-0.124939
-0.344491	overflow on input data	-0.124939
-0.092897	access any non-static data	-0.425969
-0.343003	finished the time-consuming data	-0.124939
-0.338904	in a far data	-0.124939
-0.620233	method of storing data	-0.124939
-0.954411	use the smallest data	-0.124939
-0.335640	remote help files, data	-0.124939
-0.187723	have to prefetch data	-0.124939
-0.187723	modern processors prefetch data	-0.124939
-0.187723	to automatically prefetch data	-0.124939
-0.335640	processing, signal processing, data	-0.124939
-0.335784	Memory access Accessing data	-0.124939
-0.331111	mirror the remote data	-0.124939
-0.331452	instruction set. Aligning data	-0.124939
-0.093197	96 9.9 Access data	-0.124939
-0.093197	www.agner.org/optimize/cppexamples.zip. 9.9 Access data	-0.124939
-0.324615	vectors RGB image data	-0.124939
-0.093197	performance. 7.18 Class data	-0.124939
-0.093197	51 7.18 Class data	-0.124939
-0.324615	vectorization favorable: Small data	-0.124939
-0.313999	Misaligned data. Extra data	-0.124939
-0.313999	used for prefetching data	-0.124939
-0.313999	macro for aligning data	-0.124939
-0.314272	penalty for organizing data	-0.124939
-0.407224	section and read-only data	-0.124939
-0.574237	code that accesses data	-0.124939
-0.293515	buffer or send data	-0.124939
-0.381975	way of keeping data	-0.124939
-0.102637	to make thread-specific data	-0.124939
-0.102637	for containing thread-specific data	-0.124939
-0.293515	unit of received data	-0.124939
-0.381975	need to organize data	-0.124939
-0.237291	to optimization. Prefetching data	-0.124939
-0.237291	common to exchange data	-0.124939
-0.237291	functions Encryption, decryption, data	-0.124939
-0.237291	concentrated on arranging data	-0.124939
-0.237291	shared. Any writable data	-0.124939
-0.237291	less favorable: Larger data	-0.124939
-0.237291	file containing numerical data	-0.124939
-0.237291	Aligning data Loading data	-0.124939
-0.237291	the data structure, data	-0.124939
-1.228454	size of the program	-0.124939
-1.528455	version of the program	-0.124939
-0.586500	part of the program	-0.176091
-0.644929	parts of the program	-0.124939
-0.782506	installation of the program	-0.124939
-0.537596	logic of the program	-0.124939
-0.537596	modification of the program	-0.124939
-0.537596	clarity of the program	-0.124939
-0.569702	elements and the program	-0.124939
-0.569702	support, and the program	-0.124939
-0.826366	table in the program	-0.124939
-0.562006	later in the program	-0.124939
-0.562006	overhead in the program	-0.124939
-0.826366	spent in the program	-0.124939
-0.826366	spots in the program	-0.124939
-0.562006	handler in the program	-0.124939
-0.562006	elsewhere in the program	-0.124939
-0.562006	delays in the program	-0.124939
-0.931111	only if the program	-0.124939
-0.546901	systems if the program	-0.124939
-1.121921	even if the program	-0.124939
-0.931111	But if the program	-0.124939
-0.546901	Test if the program	-0.124939
-0.546901	happens if the program	-0.124939
-0.583371	out by the program	-0.124939
-1.012436	resources than the program	-0.124939
-1.033594	each time the program	-0.124939
-1.087557	every time the program	-0.124939
-0.469184	memory when the program	-0.124939
-0.469184	program when the program	-0.124939
-0.469184	there when the program	-0.124939
-0.469184	want when the program	-0.124939
-0.469184	files when the program	-0.124939
-0.322100	initialized when the program	-0.425969
-0.469184	inputs when the program	-0.124939
-0.668611	resolved when the program	-0.124939
-0.506154	not make the program	-0.124939
-0.838297	will make the program	-0.124939
-0.506154	course make the program	-0.124939
-0.523649	long. If the program	-0.124939
-0.523649	0x1C. If the program	-0.124939
-0.523649	containers. If the program	-0.124939
-0.523649	analysis. If the program	-0.124939
-0.567888	time, but the program	-0.124939
-0.571681	instruments into the program	-0.124939
-0.570519	only makes the program	-0.124939
-0.820008	called before the program	-0.124939
-0.497546	values before the program	-0.124939
-0.497546	resolved before the program	-0.124939
-0.523518	just want the program	-0.124939
-0.451737	and while the program	-0.124939
-0.451737	break while the program	-0.124939
-0.306784	and compile the program	-0.124939
-0.433658	you compile the program	-0.124939
-0.735338	to run the program	-0.124939
-0.544400	locked after the program	-0.124939
-0.523518	function. When the program	-0.124939
-0.550441	postponed until the program	-0.124939
-0.488707	to modify the program	-0.124939
-0.347213	to update the program	-0.124939
-0.347213	other words, the program	-0.124939
-0.347213	which determines the program	-0.124939
-0.448783	logic behind the program	-0.124939
-0.636748	and stop the program	-0.124939
-0.347213	interface. Otherwise the program	-0.124939
-0.347213	condition terminates the program	-0.124939
-0.337467	loop of a program	-0.425969
-0.472898	part of a program	-0.124939
-0.829810	versions of a program	-0.124939
-0.502183	speed of a program	-0.124939
-0.502183	structure of a program	-0.124939
-0.502183	redesign of a program	-0.124939
-0.865979	data in a program	-0.124939
-0.583108	loop in a program	-0.124939
-1.013255	Assume that a program	-0.124939
-0.778241	example, if a program	-0.124939
-0.477166	inefficient if a program	-0.124939
-0.477166	occur if a program	-0.124939
-0.580138	tag on a program	-0.124939
-0.576348	verify than a program	-0.124939
-0.561154	advantageous when a program	-0.124939
-0.449611	pointer. If a program	-0.124939
-0.449611	inefficient. If a program	-0.124939
-0.449611	ways). If a program	-0.124939
-0.417252	situation where a program	-0.124939
-0.350113	before running a program	-0.124939
-0.642414	to load a program	-0.124939
-0.350113	slow down a program	-0.124939
-0.706489	to install a program	-0.124939
-0.350113	on redesigning a program	-0.124939
-1.837128	the size of program	-0.124939
-0.525719	possible cases of program	-0.124939
-1.510946	a piece of program	-0.124939
-0.461792	low priority of program	-0.124939
-1.276120	in terms of program	-0.124939
-1.039449	scattered around in program	-0.124939
-0.357879	modules contiguous in program	-0.124939
-0.357879	reproducibility. Delays in program	-0.124939
-0.555973	load time. The program	-0.124939
-0.529369	described below. The program	-0.124939
-0.768149	are called. The program	-0.124939
-0.744909	the calculations. The program	-0.124939
-0.453704	instruction sets. The program	-0.124939
-0.453704	at initialization. The program	-0.124939
-0.453704	always true. The program	-0.124939
-0.708743	or *.so). The program	-0.124939
-0.453704	just-in-time compilation. The program	-0.124939
-0.351104	been deallocated. The program	-0.124939
-0.351104	with interpretation. The program	-0.124939
-0.351104	Linux, sched_setaffinity). The program	-0.124939
-0.351104	and error-prone. The program	-0.124939
-0.540565	execution speed or program	-0.124939
-0.358506	negative impacts on program	-0.124939
-0.358118	this reason. A program	-0.124939
-0.487444	Make a C++ program	-0.124939
-0.346301	produces another C++ program	-0.124939
-0.346301	a well-structured C++ program	-0.124939
-0.812912	that it makes program	-0.124939
-0.294633	in the test program	-0.124939
-0.294633	by the test program	-0.124939
-0.693361	make a test program	-0.124939
-0.329545	a small test program	-0.124939
-0.459494	made a Windows program	-0.124939
-0.808877	of a big program	-0.124939
-0.459397	Basic, etc. But program	-0.124939
-0.523092	a highly optimized program	-0.124939
-0.246588	a console mode program	-0.124939
-0.072687	A console mode program	-0.425969
-0.129394	processors. The application program	-0.124939
-0.129394	library. The application program	-0.124939
-0.407888	in an application program	-0.124939
-0.454620	to the calling program	-0.124939
-0.361142	switch in your program	-0.124939
-0.427617	must make your program	-0.124939
-0.276446	counters inside your program	-0.124939
-0.276446	frame unless your program	-0.124939
-0.492743	*.so). The installation program	-0.124939
-0.827555	for the desired program	-0.124939
-0.237871	for the whole program	-0.124939
-0.237871	throughout the whole program	-0.124939
-0.068142	option for whole program	-0.124939
-0.068142	support for whole program	-0.124939
-0.149002	can do whole program	-0.124939
-0.149002	feature called whole program	-0.124939
-0.149002	"__attribute__((visibility("hidden")))". Use whole program	-0.124939
-0.149002	of doing whole program	-0.124939
-0.349399	actually used. No program	-0.124939
-1.058287	in the final program	-0.124939
-0.766014	a more clear program	-0.124939
-0.304218	is used during program	-0.124939
-0.304218	array grows during program	-0.124939
-0.396176	in the entire program	-0.124939
-0.396176	making the entire program	-0.124939
-0.346102	Application programmers rarely program	-0.124939
-0.336023	if the 7 program	-0.124939
-0.314436	run a speed-critical program	-0.124939
-0.314436	a more well-structured program	-0.124939
-0.065745	Profile-guided optimization Whole program	-0.124939
-0.065745	a program. Whole program	-0.124939
-0.065745	optimization /Og Whole program	-0.124939
-0.407765	a computationally intensive program	-0.124939
-0.293932	useful for preventing program	-0.124939
-0.293932	development time, usability, program	-0.124939
-0.237658	testing and analyzing program	-0.124939
-0.237658	installation of downloaded program	-0.124939
-0.237658	/GL --combine -fwhole- program	-0.124939
-0.237658	Use an antivirus program	-0.124939
-1.391653	a function that has	-0.124939
-0.903396	function library that has	-0.124939
-0.354992	a file that has	-0.124939
-0.652029	memory block that has	-0.124939
-0.354992	extra iteration that has	-0.124939
-0.499518	that everything that has	-0.124939
-1.489174	so that it has	-0.124939
-0.561464	discovers that it has	-0.124939
-0.491096	branch if it has	-0.124939
-0.491096	pointers if it has	-0.124939
-0.491096	check if it has	-0.124939
-0.491096	safe if it has	-0.124939
-0.962600	even when it has	-0.124939
-0.542506	compiler because it has	-0.124939
-0.542506	up because it has	-0.124939
-0.438865	array, which it has	-0.124939
-0.844811	function, but it has	-0.124939
-0.437193	(c+d) before it has	-0.124939
-0.437193	sub-vector before it has	-0.124939
-0.339345	the work it has	-0.124939
-0.558677	declared. Therefore, it has	-0.124939
-0.137339	object after it has	-0.124939
-0.137339	accessed after it has	-0.124939
-0.339345	to anything it has	-0.124939
-1.194010	if the function has	-0.124939
-0.583452	each member function has	-0.124939
-0.582991	sure the code has	-0.124939
-0.582991	Whenever the code has	-0.124939
-0.581712	automatically. The code has	-0.124939
-0.744164	} This code has	-0.124939
-0.518351	Not all code has	-0.124939
-0.453323	else. System code has	-0.124939
-0.558825	FuncC(i+1); } This has	-0.124939
-0.334781	the function. This has	-0.425969
-0.518105	C++ program. This has	-0.124939
-0.138670	1 cache. This has	-0.124939
-0.138670	level-2 cache. This has	-0.124939
-0.691829	is called. This has	-0.124939
-0.343599	is executed. This has	-0.124939
-0.343599	+= list[i]; This has	-0.124939
-0.343599	32 bytes). This has	-0.124939
-1.251528	that the compiler has	-0.124939
-1.041782	because the compiler has	-0.124939
-0.952099	If the compiler has	-0.124939
-0.814380	Here, the compiler has	-0.124939
-0.526811	2. The compiler has	-0.124939
-0.526811	one. The compiler has	-0.124939
-0.526811	module. The compiler has	-0.124939
-0.512257	37 A compiler has	-0.124939
-1.248761	the Intel compiler has	-0.124939
-1.144140	The Intel compiler has	-0.124939
-0.339677	The Codeplay compiler has	-0.124939
-0.658404	public CParent<CChild1> { has	-0.124939
-0.462589	during this time has	-0.124939
-0.341040	a pointer. It has	-0.124939
-0.352737	non-Intel processors). It has	-0.124939
-1.593989	of the program has	-0.124939
-0.830775	of a program has	-0.124939
-0.284939	if a program has	-0.124939
-0.394362	when a program has	-0.124939
-0.625908	If a program has	-0.124939
-0.555473	where a program has	-0.124939
-0.466391	*.so). The program has	-0.124939
-0.466391	error-prone. The program has	-0.124939
-0.236229	because the CPU has	-0.301030
-0.357743	the unit-test but has	-0.124939
-0.565219	that the integer has	-0.124939
-0.456744	a 32-bit integer has	-0.124939
-0.561356	64-bit instruction set has	-0.124939
-0.966989	x86 instruction set has	-0.124939
-0.561356	(the instruction set has	-0.124939
-0.565654	If the class has	-0.124939
-0.353219	a polymorphic class has	-0.124939
-1.272403	in this example has	-0.124939
-0.549041	in Intel compilers has	-0.124939
-0.502953	small code size has	-0.124939
-0.696000	of the pointer has	-0.124939
-0.434106	then the pointer has	-0.124939
-0.494175	the function pointer has	-0.124939
-0.438232	previous link pointer has	-0.124939
-0.357466	b because b has	-0.124939
-0.530804	and the library has	-0.124939
-0.411596	code. The library has	-0.124939
-0.411596	program or library has	-0.124939
-0.056741	platforms. This library has	-0.425969
-0.122026	7.2). This library has	-0.124939
-0.122026	-mveclibabi=svml. This library has	-0.124939
-0.550498	vector class library has	-0.124939
-0.532110	register the object has	-0.124939
-0.532110	destructor the object has	-0.124939
-0.812632	A shared object has	-0.124939
-0.357209	functions, where static has	-0.124939
-0.357093	language While C++ has	-0.124939
-0.345955	latter function also has	-0.124939
-0.345955	register stack also has	-0.124939
-0.345955	Loop unrolling also has	-0.124939
-0.593968	until the value has	-0.124939
-0.860746	of the table has	-0.124939
-0.502116	optimization of performance has	-0.124939
-0.356853	a suboptimal way has	-0.124939
-0.568447	If a template has	-0.124939
-0.918267	number of registers has	-0.124939
-0.544112	of vector registers has	-0.124939
-0.341310	if the user has	-0.124939
-0.436890	that a user has	-0.124939
-0.355599	the variable always has	-0.124939
-1.633289	the operating system has	-0.124939
-0.546728	because the file has	-0.124939
-0.355822	instructions. Each type has	-0.124939
-0.355565	or another error has	-0.124939
-0.949350	if the processor has	-0.124939
-0.428180	Whenever a processor has	-0.124939
-0.330827	simultaneously. This processor has	-0.124939
-0.795631	the array element has	-0.124939
-0.567598	MASM assembly language has	-0.124939
-0.521570	stack. Each thread has	-0.124939
-0.458667	Floating point overflow has	-0.124939
-0.852652	Each cache line has	-0.124939
-0.458054	registers. This problem has	-0.124939
-1.010224	a linked list has	-0.124939
-0.354325	the pipeline structure has	-0.124939
-0.354084	point rounding mode has	-0.124939
-1.239770	the repeat count has	-0.124939
-0.713955	the heap space has	-0.124939
-0.338361	that the microprocessor has	-0.124939
-0.351554	because the microprocessor has	-0.124939
-0.494731	If the microprocessor has	-0.124939
-0.836847	if the application has	-0.124939
-0.520458	each CPU model has	-0.124939
-0.455729	if the parameter has	-0.124939
-0.578786	if the programmer has	-0.124939
-0.416986	the static keyword has	-0.124939
-0.416986	The static keyword has	-0.124939
-0.455273	where each addition has	-0.124939
-0.351473	the user actually has	-0.124939
-0.966973	of hardware platform has	-0.124939
-0.959129	of the operands has	-0.124939
-0.496841	variable in main has	-0.124939
-0.552558	If the computer has	-0.124939
-0.349948	the pointer p has	-0.124939
-0.451927	The C++ syntax has	-0.124939
-0.545106	fact, the STL has	-0.124939
-0.513666	copy Function inlining has	-0.124939
-0.349482	A template instance has	-0.124939
-0.349549	compiled as position-independent has	-0.124939
-0.545127	then the offset has	-0.124939
-0.778551	when the heap has	-0.124939
-0.345886	overflow doesn't occur has	-0.124939
-0.798123	the main executable has	-0.124939
-0.344397	wait until seconds has	-0.124939
-0.382183	possible if F1 has	-0.124939
-0.293684	exception then F1 has	-0.124939
-0.344594	not selected. Compiler has	-0.124939
-0.690054	oriented programming style has	-0.124939
-0.536460	though the latter has	-0.124939
-0.529608	Each dependency chain has	-0.124939
-0.341182	end user who has	-0.124939
-0.338692	D language. D has	-0.124939
-0.778878	The heap manager has	-0.124939
-0.681822	a hot spot has	-0.124939
-0.324515	and shared_ptr. auto_ptr has	-0.124939
-0.313902	ArrayOfStructures[100]; This reordering has	-0.124939
-0.313902	major platforms. Pascal has	-0.124939
-0.381862	7.32b. A for-loop has	-0.124939
-0.293422	three functions. Sum1 has	-0.124939
-0.381862	that the reader has	-0.124939
-0.293422	(without member functions) has	-0.124939
-0.237210	"Gnu indirect function" has	-0.124939
-0.237210	is that CParent::Hello() has	-0.124939
-0.237210	the stack. Deallocation has	-0.124939
-0.237210	in example 8.23b has	-0.124939
-0.237210	option -fno-pic apparently has	-0.124939
-0.237210	loop initialisation i=0; has	-0.124939
-0.887971	set is the vector	-0.124939
-1.588707	size of the vector	-0.124939
-1.476915	multiple of the vector	-0.124939
-0.595480	library and the vector	-0.124939
-0.892473	set for the vector	-0.124939
-0.561054	aligned by the vector	-0.124939
-0.846708	divisible by the vector	-0.425969
-0.592347	smaller than the vector	-0.124939
-0.561098	branches at the vector	-0.124939
-0.561098	lookup at the vector	-0.124939
-0.593724	CPU. If the vector	-0.124939
-1.147100	of using the vector	-0.124939
-0.584876	nicely into the vector	-0.124939
-0.537833	11. Using the vector	-0.124939
-0.886854	elements of a vector	-0.124939
-1.159210	elements in a vector	-0.124939
-0.872640	result in a vector	-0.124939
-0.567125	STL as a vector	-0.124939
-0.835823	organized as a vector	-0.124939
-0.456376	data into a vector	-0.124939
-0.456376	y into a vector	-0.124939
-0.456376	fit into a vector	-0.124939
-0.456376	packed into a vector	-0.124939
-0.832934	situations where a vector	-0.124939
-0.353737	can calculate a vector	-0.124939
-0.018922	// Make a vector	-0.903090
-1.139554	The use of vector	-0.124939
-1.389702	The size of vector	-0.124939
-1.430735	take advantage of vector	-0.124939
-1.174843	different kinds of vector	-0.124939
-0.357392	further extension of vector	-0.124939
-0.549455	simple processors and vector	-0.124939
-0.358087	(chapter 11) and vector	-0.124939
-1.397371	of elements in vector	-0.124939
-0.218565	each element in vector	-0.249877
-0.722995	vector registers. The vector	-0.124939
-0.502743	64 bits. The vector	-0.124939
-0.357302	operations. 105 The vector	-0.124939
-0.538705	poor performance for vector	-0.124939
-0.461533	not suited for vector	-0.124939
-0.357274	structures. Useful for vector	-0.124939
-0.659036	#include "vectorclass.h" // vector	-0.124939
-0.129090	intrinsic functions or vector	-0.425969
-0.585718	size divisible by vector	-0.124939
-0.454150	simple integer with vector	-0.124939
-0.351455	only available with vector	-0.124939
-0.494593	aligned arrays with vector	-0.124939
-0.529900	a problem with vector	-0.124939
-0.351455	12.6. Function with vector	-0.124939
-0.351455	CPU dispatching with vector	-0.124939
-0.460899	same size as vector	-0.124939
-1.208093	be implemented as vector	-0.124939
-0.462688	Today's microprocessors have vector	-0.124939
-0.961150	possible to use vector	-0.124939
-0.743167	advantageous to use vector	-0.124939
-0.539047	easier to use vector	-0.124939
-0.574062	compilers can use vector	-0.124939
-0.346420	can also use vector	-0.124939
-0.358135	SSE4.1 some more vector	-0.124939
-0.475370	functions for integer vector	-0.124939
-0.455554	few more integer vector	-0.124939
-0.132148	store aligned integer vector	-0.124939
-0.132148	load aligned integer vector	-0.124939
-0.012848	store unaligned integer vector	-0.602060
-0.012848	load unaligned integer vector	-0.602060
-0.357583	Example 7.41a class vector	-0.124939
-0.521947	size of each vector	-0.124939
-0.529181	elements in each vector	-0.124939
-0.445901	available then each vector	-0.124939
-1.307221	you are using vector	-0.124939
-0.587224	loop by using vector	-0.124939
-0.492473	use of Intel vector	-0.124939
-0.519377	AMD and Intel vector	-0.124939
-0.558754	www.agner.org/optimize/#vectorclass. The Intel vector	-0.124939
-0.335000	example, using Intel vector	-0.124939
-0.613267	math libraries: Intel vector	-0.124939
-0.335000	follows (using Intel vector	-0.124939
-0.014541	from cc into vector	-0.726999
-0.014541	from bb into vector	-0.726999
-0.555206	after the 64-bit vector	-0.124939
-0.851667	The most efficient vector	-0.124939
-0.357252	Define biggest possible vector	-0.124939
-0.472994	using a long vector	-0.124939
-0.472994	With a long vector	-0.124939
-0.336392	of some long vector	-0.124939
-0.336392	math libraries: long vector	-0.124939
-0.061545	a 128 bit vector	-0.301030
-0.333151	two 128- bit vector	-0.124939
-0.586664	support a new vector	-0.124939
-0.315542	However, the short vector	-0.124939
-0.409137	With a short vector	-0.124939
-0.315542	list of short vector	-0.124939
-0.315542	libraries and short vector	-0.124939
-0.315542	libraries: Intel short vector	-0.124939
-0.524043	lists the available vector	-0.124939
-0.558007	add the constant vector	-0.124939
-0.073461	Store the result vector	-0.726999
-0.521010	addition with another vector	-0.124939
-0.121759	access. 12 Using vector	-0.124939
-0.121759	103 12 Using vector	-0.124939
-0.121759	109 12.5 Using vector	-0.124939
-0.121759	section. 12.5 Using vector	-0.124939
-0.331594	useful for Boolean vector	-0.124939
-0.331594	branch mispredictions. Boolean vector	-0.124939
-0.353224	double. The intrinsic vector	-0.124939
-0.351786	Windows). The XMM vector	-0.124939
-0.351786	advantage of bigger vector	-0.124939
-0.514601	Intel compiler supports vector	-0.124939
-0.528186	found in my vector	-0.124939
-0.349910	automatic parallelization. Supports vector	-0.124939
-0.410865	into an STL vector	-0.124939
-0.316934	four objects. STL vector	-0.124939
-0.058698	use the #pragma vector	-0.124939
-0.058698	when the #pragma vector	-0.124939
-0.126587	then use #pragma vector	-0.124939
-0.126587	vector always #pragma vector	-0.124939
-0.126587	Noncached write #pragma vector	-0.124939
-0.058698	is aligned #pragma vector	-0.124939
-0.058698	vector aligned #pragma vector	-0.124939
-0.126587	vector nontemporal #pragma vector	-0.124939
-0.126587	("internal"))) Vectorize #pragma vector	-0.124939
-0.348525	set of special vector	-0.124939
-0.819912	into the right vector	-0.124939
-0.386781	{ // Define vector	-0.124939
-0.114805	<dvec.h> // Define vector	-0.425969
-0.271313	"vectorclass.h" // Define vector	-0.124939
-0.215960	into a 128-bit vector	-0.124939
-0.264942	that supported 128-bit vector	-0.124939
-0.345892	with massively parallel vector	-0.124939
-0.345954	does not allow vector	-0.124939
-0.344805	from me. My vector	-0.124939
-0.287202	emulate a 256-bit vector	-0.124939
-0.287202	use one 256-bit vector	-0.124939
-0.339332	AVX, AVX2 Mathematical vector	-0.124939
-0.473131	whether the largest vector	-0.124939
-0.335802	example, using Agner vector	-0.124939
-0.038215	example using Agner's vector	-0.124939
-0.038215	vector classes Agner's vector	-0.124939
-0.038215	page 107). Agner's vector	-0.124939
-0.038215	option -mveclibabi=acml. Agner's vector	-0.124939
-0.038215	amd_vrs4_expf amd_vrd2_exp Agner's vector	-0.124939
-0.428776	using the larger vector	-0.124939
-0.324966	eight or sixteen vector	-0.124939
-0.324966	got RISC cores, vector	-0.124939
-0.314395	y = b;} vector	-0.124939
-0.293691	workstations and scientific vector	-0.124939
-0.102694	libraries of predefined vector	-0.124939
-0.102694	functions Use predefined vector	-0.124939
-0.237446	most processors (when vector	-0.124939
-0.237446	and other odd-sized vector	-0.124939
-0.237446	XOR b Bit vector	-0.124939
-0.237446	y + a.y);} vector	-0.124939
-0.237446	{ // 2-dimensional vector	-0.124939
-0.585391	line or a make	-0.124939
-0.890227	such as a make	-0.124939
-0.598828	this is to make	-0.124939
-0.677366	way is to make	-0.124939
-0.423811	pointers is to make	-0.124939
-0.802305	problem is to make	-0.124939
-0.502993	solution is to make	-0.124939
-0.423811	jobs is to make	-0.124939
-0.423811	goal is to make	-0.124939
-0.151095	zero and to make	-0.425969
-0.423795	so as to make	-0.124939
-0.327319	maintenance - to make	-0.124939
-1.022075	the compiler to make	-0.124939
-1.159450	You have to make	-0.124939
-0.936026	compiler has to make	-0.124939
-0.598804	can do to make	-0.124939
-0.789537	more efficient to make	-0.425969
-0.461387	the array to make	-0.124939
-0.673755	is possible to make	-0.221849
-0.657902	often possible to make	-0.124939
-0.327319	fourth value to make	-0.124939
-1.064776	it takes to make	-0.124939
-0.498326	in order to make	-0.321233
-0.173057	only way to make	-0.425969
-0.662182	easiest way to make	-0.124939
-0.527825	even faster to make	-0.124939
-0.479499	of how to make	-0.124939
-0.736293	for how to make	-0.124939
-0.479499	shows how to make	-0.124939
-0.564131	discusses how to make	-0.124939
-0.510680	be useful to make	-0.124939
-0.912157	are sure to make	-0.124939
-0.610727	may want to make	-0.124939
-0.697877	you want to make	-0.301030
-0.610727	who want to make	-0.124939
-1.328775	is important to make	-0.124939
-0.101114	is common to make	-0.124939
-1.267823	be advantageous to make	-0.124939
-0.492517	better solution to make	-0.124939
-0.327319	the structure to make	-0.124939
-1.407920	is recommended to make	-0.124939
-0.868911	not recommended to make	-0.124939
-0.423795	CPU dispatching to make	-0.124939
-0.327319	more complicated to make	-0.124939
-0.796718	compiler needs to make	-0.124939
-0.114296	the programmer to make	-0.492916
-0.327319	Metaprogramming means to make	-0.124939
-0.656332	may choose to make	-0.124939
-0.771742	three ways to make	-0.124939
-0.423795	ingenious things to make	-0.124939
-0.423795	8.23a. Loop to make	-0.124939
-0.423795	a destructor to make	-0.124939
-0.461387	be safe to make	-0.124939
-0.327319	a subexpression to make	-0.124939
-0.747232	is easy to make	-0.124939
-0.481477	is convenient to make	-0.124939
-0.061651	the effort to make	-0.124939
-0.461387	often preferable to make	-0.124939
-0.747232	good idea to make	-0.124939
-0.327319	systematic manner to make	-0.124939
-0.598804	is sufficient to make	-0.124939
-0.327319	code carefully to make	-0.124939
-0.598804	you forget to make	-0.124939
-0.327319	I tried to make	-0.124939
-0.327319	is advisable to make	-0.124939
-0.327319	also tends to make	-0.124939
-0.519593	thread-specific data and make	-0.124939
-0.353691	possible case and make	-0.124939
-0.550649	random times and make	-0.124939
-0.649455	the problem and make	-0.124939
-0.353691	to list and make	-0.124939
-0.353691	the conversions and make	-0.124939
-0.867703	if possible, and make	-0.124939
-0.456984	of truncation and make	-0.124939
-0.353691	software package and make	-0.124939
-0.353691	hot spot and make	-0.124939
-0.353691	are aligned, and make	-0.124939
-0.353691	an error; and make	-0.124939
-0.549575	vector instructions that make	-0.124939
-0.457077	.cpp modules that make	-0.124939
-0.353764	The conditions that make	-0.124939
-0.353764	other details that make	-0.124939
-0.353764	some disadvantages that make	-0.124939
-0.141815	register. Factors that make	-0.124939
-0.141815	is. Factors that make	-0.124939
-0.353764	other complications that make	-0.124939
-0.865176	factors that can make	-0.124939
-1.413564	the compiler can make	-0.124939
-1.069054	that you can make	-0.124939
-0.771794	how you can make	-0.124939
-0.531469	Instead, you can make	-0.124939
-0.964933	Most compilers can make	-0.124939
-0.348197	new instructions can make	-0.124939
-0.348197	the application can make	-0.124939
-0.493044	...). We can make	-0.124939
-0.493044	PowerPC). We can make	-0.124939
-0.490069	This tool can make	-0.124939
-0.348197	fastcall modifier can make	-0.124939
-0.504819	== 2 // make	-0.124939
-0.358586	cache space or make	-0.124939
-0.790971	compilers do not make	-0.124939
-0.542392	studied do not make	-0.124939
-0.581494	long does not make	-0.124939
-0.479198	function. Do not make	-0.124939
-0.479198	efficient. Do not make	-0.124939
-1.324510	then you may make	-0.124939
-0.544576	better, you may make	-0.124939
-0.548587	pointer. You may make	-0.124939
-0.548587	incompatible. You may make	-0.124939
-0.354391	if only you make	-0.124939
-0.585488	64 If you make	-0.124939
-0.354391	support. Then you make	-0.124939
-0.305025	function. This will make	-0.124939
-0.305025	slices. This will make	-0.124939
-0.305025	-axAVX. This will make	-0.124939
-0.305025	(4096). This will make	-0.124939
-0.787773	Gnu compiler will make	-0.124939
-0.657049	and this will make	-0.124939
-0.439167	other compilers will make	-0.124939
-0.439167	some compilers will make	-0.124939
-0.785753	Most compilers will make	-0.124939
-0.461844	Instead, I will make	-0.124939
-0.327655	integer overflow will make	-0.124939
-0.327655	& b; will make	-0.124939
-0.327655	= b++; will make	-0.124939
-0.350237	function library then make	-0.124939
-0.452607	of parameters then make	-0.124939
-0.350237	point counter then make	-0.124939
-0.350237	Intel compiler, then make	-0.124939
-0.581419	style. Some compilers make	-0.124939
-0.519153	CodeGear compiler cannot make	-0.124939
-0.786029	and you cannot make	-0.124939
-0.738469	that they cannot make	-0.124939
-0.334756	variables Compilers cannot make	-0.124939
-0.346953	problems you must make	-0.124939
-0.346953	words, you must make	-0.124939
-1.275605	the compiler doesn't make	-0.124939
-0.338236	two loops would make	-0.124939
-0.338236	dependency chain would make	-0.124939
-0.353812	additional parameters. Therefore, make	-0.124939
-0.543696	may of course make	-0.124939
-1.096462	Mac OS X make	-0.124939
-0.339412	OMF format. Alternatively, make	-0.124939
-0.331758	compile time. Templates make	-0.124939
-0.294061	range"); or better, make	-0.124939
-0.237772	and non-recoverable errors; make	-0.124939
-0.237772	-Wstrict-overflow=2, or (5) make	-0.124939
-0.599730	support of the different	-0.124939
-0.894728	pointers to the different	-0.124939
-0.599314	predictions in the different	-0.124939
-0.594342	priority. If the different	-0.124939
-0.490073	synchronization between the different	-0.124939
-0.490073	evenly between the different	-0.124939
-0.879230	to test the different	-0.124939
-0.572965	shows whether the different	-0.124939
-0.560942	and put the different	-0.124939
-0.357188	classes contain the different	-0.124939
-0.656392	to manipulate the different	-0.124939
-0.357188	table summarizes the different	-0.124939
-0.573114	specific size is different	-0.124939
-0.593277	pointer of a different	-0.124939
-0.554899	application to a different	-0.124939
-0.554899	jump to a different	-0.124939
-0.554899	loader to a different	-0.124939
-0.562223	integer in a different	-0.124939
-0.969197	bits in a different	-0.124939
-0.826765	result in a different	-0.124939
-0.562223	possibly in a different	-0.124939
-0.826765	defined in a different	-0.124939
-1.025992	the function a different	-0.124939
-0.833170	library with a different	-0.124939
-0.833170	modules with a different	-0.124939
-1.351405	to use a different	-0.124939
-0.546774	variables use a different	-0.124939
-0.526548	static has a different	-0.124939
-0.526548	keyword has a different	-0.124939
-1.310587	by using a different	-0.124939
-0.549936	compiler uses a different	-0.124939
-0.353162	it had a different	-0.124939
-1.965225	the number of different	-0.124939
-0.099282	when objects of different	-0.425969
-0.228203	store objects of different	-0.124939
-0.555691	Comparing performance of different	-0.124939
-0.141520	that pointers of different	-0.124939
-0.141520	two pointers of different	-0.124939
-0.496470	make arrays of different	-0.124939
-0.304512	The efficiency of different	-0.425969
-0.305679	relative efficiency of different	-0.124939
-0.352804	or strings of different	-0.124939
-0.566017	A discussion of different	-0.124939
-0.647704	time consumption of different	-0.124939
-0.128104	8.2 Comparison of different	-0.124939
-0.352804	for transposition of different	-0.124939
-0.352804	are hundreds of different	-0.124939
-0.352804	7.1. Sizes of different	-0.124939
-0.572314	that pointers to different	-0.124939
-0.358098	different priorities to different	-0.124939
-0.358098	to port to different	-0.124939
-0.358163	color settings and different	-0.124939
-0.358163	different alignments and different	-0.124939
-0.566642	same data in different	-0.124939
-1.319721	are stored in different	-0.124939
-0.554992	typically stored in different	-0.124939
-0.987294	is available in different	-0.124939
-1.214824	be implemented in different	-0.124939
-0.500229	of optimizations in different	-0.124939
-0.459281	be tested in different	-0.124939
-0.459281	are kept in different	-0.124939
-0.501941	Overloaded functions The different	-0.124939
-0.835447	level-1 cache. The different	-0.124939
-0.524050	that variable. The different	-0.124939
-0.460839	instructions sets. The different	-0.124939
-0.925738	a function for different	-0.124939
-0.492752	is different for different	-0.124939
-0.345752	different cases for different	-0.124939
-0.041669	code versions for different	-0.124939
-0.041669	different versions for different	-0.124939
-0.013450	multiple versions for different	-0.301030
-0.041669	several versions for different	-0.124939
-0.345752	to compile for different	-0.124939
-0.016249	"Calling conventions for different	-0.823909
-0.088108	Calling conventions for different	-0.124939
-0.345752	different algorithms for different	-0.124939
-0.633905	memory area for different	-0.124939
-0.345752	cache organization for different	-0.124939
-0.345752	memory spaces for different	-0.124939
-0.345752	matrix cell for different	-0.124939
-0.579124	here means that different	-0.124939
-0.599826	Integers can be different	-0.124939
-0.592844	section will be different	-0.124939
-0.558132	because there are different	-0.124939
-0.558132	where there are different	-0.124939
-0.356995	branch prediction are different	-0.124939
-0.583967	large or if different	-0.124939
-0.348773	Pentium 4 with different	-0.124939
-0.407389	be compiled with different	-0.124939
-0.407389	are compiled with different	-0.124939
-0.537668	into threads with different	-0.124939
-0.348773	different addresses with different	-0.124939
-0.562144	even compatible with different	-0.124939
-0.348773	disk. Test with different	-0.124939
-0.348773	multiple streams with different	-0.124939
-0.653062	be tested on different	-0.124939
-0.355513	is fastest on different	-0.124939
-0.459296	behave differently on different	-0.124939
-0.504254	simply treated as different	-0.124939
-0.777308	compilers will use different	-0.124939
-0.456723	integer operations use different	-0.124939
-0.353485	the threads use different	-0.124939
-0.458967	object files from different	-0.124939
-0.535641	had read from different	-0.124939
-0.064961	scattered around at different	-0.124939
-0.352337	dispatch decision at different	-0.124939
-1.360943	how to make different	-0.124939
-0.357994	file stub. If different	-0.124939
-0.587288	function for each different	-0.124939
-1.406505	able to do different	-0.124939
-0.591313	conversions by using different	-0.124939
-1.599872	a and b different	-0.124939
-0.479708	performance of two different	-0.124939
-0.500686	representations in two different	-0.124939
-0.973309	There are two different	-0.124939
-0.299630	you have two different	-0.124939
-0.299630	family have two different	-0.124939
-0.422246	can make two different	-0.124939
-0.326078	and correspondingly two different	-0.124939
-0.497546	users in many different	-0.124939
-0.422411	useful for many different	-0.124939
-0.596734	available for many different	-0.124939
-0.520386	called with many different	-0.124939
-0.455126	to have many different	-0.124939
-0.322713	called from many different	-0.124939
-0.418050	are so many different	-0.124939
-0.502271	computers have very different	-0.124939
-0.316704	program, or between different	-0.124939
-0.316704	a switch between different	-0.124939
-0.496803	precision. Conversions between different	-0.124939
-0.465528	for communication between different	-0.124939
-0.316704	thread jumps between different	-0.124939
-0.316704	function. Switch between different	-0.124939
-0.316704	multitasking environment, between different	-0.124939
-0.355462	symbolic link. Use different	-0.124939
-0.355306	casting operator These different	-0.124939
-0.278857	optimize for several different	-0.124939
-0.278857	templates for several different	-0.124939
-0.264928	There are several different	-0.124939
-0.260932	accessed by several different	-0.124939
-0.260932	test on several different	-0.124939
-0.389619	syntax has several different	-0.124939
-0.260932	to test several different	-0.124939
-0.354754	code to support different	-0.124939
-0.354710	go into eight different	-0.124939
-0.520701	threads are doing different	-0.124939
-0.352497	well. Supports three different	-0.124939
-0.352269	It will look different	-0.124939
-0.347958	transposing and copying different	-0.124939
-0.539079	is to mix different	-0.124939
-0.341582	recommended to try different	-0.124939
-0.341648	brands of CPUs, different	-0.124939
-0.102766	expressions on seven different	-0.124939
-0.102766	experiments on seven different	-0.124939
-0.102766	on different platforms, different	-0.124939
-0.102766	browsers, different platforms, different	-0.124939
-0.594612	penalty for mixing different	-0.124939
-0.293820	and p2 having different	-0.124939
-0.538172	different screen resolutions, different	-0.124939
-0.293820	mechanism that treats different	-0.124939
-0.293820	useful for assigning different	-0.124939
-0.237560	in different browsers, different	-0.124939
-0.237560	threads with widely different	-0.124939
-0.237560	for different microprocessors, different	-0.124939
-0.910818	time. This is because	-0.124939
-0.508225	cases. This is because	-0.124939
-0.508225	size. This is because	-0.124939
-0.508225	returns. This is because	-0.124939
-0.508225	times. This is because	-0.124939
-0.508225	results. This is because	-0.124939
-0.508225	deprecated. This is because	-0.124939
-0.508225	measure. This is because	-0.124939
-1.186893	This may be because	-0.124939
-0.503662	the program or because	-0.124939
-0.465833	or member function because	-0.124939
-0.173241	non-static member function because	-0.425969
-0.456625	the simple function because	-0.124939
-0.532848	a frame function because	-0.124939
-0.504585	efficient than if because	-0.124939
-0.535536	fully optimized code because	-0.124939
-0.458880	less optimal code because	-0.124939
-0.355185	to non-AVX code because	-0.124939
-1.552566	the Intel compiler because	-0.124939
-1.203864	at a time because	-0.124939
-1.033887	the first time because	-0.124939
-0.435047	the execution time because	-0.124939
-0.307821	total execution time because	-0.124939
-1.385745	at compile time because	-0.124939
-0.357707	to store data because	-0.124939
-0.582463	class member functions because	-0.124939
-0.832027	the while loop because	-0.124939
-0.524929	evaluated at all because	-0.124939
-1.278369	the level-2 cache because	-0.124939
-0.872316	is an integer because	-0.124939
-0.584552	preprocessor can do because	-0.124939
-1.235936	float or double because	-0.124939
-0.893975	a and b because	-0.124939
-0.347786	a < b because	-0.124939
-0.525265	static link library because	-0.124939
-0.357068	a contained object because	-0.124939
-0.334813	Templates are efficient because	-0.124939
-1.021016	slightly more efficient because	-0.124939
-0.471623	be very efficient because	-0.124939
-0.816092	be less efficient because	-0.124939
-0.612913	are equally efficient because	-0.124939
-0.357003	other optimizations possible because	-0.124939
-0.656125	the optimized version because	-0.124939
-0.850908	by a variable because	-0.124939
-0.816650	an induction variable because	-0.124939
-0.548546	make register variables because	-0.124939
-0.524030	reduce the performance because	-0.124939
-0.524030	reducing the performance because	-0.124939
-0.443722	of program performance because	-0.124939
-0.460411	for 32-bit software because	-0.124939
-0.460887	loop is long because	-0.124939
-0.868981	method is faster because	-0.124939
-0.342069	the code faster because	-0.124939
-0.688425	will run faster because	-0.124939
-0.356663	are particularly critical because	-0.124939
-0.387649	the same register because	-0.425969
-0.459860	happens quite often because	-0.124939
-0.356151	a function template because	-0.124939
-0.755631	casting of pointers because	-0.124939
-0.475957	use than pointers because	-0.124939
-0.337971	that uses pointers because	-0.124939
-0.355897	the unit- test because	-0.124939
-0.355867	in some systems because	-0.124939
-0.574879	testing is useful because	-0.124939
-0.854463	0/a = 0 because	-0.124939
-0.355741	recognize VIA processors because	-0.124939
-0.355703	soon became available because	-0.124939
-0.500548	from cleaning up because	-0.124939
-0.333461	reloaded eight times because	-0.124939
-0.333461	the subsequent times because	-0.124939
-0.333461	a hundred times because	-0.124939
-0.499622	list is large because	-0.124939
-0.577863	extra function calls because	-0.124939
-0.355395	the correct result because	-0.124939
-1.110246	is not necessary because	-0.124939
-0.836087	using assembly language because	-0.124939
-0.458815	than half speed because	-0.124939
-0.354308	less than 128 because	-0.124939
-0.498926	as function parameters because	-0.124939
-0.850429	This is advantageous because	-0.124939
-0.880158	most efficient solution because	-0.124939
-0.438442	an optimal solution because	-0.124939
-0.798303	be an advantage because	-0.124939
-0.978414	the Boolean operators because	-0.124939
-0.375027	in 64-bit mode because	-0.602060
-0.560033	in the values because	-0.124939
-0.353018	in interactive programs because	-0.124939
-0.456431	cause caching problems because	-0.124939
-0.298543	is not optimal because	-0.425969
-0.496734	in a microprocessor because	-0.124939
-0.563900	somewhat more complicated because	-0.124939
-0.520616	has no cost because	-0.124939
-0.519557	will be better because	-0.124939
-0.351839	for critical applications because	-0.124939
-0.352070	Gnu compiler mechanism because	-0.124939
-0.455148	would be needed because	-0.124939
-0.454837	of simple types because	-0.124939
-0.351998	an uncached read because	-0.124939
-0.351235	use hexadecimal numbers because	-0.124939
-0.350759	the allocation process because	-0.124939
-0.350525	a function just because	-0.124939
-0.883183	of the operands because	-0.124939
-0.415901	the Boolean operands because	-0.124939
-0.493097	code is smaller because	-0.124939
-0.321192	on n here because	-0.124939
-0.321192	cost anything here because	-0.124939
-0.349888	slower than intended because	-0.124939
-0.327707	should be avoided because	-0.124939
-0.196150	This is inefficient because	-0.124939
-0.196150	other is inefficient because	-0.124939
-0.196150	storage is inefficient because	-0.124939
-0.377200	is very inefficient because	-0.124939
-0.349462	is always position-independent because	-0.124939
-0.450177	to different platforms because	-0.124939
-0.348222	by other constants because	-0.124939
-0.449150	very time-consuming tasks because	-0.124939
-1.026684	a hard disk because	-0.124939
-0.797623	the main executable because	-0.124939
-0.345323	is inherently parallel because	-0.124939
-0.345680	is not copied because	-0.124939
-0.518669	for switch statements because	-0.124939
-0.444862	for several seconds because	-0.124939
-0.440874	with fine-grained parallelism because	-0.124939
-0.440671	code is fastest because	-0.124939
-0.340942	processors are preferred because	-0.124939
-0.726267	compiled without -fpic because	-0.124939
-0.251359	is time consuming because	-0.124939
-0.251359	be time consuming because	-0.124939
-0.338236	bits total size, because	-0.124939
-0.505432	practice, of course, because	-0.124939
-0.335292	problem only occurs because	-0.124939
-0.255065	are quite costly because	-0.124939
-0.255065	are relatively costly because	-0.124939
-0.335079	may be poor because	-0.124939
-0.335079	or fail completely because	-0.124939
-0.428018	in column 28 because	-0.124939
-0.428018	between multiple processes because	-0.124939
-0.324207	label plus one, because	-0.124939
-0.420690	u.i[1] ^= 0x80000000; because	-0.124939
-0.593586	code is serial because	-0.124939
-0.420301	for example 9.5 because	-0.124939
-0.324207	code becomes simpler because	-0.124939
-0.407233	compilers behave differently because	-0.124939
-0.313601	This is unfortunate because	-0.124939
-0.172126	has const twice because	-0.124939
-0.172126	is calculated twice because	-0.124939
-0.406731	with option -fpie because	-0.124939
-0.313601	not an issue because	-0.124939
-0.406731	handling is negligible because	-0.124939
-0.102515	is very problematic because	-0.124939
-0.102515	are particularly problematic because	-0.124939
-0.381510	can be vectorized, because	-0.124939
-0.293135	memcpy is unsafe because	-0.124939
-0.236958	i but i*12, because	-0.124939
-0.236958	is particularly interesting because	-0.124939
-0.236958	and accessed non-sequentially because	-0.124939
-0.236958	partial flags stall because	-0.124939
-0.236958	is not evaluated, because	-0.124939
-0.236958	x = *(++p) because	-0.124939
-0.236958	x = array[++i] because	-0.124939
-0.236958	same cache line, because	-0.124939
-0.236958	(e.g. Sandy Bridge) because	-0.124939
-0.236958	determined in advance, because	-0.124939
-0.236958	allocated with alloca, because	-0.124939
-0.236958	is particularly risky because	-0.124939
-0.762907	which is the same	-0.124939
-0.469432	pointer is the same	-0.124939
-0.469432	calls is the same	-0.124939
-0.469432	result is the same	-0.124939
-0.469432	matrix is the same	-0.124939
-0.469432	i++) is the same	-0.124939
-0.469432	reference is the same	-0.124939
-0.547558	be of the same	-0.124939
-1.018056	many of the same	-0.124939
-1.579562	version of the same	-0.124939
-0.568545	member of the same	-0.124939
-0.850823	versions of the same	-0.124939
-0.118534	members of the same	-0.301030
-0.547558	models of the same	-0.124939
-0.547558	implementations of the same	-0.124939
-0.800169	instances of the same	-0.124939
-0.449545	point to the same	-0.124939
-0.740133	pointers to the same	-0.124939
-0.922944	access to the same	-0.124939
-0.512983	write to the same	-0.124939
-0.184783	writing to the same	-0.425969
-0.512983	reads to the same	-0.124939
-0.740133	belong to the same	-0.124939
-0.732951	not in the same	-0.124939
-0.732951	other in the same	-0.124939
-0.747501	used in the same	-0.425969
-0.732951	objects in the same	-0.124939
-0.508712	way in the same	-0.124939
-0.183776	threads in the same	-0.425969
-0.508712	language in the same	-0.124939
-0.843811	classes in the same	-0.124939
-0.732951	implemented in the same	-0.124939
-0.508712	preferably in the same	-0.124939
-0.340434	running in the same	-0.124939
-0.508712	were in the same	-0.124939
-0.508712	priority in the same	-0.124939
-0.508712	additions in the same	-0.124939
-0.508712	lengths in the same	-0.124939
-0.508712	FPGA in the same	-0.124939
-0.508712	stay in the same	-0.124939
-0.524846	variables for the same	-0.124939
-0.524846	registers for the same	-0.124939
-0.524846	competing for the same	-0.124939
-0.187540	contend for the same	-0.425969
-0.524846	compete for the same	-0.124939
-0.574190	complication that the same	-0.124939
-0.220458	b are the same	-0.124939
-0.220458	algebra are the same	-0.124939
-0.220458	principles are the same	-0.124939
-0.851632	loop if the same	-0.124939
-0.569660	loop by the same	-0.124939
-0.171570	function with the same	-0.124939
-0.171570	threads with the same	-0.425969
-0.459312	calculated with the same	-0.124939
-0.459312	types with the same	-0.124939
-0.459312	configurations with the same	-0.124939
-0.459312	repeatedly with the same	-0.124939
-0.557797	processors on the same	-0.124939
-0.818661	running on the same	-0.124939
-0.775232	to have the same	-0.124939
-0.757940	to use the same	-0.124939
-0.515945	that use the same	-0.124939
-0.164809	can use the same	-0.221849
-0.579778	not use the same	-0.124939
-0.366705	cannot use the same	-0.124939
-0.366705	applications use the same	-0.124939
-0.366705	thenaandbcannot use the same	-0.124939
-0.366705	Subtractions use the same	-0.124939
-0.740626	called from the same	-0.124939
-0.457954	sets from the same	-0.124939
-0.457954	read from the same	-0.124939
-0.457954	writing from the same	-0.124939
-0.457954	generated from the same	-0.124939
-0.155163	used at the same	-0.124939
-0.398853	variable at the same	-0.124939
-0.398853	multiplication at the same	-0.124939
-0.398853	zero at the same	-0.124939
-0.398853	things at the same	-0.124939
-0.398853	thing at the same	-0.124939
-0.398853	DLL at the same	-0.124939
-0.429461	{ has the same	-0.124939
-0.429461	main has the same	-0.124939
-0.429461	executable has the same	-0.124939
-0.543853	data because the same	-0.124939
-0.543853	copied because the same	-0.124939
-0.547822	elimination If the same	-0.124939
-0.547822	class). If the same	-0.124939
-0.842548	of using the same	-0.124939
-0.176787	are using the same	-0.124939
-0.555936	linked into the same	-0.124939
-0.551190	speed. In the same	-0.124939
-0.558644	sets where the same	-0.124939
-0.370301	b take the same	-0.124939
-0.370301	usually take the same	-0.124939
-0.434399	cases even the same	-0.124939
-0.177660	that does the same	-0.124939
-0.102778	It does the same	-0.124939
-0.102778	which does the same	-0.124939
-0.102778	operator does the same	-0.124939
-0.102778	__intel_cpu_features_init_x() does the same	-0.124939
-0.335790	BSD work the same	-0.124939
-0.513240	always calls the same	-0.124939
-0.515286	array. But the same	-0.124939
-0.513240	not get the same	-0.124939
-0.293928	of doing the same	-0.124939
-0.293928	are doing the same	-0.124939
-0.293928	fact doing the same	-0.124939
-1.406689	to calculate the same	-0.124939
-0.874792	to write the same	-0.124939
-0.335790	code becomes the same	-0.124939
-0.614768	example shows the same	-0.124939
-0.434399	always goes the same	-0.124939
-0.335790	to go the same	-0.124939
-0.237676	to produce the same	-0.124939
-0.237676	should produce the same	-0.124939
-0.614768	is still the same	-0.124939
-0.335790	for accessing the same	-0.124939
-0.614768	at least the same	-0.124939
-0.614768	to keep the same	-0.124939
-0.340338	from within the same	-0.124939
-0.340338	only within the same	-0.124939
-0.026793	to share the same	-0.124939
-0.008749	can share the same	-0.124939
-0.026793	objects share the same	-0.124939
-0.026793	threads share the same	-0.124939
-0.026793	members share the same	-0.124939
-0.026793	usually share the same	-0.124939
-0.026793	28 share the same	-0.124939
-0.237676	have exactly the same	-0.124939
-0.237676	doing exactly the same	-0.124939
-0.434399	for executing the same	-0.124939
-0.062791	of interpreting the same	-0.425969
-0.335790	variable having the same	-0.124939
-0.335790	library requiring the same	-0.124939
-0.136219	of sharing the same	-0.124939
-0.136219	are sharing the same	-0.124939
-0.335790	may reuse the same	-0.124939
-1.437609	} } The same	-0.124939
-0.352511	if unsigned The same	-0.124939
-0.560101	data cache. The same	-0.124939
-0.711955	non-sequential order. The same	-0.124939
-0.352511	register storage. The same	-0.124939
-0.647126	is closed. The same	-0.124939
-0.352511	>= operators). The same	-0.124939
-0.352511	four floats. The same	-0.124939
-0.352511	or 2016. The same	-0.124939
-0.352511	128-bit reads. The same	-0.124939
-0.352511	preceding row. The same	-0.124939
-0.203013	called only from same	-0.425969
-0.738412	with execution units same	-0.124939
-1.467661	most of the functions	-0.124939
-0.596502	few of the functions	-0.124939
-0.597036	performance for the functions	-0.124939
-0.576309	or with the functions	-0.124939
-1.006101	done with the functions	-0.124939
-1.781738	to use the functions	-0.124939
-1.755807	to make the functions	-0.124939
-0.191235	inlining all the functions	-0.425969
-0.722252	class containing the functions	-0.124939
-0.502296	classes implement the functions	-0.124939
-0.356981	linker extracts the functions	-0.124939
-0.356981	to collect the functions	-0.124939
-1.329662	the order of functions	-0.124939
-0.357699	versions even of functions	-0.124939
-1.107236	the speed of functions	-0.124939
-0.585569	macros instead of functions	-0.124939
-0.358711	All accesses to functions	-0.124939
-0.358508	These operators and functions	-0.124939
-0.503584	code memory. The functions	-0.124939
-0.357902	more difficult. The functions	-0.124939
-0.557641	calling conventions for functions	-0.124939
-1.043579	making sure that functions	-0.124939
-0.357283	64-bit Windows if functions	-0.124939
-0.879544	most efficiently if functions	-0.124939
-0.358146	class library have functions	-0.124939
-0.724548	to return from functions	-0.124939
-0.357944	The intrinsic vector functions	-0.124939
-0.569708	put the different functions	-0.124939
-0.560998	by several different functions	-0.124939
-0.453450	and some other functions	-0.124939
-0.497568	x, while other functions	-0.124939
-0.350903	that calls other functions	-0.124939
-0.556170	reports of which functions	-0.124939
-0.567509	PLT for all functions	-0.124939
-0.541954	only if all functions	-0.124939
-0.350386	and declare all functions	-0.124939
-0.556006	Keep often used functions	-0.124939
-1.336828	you are using functions	-0.124939
-0.770854	call the library functions	-0.124939
-0.317623	addresses of library functions	-0.124939
-0.448236	time in library functions	-0.124939
-0.317623	of most library functions	-0.124939
-0.411720	Many Intel library functions	-0.124939
-0.317623	for any library functions	-0.124939
-0.317623	that standard library functions	-0.124939
-0.317623	in optimizing library functions	-0.124939
-0.317623	Time- consuming library functions	-0.124939
-0.516847	and these two functions	-0.124939
-0.454605	} These two functions	-0.124939
-0.557093	well as efficient functions	-0.124939
-1.087757	there are many functions	-0.124939
-0.426560	library contains many functions	-0.124939
-0.301473	Library" contains many functions	-0.124939
-0.339463	CPUs. Includes many functions	-0.124939
-0.309180	have the member functions	-0.124939
-0.309180	If the member functions	-0.124939
-0.250208	classes and member functions	-0.124939
-0.050357	pointer in member functions	-0.124939
-0.250208	52. The member functions	-0.124939
-0.190071	to make member functions	-0.124939
-0.190071	may make member functions	-0.124939
-0.375257	for class member functions	-0.124939
-0.250208	contains any member functions	-0.124939
-0.292378	to virtual member functions	-0.124939
-0.292378	one virtual member functions	-0.124939
-0.068530	systems. Virtual member functions	-0.124939
-0.032915	7.20 Virtual member functions	-0.124939
-0.050357	7.19 Class member functions	-0.124939
-0.250208	are: Non-static member functions	-0.124939
-0.550429	and the critical functions	-0.124939
-0.550429	includes the critical functions	-0.124939
-0.341453	by making critical functions	-0.124939
-0.554745	implementation of these functions	-0.124939
-0.496337	beware that these functions	-0.124939
-0.337685	lrint. Unfortunately, these functions	-0.124939
-0.586235	certain operating system functions	-0.124939
-0.502078	Microsoft compiler. Some functions	-0.124939
-0.580802	have information about functions	-0.124939
-0.809490	the most important functions	-0.124939
-0.552210	lack the necessary functions	-0.124939
-0.499739	The CPU- specific functions	-0.124939
-0.342152	so-called commpage. These functions	-0.124939
-0.342152	with _mm. These functions	-0.124939
-0.396906	pointers and virtual functions	-0.124939
-0.396906	polymorphism with virtual functions	-0.124939
-0.305655	called. If virtual functions	-0.124939
-0.305655	can avoid virtual functions	-0.124939
-0.305655	functions. Avoid virtual functions	-0.124939
-0.355189	memory. If several functions	-0.124939
-0.581356	for a few functions	-0.124939
-0.354457	possible. Use inline functions	-0.124939
-0.353887	stack unwinding. All functions	-0.124939
-0.353669	and optimize both functions	-0.124939
-0.564559	the more complicated functions	-0.124939
-0.171610	support for intrinsic functions	-0.124939
-0.097681	files for intrinsic functions	-0.124939
-0.097681	header for intrinsic functions	-0.124939
-0.160578	implemented with intrinsic functions	-0.124939
-0.160578	to use intrinsic functions	-0.124939
-0.160578	of different intrinsic functions	-0.124939
-0.160578	by using intrinsic functions	-0.124939
-0.160578	Define SSE2 intrinsic functions	-0.124939
-0.160578	language Use intrinsic functions	-0.124939
-0.047222	lookup Using intrinsic functions	-0.124939
-0.022969	12.4 Using intrinsic functions	-0.124939
-0.375198	tables of mathematical functions	-0.124939
-0.095013	root and mathematical functions	-0.124939
-0.221120	instructions for mathematical functions	-0.124939
-0.221120	memset, or mathematical functions	-0.124939
-0.294772	most common mathematical functions	-0.124939
-0.221120	of advanced mathematical functions	-0.124939
-0.221120	for computing mathematical functions	-0.124939
-0.352725	www.agner.org/optimize/asmlib.zip contains various functions	-0.124939
-0.614249	memory and string functions	-0.124939
-0.307082	of common string functions	-0.124939
-0.307082	C style string functions	-0.124939
-0.455200	r.b;} The three functions	-0.124939
-0.352622	64-bit mode. Make functions	-0.124939
-0.326472	overriding of public functions	-0.124939
-0.326472	relocation. All public functions	-0.124939
-0.350775	into multiple smaller functions	-0.124939
-0.452795	manipulated with C functions	-0.124939
-0.311371	library of math functions	-0.124939
-0.311371	most common math functions	-0.124939
-0.348502	names of inlined functions	-0.124939
-0.569557	calls to frame functions	-0.124939
-0.311422	efficient than frame functions	-0.124939
-0.345747	fully optimized. Library functions	-0.124939
-0.224680	member functions Virtual functions	-0.124939
-0.224680	is pure. Virtual functions	-0.124939
-0.224680	page 96). Virtual functions	-0.124939
-0.338770	for all suitable functions	-0.124939
-0.065702	string manipulation Mathematical functions	-0.124939
-0.031610	142 14.10 Mathematical functions	-0.124939
-0.031610	u[0]. 14.10 Mathematical functions	-0.124939
-0.065702	page 140). Mathematical functions	-0.124939
-0.031610	117 12.7 Mathematical functions	-0.124939
-0.031610	118 12.7 Mathematical functions	-0.124939
-0.129331	suboptimal code. Intrinsic functions	-0.124939
-0.129331	machine instructions. Intrinsic functions	-0.124939
-0.129331	either case. Intrinsic functions	-0.124939
-0.129331	Table 12.3. Intrinsic functions	-0.124939
-0.335983	distinction between leaf functions	-0.124939
-0.335858	variables and internal functions	-0.124939
-0.324907	truncation. The missing functions	-0.124939
-0.211918	pointers /vms Fastcall functions	-0.124939
-0.211918	CodeGear compiler). Fastcall functions	-0.124939
-0.324724	to identify individual functions	-0.124939
-0.324724	the program. Small functions	-0.124939
-0.093213	); 7.26 Overloaded functions	-0.124939
-0.093213	56 7.26 Overloaded functions	-0.124939
-0.407356	only for speed-critical functions	-0.124939
-0.293617	needs them. Pure functions	-0.124939
-0.293617	names. Use fastcall functions	-0.124939
-0.293617	// Place non-polymorphic functions	-0.124939
-0.293617	separate function. Sometimes, functions	-0.124939
-0.293617	program. Avoid unnecessary functions	-0.124939
-0.237381	} // Non-polymorphic functions	-0.124939
-0.237381	GetTickCount or QueryPerformanceCounter functions	-0.124939
-0.237381	leaf function. Leaf functions	-0.124939
-0.237381	dispatching or memory-intensive functions	-0.124939
-0.237381	each case. Inlined functions	-0.124939
-0.576218	here is the only	-0.124939
-0.576218	metaprogramming is the only	-0.124939
-0.598935	copy that the only	-0.124939
-0.574651	that's about the only	-0.124939
-1.816332	if it is only	-0.124939
-0.878404	sure it is only	-0.124939
-0.568112	class. This is only	-0.124939
-0.837656	bits. This is only	-0.124939
-0.568112	x. This is only	-0.124939
-0.867896	Obviously, this is only	-0.124939
-0.875217	true, which is only	-0.124939
-1.367638	if there is only	-0.124939
-0.550236	diagonal there is only	-0.124939
-0.550236	systems, there is only	-0.124939
-1.171488	the variable is only	-0.124939
-0.499290	register keyword is only	-0.124939
-0.354829	misprediction penalty is only	-0.124939
-0.354829	Here, log(2.0) is only	-0.124939
-0.358658	to evaluate a only	-0.124939
-0.358653	ports, etc. of only	-0.124939
-0.356580	have one and only	-0.124939
-0.356580	always one, and only	-0.124939
-0.356580	15.1c automatically, and only	-0.124939
-0.356580	advantageous if, and only	-0.124939
-0.356580	priority thread, and only	-0.124939
-0.463381	make dispatcher in only	-0.124939
-0.357205	model number. The only	-0.124939
-0.357205	an example. The only	-0.124939
-0.357205	pointer aliasing. The only	-0.124939
-0.594007	identical so that only	-0.124939
-1.184270	there may be only	-0.124939
-0.595360	There should be only	-0.124939
-0.884513	Operations that are only	-0.124939
-0.356037	256-bit size are only	-0.124939
-0.593104	0x1C. There are only	-0.124939
-0.356037	short int) are only	-0.124939
-0.590859	while you can only	-0.124939
-0.564983	operator, which can only	-0.124939
-0.500348	names. We can only	-0.124939
-0.500348	15.1c. We can only	-0.124939
-0.354732	which otherwise can only	-0.124939
-0.463121	may calculate it only	-0.124939
-0.358504	at runtime, if only	-0.124939
-0.358331	size grows by only	-0.124939
-0.522011	in systems with only	-0.124939
-0.354120	a system with only	-0.124939
-0.794826	for CPUs with only	-0.124939
-0.354120	calculate pow(x,10) with only	-0.124939
-0.857321	can rely on only	-0.124939
-0.452222	redesign can not only	-0.124939
-0.565284	16 will not only	-0.124939
-0.349933	This would not only	-0.124939
-0.349933	should include not only	-0.124939
-0.349933	profiler measures not only	-0.124939
-0.349933	take precedence, not only	-0.124939
-0.550365	set. Therefore, you only	-0.124939
-0.546849	you will have only	-0.124939
-0.536602	Current CPUs have only	-0.124939
-0.357994	b[i]*c[i], though this only	-0.124939
-0.543562	code that use only	-0.124939
-1.219548	you can use only	-0.124939
-0.816004	and then use only	-0.124939
-0.994825	is called from only	-0.124939
-0.140321	the function has only	-0.124939
-0.140321	member function has only	-0.124939
-0.348915	a template has only	-0.124939
-0.348915	the computer has only	-0.124939
-0.576135	compiler will make only	-0.124939
-0.357840	multiple smaller functions only	-0.124939
-0.343265	the user but only	-0.124939
-0.343265	a multiplication but only	-0.124939
-0.443801	14.14b automatically but only	-0.124939
-0.343265	are needed, but only	-0.124939
-0.343265	need relocation, but only	-0.124939
-1.110989	that is used only	-0.124939
-0.546208	This is used only	-0.124939
-0.552497	should be used only	-0.124939
-0.552497	therefore be used only	-0.124939
-0.578269	pointers are used only	-0.124939
-0.657563	Loop unrolling should only	-0.124939
-0.561090	giving this example only	-0.124939
-0.591131	hyperthreading by using only	-0.124939
-0.760879	new register size only	-0.124939
-0.357161	large libraries where only	-0.124939
-0.357161	objects) are possible only	-0.124939
-0.582719	Usually it takes only	-0.124939
-0.438385	in memory takes only	-0.124939
-0.338963	language. C++ takes only	-0.124939
-0.338963	double precision takes only	-0.124939
-0.655675	a new branch only	-0.124939
-0.818098	it is called only	-0.124939
-0.529740	Will be called only	-0.124939
-0.135662	member function called only	-0.124939
-0.135662	Assume function called only	-0.124939
-0.886349	expressions. For example, only	-0.124939
-0.498085	instructions that take only	-0.124939
-0.746405	it may take only	-0.124939
-0.801685	loop will take only	-0.124939
-0.332391	shift operations take only	-0.124939
-0.576072	returned in registers only	-0.124939
-0.356359	high-level language need only	-0.124939
-0.559701	use this method only	-0.124939
-0.459116	code will work only	-0.124939
-0.355221	16 XOP, AMD only	-0.124939
-0.355010	supports this option only	-0.124939
-0.354931	example, use AVX only	-0.124939
-0.813140	it is done only	-0.124939
-0.886286	calculations are done only	-0.124939
-0.263311	overflow and works only	-0.124939
-0.263311	This code works only	-0.124939
-0.263311	Intel compiler works only	-0.124939
-0.345229	course, this works only	-0.124939
-0.345229	this method works only	-0.124939
-0.263311	CPU dispatching works only	-0.124939
-0.345229	renaming mechanism works only	-0.124939
-0.263311	and 14.13b works only	-0.124939
-0.520885	cache. The problem only	-0.124939
-0.455548	container that contains only	-0.124939
-0.590812	the code contains only	-0.124939
-0.418438	body now contains only	-0.124939
-0.354333	safer implementation would only	-0.124939
-0.520492	set can run only	-0.124939
-0.102932	are predicted well only	-0.425969
-0.543763	implementation is optimal only	-0.124939
-0.432420	do the dispatching only	-0.124939
-0.558944	apply CPU dispatching only	-0.124939
-0.648175	64-bit Windows allows only	-0.124939
-0.352806	use such methods only	-0.124939
-0.905055	because it needs only	-0.124939
-0.533400	polymorphism is needed only	-0.124939
-0.597113	two pointers requires only	-0.124939
-0.326413	(single precision requires only	-0.124939
-0.350399	of doing things only	-0.124939
-0.319640	everything that depends only	-0.124939
-0.414226	return value depends only	-0.124939
-0.452468	have been tested only	-0.124939
-0.550966	may be loaded only	-0.124939
-0.316866	Microsoft compiler. Supports only	-0.124939
-0.316866	instruction sets. Supports only	-0.124939
-0.349200	register size comes only	-0.124939
-0.551443	had in fact only	-0.124939
-0.346779	or subexpression containing only	-0.124939
-0.535005	designed to handle only	-0.124939
-0.740131	it is initialized only	-0.124939
-0.345932	Static linking includes only	-0.124939
-0.512981	set and insert only	-0.124939
-0.344645	specification to F1 only	-0.124939
-0.344736	On other processors, only	-0.124939
-0.482847	code is chosen only	-0.124939
-0.343580	of 2 applies only	-0.124939
-0.040809	branch is mispredicted only	-0.124939
-0.019925	way is mispredicted only	-0.425969
-0.040809	count is mispredicted only	-0.124939
-0.338882	specialization is allowed only	-0.124939
-0.486204	is to hold only	-0.124939
-0.335615	operand is evaluated only	-0.124939
-0.466517	instruction is executed only	-0.124939
-0.331081	reference is valid only	-0.124939
-0.428498	but is currently only	-0.124939
-0.324585	at all. Can only	-0.124939
-0.313969	running the services only	-0.124939
-0.293487	double by modifying only	-0.124939
-0.237267	processes simultaneously. Actually, only	-0.124939
-0.237267	and it understands only	-0.124939
-1.492215	versions of the CPU	-0.124939
-0.582410	frequency of the CPU	-0.124939
-1.022712	care of the CPU	-0.124939
-0.582410	resolution of the CPU	-0.124939
-0.582410	90% of the CPU	-0.124939
-0.590956	closer to the CPU	-0.124939
-0.577085	instructions in the CPU	-0.124939
-0.199009	delay in the CPU	-0.425969
-0.577085	integrated in the CPU	-0.124939
-0.577085	flaws in the CPU	-0.124939
-1.016094	possible for the CPU	-0.124939
-1.112623	difficult for the CPU	-0.124939
-0.569372	specifically for the CPU	-0.124939
-1.461648	is that the CPU	-0.124939
-0.555677	instruction that the CPU	-0.124939
-1.293234	sure that the CPU	-0.124939
-1.186095	Note that the CPU	-0.124939
-0.555677	frequency that the CPU	-0.124939
-1.319509	even if the CPU	-0.124939
-0.763077	supported by the CPU	-0.425969
-0.588686	relies on the CPU	-0.124939
-0.583919	set than the CPU	-0.124939
-0.585592	or use the CPU	-0.124939
-0.584312	changing then the CPU	-0.124939
-0.586289	counts at the CPU	-0.124939
-0.525498	calls because the CPU	-0.124939
-0.525498	needed because the CPU	-0.124939
-0.525498	types because the CPU	-0.124939
-0.525498	stall because the CPU	-0.124939
-0.872240	set. If the CPU	-0.124939
-0.579077	in all the CPU	-0.124939
-1.339254	to do the CPU	-0.124939
-0.561332	warm up the CPU	-0.124939
-0.529020	We want the CPU	-0.124939
-0.529523	memory inside the CPU	-0.124939
-0.529523	counter inside the CPU	-0.124939
-0.352490	by both the CPU	-0.124939
-0.352490	checks both the CPU	-0.124939
-0.361643	to replace the CPU	-0.425969
-0.542459	mechanism allows the CPU	-0.124939
-0.518428	instruction sets the CPU	-0.124939
-0.546851	dispatching. Unfortunately, the CPU	-0.124939
-0.497539	code prevent the CPU	-0.124939
-0.436015	This prevents the CPU	-0.425969
-0.708215	to help the CPU	-0.124939
-0.744336	profiler tells the CPU	-0.124939
-0.350872	pulses since the CPU	-0.124939
-0.493783	or bypass the CPU	-0.124939
-0.350872	to override the CPU	-0.124939
-0.350872	that limits the CPU	-0.124939
-0.582995	CPUs or a CPU	-0.124939
-0.589131	run on a CPU	-0.124939
-0.584529	library has a CPU	-0.124939
-1.764484	to make a CPU	-0.124939
-0.524264	only need a CPU	-0.124939
-0.461025	should give a CPU	-0.124939
-0.356874	of keeping a CPU	-0.124939
-2.045296	the number of CPU	-0.124939
-1.163250	the type of CPU	-0.124939
-1.724425	a lot of CPU	-0.124939
-0.503650	common pitfalls of CPU	-0.124939
-0.502902	the history of CPU	-0.124939
-0.463420	flawed approach to CPU	-0.124939
-0.659187	operating system and CPU	-0.124939
-1.029520	the program. The CPU	-0.124939
-0.516604	up-to-date version. The CPU	-0.124939
-0.351649	Intel processor. The CPU	-0.124939
-0.351649	13.5 Implementation The CPU	-0.124939
-0.351649	single instruction. The CPU	-0.124939
-0.351649	most cases: The CPU	-0.124939
-0.351649	of programming. The CPU	-0.124939
-0.351649	detection mechanism. The CPU	-0.124939
-0.351649	calculated independently. The CPU	-0.124939
-0.351649	register renaming. The CPU	-0.124939
-0.351649	necessarily newer. The CPU	-0.124939
-0.351649	years old. The CPU	-0.124939
-0.581163	the check for CPU	-0.124939
-1.127770	is intended for CPU	-0.124939
-0.358541	Example 13.1 // CPU	-0.124939
-0.834217	multiple CPUs or CPU	-0.124939
-1.037356	to optimization by CPU	-0.124939
-0.357151	to dispatch by CPU	-0.124939
-0.538929	optimized code with CPU	-0.124939
-0.962012	function library with CPU	-0.124939
-0.655935	is concentrated on CPU	-0.124939
-0.356959	of research on CPU	-0.124939
-0.596217	access rather than CPU	-0.124939
-0.762730	function libraries have CPU	-0.124939
-0.358164	well spend more CPU	-0.124939
-0.725054	is relevant when CPU	-0.124939
-0.355421	instruction set. A CPU	-0.124939
-0.355421	was developed. A CPU	-0.124939
-0.570036	not look at CPU	-0.124939
-0.556296	jumps between different CPU	-0.124939
-0.754603	with only one CPU	-0.124939
-0.871968	has only one CPU	-0.124939
-0.456572	CPU-specific and each CPU	-0.124939
-0.542141	counters in each CPU	-0.124939
-0.353151	without AVX using CPU	-0.124939
-0.648389	I am using CPU	-0.124939
-0.896153	that the Intel CPU	-0.124939
-0.532127	Overriding the Intel CPU	-0.124939
-0.336277	Library. The multiple CPU	-0.124939
-0.522431	systems with multiple CPU	-0.124939
-0.473630	to use multiple CPU	-0.124939
-0.414731	jump between multiple CPU	-0.124939
-0.414731	workload between multiple CPU	-0.124939
-0.343104	optimized. Jumps between CPU	-0.124939
-0.343104	without discriminating between CPU	-0.124939
-0.343104	that discriminates between CPU	-0.124939
-1.356470	This is called CPU	-0.124939
-0.356277	library functions without CPU	-0.124939
-0.169630	to a specific CPU	-0.301030
-0.320282	indicates a specific CPU	-0.124939
-0.370566	lists of specific CPU	-0.124939
-0.119251	version for specific CPU	-0.124939
-0.119251	fine-tuned for specific CPU	-0.124939
-0.354764	when software uses CPU	-0.124939
-0.354820	represent a known CPU	-0.124939
-0.354201	important than optimizing CPU	-0.124939
-0.842375	in a particular CPU	-0.124939
-0.956943	for a particular CPU	-0.124939
-0.353161	to keep their CPU	-0.124939
-0.352934	features for automatic CPU	-0.124939
-0.065038	code with automatic CPU	-0.124939
-0.141560	(12.4e) with automatic CPU	-0.124939
-0.269681	program contains automatic CPU	-0.124939
-0.351895	processors properly. Many CPU	-0.124939
-0.558539	on its own CPU	-0.124939
-0.514696	The compiler supports CPU	-0.124939
-0.347236	about an unknown CPU	-0.124939
-0.346180	instr. set Automatic CPU	-0.124939
-0.346125	in Wikipedia under CPU	-0.124939
-0.344734	Intel have similar CPU	-0.124939
-0.632404	you should apply CPU	-0.124939
-0.053253	are overriding Intel's CPU	-0.425969
-0.255539	sets........................... 122 13.1 CPU	-0.124939
-0.255539	innermost loops. 13.1 CPU	-0.124939
-0.230129	on the newest CPU	-0.124939
-0.335762	examples of poor CPU	-0.124939
-0.324873	examples of bad CPU	-0.124939
-0.324873	performance monitoring options. CPU	-0.124939
-0.929228	into vector c: CPU	-0.124939
-0.420743	Insert an explicit CPU	-0.124939
-0.314251	overhead which consumes CPU	-0.124939
-0.314440	VIA processors. Explicit CPU	-0.124939
-0.172439	instruction set. 13.6 CPU	-0.124939
-0.172439	..................................................................................................... 126 13.6 CPU	-0.124939
-0.172439	-axSSE3, etc. (Intel CPU	-0.124939
-0.172439	CPU only) (Intel CPU	-0.124939
-0.102714	......................................................................... 128 13.7 CPU	-0.124939
-0.102714	critical. 129 13.7 CPU	-0.124939
-0.237503	// Example 13.2. CPU	-0.124939
-0.237503	programs use inappropriate CPU	-0.124939
-0.237503	Pentium 4 (NetBurst) CPU	-0.124939
-0.581705	0 and the other	-0.124939
-0.863291	times and the other	-0.124939
-1.287476	explained in the other	-0.124939
-0.596431	data for the other	-0.124939
-0.566462	way or the other	-0.124939
-0.594083	compatible with the other	-0.124939
-0.595490	list, on the other	-0.124939
-0.580915	time as the other	-0.124939
-0.591976	calculate than the other	-0.124939
-0.442097	many times the other	-0.124939
-0.313070	three times the other	-0.124939
-0.460478	it goes the other	-0.124939
-0.255187	CPUs. On the other	-0.124939
-0.255187	compiler. On the other	-0.124939
-0.255187	difficult. On the other	-0.124939
-0.255187	profitable. On the other	-0.124939
-0.356444	and rarely the other	-0.124939
-1.472990	if there is other	-0.124939
-0.573217	whether there is other	-0.124939
-0.851048	a result of other	-0.124939
-0.358450	out independently of other	-0.124939
-0.503874	the costs to other	-0.124939
-0.152731	may apply to other	-0.425969
-0.348556	that branch and other	-0.124939
-0.777488	big arrays and other	-0.124939
-0.531796	of integers and other	-0.124939
-0.777488	help files and other	-0.124939
-0.639366	instruction sets and other	-0.124939
-0.450480	of expressions and other	-0.124939
-0.639366	user interface and other	-0.124939
-0.348556	databases, network and other	-0.124939
-0.450480	virtual functions, and other	-0.124939
-0.512089	other platforms and other	-0.124939
-0.348556	very smart and other	-0.124939
-0.450480	managed C++, and other	-0.124939
-0.348556	memory swapping and other	-0.124939
-0.639366	constant propagation and other	-0.124939
-0.348556	Linked lists and other	-0.124939
-0.450480	system database, and other	-0.124939
-0.348556	virus scanners and other	-0.124939
-0.348556	memory leaks and other	-0.124939
-0.348556	cache evictions and other	-0.124939
-0.348556	3-dimensional geometry and other	-0.124939
-0.348556	this limitation and other	-0.124939
-0.348556	virus attacks and other	-0.124939
-0.348556	Template Library) and other	-0.124939
-0.576507	faster than in other	-0.124939
-0.355979	look different in other	-0.124939
-0.554422	about functions in other	-0.124939
-0.812527	are running in other	-0.124939
-0.537660	been defined in other	-0.124939
-0.355979	be prevented in other	-0.124939
-0.500895	rarely found in other	-0.124939
-0.504635	the vector. The other	-0.124939
-0.353455	CPUs, not for other	-0.124939
-1.573402	be used for other	-0.124939
-0.919997	Induction variables for other	-0.124939
-0.562520	register available for other	-0.124939
-0.353455	reserve resources for other	-0.124939
-0.774319	is needed for other	-0.124939
-0.819652	the possibility for other	-0.124939
-0.353455	processing unit for other	-0.124939
-0.353455	accelerator card for other	-0.124939
-0.567926	below. There are other	-0.124939
-0.567926	instructions. There are other	-0.124939
-0.496674	of memory or other	-0.124939
-0.352951	particular CPU or other	-0.124939
-0.352951	an exception or other	-0.124939
-0.647993	hard disk or other	-0.124939
-0.352951	list, database, or other	-0.124939
-0.352951	a printer or other	-0.124939
-0.659203	a disadvantage if other	-0.124939
-0.463091	than multiplying by other	-0.124939
-0.512431	Borland compiler with other	-0.124939
-0.140283	be used with other	-0.124939
-0.140283	are used with other	-0.124939
-0.490892	return operations with other	-0.124939
-0.826667	not compatible with other	-0.124939
-0.064502	is contiguous with other	-0.124939
-0.348790	before coordination with other	-0.124939
-0.463063	if implemented on other	-0.124939
-1.337927	more time than other	-0.124939
-1.378574	is faster than other	-0.124939
-0.545259	execute faster than other	-0.124939
-0.350961	clock cycles than other	-0.124939
-0.350961	clock frequency than other	-0.124939
-0.649822	and b have other	-0.124939
-0.715087	the operands have other	-0.124939
-0.353877	variables might have other	-0.124939
-0.352471	for calling from other	-0.124939
-0.455437	not accessible from other	-0.124939
-0.352471	feasible. Interference from other	-0.124939
-0.498361	used by all other	-0.124939
-0.457581	and sets all other	-0.124939
-0.357920	little-endian storage, but other	-0.124939
-0.657653	at least one other	-0.124939
-0.484983	few or no other	-0.124939
-0.234466	code if no other	-0.124939
-0.234466	objects if no other	-0.124939
-0.406375	to have no other	-0.124939
-0.406375	can have no other	-0.124939
-0.406375	operands have no other	-0.124939
-0.464733	overflow but no other	-0.124939
-0.329774	can produce no other	-0.124939
-0.511996	unrelated to each other	-0.124939
-0.546569	waiting for each other	-0.124939
-0.328702	far from each other	-0.124939
-0.146061	used near each other	-0.124939
-0.263844	stored near each other	-0.425969
-0.146061	called near each other	-0.124939
-0.146061	together near each other	-0.124939
-0.462091	command or do other	-0.124939
-0.519106	cycles on most other	-0.124939
-0.353358	Faster than most other	-0.124939
-0.357660	of any size other	-0.124939
-0.525714	in a library other	-0.124939
-0.801523	used in two other	-0.124939
-0.506142	There are also other	-0.124939
-0.425996	register size. In other	-0.124939
-0.660099	mathematical calculations. In other	-0.124939
-0.329081	template parameter. In other	-0.124939
-0.329081	intended for. In other	-0.124939
-0.329081	exception safe. In other	-0.124939
-0.329081	call __intel_cpu_features_init_x(). In other	-0.124939
-0.202256	it to any other	-0.124939
-0.202256	writes to any other	-0.124939
-0.271417	constructors, and any other	-0.124939
-0.386917	used for any other	-0.124939
-0.386917	call or any other	-0.124939
-0.065308	accessed by any other	-0.425969
-0.142206	line by any other	-0.124939
-0.271417	efficient as any other	-0.124939
-0.271417	but not any other	-0.124939
-0.355037	inputs have any other	-0.124939
-0.202256	called from any other	-0.124939
-0.202256	referenced from any other	-0.124939
-0.053636	doesn't call any other	-0.124939
-0.271417	may insert any other	-0.124939
-0.515194	identical to some other	-0.124939
-0.515194	functions and some other	-0.124939
-0.459750	make two. Some other	-0.124939
-0.332201	no overhead while other	-0.124939
-0.332201	modify x, while other	-0.124939
-0.332201	to Func1, while other	-0.124939
-0.781987	function that calls other	-0.124939
-0.355374	Fortran and several other	-0.124939
-0.572998	array can cause other	-0.124939
-0.352909	also makes various other	-0.124939
-0.932603	compilers can reduce other	-0.124939
-0.351472	make developers choose other	-0.124939
-0.351085	linear algebra) require other	-0.124939
-0.348185	loop predictor. On other	-0.124939
-0.347046	be saved. Any other	-0.124939
-0.447387	Bit-fields of sizes other	-0.124939
-0.335874	on compilers. Several other	-0.124939
-0.434505	the class c1 other	-0.124939
-0.331580	in performance over other	-0.124939
-0.237592	(RTTI), which affects other	-0.124939
-2.069251	part of the instruction	-0.124939
-0.597139	explanation of the instruction	-0.124939
-0.581337	instructions to the instruction	-0.124939
-0.862588	addition to the instruction	-0.124939
-0.581337	translated to the instruction	-0.124939
-0.599814	missing in the instruction	-0.124939
-0.587172	file for the instruction	-0.124939
-1.147405	compiled for the instruction	-0.124939
-0.462918	technical details of instruction	-0.124939
-0.358362	tables: Lists of instruction	-0.124939
-0.550100	different processors and instruction	-0.124939
-0.761932	by 2. The instruction	-0.124939
-0.462327	array elements. The instruction	-0.124939
-1.095207	be used if instruction	-0.124939
-0.581895	name depending on instruction	-0.124939
-0.355067	more efficient. This instruction	-0.124939
-0.355067	instruction set. This instruction	-0.124939
-0.355067	of view. This instruction	-0.124939
-0.548242	it has an instruction	-0.124939
-0.356451	to insert an instruction	-0.124939
-0.561907	code for this instruction	-0.124939
-0.533183	processors with this instruction	-0.124939
-0.353630	that support this instruction	-0.124939
-0.578124	used only when instruction	-0.124939
-0.577048	versions for different instruction	-0.124939
-0.475304	compile for different instruction	-0.124939
-0.598648	least the same instruction	-0.124939
-0.528680	based on which instruction	-0.124939
-0.350646	it checks which instruction	-0.124939
-0.350646	automatically detect which instruction	-0.124939
-0.568289	set has no instruction	-0.124939
-0.587106	name for each instruction	-0.124939
-0.555141	because the 64-bit instruction	-0.124939
-0.879319	the best possible instruction	-0.124939
-0.561434	CPUs". A branch instruction	-0.124939
-0.572409	The 64 bit instruction	-0.124939
-0.586592	when a new instruction	-0.124939
-0.836130	any of these instruction	-0.124939
-0.505144	availability of these instruction	-0.124939
-0.040649	for the SSE2 instruction	-0.124939
-0.040649	if the SSE2 instruction	-0.425969
-0.113300	when the SSE2 instruction	-0.823909
-0.157204	only the SSE2 instruction	-0.124939
-0.085499	without the SSE2 instruction	-0.124939
-0.085499	cases the SSE2 instruction	-0.124939
-0.130197	unless the SSE2 instruction	-0.425969
-0.085499	Using the SSE2 instruction	-0.124939
-0.098252	enable the SSE2 instruction	-0.425969
-0.170315	SSE and SSE2 instruction	-0.124939
-0.049701	efficient. The SSE2 instruction	-0.124939
-0.049701	CPUs. The SSE2 instruction	-0.124939
-0.049701	140). The SSE2 instruction	-0.124939
-0.170315	SSE or SSE2 instruction	-0.124939
-0.170315	the 145 SSE2 instruction	-0.124939
-0.170315	/arch:SSE -msse SSE2 instruction	-0.124939
-0.356130	the AVX 32 instruction	-0.124939
-0.305251	on the available instruction	-0.124939
-0.305251	increased the available instruction	-0.124939
-0.355577	processors. Details about instruction	-0.124939
-0.536297	an inline assembly instruction	-0.124939
-0.552203	support the necessary instruction	-0.124939
-0.723806	for the specific instruction	-0.124939
-1.052534	for a specific instruction	-0.124939
-0.094740	of the AVX instruction	-0.124939
-0.216087	in the AVX instruction	-0.124939
-0.316041	for the AVX instruction	-0.124939
-0.316041	if the AVX instruction	-0.124939
-0.216087	If the AVX instruction	-0.124939
-0.334236	later. The AVX instruction	-0.124939
-0.108757	elements. 12.1 AVX instruction	-0.124939
-0.108757	105 12.1 AVX instruction	-0.124939
-0.266692	for the supported instruction	-0.124939
-0.266692	such as supported instruction	-0.124939
-0.266692	information about supported instruction	-0.124939
-0.052917	// Get supported instruction	-0.425969
-0.266692	the minimum supported instruction	-0.124939
-0.266692	// Detect supported instruction	-0.124939
-0.783435	the nontemporal write instruction	-0.124939
-0.956477	for a particular instruction	-0.124939
-0.507881	supports a particular instruction	-0.124939
-0.548611	i/2+r. The next instruction	-0.124939
-0.352718	availability of various instruction	-0.124939
-0.520573	but on what instruction	-0.124939
-0.182333	SSE2 and later instruction	-0.124939
-0.182333	AVX and later instruction	-0.124939
-0.182333	SSE and later instruction	-0.124939
-0.018931	SSE2 or later instruction	-0.425969
-0.152193	AVX or later instruction	-0.124939
-0.081246	Pentium-II or later instruction	-0.124939
-0.418737	for a higher instruction	-0.124939
-0.418737	with a higher instruction	-0.124939
-0.253712	SSE or higher instruction	-0.124939
-0.253712	or any higher instruction	-0.124939
-0.253712	the next higher instruction	-0.124939
-0.324751	and the AVX2 instruction	-0.124939
-0.324751	somewhat. The AVX2 instruction	-0.124939
-0.356776	of the x86 instruction	-0.124939
-0.356776	to the x86 instruction	-0.124939
-0.297107	32- bit x86 instruction	-0.124939
-0.567351	with the appropriate instruction	-0.124939
-0.453138	of backwards compatible instruction	-0.124939
-0.451411	any particularly slow instruction	-0.124939
-0.614805	for the desired instruction	-0.124939
-0.434424	enable the desired instruction	-0.124939
-0.639571	for a given instruction	-0.124939
-0.346019	-ffunction- sections SSE instruction	-0.124939
-0.122520	if the SSE4.1 instruction	-0.124939
-0.122520	unless the SSE4.1 instruction	-0.124939
-0.382258	choose a newer instruction	-0.124939
-0.293745	set. The newer instruction	-0.124939
-0.519578	with the current instruction	-0.124939
-0.483131	for a low instruction	-0.124939
-0.441251	for a lower instruction	-0.124939
-0.185690	and the CPUID instruction	-0.124939
-0.313877	when the CPUID instruction	-0.124939
-0.185690	call the CPUID instruction	-0.124939
-0.137659	for the latest instruction	-0.425969
-0.066922	the bit scan instruction	-0.124939
-0.229601	slow bit scan instruction	-0.124939
-0.335599	/arch:SSE2 -msse2 SSE3 instruction	-0.124939
-0.248486	use the newest instruction	-0.124939
-0.248486	using the newest instruction	-0.124939
-0.187862	vectorization. The newest instruction	-0.124939
-0.335852	data The prefetch instruction	-0.124939
-0.331212	when the AVX512 instruction	-0.124939
-0.095042	However, the CISC instruction	-0.124939
-0.044926	resource. The CISC instruction	-0.124939
-0.044926	ratio. The CISC instruction	-0.124939
-0.095042	processors with CISC instruction	-0.124939
-0.420545	with the highest instruction	-0.124939
-0.324899	because the x86-64 instruction	-0.124939
-0.044086	9.6b. The MOVNTQ instruction	-0.124939
-0.420545	have the selected instruction	-0.124939
-0.407344	on the specified instruction	-0.124939
-0.065681	with the AVX-512 instruction	-0.124939
-0.031600	function. 12.2 AVX-512 instruction	-0.124939
-0.031600	107 12.2 AVX-512 instruction	-0.124939
-0.293607	// Error: lowest instruction	-0.124939
-0.293607	without the FMA4 instruction	-0.124939
-0.382089	supports the corresponding instruction	-0.124939
-0.293607	by an EMMS instruction	-0.124939
-0.237373	32-bit number (the instruction	-0.124939
-0.237373	by a blend instruction	-0.124939
-0.237373	SSE2 (or later) instruction	-0.124939
-0.237373	(bit scan forward) instruction	-0.124939
-0.237373	color difference. Newest instruction	-0.124939
-0.237373	The Pentium Pro instruction	-0.124939
-0.599614	execution to the point	-0.124939
-0.584011	example, but the point	-0.124939
-1.048569	are sure to point	-0.124939
-0.659359	; edx = point	-0.124939
-0.570334	and makes it point	-0.124939
-0.578619	call it will point	-0.124939
-0.000556	of the floating point	-0.124939
-0.001113	that the floating point	-0.124939
-0.001113	by the floating point	-0.124939
-0.000371	when the floating point	-0.301030
-0.001113	so the floating point	-0.124939
-0.001113	before the floating point	-0.124939
-0.001113	store the floating point	-0.124939
-0.001113	When the floating point	-0.124939
-0.001113	reflects the floating point	-0.124939
-0.001113	in-between the floating point	-0.124939
-0.000371	of a floating point	-0.301030
-0.001113	to a floating point	-0.124939
-0.001113	and a floating point	-0.124939
-0.001113	if a floating point	-0.124939
-0.001113	than a floating point	-0.124939
-0.001113	If a floating point	-0.124939
-0.001113	do a floating point	-0.124939
-0.001113	access a floating point	-0.124939
-0.001113	needs a floating point	-0.124939
-0.001113	addition, a floating point	-0.124939
-0.001113	rounds a floating point	-0.124939
-0.001608	use of floating point	-0.124939
-0.001608	number of floating point	-0.124939
-0.001608	order of floating point	-0.124939
-0.001608	cases of floating point	-0.124939
-0.000535	types of floating point	-0.124939
-0.001608	range of floating point	-0.124939
-0.001608	manipulations of floating point	-0.124939
-0.088497	integer to floating point	-0.124939
-0.013083	integers to floating point	-0.124939
-0.026573	conversion to floating point	-0.124939
-0.026573	apply to floating point	-0.124939
-0.026573	Conversion to floating point	-0.124939
-0.002899	integer and floating point	-0.124939
-0.000964	integers and floating point	-0.124939
-0.002899	constants and floating point	-0.124939
-0.007285	calculations in floating point	-0.124939
-0.007285	especially in floating point	-0.124939
-0.014695	functions. The floating point	-0.124939
-0.022646	variables for floating point	-0.124939
-0.022646	registers for floating point	-0.124939
-0.022646	exception for floating point	-0.124939
-0.022646	accumulators for floating point	-0.124939
-0.014695	assume that floating point	-0.124939
-0.014695	integers or floating point	-0.124939
-0.004843	example with floating point	-0.124939
-0.004843	addition with floating point	-0.124939
-0.004843	incompatible with floating point	-0.124939
-0.004843	than on floating point	-0.124939
-0.002415	reductions on floating point	-0.124939
-0.007285	faster than floating point	-0.124939
-0.007285	expressions than floating point	-0.124939
-0.014695	that have floating point	-0.124939
-0.014695	space. A floating point	-0.124939
-0.003627	than from floating point	-0.124939
-0.003627	conversion from floating point	-0.124939
-0.003627	conversions from floating point	-0.124939
-0.003627	Conversion from floating point	-0.124939
-0.004843	to make floating point	-0.124939
-0.002415	cannot make floating point	-0.124939
-0.014695	mixing different floating point	-0.124939
-0.014695	that all floating point	-0.124939
-0.014695	only one floating point	-0.124939
-0.014695	If each floating point	-0.124939
-0.002415	or two floating point	-0.124939
-0.004843	require two floating point	-0.124939
-0.003627	before any floating point	-0.124939
-0.003627	Conversions between floating point	-0.425969
-0.004843	it makes floating point	-0.124939
-0.002415	set makes floating point	-0.124939
-0.014695	a new floating point	-0.124939
-0.003627	difficulties making floating point	-0.425969
-0.003627	that does floating point	-0.425969
-0.014695	a big floating point	-0.124939
-0.004843	are eight floating point	-0.124939
-0.004843	first eight floating point	-0.124939
-0.004843	involves eight floating point	-0.124939
-0.014695	loop contains floating point	-0.124939
-0.003627	of doing floating point	-0.425969
-0.014695	enable fast floating point	-0.124939
-0.014695	that generate floating point	-0.124939
-0.014695	two positive floating point	-0.124939
-0.014695	are 100 floating point	-0.124939
-0.014695	bug causes floating point	-0.124939
-0.014695	to mix floating point	-0.124939
-0.014695	units. Any floating point	-0.124939
-0.014695	the entire floating point	-0.124939
-0.014695	x87 style floating point	-0.124939
-0.014695	static variables, floating point	-0.124939
-0.014695	for strict floating point	-0.124939
-0.007285	a nonzero floating point	-0.124939
-0.007285	of nonzero floating point	-0.124939
-0.014695	allows larger floating point	-0.124939
-0.014695	an additional floating point	-0.124939
-0.003627	for manipulating floating point	-0.425969
-0.014695	// Catch floating point	-0.124939
-0.014695	branch mispredictions, floating point	-0.124939
-0.014695	less precise floating point	-0.124939
-0.014695	-fno-alias Non-strict floating point	-0.124939
-0.014695	no native floating point	-0.124939
-0.014695	// Reset floating point	-0.124939
-0.014695	including relaxed floating point	-0.124939
-0.014695	to relax floating point	-0.124939
-0.014695	vectors FMA3 floating point	-0.124939
-0.461770	also a possible point	-0.124939
-0.064595	different types cannot point	-0.425969
-0.448894	the objects they point	-0.124939
-0.347301	the texts they point	-0.124939
-0.355887	; Induction++; ; point	-0.124939
-0.009255	{ // Floating point	-0.124939
-0.009255	and double Floating point	-0.124939
-0.009255	- n.a. Floating point	-0.124939
-0.009255	point variables Floating point	-0.124939
-0.009255	64-bit systems. Floating point	-0.124939
-0.009255	point division Floating point	-0.124939
-0.009255	and shift Floating point	-0.124939
-0.009255	clock cycles. Floating point	-0.124939
-0.009255	multiple purposes. Floating point	-0.124939
-0.009255	point expressions. Floating point	-0.124939
-0.009255	integer parameters. Floating point	-0.124939
-0.009255	32-bit integer. Floating point	-0.124939
-0.004603	0; 14.6 Floating point	-0.124939
-0.004603	137 14.6 Floating point	-0.124939
-0.009255	page 105. Floating point	-0.124939
-0.004603	29 7.3 Floating point	-0.124939
-0.004603	31 7.3 Floating point	-0.124939
-0.009255	is organized. Floating point	-0.124939
-0.009255	clock cycles). Floating point	-0.124939
-0.009255	programmer. 79 Floating point	-0.124939
-0.325301	for floating 26 point	-0.124939
-0.325301	the common entry point	-0.124939
-0.102843	before the decimal point	-0.124939
-0.102843	with a decimal point	-0.124939
-0.237853	from a technological point	-0.124939
-0.567768	$B1$2 is the loop	-0.124939
-0.567768	eax,1 is the loop	-0.124939
-1.020811	function of the loop	-0.124939
-1.622209	value of the loop	-0.124939
-0.863318	iteration of the loop	-0.124939
-0.863318	independent of the loop	-0.124939
-0.581719	polynomial of the loop	-0.124939
-0.586541	calculations and the loop	-0.124939
-0.876574	integer in the loop	-0.124939
-0.588599	100 in the loop	-0.124939
-0.877812	(except for the loop	-0.124939
-0.580817	predict that the loop	-0.124939
-0.580817	estimate that the loop	-0.124939
-0.556123	vector or the loop	-0.124939
-0.541861	not if the loop	-0.124939
-0.919000	only if the loop	-0.124939
-0.919000	efficient if the loop	-0.124939
-0.919000	But if the loop	-0.124939
-1.001435	advantageous if the loop	-0.124939
-0.541861	further if the loop	-0.124939
-0.541861	loops if the loop	-0.124939
-1.034850	than by the loop	-0.124939
-0.590098	best when the loop	-0.124939
-0.556360	times then the loop	-0.124939
-0.556360	flag then the loop	-0.124939
-0.583131	than from the loop	-0.124939
-0.585312	constant. If the loop	-0.124939
-1.012324	situation where the loop	-0.124939
-0.538562	check before the loop	-0.124939
-0.784205	immediately before the loop	-0.124939
-0.149070	roll out the loop	-0.301030
-0.112403	rolling out the loop	-0.425969
-0.560104	fills up the loop	-0.124939
-1.108504	to avoid the loop	-0.124939
-0.351862	branch inside the loop	-0.124939
-0.272262	calculations inside the loop	-0.124939
-0.370220	condition inside the loop	-0.124939
-0.370220	nothing inside the loop	-0.124939
-0.569918	variable unless the loop	-0.124939
-0.548859	check after the loop	-0.124939
-0.928282	turn off the loop	-0.124939
-0.372393	variable outside the loop	-0.124939
-0.372393	element outside the loop	-0.124939
-0.372393	overflow outside the loop	-0.124939
-0.064664	to unroll the loop	-0.124939
-0.452358	can predict the loop	-0.124939
-0.452358	can execute the loop	-0.124939
-0.064664	by unrolling the loop	-0.124939
-0.973882	to vectorize the loop	-0.124939
-0.350040	to evaluate the loop	-0.124939
-0.350040	counter, comparing the loop	-0.124939
-0.350040	to increment the loop	-0.124939
-0.350040	FuncC. Unrolling the loop	-0.124939
-0.590667	n is a loop	-0.124939
-0.190843	function of a loop	-0.425969
-0.913149	out of a loop	-0.124939
-0.539396	efficiency of a loop	-0.124939
-0.539396	body of a loop	-0.124939
-0.539396	Division of a loop	-0.124939
-0.567503	times in a loop	-0.124939
-0.836524	done in a loop	-0.124939
-0.567503	semicolons in a loop	-0.124939
-0.567503	Calculations in a loop	-0.124939
-1.014360	Assume that a loop	-0.124939
-1.007852	example, if a loop	-0.124939
-0.929349	to use a loop	-0.124939
-1.644495	to make a loop	-0.124939
-0.507521	integer. If a loop	-0.124939
-0.507521	obtained. If a loop	-0.124939
-1.314196	For example, a loop	-0.124939
-0.643049	roll out a loop	-0.124939
-0.552022	is inside a loop	-0.124939
-0.154389	to unroll a loop	-0.124939
-0.169898	usually unroll a loop	-0.124939
-0.452860	many processors, a loop	-0.124939
-0.350437	will vectorize a loop	-0.124939
-0.350437	CPU. Unrolling a loop	-0.124939
-0.350437	for incrementing a loop	-0.124939
-0.462236	the calculations of loop	-0.124939
-0.089962	to top of loop	-0.124939
-0.042656	; top of loop	-0.425969
-0.531768	2; } The loop	-0.124939
-0.531768	Z } The loop	-0.124939
-0.529402	innermost loop. The loop	-0.124939
-0.515839	overlapping calculations. The loop	-0.124939
-0.930322	or not. The loop	-0.124939
-0.494135	the value. The loop	-0.124939
-0.351125	AVX. 5. The loop	-0.124939
-0.351125	clearly better. The loop	-0.124939
-0.351125	in eax. The loop	-0.124939
-0.351125	value 1000. The loop	-0.124939
-0.351125	mov eax,0. The loop	-0.124939
-0.351125	changed freely. The loop	-0.124939
-0.351125	example 7.30b. The loop	-0.124939
-0.545146	r++) { // loop	-0.124939
-0.192125	c++) { // loop	-0.124939
-0.354543	through rows // loop	-0.124939
-0.458064	return x^10 // loop	-0.124939
-0.504372	induction variable as loop	-0.124939
-0.482022	FuncC(i); } This loop	-0.124939
-0.482022	sums } This loop	-0.124939
-0.501002	cycles, then this loop	-0.124939
-0.501002	will optimize this loop	-0.124939
-0.553580	calculation time. A loop	-0.124939
-0.648132	predicted well. A loop	-0.124939
-0.648132	branch prediction. A loop	-0.124939
-1.870421	there is no loop	-0.124939
-0.462094	integer power using loop	-0.124939
-0.851922	The most efficient loop	-0.124939
-0.009109	// Roll out loop	-0.823909
-0.163827	of the while loop	-0.124939
-0.163827	in the while loop	-0.124939
-0.163827	emulate the while loop	-0.124939
-0.552463	out a big loop	-0.124939
-0.355597	diagonal. The c loop	-0.124939
-0.354897	is inside another loop	-0.124939
-0.122053	of the innermost loop	-0.124939
-0.123205	in the innermost loop	-0.124939
-0.122053	only the innermost loop	-0.124939
-0.122053	also the innermost loop	-0.124939
-0.200660	outside the innermost loop	-0.124939
-0.296279	the critical innermost loop	-0.425969
-0.170161	A critical innermost loop	-0.124939
-0.547284	and the whole loop	-0.124939
-0.490944	have a special loop	-0.124939
-0.349060	;checkifi<100 ; repeat loop	-0.124939
-0.454421	above, the maximum loop	-0.124939
-0.494705	same. The maximum loop	-0.124939
-0.299563	from the message loop	-0.124939
-0.299563	in a message loop	-0.124939
-0.441650	as simple variables, loop	-0.124939
-0.331755	available use excessive loop	-0.124939
-0.314484	disadvantages: The unrolled loop	-0.124939
-0.293978	too much. Excessive loop	-0.124939
-0.293978	for avoiding infinite loop	-0.124939
-0.382543	0; // Initialize loop	-0.124939
-0.293978	log(c[i]); // Increment loop	-0.124939
-0.237698	are temporary intermediates, loop	-0.124939
-0.237698	Calculate integer power, loop	-0.124939
-0.237698	advantages: The i<20 loop	-0.124939
-0.237698	{ // Main loop	-0.124939
-0.358260	#ifdef _MSC_VER // If	-0.124939
-0.549791	Func1(2); ... } If	-0.124939
-0.356293	another exception. 64 If	-0.124939
-0.859865	floating point code. If	-0.124939
-0.330746	the error code. If	-0.124939
-0.330746	the simplest code. If	-0.124939
-0.428078	for application-specific code. If	-0.124939
-0.355256	the virtual function. If	-0.124939
-0.584352	in program memory. If	-0.124939
-0.414084	loaded into memory. If	-0.124939
-0.450808	than static memory. If	-0.124939
-0.714134	can be used. If	-0.124939
-0.992083	the data cache. If	-0.124939
-0.331220	a level-3 cache. If	-0.124939
-1.053987	in 64-bit systems. If	-0.124939
-0.331114	on some systems. If	-0.124939
-0.328770	is not efficient. If	-0.124939
-0.425608	are equally efficient. If	-0.124939
-0.473107	desired instruction set. If	-0.124939
-0.473107	corresponding instruction set. If	-0.124939
-0.304518	in each set. If	-0.124939
-0.351584	in some compilers. If	-0.124939
-1.297562	function is called. If	-0.124939
-0.953436	inside the loop. If	-0.124939
-0.809592	a smart pointer. If	-0.124939
-0.512862	and cache size. If	-0.124939
-0.558427	many function calls. If	-0.124939
-1.114702	in 32-bit mode. If	-0.124939
-0.735423	to the object. If	-0.124939
-0.434398	different function library. If	-0.124939
-0.307336	floating point library. If	-0.124939
-0.562904	40 clock cycles. If	-0.124939
-0.303502	a different thread. If	-0.124939
-0.555311	by another thread. If	-0.124939
-0.635602	for test purposes. If	-0.124939
-0.796469	the following way. If	-0.124939
-0.633204	in a vector. If	-0.124939
-0.630344	for local references. If	-0.124939
-0.344597	d = u; If	-0.124939
-0.444791	specific load address. If	-0.124939
-0.613809	in the CPU. If	-0.124939
-0.293323	a non-Intel CPU. If	-0.124939
-0.293444	becomes a problem. If	-0.124939
-0.445136	around this problem. If	-0.124939
-0.342721	non- sequential order. If	-0.124939
-0.342418	procedures are inefficient. If	-0.124939
-0.442734	statement was executed. If	-0.124939
-0.772536	the template parameter. If	-0.124939
-0.342418	big endian storage. If	-0.124939
-0.440602	another source file. If	-0.124939
-0.624200	in a register. If	-0.124939
-0.533249	and execution units. If	-0.124939
-0.340556	is a branch. If	-0.124939
-0.268276	the above table. If	-0.124939
-0.268276	procedure linkage table. If	-0.124939
-0.351431	or threads simultaneously. If	-0.124939
-0.268440	or seemingly simultaneously. If	-0.124939
-0.399747	be an integer. If	-0.124939
-0.268440	the nearest integer. If	-0.124939
-0.338174	be loaded anyway. If	-0.124939
-0.338367	on page 16. If	-0.124939
-0.268276	a 32-bit number. If	-0.124939
-0.268276	8-bit signed number. If	-0.124939
-0.335018	large or constant. If	-0.124939
-0.739452	of branch prediction. If	-0.124939
-0.433990	a particular application. If	-0.124939
-0.727318	the data members. If	-0.124939
-0.742052	long dependency chain. If	-0.124939
-0.255023	again and again. If	-0.124939
-0.642542	and back again. If	-0.124939
-0.872896	do not overlap. If	-0.124939
-0.335241	too many branches. If	-0.124939
-0.796537	difficult to maintain. If	-0.124939
-0.857220	in the future. If	-0.124939
-0.237123	any other factor. If	-0.124939
-0.565474	the unroll factor. If	-0.124939
-0.330637	with lower priority. If	-0.124939
-0.605522	library at www.agner.org/optimize/asmlib.zip. If	-0.124939
-0.330637	may be necessary. If	-0.124939
-0.663441	Common subexpression elimination If	-0.124939
-0.330637	different memory addresses. If	-0.124939
-0.330637	j << 5. If	-0.124939
-0.419838	will work better. If	-0.124939
-0.324147	resources cleaned up. If	-0.124939
-0.476961	variable is declared. If	-0.124939
-0.592896	program is running. If	-0.124939
-0.835778	type identification (RTTI) If	-0.124939
-0.211584	predictable operand first. If	-0.124939
-0.211584	members come first. If	-0.124939
-0.324147	pool. See www.agner.org/optimize/cppexamples.zip. If	-0.124939
-0.324147	non-object oriented programs. If	-0.124939
-0.211584	more reliable results. If	-0.124939
-0.283496	and reproducible results. If	-0.124939
-0.406659	and an addition. If	-0.124939
-0.313543	bytes of code). If	-0.124939
-0.573408	recover from errors. If	-0.124939
-0.037060	the AVX part. If	-0.425969
-0.573408	in chapter 12. If	-0.124939
-0.313543	or too long. If	-0.124939
-0.442734	exactly the same. If	-0.124939
-0.406659	cores is slow. If	-0.124939
-0.313543	from set 0x1C. If	-0.124939
-0.293080	calls to CriticalFunction. If	-0.124939
-0.293080	0 to 15. If	-0.124939
-0.293080	the following cases: If	-0.124939
-0.536873	*(p++) |= 0x20; If	-0.124939
-0.293080	for these methods. If	-0.124939
-0.293080	processor has hyperthreading. If	-0.124939
-0.102497	a FIFO manner? If	-0.124939
-0.102497	a FILO manner? If	-0.124939
-0.536873	difficult to read. If	-0.124939
-0.381442	can be obtained. If	-0.124939
-0.293080	and deleting containers. If	-0.124939
-0.381442	than 250 ms. If	-0.124939
-0.023447	have been added? If	-0.425969
-0.293080	do so. 58 If	-0.124939
-0.536873	(see page 105). If	-0.124939
-0.536873	Linux and BSD. If	-0.124939
-0.236909	instruction set extensions. If	-0.124939
-0.236909	of a macro. If	-0.124939
-0.236909	rightmost 1-bit removed. If	-0.124939
-0.236909	at compile time? If	-0.124939
-0.236909	the same class). If	-0.124939
-0.236909	very different speeds. If	-0.124939
-0.236909	given above. 7. If	-0.124939
-0.236909	a = lookup[b]; If	-0.124939
-0.236909	3"); or __debugbreak();. If	-0.124939
-0.236909	objects numbered consecutively? If	-0.124939
-0.236909	n! = n∙(n-1)!. If	-0.124939
-0.236909	on page 62. If	-0.124939
-0.236909	software was coded. If	-0.124939
-0.236909	by a key? If	-0.124939
-0.236909	is more complicated. If	-0.124939
-0.236909	and cryptography (www.intel.com). If	-0.124939
-0.236909	locally or remotely. If	-0.124939
-0.236909	a natural ordering? If	-0.124939
-0.236909	executable file stub. If	-0.124939
-0.236909	time to calculate. If	-0.124939
-0.236909	x86 systems). 42 If	-0.124939
-0.236909	using nontemporal writes. If	-0.124939
-0.236909	a logical sequence. If	-0.124939
-0.236909	Coarse time measurement. If	-0.124939
-0.236909	data for analysis. If	-0.124939
-0.236909	or multiple elements? If	-0.124939
-0.236909	pointers or references: If	-0.124939
-0.236909	a vector. 6. If	-0.124939
-0.236909	into the pipeline. If	-0.124939
-0.236909	element is stored? If	-0.124939
-0.236909	IntegerPower<10>(x); } 152 If	-0.124939
-0.236909	(number of ways). If	-0.124939
-0.236909	etc. is considerable. If	-0.124939
-0.236909	1.0f : 2.5f; If	-0.124939
-0.236909	sum1 += sum2; If	-0.124939
-0.236909	already been allocated. If	-0.124939
-0.392336	A list of which	-0.124939
-0.622460	negative list of which	-0.124939
-0.552539	positive list of which	-0.124939
-1.150491	The choice of which	-0.124939
-0.356591	specific recommendation of which	-0.124939
-0.356591	some indication of which	-0.124939
-0.356591	misleading reports of which	-0.124939
-0.461102	to do and which	-0.124939
-0.356934	be allowed and which	-0.124939
-0.356934	are costly and which	-0.124939
-0.356934	some situations, and which	-0.124939
-0.149609	the function in which	-0.249877
-1.021709	of code in which	-0.124939
-0.011919	the order in which	-0.602060
-0.076978	The order in which	-0.124939
-0.455577	the thread in which	-0.124939
-0.352582	{} brackets in which	-0.124939
-1.245888	is a function which	-0.124939
-0.573615	contains a function which	-0.124939
-0.787470	a library function which	-0.124939
-1.255570	a member function which	-0.124939
-0.771141	calls another function which	-0.124939
-0.352247	or API function which	-0.124939
-0.495879	of processors on which	-0.124939
-0.575672	be based on which	-0.124939
-0.064966	processor models on which	-0.124939
-0.352380	different opinions on which	-0.124939
-0.725238	an error code which	-0.124939
-0.358190	a new compiler which	-0.124939
-1.461809	at compile time which	-0.124939
-0.503454	in stack memory which	-0.124939
-0.357893	memory address at which	-0.124939
-0.357759	sure that functions which	-0.124939
-0.888625	inside the CPU which	-0.124939
-0.761576	a template class which	-0.124939
-0.548141	of a double which	-0.124939
-1.164572	a function pointer which	-0.124939
-0.350182	a 'this' pointer which	-0.124939
-0.492824	implicit 'this' pointer which	-0.124939
-0.357160	floating point library which	-0.124939
-0.444181	function of i which	-0.124939
-0.764221	value of i which	-0.124939
-1.179098	a shared object which	-0.124939
-0.583227	on a variable which	-0.124939
-0.979184	a memory address which	-0.124939
-0.349011	a higher address which	-0.124939
-0.356539	the following example, which	-0.124939
-0.356420	a single bit which	-0.124939
-0.553558	a vector register which	-0.124939
-0.654258	to find out which	-0.124939
-1.632110	the operating system which	-0.124939
-0.355573	precision conversion instructions which	-0.124939
-0.355780	various profilers available which	-0.124939
-0.818101	necessary information about which	-0.124939
-0.332857	specific recommendation about which	-0.124939
-0.332857	considerable debate about which	-0.124939
-0.355476	oldest Pentium CPUs which	-0.124939
-0.458574	using 8-bit integers which	-0.124939
-1.005976	it is known which	-0.124939
-0.354430	full debugging support which	-0.124939
-0.543465	We can calculate which	-0.124939
-0.353079	the | operator which	-0.124939
-0.462343	code to see which	-0.124939
-0.657831	possible to see which	-0.124939
-0.312645	libraries and see which	-0.124939
-0.351612	keywords and directives which	-0.124939
-0.351529	(page 77) shows which	-0.124939
-0.350899	a complicated process which	-0.124939
-0.514906	9 extra overhead which	-0.124939
-0.529840	with a profiler which	-0.124939
-0.348834	a shift operation which	-0.124939
-0.830534	of the code, which	-0.124939
-0.407292	an intermediate code, which	-0.124939
-0.347807	has three conditions which	-0.124939
-0.449096	useful when testing which	-0.124939
-0.205572	order to predict which	-0.124939
-0.302846	difficult to predict which	-0.124939
-0.205572	algorithms to predict which	-0.124939
-0.205572	unable to predict which	-0.124939
-0.469724	it is discussed which	-0.124939
-0.299340	is also discussed which	-0.124939
-0.345799	important to consider which	-0.124939
-0.630827	Intel C++ compiler, which	-0.124939
-0.445497	before it checks which	-0.124939
-0.529401	critical dependency chain which	-0.124939
-0.341081	the access non-sequential which	-0.124939
-0.338578	a special trick which	-0.124939
-0.497397	two 32-bit integers, which	-0.124939
-0.338420	cache line size, which	-0.124939
-0.338578	older MMX registers, which	-0.124939
-0.433737	should automatically detect which	-0.124939
-0.335992	time on deciding which	-0.124939
-0.331096	has a latency which	-0.124939
-0.228749	a is true, which	-0.124939
-0.228749	b is true, which	-0.124939
-0.211722	to start up, which	-0.124939
-0.211722	is filled up, which	-0.124939
-0.489593	know in advance which	-0.124939
-0.324386	< 5) {} which	-0.124939
-0.681528	binding by default, which	-0.124939
-0.443047	level of abstraction which	-0.124939
-0.313775	whole program optimization, which	-0.124939
-0.313775	floating point comparisons, which	-0.124939
-0.136968	on the stack, which	-0.124939
-0.313775	Template Library (STL) which	-0.124939
-0.313775	important to decide which	-0.124939
-0.313775	pointers and references, which	-0.124939
-0.293302	bitwise OR operator, which	-0.124939
-0.293302	-100 to -56 which	-0.124939
-0.293302	CPU family number, which	-0.124939
-0.293302	in the CPU, which	-0.124939
-0.293302	at unpredictable intervals which	-0.124939
-0.293302	of the array, which	-0.124939
-0.381714	on an interpreter which	-0.124939
-0.293302	requires a division, which	-0.124939
-0.293302	an integer comparison, which	-0.124939
-0.381714	the loop counter, which	-0.124939
-0.293302	dispatcher function decides which	-0.124939
-0.381714	a garbage collector which	-0.124939
-0.381714	predict with certainty which	-0.124939
-0.102568	single & operation, which	-0.124939
-0.102568	a shift operation, which	-0.124939
-0.237104	object is moved, which	-0.124939
-0.237104	to be mispredicted, which	-0.124939
-0.237104	link library (DLL) which	-0.124939
-0.237104	replaced by x<<3, which	-0.124939
-0.237104	the generic branch, which	-0.124939
-0.237104	function library asmlib, which	-0.124939
-0.237104	a compile-time polymorphism, which	-0.124939
-0.237104	have an attribute which	-0.124939
-0.237104	for intermediate results, which	-0.124939
-0.237104	performance even matters, which	-0.124939
-0.237104	/Gy, Linux: -ffunction-sections) which	-0.124939
-0.237104	addresses for everything, which	-0.124939
-0.237104	assembly language output, which	-0.124939
-0.237104	Visual Basic .NET, which	-0.124939
-0.237104	elements in a[] which	-0.124939
-0.237104	requires n-1 multiplications, which	-0.124939
-0.237104	make a bit-mask which	-0.124939
-0.237104	induction variable (eax) which	-0.124939
-0.237104	newest CPU model, which	-0.124939
-0.237104	by calling WritePrivateProfileString, which	-0.124939
-0.237104	type identification (RTTI), which	-0.124939
-0.237104	(XMM or YMM) which	-0.124939
-0.237104	the constant 2.5, which	-0.124939
-0.598074	[ecx+eax*4]. This is all	-0.124939
-0.592804	bit-mask which is all	-0.124939
-1.231922	The size of all	-0.124939
-0.818933	combined size of all	-0.124939
-1.118623	the values of all	-0.124939
-1.043529	take care of all	-0.124939
-0.900552	get rid of all	-0.124939
-0.722371	an array to all	-0.124939
-0.571046	keep pointers to all	-0.124939
-0.571046	you access to all	-0.124939
-0.461227	extra information to all	-0.124939
-0.357033	static keyword to all	-0.124939
-1.103504	be applied to all	-0.124939
-0.784141	executable file and all	-0.124939
-0.357156	for AVX2 and all	-0.124939
-0.357156	if statement and all	-0.124939
-0.357156	is true, and all	-0.124939
-0.991608	are available in all	-0.124939
-0.502156	same precision in all	-0.124939
-0.461034	monitor counters in all	-0.124939
-0.503151	also deallocated in all	-0.124939
-0.356881	are permissible in all	-0.124939
-0.641243	of memory for all	-0.124939
-1.469698	is used for all	-0.124939
-0.451694	this method for all	-0.124939
-0.349515	the stack for all	-0.124939
-0.349515	is best for all	-0.124939
-0.814083	compiler option for all	-0.124939
-1.098732	to check for all	-0.124939
-0.349515	memory allocation for all	-0.124939
-0.451694	be made for all	-0.124939
-0.338249	good choice for all	-0.124939
-0.451694	a PLT for all	-0.124939
-0.349515	a GOT for all	-0.124939
-0.349515	and c1 for all	-0.124939
-0.349515	compilers exist for all	-0.124939
-0.883113	this is that all	-0.124939
-0.787520	be sure that all	-0.124939
-1.240398	makes sure that all	-0.124939
-0.891477	is important that all	-0.124939
-1.318734	This means that all	-0.124939
-0.519143	often requires that all	-0.124939
-1.181197	the sense that all	-0.124939
-0.454233	standard specifies that all	-0.124939
-0.454233	2. Check that all	-0.124939
-0.351521	to verify that all	-0.124939
-0.351521	no guarantee that all	-0.124939
-0.560081	F1 only if all	-0.124939
-0.947973	the loop if all	-0.124939
-0.778366	most efficient if all	-0.124939
-0.543956	cycles. But if all	-0.124939
-0.352317	is zero if all	-0.124939
-0.352317	at runtime if all	-0.124939
-0.562139	time used by all	-0.124939
-0.355821	you should by all	-0.124939
-1.327132	is supported by all	-0.124939
-0.344629	the data with all	-0.124939
-0.445520	release version with all	-0.124939
-0.344629	carried out with all	-0.124939
-0.344629	command line with all	-0.124939
-0.344629	method works with all	-0.124939
-0.953252	be compatible with all	-0.124939
-0.344629	calculations. Even with all	-0.124939
-0.304016	is AND'ed with all	-0.124939
-0.138991	MKL). Works with all	-0.124939
-0.138991	(IPP). Works with all	-0.124939
-0.440961	advanced version on all	-0.124939
-0.302536	do operations on all	-0.124939
-0.302536	Similar operations on all	-0.124939
-0.341011	inline assembly on all	-0.124939
-0.565174	to work on all	-0.124939
-0.401033	that work on all	-0.124939
-0.440961	that works on all	-0.124939
-0.240445	is supported on all	-0.124939
-0.240445	and supported on all	-0.124939
-0.731026	works well on all	-0.124939
-0.519532	and turn on all	-0.124939
-0.341011	work efficiently on all	-0.124939
-0.341011	is incurred on all	-0.124939
-0.358239	function, though not all	-0.124939
-0.358158	the end when all	-0.124939
-0.355478	above example, then all	-0.124939
-0.459252	values first, then all	-0.124939
-0.349428	or not at all	-0.124939
-0.496189	are used at all	-0.124939
-0.349428	not evaluated at all	-0.124939
-0.349428	be visible at all	-0.124939
-1.108979	This will make all	-0.124939
-0.351639	or double because all	-0.124939
-0.351639	hexadecimal numbers because all	-0.124939
-0.454383	quite costly because all	-0.124939
-0.460970	searching needed before all	-0.124939
-0.501913	that can call all	-0.124939
-0.593678	time. For example, all	-0.124939
-0.577212	order to test all	-0.124939
-0.355504	Pentium 4, while all	-0.124939
-1.097327	This can cause all	-0.124939
-0.354448	and d would all	-0.124939
-0.553655	is to store all	-0.124939
-0.436341	you may store all	-0.124939
-0.354087	0x4700. These addresses all	-0.124939
-0.939768	compiler can replace all	-0.124939
-0.332560	number and sets all	-0.124939
-0.332560	The constructor sets all	-0.124939
-0.353192	exception handler needs all	-0.124939
-0.567723	}; // Make all	-0.124939
-0.455596	The above examples all	-0.124939
-0.455596	values, and last all	-0.124939
-0.396200	but only after all	-0.124939
-0.305083	the array after all	-0.124939
-0.305083	searching needed after all	-0.124939
-0.352023	may not load all	-0.124939
-0.496755	for turning off all	-0.124939
-0.286428	newest processors. Supports all	-0.124939
-0.286428	Open source. Supports all	-0.124939
-0.286428	possible workaround. Supports all	-0.124939
-0.123027	function by inlining all	-0.425969
-0.349394	software package, including all	-0.124939
-0.349301	loop without checking all	-0.124939
-0.348864	register and prevents all	-0.124939
-0.449556	zero by testing all	-0.124939
-0.531127	simply by copying all	-0.124939
-0.307911	the list causes all	-0.124939
-0.307911	critical stride causes all	-0.124939
-1.002718	the reason why all	-0.124939
-0.344798	shared object. Obviously, all	-0.124939
-0.343093	possible to contain all	-0.124939
-0.343475	STL vector stores all	-0.124939
-0.343246	parameter transfer across all	-0.124939
-0.117495	used in almost all	-0.124939
-0.117495	Linux in almost all	-0.124939
-0.338841	stack pointer. Likewise, all	-0.124939
-0.339036	F1 has saved all	-0.124939
-0.505390	need to remove all	-0.124939
-0.335793	directives and declare all	-0.124939
-0.331427	possible to select all	-0.124939
-0.331293	operator less. Fortunately, all	-0.124939
-0.324794	FMA4 fma4intrin.h (Gnu) all	-0.124939
-0.314388	test or manipulate all	-0.124939
-0.314173	the C++ language, all	-0.124939
-0.077702	program that scans all	-0.124939
-0.077702	scanner that scans all	-0.124939
-0.314388	does not solve all	-0.124939
-0.314388	for vectorization Not all	-0.124939
-0.382180	is to join all	-0.124939
-0.382180	have to distribute all	-0.124939
-0.237438	efficient to pool all	-0.124939
-0.237438	influences are removed, all	-0.124939
-0.237438	that you analyze all	-0.124939
-1.068595	inside a function but	-0.124939
-1.142167	takes more time but	-0.124939
-0.784577	size of all but	-0.124939
-0.461618	designed by Intel but	-0.124939
-0.357359	is not i but	-0.124939
-0.560616	advantages of C++ but	-0.124939
-0.357070	no specific order but	-0.124939
-0.356452	very contrived example, but	-0.124939
-0.653811	program under test but	-0.124939
-0.583810	interrupt the user but	-0.124939
-0.459581	four physical processors but	-0.124939
-0.539148	-(-a) = a, but	-0.124939
-0.458439	floating point overflow but	-0.124939
-0.353009	and Mac programs but	-0.124939
-0.906155	in most cases, but	-0.124939
-0.868222	in some cases, but	-0.124939
-0.577351	the simplest cases, but	-0.124939
-0.455839	require a multiplication but	-0.124939
-0.332038	static arrays automatically but	-0.124939
-0.332038	with 14.14b automatically but	-0.124939
-0.381897	of the function, but	-0.124939
-0.293451	an optimized function, but	-0.124939
-0.293451	the latter function, but	-0.124939
-0.190913	for intrinsic functions, but	-0.124939
-0.190913	supports intrinsic functions, but	-0.124939
-0.251659	contains similar functions, but	-0.124939
-0.251659	are pure functions, but	-0.124939
-0.313918	more efficient code, but	-0.124939
-0.313918	removing superfluous code, but	-0.124939
-0.237266	of the time, but	-0.124939
-0.237266	at compile time, but	-0.124939
-0.237266	at compile- time, but	-0.124939
-0.237266	the programmers' time, but	-0.124939
-0.462650	linking is used, but	-0.124939
-0.381786	can be used, but	-0.124939
-0.670852	SSE2 instruction set, but	-0.124939
-0.396989	higher instruction set, but	-0.124939
-0.293594	on AMD processors, but	-0.124939
-0.293594	Intel Atom processors, but	-0.124939
-0.624286	in 32-bit systems, but	-0.124939
-0.224471	than other CPUs, but	-0.124939
-0.298744	certain Intel CPUs, but	-0.124939
-0.298744	on multi-core CPUs, but	-0.124939
-0.340608	point register variables, but	-0.124939
-0.340934	4 or 8, but	-0.124939
-0.552498	few clock cycles, but	-0.124939
-0.338781	0 or 1, but	-0.124939
-0.679937	stored in memory, but	-0.124939
-0.516397	protected operating system, but	-0.124939
-0.437457	define 64-bit integers, but	-0.124939
-0.505417	time, of course, but	-0.124939
-0.335068	example 16.2 above, but	-0.124939
-0.187673	This is efficient, but	-0.124939
-0.187673	fast and efficient, but	-0.124939
-0.187673	also quite efficient, but	-0.124939
-0.335283	adds, not edx but	-0.124939
-0.330942	the general case, but	-0.124939
-0.330942	a function library, but	-0.124939
-0.330687	of system programming, but	-0.124939
-0.160241	between different threads, but	-0.124939
-0.135609	into multiple threads, but	-0.124939
-0.216956	between multiple threads, but	-0.124939
-0.428324	with many features, but	-0.124939
-0.486289	other programming languages, but	-0.124939
-0.324197	based on BSD, but	-0.124939
-0.211613	platforms as well, but	-0.124939
-0.211613	optimizes reasonably well, but	-0.124939
-0.649700	facilities are needed, but	-0.124939
-0.592989	or double precision, but	-0.124939
-0.477032	or hot spot but	-0.124939
-0.324511	is a float, but	-0.124939
-0.592989	as it is, but	-0.124939
-0.419900	in the unit-test but	-0.124939
-0.283530	is a pointer, but	-0.124939
-0.211613	an imported pointer, but	-0.124939
-0.313591	with 64 bits, but	-0.124939
-0.314000	Intel function libraries, but	-0.124939
-0.406719	to small devices, but	-0.124939
-0.313591	and model numbers, but	-0.124939
-0.313591	registers by 64, but	-0.124939
-0.406719	when it occurs, but	-0.124939
-0.313591	such optimizations automatically, but	-0.124939
-0.406719	output more readable but	-0.124939
-0.313591	and fence instructions, but	-0.124939
-0.406719	with template metaprogramming, but	-0.124939
-0.313591	division in vectors, but	-0.124939
-0.293126	have little-endian storage, but	-0.124939
-0.293126	types of expressions, but	-0.124939
-0.293126	the logarithm again, but	-0.124939
-0.102512	in multiple applications, but	-0.124939
-0.102512	for such applications, but	-0.124939
-0.381499	still be vectorized, but	-0.124939
-0.293126	reduce any expression, but	-0.124939
-0.102512	overflow can occur, but	-0.124939
-0.102512	error doesn't occur, but	-0.124939
-0.293126	to using hyperthreading, but	-0.124939
-0.293126	program under test, but	-0.124939
-0.293126	of 64-bit software, but	-0.124939
-0.381499	code more complex, but	-0.124939
-0.536954	program is loaded, but	-0.124939
-0.293126	safe and flexible, but	-0.124939
-0.293126	that need relocation, but	-0.124939
-0.293126	point addition unit, but	-0.124939
-0.293126	manual on usability, but	-0.124939
-0.381499	of this manual, but	-0.124939
-0.381499	simple test setup but	-0.124939
-0.293126	of known type, but	-0.124939
-0.293126	for other reasons, but	-0.124939
-0.293126	the simplest method, but	-0.124939
-0.236950	of the factorials, but	-0.124939
-0.236950	finding hot spots, but	-0.124939
-0.236950	by a macro, but	-0.124939
-0.236950	(.lib or .a), but	-0.124939
-0.236950	less than 2-20, but	-0.124939
-0.236950	compile with -mcmodel=large, but	-0.124939
-0.236950	references to relocate, but	-0.124939
-0.236950	(called static if), but	-0.124939
-0.236950	one CPU core, but	-0.124939
-0.236950	large data bases, but	-0.124939
-0.236950	A more primitive, but	-0.124939
-0.236950	called from main, but	-0.124939
-0.236950	same memory block, but	-0.124939
-0.236950	take the hint, but	-0.124939
-0.236950	by me manually, but	-0.124939
-0.236950	the option -ftrapv, but	-0.124939
-0.236950	below 2 GB, but	-0.124939
-0.236950	associated with profiling, but	-0.124939
-0.236950	(see page 103), but	-0.124939
-0.236950	-(-a) very often, but	-0.124939
-0.236950	sake of security, but	-0.124939
-0.236950	a simple solution, but	-0.124939
-0.236950	a particular situation, but	-0.124939
-0.236950	a syntax restriction, but	-0.124939
-0.236950	arrays as required, but	-0.124939
-0.236950	int)u; // Faster, but	-0.124939
-0.236950	of disk caching, but	-0.124939
-0.236950	to be noticeable but	-0.124939
-0.236950	many small subtasks, but	-0.124939
-0.236950	the container expandable, but	-0.124939
-0.236950	overlapping or aliasing, but	-0.124939
-0.236950	15.1b to 15.1c, but	-0.124939
-0.236950	it is cached, but	-0.124939
-0.236950	a considerable job, but	-0.124939
-0.236950	in scientific computing, but	-0.124939
-0.236950	simple type casting, but	-0.124939
-0.236950	the code section, but	-0.124939
-0.236950	methods of rounding, but	-0.124939
-0.236950	parameter is wrong, but	-0.124939
-0.236950	its final destination, but	-0.124939
-0.236950	override public symbols, but	-0.124939
-0.236950	the Mac platform, but	-0.124939
-0.536008	dispatching and is used	-0.124939
-0.894779	function that is used	-0.124939
-0.721441	one that is used	-0.124939
-0.501807	class that is used	-0.124939
-0.501807	method that is used	-0.124939
-0.501807	unwinding that is used	-0.124939
-0.869811	time it is used	-0.124939
-1.139931	before it is used	-0.124939
-0.867723	Position-independent code is used	-0.124939
-1.267803	// This is used	-0.124939
-0.594376	fashion. It is used	-0.124939
-0.549104	support which is used	-0.124939
-0.549104	trick which is used	-0.124939
-0.441530	code cache is used	-0.124939
-0.441530	level-1 cache is used	-0.124939
-0.526782	virtual table is used	-0.124939
-0.491718	one thread is used	-0.124939
-0.349386	This standard is used	-0.124939
-0.349386	16-bit mode is used	-0.124939
-0.763674	loop counter is used	-0.124939
-0.853796	memory space is used	-0.124939
-0.302698	dynamic_cast operator is used	-0.124939
-0.302698	const_cast operator is used	-0.124939
-0.302698	reinterpret_cast operator is used	-0.124939
-0.491718	inline keyword is used	-0.124939
-0.244854	lookup process is used	-0.124939
-0.244854	delaying process is used	-0.124939
-0.529659	function feature is used	-0.124939
-0.349386	a bool is used	-0.124939
-0.349386	stack frame is used	-0.124939
-0.451529	following algorithm is used	-0.124939
-0.349386	macro INSTRSET is used	-0.124939
-0.349386	function longjmp is used	-0.124939
-0.358703	more popular and used	-0.124939
-1.143411	that can be used	-0.124939
-0.502672	instruction can be used	-0.124939
-0.999991	which can be used	-0.124939
-0.502672	class can be used	-0.124939
-0.502672	compilers can be used	-0.124939
-0.722877	register can be used	-0.124939
-0.225743	method can be used	-0.124939
-0.502672	lookup can be used	-0.124939
-0.502672	union can be used	-0.124939
-0.502672	conversions can be used	-0.124939
-0.502672	metaprogramming can be used	-0.124939
-0.502672	map can be used	-0.124939
-0.502672	tool can be used	-0.124939
-0.502672	manuals can be used	-0.124939
-0.502672	units can be used	-0.124939
-0.502672	guidelines can be used	-0.124939
-0.939709	that may be used	-0.124939
-0.495045	integer may be used	-0.124939
-0.495045	These may be used	-0.124939
-0.495045	methods may be used	-0.124939
-0.495045	mechanism may be used	-0.124939
-0.495045	Templates may be used	-0.124939
-0.495045	tree may be used	-0.124939
-0.554973	variables will be used	-0.124939
-0.515853	should only be used	-0.124939
-0.825966	versions should be used	-0.124939
-0.342146	can also be used	-0.249877
-0.328690	might also be used	-0.124939
-0.507818	instruction cannot be used	-0.124939
-0.507818	optimization cannot be used	-0.124939
-0.428596	can even be used	-0.124939
-0.731210	should therefore be used	-0.124939
-0.664566	can still be used	-0.124939
-0.343232	code that are used	-0.124939
-0.483202	data that are used	-0.124939
-0.413639	functions that are used	-0.301030
-0.257694	variables that are used	-0.124939
-0.483202	libraries that are used	-0.124939
-0.291750	Variables that are used	-0.602060
-0.138556	Functions that are used	-0.425969
-0.343232	iterators that are used	-0.124939
-1.203619	the data are used	-0.124939
-0.505640	virtual functions are used	-0.124939
-0.505640	Virtual functions are used	-0.124939
-0.526879	directives which are used	-0.124939
-0.555034	Intel libraries are used	-0.124939
-0.498477	Smart pointers are used	-0.124939
-0.564753	function they are used	-0.124939
-0.526879	x86 processors are used	-0.124939
-0.477603	optimization manuals are used	-0.124939
-0.339169	These units are used	-0.124939
-0.477603	runtime frameworks are used	-0.124939
-0.438643	Threads Threads are used	-0.124939
-0.358621	return ipow(x,10); // used	-0.124939
-0.557343	important than it used	-0.124939
-0.195781	tables are not used	-0.425969
-0.779181	Here, I have used	-0.124939
-0.535701	tricky. I have used	-0.124939
-0.593241	also the time used	-0.124939
-0.580600	input. The time used	-0.124939
-0.355794	#define directives when used	-0.124939
-0.355794	const definitions when used	-0.124939
-0.853789	free the memory used	-0.124939
-0.355285	and data memory used	-0.124939
-0.462599	size or data used	-0.124939
-0.578213	variable is only used	-0.124939
-0.889215	inside the CPU used	-0.124939
-1.295681	of the most used	-0.124939
-0.820781	to the most used	-0.124939
-0.479791	memory is also used	-0.124939
-0.176757	mechanism is also used	-0.124939
-0.357279	cache lines we used	-0.124939
-0.349472	variables are often used	-0.124939
-0.349472	addresses are often used	-0.124939
-0.349472	Arrays are often used	-0.124939
-0.045974	the most often used	-0.124939
-0.308869	file. Keep often used	-0.124939
-0.487820	but the method used	-0.124939
-0.548245	pow The method used	-0.124939
-0.569533	experience to get used	-0.124939
-0.497946	line that was used	-0.124939
-0.751604	of cache space used	-0.124939
-0.353573	and frameworks typically used	-0.124939
-0.496982	the memory model used	-0.124939
-0.419417	that are never used	-0.124939
-0.296107	they are never used	-0.124939
-0.716989	are no longer used	-0.124939
-0.348173	as additions. When used	-0.124939
-0.341576	solutions are now used	-0.124939
-0.341576	branches. The algorithms used	-0.124939
-0.339140	if you had used	-0.124939
-0.339194	important and generally used	-0.124939
-0.335976	from Func 87 used	-0.124939
-0.331586	The method currently used	-0.124939
-0.027553	The most commonly used	-0.124939
-0.122566	are two commonly used	-0.124939
-0.212125	separate from seldom used	-0.124939
-0.212125	and put seldom used	-0.124939
-0.314455	implementations of Pascal used	-0.124939
-0.237674	embedded systems Microcontrollers used	-0.124939
-1.061386	which is the one	-0.124939
-0.526224	may be the one	-0.124939
-0.594534	modules than the one	-0.124939
-0.358206	code like the one	-0.124939
-1.180937	to find the one	-0.124939
-0.583558	Gnu This is one	-0.124939
-0.583558	CPUs"). This is one	-0.124939
-0.461895	software product is one	-0.124939
-0.357558	enabled (there is one	-0.124939
-0.357558	runtime. Polymorphism is one	-0.124939
-0.358723	functions counts a one	-0.124939
-0.937329	the calculation of one	-0.425969
-0.981191	a pointer to one	-0.124939
-0.357369	higher priority to one	-0.124939
-0.723152	is identical to one	-0.124939
-0.539351	often belong to one	-0.124939
-1.214016	instruction set and one	-0.124939
-0.821289	from Intel and one	-0.124939
-0.780433	executable file and one	-0.124939
-0.459616	one global and one	-0.124939
-0.826445	the program, and one	-0.124939
-0.459616	for SSE4.1 and one	-0.124939
-0.355765	CPU brands, and one	-0.124939
-0.519453	write it in one	-0.124939
-0.974693	short int in one	-0.124939
-0.902278	an integer in one	-0.124939
-0.456862	parent class in one	-0.124939
-0.543886	R value in one	-0.124939
-1.524159	be stored in one	-0.124939
-0.353595	through pointers in one	-0.124939
-0.534765	objects together in one	-0.124939
-0.456862	register temp in one	-0.124939
-0.793622	all strings in one	-0.124939
-0.353595	consecutive terms in one	-0.124939
-0.456862	four additions in one	-0.124939
-0.525800	much data for one	-0.124939
-0.462352	element addresses for one	-0.124939
-0.593729	threads so that one	-0.124939
-1.042071	making sure that one	-0.124939
-0.557165	Some instructions are one	-0.124939
-0.462867	takes zero or one	-0.124939
-0.583936	negative or if one	-0.124939
-0.357114	two comparisons by one	-0.124939
-0.357114	dynamically created by one	-0.124939
-0.834286	clock cycle on one	-0.124939
-0.540613	execute a code one	-0.124939
-0.588879	142 unsigned int one	-0.124939
-0.368732	is more than one	-0.124939
-0.368732	in more than one	-0.124939
-0.368732	for more than one	-0.124939
-0.368732	do more than one	-0.124939
-0.368732	load more than one	-0.124939
-0.368732	prefetch more than one	-0.124939
-0.775201	you can have one	-0.124939
-0.544137	and will have one	-0.124939
-0.353790	Such variables have one	-0.124939
-1.237659	Do not use one	-0.124939
-0.537424	library will use one	-0.124939
-0.884272	called only from one	-0.124939
-0.346832	memory block from one	-0.124939
-0.139676	is transferred from one	-0.124939
-0.139676	or transferred from one	-0.124939
-0.346832	be saved from one	-0.124939
-1.191599	a program has one	-0.124939
-0.351976	pipeline structure has one	-0.124939
-0.351976	the latter has one	-0.124939
-1.278334	want to make one	-0.124939
-0.570642	data and make one	-0.124939
-0.624536	is the only one	-0.124939
-0.843078	there is only one	-0.124939
-0.452206	one and only one	-0.124939
-0.298418	so that only one	-0.124939
-0.124066	may be only one	-0.124939
-0.124066	should be only one	-0.124939
-0.272856	systems with only one	-0.124939
-0.272856	system with only one	-0.124939
-0.124066	will have only one	-0.124939
-0.124066	CPUs have only one	-0.124939
-0.298418	called from only one	-0.124939
-0.236847	function has only one	-0.124939
-0.152037	template has only one	-0.124939
-0.152037	computer has only one	-0.124939
-0.298418	will make only one	-0.124939
-0.298418	For example, only one	-0.124939
-0.272856	that take only one	-0.124939
-0.272856	operations take only one	-0.124939
-0.990055	is mispredicted only one	-0.124939
-0.298418	to hold only one	-0.124939
-0.354396	branch prediction. If one	-0.124939
-0.457878	operand first. If one	-0.124939
-0.546591	recommendation of which one	-0.124939
-0.350679	find out which one	-0.124939
-0.493514	and see which one	-0.124939
-0.456236	solution is using one	-0.124939
-1.036084	improved by using one	-0.124939
-0.481808	source files into one	-0.124939
-0.334060	be read into one	-0.124939
-0.334060	.cpp modules into one	-0.124939
-0.499919	join them into one	-0.124939
-0.334060	to 0x273F into one	-0.124939
-0.611486	be joined into one	-0.124939
-0.357500	manual is number one	-0.124939
-0.357244	two threads where one	-0.124939
-0.502591	This typically takes one	-0.124939
-0.593675	jobs. For example, one	-0.124939
-0.459871	pointer takes up one	-0.124939
-0.563272	goes many times one	-0.124939
-0.355175	kept entirely inside one	-0.124939
-0.753876	you will get one	-0.124939
-0.134921	code section needs one	-0.124939
-0.134921	data section needs one	-0.124939
-0.352333	above, but read one	-0.124939
-0.813228	branch that goes one	-0.124939
-1.084352	You may choose one	-0.124939
-0.321473	translated to just one	-0.124939
-0.416508	AND-operations in just one	-0.124939
-0.350821	CPU detection function, one	-0.124939
-0.514696	example, to go one	-0.124939
-0.348455	it should save one	-0.124939
-0.844825	of the preceding one	-0.124939
-0.617652	to the preceding one	-0.124939
-0.542742	bytes by adding one	-0.124939
-0.404996	has at least one	-0.124939
-0.404996	calls at least one	-0.124939
-0.448381	squares and handle one	-0.124939
-0.346193	up and enable one	-0.124939
-0.782298	by the program, one	-0.124939
-1.031668	SSE2 instruction set, one	-0.124939
-0.541980	efficient to allocate one	-0.124939
-0.498132	we can eliminate one	-0.124939
-0.338831	are currently available, one	-0.124939
-0.331283	units, and 22 one	-0.124939
-0.331419	cache line. Only one	-0.124939
-0.324784	more integer units, one	-0.124939
-0.314164	ended queue) allocates one	-0.124939
-0.293672	that goes randomly one	-0.124939
-0.293672	compiled three times, one	-0.124939
-0.237430	1)sign 2exponent 16383 one	-0.124939
-0.237430	Day for signifying one	-0.124939
-0.237430	specific purpose: Contain one	-0.124939
-0.237430	i and shifts one	-0.124939
-0.237430	into three parts: one	-0.124939
-0.237430	just two branches: one	-0.124939
-0.237430	variable two names, one	-0.124939
-0.237430	elements were inserted, one	-0.124939
-0.598206	already in the cache	-0.124939
-1.901995	is that the cache	-0.124939
-0.576768	arrays by the cache	-0.124939
-1.449843	divisible by the cache	-0.124939
-1.095063	less than the cache	-0.124939
-0.623298	bigger than the cache	-0.124939
-0.558826	evicted from the cache	-0.124939
-0.558826	fetched from the cache	-0.124939
-0.843315	time because the cache	-0.124939
-0.571145	28 because the cache	-0.124939
-0.845026	set. If the cache	-0.124939
-0.572058	writes. If the cache	-0.124939
-0.585732	that all the cache	-0.124939
-0.587247	again before the cache	-0.124939
-0.929372	will cause the cache	-0.124939
-0.500896	that uses the cache	-0.124939
-0.653987	at least the cache	-0.124939
-0.355979	will evict the cache	-0.124939
-0.355979	column 28, the cache	-0.124939
-1.062580	example is a cache	-0.124939
-0.596142	beginning of a cache	-0.124939
-1.013905	is also a cache	-0.124939
-0.355957	address so a cache	-0.124939
-0.459860	know how a cache	-0.124939
-0.522916	will cause a cache	-0.124939
-0.065426	without loading a cache	-0.124939
-0.355957	for fetching a cache	-0.124939
-0.355957	of occupying a cache	-0.124939
-0.577291	speed because of cache	-0.124939
-0.571314	which set of cache	-0.124939
-0.594018	realistic number of cache	-0.124939
-1.039781	a lot of cache	-0.124939
-1.276109	the amount of cache	-0.124939
-0.459207	The details of cache	-0.124939
-0.459207	The penalty of cache	-0.124939
-0.527514	a waste of cache	-0.124939
-0.355443	three levels of cache	-0.124939
-0.541062	time goes to cache	-0.124939
-0.526154	memory access and cache	-0.124939
-0.658325	instruction sets and cache	-0.124939
-0.983961	the cache. The cache	-0.124939
-0.524977	contemporary processors. The cache	-0.124939
-0.461641	cache line. The cache	-0.124939
-0.593940	there will be cache	-0.124939
-0.358416	file access or cache	-0.124939
-0.463064	// align by cache	-0.124939
-0.806636	to the code cache	-0.124939
-0.854833	in the code cache	-0.124939
-1.125699	that the code cache	-0.124939
-0.768377	time. The code cache	-0.124939
-0.529501	together The code cache	-0.124939
-0.346502	likely that code cache	-0.124939
-0.346502	linker. Both code cache	-0.124939
-0.594928	resources, such as cache	-0.124939
-0.526200	and uses more cache	-0.124939
-0.459230	and data A cache	-0.124939
-0.355461	data sequentially A cache	-0.124939
-1.234581	of the data cache	-0.124939
-0.463658	use and data cache	-0.124939
-0.463658	cache and data cache	-0.124939
-0.513790	index. The data cache	-0.124939
-0.152305	the level-1 data cache	-0.124939
-0.166663	a level-1 data cache	-0.124939
-0.357907	into eight different cache	-0.124939
-0.778686	to the same cache	-0.124939
-0.623650	for the same cache	-0.425969
-1.227022	share the same cache	-0.124939
-0.794658	sharing the same cache	-0.124939
-0.357945	Wikipedia under CPU cache	-0.124939
-0.657593	There are other cache	-0.124939
-0.357920	Func 87 used cache	-0.124939
-1.120740	there are no cache	-0.124939
-0.997144	microcontrollers have no cache	-0.124939
-0.357698	program has most cache	-0.124939
-0.357306	efficient today where cache	-0.124939
-0.357107	from loading any cache	-0.124939
-0.586768	load a new cache	-0.124939
-0.580461	fourth of these cache	-0.124939
-0.047786	4 bytes without cache	-0.124939
-0.047786	8 bytes without cache	-0.124939
-0.015352	16 bytes without cache	-0.124939
-0.104572	that take up cache	-0.124939
-0.565852	have an extra cache	-0.124939
-1.025688	This can cause cache	-0.124939
-0.773419	it may cause cache	-0.124939
-0.150433	of the four cache	-0.425969
-0.458239	are only four cache	-0.124939
-0.549287	at the last cache	-0.124939
-0.326622	of 64. Each cache	-0.124939
-0.326622	line 29. Each cache	-0.124939
-0.770023	order to improve cache	-0.124939
-0.229833	in the level-2 cache	-0.124939
-0.095209	for the level-2 cache	-0.124939
-0.095209	that the level-2 cache	-0.124939
-0.095209	than the level-2 cache	-0.124939
-0.095209	from the level-2 cache	-0.124939
-0.095209	prevents the level-2 cache	-0.124939
-0.031645	and a level-2 cache	-0.124939
-0.031645	if, a level-2 cache	-0.124939
-0.065779	cache. The level-2 cache	-0.124939
-0.065779	stronger for level-2 cache	-0.124939
-0.065779	Check if level-2 cache	-0.124939
-0.959431	of the code, cache	-0.124939
-0.451373	{ // No cache	-0.124939
-0.237929	of the level-1 cache	-0.124939
-0.510544	in the level-1 cache	-0.124939
-0.237929	for the level-1 cache	-0.124939
-0.237929	reload the level-1 cache	-0.124939
-0.188023	than for level-1 cache	-0.124939
-0.188023	the entire level-1 cache	-0.124939
-0.490700	in a special cache	-0.124939
-0.540280	want to prevent cache	-0.124939
-0.490646	it can save cache	-0.124939
-0.346016	causes an entire cache	-0.124939
-0.338985	get very expensive cache	-0.124939
-0.505418	time a thousand cache	-0.124939
-0.443766	into an arbitrary cache	-0.124939
-0.077728	96 9.11 Explicit cache	-0.124939
-0.077728	2001. 9.11 Explicit cache	-0.124939
-0.314479	with a micro-op cache	-0.124939
-0.293811	to disk. Provoke cache	-0.124939
-0.237552	supported instruction sets, cache	-0.124939
-0.237552	size) = (total cache	-0.124939
-0.237552	speed, memory economy, cache	-0.124939
-0.237552	unit-test without taking cache	-0.124939
-0.237552	machine instructions executed, cache	-0.124939
-0.159759	an expression that should	-0.425969
-0.569032	false where it should	-0.124939
-0.357364	possible. Typically it should	-0.124939
-1.049185	of a function should	-0.124939
-0.958111	in a function should	-0.124939
-0.557846	Obviously, a function should	-0.124939
-0.354740	A thread-safe function should	-0.124939
-0.499957	The vectorized code should	-0.124939
-0.459034	code. System code should	-0.124939
-0.355307	Your measurement code should	-0.124939
-0.358336	actual calculations. This should	-0.124939
-0.569894	good optimizing compiler should	-0.124939
-0.828769	counts that you should	-0.124939
-0.544150	application then you should	-0.124939
-0.544150	limit, then you should	-0.124939
-0.468528	operands because you should	-0.124939
-0.488960	manual, but you should	-0.124939
-0.828777	for example, you should	-0.124939
-0.173927	it. Therefore, you should	-0.124939
-0.173927	consuming. Therefore, you should	-0.124939
-0.173927	exception. Therefore, you should	-0.124939
-0.173927	namespaces. Therefore, you should	-0.124939
-0.468528	testing. Here, you should	-0.124939
-0.430340	C++ program, you should	-0.124939
-0.332552	dispatching. Obviously, you should	-0.124939
-0.332552	the contrary, you should	-0.124939
-0.350026	became available. It should	-0.124939
-0.452340	storage space. It should	-0.124939
-0.350026	installation tools. It should	-0.124939
-0.350026	a buffer. It should	-0.124939
-0.503535	The test data should	-0.124939
-0.484526	sets. The program should	-0.124939
-0.484526	sched_setaffinity). The program should	-0.124939
-0.351877	used. No program should	-0.124939
-0.462238	of math functions should	-0.124939
-1.015107	critical innermost loop should	-0.124939
-1.278415	structure or class should	-0.124939
-1.272526	in this example should	-0.124939
-0.569640	8. The size should	-0.124939
-1.598738	a and b should	-0.124939
-0.760255	of each object should	-0.124939
-0.560854	version of C++ should	-0.124939
-0.351609	to NULL. There should	-0.124939
-0.351609	by commas. There should	-0.124939
-0.437258	or multidimensional array should	-0.124939
-0.309470	A multidimensional array should	-0.124939
-0.543730	of the objects should	-0.124939
-0.821332	variables and objects should	-0.124939
-0.678922	Variables and objects should	-0.124939
-0.357053	data decomposition, we should	-0.124939
-0.419854	point operations. You should	-0.124939
-0.324160	memory allocation. You should	-0.124939
-0.324160	the software. You should	-0.124939
-0.419854	b overlap. You should	-0.124939
-0.324160	about them. You should	-0.124939
-0.324160	too late. You should	-0.124939
-0.538600	134. The table should	-0.124939
-0.356871	that software performance should	-0.124939
-0.356639	problems. All software should	-0.124939
-0.460949	The loop branch should	-0.124939
-0.476017	spots. The test should	-0.124939
-0.437194	realistic performance test should	-0.124939
-0.338015	The speed test should	-0.124939
-0.356089	used. Web systems should	-0.124939
-0.356139	function or method should	-0.124939
-0.459811	These complicated cases should	-0.124939
-1.472843	by a constant should	-0.124939
-0.343302	memory. Big arrays should	-0.124939
-0.343302	data. Multidimensional arrays should	-0.124939
-1.252885	floating point calculations should	-0.124939
-1.168613	in multiple versions should	-0.124939
-0.343448	all three versions should	-0.124939
-0.547061	than 16 bytes should	-0.124939
-0.831009	by multiple threads should	-0.124939
-0.521611	cores. Each thread should	-0.124939
-0.354878	dialog boxes, etc. should	-0.124939
-0.546112	by a list should	-0.124939
-0.571005	A loop counter should	-0.124939
-0.562710	The loop count should	-0.124939
-0.353310	uninstallation of programs should	-0.124939
-0.353481	server. These problems should	-0.124939
-0.456674	and the dispatching should	-0.124939
-0.845008	the memory block should	-0.124939
-0.518177	The template parameter should	-0.124939
-0.455758	data and resources should	-0.124939
-0.250103	The CPU dispatcher should	-0.124939
-0.352328	The updating mechanism should	-0.124939
-0.746037	The .NET framework should	-0.124939
-0.162851	are used together should	-0.726999
-0.475469	The installation process should	-0.124939
-0.323097	The update process should	-0.124939
-0.515923	All intermediate results should	-0.124939
-0.493935	block. Thread-local storage should	-0.124939
-0.349116	a few lines should	-0.124939
-0.348417	input and output should	-0.124939
-0.303865	so. These containers should	-0.124939
-0.303865	Objects inside containers should	-0.124939
-0.734407	of one iteration should	-0.124939
-0.598063	search for updates should	-0.124939
-0.299027	downloaded program updates should	-0.124939
-0.750697	and switch statements should	-0.124939
-0.305941	case. Loop unrolling should	-0.124939
-0.305941	factor. Loop unrolling should	-0.124939
-0.344717	state. This penalty should	-0.124939
-0.693694	the clock counts should	-0.124939
-0.343156	or other device should	-0.124939
-0.441069	long. Lazy binding should	-0.124939
-0.441216	by an interrupt should	-0.124939
-0.338860	access rights. Software should	-0.124939
-0.351604	that software developers should	-0.124939
-0.351604	problems. Software developers should	-0.124939
-0.331050	etc. Accessibility guidelines should	-0.124939
-0.211820	times. A queue should	-0.124939
-0.211820	a FIFO queue should	-0.124939
-0.324780	audio or video should	-0.124939
-0.324780	The performance measurement should	-0.124939
-0.324555	The following considerations should	-0.124939
-0.324780	interrupt service routine should	-0.124939
-0.325005	that are modified should	-0.124939
-0.313940	features. User feedback should	-0.124939
-0.313940	heavy mathematical calculations, should	-0.124939
-0.293459	standardized file formats should	-0.124939
-0.293459	resources and servers should	-0.124939
-0.381907	64 bits wide, should	-0.124939
-0.237243	outside any function) should	-0.124939
-0.237243	planned solutions. Patches should	-0.124939
-0.237243	utilized appropriately. Users should	-0.124939
-0.237243	seriously. User complaints should	-0.124939
-0.237243	copy protection scheme should	-0.124939
-0.237243	on which imprecisions should	-0.124939
-0.237243	proceed unattended. Uninstallation should	-0.124939
-0.237243	current .cpp file) should	-0.124939
-0.237243	new/delete or malloc/free should	-0.124939
-1.593328	size of the integer	-0.124939
-1.183371	bits of the integer	-0.124939
-0.597991	16 to the integer	-0.124939
-0.894681	important that the integer	-0.124939
-0.598071	sign-bit if the integer	-0.124939
-0.887347	performance because the integer	-0.124939
-0.874383	and all the integer	-0.124939
-0.587893	and using the integer	-0.124939
-0.548688	Let's take the integer	-0.124939
-0.524904	cannot reduce the integer	-0.124939
-0.723012	the smaller the integer	-0.124939
-1.523925	the use of integer	-0.124939
-1.064439	maximum number of integer	-0.124939
-0.328377	floating point to integer	-0.124939
-0.203609	or double to integer	-0.124939
-0.357381	41 Float to integer	-0.124939
-0.358098	floating point and integer	-0.124939
-0.462582	to SSE4.1 and integer	-0.124939
-1.548350	be stored in integer	-0.124939
-0.523972	pivot element. The integer	-0.124939
-0.501868	particular application. The integer	-0.124939
-0.460773	loop index. The integer	-0.124939
-0.356675	example 8.15b. The integer	-0.124939
-0.573751	have functions for integer	-0.124939
-0.554083	no instructions for integer	-0.124939
-0.579916	automatic check for integer	-0.124939
-0.356631	caching problems for integer	-0.124939
-0.834670	the operands are integer	-0.124939
-0.358395	of functions with integer	-0.124939
-0.354002	decrement operators on integer	-0.124939
-0.312544	more reductions on integer	-0.124939
-0.312544	simple reductions on integer	-0.124939
-0.354002	algebraic manipulations on integer	-0.124939
-1.132991	as fast as integer	-0.124939
-0.871201	b is an integer	-0.124939
-0.485737	exponent is an integer	-0.124939
-0.492485	bits of an integer	-0.124939
-0.492485	range of an integer	-0.124939
-0.526262	number to an integer	-0.124939
-0.546461	allocated for an integer	-0.124939
-0.481184	integer, or an integer	-0.124939
-0.536128	variable as an integer	-0.124939
-0.492571	to use an integer	-0.124939
-0.449491	requires only an integer	-0.124939
-0.412874	to do an integer	-0.124939
-0.318551	check whether an integer	-0.124939
-0.318551	is simply an integer	-0.124939
-0.318551	can replace an integer	-0.124939
-0.318551	in fact an integer	-0.124939
-0.318551	operations. When an integer	-0.124939
-0.318551	by adding an integer	-0.124939
-0.318551	you divide an integer	-0.124939
-0.318551	can convert an integer	-0.124939
-0.318551	to increment an integer	-0.124939
-0.318551	of declaring an integer	-0.124939
-0.318551	clause. Comparing an integer	-0.124939
-0.318551	a[i]; Converting an integer	-0.124939
-0.318551	by replacing an integer	-0.124939
-0.658519	more predictable than integer	-0.124939
-0.462680	will typically use integer	-0.124939
-1.331450	two or more integer	-0.124939
-0.353261	adding one more integer	-0.124939
-0.353261	a few more integer	-0.124939
-0.718128	avoid conversions from integer	-0.124939
-0.458896	enabled. Conversion from integer	-0.124939
-0.549297	integer with vector integer	-0.124939
-0.569791	summarizes the different integer	-0.124939
-0.580256	Sizes of different integer	-0.124939
-0.358076	an advantage because integer	-0.124939
-0.565651	variables for other integer	-0.124939
-1.227481	size of each integer	-0.124939
-1.457723	possible to do integer	-0.124939
-0.502666	The first two integer	-0.124939
-0.541672	using a 64-bit integer	-0.124939
-0.746833	advantage of 64-bit integer	-0.124939
-1.366224	the most efficient integer	-0.124939
-0.514073	of a 32-bit integer	-0.124939
-0.349916	such as 32-bit integer	-0.124939
-1.493010	the most critical integer	-0.124939
-0.531914	SSE2 128 bit integer	-0.124939
-0.450642	AVX2 256 bit integer	-0.124939
-0.333273	convert the unsigned integer	-0.124939
-0.333273	bits. The unsigned integer	-0.124939
-0.420882	of an unsigned integer	-0.124939
-0.297210	as an unsigned integer	-0.124939
-0.523227	assume that these integer	-0.124939
-0.356282	then the even integer	-0.124939
-0.938509	is a simple integer	-0.124939
-0.432897	to do simple integer	-0.124939
-0.334593	to mix simple integer	-0.124939
-0.330990	vector operations An integer	-0.124939
-0.330990	= s; An integer	-0.124939
-0.330990	to +127. An integer	-0.124939
-0.498858	code that contains integer	-0.124939
-0.580923	whether a particular integer	-0.124939
-0.353608	will often replace integer	-0.124939
-0.134893	141 14.9 Using integer	-0.124939
-0.134893	int)u; 14.9 Using integer	-0.124939
-0.312944	of a signed integer	-0.124939
-0.213625	to a signed integer	-0.124939
-0.413463	conversion to signed integer	-0.124939
-0.291617	assumption that signed integer	-0.124939
-0.352761	Here, / means integer	-0.124939
-0.326694	to store aligned integer	-0.124939
-0.326694	to load aligned integer	-0.124939
-0.452782	differently. A negative integer	-0.124939
-0.775263	is a positive integer	-0.124939
-0.784800	advantageous to mix integer	-0.124939
-0.013075	to store unaligned integer	-0.602060
-0.013075	to load unaligned integer	-0.602060
-0.343270	also allows 256-bit integer	-0.124939
-0.443807	use the default integer	-0.124939
-0.441273	float Register variables, integer	-0.124939
-0.178021	use the smallest integer	-0.425969
-0.207211	using the smallest integer	-0.124939
-0.493495	or more complex integer	-0.124939
-0.255503	the first six integer	-0.124939
-0.335819	are approximately six integer	-0.124939
-0.606557	systems and fourteen integer	-0.124939
-0.331579	better at reducing integer	-0.124939
-0.237669	Example 15.1c. Calculate integer	-0.124939
-0.237669	Example 15.1b. Calculate integer	-0.124939
-0.420681	make an additional integer	-0.124939
-0.324823	way of defining integer	-0.124939
-0.407476	Most reductions involving integer	-0.124939
-0.314202	Round to nearest integer	-0.124939
-0.314202	very fast. Simple integer	-0.124939
-0.537977	Common subexpression elimin., integer	-0.124939
-0.237462	on CodeGear compiler) integer	-0.124939
-0.237462	in registers (6 integer	-0.124939
-0.237462	array bounds violation, integer	-0.124939
-0.237462	around, (3) trap integer	-0.124939
-0.586218	predicted. This is no	-0.124939
-1.225357	the object is no	-0.124939
-0.104934	and there is no	-0.124939
-0.430964	that there is no	-0.124939
-0.301426	if there is no	-0.249877
-0.243583	- there is no	-0.124939
-0.104934	when there is no	-0.124939
-0.237227	then there is no	-0.124939
-0.243583	because there is no	-0.124939
-0.350946	but there is no	-0.124939
-0.243583	where there is no	-0.124939
-0.200333	cases, there is no	-0.124939
-0.243583	course there is no	-0.124939
-0.243583	general, there is no	-0.124939
-0.243583	unfortunately there is no	-0.124939
-0.243583	enabled there is no	-0.124939
-0.148178	double There is no	-0.124939
-0.148178	functions. There is no	-0.124939
-0.148178	systems. There is no	-0.124939
-0.148178	processors. There is no	-0.124939
-0.148178	called. There is no	-0.124939
-0.148178	object. There is no	-0.124939
-0.148178	way. There is no	-0.124939
-0.148178	references. There is no	-0.124939
-0.148178	returns. There is no	-0.124939
-0.148178	allocation. There is no	-0.124939
-0.148178	parameter. There is no	-0.124939
-0.148178	value. There is no	-0.124939
-0.148178	parameters. There is no	-0.124939
-0.148178	bits. There is no	-0.124939
-0.148178	automatically. There is no	-0.124939
-0.148178	execution. There is no	-0.124939
-0.148178	screen. There is no	-0.124939
-0.148178	created. There is no	-0.124939
-0.148178	43). There is no	-0.124939
-0.148178	87). There is no	-0.124939
-0.148178	x4∙xn-4. There is no	-0.124939
-0.148178	Namespaces There is no	-0.124939
-0.148178	returned. There is no	-0.124939
-0.148178	.so). There is no	-0.124939
-0.343891	fastest execution is no	-0.124939
-0.343891	same priority is no	-0.124939
-0.357356	be 8 and no	-0.124939
-0.881175	repeat count and no	-0.124939
-0.656726	copy constructor and no	-0.124939
-0.502819	n additions and no	-0.124939
-0.884750	make sure that no	-0.124939
-1.142375	makes sure that no	-0.124939
-0.460684	to recommend that no	-0.124939
-1.185391	there may be no	-0.124939
-0.592983	There will be no	-0.124939
-0.588769	spaces that are no	-0.124939
-0.321570	and there are no	-0.124939
-0.931955	that there are no	-0.124939
-1.108046	if there are no	-0.124939
-0.760230	If there are no	-0.124939
-0.564344	enabled. There are no	-0.124939
-0.564344	security. There are no	-0.124939
-1.132428	when they are no	-0.124939
-0.355253	with few or no	-0.124939
-0.142272	with little or no	-0.124939
-0.142272	have little or no	-0.124939
-0.355253	signed number, or no	-0.124939
-0.501112	dead code if no	-0.124939
-0.356134	any objects if no	-0.124939
-0.571262	handler, even if no	-0.124939
-0.358536	is inlined - no	-0.124939
-0.533990	certain to have no	-0.124939
-0.498817	it can have no	-0.124939
-0.427252	these functions have no	-0.124939
-0.427252	string functions have no	-0.124939
-0.663360	if elements have no	-0.124939
-0.506884	performance. I have no	-0.124939
-0.506884	arrays. I have no	-0.124939
-0.427896	should preferably have no	-0.124939
-0.427896	Smaller microprocessors have no	-0.124939
-0.663360	the operands have no	-0.124939
-0.026508	Smaller microcontrollers have no	-0.124939
-0.330599	imple- mentations have no	-0.124939
-0.355823	not necessary when no	-0.124939
-0.355823	requirement. Useful when no	-0.124939
-0.515656	function that has no	-0.124939
-0.768229	the code has no	-0.124939
-0.998701	the compiler has no	-0.124939
-0.758528	instruction set has no	-0.124939
-0.533090	the library has no	-0.124939
-0.101954	the object has no	-0.124939
-0.331580	member functions) has no	-0.124939
-0.331580	stack. Deallocation has no	-0.124939
-0.331580	-fno-pic apparently has no	-0.124939
-0.350672	point overflow but no	-0.124939
-0.350672	of expressions, but no	-0.124939
-0.350672	static if), but no	-0.124939
-0.333308	the code takes no	-0.124939
-0.333308	structure object takes no	-0.124939
-0.431286	conversion often takes no	-0.124939
-0.333308	exception handling takes no	-0.124939
-0.490043	This conversion takes no	-0.124939
-0.542848	variable, it makes no	-0.124939
-0.349467	or #define makes no	-0.124939
-0.340542	long double take no	-0.124939
-0.340542	precision calculations take no	-0.124939
-0.340542	different precisions take no	-0.124939
-0.355907	the hint about no	-0.124939
-0.754525	you will get no	-0.124939
-0.820778	a program contains no	-0.124939
-0.338777	code section contains no	-0.124939
-0.570626	there is simply no	-0.124939
-0.495653	program. This requires no	-0.124939
-0.351895	Instruction set control no	-0.124939
-0.751546	compiler to assume no	-0.124939
-0.349587	output can produce no	-0.124939
-0.311705	/GR- -fno-rtti Assume no	-0.124939
-0.311705	page 78. Assume no	-0.124939
-0.339265	type conversion generates no	-0.124939
-0.314628	option for assuming no	-0.124939
-0.314533	-O3 or (requires no	-0.124939
-0.102801	option for "assume no	-0.124939
-0.102801	compiler option "assume no	-0.124939
-0.237739	there is virtually no	-0.124939
-0.358828	data storage and page	-0.124939
-0.317678	The example on page	-0.124939
-0.317678	container classes on page	-0.124939
-0.012701	is explained on page	-0.124939
-0.012701	are explained on page	-0.124939
-0.001044	as explained on page	-0.124939
-0.003141	reasons explained on page	-0.124939
-0.317678	The examples on page	-0.124939
-0.317678	methods described on page	-0.124939
-0.317678	are given on page	-0.124939
-0.130417	is discussed on page	-0.124939
-0.130417	as discussed on page	-0.124939
-0.317678	explained below on page	-0.124939
-0.317678	are listed on page	-0.124939
-0.317678	in detail on page	-0.124939
-0.317678	provided below, on page	-0.124939
-0.317678	Example 7.43 on page	-0.124939
-0.581022	by the memory page	-0.124939
-0.358242	as explained at page	-0.124939
-0.357527	www.intel.com. (See also page	-0.124939
-0.077984	syntax or See page	-0.124939
-0.077984	member function. See page	-0.124939
-0.077984	than functions. See page	-0.124939
-0.129664	of memory. See page	-0.124939
-0.077984	main program. See page	-0.124939
-0.018223	are used. See page	-0.124939
-0.077984	VIA processors. See page	-0.124939
-0.077984	or 1. See page	-0.124939
-0.077984	if possible. See page	-0.124939
-0.077984	inefficient way. See page	-0.124939
-0.077984	Intel CPU. See page	-0.124939
-0.077984	or not. See page	-0.124939
-0.077984	of order. See page	-0.124939
-0.077984	memory allocation. See page	-0.124939
-0.077984	if available. See page	-0.124939
-0.077984	integer expressions. See page	-0.124939
-0.077984	of storage. See page	-0.124939
-0.018223	operating system. See page	-0.124939
-0.077984	interprocedural optimizations. See page	-0.124939
-0.077984	assembly language. See page	-0.124939
-0.077984	avoid this. See page	-0.124939
-0.077984	not overlap. See page	-0.124939
-0.077984	disk files. See page	-0.124939
-0.077984	do so. See page	-0.124939
-0.077984	five manuals. See page	-0.124939
-0.018223	CPU dispatcher. See page	-0.425969
-0.077984	not cached. See page	-0.124939
-0.077984	pointer aliasing. See page	-0.124939
-0.077984	less compact. See page	-0.124939
-0.077984	such errors. See page	-0.124939
-0.077984	part takes. See page	-0.124939
-0.077984	from exceptions. See page	-0.124939
-0.077984	not occur. See page	-0.124939
-0.018223	each other. See page	-0.124939
-0.077984	optimally aligned. See page	-0.124939
-0.077984	is required. See page	-0.124939
-0.077984	prediction mechanism. See page	-0.124939
-0.077984	long delay. See page	-0.124939
-0.077984	STL containers. See page	-0.124939
-0.077984	cache contentions. See page	-0.124939
-0.077984	will crash. See page	-0.124939
-0.077984	is incremented. See page	-0.124939
-0.077984	is requested. See page	-0.124939
-0.077984	Mac OS. See page	-0.124939
-0.077984	code motion. See page	-0.124939
-0.077984	identification (RTTI). See page	-0.124939
-0.077984	| 0x8040); See page	-0.124939
-0.068979	points to (see page	-0.124939
-0.068979	the compiler (see page	-0.124939
-0.068979	preceding one (see page	-0.124939
-0.119534	data cache (see page	-0.124939
-0.068979	derived class (see page	-0.124939
-0.068979	and double (see page	-0.124939
-0.068979	smart pointer (see page	-0.124939
-0.068979	less efficient (see page	-0.124939
-0.033121	in registers (see page	-0.124939
-0.033121	XMM registers (see page	-0.124939
-0.068979	operating system (see page	-0.124939
-0.068979	vector instructions (see page	-0.124939
-0.068979	non-Intel processors (see page	-0.124939
-0.068979	a constant (see page	-0.124939
-0.068979	where necessary (see page	-0.124939
-0.068979	unsigned integers (see page	-0.124939
-0.068979	point precision (see page	-0.124939
-0.068979	linked list (see page	-0.124939
-0.068979	or 1 (see page	-0.124939
-0.068979	point expressions (see page	-0.124939
-0.068979	of range (see page	-0.124939
-0.068979	as intended (see page	-0.124939
-0.016245	automatic vectorization (see page	-0.124939
-0.068979	Bounds checking (see page	-0.124939
-0.068979	quite time-consuming (see page	-0.124939
-0.033121	pointer aliasing (see page	-0.124939
-0.033121	out aliasing (see page	-0.124939
-0.068979	a profiling (see page	-0.124939
-0.068979	out-of-order capabilities (see page	-0.124939
-0.068979	of mispredictions (see page	-0.124939
-0.068979	be profitable (see page	-0.124939
-0.068979	automatic CPU-dispatching (see page	-0.124939
-0.068979	the devirtualization (see page	-0.124939
-0.332895	vector operations, see page	-0.124939
-0.332895	XMM registers; see page	-0.124939
-0.351616	See chapter 10 page	-0.124939
-0.079688	of 2 (See page	-0.124939
-0.079688	by 2 (See page	-0.124939
-0.177359	64-bit Windows (See page	-0.124939
-0.177359	different CPUs. (See page	-0.124939
-0.177359	across modules (See page	-0.124939
-0.177359	by 2. (See page	-0.124939
-0.434928	stride (see above, page	-0.124939
-0.771085	in example 13.1 page	-0.124939
-0.294163	in example 14.23 page	-0.124939
-0.237861	in example 7.35 page	-0.124939
-0.600890	lines in the set	-0.124939
-0.526750	of f is set	-0.124939
-0.358563	or Friday is set	-0.124939
-1.385332	to use a set	-0.124939
-0.553336	operations use a set	-0.124939
-1.054463	way is to set	-0.124939
-1.325484	You have to set	-0.124939
-0.873092	fastest way to set	-0.124939
-0.588504	strongly recommended to set	-0.124939
-0.539079	all belong to set	-0.124939
-0.461371	it attempts to set	-0.124939
-0.917169	cache lines in set	-0.124939
-1.398906	which can be set	-0.124939
-0.888748	limit can be set	-0.124939
-0.463331	certain options are set	-0.124939
-1.398561	then you can set	-0.124939
-0.724032	test tool can set	-0.124939
-0.719011	|= 0x80000000; // set	-0.124939
-0.203046	a[size], b[size]; // set	-0.425969
-0.355581	&= 0x7FFFFFFF; // set	-0.124939
-0.358176	cache lines from set	-0.124939
-0.591705	divide the data set	-0.124939
-0.561585	addresses with different set	-0.124939
-1.535016	to the same set	-0.124939
-0.248432	to the instruction set	-0.124939
-0.136057	in the instruction set	-0.124939
-0.062722	for the instruction set	-0.124939
-0.142868	depending on instruction set	-0.124939
-0.042605	efficient. This instruction set	-0.124939
-0.042605	set. This instruction set	-0.124939
-0.042605	view. This instruction set	-0.124939
-0.203483	has an instruction set	-0.124939
-0.124585	for this instruction set	-0.124939
-0.124585	support this instruction set	-0.124939
-0.124585	checks which instruction set	-0.124939
-0.124585	detect which instruction set	-0.124939
-0.142868	the 64-bit instruction set	-0.124939
-0.142868	best possible instruction set	-0.124939
-0.142868	64 bit instruction set	-0.124939
-0.142868	a new instruction set	-0.124939
-0.084064	the SSE2 instruction set	-0.539912
-0.024401	The SSE2 instruction set	-0.602060
-0.077747	145 SSE2 instruction set	-0.124939
-0.077747	-msse SSE2 instruction set	-0.124939
-0.142868	AVX 32 instruction set	-0.124939
-0.071035	the AVX instruction set	-0.425969
-0.063040	The AVX instruction set	-0.124939
-0.030377	12.1 AVX instruction set	-0.425969
-0.232490	minimum supported instruction set	-0.124939
-0.232490	Detect supported instruction set	-0.124939
-0.300234	a particular instruction set	-0.124939
-0.066644	or later instruction set	-0.522879
-0.317133	a higher instruction set	-0.124939
-0.216954	next higher instruction set	-0.124939
-0.065585	the AVX2 instruction set	-0.124939
-0.065585	The AVX2 instruction set	-0.124939
-0.020780	the x86 instruction set	-0.124939
-0.042605	bit x86 instruction set	-0.124939
-0.142868	the appropriate instruction set	-0.124939
-0.300234	the desired instruction set	-0.124939
-0.142868	sections SSE instruction set	-0.124939
-0.031556	the SSE4.1 instruction set	-0.425969
-0.203483	a newer instruction set	-0.124939
-0.142868	a low instruction set	-0.124939
-0.031556	the latest instruction set	-0.124939
-0.142868	-msse2 SSE3 instruction set	-0.124939
-0.328209	the newest instruction set	-0.124939
-0.142868	the AVX512 instruction set	-0.124939
-0.089931	the CISC instruction set	-0.124939
-0.042642	The CISC instruction set	-0.124939
-0.142868	the highest instruction set	-0.124939
-0.142868	the x86-64 instruction set	-0.124939
-0.042605	the AVX-512 instruction set	-0.124939
-0.020780	12.2 AVX-512 instruction set	-0.425969
-0.142868	Error: lowest instruction set	-0.124939
-0.142868	number (the instruction set	-0.124939
-0.142868	(or later) instruction set	-0.124939
-0.142868	difference. Newest instruction set	-0.124939
-0.142868	Pentium Pro instruction set	-0.124939
-0.357984	can calculate which set	-0.124939
-0.724608	most commonly used set	-0.124939
-0.540353	(there is one set	-0.124939
-1.245367	instance for each set	-0.124939
-0.357715	have its pointer set	-0.124939
-0.357002	The debugger cannot set	-0.124939
-1.206099	for a particular set	-0.124939
-0.960977	has its own set	-0.124939
-0.081848	vector, bits Instruction set	-0.124939
-0.081848	table element Instruction set	-0.124939
-0.081848	function name Instruction set	-0.124939
-0.081848	Mac, BSD Instruction set	-0.124939
-0.019062	as follows: Instruction set	-0.124939
-0.081848	fprintf(stderr, "\nError: Instruction set	-0.124939
-0.504707	on a typical set	-0.124939
-0.343578	may, in addition, set	-0.124939
-0.716059	with a suitable set	-0.124939
-0.331667	such as list, set	-0.124939
-0.081860	with a realistic set	-0.425969
-0.065764	set AVX instr. set	-0.124939
-0.065764	set SSE4.1 instr. set	-0.124939
-0.065764	Suppl. SSE3 instr. set	-0.124939
-1.179742	object of the class	-0.124939
-1.276652	member of the class	-0.124939
-1.261128	instance of the class	-0.124939
-1.071168	reference to the class	-0.124939
-1.071432	appear in the class	-0.124939
-0.597450	register for the class	-0.124939
-0.598114	functions if the class	-0.124939
-0.594546	members. If the class	-0.124939
-0.645962	information about the class	-0.124939
-0.586522	body inside the class	-0.124939
-0.584563	template is a class	-0.124939
-0.584563	T is a class	-0.124939
-1.099243	object of a class	-0.124939
-0.790581	value of a class	-0.124939
-0.919743	member of a class	-0.124939
-0.235837	members of a class	-0.124939
-0.572557	parameters to a class	-0.124939
-1.170225	applied to a class	-0.124939
-1.144345	variables in a class	-0.124939
-0.872175	declared in a class	-0.124939
-0.584369	should be a class	-0.124939
-0.190526	wrapped into a class	-0.124939
-0.362272	declared inside a class	-0.124939
-0.306733	defined inside a class	-0.124939
-1.341947	an object of class	-0.124939
-0.463463	object belongs to class	-0.124939
-0.503979	of structure and class	-0.124939
-0.358185	smaller. Structure and class	-0.124939
-0.463292	doesn't work for class	-0.124939
-0.574102	the function or class	-0.124939
-0.363222	same function or class	-0.124939
-0.363222	each function or class	-0.124939
-0.363222	No function or class	-0.124939
-0.166973	the structure or class	-0.124939
-0.149668	a structure or class	-0.124939
-0.075511	of structure or class	-0.124939
-0.166973	This structure or class	-0.124939
-0.346444	}; 52 or class	-0.124939
-0.355493	into registers. A class	-0.124939
-0.355493	other constructors. A class	-0.124939
-0.482139	and the vector class	-0.124939
-0.482139	for the vector class	-0.124939
-0.462959	registers. The vector class	-0.124939
-0.328473	"vectorclass.h" // vector class	-0.124939
-0.511705	The Intel vector class	-0.124939
-0.328473	in my vector class	-0.124939
-0.328473	me. My vector class	-0.124939
-0.124792	classes Agner's vector class	-0.124939
-0.124792	107). Agner's vector class	-0.124939
-0.124792	-mveclibabi=acml. Agner's vector class	-0.124939
-0.124792	amd_vrd2_exp Agner's vector class	-0.124939
-0.982665	of the same class	-0.124939
-0.539809	with virtual functions class	-0.124939
-0.549024	information to all class	-0.124939
-0.787208	pointer to one class	-0.124939
-0.445650	is a template class	-0.124939
-0.445650	that a template class	-0.124939
-0.499255	polymorphism A template class	-0.124939
-0.330896	the above template class	-0.124939
-0.579043	of a simple class	-0.124939
-0.172348	1; } }; class	-0.124939
-0.594961	void f(); }; class	-0.124939
-0.544282	into a container class	-0.124939
-0.386609	defining a container class	-0.124939
-0.282238	class. The container class	-0.124939
-0.282238	and other container class	-0.124939
-0.433084	more efficient container class	-0.124939
-0.282238	Ready made container class	-0.124939
-0.496703	doesn't know what class	-0.124939
-0.062813	of the child class	-0.124939
-0.062813	for the child class	-0.124939
-0.357519	parent and child class	-0.124939
-0.195906	class. The child class	-0.124939
-0.120356	of its child class	-0.124939
-0.120356	about its child class	-0.124939
-0.136271	the correct child class	-0.124939
-0.347211	of suitable containers class	-0.124939
-0.414445	The first generation class	-0.124939
-0.077433	The second generation class	-0.124939
-0.219083	the third generation class	-0.124939
-0.248515	Table 12.4. Vector class	-0.124939
-0.248515	updated lately. Vector class	-0.124939
-0.248515	Example 12.7. Vector class	-0.124939
-0.445846	that the declaration class	-0.124939
-0.217785	of the derived class	-0.124939
-0.136297	and the derived class	-0.124939
-0.037145	of a derived class	-0.124939
-0.077767	and a derived class	-0.124939
-0.115854	class and derived class	-0.124939
-0.193241	of the parent class	-0.124939
-0.040857	of a parent class	-0.124939
-0.490705	functions of parent class	-0.124939
-0.721469	of a polymorphic class	-0.124939
-0.434505	to a base class	-0.124939
-0.343388	to multiple inheritance class	-0.124939
-0.237670	7.38a. Multiple inheritance class	-0.124939
-0.056960	unsigned int N> class	-0.124939
-0.056960	IsPowerOf2, int N> class	-0.124939
-0.271311	template <int N> class	-0.124939
-0.314358	class B2; 54 class	-0.124939
-0.314358	objects Conversions involving class	-0.124939
-0.314358	// Example 7.28 class	-0.124939
-0.314358	Example 8.19. Devirtualization class	-0.124939
-0.314511	// Example 7.14 class	-0.124939
-0.023499	inheritance class B1; class	-0.425969
-0.538237	of the object's class	-0.124939
-0.293857	the grandparent class: class	-0.124939
-0.293857	"Hello "; Disp(); class	-0.124939
-0.102747	be true. template<> class	-0.124939
-0.102747	the recursion template<> class	-0.124939
-0.293857	go undetected. Converting class	-0.124939
-0.538237	B1; class B2; class	-0.124939
-0.237592	multiple // versions: class	-0.124939
-0.237592	template <typename MyChild> class	-0.124939
-0.237592	// Example 7.44 class	-0.124939
-0.237592	// Example 7.37 class	-0.124939
-0.237592	// Example 7.36 class	-0.124939
-0.237592	// Example 7.41a class	-0.124939
-1.748911	value of the floating	-0.124939
-1.865783	parts of the floating	-0.124939
-1.186203	fact that the floating	-0.124939
-1.058896	determined by the floating	-0.124939
-0.363987	time when the floating	-0.425969
-0.562582	inefficient when the floating	-0.124939
-0.524471	truncation so the floating	-0.124939
-0.588381	long before the floating	-0.124939
-0.564059	might store the floating	-0.124939
-0.538311	precision. When the floating	-0.124939
-0.502342	double reflects the floating	-0.124939
-0.357014	operations in-between the floating	-0.124939
-1.122502	when b is floating	-0.124939
-0.848244	sign of a floating	-0.124939
-0.848244	Conversion of a floating	-0.124939
-0.573772	latency of a floating	-0.124939
-0.592421	223 to a floating	-0.124939
-0.580360	addition, and a floating	-0.124939
-0.582552	check if a floating	-0.124939
-1.041465	faster than a floating	-0.124939
-0.578946	chain. If a floating	-0.124939
-0.909115	to do a floating	-0.124939
-0.982656	to access a floating	-0.124939
-0.458640	loop needs a floating	-0.124939
-0.354996	integer addition, a floating	-0.124939
-0.354996	function rounds a floating	-0.124939
-0.582983	any use of floating	-0.124939
-1.057408	maximum number of floating	-0.124939
-1.325338	the order of floating	-0.124939
-0.524176	rare cases of floating	-0.124939
-0.342122	different types of floating	-0.425969
-0.356732	two types of floating	-0.124939
-0.536893	The range of floating	-0.124939
-0.356080	algebraic manipulations of floating	-0.124939
-0.159437	from integer to floating	-0.124939
-0.313022	of integers to floating	-0.124939
-0.442033	unsigned integers to floating	-0.124939
-0.460389	before conversion to floating	-0.124939
-0.800024	not apply to floating	-0.124939
-0.356373	%. Conversion to floating	-0.124939
-0.460389	before converting to floating	-0.124939
-0.524047	mix integer and floating	-0.124939
-0.064007	between integers and floating	-0.602060
-0.356726	String constants and floating	-0.124939
-0.526375	integer calculations in floating	-0.124939
-0.526900	precision, especially in floating	-0.124939
-0.570029	hardware functions. The floating	-0.124939
-0.929425	induction variables for floating	-0.124939
-0.459906	XMM registers for floating	-0.124939
-0.355993	Enable exception for floating	-0.124939
-0.355993	of accumulators for floating	-0.124939
-0.355993	static keyword, for floating	-0.124939
-0.582928	makers assume that floating	-0.124939
-1.052075	If there are floating	-0.124939
-0.462923	are integers or floating	-0.124939
-0.653280	above example with floating	-0.124939
-0.459435	point addition with floating	-0.124939
-0.719108	are incompatible with floating	-0.124939
-0.556253	expressions than on floating	-0.124939
-0.128963	algebraic reductions on floating	-0.425969
-0.358337	intermediate results as floating	-0.124939
-0.877226	are faster than floating	-0.124939
-0.654854	integer expressions than floating	-0.124939
-0.574496	systems that have floating	-0.124939
-0.724571	memory space. A floating	-0.124939
-0.451837	faster than from floating	-0.124939
-0.492056	A conversion from floating	-0.124939
-0.492056	all conversions from floating	-0.124939
-0.451837	integers Conversion from floating	-0.124939
-0.594471	easy to make floating	-0.124939
-0.309956	compiler cannot make floating	-0.124939
-0.309956	Compilers cannot make floating	-0.124939
-0.357857	for mixing different floating	-0.124939
-0.571875	specifies that all floating	-0.124939
-0.868380	have only one floating	-0.124939
-0.357615	loop. If each floating	-0.124939
-0.126710	one or two floating	-0.425969
-0.346600	would require two floating	-0.124939
-0.436925	and before any floating	-0.124939
-0.309221	instruction before any floating	-0.124939
-0.150600	14.8 Conversions between floating	-0.425969
-0.771268	because it makes floating	-0.124939
-0.063606	instruction set makes floating	-0.425969
-0.460757	integer and 8 floating	-0.124939
-0.872903	start a new floating	-0.124939
-0.063969	have difficulties making floating	-0.425969
-0.308083	code that does floating	-0.124939
-0.308083	loop that does floating	-0.124939
-0.552155	requires a big floating	-0.124939
-0.323312	There are eight floating	-0.124939
-0.418797	49 first eight floating	-0.124939
-0.591346	operations involves eight floating	-0.124939
-0.354538	a loop contains floating	-0.124939
-0.147407	method of doing floating	-0.425969
-0.353880	to enable fast floating	-0.124939
-0.351366	Applications that generate floating	-0.124939
-0.350400	compare two positive floating	-0.124939
-0.347781	there are 100 floating	-0.124939
-0.347927	FDIV bug causes floating	-0.124939
-0.784879	advantageous to mix floating	-0.124939
-0.346910	execution units. Any floating	-0.124939
-0.551616	loading the entire floating	-0.124939
-0.343155	with x87 style floating	-0.124939
-0.341290	includes static variables, floating	-0.124939
-0.331475	requirements for strict floating	-0.124939
-0.237585	multiply a nonzero floating	-0.124939
-0.237585	values of nonzero floating	-0.124939
-0.331353	it allows larger floating	-0.124939
-0.420718	making an additional floating	-0.124939
-0.023491	operations for manipulating floating	-0.425969
-0.382248	} // Catch floating	-0.124939
-0.538026	misses, branch mispredictions, floating	-0.124939
-0.237487	allows less precise floating	-0.124939
-0.237487	/Oa -fno-alias Non-strict floating	-0.124939
-0.237487	have no native floating	-0.124939
-0.237487	occurred. // Reset floating	-0.124939
-0.237487	on, including relaxed floating	-0.124939
-0.237487	set to relax floating	-0.124939
-0.237487	integer vectors FMA3 floating	-0.124939
-1.025210	the size of each	-0.124939
-0.944086	The size of each	-0.124939
-0.467207	maximum size of each	-0.124939
-0.467207	total size of each	-0.124939
-1.122901	the address of each	-0.124939
-0.570876	intermediate result of each	-0.124939
-1.082190	the speed of each	-0.124939
-0.570072	above advantages of each	-0.124939
-1.263562	the length of each	-0.124939
-0.353134	and destructors of each	-0.124939
-0.648354	time consumption of each	-0.124939
-0.648354	(in bytes) of each	-0.124939
-0.089114	function Size of each	-0.124939
-0.089114	elements Size of each	-0.124939
-0.089114	classes. Size of each	-0.124939
-0.353134	the logarithm of each	-0.124939
-0.106847	Add 2 to each	-0.425969
-0.357079	slices allocated to each	-0.124939
-0.357079	new features to each	-0.124939
-0.538997	that belong to each	-0.124939
-0.357079	are unrelated to each	-0.124939
-0.358163	are CPU-specific and each	-0.124939
-0.358163	= x∙xn-1, and each	-0.124939
-1.395150	of elements in each	-0.124939
-0.759873	two threads in each	-0.124939
-0.908540	cache lines in each	-0.124939
-0.460475	of counters in each	-0.124939
-0.460475	interrupt occurs in each	-0.124939
-0.356441	right formula in each	-0.124939
-0.541176	template function for each	-0.124939
-0.834398	to use for each	-0.124939
-0.241612	be different for each	-0.124939
-0.241612	are different for each	-0.124939
-0.522199	the file for each	-0.124939
-0.343220	residual error for each	-0.124939
-0.483184	one container for each	-0.124939
-0.343220	is optimal for each	-0.124939
-0.343220	be separate for each	-0.124939
-0.443743	small block for each	-0.124939
-0.483184	different name for each	-0.124939
-0.443743	Time difference for each	-0.124939
-0.009595	one instance for each	-0.221849
-0.343220	separate containers for each	-0.124939
-0.138552	only once for each	-0.124939
-0.138552	Compile once for each	-0.124939
-0.343220	some changes for each	-0.124939
-0.536582	often waiting for each	-0.124939
-0.343220	and bc for each	-0.124939
-0.343220	heap manager for each	-0.124939
-0.343220	function prototypes for each	-0.124939
-0.355114	of CPU that each	-0.124939
-0.827309	code so that each	-0.124939
-0.562519	macro so that each	-0.124939
-0.355114	into account that each	-0.124939
-0.502714	the sense that each	-0.124939
-1.219257	a structure or each	-0.124939
-0.760503	means that if each	-0.124939
-1.372388	For example, if each	-0.124939
-0.358482	exclusive access by each	-0.124939
-0.357053	allocated objects with each	-0.124939
-0.357053	be stored with each	-0.124939
-0.574748	loaded rather than each	-0.124939
-0.574748	once, rather than each	-0.124939
-0.504061	The threads have each	-0.124939
-0.939415	how much time each	-0.124939
-0.653069	is available then each	-0.124939
-0.355516	dependency chains then each	-0.124939
-0.503822	values far from each	-0.124939
-0.526023	loaded from memory each	-0.124939
-0.652462	extra code at each	-0.124939
-0.355210	row addresses at each	-0.124939
-0.536318	a time because each	-0.124939
-0.354874	is serial because each	-0.124939
-0.357994	the loop. If each	-0.124939
-0.357620	in fact using each	-0.124939
-0.503875	of work into each	-0.124939
-0.611417	a loop where each	-0.124939
-0.334024	a sequence where each	-0.124939
-0.334024	dependency chain where each	-0.124939
-0.334024	of calculations, where each	-0.124939
-0.334024	variable __intel_cpu_feature_indicator where each	-0.124939
-0.594062	calculating the value each	-0.124939
-0.356987	the cache between each	-0.124939
-0.356796	50% or less each	-0.124939
-0.577340	practice to test each	-0.124939
-0.703360	number of times each	-0.124939
-0.391626	how many times each	-0.124939
-0.355532	data members. But each	-0.124939
-0.572150	have to calculate each	-0.124939
-0.518781	it can calculate each	-0.124939
-0.576884	than to store each	-0.124939
-0.353376	// get next each	-0.124939
-0.936380	before and after each	-0.124939
-0.328167	context switches after each	-0.124939
-0.519740	possible to give each	-0.124939
-0.541117	sum; } Here, each	-0.124939
-0.350943	the same function, each	-0.124939
-0.487061	searching for updates each	-0.124939
-0.345121	multiple statements within each	-0.124939
-0.081807	are used near each	-0.124939
-0.019053	are stored near each	-0.124939
-0.006259	also stored near each	-0.301030
-0.081807	are called near each	-0.124939
-0.081807	code together near each	-0.124939
-0.335833	} By giving each	-0.124939
-0.102766	zero); // AND each	-0.124939
-0.102766	110 // AND each	-0.124939
-0.331548	of CPU development, each	-0.124939
-0.331548	quadratic matrix, i.e. each	-0.124939
-0.314485	of branch. After each	-0.124939
-0.713846	On the contrary, each	-0.124939
-0.314485	rather than moving each	-0.124939
-0.443779	modules if necessary, each	-0.124939
-0.293820	threads will invalidate each	-0.124939
-0.382350	than to draw each	-0.124939
-0.023496	c); // Compare each	-0.425969
-0.237560	in multiple versions, each	-0.124939
-0.237560	and underflow neutralize each	-0.124939
-0.551133	optimized is to do	-0.124939
-0.551133	limited is to do	-0.124939
-0.618152	the compiler to do	-0.124939
-0.409877	particular compiler to do	-0.124939
-0.827894	may have to do	-0.124939
-0.573187	you have to do	-0.124939
-0.893481	don't have to do	-0.124939
-0.716894	is possible to do	-0.124939
-0.515968	be possible to do	-0.124939
-0.869828	it possible to do	-0.124939
-0.748910	not possible to do	-0.124939
-1.088948	it takes to do	-0.124939
-1.120316	of how to do	-0.124939
-0.536162	specifies how to do	-0.124939
-1.080083	you need to do	-0.124939
-1.164641	is important to do	-0.124939
-0.731963	very important to do	-0.124939
-0.439883	heavy work to do	-0.124939
-0.819044	is necessary to do	-0.124939
-0.599135	often necessary to do	-0.124939
-0.300053	therefore necessary to do	-0.124939
-0.439883	is good to do	-0.124939
-1.160095	is advantageous to do	-0.124939
-1.110522	be advantageous to do	-0.124939
-0.597680	is able to do	-0.124939
-0.610537	be able to do	-0.124939
-0.516697	are able to do	-0.124939
-0.531356	not able to do	-0.124939
-0.531356	were able to do	-0.124939
-0.781833	clock cycles to do	-0.124939
-0.518499	not optimal to do	-0.124939
-0.623102	be better to do	-0.124939
-0.894158	various ways to do	-0.124939
-0.594604	three ways to do	-0.124939
-0.478958	more safe to do	-0.124939
-0.439883	first thing to do	-0.124939
-0.499899	may try to do	-0.124939
-0.623102	be obvious to do	-0.124939
-0.506619	therefore safer to do	-0.124939
-0.075310	deallocated. Failure to do	-0.425969
-0.166478	flow. Failure to do	-0.124939
-0.340154	tested seem to do	-0.124939
-0.340154	may decide to do	-0.124939
-0.718040	operating systems that do	-0.124939
-0.355160	old microprocessors that do	-0.124939
-0.142243	other languages that do	-0.124939
-0.142243	programming languages that do	-0.124939
-0.355160	soft cores that do	-0.124939
-0.355160	powerful facilities that do	-0.124939
-0.575911	compiler that can do	-0.124939
-0.848908	then it can do	-0.124939
-1.367528	the compiler can do	-0.124939
-0.737075	and you can do	-0.124939
-0.481627	that you can do	-0.124939
-0.456098	compilers you can do	-0.124939
-0.170739	things you can do	-0.124939
-0.439372	the CPU can do	-0.124939
-0.737550	Most compilers can do	-0.124939
-0.456347	Modern compilers can do	-0.124939
-1.018879	that we can do	-0.124939
-0.499313	Modern CPUs can do	-0.124939
-0.339748	the processor can do	-0.124939
-0.343535	one thread can do	-0.124939
-0.343535	third thread can do	-0.124939
-0.439372	the programmer can do	-0.124939
-0.339748	Modern microprocessors can do	-0.124939
-0.339748	simple algorithm can do	-0.124939
-0.339748	the preprocessor can do	-0.124939
-0.595484	5) { // do	-0.124939
-0.357418	of order or do	-0.124939
-0.357418	a command or do	-0.124939
-0.175070	compilers will not do	-0.124939
-0.878458	code. If you do	-0.124939
-0.574961	which compiler will do	-0.124939
-0.514501	most compilers will do	-0.124939
-0.514501	future compilers will do	-0.124939
-1.064418	make the program do	-0.124939
-0.357920	service routine should do	-0.124939
-0.497346	However, most compilers do	-0.124939
-0.353433	reason why compilers do	-0.124939
-0.445772	memory and we do	-0.124939
-0.561480	large that we do	-0.124939
-0.445772	applications. But we do	-0.124939
-1.089897	Floating point variables do	-0.124939
-0.451588	the compilers cannot do	-0.124939
-0.349432	and therefore cannot do	-0.124939
-0.854450	Intel function libraries do	-0.124939
-0.449830	the Intel libraries do	-0.124939
-0.501658	explicitly that pointers do	-0.124939
-0.564515	but 32-bit systems do	-0.124939
-0.554506	operators because they do	-0.124939
-0.559790	these integer operations do	-0.124939
-0.537474	then you must do	-0.124939
-0.576550	would be able do	-0.124939
-0.455039	that these directives do	-0.124939
-0.351798	The bigger vectors do	-0.124939
-0.497347	as when contentions do	-0.124939
-0.453096	because relative references do	-0.124939
-0.514347	pointer. These conversions do	-0.124939
-0.448886	other STL containers do	-0.124939
-0.446019	example, many programmers do	-0.124939
-0.331606	now overlap. Compilers do	-0.124939
-0.325102	since 2004. Can do	-0.124939
-0.069261	their live ranges do	-0.602060
-0.407814	128-bit vector register, do	-0.124939
-0.237690	I have studied do	-0.124939
-0.237690	if their live-ranges do	-0.124939
-0.237690	uses (live ranges) do	-0.124939
-0.584797	code, as the example	-0.124939
-1.399473	look at the example	-0.124939
-0.853413	the way of example	-0.124939
-0.462676	a collection of example	-0.124939
-0.358172	the transformation of example	-0.124939
-0.358843	implementation analogous to example	-0.124939
-0.425056	the code in example	-0.124939
-0.249147	The code in example	-0.124939
-0.146292	explicitly as in example	-0.124939
-0.146292	memory, as in example	-0.124939
-0.146292	principle as in example	-0.124939
-0.146292	counters, as in example	-0.124939
-0.146292	union, as in example	-0.124939
-0.146108	the loop in example	-0.124939
-0.071482	The loop in example	-0.124939
-0.157092	while loop in example	-0.124939
-0.157092	c loop in example	-0.124939
-0.511869	required, but in example	-0.124939
-0.570642	method used in example	-0.124939
-0.621688	and b in example	-0.124939
-0.807497	register variable in example	-0.124939
-0.345854	of 2 in example	-0.124939
-0.239601	by 2 in example	-0.124939
-0.339417	if branch in example	-0.124939
-0.438955	The method in example	-0.124939
-0.438955	not work in example	-0.124939
-1.238861	as explained in example	-0.124939
-0.339417	in list in example	-0.124939
-0.438955	the structure in example	-0.124939
-0.239601	The syntax in example	-0.124939
-0.239601	C++ syntax in example	-0.124939
-0.511869	is given in example	-0.124939
-0.339417	listed below in example	-0.124939
-0.339417	loop unrolling in example	-0.124939
-0.137361	of CriticalFunction in example	-0.124939
-0.137361	to CriticalFunction in example	-0.124939
-0.103482	is illustrated in example	-0.124939
-0.621688	as shown in example	-0.124939
-0.339417	| Friday) in example	-0.124939
-0.339417	The FactorialTable in example	-0.124939
-0.339417	AND-OR construction in example	-0.124939
-0.339417	the if-branch in example	-0.124939
-0.358729	a type. The example	-0.124939
-0.353527	some cases, for example	-0.124939
-0.353527	the program, for example	-0.124939
-0.353527	not suitable for example	-0.124939
-0.353527	a variable, for example	-0.124939
-0.353527	the loop, for example	-0.124939
-0.353527	of it, for example	-0.124939
-0.353527	certain events, for example	-0.124939
-0.353527	this interval, for example	-0.124939
-0.353527	in parts, for example	-0.124939
-0.355415	same code as example	-0.124939
-0.355415	// Same as example	-0.124939
-0.355415	template parameters, as example	-0.124939
-0.358515	Portability note: This example	-0.124939
-0.488231	80 for an example	-0.124939
-0.699157	89 for an example	-0.124939
-0.574233	provided as an example	-0.124939
-0.353044	58 shows an example	-0.124939
-0.881795	times faster than example	-0.124939
-0.376585	code in this example	-0.124939
-0.376585	data in this example	-0.124939
-0.376585	float in this example	-0.124939
-0.376585	statement in this example	-0.124939
-0.376585	MultiplyBy in this example	-0.124939
-0.376585	formulas in this example	-0.124939
-0.376585	1.2 in this example	-0.124939
-0.342966	am giving this example	-0.124939
-0.445046	// or from example	-0.124939
-0.087488	the code from example	-0.124939
-0.041545	assembly code from example	-0.124939
-0.484611	The conversion from example	-0.124939
-0.484611	to come from example	-0.124939
-0.572064	} The same example	-0.124939
-0.357737	a matrix using example	-0.124939
-0.334649	to double In example	-0.124939
-0.334649	XMM register. In example	-0.124939
-0.334649	by 16. In example	-0.124939
-0.334649	the application. In example	-0.124939
-0.334649	it explicitly. In example	-0.124939
-0.356976	Preprocessor directives. For example	-0.124939
-0.356425	reasons. Use these example	-0.124939
-0.122259	function. The following example	-0.124939
-0.122259	functions. The following example	-0.124939
-0.122259	efficient. The following example	-0.124939
-0.122259	set. The following example	-0.124939
-0.122259	processors. The following example	-0.124939
-0.122259	loop. The following example	-0.124939
-0.122259	2. The following example	-0.124939
-0.122259	not. The following example	-0.124939
-0.122259	vectors. The following example	-0.124939
-0.122259	again. The following example	-0.124939
-0.122259	www.agner.org/optimize/asmlib.zip. The following example	-0.124939
-0.122259	explanation. The following example	-0.124939
-0.122259	errors. The following example	-0.124939
-0.122259	compilation. The following example	-0.124939
-0.122259	multiplications. The following example	-0.124939
-0.122259	noticeable. The following example	-0.124939
-0.122259	Atom). The following example	-0.124939
-0.211769	is InstructionSet().The following example	-0.124939
-0.355669	is executed. An example	-0.124939
-0.354012	the compiler optimize example	-0.124939
-0.427463	in the above example	-0.124939
-0.392932	repeat the above example	-0.124939
-0.487548	register. The above example	-0.124939
-0.549290	metaprogramming. The next example	-0.124939
-0.353475	in situations like example	-0.124939
-0.785437	compiler to reduce example	-0.124939
-0.347330	tested can convert example	-0.124939
-0.172648	compiler will convert example	-0.124939
-0.345029	If we modify example	-0.124939
-0.294099	illustrates this. My example	-0.124939
-0.294099	an example. My example	-0.124939
-0.331779	are actually reducing example	-0.124939
-0.294015	that automatically reduces example	-0.124939
-0.237731	make the SelectAddMul example	-0.124939
-1.541935	one of the compilers	-0.124939
-0.375661	none of the compilers	-0.124939
-0.590772	None of the compilers	-0.124939
-0.588697	reductions that the compilers	-0.124939
-0.588697	emphasized that the compilers	-0.124939
-0.541582	on all the compilers	-0.124939
-0.789536	because all the compilers	-0.124939
-0.813973	may choose the compilers	-0.124939
-0.357352	which reductions the compilers	-0.124939
-0.357352	turned on, the compilers	-0.124939
-0.356115	the compiler. The compilers	-0.124939
-0.501086	losing precision. The compilers	-0.124939
-0.356115	a constant. The compilers	-0.124939
-0.356115	do so. The compilers	-0.124939
-0.356115	8. 71 The compilers	-0.124939
-1.088234	works only for compilers	-0.124939
-0.902231	that come with compilers	-0.124939
-0.557244	not work on compilers	-0.124939
-0.462580	are accessible from compilers	-0.124939
-0.561424	whether the different compilers	-0.124939
-0.847183	Comparison of different compilers	-0.124939
-0.798294	compiled with different compilers	-0.124939
-0.450475	files from different compilers	-0.124939
-0.503701	can use only compilers	-0.124939
-0.560819	expressions and other compilers	-0.124939
-0.416528	compiler with other compilers	-0.124939
-0.416528	compatible with other compilers	-0.124939
-0.341181	two. Some other compilers	-0.124939
-0.240535	overhead while other compilers	-0.124939
-0.240535	Func1, while other compilers	-0.124939
-0.357816	less. Fortunately, all compilers	-0.124939
-0.349044	brackets. However, most compilers	-0.124939
-0.349044	output. On most compilers	-0.124939
-0.349044	d.y; Fortunately, most compilers	-0.124939
-0.220129	Microsoft and Intel compilers	-0.124939
-0.220129	PathScale and Intel compilers	-0.124939
-0.321135	Clang and Intel compilers	-0.124939
-0.520110	mechanism in Intel compilers	-0.124939
-0.558791	127. The Intel compilers	-0.124939
-0.613408	Intel compiler Intel compilers	-0.124939
-0.357430	mode. Some 64-bit compilers	-0.124939
-0.470290	brands of C++ compilers	-0.124939
-0.280332	sense that C++ compilers	-0.124939
-0.110183	in different C++ compilers	-0.124939
-0.016530	for different C++ compilers	-0.903090
-0.110183	several different C++ compilers	-0.124939
-0.280332	on all C++ compilers	-0.124939
-0.280332	Furthermore, most C++ compilers	-0.124939
-0.280332	options All C++ compilers	-0.124939
-0.117929	efficient. Most C++ compilers	-0.124939
-0.117929	optimizations. Most C++ compilers	-0.124939
-0.365869	and Microsoft C++ compilers	-0.124939
-0.515203	notice that some compilers	-0.124939
-0.350690	2; Unfortunately, some compilers	-0.124939
-0.593744	purity. For example, compilers	-0.124939
-0.431830	study of how compilers	-0.124939
-0.431830	understanding of how compilers	-0.124939
-0.836680	All of these compilers	-0.124939
-0.346996	to tell these compilers	-0.124939
-0.704641	Intel and Gnu compilers	-0.124939
-0.344518	Microsoft or Gnu compilers	-0.124939
-0.210093	program optimization Some compilers	-0.124939
-0.210093	compile time. Some compilers	-0.124939
-0.092464	64-bit systems. Some compilers	-0.124939
-0.092464	operating systems. Some compilers	-0.124939
-0.210093	all compilers. Some compilers	-0.124939
-0.210093	Optimization directives Some compilers	-0.124939
-0.166338	the compiler. Some compilers	-0.124939
-0.166338	a compiler. Some compilers	-0.124939
-0.210093	Loop unrolling Some compilers	-0.124939
-0.210093	optimal order. Some compilers	-0.124939
-0.210093	option available. Some compilers	-0.124939
-0.210093	cache line. Some compilers	-0.124939
-0.210093	the division. Some compilers	-0.124939
-0.281737	than two. Some compilers	-0.124939
-0.210093	programming style. Some compilers	-0.124939
-0.210093	different places). Some compilers	-0.124939
-0.566614	available. The best compilers	-0.124939
-0.499978	stupid. Some common compilers	-0.124939
-0.341674	example, all good compilers	-0.124939
-0.502097	some very good compilers	-0.124939
-0.354543	table. Unfortunately, few compilers	-0.124939
-0.320025	variable because optimizing compilers	-0.124939
-0.320025	the best optimizing compilers	-0.124939
-0.320025	fast. All optimizing compilers	-0.124939
-0.180781	static memory. Most compilers	-0.124939
-0.180781	or cache. Most compilers	-0.124939
-0.180781	Thread-local storage Most compilers	-0.124939
-0.180781	Algebraic reductions Most compilers	-0.124939
-0.180781	the loop. Most compilers	-0.124939
-0.180781	simple variable. Most compilers	-0.124939
-0.180781	instruction sets. Most compilers	-0.124939
-0.180781	Algebraic reduction Most compilers	-0.124939
-0.180781	the executable. Most compilers	-0.124939
-0.180781	data compression Most compilers	-0.124939
-0.180781	sizeof(b)); 47 Most compilers	-0.124939
-0.352004	be slower. Many compilers	-0.124939
-0.347218	the processor). Optimizing compilers	-0.124939
-0.304143	this format. Other compilers	-0.124939
-0.304143	and Gnu). Other compilers	-0.124939
-0.135733	Intel and PathScale compilers	-0.425969
-0.920775	The reason why compilers	-0.124939
-0.293953	hope that future compilers	-0.124939
-0.293953	Object1.Hello(), though future compilers	-0.124939
-0.293910	code that current compilers	-0.124939
-0.293910	12.4a where current compilers	-0.124939
-0.343621	compilers optimize Modern compilers	-0.124939
-0.535453	in the latest compilers	-0.124939
-0.339183	supplied with Intel's compilers	-0.124939
-0.109317	compiler 8.1 How compilers	-0.124939
-0.109317	66 8.1 How compilers	-0.124939
-0.500137	and Digital Mars compilers	-0.124939
-0.324993	in many commercial compilers	-0.124939
-0.324993	Codeplay and Watcom compilers	-0.124939
-0.314518	a number). Different compilers	-0.124939
-0.293867	page 73). Current compilers	-0.124939
-0.293867	dvec.h vectorclass.h Supported compilers	-0.124939
-0.237601	Automatic vectorization Good compilers	-0.124939
-0.237601	is enabled. Few compilers	-0.124939
-0.237601	sizeof(float)); // (Some compilers	-0.124939
-0.900648	this is the most	-0.124939
-0.534065	array is the most	-0.124939
-0.534065	stack is the most	-0.124939
-0.534065	list is the most	-0.124939
-0.578433	each of the most	-0.124939
-1.321528	some of the most	-0.124939
-0.578433	Some of the most	-0.124939
-0.902838	versions of the most	-0.425969
-1.011839	instead of the most	-0.124939
-0.578433	compilation of the most	-0.124939
-0.584908	only to the most	-0.124939
-1.139237	access to the most	-0.124939
-0.831002	declaration and the most	-0.124939
-0.564522	easiest and the most	-0.124939
-0.564522	exponent, and the most	-0.124939
-0.834767	used in the most	-0.124939
-0.583175	even in the most	-0.124939
-1.149762	calls in the most	-0.124939
-1.555319	sure that the most	-0.124939
-0.593728	methods if the most	-0.124939
-1.718997	to use the most	-0.124939
-0.587272	high then the most	-0.124939
-1.704060	to make the most	-0.124939
-0.568778	on only the most	-0.124939
-1.136726	by making the most	-0.124939
-0.750016	to run the most	-0.124939
-1.506553	to calculate the most	-0.124939
-0.302976	then put the most	-0.124939
-0.548625	automatically choose the most	-0.124939
-0.532471	mispredictions. When the most	-0.124939
-0.534235	than finding the most	-0.124939
-0.456309	always select the most	-0.124939
-0.456309	programmer choosing the most	-0.124939
-0.353159	is probably the most	-0.124939
-0.456309	and isolate the most	-0.124939
-0.353159	are among the most	-0.124939
-0.888788	operand that is most	-0.124939
-0.657885	composite type is most	-0.124939
-0.503632	on what is most	-0.124939
-0.504138	A profiler is most	-0.124939
-0.593024	SSE2 version of most	-0.124939
-0.358228	the best and most	-0.124939
-0.358228	The simplest and most	-0.124939
-0.533700	price, and in most	-0.124939
-0.352718	compilers can in most	-0.124939
-0.541140	of time in most	-0.124939
-0.499263	pointers because in most	-0.124939
-0.542753	by value in most	-0.124939
-0.352718	quite simple in most	-0.124939
-0.647534	is advantageous in most	-0.124939
-0.455750	are fast in most	-0.124939
-0.352718	thread. However, in most	-0.124939
-0.455750	is optimal in most	-0.124939
-0.065010	hardware implementation in most	-0.124939
-0.352718	linked lists in most	-0.124939
-0.352718	efficient because, in most	-0.124939
-1.409667	} } The most	-0.124939
-0.491499	Mathematical functions The most	-0.124939
-0.349228	Register variables The most	-0.124939
-1.081472	instruction set. The most	-0.124939
-0.513069	such cases. The most	-0.124939
-0.451330	arithmetic operations. The most	-0.124939
-0.491499	different purposes. The most	-0.124939
-0.349228	control condition The most	-0.124939
-0.491499	this problem. The most	-0.124939
-0.513069	libraries available. The most	-0.124939
-0.491499	execute faster. The most	-0.124939
-0.513069	single element. The most	-0.124939
-0.349228	out-of-order execution. The most	-0.124939
-0.349228	dispatch methods. The most	-0.124939
-0.349228	classes. Security The most	-0.124939
-0.349228	code generality. The most	-0.124939
-0.463335	computing, but for most	-0.124939
-0.357293	generally used that most	-0.124939
-0.357293	Remember again, that most	-0.124939
-0.357293	therefore conclude that most	-0.124939
-1.090047	operating systems are most	-0.124939
-0.538365	which resources are most	-0.124939
-0.357050	Switch statements are most	-0.124939
-1.335552	is supported by most	-0.124939
-0.313519	It comes with most	-0.124939
-0.313519	which comes with most	-0.124939
-0.457490	less important on most	-0.124939
-0.354090	of precision on most	-0.124939
-0.650244	clock cycles on most	-0.124939
-0.821513	clock cycle on most	-0.124939
-0.595031	serial, such as most	-0.124939
-0.358369	elsewhere. Faster than most	-0.124939
-0.526201	dispatcher function will most	-0.124939
-0.566005	the program has most	-0.124939
-1.686256	that are used most	-0.124939
-0.318266	Difficult cases In most	-0.124939
-0.449106	limit, etc. In most	-0.124939
-0.318266	unsigned integers In most	-0.124939
-0.318266	clock cycles. In most	-0.124939
-0.318266	other optimizations. In most	-0.124939
-0.318266	by step. In most	-0.124939
-0.318266	each calculation. In most	-0.124939
-0.318266	many strings. In most	-0.124939
-0.357360	shared object where most	-0.124939
-0.538926	goes one way most	-0.124939
-0.356791	Today, the 8 most	-0.124939
-0.502005	which functions take most	-0.124939
-0.826767	variable is accessed most	-0.124939
-0.355705	for Windows, while most	-0.124939
-0.355592	vice versa. But most	-0.124939
-0.162000	The cache works most	-0.124939
-0.248989	code cache works most	-0.124939
-0.162000	A cache works most	-0.124939
-0.458509	the application uses most	-0.124939
-0.553845	likely to run most	-0.124939
-0.354032	{} brackets. However, most	-0.124939
-0.497278	they are predicted most	-0.124939
-0.348224	language output. On most	-0.124939
-0.269015	Many programs spend most	-0.124939
-0.269015	Some applications spend most	-0.124939
-0.331566	+ d.y; Fortunately, most	-0.124939
-0.331566	most software runs most	-0.124939
-0.325062	of code. Furthermore, most	-0.124939
-0.594779	you can obtain most	-0.124939
-0.314436	task that consumes most	-0.124939
-0.237658	performance. 25 Since most	-0.124939
-0.237658	program 153 spends most	-0.124939
-0.857901	this solution is using	-0.124939
-0.358537	(v. 15.0) is using	-0.124939
-0.217877	The advantage of using	-0.221849
-0.679464	The disadvantage of using	-0.124939
-0.565930	A disadvantage of using	-0.124939
-0.401551	biggest disadvantage of using	-0.124939
-0.577072	system instead of using	-0.124939
-0.227558	The advantages of using	-0.221849
-0.861749	the possibility of using	-0.124939
-0.861749	The purpose of using	-0.124939
-0.064899	The trick of using	-0.124939
-0.454662	are disadvantages of using	-0.124939
-0.519517	and drawbacks of using	-0.124939
-0.351859	and cons of using	-0.124939
-0.351859	The advise of using	-0.124939
-0.065444	execution speed to using	-0.124939
-0.441785	an advantage to using	-0.124939
-0.312838	no advantage to using	-0.124939
-0.356684	no cost to using	-0.124939
-0.356684	performance cost to using	-0.124939
-0.356101	performance penalty to using	-0.124939
-0.356101	little-known alternative to using	-0.124939
-0.501067	various alternatives to using	-0.124939
-0.525583	C++ classes and using	-0.124939
-0.462165	integer counter and using	-0.124939
-0.462165	system devices and using	-0.124939
-0.527137	speed advantage in using	-0.124939
-0.937765	The reason for using	-0.124939
-0.761674	performance penalty for using	-0.124939
-0.656791	test tool for using	-0.124939
-0.830658	if you are using	-0.124939
-0.366669	compiler you are using	-0.124939
-0.270369	when you are using	-0.124939
-0.477894	If you are using	-0.425969
-0.366669	sure you are using	-0.124939
-0.366669	whether you are using	-0.124939
-0.487416	example, we are using	-0.124939
-0.487416	examples we are using	-0.124939
-1.066007	operating systems are using	-0.124939
-0.785336	Modern microprocessors are using	-0.124939
-0.358591	a round function using	-0.124939
-0.498533	static or by using	-0.124939
-0.515776	classes than by using	-0.124939
-0.317530	of x by using	-0.124939
-0.530376	avoid this by using	-0.124939
-0.317530	speed-critical functions by using	-0.124939
-0.530376	this loop by using	-0.124939
-0.317530	is set by using	-0.124939
-0.317530	CPU clock by using	-0.124939
-0.317530	global variables by using	-0.124939
-0.411605	of 2 by using	-0.124939
-0.317530	64-bit systems by using	-0.124939
-0.317530	better result by using	-0.124939
-0.082431	the speed by using	-0.124939
-0.039263	in speed by using	-0.124939
-0.317530	be optimized by using	-0.124939
-0.502112	significantly simply by using	-0.124939
-0.411605	to zero by using	-0.124939
-0.317530	the conversions by using	-0.124939
-1.091776	be avoided by using	-0.124939
-0.317530	code further by using	-0.124939
-0.317530	improve efficiency by using	-0.124939
-0.759279	is obtained by using	-0.124939
-0.333365	be improved by using	-0.124939
-0.411605	code explicitly by using	-0.124939
-0.317530	alias anything by using	-0.124939
-0.317530	these guidelines by using	-0.124939
-0.317530	avoid hyperthreading by using	-0.124939
-0.317530	be hidden by using	-0.124939
-0.317530	if necessary, by using	-0.124939
-0.317530	data segment by using	-0.124939
-0.317530	be ameliorated by using	-0.124939
-0.312471	for restrictions on using	-0.124939
-0.441291	certain restrictions on using	-0.124939
-0.355561	www.intel.com. Manual on using	-0.124939
-1.202331	as efficient as using	-0.124939
-0.358395	and by not using	-0.124939
-0.891679	code rather than using	-0.124939
-0.358261	is simpler when using	-0.124939
-0.786760	could benefit from using	-0.124939
-0.357655	The same example using	-0.124939
-0.538802	www.agner.org/optimize/cppexamples.zip. An array using	-0.124939
-0.830270	in speed between using	-0.124939
-0.244647	12.4e. Same example, using	-0.124939
-0.244647	12.4d. Same example, using	-0.124939
-0.328597	unrecoverable error without using	-0.124939
-0.328597	an exception without using	-0.124939
-0.328597	handling errors without using	-0.124939
-0.328597	C1::f directly without using	-0.124939
-0.356368	first call method using	-0.124939
-0.341864	Calculate integer power using	-0.124939
-0.341864	15.1d. Integer power using	-0.124939
-0.843083	transpose a matrix using	-0.124939
-0.718193	and without AVX using	-0.124939
-1.512581	can be calculated using	-0.124939
-0.565669	with bitwise operators using	-0.124939
-1.481258	dynamic memory allocation using	-0.124939
-0.353731	this by preferably using	-0.124939
-0.455310	simple algebraic expressions using	-0.124939
-0.349463	a single operation using	-0.124939
-1.094131	are in fact using	-0.124939
-0.702149	Testing multiple conditions using	-0.124939
-0.154819	supported instruction set, using	-0.425969
-0.621138	dynamically allocated memory, using	-0.124939
-0.444229	it. I am using	-0.124939
-0.314652	manual, I am using	-0.124939
-0.325263	it is finished using	-0.124939
-0.293941	when running. Programs using	-0.124939
-0.237666	are highly optimized, using	-0.124939
-1.064238	multiplying with the double	-0.124939
-0.659486	// everything is double	-0.124939
-1.066233	example is a double	-0.124939
-1.069830	bits of a double	-0.124939
-1.380189	For example, a double	-0.124939
-0.461368	bits while a double	-0.124939
-0.357144	to modify a double	-0.124939
-0.461368	function stores a double	-0.124939
-0.833847	converting a to double	-0.124939
-0.358036	single precision to double	-0.124939
-0.462503	when converting to double	-0.124939
-0.521927	bit integer and double	-0.124939
-0.131348	bit float and double	-0.124939
-0.211822	mix float and double	-0.124939
-0.131348	Mixing float and double	-0.124939
-0.131348	mixes float and double	-0.124939
-0.089503	between single and double	-0.124939
-0.089503	mix single and double	-0.124939
-0.089503	mixing single and double	-0.124939
-0.503581	precision than for double	-0.124939
-0.525775	cases, even for double	-0.124939
-0.550409	not know that double	-0.124939
-0.787725	point constants are double	-0.124939
-0.594361	example, you can double	-0.124939
-0.449155	8 bytes = double	-0.124939
-0.059403	a float or double	-0.124939
-0.128238	of float or double	-0.124939
-0.059403	from float or double	-0.425969
-0.128238	(8 float or double	-0.124939
-0.350623	with single or double	-0.124939
-0.743720	single precision or double	-0.124939
-0.588874	calculated faster than double	-0.124939
-0.594198	float rather than double	-0.124939
-1.172728	xpow10(double x) { double	-0.124939
-0.567138	14.23b union { double	-0.124939
-0.933109	struct S1 { double	-0.124939
-0.519515	int n) { double	-0.124939
-1.172517	You may use double	-0.124939
-0.357907	extra complications. A double	-0.124939
-0.358129	return y; } double	-0.124939
-0.357863	power using loop double	-0.124939
-1.599290	a and b double	-0.124939
-0.524734	vectors of two double	-0.124939
-0.015029	{ public: static double	-0.726999
-0.541625	of a 64-bit double	-0.124939
-0.384759	write a 64-bit double	-0.124939
-0.588137	function of 2 double	-0.124939
-0.316696	in the long double	-0.124939
-0.060196	double and long double	-0.124939
-0.410569	done with long double	-0.124939
-0.316696	registers have long double	-0.124939
-0.316696	calculate when long double	-0.124939
-0.316696	8 8 long double	-0.124939
-0.334979	table of const double	-0.124939
-0.532094	{ static const double	-0.124939
-0.334979	induction variables const double	-0.124939
-0.334979	int x; const double	-0.124939
-0.538438	float 4 4 double	-0.124939
-0.355484	Func1(double) pure_function ; double	-0.124939
-0.458680	8 256 AVX double	-0.124939
-0.354675	256 float 128 double	-0.124939
-0.354506	can hold four double	-0.124939
-1.023675	int a, b; double	-0.124939
-0.498721	log(c[i]);. This would double	-0.124939
-0.577828	N> static inline double	-0.124939
-1.151729	In most cases, double	-0.124939
-0.353049	single precision. Using double	-0.124939
-0.352898	128 float 256 double	-0.124939
-0.359690	int r, c; double	-0.425969
-0.100295	power of 10 double	-0.425969
-0.604050	short int a; double	-0.124939
-0.422698	7.24 float a; double	-0.124939
-0.052389	S1 {double a; double	-0.425969
-0.346039	4 128 SSE double	-0.124939
-0.021543	unsigned int u; double	-0.602060
-0.697292	16 512 AVX512 double	-0.124939
-0.428700	Y = C; double	-0.124939
-0.331386	#define pure_function #endif double	-0.124939
-0.211929	representation of float, double	-0.124939
-0.211929	Conversions between float, double	-0.124939
-0.314356	// Example 8.4 double	-0.124939
-0.574467	// Polynomial coefficients double	-0.124939
-0.172378	Disadvantages are: Long double	-0.124939
-0.172378	double precision. Long double	-0.124939
-0.314125	power, loop unrolled double	-0.124939
-0.537847	C = 3.3; double	-0.124939
-0.293635	// Example 14.14b double	-0.124939
-0.293635	// Example 14.14a double	-0.124939
-0.293635	A + A; double	-0.124939
-0.023484	void TransposeCopy(double a[SIZE][SIZE], double	-0.425969
-0.237397	x4 = x2*x2; double	-0.124939
-0.237397	unsigned int dummy; double	-0.124939
-0.237397	r2, c1, c2; double	-0.124939
-0.237397	// Example 14.18c double	-0.124939
-0.237397	// Example 8.2a double	-0.124939
-0.237397	// Example 7.32a double	-0.124939
-0.237397	// Example 7.32b double	-0.124939
-0.237397	() { __declspec(__align(64)) double	-0.124939
-0.237397	// Example 14.17b double	-0.124939
-0.237397	// Example 14.16a double	-0.124939
-0.237397	StoreNTD(double * dest, double	-0.124939
-0.237397	= x *x; double	-0.124939
-0.237397	// Example 8.8b double	-0.124939
-0.237397	// Example 8.8a double	-0.124939
-0.237397	x8 = x4*x4; double	-0.124939
-0.237397	// Example 14.16b double	-0.124939
-0.237397	// Example 14.17a double	-0.124939
-0.237397	// Example 14.20 double	-0.124939
-1.483770	multiple of the size	-0.124939
-0.594587	documentation for the size	-0.124939
-0.562083	memory if the size	-0.124939
-0.562083	errors if the size	-0.124939
-0.195825	happen if the size	-0.124939
-0.562083	matter if the size	-0.124939
-1.363813	divisible by the size	-0.124939
-0.362444	multiplied by the size	-0.124939
-1.530370	depends on the size	-0.124939
-0.578168	well as the size	-0.124939
-0.560854	matrix when the size	-0.124939
-0.560854	allocation when the size	-0.124939
-0.824252	dynamically when the size	-0.124939
-0.590532	i*12, because the size	-0.124939
-0.591325	addition. If the size	-0.124939
-0.545447	and where the size	-0.124939
-1.209661	cases where the size	-0.124939
-0.974994	above example, the size	-0.124939
-0.851917	slow unless the size	-0.124939
-0.554044	that fit the size	-0.124939
-0.534959	cannot increase the size	-0.124939
-0.458395	only half the size	-0.124939
-0.142134	stack. Is the size	-0.124939
-0.142134	38). Is the size	-0.124939
-0.354803	// Return the size	-0.124939
-0.458395	table increases the size	-0.124939
-0.463324	the type and size	-0.124939
-0.498388	more efficient. The size	-0.124939
-0.498388	less efficient. The size	-0.124939
-0.648049	execution units. The size	-0.124939
-0.648049	by 8. The size	-0.124939
-0.456081	class elements. The size	-0.124939
-0.352979	class objects. The size	-0.124939
-0.713027	another module. The size	-0.124939
-0.648049	following reasons: The size	-0.124939
-0.352979	arithmetic expression. The size	-0.124939
-0.352979	number i. The size	-0.124939
-0.462493	between optimizing for size	-0.124939
-0.503760	speed. Optimizing for size	-0.124939
-0.596690	making the code size	-0.124939
-0.459251	efficiency and code size	-0.124939
-0.355477	and small code size	-0.124939
-0.266815	vectorization const int size	-0.124939
-0.266815	x); const int size	-0.124939
-0.266815	#endif const int size	-0.124939
-0.266815	14.30 const int size	-0.124939
-0.266815	11.3 const int size	-0.124939
-0.113232	Func(int); const int size	-0.425969
-0.266815	11.2b const int size	-0.124939
-0.266815	11.2a const int size	-0.124939
-0.266815	7.33b const int size	-0.124939
-0.266815	7.33a const int size	-0.124939
-0.266815	14.4a const int size	-0.124939
-0.358093	the smallest data size	-0.124939
-0.102748	by the vector size	-0.124939
-0.348991	a new vector size	-0.124939
-0.580411	transposition of different size	-0.124939
-0.354821	and copying different size	-0.124939
-0.357942	execution units same size	-0.124939
-0.357829	today where cache size	-0.124939
-0.526352	smaller the integer size	-0.124939
-0.480833	application. The integer size	-0.124939
-0.546025	use an integer size	-0.124939
-0.326867	most efficient integer size	-0.124939
-0.326867	a particular integer size	-0.124939
-0.326867	the default integer size	-0.124939
-0.026302	the smallest integer size	-0.602060
-0.357606	the memory page size	-0.124939
-0.572802	make the array size	-0.124939
-0.351383	the final array size	-0.124939
-0.539122	objects of variable size	-0.124939
-0.461410	integers of any size	-0.124939
-0.509734	of the register size	-0.124939
-0.517221	largest vector register size	-0.124939
-0.062401	a new register size	-0.124939
-0.355810	make sure its size	-0.124939
-0.852607	of a specific size	-0.124939
-0.625151	64 64 matrix size	-0.124939
-0.625151	512 512 matrix size	-0.124939
-0.054536	with a line size	-0.425969
-0.329523	the cache line size	-0.124939
-0.118384	The cache line size	-0.124939
-0.281655	by cache line size	-0.124939
-0.353813	in performance. Integer size	-0.124939
-0.353175	because the block size	-0.124939
-0.352543	to a longer size	-0.124939
-0.453597	to a smaller size	-0.124939
-0.491788	and i >= size	-0.124939
-0.531392	registers. The maximum size	-0.124939
-0.684351	of the final size	-0.124939
-0.427694	If the final size	-0.124939
-0.565278	temp; // Define size	-0.124939
-0.348080	operations. The total size	-0.124939
-0.347108	handling a full size	-0.124939
-0.347325	if the RAM size	-0.124939
-0.343448	Vectors of 256-bit size	-0.124939
-0.444031	of the default size	-0.124939
-0.326942	is a fixed size	-0.124939
-0.299046	objects and fixed size	-0.124939
-0.460874	buffer with fixed size	-0.124939
-0.343524	256; // Array size	-0.124939
-0.237776	large arrays. Array size	-0.124939
-0.102790	If the combined size	-0.124939
-0.102790	where the combined size	-0.124939
-0.037137	of elements Total size	-0.425969
-0.407741	of 8 kb size	-0.124939
-0.102765	array element. Matrix size	-0.124939
-0.102765	as follows: Matrix size	-0.124939
-0.237641	future CPUs. Half size	-0.124939
-1.041832	function of the Intel	-0.124939
-0.589250	disadvantage of the Intel	-0.124939
-0.877837	behavior of the Intel	-0.124939
-0.589250	bias of the Intel	-0.124939
-0.596042	inferior to the Intel	-0.124939
-0.593903	compiler and the Intel	-0.124939
-0.202411	mechanism in the Intel	-0.425969
-0.595401	optimized for the Intel	-0.124939
-1.114197	fact that the Intel	-0.124939
-0.656956	Note that the Intel	-0.124939
-1.178627	generated by the Intel	-0.124939
-0.885041	come with the Intel	-0.124939
-0.887842	work on the Intel	-0.124939
-1.514936	to use the Intel	-0.124939
-0.983195	may use the Intel	-0.124939
-0.592240	references: If the Intel	-0.124939
-0.566106	__declspec(cpu_dispatch(...)). See the Intel	-0.124939
-0.557564	sets. However, the Intel	-0.124939
-0.798069	some cases, the Intel	-0.124939
-0.355527	my tests, the Intel	-0.124939
-0.355527	etc. Overriding the Intel	-0.124939
-1.515606	the use of Intel	-0.124939
-0.586536	Some versions of Intel	-0.124939
-0.503395	optimization features of Intel	-0.124939
-0.357767	in favor of Intel	-0.124939
-0.582643	both AMD and Intel	-0.124939
-0.460892	The Microsoft and Intel	-0.124939
-0.460892	the PathScale and Intel	-0.124939
-0.065530	Gnu, Clang and Intel	-0.124939
-0.581272	detection function in Intel	-0.124939
-0.460472	monitor counter in Intel	-0.124939
-0.541950	CPU dispatching in Intel	-0.425969
-0.502739	detection mechanism in Intel	-0.124939
-0.783602	classes defined in Intel	-0.124939
-1.441306	} } The Intel	-0.124939
-0.352931	platform. Intel The Intel	-0.124939
-0.548268	Itanium systems. The Intel	-0.124939
-0.937216	or not. The Intel	-0.124939
-0.352931	avoiding this. The Intel	-0.124939
-0.456020	page 127. The Intel	-0.124939
-0.352931	is required. The Intel	-0.124939
-0.647955	page 122. The Intel	-0.124939
-0.352931	for details). The Intel	-0.124939
-0.352931	vectorclass www.agner.org/optimize/#vectorclass. The Intel	-0.124939
-0.849514	assembly code for Intel	-0.124939
-1.084603	works only for Intel	-0.124939
-0.358416	Use Gnu or Intel	-0.124939
-0.357176	originally designed by Intel	-0.124939
-0.357176	libraries published by Intel	-0.124939
-0.357048	works only with Intel	-0.124939
-0.357048	Documentation". Included with Intel	-0.124939
-0.351124	processors and on Intel	-0.124939
-0.744959	but not on Intel	-0.124939
-0.453730	also works on Intel	-0.124939
-0.351124	CPU feature on Intel	-0.124939
-0.351124	in tests on Intel	-0.124939
-0.351124	BIOS setup. on Intel	-0.124939
-0.574724	processor is an Intel	-0.124939
-0.127339	running on an Intel	-0.124939
-0.141648	are using an Intel	-0.124939
-0.349392	to fake an Intel	-0.124939
-0.795520	in Intel compiler Intel	-0.124939
-0.971611	Linux Intel compiler Intel	-0.124939
-0.462735	Intel CPUs use Intel	-0.124939
-0.358209	also work when Intel	-0.124939
-0.458961	available, one from Intel	-0.124939
-0.458961	Agner Available from Intel	-0.124939
-0.585743	function for different Intel	-0.124939
-0.657241	Same example, using Intel	-0.124939
-0.357563	of 2 double Intel	-0.124939
-0.565037	Vector class library Intel	-0.124939
-0.356965	from Intel. See Intel	-0.124939
-0.356936	specific profiler. For Intel	-0.124939
-0.355890	PGI PathScale Gnu Intel	-0.124939
-0.278883	Intel compiler Windows Intel	-0.425969
-0.755986	strlen 128 bytes Intel	-0.124939
-0.060953	Gnu compiler Linux Intel	-0.602060
-0.458659	the well optimized Intel	-0.124939
-0.354182	flags on certain Intel	-0.124939
-0.536346	Gnu 32-bit Mac Intel	-0.124939
-0.816189	and PathScale compilers. Intel	-0.124939
-0.351949	CPU dispatching. Many Intel	-0.124939
-0.549672	Core and later Intel	-0.124939
-0.644026	16kB aligned operands Intel	-0.124939
-0.346016	internal multi-threading, e.g. Intel	-0.124939
-0.445793	while all newer Intel	-0.124939
-0.344791	tasks on current Intel	-0.124939
-0.193268	tried. The Microsoft, Intel	-0.124939
-0.193268	Linux with Microsoft, Intel	-0.124939
-0.193268	compilers from Microsoft, Intel	-0.124939
-0.193268	functions (i.e. Microsoft, Intel	-0.124939
-0.341711	parallelization. The Gnu, Intel	-0.124939
-0.095025	the Gnu, Clang, Intel	-0.425969
-0.080807	as Gnu, Clang, Intel	-0.124939
-0.420818	bits Vector class, Intel	-0.124939
-0.314309	processors and earlier Intel	-0.124939
-0.314309	on Mac platform. Intel	-0.124939
-0.137073	vector math libraries: Intel	-0.124939
-0.538156	See page 131. Intel	-0.124939
-0.538156	16kB unaligned op. Intel	-0.124939
-0.048336	from Intel: "IA-32 Intel	-0.124939
-0.048336	documentation Intel: "IA-32 Intel	-0.124939
-0.237552	earlier vmlsExp4 vmldExp2 Intel	-0.124939
-0.237552	Builder 5, 2009). Intel	-0.124939
-0.237552	doesn’t. The undocumented Intel	-0.124939
-0.237552	later __svml_expf4 __svml_exp2 Intel	-0.124939
-0.237552	Asmlib: v. 2.00. Intel	-0.124939
-0.237552	as follows (using Intel	-0.124939
-0.237552	such as AQtime, Intel	-0.124939
-0.842172	value of the pointer	-0.124939
-1.603159	sure that the pointer	-0.124939
-0.595616	entry with the pointer	-0.124939
-0.596545	deleted when the pointer	-0.124939
-0.593130	registers then the pointer	-0.124939
-0.589104	well before the pointer	-0.124939
-0.824497	cycles after the pointer	-0.124939
-0.964098	there is a pointer	-0.124939
-0.875929	value of a pointer	-0.124939
-0.501897	class to a pointer	-0.124939
-0.972583	pointer to a pointer	-0.124939
-0.721590	added to a pointer	-0.124939
-0.442328	converted to a pointer	-0.124939
-0.721590	type-casted to a pointer	-0.124939
-0.569653	block and a pointer	-0.124939
-0.450124	give it a pointer	-0.124939
-0.860927	function with a pointer	-0.124939
-0.582725	same as a pointer	-0.124939
-0.558168	example, when a pointer	-0.124939
-0.572690	class has a pointer	-0.124939
-1.610299	to make a pointer	-0.124939
-0.468253	and return a pointer	-0.124939
-0.468253	function return a pointer	-0.124939
-0.437227	function through a pointer	-0.124939
-0.613318	class through a pointer	-0.124939
-0.309446	b through a pointer	-0.124939
-0.437227	object through a pointer	-0.124939
-0.309446	variable through a pointer	-0.124939
-0.437227	accessed through a pointer	-0.124939
-0.309446	jump through a pointer	-0.124939
-0.450124	change what a pointer	-0.124939
-0.348274	to transfer a pointer	-0.124939
-0.450124	mechanism stores a pointer	-0.124939
-0.348274	type. Likewise, a pointer	-0.124939
-0.495111	for converting a pointer	-0.124939
-0.450124	and returns a pointer	-0.124939
-0.348274	// Returns a pointer	-0.124939
-0.348274	or decrementing a pointer	-0.124939
-0.348274	modified. Unlike a pointer	-0.124939
-0.348274	to pass a pointer	-0.124939
-0.358698	pointer arithmetics and pointer	-0.124939
-0.463339	a pointer. The pointer	-0.124939
-1.590133	is used for pointer	-0.124939
-0.358732	optimize("a",on). Specifies that pointer	-0.124939
-0.659017	a reference or pointer	-0.124939
-0.402438	of the function pointer	-0.124939
-1.431128	time the function pointer	-0.124939
-0.777916	changes the function pointer	-0.124939
-0.344298	through a function pointer	-0.124939
-0.185792	sets a function pointer	-0.425969
-0.517296	what a function pointer	-0.124939
-1.218730	a member function pointer	-0.124939
-0.444763	function through function pointer	-0.124939
-0.344028	// Set function pointer	-0.124939
-0.358618	are uninitialized, if pointer	-0.124939
-0.358491	last member. This pointer	-0.124939
-0.537727	integer, and this pointer	-0.124939
-0.356034	parameters). The this pointer	-0.124939
-0.352995	type conversion A pointer	-0.124939
-0.352995	Pointer arithmetic A pointer	-0.124939
-0.352995	Pointer elimination A pointer	-0.124939
-0.524071	code has no pointer	-0.124939
-0.329822	hint about no pointer	-0.124939
-0.329822	to assume no pointer	-0.124939
-0.134325	-fno-rtti Assume no pointer	-0.124939
-0.134325	78. Assume no pointer	-0.124939
-0.329822	for assuming no pointer	-0.124939
-0.134325	for "assume no pointer	-0.124939
-0.134325	option "assume no pointer	-0.124939
-0.580987	compares the array pointer	-0.124939
-0.538842	obstacle of possible pointer	-0.124939
-0.350922	never return any pointer	-0.124939
-0.453475	avoid making any pointer	-0.124939
-0.545780	that the member pointer	-0.124939
-0.994229	a data member pointer	-0.124939
-0.502369	taken. A const pointer	-0.124939
-0.538686	mode 4 4 pointer	-0.124939
-0.547452	unsigned 8 8 pointer	-0.124939
-1.013756	is a simple pointer	-0.124939
-0.459702	should have its pointer	-0.124939
-0.581066	insufficient information about pointer	-0.124939
-0.576120	that a specific pointer	-0.124939
-0.475110	function // Function pointer	-0.124939
-0.475110	CriticalFunction_Dispatch; // Function pointer	-0.124939
-0.352800	below. 126 Make pointer	-0.124939
-0.323557	array. No link pointer	-0.124939
-0.323557	the previous link pointer	-0.124939
-0.311668	vector aligned Assume pointer	-0.124939
-0.311668	__attribute(( aligned(16))) Assume pointer	-0.124939
-0.280261	use a smart pointer	-0.124939
-0.280261	whenever a smart pointer	-0.124939
-0.082237	pointers A smart pointer	-0.124939
-0.082237	used. A smart pointer	-0.124939
-0.183771	each their smart pointer	-0.124939
-0.234316	have a 'this' pointer	-0.124939
-0.169559	functions. The 'this' pointer	-0.124939
-0.169559	type-casting its 'this' pointer	-0.124939
-0.076556	The implicit 'this' pointer	-0.124939
-0.076556	an implicit 'this' pointer	-0.124939
-0.341567	plus 6 integer, pointer	-0.124939
-0.104447	InstructionSet(); // Set pointer	-0.425969
-0.473293	and by avoiding pointer	-0.124939
-0.458330	whether the original pointer	-0.124939
-0.325168	risky. The returned pointer	-0.124939
-0.325168	through the implicit pointer	-0.124939
-0.407777	about a variable, pointer	-0.124939
-0.048354	pointer -fomit- frame- pointer	-0.124939
-0.048354	/Oy -fomit- frame- pointer	-0.124939
-1.849432	the value of b	-0.124939
-0.539851	four elements of b	-0.124939
-0.526054	The offset of b	-0.124939
-0.462178	fast. Value of b	-0.124939
-0.169213	copy a to b	-0.124939
-0.027728	of a and b	-0.124939
-0.089137	for a and b	-0.124939
-0.020628	if a and b	-0.249877
-0.042286	If a and b	-0.124939
-0.089137	between a and b	-0.124939
-0.089137	making a and b	-0.124939
-0.089137	used. a and b	-0.124939
-0.089137	joining a and b	-0.124939
-0.089137	MultiplyBy<8>(10); a and b	-0.124939
-0.814644	the time and b	-0.124939
-0.633583	each element in b	-0.425969
-0.582965	we assume that b	-0.124939
-0.335531	to a = b	-0.124939
-0.335531	example, a = b	-0.124939
-0.200034	b; a = b	-0.301030
-0.133874	c; a = b	-0.425969
-0.335531	d; a = b	-0.124939
-0.378400	10; a = b	-0.124939
-0.335531	7.2 a = b	-0.124939
-0.335531	140 a = b	-0.124939
-0.335531	8.2b a = b	-0.124939
-0.062649	// result = b	-0.425969
-0.489022	If c = b	-0.124939
-0.489022	b; c = b	-0.124939
-1.219888	{ a[i] = b	-0.124939
-0.808616	temp; temp = b	-0.124939
-0.871474	but not if b	-0.124939
-0.142146	& b if b	-0.124939
-0.142146	| b if b	-0.124939
-0.354842	is inexact if b	-0.124939
-0.463058	extra check on b	-0.124939
-0.567815	a only when b	-0.124939
-0.515445	is efficient when b	-0.124939
-0.350855	different implementation when b	-0.124939
-0.350855	inefficient, however, when b	-0.124939
-0.724870	and b because b	-0.124939
-0.221269	= a + b	-0.726999
-0.310778	write a + b	-0.124939
-0.533014	= c + b	-0.124939
-0.328894	= 1.0f + b	-0.124939
-0.724351	= b * b	-0.425969
-0.356990	to a < b	-0.124939
-0.569731	with a & b	-0.124939
-0.355757	to align its b	-0.124939
-0.355588	this example, a, b	-0.124939
-0.354833	compute a / b	-0.124939
-0.565091	bool a, b; b	-0.124939
-0.339833	= 0, b; b	-0.124939
-0.354311	will be 1 b	-0.124939
-0.063007	+ 2 : b	-0.425969
-0.457729	operands and add b	-0.124939
-0.332710	the intermediate expression b	-0.124939
-0.332710	The equivalent expression b	-0.124939
-0.062247	vector b: __m128i b	-0.425969
-0.331690	b; double c; b	-0.124939
-1.227262	a, b, c; b	-0.124939
-0.386642	replace a && b	-0.124939
-0.386642	expression a && b	-0.124939
-0.550579	with a | b	-0.124939
-0.531178	replace a || b	-0.124939
-0.305035	return a > b	-0.124939
-0.126270	if (a > b	-0.124939
-0.126270	MAX(a,b) (a > b	-0.124939
-0.813549	a = 0, b	-0.124939
-0.453088	x[0] = a; b	-0.124939
-0.058428	c + 2, b	-0.425969
-0.488694	necessary to convert b	-0.124939
-0.057768	= a ? b	-0.425969
-0.729409	needs to evaluate b	-0.124939
-0.708105	= a ^ b	-0.124939
-0.335864	vector b: Is16vec8 b	-0.124939
-0.331768	I have AND'ed b	-0.124939
-0.044111	two); // Multiply b	-0.425969
-0.314348	sake of security. b	-0.124939
-0.574874	code that accesses b	-0.124939
-0.237584	a = -100, b	-0.124939
-0.237584	a = -1.0E8, b	-0.124939
-0.237584	= parabola (2.0f); b	-0.124939
-0.237584	a = 5.0f; b	-0.124939
-0.237584	c + two, b	-0.124939
-0.237584	= a XOR b	-0.124939
-0.237584	a = Multiply(10,8); b	-0.124939
-0.237584	temp = a+1; b	-0.124939
-0.358549	and divide it into	-0.124939
-0.890832	up a function into	-0.124939
-0.784359	the frame function into	-0.124939
-0.588563	parallelization of code into	-0.124939
-0.539737	the allocated memory into	-0.124939
-0.544816	up the data into	-0.124939
-0.795276	Organize the data into	-0.124939
-0.346302	the right data into	-0.124939
-0.346302	for organizing data into	-0.124939
-0.346302	data Loading data into	-0.124939
-0.428409	aligned integer vector into	-0.124939
-0.339483	unaligned integer vector into	-0.602060
-0.357605	the data set into	-0.124939
-1.100238	function or class into	-0.124939
-0.525780	elements of b into	-0.124939
-1.019199	of the library into	-0.124939
-0.977292	bit of i into	-0.124939
-0.461381	the allocated array into	-0.124939
-0.524754	data as possible into	-0.124939
-0.356978	Putting simple variables into	-0.124939
-0.559494	splitting of software into	-0.124939
-0.570863	feeds a branch into	-0.124939
-0.356493	the flags register into	-0.124939
-0.561223	have to take into	-0.124939
-0.348404	if you take into	-0.124939
-0.355976	256-bit read operations into	-0.124939
-0.459824	be split up into	-0.124939
-0.284346	divide the work into	-0.425969
-0.331500	amount of work into	-0.124939
-0.355387	time- consuming calculations into	-0.124939
-1.110896	code is compiled into	-0.124939
-0.063691	what fits best into	-0.425969
-0.557460	divide the matrix into	-0.124939
-0.449580	all source files into	-0.124939
-0.060461	multiple .cpp files into	-0.425969
-0.757862	and compatibility problems into	-0.124939
-0.572168	old memory block into	-0.124939
-0.456173	to be put into	-0.124939
-0.353279	the structure y into	-0.124939
-0.495741	must be read into	-0.124939
-0.292565	should be linked into	-0.124939
-0.292565	cases be linked into	-0.124939
-0.351962	of it) load into	-0.124939
-0.454349	the objects together into	-0.124939
-0.454144	wrapping the vectors into	-0.124939
-0.351183	reset or goes into	-0.124939
-0.453967	a test feature into	-0.124939
-0.453089	multiple .cpp modules into	-0.124939
-0.452649	elements will go into	-0.124939
-0.552923	program is loaded into	-0.124939
-0.263147	esp+12 and loaded into	-0.124939
-0.434151	cannot be loaded into	-0.124939
-0.485688	libraries are loaded into	-0.124939
-0.490455	put a task into	-0.124939
-0.237500	by copying them into	-0.124939
-0.237500	and copies them into	-0.124939
-0.237500	to join them into	-0.124939
-0.237500	and getting them into	-0.124939
-0.307894	split the tasks into	-0.124939
-0.399670	put time-consuming tasks into	-0.124939
-0.304134	does not fit into	-0.124939
-0.556441	the data fit into	-0.124939
-0.488423	splitting of N into	-0.124939
-0.346005	measurement instruments directly into	-0.124939
-0.798208	can be copied into	-0.124939
-0.344804	shall automatically come into	-0.124939
-0.344964	and go back into	-0.124939
-0.456031	easily be organized into	-0.124939
-0.382500	caches are organized into	-0.124939
-0.501302	take branch prediction into	-0.124939
-0.255627	that it fits into	-0.124939
-0.255627	four float's fits into	-0.124939
-0.493304	divide the job into	-0.124939
-0.331346	can be turned into	-0.124939
-0.331346	taking cache effects into	-0.124939
-0.331346	simply put 80 into	-0.124939
-0.331346	operation was split into	-0.124939
-0.038204	task is divided into	-0.124939
-0.009241	can be divided into	-0.124939
-0.038204	were not divided into	-0.124939
-0.038204	is usually divided into	-0.124939
-0.331346	can be combined into	-0.124939
-0.324694	one by one, into	-0.124939
-0.003826	elements from cc into	-0.602060
-0.011582	b: from cc into	-0.124939
-0.003826	elements from bb into	-0.602060
-0.011582	115 from bb into	-0.124939
-0.407320	desired measurement instruments into	-0.124939
-0.713237	0x2700 to 0x273F into	-0.124939
-0.407626	code is translated into	-0.124939
-0.314076	by some formula into	-0.124939
-0.006826	should be taken into	-0.602060
-0.143200	can be joined into	-0.124939
-0.226127	will be joined into	-0.124939
-0.293589	branch is fed into	-0.124939
-0.102661	can be wrapped into	-0.124939
-0.102661	they are wrapped into	-0.124939
-0.237356	to go deeper into	-0.124939
-0.237356	32-bit Windows. Integrates into	-0.124939
-0.237356	data fit nicely into	-0.124939
-0.237356	- preferably isolated into	-0.124939
-0.237356	a time packed into	-0.124939
-0.237356	branches to feed into	-0.124939
-0.436850	x = a +	-0.124939
-0.306436	b = a +	-0.124939
-0.749040	c = a +	-0.124939
-0.207356	y = a +	-0.602060
-0.450667	{ return a +	-0.124939
-0.266186	} return a +	-0.124939
-0.113011	2; return a +	-0.425969
-0.380105	3; return a +	-0.124939
-0.458245	to write a +	-0.124939
-0.354685	Sum1() {return a +	-0.124939
-0.077255	x * x +	-0.425969
-0.171293	8.0f) * x +	-0.124939
-0.248000	Z = A +	-0.124939
-0.248000	A2 = A +	-0.124939
-0.780601	a = b +	-0.124939
-0.155950	c = b +	-0.124939
-0.074887	a + b +	-0.425969
-0.137251	c + b +	-0.124939
-0.058893	b * b +	-0.425969
-0.461865	i ; i +	-0.124939
-0.356687	parameter 1: 4 +	-0.124939
-0.356663	parameter 1: 8 +	-0.124939
-0.271808	y = c +	-0.124939
-0.053695	b + c +	-0.425969
-0.053695	> 0, c +	-0.425969
-0.053695	0 ? c +	-0.425969
-0.271808	select_gt(b, zero, c +	-0.124939
-0.353430	b;} vector operator +	-0.124939
-0.024273	z = y +	-0.602060
-0.291764	+ a.x, y +	-0.124939
-0.231065	r = r +	-0.124939
-0.231065	a[i] = r +	-0.124939
-0.351158	temp = a[i] +	-0.124939
-0.350105	p = p +	-0.124939
-0.349347	(a + b) +	-0.124939
-0.128453	c = d +	-0.124939
-0.128453	y = d +	-0.124939
-0.073125	8; // exponent +	-0.124939
-0.073125	15; // exponent +	-0.124939
-0.073125	11; // exponent +	-0.124939
-0.344716	matrix[row][column] = row +	-0.124939
-0.065768	a[i] = *p +	-0.124939
-0.031640	*p = *p +	-0.124939
-0.438421	y = (a +	-0.124939
-0.486676	as(a << 4) +	-0.124939
-0.335947	i * 9 +	-0.124939
-0.331719	b) + (c +	-0.124939
-0.331597	a[i] = b[i] +	-0.124939
-0.458033	__svml_exp2 Intel SVML +	-0.124939
-0.325004	object for (b +	-0.124939
-0.006829	c = LoadVector(cc +	-0.602060
-0.006829	b = LoadVector(bb +	-0.602060
-0.006829	in aa: StoreVector(aa +	-0.602060
-0.023491	b*x*x + c*x +	-0.425969
-0.023491	a*x*x*x + b*x*x +	-0.425969
-0.023491	(a+b)+(c+d) a*b+a*c=a*(b+c) a*x*x*x +	-0.425969
-0.293737	aa[i] = bb[i] +	-0.124939
-0.023491	a[i] = log(b[i]) +	-0.124939
-0.293737	Func1(x) * Func1(x) +	-0.124939
-0.293737	a = 1.0f +	-0.124939
-0.293737	ALIGN ?Func@@YAXQAHAAH@Z ENDP +	-0.124939
-0.237487	a.y = b.y +	-0.124939
-0.237487	must compute (FuncRow(i)*columns +	-0.124939
-0.237487	y = a1/b1 +	-0.124939
-0.237487	internally as (int)&matrix[0][0] +	-0.124939
-0.237487	list[j].a = list[j].b +	-0.124939
-0.237487	d + e +	-0.124939
-0.237487	r) {return r.a +	-0.124939
-0.237487	Table[x] = A*x*x +	-0.124939
-0.237487	y = (a1*b2 +	-0.124939
-0.237487	A*x*x + B*x +	-0.124939
-0.237487	{ return x*x +	-0.124939
-0.237487	0) ? (cc[i] +	-0.124939
-0.237487	p) {return p->a +	-0.124939
-0.237487	x.d = y.d +	-0.124939
-0.237487	x.a = y.a +	-0.124939
-0.237487	x.b = y.b +	-0.124939
-0.237487	x.c = y.c +	-0.124939
-0.237487	{ return square(x) +	-0.124939
-0.237487	2.5*x^2 - 8*x +	-0.124939
-0.237487	(int)(&list[100]) = (int)(&list[0]) +	-0.124939
-0.237487	{ return vector(x +	-0.124939
-0.237487	a.x = b.x +	-0.124939
-0.237487	b.x + c.x +	-0.124939
-0.237487	b.y + c.y +	-0.124939
-0.144874	= a - n.a.	-0.204120
-1.103468	- - - n.a.	-0.823909
-0.754126	x - - n.a.	-0.903090
-0.117086	n.a. - - n.a.	-1.447158
-0.119358	n.a. x - n.a.	-0.602060
-0.527239	- n.a. - n.a.	-0.425969
-0.638879	x n.a. - n.a.	-0.124939
-0.450165	reciprocal n.a. - n.a.	-0.124939
-0.262286	2) 2 - n.a.	-0.124939
-0.130770	= 0 - n.a.	-0.221849
-0.175552	0= 0 - n.a.	-0.124939
-0.262286	>= b) - n.a.	-0.124939
-0.262286	= -1 - n.a.	-0.124939
-0.262286	= a*b - n.a.	-0.124939
-0.262286	Constant folding - n.a.	-0.124939
-0.484251	= a+(b+c) - n.a.	-0.124939
-0.484251	= a*(b+c) - n.a.	-0.124939
-0.262286	= b*a - n.a.	-0.124939
-0.262286	= a&(b|c) - n.a.	-0.124939
-0.262286	= a<<(b+c) - n.a.	-0.124939
-0.262286	= a*4 - n.a.	-0.124939
-0.389753	- n.a. x n.a.	-0.124939
-0.501380	x n.a. x n.a.	-0.124939
-0.350984	subexpression elimination x n.a.	-0.124939
-0.350984	a+b=b+a, a*b=b*a x n.a.	-0.124939
-0.004151	- - n.a. n.a.	-1.329059
-0.046461	x - n.a. n.a.	-0.602060
-0.211436	__INTEL_COMPILER __INTEL_COMPILER n.a. n.a.	-0.124939
-0.352004	_WIN32 Linux platform n.a.	-0.124939
-0.339433	by - reciprocal n.a.	-0.124939
-0.421371	Linux __INTEL_COMPILER __INTEL_COMPILER n.a.	-0.124939
-0.314717	not not _WIN32 n.a.	-0.124939
-0.294200	0.38 0.44 0.40 n.a.	-0.124939
-0.294200	0.24 0.25 0.24 n.a.	-0.124939
-0.237894	1.09 1.25 1.61 n.a.	-0.124939
-2.021642	part of the library	-0.124939
-1.568297	versions of the library	-0.124939
-1.257683	instance of the library	-0.124939
-0.893160	parameter to the library	-0.124939
-0.595163	compiler, and the library	-0.124939
-0.598650	supplied in the library	-0.124939
-0.893398	useful if the library	-0.124939
-0.592005	other than the library	-0.124939
-0.890114	resolved when the library	-0.124939
-0.559362	function from the library	-0.124939
-0.559362	needed from the library	-0.124939
-0.627247	to call the library	-0.124939
-0.759917	before calling the library	-0.124939
-0.554394	by including the library	-0.124939
-0.524600	and economize the library	-0.124939
-0.460504	program loads the library	-0.124939
-0.598467	log is a library	-0.124939
-1.367447	function in a library	-0.124939
-0.594737	ReadTSC as a library	-0.124939
-1.387952	For example, a library	-0.124939
-1.009766	the addresses of library	-0.124939
-0.549130	its time in library	-0.124939
-0.357879	of CPU-time in library	-0.124939
-0.357879	and __intel_new_strlen in library	-0.124939
-0.569073	source code. The library	-0.124939
-0.462523	vector register. The library	-0.124939
-1.296898	is useful for library	-0.124939
-0.502871	The program or library	-0.124939
-0.461685	as object or library	-0.124939
-0.485691	in the function library	-0.124939
-1.143096	as a function library	-0.124939
-0.565545	publish a function library	-0.124939
-0.545423	time. The function library	-0.124939
-0.525469	incompatible. A function library	-0.124939
-0.508260	an Intel function library	-0.124939
-0.515187	Use another function library	-0.124939
-0.441759	a standard function library	-0.124939
-0.625966	a separate function library	-0.124939
-0.341645	My own function library	-0.124939
-0.341645	Gnu C function library	-0.124939
-0.441759	Intel math function library	-0.124939
-0.341645	the asmlib function library	-0.124939
-0.341645	an up-to-date function library	-0.124939
-0.454687	C++ compiler. This library	-0.124939
-0.141236	Windows platforms. This library	-0.124939
-0.141236	x86 platforms. This library	-0.124939
-0.351879	v. 7.2). This library	-0.124939
-0.351879	option -mveclibabi=svml. This library	-0.124939
-0.304031	can use this library	-0.425969
-0.358129	Use ReadTSC() from library	-0.124939
-0.762414	a long vector library	-0.124939
-0.598498	big floating point library	-0.124939
-0.199154	the vector class library	-0.124939
-0.120797	The vector class library	-0.124939
-0.120797	// vector class library	-0.124939
-0.120797	Intel vector class library	-0.124939
-0.120797	My vector class library	-0.124939
-0.088260	Agner's vector class library	-0.124939
-0.458559	lately. Vector class library	-0.124939
-0.357754	version of most library	-0.124939
-0.353084	dispatching. Many Intel library	-0.124939
-0.353084	The undocumented Intel library	-0.124939
-1.366709	the most efficient library	-0.124939
-0.502586	linking for any library	-0.124939
-0.140000	the standard template library	-0.124939
-0.140000	The standard template library	-0.124939
-0.068113	in a dynamic library	-0.124939
-0.148933	which a dynamic library	-0.124939
-0.076907	it. A dynamic library	-0.124939
-0.076907	process. A dynamic library	-0.124939
-0.076907	linking. A dynamic library	-0.124939
-0.531101	the same dynamic library	-0.124939
-0.289778	with another dynamic library	-0.124939
-0.552462	if the necessary library	-0.124939
-0.569512	-fno-builtin to get library	-0.124939
-0.354393	know that standard library	-0.124939
-0.457772	efforts in optimizing library	-0.124939
-0.978020	to a graphics library	-0.124939
-0.352186	of dynamically linked library	-0.124939
-0.392666	The user interface library	-0.124939
-0.392666	A user interface library	-0.124939
-0.392666	popular user interface library	-0.124939
-0.432731	a static link library	-0.124939
-0.236968	a dynamic link library	-0.124939
-0.102518	separate dynamic link library	-0.425969
-0.319541	AMD math core library	-0.124939
-0.319541	AMD Math core library	-0.124939
-0.350545	and well- tested library	-0.124939
-0.513571	Intel vector math library	-0.124939
-0.513571	short vector math library	-0.124939
-0.551904	makes the entire library	-0.124939
-0.343691	function call. Load library	-0.124939
-0.438799	time on executing library	-0.124939
-0.339177	functions. Time- consuming library	-0.124939
-0.180213	in the asmlib library	-0.124939
-0.027552	set, using asmlib library	-0.425969
-0.237658	a re- usable library	-0.124939
-0.237658	SVML. The IPP library	-0.124939
-0.237658	"Intel Performance Primitives" library	-0.124939
-0.823188	linear function of i	-0.124939
-0.719891	new value of i	-0.124939
-0.500871	binary value of i	-0.124939
-0.500871	negative value of i	-0.124939
-0.500871	initial value of i	-0.124939
-0.435933	sign bit of i	-0.124939
-0.356154	The conversion of i	-0.124939
-0.504963	adds this to i	-0.124939
-0.550577	< 0 and i	-0.124939
-0.358735	not noticed that i	-0.124939
-1.349085	{ a[i] = i	-0.124939
-0.460473	i++){ list[i] = i	-0.124939
-0.356440	; eax = i	-0.124939
-0.358621	$B1$2 label if i	-0.124939
-1.393959	the same as i	-0.124939
-0.598554	p is not i	-0.124939
-0.358404	doesn't work int i	-0.124939
-0.653620	positive number when i	-0.124939
-0.355794	is invalid when i	-0.124939
-0.866245	that it has i	-0.124939
-0.358030	to 15. If i	-0.124939
-0.565590	interval, for example i	-0.124939
-0.022269	i = 0; i	-1.238882
-0.003773	(i = 0; i	-1.539912
-0.577508	this by making i	-0.124939
-0.343846	of i ; i	-0.124939
-0.343846	for ( ; i	-0.124939
-0.523300	i++){ list[i] += i	-0.124939
-0.534303	takes to add i	-0.124939
-0.571482	The loop counter i	-0.124939
-0.013181	{ for (int i	-0.425969
-0.026774	0; for (int i	-0.124939
-0.013181	... for (int i	-0.425969
-0.005224	vectors: for (int i	-0.823909
-0.026774	floats for (int i	-0.124939
-0.026774	sum; for (int i	-0.124939
-0.026774	_alloca) for (int i	-0.124939
-0.352632	>= min && i	-0.124939
-0.352398	< 0 || i	-0.124939
-0.892013	i < 100; i	-0.425969
-0.518769	(i = 2; i	-0.124939
-0.350817	compiler has replaced i	-0.124939
-0.297714	order to divide i	-0.425969
-0.346229	the loop condition i	-0.124939
-1.580846	i < size; i	-0.124939
-0.034618	i < 256; i	-0.823909
-0.339194	can also eliminate i	-0.124939
-0.434793	The two comparisons i	-0.124939
-0.434633	Rather than comparing i	-0.124939
-0.483663	or by type-casting i	-0.124939
-0.325082	int s; 40 i	-0.124939
-0.490166	x = 2.0; i	-0.124939
-0.293950	(i = StringLength; i	-0.124939
-0.538400	i < 20; i	-0.124939
-0.659480	// everything is float	-0.124939
-0.895710	b is a float	-0.124939
-1.060729	integer to a float	-0.124939
-0.592381	variable by a float	-0.124939
-0.525577	vectorized, because a float	-0.124939
-0.358702	used. Conversions of float	-0.124939
-0.762275	of i to float	-0.124939
-0.143121	overflow Integer to float	-0.124939
-0.143121	discussion. Integer to float	-0.124939
-0.358544	= 100000001.23456. The float	-0.124939
-0.939079	Induction variables for float	-0.124939
-0.210337	4 bytes = float	-0.249877
-1.618023	} else { float	-0.124939
-0.855617	Exp(float x) { float	-0.124939
-0.096642	14.28 union { float	-0.124939
-0.096642	14.26 union { float	-0.124939
-0.096642	14.27 union { float	-0.124939
-0.096642	14.23 union { float	-0.124939
-0.096642	7.39 union { float	-0.124939
-0.096642	14.29 union { float	-0.124939
-0.096642	14.24 union { float	-0.124939
-0.096642	14.25 union { float	-0.124939
-0.879896	struct S1 { float	-0.124939
-1.480185	advantageous to use float	-0.124939
-0.499747	Efficient conversion from float	-0.124939
-0.718031	avoid conversions from float	-0.124939
-0.461583	x) { static float	-0.124939
-0.267685	b; static const float	-0.124939
-0.267685	__declspec(align(16)) static const float	-0.124939
-0.267685	boolb=0; static const float	-0.124939
-0.460693	out by 4 float	-0.124939
-0.547129	mode 8 8 float	-0.124939
-0.524700	SSE 128 bit float	-0.124939
-0.440703	AVX 256 bit float	-0.124939
-0.340806	set (128 bit float	-0.124939
-0.546235	table by 16 float	-0.124939
-0.781579	2 128 SSE2 float	-0.124939
-0.418822	100; int i; float	-0.124939
-0.591384	a[100]; int i; float	-0.124939
-0.418822	16; int i; float	-0.124939
-0.418822	1000; int i; float	-0.124939
-0.418822	matrix[rows][columns]; int i; float	-0.124939
-0.418822	7.19 int i; float	-0.124939
-0.355527	{} vector(float a, float	-0.124939
-0.354668	128 double 128 float	-0.124939
-0.354497	you get four float	-0.124939
-0.546223	of a list float	-0.124939
-0.577825	add_horizontal) static inline float	-0.124939
-0.352889	256 uint64_t 256 float	-0.124939
-0.352096	vector 56 public: float	-0.124939
-0.351804	xn = x; float	-0.124939
-0.326140	size = 100; float	-0.425969
-0.277869	ARRAYSIZE = 100; float	-0.124939
-0.516196	4 256 AVX2 float	-0.124939
-0.351212	example 8.15a were float	-0.124939
-0.350482	7.11 bool a; float	-0.124939
-0.096356	14.7 Don't mix float	-0.425969
-0.488461	example, to convert float	-0.124939
-0.345131	12.9a. Taylor series float	-0.124939
-0.441156	integer Register variables, float	-0.124939
-0.271790	4 float a[100]; float	-0.124939
-0.271790	list float a[100]; float	-0.124939
-0.697269	8 512 AVX512 float	-0.124939
-0.713454	size = 1000; float	-0.124939
-0.314349	// Example 14.6 float	-0.124939
-0.314349	// Example 7.1 float	-0.124939
-0.407659	columns = 8; float	-0.124939
-0.314115	+ 1.0f;} 66 float	-0.124939
-0.713334	int i, j; float	-0.124939
-0.314349	// Example 7.27 float	-0.124939
-0.314349	// Example 7.24 float	-0.124939
-0.314349	// Example 7.16 float	-0.124939
-0.537830	Common subexpression elimin., float	-0.124939
-0.382112	x; // x^2 float	-0.124939
-0.382112	89 int a[1000]; float	-0.124939
-0.382112	sum = 1.f; float	-0.124939
-0.293626	// Example 11.1a float	-0.124939
-0.293626	// Example 11.1b float	-0.124939
-0.293626	1.2; // Mixing float	-0.124939
-0.237389	// Example 8.3a float	-0.124939
-0.237389	// Example 7.29a float	-0.124939
-0.237389	the code mixes float	-0.124939
-0.237389	a * a;} float	-0.124939
-0.237389	of n floats: float	-0.124939
-0.237389	// Example 8.1b float	-0.124939
-0.237389	// Example 8.1a float	-0.124939
-0.237389	// Example 8.16 float	-0.124939
-0.237389	// Example 8.18 float	-0.124939
-0.237389	columns = 32; float	-0.124939
-0.237389	// Example 14.18a float	-0.124939
-0.237389	// Example 14.18b float	-0.124939
-0.237389	1./8.71782E10, 1./1.30767E12, 1./2.09227E13}; float	-0.124939
-0.237389	a[size], b[size], c[size]; float	-0.124939
-0.237389	// Example 7.26b float	-0.124939
-0.237389	// Example 7.26a float	-0.124939
-0.237389	columns = 50; float	-0.124939
-0.237389	in registers (8 float	-0.124939
-0.237389	// Example 14.2a float	-0.124939
-0.237389	// Example 14.2b float	-0.124939
-0.659133	to utilize the multiple	-0.124939
-0.358562	then merge the multiple	-0.124939
-0.358562	to combine the multiple	-0.124939
-0.560504	memory is a multiple	-0.124939
-0.560504	array is a multiple	-0.124939
-0.964824	matrix is a multiple	-0.124939
-0.195485	stride is a multiple	-0.425969
-0.195112	spaced by a multiple	-0.425969
-0.356288	array size a multiple	-0.124939
-0.356288	are spaced a multiple	-0.124939
-0.504354	is ported to multiple	-0.124939
-0.358454	7.38b. Alternative to multiple	-0.124939
-1.058407	the code in multiple	-0.124939
-0.708485	of code in multiple	-0.124939
-0.166747	critical code in multiple	-0.425969
-0.500898	entire program in multiple	-0.124939
-1.429765	are used in multiple	-0.124939
-0.355981	is compiled in multiple	-0.124939
-0.358658	Kernel Library. The multiple	-0.124939
-0.951207	be used for multiple	-0.124939
-0.461636	same array for multiple	-0.124939
-0.568842	Contain one or multiple	-0.124939
-0.356247	multiple platforms or multiple	-0.124939
-0.356247	called once or multiple	-0.124939
-1.372441	For example, if multiple	-0.124939
-0.525574	thread safe if multiple	-0.124939
-0.830704	is used by multiple	-0.124939
-0.357201	paralleli- zation by multiple	-0.124939
-0.810093	a CPU with multiple	-0.124939
-0.748030	a loop with multiple	-0.124939
-0.517646	utilize systems with multiple	-0.124939
-0.350164	A method with multiple	-0.124939
-0.350164	An expression with multiple	-0.124939
-0.492798	a computer with multiple	-0.124939
-0.350164	competition. Processors with multiple	-0.124939
-0.658968	is performed on multiple	-0.124939
-0.594976	cases such as multiple	-0.124939
-0.176897	functions that have multiple	-0.124939
-0.593425	way to use multiple	-0.124939
-0.534655	loop and use multiple	-0.124939
-0.353505	future. To use multiple	-0.124939
-0.358096	details. Inheritance from multiple	-0.124939
-0.358096	that CParent::Hello() has multiple	-0.124939
-0.594504	dispatching to make multiple	-0.124939
-0.746927	you may make multiple	-0.124939
-1.080772	This will make multiple	-0.124939
-0.462135	you can set multiple	-0.124939
-0.889157	is to do multiple	-0.124939
-0.316157	divide it into multiple	-0.124939
-0.409900	a function into multiple	-0.124939
-0.316157	of code into multiple	-0.124939
-0.682178	the data into multiple	-0.124939
-0.316157	split up into multiple	-0.124939
-0.098881	the work into multiple	-0.425969
-0.409900	the tasks into multiple	-0.124939
-0.316157	the job into multiple	-0.124939
-0.682178	be divided into multiple	-0.124939
-0.046785	is shared between multiple	-0.124939
-0.046785	and shared between multiple	-0.124939
-0.046785	be shared between multiple	-0.124939
-0.046785	are shared between multiple	-0.124939
-0.046785	not shared between multiple	-0.124939
-0.310363	that jump between multiple	-0.124939
-0.310363	is distributed between multiple	-0.124939
-0.310363	the workload between multiple	-0.124939
-0.356329	or mask out multiple	-0.124939
-0.546138	feature for making multiple	-0.124939
-0.355653	by semicolons, while multiple	-0.124939
-0.998024	order to avoid multiple	-0.124939
-0.691711	You may avoid multiple	-0.124939
-0.482312	may go through multiple	-0.124939
-0.342588	line separately through multiple	-0.124939
-0.433877	function is doing multiple	-0.124939
-0.306948	renaming and doing multiple	-0.124939
-0.642812	CPU from doing multiple	-0.124939
-0.306948	the CPU doing multiple	-0.124939
-0.353345	assignment. shared_ptr allows multiple	-0.124939
-0.353198	in parallel: Using multiple	-0.124939
-0.352977	term for running multiple	-0.124939
-0.351452	can automatically generate multiple	-0.124939
-0.350455	environment (IDE) supports multiple	-0.124939
-0.098543	operators for checking multiple	-0.425969
-0.449780	useful for testing multiple	-0.124939
-0.348002	is declared. Avoid multiple	-0.124939
-0.348115	CPU cores: Define multiple	-0.124939
-0.448891	possibility of compiling multiple	-0.124939
-0.347137	access patterns containing multiple	-0.124939
-0.545241	is to keep multiple	-0.124939
-0.255624	Example 14.7b. Testing multiple	-0.124939
-0.255624	Example 14.7a. Testing multiple	-0.124939
-0.314368	vector processing instructions, multiple	-0.124939
-0.382407	useful for supporting multiple	-0.124939
-0.237601	data processing. Running multiple	-0.124939
-0.237601	or for combining multiple	-0.124939
-0.237601	on access. Run multiple	-0.124939
-0.237601	you can toggle multiple	-0.124939
-0.880075	which of the two	-0.124939
-0.595027	order of the two	-0.425969
-0.590400	results of the two	-0.124939
-0.596952	identical for the two	-0.124939
-0.893804	recognize that the two	-0.124939
-0.593541	software because the two	-0.124939
-0.803846	difference between the two	-0.124939
-0.489779	transitions between the two	-0.124939
-0.356906	to mix the two	-0.124939
-0.655831	to keep the two	-0.124939
-0.461067	d); Now the two	-0.124939
-0.356906	may interleave the two	-0.124939
-0.356906	chains, namely the two	-0.124939
-1.271217	value that is two	-0.124939
-1.515989	the use of two	-0.124939
-0.845174	bit vector of two	-0.124939
-1.189642	the performance of two	-0.124939
-0.539499	on vectors of two	-0.124939
-0.907281	be used in two	-0.124939
-0.536904	also used in two	-0.124939
-0.523652	memory-hungry software in two	-0.124939
-0.356457	by compiling in two	-0.124939
-0.783641	classes defined in two	-0.124939
-0.356457	integer representations in two	-0.124939
-0.587881	1.0f; } The two	-0.124939
-0.358034	broken up. The two	-0.124939
-0.659443	C, specifying that two	-0.124939
-1.285399	There may be two	-0.124939
-0.555042	cases, there are two	-0.124939
-0.555042	Typically, there are two	-0.124939
-0.494944	code. There are two	-0.124939
-0.494944	address. There are two	-0.124939
-0.494944	threads. There are two	-0.124939
-0.494944	maintenance There are two	-0.124939
-0.494944	way: There are two	-0.124939
-0.461790	22 one or two	-0.124939
-0.461790	units, one or two	-0.124939
-0.355166	bits each, or two	-0.124939
-0.355166	for AVX2, or two	-0.124939
-0.502829	the loop by two	-0.124939
-0.355917	you unroll by two	-0.124939
-0.357087	a table with two	-0.124939
-0.461297	be calculated with two	-0.124939
-0.356884	256-bit vector as two	-0.124939
-0.461038	fact represented as two	-0.124939
-0.585333	go more than two	-0.124939
-0.594338	integer rather than two	-0.124939
-0.561165	good to have two	-0.124939
-0.859172	that you have two	-0.124939
-0.530297	modern CPUs have two	-0.124939
-0.351719	x86 family have two	-0.124939
-1.207692	a program has two	-0.124939
-0.355073	example 8.23b has two	-0.124939
-0.891704	common to make two	-0.124939
-0.569661	tool can make two	-0.124939
-0.457957	nearest integer. If two	-0.124939
-0.354458	is considerable. If two	-0.124939
-1.313064	possible to do two	-0.124939
-0.839031	takes to do two	-0.124939
-0.352866	read operations into two	-0.124939
-0.352866	was split into two	-0.124939
-0.585222	give the variable two	-0.124939
-1.119366	the difference between two	-0.124939
-0.329760	that select between two	-0.124939
-0.040323	that chooses between two	-0.124939
-0.084777	program chooses between two	-0.124939
-0.538911	go one way two	-0.124939
-0.778443	for the first two	-0.124939
-0.535280	that the first two	-0.124939
-0.555644	mode. The first two	-0.124939
-0.483875	Which of these two	-0.124939
-0.483875	combination of these two	-0.124939
-0.425878	function and these two	-0.124939
-0.328986	to distinguish these two	-0.124939
-0.560694	capable of making two	-0.124939
-0.342382	2; } These two	-0.124939
-0.342382	Addison-Wesley, 1996. These two	-0.124939
-0.476097	it is doing two	-0.124939
-0.476097	used for doing two	-0.124939
-0.402030	able to run two	-0.124939
-0.402030	try to run two	-0.124939
-0.549090	ebx. The next two	-0.124939
-0.062250	of (2,2,2,2,2,2,2,2) __m128i two	-0.425969
-0.648080	to avoid running two	-0.124939
-0.352774	instruction set. Make two	-0.124939
-0.321618	done with just two	-0.124939
-0.321618	to have just two	-0.124939
-0.351106	This would require two	-0.124939
-0.348838	It doesn't prevent two	-0.124939
-0.094330	macro to swap two	-0.124939
-0.287358	x for approximately two	-0.124939
-0.287358	be accessed approximately two	-0.124939
-0.339142	times. Then again two	-0.124939
-0.467124	us to compare two	-0.124939
-0.314397	an addition. Comparing two	-0.124939
-0.293894	error condition. Replacing two	-0.124939
-0.237625	registers and correspondingly two	-0.124939
-0.237625	space by allowing two	-0.124939
-0.591792	class of the object	-0.124939
-1.550298	size of the object	-0.124939
-1.388906	beginning of the object	-0.124939
-1.518049	pointer to the object	-0.124939
-0.885487	declaration and the object	-0.124939
-0.597292	name in the object	-0.124939
-0.594787	destructor for the object	-0.124939
-0.586141	block that the object	-0.124939
-0.586141	feature that the object	-0.124939
-0.199331	needed if the object	-0.124939
-0.857419	fail if the object	-0.124939
-0.593702	compatible on the object	-0.124939
-0.854933	called when the object	-0.124939
-0.577314	released when the object	-0.124939
-0.590174	compilers at the object	-0.124939
-0.591550	object. If the object	-0.124939
-0.585599	determined where the object	-0.124939
-0.354981	virtual member the object	-0.124939
-0.354981	single register the object	-0.124939
-1.271555	make sure the object	-0.124939
-0.569645	compile-time whether the object	-0.124939
-0.354981	no destructor the object	-0.124939
-0.354981	may move the object	-0.124939
-0.354981	and deleting the object	-0.124939
-0.354981	by constructing the object	-0.124939
-0.354981	are met: the object	-0.124939
-0.461406	what class of object	-0.124939
-0.592400	release version of object	-0.124939
-1.161708	the type of object	-0.124939
-0.848589	the advantages of object	-0.124939
-0.065582	negative effects of object	-0.425969
-0.502958	and classes. The object	-0.124939
-0.357455	to another. The object	-0.124939
-0.357455	about division). The object	-0.124939
-0.536771	Any array or object	-0.124939
-0.077330	a variable or object	-0.124939
-0.171479	no variable or object	-0.124939
-0.504400	libraries distributed as object	-0.124939
-0.548527	pointer is an object	-0.124939
-0.550776	type of an object	-0.124939
-0.169662	points to an object	-0.425969
-0.172719	together in an object	-0.425969
-0.423419	dictates that an object	-0.124939
-0.492186	called on an object	-0.124939
-0.545452	way as an object	-0.124939
-0.327018	every time an object	-0.124939
-0.333233	may use an object	-0.124939
-0.333233	then use an object	-0.124939
-0.460976	to such an object	-0.124939
-0.598241	to access an object	-0.124939
-0.133430	to accessing an object	-0.124939
-0.133430	when accessing an object	-0.124939
-0.423419	called whenever an object	-0.124939
-0.327018	pointer. Accessing an object	-0.124939
-0.327018	function construct an object	-0.124939
-1.145623	of the data object	-0.124939
-0.553082	access the data object	-0.124939
-0.358042	Supports three different object	-0.124939
-1.534970	to the same object	-0.124939
-0.539673	block from one object	-0.124939
-0.881328	sure that no object	-0.124939
-0.997737	size of each object	-0.124939
-0.504244	destructors of each object	-0.124939
-0.345119	to store each object	-0.124939
-0.345119	than moving each object	-0.124939
-0.548560	to a static object	-0.124939
-0.589654	adding the first object	-0.124939
-0.532104	for a new object	-0.124939
-0.896100	time a new object	-0.124939
-0.331187	mentioned above. An object	-0.124939
-0.331187	is declared. An object	-0.124939
-0.331187	7.22 Inheritance An object	-0.124939
-0.599548	into a single object	-0.425969
-1.130231	class or structure object	-0.124939
-0.370739	compile the shared object	-0.124939
-0.108058	of a shared object	-0.124939
-0.082304	in a shared object	-0.301030
-0.108058	compile a shared object	-0.124939
-0.093928	libraries. A shared object	-0.124939
-0.093928	above. A shared object	-0.124939
-0.286278	a 64-bit shared object	-0.124939
-0.213941	very large shared object	-0.124939
-0.354253	// makes intermediate object	-0.124939
-0.352195	following reasons: Each object	-0.124939
-0.224641	and the local object	-0.124939
-0.224641	make the local object	-0.124939
-0.346225	main reasons why object	-0.124939
-0.501989	of a temporary object	-0.124939
-0.594904	programming textbooks recommend object	-0.124939
-0.325228	to a contained object	-0.124939
-0.458439	that the original object	-0.124939
-0.212165	one. The existing object	-0.124939
-0.284182	modify an existing object	-0.124939
-0.294015	commercial compilers. Mixing object	-0.124939
-0.237731	of the usual object	-0.124939
-0.570971	n is the number	-0.124939
-0.570971	r is the number	-0.124939
-1.181566	bits of the number	-0.124939
-1.259687	equal to the number	-0.124939
-0.564541	processors and the number	-0.124939
-0.564541	branches and the number	-0.124939
-0.564541	flow and the number	-0.124939
-0.584193	calculate that the number	-0.124939
-0.584193	expected that the number	-0.124939
-0.561172	array or the number	-0.124939
-1.180266	faster if the number	-0.124939
-0.838524	example, if the number	-0.124939
-0.568578	complicated if the number	-0.124939
-0.568578	minimized if the number	-0.124939
-0.573512	and by the number	-0.124939
-1.433089	divisible by the number	-0.124939
-1.515118	depends on the number	-0.124939
-1.003783	such as the number	-0.124939
-0.828616	more than the number	-0.124939
-0.828616	priority than the number	-0.124939
-0.852156	useful when the number	-0.124939
-0.575847	negligible when the number	-0.124939
-1.704254	to make the number	-0.124939
-0.710023	code. If the number	-0.124939
-0.494885	used. If the number	-0.124939
-0.710023	cache. If the number	-0.124939
-0.494885	problem. If the number	-0.124939
-0.494885	www.agner.org/optimize/cppexamples.zip. If the number	-0.124939
-0.494885	time? If the number	-0.124939
-0.456328	would double the number	-0.124939
-0.506971	set where the number	-0.124939
-1.071638	cases where the number	-0.124939
-0.730038	situations where the number	-0.124939
-0.570271	distinguishing between the number	-0.124939
-1.068154	of making the number	-0.124939
-0.543342	critical. Therefore, the number	-0.124939
-0.518836	to reduce the number	-0.124939
-0.496985	of reducing the number	-0.124939
-0.648434	that measures the number	-0.124939
-0.788776	present manual is number	-0.124939
-1.068730	used in a number	-0.124939
-0.305152	There are a number	-0.425969
-0.859862	available from a number	-0.124939
-0.357580	you want a number	-0.124939
-0.556575	application program. The number	-0.124939
-0.871684	is available. The number	-0.124939
-0.458526	the system. The number	-0.124939
-0.354906	bit systems: The number	-0.124939
-0.651859	by 8. The number	-0.124939
-0.354906	is small. The number	-0.124939
-0.354906	overlap. 27 The number	-0.124939
-0.065507	= 512; // number	-0.425969
-0.356587	= 64; // number	-0.124939
-0.568240	a to this number	-0.124939
-0.559002	always use this number	-0.124939
-0.453038	a floating point number	-0.221849
-0.779102	nonzero floating point number	-0.124939
-0.543905	belong to set number	-0.124939
-0.353610	lines in set number	-0.124939
-1.107767	of a variable number	-0.124939
-0.538427	time. A variable number	-0.124939
-0.759919	as a 32-bit number	-0.124939
-1.033414	a very large number	-0.124939
-0.342893	address of element number	-0.124939
-0.342893	we reach element number	-0.124939
-0.499811	have the line number	-0.124939
-0.353849	finished. The optimal number	-0.124939
-0.353204	a false model number	-0.124939
-0.789667	with a higher number	-0.124939
-0.131022	a large positive number	-0.124939
-0.131022	very large positive number	-0.124939
-0.459192	only a limited number	-0.124939
-0.404220	storage A limited number	-0.124939
-0.189725	27). The maximum number	-0.124939
-0.189725	weekdays. The maximum number	-0.124939
-0.189725	67 The maximum number	-0.124939
-0.348172	have a reduced number	-0.124939
-0.298787	when the total number	-0.124939
-0.124190	If the total number	-0.425969
-0.341655	CPUs have family number	-0.124939
-0.341612	useful for random number	-0.124939
-0.479591	get a realistic number	-0.124939
-0.047233	into an excessive number	-0.124939
-0.047233	avoid an excessive number	-0.124939
-0.047233	Avoid an excessive number	-0.124939
-0.331626	by the 107 number	-0.124939
-0.102809	for an increasing number	-0.124939
-0.102809	seeing an increasing number	-0.124939
-0.325288	with an extended number	-0.124939
-0.325122	a valid 63 number	-0.124939
-0.314494	was an odd number	-0.124939
-0.143325	18 will evict number	-0.124939
-0.143325	17 will evict number	-0.124939
-0.237706	simultaneous lookups Max. number	-0.124939
-0.237706	get an integral number	-0.124939
-0.577643	variable with the static	-0.124939
-0.577643	declared with the static	-0.124939
-1.801804	to use the static	-0.124939
-0.595249	functions because the static	-0.124939
-0.572177	and without the static	-0.124939
-0.539957	may add the static	-0.124939
-1.177082	reference to a static	-0.124939
-1.207763	than in a static	-0.124939
-0.863349	data in a static	-0.124939
-1.430696	stored in a static	-0.124939
-0.583480	DLL or a static	-0.124939
-0.585726	resources than a static	-0.124939
-1.445800	The advantage of static	-0.124939
-0.357774	The mechanism of static	-0.124939
-0.462170	thread-local storage of static	-0.124939
-0.882223	the behavior of static	-0.124939
-0.463452	efficient. Access to static	-0.124939
-0.357705	all public and static	-0.124939
-0.462082	Avoid global and static	-0.124939
-0.357705	ABC 123 and static	-0.124939
-0.723723	the table in static	-0.124939
-0.810831	be stored in static	-0.124939
-1.190794	are stored in static	-0.124939
-0.461076	Storing something in static	-0.124939
-0.573243	inlined function. The static	-0.124939
-0.501062	the memory. The static	-0.124939
-0.523124	same class. The static	-0.124939
-0.501062	other module. The static	-0.124939
-0.356098	function tables. The static	-0.124939
-0.659403	is clear that static	-0.124939
-0.357324	functions inline or static	-0.124939
-0.357324	returns. Global or static	-0.124939
-0.883913	but not if static	-0.124939
-0.578735	should rely on static	-0.124939
-0.355317	you want as static	-0.124939
-0.106516	implemented either as static	-0.425969
-0.356462	more efficiently than static	-0.124939
-0.547593	libraries slower than static	-0.124939
-0.876473	(int x) { static	-0.124939
-0.537050	Func () { static	-0.124939
-1.413946	is to use static	-0.124939
-0.355851	should never use static	-0.124939
-0.358217	executable file when static	-0.124939
-0.151376	member functions. A static	-0.425969
-0.352917	the stack. A static	-0.124939
-0.488285	constant data from static	-0.124939
-0.346909	the table from static	-0.124939
-0.346909	entire list from static	-0.124939
-0.346909	functions linked from static	-0.124939
-0.346909	is copied from static	-0.124939
-0.358104	caching problems because static	-0.124939
-0.499091	declare all functions static	-0.124939
-0.856751	make member functions static	-0.124939
-0.578958	used for all static	-0.124939
-1.050553	advantage of using static	-0.124939
-1.050553	advantages of using static	-0.124939
-1.025218	improved by using static	-0.124939
-0.656869	the local object static	-0.124939
-0.350731	static static static static	-0.124939
-0.243415	module static static static	-0.124939
-0.346644	same module static static	-0.124939
-0.014304	vector from array static	-0.726999
-0.014304	vector into array static	-0.726999
-0.357314	member functions, where static	-0.124939
-0.459436	will align large static	-0.124939
-1.152368	a; int b; static	-0.124939
-0.717119	do not support static	-0.124939
-0.525919	available in both static	-0.124939
-0.335225	work with both static	-0.124939
-0.364474	add the keyword static	-0.124939
-0.364474	Add the keyword static	-0.124939
-0.352069	system, this requires static	-0.124939
-0.382462	powN { public: static	-0.124939
-0.382462	powN<true,0> { public: static	-0.124939
-0.382462	powN<true,N> { public: static	-0.124939
-0.382462	powN<true,1> { public: static	-0.124939
-0.348008	by making them static	-0.124939
-0.508629	feature. This includes static	-0.124939
-0.729613	from same module static	-0.124939
-0.517683	14.3b int n; static	-0.124939
-0.429220	recommended to specify static	-0.124939
-0.324953	x) { __declspec(align(16)) static	-0.124939
-0.594860	template <int N> static	-0.124939
-0.382361	141 #include <emmintrin.h> static	-0.124939
-0.293830	// Example 14.19 static	-0.124939
-0.293830	if statements (called static	-0.124939
-0.538188	Table of factorials: static	-0.124939
-0.293830	If the word static	-0.124939
-0.237568	7.29b floata; boolb=0; static	-0.124939
-0.237568	template <typename T> static	-0.124939
-0.237568	a cache line: static	-0.124939
-0.237568	{ return _mm_cvtss_si32(_mm_load_ss(&x));} static	-0.124939
-0.237568	the function add_horizontal) static	-0.124939
-0.600412	space of the 64-bit	-0.124939
-0.600048	alleviated in the 64-bit	-0.124939
-0.598586	certain that the 64-bit	-0.124939
-0.594540	do use the 64-bit	-0.124939
-1.058609	mode because the 64-bit	-0.124939
-0.561492	_mm_empty() after the 64-bit	-0.124939
-0.556459	VIA including the 64-bit	-0.124939
-0.373963	bits of a 64-bit	-0.124939
-0.584000	systems and a 64-bit	-0.124939
-1.189052	variables in a 64-bit	-0.124939
-1.349308	by using a 64-bit	-0.124939
-0.461526	you write a 64-bit	-0.124939
-0.736933	take advantage of 64-bit	-0.124939
-1.101672	The disadvantage of 64-bit	-0.124939
-0.357829	heavy marketing of 64-bit	-0.124939
-0.358834	of portability to 64-bit	-0.124939
-0.074482	and 32-bit and 64-bit	-0.124939
-0.035646	for 32-bit and 64-bit	-0.124939
-0.074482	between 32-bit and 64-bit	-0.124939
-0.074482	support 32-bit and 64-bit	-0.124939
-0.074482	both 32-bit and 64-bit	-0.124939
-0.074482	supports 32-bit and 64-bit	-0.124939
-0.035646	Supports 32-bit and 64-bit	-0.425969
-0.074482	including 32-bit and 64-bit	-0.124939
-0.074482	16-bit, 32-bit and 64-bit	-0.124939
-0.535197	32-bit integers and 64-bit	-0.124939
-0.042205	for 32- and 64-bit	-0.124939
-0.088958	Supports 32- and 64-bit	-0.124939
-0.100960	Linux than in 64-bit	-0.301030
-1.008947	be used in 64-bit	-0.124939
-0.134442	more efficient in 64-bit	-0.301030
-0.488008	slightly faster in 64-bit	-0.124939
-0.243449	in registers in 64-bit	-0.124939
-0.243449	integer registers in 64-bit	-0.124939
-0.543979	64 bits in 64-bit	-0.124939
-0.960471	are available in 64-bit	-0.124939
-0.509400	8 bytes in 64-bit	-0.124939
-0.543979	when running in 64-bit	-0.124939
-0.140392	not needed in 64-bit	-0.425969
-0.346709	be avoided in 64-bit	-0.124939
-0.346709	by default in 64-bit	-0.124939
-0.346709	as follows in 64-bit	-0.124939
-0.346709	and fourteen in 64-bit	-0.124939
-0.346709	available, i.e. in 64-bit	-0.124939
-0.064231	and sixteen in 64-bit	-0.124939
-0.346709	always enabled in 64-bit	-0.124939
-0.346709	much simpler in 64-bit	-0.124939
-0.346709	default anyway in 64-bit	-0.124939
-0.346709	not recognized in 64-bit	-0.124939
-0.358078	future. 6 The 64-bit	-0.124939
-0.462557	itself is. The 64-bit	-0.124939
-0.567339	Only available for 64-bit	-0.124939
-0.824295	programs compiled for 64-bit	-0.124939
-0.576176	inherent support for 64-bit	-0.124939
-0.655536	device drivers for 64-bit	-0.124939
-0.356328	in 32-bit or 64-bit	-0.124939
-0.356328	16-bit systems or 64-bit	-0.124939
-0.356328	segments (32-bit or 64-bit	-0.124939
-1.574323	more efficient than 64-bit	-0.124939
-0.886064	necessary to use 64-bit	-0.124939
-0.580399	we can use 64-bit	-0.124939
-0.964663	you may use 64-bit	-0.124939
-0.358157	been increased from 64-bit	-0.124939
-0.358041	in fact only 64-bit	-0.124939
-0.852145	supported on all 64-bit	-0.124939
-0.461753	split into two 64-bit	-0.124939
-0.329119	be 2 In 64-bit	-0.124939
-0.602174	less efficient. In 64-bit	-0.124939
-0.329119	64-bit Windows. In 64-bit	-0.124939
-0.329119	register parameters. In 64-bit	-0.124939
-0.329119	clock cycle. In 64-bit	-0.124939
-0.329119	usually 32. In 64-bit	-0.124939
-0.538724	unsigned 4 4 64-bit	-0.124939
-0.356103	for speeding up 64-bit	-0.124939
-0.355945	bit mode. Some 64-bit	-0.124939
-0.343190	library files. Use 64-bit	-0.124939
-0.343190	floating point. Use 64-bit	-0.124939
-0.353790	point numbers. Therefore, 64-bit	-0.124939
-0.803431	is more efficient. 64-bit	-0.124939
-0.449926	transferred in registers. 64-bit	-0.124939
-0.347192	will use full 64-bit	-0.124939
-0.635077	you can expect 64-bit	-0.124939
-0.795181	for internal references. 64-bit	-0.124939
-0.506923	you to define 64-bit	-0.124939
-0.875776	pointer or reference, 64-bit	-0.124939
-0.331626	edx, respectively. (In 64-bit	-0.124939
-0.325122	263-1 int64_t 29 64-bit	-0.124939
-0.293987	compiler: unsigned __int64 64-bit	-0.124939
-0.237706	in registers, whereas 64-bit	-0.124939
-0.237706	conventions are different. 64-bit	-0.124939
-0.460435	natural order and there	-0.124939
-0.356410	is limited and there	-0.124939
-0.356410	doesn't matter and there	-0.124939
-0.356410	are common, and there	-0.124939
-0.356410	memory areas, and there	-0.124939
-0.356410	are dominating and there	-0.124939
-0.529073	times and that there	-0.124939
-0.590768	operations so that there	-0.124939
-0.929495	a way that there	-0.124939
-0.446281	can assume that there	-0.124939
-0.493831	the feature that there	-0.124939
-0.565350	1. Note that there	-0.124939
-0.064776	be aware that there	-0.124939
-0.453456	have discovered that there	-0.124939
-0.350907	more complex, that there	-0.124939
-0.350907	you discover that there	-0.124939
-0.557537	many elements are there	-0.124939
-0.564395	out or if there	-0.124939
-0.335390	the program if there	-0.124939
-1.019278	be used if there	-0.124939
-0.335390	code cache if there	-0.124939
-0.472414	signed integer if there	-0.124939
-0.102700	function pointers if there	-0.425969
-0.768787	separate thread if there	-0.124939
-0.335390	32-bit programs if there	-0.124939
-0.335390	becomes bigger if there	-0.124939
-0.335390	becomes smaller if there	-0.124939
-0.298670	be safe if there	-0.124939
-0.298670	exception safe if there	-0.124939
-0.335390	the exponent if there	-0.124939
-0.509890	consuming, especially if there	-0.124939
-0.493034	will consider if there	-0.124939
-0.335390	taken, i.e. if there	-0.124939
-0.335390	object separately if there	-0.124939
-0.335390	multiple accumulators if there	-0.124939
-0.335390	function calls, if there	-0.124939
-0.358528	of convenience - there	-0.124939
-0.655108	more RAM than there	-0.124939
-0.356543	memory blocks than there	-0.124939
-0.756655	more efficient when there	-0.124939
-0.355809	is critical when there	-0.124939
-0.237025	CPU time then there	-0.124939
-0.237025	compile time then there	-0.124939
-0.334565	in memory then there	-0.124939
-0.491849	member functions then there	-0.124939
-0.334565	by two then there	-0.124939
-0.334565	the performance then there	-0.124939
-0.334565	an error then there	-0.124939
-0.334565	_endthread(), etc. then there	-0.124939
-0.612443	is used, then there	-0.124939
-0.334565	is elsewhere then there	-0.124939
-0.351708	behave differently because there	-0.124939
-0.351708	is negligible because there	-0.124939
-0.454470	particularly problematic because there	-0.124939
-0.343985	32-bit mode. If there	-0.124939
-0.138790	and again. If there	-0.124939
-0.138790	back again. If there	-0.124939
-0.343985	is running. If there	-0.124939
-0.343985	to calculate. If there	-0.124939
-0.448570	superfluous code, but there	-0.124939
-0.347044	double precision, but there	-0.124939
-0.347044	small devices, but there	-0.124939
-0.347044	data bases, but there	-0.124939
-0.351444	vector operations where there	-0.124939
-0.645026	cases, however, where there	-0.124939
-0.357240	a parameter, so there	-0.124939
-1.069271	In this case there	-0.124939
-0.427882	member function. But there	-0.124939
-0.330589	smart pointer. But there	-0.124939
-0.330589	unsigned integers. But there	-0.124939
-0.354799	program and whether there	-0.124939
-0.283905	as possible. However, there	-0.124939
-0.283905	most critical. However, there	-0.124939
-0.283905	set. 120 However, there	-0.124939
-0.283905	works automatically. However, there	-0.124939
-0.283905	they are. However, there	-0.124939
-0.316264	by default unless there	-0.124939
-0.316264	large object, unless there	-0.124939
-0.316264	loop manually unless there	-0.124939
-0.946062	In most cases, there	-0.124939
-0.474709	in some cases, there	-0.124939
-0.396732	In some cases, there	-0.124939
-0.456474	are simply put there	-0.124939
-0.351550	S1 ArrayOfStructures[100]; Here, there	-0.124939
-1.004142	the reason why there	-0.124939
-0.344969	time. (Of course there	-0.124939
-0.632454	registers are used, there	-0.124939
-0.536922	At the diagonal there	-0.124939
-0.625903	In 64-bit systems, there	-0.124939
-0.488879	many cases, however, there	-0.124939
-0.527191	code. In general, there	-0.124939
-0.331626	not all. Fortunately, there	-0.124939
-0.487633	functions, but unfortunately there	-0.124939
-0.237815	execution units. Typically, there	-0.124939
-0.237815	memory caches. Typically, there	-0.124939
-0.594708	set is enabled there	-0.124939
-0.382555	cannot be avoided, there	-0.124939
-1.891706	version of the C++	-0.124939
-0.201891	drawbacks of the C++	-0.124939
-0.591026	knowledge of the C++	-0.124939
-0.597796	standards for the C++	-0.124939
-0.595594	problem with the C++	-0.124939
-0.548786	languages. But the C++	-0.124939
-0.462021	hidden behind the C++	-0.124939
-0.357657	complicated? Because the C++	-0.124939
-0.897228	declared in a C++	-0.124939
-0.358168	processors. In a C++	-0.124939
-0.562629	operators. Make a C++	-0.124939
-0.592127	future version of C++	-0.124939
-0.050095	on optimization of C++	-0.602060
-0.845327	important disadvantage of C++	-0.124939
-0.847299	the advantages of C++	-0.124939
-0.932779	different brands of C++	-0.124939
-0.356460	and maintainability of C++	-0.124939
-0.995419	for Windows and C++	-0.124939
-0.358196	in C and C++	-0.124939
-0.554427	Virtual functions in C++	-0.124939
-0.757093	Optimizing software in C++	-0.124939
-0.142680	of errors in C++	-0.124939
-0.355983	parallel processing in C++	-0.124939
-0.459893	not allowed in C++	-0.124939
-0.355983	old-fashioned. Development in C++	-0.124939
-0.523156	connections, etc. The C++	-0.124939
-0.356120	Type conversions The C++	-0.124939
-0.460067	is needed. The C++	-0.124939
-0.501092	compilers work. The C++	-0.124939
-0.460067	allocated resource. The C++	-0.124939
-0.504650	preference is for C++	-0.124939
-1.203156	the sense that C++	-0.124939
-0.659190	type casting // C++	-0.124939
-0.171808	with C or C++	-0.124939
-0.171808	separate C or C++	-0.124939
-0.171808	either C or C++	-0.124939
-0.142808	Technical Report on C++	-0.124939
-0.142808	"Technical Report on C++	-0.124939
-0.884245	languages such as C++	-0.124939
-0.356874	well developed as C++	-0.124939
-0.462756	these disadvantages when C++	-0.124939
-0.358078	n! 117 A C++	-0.124939
-0.103947	efficiency of different C++	-0.124939
-0.513599	optimizations in different C++	-0.124939
-0.041247	conventions for different C++	-0.903090
-0.874398	are several different C++	-0.124939
-0.851912	work on all C++	-0.124939
-0.357729	code. Furthermore, most C++	-0.124939
-0.515896	with the Intel C++	-0.124939
-0.515896	See the Intel C++	-0.124939
-0.486302	features of Intel C++	-0.124939
-0.428017	Included with Intel C++	-0.124939
-0.330697	PathScale compilers. Intel C++	-0.124939
-0.330697	5, 2009). Intel C++	-0.124939
-0.330697	v. 2.00. Intel C++	-0.124939
-0.357674	the vectors into C++	-0.124939
-0.357422	need it. In C++	-0.124939
-0.556839	libraries. The Gnu C++	-0.124939
-0.344522	IA-32/Intel64, 2009. Gnu C++	-0.124939
-0.719175	implemented in compiled C++	-0.124939
-0.354812	that produces another C++	-0.124939
-0.354137	optimization options All C++	-0.124939
-0.354002	major platforms. However, C++	-0.124939
-0.332744	less efficient. Most C++	-0.124939
-0.332744	low-level optimizations. Most C++	-0.124939
-0.330605	Intel and Microsoft C++	-0.124939
-0.330605	were tested: Microsoft C++	-0.124939
-0.350882	tips on advanced C++	-0.124939
-0.350055	with most modern C++	-0.124939
-0.510036	optimized function libraries. C++	-0.124939
-0.299566	all platforms. PathScale C++	-0.124939
-0.299566	(Red Hat). PathScale C++	-0.124939
-0.511334	need assembly language. C++	-0.124939
-0.383663	with the Borland C++	-0.124939
-0.268921	Studio 2005). Borland C++	-0.124939
-0.325003	for several reasons. C++	-0.124939
-0.325003	C++ language While C++	-0.124939
-0.212079	be tolerated. PGI C++	-0.124939
-0.212079	3.1, 2007. PGI C++	-0.124939
-0.325116	written in C, C++	-0.124939
-0.314377	in a well-structured C++	-0.124939
-0.293876	on page 15. C++	-0.124939
-0.293876	known in 36 C++	-0.124939
-0.293876	to security. Standard C++	-0.124939
-0.293876	optimization. 14 Portability C++	-0.124939
-0.237609	best optimizer. Borland/CodeGear/Embarcadero C++	-0.124939
-0.237609	regularly. Intel: "Intel® C++	-0.124939
-0.237609	C++ 5.82 (Embarcadero/CodeGear/Borland C++	-0.124939
-1.202299	where it is also	-0.124939
-1.161898	cases it is also	-0.124939
-0.580436	performance, it is also	-0.124939
-1.639410	the function is also	-0.124939
-0.844084	The function is also	-0.124939
-0.593089	simultaneously. This is also	-0.124939
-1.115548	code. It is also	-0.124939
-0.578167	optimization. It is also	-0.124939
-0.578167	automatically. It is also	-0.124939
-0.517586	static memory is also	-0.124939
-0.747928	member functions is also	-0.124939
-0.552468	However, C++ is also	-0.124939
-0.646751	do so is also	-0.124939
-0.141372	is allocated is also	-0.124939
-0.141372	been allocated is also	-0.124939
-0.455246	output option is also	-0.124939
-0.774839	present manual is also	-0.124939
-0.550297	& operator is also	-0.124939
-0.246390	This mechanism is also	-0.124939
-0.246390	unwinding mechanism is also	-0.124939
-0.352320	target buffer is also	-0.124939
-0.352320	versatile. Fortran is also	-0.124939
-0.352320	or while-loop is also	-0.124939
-0.358708	C++ programs and also	-0.124939
-0.527008	another loop that also	-0.124939
-0.740196	the program are also	-0.124939
-0.064554	each other are also	-0.425969
-0.492324	vectors. There are also	-0.124939
-0.492324	8. There are also	-0.124939
-0.492324	post-increment. There are also	-0.124939
-0.492324	uses. There are also	-0.124939
-0.492324	point). There are also	-0.124939
-0.837003	allocated objects are also	-0.124939
-0.566462	purpose libraries are also	-0.124939
-0.513020	library files are also	-0.124939
-0.640615	used together are also	-0.124939
-0.349194	special purposes are also	-0.124939
-1.017170	but it can also	-0.124939
-1.245586	The compiler can also	-0.124939
-0.583248	Here, you can also	-0.124939
-0.370018	Linux. It can also	-0.124939
-0.370018	numbers. It can also	-0.124939
-0.370018	list[i].b. It can also	-0.124939
-0.493252	induction variables can also	-0.124939
-0.707346	A branch can also	-0.124939
-0.447612	operating systems can also	-0.124939
-0.346286	Container classes can also	-0.124939
-0.346286	template parameter can also	-0.124939
-0.697843	A union can also	-0.124939
-0.634944	periodic pattern can also	-0.124939
-0.346286	far (arrays can also	-0.124939
-0.358645	of time, it also	-0.124939
-0.358597	the latter function also	-0.124939
-0.356835	if possible. This also	-0.124939
-0.356835	page 137). This also	-0.124939
-1.543090	then you may also	-0.124939
-0.547545	Mbytes. There may also	-0.124939
-0.531567	future we may also	-0.124939
-0.352560	hash map may also	-0.124939
-0.358203	= OneOrTwo5[b!=0]; will also	-0.124939
-0.501897	mathematical functions. It also	-0.124939
-0.355538	see this. It also	-0.124939
-0.332878	under test but also	-0.124939
-0.489427	programmers' time, but also	-0.124939
-0.332878	many features, but also	-0.124939
-0.332878	programming languages, but also	-0.124939
-0.332878	hot spot but also	-0.124939
-0.332878	from main, but also	-0.124939
-0.332878	type casting, but also	-0.124939
-0.332878	Mac platform, but also	-0.124939
-0.862919	a function should also	-0.124939
-0.349977	or video should also	-0.124939
-0.349977	unattended. Uninstallation should also	-0.124939
-0.893532	AVX2 instruction set also	-0.124939
-0.539865	constant. The compilers also	-0.124939
-0.306723	for. Some systems also	-0.124939
-0.306723	card. Some systems also	-0.124939
-0.580635	some of these also	-0.124939
-0.581744	allocation. This method also	-0.124939
-0.537570	The register stack also	-0.124939
-0.929850	the C++ language also	-0.124939
-0.355238	here about Linux also	-0.124939
-0.354662	about increment operators also	-0.124939
-1.409508	Dynamic memory allocation also	-0.124939
-0.456355	CPU. These methods also	-0.124939
-0.752846	The static keyword also	-0.124939
-0.352389	inlining. Reducible expressions also	-0.124939
-0.928386	of the STL also	-0.124939
-0.348268	from www.intel.com. (See also	-0.124939
-0.448837	table and possibly also	-0.124939
-1.006331	is of course also	-0.124939
-0.524325	eliminated. Loop unrolling also	-0.124939
-0.344979	a coprocessor might also	-0.124939
-0.345017	called by F1 also	-0.124939
-0.336048	(YMM), and soon also	-0.124939
-0.037141	dynamic link libraries, also	-0.425969
-0.237682	It is 102 also	-0.124939
-0.525997	useful source of such	-0.124939
-0.462523	The consequence of such	-0.124939
-0.358052	The absence of such	-0.124939
-1.384595	or reference to such	-0.124939
-0.504264	performance costs to such	-0.124939
-0.503471	viable solution in such	-0.124939
-0.832553	is supported in such	-0.124939
-0.357822	often reorganized in such	-0.124939
-0.461523	most efficient for such	-0.124939
-0.461523	explicit checks for such	-0.124939
-0.357265	compiler warning for such	-0.124939
-0.525428	theoretical possibility that such	-0.124939
-0.656442	only hope that such	-0.124939
-0.357213	to realize that such	-0.124939
-1.285330	a member function such	-0.124939
-0.357297	the user if such	-0.124939
-0.357297	to gain if such	-0.124939
-0.353987	resource use on such	-0.124939
-0.353987	applications even on such	-0.124939
-0.498118	quite fast on such	-0.124939
-0.353987	point operation on such	-0.124939
-1.120634	do not have such	-0.124939
-0.459884	that do have such	-0.124939
-0.724989	You should use such	-0.124939
-0.357982	compiler doesn't make such	-0.124939
-0.335312	are using functions such	-0.124939
-0.246446	for mathematical functions such	-0.124939
-0.246446	or mathematical functions such	-0.124939
-0.246446	common mathematical functions such	-0.124939
-0.246446	computing mathematical functions such	-0.124939
-0.335312	with C functions such	-0.124939
-0.433800	common math functions such	-0.124939
-0.335312	or memory-intensive functions such	-0.124939
-0.357844	very often, but such	-0.124939
-1.867949	there is no such	-0.124939
-1.403526	possible to do such	-0.124939
-0.640126	will not do such	-0.124939
-0.703838	compilers will do such	-0.124939
-0.357679	vectorization Good compilers such	-0.124939
-0.357531	running. Programs using such	-0.124939
-0.723124	mathematical calculations. In such	-0.124939
-0.570515	computer with many such	-0.124939
-0.559555	Simple integer operations such	-0.124939
-0.877715	a composite type such	-0.124939
-0.459949	are special cases such	-0.124939
-0.756366	brands of CPUs such	-0.124939
-0.353889	the screen. However, such	-0.124939
-0.456856	will automatically replace such	-0.124939
-0.353352	seldom used branches such	-0.124939
-0.352373	from other applications such	-0.124939
-0.455262	for simple types such	-0.124939
-0.422911	of other optimizations such	-0.124939
-0.326611	to do optimizations such	-0.124939
-0.643192	a parenthesis around such	-0.124939
-0.539690	simple algebraic reductions such	-0.124939
-0.317000	in compiled languages such	-0.124939
-0.317000	This includes languages such	-0.124939
-0.227777	order to prevent such	-0.124939
-0.330808	way to prevent such	-0.124939
-0.227777	overhead to prevent such	-0.124939
-0.237598	important for tasks such	-0.124939
-0.314365	for standard tasks such	-0.124939
-0.237598	priority. Other tasks such	-0.124939
-0.237598	for trivial tasks such	-0.124939
-0.347925	a long time, such	-0.124939
-0.542291	main reason why such	-0.124939
-0.343238	digital building blocks such	-0.124939
-0.343238	number of purposes such	-0.124939
-0.341217	in mathematical iterations such	-0.124939
-0.486369	type of vector, such	-0.124939
-0.338831	with segmented memory, such	-0.124939
-0.339029	also third-party profilers such	-0.124939
-0.338831	are also available, such	-0.124939
-0.331555	synchronization between threads, such	-0.124939
-0.487141	Some programming languages, such	-0.124939
-0.324784	the same resources, such	-0.124939
-0.324951	risk of overflow, such	-0.124939
-0.837848	following example illustrates such	-0.124939
-0.324784	determined by considerations such	-0.124939
-0.314164	integers in comparisons, such	-0.124939
-0.314164	hardware definition language, such	-0.124939
-0.574538	enough to justify such	-0.124939
-0.314164	use string classes, such	-0.124939
-0.293672	Some STL templates, such	-0.124939
-0.293672	any other resource, such	-0.124939
-0.293672	of data shuffling, such	-0.124939
-0.382168	count certain events, such	-0.124939
-0.293672	names with suffixes such	-0.124939
-0.382168	debugging and maintaining such	-0.124939
-0.237430	are inherently serial, such	-0.124939
-0.237430	supports automatic vectorization, such	-0.124939
-0.237430	advantage to obtain, such	-0.124939
-0.237430	or removable media such	-0.124939
-0.237430	in table 9.2, such	-0.124939
-0.237430	the feature information, such	-0.124939
-0.237430	CPU hardware. Porting such	-0.124939
-0.237430	system may supply such	-0.124939
-0.598188	variable. This is efficient	-0.124939
-0.358595	* 1.5f; is efficient	-0.124939
-0.847251	for discussion of efficient	-0.124939
-0.505018	common obstacles to efficient	-0.124939
-0.463483	more compact and efficient	-0.124939
-1.055891	code will be efficient	-0.124939
-0.463396	57 Templates are efficient	-0.124939
-0.226666	operator is as efficient	-0.124939
-0.098710	destructor is as efficient	-0.425969
-0.641012	are therefore as efficient	-0.124939
-1.190500	as well as efficient	-0.124939
-0.140470	is exactly as efficient	-0.124939
-0.140470	are exactly as efficient	-0.124939
-0.703802	can be an efficient	-0.124939
-0.375016	will be an efficient	-0.124939
-0.375016	also be an efficient	-0.124939
-1.381241	Some compilers have efficient	-0.124939
-0.211760	that is more efficient	-0.124939
-0.174924	it is more efficient	-0.602060
-0.100843	It is more efficient	-0.903090
-0.211760	which is more efficient	-0.124939
-0.211760	class is more efficient	-0.124939
-0.211760	Linux is more efficient	-0.124939
-0.211760	transfer is more efficient	-0.124939
-0.211760	#if is more efficient	-0.124939
-0.211760	pre-increment is more efficient	-0.124939
-0.211760	*(p++) is more efficient	-0.124939
-0.211760	array[i++] is more efficient	-0.124939
-0.688987	in a more efficient	-0.124939
-0.338317	mode, and more efficient	-0.124939
-0.338317	cheaper and more efficient	-0.124939
-0.203910	may be more efficient	-0.425969
-0.306660	functions are more efficient	-0.124939
-0.306660	there are more efficient	-0.124939
-0.348217	replaced by more efficient	-0.124939
-0.543281	the code more efficient	-0.124939
-0.324466	point code more efficient	-0.124939
-0.413388	collection. A more efficient	-0.124939
-0.265783	to make more efficient	-0.124939
-0.490097	is often more efficient	-0.124939
-0.534483	is much more efficient	-0.124939
-0.379770	made much more efficient	-0.124939
-0.294694	data caching more efficient	-0.124939
-0.199044	makes caching more efficient	-0.124939
-0.534213	code becomes more efficient	-0.124939
-0.265783	function calling more efficient	-0.124939
-0.265783	makes inlining more efficient	-0.124939
-0.112869	is sometimes more efficient	-0.124939
-0.112869	are sometimes more efficient	-0.124939
-0.083451	is slightly more efficient	-0.124939
-0.186844	Sum1 slightly more efficient	-0.124939
-0.069442	is the most efficient	-0.249877
-0.388352	then the most efficient	-0.124939
-0.388352	finding the most efficient	-0.124939
-0.388352	select the most efficient	-0.124939
-0.388352	choosing the most efficient	-0.124939
-0.454431	type is most efficient	-0.124939
-0.478383	} The most efficient	-0.124939
-0.478383	condition The most efficient	-0.124939
-0.222773	systems are most efficient	-0.124939
-0.222773	statements are most efficient	-0.124939
-0.899313	is a very efficient	-0.124939
-0.724477	be a very efficient	-0.124939
-0.787366	will be very efficient	-0.124939
-0.244421	This is less efficient	-0.124939
-0.244421	compiler is less efficient	-0.124939
-0.244421	It is less efficient	-0.124939
-0.244421	array is less efficient	-0.124939
-0.244421	list is less efficient	-0.124939
-0.244421	numbers is less efficient	-0.124939
-0.244421	bitfield is less efficient	-0.124939
-0.264354	around and less efficient	-0.124939
-0.318647	can be less efficient	-0.124939
-0.318647	will be less efficient	-0.124939
-0.266918	functions are less efficient	-0.124939
-0.266918	libraries are less efficient	-0.124939
-0.266918	implementations are less efficient	-0.124939
-0.264354	as input less efficient	-0.124939
-0.356785	a[i]. Note how efficient	-0.124939
-0.355745	of matrices. An efficient	-0.124939
-0.555501	C++ is quite efficient	-0.124939
-0.456252	checking and various efficient	-0.124939
-0.294194	references are equally efficient	-0.124939
-0.294194	123; are equally efficient	-0.124939
-1.055127	CPU detection function In	-0.124939
-0.589399	5 } } In	-0.124939
-0.455781	return c; } In	-0.124939
-0.647582	= 2.0; } In	-0.124939
-0.502855	converting to double In	-0.124939
-0.356702	will be 2 In	-0.124939
-0.489405	of system code. In	-0.124939
-0.489405	the compiled code. In	-0.124939
-0.356285	7.8 Member pointers In	-0.124939
-0.355951	13.3 Difficult cases In	-0.124939
-0.826576	to the function. In	-0.124939
-0.879822	the critical function. In	-0.124939
-0.328758	11 programming, etc. In	-0.124939
-0.328758	loop counters, etc. In	-0.124939
-0.328758	its limit, etc. In	-0.124939
-0.565523	versus unsigned integers In	-0.124939
-0.354766	a += b; In	-0.124939
-1.306450	of the program. In	-0.124939
-0.450844	are less efficient. In	-0.124939
-0.450844	slightly less efficient. In	-0.124939
-0.351710	on different processors. In	-0.124939
-0.816098	outside the loop. In	-0.124939
-0.316689	of code size. In	-0.124939
-0.410561	available register size. In	-0.124939
-0.348936	the same variables. In	-0.124939
-0.740136	14.2 Bounds checking In	-0.124939
-0.348350	of the resources. In	-0.124939
-0.347674	you need it. In	-0.124939
-0.202228	and mathematical calculations. In	-0.124939
-0.202228	doing mathematical calculations. In	-0.124939
-0.271368	heavy graphics calculations. In	-0.124939
-0.563393	4 clock cycles. In	-0.124939
-0.523852	time. Loop unrolling In	-0.124939
-0.737633	in 64-bit Windows. In	-0.124939
-0.287159	which is faster. In	-0.124939
-0.574747	is much faster. In	-0.124939
-0.532666	as template parameter. In	-0.124939
-0.483164	an array element. In	-0.124939
-0.341160	128-bit XMM register. In	-0.124939
-0.341160	accessed equally fast. In	-0.124939
-0.338667	fourteen register parameters. In	-0.124939
-0.338912	modify objects simultaneously. In	-0.124939
-0.338789	and other optimizations. In	-0.124939
-0.510754	use assembly language. In	-0.124939
-0.268636	hence higher speed. In	-0.124939
-0.268636	the single-thread speed. In	-0.124939
-0.917702	divisible by 16. In	-0.124939
-0.472963	fits the application. In	-0.124939
-0.428548	with low priority. In	-0.124939
-0.331289	reduce them all. In	-0.124939
-0.921542	one clock cycle. In	-0.124939
-0.324832	power of two. In	-0.124939
-0.324625	"worst case" counts. In	-0.124939
-0.713066	long dependency chains. In	-0.124939
-0.314008	give -2.0 55 In	-0.124939
-0.314008	do it explicitly. In	-0.124939
-0.574255	is intended for. In	-0.124939
-0.314008	step by step. In	-0.124939
-0.314008	about this condition. In	-0.124939
-0.314008	it doesn't occur. In	-0.124939
-0.314008	be reused elsewhere. In	-0.124939
-0.314008	become very big. In	-0.124939
-0.381987	(see page 71). In	-0.124939
-0.293524	working software users. In	-0.124939
-0.293524	function can throw. In	-0.124939
-0.293524	integer, usually 32. In	-0.124939
-0.293524	false vendor string. In	-0.124939
-0.293524	program exception safe. In	-0.124939
-0.293524	always optimal, though. In	-0.124939
-0.293524	the same name. In	-0.124939
-0.293524	for each calculation. In	-0.124939
-0.381987	easily be obtained. In	-0.124939
-0.537652	a clock cycle? In	-0.124939
-0.537652	of mathematical purity. In	-0.124939
-0.293524	some microprocessors have. In	-0.124939
-0.237299	constructor specifying otherwise. In	-0.124939
-0.237299	(e.g. option /MT). In	-0.124939
-0.237299	calculation of B. In	-0.124939
-0.237299	comes to mind. In	-0.124939
-0.237299	on page 60. In	-0.124939
-0.237299	Dobbs Journal, 2002). In	-0.124939
-0.237299	of course system-specific. In	-0.124939
-0.237299	from mispredictions. 44 In	-0.124939
-0.237299	See page 34. In	-0.124939
-0.237299	their 32-bit counterparts. In	-0.124939
-0.237299	intervals are short. In	-0.124939
-0.237299	original is destroyed. In	-0.124939
-0.237299	supports self-relative addressing. In	-0.124939
-0.237299	the same divisor. In	-0.124939
-0.237299	= MAX(f(x), g(x)); In	-0.124939
-0.237299	random number generators. In	-0.124939
-0.237299	chapter (page 146). In	-0.124939
-0.237299	then call __intel_cpu_features_init_x(). In	-0.124939
-0.237299	in its API. In	-0.124939
-0.237299	modifies many strings. In	-0.124939
-0.113171	a = a *	-0.726999
-0.668576	c = a *	-0.124939
-0.591938	b as a *	-0.124939
-1.090983	{ return a *	-0.124939
-0.502812	(int a[], int *	-0.124939
-0.356517	__restrict aa, int *	-0.124939
-0.708346	x2 = x *	-0.124939
-0.015312	{ return x *	-0.425969
-0.372603	a = b *	-0.191886
-0.278594	a[i] = b *	-0.124939
-0.278594	temp = b *	-0.124939
-0.510269	1.0f + b *	-0.124939
-0.056470	2 : b *	-0.425969
-0.378264	intermediate expression b *	-0.124939
-0.056470	+ 2, b *	-0.425969
-0.290485	+ two, b *	-0.124939
-0.503094	a[i] = i *	-0.124939
-0.357368	n floats: float *	-0.124939
-0.356925	x = 2 *	-0.124939
-0.327944	static char const *	-0.124939
-0.026361	__m128i LoadVector(void const *	-0.602060
-0.327944	__m128i LoadVectorA(void const *	-0.124939
-0.356706	sizeof(float)) = 8 *	-0.124939
-0.335428	void Plus2 (int *	-0.124939
-0.335428	void FuncA (int *	-0.124939
-0.351290	the value 10 *	-0.124939
-0.452884	100 * 5 *	-0.124939
-0.224586	b = temp *	-0.124939
-0.224586	c[i] = temp *	-0.124939
-0.347854	1000 * 100 *	-0.124939
-0.279159	(int)&matrix[0][0] + j *	-0.124939
-0.279159	can replace j *	-0.124939
-0.498310	14.15b if (a *	-0.124939
-0.331532	int c;}; abc *	-0.124939
-0.702091	v; if (u.i *	-0.124939
-0.325056	CChild2 Object2; CChild1 *	-0.124939
-0.325189	C2 Object2; CHello *	-0.124939
-0.314472	will take 1000 *	-0.124939
-0.314472	x4 = x2 *	-0.124939
-0.314472	C1 obj1; C0 *	-0.124939
-0.006830	inline void StoreVector(void *	-0.602060
-0.314300	s += xxn *	-0.124939
-0.314300	&Object1; p1->Hello(); CChild2 *	-0.124939
-0.538139	y2 = a2 *	-0.124939
-0.538139	y1 = a1 *	-0.124939
-0.293802	function version CriticalFunctionType *	-0.124939
-0.382327	selected version FuncType *	-0.124939
-0.293802	2) : (bb[i] *	-0.124939
-0.102729	by is (columns *	-0.124939
-0.102729	j * (columns *	-0.124939
-0.102729	dispatcher function. typeof(CriticalFunction) *	-0.124939
-0.102729	__asm__ ("CriticalFunction"); typeof(CriticalFunction) *	-0.124939
-0.293802	{ return Func1(x) *	-0.124939
-0.382327	b = (a+1) *	-0.124939
-0.237544	}; int Sum2(S3 *	-0.124939
-0.237544	operation. For example,a *	-0.124939
-0.237544	problem void AddTwo(int *	-0.124939
-0.237544	(FuncRow(i)*columns + FuncCol(i)) *	-0.124939
-0.237544	2 return (2.5f *	-0.124939
-0.237544	absvalue = a[i].u[1] *	-0.124939
-0.237544	= (float *)alloca(n *	-0.124939
-0.237544	1. / (b1 *	-0.124939
-0.237544	(N-1)) return powN<(N1&(N1-1))==0,N1>::p(x) *	-0.124939
-0.237544	2 > v.i *	-0.124939
-0.237544	inline void StoreNTD(double *	-0.124939
-0.237544	{ return powN<true,N/2>::p(x) *	-0.124939
-0.237544	inline void StoreVectorA(void *	-0.124939
-0.237544	<< 4, anda *	-0.124939
-0.237544	a1 * b2 *	-0.124939
-0.237544	a2 * b1 *	-0.124939
-0.237544	= log (b[i] *	-0.124939
-0.237544	x - 8.0f) *	-0.124939
-0.781994	Choice of compiler There	-0.124939
-0.654728	optimization by compiler There	-0.124939
-0.358068	= Func(ab[i].a); } There	-0.124939
-0.357335	everything is double There	-0.124939
-0.356250	vectorizing mathematical code. There	-0.124939
-1.058016	the same time. There	-0.124939
-0.877638	no extra time. There	-0.124939
-0.355501	the memcpy function. There	-0.124939
-0.354869	cache size, etc. There	-0.124939
-0.354674	as different functions. There	-0.124939
-0.549748	min)) { ... There	-0.124939
-0.648959	13.2 Model-specific dispatching There	-0.124939
-0.352853	between the systems. There	-0.124939
-1.200926	caching less efficient. There	-0.124939
-0.993398	as explained below. There	-0.124939
-0.422469	of logical processors. There	-0.124939
-0.422469	on future processors. There	-0.124939
-0.351324	functions for vectors There	-0.124939
-0.351024	6 Development process There	-0.124939
-0.351075	alloca was called. There	-0.124939
-0.350036	and free are: There	-0.124939
-0.491850	the vector size. There	-0.124939
-0.348272	or other resources. There	-0.124939
-0.820370	across function calls. There	-0.124939
-0.347603	that support it. There	-0.124939
-0.449275	kind of registers. There	-0.124939
-0.347603	the same object. There	-0.124939
-0.510701	gain in performance. There	-0.124939
-0.345732	Cache control instructions. There	-0.124939
-0.345732	a different way. There	-0.124939
-0.793479	for internal references. There	-0.124939
-0.344425	the same address. There	-0.124939
-0.544823	speed or not. There	-0.124939
-0.989049	the end user. There	-0.124939
-1.235733	the function returns. There	-0.124939
-1.146889	dynamic memory allocation. There	-0.124939
-1.179487	set is enabled. There	-0.124939
-0.342833	caching becomes inefficient. There	-0.124939
-0.343044	a try block. There	-0.124939
-0.483087	accessed much faster. There	-0.124939
-0.975097	a template parameter. There	-0.124939
-0.443389	than Boolean expressions. There	-0.124939
-0.341205	the maximum value. There	-0.124939
-0.340969	for unaligned arrays. There	-0.124939
-0.624667	loop control branch. There	-0.124939
-0.341087	floating point vectors. There	-0.124939
-0.338585	than four parameters. There	-0.124939
-0.339121	the higher bits. There	-0.124939
-0.338853	advantage comes automatically. There	-0.124939
-0.338719	own CPU core. There	-0.124939
-1.030071	into multiple threads. There	-0.124939
-0.614665	Execution unit throughput There	-0.124939
-0.331225	very time- consuming. There	-0.124939
-0.331040	the present manual. There	-0.124939
-0.486794	no out-of-order execution. There	-0.124939
-0.713177	divisible by 8. There	-0.124939
-0.594480	objects (*.dll, *.so). There	-0.124939
-0.324545	without AVX support. There	-0.124939
-0.420334	are not optimal. There	-0.124939
-0.681890	Test and maintenance There	-0.124939
-0.313931	the code explicitly. There	-0.124939
-0.407140	GetProcessAffinityMask in Windows). There	-0.124939
-0.407140	and AMD CodeAnalyst. There	-0.124939
-0.712872	the following way: There	-0.124939
-0.574114	relates to security. There	-0.124939
-0.313931	is very limited. There	-0.124939
-0.313931	set number 0x1C. There	-0.124939
-0.313931	math. Memory copying. There	-0.124939
-0.313931	pre-increment to post-increment. There	-0.124939
-0.381896	refresh the screen. There	-0.124939
-0.293450	the application programmer. There	-0.124939
-0.293450	an overflow check. There	-0.124939
-0.537522	4: "Instruction tables". There	-0.124939
-0.293450	structure is created. There	-0.124939
-0.293450	(see p. 43). There	-0.124939
-0.293450	Intel and Gnu. There	-0.124939
-0.381896	vectors. 12.10 Conclusion There	-0.124939
-0.293450	to save power. There	-0.124939
-0.293450	(see p. 87). There	-0.124939
-0.237234	x = -abs(x);. There	-0.124939
-0.237234	set to NULL. There	-0.124939
-0.237234	time it uses. There	-0.124939
-0.237234	2A and 2B. There	-0.124939
-0.237234	to 2 Mbytes. There	-0.124939
-0.237234	separated by commas. There	-0.124939
-0.237234	container be recycled? There	-0.124939
-0.237234	xn as x4∙xn-4. There	-0.124939
-0.237234	discussion. 7.33 Namespaces There	-0.124939
-0.237234	integer is returned. There	-0.124939
-0.237234	down to 36. There	-0.124939
-0.237234	time than normally. There	-0.124939
-0.237234	(.dll or .so). There	-0.124939
-0.237234	8 floating point). There	-0.124939
-0.237234	to using inheritance. There	-0.124939
-0.903611	size of the array	-0.124939
-1.527708	address of the array	-0.124939
-0.581143	element of the array	-0.124939
-1.205267	calculation of the array	-0.124939
-0.862218	end of the array	-0.124939
-0.581143	dimensions of the array	-0.124939
-0.595177	smaller and the array	-0.124939
-0.598660	S1 in the array	-0.124939
-0.588578	system if the array	-0.124939
-0.588578	however, if the array	-0.124939
-1.748605	to make the array	-0.124939
-1.216456	in which the array	-0.124939
-0.586379	checking all the array	-0.124939
-0.876859	in case the array	-0.124939
-0.356474	it compares the array	-0.124939
-0.356474	solution. Sort the array	-0.124939
-1.820265	the address of array	-0.124939
-0.310557	the addresses of array	-0.124939
-0.561304	with end of array	-0.124939
-0.357490	scanf. Violation of array	-0.124939
-0.573184	fast access to array	-0.124939
-0.358714	array sizes and array	-0.124939
-0.541131	store result in array	-0.124939
-0.929822	Induction variables for array	-0.124939
-0.579416	automatically check for array	-0.124939
-1.116951	is intended for array	-0.124939
-0.654225	no checking for array	-0.124939
-0.460040	no checks for array	-0.124939
-0.463122	the object or array	-0.124939
-0.552448	n is an array	-0.124939
-0.591536	size of an array	-0.124939
-0.160795	address of an array	-0.124939
-0.418925	end of an array	-0.124939
-0.455318	applies to an array	-0.124939
-0.455318	bounds-checking to an array	-0.124939
-0.537096	elements in an array	-0.124939
-0.506735	check if an array	-0.124939
-0.417500	data as an array	-0.124939
-0.160401	used as an array	-0.124939
-0.330379	to set an array	-0.124939
-0.330379	behaves like an array	-0.124939
-0.330379	as copying an array	-0.124939
-0.330379	or setting an array	-0.124939
-0.330379	are feeding an array	-0.124939
-0.015277	integer vector from array	-0.726999
-0.598914	reuse the same array	-0.124939
-0.462310	addresses for one array	-0.124939
-1.021568	size of each array	-0.124939
-0.513649	bytes) of each array	-0.124939
-0.503318	a fixed size array	-0.124939
-0.015093	integer vector into array	-0.726999
-0.656880	to swap two array	-0.124939
-0.356353	// Make dynamic array	-0.124939
-0.181533	then a simple array	-0.124939
-0.459578	here: A large array	-0.124939
-0.307983	element } An array	-0.124939
-0.307983	the arrays. An array	-0.124939
-0.307983	7.10 Arrays An array	-0.124939
-0.307983	in www.agner.org/optimize/cppexamples.zip. An array	-0.124939
-0.307983	page 27. An array	-0.124939
-0.355612	// Loop through array	-0.124939
-0.687675	wrap the allocated array	-0.124939
-0.341731	inefficient. An allocated array	-0.124939
-0.567864	list; // Make array	-0.124939
-1.060111	clock cycles per array	-0.124939
-0.560809	allocate the final array	-0.124939
-0.347171	memory allocation Any array	-0.124939
-0.267440	than a linear array	-0.124939
-0.381735	use a linear array	-0.124939
-0.267440	then a linear array	-0.124939
-0.520126	of the current array	-0.124939
-0.721620	in a temporary array	-0.124939
-0.149030	access a multidimensional array	-0.124939
-0.021902	matrix or multidimensional array	-0.124939
-0.095134	sizeof(list)); A multidimensional array	-0.124939
-0.325102	access to individual array	-0.124939
-0.594835	constants, string constants, array	-0.124939
-0.293969	make a variable-size array	-0.124939
-0.293969	// Safe [] array	-0.124939
-0.237690	endl; // Output array	-0.124939
-0.237690	when a fixed-size array	-0.124939
-0.550201	large arrays and where	-0.124939
-1.368611	of the function where	-0.124939
-1.959159	of the code where	-0.124939
-0.589725	than a program where	-0.124939
-0.462285	to the point where	-0.124939
-0.468073	in a loop where	-0.124939
-1.788987	can be used where	-0.124939
-1.064659	x86 instruction set where	-0.124939
-0.568501	large shared object where	-0.124939
-0.461644	member functions static where	-0.124939
-0.656119	a public variable where	-0.124939
-0.356471	for large libraries where	-0.124939
-1.006007	use vector operations where	-0.124939
-0.356012	the general case where	-0.124939
-0.293271	In the cases where	-0.124939
-0.035951	size in cases where	-0.124939
-0.035951	advantageous in cases where	-0.124939
-0.035951	automatically in cases where	-0.124939
-0.035951	errors in cases where	-0.124939
-0.035951	containers in cases where	-0.124939
-0.045424	may be cases where	-0.124939
-0.219853	there are cases where	-0.124939
-0.616097	in most cases where	-0.124939
-0.219853	etc. In cases where	-0.124939
-0.293271	are many cases where	-0.124939
-0.219853	in simple cases where	-0.124939
-0.096161	the few cases where	-0.124939
-0.096161	a few cases where	-0.124939
-0.293271	in special cases where	-0.124939
-0.355875	with carry) instructions where	-0.124939
-0.546238	making two threads where	-0.124939
-0.354786	But a solution where	-0.124939
-0.548552	is a structure where	-0.124939
-1.328369	in 64-bit mode where	-0.124939
-0.519377	errors in programs where	-0.124939
-0.560515	take memory space where	-0.124939
-0.353278	large data sets where	-0.124939
-0.713007	large memory model where	-0.124939
-0.352620	construct obscure examples where	-0.124939
-0.455032	calculation of expressions where	-0.124939
-0.351250	a learning process where	-0.124939
-0.350422	Pentium 4 computer where	-0.124939
-0.349931	in interpreted languages where	-0.124939
-0.349420	with member functions, where	-0.124939
-0.448648	you can predict where	-0.124939
-0.028659	is the situation where	-0.124939
-0.028659	to the situation where	-0.124939
-0.028659	in the situation where	-0.124939
-0.092361	a use situation where	-0.124939
-0.092361	space. A situation where	-0.124939
-0.092361	the only situation where	-0.124939
-0.092361	in any situation where	-0.124939
-0.092361	A common situation where	-0.124939
-0.445606	form of templates where	-0.124939
-0.492009	follow a sequence where	-0.124939
-0.056973	make overflow checks where	-0.124939
-0.081786	aware of situations where	-0.124939
-0.081865	useful in situations where	-0.124939
-0.061831	RISC in situations where	-0.124939
-0.081786	may be situations where	-0.124939
-0.081786	There are situations where	-0.124939
-0.081786	are also situations where	-0.124939
-0.341349	IsPowerOf2 = false where	-0.124939
-0.530119	a dependency chain where	-0.124939
-0.206567	are cases, however, where	-0.124939
-0.206567	few cases, however, where	-0.124939
-0.488860	storage is determined where	-0.124939
-0.438399	32- bit mode, where	-0.124939
-0.434446	the second step where	-0.124939
-0.335721	return addresses (i.e. where	-0.124939
-0.331333	(except in Fortran where	-0.124939
-0.467034	as multiple inheritance where	-0.124939
-0.324833	using a pipeline where	-0.124939
-0.420694	level-1 data cache, where	-0.124939
-0.324833	a column-wise manner where	-0.124939
-0.314212	series of calculations, where	-0.124939
-0.407488	on the Internet where	-0.124939
-0.314212	of sequential instructions, where	-0.124939
-0.382225	like example 12.4a where	-0.124939
-0.293718	more efficient today where	-0.124939
-0.293718	time. An experiment where	-0.124939
-0.237470	threads are areas where	-0.124939
-0.237470	the variable __intel_cpu_feature_indicator where	-0.124939
-0.237470	to around 1980 where	-0.124939
-0.237470	Template for pow(x,N) where	-0.124939
-0.237470	as 2eee 1.fffff, where	-0.124939
-0.237470	blocks of data", where	-0.124939
-0.237470	n places back, where	-0.124939
-0.237470	in the sequence, where	-0.124939
-0.726287	to implement the many	-0.124939
-0.358717	to thank the many	-0.124939
-0.557622	The speed is many	-0.124939
-0.358457	extremely costly to many	-0.124939
-0.358457	time consumer to many	-0.124939
-0.502215	Gnu compiler in many	-0.124939
-0.847057	the variable in many	-0.124939
-0.356924	reductions explicitly in many	-0.124939
-0.356924	many users in many	-0.124939
-0.655865	are missing in many	-0.124939
-0.864857	to use for many	-0.124939
-0.531956	good performance for many	-0.124939
-0.331356	function libraries for many	-0.124939
-0.228209	standard libraries for many	-0.124939
-0.228209	well-tested libraries for many	-0.124939
-0.857113	very useful for many	-0.124939
-0.738146	is available for many	-0.124939
-0.790712	are available for many	-0.124939
-0.352818	precious resource for many	-0.124939
-0.352818	the market for many	-0.124939
-0.462449	and discovered that many	-0.124939
-0.357993	in mind, that many	-0.124939
-0.273376	if there are many	-0.221849
-0.765299	However, there are many	-0.124939
-1.160711	the critical function many	-0.124939
-0.834599	are used by many	-0.124939
-0.506507	write it with many	-0.124939
-0.506507	friendly compiler with many	-0.124939
-0.445632	A program with many	-0.124939
-0.445632	is called with many	-0.124939
-0.344717	A template with many	-0.124939
-0.344717	in programs with many	-0.124939
-0.344717	An application with many	-0.124939
-0.445632	CPU-intensive applications with many	-0.124939
-0.485253	A computer with many	-0.124939
-0.631899	switch statement with many	-0.124939
-0.344717	an IDE with many	-0.124939
-0.594984	shuffling, such as many	-0.124939
-0.563974	necessary to have many	-0.124939
-0.837447	programs that have many	-0.124939
-1.361259	Some compilers have many	-0.124939
-0.358229	one way, then many	-0.124939
-0.995446	is called from many	-0.124939
-0.825627	because it has many	-0.124939
-0.509355	a program has many	-0.124939
-0.793442	class library has many	-0.124939
-0.340205	While C++ has many	-0.124939
-0.340205	language. D has many	-0.124939
-0.340205	platforms. Pascal has many	-0.124939
-0.588933	data are used many	-0.124939
-0.539313	is divided into many	-0.124939
-0.634125	less efficient. In many	-0.124939
-0.345865	low priority. In many	-0.124939
-0.345865	mathematical purity. In many	-0.124939
-0.643419	may be so many	-0.124939
-0.528649	There are so many	-0.124939
-0.886621	code. For example, many	-0.124939
-0.134831	to count how many	-0.124939
-0.134831	that count how many	-0.124939
-0.331411	can tell how many	-0.124939
-0.331411	profiler counts how many	-0.124939
-0.355780	benefit from its many	-0.124939
-0.355660	all cases, while many	-0.124939
-0.355562	CPU-intensive code. But many	-0.124939
-0.522820	a program uses many	-0.124939
-0.734520	a program contains many	-0.124939
-0.279899	This library contains many	-0.124939
-0.279899	Primitives" library contains many	-0.124939
-0.308413	Kernel Library" contains many	-0.124939
-0.500973	servers that run many	-0.124939
-0.854216	efficient to store many	-0.124939
-0.352597	improvements. Making too many	-0.124939
-0.543066	expression to generate many	-0.124939
-0.813690	branch that goes many	-0.124939
-0.349527	container classes. Unfortunately, many	-0.124939
-0.348195	some processors. On many	-0.124939
-0.347146	large block containing many	-0.124939
-0.343311	two books contain many	-0.124939
-0.498412	CPU can hold many	-0.124939
-0.331688	I have seen many	-0.124939
-0.325116	C# and avoids many	-0.124939
-0.314377	64-bit Linux. Has many	-0.124939
-0.293876	4. Even worse, many	-0.124939
-0.293876	complex framework requiring many	-0.124939
-0.237609	non-Intel CPUs. Includes many	-0.124939
-0.237609	Addison-Wesley, 2003. Contains many	-0.124939
-0.237609	creates or modifies many	-0.124939
-0.237609	x?" or "how many	-0.124939
-1.400259	look at the possible	-0.124939
-1.209838	that it is possible	-0.124939
-1.296586	if it is possible	-0.124939
-0.228681	cases it is possible	-0.602060
-0.855105	But it is possible	-0.124939
-1.101385	whether it is possible	-0.124939
-0.741675	cases, it is possible	-0.124939
-0.513896	Here it is possible	-0.124939
-0.513896	Nevertheless, it is possible	-0.124939
-0.513896	algebra, it is possible	-0.124939
-0.513896	see, it is possible	-0.124939
-0.513896	design, it is possible	-0.124939
-0.859026	if this is possible	-0.124939
-0.668835	} It is possible	-0.124939
-0.469326	object It is possible	-0.124939
-0.762699	time. It is possible	-0.124939
-0.818527	used. It is possible	-0.124939
-0.469326	processors. It is possible	-0.124939
-0.469326	object. It is possible	-0.124939
-0.469326	critical. It is possible	-0.124939
-0.469326	vectorization. It is possible	-0.124939
-0.469326	input. It is possible	-0.124939
-0.469326	purpose. It is possible	-0.124939
-0.469326	context. It is possible	-0.124939
-0.469326	148 It is possible	-0.124939
-0.469326	happening. It is possible	-0.124939
-0.469326	animation. It is possible	-0.124939
-0.469326	indeed. It is possible	-0.124939
-0.469326	57). It is possible	-0.124939
-0.469326	sizes? It is possible	-0.124939
-0.550143	and also a possible	-0.124939
-0.358530	easily justify a possible	-0.124939
-1.658631	the number of possible	-0.124939
-1.065917	a number of possible	-0.124939
-0.828514	limited number of possible	-0.124939
-0.525194	(be aware of possible	-0.124939
-0.357506	the obstacle of possible	-0.124939
-0.358741	an explanation and possible	-0.124939
-1.644288	it may be possible	-0.124939
-1.422133	It may be possible	-0.124939
-0.882781	It should be possible	-0.124939
-0.354875	neverthe- less be possible	-0.124939
-0.247722	it might be possible	-0.124939
-0.247722	It might be possible	-0.124939
-0.358682	to objects) are possible	-0.124939
-0.385191	to make it possible	-0.124939
-0.114380	that make it possible	-0.124939
-0.302030	it makes it possible	-0.124939
-0.234654	This makes it possible	-0.425969
-0.302030	library makes it possible	-0.124939
-0.350665	How was it possible	-0.124939
-1.241174	of 2 if possible	-0.124939
-0.457234	much data as possible	-0.124939
-0.353888	as much as possible	-0.124939
-0.353888	as small as possible	-0.124939
-0.353888	as standardized as possible	-0.124939
-0.861339	it is not possible	-0.602060
-1.191010	this is not possible	-0.124939
-0.547470	propagation is not possible	-0.124939
-0.514551	is therefore not possible	-0.124939
-0.526181	1.0f; } A possible	-0.124939
-0.814131	This is only possible	-0.124939
-0.494739	this is only possible	-0.124939
-0.657912	There are other possible	-0.124939
-0.539680	deallocated in all possible	-0.124939
-0.349888	It is also possible	-0.425969
-0.745944	it is often possible	-0.124939
-1.102733	It is often possible	-0.124939
-1.237497	is not always possible	-0.124939
-0.355893	cannot change its possible	-0.124939
-0.431452	to the best possible	-0.124939
-0.431452	use the best possible	-0.124939
-0.431452	model the best possible	-0.124939
-0.524830	flaws: The best possible	-0.124939
-1.460374	It is therefore possible	-0.124939
-0.455145	various other optimizations possible	-0.124939
-0.450960	It is sometimes possible	-0.124939
-0.511562	of the maximum possible	-0.124939
-0.684437	gives the simplest possible	-0.124939
-0.442415	tools. The simplest possible	-0.124939
-0.771180	It is rarely possible	-0.124939
-0.341639	sake of fastest possible	-0.124939
-0.438753	compilers is generally possible	-0.124939
-0.265159	assume the worst possible	-0.124939
-0.265159	gives the worst possible	-0.124939
-0.336099	// Define biggest possible	-0.124939
-0.203042	reciprocal of the clock	-0.124939
-0.599337	changes in the clock	-0.124939
-0.579556	problem that the clock	-0.124939
-0.579556	fast that the clock	-0.124939
-0.579556	problems that the clock	-0.124939
-0.894951	example, if the clock	-0.124939
-1.066027	multiplied by the clock	-0.124939
-0.596156	programs when the clock	-0.124939
-0.357214	to measure the clock	-0.124939
-0.357214	experience. Occasionally, the clock	-0.124939
-0.357214	any event, the clock	-0.124939
-0.463508	time unit is clock	-0.124939
-0.201357	much is a clock	-0.425969
-0.337913	length of a clock	-0.602060
-0.595177	comparable to a clock	-0.124939
-2.072421	the number of clock	-0.124939
-0.356772	10 Multithreading The clock	-0.124939
-0.356772	work load. The clock	-0.124939
-0.356772	standard PCs. The clock	-0.124939
-0.356772	normal afterwards. The clock	-0.124939
-0.526267	it uses more clock	-0.124939
-0.358104	0.5ns. 2GHz A clock	-0.124939
-1.056738	of the CPU clock	-0.124939
-0.540980	if the CPU clock	-0.124939
-0.540980	at the CPU clock	-0.124939
-0.449677	am using CPU clock	-0.124939
-0.342958	zero or one clock	-0.124939
-0.183738	take only one clock	-0.124939
-0.342958	typically takes one clock	-0.124939
-0.443413	in just one clock	-0.124939
-0.823730	between the two clock	-0.124939
-0.139642	for approximately two clock	-0.124939
-0.139642	accessed approximately two clock	-0.124939
-0.357015	0 - 2 clock	-0.124939
-0.349049	up to 4 clock	-0.124939
-0.349049	3 - 4 clock	-0.124939
-0.356779	4 - 8 clock	-0.124939
-0.356218	4 - 16 clock	-0.124939
-0.342428	can save several clock	-0.124939
-0.342428	microprocessor wastes several clock	-0.124939
-0.293151	is a few clock	-0.124939
-0.293151	only a few clock	-0.124939
-0.293151	takes a few clock	-0.124939
-0.293151	needed a few clock	-0.124939
-0.293151	just a few clock	-0.124939
-0.293151	until a few clock	-0.124939
-0.266596	kludgy. The few clock	-0.124939
-0.238607	point addition every clock	-0.124939
-0.344584	one addition every clock	-0.124939
-0.353314	can change their clock	-0.124939
-0.353140	take only 256 clock	-0.124939
-0.352594	addition every three clock	-0.124939
-0.789541	for a higher clock	-0.124939
-0.297168	(3 - 10 clock	-0.124939
-0.544065	something takes 10 clock	-0.124939
-0.297168	still take 10 clock	-0.124939
-0.314558	use the core clock	-0.124939
-0.102809	cycles. The core clock	-0.124939
-0.102809	frequency. The core clock	-0.124939
-0.237760	table are core clock	-0.124939
-0.237760	is called core clock	-0.124939
-0.290483	3 - 5 clock	-0.124939
-0.121394	addition takes 5 clock	-0.124939
-0.121394	operation takes 5 clock	-0.124939
-0.347980	50 - 100 clock	-0.124939
-0.257399	5 and 20 clock	-0.124939
-0.051484	10 - 20 clock	-0.124939
-0.343448	3 - 6 clock	-0.124939
-0.336006	2 and 15 clock	-0.124939
-0.102790	40 - 80 clock	-0.124939
-0.102790	(27 - 80 clock	-0.124939
-0.487517	at the actual clock	-0.124939
-0.358666	than a hundred clock	-0.124939
-0.237709	but several hundred clock	-0.124939
-0.331629	multiplication takes 11 clock	-0.124939
-0.331545	code took 50 clock	-0.124939
-0.420954	typically takes 40 clock	-0.124939
-0.093229	14 - 45 clock	-0.124939
-0.093229	(20 - 45 clock	-0.124939
-0.314416	12 - 25 clock	-0.124939
-0.023503	different size matrices, clock	-0.425969
-0.237641	takes only 2-3 clock	-0.124939
-0.237641	counter is counting clock	-0.124939
-0.237641	take approximately 500 clock	-0.124939
-0.237641	DontSkip = dummy[0]; clock	-0.124939
-0.887649	object, then the version	-0.124939
-0.596070	function. If the version	-0.124939
-1.326456	to call the version	-0.124939
-1.645674	to use a version	-0.124939
-0.358004	it takes. The version	-0.124939
-0.358004	CPU brand. The version	-0.124939
-0.357320	the appropriate function version	-0.124939
-0.357320	the desired function version	-0.124939
-0.533370	which a code version	-0.124939
-0.517771	which this code version	-0.124939
-0.141410	time. Each code version	-0.124939
-0.141410	initialization. Each code version	-0.124939
-0.352446	most advanced code version	-0.124939
-0.598812	calls the same version	-0.124939
-0.343745	compile time which version	-0.124939
-0.343745	is known which version	-0.124939
-0.343745	when testing which version	-0.124939
-0.343745	on deciding which version	-0.124939
-0.343745	with certainty which version	-0.124939
-0.462209	to make one version	-0.124939
-0.571529	speed of each version	-0.124939
-0.767221	once for each version	-0.124939
-0.528834	prototypes for each version	-0.124939
-0.548828	use the static version	-0.124939
-0.541747	and a 64-bit version	-0.124939
-0.454762	is. The 64-bit version	-0.124939
-0.524912	The best possible version	-0.124939
-0.501944	A 32- bit version	-0.124939
-0.403861	if the new version	-0.124939
-0.403861	gets the new version	-0.124939
-0.552630	to a new version	-0.124939
-0.425900	to each new version	-0.124939
-0.862676	only the SSE2 version	-0.124939
-0.139233	{...} // SSE2 version	-0.425969
-0.459402	BSD. The Windows version	-0.124939
-0.500406	the directly compiled version	-0.124939
-0.355350	AVX2 // specific version	-0.124939
-0.126812	{...} // AVX version	-0.124939
-0.306450	in the optimized version	-0.124939
-0.306450	not the optimized version	-0.124939
-0.354773	instruction set, another version	-0.124939
-0.839710	then the optimal version	-0.124939
-0.575968	implemented a separate version	-0.124939
-0.531556	with a better version	-0.124939
-0.351315	six years old version	-0.124939
-0.081577	to the appropriate version	-0.425969
-0.156848	choose the appropriate version	-0.124939
-0.156848	loads the appropriate version	-0.124939
-0.198517	systems. The appropriate version	-0.124939
-0.288987	run the advanced version	-0.124939
-0.288987	running the advanced version	-0.124939
-0.463359	to the desired version	-0.124939
-0.311953	to the right version	-0.425969
-0.445581	finding the right version	-0.124939
-1.057869	in the final version	-0.124939
-0.344862	If a future version	-0.124939
-0.445814	uses a newer version	-0.124939
-0.519915	if the current version	-0.124939
-0.483318	call the chosen version	-0.124939
-0.341458	because the interpreted version	-0.124939
-0.070844	making a debug version	-0.124939
-0.070844	executable: a debug version	-0.124939
-0.155540	The 17 debug version	-0.124939
-0.155540	time. Uses debug version	-0.124939
-0.206344	example, the latest version	-0.124939
-0.206344	Use the latest version	-0.124939
-0.206344	gets the latest version	-0.124939
-0.336186	go to dispatched version	-0.124939
-0.255809	continue in dispatched version	-0.124939
-0.331454	8 most popular version	-0.124939
-0.420842	to the selected version	-0.124939
-0.212147	run an inferior version	-0.124939
-0.212147	supports. An inferior version	-0.124939
-0.314329	in Linux kernel version	-0.124939
-0.048338	"asmlib.h" // Lowest version	-0.124939
-0.048338	&CriticalFunction_Dispatch; // Lowest version	-0.124939
-0.102738	loader (requires binutils version	-0.124939
-0.102738	13.1, Requires binutils version	-0.124939
-0.293830	debugging. A command-line version	-0.124939
-0.023497	and a release version	-0.124939
-0.293830	and a generic version	-0.124939
-0.237568	{ // Generic version	-0.124939
-0.237568	version 2.20, glibc version	-0.124939
-0.237568	} // Default version	-0.124939
-0.883758	independent of the value	-0.124939
-0.883758	regardless of the value	-0.124939
-0.592186	compiler to the value	-0.124939
-1.086697	is that the value	-0.124939
-0.540248	and that the value	-0.124939
-1.251069	so that the value	-0.124939
-0.787177	means that the value	-0.124939
-1.054341	assume that the value	-0.124939
-0.540248	detect that the value	-0.124939
-0.559364	used if the value	-0.124939
-0.821522	possible if the value	-0.124939
-0.559364	way if the value	-0.124939
-0.559364	predicted if the value	-0.124939
-0.559364	course, if the value	-0.124939
-0.589136	ArraySize by the value	-0.124939
-0.588553	integers with the value	-0.124939
-1.504403	depends on the value	-0.124939
-0.587005	mean use the value	-0.124939
-0.585769	range then the value	-0.124939
-0.812545	value from the value	-0.124939
-0.988637	calculated from the value	-0.124939
-0.574103	always has the value	-0.124939
-1.024374	will make the value	-0.124939
-0.586530	times because the value	-0.124939
-0.587782	executed. If the value	-0.124939
-0.517115	digits, so the value	-0.124939
-0.562995	Make sure the value	-0.124939
-0.353394	will get the value	-0.124939
-0.353394	both get the value	-0.124939
-0.836576	to calculate the value	-0.124939
-0.572302	unfavorable, unless the value	-0.124939
-0.808073	cycles after the value	-0.124939
-0.310048	you read the value	-0.124939
-0.310048	only read the value	-0.124939
-0.548368	i; Here, the value	-0.124939
-0.351998	doesn't know the value	-0.124939
-0.351998	will generate the value	-0.124939
-0.546806	to change the value	-0.124939
-0.541825	N&(N-1) gives the value	-0.124939
-0.557248	wait until the value	-0.124939
-0.454838	with reading the value	-0.124939
-0.548368	of calculating the value	-0.124939
-0.646117	to hold the value	-0.124939
-0.646117	to reload the value	-0.124939
-0.351998	ebx restores the value	-0.124939
-0.876907	sure that a value	-0.124939
-0.580439	calculated from a value	-0.124939
-0.357920	each constant a value	-0.124939
-0.357920	tables Reading a value	-0.124939
-0.460984	little explanation. The value	-0.124939
-0.356842	clock counts. The value	-0.124939
-0.356842	when false. The value	-0.124939
-0.356842	63 . The value	-0.124939
-0.358590	are transferred by value	-0.124939
-0.537812	infinity, and this value	-0.124939
-0.356103	can subtract this value	-0.124939
-0.358073	for each different value	-0.124939
-0.781438	have no other value	-0.124939
-0.428157	produce no other value	-0.124939
-0.570041	have any other value	-0.124939
-1.675724	the floating point value	-0.124939
-0.571709	to the integer value	-0.124939
-0.435798	x∙xn-1, and each value	-0.124939
-0.522447	so that each value	-0.124939
-0.371306	account that each value	-0.124939
-0.435798	serial because each value	-0.124939
-0.435798	to calculate each value	-0.124939
-0.336904	} Here, each value	-0.124939
-0.357143	and its return value	-0.124939
-0.546669	for the new value	-0.124939
-0.575293	calculating a new value	-0.124939
-0.344532	replaced by its value	-0.124939
-0.344532	pointer then its value	-0.124939
-0.494689	off the binary value	-0.124939
-0.350682	A possible negative value	-0.124939
-0.563351	on the preceding value	-0.124939
-0.348230	minimum value maximum value	-0.124939
-0.560930	when the final value	-0.124939
-0.079280	from the previous value	-0.301030
-0.059069	of the absolute value	-0.124939
-0.059069	take the absolute value	-0.124939
-0.059069	calculate the absolute value	-0.124939
-0.336142	the four B value	-0.124939
-0.294061	size, bits minimum value	-0.124939
-0.294061	the four R value	-0.124939
-0.294061	an unused fourth value	-0.124939
-0.237772	Obviously, the initial value	-0.124939
-0.600282	bytes) of the objects	-0.124939
-0.598416	knowing that the objects	-0.124939
-1.168832	2 if the objects	-0.124939
-0.589976	necessary if the objects	-0.124939
-0.719367	for all the objects	-0.124939
-0.500555	stores all the objects	-0.124939
-0.500555	pool all the objects	-0.124939
-0.357836	zero whenever the objects	-0.124939
-1.205201	the number of objects	-0.425969
-0.189085	variable number of objects	-0.124939
-0.948124	total number of objects	-0.124939
-1.157406	the type of objects	-0.124939
-0.142652	the movements of objects	-0.124939
-0.142652	physical movements of objects	-0.124939
-0.266481	all variables and objects	-0.124939
-0.266481	many variables and objects	-0.124939
-0.266481	Such variables and objects	-0.124939
-0.113115	non-static variables and objects	-0.425969
-0.089539	stack Variables and objects	-0.124939
-0.089539	memory. Variables and objects	-0.124939
-0.089539	storage Variables and objects	-0.124939
-0.565937	of time. The objects	-0.124939
-0.358074	complexity (en.wikipedia.org/wiki/Standard_Template_Library). The objects	-0.124939
-0.986929	used only for objects	-0.124939
-0.462531	this principle for objects	-0.124939
-0.592963	than there are objects	-0.124939
-0.353349	details on when objects	-0.124939
-0.246927	becomes fragmented when objects	-0.124939
-0.246927	become fragmented when objects	-0.124939
-0.352029	biggest possible vector objects	-0.124939
-1.056937	// Define vector objects	-0.124939
-0.352029	not allow vector objects	-0.124939
-0.582805	area for different objects	-0.124939
-0.354864	to make different objects	-0.124939
-0.825624	contiguous with other objects	-0.124939
-0.140951	FIFO manner? If objects	-0.124939
-0.140951	FILO manner? If objects	-0.124939
-0.350954	numbered consecutively? If objects	-0.124939
-0.350624	needed before all objects	-0.124939
-0.245503	only after all objects	-0.124939
-0.245503	needed after all objects	-0.124939
-0.446356	Structure and class objects	-0.124939
-0.345291	to all class objects	-0.124939
-0.345291	Conversions involving class objects	-0.124939
-0.345291	undetected. Converting class objects	-0.124939
-0.351354	to store many objects	-0.124939
-0.351354	block containing many objects	-0.124939
-0.453506	detect if any objects	-0.124939
-0.350947	or remove any objects	-0.124939
-0.356444	needed, and new objects	-0.124939
-0.577525	inheritance by making objects	-0.124939
-0.459587	to align large objects	-0.124939
-0.343190	only for big objects	-0.124939
-0.343190	and other big objects	-0.124939
-0.129775	to all allocated objects	-0.124939
-0.129775	that all allocated objects	-0.124939
-0.402449	different dynamically allocated objects	-0.124939
-0.402449	small dynamically allocated objects	-0.124939
-0.854397	possible to store objects	-0.124939
-0.270805	used in shared objects	-0.124939
-0.270805	even when shared objects	-0.124939
-0.270805	to make shared objects	-0.124939
-0.354296	up 64-bit shared objects	-0.124939
-0.053543	also called shared objects	-0.425969
-0.455048	calculation of graphics objects	-0.124939
-0.495676	destructors for local objects	-0.124939
-0.311779	unique key. Do objects	-0.124939
-0.311779	hash map. Do objects	-0.124939
-0.489870	modular. The so-called objects	-0.124939
-0.445937	strings and similar objects	-0.124939
-0.445983	remove or modify objects	-0.124939
-0.341558	creation of temporary objects	-0.124939
-0.026971	Position-independent code Shared objects	-0.124939
-0.026971	load time. Shared objects	-0.124939
-0.026971	bit Linux Shared objects	-0.124939
-0.006587	explained below. Shared objects	-0.425969
-0.026971	in BSD Shared objects	-0.124939
-0.026971	local references. Shared objects	-0.124939
-0.269132	simplest cases, composite objects	-0.124939
-0.269132	for transferring composite objects	-0.124939
-0.473340	preferred to declare objects	-0.124939
-0.048359	compile time. Are objects	-0.124939
-0.048359	top-of-stack index. Are objects	-0.124939
-0.048359	too small. Are objects	-0.124939
-0.048359	list. 94 Are objects	-0.124939
-0.237698	or void. Returning objects	-0.124939
-0.463351	is compact and takes	-0.124939
-1.063600	the one that takes	-0.124939
-0.755009	container class that takes	-0.124939
-0.903970	function library that takes	-0.124939
-0.535490	one version that takes	-0.124939
-0.945813	a way that takes	-0.124939
-0.458841	Any task that takes	-0.124939
-0.532167	time that it takes	-0.124939
-0.532167	shows that it takes	-0.124939
-0.532167	show that it takes	-0.124939
-0.739934	more than it takes	-0.124939
-0.047632	the time it takes	-0.970037
-0.010056	The time it takes	-1.079181
-0.827364	optimal because it takes	-0.124939
-0.421364	consuming. Sometimes it takes	-0.124939
-0.421364	loading Often, it takes	-0.124939
-0.325371	cached. Usually it takes	-0.124939
-1.070305	and the code takes	-0.124939
-0.784919	that the compiler takes	-0.124939
-0.782453	is used. It takes	-0.124939
-0.352803	is started. It takes	-0.124939
-0.352803	in green. It takes	-0.124939
-0.579302	deallocation of memory takes	-0.124939
-0.562608	variable in memory takes	-0.124939
-1.184509	If the program takes	-0.124939
-0.358096	A branch instruction takes	-0.124939
-0.358000	The unrolled loop takes	-0.124939
-0.764931	double to integer takes	-0.124939
-0.576444	to an integer takes	-0.124939
-0.741858	an unsigned integer takes	-0.124939
-0.541691	example, a double takes	-0.124939
-1.223262	float or double takes	-0.124939
-0.539733	The 'this' pointer takes	-0.124939
-0.357490	or structure object takes	-0.124939
-0.357408	assembly language. C++ takes	-0.124939
-0.580816	Copying the table takes	-0.124939
-0.348294	size conversion often takes	-0.124939
-0.348294	hard disk often takes	-0.124939
-0.356165	point constant always takes	-0.124939
-0.857696	Long double precision takes	-0.124939
-1.011506	a linked list takes	-0.124939
-0.407890	unsigned. This typically takes	-0.124939
-0.314536	function pointer typically takes	-0.124939
-0.314536	without SSE2 typically takes	-0.124939
-0.297437	time. Integer multiplication takes	-0.124939
-0.297437	multiplication Integer multiplication takes	-0.124939
-0.575015	that exception handling takes	-0.124939
-0.353172	#define directive never takes	-0.124939
-0.453579	Typically, the conversion takes	-0.124939
-0.291691	int)i; This conversion takes	-0.124939
-0.610403	the type conversion takes	-0.124939
-0.291691	the integer-to-float conversion takes	-0.124939
-0.815391	Floating point division takes	-0.124939
-0.404864	division Integer division takes	-0.124939
-0.404864	microprocessors. Integer division takes	-0.124939
-0.745011	floating point addition takes	-0.124939
-0.404318	Floating point addition takes	-0.124939
-0.641159	floating point operation takes	-0.124939
-0.056994	write that something takes	-0.425969
-0.441611	A runtime DLL takes	-0.124939
-0.715994	and garbage collection takes	-0.124939
-0.339194	reading them again takes	-0.124939
-0.336040	unfortunate because truncation takes	-0.124939
-0.331737	clock cycles. Division takes	-0.124939
-0.293950	the microprocessor. Multiplication takes	-0.124939
-0.293950	that the branching takes	-0.124939
-0.293950	then it obviously takes	-0.124939
-1.609046	This is the variable	-0.124939
-1.002648	address of the variable	-0.124939
-0.895002	set in the variable	-0.124939
-0.873419	possibility that the variable	-0.124939
-0.586971	assumption that the variable	-0.124939
-1.347536	even if the variable	-0.124939
-1.423485	rather than the variable	-0.124939
-1.038132	the time the variable	-0.124939
-0.592523	declared. If the variable	-0.124939
-0.524047	in which the variable	-0.425969
-0.579759	edx but the variable	-0.124939
-0.838476	makes sure the variable	-0.124939
-0.948144	to write the variable	-0.124939
-0.312600	__intel_cpu_features_init() sets the variable	-0.124939
-0.312600	similarly sets the variable	-0.124939
-0.810733	to give the variable	-0.124939
-0.653535	optimize away the variable	-0.124939
-0.355751	of transferring the variable	-0.124939
-0.459599	union forces the variable	-0.124939
-0.355751	to fetch the variable	-0.124939
-0.561505	optimizations of a variable	-0.124939
-0.561505	range of a variable	-0.124939
-0.561505	collection of a variable	-0.124939
-0.561505	scope of a variable	-0.124939
-1.039358	reference to a variable	-0.124939
-0.554651	writing to a variable	-0.124939
-0.812942	added to a variable	-0.124939
-0.716545	sure that a variable	-0.124939
-0.498847	specifies that a variable	-0.124939
-0.498847	tells that a variable	-0.124939
-0.555832	used by a variable	-0.124939
-1.286010	division by a variable	-0.124939
-0.583897	NOT on a variable	-0.124939
-0.846184	value from a variable	-0.124939
-1.687560	to make a variable	-0.124939
-0.532145	you access a variable	-0.124939
-0.818155	or writing a variable	-0.124939
-0.518497	When accessing a variable	-0.124939
-0.496663	Efficiency Accessing a variable	-0.124939
-0.352943	compiler treat a variable	-0.124939
-0.539999	similar objects of variable	-0.124939
-0.503061	few arrays of variable	-0.124939
-0.717253	different kinds of variable	-0.124939
-0.157467	Different kinds of variable	-0.124939
-0.241638	function names and variable	-0.124939
-0.819875	the function or variable	-0.124939
-0.497482	any function or variable	-0.124939
-0.504234	strings typically have variable	-0.124939
-0.446631	of data A variable	-0.124939
-0.789777	compile time. A variable	-0.124939
-0.345509	a pointer. A variable	-0.124939
-0.345509	one thread. A variable	-0.124939
-0.633434	very expensive. A variable	-0.124939
-0.345509	(see below). A variable	-0.124939
-0.462459	to some other variable	-0.124939
-1.688803	a floating point variable	-0.124939
-1.130773	more than one variable	-0.124939
-0.587172	replacing an integer variable	-0.124939
-0.881460	sure that no variable	-0.124939
-0.524717	A class member variable	-0.124939
-0.349628	a global const variable	-0.124939
-0.349628	a local const variable	-0.124939
-0.760783	of an unsigned variable	-0.124939
-0.659934	as a register variable	-0.124939
-0.463682	temp a register variable	-0.124939
-0.544831	store the shared variable	-0.124939
-0.718214	of a signed variable	-0.124939
-0.051623	of the induction variable	-0.124939
-0.051623	use the induction variable	-0.124939
-0.051623	make the induction variable	-0.124939
-0.230363	by an induction variable	-0.124939
-0.146696	making an induction variable	-0.124939
-0.177977	the same induction variable	-0.124939
-0.177977	and no induction variable	-0.124939
-0.177977	a second induction variable	-0.124939
-0.038131	// Update induction variable	-0.124939
-0.232887	to a public variable	-0.124939
-0.232887	whenever a public variable	-0.124939
-0.199111	to a global variable	-0.124939
-0.199111	as a global variable	-0.124939
-0.199111	when a global variable	-0.124939
-0.721917	in a temporary variable	-0.124939
-0.339352	access the saved variable	-0.124939
-0.549978	multiply integers of any	-0.124939
-0.358424	declared outside of any	-0.124939
-0.556588	comparing it to any	-0.124939
-0.503840	or writes to any	-0.124939
-0.358086	123 correspond to any	-0.124939
-0.358605	copy constructors, and any	-0.124939
-1.038436	be used in any	-0.124939
-0.358318	execution speed in any	-0.124939
-1.628795	be used for any	-0.124939
-0.357316	that works for any	-0.124939
-0.357316	static linking for any	-0.124939
-0.356191	instruction set or any	-0.124939
-0.356191	to call or any	-0.124939
-0.356191	specific bottleneck or any	-0.124939
-0.357324	to detect if any	-0.124939
-0.357324	as true, if any	-0.124939
-0.106376	not accessed by any	-0.425969
-0.458104	this line by any	-0.124939
-0.354574	is bypassed by any	-0.124939
-0.502371	to work with any	-0.124939
-0.461230	will interfere with any	-0.124939
-0.358446	run optimally on any	-0.124939
-1.202069	as efficient as any	-0.124939
-0.578540	8, but not any	-0.124939
-0.881521	times faster than any	-0.124939
-0.536815	processors can have any	-0.124939
-0.356028	the inputs have any	-0.124939
-0.584524	union can use any	-0.124939
-0.977812	be called from any	-0.124939
-0.748212	be accessed from any	-0.124939
-0.352434	not referenced from any	-0.124939
-0.358088	be added at any	-0.124939
-1.094665	This will make any	-0.124939
-0.522922	you cannot make any	-0.124939
-0.354427	execution units. If any	-0.124939
-0.354427	identification (RTTI) If any	-0.124939
-0.462317	be used, but any	-0.124939
-1.266807	compiler to do any	-0.124939
-0.357392	case" counts. In any	-0.124939
-0.356907	should never return any	-0.124939
-0.135754	operations and before any	-0.124939
-0.135754	running and before any	-0.124939
-0.334318	EMMS instruction before any	-0.124939
-0.500202	function _mm256_zeroupper() before any	-0.124939
-0.064516	that doesn't call any	-0.425969
-0.356704	and doesn't take any	-0.124939
-0.876892	does not need any	-0.124939
-0.523692	they don't need any	-0.124939
-0.515384	are compiled without any	-0.124939
-0.337570	old microprocessors without any	-0.124939
-0.337570	used freely without any	-0.124939
-0.139549	it cannot access any	-0.124939
-0.139549	function cannot access any	-0.124939
-0.344735	am not making any	-0.124939
-0.344735	should avoid making any	-0.124939
-0.874106	you should avoid any	-0.124939
-0.651617	will not get any	-0.124939
-0.338612	class (CGrandParent) contains any	-0.124939
-0.338612	class (CParent<>) contains any	-0.124939
-0.354490	dispatching and run any	-0.124939
-0.351716	up and calling any	-0.124939
-0.351397	it doesn't generate any	-0.124939
-0.547433	that can reduce any	-0.124939
-0.075322	do not produce any	-0.124939
-0.036030	does not produce any	-0.425969
-0.348956	variables defined outside any	-0.124939
-0.348012	At this time, any	-0.124939
-0.347922	type-casting without adding any	-0.124939
-0.346145	You may insert any	-0.124939
-0.346092	cache from loading any	-0.124939
-0.343334	should not include any	-0.124939
-0.241085	there is hardly any	-0.124939
-0.155520	integers with hardly any	-0.124939
-0.155520	This has hardly any	-0.124939
-0.155520	there was hardly any	-0.124939
-0.339117	add or remove any	-0.124939
-0.473095	error by avoiding any	-0.124939
-0.420952	going to recommend any	-0.124939
-0.574750	does not alias any	-0.124939
-0.314280	will never throw any	-0.124939
-0.382305	time and resolve any	-0.124939
-0.382305	have to obey any	-0.124939
-0.237527	possible to express any	-0.124939
-0.237527	The profiler identifies any	-0.124939
-0.237527	destructor that destroys any	-0.124939
-0.561963	uncached memory and we	-0.124939
-0.358179	of range and we	-0.124939
-0.881816	conclusion is that we	-0.124939
-0.506331	vector so that we	-0.124939
-0.506331	negative so that we	-0.124939
-0.506331	12.4a so that we	-0.124939
-0.506331	factorials so that we	-0.124939
-0.350159	so large that we	-0.124939
-0.706594	cache line that we	-0.124939
-0.350159	CPUID information that we	-0.124939
-0.350159	clock cycles that we	-0.124939
-0.492793	the optimizations that we	-0.124939
-0.350159	Assume now that we	-0.124939
-0.350159	14.27 assumes that we	-0.124939
-0.350159	to multithreading that we	-0.124939
-0.598198	is the function we	-0.124939
-0.497576	the code if we	-0.124939
-0.353598	manually, but if we	-0.124939
-1.355462	For example, if we	-0.124939
-0.353598	same result if we	-0.124939
-0.353598	becomes easier if we	-0.124939
-1.521208	at the time we	-0.124939
-0.355754	to understand when we	-0.124939
-0.355754	be evicted when we	-0.124939
-0.342284	odd number then we	-0.124939
-0.442564	1000 times then we	-0.124939
-0.342284	a result then we	-0.124939
-0.627197	clock cycles, then we	-0.124939
-0.342284	constant n, then we	-0.124939
-0.342284	= 10000, then we	-0.124939
-0.342284	or C2, then we	-0.124939
-1.260704	This is because we	-0.124939
-0.490485	run faster because we	-0.124939
-0.450405	anything here because we	-0.124939
-0.348497	example 9.5 because we	-0.124939
-0.347419	local references. If we	-0.124939
-0.347419	loaded anyway. If we	-0.124939
-0.347419	= n∙(n-1)!. If we	-0.124939
-0.347419	: 2.5f; If we	-0.124939
-0.549155	library function which we	-0.124939
-0.461877	to this number we	-0.124939
-1.240958	in cases where we	-0.124939
-0.337767	on x so we	-0.124939
-0.337767	the cache so we	-0.124939
-0.337767	this case so we	-0.124939
-0.337767	1024 bytes, so we	-0.124939
-0.356900	is evicted before we	-0.124939
-1.197276	In this example, we	-0.124939
-0.546766	In 64-bit systems we	-0.124939
-0.559149	not the case we	-0.124939
-1.033752	In this case we	-0.124939
-0.342853	matical applications. But we	-0.124939
-0.342853	= 1.23456. But we	-0.124939
-0.335359	64-bit code. However, we	-0.124939
-0.335359	CPU models. However, we	-0.124939
-0.455733	In these examples we	-0.124939
-0.541139	list[j].c; } Here, we	-0.124939
-0.814106	the cache lines we	-0.124939
-0.450789	number of constants we	-0.124939
-0.308115	the cache. When we	-0.124939
-0.308115	to 100000000. When we	-0.124939
-0.556503	The first thing we	-0.124939
-0.304168	The second thing we	-0.124939
-0.445835	In the future we	-0.124939
-0.344929	each process. Obviously, we	-0.124939
-0.287322	is double. Here we	-0.124939
-0.287322	+ a2/b2; Here we	-0.124939
-0.441468	n = 4, we	-0.124939
-0.438554	64 bit mode, we	-0.124939
-0.339026	to be available, we	-0.124939
-0.325093	or CPU cores, we	-0.124939
-0.324973	assembly language". While we	-0.124939
-0.324973	= pow(x,n) As we	-0.124939
-0.212062	-fpic option. Then we	-0.124939
-0.212062	in F1? Then we	-0.124939
-0.314348	Using hexadecimal numbers, we	-0.124939
-0.314505	In example 7.4 we	-0.124939
-0.314348	a problem since we	-0.124939
-0.293848	loop by four, we	-0.124939
-0.293848	rules of algebra, we	-0.124939
-0.237584	of data decomposition, we	-0.124939
-0.237584	fast. The lesson we	-0.124939
-0.237584	a PC. Similarly, we	-0.124939
-0.237584	than -156. Surprisingly, we	-0.124939
-0.237584	in example 14.7b, we	-0.124939
-0.237584	than 200. Next, we	-0.124939
-0.237584	places back. Thus, we	-0.124939
-0.237584	GHz CPU. Should we	-0.124939
-0.462672	may be of some	-0.124939
-0.995660	a list of some	-0.124939
-1.046240	take care of some	-0.124939
-0.574246	count up to some	-0.124939
-0.724197	is identical to some	-0.124939
-0.357819	software programmers to some	-0.124939
-0.357819	gives rise to some	-0.124939
-0.570920	string functions and some	-0.124939
-0.525573	16-bit mode and some	-0.124939
-0.357345	hard-to-find errors, and some	-0.124939
-0.357345	dispatch mechanisms, and some	-0.124939
-0.351745	platforms, and in some	-0.124939
-0.351745	level, and in some	-0.124939
-0.547621	(but not in some	-0.124939
-0.060182	compiler may in some	-0.425969
-0.130067	It may in some	-0.124939
-0.130067	declaration may in some	-0.124939
-0.580345	now used in some	-0.124939
-0.801073	more efficient in some	-0.124939
-0.355368	less efficient in some	-0.124939
-0.642083	is possible in some	-0.124939
-0.349944	improve performance in some	-0.124939
-1.156847	be useful in some	-0.124939
-0.706106	efficient solution in some	-0.124939
-0.452237	program structure in some	-0.124939
-0.492494	improve optimizations in some	-0.124939
-0.642083	at least in some	-0.124939
-0.349944	be expensive in some	-0.124939
-0.349944	slight imprecision in some	-0.124939
-0.349944	the iterator in some	-0.124939
-0.356766	this section for some	-0.124939
-0.524107	time, except for some	-0.124939
-0.356766	by 5-10% for some	-0.124939
-0.356766	newsgroup comp.lang.asm.x86 for some	-0.124939
-0.655222	to avoid that some	-0.124939
-0.356600	is true that some	-0.124939
-0.356600	loop buffer that some	-0.124939
-0.655222	will notice that some	-0.124939
-0.883450	but there are some	-0.124939
-0.357882	language. Here are some	-0.124939
-0.358658	by giving it some	-0.124939
-1.331599	is supported by some	-0.124939
-0.357264	are combined by some	-0.124939
-0.523810	compiler comes with some	-0.124939
-0.928510	of compatibility with some	-0.124939
-0.355748	through 14, with some	-0.124939
-0.163824	well only on some	-0.124939
-0.354120	than normal on some	-0.124939
-0.354120	the IDE on some	-0.124939
-0.358427	similar solutions may some	-0.124939
-0.358332	it does have some	-0.124939
-0.568763	Codeplay compiler has some	-0.124939
-0.717936	pointer. It has some	-0.124939
-0.762504	you may make some	-0.124939
-0.595107	decide to do some	-0.124939
-0.287311	detection function In some	-0.124939
-0.374382	critical function. In some	-0.124939
-0.287311	the program. In some	-0.124939
-0.407771	graphics calculations. In some	-0.124939
-0.287311	Loop unrolling In some	-0.124939
-0.287311	array element. In some	-0.124939
-0.287311	software users. In some	-0.124939
-0.287311	optimal, though. In some	-0.124939
-0.287311	microprocessors have. In some	-0.124939
-0.287311	to mind. In some	-0.124939
-0.287311	Journal, 2002). In some	-0.124939
-0.287311	mispredictions. 44 In some	-0.124939
-0.287311	page 34. In some	-0.124939
-0.287311	its API. In some	-0.124939
-0.502801	green. It takes some	-0.124939
-0.593794	enough. For example, some	-0.124939
-0.356381	list points out some	-0.124939
-0.750669	that it does some	-0.124939
-0.545570	Each compiler does some	-0.124939
-0.498957	C++ for doing some	-0.124939
-0.352203	table can give some	-0.124939
-0.933011	compilers can reduce some	-0.124939
-0.349682	I have described some	-0.124939
-0.349616	= 2; Unfortunately, some	-0.124939
-0.762243	have to save some	-0.124939
-0.345053	vector instructions SSE4.1 some	-0.124939
-0.339201	very common. Even some	-0.124939
-0.325142	than others. While some	-0.124939
-0.382577	following sections describe some	-0.124939
-1.056194	This function is so	-0.124939
-1.081031	of code is so	-0.124939
-0.567866	such code is so	-0.124939
-0.850969	response time is so	-0.124939
-1.308124	the object is so	-0.124939
-0.501041	point variables is so	-0.124939
-0.356083	meta- programming is so	-0.124939
-1.058376	a matrix is so	-0.124939
-0.875565	the syntax is so	-0.124939
-0.501041	this effect is so	-0.124939
-0.358562	own caller, and so	-0.124939
-1.441086	are used in so	-0.124939
-1.133681	there may be so	-0.124939
-0.583351	tolerance may be so	-0.124939
-0.591949	lists that are so	-0.124939
-0.500025	level-2 cache are so	-0.124939
-0.592628	function. There are so	-0.124939
-0.825234	Modern CPUs are so	-0.124939
-0.652749	and c[i] are so	-0.124939
-0.358580	directives around it so	-0.124939
-0.358511	the thread function so	-0.124939
-0.597818	reorganize the code so	-0.124939
-0.911718	the critical code so	-0.124939
-0.358464	depend on x so	-0.124939
-0.797431	ahead of time so	-0.124939
-1.442948	at compile time so	-0.124939
-0.724586	a 128-bit vector so	-0.124939
-0.357839	and b different so	-0.124939
-0.584759	in the cache so	-0.124939
-0.516894	optimal to do so	-0.124939
-0.114608	Failure to do so	-0.301030
-0.623677	will not do so	-0.124939
-1.273361	in this example so	-0.124939
-0.555081	allowed in C++ so	-0.124939
-0.356833	the same address so	-0.124939
-0.845999	each function call so	-0.124939
-0.356741	one register less so	-0.124939
-1.461813	the sign bit so	-0.124939
-1.003685	accessed through pointers so	-0.124939
-0.356055	two 64-bit operations so	-0.124939
-1.195730	in this case so	-0.124939
-0.355875	automatically or does so	-0.124939
-0.545409	through the calculations so	-0.124939
-0.459165	in separate threads so	-0.124939
-0.355275	throw any exception so	-0.124939
-0.716015	in 32-bit mode so	-0.124939
-0.352852	a reliable source so	-0.124939
-0.644081	at the start so	-0.124939
-0.350375	never be negative so	-0.124939
-1.084515	the code section so	-0.124939
-0.349841	after this statement so	-0.124939
-0.526690	in the code, so	-0.124939
-0.348592	operators are inlined so	-0.124939
-0.489520	like a macro so	-0.124939
-0.347749	it by 100 so	-0.124939
-0.347044	power of 2, so	-0.124939
-0.448639	14.9 is changed so	-0.124939
-0.447290	constants are identical so	-0.124939
-0.485303	excessive loop unrolling so	-0.124939
-0.114252	should be organized so	-0.425969
-0.506368	is template metaprogramming so	-0.124939
-0.743091	is an integer, so	-0.124939
-0.341504	to be designed so	-0.124939
-0.338965	time in thousand so	-0.124939
-0.520326	normalized, if possible, so	-0.124939
-0.506528	code more compact so	-0.124939
-0.434300	as explained above, so	-0.124939
-0.335819	standard specifies truncation so	-0.124939
-0.477923	precision by default, so	-0.124939
-0.420878	modify example 9.5 so	-0.124939
-0.314202	is 32 bits, so	-0.124939
-0.314202	on automatic prefetching so	-0.124939
-0.314202	like a parameter, so	-0.124939
-0.382214	in example 12.4a so	-0.124939
-0.293709	is 1024 bytes, so	-0.124939
-0.293709	the reciprocal factorials so	-0.124939
-0.237462	and task switches; so	-0.124939
-0.237462	market is developing so	-0.124939
-0.237462	seven significant digits, so	-0.124939
-0.237462	the value 0x2C so	-0.124939
-0.237462	of class C1, so	-0.124939
-1.959472	is that the variables	-0.124939
-0.557382	will choose the variables	-0.124939
-0.894027	limited number of variables	-0.124939
-0.592676	also used for variables	-0.124939
-1.128146	is intended for variables	-0.124939
-1.043764	making sure that variables	-0.124939
-0.834946	the operands are variables	-0.124939
-0.526644	registers, not on variables	-0.124939
-0.786605	Do not make variables	-0.124939
-0.357856	can cause other variables	-0.124939
-1.217426	of floating point variables	-0.124939
-0.801793	in floating point variables	-0.124939
-0.192859	manipulating floating point variables	-0.124939
-0.410640	systems. Floating point variables	-0.124939
-0.410640	expressions. Floating point variables	-0.124939
-0.158493	7.3 Floating point variables	-0.425969
-0.956682	to predict which variables	-0.124939
-0.572001	is that all variables	-0.124939
-0.939488	most often used variables	-0.124939
-0.715949	most commonly used variables	-0.124939
-0.503352	conclude that most variables	-0.124939
-0.724538	used for multiple variables	-0.124939
-0.502951	global and static variables	-0.124939
-0.525450	program contains many variables	-0.124939
-0.448300	candidates for register variables	-0.124939
-0.317670	to make register variables	-0.124939
-0.235133	floating point register variables	-0.301030
-0.317670	of integer register variables	-0.124939
-0.356701	to understand how variables	-0.124939
-0.356324	by setting these variables	-0.124939
-0.355957	space. Putting simple variables	-0.124939
-0.756620	pointers to its variables	-0.124939
-0.355402	risk that several variables	-0.124939
-0.401167	four single precision variables	-0.124939
-0.401167	eight single precision variables	-0.124939
-0.354086	may add counter variables	-0.124939
-0.353801	integer. 158 Integer variables	-0.124939
-0.272956	operations with Boolean variables	-0.124939
-0.272956	that have Boolean variables	-0.124939
-0.272956	for true. Boolean variables	-0.124939
-0.272956	are overdetermined Boolean variables	-0.124939
-0.272956	is invalid. Boolean variables	-0.124939
-0.194249	method of induction variables	-0.124939
-0.194249	subexpressions, and induction variables	-0.124939
-0.263111	polynomial with induction variables	-0.124939
-0.194249	to use induction variables	-0.124939
-0.194249	not make induction variables	-0.124939
-0.086353	floating point induction variables	-0.124939
-0.086353	Floating point induction variables	-0.124939
-0.263111	the two induction variables	-0.124939
-0.194249	doesn't need induction variables	-0.124939
-0.597621	functions and public variables	-0.124939
-0.326686	can't have public variables	-0.124939
-0.290562	static and global variables	-0.124939
-0.290562	preferably avoid global variables	-0.124939
-0.290562	variables. All global variables	-0.124939
-0.314565	are used. Such variables	-0.124939
-0.314565	or __declspec(thread). Such variables	-0.124939
-0.311498	data and local variables	-0.124939
-0.311498	with other local variables	-0.124939
-0.346225	one for initialized variables	-0.124939
-0.346142	16-bit Windows, allow variables	-0.124939
-0.374601	Likewise, all non-static variables	-0.124939
-0.287490	constructed. All non-static variables	-0.124939
-0.108340	9; } Induction variables	-0.124939
-0.108340	array elements Induction variables	-0.124939
-0.108340	integer expressions Induction variables	-0.124939
-0.108340	code motion Induction variables	-0.124939
-0.108340	} 70 Induction variables	-0.124939
-0.335989	/ 4; Register variables	-0.124939
-0.335915	global variables (i.e. variables	-0.124939
-0.336063	can access internal variables	-0.124939
-0.093225	classes. 7.2 Integers variables	-0.124939
-0.093225	26 7.2 Integers variables	-0.124939
-0.314397	the class. Storing variables	-0.124939
-0.172509	any function. Global variables	-0.124939
-0.172509	avoid it. Global variables	-0.124939
-0.293894	one for uninitialized variables	-0.124939
-0.237625	The two summation variables	-0.124939
-0.835297	by copying the return	-0.124939
-0.358740	can overwrite the return	-0.124939
-1.124153	an explanation of return	-0.124939
-0.594822	exceptions is to return	-0.124939
-1.801816	is recommended to return	-0.124939
-0.358144	from a=a*2; to return	-0.124939
-0.143044	the call and return	-0.124939
-0.143044	of call and return	-0.124939
-1.104896	with new and return	-0.124939
-0.358702	return types The return	-0.124939
-0.358700	an error can return	-0.124939
-0.460646	return 0; // return	-0.124939
-0.460646	// x^10 // return	-0.124939
-0.356575	ifbit=1 bitofn // return	-0.124939
-1.180560	Make the function return	-0.124939
-1.266906	as a function return	-0.124939
-0.356136	for storing function return	-0.124939
-0.504580	The function may return	-0.124939
-0.982997	} else { return	-0.124939
-0.171180	const x) { return	-0.124939
-0.291263	(int x) { return	-0.124939
-0.056115	(float x) { return	-0.301030
-0.121659	p(double x) { return	-0.124939
-0.121659	xpow10(double x) { return	-0.124939
-0.196289	Func1(int x) { return	-0.124939
-0.196289	Func2(double x) { return	-0.124939
-0.509368	& b) { return	-0.124939
-0.205085	* p) { return	-0.425969
-1.241947	if (b) { return	-0.124939
-0.127130	& a) { return	-0.124939
-0.127130	(float a) { return	-0.124939
-0.307638	int m) { return	-0.124939
-0.307638	int Size() { return	-0.124939
-0.526221	library function will return	-0.124939
-0.569862	>>= 1; } return	-0.124939
-0.770433	* 3; } return	-0.124939
-0.350233	n factorial } return	-0.124939
-0.350233	four x^n } return	-0.124939
-0.339605	the chosen version return	-0.124939
-0.137420	to dispatched version return	-0.124939
-0.137420	in dispatched version return	-0.124939
-0.339605	// Default version return	-0.124939
-1.715681	power of 2 return	-0.124939
-0.517825	8*x + 2 return	-0.124939
-0.355944	// No error return	-0.124939
-0.459722	side-effects and its return	-0.124939
-0.355820	then it must return	-0.124939
-0.444541	from stack ; return	-0.124939
-0.792138	unused label ; return	-0.124939
-0.355572	f *= i; return	-0.124939
-0.137344	50 7.16 Function return	-0.124939
-0.137344	systems". 7.16 Function return	-0.124939
-0.620611	// SSE2 supported return	-0.124939
-0.620611	// AVX supported return	-0.124939
-0.136336	CriticalFunction(b, c); ... return	-0.124939
-0.136336	(*CriticalFunction)(b, c); ... return	-0.124939
-0.979772	b + 1; return	-0.124939
-0.750069	function should never return	-0.124939
-0.378808	a * 2; return	-0.425969
-0.347500	a * 3; return	-0.425969
-0.335986	using the normal return	-0.124939
-0.331669	ptr n; #endif return	-0.124939
-0.331596	error reporting here: return	-0.124939
-0.237682	x10 = x8*x2; return	-0.124939
-0.237682	(N & (N-1)) return	-0.124939
-0.237682	-2.0, 4.4, 2.5}; return	-0.124939
-0.237682	(see page 134) return	-0.124939
-0.237682	execution by causing return	-0.124939
-0.237682	clock = __rdtsc(); return	-0.124939
-0.237682	= _mm_hadd_ps(s, s); return	-0.124939
-0.960417	clock frequency is 2	-0.124939
-0.591757	μs on a 2	-0.124939
-0.811826	exp function of 2	-0.124939
-0.075400	a power of 2	-0.321233
-0.149151	for powers of 2	-0.124939
-0.044259	using powers of 2	-0.124939
-0.149151	avoid powers of 2	-0.124939
-0.659008	be reduced to 2	-0.124939
-0.358500	256 Kbytes to 2	-0.124939
-0.594074	b will be 2	-0.124939
-0.356594	int a; // 2	-0.124939
-0.721355	int d; // 2	-0.124939
-0.356594	at 13 // 2	-0.124939
-0.834851	int x = 2	-0.124939
-0.757051	The multiplication by 2	-0.124939
-0.459872	= divide by 2	-0.124939
-0.355966	before dividing by 2	-0.124939
-0.572977	typically 0 - 2	-0.124939
-0.585396	uses more than 2	-0.124939
-0.568641	arrays bigger than 2	-0.124939
-0.657384	bytes = double 2	-0.124939
-1.330583	= a + 2	-0.124939
-0.161961	? c + 2	-0.425969
-0.343056	- 8*x + 2	-0.124939
-0.357317	if (u.i * 2	-0.124939
-0.350362	or unsigned 2 2	-0.124939
-0.452765	Iu16vec4 32 2 2	-0.124939
-0.357092	It takes between 2	-0.124939
-0.451181	execution time. 4 2	-0.124939
-0.349110	optimizing ............................................................................................... 4 2	-0.124939
-0.952622	signed or unsigned 2	-0.124939
-0.457999	SSE double 64 2	-0.124939
-0.837993	long long 64 2	-0.124939
-0.457999	32 4 64 2	-0.124939
-0.324828	I64vec2 Vec2q 64 2	-0.124939
-0.324828	Iu32vec4 Vec4ui 64 2	-0.124939
-0.522244	MMX int 32 2	-0.124939
-0.346364	64 Iu16vec4 32 2	-0.124939
-0.354985	* 5 / 2	-0.124939
-0.520662	2;} // add 2	-0.124939
-0.192561	// INSTRSET == 2	-0.124939
-0.085694	#if INSTRSET == 2	-0.124939
-0.347268	if (i % 2	-0.124939
-0.346334	an address below 2	-0.124939
-0.248542	16383 one fraction 2	-0.124939
-0.106729	127 1 fraction 2	-0.124939
-0.106729	1023 1 fraction 2	-0.124939
-0.054832	= ((x2) 2) 2	-0.124939
-0.621276	bytes = int64_t 2	-0.124939
-0.051238	i); // Add 2	-0.425969
-0.109324	bytes Intel Core 2	-0.124939
-0.109324	operands Intel Core 2	-0.124939
-0.109324	op. Intel Core 2	-0.124939
-0.314513	Difficult cases........................................................................................................ 124 2	-0.124939
-0.237723	data can exceed 2	-0.124939
-0.358475	an error. // You	-0.124939
-0.358170	// Read time You	-0.124939
-0.358117	} 111 } You	-0.124939
-1.000385	for intrinsic functions You	-0.124939
-1.004558	faster if unsigned You	-0.124939
-0.356354	the instruction code. You	-0.124939
-0.917251	at a time. You	-0.124939
-0.489301	lot of time. You	-0.124939
-0.554005	for member functions. You	-0.124939
-0.353859	of memory used. You	-0.124939
-0.353556	a ^ 1; You	-0.124939
-0.517875	and more efficient. You	-0.124939
-0.535251	is less efficient. You	-0.124939
-0.352369	the guidelines below. You	-0.124939
-1.300116	function is called. You	-0.124939
-0.925868	in the compiler. You	-0.124939
-0.452778	the 'this' pointer. You	-0.124939
-0.347745	two different registers. You	-0.124939
-0.438655	several clock cycles. You	-0.124939
-0.438655	500 clock cycles. You	-0.124939
-0.453885	use vector operations. You	-0.124939
-0.307789	floating point operations. You	-0.124939
-0.488159	95 not needed. You	-0.124939
-0.346884	the base classes. You	-0.124939
-0.346950	terminating a thread. You	-0.124939
-0.734534	for double precision. You	-0.124939
-0.345869	a graceful way. You	-0.124939
-0.634132	elements per vector. You	-0.124939
-0.344497	impossible with references. You	-0.124939
-0.344732	and which not. You	-0.124939
-1.147947	dynamic memory allocation. You	-0.124939
-0.480309	elements to zero. You	-0.124939
-0.441117	of the software. You	-0.124939
-0.885506	in this case. You	-0.124939
-0.338749	rarely needed anyway. You	-0.124939
-0.917969	divisible by 16. You	-0.124939
-0.473038	for the application. You	-0.124939
-0.335845	few pitfalls here. You	-0.124939
-0.335717	of the result. You	-0.124939
-0.614628	the preceding one. You	-0.124939
-0.643508	do not overlap. You	-0.124939
-0.255419	and b overlap. You	-0.124939
-0.242015	different CPU cores. You	-0.124939
-0.242015	multiple CPU cores. You	-0.124939
-0.331202	to error handling. You	-0.124939
-0.331202	use denormal numbers. You	-0.124939
-0.606087	of this manual. You	-0.124939
-0.479416	from other modules. You	-0.124939
-0.331202	something about them. You	-0.124939
-0.331202	the program itself. You	-0.124939
-0.324892	is not expensive. You	-0.124939
-0.314086	of Boolean operands. You	-0.124939
-0.314086	for positive n. You	-0.124939
-0.314086	4) + a. You	-0.124939
-0.382077	with a debugger. You	-0.124939
-0.293598	execution of CriticalFunction. You	-0.124939
-0.382077	7.35 page 52. You	-0.124939
-0.382077	algorithm in question. You	-0.124939
-0.293598	particular processor model. You	-0.124939
-0.382077	different test examples. You	-0.124939
-0.382077	cannot be shared. You	-0.124939
-0.293598	to 155 test. You	-0.124939
-0.537782	are mutually incompatible. You	-0.124939
-0.537782	on page 72. You	-0.124939
-0.293598	systems available today. You	-0.124939
-0.293598	questions to me. You	-0.124939
-0.237365	element level 108 You	-0.124939
-0.237365	referencing it twice. You	-0.124939
-0.237365	// Print heading You	-0.124939
-0.237365	-fwrapv or -fno-strict-overflow. You	-0.124939
-0.237365	the variable __intel_cpu_feature_indicator_x. You	-0.124939
-0.237365	prevent cache contention. You	-0.124939
-0.237365	directives are compiler-specific. You	-0.124939
-0.237365	bits (rarely 64). You	-0.124939
-0.237365	are not used). You	-0.124939
-0.237365	effects into account. You	-0.124939
-0.237365	is too late. You	-0.124939
-0.237365	the optimization job. You	-0.124939
-0.237365	a GOT entry. You	-0.124939
-0.237365	a bad dilemma. You	-0.124939
-0.237365	window or makefile. You	-0.124939
-1.061514	copy of the table	-0.124939
-0.596091	declaration of the table	-0.124939
-0.598678	Numbers in the table	-0.124939
-1.590187	sure that the table	-0.124939
-0.597219	fast if the table	-0.124939
-0.883294	time than the table	-0.124939
-0.429840	to calculate the table	-0.124939
-0.563222	will store the table	-0.124939
-0.721124	to copy the table	-0.124939
-0.721124	you expect the table	-0.124939
-0.460543	to declare the table	-0.124939
-0.655011	by declaring the table	-0.124939
-0.460543	that copies the table	-0.124939
-0.356494	function. Copying the table	-0.124939
-1.326056	pointer to a table	-0.124939
-0.580531	(PLT) and a table	-0.124939
-0.499851	it by a table	-0.124939
-0.336402	branch by a table	-0.124939
-0.889962	replaced by a table	-0.124939
-0.590494	call with a table	-0.124939
-1.338546	implemented as a table	-0.124939
-0.193008	(May use a table	-0.425969
-0.727435	value from a table	-0.124939
-0.505412	read from a table	-0.124939
-0.582084	object has a table	-0.124939
-0.659706	The principle of table	-0.124939
-1.048910	position-independent code and table	-0.124939
-0.358293	address calculation and table	-0.124939
-1.393041	of elements in table	-0.124939
-0.536801	experimental results in table	-0.124939
-0.500952	ones mentioned in table	-0.124939
-0.280192	are listed in table	-0.124939
-0.273462	instructions listed in table	-0.124939
-0.356019	are summarized in table	-0.124939
-0.523272	per element. The table	-0.124939
-0.356198	table 8.1. The table	-0.124939
-0.356198	test examples. The table	-0.124939
-0.356198	p. 104). The table	-0.124939
-0.356198	page 134. The table	-0.124939
-0.595521	= { // table	-0.124939
-0.358550	rely heavily on table	-0.124939
-0.590349	examples in this table	-0.124939
-0.598193	Loop to make table	-0.124939
-1.041350	Size of each table	-0.124939
-0.460485	Obviously, all these table	-0.124939
-0.539758	table The following table	-0.124939
-0.786313	unsigned. The following table	-0.124939
-0.355495	table (GOT). These table	-0.124939
-0.514049	bypass the virtual table	-0.124939
-0.368489	to a virtual table	-0.124939
-0.368489	in a virtual table	-0.124939
-0.317735	This so-called virtual table	-0.124939
-0.252687	with a lookup table	-0.124939
-0.362637	use a lookup table	-0.124939
-0.252687	uses a lookup table	-0.124939
-0.349641	page 132. Unfortunately, table	-0.124939
-0.128458	functions for vectorized table	-0.124939
-0.128458	used for vectorized table	-0.124939
-0.127322	the global offset table	-0.124939
-0.127322	called global offset table	-0.124939
-0.348171	level 9. Avoid table	-0.124939
-0.447603	x // align table	-0.124939
-0.327116	use a hash table	-0.124939
-0.249067	data. A hash table	-0.124939
-0.249067	enough. A hash table	-0.124939
-0.039008	the procedure linkage table	-0.124939
-0.065806	a procedure linkage table	-0.124939
-0.039008	called procedure linkage table	-0.124939
-0.039008	ordinary procedure linkage table	-0.124939
-0.331742	a hand- written table	-0.124939
-0.325250	} 112 Vectorized table	-0.124939
-0.325182	element. 100 As table	-0.124939
-0.048368	in an import table	-0.124939
-0.048368	through an import table	-0.124939
-0.598218	track of the performance	-0.124939
-0.595272	factors for the performance	-0.124939
-0.592798	unsatisfied with the performance	-0.124939
-0.579286	hyperthreading, but the performance	-0.124939
-0.585387	for using the performance	-0.124939
-1.404709	cases where the performance	-0.124939
-0.586627	times before the performance	-0.124939
-0.873349	to test the performance	-0.124939
-0.568049	set up the performance	-0.124939
-1.320111	information about the performance	-0.124939
-0.523438	to read the performance	-0.124939
-0.214708	to improve the performance	-0.124939
-0.182679	can improve the performance	-0.124939
-0.129319	may improve the performance	-0.124939
-0.755657	can reduce the performance	-0.124939
-0.459168	and reading the performance	-0.124939
-0.500105	without reducing the performance	-0.124939
-0.459168	this case, the performance	-0.124939
-0.355412	to compare the performance	-0.124939
-0.355412	can influence the performance	-0.124939
-0.355412	without paying the performance	-0.124939
-1.579142	There is a performance	-0.124939
-0.504425	to include a performance	-0.124939
-0.574649	own set of performance	-0.124939
-0.526781	the optimization of performance	-0.124939
-0.358146	dramatic degradation of performance	-0.124939
-0.882468	no difference in performance	-0.124939
-0.414391	big difference in performance	-0.124939
-0.357770	The gain in performance	-0.124939
-0.357770	This gain in performance	-0.124939
-0.356938	considerable improvement in performance	-0.124939
-0.782985	less efficient. The performance	-0.124939
-0.523201	Intel processors. The performance	-0.124939
-0.537000	of CPUs. The performance	-0.124939
-0.460106	performance problems. The performance	-0.124939
-0.356150	branch mispredictions. The performance	-0.124939
-0.504692	if required for performance	-0.124939
-0.358504	technical problems or performance	-0.124939
-0.358508	for good code performance	-0.124939
-0.441900	one or more performance	-0.425969
-0.540667	ment time when performance	-0.124939
-0.358124	monitor counters. A performance	-0.124939
-0.535402	terms of program performance	-0.124939
-0.355096	and analyzing program performance	-0.124939
-1.190935	There is no performance	-0.124939
-0.524737	is hardly any performance	-0.124939
-0.502333	believe that software performance	-0.124939
-0.502125	test feature called performance	-0.124939
-0.356479	A particularly useful performance	-0.124939
-0.356185	under advanced system performance	-0.124939
-0.357005	calls. The best performance	-0.124939
-0.357005	system. The best performance	-0.124939
-0.357005	compilers). The best performance	-0.124939
-0.824410	get a good performance	-0.124939
-0.502157	have very good performance	-0.124939
-0.353953	by selecting optimize performance	-0.124939
-0.353482	Worst-case testing Most performance	-0.124939
-0.134938	153 16.1 Using performance	-0.124939
-0.134938	below) 16.1 Using performance	-0.124939
-0.549818	64) can improve performance	-0.124939
-0.023032	library has reduced performance	-0.602060
-0.346222	give almost identical performance	-0.124939
-0.521332	way to identify performance	-0.124939
-0.335966	well. Very poor performance	-0.124939
-0.331653	considered. A realistic performance	-0.124939
-0.420991	cycle. The highest performance	-0.124939
-0.325072	is no 51 performance	-0.124939
-0.314570	fallacy of measuring performance	-0.124939
-0.314445	Table 2.1. Comparing performance	-0.124939
-0.293941	there are inherent performance	-0.124939
-0.382498	measuring the overall performance	-0.124939
-0.237666	microprocessor The benchmark performance	-0.124939
-0.237666	useful for investigating performance	-0.124939
-0.237666	binding definitely degrades performance	-0.124939
-0.358906	of activating the very	-0.124939
-1.581956	that it is very	-0.124939
-1.338074	but it is very	-0.124939
-0.863605	program, it is very	-0.124939
-0.589743	Interpreted code is very	-0.124939
-0.594542	system. This is very	-0.124939
-0.597018	data. It is very	-0.124939
-0.525289	library which is very	-0.124939
-0.525289	stack, which is very	-0.124939
-0.761102	operation, which is very	-0.124939
-0.925427	the library is very	-0.124939
-0.592457	are: There is very	-0.124939
-0.776037	of registers is very	-0.124939
-0.533904	cycle counter is very	-0.124939
-0.869061	the syntax is very	-0.124939
-0.354106	of constants is very	-0.124939
-0.457511	of algorithm is very	-0.124939
-0.457511	loop body is very	-0.124939
-0.968510	This is a very	-0.124939
-1.135864	compiler is a very	-0.124939
-0.834834	itself is a very	-0.124939
-1.024001	because of a very	-0.124939
-0.582877	disadvantage of a very	-0.124939
-0.594618	but in a very	-0.124939
-0.585561	perhaps for a very	-0.124939
-0.625031	can be a very	-0.124939
-0.566223	integers with a very	-0.124939
-0.566223	Loops with a very	-0.124939
-0.589306	interpreted as a very	-0.124939
-1.015909	This has a very	-0.124939
-0.777424	integer takes a very	-0.124939
-0.533047	may require a very	-0.124939
-0.353540	is certainly a very	-0.124939
-0.353540	is indeed a very	-0.124939
-0.805763	not apply to very	-0.124939
-0.725263	very long and very	-0.124939
-0.358277	well tested, and very	-0.124939
-0.984231	used only for very	-0.124939
-0.461723	to work for very	-0.124939
-0.357422	24 dramatically for very	-0.124939
-0.592065	going to be very	-0.124939
-1.039731	It can be very	-0.124939
-0.581887	list can be very	-0.124939
-0.581887	counters can be very	-0.124939
-0.863640	chains can be very	-0.124939
-0.961389	code will be very	-0.124939
-0.559147	This will be very	-0.124939
-0.592109	program that are very	-0.124939
-0.564942	these operations are very	-0.124939
-0.546355	of microprocessors are very	-0.124939
-0.355506	Cache misses are very	-0.124939
-0.355506	out-of-order capabilities are very	-0.124939
-0.358539	performs better on very	-0.124939
-0.358486	same thread as very	-0.124939
-0.886211	compilers are not very	-0.124939
-0.358431	computationally intensive may very	-0.124939
-0.519974	these libraries have very	-0.124939
-0.353951	AVX instructions have very	-0.124939
-0.498068	because computers have very	-0.124939
-0.358177	particularly critical. A very	-0.124939
-0.560915	features, but also very	-0.124939
-0.579810	not in some very	-0.124939
-0.453308	supported by some very	-0.124939
-0.585111	registers are accessed very	-0.124939
-0.568528	making the arrays very	-0.124939
-0.824053	you can get very	-0.124939
-1.379504	makes data caching very	-0.124939
-0.498125	array is made very	-0.124939
-0.319633	and other things very	-0.124939
-0.319633	does some things very	-0.124939
-0.350831	this bookkeeping depends very	-0.124939
-0.492742	code can become very	-0.124939
-0.438753	operations are generally very	-0.124939
-0.331779	expressions like -(-a) very	-0.124939
-0.331657	unit-testing is unfortunately very	-0.124939
-0.325228	is executed. Optimizes very	-0.124939
-0.314523	be obsolete. Programmers very	-0.124939
-0.237731	7.43b is admittedly very	-0.124939
-1.866880	parts of the software	-0.124939
-0.596699	systematization of the software	-0.124939
-1.926976	is that the software	-0.124939
-1.067029	But if the software	-0.124939
-0.343009	the time the software	-0.425969
-0.886258	compilers use the software	-0.124939
-0.594374	coded. If the software	-0.124939
-0.548098	precision. But the software	-0.124939
-0.548098	may cause the software	-0.124939
-0.461457	of optimizing the software	-0.124939
-0.357214	may view the software	-0.124939
-1.113706	of using a software	-0.124939
-0.720684	difference between a software	-0.124939
-0.356304	to test a software	-0.124939
-0.356304	CPUs. However, a software	-0.124939
-0.460301	to choose a software	-0.124939
-0.356304	to base a software	-0.124939
-0.720684	to install a software	-0.124939
-0.460301	traditionally considered a software	-0.124939
-0.356304	to reinstall a software	-0.124939
-1.508010	a piece of software	-0.124939
-0.932820	the costs of software	-0.124939
-0.356473	advanced principles of software	-0.124939
-0.654969	The splitting of software	-0.124939
-0.356473	Automatic updating of software	-0.124939
-0.356473	The vulnerability of software	-0.124939
-0.356473	a lineage of software	-0.124939
-0.356473	the attention of software	-0.124939
-0.550632	respects relevant to software	-0.124939
-0.358217	development process and software	-0.124939
-0.504024	advanced programmers and software	-0.124939
-0.358347	this shift in software	-0.124939
-0.358347	class separately in software	-0.124939
-0.525607	of time for software	-0.124939
-0.357376	is common for software	-0.124939
-0.656766	not uncommon for software	-0.124939
-0.357288	style are that software	-0.124939
-0.524873	usability problems that software	-0.124939
-0.656590	I believe that software	-0.124939
-0.659005	is wasted on software	-0.124939
-0.358250	to test when software	-0.124939
-0.462590	64-bit systems. A software	-0.124939
-1.309509	possible to make software	-0.124939
-1.526382	order to make software	-0.124939
-0.503618	debate about which software	-0.124939
-0.572016	requires that all software	-0.124939
-0.503364	again, that most software	-0.124939
-0.357247	Even worse, many software	-0.124939
-0.560388	performance for 32-bit software	-0.124939
-0.350002	for fast 32-bit software	-0.124939
-0.586856	starting a new software	-0.124939
-0.551730	means of making software	-0.124939
-0.344804	satisfied with making software	-0.124939
-0.355902	of redesign. Some software	-0.124939
-0.355808	and other extra software	-0.124939
-0.355549	come. Even big software	-0.124939
-0.458753	a well optimized software	-0.124939
-0.354172	Compatibility problems. All software	-0.124939
-0.568247	in the application software	-0.124939
-0.353314	to make their software	-0.124939
-0.302638	Automatic updates Many software	-0.124939
-0.302638	Other databases Many software	-0.124939
-0.302638	or more. Many software	-0.124939
-0.351725	the Microsoft platform software	-0.124939
-0.351900	the ever bigger software	-0.124939
-0.379281	put the whole software	-0.124939
-0.379281	Test the whole software	-0.124939
-0.125984	manuals: 1. Optimizing software	-0.124939
-0.725960	in a typical software	-0.124939
-0.429079	piece of CPU-intensive software	-0.124939
-0.331629	goal of 18 software	-0.124939
-0.421083	priority of structured software	-0.124939
-0.293913	lot of irrelevant software	-0.124939
-0.293913	many hard working software	-0.124939
-0.293913	A better performing software	-0.124939
-0.293913	a computer. Security software	-0.124939
-0.237641	to make memory-hungry software	-0.124939
-0.237641	for supporting multi-threaded software	-0.124939
-0.373952	stored in the order	-0.425969
-0.586577	accessed in the order	-0.124939
-0.586577	consecutively in the order	-0.124939
-1.922148	is that the order	-0.124939
-0.722249	can check the order	-0.124939
-0.356980	is usually the order	-0.124939
-0.554633	we change the order	-0.124939
-0.172046	not swap the order	-0.124939
-0.155769	cannot swap the order	-0.425969
-0.356980	when swapping the order	-0.124939
-0.502294	This reflects the order	-0.124939
-0.356980	by controlling the order	-0.124939
-0.870513	instructions out of order	-0.124939
-0.065716	11 Out of order	-0.425969
-0.496275	disable it in order	-0.124939
-1.386893	the code in order	-0.124939
-0.444659	of data in order	-0.124939
-0.444659	arranging data in order	-0.124939
-0.958314	each other in order	-0.124939
-0.485163	instruction set in order	-0.124939
-0.475504	line size in order	-0.124939
-0.337642	of i in order	-0.124939
-0.554260	point variables in order	-0.124939
-0.694181	of 2 in order	-0.124939
-0.554463	non-sequential order in order	-0.124939
-0.562395	language elements in order	-0.124939
-0.337642	declared const in order	-0.124939
-0.337642	by 8 in order	-0.124939
-0.337642	to unsigned in order	-0.124939
-0.436725	of operations in order	-0.124939
-0.124669	several times in order	-0.425969
-0.337642	very big in order	-0.124939
-0.558603	array element in order	-0.124939
-0.475504	debug information in order	-0.124939
-0.521800	round addresses in order	-0.124939
-0.337642	the end in order	-0.124939
-0.515471	than needed in order	-0.124939
-0.515471	joined together in order	-0.124939
-0.337642	of bool in order	-0.124939
-0.485163	worst-case conditions in order	-0.124939
-0.337642	the right in order	-0.124939
-0.337642	of cores in order	-0.124939
-0.337642	user input in order	-0.124939
-0.337642	multiple blocks in order	-0.124939
-0.337642	is low in order	-0.124939
-0.436725	different algorithms in order	-0.124939
-0.337642	specific purpose in order	-0.124939
-0.337642	calling itself in order	-0.124939
-0.337642	programming principles in order	-0.124939
-0.337642	software package in order	-0.124939
-0.337642	there is, in order	-0.124939
-0.337642	of bookkeeping in order	-0.124939
-0.337642	disturbing influences in order	-0.124939
-0.337642	do experiments in order	-0.124939
-0.337642	develop- ment in order	-0.124939
-0.337642	of randomness in order	-0.124939
-0.337642	then de-referenced in order	-0.124939
-0.337642	planning phase in order	-0.124939
-0.337642	table (GOT) in order	-0.124939
-0.337642	* sizeof(float) in order	-0.124939
-0.357525	template parameter. The order	-0.124939
-0.357525	7.5 Booleans The order	-0.124939
-0.357525	page 51). The order	-0.124939
-0.492941	2.0; } In order	-0.124939
-0.345953	of B. In order	-0.124939
-0.345953	course system-specific. In order	-0.124939
-0.719200	have no specific order	-0.124939
-0.454353	where the storage order	-0.124939
-0.454404	together. The link order	-0.124939
-1.152646	in a non-sequential order	-0.124939
-0.325311	accessed in sequential order	-0.124939
-0.575476	have a natural order	-0.124939
-0.382771	sequentially. The opposite order	-0.124939
-0.237861	utilizing its out-of- order	-0.124939
-0.237861	for a 2'nd order	-0.124939
-0.900706	except in the long	-0.124939
-1.431715	because it is long	-0.124939
-1.046831	the loop is long	-0.124939
-0.595227	sum of a long	-0.124939
-1.021669	This has a long	-0.124939
-1.106660	of using a long	-0.124939
-1.107822	situation where a long	-0.124939
-0.355853	that takes a long	-0.124939
-0.500720	integer takes a long	-0.124939
-0.356643	may take a long	-0.124939
-0.356643	logarithms take a long	-0.124939
-0.106468	take quite a long	-0.124939
-0.355064	which causes a long	-0.124939
-0.458726	numbers. With a long	-0.124939
-0.355064	calculations forms a long	-0.124939
-0.065716	float, double and long	-0.425969
-0.358498	branch misprediction, or long	-0.124939
-0.832317	are done with long	-0.124939
-0.357105	dependency chains with long	-0.124939
-0.449512	extra time as long	-0.124939
-0.637871	the program as long	-0.124939
-0.347790	automatically, but as long	-0.124939
-0.494658	multiple variables as long	-0.124939
-0.347790	six times as long	-0.124939
-0.347790	point calculations as long	-0.124939
-0.347790	not significant as long	-0.124939
-0.347790	64-bit integers, as long	-0.124939
-0.578645	noticeable but not long	-0.124939
-0.358297	These registers have long	-0.124939
-0.358257	to calculate when long	-0.124939
-0.358118	first sub-vector. A long	-0.124939
-1.142915	there are no long	-0.124939
-0.502533	list of some long	-0.124939
-0.568010	time is so long	-0.124939
-0.557367	takes a very long	-0.124939
-0.484261	work for very long	-0.124939
-0.397477	to be very long	-0.124939
-0.756169	can be very long	-0.124939
-0.421912	uint32_t unsigned long long	-0.124939
-0.297984	128 SSE2 long long	-0.124939
-0.297984	int i; long long	-0.124939
-0.297984	256 AVX2 long long	-0.124939
-0.297984	64 MMX long long	-0.124939
-0.297984	512 AVX512 long long	-0.124939
-0.297984	long time1; long long	-0.124939
-0.297984	231-1 int32_t long long	-0.124939
-0.297984	#include <intrin.h> long long	-0.124939
-0.297984	int DontSkip; long long	-0.124939
-0.547442	double 8 8 long	-0.124939
-0.624822	16-bit systems: unsigned long	-0.124939
-0.341050	64-bit Linux: unsigned long	-0.124939
-0.341050	232-1 uint32_t unsigned long	-0.124939
-0.460823	and measure how long	-0.124939
-0.537273	4 128 SSE2 long	-0.124939
-0.523298	is to avoid long	-0.124939
-0.523298	have to avoid long	-0.124939
-0.591256	10; int i; long	-0.124939
-0.455663	program takes too long	-0.124939
-0.516681	8 256 AVX2 long	-0.124939
-0.515774	delay is just long	-0.124939
-0.348065	page 22. Avoid long	-0.124939
-0.480892	any branch misprediction long	-0.124939
-0.498405	2 64 MMX long	-0.124939
-0.954520	in 16-bit systems: long	-0.124939
-0.698044	16 512 AVX512 long	-0.124939
-0.095124	cause of unacceptably long	-0.124939
-0.095124	frustrated by unacceptably long	-0.124939
-0.095124	sometimes have unacceptably long	-0.124939
-0.095124	might experience unacceptably long	-0.124939
-0.835767	vector math libraries: long	-0.124939
-0.407765	29 64-bit Linux: long	-0.124939
-0.382486	Larger data types: long	-0.124939
-0.293932	long long time1; long	-0.124939
-0.293932	would give annoyingly long	-0.124939
-0.237658	-231 231-1 int32_t long	-0.124939
-0.237658	16.1 #include <intrin.h> long	-0.124939
-0.237658	volatile int DontSkip; long	-0.124939
-0.358633	and mainframes, and between	-0.124939
-0.659611	the cache in between	-0.124939
-0.358429	multithreaded program, or between	-0.124939
-0.869271	from the cache between	-0.124939
-0.357599	elements are there between	-0.124939
-0.502690	used. It takes between	-0.124939
-0.141271	difference in performance between	-0.124939
-0.810254	6 unused bytes between	-0.124939
-0.062251	difference in speed between	-0.124939
-0.286230	that is shared between	-0.124939
-0.286230	address and shared between	-0.124939
-0.286230	can be shared between	-0.124939
-0.286230	that are shared between	-0.124939
-0.286230	is not shared between	-0.124939
-0.543718	time is typically between	-0.124939
-0.518804	result. The conversion between	-0.124939
-0.060383	for the difference between	-0.124939
-0.060383	as the difference between	-0.124939
-0.029143	Note the difference between	-0.124939
-0.060383	illustrates the difference between	-0.124939
-0.060383	Calculating the difference between	-0.124939
-0.288672	FPGAs. The difference between	-0.124939
-1.001037	is no difference between	-0.124939
-0.194205	a minimal difference between	-0.124939
-0.454766	third-party graphics framework between	-0.124939
-0.516322	mask to choose between	-0.124939
-0.518612	is a switch between	-0.124939
-0.317084	don't need conversions between	-0.124939
-0.317084	140. Avoid conversions between	-0.124939
-0.536023	offer the choice between	-0.124939
-0.344967	and CISC processors, between	-0.124939
-0.341458	threads that jump between	-0.124939
-0.093210	{ ... Conversions between	-0.124939
-0.093210	precision conversion Conversions between	-0.124939
-0.093210	double precision. Conversions between	-0.124939
-0.093210	is enabled. Conversions between	-0.124939
-0.044109	platform. 14.8 Conversions between	-0.124939
-0.044109	140 14.8 Conversions between	-0.124939
-0.779278	code is distributed between	-0.124939
-0.255738	needed for communication between	-0.124939
-0.187949	is that communication between	-0.124939
-0.187949	of necessary communication between	-0.124939
-0.429093	branches that select between	-0.124939
-0.331556	microprocessors is split between	-0.124939
-0.102768	is a compromise between	-0.124939
-0.102768	be a compromise between	-0.124939
-0.331556	set has nothing between	-0.124939
-0.331454	a thread jumps between	-0.124939
-0.027545	branch that chooses between	-0.425969
-0.122525	a program chooses between	-0.124939
-0.210054	have to distinguish between	-0.124939
-0.210054	important to distinguish between	-0.124939
-0.314329	if the distance between	-0.124939
-0.293830	not optimized. Jumps between	-0.124939
-0.293830	single function. Switch between	-0.124939
-0.023497	communication and synchronization between	-0.124939
-0.102738	makes a distinction between	-0.124939
-0.102738	an important distinction between	-0.124939
-0.237568	data. The similarity between	-0.124939
-0.237568	a sensible balance between	-0.124939
-0.237568	and the transitions between	-0.124939
-0.237568	the work evenly between	-0.124939
-0.237568	divide the workload between	-0.124939
-0.237568	synchronizing and communicating between	-0.124939
-0.237568	no clear correspondence between	-0.124939
-0.237568	predicted perfectly varies between	-0.124939
-0.237568	differences were observed between	-0.124939
-0.237568	same without discriminating between	-0.124939
-0.237568	be. The distinctions between	-0.124939
-0.237568	to facilitate porting between	-0.124939
-0.237568	one that discriminates between	-0.124939
-0.237568	a multitasking environment, between	-0.124939
-0.237568	functions for distinguishing between	-0.124939
-1.199618	bits of the 32-bit	-0.124939
-0.598855	above for the 32-bit	-0.124939
-0.595104	better than the 32-bit	-0.124939
-0.598131	bit of a 32-bit	-0.124939
-0.594618	reached with a 32-bit	-0.124939
-0.572075	offset as a 32-bit	-0.124939
-0.845057	expressed as a 32-bit	-0.124939
-0.818458	the efficiency of 32-bit	-0.124939
-1.233837	when applied to 32-bit	-0.124939
-0.675237	64-bit Windows and 32-bit	-0.124939
-0.473359	32-bit Windows and 32-bit	-0.124939
-0.853021	32-bit Linux and 32-bit	-0.124939
-0.527535	systems and in 32-bit	-0.124939
-0.838523	mode than in 32-bit	-0.124939
-0.173417	register variables in 32-bit	-0.124939
-0.489286	calls faster in 32-bit	-0.124939
-0.449312	calling method in 32-bit	-0.124939
-0.795635	32 bits in 32-bit	-0.124939
-0.559877	registers available in 32-bit	-0.124939
-0.105061	the stack in 32-bit	-0.124939
-0.510743	4 bytes in 32-bit	-0.124939
-0.510743	64-bit integers in 32-bit	-0.124939
-0.700872	double precision in 32-bit	-0.124939
-0.347632	is eight in 32-bit	-0.124939
-0.770892	relative addresses in 32-bit	-0.124939
-0.545018	system running in 32-bit	-0.124939
-0.510743	self-relative references in 32-bit	-0.124939
-0.168970	inefficient, especially in 32-bit	-0.124939
-0.168970	resource, especially in 32-bit	-0.124939
-0.168970	relocation, especially in 32-bit	-0.124939
-0.347632	scarce resource in 32-bit	-0.124939
-0.347632	general purposes in 32-bit	-0.124939
-0.637562	without -fpic in 32-bit	-0.124939
-0.347632	approximately six in 32-bit	-0.124939
-0.347632	and Sum3 in 32-bit	-0.124939
-0.415525	source compiler for 32-bit	-0.124939
-0.415525	commercial compiler for 32-bit	-0.124939
-0.415525	cheap compiler for 32-bit	-0.124939
-0.532038	identical performance for 32-bit	-0.124939
-0.352872	clock cycles for 32-bit	-0.124939
-1.098670	is intended for 32-bit	-0.124939
-1.101518	when compiling for 32-bit	-0.124939
-0.352872	optimization capabilities for 32-bit	-0.124939
-0.352872	X Compilers for 32-bit	-0.124939
-0.352872	separate executables for 32-bit	-0.124939
-0.835098	and b are 32-bit	-0.124939
-0.461967	); #else // 32-bit	-0.124939
-0.357614	|| defined(__GNUC__) // 32-bit	-0.124939
-0.595087	vector, such as 32-bit	-0.124939
-0.591271	little faster than 32-bit	-0.124939
-0.893168	inefficient to use 32-bit	-0.124939
-0.142103	compiler. Supports only 32-bit	-0.124939
-0.142103	sets. Supports only 32-bit	-0.124939
-0.357976	64 bits, but 32-bit	-0.124939
-0.454947	represented as two 32-bit	-0.124939
-0.454947	rather than two 32-bit	-0.124939
-0.357460	self-relative addressing. In 32-bit	-0.124939
-0.642401	in performance between 32-bit	-0.124939
-0.557705	no difference between 32-bit	-0.124939
-0.138975	32-bit -fno-builtin Gnu 32-bit	-0.124939
-0.138975	bit -fno-builtin Gnu 32-bit	-0.124939
-0.354941	compiler sometimes uses 32-bit	-0.124939
-0.354864	Gnu libraries support 32-bit	-0.124939
-0.533810	sourcebook for fast 32-bit	-0.124939
-0.353985	Today (2013) both 32-bit	-0.124939
-0.353395	inferior to their 32-bit	-0.124939
-0.352131	64-bit integers. Many 32-bit	-0.124939
-0.350585	utility. It supports 32-bit	-0.124939
-0.317235	are not. Supports 32-bit	-0.124939
-0.317235	binary code). Supports 32-bit	-0.124939
-0.349609	many platforms, including 32-bit	-0.124939
-0.348205	Windows and Linux, 32-bit	-0.124939
-0.438671	as in Linux. 32-bit	-0.124939
-0.875805	pointer or reference, 32-bit	-0.124939
-0.538481	Mac OS X, 32-bit	-0.124939
-0.237715	in both 16-bit, 32-bit	-0.124939
-1.191172	space in the branch	-0.124939
-1.178599	Contentions in the branch	-0.124939
-0.895924	performance if the branch	-0.124939
-0.596000	predicted by the branch	-0.124939
-0.249182	is called the branch	-0.124939
-0.249182	cache called the branch	-0.124939
-1.061569	to replace the branch	-0.124939
-0.357682	how predictable the branch	-0.124939
-0.462053	it avoids the branch	-0.124939
-0.358844	is used is branch	-0.124939
-0.891421	case is a branch	-0.124939
-0.593284	it to a branch	-0.124939
-0.591374	not with a branch	-0.124939
-0.576970	recover from a branch	-0.124939
-0.460270	code has a branch	-0.124939
-0.459534	which way a branch	-0.124939
-1.365200	For example, a branch	-0.124939
-0.545751	automatically replace a branch	-0.124939
-0.536317	way. Such a branch	-0.124939
-0.355700	it feeds a branch	-0.124939
-0.425295	an explanation of branch	-0.425969
-0.825909	a kind of branch	-0.124939
-0.561167	each function and branch	-0.124939
-0.065660	cache misses and branch	-0.425969
-0.462545	16-bit integers. The branch	-0.124939
-0.503817	be critical. The branch	-0.124939
-0.592727	algorithms used for branch	-0.124939
-0.358054	reliable results for branch	-0.124939
-0.358741	history of that branch	-0.124939
-0.502840	how the if branch	-0.124939
-0.461657	4. The if branch	-0.124939
-0.142843	// Loop with branch	-0.124939
-0.142843	12.4a. Loop with branch	-0.124939
-0.463119	more details on branch	-0.124939
-0.501860	correctly. A code branch	-0.124939
-0.355498	of which code branch	-0.124939
-0.355498	run any code branch	-0.124939
-0.500327	} } A branch	-0.124939
-0.340450	= b; A branch	-0.124939
-0.340450	other way. A branch	-0.124939
-0.340450	every call. A branch	-0.124939
-0.340450	of course. A branch	-0.124939
-0.340450	VIA CPUs". A branch	-0.124939
-0.340450	often mispredicted. A branch	-0.124939
-0.340450	it changes. A branch	-0.124939
-0.885498	then the loop branch	-0.124939
-0.571439	calculations. The loop branch	-0.124939
-1.010486	microcontrollers have no branch	-0.124939
-0.357287	to generate many branch	-0.124939
-0.879857	the best possible branch	-0.124939
-0.357214	and resolve any branch	-0.124939
-0.846176	need to take branch	-0.124939
-0.772837	make a new branch	-0.124939
-0.532069	maintaining a new branch	-0.124939
-0.355863	page 43 about branch	-0.124939
-1.297284	into a single branch	-0.124939
-0.573123	BTB can cause branch	-0.124939
-0.569334	cases, the optimal branch	-0.124939
-0.353756	that each particular branch	-0.124939
-0.266038	the loop control branch	-0.249877
-0.239943	i<20 loop control branch	-0.124939
-0.541614	then the dispatch branch	-0.124939
-0.434734	a poorly predictable branch	-0.124939
-0.335997	suffer from poor branch	-0.124939
-0.325102	cache, code cache, branch	-0.124939
-0.407814	If the wrong branch	-0.124939
-0.020831	to cache misses, branch	-0.124939
-0.020831	most cache misses, branch	-0.124939
-0.020831	executed, cache misses, branch	-0.124939
-0.293969	intrinsic function. Provoke branch	-0.124939
-0.293969	Eliminate branches Remove branch	-0.124939
-0.237690	branch target buffer, branch	-0.124939
-0.597174	-b to a <	-0.124939
-0.065510	= 0; x <	-0.425969
-0.177896	label if i <	-0.124939
-0.000096	= 0; i <	-0.674611
-0.244009	( ; i <	-0.124939
-0.177896	loop condition i <	-0.124939
-0.177896	two comparisons i <	-0.124939
-0.305835	reversed if c <	-0.124939
-0.013942	= 0; c <	-0.425969
-0.475976	... if (i <	-0.124939
-0.337985	list[ARRAYSIZE]; if (i <	-0.124939
-0.324506	exceptions: while (i <	-0.124939
-0.352800	0 <= n <	-0.124939
-0.053812	= 0; r <	-0.425969
-0.053812	= 1; r <	-0.425969
-0.349005	= &list[0]; temp <	-0.124939
-0.345063	= 0; row <	-0.124939
-0.279387	= r1; c2 <	-0.124939
-0.279387	= c1; c2 <	-0.124939
-0.341733	= 0; j <	-0.124939
-0.336159	= 0; column <	-0.124939
-0.336119	= 0; c1 <	-0.124939
-0.325222	1.0 <= u.f <	-0.124939
-0.314591	= 0; r1 <	-0.124939
-0.172602	= r1; r2 <	-0.124939
-0.172602	= r1+1; r2 <	-0.124939
-0.538627	if ((unsigned int)i <	-0.124939
-0.294080	- n.a. !(a <	-0.124939
-0.023514	if ((unsigned int)n <	-0.124939
-0.237788	certain that u <	-0.124939
-0.237788	u; if (u.i[1] <	-0.124939
-0.237788	0; while (seconds <	-0.124939
-0.237788	be while (0 <	-0.124939
-1.637606	address of the member	-0.124939
-1.173949	implementation of the member	-0.124939
-0.887743	offset of the member	-0.124939
-0.598586	class that the member	-0.124939
-0.598784	(i.e. if the member	-0.124939
-0.839259	to have the member	-0.124939
-0.595357	file. If the member	-0.124939
-1.180245	function that is member	-0.124939
-0.580500	object, and a member	-0.124939
-0.586476	could be a member	-0.124939
-1.017295	pointer or a member	-0.124939
-0.341496	the function a member	-0.425969
-0.357483	efficient as a member	-0.425969
-0.547324	internally as a member	-0.124939
-0.560775	support. Make a member	-0.124939
-0.754832	than accessing a member	-0.124939
-0.499646	compact. Accessing a member	-0.124939
-0.458751	class. Calling a member	-0.124939
-0.355084	can force a member	-0.124939
-0.805911	complicated implementation of member	-0.124939
-0.526988	of classes and member	-0.124939
-0.314372	this pointer in member	-0.124939
-0.314372	'this' pointer in member	-0.124939
-0.358715	page 52. The member	-0.124939
-0.358716	different meaning for member	-0.124939
-0.586766	variable, pointer or member	-0.124939
-0.656875	data members or member	-0.124939
-0.502505	doesn't work with member	-0.124939
-0.357131	cause complications with member	-0.124939
-0.063828	of a data member	-0.124939
-0.063828	cases, a data member	-0.124939
-0.063828	accessing a data member	-0.124939
-0.063828	Accessing a data member	-0.124939
-0.444265	a class data member	-0.124939
-0.343633	the first data member	-0.124939
-0.888166	recommended to make member	-0.124939
-0.554691	complications that make member	-0.124939
-0.747052	You may make member	-0.124939
-1.653828	use the same member	-0.124939
-0.580453	as any other member	-0.124939
-0.183279	to a class member	-0.124939
-0.345297	work for class member	-0.124939
-0.446363	registers. A class member	-0.124939
-0.357765	members. But each member	-0.124939
-0.357688	align its b member	-0.124939
-0.041749	functions. A static member	-0.425969
-0.087943	stack. A static member	-0.124939
-0.461474	(CParent<>) contains any member	-0.124939
-0.560762	control the way member	-0.124939
-0.502389	to. A const member	-0.124939
-0.355098	as a virtual member	-0.124939
-0.355098	call a virtual member	-0.124939
-0.382799	dispatch to virtual member	-0.124939
-0.382799	obtained with virtual member	-0.124939
-0.294186	least one virtual member	-0.124939
-0.294186	has no virtual member	-0.124939
-0.349013	static static Assume member	-0.124939
-0.040865	than a non-static member	-0.425969
-0.193287	members or non-static member	-0.124939
-0.261984	on all non-static member	-0.124939
-0.364559	call the polymorphic member	-0.124939
-0.584889	of a polymorphic member	-0.124939
-0.224838	32-bit systems. Virtual member	-0.124939
-0.098029	access. 7.20 Virtual member	-0.124939
-0.098029	53 7.20 Virtual member	-0.124939
-0.093271	bytes. 7.19 Class member	-0.124939
-0.093271	51 7.19 Class member	-0.124939
-0.314494	model fast=2 Simple member	-0.124939
-0.314494	each other (not member	-0.124939
-0.293987	contains any non-polymorphic member	-0.124939
-0.293987	programming are: Non-static member	-0.124939
-0.293987	call a non-virtual member	-0.124939
-0.237706	of structures (without member	-0.124939
-1.292210	because of the way	-0.124939
-0.201985	lies in the way	-0.124939
-0.591488	Programming in the way	-0.124939
-0.595818	satisfied with the way	-0.124939
-1.560105	depends on the way	-0.124939
-0.657683	to control the way	-0.124939
-0.556250	types Unfortunately, the way	-0.124939
-0.588121	It is a way	-0.124939
-1.572556	there is a way	-0.124939
-1.020185	code in a way	-0.124939
-0.199928	calculation in a way	-0.425969
-0.567622	in such a way	-0.124939
-0.461124	131) shows a way	-0.124939
-0.462551	fine-grained parallelism. The way	-0.124939
-0.358074	physical factors. The way	-0.124939
-0.546867	define in this way	-0.124939
-0.546867	measured in this way	-0.124939
-1.215515	in a different way	-0.124939
-1.554904	in the same way	-0.124939
-0.565505	work the same way	-0.124939
-0.565505	goes the same way	-0.124939
-0.565505	go the same way	-0.124939
-0.753854	is the only way	-0.124939
-0.499103	aliasing. The only way	-0.124939
-0.639366	and the other way	-0.124939
-0.639366	times the other way	-0.124939
-0.450480	rarely the other way	-0.124939
-0.956851	to predict which way	-0.124939
-0.553880	pointers in one way	-0.124939
-1.048835	more than one way	-0.124939
-0.342994	that goes one way	-0.124939
-0.342994	to go one way	-0.124939
-0.342994	goes randomly one way	-0.124939
-1.428120	there is no way	-0.124939
-1.129393	There is no way	-0.124939
-0.539418	resource. The C++ way	-0.124939
-0.883220	be an efficient way	-0.124939
-0.587806	a more efficient way	-0.124939
-0.697524	a very efficient way	-0.124939
-0.461255	an even faster way	-0.124939
-0.580001	times the first way	-0.124939
-0.564395	ways. The first way	-0.124939
-0.636055	is a useful way	-0.124939
-0.526598	a very useful way	-0.124939
-0.536812	time. A simple way	-0.124939
-0.420269	is the best way	-0.124939
-0.420269	Obviously, the best way	-0.124939
-0.420269	Sometimes, the best way	-0.124939
-0.416993	shows. The best way	-0.124939
-0.416993	duration. The best way	-0.124939
-1.115638	is a common way	-0.124939
-1.125481	is a good way	-0.124939
-0.532994	exploited. A good way	-0.124939
-0.354897	it goes another way	-0.124939
-0.519498	code. The second way	-0.124939
-0.350780	The most compatible way	-0.124939
-0.404298	in a safe way	-0.124939
-0.311637	called. The safe way	-0.124939
-0.508689	module2.cpp. The simplest way	-0.124939
-0.056027	is no easy way	-0.425969
-0.343425	fastest. The typical way	-0.124939
-0.259213	cases, the fastest way	-0.124939
-0.259213	still the fastest way	-0.124939
-0.336182	is a convenient way	-0.124939
-0.336182	primitive, but efficient, way	-0.124939
-0.331755	for a portable way	-0.124939
-0.851554	in a suboptimal way	-0.124939
-0.077762	(/Oa). The easiest way	-0.124939
-0.077762	delays. The easiest way	-0.124939
-0.237698	clear and intelligible way	-0.124939
-1.624914	one of the elements	-0.124939
-0.596834	b, and the elements	-0.124939
-1.603499	sure that the elements	-0.124939
-0.589842	that if the elements	-0.124939
-1.168450	2 if the elements	-0.124939
-1.220718	in which the elements	-0.124939
-0.549201	We take the elements	-0.124939
-0.357705	by storing the elements	-0.124939
-0.357705	function adds the elements	-0.124939
-0.556428	the number of elements	-0.550907
-0.842412	The number of elements	-0.124939
-0.300432	total number of elements	-0.124939
-0.424773	107 number of elements	-0.124939
-0.424773	Max. number of elements	-0.124939
-0.560013	and types of elements	-0.124939
-0.128196	bits Number of elements	-0.124939
-0.141646	vector Type of elements	-0.124939
-0.141646	follows: Type of elements	-0.124939
-0.835317	the diagonal. The elements	-0.124939
-0.358092	and c2 for elements	-0.124939
-0.358092	search requests for elements	-0.124939
-0.396798	be used if elements	-0.425969
-0.578376	applies only when elements	-0.124939
-0.588759	smaller the data elements	-0.124939
-0.458929	on multiple data elements	-0.124939
-0.829987	operations on all elements	-0.124939
-0.453159	constructor sets all elements	-0.124939
-0.497352	array after all elements	-0.124939
-0.557012	all the array elements	-0.124939
-0.739246	addresses of array elements	-0.124939
-0.523789	variables for array elements	-0.124939
-0.339812	to individual array elements	-0.124939
-0.357347	or "how many elements	-0.124939
-0.357271	not alias any elements	-0.124939
-0.356457	and swap these elements	-0.124939
-0.546474	different C++ language elements	-0.124939
-0.354919	will read four elements	-0.124939
-0.354923	for accessing container elements	-0.124939
-0.400514	The first eight elements	-0.124939
-0.308577	But these eight elements	-0.124939
-0.308577	can handle eight elements	-0.124939
-0.308577	it handles eight elements	-0.124939
-0.520723	operator // add elements	-0.124939
-0.572967	standard user interface elements	-0.124939
-0.351516	experiment where 10 elements	-0.124939
-0.001513	in eight consecutive elements	-0.726999
-0.000756	Load eight consecutive elements	-0.550907
-0.347358	Array with N elements	-0.124939
-0.122641	diagonal // swap elements	-0.124939
-0.122641	a[c][r]); // swap elements	-0.124939
-0.343538	causes all subsequent elements	-0.124939
-0.490256	fail to distinguish elements	-0.124939
-0.325257	then the mirror elements	-0.124939
-0.294052	even add dummy elements	-0.124939
-0.592503	general, it is faster	-0.124939
-1.212221	member function is faster	-0.124939
-0.569154	template function is faster	-0.124939
-1.112804	time. This is faster	-0.124939
-0.577369	before. This is faster	-0.124939
-1.007315	time. It is faster	-0.124939
-0.853883	integers. It is faster	-0.124939
-0.576760	smaller. It is faster	-0.124939
-0.127409	floating point is faster	-0.124939
-0.349704	of 2 is faster	-0.124939
-0.734090	The method is faster	-0.124939
-1.176367	This method is faster	-0.124939
-0.513763	The access is faster	-0.124939
-0.496447	a file is faster	-0.124939
-0.747168	this case is faster	-0.124939
-0.013586	a constant is faster	-0.425969
-0.141135	software implementation is faster	-0.425969
-0.451932	Integer division is faster	-0.124939
-0.796101	one operand is faster	-0.124939
-0.451932	big blocks is faster	-0.124939
-0.705559	development tool is faster	-0.124939
-0.349704	example 15.1c is faster	-0.124939
-0.349704	// (This is faster	-0.124939
-0.349704	constant: Unsigned is faster	-0.124939
-0.349704	example 14.21 is faster	-0.124939
-0.358851	this prevents a faster	-0.124939
-1.633046	likely to be faster	-0.124939
-1.128353	This may be faster	-0.124939
-0.863556	method may be faster	-0.124939
-0.590791	multiplication will be faster	-0.124939
-0.568819	integer operations are faster	-0.124939
-0.569551	Linear arrays are faster	-0.124939
-1.426889	makes the code faster	-0.124939
-0.865790	make member functions faster	-0.124939
-0.461767	C and C++ faster	-0.124939
-0.356918	is 83 called faster	-0.124939
-0.516433	this is often faster	-0.124939
-1.102785	It is often faster	-0.124939
-0.346816	would be even faster	-0.124939
-0.346816	is an even faster	-0.124939
-0.536040	is many times faster	-0.124939
-0.333963	to 5 times faster	-0.124939
-0.333963	to seven times faster	-0.124939
-0.495612	make function calls faster	-0.124939
-0.711216	makes function calls faster	-0.124939
-0.342573	are calculated much faster	-0.124939
-0.342573	is usually much faster	-0.124939
-0.458764	functions are calculated faster	-0.124939
-0.520497	it will run faster	-0.124939
-0.329461	code will run faster	-0.124939
-0.322375	make applications run faster	-0.124939
-0.352196	between threads becomes faster	-0.124939
-0.562188	it is usually faster	-0.124939
-0.351531	a vector goes faster	-0.124939
-0.448885	the code execute faster	-0.124939
-0.527126	run a little faster	-0.124939
-0.343684	may run slightly faster	-0.124939
-0.438776	application is generally faster	-0.124939
-0.429244	often be executed faster	-0.124939
-0.331734	CPUs is increasing faster	-0.124939
-0.005759	2 // Still faster	-0.425969
-0.005759	faster // Still faster	-0.425969
-0.407898	make software packages faster	-0.124939
-0.294033	} // ipow faster	-0.124939
-0.237747	rounding, but neither faster	-0.124939
-1.807258	to use the const	-0.124939
-0.562136	value. However, the const	-0.124939
-0.725533	to remove the const	-0.124939
-0.358393	for relieving the const	-0.124939
-1.282800	call to a const	-0.124939
-0.592440	is by a const	-0.124939
-0.357828	optimize away a const	-0.124939
-0.357828	non-const reference, a const	-0.124939
-0.562833	make table of const	-0.124939
-0.463440	are equivalent to const	-0.124939
-0.504598	test purposes. The const	-0.124939
-0.587219	const pointer or const	-0.124939
-0.356174	class c1 { const	-0.124939
-0.356174	void MathLoop() { const	-0.124939
-0.647889	points to. A const	-0.124939
-0.352898	const reference. A const	-0.124939
-0.352898	is taken. A const	-0.124939
-0.358185	// EMMS } const	-0.124939
-0.358072	the table has const	-0.124939
-0.357557	* dest, double const	-0.124939
-0.466345	123 and static const	-0.124939
-0.428341	() { static const	-0.124939
-0.330955	int b; static const	-0.124939
-0.330955	{ __declspec(align(16)) static const	-0.124939
-0.330955	of factorials: static const	-0.124939
-0.330955	floata; boolb=0; static const	-0.124939
-0.540229	char const * const	-0.124939
-0.564870	with induction variables const	-0.124939
-0.355885	8.24. Integer constant const	-0.124939
-0.881680	unsigned int i; const	-0.124939
-0.013483	* d, __m128i const	-0.726999
-0.352674	n; static char const	-0.124939
-0.710712	Table[100]; int x; const	-0.124939
-0.773710	should be declared const	-0.124939
-0.533656	log2 a global const	-0.124939
-0.526637	12.1a. Automatic vectorization const	-0.124939
-0.639692	to a local const	-0.124939
-0.338975	& a, T const	-0.124939
-0.621143	int order(int x); const	-0.124939
-0.335994	int lrintf (float const	-0.124939
-0.331532	X __attribute__((aligned(16))) #endif const	-0.124939
-0.325056	// Example 14.8 const	-0.124939
-0.325056	from example 16.1 const	-0.124939
-0.044106	int lrint (double const	-0.425969
-0.407597	// Example 14.30 const	-0.124939
-0.006830	inline __m128i LoadVector(void const	-0.602060
-0.314300	// Example 9.5a const	-0.124939
-0.314300	// Example 11.3 const	-0.124939
-0.314472	// Example 9.4 const	-0.124939
-0.314472	// Example 7.17 const	-0.124939
-0.048334	9.1a int Func(int); const	-0.124939
-0.048334	9.1b int Func(int); const	-0.124939
-0.382327	// Example 9.6a const	-0.124939
-0.293802	// Example 11.2b const	-0.124939
-0.293802	size of squares: const	-0.124939
-0.538139	Table of factorials: const	-0.124939
-0.237544	// Example 14.5a const	-0.124939
-0.237544	Vec4f polynomial (Vec4f const	-0.124939
-0.237544	// Example 11.2a const	-0.124939
-0.237544	operator + (vector const	-0.124939
-0.237544	inline __m128i LoadVectorA(void const	-0.124939
-0.237544	inline float add_elements(__m128 const	-0.124939
-0.237544	// Example 7.33b const	-0.124939
-0.237544	use a #define, const	-0.124939
-0.237544	inline T max(T const	-0.124939
-0.237544	// Example 7.33a const	-0.124939
-0.237544	FuncRow(int); int FuncCol(int); const	-0.124939
-0.237544	// Example 14.4a const	-0.124939
-0.358245	function pointer and makes	-0.124939
-0.526281	calls faster and makes	-0.124939
-0.588463	make code that makes	-0.124939
-0.833142	a destructor that makes	-0.124939
-0.504595	+ d; // makes	-0.124939
-0.574606	cases and it makes	-0.124939
-0.898491	is that it makes	-0.124939
-0.552963	mechanism because it makes	-0.124939
-0.552963	interesting because it makes	-0.124939
-1.097610	cases where it makes	-0.124939
-0.351784	integer variable, it makes	-0.124939
-1.262145	of the code makes	-0.425969
-0.492669	the memory. This makes	-0.124939
-0.706391	the program. This makes	-0.124939
-0.677283	of data. This makes	-0.124939
-0.422667	+= x; This makes	-0.124939
-0.460157	or 1. This makes	-0.124939
-0.460157	or class. This makes	-0.124939
-0.326416	64 bytes. This makes	-0.124939
-0.807455	the stack. This makes	-0.124939
-0.326416	than needed. This makes	-0.124939
-0.326416	random order. This makes	-0.124939
-0.474643	32 bits. This makes	-0.124939
-0.654408	other modules. This makes	-0.124939
-0.597117	at all. This makes	-0.124939
-0.326416	is doubled. This makes	-0.124939
-0.326416	cache lines. This makes	-0.124939
-0.422667	been loaded. This makes	-0.124939
-0.326416	is stored. This makes	-0.124939
-0.422667	become fragmented. This makes	-0.124939
-0.326416	table static. This makes	-0.124939
-0.326416	have occurred. This makes	-0.124939
-0.326416	made local. This makes	-0.124939
-0.594007	functions. The compiler makes	-0.124939
-0.577029	code, but this makes	-0.124939
-0.358289	another class. It makes	-0.124939
-0.358091	The installation program makes	-0.124939
-0.358029	smaller functions only makes	-0.124939
-0.347302	access non-sequential which makes	-0.124939
-0.347302	by default, which makes	-0.124939
-0.347302	of abstraction which makes	-0.124939
-0.636920	the stack, which makes	-0.124939
-0.462304	comparisons by one makes	-0.124939
-1.734544	SSE2 instruction set makes	-0.124939
-0.579031	Pro instruction set makes	-0.124939
-1.295416	vector class library makes	-0.124939
-0.346298	time, it also makes	-0.124939
-0.447627	possible. This also makes	-0.124939
-0.346298	static keyword also makes	-0.124939
-0.572758	The function call makes	-0.124939
-0.356520	variable. Using pointers makes	-0.124939
-0.460170	exception handling system makes	-0.124939
-0.355640	A non-Intel processor makes	-0.124939
-0.138076	manual. This option makes	-0.124939
-0.138076	-fsource-asm). This option makes	-0.124939
-0.355054	class. This check makes	-0.124939
-0.354330	unsigned integers simply makes	-0.124939
-0.518679	A const reference makes	-0.124939
-0.647550	The volatile keyword makes	-0.124939
-0.502405	while dynamic linking makes	-0.124939
-0.491915	used. Dynamic linking makes	-0.124939
-0.351079	const, or #define makes	-0.124939
-0.739583	standard stack frame makes	-0.124939
-0.445914	use of templates makes	-0.124939
-0.345055	of such checks makes	-0.124939
-0.344979	The static declaration makes	-0.124939
-0.483555	multiple memory blocks makes	-0.124939
-0.339310	with many instances makes	-0.124939
-0.517417	because the linker makes	-0.124939
-0.314583	than the product makes	-0.124939
-0.237682	code section position-independent, makes	-0.124939
-0.358517	be inlined or cannot	-0.124939
-1.529712	so that it cannot	-0.124939
-1.211279	means that it cannot	-0.124939
-0.881189	only if it cannot	-0.124939
-0.588613	pointers because it cannot	-0.124939
-0.569714	to. Therefore, it cannot	-0.124939
-1.352489	that the function cannot	-0.124939
-0.928728	static member function cannot	-0.124939
-0.520351	const member function cannot	-0.124939
-0.354910	dispatcher 128 function cannot	-0.124939
-0.572887	The intermediate code cannot	-0.124939
-1.331744	that the compiler cannot	-0.124939
-0.572632	Unfortunately, the compiler cannot	-0.124939
-0.572632	12.1b, the compiler cannot	-0.124939
-0.589371	72). The compiler cannot	-0.124939
-0.350905	The CodeGear compiler cannot	-0.124939
-0.480092	library and you cannot	-0.124939
-0.480092	sequential, and you cannot	-0.124939
-0.572695	questions if you cannot	-0.124939
-0.513608	loop then you cannot	-0.124939
-0.513608	handling then you cannot	-0.124939
-0.513608	set, then you cannot	-0.124939
-0.567468	cycles. If you cannot	-0.124939
-0.494670	type, but you cannot	-0.124939
-0.473974	... Here, you cannot	-0.124939
-0.435324	... Here you cannot	-0.124939
-0.435324	most systems, you cannot	-0.124939
-0.336527	false. Likewise, you cannot	-0.124939
-0.658205	The MOVNTQ instruction cannot	-0.124939
-0.803187	a function which cannot	-0.124939
-1.281316	the level-2 cache cannot	-0.124939
-0.836679	that the compilers cannot	-0.124939
-0.353430	For example, compilers cannot	-0.124939
-0.657429	the final size cannot	-0.124939
-0.503724	above. An object cannot	-0.124939
-0.548096	below). A variable cannot	-0.124939
-0.299487	^ 1; You cannot	-0.124939
-0.548168	clock cycles. You cannot	-0.124939
-0.299487	a thread. You cannot	-0.124939
-0.299487	which not. You cannot	-0.124939
-0.299487	this case. You cannot	-0.124939
-0.299487	pitfalls here. You cannot	-0.124939
-0.299487	Boolean operands. You cannot	-0.124939
-0.299487	of CriticalFunction. You cannot	-0.124939
-0.299487	test examples. You cannot	-0.124939
-0.299487	are compiler-specific. You cannot	-0.124939
-0.823111	particular memory address cannot	-0.124939
-0.909161	Whole program optimization cannot	-0.124939
-0.462166	so that they cannot	-0.124939
-0.327891	reason that they cannot	-0.124939
-0.506375	avoided because they cannot	-0.124939
-0.415229	cases where they cannot	-0.124939
-0.320445	which reductions they cannot	-0.124939
-0.460135	More complicated cases cannot	-0.124939
-0.536847	The vector instructions cannot	-0.124939
-0.355984	that access times cannot	-0.124939
-0.522780	of Intel CPUs cannot	-0.124939
-0.355698	make it work cannot	-0.124939
-0.565090	m and therefore cannot	-0.124939
-0.558972	of the problem cannot	-0.124939
-0.518844	that the name cannot	-0.124939
-0.518679	or const reference cannot	-0.124939
-0.496791	for network resources cannot	-0.124939
-0.455739	132 Table lookup cannot	-0.124939
-0.352860	is optimized. We cannot	-0.124939
-0.384622	of different types cannot	-0.425969
-0.533055	where dynamic linking cannot	-0.124939
-0.351057	floating point operands cannot	-0.124939
-0.743758	library is loaded cannot	-0.124939
-0.335986	optimization. The debugger cannot	-0.124939
-0.331596	induction variables Compilers cannot	-0.124939
-0.293959	many encryption algorithms, cannot	-0.124939
-0.503895	vector operations and before	-0.124939
-0.358125	starts running and before	-0.124939
-0.588922	to unsigned int before	-0.124939
-1.012385	this the time before	-0.124939
-0.567379	read the time before	-0.124939
-0.594158	terminates the program before	-0.124939
-0.521509	inside your program before	-0.124939
-0.358053	an EMMS instruction before	-0.124939
-0.503060	a to double before	-0.124939
-0.548729	is an Intel before	-0.124939
-0.357241	a temporary array before	-0.124939
-1.415955	that the value before	-0.124939
-1.941722	time it takes before	-0.124939
-0.759524	a virtual table before	-0.124939
-0.357029	branch misprediction long before	-0.124939
-0.836583	that is called before	-0.124939
-0.783589	must be called before	-0.124939
-0.625839	is usually called before	-0.124939
-0.522780	billions of times before	-0.124939
-1.126936	on the stack before	-0.124939
-0.502259	from the stack before	-0.124939
-0.459180	are too big before	-0.124939
-1.152547	check for overflow before	-0.124939
-0.458923	to signed integers before	-0.124939
-0.578587	to double precision before	-0.124939
-0.354989	such a check before	-0.124939
-0.565266	store is known before	-0.124939
-0.621467	the size known before	-0.124939
-0.319979	by their values before	-0.124939
-0.319979	to desired values before	-0.124939
-0.319979	their actual values before	-0.124939
-0.354201	a pointer well before	-0.124939
-1.303524	few clock cycles before	-0.124939
-1.033344	time stamp counter before	-0.124939
-0.354083	the clock count before	-0.124939
-0.496988	it to signed before	-0.124939
-0.352591	a new addition before	-0.124939
-0.329357	final size needed before	-0.124939
-0.602619	Is searching needed before	-0.124939
-0.495886	can be read before	-0.124939
-0.451553	because it comes before	-0.124939
-0.348769	value of temp before	-0.124939
-0.805706	the optimal algorithm before	-0.124939
-0.347083	the thread priority before	-0.124939
-0.735020	of one iteration before	-0.124939
-1.126410	performance monitor counters before	-0.124939
-0.344917	for security reasons before	-0.124939
-0.483402	times // Time before	-0.124939
-0.441433	detect the misprediction before	-0.124939
-0.339267	they are resolved before	-0.124939
-0.339009	nearby address again before	-0.124939
-0.335762	declaration of c1 before	-0.124939
-0.335861	calculation of B before	-0.124939
-0.335861	interpretation or compilation before	-0.124939
-0.493569	done the job before	-0.124939
-0.331492	that require cleanup before	-0.124939
-0.048328	intrinsic function _mm256_zeroupper() before	-0.124939
-0.005112	then call _mm256_zeroupper() before	-0.301030
-0.324873	take several years before	-0.124939
-0.314251	of programming experience before	-0.124939
-0.314251	it is evicted before	-0.124939
-0.293755	may be freed before	-0.124939
-0.102714	to do immediately before	-0.124939
-0.102714	be placed immediately before	-0.124939
-0.237503	in several stages before	-0.124939
-0.237503	to be restored before	-0.124939
-0.237503	a particular subtask before	-0.124939
-0.237503	to calculate (c+d) before	-0.124939
-0.237503	string is checked before	-0.124939
-0.237503	the second sub-vector before	-0.124939
-0.591655	table that is stored	-0.124939
-1.589830	that it is stored	-0.124939
-1.235505	if it is stored	-0.124939
-0.521988	counter i is stored	-0.124939
-0.485326	the variable is stored	-0.425969
-0.412737	first result is stored	-0.124939
-0.412737	second result is stored	-0.124939
-0.718419	first element is stored	-0.124939
-0.355324	The sign is stored	-0.124939
-0.779262	The exponent is stored	-0.124939
-0.355324	the fraction is stored	-0.124939
-0.358342	is distributed and stored	-0.124939
-0.658693	in advance and stored	-0.124939
-0.567981	variable to be stored	-0.124939
-0.837412	variables to be stored	-0.124939
-1.341122	which can be stored	-0.124939
-0.870063	variables can be stored	-0.124939
-0.583699	code may be stored	-0.124939
-0.497670	class will be stored	-0.124939
-0.497670	objects will be stored	-0.124939
-0.497670	3.5 will be stored	-0.124939
-0.497670	modifier will be stored	-0.124939
-0.708868	array should be stored	-0.124939
-0.494181	bytes should be stored	-0.124939
-0.080857	together should be stored	-0.425969
-0.526629	object cannot be stored	-0.124939
-0.526629	variable cannot be stored	-0.124939
-0.841853	therefore preferably be stored	-0.124939
-1.302837	Variables that are stored	-0.124939
-0.126437	a function are stored	-0.425969
-0.501109	the data are stored	-0.425969
-0.421928	which data are stored	-0.124939
-0.493673	a class are stored	-0.124939
-0.493673	derived class are stored	-0.124939
-0.563273	and objects are stored	-0.124939
-0.328222	used variables are stored	-0.124939
-0.638707	Boolean variables are stored	-0.124939
-0.328222	Global variables are stored	-0.124939
-1.097574	the elements are stored	-0.124939
-1.142642	Function parameters are stored	-0.124939
-0.345393	or structure are stored	-0.124939
-0.486187	point numbers are stored	-0.124939
-0.633209	used together are stored	-0.124939
-0.753386	point constants are stored	-0.124939
-1.054378	objects are not stored	-0.124939
-0.462836	ebx is then stored	-0.124939
-1.409117	through a pointer stored	-0.124939
-0.346438	program are also stored	-0.124939
-0.139554	other are also stored	-0.425969
-0.346438	together are also stored	-0.124939
-0.779676	if the objects stored	-0.124939
-0.438999	(en.wikipedia.org/wiki/Standard_Template_Library). The objects stored	-0.124939
-0.137372	only for objects stored	-0.124939
-0.137372	principle for objects stored	-0.124939
-0.537308	properties) are always stored	-0.124939
-1.021009	objects have been stored	-0.124939
-0.354485	and for information stored	-0.124939
-0.497760	child are typically stored	-0.124939
-0.795606	variable is never stored	-0.124939
-0.494728	functions are usually stored	-0.124939
-0.348340	p. 26). Variables stored	-0.124939
-0.048385	by 16, i.e. stored	-0.425969
-0.487850	are not necessarily stored	-0.124939
-0.601081	body of the called	-0.124939
-0.599595	caller to the called	-0.124939
-0.537875	big and is called	-0.124939
-1.067420	function that is called	-0.124939
-0.563645	routine that is called	-0.124939
-1.246101	if it is called	-0.124939
-0.975804	the function is called	-0.425969
-1.057702	a function is called	-0.124939
-0.694948	The function is called	-0.124939
-0.485637	other function is called	-0.124939
-0.485637	each function is called	-0.124939
-0.694948	virtual function is called	-0.124939
-0.930126	time. This is called	-0.124939
-0.515763	CPUs. This is called	-0.124939
-0.515763	parameters. This is called	-0.124939
-0.515763	processor. This is called	-0.124939
-0.515763	do. This is called	-0.124939
-0.515763	same. This is called	-0.124939
-0.515763	spaces. This is called	-0.124939
-0.515070	other functions is called	-0.124939
-0.582199	(DLL) which is called	-0.124939
-0.515070	preceding one is called	-0.124939
-0.790587	this example is called	-0.124939
-0.493403	Intel processors is called	-0.124939
-0.245490	Intel's profiler is called	-0.124939
-0.245490	AMD's profiler is called	-0.124939
-0.497283	the destructor is called	-0.124939
-0.140841	times CriticalFunction is called	-0.124939
-0.140841	whether CriticalFunction is called	-0.124939
-1.183681	library can be called	-0.124939
-0.858835	function may be called	-0.124939
-0.858835	constructor may be called	-0.124939
-0.582715	function cannot be called	-0.124939
-0.523830	constructor must be called	-0.124939
-0.523830	any, must be called	-0.124939
-0.354150	Dispatcher. Will be called	-0.124939
-0.522709	any function are called	-0.124939
-0.551126	functions which are called	-0.124939
-0.782270	derived class are called	-0.124939
-0.354752	each object are called	-0.124939
-0.651553	all destructors are called	-0.124939
-0.354752	and sum2 are called	-0.124939
-0.583675	Assume member function called	-0.124939
-0.502894	throw() Assume function called	-0.124939
-0.353392	alone compiler when called	-0.124939
-0.571403	not only when called	-0.124939
-0.353392	but also when called	-0.124939
-0.867578	part of memory called	-0.124939
-0.499247	if all functions called	-0.124939
-0.560494	any library functions called	-0.124939
-0.856780	it is only called	-0.124939
-0.357920	a special cache called	-0.124939
-0.751819	function is also called	-0.124939
-0.519871	This is also called	-0.124939
-0.063473	link libraries, also called	-0.425969
-0.357236	to its variables called	-0.124939
-1.141129	it has been called	-0.124939
-0.354013	the function was called	-0.124939
-0.352829	described a mechanism called	-0.124939
-0.517409	functions are actually called	-0.124939
-0.612470	function is usually called	-0.124939
-0.432881	any, is usually called	-0.124939
-0.634525	have a feature called	-0.124939
-0.297363	compiler A feature called	-0.124939
-0.386698	built-in test feature called	-0.124939
-0.349672	to its functions, called	-0.124939
-0.237796	a class (also called	-0.124939
-0.237796	function is 83 called	-0.124939
-0.237796	is // erroneously called	-0.124939
-1.281018	calculation of the address	-0.124939
-0.578606	row to the address	-0.124939
-0.578606	eax to the address	-0.124939
-1.194686	equal to the address	-0.124939
-1.885707	is that the address	-0.124939
-0.595797	element if the address	-0.124939
-1.075246	to make the address	-0.425969
-0.322514	look up the address	-0.425969
-0.065321	which contains the address	-0.124939
-0.065321	now contains the address	-0.124939
-0.065321	ecx contains the address	-0.124939
-0.065321	edx contains the address	-0.124939
-1.171724	to calculate the address	-0.124939
-0.495540	can calculate the address	-0.124939
-0.355137	is simply the address	-0.124939
-0.576127	constant, unless the address	-0.124939
-0.809233	} Here, the address	-0.124939
-1.160927	to find the address	-0.124939
-0.499720	line containing the address	-0.124939
-0.699408	for calculating the address	-0.124939
-0.411954	when calculating the address	-0.124939
-0.521713	file tells the address	-0.124939
-0.355137	4. So the address	-0.124939
-0.355137	that covered the address	-0.124939
-0.462585	its address. The address	-0.124939
-0.462585	odd here. The address	-0.124939
-0.817949	extra instructions for address	-0.124939
-0.598255	up the function address	-0.124939
-0.358543	a different code address	-0.124939
-0.563987	aligned to an address	-0.124939
-0.049517	stored at an address	-0.425969
-0.105415	start at an address	-0.124939
-0.180794	loaded at an address	-0.124939
-0.105415	begin at an address	-0.124939
-0.462853	address. If this address	-0.124939
-0.452023	a variable from address	-0.124939
-0.349776	0x40 bytes from address	-0.124939
-0.349776	Reading again from address	-0.124939
-0.452023	program reads from address	-0.124939
-0.093537	at a memory address	-0.301030
-0.486076	libraries. The memory address	-0.124939
-0.497681	read from memory address	-0.124939
-0.063166	a particular memory address	-0.124939
-0.338617	an arbitrary memory address	-0.124939
-0.548007	then stored at address	-0.124939
-0.355296	the stack at address	-0.124939
-1.285752	from the same address	-0.124939
-0.861010	to any other address	-0.124939
-0.462193	can calculate each address	-0.124939
-1.334104	of the array address	-0.124939
-0.461338	overwrite the return address	-0.124939
-0.460475	relocate, but these address	-0.124939
-0.459799	register if its address	-0.124939
-0.353442	than the complicated address	-0.124939
-0.519211	program by their address	-0.124939
-0.518247	while the runtime address	-0.124939
-0.558999	get its own address	-0.124939
-0.541701	at a higher address	-0.124939
-0.350846	if the target address	-0.124939
-0.243505	then the target address	-0.124939
-0.338185	predicted. The target address	-0.124939
-0.481080	because a fixed address	-0.124939
-0.434748	as a base address	-0.124939
-0.429244	from the larger address	-0.124939
-0.314542	or a nearby address	-0.124939
-0.294033	the variable whose address	-0.124939
-1.296623	any of the 4	-0.124939
-0.575449	This pointer is 4	-0.124939
-0.827458	exp function of 4	-0.124939
-0.358480	// Structure of 4	-0.124939
-0.575480	take up to 4	-0.124939
-0.203036	bytes. first // 4	-0.124939
-0.522314	int b; // 4	-0.124939
-0.718932	int d; // 4	-0.124939
-0.573066	2048 bytes = 4	-0.124939
-0.722840	rolled out by 4	-0.124939
-0.461484	; align by 4	-0.124939
-0.504445	and 3 - 4	-0.124939
-0.655185	bytes = int 4	-0.124939
-0.655185	float or int 4	-0.124939
-0.657303	bytes = double 4	-0.124939
-0.993160	bytes = float 4	-0.124939
-0.352139	8 8 float 4	-0.124939
-1.204429	There are also 4	-0.124939
-0.357268	= 8 * 4	-0.124939
-0.453798	or double takes 4	-0.124939
-0.351177	microprocessor. Multiplication takes 4	-0.124939
-0.422898	8 float 4 4	-0.124939
-0.326600	or unsigned 4 4	-0.124939
-0.326600	32-bit mode 4 4	-0.124939
-0.326600	only _mm_permutevar_ps 4 4	-0.124939
-0.326600	AVX _mm256_permutevar_ps 4 4	-0.124939
-0.952523	signed or unsigned 4	-0.124939
-0.457955	AVX double 64 4	-0.124939
-0.837888	long long 64 4	-0.124939
-0.457955	64 4 64 4	-0.124939
-0.100613	32 8 64 4	-0.124939
-0.489600	to execution time. 4	-0.124939
-0.347858	total computation time. 4	-0.124939
-0.841330	short int 16 4	-0.124939
-0.336517	64 Iu8vec8 16 4	-0.124939
-0.336517	64 Is16vec4 16 4	-0.124939
-0.479969	SSE2 int 32 4	-0.124939
-0.448361	SSE2 float 32 4	-0.124939
-0.317715	64 4 32 4	-0.124939
-0.317715	Iu16vec8 Vec8us 32 4	-0.124939
-0.317715	Is32vec4 Vec4i 32 4	-0.124939
-0.354922	is 8192 / 4	-0.124939
-0.498747	reference, 32-bit mode 4	-0.124939
-0.456659	as 32 sets 4	-0.124939
-0.161910	on the Pentium 4	-0.124939
-0.077658	on a Pentium 4	-0.124939
-0.225451	computer. The Pentium 4	-0.124939
-0.161910	cycles on Pentium 4	-0.124939
-0.161910	pattern, while Pentium 4	-0.124939
-0.514423	of 1, 2, 4	-0.124939
-0.492126	_mm_prefetch SSE Store 4	-0.124939
-0.341765	end of procedure 4	-0.124939
-0.176448	AVX2 _mm256_i32gather_epi32 unlimited 4	-0.124939
-0.176448	AVX2 _mm_i32gather_ps unlimited 4	-0.124939
-0.176448	AVX2 _mm_i32gather_epi32 unlimited 4	-0.124939
-0.176448	AVX2 _mm256_i32gather_ps unlimited 4	-0.124939
-0.621138	bytes = int64_t 4	-0.124939
-0.621138	by a factor 4	-0.124939
-0.331576	throughput ....................................................................................... 22 4	-0.124939
-0.044121	to assembly: ALIGN 4	-0.425969
-0.594615	; parameter 1: 4	-0.124939
-0.314445	of optimizing ............................................................................................... 4	-0.124939
-0.293941	the least recently 4	-0.124939
-0.293941	= 8192 bytes, 4	-0.124939
-0.237666	AMD only _mm_permutevar_ps 4	-0.124939
-0.237666	4 AVX _mm256_permutevar_ps 4	-0.124939
-0.358335	Windows syntax or See	-0.124939
-1.315318	of the code. See	-0.124939
-0.536257	virtual member function. See	-0.124939
-0.354813	efficient than functions. See	-0.124939
-0.474187	piece of memory. See	-0.124939
-0.336683	with contiguous memory. See	-0.124939
-0.353953	the main program. See	-0.124939
-0.443110	registers are used. See	-0.425969
-0.646426	and Gnu compilers. See	-0.124939
-0.815014	and VIA processors. See	-0.124939
-0.349780	standardized across platforms. See	-0.124939
-0.349867	the simplest cases. See	-0.124939
-0.925564	0 or 1. See	-0.124939
-0.349125	use static variables. See	-0.124939
-0.347886	into sleep mode. See	-0.124939
-0.347781	about code optimization. See	-0.124939
-0.538870	variables, if possible. See	-0.124939
-0.447193	an inefficient way. See	-0.124939
-0.344737	an Intel CPU. See	-0.124939
-0.795595	aligned or not. See	-0.124939
-0.344873	best possible version. See	-0.124939
-0.629068	out of order. See	-0.124939
-1.148610	dynamic memory allocation. See	-0.124939
-0.343329	linking" if available. See	-0.124939
-0.343178	complex integer expressions. See	-0.124939
-0.343103	kinds of storage. See	-0.124939
-0.533970	the operating system. See	-0.124939
-0.318817	and operating system. See	-0.124939
-0.625184	loop control branch. See	-0.124939
-0.620791	enables interprocedural optimizations. See	-0.124939
-0.511029	using assembly language. See	-0.124939
-0.614579	to avoid this. See	-0.124939
-0.874833	do not overlap. See	-0.124939
-0.487170	structured exception handling. See	-0.124939
-0.331303	reading disk files. See	-0.124939
-0.757788	to do so. See	-0.124939
-0.331435	of five manuals. See	-0.124939
-0.209990	same memory pool. See	-0.124939
-0.209990	one memory pool. See	-0.124939
-0.044094	Intel's CPU dispatcher. See	-0.425969
-0.420656	is not cached. See	-0.124939
-0.740648	no pointer aliasing. See	-0.124939
-0.314183	slightly less compact. See	-0.124939
-0.314183	prevent such errors. See	-0.124939
-0.314183	each part takes. See	-0.124939
-0.314395	recover from exceptions. See	-0.124939
-0.407452	does not occur. See	-0.124939
-0.065738	near each other. See	-0.425969
-0.293691	be optimally aligned. See	-0.124939
-0.382191	strictness is required. See	-0.124939
-0.293691	branch prediction mechanism. See	-0.124939
-0.293691	serious legal issue. See	-0.124939
-0.293691	a long delay. See	-0.124939
-0.293691	use STL containers. See	-0.124939
-0.293691	of the alignment. See	-0.124939
-0.382191	cause cache contentions. See	-0.124939
-0.293691	program will crash. See	-0.124939
-0.237446	p is incremented. See	-0.124939
-0.237446	when I die. See	-0.124939
-0.237446	you are doing. See	-0.124939
-0.237446	optimization is requested. See	-0.124939
-0.237446	available from Intel. See	-0.124939
-0.237446	perhaps Mac OS. See	-0.124939
-0.237446	the directive __declspec(cpu_dispatch(...)). See	-0.124939
-0.237446	loop-invariant code motion. See	-0.124939
-0.237446	parallelism is obvious. See	-0.124939
-0.237446	type identification (RTTI). See	-0.124939
-0.237446	_mm_setcsr(_mm_getcsr() | 0x8040); See	-0.124939
-0.379817	multiple of the critical	-0.726999
-1.021478	version of the critical	-0.124939
-0.573045	call of the critical	-0.124939
-1.433451	versions of the critical	-0.124939
-1.156079	call to the critical	-0.124939
-1.223681	equal to the critical	-0.124939
-0.591399	resolution and the critical	-0.124939
-0.579549	int in the critical	-0.124939
-0.491504	calls in the critical	-0.425969
-0.579549	small in the critical	-0.124939
-0.579549	conversions in the critical	-0.124939
-0.593307	subroutine for the critical	-0.124939
-0.593822	avoid that the critical	-0.124939
-0.585694	obtained if the critical	-0.124939
-0.585694	mispredictions if the critical	-0.124939
-0.881527	called by the critical	-0.124939
-1.296618	every time the critical	-0.124939
-0.587925	methods then the critical	-0.124939
-1.040853	is because the critical	-0.124939
-0.589884	62. If the critical	-0.124939
-0.583732	manner where the critical	-0.124939
-0.499153	that calls the critical	-0.124939
-0.354730	16.2 calls the critical	-0.124939
-0.582990	log) inside the critical	-0.124939
-0.565417	is outside the critical	-0.124939
-0.533234	size. When the critical	-0.124939
-0.533234	that includes the critical	-0.124939
-0.456949	after executing the critical	-0.124939
-0.649401	to identify the critical	-0.124939
-0.353664	this distance the critical	-0.124939
-0.890200	system code is critical	-0.124939
-1.547410	part of a critical	-0.124939
-1.035397	versions of a critical	-0.124939
-1.068879	used in a critical	-0.124939
-0.539250	This makes a critical	-0.124939
-0.357633	after executing a critical	-0.124939
-0.358868	security advices in critical	-0.124939
-0.461837	8 ways. The critical	-0.124939
-0.461837	cache lines. The critical	-0.124939
-0.461837	bytes each. The critical	-0.124939
-0.358784	not recommended for critical	-0.124939
-0.503644	data cache are critical	-0.124939
-0.357946	memory access are critical	-0.124939
-0.358636	most important or critical	-0.124939
-0.358262	distinct tasks. A critical	-0.124939
-1.286116	from the same critical	-0.124939
-0.538280	of the most critical	-0.425969
-0.495925	to the most critical	-0.124939
-0.180716	in the most critical	-0.602060
-0.352413	that the most critical	-0.124939
-0.352413	make the most critical	-0.124939
-0.352413	making the most critical	-0.124939
-0.352413	isolate the most critical	-0.124939
-0.550346	cases. The most critical	-0.124939
-0.356952	time. Optimizing less critical	-0.124939
-0.577620	mitigated by making critical	-0.124939
-0.526212	drivers are particularly critical	-0.124939
-0.093282	120 13 Making critical	-0.124939
-0.093282	121 13 Making critical	-0.124939
-0.065796	... // Call critical	-0.425969
-0.237837	soft processor activates critical	-0.124939
-0.540318	spot. Use the call	-0.124939
-0.574437	see whether the call	-0.124939
-0.549796	may replace the call	-0.124939
-0.503950	and inlining the call	-0.124939
-0.725000	to overlap the call	-0.124939
-0.358164	by removing the call	-0.124939
-0.878450	certain that a call	-0.124939
-0.358550	unchanged across a call	-0.124939
-1.115495	The overhead of call	-0.124939
-1.228884	the code to call	-0.124939
-0.881648	you have to call	-0.425969
-0.574308	longer time to call	-0.124939
-0.865750	it takes to call	-0.301030
-1.045191	that need to call	-0.124939
-1.729135	you want to call	-0.124939
-1.026423	that needs to call	-0.124939
-0.458380	the destructor to call	-0.124939
-0.354792	exception handler to call	-0.124939
-0.354792	is supposed to call	-0.124939
-0.504153	best compiler and call	-0.124939
-0.358310	by F2 and call	-0.124939
-0.589559	program that can call	-0.124939
-0.590451	so it can call	-0.124939
-0.568474	parameter. It can call	-0.124939
-0.525905	define matrix // call	-0.124939
-0.357645	"Hello "; // call	-0.124939
-1.320670	that the function call	-0.124939
-1.090260	as a function call	-0.124939
-0.550932	replace a function call	-0.124939
-0.550932	replacing a function call	-0.124939
-0.550644	reasons: The function call	-0.124939
-0.486012	block or function call	-0.124939
-0.378842	for each function call	-0.124939
-0.378842	at each function call	-0.124939
-0.804700	a virtual function call	-0.124939
-0.446325	each intrinsic function call	-0.124939
-0.345267	information. Each function call	-0.124939
-0.507304	to dispatched function call	-0.124939
-0.540693	you should not call	-0.124939
-1.160428	Alternatively, you may call	-0.124939
-0.355643	different address. A call	-0.124939
-0.355643	device driver. A call	-0.124939
-0.504077	apart. I will call	-0.124939
-0.560399	zero and then call	-0.124939
-0.350240	AVX support then call	-0.124939
-0.350240	CPU dispatching then call	-0.124939
-0.350240	AVX support, then call	-0.124939
-1.088003	more than one call	-0.124939
-0.517315	is only one call	-0.124939
-0.517315	make only one call	-0.124939
-0.462221	time because each call	-0.124939
-0.461547	will make any call	-0.124939
-1.069501	before the first call	-0.124939
-0.232080	dispatching on first call	-0.124939
-0.232080	dispatch on first call	-0.124939
-0.061381	// After first call	-0.425969
-0.460292	by a system call	-0.124939
-0.288050	function that doesn't call	-0.425969
-0.520417	with a single call	-0.124939
-0.520417	make a single call	-0.124939
-0.238650	dispatching on every call	-0.124939
-0.238650	dispatch on every call	-0.124939
-0.453707	no other modules call	-0.124939
-0.441939	p->f(); // Virtual call	-0.124939
-0.438820	} // Now call	-0.124939
-0.070320	int i = 0;	-0.124939
-0.005023	(int i = 0;	-1.238882
-0.516996	CFALSE: c = 0;	-0.124939
-0.016802	for (i = 0;	-1.539912
-0.490815	{ d = 0;	-0.124939
-0.298278	float sum = 0;	-0.124939
-0.201917	i, sum = 0;	-0.124939
-0.201917	list[size], sum = 0;	-0.124939
-0.385581	i+=3){ list[i] = 0;	-0.124939
-0.296454	{ seconds = 0;	-0.124939
-0.013640	for (c = 0;	-0.726999
-0.296454	0, sum2 = 0;	-0.124939
-0.350046	for (r = 0;	-0.425969
-0.183671	for (x = 0;	-0.425969
-0.385581	i, largest_index = 0;	-0.124939
-0.385581	absvalue, largest_abs = 0;	-0.124939
-0.296454	for (j = 0;	-0.124939
-0.296454	for (c1 = 0;	-0.124939
-0.296454	for (row = 0;	-0.124939
-0.296454	for (r1 = 0;	-0.124939
-0.296454	for (column = 0;	-0.124939
-0.296454	} list[300] = 0;	-0.124939
-0.845278	else { return 0;	-0.124939
-0.063890	c); ... return 0;	-0.425969
-0.352681	StringLength; i > 0;	-0.124939
-0.492245	2.0; i >= 0;	-0.124939
-0.336330	&& z != 0;	-0.124939
-0.463571	slow. Today, the 8	-0.124939
-0.835680	data cache is 8	-0.124939
-0.249243	a cache of 8	-0.124939
-0.249243	data cache of 8	-0.124939
-0.549008	sixteen integers of 8	-0.124939
-0.357800	8 double's of 8	-0.124939
-0.526233	(6 integer and 8	-0.124939
-1.075883	32-bit systems and 8	-0.124939
-0.580281	sizeof(S1) would be 8	-0.124939
-0.357576	unused bytes // 8	-0.124939
-0.525297	double b; // 8	-0.124939
-0.358533	* sizeof(float)) = 8	-0.124939
-0.548570	typically aligned by 8	-0.124939
-1.236498	address divisible by 8	-0.124939
-0.659029	takes 4 - 8	-0.124939
-0.655151	bytes = int 8	-0.124939
-0.655151	float or int 8	-0.124939
-1.677493	explained on page 8	-0.124939
-0.357622	4 4 double 8	-0.124939
-1.006306	bytes = float 8	-0.124939
-0.357240	value 10 * 8	-0.124939
-0.461563	a double takes 8	-0.124939
-0.357034	has nothing between 8	-0.124939
-0.318804	4 double 8 8	-0.124939
-0.318804	or unsigned 8 8	-0.124939
-0.318804	64-bit mode 8 8	-0.124939
-0.481552	set char 8 8	-0.124939
-0.318804	class, Agner 8 8	-0.124939
-0.318804	64 Is8vec8 8 8	-0.124939
-0.952467	signed or unsigned 8	-0.124939
-0.490519	AVX512 double 64 8	-0.124939
-0.920522	long long 64 8	-0.124939
-0.460895	12 or 16 8	-0.124939
-0.811476	short int 16 8	-0.124939
-0.326958	Is16vec8 Vec8s 16 8	-0.124939
-0.326958	Iu8vec16 Vec16uc 16 8	-0.124939
-0.479943	AVX2 int 32 8	-0.124939
-0.448337	AVX2 float 32 8	-0.124939
-0.317697	64 2 32 8	-0.124939
-0.448337	32 8 32 8	-0.124939
-0.317697	16 16 32 8	-0.124939
-0.558250	sees the constant 8	-0.124939
-0.354887	512 kb / 8	-0.124939
-0.544038	made the structure 8	-0.124939
-0.569294	reference, 64-bit mode 8	-0.124939
-0.266241	Instruction set char 8	-0.124939
-0.396774	int unsigned char 8	-0.124939
-0.266241	128 SSE2 char 8	-0.124939
-0.266241	64 MMX char 8	-0.124939
-0.266241	in stdint.h char 8	-0.124939
-0.549367	leave the last 8	-0.124939
-0.772925	#elif INSTRSET == 8	-0.124939
-0.446066	_mm_stream_si32 SSE2 Store 8	-0.124939
-0.287365	using namespaces. 65 8	-0.124939
-0.287365	7.33 Namespaces........................................................................................................... 65 8	-0.124939
-0.176441	AVX2 _mm256_i64gather_pd unlimited 8	-0.124939
-0.176441	AVX2 _mm_i64gather_pd unlimited 8	-0.124939
-0.176441	AVX2 _mm256_i64gather_epi32 unlimited 8	-0.124939
-0.176441	AVX2 _mm_i64gather_epi32 unlimited 8	-0.124939
-0.441615	only the lower 8	-0.124939
-0.438620	ecx, 1 eax, 8	-0.124939
-0.498450	line can hold 8	-0.124939
-0.335997	Vector class, Agner 8	-0.124939
-0.594541	; parameter 1: 8	-0.124939
-0.237633	programming language ............................................................................... 8	-0.124939
-0.237633	128 Is8vec16 Vec16c 8	-0.124939
-0.237633	uint64_t 128 Vec2uq 8	-0.124939
-0.237633	char 64 Is8vec8 8	-0.124939
-0.237633	is 512 kb, 8	-0.124939
-0.237633	int64_t 64 I64vec1 8	-0.124939
-1.272217	where it is less	-0.124939
-1.825078	the function is less	-0.124939
-0.594530	default. This is less	-0.124939
-1.201547	Intel compiler is less	-0.124939
-0.597010	ahead. It is less	-0.124939
-0.586906	integers, which is less	-0.124939
-0.311472	processors, but is less	-0.124939
-0.311472	test, but is less	-0.124939
-0.972330	or class is less	-0.124939
-0.555280	multidimensional array is less	-0.124939
-0.551190	a value is less	-0.124939
-0.457492	-fpie option is less	-0.124939
-0.775999	linked list is less	-0.124939
-0.650247	point numbers is less	-0.124939
-0.715581	The delay is less	-0.124939
-0.354092	250 μs is less	-0.124939
-0.354092	a bitfield is less	-0.124939
-0.358714	jumping around and less	-0.124939
-0.358707	be sufficient for less	-0.124939
-1.494266	likely to be less	-0.124939
-1.017070	guaranteed to be less	-0.124939
-0.598039	programs can be less	-0.124939
-1.042973	code will be less	-0.124939
-0.355614	in fact be less	-0.124939
-0.578000	member functions are less	-0.124939
-0.496904	level-1 cache are less	-0.124939
-0.570942	Dynamic libraries are less	-0.124939
-0.353116	64 bits are less	-0.124939
-0.549874	executing instructions are less	-0.124939
-0.353116	Integer expressions are less	-0.124939
-0.353116	alternative implementations are less	-0.124939
-0.456254	alignment requirements are less	-0.124939
-0.358523	only 50% or less	-0.124939
-0.570286	which makes it less	-0.124939
-1.426759	makes the code less	-0.124939
-0.598565	i is not less	-0.124939
-0.358314	embedded applications have less	-0.124939
-0.462664	may run at less	-0.124939
-0.597083	makes the program less	-0.124939
-0.357914	and several other less	-0.124939
-0.525870	similar functions, but less	-0.124939
-0.560872	languages, but also less	-0.124939
-0.356751	using one register less	-0.124939
-0.537568	make member pointers less	-0.124939
-0.355989	a million times less	-0.124939
-0.355735	data files while less	-0.124939
-0.442885	backwards and much less	-0.124939
-0.342539	typically have much less	-0.124939
-0.759859	code cache works less	-0.124939
-0.650042	read or write less	-0.124939
-0.353884	this brand was less	-0.124939
-0.211657	the data caching less	-0.124939
-0.211657	and data caching less	-0.124939
-0.334069	makes data caching less	-0.602060
-0.478216	code makes caching less	-0.124939
-0.543666	option that allows less	-0.124939
-0.352880	a relative difference less	-0.124939
-0.514313	platform has become less	-0.124939
-0.349541	Mars compilers produce less	-0.124939
-0.641146	that make vectorization less	-0.124939
-0.347295	total time. Optimizing less	-0.124939
-0.347326	compilers available, though less	-0.124939
-0.632350	variables as input less	-0.124939
-0.391115	bytes is slightly less	-0.124939
-0.391115	are only slightly less	-0.124939
-0.237882	133 although slightly less	-0.124939
-0.293969	It works somewhat less	-0.124939
-0.237690	It may neverthe- less	-0.124939
-0.463140	stamp counter // For	-0.124939
-1.130868	in the code. For	-0.124939
-0.339450	handles this code. For	-0.124939
-0.339450	that makes code. For	-0.124939
-0.875051	at compile time. For	-0.124939
-0.355136	are comparisons, etc. For	-0.124939
-0.564902	pointer is used. For	-0.124939
-0.457421	Intel header files For	-0.124939
-0.642023	in many cases. For	-0.124939
-0.569327	waste of resources. For	-0.124939
-0.311294	have ample resources. For	-0.124939
-0.449756	of the variable. For	-0.124939
-0.347839	possibilities for optimization. For	-0.124939
-0.346006	a Boolean vector. For	-0.124939
-0.389816	called CPU dispatching. For	-0.124939
-0.389816	poor CPU dispatching. For	-0.124939
-0.293966	of every version. For	-0.124939
-0.293966	every intermediate version. For	-0.124939
-0.287305	object pointed to. For	-0.124939
-0.287305	pointer refers to. For	-0.124939
-0.725675	floating point expressions. For	-0.124939
-0.287184	complicated algebraic expressions. For	-0.124939
-0.443761	can cause overflow. For	-0.124939
-0.534118	different execution units. For	-0.124939
-0.341300	use of software. For	-0.124939
-0.339088	be vectorized automatically. For	-0.124939
-0.339001	in each core. For	-0.124939
-0.434478	to a structure. For	-0.124939
-0.331363	CPU- specific profiler. For	-0.124939
-0.443830	in one operation. For	-0.124939
-0.443830	a shift operation. For	-0.124939
-0.799017	the unroll factor. For	-0.124939
-0.324863	data elements are. For	-0.124939
-0.324863	worst- case conditions. For	-0.124939
-0.324863	difference in efficiency. For	-0.124939
-0.324863	to different tasks. For	-0.124939
-0.594226	big data structures. For	-0.124939
-0.211998	or more constants. For	-0.124939
-0.211998	for defining constants. For	-0.124939
-0.713651	and operating systems". For	-0.124939
-0.314241	C++: Preprocessor directives. For	-0.124939
-0.314241	of mixed sizes. For	-0.124939
-0.443675	a table lookup. For	-0.124939
-0.143249	conversion is valid. For	-0.124939
-0.143249	operand is valid. For	-0.124939
-0.314241	efficient than post-increment. For	-0.124939
-0.538042	the clock frequency. For	-0.124939
-0.293746	kinds of jobs. For	-0.124939
-0.293746	as a subexpression. For	-0.124939
-0.293746	set is supported. For	-0.124939
-0.382259	task in question. For	-0.124939
-0.293746	and easy development. For	-0.124939
-0.538042	of mathematical purity. For	-0.124939
-0.293746	number of sources. For	-0.124939
-0.382259	job fast enough. For	-0.124939
-0.538042	of the fraction. For	-0.124939
-0.382259	is poorly predictable. For	-0.124939
-0.293746	an execution unit. For	-0.124939
-0.293746	to a matrix. For	-0.124939
-0.237495	laws of algebra. For	-0.124939
-0.237495	when it exits. For	-0.124939
-0.237495	can be combined. For	-0.124939
-0.237495	sake of modularity. For	-0.124939
-0.237495	is simply identical. For	-0.124939
-0.237495	of algebraic reduction. For	-0.124939
-0.237495	% means modulo. For	-0.124939
-0.237495	be eliminated completely. For	-0.124939
-0.237495	of error reporting. For	-0.124939
-0.237495	threads is minimized. For	-0.124939
-0.355563	is used, for example,	-0.124939
-0.355563	can be, for example,	-0.124939
-0.355563	two decimals, for example,	-0.124939
-0.355563	expression. Assume, for example,	-0.124939
-0.355563	footprint. If, for example,	-0.124939
-0.355563	example 12.3a, for example,	-0.124939
-0.108617	} In this example,	-0.124939
-0.253798	b; In this example,	-0.124939
-0.253798	55 In this example,	-0.124939
-0.253798	elsewhere. In this example,	-0.124939
-0.253798	g(x)); In this example,	-0.124939
-0.088504	this code. For example,	-0.124939
-0.088504	makes code. For example,	-0.124939
-0.222970	compile time. For example,	-0.124939
-0.087670	is used. For example,	-0.124939
-0.087670	header files For example,	-0.124939
-0.140590	ample resources. For example,	-0.124939
-0.087670	the variable. For example,	-0.124939
-0.087670	for optimization. For example,	-0.124939
-0.087670	Boolean vector. For example,	-0.124939
-0.020315	CPU dispatching. For example,	-0.124939
-0.041627	point expressions. For example,	-0.124939
-0.041627	algebraic expressions. For example,	-0.124939
-0.087670	cause overflow. For example,	-0.124939
-0.087670	execution units. For example,	-0.124939
-0.087670	vectorized automatically. For example,	-0.124939
-0.087670	each core. For example,	-0.124939
-0.140590	one operation. For example,	-0.124939
-0.087670	unroll factor. For example,	-0.124939
-0.087670	elements are. For example,	-0.124939
-0.087670	case conditions. For example,	-0.124939
-0.087670	in efficiency. For example,	-0.124939
-0.087670	different tasks. For example,	-0.124939
-0.087670	data structures. For example,	-0.124939
-0.041627	more constants. For example,	-0.124939
-0.041627	defining constants. For example,	-0.124939
-0.222970	is valid. For example,	-0.124939
-0.087670	than post-increment. For example,	-0.124939
-0.087670	clock frequency. For example,	-0.124939
-0.087670	of jobs. For example,	-0.124939
-0.087670	a subexpression. For example,	-0.124939
-0.087670	is supported. For example,	-0.124939
-0.087670	in question. For example,	-0.124939
-0.087670	easy development. For example,	-0.124939
-0.087670	mathematical purity. For example,	-0.124939
-0.087670	of sources. For example,	-0.124939
-0.087670	fast enough. For example,	-0.124939
-0.087670	the fraction. For example,	-0.124939
-0.087670	execution unit. For example,	-0.124939
-0.087670	a matrix. For example,	-0.124939
-0.087670	of algebra. For example,	-0.124939
-0.087670	it exits. For example,	-0.124939
-0.087670	of modularity. For example,	-0.124939
-0.087670	simply identical. For example,	-0.124939
-0.087670	algebraic reduction. For example,	-0.124939
-0.087670	means modulo. For example,	-0.124939
-0.087670	error reporting. For example,	-0.124939
-0.087670	is minimized. For example,	-0.124939
-0.586677	Consider the following example,	-0.124939
-0.482069	from the above example,	-0.124939
-0.138299	In the above example,	-0.425969
-0.342411	(In the above example,	-0.124939
-0.563508	In the preceding example,	-0.124939
-0.122656	Example 12.4c. Same example,	-0.124939
-0.122656	Example 12.4e. Same example,	-0.124939
-0.122656	Example 12.4d. Same example,	-0.124939
-0.294219	a very contrived example,	-0.124939
-0.599457	consider that the bit	-0.124939
-1.814512	to use the bit	-0.124939
-0.584241	implementations of this bit	-0.124939
-0.446206	example, if each bit	-0.124939
-0.345173	fact using each bit	-0.124939
-0.520459	__intel_cpu_feature_indicator where each bit	-0.124939
-0.345173	get next each bit	-0.124939
-0.357387	2003. Contains many bit	-0.124939
-0.356908	nothing between 8 bit	-0.124939
-0.035265	objects in 64 bit	-0.124939
-0.035265	calculation in 64 bit	-0.124939
-0.035265	longer in 64 bit	-0.124939
-0.035265	references in 64 bit	-0.124939
-0.035265	-fpic in 64 bit	-0.124939
-0.035265	seen in 64 bit	-0.124939
-0.358227	systems. The 64 bit	-0.124939
-0.274046	bit code 64 bit	-0.124939
-0.274046	Asmlib Gnu 64 bit	-0.124939
-0.274046	more efficient. 64 bit	-0.124939
-0.274046	not _WIN64 64 bit	-0.124939
-0.274046	(option -fno-pic). 64 bit	-0.124939
-0.356356	Compiler identification 16 bit	-0.124939
-0.282350	compared to 32 bit	-0.124939
-0.282350	bit and 32 bit	-0.124939
-0.075380	than in 32 bit	-0.124939
-0.075380	objects in 32 bit	-0.124939
-0.075380	references in 32 bit	-0.124939
-0.282350	OpenMP directives 32 bit	-0.124939
-0.282350	advantages over 32 bit	-0.124939
-0.282350	__INTEL_COMPILER 161 32 bit	-0.124939
-0.282350	features 80386 32 bit	-0.124939
-0.584306	as a single bit	-0.124939
-0.558071	writing a small bit	-0.124939
-0.024617	defines a 128 bit	-0.602060
-0.297529	vectors SSE2 128 bit	-0.124939
-0.297529	mode SSE 128 bit	-0.124939
-0.137341	with the sign bit	-0.124939
-0.137341	example, the sign bit	-0.124939
-0.063264	out the sign bit	-0.124939
-0.137341	sets the sign bit	-0.124939
-0.219046	except the sign bit	-0.124939
-0.137341	setting the sign bit	-0.124939
-0.137341	copies the sign bit	-0.124939
-0.137341	flip the sign bit	-0.124939
-0.135234	1; // sign bit	-0.124939
-0.135234	(zero with sign bit	-0.124939
-0.030068	// set sign bit	-0.124939
-0.135234	// test sign bit	-0.124939
-0.135234	shift down sign bit	-0.124939
-0.135234	// Set sign bit	-0.124939
-0.135234	// flip sign bit	-0.124939
-0.331807	instructions AVX 256 bit	-0.124939
-0.331807	vectors AVX2 256 bit	-0.124939
-0.629816	with a slow bit	-0.124939
-0.314686	CPUs with slow bit	-0.124939
-0.632582	the least significant bit	-0.124939
-0.303907	If the carry bit	-0.124939
-0.303907	where the carry bit	-0.124939
-0.208732	next. The carry bit	-0.124939
-0.188083	useful in 32- bit	-0.124939
-0.188083	mode. The 32- bit	-0.124939
-0.188083	versions. A 32- bit	-0.124939
-0.575370	instruction set (128 bit	-0.124939
-0.294108	addition, subtraction, comparison, bit	-0.124939
-0.237812	or two 128- bit	-0.124939
-1.982064	part of the operating	-0.124939
-0.877940	overhead of the operating	-0.124939
-1.409959	responsibility of the operating	-0.124939
-0.589303	facilities of the operating	-0.124939
-0.587514	updates to the operating	-0.124939
-0.587514	Updates to the operating	-0.124939
-0.084259	CPU and the operating	-0.425969
-0.770617	processor and the operating	-0.124939
-0.530792	microprocessor and the operating	-0.124939
-1.066671	included in the operating	-0.124939
-1.180153	require that the operating	-0.124939
-0.923912	done by the operating	-0.124939
-1.203508	supported by the operating	-0.124939
-0.923912	determined by the operating	-0.124939
-0.543915	caught by the operating	-0.124939
-0.885195	come with the operating	-0.124939
-0.488404	framework between the operating	-0.124939
-0.488404	similarity between the operating	-0.124939
-0.756105	profiler tells the operating	-0.124939
-0.459395	applications force the operating	-0.124939
-0.181372	2.3 Choice of operating	-0.124939
-0.031470	C++ compilers and operating	-0.301030
-0.141930	most CPUs and operating	-0.124939
-0.141930	64-bit CPUs and operating	-0.124939
-0.354139	modern microprocessors and operating	-0.124939
-0.354139	hardware platform and operating	-0.124939
-0.520250	which platforms and operating	-0.124939
-0.937917	is used. The operating	-0.124939
-0.984425	the cache. The operating	-0.124939
-0.357468	and databases. The operating	-0.124939
-0.570587	even have an operating	-0.124939
-0.460737	applications without an operating	-0.124939
-0.866979	use a different operating	-0.124939
-1.870947	there is no operating	-0.124939
-0.461851	ported to multiple operating	-0.124939
-0.577030	because the two operating	-0.124939
-1.413764	32-bit and 64-bit operating	-0.124939
-0.864446	sixteen in 64-bit operating	-0.124939
-0.509294	compiled for 64-bit operating	-0.124939
-0.524785	mode and some operating	-0.124939
-0.535339	available in 32-bit operating	-0.124939
-0.535339	purposes in 32-bit operating	-0.124939
-0.356449	systems, though these operating	-0.124939
-0.500574	In the Windows operating	-0.124939
-0.355292	Windows and Linux operating	-0.124939
-0.354354	to query certain operating	-0.124939
-0.353699	BSD or Mac operating	-0.124939
-0.943597	in the old operating	-0.124939
-0.323543	crash on old operating	-0.124939
-0.445984	in both compiler, operating	-0.124939
-0.345025	features, and current operating	-0.124939
-0.398204	Mac OS X operating	-0.124939
-0.487720	of programming languages, operating	-0.124939
-0.044140	in a protected operating	-0.124939
-0.314552	core. Unfortunately, contemporary operating	-0.124939
-0.294043	4, etc.). Older operating	-0.124939
-0.294043	Gnu, Clang Supported operating	-0.124939
-0.237755	violate or circumvent operating	-0.124939
-0.463568	first convert the unsigned	-0.124939
-0.659649	the dividend is unsigned	-0.124939
-0.541088	point. Conversion of unsigned	-0.124939
-0.833256	converting a to unsigned	-0.124939
-0.525601	type-casting i to unsigned	-0.124939
-0.357782	the dividend to unsigned	-0.124939
-0.357782	// Convert to unsigned	-0.124939
-0.051568	on signed and unsigned	-0.124939
-0.051568	using signed and unsigned	-0.124939
-0.025019	between signed and unsigned	-0.425969
-0.051568	mix signed and unsigned	-0.124939
-0.356356	7.4. Signed and unsigned	-0.124939
-0.504659	extra bits. The unsigned	-0.124939
-0.131320	integer, signed or unsigned	-0.124939
-0.060715	int, signed or unsigned	-0.124939
-0.131320	char, signed or unsigned	-0.124939
-0.058611	Still faster if unsigned	-0.249877
-0.461297	signed than with unsigned	-0.124939
-0.357087	comparing signed with unsigned	-0.124939
-0.527557	overflow of an unsigned	-0.124939
-0.527557	Conversion of an unsigned	-0.124939
-0.574174	interpreted as an unsigned	-0.124939
-0.352990	making i an unsigned	-0.124939
-0.356560	2 2 int unsigned	-0.124939
-0.501708	Linux: long int unsigned	-0.124939
-0.354068	struct Sdouble { unsigned	-0.124939
-0.354068	struct Slongdouble { unsigned	-0.124939
-0.354068	struct Sfloat { unsigned	-0.124939
-0.358254	occurs, (2) use unsigned	-0.124939
-0.752081	8 64 4 unsigned	-0.124939
-0.480701	Is16vec4 16 4 unsigned	-0.124939
-0.520025	Vec4i 32 4 unsigned	-0.124939
-0.535267	Is8vec8 8 8 unsigned	-0.124939
-0.512595	Vec8s 16 8 unsigned	-0.124939
-0.501210	Vec16c 8 16 unsigned	-0.124939
-0.200104	// fractional part unsigned	-0.425969
-0.354878	below. Signed / unsigned	-0.124939
-0.353123	int int 256 unsigned	-0.124939
-0.348870	union {double d; unsigned	-0.124939
-0.347277	ipow (double x, unsigned	-0.124939
-0.488742	efficient to convert unsigned	-0.124939
-1.425488	{ float f; unsigned	-0.124939
-0.256161	in 16-bit systems: unsigned	-0.124939
-0.335915	nonzero and normal unsigned	-0.124939
-0.331613	overflow. Signed versus unsigned	-0.124939
-0.497230	arraysize = 1000; unsigned	-0.124939
-0.407717	__int64 64-bit Linux: unsigned	-0.124939
-0.314537	// Example 7.7 unsigned	-0.124939
-0.314537	// Example 7.25 unsigned	-0.124939
-0.382441	uint64_t MS compiler: unsigned	-0.124939
-0.293894	fractional part 142 unsigned	-0.124939
-0.237625	char 256 Vec32c unsigned	-0.124939
-0.237625	int u[2]} a[size]; unsigned	-0.124939
-0.237625	template <typename T, unsigned	-0.124939
-0.237625	exponent + 0x3FF unsigned	-0.124939
-0.237625	exponent + 0x3FFF unsigned	-0.124939
-0.237625	// Example 14.22b unsigned	-0.124939
-0.237625	// Example 14.22a unsigned	-0.124939
-0.237625	0 65535 uint16_t unsigned	-0.124939
-0.237625	0 255 uint8_t unsigned	-0.124939
-0.237625	exponent + 0x7F unsigned	-0.124939
-0.237625	0 232-1 uint32_t unsigned	-0.124939
-1.595667	This is the first	-0.124939
-1.608141	address of the first	-0.124939
-0.591069	output of the first	-0.124939
-0.881380	behavior of the first	-0.124939
-1.056900	added to the first	-0.124939
-0.596503	members in the first	-0.124939
-0.583208	called for the first	-0.124939
-0.583208	need for the first	-0.124939
-1.650733	so that the first	-0.124939
-0.354121	access it the first	-0.124939
-1.292541	or if the first	-0.124939
-0.586164	Likewise, if the first	-0.124939
-1.051544	calculations on the first	-0.124939
-0.588518	sum, then the first	-0.124939
-0.570238	initialized only the first	-0.124939
-0.590462	way. If the first	-0.124939
-0.577485	all but the first	-0.124939
-0.468208	table before the first	-0.124939
-0.760503	called before the first	-0.124939
-0.173846	known before the first	-0.425969
-1.066925	For example, the first	-0.124939
-0.522013	two times the first	-0.124939
-0.457529	only calculated the first	-0.124939
-0.457529	by optimizing the first	-0.124939
-0.354121	64-bit Windows, the first	-0.124939
-1.154472	to find the first	-0.124939
-0.354121	64-bit Linux, the first	-0.124939
-0.560278	waits until the first	-0.124939
-0.457529	before adding the first	-0.124939
-0.535403	members within the first	-0.124939
-0.354121	same way, the first	-0.124939
-0.354121	to localize the first	-0.124939
-1.147509	is faster to first	-0.124939
-1.469667	is necessary to first	-0.124939
-0.454636	ZMM registers The first	-0.124939
-0.351839	optimal algorithm The first	-0.124939
-0.495127	YMM registers. The first	-0.124939
-0.861684	32-bit mode. The first	-0.124939
-0.454636	as possible. The first	-0.124939
-0.495127	following way. The first	-0.124939
-0.454636	two ways. The first	-0.124939
-0.814942	the diagonal. The first	-0.124939
-0.351839	worst-case performance: The first	-0.124939
-0.351839	a vector). The first	-0.124939
-0.351839	optimized further. The first	-0.124939
-0.351839	as follows. The first	-0.124939
-0.527035	source files are first	-0.124939
-0.358655	&& expression, or first	-0.124939
-0.459469	CPU dispatching on first	-0.124939
-0.653332	the dispatch on first	-0.124939
-0.459469	time. Dispatch on first	-0.124939
-0.358387	only read this first	-0.124939
-0.526763	be called only first	-0.124939
-1.502445	following example shows first	-0.124939
-0.349713	integer parameter comes first	-0.124939
-0.027881	// 2 bytes. first	-0.124939
-0.012171	// 4 bytes. first	-0.301030
-0.027881	// 8 bytes. first	-0.124939
-0.124202	// 400 bytes. first	-0.124939
-0.314678	and the 49 first	-0.124939
-0.077799	point. // After first	-0.124939
-0.077799	dispatcher. // After first	-0.124939
-0.237861	efficiency is reflected, first	-0.124939
-0.600679	expansions of the register	-0.124939
-1.059915	is because the register	-0.124939
-1.154850	of using the register	-0.124939
-0.725323	the way the register	-0.124939
-0.572451	even without the register	-0.124939
-1.061853	it is a register	-0.124939
-1.139285	than in a register	-0.124939
-0.572597	stored in a register	-0.124939
-0.564533	'this' in a register	-0.124939
-0.874084	to be a register	-0.124939
-0.987975	stored as a register	-0.124939
-0.840210	organized as a register	-0.124939
-0.355719	make temp a register	-0.124939
-0.459558	of setting a register	-0.124939
-1.512244	the use of register	-0.124939
-1.847334	the value of register	-0.124939
-1.119086	an explanation of register	-0.124939
-0.357509	The opposite of register	-0.124939
-0.461833	are capable of register	-0.124939
-0.656933	C++ compilers The register	-0.124939
-0.837654	code cache. The register	-0.124939
-0.760821	register variable. The register	-0.124939
-0.965855	a function for register	-0.124939
-0.357427	used most for register	-0.124939
-0.357427	Typical candidates for register	-0.124939
-0.358183	main memory. A register	-0.124939
-0.786873	could benefit from register	-0.124939
-0.564045	than the vector register	-0.124939
-0.834689	in a vector register	-0.124939
-0.512943	extension of vector register	-0.124939
-0.729421	of each vector register	-0.124939
-0.340138	with another vector register	-0.124939
-0.439863	one 256-bit vector register	-0.124939
-0.340138	the largest vector register	-0.124939
-0.895381	compiler to make register	-0.124939
-0.687347	use the same register	-0.301030
-1.309607	share the same register	-0.124939
-0.555058	storage. The same register	-0.124939
-1.241489	the floating point register	-0.124939
-1.160708	of floating point register	-0.124939
-0.352426	make floating point register	-0.124939
-0.190005	making floating point register	-0.425969
-0.462351	is using one register	-0.124939
-0.462270	number of integer register	-0.124939
-0.357509	b[size], c[size]; float register	-0.124939
-1.357840	This is called register	-0.124939
-0.772911	of a new register	-0.124939
-0.532111	use a new register	-0.124939
-0.356113	the largest available register	-0.124939
-0.355907	certain rules about register	-0.124939
-0.446871	makes an extra register	-0.124939
-0.446871	requires an extra register	-0.124939
-1.297537	into a single register	-0.124939
-1.125423	compiler to optimize register	-0.124939
-0.770383	a 128-bit XMM register	-0.124939
-0.488877	though the logical register	-0.124939
-0.502004	as a temporary register	-0.124939
-0.626119	in the YMM register	-0.124939
-0.339265	only one free register	-0.124939
-0.861340	up to fourteen register	-0.124939
-0.331726	a new physical register	-0.124939
-0.314533	separating the flags register	-0.124939
-0.595966	code with a 64	-0.124939
-0.884122	line size of 64	-0.124939
-0.549531	two integers of 64	-0.124939
-0.540015	the vectors of 64	-0.124939
-0.504853	is extended to 64	-0.124939
-1.075905	32-bit systems and 64	-0.124939
-0.462734	16, 32 and 64	-0.124939
-1.258229	Shared objects in 64	-0.124939
-0.502759	address calculation in 64	-0.124939
-0.356461	byte longer in 64	-0.124939
-0.523658	relative references in 64	-0.124939
-0.654945	without -fpic in 64	-0.124939
-0.502759	is seen in 64	-0.124939
-0.556310	operating systems. The 64	-0.124939
-0.358043	be expected. The 64	-0.124939
-0.600478	vector can be 64	-0.124939
-0.557240	registers, which are 64	-0.124939
-0.463142	/ 8 = 64	-0.124939
-0.463082	is represented with 64	-0.124939
-0.463090	32 bit code 64	-0.124939
-0.358512	of 8 - 64	-0.124939
-0.496680	unsigned long int 64	-0.124939
-0.582967	int unsigned int 64	-0.124939
-0.559771	4 short int 64	-0.124939
-0.962967	unsigned short int 64	-0.124939
-0.358262	stack entries use 64	-0.124939
-0.348524	256 AVX double 64	-0.124939
-0.348524	128 SSE double 64	-0.124939
-0.348524	512 AVX512 double 64	-0.124939
-0.461204	int 32 2 64	-0.124939
-0.294364	SSE2 long long 64	-0.124939
-0.294364	AVX2 long long 64	-0.124939
-0.294364	MMX long long 64	-0.124939
-0.294364	AVX512 long long 64	-0.124939
-0.752106	8 64 4 64	-0.124939
-0.480717	int 16 4 64	-0.124939
-0.520039	4 32 4 64	-0.124939
-0.523477	char 8 8 64	-0.124939
-0.348236	2 32 8 64	-0.124939
-0.348236	8 32 8 64	-0.124939
-0.332553	with a 64 64	-0.124939
-0.430342	expected. The 64 64	-0.124939
-0.332553	31 11.6 64 64	-0.124939
-0.332553	Example 9.6b 64 64	-0.124939
-0.356290	64 Is32vec2 32 64	-0.124939
-0.355939	13 Asmlib Gnu 64	-0.124939
-0.354880	a double uses 64	-0.124939
-0.457838	long 64 1 64	-0.124939
-0.518210	which is typically 64	-0.124939
-0.368310	size is typically 64	-0.124939
-0.549298	code more efficient. 64	-0.124939
-0.329453	8 8 char 64	-0.124939
-0.484524	8 unsigned char 64	-0.124939
-0.551877	load the entire 64	-0.124939
-0.339098	64 1 int64_t 64	-0.124939
-0.429184	_WIN64 not _WIN64 64	-0.124939
-0.325042	causes another exception. 64	-0.124939
-0.325145	optimization Intel: "Intel 64	-0.124939
-0.293913	or data exceeds 64	-0.124939
-0.293913	MS compiler: __int64 64	-0.124939
-0.237641	code (option -fno-pic). 64	-0.124939
-0.237641	32 64 Iu32vec2 64	-0.124939
-0.237641	Each line covers 64	-0.124939
-0.237641	63 31 11.6 64	-0.124939
-0.237641	element Example 9.6b 64	-0.124939
-0.237641	128 I64vec2 Vec2q 64	-0.124939
-0.237641	128 Iu32vec4 Vec4ui 64	-0.124939
-0.589983	we have to take	-0.124939
-1.018138	compiler has to take	-0.124939
-0.652265	can do to take	-0.124939
-1.048766	In order to take	-0.124939
-1.314044	shows how to take	-0.124939
-0.674020	no need to take	-0.124939
-0.458786	reinstallation work to take	-0.124939
-0.355111	installation process to take	-0.124939
-0.458786	with destructors to take	-0.124939
-0.355111	program appear to take	-0.124939
-0.355111	as coprocessors to take	-0.124939
-0.463400	application itself and take	-0.124939
-0.460681	big objects that take	-0.124939
-0.554039	single instructions that take	-0.124939
-0.523866	of branches that take	-0.124939
-0.356603	multiple instances that take	-0.124939
-0.862267	applications that can take	-0.124939
-1.017203	but it can take	-0.124939
-1.025058	if you can take	-0.124939
-0.558220	database It can take	-0.124939
-0.447628	RAM memory can take	-0.124939
-0.447628	7 program can take	-0.124939
-0.560401	multiplications, which can take	-0.124939
-0.520739	unsigned You can take	-0.124939
-0.520739	a. You can take	-0.124939
-0.525922	another thread can take	-0.124939
-0.490925	cache. We can take	-0.124939
-0.490925	u.f We can take	-0.124939
-0.346299	one tread can take	-0.124939
-0.346299	programs installed can take	-0.124939
-1.170523	compiler may not take	-0.124939
-1.399970	then it may take	-0.124939
-0.792886	example, it may take	-0.124939
-0.562683	again. This may take	-0.124939
-0.348723	The calculations may take	-0.124939
-0.348723	The branches may take	-0.124939
-0.348723	on process may take	-0.124939
-0.591506	vectorized if you take	-0.124939
-0.169878	the loop will take	-0.124939
-0.169878	this loop will take	-0.124939
-0.169878	whole loop will take	-0.124939
-0.642929	in main will take	-0.124939
-0.351487	of which functions take	-0.124939
-0.709616	the critical functions take	-0.124939
-0.817397	and mathematical functions take	-0.124939
-0.462374	Software developers should take	-0.124939
-0.815490	and long double take	-0.124939
-1.600485	a and b take	-0.124939
-0.555368	functions in C++ take	-0.124939
-0.501875	it will often take	-0.124939
-0.356258	and shift operations take	-0.124939
-1.364331	in some cases take	-0.124939
-0.653462	double precision calculations take	-0.124939
-0.355676	code and doesn't take	-0.124939
-0.354645	the multiplication would take	-0.124939
-0.456908	frameworks that typically take	-0.124939
-0.353056	Multiplication and division take	-0.124939
-0.352882	= 28. We take	-0.124939
-0.351432	point calculations usually take	-0.124939
-0.514405	precision. These conversions take	-0.124939
-0.535195	shuffling can sometimes take	-0.124939
-0.348912	it will still take	-0.124939
-0.325228	possible inputs. Let's take	-0.124939
-0.314523	log, and logarithms take	-0.124939
-0.294015	as additions. Divisions take	-0.124939
-0.294015	between different precisions take	-0.124939
-0.588245	C++, it is often	-0.124939
-0.588245	languages, it is often	-0.124939
-0.456410	vectors, as is often	-0.124939
-0.580157	long. This is often	-0.124939
-0.580157	frameworks. This is often	-0.124939
-1.290537	but this is often	-0.124939
-0.781371	core. It is often	-0.124939
-0.536950	times. It is often	-0.124939
-0.536950	a[i]; It is often	-0.124939
-0.536950	execution. It is often	-0.124939
-0.536950	information. It is often	-0.124939
-0.536950	style. It is often	-0.124939
-0.536950	read. It is often	-0.124939
-0.536950	130. It is often	-0.124939
-0.582828	optimized program is often	-0.124939
-1.261256	that there is often	-0.124939
-0.938396	of objects is often	-0.124939
-0.713622	operating system is often	-0.124939
-0.648561	table lookup is often	-0.124939
-0.456410	given task is often	-0.124939
-0.358310	more complex and often	-0.124939
-0.358310	and delete, and often	-0.124939
-0.744185	the program are often	-0.124939
-0.574314	Small functions are often	-0.124939
-0.554931	Induction variables are often	-0.124939
-1.316702	if they are often	-0.124939
-0.528929	two threads are often	-0.124939
-0.515380	relative addresses are often	-0.124939
-0.350811	clock counts are often	-0.124939
-0.493698	Unfortunately, profilers are often	-0.124939
-0.453334	These requirements are often	-0.124939
-0.453334	behaviors. Arrays are often	-0.124939
-0.350811	Software distributors are often	-0.124939
-0.587691	Therefore, it can often	-0.124939
-0.869003	good compiler can often	-0.124939
-0.538243	class objects can often	-0.124939
-0.353909	digital operation can often	-0.124939
-0.457261	compiler output can often	-0.124939
-0.353909	Database queries can often	-0.124939
-0.591749	processes because it often	-0.124939
-0.587424	profiling, but it often	-0.124939
-0.463166	operators. Vectorized code often	-0.124939
-1.305721	The Gnu compiler often	-0.124939
-0.572973	all, it will often	-0.124939
-0.578154	Optimizing compilers will often	-0.124939
-0.353004	preferred language will often	-0.124939
-0.565799	consuming library functions often	-0.124939
-0.880058	and the most often	-0.124939
-0.760764	put the most often	-0.124939
-0.525093	choose the most often	-0.124939
-0.506715	that is most often	-0.124939
-0.525570	new vector size often	-0.124939
-0.357130	obsolete. Programmers very often	-0.124939
-0.356765	compiler e.g. how often	-0.124939
-0.460560	systems. Mac systems often	-0.124939
-0.651124	or other hardware often	-0.124939
-0.336150	also occur quite often	-0.124939
-0.336150	which happens quite often	-0.124939
-0.456511	The size conversion often	-0.124939
-1.029592	a hard disk often	-0.124939
-0.520242	because switch statements often	-0.124939
-0.341693	Programmers do, however, often	-0.124939
-0.294071	version. Updating mechanisms often	-0.124939
-0.237780	big software companies often	-0.124939
-0.237780	error that hackers often	-0.124939
-0.237780	source file. Keep often	-0.124939
-0.599691	here in a rather	-0.124939
-0.894680	into the code rather	-0.124939
-0.952770	floating point code rather	-0.124939
-0.765827	at compile time rather	-0.425969
-0.358199	in full use rather	-0.124939
-0.447113	x in memory rather	-0.124939
-0.447113	stored in memory rather	-0.124939
-0.462070	a 64-bit integer rather	-0.124939
-0.357344	to use float rather	-0.124939
-0.461686	an existing object rather	-0.124939
-0.357246	for one array rather	-0.124939
-0.450812	aligned by 8 rather	-0.124939
-0.348819	the constant 8 rather	-0.124939
-1.160918	in a register rather	-0.124939
-0.356509	a class template rather	-0.124939
-0.295534	variables in registers rather	-0.124939
-0.246815	transferred in registers rather	-0.726999
-0.654853	of using pointers rather	-0.124939
-0.356207	or cache access rather	-0.124939
-1.636663	the operating system rather	-0.124939
-0.553216	use 64 bits rather	-0.124939
-0.356074	we get 0 rather	-0.124939
-0.355908	only six instructions rather	-0.124939
-0.356041	for present processors rather	-0.124939
-0.355869	executed 10 times rather	-0.124939
-1.497689	on the stack rather	-0.124939
-0.355631	standard API calls rather	-0.124939
-0.534911	in the container rather	-0.124939
-0.716102	the flush-to-zero mode rather	-0.124939
-0.585609	CPU clock cycles rather	-0.124939
-0.353310	working with sets rather	-0.124939
-0.754036	a software implementation rather	-0.124939
-0.518455	a template parameter rather	-0.124939
-0.533068	are integer expressions rather	-0.124939
-0.815696	using static linking rather	-0.124939
-0.452889	of using references rather	-0.124939
-0.743403	program is loaded rather	-0.124939
-0.349290	in one operation rather	-0.124939
-0.347812	the result 100 rather	-0.124939
-0.530456	specific processor models rather	-0.124939
-0.566090	from the beginning rather	-0.124939
-0.443868	in big blocks rather	-0.124939
-0.341543	call to memcpy rather	-0.124939
-0.338934	of each factor rather	-0.124939
-0.511151	the execution units rather	-0.124939
-0.335870	a single step rather	-0.124939
-0.490002	needed in advance rather	-0.124939
-0.314261	truncation towards zero, rather	-0.124939
-0.314261	multiplication of xxn rather	-0.124939
-0.314261	libraries and frameworks, rather	-0.124939
-0.293765	the result -56 rather	-0.124939
-0.293765	only calculated once, rather	-0.124939
-0.237511	if(!a && !b) rather	-0.124939
-0.237511	defines electrical connections rather	-0.124939
-0.237511	calculated as (b*2.0)/3.0 rather	-0.124939
-0.237511	are using unions rather	-0.124939
-0.237511	on processor X?" rather	-0.124939
-0.237511	code that matters rather	-0.124939
-0.237511	the CPU supports, rather	-0.124939
-0.237511	good development tools, rather	-0.124939
-0.237511	is running at, rather	-0.124939
-2.111139	part of the optimization	-0.124939
-0.596819	language when the optimization	-0.124939
-0.579169	cannot do the optimization	-0.124939
-0.876978	on using the optimization	-0.124939
-0.576400	observed between the optimization	-0.124939
-0.358000	to focus the optimization	-0.124939
-0.358000	and concentrate the optimization	-0.124939
-1.731257	a lot of optimization	-0.124939
-0.249265	higher level of optimization	-0.124939
-0.249265	highest level of optimization	-0.124939
-0.462256	high degree of optimization	-0.124939
-0.388914	options relevant to optimization	-0.124939
-0.388914	keywords relevant to optimization	-0.124939
-0.248551	the obstacles to optimization	-0.124939
-0.248551	important obstacles to optimization	-0.124939
-0.031512	8.4 Obstacles to optimization	-0.425969
-0.031512	8.3 Obstacles to optimization	-0.425969
-0.355594	Many advices on optimization	-0.124939
-0.355594	Advanced book on optimization	-0.124939
-0.355594	www.amd.com. Advices on optimization	-0.124939
-0.358540	Literature on code optimization	-0.124939
-0.463029	rely on compiler optimization	-0.124939
-0.616077	to do this optimization	-0.124939
-0.388567	will do this optimization	-0.124939
-0.598707	for whole program optimization	-0.124939
-0.423731	Use whole program optimization	-0.124939
-0.087867	optimization Whole program optimization	-0.124939
-0.087867	program. Whole program optimization	-0.124939
-0.087867	/Og Whole program optimization	-0.124939
-0.357327	from its many optimization	-0.124939
-0.557781	considered a software optimization	-0.124939
-0.350178	of 18 software optimization	-0.124939
-0.319450	in C++ An optimization	-0.124939
-0.319450	VIA CPUs: An optimization	-0.124939
-0.319450	in C++: An optimization	-0.124939
-0.319450	assembly language: An optimization	-0.124939
-0.580994	provide the best optimization	-0.124939
-0.355532	for giving specific optimization	-0.124939
-0.355389	Has many good optimization	-0.124939
-0.330674	applying the various optimization	-0.124939
-0.605094	compilers have various optimization	-0.124939
-0.352158	relevant options. Many optimization	-0.124939
-0.421286	hand, if your optimization	-0.124939
-0.325308	CriticalFunction. If your optimization	-0.124939
-0.345421	all the relevant optimization	-0.124939
-0.041642	with all relevant optimization	-0.602060
-0.414119	suggestions for my optimization	-0.124939
-0.319553	note that my optimization	-0.124939
-0.762231	possible to insert optimization	-0.124939
-0.122618	latencies. 8.5 Compiler optimization	-0.124939
-0.122618	CPU.............................................................................81 8.5 Compiler optimization	-0.124939
-0.325162	which makes detailed optimization	-0.124939
-0.325235	optimizations when interprocedural optimization	-0.124939
-0.314533	whole program 81 optimization	-0.124939
-0.314533	file /Fm Generate optimization	-0.124939
-0.048366	systems. 14 Specific optimization	-0.124939
-0.048366	130 14 Specific optimization	-0.124939
-0.237739	Loopunrolling x-xxxx--x Profile-guided optimization	-0.124939
-0.237739	/O3 -O3 Interprocedural optimization	-0.124939
-0.237739	Choose the strongest optimization	-0.124939
-0.541194	This includes the libraries	-0.124939
-0.358737	date. Mac The libraries	-0.124939
-0.358567	operating system or libraries	-0.124939
-0.626784	Choice of function libraries	-0.124939
-0.442294	Comparison of function libraries	-0.124939
-0.147032	compilers and function libraries	-0.124939
-0.428553	used for function libraries	-0.124939
-0.466577	compilers or function libraries	-0.124939
-0.331125	that most function libraries	-0.124939
-0.295726	the Intel function libraries	-0.124939
-0.295726	in Intel function libraries	-0.124939
-0.331125	The Gnu function libraries	-0.124939
-0.331125	details. Use function libraries	-0.124939
-0.331125	The best function libraries	-0.124939
-0.134740	vectors. These function libraries	-0.124939
-0.134740	Primitives". These function libraries	-0.124939
-0.331125	Some common function libraries	-0.124939
-0.428553	best optimized function libraries	-0.124939
-0.331125	are various function libraries	-0.124939
-0.466577	Various graphics function libraries	-0.124939
-0.331125	functions. Many function libraries	-0.124939
-0.428553	optimized math function libraries	-0.124939
-0.331125	for general function libraries	-0.124939
-0.331125	SSE. Several function libraries	-0.124939
-0.331125	to distribute function libraries	-0.124939
-0.331125	you. Optimized function libraries	-0.124939
-0.526120	libraries: long vector libraries	-0.124939
-0.358055	to try different libraries	-0.124939
-0.462413	than most other libraries	-0.124939
-0.357928	though not all libraries	-0.124939
-0.543894	other container class libraries	-0.124939
-0.497580	12.4. Vector class libraries	-0.124939
-0.578672	and in most libraries	-0.124939
-0.583341	However, the Intel libraries	-0.124939
-0.353129	work when Intel libraries	-0.124939
-0.548511	defined in two libraries	-0.124939
-0.540448	linked from static libraries	-0.124939
-0.523628	www.agner.org/optimize/#vectorclass All these libraries	-0.124939
-0.093423	all the dynamic libraries	-0.124939
-0.289802	libraries. The dynamic libraries	-0.124939
-0.289802	but not dynamic libraries	-0.124939
-0.289802	or more dynamic libraries	-0.124939
-0.531143	the same dynamic libraries	-0.124939
-0.289802	make all dynamic libraries	-0.124939
-0.531143	Static versus dynamic libraries	-0.124939
-0.565124	instead. The Gnu libraries	-0.124939
-0.459644	useful for large libraries	-0.124939
-0.339399	position-independent code Function libraries	-0.124939
-0.339399	dynamic libraries Function libraries	-0.124939
-0.544207	Unfortunately, the standard libraries	-0.124939
-0.337540	compilers include standard libraries	-0.124939
-0.352772	including all runtime libraries	-0.124939
-0.352167	discussed below. Many libraries	-0.124939
-0.352253	DLL's (dynamically linked libraries	-0.124939
-0.115076	as static link libraries	-0.425969
-0.502190	make dynamic link libraries	-0.124939
-0.349734	less efficient. Dynamic libraries	-0.124939
-0.339357	Several special purpose libraries	-0.124939
-0.314542	write _mm_add_epi16(a,b). Two libraries	-0.124939
-0.314542	collection contains well-tested libraries	-0.124939
-0.314542	SVML and LIBM libraries	-0.124939
-0.598404	specialization. This is how	-0.124939
-0.050099	an example of how	-0.602060
-0.085685	for examples of how	-0.602060
-0.356512	my study of how	-0.124939
-0.460565	basic understanding of how	-0.124939
-0.585579	machine code and how	-0.124939
-0.241272	is called and how	-0.425969
-0.357356	look like and how	-0.124939
-0.309210	page 130 for how	-0.425969
-0.458424	page 120 for how	-0.124939
-0.065281	page 122 for how	-0.425969
-0.354826	page 107 for how	-0.124939
-0.458424	at www.agner.org/optimize/cppexamples.zip for how	-0.124939
-0.576486	loop depends on how	-0.124939
-0.508872	memory, depending on how	-0.124939
-0.508872	12.4a, depending on how	-0.124939
-0.354127	theory. Advice on how	-0.124939
-0.469690	More details about how	-0.124939
-0.333401	few comments about how	-0.124939
-0.333401	tables". Tips about how	-0.124939
-0.544127	You can calculate how	-0.124939
-0.434924	call to count how	-0.124939
-0.336208	variables that count how	-0.124939
-0.573491	generates to see how	-0.124939
-0.172115	following example shows how	-0.903090
-0.192759	Example 12.4b shows how	-0.124939
-0.192759	page 39 shows how	-0.124939
-0.184899	and to know how	-0.124939
-0.184899	order to know how	-0.124939
-0.184899	useful to know how	-0.124939
-0.184899	want to know how	-0.124939
-0.496338	useful for checking how	-0.124939
-0.451081	that can tell how	-0.124939
-0.348343	in a[i]. Note how	-0.124939
-0.128208	It is discussed how	-0.425969
-0.346234	the compiler e.g. how	-0.124939
-0.345006	The profiler counts how	-0.124939
-0.400560	program to measure how	-0.124939
-0.269041	data and measure how	-0.124939
-0.336108	only to show how	-0.124939
-0.331726	operator that specifies how	-0.124939
-0.331667	advanced C++ programming, how	-0.124939
-0.467317	important to understand how	-0.124939
-0.325235	following examples explain how	-0.124939
-0.325308	or no idea how	-0.124939
-0.839082	following example illustrates how	-0.124939
-0.314533	factors that decide how	-0.124939
-0.102801	This manual discusses how	-0.124939
-0.102801	This section discusses how	-0.124939
-0.294024	are in doubt how	-0.124939
-0.237739	next chapter describes how	-0.124939
-1.033247	version of the code.	-0.124939
-1.907637	part of the code.	-0.124939
-1.175455	parts of the code.	-0.124939
-1.206322	rest of the code.	-0.124939
-1.022407	variable in the code.	-0.124939
-0.864428	spots in the code.	-0.124939
-0.582299	places in the code.	-0.124939
-0.582299	modifications in the code.	-0.124939
-0.582299	previously in the code.	-0.124939
-0.571412	possibly improve the code.	-0.124939
-0.356781	it optimizes the code.	-0.124939
-0.356781	for improving the code.	-0.124939
-1.277191	a piece of code.	-0.124939
-0.514786	particular piece of code.	-0.124939
-0.539911	identical pieces of code.	-0.124939
-0.462240	small sequences of code.	-0.124939
-0.358471	that string as code.	-0.124939
-0.358285	microprocessor handles this code.	-0.124939
-0.826767	of the instruction code.	-0.124939
-0.995332	with floating point code.	-0.124939
-0.845432	any floating point code.	-0.124939
-0.572275	style floating point code.	-0.124939
-0.762516	reductions on integer code.	-0.124939
-0.589194	anyway in 64-bit code.	-0.124939
-0.831384	C or C++ code.	-0.124939
-0.461187	code that makes code.	-0.124939
-0.592021	executing the critical code.	-0.124939
-0.435297	discussion of system code.	-0.124939
-0.435297	use in system code.	-0.124939
-0.336505	intended for system code.	-0.124939
-0.536717	for the error code.	-0.124939
-0.212389	produce any extra code.	-0.124939
-0.263748	adding any extra code.	-0.124939
-0.542215	pointer in assembly code.	-0.124939
-0.343972	understand compiler-generated assembly code.	-0.124939
-0.496801	not the compiled code.	-0.124939
-0.466710	as directly compiled code.	-0.124939
-0.331222	a fully compiled code.	-0.124939
-0.513746	compiling the intermediate code.	-0.124939
-0.544798	of an intermediate code.	-0.124939
-0.568333	from the application code.	-0.124939
-0.353303	for vectorizing mathematical code.	-0.124939
-0.679448	in the source code.	-0.124939
-0.188069	the same source code.	-0.124939
-0.408133	of the position-independent code.	-0.124939
-0.314732	costs of position-independent code.	-0.124939
-0.348963	a faster vectorized code.	-0.124939
-0.346323	as binary executable code.	-0.124939
-0.787819	gives the simplest code.	-0.124939
-0.483588	so-called position- independent code.	-0.124939
-0.429180	C++ and Fortran code.	-0.124939
-0.331761	loader. 2. Position-independent code.	-0.124939
-0.331626	relate to CPU-intensive code.	-0.124939
-0.325205	leads to suboptimal code.	-0.124939
-0.212148	always for application-specific code.	-0.124939
-0.212148	in optimizing application-specific code.	-0.124939
-0.538465	code to non-AVX code.	-0.124939
-0.293987	longjmp in time-critical code.	-0.124939
-0.237706	optimizations in precompiled code.	-0.124939
-1.057883	50% of the time.	-0.124939
-0.202667	10% of the time.	-0.124939
-0.421572	bit at a time.	-0.124939
-0.595481	line at a time.	-0.124939
-0.421572	square at a time.	-0.124939
-0.421572	kilobytes at a time.	-0.124939
-1.736868	a lot of time.	-0.124939
-0.491205	significant amount of time.	-0.124939
-0.491205	considerable amount of time.	-0.124939
-0.787984	only slightly more time.	-0.124939
-0.266011	at the same time.	-0.221849
-0.354678	rather than CPU time.	-0.124939
-0.354678	which consumes CPU time.	-0.124939
-0.357809	or less each time.	-0.124939
-0.357815	functions take most time.	-0.124939
-0.357368	the branching takes time.	-0.124939
-0.852074	quite a long time.	-0.124939
-0.589687	it the first time.	-0.124939
-0.376314	takes no extra time.	-0.124939
-0.467442	take no extra time.	-0.124939
-0.403653	again takes extra time.	-0.124939
-0.513200	take any extra time.	-0.124939
-0.331838	relation to execution time.	-0.124939
-0.429446	compactness, and execution time.	-0.124939
-0.607286	the total execution time.	-0.124939
-0.181012	it at compile time.	-0.124939
-0.181012	compiler at compile time.	-0.124939
-0.181012	possible at compile time.	-0.124939
-0.181012	available at compile time.	-0.124939
-0.181012	etc. at compile time.	-0.124939
-0.181012	done at compile time.	-0.124939
-0.272332	calculated at compile time.	-0.124939
-0.279200	known at compile time.	-0.124939
-0.448424	resolved at compile time.	-0.124939
-0.181012	instantiated at compile time.	-0.124939
-0.355156	the total calculation time.	-0.124939
-0.354665	or at run time.	-0.124939
-0.497132	portability and development time.	-0.124939
-0.352842	way than last time.	-0.124939
-0.470468	integer takes longer time.	-0.124939
-0.146618	would take longer time.	-0.124939
-0.146618	division take longer time.	-0.124939
-0.146618	Divisions take longer time.	-0.124939
-0.133376	Dispatch at load time.	-0.124939
-0.133376	relocation at load time.	-0.124939
-0.350216	Dispatch at installation time.	-0.124939
-0.526046	calculations to save time.	-0.124939
-0.555665	of the total time.	-0.124939
-0.167603	of the user's time.	-0.124939
-0.167603	steal the user's time.	-0.124939
-0.294061	the total computation time.	-0.124939
-1.753216	value of the template	-0.124939
-1.276152	instance of the template	-0.124939
-0.596603	name and the template	-0.124939
-0.599630	But in the template	-0.124939
-0.895172	sense that the template	-0.124939
-0.598306	one if the template	-0.124939
-0.847292	efficient because the template	-0.124939
-0.573266	faster because the template	-0.124939
-0.594779	same. If the template	-0.124939
-0.985332	above example, the template	-0.124939
-0.358847	sub-expressions. Why is template	-0.124939
-0.596517	CParent is a template	-0.124939
-0.891451	instead of a template	-0.124939
-0.581991	parameter and a template	-0.124939
-0.872260	know that a template	-0.124939
-0.569834	given as a template	-0.124939
-0.569834	provided as a template	-0.124939
-0.580050	read. If a template	-0.124939
-1.112054	of using a template	-0.124939
-0.347723	class through a template	-0.425969
-0.575489	each set of template	-0.124939
-0.570196	virtual functions. The template	-0.124939
-0.892961	than a function template	-0.124939
-0.358552	Replace macro by template	-0.124939
-0.524636	calculation implemented with template	-0.124939
-0.357127	any algorithm with template	-0.124939
-0.459147	and size as template	-0.124939
-0.501765	class name as template	-0.124939
-0.355396	different factors as template	-0.124939
-0.579777	Two or more template	-0.124939
-0.796605	compile time. A template	-0.124939
-0.347951	template parameters. A template	-0.124939
-0.347951	for polymorphism A template	-0.124939
-0.347951	7.28 Templates A template	-0.124939
-0.347951	do so). A template	-0.124939
-0.865624	is a class template	-0.124939
-0.462094	Integer power using template	-0.124939
-0.558294	Because the C++ template	-0.124939
-0.351841	it. In C++ template	-0.124939
-0.656776	cases, however, where template	-0.124939
-1.784294	power of 2 template	-0.124939
-0.356631	macro by template template	-0.124939
-0.522384	} // Use template	-0.124939
-0.573460	}; // Function template	-0.124939
-0.794112	to the standard template	-0.124939
-0.475307	classes. The standard template	-0.124939
-0.575345	using the above template	-0.124939
-0.353391	use this complicated template	-0.124939
-0.705207	with bounds checking template	-0.124939
-0.488829	power of N template	-0.124939
-0.438773	power of 2: template	-0.124939
-0.093245	templates. The powN template	-0.124939
-0.093245	template. The powN template	-0.124939
-0.293978	template because partial template	-0.124939
-0.023507	}; // Full template	-0.425969
-0.538448	a template parameter: template	-0.124939
-0.538448	x * m;} template	-0.124939
-0.237698	implemented by (partial) template	-0.124939
-0.237698	tortuous and convoluted template	-0.124939
-0.237698	with a non-recursing template	-0.124939
-0.237698	}; // Partial template	-0.124939
-0.463653	ebx. Only the registers	-0.124939
-1.661561	the number of registers	-0.124939
-0.702351	The number of registers	-0.425969
-1.166241	the type of registers	-0.124939
-0.531872	systems, but in registers	-0.124939
-0.567817	local variables in registers	-0.124939
-1.252677	be stored in registers	-0.124939
-0.779409	are stored in registers	-0.124939
-0.140965	be transferred in registers	-0.124939
-0.132497	are transferred in registers	-0.346788
-0.647621	be returned in registers	-0.124939
-0.602946	of the vector registers	-0.124939
-0.426559	If the vector registers	-0.124939
-0.426559	using the vector registers	-0.124939
-0.508664	size of vector registers	-0.124939
-0.503435	128- bit vector registers	-0.124939
-0.337259	The XMM vector registers	-0.124939
-0.474979	supported 128-bit vector registers	-0.124939
-0.337259	or sixteen vector registers	-0.124939
-0.358174	is advantageous because registers	-0.124939
-1.368306	the floating point registers	-0.124939
-1.272964	of floating point registers	-0.124939
-0.362804	eight floating point registers	-0.124939
-0.822715	of the integer registers	-0.124939
-0.452282	approximately six integer registers	-0.124939
-0.349980	and fourteen integer registers	-0.124939
-0.357810	benefit from using registers	-0.124939
-0.356165	number of available registers	-0.124939
-0.026668	floating point stack registers	-0.602060
-0.355574	register stack. These registers	-0.124939
-0.027081	in the XMM registers	-0.124939
-0.027081	use the XMM registers	-0.124939
-0.008841	when the XMM registers	-0.602060
-0.156306	underflow in XMM registers	-0.124939
-0.071159	check if XMM registers	-0.124939
-0.071159	costly if XMM registers	-0.124939
-0.156306	implementation uses XMM registers	-0.124939
-0.373926	The 128-bit XMM registers	-0.124939
-0.345210	are not enough registers	-0.124939
-0.343641	extended to 256-bit registers	-0.124939
-0.046262	set and YMM registers	-0.124939
-0.224884	SSE). The YMM registers	-0.124939
-0.458615	and for saving registers	-0.124939
-0.015536	set and ZMM registers	-0.124939
-0.065786	the 512-bit ZMM registers	-0.124939
-0.393769	version without the need	-0.124939
-0.393769	libraries without the need	-0.124939
-0.393769	calculations without the need	-0.124939
-0.358475	(This eliminates the need	-0.124939
-0.970509	of data. The need	-0.124939
-0.581163	member functions that need	-0.124939
-0.142474	or addresses that need	-0.124939
-0.142474	absolute addresses that need	-0.124939
-0.355914	Temporary files that need	-0.124939
-0.355914	allocated resources that need	-0.124939
-0.581456	user may not need	-0.124939
-0.583455	references do not need	-0.124939
-0.500756	library does not need	-0.124939
-0.500756	list does not need	-0.124939
-0.500756	hand, does not need	-0.124939
-0.541400	Things that may need	-0.124939
-0.734416	the program may need	-0.124939
-0.346836	allocated array may need	-0.124939
-0.522951	case we may need	-0.124939
-0.857042	cores. You may need	-0.124939
-0.346836	program logic may need	-0.124939
-0.346836	device drivers may need	-0.124939
-0.588206	algorithm, then you need	-0.124939
-0.581432	ordering? If you need	-0.124939
-0.493086	C++ so you need	-0.124939
-0.493086	not sure you need	-0.124939
-0.642918	other words, you need	-0.124939
-0.358092	Therefore, you only need	-0.124939
-1.062164	there is no need	-0.124939
-0.550458	There is no need	-0.602060
-0.329912	inlined - no need	-0.124939
-1.292350	of a class need	-0.124939
-0.462262	Gnu). Other compilers need	-0.124939
-0.535078	n, then we need	-0.124939
-0.338787	evicted before we need	-0.124939
-0.438164	this case we need	-0.124939
-0.338787	CPU cores, we need	-0.124939
-0.357276	different registers. You need	-0.124939
-0.822538	the dynamic libraries need	-0.124939
-0.582594	current operating systems need	-0.124939
-0.395326	because it doesn't need	-0.124939
-0.754262	The compiler doesn't need	-0.124939
-0.308067	A class doesn't need	-0.124939
-0.308067	the object doesn't need	-0.124939
-0.500451	The different threads need	-0.124939
-0.459449	a high-level language need	-0.124939
-0.459047	14.30 will therefore need	-0.124939
-0.498504	the object files need	-0.124939
-0.273058	data that don't need	-0.124939
-0.378814	as you don't need	-0.124939
-0.378814	then you don't need	-0.124939
-0.462233	and we don't need	-0.124939
-0.273058	if they don't need	-0.124939
-0.647691	Many software applications need	-0.124939
-0.541209	because both the pointers	-0.124939
-0.238631	a table of pointers	-0.602060
-0.142956	type casting of pointers	-0.124939
-0.142956	Type casting of pointers	-0.124939
-0.357308	the programmer that pointers	-0.124939
-0.357308	compiler explicitly that pointers	-0.124939
-0.656630	C, specifying that pointers	-0.124939
-0.358529	exchange data or pointers	-0.124939
-0.172147	// of function pointers	-0.425969
-0.547921	functions and function pointers	-0.124939
-0.584000	addresses, or if pointers	-0.124939
-1.253359	rather than by pointers	-0.124939
-0.463121	do things with pointers	-0.124939
-1.245638	as well as pointers	-0.124939
-0.502221	always transferred as pointers	-0.124939
-0.651434	to use than pointers	-0.124939
-0.592437	references rather than pointers	-0.124939
-0.354692	objects (rather than pointers	-0.124939
-0.462826	that typically use pointers	-0.124939
-0.598177	as to make pointers	-0.124939
-0.357891	you analyze all pointers	-0.124939
-1.061272	advantages of using pointers	-0.124939
-0.528632	disadvantages of using pointers	-0.124939
-0.357486	shared_ptr allows multiple pointers	-0.124939
-0.346769	specifying that two pointers	-0.124939
-0.522851	difference between two pointers	-0.124939
-0.346769	addition. Comparing two pointers	-0.124939
-0.329540	implementation of member pointers	-0.124939
-0.426571	complications with member pointers	-0.124939
-0.477573	that make member pointers	-0.124939
-0.329540	the way member pointers	-0.124939
-0.329540	fast=2 Simple member pointers	-0.124939
-0.355742	as arguments while pointers	-0.124939
-0.189835	are accessed through pointers	-0.301030
-0.239201	fact accessed through pointers	-0.124939
-0.778209	code that uses pointers	-0.124939
-0.137347	operators. 7.7 Function pointers	-0.124939
-0.137347	36 7.7 Function pointers	-0.124939
-0.354235	1. Relocation. All pointers	-0.124939
-0.353269	simple variable. Using pointers	-0.124939
-0.351545	for the link pointers	-0.124939
-0.347182	new block. Any pointers	-0.124939
-0.299681	implementations of smart pointers	-0.124939
-0.299681	of using smart pointers	-0.124939
-0.508784	variables. This includes pointers	-0.124939
-0.545367	has to keep pointers	-0.124939
-0.344995	Problems with invalid pointers	-0.124939
-0.524432	zero, by setting pointers	-0.124939
-0.343425	section may contain pointers	-0.124939
-0.023507	pointer. 7.9 Smart pointers	-0.124939
-0.023507	pointers.......................................................................................................37 7.9 Smart pointers	-0.124939
-0.048359	is deleted. Smart pointers	-0.124939
-0.048359	for auto_ptr. Smart pointers	-0.124939
-0.382543	changed. 7.8 Member pointers	-0.124939
-0.237698	pointers, by initializing pointers	-0.124939
-0.600478	test in the test	-0.124939
-0.596940	used by the test	-0.124939
-0.589964	priority before the test	-0.124939
-0.562242	output after the test	-0.124939
-1.421526	to make a test	-0.124939
-0.552361	you make a test	-0.124939
-0.914609	to put a test	-0.124939
-0.357913	have developed a test	-0.124939
-0.482788	a set of test	-0.124939
-0.342932	typical set of test	-0.124939
-0.342932	suitable set of test	-0.124939
-0.482788	realistic set of test	-0.124939
-0.565056	Critical function to test	-0.124939
-0.571840	of code to test	-0.124939
-1.153237	may have to test	-0.124939
-2.039426	in order to test	-0.124939
-0.582888	typical way to test	-0.124939
-0.498286	For example, to test	-0.124939
-1.158306	for how to test	-0.124939
-0.547081	on how to test	-0.124939
-0.354108	XMM registers to test	-0.124939
-1.135738	you need to test	-0.124939
-0.498286	of times to test	-0.124939
-1.424859	is necessary to test	-0.124939
-0.544548	more relevant to test	-0.124939
-0.457513	two things to test	-0.124939
-0.354108	common practice to test	-0.124939
-0.358757	branches separately and test	-0.124939
-1.203174	be useful in test	-0.124939
-0.560659	test data. The test	-0.124939
-0.357468	stamp counter. The test	-0.124939
-0.357468	hot spots. The test	-0.124939
-0.462569	a variable for test	-0.124939
-0.658186	code branch for test	-0.124939
-0.358720	<<, >> can test	-0.124939
-0.991400	0) { // test	-0.124939
-0.842644	0x7FFFFFFF) { // test	-0.124939
-0.358550	a textbook on test	-0.124939
-0.358196	critical code. A test	-0.124939
-0.503805	differently on different test	-0.124939
-0.856453	then you should test	-0.124939
-0.587477	difference for each test	-0.124939
-0.452764	include a performance test	-0.124939
-0.350361	A realistic performance test	-0.124939
-0.356981	// Time before test	-0.124939
-0.137948	} }; void test	-0.425969
-0.336967	swapd(a[r][c], a[c][r]); void test	-0.124939
-0.579256	in a simple test	-0.124939
-0.536233	correctly. The speed test	-0.124939
-0.557998	make a small test	-0.124939
-0.482666	reductions in my test	-0.124939
-0.414139	manual for my test	-0.124939
-0.184759	the program under test	-0.124939
-0.294119	monitor counters. My test	-0.124939
-0.294119	been identified. My test	-0.124939
-0.702438	than a dedicated test	-0.124939
-0.314641	have a built-in test	-0.124939
-0.237755	in the unit- test	-0.124939
-0.597255	parameters of the new	-0.124939
-1.419094	beginning of the new	-0.124939
-1.043846	copied to the new	-0.124939
-0.589959	adapt to the new	-0.124939
-0.895081	wait for the new	-0.124939
-1.360202	or if the new	-0.124939
-0.503540	never uses the new	-0.124939
-0.462294	programmer gets the new	-0.124939
-0.591037	LLVM is a new	-0.124939
-0.580127	advantage of a new	-0.124939
-0.580127	insertion of a new	-0.124939
-0.587266	updated to a new	-0.124939
-0.582337	memory for a new	-0.124939
-0.579804	require that a new	-0.124939
-0.421276	every time a new	-0.124939
-0.332432	Each time a new	-0.124939
-1.559093	to use a new	-0.124939
-0.562255	only when a new	-0.124939
-1.076067	to make a new	-0.124939
-0.804628	than making a new	-0.124939
-0.350790	that support a new	-0.124939
-0.643741	to load a new	-0.124939
-0.531361	will generate a new	-0.124939
-0.140900	to start a new	-0.124939
-0.140900	can start a new	-0.124939
-0.453307	begin calculating a new	-0.124939
-0.154463	to allocate a new	-0.124939
-0.170014	classes allocate a new	-0.124939
-0.350790	Before starting a new	-0.124939
-0.350790	and maintaining a new	-0.124939
-0.350790	by assigning a new	-0.124939
-0.350790	and create a new	-0.124939
-0.358795	are needed, and new	-0.124939
-0.351649	of memory with new	-0.124939
-0.351649	an object with new	-0.124939
-0.105824	Memory allocated with new	-0.425969
-0.351649	memory allocation with new	-0.124939
-0.351649	allocated dynamically with new	-0.124939
-0.358545	the problem. This new	-0.124939
-0.543859	features to each new	-0.124939
-0.353574	CPU development, each new	-0.124939
-0.565534	alternative to using new	-0.124939
-0.356028	install this important new	-0.124939
-0.355550	instruction set. These new	-0.124939
-0.355023	or CString uses new	-0.124939
-0.354782	with the operators new	-0.124939
-0.534517	software to add new	-0.124939
-0.850399	when the next new	-0.124939
-0.349628	problems and desired new	-0.124939
-0.348233	producers keep adding new	-0.124939
-0.343570	what is brand new	-0.124939
-0.331799	of alloca over new	-0.124939
-0.237812	mechanism to advertise new	-0.124939
-0.237812	happy to receive new	-0.124939
-0.237812	allocated dynamically (with new	-0.124939
-0.463132	not portable to systems	-0.124939
-0.504462	easily ported to systems	-0.124939
-0.463551	separate thread in systems	-0.124939
-0.358016	storage, but other systems	-0.124939
-0.539765	available in all systems	-0.124939
-0.514104	that the 64-bit systems	-0.124939
-0.529313	available in 64-bit systems	-0.124939
-0.529313	avoided in 64-bit systems	-0.124939
-0.428189	6 The 64-bit systems	-0.124939
-0.478786	to use 64-bit systems	-0.124939
-0.514707	32. In 64-bit systems	-0.124939
-0.525760	use on such systems	-0.124939
-0.869111	efficient in some systems	-0.124939
-0.333748	and in 32-bit systems	-0.124939
-0.470165	variables in 32-bit systems	-0.124939
-0.333748	bits in 32-bit systems	-0.124939
-0.333748	bytes in 32-bit systems	-0.124939
-0.333748	precision in 32-bit systems	-0.124939
-0.333748	eight in 32-bit systems	-0.124939
-0.333748	six in 32-bit systems	-0.124939
-0.303572	bits, but 32-bit systems	-0.124939
-0.303572	integers. Many 32-bit systems	-0.124939
-0.472738	efficient. 64 bit systems	-0.124939
-0.472738	-fno-pic). 64 bit systems	-0.124939
-0.798515	between the operating systems	-0.124939
-0.444206	CPUs and operating systems	-0.124939
-0.314636	microprocessors and operating systems	-0.124939
-0.314636	platforms and operating systems	-0.124939
-0.231872	a different operating systems	-0.124939
-0.231872	the two operating systems	-0.124939
-0.179334	and 64-bit operating systems	-0.124939
-0.179334	for 64-bit operating systems	-0.124939
-0.231872	and some operating systems	-0.124939
-0.047412	in 32-bit operating systems	-0.425969
-0.231872	though these operating systems	-0.124939
-0.231872	and Linux operating systems	-0.124939
-0.100642	the old operating systems	-0.124939
-0.100642	on old operating systems	-0.124939
-0.231872	and current operating systems	-0.124939
-0.231872	Unfortunately, contemporary operating systems	-0.124939
-0.231872	etc.). Older operating systems	-0.124939
-0.231872	Clang Supported operating systems	-0.124939
-0.322665	intensive applications. Some systems	-0.124939
-0.322665	is important. Some systems	-0.124939
-0.322665	intended for. Some systems	-0.124939
-0.322665	accelerator card. Some systems	-0.124939
-0.355542	Windows 3.x. These systems	-0.124939
-1.012049	BSD and Mac systems	-0.124939
-0.333774	Unix-like systems. Mac systems	-0.124939
-0.534246	integers in 16-bit systems	-0.124939
-0.051240	Optimization in embedded systems	-0.124939
-0.325295	compatibility with existing systems	-0.124939
-0.444172	On big endian systems	-0.124939
-0.314680	to fully utilize systems	-0.124939
-0.575352	objects in Unix-like systems	-0.124939
-0.294098	registers. 64-bit Unix systems	-0.124939
-0.237804	be used. Web systems	-0.124939
-1.068632	care of the user	-0.124939
-0.858632	goes to the user	-0.124939
-0.858632	annoying to the user	-0.124939
-0.579262	unacceptable to the user	-0.124939
-0.594218	terminated and the user	-0.124939
-0.595664	unnecessary for the user	-0.124939
-1.653620	is that the user	-0.124939
-0.578022	long that the user	-0.124939
-0.856274	important that the user	-0.124939
-0.565356	overloaded or the user	-0.124939
-1.259232	even if the user	-0.124939
-0.858912	problem if the user	-0.124939
-1.014492	especially if the user	-0.124939
-0.594655	time on the user	-0.124939
-0.881218	priority than the user	-0.124939
-0.590646	priorities then the user	-0.124939
-0.500592	no way the user	-0.124939
-0.355761	word processor the user	-0.124939
-0.355761	never interrupt the user	-0.124939
-0.355761	to place the user	-0.124939
-0.355761	even telling the user	-0.124939
-0.355761	system forbids the user	-0.124939
-0.589586	unusual that a user	-0.124939
-0.574981	times when a user	-0.124939
-0.461874	easy development of user	-0.124939
-0.585340	data instead of user	-0.124939
-1.110930	the choice of user	-0.124939
-0.181210	2.7 Choice of user	-0.425969
-0.861267	response time to user	-0.124939
-0.358801	be reinstalled and user	-0.124939
-0.555458	Linux systems. The user	-0.124939
-0.657020	program starts. The user	-0.124939
-0.357503	is insufficient. The user	-0.124939
-0.827633	response times for user	-0.124939
-0.159141	time waiting for user	-0.425969
-0.356167	that waits for user	-0.124939
-0.356167	handle. Waiting for user	-0.124939
-0.826182	same time. A user	-0.124939
-0.462597	settings and different user	-0.124939
-0.461749	The simplest possible user	-0.124939
-1.129166	is a very user	-0.124939
-0.356943	available, though less user	-0.124939
-0.458060	can use standard user	-0.124939
-0.519694	for the end user	-0.124939
-0.146543	that the end user	-0.124939
-0.369361	before the end user	-0.124939
-0.349691	the code, including user	-0.124939
-0.093292	dropping the graphical user	-0.124939
-0.014222	of a graphical user	-0.124939
-0.028926	has a graphical user	-0.124939
-0.093292	Graphics A graphical user	-0.124939
-0.093292	their own graphical user	-0.124939
-0.477817	database for storing user	-0.124939
-0.331768	tools. A popular user	-0.124939
-0.294117	new features. Take user	-0.124939
-0.456367	to all of these	-0.124939
-1.025002	to one of these	-0.124939
-0.537812	for many of these	-0.124939
-0.334970	by any of these	-0.124939
-0.471839	If any of these	-0.124939
-0.334970	without any of these	-0.124939
-0.560806	and some of these	-0.124939
-0.583019	latest versions of these	-0.124939
-1.415438	take advantage of these	-0.124939
-0.141643	systems. All of these	-0.124939
-0.141643	Comments All of these	-0.124939
-0.546376	hardware implementation of these	-0.124939
-0.456367	code. Many of these	-0.124939
-0.866116	be aware of these	-0.124939
-1.059733	the availability of these	-0.124939
-0.353204	values. Which of these	-0.124939
-0.497027	OR combination of these	-0.124939
-0.353204	Every fourth of these	-0.124939
-0.527232	the solution to these	-0.124939
-0.561959	another function and these	-0.124939
-0.358282	Boolean vectors, and these	-0.124939
-0.504901	code examples in these	-0.124939
-0.575064	various functions for these	-0.124939
-0.357427	code examples for these	-0.124939
-0.357427	is avoided for these	-0.124939
-1.142929	can assume that these	-0.124939
-0.462526	explanation. Note that these	-0.124939
-0.462526	131 Note that these	-0.124939
-0.655233	But beware that these	-0.124939
-0.358543	doing something on these	-0.124939
-0.549984	classes that use these	-0.124939
-0.354921	total size, because these	-0.124939
-0.458544	very problematic because these	-0.124939
-0.573506	stack for all these	-0.124939
-0.354273	object. Obviously, all these	-0.124939
-0.354314	fence instructions, but these	-0.124939
-0.354314	to relocate, but these	-0.124939
-0.357469	same variables. In these	-0.124939
-0.357103	sensible balance between these	-0.124939
-0.461158	of resources. For these	-0.124939
-0.571042	code to access these	-0.124939
-0.998244	ways to avoid these	-0.124939
-0.504934	You should avoid these	-0.124939
-0.355633	performance reasons. Use these	-0.124939
-0.355642	cache line. But these	-0.124939
-0.394707	different purposes. All these	-0.124939
-0.303872	error prone. All these	-0.124939
-0.303872	table 9.2. All these	-0.124939
-0.303872	libraries. www.agner.org/optimize/#vectorclass All these	-0.124939
-0.354084	Java implementations. However, these	-0.124939
-0.349628	and lrint. Unfortunately, these	-0.124939
-0.836398	way to tell these	-0.124939
-0.347362	operating systems, though these	-0.124939
-0.885046	compiler will convert these	-0.124939
-0.345129	diagonal and swap these	-0.124939
-0.524470	simply by setting these	-0.124939
-0.819607	how to overcome these	-0.124939
-0.490231	sure to distinguish these	-0.124939
-0.237739	attempts to translate these	-0.124939
-0.989687	compatibility problems and they	-0.124939
-0.829869	same thing and they	-0.124939
-0.502211	many programmers and they	-0.124939
-0.356920	32-bit integers, and they	-0.124939
-0.356920	different sizes, and they	-0.124939
-0.562829	in so that they	-0.124939
-0.562829	different so that they	-0.124939
-0.937025	make sure that they	-0.124939
-0.355905	same reason that they	-0.124939
-0.358592	are needed, or they	-0.124939
-1.067897	inside the function they	-0.124939
-0.511490	values or if they	-0.124939
-0.511490	overlap or if they	-0.124939
-0.347487	them static if they	-0.124939
-0.467781	arrays even if they	-0.124939
-0.467781	mispredicted even if they	-0.124939
-0.347487	other values if they	-0.124939
-0.347487	stored together if they	-0.124939
-0.449129	fatal errors if they	-0.124939
-0.347487	but expensive if they	-0.124939
-0.347487	relatively cheap if they	-0.124939
-0.358546	unsigned integers - they	-0.124939
-0.580423	evaluated every time they	-0.124939
-0.453500	track of when they	-0.124939
-0.567938	loaded only when they	-0.124939
-0.453500	monitor counters when they	-0.124939
-0.350942	is stronger when they	-0.124939
-0.477566	optimal code because they	-0.124939
-0.517279	equally efficient because they	-0.124939
-0.486568	program performance because they	-0.124939
-0.339142	particularly critical because they	-0.124939
-0.339142	Boolean operators because they	-0.124939
-0.621163	be avoided because they	-0.124939
-0.438610	relatively costly because they	-0.124939
-0.277740	function in which they	-0.425969
-0.352313	order in which they	-0.301030
-0.270247	thread in which they	-0.124939
-0.354333	= a, but they	-0.124939
-0.354333	64-bit integers, but they	-0.124939
-1.230952	in cases where they	-0.124939
-0.558602	only situation where they	-0.124939
-0.552397	whenever the objects they	-0.124939
-0.556969	type of objects they	-0.124939
-0.356993	several stages before they	-0.124939
-0.759064	depending on how they	-0.124939
-1.099049	in most cases they	-0.124939
-0.620638	regardless of whether they	-0.124939
-0.620638	to see whether they	-0.124939
-0.353861	than the programs they	-0.124939
-0.354018	as pointers unless they	-0.124939
-0.353104	and that's what they	-0.124939
-0.455607	things only after they	-0.124939
-0.453181	and which reductions they	-0.124939
-0.345155	point calculations whenever they	-0.124939
-0.237780	and the texts they	-0.124939
-0.143026	versions with and without	-0.124939
-0.143026	compiled with and without	-0.124939
-0.539385	innermost loop and without	-0.124939
-0.358448	represented with or without	-0.124939
-0.462568	directly to memory without	-0.124939
-0.358066	of storing data without	-0.124939
-0.503771	an application program without	-0.124939
-1.288260	does the same without	-0.124939
-0.565650	most library functions without	-0.124939
-1.060778	outside the loop without	-0.124939
-1.789190	can be used without	-0.124939
-0.785735	point to integer without	-0.124939
-0.462142	or Gnu compilers without	-0.124939
-0.548686	stores a double without	-0.124939
-0.456684	the shared object without	-0.124939
-0.887677	a shared object without	-0.124939
-0.524944	a new version without	-0.124939
-0.549117	make shared objects without	-0.124939
-0.559700	same dynamic libraries without	-0.124939
-0.356357	example 11.3 even without	-0.124939
-1.180448	object oriented programming without	-0.124939
-1.113889	floating point operations without	-0.124939
-0.459884	to reorder instructions without	-0.124939
-0.356085	for old processors without	-0.124939
-0.355872	an unrecoverable error without	-0.124939
-0.502145	performance on CPUs without	-0.124939
-0.825938	sequence of calculations without	-0.124939
-0.355751	as command-line versions without	-0.124939
-0.728900	program is compiled without	-0.124939
-0.399785	main() are compiled without	-0.124939
-0.471352	with code compiled without	-0.124939
-0.399785	process when compiled without	-0.124939
-0.307987	shared object compiled without	-0.124939
-0.531768	Store 4 bytes without	-0.124939
-0.512509	Store 8 bytes without	-0.124939
-0.067583	Store 16 bytes without	-0.602060
-0.355540	to improve speed without	-0.124939
-0.811230	of an exception without	-0.124939
-0.578683	use double precision without	-0.124939
-0.651690	array or container without	-0.124939
-0.455546	but in applications without	-0.124939
-0.646391	with old microprocessors without	-0.124939
-0.350103	of handling errors without	-0.124939
-0.347994	legitimate backup copying without	-0.124939
-0.544545	cannot be changed without	-0.124939
-0.448885	disadvantage of compiling without	-0.124939
-0.346198	call C1::f directly without	-0.124939
-0.344936	out of F1 without	-0.124939
-0.335955	the C-style type-casting without	-0.124939
-0.331580	declare an int, without	-0.124939
-0.324983	the desired functionality without	-0.124939
-0.324983	on a unit-test without	-0.124939
-0.314358	a disassembly, probably without	-0.124939
-0.293857	numbers in question without	-0.124939
-0.237592	be used freely without	-0.124939
-0.237592	compiled code. (Compile without	-0.124939
-0.568501	functions. This is useful	-0.124939
-0.568501	memory. This is useful	-0.124939
-0.568501	value. This is useful	-0.124939
-0.588653	performance. It is useful	-0.124939
-0.588653	organization It is useful	-0.124939
-0.871119	mode program is useful	-0.124939
-0.588589	output, which is useful	-0.124939
-0.533646	This method is useful	-0.301030
-0.355339	Best-case testing is useful	-0.124939
-0.355339	This principle is useful	-0.124939
-0.355339	empty throw()specification is useful	-0.124939
-1.619917	This is a useful	-0.124939
-0.589799	(STL) is a useful	-0.124939
-0.898952	it can be useful	-0.124939
-0.683505	This can be useful	-0.346788
-0.990335	library can be useful	-0.124939
-0.783174	pointers can be useful	-0.124939
-0.537976	binding can be useful	-0.124939
-0.537976	tables can be useful	-0.124939
-0.537976	Metaprogramming can be useful	-0.124939
-0.788404	it may be useful	-0.602060
-0.911159	This may be useful	-0.124939
-0.731164	It may be useful	-0.425969
-0.508360	Bitfields may be useful	-0.124939
-0.548714	available which are useful	-0.124939
-1.243494	function libraries are useful	-0.124939
-0.966691	Vector operations are useful	-0.124939
-0.518897	#if directives are useful	-0.124939
-0.497043	These profilers are useful	-0.124939
-0.456381	Linux). Threads are useful	-0.124939
-0.456381	type. References are useful	-0.124939
-0.353216	^, ~ are useful	-0.124939
-0.550039	12) are more useful	-0.124939
-0.525725	profiler is most useful	-0.124939
-0.936560	It is also useful	-0.124939
-0.524921	operator is also useful	-0.124939
-0.346434	is 102 also useful	-0.124939
-0.761761	library contains many useful	-0.124939
-0.544078	counter is very useful	-0.124939
-1.024297	is a very useful	-0.124939
-0.427458	tested, and very useful	-0.124939
-0.367543	can be very useful	-0.124939
-0.581894	option is less useful	-0.124939
-1.389111	It is often useful	-0.124939
-0.349043	(www.agner.org/optimize/testp.zip). A particularly useful	-0.124939
-0.343643	and newsgroups contain useful	-0.124939
-0.594665	near then the even	-0.124939
-1.433213	Therefore, it is even	-0.124939
-0.358782	are prone to even	-0.124939
-0.358633	be overwritten, and even	-0.124939
-0.358644	hash table for even	-0.124939
-0.860466	it would be even	-0.124939
-0.358670	common denominator can even	-0.124939
-0.355122	new update or even	-0.124939
-0.355122	a hundred or even	-0.124939
-0.355122	are uncached or even	-0.124939
-0.355122	binary search, or even	-0.124939
-1.631924	It is not even	-0.124939
-0.356677	a register, not even	-0.124939
-0.870170	There is an even	-0.124939
-0.580918	software. It may even	-0.124939
-0.457955	RAM memory may even	-0.124939
-0.587096	vector. You may even	-0.124939
-0.583825	as you have even	-0.124939
-0.462551	load into memory even	-0.124939
-0.461545	for different objects even	-0.124939
-1.134831	of a variable even	-0.124939
-0.586618	before the performance even	-0.124939
-1.363752	in some cases even	-0.124939
-0.355656	for different arrays even	-0.124939
-0.830686	make multiple versions even	-0.124939
-0.343133	some cases. An even	-0.124939
-0.343133	memory leak. An even	-0.124939
-0.355053	operator (|) works even	-0.124939
-0.870897	10 clock cycles even	-0.124939
-1.055318	in most cases, even	-0.124939
-0.574867	use exception handling even	-0.124939
-0.560980	devices, you don't even	-0.124939
-0.352530	for many applications even	-0.124939
-0.542630	the dispatch mechanism even	-0.124939
-0.455717	lookups are needed even	-0.124939
-0.455185	not an Intel, even	-0.124939
-0.348823	register to temp even	-0.124939
-0.348709	is always inlined even	-0.124939
-0.445814	will be used, even	-0.124939
-0.748388	can be mispredicted even	-0.124939
-0.438625	likely be called, even	-0.124939
-0.467027	latter is executed even	-0.124939
-0.331556	the function returns even	-0.124939
-0.325078	computer starts up, even	-0.124939
-0.324953	require more resources, even	-0.124939
-0.682819	binding by default, even	-0.124939
-0.314329	whole program execution, even	-0.124939
-0.314329	overflow never occurs, even	-0.124939
-0.314329	takes memory space, even	-0.124939
-0.574839	in example 11.3 even	-0.124939
-0.293830	floating point expressions, even	-0.124939
-0.293830	be a time-consumer even	-0.124939
-0.293830	long response times, even	-0.124939
-0.237568	if(!(a || b)) even	-0.124939
-0.237568	rather than nine, even	-0.124939
-0.237568	that (b*c) overflows, even	-0.124939
-0.237568	the exception handler, even	-0.124939
-1.432190	because it is sure	-0.124939
-0.598206	directives. This is sure	-0.124939
-0.358814	cannot know for sure	-0.124939
-0.588264	you cannot be sure	-0.124939
-0.724658	can never be sure	-0.124939
-0.381507	if you are sure	-0.249877
-1.161595	if they are sure	-0.124939
-0.547883	cases they are sure	-0.124939
-0.548736	AMD processors are sure	-0.124939
-0.353230	same arguments are sure	-0.124939
-1.171595	you are not sure	-0.124939
-1.037457	is to make sure	-0.124939
-0.385844	have to make sure	-0.124939
-0.901802	order to make sure	-0.124939
-0.280504	way to make sure	-0.425969
-0.421317	want to make sure	-0.425969
-0.385844	important to make sure	-0.124939
-0.385844	structure to make sure	-0.124939
-0.169880	programmer to make sure	-0.522879
-0.385844	destructor to make sure	-0.124939
-0.385844	subexpression to make sure	-0.124939
-0.385844	manner to make sure	-0.124939
-0.385844	carefully to make sure	-0.124939
-0.409293	possible, and make sure	-0.124939
-0.409293	aligned, and make sure	-0.124939
-0.665832	We can make sure	-0.124939
-0.403873	only you make sure	-0.124939
-0.262864	library then make sure	-0.124939
-0.262864	compiler, then make sure	-0.124939
-0.521690	you must make sure	-0.124939
-0.284354	parameters. Therefore, make sure	-0.124939
-0.284354	non-recoverable errors; make sure	-0.124939
-0.390026	destructor that makes sure	-0.124939
-0.666772	that it makes sure	-0.124939
-0.442919	x; This makes sure	-0.124939
-0.442919	bytes. This makes sure	-0.124939
-0.442919	static. This makes sure	-0.124939
-0.300071	handling system makes sure	-0.124939
-0.300071	const reference makes sure	-0.124939
-0.300071	volatile keyword makes sure	-0.124939
-0.300071	the product makes sure	-0.124939
-0.298070	way of making sure	-0.425969
-0.810356	solved by making sure	-0.124939
-0.329639	a variable. Make sure	-0.124939
-0.329639	and executables. Make sure	-0.124939
-0.237894	be signed. Be sure	-0.124939
-0.583774	macro, but the method	-0.124939
-0.817622	may choose the method	-0.124939
-0.358624	9.3 shows, the method	-0.124939
-0.522235	not used. The method	-0.124939
-0.498585	of bits. The method	-0.124939
-0.354322	than pow The method	-0.124939
-0.457785	iterations back. The method	-0.124939
-0.354322	assembly language". The method	-0.124939
-0.650703	be negative. The method	-0.124939
-0.354322	loop count. The method	-0.124939
-0.354322	Gauss elimination. The method	-0.124939
-0.578901	no function or method	-0.124939
-0.468709	never called. This method	-0.124939
-0.608883	different CPUs. This method	-0.124939
-0.430506	Intel compiler. This method	-0.124939
-0.332685	the library. This method	-0.124939
-0.667859	each thread. This method	-0.124939
-0.332685	memory allocation. This method	-0.124939
-0.332685	the array. This method	-0.124939
-0.430506	modulo 16. This method	-0.124939
-0.332685	are finished. This method	-0.124939
-0.608883	at all. This method	-0.124939
-0.332685	multiple versions. This method	-0.124939
-0.430506	is loaded. This method	-0.124939
-0.332685	< 2.0 This method	-0.124939
-0.332685	less efficiently. This method	-0.124939
-0.332685	different executables. This method	-0.124939
-0.332685	be added. This method	-0.124939
-0.332685	Windows MFC). This method	-0.124939
-1.060119	advantage of this method	-0.124939
-0.482891	note that this method	-0.124939
-0.540883	should use this method	-0.124939
-0.539818	pointers because this method	-0.124939
-0.563405	metaprogramming, but this method	-0.124939
-0.526245	may avoid this method	-0.124939
-0.343007	and choose this method	-0.124939
-0.343007	80 Unfortunately, this method	-0.124939
-0.358229	memory blocks. A method	-0.124939
-0.540314	the short vector method	-0.124939
-0.572134	closed. The same method	-0.124939
-0.546872	choice of which method	-0.124939
-0.453430	also discussed which method	-0.124939
-0.350887	to consider which method	-0.124939
-0.983711	the induction variable method	-0.124939
-0.784638	on first call method	-0.124939
-1.097880	The most important method	-0.124939
-0.746992	the function calling method	-0.124939
-0.345072	138 A similar method	-0.124939
-0.345095	used. A newer method	-0.124939
-0.343627	the table. Optimization method	-0.124939
-0.341682	A more general method	-0.124939
-0.341740	returns. The preferred method	-0.124939
-0.325232	The old C-style method	-0.124939
-0.325232	disadvantages. The original method	-0.124939
-0.314601	same effect. Which method	-0.124939
-0.314601	uses an unfortunate method	-0.124939
-0.593451	branch that is always	-0.124939
-1.854102	the function is always	-0.124939
-0.570761	that b is always	-0.124939
-1.268080	that there is always	-0.124939
-0.502033	possible. SSE2 is always	-0.124939
-0.547446	template parameter is always	-0.124939
-0.460923	code section is always	-0.124939
-0.783173	The exponent is always	-0.124939
-0.888823	possibility is to always	-0.124939
-0.876125	a compiler to always	-0.124939
-0.658377	be reduced to always	-0.124939
-0.725351	is small and always	-0.124939
-0.883582	repeat count and always	-0.124939
-1.106519	A branch that always	-0.124939
-1.011094	template parameters are always	-0.124939
-0.522302	intermediate results are always	-0.124939
-0.500280	these manuals are always	-0.124939
-0.459328	references. Arrays are always	-0.124939
-0.355538	called properties) are always	-0.124939
-0.659358	: 1; // always	-0.124939
-0.358598	always true or always	-0.124939
-0.526265	compiler is not always	-0.124939
-1.115916	this is not always	-0.124939
-1.217825	It is not always	-0.124939
-0.526265	set is not always	-0.124939
-0.526265	computer is not always	-0.124939
-0.526265	compatibility is not always	-0.124939
-0.796129	libraries are not always	-0.124939
-0.695564	systems are not always	-0.124939
-0.486017	numbers are not always	-0.124939
-0.486017	profilers are not always	-0.124939
-0.553955	applications, but not always	-0.124939
-0.766658	libraries do not always	-0.124939
-0.528509	directives do not always	-0.124939
-0.828620	2 does not always	-0.124939
-1.285943	The compiler will always	-0.124939
-0.496754	another thread will always	-0.124939
-0.353008	same core will always	-0.124939
-0.501894	the #pragma vector always	-0.124939
-0.356694	always #pragma vector always	-0.124939
-0.356694	Vectorize #pragma vector always	-0.124939
-0.869521	If the cache always	-0.124939
-0.353965	The size should always	-0.124939
-0.457331	installation process should always	-0.124939
-0.870337	that the variable always	-0.124939
-0.572719	but you cannot always	-0.124939
-0.527104	integers, and they always	-0.124939
-0.764230	sure that they always	-0.124939
-0.356084	floating point constant always	-0.124939
-1.499661	on the stack always	-0.124939
-0.355884	The recursion must always	-0.124939
-0.642551	function call statement always	-0.124939
-0.452656	see that p always	-0.124939
-0.508356	It is almost always	-0.124939
-0.527262	manuals. I am always	-0.124939
-0.237788	compiler. Remember, therefore, always	-0.124939
-0.882526	would make the access	-0.124939
-0.583021	blocks makes the access	-0.124939
-0.591695	error is to access	-0.124939
-0.848481	assembly code to access	-0.124939
-0.570348	recently than to access	-0.124939
-0.592285	always possible to access	-0.124939
-1.762090	in order to access	-0.124939
-0.982218	In order to access	-0.124939
-0.804162	is faster to access	-0.124939
-0.447359	much faster to access	-0.124939
-0.459215	several seconds to access	-0.124939
-0.755750	be unable to access	-0.124939
-0.355449	following steps to access	-0.124939
-0.358729	or column. The access	-0.124939
-0.889315	here is that access	-0.124939
-1.432158	the functions that access	-0.124939
-0.578883	because we can access	-0.124939
-0.541350	storage. If you access	-0.124939
-0.541350	42 If you access	-0.124939
-0.354380	will give you access	-0.124939
-0.504190	other threads have access	-0.124939
-0.344107	access and memory access	-0.124939
-0.344107	likely that memory access	-0.124939
-0.344107	method if memory access	-0.124939
-0.344107	bottleneck than memory access	-0.124939
-0.063890	9 Optimizing memory access	-0.124939
-0.462640	explicitly if data access	-0.124939
-0.358040	relevant when CPU access	-0.124939
-0.357951	algebra) require other access	-0.124939
-0.357881	access or cache access	-0.124939
-0.357397	of fastest possible access	-0.124939
-0.768787	that it cannot access	-0.124939
-0.740826	member function cannot access	-0.124939
-0.356508	and different user access	-0.124939
-0.327010	bottleneck is file access	-0.124939
-0.061610	to put file access	-0.124939
-0.327010	storage. Optimizing file access	-0.124939
-1.086723	order to get access	-0.124939
-0.533828	instructions for fast access	-0.124939
-0.350223	double which gives access	-0.124939
-0.368207	access and network access	-0.124939
-0.282251	controlled. The network access	-0.124939
-0.282251	software with network access	-0.124939
-0.625955	21 3.13 Memory access	-0.124939
-0.341678	structures with non-sequential access	-0.124939
-0.336048	a subset, giving access	-0.124939
-0.325152	data for regular access	-0.124939
-0.407874	below. 3.7 File access	-0.124939
-0.294015	and finally (4) access	-0.124939
-0.048364	21 3.12 Network access	-0.124939
-0.048364	modules. 3.12 Network access	-0.124939
-0.294015	that allows direct access	-0.124939
-0.294015	container for exclusive access	-0.124939
-0.237731	access. Sequential forward access	-0.124939
-0.874564	2; } } void	-0.124939
-0.541803	x; ... } void	-0.124939
-1.162181	+ 2; } void	-0.124939
-0.810310	*)d, x); } void	-0.124939
-0.723184	for each version void	-0.124939
-0.656167	Loop with branch void	-0.124939
-0.013941	{ public: virtual void	-0.425969
-0.305787	void NotPolymorphic(); virtual void	-0.124939
-0.522196	by another thread void	-0.124939
-0.063517	and copy matrix void	-0.425969
-0.997362	Define vector classes void	-0.124939
-0.169610	2; } }; void	-0.425969
-0.297516	c, d; }; void	-0.124939
-0.544681	void f(); }; void	-0.124939
-0.297516	... ~C1(); }; void	-0.124939
-0.354863	with alignment problem void	-0.124939
-0.329438	array static inline void	-0.425969
-0.302454	line: static inline void	-0.124939
-0.278059	function call inline void	-0.124939
-0.644506	CHello { public: void	-0.124939
-0.332470	CParent<CChild1> { public: void	-0.124939
-0.135168	CGrandParent { public: void	-0.124939
-0.332470	CParent<CChild2> { public: void	-0.124939
-0.325112	in matrix 96 void	-0.124939
-0.421041	// Example 8.26a void	-0.124939
-0.314595	// Example 7.12 void	-0.124939
-0.314484	function type typedef void	-0.124939
-0.382543	// Example 8.26b void	-0.124939
-0.382543	function swapd(a[r][c], a[c][r]); void	-0.124939
-0.293978	// Example 14.1c void	-0.124939
-0.293978	virtual void Disp(); void	-0.124939
-0.102786	void F2(float x[]); void	-0.124939
-0.102786	void F1(int x[]); void	-0.124939
-0.382543	// Example 8.21 void	-0.124939
-0.237698	// Example 9.5b void	-0.124939
-0.237698	&SelectAddMul_dispatch; // Dispatcher void	-0.124939
-0.237698	_mm_storeu_si128((__m128i *)d, x);} void	-0.124939
-0.237698	: x(0) {}; void	-0.124939
-0.237698	#define EXCEPTION_FLT_OVERFLOW 0xC0000091L void	-0.124939
-0.237698	// Example 8.5a void	-0.124939
-0.237698	the function prototype: void	-0.124939
-0.237698	9.3 #include <malloc.h> void	-0.124939
-0.237698	// Example 8.25 void	-0.124939
-0.237698	// Example 9.2b void	-0.124939
-0.237698	// Example 9.2a void	-0.124939
-0.237698	Branch/loop function vectorized: void	-0.124939
-0.237698	<stdio.h> #include <asmlib.h> void	-0.124939
-0.504905	short int is 16	-0.124939
-0.550084	eight integers of 16	-0.124939
-0.504409	a block of 16	-0.124939
-0.358490	be increased to 16	-0.124939
-0.358490	This corresponds to 16	-0.124939
-0.358575	factor sizeof(S1) = 16	-0.124939
-0.142599	16 8 or 16	-0.124939
-0.142599	lower 8 or 16	-0.124939
-0.356322	10, 12 or 16	-0.124939
-1.235611	out loop by 16	-0.124939
-0.352085	align table by 16	-0.124939
-1.322512	is divisible by 16	-0.124939
-0.352085	when alignment by 16	-0.124939
-0.352085	big structures by 16	-0.124939
-0.352085	Linux Align by 16	-0.124939
-1.960152	of the code 16	-0.124939
-0.659060	takes 4 - 16	-0.124939
-0.581007	systems: unsigned int 16	-0.124939
-0.530687	SSE2 short int 16	-0.124939
-0.530687	AVX2 short int 16	-0.124939
-0.530687	MMX short int 16	-0.124939
-0.351189	16-bit systems: int 16	-0.124939
-0.570140	Objects bigger than 16	-0.124939
-0.593659	files. See page 16	-0.124939
-0.461918	reach element number 16	-0.124939
-0.514572	MMX char 8 16	-0.124939
-0.341232	Is8vec16 Vec16c 8 16	-0.124939
-0.341232	64 I64vec1 8 16	-0.124939
-0.577486	registers to test 16	-0.124939
-0.841411	short int 16 16	-0.124939
-0.336542	832 256 16 16	-0.124939
-0.336542	Vec8f Vec4d 16 16	-0.124939
-0.522223	AVX int 32 16	-0.124939
-0.487511	AVX512 float 32 16	-0.124939
-0.353196	int 832 256 16	-0.124939
-0.061946	byte = char 16	-0.124939
-0.327441	_mm_stream_pd SSE2 Store 16	-0.124939
-0.189096	_mm_stream_ps SSE Store 16	-0.124939
-0.189096	_mm_stream_pi SSE Store 16	-0.124939
-0.441706	stores the lower 16	-0.124939
-0.341647	can be 8, 16	-0.124939
-0.341692	macros Compiler identification 16	-0.124939
-0.331616	temp++ actually adds 16	-0.124939
-0.325112	Metaprogramming ....................................................................................................... 150 16	-0.124939
-0.314484	clock cycle? ...................................................................................... 16	-0.124939
-0.293978	128 Is16vec8 Vec8s 16	-0.124939
-0.293978	time consumers ................................................................................ 16	-0.124939
-0.293978	hot spots .................................................................................. 16	-0.124939
-0.237698	15.1a to 15.1c). 16	-0.124939
-0.237698	128 Iu8vec16 Vec16uc 16	-0.124939
-0.237698	char 64 Iu8vec8 16	-0.124939
-0.237698	Vec2d Vec8f Vec4d 16	-0.124939
-0.237698	set needed _mm_shuffle_epi8 16	-0.124939
-0.237698	int 64 Is16vec4 16	-0.124939
-0.584837	functions for the SSE2	-0.124939
-0.869300	one for the SSE2	-0.124939
-0.587687	implementation if the SSE2	-0.124939
-0.587687	(XMM) if the SSE2	-0.124939
-0.112565	or when the SSE2	-0.602060
-0.501654	systems when the SSE2	-0.124939
-0.501654	operations when the SSE2	-0.124939
-0.501654	truncation when the SSE2	-0.124939
-0.501654	float's when the SSE2	-0.124939
-0.487710	with only the SSE2	-0.124939
-0.487710	insert only the SSE2	-0.124939
-0.568842	processors without the SSE2	-0.124939
-0.719069	some cases the SSE2	-0.124939
-0.381917	time unless the SSE2	-0.124939
-0.381917	systems unless the SSE2	-0.124939
-0.381917	mode unless the SSE2	-0.124939
-0.381917	rounding unless the SSE2	-0.124939
-0.536174	105). Using the SSE2	-0.124939
-0.237796	to enable the SSE2	-0.124939
-0.307548	or enable the SSE2	-0.124939
-0.659588	the SSE and SSE2	-0.124939
-0.785864	more efficient. The SSE2	-0.124939
-0.539040	modern CPUs. The SSE2	-0.124939
-0.357495	page 140). The SSE2	-0.124939
-0.561689	Not optimized for SSE2	-0.124939
-0.358113	// Only for SSE2	-0.124939
-0.365980	4) { // SSE2	-0.425969
-0.502388	parm2) {...} // SSE2	-0.425969
-0.458127	SelectAddMul_AVX2 #endif // SSE2	-0.124939
-0.659230	the SSE or SSE2	-0.124939
-0.463294	denormals-are-zero mode if SSE2	-0.124939
-0.358579	12.4b. Vectorized with SSE2	-0.124939
-1.173613	the instruction set SSE2	-0.124939
-0.549421	"\nError: Instruction set SSE2	-0.124939
-0.356496	to integer without SSE2	-0.124939
-0.057489	64 2 128 SSE2	-0.124939
-0.544692	32 4 128 SSE2	-0.124939
-0.297523	16 8 128 SSE2	-0.124939
-0.297523	8 16 128 SSE2	-0.124939
-0.351917	bit float vectors SSE2	-0.124939
-0.565402	<emmintrin.h> // Define SSE2	-0.124939
-0.785906	set if possible. SSE2	-0.124939
-0.314610	when the 145 SSE2	-0.124939
-0.294098	the same executable. SSE2	-0.124939
-0.382691	-msse /arch:SSE -msse SSE2	-0.124939
-0.237804	cache MOVNTDQ _mm_stream_si128 SSE2	-0.124939
-0.237804	cache MOVNTI _mm_stream_si32 SSE2	-0.124939
-0.237804	mmintrin.h SSE xmmintrin.h SSE2	-0.124939
-0.237804	cache MOVNTPD _mm_stream_pd SSE2	-0.124939
-0.172466	the index is out	-0.124939
-0.077727	array index is out	-0.425969
-0.527061	Interpreted languages are out	-0.124939
-0.463178	calculations simultaneously or out	-0.124939
-0.358642	return 0 if out	-0.124939
-0.598603	index is not out	-0.124939
-0.345362	can execute instructions out	-0.124939
-0.486144	by executing instructions out	-0.124939
-0.351235	following list points out	-0.124939
-0.492792	Move the conversions out	-0.124939
-0.350159	write FatalAppExitA(0,"Array index out	-0.124939
-0.171434	want to find out	-0.124939
-0.411707	a[i] and shift out	-0.124939
-0.277248	We can shift out	-0.124939
-0.277248	14.28 will shift out	-0.124939
-0.039008	it cannot rule out	-0.124939
-0.019066	compiler cannot rule out	-0.425969
-0.129426	to completely rule out	-0.124939
-0.006146	is to roll out	-0.124939
-0.006146	way to roll out	-0.124939
-0.006146	useful to roll out	-0.124939
-0.006146	want to roll out	-0.124939
-0.006146	advantageous to roll out	-0.124939
-0.031639	when we roll out	-0.124939
-0.007386	objects // Roll out	-0.124939
-0.007386	c; // Roll out	-0.124939
-0.003677	_mm_set1_epi16(2); // Roll out	-0.425969
-0.007386	two(2,2,2,2,2,2,2,2); // Roll out	-0.124939
-0.594800	it can move out	-0.124939
-0.325243	clear or mask out	-0.124939
-0.172579	can be carried out	-0.124939
-0.172579	tests were carried out	-0.124939
-0.065765	}; // Index out	-0.124939
-0.015531	<< "Error: Index out	-0.425969
-0.077773	can be moved out	-0.124939
-0.077773	may be moved out	-0.124939
-0.314542	of n being out	-0.124939
-0.172579	used for jumping out	-0.124939
-0.172579	destructors after jumping out	-0.124939
-0.048367	can be ruled out	-0.124939
-0.048367	cannot be ruled out	-0.124939
-0.048367	avoided by rolling out	-0.124939
-0.048367	8.26a by rolling out	-0.124939
-0.102804	memory block turns out	-0.124939
-0.102804	the prediction turns out	-0.124939
-0.294033	can be left out	-0.124939
-0.102804	loop is rolled out	-0.124939
-0.102804	a list, rolled out	-0.124939
-0.237747	Loop to print out	-0.124939
-0.237747	we are breaking out	-0.124939
-0.591976	all of the following	-0.124939
-1.551818	one of the following	-0.124939
-0.591976	Each of the following	-0.124939
-0.564928	compiler in the following	-0.124939
-0.564928	necessary in the following	-0.124939
-0.831751	works in the following	-0.124939
-0.564928	given in the following	-0.124939
-0.564928	improved in the following	-0.124939
-0.564928	discussed in the following	-0.124939
-0.564928	interpreted in the following	-0.124939
-0.564928	evaluated in the following	-0.124939
-0.564928	(PLT) in the following	-0.124939
-0.574005	program for the following	-0.124939
-0.574005	array for the following	-0.124939
-0.574005	caching for the following	-0.124939
-0.595864	best if the following	-0.124939
-0.593000	illustrated by the following	-0.124939
-0.564447	systems have the following	-0.124939
-0.577776	position-independent has the following	-0.124939
-0.544977	goes through the following	-0.124939
-0.355200	may consider the following	-0.124939
-0.065329	compiler generates the following	-0.425969
-0.355200	Please skip the following	-0.124939
-0.458899	loops. Consider the following	-0.124939
-0.355200	2011). Instead, the following	-0.124939
-0.341957	in table The following	-0.124939
-0.551901	message function. The following	-0.124939
-0.928623	member functions. The following	-0.124939
-0.520675	very efficient. The following	-0.124939
-1.043594	instruction set. The following	-0.124939
-0.502507	newer processors. The following	-0.124939
-0.744648	the loop. The following	-0.124939
-0.502507	of 2. The following	-0.124939
-0.481443	point precision. The following	-0.124939
-0.896511	or not. The following	-0.124939
-0.442153	128-bit vectors. The following	-0.124939
-0.442153	not do. The following	-0.124939
-0.341957	breakpoint again. The following	-0.124939
-0.442153	and branches. The following	-0.124939
-0.442153	at www.agner.org/optimize/asmlib.zip. The following	-0.124939
-0.442153	further explanation. The following	-0.124939
-0.138157	is unsigned. The following	-0.124939
-0.138157	or unsigned. The following	-0.124939
-0.341957	from errors. The following	-0.124939
-0.341957	multiplications only. The following	-0.124939
-0.442153	before compilation. The following	-0.124939
-0.341957	no multiplications. The following	-0.124939
-0.341957	at runtime). The following	-0.124939
-0.341957	see shortly. The following	-0.124939
-0.341957	at Wikibooks. The following	-0.124939
-0.341957	becomes noticeable. The following	-0.124939
-0.341957	(Intel Atom). The following	-0.124939
-0.341957	not satisfactory. The following	-0.124939
-0.237951	function is InstructionSet().The following	-0.124939
-0.597681	workplace and the system	-0.124939
-1.321906	Note that the system	-0.124939
-0.589756	forgets that the system	-0.124939
-0.763037	to access the system	-0.124939
-0.358333	and therefore the system	-0.124939
-0.593146	determined by a system	-0.124939
-0.591399	threads on a system	-0.124939
-1.121796	further discussion of system	-0.124939
-0.358548	the area of system	-0.124939
-0.548419	configuration files and system	-0.124939
-0.295098	compatibility problems and system	-0.124939
-0.461721	hardware interfaces and system	-0.124939
-0.582816	time-consuming function in system	-0.124939
-0.504273	resource use in system	-0.124939
-0.358786	allocated resources. The system	-0.124939
-0.562762	are intended for system	-0.124939
-0.358596	be determined with system	-0.124939
-0.358122	screen resolutions, different system	-0.124939
-0.024051	of the operating system	-0.124939
-0.180783	to the operating system	-0.124939
-0.192295	and the operating system	-0.124939
-0.105405	in the operating system	-0.124939
-0.105405	that the operating system	-0.124939
-0.146680	by the operating system	-0.124939
-0.105405	with the operating system	-0.124939
-0.180783	between the operating system	-0.124939
-0.105405	tells the operating system	-0.124939
-0.105405	force the operating system	-0.124939
-0.363741	Choice of operating system	-0.124939
-0.053626	used. The operating system	-0.124939
-0.053626	cache. The operating system	-0.124939
-0.053626	databases. The operating system	-0.124939
-0.253543	without an operating system	-0.124939
-0.186071	query certain operating system	-0.124939
-0.186071	or Mac operating system	-0.124939
-0.186071	both compiler, operating system	-0.124939
-0.363741	OS X operating system	-0.124939
-0.363741	a protected operating system	-0.124939
-0.186071	or circumvent operating system	-0.124939
-0.489380	own error handling system	-0.124939
-0.545136	C++ exception handling system	-0.124939
-0.351162	services under advanced system	-0.124939
-0.126035	21 3.11 Other system	-0.124939
-0.126035	best. 3.11 Other system	-0.124939
-0.505814	applications are highly system	-0.124939
-0.336240	back again. Accessing system	-0.124939
-0.538725	interpreters, just-in-time compilers, system	-0.124939
-0.237837	interrupt service routines, system	-0.124939
-1.634598	one of the 32	-0.124939
-0.725840	an int is 32	-0.124939
-0.358525	type size_t is 32	-0.124939
-0.550563	four integers of 32	-0.124939
-0.463483	disadvantages compared to 32	-0.124939
-0.358676	8 bit and 32	-0.124939
-0.856812	mode than in 32	-0.124939
-1.264762	Shared objects in 32	-0.124939
-0.525734	absolute references in 32	-0.124939
-0.357374	8, 16 or 32	-0.124939
-0.357374	size (16 or 32	-0.124939
-0.550256	preferably aligned by 32	-0.124939
-0.526768	are organized as 32	-0.124939
-0.494176	systems: long int 32	-0.124939
-0.351155	128 SSE2 int 32	-0.124939
-0.351155	256 AVX int 32	-0.124939
-0.351155	256 AVX2 int 32	-0.124939
-0.351155	64 MMX int 32	-0.124939
-0.596297	bits rather than 32	-0.124939
-0.358258	float. (Both use 32	-0.124939
-0.576383	b; will make 32	-0.124939
-0.886240	used. See page 32	-0.124939
-0.565546	variable, for example 32	-0.124939
-0.657256	a 64-bit double 32	-0.124939
-0.346863	128 SSE2 float 32	-0.124939
-0.346863	256 AVX2 float 32	-0.124939
-0.346863	512 AVX512 float 32	-0.124939
-0.461490	replace j * 32	-0.124939
-0.538911	4 64 2 32	-0.124939
-0.502450	systems: unsigned long 32	-0.124939
-0.538656	4 64 4 32	-0.124939
-0.525047	16 32 8 32	-0.124939
-0.514501	SSE2 char 8 32	-0.124939
-0.341185	128 Vec2uq 8 32	-0.124939
-0.524898	11.6 64 64 32	-0.124939
-0.502526	Vec4d 16 16 32	-0.124939
-0.850144	if the AVX 32	-0.124939
-0.354873	a float uses 32	-0.124939
-0.353715	for SSE2, preferably 32	-0.124939
-0.498690	by OpenMP directives 32	-0.124939
-0.451657	bitwise operators produce 32	-0.124939
-0.348106	problem with accessing 32	-0.124939
-0.347172	/ 64) % 32	-0.124939
-0.429174	several advantages over 32	-0.124939
-0.429174	than 8, 16, 32	-0.124939
-0.237840	use the upper 32	-0.124939
-0.237840	// Get upper 32	-0.124939
-0.237633	16 SSSE3 _mm_perm_epi8 32	-0.124939
-0.237633	__INTEL_COMPILER __INTEL_COMPILER 161 32	-0.124939
-0.237633	128 Iu16vec8 Vec8us 32	-0.124939
-0.237633	and operators ...................................................................... 32	-0.124939
-0.237633	128 Is32vec4 Vec4i 32	-0.124939
-0.237633	int 64 Is32vec2 32	-0.124939
-0.237633	Important features 80386 32	-0.124939
-0.237633	int 64 Iu16vec4 32	-0.124939
-1.059336	is because the file	-0.124939
-0.589640	program before the file	-0.124939
-0.845175	makes sure the file	-0.124939
-0.762607	to access the file	-0.124939
-0.957661	to write the file	-0.124939
-0.358164	and closes the file	-0.124939
-0.659760	the bottleneck is file	-0.124939
-0.595245	access to a file	-0.124939
-0.992503	to access a file	-0.124939
-0.830988	or writing a file	-0.124939
-0.357297	or writes a file	-0.124939
-0.357297	that created a file	-0.124939
-0.357297	function opens a file	-0.124939
-0.358118	dynamic linking. The file	-0.124939
-0.658246	is closed. The file	-0.124939
-0.593784	time used for file	-0.124939
-0.358172	plain old data file	-0.124939
-0.862859	from the library file	-0.124939
-0.521234	on the object file	-0.124939
-0.521234	at the object file	-0.124939
-0.825063	use an object file	-0.124939
-0.336461	three different object file	-0.124939
-0.336461	the usual object file	-0.124939
-0.831532	C or C++ file	-0.124939
-0.502826	that have many file	-0.124939
-0.355677	disk. A big file	-0.124939
-0.473593	step. The intermediate file	-0.124939
-0.544830	to an intermediate file	-0.124939
-1.428493	in a separate file	-0.124939
-0.627627	useful to put file	-0.124939
-0.627627	advantageous to put file	-0.124939
-0.353093	in one source file	-0.124939
-0.347372	big-endian storage. Optimizing file	-0.124939
-0.552110	mirror the entire file	-0.124939
-0.143885	both the executable file	-0.124939
-0.143885	Only the executable file	-0.124939
-0.143885	Both the executable file	-0.124939
-0.183776	by an executable file	-0.124939
-0.183776	a single executable file	-0.124939
-0.119503	and the header file	-0.124939
-0.119503	use the header file	-0.124939
-0.134944	including a header file	-0.124939
-0.134944	the standard header file	-0.124939
-0.030012	the appropriate header file	-0.124939
-0.207260	requesting a map file	-0.124939
-0.091382	program. The map file	-0.124939
-0.091382	linker. The map file	-0.124939
-0.207260	-S Generate map file	-0.124939
-0.343557	protocols and standardized file	-0.124939
-0.056991	call // Header file	-0.124939
-0.056991	later // Header file	-0.124939
-0.122607	Instruction set Header file	-0.124939
-0.237780	www.agner.org/ optimize/#vectorclass Include file	-0.124939
-0.237780	make a zip file	-0.124939
-0.600753	or in the programming	-0.124939
-0.726388	the way the programming	-0.124939
-0.463539	by choosing a programming	-0.124939
-0.933591	the time of programming	-0.124939
-0.414382	the choice of programming	-0.425969
-0.180817	2.4 Choice of programming	-0.425969
-0.252151	a matter of programming	-0.726999
-0.499794	The history of programming	-0.124939
-0.652420	good deal of programming	-0.124939
-0.355189	better standardization of programming	-0.124939
-0.358846	University courses in programming	-0.124939
-0.995697	be called from programming	-0.124939
-0.540395	than in other programming	-0.124939
-0.347894	developers choose other programming	-0.124939
-0.347894	compilers. Several other programming	-0.124939
-0.347894	performance over other programming	-0.124939
-0.357989	to decide which programming	-0.124939
-0.357518	(IDE) supports multiple programming	-0.124939
-1.072828	of the C++ programming	-0.124939
-0.566038	that the software programming	-0.124939
-0.557789	between a software programming	-0.124939
-0.344571	unnecessary functions Some programming	-0.124939
-0.344571	memory allocation. Some programming	-0.124939
-0.355730	and other compiled programming	-0.124939
-0.712722	is a common programming	-0.124939
-0.404761	also a common programming	-0.124939
-0.329869	and other common programming	-0.124939
-0.354347	adhere to certain programming	-0.124939
-1.126630	that a particular programming	-0.124939
-0.456162	platforms and various programming	-0.124939
-0.325313	answers to your programming	-0.124939
-0.325313	don't send your programming	-0.124939
-0.518626	of the advanced programming	-0.124939
-0.350178	Perl. Several modern programming	-0.124939
-0.451033	not a safe programming	-0.124939
-0.011124	of object oriented programming	-0.124939
-0.034264	The object oriented programming	-0.124939
-0.034264	an object oriented programming	-0.124939
-0.034264	recommend object oriented programming	-0.124939
-0.081850	88 Object oriented programming	-0.124939
-0.441821	definitely the preferred programming	-0.124939
-0.109344	148 14.13 System programming	-0.124939
-0.109344	X. 14.13 System programming	-0.124939
-0.314542	It may catch programming	-0.124939
-0.294033	of the trivial programming	-0.124939
-0.294033	a relatively primitive programming	-0.124939
-0.237747	possible. Template meta- programming	-0.124939
-0.237747	and classes Nowadays, programming	-0.124939
-0.792149	and all the dynamic	-0.124939
-0.543057	distribute all the dynamic	-0.124939
-0.883965	to load the dynamic	-0.124939
-0.672007	function in a dynamic	-0.425969
-0.594778	compiled as a dynamic	-0.124939
-0.462338	at which a dynamic	-0.124939
-0.460988	typical uses of dynamic	-0.124939
-1.087540	The cost of dynamic	-0.124939
-0.356844	The process of dynamic	-0.124939
-0.758998	The advantages of dynamic	-0.124939
-0.933987	the costs of dynamic	-0.124939
-0.460988	The disadvantages of dynamic	-0.124939
-0.107072	both static and dynamic	-0.124939
-0.504746	dynamic libraries. The dynamic	-0.124939
-0.358733	is reserved for dynamic	-0.124939
-0.065618	(*.lib, *.a) or dynamic	-0.425969
-0.358639	first application if dynamic	-0.124939
-0.463149	errors associated with dynamic	-0.124939
-0.578711	.a), but not dynamic	-0.124939
-0.596362	linking rather than dynamic	-0.124939
-0.571176	whether to use dynamic	-0.124939
-0.992426	reason to use dynamic	-0.124939
-0.346559	class libraries use dynamic	-0.124939
-0.139591	container classes use dynamic	-0.124939
-0.139591	string classes use dynamic	-0.124939
-0.346559	as Java, use dynamic	-0.124939
-1.140514	one or more dynamic	-0.124939
-0.353053	calls it. A dynamic	-0.124939
-0.353053	each process. A dynamic	-0.124939
-0.353053	static linking. A dynamic	-0.124939
-0.574387	that can make dynamic	-0.124939
-1.567292	of the same dynamic	-0.124939
-1.451735	share the same dynamic	-0.124939
-0.357922	will make all dynamic	-0.124939
-0.586222	drawbacks of using dynamic	-0.124939
-0.561331	distributed between multiple dynamic	-0.124939
-0.581149	the cases where dynamic	-0.124939
-0.356448	or container without dynamic	-0.124939
-0.355779	the application, while dynamic	-0.124939
-0.513222	size to avoid dynamic	-0.124939
-0.853627	how to avoid dynamic	-0.124939
-0.331818	possible, and avoid dynamic	-0.124939
-0.521417	clash with another dynamic	-0.124939
-0.457731	multiple purposes. All dynamic	-0.124939
-0.730365	in a separate dynamic	-0.425969
-1.138276	{ // Make dynamic	-0.124939
-0.048374	14.11 Static versus dynamic	-0.124939
-0.325235	Container classes Whenever dynamic	-0.124939
-0.577528	includes only the part	-0.124939
-0.890296	software that is part	-0.124939
-0.550194	a parameter is part	-0.124939
-0.897307	stack is a part	-0.124939
-0.504466	how often a part	-0.124939
-0.358687	X (Darwin) are part	-0.124939
-0.358491	usually included as part	-0.124939
-0.598597	that is not part	-0.124939
-0.501039	certain that this part	-0.124939
-0.524178	measurements on this part	-0.124939
-0.561801	of time. A part	-0.124939
-0.909550	in the same part	-0.602060
-0.358051	following cases: If part	-0.124939
-0.724580	to see which part	-0.124939
-0.357988	other reasons, but part	-0.124939
-0.535884	occurs in each part	-0.124939
-0.349304	much time each part	-0.124939
-0.711975	many times each part	-0.124939
-0.935158	in a static part	-0.124939
-0.357252	not include any part	-0.124939
-0.053590	in the critical part	-0.823909
-0.530805	if the critical part	-0.124939
-0.377191	then the critical part	-0.124939
-0.408303	in a critical part	-0.124939
-0.246926	the same critical part	-0.124939
-0.074963	the most critical part	-0.970037
-0.101732	The most critical part	-0.124939
-0.720831	If you access part	-0.124939
-0.654001	is an important part	-0.124939
-0.562072	least a large part	-0.124939
-0.557977	in a small part	-0.124939
-0.458855	framework. The optimized part	-0.124939
-0.354935	support and another part	-0.124939
-0.581341	activate a particular part	-0.124939
-0.344976	the most significant part	-0.124939
-0.629564	the most time-consuming part	-0.124939
-0.343477	C++ program (or part	-0.124939
-0.020834	23; // fractional part	-0.124939
-0.020834	52; // fractional part	-0.124939
-0.020834	63; // fractional part	-0.124939
-0.294024	if the time-critical part	-0.124939
-0.237739	put the task-specific part	-0.124939
-1.295485	any of the bits	-0.124939
-0.589179	manipulate all the bits	-0.124939
-0.358616	compiler interpret the bits	-0.124939
-2.073473	the number of bits	-0.124939
-0.526365	int uses more bits	-0.124939
-0.201180	interpreting the same bits	-0.124939
-0.462441	sets all other bits	-0.124939
-0.546009	zero if all bits	-0.124939
-0.354298	by testing all bits	-0.124939
-0.346861	can set multiple bits	-0.124939
-0.346861	mask out multiple bits	-0.124939
-0.346861	can toggle multiple bits	-0.124939
-0.524269	integers of 8 bits	-0.124939
-0.223614	integers of 64 bits	-0.124939
-0.223614	vectors of 64 bits	-0.124939
-0.127815	systems and 64 bits	-0.124939
-0.127815	32 and 64 bits	-0.124939
-0.309716	can be 64 bits	-0.124939
-0.309716	which are 64 bits	-0.124939
-0.309716	entries use 64 bits	-0.124939
-0.655099	{ // test bits	-0.124939
-0.327063	int is 16 bits	-0.124939
-0.423476	integers of 16 bits	-0.124939
-0.655787	8 or 16 bits	-0.124939
-0.327063	the lower 16 bits	-0.124939
-0.358053	size_t is 32 bits	-0.124939
-0.273903	integers of 32 bits	-0.124939
-0.358053	16 or 32 bits	-0.124939
-0.273903	(Both use 32 bits	-0.124939
-0.273903	for example 32 bits	-0.124939
-0.273903	64-bit double 32 bits	-0.124939
-0.273903	float uses 32 bits	-0.124939
-0.273903	with accessing 32 bits	-0.124939
-0.115707	the upper 32 bits	-0.124939
-0.115707	Get upper 32 bits	-0.124939
-0.355356	or writing small bits	-0.124939
-0.439547	register is 128 bits	-0.124939
-0.339887	bits (MMX), 128 bits	-0.124939
-0.331769	is available, 256 bits	-0.124939
-0.331769	bits (XMM), 256 bits	-0.124939
-0.352784	least significant n bits	-0.124939
-0.323625	system, and 512 bits	-0.124939
-0.323625	soon also 512 bits	-0.124939
-0.345176	integer has enough bits	-0.124939
-0.088943	size of vector, bits	-0.124939
-0.339263	available. declaration size, bits	-0.124939
-0.434787	doubles by comparing bits	-0.124939
-0.325202	into the individual bits	-0.124939
-0.294061	sizes to 1024 bits	-0.124939
-0.023512	of each element, bits	-0.425969
-0.237772	that the remaining bits	-0.124939
-1.180469	different kinds of operations	-0.124939
-0.550396	long sequence of operations	-0.124939
-0.473606	is the vector operations	-0.124939
-0.473606	Using the vector operations	-0.124939
-0.479330	use of vector operations	-0.124939
-0.411289	11) and vector operations	-0.124939
-0.447767	105 The vector operations	-0.124939
-0.487593	problem with vector operations	-0.124939
-0.428084	to use vector operations	-0.124939
-0.251286	can use vector operations	-0.124939
-0.251286	also use vector operations	-0.124939
-0.411289	by using vector operations	-0.124939
-0.317275	the 64-bit vector operations	-0.124939
-0.317275	most efficient vector operations	-0.124939
-0.688285	12 Using vector operations	-0.124939
-0.411289	mispredictions. Boolean vector operations	-0.124939
-0.317275	processors (when vector operations	-0.124939
-1.286086	the floating point operations	-0.124939
-1.200609	of floating point operations	-0.124939
-0.192049	doing floating point operations	-0.425969
-0.544805	100 floating point operations	-0.124939
-0.566439	purposes. Floating point operations	-0.124939
-0.531892	because the integer operations	-0.124939
-0.330709	typically use integer operations	-0.124939
-0.330709	advantage because integer operations	-0.124939
-0.330709	to do integer operations	-0.124939
-0.330709	that these integer operations	-0.124939
-0.062110	14.9 Using integer operations	-0.425969
-0.330709	fast. Simple integer operations	-0.124939
-1.458902	possible to do operations	-0.124939
-0.357535	into two 64-bit operations	-0.124939
-0.722664	call and return operations	-0.124939
-0.585221	1. This makes operations	-0.124939
-0.356876	subtraction, comparison, bit operations	-0.124939
-0.460525	vectors, and these operations	-0.124939
-0.536648	do the extra operations	-0.124939
-0.432966	Integer operators Integer operations	-0.124939
-0.334647	specific size. Integer operations	-0.124939
-0.456532	43). The Boolean operations	-0.124939
-0.521167	processing, and mathematical operations	-0.124939
-0.564197	these table lookup operations	-0.124939
-0.352790	<< and | operations	-0.124939
-0.352587	splitting 256-bit read operations	-0.124939
-0.512693	operations and shift operations	-0.124939
-0.347317	waiting for disk operations	-0.124939
-0.248595	register variables. Vector operations	-0.124939
-0.248595	instruction sets. Vector operations	-0.124939
-0.248595	bits (ZMM). Vector operations	-0.124939
-0.279361	can do arithmetic operations	-0.124939
-0.364686	address. Pointer arithmetic operations	-0.124939
-0.294080	functions are primitive operations	-0.124939
-0.237788	int, float. Similar operations	-0.124939
-0.592970	bit which is 0	-0.124939
-0.526772	if one is 0	-0.124939
-0.526806	initializes x to 0	-0.124939
-0.065710	in b to 0	-0.425969
-1.133660	known to be 0	-0.124939
-1.025294	guaranteed to be 0	-0.124939
-0.577738	= i = 0	-0.124939
-0.632503	for N = 0	-0.124939
-0.104563	& ~a = 0	-0.124939
-0.087632	- a-a = 0	-0.124939
-0.087632	n.a. a-a = 0	-0.124939
-0.087632	a-(-b)=a+b a-a = 0	-0.124939
-0.139115	- a*0 = 0	-0.124939
-0.139115	n.a. a*0 = 0	-0.124939
-0.064011	a ^a = 0	-0.124939
-0.139115	as 0/a = 0	-0.124939
-0.139115	- 0/a = 0	-0.124939
-0.345029	- andnot(a,a) = 0	-0.124939
-0.049344	other value than 0	-0.602060
-0.015210	other values than 0	-0.425969
-0.355380	the integers from 0	-0.124939
-0.355380	the interval from 0	-0.124939
-0.594121	with the value 0	-0.124939
-0.502533	0; // return 0	-0.124939
-0.589499	comparisons i < 0	-0.124939
-0.706328	if (i < 0	-0.124939
-0.538111	unsigned char 8 0	-0.124939
-0.524189	long int 64 0	-0.124939
-0.558785	b is always 0	-0.124939
-0.537272	unsigned int 16 0	-0.124939
-0.356373	unsigned long 32 0	-0.124939
-0.356290	// test bits 0	-0.124939
-0.569947	a, a & 0	-0.124939
-0.355642	used by element 0	-0.124939
-0.458639	number we get 0	-0.124939
-0.353681	instruction takes typically 0	-0.124939
-0.352792	ex xn n 0	-0.124939
-0.551416	a, a | 0	-0.124939
-0.391559	x-xxx--xx a | 0	-0.124939
-0.058585	= b > 0	-0.425969
-0.305201	when bb[i] > 0	-0.124939
-0.741712	in the interval 0	-0.124939
-0.237780	a & 0= 0	-0.124939
-1.623656	size of the type	-0.124939
-0.892291	processor and the type	-0.124939
-1.281595	assume that the type	-0.124939
-1.358269	or if the type	-0.124939
-0.596845	done on the type	-0.124939
-0.589245	templates where the type	-0.124939
-0.568737	expensive, while the type	-0.124939
-0.814544	and choose the type	-0.124939
-0.461887	by specifying the type	-0.124939
-0.357551	valid. Re-interpreting the type	-0.124939
-0.540340	N elements of type	-0.124939
-0.143166	four numbers of type	-0.124939
-0.143166	eight numbers of type	-0.124939
-0.851455	program than to type	-0.124939
-0.463420	the size and type	-0.124939
-0.460973	loop is. The type	-0.124939
-0.356833	four float. The type	-0.124939
-0.356833	class declaration. The type	-0.124939
-0.460973	bits each. The type	-0.124939
-0.596074	about the function type	-0.124939
-0.654356	// Define function type	-0.124939
-0.460124	// define function type	-0.124939
-0.596372	unions rather than type	-0.124939
-0.578340	had a different type	-0.124939
-0.580509	consumption of different type	-0.124939
-1.680277	of the same type	-0.124939
-0.525722	The unsigned integer type	-0.124939
-0.874398	different for each type	-0.124939
-0.357722	arithmetics and pointer type	-0.124939
-0.357521	100000001.23456. The float type	-0.124939
-0.461522	work with any type	-0.124939
-0.357128	types The return type	-0.124939
-0.579256	than a simple type	-0.124939
-0.927017	ways of doing type	-0.124939
-0.286339	support for runtime type	-0.124939
-0.286339	not use runtime type	-0.124939
-0.286339	or require runtime type	-0.124939
-0.286339	pointer No runtime type	-0.124939
-0.352218	point instructions. Each type	-0.124939
-0.567557	for the appropriate type	-0.124939
-0.350700	an over- loaded type	-0.124939
-0.045150	of a composite type	-0.124939
-0.095545	has a composite type	-0.124939
-0.155655	parameter of composite type	-0.124939
-0.336171	about rounding. Pointer type	-0.124939
-0.160683	identification (RTTI) Runtime type	-0.124939
-0.072952	53 7.21 Runtime type	-0.124939
-0.072952	effort. 7.21 Runtime type	-0.124939
-0.325182	conversion // C-style type	-0.124939
-0.237755	static_cast<float>(i); // Implicit type	-0.124939
-0.237755	casting // Constructor-style type	-0.124939
-1.484018	This is the case	-0.124939
-1.003052	this is the case	-0.124939
-0.889432	example, in the case	-0.124939
-0.595177	cycles in the case	-0.124939
-1.067249	efficient if the case	-0.124939
-0.934283	is not the case	-0.124939
-0.503466	faster. In the case	-0.124939
-0.357819	all. In the case	-0.124939
-0.357819	occur. In the case	-0.124939
-0.357819	60. In the case	-0.124939
-0.357285	is often the case	-0.124939
-0.357285	is commonly the case	-0.124939
-1.273257	a function in case	-0.124939
-0.510315	std::unexpected() function in case	-0.124939
-0.539067	long time in case	-0.124939
-0.494479	to use in case	-0.124939
-0.709356	the program in case	-0.124939
-0.494479	the object in case	-0.124939
-0.454045	safe way in case	-0.124939
-0.245895	clean up in case	-0.124939
-0.245895	cleaned up in case	-0.124939
-0.064836	an exception in case	-0.425969
-0.516201	signed integers in case	-0.124939
-0.494479	denormal numbers in case	-0.124939
-0.351373	all operands in case	-0.124939
-0.532068	program errors in case	-0.124939
-0.351373	up everything in case	-0.124939
-0.351373	be justified in case	-0.124939
-0.358443	switch (n) { case	-0.124939
-0.432324	and in this case	-0.124939
-0.432324	but in this case	-0.124939
-0.432324	solution in this case	-0.124939
-0.432324	columns in this case	-0.124939
-0.432324	0] in this case	-0.124939
-0.351965	resources. In this case	-0.124939
-0.351965	speed. In this case	-0.124939
-0.351965	71). In this case	-0.124939
-0.351965	divisor. In this case	-0.124939
-0.656949	the worst possible case	-0.124939
-0.354786	in the likely case	-0.124939
-0.508950	repetitive. The simplest case	-0.124939
-0.781828	in the latter case	-0.124939
-0.481192	to the general case	-0.124939
-0.099666	cover the worst case	-0.425969
-0.255895	etc. The worst case	-0.124939
-0.122641	1: printf("Beta"); break; case	-0.124939
-0.122641	2: printf("Gamma"); break; case	-0.124939
-0.122641	0: printf("Alpha"); break; case	-0.124939
-0.237869	under the worst- case	-0.124939
-0.237869	in the former case	-0.124939
-0.599060	except for the cases	-0.124939
-0.577424	146). In the cases	-0.124939
-0.502286	integer size in cases	-0.124939
-0.655966	is advantageous in cases	-0.124939
-0.525164	operations automatically in cases	-0.124939
-0.784748	such errors in cases	-0.124939
-0.461153	example containers in cases	-0.124939
-0.587197	There may be cases	-0.425969
-1.169082	However, there are cases	-0.124939
-0.556613	so many different cases	-0.124939
-0.311939	can in most cases	-0.124939
-0.311939	because in most cases	-0.124939
-0.311939	advantageous in most cases	-0.124939
-0.311939	However, in most cases	-0.124939
-0.440576	implementation in most cases	-0.124939
-0.422966	cycles. In most cases	-0.124939
-0.422966	optimizations. In most cases	-0.124939
-0.503014	programming, etc. In cases	-0.124939
-1.136354	there are many cases	-0.124939
-0.498075	purity. In many cases	-0.124939
-0.357439	in all possible cases	-0.124939
-0.106011	and in some cases	-0.124939
-0.049779	may in some cases	-0.249877
-0.354755	efficient in some cases	-0.124939
-0.246555	possible in some cases	-0.124939
-0.180591	calculations. In some cases	-0.124939
-0.180591	unrolling In some cases	-0.124939
-0.180591	element. In some cases	-0.124939
-0.180591	though. In some cases	-0.124939
-0.180591	have. In some cases	-0.124939
-0.180591	mind. In some cases	-0.124939
-0.180591	44 In some cases	-0.124939
-0.180591	34. In some cases	-0.124939
-0.761117	automatically in simple cases	-0.124939
-0.355550	is best. These cases	-0.124939
-0.338797	of the few cases	-0.124939
-0.966690	are a few cases	-0.124939
-0.332807	needed. These complicated cases	-0.124939
-0.332807	set. More complicated cases	-0.124939
-0.352693	otherwise. In difficult cases	-0.124939
-0.404307	optimal in special cases	-0.124939
-0.311644	there are special cases	-0.124939
-0.126028	on. 7.31 Other cases	-0.124939
-0.126028	61 7.31 Other cases	-0.124939
-0.494127	in more complex cases	-0.124939
-0.382702	programming. 13.3 Difficult cases	-0.124939
-0.237812	in some rare cases	-0.124939
-0.562957	function. However, the short	-0.124939
-1.965814	if it is short	-0.124939
-0.599472	finishes in a short	-0.124939
-0.463132	step. With a short	-0.124939
-0.997779	a list of short	-0.124939
-0.540935	vector libraries and short	-0.124939
-0.358540	avoid macros with short	-0.124939
-0.526528	size other than short	-0.124939
-0.958565	struct S1 { short	-0.124939
-0.358177	of int. A short	-0.124939
-1.048010	speed by using short	-0.124939
-0.657482	math libraries: Intel short	-0.124939
-0.502143	Iu8vec8 16 4 short	-0.124939
-0.524224	Vec16uc 16 8 short	-0.124939
-0.480235	16 4 unsigned short	-0.124939
-0.441049	16 8 unsigned short	-0.124939
-0.341081	255 uint8_t unsigned short	-0.124939
-0.537329	16 128 SSE2 short	-0.124939
-0.720389	numbers of type short	-0.124939
-0.550890	7.21 int i; short	-0.124939
-0.550890	7.23 int i; short	-0.124939
-0.457945	unsigned 1 1 short	-0.124939
-0.456397	4 unsigned 256 short	-0.124939
-0.518306	Vec32c unsigned char short	-0.124939
-0.516813	32 256 AVX2 short	-0.124939
-0.000251	short int bb[], short	-1.079181
-0.000377	SelectAddMul(short int aa[], short	-0.903090
-0.002266	SelectAddMul_dispatch(short int aa[], short	-0.124939
-0.002266	FUNCNAME(short int aa[], short	-0.124939
-0.002266	FuncType(short int aa[], short	-0.124939
-0.059078	size Alignd ( short	-0.124939
-0.059078	arrays Alignd ( short	-0.124939
-0.059078	); Alignd ( short	-0.124939
-0.498538	8 64 MMX short	-0.124939
-0.331718	byte at 11 short	-0.124939
-0.314621	// Example 7.22 short	-0.124939
-0.294015	data types: char, short	-0.124939
-0.237731	-128 127 int8_t short	-0.124939
-0.237731	smaller sizes (char, short	-0.124939
-1.442205	result of the &	-0.124939
-0.892240	bits with the &	-0.124939
-0.594157	18, then the &	-0.124939
-0.550041	integer. But the &	-0.124939
-0.522587	a= a a &	-0.124939
-1.063383	c = a &	-0.124939
-1.128075	y = a &	-0.124939
-0.882068	b with a &	-0.124939
-0.127481	n.a. - a &	-0.124939
-0.653498	= 0 a &	-0.124939
-1.002058	= a, a &	-0.124939
-0.355732	~a&~b=~(a|b) --xxxx--- a &	-0.124939
-0.358859	change && to &	-0.124939
-0.463412	^ operator. The &	-0.124939
-0.526838	* p = &	-0.124939
-0.106764	Func(int a[], int &	-0.425969
-0.357765	multiple conditions using &	-0.124939
-0.357717	of security. b &	-0.124939
-0.294236	dest, double const &	-0.124939
-0.013567	d, __m128i const &	-0.726999
-0.294236	a, T const &	-0.124939
-0.294236	polynomial (Vec4f const &	-0.124939
-0.294236	+ (vector const &	-0.124939
-0.294236	float add_elements(__m128 const &	-0.124939
-0.294236	T max(T const &	-0.124939
-0.868185	by a single &	-0.124939
-0.354151	void FuncB (int &	-0.124939
-0.339263	N; } T &	-0.124939
-0.210161	u; if (u.i &	-0.124939
-0.210161	143 if (u.i &	-0.124939
-0.212194	u.i = (n &	-0.124939
-0.403691	{ if (n &	-0.124939
-0.102813	{ // (N &	-0.124939
-0.102813	#define N1 (N &	-0.124939
-0.538595	Day; if (Day &	-0.124939
-0.237772	p->b;} int Sum3(S3 &	-0.124939
-0.237772	2 return powN<(N &	-0.124939
-0.237772	4) | ((C &	-0.124939
-0.237772	0x0F) | ((B &	-0.124939
-0.237772	i; ... list[i &	-0.124939
-0.237772	Intel SVML v.10.3 &	-0.124939
-0.237772	Intel SVML v.10.2 &	-0.124939
-0.237772	a = OneOrTwo5[b &	-0.124939
-0.237772	x.abc = (A &	-0.124939
-0.900881	case of the simple	-0.124939
-1.407226	faster than the simple	-0.124939
-0.577247	function. In the simple	-0.124939
-1.506041	This is a simple	-0.124939
-0.852168	p is a simple	-0.124939
-0.575853	condition is a simple	-0.124939
-1.269696	object of a simple	-0.124939
-0.594948	fast in a simple	-0.124939
-1.251426	preferably be a simple	-0.124939
-0.578718	reference or a simple	-0.124939
-0.581180	algorithm if a simple	-0.124939
-0.588692	list with a simple	-0.124939
-0.581317	time-consuming than a simple	-0.124939
-0.128344	range then a simple	-0.425969
-0.353879	as calling a simple	-0.124939
-0.751815	than accessing a simple	-0.124939
-0.089249	that follows a simple	-0.124939
-0.089249	it follows a simple	-0.124939
-0.089249	pointer follows a simple	-0.124939
-0.541176	array elements of simple	-0.124939
-0.725853	response times to simple	-0.124939
-0.358530	immediate responses to simple	-0.124939
-0.358790	fast, compact, and simple	-0.124939
-0.502942	data file in simple	-0.124939
-0.313748	code automatically in simple	-0.124939
-0.313748	optimization automatically in simple	-0.124939
-0.656901	at least in simple	-0.124939
-0.356809	the same for simple	-0.124939
-0.460943	is efficient for simple	-0.124939
-0.524169	times, even for simple	-0.124939
-0.829538	response times for simple	-0.124939
-0.595173	overflow, such as simple	-0.124939
-0.545694	a time. A simple	-0.124939
-0.348035	full speed. A simple	-0.124939
-0.348035	data members. A simple	-0.124939
-0.348035	other branches. A simple	-0.124939
-0.348035	a profiler. A simple	-0.124939
-0.503848	code contains only simple	-0.124939
-0.880580	advantageous to do simple	-0.124939
-0.858585	compilers can do simple	-0.124939
-0.593054	only the most simple	-0.124939
-0.539088	select between two simple	-0.124939
-0.357492	Member pointers In simple	-0.124939
-0.357145	mainframes, and between simple	-0.124939
-0.355695	cache contentions. Use simple	-0.124939
-0.555458	problems is quite simple	-0.124939
-0.933266	compilers can reduce simple	-0.124939
-0.539528	multiplication, to mix simple	-0.124939
-0.331747	destroyed. In 50 simple	-0.124939
-0.237804	cache space. Putting simple	-0.124939
-0.879265	on using the instructions	-0.124939
-0.970706	this kind of instructions	-0.124939
-0.463345	computing i/2+r. The instructions	-0.124939
-0.578792	must rely on instructions	-0.124939
-0.487055	bits. The vector instructions	-0.124939
-0.346020	microprocessors have vector instructions	-0.124939
-0.988794	to use vector instructions	-0.124939
-0.346020	some more vector instructions	-0.124939
-0.562141	more integer vector instructions	-0.124939
-0.574328	of the different instructions	-0.124939
-0.832834	There are no instructions	-0.124939
-0.357420	The next two instructions	-0.124939
-0.357368	a pipeline where instructions	-0.124939
-0.356609	giving specific optimization instructions	-0.124939
-0.347118	set. These new instructions	-0.124939
-0.347118	keep adding new instructions	-0.124939
-0.523525	9.2. All these instructions	-0.124939
-0.355924	used, though. Some instructions	-0.124939
-0.630890	lot of extra instructions	-0.124939
-0.344196	a few extra instructions	-0.124939
-0.355841	testing single assembly instructions	-0.124939
-0.355463	critical application- specific instructions	-0.124939
-0.355454	operators are single instructions	-0.124939
-0.329816	data cache. These instructions	-0.124939
-0.329816	this problem. These instructions	-0.124939
-0.329816	table lookup. These instructions	-0.124939
-0.458940	possible. The AVX instructions	-0.124939
-0.581753	include a few instructions	-0.124939
-0.354285	sets have certain instructions	-0.124939
-0.195301	the nontemporal write instructions	-0.124939
-0.117577	of nontemporal write instructions	-0.124939
-0.117577	The nontemporal write instructions	-0.124939
-0.117577	so-called nontemporal write instructions	-0.124939
-0.353331	There are intrinsic instructions	-0.124939
-0.456406	require precision conversion instructions	-0.124939
-0.495129	other cache control instructions	-0.124939
-0.488710	CPUs can execute instructions	-0.124939
-0.343480	that supported 256-bit instructions	-0.124939
-0.343523	SSE4.2 string search instructions	-0.124939
-0.208636	optimization by executing instructions	-0.124939
-0.280018	spent on executing instructions	-0.124939
-0.208636	43 speculatively executing instructions	-0.124939
-0.339304	number of machine instructions	-0.124939
-0.336040	contains only six instructions	-0.124939
-0.325175	to define application-specific instructions	-0.124939
-0.102777	able to reorder instructions	-0.124939
-0.102777	compiler may reorder instructions	-0.124939
-0.237674	table. The 16-byte instructions	-0.124939
-0.237674	(add with carry) instructions	-0.124939
-0.237674	support the ADX instructions	-0.124939
-0.237674	queue of pending instructions	-0.124939
-0.901099	power of the processors	-0.124939
-0.598318	version on the processors	-0.124939
-0.994619	negative list of processors	-0.124939
-0.274051	first generation of processors	-0.124939
-0.390357	next generation of processors	-0.124939
-0.274051	second generation of processors	-0.124939
-0.355099	the time on processors	-0.124939
-0.355099	its time on processors	-0.124939
-0.978965	works best on processors	-0.124939
-0.354123	be avoided on processors	-0.124939
-0.462618	processors and vector processors	-0.124939
-1.136543	versions for different processors	-0.124939
-0.527194	organization for different processors	-0.124939
-0.525617	precision on most processors	-0.124939
-0.543273	counter in Intel processors	-0.124939
-0.353120	and earlier Intel processors	-0.124939
-0.525727	operation on such processors	-0.124939
-0.760186	only on some processors	-0.124939
-0.570610	way, the first processors	-0.124939
-0.467692	registers The first processors	-0.124939
-0.467692	registers. The first processors	-0.124939
-0.356058	and between simple processors	-0.124939
-0.342601	for other virtual processors	-0.124939
-0.342601	CPU. These virtual processors	-0.124939
-0.442440	4 and AMD processors	-0.124939
-0.342185	Intel processors. AMD processors	-0.124939
-0.352149	newer processors. Many processors	-0.124939
-0.351567	versions. The x86 processors	-0.124939
-0.351482	compiled for old processors	-0.124939
-0.350698	to recognize VIA processors	-0.124939
-0.350163	is that modern processors	-0.124939
-0.734996	performance on non-Intel processors	-0.124939
-0.396728	running on non-Intel processors	-0.124939
-0.304292	for all unknown processors	-0.124939
-0.304292	to handle unknown processors	-0.124939
-0.430259	number of logical processors	-0.124939
-0.046958	cores or logical processors	-0.124939
-0.229101	the even-numbered logical processors	-0.124939
-0.089883	the standard PC processors	-0.124939
-0.237780	number of physical processors	-0.124939
-0.237780	has four physical processors	-0.124939
-0.331718	Optimizing for present processors	-0.124939
-0.325152	effect on older processors	-0.124939
-0.237731	another. Therefore, micro- processors	-0.124939
-0.237731	also see emulated processors	-0.124939
-0.237731	parallel. Small lightweight processors	-0.124939
-0.237731	last time. Newer processors	-0.124939
-0.237731	same brand. Future processors	-0.124939
-1.645762	depending on the available	-0.124939
-0.358472	table lists the available	-0.124939
-0.358472	CPUs increased the available	-0.124939
-0.658953	to study the available	-0.124939
-0.546529	support and is available	-0.124939
-0.597890	and it is available	-0.124939
-0.594375	InstructionSet() function is available	-0.124939
-0.187464	C++ compiler is available	-0.425969
-0.557644	compiler, which is available	-0.124939
-0.557644	asmlib, which is available	-0.124939
-1.965663	instruction set is available	-0.124939
-0.356202	or inttypes.h is available	-0.124939
-0.356202	"express" edition is available	-0.124939
-2.073670	the number of available	-0.124939
-0.895304	expected to be available	-0.124939
-0.585482	ones that are available	-0.124939
-0.349258	multi-threaded software are available	-0.124939
-1.226031	function libraries are available	-0.124939
-0.415218	vector registers are available	-0.124939
-0.762123	stack registers are available	-0.124939
-0.415218	YMM registers are available	-0.124939
-0.950661	Vector operations are available	-0.124939
-0.532438	the calculations are available	-0.124939
-0.496030	trial versions are available	-0.124939
-0.451368	efficiency. These are available	-0.124939
-0.349258	standard tasks are available	-0.124939
-0.349258	class templates are available	-0.124939
-0.491541	interface frameworks are available	-0.124939
-0.311916	that are only available	-0.124939
-0.311916	size are only available	-0.124939
-0.769123	function is also available	-0.124939
-0.529931	option is also available	-0.124939
-0.655685	an extra register available	-0.124939
-0.586246	optimized function libraries available	-0.124939
-0.916429	floating point registers available	-0.124939
-0.488994	six integer registers available	-0.124939
-1.023201	and operating systems available	-0.124939
-0.537248	manuals are always available	-0.124939
-0.305579	of logical processors available	-0.124939
-0.432045	or logical processors available	-0.124939
-1.073833	can be made available	-0.124939
-0.350192	feature will become available	-0.124939
-0.341740	should be easily available	-0.124939
-0.339360	are various profilers available	-0.124939
-0.473532	than the largest available	-0.124939
-0.331783	LIBM library. Only available	-0.124939
-0.294089	Basic soon became available	-0.124939
-0.237796	Math Kernel Library, available	-0.124939
-0.237796	not on publicly available	-0.124939
-0.598606	a to the constant	-0.124939
-1.287588	faster if the constant	-0.124939
-1.060846	multiplying with the constant	-0.124939
-1.166179	by making the constant	-0.124939
-0.539613	register, add the constant	-0.124939
-0.556299	3.5; Here, the constant	-0.124939
-0.462294	parenthesis around the constant	-0.124939
-0.357872	compiler sees the constant	-0.124939
-0.358888	count (ArraySize) is constant	-0.124939
-0.594280	row is a constant	-0.124939
-1.251438	preferably be a constant	-0.124939
-0.381213	counter by a constant	-0.124939
-0.381213	multiplication by a constant	-0.124939
-0.252607	division by a constant	-0.425969
-0.278076	Division by a constant	-0.425969
-0.381213	Modulo by a constant	-0.124939
-0.588696	integer with a constant	-0.124939
-1.590779	to use a constant	-0.124939
-1.317162	by using a constant	-0.124939
-0.065160	by adding a constant	-0.425969
-0.353883	address plus a constant	-0.124939
-0.357868	function inlining and constant	-0.124939
-0.065670	Constant folding and constant	-0.425969
-0.358135	of n. The constant	-0.124939
-0.462629	is 0. The constant	-0.124939
-0.570160	perhaps }; // constant	-0.124939
-0.354718	b<c) Multiply by constant	-0.124939
-0.089401	- Divide by constant	-0.124939
-0.089401	add Divide by constant	-0.124939
-0.089401	---xx---x Divide by constant	-0.124939
-0.358535	are declared as constant	-0.124939
-0.358242	constant subexpression. A constant	-0.124939
-0.598578	A floating point constant	-0.124939
-0.584646	that only one constant	-0.124939
-0.587195	replace an integer constant	-0.124939
-0.357837	By giving each constant	-0.124939
-0.584306	even a single constant	-0.124939
-0.578918	the double precision constant	-0.124939
-0.353941	Example 8.24. Integer constant	-0.124939
-0.529838	order to enable constant	-0.124939
-0.339314	for any compile-time constant	-0.124939
-0.314620	stack memory. Copying constant	-0.124939
-0.023516	common subexpression elimination, constant	-0.425969
-0.595471	CPUs that are up	-0.124939
-1.292194	make the code up	-0.124939
-0.358428	is currently not up	-0.124939
-0.562059	modules that make up	-0.124939
-0.538549	is to set up	-0.124939
-0.641128	can be set up	-0.124939
-0.451620	tool can set up	-0.124939
-0.351202	unrolled loop takes up	-0.124939
-0.351202	'this' pointer takes up	-0.124939
-0.302205	branches that take up	-0.124939
-0.302205	instances that take up	-0.124939
-0.527083	This may take up	-0.124939
-0.138459	used to speed up	-0.124939
-0.138459	how to speed up	-0.124939
-0.354221	making it count up	-0.124939
-0.352860	Surprisingly, we end up	-0.124939
-0.117859	necessary to look up	-0.124939
-0.117859	needs to look up	-0.124939
-0.280127	to first look up	-0.124939
-0.280127	address. (3) look up	-0.124939
-0.351502	clock frequency goes up	-0.124939
-0.482024	computers to keep up	-0.124939
-0.299577	they always keep up	-0.124939
-0.423995	Unix systems allow up	-0.124939
-0.299549	and Mac allow up	-0.124939
-0.345080	needed for setting up	-0.124939
-0.331702	STL vector turned up	-0.124939
-0.314566	need to split up	-0.124939
-0.237767	should be split up	-0.124939
-0.056980	something to clean up	-0.124939
-0.056980	nothing to clean up	-0.124939
-0.122582	program must clean up	-0.124939
-0.403629	to be cleaned up	-0.124939
-0.212154	resources are cleaned up	-0.124939
-0.407850	will be filled up	-0.124939
-0.065758	handlers for cleaning up	-0.124939
-0.065758	of time cleaning up	-0.124939
-0.065758	prevented from cleaning up	-0.124939
-0.314504	versions tested (not up	-0.124939
-0.102792	less efficient. Splitting up	-0.124939
-0.102792	this rule. Splitting up	-0.124939
-0.382566	their CPU dispatchers up	-0.124939
-0.237715	time measurements: warm up	-0.124939
-0.237715	to _endthread() cleans up	-0.124939
-0.237715	and it fills up	-0.124939
-0.237715	chain may fill up	-0.124939
-0.237715	can be speeded up	-0.124939
-0.237715	section by summing up	-0.124939
-0.237715	in registers, totaling up	-0.124939
-0.237715	precautions for speeding up	-0.124939
-0.896083	check for the error	-0.124939
-0.569511	possible, or the error	-0.124939
-0.584067	long as the error	-0.124939
-0.593970	pipeline then the error	-0.124939
-0.358321	will trigger the error	-0.124939
-0.526213	common source of error	-0.124939
-0.561825	worse kind of error	-0.124939
-0.526516	other form of error	-0.124939
-0.463561	thought-through approach to error	-0.124939
-0.065728	7.30 Exceptions and error	-0.425969
-0.358605	use, incompatible or error	-0.124939
-0.595166	branches such as error	-0.124939
-1.154475	case of an error	-0.124939
-0.472637	mode, and an error	-0.124939
-0.526177	return with an error	-0.124939
-0.335553	up. If an error	-0.124939
-0.335553	may return an error	-0.124939
-0.434101	linker makes an error	-0.124939
-0.335553	don't need an error	-0.124939
-0.838254	will generate an error	-0.124939
-0.335553	will detect an error	-0.124939
-0.335553	dispatcher signal an error	-0.124939
-0.335553	to issue an error	-0.124939
-0.335553	will provoke an error	-0.124939
-0.335553	for issuing an error	-0.124939
-0.335553	that detects an error	-0.124939
-0.537831	accessed, and this error	-0.124939
-0.798101	can avoid this error	-0.124939
-0.558729	expensive and more error	-0.124939
-0.459744	and therefore more error	-0.124939
-0.358209	and recovering from error	-0.124939
-0.545217	exception or other error	-0.124939
-0.575294	insert any other error	-0.124939
-0.720834	a common programming error	-0.124939
-0.355720	bounds checking). An error	-0.124939
-1.116145	is a common error	-0.124939
-0.354989	overflow or another error	-0.124939
-0.134955	make your own error	-0.124939
-0.321757	prints an appropriate error	-0.124939
-0.321757	and make appropriate error	-0.124939
-0.451773	} // No error	-0.124939
-0.010294	of the residual error	-0.124939
-0.020838	until the residual error	-0.124939
-0.294089	is a minor error	-0.124939
-0.294089	reference to provoke error	-0.124939
-0.237796	handle an unrecoverable error	-0.124939
-0.358692	usability issues, and I	-0.124939
-0.584171	detection function that I	-0.124939
-0.358011	so complicated that I	-0.124939
-0.550487	obsolete. But if I	-0.124939
-0.358416	but no compiler I	-0.124939
-0.358257	into force when I	-0.124939
-0.358025	different speeds. If I	-0.124939
-0.357950	on usability, but I	-0.124939
-0.144941	of the compilers I	-0.602060
-0.108607	all the compilers I	-0.425969
-0.508973	71 The compilers I	-0.124939
-0.488515	of different compilers I	-0.124939
-0.354951	into multiple functions. I	-0.124939
-0.496406	of the examples I	-0.124939
-0.323552	| b; Here, I	-0.124939
-0.323552	& 1]; Here, I	-0.124939
-1.301765	function is called. I	-0.124939
-0.349994	matrix line size. I	-0.124939
-0.308081	don't understand it. I	-0.124939
-0.308081	and recompile it. I	-0.124939
-0.511374	lot in performance. I	-0.124939
-0.545167	code or not. I	-0.124939
-0.343509	the next element. I	-0.124939
-0.341507	and initialized arrays. I	-0.124939
-0.779137	and model number. I	-0.124939
-0.335956	destructors to call. I	-0.124939
-0.615211	a new one. I	-0.124939
-0.237722	books and manuals. I	-0.124939
-0.237722	my optimization manuals. I	-0.124939
-0.325062	system performance options. I	-0.124939
-0.325062	The reason is, I	-0.124939
-0.325062	the reductions manually. I	-0.124939
-0.314436	good as expected. I	-0.124939
-0.314436	specific model. Instead, I	-0.124939
-0.314436	-(-a) to a. I	-0.124939
-0.538367	my own research, I	-0.124939
-0.293932	principles to use. I	-0.124939
-0.382486	In this manual, I	-0.124939
-0.237658	and maintenance easier. I	-0.124939
-0.237658	with embedded microcontrollers. I	-0.124939
-0.237658	is particularly tricky. I	-0.124939
-0.237658	thousands of people. I	-0.124939
-0.237658	That being said, I	-0.124939
-0.237658	In this chapter, I	-0.124939
-0.237658	of 0x800 apart. I	-0.124939
-0.237658	quite dramatic consequences. I	-0.124939
-0.418037	useful way of making	-0.124939
-0.418037	good way of making	-0.124939
-0.418037	convenient way of making	-0.124939
-0.142672	alternative solution of making	-0.124939
-0.142672	radical solution of making	-0.124939
-0.502852	a means of making	-0.124939
-0.356561	The advice of making	-0.124939
-0.460627	is capable of making	-0.124939
-0.358828	by two and making	-0.124939
-0.571431	program code for making	-0.124939
-0.542837	be useful for making	-0.124939
-0.758160	are good for making	-0.124939
-0.653112	a feature for making	-0.124939
-0.355538	have facilities for making	-0.124939
-1.189278	if you are making	-0.124939
-1.080133	If you are making	-0.124939
-0.919993	unless you are making	-0.124939
-0.534281	precision or by making	-0.124939
-0.479794	compiler-generated code by making	-0.124939
-0.555720	it this by making	-0.124939
-0.340760	functions faster by making	-0.124939
-0.555745	one division by making	-0.124939
-0.608488	be avoided by making	-0.124939
-0.340760	faster either by making	-0.124939
-0.340760	cache misses by making	-0.124939
-0.340760	context switches by making	-0.124939
-0.340760	multiple inheritance by making	-0.124939
-0.063450	be solved by making	-0.124939
-0.340760	branch mispredictions by making	-0.124939
-0.340760	be mitigated by making	-0.124939
-0.726037	are satisfied with making	-0.124939
-0.659065	I am not making	-0.124939
-0.873520	be faster than making	-0.124939
-0.592537	object rather than making	-0.124939
-0.354786	to zero than making	-0.124939
-0.488513	prevents it from making	-0.124939
-0.250538	the compiler from making	-0.249877
-0.874813	you should avoid making	-0.124939
-0.352280	code for actually making	-0.124939
-0.346418	If you consider making	-0.124939
-0.434928	in different places making	-0.124939
-0.023519	compilers have difficulties making	-0.425969
-1.373095	the number of times	-0.124939
-0.526123	// Number of times	-0.124939
-0.357842	function billions of times	-0.124939
-0.503038	once or multiple times	-0.124939
-0.352104	one way two times	-0.124939
-0.352104	Then again two times	-0.124939
-0.311938	speed is many times	-0.124939
-0.311938	critical function many times	-0.124939
-0.311938	way, then many times	-0.124939
-0.311938	are used many times	-0.124939
-0.028747	count how many times	-0.124939
-0.059534	tell how many times	-0.124939
-0.059534	counts how many times	-0.124939
-0.311938	that goes many times	-0.124939
-0.460382	is that access times	-0.124939
-0.459632	} The execution times	-0.124939
-0.342511	software package several times	-0.124939
-0.342511	versions alternatingly several times	-0.124939
-0.354807	be reloaded eight times	-0.124939
-0.581841	break a few times	-0.124939
-0.353236	1024/4 = 256 times	-0.124939
-0.329419	way and three times	-0.124939
-0.329419	is approximately three times	-0.124939
-0.351491	is executed 10 times	-0.124939
-0.493474	up to 5 times	-0.124939
-0.678972	because the response times	-0.124939
-0.089739	unacceptably long response times	-0.301030
-0.224008	of longer response times	-0.124939
-0.346318	loop repeats 20 times	-0.124939
-0.748585	than the subsequent times	-0.124939
-0.343578	can improve search times	-0.124939
-0.341648	occur at random times	-0.124939
-0.251878	even a thousand times	-0.124939
-0.251878	repeats a thousand times	-0.124939
-0.336108	it takes six times	-0.124939
-0.331726	three to seven times	-0.124939
-0.487776	*p+2 a hundred times	-0.124939
-0.331785	library function 250 times	-0.124939
-0.473536	flow at inconvenient times	-0.124939
-0.077771	program repeats 1000 times	-0.124939
-0.077771	also repeats 1000 times	-0.124939
-0.407886	start at unpredictable times	-0.124939
-0.294024	critical function ten times	-0.124939
-0.294024	// Repeat NumberOfTests times	-0.124939
-0.237739	be a million times	-0.124939
-0.199597	memory to the stack	-0.124939
-1.316871	relative to the stack	-0.124939
-0.596350	large for the stack	-0.124939
-0.990820	than on the stack	-0.124939
-0.467622	memory on the stack	-0.124939
-0.376387	stored on the stack	-0.124939
-0.467622	parameters on the stack	-0.124939
-0.467622	space on the stack	-0.124939
-0.173697	transferred on the stack	-0.124939
-0.467622	storage on the stack	-0.124939
-0.467622	Storage on the stack	-0.124939
-0.467622	pushed on the stack	-0.124939
-0.590639	popped from the stack	-0.124939
-0.884715	function because the stack	-0.124939
-0.882031	not use a stack	-0.124939
-0.526934	setting up a stack	-0.124939
-0.129382	Other cases of stack	-0.425969
-0.430215	static memory to stack	-0.124939
-0.725096	the table to stack	-0.124939
-0.358817	Pointers, references, and stack	-0.124939
-1.326957	is stored in stack	-0.124939
-0.574435	new function. The stack	-0.124939
-0.538109	sections below. The stack	-0.124939
-0.356881	other situations: The stack	-0.124939
-0.356881	implementation dependent. The stack	-0.124939
-0.358591	save ebx on stack	-0.124939
-0.358237	restore ebx from stack	-0.124939
-0.948856	the floating point stack	-0.425969
-0.572342	The floating point stack	-0.124939
-1.301002	This is called stack	-0.124939
-0.349339	a mechanism called stack	-0.124939
-0.548761	Use the call stack	-0.124939
-0.331886	using the register stack	-0.124939
-0.331886	way the register stack	-0.124939
-0.491119	explanation of register stack	-0.124939
-0.473651	cache. The register stack	-0.124939
-1.106952	floating point register stack	-0.124939
-0.544298	Omitting the standard stack	-0.124939
-0.475481	pointer". The standard stack	-0.124939
-0.349641	handling /EHs- No stack	-0.124939
-0.237845	option for "standard stack	-0.124939
-0.805697	big arrays and want	-0.124939
-0.548459	and you may want	-0.124939
-0.548459	example, you may want	-0.124939
-0.238329	function and you want	-0.124939
-0.238329	functions and you want	-0.124939
-0.238329	efficient and you want	-0.124939
-0.238329	access and you want	-0.124939
-0.238329	handling and you want	-0.124939
-0.238329	anyway and you want	-0.124939
-0.360987	compiler that you want	-0.124939
-0.360987	set that you want	-0.124939
-0.360987	arrays that you want	-0.124939
-0.360987	statements that you want	-0.124939
-0.360987	say that you want	-0.124939
-0.590380	example if you want	-0.124939
-0.418147	example, if you want	-0.124939
-0.418147	CPUs if you want	-0.124939
-0.418147	option if you want	-0.124939
-0.418147	help if you want	-0.124939
-0.389168	of code you want	-0.124939
-0.389932	do when you want	-0.124939
-0.389932	indices when you want	-0.124939
-0.299373	a program you want	-0.124939
-0.391233	library. If you want	-0.124939
-0.391233	www.agner.org/optimize/asmlib.zip. If you want	-0.124939
-0.550946	results. If you want	-0.124939
-0.391233	macro. If you want	-0.124939
-0.391233	152 If you want	-0.124939
-0.217928	function where you want	-0.124939
-0.217928	case where you want	-0.124939
-0.732255	for example, you want	-0.124939
-0.217928	does what you want	-0.124939
-0.217928	exactly what you want	-0.124939
-0.299373	that 150 you want	-0.124939
-0.389168	Sum3. Whether you want	-0.124939
-0.547813	optimizations that we want	-0.124939
-0.332817	the function we want	-0.124939
-0.337933	but if we want	-0.124939
-0.337933	example, if we want	-0.124939
-0.489340	n∙(n-1)!. If we want	-0.124939
-0.460100	and manuals. I want	-0.124939
-0.824986	if you don't want	-0.124939
-0.329652	Intel compilers. We want	-0.124939
-0.329652	dependency chain. We want	-0.124939
-0.351268	If you just want	-0.124939
-0.349078	However, we still want	-0.124939
-0.279448	software developers who want	-0.124939
-0.279448	for those who want	-0.124939
-0.494864	in the code. Example:	-0.425969
-0.487174	pieces of code. Example:	-0.124939
-0.757795	any extra code. Example:	-0.124939
-1.088394	the same time. Example:	-0.124939
-0.630855	the called function. Example:	-0.124939
-0.344178	a pure function. Example:	-0.124939
-0.779315	stored in memory. Example:	-0.124939
-0.676881	in static memory. Example:	-0.124939
-0.954177	they are used. Example:	-0.124939
-1.301720	function is called. Example:	-0.124939
-0.350611	unroll a loop. Example:	-0.124939
-0.390978	power of 2. Example:	-0.425969
-0.349389	of consecutive variables. Example:	-0.124939
-0.821085	across function calls. Example:	-0.124939
-0.348056	versus XMM registers. Example:	-0.124939
-0.348121	a float variable. Example:	-0.124939
-0.523429	function is needed. Example:	-0.124939
-0.346168	for detailed instructions. Example:	-0.124939
-0.879111	a non-sequential order. Example:	-0.124939
-0.343502	it jumps to. Example:	-0.124939
-0.526857	check for overflow. Example:	-0.124939
-0.374416	doesn't cause overflow. Example:	-0.124939
-0.625881	the previous value. Example:	-0.124939
-0.341496	a previous branch. Example:	-0.124939
-0.335946	the same constant. Example:	-0.124939
-0.513510	poor branch prediction. Example:	-0.124939
-0.336014	the calculated result. Example:	-0.124939
-0.473270	the loop counter. Example:	-0.124939
-0.336014	additional integer counter. Example:	-0.124939
-0.331637	a single operation. Example:	-0.124939
-0.487532	iteration is finished. Example:	-0.124939
-0.331637	in different ways. Example:	-0.124939
-0.331555	of parallel execution. Example:	-0.124939
-0.429194	of array elements. Example:	-0.124939
-0.407753	it only once. Example:	-0.124939
-0.407753	registers is limited. Example:	-0.124939
-0.293922	a lookup-table static. Example:	-0.124939
-0.382475	to is known. Example:	-0.124939
-0.538351	the same thing. Example:	-0.124939
-0.293922	completely independent divisions. Example:	-0.124939
-0.293922	SSE2 or later. Example:	-0.124939
-0.237649	to all zeroes. Example:	-0.124939
-0.237649	may be undesired. Example:	-0.124939
-0.237649	32 bit offsets). Example:	-0.124939
-0.237649	the loop overhead. Example:	-0.124939
-0.237649	the members individually. Example:	-0.124939
-1.465158	most of the Gnu	-0.124939
-0.891283	case of the Gnu	-0.124939
-0.588515	similar to the Gnu	-0.124939
-1.152306	according to the Gnu	-0.124939
-0.581811	Windows and the Gnu	-0.124939
-0.581811	15.1b and the Gnu	-0.124939
-0.843270	used in the Gnu	-0.425969
-1.158915	described in the Gnu	-0.124939
-1.455441	supported by the Gnu	-0.124939
-0.575764	included with the Gnu	-0.124939
-0.852000	Comes with the Gnu	-0.124939
-0.883381	efficient than the Gnu	-0.124939
-0.573893	and only the Gnu	-0.124939
-0.567402	called, while the Gnu	-0.124939
-1.057921	to replace the Gnu	-0.124939
-0.812626	} Here, the Gnu	-0.124939
-0.396823	Microsoft, Intel and Gnu	-0.124939
-0.462323	Intel, PathScale and Gnu	-0.124939
-0.287434	CPU dispatching in Gnu	-0.602060
-0.533322	AMD CPUs. The Gnu	-0.124939
-1.003831	function calls. The Gnu	-0.124939
-0.497748	math libraries. The Gnu	-0.124939
-0.519638	32-bit version. The Gnu	-0.124939
-0.714731	automatic vectorization. The Gnu	-0.124939
-0.497748	namespace. 3. The Gnu	-0.124939
-0.353722	page 107). The Gnu	-0.124939
-0.353722	and Mac. The Gnu	-0.124939
-0.353722	versions instead. The Gnu	-0.124939
-0.463355	X #else // Gnu	-0.124939
-0.358649	with Microsoft or Gnu	-0.124939
-0.071371	MS compiler Windows Gnu	-0.602060
-0.355741	function libraries. Use Gnu	-0.124939
-0.070062	the Microsoft, Intel, Gnu	-0.425969
-0.153643	platforms. Microsoft, Intel, Gnu	-0.124939
-0.351608	operating systems. 10 Gnu	-0.124939
-0.346426	Mars PGI PathScale Gnu	-0.124939
-0.738685	only 32-bit Windows. Gnu	-0.124939
-0.172640	Gnu 32-bit -fno-builtin Gnu	-0.124939
-0.172640	64 bit -fno-builtin Gnu	-0.124939
-0.294154	1.19 13 Asmlib Gnu	-0.124939
-0.237853	for IA-32/Intel64, 2009. Gnu	-0.124939
-0.358279	details. Development time Some	-0.124939
-0.358013	Avoid unnecessary functions Some	-0.124939
-0.782561	whole program optimization Some	-0.124939
-0.871746	of function libraries Some	-0.124939
-1.316244	of the code. Some	-0.124939
-1.571921	at compile time. Some	-0.124939
-0.654589	3.12 Network access Some	-0.124939
-1.056635	in 64-bit systems. Some	-0.124939
-0.487705	64-bit operating systems. Some	-0.124939
-0.455268	on all compilers. Some	-0.124939
-0.646295	8.6 Optimization directives Some	-0.124939
-0.459046	on the compiler. Some	-0.124939
-0.538406	in a compiler. Some	-0.124939
-0.382514	with Microsoft compiler. Some	-0.124939
-0.816703	of the loop. Some	-0.124939
-0.449848	32 bit mode. Some	-0.124939
-0.348199	of 16 bytes. Some	-0.124939
-0.524245	} Loop unrolling Some	-0.124939
-0.343423	the optimal order. Some	-0.124939
-1.150005	dynamic memory allocation. Some	-0.124939
-0.343475	optimization option available. Some	-0.124939
-0.343423	of technical problems. Some	-0.124939
-0.877513	a cache line. Some	-0.124939
-0.341512	less intensive applications. Some	-0.124939
-0.862710	speed is important. Some	-0.124939
-0.331515	to avoid them. Some	-0.124939
-0.331515	doing the division. Some	-0.124939
-0.212085	rather than two. Some	-0.124939
-0.212085	will make two. Some	-0.124939
-0.325013	program starts up. Some	-0.124939
-0.929465	of programming style. Some	-0.124939
-0.325013	it can run. Some	-0.124939
-0.574945	is intended for. Some	-0.124939
-0.407883	and just-in-time compilation. Some	-0.124939
-0.314387	multidimensional array sequentially. Some	-0.124939
-0.407705	(www.intel.com/technology/itj/). 10.1 Hyperthreading Some	-0.124939
-0.314387	one works best. Some	-0.124939
-0.293885	ever used, though. Some	-0.124939
-0.237617	many different places). Some	-0.124939
-0.237617	the program logic. Some	-0.124939
-0.237617	regular time intervals. Some	-0.124939
-0.237617	in the STL. Some	-0.124939
-0.237617	the Xnu project. Some	-0.124939
-0.237617	graphics accelerator card. Some	-0.124939
-0.237617	memory pool. Alignment? Some	-0.124939
-0.237617	iterations of redesign. Some	-0.124939
-0.237617	things very stupid. Some	-0.124939
-0.237617	obeyed. Copy protection. Some	-0.124939
-0.578374	solution because of its	-0.124939
-0.460582	or each of its	-0.124939
-0.419771	uses most of its	-0.124939
-0.419771	run most of its	-0.124939
-0.419771	spends most of its	-0.124939
-0.565952	polymorphic member of its	-0.124939
-0.568618	individual bits of its	-0.124939
-1.115308	the values of its	-0.124939
-1.677384	a pointer to its	-0.124939
-0.357838	must return to its	-0.124939
-0.324163	of pointers to its	-0.124939
-0.462844	function type and its	-0.124939
-0.358304	no side-effects and its	-0.124939
-0.504251	each object in its	-0.124939
-0.358379	a framework in its	-0.124939
-0.582963	used or if its	-0.124939
-0.357394	a register if its	-0.124939
-1.013201	is replaced by its	-0.124939
-0.503417	integer constant with its	-0.124939
-0.357166	loop counter with its	-0.124939
-0.355609	the object on its	-0.124939
-0.500379	then run on its	-0.124939
-0.579877	CPU based on its	-0.124939
-0.521122	c1 other than its	-0.124939
-0.592482	supports, rather than its	-0.124939
-0.717059	is better than its	-0.124939
-0.822156	does not have its	-0.124939
-0.501149	block should have its	-0.124939
-0.462813	a pointer then its	-0.124939
-0.786923	can benefit from its	-0.124939
-0.358200	element matrix[c][r] at its	-0.124939
-0.349129	Each thread has its	-0.124939
-0.349129	linked list has its	-0.124939
-0.349129	CPU model has its	-0.124939
-0.349129	template instance has its	-0.124939
-0.503724	simplest cases, but its	-0.124939
-1.795818	to make sure its	-0.124939
-0.862258	gets information about its	-0.124939
-0.522290	give each thread its	-0.124939
-0.354989	object: (1) get its	-0.124939
-0.354673	compiler must calculate its	-0.124939
-0.453230	We cannot change its	-0.124939
-0.347298	should then handle its	-0.124939
-0.508800	order to align its	-0.124939
-0.193292	class by type-casting its	-0.124939
-0.193292	type by type-casting its	-0.124939
-0.237772	from fully utilizing its	-0.124939
-0.482306	worrying too much about	-0.124939
-0.342583	to worry much about	-0.124939
-0.124599	source of information about	-0.124939
-0.124599	doesn't have information about	-0.124939
-0.124599	for more information about	-0.124939
-0.057846	needs all information about	-0.124939
-0.057846	saved all information about	-0.124939
-0.124599	has no information about	-0.124939
-0.124599	probably without information about	-0.124939
-0.057846	the necessary information about	-0.124939
-0.057846	124 necessary information about	-0.124939
-0.124599	memory. No information about	-0.124939
-0.124599	the full information about	-0.124939
-0.124599	of added information about	-0.124939
-0.182542	the CPUID information about	-0.124939
-0.057846	which gets information about	-0.124939
-0.057846	class gets information about	-0.124939
-0.124599	compiler additional information about	-0.124939
-0.124599	has insufficient information about	-0.124939
-0.124599	has incomplete information about	-0.124939
-0.352570	you can read about	-0.124939
-1.073669	can be made about	-0.124939
-0.060890	is said here about	-0.124939
-0.347358	names. The details about	-0.124939
-0.378658	141 for details about	-0.124939
-0.265073	problems. More details about	-0.124939
-0.057002	to do something about	-0.124939
-0.343691	have to care about	-0.124939
-0.339369	obey certain rules about	-0.124939
-0.434774	and page 87 about	-0.124939
-0.429269	any specific recommendation about	-0.124939
-0.594838	See page 43 about	-0.124939
-0.325192	See page 26 about	-0.124939
-0.325257	make any assumption about	-0.124939
-0.314562	(See page 137 about	-0.124939
-0.538579	have to worry about	-0.124939
-0.294052	threads, but that's about	-0.124939
-0.294052	takes the hint about	-0.124939
-0.294052	a few comments about	-0.124939
-0.237763	contain useful discussions about	-0.124939
-0.237763	programmer hasn't thought about	-0.124939
-0.237763	get a reply about	-0.124939
-0.237763	be too worried about	-0.124939
-0.237763	"Instruction tables". Tips about	-0.124939
-0.237763	Intel processors. Details about	-0.124939
-0.237763	a considerable debate about	-0.124939
-0.872594	then it is important	-0.602060
-1.134042	because it is important	-0.124939
-1.167145	but it is important	-0.124939
-1.134042	Therefore, it is important	-0.124939
-0.541015	project, it is important	-0.124939
-0.541015	these, it is important	-0.124939
-0.541015	nature, it is important	-0.124939
-0.944750	used. It is important	-0.124939
-0.754323	pointer. It is important	-0.124939
-0.521338	variables. It is important	-0.124939
-0.754323	calculations. It is important	-0.124939
-0.521338	language. It is important	-0.124939
-0.521338	is. It is important	-0.124939
-0.521338	do. It is important	-0.124939
-0.521338	structure. It is important	-0.124939
-0.521338	decomposition. It is important	-0.124939
-0.521338	off. It is important	-0.124939
-0.556238	when performance is important	-0.124939
-0.600567	stored can be important	-0.124939
-0.358591	busy concentrating on important	-0.124939
-1.258392	as well as important	-0.124939
-0.525024	program is an important	-0.124939
-0.760647	There is an important	-0.124939
-0.358376	please install this important	-0.124939
-0.453615	system, the more important	-0.124939
-0.587813	access is more important	-0.124939
-0.540580	development are more important	-0.124939
-0.453615	is even more important	-0.124939
-0.703333	of the most important	-0.124939
-0.335959	problem. The most important	-0.124939
-0.335959	available. The most important	-0.124939
-0.335959	faster. The most important	-0.124939
-0.335959	execution. The most important	-0.124939
-0.335959	generality. The most important	-0.124939
-0.568135	function is so important	-0.124939
-0.708386	it is very important	-0.124939
-0.440839	It is very important	-0.124939
-0.440839	algorithm is very important	-0.124939
-0.853262	but is less important	-0.124939
-0.349090	has become less important	-0.124939
-0.356036	avoid them. Some important	-0.124939
-0.355757	is important. An important	-0.124939
-1.461044	It is therefore important	-0.124939
-0.541346	problem is too important	-0.124939
-0.762715	that are particularly important	-0.124939
-1.682457	if it is accessed	-0.124939
-1.295166	because it is accessed	-0.124939
-1.187318	where it is accessed	-0.124939
-0.576811	therefore it is accessed	-0.124939
-0.598370	shows. It is accessed	-0.124939
-0.512551	or object is accessed	-0.124939
-0.512551	no object is accessed	-0.124939
-0.461537	or variable is accessed	-0.124939
-0.461537	A variable is accessed	-0.124939
-0.550677	data cache and accessed	-0.124939
-0.855601	to can be accessed	-0.124939
-0.855601	example can be accessed	-0.124939
-0.577667	both can be accessed	-0.124939
-0.577667	DLL can be accessed	-0.124939
-0.855601	They can be accessed	-0.124939
-0.882988	arrays should be accessed	-0.124939
-0.823129	the data are accessed	-0.124939
-0.161049	when data are accessed	-0.124939
-0.491097	a class are accessed	-0.124939
-0.348938	child class are accessed	-0.124939
-0.588767	the objects are accessed	-0.124939
-0.296554	If objects are accessed	-0.425969
-0.129888	the elements are accessed	-0.301030
-0.216095	when elements are accessed	-0.124939
-0.564907	in registers are accessed	-0.124939
-0.786241	the arrays are accessed	-0.124939
-0.367515	when arrays are accessed	-0.124939
-0.367515	If arrays are accessed	-0.124939
-0.723529	the addresses are accessed	-0.124939
-0.198862	the rows are accessed	-0.124939
-0.342343	or structures are accessed	-0.124939
-0.063659	the diagonal are accessed	-0.124939
-0.726186	in memory or accessed	-0.124939
-1.705865	it is not accessed	-0.124939
-1.142185	function is not accessed	-0.124939
-0.562348	used, even when accessed	-0.124939
-0.309622	time. Are objects accessed	-0.124939
-0.309622	94 Are objects accessed	-0.124939
-0.860418	that has been accessed	-0.124939
-1.095291	are in fact accessed	-0.124939
-0.331849	malloc) is necessarily accessed	-0.124939
-0.556009	The speed of CPUs	-0.124939
-0.549694	new generation of CPUs	-0.124939
-0.618764	different brands of CPUs	-0.124939
-0.390157	other brands of CPUs	-0.124939
-0.561841	strlen function for CPUs	-0.124939
-0.539948	another version for CPUs	-0.124939
-1.111508	is compatible with CPUs	-0.124939
-0.459413	this function on CPUs	-0.124939
-0.558366	optimal only on CPUs	-0.124939
-0.959731	reduced performance on CPUs	-0.124939
-0.552298	have many different CPUs	-0.124939
-0.561410	on several different CPUs	-0.124939
-0.539777	cycles than other CPUs	-0.124939
-0.572287	compatible with all CPUs	-0.124939
-0.357810	supported by most CPUs	-0.124939
-0.505541	favor of Intel CPUs	-0.124939
-0.344051	profiler. For Intel CPUs	-0.124939
-0.344051	all newer Intel CPUs	-0.124939
-0.344051	on current Intel CPUs	-0.124939
-0.538779	computer with multiple CPUs	-0.124939
-0.488211	To use multiple CPUs	-0.124939
-0.346856	parallel: Using multiple CPUs	-0.124939
-0.357521	on all 64-bit CPUs	-0.124939
-0.355982	16 bytes. Some CPUs	-0.124939
-0.355656	Intel compiler. Use CPUs	-0.124939
-0.481782	VTune, for AMD CPUs	-0.124939
-0.533478	not on AMD CPUs	-0.124939
-0.352185	monitor counters Many CPUs	-0.124939
-0.351589	All modern x86 CPUs	-0.124939
-0.519138	compatibility with old CPUs	-0.124939
-0.317276	is because modern CPUs	-0.124939
-0.317276	12. Most modern CPUs	-0.124939
-0.349045	the oldest Pentium CPUs	-0.124939
-0.197919	performance on non-Intel CPUs	-0.425969
-0.239592	speed on non-Intel CPUs	-0.124939
-0.206510	dispatcher treats non-Intel CPUs	-0.124939
-0.206510	also treat non-Intel CPUs	-0.124939
-0.345035	doesn't handle current CPUs	-0.124939
-0.193331	by CPU Modern CPUs	-0.124939
-0.193331	critical resources. Modern CPUs	-0.124939
-0.193331	in parallel. Modern CPUs	-0.124939
-0.193331	and temp2. Modern CPUs	-0.124939
-0.294052	called accumulators. Current CPUs	-0.124939
-0.294052	integer division. Older CPUs	-0.124939
-0.237763	on contemporary 106 CPUs	-0.124939
-0.237763	some small low-power CPUs	-0.124939
-0.888855	version of the function.	-0.249877
-0.875072	start of the function.	-0.124939
-0.878244	parameter to the function.	-0.124939
-0.589459	transferred to the function.	-0.124939
-0.560402	return from the function.	-0.124939
-0.560402	returns from the function.	-0.124939
-0.573514	modules call the function.	-0.124939
-0.586576	defined inside the function.	-0.124939
-1.069467	out of a function.	-0.124939
-0.596781	equivalent to a function.	-0.124939
-0.354604	least one other function.	-0.124939
-0.851050	call any other function.	-0.124939
-0.581472	calling the library function.	-0.124939
-0.503522	accessed from any function.	-0.124939
-0.865055	of the member function.	-0.124939
-0.328394	as a member function.	-0.124939
-0.693395	a class member function.	-0.124939
-0.740309	a virtual member function.	-0.124939
-0.140436	of the called function.	-0.124939
-0.140436	to the called function.	-0.124939
-0.790072	of the critical function.	-0.124939
-0.751177	to the critical function.	-0.124939
-0.821985	of the new function.	-0.124939
-0.584257	isolating a single function.	-0.124939
-1.052947	of the virtual function.	-0.124939
-0.574839	of the next function.	-0.124939
-0.353377	the _mm_clflush intrinsic function.	-0.124939
-1.428447	in a separate function.	-0.124939
-0.281816	to the dispatcher function.	-0.124939
-0.281816	from the dispatcher function.	-0.124939
-0.281816	Make the dispatcher function.	-0.124939
-1.161724	to the desired function.	-0.124939
-0.655334	to the inlined function.	-0.124939
-0.404273	of an inlined function.	-0.124939
-0.547944	own error message function.	-0.124939
-0.444267	is a pure function.	-0.124939
-0.441892	of the memcpy function.	-0.124939
-0.721946	call a polymorphic function.	-0.124939
-0.483839	called a leaf function.	-0.124939
-0.331810	some examples: strlen function.	-0.124939
-0.314572	call the ReadTSC function.	-0.124939
-0.294061	registers anyway. Pure function.	-0.124939
-1.066063	use of the extra	-0.124939
-1.279381	because of the extra	-0.124939
-0.579720	must do the extra	-0.124939
-0.658671	optimize away the extra	-0.124939
-0.358331	faster despite the extra	-0.124939
-1.050091	a lot of extra	-0.124939
-0.463451	a structure. The extra	-0.124939
-0.356889	than 231. This extra	-0.124939
-0.356889	page 135). This extra	-0.124939
-0.214224	there is an extra	-0.124939
-0.811387	will have an extra	-0.124939
-0.442829	and makes an extra	-0.124939
-0.342494	or add an extra	-0.124939
-0.342494	it needs an extra	-0.124939
-0.342494	it requires an extra	-0.124939
-0.342494	software. Such an extra	-0.124939
-0.342494	it adds an extra	-0.124939
-0.526459	and make this extra	-0.124939
-0.586603	database, and other extra	-0.124939
-1.617554	There is no extra	-0.124939
-0.422244	will be no extra	-0.124939
-0.124191	code takes no extra	-0.124939
-0.124191	often takes no extra	-0.124939
-0.124191	handling takes no extra	-0.124939
-0.124191	conversion takes no extra	-0.124939
-0.232483	double take no extra	-0.124939
-0.232483	precisions take no extra	-0.124939
-0.326077	conversion generates no extra	-0.124939
-0.351260	the table takes extra	-0.124939
-0.351260	them again takes extra	-0.124939
-0.326817	doesn't take any extra	-0.124939
-0.326817	doesn't generate any extra	-0.124939
-0.026299	not produce any extra	-0.301030
-0.326817	without adding any extra	-0.124939
-0.524869	rise to some extra	-0.124939
-0.572772	has to take extra	-0.124939
-0.555156	logic may need extra	-0.124939
-0.581962	require a few extra	-0.124939
-0.354528	may actually add extra	-0.124939
-0.336233	gives an 9 extra	-0.124939
-0.331778	type identification adds extra	-0.124939
-0.314640	The compiler inserts extra	-0.124939
-1.400065	a function that does	-0.124939
-0.872924	A code that does	-0.124939
-0.758678	a loop that does	-0.124939
-0.356611	default constructor that does	-0.124939
-0.358573	code automatically or does	-0.124939
-1.457222	is that it does	-0.124939
-0.570050	advantage that it does	-0.124939
-0.354081	in fact it does	-0.124939
-0.354081	and sometimes it does	-0.124939
-0.354081	to optimization, it does	-0.124939
-0.504596	const)) Assume function does	-0.124939
-0.356862	ready-made profiler. This does	-0.124939
-0.356862	per point. This does	-0.124939
-0.820780	if the compiler does	-0.124939
-0.362454	what the compiler does	-0.124939
-0.558958	Sometimes the compiler does	-0.124939
-0.585944	integer. The compiler does	-0.124939
-0.446465	composer) This compiler does	-0.124939
-0.507466	The Microsoft compiler does	-0.124939
-0.345378	code. Each compiler does	-0.124939
-0.526388	used. However, this does	-0.124939
-0.352815	previous value. It does	-0.124939
-0.352815	syntax check. It does	-0.124939
-0.352815	pointer conversions. It does	-0.124939
-1.350255	inside the loop does	-0.124939
-0.724602	'this' pointer which does	-0.124939
-0.555771	that the pointer does	-0.124939
-0.587575	decrementing a pointer does	-0.124939
-0.348309	a specific pointer does	-0.124939
-0.357576	The IPP library does	-0.124939
-0.588383	member the object does	-0.124939
-1.440540	power of 2 does	-0.124939
-1.049990	powers of 2 does	-0.124939
-0.461377	it is long does	-0.124939
-0.459300	process or thread does	-0.124939
-0.537837	important. This manual does	-0.124939
-0.521178	that the list does	-0.124939
-0.353677	The static_cast operator does	-0.124939
-1.203919	The CPU dispatcher does	-0.124939
-0.496506	70). The programmer does	-0.124939
-0.518047	that pointer aliasing does	-0.124939
-0.935834	the other hand, does	-0.124939
-0.421128	unfortunately the unit-test does	-0.124939
-0.314552	The same argument does	-0.124939
-0.382623	} Example 14.26 does	-0.124939
-0.237755	Another function __intel_cpu_features_init_x() does	-0.124939
-0.600495	annotation in the assembly	-0.124939
-1.398038	look at the assembly	-0.124939
-0.865234	option makes the assembly	-0.124939
-0.540865	shows what the assembly	-0.124939
-1.530274	the use of assembly	-0.124939
-0.358874	easy linking to assembly	-0.124939
-0.241645	of C++ and assembly	-0.425969
-0.459360	dealt with in assembly	-0.124939
-0.758209	a pointer in assembly	-0.124939
-0.355564	and d in assembly	-0.124939
-0.459360	are allowed in assembly	-0.124939
-0.065376	Optimizing subroutines in assembly	-0.124939
-0.020717	"Optimizing subroutines in assembly	-0.301030
-1.109309	the function. The assembly	-0.124939
-0.462629	assembly output. The assembly	-0.124939
-0.837671	compiler option for assembly	-0.124939
-1.006387	optimization guide for assembly	-0.124939
-0.357465	or /Fa for assembly	-0.124939
-0.142962	compiled C++ or assembly	-0.124939
-0.142962	C, C++ or assembly	-0.124939
-0.535071	map or an assembly	-0.124939
-0.568481	doesn't have an assembly	-0.124939
-0.778077	to generate an assembly	-0.124939
-0.856376	need to use assembly	-0.124939
-0.856376	necessary to use assembly	-0.124939
-0.348974	round function using assembly	-0.124939
-0.583466	than by using assembly	-0.124939
-0.348974	highly optimized, using assembly	-0.124939
-0.544829	drivers may need assembly	-0.124939
-0.347464	Other compilers need assembly	-0.124939
-0.188765	generates the following assembly	-0.425969
-0.355702	do this: Use assembly	-0.124939
-0.355571	for testing single assembly	-0.124939
-0.292393	support for inline assembly	-0.124939
-0.380600	with an inline assembly	-0.124939
-0.292393	to use inline assembly	-0.124939
-0.292393	the same inline assembly	-0.124939
-0.292393	intrinsic functions, inline assembly	-0.124939
-0.314686	and understand compiler-generated assembly	-0.124939
-0.314620	-openmp -static Generate assembly	-0.124939
-0.237812	of instruction timing, assembly	-0.124939
-0.237812	straightforward. The MASM assembly	-0.124939
-1.296167	because of the large	-0.124939
-0.570254	you avoid the large	-0.124939
-1.324651	the object is large	-0.124939
-0.540246	the list is large	-0.124939
-1.226816	repeat count is large	-0.124939
-0.880059	There is a large	-0.124939
-0.597370	complicated in a large	-0.124939
-0.584239	But if a large	-0.124939
-0.592850	appear as a large	-0.124939
-0.356369	when copying a large	-0.124939
-0.356369	at least a large	-0.124939
-0.501441	must install a large	-0.124939
-0.356369	can incur a large	-0.124939
-1.555203	in case of large	-0.124939
-0.463138	many allocations of large	-0.124939
-0.839036	all data in large	-0.124939
-0.143230	Cache contentions in large	-0.425969
-0.357443	quite inefficient in large	-0.124939
-1.240701	be useful for large	-0.124939
-0.825784	highly optimized for large	-0.124939
-0.463193	mathematical applications with large	-0.124939
-0.540952	doing calculations on large	-0.124939
-0.358542	2 Gbytes. This large	-0.124939
-0.550025	Applications that use large	-0.124939
-0.459487	memory block. A large	-0.124939
-0.355663	mentioned here: A large	-0.124939
-0.503010	counters, etc. In large	-0.124939
-0.568108	variables is so large	-0.124939
-0.526762	library is very large	-0.124939
-0.534244	of a very large	-0.124939
-0.379602	for a very large	-0.124939
-0.534244	with a very large	-0.124939
-0.379602	as a very large	-0.124939
-0.227762	only for very large	-0.124939
-0.227762	dramatically for very large	-0.124939
-0.355695	test server. Use large	-0.124939
-0.522332	program has several large	-0.124939
-0.521986	caches and cause large	-0.124939
-0.496474	that are too large	-0.124939
-0.442511	choose to align large	-0.124939
-0.548428	compilers will align large	-0.124939
-0.044137	arrays are sufficiently large	-0.425969
-0.314610	that can skip large	-0.124939
-0.879100	means that a must	-0.124939
-0.463385	runtime framework that must	-0.124939
-0.593877	up then it must	-0.124939
-1.360821	that the function must	-0.124939
-0.461637	The calling function must	-0.124939
-0.599075	Here, the code must	-0.124939
-1.145925	because the compiler must	-0.124939
-0.586764	list, the compiler must	-0.124939
-0.540561	reading of x must	-0.124939
-0.588187	not, then you must	-0.124939
-0.350333	first. However, you must	-0.124939
-0.350333	of problems you must	-0.124939
-0.642844	other words, you must	-0.124939
-0.350333	specific purpose, you must	-0.124939
-0.358285	do manually. It must	-0.124939
-0.597064	words, the program must	-0.124939
-0.458302	difficult. The functions must	-0.124939
-0.545351	140). Mathematical functions must	-0.124939
-0.658193	The MOVNTQ instruction must	-0.124939
-0.357950	it is, but must	-0.124939
-0.549240	The container class must	-0.124939
-0.462136	compilers cannot do must	-0.124939
-1.073363	value of i must	-0.124939
-0.583607	that an object must	-0.124939
-0.580981	in the array must	-0.124939
-0.461529	models. However, we must	-0.124939
-0.461252	the loop branch must	-0.124939
-0.721807	the carry bit must	-0.124939
-1.028089	that the user must	-0.124939
-0.554489	costly because they must	-0.124939
-0.356045	being said, I must	-0.124939
-0.565018	between multiple threads must	-0.124939
-0.354407	The inequality sign must	-0.124939
-0.353736	cache. Multithreaded programs must	-0.124939
-0.352849	possible performance. We must	-0.124939
-0.646213	user interface framework must	-0.124939
-0.351764	128-bit XMM vectors must	-0.124939
-0.547069	Any copy constructor must	-0.124939
-0.350150	rounding 137 errors must	-0.124939
-0.350072	i. This index must	-0.124939
-0.348809	mouse. This task must	-0.124939
-0.331725	8; // SIZE must	-0.124939
-0.293932	vectorize. The pragmas must	-0.124939
-0.293932	implemented. The recursion must	-0.124939
-0.382486	destructor, if any, must	-0.124939
-0.237658	module for correctness must	-0.124939
-0.600779	purpose of the while	-0.124939
-0.600443	zero in the while	-0.124939
-0.557043	n, including the while	-0.124939
-0.358421	to emulate the while	-0.124939
-0.835106	the program, and while	-0.124939
-0.580281	loop would be while	-0.124939
-1.463832	at compile time while	-0.124939
-0.585060	processor can do while	-0.124939
-0.596085	seconds = 0; while	-0.124939
-0.567871	uses 32 bits while	-0.124939
-0.104268	can do calculations while	-0.124939
-0.457551	writing data files while	-0.124939
-0.353839	in all cases, while	-0.124939
-0.351011	a frame function, while	-0.124939
-0.553021	restart the computer while	-0.124939
-0.350729	or no overhead while	-0.124939
-0.517581	compiler for Windows, while	-0.124939
-0.347283	can modify x, while	-0.124939
-0.632308	registers are used, while	-0.124939
-0.341476	old Pentium 4, while	-0.124939
-0.621178	in one vector, while	-0.124939
-1.000813	function is called, while	-0.124939
-0.339088	c are integers, while	-0.124939
-0.339088	or program size, while	-0.124939
-0.339088	if and compile-time while	-0.124939
-0.335997	and press break while	-0.124939
-0.506685	y = 1.0; while	-0.124939
-0.331621	is running on, while	-0.124939
-0.331621	// do nothing while	-0.124939
-0.685219	between multiple threads, while	-0.124939
-0.538318	*p = string; while	-0.124939
-0.293904	expressions as arguments while	-0.124939
-0.293904	resume after exceptions: while	-0.124939
-0.382452	16 bits wide, while	-0.124939
-0.293904	done only once, while	-0.124939
-0.237633	has been incremented, while	-0.124939
-0.237633	is relatively expensive, while	-0.124939
-0.237633	1's is unchanged, while	-0.124939
-0.237633	register for both, while	-0.124939
-0.237633	simple regular pattern, while	-0.124939
-0.237633	call to Func1, while	-0.124939
-0.237633	generality and flexibility, while	-0.124939
-0.237633	separated by semicolons, while	-0.124939
-0.237633	by the application, while	-0.124939
-0.237633	15.1c as intended, while	-0.124939
-0.895878	end of a ;	-0.124939
-0.593738	ecx = a ;	-0.124939
-0.358197	;startofFunc ; a ;	-0.124939
-0.570427	r points to ;	-0.124939
-0.049651	top of loop ;	-0.124939
-0.977636	bit of i ;	-0.124939
-0.529827	end of array ;	-0.124939
-0.351407	result in array ;	-0.124939
-0.461258	stack ; return ;	-0.124939
-0.502394	divide by 2 ;	-0.124939
-0.460976	align by 4 ;	-0.124939
-0.344506	ebx on stack ;	-0.124939
-0.344506	ebx from stack ;	-0.124939
-0.773594	mangled function name ;	-0.124939
-0.351534	a ; r ;	-0.124939
-0.347224	loop if true ;	-0.124939
-0.299518	i/2 in ebx ;	-0.124939
-0.299518	100 $B1$2 ebx ;	-0.124939
-0.447506	return ; align ;	-0.124939
-0.089876	; unused label ;	-0.124939
-0.339364	calculations: for ( ;	-0.124939
-0.339364	ecx = Induction ;	-0.124939
-0.421129	$B2$3: ret ALIGN ;	-0.124939
-0.308536	a[i] = Induction; ;	-0.124939
-0.308536	a[i+1] = Induction; ;	-0.124939
-0.314465	double Func1(double) pure_function ;	-0.124939
-0.443976	8 + esp ;	-0.124939
-0.293959	$B2$2 ; Induction++; ;	-0.124939
-0.237682	edx, eax $B2$2 ;	-0.124939
-0.237682	point to a[i+2] ;	-0.124939
-0.237682	?Func@@YAXQAHAAH@Z ?Func@@YAXQAHAAH@Z PROCNEAR ;	-0.124939
-0.237682	8.26a (32-bit mode): ;	-0.124939
-0.237682	the variable 85 ;	-0.124939
-0.237682	?Func@@YAXQAHAAH@Z PROC NEAR ;	-0.124939
-0.237682	from example 8.26b: ;	-0.124939
-0.237682	label ;eax=addressofa ;edx=addressinr ;	-0.124939
-0.237682	i + sign(i) ;	-0.124939
-0.237682	+ esp ;alignby4 ;	-0.124939
-0.237682	Func ;a ;r ;	-0.124939
-0.237682	; i++ ;checkifi<100 ;	-0.124939
-0.237682	DWORD PTR [esp+12] ;	-0.124939
-0.237682	function name ;startofFunc ;	-0.124939
-0.877181	compiler that the arrays	-0.124939
-1.153761	require that the arrays	-0.124939
-0.596441	disadvantage when the arrays	-0.124939
-0.594802	7. If the arrays	-0.124939
-0.597258	make sure the arrays	-0.425969
-1.091357	of making the arrays	-0.124939
-0.482164	efficient whether the arrays	-0.124939
-0.482164	sure whether the arrays	-0.124939
-0.357551	can align the arrays	-0.124939
-1.516671	the use of arrays	-0.124939
-1.182368	a number of arrays	-0.124939
-0.561735	contains examples of arrays	-0.124939
-0.462265	Specifies alignment of arrays	-0.124939
-0.550246	always apply to arrays	-0.124939
-0.562326	advice applies to arrays	-0.124939
-0.526353	large objects and arrays	-0.124939
-0.358293	allocation Objects and arrays	-0.124939
-0.527013	occurs, even for arrays	-0.124939
-0.762955	more efficient when arrays	-0.124939
-1.000981	you can make arrays	-0.124939
-0.585892	spaces for different arrays	-0.124939
-0.358056	vector. 6. If arrays	-0.124939
-0.503368	and fixed size arrays	-0.124939
-0.498690	want as static arrays	-0.124939
-0.352104	align large static arrays	-0.124939
-0.444613	case of large arrays	-0.124939
-0.343910	has several large arrays	-0.124939
-0.343241	recommended that big arrays	-0.124939
-0.735142	you have big arrays	-0.124939
-1.021207	or a few arrays	-0.124939
-0.558930	is to replace arrays	-0.124939
-0.326865	to make aligned arrays	-0.124939
-0.326865	Make three aligned arrays	-0.124939
-0.643631	static or global arrays	-0.124939
-0.525022	optimized for accessing arrays	-0.124939
-0.441741	for simple variables, arrays	-0.124939
-0.314641	strings in character arrays	-0.124939
-0.314552	RAM memory. Big arrays	-0.124939
-0.294043	Example 12.5. Aligned arrays	-0.124939
-0.294043	to allocate variable-size arrays	-0.124939
-0.294043	Linux) 4. Align arrays	-0.124939
-0.237755	Copying or clearing arrays	-0.124939
-0.237755	very inefficient. Linear arrays	-0.124939
-0.237755	writing data. Multidimensional arrays	-0.124939
-0.597242	later and the work	-0.124939
-1.641025	depending on the work	-0.124939
-0.579769	increased when the work	-0.124939
-0.579769	decreased when the work	-0.124939
-0.270672	to divide the work	-0.301030
-0.577506	equal amount of work	-0.124939
-0.568138	the function to work	-0.124939
-0.586974	Windows compiler to work	-0.124939
-0.872269	a way to work	-0.124939
-0.812441	is sure to work	-0.124939
-1.733527	is likely to work	-0.124939
-0.502065	are intended to work	-0.124939
-0.356816	is impossible to work	-0.124939
-0.358766	82 Keywords that work	-0.124939
-0.550542	and make it work	-0.124939
-0.584069	pointers may not work	-0.124939
-0.587754	this does not work	-0.124939
-0.463022	systems, this may work	-0.124939
-0.526395	to make this work	-0.124939
-0.566217	This code will work	-0.124939
-0.355613	processor model will work	-0.124939
-0.657970	there is other work	-0.124939
-0.357945	math functions should work	-0.124939
-0.357454	These methods also work	-0.124939
-0.850259	do not always work	-0.124939
-0.356348	the trivial programming work	-0.124939
-0.780792	of the extra work	-0.124939
-0.459700	all code versions work	-0.124939
-0.816931	that it doesn't work	-0.124939
-0.611001	but it doesn't work	-0.124939
-0.308039	this method doesn't work	-0.124939
-0.308039	above line doesn't work	-0.124939
-0.308039	if above doesn't work	-0.124939
-0.353261	the next model work	-0.124939
-0.790268	the software development work	-0.124939
-0.326807	the Gnu directives work	-0.124939
-0.326807	the Microsoft directives work	-0.124939
-0.347312	do as little work	-0.124939
-0.634995	objects in BSD work	-0.124939
-0.341675	it some heavy work	-0.124939
-0.325192	about how caches work	-0.124939
-0.314562	be deleted. User work	-0.124939
-0.237763	for the reinstallation work	-0.124939
-0.570409	p points to (see	-0.124939
-0.896762	of the compiler (see	-0.124939
-0.657717	the preceding one (see	-0.124939
-0.550891	the data cache (see	-0.124939
-0.457211	have no cache (see	-0.124939
-0.803156	the derived class (see	-0.124939
-1.060517	float and double (see	-0.124939
-0.786234	a smart pointer (see	-0.124939
-1.010387	are less efficient (see	-0.124939
-0.564942	and induction variables (see	-0.124939
-1.161444	in a register (see	-0.124939
-0.977727	stored in registers (see	-0.124939
-1.188189	the XMM registers (see	-0.124939
-0.546553	divisible by 16 (see	-0.124939
-1.638130	the operating system (see	-0.124939
-0.536796	use vector instructions (see	-0.124939
-0.654250	on non-Intel processors (see	-0.124939
-0.579996	with a constant (see	-0.124939
-1.498616	on the stack (see	-0.124939
-0.355592	checks where necessary (see	-0.124939
-0.833419	with unsigned integers (see	-0.124939
-0.458837	floating point precision (see	-0.124939
-1.011416	a linked list (see	-0.124939
-0.822318	0 or 1 (see	-0.124939
-0.650263	clock cycle counter (see	-0.124939
-0.816409	floating point expressions (see	-0.124939
-1.019829	out of range (see	-0.124939
-0.350220	vectorized as intended (see	-0.124939
-0.284126	and automatic vectorization (see	-0.124939
-0.284126	intrinsics, automatic vectorization (see	-0.124939
-0.513457	// Bounds checking (see	-0.124939
-0.344894	to using templates (see	-0.124939
-1.209893	the critical stride (see	-0.124939
-0.536823	down dependency chains (see	-0.124939
-0.343400	be quite time-consuming (see	-0.124939
-0.700477	no pointer aliasing (see	-0.124939
-0.287372	rule out aliasing (see	-0.124939
-0.501825	no branch prediction (see	-0.124939
-0.341486	If a profiling (see	-0.124939
-0.821753	with out-of-order capabilities (see	-0.124939
-0.510402	than the throughput (see	-0.124939
-0.314416	case of mispredictions (see	-0.124939
-0.293913	to be profitable (see	-0.124939
-0.237641	and automatic CPU-dispatching (see	-0.124939
-0.237641	do the devirtualization (see	-0.124939
-0.596505	alternative is the Windows	-0.124939
-0.600638	database in the Windows	-0.124939
-0.577256	name. In the Windows	-0.124939
-0.591467	directive for a Windows	-0.124939
-0.463182	once made a Windows	-0.124939
-0.702104	64-bit Linux and Windows	-0.124939
-0.490043	supports Linux and Windows	-0.124939
-0.356953	Windows 7 and Windows	-0.124939
-0.461126	systems DOS and Windows	-0.124939
-0.356953	Library (ATL) and Windows	-0.124939
-0.834261	is supported in Windows	-0.124939
-0.358393	interface (OnIdle in Windows	-0.124939
-0.462640	best performance. The Windows	-0.124939
-0.358144	and BSD. The Windows	-0.124939
-0.700906	Intel compiler for Windows	-0.124939
-0.489307	Microsoft compiler for Windows	-0.124939
-0.654368	Intel compilers for Windows	-0.124939
-0.460132	interface library for Windows	-0.124939
-1.120341	when compiling for Windows	-0.124939
-0.358694	int BigArray[1024]; // Windows	-0.124939
-0.358584	some cases on Windows	-0.124939
-0.499263	compiler Intel compiler Windows	-0.124939
-0.336133	Linux Intel compiler Windows	-0.425969
-0.088380	n.a. MS compiler Windows	-0.124939
-0.041946	optimization MS compiler Windows	-0.425969
-0.479165	disadvantage of 64-bit Windows	-0.124939
-1.061523	32-bit and 64-bit Windows	-0.124939
-0.765935	32- and 64-bit Windows	-0.124939
-1.015119	than in 64-bit Windows	-0.124939
-0.325697	efficient than 64-bit Windows	-0.124939
-0.325697	more efficient. 64-bit Windows	-0.124939
-0.325697	are different. 64-bit Windows	-0.124939
-0.483068	Linux and 32-bit Windows	-0.124939
-0.577292	Sum3 in 32-bit Windows	-0.124939
-0.552594	intended for 32-bit Windows	-0.124939
-1.250472	in 64 bit Windows	-0.124939
-0.392723	work in both Windows	-0.124939
-0.392723	syntax in both Windows	-0.124939
-0.331778	user input. (In Windows	-0.124939
-0.421240	64-bit Linux, BSD, Windows	-0.124939
-0.314640	now discontinued Object Windows	-0.124939
-0.382725	_LP64 _WIN64 _LP64 Windows	-0.124939
-0.237829	the current position. Windows	-0.124939
-0.577383	in between the calls	-0.124939
-1.161144	to avoid the calls	-0.124939
-2.073604	the number of calls	-0.124939
-0.557527	20 times and calls	-0.124939
-1.156469	a function that calls	-0.124939
-0.870749	A function that calls	-0.124939
-0.822206	the program that calls	-0.124939
-0.671438	test program that calls	-0.124939
-0.355908	each statement that calls	-0.124939
-0.449549	number of function calls	-0.124939
-0.449549	chain of function calls	-0.124939
-0.149062	branches and function calls	-0.124939
-0.340508	spent on function calls	-0.124939
-0.440329	can make function calls	-0.124939
-0.684968	with many function calls	-0.124939
-0.137704	This makes function calls	-0.124939
-0.137704	frame makes function calls	-0.124939
-0.340508	the extra function calls	-0.124939
-0.544162	of virtual function calls	-0.124939
-0.518926	contain pure function calls	-0.124939
-0.340508	well. Even function calls	-0.124939
-0.500411	a dispatched function calls	-0.124939
-0.340508	the 61 function calls	-0.124939
-0.340508	Avoid nested function calls	-0.124939
-0.358598	such loops by calls	-0.124939
-0.568309	message and then calls	-0.124939
-0.355587	function which then calls	-0.124939
-0.462275	program contains no calls	-0.124939
-0.814692	program has many calls	-0.124939
-0.356304	call statement always calls	-0.124939
-0.356305	determined with system calls	-0.124939
-0.354945	7.14 Functions Function calls	-0.124939
-0.717479	with AVX support calls	-0.124939
-0.871275	a program contains calls	-0.124939
-0.352876	described below. Make calls	-0.124939
-0.350307	which in turn calls	-0.124939
-0.382748	However, if F1 calls	-0.124939
-0.294145	necessary. If F1 calls	-0.124939
-0.607166	in example 16.2 calls	-0.124939
-0.331727	a "function". Multiple calls	-0.124939
-0.458533	loaded, the loader calls	-0.124939
-0.325222	an error handler calls	-0.124939
-0.314591	use standard API calls	-0.124939
-0.237788	number of jumps, calls	-0.124939
-0.598929	inputs to the calculations	-0.124939
-1.563460	depends on the calculations	-0.124939
-0.549578	propagate through the calculations	-0.124939
-0.725006	to overlap the calculations	-0.124939
-0.725006	has finished the calculations	-0.124939
-0.658343	and redo the calculations	-0.124939
-0.115826	the sequence of calculations	-0.124939
-0.390613	a sequence of calculations	-0.124939
-0.463423	of error. The calculations	-0.124939
-0.858371	branch depends on calculations	-0.124939
-0.561714	operations with other calculations	-0.124939
-1.133752	the floating point calculations	-0.124939
-0.963433	and floating point calculations	-0.124939
-0.916195	for floating point calculations	-0.124939
-0.184162	does floating point calculations	-0.124939
-0.510345	contains floating point calculations	-0.124939
-0.510345	fast floating point calculations	-0.124939
-0.510345	strict floating point calculations	-0.124939
-0.555671	double Floating point calculations	-0.124939
-0.503562	do simple integer calculations	-0.124939
-0.871934	takes to do calculations	-0.124939
-0.518850	CPU can do calculations	-0.124939
-0.750078	thread can do calculations	-0.124939
-0.168718	and doing multiple calculations	-0.124939
-0.168718	from doing multiple calculations	-0.124939
-0.168718	CPU doing multiple calculations	-0.124939
-0.357253	for doing some calculations	-0.124939
-0.349342	but these address calculations	-0.124939
-0.349342	the runtime address calculations	-0.124939
-0.552581	do the necessary calculations	-0.124939
-0.497577	that double precision calculations	-0.124939
-0.497577	cases, double precision calculations	-0.124939
-0.458196	useful when doing calculations	-0.124939
-0.354333	stack are: All calculations	-0.124939
-0.457860	sure that certain calculations	-0.124939
-0.354286	consider if intermediate calculations	-0.124939
-0.429461	for common mathematical calculations	-0.124939
-0.331850	to mix mathematical calculations	-0.124939
-0.538797	CPU to start calculations	-0.124939
-0.346291	for doing parallel calculations	-0.124939
-0.341672	the heavy background calculations	-0.124939
-0.341733	if pointer arithmetic calculations	-0.124939
-0.438831	put time- consuming calculations	-0.124939
-0.761033	that all code versions	-0.124939
-0.357051	make multiple code versions	-0.124939
-0.587906	on Intel compiler versions	-0.124939
-0.356580	The following compiler versions	-0.124939
-0.670892	two or more versions	-0.425969
-0.381557	to the different versions	-0.124939
-0.381557	test the different versions	-0.124939
-0.381557	contain the different versions	-0.124939
-0.518357	available in different versions	-0.124939
-0.295290	functions The different versions	-0.124939
-0.295290	sets. The different versions	-0.124939
-0.330495	or if different versions	-0.124939
-0.518357	compatible with different versions	-0.124939
-0.330495	stub. If different versions	-0.124939
-0.761949	have two different versions	-0.124939
-0.357603	to get library versions	-0.124939
-0.096987	code in multiple versions	-0.602060
-0.139831	program in multiple versions	-0.124939
-0.139831	compiled in multiple versions	-0.124939
-0.227226	may make multiple versions	-0.124939
-0.227226	will make multiple versions	-0.124939
-0.316345	for making multiple versions	-0.124939
-0.316345	automatically generate multiple versions	-0.124939
-0.789288	there are two versions	-0.124939
-0.448320	to make two versions	-0.124939
-0.509601	distinguish these two versions	-0.124939
-0.356520	to advertise new versions	-0.124939
-0.356004	10.1 Hyperthreading Some versions	-0.124939
-0.523820	of the compiled versions	-0.124939
-0.459345	functions have several versions	-0.124939
-0.355213	Currently includes optimized versions	-0.124939
-0.352778	and all three versions	-0.124939
-0.348945	to make special versions	-0.124939
-0.346279	by 16. Library versions	-0.124939
-0.345095	available in newer versions	-0.124939
-0.339393	3 The latest versions	-0.124939
-0.294089	to the CPU-specific versions	-0.124939
-0.048375	Intel CPUs. New versions	-0.124939
-0.048375	AMD CPUs. New versions	-0.124939
-0.294089	used as command-line versions	-0.124939
-0.294089	where necessary. Fast versions	-0.124939
-0.237796	IDE. Free trial versions	-0.124939
-0.600295	throughput of the execution	-0.124939
-0.893447	effect on the execution	-0.124939
-0.143066	can block the execution	-0.124939
-0.143066	possibly block the execution	-0.124939
-0.556271	counts give the execution	-0.124939
-0.572369	not improve the execution	-0.124939
-0.065668	slow down the execution	-0.124939
-0.586394	in terms of execution	-0.425969
-0.358849	in relation to execution	-0.124939
-0.549956	as cache and execution	-0.124939
-0.358288	program compactness, and execution	-0.124939
-1.489197	} } The execution	-0.124939
-0.503860	memory access. The execution	-0.124939
-0.562687	are optimized for execution	-0.124939
-0.389529	Use CPUs with execution	-0.124939
-0.389529	Older CPUs with execution	-0.124939
-0.586841	throughput of an execution	-0.124939
-0.462896	size often have execution	-0.124939
-0.501531	Delays in program execution	-0.124939
-0.458830	grows during program execution	-0.124939
-0.841332	between the different execution	-0.124939
-0.501302	operations use different execution	-0.124939
-0.598544	native floating point execution	-0.124939
-0.357744	CPUs. Half size execution	-0.124939
-0.357512	fact only 64-bit execution	-0.124939
-0.357406	be used where execution	-0.124939
-0.105556	Out of order execution	-0.124939
-0.355787	and flexibility, while execution	-0.124939
-0.355509	split between several execution	-0.124939
-0.457853	good for optimizing execution	-0.124939
-0.750694	most of their execution	-0.124939
-0.257857	because the out-of-order execution	-0.124939
-0.257857	general, the out-of-order execution	-0.124939
-0.277356	thanks to out-of-order execution	-0.124939
-0.650304	to the total execution	-0.124939
-0.408510	on the total execution	-0.124939
-0.347393	the full 128-bit execution	-0.124939
-0.347369	switch occurs during execution	-0.124939
-0.341657	critical. The fastest execution	-0.124939
-0.314542	method that delays execution	-0.124939
-0.237747	CPUs with full-size execution	-0.124939
-0.237747	multiple threads. Out-of-order execution	-0.124939
-0.587089	thing is to avoid	-0.124939
-1.526419	you have to avoid	-0.124939
-0.745827	be used to avoid	-0.124939
-0.351474	fixed size to avoid	-0.124939
-1.848964	is possible to avoid	-0.124939
-1.156044	in order to avoid	-0.124939
-1.198397	best way to avoid	-0.124939
-1.037453	of how to avoid	-0.124939
-0.732629	know how to avoid	-0.124939
-0.508519	programming, how to avoid	-0.124939
-0.813883	operating system to avoid	-0.124939
-0.351474	same type to avoid	-0.124939
-1.680706	you want to avoid	-0.124939
-1.518519	be able to avoid	-0.124939
-0.236993	various ways to avoid	-0.124939
-0.645085	processor models to avoid	-0.124939
-0.351474	test situations to avoid	-0.124939
-0.454174	time measurements to avoid	-0.124939
-0.351474	completely unrolled to avoid	-0.124939
-0.884800	if possible, and avoid	-0.124939
-0.846359	if you can avoid	-0.124939
-1.063589	then you can avoid	-0.124939
-0.509888	because you can avoid	-0.124939
-0.509888	If you can avoid	-0.124939
-0.837156	then we can avoid	-0.124939
-0.479456	time. You can avoid	-0.124939
-0.479456	twice. You can avoid	-0.124939
-0.479456	entry. You can avoid	-0.124939
-1.567538	The compiler may avoid	-0.124939
-0.550675	used. You may avoid	-0.124939
-0.550675	classes. You may avoid	-0.124939
-0.589801	devices if you avoid	-0.124939
-0.931150	long as you avoid	-0.124939
-0.202404	Therefore, you should avoid	-0.124939
-0.537549	allocation. You should avoid	-0.124939
-0.561478	If you cannot avoid	-0.124939
-0.558329	CriticalFunction. You cannot avoid	-0.124939
-0.457180	You may preferably avoid	-0.124939
-0.352960	by all means avoid	-0.124939
-0.580828	away and the result	-0.124939
-0.580828	bits, and the result	-0.124939
-0.890413	wait for the result	-0.124939
-1.671887	so that the result	-0.124939
-0.844442	depends on the result	-0.425969
-0.985418	depend on the result	-0.124939
-0.594939	But when the result	-0.124939
-0.591917	evaluated, because the result	-0.124939
-0.355771	will return the result	-0.124939
-1.276661	make sure the result	-0.124939
-0.968780	and store the result	-0.124939
-0.874535	to see the result	-0.124939
-0.355771	iteration needs the result	-0.124939
-0.159033	and give the result	-0.124939
-0.459624	then convert the result	-0.124939
-0.015451	// Store the result	-0.726999
-0.780450	and stores the result	-0.124939
-0.591110	wait for a result	-0.124939
-0.572514	often as a result	-0.124939
-0.572514	occur as a result	-0.124939
-0.523349	be 2. The result	-0.124939
-0.460234	very fast. The result	-0.124939
-0.356251	the profiler. The result	-0.124939
-0.356251	is ecx+eax*4. The result	-0.124939
-0.356251	as <. The result	-0.124939
-1.239349	+ i); // result	-0.124939
-0.357679	elements c.load(cc+i); // result	-0.124939
-0.358381	[ecx+eax*4],ebx stores this result	-0.124939
-0.599050	get the same result	-0.124939
-0.589755	then the first result	-0.124939
-0.354584	to ; store result	-0.124939
-0.340742	for the intermediate result	-0.124939
-0.340742	store the intermediate result	-0.124939
-0.532019	get a better result	-0.124939
-0.565042	last the second result	-0.124939
-0.427925	that the final result	-0.124939
-0.427925	check the final result	-0.124939
-0.532898	//=DeltaY // Store result	-0.124939
-0.325301	because the 33 result	-0.124939
-0.408055	with the correct result	-0.124939
-0.598597	work that the processor	-0.124939
-1.270119	even if the processor	-0.124939
-0.581663	selected if the processor	-0.124939
-0.581663	determine if the processor	-0.124939
-1.466896	supported by the processor	-0.124939
-1.640947	depending on the processor	-0.124939
-1.000421	checks whether the processor	-0.124939
-0.540619	language. Such a processor	-0.124939
-0.463136	better. Whenever a processor	-0.124939
-0.772276	negative list of processor	-0.124939
-0.676535	positive list of processor	-0.124939
-0.549634	you know that processor	-0.124939
-0.462507	has. Assuming that processor	-0.124939
-0.997908	works best on processor	-0.124939
-0.463116	threads simultaneously. This processor	-0.124939
-0.596362	models rather than processor	-0.124939
-0.749937	in the same processor	-0.425969
-0.293603	list of which processor	-0.425969
-0.587461	use for each processor	-0.124939
-0.503038	utilize the multiple processor	-0.124939
-0.357252	this time, any processor	-0.124939
-1.035348	time a new processor	-0.124939
-0.459321	terms of specific processor	-0.124939
-0.342605	machine. The virtual processor	-0.124939
-0.442968	important. A virtual processor	-0.124939
-0.354882	"we don't support processor	-0.124939
-0.862596	on a particular processor	-0.124939
-1.001992	that the next processor	-0.124939
-0.455774	new and better processor	-0.124939
-0.350705	AMD or VIA processor	-0.124939
-0.348225	processors. A non-Intel processor	-0.124939
-0.347337	only one logical processor	-0.124939
-0.093256	where a soft processor	-0.124939
-0.093256	Such a soft processor	-0.124939
-0.325162	tell a hyperthreading processor	-0.124939
-0.294024	a dedicated physics processor	-0.124939
-0.294024	in a word processor	-0.124939
-0.237739	a Core i7 processor	-0.124939
-0.237739	cache. The Core2 processor	-0.124939
-0.237739	A Pentium M processor	-0.124939
-0.600829	lowest of the compiled	-0.124939
-1.954102	is that the compiled	-0.124939
-0.804664	but not the compiled	-0.124939
-0.582723	instances makes the compiled	-0.124939
-0.546083	is, and is compiled	-0.124939
-1.159434	code that is compiled	-0.124939
-0.569807	Code that is compiled	-0.124939
-1.503419	a function is compiled	-0.124939
-0.787364	the code is compiled	-0.301030
-0.522861	source code is compiled	-0.124939
-1.390059	the program is compiled	-0.124939
-0.927158	a program is compiled	-0.124939
-0.355914	Yet, D is compiled	-0.124939
-0.541025	C++ file and compiled	-0.124939
-1.198150	be useful in compiled	-0.124939
-0.894143	be implemented in compiled	-0.124939
-0.473745	programs implemented in compiled	-0.124939
-1.407349	have to be compiled	-0.124939
-0.594766	example should be compiled	-0.124939
-0.760194	can possibly be compiled	-0.124939
-0.525828	critical code are compiled	-0.124939
-0.357937	and main() are compiled	-0.124939
-1.476027	piece of code compiled	-0.124939
-0.354051	later with code compiled	-0.124939
-0.141903	on mixing code compiled	-0.124939
-0.141903	when mixing code compiled	-0.124939
-0.500730	faster than when compiled	-0.124939
-0.355860	each process when compiled	-0.124939
-0.586591	platforms and other compiled	-0.124939
-0.357842	if necessary, each compiled	-0.124939
-0.838880	A shared object compiled	-0.124939
-0.356864	files are first compiled	-0.124939
-0.334628	systems and programs compiled	-0.124939
-0.491940	precision in programs compiled	-0.124939
-0.257518	Obviously, the directly compiled	-0.124939
-0.257518	well as directly compiled	-0.124939
-0.257518	of C++, directly compiled	-0.124939
-0.343603	with a fully compiled	-0.124939
-0.421227	; Example 8.26a compiled	-0.124939
-0.325310	objects are normally compiled	-0.124939
-0.382714	; Example 8.26b compiled	-0.124939
-0.237820	can be cross- compiled	-0.124939
-0.358215	array element } An	-0.124939
-0.358017	Use inline functions An	-0.124939
-0.555255	software in C++ An	-0.124939
-0.539640	70 Induction variables An	-0.124939
-0.356563	the application code. An	-0.124939
-0.356554	less each time. An	-0.124939
-0.576455	Boolean vector operations An	-0.124939
-0.651271	7.27 Overloaded operators An	-0.124939
-0.354407	Uncached memory store An	-0.124939
-0.354101	the first program. An	-0.124939
-0.565031	variable is used. An	-0.124939
-0.809738	in some cases. An	-0.124939
-0.735216	efficient container classes. An	-0.124939
-0.629218	data caching inefficient. An	-0.124939
-0.526739	it is executed. An	-0.124939
-0.625621	of the arrays. An	-0.124939
-0.341465	0's gives zero. An	-0.124939
-0.862729	speed is important. An	-0.124939
-0.473134	methods mentioned above. An	-0.124939
-0.255645	a single result. An	-0.124939
-0.255645	a negative result. An	-0.124939
-0.819346	and constant propagation An	-0.124939
-0.429164	93. 7.10 Arrays An	-0.124939
-0.478206	class is declared. An	-0.124939
-0.325131	i = s; An	-0.124939
-0.420929	given in www.agner.org/optimize/cppexamples.zip. An	-0.124939
-0.407717	implementations. 7.22 Inheritance An	-0.124939
-0.407717	functions. 7.4 Enums An	-0.124939
-0.314397	function is inlined. An	-0.124939
-0.407891	to become fragmented. An	-0.124939
-0.314397	8.1 below. Devirtualization An	-0.124939
-0.293894	and VIA CPUs: An	-0.124939
-0.382441	software in C++: An	-0.124939
-0.538302	the same thing. An	-0.124939
-0.237625	as memory leak. An	-0.124939
-0.237625	-128 to +127. An	-0.124939
-0.237625	and operating systems"). An	-0.124939
-0.237625	for char pointers). An	-0.124939
-0.237625	sizes of matrices. An	-0.124939
-0.237625	the C99 standard. An	-0.124939
-0.237625	on bounds checking). An	-0.124939
-0.237625	in assembly language: An	-0.124939
-0.237625	set it supports. An	-0.124939
-0.237625	on page 27. An	-0.124939
-0.578186	N-1)==0,N>::p(x); } // Use	-0.124939
-0.501909	CriticalFunction(); ... // Use	-0.124939
-0.653136	_mm_cmpgt_epi16(b, zero); // Use	-0.124939
-0.355550	* 2.5; // Use	-0.124939
-0.358643	of copying it Use	-0.124939
-0.574386	Use intrinsic functions Use	-0.124939
-0.568207	Use assembly language Use	-0.124939
-0.355319	graphics cards, etc. Use	-0.124939
-0.484627	of the data. Use	-0.124939
-0.531128	lots of data. Use	-0.124939
-0.455345	with all compilers. Use	-0.124939
-0.453580	or Intel compiler. Use	-0.124939
-0.350817	blocks, for example: Use	-0.124939
-0.350088	are satisfied: 1. Use	-0.124939
-0.349525	or PathScale. 2. Use	-0.124939
-0.510138	vector function libraries. Use	-0.124939
-0.539288	function, if possible. Use	-0.124939
-0.488845	the object file. Use	-0.124939
-0.339414	to do this: Use	-0.124939
-0.752796	manual for details. Use	-0.124939
-0.331586	or common names. Use	-0.124939
-0.331586	or library files. Use	-0.124939
-0.467207	as floating point. Use	-0.124939
-0.325082	for performance reasons. Use	-0.124939
-0.421004	is not optimal. Use	-0.124939
-0.458344	of this option. Use	-0.124939
-0.325268	at vectorization. 3. Use	-0.124939
-0.314576	pointers are implemented. Use	-0.124939
-0.314455	Cache contentions expected. Use	-0.124939
-0.443963	a hot spot. Use	-0.124939
-0.102777	...................................................................................... 16 3.2 Use	-0.124939
-0.102777	be improved. 3.2 Use	-0.124939
-0.102777	to zero. 14.3 Use	-0.124939
-0.102777	.................................................................................................. 134 14.3 Use	-0.124939
-0.102777	......................................................................................... 132 14.1 Use	-0.124939
-0.102777	optimization topics 14.1 Use	-0.124939
-0.293950	dedicated test server. Use	-0.124939
-0.382509	No cache contentions. Use	-0.124939
-0.293950	it calls. 48 Use	-0.124939
-0.237674	the symbolic link. Use	-0.124939
-0.237674	an assembly listing. Use	-0.124939
-0.237674	// Example 7.34a. Use	-0.124939
-0.237674	compiler allows "__attribute__((visibility("hidden")))". Use	-0.124939
-0.237674	that supports this). Use	-0.124939
-0.358888	a string of bytes	-0.124939
-0.305364	pointer is 4 bytes	-0.124939
-0.315579	_mm_permutevar_ps 4 4 bytes	-0.124939
-0.315579	_mm256_permutevar_ps 4 4 bytes	-0.124939
-0.305364	SSE Store 4 bytes	-0.124939
-0.058608	_mm256_i32gather_epi32 unlimited 4 bytes	-0.124939
-0.058608	_mm_i32gather_ps unlimited 4 bytes	-0.124939
-0.058608	_mm_i32gather_epi32 unlimited 4 bytes	-0.124939
-0.058608	_mm256_i32gather_ps unlimited 4 bytes	-0.124939
-0.439776	double's of 8 bytes	-0.124939
-0.387068	systems and 8 bytes	-0.124939
-0.297664	double takes 8 bytes	-0.124939
-0.297664	the structure 8 bytes	-0.124939
-0.297664	SSE2 Store 8 bytes	-0.124939
-0.057510	_mm256_i64gather_pd unlimited 8 bytes	-0.124939
-0.057510	_mm_i64gather_pd unlimited 8 bytes	-0.124939
-0.057510	_mm256_i64gather_epi32 unlimited 8 bytes	-0.124939
-0.057510	_mm_i64gather_epi32 unlimited 8 bytes	-0.124939
-0.655753	is typically 64 bytes	-0.124939
-0.400612	increased to 16 bytes	-0.124939
-0.308657	bigger than 16 bytes	-0.124939
-0.308657	to test 16 bytes	-0.124939
-0.080696	SSE2 Store 16 bytes	-0.124939
-0.038476	SSE Store 16 bytes	-0.425969
-0.403801	class is 128 bytes	-0.124939
-0.569219	the first 128 bytes	-0.124939
-0.128314	0.28 strlen 128 bytes	-0.124939
-0.128314	0.27 strlen 128 bytes	-0.124939
-0.081629	number of unused bytes	-0.124939
-0.081629	holes of unused bytes	-0.124939
-0.182237	// 2 unused bytes	-0.124939
-0.081629	// 4 unused bytes	-0.124939
-0.081629	also 4 unused bytes	-0.124939
-0.081629	are 6 unused bytes	-0.124939
-0.081629	// 6 unused bytes	-0.124939
-0.350333	covers 64 consecutive bytes	-0.124939
-0.343658	(less than 65 bytes	-0.124939
-0.339366	bigger than 127 bytes	-0.124939
-0.339366	table. Type size, bytes	-0.124939
-0.294154	4 = 2048 bytes	-0.124939
-0.294154	64 or 0x40 bytes	-0.124939
-0.237853	the array 800 bytes	-0.124939
-0.237853	size, bytes alignment, bytes	-0.124939
-0.600902	than in the big	-0.124939
-0.368022	size that is big	-0.425969
-1.345508	member function is big	-0.124939
-0.844954	integer size is big	-0.124939
-0.894084	itself is a big	-0.124939
-0.586321	parts of a big	-0.124939
-0.586321	modules of a big	-0.124939
-0.589667	rows in a big	-0.124939
-0.589667	investing in a big	-0.124939
-0.655986	roll out a big	-0.124939
-0.524427	processors requires a big	-0.124939
-0.659726	and deallocation of big	-0.124939
-0.550521	Align arrays and big	-0.124939
-0.527241	be done in big	-0.124939
-0.570216	method only for big	-0.124939
-0.358772	therefore recommended that big	-0.124939
-1.376464	are based on big	-0.124939
-0.725915	the compiled code big	-0.124939
-0.831655	programs that have big	-0.124939
-0.562102	systems may have big	-0.124939
-0.185441	if you have big	-0.124939
-0.550010	platforms that use big	-0.124939
-0.462732	floppy disk. A big	-0.124939
-0.586535	arrays and other big	-0.124939
-0.563982	together in one big	-0.124939
-0.493169	structure has one big	-0.124939
-0.350431	to allocate one big	-0.124939
-1.819416	there is no big	-0.124939
-1.121555	there are no big	-0.124939
-0.558516	matrix is so big	-0.124939
-0.528764	c[i] are so big	-0.124939
-0.818070	be a very big	-0.124939
-0.336767	apply to very big	-0.124939
-0.336767	the arrays very big	-0.124939
-0.336767	is made very big	-0.124939
-0.356765	in doubt how big	-0.124939
-0.505791	container is too big	-0.124939
-0.464340	c[i] are too big	-0.124939
-0.348296	point comparison. On big	-0.124939
-1.038923	Reading or writing big	-0.124939
-0.339273	to come. Even big	-0.124939
-0.237780	that of yesterday's big	-0.124939
-0.587240	extra code and doesn't	-0.124939
-0.455888	a function that doesn't	-0.301030
-1.082879	integer size that doesn't	-0.124939
-0.354490	efficient solution that doesn't	-0.124939
-0.354490	CPU dispatcher that doesn't	-0.124939
-0.354490	programming style that doesn't	-0.124939
-0.911064	so that it doesn't	-0.425969
-0.523824	assume that it doesn't	-0.124939
-0.523824	saying that it doesn't	-0.124939
-0.179135	function because it doesn't	-0.425969
-0.489426	code because it doesn't	-0.124939
-0.489426	platforms because it doesn't	-0.124939
-0.530114	method, but it doesn't	-0.124939
-0.530114	restriction, but it doesn't	-0.124939
-0.801905	this case it doesn't	-0.124939
-0.783529	if the compiler doesn't	-0.124939
-0.990908	because the compiler doesn't	-0.124939
-0.910275	If the compiler doesn't	-0.124939
-0.538178	But the compiler doesn't	-0.124939
-0.538178	function, the compiler doesn't	-0.124939
-0.538178	why the compiler doesn't	-0.124939
-0.555982	modules The compiler doesn't	-0.124939
-0.555982	temp. The compiler doesn't	-0.124939
-0.343564	the chosen compiler doesn't	-0.124939
-0.358308	mean atomic. It doesn't	-0.124939
-1.267481	that the CPU doesn't	-0.124939
-0.833473	the CPUID instruction doesn't	-0.124939
-0.525780	to signed integer doesn't	-0.124939
-0.462262	constructors. A class doesn't	-0.124939
-0.879394	where the size doesn't	-0.124939
-1.039491	if the object doesn't	-0.124939
-0.560053	Unfortunately, this method doesn't	-0.124939
-0.536846	as the error doesn't	-0.124939
-0.500132	signed integer overflow doesn't	-0.124939
-0.355301	if above line doesn't	-0.124939
-0.650308	// if above doesn't	-0.124939
-0.576646	structure), the microprocessor doesn't	-0.124939
-0.349628	a store operation doesn't	-0.124939
-0.341756	Note that volatile doesn't	-0.124939
-0.331758	Gnu manual currently doesn't	-0.124939
-0.573670	up if the threads	-0.124939
-0.573670	advantage if the threads	-0.124939
-0.198292	contentions if the threads	-0.124939
-0.595583	line, because the threads	-0.124939
-1.145937	The use of threads	-0.124939
-2.067244	the number of threads	-0.124939
-0.358782	p. 28) The threads	-0.124939
-0.358780	event counts for threads	-0.124939
-0.541008	faster and that threads	-0.124939
-0.358630	multiple processes or threads	-0.124939
-1.350714	two or more threads	-0.124939
-0.523957	have no more threads	-0.124939
-0.519484	variable. The different threads	-0.124939
-0.351830	means that different threads	-0.124939
-0.548141	environment, between different threads	-0.124939
-0.550972	running in other threads	-0.124939
-0.818859	if no other threads	-0.124939
-0.357990	end when all threads	-0.124939
-0.539435	not divided into threads	-0.124939
-0.128394	example, if multiple threads	-0.124939
-0.128394	safe if multiple threads	-0.124939
-0.128394	used by multiple threads	-0.124939
-0.128394	zation by multiple threads	-0.124939
-0.524156	divided into multiple threads	-0.124939
-1.003844	shared between multiple threads	-0.124939
-0.404102	to avoid multiple threads	-0.124939
-0.311478	for running multiple threads	-0.124939
-0.311478	cores: Define multiple threads	-0.124939
-0.311478	processing. Running multiple threads	-0.124939
-0.428821	considerable. If two threads	-0.124939
-0.331339	of making two threads	-0.124939
-0.062194	to run two threads	-0.425969
-0.331339	avoid running two threads	-0.124939
-0.331339	doesn't prevent two threads	-0.124939
-0.245270	that communication between threads	-0.124939
-0.245270	necessary communication between threads	-0.124939
-0.354855	can run eight threads	-0.124939
-0.429435	access in separate threads	-0.124939
-0.331830	tasks into separate threads	-0.124939
-0.314640	processor core. Two threads	-0.124939
-0.294126	core and high-priority threads	-0.124939
-1.613118	This is the best	-0.124939
-1.559102	one of the best	-0.124939
-0.737157	some of the best	-0.425969
-1.525773	pointer to the best	-0.124939
-0.594860	.NET and the best	-0.124939
-0.598444	4 in the best	-0.124939
-0.593827	DLL with the best	-0.124939
-0.930346	is not the best	-0.124939
-0.566129	libraries have the best	-0.124939
-1.050636	not use the best	-0.124939
-0.576242	will do the best	-0.124939
-0.356239	or model the best	-0.124939
-1.168021	to find the best	-0.124939
-0.501258	needed. Obviously, the best	-0.124939
-0.460218	that select the best	-0.124939
-0.460218	by choosing the best	-0.124939
-0.356239	spot. Sometimes, the best	-0.124939
-0.356239	doesn't provide the best	-0.124939
-0.599368	cases. It is best	-0.124939
-0.968690	programming language is best	-0.124939
-0.578650	universal solution is best	-0.124939
-0.561081	application-specific code. The best	-0.124939
-0.541758	it calls. The best	-0.124939
-0.865834	is available. The best	-0.124939
-0.456257	operating system. The best	-0.124939
-0.456257	this case. The best	-0.124939
-0.456257	virtual machine. The best	-0.124939
-0.456257	other compilers). The best	-0.124939
-0.353118	below shows. The best	-0.124939
-0.353118	suitable duration. The best	-0.124939
-0.353118	several flaws: The best	-0.124939
-0.541052	64-bit version for best	-0.124939
-0.595583	compilers that are best	-0.124939
-0.553592	likely to work best	-0.124939
-0.440242	one that works best	-0.124939
-0.311690	version that works best	-0.124939
-0.300622	automatic vectorization works best	-0.124939
-0.124803	than "what works best	-0.124939
-0.124803	thinks "what works best	-0.124939
-0.051252	on what fits best	-0.425969
-0.325361	version that performs best	-0.124939
-0.896663	problems if the necessary	-0.124939
-0.839405	doesn't have the necessary	-0.124939
-0.876231	for all the necessary	-0.124939
-1.414629	to do the necessary	-0.124939
-0.805262	that does the necessary	-0.124939
-0.503773	that support the necessary	-0.124939
-0.358038	systems lack the necessary	-0.124939
-0.948017	that it is necessary	-0.124939
-1.066727	then it is necessary	-0.425969
-1.140012	Therefore, it is necessary	-0.124939
-0.542531	Here, it is necessary	-0.124939
-0.542531	called, it is necessary	-0.124939
-0.542531	Sometimes it is necessary	-0.124939
-0.542531	accessed, it is necessary	-0.124939
-0.595541	often. This is necessary	-0.124939
-0.876674	calculations. It is necessary	-0.124939
-0.588651	hackers. It is necessary	-0.124939
-0.459071	This unit-testing is necessary	-0.124939
-1.289521	the amount of necessary	-0.124939
-0.818290	usability problems and necessary	-0.124939
-1.922334	This can be necessary	-0.124939
-1.598588	may not be necessary	-0.124939
-1.659443	it may be necessary	-0.124939
-1.431663	It may be necessary	-0.124939
-0.358756	frequent updates are necessary	-0.124939
-0.570336	linking makes it necessary	-0.124939
-1.425093	it is not necessary	-0.124939
-1.191301	this is not necessary	-0.124939
-1.312513	It is not necessary	-0.124939
-0.192656	handling is not necessary	-0.124939
-0.863806	they are not necessary	-0.124939
-0.656941	overflow checks where necessary	-0.124939
-0.357347	and calling any necessary	-0.124939
-0.746153	it is often necessary	-0.124939
-1.103149	It is often necessary	-0.124939
-0.507510	It is therefore necessary	-0.602060
-0.531372	it is rarely necessary	-0.124939
-0.314678	give the 124 necessary	-0.124939
-1.826655	the address of element	-0.124939
-0.566627	was used by element	-0.124939
-0.065583	is swapped with element	-0.124939
-0.658967	to access an element	-0.124939
-0.182233	at the vector element	-0.425969
-0.584639	hold only one element	-0.124939
-0.144938	2 to each element	-0.425969
-0.546756	bc for each element	-0.124939
-0.061857	// AND each element	-0.425969
-0.328837	matrix, i.e. each element	-0.124939
-0.061857	// Compare each element	-0.425969
-0.976864	of the array element	-0.124939
-0.685325	if the array element	-0.124939
-0.495721	address of array element	-0.124939
-1.026208	of an array element	-0.124939
-0.601004	of each array element	-0.124939
-0.328495	// Output array element	-0.124939
-0.357172	of each table element	-0.124939
-0.831127	of the first element	-0.124939
-0.442919	before the first element	-0.425969
-0.873505	of a new element	-0.124939
-0.355931	make this extra element	-0.124939
-0.354695	needs only calculate element	-0.124939
-0.498936	expressions for every element	-0.124939
-0.549537	add the last element	-0.124939
-0.326843	the diagonal. Each element	-0.124939
-0.326843	linked list. Each element	-0.124939
-0.769318	clock cycles per element	-0.124939
-0.065793	size Time per element	-0.124939
-0.065793	kilobytes Time per element	-0.124939
-0.065793	9.6a Time per element	-0.124939
-0.026211	the numerically largest element	-0.425969
-0.054106	Find numerically largest element	-0.124939
-0.407982	is the nearest element	-0.124939
-0.294098	an extra dummy element	-0.124939
-0.237804	When we reach element	-0.124939
-1.027443	is also a language	-0.124939
-0.462949	today. But this language	-0.124939
-0.462815	is important. A language	-0.124939
-0.563419	of the C++ language	-0.124939
-0.336522	with the C++ language	-0.124939
-0.336522	But the C++ language	-0.124939
-0.335896	needed. The C++ language	-0.124939
-0.335896	work. The C++ language	-0.124939
-0.953680	of different C++ language	-0.124939
-0.348084	way the programming language	-0.124939
-0.265673	choosing a programming language	-0.124939
-0.123326	choice of programming language	-0.124939
-0.123326	Choice of programming language	-0.124939
-0.265673	decide which programming language	-0.124939
-0.265673	the C++ programming language	-0.124939
-0.348084	the software programming language	-0.124939
-0.265673	a particular programming language	-0.124939
-0.265673	the preferred programming language	-0.124939
-0.249568	use of assembly language	-0.124939
-0.249568	linking to assembly language	-0.124939
-0.416616	d in assembly language	-0.124939
-0.358624	option for assembly language	-0.124939
-0.463251	C++ or assembly language	-0.124939
-0.358624	generate an assembly language	-0.124939
-0.189700	by using assembly language	-0.124939
-0.189700	optimized, using assembly language	-0.124939
-0.249568	this: Use assembly language	-0.124939
-0.249568	instruction timing, assembly language	-0.124939
-0.249568	The MASM assembly language	-0.124939
-0.653215	for the common language	-0.124939
-0.350789	the low-level C language	-0.124939
-0.347391	the device. Any language	-0.124939
-0.441958	reasons, the preferred language	-0.124939
-0.339397	library). The D language	-0.124939
-0.125808	a hardware definition language	-0.124939
-0.126389	The hardware definition language	-0.124939
-0.336211	application integration, mixed language	-0.124939
-0.172644	in a high-level language	-0.124939
-0.172644	an advanced high-level language	-0.124939
-0.356576	to CPU-intensive code. But	-0.124939
-0.460633	consumes CPU time. But	-0.124939
-0.518962	the member function. But	-0.124939
-0.630844	the called function. But	-0.124939
-0.341989	strlen, sprintf, etc. But	-0.124939
-0.341989	Visual Basic, etc. But	-0.124939
-0.811365	a smart pointer. But	-0.124939
-0.349494	c > b) But	-0.124939
-0.348759	the same resources. But	-0.124939
-0.563822	three clock cycles. But	-0.124939
-0.735258	for double precision. But	-0.124939
-0.344937	a modern CPU. But	-0.124939
-0.344894	when it returns. But	-0.124939
-0.343495	and unsigned integers. But	-0.124939
-0.343352	as function parameter. But	-0.124939
-0.341593	the destination array. But	-0.124939
-0.758900	same cache line. But	-0.124939
-0.341539	mathe- matical applications. But	-0.124939
-0.822123	as an integer. But	-0.124939
-0.729586	the data members. But	-0.124939
-0.331545	to function names. But	-0.124939
-0.331545	advantageous by itself. But	-0.124939
-0.331545	the constant 5. But	-0.124939
-0.325042	in other languages. But	-0.124939
-0.420954	soon be obsolete. But	-0.124939
-0.407741	to be inlined. But	-0.124939
-0.314416	positive value, n. But	-0.124939
-0.538335	it was programmed. But	-0.124939
-0.538335	on the market. But	-0.124939
-0.293913	have such checks. But	-0.124939
-0.293913	software optimization issue. But	-0.124939
-0.293913	see the delay. But	-0.124939
-0.293913	0+1.23456 = 1.23456. But	-0.124939
-0.293913	for Java today. But	-0.124939
-0.237641	or vice versa. But	-0.124939
-0.237641	containers 93 themselves. But	-0.124939
-0.237641	(i=0; i<n; ++i). But	-0.124939
-0.237641	C0::f or C1::f. But	-0.124939
-0.237641	of its simplicity. But	-0.124939
-0.237641	a single session. But	-0.124939
-0.237641	a and b. But	-0.124939
-0.237641	a cache miss. But	-0.124939
-0.237641	other resource conflicts. But	-0.124939
-0.892761	times and the speed	-0.124939
-1.400019	faster than the speed	-0.124939
-0.594718	long because the speed	-0.124939
-0.462111	can double the speed	-0.124939
-0.568967	used, while the speed	-0.124939
-1.021492	can improve the speed	-0.124939
-0.539395	actually increase the speed	-0.124939
-0.357728	speed Testing the speed	-0.124939
-0.657468	that measures the speed	-0.124939
-0.763568	be used to speed	-0.124939
-0.590927	about how to speed	-0.124939
-0.225228	no difference in speed	-0.602060
-0.357796	to gain in speed	-0.124939
-0.357796	you gain in speed	-0.124939
-0.558703	used data. The speed	-0.124939
-0.501263	and precision. The speed	-0.124939
-0.356242	work correctly. The speed	-0.124939
-0.356242	CPUs optimally. The speed	-0.124939
-0.356242	number 6! The speed	-0.124939
-0.357478	the software for speed	-0.124939
-0.502990	critical. Optimizing for speed	-0.124939
-0.357478	Linux Optimize for speed	-0.124939
-0.358667	static version if speed	-0.124939
-0.355867	be avoided when speed	-0.124939
-0.355867	is preferred when speed	-0.124939
-0.351510	the code where speed	-0.124939
-0.351510	are areas where speed	-0.124939
-0.524933	was hardly any speed	-0.124939
-0.503970	improve the execution speed	-0.124939
-0.059145	terms of execution speed	-0.425969
-0.309169	optimized for execution speed	-0.124939
-0.309169	for optimizing execution speed	-0.124939
-0.456251	statements The high speed	-0.124939
-0.530730	want to improve speed	-0.124939
-0.351601	can actually reduce speed	-0.124939
-0.348288	run with reduced speed	-0.124939
-0.348246	and a processing speed	-0.124939
-0.336240	less than half speed	-0.124939
-0.255853	handled at half speed	-0.124939
-0.434926	15.1c). 16 Testing speed	-0.124939
-1.150396	compiled for the specific	-0.124939
-0.587993	enough for the specific	-0.124939
-0.569773	searching, or the specific	-0.124939
-0.594935	input/output than the specific	-0.124939
-0.550774	This example is specific	-0.124939
-1.637401	there is a specific	-0.124939
-0.583847	integer of a specific	-0.124939
-0.583847	types of a specific	-0.124939
-0.539748	code to a specific	-0.124939
-0.190922	thread to a specific	-0.425969
-0.539748	linker to a specific	-0.124939
-1.184909	elements in a specific	-0.124939
-0.482391	container for a specific	-0.124939
-0.482391	enough for a specific	-0.124939
-0.482391	Compile for a specific	-0.124939
-0.482391	wired for a specific	-0.124939
-0.584478	compiler that a specific	-0.124939
-0.354510	resources. Typically, a specific	-0.124939
-0.354510	bit indicates a specific	-0.124939
-1.282945	in terms of specific	-0.124939
-0.358545	or lists of specific	-0.124939
-0.540000	separate version for specific	-0.124939
-0.462618	are fine-tuned for specific	-0.124939
-1.512975	if there are specific	-0.124939
-0.358694	// AVX2 // specific	-0.124939
-0.463258	CPU brands or specific	-0.124939
-0.358227	optimization instructions at specific	-0.124939
-0.489810	elements have no specific	-0.124939
-0.701724	I have no specific	-0.124939
-0.349712	or (requires no specific	-0.124939
-0.445814	not making any specific	-0.124939
-0.344862	to recommend any specific	-0.124939
-0.344862	to obey any specific	-0.124939
-0.349672	with legacy code, specific	-0.124939
-0.552597	available to fit specific	-0.124939
-0.347349	to maintain. Any specific	-0.124939
-0.615491	used for giving specific	-0.124939
-0.065784	requires a CPU- specific	-0.124939
-0.065784	functions. The CPU- specific	-0.124939
-0.065784	to make CPU- specific	-0.124939
-0.237829	and other system- specific	-0.124939
-0.237829	activates critical application- specific	-0.124939
-0.527172	is changed to c	-0.124939
-0.051622	if b and c	-0.124939
-0.051622	a, b and c	-0.124939
-0.051622	add b and c	-0.124939
-0.025044	Multiply b and c	-0.425969
-0.835290	the diagonal. The c	-0.124939
-0.582819	write: y = c	-0.124939
-0.358642	be reversed if c	-0.124939
-1.011194	!= 0) { c	-0.124939
-0.358272	then d+e, then c	-0.124939
-0.341496	element in vector c	-0.425969
-0.358054	not overlap. If c	-0.124939
-0.879878	= b + c	-0.124939
-1.003849	+ b + c	-0.124939
-0.357339	if (a * c	-0.124939
-0.084253	(c = 0; c	-0.726999
-0.499567	x[1] = b; c	-0.124939
-0.063025	? b : c	-0.124939
-0.062260	vector c: __m128i c	-0.425969
-0.353068	for fast division c	-0.124939
-0.579843	if a, b, c	-0.124939
-0.061139	select(b > 0, c	-0.425969
-0.537098	b, c, d; c	-0.124939
-0.320720	0, c, d; c	-0.124939
-0.057788	> 0 ? c	-0.425969
-0.635070	temp * temp; c	-0.124939
-0.336068	vector c: Is16vec8 c	-0.124939
-0.478419	b = 100, c	-0.124939
-0.314542	= select_gt(b, zero, c	-0.124939
-0.294033	b * 3.5; c	-0.124939
-0.237747	b = 1.0E8, c	-0.124939
-0.237747	else { CFALSE: c	-0.124939
-0.237747	(a+1) * (a+1); c	-0.124939
-0.598626	applications it is much	-0.124939
-0.558760	operation which is much	-0.124939
-0.820418	operation, which is much	-0.124939
-0.539012	soft processor is much	-0.124939
-0.461301	self-relative addresses is much	-0.124939
-0.722506	The effect is much	-0.124939
-0.357091	without -fpic is much	-0.124939
-0.573184	inheritance where a much	-0.124939
-0.358315	computer users and much	-0.124939
-0.358315	accessed backwards and much	-0.124939
-0.358715	you measure are much	-0.124939
-0.461163	to do as much	-0.124939
-0.356983	to get as much	-0.124939
-0.504234	devices typically have much	-0.124939
-0.358223	millisecond resolution. A much	-0.124939
-0.549972	facilities that do much	-0.124939
-0.431316	of memory takes much	-0.124939
-0.431316	disk often takes much	-0.124939
-0.236367	point division takes much	-0.124939
-0.341727	Integer division takes much	-0.124939
-0.333331	because truncation takes much	-0.124939
-0.568097	effect is so much	-0.124939
-0.357136	bookkeeping depends very much	-0.124939
-0.356842	that typically take much	-0.124939
-0.570350	program are often much	-0.124939
-0.123246	called and how much	-0.425969
-0.331480	can calculate how much	-0.124939
-0.428998	to measure how much	-0.124939
-1.063292	it is accessed much	-0.124939
-0.458802	operators are calculated much	-0.124939
-0.355002	framework typically uses much	-0.124939
-0.307385	program has too much	-0.124939
-0.399042	it takes too much	-0.124939
-0.307385	without worrying too much	-0.124939
-0.562262	It is usually much	-0.124939
-1.073792	can be made much	-0.124939
-0.188047	dispatching: 1. How much	-0.124939
-0.083924	16 3.1 How much	-0.124939
-0.083924	consumers 3.1 How much	-0.124939
-0.595001	you can obtain much	-0.124939
-0.538627	have to worry much	-0.124939
-1.605458	there is a single	-0.124939
-0.809992	functions in a single	-0.124939
-0.809992	called in a single	-0.124939
-0.946085	bits in a single	-0.124939
-0.809992	done in a single	-0.124939
-0.809992	conditions in a single	-0.124939
-0.553021	included in a single	-0.124939
-0.583071	handling for a single	-0.124939
-0.554485	calculated by a single	-0.124939
-1.038852	replaced by a single	-0.124939
-0.585071	moved with a single	-0.124939
-1.034499	stored as a single	-0.124939
-1.660851	to make a single	-0.124939
-0.427186	not only a single	-0.124939
-0.427186	contains only a single	-0.124939
-0.151629	files into a single	-0.425969
-0.386590	together into a single	-0.124939
-0.544254	fits into a single	-0.124939
-0.386590	formula into a single	-0.124939
-0.386590	joined into a single	-0.124939
-0.494539	have even a single	-0.124939
-0.494539	when just a single	-0.124939
-0.351416	operators produce a single	-0.124939
-0.351416	than isolating a single	-0.124939
-0.527222	result back to single	-0.124939
-0.463455	is higher for single	-0.124939
-0.726347	bitwise operators are single	-0.124939
-0.835292	are done with single	-0.124939
-1.134351	as fast as single	-0.124939
-1.396819	more time than single	-0.124939
-1.163806	You may use single	-0.124939
-0.355981	examples all use single	-0.124939
-0.358232	convert b from single	-0.124939
-1.339005	you are using single	-0.124939
-0.830601	in speed between single	-0.124939
-0.558514	making the constant single	-0.124939
-0.458647	precision or four single	-0.124939
-0.354859	precision or eight single	-0.124939
-0.450106	this for testing single	-0.124939
-0.347375	Do not mix single	-0.124939
-0.595085	penalty for mixing single	-0.124939
-0.462033	absvalue; largest_index = i;	-0.124939
-0.357666	order(i); matrix[j][0] = i;	-0.124939
-0.390470	p) { int i;	-0.124939
-0.152755	r) { int i;	-0.124939
-0.184006	= 0; int i;	-0.425969
-0.481956	f; unsigned int i;	-0.124939
-0.481956	7.7 unsigned int i;	-0.124939
-0.544532	= 100; int i;	-0.124939
-0.004417	float f; int i;	-0.970037
-0.297432	= 10; int i;	-0.124939
-0.095030	float a[100]; int i;	-0.124939
-0.297432	= 16; int i;	-0.124939
-0.297432	= 1.0; int i;	-0.124939
-0.544532	= 1000; int i;	-0.124939
-0.102317	int list[300]; int i;	-0.602060
-0.810716	float matrix[rows][columns]; int i;	-0.124939
-0.297432	S1 list[100]; int i;	-0.124939
-0.297432	Example 7.21 int i;	-0.124939
-0.297432	Example 7.23 int i;	-0.124939
-0.297432	Example 7.20 int i;	-0.124939
-0.297432	Example 7.19 int i;	-0.124939
-0.297432	* p; int i;	-0.124939
-0.297432	= 110; int i;	-0.124939
-0.297432	Sab ab[size]; int i;	-0.124939
-0.297432	float list[16]; int i;	-0.124939
-0.297432	Example 7.30b int i;	-0.124939
-0.297432	Example 7.30a int i;	-0.124939
-0.297432	int list[301]; int i;	-0.124939
-0.357754	= p + i;	-0.124939
-0.341886	i++) f *= i;	-0.124939
-0.382873	int a[size], b[size], i;	-0.124939
-1.193101	+ 2; } These	-0.124939
-0.460647	the position-independent code. These	-0.124939
-0.908985	no extra time. These	-0.124939
-0.355286	branch mispredictions, etc. These	-0.124939
-0.498534	blocks of memory. These	-0.124939
-0.456834	C++ casting operator These	-0.124939
-1.103512	the data cache. These	-0.124939
-0.578869	the instruction set. These	-0.124939
-0.453628	set of CPUs. These	-0.124939
-0.350604	holding the pointer. These	-0.124939
-0.348884	and system calls. These	-0.124939
-0.347181	static link libraries. These	-0.124939
-0.347217	a register stack. These	-0.124939
-0.523415	implementation is needed. These	-0.124939
-0.510087	to single precision. These	-0.124939
-0.346159	the largest vector. These	-0.124939
-0.293990	brand of CPU. These	-0.124939
-0.293990	known hardware CPU. These	-0.124939
-0.520170	solve this problem. These	-0.124939
-0.341539	etc. in vectors. These	-0.124939
-0.331545	software development process. These	-0.124939
-0.331545	often excessively so. These	-0.124939
-0.325042	to improve efficiency. These	-0.124939
-0.421083	function to another. These	-0.124939
-0.420954	templates in www.agner.org/optimize/cppexamples.zip. These	-0.124939
-0.314416	is called CodeAnalyst. These	-0.124939
-0.314416	type of microprocessor. These	-0.124939
-0.407741	implementation is best. These	-0.124939
-0.443910	vectorized table lookup. These	-0.124939
-0.574998	malloc and free. These	-0.124939
-0.293913	0x3F00 and 0x4700. These	-0.124939
-0.293913	bypassing syntax checks. These	-0.124939
-0.293913	from the server. These	-0.124939
-0.382464	.R. for AVX. These	-0.124939
-0.237641	8*1024/64 = 128. These	-0.124939
-0.237641	C++". Addison-Wesley, 1996. These	-0.124939
-0.237641	the so-called commpage. These	-0.124939
-0.237641	C++ and Fortran. These	-0.124939
-0.237641	begin with _mm. These	-0.124939
-0.237641	offset table (GOT). These	-0.124939
-0.237641	and Windows 3.x. These	-0.124939
-0.237641	"Integrated Performance Primitives". These	-0.124939
-1.047194	version of the virtual	-0.124939
-1.554313	versions of the virtual	-0.124939
-0.596709	bypassed when the virtual	-0.124939
-0.876642	without using the virtual	-0.124939
-0.462291	by avoiding the virtual	-0.124939
-0.724314	can bypass the virtual	-0.124939
-1.065719	version of a virtual	-0.124939
-1.340906	pointer to a virtual	-0.124939
-0.598282	up in a virtual	-0.124939
-0.589983	lookup for a virtual	-0.124939
-1.063258	efficient as a virtual	-0.124939
-1.090936	to call a virtual	-0.124939
-0.586790	templates instead of virtual	-0.124939
-0.358535	and misprediction of virtual	-0.124939
-0.358530	runtime dispatch to virtual	-0.124939
-0.358530	// Call to virtual	-0.124939
-0.566299	member pointers and virtual	-0.124939
-0.358326	jump tables, and virtual	-0.124939
-0.463435	same machine. The virtual	-0.124939
-0.786180	is obtained with virtual	-0.124939
-0.461418	Runtime polymorphism with virtual	-0.124939
-0.459487	increasingly important. A virtual	-0.124939
-0.355663	not necessary. A virtual	-0.124939
-0.566035	resources for other virtual	-0.124939
-0.358072	is called. If virtual	-0.124939
-0.657929	at least one virtual	-0.124939
-0.838891	object has no virtual	-0.124939
-1.055707	you can avoid virtual	-0.124939
-0.459334	hardware CPU. These virtual	-0.124939
-0.278757	CHello { public: virtual	-0.425969
-0.150437	C0 { public: virtual	-0.425969
-0.349679	avoid the inefficient virtual	-0.124939
-0.348224	non-virtual functions. Avoid virtual	-0.124939
-0.450017	functions. This so-called virtual	-0.124939
-0.269101	and the Java virtual	-0.124939
-0.269101	the so-called Java virtual	-0.124939
-0.538660	public: void NotPolymorphic(); virtual	-0.124939
-0.237804	Avoid multiple inheritance, virtual	-0.124939
-0.726596	the loading of several	-0.124939
-0.358757	Pascal, Fortran and several	-0.124939
-0.358848	and decoded in several	-0.124939
-0.460083	for C++ for several	-0.124939
-0.356133	to optimize for several	-0.124939
-0.460083	and templates for several	-0.124939
-0.356133	is delayed for several	-0.124939
-0.356133	be prepared for several	-0.124939
-0.463428	a risk that several	-0.124939
-0.175375	compiler There are several	-0.124939
-0.474264	*.so). There are several	-0.124939
-0.474264	Windows). There are several	-0.124939
-0.474264	CodeAnalyst. There are several	-0.124939
-0.474264	copying. There are several	-0.124939
-0.504739	is accessed by several	-0.124939
-0.358550	to test on several	-0.124939
-0.564207	library functions have several	-0.124939
-0.720330	bit systems have several	-0.124939
-0.358301	library, except when several	-0.124939
-1.177180	a program has several	-0.124939
-0.349122	Intel compilers has several	-0.124939
-0.640474	static keyword has several	-0.124939
-0.349122	C++ syntax has several	-0.124939
-0.503799	program memory. If several	-0.124939
-0.357995	is cached, but several	-0.124939
-0.357114	is split between several	-0.124939
-0.487968	It can take several	-0.124939
-0.487968	installed can take several	-0.124939
-0.340550	will often take several	-0.124939
-0.577546	have to test several	-0.124939
-0.499188	loop that contains several	-0.124939
-0.455133	formalism that requires several	-0.124939
-0.567366	necessary to load several	-0.124939
-0.350149	loop control statement several	-0.124939
-0.491011	This can save several	-0.124939
-0.343624	I have provided several	-0.124939
-0.799614	a software package several	-0.124939
-0.314552	but it took several	-0.124939
-0.237755	different versions alternatingly several	-0.124939
-0.237755	the microprocessor wastes several	-0.124939
-0.237755	processing power. Connecting several	-0.124939
-1.271921	to the function through	-0.124939
-0.595045	Calling a function through	-0.124939
-0.821141	Call critical function through	-0.124939
-0.725007	pointers to data through	-0.124939
-0.264828	{ // loop through	-0.425969
-0.795520	its child class through	-0.124939
-0.513409	third generation class through	-0.124939
-0.911161	a derived class through	-0.124939
-0.357717	that accesses b through	-0.124939
-0.581472	loads the library through	-0.124939
-0.845823	variable or object through	-0.124939
-0.845556	accessing an object through	-0.124939
-0.636086	the data object through	-0.124939
-0.583870	accessing a variable through	-0.124939
-1.381684	function is called through	-0.124939
-0.356985	its own address through	-0.124939
-0.768977	it is accessed through	-0.124939
-0.509504	class are accessed through	-0.124939
-0.090732	arrays are accessed through	-0.301030
-0.362128	structures are accessed through	-0.124939
-0.280344	in fact accessed through	-0.124939
-0.280344	is necessarily accessed through	-0.124939
-0.323585	The code goes through	-0.124939
-0.323585	a DLL goes through	-0.124939
-0.378399	it may go through	-0.124939
-0.290595	public variables go through	-0.124939
-0.290595	but must go through	-0.124939
-0.350682	accessed from main through	-0.124939
-0.928963	{ // Loop through	-0.124939
-0.346272	automatically download updates through	-0.124939
-0.441892	find the GOT through	-0.124939
-0.341684	an extra jump through	-0.124939
-0.336142	compiler versions 7 through	-0.124939
-0.331707	or line separately through	-0.124939
-0.382645	by the caller through	-0.124939
-0.237772	allocated block. Walking through	-0.124939
-0.237772	value will propagate through	-0.124939
-0.237772	time than looping through	-0.124939
-0.237772	can be propagated through	-0.124939
-1.151601	code for the common	-0.124939
-0.588323	Prototype for the common	-0.124939
-0.572485	etc. It is common	-0.124939
-0.572485	unit-testing It is common	-0.124939
-0.572485	away. It is common	-0.124939
-0.572485	bloat. It is common	-0.124939
-0.956271	This is a common	-0.124939
-0.560890	data is a common	-0.124939
-0.560890	so is a common	-0.124939
-0.560890	abstraction is a common	-0.124939
-1.115906	of using a common	-0.124939
-1.017243	is also a common	-0.124939
-0.821613	by making a common	-0.124939
-0.587388	Fast versions of common	-0.124939
-0.358777	that branch. The common	-0.124939
-1.008754	many functions for common	-0.124939
-0.358623	with short or common	-0.124939
-0.889456	optimizations such as common	-0.124939
-1.480346	It is more common	-0.124939
-0.358249	0x20; 46 A common	-0.124939
-0.586591	leaks and other common	-0.124939
-0.633040	of the most common	-0.124939
-0.958848	in the most common	-0.124939
-0.462334	calculate the most common	-0.124939
-0.462334	probably the most common	-0.124939
-0.378442	functions The most common	-0.124939
-0.378442	operations. The most common	-0.124939
-0.378442	element. The most common	-0.124939
-0.378442	methods. The most common	-0.124939
-0.962551	libraries for many common	-0.124939
-0.454147	discovered that many common	-0.124939
-0.333508	technical problems. Some common	-0.124939
-0.333508	works best. Some common	-0.124939
-0.333508	very stupid. Some common	-0.124939
-0.354369	page 93). All common	-0.124939
-0.521375	pure. This allows common	-0.124939
-0.352249	fail to eliminate common	-0.124939
-0.564631	compiler can eliminate common	-0.124939
-0.325310	references, 'this' pointer, common	-0.124939
-0.237820	using function inlining, common	-0.124939
-0.593055	& a = a,	-0.124939
-0.457940	&& true = a,	-0.124939
-0.142583	& -1 = a,	-0.425969
-0.500877	as -(-a) = a,	-0.124939
-0.573154	overflows, even if a,	-0.124939
-0.179583	main() { int a,	-0.425969
-0.344158	unsigned integers int a,	-0.124939
-0.344158	Example 14.10 int a,	-0.124939
-0.344158	Example 14.11 int a,	-0.124939
-0.344158	* m;} int a,	-0.124939
-0.344158	Example 8.6a int a,	-0.124939
-0.344158	Example 8.6b int a,	-0.124939
-0.344158	= 1.6; int a,	-0.124939
-0.358190	+ a.y);} vector a,	-0.124939
-0.344190	Example 14.14b double a,	-0.124939
-0.344190	Example 14.14a double a,	-0.124939
-0.344190	Example 14.18c double a,	-0.124939
-0.344190	Example 8.2a double a,	-0.124939
-0.553819	S1 { float a,	-0.124939
-0.326615	1.0f;} 66 float a,	-0.124939
-0.326615	Example 11.1a float a,	-0.124939
-0.326615	Example 11.1b float a,	-0.124939
-0.326615	Example 8.16 float a,	-0.124939
-0.326615	Example 14.18a float a,	-0.124939
-0.326615	Example 14.18b float a,	-0.124939
-1.197742	In this example, a,	-0.124939
-0.568335	max(T const & a,	-0.124939
-0.023127	int SomeFunction (int a,	-0.522879
-0.570163	... int i, a,	-0.124939
-0.277378	following way: bool a,	-0.124939
-0.277378	Example 7.10a bool a,	-0.124939
-0.277378	Example 7.9a bool a,	-0.124939
-0.314669	cc[]) { Vec16s a,	-0.124939
-0.294154	vector objects Vec8s a,	-0.124939
-0.237853	vector() {} vector(float a,	-0.124939
-0.237853	to b memcpy(b, a,	-0.124939
-0.897213	belong to the thread	-0.124939
-0.891307	declared in the thread	-0.124939
-0.596128	locally in the thread	-0.124939
-0.358333	by increasing the thread	-0.124939
-0.658676	to fix the thread	-0.124939
-1.530605	stored in a thread	-0.124939
-0.870399	invalid if a thread	-0.124939
-0.871288	slower than a thread	-0.124939
-0.572931	used when a thread	-0.124939
-0.461592	by setting a thread	-0.124939
-0.461592	to lock a thread	-0.124939
-0.461860	A process or thread	-0.124939
-0.357530	each task or thread	-0.124939
-0.358511	are generally not thread	-0.124939
-0.358262	not doubled. A thread	-0.124939
-1.862558	in the same thread	-0.124939
-0.582251	as the other thread	-0.124939
-0.533545	priority to one thread	-0.124939
-0.452898	so that one thread	-0.124939
-0.350467	threads where one thread	-0.124939
-0.569612	containers for each thread	-0.124939
-0.446227	chains then each thread	-0.124939
-0.345189	to give each thread	-0.124939
-0.345189	the contrary, each thread	-0.124939
-1.640340	the operating system thread	-0.124939
-0.474873	second by another thread	-0.124939
-0.458680	core with another thread	-0.124939
-0.311209	calculations while another thread	-0.124939
-0.311209	user interface, another thread	-0.124939
-0.478528	in a separate thread	-0.425969
-0.229605	into a separate thread	-0.124939
-0.293223	making a separate thread	-0.124939
-0.280099	the stack. Each thread	-0.124939
-0.280099	jobs simultaneously. Each thread	-0.124939
-0.280099	multiple threads. Each thread	-0.124939
-0.280099	processor cores. Each thread	-0.124939
-0.331788	and a third thread	-0.124939
-0.294135	in a high-priority thread	-0.124939
-0.237837	that a low-priority thread	-0.124939
-0.237837	from a higher-priority thread	-0.124939
-0.719443	files, help files etc.	-0.124939
-0.808187	(2n / b) etc.	-0.124939
-0.059902	functions, trigonometric functions, etc.	-0.124939
-0.347225	evict number 2, etc.	-0.124939
-0.445813	// Gnu compiler, etc.	-0.124939
-0.339109	sets, cache size, etc.	-0.124939
-0.331555	violations, invalid pointers, etc.	-0.124939
-0.331555	language 11 programming, etc.	-0.124939
-0.325052	multiplexers, arithmetic units, etc.	-0.124939
-0.314426	set (/arch:SSE2, /arch:AVX etc.	-0.124939
-0.314426	addition, subtraction, multiplication, etc.	-0.124939
-0.407753	variables, loop counters, etc.	-0.124939
-0.314426	data base access, etc.	-0.124939
-0.314426	operands are comparisons, etc.	-0.124939
-0.314426	the other way, etc.	-0.124939
-0.382475	constants Sunday, Monday, etc.	-0.124939
-0.293922	network resources, databases, etc.	-0.124939
-0.293922	of semaphores, mutexes, etc.	-0.124939
-0.538351	different screen resolutions, etc.	-0.124939
-0.293922	SSE3 horizontal add, etc.	-0.124939
-0.293922	rather than loops, etc.	-0.124939
-0.538351	elimination, constant propagation, etc.	-0.124939
-0.293922	with its limit, etc.	-0.124939
-0.538351	misses, branch mispredictions, etc.	-0.124939
-0.237649	Or #include <ia32intrin.h> etc.	-0.124939
-0.237649	trees, hash maps etc.	-0.124939
-0.237649	strcat, strlen, sprintf, etc.	-0.124939
-0.237649	log, exp, sin, etc.	-0.124939
-0.237649	of graphics cards, etc.	-0.124939
-0.237649	Windows, -msse2, -mavx, etc.	-0.124939
-0.237649	units, memory ports, etc.	-0.124939
-0.237649	branch pattern history, etc.	-0.124939
-0.237649	mutexes, database connections, etc.	-0.124939
-0.237649	-msse4.1 -mAVX -axSSE3, etc.	-0.124939
-0.237649	-mAVX /arch:AVX /QaxSSE3, etc.	-0.124939
-0.237649	C#, Visual Basic, etc.	-0.124939
-0.237649	windows, graphic brushes, etc.	-0.124939
-0.237649	buttons, dialog boxes, etc.	-0.124939
-0.237649	exit(), abort(), _endthread(), etc.	-0.124939
-0.237649	floating point exceptions, etc.	-0.124939
-0.358326	Pentium 4 and AMD	-0.124939
-0.358326	Intel VTune and AMD	-0.124939
-0.358768	page 119). The AMD	-0.124939
-0.848303	assembly code for AMD	-0.124939
-0.357461	Intel VTune, for AMD	-0.124939
-0.357461	Optimization Guide for AMD	-0.124939
-0.745239	but not on AMD	-0.124939
-0.351237	at all on AMD	-0.124939
-0.351237	integer size on AMD	-0.124939
-0.534249	of performance on AMD	-0.124939
-0.497880	only supported on AMD	-0.124939
-0.750044	work well on AMD	-0.124939
-0.595173	CPUs such as AMD	-0.124939
-0.462891	AMD CPUs use AMD	-0.124939
-0.756459	strlen 128 bytes AMD	-0.124939
-0.355530	x86-64 platforms. AMD AMD	-0.124939
-0.354070	yet. Supports both AMD	-0.124939
-0.517471	later Intel processors. AMD	-0.124939
-0.001258	microarchitecture of Intel, AMD	-1.079181
-0.115946	breakdowns for Intel, AMD	-0.124939
-0.172669	with an Intel, AMD	-0.124939
-0.115946	microprocessors from Intel, AMD	-0.124939
-0.115946	supports both Intel, AMD	-0.124939
-0.644491	16kB aligned operands AMD	-0.124939
-0.642557	and x86-64 platforms. AMD	-0.124939
-0.325242	(MS) xopintrin.h (Gnu) AMD	-0.124939
-0.538660	See page 131. AMD	-0.124939
-0.538660	16kB unaligned op. AMD	-0.124939
-0.294098	AMD SSE4A ammintrin.h AMD	-0.124939
-0.237804	ia32intrin.h _mm_exp_ps _mm_exp_pd AMD	-0.124939
-0.237804	Library __vrs4_expf __vrd2_exp AMD	-0.124939
-0.237804	char 16 XOP, AMD	-0.124939
-0.237804	wmmintrin.h AVX immintrin.h AMD	-0.124939
-1.240191	solution is to compile	-0.124939
-0.841570	possibility is to compile	-0.124939
-1.664709	is possible to compile	-0.124939
-1.284853	it possible to compile	-0.124939
-1.773793	you want to compile	-0.124939
-1.034916	the code and compile	-0.124939
-0.462982	the framework and compile	-0.124939
-0.582756	course, that you compile	-0.124939
-0.558553	code when you compile	-0.124939
-0.354433	way. First you compile	-0.124939
-0.367074	calculate it at compile	-0.124939
-0.281321	the compiler at compile	-0.124939
-0.281321	as possible at compile	-0.124939
-0.281321	its value at compile	-0.124939
-0.281321	are available at compile	-0.124939
-0.281321	some calculations at compile	-0.124939
-0.281321	b) etc. at compile	-0.124939
-0.417268	are done at compile	-0.124939
-0.118269	is calculated at compile	-0.124939
-0.118269	be calculated at compile	-0.124939
-0.034829	is known at compile	-0.522879
-0.048758	be known at compile	-0.124939
-0.007756	not known at compile	-0.425969
-0.048758	integer known at compile	-0.124939
-0.048758	size known at compile	-0.124939
-0.048758	constant known at compile	-0.124939
-0.229345	is resolved at compile	-0.124939
-0.066834	always resolved at compile	-0.124939
-0.281321	are instantiated at compile	-0.124939
-0.281321	calculate (1./1.2345) at compile	-0.124939
-0.525061	references. If we compile	-0.124939
-1.442669	responsibility of the exception	-0.124939
-0.896840	it to the exception	-0.124939
-0.874716	support for the exception	-0.124939
-0.587641	information for the exception	-0.124939
-0.593763	function, then the exception	-0.124939
-0.816647	turning off the exception	-0.124939
-0.590761	The program is exception	-0.124939
-1.230034	the cost of exception	-0.124939
-0.504921	possible alternatives to exception	-0.124939
-0.504795	for debugging and exception	-0.124939
-0.358113	of overflow. The exception	-0.124939
-0.358113	support anyway. The exception	-0.124939
-0.858150	off support for exception	-0.124939
-0.527054	may think that exception	-0.124939
-0.834849	is used by exception	-0.124939
-0.540656	program relies on exception	-0.124939
-0.537040	case of an exception	-0.124939
-0.349967	and if an exception	-0.124939
-0.349967	what if an exception	-0.124939
-0.347733	will catch an exception	-0.124939
-0.347733	possibly throw an exception	-0.124939
-0.347733	for raising an exception	-0.124939
-0.893210	optimal to use exception	-0.124939
-0.526136	make your program exception	-0.124939
-0.358062	some compilers. If exception	-0.124939
-1.819349	there is no exception	-0.124939
-0.500218	even if no exception	-0.124939
-0.586244	instead of using exception	-0.124939
-0.539478	etc. The C++ exception	-0.124939
-0.357418	change its possible exception	-0.124939
-0.357278	never throw any exception	-0.124939
-0.349547	/Qipo -ipo No exception	-0.124939
-0.348897	and possibly save exception	-0.124939
-1.004488	the reason why exception	-0.124939
-0.212194	compatible with structured exception	-0.124939
-0.212194	relies on structured exception	-0.124939
-0.044133	You can disable exception	-0.425969
-0.294061	c[arraysize]; // Enable exception	-0.124939
-0.065768	to wrap the allocated	-0.124939
-0.358636	that owns the allocated	-0.124939
-0.572229	object that is allocated	-0.124939
-0.572229	everything that is allocated	-0.124939
-0.581903	Each object is allocated	-0.124939
-0.658013	memory block is allocated	-0.124939
-0.358882	all cleanup of allocated	-0.124939
-0.462646	programming error. The allocated	-0.124939
-0.358149	garbage collection. The allocated	-0.124939
-0.855563	size can be allocated	-0.124939
-1.009708	array can be allocated	-0.124939
-1.113757	objects can be allocated	-0.124939
-0.577647	stack can be allocated	-0.124939
-0.577647	arrays can be allocated	-0.124939
-0.588673	memory will be allocated	-0.124939
-0.593013	Objects that are allocated	-0.124939
-1.501374	if there are allocated	-0.124939
-0.065477	different sizes are allocated	-0.425969
-0.583332	anything it has allocated	-0.124939
-0.358045	saved. Any other allocated	-0.124939
-0.543657	pointers to all allocated	-0.124939
-0.567850	important that all allocated	-0.124939
-0.587556	manager for each allocated	-0.124939
-0.589047	errors; make sure allocated	-0.124939
-0.355751	caching inefficient. An allocated	-0.124939
-0.860243	that has been allocated	-0.124939
-0.559168	in its own allocated	-0.124939
-0.159595	stored in dynamically allocated	-0.124939
-0.159595	such as dynamically allocated	-0.124939
-0.159595	to different dynamically allocated	-0.124939
-0.159595	many small dynamically allocated	-0.124939
-0.159595	to align dynamically allocated	-0.124939
-0.034742	12.8 Aligning dynamically allocated	-0.124939
-0.159595	of aligning dynamically allocated	-0.124939
-0.279381	allocated memory Memory allocated	-0.124939
-0.279381	up include: Memory allocated	-0.124939
-0.781063	the time slices allocated	-0.124939
-1.945996	if it is small	-0.124939
-1.860803	the function is small	-0.124939
-0.597213	addition. This is small	-0.124939
-0.900560	of elements is small	-0.124939
-0.894733	loop count is small	-0.124939
-0.891983	repeat count is small	-0.124939
-0.597216	register is a small	-0.124939
-1.066153	used in a small	-0.124939
-0.884807	loop with a small	-0.124939
-1.236558	rather than a small	-0.124939
-1.760408	to make a small	-0.124939
-0.559869	use only a small	-0.124939
-0.829159	or writing a small	-0.124939
-0.937902	to allocate a small	-0.124939
-0.358869	the design of small	-0.124939
-0.550712	also relevant to small	-0.124939
-0.358790	memory economy and small	-0.124939
-0.588547	Microcontrollers used in small	-0.124939
-0.525128	programs, except for small	-0.124939
-0.065618	Approximate exp(x) for small	-0.425969
-0.358611	assembly instructions or small	-0.124939
-0.463185	particularly important on small	-0.124939
-0.463132	to be as small	-0.124939
-0.785720	be divided into small	-0.124939
-0.309923	even on such small	-0.124939
-0.309923	fast on such small	-0.124939
-0.351440	divided into many small	-0.124939
-0.351440	program uses many small	-0.124939
-0.524841	except for some small	-0.124939
-0.558532	object is so small	-0.124939
-0.528781	that are so small	-0.124939
-0.571254	body is very small	-0.124939
-0.350254	better on very small	-0.124939
-0.497725	functions are typically small	-0.124939
-0.464374	C are too small	-0.124939
-0.329511	Execution time too small	-0.124939
-0.786332	reading or writing small	-0.124939
-0.331747	justifies the relatively small	-0.124939
-0.595029	preferably be kept small	-0.124939
-0.598883	Testing for the overflow	-0.124939
-0.591488	(5) make the overflow	-0.124939
-0.596003	result because the overflow	-0.124939
-0.857699	in case of overflow	-0.124939
-0.462285	to problems of overflow	-0.124939
-0.556029	but risk of overflow	-0.124939
-0.507805	to check for overflow	-0.124939
-0.302688	not check for overflow	-0.124939
-0.359968	no check for overflow	-0.425969
-0.302688	might check for overflow	-0.124939
-0.302688	(1) check for overflow	-0.124939
-1.634467	make sure that overflow	-0.124939
-0.658129	so big that overflow	-0.124939
-0.463181	wrap around on overflow	-0.124939
-0.536312	Number) if an overflow	-0.124939
-0.902961	will generate an overflow	-0.124939
-0.354869	optimize away an overflow	-0.124939
-0.200307	and to make overflow	-0.425969
-0.595473	Catch floating point overflow	-0.124939
-0.581231	// Floating point overflow	-0.124939
-0.492505	+127. An integer overflow	-0.124939
-0.514126	that signed integer overflow	-0.124939
-0.349952	(3) trap integer overflow	-0.124939
-0.881479	sure that no overflow	-0.124939
-0.538919	arrays. An array overflow	-0.124939
-0.538943	aware of possible overflow	-0.124939
-0.459863	worry much about overflow	-0.124939
-0.459559	negative result. An overflow	-0.124939
-0.441385	that doesn't cause overflow	-0.124939
-0.341348	reduction would cause overflow	-0.124939
-0.353929	pool. 15 Integer overflow	-0.124939
-0.455178	Higher inputs give overflow	-0.124939
-0.350706	variables. A positive overflow	-0.124939
-0.345072	check for buffer overflow	-0.124939
-0.325232	if protection against overflow	-0.124939
-0.314601	compiler to ignore overflow	-0.124939
-0.237796	result. An uncaught overflow	-0.124939
-0.573275	double b; a +=	-0.124939
-0.358553	Example 8.5b a +=	-0.124939
-0.060261	< 100; i +=	-0.124939
-0.317170	< size; i +=	-0.124939
-0.009097	< 256; i +=	-0.522879
-0.317170	< 20; i +=	-0.124939
-0.520024	= temp; temp +=	-0.124939
-0.245507	n++) { sum +=	-0.124939
-0.067494	100; i++) sum +=	-0.124939
-0.067494	size; i++) sum +=	-0.124939
-0.067494	i<100; i++) sum +=	-0.124939
-0.045296	else { list[i] +=	-0.425969
-0.414447	i<300; i++){ list[i] +=	-0.124939
-0.219084	i<300; i+=3,i_div_3++){ list[i] +=	-0.124939
-0.336119	< r1; c1 +=	-0.124939
-0.237902	_mm_load_ps(coef+i); // s +=	-0.124939
-0.237902	for(inti=0;i<16;i+=4){ //Loopby4 s +=	-0.124939
-0.237826	2) { sum1 +=	-0.124939
-0.237826	+= list[i+1];} sum1 +=	-0.124939
-0.325222	if nonzero u.i +=	-0.124939
-0.325222	+= list[i]; sum2 +=	-0.124939
-0.325222	= Y; Y +=	-0.124939
-0.314667	+= i_div_3; list[i+1] +=	-0.124939
-0.314591	< SIZE; r1 +=	-0.124939
-0.314667	+= i_div_3; list[i+2] +=	-0.124939
-0.314591	+= Z; Z +=	-0.124939
-0.314591	+= a[i+2]; s3 +=	-0.124939
-0.314591	+= a[i+1]; s2 +=	-0.124939
-0.294080	4) { s0 +=	-0.124939
-0.294080	+= a[i]; s1 +=	-0.124939
-0.237788	modify x *const_cast<int*>(&x) +=	-0.124939
-0.237788	list[i & 15] +=	-0.124939
-0.237788	100; i++) matrix[FuncRow(i)][FuncCol(i)] +=	-0.124939
-0.237788	j++) 39 matrix[i][j] +=	-0.124939
-0.541285	inputs are the integers	-0.124939
-1.392861	The size of integers	-0.124939
-0.358211	allow addition of integers	-0.124939
-0.540375	point Conversion of integers	-0.124939
-0.358540	these types to integers	-0.124939
-0.358540	point numbers to integers	-0.124939
-0.065731	point numbers and integers	-0.124939
-0.591902	whether they are integers	-0.124939
-0.357804	bitwise operators using integers	-0.124939
-0.519867	each, or two integers	-0.124939
-0.455064	integer. If two integers	-0.124939
-0.487962	systems or 64-bit integers	-0.124939
-0.243432	can use 64-bit integers	-0.124939
-0.243432	may use 64-bit integers	-0.124939
-0.138579	need conversions between integers	-0.124939
-0.138579	Avoid conversions between integers	-0.124939
-0.530646	enabled. Conversions between integers	-0.124939
-0.343135	efficiency of 32-bit integers	-0.124939
-0.343135	to use 32-bit integers	-0.124939
-0.443637	than two 32-bit integers	-0.124939
-0.289905	Conversion of unsigned integers	-0.124939
-0.144849	signed and unsigned integers	-0.124939
-0.158649	Signed and unsigned integers	-0.124939
-0.121199	than with unsigned integers	-0.124939
-0.121199	signed with unsigned integers	-0.124939
-0.289905	(2) use unsigned integers	-0.124939
-0.289905	to convert unsigned integers	-0.124939
-0.289905	Signed versus unsigned integers	-0.124939
-0.354993	bits each, four integers	-0.124939
-0.354855	bits each, eight integers	-0.124939
-0.331874	behavior of signed integers	-0.124939
-0.467602	integers to signed integers	-0.124939
-0.637138	of eight 16-bit integers	-0.124939
-0.343633	instructions cannot multiply integers	-0.124939
-0.325317	contain either sixteen integers	-0.124939
-0.212270	stored as 8-bit integers	-0.124939
-0.212270	are using 8-bit integers	-0.124939
-0.180426	library with the option	-0.124939
-0.494724	compiled with the option	-0.124939
-0.180426	compile with the option	-0.124939
-0.494724	overflow with the option	-0.124939
-0.494724	link with the option	-0.124939
-0.596694	Turn on the option	-0.124939
-0.357746	option. Use the option	-0.124939
-0.357746	implemented. Use the option	-0.124939
-0.357428	with, e.g. the option	-0.124939
-0.355813	such optimizations with option	-0.124939
-0.459678	object made with option	-0.124939
-0.355813	behavior well-defined with option	-0.124939
-0.356895	compiler manual. This option	-0.124939
-0.356895	or -fsource-asm). This option	-0.124939
-0.060488	compilers have an option	-0.425969
-0.150610	compiler has an option	-0.425969
-0.347770	you specify an option	-0.124939
-0.583207	specify the compiler option	-0.124939
-0.583207	Adding the compiler option	-0.124939
-0.560195	use a compiler option	-0.124939
-0.589412	here. The compiler option	-0.124939
-0.350971	the same compiler option	-0.124939
-0.462935	compiler supports this option	-0.124939
-0.502790	compiled without any option	-0.124939
-0.356742	the strongest optimization option	-0.124939
-0.851039	the exception handling option	-0.124939
-0.348510	The assembly output option	-0.124939
-0.348510	an assembly output option	-0.124939
-0.640386	the loop unroll option	-0.124939
-0.347370	static linking (e.g. option	-0.124939
-0.341766	optimal. Use 12 option	-0.124939
-0.314659	case. The -fpie option	-0.124939
-0.382748	the source annotation option	-0.124939
-0.237845	"generate map file" option	-0.124939
-1.266320	if it is good	-0.425969
-0.599197	product. It is good	-0.124939
-0.503742	Single precision is good	-0.124939
-0.242044	compiler is a good	-0.602060
-0.567873	structure is a good	-0.124939
-1.305942	can be a good	-0.124939
-1.198017	is not a good	-0.124939
-0.528596	that has a good	-0.124939
-0.528596	reader has a good	-0.124939
-0.521278	language because a good	-0.124939
-0.354841	have done a good	-0.124939
-0.354841	is therefore a good	-0.124939
-0.609257	to get a good	-0.124939
-0.430754	and get a good	-0.124939
-0.501246	be quite a good	-0.124939
-1.089510	the availability of good	-0.124939
-0.358806	My recommendation for good	-0.124939
-0.314105	high-level languages are good	-0.124939
-0.314105	Low-level languages are good	-0.124939
-0.353977	not always as good	-0.124939
-0.353977	Not optimized as good	-0.124939
-0.353977	not optimize as good	-0.124939
-0.353977	are cached as good	-0.124939
-1.648658	It is not good	-0.124939
-0.507758	2; } A good	-0.124939
-0.446719	in performance. A good	-0.124939
-0.345579	of zero. A good	-0.124939
-0.345579	one operation. A good	-0.124939
-0.345579	simple index. A good	-0.124939
-0.345579	have exploited. A good	-0.124939
-0.358028	For example, all good	-0.124939
-0.357441	Linux. Has many good	-0.124939
-1.048292	is a very good	-0.124939
-0.336838	are not very good	-0.124939
-0.474400	libraries have very good	-0.124939
-0.435714	by some very good	-0.124939
-0.314698	(low numbers mean good	-0.124939
-1.295019	calculation of the power	-0.124939
-0.122681	x to the power	-0.602060
-0.469396	to is a power	-0.124939
-0.322201	that is a power	-0.425969
-1.024966	This is a power	-0.124939
-0.322201	size is a power	-0.425969
-0.469396	constant is a power	-0.124939
-0.762835	matrix is a power	-0.124939
-0.469396	columns is a power	-0.124939
-0.322201	N is a power	-0.425969
-0.469396	factor is a power	-0.124939
-0.469396	abc is a power	-0.124939
-0.174147	divisor is a power	-0.425969
-0.469396	interval is a power	-0.124939
-0.474531	always be a power	-0.124939
-0.299992	preferably be a power	-0.602060
-1.195027	division by a power	-0.124939
-0.524054	multiply by a power	-0.124939
-0.524054	Multiplying by a power	-0.124939
-1.168747	is not a power	-0.124939
-0.452962	a matrix a power	-0.124939
-0.350517	not been a power	-0.124939
-0.350517	of columns a power	-0.124939
-0.350517	for N a power	-0.124939
-0.462414	15.1b. Calculate integer power	-0.124939
-0.354008	Example 15.1d. Integer power	-0.124939
-0.157115	is a high power	-0.425969
-0.348324	the high processing power	-0.124939
-0.444357	processors with low power	-0.124939
-0.331869	memory and computing power	-0.124939
-0.237902	utilize the computational power	-0.124939
-1.570881	size of the matrix	-0.124939
-1.467350	multiple of the matrix	-0.124939
-1.636799	address of the matrix	-0.124939
-0.597079	2 and the matrix	-0.124939
-0.599954	rows in the matrix	-0.124939
-1.768957	to make the matrix	-0.124939
-1.221350	to divide the matrix	-0.124939
-0.357887	to transpose the matrix	-0.124939
-0.373222	size of a matrix	-0.124939
-0.845144	element in a matrix	-0.124939
-0.121508	columns in a matrix	-0.301030
-0.583117	2 if a matrix	-0.124939
-0.065362	to transpose a matrix	-0.124939
-0.459225	But implementing a matrix	-0.124939
-0.459225	example transposes a matrix	-0.124939
-0.355457	} Transposing a matrix	-0.124939
-0.504969	function writes to matrix	-0.124939
-0.152797	and columns in matrix	-0.425969
-0.357929	of rows/columns in matrix	-0.124939
-0.358797	cache lines for matrix	-0.124939
-0.355709	page 78). A matrix	-0.124939
-0.355709	structure needed? A matrix	-0.124939
-0.582923	cell for different matrix	-0.124939
-0.556703	4 with different matrix	-0.124939
-0.307770	a 64 64 matrix	-0.124939
-0.307770	The 64 64 matrix	-0.124939
-0.809412	in a big matrix	-0.124939
-0.103016	transpose and copy matrix	-0.425969
-0.465757	a 512 512 matrix	-0.124939
-0.330524	The 512 512 matrix	-0.124939
-0.348322	execution times per matrix	-0.124939
-0.507153	size // define matrix	-0.124939
-0.946366	function to transpose matrix	-0.124939
-0.591814	work on a Linux	-0.124939
-0.587391	newer versions of Linux	-0.124939
-0.504952	are identical to Linux	-0.124939
-0.573126	The Windows and Linux	-0.124939
-0.556790	well as in Linux	-0.124939
-0.568860	and data in Linux	-0.124939
-0.357449	been introduced in Linux	-0.124939
-0.357449	be overridden in Linux	-0.124939
-0.852308	Intel compiler for Linux	-0.124939
-1.082723	are available for Linux	-0.124939
-1.038282	good choice for Linux	-0.124939
-0.358694	BigArray[1024] __attribute__((aligned(64))); // Linux	-0.124939
-0.358584	generally possible on Linux	-0.124939
-0.190862	Windows Intel compiler Linux	-0.124939
-0.108537	Windows Gnu compiler Linux	-0.602060
-0.750311	32-bit and 64-bit Linux	-0.124939
-0.330731	efficient in 64-bit Linux	-0.425969
-0.487551	faster in 64-bit Linux	-0.124939
-0.479165	available for 64-bit Linux	-0.124939
-0.325697	numbers. Therefore, 64-bit Linux	-0.124939
-0.357501	option /MT). In Linux	-0.124939
-0.675694	Windows and 32-bit Linux	-0.124939
-0.572116	-fpic in 32-bit Linux	-0.124939
-0.336288	addressing. In 32-bit Linux	-0.124939
-0.435024	difference between 32-bit Linux	-0.124939
-1.208771	in 64 bit Linux	-0.124939
-0.950119	in 32 bit Linux	-0.124939
-0.654006	said here about Linux	-0.124939
-0.977779	Intel compiler Windows Linux	-0.124939
-0.350716	but also supports Linux	-0.124939
-0.112440	guide for Windows, Linux	-0.425969
-0.373684	and 64-bit Windows, Linux	-0.124939
-0.314640	platform _WIN32 _WIN32 Linux	-0.124939
-0.352013	They have not been	-0.124939
-0.246229	it has not been	-0.124939
-0.246229	library has not been	-0.124939
-0.352013	columns had not been	-0.124939
-0.352013	IDE. Has not been	-0.124939
-0.332776	the code have been	-0.124939
-0.332776	the compiler have been	-0.124939
-0.332776	the program have been	-0.124939
-0.609056	and b have been	-0.124939
-0.061505	all objects have been	-0.301030
-0.468834	all elements have been	-0.124939
-0.332776	the examples have been	-0.124939
-0.332776	the diagonal have been	-0.124939
-0.332776	N1 could have been	-0.124939
-0.332776	hot spots have been	-0.124939
-0.332776	in isolation have been	-0.124939
-0.341444	file that has been	-0.124939
-0.341444	everything that has been	-0.124939
-0.618593	if it has been	-0.124939
-0.518325	because it has been	-0.124939
-0.146253	after it has been	-0.425969
-0.303918	this time has been	-0.124939
-0.443880	the pointer has been	-0.124939
-0.314394	link pointer has been	-0.124939
-0.125899	of registers has been	-0.124939
-0.125899	vector registers has been	-0.124939
-0.303918	the file has been	-0.124939
-0.303918	This problem has been	-0.124939
-0.303918	repeat count has been	-0.124939
-0.303918	pointer p has been	-0.124939
-0.303918	the STL has been	-0.124939
-0.303918	until seconds has been	-0.124939
-0.303918	hot spot has been	-0.124939
-0.303918	indirect function" has been	-0.124939
-0.303918	initialisation i=0; has been	-0.124939
-0.331928	that has already been	-0.124939
-0.574741	turned up to cause	-0.124939
-0.358221	too small to cause	-0.124939
-1.745034	is likely to cause	-0.124939
-0.555732	unpredictable times and cause	-0.124939
-0.357453	be invalid and cause	-0.124939
-0.357453	critical stride and cause	-0.124939
-0.357453	other's caches and cause	-0.124939
-0.527113	resource problems that cause	-0.124939
-1.036969	because it can cause	-0.124939
-0.690353	stack. This can cause	-0.124939
-0.690353	modified. This can cause	-0.124939
-0.482794	defined. This can cause	-0.124939
-0.482794	patterns. This can cause	-0.124939
-0.487496	then this can cause	-0.124939
-0.447679	static memory can cause	-0.124939
-0.518046	an array can cause	-0.124939
-0.346339	of software can cause	-0.124939
-0.346339	intermediate calculations can cause	-0.124939
-0.487496	array overflow can cause	-0.124939
-0.346339	This alignment can cause	-0.124939
-0.346339	same generation can cause	-0.124939
-0.346339	the BTB can cause	-0.124939
-0.922907	that it may cause	-0.124939
-0.922907	because it may cause	-0.124939
-0.466141	modules. This may cause	-0.124939
-0.466141	reasons. This may cause	-0.124939
-0.512422	CPUs which may cause	-0.124939
-0.348784	class members may cause	-0.124939
-0.350457	do so will cause	-0.124939
-0.350457	overloaded operators will cause	-0.124939
-0.064718	address 0x2710 will cause	-0.124939
-0.536789	size that doesn't cause	-0.124939
-0.343375	signed integer doesn't cause	-0.124939
-1.054675	is a common cause	-0.124939
-1.057142	the most common cause	-0.124939
-0.458327	the reduction would cause	-0.124939
-0.626286	is a frequent cause	-0.124939
-0.429545	identification. Such schemes cause	-0.124939
-0.378028	advantage of the AVX	-0.425969
-0.597788	as to the AVX	-0.124939
-0.599254	YMM in the AVX	-0.124939
-1.144976	compiled for the AVX	-0.124939
-0.586502	compiling for the AVX	-0.124939
-0.589245	available if the AVX	-0.124939
-0.589245	(YMM) if the AVX	-0.124939
-1.784247	to use the AVX	-0.124939
-0.594260	105). If the AVX	-0.124939
-0.065575	before leaving the AVX	-0.425969
-0.357123	12.1a. Enable the AVX	-0.124939
-0.462646	if possible. The AVX	-0.124939
-0.358149	and later. The AVX	-0.124939
-0.562904	code compiled for AVX	-0.124939
-0.197197	11) { // AVX	-0.425969
-0.503034	parm2) {...} // AVX	-0.425969
-0.566916	run only if AVX	-0.124939
-0.414266	is compiled with AVX	-0.124939
-0.584623	code compiled with AVX	-0.124939
-0.358357	For example, use AVX	-0.124939
-0.355412	when going from AVX	-0.124939
-0.355412	any transition from AVX	-0.124939
-0.568812	library has no AVX	-0.124939
-0.503557	SSE4.1 instr. set AVX	-0.124939
-0.461125	or int 4 AVX	-0.124939
-0.103166	with and without AVX	-0.124939
-0.515643	is compiled without AVX	-0.124939
-0.356177	string search instructions AVX	-0.124939
-0.607271	64 4 256 AVX	-0.124939
-0.607271	32 8 256 AVX	-0.124939
-0.964505	the operating system. AVX	-0.124939
-0.102837	vector elements. 12.1 AVX	-0.124939
-0.102837	operations............................................................................................... 105 12.1 AVX	-0.124939
-0.237837	AES, PCLMUL wmmintrin.h AVX	-0.124939
-1.530539	the use of classes	-0.124939
-0.659664	7.17 Structures and classes	-0.124939
-0.805836	text strings in classes	-0.124939
-0.060666	functions or vector classes	-0.124939
-0.491790	dispatching with vector classes	-0.124939
-0.414807	are using vector classes	-0.124939
-0.501101	using Intel vector classes	-0.124939
-0.120579	12.5 Using vector classes	-0.124939
-0.105855	// Define vector classes	-0.124939
-0.320106	using Agner vector classes	-0.124939
-0.517262	using Agner's vector classes	-0.124939
-0.131205	of predefined vector classes	-0.124939
-0.131205	Use predefined vector classes	-0.124939
-0.539454	organizing data into classes	-0.124939
-0.357535	vectors into C++ classes	-0.124939
-0.109554	examples of container classes	-0.124939
-0.109554	discussion of container classes	-0.124939
-0.256420	Remember that container classes	-0.124939
-0.256420	to make container classes	-0.124939
-0.256420	of example container classes	-0.124939
-0.256420	many standard container classes	-0.124939
-0.256420	your own container classes	-0.124939
-0.256420	www.agner.org/optimize/cppexamples.zip containing container classes	-0.124939
-0.329588	implementations of string classes	-0.124939
-0.426630	applications. The string classes	-0.124939
-0.449119	// The child classes	-0.124939
-0.294243	Table 12.5. Vector classes	-0.124939
-0.294243	Table 12.1. Vector classes	-0.124939
-0.343716	from multiple parent classes	-0.124939
-0.048386	memory allocation. Container classes	-0.124939
-0.023519	90 9.7 Container classes	-0.124939
-0.023519	alloca. 9.7 Container classes	-0.124939
-0.048386	Multiple threads? Container classes	-0.124939
-0.237861	there are wrapper classes	-0.124939
-0.237861	59 third generations classes	-0.124939
-1.827424	if it is done	-0.124939
-0.880162	as it is done	-0.124939
-0.582283	called. This is done	-0.124939
-0.582283	loop. This is done	-0.124939
-0.569285	smaller size is done	-0.124939
-1.042784	memory allocation is done	-0.124939
-0.653905	the multiplication is done	-0.124939
-0.653905	function inlining is done	-0.124939
-0.355937	The branching is done	-0.124939
-0.355937	or C2::Disp() is done	-0.124939
-0.355937	address. Relocation is done	-0.124939
-0.463531	be standardized and done	-0.124939
-1.274458	have to be done	-0.124939
-1.171079	has to be done	-0.124939
-1.408163	it can be done	-0.124939
-0.955704	This can be done	-0.301030
-0.830196	integer can be done	-0.124939
-0.564086	2 can be done	-0.124939
-0.564086	polynomial can be done	-0.124939
-0.585158	constant should be done	-0.124939
-0.571859	do must be done	-0.124939
-0.906574	should preferably be done	-0.124939
-0.563803	point operations are done	-0.124939
-0.229102	address calculations are done	-0.124939
-0.229102	All calculations are done	-0.124939
-0.229102	certain calculations are done	-0.124939
-0.354802	performance tests are done	-0.124939
-0.354802	9. Multiplications are done	-0.124939
-0.358503	easier said than done	-0.124939
-0.585135	easier. I have done	-0.124939
-0.356222	whether others have done	-0.124939
-0.583358	work it has done	-0.124939
-0.462500	This is all done	-0.124939
-0.354122	to 15.1c was done	-0.124939
-0.562439	This is usually done	-0.124939
-0.698702	is not necessarily done	-0.124939
-0.544760	checking and is therefore	-0.124939
-0.927012	code. It is therefore	-0.124939
-0.742799	memory. It is therefore	-0.124939
-0.514561	compiler. It is therefore	-0.124939
-0.514561	executed. It is therefore	-0.124939
-0.514561	expressions. It is therefore	-0.124939
-0.514561	constant. It is therefore	-0.124939
-0.514561	73). It is therefore	-0.124939
-0.514561	efficiently. It is therefore	-0.124939
-0.514561	72. It is therefore	-0.124939
-0.514561	can. It is therefore	-0.124939
-0.514561	correctness. It is therefore	-0.124939
-0.545778	Efficient caching is therefore	-0.124939
-0.796991	to p is therefore	-0.124939
-0.557112	main memory and therefore	-0.124939
-0.354706	to make and therefore	-0.124939
-0.570197	local variables and therefore	-0.124939
-0.458272	dedicated microprocessor and therefore	-0.124939
-0.651463	operating system, and therefore	-0.124939
-0.354706	to understand and therefore	-0.124939
-0.354706	of m and therefore	-0.124939
-0.354706	does not, and therefore	-0.124939
-0.354706	be non-zero, and therefore	-0.124939
-0.354706	system dependent and therefore	-0.124939
-0.568976	arithmetic operations are therefore	-0.124939
-0.357992	itself. Constructors are therefore	-0.124939
-1.371704	the code can therefore	-0.124939
-0.567569	time. It can therefore	-0.124939
-0.653764	memory allocation can therefore	-0.124939
-0.576669	'this'. We can therefore	-0.124939
-0.358506	The developers may therefore	-0.124939
-0.355695	above examples will therefore	-0.124939
-0.355695	and 14.30 will therefore	-0.124939
-0.476558	vectorized code should therefore	-0.124939
-0.504699	space. It should therefore	-0.124939
-0.376309	them. You should therefore	-0.124939
-0.376309	late. You should therefore	-0.124939
-0.338409	point calculations should therefore	-0.124939
-0.338409	Lazy binding should therefore	-0.124939
-0.463606	type holds a precision	-0.124939
-0.550433	same regardless of precision	-0.124939
-0.540805	and loss of precision	-0.124939
-1.413016	with the same precision	-0.124939
-0.587640	keep the same precision	-0.124939
-0.595515	relax floating point precision	-0.124939
-0.581282	integer. Floating point precision	-0.124939
-0.289785	with the double precision	-0.124939
-0.447495	is a double precision	-0.124939
-0.411039	precision to double precision	-0.124939
-0.273599	single and double precision	-0.124939
-0.289785	know that double precision	-0.124939
-0.289785	constants are double precision	-0.124939
-0.377407	faster than double precision	-0.124939
-0.289785	may use double precision	-0.124939
-0.289785	of two double precision	-0.124939
-0.348003	have long double precision	-0.124939
-0.348003	when long double precision	-0.124939
-0.289785	hold four double precision	-0.124939
-0.289785	most cases, double precision	-0.124939
-0.289785	precision. Using double precision	-0.124939
-0.121158	are: Long double precision	-0.124939
-0.121158	precision. Long double precision	-0.124939
-0.282919	higher for single precision	-0.124939
-0.369022	may use single precision	-0.124939
-0.282919	b from single precision	-0.124939
-0.282919	are using single precision	-0.124939
-0.282919	the constant single precision	-0.124939
-0.282919	or four single precision	-0.124939
-0.282919	or eight single precision	-0.124939
-0.134632	instructions for high precision	-0.124939
-0.134632	Libraries for high precision	-0.124939
-0.351270	mixed precision require precision	-0.124939
-0.434953	operands have mixed precision	-0.124939
-0.102852	more time. Single precision	-0.124939
-0.102852	data cache. Single precision	-0.124939
-0.237877	instruction set. High precision	-0.124939
-0.237877	is enabled (single precision	-0.124939
-0.841212	doesn't have the line	-0.124939
-0.566824	line then the line	-0.124939
-0.566824	values then the line	-0.124939
-0.573274	size with a line	-0.124939
-0.573274	ways, with a line	-0.124939
-0.358630	each pixel or line	-0.124939
-0.463245	interpreted line by line	-0.124939
-0.358592	column 29 with line	-0.124939
-1.203507	may replace this line	-0.124939
-0.354202	a code one line	-0.124939
-1.109027	more than one line	-0.124939
-0.141529	by the cache line	-0.124939
-0.352832	before the cache line	-0.124939
-0.352832	least the cache line	-0.124939
-0.352832	evict the cache line	-0.124939
-0.415793	so a cache line	-0.124939
-0.415793	fetching a cache line	-0.124939
-0.221322	processors. The cache line	-0.124939
-0.221322	line. The cache line	-0.124939
-0.305533	align by cache line	-0.124939
-0.305533	87 used cache line	-0.124939
-0.305533	a new cache line	-0.124939
-0.126434	64. Each cache line	-0.124939
-0.126434	29. Each cache line	-0.124939
-0.305533	an entire cache line	-0.124939
-0.587548	file for each line	-0.124939
-0.911320	of the matrix line	-0.124939
-0.811807	of a matrix line	-0.124939
-0.650330	// if above line	-0.124939
-0.549476	range. The next line	-0.124939
-0.352884	negative. The last line	-0.124939
-0.352286	64 bytes. Each line	-0.124939
-0.341748	is and interpreted line	-0.124939
-0.331852	remove the memset line	-0.124939
-0.160682	from the command line	-0.124939
-0.135887	on a command line	-0.124939
-0.135887	from a command line	-0.124939
-0.237829	Table 18.1. Command line	-0.124939
-0.562787	for overflow and works	-0.124939
-1.032219	of code that works	-0.124939
-0.771805	the one that works	-0.124939
-0.447794	only one that works	-0.124939
-0.906622	function library that works	-0.124939
-0.780798	The version that works	-0.124939
-0.902546	make sure it works	-0.124939
-0.526737	16.1. This code works	-0.124939
-0.142259	program optimization. This works	-0.124939
-0.142259	profile-guided optimization. This works	-0.124939
-0.355213	absolute addresses. This works	-0.124939
-1.555005	the Intel compiler works	-0.124939
-0.501069	describes how this works	-0.124939
-0.356103	Of course, this works	-0.124939
-0.358302	accessed sequentially. It works	-0.124939
-0.504138	see which one works	-0.124939
-0.493107	cache. The cache works	-0.124939
-0.166549	The code cache works	-0.124939
-0.447417	sequentially A cache works	-0.124939
-0.461771	functions. It also works	-0.124939
-0.575934	thread. This method works	-0.124939
-0.548307	that this method works	-0.124939
-0.355707	manual currently doesn't works	-0.124939
-0.580450	Explicit CPU dispatching works	-0.124939
-0.649438	all code branches works	-0.124939
-0.353541	particular code implementation works	-0.124939
-0.602908	out-of-order execution mechanism works	-0.124939
-0.329511	The renaming mechanism works	-0.124939
-0.519957	user. Dynamic linking works	-0.124939
-0.530625	way a profiler works	-0.124939
-0.516985	The automatic vectorization works	-0.124939
-0.331758	program that already works	-0.124939
-0.172593	rather than "what works	-0.124939
-0.172593	typically thinks "what works	-0.124939
-0.294061	14.12b and 14.13b works	-0.124939
-0.294061	necessary. 101 Multithreading works	-0.124939
-0.237772	OR operator (|) works	-0.124939
-0.600492	support in the optimized	-0.124939
-0.938793	is not the optimized	-0.124939
-0.526620	you run the optimized	-0.124939
-0.358475	example 12.2, the optimized	-0.124939
-1.771734	the code is optimized	-0.124939
-0.834432	data cache is optimized	-0.124939
-0.504123	some expression is optimized	-0.124939
-0.575540	good compilers and optimized	-0.124939
-0.358860	are obscured in optimized	-0.124939
-0.462624	Boolean output. The optimized	-0.124939
-0.358131	level framework. The optimized	-0.124939
-1.648007	that can be optimized	-0.124939
-1.407868	code can be optimized	-0.124939
-0.588640	!a; can be optimized	-0.124939
-0.876742	can often be optimized	-0.124939
-0.585735	Some functions are optimized	-0.124939
-0.462365	these examples are optimized	-0.124939
-0.358611	reordered, inlined, or optimized	-0.124939
-0.593610	microprocessors are not optimized	-0.124939
-0.586875	functionality of an optimized	-0.124939
-0.586327	processor that you optimized	-0.124939
-0.357831	same function, each optimized	-0.124939
-0.339452	of the best optimized	-0.124939
-0.756752	core library contains optimized	-0.124939
-0.336951	use the well optimized	-0.124939
-0.336951	with a well optimized	-0.124939
-0.842441	pointer is simply optimized	-0.124939
-0.346371	www.agner.org/optimize/asmlib.zip. Currently includes optimized	-0.124939
-0.287492	in the fully optimized	-0.124939
-0.287492	are not fully optimized	-0.124939
-0.155651	But a highly optimized	-0.124939
-0.167271	functions are highly optimized	-0.124939
-0.255437	libraries are highly optimized	-0.124939
-0.155651	consider making highly optimized	-0.124939
-0.172612	Gnu compiler. Not optimized	-0.124939
-0.172612	C++ builder. Not optimized	-0.124939
-0.294098	versions, each carefully optimized	-0.124939
-1.961866	if it is inside	-0.124939
-1.047016	the loop is inside	-0.124939
-0.586323	they must be inside	-0.124939
-0.358670	by declaring it inside	-0.124939
-0.599111	by the code inside	-0.124939
-0.583912	piece of memory inside	-0.124939
-0.591667	algorithm is used inside	-0.124939
-0.357382	by making objects inside	-0.124939
-0.357275	the shared variable inside	-0.124939
-0.580902	declaring the table inside	-0.124939
-0.431945	predictable the branch inside	-0.124939
-0.431945	avoids the branch inside	-0.124939
-0.546404	with a branch inside	-0.124939
-0.434965	integers. The branch inside	-0.124939
-0.461222	c2 for elements inside	-0.124939
-0.355787	fixed size arrays inside	-0.124939
-0.509247	on the calculations inside	-0.124939
-0.331786	depends on calculations inside	-0.124939
-1.094749	floating point calculations inside	-0.124939
-0.650473	is a counter inside	-0.124939
-0.353674	and no branches inside	-0.124939
-0.365774	should be declared inside	-0.124939
-0.365774	preferably be declared inside	-0.124939
-0.206078	and objects declared inside	-0.425969
-0.100915	class Variables declared inside	-0.124939
-0.100915	inefficient. Variables declared inside	-0.124939
-0.347312	the performance counters inside	-0.124939
-0.447646	the overflow condition inside	-0.124939
-0.449505	body is defined inside	-0.124939
-0.299663	static object defined inside	-0.124939
-0.339330	the function body inside	-0.124939
-0.336088	at what happens inside	-0.124939
-0.331697	is needed. Objects inside	-0.124939
-0.331750	loop because nothing inside	-0.124939
-0.294052	(addition, multiplication, etc.) inside	-0.124939
-0.237763	be kept entirely inside	-0.124939
-0.237763	(other than log) inside	-0.124939
-0.463530	mode. See the manual	-0.124939
-0.463530	handling. See the manual	-0.124939
-1.227129	is not a manual	-0.124939
-0.358369	page 49 and manual	-0.124939
-0.358369	preceding paragraph and manual	-0.124939
-0.828089	are explained in manual	-0.124939
-0.776209	are given in manual	-0.124939
-0.650405	are discussed in manual	-0.124939
-0.354172	loops" chapter in manual	-0.124939
-0.498375	is provided in manual	-0.124939
-1.056604	are listed in manual	-0.124939
-0.354172	table 19 in manual	-0.124939
-0.141940	in detail in manual	-0.124939
-0.141940	more detail in manual	-0.124939
-0.354172	kernel code" in manual	-0.124939
-0.354172	are covered in manual	-0.124939
-0.351956	8 below. This manual	-0.124939
-0.646033	is important. This manual	-0.124939
-0.351956	and smaller. This manual	-0.124939
-0.351956	1 Introduction This manual	-0.124939
-0.351956	page 158. This manual	-0.124939
-0.873125	and the compiler manual	-0.124939
-1.034975	in the compiler manual	-0.124939
-0.092868	appendix to this manual	-0.602060
-0.558878	basis for this manual	-0.124939
-0.357036	simplest cases. See manual	-0.124939
-1.037989	in the Gnu manual	-0.124939
-0.350771	computers and my manual	-0.124939
-0.048398	the CPU (See manual	-0.425969
-0.237936	AMD CPUs (See manual	-0.124939
-0.237936	be mispredicted (See manual	-0.124939
-0.314714	of the present manual	-0.124939
-0.237891	Fog The present manual	-0.124939
-0.294173	See the vectorclass manual	-0.124939
-0.358844	can compute a /	-0.124939
-0.677329	a = b /	-0.124939
-0.695869	(a > b /	-0.124939
-0.357614	list[i] += i /	-0.124939
-0.351567	of sets). Here, /	-0.124939
-0.453119	b * 5 /	-0.124939
-0.452488	reciprocal_divisor = 1. /	-0.124939
-0.495750	c = temp /	-0.124939
-0.449898	cmp eax, 100 /	-0.124939
-0.490647	add ebx, eax /	-0.124939
-0.339256	sum += xn /	-0.124939
-0.339256	64-bit Windows. Borland /	-0.124939
-0.336048	Borland / CodeGear /	-0.124939
-0.325152	(total cache size) /	-0.124939
-0.324520	= (unsigned int)b /	-0.124939
-0.314523	discussed below. Signed /	-0.124939
-0.065762	calculations of (2n /	-0.124939
-0.065762	a * (2n /	-0.124939
-0.065762	The constant (2n /	-0.124939
-0.314523	is 512 kb /	-0.124939
-0.538514	y2 = a2 /	-0.124939
-0.538514	y1 = a1 /	-0.124939
-0.294015	b + 2.0 /	-0.124939
-0.294015	stride is 8192 /	-0.124939
-0.382589	c = (a+1) /	-0.124939
-0.237731	(a1*b2 + a2*b1) /	-0.124939
-0.237731	instructions mov ebx,eax /	-0.124939
-0.237731	(set) = (10000 /	-0.124939
-0.237731	= (unsigned int)a /	-0.124939
-0.237731	= (memory address) /	-0.124939
-0.237731	b * (1. /	-0.124939
-0.237731	14.00 for 80x86 /	-0.124939
-0.237731	(set) = (0x2710 /	-0.124939
-1.488006	This method is explained	-0.124939
-0.463255	bounds checking is explained	-0.124939
-0.357197	variable storage are explained	-0.124939
-0.461436	These factors are explained	-0.124939
-0.357197	name mangling are explained	-0.124939
-0.455661	optimizing code, as explained	-0.124939
-0.323107	non-Intel processors, as explained	-0.124939
-0.323107	64-bit mode, as explained	-0.124939
-0.323107	the system, as explained	-0.124939
-0.323107	be static, as explained	-0.124939
-0.323107	cleaned up, as explained	-0.124939
-0.323107	of precision, as explained	-0.124939
-0.323107	out-of-order execution, as explained	-0.124939
-0.323107	cache space, as explained	-0.124939
-0.323107	vector operations, as explained	-0.124939
-0.323107	for metaprogramming, as explained	-0.124939
-0.323107	vector classes, as explained	-0.124939
-0.323107	class templates, as explained	-0.124939
-0.323107	eliminate branches, as explained	-0.124939
-0.323107	register use, as explained	-0.124939
-0.323107	other ways, as explained	-0.124939
-0.323107	without AVX, as explained	-0.124939
-0.323107	switch statements, as explained	-0.124939
-0.323107	memory pool, as explained	-0.124939
-0.323107	other optimizations, as explained	-0.124939
-0.323107	static linking, as explained	-0.124939
-0.323107	clock frequency, as explained	-0.124939
-0.323107	is pipelined, as explained	-0.124939
-0.323107	critical stride, as explained	-0.124939
-0.323107	cache contentions, as explained	-0.124939
-0.349103	background is further explained	-0.124939
-0.144358	functions for reasons explained	-0.124939
-0.144358	precision for reasons explained	-0.124939
-0.144358	manual for reasons explained	-0.124939
-0.144358	mode, for reasons explained	-0.124939
-0.325381	predicted perfectly. As explained	-0.124939
-0.294228	table lookup mechanisms explained	-0.124939
-0.893745	replaced by the calculated	-0.124939
-0.597030	it with the calculated	-0.124939
-0.883294	operand that is calculated	-0.124939
-0.589005	counter, which is calculated	-0.124939
-0.657207	the value is calculated	-0.124939
-0.412452	each value is calculated	-0.124939
-0.500432	This expression is calculated	-0.124939
-0.355647	/ b) is calculated	-0.124939
-0.355647	value xn is calculated	-0.124939
-0.355647	of n! is calculated	-0.124939
-0.355647	of coefficients is calculated	-0.124939
-0.355647	that a+b is calculated	-0.124939
-0.355647	of matrix[j][0] is calculated	-0.124939
-0.355647	or g(x) is calculated	-0.124939
-0.584261	number to be calculated	-0.124939
-0.925560	function can be calculated	-0.124939
-1.148265	which can be calculated	-0.124939
-0.925560	variable can be calculated	-0.124939
-0.544601	result can be calculated	-0.124939
-0.236436	counter can be calculated	-0.301030
-0.544601	numbers can be calculated	-0.124939
-0.794894	condition can be calculated	-0.124939
-0.544601	stride can be calculated	-0.124939
-0.544601	parentheses can be calculated	-0.124939
-0.579011	b*2.0/3.0 will be calculated	-0.124939
-0.574178	or cannot be calculated	-0.124939
-0.490035	r+i/2 could be calculated	-0.124939
-0.871203	mathematical functions are calculated	-0.124939
-0.724578	bitwise operators are calculated	-0.124939
-0.583367	which it has calculated	-0.124939
-0.709925	it is only calculated	-0.124939
-0.494825	log(2.0) is only calculated	-0.124939
-0.537444	results are always calculated	-0.124939
-0.237902	anda * 17is calculated	-0.124939
-0.237902	example,a * 16is calculated	-0.124939
-0.594097	burden is the calculation	-0.124939
-0.863142	count and the calculation	-0.124939
-0.581627	B, and the calculation	-0.124939
-0.598577	Nothing in the calculation	-0.124939
-0.596364	needed for the calculation	-0.124939
-1.351286	or if the calculation	-0.124939
-0.580643	occur, but the calculation	-0.124939
-0.356384	how efficient the calculation	-0.124939
-0.587691	B before the calculation	-0.124939
-1.144914	roll out the calculation	-0.124939
-0.569492	speed up the calculation	-0.124939
-0.460403	and start the calculation	-0.124939
-0.356384	software specifies the calculation	-0.124939
-0.460403	latter case, the calculation	-0.124939
-0.356384	can begin the calculation	-0.124939
-0.720869	has finished the calculation	-0.124939
-0.654792	and redo the calculation	-0.124939
-0.356384	// Re-do the calculation	-0.124939
-0.355635	+ c; The calculation	-0.124939
-1.012443	function calls. The calculation	-0.124939
-0.355635	Calculate polynomial The calculation	-0.124939
-0.355635	number 28. The calculation	-0.124939
-0.459451	generates 127. The calculation	-0.124939
-0.355635	not supported. The calculation	-0.124939
-0.575236	f; } This calculation	-0.124939
-0.358403	example shows this calculation	-0.124939
-0.358301	code motion A calculation	-0.124939
-0.793686	sense that each calculation	-0.124939
-0.533173	calculations, where each calculation	-0.124939
-0.357242	its out-of- order calculation	-0.124939
-0.190450	make the address calculation	-0.124939
-0.334478	instructions for address calculation	-0.124939
-0.334478	the complicated address calculation	-0.124939
-0.952953	to the total calculation	-0.124939
-0.237886	have an estimated calculation	-0.124939
-0.237886	in Linux. Address calculation	-0.124939
-0.332156	virtual function } };	-0.124939
-0.332156	index operator } };	-0.124939
-0.174850	<< 1; } };	-0.425969
-0.332156	return x; } };	-0.124939
-0.168920	<< 2; } };	-0.425969
-0.332156	return 1.0; } };	-0.124939
-0.332156	#undef N1 } };	-0.124939
-0.332156	* powN<true,N/2>::p(x); } };	-0.124939
-0.332156	function: (static_cast<MyChild*>(this))->Disp(); } };	-0.124939
-0.357050	// add elements };	-0.124939
-0.581626	// sign bit };	-0.124939
-0.794190	of the structure };	-0.124939
-0.235572	public: int c; };	-0.124939
-0.235572	b2; int c; };	-0.124939
-0.965954	b, c, d; };	-0.124939
-0.339336	byte at 19 };	-0.124939
-0.407982	decoding and perhaps };	-0.124939
-0.172612	a + b;} };	-0.124939
-0.172612	ReadB() {return b;} };	-0.124939
-0.023515	b:2; int c:2; };	-0.124939
-0.294098	Thursday, Friday, Saturday };	-0.124939
-0.023515	virtual void f(); };	-0.124939
-0.382691	a[1000]; float b[1000]; };	-0.124939
-0.538660	public: void NotPolymorphic(); };	-0.124939
-0.294098	Saturday = 0x40 };	-0.124939
-0.023515	int sign :1;//signbit };	-0.425969
-0.237804	public: ... ~C1(); };	-0.124939
-0.237804	c; int UnusedFiller; };	-0.124939
-0.237804	}; char abc; };	-0.124939
-0.237804	"Beta", "Gamma", "Delta" };	-0.124939
-0.237804	y.d + 4.; };	-0.124939
-0.989333	or class is 128	-0.124939
-0.504528	vector register is 128	-0.124939
-0.090036	__m128i defines a 128	-0.124939
-0.090036	__m128 defines a 128	-0.124939
-0.090036	__m128d defines a 128	-0.124939
-0.353028	32 4 int 128	-0.124939
-0.583048	4 unsigned int 128	-0.124939
-0.559823	8 short int 128	-0.124939
-0.963097	unsigned short int 128	-0.124939
-1.214082	is less than 128	-0.124939
-0.593791	OS. See page 128	-0.124939
-0.357727	float 128 double 128	-0.124939
-0.357538	uint64_t 256 float 128	-0.124939
-0.352122	double 64 2 128	-0.124939
-0.352122	long 64 2 128	-0.124939
-0.351109	int 32 4 128	-0.124939
-0.351109	float 32 4 128	-0.124939
-0.524278	int 16 8 128	-0.124939
-0.544069	in the first 128	-0.124939
-0.544069	within the first 128	-0.124939
-0.501386	char 8 16 128	-0.124939
-0.356392	float vectors SSE2 128	-0.124939
-0.439555	12.2 128 128 128	-0.124939
-0.339894	126 12.2 128 128	-0.124939
-0.552256	Therefore, the dispatcher 128	-0.124939
-0.484663	16 unsigned char 128	-0.124939
-0.329550	8 16 char 128	-0.124939
-0.347329	&& SIZE % 128	-0.124939
-0.346351	bit mode SSE 128	-0.124939
-0.339273	64 2 int64_t 128	-0.124939
-0.237859	0.29 0.28 strlen 128	-0.124939
-0.237859	0.59 0.27 strlen 128	-0.124939
-0.325212	64 2 uint64_t 128	-0.124939
-0.325272	127 126 12.2 128	-0.124939
-0.382657	Gnu compiler ......................................................................... 128	-0.124939
-0.237780	64 bits (MMX), 128	-0.124939
-0.599760	compiler if the uses	-0.124939
-0.463441	code big and uses	-0.124939
-0.540064	other code that uses	-0.124939
-0.540064	optimize code that uses	-0.124939
-0.355905	second application that uses	-0.124939
-0.459794	software framework that uses	-0.124939
-0.355905	hot spot that uses	-0.124939
-0.588401	CPUs, but it uses	-0.124939
-0.596018	memory a function uses	-0.124939
-0.461698	The pow function uses	-0.124939
-0.358557	32-bit Mac code uses	-0.124939
-0.504626	if an int uses	-0.124939
-1.408100	that the compiler uses	-0.124939
-1.146040	but the compiler uses	-0.124939
-0.355552	Intel CPUs. It uses	-0.124939
-0.355552	the label. It uses	-0.124939
-0.657862	if the program uses	-0.124939
-1.014746	If a program uses	-0.124939
-0.565307	time. The program uses	-0.124939
-0.548895	while a double uses	-0.124939
-0.525787	because a float uses	-0.124939
-0.357107	test when software uses	-0.124939
-0.353681	This framework typically uses	-0.124939
-0.541378	If the application uses	-0.124939
-0.333728	a particular application uses	-0.124939
-0.332841	} This implementation uses	-0.124939
-0.430701	A good implementation uses	-0.124939
-0.353467	long as their uses	-0.124939
-0.353277	the user never uses	-0.124939
-0.454334	branch). This feature uses	-0.124939
-0.348980	The compiler sometimes uses	-0.124939
-0.348962	where it still uses	-0.124939
-0.343529	time. Four typical uses	-0.124939
-0.339309	of a vector, uses	-0.124939
-0.237780	wstring or CString uses	-0.124939
-0.935174	one of the four	-0.425969
-0.597259	vector, and the four	-0.124939
-0.596047	points with the four	-0.124939
-0.539839	// add the four	-0.124939
-0.978080	and store the four	-0.124939
-0.358020	one vector, the four	-0.124939
-1.271589	value that is four	-0.124939
-0.844664	bit vector of four	-0.124939
-0.503070	A structure of four	-0.124939
-0.539102	Define vectors of four	-0.124939
-0.357535	a maximum of four	-0.124939
-0.357535	into groups of four	-0.124939
-0.764403	of i to four	-0.124939
-0.594977	core. There are four	-0.124939
-0.461828	16-bit integers or four	-0.124939
-0.760938	double precision or four	-0.124939
-0.461418	250 times with four	-0.124939
-0.357183	i7 processor with four	-0.124939
-0.587498	have more than four	-0.124939
-0.358376	can only have four	-0.124939
-0.503978	This processor has four	-0.124939
-0.519043	There are only four	-0.124939
-0.519043	pow(x,10) with only four	-0.124939
-0.351430	Windows allows only four	-0.124939
-0.585188	we can do four	-0.124939
-0.589714	Windows, the first four	-0.124939
-0.355016	example, you get four	-0.124939
-0.498936	block for every four	-0.124939
-0.353672	xx4; // next four	-0.124939
-0.352599	code will read four	-0.124939
-0.299615	vector of e.g. four	-0.124939
-0.299615	can hold e.g. four	-0.124939
-0.498717	vector can hold four	-0.124939
-0.444172	16 bits each, four	-0.124939
-0.294098	This loop calculates four	-0.124939
-1.575072	more efficient than functions.	-0.124939
-0.358098	treated as different functions.	-0.124939
-0.968534	of the library functions.	-0.124939
-0.342277	useful for library functions.	-0.124939
-0.342277	dynamically linked library functions.	-0.124939
-0.342277	on executing library functions.	-0.124939
-0.569219	up into multiple functions.	-0.124939
-0.577091	for the two functions.	-0.124939
-0.316392	meaning for member functions.	-0.124939
-0.410193	members or member functions.	-0.124939
-0.316392	any other member functions.	-0.124939
-0.496411	with virtual member functions.	-0.124939
-0.295047	or non-static member functions.	-0.124939
-0.295047	all non-static member functions.	-0.124939
-0.316392	any non-polymorphic member functions.	-0.124939
-1.006320	of the virtual functions.	-0.124939
-0.443006	instead of virtual functions.	-0.124939
-0.354553	the intrinsic hardware functions.	-0.124939
-0.311395	had used intrinsic functions.	-0.124939
-0.311395	that support intrinsic functions.	-0.124939
-0.311395	the so-called intrinsic functions.	-0.124939
-0.311370	many useful mathematical functions.	-0.124939
-0.311370	information about mathematical functions.	-0.124939
-0.311370	contains optimized mathematical functions.	-0.124939
-0.352837	names from string functions.	-0.124939
-0.352787	for the three functions.	-0.124939
-0.570077	calls to frame functions.	-0.124939
-0.311709	functions and frame functions.	-0.124939
-0.449033	for using overloaded functions.	-0.124939
-0.441854	of the polymorphic functions.	-0.124939
-0.407982	preferable for speed-critical functions.	-0.124939
-0.314680	logarithms and trigonometric functions.	-0.124939
-0.294098	all local non-member functions.	-0.124939
-0.294098	must use thread-safe functions.	-0.124939
-0.294098	resources than non-virtual functions.	-0.124939
-0.237804	to remove unreferenced functions.	-0.124939
-0.504954	Integer overflow is another	-0.124939
-1.678519	a pointer to another	-0.124939
-0.358187	one auto_ptr to another	-0.124939
-0.503981	later ported to another	-0.124939
-0.463455	AVX support and another	-0.124939
-0.462347	derived class in another	-0.124939
-0.539676	thousand results in another	-0.124939
-0.357913	is system-independent, in another	-0.124939
-1.009647	to wait for another	-0.124939
-0.504566	an overflow or another	-0.124939
-0.354710	every second by another	-0.124939
-0.354710	to 5 by another	-0.124939
-0.354710	be changed by another	-0.124939
-0.354710	later deleted by another	-0.124939
-0.457884	an addition with another	-0.124939
-0.354401	the core with another	-0.124939
-0.354401	project built with another	-0.124939
-0.354401	might clash with another	-0.124939
-0.834785	clock cycle on another	-0.124939
-0.766411	be called from another	-0.124939
-0.471212	also called from another	-0.124939
-0.585182	it can do another	-0.124939
-0.577592	code by making another	-0.124939
-0.653694	do calculations while another	-0.124939
-0.538007	dispatched function calls another	-0.124939
-0.320351	in turn calls another	-0.124939
-0.131284	if F1 calls another	-0.124939
-0.131284	If F1 calls another	-0.124939
-0.355687	object file. Use another	-0.124939
-0.459027	loop is inside another	-0.124939
-0.494759	whenever it goes another	-0.124939
-0.348234	the destructor causes another	-0.124939
-0.552601	AVX instruction set, another	-0.124939
-0.421329	program that produces another	-0.124939
-0.237796	the user interface, another	-0.124939
-0.237796	mode, we encounter another	-0.124939
-0.598730	space for the parameters	-0.124939
-0.590576	mode where the parameters	-0.124939
-0.358487	64-bit mode, the parameters	-0.124939
-0.358487	mode. Storing the parameters	-0.124939
-0.573190	any type of parameters	-0.124939
-0.358790	operating systems". The parameters	-0.124939
-0.501203	objects as function parameters	-0.124939
-0.356199	only four function parameters	-0.124939
-0.356199	efficient. Simple function parameters	-0.124939
-0.358555	be passed as parameters	-0.124939
-0.549608	Function with vector parameters	-0.124939
-1.066155	eight floating point parameters	-0.124939
-0.581261	parameters. Floating point parameters	-0.124939
-0.349985	first two integer parameters	-0.124939
-0.452289	first six integer parameters	-0.124939
-0.349985	CodeGear compiler) integer parameters	-0.124939
-0.290080	that the template parameters	-0.124939
-0.290080	if the template parameters	-0.124939
-0.411429	because the template parameters	-0.124939
-0.290080	If the template parameters	-0.124939
-0.455270	factors as template parameters	-0.124939
-0.525197	instance has its parameters	-0.124939
-0.512601	maximum of four parameters	-0.124939
-0.339909	the first four parameters	-0.124939
-0.269200	Function parameters Function parameters	-0.124939
-0.053299	in memory. Function parameters	-0.425969
-0.269200	overloaded operators. Function parameters	-0.124939
-0.114067	48 7.15 Function parameters	-0.124939
-0.114067	respect. 7.15 Function parameters	-0.124939
-0.269200	using __fastcall. Function parameters	-0.124939
-0.349664	type with desired parameters	-0.124939
-0.348256	beware that macro parameters	-0.124939
-0.162687	up to fourteen parameters	-0.425969
-0.294145	the stack (three parameters	-0.124939
-1.897361	is possible to get	-0.124939
-1.006743	in order to get	-0.124939
-0.355151	Use template to get	-0.124939
-0.551782	and want to get	-0.124939
-0.551782	still want to get	-0.124939
-1.013688	be difficult to get	-0.124939
-1.183550	various ways to get	-0.124939
-0.355151	option -fno-builtin to get	-0.124939
-0.355151	some experience to get	-0.124939
-0.358812	information elsewhere and get	-0.124939
-0.672751	then you can get	-0.124939
-0.538320	where you can get	-0.124939
-0.504697	square x // get	-0.124939
-0.358636	from www.agner.org/optimize/testp.zip or get	-0.124939
-0.473120	we will not get	-0.124939
-0.473120	You will not get	-0.124939
-1.342234	then you may get	-0.124939
-0.548437	fact, you may get	-0.124939
-0.787673	For example, you get	-0.124939
-0.711532	then you will get	-0.124939
-0.442536	because you will get	-0.124939
-0.493170	Each thread will get	-0.124939
-0.350431	Here, y will get	-0.124939
-0.357970	appropriately. Users should get	-0.124939
-0.351089	this number we get	-0.124939
-0.453686	option. Then we get	-0.124939
-0.354101	b will both get	-0.124939
-0.457044	programming will typically get	-0.124939
-0.780041	so we don't get	-0.124939
-0.434926	you will soon get	-0.124939
-0.325281	data object: (1) get	-0.124939
-0.294135	extremely inefficient, (4) get	-0.124939
-1.527172	{ a = b;	-0.124939
-0.575636	8.10b a = b;	-0.124939
-0.356565	1; x[1] = b;	-0.124939
-0.049517	int a; int b;	-0.425969
-0.049517	float a; int b;	-0.124939
-0.180794	{int a; int b;	-0.124939
-0.349489	at 399 int b;	-0.124939
-0.512277	S1 { double b;	-0.124939
-0.310119	int a; double b;	-0.124939
-0.310119	float a; double b;	-0.124939
-0.172202	= a & b;	-0.124939
-0.326988	integers int a, b;	-0.124939
-0.326988	m;} int a, b;	-0.124939
-0.326988	1.6; int a, b;	-0.124939
-0.047183	14.14b double a, b;	-0.124939
-0.047183	14.14a double a, b;	-0.124939
-0.047183	14.18c double a, b;	-0.124939
-0.047183	8.2a double a, b;	-0.124939
-0.272353	66 float a, b;	-0.124939
-0.272353	14.18a float a, b;	-0.124939
-0.272353	14.18b float a, b;	-0.124939
-0.230472	int i, a, b;	-0.124939
-0.384082	7.10a bool a, b;	-0.124939
-0.459220	b; a += b;	-0.124939
-0.651228	? a : b;	-0.124939
-0.549535	= a && b;	-0.124939
-0.550925	= a | b;	-0.124939
-0.531637	= a || b;	-0.124939
-0.814340	a = 0, b;	-0.124939
-0.277387	float a; bool b;	-0.124939
-0.277387	x, y; bool b;	-0.124939
-0.277387	y, z; bool b;	-0.124939
-0.714773	int i, a[100], b;	-0.124939
-1.423159	to do the check	-0.124939
-0.726435	can bypass the check	-0.124939
-0.570439	do such a check	-0.124939
-0.574464	extra code to check	-0.124939
-1.316858	You have to check	-0.124939
-0.581558	F1 has to check	-0.124939
-0.782616	a way to check	-0.124939
-1.045940	best way to check	-0.124939
-1.360672	for how to check	-0.124939
-0.585939	You need to check	-0.124939
-1.745196	you want to check	-0.124939
-0.558663	function calls to check	-0.124939
-0.865908	often necessary to check	-0.124939
-0.460934	The program can check	-0.124939
-0.584941	makefile. You can check	-0.124939
-0.577913	zero We can check	-0.124939
-0.890173	0x7FFFFFFF) { // check	-0.124939
-0.544953	dispatcher does not check	-0.124939
-0.544953	14.26 does not check	-0.124939
-0.504482	derived class. This check	-0.124939
-0.358293	function must then check	-0.124939
-1.172653	There is no check	-0.425969
-0.178251	functions have no check	-0.425969
-0.459835	135). This extra check	-0.124939
-0.459786	the function must check	-0.124939
-0.353527	that doesn't automatically check	-0.124939
-0.353136	is no automatic check	-0.124939
-0.455898	makes a runtime check	-0.124939
-0.346339	added a bounds check	-0.124939
-0.345112	example. We might check	-0.124939
-0.632623	variables as input check	-0.124939
-0.483668	the CPU brand check	-0.124939
-0.325302	data. A missing check	-0.124939
-0.325252	this problem: (1) check	-0.124939
-1.536798	then it is advantageous	-0.124939
-0.407817	whether it is advantageous	-0.602060
-1.223860	Therefore, it is advantageous	-0.124939
-0.562236	Likewise, it is advantageous	-0.124939
-1.500019	a function is advantageous	-0.124939
-0.581821	purposes. This is advantageous	-0.124939
-0.581821	line. This is advantageous	-0.124939
-0.894438	to. It is advantageous	-0.124939
-0.535787	lookup table is advantageous	-0.124939
-1.464529	This method is advantageous	-0.124939
-0.546154	without caching is advantageous	-0.124939
-1.561673	it can be advantageous	-0.124939
-0.373101	It can be advantageous	-0.425969
-1.555525	may not be advantageous	-0.124939
-1.108520	This may be advantageous	-0.124939
-1.400690	It may be advantageous	-0.124939
-0.583301	vectorization will be advantageous	-0.124939
-1.453010	can also be advantageous	-0.124939
-0.708914	some cases be advantageous	-0.124939
-0.141499	can therefore be advantageous	-0.124939
-0.358779	multiple cores are advantageous	-0.124939
-1.087048	it is not advantageous	-0.425969
-1.373461	It is not advantageous	-0.124939
-0.559432	hyperthreading is not advantageous	-0.124939
-0.747095	and therefore not advantageous	-0.124939
-0.561743	operations is more advantageous	-0.124939
-0.561743	operators is more advantageous	-0.124939
-0.581907	it is less advantageous	-0.124939
-0.356815	that decide how advantageous	-0.124939
-0.356443	is almost always advantageous	-0.124939
-0.353983	tables are particular advantageous	-0.124939
-1.748490	the code is implemented	-0.124939
-0.590606	.NET, which is implemented	-0.124939
-0.567446	derived class is implemented	-0.124939
-0.559642	An array is implemented	-0.124939
-0.142888	code version is implemented	-0.425969
-0.460971	application software is implemented	-0.124939
-0.503105	A constructor is implemented	-0.124939
-1.295019	it can be implemented	-0.124939
-1.176423	code can be implemented	-0.124939
-0.544598	this can be implemented	-0.124939
-0.794888	functions can be implemented	-0.124939
-0.544598	loop can be implemented	-0.124939
-0.192003	libraries can be implemented	-0.425969
-0.794888	mechanism can be implemented	-0.124939
-0.544598	14.28 can be implemented	-0.124939
-0.544598	etc., can be implemented	-0.124939
-0.544598	8.24 can be implemented	-0.124939
-0.196338	queue should be implemented	-0.425969
-0.702082	cannot easily be implemented	-0.124939
-0.348168	some day be implemented	-0.124939
-0.552456	conditions which are implemented	-0.124939
-0.459403	functions, etc. are implemented	-0.124939
-0.523643	programming languages are implemented	-0.124939
-0.501953	metaprogramming, loops are implemented	-0.124939
-0.459403	15.1b. Branches are implemented	-0.124939
-0.358684	need modification if implemented	-0.124939
-0.460202	instruction and have implemented	-0.124939
-0.585141	call. I have implemented	-0.124939
-0.583488	objects is often implemented	-0.124939
-0.355220	shows this calculation implemented	-0.124939
-0.457348	even for programs implemented	-0.124939
-0.544146	This is typically implemented	-0.124939
-0.353841	application is preferably implemented	-0.124939
-0.600323	overview of the problem	-0.124939
-0.598674	models if the problem	-0.124939
-0.595224	better. If the problem	-0.124939
-0.986191	can avoid the problem	-0.124939
-0.761898	can reduce the problem	-0.124939
-0.357884	may ignore the problem	-0.124939
-0.657780	to fix the problem	-0.124939
-0.357884	of solving the problem	-0.124939
-1.536851	This is a problem	-0.124939
-1.450422	There is a problem	-0.124939
-0.579948	caching is a problem	-0.124939
-1.220183	is not a problem	-0.124939
-0.524282	access, etc. The problem	-0.124939
-0.981962	the cache. The problem	-0.124939
-0.655789	execution units. The problem	-0.124939
-0.356885	remain unchanged. The problem	-0.124939
-0.356898	of registers. This problem	-0.124939
-0.356898	code caching. This problem	-0.124939
-0.146964	solution to this problem	-0.425969
-0.370771	solutions to this problem	-0.124939
-0.446243	you have this problem	-0.124939
-0.768541	to avoid this problem	-0.124939
-0.345202	has solved this problem	-0.124939
-0.632838	to solve this problem	-0.124939
-0.526326	b; } A problem	-0.124939
-0.594726	This is no problem	-0.124939
-0.522606	a very big problem	-0.124939
-0.348278	Vectorization with alignment problem	-0.124939
-0.348290	new version causes problem	-0.124939
-0.347421	the double. Another problem	-0.124939
-0.343658	to a usability problem	-0.124939
-0.339388	The most serious problem	-0.124939
-0.435013	critical. The worst problem	-0.124939
-0.294154	safe. This safety problem	-0.124939
-1.234768	if it is known	-0.124939
-1.023284	If it is known	-0.124939
-0.588185	variable which is known	-0.124939
-0.945368	of objects is known	-0.124939
-0.202877	of elements is known	-0.425969
-0.657300	the result is known	-0.124939
-0.412505	33 result is known	-0.124939
-0.355040	to store is known	-0.124939
-0.521570	that n is known	-0.124939
-0.501432	on process is known	-0.124939
-0.499585	the condition is known	-0.124939
-0.757216	the divisor is known	-0.124939
-0.463598	truly represent a known	-0.124939
-1.342162	an object of known	-0.124939
-0.463510	is constant and known	-0.124939
-0.878015	it cannot be known	-0.124939
-0.547548	array is not known	-0.124939
-0.547548	objects is not known	-0.124939
-0.547548	length is not known	-0.124939
-0.547548	required is not known	-0.124939
-0.547548	divisor is not known	-0.124939
-1.208774	that are not known	-0.124939
-0.358121	to handle only known	-0.124939
-0.873939	is an integer known	-0.124939
-0.192143	Is the size known	-0.124939
-0.357774	the implicit pointer known	-0.124939
-0.503587	correspond to any known	-0.124939
-0.580190	is a constant known	-0.124939
-0.485525	source of error known	-0.124939
-0.344914	common programming error known	-0.124939
-0.182876	string is already known	-0.124939
-0.182876	CPU-type is already known	-0.124939
-0.340898	to b for (i	-0.124939
-0.783895	= 0; for (i	-0.124939
-0.006611	int i; for (i	-0.903090
-0.340898	a[100], b; for (i	-0.124939
-0.080278	i; ... for (i	-0.425969
-0.178837	136 ... for (i	-0.124939
-0.178837	j; ... for (i	-0.124939
-0.137826	= 1; for (i	-0.124939
-0.137826	+ 1; for (i	-0.124939
-0.340898	to zero for (i	-0.124939
-0.103769	float x; for (i	-0.425969
-0.488211	register temp; for (i	-0.124939
-0.340898	= 3; for (i	-0.124939
-0.340898	i, a[100]; for (i	-0.124939
-0.340898	i; 45 for (i	-0.124939
-0.340898	= r; for (i	-0.124939
-0.340898	innermost loop: for (i	-0.124939
-0.340898	i, StringLength; for (i	-0.124939
-0.340898	i, a[2]; for (i	-0.124939
-0.340898	i; 84 for (i	-0.124939
-0.340898	long timediff[NumberOfTests]; for (i	-0.124939
-0.340898	} printf("\nResults:"); for (i	-0.124939
-0.353702	(unsigned int if (i	-0.124939
-0.566316	i++) { if (i	-0.124939
-0.141796	i; ... if (i	-0.124939
-0.141796	list[size]; ... if (i	-0.124939
-0.353702	float list[ARRAYSIZE]; if (i	-0.124939
-0.355973	after exceptions: while (i	-0.124939
-0.835235	time, then the solution	-0.124939
-0.566809	ignore, then the solution	-0.124939
-0.358621	sizes. Fortunately, the solution	-0.124939
-0.504933	itself. But a solution	-0.124939
-0.358760	integer comparisons. The solution	-0.124939
-0.356874	; mark_end; This solution	-0.124939
-0.356874	DEC, JNZ). This solution	-0.124939
-0.577556	cost of this solution	-0.124939
-0.457250	time. But this solution	-0.124939
-0.353901	edx. Furthermore, this solution	-0.124939
-0.724646	to see which solution	-0.124939
-0.579072	A more efficient solution	-0.124939
-0.402067	the most efficient solution	-0.124939
-0.661492	a very efficient solution	-0.124939
-0.329730	matrices. An efficient solution	-0.124939
-0.536942	speed. A simple solution	-0.124939
-0.566789	case. The best solution	-0.124939
-0.498846	storing. The standard solution	-0.124939
-0.797037	then the optimal solution	-0.124939
-0.334639	not an optimal solution	-0.124939
-0.565194	A more complicated solution	-0.124939
-0.772567	be a better solution	-0.124939
-0.294145	size. The alternative solution	-0.124939
-0.382748	inlined. An alternative solution	-0.124939
-0.508323	be the fastest solution	-0.124939
-0.483774	even more powerful solution	-0.124939
-0.325338	and most clean solution	-0.124939
-0.325280	the only reasonable solution	-0.124939
-0.314591	the class. Which solution	-0.124939
-0.538627	be a viable solution	-0.124939
-0.294080	time. No universal solution	-0.124939
-0.237788	library. The radical solution	-0.124939
-0.237788	design. The ultimate solution	-0.124939
-0.600358	key in the container	-0.124939
-0.595579	object because the container	-0.124939
-0.595786	elements? If the container	-0.124939
-1.095583	of making the container	-0.124939
-0.358328	container. Can the container	-0.124939
-0.943301	to use a container	-0.124939
-0.539750	memory into a container	-0.124939
-0.539750	array into a container	-0.124939
-0.356376	by defining a container	-0.124939
-0.460392	when choosing a container	-0.124939
-0.460392	be considered a container	-0.124939
-0.460392	temporarily lock a container	-0.124939
-0.356376	to re-use a container	-0.124939
-0.562565	More examples of container	-0.124939
-0.572842	the discussion of container	-0.124939
-0.527066	container class. The container	-0.124939
-0.358786	applications. Remember that container	-0.124939
-0.357818	the array or container	-0.124939
-0.357818	size array or container	-0.124939
-0.895428	common to make container	-0.124939
-0.586591	Library) and other container	-0.124939
-0.462419	not use one container	-0.124939
-0.503424	collection of example container	-0.124939
-0.503050	source of such container	-0.124939
-0.346194	discussion of efficient container	-0.124939
-0.587833	by more efficient container	-0.124939
-0.346194	and various efficient container	-0.124939
-0.458060	Unfortunately, many standard container	-0.124939
-0.532723	write your own container	-0.124939
-0.351585	templates. Ready made container	-0.124939
-0.452582	in an STL container	-0.124939
-0.525110	STL for accessing container	-0.124939
-0.347373	at www.agner.org/optimize/cppexamples.zip containing container	-0.124939
-0.314630	arrays by well-tested container	-0.124939
-0.466066	This has the advantage	-0.425969
-0.550418	set gives the advantage	-0.124939
-0.582894	list[x]; } The advantage	-0.124939
-1.042575	the program. The advantage	-0.124939
-0.828348	level-1 cache. The advantage	-0.124939
-0.498633	C++ compilers. The advantage	-0.124939
-0.650772	is enabled. The advantage	-0.124939
-0.498633	calculation faster. The advantage	-0.124939
-0.354357	of iterations. The advantage	-0.124939
-0.354357	variable m. The advantage	-0.124939
-0.358563	logical register. This advantage	-0.124939
-1.151413	there is an advantage	-0.124939
-0.315451	can be an advantage	-0.124939
-0.924366	is not an advantage	-0.124939
-0.491943	is only an advantage	-0.124939
-0.451735	SSE4.1 gives an advantage	-0.124939
-1.871925	there is no advantage	-0.124939
-0.357537	is no such advantage	-0.124939
-0.548824	version that takes advantage	-0.124939
-0.421475	order to take advantage	-0.124939
-0.421475	how to take advantage	-0.124939
-0.253862	that can take advantage	-0.124939
-0.253862	you can take advantage	-0.124939
-0.253862	program can take advantage	-0.124939
-0.108640	You can take advantage	-0.425969
-0.364151	We can take advantage	-0.124939
-0.355713	hardly any speed advantage	-0.124939
-0.576397	is a specific advantage	-0.124939
-0.493642	set. The main advantage	-0.124939
-0.348310	to take maximum advantage	-0.124939
-0.523785	Typically, the full advantage	-0.124939
-0.314678	15.1c? We took advantage	-0.124939
-0.449181	instrset_detect function // Function	-0.124939
-0.104280	const*)p); } // Function	-0.602060
-0.637361	intrinsic functions // Function	-0.124939
-0.347528	vector classes // Function	-0.124939
-1.215833	} }; // Function	-0.124939
-0.449181	with SSE4.1 // Function	-0.124939
-0.449181	cc[size] ); // Function	-0.124939
-0.347528	int parm2); // Function	-0.124939
-0.347528	_mm_loadu_si128((__m128i const*)p);} // Function	-0.124939
-0.347528	CriticalFunctionType CriticalFunction_Dispatch; // Function	-0.124939
-0.846438	and position-independent code Function	-0.124939
-0.559927	versus dynamic libraries Function	-0.124939
-0.356447	table. Optimization method Function	-0.124939
-0.459835	the inlined function. Function	-0.124939
-0.812271	7.15 Function parameters Function	-0.124939
-0.932207	a non-inlined copy Function	-0.124939
-0.426615	than in memory. Function	-0.124939
-0.426615	around in memory. Function	-0.124939
-0.346339	access these instructions. Function	-0.124939
-0.429397	big. 7.14 Functions Function	-0.124939
-0.331758	the profiler itself. Function	-0.124939
-0.594949	and overloaded operators. Function	-0.124939
-0.172616	decrement operators. 7.7 Function	-0.124939
-0.172616	............................................................................................ 36 7.7 Function	-0.124939
-0.172616	............................................................................................... 50 7.16 Function	-0.124939
-0.172616	operating systems". 7.16 Function	-0.124939
-0.102828	................................................................................................................ 48 7.15 Function	-0.124939
-0.102828	this respect. 7.15 Function	-0.124939
-0.294108	are using __fastcall. Function	-0.124939
-0.237812	// Example 12.6. Function	-0.124939
-0.237812	Fastcall functions /Gr Function	-0.124939
-0.237812	to know about. Function	-0.124939
-0.237812	in library libircmt.lib. Function	-0.124939
-1.258032	the code to support	-0.124939
-0.358763	operating system for support	-0.124939
-0.536636	for compilers that support	-0.124939
-0.143000	of processors that support	-0.124939
-0.357636	unknown processors that support	-0.124939
-0.459801	all CPUs that support	-0.124939
-0.470374	that do not support	-0.124939
-0.523189	Windows. Does not support	-0.124939
-0.571438	compilers that have support	-0.124939
-1.371320	Some compilers have support	-0.124939
-0.358264	Future processors will support	-0.124939
-0.824618	instruction set has support	-0.124939
-0.355146	operating system has support	-0.124939
-0.895414	takes to make support	-0.124939
-0.461515	compiler has some support	-0.124939
-0.356711	The Gnu libraries support	-0.124939
-0.061719	compiled with AVX support	-0.124939
-0.462069	compiled without AVX support	-0.124939
-0.039594	CPU has hardware support	-0.425969
-0.083162	microprocessor has hardware support	-0.124939
-0.575187	possible exception handling support	-0.124939
-0.353331	that "we don't support	-0.124939
-0.455852	systems need better support	-0.124939
-0.352267	library. It requires support	-0.124939
-0.551761	to turn off support	-0.425969
-0.347425	method requires OS support	-0.124939
-0.341682	debugging and profiling support	-0.124939
-0.339360	with full debugging support	-0.124939
-0.294089	not have inherent support	-0.124939
-0.237796	It has excellent support	-0.124939
-0.897477	check for the supported	-0.124939
-0.595954	instruction set is supported	-0.492916
-0.558199	reasons. C++ is supported	-0.124939
-0.142478	if AVX is supported	-0.124939
-0.142478	system. AVX is supported	-0.124939
-0.355926	when AVX2 is supported	-0.124939
-1.007802	in Linux and supported	-0.124939
-0.462913	fully standardized and supported	-0.124939
-0.169222	first processors that supported	-0.124939
-0.868648	Intrinsic functions are supported	-0.124939
-1.285164	XMM registers are supported	-0.124939
-0.524686	These directives are supported	-0.124939
-0.461716	only available if supported	-0.124939
-0.357417	or __restrict__, if supported	-0.124939
-0.595221	information, such as supported	-0.124939
-0.591294	processors are not supported	-0.124939
-0.356871	instruction set not supported	-0.124939
-0.358117	is currently only supported	-0.124939
-0.141099	{ // SSE2 supported	-0.124939
-0.581240	CPUID information about supported	-0.124939
-0.126843	{ // AVX supported	-0.124939
-0.226342	{ // Get supported	-0.124939
-0.143377	version // Get supported	-0.124939
-0.294154	is the minimum supported	-0.124939
-0.237853	{ // Detect supported	-0.124939
-0.726699	register variables is eight	-0.124939
-1.426925	a vector of eight	-0.124939
-0.540636	in vectors of eight	-0.124939
-0.015498	result vector in eight	-0.726999
-0.594987	registers. There are eight	-0.124939
-0.763790	double precision or eight	-0.124939
-0.135246	out loop by eight	-0.726999
-0.787445	you can have eight	-0.124939
-0.358029	physical processors but eight	-0.124939
-0.357755	will go into eight	-0.124939
-0.564492	diagonal. The first eight	-0.124939
-0.348775	the 49 first eight	-0.124939
-0.356519	line. But these eight	-0.124939
-0.521080	cores can run eight	-0.124939
-0.347366	we can handle eight	-0.124939
-0.001486	{ // Load eight	-0.726999
-0.001983	i); // Load eight	-0.602060
-0.005976	b.load(bb+i); // Load eight	-0.124939
-0.056044	point operations involves eight	-0.124939
-0.444211	8 bits each, eight	-0.124939
-0.294126	because it handles eight	-0.124939
-0.237829	to be reloaded eight	-0.124939
-1.064761	done with the operators	-0.124939
-0.153686	point variables and operators	-0.124939
-0.554511	Integers variables and operators	-0.124939
-0.462657	adding vectors. The operators	-0.124939
-0.833555	clock cycle. The operators	-0.124939
-0.504058	they come from operators	-0.124939
-0.572208	sense that all operators	-0.124939
-0.358040	or 1, but operators	-0.124939
-0.355590	and free. These operators	-0.124939
-0.353971	undesired results. Integer operators	-0.124939
-0.115096	of the Boolean operators	-0.124939
-0.110257	than the Boolean operators	-0.124939
-0.110257	between the Boolean operators	-0.124939
-0.357072	~. The Boolean operators	-0.124939
-0.395298	and using overloaded operators	-0.124939
-0.304351	with multiple overloaded operators	-0.124939
-0.069769	use the bitwise operators	-0.124939
-0.069769	Nevertheless, the bitwise operators	-0.124939
-0.051248	needed. The bitwise operators	-0.124939
-0.051248	once The bitwise operators	-0.124939
-0.051248	||). The bitwise operators	-0.124939
-0.059914	do with bitwise operators	-0.124939
-0.059914	of using bitwise operators	-0.124939
-0.014222	14.3 Use bitwise operators	-0.425969
-0.059914	the corresponding bitwise operators	-0.124939
-0.331808	here about increment operators	-0.124939
-0.093297	functions. 7.27 Overloaded operators	-0.124939
-0.093297	56 7.27 Overloaded operators	-0.124939
-0.172640	applies to decrement operators	-0.124939
-0.237895	Increment and decrement operators	-0.124939
-0.237853	division and relational operators	-0.124939
-1.635238	one of the few	-0.124939
-1.185564	function is a few	-0.124939
-0.584177	library for a few	-0.124939
-0.252164	there are a few	-0.124939
-0.645483	There are a few	-0.124939
-0.107882	one or a few	-0.124939
-0.483249	less than a few	-0.124939
-0.177615	longer than a few	-0.124939
-0.581717	will make a few	-0.124939
-0.553843	takes only a few	-0.124939
-0.533265	typically takes a few	-0.124939
-0.455295	then add a few	-0.124939
-0.352359	is needed a few	-0.124939
-0.495851	in just a few	-0.124939
-0.531263	arrays require a few	-0.124939
-0.352359	wait until a few	-0.124939
-0.495851	sets include a few	-0.124939
-0.455295	the break a few	-0.124939
-0.352359	etc. SSSE3 a few	-0.124939
-0.358808	very kludgy. The few	-0.124939
-0.764017	a loop with few	-0.124939
-0.358575	should have as few	-0.124939
-0.459561	hard disk. A few	-0.124939
-0.355722	few lines. A few	-0.124939
-1.286282	from the same few	-0.124939
-0.358130	libraries where only few	-0.124939
-0.358064	even matters, which few	-0.124939
-0.502607	instructions have very few	-0.124939
-0.535373	framework that uses few	-0.124939
-0.349736	virtual table. Unfortunately, few	-0.124939
-0.587622	improving code that contains	-0.124939
-0.760524	a loop that contains	-0.124939
-0.656698	a container that contains	-0.124939
-1.037754	and the code contains	-0.124939
-1.325809	if the code contains	-0.124939
-1.727522	of the program contains	-0.124939
-0.751938	of a program contains	-0.124939
-0.828628	If a program contains	-0.124939
-0.877920	If a loop contains	-0.124939
-0.358017	variable (eax) which contains	-0.124939
-0.531090	compiler. This library contains	-0.124939
-0.138256	math core library contains	-0.124939
-0.138256	Math core library contains	-0.124939
-0.342272	Performance Primitives" library contains	-0.124939
-0.476134	if the software contains	-0.124939
-0.476134	If the software contains	-0.124939
-0.450290	because it often contains	-0.124939
-0.348406	Vectorized code often contains	-0.124939
-0.555511	if the expression contains	-0.124939
-0.938332	The code section contains	-0.124939
-0.348216	you are testing contains	-0.124939
-0.346322	to. Now ebx contains	-0.124939
-0.279342	stack). ecx now contains	-0.124939
-0.279342	loop body now contains	-0.124939
-0.341740	address is. ecx contains	-0.124939
-0.339327	the Boost collection contains	-0.124939
-0.434873	a and edx contains	-0.124939
-0.652010	manual at www.agner.org/optimize/cppexamples.zip contains	-0.124939
-0.538644	Math Kernel Library" contains	-0.124939
-0.237796	library at www.agner.org/optimize/asmlib.zip contains	-0.124939
-0.237796	The file http://www.agner.org/optimize/asmlib.zip contains	-0.124939
-0.237796	generation class (CGrandParent) contains	-0.124939
-0.237796	generation class (CParent<>) contains	-0.124939
-0.390848	time regardless of whether	-0.124939
-0.390848	registers, regardless of whether	-0.124939
-0.463434	final program and whether	-0.124939
-0.579121	truth depends on whether	-0.124939
-0.357498	will be efficient whether	-0.124939
-0.356488	know for sure whether	-0.124939
-0.654828	to find out whether	-0.124939
-0.355937	be made about whether	-0.124939
-0.567445	want to check whether	-0.124939
-0.354086	is equally fast whether	-0.124939
-0.475389	compilers to see whether	-0.124939
-0.475389	table to see whether	-0.124939
-0.819483	makes no difference whether	-0.124939
-0.646047	The table shows whether	-0.124939
-0.551155	dispatcher to know whether	-0.124939
-0.348230	is not clear whether	-0.124939
-0.813134	difficult to predict whether	-0.124939
-0.413892	you may consider whether	-0.124939
-0.248546	CPU-dispatcher that checks whether	-0.124939
-0.327461	class, it checks whether	-0.124939
-0.248546	CPU dispatcher checks whether	-0.124939
-0.339263	known at compile-time whether	-0.124939
-0.506943	compiler to evaluate whether	-0.124939
-0.014481	parallelism when deciding whether	-0.124939
-0.004774	account when deciding whether	-0.301030
-0.014481	disadvantages when deciding whether	-0.124939
-0.444120	order to determine whether	-0.124939
-0.294061	GB. When considering whether	-0.124939
-0.294061	first operand determines whether	-0.124939
-0.294061	iteration it decides whether	-0.124939
-0.237772	to predict correctly whether	-0.124939
-0.072488	i < 100; i++)	-0.602060
-0.351276	i < 2; i++)	-0.124939
-0.012684	i < size; i++)	-0.522879
-0.287549	i < n; i++)	-0.124939
-0.374674	i <= n; i++)	-0.124939
-1.265017	i < 256; i++)	-0.124939
-0.331847	i < 1000; i++)	-0.124939
-0.314678	for (i=0; i<100; i++)	-0.124939
-0.538774	i < 20; i++)	-0.124939
-0.538774	for (i=0; i<n; i++)	-0.124939
-0.023519	i < rows; i++)	-0.124939
-0.023519	i < NumberOfTests; i++)	-0.425969
-0.237861	i < arraysize; i++)	-0.124939
-0.237861	i < ArraySize; i++)	-0.124939
-0.237861	i < list.Size(); i++)	-0.124939
-0.899988	element in the list	-0.124939
-1.954327	is that the list	-0.124939
-0.599308	inefficient if the list	-0.124939
-0.587380	put into the list	-0.124939
-0.876392	here is a list	-0.124939
-0.588506	Here is a list	-0.124939
-0.894109	Sum of a list	-0.124939
-0.590006	3 for a list	-0.124939
-0.591964	initialized by a list	-0.124939
-0.538784	market. Such a list	-0.124939
-1.459205	the beginning of list	-0.124939
-0.463589	dummy element to list	-0.124939
-1.407473	of elements in list	-0.124939
-0.358275	instruction set?". A list	-0.124939
-0.575844	of a long list	-0.124939
-0.588828	Wikibooks. The following list	-0.124939
-0.316541	of a linked list	-0.124939
-0.216484	in a linked list	-0.124939
-0.216484	use a linked list	-0.124939
-0.216484	through a linked list	-0.124939
-0.102862	block. A linked list	-0.124939
-0.102862	lists. A linked list	-0.124939
-0.149254	use a negative list	-0.124939
-0.149254	make a negative list	-0.124939
-0.149254	contains a negative list	-0.124939
-0.199114	that a positive list	-0.124939
-0.199114	make a positive list	-0.124939
-0.199114	contains a positive list	-0.124939
-0.552234	copying the entire list	-0.124939
-0.803273	use a linear list	-0.124939
-0.535778	even the smallest list	-0.124939
-0.020842	use a sorted list	-0.124939
-0.020842	then a sorted list	-0.124939
-0.020842	But a sorted list	-0.124939
-0.463432	programming errors that would	-0.124939
-1.017299	even when it would	-0.124939
-0.587420	again, but it would	-0.124939
-0.355210	multiplying them. This would	-0.124939
-0.355210	from scratch. This would	-0.124939
-0.355210	+ log(c[i]);. This would	-0.124939
-1.066166	then the compiler would	-0.124939
-1.206609	an optimizing compiler would	-0.124939
-0.504356	the time you would	-0.124939
-0.413368	0 because this would	-0.124939
-0.413368	tasks because this would	-0.124939
-0.576332	value. The loop would	-0.124939
-0.357998	5) {} which would	-0.124939
-0.567811	no induction variable would	-0.124939
-0.555942	number then we would	-0.124939
-1.207380	the cache line would	-0.124939
-0.521566	mode, the parameters would	-0.124939
-0.354939	The ultimate solution would	-0.124939
-0.805625	then the multiplication would	-0.124939
-0.353535	A safer implementation would	-0.124939
-0.451148	c and d would	-0.124939
-0.634995	the two loops would	-0.124939
-0.345007	15.1b to metaprogramming would	-0.124939
-0.530694	carried dependency chain would	-0.124939
-0.279340	libraries, but who would	-0.124939
-0.279340	precision. And who would	-0.124939
-0.314613	where the reduction would	-0.124939
-0.237806	a particular reduction would	-0.124939
-0.467359	15.1a to 15.1c would	-0.124939
-0.325192	if they otherwise would	-0.124939
-0.714455	0x2700 to 0x273F would	-0.124939
-0.444107	static, the logarithm would	-0.124939
-0.294052	double, then sizeof(S1) would	-0.124939
-0.600932	updating in the likely	-0.124939
-0.352687	optimized for is likely	-0.124939
-1.679979	then it is likely	-0.124939
-0.861429	program, it is likely	-0.124939
-0.580730	bottleneck, it is likely	-0.124939
-1.472813	the code is likely	-0.124939
-0.829972	your code is likely	-0.124939
-1.012295	The compiler is likely	-0.124939
-0.863959	because this is likely	-0.124939
-0.582054	test program is likely	-0.124939
-0.585010	collector which is likely	-0.124939
-0.772296	target address is likely	-0.124939
-0.496306	end user is likely	-0.124939
-0.852580	which method is likely	-0.124939
-0.496306	The system is likely	-0.124939
-0.352687	1024 bits is likely	-0.124939
-0.943885	the problem is likely	-0.124939
-0.748839	CPU model is likely	-0.124939
-0.455710	different platform is likely	-0.124939
-0.550710	speed here is likely	-0.124939
-0.352687	particular brand is likely	-0.124939
-0.352687	integer comparison is likely	-0.124939
-0.352687	page 87) is likely	-0.124939
-0.579157	static data are likely	-0.124939
-0.593660	compiler is more likely	-0.124939
-0.357875	function will most likely	-0.124939
-1.034363	it is also likely	-0.124939
-0.836311	it is very likely	-0.124939
-0.836311	which is very likely	-0.124939
-0.560693	instructions are less likely	-0.124939
-0.565330	variables and therefore likely	-0.124939
-0.354293	model, which quite likely	-0.124939
-0.520516	and are equally likely	-0.124939
-0.742745	beginning of the structure	-0.124939
-0.600226	UnusedFiller in the structure	-0.124939
-0.572044	for making the structure	-0.124939
-0.526193	will load the structure	-0.124939
-0.658379	has made the structure	-0.124939
-0.597889	union is a structure	-0.124939
-0.589990	bytes in a structure	-0.124939
-0.589990	last in a structure	-0.124939
-0.887206	such as a structure	-0.124939
-0.357323	how big a structure	-0.124939
-0.461596	may define a structure	-0.124939
-0.914909	an array of structure	-0.124939
-0.504025	to arrays of structure	-0.124939
-0.462734	The alignment of structure	-0.124939
-0.165259	the class or structure	-0.425969
-0.117688	a class or structure	-0.124939
-0.725914	each thread. This structure	-0.124939
-0.358269	4 floats A structure	-0.124939
-0.458987	or other data structure	-0.124939
-0.355270	its own data structure	-0.124939
-0.358191	more clear program structure	-0.124939
-1.681111	of the same structure	-0.124939
-0.799857	of the whole structure	-0.124939
-0.489003	for the logical structure	-0.124939
-0.346358	has a parallel structure	-0.124939
-0.444285	where the logic structure	-0.124939
-0.429520	Is a multidimensional structure	-0.124939
-0.458628	However, the pipeline structure	-0.124939
-0.325291	of a class, structure	-0.124939
-0.599626	though it is doing	-0.124939
-1.871084	the function is doing	-0.124939
-0.658608	the microprocessor is doing	-0.124939
-0.575174	efficient, way of doing	-0.124939
-0.389909	newer method of doing	-0.124939
-0.389909	original method of doing	-0.124939
-0.253208	different ways of doing	-0.124939
-0.471977	smarter ways of doing	-0.124939
-0.358812	register renaming and doing	-0.124939
-1.049271	are used for doing	-0.124939
-0.461793	as C++ for doing	-0.124939
-1.082744	are available for doing	-0.124939
-1.485737	if you are doing	-0.124939
-0.866165	two functions are doing	-0.124939
-0.537314	different threads are doing	-0.124939
-0.356357	and Sum3 are doing	-0.124939
-0.358623	be accomplished by doing	-0.124939
-1.171384	you are not doing	-0.124939
-0.562485	other resources than doing	-0.124939
-0.462934	never spend time doing	-0.124939
-0.355867	default size when doing	-0.124939
-0.523941	are useful when doing	-0.124939
-0.663172	the compiler from doing	-0.124939
-0.201238	the CPU from doing	-0.124939
-0.358230	are best at doing	-0.124939
-0.595153	help the CPU doing	-0.124939
-1.257126	the innermost loop doing	-0.124939
-0.352249	goes to actually doing	-0.124939
-1.095033	are in fact doing	-0.124939
-0.382736	program is busy doing	-0.124939
-0.885774	alternative is to run	-0.124939
-1.733775	is likely to run	-0.124939
-1.420212	are able to run	-0.124939
-0.356846	64-bit programs to run	-0.124939
-0.655712	processor models to run	-0.124939
-0.524224	then try to run	-0.124939
-0.502107	will prefer to run	-0.124939
-0.835529	CPU dispatching and run	-0.124939
-0.461642	multiple threads that run	-0.124939
-0.357359	Many services that run	-0.124939
-0.357359	on servers that run	-0.124939
-0.719627	instruction set can run	-0.124939
-0.355847	critical part can run	-0.124939
-0.355847	four cores can run	-0.124939
-0.355847	x86 family can run	-0.124939
-0.458095	compiled code may run	-0.124939
-0.458095	function calls may run	-0.124939
-0.354567	each thread may run	-0.124939
-0.589610	__debugbreak();. If you run	-0.124939
-0.453097	then it will run	-0.124939
-0.422663	Therefore, it will run	-0.124939
-0.953877	the code will run	-0.124939
-0.489587	each thread will run	-0.124939
-0.504144	thread can then run	-0.124939
-0.358234	loaded or at run	-0.124939
-0.358113	all. Can only run	-0.124939
-0.462423	update process should run	-0.124939
-0.582812	consumption of each run	-0.124939
-0.758639	make a test run	-0.124939
-0.501447	thread will always run	-0.124939
-0.352843	will make applications run	-0.124939
-0.516394	set can still run	-0.124939
-0.566400	2.0/3.0 than to calculate	-0.124939
-0.587212	will have to calculate	-0.124939
-1.092193	more time to calculate	-0.124939
-1.866099	is possible to calculate	-0.124939
-0.863574	it takes to calculate	-0.602060
-0.520576	induction variables to calculate	-0.124939
-2.019078	in order to calculate	-0.124939
-1.120744	is faster to calculate	-0.124939
-1.699741	you want to calculate	-0.124939
-1.393729	are able to calculate	-0.124939
-1.763171	is recommended to calculate	-0.124939
-0.455878	typical application to calculate	-0.124939
-0.518316	will start to calculate	-0.124939
-0.352819	don't care to calculate	-0.124939
-0.352819	The procedure to calculate	-0.124939
-0.749170	more convenient to calculate	-0.124939
-0.753022	is safer to calculate	-0.124939
-0.562141	this function and calculate	-0.124939
-0.358396	reload *p and calculate	-0.124939
-0.457305	consecutively and can calculate	-0.124939
-1.162459	and it can calculate	-0.124939
-0.843032	so we can calculate	-0.124939
-0.581647	account. You can calculate	-0.124939
-0.498057	Many processors can calculate	-0.124939
-0.574113	used. We can calculate	-0.124939
-1.379360	the compiler may calculate	-0.124939
-1.305236	The compiler will calculate	-0.124939
-0.522526	Thus, we will calculate	-0.124939
-0.358147	it needs only calculate	-0.124939
-0.653943	the compiler must calculate	-0.124939
-0.343724	8.21, you could calculate	-0.124939
-0.599787	inlined if the inline	-0.124939
-1.766845	the compiler to inline	-0.124939
-0.588344	more likely to inline	-0.124939
-0.937095	is able to inline	-0.124939
-1.321385	be able to inline	-0.124939
-0.785943	is optimal to inline	-0.124939
-0.579076	excellent support for inline	-0.124939
-0.358565	Use macro as inline	-0.124939
-0.559435	code with an inline	-0.124939
-0.539243	by using an inline	-0.124939
-1.423025	is to use inline	-0.124939
-1.070232	using the same inline	-0.124939
-0.503907	making critical functions inline	-0.124939
-0.006617	from array static inline	-0.726999
-0.006617	into array static inline	-0.726999
-0.287221	<int N> static inline	-0.124939
-0.287221	#include <emmintrin.h> static inline	-0.124939
-0.287221	Example 14.19 static inline	-0.124939
-0.287221	<typename T> static inline	-0.124939
-0.287221	cache line: static inline	-0.124939
-0.287221	return _mm_cvtss_si32(_mm_load_ss(&x));} static inline	-0.124939
-0.287221	function add_horizontal) static inline	-0.124939
-0.538974	Therefore, it cannot inline	-0.124939
-0.572906	dispatched function call inline	-0.124939
-0.355770	inline functions An inline	-0.124939
-0.355749	if possible. Use inline	-0.124939
-0.496465	assembly-like intrinsic functions, inline	-0.124939
-0.358528	zip file of every	-0.124939
-0.540912	backup copy of every	-0.124939
-0.461760	memory block for every	-0.124939
-0.357452	two expressions for every	-0.124939
-0.357452	and again for every	-0.124939
-0.463443	textbooks recommend that every	-0.124939
-0.578893	every function or every	-0.124939
-0.459427	makes dispatching on every	-0.124939
-0.653267	the dispatch on every	-0.124939
-0.459427	times: Dispatch on every	-0.124939
-0.804586	can do this every	-0.124939
-0.521974	be done at every	-0.124939
-0.355315	debug breakpoints at every	-0.124939
-0.565713	events, for example every	-0.124939
-0.547686	object are called every	-0.124939
-0.569592	branching is done every	-0.124939
-0.521213	into the list every	-0.124939
-0.521546	series of branches every	-0.124939
-0.174841	new memory block every	-0.425969
-0.968813	floating point addition every	-0.124939
-0.127060	have one addition every	-0.124939
-0.127060	only one addition every	-0.124939
-0.551573	must be loaded every	-0.124939
-0.697854	search for updates every	-0.124939
-0.346291	an interrupt, e.g. every	-0.124939
-0.441832	make a misprediction every	-0.124939
-0.434862	parameters are evaluated every	-0.124939
-0.429307	to be updated every	-0.124939
-0.294080	seconds; // incremented every	-0.124939
-0.237788	block is re-allocated every	-0.124939
-0.237788	would be re-calculated every	-0.124939
-0.600190	x86) of the standard	-0.124939
-0.589805	alternatives to the standard	-0.124939
-0.589805	conform to the standard	-0.124939
-1.184460	based on the standard	-0.124939
-0.583070	blurred as the standard	-0.124939
-0.595026	table. If the standard	-0.124939
-0.556104	purposes. Unfortunately, the standard	-0.124939
-0.357728	most purposes the standard	-0.124939
-0.357728	handling. Omitting the standard	-0.124939
-0.599200	OpenMP is a standard	-0.124939
-0.876480	to have a standard	-0.124939
-0.065757	aliasing rule of standard	-0.425969
-0.503038	container classes. The standard	-0.124939
-0.357512	before storing. The standard	-0.124939
-0.357512	"frame pointer". The standard	-0.124939
-0.358784	data structures for standard	-0.124939
-0.550610	Intel) know that standard	-0.124939
-0.358554	754 (1985). This standard	-0.124939
-0.562485	computing resources than standard	-0.124939
-0.582551	programmer can use standard	-0.124939
-0.500899	Software should use standard	-0.124939
-0.962599	libraries for many standard	-0.124939
-0.351466	classes. Unfortunately, many standard	-0.124939
-0.356159	contains only simple standard	-0.124939
-0.355587	power. Connecting several standard	-0.124939
-0.350768	The official C standard	-0.124939
-0.346398	Intel compiler includes standard	-0.124939
-0.343641	Most compilers include standard	-0.124939
-0.172630	that the C/C++ standard	-0.124939
-0.172630	bad The C/C++ standard	-0.124939
-0.237837	to the IEEE standard	-0.124939
-0.895785	cache for the hardware	-0.124939
-1.066003	depend on the hardware	-0.124939
-0.710108	faster than the hardware	-0.425969
-0.596977	problems when the hardware	-0.124939
-0.595376	microprocessor because the hardware	-0.124939
-0.584094	C++, and a hardware	-0.124939
-0.200000	coded in a hardware	-0.425969
-0.581842	programmed in a hardware	-0.124939
-1.239628	rather than a hardware	-0.124939
-0.570892	instructions, where a hardware	-0.124939
-0.874522	the choice of hardware	-0.124939
-0.839636	The choice of hardware	-0.124939
-0.181267	2.1 Choice of hardware	-0.425969
-0.573257	direct access to hardware	-0.124939
-0.573259	are implemented in hardware	-0.124939
-0.358795	connect them. The hardware	-0.124939
-1.376735	are based on hardware	-0.124939
-0.201973	the CPU has hardware	-0.425969
-0.934250	the microprocessor has hardware	-0.124939
-0.387304	CPU or other hardware	-0.124939
-0.387304	disk or other hardware	-0.124939
-0.354991	to any known hardware	-0.124939
-0.555855	on the microprocessor hardware	-0.124939
-0.333807	improvements in microprocessor hardware	-0.124939
-0.456634	than the intrinsic hardware	-0.124939
-0.434981	definition language defines hardware	-0.124939
-0.294154	hacks and direct hardware	-0.124939
-0.237853	issue to catching hardware	-0.124939
-0.902849	clock cycle is 1	-0.124939
-0.462830	for positive and 1	-0.124939
-0.358293	for false and 1	-0.124939
-0.594100	a will be 1	-0.124939
-0.472442	be 0 or 1	-0.124939
-0.751618	than 0 or 1	-0.124939
-0.335411	always 0 or 1	-0.124939
-0.504712	AND'ed b with 1	-0.124939
-1.360035	last byte at 1	-0.124939
-1.177471	= b + 1	-0.124939
-0.952678	signed or unsigned 1	-0.124939
-0.920794	long long 64 1	-0.124939
-0.348594	64 Iu32vec2 64 1	-0.124939
-0.356264	1; // always 1	-0.124939
-0.356310	needed _mm_shuffle_epi8 16 1	-0.124939
-0.356358	SSSE3 _mm_perm_epi8 32 1	-0.124939
-0.356087	security. b & 1	-0.124939
-0.337539	or unsigned 1 1	-0.124939
-0.337539	bytes bool 1 1	-0.124939
-0.349020	alignment, bytes bool 1	-0.124939
-0.346305	as OneOrTwo5[(b!=0) ? 1	-0.124939
-0.621586	ebx, eax ebx, 1	-0.124939
-0.339283	DWORD PTR[ecx+eax*4],ebx eax, 1	-0.124939
-0.339242	1)sign 2exponent 127 1	-0.124939
-0.314641	notice .......................................................................................................... 164 1	-0.124939
-0.314552	?Func2@@YAXQAHAAH@Z ENDP ecx, 1	-0.124939
-0.444093	Programmer’s Manual", Volume 1	-0.124939
-0.294043	wrap around. Adding 1	-0.124939
-0.294043	-128, and subtracting 1	-0.124939
-0.237755	updated 2014-08-07. Contents 1	-0.124939
-0.237755	1)sign 2exponent 1023 1	-0.124939
-0.237755	in the level- 1	-0.124939
-0.065758	b ? a :	-0.124939
-0.357954	unsigned int one :	-0.124939
-0.065035	a ? b :	-0.425969
-0.127572	c + 2 :	-0.425969
-0.354510	OneOrTwo5[(b!=0) ? 1 :	-0.124939
-0.896022	unsigned int sign :	-0.124939
-0.023042	unsigned int exponent :	-0.124939
-0.021557	unsigned int fraction :	-0.124939
-0.341798	(cc[i] + 2) :	-0.124939
-0.114030	54 class D :	-0.124939
-0.114030	B2; class D :	-0.124939
-0.251916	}; class C1 :	-0.124939
-0.251916	Disp(); class C1 :	-0.124939
-0.093271	declaration class CChild1 :	-0.124939
-0.093271	versions: class CChild1 :	-0.124939
-0.314601	}; class CChild2 :	-0.124939
-0.294089	MyChild> class CParent :	-0.124939
-0.294089	}; class C2 :	-0.124939
-0.294089	0) ? 1.0f :	-0.124939
-0.237796	" : "=m"(n) :	-0.124939
-0.237796	b ? 1.5f :	-0.124939
-0.237796	x; public: c1() :	-0.124939
-0.237796	EXCEPTION_FLT_OVERFLOW ? EXCEPTION_EXECUTE_HANDLER :	-0.124939
-0.237796	fistpl %0 " :	-0.124939
-0.237796	"=m"(n) : "m"(x) :	-0.124939
-0.592887	would have to add	-0.124939
-1.174016	not possible to add	-0.124939
-1.809861	it takes to add	-0.124939
-0.461834	for software to add	-0.124939
-1.136384	no reason to add	-0.124939
-0.358326	the operands and add	-0.124939
-0.358326	= shift and add	-0.124939
-0.358781	making plug-ins that add	-0.124939
-0.355611	sum operator // add	-0.124939
-0.459421	<< 23; // add	-0.124939
-0.355611	+= 2;} // add	-0.124939
-0.355611	return add_elements(s); // add	-0.124939
-0.726040	the loop or add	-0.124939
-0.590407	operations do not add	-0.124939
-0.552676	modules. You may add	-0.124939
-0.552676	itself. You may add	-0.124939
-0.355593	vector size then add	-0.124939
-0.500356	other module then add	-0.124939
-0.142116	2. The instruction add	-0.124939
-0.142116	elements. The instruction add	-0.124939
-0.461627	100000000. When we add	-0.124939
-0.501595	You may even add	-0.124939
-0.356150	next two instructions add	-0.124939
-0.355844	by 2 ; add	-0.124939
-0.986775	function that doesn't add	-0.124939
-0.354503	ecx 86 add add	-0.124939
-0.455105	compiler may actually add	-0.124939
-0.347355	sar add mov add	-0.124939
-0.407982	a vector register, add	-0.124939
-0.294098	shr add sar add	-0.124939
-0.294098	$B1$2: mov shr add	-0.124939
-0.237804	[eax+4], ecx 86 add	-0.124939
-0.671254	efficient in 64-bit mode	-0.124939
-0.351865	running in 64-bit mode	-0.124939
-0.141231	needed in 64-bit mode	-0.124939
-0.351865	follows in 64-bit mode	-0.124939
-0.351865	i.e. in 64-bit mode	-0.124939
-0.351865	simpler in 64-bit mode	-0.124939
-0.482993	2 In 64-bit mode	-0.124939
-0.126497	files. Use 64-bit mode	-0.124939
-0.126497	point. Use 64-bit mode	-0.124939
-0.305721	or reference, 64-bit mode	-0.124939
-0.770431	variables in 32-bit mode	-0.124939
-0.530685	method in 32-bit mode	-0.124939
-0.343170	or reference, 32-bit mode	-0.124939
-0.534807	in 64 bit mode	-0.124939
-0.681630	in 32 bit mode	-0.124939
-0.426185	80386 32 bit mode	-0.124939
-0.304377	optimized for 16-bit mode	-0.124939
-0.304377	64-bit mode. 16-bit mode	-0.124939
-0.339455	floating point rounding mode	-0.124939
-0.044994	for a console mode	-0.124939
-0.044994	use a console mode	-0.124939
-0.044994	file. A console mode	-0.124939
-0.044994	interface. A console mode	-0.124939
-0.057011	set the flush-to-zero mode	-0.124939
-0.057011	setting the flush-to-zero mode	-0.124939
-0.180320	7.5. Set flush-to-zero mode	-0.124939
-0.093306	switch to protected mode	-0.124939
-0.093306	switching to protected mode	-0.124939
-0.102861	set the denormals-are-zero mode	-0.124939
-0.102861	flush-to-zero and denormals-are-zero mode	-0.124939
-1.068665	because of a store	-0.124939
-0.591011	miss on a store	-0.124939
-0.787555	to generate a store	-0.124939
-0.589419	safety is to store	-0.124939
-0.567400	block than to store	-0.124939
-1.729734	the compiler to store	-0.124939
-1.542084	you have to store	-0.124939
-0.837539	more efficient to store	-0.124939
-1.632878	is possible to store	-0.124939
-1.268065	it possible to store	-0.124939
-0.456724	of elements to store	-0.124939
-1.349008	for how to store	-0.124939
-1.024888	the need to store	-0.124939
-0.316257	// Function to store	-0.425969
-0.867034	deciding whether to store	-0.124939
-0.456724	memory space to store	-0.124939
-0.502963	same class and store	-0.124939
-0.357459	calculate *p+2 and store	-0.124939
-0.357459	containing (2,2,2,2), and store	-0.124939
-0.357459	vector (1,2,3,4), and store	-0.124939
-0.858049	so we can store	-0.124939
-0.588650	safety, you may store	-0.124939
-0.523770	the system may store	-0.124939
-1.324864	The compiler will store	-0.124939
-0.358259	strides. Uncached memory store	-0.124939
-0.355923	points to ; store	-0.124939
-0.345196	optimizing compiler might store	-0.124939
-0.294200	table. Even better: store	-0.124939
-0.599947	type in the values	-0.124939
-0.877868	recognize that the values	-0.124939
-0.589266	Assuming that the values	-0.124939
-0.597246	only on the values	-0.124939
-1.045482	will make the values	-0.124939
-0.565454	can store the values	-0.124939
-0.724337	and insert the values	-0.124939
-0.357879	and show the values	-0.124939
-0.461837	bit }; The values	-0.124939
-0.785094	is called. The values	-0.124939
-0.503038	an array. The values	-0.124939
-0.574783	labels that have values	-0.124939
-0.087565	b have other values	-0.124939
-0.087565	operands have other values	-0.124939
-0.087565	might have other values	-0.124939
-0.299172	have no other values	-0.425969
-0.357884	with different set values	-0.124939
-0.064944	for checking multiple values	-0.425969
-0.761058	of these two values	-0.124939
-1.125293	calculate the table values	-0.124939
-0.750941	replaced by their values	-0.124939
-0.352826	data have three values	-0.124939
-0.349655	initialized to desired values	-0.124939
-0.336180	then all five values	-0.124939
-0.331788	by their actual values	-0.124939
-0.331788	initialized to valid values	-0.124939
-0.331788	If the key values	-0.124939
-0.294135	the four G values	-0.124939
-0.294135	all the R values	-0.124939
-0.356648	2. Position-independent code. All	-0.124939
-0.586666	of operating system All	-0.124939
-0.653576	of order execution All	-0.124939
-0.821802	the application program. All	-0.124939
-0.353267	and message systems. All	-0.124939
-0.811999	Compiler optimization options All	-0.124939
-0.350604	register stack are: All	-0.124939
-0.349494	any public variables. All	-0.124939
-0.348145	simple standard operations. All	-0.124939
-0.523573	library is needed. All	-0.124939
-0.430259	several different purposes. All	-0.124939
-0.556638	for multiple purposes. All	-0.124939
-0.343536	user. Compatibility problems. All	-0.124939
-0.791054	with big-endian storage. All	-0.124939
-0.502046	accessed very fast. All	-0.124939
-0.339301	they cannot do. All	-0.124939
-0.325152	the best-case conditions. All	-0.124939
-0.314523	to be stored. All	-0.124939
-0.444054	or error prone. All	-0.124939
-0.294015	that need relocation. All	-0.124939
-0.382589	this "override" feature. All	-0.124939
-0.294015	in table 9.2. All	-0.124939
-0.237731	microprocessors are constructed. All	-0.124939
-0.237731	return prediction). 149 All	-0.124939
-0.237731	in two steps. All	-0.124939
-0.237731	(see page 93). All	-0.124939
-0.237731	external libraries. www.agner.org/optimize/#vectorclass All	-0.124939
-0.237731	problem: 1. Relocation. All	-0.124939
-0.237731	file formats. Comments All	-0.124939
-0.237731	is an integer). All	-0.124939
-0.237731	called stack unwinding. All	-0.124939
-1.185400	access to the sign	-0.124939
-0.594378	things with the sign	-0.124939
-0.982084	this example, the sign	-0.124939
-0.877544	to test the sign	-0.124939
-0.570280	or without the sign	-0.124939
-0.175125	shift out the sign	-0.425969
-0.572201	care about the sign	-0.124939
-0.524840	example sets the sign	-0.124939
-0.948910	can change the sign	-0.124939
-0.065519	bits except the sign	-0.124939
-0.758853	by setting the sign	-0.124939
-0.460780	ebx,31 copies the sign	-0.124939
-0.356681	to flip the sign	-0.124939
-0.356681	by inverting the sign	-0.124939
-0.358804	the fraction. The sign	-0.124939
-0.358801	various corrections for sign	-0.124939
-0.659435	: 1; // sign	-0.124939
-0.358614	-0 (zero with sign	-0.124939
-0.498056	0x3FF unsigned int sign	-0.124939
-0.498056	0x3FFF unsigned int sign	-0.124939
-0.498056	0x7F unsigned int sign	-0.124939
-0.312408	0x80000000; // set sign	-0.124939
-0.312408	0x7FFFFFFF; // set sign	-0.124939
-0.655253	{ // test sign	-0.124939
-0.502744	and shift out sign	-0.124939
-0.341810	; shift down sign	-0.124939
-0.502344	0x80000000; // Set sign	-0.124939
-0.294173	0x80000000; // flip sign	-0.124939
-0.237869	here: The inequality sign	-0.124939
-0.599609	calls to the copy	-0.124939
-0.890158	will use the copy	-0.124939
-1.682870	there is a copy	-0.124939
-0.598824	benefits of a copy	-0.124939
-0.579596	takes time to copy	-0.124939
-1.237752	be useful to copy	-0.124939
-0.358202	data block to copy	-0.124939
-0.657779	memory block and copy	-0.124939
-0.065672	to transpose and copy	-0.425969
-0.461837	hidden pointer. The copy	-0.124939
-0.503038	return value. The copy	-0.124939
-0.357512	and destructors. The copy	-0.124939
-1.352188	are useful for copy	-0.124939
-0.357672	0, sizeof(a)); // copy	-0.124939
-0.357672	= 0.0; // copy	-0.124939
-0.459520	improved performance. A copy	-0.124939
-0.355689	need initialization. A copy	-0.124939
-1.121759	there are no copy	-0.124939
-0.826646	object has no copy	-0.124939
-0.356030	Copy protection. Some copy	-0.124939
-0.353593	is updated. Most copy	-0.124939
-0.352267	system breakdown. Many copy	-0.124939
-0.452691	making an unused copy	-0.124939
-0.347360	entire object. Any copy	-0.124939
-0.015536	have a non-inlined copy	-0.124939
-0.015536	make a non-inlined copy	-0.124939
-0.015536	making a non-inlined copy	-0.124939
-0.048382	module. This non-inlined copy	-0.124939
-0.314649	saving a backup copy	-0.124939
-0.294135	to default constructors, copy	-0.124939
-0.282921	The costs of optimizing	-0.124939
-0.833706	the requirements of optimizing	-0.124939
-0.463476	for size and optimizing	-0.124939
-0.578798	functions than in optimizing	-0.124939
-0.358391	more efforts in optimizing	-0.124939
-1.240746	be useful for optimizing	-0.124939
-0.763071	are good for optimizing	-0.124939
-0.565153	algorithm than by optimizing	-0.124939
-0.723023	to gain by optimizing	-0.124939
-0.453959	assume that an optimizing	-0.124939
-0.351305	volatile then an optimizing	-0.124939
-0.351305	issue because an optimizing	-0.124939
-0.351305	C1::f. But an optimizing	-0.124939
-0.351305	most cases, an optimizing	-0.124939
-0.725699	more important than optimizing	-0.124939
-1.138162	into account when optimizing	-0.124939
-0.358223	very good at optimizing	-0.124939
-0.462673	a variable because optimizing	-0.124939
-0.357156	the choice between optimizing	-0.124939
-0.319508	first program. An optimizing	-0.124939
-0.319508	is used. An optimizing	-0.124939
-0.319508	below. Devirtualization An optimizing	-0.124939
-0.319508	char pointers). An optimizing	-0.124939
-1.038182	of the best optimizing	-0.124939
-0.547196	performance. A good optimizing	-0.124939
-0.354369	very fast. All optimizing	-0.124939
-0.351142	has many advanced optimizing	-0.124939
-0.540537	Volatile to prevent optimizing	-0.124939
-0.336160	the best job optimizing	-0.124939
-0.237820	Serialize // Prevent optimizing	-0.124939
-2.118570	part of the memory.	-0.124939
-0.596415	around in the memory.	-0.124939
-0.596415	contiguously in the memory.	-0.124939
-0.504043	consecutive bytes of memory.	-0.124939
-0.865320	same piece of memory.	-0.124939
-0.462751	copying blocks of memory.	-0.124939
-0.553443	registers, not in memory.	-0.124939
-1.027937	rather than in memory.	-0.124939
-1.000510	each other in memory.	-0.124939
-1.318243	are stored in memory.	-0.124939
-0.950183	objects stored in memory.	-0.124939
-1.031518	scattered around in memory.	-0.124939
-0.458785	save temp in memory.	-0.124939
-0.458785	stored sequentially in memory.	-0.124939
-0.458785	elements consecutively in memory.	-0.124939
-1.844424	in the code memory.	-0.124939
-0.247904	around in program memory.	-0.124939
-0.247904	contiguous in program memory.	-0.124939
-0.352958	the library into memory.	-0.124939
-0.518519	is loaded into memory.	-0.124939
-0.263610	stored in static memory.	-0.124939
-0.448285	efficiently than static memory.	-0.124939
-0.722383	memory to stack memory.	-0.124939
-0.422260	align dynamically allocated memory.	-0.124939
-0.422260	aligning dynamically allocated memory.	-0.124939
-0.350787	instead of main memory.	-0.124939
-0.198694	use of RAM memory.	-0.124939
-0.198694	speed of RAM memory.	-0.124939
-0.265171	results in RAM memory.	-0.124939
-0.331864	preferably with contiguous memory.	-0.124939
-1.817627	to use the well	-0.124939
-0.596092	However, with a well	-0.124939
-0.358822	a systematic and well	-0.124939
-0.063800	you may as well	-0.124939
-0.629387	the program as well	-0.124939
-0.343419	string functions as well	-0.124939
-0.343419	64-bit Linux as well	-0.124939
-0.343419	.NET framework as well	-0.124939
-0.343419	to reading as well	-0.124939
-0.343419	software users as well	-0.124939
-0.343419	bool, enum as well	-0.124939
-0.343419	not yet as well	-0.124939
-0.343419	2008 R2 as well	-0.124939
-1.054447	libraries are not well	-0.124939
-0.592239	of a pointer well	-0.124939
-0.357186	intensive may very well	-0.124939
-0.499348	depends on how well	-0.124939
-0.339773	to see how well	-0.124939
-0.339773	for checking how well	-0.124939
-0.343686	not always work well	-0.124939
-0.749031	it doesn't work well	-0.124939
-0.519447	library that works well	-0.124939
-0.340940	sure it works well	-0.124939
-0.354257	be predicted quite well	-0.124939
-0.093608	loops are predicted well	-0.425969
-0.290674	is usually predicted well	-0.124939
-0.314719	x86 platforms. Works well	-0.124939
-0.294154	They have worked well	-0.124939
-0.294154	are universal, flexible, well	-0.124939
-1.296482	make sure the information	-0.124939
-0.566883	may store the information	-0.124939
-0.527176	valuable source of information	-0.124939
-0.550478	pointers and for information	-0.124939
-0.358527	is known. This information	-0.124939
-1.047350	compiler doesn't have information	-0.124939
-0.562162	then use this information	-0.124939
-0.540260	119 for more information	-0.124939
-0.354291	handler needs all information	-0.124939
-0.354291	has saved all information	-0.124939
-0.568729	compiler has no information	-0.124939
-0.357234	to save some information	-0.124939
-0.356466	disassembly, probably without information	-0.124939
-0.355900	identification adds extra information	-0.124939
-0.532741	have the necessary information	-0.124939
-0.342965	the 124 necessary information	-0.124939
-0.349536	in memory. No information	-0.124939
-0.523596	give the full information	-0.124939
-0.345145	lot of added information	-0.124939
-0.458018	on the CPUID information	-0.124939
-0.269129	The only CPUID information	-0.124939
-0.339369	testing contains debug information	-0.124939
-0.521462	the stack unwinding information	-0.124939
-0.237849	class which gets information	-0.124939
-0.237849	generation class gets information	-0.124939
-0.325192	the compiler additional information	-0.124939
-0.325257	to store application-specific information	-0.124939
-0.294052	Compiler has insufficient information	-0.124939
-0.237763	advised to seek information	-0.124939
-0.237763	to save recovery information	-0.124939
-0.237763	it has incomplete information	-0.124939
-1.153368	code. It is simply	-0.124939
-0.588805	costless. It is simply	-0.124939
-0.479804	The pointer is simply	-0.124939
-0.685541	function pointer is simply	-0.124939
-0.594785	used, there is simply	-0.124939
-0.826009	or structure is simply	-0.124939
-0.142383	the difference is simply	-0.124939
-0.142383	The difference is simply	-0.124939
-0.719097	The effect is simply	-0.124939
-0.355618	An enum is simply	-0.124939
-0.355618	sequential labels is simply	-0.124939
-0.355618	processor X" is simply	-0.124939
-0.573085	both functions and simply	-0.124939
-0.584996	error-handling function that simply	-0.124939
-0.526227	overloaded function are simply	-0.124939
-0.724471	The values are simply	-0.124939
-0.358623	become imprecise or simply	-0.124939
-0.540711	extra time. It simply	-0.124939
-0.358028	additions. When used simply	-0.124939
-0.462144	data member pointer simply	-0.124939
-1.163061	floating point number simply	-0.124939
-0.576229	signal an error simply	-0.124939
-0.356115	of people. I simply	-0.124939
-1.074942	and unsigned integers simply	-0.124939
-0.569611	size is done simply	-0.124939
-0.559224	array is implemented simply	-0.124939
-1.249528	floating point numbers simply	-0.124939
-0.799283	can be copied simply	-0.124939
-0.483683	for CPU brand simply	-0.124939
-0.477817	It is measured simply	-0.124939
-0.294117	the performance significantly simply	-0.124939
-0.619950	the compiler is able	-0.425969
-0.658661	the microprocessor is able	-0.124939
-0.559748	code to be able	-0.124939
-0.559748	compiler to be able	-0.124939
-0.559748	want to be able	-0.124939
-0.611742	may not be able	-0.602060
-0.434999	might not be able	-0.124939
-0.561381	compiler may be able	-0.124939
-0.561381	compilers may be able	-0.124939
-0.825219	processor may be able	-0.124939
-0.946529	program will be able	-0.124939
-0.553201	103) will be able	-0.124939
-0.566420	compiler would be able	-0.124939
-0.254892	the compilers are able	-0.124939
-0.254892	all compilers are able	-0.124939
-0.254892	C++ compilers are able	-0.124939
-0.254892	Some compilers are able	-0.124939
-0.254892	few compilers are able	-0.124939
-0.254892	Few compilers are able	-0.124939
-0.386013	Intel microprocessors are able	-0.124939
-0.543425	Modern microprocessors are able	-0.124939
-0.881927	they are not able	-0.124939
-0.356909	is usually not able	-0.124939
-1.239162	is not always able	-0.124939
-0.517615	CPUs are actually able	-0.124939
-0.323701	whether they were able	-0.124939
-0.323701	have tested were able	-0.124939
-0.451169	processors are sometimes able	-0.124939
-1.957953	if it is certain	-0.124939
-0.462820	& 1 is certain	-0.124939
-0.358285	with #define is certain	-0.124939
-0.586543	see if a certain	-0.124939
-0.587091	lower than a certain	-0.124939
-0.526279	is within a certain	-0.124939
-0.358868	to adhere to certain	-0.124939
-0.358763	The speed for certain	-0.124939
-1.634467	make sure that certain	-0.124939
-0.358059	the likelihood that certain	-0.124939
-0.589291	You cannot be certain	-0.124939
-1.379951	If you are certain	-0.124939
-0.567480	optimal. There are certain	-0.124939
-0.567480	tables". There are certain	-0.124939
-0.565552	but only if certain	-0.124939
-0.357401	in parallel if certain	-0.124939
-0.358569	zero flags on certain	-0.124939
-0.598635	count is not certain	-0.124939
-0.658752	instruction sets have certain	-0.124939
-0.358212	generate interrupts at certain	-0.124939
-0.574429	instructions can make certain	-0.124939
-0.462282	may be no certain	-0.124939
-1.460757	It is therefore certain	-0.124939
-0.457724	up to count certain	-0.124939
-0.555444	it is quite certain	-0.124939
-0.354013	CPUID instruction was certain	-0.124939
-0.495848	unfortunately it prevents certain	-0.124939
-0.259298	it is almost certain	-0.124939
-0.259298	list is almost certain	-0.124939
-0.382680	has to obey certain	-0.124939
-0.237796	necessary to query certain	-0.124939
-0.787016	that the clock cycles	-0.124939
-0.210179	unit is clock cycles	-0.124939
-0.210179	uses more clock cycles	-0.124939
-0.365517	using CPU clock cycles	-0.124939
-0.436114	approximately two clock cycles	-0.124939
-0.210179	- 2 clock cycles	-0.124939
-0.281838	- 4 clock cycles	-0.124939
-0.281838	wastes several clock cycles	-0.124939
-0.039593	a few clock cycles	-0.221849
-0.063689	The few clock cycles	-0.124939
-0.166389	takes 10 clock cycles	-0.124939
-0.166389	take 10 clock cycles	-0.124939
-0.376953	The core clock cycles	-0.124939
-0.263759	are core clock cycles	-0.124939
-0.364790	- 5 clock cycles	-0.124939
-0.513247	- 20 clock cycles	-0.124939
-0.210179	and 15 clock cycles	-0.124939
-0.400563	- 80 clock cycles	-0.124939
-0.092497	a hundred clock cycles	-0.124939
-0.092497	several hundred clock cycles	-0.124939
-0.210179	takes 11 clock cycles	-0.124939
-0.210179	took 50 clock cycles	-0.124939
-0.043791	size matrices, clock cycles	-0.425969
-0.210179	only 2-3 clock cycles	-0.124939
-0.210179	is counting clock cycles	-0.124939
-0.900986	a[size], b[size]; // ...	-0.124939
-0.357668	= WhateverFunction(i); // ...	-0.124939
-0.576323	1000; i++) { ...	-0.124939
-0.345698	< 10) { ...	-0.124939
-0.345698	> 1.0) { ...	-0.124939
-0.345698	WriteFile(handle, ...)) { ...	-0.124939
-0.345698	<= max) { ...	-0.124939
-0.345698	catch (...) { ...	-0.124939
-0.345698	- min)) { ...	-0.124939
-0.507343	110; int i; ...	-0.124939
-0.507343	ab[size]; int i; ...	-0.124939
-0.507343	list[16]; int i; ...	-0.124939
-0.317766	a[size], b[size], i; ...	-0.124939
-0.592405	a, b, c; ...	-0.425969
-0.580577	C1 { public: ...	-0.124939
-0.352267	{ C1 x; ...	-0.124939
-0.504965	S1 x, y; ...	-0.124939
-0.269192	= CriticalFunction(b, c); ...	-0.124939
-0.269192	= (*CriticalFunction)(b, c); ...	-0.124939
-0.314640	order(int x); 136 ...	-0.124939
-0.714650	int i, j; ...	-0.124939
-0.294126	<asmlib.h> void CriticalFunction(); ...	-0.124939
-0.538709	3628800, 39916800, 479001600}; ...	-0.124939
-0.294126	i; float list[size]; ...	-0.124939
-0.237829	1000; int List[ArraySize]; ...	-0.124939
-0.237829	a = Func1(2); ...	-0.124939
-0.237829	log2 = log(2.0); ...	-0.124939
-0.237829	b[0], a[1], b[1], ...	-0.124939
-0.237829	a = FactorialTable[b]; ...	-0.124939
-0.896573	see that the addresses	-0.124939
-0.595358	occurs because the addresses	-0.124939
-1.539356	to calculate the addresses	-0.124939
-0.658358	to control the addresses	-0.124939
-0.540073	file includes the addresses	-0.124939
-1.045669	for calculating the addresses	-0.124939
-0.358533	requires alignment to addresses	-0.124939
-0.358533	data structures to addresses	-0.124939
-0.566972	All pointers and addresses	-0.124939
-0.526830	contain pointers or addresses	-0.124939
-0.358380	destination both have addresses	-0.124939
-0.462735	later reads from addresses	-0.124939
-0.579426	range of memory addresses	-0.124939
-0.355364	at round memory addresses	-0.124939
-0.462589	read from different addresses	-0.124939
-0.357549	use full 64-bit addresses	-0.124939
-0.493314	storing function return addresses	-0.124939
-0.350535	by causing return addresses	-0.124939
-0.356503	to translate these addresses	-0.124939
-0.355666	only calculate element addresses	-0.124939
-0.355550	and 0x4700. These addresses	-0.124939
-0.354959	profiler itself. Function addresses	-0.124939
-0.354360	Position-independent code. All addresses	-0.124939
-0.373699	smaller because relative addresses	-0.124939
-0.286751	will generate relative addresses	-0.124939
-0.373699	calculating self- relative addresses	-0.124939
-0.345091	for calculating row addresses	-0.124939
-0.339374	contains no absolute addresses	-0.124939
-0.336184	to calculate self-relative addresses	-0.124939
-0.325252	members to round addresses	-0.124939
-1.620010	This is a counter	-0.124939
-0.589809	counter is a counter	-0.124939
-1.689216	a floating point counter	-0.124939
-0.911779	of the loop counter	-0.124939
-0.491438	and the loop counter	-0.124939
-1.057907	if the loop counter	-0.124939
-0.491438	comparing the loop counter	-0.124939
-0.491438	increment the loop counter	-0.124939
-0.358765	of a loop counter	-0.301030
-0.521578	freely. The loop counter	-0.124939
-0.318208	variable as loop counter	-0.124939
-0.466941	time. A loop counter	-0.124939
-0.318208	// Initialize loop counter	-0.124939
-0.318208	// Increment loop counter	-0.124939
-0.587270	adding an integer counter	-0.124939
-0.651243	You may add counter	-0.124939
-0.871520	core clock cycles counter	-0.124939
-0.148239	Table // Loop counter	-0.124939
-0.452633	A performance monitor counter	-0.124939
-0.452633	useful performance monitor counter	-0.124939
-0.158356	core clock cycle counter	-0.124939
-0.076253	the time stamp counter	-0.124939
-0.095315	The time stamp counter	-0.124939
-0.095315	Returns time stamp counter	-0.124939
-1.293671	any of the shared	-0.124939
-1.073704	variable in the shared	-0.124939
-0.966752	called from the shared	-0.124939
-0.561263	accessed from the shared	-0.124939
-0.526501	we compile the shared	-0.124939
-0.978765	and store the shared	-0.124939
-0.596028	variable that is shared	-0.124939
-1.610667	part of a shared	-0.124939
-0.658286	function in a shared	-0.124939
-0.847827	not in a shared	-0.124939
-0.847827	variable in a shared	-0.124939
-0.559928	are making a shared	-0.124939
-0.357010	to compile a shared	-0.124939
-0.504863	memory address and shared	-0.124939
-1.395495	is used in shared	-0.124939
-0.600567	read-only can be shared	-0.124939
-1.066255	variables that are shared	-0.124939
-0.358642	linked libraries or shared	-0.124939
-0.598668	section is not shared	-0.124939
-0.562326	default, even when shared	-0.124939
-0.355696	static libraries. A shared	-0.124939
-0.355696	explained above. A shared	-0.124939
-1.365046	possible to make shared	-0.124939
-0.897076	within the same shared	-0.124939
-0.541945	in a 64-bit shared	-0.124939
-0.352092	speeding up 64-bit shared	-0.124939
-0.127327	libraries, also called shared	-0.425969
-1.034016	a very large shared	-0.124939
-0.865501	function call to count	-0.124939
-0.575166	set up to count	-0.124939
-0.550744	counter variables that count	-0.124939
-0.358708	than making it count	-0.124939
-0.934810	of the loop count	-0.124939
-0.646172	if the loop count	-0.425969
-0.500044	when the loop count	-0.124939
-0.500044	If the loop count	-0.124939
-0.553590	make a loop count	-0.124939
-0.539101	5. The loop count	-0.124939
-0.134734	the maximum loop count	-0.124939
-0.134734	The maximum loop count	-0.124939
-0.574256	measure the clock count	-0.124939
-0.474880	possible. The first count	-0.124939
-0.474880	way. The first count	-0.124939
-0.565339	non-zero, and therefore count	-0.124939
-0.353408	destination, but don't count	-0.124939
-0.013285	that the repeat count	-0.124939
-0.006592	if the repeat count	-0.425969
-0.013285	when the repeat count	-0.124939
-0.013285	If the repeat count	-0.124939
-0.070907	a high repeat count	-0.124939
-0.034008	the maximum repeat count	-0.124939
-0.034008	worst-case maximum repeat count	-0.124939
-0.070907	very low repeat count	-0.124939
-0.070907	the typical repeat count	-0.124939
-0.070907	and fixed repeat count	-0.124939
-0.345185	nothing while seconds count	-0.124939
-1.148185	part of the program.	-0.301030
-1.777871	parts of the program.	-0.124939
-0.581806	task of the program.	-0.124939
-1.337764	beginning of the program.	-0.124939
-1.208060	rest of the program.	-0.124939
-0.595177	place in the program.	-0.124939
-0.595177	time-consumer in the program.	-0.124939
-1.066197	modified by the program.	-0.124939
-0.357285	will crash the program.	-0.124939
-0.357285	that crashes the program.	-0.124939
-0.565987	performance of a program.	-0.124939
-0.894779	part of a program.	-0.124939
-0.565987	development of a program.	-0.124939
-0.525905	make up a program.	-0.124939
-0.884937	you start to program.	-0.124939
-0.503077	in a C++ program.	-0.124939
-1.043303	of the first program.	-0.124939
-0.809433	of a big program.	-0.124939
-0.759163	a console mode program.	-0.124939
-0.516018	of the application program.	-0.124939
-0.366757	by the application program.	-0.124939
-0.366757	market the application program.	-0.124939
-0.989483	in the main program.	-0.124939
-0.799887	of the whole program.	-0.124939
-1.059276	in the final program.	-0.124939
-0.341827	original, poorly designed program.	-0.124939
-0.421344	to an existing program.	-0.124939
-1.425614	but it is quite	-0.124939
-0.596739	kbytes. This is quite	-0.124939
-0.589456	strategies It is quite	-0.124939
-0.589456	a*b*c*2. It is quite	-0.124939
-0.590579	everything, which is quite	-0.124939
-0.559610	in C++ is quite	-0.124939
-0.356811	these problems is quite	-0.124939
-0.356811	and 12.4c is quite	-0.124939
-1.765230	This can be quite	-0.124939
-0.865992	functions can be quite	-0.124939
-1.329187	which can be quite	-0.124939
-0.583115	languages can be quite	-0.124939
-1.788424	it may be quite	-0.124939
-0.358742	time slice are quite	-0.124939
-0.598662	but is not quite	-0.124939
-0.540407	This can have quite	-0.124939
-0.358040	CPU model, which quite	-0.124939
-0.358033	and flexible, but quite	-0.124939
-0.586563	Fortran is also quite	-0.124939
-0.492827	memory can take quite	-0.124939
-0.492827	which can take quite	-0.124939
-1.063438	it is accessed quite	-0.124939
-1.054430	the compiler does quite	-0.124939
-0.493276	This is actually quite	-0.124939
-0.480780	consumption are actually quite	-0.124939
-0.551615	also be predicted quite	-0.124939
-0.346398	expressions also occur quite	-0.124939
-0.345173	which may happen quite	-0.124939
-0.336180	up, which happens quite	-0.124939
-0.467483	software that runs quite	-0.124939
-1.966017	instruction set is used.	-0.124939
-0.993370	the pointer is used.	-0.124939
-1.176682	the variable is used.	-0.124939
-0.783170	register stack is used.	-0.124939
-0.720528	double precision is used.	-0.124939
-0.356237	graphics framework is used.	-0.124939
-0.342166	static linking is used.	-0.124939
-0.356237	vector nontemporal is used.	-0.124939
-0.356237	when alloca is used.	-0.124939
-0.594288	set can be used.	-0.124939
-1.056271	array can be used.	-0.124939
-0.594817	formats should be used.	-0.124939
-0.235125	stack registers are used.	-0.124939
-0.267390	XMM registers are used.	-0.301030
-0.514360	time they are used.	-0.124939
-0.450437	which they are used.	-0.124939
-1.054562	objects are not used.	-0.124939
-1.136097	amount of memory used.	-0.124939
-0.524799	type of registers used.	-0.124939
-0.371553	it is never used.	-0.124939
-0.371553	program is never used.	-0.124939
-0.233708	is no longer used.	-0.124939
-0.338341	are no longer used.	-0.124939
-0.531204	program is actually used.	-0.124939
-0.325377	feature is seldom used.	-0.124939
-0.553607	Use large data files	-0.124939
-0.355256	and writing data files	-0.124939
-0.358171	space or make files	-0.124939
-0.065218	that scans all files	-0.425969
-0.357618	the necessary library files	-0.124939
-0.575678	and the object files	-0.124939
-0.538345	version of object files	-0.124939
-0.346902	compilers. Mixing object files	-0.124939
-0.357394	framework requiring many files	-0.124939
-0.355573	to load several files	-0.124939
-0.498559	format. The intermediate files	-0.124939
-0.309488	in different source files	-0.124939
-0.309488	join all source files	-0.124939
-0.309488	steps. All source files	-0.124939
-0.346366	more time loading files	-0.124939
-0.346366	modules or resource files	-0.124939
-0.444384	of the header files	-0.124939
-0.294225	in Intel header files	-0.124939
-0.237891	to store help files	-0.124939
-0.102856	resource files, help files	-0.124939
-0.102856	configuration files, help files	-0.124939
-0.339353	database connections. Open files	-0.124939
-0.182875	compiling multiple .cpp files	-0.124939
-0.182875	combining multiple .cpp files	-0.124939
-0.212266	and write configuration files	-0.124939
-0.212266	several drivers, configuration files	-0.124939
-0.325358	Table 12.2. Header files	-0.124939
-0.314630	different compiler. Object files	-0.124939
-0.237820	network connections. Temporary files	-0.124939
-0.892347	then it is recommended	-0.602060
-1.186438	Therefore, it is recommended	-0.124939
-0.553800	used, it is recommended	-0.124939
-0.553800	platforms, it is recommended	-0.124939
-0.811401	projects, it is recommended	-0.124939
-0.704589	functions. It is recommended	-0.124939
-0.869895	used. It is recommended	-0.124939
-0.491566	called. It is recommended	-0.124939
-0.704589	pointer. It is recommended	-0.124939
-0.491566	calls. It is recommended	-0.124939
-0.704589	problems. It is recommended	-0.124939
-0.491566	members. It is recommended	-0.124939
-0.491566	handling. It is recommended	-0.124939
-0.491566	versions. It is recommended	-0.124939
-0.491566	divisions. It is recommended	-0.124939
-0.491566	message. It is recommended	-0.124939
-0.491566	54. It is recommended	-0.124939
-0.491566	61. It is recommended	-0.124939
-0.491566	decimals. It is recommended	-0.124939
-0.691659	It is not recommended	-0.602060
-0.584357	resources are not recommended	-0.124939
-0.747149	and therefore not recommended	-0.124939
-1.034468	it is also recommended	-0.124939
-0.762807	It is therefore recommended	-0.124939
-0.237934	It is strongly recommended	-0.124939
-0.598565	object for the intermediate	-0.124939
-0.597811	overflow on the intermediate	-0.124939
-0.566199	to store the intermediate	-0.124939
-0.462891	or compiling the intermediate	-0.124939
-0.358341	which interprets the intermediate	-0.124939
-0.851599	Another disadvantage of intermediate	-0.124939
-0.586742	compiled code and intermediate	-0.124939
-0.358364	double precision, and intermediate	-0.124939
-0.461853	file format. The intermediate	-0.124939
-0.357525	second step. The intermediate	-0.124939
-0.357525	is distributed. The intermediate	-0.124939
-0.358797	temporary objects for intermediate	-0.124939
-0.526914	must consider if intermediate	-0.124939
-0.517733	language based on intermediate	-0.124939
-0.517733	framework based on intermediate	-0.124939
-0.572527	compilation of an intermediate	-0.124939
-0.559719	compiled to an intermediate	-0.124939
-0.542513	implemented with an intermediate	-0.124939
-0.525578	languages use an intermediate	-0.124939
-0.346015	Pascal used an intermediate	-0.124939
-0.351171	of using an intermediate	-0.124939
-0.351171	for using an intermediate	-0.124939
-0.521720	compiled into an intermediate	-0.124939
-0.357065	d; // makes intermediate	-0.124939
-0.458149	copy of every intermediate	-0.124939
-0.702637	possible to store intermediate	-0.124939
-0.490370	need to store intermediate	-0.124939
-0.354414	an integer). All intermediate	-0.124939
-0.438904	loop by storing intermediate	-0.124939
-0.314678	big runtime frameworks, intermediate	-0.124939
-1.771925	the code is fast	-0.124939
-0.358300	character arrays is fast	-0.124939
-0.358300	linear search, is fast	-0.124939
-0.546487	code and for fast	-0.124939
-0.356175	to unsigned for fast	-0.124939
-0.553365	of instructions for fast	-0.124939
-0.460137	the options for fast	-0.124939
-0.356175	A sourcebook for fast	-0.124939
-0.598230	access may be fast	-0.124939
-0.841542	Integer operations are fast	-0.124939
-0.533953	function is as fast	-0.124939
-0.453464	i++ are as fast	-0.124939
-0.643984	are therefore as fast	-0.124939
-0.170055	be just as fast	-0.124939
-0.170055	are just as fast	-0.124939
-0.170055	vector just as fast	-0.124939
-0.526732	compilers also have fast	-0.124939
-0.528803	CPUs are so fast	-0.124939
-0.350728	is developing so fast	-0.124939
-0.580761	constants is very fast	-0.124939
-0.570966	which is calculated fast	-0.124939
-0.354239	that runs quite fast	-0.124939
-0.526212	operations are particularly fast	-0.124939
-0.768985	recommended to enable fast	-0.124939
-0.343662	maximum, saturated addition, fast	-0.124939
-0.441928	p->member is equally fast	-0.124939
-0.494171	do the job fast	-0.124939
-0.325325	This worked sufficiently fast	-0.124939
-0.237837	fast approximate reciprocal, fast	-0.124939
-0.599800	overhead to the allocation	-0.124939
-0.503983	the object. The allocation	-0.124939
-0.462696	64-bit integers. The allocation	-0.124939
-0.504643	Especially the memory allocation	-0.124939
-0.089214	of dynamic memory allocation	-0.249877
-0.168395	use dynamic memory allocation	-0.124939
-0.447122	avoid dynamic memory allocation	-0.124939
-0.179957	All dynamic memory allocation	-0.124939
-0.179957	Whenever dynamic memory allocation	-0.124939
-0.025030	code. Dynamic memory allocation	-0.124939
-0.025030	allocation Dynamic memory allocation	-0.124939
-0.025030	allocation. Dynamic memory allocation	-0.124939
-0.025030	inefficient. Dynamic memory allocation	-0.124939
-0.025030	28 Dynamic memory allocation	-0.124939
-0.025030	are. Dynamic memory allocation	-0.124939
-0.025030	limited. Dynamic memory allocation	-0.124939
-0.012335	9.6 Dynamic memory allocation	-0.124939
-0.356921	to optimize register allocation	-0.124939
-0.556094	process of dynamic allocation	-0.124939
-0.483894	if it involves allocation	-0.124939
-0.341838	advance. The frequent allocation	-0.124939
-0.336287	is finished. Register allocation	-0.124939
-0.429952	cc[]) { for (int	-0.124939
-0.429952	bb) { for (int	-0.124939
-0.810759	= 0; for (int	-0.124939
-0.388090	// ... for (int	-0.124939
-0.388090	List[ArraySize]; ... for (int	-0.124939
-0.050973	biggest vectors: for (int	-0.124939
-0.012194	eight-element vectors: for (int	-0.726999
-0.350395	100 floats for (int	-0.124939
-0.350395	0, sum; for (int	-0.124939
-0.350395	= 1.f; for (int	-0.124939
-0.350395	name _alloca) for (int	-0.124939
-0.120422	14.1b int factorial (int	-0.124939
-0.120422	14.1a int factorial (int	-0.124939
-0.005265	8.7 int SomeFunction (int	-0.124939
-0.005265	8.9b int SomeFunction (int	-0.124939
-0.005265	8.9a int SomeFunction (int	-0.124939
-0.005265	8.11b int SomeFunction (int	-0.124939
-0.005265	8.11a int SomeFunction (int	-0.124939
-0.026989	7.1 float SomeFunction (int	-0.124939
-0.026989	<malloc.h> void SomeFunction (int	-0.124939
-0.325384	7.42 int Multiply (int	-0.124939
-0.314727	8.21 void Func1 (int	-0.124939
-0.294210	8.5a void Plus2 (int	-0.124939
-0.294210	m> int MultiplyBy (int	-0.124939
-0.294210	7.12 void FuncA (int	-0.124939
-0.237902	} void FuncB (int	-0.124939
-0.596467	read because the write	-0.124939
-0.480897	b than to write	-0.124939
-0.480897	operations than to write	-0.124939
-1.916650	is possible to write	-0.124939
-0.616732	is easier to write	-0.124939
-0.388955	just easier to write	-0.124939
-0.721168	may prefer to write	-0.124939
-0.655049	several minutes to write	-0.124939
-0.356514	from attempting to write	-0.124939
-0.358364	the value and write	-0.124939
-0.725464	to read and write	-0.124939
-0.142973	and read or write	-0.124939
-0.142973	not read or write	-0.124939
-0.588636	Windows, you may write	-0.124939
-0.589452	needed. You may write	-0.124939
-0.882290	example if you write	-0.124939
-0.356802	do, however, often write	-0.124939
-0.502513	problem. These instructions write	-0.124939
-0.344957	But if I write	-0.124939
-0.344957	speeds. If I write	-0.124939
-0.985658	if the threads write	-0.124939
-0.082254	then the nontemporal write	-0.124939
-0.082254	Using the nontemporal write	-0.124939
-0.183815	effect of nontemporal write	-0.124939
-0.183815	area. The nontemporal write	-0.124939
-0.183815	The so-called nontemporal write	-0.124939
-0.345177	think that programmers write	-0.124939
-0.325311	store An uncached write	-0.124939
-0.237861	fastcall)) __fastcall Noncached write	-0.124939
-0.551273	function and to optimize	-0.124939
-0.663639	the compiler to optimize	-0.249877
-0.457571	organize data to optimize	-0.124939
-2.040164	in order to optimize	-0.124939
-1.083111	you want to optimize	-0.124939
-0.860369	very important to optimize	-0.124939
-1.018221	be necessary to optimize	-0.124939
-0.457571	this information to optimize	-0.124939
-1.540885	be able to optimize	-0.124939
-0.869215	you start to optimize	-0.124939
-0.520271	we try to optimize	-0.124939
-0.358839	to inline and optimize	-0.124939
-0.794503	good compiler can optimize	-0.124939
-0.544382	just-in-time compiler can optimize	-0.124939
-0.313371	sets. Does not optimize	-0.124939
-0.313371	IDE. Does not optimize	-0.124939
-0.598948	help the compiler optimize	-0.124939
-0.864477	good compiler will optimize	-0.124939
-0.065113	8.1 How compilers optimize	-0.124939
-0.548622	compiler can often optimize	-0.124939
-0.502305	compiler can easily optimize	-0.124939
-0.325331	function or otherwise optimize	-0.124939
-0.237877	ms by selecting optimize	-0.124939
-0.237877	discussed below. Cannot optimize	-0.124939
-0.599433	more of the above	-0.124939
-0.864527	code in the above	-0.124939
-0.582350	bit in the above	-0.124939
-0.582350	methods in the above	-0.124939
-0.582350	mentioned in the above	-0.124939
-0.582350	shown in the above	-0.124939
-0.559777	removed from the above	-0.124939
-0.559777	deviate from the above	-0.124939
-0.587273	array using the above	-0.124939
-0.701930	faster. In the above	-0.124939
-0.489936	big. In the above	-0.124939
-0.356839	Let's repeat the above	-0.124939
-0.356839	inlined. (In the above	-0.124939
-0.460981	me explain the above	-0.124939
-0.356839	program. Weighing the above	-0.124939
-0.460261	a register. The above	-0.124939
-0.356273	+= a[i]; The above	-0.124939
-0.356273	unknown sources. The above	-0.124939
-0.356273	happens rarely. The above	-0.124939
-0.356273	pointers. 144 The above	-0.124939
-0.065614	_EM_OVERFLOW); // if above	-0.124939
-0.358574	actually is. This above	-0.124939
-0.358058	lines we used above	-0.124939
-0.349770	The method described above	-0.124939
-0.343722	the disadvantages mentioned above	-0.124939
-0.429472	with column 28 above	-0.124939
-0.294200	mirror elements matrix[c][r] above	-0.124939
-0.237894	its mirror position above	-0.124939
-0.356654	in 64-bit code. However,	-0.124939
-0.355882	the next function. However,	-0.124939
-0.565144	nontemporal is used. However,	-0.124939
-0.351128	the x86 CPUs. However,	-0.124939
-0.642412	all major platforms. However,	-0.124939
-0.452425	of the size. However,	-0.124939
-0.348877	were scarce resources. However,	-0.124939
-0.488771	reproducible as possible. However,	-0.124939
-0.347311	the other thread. However,	-0.124939
-0.700207	many different purposes. However,	-0.124939
-0.632459	large data sets. However,	-0.124939
-0.343545	are most critical. However,	-0.124939
-0.343477	they are executed. However,	-0.124939
-0.341686	with its value. However,	-0.124939
-0.339265	instruction set. 120 However,	-0.124939
-0.339265	the actual processor. However,	-0.124939
-0.339308	mechanism works automatically. However,	-0.124939
-0.421103	what they are. However,	-0.124939
-0.325235	the fastest first. However,	-0.124939
-0.314533	a PC platform. However,	-0.124939
-0.382600	of a debugger. However,	-0.124939
-0.382600	on the screen. However,	-0.124939
-0.382600	the program flow. However,	-0.124939
-0.294024	the next calculation. However,	-0.124939
-0.294024	best Java implementations. However,	-0.124939
-0.237739	inside {} brackets. However,	-0.124939
-0.237739	for later maintenance. However,	-0.124939
-0.237739	specific CPU models. However,	-0.124939
-0.237739	for function F1. However,	-0.124939
-0.874076	invalid if a was	-0.124939
-0.723082	cache line that was	-0.124939
-0.357339	a model that was	-0.124939
-0.357339	of ebx that was	-0.124939
-1.070547	the time it was	-0.425969
-0.515623	last time it was	-0.124939
-0.355232	the value it was	-0.124939
-0.598269	where the function was	-0.124939
-0.595127	since the CPU was	-0.124939
-0.833455	the CPUID instruction was	-0.124939
-1.069712	This instruction set was	-0.124939
-0.575154	so that there was	-0.124939
-0.175843	time the software was	-0.124939
-0.960261	on non-Intel CPUs was	-0.124939
-0.351606	C++ template feature was	-0.124939
-0.350183	time the statement was	-0.124939
-0.349600	Each 128-bit operation was	-0.124939
-0.345063	thread. If seconds was	-0.124939
-0.343539	because this brand was	-0.124939
-0.892543	when the CPUID was	-0.124939
-0.336239	four multiplications. How was	-0.124939
-0.331727	ago, the recommendation was	-0.124939
-0.607077	version of Basic was	-0.124939
-0.714171	the time consumption was	-0.124939
-0.467400	15.1b to 15.1c was	-0.124939
-0.325222	in which alloca was	-0.124939
-0.294080	in example 11.2b was	-0.124939
-0.567186	the members of both	-0.124939
-0.654087	the same in both	-0.124939
-0.988942	are available in both	-0.124939
-0.459952	to work in both	-0.124939
-0.356029	cache line in both	-0.124939
-0.536816	can run in both	-0.124939
-0.502356	assembly syntax in both	-0.124939
-0.356029	standardized details in both	-0.124939
-0.598204	network may be both	-0.124939
-0.358719	and v.f are both	-0.124939
-0.881288	will fail if both	-0.124939
-0.357401	> v.f if both	-0.124939
-1.335773	is supported by both	-0.124939
-0.504524	may work with both	-0.124939
-0.804321	lot of time both	-0.124939
-0.462793	and b will both	-0.124939
-0.358288	are swapped then both	-0.124939
-0.358209	requires support from both	-0.124939
-0.358181	programming style has both	-0.124939
-0.462665	const twice because both	-0.124939
-0.354071	inline and optimize both	-0.124939
-0.353821	run time. Therefore, both	-0.124939
-0.350679	test tool supports both	-0.124939
-0.350221	optimized yet. Supports both	-0.124939
-0.349080	loop counter outside both	-0.124939
-0.345142	independent and checks both	-0.124939
-0.336129	they always evaluate both	-0.124939
-0.294089	source and destination both	-0.124939
-0.237796	systems. Today (2013) both	-0.124939
-1.060007	resources than the programs	-0.124939
-0.358859	and uninstallation of programs	-0.124939
-0.981460	operating systems and programs	-0.124939
-0.502933	CPU use in programs	-0.124939
-1.195562	be useful in programs	-0.124939
-0.723310	double precision in programs	-0.124939
-0.785741	such errors in programs	-0.124939
-0.526069	time-consumer even for programs	-0.124939
-0.358101	annoyingly high for programs	-0.124939
-0.358566	the case with programs	-0.124939
-0.357531	can expect 64-bit programs	-0.124939
-0.814338	errors in C++ programs	-0.124939
-0.357504	with many such programs	-0.124939
-0.357360	code. But many programs	-0.124939
-0.503363	updates Many software programs	-0.124939
-0.357139	faster than 32-bit programs	-0.124939
-0.827317	you are making programs	-0.124939
-0.355993	time intervals. Some programs	-0.124939
-0.459288	that many common programs	-0.124939
-0.354729	matters, which few programs	-0.124939
-0.555736	Windows and Mac programs	-0.124939
-0.353690	access Some application programs	-0.124939
-0.352203	user input. Many programs	-0.124939
-0.347372	user's time. Other programs	-0.124939
-0.550869	why object oriented programs	-0.124939
-0.429294	throughput of CPU-intensive programs	-0.124939
-0.237780	problem in interactive programs	-0.124939
-0.237780	same cache. Multithreaded programs	-0.124939
-0.600709	investigation of the problems	-0.124939
-0.596444	fighting with the problems	-0.124939
-0.542742	of all the problems	-0.124939
-0.542742	solve all the problems	-0.124939
-0.526671	this involves the problems	-0.124939
-0.971000	this kind of problems	-0.124939
-0.358889	less susceptible to problems	-0.124939
-0.901821	the CPU has problems	-0.124939
-0.347233	solution to these problems	-0.124939
-0.510162	prone. All these problems	-0.124939
-0.356531	Gnu compilers without problems	-0.124939
-0.355590	the server. These problems	-0.124939
-0.500341	problems. Some common problems	-0.124939
-0.550491	software can cause problems	-0.124939
-0.341390	Such schemes cause problems	-0.124939
-0.332908	are no caching problems	-0.124939
-0.332908	can cause caching problems	-0.124939
-0.306470	causes of compatibility problems	-0.124939
-0.306470	sources of compatibility problems	-0.124939
-0.102871	time and compatibility problems	-0.124939
-0.102871	problems and compatibility problems	-0.124939
-0.299693	sources of resource problems	-0.124939
-0.548533	and other resource problems	-0.124939
-0.969894	useful for finding problems	-0.124939
-0.237881	terms of usability problems	-0.124939
-0.237881	as important usability problems	-0.124939
-0.237881	compatibility problems, usability problems	-0.124939
-0.314669	inlining causes technical problems	-0.124939
-0.562331	very long time unless	-0.124939
-0.837115	an induction variable unless	-0.124939
-0.539229	not do so unless	-0.124939
-0.554168	do the optimization unless	-0.124939
-0.460645	transferred as pointers unless	-0.124939
-1.313460	in 32-bit systems unless	-0.124939
-0.356102	&& to & unless	-0.124939
-0.261452	on non-Intel CPUs unless	-0.425969
-1.256554	floating point calculations unless	-0.124939
-0.678534	in 32-bit mode unless	-0.124939
-0.678534	the flush-to-zero mode unless	-0.124939
-0.575152	for exception handling unless	-0.124939
-0.850584	definitely be avoided unless	-0.124939
-0.463603	which is slow unless	-0.124939
-0.314654	comparisons are slow unless	-0.124939
-0.512753	a stack frame unless	-0.124939
-0.349011	but not safe unless	-0.124939
-0.528259	program more clear unless	-0.124939
-0.483677	off by default unless	-0.124939
-0.339338	a large object, unless	-0.124939
-0.339338	time than rounding unless	-0.124939
-0.314572	a loop manually unless	-0.124939
-0.538595	Mac OS X, unless	-0.124939
-0.237772	int (16 bits), unless	-0.124939
-0.237772	than as b*(2.0/3.0) unless	-0.124939
-0.237772	an integer constant, unless	-0.124939
-0.237772	variable method unfavorable, unless	-0.124939
-0.897818	functions in the optimal	-0.124939
-0.760417	to be the optimal	-0.124939
-0.565197	set then the optimal	-0.124939
-0.565197	case then the optimal	-0.124939
-0.802173	most cases, the optimal	-0.124939
-0.555138	to choose the optimal	-0.124939
-1.174949	to find the optimal	-0.124939
-0.503543	will produce the optimal	-0.124939
-0.031562	2 Choosing the optimal	-0.425969
-0.031562	5 Choosing the optimal	-0.425969
-0.594236	type that is optimal	-0.124939
-1.340041	whether it is optimal	-0.124939
-0.882541	cases, it is optimal	-0.124939
-0.597246	35 This is optimal	-0.124939
-0.577701	which solution is optimal	-0.124939
-0.539431	vector implementation is optimal	-0.124939
-0.358830	is finished. The optimal	-0.124939
-1.605396	may not be optimal	-0.124939
-1.129886	it may be optimal	-0.124939
-1.506841	it is not optimal	-0.124939
-0.560923	processors is not optimal	-0.124939
-0.824378	union is not optimal	-0.124939
-0.560923	unrolling is not optimal	-0.124939
-0.959149	is not an optimal	-0.124939
-0.356998	compilers produce less optimal	-0.124939
-0.358950	to deallocate the space	-0.124939
-0.358900	statement occupies a space	-0.124939
-0.855389	required amount of space	-0.124939
-0.463479	the heap. The space	-0.124939
-0.353467	takes up more space	-0.124939
-0.497393	not allocate more space	-0.124939
-0.353467	than allocating more space	-0.124939
-0.802453	that the memory space	-0.124939
-1.022399	amount of memory space	-0.124939
-0.237766	stack. The memory space	-0.124939
-0.237766	(PLT). The memory space	-0.124939
-0.335960	purposes. This memory space	-0.124939
-0.335960	most efficient memory space	-0.124939
-0.335960	cases take memory space	-0.124939
-0.335960	for saving memory space	-0.124939
-0.335960	efficient. Extra memory space	-0.124939
-1.654987	use the same space	-0.124939
-0.658030	lot of cache space	-0.124939
-0.462470	amount of cache space	-0.124939
-0.634751	take up cache space	-0.124939
-0.346187	can save cache space	-0.124939
-0.357034	the larger address space	-0.124939
-0.500427	takes too much space	-0.124939
-0.449036	memory and disk space	-0.124939
-0.347425	and takes little space	-0.124939
-0.432725	when the heap space	-0.124939
-0.306087	causes the heap space	-0.124939
-0.454901	26. The heap space	-0.124939
-0.595012	performance. There are cases,	-0.124939
-0.550187	__intel_cpu_features_init_x(). In other cases,	-0.124939
-0.539869	permissible in all cases,	-0.124939
-0.359346	time in most cases,	-0.124939
-0.359346	fast in most cases,	-0.124939
-0.505604	implementation in most cases,	-0.124939
-0.359346	because, in most cases,	-0.124939
-0.179039	cases In most cases,	-0.124939
-0.179039	etc. In most cases,	-0.124939
-0.179039	integers In most cases,	-0.124939
-0.179039	calculation. In most cases,	-0.124939
-0.179039	strings. In most cases,	-0.124939
-0.357551	calculations. In such cases,	-0.124939
-0.245967	efficient. In many cases,	-0.124939
-0.245967	priority. In many cases,	-0.124939
-0.448862	useful in some cases,	-0.124939
-0.448862	optimizations in some cases,	-0.124939
-0.448862	iterator in some cases,	-0.124939
-0.311710	function In some cases,	-0.124939
-0.311710	function. In some cases,	-0.124939
-0.311710	users. In some cases,	-0.124939
-0.311710	2002). In some cases,	-0.124939
-0.311710	API. In some cases,	-0.124939
-0.345469	pointers In simple cases,	-0.124939
-0.345469	In 50 simple cases,	-0.124939
-1.021723	are a few cases,	-0.124939
-0.503352	in the simplest cases,	-0.124939
-0.357737	for the simplest cases,	-0.124939
-0.356189	function pointer if else	-0.124939
-0.106680	if else if else	-0.124939
-1.252228	} } } else	-0.124939
-0.529184	DTRUE; } } else	-0.124939
-0.675471	= 0; } else	-0.124939
-0.402221	= b; } else	-0.124939
-0.548239	= 1; } else	-0.124939
-0.468539	+ 1; } else	-0.425969
-0.309958	Table lookup } else	-0.124939
-0.163270	* 2; } else	-0.425969
-0.702985	+ 1.; } else	-0.124939
-0.309958	is nonzero } else	-0.124939
-0.309958	range"; 134 } else	-0.124939
-0.309958	of range"; } else	-0.124939
-0.309958	{ FuncA(i); } else	-0.124939
-0.127894	{ F1(a); } else	-0.124939
-0.127894	a[1000]; F1(a); } else	-0.124939
-0.402221	= &CriticalFunction_SSE2; } else	-0.124939
-0.402221	= &CriticalFunction_AVX; } else	-0.124939
-0.309958	1; 69 } else	-0.124939
-0.188106	rely on anything else	-0.124939
-0.188106	time than anything else	-0.124939
-0.188106	to optimize anything else	-0.124939
-0.237934	} } 34 else	-0.124939
-0.237934	sin(x); } 68 else	-0.124939
-0.871692	There is a lot	-0.425969
-0.577840	advantageous if a lot	-0.124939
-0.584689	Func with a lot	-0.124939
-0.862942	can use a lot	-0.124939
-0.370967	to do a lot	-0.124939
-0.099006	can do a lot	-0.425969
-0.354285	that take a lot	-0.124939
-0.354285	conversions take a lot	-0.124939
-0.708862	is often a lot	-0.124939
-0.859478	may cause a lot	-0.124939
-0.575474	program uses a lot	-0.124939
-0.408065	application uses a lot	-0.124939
-0.552169	soon get a lot	-0.124939
-0.515884	often contains a lot	-0.124939
-0.529448	or require a lot	-0.124939
-0.351156	can save a lot	-0.124939
-0.351156	they waste a lot	-0.124939
-0.351156	can spend a lot	-0.124939
-0.105731	can consume a lot	-0.425969
-0.351156	still consumes a lot	-0.124939
-0.351156	drivers differ a lot	-0.124939
-0.351156	software installed, a lot	-0.124939
-0.351156	of RAM, a lot	-0.124939
-0.546719	mathematical functions. A lot	-0.124939
-0.355787	into vectors. A lot	-0.124939
-2.123531	x - - Integer	-0.124939
-1.785110	power of 2 Integer	-0.124939
-0.909433	Whole program optimization Integer	-0.124939
-1.486632	at compile time. Integer	-0.124939
-0.926822	take longer time. Integer	-0.124939
-0.523435	risk of overflow Integer	-0.124939
-0.778064	variables and operators Integer	-0.124939
-0.338787	results. Integer operators Integer	-0.124939
-0.754628	14.4 Integer multiplication Integer	-0.124939
-0.558392	14.5 Integer division Integer	-0.124939
-0.642566	in many cases. Integer	-0.124939
-0.350143	a specific size. Integer	-0.124939
-0.511568	cost in performance. Integer	-0.124939
-0.336150	memory pool. 15 Integer	-0.124939
-0.429294	most other microprocessors. Integer	-0.124939
-0.325272	used for constants. Integer	-0.124939
-0.325272	produce undesired results. Integer	-0.124939
-0.172598	the performance. 14.4 Integer	-0.124939
-0.172598	once................................... 135 14.4 Integer	-0.124939
-0.575299	on the microprocessor. Integer	-0.124939
-0.102816	............................................................................................. 136 14.5 Integer	-0.124939
-0.102816	page 96. 14.5 Integer	-0.124939
-0.294071	an integer. 158 Integer	-0.124939
-0.294071	for further discussion. Integer	-0.124939
-0.538611	on the processor). Integer	-0.124939
-0.237780	contains integer division: Integer	-0.124939
-0.237780	// Example 15.1d. Integer	-0.124939
-0.237780	// Example 8.24. Integer	-0.124939
-0.598331	possible, and the dispatching	-0.124939
-1.423573	to do the dispatching	-0.124939
-0.358197	"Hello 2" The dispatching	-0.124939
-0.358197	page 44. The dispatching	-0.124939
-0.515301	do the CPU dispatching	-0.124939
-0.515301	override the CPU dispatching	-0.124939
-0.451545	pitfalls of CPU dispatching	-0.124939
-0.387425	intended for CPU dispatching	-0.124939
-0.297955	13.1 // CPU dispatching	-0.124939
-0.387425	research on CPU dispatching	-0.124939
-0.297955	libraries have CPU dispatching	-0.124939
-0.387425	AVX using CPU dispatching	-0.124939
-0.645993	with automatic CPU dispatching	-0.124939
-0.333556	contains automatic CPU dispatching	-0.124939
-0.297955	compiler supports CPU dispatching	-0.124939
-0.297955	should apply CPU dispatching	-0.124939
-0.297955	vector c: CPU dispatching	-0.124939
-0.297955	an explicit CPU dispatching	-0.124939
-0.297955	processors. Explicit CPU dispatching	-0.124939
-0.123911	set. 13.6 CPU dispatching	-0.124939
-0.123911	126 13.6 CPU dispatching	-0.124939
-0.123911	128 13.7 CPU dispatching	-0.124939
-0.123911	129 13.7 CPU dispatching	-0.124939
-0.297955	Example 13.2. CPU dispatching	-0.124939
-0.813230	because it makes dispatching	-0.124939
-0.456374	function. The automatic dispatching	-0.124939
-0.048396	122 13.2 Model-specific dispatching	-0.124939
-0.048396	files. 13.2 Model-specific dispatching	-0.124939
-0.600946	overflow in the particular	-0.124939
-0.591843	CPU of a particular	-0.124939
-0.576631	with in a particular	-0.124939
-0.853638	not in a particular	-0.124939
-0.576631	invalid in a particular	-0.124939
-0.450165	function for a particular	-0.124939
-0.450165	use for a particular	-0.124939
-0.450165	compiled for a particular	-0.124939
-0.450165	optimized for a particular	-0.124939
-0.450165	fine-tuned for a particular	-0.124939
-0.656120	means that a particular	-0.124939
-0.461251	shows that a particular	-0.124939
-0.461251	possibility that a particular	-0.124939
-0.461251	feel that a particular	-0.124939
-0.545068	long on a particular	-0.124939
-0.545068	bad on a particular	-0.124939
-0.524843	integer has a particular	-0.124939
-0.524843	processor has a particular	-0.124939
-1.298070	by using a particular	-0.124939
-0.562648	cases where a particular	-0.124939
-0.351764	considering whether a particular	-0.124939
-0.351764	job optimizing a particular	-0.124939
-0.351764	N supports a particular	-0.124939
-0.519411	can expect a particular	-0.124939
-0.351764	you activate a particular	-0.124939
-0.463509	effort on that particular	-0.124939
-0.504941	Lookup tables are particular	-0.124939
-0.549479	CPU that each particular	-0.124939
-1.187720	is that the microprocessor	-0.124939
-0.841567	case that the microprocessor	-0.124939
-1.088718	require that the microprocessor	-0.124939
-1.456757	supported by the microprocessor	-0.124939
-0.595798	relying on the microprocessor	-0.124939
-0.591859	slow, then the microprocessor	-0.124939
-0.593240	process because the microprocessor	-0.124939
-0.846649	simultaneously. If the microprocessor	-0.124939
-0.572923	sum2; If the microprocessor	-0.124939
-0.580789	call makes the microprocessor	-0.124939
-0.501897	and how the microprocessor	-0.124939
-0.501897	most cases the microprocessor	-0.124939
-0.655413	how well the microprocessor	-0.124939
-0.655413	that fits the microprocessor	-0.124939
-0.356696	if-else structure), the microprocessor	-0.124939
-1.287109	than in a microprocessor	-0.124939
-0.833956	to implement a microprocessor	-0.124939
-0.358292	set (requires a microprocessor	-0.124939
-0.181383	2.2 Choice of microprocessor	-0.124939
-0.358866	compiler technology, and microprocessor	-0.124939
-0.358889	general improvements in microprocessor	-0.124939
-0.355754	dependency chain. A microprocessor	-0.124939
-0.355754	integer counter. A microprocessor	-0.124939
-0.210239	of a dedicated microprocessor	-0.124939
-0.308692	than a dedicated microprocessor	-0.124939
-0.592947	errors is to replace	-0.124939
-1.566758	you have to replace	-0.124939
-1.177268	is possible to replace	-0.124939
-1.155393	be possible to replace	-0.124939
-1.027256	be necessary to replace	-0.124939
-1.376387	is advantageous to replace	-0.124939
-0.502823	is expected to replace	-0.124939
-0.676399	the compiler can replace	-0.124939
-0.873039	The compiler can replace	-0.124939
-0.810260	optimizing compiler can replace	-0.124939
-0.527179	const variable or replace	-0.124939
-0.358082	The compiler may replace	-0.522879
-0.580648	debugger. You may replace	-0.124939
-1.305204	The compiler will replace	-0.124939
-1.125065	Some compilers will replace	-0.124939
-0.358321	poorly predictable then replace	-0.124939
-0.561478	Likewise, you cannot replace	-0.124939
-0.558329	1; You cannot replace	-0.124939
-0.502099	compilers will often replace	-0.124939
-0.668335	compiler can automatically replace	-0.124939
-0.609299	compilers will automatically replace	-0.124939
-0.599187	code of the next	-0.124939
-1.480871	pointer to the next	-0.124939
-0.876463	directly to the next	-0.124939
-0.592653	explained in the next	-0.124939
-0.879944	values in the next	-0.124939
-0.892141	function for the next	-0.124939
-1.662475	is that the next	-0.124939
-0.857857	case that the next	-0.124939
-1.195716	assume that the next	-0.124939
-0.595620	μs on the next	-0.124939
-0.578587	only when the next	-0.124939
-0.578587	update when the next	-0.124939
-0.877024	and make the next	-0.124939
-0.523789	we need the next	-0.124939
-0.460614	to start the next	-0.124939
-0.563753	only until the next	-0.124939
-0.872085	32-bit mode. The next	-0.124939
-0.652101	function returns. The next	-0.124939
-0.458681	of ebx. The next	-0.124939
-0.355028	need metaprogramming. The next	-0.124939
-0.355028	of range. The next	-0.124939
-0.458681	contains i/2+r. The next	-0.124939
-0.355028	and VIA. The next	-0.124939
-0.358744	*= xx4; // next	-0.124939
-0.355125	x // get next	-0.124939
-0.237934	will be mainstream next	-0.124939
-1.072341	especially if the branches	-0.124939
-1.415040	The number of branches	-0.124939
-1.731677	a lot of branches	-0.124939
-0.357868	The target of branches	-0.124939
-1.223429	a series of branches	-0.124939
-0.527170	jumps, calls and branches	-0.124939
-0.358768	from optimal. The branches	-0.124939
-0.890023	contentions is that branches	-0.124939
-0.442656	that all code branches	-0.124939
-0.313485	call all code branches	-0.124939
-0.357972	to test all branches	-0.124939
-0.462485	put seldom used branches	-0.124939
-0.525727	count and no branches	-0.124939
-1.133147	of the two branches	-0.124939
-0.562170	program with many branches	-0.124939
-0.800272	program has many branches	-0.124939
-0.827362	you are making branches	-0.124939
-0.355559	that contains several branches	-0.124939
-0.354754	have as few branches	-0.124939
-0.541748	where the dispatch branches	-0.124939
-0.349003	correlated with preceding branches	-0.124939
-0.348224	appropriate. 8. Avoid branches	-0.124939
-0.124479	analysis Join identical branches	-0.124939
-0.124479	area. Join identical branches	-0.124939
-0.314680	Eliminate jumps Eliminate branches	-0.124939
-0.237804	vector size. Unpredictable branches	-0.124939
-0.237804	unpacking needed. Predictable branches	-0.124939
-0.597222	b; This is typically	-0.124939
-0.577097	conversion time is typically	-0.124939
-0.559156	object which is typically	-0.124939
-0.559156	size, which is typically	-0.124939
-0.571250	line size is typically	-0.124939
-0.357406	or malloc is typically	-0.124939
-0.358882	time slices of typically	-0.124939
-0.358074	data structures that typically	-0.124939
-0.358074	are frameworks that typically	-0.124939
-0.524672	mode program are typically	-0.124939
-0.584483	Library functions are typically	-0.124939
-0.357151	and child are typically	-0.124939
-0.356892	if unsigned. This typically	-0.124939
-0.356892	compiled C++. This typically	-0.124939
-0.572843	range. This may typically	-0.124939
-0.355652	of programming will typically	-0.124939
-0.355652	point calculations will typically	-0.124939
-1.205367	a function pointer typically	-0.124939
-0.357401	branch instruction takes typically	-0.124939
-0.356420	integer without SSE2 typically	-0.124939
-0.496590	features. The programmer typically	-0.124939
-0.352258	code. This framework typically	-0.124939
-0.490235	Strings Text strings typically	-0.124939
-0.341779	that such devices typically	-0.124939
-0.438898	swapping. Software developers typically	-0.124939
-0.331858	layers and frameworks typically	-0.124939
-0.314706	lower priority level, typically	-0.124939
-0.578916	friend function or operator	-0.124939
-0.358169	= b;} vector operator	-0.124939
-0.503670	latter has one operator	-0.124939
-0.298342	then the & operator	-0.124939
-0.298342	But the & operator	-0.124939
-0.334914	operator. The & operator	-0.124939
-0.647727	using the | operator	-0.124939
-0.530673	[] array index operator	-0.124939
-0.349072	constructor // sum operator	-0.124939
-0.265122	constructor or overloaded operator	-0.124939
-0.378722	Using an overloaded operator	-0.124939
-0.265122	operators An overloaded operator	-0.124939
-0.255865	// C++ casting operator	-0.124939
-0.414710	loaded type casting operator	-0.124939
-0.331799	1. The AND operator	-0.124939
-0.237845	cycle. The OR operator	-0.124939
-0.237845	the EXCLUSIVE OR operator	-0.124939
-0.325252	to the modulo operator	-0.124939
-0.325252	operators The pre-increment operator	-0.124939
-0.294108	cast The dynamic_cast operator	-0.124939
-0.102828	of the const_cast operator	-0.124939
-0.102828	cast The const_cast operator	-0.124939
-0.294108	zero. The [] operator	-0.124939
-0.237812	2.6f; The ?: operator	-0.124939
-0.237812	and the post-increment operator	-0.124939
-0.237812	cast The reinterpret_cast operator	-0.124939
-0.237812	cast The static_cast operator	-0.124939
-0.463624	graphics application is preferably	-0.124939
-0.358775	YMM vectors are preferably	-0.124939
-0.575326	of this by preferably	-0.124939
-0.358573	the program - preferably	-0.124939
-0.589467	anyway. You may preferably	-0.124939
-0.356537	first dimension may preferably	-0.124939
-0.760751	a function should preferably	-0.124939
-0.305223	innermost loop should preferably	-0.124939
-0.305223	each object should preferably	-0.124939
-0.080017	the objects should preferably	-0.124939
-0.038168	and objects should preferably	-0.425969
-0.305223	decomposition, we should preferably	-0.124939
-0.431569	speed test should preferably	-0.124939
-0.305223	a list should preferably	-0.124939
-0.305223	loop counter should preferably	-0.124939
-0.305223	loop count should preferably	-0.124939
-0.305223	switch statements should preferably	-0.124939
-0.558390	Loop unrolling should preferably	-0.124939
-0.305223	other device should preferably	-0.124939
-0.305223	an interrupt should preferably	-0.124939
-0.261607	code should therefore preferably	-0.124939
-0.261607	It should therefore preferably	-0.124939
-0.261607	calculations should therefore preferably	-0.124939
-0.336270	A few files, preferably	-0.124939
-0.294200	a single container, preferably	-0.124939
-0.237894	16 for SSE2, preferably	-0.124939
-0.573053	{ c = 1;	-0.124939
-0.452683	(int n = 1;	-0.124939
-0.559323	DTRUE: d = 1;	-0.124939
-0.514629	i, f = 1;	-0.124939
-0.364610	for (r = 1;	-0.425969
-0.140748	0; list[i+1] = 1;	-0.124939
-0.140748	=0; list[i+1] = 1;	-0.124939
-0.350297	a[2]; a[0] = 1;	-0.124939
-0.177436	return a - 1;	-0.425969
-0.686480	= a + 1;	-0.124939
-0.041177	return a + 1;	-0.522879
-0.662515	= b + 1;	-0.124939
-0.149503	* b + 1;	-0.124939
-0.315520	return x*x + 1;	-0.124939
-0.337691	int one : 1;	-0.124939
-0.337691	int sign : 1;	-0.124939
-0.669093	{ cout << 1;	-0.425969
-0.709092	= a ^ 1;	-0.124939
-0.237918	x; n >>= 1;	-0.124939
-0.356661	at run time. Therefore,	-0.124939
-0.352877	contains writeable data. Therefore,	-0.124939
-0.858148	higher instruction set. Therefore,	-0.124939
-0.519178	constructors are called. Therefore,	-0.124939
-0.450016	reference to it. Therefore,	-0.124939
-0.879879	in 64-bit mode. Therefore,	-0.124939
-0.795416	for internal references. Therefore,	-0.124939
-0.510419	pointer points to. Therefore,	-0.124939
-0.629692	can be critical. Therefore,	-0.124939
-0.341684	many different applications. Therefore,	-0.124939
-0.339263	transferring additional parameters. Therefore,	-0.124939
-0.339300	any other number. Therefore,	-0.124939
-0.229141	are time consuming. Therefore,	-0.124939
-0.229141	very time consuming. Therefore,	-0.124939
-0.607039	floating point numbers. Therefore,	-0.124939
-0.331707	self- relative addresses. Therefore,	-0.124939
-0.458506	throws an exception. Therefore,	-0.124939
-0.683387	MemberPointer is declared. Therefore,	-0.124939
-0.212194	way or another. Therefore,	-0.124939
-0.212194	thread than another. Therefore,	-0.124939
-0.314572	of type int. Therefore,	-0.124939
-0.314654	function library. 78 Therefore,	-0.124939
-0.538595	it was programmed. Therefore,	-0.124939
-0.294061	with different strides. Therefore,	-0.124939
-0.294061	scope or namespaces. Therefore,	-0.124939
-0.294061	power than PCs. Therefore,	-0.124939
-0.237772	has been calculated. Therefore,	-0.124939
-0.598557	compiler on the Mac	-0.124939
-0.568827	bit Windows and Mac	-0.124939
-0.417701	In Linux and Mac	-0.124939
-0.296877	Windows, Linux and Mac	-0.124939
-0.015446	Linux, BSD and Mac	-0.249877
-0.854323	shared objects in Mac	-0.124939
-0.462971	been tested in Mac	-0.124939
-0.855186	Gnu compiler for Mac	-0.124939
-0.358661	Linux, BSD or Mac	-0.124939
-0.504562	only run on Mac	-0.124939
-0.357581	internal references. 64-bit Mac	-0.124939
-0.567016	addresses in 32-bit Mac	-0.124939
-0.437419	compiling for 32-bit Mac	-0.124939
-0.437419	Compilers for 32-bit Mac	-0.124939
-0.603011	-fno-builtin Gnu 32-bit Mac	-0.124939
-0.329566	in Linux. 32-bit Mac	-0.124939
-0.353383	in Unix-like systems. Mac	-0.124939
-0.515622	systems Windows, Linux, Mac	-0.124939
-0.408079	Linux and perhaps Mac	-0.124939
-0.065793	systems. The Intel-based Mac	-0.124939
-0.065793	well as Intel-based Mac	-0.124939
-0.065793	Linux, BSD, Intel-based Mac	-0.124939
-0.538790	up to date. Mac	-0.124939
-0.600460	latency of the multiplication	-0.124939
-0.597293	advance and the multiplication	-0.124939
-1.075269	occur in the multiplication	-0.124939
-0.196687	2 then the multiplication	-0.124939
-0.569381	integers, while the multiplication	-0.124939
-0.569064	may avoid the multiplication	-0.124939
-1.802676	to make a multiplication	-0.124939
-0.540709	not require a multiplication	-0.124939
-0.461762	point addition and multiplication	-0.124939
-0.027947	addition, subtraction and multiplication	-0.124939
-0.541233	may occur in multiplication	-0.124939
-0.588072	abs(v.f) } The multiplication	-0.124939
-0.526179	matrix element. The multiplication	-0.124939
-0.593867	// used for multiplication	-0.124939
-0.358403	some cases this multiplication	-0.124939
-1.597094	a floating point multiplication	-0.124939
-1.030698	two floating point multiplication	-0.124939
-0.457332	as 32-bit integer multiplication	-0.124939
-0.353966	often replace integer multiplication	-0.124939
-0.388159	longer time. Integer multiplication	-0.124939
-0.298552	Integer multiplication Integer multiplication	-0.124939
-0.124111	performance. 14.4 Integer multiplication	-0.124939
-0.124111	135 14.4 Integer multiplication	-0.124939
-0.343716	the code involves multiplication	-0.124939
-1.952680	version of the application	-0.124939
-0.596757	job of the application	-0.124939
-0.596265	API and the application	-0.124939
-0.599400	operation in the application	-0.124939
-0.589407	thread if the application	-0.124939
-0.589407	integers if the application	-0.124939
-0.595516	not by the application	-0.124939
-1.337580	bigger than the application	-0.124939
-0.591720	separated from the application	-0.124939
-0.594461	library. If the application	-0.124939
-0.588674	takes before the application	-0.124939
-0.357282	and market the application	-0.124939
-0.526159	PC processors. The application	-0.124939
-0.462663	the library. The application	-0.124939
-0.577267	scheduling in an application	-0.124939
-0.501905	Porting such an application	-0.124939
-1.043284	of the first application	-0.124939
-0.356047	Network access Some application	-0.124939
-0.355770	operating systems"). An application	-0.124939
-1.127235	that a particular application	-0.124939
-0.455240	a heavy graphics application	-0.124939
-0.454867	necessary for your application	-0.124939
-0.495338	Installing a second application	-0.124939
-0.966244	of the final application	-0.124939
-0.726638	in a typical application	-0.124939
-0.294163	database integration, web application	-0.124939
-0.294163	(WTL). A WTL application	-0.124939
-0.342340	lrintf (float const x)	-0.124939
-0.063658	lrint (double const x)	-0.425969
-0.037070	__m128i const & x)	-0.726999
-0.172151	(Vec4f const & x)	-0.124939
-0.172151	add_elements(__m128 const & x)	-0.124939
-0.541828	float SomeFunction (int x)	-0.124939
-0.335564	int MultiplyBy (int x)	-0.124939
-0.017301	float parabola (float x)	-0.602060
-0.002871	static double p(double x)	-0.726999
-0.011602	} double xpow10(double x)	-0.124939
-0.005762	10 double xpow10(double x)	-0.425969
-0.011602	unrolled double xpow10(double x)	-0.124939
-0.325377	double IntegerPower (double x)	-0.124939
-0.048391	16 float Exp(float x)	-0.124939
-0.048391	series float Exp(float x)	-0.124939
-0.237894	module1.cpp int Func1(int x)	-0.124939
-0.237894	; double Func2(double x)	-0.124939
-0.527220	The space is automatically	-0.124939
-1.434043	are able to automatically	-0.124939
-0.562800	A compiler that automatically	-0.124939
-0.777296	the compiler can automatically	-0.124939
-0.842606	PathScale compilers can automatically	-0.124939
-0.590621	vectorize the code automatically	-0.124939
-0.516870	optimizing compilers will automatically	-0.124939
-0.982218	Most compilers will automatically	-0.124939
-0.503973	processors prefetch data automatically	-0.124939
-1.041953	unroll a loop automatically	-0.124939
-0.724517	The program should automatically	-0.124939
-0.655428	do this optimization automatically	-0.124939
-1.007032	use vector operations automatically	-0.124939
-0.459672	large static arrays automatically	-0.124939
-0.553395	style that doesn't automatically	-0.124939
-0.353878	Many software programs automatically	-0.124939
-0.494759	because it goes automatically	-0.124939
-0.348962	are often inlined automatically	-0.124939
-0.444224	insert nontemporal writes automatically	-0.124939
-0.339294	update, or update automatically	-0.124939
-0.294089	14.14a with 14.14b automatically	-0.124939
-0.237796	12.8a to 12.8b automatically	-0.124939
-0.237796	Documentation License shall automatically	-0.124939
-0.237796	(see page 73) automatically	-0.124939
-0.572949	critical code to see	-0.124939
-0.354831	look at to see	-0.124939
-0.651711	C++ compilers to see	-0.124939
-1.273556	it possible to see	-0.124939
-0.837208	also possible to see	-0.124939
-0.499294	virtual table to see	-0.124939
-1.729744	you want to see	-0.124939
-0.354831	final result to see	-0.124939
-0.872642	not able to see	-0.124939
-0.794547	therefore fail to see	-0.124939
-0.354831	compiler generates to see	-0.124939
-0.458431	some measurements to see	-0.124939
-0.354831	output listing to see	-0.124939
-0.541092	different libraries and see	-0.124939
-1.252804	the compiler can see	-0.124939
-1.006029	optimizing compiler can see	-0.124939
-0.356821	the user can see	-0.124939
-0.586346	code that you see	-0.124939
-0.857565	optimizing compiler will see	-0.124939
-0.563100	as you will see	-0.124939
-0.525794	we may also see	-0.124939
-0.349726	position- independent code, see	-0.124939
-0.429488	has many features, see	-0.124939
-0.408103	for vector operations, see	-0.124939
-0.237886	on this topic, see	-0.124939
-0.237886	for XMM registers; see	-0.124939
-0.598511	sufficient, and the caching	-0.124939
-0.504886	become fragmented and caching	-0.124939
-0.659616	so big that caching	-0.124939
-0.354072	degradation in code caching	-0.124939
-0.500528	or when code caching	-0.124939
-0.354072	situations where code caching	-0.124939
-0.354072	This makes code caching	-0.124939
-0.565838	makes the data caching	-0.124939
-0.539676	caching and data caching	-0.124939
-0.004168	This makes data caching	-0.492916
-0.030055	which makes data caching	-0.124939
-0.358093	memory addresses. If caching	-0.124939
-1.143712	there are no caching	-0.124939
-0.064596	the code makes caching	-0.124939
-0.356543	storing data without caching	-0.124939
-0.573352	memory can cause caching	-0.124939
-0.331855	page 87). Data caching	-0.124939
-0.294173	RAM memory. Efficient caching	-0.124939
-0.294173	are eliminated. Code caching	-0.124939
-1.384335	the code that allows	-0.124939
-0.718178	operating systems that allows	-0.124939
-0.499836	a language that allows	-0.124939
-0.499836	an option that allows	-0.124939
-0.652480	a container that allows	-0.124939
-0.499836	interposition feature that allows	-0.124939
-1.467139	is that it allows	-0.124939
-0.572052	consequence that it allows	-0.124939
-0.353596	be pure. This allows	-0.124939
-0.353596	previous iteration. This allows	-0.124939
-0.353596	is "undefined". This allows	-0.124939
-0.353596	F1() throw(); This allows	-0.124939
-1.414745	The Intel compiler allows	-0.124939
-1.295367	The Gnu compiler allows	-0.124939
-0.358054	Linux: -ffunction-sections) which allows	-0.124939
-0.357527	instruction set also allows	-0.124939
-0.403229	efficient. 64-bit Windows allows	-0.124939
-0.403229	different. 64-bit Windows allows	-0.124939
-0.355684	The D language allows	-0.124939
-0.750054	a const reference allows	-0.124939
-0.352888	the out-of-order mechanism allows	-0.124939
-0.092948	the program logic allows	-0.124939
-0.343649	format is standardized allows	-0.124939
-0.294163	exponent is biased allows	-0.124939
-0.294163	by assignment. shared_ptr allows	-0.124939
-0.659685	the number and sets	-0.124939
-0.358618	indexes, working with sets	-0.124939
-0.358313	CPU dispatcher then sets	-0.124939
-0.556959	on large data sets	-0.124939
-0.752378	of the instruction sets	-0.124939
-0.316079	used if instruction sets	-0.124939
-0.316079	only when instruction sets	-0.124939
-0.190299	for different instruction sets	-0.124939
-0.464941	on which instruction sets	-0.124939
-0.576114	and SSE2 instruction sets	-0.124939
-0.509669	as supported instruction sets	-0.124939
-0.316079	of various instruction sets	-0.124939
-0.316079	on what instruction sets	-0.124939
-0.316079	backwards compatible instruction sets	-0.124939
-0.409803	The newer instruction sets	-0.124939
-0.446152	The newest instruction sets	-0.124939
-0.495379	with CISC instruction sets	-0.124939
-0.526117	The above example sets	-0.124939
-0.346455	of the 32 sets	-0.124939
-0.346455	organized as 32 sets	-0.124939
-0.453878	array. The constructor sets	-0.124939
-0.343739	Table 13.1. Instruction sets	-0.124939
-0.473655	The initialization routine sets	-0.124939
-0.237877	library function __intel_cpu_features_init() sets	-0.124939
-0.237877	brands and similarly sets	-0.124939
-1.440267	result of the expression	-0.124939
-0.898892	example, in the expression	-0.124939
-0.896354	possible if the expression	-0.124939
-0.593409	integers, then the expression	-0.124939
-0.594950	operands because the expression	-0.124939
-0.462604	unchanged, while the expression	-0.124939
-0.462604	both, while the expression	-0.124939
-0.556068	may change the expression	-0.124939
-0.565157	the time. The expression	-0.124939
-0.357530	of efficiency. The expression	-0.124939
-0.357530	inverted mask. The expression	-0.124939
-0.358565	+ d; This expression	-0.124939
-0.346922	b is an expression	-0.425969
-0.978813	to be an expression	-0.124939
-0.571809	reduce the integer expression	-0.124939
-0.524916	avoid that some expression	-0.124939
-0.331288	Induction variables An expression	-0.124939
-0.331288	constant propagation An expression	-0.124939
-0.331288	same thing. An expression	-0.124939
-0.535659	on the intermediate expression	-0.124939
-0.352865	expect the && expression	-0.124939
-0.347402	loop counter. Any expression	-0.124939
-0.331855	cases. The equivalent expression	-0.124939
-0.607421	is a loop-invariant expression	-0.124939
-0.237869	because the non-reduced expression	-0.124939
-0.358894	This behaviour is implementation	-0.124939
-0.659165	a particular code implementation	-0.124939
-1.003160	} } This implementation	-0.124939
-0.582561	where a vector implementation	-0.124939
-0.583723	uses a different implementation	-0.124939
-0.503770	information about which implementation	-0.124939
-0.357513	117 A C++ implementation	-0.124939
-0.461754	the simplest possible implementation	-0.124939
-0.464140	use the software implementation	-0.124939
-0.464140	But the software implementation	-0.124939
-0.432398	using a software implementation	-0.124939
-0.432398	However, a software implementation	-0.124939
-0.355745	C99 standard. An implementation	-0.124939
-0.581085	select the best implementation	-0.124939
-0.824612	has a good implementation	-0.124939
-0.533080	zero. A good implementation	-0.124939
-0.142878	than the hardware implementation	-0.425969
-0.530958	than a hardware implementation	-0.124939
-0.442085	make a complicated implementation	-0.124939
-0.506619	much more complicated implementation	-0.124939
-0.313061	the most complicated implementation	-0.124939
-0.345090	functions. A metaprogramming implementation	-0.124939
-0.343591	infinity. A typical implementation	-0.124939
-0.336170	optimization. A mixed implementation	-0.124939
-0.331889	seconds. A safer implementation	-0.124939
-0.356897	of procedure 4 Most	-0.124939
-0.356674	binary executable code. Most	-0.124939
-0.716241	in static memory. Most	-0.124939
-0.353317	memory or cache. Most	-0.124939
-0.567213	pointers less efficient. Most	-0.124939
-0.646458	user interface framework Most	-0.124939
-0.494677	times. Thread-local storage Most	-0.124939
-0.350674	CPU. Algebraic reductions Most	-0.124939
-0.816983	outside the loop. Most	-0.124939
-0.348906	with limited resources. Most	-0.124939
-0.638632	16.3 Worst-case testing Most	-0.124939
-0.638715	a simple variable. Most	-0.124939
-0.632460	pointers and references. Most	-0.124939
-0.768097	these instruction sets. Most	-0.124939
-0.902207	floating point expressions. Most	-0.124939
-0.339291	to low-level optimizations. Most	-0.124939
-0.331697	debug and maintain. Most	-0.124939
-0.429336	call. Algebraic reduction Most	-0.124939
-0.575264	in chapter 12. Most	-0.124939
-0.314562	options turned on. Most	-0.124939
-0.294052	different C++ constructs Most	-0.124939
-0.294052	of the executable. Most	-0.124939
-0.382634	decryption, data compression Most	-0.124939
-0.294052	hardware is updated. Most	-0.124939
-0.237763	a, sizeof(b)); 47 Most	-0.124939
-0.237763	more heuristic guidelines. Most	-0.124939
-1.456058	rather than the complicated	-0.124939
-0.598881	reduction is a complicated	-0.124939
-0.880918	and make a complicated	-0.124939
-0.569549	justify such a complicated	-0.124939
-0.851599	Another disadvantage of complicated	-0.124939
-0.358828	an advanced and complicated	-0.124939
-0.583778	dispatcher based on complicated	-0.124939
-0.562253	to use this complicated	-0.124939
-0.432727	not the more complicated	-0.124939
-1.149976	it is more complicated	-0.124939
-0.545667	calculation is more complicated	-0.124939
-0.504512	literature for more complicated	-0.124939
-0.954253	the code more complicated	-0.124939
-0.513273	("hidden")))". A more complicated	-0.124939
-0.671703	to do more complicated	-0.124939
-0.334457	array elements more complicated	-0.124939
-0.542767	a much more complicated	-0.124939
-0.612237	a little more complicated	-0.124939
-0.334457	is somewhat more complicated	-0.124939
-0.593072	use the most complicated	-0.124939
-0.568146	programming is so complicated	-0.124939
-0.355598	is needed. These complicated	-0.124939
-0.353388	less expensive. Using complicated	-0.124939
-0.785776	compiler to reduce complicated	-0.124939
-0.314678	instruction set. More complicated	-0.124939
-0.444264	method is extremely complicated	-0.124939
-0.577092	C++ way of handling	-0.124939
-0.562620	possible ways of handling	-0.124939
-0.358835	used twice for handling	-0.124939
-0.061125	Exceptions and error handling	-0.124939
-0.323449	such as error handling	-0.124939
-0.591600	your own error handling	-0.124939
-0.402832	for the exception handling	-0.124939
-0.283563	off the exception handling	-0.124939
-0.176500	cost of exception handling	-0.124939
-0.176500	alternatives to exception handling	-0.124939
-0.176500	support for exception handling	-0.124939
-0.176500	think that exception handling	-0.124939
-0.176500	to use exception handling	-0.124939
-0.176500	compilers. If exception handling	-0.124939
-0.176500	of using exception handling	-0.124939
-0.176500	The C++ exception handling	-0.124939
-0.176500	its possible exception handling	-0.124939
-0.176500	-ipo No exception handling	-0.124939
-0.176500	possibly save exception handling	-0.124939
-0.176500	reason why exception handling	-0.124939
-0.242385	on structured exception handling	-0.124939
-0.037863	can disable exception handling	-0.425969
-0.077816	error handling Exception handling	-0.124939
-0.077816	exception handling Exception handling	-0.124939
-1.194744	an intermediate code like	-0.124939
-0.560520	standard library functions like	-0.124939
-0.354814	more complicated functions like	-0.124939
-0.356259	In difficult cases like	-0.124939
-0.355311	strings in classes like	-0.124939
-1.537462	can be implemented like	-0.124939
-0.458278	but who would like	-0.124939
-0.455542	programmers write expressions like	-0.124939
-0.280163	compiler can look like	-0.124939
-0.207207	implementation may look like	-0.124939
-0.207207	setup may look like	-0.124939
-0.280163	may typically look like	-0.124939
-0.319695	operators for things like	-0.124939
-0.319695	to simple things like	-0.124939
-0.348281	for simple tasks like	-0.124939
-0.345090	to add statements like	-0.124939
-0.921493	useful in situations like	-0.124939
-0.325363	is also treated like	-0.124939
-0.325271	Using complicated techniques like	-0.124939
-0.077792	class that behaves like	-0.124939
-0.077792	object that behaves like	-0.124939
-0.065784	factorial function looks like	-0.124939
-0.065784	optimized code looks like	-0.124939
-0.065784	vector classes looks like	-0.124939
-0.237829	function is expanded like	-0.124939
-0.237829	to simple actions like	-0.124939
-0.358950	and splitting the dependency	-0.124939
-1.715208	This is a dependency	-0.124939
-0.463202	to break a dependency	-0.124939
-0.940636	The effect of dependency	-0.124939
-0.358288	dependency chains. A dependency	-0.124939
-0.461873	gain if such dependency	-0.124939
-0.451100	has a long dependency	-0.124939
-0.451100	forms a long dependency	-0.124939
-0.310536	misprediction, or long dependency	-0.124939
-0.310536	sub-vector. A long dependency	-0.124939
-0.310536	are no long dependency	-0.124939
-0.059337	to avoid long dependency	-0.425969
-0.310536	22. Avoid long dependency	-0.124939
-0.494230	of a critical dependency	-0.124939
-0.351194	makes a critical dependency	-0.124939
-0.352324	and Z. Each dependency	-0.124939
-0.349729	dependency chain. Such dependency	-0.124939
-0.341810	to break down dependency	-0.124939
-0.038269	called a loop-carried dependency	-0.124939
-0.038269	is no loop-carried dependency	-0.124939
-0.038269	has two loop-carried dependency	-0.124939
-0.038269	are: No loop-carried dependency	-0.124939
-0.038269	chains, especially loop-carried dependency	-0.124939
-0.314732	longer loop- carried dependency	-0.124939
-0.314732	of order. Long dependency	-0.124939
-0.960208	to write the members	-0.124939
-0.726489	class containing the members	-0.124939
-1.351598	Variables that are members	-0.124939
-1.374058	if they are members	-0.124939
-0.504777	or class with members	-0.124939
-0.489440	(properties) The data members	-0.124939
-0.489109	contain all data members	-0.124939
-0.324211	often used data members	-0.124939
-0.419918	parent class data members	-0.124939
-0.324211	allowing two data members	-0.124939
-0.324211	structure where data members	-0.124939
-0.324211	but its data members	-0.124939
-0.324211	will align data members	-0.124939
-0.061229	any non-static data members	-0.425969
-0.061229	7.18 Class data members	-0.425969
-0.324211	that accesses data members	-0.124939
-0.954308	most often used members	-0.124939
-0.462329	structure and class members	-0.124939
-0.357368	the saved variable members	-0.124939
-0.559563	each of its members	-0.124939
-0.535811	putting the smallest members	-0.124939
-0.237904	the class. Data members	-0.124939
-0.237904	data together. Data members	-0.124939
-0.294191	one instance. Non-static members	-0.124939
-0.579730	preferred because of their	-0.124939
-0.179673	spend most of their	-0.124939
-0.586624	new versions of their	-0.124939
-0.463581	are inferior to their	-0.124939
-0.358812	programming languages and their	-0.124939
-0.358667	one variable if their	-0.124939
-0.458303	the program by their	-0.124939
-0.495180	are replaced by their	-0.124939
-0.495180	parameters replaced by their	-0.124939
-0.522686	be identified by their	-0.124939
-1.258931	as long as their	-0.124939
-0.562321	objects even when their	-0.124939
-0.358230	equivalent reductions at their	-0.124939
-0.358186	programmers rarely program their	-0.124939
-1.285856	want to make their	-0.124939
-0.710198	and b because their	-0.124939
-0.064884	same register because their	-0.425969
-0.456856	objects with each their	-0.124939
-0.353591	threads have each their	-0.124939
-0.356590	separately and test their	-0.124939
-0.533830	CPUs can change their	-0.124939
-0.449103	tools that fit their	-0.124939
-0.545546	fail to keep their	-0.124939
-0.463651	reasons before leaving their	-0.124939
-0.523395	is. The type __m128i	-0.124939
-0.561158	for each element __m128i	-0.124939
-0.368676	b and c __m128i	-0.425969
-0.060348	in vector c __m128i	-0.425969
-0.344458	array static inline __m128i	-0.425969
-0.011043	StoreVector(void * d, __m128i	-0.602060
-0.034009	StoreVectorA(void * d, __m128i	-0.124939
-0.044147	generate a bit-mask: __m128i	-0.425969
-0.231206	into vector c: __m128i	-0.425969
-0.231206	into vector b: __m128i	-0.425969
-0.314745	two AND operations: __m128i	-0.124939
-0.137151	vector of (0,0,0,0,0,0,0,0) __m128i	-0.425969
-0.137151	vector of (2,2,2,2,2,2,2,2) __m128i	-0.425969
-0.886047	F2(b); } } Using	-0.124939
-0.459835	to a function. Using	-0.124939
-0.819352	the level-2 cache. Using	-0.124939
-0.564232	vectorized table lookup Using	-0.124939
-0.538805	powers of 2. Using	-0.124939
-0.638801	a simple variable. Using	-0.124939
-0.510354	use single precision. Using	-0.124939
-0.279358	memory access. 12 Using	-0.124939
-0.279358	................................................................................................. 103 12 Using	-0.124939
-0.212223	................................... 141 14.9 Using	-0.124939
-0.212223	(double)(signed int)u; 14.9 Using	-0.124939
-0.325302	are less expensive. Using	-0.124939
-0.421215	sake of efficiency. Using	-0.124939
-0.212223	speed.............................................................................................................. 153 16.1 Using	-0.124939
-0.212223	(see below) 16.1 Using	-0.124939
-0.294108	at least temporarily. Using	-0.124939
-0.538676	(see page 105). Using	-0.124939
-0.102828	......................................................................................... 107 12.4 Using	-0.124939
-0.102828	vector division. 12.4 Using	-0.124939
-0.102828	........................................................................................ 109 12.5 Using	-0.124939
-0.102828	next section. 12.5 Using	-0.124939
-0.237812	in this chapter. Using	-0.124939
-0.237812	at page 150. Using	-0.124939
-0.237812	things in parallel: Using	-0.124939
-0.237812	in chapter 11. Using	-0.124939
-1.279044	order of the Boolean	-0.124939
-1.057195	instead of the Boolean	-0.124939
-0.594607	operands of the Boolean	-0.124939
-1.404865	faster than the Boolean	-0.124939
-1.007489	difference between the Boolean	-0.124939
-0.890292	used as a Boolean	-0.124939
-0.591303	can make a Boolean	-0.124939
-1.048088	the order of Boolean	-0.124939
-0.659245	The order of Boolean	-0.124939
-1.144658	the case of Boolean	-0.124939
-0.358149	page 43). The Boolean	-0.124939
-0.358149	|, ~. The Boolean	-0.124939
-1.352188	are useful for Boolean	-0.124939
-0.504554	makes operations with Boolean	-0.124939
-0.358550	using integers as Boolean	-0.124939
-0.596423	expressions rather than Boolean	-0.124939
-0.850148	operators that have Boolean	-0.124939
-0.571012	programs with many Boolean	-0.124939
-0.451902	operators that produce Boolean	-0.124939
-0.339371	by xx-xx--x- reciprocal Boolean	-0.124939
-0.487865	many branch mispredictions. Boolean	-0.124939
-0.325325	1 for true. Boolean	-0.124939
-0.538725	variables are overdetermined Boolean	-0.124939
-0.294135	handle is invalid. Boolean	-0.124939
-0.237837	- - 76 Boolean	-0.124939
-1.049918	be in the cache.	-0.124939
-0.883363	not in the cache.	-0.124939
-0.883363	set in the cache.	-0.124939
-0.358500	actively invalidate the cache.	-0.124939
-0.726186	in memory or cache.	-0.124939
-1.145790	in the code cache.	-0.425969
-0.778070	of the data cache.	-0.124939
-0.602006	and the data cache.	-0.124939
-0.162717	in the data cache.	-0.124939
-0.425933	into the data cache.	-0.124939
-0.425933	manipulate the data cache.	-0.124939
-1.532992	share the same cache.	-0.124939
-0.354594	the level- 1 cache.	-0.124939
-0.360884	and the level-2 cache.	-0.124939
-0.392208	in the level-2 cache.	-0.124939
-0.387276	in the level-1 cache.	-0.124939
-0.277418	the same level-1 cache.	-0.124939
-0.347433	to the disk cache.	-0.124939
-0.172658	of the micro-op cache.	-0.124939
-0.172658	cache or micro-op cache.	-0.124939
-0.237886	be a level-3 cache.	-0.124939
-0.358839	zero flag and don't	-0.124939
-0.504949	local data that don't	-0.124939
-0.517754	is that you don't	-0.124939
-0.748214	so that you don't	-0.124939
-0.540895	line if you don't	-0.124939
-0.540895	panic if you don't	-0.124939
-0.880716	long as you don't	-0.124939
-0.584155	object then you don't	-0.124939
-0.573434	memory. If you don't	-0.124939
-0.537958	strides. Therefore, you don't	-0.124939
-0.342442	smallest devices, you don't	-0.124939
-0.358052	final destination, but don't	-0.124939
-0.462303	where current compilers don't	-0.124939
-0.430662	range and we don't	-0.124939
-0.446141	so that we don't	-0.425969
-0.302784	cache so we don't	-0.124939
-0.302784	case so we don't	-0.124939
-0.569087	static if they don't	-0.124939
-0.344964	performance options. I don't	-0.124939
-0.344964	to a. I don't	-0.124939
-0.354470	people. I simply don't	-0.124939
-0.294182	everybody. So please don't	-0.124939
-0.294182	because the factorials don't	-0.124939
-0.237877	excuse that "we don't	-0.124939
-0.505007	level-2 cache of 256	-0.124939
-0.358644	repeated 1024/4 = 256	-0.124939
-0.358611	more (128 or 256	-0.124939
-0.358474	short int int 256	-0.124939
-0.526399	will take only 256	-0.124939
-0.357745	float 256 double 256	-0.124939
-0.357556	double 128 float 256	-0.124939
-0.351123	double 64 4 256	-0.124939
-0.351123	long 64 4 256	-0.124939
-0.353002	int 32 8 256	-0.124939
-0.353002	float 32 8 256	-0.124939
-0.491110	64 4 unsigned 256	-0.124939
-0.348947	int 256 unsigned 256	-0.124939
-0.243274	int 16 16 256	-0.124939
-0.243274	256 16 16 256	-0.124939
-0.501466	char 8 32 256	-0.124939
-0.355328	search instructions AVX 256	-0.124939
-0.352866	8 32 char 256	-0.124939
-0.352558	if (SIZE > 256	-0.124939
-0.351883	double vectors AVX2 256	-0.124939
-0.339304	256 int int64_t 256	-0.124939
-0.339304	set is available, 256	-0.124939
-0.325242	int64_t 256 uint64_t 256	-0.124939
-0.237804	128 bits (XMM), 256	-0.124939
-0.237804	short int 832 256	-0.124939
-1.408494	faster than the intrinsic	-0.124939
-0.527162	by calling the intrinsic	-0.124939
-1.530318	the use of intrinsic	-0.124939
-0.358777	two double. The intrinsic	-0.124939
-0.712483	have support for intrinsic	-0.124939
-0.496383	some support for intrinsic	-0.124939
-0.460953	Header files for intrinsic	-0.124939
-0.356817	// header for intrinsic	-0.124939
-0.889050	time. There are intrinsic	-0.124939
-0.526786	be implemented with intrinsic	-0.124939
-0.597110	choose to use intrinsic	-0.124939
-0.584196	hundreds of different intrinsic	-0.124939
-0.358028	you had used intrinsic	-0.124939
-0.803432	sense that each intrinsic	-0.124939
-0.591475	result by using intrinsic	-0.124939
-0.356412	// Define SSE2 intrinsic	-0.124939
-0.355710	assembly language Use intrinsic	-0.124939
-0.535165	compilers that support intrinsic	-0.124939
-0.311364	table lookup Using intrinsic	-0.124939
-0.128356	107 12.4 Using intrinsic	-0.124939
-0.128356	division. 12.4 Using intrinsic	-0.124939
-0.515227	PGI compiler supports intrinsic	-0.124939
-0.788310	using the so-called intrinsic	-0.124939
-0.237820	compilers allow assembly-like intrinsic	-0.124939
-0.237820	use the _mm_clflush intrinsic	-0.124939
-0.597332	investigated by the methods	-0.124939
-0.540998	temporarily. Using the methods	-0.124939
-0.358110	operator These different methods	-0.124939
-0.786483	faster than other methods	-0.124939
-0.503760	two commonly used methods	-0.124939
-0.357521	should use such methods	-0.124939
-0.460833	the various optimization methods	-0.124939
-0.792190	any of these methods	-0.124939
-0.692436	All of these methods	-0.124939
-0.692475	Note that these methods	-0.124939
-0.329162	that use these methods	-0.124939
-0.356529	are more useful methods	-0.124939
-1.002310	of the following methods	-0.124939
-0.582095	branches. The following methods	-0.124939
-0.342565	of memory. These methods	-0.124939
-0.442919	of CPU. These methods	-0.124939
-0.575436	of the above methods	-0.124939
-0.353325	Since most development methods	-0.124939
-1.402257	There are various methods	-0.124939
-0.454301	of the storage methods	-0.124939
-0.446115	blocking and similar methods	-0.124939
-0.237820	have inefficient code-based methods	-0.124939
-0.237820	code. These workaround methods	-0.124939
-0.237820	have efficient table-based methods	-0.124939
-0.237820	works and suggests methods	-0.124939
-0.587596	overflow of a signed	-0.124939
-0.874628	Conversion of a signed	-0.124939
-1.062498	integer to a signed	-0.124939
-0.885008	the behavior of signed	-0.124939
-0.816747	convert it to signed	-0.124939
-0.763224	unsigned integers to signed	-0.124939
-0.462714	the conversion to signed	-0.124939
-0.659593	the assumption that signed	-0.124939
-0.600563	they can be signed	-0.124939
-0.358596	is faster with signed	-0.124939
-0.463204	behaves differently on signed	-0.124939
-1.330465	less efficient than signed	-0.124939
-1.672210	is faster than signed	-0.124939
-0.357810	speed between using signed	-0.124939
-0.350185	The conversion between signed	-0.124939
-0.539486	... Conversions between signed	-0.124939
-0.522578	2.5; // Use signed	-0.124939
-0.539587	not to mix signed	-0.124939
-0.341757	4 64-bit integer, signed	-0.124939
-0.336180	result of comparing signed	-0.124939
-0.237865	2 2 int, signed	-0.124939
-0.314683	1 short int, signed	-0.124939
-0.044147	as an 8-bit signed	-0.124939
-0.294135	1 1 char, signed	-0.124939
-0.599547	likely is a model	-0.124939
-0.461167	brand name and model	-0.124939
-0.525176	brand names and model	-0.124939
-0.089810	CPU family and model	-0.124939
-0.089810	its family and model	-0.124939
-0.089810	brand, family and model	-0.124939
-0.514624	you assume that model	-0.124939
-0.742906	cannot assume that model	-0.124939
-0.358667	unknown brand or model	-0.124939
-0.358295	is inferior. A model	-0.124939
-0.572760	for the memory model	-0.124939
-0.246520	a large memory model	-0.124939
-0.246520	This large memory model	-0.124939
-0.449831	and each CPU model	-0.124939
-1.046772	a specific CPU model	-0.124939
-0.348043	a known CPU model	-0.124939
-0.638364	a particular CPU model	-0.124939
-0.356584	the next new model	-0.124939
-0.131025	know that processor model	-0.124939
-0.131025	Assuming that processor model	-0.124939
-0.319552	for each processor model	-0.124939
-0.319552	the next processor model	-0.124939
-0.575039	make the next model	-0.124939
-0.441957	given a false model	-0.124939
-0.237877	-fp-model fast, -fp- model	-0.124939
-0.504826	about how the development	-0.124939
-0.527085	performance during the development	-0.124939
-0.358885	and ease of development	-0.124939
-0.574798	with compilers and development	-0.124939
-0.913726	programming language and development	-0.124939
-0.462316	efficiency, portability and development	-0.124939
-0.503934	work automatically. The development	-0.124939
-0.503934	MFC application. The development	-0.124939
-0.358191	it makes program development	-0.124939
-0.357854	25 Since most development	-0.124939
-0.524888	true that some development	-0.124939
-0.642422	of the software development	-0.124939
-0.452456	view the software development	-0.124939
-0.323735	about which software development	-0.124939
-0.323735	redesign. Some software development	-0.124939
-0.323735	Microsoft platform software development	-0.124939
-0.323735	of structured software development	-0.124939
-0.656360	a compromise between development	-0.124939
-0.355459	availability of good development	-0.124939
-0.494199	lack of advanced development	-0.124939
-0.444309	fast and easy development	-0.124939
-0.336218	availability of powerful development	-0.124939
-0.331798	tools. One popular development	-0.124939
-0.294145	or network. Various development	-0.124939
-0.294145	Windows. The integrated development	-0.124939
-0.358935	closely follows the mathematical	-0.124939
-0.065714	for reasons of mathematical	-0.425969
-0.358208	produce tables of mathematical	-0.124939
-0.357410	sound processing, and mathematical	-0.124939
-0.357410	sorting, searching, and mathematical	-0.124939
-0.065612	square root and mathematical	-0.425969
-0.504929	loop is in mathematical	-0.124939
-0.557467	intrinsic instructions for mathematical	-0.124939
-0.358623	memmove, memset, or mathematical	-0.124939
-0.463195	focus is on mathematical	-0.124939
-0.870001	thread can do mathematical	-0.124939
-0.356529	contains many useful mathematical	-0.124939
-0.581209	more information about mathematical	-0.124939
-0.342556	functions for common mathematical	-0.124939
-0.998961	The most common mathematical	-0.124939
-0.355233	library contains optimized mathematical	-0.124939
-0.354675	innermost loop doing mathematical	-0.124939
-0.565245	for more complicated mathematical	-0.124939
-0.494157	lot of advanced mathematical	-0.124939
-0.539558	and to mix mathematical	-0.124939
-0.341739	such as heavy mathematical	-0.124939
-0.429358	libraries for computing mathematical	-0.124939
-0.237820	useful for vectorizing mathematical	-0.124939
-0.598901	libraries it is never	-0.124939
-1.675498	the function is never	-0.124939
-0.576235	dispatched function is never	-0.124939
-1.714072	the program is never	-0.124939
-0.318831	a variable is never	-0.124939
-0.569170	members that are never	-0.124939
-0.569170	constants that are never	-0.124939
-0.866250	the functions are never	-0.124939
-1.360751	if they are never	-0.124939
-0.357787	that i can never	-0.124939
-0.579225	compiler. We can never	-0.124939
-0.456183	that a will never	-0.124939
-0.559820	so you will never	-0.124939
-0.353059	that F1 will never	-0.124939
-0.484661	a function should never	-0.124939
-0.306042	thread-safe function should never	-0.124939
-0.513197	buffer. It should never	-0.124939
-0.346128	updating mechanism should never	-0.124939
-1.044797	if the user never	-0.124939
-0.459257	sure that overflow never	-0.124939
-0.354122	template feature was never	-0.124939
-0.824474	The memory space never	-0.124939
-0.512148	to user input never	-0.124939
-0.325341	a #define directive never	-0.124939
-0.882419	or in a separate	-0.124939
-0.882419	code in a separate	-0.124939
-0.526133	access in a separate	-0.124939
-0.348231	implemented in a separate	-0.425969
-0.526133	information in a separate	-0.124939
-0.762556	defined in a separate	-0.124939
-0.526133	something in a separate	-0.124939
-0.526133	placed in a separate	-0.124939
-0.526133	scheduled in a separate	-0.124939
-0.579216	library or a separate	-0.124939
-0.495608	calculations into a separate	-0.124939
-0.495608	task into a separate	-0.124939
-0.495608	isolated into a separate	-0.124939
-0.814478	than making a separate	-0.124939
-0.650518	have implemented a separate	-0.124939
-1.065995	excessive number of separate	-0.124939
-0.462981	network access in separate	-0.124939
-0.658832	be placed in separate	-0.124939
-0.596779	modified should be separate	-0.124939
-0.569880	efficient to have separate	-0.124939
-0.762724	You may make separate	-0.124939
-0.358154	often used functions separate	-0.124939
-0.462179	time-consuming tasks into separate	-0.124939
-0.356665	different threads need separate	-0.124939
-0.596474	often because the block	-0.124939
-0.527251	zero within a block	-0.124939
-0.463415	because they can block	-0.124939
-0.489897	of the memory block	-0.124939
-0.440014	of a memory block	-0.124939
-0.440014	when a memory block	-0.124939
-0.322595	but this memory block	-0.124939
-0.417904	allocates one memory block	-0.124939
-0.322595	destroys any memory block	-0.124939
-0.061008	a new memory block	-0.425969
-0.417904	one big memory block	-0.124939
-0.322595	its own memory block	-0.124939
-0.192456	new bigger memory block	-0.124939
-0.322595	the old memory block	-0.124939
-1.321911	of the data block	-0.124939
-0.459795	block. A large block	-0.124939
-0.500601	allocate one big block	-0.124939
-0.558154	allocate a small block	-0.124939
-0.559245	handle its own block	-0.124939
-1.038256	in the old block	-0.124939
-0.527309	thread can possibly block	-0.124939
-0.341834	is no try block	-0.124939
-0.596354	?Func@@YAXQAHAAH@Z is the name	-0.124939
-1.954383	is that the name	-0.124939
-0.889425	compilers use the name	-0.124939
-0.358493	for modifying the name	-0.124939
-0.569252	independent code. The name	-0.124939
-0.462663	library www.agner.org/optimize/asmlib.zip. The name	-0.124939
-1.180198	in the function name	-0.124939
-0.649587	// Define function name	-0.124939
-0.457069	name Intrinsic function name	-0.124939
-0.065144	; mangled function name	-0.124939
-0.583764	function a different name	-0.124939
-0.122022	has the same name	-0.301030
-0.579109	the child class name	-0.124939
-0.410534	correct child class name	-0.124939
-1.358550	This is called name	-0.124939
-0.500945	other than its name	-0.124939
-0.500950	The details about name	-0.124939
-0.495831	use the local name	-0.124939
-0.444264	have any brand name	-0.124939
-0.444264	just an arbitrary name	-0.124939
-0.294163	The funny looking name	-0.124939
-0.237861	instructions. Function Assembly name	-0.124939
-0.577609	performance between the systems.	-0.124939
-0.491074	of the 64-bit systems.	-0.124939
-0.736059	32-bit and 64-bit systems.	-0.124939
-0.150927	registers in 64-bit systems.	-0.124939
-0.384183	bits in 64-bit systems.	-0.124939
-0.384183	bytes in 64-bit systems.	-0.124939
-0.384183	fourteen in 64-bit systems.	-0.124939
-0.540798	sixteen in 64-bit systems.	-0.124939
-0.524916	IDE on some systems.	-0.124939
-0.587847	resource in 32-bit systems.	-0.124939
-0.966066	compilers and operating systems.	-0.124939
-0.648416	CPUs and operating systems.	-0.124939
-0.333467	to multiple operating systems.	-0.124939
-0.481253	in 64-bit operating systems.	-0.124939
-0.500055	available for Linux systems.	-0.124939
-0.457113	tested in Mac systems.	-0.124939
-0.352034	used on bigger systems.	-0.124939
-0.346412	mutexes and message systems.	-0.124939
-0.346412	applies to BSD systems.	-0.124939
-0.255876	in some embedded systems.	-0.124939
-0.336267	for small embedded systems.	-0.124939
-0.575494	objects in Unix-like systems.	-0.124939
-0.237869	OS and Itanium systems.	-0.124939
-0.592162	bottlenecks is to put	-0.124939
-0.553566	module, and to put	-0.124939
-0.460529	also useful to put	-0.124939
-0.460529	often useful to put	-0.124939
-0.667763	be advantageous to put	-0.124939
-1.509154	is recommended to put	-0.124939
-0.543051	therefore recommended to put	-0.124939
-0.355851	would like to put	-0.124939
-0.826700	good idea to put	-0.124939
-1.033347	the code and put	-0.124939
-0.357927	multiple threads and put	-0.124939
-0.462364	used functions, and put	-0.124939
-1.414313	have to be put	-0.124939
-0.864476	therefore preferably be put	-0.124939
-1.588871	then you may put	-0.124939
-0.562611	b*(2.0/3.0) unless you put	-0.124939
-0.358429	after they have put	-0.124939
-0.352942	the other then put	-0.124939
-0.352942	128 bytes then put	-0.124939
-0.352942	the other, then put	-0.124939
-0.337018	functions and simply put	-0.124939
-0.435941	values are simply put	-0.124939
-0.287609	stopping threads. Don't put	-0.124939
-0.287609	the opposite: Don't put	-0.124939
-1.296860	because of the needs	-0.124939
-0.463589	b && a needs	-0.124939
-0.521636	the function that needs	-0.124939
-0.521636	Any function that needs	-0.124939
-0.460732	User work that needs	-0.124939
-0.829045	a destructor that needs	-0.124939
-0.728842	calls and it needs	-0.124939
-0.506255	layers and it needs	-0.124939
-0.523599	time because it needs	-0.124939
-0.523599	library because it needs	-0.124939
-0.523599	simpler because it needs	-0.124939
-0.358557	+= 1.0f; This needs	-0.124939
-1.408198	that the compiler needs	-0.124939
-0.586813	optimization, the compiler needs	-0.124939
-0.877983	If a loop needs	-0.124939
-0.657512	a && b needs	-0.124939
-0.959546	the executable file needs	-0.124939
-0.356130	only one constant needs	-0.124939
-0.896599	a positive list needs	-0.124939
-0.828894	The code section needs	-0.124939
-0.450976	writable data section needs	-0.124939
-0.349028	the code still needs	-0.124939
-0.488983	where each iteration needs	-0.124939
-0.741920	the exception handler needs	-0.124939
-0.237845	such as ReadB needs	-0.124939
-0.089732	} z = y	-0.124939
-0.089732	cos(x); z = y	-0.124939
-0.089732	sin(x); z = y	-0.124939
-1.424502	} else { y	-0.124939
-0.527989	68 else { y	-0.124939
-0.624748	if (b) { y	-0.425969
-0.525588	n) { double y	-0.124939
-0.502621	bitofn // return y	-0.124939
-0.544246	load the structure y	-0.124939
-0.555559	then the expression y	-0.124939
-0.819422	b + c; y	-0.124939
-0.352596	= x > y	-0.124939
-0.351645	+ c; Here, y	-0.124939
-0.453314	{x = a; y	-0.124939
-0.349687	a : b) y	-0.124939
-0.040870	c, d, y; y	-0.425969
-0.193316	= 100, y; y	-0.124939
-0.193316	= 1.23456, y; y	-0.124939
-0.044142	a2, b1, b2; y	-0.425969
-0.237845	(n & 1) y	-0.124939
-0.237845	vector(x + a.x, y	-0.124939
-0.237845	we may write: y	-0.124939
-0.599140	check that the conversion	-0.124939
-0.599310	point if the conversion	-0.124939
-0.989007	this example, the conversion	-0.124939
-0.463084	enabled. Typically, the conversion	-0.124939
-0.461046	as well. The conversion	-0.124939
-0.461046	to integer. The conversion	-0.124939
-0.356890	positive result. The conversion	-0.124939
-0.356890	i<100; i++)a[i]=2*i; The conversion	-0.124939
-0.358563	(short int)i; This conversion	-0.124939
-0.358282	64-bit mode. A conversion	-0.124939
-0.358218	vectors. This data conversion	-0.124939
-0.539713	Float to integer conversion	-0.124939
-0.566870	expression. The size conversion	-0.124939
-0.353228	performance. Integer size conversion	-0.124939
-0.723682	Integer to float conversion	-0.124939
-0.357032	signed integers before conversion	-0.124939
-0.356952	Signed / unsigned conversion	-0.124939
-0.442050	that the type conversion	-0.124939
-0.442050	while the type conversion	-0.124939
-0.325380	rounding. Pointer type conversion	-0.124939
-0.325380	// Implicit type conversion	-0.124939
-0.441427	Floating point precision conversion	-0.124939
-0.341381	precision require precision conversion	-0.124939
-0.294163	and truncation. Efficient conversion	-0.124939
-0.237861	because the integer-to-float conversion	-0.124939
-1.586456	{ a = c;	-0.124939
-0.871634	int b; int c;	-0.124939
-0.823866	{ public: int c;	-0.124939
-0.354891	B2 b2; int c;	-0.124939
-0.357809	a, b; double c;	-0.124939
-0.268960	+ b + c;	-0.124939
-0.182505	: b * c;	-0.124939
-0.357262	+ 1; return c;	-0.124939
-0.983405	= b / c;	-0.124939
-0.185641	{ int b, c;	-0.124939
-0.086501	int a, b, c;	-0.346788
-0.234168	Vec16s a, b, c;	-0.124939
-0.234168	Vec8s a, b, c;	-0.124939
-0.802317	= b % c;	-0.124939
-0.005762	{ int r, c;	-0.425969
-0.011602	y=temp;} int r, c;	-0.124939
-0.011602	98 int r, c;	-0.124939
-0.107118	by means of #include	-0.124939
-1.295909	vector class library #include	-0.124939
-0.356400	Vectorized with SSE2 #include	-0.124939
-0.355846	the compiled versions #include	-0.124939
-0.573198	Agner vector classes #include	-0.124939
-0.860937	automatic CPU dispatching #include	-0.124939
-0.455509	in other compilers. #include	-0.124939
-0.349029	Taylor series, vectorized #include	-0.124939
-0.331783	// Example 16.2 #include	-0.124939
-0.325287	// Example 16.1 #include	-0.124939
-0.325287	// Example 9.3 #include	-0.124939
-0.314601	or x64 141 #include	-0.124939
-0.314673	// Example 9.6b. #include	-0.124939
-0.037153	file for InstructionSet() #include	-0.425969
-0.382680	16.2 #include <stdio.h> #include	-0.124939
-0.294089	} // Or #include	-0.124939
-0.237796	compilers. #include <excpt.h> #include	-0.124939
-0.237796	<excpt.h> #include <float.h> #include	-0.124939
-0.237796	denormals-are-zero mode (SSE2): #include	-0.124939
-0.237796	vector classes (Intel) #include	-0.124939
-0.237796	flush-to-zero mode (SSE): #include	-0.124939
-0.237796	vector classes 114 #include	-0.124939
-0.358952	in applying the various	-0.124939
-0.550715	The availability of various	-0.124939
-0.462934	bounds checking and various	-0.124939
-0.526473	ARM platforms and various	-0.124939
-1.230592	be implemented in various	-0.124939
-0.584654	avoided, there are various	-0.124939
-0.396900	efficient. There are various	-0.124939
-0.396900	processors. There are various	-0.124939
-0.396900	vectors There are various	-0.124939
-0.396900	resources. There are various	-0.124939
-0.396900	not. There are various	-0.124939
-0.396900	arrays. There are various	-0.124939
-0.396900	manual. There are various	-0.124939
-0.396900	explicitly. There are various	-0.124939
-0.396900	power. There are various	-0.124939
-0.396900	normally. There are various	-0.124939
-0.180922	C++ compilers have various	-0.124939
-0.358185	more complicated because various	-0.124939
-0.502423	keyword also makes various	-0.124939
-0.354854	at www.agner.org/optimize/asmlib.zip contains various	-0.124939
-0.539478	capability to reduce various	-0.124939
-0.336253	and 135 show various	-0.124939
-0.382793	subsequent sections describe various	-0.124939
-0.569819	languages have the disadvantage	-0.124939
-0.636720	it has the disadvantage	-0.124939
-0.437218	This has the disadvantage	-0.425969
-1.574945	There is a disadvantage	-0.124939
-0.590298	therefore be a disadvantage	-0.124939
-0.584414	be at a disadvantage	-0.124939
-0.724568	is often a disadvantage	-0.124939
-0.535286	explained below. The disadvantage	-0.124939
-0.556756	read-only data. The disadvantage	-0.124939
-0.535286	never called. The disadvantage	-0.124939
-0.652084	64-bit Windows. The disadvantage	-0.124939
-0.499557	another array. The disadvantage	-0.124939
-0.652084	program starts. The disadvantage	-0.124939
-0.355020	is avoided. The disadvantage	-0.124939
-0.353196	page 107. A disadvantage	-0.124939
-0.353196	ASCII form. A disadvantage	-0.124939
-0.353196	different types. A disadvantage	-0.124939
-1.038725	The most important disadvantage	-0.124939
-0.344685	important. An important disadvantage	-0.124939
-0.265177	CPU time. Another disadvantage	-0.124939
-0.265177	code itself. Another disadvantage	-0.124939
-0.265177	program slower. Another disadvantage	-0.124939
-0.336295	compact. The biggest disadvantage	-0.124939
-0.900016	implemented in the high	-0.124939
-1.809348	to use the high	-0.124939
-0.595833	solution because the high	-0.124939
-0.358505	user. With the high	-0.124939
-0.959482	of objects is high	-0.124939
-0.659251	work load is high	-0.124939
-1.037970	matrix is a high	-0.124939
-0.587884	bytes) is a high	-0.124939
-0.584670	loops if a high	-0.124939
-0.884920	loop with a high	-0.124939
-0.871130	must have a high	-0.124939
-0.583410	comes at a high	-0.124939
-0.841159	situations where a high	-0.124939
-0.356721	and involve a high	-0.124939
-0.358184	switch statements The high	-0.124939
-0.358184	more powerful. The high	-0.124939
-0.556502	ADX instructions for high	-0.124939
-0.358165	math. Libraries for high	-0.124939
-0.358227	of performance has high	-0.124939
-0.168341	code is so high	-0.425969
-0.631059	may be so high	-0.124939
-0.582149	require a very high	-0.124939
-0.294210	can be annoyingly high	-0.124939
-0.890579	that use the zero	-0.124939
-0.462864	point number is zero	-0.124939
-0.556906	The value is zero	-0.124939
-0.763003	// f is zero	-0.124939
-0.726725	first byte of zero	-0.124939
-0.168473	set a to zero	-0.124939
-0.813000	an integer to zero	-0.124939
-0.523547	these variables to zero	-0.124939
-0.825692	sign bit to zero	-0.124939
-0.500241	a register to zero	-0.124939
-0.569234	setting pointers to zero	-0.124939
-0.718848	be initialized to zero	-0.124939
-0.459293	set seconds to zero	-0.124939
-0.459293	count down to zero	-0.124939
-0.355510	// Initialize to zero	-0.124939
-0.358849	the carry and zero	-0.124939
-0.525083	type conversion takes zero	-0.124939
-0.354132	if a was zero	-0.124939
-0.062272	of (0,0,0,0,0,0,0,0) __m128i zero	-0.425969
-0.037163	including the terminating zero	-0.425969
-0.237894	that seconds remains zero	-0.124939
-1.496653	most of the Microsoft	-0.124939
-0.600504	free in the Microsoft	-0.124939
-0.584347	features as the Microsoft	-0.124939
-0.587376	Integrates into the Microsoft	-0.124939
-0.562468	and C++ is Microsoft	-0.124939
-0.726017	development tool is Microsoft	-0.124939
-0.358546	is specific to Microsoft	-0.124939
-0.358546	a plug-in to Microsoft	-0.124939
-0.828003	from Intel and Microsoft	-0.124939
-1.489711	} } The Microsoft	-0.124939
-0.658316	Windows platforms. The Microsoft	-0.124939
-0.835005	Clang, Intel or Microsoft	-0.124939
-0.461446	projects made with Microsoft	-0.124939
-0.503454	Microsoft Comes with Microsoft	-0.124939
-0.358085	_MSC_VER // If Microsoft	-0.124939
-0.352859	are mentioned below. Microsoft	-0.124939
-0.326949	Supported compilers Intel, Microsoft	-0.124939
-0.326949	Gnu, Clang, Intel, Microsoft	-0.124939
-0.343668	are also available. Microsoft	-0.124939
-0.339380	Gnu Intel Borland Microsoft	-0.124939
-0.336191	Mac Intel CodeGear Microsoft	-0.124939
-0.294145	an explanation. (The Microsoft	-0.124939
-0.237845	up to date): Microsoft	-0.124939
-0.237845	versions were tested: Microsoft	-0.124939
-0.358883	serious limitations to what	-0.124939
-0.463490	can do and what	-0.124939
-0.659593	so fast that what	-0.124939
-0.358636	be called, or what	-0.124939
-0.354171	numbers, but on what	-0.124939
-0.578003	C++ based on what	-0.124939
-0.576512	use depends on what	-0.124939
-0.580131	solutions, depending on what	-0.124939
-0.841429	Let's look at what	-0.124939
-1.054430	the compiler does what	-0.124939
-0.355701	it returns. But what	-0.124939
-0.354537	2 ; add what	-0.124939
-1.502393	following example shows what	-0.124939
-0.721474	programmer to know what	-0.124939
-0.386739	sure you know what	-0.124939
-0.594823	compiler doesn't know what	-0.124939
-0.505313	You can change what	-0.124939
-0.414290	reference cannot change what	-0.124939
-0.341757	to tell explicitly what	-0.124939
-0.341824	to measure exactly what	-0.124939
-0.294135	efficient, and that's what	-0.124939
-0.048382	105. 8.7 Checking what	-0.124939
-0.048382	82 8.7 Checking what	-0.124939
-0.382736	to the reader what	-0.124939
-1.795808	value of the parameter	-0.124939
-0.599639	complex if the parameter	-0.124939
-0.598526	transfer of a parameter	-0.124939
-0.586591	significant if a parameter	-0.124939
-0.595249	numbers as a parameter	-0.124939
-0.634880	the overhead of parameter	-0.124939
-0.357063	The overhead of parameter	-0.425969
-0.358385	and return and parameter	-0.124939
-0.540587	register allocation and parameter	-0.124939
-0.855391	than a function parameter	-0.124939
-0.577556	between a function parameter	-0.124939
-0.357960	most critical integer parameter	-0.124939
-0.567082	if the size parameter	-0.425969
-0.800556	because the template parameter	-0.124939
-0.547774	using a template parameter	-0.124939
-0.331080	functions. The template parameter	-0.124939
-0.499526	so). A template parameter	-0.124939
-0.453066	= a ; parameter	-0.124939
-0.587426	= Induction; ; parameter	-0.124939
-0.321192	?Func@@YAXQAHAAH@Z PROCNEAR ; parameter	-0.124939
-0.321192	PROC NEAR ; parameter	-0.124939
-0.421371	as an implicit parameter	-0.124939
-1.784994	to make the division	-0.124939
-0.358849	microprocessors. Multiplication and division	-0.124939
-0.463496	< 0. The division	-0.124939
-0.994254	is faster than division	-0.425969
-0.586363	causes floating point division	-0.124939
-0.418439	division Floating point division	-0.124939
-0.160661	14.6 Floating point division	-0.124939
-0.418439	cycles). Floating point division	-0.124939
-0.358018	can eliminate one division	-0.124939
-0.514223	index. The integer division	-0.124939
-0.514223	instructions for integer division	-0.124939
-0.350019	/ means integer division	-0.124939
-0.534006	unsigned for fast division	-0.124939
-0.234174	of 2 Integer division	-0.124939
-0.310281	compile time. Integer division	-0.124939
-0.234174	Integer division Integer division	-0.124939
-0.234174	other microprocessors. Integer division	-0.124939
-0.234174	the microprocessor. Integer division	-0.124939
-0.310281	96. 14.5 Integer division	-0.124939
-0.234174	the processor). Integer division	-0.124939
-0.234174	integer division: Integer division	-0.124939
-0.408115	many reductions involving division	-0.124939
-0.598902	r is a reference	-0.124939
-0.540469	example: Use a reference	-0.124939
-0.462824	which returns a reference	-0.124939
-0.171523	a pointer or reference	-0.669007
-0.195887	A pointer or reference	-0.124939
-0.086992	any pointer or reference	-0.425969
-0.195887	integer, pointer or reference	-0.124939
-0.195887	returned pointer or reference	-0.124939
-0.658651	points to. A reference	-0.124939
-0.298500	away a const reference	-0.124939
-0.298500	reference, a const reference	-0.124939
-0.335144	pointer or const reference	-0.124939
-0.472077	reference. A const reference	-0.124939
-0.580233	use a constant reference	-0.124939
-0.452737	sees a relative reference	-0.124939
-0.444342	error // Return reference	-0.124939
-0.382839	Return a null reference	-0.124939
-0.600576	modifications of the source	-0.124939
-1.061223	appear in the source	-0.124939
-0.595991	everywhere in the source	-0.124939
-0.598985	cycle if the source	-0.124939
-1.057704	not use the source	-0.124939
-0.590922	Templates make the source	-0.124939
-0.358786	(byte code). The source	-0.124939
-1.161217	an option for source	-0.124939
-0.579205	operands means that source	-0.124939
-0.561703	kept in different source	-0.124939
-1.635989	in the same source	-0.124939
-1.351381	with the same source	-0.124939
-1.185931	from the same source	-0.124939
-0.357996	to join all source	-0.124939
-0.574302	class in one source	-0.124939
-0.655090	is a useful source	-0.124939
-1.116357	is a common source	-0.124939
-0.499568	class in another source	-0.124939
-0.354387	two steps. All source	-0.124939
-0.626182	is a frequent source	-0.124939
-0.339371	processing. Yeppp. Open source	-0.124939
-0.438865	from a reliable source	-0.124939
-0.314649	Watcom Another open source	-0.124939
-0.237837	as a valuable source	-0.124939
-0.597501	itself, and the cost	-0.124939
-0.599004	consider if the cost	-0.124939
-0.593216	faster at the cost	-0.124939
-0.549629	market. But the cost	-0.124939
-0.358200	NAN. Avoiding the cost	-0.124939
-0.358200	dispatching. Underestimating the cost	-0.124939
-0.356907	the thread. The cost	-0.124939
-0.356907	cores. 60 The cost	-0.124939
-0.356907	be defined. The cost	-0.124939
-0.356907	multithreaded applications: The cost	-0.124939
-0.589977	loop does not cost	-0.124939
-0.358574	task switching. This cost	-0.124939
-1.541435	there is no cost	-0.124939
-1.593277	There is no cost	-0.124939
-0.549211	Deallocation has no cost	-0.124939
-0.345726	is virtually no cost	-0.124939
-0.502843	freely without any cost	-0.124939
-0.656520	is no performance cost	-0.124939
-0.430927	231. This extra cost	-0.124939
-0.898332	is an extra cost	-0.124939
-0.543893	is no extra cost	-0.124939
-0.826905	is a large cost	-0.124939
-0.414370	a large overhead cost	-0.124939
-0.319755	a high overhead cost	-0.124939
-0.591889	CPU it is running	-0.124939
-0.591889	microprocessor it is running	-0.124939
-1.516558	the code is running	-0.124939
-0.840688	your code is running	-0.124939
-0.357712	CPU core is running	-0.124939
-0.358797	Intel's term for running	-0.124939
-0.593870	tasks that are running	-0.124939
-0.576535	that we are running	-0.124939
-0.357165	and repagination are running	-0.124939
-0.567989	chosen only when running	-0.124939
-0.644111	instruction set when running	-0.124939
-0.350978	other libraries when running	-0.124939
-0.350978	systems disappears when running	-0.124939
-0.357032	monitor counters before running	-0.124939
-0.586827	Mac operating system running	-0.124939
-0.523407	system to avoid running	-0.124939
-0.523407	models to avoid running	-0.124939
-0.355702	core. Two threads running	-0.124939
-0.355608	a higher-priority thread running	-0.124939
-0.352862	146 Multiple applications running	-0.124939
-0.351624	a background process running	-0.124939
-0.331818	all other processes running	-0.124939
-0.575476	the program starts running	-0.124939
-0.314725	of resources. Consider running	-0.124939
-0.782437	assembly language and automatic	-0.124939
-0.065498	Supports OpenMP and automatic	-0.124939
-0.031515	processing, OpenMP and automatic	-0.425969
-0.065498	107), OpenMP and automatic	-0.124939
-0.356518	vector intrinsics and automatic	-0.124939
-1.109469	the function. The automatic	-0.124939
-0.503952	AVX instructions. The automatic	-0.124939
-0.358147	have features for automatic	-0.124939
-0.358147	or intranet for automatic	-0.124939
-0.355352	class code with automatic	-0.124939
-0.355352	user-written code with automatic	-0.124939
-0.354440	example (12.4e) with automatic	-0.124939
-0.354440	and invoked with automatic	-0.124939
-0.713696	to rely on automatic	-0.124939
-0.713696	can rely on automatic	-0.124939
-1.889536	There is no automatic	-0.124939
-0.357864	2004. Can do automatic	-0.124939
-0.953742	in situations where automatic	-0.124939
-0.355757	all compilers. Use automatic	-0.124939
-0.521288	the program contains automatic	-0.124939
-0.453272	compiler that supports automatic	-0.124939
-0.479768	hours to install automatic	-0.124939
-0.237869	supports vector intrinsics, automatic	-0.124939
-0.358947	that shares the resources	-0.124939
-0.526457	the data and resources	-0.124939
-0.526699	are called and resources	-0.124939
-0.450578	may use more resources	-0.124939
-0.556879	C++ take more resources	-0.124939
-0.454490	take much more resources	-0.124939
-0.454490	uses much more resources	-0.124939
-0.772486	only slightly more resources	-0.124939
-0.462762	takes more memory resources	-0.124939
-0.530514	files and other resources	-0.124939
-0.530514	network and other resources	-0.124939
-0.957225	to predict which resources	-0.124939
-0.358015	are removed, all resources	-0.124939
-0.064191	3.11 Other system resources	-0.124939
-0.508474	there are allocated resources	-0.124939
-0.341840	make sure allocated resources	-0.124939
-0.544864	of the shared resources	-0.124939
-0.282347	times for network resources	-0.124939
-0.118621	depend on network resources	-0.124939
-0.118621	relies on network resources	-0.124939
-0.331818	have less computing resources	-0.124939
-0.382771	order to reserve resources	-0.124939
-0.237861	low-priority thread steals resources	-0.124939
-1.794306	value of the induction	-0.124939
-0.889794	and use the induction	-0.124939
-0.882242	would make the induction	-0.124939
-0.805930	The method of induction	-0.124939
-0.358822	common subexpressions, and induction	-0.124939
-0.461451	optimize this with induction	-0.124939
-0.357209	Calculate polynomial with induction	-0.124939
-0.142414	calculated by an induction	-0.124939
-0.754366	from making an induction	-0.124939
-1.189471	how to use induction	-0.124939
-0.786901	do not make induction	-0.124939
-1.654871	use the same induction	-0.124939
-1.066163	make floating point induction	-0.124939
-0.581267	79 Floating point induction	-0.124939
-0.525777	8 and no induction	-0.124939
-0.568913	namely the two induction	-0.124939
-0.517405	use of two induction	-0.124939
-0.538466	compiler doesn't need induction	-0.124939
-0.495328	by a second induction	-0.124939
-0.421277	making an explicit induction	-0.124939
-0.048384	result // Update induction	-0.124939
-0.048384	Y // Update induction	-0.124939
-0.522749	This is the reason	-0.726999
-0.560087	intermediate code. The reason	-0.124939
-0.455476	and 1. The reason	-0.124939
-0.816870	clock cycles. The reason	-0.124939
-0.455476	performs well. The reason	-0.124939
-0.455476	to do. The reason	-0.124939
-0.647109	Pentium 4. The reason	-0.124939
-0.352502	floating point. The reason	-0.124939
-0.352502	not occur. The reason	-0.124939
-0.352502	the end. The reason	-0.124939
-0.352502	have tested. The reason	-0.124939
-0.352502	it directly. The reason	-0.124939
-1.365662	for the same reason	-0.124939
-0.840370	there is no reason	-0.602060
-1.027740	There is no reason	-0.425969
-0.493732	running. The main reason	-0.124939
-0.336293	a compelling security reason	-0.124939
-1.069514	points to the dispatcher	-0.124939
-1.363493	Note that the dispatcher	-0.124939
-0.592646	gets from the dispatcher	-0.124939
-0.582269	processor makes the dispatcher	-0.124939
-0.540193	loader calls the dispatcher	-0.124939
-0.549662	called. Therefore, the dispatcher	-0.124939
-0.540777	// Make the dispatcher	-0.124939
-0.358201	a dispatcher. The dispatcher	-0.124939
-0.358201	being initialized. The dispatcher	-0.124939
-0.358341	hardware conditions. A dispatcher	-0.124939
-0.358226	2 // make dispatcher	-0.124939
-0.910592	for the CPU dispatcher	-0.124939
-1.048051	that the CPU dispatcher	-0.124939
-0.294463	has a CPU dispatcher	-0.124939
-0.294463	make a CPU dispatcher	-0.124939
-0.294463	keeping a CPU dispatcher	-0.124939
-0.269099	program. The CPU dispatcher	-0.124939
-0.269099	version. The CPU dispatcher	-0.124939
-0.269099	processor. The CPU dispatcher	-0.124939
-0.269099	cases: The CPU dispatcher	-0.124939
-0.269099	programming. The CPU dispatcher	-0.124939
-0.417528	set. A CPU dispatcher	-0.124939
-0.589460	the Intel CPU dispatcher	-0.124939
-1.271606	value that is n	-0.124939
-0.540626	dynamic array of n	-0.124939
-0.463142	the consequence of n	-0.124939
-0.541217	the fact that n	-0.124939
-0.358687	to x^0/0! // n	-0.124939
-1.197761	the loop by n	-0.124939
-1.161726	be calculated by n	-0.124939
-0.463190	bounds check on n	-0.124939
-0.500725	index than when n	-0.124939
-0.355856	more serious when n	-0.124939
-0.358075	a vector. If n	-0.124939
-0.357439	places back, where n	-0.124939
-0.355405	nonzero u.i += n	-0.124939
-0.520796	23; // add n	-0.124939
-0.578239	1.f; for (int n	-0.124939
-0.561670	n = 1; n	-0.124939
-0.531099	x *= x; n	-0.124939
-0.543303	2n by adding n	-0.124939
-0.632582	the least significant n	-0.124939
-0.339344	series: ex xn n	-0.124939
-0.339344	interval 0 <= n	-0.124939
-0.294108	2n by subtracting n	-0.124939
-0.294108	/ b) >> n	-0.124939
-0.901417	length of the string	-0.124939
-0.493986	every time a string	-0.425969
-0.503649	that produces a string	-0.124939
-0.357949	function scans a string	-0.124939
-0.828103	common implementations of string	-0.124939
-0.433931	of memory and string	-0.124939
-0.433931	common memory and string	-0.124939
-0.357889	processing Memory and string	-0.124939
-0.576374	strlen function. The string	-0.124939
-0.462652	large applications. The string	-0.124939
-0.577314	efficient functions for string	-0.124939
-0.358795	then interpret that string	-0.124939
-0.595213	classes, such as string	-0.124939
-1.422996	is to use string	-0.124939
-0.358237	variable names from string	-0.124939
-0.582812	length of each string	-0.124939
-0.355573	versions of common string	-0.124939
-0.483726	the C style string	-0.124939
-0.093284	floating point constants, string	-0.124939
-0.093284	26 point constants, string	-0.124939
-0.294145	a zero-terminated ASCII string	-0.124939
-0.294145	vector instructions SSE4.2 string	-0.124939
-0.213472	responsibility of the programmer	-0.823909
-0.584029	bility of the programmer	-0.124939
-0.596999	obvious to the programmer	-0.124939
-0.565118	useful for the programmer	-0.124939
-0.565118	important for the programmer	-0.124939
-1.101565	difficult for the programmer	-0.124939
-0.565118	relevant for the programmer	-0.124939
-0.587673	things that the programmer	-0.124939
-0.587673	risk that the programmer	-0.124939
-1.064532	only if the programmer	-0.124939
-0.592819	or because the programmer	-0.124939
-0.580668	situation, but the programmer	-0.124939
-0.720910	to help the programmer	-0.124939
-0.356402	compiler puts the programmer	-0.124939
-1.399395	For example, a programmer	-0.124939
-0.357569	usability reasons. The programmer	-0.124939
-0.357569	processor features. The programmer	-0.124939
-0.357569	page 70). The programmer	-0.124939
-0.568645	before the application programmer	-0.124939
-1.192921	code for the three	-0.124939
-0.358801	first way and three	-0.124939
-0.358140	+ r.b;} The three	-0.124939
-0.358140	and increment. The three	-0.124939
-0.598219	multiplication may be three	-0.124939
-0.837468	time. There are three	-0.124939
-0.568011	calls. There are three	-0.124939
-0.567155	be two or three	-0.124939
-0.841659	are implemented as three	-0.124939
-0.462947	image data have three	-0.124939
-0.565044	} This has three	-0.124939
-0.352143	this example has three	-0.124939
-0.352143	A for-loop has three	-0.124939
-0.526279	AVX2 and all three	-0.124939
-0.539431	usually divided into three	-0.124939
-0.900074	the other way three	-0.124939
-0.500606	should be compiled three	-0.124939
-0.720116	one addition every three	-0.124939
-0.567937	); // Make three	-0.124939
-0.350243	moderately well. Supports three	-0.124939
-0.483744	This is approximately three	-0.124939
-0.314630	assembly listing reveals three	-0.124939
-0.294117	Example 12.4b executes three	-0.124939
-1.990796	instruction set is better	-0.124939
-0.540860	64-bit version is better	-0.124939
-1.179119	lead to a better	-0.124939
-1.136774	may be a better	-0.124939
-0.550804	might be a better	-0.124939
-0.594247	compatible with a better	-0.124939
-0.824916	may get a better	-0.124939
-0.566829	a new and better	-0.124939
-0.358369	becoming better and better	-0.124939
-0.557557	are waiting for better	-0.124939
-1.127639	it may be better	-0.425969
-0.590886	caching will be better	-0.124939
-0.356474	may actually be better	-0.124939
-0.855224	The compilers are better	-0.124939
-0.358640	more efficiently by better	-0.124939
-0.358288	and usability A better	-0.124939
-1.617298	order to make better	-0.124939
-0.347491	operating systems need better	-0.124939
-0.347491	software applications need better	-0.124939
-0.353609	the non-reduced expression better	-0.124939
-0.467612	compilers are becoming better	-0.124939
-0.325321	Core2 processor performs better	-0.124939
-0.900166	effect of the keyword	-0.124939
-0.569927	by using the keyword	-0.124939
-0.540119	then add the keyword	-0.124939
-0.358205	1. Add the keyword	-0.124939
-0.501318	fastcall functions The keyword	-0.124939
-0.501318	calculated value. The keyword	-0.124939
-0.460273	interprocedural optimizations. The keyword	-0.124939
-0.356282	the context. The keyword	-0.124939
-0.460273	page 80. The keyword	-0.124939
-0.375492	because the static keyword	-0.124939
-0.375492	add the static keyword	-0.124939
-0.344986	function. The static keyword	-0.124939
-0.344986	class. The static keyword	-0.124939
-0.517100	use the const keyword	-0.124939
-0.349668	purposes. The const keyword	-0.124939
-0.244560	compilers The register keyword	-0.124939
-0.244560	variable. The register keyword	-0.124939
-0.354772	if the inline keyword	-0.124939
-0.117630	volatile. The volatile keyword	-0.124939
-0.117630	Volatile The volatile keyword	-0.124939
-0.331879	Therefore, the __fastcall keyword	-0.124939
-0.598727	and is not efficient.	-0.124939
-0.552759	blocks is more efficient.	-0.124939
-0.552759	latter is more efficient.	-0.124939
-0.542105	smaller and more efficient.	-0.124939
-0.537482	independent code more efficient.	-0.124939
-0.343892	function calls more efficient.	-0.124939
-0.703686	data caching more efficient.	-0.124939
-0.343892	point comparisons more efficient.	-0.124939
-0.357247	data caching very efficient.	-0.124939
-0.525399	which is less efficient.	-0.124939
-0.466918	bits are less efficient.	-0.124939
-0.277426	makes it less efficient.	-0.124939
-0.277426	the program less efficient.	-0.124939
-0.277426	member pointers less efficient.	-0.124939
-0.006888	data caching less efficient.	-0.124939
-0.035589	makes caching less efficient.	-0.124939
-0.394774	only slightly less efficient.	-0.124939
-0.364834	This is equally efficient.	-0.124939
-0.446302	they are equally efficient.	-0.124939
-0.886759	function with a lookup	-0.124939
-0.945628	to use a lookup	-0.124939
-1.350172	by using a lookup	-0.124939
-0.548318	possibly also a lookup	-0.124939
-0.555600	implementation uses a lookup	-0.124939
-1.249871	Do not use lookup	-0.124939
-0.518884	with a table lookup	-0.124939
-0.293521	principle of table lookup	-0.124939
-0.122421	code and table lookup	-0.124939
-0.122421	calculation and table lookup	-0.124939
-0.293521	heavily on table lookup	-0.124939
-0.293521	all these table lookup	-0.124939
-0.434040	the virtual table lookup	-0.124939
-0.293521	132. Unfortunately, table lookup	-0.124939
-0.537647	for vectorized table lookup	-0.124939
-0.293521	9. Avoid table lookup	-0.124939
-0.293521	112 Vectorized table lookup	-0.124939
-0.138606	132 14.1 Use lookup	-0.124939
-0.138606	topics 14.1 Use lookup	-0.124939
-0.514451	FactorialTable[n]; // Table lookup	-0.124939
-0.329633	line. 132 Table lookup	-0.124939
-0.341873	the slow GOT lookup	-0.124939
-1.686660	address of the end	-0.124939
-0.571950	important to the end	-0.124939
-0.197929	distributed to the end	-0.425969
-0.571950	inconvenient to the end	-0.124939
-0.590461	handling in the end	-0.124939
-0.590461	points in the end	-0.124939
-0.590461	structures in the end	-0.124939
-0.586030	disadvantage for the end	-0.124939
-0.586030	poorly for the end	-0.124939
-0.587987	delay that the end	-0.124939
-0.587987	unlikely that the end	-0.124939
-0.882791	elements at the end	-0.124939
-0.588030	time before the end	-0.124939
-0.567622	doing. See the end	-0.124939
-0.356693	Writing past the end	-0.124939
-0.358911	the majority of end	-0.124939
-0.575609	= point to end	-0.124939
-0.358636	; compare with end	-0.124939
-0.659042	more RAM than end	-0.124939
-0.357403	-156. Surprisingly, we end	-0.124939
-0.356453	recursion must always end	-0.124939
-0.237910	align ; mark end	-0.124939
-0.540403	programming, but in applications	-0.124939
-0.763184	an advantage in applications	-0.124939
-0.358776	are advantageous for applications	-0.124939
-1.109862	This will make applications	-0.124939
-0.503763	calling from other applications	-0.124939
-0.357521	screen. However, such applications	-0.124939
-0.568268	performance for many applications	-0.124939
-0.245306	databases Many software applications	-0.124939
-0.245306	more. Many software applications	-0.124939
-0.356999	recommended for critical applications	-0.124939
-0.333508	function libraries Some applications	-0.124939
-0.333508	array sequentially. Some applications	-0.124939
-0.333508	pool. Alignment? Some applications	-0.124939
-0.355573	except when several applications	-0.124939
-0.353375	is on mathematical applications	-0.124939
-0.434946	in small embedded applications	-0.124939
-0.331768	are: 146 Multiple applications	-0.124939
-0.331768	for some CPU-intensive applications	-0.124939
-0.314630	simultaneously. In multithreaded applications	-0.124939
-0.294117	compiler for Unix applications	-0.124939
-0.294117	time for WTL applications	-0.124939
-0.237820	cause the resource-hungry applications	-0.124939
-0.237820	from disk. Memory-hungry applications	-0.124939
-0.356932	on page 8 below.	-0.124939
-0.444463	mode, as explained below.	-0.124939
-0.444463	system, as explained below.	-0.124939
-0.444463	up, as explained below.	-0.124939
-0.444463	optimizations, as explained below.	-0.124939
-0.355050	See page 128 below.	-0.124939
-0.353592	independent code, see below.	-0.124939
-0.575615	method is described below.	-0.124939
-0.513884	function as described below.	-0.124939
-0.516421	experiment are given below.	-0.124939
-0.420662	conversions is discussed below.	-0.124939
-0.266046	optimization are discussed below.	-0.124939
-0.266046	libraries are discussed below.	-0.124939
-0.343687	compilers are mentioned below.	-0.124939
-0.339380	in the sections below.	-0.124939
-0.771038	in example 13.1 below.	-0.124939
-0.331798	follow the guidelines below.	-0.124939
-0.325332	in table 8.1 below.	-0.124939
-0.044142	on page 146 below.	-0.124939
-0.314712	on page 164 below.	-0.124939
-0.382748	instructions are summarized below.	-0.124939
-0.294145	in example 14.19 below.	-0.124939
-0.726813	you expect the &&	-0.124939
-0.878639	= a a &&	-0.124939
-1.169301	c = a &&	-0.124939
-0.800571	cannot replace a &&	-0.124939
-0.503272	The expression a &&	-0.124939
-0.357010	|| b) a &&	-0.124939
-0.065561	= true a &&	-0.124939
-0.659778	first operand of &&	-0.124939
-0.580060	last in an &&	-0.124939
-0.462139	equivalent expression b &&	-0.124939
-0.981876	the Boolean operators &&	-0.124939
-0.353341	(SIZE > 256 &&	-0.124939
-0.353432	x > y &&	-0.124939
-0.350785	cases. Don't change &&	-0.124939
-0.314712	= a&&(b||c) !a &&	-0.124939
-0.294145	(i >= min &&	-0.124939
-0.294145	(i < ARRAYSIZE &&	-0.124939
-0.102840	a<c) = (a<b &&	-0.124939
-0.102840	----x---- !(a<b)=(a>=b) (a<b &&	-0.124939
-0.237845	(a<b && b<c &&	-0.124939
-0.237845	(handle != INVALID_HANDLE_VALUE &&	-0.124939
-0.237845	to write if(!a &&	-0.124939
-1.306037	by using the |	-0.124939
-0.545314	operation using the |	-0.124939
-0.878682	= a a |	-0.124939
-0.885345	d = a |	-0.124939
-0.885796	b with a |	-0.124939
-1.238101	n.a. - a |	-0.124939
-0.310321	= a, a |	-0.124939
-0.357023	a<<b<<c=a<<(b+c) x-xxx--xx a |	-0.124939
-0.358839	of << and |	-0.124939
-0.504471	x.abc = A |	-0.124939
-0.201002	3) << 4) |	-0.124939
-0.201002	(B << 4) |	-0.124939
-0.048391	(Tuesday | Wednesday |	-0.124939
-0.102852	(Day & (Tuesday |	-0.124939
-0.102852	The expression (Tuesday |	-0.124939
-0.538807	(a&b) | (~a&c) |	-0.124939
-0.102852	(b&c) = (a&b) |	-0.124939
-0.102852	= 0, (a&b) |	-0.124939
-0.237877	#include <xmmintrin.h> _mm_setcsr(_mm_getcsr() |	-0.124939
-0.237877	(n & 0x7FFFFF) |	-0.124939
-0.237877	(A & 0x0F) |	-0.124939
-0.864001	0) { // Make	-0.124939
-0.386553	cc[]) { // Make	-0.602060
-0.555404	4.; }; // Make	-0.124939
-0.450482	aa[size] ); // Make	-0.124939
-0.064471	= _mm_set1_epi16(0); // Make	-0.425969
-0.348557	100> list; // Make	-0.124939
-0.348557	CriticalFunction, @gnu_indirect_function"); // Make	-0.124939
-0.348557	Is16vec8 zero(0,0,0,0,0,0,0,0); // Make	-0.124939
-0.455960	is described below. Make	-0.124939
-0.579112	selected instruction set. Make	-0.124939
-0.349073	the object's class. Make	-0.124939
-0.880209	in 64-bit mode. Make	-0.124939
-0.511710	for the object. Make	-0.124939
-0.450178	by a variable. Make	-0.124939
-1.240252	the function returns. Make	-0.124939
-0.325321	13.1 below. 126 Make	-0.124939
-0.325321	with C++0x support. Make	-0.124939
-0.325321	expressions and operators. Make	-0.124939
-0.294173	files and executables. Make	-0.124939
-0.237869	the following alternatives: Make	-0.124939
-0.358276	is zero } We	-0.124939
-0.715893	can be used. We	-0.124939
-0.718184	the level-1 cache. We	-0.124939
-0.563669	bit to zero We	-0.124939
-0.352520	and Intel compilers. We	-0.124939
-0.796397	by the compiler. We	-0.124939
-0.348207	worst possible performance. We	-0.124939
-0.339318	floating point number. We	-0.124939
-0.744870	long dependency chain. We	-0.124939
-0.331727	of identifier names. We	-0.124939
-0.325222	bit of u.f We	-0.124939
-0.325222	in this example. We	-0.124939
-0.325222	assume is optimized. We	-0.124939
-0.407958	r = 28. We	-0.124939
-0.294080	15.1b to 15.1c. We	-0.124939
-0.538627	the sign bit. We	-0.124939
-0.237788	for some caveats. We	-0.124939
-0.237788	32, 64, ...). We	-0.124939
-0.237788	storage (e.g. PowerPC). We	-0.124939
-0.237788	sign bit set). We	-0.124939
-0.237788	C++ as 'this'. We	-0.124939
-0.237788	15.1b to 15.1c? We	-0.124939
-0.237788	to be annoying. We	-0.124939
-1.295603	any of the examples	-0.124939
-0.583802	well, but the examples	-0.124939
-0.570159	pool. See the examples	-0.124939
-1.128270	instruction set. The examples	-0.124939
-0.525223	each version. The examples	-0.124939
-0.357525	well documented. The examples	-0.124939
-0.461809	and 90 for examples	-0.124939
-0.357490	page 103 for examples	-0.124939
-0.461809	See www.agner.org/optimize/cppexamples.zip for examples	-0.124939
-0.587047	allowed. The code examples	-0.124939
-0.357079	contains complete code examples	-0.124939
-0.358347	also find more examples	-0.124939
-0.357427	have seen many examples	-0.124939
-0.347240	variables. In these examples	-0.124939
-0.510174	purposes. All these examples	-0.124939
-0.588832	shortly. The following examples	-0.124939
-0.355608	have provided several examples	-0.124939
-0.354841	at www.agner.org/optimize/cppexamples.zip contains examples	-0.124939
-1.136832	in the above examples	-0.124939
-0.519855	144 The above examples	-0.124939
-0.314678	to a[i] More examples	-0.124939
-0.294163	that such contrived examples	-0.124939
-0.294163	to construct obscure examples	-0.124939
-0.526107	object, except for char	-0.124939
-0.504314	object (except for char	-0.124939
-0.065636	1 byte = char	-0.425969
-0.658074	I have used char	-0.124939
-0.556051	bits Instruction set char	-0.124939
-0.357569	int n; static char	-0.124939
-0.547642	Agner 8 8 char	-0.124939
-0.431433	long int unsigned char	-0.124939
-0.431433	8 8 unsigned char	-0.124939
-0.333425	8 16 unsigned char	-0.124939
-0.333425	256 Vec32c unsigned char	-0.124939
-0.501441	I64vec1 8 16 char	-0.124939
-0.782166	2 128 SSE2 char	-0.124939
-0.501486	Vec2uq 8 32 char	-0.124939
-0.652203	int c:2; }; char	-0.124939
-0.498717	1 64 MMX char	-0.124939
-0.294126	value in stdint.h char	-0.124939
-0.237829	// Example 7.10b char	-0.124939
-0.237829	// Example 7.31b char	-0.124939
-0.237829	// Example 7.31a char	-0.124939
-0.237829	// Example 8.17 char	-0.124939
-0.237829	// Example 7.9b char	-0.124939
-1.437685	some of the difference	-0.124939
-0.598242	compensate for the difference	-0.124939
-0.583617	consumption as the difference	-0.124939
-0.143127	x; Note the difference	-0.124939
-0.143127	Day. Note the difference	-0.124939
-0.358054	example illustrates the difference	-0.124939
-0.358054	fast. Calculating the difference	-0.124939
-1.320593	may be a difference	-0.124939
-0.568499	two functions. The difference	-0.124939
-0.461887	= 80. The difference	-0.124939
-0.357552	and FPGAs. The difference	-0.124939
-0.718511	there is no difference	-0.726999
-1.249058	There is no difference	-0.124939
-0.134376	it makes no difference	-0.124939
-0.134376	#define makes no difference	-0.124939
-0.329981	is simply no difference	-0.124939
-0.459658	is no big difference	-0.124939
-0.452737	with a relative difference	-0.124939
-0.483880	test // Time difference	-0.124939
-0.237910	only a minimal difference	-0.124939
-0.586981	definition code in addition	-0.124939
-0.358408	of (or in addition	-0.124939
-0.460822	register, do an addition	-0.124939
-0.356714	is doing an addition	-0.124939
-1.214149	longer time than addition	-0.124939
-0.894561	a floating point addition	-0.124939
-0.945373	for floating point addition	-0.124939
-0.521573	one floating point addition	-0.124939
-0.872128	two floating point addition	-0.124939
-0.521573	new floating point addition	-0.124939
-0.521573	mix floating point addition	-0.124939
-0.559289	105. Floating point addition	-0.124939
-0.498469	can have one addition	-0.124939
-0.859333	have only one addition	-0.124939
-0.456898	that if each addition	-0.124939
-0.533173	chain where each addition	-0.124939
-0.873660	start a new addition	-0.124939
-1.098164	The most important addition	-0.124939
-0.355074	can do another addition	-0.124939
-0.845616	of the preceding addition	-0.124939
-0.618238	before the preceding addition	-0.124939
-0.346416	precision math allow addition	-0.124939
-0.600842	decomposition of the data.	-0.124939
-1.067113	calculations on the data.	-0.124939
-0.358495	can prefetch the data.	-0.124939
-0.463087	of organizing the data.	-0.124939
-0.570883	smallest list of data.	-0.124939
-0.502177	own block of data.	-0.124939
-0.089794	relative addressing of data.	-0.124939
-0.042581	self-relative addressing of data.	-0.124939
-0.461054	with lots of data.	-0.124939
-0.356896	2 gigabytes of data.	-0.124939
-0.358199	other odd-sized vector data.	-0.124939
-0.658107	the most used data.	-0.124939
-0.503150	public and static data.	-0.124939
-0.363874	set of test data.	-0.124939
-0.356594	for storing user data.	-0.124939
-0.356558	something on these data.	-0.124939
-0.348328	well as writing data.	-0.124939
-0.345157	text or input data.	-0.124939
-0.408079	code and read-only data.	-0.124939
-0.237869	long, double. Misaligned data.	-0.124939
-0.237869	often contains writeable data.	-0.124939
-1.191601	before it is too	-0.124939
-0.954503	the problem is too	-0.124939
-0.855632	this solution is too	-0.124939
-0.461716	or container is too	-0.124939
-1.117602	loop count is too	-0.124939
-0.357417	the granularity is too	-0.124939
-0.200122	out to be too	-0.124939
-0.585444	therefore not be too	-0.124939
-0.593874	Arrays that are too	-0.124939
-0.357169	and C are too	-0.124939
-0.656355	and c[i] are too	-0.124939
-0.527160	too small or too	-0.124939
-0.358389	zero. Execution time too	-0.124939
-1.224519	a program has too	-0.124939
-0.588247	because it takes too	-0.124939
-0.351280	the program takes too	-0.124939
-0.354102	of Basic was too	-0.124939
-0.514588	space has become too	-0.124939
-0.349093	some compilers unroll too	-0.124939
-0.339406	the sampling generates too	-0.124939
-0.325355	significant improvements. Making too	-0.124939
-0.237869	precision without worrying too	-0.124939
-0.358893	paragraph described a mechanism	-0.124939
-0.503940	(*.dll, *.so). The mechanism	-0.124939
-0.462657	hardware exceptions. The mechanism	-0.124939
-0.358560	return route. This mechanism	-0.124939
-1.455482	the Gnu compiler mechanism	-0.124939
-0.587131	cases, the Intel mechanism	-0.124939
-0.580969	while the Gnu mechanism	-0.124939
-0.104309	the out-of-order execution mechanism	-0.124939
-0.457324	44. The dispatching mechanism	-0.124939
-0.393195	bypassing the dispatch mechanism	-0.124939
-0.435899	virtual function dispatch mechanism	-0.124939
-0.208065	the CPU dispatch mechanism	-0.124939
-0.091690	The CPU dispatch mechanism	-0.124939
-0.208065	A CPU dispatch mechanism	-0.124939
-0.516480	However, the out-of-order mechanism	-0.124939
-1.266375	the CPU detection mechanism	-0.124939
-0.438891	abusing the update mechanism	-0.124939
-0.123776	The stack unwinding mechanism	-0.124939
-0.314669	work. The updating mechanism	-0.124939
-0.294154	memory. The renaming mechanism	-0.124939
-0.590636	() { // Table	-0.124939
-0.354600	// n! // Table	-0.124939
-0.354600	Polynomial coefficients // Table	-0.124939
-0.354600	+= A2; // Table	-0.124939
-0.354600	return FactorialTable[n]; // Table	-0.124939
-2.060238	n.a. n.a. - Table	-0.124939
-0.720820	8 or 16 Table	-0.124939
-0.356412	MOVNTDQ _mm_stream_si128 SSE2 Table	-0.124939
-0.353124	Intel CodeGear Microsoft Table	-0.124939
-0.516975	double 4 AVX2 Table	-0.124939
-0.454314	of compiler options Table	-0.124939
-0.799335	#pragma vector nontemporal Table	-0.124939
-0.444226	possibility of overflow. Table	-0.124939
-0.698514	8 512 AVX512 Table	-0.124939
-0.325262	(MS) x86intrin.h (Gnu) Table	-0.124939
-0.325262	0 264-1 uint64_t Table	-0.124939
-0.314630	cache line. 132 Table	-0.124939
-0.294117	optional commercial license Table	-0.124939
-0.237820	F64vec2 F32vec8 F64vec4 Table	-0.124939
-0.237820	513 58.7 168.3 Table	-0.124939
-0.237820	report /Qopt-report -opt-report Table	-0.124939
-0.237820	floating point multiply-and-add Table	-0.124939
-0.237820	2056 38.1 97 Table	-0.124939
-0.597888	program, and the runtime	-0.124939
-0.888970	efficient than the runtime	-0.124939
-0.358487	shows first the runtime	-0.124939
-0.569956	once, while the runtime	-0.124939
-0.595618	either as a runtime	-0.124939
-0.540684	It makes a runtime	-0.124939
-1.748368	a lot of runtime	-0.124939
-0.858254	off support for runtime	-0.124939
-0.659191	is wasted on runtime	-0.124939
-0.572628	does not use runtime	-0.124939
-0.358269	static library. A runtime	-0.124939
-0.647039	rather than at runtime	-0.124939
-0.748292	is done at runtime	-0.124939
-0.352467	is transferred at runtime	-0.124939
-0.358003	package, including all runtime	-0.124939
-0.357531	reason why such runtime	-0.124939
-0.543266	install a large runtime	-0.124939
-0.980076	a very large runtime	-0.124939
-0.355738	based on big runtime	-0.124939
-0.355670	the common language runtime	-0.124939
-0.494306	methods or require runtime	-0.124939
-0.349641	frame- pointer No runtime	-0.124939
-0.314659	aware of. Big runtime	-0.124939
-1.188408	intermediate code is needed	-0.124939
-0.577541	Extra time is needed	-0.124939
-0.999287	the pointer is needed	-0.124939
-0.357718	hash map is needed	-0.124939
-0.357718	Runtime polymorphism is needed	-0.124939
-0.597386	variables may be needed	-0.124939
-0.579295	variable would be needed	-0.124939
-1.514033	functions that are needed	-0.124939
-0.357969	table lookups are needed	-0.124939
-0.548998	A is not needed	-0.124939
-0.192976	constructor is not needed	-0.425969
-0.548998	Fastcall is not needed	-0.124939
-0.548998	(1) is not needed	-0.124939
-0.504661	matrix longer than needed	-0.124939
-1.136045	amount of memory needed	-0.124939
-0.556096	element Instruction set needed	-0.124939
-0.657641	the final size needed	-0.124939
-0.355851	the extra work needed	-0.124939
-0.531174	that is actually needed	-0.124939
-0.771681	feature is rarely needed	-0.124939
-0.102866	solution. Is searching needed	-0.124939
-0.102866	tree. Is searching needed	-0.124939
-0.596016	programming as a means	-0.124939
-0.584898	class member function means	-0.124939
-0.106894	into one by means	-0.425969
-0.348683	2, etc. This means	-0.124939
-0.450640	most cases. This means	-0.124939
-0.348683	multiplication units. This means	-0.124939
-0.348683	4 ways. This means	-0.124939
-0.348683	out-of-order execution. This means	-0.124939
-0.348683	clock cycle. This means	-0.124939
-0.348683	= 28. This means	-0.124939
-0.503732	should by all means	-0.124939
-0.453659	local const variable means	-0.124939
-0.890779	a global variable means	-0.124939
-0.355128	sets). Here, / means	-0.124939
-0.351608	20 to 10 means	-0.124939
-0.351218	a non-member function, means	-0.124939
-0.351218	performance). Aligned operands means	-0.124939
-0.351245	Functional decomposition here means	-0.124939
-0.347408	truncation, and % means	-0.124939
-0.336253	and other protection means	-0.124939
-0.325301	15 Metaprogramming Metaprogramming means	-0.124939
-0.237853	a scalar (Scalar means	-0.124939
-0.899173	value in the last	-0.124939
-1.702658	so that the last	-0.124939
-0.596076	accessed with the last	-0.124939
-0.593068	least at the last	-0.124939
-0.539874	to add the last	-0.124939
-0.561577	or after the last	-0.124939
-0.358043	and leave the last	-0.124939
-0.358375	in x, and last	-0.124939
-0.358375	G values, and last	-0.124939
-0.659624	be negative. The last	-0.124939
-1.384232	the same as last	-0.124939
-0.830215	same way as last	-0.124939
-0.358498	another way than last	-0.124939
-0.024608	byte at 0, last	-0.602060
-0.347434	most often true last	-0.124939
-0.345205	big objects come last	-0.124939
-0.054840	byte at 8, last	-0.425969
-0.331864	byte at 16, last	-0.124939
-0.237877	byte at 12, last	-0.124939
-0.237877	byte at 400, last	-0.124939
-0.358034	instructions are one byte	-0.124939
-0.502418	find the first byte	-0.124939
-0.502418	localize the first byte	-0.124939
-0.270533	2 bytes. first byte	-0.124939
-0.160575	4 bytes. first byte	-0.425969
-0.270533	8 bytes. first byte	-0.124939
-0.179553	400 bytes. first byte	-0.124939
-0.810891	4 unused bytes byte	-0.124939
-0.321383	byte at 1 byte	-0.124939
-0.321383	_mm_shuffle_epi8 16 1 byte	-0.124939
-0.321383	_mm_perm_epi8 32 1 byte	-0.124939
-0.018995	at 0, last byte	-0.602060
-0.043969	at 8, last byte	-0.425969
-0.211227	at 16, last byte	-0.124939
-0.211227	at 12, last byte	-0.124939
-0.211227	at 400, last byte	-0.124939
-1.060707	clock cycles per byte	-0.124939
-0.336295	byte at 15 byte	-0.124939
-1.454488	rather than the parts	-0.124939
-0.726651	to optimize the parts	-0.124939
-0.358353	not possible when parts	-0.124939
-0.574709	times and make parts	-0.124939
-0.561695	manipulate the different parts	-0.124939
-0.798855	stored in different parts	-0.124939
-0.405714	or between different parts	-0.124939
-0.405714	Switch between different parts	-0.124939
-0.501106	costs to other parts	-0.124939
-0.354690	which affects other parts	-0.124939
-0.658107	the most used parts	-0.124939
-0.575920	identify the critical parts	-0.124939
-0.326782	advices in critical parts	-0.124939
-0.326782	important or critical parts	-0.124939
-1.435074	the most critical parts	-0.124939
-0.326782	Optimizing less critical parts	-0.124939
-0.355671	other system- specific parts	-0.124939
-0.457947	likelihood that certain parts	-0.124939
-0.629850	the most time-consuming parts	-0.124939
-0.505796	the time consuming parts	-0.124939
-0.331828	CPU brand. Critical parts	-0.124939
-0.408079	if other nearby parts	-0.124939
-0.886008	d = a ||	-0.124939
-1.006832	= a, a ||	-0.124939
-0.802328	cannot replace a ||	-0.124939
-0.065641	= false, a ||	-0.124939
-0.659791	first operand of ||	-0.124939
-0.358828	operators && and ||	-0.124939
-0.580073	first in an ||	-0.124939
-0.460268	(i < 0 ||	-0.124939
-0.038268	: c (a&&b) ||	-0.124939
-0.038268	(a+b)+c=a+(b+c) --xx----- (a&&b) ||	-0.124939
-0.038268	(a&&b)||(a&&!b)=a x--xx---- (a&&b) ||	-0.124939
-0.038268	x-xx----- 75 (a&&b) ||	-0.124939
-0.038268	= a&&b (a&&b) ||	-0.124939
-0.331847	Day == Wednesday ||	-0.124939
-0.538774	(a&&b) || (a&&c) ||	-0.124939
-0.294163	!b = !(a ||	-0.124939
-0.538774	(a&&b) || (!a&&c) ||	-0.124939
-0.294163	(Day == Tuesday ||	-0.124939
-0.237861	= a&&(b||c) (a&&!b) ||	-0.124939
-0.237861	n; #if defined(__unix__) ||	-0.124939
-0.237861	the equivalent if(!(a ||	-0.124939
-1.099030	{ return a >	-0.124939
-0.504449	a = x >	-0.124939
-0.190235	result = b >	-0.425969
-0.357643	= StringLength; i >	-0.124939
-0.357178	(u.i * 2 >	-0.124939
-0.355647	(a * c >	-0.124939
-0.346403	ARRAYSIZE && list[i] >	-0.124939
-0.400661	14.15a if (a >	-0.124939
-0.269116	#define MAX(a,b) (a >	-0.124939
-0.702545	v; if (u.i >	-0.124939
-0.325262	{ // u.f >	-0.124939
-0.595057	{ if (n >	-0.124939
-0.294117	aa[i] = (bb[i] >	-0.124939
-0.294117	1's when bb[i] >	-0.124939
-0.023516	a = select(b >	-0.425969
-0.237820	integer expression -a >	-0.124939
-0.237820	(approximately): if (absvalue >	-0.124939
-0.237820	occur: if (SIZE >	-0.124939
-0.237820	the <, <=, >	-0.124939
-0.237820	{ // abs(u.f) >	-0.124939
-0.659696	the number and types	-0.124939
-0.746029	objects of different types	-0.124939
-0.156707	pointers of different types	-0.425969
-0.404292	arrays of different types	-0.124939
-0.336542	by using different types	-0.124939
-0.559831	have two different types	-0.124939
-0.397365	correspondingly two different types	-0.124939
-0.336542	of CPUs, different types	-0.124939
-0.358089	can reduce other types	-0.124939
-0.140661	the different integer types	-0.124939
-0.140661	of different integer types	-0.124939
-0.350013	of defining integer types	-0.124939
-0.577194	mix the two types	-0.124939
-0.357329	can reduce some types	-0.124939
-0.064737	7.16 Function return types	-0.124939
-0.356574	will convert these types	-0.124939
-0.345461	elements of simple types	-0.124939
-0.507587	efficient for simple types	-0.124939
-0.434966	objects have mixed types	-0.124939
-0.408103	of the specified types	-0.124939
-1.135486	The calculation of expressions	-0.124939
-0.566719	some types of expressions	-0.124939
-1.342213	of floating point expressions	-0.124939
-0.845580	in floating point expressions	-0.124939
-1.095832	for floating point expressions	-0.124939
-0.342203	operands are integer expressions	-0.124939
-0.429071	reductions on integer expressions	-0.124939
-0.303354	manipulations on integer expressions	-0.124939
-0.342203	for other integer expressions	-0.124939
-0.342203	at reducing integer expressions	-0.124939
-0.357603	variables for float expressions	-0.124939
-0.912613	chooses between two expressions	-0.124939
-0.357541	often, but such expressions	-0.124939
-0.355898	can skip large expressions	-0.124939
-0.335508	however, often write expressions	-0.124939
-0.335508	that programmers write expressions	-0.124939
-0.353983	many cases. Integer expressions	-0.124939
-0.395337	reduce simple algebraic expressions	-0.124939
-0.304383	reduce various algebraic expressions	-0.124939
-0.540841	only the simplest expressions	-0.124939
-0.237869	constant references accept expressions	-0.124939
-0.237869	function inlining. Reducible expressions	-0.124939
-1.261484	code that is difficult	-0.124939
-1.738952	that it is difficult	-0.124939
-0.572030	functions It is difficult	-0.124939
-0.844973	memory. It is difficult	-0.124939
-0.572030	disadvantages: It is difficult	-0.124939
-0.572030	between. It is difficult	-0.124939
-0.591005	process which is difficult	-0.124939
-0.725502	very long and difficult	-0.124939
-0.358380	becomes bulky and difficult	-0.124939
-1.714242	it can be difficult	-0.124939
-1.129836	it may be difficult	-0.425969
-0.503683	space and are difficult	-0.124939
-1.514060	functions that are difficult	-0.124939
-1.427204	makes the code difficult	-0.124939
-0.591708	code is more difficult	-0.124939
-0.558793	clear and more difficult	-0.124939
-0.357522	specifying otherwise. In difficult	-0.124939
-0.538608	that are very difficult	-0.124939
-0.565330	understand and therefore difficult	-0.124939
-0.814662	It is quite difficult	-0.124939
-0.444303	that is slow, difficult	-0.124939
-0.823378	to the instruction set.	-0.124939
-0.295026	the same instruction set.	-0.124939
-0.295026	for each instruction set.	-0.124939
-0.057129	the available instruction set.	-0.425969
-0.295026	the necessary instruction set.	-0.124939
-0.383829	the specific instruction set.	-0.124939
-1.196133	the AVX instruction set.	-0.124939
-0.486250	the supported instruction set.	-0.124939
-1.249269	or later instruction set.	-0.124939
-0.450781	a higher instruction set.	-0.124939
-0.319505	or higher instruction set.	-0.124939
-0.540291	the desired instruction set.	-0.124939
-0.295026	a given instruction set.	-0.124939
-0.295026	the current instruction set.	-0.124939
-0.295026	a lower instruction set.	-0.124939
-0.590135	the newest instruction set.	-0.124939
-0.295026	the selected instruction set.	-0.124939
-0.295026	the specified instruction set.	-0.124939
-0.295026	the FMA4 instruction set.	-0.124939
-0.295026	the corresponding instruction set.	-0.124939
-0.549220	lines in each set.	-0.124939
-0.504622	an inline function instead	-0.124939
-0.463179	inserts built-in code instead	-0.124939
-0.593774	using short int instead	-0.124939
-0.725050	of test data instead	-0.124939
-0.357638	it has i instead	-0.124939
-0.357556	8.15a were float instead	-0.124939
-0.588417	to the object instead	-0.124939
-0.830616	a lookup table instead	-0.124939
-1.006626	stored in registers instead	-0.124939
-0.460322	error handling system instead	-0.124939
-0.453228	by using references instead	-0.124939
-1.128357	performance monitor counters instead	-0.124939
-0.446091	effect with templates instead	-0.124939
-0.341776	example use #if instead	-0.124939
-0.269127	by using rounding instead	-0.124939
-0.269127	this). Use rounding instead	-0.124939
-0.331747	48 Use macros instead	-0.124939
-0.595029	intermediate file format instead	-0.124939
-0.407982	the option -fpie instead	-0.124939
-0.314610	const or typedef instead	-0.124939
-0.294098	char (or int) instead	-0.124939
-0.237804	(& and |) instead	-0.124939
-1.376795	are based on compilers.	-0.124939
-1.297654	versions for different compilers.	-0.124939
-0.807678	compiled with different compilers.	-0.124939
-0.645845	on seven different compilers.	-0.124939
-0.551029	different in other compilers.	-0.124939
-0.817372	used with other compilers.	-0.124939
-0.568568	works with all compilers.	-0.124939
-0.841237	work on all compilers.	-0.124939
-0.787183	Clang and Intel compilers.	-0.124939
-0.447808	developed as C++ compilers.	-0.124939
-0.543679	of Intel C++ compilers.	-0.124939
-0.346441	most modern C++ compilers.	-0.124939
-0.584777	expensive in some compilers.	-0.124939
-0.349395	Intel and Gnu compilers.	-0.124939
-0.242372	PathScale and Gnu compilers.	-0.124939
-0.456312	specific to Microsoft compilers.	-0.124939
-0.087896	Gnu and PathScale compilers.	-0.124939
-0.198216	Microsoft and PathScale compilers.	-0.124939
-0.343674	not compatible across compilers.	-0.124939
-0.339426	Gnu and Clang compilers.	-0.124939
-0.325321	for the commercial compilers.	-0.124939
-0.592233	pointer which is transferred	-0.124939
-0.249365	way m is transferred	-0.124939
-0.358363	function, m is transferred	-0.124939
-0.358033	and ownership is transferred	-0.124939
-0.086316	parameters to be transferred	-0.726999
-0.575779	parameters would be transferred	-0.124939
-0.176537	the parameters are transferred	-0.124939
-0.176537	function parameters are transferred	-0.124939
-0.266817	point parameters are transferred	-0.124939
-0.079360	integer parameters are transferred	-0.425969
-0.176537	four parameters are transferred	-0.124939
-0.113232	Function parameters are transferred	-0.301030
-0.351676	and r are transferred	-0.124939
-0.463354	deleted, copied or transferred	-0.124939
-0.572382	PC and then transferred	-0.124939
-0.537489	Arrays are always transferred	-0.124939
-1.180533	function that is longer	-0.124939
-1.189755	converted to a longer	-0.124939
-1.230350	the cost of longer	-0.124939
-0.596762	method should be longer	-0.124939
-0.358295	the method. A longer	-0.124939
-0.586608	object is no longer	-0.124939
-0.440617	that are no longer	-0.124939
-0.440617	they are no longer	-0.124939
-0.525462	one that takes longer	-0.124939
-0.477732	started. It takes longer	-0.124939
-0.486681	unsigned integer takes longer	-0.124939
-0.621394	Integer multiplication takes longer	-0.124939
-0.539993	appear to take longer	-0.124939
-0.332697	multiplication would take longer	-0.124939
-0.332697	and division take longer	-0.124939
-0.332697	additions. Divisions take longer	-0.124939
-0.577648	switches by making longer	-0.124939
-0.097024	division takes much longer	-0.425969
-0.222153	truncation takes much longer	-0.124939
-0.558046	in the matrix longer	-0.124939
-0.352699	are one byte longer	-0.124939
-0.065613	time before and after	-0.124939
-0.065613	program before and after	-0.124939
-0.065613	counter before and after	-0.124939
-0.065613	count before and after	-0.124939
-0.358636	data member or after	-0.124939
-1.070172	are the same after	-0.124939
-0.534901	needed, but only after	-0.124939
-0.354765	doing things only after	-0.124939
-0.583719	access an object after	-0.124939
-0.581128	Sort the array after	-0.124939
-0.827077	object is accessed after	-0.124939
-0.458785	do the check after	-0.124939
-0.527304	two clock cycles after	-0.124939
-1.092594	few clock cycles after	-0.124939
-0.647874	Is searching needed after	-0.124939
-0.349020	are then output after	-0.124939
-0.343641	any necessary destructors after	-0.124939
-0.498731	The context switches after	-0.124939
-0.314649	may be removed after	-0.124939
-0.294135	to execute _mm_empty() after	-0.124939
-0.237837	is to resume after	-0.124939
-0.237837	will remain locked after	-0.124939
-0.525227	get used to read	-0.124939
-1.159821	we want to read	-0.124939
-0.831678	clock cycles to read	-0.124939
-0.357528	testing. Trying to read	-0.124939
-0.357528	and WritePrivateProfileString to read	-0.124939
-0.358822	memory buffer and read	-0.124939
-0.596168	line to be read	-0.124939
-0.599338	x can be read	-0.124939
-0.583939	a must be read	-0.124939
-0.594471	as you can read	-0.124939
-0.575320	time. Do not read	-0.124939
-0.590750	Furthermore, you may read	-0.124939
-0.589612	16. If you read	-0.124939
-0.569704	The code will read	-0.124939
-0.354773	language need only read	-0.124939
-0.354773	implementation would only read	-0.124939
-0.358040	16.2 above, but read	-0.124939
-0.461660	evicted when we read	-0.124939
-0.343658	were splitting 256-bit read	-0.124939
-0.339366	the program had read	-0.124939
-0.325301	than an uncached read	-0.124939
-0.294154	expect to 99 read	-0.124939
-0.855508	assembly code to give	-0.124939
-1.936650	is possible to give	-0.124939
-0.357868	hyperthreading processor to give	-0.124939
-0.357868	wherever appropriate to give	-0.124939
-0.562213	an overflow and give	-0.124939
-0.358358	an underflow and give	-0.124939
-0.463406	The table can give	-0.124939
-1.004135	the class or give	-0.124939
-0.589942	unit-test does not give	-0.124939
-0.358480	operator (^) may give	-0.124939
-0.355660	x.f; // will give	-0.124939
-0.459483	header file will give	-0.124939
-0.833013	CPU dispatcher should give	-0.124939
-0.582634	Linux operating systems give	-0.124939
-0.355767	CPUID instruction doesn't give	-0.124939
-0.651506	because this would give	-0.124939
-0.349048	unreliable. They sometimes give	-0.124939
-0.516403	solution can still give	-0.124939
-0.446162	The subsequent counts give	-0.124939
-0.279397	and negative inputs give	-0.124939
-0.279397	12. Higher inputs give	-0.124939
-0.758918	piece of code. Each	-0.124939
-0.356685	at installation time. Each	-0.124939
-0.348955	of extra resources. Each	-0.124939
-0.450170	of 64 bytes. Each	-0.124939
-1.130378	on the stack. Each	-0.124939
-0.347355	implementing polymorphic classes. Each	-0.124939
-0.635029	floating point instructions. Each	-0.124939
-0.534578	64-bit execution units. Each	-0.124939
-0.339367	two jobs simultaneously. Each	-0.124939
-1.031689	into multiple threads. Each	-0.124939
-0.336176	multiple processor cores. Each	-0.124939
-0.421269	library at initialization. Each	-0.124939
-0.325242	exception handling information. Each	-0.124939
-0.502364	at the diagonal. Each	-0.124939
-0.835939	the following reasons: Each	-0.124939
-0.714577	a linked list. Each	-0.124939
-0.237804	member functions (methods) Each	-0.124939
-0.237804	size of 64. Each	-0.124939
-0.237804	output are unacceptable. Each	-0.124939
-0.237804	Y and Z. Each	-0.124939
-0.237804	with line 29. Each	-0.124939
-0.889433	event that it becomes	-0.124939
-0.593551	static then it becomes	-0.124939
-0.967744	and the code becomes	-0.124939
-1.158007	that the code becomes	-0.124939
-0.561653	because the code becomes	-0.124939
-0.561653	contrary, the code becomes	-0.124939
-0.486951	122. The code becomes	-0.124939
-0.486951	contiguous. The code becomes	-0.124939
-0.486951	tedious. The code becomes	-0.124939
-0.511453	resulting machine code becomes	-0.124939
-0.589352	Unrolling a loop becomes	-0.124939
-1.003266	nontemporal write instructions becomes	-0.124939
-0.653472	communication between threads becomes	-0.124939
-0.547945	28. The calculation becomes	-0.124939
-1.034941	time stamp counter becomes	-0.124939
-0.781386	The memory space becomes	-0.124939
-0.471475	The heap space becomes	-0.124939
-0.332925	fragmented and caching becomes	-0.124939
-0.332925	big that caching becomes	-0.124939
-0.353382	memory space never becomes	-0.124939
-0.237886	a menu click becomes	-0.124939
-0.575559	Assume pointer is aligned	-0.124939
-1.407575	have to be aligned	-0.124939
-0.594800	class should be aligned	-0.124939
-0.583953	vectors must be aligned	-0.124939
-1.005492	the data are aligned	-0.124939
-0.496901	that data are aligned	-0.124939
-0.607863	the arrays are aligned	-0.124939
-0.356708	use #pragma vector aligned	-0.124939
-0.142717	aligned #pragma vector aligned	-0.124939
-1.361370	how to make aligned	-0.124939
-1.155773	Function to store aligned	-0.124939
-0.544114	malloc is typically aligned	-0.124939
-0.353830	vectors are preferably aligned	-0.124939
-0.352865	// Make three aligned	-0.124939
-1.137441	Function to load aligned	-0.124939
-0.451134	instances of S1 aligned	-0.124939
-0.242375	0.12 memcpy 16kB aligned	-0.124939
-0.242375	Processor memcpy 16kB aligned	-0.124939
-0.294173	arrays are properly aligned	-0.124939
-0.358833	many keywords and directives	-0.124939
-0.567936	Many of these directives	-0.124939
-0.735423	Note that these directives	-0.124939
-0.861898	of the Gnu directives	-0.124939
-0.355605	and Fortran. These directives	-0.124939
-0.648362	means of #include directives	-0.124939
-0.518817	of the Microsoft directives	-0.124939
-0.321792	at runtime. #define directives	-0.124939
-0.321792	a name. #define directives	-0.124939
-0.345184	Table 18.2. Compiler directives	-0.124939
-0.120411	handling. 8.6 Optimization directives	-0.124939
-0.120411	81 8.6 Optimization directives	-0.124939
-0.102871	of the OpenMP directives	-0.124939
-0.102871	Supports the OpenMP directives	-0.124939
-0.237930	Parallelization by OpenMP directives	-0.124939
-0.279443	should have #if directives	-0.124939
-0.279443	is compiled. #if directives	-0.124939
-0.065793	Preprocessing directives Preprocessing directives	-0.124939
-0.031652	code. 7.32 Preprocessing directives	-0.124939
-0.031652	65 7.32 Preprocessing directives	-0.124939
-0.294173	library has preprocessing directives	-0.124939
-0.503835	Any language that requires	-0.124939
-0.358082	strict formalism that requires	-0.124939
-0.590732	seconds because it requires	-0.124939
-0.586470	core, but it requires	-0.124939
-0.356399	Most importantly, it requires	-0.124939
-0.779052	the program. This requires	-0.124939
-0.652531	memory block. This requires	-0.124939
-0.355245	this option. This requires	-0.124939
-0.358387	operating system, this requires	-0.124939
-0.358315	C library. It requires	-0.124939
-0.356802	other hardware often requires	-0.124939
-0.243850	between two pointers requires	-0.124939
-0.243850	Comparing two pointers requires	-0.124939
-0.509197	array. This method requires	-0.124939
-0.509197	loaded. This method requires	-0.124939
-0.356230	on such processors requires	-0.124939
-0.355351	enabled (single precision requires	-0.124939
-0.355203	} This calculation requires	-0.124939
-0.351976	of intrinsic vectors requires	-0.124939
-0.351592	remote databases usually requires	-0.124939
-0.294163	etc. Event-based sampling requires	-0.124939
-0.557769	from doing the optimizations	-0.124939
-0.969635	this kind of optimizations	-0.124939
-0.526910	8.1. Comparison of optimizations	-0.124939
-0.896874	of the compiler optimizations	-0.124939
-0.458233	result of other optimizations	-0.124939
-0.354676	makes various other optimizations	-0.124939
-0.551701	indication of which optimizations	-0.124939
-0.520735	do and which optimizations	-0.124939
-0.354359	turning off all optimizations	-0.124939
-0.354359	and prevents all optimizations	-0.124939
-1.177234	necessary to do optimizations	-0.124939
-0.495121	warning for such optimizations	-0.124939
-0.495121	will do such optimizations	-0.124939
-0.350513	it from making optimizations	-0.124939
-0.678572	compiler from making optimizations	-0.124939
-0.500442	make CPU- specific optimizations	-0.124939
-0.753877	compiler from doing optimizations	-0.124939
-0.549955	statement can improve optimizations	-0.124939
-0.447807	which will enable optimizations	-0.124939
-0.325340	to do interprocedural optimizations	-0.124939
-0.237853	to do cross-module optimizations	-0.124939
-0.901430	power of the graphics	-0.124939
-0.332047	call to a graphics	-0.301030
-0.593322	platform with a graphics	-0.124939
-0.551485	processors on a graphics	-0.124939
-0.551485	either on a graphics	-0.124939
-0.586233	systems have a graphics	-0.124939
-1.135486	The calculation of graphics	-0.124939
-1.105706	different types of graphics	-0.124939
-0.358661	graphics coprocessor or graphics	-0.124939
-1.872001	there is no graphics	-0.124939
-0.459786	avoid the large graphics	-0.124939
-0.576409	Typically, a specific graphics	-0.124939
-0.352324	extra resources. Each graphics	-0.124939
-0.364750	of the heavy graphics	-0.124939
-0.279413	example, a heavy graphics	-0.124939
-0.314732	does not cover graphics	-0.124939
-0.294173	of a third-party graphics	-0.124939
-0.294173	processing unit. Various graphics	-0.124939
-0.237869	purposes than rendering graphics	-0.124939
-1.182123	reference to a public	-0.124939
-0.996772	to access a public	-0.124939
-0.504114	And whenever a public	-0.124939
-0.358905	allows overriding of public	-0.124939
-0.473906	to functions and public	-0.124939
-0.676107	public functions and public	-0.124939
-0.358425	You can't have public	-0.124939
-0.579344	GOT for all public	-0.124939
-0.357373	by avoiding any public	-0.124939
-0.047862	class D : public	-0.124939
-0.047862	class C1 : public	-0.124939
-0.047862	class CChild1 : public	-0.425969
-0.234634	class CChild2 : public	-0.124939
-0.234634	class CParent : public	-0.124939
-0.234634	class C2 : public	-0.124939
-0.354449	need relocation. All public	-0.124939
-0.382816	ability to override public	-0.124939
-0.237894	: public B1, public	-0.124939
-0.323327	class C1 { public:	-0.124939
-0.323327	class powN { public:	-0.124939
-0.083550	class CHello { public:	-0.124939
-0.039769	public CHello { public:	-0.425969
-0.132245	class C0 { public:	-0.124939
-0.132245	public C0 { public:	-0.124939
-0.591375	public CParent<CChild1> { public:	-0.124939
-0.132245	class CGrandParent { public:	-0.124939
-0.132245	public CGrandParent { public:	-0.124939
-0.323327	public B2 { public:	-0.124939
-0.323327	public B1 { public:	-0.124939
-0.323327	class powN<true,0> { public:	-0.124939
-0.323327	class powN<true,N> { public:	-0.124939
-0.323327	class S2 { public:	-0.124939
-0.323327	class S3 { public:	-0.124939
-0.323327	public CParent<CChild2> { public:	-0.124939
-0.323327	class powN<true,1> { public:	-0.124939
-0.495890	const int x; public:	-0.124939
-0.325411	2-dimensional vector 56 public:	-0.124939
-0.237943	protected: T a[N]; public:	-0.124939
-0.901181	installation of the framework	-0.124939
-0.884627	to load the framework	-0.124939
-0.463265	using such a framework	-0.124939
-0.463265	supply such a framework	-0.124939
-0.526929	intermediate code. This framework	-0.124939
-0.565336	choose a software framework	-0.124939
-0.566220	Such an extra framework	-0.124939
-0.453465	and the runtime framework	-0.124939
-0.127081	a large runtime framework	-0.124939
-0.127081	very large runtime framework	-0.124939
-0.326914	a specific graphics framework	-0.124939
-0.326914	a third-party graphics framework	-0.124939
-0.324225	of user interface framework	-0.124939
-0.321830	the high level framework	-0.124939
-0.220680	a high level framework	-0.124939
-0.336242	on a complex framework	-0.124939
-0.188091	for the .NET framework	-0.124939
-0.059920	frameworks. The .NET framework	-0.124939
-0.059920	mouse. The .NET framework	-0.124939
-0.129450	in Microsoft's .NET framework	-0.124939
-1.469635	is necessary to look	-0.124939
-0.857925	compiler needs to look	-0.124939
-0.589865	a compiler can look	-0.124939
-0.540754	dispatcher should not look	-0.124939
-0.587226	108 You may look	-0.124939
-0.354570	C++ implementation may look	-0.124939
-0.354570	test setup may look	-0.124939
-0.588054	and if you look	-0.124939
-0.585509	efficient. If you look	-0.124939
-0.354412	are: When you look	-0.124939
-0.462807	embedded systems. A look	-0.124939
-0.526354	// It will look	-0.124939
-0.526440	case. Intrinsic functions look	-0.124939
-0.578155	that you should look	-0.124939
-0.525768	you may also look	-0.124939
-0.461041	necessary to first look	-0.124939
-0.353755	This may typically look	-0.124939
-0.212252	the code. Let's look	-0.124939
-0.212252	4 rows. Let's look	-0.124939
-0.294154	For example, let's look	-0.124939
-0.294154	self-relative address. (3) look	-0.124939
-0.472516	mechanism of static linking	-0.124939
-0.321016	clear that static linking	-0.124939
-0.321016	not if static linking	-0.124939
-0.321016	file when static linking	-0.124939
-0.333318	of using static linking	-0.124939
-0.229756	by using static linking	-0.124939
-0.321016	this requires static linking	-0.124939
-0.321016	to specify static linking	-0.124939
-0.767598	advantages of dynamic linking	-0.124939
-0.317144	application if dynamic linking	-0.124939
-0.317144	rather than dynamic linking	-0.124939
-0.317144	cases where dynamic linking	-0.124939
-0.317144	application, while dynamic linking	-0.124939
-0.252283	is used. Dynamic linking	-0.124939
-0.252283	end user. Dynamic linking	-0.124939
-0.108074	19 3.6 Dynamic linking	-0.124939
-0.108074	acceptable. 3.6 Dynamic linking	-0.124939
-0.347498	/Gr Function level linking	-0.124939
-0.343738	assembly or easy linking	-0.124939
-0.237947	bit code Static linking	-0.124939
-0.237947	linking are: Static linking	-0.124939
-1.210268	in the code. Many	-0.124939
-0.355047	for speed-critical functions. Many	-0.124939
-0.354249	start to program. Many	-0.124939
-0.712656	are discussed below. Many	-0.124939
-0.455146	on newer processors. Many	-0.124939
-0.644442	in a compiler. Many	-0.124939
-1.128357	performance monitor counters Many	-0.124939
-0.634990	3.4 Automatic updates Many	-0.124939
-0.939576	automatic CPU dispatching. Many	-0.124939
-0.444265	for 64-bit integers. Many	-0.124939
-0.339336	very inefficient solution. Many	-0.124939
-0.429332	with other microprocessors. Many	-0.124939
-0.594930	3.9 Other databases Many	-0.124939
-0.325242	all relevant options. Many	-0.124939
-0.741790	for user input. Many	-0.124939
-0.294098	Reference Manual". developer.intel.com. Many	-0.124939
-0.294098	to be slower. Many	-0.124939
-0.237804	workday or more. Many	-0.124939
-0.237804	and system breakdown. Many	-0.124939
-0.237804	software. Background services. Many	-0.124939
-0.237804	unknown processors properly. Many	-0.124939
-0.358199	and scientific vector processors.	-0.124939
-0.503926	fastest on different processors.	-0.124939
-0.444857	only with Intel processors.	-0.124939
-0.377795	not on Intel processors.	-0.124939
-0.377795	works on Intel processors.	-0.124939
-0.344103	and later Intel processors.	-0.124939
-0.760460	only on some processors.	-0.124939
-0.354999	handle only known processors.	-0.124939
-0.352324	not cover graphics processors.	-0.124939
-0.636011	AMD and VIA processors.	-0.124939
-0.556830	number of logical processors.	-0.124939
-0.304352	but eight logical processors.	-0.124939
-0.416882	bytes) on future processors.	-0.124939
-0.294197	rather than future processors.	-0.124939
-0.294197	fast on newer processors.	-0.124939
-0.294197	on most newer processors.	-0.124939
-0.341810	implemented in PC processors.	-0.124939
-0.929086	on the newest processors.	-0.124939
-0.408079	bytes on contemporary processors.	-0.124939
-0.594579	library that is actually	-0.124939
-0.597473	0.666666666666666666667; This is actually	-0.124939
-0.855354	calculation time is actually	-0.124939
-1.718429	the program is actually	-0.124939
-0.462095	= 16 is actually	-0.124939
-0.541208	1% goes to actually	-0.124939
-0.575458	extra code for actually	-0.124939
-0.866222	the functions are actually	-0.124939
-0.575342	Gnu compilers are actually	-0.124939
-0.828253	Modern CPUs are actually	-0.124939
-0.356376	power consumption are actually	-0.124939
-0.591897	files. This can actually	-0.124939
-0.557422	bigger than it actually	-0.124939
-1.371052	the compiler may actually	-0.124939
-0.356527	instruction set may actually	-0.124939
-0.357778	the original pointer actually	-0.124939
-1.028541	that the user actually	-0.124939
-0.325321	if your modifications actually	-0.124939
-0.314688	in case F2 actually	-0.124939
-0.294173	name "position-independent code" actually	-0.124939
-0.237869	100*16, and temp++ actually	-0.124939
-0.027884	The microarchitecture of Intel,	-0.124939
-0.003389	"The microarchitecture of Intel,	-1.028029
-0.358818	micro-operation breakdowns for Intel,	-0.124939
-0.559469	platform with an Intel,	-0.124939
-0.951957	is not an Intel,	-0.124939
-0.358270	of microprocessors from Intel,	-0.124939
-0.357890	vectorclass.h Supported compilers Intel,	-0.124939
-0.354162	tool supports both Intel,	-0.124939
-0.456343	compilers Intel, Microsoft Intel,	-0.124939
-0.086012	for the Microsoft, Intel,	-0.124939
-0.086012	as the Microsoft, Intel,	-0.124939
-0.193373	supported by Microsoft, Intel,	-0.124939
-0.193373	x86 platforms. Microsoft, Intel,	-0.124939
-0.898531	the Gnu, Clang, Intel,	-0.124939
-0.586054	address of a linked	-0.124939
-0.871647	form of a linked	-0.124939
-0.894430	element in a linked	-0.124939
-0.549174	not as a linked	-0.124939
-0.549174	than as a linked	-0.124939
-1.198940	implemented as a linked	-0.124939
-0.877201	not use a linked	-0.124939
-0.583933	Walking through a linked	-0.124939
-0.593680	and can be linked	-0.124939
-1.171530	library can be linked	-0.124939
-0.886806	versions should be linked	-0.124939
-0.501605	most cases be linked	-0.124939
-0.358775	the modules are linked	-0.124939
-0.882013	are faster than linked	-0.124939
-0.358385	Many containers use linked	-0.124939
-0.459578	next block. A linked	-0.124939
-0.355735	linked lists. A linked	-0.124939
-0.504165	files are then linked	-0.124939
-0.565890	of library functions linked	-0.124939
-0.350329	addresses of dynamically linked	-0.124939
-0.237894	runtime DLL's (dynamically linked	-0.124939
-0.463343	float xn = x;	-0.124939
-0.875485	{ const int x;	-0.124939
-0.065289	double Table[100]; int x;	-0.124939
-1.415497	int i; } x;	-0.124939
-1.118009	int i; float x;	-0.124939
-0.347041	i, j; float x;	-0.124939
-0.347041	Example 7.27 float x;	-0.124939
-0.539935	= x * x;	-0.124939
-1.600366	x) { return x;	-0.124939
-0.341837	i++) matrix[FuncRow(i)][FuncCol(i)] += x;	-0.124939
-0.341837	39 matrix[i][j] += x;	-0.124939
-0.132892	x; x *= x;	-0.124939
-0.132892	1) y *= x;	-0.124939
-0.029608	x++) factorial *= x;	-0.124939
-0.132892	nfac; xn *= x;	-0.124939
-0.477954	F1() { C1 x;	-0.124939
-0.093303	c:2; }; Bitfield x;	-0.124939
-0.093303	abc; }; Bitfield x;	-0.124939
-0.294191	fld qword ptr x;	-0.124939
-0.550314	competing brands of microprocessors	-0.124939
-0.358564	x86 family of microprocessors	-0.124939
-0.575580	how compilers and microprocessors	-0.124939
-0.525640	versions of Intel microprocessors	-0.124939
-0.515514	buffer that some microprocessors	-0.124939
-0.515514	normal on some microprocessors	-0.124939
-0.965674	in the way microprocessors	-0.124939
-0.121418	compatible with old microprocessors	-0.124939
-0.295755	speed of modern microprocessors	-0.124939
-0.295755	core of modern microprocessors	-0.124939
-0.258179	inefficient. The modern microprocessors	-0.124939
-0.369725	almost all modern microprocessors	-0.124939
-0.345179	system All newer microprocessors	-0.124939
-0.193366	and operators Modern microprocessors	-0.124939
-0.193366	Dependency chains Modern microprocessors	-0.124939
-0.193366	branch prediction. Modern microprocessors	-0.124939
-0.193366	prediction mechanisms. Modern microprocessors	-0.124939
-0.325331	compatibility with older microprocessors	-0.124939
-0.325393	if possible. Smaller microprocessors	-0.124939
-0.237877	vector operations Today's microprocessors	-0.124939
-1.106170	more time to load	-0.124939
-0.355513	the cache to load	-0.124939
-1.790979	it takes to load	-0.124939
-1.031508	the need to load	-0.124939
-0.582650	it necessary to load	-0.124939
-0.316373	// Function to load	-0.425969
-1.004331	it needs to load	-0.124939
-0.500246	all writes to load	-0.124939
-0.463502	execute it. The load	-0.124939
-0.584999	system may not load	-0.124939
-0.569738	optimized code will load	-0.124939
-0.459111	compilers. Dispatch at load	-0.124939
-0.355367	need relocation at load	-0.124939
-0.156414	when the work load	-0.425969
-1.154336	to a specific load	-0.124939
-0.487981	fit the actual load	-0.124939
-0.237902	part of it) load	-0.124939
-0.876667	easy way to control	-0.124939
-0.463190	various options to control	-0.124939
-0.164013	in the loop control	-0.124939
-0.913298	if the loop control	-0.124939
-0.430703	by the loop control	-0.124939
-0.609180	then the loop control	-0.124939
-0.430703	where the loop control	-0.124939
-0.430703	predict the loop control	-0.124939
-0.430703	execute the loop control	-0.124939
-0.430703	evaluate the loop control	-0.124939
-0.525913	7.30b. The loop control	-0.124939
-0.321412	most efficient loop control	-0.124939
-0.321412	The i<20 loop control	-0.124939
-0.350110	are other cache control	-0.124939
-0.064673	9.11 Explicit cache control	-0.124939
-0.556140	BSD Instruction set control	-0.124939
-0.357484	use a version control	-0.124939
-0.339487	Table 9.2. Cache control	-0.124939
-1.080056	the compiler to assume	-0.124939
-0.583932	it has to assume	-0.124939
-0.357881	not permissible to assume	-0.124939
-0.659696	the problem and assume	-0.124939
-0.829800	cases, you can assume	-0.124939
-0.829800	general, you can assume	-0.124939
-0.530480	code. You can assume	-0.124939
-0.530480	result. You can assume	-0.124939
-0.504662	ignore overflow or assume	-0.124939
-0.358487	Neither can you assume	-0.124939
-0.515822	2.5f; If we assume	-0.124939
-0.351114	function which we assume	-0.124939
-0.967269	then you cannot assume	-0.124939
-0.558320	cycles. You cannot assume	-0.124939
-0.458327	optimizing compiler would assume	-0.124939
-0.114059	you can generally assume	-0.124939
-0.114059	You can generally assume	-0.124939
-0.237886	that compiler makers assume	-0.124939
-0.237886	compiler can safely assume	-0.124939
-0.331606	int size = 100;	-0.221849
-0.352413	int ARRAYSIZE = 100;	-0.124939
-0.352413	100, NUMCOLUMNS = 100;	-0.124939
-0.054796	0; x < 100;	-0.425969
-0.493010	0; i < 100;	-0.865301
-0.599321	Assume that the numbers	-0.124939
-0.877802	because all the numbers	-0.124939
-0.659322	to hold the numbers	-0.124939
-1.075295	to floating point numbers	-0.124939
-0.996114	and floating point numbers	-0.124939
-1.055767	from floating point numbers	-0.124939
-0.186786	between floating point numbers	-0.425969
-0.521580	positive floating point numbers	-0.124939
-0.754737	nonzero floating point numbers	-0.124939
-0.559298	variables Floating point numbers	-0.124939
-0.439652	times with four numbers	-0.124939
-0.339971	only have four numbers	-0.124939
-0.354894	can have eight numbers	-0.124939
-0.922217	family and model numbers	-0.124939
-0.698802	that processor model numbers	-0.124939
-0.339441	array of thousand numbers	-0.124939
-0.294210	than generating denormal numbers	-0.124939
-0.294210	we use hexadecimal numbers	-0.124939
-0.237902	of data (low numbers	-0.124939
-0.591835	implemented on a platform	-0.124939
-1.162080	The choice of platform	-0.124939
-1.026484	to a different platform	-0.124939
-0.551337	_WIN64 64 bit platform	-0.124939
-0.341115	identification 16 bit platform	-0.124939
-0.544916	161 32 bit platform	-0.124939
-0.355842	_WIN64 _LP64 Windows platform	-0.124939
-0.355377	_WIN32 _WIN32 Linux platform	-0.124939
-0.463980	on the hardware platform	-0.124939
-0.027318	choice of hardware platform	-0.124939
-0.027318	Choice of hardware platform	-0.124939
-0.442478	Choosing the optimal platform	-0.124939
-0.518817	in the Microsoft platform	-0.124939
-0.351662	__unix__ __linux__ x86 platform	-0.124939
-1.044027	the standard PC platform	-0.124939
-0.325355	_M_IX86 _M_IX86 x86-64 platform	-0.124939
-0.314688	considerations of efficiency, platform	-0.124939
-1.782290	the code is later	-0.124939
-0.558458	one function and later	-0.124939
-0.356088	for SSE2 and later	-0.124939
-0.356088	the AVX and later	-0.124939
-0.654205	the SSE and later	-0.124939
-0.356088	Intel Core and later	-0.124939
-0.356088	the pipeline and later	-0.124939
-0.460027	address 0x2710 and later	-0.124939
-0.358827	very helpful for later	-0.124939
-0.038541	the SSE2 or later	-0.726999
-0.140970	the AVX or later	-0.124939
-0.140970	for AVX or later	-0.124939
-0.351018	version 2.20 or later	-0.124939
-0.351018	the Pentium-II or later	-0.124939
-0.597146	expect to use later	-0.124939
-0.356238	SVML v.10.3 & later	-0.124939
-0.586012	20 clock cycles later	-0.124939
-1.961076	of the code together	-0.124939
-0.364450	that are used together	-0.602060
-0.944766	all the objects together	-0.124939
-0.454060	store many objects together	-0.124939
-0.350063	should be stored together	-0.124939
-0.819947	class are stored together	-0.124939
-1.016842	are also stored together	-0.124939
-0.320087	are always stored together	-0.124939
-0.705477	can be linked together	-0.124939
-0.326948	are then linked together	-0.124939
-0.545641	want to keep together	-0.124939
-0.429497	whole software project together	-0.124939
-0.659972	will be joined together	-0.124939
-0.593826	function then the dispatch	-0.124939
-0.590181	program where the dispatch	-0.124939
-0.572085	places making the dispatch	-0.124939
-0.504014	It uses the dispatch	-0.124939
-0.725107	to implement the dispatch	-0.124939
-0.462724	compiler bypassing the dispatch	-0.124939
-0.358923	is called, a dispatch	-0.124939
-0.358917	makes sense to dispatch	-0.124939
-0.813623	the virtual function dispatch	-0.124939
-0.449292	inefficient virtual function dispatch	-0.124939
-1.156565	that the CPU dispatch	-0.124939
-0.462625	Implementation The CPU dispatch	-0.124939
-0.462625	mechanism. The CPU dispatch	-0.124939
-0.429370	developed. A CPU dispatch	-0.124939
-0.331778	set Automatic CPU dispatch	-0.124939
-0.331778	have similar CPU dispatch	-0.124939
-0.134948	122 13.1 CPU dispatch	-0.124939
-0.134948	loops. 13.1 CPU dispatch	-0.124939
-0.331778	use inappropriate CPU dispatch	-0.124939
-0.352945	wasted on runtime dispatch	-0.124939
-0.898224	interface to the calling	-0.124939
-0.557604	code which the calling	-0.124939
-0.726733	powN template is calling	-0.124939
-0.504886	cleaning up and calling	-0.124939
-0.504844	global object. The calling	-0.124939
-1.297729	is useful for calling	-0.124939
-0.581598	because the function calling	-0.124939
-0.863087	changes the function calling	-0.124939
-0.458624	that make function calling	-0.124939
-0.945147	the same function calling	-0.124939
-0.356041	a file by calling	-0.124939
-1.312674	be avoided by calling	-0.124939
-0.356041	be prevented by calling	-0.124939
-1.134469	as fast as calling	-0.124939
-0.874519	cycles more than calling	-0.124939
-0.349384	temporary array before calling	-0.124939
-0.922660	call _mm256_zeroupper() before calling	-0.124939
-0.500466	obey any specific calling	-0.124939
-0.827311	to the standard calling	-0.124939
-1.085978	in manual 5: calling	-0.124939
-0.358888	the lifetime of your	-0.124939
-0.358889	get answers to your	-0.124939
-0.358400	decimal point in your	-0.124939
-0.358400	a switch in your	-0.124939
-0.462634	not necessary for your	-0.124939
-0.787247	the manual for your	-0.124939
-0.358798	to remember that your	-0.124939
-0.965821	to check if your	-0.124939
-0.357417	other hand, if your	-0.124939
-0.495404	If you make your	-0.124939
-0.646194	you must make your	-0.124939
-0.352038	or better, make your	-0.124939
-0.358088	to CriticalFunction. If your	-0.124939
-0.357028	several years before your	-0.124939
-0.355321	performance counters inside your	-0.124939
-0.457538	You may write your	-0.124939
-0.354036	stack frame unless your	-0.124939
-0.507141	efficient to define your	-0.124939
-0.294154	please don't send your	-0.124939
-0.237853	the code. Inserting your	-0.124939
-0.392208	return to its own	-0.124939
-0.344679	object in its own	-0.124939
-0.375780	run on its own	-0.124939
-0.344679	not have its own	-0.124939
-0.139007	thread has its own	-0.124939
-0.139007	list has its own	-0.124939
-0.139007	model has its own	-0.124939
-0.262855	each thread its own	-0.124939
-0.262855	(1) get its own	-0.124939
-0.262855	then handle its own	-0.124939
-0.332928	reductions at their own	-0.124939
-0.332928	rarely program their own	-0.124939
-0.192325	you make your own	-0.124939
-0.192325	better, make your own	-0.124939
-0.254098	may write your own	-0.124939
-0.254098	to define your own	-0.124939
-0.254098	code. Inserting your own	-0.124939
-0.414361	based on my own	-0.124939
-0.319748	code. For my own	-0.124939
-0.345243	Linux. Asmlib My own	-0.124939
-0.463302	a class is declared	-0.124939
-0.659336	template class is declared	-0.124939
-0.573686	table should be declared	-0.124939
-0.848082	arrays should be declared	-0.124939
-0.581540	threads must be declared	-0.124939
-0.913939	should preferably be declared	-0.124939
-0.569184	parameters that are declared	-0.124939
-1.261431	Variables that are declared	-0.124939
-1.068373	which they are declared	-0.124939
-0.551947	how they are declared	-0.124939
-0.358561	seconds was not declared	-0.124939
-0.367733	variables and objects declared	-0.124939
-0.513076	Variables and objects declared	-0.124939
-0.439049	if any objects declared	-0.124939
-0.461571	used for variables declared	-0.124939
-0.450199	functions A macro declared	-0.124939
-0.308369	a class Variables declared	-0.124939
-0.308369	course inefficient. Variables declared	-0.124939
-0.899730	variables in the XMM	-0.124939
-0.889068	will use the XMM	-0.124939
-0.196156	precision when the XMM	-0.425969
-0.563624	processor) when the XMM	-0.124939
-0.358881	point underflow in XMM	-0.124939
-0.358812	64-bit Windows). The XMM	-0.124939
-0.858312	has support for XMM	-0.124939
-0.560910	not check if XMM	-0.124939
-0.357426	is costly if XMM	-0.124939
-0.585064	n.a. Floating point XMM	-0.124939
-0.458751	good implementation uses XMM	-0.124939
-0.353996	- - Integer XMM	-0.124939
-0.353406	- 76 Boolean XMM	-0.124939
-0.514264	into a 128-bit XMM	-0.124939
-0.233576	example, a 128-bit XMM	-0.124939
-0.196010	MMX to 128-bit XMM	-0.124939
-0.087040	code. The 128-bit XMM	-0.124939
-0.087040	registers The 128-bit XMM	-0.124939
-0.331872	register stack versus XMM	-0.124939
-1.783602	value of the second	-0.124939
-1.067561	added to the second	-0.124939
-0.596499	(0); and the second	-0.124939
-1.059304	only in the second	-0.124939
-0.595333	together in the second	-0.124939
-1.063378	calculations on the second	-0.124939
-0.565388	true, then the second	-0.124939
-0.565388	false, then the second	-0.124939
-0.592513	done at the second	-0.124939
-0.573369	determines whether the second	-0.124939
-0.357456	and last the second	-0.124939
-0.592886	i by a second	-0.124939
-0.585401	go through a second	-0.124939
-0.358298	below. Installing a second	-0.124939
-0.567250	of code. The second	-0.124939
-0.747183	member functions. The second	-0.124939
-0.461361	polymorphic functions. The second	-0.124939
-0.356929	page 137). The second	-0.124939
-0.354654	// incremented every second	-0.124939
-0.237934	as price, compatibility, second	-0.124939
-0.171465	The following example shows	-0.425969
-0.160559	InstructionSet().The following example shows	-0.124939
-0.312126	The next example shows	-0.124939
-0.352142	element. The table shows	-0.124939
-0.352142	8.1. The table shows	-0.124939
-0.421364	expression. Example 12.4b shows	-0.124939
-0.294219	(see page 16) shows	-0.124939
-0.294219	on page 39 shows	-0.124939
-0.294219	on page 58 shows	-0.124939
-0.237910	8.1 (page 77) shows	-0.124939
-0.237910	section (page 131) shows	-0.124939
-0.917307	programming language and interface	-0.124939
-0.586935	that the user interface	-0.124939
-0.371073	on the user interface	-0.124939
-0.371073	place the user interface	-0.124939
-0.174555	choice of user interface	-0.124939
-0.078566	Choice of user interface	-0.124939
-0.348353	systems. The user interface	-0.124939
-0.241557	time. A user interface	-0.124939
-0.241557	simplest possible user interface	-0.124939
-0.241557	use standard user interface	-0.124939
-0.241557	code, including user interface	-0.124939
-0.222435	the graphical user interface	-0.124939
-0.500139	a graphical user interface	-0.124939
-0.222435	A graphical user interface	-0.124939
-0.241557	A popular user interface	-0.124939
-0.279508	(OWL). Several graphical interface	-0.124939
-0.279508	on system-specific graphical interface	-0.124939
-0.167676	and a well-defined interface	-0.124939
-0.255933	with a well-defined interface	-0.124939
-1.343110	be possible to improve	-0.124939
-1.421821	in order to improve	-0.124939
-1.773741	you want to improve	-0.124939
-0.357556	these methods to improve	-0.124939
-0.878776	stack. This can improve	-0.124939
-0.527558	cycles. You can improve	-0.124939
-0.527558	one. You can improve	-0.124939
-0.456096	hash table can improve	-0.124939
-0.456096	64-bit systems can improve	-0.124939
-0.456096	throw() statement can improve	-0.124939
-0.352991	(typically 64) can improve	-0.124939
-0.358572	this did not improve	-0.124939
-0.544440	advantages that may improve	-0.124939
-0.564768	entries. This may improve	-0.124939
-0.880702	then you may improve	-0.124939
-0.453224	because this may improve	-0.124939
-0.549784	can not only improve	-0.124939
-0.527366	valid) can possibly improve	-0.124939
-0.358947	by ignoring the higher	-0.124939
-0.527254	these functions is higher	-0.124939
-1.569813	There is a higher	-0.124939
-0.552112	file for a higher	-0.124939
-0.552112	designed for a higher	-0.124939
-0.571533	CPU with a higher	-0.124939
-0.571533	model with a higher	-0.124939
-0.583896	loaded at a higher	-0.124939
-0.895397	expected to be higher	-0.124939
-0.358756	These costs are higher	-0.124939
-0.659318	the SSE or higher	-0.124939
-0.358282	microprocessor microarchitecture. A higher	-0.124939
-0.358209	code size has higher	-0.124939
-0.502807	set or any higher	-0.124939
-0.342643	measure are much higher	-0.124939
-0.342643	resolution. A much higher	-0.124939
-0.850574	when the next higher	-0.124939
-0.520029	processor to give higher	-0.124939
-0.562395	count is usually higher	-0.124939
-0.237861	units and hence higher	-0.124939
-1.043836	a program is bigger	-0.124939
-0.804838	the matrix is bigger	-0.124939
-0.804169	size parameter is bigger	-0.124939
-0.587385	the advantage of bigger	-0.124939
-0.841688	point code. The bigger	-0.124939
-0.598235	library may be bigger	-0.124939
-0.595549	arrays that are bigger	-0.124939
-0.463209	typically used on bigger	-0.124939
-0.725910	is treated as bigger	-0.124939
-1.257155	the innermost loop bigger	-0.124939
-0.776505	of the new bigger	-0.124939
-0.521881	that a new bigger	-0.124939
-0.931344	allocate a new bigger	-0.124939
-0.355841	even for arrays bigger	-0.124939
-0.543847	systems that allows bigger	-0.124939
-1.036612	the code becomes bigger	-0.124939
-0.452615	projects have become bigger	-0.124939
-0.348295	a total offset bigger	-0.124939
-0.331798	of 2. Objects bigger	-0.124939
-0.331798	with the ever bigger	-0.124939
-1.814705	to use the vectors	-0.124939
-0.358791	by wrapping the vectors	-0.124939
-0.550672	a time in vectors	-0.124939
-0.854933	Mathematical functions for vectors	-0.124939
-0.525318	mathematical operations on vectors	-0.124939
-0.539620	parallel calculations on vectors	-0.124939
-0.462363	256 bit integer vectors	-0.124939
-1.048213	speed by using vectors	-0.124939
-0.429903	integer and double vectors	-0.124939
-0.734903	float and double vectors	-0.124939
-0.503140	128 bit float vectors	-0.124939
-0.555559	use the 64-bit vectors	-0.124939
-0.353411	use of intrinsic vectors	-0.124939
-0.770545	The 128-bit XMM vectors	-0.124939
-0.352020	code. The bigger vectors	-0.124939
-0.565433	x^4 // Define vectors	-0.124939
-0.441965	The 256-bit YMM vectors	-0.124939
-0.575441	instruction set (128 vectors	-0.124939
-0.037158	video or 3-dimensional vectors	-0.124939
-0.595563	EXCEPTION_CONTINUE_SEARCH) { // Floating	-0.124939
-1.061027	float and double Floating	-0.124939
-0.596445	b) - n.a. Floating	-0.124939
-1.058003	floating point variables Floating	-0.124939
-1.169967	in 64-bit systems. Floating	-0.124939
-0.974794	Floating point division Floating	-0.124939
-0.512754	multiply and shift Floating	-0.124939
-0.564057	45 clock cycles. Floating	-0.124939
-0.637138	for multiple purposes. Floating	-0.124939
-0.902475	floating point expressions. Floating	-0.124939
-0.339335	with integer parameters. Floating	-0.124939
-0.339389	the 32-bit integer. Floating	-0.124939
-0.172626	= 0; 14.6 Floating	-0.124939
-0.172626	division...................................................................................................... 137 14.6 Floating	-0.124939
-0.382725	on page 105. Floating	-0.124939
-0.102834	operators............................................................................... 29 7.3 Floating	-0.124939
-0.102834	variables. 31 7.3 Floating	-0.124939
-0.237829	stack is organized. Floating	-0.124939
-0.237829	45 clock cycles). Floating	-0.124939
-0.237829	the programmer. 79 Floating	-0.124939
-0.896035	function, and the AVX2	-0.124939
-0.358812	performance somewhat. The AVX2	-0.124939
-0.818245	and one for AVX2	-0.124939
-0.463375	// SSE4.1 // AVX2	-0.124939
-0.578454	AVX only when AVX2	-0.124939
-0.350509	= double 2 AVX2	-0.124939
-0.350509	= int64_t 2 AVX2	-0.124939
-0.432304	= int 4 AVX2	-0.124939
-0.334119	= double 4 AVX2	-0.124939
-0.432304	= float 4 AVX2	-0.124939
-0.334119	= int64_t 4 AVX2	-0.124939
-0.137973	= int 8 AVX2	-0.124939
-0.137973	or int 8 AVX2	-0.124939
-0.341368	= float 8 AVX2	-0.124939
-0.534664	64 4 256 AVX2	-0.124939
-0.534664	32 8 256 AVX2	-0.124939
-0.534664	16 16 256 AVX2	-0.124939
-0.291818	8 32 256 AVX2	-0.124939
-0.646122	and double vectors AVX2	-0.124939
-0.294191	available, e.g. AVX, AVX2	-0.124939
-0.562804	and after the piece	-0.124939
-0.504855	to insert the piece	-0.124939
-1.062091	version of a piece	-0.124939
-0.583907	that if a piece	-0.124939
-1.100092	to make a piece	-0.425969
-0.860307	part. If a piece	-0.124939
-0.460041	idea how a piece	-0.124939
-0.356099	to optimize a piece	-0.124939
-0.782876	to generate a piece	-0.124939
-0.356099	compiler optimizes a piece	-0.124939
-0.356099	for studying a piece	-0.124939
-0.358661	calculations piece by piece	-0.124939
-1.452641	share the same piece	-0.124939
-0.587666	executing the same piece	-0.124939
-0.538943	executing a critical piece	-0.124939
-0.355871	heavy background calculations piece	-0.124939
-0.417228	is a small piece	-0.124939
-0.417228	than a small piece	-0.124939
-0.581599	optimizing a particular piece	-0.124939
-0.193204	address that is divisible	-0.425969
-0.550035	constant that is divisible	-0.124939
-0.591427	address which is divisible	-0.124939
-0.423196	loop count is divisible	-0.425969
-1.064851	certain to be divisible	-0.124939
-0.585233	SIZE must be divisible	-0.124939
-0.585755	points is not divisible	-0.124939
-0.585755	iterations is not divisible	-0.124939
-0.462255	arrays. Array size divisible	-0.124939
-0.101096	to an address divisible	-0.124939
-0.062016	at an address divisible	-0.726999
-0.125383	alignment to addresses divisible	-0.124939
-0.125383	structures to addresses divisible	-0.124939
-0.302363	both have addresses divisible	-0.124939
-0.392848	round memory addresses divisible	-0.124939
-1.530715	the use of <<	-0.124939
-0.352908	u.i += n <<	-0.124939
-0.346454	cout << list[i] <<	-0.124939
-0.002296	size) { cout <<	-0.124939
-0.000573	Disp() { cout <<	-0.425969
-0.001146	Hello() { cout <<	-0.425969
-0.002296	int)size) { cout <<	-0.124939
-0.018714	through array cout <<	-0.124939
-0.018714	of f cout <<	-0.124939
-0.341833	32 with j <<	-0.124939
-0.382816	((B & 3) <<	-0.124939
-0.237894	16is calculated asa <<	-0.124939
-0.237894	17is calculated as(a <<	-0.124939
-0.237894	4) | (C <<	-0.124939
-0.237894	A | (B <<	-0.124939
-0.821295	= i; } Here,	-0.124939
-0.536420	log(2.0); ... } Here,	-0.124939
-0.795489	+ 1.; } Here,	-0.124939
-0.345043	return sum; } Here,	-0.124939
-0.345043	+ list[j].c; } Here,	-0.124939
-0.345043	Func1(list, &list[8]); } Here,	-0.124939
-0.355656	p + i; Here,	-0.124939
-0.355093	a | b; Here,	-0.124939
-0.550435	1.0) { ... Here,	-0.124939
-0.819456	b + c; Here,	-0.124939
-0.455222	matrix[FuncRow(i)][FuncCol(i)] += x; Here,	-0.124939
-0.382771	}; S1 ArrayOfStructures[100]; Here,	-0.124939
-0.294163	d + 3.5; Here,	-0.124939
-0.294163	comes from testing. Here,	-0.124939
-0.237861	in my blog. Here,	-0.124939
-0.237861	(number of sets). Here,	-0.124939
-0.237861	ArraySize; i++) List[i]++; Here,	-0.124939
-0.237861	c1; int c1::*MemberPointer; Here,	-0.124939
-0.237861	OneOrTwo5[b & 1]; Here,	-0.124939
-0.600591	problem of the x86	-0.124939
-0.598965	extension to the x86	-0.124939
-0.591815	hardware in the x86	-0.124939
-0.202052	microprocessors in the x86	-0.425969
-1.186697	based on the x86	-0.124939
-0.818482	32 bits in x86	-0.124939
-0.358817	64-bit versions. The x86	-0.124939
-1.009776	optimization guide for x86	-0.124939
-0.172218	Works with all x86	-0.425969
-0.087372	processors. Supports all x86	-0.124939
-0.087372	source. Supports all x86	-0.124939
-0.087372	workaround. Supports all x86	-0.124939
-0.502248	The 32- bit x86	-0.124939
-0.354449	big-endian storage. All x86	-0.124939
-0.350308	source library. Supports x86	-0.124939
-0.350308	execution All modern x86	-0.124939
-0.382816	__linux__ __unix__ __linux__ x86	-0.124939
-0.504838	allocation are: The process	-0.124939
-0.248891	the log on process	-0.124939
-0.248891	The log on process	-0.124939
-0.358282	CPU cores. A process	-0.124939
-1.245797	instance for each process	-0.124939
-0.354140	to the allocation process	-0.124939
-0.497526	is a complicated process	-0.124939
-0.429472	how the development process	-0.124939
-0.509357	which software development process	-0.124939
-0.352882	slow GOT lookup process	-0.124939
-0.070265	of the installation process	-0.124939
-0.070265	for the installation process	-0.124939
-0.070265	during the installation process	-0.124939
-0.369719	installed. The installation process	-0.124939
-0.341765	leave a background process	-0.124939
-0.339376	updating. The update process	-0.124939
-0.429421	program. 6 Development process	-0.124939
-0.237861	as a learning process	-0.124939
-0.237861	why this delaying process	-0.124939
-0.596521	fffff is the binary	-0.124939
-0.584619	stored as the binary	-0.124939
-0.557351	cut off the binary	-0.124939
-1.534780	stored in a binary	-0.124939
-0.588824	small that a binary	-0.124939
-1.313877	may be a binary	-0.124939
-0.584553	list or a binary	-0.124939
-0.851616	A disadvantage of binary	-0.124939
-0.527290	is compiled to binary	-0.124939
-1.529570	are stored in binary	-0.124939
-0.358406	right-most 1-bit in binary	-0.124939
-0.504524	and distributed as binary	-0.124939
-0.827032	and then use binary	-0.124939
-0.355722	this case. A binary	-0.124939
-0.355722	be moved. A binary	-0.124939
-0.559554	bits of its binary	-0.124939
-0.492180	CLR, to produce binary	-0.124939
-0.294182	as a biased binary	-0.124939
-0.294182	and search facilities, binary	-0.124939
-0.554940	obstacles and to know	-0.124939
-2.086281	in order to know	-0.124939
-0.567848	is useful to know	-0.124939
-1.762228	you want to know	-0.124939
-0.356868	CPU dispatcher to know	-0.124939
-0.814399	the programmer to know	-0.124939
-0.590477	programmers do not know	-0.124939
-0.587569	sequence. If you know	-0.124939
-0.501558	are sure you know	-0.124939
-0.575842	assumes that we know	-0.124939
-0.263178	the compiler cannot know	-0.124939
-0.571181	the compiler doesn't know	-0.124939
-0.331314	the microprocessor doesn't know	-0.124939
-0.458341	And who would know	-0.124939
-0.456632	options. I don't know	-0.124939
-0.237902	compilers (Microsoft, Intel) know	-0.124939
-0.169256	level-2 cache is 512	-0.124939
-0.599532	lines in a 512	-0.124939
-0.591494	element for a 512	-0.124939
-0.659696	operating system, and 512	-0.124939
-0.504857	subsequent instructions. The 512	-0.124939
-0.357545	and soon also 512	-0.124939
-0.140377	double 64 8 512	-0.124939
-0.140377	long 64 8 512	-0.124939
-0.139554	int 32 16 512	-0.124939
-0.139554	float 32 16 512	-0.124939
-0.558055	make the matrix 512	-0.124939
-0.106963	in a 512 512	-0.124939
-0.106963	for a 512 512	-0.124939
-0.249193	instructions. The 512 512	-0.124939
-0.249193	2040 38.7 512 512	-0.124939
-0.249193	13.6 80.9 512 512	-0.124939
-0.237886	511 2040 38.7 512	-0.124939
-0.237886	65 13.6 80.9 512	-0.124939
-0.953904	the CPU to generate	-0.124939
-0.830749	operating system to generate	-0.124939
-1.158252	we want to generate	-0.124939
-1.736781	is likely to generate	-0.124939
-1.422715	are able to generate	-0.124939
-0.357216	&& expression to generate	-0.124939
-0.152840	to 0 and generate	-0.425969
-0.504873	141. Applications that generate	-0.124939
-0.689504	and it will generate	-0.124939
-0.482267	created it will generate	-0.124939
-0.551105	-fpic. This will generate	-0.124939
-0.499996	-56 which will generate	-0.124939
-0.340221	overflow condition will generate	-0.124939
-0.340221	to 127 will generate	-0.124939
-0.340221	The linker will generate	-0.124939
-0.340221	of c+b will generate	-0.124939
-0.841323	but it doesn't generate	-0.124939
-0.714534	compiler can automatically generate	-0.124939
-1.456410	most of the advantages	-0.124939
-1.175603	many of the advantages	-0.124939
-1.057620	discussion of the advantages	-0.124939
-0.358508	to weigh the advantages	-0.124939
-0.239889	is used. The advantages	-0.425969
-0.353752	end user. The advantages	-0.124939
-0.457062	with pointers. The advantages	-0.124939
-0.649576	programming style. The advantages	-0.124939
-0.353752	the operands. The advantages	-0.124939
-0.649576	+ 1.0f;} The advantages	-0.124939
-0.353752	allocated dynamically. The advantages	-0.124939
-0.353752	(80 bits). The advantages	-0.124939
-0.358230	Each type has advantages	-0.124939
-0.658231	are also other advantages	-0.124939
-0.555752	C++ has many advantages	-0.124939
-0.355715	there are specific advantages	-0.124939
-0.459472	systems have several advantages	-0.124939
-0.575504	Weighing the above advantages	-0.124939
-0.584957	parameters a and r	-0.124939
-0.462927	as p and r	-0.124939
-0.503839	the variable that r	-0.124939
-0.358085	= 0 that r	-0.124939
-0.460633	{ r = r	-0.124939
-1.349936	{ a[i] = r	-0.124939
-0.655153	; edx = r	-0.124939
-0.659289	pointed to by r	-0.124939
-0.902189	& r) { r	-0.124939
-0.658715	for example when r	-0.124939
-0.357468	the sequence, where r	-0.124939
-0.198195	(r = 0; r	-0.425969
-0.500788	; a ; r	-0.124939
-0.354881	not clear whether r	-0.124939
-0.163813	(r = 1; r	-0.425969
-0.353167	; add what r	-0.124939
-0.314732	value that lies r	-0.124939
-1.862820	then it is usually	-0.124939
-0.574884	A function is usually	-0.124939
-0.574884	dispatcher function is usually	-0.124939
-0.891131	bits. This is usually	-0.124939
-1.306568	the compiler is usually	-0.124939
-0.598046	manner. It is usually	-0.124939
-0.459850	link order is usually	-0.124939
-0.569301	first count is usually	-0.124939
-0.355949	data area is usually	-0.124939
-0.355949	if any, is usually	-0.124939
-0.355949	The loop-branch is usually	-0.124939
-0.584557	The functions are usually	-0.124939
-0.357197	These cases are usually	-0.124939
-0.538588	Integer constants are usually	-0.124939
-0.358329	cache. Compilers will usually	-0.124939
-0.759531	or logical processors usually	-0.124939
-0.562231	Floating point calculations usually	-0.124939
-0.515489	in an integer, usually	-0.124939
-0.325381	to remote databases usually	-0.124939
-0.898244	loop if the results	-0.124939
-0.358798	// OR the results	-0.124939
-0.503062	different compilers. The results	-0.124939
-0.461859	of optimizations. The results	-0.124939
-0.357530	matrix sizes. The results	-0.124939
-0.463176	event-counters do. This results	-0.124939
-0.356444	to print out results	-0.124939
-0.356426	operators produce 32 results	-0.124939
-0.552468	store the four results	-0.124939
-0.392813	precision, and intermediate results	-0.124939
-0.553229	to store intermediate results	-0.124939
-0.302335	integer). All intermediate results	-0.124939
-0.302335	by storing intermediate results	-0.124939
-0.438917	computer. The measured results	-0.124939
-0.339386	to get reliable results	-0.124939
-0.339406	stores the thousand results	-0.124939
-0.294173	may give inconsistent results	-0.124939
-0.294173	sometimes give misleading results	-0.124939
-0.237869	matrix. My experimental results	-0.124939
-0.585185	arrays, a and b,	-0.124939
-0.580343	x[]) { int b,	-0.124939
-0.192633	even if a, b,	-0.124939
-0.062259	{ int a, b,	-0.425969
-0.134962	14.10 int a, b,	-0.124939
-0.134962	14.11 int a, b,	-0.124939
-0.134962	8.6a int a, b,	-0.124939
-0.134962	8.6b int a, b,	-0.124939
-0.192633	a.y);} vector a, b,	-0.124939
-0.182718	{ float a, b,	-0.124939
-0.182718	11.1a float a, b,	-0.124939
-0.182718	11.1b float a, b,	-0.124939
-0.182718	8.16 float a, b,	-0.124939
-0.155729	way: bool a, b,	-0.124939
-0.155729	7.9a bool a, b,	-0.124939
-0.192633	{ Vec16s a, b,	-0.124939
-0.192633	objects Vec8s a, b,	-0.124939
-0.714992	int i, a[100], b,	-0.124939
-1.296274	any of the storage	-0.124939
-0.591025	Fortran where the storage	-0.124939
-0.572671	The type of storage	-0.124939
-0.504514	8 bytes of storage	-0.124939
-0.833888	compile time. The storage	-0.124939
-0.358175	are stored. The storage	-0.124939
-0.578997	and replaced by storage	-0.124939
-0.355293	26 about data storage	-0.124939
-0.355293	of binary data storage	-0.124939
-0.461950	Global or static storage	-0.124939
-0.262834	kinds of variable storage	-0.124939
-0.336261	integer constants. Register storage	-0.124939
-0.065796	each thread. Thread-local storage	-0.124939
-0.065796	environment block. Thread-local storage	-0.124939
-0.065796	five times. Thread-local storage	-0.124939
-0.444303	have big endian storage	-0.124939
-0.102855	can make thread-local storage	-0.124939
-0.102855	variables. (See thread-local storage	-0.124939
-0.595719	strings is the old	-0.124939
-0.600336	contents of the old	-0.124939
-0.598637	changed to the old	-0.124939
-0.874240	data in the old	-0.124939
-1.474935	used in the old	-0.124939
-0.587396	strings in the old	-0.124939
-0.587396	present in the old	-0.124939
-0.597271	15 on the old	-0.124939
-0.358821	a string. The old	-0.124939
-0.971026	is compiled for old	-0.124939
-0.857351	is compatible with old	-0.124939
-0.796651	be compatible with old	-0.124939
-0.543827	when compatibility with old	-0.124939
-0.498772	code incompatible with old	-0.124939
-0.358617	will crash on old	-0.124939
-0.463131	data. Use an old	-0.124939
-0.461469	in some very old	-0.124939
-0.325361	a six years old	-0.124939
-0.237902	by a plain old	-0.124939
-1.446151	the compiler to reduce	-0.124939
-0.786229	a compiler to reduce	-0.124939
-1.926874	is possible to reduce	-0.124939
-0.877892	were able to reduce	-0.124939
-0.357213	great lengths to reduce	-0.124939
-0.357213	the capability to reduce	-0.124939
-0.659740	constant propagation and reduce	-0.124939
-0.586523	Espresso) that can reduce	-0.124939
-0.589290	but you can reduce	-0.124939
-0.389489	other compilers can reduce	-0.124939
-0.389489	Some compilers can reduce	-0.124939
-0.617635	Most compilers can reduce	-0.124939
-0.352989	context switches can reduce	-0.124939
-0.352989	ever seen can reduce	-0.124939
-0.890191	The compiler may reduce	-0.425969
-0.583620	good compilers will reduce	-0.124939
-0.461294	example, compilers cannot reduce	-0.124939
-0.352350	This can actually reduce	-0.124939
-0.679692	a branch that goes	-0.124939
-0.342980	A branch that goes	-0.124939
-0.358649	is reset or goes	-0.124939
-1.013785	only when it goes	-0.124939
-0.590729	cost because it goes	-0.124939
-0.501479	mispredicted whenever it goes	-0.124939
-0.358664	a polymorphic function goes	-0.124939
-0.588329	data. The code goes	-0.124939
-1.554190	of the time goes	-0.124939
-1.130924	into a vector goes	-0.124939
-0.356383	branch that always goes	-0.124939
-0.349036	file. The output goes	-0.124939
-1.096185	the clock frequency goes	-0.124939
-0.687728	in a DLL goes	-0.124939
-0.429408	typical software project goes	-0.124939
-0.408055	in example 9.5a goes	-0.124939
-0.237853	the call p->f() goes	-0.124939
-0.237853	less than 1% goes	-0.124939
-0.588920	variables into a union	-0.124939
-0.358576	} Using a union	-0.124939
-0.764198	register variable. The union	-0.124939
-0.573189	class, structure or union	-0.124939
-0.658876	F3(bool y) { union	-0.124939
-0.713391	memory space. A union	-0.124939
-0.353138	an example. A union	-0.124939
-0.353138	7.24 Unions A union	-0.124939
-0.314659	// Example 14.28 union	-0.124939
-0.382748	// Example 14.23b union	-0.124939
-0.382748	// Example 14.26 union	-0.124939
-0.382748	// Example 14.27 union	-0.124939
-0.294145	// Example 14.23 union	-0.124939
-0.237845	// Example 7.40b union	-0.124939
-0.237845	of 100 doubles: union	-0.124939
-0.237845	// Example 7.39 union	-0.124939
-0.237845	// Example 14.29 union	-0.124939
-0.237845	// Example 14.24 union	-0.124939
-0.237845	// Example 14.25 union	-0.124939
-0.367254	char a = 0,	-0.124939
-0.571260	0, b = 0,	-0.124939
-0.497006	& 0 = 0,	-0.124939
-0.350302	list[size], sum1 = 0,	-0.124939
-0.350302	0, s3 = 0,	-0.124939
-0.350302	0, s2 = 0,	-0.124939
-0.350302	float s0 = 0,	-0.124939
-0.350302	0, s1 = 0,	-0.124939
-0.470216	first byte at 0,	-0.425969
-0.442680	bytes byte at 0,	-0.124939
-0.061792	= select(b > 0,	-0.425969
-0.102870	SafeArray() { memset(a, 0,	-0.124939
-0.102870	to zero memset(a, 0,	-0.124939
-0.102870	point overflow: _controlfp_s(&dummy, 0,	-0.124939
-0.102870	status: _fpreset(); _controlfp_s(&dummy, 0,	-0.124939
-0.237926	float list[100]; memset(list, 0,	-0.124939
-0.461358	the function is called.	-0.124939
-0.858834	member function is called.	-0.124939
-0.654150	critical function is called.	-0.124939
-0.356252	when CriticalInnerFunction is called.	-0.124939
-1.282301	needs to be called.	-0.124939
-1.504965	functions that are called.	-0.124939
-0.576302	local objects are called.	-0.124939
-0.654849	all destructors are called.	-0.124939
-0.356413	any constructors are called.	-0.124939
-0.354181	which alloca was called.	-0.124939
-0.142659	function is never called.	-0.124939
-0.475115	functions are never called.	-0.124939
-0.361863	the power of 10	-0.425969
-0.462734	misprediction penalty of 10	-0.124939
-0.358886	from 20 to 10	-0.124939
-0.358817	foreground jobs and 10	-0.124939
-0.902737	will recognize that 10	-0.124939
-0.358562	subtraction (3 - 10	-0.124939
-0.357455	An experiment where 10	-0.124939
-0.887388	get the value 10	-0.124939
-0.064823	that something takes 10	-0.124939
-0.356873	will still take 10	-0.124939
-0.519111	multiple operating systems. 10	-0.124939
-0.348348	not detected until 10	-0.124939
-0.345128	variables. See chapter 10	-0.124939
-0.345145	drivers for Windows. 10	-0.124939
-0.467497	branch is executed 10	-0.124939
-0.294145	of compiler .................................................................................................... 10	-0.124939
-0.294145	control .............................................................................................. 99 10	-0.124939
-0.725363	operating system is based	-0.124939
-0.265965	This manual is based	-0.124939
-0.596774	dispatching should be based	-0.124939
-0.589694	programs that are based	-0.124939
-0.773725	these methods are based	-0.124939
-0.353230	.NET framework are based	-0.124939
-0.353230	of Java are based	-0.124939
-0.353230	and Fortran are based	-0.124939
-0.239688	protection schemes are based	-0.425969
-0.353230	The recommendations are based	-0.124939
-0.358127	an unknown CPU based	-0.124939
-0.831763	C or C++ based	-0.124939
-0.355713	important. A language based	-0.124939
-0.982488	a CPU dispatcher based	-0.124939
-0.646760	high level framework based	-0.124939
-0.453324	branch will go based	-0.124939
-0.343686	can be chosen based	-0.124939
-0.594527	it is to choose	-0.124939
-0.657790	C++ compilers to choose	-0.124939
-0.357890	than done to choose	-0.124939
-0.357890	Use mask to choose	-0.124939
-0.657876	operating system and choose	-0.124939
-0.503626	point operations and choose	-0.124939
-0.503626	future processors, and choose	-0.124939
-0.523014	available, we may choose	-0.124939
-0.449940	below. You may choose	-0.124939
-0.449940	operations. You may choose	-0.124939
-0.449940	52. You may choose	-0.124939
-0.449940	question. You may choose	-0.124939
-0.449940	today. You may choose	-0.124939
-0.346877	software developer may choose	-0.124939
-0.463084	etc. Whether you choose	-0.124939
-1.324930	The compiler will choose	-0.124939
-0.578178	program, you should choose	-0.124939
-0.649316	compilers will automatically choose	-0.124939
-0.339450	that make developers choose	-0.124939
-0.598317	options and the options	-0.124939
-0.504848	we specify the options	-0.124939
-0.782710	Overview of compiler options	-0.124939
-0.356620	Use appropriate compiler options	-0.124939
-0.389832	have various optimization options	-0.124939
-0.299913	options. Many optimization options	-0.124939
-0.057832	the relevant optimization options	-0.124939
-0.018435	all relevant optimization options	-0.602060
-0.057832	8.5 Compiler optimization options	-0.124939
-0.524302	study the available options	-0.124939
-0.355378	18.1. Command line options	-0.124939
-0.457973	only if certain options	-0.124939
-0.648427	compilers have various options	-0.124939
-0.350315	select all installation options	-0.124939
-0.339448	because the debugging options	-0.124939
-0.237894	test. disable power-save options	-0.124939
-0.596684	on is the feature	-0.124939
-0.581911	auto_ptr has the feature	-0.124939
-0.191059	compilers have a feature	-0.124939
-0.569558	have such a feature	-0.124939
-0.358674	The indirect function feature	-0.124939
-0.356907	ifunc branch). This feature	-0.124939
-0.356907	in 2010. This feature	-0.124939
-0.488280	pointer, but this feature	-0.124939
-0.488280	symbols, but this feature	-0.124939
-0.353960	prefetching so this feature	-0.124939
-0.358295	Gnu compiler A feature	-0.124939
-1.080006	a specific CPU feature	-0.124939
-0.503082	hope that such feature	-0.124939
-0.460863	the C++ template feature	-0.124939
-0.510500	put a test feature	-0.124939
-0.347464	a built-in test feature	-0.124939
-0.349043	has the special feature	-0.124939
-0.382793	The symbol interposition feature	-0.124939
-0.710538	there are different ways	-0.124939
-0.712336	are several different ways	-0.124939
-0.442968	has several different ways	-0.124939
-0.556417	defined in other ways	-0.124939
-0.357489	are other possible ways	-0.124939
-1.235496	There are several ways	-0.124939
-0.354186	also have fast ways	-0.124939
-0.216466	implemented in various ways	-0.124939
-0.188463	there are various ways	-0.124939
-0.219279	There are various ways	-0.425969
-0.216466	135 show various ways	-0.124939
-0.216466	sections describe various ways	-0.124939
-0.061962	There are three ways	-0.425969
-0.023523	there are smarter ways	-0.425969
-0.357328	the processors that were	-0.124939
-0.142906	on processors that were	-0.425969
-0.458924	causes problem that were	-0.124939
-0.142261	or models that were	-0.124939
-0.142261	newer models that were	-0.124939
-0.357079	where 10 elements were	-0.124939
-0.460697	see whether they were	-0.124939
-0.356197	supported 256-bit instructions were	-0.124939
-0.459758	following compiler versions were	-0.124939
-0.353961	and disk space were	-0.124939
-0.351648	The measured results were	-0.124939
-1.054283	I have tested were	-0.124939
-0.450169	the different tasks were	-0.124939
-0.346390	different matrix sizes were	-0.124939
-0.325311	compilers The tests were	-0.124939
-0.294163	in example 8.15a were	-0.124939
-0.294163	2004. No differences were	-0.124939
-0.237861	Func1 and Func2 were	-0.124939
-0.599259	used for the link	-0.124939
-0.585157	looking at a link	-0.124939
-1.377106	no need to link	-0.124939
-0.358849	option -fno-pic and link	-0.124939
-0.358179	works differently. The link	-0.124939
-0.358179	linked together. The link	-0.124939
-0.877601	in a static link	-0.124939
-0.103892	either as static link	-0.425969
-0.441625	slower than static link	-0.124939
-0.489144	as a dynamic link	-0.124939
-0.058959	*.a) or dynamic link	-0.425969
-0.307846	can make dynamic link	-0.124939
-0.058959	a separate dynamic link	-0.425969
-0.349705	linear array. No link	-0.124939
-0.556336	until the previous link	-0.124939
-0.294200	makes a symbolic link	-0.124939
-0.562019	fixed-size array is made	-0.124939
-0.358320	a dispatch is made	-0.124939
-0.358320	no attempt is made	-0.124939
-1.336789	code can be made	-0.124939
-1.008192	object can be made	-0.124939
-0.854499	operations can be made	-0.124939
-0.577085	dispatching can be made	-0.124939
-0.577085	statement can be made	-0.124939
-0.590930	file) should be made	-0.124939
-0.869375	can often be made	-0.124939
-1.393225	compilers I have made	-0.124939
-0.945946	the microprocessor has made	-0.124939
-0.355188	This reordering has made	-0.124939
-0.579514	own function library made	-0.124939
-0.568825	64-bit shared object made	-0.124939
-0.341820	consequences. I once made	-0.124939
-0.314717	linked into projects made	-0.124939
-0.237894	of using ready made	-0.124939
-0.237894	using templates. Ready made	-0.124939
-1.100173	point to the appropriate	-0.124939
-0.812727	pointer to the appropriate	-0.425969
-0.564578	link to the appropriate	-0.124939
-0.564578	leads to the appropriate	-0.124939
-0.894139	set for the appropriate	-0.124939
-0.595335	separately with the appropriate	-0.124939
-0.814249	and choose the appropriate	-0.124939
-0.357449	to include the appropriate	-0.124939
-0.461756	that loads the appropriate	-0.124939
-0.357449	classes. Including the appropriate	-0.124939
-0.818203	64-bit systems. The appropriate	-0.124939
-0.358561	is simply not appropriate	-0.124939
-0.358534	simply prints an appropriate	-0.124939
-0.574733	error; and make appropriate	-0.124939
-0.525756	what is most appropriate	-0.124939
-0.355796	vectorization. 3. Use appropriate	-0.124939
-0.237910	const keyword wherever appropriate	-0.124939
-0.866999	= 0; int i,	-0.124939
-0.335479	479001600}; ... int i,	-0.124939
-0.483137	int a[100]; int i,	-0.124939
-0.335479	// n! int i,	-0.124939
-1.036069	int list[300]; int i,	-0.124939
-0.196658	float matrix[rows][columns]; int i,	-0.425969
-0.335479	= string; int i,	-0.124939
-0.335479	S1 list[size]; int i,	-0.124939
-0.335479	Example 8.13a int i,	-0.124939
-0.335479	Example 8.13b int i,	-0.124939
-0.335479	Example 8.12a int i,	-0.124939
-0.335479	Example 8.14b int i,	-0.124939
-0.335479	Example 8.14a int i,	-0.124939
-0.027451	aa: StoreVector(aa + i,	-0.602060
-0.237951	results printf("\n%2i %10I64i", i,	-0.124939
-1.065787	done by the constructor	-0.124939
-1.042445	the function a constructor	-0.124939
-0.503983	normal array. The constructor	-0.124939
-0.358188	the conversion. The constructor	-0.124939
-0.357706	constant data // constructor	-0.124939
-0.462084	default constructor // constructor	-0.124939
-0.358321	and destructors A constructor	-0.124939
-0.537118	members. A simple constructor	-0.124939
-0.339901	to the copy constructor	-0.124939
-0.339901	is a copy constructor	-0.124939
-0.370649	value. The copy constructor	-0.124939
-0.110434	performance. A copy constructor	-0.124939
-0.110434	initialization. A copy constructor	-0.124939
-0.339901	has no copy constructor	-0.124939
-0.258894	object. Any copy constructor	-0.124939
-0.237925	with a default constructor	-0.124939
-0.237925	coordinates // default constructor	-0.124939
-0.237925	constructor. A default constructor	-0.124939
-0.575166	particular set of CPUs.	-0.124939
-0.550314	all brands of CPUs.	-0.124939
-0.616038	versions for different CPUs.	-0.124939
-0.809291	for several different CPUs.	-0.124939
-0.348757	to support different CPUs.	-0.124939
-0.456353	code for Intel CPUs.	-0.124939
-0.353194	for different Intel CPUs.	-0.124939
-0.481870	code for AMD CPUs.	-0.124939
-0.533544	supported on AMD CPUs.	-0.124939
-0.353574	that fit their CPUs.	-0.124939
-0.972009	in the x86 CPUs.	-0.124939
-0.519267	incompatible with old CPUs.	-0.124939
-1.589919	AMD and VIA CPUs.	-0.124939
-0.492979	by all modern CPUs.	-0.124939
-0.348334	well with non-Intel CPUs.	-0.124939
-0.485891	solution on future CPUs.	-0.124939
-0.314698	not with earlier CPUs.	-0.124939
-2.018663	for (i = 2;	-0.124939
-0.065373	1; list[i+2] = 2;	-0.425969
-0.355546	1; a[1] = 2;	-0.124939
-0.619884	= r + 2;	-0.124939
-0.197622	= *p + 2;	-0.425969
-0.338474	= b[i] + 2;	-0.124939
-0.338474	= bb[i] + 2;	-0.124939
-0.357699	= a * 2;	-0.425969
-0.334440	= a[i].u[1] * 2;	-0.124939
-2.107421	0; i < 2;	-0.124939
-0.459282	8.5b a += 2;	-0.124939
-0.669097	{ cout << 2;	-0.425969
-0.597713	$B1$2:. This is just	-0.124939
-0.897376	functions. It is just	-0.124939
-0.358010	vector classes is just	-0.124939
-0.503734	This delay is just	-0.124939
-0.463593	is translated to just	-0.124939
-0.658812	the cache in just	-0.124939
-0.358402	32 AND-operations in just	-0.124939
-0.598245	reference may be just	-0.124939
-0.541120	precision calculations are just	-0.124939
-0.893061	up a function just	-0.124939
-1.202966	be done with just	-0.124939
-0.589614	errors. If you just	-0.124939
-0.569851	sufficient to have just	-0.124939
-0.562335	memory even when just	-0.124939
-0.462857	this purpose. It just	-0.124939
-0.582576	calculate a vector just	-0.124939
-0.294163	speeded up significantly just	-0.124939
-0.294163	square brackets index, just	-0.124939
-1.143244	32 bits of a[i]	-0.124939
-0.575619	Return reference to a[i]	-0.124939
-0.526954	{ temp = a[i]	-0.124939
-0.091203	100; i++) { a[i]	-0.903090
-0.605986	size; i++) { a[i]	-0.124939
-0.835690	+= 2) { a[i]	-0.124939
-0.826912	of loop ; a[i]	-0.124939
-0.545791	of array element a[i]	-0.124939
-0.338935	< 2; i++) a[i]	-0.124939
-1.445952	< size; i++) a[i]	-0.124939
-0.331859	in multiplication here: a[i]	-0.124939
-0.314717	that avoids overflow: a[i]	-0.124939
-0.314717	the safe formula a[i]	-0.124939
-1.076289	copy of the function,	-0.124939
-0.726639	to inline the function,	-0.124939
-0.569600	125 for this function,	-0.124939
-1.681111	of the same function,	-0.124939
-0.357986	entirely inside one function,	-0.124939
-0.581532	to the library function,	-0.124939
-0.357198	a non-virtual member function,	-0.124939
-0.568781	in the template function,	-0.124939
-0.242773	of the simple function,	-0.124939
-0.242773	In the simple function,	-0.124939
-0.355253	of an optimized function,	-0.124939
-0.523022	turn calls another function,	-0.124939
-1.211028	in the innermost function,	-0.124939
-0.451124	called a frame function,	-0.124939
-0.537183	inlining the latter function,	-0.124939
-1.266363	the CPU detection function,	-0.124939
-0.331831	in the select function,	-0.124939
-0.294145	to a non-member function,	-0.124939
-0.597277	order of the operands	-0.124939
-0.594118	evaluation of the operands	-0.124939
-0.877599	provided that the operands	-0.124939
-0.589128	certainty that the operands	-0.124939
-0.862731	than if the operands	-0.124939
-1.126836	advantageous if the operands	-0.124939
-0.581412	results if the operands	-0.124939
-0.939262	cannot swap the operands	-0.124939
-0.358826	Boolean operands The operands	-0.124939
-1.526351	of floating point operands	-0.124939
-0.539888	precision in all operands	-0.124939
-0.357489	of expressions where operands	-0.124939
-0.936116	of the Boolean operands	-0.124939
-0.666198	order of Boolean operands	-0.124939
-0.061602	memcpy 16kB aligned operands	-0.124939
-0.294219	good performance). Aligned operands	-0.124939
-1.072919	out of the innermost	-0.124939
-1.452277	used in the innermost	-0.124939
-1.149000	calls in the innermost	-0.124939
-0.865577	spent in the innermost	-0.124939
-0.582899	occurs in the innermost	-0.124939
-0.582899	changing in the innermost	-0.124939
-0.575319	processors, only the innermost	-0.124939
-0.656936	but also the innermost	-0.124939
-0.586629	be inside the innermost	-0.124939
-0.462565	memory outside the innermost	-0.124939
-0.462565	but outside the innermost	-0.124939
-0.454379	for the critical innermost	-0.124939
-0.645404	if the critical innermost	-0.124939
-0.454379	If the critical innermost	-0.124939
-0.454379	inside the critical innermost	-0.124939
-0.454379	outside the critical innermost	-0.124939
-0.319570	tasks. A critical innermost	-0.124939
-0.429560	b; // Critical innermost	-0.124939
-1.750607	is likely to require	-0.124939
-0.583156	from functions that require	-0.124939
-0.356440	code-based methods or require	-0.124939
-0.356440	table lookup or require	-0.124939
-0.356440	be slower or require	-0.124939
-1.055274	pointer does not require	-0.124939
-0.570808	full. This may require	-0.124939
-0.356521	time measurements may require	-0.124939
-0.576721	efficient vector operations require	-0.124939
-0.356190	All these instructions require	-0.124939
-0.355847	or global arrays require	-0.124939
-0.355344	have mixed precision require	-0.124939
-0.499150	them. This would require	-0.124939
-0.496537	Alignment? Some applications require	-0.124939
-0.350772	and non-constant references require	-0.124939
-0.339411	code. Some profilers require	-0.124939
-0.294154	MOVNTPD and MOVNTDQ require	-0.124939
-0.237853	in linear algebra) require	-0.124939
-1.996736	version of the compiler.	-0.124939
-0.202024	option in the compiler.	-0.124939
-0.591681	flag in the compiler.	-0.124939
-1.365666	supported by the compiler.	-0.124939
-0.579176	automatically by the compiler.	-0.124939
-0.597456	much on the compiler.	-0.124939
-0.591208	implement in a compiler.	-0.124939
-0.591208	algebra in a compiler.	-0.124939
-0.867338	with a different compiler.	-0.124939
-1.681642	of the same compiler.	-0.124939
-1.167577	of the Intel compiler.	-0.124939
-0.353206	Gnu or Intel compiler.	-0.124939
-0.815769	the Intel C++ compiler.	-0.124939
-0.861925	with the Gnu compiler.	-0.124939
-0.521643	built with another compiler.	-0.124939
-0.486474	as the Microsoft compiler.	-0.124939
-0.428168	Comes with Microsoft compiler.	-0.124939
-0.900667	which of the advanced	-0.124939
-0.598006	compromise on the advanced	-0.124939
-0.526658	will run the advanced	-0.124939
-0.463093	avoid running the advanced	-0.124939
-0.874602	A lot of advanced	-0.124939
-0.358234	and lack of advanced	-0.124939
-0.358234	a wealth of advanced	-0.124939
-0.358844	out-of-order execution and advanced	-0.124939
-0.504853	manual is for advanced	-0.124939
-0.463233	many tips on advanced	-0.124939
-0.583348	C++ is an advanced	-0.124939
-0.538554	Don't use an advanced	-0.124939
-0.540355	and for more advanced	-0.124939
-0.593080	run the most advanced	-0.124939
-0.497315	devices and using advanced	-0.124939
-0.568048	microprocessors are using advanced	-0.124939
-0.555724	it has many advanced	-0.124939
-0.346438	background services under advanced	-0.124939
-0.573213	space where a #define	-0.124939
-0.358636	enum, const, or #define	-0.124939
-0.504640	as inline function #define	-0.124939
-0.463216	macro declared with #define	-0.124939
-0.526642	If Microsoft compiler #define	-0.124939
-0.830680	INSTRSET == 2 #define	-0.124939
-0.356926	INSTRSET == 8 #define	-0.124939
-0.886802	constants. For example, #define	-0.124939
-0.355486	Gnu compiler, etc. #define	-0.124939
-0.350747	INSTRSET == 5 #define	-0.124939
-0.314706	pure_function __attribute__((const)) #else #define	-0.124939
-0.444224	resolved at runtime. #define	-0.124939
-0.102837	swap two elements: #define	-0.124939
-0.102837	two array elements: #define	-0.124939
-0.294135	given a name. #define	-0.124939
-0.294135	8.22 #ifdef __GNUC__ #define	-0.124939
-0.237837	representation of N: #define	-0.124939
-0.237837	<float.h> #include <math.h> #define	-0.124939
-2.074395	the number of points	-0.124939
-0.356406	the object it points	-0.124939
-0.720920	first call it points	-0.124939
-0.460431	change what it points	-0.124939
-0.589920	what a pointer points	-0.124939
-1.185322	a function pointer points	-0.124939
-0.356423	that p always points	-0.124939
-0.354868	The following list points	-0.124939
-0.352310	original pointer actually points	-0.124939
-0.123731	variable that r points	-0.124939
-0.123731	0 that r points	-0.124939
-0.297417	add what r points	-0.124939
-0.350331	a few unused points	-0.124939
-0.350331	of object p points	-0.124939
-0.065796	pointer which initially points	-0.124939
-0.065796	Function pointer initially points	-0.124939
-0.065796	PLT entry initially points	-0.124939
-0.237886	four (or eight) points	-0.124939
-0.598563	switch is a switch	-0.124939
-0.357965	to predict a switch	-0.124939
-0.724537	to insert a switch	-0.124939
-0.462413	older processors, a switch	-0.124939
-1.014106	it needs to switch	-0.124939
-0.762679	of branches and switch	-0.124939
-0.065676	7.12 Branches and switch	-0.124939
-0.658322	same as for switch	-0.124939
-0.358156	as replacements for switch	-0.124939
-0.504825	branch tree or switch	-0.124939
-0.653489	predicted well. A switch	-0.124939
-0.355728	jump targets. A switch	-0.124939
-0.358187	switch statements because switch	-0.124939
-0.491268	when a task switch	-0.124939
-0.518221	14.3a int n; switch	-0.124939
-0.331918	switches A context switch	-0.124939
-0.237886	array initializer lists, switch	-0.124939
-0.596686	variable is the range	-0.124939
-0.358803	to limit the range	-0.124939
-0.281202	is out of range	-0.124939
-0.387179	if out of range	-0.124939
-0.387179	not out of range	-0.124939
-0.387179	being out of range	-0.124939
-0.463491	and underflow. The range	-0.124939
-0.590425	address in this range	-0.124939
-0.897144	because the same range	-0.124939
-0.358069	8-bit integers which range	-0.124939
-0.541261	containing the address range	-0.124939
-0.541261	covered the address range	-0.124939
-0.512838	within a limited range	-0.124939
-0.331918	analysis The live range	-0.124939
-0.102855	variables, float Live range	-0.124939
-0.102855	register storage. Live range	-0.124939
-0.237886	to a narrow range	-0.124939
-0.562954	memory at the start	-0.124939
-0.562954	options at the start	-0.124939
-0.583157	b has to start	-0.124939
-0.953868	the CPU to start	-0.124939
-1.926783	is possible to start	-0.124939
-1.806950	it takes to start	-0.124939
-0.801118	therefore fail to start	-0.124939
-0.656430	several minutes to start	-0.124939
-0.358855	the iterations and start	-0.124939
-1.500961	that it can start	-0.124939
-0.358500	garbage collection may start	-0.124939
-0.354424	algorithm before you start	-0.124939
-0.142018	spots Before you start	-0.124939
-0.142018	tasks. Before you start	-0.124939
-0.355686	The CPU will start	-0.124939
-0.355686	heap manager will start	-0.124939
-0.355930	function name ; start	-0.124939
-0.347485	the framework, during start	-0.124939
-0.526077	in which the modules	-0.124939
-0.589219	load all the modules	-0.124939
-0.726702	the loading of modules	-0.124939
-0.358661	classes, templates or modules	-0.124939
-0.551029	functions in other modules	-0.124939
-0.818938	if no other modules	-0.124939
-0.579315	option for all modules	-0.124939
-0.357649	well- tested library modules	-0.124939
-0.577173	keep the two modules	-0.124939
-1.493326	the most critical modules	-0.124939
-0.356052	starts up. Some modules	-0.124939
-0.568463	to assembly language modules	-0.124939
-0.456578	placed in separate modules	-0.124939
-0.408114	Cannot optimize across modules	-0.124939
-0.374700	enable optimizations across modules	-0.124939
-0.237913	across all .cpp modules	-0.124939
-0.391145	the multiple .cpp modules	-0.124939
-0.358796	is faster the smaller	-0.124939
-0.358796	more advantageous the smaller	-0.124939
-0.594851	position-independent code is smaller	-0.124939
-0.358308	context switches is smaller	-0.124939
-0.358308	The proxy is smaller	-0.124939
-1.063661	integer to a smaller	-0.124939
-0.586893	but has a smaller	-0.124939
-0.358892	size. Integers of smaller	-0.124939
-0.557504	bigger systems. The smaller	-0.124939
-1.830226	it may be smaller	-0.124939
-1.292341	make the code smaller	-0.124939
-0.357767	the matrix into smaller	-0.124939
-0.569255	function into multiple smaller	-0.124939
-0.356512	a variable even smaller	-0.124939
-0.563801	structure 8 bytes smaller	-0.124939
-0.987442	The code becomes smaller	-0.124939
-0.546220	often be made smaller	-0.124939
-0.738154	with execution units smaller	-0.124939
-0.462506	the method used here	-0.124939
-0.724755	of using static here	-0.124939
-1.505680	it is necessary here	-0.124939
-0.564317	because the speed here	-0.124939
-0.547928	c; The calculation here	-0.124939
-0.521456	etc. The problem here	-0.124939
-0.431896	the const_cast operator here	-0.124939
-0.333794	The ?: operator here	-0.124939
-0.352859	check on n here	-0.124939
-0.496556	done at runtime here	-0.124939
-0.350755	Non-polymorphic functions go here	-0.124939
-0.640337	the advice given here	-0.124939
-0.336246	not cost anything here	-0.124939
-0.331831	languages. www.yeppp.info And here	-0.124939
-0.044142	that is said here	-0.425969
-0.314659	decomposition. Functional decomposition here	-0.124939
-0.237845	message is provoked here	-0.124939
-0.601102	resources of the core	-0.124939
-0.595562	CPUs: use the core	-0.124939
-0.833594	clock cycles. The core	-0.124939
-0.358171	clock frequency. The core	-0.124939
-0.358765	the table are core	-0.124939
-1.862995	in the same core	-0.124939
-1.222555	that the CPU core	-0.124939
-0.638364	only one CPU core	-0.124939
-0.376599	a specific CPU core	-0.124939
-0.588256	processors is called core	-0.124939
-0.356392	service routines, system core	-0.124939
-0.459745	access. The execution core	-0.124939
-1.061703	the same processor core	-0.124939
-0.649659	a dedicated microprocessor core	-0.124939
-0.349080	The AMD math core	-0.124939
-0.339470	AMD AMD Math core	-0.124939
-0.600793	answers in the relevant	-0.124939
-0.878174	with all the relevant	-0.124939
-1.749972	that it is relevant	-0.124939
-0.572470	for size is relevant	-0.124939
-0.557061	for speed is relevant	-0.124939
-0.527069	could possibly be relevant	-0.124939
-1.276303	it is more relevant	-0.124939
-0.386044	version with all relevant	-0.124939
-0.386044	out with all relevant	-0.124939
-0.386044	Even with all relevant	-0.124939
-0.558271	turn on all relevant	-0.124939
-0.586601	manual is also relevant	-0.124939
-0.356590	to receive new relevant	-0.124939
-0.351648	Command line options relevant	-0.124939
-0.339457	these are hardly relevant	-0.124939
-0.314708	directives and keywords relevant	-0.124939
-0.294191	almost all respects relevant	-0.124939
-0.961774	the vector registers are:	-0.124939
-0.501798	rather than pointers are:	-0.124939
-1.181279	object oriented programming are:	-0.124939
-0.782730	the register stack are:	-0.124939
-0.784118	dynamic memory allocation are:	-0.124939
-0.580525	of CPU dispatching are:	-0.124939
-0.333137	of dynamic linking are:	-0.124939
-0.333137	than dynamic linking are:	-0.124939
-0.350772	rather than references are:	-0.124939
-0.748214	of function inlining are:	-0.124939
-0.341755	of loop iterations are:	-0.124939
-0.339388	malloc and free are:	-0.124939
-0.339411	problems with profilers are:	-0.124939
-0.331839	The positive effects are:	-0.124939
-0.421277	it. Possible solutions are:	-0.124939
-0.237853	YMM registers. Disadvantages are:	-0.124939
-0.527254	dates back to around	-0.124939
-0.553584	way to work around	-0.124939
-0.455279	have #if directives around	-0.124939
-1.215660	are various ways around	-0.124939
-0.081884	fragmented and scattered around	-0.124939
-0.039015	to be scattered around	-0.124939
-0.039015	may be scattered around	-0.124939
-0.109371	that are scattered around	-0.124939
-0.185505	data are scattered around	-0.124939
-0.081884	many functions scattered around	-0.124939
-0.081884	files etc. scattered around	-0.124939
-0.331880	do not wrap around	-0.124939
-0.314751	lot of jumping around	-0.124939
-0.294200	are scattered randomly around	-0.124939
-0.023522	put a parenthesis around	-0.124939
-0.237894	identify the circumstances around	-0.124939
-0.574734	code up to 5	-0.124939
-0.358215	seconds count to 5	-0.124939
-0.358215	been incremented to 5	-0.124939
-0.504514	take 3 - 5	-0.124939
-0.526436	may take only 5	-0.124939
-1.387817	= b * 5	-0.124939
-0.351566	* 100 * 5	-0.124939
-0.644704	point addition takes 5	-0.124939
-0.351280	point operation takes 5	-0.124939
-0.357188	is typically between 5	-0.124939
-0.352310	cover graphics processors. 5	-0.124939
-0.773203	#elif INSTRSET == 5	-0.124939
-0.325321	hardware platform ....................................................................................... 5	-0.124939
-0.314688	optimal platform ........................................................................................... 5	-0.124939
-0.314688	usability ............................................................................................... 23 5	-0.124939
-0.294173	multiplying by 3, 5	-0.124939
-0.237869	from a website. 5	-0.124939
-1.871419	the function is replaced	-0.124939
-0.725921	function, m is replaced	-0.124939
-0.358329	operation. x*8 is replaced	-0.124939
-0.884837	if possible, and replaced	-0.124939
-1.188557	This can be replaced	-0.425969
-0.843277	size can be replaced	-0.124939
-0.571125	multiplication can be replaced	-0.124939
-0.571125	constants can be replaced	-0.124939
-0.571125	12.4b can be replaced	-0.124939
-0.591383	pointers may be replaced	-0.124939
-0.585474	constants will be replaced	-0.124939
-0.936325	can sometimes be replaced	-0.124939
-1.017986	template parameters are replaced	-0.124939
-0.995865	The compiler has replaced	-0.124939
-0.572161	compiler have been replaced	-0.124939
-0.355127	has its parameters replaced	-0.124939
-0.357656	b) {x = a;	-0.124939
-0.357656	c; x[0] = a;	-0.124939
-0.572303	S1 { int a;	-0.124939
-0.558604	{ short int a;	-0.124939
-0.558604	11 short int a;	-0.124939
-0.813388	{ public: int a;	-0.124939
-0.351304	+ 2;} int a;	-0.124939
-1.223100	+ b + a;	-0.124939
-0.341872	Example 7.24 float a;	-0.124939
-0.341872	Example 7.29a float a;	-0.124939
-0.341872	Example 14.2a float a;	-0.124939
-0.341872	Example 14.2b float a;	-0.124939
-0.349125	Example 7.11 bool a;	-0.124939
-0.037166	struct S1 {double a;	-0.425969
-0.102867	struct abc {int a;	-0.124939
-0.102867	struct Sab {int a;	-0.124939
-0.463167	are lots of things	-0.124939
-0.358558	a couple of things	-0.124939
-0.504940	overloaded operators for things	-0.124939
-0.586647	smart and other things	-0.124939
-1.204886	possible to do things	-0.124939
-0.794286	advantageous to do things	-0.124939
-0.794286	ways to do things	-0.124939
-0.357592	to do multiple things	-0.124939
-1.107688	There are two things	-0.124939
-0.461580	compiler does some things	-0.124939
-0.460146	times to simple things	-0.124939
-0.927305	ways of doing things	-0.124939
-1.402434	There are various things	-0.124939
-0.352855	listing reveals three things	-0.124939
-0.294163	can often reveal things	-0.124939
-0.294163	does some funny things	-0.124939
-0.237861	does quite ingenious things	-0.124939
-0.570538	positive or the negative	-0.124939
-0.358911	// u.d is negative	-0.124939
-0.590583	Alternatively, use a negative	-0.124939
-1.788295	to make a negative	-0.124939
-0.762095	software contains a negative	-0.124939
-0.724530	variable produces a negative	-0.124939
-0.562236	give overflow and negative	-0.124939
-0.462934	both positive and negative	-0.124939
-0.358808	these variables. The negative	-0.124939
-0.726494	and 1 for negative	-0.124939
-0.726423	can never be negative	-0.124939
-0.358765	if both are negative	-0.124939
-0.526715	2n and not negative	-0.124939
-0.355722	your software. A negative	-0.124939
-0.355722	bits differently. A negative	-0.124939
-0.568858	functions) has no negative	-0.124939
-0.357472	} A possible negative	-0.124939
-1.062806	in the code section	-0.124939
-0.552344	only the code section	-0.124939
-1.180224	makes the code section	-0.124939
-0.552344	Therefore, the code section	-0.124939
-0.486963	access. The code section	-0.124939
-0.486963	X The code section	-0.124939
-0.486963	features: The code section	-0.124939
-0.358586	programming languages. This section	-0.124939
-0.577691	end of this section	-0.124939
-0.353991	may skip this section	-0.124939
-0.353991	will conclude this section	-0.124939
-0.871371	Therefore, the data section	-0.124939
-0.531319	processes. The data section	-0.124939
-0.352396	Any writable data section	-0.124939
-0.549661	VIA. The next section	-0.124939
-0.237926	in assembly language", section	-0.124939
-1.423455	to do the reductions	-0.124939
-0.463482	do. All the reductions	-0.124939
-0.725465	to do more reductions	-0.124939
-0.520769	situations, and which reductions	-0.124939
-0.354494	77) shows which reductions	-0.124939
-0.357454	cases, while many reductions	-0.124939
-0.356213	the most simple reductions	-0.124939
-0.353631	point expressions. Most reductions	-0.124939
-0.165172	do the algebraic reductions	-0.124939
-0.165172	methods and algebraic reductions	-0.124939
-0.165172	cannot make algebraic reductions	-0.124939
-0.165172	do any algebraic reductions	-0.124939
-0.229228	do simple algebraic reductions	-0.124939
-0.165172	compiler. Many algebraic reductions	-0.124939
-0.331880	do such obvious reductions	-0.124939
-0.331880	at doing equivalent reductions	-0.124939
-0.314751	the CPU. Algebraic reductions	-0.124939
-0.724333	for example, to go	-0.124939
-0.881301	who want to go	-0.124939
-1.742198	is likely to go	-0.124939
-0.357877	simply predicted to go	-0.124939
-0.463524	loop counter and go	-0.124939
-0.591122	branch that can go	-0.124939
-1.354892	cc[]) { // go	-0.124939
-1.282547	to the function go	-0.124939
-1.042589	because it may go	-0.124939
-0.460589	written table may go	-0.124939
-0.653380	a branch will go	-0.124939
-0.355673	eight elements will go	-0.124939
-0.358142	// Non-polymorphic functions go	-0.124939
-0.461547	and public variables go	-0.124939
-0.355937	is, but must go	-0.124939
-0.355843	pointer arithmetic calculations go	-0.124939
-0.325331	that would otherwise go	-0.124939
-0.504840	eliminate everything that depends	-0.124939
-0.597125	method to use depends	-0.124939
-0.526512	in each vector depends	-0.124939
-1.331409	of a loop depends	-0.124939
-0.788535	that each value depends	-0.124939
-0.351372	its return value depends	-0.124939
-0.436792	loop control branch depends	-0.425969
-0.137700	that each calculation depends	-0.124939
-0.137700	where each calculation depends	-0.124939
-0.353769	the final application depends	-0.124939
-0.455974	if each addition depends	-0.124939
-1.067633	can be predicted depends	-0.124939
-0.349099	value of sum depends	-0.124939
-0.446269	parallelism. The gain depends	-0.124939
-0.314678	of this bookkeeping depends	-0.124939
-0.237861	to the truth depends	-0.124939
-0.358968	necessary. Take the example:	-0.124939
-0.358831	memory blocks, for example:	-0.124939
-0.590448	illustrated in this example:	-0.124939
-0.509812	compile time. For example:	-0.124939
-0.277437	comparisons, etc. For example:	-0.124939
-0.277437	many cases. For example:	-0.124939
-0.116931	pointed to. For example:	-0.124939
-0.116931	refers to. For example:	-0.124939
-0.277437	a structure. For example:	-0.124939
-0.277437	mixed sizes. For example:	-0.124939
-0.277437	table lookup. For example:	-0.124939
-0.509812	is valid. For example:	-0.124939
-0.277437	poorly predictable. For example:	-0.124939
-0.277437	be combined. For example:	-0.124939
-0.277437	eliminated completely. For example:	-0.124939
-0.586690	by the following example:	-0.124939
-0.347492	than ARRAYSIZE. Another example:	-0.124939
-0.358866	project together and tested	-0.124939
-0.537673	program should be tested	-0.124939
-0.537673	software should be tested	-0.124939
-0.537673	systems should be tested	-0.124939
-0.537673	servers should be tested	-0.124939
-0.856979	should also be tested	-0.124939
-0.558533	Programmers that have tested	-0.124939
-0.263448	compilers I have tested	-0.124939
-0.407167	manually. I have tested	-0.124939
-0.355903	16. Library versions tested	-0.124939
-0.520021	have not been tested	-0.124939
-0.551091	examples have been tested	-0.124939
-0.349103	can be further tested	-0.124939
-0.237918	reusable and well- tested	-0.124939
-0.358968	This removed the contentions	-0.124939
-0.353451	cache as when contentions	-0.124939
-0.456679	a matrix when contentions	-0.124939
-0.353451	more dramatic when contentions	-0.124939
-0.832536	If the cache contentions	-0.124939
-0.338543	will be cache contentions	-0.124939
-0.437857	can cause cache contentions	-0.124939
-0.461265	for level-2 cache contentions	-0.124939
-0.461265	if level-2 cache contentions	-0.124939
-0.519455	for level-1 cache contentions	-0.124939
-0.503110	consequence of such contentions	-0.124939
-0.462190	likely to cause contentions	-0.124939
-0.482319	stride and cause contentions	-0.124939
-0.528876	this can cause contentions	-0.124939
-0.208770	{ // Cache contentions	-0.124939
-0.091959	96 9.10 Cache contentions	-0.124939
-0.091959	opposite). 9.10 Cache contentions	-0.124939
-0.556766	innermost loop is predicted	-0.124939
-0.557233	same way is predicted	-0.124939
-0.787291	target address is predicted	-0.124939
-1.592463	likely to be predicted	-0.124939
-1.051615	that can be predicted	-0.124939
-1.534920	it can be predicted	-0.124939
-0.581378	inside can be predicted	-0.124939
-1.467641	can also be predicted	-0.124939
-0.581728	size cannot be predicted	-0.124939
-0.572251	otherwise would be predicted	-0.124939
-1.367546	if they are predicted	-0.124939
-0.106870	Nested loops are predicted	-0.425969
-0.598722	branches is not predicted	-0.124939
-0.570774	labels is simply predicted	-0.124939
-0.562513	loop-branch is usually predicted	-0.124939
-1.626573	one of the main	-0.124939
-0.591552	function in the main	-0.124939
-0.591552	loop in the main	-0.124939
-0.882323	works in the main	-0.124939
-0.598077	proxy for the main	-0.124939
-0.594100	CPU than the main	-0.124939
-0.592463	call from the main	-0.124939
-0.589751	process where the main	-0.124939
-0.587349	registers instead of main	-0.124939
-0.462388	the version in main	-0.124939
-0.849430	global variable in main	-0.124939
-0.357946	the instance in main	-0.124939
-0.568275	Fortran code. The main	-0.124939
-1.128477	instruction set. The main	-0.124939
-0.357560	is running. The main	-0.124939
-0.762911	be accessed from main	-0.124939
-1.107945	There are two main	-0.124939
-0.565039	all pointers and references	-0.124939
-0.089894	references Pointers and references	-0.124939
-0.042625	7.6 Pointers and references	-0.124939
-0.763885	through pointers or references	-0.124939
-0.596448	pointers rather than references	-0.124939
-0.504218	will have more references	-0.124939
-1.225408	advantages of using references	-0.124939
-0.587490	avoided by using references	-0.124939
-0.356157	declared as constant references	-0.124939
-0.411434	efficient because relative references	-0.124939
-0.317392	and mostly relative references	-0.124939
-0.339433	DLL use absolute references	-0.124939
-0.336253	calculation of self-relative references	-0.124939
-0.331864	references Pointers versus references	-0.124939
-0.237877	pointers and non-constant references	-0.124939
-0.237877	function calls. Internal references	-0.124939
-1.036389	the program is loaded	-0.124939
-0.619040	the library is loaded	-0.124939
-0.549627	dynamic library is loaded	-0.124939
-0.358871	and esp+12 and loaded	-0.124939
-1.190575	has to be loaded	-0.124939
-0.577607	sure to be loaded	-0.124939
-1.057866	pointer can be loaded	-0.124939
-0.591272	section can be loaded	-0.124939
-0.592244	modules may be loaded	-0.124939
-0.586565	line will be loaded	-0.124939
-0.581728	address cannot be loaded	-0.124939
-0.577906	that must be loaded	-0.124939
-0.179526	dynamic libraries are loaded	-0.124939
-0.794159	which is typically loaded	-0.124939
-0.237926	or an over- loaded	-0.124939
-0.575634	about whether the positive	-0.124939
-0.588844	exponent is a positive	-0.124939
-1.053140	N is a positive	-0.124939
-0.588438	is that a positive	-0.124939
-1.781460	to make a positive	-0.124939
-0.761317	software contains a positive	-0.124939
-0.463496	program performance. The positive	-0.124939
-1.085557	works only for positive	-0.124939
-0.462661	is 0 for positive	-0.124939
-0.358308	unsigned variables. A positive	-0.124939
-0.357600	to compare two positive	-0.124939
-0.524944	up to some positive	-0.124939
-0.543335	as a large positive	-0.124939
-0.980263	a very large positive	-0.124939
-0.434061	v.f if both positive	-0.124939
-0.335520	style has both positive	-0.124939
-0.483845	produces a low positive	-0.124939
-1.064642	out of the loop.	-0.124939
-0.893343	iteration of the loop.	-0.124939
-0.927870	branch inside the loop.	-0.124939
-0.877042	calculations inside the loop.	-0.124939
-0.489334	etc.) inside the loop.	-0.124939
-0.462792	done outside the loop.	-0.124939
-0.462792	move outside the loop.	-0.124939
-0.525566	change during the loop.	-0.124939
-0.357759	to exit the loop.	-0.124939
-0.940756	to unroll a loop.	-0.124939
-0.524824	after the test loop.	-0.124939
-0.785736	in the innermost loop.	-0.124939
-0.356908	inside the innermost loop.	-0.124939
-0.502193	outside the innermost loop.	-0.124939
-0.416123	the critical innermost loop.	-0.124939
-0.294247	be an infinite loop.	-0.124939
-1.306892	every time the computer	-0.124939
-0.892508	automatically when the computer	-0.124939
-0.889568	simultaneously. If the computer	-0.124939
-0.661086	turn off the computer	-0.124939
-0.414656	log off the computer	-0.124939
-0.449563	or until the computer	-0.124939
-0.449563	file until the computer	-0.124939
-0.357902	to restart the computer	-0.124939
-0.591211	objects in a computer	-0.124939
-0.591211	smaller in a computer	-0.124939
-0.704345	of objects in computer	-0.124939
-0.491417	graphics objects in computer	-0.124939
-0.504169	never used. A computer	-0.124939
-0.358029	cycle on one computer	-0.124939
-0.568376	resource for many computer	-0.124939
-0.950192	a Pentium 4 computer	-0.124939
-0.351666	Use an old computer	-0.124939
-1.954778	is that the overhead	-0.124939
-1.159482	to avoid the overhead	-0.124939
-0.526861	driver involves the overhead	-0.124939
-0.463106	without invoking the overhead	-0.124939
-0.864631	the function. The overhead	-0.124939
-0.489356	member function. The overhead	-0.124939
-0.502211	inlining are: The overhead	-0.124939
-0.502211	between threads. The overhead	-0.124939
-0.762057	little or no overhead	-0.124939
-0.486194	away the extra overhead	-0.124939
-0.533922	be no extra overhead	-0.124939
-0.321990	may need extra overhead	-0.124939
-0.321990	an 9 extra overhead	-0.124939
-0.444795	of the large overhead	-0.124939
-0.792704	is a large overhead	-0.124939
-0.554997	involve a high overhead	-0.124939
-0.347478	is very little overhead	-0.124939
-0.017750	on AMD and VIA	-0.602060
-0.004370	Intel, AMD and VIA	-0.359022
-0.358724	Intel, AMD or VIA	-0.124939
-0.325438	fail to recognize VIA	-0.124939
-0.358963	for holding the pointer.	-0.124939
-1.184174	converted to a pointer.	-0.124939
-0.583675	table or a pointer.	-0.124939
-0.579546	restriction from a pointer.	-0.124939
-0.560468	actually making a pointer.	-0.124939
-0.868690	accessed through a pointer.	-0.124939
-0.538819	behaves like a pointer.	-0.124939
-0.572516	as a memory pointer.	-0.124939
-0.928624	of the member pointer.	-0.124939
-0.350145	the same member pointer.	-0.124939
-1.016671	to the stack pointer.	-0.124939
-0.198229	or a smart pointer.	-0.124939
-0.198229	need a smart pointer.	-0.124939
-0.198229	through a smart pointer.	-0.124939
-0.294258	need the 'this' pointer.	-0.124939
-0.382886	need a 'this' pointer.	-0.124939
-0.314737	through a hidden pointer.	-0.124939
-0.826066	a compiler that supports	-0.124939
-0.358094	a microprocessor that supports	-0.124939
-0.590578	threads. The compiler supports	-0.124939
-1.391189	The Intel compiler supports	-0.124939
-0.518374	(The Microsoft compiler supports	-0.124939
-0.352859	(The PGI compiler supports	-0.124939
-0.358319	make utility. It supports	-0.124939
-1.143770	that the CPU supports	-0.124939
-0.565723	than the CPU supports	-0.124939
-0.568390	instruction. The CPU supports	-0.124939
-0.579117	bit instruction set supports	-0.124939
-0.579117	x86-64 instruction set supports	-0.124939
-0.561084	platform, but also supports	-0.124939
-0.347470	processor model N supports	-0.124939
-0.764711	My test tool supports	-0.124939
-0.237894	that model N+1 supports	-0.124939
-0.237894	development environment (IDE) supports	-0.124939
-1.368261	Note that the C	-0.124939
-0.358833	A, B and C	-0.124939
-0.463572	of arrays in C	-0.124939
-0.527056	will often be C	-0.124939
-0.357218	linked together with C	-0.124939
-0.357218	be manipulated with C	-0.124939
-1.191773	more resources than C	-0.124939
-0.339091	in the Gnu C	-0.124939
-1.428993	in a separate C	-0.124939
-0.346412	may choose either C	-0.124939
-0.023520	B = 2.2, C	-0.425969
-0.023520	the old fashioned C	-0.425969
-0.294173	includes the low-level C	-0.124939
-0.237869	problem. The official C	-0.124939
-0.805777	one that is compatible	-0.124939
-0.550684	set that is compatible	-0.124939
-0.550684	version that is compatible	-0.124939
-0.914931	the processor is compatible	-0.124939
-1.308356	may not be compatible	-0.124939
-0.895641	will not be compatible	-0.124939
-0.592010	it will be compatible	-0.124939
-0.532611	functions are not compatible	-0.124939
-0.773782	compilers are not compatible	-0.124939
-0.532611	names are not compatible	-0.124939
-0.581332	set. The most compatible	-0.124939
-0.460596	is not even compatible	-0.124939
-0.343695	Windows are fully compatible	-0.124939
-0.339469	respects and highly compatible	-0.124939
-0.314745	sequence of backwards compatible	-0.124939
-0.237917	are not backwards compatible	-0.124939
-0.294210	compiler is mostly compatible	-0.124939
-1.070567	because of a change	-0.124939
-0.463608	not allowed to change	-0.124939
-0.463496	Hardware updating. The change	-0.124939
-0.717438	the loop can change	-0.124939
-1.101449	dynamic library can change	-0.124939
-0.582747	references. You can change	-0.124939
-0.521364	modern CPUs can change	-0.124939
-0.575382	caveats. We can change	-0.124939
-0.534350	a compiler may change	-0.124939
-1.342579	The compiler may change	-0.124939
-0.591543	adjusted if you change	-0.124939
-0.857576	optimizing compiler will change	-0.124939
-0.522513	a[] which will change	-0.124939
-0.539381	result if we change	-0.124939
-0.349525	const reference cannot change	-0.124939
-0.349525	optimized. We cannot change	-0.124939
-0.343710	most cases. Don't change	-0.124939
-1.075806	variable in the global	-0.124939
-1.270035	applied to a global	-0.124939
-1.532762	stored in a global	-0.124939
-0.888018	name as a global	-0.124939
-0.573477	Likewise, when a global	-0.124939
-0.357652	make log2 a global	-0.124939
-0.504984	of static and global	-0.124939
-0.313833	to static or global	-0.124939
-0.313833	on static or global	-0.124939
-0.358013	two names, one global	-0.124939
-0.583980	make a variable global	-0.124939
-0.357289	not make variables global	-0.124939
-0.535981	function are called global	-0.124939
-0.349368	its variables called global	-0.124939
-0.355871	may preferably avoid global	-0.124939
-0.354440	public variables. All global	-0.124939
-0.348313	page 26. Avoid global	-0.124939
-0.463604	The results of my	-0.124939
-0.358844	of computers and my	-0.124939
-0.356992	read about in my	-0.124939
-0.461176	512 matrix in my	-0.124939
-0.356992	A look in my	-0.124939
-0.356992	algebraic reductions in my	-0.124939
-0.722278	be found in my	-0.124939
-0.787284	the manual for my	-0.124939
-0.358156	and suggestions for my	-0.124939
-0.463487	Please note that my	-0.124939
-0.578997	been replaced by my	-0.124939
-1.365016	are based on my	-0.124939
-0.357132	based mainly on my	-0.124939
-0.357040	legal issue. See my	-0.124939
-0.503273	the code. For my	-0.124939
-0.353621	this topic, see my	-0.124939
-0.331849	reliable solution. (In my	-0.124939
-0.989472	can avoid the conversions	-0.124939
-0.504635	are: Avoid the conversions	-0.124939
-0.358654	variables. Move the conversions	-0.124939
-0.358040	C++ language, all conversions	-0.124939
-0.723736	Integer to float conversions	-0.124939
-0.782819	you don't need conversions	-0.124939
-0.460281	of different type conversions	-0.124939
-0.998491	order to avoid conversions	-0.124939
-0.444386	you cannot avoid conversions	-0.124939
-0.317756	extra time. These conversions	-0.124939
-0.317756	the pointer. These conversions	-0.124939
-0.317756	single precision. These conversions	-0.124939
-0.317756	syntax checks. These conversions	-0.124939
-0.348322	page 140. Avoid conversions	-0.124939
-0.435001	problem. 7.11 Type conversions	-0.124939
-0.237894	are floating point-to-integer conversions	-0.124939
-0.583266	last time the statement	-0.124939
-0.502921	loop, the if statement	-0.124939
-0.461731	135 The if statement	-0.124939
-0.358408	same after this statement	-0.124939
-0.869071	be only one statement	-0.124939
-0.803548	so that each statement	-0.124939
-0.472965	the function call statement	-0.124939
-0.472965	Each function call statement	-0.124939
-1.415309	the loop control statement	-0.124939
-0.233564	predict a switch statement	-0.124939
-0.233564	processors, a switch statement	-0.124939
-0.243839	tree or switch statement	-0.124939
-0.321823	targets. A switch statement	-0.124939
-0.243839	initializer lists, switch statement	-0.124939
-0.687902	an empty throw() statement	-0.124939
-0.341807	53). No general statement	-0.124939
-0.725932	common cause of errors	-0.124939
-0.526752	frequent source of errors	-0.124939
-0.358211	for preventing program errors	-0.124939
-0.599070	But the same errors	-0.124939
-0.487330	checks for such errors	-0.124939
-0.200096	to prevent such errors	-0.124939
-0.487646	other common programming errors	-0.124939
-0.346447	may catch programming errors	-0.124939
-0.573362	it can cause errors	-0.124939
-0.456855	ways of handling errors	-0.124939
-0.339415	unsafe because serious errors	-0.124939
-0.314698	can cause unpredictable errors	-0.124939
-0.314698	and rounding 137 errors	-0.124939
-0.237877	intended for detecting errors	-0.124939
-0.237877	and cause fatal errors	-0.124939
-0.358866	be optional and off	-0.124939
-0.358716	then turn it off	-0.124939
-0.093711	program to turn off	-0.124939
-0.093711	has to turn off	-0.124939
-0.093711	user to turn off	-0.124939
-0.093711	useful to turn off	-0.124939
-0.103873	recommended to turn off	-0.425969
-0.222821	versions and turn off	-0.124939
-0.159638	calculations or turn off	-0.124939
-0.348355	you turn them off	-0.124939
-0.339470	off or log off	-0.124939
-0.122660	used for turning off	-0.124939
-0.111560	program by turning off	-0.124939
-0.111560	just by turning off	-0.124939
-0.237918	and) will cut off	-0.124939
-1.419197	The number of unused	-0.124939
-0.358574	cause holes of unused	-0.124939
-0.577299	Put in an unused	-0.124939
-0.758958	from making an unused	-0.124939
-0.503499	13 // 2 unused	-0.124939
-0.752380	first // 4 unused	-0.124939
-0.349244	are also 4 unused	-0.124939
-0.357015	counter // For unused	-0.124939
-0.731320	of loop ; unused	-0.124939
-0.321199	; r ; unused	-0.124939
-0.321199	if true ; unused	-0.124939
-0.321199	;a ;r ; unused	-0.124939
-0.582062	add a few unused	-0.124939
-0.534656	possible to add unused	-0.124939
-0.287599	there are 6 unused	-0.124939
-0.287599	first // 6 unused	-0.124939
-0.358952	at explaining the relative	-0.124939
-0.595642	elements with a relative	-0.124939
-0.358589	it sees a relative	-0.124939
-0.858300	has support for relative	-0.124939
-0.764135	the code are relative	-0.124939
-0.550456	of each function relative	-0.124939
-0.536418	more efficient because relative	-0.124939
-0.354957	is smaller because relative	-0.124939
-0.928594	of the member relative	-0.124939
-0.994449	a data member relative	-0.124939
-0.554323	This will generate relative	-0.124939
-0.545798	If the offset relative	-0.124939
-0.294182	mode and mostly relative	-0.124939
-0.102852	but only self- relative	-0.124939
-0.102852	for calculating self- relative	-0.124939
-0.237877	in fact addressed relative	-0.124939
-0.961150	the number of columns	-0.425969
-0.065736	of rows and columns	-0.425969
-0.763848	The multiplication by columns	-0.124939
-0.462920	is faster when columns	-0.124939
-0.677557	{ // loop columns	-0.124939
-0.356387	rows // loop columns	-0.124939
-0.358101	<< 5. If columns	-0.124939
-0.356969	the last 8 columns	-0.124939
-0.350336	to add unused columns	-0.124939
-0.023522	rows = 20, columns	-0.425969
-0.294200	rows = 10, columns	-0.124939
-0.526734	add i to p	-0.124939
-1.161830	is added to p	-0.124939
-0.504273	will see that p	-0.124939
-0.658175	is clear that p	-0.124939
-0.526923	i; p = p	-0.124939
-0.659280	pointed to by p	-0.124939
-0.358565	same thing as p	-0.124939
-0.565954	after the pointer p	-0.124939
-0.549433	class of object p	-0.124939
-0.357438	obj1; C0 * p	-0.124939
-0.357032	be read before p	-0.124939
-0.591313	p; int i; p	-0.124939
-0.354877	equally fast whether p	-0.124939
-0.444306	at optimizing away p	-0.124939
-0.294163	&Object1; p->NotPolymorphic(); p->Hello(); p	-0.124939
-0.382771	CHello * p; p	-0.124939
-0.858806	choice for all platforms.	-0.124939
-0.136662	Linux and Windows platforms.	-0.124939
-0.331905	cases on Windows platforms.	-0.124939
-0.953177	Linux and Mac platforms.	-0.124939
-0.272645	guide for x86 platforms.	-0.124939
-0.089907	with all x86 platforms.	-0.124939
-0.476246	Supports all x86 platforms.	-0.124939
-0.343706	not standardized across platforms.	-0.124939
-0.341841	only on PC platforms.	-0.124939
-0.044149	x86 and x86-64 platforms.	-0.124939
-0.314727	for all Unix-like platforms.	-0.124939
-0.048392	for all major platforms.	-0.124939
-0.048392	on all major platforms.	-0.124939
-0.581473	C++, and other languages	-0.124939
-0.777639	faster than other languages	-0.124939
-0.356581	implementations. However, these languages	-0.124939
-0.533874	history of programming languages	-0.124939
-0.308697	called from programming languages	-0.124939
-0.280099	Several other programming languages	-0.124939
-0.280099	over other programming languages	-0.124939
-0.308697	other compiled programming languages	-0.124939
-0.308697	Several modern programming languages	-0.124939
-0.500664	useful in compiled languages	-0.124939
-0.509016	speed. This includes languages	-0.124939
-0.441980	advantage in interpreted languages	-0.124939
-0.314751	size, while high-level languages	-0.124939
-0.294200	development time. Interpreted languages	-0.124939
-0.237894	compiled code. Compiled languages	-0.124939
-0.237894	at hand. Low-level languages	-0.124939
-1.295646	rest of the installation	-0.124939
-0.896828	unusual for the installation	-0.124939
-0.526885	selected during the installation	-0.124939
-0.723555	or *.so). The installation	-0.124939
-0.357543	in use. The installation	-0.124939
-0.357543	be installed. The installation	-0.124939
-0.358814	The procedures for installation	-0.124939
-0.462783	below. Dispatch at installation	-0.124939
-0.358040	to select all installation	-0.124939
-0.356899	developers should take installation	-0.124939
-0.304379	time both during installation	-0.124939
-0.304379	framework itself, during installation	-0.124939
-0.343686	always use standardized installation	-0.124939
-0.093305	16 3.3 Program installation	-0.124939
-0.093305	sections. 3.3 Program installation	-0.124939
-0.325351	than by individual installation	-0.124939
-0.532796	Define function name depending	-0.124939
-0.557276	in various ways depending	-0.124939
-0.350318	clock frequency dynamically depending	-0.124939
-0.118891	16 clock cycles, depending	-0.124939
-0.118891	10 clock cycles, depending	-0.124939
-0.118891	6 clock cycles, depending	-0.124939
-0.118891	80 clock cycles, depending	-0.124939
-0.118891	25 clock cycles, depending	-0.124939
-0.339397	of the memory, depending	-0.124939
-0.498806	for 32-bit integers, depending	-0.124939
-0.314698	9 and 64, depending	-0.124939
-0.294182	three or four, depending	-0.124939
-0.237877	has several meanings depending	-0.124939
-0.237877	than example 12.4a, depending	-0.124939
-0.237877	a conditional move, depending	-0.124939
-0.237877	the following solutions, depending	-0.124939
-0.897291	sense that the syntax	-0.124939
-0.583605	efficient, but the syntax	-0.124939
-0.415242	called. Unfortunately, the syntax	-0.124939
-0.415242	this. Unfortunately, the syntax	-0.124939
-0.358910	of relieving a syntax	-0.124939
-0.357543	of x The syntax	-0.124939
-0.503081	pointers are: The syntax	-0.124939
-0.357543	cache space. The syntax	-0.124939
-0.658737	a little more syntax	-0.124939
-0.558458	behind the C++ syntax	-0.124939
-0.532794	conversions The C++ syntax	-0.124939
-0.461612	It has some syntax	-0.124939
-0.536703	same inline assembly syntax	-0.124939
-0.355862	BigArray[1024]; // Windows syntax	-0.124939
-0.355395	__attribute__((aligned(64))); // Linux syntax	-0.124939
-0.314751	way or bypassing syntax	-0.124939
-1.076720	50% of the cases.	-0.124939
-0.427001	value in most cases.	-0.124939
-0.427001	optimal in most cases.	-0.124939
-0.427001	lists in most cases.	-0.124939
-0.503096	solution in such cases.	-0.124939
-0.353009	variable in many cases.	-0.124939
-0.353009	explicitly in many cases.	-0.124939
-0.467214	performance in some cases.	-0.124939
-0.467214	solution in some cases.	-0.124939
-0.467214	structure in some cases.	-0.124939
-0.164741	file in simple cases.	-0.124939
-0.252340	automatically in simple cases.	-0.124939
-0.164741	least in simple cases.	-0.124939
-0.581160	in the best cases.	-0.124939
-0.551286	same in both cases.	-0.124939
-0.788306	in the simplest cases.	-0.124939
-0.352284	the newest processors. Supports	-0.124939
-0.453815	the Microsoft compiler. Supports	-0.124939
-0.348269	Open source library. Supports	-0.124939
-0.347399	the Intel libraries. Supports	-0.124939
-0.346389	Optimizes moderately well. Supports	-0.124939
-0.890065	later instruction sets. Supports	-0.124939
-0.345163	others are not. Supports	-0.124939
-0.325291	good optimization options. Supports	-0.124939
-0.314659	produce binary code). Supports	-0.124939
-0.382748	and automatic parallelization. Supports	-0.124939
-0.538741	32-bit and 64-bit. Supports	-0.124939
-0.382748	Linux and Mac. Supports	-0.124939
-0.294145	library. Open source. Supports	-0.124939
-0.237845	(SDK or PSDK). Supports	-0.124939
-0.237845	fully optimized yet. Supports	-0.124939
-0.237845	and possible workaround. Supports	-0.124939
-0.600391	foremost, in the choice	-0.124939
-1.797151	is that the choice	-0.124939
-0.589789	discussion that the choice	-0.124939
-0.658737	compilers offer the choice	-0.124939
-0.462920	computers. Today, the choice	-0.124939
-0.356295	hardware platform The choice	-0.124939
-0.460289	Windows applications. The choice	-0.124939
-0.356295	B values. The choice	-0.124939
-0.356295	Graphics accelerators The choice	-0.124939
-0.356295	best algorithm. The choice	-0.124939
-0.195542	is a good choice	-0.602060
-0.465203	a very good choice	-0.124939
-0.569573	be the optimal choice	-0.124939
-0.716648	with a suitable choice	-0.124939
-0.527310	decimal point is 1.	-0.124939
-0.152835	than 0 and 1.	-0.124939
-0.657292	for N = 1.	-0.124939
-0.357640	reciprocal_divisor; reciprocal_divisor = 1.	-0.124939
-0.365450	be 0 or 1.	-0.124939
-0.198155	than 0 or 1.	-0.124939
-0.657275	will evict number 1.	-0.124939
-0.382816	with this problem: 1.	-0.124939
-0.237894	uses CPU dispatching: 1.	-0.124939
-0.237894	a function local: 1.	-0.124939
-0.237894	of five manuals: 1.	-0.124939
-0.237894	conditions are satisfied: 1.	-0.124939
-1.056514	performance of the STL	-0.124939
-0.594372	generality of the STL	-0.124939
-0.594372	flexibility of the STL	-0.124939
-1.060883	classes in the STL	-0.124939
-0.595875	containers in the STL	-0.124939
-0.561594	purposes. However, the STL	-0.124939
-0.658117	In fact, the STL	-0.124939
-1.439611	are used in STL	-0.124939
-0.462983	a matrix in STL	-0.124939
-0.577306	stored in an STL	-0.124939
-0.537874	one, into an STL	-0.124939
-1.249834	Do not use STL	-0.124939
-0.358111	vector. The other STL	-0.124939
-0.356079	the STL. Some STL	-0.124939
-0.325371	every four objects. STL	-0.124939
-0.382839	in the container. STL	-0.124939
-1.658209	that it is intended	-0.124939
-0.880656	than it is intended	-0.124939
-1.056653	This function is intended	-0.124939
-1.166503	of code is intended	-0.124939
-0.596283	temporarily. This is intended	-0.124939
-0.589150	systems. It is intended	-0.124939
-0.589150	IDE. It is intended	-0.124939
-0.759508	Exception handling is intended	-0.124939
-0.537988	This feature is intended	-0.124939
-0.356249	symbol interposition is intended	-0.124939
-0.888596	libraries that are intended	-0.124939
-0.462453	The examples are intended	-0.124939
-0.463226	indeed vectorized as intended	-0.124939
-1.648962	It is not intended	-0.124939
-0.550265	execute slower than intended	-0.124939
-0.442036	physics processing unit intended	-0.124939
-1.009988	the addresses of dynamically	-0.124939
-1.051759	objects stored in dynamically	-0.124939
-0.586423	object must be dynamically	-0.124939
-0.595245	resource, such as dynamically	-0.124939
-0.503934	pointers to different dynamically	-0.124939
-0.662251	that is allocated dynamically	-0.124939
-0.242867	can be allocated dynamically	-0.301030
-0.459221	uses many small dynamically	-0.124939
-0.563047	their clock frequency dynamically	-0.124939
-0.733371	how to align dynamically	-0.124939
-0.102866	exp 12.8 Aligning dynamically	-0.124939
-0.102866	119 12.8 Aligning dynamically	-0.124939
-0.314698	discussion of aligning dynamically	-0.124939
-0.237877	frequency may vary dynamically	-0.124939
-0.805991	a sequence of consecutive	-0.124939
-0.764122	are identified by consecutive	-0.124939
-0.356917	line covers 64 consecutive	-0.124939
-0.355112	loop calculates four consecutive	-0.124939
-0.010338	vector in eight consecutive	-0.726999
-0.002561	// Load eight consecutive	-1.028029
-0.594614	execute then the profiler	-0.124939
-0.557577	computer, including the profiler	-0.124939
-0.594280	run with a profiler	-0.124939
-0.462035	the way a profiler	-0.124939
-0.143092	3.2 Use a profiler	-0.425969
-0.503256	packages include a profiler	-0.124939
-0.355657	optimized programs. The profiler	-0.124939
-0.142395	Event-based sampling: The profiler	-0.124939
-0.142395	Time-based sampling: The profiler	-0.124939
-0.459479	other processes. The profiler	-0.124939
-0.355657	every millisecond. The profiler	-0.124939
-0.355657	takes. Debugging. The profiler	-0.124939
-0.358334	page 153. A profiler	-0.124939
-0.339477	their CPUs. Intel's profiler	-0.124939
-0.237926	called VTune; AMD's profiler	-0.124939
-0.826379	the memory to become	-0.124939
-0.849599	they point to become	-0.124939
-0.549737	almost certain to become	-0.124939
-0.462321	heap space to become	-0.124939
-1.375463	the code can become	-0.124939
-0.356830	then measurements can become	-0.124939
-0.356830	CPUs unequally can become	-0.124939
-0.501259	that computers have become	-0.124939
-0.356239	software projects have become	-0.124939
-0.358329	such feature will become	-0.124939
-0.358326	old block then become	-0.124939
-0.349191	suboptimal way has become	-0.124939
-0.349191	heap space has become	-0.124939
-0.349191	hardware platform has become	-0.124939
-0.349191	the heap has become	-0.124939
-0.502361	heap can easily become	-0.124939
-0.586634	processor and a Windows,	-0.124939
-1.096415	For example, in Windows,	-0.124939
-0.850964	Gnu compiler for Windows,	-0.124939
-0.460997	/arch:AVX etc. for Windows,	-0.124939
-0.310235	optimization guide for Windows,	-0.425969
-0.463249	common platforms with Windows,	-0.124939
-0.995777	32- and 64-bit Windows,	-0.124939
-0.541974	parameters. In 64-bit Windows,	-0.124939
-0.357522	are short. In Windows,	-0.124939
-0.999733	compiler for 32-bit Windows,	-0.124939
-0.452459	#else // 32-bit Windows,	-0.124939
-0.582661	Supported operating systems Windows,	-0.124939
-0.353806	Windows, Linux, Mac Windows,	-0.124939
-0.347463	DOS and 16-bit Windows,	-0.124939
-0.331849	avoid this. (In Windows,	-0.124939
-0.599641	error if the index	-0.124939
-0.358808	for multiplying the index	-0.124939
-0.463494	// Check that index	-0.124939
-0.358590	of j as index	-0.124939
-0.358577	index, i. This index	-0.124939
-0.559469	or with an index	-0.124939
-0.356723	constant plus an index	-0.124939
-0.549387	case the array index	-0.124939
-0.436605	is an array index	-0.124939
-0.436605	if an array index	-0.124939
-0.700578	as an array index	-0.124939
-0.334183	Safe [] array index	-0.124939
-0.519462	identified by their index	-0.124939
-0.387024	that the last index	-0.124939
-0.387024	with the last index	-0.124939
-0.237902	may write FatalAppExitA(0,"Array index	-0.124939
-0.591848	addition on a modern	-0.124939
-0.357571	vector operations of modern	-0.124939
-0.555565	high speed of modern	-0.124939
-0.357571	execution core of modern	-0.124939
-0.357571	out-of-order capabilities of modern	-0.124939
-0.357571	high complexity of modern	-0.124939
-0.358817	quite inefficient. The modern	-0.124939
-1.184709	reason is that modern	-0.124939
-1.307148	This is because modern	-0.124939
-0.493642	supported by all modern	-0.124939
-0.350771	reason why all modern	-0.124939
-0.643703	in almost all modern	-0.124939
-0.657770	comes with most modern	-0.124939
-0.354449	order execution All modern	-0.124939
-0.353631	chapter 12. Most modern	-0.124939
-0.336252	is Perl. Several modern	-0.124939
-1.072454	the one that gives	-0.124939
-0.502832	the method that gives	-0.124939
-0.502832	the option that gives	-0.124939
-0.592813	useful because it gives	-0.124939
-0.876933	kinds of code gives	-0.124939
-0.358563	external clock. This gives	-0.124939
-0.358054	a double which gives	-0.124939
-0.893683	AVX2 instruction set gives	-0.124939
-0.761108	of these two gives	-0.124939
-0.460934	but it often gives	-0.124939
-1.313701	in 32-bit systems gives	-0.124939
-0.351251	The calculation here gives	-0.124939
-0.345177	class library, SSE4.1 gives	-0.124939
-0.836006	and VIA CPUs" gives	-0.124939
-0.382771	with all 0's gives	-0.124939
-0.294163	N1 = N&(N-1) gives	-0.124939
-0.468789	i++) { // Loop	-0.124939
-0.543046	TILESIZE) { // Loop	-0.124939
-0.352603	with branch // Loop	-0.124939
-0.064995	// Table // Loop	-0.425969
-0.352603	by TILESIZE // Loop	-0.124939
-2.163929	- x x Loop	-0.124939
-1.053925	1; } } Loop	-0.124939
-0.501973	total execution time. Loop	-0.124939
-0.928025	in the compiler. Loop	-0.124939
-0.887526	in this case. Loop	-0.124939
-0.800181	the unroll factor. Loop	-0.124939
-0.294200	branch is eliminated. Loop	-0.124939
-0.237894	// Example 12.4a. Loop	-0.124939
-0.237894	// Example 8.23a. Loop	-0.124939
-0.726808	parameter transfer is avoided	-0.124939
-0.947694	This can be avoided	-0.301030
-0.819613	example can be avoided	-0.124939
-0.819613	pointers can be avoided	-0.124939
-0.819613	constant can be avoided	-0.124939
-0.558319	Jumps can be avoided	-0.124939
-0.558319	(b+c) can be avoided	-0.124939
-0.546511	penalty should be avoided	-0.124939
-0.546511	wide, should be avoided	-0.124939
-0.546511	malloc/free should be avoided	-0.124939
-1.554874	should preferably be avoided	-0.124939
-0.919333	can sometimes be avoided	-0.124939
-0.200725	should definitely be avoided	-0.124939
-0.592951	aliasing is to turn	-0.124939
-0.523762	your program to turn	-0.124939
-0.864582	user has to turn	-0.124939
-0.655086	the user to turn	-0.124939
-1.229378	be useful to turn	-0.124939
-0.852125	is recommended to turn	-0.301030
-0.358396	are using and turn	-0.124939
-0.358396	two versions and turn	-0.124939
-0.358887	function which in turn	-0.124939
-1.271493	that you can turn	-0.124939
-0.358692	point calculations or turn	-0.124939
-0.575352	editions). Do not turn	-0.124939
-0.358493	on until you turn	-0.124939
-0.358323	for RTTI then turn	-0.124939
-0.599805	function if the inlining	-0.124939
-0.463559	away p and inlining	-0.124939
-0.358827	a request for inlining	-0.124939
-0.460684	disadvantage of function inlining	-0.124939
-0.460684	advantages of function inlining	-0.124939
-0.546740	allocation and function inlining	-0.124939
-0.355002	able do function inlining	-0.124939
-0.106413	leaf function by inlining	-0.425969
-1.303681	be avoided by inlining	-0.124939
-1.290321	be improved by inlining	-0.124939
-0.585264	modules. This makes inlining	-0.124939
-0.310011	Optimization method Function inlining	-0.124939
-0.310011	inlined function. Function inlining	-0.124939
-0.310011	non-inlined copy Function inlining	-0.124939
-0.310011	know about. Function inlining	-0.124939
-0.901190	regardless of the size.	-0.124939
-0.463486	without specifying the size.	-0.124939
-0.540851	for speed or size.	-0.124939
-0.588803	terms of code size.	-0.124939
-0.850072	of the vector size.	-0.124939
-0.352134	divisible by vector size.	-0.124939
-0.352134	the larger vector size.	-0.124939
-0.343750	than the cache size.	-0.124939
-0.447511	access and cache size.	-0.124939
-0.971213	the level-1 cache size.	-0.124939
-0.539364	arrays of variable size.	-0.124939
-0.541837	the vector register size.	-0.124939
-0.348819	largest available register size.	-0.124939
-0.853306	of a specific size.	-0.124939
-0.459134	the matrix line size.	-0.124939
-0.880885	problems if the network	-0.124939
-0.590816	minute if the network	-0.124939
-1.046278	situation where the network	-0.124939
-0.599535	PC's in a network	-0.124939
-0.591429	tested on a network	-0.124939
-0.358905	and interfaces to network	-0.124939
-0.526489	file access and network	-0.124939
-0.549918	Open files and network	-0.124939
-0.358817	be controlled. The network	-0.124939
-0.835520	response times for network	-0.124939
-0.659368	user input or network	-0.124939
-0.358627	of software with network	-0.124939
-0.548952	that depend on network	-0.124939
-0.538496	that relies on network	-0.124939
-0.294200	or accessing databases, network	-0.124939
-0.358960	and BSD, the slow	-0.124939
-0.591231	// This is slow	-0.425969
-0.592625	division, which is slow	-0.124939
-0.585635	running, and a slow	-0.124939
-0.572875	CPUs with a slow	-0.124939
-0.572875	computer with a slow	-0.124939
-0.835416	point comparisons are slow	-0.124939
-0.805269	for CPUs with slow	-0.124939
-0.460601	Function calls may slow	-0.124939
-0.356540	and writes may slow	-0.124939
-0.358063	test setup but slow	-0.124939
-0.356387	table lookup operations slow	-0.124939
-0.311749	CPUs have particularly slow	-0.124939
-0.311749	or any particularly slow	-0.124939
-0.357721	= (a + b)	-0.124939
-0.357621	vector(float a, float b)	-0.124939
-0.357183	n.a. !(a < b)	-0.124939
-0.568365	T const & b)	-0.124939
-0.084108	of (2n / b)	-0.124939
-0.084108	* (2n / b)	-0.124939
-0.084108	constant (2n / b)	-0.124939
-0.651270	? a : b)	-0.124939
-0.352633	= !(a || b)	-0.124939
-0.352643	* c > b)	-0.124939
-0.349719	= (a >= b)	-0.124939
-0.011883	(int a, bool b)	-0.726999
-0.358866	<=, > and >=	-0.124939
-0.347199	0 and i >=	-0.124939
-0.347199	0 || i >=	-0.124939
-0.347199	= 2.0; i >=	-0.124939
-0.347213	int if (i >=	-0.124939
-0.488706	... if (i >=	-0.124939
-0.439009	b) = (a >=	-0.124939
-0.011603	} if (level >=	-0.124939
-0.011603	else if (level >=	-0.124939
-0.005763	branches): if (level >=	-0.425969
-0.065804	else { (iset >=	-0.124939
-0.065804	= &SelectAddMul_AVX2; (iset >=	-0.124939
-0.065804	= &SelectAddMul_SSE41; (iset >=	-0.124939
-0.538888	if ((unsigned int)i >=	-0.124939
-0.898645	length of the desired	-0.124939
-0.576618	pointer to the desired	-0.124939
-0.564435	made to the desired	-0.124939
-0.564435	go to the desired	-0.124939
-1.145707	compiled for the desired	-0.124939
-0.586704	appropriate for the desired	-0.124939
-0.561086	may put the desired	-0.124939
-0.992744	to enable the desired	-0.124939
-0.065599	to obtain the desired	-0.124939
-0.726768	be initialized to desired	-0.124939
-0.818424	usability problems and desired	-0.124939
-0.358653	function type with desired	-0.124939
-0.954820	they are used. Such	-0.124939
-0.346390	going either way. Such	-0.124939
-0.341765	the application software. Such	-0.124939
-0.339397	so-called soft processor. Such	-0.124939
-0.339376	hardware definition language. Such	-0.124939
-0.859565	loop-carried dependency chain. Such	-0.124939
-0.325311	they are running. Such	-0.124939
-0.538774	on the market. Such	-0.124939
-0.102846	the same chip. Such	-0.124939
-0.102846	the CPU chip. Such	-0.124939
-0.237861	operating system standards. Such	-0.124939
-0.237861	on hardware identification. Such	-0.124939
-0.237861	in computer games. Such	-0.124939
-0.237861	__thread or __declspec(thread). Such	-0.124939
-0.237861	are not reproducible. Such	-0.124939
-1.814946	to use the #pragma	-0.124939
-0.597496	code when the #pragma	-0.124939
-0.461908	or __restrict or #pragma	-0.124939
-0.357568	to vectorize, or #pragma	-0.124939
-0.358677	always Optimize function #pragma	-0.124939
-0.562373	used, then use #pragma	-0.124939
-0.828393	#pragma vector always #pragma	-0.124939
-0.354155	__fastcall Noncached write #pragma	-0.124939
-0.326941	pointer is aligned #pragma	-0.124939
-0.746240	#pragma vector aligned #pragma	-0.124939
-0.799465	#pragma vector nontemporal #pragma	-0.124939
-0.279441	#pragma ivdep __restrict #pragma	-0.124939
-0.279441	__declspec( noalias) __restrict #pragma	-0.124939
-0.237886	pointer not aliased #pragma	-0.124939
-0.237886	((visibility ("internal"))) Vectorize #pragma	-0.124939
-0.490034	in system code. Dynamic	-0.124939
-0.804373	any extra code. Dynamic	-0.124939
-0.832405	linking is used. Dynamic	-0.124939
-1.410654	Dynamic memory allocation Dynamic	-0.124939
-1.204366	caching less efficient. Dynamic	-0.124939
-0.992693	the end user. Dynamic	-0.124939
-1.152139	dynamic memory allocation. Dynamic	-0.124939
-0.629840	data caching inefficient. Dynamic	-0.124939
-0.331838	64-bit systems). 28 Dynamic	-0.124939
-0.325331	to optimization are. Dynamic	-0.124939
-0.408091	allocations is limited. Dynamic	-0.124939
-0.102852	allocated memory. 9.6 Dynamic	-0.124939
-0.102852	...................................................................................................... 90 9.6 Dynamic	-0.124939
-0.102852	....................................................................................................... 19 3.6 Dynamic	-0.124939
-0.102852	is acceptable. 3.6 Dynamic	-0.124939
-0.462527	from seldom used functions,	-0.124939
-0.503244	CPU-time in library functions,	-0.124939
-0.461452	work with member functions,	-0.124939
-0.757230	pointers to its functions,	-0.124939
-0.355667	multiple inheritance, virtual functions,	-0.124939
-0.705072	support for intrinsic functions,	-0.124939
-0.311429	compiler supports intrinsic functions,	-0.124939
-0.311429	allow assembly-like intrinsic functions,	-0.124939
-0.345175	library contains similar functions,	-0.124939
-0.343716	log are pure functions,	-0.124939
-0.314708	limited to well-tested functions,	-0.124939
-0.037162	as logarithms, exponential functions,	-0.425969
-0.037162	exponential functions, trigonometric functions,	-0.425969
-0.597411	optimizations of the whole	-0.124939
-0.597411	understanding of the whole	-0.124939
-0.597307	T+6, and the whole	-0.124939
-0.598245	handling for the whole	-0.124939
-0.561983	have put the whole	-0.124939
-0.358056	mispredictions. Test the whole	-0.124939
-0.358056	occupied throughout the whole	-0.124939
-0.540982	to take a whole	-0.124939
-0.358605	that draws a whole	-0.124939
-1.157353	an option for whole	-0.124939
-0.856590	have support for whole	-0.124939
-0.585274	that can do whole	-0.124939
-0.502383	a feature called whole	-0.124939
-0.355804	allows "__attribute__((visibility("hidden")))". Use whole	-0.124939
-0.545390	way of doing whole	-0.124939
-0.990802	can avoid the inefficient	-0.124939
-0.591906	However, it is inefficient	-0.124939
-0.591906	words, it is inefficient	-0.124939
-0.597488	it. This is inefficient	-0.124939
-0.462117	each other is inefficient	-0.124939
-0.503346	Thread-local storage is inefficient	-0.124939
-0.835443	point comparisons are inefficient	-0.124939
-0.580118	allocation in an inefficient	-0.124939
-0.578842	other compilers have inefficient	-0.124939
-0.553018	code is very inefficient	-0.124939
-0.496726	in a very inefficient	-0.124939
-0.496726	certainly a very inefficient	-0.124939
-1.022936	can be very inefficient	-0.124939
-0.908318	can be quite inefficient	-0.124939
-0.336295	flexible, but quite inefficient	-0.124939
-0.596702	level-1 and the level-2	-0.124939
-0.374180	occur in the level-2	-0.124939
-0.873737	misses in the level-2	-0.124939
-1.147273	Contentions in the level-2	-0.124939
-0.597740	stride for the level-2	-0.124939
-1.935241	is that the level-2	-0.124939
-1.339768	bigger than the level-2	-0.124939
-0.592106	cache from the level-2	-0.124939
-0.566282	instruction prevents the level-2	-0.124939
-0.586161	Kbytes and a level-2	-0.124939
-0.358612	only if, a level-2	-0.124939
-0.570353	level-2 cache. The level-2	-0.124939
-0.358835	much stronger for level-2	-0.124939
-0.358695	// Check if level-2	-0.124939
-1.522997	sure that the response	-0.124939
-0.589620	advantage that the response	-0.124939
-0.574186	programs because the response	-0.124939
-0.574186	applications because the response	-0.124939
-0.595633	ms. If the response	-0.124939
-0.526227	should test the response	-0.124939
-0.570501	if such a response	-0.124939
-0.818287	is waiting for response	-0.124939
-0.062900	of unacceptably long response	-0.124939
-0.062900	by unacceptably long response	-0.124939
-0.062900	have unacceptably long response	-0.124939
-0.062900	experience unacceptably long response	-0.124939
-0.352691	cost of longer response	-0.124939
-0.294228	expects an immediate response	-0.124939
-0.237918	long and irregular response	-0.124939
-0.791344	This method is described	-0.124939
-0.358775	These algorithms are described	-0.124939
-0.352465	detection function as described	-0.124939
-0.495997	CPU-intensive code, as described	-0.124939
-0.042221	modern CPUs, as described	-0.425969
-0.088992	multi-core CPUs, as described	-0.124939
-0.587936	chapter, I have described	-0.124939
-0.560106	negative. The method described	-0.124939
-0.460292	for the cases described	-0.124939
-0.456607	by the methods described	-0.124939
-0.753727	Unfortunately, the syntax described	-0.124939
-0.349078	methods are further described	-0.124939
-0.382816	The preceding paragraph described	-0.124939
-1.014604	a power of 2.	-0.301030
-0.173445	high power of 2.	-0.124939
-0.556652	are powers of 2.	-0.124939
-1.056075	result will be 2.	-0.124939
-0.106656	divide i by 2.	-0.425969
-0.720123	rolled out by 2.	-0.124939
-0.350306	and Mac platforms. 2.	-0.124939
-0.345230	particular code version. 2.	-0.124939
-0.237910	a different meaning. 2.	-0.124939
-0.237910	and the loader. 2.	-0.124939
-0.237910	Intel or PathScale. 2.	-0.124939
-1.106805	different types of variables.	-0.124939
-1.365463	for the same variables.	-0.124939
-0.539850	values of all variables.	-0.124939
-0.526241	operators on integer variables.	-0.124939
-0.461944	never use static variables.	-0.124939
-1.158798	signed and unsigned variables.	-0.124939
-0.514007	use of register variables.	-0.124939
-0.479921	most for register variables.	-0.124939
-1.149923	floating point register variables.	-0.124939
-0.501715	avoided for these variables.	-0.124939
-0.456064	this with induction variables.	-0.124939
-0.352331	avoiding any public variables.	-0.124939
-0.584724	static or global variables.	-0.124939
-0.414335	are called global variables.	-0.124939
-0.350343	sequence of consecutive variables.	-0.124939
-2.074659	the number of lines	-0.124939
-0.717186	because the cache lines	-0.124939
-0.499236	all the cache lines	-0.124939
-0.535841	set of cache lines	-0.124939
-0.327272	eight different cache lines	-0.124939
-0.327272	loading any cache lines	-0.124939
-0.327272	of these cache lines	-0.124939
-0.040110	the four cache lines	-0.124939
-0.084304	only four cache lines	-0.124939
-0.462187	are organized into lines	-0.124939
-0.357015	of the 4 lines	-0.124939
-0.460477	corresponds to 16 lines	-0.124939
-0.355653	= 128. These lines	-0.124939
-1.021814	than a few lines	-0.124939
-0.463100	circumstances around the hot	-0.124939
-1.041257	for finding the hot	-0.124939
-0.358505	code once the hot	-0.124939
-0.463100	to isolate the hot	-0.124939
-1.679505	there is a hot	-0.124939
-0.358285	profiling. When a hot	-0.124939
-0.358285	to identify a hot	-0.124939
-0.573166	critical functions and hot	-0.124939
-0.578998	single function or hot	-0.124939
-0.590434	occur in this hot	-0.124939
-0.357379	profiler identifies any hot	-0.124939
-0.171445	profiler to find hot	-0.425969
-0.544019	intended for finding hot	-0.124939
-0.237902	useful for identifying hot	-0.124939
-0.712236	different integer types Unfortunately,	-0.124939
-0.709931	is never called. Unfortunately,	-0.124939
-0.518868	a[1] = 2; Unfortunately,	-0.124939
-0.821437	pure function calls. Unfortunately,	-0.124939
-0.510417	such container classes. Unfortunately,	-0.124939
-0.449063	of these purposes. Unfortunately,	-0.124939
-0.939776	automatic CPU dispatching. Unfortunately,	-0.124939
-0.339388	the virtual table. Unfortunately,	-0.124939
-0.779720	same processor core. Unfortunately,	-0.124939
-0.336201	to do this. Unfortunately,	-0.124939
-0.331839	non-member functions. 80 Unfortunately,	-0.124939
-0.408055	use AMD CodeAnalyst. Unfortunately,	-0.124939
-0.237853	of cross-platform portability. Unfortunately,	-0.124939
-0.237853	lrintf and lrint. Unfortunately,	-0.124939
-0.237853	on page 132. Unfortunately,	-0.124939
-0.447821	2009. Gnu C++ v.	-0.124939
-0.447821	Hat). PathScale C++ v.	-0.124939
-0.447821	2007. PGI C++ v.	-0.124939
-0.632764	Intel C++ compiler, v.	-0.124939
-0.342598	Intel C++ Compiler v.	-0.124939
-0.237050	Microsoft C++ Compiler v.	-0.124939
-0.248606	Digital Mars Compiler v.	-0.124939
-0.314745	Open Watcom C/C++ v.	-0.124939
-0.382805	2005. Codeplay VectorC v.	-0.124939
-0.237886	doesn't works (gcc v.	-0.124939
-0.237886	Kernel Library (MKL v.	-0.124939
-0.237886	2.7, 2.8. Asmlib: v.	-0.124939
-0.237886	CodeGear Borland bcc, v.	-0.124939
-0.237886	4.0.1. Gnu: Glibc v.	-0.124939
-0.237886	Visual studio 2008, v.	-0.124939
-0.358565	in a. This operation	-0.124939
-0.587633	if the same operation	-0.124939
-0.587633	where the same operation	-0.124939
-0.585282	each floating point operation	-0.124939
-0.585282	Any floating point operation	-0.124939
-0.574332	it in one operation	-0.124939
-0.524300	of the & operation	-0.124939
-1.283516	in a single operation	-0.124939
-0.501018	on a store operation	-0.124939
-0.352324	resources. Each graphics operation	-0.124939
-0.516421	by a shift operation	-0.124939
-0.347472	units. Each 128-bit operation	-0.124939
-0.331855	The bitwise AND operation	-0.124939
-0.294173	A complex digital operation	-0.124939
-0.237869	performing an illegal operation	-0.124939
-1.405905	part of the code,	-0.124939
-0.591605	loading of the code,	-0.124939
-1.251101	rest of the code,	-0.124939
-0.600379	reciprocal in the code,	-0.124939
-0.884968	you start to code,	-0.124939
-0.593784	make more efficient code,	-0.124939
-0.457987	useful for optimizing code,	-0.124939
-0.544880	into an intermediate code,	-0.124939
-0.336312	runtime frameworks, intermediate code,	-0.124939
-0.541853	of the source code,	-0.124939
-0.483845	of position- independent code,	-0.124939
-0.331859	language for CPU-intensive code,	-0.124939
-0.294200	compatibility with legacy code,	-0.124939
-0.237894	of removing superfluous code,	-0.124939
-0.888698	object, then the instance	-0.124939
-0.463136	declared whenever an instance	-0.124939
-1.013777	more than one instance	-0.124939
-0.473065	variables have one instance	-0.124939
-0.434493	and make one instance	-0.124939
-0.951259	has only one instance	-0.124939
-0.335865	will get one instance	-0.124939
-0.062801	section needs one instance	-0.425969
-0.462334	stored with each instance	-0.124939
-0.537936	time. A template instance	-0.124939
-0.773169	make a new instance	-0.124939
-0.532259	generate a new instance	-0.124939
-1.002859	that the next instance	-0.124939
-0.352361	polymorphic classes. Each instance	-0.124939
-0.786629	the library that comes	-0.124939
-0.358085	first algorithm that comes	-0.124939
-0.358661	is initialized or comes	-0.124939
-0.580474	advantages when it comes	-0.124939
-0.591773	do because it comes	-0.124939
-0.594054	Linux. The compiler comes	-0.124939
-0.358316	open source. It comes	-0.124939
-0.358059	Library (STL) which comes	-0.124939
-0.761710	new register size comes	-0.124939
-0.355023	register. This advantage comes	-0.124939
-0.353366	next new model comes	-0.124939
-0.353167	critical integer parameter comes	-0.124939
-0.346424	A considerable delay comes	-0.124939
-0.346412	Linux and BSD comes	-0.124939
-0.314688	the main feedback comes	-0.124939
-1.077561	advantage of the fact	-0.124939
-0.499706	pointer is in fact	-0.124939
-0.106480	and are in fact	-0.124939
-0.106480	program are in fact	-0.124939
-0.106480	we are in fact	-0.124939
-0.049985	they are in fact	-0.124939
-0.538987	program may in fact	-0.124939
-0.355127	exits, when in fact	-0.124939
-0.355127	registers had in fact	-0.124939
-0.502229	suboptimal way. The fact	-0.124939
-0.461101	of underflow. The fact	-0.124939
-0.356934	sign bit. The fact	-0.124939
-0.356934	than 20. The fact	-0.124939
-1.137314	advantage of this fact	-0.124939
-0.591395	software is to find	-0.124939
-1.163977	in order to find	-0.301030
-1.086730	you want to find	-0.425969
-0.355190	of bytes to find	-0.124939
-1.549820	be able to find	-0.124939
-0.850815	be difficult to find	-0.124939
-0.738367	are difficult to find	-0.124939
-0.065328	a profiler to find	-0.425969
-0.578715	you can also find	-0.124939
-0.572828	if you cannot find	-0.124939
-0.294256	function call. (2) find	-0.124939
-1.347478	solution is to rely	-0.124939
-0.763631	more convenient to rely	-0.124939
-0.538865	on compilers that rely	-0.124939
-0.502852	making optimizations that rely	-0.124939
-0.461667	operations. Algorithms that rely	-0.124939
-0.977617	and you can rely	-0.124939
-0.565502	cases you can rely	-0.124939
-0.357993	multiple threads should rely	-0.124939
-0.502727	128 function cannot rely	-0.124939
-0.939740	then you cannot rely	-0.124939
-0.548753	not. You cannot rely	-0.124939
-0.356443	you cannot always rely	-0.124939
-0.355952	loop branch must rely	-0.124939
-0.343717	if possible. Don't rely	-0.124939
-0.237902	we can surely rely	-0.124939
-1.056450	else { // No	-0.124939
-0.581338	*(T*)0; } // No	-0.124939
-0.657567	-fomit- frame- pointer No	-0.124939
-0.489802	and execution time. No	-0.124939
-1.487215	at compile time. No	-0.124939
-0.560761	consecutively in memory. No	-0.124939
-0.354299	is actually used. No	-0.124939
-0.350762	loop iterations are: No	-0.124939
-0.978580	a template parameter. No	-0.124939
-0.343643	need separate storage. No	-0.124939
-0.341810	a linear array. No	-0.124939
-0.314688	v. 2.1.7, 2004. No	-0.124939
-0.237869	exception handling /EHs- No	-0.124939
-0.237869	(see page 53). No	-0.124939
-0.237869	program /Qipo -ipo No	-0.124939
-0.358227	for example to produce	-0.124939
-1.045336	are sure to produce	-0.124939
-0.358227	Runtime, CLR, to produce	-0.124939
-0.504286	from operators that produce	-0.124939
-0.358097	conditions. Programs that produce	-0.124939
-0.463423	as output can produce	-0.124939
-0.586987	conversions do not produce	-0.124939
-0.922009	compiler does not produce	-0.124939
-0.543121	It does not produce	-0.124939
-0.463093	ambiguous and may produce	-0.124939
-1.324897	The compiler will produce	-0.124939
-0.357993	optimizing compiler should produce	-0.124939
-0.357890	Digital Mars compilers produce	-0.124939
-0.511094	The Boolean operators produce	-0.124939
-0.933265	The bitwise operators produce	-0.124939
-0.601102	features of the position-independent	-0.124939
-0.818223	turning off the position-independent	-0.124939
-0.940484	the costs of position-independent	-0.124939
-0.107089	Dynamic linking and position-independent	-0.425969
-0.463189	is compiled as position-independent	-0.124939
-0.356001	systems often use position-independent	-0.124939
-0.459916	Unix-like systems use position-independent	-0.124939
-0.358198	OS X make position-independent	-0.124939
-0.357837	by not using position-independent	-0.124939
-0.356549	shared objects without position-independent	-0.124939
-0.558976	section is always position-independent	-0.124939
-0.652198	the compiler uses position-independent	-0.124939
-0.451096	need for special position-independent	-0.124939
-0.237877	avoid the burdensome position-independent	-0.124939
-0.851482	serial code for vectorization	-0.124939
-0.164265	Factors that make vectorization	-0.124939
-0.880455	where you want vectorization	-0.124939
-0.355133	decide how advantageous vectorization	-0.124939
-0.354905	predict correctly whether vectorization	-0.124939
-0.462483	intrinsics and automatic vectorization	-0.124939
-0.376790	instructions. The automatic vectorization	-0.124939
-0.289280	situations where automatic vectorization	-0.124939
-0.289280	vector intrinsics, automatic vectorization	-0.124939
-0.183843	float expressions Automatic vectorization	-0.124939
-0.183843	CPU dispatch Automatic vectorization	-0.124939
-0.183843	Example 12.1a. Automatic vectorization	-0.124939
-0.082265	future. 12.3 Automatic vectorization	-0.124939
-0.082265	107 12.3 Automatic vectorization	-0.124939
-1.387114	If you are including	-0.124939
-0.358636	different compiler by including	-0.124939
-0.459697	common mathematical calculations including	-0.124939
-1.589858	AMD and VIA including	-0.124939
-0.452673	for 32-bit Windows, including	-0.124939
-0.960628	of the code, including	-0.124939
-0.059027	handle the strings including	-0.425969
-0.607303	options turned on, including	-0.124939
-0.331847	for many platforms, including	-0.124939
-0.331847	full metaprogramming features, including	-0.124939
-0.325347	only on n, including	-0.124939
-0.237861	all static data, including	-0.124939
-0.237861	the same computer, including	-0.124939
-0.237861	whole software package, including	-0.124939
-1.288208	is useful for checking	-0.124939
-0.106929	bitwise operators for checking	-0.425969
-0.358665	for overflow by checking	-0.124939
-1.845059	There is no checking	-0.124939
-0.573256	mentations have no checking	-0.124939
-0.356578	the loop without checking	-0.124939
-0.350330	has some syntax checking	-0.124939
-0.257572	method of bounds checking	-0.124939
-0.194330	arrays with bounds checking	-0.124939
-0.194330	Array with bounds checking	-0.124939
-0.048395	{ // Bounds checking	-0.124939
-0.023524	constant. 14.2 Bounds checking	-0.124939
-0.023524	132 14.2 Bounds checking	-0.124939
-0.048395	memory fragmentation. Bounds checking	-0.124939
-0.595836	cache because the out-of-order	-0.124939
-0.562320	calculation. However, the out-of-order	-0.124939
-0.540580	chapter. Using the out-of-order	-0.124939
-0.358508	In general, the out-of-order	-0.124939
-1.236396	take advantage of out-of-order	-0.124939
-0.533373	maximum advantage of out-of-order	-0.124939
-0.527304	automatically thanks to out-of-order	-0.124939
-0.065412	A microprocessor with out-of-order	-0.425969
-0.459722	} Microprocessors with out-of-order	-0.124939
-1.010976	microcontrollers have no out-of-order	-0.124939
-0.585268	microprocessors can do out-of-order	-0.124939
-0.754002	CPU from doing out-of-order	-0.124939
-0.349126	chain which prevents out-of-order	-0.124939
-0.463604	be portable to platforms	-0.124939
-0.499536	port to different platforms	-0.124939
-0.582941	different for different platforms	-0.124939
-0.105767	apply to other platforms	-0.124939
-0.351347	implemented on other platforms	-0.124939
-0.503816	recommendation about which platforms	-0.124939
-0.576067	assembly on all platforms	-0.124939
-0.357608	for supporting multiple platforms	-0.124939
-1.129137	the most common platforms	-0.124939
-0.500072	choice for Linux platforms	-0.124939
-0.953130	Linux and Mac platforms	-0.124939
-0.351673	storage. All x86 platforms	-0.124939
-0.237886	x86 and ARM platforms	-0.124939
-1.984630	instruction set is particularly	-0.124939
-0.816730	where speed is particularly	-0.124939
-1.053057	memory allocation is particularly	-0.124939
-0.358031	binary representation is particularly	-0.124939
-0.900173	system can be particularly	-0.124939
-1.384974	functions that are particularly	-0.124939
-0.568392	considerations that are particularly	-0.124939
-0.976612	Vector operations are particularly	-0.124939
-0.355611	Text strings are particularly	-0.124939
-0.355611	device drivers are particularly	-0.124939
-0.540474	Some CPUs have particularly	-0.124939
-0.358328	details (www.agner.org/optimize/testp.zip). A particularly	-0.124939
-0.502869	bottleneck or any particularly	-0.124939
-0.355318	code implementation works particularly	-0.124939
-0.596699	lrint function is given	-0.124939
-0.570073	child class is given	-0.124939
-0.553485	implementation for a given	-0.124939
-0.553485	platform for a given	-0.124939
-0.599982	space can be given	-0.124939
-0.893842	processor may be given	-0.124939
-0.501488	container classes are given	-0.124939
-0.356403	Further details are given	-0.124939
-0.356403	and divisions are given	-0.124939
-0.356403	my experiment are given	-0.124939
-0.358600	optimize access, as given	-0.124939
-0.781354	has not been given	-0.124939
-0.274666	of the advice given	-0.124939
-0.182902	follow the advice given	-0.124939
-0.600800	glitches in the output	-0.124939
-0.764385	you compile the output	-0.124939
-0.358860	file input and output	-0.124939
-0.358826	input file. The output	-0.124939
-0.358595	have Booleans as output	-0.124939
-0.575128	or to an output	-0.124939
-0.598956	at the compiler output	-0.124939
-0.504172	values are then output	-0.124939
-0.057777	in the assembly output	-0.124939
-0.057777	at the assembly output	-0.124939
-0.057777	makes the assembly output	-0.124939
-0.057777	what the assembly output	-0.124939
-0.389357	output. The assembly output	-0.124939
-0.423965	have an assembly output	-0.124939
-1.495373	multiple of the level-1	-0.124939
-1.035863	be in the level-1	-0.124939
-0.873737	table in the level-1	-0.124939
-1.147273	Contentions in the level-1	-0.124939
-0.587135	mirrored in the level-1	-0.124939
-0.597740	blocking for the level-1	-0.124939
-1.339768	bigger than the level-1	-0.124939
-0.589325	computer where the level-1	-0.124939
-0.539641	in both the level-1	-0.124939
-0.657227	to reload the level-1	-0.124939
-1.687034	there is a level-1	-0.124939
-0.504889	contentions than for level-1	-0.124939
-0.599118	even the same level-1	-0.124939
-0.552372	almost the entire level-1	-0.124939
-1.499368	most of the resources.	-0.124939
-0.890464	a waste of resources.	-0.124939
-0.450476	big waste of resources.	-0.124939
-1.365430	for the same resources.	-0.124939
-0.549679	memory or other resources.	-0.124939
-0.461208	cache are critical resources.	-0.124939
-0.653989	lot of extra resources.	-0.124939
-0.355475	cleanup of allocated resources.	-0.124939
-0.354822	that uses few resources.	-0.124939
-0.349720	input or network resources.	-0.124939
-0.349053	devices with limited resources.	-0.124939
-0.331828	lot of computing resources.	-0.124939
-0.325321	space were scarce resources.	-0.124939
-0.237869	you have ample resources.	-0.124939
-1.371567	when it is outside	-0.124939
-0.526819	If i is outside	-0.124939
-0.725187	to stack memory outside	-0.124939
-0.358048	a function but outside	-0.124939
-0.357355	a temporary variable outside	-0.124939
-0.356362	the extra operations outside	-0.124939
-0.355708	the last element outside	-0.124939
-1.153647	check for overflow outside	-0.124939
-0.848696	to be done outside	-0.124939
-0.571698	Initialize loop counter outside	-0.124939
-0.747181	that are declared outside	-0.124939
-0.350779	arithmetic calculations go outside	-0.124939
-0.346437	(i.e. variables defined outside	-0.124939
-0.595078	it can move outside	-0.124939
-0.601231	requirements of the task	-0.124939
-0.574502	happens when a task	-0.124939
-1.120347	situation where a task	-0.124939
-0.540454	Don't put a task	-0.124939
-1.098704	The cost of task	-0.124939
-0.358839	to interrupts and task	-0.124939
-0.358575	Such events as task	-0.124939
-0.358568	the mouse. This task	-0.124939
-0.569635	doubled for this task	-0.124939
-0.549429	allocated to each task	-0.124939
-0.584384	is a single task	-0.124939
-0.640390	for a given task	-0.124939
-0.347412	spell checking. Any task	-0.124939
-0.237877	on the essential task	-0.124939
-0.594154	unsafe code is limited	-0.124939
-0.462121	the CPU is limited	-0.124939
-0.825618	the performance is limited	-0.124939
-0.955959	clock frequency is limited	-0.124939
-0.357735	possible inputs is limited	-0.124939
-1.072249	size is a limited	-0.124939
-0.596086	distribution to a limited	-0.124939
-0.561686	has only a limited	-0.124939
-0.525894	keys within a limited	-0.124939
-1.668823	likely to be limited	-0.124939
-0.463354	be overloaded or limited	-0.124939
-0.358644	small devices with limited	-0.124939
-0.355761	Register storage A limited	-0.124939
-0.653554	very expensive. A limited	-0.124939
-0.527275	writes automatically in vectorized	-0.124939
-0.358821	error prone. The vectorized	-0.124939
-0.576283	Intrinsic functions for vectorized	-0.124939
-1.641901	be used for vectorized	-0.124939
-1.754163	that can be vectorized	-0.124939
-1.482605	can also be vectorized	-0.124939
-0.539289	cases cannot be vectorized	-0.124939
-0.539289	lookup cannot be vectorized	-0.124939
-0.355729	can now be vectorized	-0.124939
-0.597140	profitable to use vectorized	-0.124939
-0.357117	prevents a faster vectorized	-0.124939
-0.503252	12.4c. Same example, vectorized	-0.124939
-0.408127	code is indeed vectorized	-0.124939
-0.237902	12.9b. Taylor series, vectorized	-0.124939
-0.599537	polymorphism. It is sometimes	-0.124939
-0.358627	to zero is sometimes	-0.124939
-0.462968	more efficient, and sometimes	-0.124939
-0.358402	becomes inconsistent and sometimes	-0.124939
-0.556229	Newer processors are sometimes	-0.124939
-0.357992	as macros are sometimes	-0.124939
-0.354905	of optimization can sometimes	-0.124939
-0.458525	float conversions can sometimes	-0.124939
-0.354905	explicitly. Divisions can sometimes	-0.124939
-0.354905	c) 139 can sometimes	-0.124939
-0.354905	and shuffling can sometimes	-0.124939
-0.594067	problem. The compiler sometimes	-0.124939
-0.646817	such a framework sometimes	-0.124939
-0.339481	often unreliable. They sometimes	-0.124939
-0.598116	called and the local	-0.124939
-0.889832	and use the local	-0.124939
-1.780434	to make the local	-0.124939
-0.578578	copied to a local	-0.124939
-1.194569	applied to a local	-0.124939
-0.527180	intermediate data and local	-0.124939
-0.503037	local name for local	-0.124939
-0.357512	all destructors for local	-0.124939
-0.357512	PLT lookups for local	-0.124939
-0.358154	mode. Make functions local	-0.124939
-0.826091	contiguous with other local	-0.124939
-0.549391	keyword to all local	-0.124939
-0.349755	static data, including local	-0.124939
-0.408127	from), function parameters, local	-0.124939
-1.293778	because of the costs	-0.124939
-0.597651	focus on the costs	-0.124939
-0.574506	much about the costs	-0.124939
-0.358210	memory plus the costs	-0.124939
-0.462724	for avoiding the costs	-0.124939
-0.358210	weighed against the costs	-0.124939
-0.575602	four kinds of costs	-0.124939
-0.357560	an exception. The costs	-0.124939
-0.142977	3 1.1 The costs	-0.124939
-0.142977	information. 1.1 The costs	-0.124939
-0.357575	the STL also costs	-0.124939
-0.357272	are inherent performance costs	-0.124939
-0.342665	of CPUs. These costs	-0.124939
-0.342665	to another. These costs	-0.124939
-0.557557	next instance of S1	-0.124939
-0.763945	all instances of S1	-0.124939
-0.358461	void Func() { S1	-0.124939
-0.553519	2 unused bytes S1	-0.124939
-0.355129	at 19 }; S1	-0.124939
-1.126937	size = 100; S1	-0.124939
-0.177366	12.2 __declspec(align(16)) struct S1	-0.124939
-0.177366	Example 14.9 struct S1	-0.124939
-0.177366	Example 8.15a struct S1	-0.124939
-0.177366	Example 8.15b struct S1	-0.124939
-0.177366	Example 7.35b struct S1	-0.124939
-0.177366	Example 7.35a struct S1	-0.124939
-0.037166	a; double b;}; S1	-0.124939
-0.358908	a library of math	-0.124939
-0.517461	kinds of vector math	-0.124939
-0.376952	and Intel vector math	-0.124939
-0.376952	libraries: Intel vector math	-0.124939
-0.504260	some long vector math	-0.124939
-0.346293	of short vector math	-0.124939
-0.346293	Intel short vector math	-0.124939
-0.873767	use the Intel math	-0.124939
-1.129236	the most common math	-0.124939
-0.355586	119). The AMD math	-0.124939
-0.652814	for high precision math	-0.124939
-0.652641	the best optimized math	-0.124939
-0.534015	options for fast math	-0.124939
-0.347460	factor. A little math	-0.124939
-0.886084	new value of temp	-0.124939
-0.504990	physical register to temp	-0.124939
-0.579114	a+1; b = temp	-0.124939
-0.579621	temp; c = temp	-0.124939
-0.356586	b[i]; c[i] = temp	-0.124939
-1.354925	size; i++) { temp	-0.124939
-1.006789	compilers will make temp	-0.124939
-0.538160	value of register temp	-0.124939
-0.349065	but will save temp	-0.124939
-0.414541	a[i] = temp; temp	-0.124939
-0.219144	a[100], b, temp; temp	-0.124939
-0.219144	b, c, temp; temp	-0.124939
-0.219144	i, a[100], temp; temp	-0.124939
-0.237902	(temp = &list[0]; temp	-0.124939
-1.075949	copy of the inlined	-0.124939
-0.498360	call to the inlined	-0.124939
-1.060070	The code is inlined	-0.124939
-0.659824	The names of inlined	-0.124939
-0.891454	function to be inlined	-0.124939
-0.892110	constructor may be inlined	-0.124939
-0.587170	which cannot be inlined	-0.124939
-0.504810	The operators are inlined	-0.124939
-0.584849	copy of an inlined	-0.124939
-0.572892	15.1a to an inlined	-0.124939
-0.570493	functions are often inlined	-0.124939
-0.559023	function is always inlined	-0.124939
-0.827217	function is usually inlined	-0.124939
-1.432868	but it is still	-0.124939
-0.462868	multiplication, etc. is still	-0.124939
-0.358323	and who is still	-0.124939
-0.500733	The loop can still	-0.124939
-0.719662	instruction set can still	-0.124939
-0.459740	this solution can still	-0.124939
-0.355862	described above can still	-0.124939
-0.570348	mode, where it still	-0.124939
-1.961040	of the code still	-0.124939
-1.214114	then it will still	-0.124939
-0.461692	code. However, we still	-0.124939
-0.354761	to 0x273F would still	-0.124939
-0.646779	high level framework still	-0.124939
-0.339428	vector processing capabilities still	-0.124939
-1.182552	object of the class.	-0.124939
-1.265657	instance of the class.	-0.124939
-0.888593	instances of the class.	-0.124939
-0.587629	variable inside the class.	-0.124939
-0.993732	a structure or class.	-0.124939
-0.473976	same structure or class.	-0.124939
-1.681642	of the same class.	-0.124939
-0.499654	pointer to another class.	-0.124939
-0.828148	into a container class.	-0.124939
-0.689372	parent and child class.	-0.124939
-0.453963	to its child class.	-0.124939
-0.736062	of the derived class.	-0.124939
-0.343739	the // parent class.	-0.124939
-0.538855	of the object's class.	-0.124939
-0.600930	information in the database	-0.124939
-0.591421	applications use a database	-0.124939
-0.804969	to replace a database	-0.124939
-0.358667	the network or database	-0.124939
-0.526355	user data. A database	-0.124939
-0.538163	access the system database	-0.124939
-0.457972	gain by optimizing database	-0.124939
-0.347456	loops, etc. Optimizing database	-0.124939
-0.339415	Locked mutexes. Open database	-0.124939
-0.109368	20 3.8 System database	-0.124939
-0.109368	finish. 3.8 System database	-0.124939
-0.331864	easy GUI development, database	-0.124939
-0.294182	memory, windows, mutexes, database	-0.124939
-0.237877	the big registration database	-0.124939
-0.599639	branch if the constants	-0.124939
-0.358806	expressions. Whether the constants	-0.124939
-1.064966	maximum number of constants	-0.124939
-1.192011	a table of constants	-0.124939
-0.557575	parts: one for constants	-0.124939
-0.358138	subexpression containing only constants	-0.124939
-0.358096	multiplying by other constants	-0.124939
-1.223010	and floating point constants	-0.124939
-0.585294	all floating point constants	-0.124939
-0.568963	that the two constants	-0.124939
-0.893708	chooses between two constants	-0.124939
-0.354002	for constants. Integer constants	-0.124939
-0.346434	stored. All identical constants	-0.124939
-0.237894	the stack. String constants	-0.124939
-0.583185	lookup[b]; If a bool	-0.124939
-0.587325	int) instead of bool	-0.124939
-0.355792	bytes alignment, bytes bool	-0.124939
-0.069576	SomeFunction (int a, bool	-0.726999
-0.518374	7.29a float a; bool	-0.124939
-0.505036	double x, y; bool	-0.124939
-0.714821	the following way: bool	-0.124939
-0.314745	// Example 7.11 bool	-0.124939
-0.538823	x, y, z; bool	-0.124939
-0.237886	// Example 7.10a bool	-0.124939
-0.237886	// Example 7.9a bool	-0.124939
-0.951971	at a time. Do	-0.124939
-0.780988	a member function. Do	-0.124939
-1.142040	registers are used. Do	-0.124939
-1.204279	caching less efficient. Do	-0.124939
-0.579104	current instruction set. Do	-0.124939
-1.152005	dynamic memory allocation. Do	-0.124939
-0.510485	contiguous memory block. Do	-0.124939
-0.339397	prevents certain optimizations. Do	-0.124939
-0.714748	a linked list. Do	-0.124939
-0.314725	a scarce resource. Do	-0.124939
-0.294163	row + column; Do	-0.124939
-0.237861	and Enterprise editions). Do	-0.124939
-0.237861	a unique key. Do	-0.124939
-0.237861	a hash map. Do	-0.124939
-0.726513	by inlining the frame	-0.124939
-0.358814	by turning the frame	-0.124939
-0.587572	simpler than a frame	-0.124939
-0.834895	is called a frame	-0.124939
-0.434644	no calls to frame	-0.124939
-0.434644	contains calls to frame	-0.124939
-0.573182	leaf functions and frame	-0.124939
-1.575947	more efficient than frame	-0.124939
-0.546676	frame functions. A frame	-0.124939
-0.355754	an exception. A frame	-0.124939
-0.418072	use a stack frame	-0.124939
-0.132052	the standard stack frame	-0.124939
-0.132052	The standard stack frame	-0.124939
-0.322730	/EHs- No stack frame	-0.124939
-0.357260	(i % 2 ==	-0.124939
-0.355104	SIZE % 128 ==	-0.124939
-0.498883	} if (a ==	-0.124939
-0.077810	#endif // INSTRSET ==	-0.124939
-0.037165	} #if INSTRSET ==	-0.124939
-0.037165	set #if INSTRSET ==	-0.124939
-0.037165	SelectAddMul_SSE41 #elif INSTRSET ==	-0.124939
-0.037165	SelectAddMul_SSE2 #elif INSTRSET ==	-0.124939
-0.212292	a = (b ==	-0.124939
-0.403844	{ if (b ==	-0.124939
-0.077810	Wednesday || Day ==	-0.124939
-0.077810	Tuesday || Day ==	-0.124939
-0.538872	Day; if (Day ==	-0.124939
-0.237910	exceptions: __except (GetExceptionCode() ==	-0.124939
-0.524044	double b; int d;	-0.124939
-0.356724	at 7 int d;	-0.124939
-0.505824	union { double d;	-0.124939
-0.027250	int u; double d;	-0.602060
-0.163961	+ c + d;	-0.124939
-0.253472	a, b, c, d;	-0.124939
-0.152955	= 0, c, d;	-0.124939
-0.314783	doubles: union {double d;	-0.124939
-0.582091	It has the special	-0.124939
-0.599230	saved in a special	-0.124939
-0.889452	done with a special	-0.124939
-0.588117	that have a special	-0.124939
-0.851664	a set of special	-0.124939
-0.462978	be optimal in special	-0.124939
-0.504579	underflow except in special	-0.124939
-0.540575	Many libraries for special	-0.124939
-0.503944	The need for special	-0.124939
-0.885314	But there are special	-0.124939
-0.867578	unless you have special	-0.124939
-0.895469	effort to make special	-0.124939
-0.846454	need to take special	-0.124939
-0.336252	function libraries. Several special	-0.124939
-2.092440	in order to prevent	-0.124939
-1.051147	best way to prevent	-0.124939
-0.539268	good way to prevent	-0.124939
-1.767921	you want to prevent	-0.124939
-0.461453	extra overhead to prevent	-0.124939
-0.357210	// Volatile to prevent	-0.124939
-1.186413	the CPU and prevent	-0.124939
-0.658801	the user and prevent	-0.124939
-0.881495	factors that can prevent	-0.124939
-1.844557	in the code prevent	-0.124939
-0.572584	definition. This will prevent	-0.124939
-0.355810	atomic. It doesn't prevent	-0.124939
-0.351671	the debugging options prevent	-0.124939
-0.314737	is rebooted. To prevent	-0.124939
-1.167390	replaced by a shift	-0.124939
-0.888568	done with a shift	-0.124939
-0.594868	done as a shift	-0.124939
-1.356338	by using a shift	-0.124939
-0.502986	bit operations and shift	-0.124939
-0.357475	of a[i] and shift	-0.124939
-0.357475	= multiply and shift	-0.124939
-0.723397	of additions and shift	-0.124939
-0.580526	set). We can shift	-0.124939
-1.065430	by constant = shift	-0.124939
-0.569679	reasons for this shift	-0.124939
-0.358329	example 14.28 will shift	-0.124939
-0.444798	in ebx ; shift	-0.124939
-0.344056	+ sign(i) ; shift	-0.124939
-0.584014	constructor and the destructor	-0.124939
-0.584014	process, and the destructor	-0.124939
-0.897026	fail if the destructor	-0.124939
-0.629190	to call the destructor	-0.124939
-1.341259	information about the destructor	-0.124939
-0.589893	course be a destructor	-0.124939
-0.887677	class with a destructor	-0.124939
-0.785954	must have a destructor	-0.124939
-0.539555	thread have a destructor	-0.124939
-0.589908	not make a destructor	-0.124939
-0.358341	object owns. A destructor	-0.124939
-0.525860	constructor and no destructor	-0.124939
-0.459522	necessary. A virtual destructor	-0.124939
-0.887383	optimization is to save	-0.124939
-0.564871	functions have to save	-0.124939
-0.975989	doesn't have to save	-0.124939
-2.098688	in order to save	-0.124939
-0.357553	other calculations to save	-0.124939
-0.590487	if it can save	-0.124939
-0.591178	(a+b). This can save	-0.124939
-0.584973	overlap. You can save	-0.124939
-0.537602	that we may save	-0.124939
-0.589478	64). You may save	-0.124939
-0.358329	variables, but will save	-0.124939
-0.462455	Typically it should save	-0.124939
-0.826976	unused label ; save	-0.124939
-0.449125	registers, and possibly save	-0.124939
-0.358860	a register and prevents	-0.124939
-0.579441	function, and it prevents	-0.124939
-0.880747	optimal because it prevents	-0.124939
-0.356414	but unfortunately it prevents	-0.124939
-0.530677	in memory. This prevents	-0.124939
-0.495310	another thread. This prevents	-0.124939
-0.351971	preceding one. This prevents	-0.124939
-0.351971	declared volatile. This prevents	-0.124939
-0.351971	is compiling. This prevents	-0.124939
-0.540628	tables if this prevents	-0.124939
-0.358155	nontemporal write instruction prevents	-0.124939
-0.358083	dependency chain which prevents	-0.124939
-0.461902	this. It also prevents	-0.124939
-0.497010	The integer division prevents	-0.124939
-1.579732	address of the preceding	-0.124939
-0.328760	result of the preceding	-0.249877
-0.878046	constant to the preceding	-0.124939
-1.240935	equal to the preceding	-0.124939
-1.554816	depends on the preceding	-0.124939
-0.575740	two. In the preceding	-0.124939
-0.545776	addition before the preceding	-0.124939
-0.545776	iteration before the preceding	-0.124939
-0.568424	branch. See the preceding	-0.124939
-0.358848	stack unwinding The preceding	-0.124939
-0.358657	is correlated with preceding	-0.124939
-0.595760	always use the safe	-0.124939
-0.599890	why it is safe	-0.124939
-0.598206	CString. This is safe	-0.124939
-0.599535	calculations in a safe	-0.124939
-1.224889	is not a safe	-0.124939
-0.788594	are called. The safe	-0.124939
-0.920032	may not be safe	-0.124939
-0.578836	efficient, but not safe	-0.124939
-1.474815	It is more safe	-0.124939
-0.459807	is therefore more safe	-0.124939
-1.011607	This is only safe	-0.124939
-0.355637	generally not thread safe	-0.124939
-0.355502	program is exception safe	-0.124939
-0.358396	b, c and d	-0.124939
-0.462961	example 15.1b and d	-0.124939
-0.580732	3.5; c = d	-0.124939
-0.581688	c; y = d	-0.124939
-1.077538	== 0) { d	-0.124939
-0.357821	Example 14.20 double d	-0.124939
-0.065025	+ c*x + d	-0.425969
-0.622837	a & b; d	-0.124939
-0.340016	a && b; d	-0.124939
-0.043015	u; double d; d	-0.602060
-0.237910	else { DTRUE: d	-0.124939
-0.172677	............................................................................... 8 2.5 Choice	-0.124939
-0.172677	C++ compilers. 2.5 Choice	-0.124939
-0.102867	........................................................................................... 6 2.3 Choice	-0.124939
-0.102867	this manual. 2.3 Choice	-0.124939
-0.102867	data cache. 2.2 Choice	-0.124939
-0.102867	....................................................................................... 5 2.2 Choice	-0.124939
-0.102867	optimal platform 2.1 Choice	-0.124939
-0.102867	........................................................................................... 5 2.1 Choice	-0.124939
-0.102867	libraries........................................................................................ 12 2.7 Choice	-0.124939
-0.102867	are undocumented. 2.7 Choice	-0.124939
-0.102867	.................................................................................................... 10 2.6 Choice	-0.124939
-0.102867	another compiler. 2.6 Choice	-0.124939
-0.102867	program optimization. 2.4 Choice	-0.124939
-0.102867	system......................................................................................... 6 2.4 Choice	-0.124939
-1.234050	the code to tell	-0.124939
-0.883915	also possible to tell	-0.124939
-0.190379	no way to tell	-0.124939
-0.459320	vector always to tell	-0.124939
-0.355531	variable declaration to tell	-0.124939
-0.355531	__assume_aligned directive to tell	-0.124939
-0.355531	function prototype to tell	-0.124939
-0.355531	we forgot to tell	-0.124939
-0.355531	like throw(A,B,C) to tell	-0.124939
-0.355531	#pragma novector to tell	-0.124939
-0.590366	profiler that can tell	-0.124939
-0.579248	bit. We can tell	-0.124939
-0.358336	or references then tell	-0.124939
-0.598569	especially on the Pentium	-0.124939
-0.485453	matrix on a Pentium	-0.124939
-0.485453	measured on a Pentium	-0.124939
-0.485453	9.5a on a Pentium	-0.124939
-0.485453	perfectly on a Pentium	-0.124939
-0.540087	old CPUs. The Pentium	-0.124939
-0.503977	another computer. The Pentium	-0.124939
-0.659242	clock cycles on Pentium	-0.124939
-0.658638	branch prediction. A Pentium	-0.124939
-0.549042	fake an Intel Pentium	-0.124939
-0.355928	regular pattern, while Pentium	-0.124939
-0.403400	to the old Pentium	-0.124939
-0.403400	on the old Pentium	-0.124939
-0.237902	in the oldest Pentium	-0.124939
-0.358923	theoretical background is further	-0.124939
-0.591874	90 for a further	-0.124939
-0.823935	the possibility for further	-0.124939
-0.354914	be expected for further	-0.124939
-0.354914	C++ Performance for further	-0.124939
-0.354914	page 150 for further	-0.124939
-0.354914	page 101 for further	-0.124939
-0.354914	page 153 for further	-0.124939
-0.354914	page 140 for further	-0.124939
-1.493569	code can be further	-0.124939
-0.541000	similar methods are further	-0.124939
-0.897311	optimize the code further	-0.124939
-0.358321	AVX instructions. A further	-0.124939
-0.890976	unroll the loop further	-0.124939
-0.358592	Making exception-safe code Assume	-0.124939
-1.579053	} } } Assume	-0.124939
-0.724769	static static static Assume	-0.124939
-0.816434	#pragma vector aligned Assume	-0.124939
-0.488988	another memory access. Assume	-0.124939
-0.700500	throw() throw() throw() Assume	-0.124939
-0.339397	terms of speed. Assume	-0.124939
-0.538774	align(16)) __attribute(( aligned(16))) Assume	-0.124939
-0.382771	const)) __attribute(( const)) Assume	-0.124939
-0.538774	__restrict #pragma ivdep Assume	-0.124939
-0.294163	-fno-rtti /GR- -fno-rtti Assume	-0.124939
-0.237861	are accessed column-wise. Assume	-0.124939
-0.237861	See page 78. Assume	-0.124939
-0.237861	code in general. Assume	-0.124939
-0.598380	influence on the efficiency	-0.124939
-1.009111	difference between the efficiency	-0.124939
-0.524314	databases, etc. The efficiency	-0.124939
-0.142777	25 7 The efficiency	-0.124939
-0.142777	tool. 7 The efficiency	-0.124939
-0.356907	7.13 Loops The efficiency	-0.124939
-0.840527	reason for this efficiency	-0.124939
-0.358364	the question when efficiency	-0.124939
-0.540143	priority of program efficiency	-0.124939
-0.462618	concentrated on CPU efficiency	-0.124939
-0.357977	memory economy, cache efficiency	-0.124939
-0.781635	you may improve efficiency	-0.124939
-0.350329	explaining the relative efficiency	-0.124939
-0.421339	implemented. The highest efficiency	-0.124939
-1.191955	fact that the repeat	-0.124939
-0.880284	problem if the repeat	-0.124939
-0.590507	well if the repeat	-0.124939
-0.597117	problem when the repeat	-0.124939
-0.595818	code). If the repeat	-0.124939
-0.527258	decides whether to repeat	-0.124939
-0.355930	i++ ;checkifi<100 ; repeat	-0.124939
-0.554976	with a high repeat	-0.124939
-0.454653	near the maximum repeat	-0.124939
-0.308338	the worst-case maximum repeat	-0.124939
-0.343706	a very low repeat	-0.124939
-0.343684	if the typical repeat	-0.124939
-0.442007	small and fixed repeat	-0.124939
-0.325384	of precision. Let's repeat	-0.124939
-0.617337	divisible by the unroll	-0.602060
-0.885036	not have to unroll	-0.124939
-0.870518	not necessary to unroll	-0.124939
-0.761987	an advantage to unroll	-0.124939
-1.136477	no reason to unroll	-0.124939
-0.461889	be worthwhile to unroll	-0.124939
-0.577365	odd and you unroll	-0.124939
-1.134637	Some compilers will unroll	-0.124939
-0.572488	or the loop unroll	-0.124939
-0.572488	off the loop unroll	-0.124939
-0.462326	Unfortunately, some compilers unroll	-0.124939
-0.351662	Compilers will usually unroll	-0.124939
-0.197953	functions that it calls.	-0.124939
-0.559618	binding of function calls.	-0.124939
-0.489421	branches or function calls.	-0.124939
-0.775272	the library function calls.	-0.124939
-0.347729	through multiple function calls.	-0.124939
-0.701091	with many function calls.	-0.124939
-0.449435	data through function calls.	-0.124939
-0.349964	containing pure function calls.	-0.124939
-0.349964	involves pure function calls.	-0.124939
-0.139954	optimize across function calls.	-0.124939
-0.139954	optimizations across function calls.	-0.124939
-0.523651	interfaces and system calls.	-0.124939
-0.454922	system-specific graphical interface calls.	-0.124939
-0.190123	best into the algorithm	-0.425969
-1.162114	The choice of algorithm	-0.124939
-0.358521	language defines an algorithm	-0.124939
-0.357366	to express any algorithm	-0.124939
-0.589782	optimizing the first algorithm	-0.124939
-0.588842	only. The following algorithm	-0.124939
-0.579427	if a simple algorithm	-0.124939
-0.581143	choosing the best algorithm	-0.124939
-0.372039	choose the optimal algorithm	-0.124939
-0.398167	Choosing the optimal algorithm	-0.124939
-0.353588	advanced and complicated algorithm	-0.124939
-0.294191	implement a universal algorithm	-0.124939
-0.358955	which calculates the sum	-0.124939
-0.898085	case is a sum	-0.124939
-0.886073	each value of sum	-0.124939
-0.463375	// constructor // sum	-0.124939
-0.358449	16; n++) { sum	-0.124939
-0.358313	+= a[i+3]; } sum	-0.124939
-0.352296	= x; float sum	-0.124939
-0.646702	float a[100]; float sum	-0.124939
-1.410476	< 100; i++) sum	-0.124939
-1.414978	< size; i++) sum	-0.124939
-0.323529	(i=0; i<100; i++) sum	-0.124939
-0.570196	a[100]; int i, sum	-0.124939
-0.538823	100; float list[size], sum	-0.124939
-0.294191	1.f); // initialize sum	-0.124939
-0.065790	to handle the strings	-0.425969
-0.463338	different types or strings	-0.124939
-0.142011	to store all strings	-0.124939
-0.142011	may store all strings	-0.124939
-0.577258	how to store strings	-0.124939
-0.779669	way to handle strings	-0.124939
-0.621729	method of storing strings	-0.124939
-0.065802	compile time. Text strings	-0.124939
-0.065802	container classes. Text strings	-0.124939
-0.065802	9.8 Strings Text strings	-0.124939
-0.172672	storage of text strings	-0.124939
-0.172672	and handle text strings	-0.124939
-0.352310	on some processors. On	-0.124939
-0.518834	several different CPUs. On	-0.124939
-0.796643	by the compiler. On	-0.124939
-0.349033	uses few resources. On	-0.124939
-0.595078	big data structures. On	-0.124939
-0.314732	assembly language output. On	-0.124939
-0.382782	optimization more difficult. On	-0.124939
-0.294173	to using hyperthreading. On	-0.124939
-0.294173	a branch tree. On	-0.124939
-0.237869	special loop predictor. On	-0.124939
-0.237869	floating point comparison. On	-0.124939
-0.237869	modification is profitable. On	-0.124939
-0.237869	in example 9.1b. On	-0.124939
-0.600860	representation of the exponent	-0.124939
-0.199655	function when the exponent	-0.425969
-0.593186	n from the exponent	-0.124939
-0.463628	add n to exponent	-0.124939
-0.358201	negative numbers. The exponent	-0.124939
-0.358201	binary digits. The exponent	-0.124939
-0.460796	: 8; // exponent	-0.124939
-0.356694	: 15; // exponent	-0.124939
-0.356694	: 11; // exponent	-0.124939
-0.181239	part unsigned int exponent	-0.425969
-0.498087	normal unsigned int exponent	-0.124939
-0.358914	most distributions of Linux,	-0.124939
-0.847179	both Windows and Linux,	-0.124939
-1.267298	Shared objects in Linux,	-0.124939
-0.358416	Windows, SetThreadAffinityMask, in Linux,	-0.124939
-1.414319	32-bit and 64-bit Linux,	-0.124939
-0.535033	Windows. In 64-bit Linux,	-0.124939
-0.346725	registers, whereas 64-bit Linux,	-0.124939
-0.461448	defined(__GNUC__) // 32-bit Linux,	-0.124939
-0.258212	and a Windows, Linux,	-0.124939
-0.258212	platforms with Windows, Linux,	-0.124939
-0.258212	operating systems Windows, Linux,	-0.124939
-0.258212	Linux, Mac Windows, Linux,	-0.124939
-0.237918	x86 platforms (Windows, Linux,	-0.124939
-0.600218	sake of the possibility	-0.124939
-0.596910	platforms and the possibility	-0.124939
-0.324341	rule out the possibility	-0.425969
-0.573829	thought about the possibility	-0.124939
-0.143038	it opens the possibility	-0.124939
-0.143038	set opens the possibility	-0.124939
-0.657534	compilers offer the possibility	-0.124939
-0.357761	can open the possibility	-0.124939
-0.304405	innermost loop. Another possibility	-0.124939
-0.304405	a GOT. Another possibility	-0.124939
-0.294256	out the theoretical possibility	-0.124939
-0.294256	a very obscure possibility	-0.124939
-0.570574	memory. See the discussion	-0.124939
-0.553482	16 for a discussion	-0.124939
-0.553482	87 for a discussion	-0.124939
-0.462672	and 120 for discussion	-0.124939
-0.358169	page 93 for discussion	-0.124939
-0.462990	clear from this discussion	-0.124939
-0.540374	31 for more discussion	-0.124939
-0.358321	error prone. A discussion	-0.124939
-1.402646	There are various discussion	-0.124939
-0.245536	for a further discussion	-0.124939
-0.233294	150 for further discussion	-0.124939
-0.233294	101 for further discussion	-0.124939
-0.233294	153 for further discussion	-0.124939
-0.358817	are satisfied. The conditions	-0.124939
-0.346939	for testing multiple conditions	-0.124939
-0.139709	14.7b. Testing multiple conditions	-0.124939
-0.139709	14.7a. Testing multiple conditions	-0.124939
-1.018614	any of these conditions	-0.124939
-0.891837	of the following conditions	-0.124939
-0.530255	if the following conditions	-0.124939
-0.356114	recovering from error conditions	-0.124939
-0.457973	parallel if certain conditions	-0.124939
-0.353639	and the caching conditions	-0.124939
-0.496595	example has three conditions	-0.124939
-0.325377	from www.agner.org/optimize. Copyright conditions	-0.124939
-0.421339	tested under worst-case conditions	-0.124939
-0.591866	satisfactorily on a non-Intel	-0.124939
-0.358644	Works well with non-Intel	-0.124939
-0.226210	The performance on non-Intel	-0.124939
-0.189765	reduced performance on non-Intel	-0.425969
-0.545873	this work on non-Intel	-0.124939
-0.348391	The speed on non-Intel	-0.124939
-0.434736	work well on non-Intel	-0.124939
-0.434736	works well on non-Intel	-0.124939
-0.895042	when running on non-Intel	-0.124939
-0.358334	Intel processors. A non-Intel	-0.124939
-0.294237	CPU dispatcher treats non-Intel	-0.124939
-0.294237	these also treat non-Intel	-0.124939
-1.679728	a pointer to it.	-0.124939
-1.385249	or reference to it.	-0.124939
-0.358606	don't count on it.	-0.124939
-0.597131	unwise to use it.	-0.124939
-0.539919	sure you need it.	-0.124939
-0.654093	do something about it.	-0.124939
-0.782378	program that calls it.	-0.124939
-1.055855	you can avoid it.	-0.124939
-0.903354	processors that support it.	-0.124939
-0.350779	together and tested it.	-0.124939
-0.542230	than to execute it.	-0.124939
-0.331838	you don't understand it.	-0.124939
-0.237877	itself and recompile it.	-0.124939
-0.575313	in the CPU (See	-0.425969
-1.717425	power of 2 (See	-0.124939
-0.493278	dividing by 2 (See	-0.124939
-0.459915	on AMD CPUs (See	-0.124939
-0.554283	in 64-bit Windows (See	-0.124939
-0.352657	the specified types (See	-0.124939
-0.750078	for different CPUs. (See	-0.124939
-0.453862	optimizations across modules (See	-0.124939
-0.496468	out by 2. (See	-0.124939
-0.451920	or global variables. (See	-0.124939
-0.749075	to be mispredicted (See	-0.124939
-0.538823	available from www.intel.com. (See	-0.124939
-0.562431	different kind of registers.	-0.124939
-0.358577	the scarcity of registers.	-0.124939
-1.225724	be transferred in registers.	-0.124939
-0.658837	be returned in registers.	-0.124939
-0.574746	into the vector registers.	-0.124939
-0.352138	of bigger vector registers.	-0.124939
-0.352138	of special vector registers.	-0.124939
-0.556903	in two different registers.	-0.124939
-0.357972	stored in integer registers.	-0.124939
-0.357785	be copied into registers.	-0.124939
-0.352058	stack versus XMM registers.	-0.124939
-0.513290	in the YMM registers.	-0.124939
-0.364821	and 256-bit YMM registers.	-0.124939
-0.600858	situation of the maximum	-0.124939
-0.527051	be below the maximum	-0.124939
-0.358513	is near the maximum	-0.124939
-0.358513	explained above, the maximum	-0.124939
-0.358923	Windows allows a maximum	-0.124939
-0.720651	vector registers. The maximum	-0.124939
-0.356290	the same. The maximum	-0.124939
-0.356290	page 27). The maximum	-0.124939
-0.356290	the weekdays. The maximum	-0.124939
-0.356290	systems. 67 The maximum	-0.124939
-0.357470	bits minimum value maximum	-0.124939
-0.572843	do to take maximum	-0.124939
-0.325391	determine the worst-case maximum	-0.124939
-1.352102	32-bit and 64-bit mode.	-0.124939
-0.493124	default in 64-bit mode.	-0.124939
-0.493124	enabled in 64-bit mode.	-0.124939
-0.493124	recognized in 64-bit mode.	-0.124939
-0.473415	32-bit or 64-bit mode.	-0.124939
-0.407831	faster in 32-bit mode.	-0.124939
-0.575129	stack in 32-bit mode.	-0.124939
-0.407831	references in 32-bit mode.	-0.124939
-0.291866	especially in 32-bit mode.	-0.124939
-1.209147	in 64 bit mode.	-0.124939
-0.950351	in 32 bit mode.	-0.124939
-0.237943	goes into sleep mode.	-0.124939
-0.691053	number of elements per	-0.301030
-0.356153	The execution times per	-0.124939
-0.354512	have three values per	-0.124939
-0.387851	is clock cycles per	-0.124939
-0.546069	core clock cycles per	-0.124939
-0.387851	50 clock cycles per	-0.124939
-0.151995	matrices, clock cycles per	-0.425969
-0.237949	Matrix size Time per	-0.124939
-0.237949	Total kilobytes Time per	-0.124939
-0.237949	Example 9.6a Time per	-0.124939
-0.358165	using this for testing	-0.124939
-0.586249	also useful for testing	-0.124939
-0.591477	code you are testing	-0.124939
-0.461637	is zero by testing	-0.124939
-0.502818	you gain by testing	-0.124939
-0.523973	very useful when testing	-0.124939
-0.500781	be relevant when testing	-0.124939
-0.503987	code faster because testing	-0.124939
-0.502437	This also makes testing	-0.124939
-0.331888	terms of development, testing	-0.124939
-0.023522	156 16.3 Worst-case testing	-0.124939
-0.237902	are optimal. Best-case testing	-0.124939
-1.711253	so that the alignment	-0.124939
-0.583813	64, but the alignment	-0.124939
-0.504632	well specify the alignment	-0.124939
-0.580750	parameters because of alignment	-0.124939
-0.504857	alignment automatically. The alignment	-0.124939
-0.358623	12.1b. Vectorization with alignment	-0.124939
-0.526971	few restrictions on alignment	-0.124939
-0.358571	data members. This alignment	-0.124939
-0.584314	care of this alignment	-0.124939
-0.658729	vector operations when alignment	-0.124939
-0.357786	information about pointer alignment	-0.124939
-0.352344	intrinsic vectors requires alignment	-0.124939
-0.314708	or __attribute__((aligned(16))). Specifies alignment	-0.124939
-0.572742	data to the right	-0.124939
-1.121510	point to the right	-0.124939
-1.404313	pointer to the right	-0.124939
-0.572742	place to the right	-0.124939
-0.534525	data into the right	-0.124939
-0.777122	them into the right	-0.124939
-0.656936	has made the right	-0.124939
-1.176010	to find the right	-0.124939
-0.342877	for finding the right	-0.124939
-0.461772	for putting the right	-0.124939
-0.462272	final array size right	-0.124939
-0.451213	sign(i) ; shift right	-0.124939
-1.069506	added to the offset	-0.124939
-0.598856	compact if the offset	-0.124939
-0.358059	to code the offset	-0.124939
-0.593629	more then the offset	-0.124939
-0.595192	128 because the offset	-0.124939
-0.595444	number. If the offset	-0.124939
-0.786558	simply stores the offset	-0.124939
-0.463519	b;} }; The offset	-0.124939
-0.562375	accessed with an offset	-0.124939
-0.525852	number, or no offset	-0.124939
-0.319758	in the global offset	-0.124939
-0.414374	variables called global offset	-0.124939
-0.450236	with a total offset	-0.124939
-1.964885	is that the compatibility	-0.124939
-0.785252	the sake of compatibility	-0.425969
-0.357578	frequent causes of compatibility	-0.124939
-0.831825	the requirements of compatibility	-0.124939
-0.723636	frequent sources of compatibility	-0.124939
-0.566288	installation time and compatibility	-0.124939
-0.557009	resource problems and compatibility	-0.124939
-0.658751	instruction set when compatibility	-0.124939
-0.429518	sake of backwards compatibility	-0.124939
-0.382839	find and resolve compatibility	-0.124939
-0.294219	Unfortunately, the cross-platform compatibility	-0.124939
-0.237910	information about bugs, compatibility	-0.124939
-0.596485	twice because the macro	-0.124939
-0.596459	similar to a macro	-0.124939
-0.540242	expanded like a macro	-0.124939
-0.462820	// define a macro	-0.124939
-0.851760	a result of macro	-0.124939
-0.659638	But beware that macro	-0.124939
-0.459586	of functions A macro	-0.124939
-0.355741	in scope. A macro	-0.124939
-0.355788	Example 7.34a. Use macro	-0.124939
-0.629465	{ // Define macro	-0.124939
-0.444045	arrays // Define macro	-0.124939
-0.314727	Example 7.34b. Replace macro	-0.124939
-0.294210	set. The preprocessing macro	-0.124939
-0.245454	a; // 2 bytes.	-0.124939
-0.245454	d; // 2 bytes.	-0.124939
-0.255070	first // 4 bytes.	-0.124939
-0.166971	b; // 4 bytes.	-0.124939
-0.166971	d; // 4 bytes.	-0.124939
-0.140382	bytes // 8 bytes.	-0.124939
-0.140382	b; // 8 bytes.	-0.124939
-0.495496	size of 64 bytes.	-0.124939
-0.639622	is typically 64 bytes.	-0.124939
-0.460469	block of 16 bytes.	-0.124939
-0.652252	the first 128 bytes.	-0.124939
-0.341838	abc is 12 bytes.	-0.124939
-0.294219	a[100]; // 400 bytes.	-0.124939
-0.375609	reference to the object.	-0.124939
-0.598756	constructor for the object.	-0.124939
-0.358510	to delete the object.	-0.124939
-1.536021	to the same object.	-0.124939
-0.587635	block for each object.	-0.124939
-0.137556	from the shared object.	-0.124939
-0.518685	making a shared object.	-0.124939
-0.302371	the same shared object.	-0.124939
-0.533891	in a global object.	-0.124939
-0.552344	copy the entire object.	-0.124939
-0.382850	or an anonymous object.	-0.124939
-0.540174	Make array of 100	-0.124939
-0.462763	the sum of 100	-0.124939
-0.358241	// Array of 100	-0.124939
-1.337088	that there are 100	-0.124939
-0.463293	multiply it by 100	-0.124939
-0.357235	comparing i with 100	-0.124939
-0.357235	compares eax with 100	-0.124939
-0.358575	takes 50 - 100	-0.124939
-0.357473	take 1000 * 100	-0.124939
-0.870253	give the result 100	-0.124939
-0.629963	cycles per element. 100	-0.124939
-0.352340	eax, 1 eax, 100	-0.124939
-0.269191	i++. cmp eax, 100	-0.124939
-0.771307	factorial *= x; Note	-0.124939
-0.642736	0 and 1. Note	-0.124939
-0.345198	the desired version. Note	-0.124939
-0.341810	the Windows system. Note	-0.124939
-0.441924	as character arrays. Note	-0.124939
-0.520765	Documentation for details. Note	-0.124939
-0.666067	for an explanation. Note	-0.124939
-0.325321	but less optimized. Note	-0.124939
-0.382782	is optimized away. Note	-0.124939
-0.237869	the variable Day. Note	-0.124939
-0.237869	object file disassembler. Note	-0.124939
-0.237869	any patch. 131 Note	-0.124939
-0.237869	result in a[i]. Note	-0.124939
-0.577642	faster by making them	-0.124939
-1.267489	that you want them	-0.124939
-0.459401	code and compile them	-0.124939
-0.547874	seen can reduce them	-0.124939
-0.350340	until you turn them	-0.124939
-0.531558	returned by copying them	-0.124939
-0.447786	stack and reading them	-0.124939
-0.434941	simply by comparing them	-0.124939
-0.314732	file and copies them	-0.124939
-0.408079	cores and leave them	-0.124939
-0.382782	better to join them	-0.124939
-0.237869	variables or hide them	-0.124939
-0.237869	format and getting them	-0.124939
-0.143235	are reading and writing	-0.124939
-0.143235	on reading and writing	-0.124939
-0.579043	where we are writing	-0.124939
-0.065073	than reading or writing	-0.124939
-0.065073	access Reading or writing	-0.124939
-0.065073	program. Reading or writing	-0.124939
-0.065073	access. Reading or writing	-0.124939
-0.065073	0x1C. Reading or writing	-0.124939
-1.258786	as well as writing	-0.124939
-0.461485	shift in software writing	-0.124939
-0.443528	or more threads writing	-0.124939
-0.544941	avoid multiple threads writing	-0.124939
-1.052455	version of the library.	-0.124939
-0.595085	or a function library.	-0.124939
-0.460196	a different function library.	-0.124939
-0.654469	a separate function library.	-0.124939
-0.598624	entire floating point library.	-0.124939
-1.147358	Agner's vector class library.	-0.124939
-0.548715	or a static library.	-0.124939
-0.353186	Yeppp. Open source library.	-0.124939
-0.643806	the Gnu C library.	-0.124939
-0.408127	131. AMD LIBM library.	-0.124939
-0.237902	than any non-vector library.	-0.124939
-0.463024	union Bitfield { struct	-0.124939
-0.063350	sign :1;//signbit }; struct	-0.124939
-0.531499	expressed as follows: struct	-0.124939
-0.325331	Example 12.2 __declspec(align(16)) struct	-0.124939
-0.325362	// Example 14.9 struct	-0.124939
-0.836025	size = 1024; struct	-0.124939
-0.314698	// Example 7.13 struct	-0.124939
-0.294182	// Example 8.15a struct	-0.124939
-0.237877	// Example 7.40a struct	-0.124939
-0.237877	// Example 8.15b struct	-0.124939
-0.237877	// Example 7.35b struct	-0.124939
-0.237877	// Example 7.35a struct	-0.124939
-0.600798	anywhere in the calculations.	-0.124939
-0.580515	program do the calculations.	-0.124939
-0.598624	precise floating point calculations.	-0.124939
-0.565922	all the integer calculations.	-0.124939
-0.457346	of 64-bit integer calculations.	-0.124939
-0.501748	functions for these calculations.	-0.124939
-0.475110	searching, and mathematical calculations.	-0.124939
-0.311426	can do mathematical calculations.	-0.124939
-0.311426	loop doing mathematical calculations.	-0.124939
-0.455288	the heavy graphics calculations.	-0.124939
-0.346424	logic allows parallel calculations.	-0.124939
-0.487981	than the actual calculations.	-0.124939
-0.294210	CPU from overlapping calculations.	-0.124939
-0.615569	to put the operand	-0.124939
-0.791661	then put the operand	-0.124939
-0.358542	microprocessors when an operand	-0.124939
-0.141969	prediction. If one operand	-0.124939
-0.141969	first. If one operand	-0.124939
-0.182390	if the first operand	-0.124939
-0.502885	If the first operand	-0.124939
-0.269134	and the second operand	-0.124939
-0.114044	then the second operand	-0.425969
-0.269134	whether the second operand	-0.124939
-0.336304	the most predictable operand	-0.124939
-1.041306	may have a reduced	-0.124939
-0.726740	common cause of reduced	-0.124939
-0.594295	bytes can be reduced	-0.124939
-0.887696	condition can be reduced	-0.124939
-0.357260	} Can be reduced	-0.124939
-0.463266	may run with reduced	-0.124939
-1.541684	the Intel compiler reduced	-0.124939
-1.441818	the Gnu compiler reduced	-0.124939
-0.181147	This library has reduced	-0.602060
-1.146394	of the compilers reduced	-0.124939
-0.580246	count has been reduced	-0.124939
-0.581035	approximately two clock cycles.	-0.124939
-0.378148	to 4 clock cycles.	-0.124939
-0.290390	- 8 clock cycles.	-0.124939
-0.378148	save several clock cycles.	-0.124939
-0.290390	only 256 clock cycles.	-0.124939
-0.290390	every three clock cycles.	-0.124939
-0.478286	called core clock cycles.	-0.124939
-0.290390	- 100 clock cycles.	-0.124939
-0.212941	and 20 clock cycles.	-0.124939
-0.312084	- 20 clock cycles.	-0.124939
-0.290390	takes 40 clock cycles.	-0.124939
-0.532169	- 45 clock cycles.	-0.124939
-0.290390	approximately 500 clock cycles.	-0.124939
-1.054714	performance of the final	-0.124939
-0.886626	efficiency of the final	-0.124939
-0.593750	estimate of the final	-0.124939
-0.586873	times in the final	-0.124939
-0.586873	arrays in the final	-0.124939
-0.586873	disabled in the final	-0.124939
-0.586873	inserted in the final	-0.124939
-1.692540	so that the final	-0.124939
-0.596239	counter when the final	-0.124939
-0.594500	allocated. If the final	-0.124939
-0.723021	can check the final	-0.124939
-0.357313	to allocate the final	-0.124939
-0.501072	object on its final	-0.124939
-0.483571	is for the sake	-0.124939
-0.483571	or for the sake	-0.124939
-0.691608	function for the sake	-0.124939
-0.691608	cache for the sake	-0.124939
-0.483571	size for the sake	-0.124939
-0.483571	version for the sake	-0.124939
-0.483571	instructions for the sake	-0.124939
-0.483571	1 for the sake	-0.124939
-0.483571	files for the sake	-0.124939
-0.483571	them for the sake	-0.124939
-0.691608	chosen for the sake	-0.124939
-0.483571	included for the sake	-0.124939
-0.483571	maintained for the sake	-0.124939
-0.463612	than sequences of operations.	-0.124939
-0.491356	suited for vector operations.	-0.124939
-0.451199	implemented as vector operations.	-0.124939
-0.997950	to use vector operations.	-0.124939
-0.451199	for Boolean vector operations.	-0.124939
-0.896245	than floating point operations.	-0.124939
-0.462414	use of integer operations.	-0.124939
-0.354613	only simple standard operations.	-0.124939
-0.512862	additions and shift operations.	-0.124939
-0.279455	as integer arithmetic operations.	-0.124939
-0.279455	than doing arithmetic operations.	-0.124939
-0.314727	many file input/output operations.	-0.124939
-0.237902	called Single-Instruction-Multiple-Data (SIMD) operations.	-0.124939
-0.827080	the dispatcher function. When	-0.124939
-0.933728	in the cache. When	-0.124939
-0.350762	than references are: When	-0.124939
-0.742801	the cache size. When	-0.124939
-0.450151	integer arithmetic operations. When	-0.124939
-0.510443	than single precision. When	-0.124939
-0.343674	77 Pointer aliasing When	-0.124939
-0.331828	every call method. When	-0.124939
-0.698655	and branch mispredictions. When	-0.124939
-0.382782	fast as additions. When	-0.124939
-0.237869	than 2 GB. When	-0.124939
-0.237869	discussion of profiling. When	-0.124939
-0.237869	rounded to 100000000. When	-0.124939
-0.358958	to split the tasks	-0.124939
-0.463493	very important for tasks	-0.124939
-0.570235	If the different tasks	-0.124939
-0.552431	switch between different tasks	-0.124939
-0.462579	independently of other tasks	-0.124939
-0.523293	times for simple tasks	-0.124939
-0.337667	structures for standard tasks	-0.124939
-0.436756	for many standard tasks	-0.124939
-0.354471	speed for certain tasks	-0.124939
-0.347470	high priority. Other tasks	-0.124939
-0.408129	as very time-consuming tasks	-0.124939
-0.287582	to put time-consuming tasks	-0.124939
-0.294200	loop for trivial tasks	-0.124939
-0.459889	of a function. Avoid	-0.124939
-0.355090	than non-virtual functions. Avoid	-0.124939
-1.029223	of a program. Avoid	-0.124939
-0.350762	Possible solutions are: Avoid	-0.124939
-0.444316	or performance problems. Avoid	-0.124939
-0.339406	key press. 19 Avoid	-0.124939
-0.331855	if appropriate. 8. Avoid	-0.124939
-0.683660	MemberPointer is declared. Avoid	-0.124939
-0.538790	on page 93. Avoid	-0.124939
-0.382782	on page 26. Avoid	-0.124939
-0.294173	element level 9. Avoid	-0.124939
-0.237869	on page 22. Avoid	-0.124939
-0.237869	See page 140. Avoid	-0.124939
-0.594801	only, then the effect	-0.124939
-0.356277	xplus2() { The effect	-0.124939
-0.558759	the data. The effect	-0.124939
-0.537192	infinite loop. The effect	-0.124939
-0.356277	parentheses manually. The effect	-0.124939
-0.356277	or post-increment. The effect	-0.124939
-0.358577	undesired effects. This effect	-0.124939
-0.462983	reason why this effect	-0.124939
-0.525007	has hardly any effect	-0.124939
-0.350811	has no negative effect	-0.124939
-0.485898	has a significant effect	-0.124939
-0.339441	the desired polymorphism effect	-0.124939
-0.325361	a very dramatic effect	-0.124939
-0.597511	lower; and the amount	-0.124939
-1.704961	so that the amount	-0.124939
-0.893016	useful when the amount	-0.124939
-0.462721	it increases the amount	-0.124939
-0.358207	to reserve the amount	-0.124939
-0.358207	to minimize the amount	-0.124939
-0.555821	because the total amount	-0.124939
-0.485927	consume a significant amount	-0.124939
-0.117635	to the required amount	-0.124939
-0.117635	allocates the required amount	-0.124939
-0.336333	put an equal amount	-0.124939
-0.488010	takes a considerable amount	-0.124939
-0.294228	CPU, an insufficient amount	-0.124939
-1.634056	size of the variable.	-0.124939
-0.598386	optimizations on the variable.	-0.124939
-1.412655	division by a variable.	-0.124939
-0.540724	references require a variable.	-0.124939
-0.463498	optimizations on that variable.	-0.124939
-0.525891	by a float variable.	-0.124939
-0.463781	is a register variable.	-0.124939
-0.463781	be a register variable.	-0.124939
-0.499551	or a simple variable.	-0.124939
-0.499551	accessing a simple variable.	-0.124939
-0.682159	by an induction variable.	-0.124939
-0.329670	an explicit induction variable.	-0.124939
-0.640419	to a local variable.	-0.124939
-1.499368	most of the time,	-0.124939
-0.585149	bits at a time,	-0.124939
-1.228507	a waste of time,	-0.124939
-0.358392	computers. At this time,	-0.124939
-1.496221	at the same time,	-0.124939
-0.539981	lot of CPU time,	-0.124939
-0.357354	added at any time,	-0.124939
-0.852165	takes a long time,	-0.124939
-0.459889	table takes extra time,	-0.124939
-1.733850	known at compile time,	-0.124939
-0.353371	compromise between development time,	-0.124939
-0.314732	pow at compile- time,	-0.124939
-0.237869	of the programmers' time,	-0.124939
-1.024545	inside a class Variables	-0.124939
-1.500360	on the stack Variables	-0.124939
-0.560779	other in memory. Variables	-0.124939
-0.549597	caching more efficient. Variables	-0.124939
-0.323669	or static storage Variables	-0.124939
-0.592008	of variable storage Variables	-0.124939
-0.350781	positive effects are: Variables	-0.124939
-0.343664	of course inefficient. Variables	-0.124939
-0.343664	for temporary storage. Variables	-0.124939
-0.172658	library functions. 9.4 Variables	-0.124939
-0.172658	together...................................... 88 9.4 Variables	-0.124939
-0.237886	above, p. 26). Variables	-0.124939
-0.237886	the critical stride. Variables	-0.124939
-0.600939	called in the copying	-0.124939
-0.586852	object instead of copying	-0.124939
-1.003565	different ways of copying	-0.124939
-0.358860	for transposing and copying	-0.124939
-0.964417	is done by copying	-0.124939
-0.558377	copied simply by copying	-0.124939
-1.294671	be avoided by copying	-0.124939
-0.353478	this jump by copying	-0.124939
-0.353478	are returned by copying	-0.124939
-1.177599	tasks such as copying	-0.124939
-0.462929	function implicitly when copying	-0.124939
-0.314737	avoid this wasteful copying	-0.124939
-0.314737	prevent legitimate backup copying	-0.124939
-1.486107	the sake of optimization.	-0.124939
-0.550748	hardly relevant to optimization.	-0.124939
-0.358806	the possibilities for optimization.	-0.124939
-0.358599	discussions about code optimization.	-0.124939
-0.463101	article on compiler optimization.	-0.124939
-0.938894	to do this optimization.	-0.124939
-0.610711	for whole program optimization.	-0.124939
-0.431717	do whole program optimization.	-0.124939
-0.357193	relevant to software optimization.	-0.124939
-0.349071	debugging options prevent optimization.	-0.124939
-0.347412	result of full optimization.	-0.124939
-0.294182	still needs careful optimization.	-0.124939
-0.237877	compilers offer profile-guided optimization.	-0.124939
-0.541227	extra cost to accessing	-0.124939
-0.572264	The code for accessing	-0.124939
-1.612289	be used for accessing	-0.124939
-0.460186	induction variable for accessing	-0.124939
-0.558657	is optimized for accessing	-0.124939
-0.356213	in STL for accessing	-0.124939
-0.358692	loading files or accessing	-0.124939
-0.540774	Another problem with accessing	-0.124939
-1.134617	as fast as accessing	-0.124939
-1.381963	more time than accessing	-0.124939
-1.330810	less efficient than accessing	-0.124939
-0.540768	functions or when accessing	-0.124939
-0.348360	Pointer aliasing When accessing	-0.124939
-0.463314	them off or until	-0.124939
-0.358610	will stay on until	-0.124939
-0.358134	is valid only until	-0.124939
-0.867653	by a variable until	-0.124939
-0.547521	access the file until	-0.124939
-0.807585	can be loaded until	-0.124939
-0.208773	seconds and wait until	-0.124939
-0.208773	function will wait until	-0.124939
-0.208773	x must wait until	-0.124939
-0.294191	loaded, but waits until	-0.124939
-0.294191	iteration is repeated until	-0.124939
-0.237886	is not detected until	-0.124939
-0.237886	should be postponed until	-0.124939
-0.599259	difference for the performance.	-0.124939
-0.357466	a lot in performance.	-0.124939
-0.503697	any cost in performance.	-0.124939
-1.176461	no difference in performance.	-0.124939
-0.539941	small gain in performance.	-0.124939
-0.526975	negative effect on performance.	-0.124939
-0.358221	impacts on program performance.	-0.124939
-0.656974	the worst possible performance.	-0.124939
-0.355718	version for best performance.	-0.124939
-0.770606	order to improve performance.	-0.124939
-0.348339	cause of reduced performance.	-0.124939
-0.346474	inlined for improved performance.	-0.124939
-0.314717	tips on improving performance.	-0.124939
-0.358827	be convenient for adding	-0.124939
-0.579036	Next, we are adding	-0.124939
-0.497577	single function by adding	-0.124939
-0.453463	each address by adding	-0.124939
-0.350913	16 bytes by adding	-0.124939
-1.134102	be calculated by adding	-0.124939
-1.271922	be improved by adding	-0.124939
-0.350913	each row by adding	-0.124939
-0.643983	by 2n by adding	-0.124939
-0.461260	size needed before adding	-0.124939
-0.356578	C-style type-casting without adding	-0.124939
-0.456912	for things like adding	-0.124939
-0.346457	Microprocessor producers keep adding	-0.124939
-1.229978	cc[]) { // Define	-0.124939
-0.821510	a[SIZE][SIZE]) { // Define	-0.124939
-0.347596	Aligned arrays // Define	-0.124939
-0.637493	b, c; // Define	-0.124939
-0.347596	double temp; // Define	-0.124939
-0.637493	// x^4 // Define	-0.124939
-0.064346	#include <dvec.h> // Define	-0.425969
-0.637493	#include "vectorclass.h" // Define	-0.124939
-0.347596	#include <emmintrin.h> // Define	-0.124939
-0.637493	#include "asmlib.h" // Define	-0.124939
-0.347596	SelectAddMul_AVX2, SelectAddMul_dispatch; // Define	-0.124939
-0.237959	multiple CPU cores: Define	-0.124939
-0.358839	of course, and causes	-0.124939
-0.358064	be mispredicted, which causes	-0.124939
-0.141653	64 matrix size causes	-0.124939
-0.141653	512 matrix size causes	-0.124939
-0.760829	the new version causes	-0.124939
-0.521309	in the list causes	-0.124939
-0.354148	because the write causes	-0.124939
-0.350286	if the inlining causes	-0.124939
-0.535539	if the destructor causes	-0.124939
-1.211109	the critical stride causes	-0.124939
-0.341802	the most frequent causes	-0.124939
-0.382793	the FDIV bug causes	-0.124939
-0.237877	malloc and free) causes	-0.124939
-1.065768	than by the processing	-0.124939
-0.586639	cores, and a processing	-0.124939
-1.397097	more time than processing	-0.124939
-0.355160	massively parallel vector processing	-0.124939
-0.355160	RISC cores, vector processing	-0.124939
-0.520959	use the high processing	-0.124939
-0.302903	of the graphics processing	-0.124939
-0.483448	have a graphics processing	-0.124939
-0.302903	is no graphics processing	-0.124939
-0.346415	for specifying parallel processing	-0.124939
-0.314717	for statistics, signal processing	-0.124939
-0.294200	Graphics and sound processing	-0.124939
-0.294200	have a physics processing	-0.124939
-0.546469	case is to divide	-0.124939
-0.192418	cores is to divide	-0.425969
-1.166880	in order to divide	-0.301030
-0.873068	we need to divide	-0.124939
-0.560661	several ways to divide	-0.124939
-1.036420	the code and divide	-0.124939
-0.587208	n. You can divide	-0.124939
-0.358728	shift right = divide	-0.124939
-0.560581	signed when you divide	-0.124939
-0.654946	vector library, you divide	-0.124939
-0.896627	due to the so-called	-0.124939
-0.600116	kernel in the so-called	-0.124939
-0.594633	normally use the so-called	-0.124939
-0.824396	by using the so-called	-0.124939
-0.462538	by bypassing the so-called	-0.124939
-0.358064	by emulating the so-called	-0.124939
-0.596061	FPGA as a so-called	-0.124939
-0.525287	most cases. The so-called	-0.124939
-0.461909	written back. The so-called	-0.124939
-0.357569	and modular. The so-called	-0.124939
-0.502225	virtual functions. This so-called	-0.124939
-0.356930	shared object. This so-called	-0.124939
-0.592631	above, it is clear	-0.124939
-0.592631	method, it is clear	-0.124939
-0.892603	It should be clear	-0.124939
-0.594485	operator; you can clear	-0.124939
-1.855382	it is not clear	-0.124939
-0.553950	in a more clear	-0.124939
-0.350711	Gives a more clear	-0.124939
-0.348657	makes it more clear	-0.124939
-0.348657	the program more clear	-0.124939
-0.348657	making software more clear	-0.124939
-1.872378	there is no clear	-0.124939
-0.356994	the code less clear	-0.124939
-0.546433	good for making clear	-0.124939
-0.899413	fraction of the total	-0.124939
-0.581365	add to the total	-0.124939
-0.199901	contribution to the total	-0.124939
-0.893229	effect on the total	-0.124939
-0.892273	dynamically when the total	-0.124939
-0.594766	test because the total	-0.124939
-0.574209	complicated. If the total	-0.124939
-0.574209	stored? If the total	-0.124939
-0.599245	which is a total	-0.124939
-0.595680	members with a total	-0.124939
-0.463530	(SIMD) operations. The total	-0.124939
-0.811302	of 64 bits total	-0.124939
-0.885858	do is to mix	-0.124939
-0.554961	operations, and to mix	-0.124939
-0.461037	sure not to mix	-0.124939
-0.668478	be advantageous to mix	-0.124939
-1.420460	are able to mix	-0.124939
-0.356883	point multiplication, to mix	-0.124939
-0.575370	used. Do not mix	-0.124939
-0.102880	reciprocal_divisor; 14.7 Don't mix	-0.124939
-0.102880	139 14.7 Don't mix	-0.124939
-0.237954	be evicted. Don't mix	-0.124939
-0.237943	have a balanced mix	-0.124939
-0.463559	as DOS and 16-bit	-0.124939
-0.182256	unsigned int in 16-bit	-0.124939
-0.162298	short int in 16-bit	-0.425969
-0.182256	int16_t int in 16-bit	-0.124939
-0.524451	32-bit integers in 16-bit	-0.124939
-0.562830	not optimized for 16-bit	-0.124939
-0.577117	backwards compatible with 16-bit	-0.124939
-0.895483	recommended to make 16-bit	-0.124939
-0.137209	vector of eight 16-bit	-0.124939
-0.137209	vectors of eight 16-bit	-0.124939
-0.525232	and 64-bit mode. 16-bit	-0.124939
-0.901203	offset of the child	-0.124939
-0.599100	name for the child	-0.124939
-0.042665	of parent and child	-0.124939
-0.089983	both parent and child	-0.124939
-0.462708	}; // The child	-0.124939
-0.526211	parent class. The child	-0.124939
-0.532762	member of its child	-0.124939
-0.490406	pointer to its child	-0.124939
-0.333560	information about its child	-0.124939
-0.341856	// call polymorphic child	-0.124939
-0.408163	has the correct child	-0.124939
-0.575582	used set of containers	-0.124939
-0.358812	the wheel. The containers	-0.124939
-0.358770	objects stored are containers	-0.124939
-0.357844	Use these example containers	-0.124939
-0.355621	excessively so. These containers	-0.124939
-0.355332	needed. Objects inside containers	-0.124939
-0.353387	to have separate containers	-0.124939
-0.352322	inefficient solution. Many containers	-0.124939
-0.351648	using ready made containers	-0.124939
-0.830113	of the STL containers	-0.124939
-0.317363	The other STL containers	-0.124939
-0.339407	examples of suitable containers	-0.124939
-0.461037	by 16 to fit	-0.124939
-0.461037	are available to fit	-0.124939
-0.015482	by eight to fit	-0.726999
-0.356883	if necessary, to fit	-0.124939
-0.358111	profiling tools that fit	-0.124939
-0.358111	into sub-vectors that fit	-0.124939
-0.879335	This does not fit	-0.124939
-1.111332	if the data fit	-0.124939
-0.553193	make the data fit	-0.124939
-1.754393	the compiler to predict	-0.124939
-2.074644	in order to predict	-0.124939
-1.159341	be able to predict	-0.124939
-0.511244	always able to predict	-0.124939
-0.511244	sometimes able to predict	-0.124939
-0.449478	is difficult to predict	-0.124939
-0.356208	advanced algorithms to predict	-0.124939
-0.757661	is unable to predict	-0.124939
-1.268229	that you can predict	-0.124939
-0.832501	the microprocessor can predict	-0.124939
-0.237951	CPU may occasionally predict	-0.124939
-0.527339	and setting the priority	-0.124939
-0.358178	with widely different priority	-0.124939
-0.786160	with the same priority	-0.124939
-0.536243	increasing the thread priority	-0.124939
-0.456337	powerful. The high priority	-0.124939
-0.325400	size has higher priority	-0.124939
-0.325400	to give higher priority	-0.124939
-0.287606	and the low priority	-0.124939
-0.408161	in a low priority	-0.124939
-0.364798	at a lower priority	-0.124939
-0.364798	thread with lower priority	-0.124939
-1.072452	copied to the disk	-0.124939
-0.580766	disk because of disk	-0.124939
-0.562281	RAM memory and disk	-0.124939
-0.358407	time, RAM and disk	-0.124939
-0.557598	while waiting for disk	-0.124939
-0.346471	input or reading disk	-0.124939
-0.172697	to the hard disk	-0.124939
-0.026220	of a hard disk	-0.124939
-0.012912	on a hard disk	-0.124939
-0.026220	from a hard disk	-0.124939
-0.115971	support for hard disk	-0.124939
-0.237446	that the clock frequency	-0.124939
-0.306932	if the clock frequency	-0.124939
-0.306932	when the clock frequency	-0.124939
-0.151030	Multithreading The clock frequency	-0.124939
-0.151030	load. The clock frequency	-0.124939
-0.151030	PCs. The clock frequency	-0.124939
-0.216590	the CPU clock frequency	-0.124939
-0.295603	change their clock frequency	-0.124939
-0.295603	a higher clock frequency	-0.124939
-0.295603	the actual clock frequency	-0.124939
-0.463616	a CPU of unknown	-0.124939
-0.358534	assumption about an unknown	-0.124939
-0.504104	or come from unknown	-0.124939
-0.579363	method for all unknown	-0.124939
-0.461780	be so many unknown	-0.124939
-0.498347	model that was unknown	-0.124939
-0.026378	processors that were unknown	-0.602060
-0.040193	models that were unknown	-0.425969
-0.535980	Failure to handle unknown	-0.124939
-0.593886	polymorphism that is obtained	-0.124939
-0.085770	best performance is obtained	-0.124939
-0.722641	modern microprocessors is obtained	-0.124939
-0.656314	user interface is obtained	-0.124939
-0.502530	highest efficiency is obtained	-0.124939
-1.187567	counter can be obtained	-0.124939
-0.593705	resolution can be obtained	-0.124939
-0.951152	can sometimes be obtained	-0.124939
-0.758442	can possibly be obtained	-0.124939
-0.294265	is no doubt obtained	-0.124939
-0.354996	Mathematical vector function libraries.	-0.124939
-0.458640	of different function libraries.	-0.124939
-0.458640	and optimized function libraries.	-0.124939
-0.458640	includes standard function libraries.	-0.124939
-0.540381	and short vector libraries.	-0.124939
-0.587151	to the Intel libraries.	-0.124939
-0.525349	behavior of static libraries.	-0.124939
-0.356420	between multiple dynamic libraries.	-0.124939
-0.809911	for very large libraries.	-0.124939
-0.519323	than static link libraries.	-0.124939
-0.792352	Intel vector math libraries.	-0.124939
-0.294210	link with external libraries.	-0.124939
-0.358965	iterations. Here the iteration	-0.124939
-0.064244	calculation of one iteration	-0.124939
-0.559057	temp in one iteration	-0.124939
-0.522914	saved from one iteration	-0.124939
-0.575588	error for each iteration	-0.124939
-0.526839	loop where each iteration	-0.124939
-0.349424	branch. After each iteration	-0.124939
-0.979566	is an extra iteration	-0.124939
-0.499034	again for every iteration	-0.124939
-0.829148	before the preceding iteration	-0.124939
-0.556362	of the previous iteration	-0.124939
-0.601245	reading of the counters	-0.124939
-0.575602	one set of counters	-0.124939
-0.358834	CPU core). The counters	-0.124939
-0.586840	reading the performance counters	-0.124939
-0.355661	mispredictions, etc. These counters	-0.124939
-0.119832	the performance monitor counters	-0.124939
-0.117301	The performance monitor counters	-0.124939
-0.054702	more performance monitor counters	-0.124939
-0.054702	Using performance monitor counters	-0.124939
-0.356750	the total time. Optimizing	-0.124939
-0.355544	than loops, etc. Optimizing	-0.124939
-0.350790	CPU dispatching are: Optimizing	-0.124939
-0.317377	five manuals: 1. Optimizing	-0.124939
-0.349734	Mac platforms. 2. Optimizing	-0.124939
-0.703390	caching is critical. Optimizing	-0.124939
-0.791637	with big-endian storage. Optimizing	-0.124939
-0.339433	optimizing for speed. Optimizing	-0.124939
-0.255893	pop ebx. 9 Optimizing	-0.124939
-0.255893	............................................................................. 84 9 Optimizing	-0.124939
-0.538839	on the processor). Optimizing	-0.124939
-0.497681	b into a 128-bit	-0.124939
-0.714620	fits into a 128-bit	-0.124939
-0.497681	combined into a 128-bit	-0.124939
-1.389071	For example, a 128-bit	-0.124939
-0.358914	64-bit MMX to 128-bit	-0.124939
-0.839873	point code. The 128-bit	-0.124939
-0.462702	YMM registers The 128-bit	-0.124939
-0.455156	vector as two 128-bit	-0.124939
-0.455156	operations into two 128-bit	-0.124939
-0.651871	processors that supported 128-bit	-0.124939
-0.352369	execution units. Each 128-bit	-0.124939
-0.523895	had the full 128-bit	-0.124939
-0.358926	of range is possibly	-0.124939
-0.358402	import table and possibly	-0.124939
-0.358402	restoring registers, and possibly	-0.124939
-0.547915	performance that can possibly	-0.124939
-0.547915	F2 that can possibly	-0.124939
-1.367977	the code can possibly	-0.124939
-0.536356	high-priority thread can possibly	-0.124939
-0.354905	(if valid) can possibly	-0.124939
-0.358506	line size may possibly	-0.124939
-0.658151	instruction set, but possibly	-0.124939
-0.343731	following methods could possibly	-0.124939
-0.294228	object is overwritten, possibly	-0.124939
-1.327035	is stored in x,	-0.124939
-0.344230	Example 7.32a double x,	-0.124939
-0.344230	Example 7.32b double x,	-0.124939
-0.344230	Example 8.8b double x,	-0.124939
-0.344230	Example 8.8a double x,	-0.124939
-0.352313	56 public: float x,	-0.124939
-0.352313	bool a; float x,	-0.124939
-0.354207	int Multiply (int x,	-0.124939
-0.349107	Func() { S1 x,	-0.124939
-0.632858	function can modify x,	-0.124939
-0.325392	double ipow (double x,	-0.124939
-0.237910	d, e, f, x,	-0.124939
-0.600218	nature of the stack.	-0.124939
-0.587202	big for the stack.	-0.124939
-0.873865	support for the stack.	-0.124939
-0.400154	than on the stack.	-0.301030
-1.229947	stored on the stack.	-0.124939
-0.545995	together on the stack.	-0.124939
-0.571538	cleans up the stack.	-0.124939
-0.842361	as a register stack.	-0.124939
-0.456927	have each their stack.	-0.124939
-0.961947	has its own stack.	-0.124939
-2.004814	a power of 2,	-0.124939
-0.358712	1, Monday = 2,	-0.124939
-0.358577	= (int)n - 2,	-0.124939
-0.163957	0, c + 2,	-0.425969
-0.657285	will evict number 2,	-0.124939
-0.155701	factor of 1, 2,	-0.124939
-0.155701	of sizes 1, 2,	-0.124939
-0.034009	= {1, 1, 2,	-0.425969
-0.336272	of 2 (i.e. 2,	-0.124939
-0.237910	logical processors (0, 2,	-0.124939
-0.599168	(STL) if the full	-0.124939
-1.169306	by making the full	-0.124939
-0.556955	not give the full	-0.124939
-0.358356	models had the full	-0.124939
-0.462910	future. Typically, the full	-0.124939
-0.358916	for handling a full	-0.124939
-1.395833	the result of full	-0.124939
-0.358887	a server in full	-0.124939
-0.540861	half speed or full	-0.124939
-0.463266	debug version with full	-0.124939
-0.540596	this will use full	-0.124939
-0.358230	assembly language has full	-0.124939
-0.460853	than CPU time. Another	-0.124939
-0.947724	the innermost loop. Another	-0.124939
-0.331838	the code itself. Another	-0.124939
-0.421314	well. Open Watcom Another	-0.124939
-0.325331	upon the double. Another	-0.124939
-0.714797	long dependency chains. Another	-0.124939
-0.294182	uses a GOT. Another	-0.124939
-0.294182	the program slower. Another	-0.124939
-0.237877	less than ARRAYSIZE. Another	-0.124939
-0.237877	in a DLL. Another	-0.124939
-0.237877	on Intel CPU’s. Another	-0.124939
-0.237877	the execution considerably. Another	-0.124939
-0.659854	the network is overloaded	-0.124939
-0.526505	vector classes and overloaded	-0.124939
-0.939685	copy constructors and overloaded	-0.124939
-0.589380	name cannot be overloaded	-0.124939
-0.358692	The constructor or overloaded	-0.124939
-0.582780	versions of an overloaded	-0.124939
-0.354927	function. Using an overloaded	-0.124939
-0.354927	a constructor, an overloaded	-0.124939
-0.497338	classes and using overloaded	-0.124939
-0.497338	penalty for using overloaded	-0.124939
-0.555649	expression with multiple overloaded	-0.124939
-0.355808	Overloaded operators An overloaded	-0.124939
-0.600151	whenever it is possible.	-0.124939
-0.598255	happened to be possible.	-0.124939
-0.453854	virtual functions if possible.	-0.124939
-0.064817	instruction set if possible.	-0.124939
-0.351222	usable library if possible.	-0.124939
-0.351222	one function, if possible.	-0.124939
-0.351222	point variables, if possible.	-0.124939
-0.351222	of longjmp if possible.	-0.124939
-0.459335	little work as possible.	-0.124939
-1.001587	as good as possible.	-0.124939
-0.355544	and reproducible as possible.	-0.124939
-0.351082	Multithreading works more efficiently	-0.124939
-0.351082	be calculated more efficiently	-0.124939
-0.351082	be achieved more efficiently	-0.124939
-0.351082	be cached more efficiently	-0.124939
-0.344944	is accessed most efficiently	-0.124939
-0.027287	cache works most efficiently	-0.301030
-0.441471	and much less efficiently	-0.124939
-0.341416	cache works less efficiently	-0.124939
-0.341416	works somewhat less efficiently	-0.124939
-0.355903	functions should work efficiently	-0.124939
-0.463338	Other brands or models	-0.124939
-0.556656	of specific CPU models	-0.124939
-0.059000	list of processor models	-0.425969
-0.059000	of which processor models	-0.425969
-0.308137	of specific processor models	-0.124939
-0.355715	brands or specific models	-0.124939
-0.542208	Some software development models	-0.124939
-0.345212	choice for future models	-0.124939
-0.446256	on all newer models	-0.124939
-0.237910	vector size. Later models	-0.124939
-1.064165	This function is OS	-0.124939
-0.278296	BSD and Mac OS	-0.124939
-0.270445	objects in Mac OS	-0.124939
-0.200500	compiler for Mac OS	-0.124939
-0.200500	references. 64-bit Mac OS	-0.124939
-0.069780	for 32-bit Mac OS	-0.124939
-0.152959	Linux. 32-bit Mac OS	-0.124939
-0.160530	The Intel-based Mac OS	-0.124939
-0.160530	as Intel-based Mac OS	-0.124939
-0.646910	This method requires OS	-0.124939
-0.890747	virtual function is needed.	-0.124939
-0.936798	the library is needed.	-0.124939
-0.786381	of optimization is needed.	-0.124939
-0.539797	complicated implementation is needed.	-0.124939
-0.357735	memory re-allocation is needed.	-0.124939
-1.690852	it is not needed.	-0.124939
-0.584187	CPUs is not needed.	-0.124939
-0.355275	is 95 not needed.	-0.124939
-0.358526	more space than needed.	-0.124939
-0.578480	evaluated only when needed.	-0.124939
-0.771848	feature is rarely needed.	-0.124939
-0.237926	shuffling, packing, unpacking needed.	-0.124939
-0.358855	of structures and classes.	-0.124939
-0.570320	allowed only for classes.	-0.124939
-0.358216	the available vector classes.	-0.124939
-0.586334	cons of using classes.	-0.124939
-0.580929	all of these classes.	-0.124939
-0.310000	of such container classes.	-0.124939
-0.223769	of efficient container classes.	-0.124939
-0.223769	various efficient container classes.	-0.124939
-0.310000	by well-tested container classes.	-0.124939
-0.341829	for implementing polymorphic classes.	-0.124939
-0.336262	of the base classes.	-0.124939
-0.294210	modularity and reusable classes.	-0.124939
-0.463266	+ 1 is changed	-0.124939
-0.358636	and 14.9 is changed	-0.124939
-1.253893	has to be changed	-0.124939
-1.755531	This can be changed	-0.124939
-1.021517	variable can be changed	-0.124939
-0.581976	... can be changed	-0.124939
-0.581976	edx can be changed	-0.124939
-0.593112	2.5 may be changed	-0.124939
-0.582828	operands cannot be changed	-0.124939
-0.780976	function pointer has changed	-0.124939
-0.355209	the value has changed	-0.124939
-0.237943	CPUID is artificially changed	-0.124939
-0.526397	that a is true	-0.124939
-0.599645	habit, it is true	-0.124939
-0.572582	and b is true	-0.124939
-1.188981	known to be true	-0.124939
-0.462006	|| true = true	-0.124939
-0.462006	|| !a = true	-0.124939
-0.562671	repeat loop if true	-0.124939
-0.525818	is most often true	-0.124939
-0.501544	reduced to always true	-0.124939
-0.803807	true a && true	-0.124939
-0.772177	false, a || true	-0.124939
-0.237902	a single result, true	-0.124939
-0.659938	and stop the thread.	-0.124939
-0.358919	before terminating a thread.	-0.124939
-1.216716	in a different thread.	-0.124939
-0.582330	for the other thread.	-0.124939
-0.549883	them into one thread.	-0.124939
-0.477580	separate for each thread.	-0.124939
-0.499784	instance for each thread.	-0.124939
-0.341101	access by each thread.	-0.124939
-0.341101	work into each thread.	-0.124939
-0.301841	5 by another thread.	-0.124939
-0.301841	changed by another thread.	-0.124939
-0.358197	code addresses. The names	-0.124939
-0.358197	compile for. The names	-0.124939
-0.593961	but the function names	-0.124939
-0.562997	119 The function names	-0.124939
-0.353783	correspondence between function names	-0.124939
-0.353783	information about function names	-0.124939
-0.457101	to define function names	-0.124939
-0.448503	vector functions have names	-0.124939
-0.448503	specific functions have names	-0.124939
-0.656812	names and variable names	-0.124939
-0.355023	library libircmt.lib. Function names	-0.124939
-0.483869	at CPU brand names	-0.124939
-0.310385	to temp even though	-0.124939
-0.310385	is executed even though	-0.124939
-0.310385	function returns even though	-0.124939
-0.310385	point expressions, even though	-0.124939
-0.310385	|| b)) even though	-0.124939
-0.310385	than nine, even though	-0.124939
-0.351279	for this function, though	-0.124939
-0.442014	X operating systems, though	-0.124939
-0.339448	optimizing compilers available, though	-0.124939
-0.331904	the track backwards though	-0.124939
-0.237918	the multiplication b[i]*c[i], though	-0.124939
-0.237918	call to Object1.Hello(), though	-0.124939
-0.845874	program than to execute	-0.124939
-1.569738	you have to execute	-0.124939
-1.009456	long time to execute	-0.124939
-1.121457	it takes to execute	-0.124939
-1.734097	is likely to execute	-0.124939
-0.356886	take microseconds to execute	-0.124939
-0.524213	x86 CPUs can execute	-0.124939
-0.829627	the microprocessor can execute	-0.124939
-0.356839	A debugger can execute	-0.124939
-1.419393	makes the code execute	-0.124939
-0.874019	kinds of code execute	-0.124939
-0.358855	with truncation, and %	-0.124939
-0.837741	a = b %	-0.124939
-0.503258	list[i] = i %	-0.124939
-0.537725	{ if (i %	-0.124939
-0.331907	256 && SIZE %	-0.124939
-0.421352	/ (line size) %	-0.124939
-0.324547	= (unsigned int)b %	-0.124939
-0.294210	(10000 / 64) %	-0.124939
-0.237902	(0x2710 / 0x40) %	-0.124939
-1.575759	more efficient than mov	-0.124939
-0.358151	The next instruction mov	-0.124939
-0.356224	i/2+r. The instructions mov	-0.124939
-0.354595	add sar add mov	-0.124939
-0.378759	$B1$1: mov mov mov	-0.124939
-0.265150	parameter $B1$1: mov mov	-0.124939
-0.265150	lea $B2$2: mov mov	-0.124939
-0.294200	push mov xor mov	-0.124939
-0.294200	12 $B1$1: push mov	-0.124939
-0.294200	; parameter $B1$1: mov	-0.124939
-0.237894	mov lea $B2$2: mov	-0.124939
-0.237894	xor mov $B1$2: mov	-0.124939
-1.853039	the value of N	-0.124939
-1.060359	the power of N	-0.124939
-0.658497	The splitting of N	-0.124939
-0.027951	template specialization for N	-0.301030
-0.358688	& N-1)==0 if N	-0.124939
-0.463266	// Array with N	-0.124939
-0.358106	1-bit removed. If N	-0.124939
-0.357489	for pow(x,N) where N	-0.124939
-0.750633	that processor model N	-0.124939
-0.331896	// General case, N	-0.124939
-0.512725	cache. The different kinds	-0.124939
-0.345699	to do different kinds	-0.124939
-0.542844	are two different kinds	-0.124939
-0.345699	are doing different kinds	-0.124939
-0.345699	to mix different kinds	-0.124939
-0.539987	time than other kinds	-0.124939
-0.358059	can cause all kinds	-0.124939
-0.854782	between the two kinds	-0.124939
-0.355094	There are four kinds	-0.124939
-0.354491	can make certain kinds	-0.124939
-0.077812	manuals. 7.1 Different kinds	-0.124939
-0.077812	26 7.1 Different kinds	-0.124939
-0.358184	assembly names. The details	-0.124939
-0.358184	cache (en.wikipedia.org/wiki/L2_cache). The details	-0.124939
-0.657037	test tool for details	-0.124939
-0.357512	page 141 for details	-0.124939
-0.357512	operating systems" for details	-0.124939
-0.504540	CPUs" gives more details	-0.124939
-0.658217	are also other details	-0.124939
-0.343695	on non- standardized details	-0.124939
-0.314727	into the technical details	-0.124939
-0.314727	these problems. More details	-0.124939
-0.237902	and other hardware-related details	-0.124939
-0.237902	one parameter. Further details	-0.124939
-0.599803	disk if the RAM	-0.124939
-1.521877	the use of RAM	-0.124939
-1.110305	the speed of RAM	-0.124939
-1.286979	the amount of RAM	-0.124939
-0.541156	intermediate results in RAM	-0.124939
-0.459817	computers with more RAM	-0.124939
-0.500818	to allocate more RAM	-0.124939
-0.500163	Accessing data from RAM	-0.124939
-0.459222	the variable from RAM	-0.124939
-0.357489	around 1980 where RAM	-0.124939
-0.451137	You may save RAM	-0.124939
-0.348354	of CPU time, RAM	-0.124939
-0.897319	see that the rows	-0.124939
-0.498282	2 if the rows	-0.425969
-1.778305	to make the rows	-0.124939
-0.371432	// number of rows	-0.425969
-0.511180	14.8 const int rows	-0.124939
-0.511180	7.17 const int rows	-0.124939
-0.511180	FuncCol(int); const int rows	-0.124939
-0.357230	the distance between rows	-0.124939
-0.063705	// loop through rows	-0.124939
-0.596106	accessed with a square	-0.124939
-1.024863	the call to square	-0.124939
-0.463485	to ebx. The square	-0.124939
-0.358717	// multiply // square	-0.124939
-0.358007	and handle one square	-0.124939
-0.357609	Example 8.1a float square	-0.124939
-1.358645	This is called square	-0.124939
-0.355765	contentions expected. Use square	-0.124939
-0.353600	complicated techniques like square	-0.124939
-0.339415	fast approximate reciprocal square	-0.124939
-0.294182	Single precision division, square	-0.124939
-0.237877	is inefficient. Division, square	-0.124939
-1.750787	is likely to fail	-0.124939
-0.358692	misleading results or fail	-0.124939
-0.582479	so. It may fail	-0.124939
-0.356544	above example may fail	-0.124939
-0.828046	above code will fail	-0.124939
-0.518687	positive. It will fail	-0.124939
-0.353072	The trick will fail	-0.124939
-0.356848	software companies often fail	-0.124939
-0.554626	code because they fail	-0.124939
-0.545115	not, and therefore fail	-0.124939
-0.341431	developers may therefore fail	-0.124939
-0.237910	many software products fail	-0.124939
-0.157976	for many different purposes.	-0.124939
-0.817412	for several different purposes.	-0.124939
-0.446678	used for other purposes.	-0.124939
-0.446678	available for other purposes.	-0.124939
-0.354505	used for multiple purposes.	-0.124939
-0.246360	array for multiple purposes.	-0.124939
-0.139884	variable for test purposes.	-0.124939
-0.139884	branch for test purposes.	-0.124939
-0.568010	many of these purposes.	-0.124939
-0.448894	for all these purposes.	-0.124939
-0.237926	made for demonstration purposes.	-0.124939
-0.358142	operating system functions (e.g.	-0.124939
-0.357967	a micro-op cache (e.g.	-0.124939
-0.357627	valid 63 number (e.g.	-0.124939
-0.843453	has a branch (e.g.	-0.124939
-0.356972	a system call (e.g.	-0.124939
-0.355866	with system calls (e.g.	-0.124939
-0.354842	and relational operators (e.g.	-0.124939
-0.496576	sequentially. Some applications (e.g.	-0.124939
-0.556464	requires static linking (e.g.	-0.124939
-0.351633	big endian storage (e.g.	-0.124939
-0.349052	a universal algorithm (e.g.	-0.124939
-0.759220	the carry flag (e.g.	-0.124939
-1.105200	The disadvantage of compiling	-0.124939
-0.883913	the possibility of compiling	-0.124939
-0.358718	for interpreting or compiling	-0.124939
-0.357376	This works by compiling	-0.124939
-0.357376	a module by compiling	-0.124939
-0.343766	vector registers when compiling	-0.124939
-0.343766	Visual Studio when compiling	-0.124939
-0.343766	less strict when compiling	-0.124939
-0.343766	option -fno-pic when compiling	-0.124939
-0.343766	class Vec16s when compiling	-0.124939
-0.343766	about Func1 when compiling	-0.124939
-0.343766	to Eclipse when compiling	-0.124939
-1.539564	more efficient to convert	-0.124939
-0.725167	for example, to convert	-0.124939
-1.048432	therefore necessary to convert	-0.124939
-0.579239	number. We can convert	-0.124939
-0.357798	have tested can convert	-0.124939
-0.726858	the compiler will convert	-0.124939
-0.639734	Gnu compiler will convert	-0.124939
-0.639734	good compiler will convert	-0.124939
-0.568374	constant and then convert	-0.124939
-0.355631	< 231 then convert	-0.124939
-0.461106	faster to first convert	-0.124939
-0.653963	the compiler must convert	-0.124939
-1.170820	does the same thing	-0.124939
-0.365287	doing the same thing	-0.425969
-0.833123	exactly the same thing	-0.124939
-1.131247	more than one thing	-0.124939
-0.474890	algorithm The first thing	-0.124939
-0.474890	further. The first thing	-0.124939
-1.098241	The most important thing	-0.124939
-0.519711	137). The second thing	-0.124939
-0.347476	dependency chains. Another thing	-0.124939
-0.331879	code. The third thing	-0.124939
-0.331896	be an obvious thing	-0.124939
-0.587638	i into the least	-0.124939
-0.358675	always chooses the least	-0.124939
-0.358675	operation isolates the least	-0.124939
-0.335660	aligned by at least	-0.124939
-0.335660	class has at least	-0.124939
-0.335660	that calls at least	-0.124939
-0.335660	N+1 supports at least	-0.124939
-0.335660	library (or at least	-0.124939
-0.335660	in memory, at least	-0.124939
-0.335660	same cache, at least	-0.124939
-0.335660	and memcpy, at least	-0.124939
-0.335660	to do, at least	-0.124939
-0.358810	or class for containing	-0.124939
-0.526809	out loop-invariant code containing	-0.124939
-0.939325	128 bit vector containing	-0.124939
-0.578375	be a class containing	-0.124939
-0.353700	a simple class containing	-0.124939
-0.554510	another vector register containing	-0.124939
-0.356407	A big file containing	-0.124939
-0.718525	then the line containing	-0.124939
-0.353391	A large block containing	-0.124939
-0.346438	expression or subexpression containing	-0.124939
-0.652185	manual at www.agner.org/optimize/cppexamples.zip containing	-0.124939
-0.294191	regular access patterns containing	-0.124939
-0.357214	if (u.i[1] < 0)	-0.124939
-0.328369	if (n > 0)	-0.124939
-0.328369	= (bb[i] > 0)	-0.124939
-0.215892	% 2 == 0)	-0.124939
-0.215892	% 128 == 0)	-0.124939
-0.215892	if (a == 0)	-0.124939
-0.094667	= (b == 0)	-0.124939
-0.094667	if (b == 0)	-0.124939
-0.129472	if (a != 0)	-0.124939
-0.129472	while (n != 0)	-0.124939
-0.129472	if (b != 0)	-0.124939
-0.129472	while (*p != 0)	-0.124939
-0.541235	about loss of precision.	-0.124939
-0.463565	good performance and precision.	-0.124939
-0.598637	relaxed floating point precision.	-0.124939
-0.972522	single and double precision.	-0.124939
-0.138870	than for double precision.	-0.124939
-0.138870	even for double precision.	-0.124939
-0.537949	with long double precision.	-0.124939
-0.317791	back to single precision.	-0.124939
-0.317791	fast as single precision.	-0.124939
-0.317791	time than single precision.	-0.124939
-0.411929	all use single precision.	-0.124939
-0.237926	risk of losing precision.	-0.124939
-1.425188	to do the algebraic	-0.124939
-0.884956	the possibility of algebraic	-0.124939
-0.358844	optimization methods and algebraic	-0.124939
-0.597134	safe to use algebraic	-0.124939
-0.526521	they cannot make algebraic	-0.124939
-1.307138	This is because algebraic	-0.124939
-0.357366	to do any algebraic	-0.124939
-0.446571	can do simple algebraic	-0.124939
-0.345461	can reduce simple algebraic	-0.124939
-0.353588	to reduce complicated algebraic	-0.124939
-0.353164	to reduce various algebraic	-0.124939
-0.352322	a compiler. Many algebraic	-0.124939
-1.146192	The use of structures	-0.124939
-0.526949	are instances of structures	-0.124939
-0.358718	the arrays or structures	-0.124939
-1.006472	Alignment of data structures	-0.124939
-0.547258	algorithms and data structures	-0.124939
-0.437294	and other data structures	-0.124939
-0.437294	the multiple data structures	-0.124939
-0.155102	in large data structures	-0.124939
-0.694867	have big data structures	-0.124939
-0.338095	more advanced data structures	-0.124939
-0.355828	arrays and big structures	-0.124939
-0.594271	type-casting with a little	-0.124939
-0.657336	to run a little	-0.124939
-0.462027	This needs a little	-0.124939
-0.462027	loop becomes a little	-0.124939
-0.357662	may seem a little	-0.124939
-0.358636	table-based methods with little	-0.124939
-0.463214	should do as little	-0.124939
-0.463008	Most programmers have little	-0.124939
-0.358321	unroll factor. A little	-0.124939
-0.357439	compact and takes little	-0.124939
-0.580838	There is very little	-0.124939
-0.352921	sampling generates too little	-0.124939
-0.570205	NotPolymorphic(); }; // Any	-0.124939
-1.410654	Dynamic memory allocation Any	-0.124939
-0.348304	the entire object. Any	-0.124939
-0.343682	the new block. Any	-0.124939
-0.534686	point execution units. Any	-0.124939
-0.336274	is 400 here. Any	-0.124939
-0.615648	the loop counter. Any	-0.124939
-0.800060	difficult to maintain. Any	-0.124939
-0.382793	can be shared. Any	-0.124939
-0.237877	to be saved. Any	-0.124939
-0.237877	the spell checking. Any	-0.124939
-0.237877	to the device. Any	-0.124939
-0.600661	abstraction in the logical	-0.124939
-0.896839	good for the logical	-0.124939
-0.998416	even though the logical	-0.124939
-0.358916	numbers form a logical	-0.124939
-1.822774	the number of logical	-0.124939
-1.331064	The number of logical	-0.124939
-0.142985	of cores or logical	-0.124939
-0.142985	CPU cores or logical	-0.124939
-1.681718	of the same logical	-0.124939
-0.869099	with only one logical	-0.124939
-0.354899	processors but eight logical	-0.124939
-0.237910	only the even-numbered logical	-0.124939
-0.065522	asmlib library int level	-0.425969
-0.566246	adds an extra level	-0.124939
-0.063750	the vector element level	-0.124939
-0.355013	functions /Gr Function level	-0.124939
-0.280690	in the high level	-0.124939
-0.280690	because the high level	-0.124939
-0.495344	where a high level	-0.124939
-0.352040	microarchitecture. A higher level	-0.124939
-0.421364	when the highest level	-0.124939
-0.237910	option for "function level	-0.124939
-0.065577	all files on access.	-0.124939
-0.352584	or by memory access.	-0.124939
-0.352584	calculations with memory access.	-0.124939
-0.352584	for another memory access.	-0.124939
-0.549654	arrays with vector access.	-0.124939
-0.462327	addresses at each access.	-0.124939
-0.458181	done at every access.	-0.124939
-0.354625	and direct hardware access.	-0.124939
-0.349072	by optimizing database access.	-0.124939
-0.490869	need any non-static access.	-0.124939
-0.341829	faster than random access.	-0.124939
-1.812330	to use the bitwise	-0.124939
-1.522051	by using the bitwise	-0.124939
-0.358664	operands. Nevertheless, the bitwise	-0.124939
-0.461090	when needed. The bitwise	-0.124939
-0.356925	at once The bitwise	-0.124939
-0.356925	and ||). The bitwise	-0.124939
-0.356925	2n -1. The bitwise	-0.124939
-0.358644	can do with bitwise	-0.124939
-0.872219	trick of using bitwise	-0.124939
-0.138606	zero. 14.3 Use bitwise	-0.124939
-0.138606	134 14.3 Use bitwise	-0.124939
-0.382862	and the corresponding bitwise	-0.124939
-0.599808	WriteFile if the handle	-0.124939
-0.539275	safe way to handle	-0.124939
-0.785460	fastest way to handle	-0.124939
-0.065587	sufficiently large to handle	-0.425969
-0.656448	are designed to handle	-0.124939
-0.526095	number. Failure to handle	-0.124939
-0.358407	avoid these and handle	-0.124939
-0.358407	smaller squares and handle	-0.124939
-1.118324	that we can handle	-0.124939
-0.358329	thread should then handle	-0.124939
-0.553527	dispatcher that doesn't handle	-0.124939
-0.596648	stored by the heap	-0.124939
-0.579940	collection when the heap	-0.124939
-0.579940	places when the heap	-0.124939
-0.504399	memory called the heap	-0.124939
-0.937816	will cause the heap	-0.124939
-0.462734	and causes the heap	-0.124939
-0.573260	overhead cost of heap	-0.124939
-0.501349	random order. The heap	-0.124939
-0.356303	dynamic allocation. The heap	-0.124939
-0.460300	memory heap. The heap	-0.124939
-0.356303	page 26. The heap	-0.124939
-0.356303	become invalid. The heap	-0.124939
-0.347460	next instruction mov DWORD	-0.124939
-0.341841	PTR [eax], ecx DWORD	-0.124939
-0.269214	ebx, 1 ebx, DWORD	-0.124939
-0.352368	instruction add ebx, DWORD	-0.124939
-0.255885	eax, eax edx, DWORD	-0.124939
-0.255885	edx, ecx, edx, DWORD	-0.124939
-0.314727	esp ebx ecx, DWORD	-0.124939
-0.137155	DWORD PTR [edx] DWORD	-0.124939
-0.538855	DWORD PTR [esp+8] DWORD	-0.124939
-0.237902	DWORD PTR [eax+400] DWORD	-0.124939
-0.237902	DWORD PTR [esp+4] DWORD	-0.124939
-0.655532	the user's time. Other	-0.124939
-0.352344	only known processors. Other	-0.124939
-0.429484	a high priority. Other	-0.124939
-0.325384	in this format. Other	-0.124939
-0.408127	162 19 Literature Other	-0.124939
-0.102861	................................................................................................................. 21 3.11 Other	-0.124939
-0.102861	is best. 3.11 Other	-0.124939
-0.048392	...................................................................................................... 20 3.9 Other	-0.124939
-0.048392	files). 20 3.9 Other	-0.124939
-0.102861	so on. 7.31 Other	-0.124939
-0.102861	................................................................................ 61 7.31 Other	-0.124939
-0.237902	Intel and Gnu). Other	-0.124939
-0.882604	which is used during	-0.124939
-0.586809	of the performance during	-0.124939
-0.501219	speculatively executing instructions during	-0.124939
-0.354139	of time both during	-0.124939
-0.744163	specific CPU core during	-0.124939
-0.810474	off the computer during	-0.124939
-0.453333	which will change during	-0.124939
-0.336253	task switch occurs during	-0.124939
-0.325331	may be selected during	-0.124939
-0.294182	the framework itself, during	-0.124939
-0.294182	an array grows during	-0.124939
-0.237877	under the framework, during	-0.124939
-0.594968	(PLT) that is initialized	-0.124939
-1.671979	that it is initialized	-0.124939
-0.883493	sure it is initialized	-0.124939
-0.914245	the table is initialized	-0.124939
-0.358871	string constants, and initialized	-0.124939
-0.557598	program, one for initialized	-0.124939
-1.413243	need to be initialized	-0.124939
-1.656764	it can be initialized	-0.124939
-1.056311	array can be initialized	-0.124939
-0.539037	} An array initialized	-0.124939
-0.572175	b have been initialized	-0.124939
-0.502085	that overflow can occur	-0.124939
-0.356830	target buffer can occur	-0.124939
-0.356830	garbage collection can occur	-0.124939
-0.460610	such expressions may occur	-0.124939
-0.356547	// Overflow may occur	-0.124939
-0.358329	the break will occur	-0.124939
-0.357569	Reducible expressions also occur	-0.124939
-0.355816	integer overflow doesn't occur	-0.124939
-0.229071	matrix when contentions occur	-0.124939
-0.229071	dramatic when contentions occur	-0.124939
-0.325399	errors that seldom occur	-0.124939
-0.590678	directly if the target	-0.124939
-1.056709	eliminated if the target	-0.124939
-0.594230	changed then the target	-0.124939
-0.463119	to predict the target	-0.124939
-0.358210	be predicted. The target	-0.124939
-0.358210	next paragraph. The target	-0.124939
-0.137289	in the branch target	-0.425969
-0.477630	called the branch target	-0.124939
-0.426647	critical. The branch target	-0.124939
-0.329601	code cache, branch target	-0.124939
-0.346438	of a program, especially	-0.124939
-0.626286	in 32-bit systems, especially	-0.124939
-0.467566	code is inefficient, especially	-0.124939
-0.595116	loss of precision, especially	-0.124939
-0.294191	a scarce resource, especially	-0.124939
-0.294191	in the file, especially	-0.124939
-0.294191	efficient than relocation, especially	-0.124939
-0.382805	long dependency chains, especially	-0.124939
-0.237886	point code slower, especially	-0.124939
-0.237886	critical dependency chain, especially	-0.124939
-0.237886	also time consuming, especially	-0.124939
-1.027477	pointer or a smart	-0.124939
-1.044507	then use a smart	-0.124939
-0.525433	don't need a smart	-0.124939
-0.869261	object through a smart	-0.124939
-0.503256	cost whenever a smart	-0.124939
-0.828187	common implementations of smart	-0.124939
-0.355761	Smart pointers A smart	-0.124939
-0.500591	longer used. A smart	-0.124939
-0.586350	purpose of using smart	-0.124939
-0.461492	some things very smart	-0.124939
-0.456904	with each their smart	-0.124939
-0.527126	innermost loop that includes	-0.124939
-0.456886	other compilers. This includes	-0.124939
-0.353614	register variables. This includes	-0.124939
-0.353614	than speed. This includes	-0.124939
-0.353614	"override" feature. This includes	-0.124939
-1.427195	The Intel compiler includes	-0.124939
-0.357563	C++ language also includes	-0.124939
-0.656245	in this way includes	-0.124939
-0.758218	The map file includes	-0.124939
-0.455309	are: Static linking includes	-0.124939
-0.237910	from www.agner.org/optimize/asmlib.zip. Currently includes	-0.124939
-0.596713	allocated and the entire	-0.124939
-0.599705	constants in the entire	-0.124939
-0.581785	linking makes the entire	-0.124939
-1.091702	of making the entire	-0.124939
-0.723723	to copy the entire	-0.124939
-0.880652	to load the entire	-0.124939
-0.831936	by copying the entire	-0.124939
-0.357615	be loading the entire	-0.124939
-0.357615	where almost the entire	-0.124939
-0.461967	may mirror the entire	-0.124939
-0.358559	write causes an entire	-0.124939
-0.873868	them into the executable	-0.124939
-0.915436	you want the executable	-0.124939
-0.540561	Therefore, both the executable	-0.124939
-0.462920	file. Only the executable	-0.124939
-0.358364	run. Both the executable	-0.124939
-0.540783	or by an executable	-0.124939
-1.283907	in a single executable	-0.124939
-0.351691	distributed as binary executable	-0.124939
-0.230379	in the main executable	-0.124939
-0.294568	from the main executable	-0.124939
-0.898598	useful if the subexpression	-0.124939
-0.570506	around such a subexpression	-0.124939
-0.463354	An expression or subexpression	-0.124939
-0.897212	If the same subexpression	-0.124939
-0.330043	such as common subexpression	-0.124939
-0.330043	This allows common subexpression	-0.124939
-0.330043	function inlining, common subexpression	-0.124939
-0.129466	elimin., integer Common subexpression	-0.124939
-0.129466	+= 2; Common subexpression	-0.124939
-0.129466	(vector) reductions: Common subexpression	-0.124939
-0.129466	Pointer elimination Common subexpression	-0.124939
-1.055860	way is to insert	-0.124939
-1.664733	is possible to insert	-0.124939
-0.842004	often possible to insert	-0.124939
-0.461900	down. Remember to insert	-0.124939
-0.357562	are risking to insert	-0.124939
-0.564828	compile time and insert	-0.124939
-0.824519	static memory and insert	-0.124939
-1.219975	instruction set and insert	-0.124939
-0.357491	by hand and insert	-0.124939
-0.589891	Intel compiler can insert	-0.124939
-0.591723	// You may insert	-0.124939
-0.594607	9.10, then the nontemporal	-0.124939
-0.541041	cache. Using the nontemporal	-0.124939
-0.527304	the effect of nontemporal	-0.124939
-0.358826	memory area. The nontemporal	-0.124939
-0.501924	the #pragma vector nontemporal	-0.124939
-0.356715	write #pragma vector nontemporal	-0.124939
-0.356715	nontemporal #pragma vector nontemporal	-0.124939
-0.591531	ameliorated by using nontemporal	-0.124939
-0.490249	back. The so-called nontemporal	-0.124939
-0.494351	evicted. Don't mix nontemporal	-0.124939
-0.346458	compiler can insert nontemporal	-0.124939
-0.570599	go outside the bounds	-0.124939
-0.358923	have added a bounds	-0.124939
-0.805985	The method of bounds	-0.124939
-0.500725	an array with bounds	-0.124939
-0.500725	of arrays with bounds	-0.124939
-0.459733	7.15a. Array with bounds	-0.124939
-0.358628	page 134 on bounds	-0.124939
-0.512595	Violation of array bounds	-0.124939
-0.222346	check for array bounds	-0.124939
-0.222346	checking for array bounds	-0.124939
-0.222346	checks for array bounds	-0.124939
-0.358848	be inlined for improved	-0.124939
-1.001107	that can be improved	-0.124939
-1.252129	code can be improved	-0.124939
-1.165491	This can be improved	-0.425969
-0.561127	performance can be improved	-0.124939
-0.561127	processors can be improved	-0.124939
-0.561127	projects can be improved	-0.124939
-0.584424	speed will be improved	-0.124939
-0.351965	can probably be improved	-0.124939
-0.597916	math and the SSE	-0.124939
-0.596655	Microprocessors with the SSE	-0.124939
-0.581574	microprocessor has the SSE	-0.124939
-0.504430	not support the SSE	-0.124939
-0.652252	32 4 128 SSE	-0.124939
-0.753690	32 bit mode SSE	-0.124939
-0.621753	/Gy -ffunction- sections SSE	-0.124939
-0.237910	Prefetch PREFETCH _mm_prefetch SSE	-0.124939
-0.237910	cache MOVNTPS _mm_stream_ps SSE	-0.124939
-0.237910	file MMX mmintrin.h SSE	-0.124939
-0.237910	cache MOVNTQ _mm_stream_pi SSE	-0.124939
-0.897318	And it is discussed	-0.124939
-0.589963	part. It is discussed	-0.124939
-0.589963	considerations. It is discussed	-0.124939
-0.462128	of threads is discussed	-0.124939
-0.357741	type conversions is discussed	-0.124939
-0.356417	to optimization are discussed	-0.124939
-1.258027	function libraries are discussed	-0.124939
-0.537404	These methods are discussed	-0.124939
-0.356417	common time-consumers are discussed	-0.124939
-0.358615	small devices, as discussed	-0.124939
-1.034486	it is also discussed	-0.124939
-0.527339	not need the updates	-0.124939
-0.249096	The search for updates	-0.124939
-0.249096	programs search for updates	-0.124939
-0.461841	time searching for updates	-0.124939
-0.358230	of downloaded program updates	-0.124939
-0.353200	to install automatic updates	-0.124939
-0.124513	18 3.4 Automatic updates	-0.124939
-0.124513	manner. 3.4 Automatic updates	-0.124939
-0.341838	remotely. If frequent updates	-0.124939
-0.505845	these time consuming updates	-0.124939
-0.237910	programs automatically download updates	-0.124939
-1.628705	is important to consider	-0.124939
-0.566717	then you may consider	-0.301030
-0.441778	code, you may consider	-0.124939
-0.441778	object, you may consider	-0.124939
-0.878579	code. If you consider	-0.124939
-0.526412	then we will consider	-0.124939
-0.460118	complicated that I consider	-0.124939
-0.523236	purpose, you must consider	-0.124939
-0.344077	However, we must consider	-0.124939
-0.358811	it requires the loading	-0.124939
-0.358811	may involve the loading	-0.124939
-0.594221	you will be loading	-0.124939
-0.580597	use more time loading	-0.124939
-0.462806	level-2 cache from loading	-0.124939
-0.358230	etc. But program loading	-0.124939
-0.347103	to memory without loading	-0.124939
-0.347103	a double without loading	-0.124939
-0.439059	because of lazy loading	-0.124939
-0.093308	19 3.5 Program loading	-0.124939
-0.093308	process. 3.5 Program loading	-0.124939
-0.358788	would all be below	-0.124939
-0.462277	as the example below	-0.124939
-1.202279	at an address below	-0.124939
-0.458958	checking is explained below	-0.124939
-0.060294	// loop columns below	-0.425969
-0.341878	function ReadTSC listed below	-0.124939
-0.429484	from row 28 below	-0.124939
-0.172668	the elements matrix[r][c] below	-0.124939
-0.237927	Each element matrix[r][c] below	-0.124939
-0.294210	as example 7.15b below	-0.124939
-0.527116	latter case the reading	-0.124939
-0.957420	turn off the reading	-0.124939
-0.828129	This applies to reading	-0.124939
-0.658801	the stack and reading	-0.124939
-0.358396	to optimize, and reading	-0.124939
-0.858188	because we are reading	-0.124939
-0.659393	user input or reading	-0.124939
-0.358636	in connection with reading	-0.124939
-0.504588	is spent on reading	-0.124939
-1.672666	is faster than reading	-0.124939
-0.594513	operation rather than reading	-0.124939
-0.505056	framework. Obviously, the directly	-0.124939
-0.895608	calling the function directly	-0.124939
-1.258589	as well as directly	-0.124939
-0.355871	below. Make calls directly	-0.124939
-0.354155	These instructions write directly	-0.124939
-0.339457	floating point representation directly	-0.124939
-0.615664	implementations of C++, directly	-0.124939
-0.408103	put measurement instruments directly	-0.124939
-0.294191	can call C1::f directly	-0.124939
-0.294191	can be fed directly	-0.124939
-0.237886	}; // Called directly	-0.124939
-1.628715	This is the simplest	-0.124939
-1.060924	only in the simplest	-0.124939
-0.890833	except in the simplest	-0.124939
-0.598259	Except for the simplest	-0.124939
-0.576246	understands only the simplest	-0.124939
-0.282856	that gives the simplest	-0.124939
-0.502241	per vector. The simplest	-0.124939
-0.356942	development tools. The simplest	-0.124939
-0.356942	compiling module2.cpp. The simplest	-0.124939
-0.356942	is repetitive. The simplest	-0.124939
-0.596532	parallelism is the situation	-0.124939
-0.599468	refers to the situation	-0.124939
-0.600661	useful in the situation	-0.124939
-0.463507	or structure. The situation	-0.124939
-0.358393	in a use situation	-0.124939
-0.504169	cache space. A situation	-0.124939
-0.526137	about the only situation	-0.124939
-0.461676	used in any situation	-0.124939
-0.104703	the worst case situation	-0.124939
-0.355641	46 A common situation	-0.124939
-1.054653	called from the message	-0.124939
-0.599860	typically in a message	-0.124939
-0.358882	semaphores, mutexes and message	-0.124939
-0.274742	makes an error message	-0.124939
-0.274742	need an error message	-0.124939
-0.274742	generate an error message	-0.124939
-0.274742	issue an error message	-0.124939
-0.274742	provoke an error message	-0.124939
-0.283366	checking). An error message	-0.124939
-0.519984	your own error message	-0.124939
-0.369566	an appropriate error message	-0.124939
-0.565188	extra time. The delay	-0.124939
-0.461887	matrix line. The delay	-0.124939
-0.461887	dynamic linker. The delay	-0.124939
-0.358580	250 ms. This delay	-0.124939
-0.594907	calculate the time delay	-0.124939
-0.526399	i which will delay	-0.124939
-0.562323	incur a large delay	-0.124939
-0.355810	store operation doesn't delay	-0.124939
-0.331879	called. A considerable delay	-0.124939
-0.023523	a store forwarding delay	-0.425969
-0.600664	statement in the condition	-0.124939
-0.375682	eliminated if the condition	-0.124939
-0.358919	because testing a condition	-0.124939
-0.504686	that the if condition	-0.124939
-0.890982	is the loop condition	-0.124939
-0.576313	and an error condition	-0.124939
-0.481328	for the overflow condition	-0.124939
-0.341874	An uncaught overflow condition	-0.124939
-0.461264	The loop control condition	-0.124939
-0.461264	efficient loop control condition	-0.124939
-0.437940	using the performance monitor	-0.124939
-0.437940	up the performance monitor	-0.124939
-0.437940	read the performance monitor	-0.124939
-0.445284	problems. The performance monitor	-0.124939
-0.056915	or more performance monitor	-0.425969
-0.293549	counters. A performance monitor	-0.124939
-0.293549	feature called performance monitor	-0.124939
-0.293549	particularly useful performance monitor	-0.124939
-0.056915	16.1 Using performance monitor	-0.425969
-0.940756	to economize the resource	-0.124939
-0.726725	frequent sources of resource	-0.124939
-0.358680	of modules or resource	-0.124939
-1.535884	to the same resource	-0.124939
-0.530554	swapping and other resource	-0.124939
-0.530554	evictions and other resource	-0.124939
-0.435001	DLLs, configuration files, resource	-0.124939
-1.031610	important to economize resource	-0.124939
-0.652146	are a scarce resource	-0.124939
-0.237894	is a precious resource	-0.124939
-0.237894	or shared objects), resource	-0.124939
-1.312639	the number of cores	-0.124939
-0.563890	reduced number of cores	-0.124939
-0.849784	between the different cores	-0.124939
-0.585215	all the CPU cores	-0.124939
-0.226064	The multiple CPU cores	-0.124939
-0.226064	with multiple CPU cores	-0.124939
-0.226064	use multiple CPU cores	-0.124939
-0.555666	Processors with multiple cores	-0.124939
-0.458776	processor with four cores	-0.124939
-0.325407	and FPGA soft cores	-0.124939
-1.161393	code has a parallel	-0.124939
-1.486182	the sake of parallel	-0.124939
-0.764433	multiple calculations in parallel	-0.124939
-0.659636	OpenMP directives for parallel	-0.124939
-0.499165	available for doing parallel	-0.124939
-0.649294	program logic allows parallel	-0.124939
-0.317383	optimization options. Supports parallel	-0.124939
-0.317383	and Mac. Supports parallel	-0.124939
-0.336287	standard for specifying parallel	-0.124939
-0.294200	language is inherently parallel	-0.124939
-0.237894	supercomputers with massively parallel	-0.124939
-0.587383	same code in either	-0.124939
-0.503369	5 times faster either	-0.124939
-0.838604	can be implemented either	-0.425969
-0.752160	can be linked either	-0.124939
-1.085579	You may choose either	-0.124939
-0.343674	It can contain either	-0.124939
-0.477987	can be saved either	-0.124939
-0.325377	chance of going either	-0.124939
-0.444316	multiple memory blocks, either	-0.124939
-0.294200	graphics processing unit, either	-0.124939
-0.412492	of two different implementations	-0.124939
-0.412492	make two different implementations	-0.124939
-0.344675	the loop. Some implementations	-0.124939
-0.344675	can run. Some implementations	-0.124939
-0.999367	The most common implementations	-0.124939
-0.342645	93). All common implementations	-0.124939
-0.353641	executable code. Most implementations	-0.124939
-0.353610	languages and their implementations	-0.124939
-0.451997	have particularly slow implementations	-0.124939
-0.345221	case if alternative implementations	-0.124939
-0.294219	compilation. Some early implementations	-0.124939
-0.587354	table instead of calculating	-0.124939
-0.558297	often used for calculating	-0.124939
-0.558297	currently used for calculating	-0.124939
-0.925447	induction variables for calculating	-0.124939
-0.354927	physics processor for calculating	-0.124939
-0.998791	hardware support for calculating	-0.124939
-0.535146	work needed for calculating	-0.124939
-0.556609	unit intended for calculating	-0.124939
-1.693048	is faster than calculating	-0.124939
-0.462943	done implicitly when calculating	-0.124939
-0.325401	it to begin calculating	-0.124939
-1.858202	the value of ebx	-0.124939
-0.358881	compute i/2 in ebx	-0.124939
-0.537543	ecx+eax*4. The result ebx	-0.124939
-0.458751	label. It uses ebx	-0.124939
-0.349044	label ; save ebx	-0.124939
-0.339424	points to. Now ebx	-0.124939
-0.336261	assembly code. Register ebx	-0.124939
-0.444303	ENDP + esp ebx	-0.124939
-0.314708	< 100. pop ebx	-0.124939
-0.314708	eax, 100 $B1$2 ebx	-0.124939
-0.237886	label ; restore ebx	-0.124939
-1.863432	in the same generation	-0.124939
-0.474895	vector). The first generation	-0.124939
-0.474895	follows. The first generation	-0.124939
-0.460697	development, each new generation	-0.124939
-0.787653	that the next generation	-0.124939
-0.481846	on the next generation	-0.124939
-0.725591	in the second generation	-0.124939
-0.115806	functions. The second generation	-0.425969
-0.339448	loops or compile-time generation	-0.124939
-0.331889	about the third generation	-0.124939
-0.887390	do is to enable	-0.124939
-2.098745	in order to enable	-0.124939
-1.107274	is recommended to enable	-0.124939
-0.461892	compiler options to enable	-0.124939
-0.504939	set up and enable	-0.124939
-1.065414	64-bit mode or enable	-0.124939
-0.572881	inline. This may enable	-0.124939
-0.569459	inline. This will enable	-0.124939
-0.522538	optimization, which will enable	-0.124939
-0.573023	newer instruction sets enable	-0.124939
-1.424637	of floating point instructions.	-0.124939
-0.870190	any floating point instructions.	-0.124939
-0.356589	to access these instructions.	-0.124939
-0.480877	to the AVX instructions.	-0.124939
-0.480877	use the AVX instructions.	-0.124939
-0.712904	memory and string instructions.	-0.124939
-0.352033	9.2. Cache control instructions.	-0.124939
-0.518249	delay the subsequent instructions.	-0.124939
-0.339469	a few machine instructions.	-0.124939
-0.753001	slow bit scan instructions.	-0.124939
-0.325361	documentation for detailed instructions.	-0.124939
-1.037714	the object is copied	-0.124939
-0.742398	an object is copied	-0.124939
-0.549371	the parameter is copied	-0.124939
-0.358033	example 14.1c is copied	-0.124939
-1.041952	object can be copied	-0.124939
-1.155158	objects can be copied	-0.124939
-0.589292	members can be copied	-0.124939
-1.190919	code is not copied	-0.124939
-0.580276	file has been copied	-0.124939
-0.314777	the entire contents copied	-0.124939
-0.237926	is created, deleted, copied	-0.124939
-1.428402	a vector of e.g.	-0.124939
-0.861340	response time to e.g.	-0.124939
-0.595253	suffixes such as e.g.	-0.124939
-1.540969	tell the compiler e.g.	-0.124939
-0.552719	specific instruction set, e.g.	-0.124939
-0.498845	registers can hold e.g.	-0.124939
-0.339407	instruction set available, e.g.	-0.124939
-0.314708	software programming language, e.g.	-0.124939
-0.237886	with internal multi-threading, e.g.	-0.124939
-0.237886	the module with, e.g.	-0.124939
-0.237886	generate an interrupt, e.g.	-0.124939
-0.885087	alternative is to keep	-0.124939
-0.582393	program has to keep	-0.124939
-0.586026	useful way to keep	-0.124939
-1.756909	you want to keep	-0.124939
-1.360413	be advantageous to keep	-0.124939
-0.547059	often fail to keep	-0.124939
-0.356544	powerful computers to keep	-0.124939
-0.501685	be preferable to keep	-0.124939
-0.527220	allocated objects and keep	-0.124939
-0.460541	that they always keep	-0.124939
-0.237943	sets Microprocessor producers keep	-0.124939
-0.037171	instruction mov DWORD PTR	-0.124939
-0.037171	[eax], ecx DWORD PTR	-0.124939
-0.018188	1 ebx, DWORD PTR	-0.124939
-0.018188	add ebx, DWORD PTR	-0.124939
-0.018188	eax edx, DWORD PTR	-0.124939
-0.018188	ecx, edx, DWORD PTR	-0.124939
-0.037171	ebx ecx, DWORD PTR	-0.124939
-0.155385	PTR [edx] DWORD PTR	-0.124939
-0.037171	PTR [esp+8] DWORD PTR	-0.124939
-0.037171	PTR [eax+400] DWORD PTR	-0.124939
-0.037171	PTR [esp+4] DWORD PTR	-0.124939
-0.503610	AVX instr. set Automatic	-0.124939
-0.352637	for float expressions Automatic	-0.124939
-0.559771	Automatic CPU dispatch Automatic	-0.124939
-0.527287	dispatch Automatic vectorization Automatic	-0.124939
-0.429484	individual installation tools. Automatic	-0.124939
-0.102861	.................................................................................................. 18 3.4 Automatic	-0.124939
-0.102861	standardized manner. 3.4 Automatic	-0.124939
-0.294210	// Example 12.1a. Automatic	-0.124939
-0.102861	distant future. 12.3 Automatic	-0.124939
-0.102861	.......................................................... 107 12.3 Automatic	-0.124939
-0.237902	tools. Automatic updates. Automatic	-0.124939
-0.355875	discontinued Object Windows Library	-0.124939
-0.070910	the Windows Template Library	-0.124939
-0.070910	and Windows Template Library	-0.124939
-0.155701	the Standard Template Library	-0.124939
-0.155701	the Active Template Library	-0.124939
-0.339487	AMD Core Math Library	-0.124939
-0.920214	divisible by 16. Library	-0.124939
-0.732692	Intel's Math Kernel Library	-0.124939
-0.325371	always fully optimized. Library	-0.124939
-0.408139	__vrd2_exp AMD LIBM Library	-0.124939
-0.237910	library functions directly: Library	-0.124939
-0.563841	(!a&&c) = a ?	-0.124939
-0.563841	(b&&c) = a ?	-0.124939
-1.774047	a = b ?	-0.124939
-0.244237	a > b ?	-0.124939
-0.351784	(a > b ?	-0.124939
-0.104706	b > 0 ?	-0.425969
-0.395384	(bb[i] > 0) ?	-0.124939
-0.730617	(b == 0) ?	-0.124939
-0.294237	(GetExceptionCode() == EXCEPTION_FLT_OVERFLOW ?	-0.124939
-0.237926	OneOrTwo5[b!=0] as OneOrTwo5[(b!=0) ?	-0.124939
-1.682182	the function is defined	-0.124939
-0.577065	sin function is defined	-0.124939
-0.462875	its body is defined	-0.124939
-0.462447	Monday, etc. are defined	-0.124939
-0.539796	the constants are defined	-0.124939
-0.358234	the programmer has defined	-0.124939
-0.357633	a static object defined	-0.124939
-0.357308	variables (i.e. variables defined	-0.124939
-0.572161	could have been defined	-0.124939
-0.137991	12.5. Vector classes defined	-0.124939
-0.137991	12.1. Vector classes defined	-0.124939
-0.358929	of Basic is Visual	-0.124939
-0.401730	into the Microsoft Visual	-0.124939
-0.353205	tool is Microsoft Visual	-0.124939
-0.353205	plug-in to Microsoft Visual	-0.124939
-0.269905	mentioned below. Microsoft Visual	-0.124939
-0.269905	to date): Microsoft Visual	-0.124939
-0.325391	for multi-core processing. Visual	-0.124939
-0.172682	such as C#, Visual	-0.124939
-0.172682	in Java, C#, Visual	-0.124939
-0.314756	available for free. Visual	-0.124939
-0.237926	objects, respectively (MS Visual	-0.124939
-2.105085	in order to align	-0.124939
-1.170499	for how to align	-0.124939
-1.166341	shows how to align	-0.124939
-0.724383	may choose to align	-0.124939
-1.477499	the compiler can align	-0.124939
-0.723962	small x // align	-0.124939
-0.357718	swap elements // align	-0.124939
-0.933174	Some compilers will align	-0.124939
-0.982407	Most compilers will align	-0.124939
-0.344070	; return ; align	-0.124939
-0.344070	+ esp ; align	-0.124939
-0.463192	memory allocations of sizes	-0.124939
-0.358577	cache. Bit-fields of sizes	-0.124939
-0.344031	objects of different sizes	-0.425969
-0.454701	alignments and different sizes	-0.124939
-0.576098	efficiently on all sizes	-0.124939
-0.539957	intended for array sizes	-0.124939
-0.554540	of vector register sizes	-0.124939
-0.459156	for different matrix sizes	-0.124939
-0.457393	and operators Integer sizes	-0.124939
-0.351271	Integers of smaller sizes	-0.124939
-0.665966	{ a[i] = temp;	-0.124939
-0.064492	r, c; double temp;	-0.425969
-0.348719	c1, c2; double temp;	-0.124939
-0.064867	= temp * temp;	-0.124939
-0.356927	c[size]; float register temp;	-0.124939
-0.351716	i, a[100], b, temp;	-0.124939
-1.374275	a, b, c, temp;	-0.124939
-0.714919	int i, a[100], temp;	-0.124939
-0.557530	certain instructions that allow	-0.124939
-1.043917	compiler does not allow	-0.124939
-0.572579	changed. This will allow	-0.124939
-0.357993	of C++ should allow	-0.124939
-0.581219	most C++ compilers allow	-0.124939
-0.544889	Many 32-bit systems allow	-0.124939
-0.504444	important. Some systems allow	-0.124939
-0.338176	64-bit Unix systems allow	-0.124939
-1.078115	BSD and Mac allow	-0.124939
-0.350322	and 16-bit Windows, allow	-0.124939
-0.349100	high precision math allow	-0.124939
-0.896148	work on the PathScale	-0.124939
-0.433316	the Intel and PathScale	-0.124939
-0.433316	Gnu, Intel and PathScale	-0.124939
-0.203494	Intel, Gnu and PathScale	-0.425969
-0.461201	Intel, Microsoft and PathScale	-0.124939
-0.835174	Clang, Intel or PathScale	-0.124939
-0.522539	by Microsoft, Intel, PathScale	-0.124939
-0.350316	for all platforms. PathScale	-0.124939
-0.325399	Digital Mars PGI PathScale	-0.124939
-0.237918	2006 (Red Hat). PathScale	-0.124939
-0.971118	also applies to BSD	-0.124939
-1.009388	in Linux and BSD	-0.124939
-0.570602	Shared objects in BSD	-0.124939
-0.206603	distributions of Linux, BSD	-0.124939
-0.277624	objects in Linux, BSD	-0.124939
-0.304136	whereas 64-bit Linux, BSD	-0.124939
-0.205981	a Windows, Linux, BSD	-0.124939
-0.205981	with Windows, Linux, BSD	-0.124939
-0.339468	FreeBSD and Open BSD	-0.124939
-0.237926	Windows, Linux, Mac, BSD	-0.124939
-0.357767	+ e + f;	-0.124939
-0.013197	union { float f;	-0.726999
-1.005642	int i; float f;	-0.124939
-0.357317	*= i; return f;	-0.124939
-1.439345	result of the previous	-0.124939
-0.895653	constant to the previous	-0.124939
-0.890022	value in the previous	-0.124939
-1.275275	explained in the previous	-0.124939
-1.557875	depends on the previous	-0.124939
-0.350502	calculated from the previous	-0.425969
-0.531280	efficiently from the previous	-0.124939
-0.588298	finished using the previous	-0.124939
-0.565279	loaded until the previous	-0.124939
-0.582030	known from a previous	-0.124939
-0.494804	0; i < size;	-0.865301
-0.598913	Fortunately, it is rarely	-0.124939
-0.589798	pointers It is rarely	-0.124939
-0.589798	set. It is rarely	-0.124939
-0.503671	dispatch mechanism is rarely	-0.124939
-0.265585	this feature is rarely	-0.124939
-0.835691	the time and rarely	-0.124939
-1.066918	high that it rarely	-0.124939
-0.358074	Mac programs but rarely	-0.124939
-0.345236	program. Application programmers rarely	-0.124939
-0.345229	of advanced features rarely	-0.124939
-1.216716	in a different way.	-0.124939
-0.582330	goes the other way.	-0.124939
-0.530290	in the following way.	-0.124939
-0.314775	in an inefficient way.	-0.124939
-0.659957	a very inefficient way.	-0.124939
-0.346457	of going either way.	-0.124939
-0.152951	in a suboptimal way.	-0.124939
-0.237918	in a graceful way.	-0.124939
-0.599634	object to the vector.	-0.124939
-0.587829	fit into the vector.	-0.124939
-0.498608	elements in a vector.	-0.124939
-0.574368	terms in one vector.	-0.124939
-0.357833	a full size vector.	-0.124939
-0.456645	as a Boolean vector.	-0.124939
-0.549642	in the last vector.	-0.124939
-0.187713	of elements per vector.	-0.124939
-0.473663	of the largest vector.	-0.124939
-0.595329	program that is easier	-0.124939
-0.599371	convenient. It is easier	-0.124939
-0.658667	example 15.1b is easier	-0.124939
-0.358866	more manageable and easier	-0.124939
-0.834395	This makes it easier	-0.124939
-0.462644	declaration makes it easier	-0.124939
-0.690741	It is often easier	-0.124939
-0.352365	The calculation becomes easier	-0.124939
-0.516076	It is just easier	-0.124939
-0.314747	make this reordering easier	-0.124939
-0.884036	pointed to is identical	-0.124939
-0.550396	by p is identical	-0.124939
-0.539796	two constants are identical	-0.124939
-0.357992	Open BSD are identical	-0.124939
-0.354476	be stored. All identical	-0.124939
-0.341857	systems give almost identical	-0.124939
-0.364837	code is exactly identical	-0.124939
-0.279485	will make exactly identical	-0.124939
-0.444355	compact by joining identical	-0.124939
-0.102867	range analysis Join identical	-0.124939
-0.102867	memory area. Join identical	-0.124939
-0.463538	between 5 and 20	-0.124939
-0.142765	of 10 - 20	-0.124939
-0.142765	until 10 - 20	-0.124939
-0.358265	been reduced from 20	-0.124939
-0.421371	This loop repeats 20	-0.124939
-0.325351	System database ...................................................................................................... 20	-0.124939
-0.294200	position-independent code ....................................................... 20	-0.124939
-0.294200	Literature ..................................................................................................................... 163 20	-0.124939
-0.237894	for some links. 20	-0.124939
-0.237894	3.7 File access................................................................................................................ 20	-0.124939
-0.237894	files (*.ini files). 20	-0.124939
-0.357062	be smaller as well.	-0.124939
-0.357062	programming languages as well.	-0.124939
-0.357235	executed. Optimizes very well.	-0.124939
-0.650423	Does not optimize well.	-0.124939
-0.412283	loop is predicted well.	-0.124939
-0.470275	would be predicted well.	-0.124939
-0.290726	is not predicted well.	-0.124939
-0.595190	code version performs well.	-0.124939
-0.077812	vectorization. Optimizes reasonably well.	-0.124939
-0.077812	conventions. Optimizes reasonably well.	-0.124939
-0.237918	vectorization. Optimizes moderately well.	-0.124939
-2.038115	part of the program,	-0.124939
-0.888106	start of the program,	-0.124939
-0.594503	redesign of the program,	-0.124939
-0.370989	modified by the program,	-0.124939
-0.462724	to execute the program,	-0.124939
-1.623703	part of a program,	-0.124939
-0.503131	In a C++ program,	-0.124939
-0.454921	point in your program,	-0.124939
-1.059542	in the final program,	-0.124939
-0.314756	in a multithreaded program,	-0.124939
-0.876609	The address of list[i]	-0.124939
-1.050588	} else { list[i]	-0.425969
-0.555623	because the expression list[i]	-0.124939
-0.352913	< ARRAYSIZE && list[i]	-0.124939
-0.563475	array cout << list[i]	-0.124939
-0.023523	for(i=0; i<300; i++){ list[i]	-0.124939
-0.102864	for(i=0; i<300; i+=3){ list[i]	-0.124939
-0.102864	for(i=0; i<301; i+=3){ list[i]	-0.124939
-0.237910	for(i=i_div_3=0; i<300; i+=3,i_div_3++){ list[i]	-0.124939
-1.138318	the response time under	-0.124939
-1.487155	of the program under	-0.124939
-1.249183	in the program under	-0.124939
-1.057545	If the program under	-0.124939
-0.586825	test the performance under	-0.124939
-0.355725	that performs best under	-0.124939
-0.546239	tests are done under	-0.124939
-0.531380	also be tested under	-0.124939
-0.467594	program that runs under	-0.124939
-0.314727	for background services under	-0.124939
-0.294210	found in Wikipedia under	-0.124939
-0.659751	scan instruction and expect	-0.124939
-0.565507	optimizations you can expect	-0.124939
-0.832824	general, you can expect	-0.124939
-0.880260	we do not expect	-0.124939
-0.589807	table if you expect	-0.124939
-0.560579	& unless you expect	-0.124939
-0.575856	line that we expect	-0.124939
-0.786207	and you cannot expect	-0.124939
-0.357729	case. You cannot expect	-0.124939
-0.357729	examples. You cannot expect	-0.124939
-0.357729	compiler-specific. You cannot expect	-0.124939
-1.162464	in a register except	-0.124939
-0.139486	if all bits except	-0.124939
-0.139486	testing all bits except	-0.124939
-0.348339	the same time, except	-0.124939
-0.339448	the same object, except	-0.124939
-0.331880	a static library, except	-0.124939
-0.595134	overflow and underflow except	-0.124939
-0.314717	make 16-bit programs, except	-0.124939
-0.836044	on the stack, except	-0.124939
-0.237894	is much faster, except	-0.124939
-0.237894	in the representation, except	-0.124939
-0.597219	happen with the loops	-0.124939
-0.462969	no compile- time loops	-0.124939
-0.434717	of the two loops	-0.124939
-0.357554	automatically replace such loops	-0.124939
-0.459249	on very small loops	-0.124939
-0.354162	counter outside both loops	-0.124939
-0.349114	compilers will unroll loops	-0.124939
-0.408127	C++ template metaprogramming, loops	-0.124939
-0.023522	the processor. Nested loops	-0.425969
-0.010583	is the reason why	-0.124939
-0.329856	1. The reason why	-0.124939
-0.329856	do. The reason why	-0.124939
-0.329856	occur. The reason why	-0.124939
-0.211250	The main reason why	-0.124939
-0.345255	the main reasons why	-0.124939
-0.343762	have no explanation why	-0.124939
-0.294256	following example explains why	-0.124939
-1.144548	of the CPU dispatching.	-0.124939
-0.328612	approach to CPU dispatching.	-0.124939
-0.425410	library with CPU dispatching.	-0.124939
-0.328612	is called CPU dispatching.	-0.124939
-0.328612	functions without CPU dispatching.	-0.124939
-0.221709	for automatic CPU dispatching.	-0.124939
-0.187001	with automatic CPU dispatching.	-0.124939
-0.328612	of poor CPU dispatching.	-0.124939
-0.328612	of bad CPU dispatching.	-0.124939
-0.343656	>= size) { cout	-0.124939
-0.015100	void Disp() { cout	-0.726999
-0.063831	void Hello() { cout	-0.425969
-0.343656	(unsigned int)size) { cout	-0.124939
-0.357507	Loop through array cout	-0.124939
-0.632966	bit of f cout	-0.124939
-0.450117	as pointers and references.	-0.124939
-0.450117	using pointers and references.	-0.124939
-0.526959	by pointers or references.	-0.124939
-0.358644	are impossible with references.	-0.124939
-0.357871	simpler when using references.	-0.124939
-0.224740	name for local references.	-0.124939
-0.224740	lookups for local references.	-0.124939
-0.026219	used for internal references.	-0.124939
-0.054124	PLT for internal references.	-0.124939
-1.410495	it possible to come	-0.124939
-0.357382	interface elements that come	-0.124939
-0.503619	or libraries that come	-0.124939
-0.357382	file mathimf.h that come	-0.124939
-0.358692	are uninitialized or come	-0.124939
-0.358503	consuming updates may come	-0.124939
-0.461765	other big objects come	-0.124939
-0.839488	or if they come	-0.124939
-0.353621	License shall automatically come	-0.124939
-0.570172	used data members come	-0.124939
-0.567167	the series of statements	-0.124939
-0.358693	allows compile-time if statements	-0.124939
-0.357635	semicolons, while multiple statements	-0.124939
-0.534694	reason to add statements	-0.124939
-0.186374	branches and switch statements	-0.124939
-0.278959	Branches and switch statements	-0.124939
-0.321845	replacements for switch statements	-0.124939
-0.321845	well. A switch statements	-0.124939
-0.243858	statements because switch statements	-0.124939
-0.294237	two ways. Switch statements	-0.124939
-1.017793	d; d = u;	-0.124939
-0.498095	7.25 unsigned int u;	-0.124939
-0.498095	14.22b unsigned int u;	-0.124939
-0.498095	14.22a unsigned int u;	-0.124939
-0.184782	int i; } u;	-0.346788
-0.345076	int i[2]; } u;	-0.124939
-0.599641	instruction if the SSE4.1	-0.124939
-0.580610	bits), unless the SSE4.1	-0.124939
-0.358908	multiplication prior to SSE4.1	-0.124939
-0.557580	set, one for SSE4.1	-0.124939
-0.358729	// SSE2 // SSE4.1	-0.124939
-0.463260	example, vectorized with SSE4.1	-0.124939
-0.503610	SSE3 instr. set SSE4.1	-0.124939
-0.537121	integer vector instructions SSE4.1	-0.124939
-0.331888	vector class library, SSE4.1	-0.124939
-0.237902	Suppl. SSE3 tmmintrin.h SSE4.1	-0.124939
-1.296573	explained in the chapter	-0.124939
-1.279693	as explained in chapter	-0.124939
-0.407516	as described in chapter	-0.124939
-0.502981	operations mentioned in chapter	-0.124939
-0.358583	polymorphous class? This chapter	-0.124939
-0.357047	static variables. See chapter	-0.124939
-0.549646	mode. The next chapter	-0.124939
-0.816048	in the previous chapter	-0.124939
-0.237918	the "Macro loops" chapter	-0.124939
-0.593033	compiler which is similar	-0.124939
-0.504593	A template is similar	-0.124939
-0.585920	objconv or a similar	-0.124939
-0.358396	Text strings and similar	-0.124939
-0.358396	Square blocking and similar	-0.124939
-0.358818	ball reveals that similar	-0.124939
-0.358433	by Intel have similar	-0.124939
-0.358321	} 138 A similar	-0.124939
-0.538636	microprocessors are very similar	-0.124939
-0.756914	core library contains similar	-0.124939
-0.031520	This is of course	-0.124939
-0.065508	works is of course	-0.124939
-0.065508	animations is of course	-0.124939
-0.356596	These are of course	-0.124939
-0.356596	you may of course	-0.124939
-0.356596	There should of course	-0.124939
-0.356596	15.1c would of course	-0.124939
-0.237951	reveal a zigzag course	-0.124939
-0.237951	compile time. (Of course	-0.124939
-0.502986	code address and back	-0.124939
-0.129144	protected mode and back	-0.425969
-0.461789	to truncation and back	-0.124939
-0.585337	convert the result back	-0.124939
-0.350828	counter and go back	-0.124939
-0.347484	setting the priority back	-0.124939
-0.336283	lies r places back	-0.124939
-0.331889	100 and jumps back	-0.124939
-0.237918	software that dates back	-0.124939
-0.572939	changed without the risk	-0.124939
-0.314578	it involves the risk	-0.124939
-0.314578	also involves the risk	-0.124939
-0.589826	code is a risk	-0.124939
-1.587979	there is a risk	-0.124939
-0.358078	// Faster, but risk	-0.124939
-0.918723	there is no risk	-0.602060
-0.541905	is a higher risk	-0.124939
-0.587359	manager has a garbage	-0.124939
-0.357475	task switches and garbage	-0.124939
-0.357475	allocation, deallocation and garbage	-0.124939
-0.142951	memory management and garbage	-0.124939
-0.142951	heap management and garbage	-0.124939
-0.504877	no need for garbage	-0.124939
-0.463199	too fragmented. This garbage	-0.124939
-1.358882	This is called garbage	-0.124939
-0.453943	manager will start garbage	-0.124939
-0.483867	the very time-consuming garbage	-0.124939
-0.586823	Excessive use of templates	-0.124939
-0.940036	the form of templates	-0.124939
-0.527196	container classes and templates	-0.124939
-0.357244	polymorphism effect with templates	-0.124939
-0.461496	Compile-time polymorphism with templates	-0.124939
-0.544051	made container class templates	-0.124939
-0.353723	suitable containers class templates	-0.124939
-0.833066	cost to using templates	-0.124939
-0.353430	page 150. Using templates	-0.124939
-0.314747	well-tested functions, classes, templates	-0.124939
-0.581986	missing check for buffer	-0.124939
-0.996001	in a memory buffer	-0.124939
-0.595967	up the loop buffer	-0.124939
-0.935570	in a static buffer	-0.124939
-0.162597	the branch target buffer	-0.124939
-0.182725	The branch target buffer	-0.124939
-0.006838	as a circular buffer	-0.301030
-0.600984	names of the header	-0.124939
-0.598123	libmmt.lib and the header	-0.124939
-1.427439	can use the header	-0.124939
-0.358919	are including a header	-0.124939
-0.358866	library modules and header	-0.124939
-0.358737	#include "xmmintrin.h" // header	-0.124939
-0.549363	defined in Intel header	-0.124939
-0.562567	If the standard header	-0.124939
-0.452221	include the appropriate header	-0.124939
-0.452221	Including the appropriate header	-0.124939
-0.600803	away in the future	-0.124939
-0.577462	string. In the future	-0.124939
-0.583199	future. If a future	-0.124939
-0.541125	optimal choice for future	-0.124939
-0.659650	only hope that future	-0.124939
-0.536279	work best on future	-0.124939
-0.355675	fastest solution on future	-0.124939
-0.355675	256 bytes) on future	-0.124939
-0.596473	processors rather than future	-0.124939
-0.347497	to Object1.Hello(), though future	-0.124939
-0.814669	may be called whenever	-0.124939
-0.575186	principle is useful whenever	-0.124939
-1.257405	floating point calculations whenever	-0.124939
-0.585988	several clock cycles whenever	-0.124939
-0.563847	pointers to zero whenever	-0.124939
-0.497038	an extra cost whenever	-0.124939
-0.747224	they are declared whenever	-0.124939
-0.749092	to be mispredicted whenever	-0.124939
-0.331880	table (PLT). And whenever	-0.124939
-0.237894	their own initiative whenever	-0.124939
-0.723178	to gain by unrolling	-0.124939
-0.357380	performance dramatically by unrolling	-0.124939
-0.835267	} The loop unrolling	-0.124939
-0.351226	use excessive loop unrolling	-0.124939
-0.351226	much. Excessive loop unrolling	-0.124939
-0.231412	} } Loop unrolling	-0.124939
-0.231412	execution time. Loop unrolling	-0.124939
-0.231412	this case. Loop unrolling	-0.124939
-0.231412	unroll factor. Loop unrolling	-0.124939
-0.231412	is eliminated. Loop unrolling	-0.124939
-1.391321	different versions of CriticalFunction	-0.124939
-1.024888	the call to CriticalFunction	-0.124939
-0.358525	extern "C" int CriticalFunction	-0.124939
-0.357473	version CriticalFunctionType * CriticalFunction	-0.124939
-0.357473	// Generic version CriticalFunction	-0.124939
-0.757467	number of times CriticalFunction	-0.124939
-0.620767	// SSE2 supported CriticalFunction	-0.124939
-0.620767	// AVX supported CriticalFunction	-0.124939
-0.354897	depends on whether CriticalFunction	-0.124939
-0.790745	takes to execute CriticalFunction	-0.124939
-0.833799	operating system to swap	-0.124939
-0.143184	a macro to swap	-0.124939
-0.143184	Define macro to swap	-0.124939
-0.358877	the diagonal and swap	-0.124939
-0.357718	below diagonal // swap	-0.124939
-0.357718	swapd(a[r][c], a[c][r]); // swap	-0.124939
-0.575365	column; Do not swap	-0.124939
-0.463395	Here, you cannot swap	-0.124939
-0.463395	Here you cannot swap	-0.124939
-0.548772	operands. You cannot swap	-0.124939
-0.557288	that uses a newer	-0.124939
-0.463223	may choose a newer	-0.124939
-0.997929	is available in newer	-0.124939
-1.135996	instruction set. The newer	-0.124939
-0.504588	particularly fast on newer	-0.124939
-0.504169	is used. A newer	-0.124939
-0.570084	version on all newer	-0.124939
-0.354402	4, while all newer	-0.124939
-0.525756	important on most newer	-0.124939
-0.354467	operating system All newer	-0.124939
-0.598331	integer, and the fraction	-0.124939
-0.764263	by setting the fraction	-0.124939
-0.112078	{ unsigned int fraction	-0.602060
-0.358039	2exponent 16383 one fraction	-0.124939
-0.562347	if a large fraction	-0.124939
-0.558217	only a small fraction	-0.124939
-0.337693	2exponent 127 1 fraction	-0.124939
-0.337693	2exponent 1023 1 fraction	-0.124939
-1.035147	be necessary to modify	-0.124939
-1.043293	not recommended to modify	-0.124939
-0.414951	this function can modify	-0.124939
-0.414951	one function can modify	-0.124939
-0.461940	container classes or modify	-0.124939
-0.357593	add, remove or modify	-0.124939
-1.189326	Make the function modify	-0.124939
-0.525049	anyway. If we modify	-0.124939
-0.759876	member function cannot modify	-0.124939
-0.353427	flag and don't modify	-0.124939
-1.858253	the value of seconds	-0.124939
-0.583086	would assume that seconds	-0.124939
-0.596463	cycles rather than seconds	-0.124939
-0.358455	void DelayFiveSeconds() { seconds	-0.124939
-0.462589	another thread. If seconds	-0.124939
-0.549481	attempts to set seconds	-0.124939
-0.355928	do nothing while seconds	-0.124939
-0.521510	delayed for several seconds	-0.124939
-0.689713	can take several seconds	-0.124939
-0.490318	will wait until seconds	-0.124939
-0.358835	to account for unaligned	-0.124939
-0.201742	Function to store unaligned	-0.602060
-0.353442	of efficiency. Using unaligned	-0.124939
-0.183328	Function to load unaligned	-0.602060
-0.242391	0.11 memcpy 16kB unaligned	-0.124939
-0.242391	0.22 memcpy 16kB unaligned	-0.124939
-0.358413	object through this address.	-0.124939
-0.572511	holds a memory address.	-0.124939
-1.026599	to a different address.	-0.124939
-0.599090	requiring the same address.	-0.124939
-0.356052	must calculate its address.	-0.124939
-0.326942	a specific load address.	-0.124939
-0.326942	the actual load address.	-0.124939
-0.336278	through a self-relative address.	-0.124939
-0.429484	to a valid address.	-0.124939
-0.237902	a 32-bit (signed) address.	-0.124939
-0.128841	* c); // Store	-0.425969
-0.458214	_mm_or_si128(c2, bc); // Store	-0.124939
-0.354661	c2, mask); // Store	-0.124939
-0.354661	//=A*x*x+B*x+C //=DeltaY // Store	-0.124939
-0.346500	MOVNTI _mm_stream_si32 SSE2 Store	-0.124939
-0.346500	MOVNTPD _mm_stream_pd SSE2 Store	-0.124939
-0.257605	PREFETCH _mm_prefetch SSE Store	-0.124939
-0.257605	MOVNTPS _mm_stream_ps SSE Store	-0.124939
-0.257605	MOVNTQ _mm_stream_pi SSE Store	-0.124939
-0.600860	step of the sequence	-0.124939
-0.891633	elements in the sequence	-0.124939
-0.596294	step in the sequence	-0.124939
-0.599334	disadvantage if the sequence	-0.124939
-0.591032	performed on a sequence	-0.124939
-0.358298	are doing a sequence	-0.124939
-0.358298	labels follow a sequence	-0.124939
-0.358893	are allocated in sequence	-0.124939
-0.541083	earlier CPUs. The sequence	-0.124939
-0.575884	where a long sequence	-0.124939
-1.190204	generated by the compiler,	-0.124939
-0.597039	comes with the compiler,	-0.124939
-0.802862	using an Intel compiler,	-0.124939
-0.410020	with Intel C++ compiler,	-0.124939
-0.410020	2.00. Intel C++ compiler,	-0.124939
-0.844857	of the Gnu compiler,	-0.124939
-0.344691	#else // Gnu compiler,	-0.124939
-0.355419	on a Linux compiler,	-0.124939
-0.526351	details in both compiler,	-0.124939
-0.335550	support from both compiler,	-0.124939
-0.726774	The delay is significant	-0.124939
-0.598905	iteration is a significant	-0.124939
-1.034054	This has a significant	-0.124939
-0.504468	functions consume a significant	-0.124939
-0.835558	the possibility for significant	-0.124939
-0.598717	table is not significant	-0.124939
-1.052811	and the most significant	-0.124939
-0.220694	into the least significant	-0.124939
-0.220694	isolates the least significant	-0.124939
-0.331904	of approximately seven significant	-0.124939
-0.556865	complex cases it might	-0.124939
-0.357560	parameters. Or it might	-0.124939
-1.114685	An optimizing compiler might	-0.124939
-0.504298	elements then this might	-0.124939
-0.358320	better solution. It might	-0.124939
-0.461565	that the variables might	-0.124939
-0.357050	a fixed address might	-0.124939
-0.584540	then the user might	-0.124939
-0.352960	this example. We might	-0.124939
-0.294210	Such a coprocessor might	-0.124939
-0.596289	registers in the CPU.	-0.124939
-0.596289	units in the CPU.	-0.124939
-1.072598	easier for the CPU.	-0.124939
-1.646154	depending on the CPU.	-0.124939
-0.358914	any brand of CPU.	-0.124939
-0.802850	on an Intel CPU.	-0.124939
-0.354637	any known hardware CPU.	-0.124939
-0.350330	on a modern CPU.	-0.124939
-0.348367	on a non-Intel CPU.	-0.124939
-0.382850	a 2 GHz CPU.	-0.124939
-0.357817	Vector class, Intel Vector	-0.124939
-0.654781	of vector, bits Vector	-0.124939
-0.492161	point register variables. Vector	-0.124939
-0.890262	later instruction sets. Vector	-0.124939
-0.294200	F64vec4 Table 12.5. Vector	-0.124939
-0.237894	license Table 12.4. Vector	-0.124939
-0.237894	been updated lately. Vector	-0.124939
-0.237894	AVX512 Table 12.1. Vector	-0.124939
-0.237894	// Example 12.7. Vector	-0.124939
-0.237894	512 bits (ZMM). Vector	-0.124939
-0.896956	limit to the length	-0.124939
-0.599020	integer if the length	-0.124939
-0.593832	GHz then the length	-0.124939
-0.595643	0x20; If the length	-0.124939
-0.579885	safe unless the length	-0.124939
-0.462731	by adding the length	-0.124939
-0.658421	is doubled. The length	-0.124939
-0.358206	was started. The length	-0.124939
-0.456060	function. The string length	-0.124939
-0.345241	when the row length	-0.124939
-0.358882	into lines and sets.	-0.124939
-0.412738	for large data sets.	-0.124939
-0.412738	with large data sets.	-0.124939
-0.341497	processors and instruction sets.	-0.124939
-0.063547	of these instruction sets.	-0.124939
-0.089108	and later instruction sets.	-0.124939
-0.356264	the different instructions sets.	-0.124939
-1.051316	that is a linear	-0.124939
-1.602810	This is a linear	-0.124939
-0.869808	efficient than a linear	-0.124939
-0.806604	can use a linear	-0.124939
-0.941457	then use a linear	-0.124939
-0.524905	container, then a linear	-0.124939
-0.721693	or even a linear	-0.124939
-0.583955	looping through a linear	-0.124939
-0.358897	applications (e.g. in linear	-0.124939
-0.349793	mathematical calculations including linear	-0.124939
-1.603596	if there is something	-0.124939
-0.065700	I write that something	-0.425969
-0.504411	function also has something	-0.124939
-0.197304	important to do something	-0.425969
-0.354753	to actually doing something	-0.124939
-0.456649	opposite: Don't put something	-0.124939
-0.314737	vectorized code. Storing something	-0.124939
-0.629920	it is certainly something	-0.124939
-0.691949	sign bit of f	-0.124939
-1.056517	else { // f	-0.124939
-0.357706	- 30 // f	-0.124939
-0.462868	first sum, then f	-0.124939
-0.458533	<= n; i++) f	-0.124939
-0.570221	n! int i, f	-0.124939
-0.237910	f = (float)i; f	-0.124939
-0.237910	f = float(i); f	-0.124939
-0.237910	float f; f=i; f	-0.124939
-1.686996	there is a penalty	-0.124939
-0.527150	alternative version. The penalty	-0.124939
-0.358586	register state. This penalty	-0.124939
-1.889988	There is no penalty	-0.124939
-0.436104	is a performance penalty	-0.124939
-0.617352	is no performance penalty	-0.124939
-0.337148	hardly any performance penalty	-0.124939
-0.337148	no 51 performance penalty	-0.124939
-0.364817	so the misprediction penalty	-0.124939
-0.364817	get a misprediction penalty	-0.124939
-0.585985	breaking out of F1	-0.124939
-0.358908	throw() specification to F1	-0.124939
-0.865937	to assume that F1	-0.124939
-0.570155	} The function F1	-0.124939
-0.724526	only possible if F1	-0.124939
-0.357431	F1. However, if F1	-0.124939
-0.726431	functions called by F1	-0.124939
-0.358321	an exception then F1	-0.124939
-0.358103	be necessary. If F1	-0.124939
-0.237902	F1 without returning. F1	-0.124939
-0.463639	expression list[i] is invalid	-0.124939
-0.462975	integer overflow, and invalid	-0.124939
-0.358407	bounds violations and invalid	-0.124939
-0.599370	Pointers can be invalid	-0.124939
-0.502861	this would be invalid	-0.124939
-0.502861	reduction would be invalid	-0.124939
-0.358644	matters. Problems with invalid	-0.124939
-0.423331	then it becomes invalid	-0.124939
-0.326947	stamp counter becomes invalid	-0.124939
-0.237926	array bounds violations, invalid	-0.124939
-0.358839	called once. The reasons	-0.124939
-0.572014	frame functions for reasons	-0.124939
-0.355576	of precision for reasons	-0.124939
-0.537171	this manual for reasons	-0.124939
-0.355576	32-bit mode, for reasons	-0.124939
-0.065377	not permissible for reasons	-0.425969
-0.554964	of the main reasons	-0.124939
-0.349111	you have special reasons	-0.124939
-0.336303	computer for security reasons	-0.124939
-0.577555	common way of setting	-0.124939
-0.358871	the test and setting	-0.124939
-0.788633	is needed for setting	-0.124939
-0.788464	an array or setting	-0.124939
-0.353486	absolute value by setting	-0.124939
-0.558390	brand simply by setting	-0.124939
-0.353486	CPU core by setting	-0.124939
-0.353486	to zero, by setting	-0.124939
-0.353486	[1.0, 2.0) by setting	-0.124939
-0.787163	can benefit from setting	-0.124939
-0.463685	by compiling the module	-0.124939
-0.898682	functions in a module	-0.124939
-1.216716	in a different module	-0.124939
-1.674745	in the same module	-0.124939
-0.862161	within the same module	-0.124939
-0.064470	only from same module	-0.124939
-0.861465	from any other module	-0.124939
-0.565366	test a software module	-0.124939
-0.576311	or a separate module	-0.124939
-1.006065	address of the beginning	-0.425969
-0.259364	relative to the beginning	-0.726999
-1.602524	sure that the beginning	-0.124939
-0.595545	coincides with the beginning	-0.124939
-0.592119	right from the beginning	-0.124939
-0.586159	block into the beginning	-0.124939
-0.788864	an integer is within	-0.124939
-0.526312	is accessed from within	-0.124939
-0.504050	references to data within	-0.124939
-0.787720	is used only within	-0.124939
-0.570154	all data members within	-0.124939
-0.353157	byte of zero within	-0.124939
-0.345174	while multiple statements within	-0.124939
-0.294200	to be irrelevant within	-0.124939
-0.237894	or by keys within	-0.124939
-0.237894	to become obsolete within	-0.124939
-1.219831	Intel compiler is used,	-0.124939
-0.599044	type-casting. It is used,	-0.124939
-1.051626	memory allocation is used,	-0.124939
-0.503358	the expression is used,	-0.124939
-0.539804	dynamic linking is used,	-0.124939
-0.898966	sets can be used,	-0.124939
-0.593142	main will be used,	-0.124939
-0.491351	point registers are used,	-0.124939
-1.045473	XMM registers are used,	-0.124939
-0.331919	is hardly ever used,	-0.124939
-0.358860	OS independent and checks	-0.124939
-0.358818	so-called CPU-dispatcher that checks	-0.124939
-0.565987	Intel before it checks	-0.124939
-0.357562	derived class, it checks	-0.124939
-0.833269	There are no checks	-0.124939
-0.503101	absence of such checks	-0.124939
-0.063596	to make overflow checks	-0.425969
-1.204386	The CPU dispatcher checks	-0.124939
-0.325371	to make explicit checks	-0.124939
-0.358705	storing text or input	-0.124939
-0.463256	buffer overflow on input	-0.124939
-0.106844	Boolean variables as input	-0.124939
-0.540632	line or an input	-0.124939
-0.514121	instead of user input	-0.124939
-0.329274	time to user input	-0.124939
-0.139690	waiting for user input	-0.425969
-0.356444	used for file input	-0.124939
-0.358802	well, others are not.	-0.124939
-0.504915	other functions can not.	-0.124939
-0.495525	vectorized code or not.	-0.124939
-0.495525	a loop or not.	-0.124939
-0.352125	of 2 or not.	-0.124939
-0.530910	the speed or not.	-0.124939
-0.352125	be advantageous or not.	-0.124939
-0.246288	are aligned or not.	-0.124939
-0.246288	properly aligned or not.	-0.124939
-0.526071	allowed and which not.	-0.124939
-0.527122	don't think that programmers	-0.124939
-0.559703	use for many programmers	-0.124939
-0.351519	For example, many programmers	-0.124939
-0.357342	For example, some programmers	-0.124939
-0.560253	attention of software programmers	-0.124939
-0.500867	guide for assembly programmers	-0.124939
-0.353636	C++ constructs Most programmers	-0.124939
-0.352340	to program. Many programmers	-0.124939
-0.351241	is for advanced programmers	-0.124939
-0.237902	the program. Application programmers	-0.124939
-0.595630	footprint than the alternative	-0.124939
-0.358821	variable size. The alternative	-0.124939
-0.358686	the case if alternative	-0.124939
-0.540591	off and use alternative	-0.124939
-0.537106	profiler. A simple alternative	-0.124939
-0.343384	is inlined. An alternative	-0.124939
-0.343384	become fragmented. An alternative	-0.124939
-0.347468	a DLL. Another alternative	-0.124939
-0.237902	object. A little-known alternative	-0.124939
-0.237902	load. A light-weight alternative	-0.124939
-0.346425	bit scan instructions. My	-0.124939
-0.336252	example illustrates this. My	-0.124939
-0.458709	with an example. My	-0.124939
-0.325351	less well-known languages. My	-0.124939
-0.294200	and Linux. Asmlib My	-0.124939
-0.538839	performance monitor counters. My	-0.124939
-0.294200	512 512 matrix. My	-0.124939
-0.294200	one from me. My	-0.124939
-0.237894	object file level. My	-0.124939
-0.237894	have been identified. My	-0.124939
-0.595702	memory that is organized	-0.124939
-0.566809	a cache is organized	-0.124939
-0.896513	register can be organized	-0.124939
-0.849865	array should be organized	-0.124939
-0.574633	resources should be organized	-0.124939
-0.501636	can easily be organized	-0.124939
-0.358001	These lines are organized	-0.124939
-0.358001	Most caches are organized	-0.124939
-0.504693	in memory if organized	-0.124939
-0.951757	floating point registers organized	-0.124939
-0.367609	of the critical stride	-0.124939
-0.395124	that the critical stride	-0.124939
-0.395124	because the critical stride	-0.124939
-0.395124	where the critical stride	-0.124939
-0.077311	ways. The critical stride	-0.124939
-0.077311	lines. The critical stride	-0.124939
-0.077311	each. The critical stride	-0.124939
-0.820423	the SSE2 instruction set,	-0.124939
-0.447513	or SSE2 instruction set,	-0.124939
-0.425414	a specific instruction set,	-0.124939
-1.241304	the AVX instruction set,	-0.124939
-0.152862	Get supported instruction set,	-0.425969
-0.601228	a particular instruction set,	-0.124939
-0.513516	any higher instruction set,	-0.124939
-1.701579	address of the current	-0.124939
-1.371516	relative to the current	-0.124939
-0.599173	updates if the current	-0.124939
-0.596473	vectorized with the current	-0.124939
-0.358361	module (i.e. the current	-0.124939
-0.463565	backup features, and current	-0.124939
-0.878252	a code that current	-0.124939
-0.358628	certain tasks on current	-0.124939
-0.357497	example 12.4a where current	-0.124939
-0.347481	that doesn't handle current	-0.124939
-0.527339	doesn't need the 'this'	-0.124939
-0.876566	functions have a 'this'	-0.124939
-0.763720	doesn't need a 'this'	-0.124939
-0.990288	member functions. The 'this'	-0.124939
-0.358823	parameter transfer for 'this'	-0.124939
-0.654148	by type-casting its 'this'	-0.124939
-0.325371	Windows by transferring 'this'	-0.124939
-0.212292	__fastcall. The implicit 'this'	-0.124939
-0.284331	has an implicit 'this'	-0.124939
-0.314737	parameters, pointers, references, 'this'	-0.124939
-0.601245	structure of the problem.	-0.124939
-0.463631	caching becomes a problem.	-0.124939
-0.843283	discussion of this problem.	-0.124939
-0.451798	not have this problem.	-0.124939
-0.496348	to reduce this problem.	-0.124939
-0.451798	ways around this problem.	-0.124939
-0.641405	to solve this problem.	-0.124939
-0.500664	has one big problem.	-0.124939
-0.355112	we encounter another problem.	-0.124939
-0.336293	is another security problem.	-0.124939
-0.504916	4 processors, and 3	-0.124939
-0.593989	manuals. See page 3	-0.124939
-0.656883	point addition takes 3	-0.124939
-0.801252	it may take 3	-0.124939
-1.308584	of the program. 3	-0.124939
-0.750642	and operating systems. 3	-0.124939
-0.117627	at the interrupt 3	-0.124939
-0.117627	remove the interrupt 3	-0.124939
-0.336278	C++ language...................................................... 14 3	-0.124939
-0.237902	1 Introduction ....................................................................................................................... 3	-0.124939
-0.866017	in member functions counts	-0.124939
-0.358087	the CPU, which counts	-0.124939
-0.473456	Occasionally, the clock counts	-0.124939
-0.473456	event, the clock counts	-0.124939
-0.517468	afterwards. The clock counts	-0.124939
-0.539667	millisecond. The profiler counts	-0.124939
-0.436880	and the subsequent counts	-0.124939
-0.408161	cached. The subsequent counts	-0.124939
-0.331904	in meaningless event counts	-0.124939
-0.421376	the "best case" counts	-0.124939
-0.027970	a lot to gain	-0.124939
-0.657809	is nothing to gain	-0.124939
-0.358201	high priority. The gain	-0.124939
-0.462713	natural parallelism. The gain	-0.124939
-0.358589	quite substantial. This gain	-0.124939
-0.356461	How much you gain	-0.124939
-0.356461	The insight you gain	-0.124939
-0.355506	the relatively small gain	-0.124939
-0.358096	predictor. On other processors,	-0.124939
-0.357454	processors. On many processors,	-0.124939
-0.554658	on Pentium 4 processors,	-0.124939
-0.547340	size on AMD processors,	-0.124939
-1.589981	AMD and VIA processors,	-0.124939
-0.808992	well on non-Intel processors,	-0.124939
-0.485914	best on future processors,	-0.124939
-0.331922	RISC and CISC processors,	-0.124939
-0.325351	tree. On older processors,	-0.124939
-0.237894	on Intel Atom processors,	-0.124939
-0.589565	(b*c)/d, it can happen	-0.124939
-0.653764	The same can happen	-0.124939
-0.142459	same errors can happen	-0.124939
-0.142459	serious errors can happen	-0.124939
-0.526667	moved, which may happen	-0.124939
-0.572589	87. This will happen	-0.124939
-1.831383	of the program happen	-0.124939
-0.357308	that several variables happen	-0.124939
-0.548661	it can often happen	-0.124939
-0.355409	a big matrix happen	-0.124939
-0.598280	times may be enough	-0.124939
-0.593721	there are not enough	-0.124939
-0.462754	the integer has enough	-0.124939
-0.350247	but not long enough	-0.124939
-0.350247	is just long enough	-0.124939
-0.074099	that is big enough	-0.124939
-0.163495	size is big enough	-0.124939
-0.547246	This is small enough	-0.124939
-0.531484	mechanism is rarely enough	-0.124939
-0.358917	want them to apply	-0.124939
-0.795680	2 does not apply	-0.124939
-0.545043	argument does not apply	-0.124939
-0.356550	given here may apply	-0.124939
-0.356550	the advices may apply	-0.124939
-1.009683	Therefore, you should apply	-0.124939
-0.494050	Obviously, you should apply	-0.124939
-0.575170	does not always apply	-0.124939
-0.269229	The same rules apply	-0.124939
-0.269229	same coding rules apply	-0.124939
-1.579116	} } } Obviously,	-0.124939
-0.349705	of all variables. Obviously,	-0.124939
-0.511749	same shared object. Obviously,	-0.124939
-0.564139	two clock cycles. Obviously,	-0.124939
-0.700445	is not needed. Obviously,	-0.124939
-0.550526	bad CPU dispatching. Obviously,	-0.124939
-0.876454	and back again. Obviously,	-0.124939
-0.487966	A is finished. Obviously,	-0.124939
-0.607325	for each process. Obviously,	-0.124939
-0.294200	the .NET framework. Obviously,	-0.124939
-0.659221	a particular code version.	-0.124939
-0.587611	changes for each version.	-0.124939
-0.880203	the best possible version.	-0.124939
-0.502594	than the 32-bit version.	-0.124939
-0.458175	file of every version.	-0.124939
-0.354346	of every intermediate version.	-0.124939
-1.038317	in the old version.	-0.124939
-1.162317	to the desired version.	-0.124939
-0.345206	than the alternative version.	-0.124939
-0.408115	have an up-to-date version.	-0.124939
-0.597628	efficient when the row	-0.124939
-1.292426	length of a row	-0.124939
-0.584190	eight elements in row	-0.124939
-0.358707	column++) matrix[row][column] = row	-0.124939
-0.557347	the elements from row	-0.124939
-0.854915	address of each row	-0.124939
-0.581554	container for each row	-0.124939
-0.596127	(row = 0; row	-0.124939
-0.886606	of elements per row	-0.124939
-0.546510	needed for calculating row	-0.124939
-0.565201	the Intel C++ Compiler	-0.124939
-0.401052	2009). Intel C++ Compiler	-0.124939
-0.440990	tested: Microsoft C++ Compiler	-0.124939
-0.341034	Intel: "Intel® C++ Compiler	-0.124939
-0.500733	2008. Digital Mars Compiler	-0.124939
-0.331899	18.3. Predefined macros Compiler	-0.124939
-0.102870	long latencies. 8.5 Compiler	-0.124939
-0.102870	by CPU.............................................................................81 8.5 Compiler	-0.124939
-0.237926	-opt-report Table 18.2. Compiler	-0.124939
-0.237926	when not selected. Compiler	-0.124939
-0.587907	use is a matter	-0.124939
-0.587907	prefer is a matter	-0.124939
-0.009842	is simply a matter	-0.823909
-0.501963	is just a matter	-0.124939
-0.557169	case it doesn't matter	-0.124939
-0.343428	the size doesn't matter	-0.124939
-1.615962	sure that the declaration	-0.124939
-1.523194	by using the declaration	-0.124939
-0.588950	printf(Greek[n]); } The declaration	-0.124939
-0.590940	the const int declaration	-0.124939
-1.281720	structure or class declaration	-0.124939
-0.539645	module. The static declaration	-0.124939
-1.027098	to a variable declaration	-0.124939
-0.523864	making the full declaration	-0.124939
-0.343717	integer types available. declaration	-0.124939
-0.382827	have extern "C" declaration	-0.124939
-0.593358	delete is to allocate	-0.124939
-0.572506	pooling) than to allocate	-0.124939
-1.527046	more efficient to allocate	-0.124939
-1.452350	is necessary to allocate	-0.124939
-0.356883	and delete to allocate	-0.124939
-0.502159	is preferable to allocate	-0.124939
-0.356883	core. Try to allocate	-0.124939
-0.526942	cases. Does not allocate	-0.124939
-0.356559	prone to even allocate	-0.124939
-0.459181	The string classes allocate	-0.124939
-0.570567	loop or the series	-0.124939
-0.597926	chain is a series	-0.124939
-0.590024	one in a series	-0.124939
-0.590024	first in a series	-0.124939
-0.584529	propagated through a series	-0.124939
-0.461642	have made a series	-0.124939
-0.357359	mechanism executes a series	-0.124939
-0.358589	Copyright notice This series	-0.124939
-0.590453	volumes in this series	-0.124939
-0.325401	Example 12.9a. Taylor series	-0.124939
-1.200841	many of the features	-0.124939
-0.763280	function libraries have features	-0.124939
-1.681642	of the same features	-0.124939
-0.586704	sets and other features	-0.124939
-0.540860	using the optimization features	-0.124939
-0.348192	its many optimization features	-0.124939
-0.356603	to add new features	-0.124939
-0.494296	wealth of advanced features	-0.124939
-0.438986	the time- consuming features	-0.124939
-0.237902	Instruction set Important features	-0.124939
-1.264982	value that is added	-0.124939
-0.785700	an integer is added	-0.124939
-0.357738	then c is added	-0.124939
-0.357738	data members is added	-0.124939
-0.525536	then f is added	-0.124939
-1.748968	a lot of added	-0.124939
-1.176216	objects can be added	-0.124939
-0.594914	keyword can be added	-0.124939
-0.587964	called. I have added	-0.124939
-0.572190	elements have been added	-0.124939
-0.880261	annoying to the user.	-0.124939
-0.590496	messages to the user.	-0.124939
-0.201245	time for the user.	-0.124939
-0.596834	activated by the user.	-0.124939
-0.060268	to the end user.	-0.124939
-0.447693	for the end user.	-0.124939
-0.599169	reduces the code to:	-0.124939
-0.105436	may reduce this to:	-0.425969
-0.349603	may change this to:	-0.124939
-0.349603	by changing this to:	-0.124939
-0.349603	1.2345; Change this to:	-0.124939
-0.873071	can be optimized to:	-0.124939
-0.495205	Can be reduced to:	-0.124939
-0.352268	can be changed to:	-0.425969
-0.876457	stack is a waste	-0.124939
-0.588539	situation is a waste	-0.124939
-0.584147	user and a waste	-0.124939
-0.193297	only be a waste	-0.425969
-0.879798	may cause a waste	-0.124939
-0.358877	of frustration and waste	-0.124939
-0.538470	problems and they waste	-0.124939
-0.552807	is a big waste	-0.124939
-0.450246	is a total waste	-0.124939
-0.999528	example 15.1b to metaprogramming	-0.124939
-0.550008	string functions. A metaprogramming	-0.124939
-0.356822	examples explain how metaprogramming	-0.124939
-0.331095	Why is template metaprogramming	-0.124939
-0.331095	power using template metaprogramming	-0.124939
-0.331095	however, where template metaprogramming	-0.124939
-0.331095	and convoluted template metaprogramming	-0.124939
-0.352930	waiting for better metaprogramming	-0.124939
-0.347465	language has full metaprogramming	-0.124939
-0.408181	can be considered metaprogramming	-0.124939
-0.358919	by requesting a map	-0.124939
-0.557680	list, set and map	-0.124939
-1.061611	the program. The map	-0.124939
-0.462702	the linker. The map	-0.124939
-0.351695	at a link map	-0.124939
-0.461246	or a hash map	-0.124939
-0.249109	elements. A hash map	-0.124939
-0.249109	interval. A hash map	-0.124939
-0.314747	/FA -S Generate map	-0.124939
-0.237918	Use the "generate map	-0.124939
-0.786138	allow you to define	-0.124939
-1.536443	more efficient to define	-0.124939
-1.573989	be able to define	-0.124939
-0.657815	the ability to define	-0.124939
-0.578978	constants we can define	-0.124939
-0.355676	line size // define	-0.124939
-0.523730	transpose matrix // define	-0.124939
-0.355676	#include <stdio.h> // define	-0.124939
-0.355676	define fprintf // define	-0.124939
-1.160693	Alternatively, you may define	-0.124939
-0.581827	x when it returns.	-0.124939
-0.493399	as the function returns.	-0.124939
-0.296797	when the function returns.	-0.124939
-0.080782	before the function returns.	-0.124939
-0.463595	system database in Windows.	-0.124939
-0.659687	device drivers for Windows.	-0.124939
-0.772743	32-bit and 64-bit Windows.	-0.124939
-0.352334	than in 64-bit Windows.	-0.124939
-0.309704	compiler for 32-bit Windows.	-0.124939
-0.062864	Supports only 32-bit Windows.	-0.124939
-0.473146	object oriented programming style	-0.124939
-0.336732	relatively primitive programming style	-0.124939
-0.290742	that the C style	-0.124939
-0.056507	old fashioned C style	-0.124939
-0.348380	in software writing style	-0.124939
-0.237934	mixed with x87 style	-0.124939
-0.237934	same as C- style	-0.124939
-0.083383	8) { // Load	-0.726999
-0.286086	+ i); // Load	-0.602060
-0.351621	elements b.load(bb+i); // Load	-0.124939
-0.675793	the function call. Load	-0.124939
-0.884356	temp; temp = 3;	-0.124939
-0.358538	is __asm int 3;	-0.124939
-0.357747	* 9 + 3;	-0.124939
-0.358597	= a * 3;	-0.425969
-0.355208	+= i / 3;	-0.124939
-0.347486	= i % 3;	-0.124939
-0.597971	truncation. This is approximately	-0.124939
-0.725383	register variables is approximately	-0.124939
-0.358329	branch misprediction is approximately	-0.124939
-0.358914	a precision of approximately	-0.124939
-0.358827	of x for approximately	-0.124939
-0.595022	limited. There are approximately	-0.124939
-0.358699	an array, or approximately	-0.124939
-0.878311	loop will take approximately	-0.124939
-1.115687	can be accessed approximately	-0.124939
-0.527138	or out of order.	-0.124939
-0.764289	instructions out of order.	-0.124939
-0.569573	in the optimal order.	-0.124939
-0.198712	in a non-sequential order.	-0.301030
-0.054846	deallocated in random order.	-0.124939
-0.325401	in non- sequential order.	-0.124939
-0.501383	printf("Gamma"); break; case 3:	-0.124939
-0.357753	paragraph and manual 3:	-0.124939
-0.611218	detail in manual 3:	-0.124939
-0.432052	covered in manual 3:	-0.124939
-0.026149	CPU (See manual 3:	-0.425969
-0.053974	CPUs (See manual 3:	-0.124939
-0.053974	mispredicted (See manual 3:	-0.124939
-0.294265	of branches. Manual 3:	-0.124939
-0.504912	platforms. 3. The microarchitecture	-0.124939
-0.000409	manual 3: "The microarchitecture	-0.970037
-0.002872	Manual 3: "The microarchitecture	-0.124939
-0.590286	are: It is easy	-0.124939
-0.590286	develop. It is easy	-0.124939
-0.526414	this error is easy	-0.124939
-0.462982	for fast and easy	-0.124939
-0.358412	platform independence, and easy	-0.124939
-0.358711	inline assembly or easy	-0.124939
-1.191362	There is no easy	-0.425969
-0.294247	for debugging facilities, easy	-0.124939
-0.885011	be aware of situations	-0.124939
-0.357817	is useful in situations	-0.124939
-0.637552	be useful in situations	-0.124939
-0.357817	also useful in situations	-0.124939
-0.357474	than RISC in situations	-0.124939
-1.286122	There may be situations	-0.124939
-0.595025	it. There are situations	-0.124939
-1.205185	There are also situations	-0.124939
-0.356661	useful in test situations	-0.124939
-1.527074	more efficient to implement	-0.124939
-1.248757	is possible to implement	-0.425969
-2.086609	in order to implement	-0.124939
-1.320812	shows how to implement	-0.124939
-0.979196	is difficult to implement	-0.124939
-0.513509	quite difficult to implement	-0.124939
-0.355430	The child classes implement	-0.124939
-1.054686	I have tested implement	-0.124939
-0.358517	loops (less than 65	-0.124939
-0.287606	32 16.4 65 65	-0.124939
-0.287606	14.0 80.8 65 65	-0.124939
-0.325371	Preprocessing directives ......................................................................................... 65	-0.124939
-0.294219	to using namespaces. 65	-0.124939
-0.237910	65 7.33 Namespaces........................................................................................................... 65	-0.124939
-0.237910	64 32 16.4 65	-0.124939
-0.237910	64 14.0 80.8 65	-0.124939
-0.237910	stack unwinding .............................................................................. 65	-0.124939
-1.432060	cases where the chosen	-0.124939
-0.575248	Now call the chosen	-0.124939
-0.550451	two gives the chosen	-0.124939
-1.772545	the code is chosen	-0.124939
-0.562047	when C++ is chosen	-0.124939
-0.826442	C++ language is chosen	-0.124939
-0.900194	branch can be chosen	-0.124939
-0.859483	that it has chosen	-0.124939
-1.084330	the compiler has chosen	-0.124939
-0.358913	can emulate a 256-bit	-0.124939
-0.358908	problem. Vectors of 256-bit	-0.124939
-0.504990	are extended to 256-bit	-0.124939
-0.358855	128-bit XMM and 256-bit	-0.124939
-0.358821	(see below). The 256-bit	-0.124939
-0.462487	will use one 256-bit	-0.124939
-0.651859	processors that supported 256-bit	-0.124939
-0.353617	set also allows 256-bit	-0.124939
-0.325384	instructions were splitting 256-bit	-0.124939
-1.430597	Therefore, it is slightly	-0.124939
-0.358335	127 bytes is slightly	-0.124939
-0.725397	The latter is slightly	-0.124939
-0.519116	int) are only slightly	-0.124939
-0.309706	C++ takes only slightly	-0.124939
-0.309706	precision takes only slightly	-0.124939
-0.499224	calls may run slightly	-0.124939
-0.294247	compilers make Sum1 slightly	-0.124939
-0.294247	work, 133 although slightly	-0.124939
-0.504947	is fragmented and scattered	-0.124939
-1.657124	likely to be scattered	-0.124939
-0.597421	They may be scattered	-0.124939
-0.593071	objects that are scattered	-0.124939
-1.005569	the data are scattered	-0.124939
-0.496930	if data are scattered	-0.124939
-0.460439	dispatch branches are scattered	-0.124939
-0.526799	are many functions scattered	-0.124939
-0.355586	help files etc. scattered	-0.124939
-1.179325	not possible to contain	-0.124939
-0.358815	common subexpressions that contain	-0.124939
-0.570395	integers. It can contain	-0.124939
-0.358500	data section may contain	-0.124939
-0.357993	test data should contain	-0.124939
-0.460717	of objects they contain	-0.124939
-0.355383	third generations classes contain	-0.124939
-0.314727	These two books contain	-0.124939
-0.237902	forums and newsgroups contain	-0.124939
-0.358866	unaligned reads and writes	-0.124939
-0.142987	that reads or writes	-0.124939
-0.142987	afterwards reads or writes	-0.124939
-1.674507	so that it writes	-0.124939
-0.550563	} This function writes	-0.124939
-0.462532	stride causes all writes	-0.124939
-0.299760	Don't mix nontemporal writes	-0.124939
-0.299760	can insert nontemporal writes	-0.124939
-0.336283	writes with normal writes	-0.124939
-0.598572	interpretation on the device	-0.124939
-0.463623	then calls a device	-0.124939
-0.358860	service routines and device	-0.124939
-0.463585	capabilities (except in device	-0.124939
-0.549727	printer or other device	-0.124939
-0.589266	used in 64-bit device	-0.124939
-0.120418	a programmable logic device	-0.124939
-0.120418	A programmable logic device	-0.124939
-0.331879	or C++. Critical device	-0.124939
-1.962666	if it is independent	-0.124939
-0.358630	example 11.3 is independent	-0.124939
-0.358793	the additions are independent	-0.124939
-0.347509	function is OS independent	-0.124939
-0.508502	that is almost independent	-0.124939
-0.336293	used on completely independent	-0.124939
-0.065806	needs of position- independent	-0.124939
-0.065806	This makes position- independent	-0.124939
-0.065806	the so-called position- independent	-0.124939
-0.781583	of dynamic memory allocation.	-0.124939
-0.268906	with dynamic memory allocation.	-0.124939
-0.516467	use dynamic memory allocation.	-0.124939
-0.268906	using dynamic memory allocation.	-0.124939
-0.268906	without dynamic memory allocation.	-0.124939
-0.215437	avoid dynamic memory allocation.	-0.124939
-0.356438	reserved for dynamic allocation.	-0.124939
-0.352562	faster than a non-static	-0.425969
-0.659431	data members or non-static	-0.124939
-0.570114	incurred on all non-static	-0.124939
-0.354421	pointer. Likewise, all non-static	-0.124939
-0.445915	don't need any non-static	-0.124939
-0.064000	cannot access any non-static	-0.425969
-0.354494	are constructed. All non-static	-0.124939
-0.894470	count and the subsequent	-0.124939
-1.197500	described in the subsequent	-0.124939
-0.570349	higher than the subsequent	-0.124939
-0.570349	slower than the subsequent	-0.124939
-0.462927	doesn't delay the subsequent	-0.124939
-0.837999	code cache. The subsequent	-0.124939
-0.357574	first manual. The subsequent	-0.124939
-0.357574	not cached. The subsequent	-0.124939
-0.462563	list causes all subsequent	-0.124939
-0.502220	member functions. This applies	-0.124939
-0.356927	random manner. This applies	-0.124939
-0.470312	order. The same applies	-0.124939
-0.470312	floats. The same applies	-0.124939
-0.447859	137). This also applies	-0.124939
-0.346482	about Linux also applies	-0.124939
-0.346482	increment operators also applies	-0.124939
-1.237885	powers of 2 applies	-0.124939
-0.331920	The same advice applies	-0.124939
-1.442443	code can be applied	-0.124939
-1.391812	which can be applied	-0.124939
-0.142965	can only be applied	-0.425969
-0.348592	32 results when applied	-0.124939
-0.015244	keyword static, when applied	-0.726999
-0.241323	copy constructors and destructors	-0.124939
-0.065622	7.23 Constructors and destructors	-0.124939
-0.358653	wrapper classes with destructors	-0.124939
-0.151970	sure that all destructors	-0.124939
-0.387762	guarantee that all destructors	-0.124939
-0.355767	calling any necessary destructors	-0.124939
-0.408072	be applied to integers.	-0.124939
-1.202808	as efficient as integers.	-0.124939
-0.747461	advantage of 64-bit integers.	-0.124939
-0.517311	support for 64-bit integers.	-0.124939
-1.158832	signed and unsigned integers.	-0.124939
-0.456660	efficient than signed integers.	-0.124939
-0.637287	of eight 16-bit integers.	-0.124939
-0.347478	bit vector containing integers.	-0.124939
-1.259138	the function in terms	-0.124939
-0.355600	expensive - in terms	-0.124939
-0.106569	no cost in terms	-0.425969
-0.142378	the costs in terms	-0.124939
-0.142378	also costs in terms	-0.124939
-0.355600	are costless in terms	-0.124939
-0.355600	lag. Thinking in terms	-0.124939
-0.350376	calculates four consecutive terms	-0.124939
-0.889713	this is to help	-0.124939
-2.117814	in order to help	-0.124939
-1.118312	that we can help	-0.124939
-0.502818	be of some help	-0.124939
-0.356578	reorder instructions without help	-0.124939
-0.577266	whether to store help	-0.124939
-0.336295	files, resource files, help	-0.124939
-0.336295	files, configuration files, help	-0.124939
-0.331889	automatic updates, remote help	-0.124939
-0.566746	usually faster to transfer	-0.124939
-0.358583	"move constructor" to transfer	-0.124939
-0.527163	or class. The transfer	-0.124939
-0.022930	overhead of parameter transfer	-0.301030
-0.114318	return and parameter transfer	-0.124939
-0.114318	allocation and parameter transfer	-0.124939
-0.237943	64-bit mode Parameter transfer	-0.124939
-0.455597	allocate more memory blocks	-0.124939
-0.455597	with multiple memory blocks	-0.124939
-0.455597	of big memory blocks	-0.124939
-0.569297	data into multiple blocks	-0.124939
-0.343395	done in big blocks	-0.124939
-0.343395	or writing big blocks	-0.124939
-0.450230	ways of copying blocks	-0.124939
-0.237926	of digital building blocks	-0.124939
-0.237926	section 17.9: "Moving blocks	-0.124939
-0.355328	is simply optimized away	-0.124939
-0.337041	good at optimizing away	-0.124939
-0.337041	// Prevent optimizing away	-0.124939
-0.400966	compiler to optimize away	-0.124939
-0.550014	compiler can optimize away	-0.124939
-0.300527	can easily optimize away	-0.124939
-0.456670	preferably be put away	-0.124939
-0.515427	likely to go away	-0.124939
-0.468258	transformation of example 15.1b	-0.124939
-0.332355	analogous to example 15.1b	-0.124939
-0.530181	used in example 15.1b	-0.124939
-0.530181	branch in example 15.1b	-0.124939
-0.367200	conversion from example 15.1b	-0.124939
-0.367200	come from example 15.1b	-0.124939
-0.667146	will convert example 15.1b	-0.124939
-0.451210	to an inlined 15.1b	-0.124939
-0.450270	Gnu compiler reduced 15.1b	-0.124939
-0.598536	development and the low	-0.124939
-0.659865	work load is low	-0.124939
-0.599243	run in a low	-0.124939
-0.591144	branch for a low	-0.124939
-0.725305	variable produces a low	-0.124939
-0.538666	lightweight processors with low	-0.124939
-0.548610	separate threads with low	-0.124939
-0.864184	with a very low	-0.124939
-0.382862	size have got low	-0.124939
-0.358917	the factor to multiply	-0.124939
-1.496015	that it can multiply	-0.124939
-0.579239	} We can multiply	-0.124939
-0.358740	loop for // multiply	-0.124939
-0.111232	by constant = multiply	-0.301030
-0.578185	example, you should multiply	-0.124939
-0.357090	vector instructions cannot multiply	-0.124939
-0.861377	same time to share	-0.124939
-0.460970	that threads can share	-0.124939
-0.356830	and c can share	-0.124939
-0.356830	running simultaneously can share	-0.124939
-0.461771	make different objects share	-0.124939
-0.985837	if the threads share	-0.124939
-0.570181	where data members share	-0.124939
-0.351662	logical processors usually share	-0.124939
-0.429510	in row 28 share	-0.124939
-0.596229	instruction set is enabled.	-0.191886
-0.538367	interprocedural optimization is enabled.	-0.124939
-0.356560	(or higher) is enabled.	-0.124939
-0.253897	32 for an explanation	-0.124939
-0.253897	130 for an explanation	-0.124939
-0.253897	43 for an explanation	-0.124939
-0.253897	81 for an explanation	-0.124939
-0.108653	CPUs" for an explanation	-0.425969
-0.856502	I have no explanation	-0.124939
-0.586703	skip the following explanation	-0.124939
-0.325411	A more detailed explanation	-0.124939
-1.230704	repeat count is near	-0.124939
-1.686606	that are used near	-0.124939
-0.568160	together are stored near	-0.124939
-0.108023	are also stored near	-0.602060
-0.547843	which are called near	-0.124939
-0.352057	the code together near	-0.124939
-0.520563	integers are equally near	-0.124939
-0.562043	assembly language is provided	-0.124939
-0.658678	instruction sets is provided	-0.124939
-0.358335	they contain is provided	-0.124939
-0.357206	Some guidelines are provided	-0.124939
-0.461448	above. Examples are provided	-0.124939
-0.357206	and parsing are provided	-0.124939
-0.587964	one. I have provided	-0.124939
-0.351294	non-virtual member function, provided	-0.124939
-0.294247	not use branches, provided	-0.124939
-0.600480	branch of the latter	-0.124939
-0.202879	because in the latter	-0.124939
-0.889983	code. If the latter	-0.124939
-0.576609	obtained. In the latter	-0.124939
-0.724778	by inlining the latter	-0.124939
-0.995846	even though the latter	-0.124939
-0.816446	64-bit systems. The latter	-0.124939
-0.526237	bit mode. The latter	-0.124939
-0.593086	Here, there are 6	-0.124939
-0.902634	bytes. first // 6	-0.124939
-0.504525	takes 3 - 6	-0.124939
-0.354330	poorly designed program. 6	-0.124939
-0.336262	or double plus 6	-0.124939
-0.861376	in the future. 6	-0.124939
-0.314727	of microprocessor ........................................................................................... 6	-0.124939
-0.294210	algorithm ....................................................................................... 24 6	-0.124939
-0.237902	of operating system......................................................................................... 6	-0.124939
-0.557023	ten times and stores	-0.124939
-0.358407	a matrix and stores	-0.124939
-0.893471	and the function stores	-0.124939
-0.803269	// This function stores	-0.124939
-0.462748	objects. STL vector stores	-0.124939
-0.337035	time. It simply stores	-0.124939
-0.337035	member pointer simply stores	-0.124939
-0.352948	the Gnu mechanism stores	-0.124939
-0.237926	DWORD PTR [ecx+eax*4],ebx stores	-0.124939
-0.463371	you want it to.	-0.124939
-0.419961	what it points to.	-0.124939
-0.351313	function pointer points to.	-0.124939
-0.088783	that r points to.	-0.124939
-0.345249	them to apply to.	-0.124939
-0.489146	the object pointed to.	-0.124939
-0.331899	that it jumps to.	-0.124939
-0.294237	member pointer refers to.	-0.124939
-0.601116	integers of the default	-0.124939
-1.815187	to use the default	-0.124939
-0.891320	class with a default	-0.124939
-0.828147	This applies to default	-0.124939
-0.358740	x,y coordinates // default	-0.124939
-0.562539	often used by default	-0.124939
-0.460004	in registers by default	-0.124939
-0.356070	and off by default	-0.124939
-0.358334	a constructor. A default	-0.124939
-0.654807	of vector, bits Instruction	-0.124939
-0.355738	each table element Instruction	-0.124939
-0.532845	Intrinsic function name Instruction	-0.124939
-0.346450	Linux, Mac, BSD Instruction	-0.124939
-0.327817	is as follows: Instruction	-0.124939
-0.327817	are as follows: Instruction	-0.124939
-0.336287	compiler makers. 4. Instruction	-0.124939
-0.237910	supported fprintf(stderr, "\nError: Instruction	-0.124939
-0.237910	multiply-and-add Table 13.1. Instruction	-0.124939
-0.527287	the purpose of finding	-0.124939
-1.537031	is used for finding	-0.124939
-0.942253	be useful for finding	-0.124939
-0.935576	are useful for finding	-0.124939
-0.481122	most useful for finding	-0.124939
-0.556622	not intended for finding	-0.124939
-0.501335	binary search for finding	-0.124939
-0.717523	is required for finding	-0.124939
-0.358541	anything else than finding	-0.124939
-0.659854	point numbers is inefficient.	-0.124939
-0.358784	far procedures are inefficient.	-0.124939
-1.018393	it is very inefficient.	-0.124939
-0.460992	complex and often inefficient.	-0.124939
-0.555642	This is quite inefficient.	-0.124939
-0.680808	makes data caching inefficient.	-0.124939
-0.455293	and caching becomes inefficient.	-0.124939
-1.007386	is of course inefficient.	-0.124939
-0.773753	int a, b, c,	-0.124939
-0.200598	vector a, b, c,	-0.124939
-0.042143	float a, b, c,	-0.425969
-0.088818	bool a, b, c,	-0.425969
-0.555559	b = 0, c,	-0.124939
-0.358860	access, sort and search	-0.124939
-0.358826	user's needs. The search	-0.124939
-1.061425	and there are search	-0.124939
-0.065245	been added? If search	-0.425969
-0.353995	intervals. Some programs search	-0.124939
-0.352933	instructions SSE4.2 string search	-0.124939
-0.549997	table can improve search	-0.124939
-0.351666	then use binary search	-0.124939
-0.462624	optimization by CPU Modern	-0.124939
-0.823790	variables and operators Modern	-0.124939
-0.650394	How compilers optimize Modern	-0.124939
-0.349072	are critical resources. Modern	-0.124939
-0.629921	3.15 Dependency chains Modern	-0.124939
-0.513828	is branch prediction. Modern	-0.124939
-0.478688	calculations in parallel. Modern	-0.124939
-0.237902	advanced prediction mechanisms. Modern	-0.124939
-0.237902	temp1 and temp2. Modern	-0.124939
-1.180730	of the memory block.	-0.124939
-0.527383	own allocated memory block.	-0.124939
-0.888805	new bigger memory block.	-0.124939
-0.349785	one contiguous memory block.	-0.124939
-0.822293	to the new block.	-0.124939
-0.355512	for each allocated block.	-0.124939
-0.850807	to the next block.	-0.124939
-0.341873	is a try block.	-0.124939
-0.294237	a thread environment block.	-0.124939
-0.817355	when speed is critical.	-0.124939
-0.152789	code caching is critical.	-0.124939
-1.715478	that can be critical.	-0.124939
-0.594914	use can be critical.	-0.124939
-1.260925	that are not critical.	-0.124939
-0.504104	resources are most critical.	-0.124939
-0.459485	speed is particularly critical.	-0.124939
-0.672083	that are particularly critical.	-0.124939
-0.242056	effect of dependency chains	-0.124939
-0.242056	if such dependency chains	-0.124939
-0.446176	or long dependency chains	-0.124939
-0.242056	chain. Such dependency chains	-0.124939
-0.242056	break down dependency chains	-0.124939
-0.433405	especially loop-carried dependency chains	-0.124939
-0.242056	order. Long dependency chains	-0.124939
-0.048401	cores. 3.15 Dependency chains	-0.124939
-0.048401	22 3.15 Dependency chains	-0.124939
-0.505074	have finished the time-consuming	-0.124939
-0.358381	makes dynamic_cast more time-consuming	-0.124939
-0.559034	if the most time-consuming	-0.124939
-0.559034	When the most time-consuming	-0.124939
-0.343557	activating the very time-consuming	-0.124939
-0.343557	thread as very time-consuming	-0.124939
-0.343557	critical. A very time-consuming	-0.124939
-0.979866	can be quite time-consuming	-0.124939
-0.832250	useful to put time-consuming	-0.124939
-0.551807	Test with different brands	-0.124939
-0.645929	on seven different brands	-0.124939
-0.351903	that treats different brands	-0.124939
-0.499194	discriminating between CPU brands	-0.124939
-0.808313	for specific CPU brands	-0.124939
-0.566171	not for other brands	-0.124939
-0.576118	well on all brands	-0.124939
-0.347498	known processors. Other brands	-0.124939
-0.314756	performance of competing brands	-0.124939
-1.040161	instruction set is available.	-0.602060
-0.358690	level linking" if available.	-0.124939
-0.576037	purposes are also available.	-0.124939
-0.586342	math function libraries available.	-0.124939
-0.355501	strongest optimization option available.	-0.124939
-0.712342	different integer types available.	-0.124939
-0.294228	genuine compiler became available.	-0.124939
-0.810536	in most cases. Don't	-0.124939
-0.539722	longjmp if possible. Don't	-0.124939
-0.339463	and stopping threads. Don't	-0.124939
-0.314737	with template metaprogramming. Don't	-0.124939
-0.102864	* reciprocal_divisor; 14.7 Don't	-0.124939
-0.102864	........................................................................................... 139 14.7 Don't	-0.124939
-0.237910	warn against overkill. Don't	-0.124939
-0.237910	would be evicted. Don't	-0.124939
-0.237910	was the opposite: Don't	-0.124939
-0.505019	that what is brand	-0.124939
-0.557188	processors because this brand	-0.124939
-0.588518	If the CPU brand	-0.124939
-0.454081	check for CPU brand	-0.124939
-0.351401	look at CPU brand	-0.124939
-0.351122	optimally on any brand	-0.124939
-0.453728	can have any brand	-0.124939
-0.581624	of a particular brand	-0.124939
-0.347509	CPU of unknown brand	-0.124939
-1.367786	when it is executed.	-0.124939
-1.758165	the code is executed.	-0.124939
-1.714418	the program is executed.	-0.124939
-0.548459	and branch is executed.	-0.124939
-0.065616	time Func is executed.	-0.124939
-0.591999	before they are executed.	-0.124939
-0.897144	time it was executed.	-0.124939
-0.335565	the statement was executed.	-0.124939
-0.593442	x<<3, which is faster.	-0.124939
-0.357228	make their software faster.	-0.124939
-0.460095	approximately three times faster.	-0.124939
-0.153205	which is much faster.	-0.124939
-0.330059	is accessed much faster.	-0.124939
-0.755208	the address calculation faster.	-0.124939
-0.353198	make the division faster.	-0.124939
-0.449117	of code execute faster.	-0.124939
-0.885335	elements at the diagonal	-0.124939
-0.314388	used above the diagonal	-0.124939
-0.314388	matrix[c][r] above the diagonal	-0.124939
-0.172401	28 below the diagonal	-0.124939
-0.077701	matrix[r][c] below the diagonal	-0.124939
-0.358069	// At the diagonal	-0.124939
-0.057814	loop columns below diagonal	-0.124939
-0.351316	nearest integer int n;	-0.124939
-0.064829	} u; int n;	-0.124939
-0.351316	Example 14.3a int n;	-0.124939
-0.351316	Example 14.3b int n;	-0.124939
-2.107474	0; i < n;	-0.124939
-0.269228	2.0; x <= n;	-0.124939
-0.352385	2; i <= n;	-0.124939
-0.294256	fistp dword ptr n;	-0.124939
-1.350220	{ a[i] = *p	-0.124939
-0.129130	{ *p = *p	-0.425969
-0.358673	an object by *p	-0.124939
-0.626222	* p) { *p	-0.425969
-0.463725	necessary to reload *p	-0.124939
-0.048397	7.31b char string[100], *p	-0.124939
-0.048397	7.31a char string[100], *p	-0.124939
-1.046946	situation where the logic	-0.124939
-0.358819	better explains the logic	-0.124939
-0.504894	times faster. The logic	-0.124939
-0.852060	and the program logic	-0.124939
-1.107430	If the program logic	-0.124939
-0.569154	deallocated. The program logic	-0.124939
-0.102873	in a programmable logic	-0.124939
-0.102873	devices A programmable logic	-0.124939
-0.237934	processors. 5 Programmable logic	-0.124939
-1.192376	code for the Microsoft,	-0.124939
-0.869406	good as the Microsoft,	-0.124939
-0.358826	have tried. The Microsoft,	-0.124939
-1.031234	are supported by Microsoft,	-0.124939
-0.358636	32-bit Linux with Microsoft,	-0.124939
-0.358275	latest compilers from Microsoft,	-0.124939
-0.352400	Intel, Microsoft Intel, Microsoft,	-0.124939
-0.856745	all x86 platforms. Microsoft,	-0.124939
-0.336272	intrinsic functions (i.e. Microsoft,	-0.124939
-0.599642	swapped to the hard	-0.124939
-0.598396	around on the hard	-0.124939
-0.598222	change of a hard	-0.124939
-0.552665	file on a hard	-0.124939
-0.552665	fast on a hard	-0.124939
-0.580541	response from a hard	-0.124939
-0.579130	better support for hard	-0.124939
-0.461805	consumer to many hard	-0.124939
-0.341886	slow and fragmented hard	-0.124939
-0.894262	increasing number of purposes	-0.124939
-0.586005	algorithms for different purposes	-0.124939
-0.446664	unit for other purposes	-0.124939
-0.446664	card for other purposes	-0.124939
-0.357888	but for most purposes	-0.124939
-0.459458	for many common purposes	-0.124939
-0.451145	libraries for special purposes	-0.124939
-0.441989	available for general purposes	-0.124939
-0.237910	audience for educational purposes	-0.124939
-1.195181	advantageous if the typical	-0.124939
-1.045724	code in a typical	-0.124939
-0.880501	called in a typical	-0.124939
-0.590610	best on a typical	-0.124939
-0.357982	should contain a typical	-0.124939
-0.358834	is fastest. The typical	-0.124939
-0.358334	give infinity. A typical	-0.124939
-0.357361	points out some typical	-0.124939
-0.237926	compile time. Four typical	-0.124939
-0.597235	leads to a usability	-0.124939
-1.285307	in terms of usability	-0.124939
-0.065738	4 Performance and usability	-0.124939
-0.550602	interface calls. The usability	-0.124939
-0.527279	as possible for usability	-0.124939
-0.358784	these problems are usability	-0.124939
-0.356084	well as important usability	-0.124939
-0.237910	bugs, compatibility problems, usability	-0.124939
-1.407078	a function is pure	-0.124939
-0.577339	Assume function is pure	-0.124939
-0.599235	this is a pure	-0.124939
-0.596853	calls to a pure	-0.124939
-0.358789	and log are pure	-0.124939
-0.462874	Pure functions A pure	-0.124939
-0.347478	loop-invariant code containing pure	-0.124939
-0.343705	subexpressions that contain pure	-0.124939
-0.483902	when it involves pure	-0.124939
-1.578283	you have to vectorize	-0.124939
-1.937113	is possible to vectorize	-0.124939
-0.591058	We want to vectorize	-0.124939
-0.761934	is unable to vectorize	-0.124939
-0.358877	12.8b automatically and vectorize	-0.124939
-0.585010	compilers may not vectorize	-0.124939
-0.850030	the compiler will vectorize	-0.124939
-1.028608	The compiler will vectorize	-0.124939
-0.353439	current compilers don't vectorize	-0.124939
-0.462439	are no cache problems.	-0.124939
-0.350450	problems or performance problems.	-0.124939
-0.350450	for investigating performance problems.	-0.124939
-0.460673	to avoid these problems.	-0.124939
-0.348340	because of alignment problems.	-0.124939
-0.348381	and resolve compatibility problems.	-0.124939
-0.314737	because of technical problems.	-0.124939
-0.237910	end user. Installation problems.	-0.124939
-0.237910	the user. Compatibility problems.	-0.124939
-0.504861	a variable that could	-0.124939
-0.659430	even though it could	-0.124939
-0.598308	or the function could	-0.124939
-1.961040	of the code could	-0.124939
-0.358491	example 8.21, you could	-0.124939
-0.456617	the following methods could	-0.124939
-0.336278	that the portability could	-0.124939
-0.325361	The constant N1 could	-0.124939
-0.237902	is that r+i/2 could	-0.124939
-0.504703	variable as function parameter.	-0.124939
-0.358055	counts a one parameter.	-0.124939
-0.607900	of the template parameter.	-0.124939
-0.429853	and the template parameter.	-0.124939
-0.282612	of a template parameter.	-0.124939
-0.401581	as a template parameter.	-0.124939
-0.118712	through a template parameter.	-0.124939
-0.433773	name as template parameter.	-0.124939
-0.502591	object of the derived	-0.124939
-0.597933	file and the derived	-0.124939
-0.587644	objects inside the derived	-0.124939
-0.590235	object of a derived	-0.425969
-1.345610	pointer to a derived	-0.124939
-0.585163	class and a derived	-0.124939
-0.726685	parent class and derived	-0.124939
-0.600592	allocation can be mentioned	-0.124939
-0.577484	common compilers are mentioned	-0.124939
-0.358590	garbage collection, as mentioned	-0.124939
-0.853918	the vector operations mentioned	-0.124939
-0.775767	all the problems mentioned	-0.124939
-0.353402	the storage methods mentioned	-0.124939
-0.439004	have the disadvantages mentioned	-0.124939
-0.294210	of the time-consumers mentioned	-0.124939
-0.382827	than the ones mentioned	-0.124939
-0.758866	to test // Time	-0.124939
-0.356686	NumberOfTests times // Time	-0.124939
-0.356686	prevent optimizing // Time	-0.124939
-0.462250	follows: Matrix size Time	-0.124939
-0.752905	for the user. Time	-0.124939
-0.294228	size Total kilobytes Time	-0.124939
-0.382850	element Example 9.6a Time	-0.124939
-0.237918	97 Table 9.1. Time	-0.124939
-0.237918	168.3 Table 9.3. Time	-0.124939
-0.439020	to the table. Optimization	-0.124939
-0.093306	................................................................................................ 157 17 Optimization	-0.124939
-0.093306	normal. 157 17 Optimization	-0.124939
-0.102870	the book "Performance Optimization	-0.124939
-0.102870	Adolfy Hoisie: "Performance Optimization	-0.124939
-0.102870	exception handling. 8.6 Optimization	-0.124939
-0.102870	................................................................................... 81 8.6 Optimization	-0.124939
-0.237926	regularly. AMD: "Software Optimization	-0.124939
-0.237926	and IA-32 Architectures Optimization	-0.124939
-1.273202	of floating point expressions.	-0.124939
-1.201057	to floating point expressions.	-0.124939
-0.362823	on floating point expressions.	-0.124939
-0.357988	more complex integer expressions.	-0.124939
-0.356252	between two simple expressions.	-0.124939
-0.331933	rather than Boolean expressions.	-0.124939
-0.331933	with many Boolean expressions.	-0.124939
-0.347519	reduce complicated algebraic expressions.	-0.124939
-0.358570	would be to include	-0.124939
-1.334900	You have to include	-0.124939
-0.540802	measurement should not include	-0.124939
-0.503714	performance test should include	-0.124939
-0.572574	compression Most compilers include	-0.124939
-0.573007	newest instruction sets include	-0.124939
-0.350323	code. Compiled languages include	-0.124939
-0.314737	Most compiler packages include	-0.124939
-0.314764	is run. Examples include	-0.124939
-0.524893	1; } return y;	-0.124939
-0.385991	8.8b double x, y;	-0.124939
-0.304406	public: float x, y;	-0.124939
-0.229238	{ S1 x, y;	-0.124939
-0.229238	e, f, x, y;	-0.124939
-0.174150	b, c, d, y;	-0.425969
-0.478745	c = 100, y;	-0.124939
-0.237934	c = 1.23456, y;	-0.124939
-0.463688	but avoids the overflow.	-0.124939
-1.555483	in case of overflow.	-0.124939
-0.526781	obscure possibility of overflow.	-0.124939
-0.465928	no check for overflow.	-0.124939
-0.525896	check for integer overflow.	-0.124939
-0.550579	calculations can cause overflow.	-0.124939
-0.441508	integer doesn't cause overflow.	-0.124939
-0.294237	question without generating overflow.	-0.124939
-1.084308	of an array element.	-0.124939
-0.345689	cycles per array element.	-0.124939
-0.345689	the current array element.	-0.124939
-0.868542	only a single element.	-0.124939
-0.958753	of the matrix element.	-0.124939
-0.575132	need the next element.	-0.124939
-0.361868	clock cycles per element.	-0.124939
-0.314756	a suitable pivot element.	-0.124939
-0.256478	advantages of object oriented	-0.124939
-0.109575	effects of object oriented	-0.425969
-0.460110	classes. The object oriented	-0.124939
-0.805663	use an object oriented	-0.124939
-0.326381	reasons why object oriented	-0.124939
-0.326381	textbooks recommend object oriented	-0.124939
-0.314785	delete). 88 Object oriented	-0.124939
-0.237951	efficient than non-object oriented	-0.124939
-0.900851	breakpoint in the fully	-0.124939
-0.562506	Portability C++ is fully	-0.124939
-0.884026	the syntax is fully	-0.124939
-0.596125	obtained with a fully	-0.124939
-1.239691	best way to fully	-0.124939
-0.358784	for Windows are fully	-0.124939
-1.054608	libraries are not fully	-0.124939
-0.504104	prevent it from fully	-0.124939
-1.105194	are not always fully	-0.124939
-0.575602	other kinds of storage.	-0.124939
-0.490958	function for register storage.	-0.124939
-0.348838	benefit from register storage.	-0.124939
-0.353425	threads need separate storage.	-0.124939
-0.341848	used for temporary storage.	-0.124939
-0.010298	systems with big-endian storage.	-0.124939
-0.020847	platforms with big-endian storage.	-0.124939
-0.444369	use big endian storage.	-0.124939
-1.114107	the speed of addition,	-0.124939
-0.358889	You may, in addition,	-0.124939
-0.595284	operations such as addition,	-0.124939
-0.515411	longer time than addition,	-0.425969
-1.689312	a floating point addition,	-0.124939
-0.581947	do an integer addition,	-0.124939
-0.353988	reductions involving integer addition,	-0.124939
-0.237918	minimum, maximum, saturated addition,	-0.124939
-0.107128	the execution of everything	-0.425969
-1.637922	make sure that everything	-0.124939
-0.761456	a, b; // everything	-0.124939
-0.657432	* 1.2; // everything	-0.124939
-0.357493	interpreted languages where everything	-0.124939
-1.795886	to make sure everything	-0.124939
-0.501217	must clean up everything	-0.124939
-0.439009	time to eliminate everything	-0.124939
-0.591987	consumer if it involves	-0.124939
-0.579175	manually when it involves	-0.124939
-0.590749	risky because it involves	-0.124939
-1.368751	if the code involves	-0.124939
-0.526554	processor. However, this involves	-0.124939
-0.357575	This method also involves	-0.124939
-0.416000	floating point operations involves	-0.425969
-0.237926	to a driver involves	-0.124939
-0.886095	F2(b); } } Here	-0.124939
-0.550479	...)) { ... Here	-0.124939
-0.635229	a suboptimal way. Here	-0.124939
-0.511885	in assembly language. Here	-0.124939
-0.325361	which is double. Here	-0.124939
-0.538855	Writes "Hello 2" Here	-0.124939
-0.294210	and Newton-Raphson iterations. Here	-0.124939
-0.237902	order calculation capabilities. Here	-0.124939
-0.237902	a1/b1 + a2/b2; Here	-0.124939
-1.200867	implementation of the factorial	-0.124939
-0.356719	Example 14.1b int factorial	-0.124939
-0.356719	Example 14.1a int factorial	-0.124939
-0.571868	take the integer factorial	-0.124939
-0.352942	x^0/0! // n factorial	-0.124939
-0.044152	double x, n, factorial	-0.425969
-0.212316	<= n; x++) factorial	-0.124939
-0.212316	0; i--, x++) factorial	-0.124939
-0.601113	Documentation of the OpenMP	-0.124939
-0.358814	PSDK). Supports the OpenMP	-0.124939
-0.358665	threads Parallelization by OpenMP	-0.124939
-0.459665	the data. Use OpenMP	-0.124939
-0.350330	and 64-bit. Supports OpenMP	-0.124939
-0.051250	Supports parallel processing, OpenMP	-0.425969
-0.314747	Use OpenMP directives. OpenMP	-0.124939
-0.237918	(see page 107), OpenMP	-0.124939
-0.357801	the array pointer eax	-0.124939
-0.355945	variable 85 ; eax	-0.124939
-0.442026	of the array. eax	-0.124939
-0.280172	instructions add ebx, eax	-0.124939
-0.208766	= r ebx, eax	-0.124939
-0.208766	ebx, 31 ebx, eax	-0.124939
-0.339459	PTR [esp+8] eax, eax	-0.124939
-0.336295	eax, 8 edx, eax	-0.124939
-0.294228	100. It compares eax	-0.124939
-0.029572	aa[], short int bb[],	-1.079181
-0.803450	control branch is mispredicted	-0.124939
-0.159740	other way is mispredicted	-0.425969
-1.225351	repeat count is mispredicted	-0.124939
-1.017577	certain to be mispredicted	-0.124939
-0.580539	addresses to be mispredicted	-0.124939
-0.593101	calls can be mispredicted	-0.124939
-0.593101	branches can be mispredicted	-0.124939
-0.589860	branch will be mispredicted	-0.124939
-0.463627	point format is standardized	-0.124939
-0.599844	proceed in a standardized	-0.124939
-0.358855	Available protocols and standardized	-0.124939
-0.596779	programs should be standardized	-0.124939
-0.463208	should be as standardized	-0.124939
-0.598706	size is not standardized	-0.124939
-0.504264	should always use standardized	-0.124939
-0.444343	syntax is fully standardized	-0.124939
-0.294210	relies on non- standardized	-0.124939
-0.587335	counters instead of (or	-0.124939
-0.504035	another C++ program (or	-0.124939
-1.829799	SSE2 instruction set (or	-0.124939
-0.357670	the entire library (or	-0.124939
-0.576455	element is stored (or	-0.124939
-1.161449	unless the SSE2 (or	-0.124939
-0.535371	groups of four (or	-0.124939
-0.352938	have used char (or	-0.124939
-1.031924	new and delete (or	-0.124939
-0.526583	and to optimize across	-0.124939
-0.317683	or otherwise optimize across	-0.124939
-0.317683	below. Cannot optimize across	-0.124939
-0.598131	from making optimizations across	-0.124939
-0.326959	will enable optimizations across	-0.124939
-0.890470	are not compatible across	-0.124939
-0.756855	and parameter transfer across	-0.124939
-0.343723	is not standardized across	-0.124939
-0.237926	member is unchanged across	-0.124939
-0.050112	of a clock cycle	-0.602060
-0.171943	to a clock cycle	-0.124939
-0.311630	2GHz A clock cycle	-0.124939
-0.477976	only one clock cycle	-0.124939
-0.339440	takes one clock cycle	-0.124939
-0.329971	the core clock cycle	-0.124939
-0.465001	The core clock cycle	-0.124939
-0.339182	Specifies that pointer aliasing	-0.124939
-0.329192	assume no pointer aliasing	-0.124939
-0.463939	Assume no pointer aliasing	-0.124939
-0.329192	assuming no pointer aliasing	-0.124939
-0.339182	of possible pointer aliasing	-0.124939
-0.937641	cannot rule out aliasing	-0.124939
-0.336328	81). 77 Pointer aliasing	-0.124939
-0.102878	on the strict aliasing	-0.124939
-0.102878	violates the strict aliasing	-0.124939
-0.006654	void SelectAddMul(short int aa[],	-0.903090
-0.344217	void SelectAddMul_dispatch(short int aa[],	-0.124939
-0.344217	void FUNCNAME(short int aa[],	-0.124939
-0.344217	void FuncType(short int aa[],	-0.124939
-0.356933	Visual Studio. This tool	-0.124939
-0.356933	from www.agner.org/optimize/testp.zip. This tool	-0.124939
-0.472704	developed a test tool	-0.124939
-0.453006	counter. The test tool	-0.124939
-0.416104	for my test tool	-0.124939
-0.131542	counters. My test tool	-0.124939
-0.131542	identified. My test tool	-0.124939
-0.467690	language and development tool	-0.124939
-0.331939	One popular development tool	-0.124939
-1.635366	size of the parent	-0.124939
-0.587945	functions of a parent	-0.124939
-1.176335	members of a parent	-0.124939
-0.203880	member functions of parent	-0.124939
-0.358250	class. Members of parent	-0.124939
-0.358740	in the // parent	-0.124939
-0.357635	Inheritance from multiple parent	-0.124939
-0.354186	members of both parent	-0.124939
-1.337260	don't have to care	-0.124939
-0.378840	class that takes care	-0.124939
-0.378840	library that takes care	-0.124939
-0.632959	the compiler takes care	-0.124939
-0.454843	destructors to take care	-0.124939
-0.454843	coprocessors to take care	-0.124939
-0.483245	thread can take care	-0.124939
-0.483245	tread can take care	-0.124939
-0.561296	If you don't care	-0.124939
-0.562469	step. In most systems,	-0.124939
-0.385027	efficient. In 64-bit systems,	-0.124939
-0.385027	cycle. In 64-bit systems,	-0.124939
-0.778649	stack in 32-bit systems,	-0.124939
-0.535397	integers in 32-bit systems,	-0.124939
-0.640262	OS X operating systems,	-0.124939
-0.349014	programming languages, operating systems,	-0.124939
-0.953270	Linux and Mac systems,	-0.124939
-0.023525	version int CriticalFunction_386(int parm1,	-0.425969
-0.023525	version int CriticalFunction_SSE2(int parm1,	-0.425969
-0.048397	version int CriticalFunction_AVX(int parm1,	-0.124939
-0.048397	127 int CriticalFunction_AVX(int parm1,	-0.124939
-0.237934	typedef int CriticalFunctionType(int parm1,	-0.124939
-0.237934	time int CriticalFunction_Dispatch(int parm1,	-0.124939
-1.772450	the code is included	-0.124939
-0.578414	This time is included	-0.124939
-0.462879	Bounds checking is included	-0.124939
-0.587132	important functions are included	-0.124939
-0.598722	'1' is not included	-0.124939
-0.356808	Mac The libraries included	-0.124939
-0.494895	constants are usually included	-0.124939
-0.294237	yes License license included	-0.124939
-0.588600	even have a false	-0.124939
-0.463227	be given a false	-0.124939
-0.463509	value 0 for false	-0.124939
-1.189027	known to be false	-0.124939
-0.358717	with IsPowerOf2 = false	-0.124939
-0.358699	true (1) or false	-0.124939
-0.549614	a a && false	-0.124939
-0.531715	a, a || false	-0.124939
-0.957912	can change the value.	-0.124939
-0.599104	have the same value.	-0.124939
-0.502720	a function return value.	-0.124939
-0.460001	constant with its value.	-0.124939
-0.458925	with the calculated value.	-0.124939
-0.511805	below the maximum value.	-0.124939
-0.410570	to the previous value.	-0.124939
-0.410570	using the previous value.	-0.124939
-0.575756	in the object file.	-0.124939
-0.064264	a single object file.	-0.425969
-0.860008	the same source file.	-0.124939
-0.330840	in another source file.	-0.124939
-0.349119	to an output file.	-0.124939
-0.529899	into the executable file.	-0.124939
-0.345232	or an input file.	-0.124939
-0.358536	*= x; x *=	-0.124939
-0.353455	& 1) y *=	-0.124939
-0.345251	n; i++) f *=	-0.124939
-0.120425	n; x++) factorial *=	-0.124939
-0.120425	i--, x++) factorial *=	-0.124939
-0.339459	/ nfac; xn *=	-0.124939
-0.314747	+= x^n/n! xxn *=	-0.124939
-0.294228	*= x; nfac *=	-0.124939
-0.598222	creation of a temporary	-0.124939
-0.590622	it in a temporary	-0.124939
-0.590622	sequence in a temporary	-0.124939
-0.594876	ebx as a temporary	-0.124939
-0.463629	the creation of temporary	-0.124939
-0.593906	CPU used for temporary	-0.124939
-0.562918	register variables are temporary	-0.124939
-0.314766	The profiler inserts temporary	-0.124939
-0.659854	of abc is 12	-0.124939
-0.355796	not optimal. Use 12	-0.124939
-0.489060	with memory access. 12	-0.124939
-0.483867	misprediction is approximately 12	-0.124939
-0.339463	; parameter 2: 12	-0.124939
-0.325371	execution ................................................................................................. 103 12	-0.124939
-0.294219	double 8, 10, 12	-0.124939
-0.237910	of function libraries........................................................................................ 12	-0.124939
-1.200346	implementation of the memcpy	-0.124939
-1.815139	to use the memcpy	-0.124939
-0.866163	single call to memcpy	-0.124939
-0.504931	of memset and memcpy	-0.124939
-0.408151	0.18 0.18 0.11 memcpy	-0.124939
-0.314747	0.57 0.44 0.12 memcpy	-0.124939
-0.237918	libraries Test Processor memcpy	-0.124939
-0.237918	0.25 0.28 0.22 memcpy	-0.124939
-0.600944	address in the procedure	-0.124939
-0.590918	version in a procedure	-0.124939
-0.590918	lookup in a procedure	-0.124939
-0.816976	program uses a procedure	-0.124939
-0.563017	mark end of procedure	-0.124939
-0.358834	bit Linux The procedure	-0.124939
-0.357049	its functions, called procedure	-0.124939
-0.237926	uses an ordinary procedure	-0.124939
-0.553436	is on a PC	-0.124939
-0.553436	compiled on a PC	-0.124939
-1.230669	be implemented in PC	-0.124939
-0.562601	tested only on PC	-0.124939
-0.252634	of the standard PC	-0.124939
-0.252634	on the standard PC	-0.124939
-0.252634	as the standard PC	-0.124939
-0.252634	purposes the standard PC	-0.124939
-1.620133	This is a frequent	-0.124939
-0.589823	swapping is a frequent	-0.124939
-0.659677	in advance. The frequent	-0.124939
-0.526293	Such schemes are frequent	-0.124939
-0.503715	Such frameworks are frequent	-0.124939
-0.550077	switches are more frequent	-0.124939
-0.358111	or remotely. If frequent	-0.124939
-0.593094	among the most frequent	-0.124939
-0.237910	2 AVX2 _mm256_i64gather_pd unlimited	-0.124939
-0.237910	8 AVX2 _mm_i64gather_pd unlimited	-0.124939
-0.237910	4 AVX2 _mm256_i32gather_epi32 unlimited	-0.124939
-0.237910	4 AVX2 _mm_i32gather_ps unlimited	-0.124939
-0.237910	2 AVX2 _mm256_i64gather_epi32 unlimited	-0.124939
-0.237910	8 AVX2 _mm_i32gather_epi32 unlimited	-0.124939
-0.237910	8 AVX2 _mm_i64gather_epi32 unlimited	-0.124939
-0.237910	4 AVX2 _mm256_i32gather_ps unlimited	-0.124939
-0.735789	cases where the parallelism	-0.425969
-0.172686	parallelism and fine-grained parallelism	-0.124939
-0.172686	than with fine-grained parallelism	-0.124939
-0.102873	efficiently with coarse-grained parallelism	-0.124939
-0.102873	distinguish between coarse-grained parallelism	-0.124939
-0.237934	in parallel. Fine-grained parallelism	-0.124939
-0.237934	in parallel. Coarse-grained parallelism	-0.124939
-0.820781	of the CPU detection	-0.124939
-0.883168	in the CPU detection	-0.124939
-0.454589	use the CPU detection	-0.124939
-0.170348	replace the CPU detection	-0.425969
-0.454589	Unfortunately, the CPU detection	-0.124939
-0.454589	bypass the CPU detection	-0.124939
-0.613282	the Intel CPU detection	-0.124939
-0.358866	Loop r2 and c2	-0.124939
-1.040445	element in vector c2	-0.124939
-0.357219	to choose between c2	-0.124939
-0.125386	vector c __m128i c2	-0.425969
-0.325399	with the bit-mask: c2	-0.124939
-0.408151	(c2 = r1; c2	-0.124939
-0.294228	(c2 = c1; c2	-0.124939
-0.006839	and manual 3: "The	-0.124939
-0.003406	in manual 3: "The	-0.425969
-0.001700	(See manual 3: "The	-0.726999
-0.050303	branches. Manual 3: "The	-0.124939
-0.543494	function by adding throw()	-0.124939
-0.265206	throw() throw() throw() throw()	-0.124939
-0.175228	exceptions throw() throw() throw()	-0.124939
-0.224944	throw exceptions throw() throw()	-0.124939
-0.339479	not throw exceptions throw()	-0.124939
-0.122669	apply the empty throw()	-0.124939
-0.057018	have an empty throw()	-0.124939
-0.057018	While an empty throw()	-0.124939
-1.366887	or if the prediction	-0.124939
-0.596140	vector::reserve with a prediction	-0.124939
-0.358835	The rules for prediction	-0.124939
-0.542913	by the branch prediction	-0.124939
-0.435082	used for branch prediction	-0.124939
-0.336334	have no branch prediction	-0.124939
-0.336334	to take branch prediction	-0.124939
-0.351281	execution and advanced prediction	-0.124939
-1.635064	versions of the polymorphic	-0.124939
-0.575456	can call the polymorphic	-0.124939
-1.036396	version of a polymorphic	-0.124939
-0.587325	instance of a polymorphic	-0.124939
-0.397216	to call a polymorphic	-0.124939
-0.461199	"; // call polymorphic	-0.124939
-0.575679	used for implementing polymorphic	-0.124939
-0.504326	code should have #if	-0.124939
-0.358393	For example use #if	-0.124939
-0.886099	a.store(aa+i); } } #if	-0.124939
-0.358193	than if because #if	-0.124939
-0.597352	on instruction set #if	-0.124939
-0.721812	same source code. #if	-0.124939
-0.518263	integer int n; #if	-0.124939
-0.382839	program is compiled. #if	-0.124939
-0.358923	of inheritance is now	-0.124939
-0.358784	hybrid solutions are now	-0.124939
-0.858474	The code can now	-0.124939
-0.349126	accessed column-wise. Assume now	-0.124939
-0.341849	the stack). ecx now	-0.124939
-0.682659	The loop body now	-0.124939
-1.023476	their live ranges now	-0.124939
-0.237910	product is Borland's now	-0.124939
-0.577173	unit, but this unit	-0.124939
-0.582770	below. The time unit	-0.124939
-1.655393	use the same unit	-0.124939
-0.358039	should save one unit	-0.124939
-0.435748	the graphics processing unit	-0.124939
-0.308344	a physics processing unit	-0.124939
-0.077814	chain. 3.16 Execution unit	-0.124939
-0.077814	22 3.16 Execution unit	-0.124939
-0.683914	the function calling conventions	-0.124939
-0.325433	any specific calling conventions	-0.124939
-0.001468	manual 5: "Calling conventions	-0.823909
-0.325429	CPUs. 5. Calling conventions	-0.124939
-1.047432	or in a register.	-0.124939
-1.249336	than in a register.	-0.124939
-1.121540	into a vector register.	-0.124939
-0.458869	size as vector register.	-0.124939
-1.655393	use the same register.	-0.124939
-0.358039	takes up one register.	-0.124939
-0.770670	a 128-bit XMM register.	-0.124939
-0.347498	the same logical register.	-0.124939
-0.599241	statements is a kind	-0.124939
-1.026119	is also a kind	-0.124939
-0.520041	not make this kind	-0.124939
-0.457372	CPU supports this kind	-0.124939
-0.353997	To prevent this kind	-0.124939
-0.867417	use a different kind	-0.124939
-0.353213	tell explicitly what kind	-0.124939
-0.237934	An even worse kind	-0.124939
-0.358968	by dropping the graphical	-0.124939
-1.050789	loop of a graphical	-0.124939
-0.587632	menus of a graphical	-0.124939
-0.586495	application has a graphical	-0.124939
-0.358334	3.10 Graphics A graphical	-0.124939
-0.454887	program their own graphical	-0.124939
-0.336293	Library (OWL). Several graphical	-0.124939
-0.237926	depend on system-specific graphical	-0.124939
-0.577390	use only the lower	-0.124939
-0.788599	simply stores the lower	-0.124939
-0.527293	residual error is lower	-0.124939
-0.591517	compiling for a lower	-0.124939
-0.584920	to at a lower	-0.124939
-0.358920	ASCII string to lower	-0.124939
-0.548615	other threads with lower	-0.124939
-0.357253	separate thread with lower	-0.124939
-0.593944	begins at the label	-0.124939
-0.539686	sequence where each label	-0.124939
-0.051614	loop ; unused label	-0.124939
-0.051614	r ; unused label	-0.124939
-0.051614	true ; unused label	-0.124939
-0.051614	;r ; unused label	-0.124939
-0.829182	to the preceding label	-0.124939
-0.314766	to the $B1$2 label	-0.124939
-0.505071	can overlap the iterations	-0.124939
-2.074659	the number of iterations	-0.124939
-1.369767	two or more iterations	-0.124939
-0.526762	calculations of loop iterations	-0.124939
-0.461973	is doing two iterations	-0.124939
-0.521527	prepared for several iterations	-0.124939
-0.342663	control statement several iterations	-0.124939
-0.353436	is in mathematical iterations	-0.124939
-0.527131	thousand so the misprediction	-0.124939
-0.358821	may detect the misprediction	-0.124939
-1.803206	to make a misprediction	-0.124939
-0.827408	may get a misprediction	-0.124939
-0.358882	for prediction and misprediction	-0.124939
-0.805030	called the branch misprediction	-0.124939
-0.554612	from a branch misprediction	-0.124939
-0.343187	resolve any branch misprediction	-0.124939
-0.519531	counter is an integer,	-0.124939
-0.519531	10 is an integer,	-0.124939
-0.476305	pointer to an integer,	-0.124939
-0.476305	converted to an integer,	-0.124939
-0.569020	bits in an integer,	-0.124939
-0.357622	4 4 64-bit integer,	-0.124939
-0.351700	a biased binary integer,	-0.124939
-0.343747	double plus 6 integer,	-0.124939
-0.115980	principle of lazy binding	-0.124939
-0.015542	code and lazy binding	-0.425969
-0.065814	delay on lazy binding	-0.124939
-0.065814	session. But lazy binding	-0.124939
-0.065814	systems allow lazy binding	-0.124939
-0.102881	is called. Lazy binding	-0.124939
-0.102881	unacceptably long. Lazy binding	-0.124939
-0.358926	other hand, a just-in-time	-0.124939
-0.189421	intermediate code and just-in-time	-0.124939
-1.377036	are based on just-in-time	-0.124939
-0.356028	Some implementations use just-in-time	-0.124939
-0.356028	Java machines use just-in-time	-0.124939
-0.102873	intermediate code, interpreters, just-in-time	-0.124939
-0.102873	graphics frameworks, interpreters, just-in-time	-0.124939
-1.686918	there is a try	-0.124939
-1.807636	is recommended to try	-0.124939
-0.587627	optimizing compiler may try	-0.124939
-0.358458	void F0() { try	-0.124939
-0.358325	the producer will try	-0.124939
-0.358323	has hyperthreading, then try	-0.124939
-1.872378	there is no try	-0.124939
-0.357403	CPU. Should we try	-0.124939
-0.600805	run in the background	-0.124939
-0.599496	requires that the background	-0.124939
-0.358923	should leave a background	-0.124939
-1.748914	a lot of background	-0.124939
-0.540077	optimize performance for background	-0.124939
-0.462683	10 ms for background	-0.124939
-0.442025	doing the heavy background	-0.124939
-0.294237	alternative. The theoretical background	-0.124939
-0.540325	An integer is converted	-0.124939
-0.569653	base class is converted	-0.124939
-0.358340	example 14.7b is converted	-0.124939
-1.407701	need to be converted	-0.124939
-0.876773	integer can be converted	-0.124939
-0.374819	pointer can be converted	-0.425969
-0.658788	positive number when converted	-0.124939
-1.004630	of the object pointed	-0.124939
-0.538418	type of object pointed	-0.124939
-0.488372	division). The object pointed	-0.124939
-1.311521	that the value pointed	-0.124939
-0.563978	because the value pointed	-0.124939
-0.760839	that the variable pointed	-0.124939
-0.525136	time the variable pointed	-0.124939
-0.741157	if the target pointed	-0.124939
-0.940557	different brands of CPUs,	-0.124939
-0.540009	frequency than other CPUs,	-0.124939
-0.456389	only for Intel CPUs,	-0.124939
-0.353222	on certain Intel CPUs,	-0.124939
-0.333584	operations of modern CPUs,	-0.124939
-0.333584	capabilities of modern CPUs,	-0.124939
-0.172686	CPUs or multi-core CPUs,	-0.124939
-0.172686	core on multi-core CPUs,	-0.124939
-0.358929	extra precautions to account	-0.124939
-0.134281	to take into account	-0.124939
-0.134281	you take into account	-0.124939
-0.329682	compatibility problems into account	-0.124939
-0.329682	branch prediction into account	-0.124939
-0.026458	be taken into account	-0.602060
-0.412127	a[], int * p)	-0.124939
-0.037022	LoadVector(void const * p)	-0.602060
-0.122135	LoadVectorA(void const * p)	-0.124939
-0.130506	Plus2 (int * p)	-0.124939
-0.130506	FuncA (int * p)	-0.124939
-0.317950	int Sum2(S3 * p)	-0.124939
-1.347199	information about the chain	-0.124939
-0.242056	splitting the dependency chain	-0.124939
-0.319690	is a dependency chain	-0.124939
-0.242056	chains. A dependency chain	-0.124939
-0.446176	A long dependency chain	-0.124939
-0.451031	a critical dependency chain	-0.124939
-0.242056	Z. Each dependency chain	-0.124939
-0.242056	loop- carried dependency chain	-0.124939
-0.463559	data flow and algorithms	-0.124939
-0.463513	nearby branches. The algorithms	-0.124939
-0.358624	general literature on algorithms	-0.124939
-0.580650	discussion of different algorithms	-0.124939
-0.561576	test several different algorithms	-0.124939
-0.355653	of microprocessor. These algorithms	-0.124939
-0.353622	disadvantage of complicated algorithms	-0.124939
-0.453904	are using advanced algorithms	-0.124939
-0.550595	go through the PLT	-0.124939
-0.358821	and replaces the PLT	-0.124939
-0.541221	position-independent, makes a PLT	-0.124939
-0.065622	the GOT and PLT	-0.124939
-0.065622	use GOT and PLT	-0.124939
-0.065622	effect. GOT and PLT	-0.124939
-0.065622	suppress. GOT and PLT	-0.124939
-0.577428	desired function. The PLT	-0.124939
-1.441509	some of the heavy	-0.124939
-0.557621	by doing the heavy	-0.124939
-1.399287	For example, a heavy	-0.124939
-0.527307	similar thanks to heavy	-0.124939
-0.358640	a network with heavy	-0.124939
-0.595284	time, such as heavy	-0.124939
-1.889924	There is no heavy	-0.124939
-0.357355	giving it some heavy	-0.124939
-1.505878	piece of code once	-0.124939
-0.587566	occurs more than once	-0.124939
-0.658538	multiple values at once	-0.124939
-0.358147	is executed only once	-0.124939
-0.875970	CriticalFunction is called once	-0.124939
-0.356153	dramatic consequences. I once	-0.124939
-0.294219	the function. Compile once	-0.124939
-0.237910	is relocated (rebased) once	-0.124939
-1.042935	if all the additions	-0.124939
-0.358584	balanced mix of additions	-0.124939
-0.504537	a combination of additions	-0.124939
-0.357644	get four float additions	-0.124939
-0.646623	to do two additions	-0.124939
-0.455164	with just two additions	-0.124939
-0.355103	can do four additions	-0.124939
-0.456033	calculated by n additions	-0.124939
-0.187926	tree or a hash	-0.124939
-0.591041	programmers use a hash	-0.124939
-0.515187	of data. A hash	-0.124939
-0.350679	finding elements. A hash	-0.124939
-0.350679	fast enough. A hash	-0.124939
-0.350679	specific interval. A hash	-0.124939
-0.237951	facilities, binary trees, hash	-0.124939
-0.525609	and loaded into ecx	-0.124939
-0.444798	on stack ; ecx	-0.124939
-0.344056	;eax=addressofa ;edx=addressinr ; ecx	-0.124939
-0.339459	the registers eax, ecx	-0.124939
-0.339459	array address is. ecx	-0.124939
-0.237918	on the stack). ecx	-0.124939
-0.237918	DWORD PTR [eax+4], ecx	-0.124939
-0.237918	DWORD PTR [eax], ecx	-0.124939
-1.200801	available in the system.	-0.124939
-0.622653	to the operating system.	-0.124939
-0.576681	and the operating system.	-0.124939
-0.831011	by the operating system.	-0.124939
-0.537014	platform and operating system.	-0.124939
-0.412790	have an operating system.	-0.124939
-0.500797	in the Windows system.	-0.124939
-1.284177	and floating point variables,	-0.124939
-0.357625	This includes static variables,	-0.124939
-1.197656	floating point register variables,	-0.124939
-0.507642	same for simple variables,	-0.124939
-0.345499	such as simple variables,	-0.124939
-0.349111	function parameters, local variables,	-0.124939
-0.255906	variables, integer Register variables,	-0.124939
-0.255906	elimin., float Register variables,	-0.124939
-0.598223	structure. This is equally	-0.124939
-0.358639	or p->member is equally	-0.124939
-0.500408	evaluate and are equally	-0.124939
-0.874555	and they are equally	-0.124939
-0.355629	two integers are equally	-0.124939
-0.355629	and references are equally	-0.124939
-0.355629	= 123; are equally	-0.124939
-0.870020	class are accessed equally	-0.124939
-0.316291	There are cases, however,	-0.124939
-0.578420	In many cases, however,	-0.124939
-0.316291	a few cases, however,	-0.124939
-0.467635	method is inefficient, however,	-0.124939
-0.325391	may be needed, however,	-0.124939
-0.294237	expansions. Programmers do, however,	-0.124939
-0.237926	not always accurate, however,	-0.124939
-0.237926	It is OK, however,	-0.124939
-0.463259	(NetBurst) CPU is designed	-0.124939
-0.659268	the STL is designed	-0.124939
-1.421118	have to be designed	-0.124939
-0.357997	CPU dispatchers are designed	-0.124939
-0.357997	instructions (MOVNT) are designed	-0.124939
-0.353422	feature was never designed	-0.124939
-0.339477	the original, poorly designed	-0.124939
-0.237926	set was originally designed	-0.124939
-0.583196	extensions. If a profiling	-0.124939
-0.504924	off debugging and profiling	-0.124939
-0.463266	the program with profiling	-0.124939
-1.617376	order to make profiling	-0.124939
-0.979128	are several different profiling	-0.124939
-0.532850	Inserting your own profiling	-0.124939
-0.487995	multiple programming languages, profiling	-0.124939
-0.237910	vendors are offering profiling	-0.124939
-1.782339	the code is fragmented	-0.124939
-0.358871	a slow and fragmented	-0.124939
-0.598243	memory to be fragmented	-0.124939
-0.504253	space becomes more fragmented	-0.124939
-0.423331	heap space becomes fragmented	-0.124939
-0.326947	space never becomes fragmented	-0.124939
-0.467403	memory to become fragmented	-0.124939
-0.317402	can easily become fragmented	-0.124939
-0.898276	check if the inputs	-0.124939
-1.042362	if all the inputs	-0.124939
-0.562835	mode program. The inputs	-0.124939
-0.912302	number of possible inputs	-0.124939
-0.453355	overflow and negative inputs	-0.124939
-0.339470	The only allowed inputs	-0.124939
-0.325381	keyboard and mouse inputs	-0.124939
-0.237918	to 12. Higher inputs	-0.124939
-0.593049	comparison, which is fast.	-0.124939
-0.358636	// Rounding is fast.	-0.124939
-0.946138	which is very fast.	-0.124939
-0.508108	operations are very fast.	-0.124939
-0.336885	are accessed very fast.	-0.124939
-0.336885	are generally very fast.	-0.124939
-0.354355	is accessed quite fast.	-0.124939
-0.341880	are accessed equally fast.	-0.124939
-0.540487	Intel CPUs have family	-0.124939
-0.591850	on the CPU family	-0.124939
-0.571639	newer. The CPU family	-0.124939
-0.501041	based on its family	-0.124939
-0.203162	in the x86 family	-0.124939
-0.248281	on the x86 family	-0.124939
-0.237934	than its brand, family	-0.124939
-0.462026	If n = 4,	-0.124939
-0.357661	2, Tuesday = 4,	-0.124939
-0.351690	calculated asa << 4,	-0.124939
-0.640498	the old Pentium 4,	-0.124939
-0.304391	2 (i.e. 2, 4,	-0.124939
-0.304391	processors (0, 2, 4,	-0.124939
-0.294237	1, 2, 3, 4,	-0.124939
-0.237926	Vol. 11, Iss. 4,	-0.124939
-0.357714	go here // Virtual	-0.124939
-0.357714	obj1; p->f(); // Virtual	-0.124939
-1.024710	Virtual member functions Virtual	-0.124939
-0.353430	in 32-bit systems. Virtual	-0.124939
-0.172682	non-static access. 7.20 Virtual	-0.124939
-0.172682	(methods)......................................................................... 53 7.20 Virtual	-0.124939
-0.294237	function is pure. Virtual	-0.124939
-0.237926	(see page 96). Virtual	-0.124939
-0.587344	i instead of j	-0.124939
-0.358640	* 32 with j	-0.124939
-1.108444	size; i++) { j	-0.124939
-0.534336	rows; i++) { j	-0.124939
-0.357741	as (int)&matrix[0][0] + j	-0.124939
-0.596129	(j = 0; j	-0.124939
-0.940755	compiler can replace j	-0.124939
-0.343722	factor to multiply j	-0.124939
-0.593798	break at the interrupt	-0.124939
-0.726519	to remove the interrupt	-0.124939
-0.463514	assembly instruction for interrupt	-0.124939
-0.538580	received by an interrupt	-0.124939
-0.356735	many times an interrupt	-0.124939
-0.355821	application code. An interrupt	-0.124939
-0.519199	mechanism should never interrupt	-0.124939
-0.294237	programming Device drivers, interrupt	-0.124939
-0.142981	| -1 = -1	-0.124939
-0.502901	^ ~a = -1	-0.124939
-0.801952	- a & -1	-0.124939
-0.461831	0 a & -1	-0.124939
-0.391696	a a | -1	-0.124939
-0.391696	- a | -1	-0.124939
-0.494362	a a ^ -1	-0.124939
-0.900173	variables can be 8,	-0.124939
-0.358717	4, Wednesday = 8,	-0.124939
-0.358699	2, 4 or 8,	-0.124939
-0.526689	sizes other than 8,	-0.124939
-1.003167	first byte at 8,	-0.124939
-0.506414	1 byte at 8,	-0.124939
-0.556238	8 long double 8,	-0.124939
-0.442002	(i.e. 2, 4, 8,	-0.124939
-0.374803	cache and execution units.	-0.124939
-0.374803	use different execution units.	-0.124939
-0.287655	floating point execution units.	-0.124939
-0.287655	Half size execution units.	-0.124939
-0.287655	only 64-bit execution units.	-0.124939
-0.287655	between several execution units.	-0.124939
-0.287655	with full-size execution units.	-0.124939
-0.649792	floating point multiplication units.	-0.124939
-0.358860	software packages and who	-0.124939
-0.358067	function libraries, but who	-0.124939
-0.951552	the end user who	-0.124939
-0.438997	and software developers who	-0.124939
-0.331896	single precision. And who	-0.124939
-0.237910	are for those who	-0.124939
-0.237910	164 below. Those who	-0.124939
-0.237910	the many people who	-0.124939
-0.763506	to be the fastest	-0.124939
-0.463116	is calculated the fastest	-0.124939
-0.805005	most cases, the fastest	-0.124939
-0.659045	is still the fastest	-0.124939
-1.177498	of code is fastest	-0.124939
-0.867358	which method is fastest	-0.124939
-1.486405	the sake of fastest	-0.124939
-0.726582	is critical. The fastest	-0.124939
-0.358699	aliasing. __declspec(noalias) or __restrict	-0.124939
-0.454343	aa, int * __restrict	-0.124939
-0.351608	void AddTwo(int * __restrict	-0.124939
-0.921816	using the keyword __restrict	-0.124939
-0.341857	optimize("a", on) __restrict __restrict	-0.124939
-0.538888	__restrict #pragma ivdep __restrict	-0.124939
-0.237918	__restrict __declspec( noalias) __restrict	-0.124939
-0.237918	#pragma optimize("a", on) __restrict	-0.124939
-0.585501	source is an arithmetic	-0.124939
-0.357977	fast as integer arithmetic	-0.124939
-1.302103	you can do arithmetic	-0.124939
-0.357801	uninitialized, if pointer arithmetic	-0.124939
-0.354760	resources than doing arithmetic	-0.124939
-0.255910	memory address. Pointer arithmetic	-0.124939
-0.255910	be accessed. Pointer arithmetic	-0.124939
-0.237918	gates, flip-flops, multiplexers, arithmetic	-0.124939
-0.594620	vacant then the DLL	-0.124939
-0.541118	data within the DLL	-0.124939
-1.337285	function in a DLL	-0.124939
-0.881097	variable in a DLL	-0.124939
-0.590883	Alternatively, make a DLL	-0.124939
-1.655509	use the same DLL	-0.124939
-0.426715	as a runtime DLL	-0.124939
-0.329656	library. A runtime DLL	-0.124939
-1.042372	if all the factors	-0.124939
-0.573108	summing up the factors	-0.124939
-0.556730	with many different factors	-0.124939
-0.355661	link libraries. These factors	-0.124939
-0.536622	There are several factors	-0.425969
-0.347509	so many unknown factors	-0.124939
-0.237926	resources are limiting factors	-0.124939
-0.597725	applications and the Gnu,	-0.124939
-1.469783	supported by the Gnu,	-0.124939
-1.062649	obtained with the Gnu,	-0.124939
-1.027508	such as the Gnu,	-0.124939
-1.806765	to use the Gnu,	-0.124939
-0.358843	automatic parallelization. The Gnu,	-0.124939
-0.595308	vectorization, such as Gnu,	-0.124939
-0.343767	Microsoft Intel, Microsoft, Gnu,	-0.124939
-0.203316	overflow of the arrays.	-0.124939
-0.524033	data in large arrays.	-0.124939
-0.346471	constants, and initialized arrays.	-0.124939
-0.733496	how to align arrays.	-0.124939
-0.345256	account for unaligned arrays.	-0.124939
-0.172682	style with character arrays.	-0.124939
-0.172682	style as character arrays.	-0.124939
-0.894268	increasing number of devices	-0.124939
-0.503106	realize that such devices	-0.124939
-0.356432	again. Accessing system devices	-0.124939
-0.341861	important on small devices	-0.124939
-0.626382	on such small devices	-0.124939
-0.343722	5 Programmable logic devices	-0.124939
-0.336320	or Verilog. Common devices	-0.124939
-0.237918	controlled. Small hand-held devices	-0.124939
-0.901456	rid of the branch.	-0.124939
-0.898112	here is a branch.	-0.124939
-0.828139	a kind of branch.	-0.124939
-0.358821	compatible with that branch.	-0.124939
-0.716537	the loop control branch.	-0.124939
-0.346457	from a previous branch.	-0.124939
-0.408151	chosen the wrong branch.	-0.124939
-0.898262	limit to the required	-0.124939
-0.358816	it allocates the required	-0.124939
-0.358630	little math is required	-0.124939
-0.358630	data manipulation is required	-0.124939
-0.358693	after debugging if required	-0.124939
-1.136202	amount of memory required	-0.124939
-0.336293	<pmmintrin.h> // SSE3 required	-0.124939
-0.294237	browsing that previously required	-0.124939
-0.478324	10; a = (unsigned	-0.425969
-0.190500	16; a = (unsigned	-0.425969
-0.853403	c; b = (unsigned	-0.124939
-0.349774	((unsigned int)i >= (unsigned	-0.124939
-0.339486	- min) <= (unsigned	-0.124939
-0.237943	T & operator[] (unsigned	-0.124939
-0.594975	measure that is almost	-0.124939
-1.886145	then it is almost	-0.124939
-0.599210	double's. It is almost	-0.124939
-0.539868	a list is almost	-0.124939
-1.393987	is used in almost	-0.124939
-0.358422	to Linux in almost	-0.124939
-0.657024	a loop where almost	-0.124939
-0.352394	operating systems give almost	-0.124939
-0.901203	rid of the GOT	-0.124939
-0.562813	(2) find the GOT	-0.124939
-0.586155	functions and a GOT	-0.124939
-0.358609	it expects a GOT	-0.124939
-0.846182	will not use GOT	-0.124939
-0.349769	BSD, the slow GOT	-0.124939
-0.294237	has no effect. GOT	-0.124939
-0.237926	option -read_only_relocs suppress. GOT	-0.124939
-1.441509	beginning of the array.	-0.124939
-0.900579	elements in the array.	-0.124939
-0.580118	run in an array.	-0.124939
-1.216716	in a different array.	-0.124939
-0.499676	results in another array.	-0.124939
-0.549348	through a linear array.	-0.124939
-0.336283	as a normal array.	-0.124939
-0.294228	than the destination array.	-0.124939
-0.863907	These functions are listed	-0.124939
-0.756192	The results are listed	-0.124939
-0.536203	Copyright conditions are listed	-0.124939
-0.355625	These suffixes are listed	-0.124939
-0.355625	instruction latencies are listed	-0.124939
-0.358615	instruction set, as listed	-0.124939
-0.356264	using the instructions listed	-0.124939
-0.314776	the function ReadTSC listed	-0.124939
-0.599476	extended to the general	-0.124939
-0.600668	logarithms in the general	-0.124939
-0.358667	to consult the general	-0.124939
-0.550776	future due to general	-0.124939
-0.569421	registers available for general	-0.124939
-0.358182	be justified for general	-0.124939
-0.549917	_mm_free. A more general	-0.124939
-0.349757	page 53). No general	-0.124939
-0.358819	is definitely the preferred	-0.124939
-0.358819	these reasons, the preferred	-0.124939
-1.889534	then it is preferred	-0.124939
-0.599374	aligned. It is preferred	-0.124939
-0.540525	compiled version is preferred	-0.124939
-0.659686	function returns. The preferred	-0.124939
-1.830793	it may be preferred	-0.124939
-0.557502	PC processors are preferred	-0.124939
-0.317106	- 16 clock cycles,	-0.124939
-1.261083	a few clock cycles,	-0.124939
-0.465906	- 10 clock cycles,	-0.124939
-0.099072	takes 5 clock cycles,	-0.425969
-0.317106	- 6 clock cycles,	-0.124939
-0.579912	- 80 clock cycles,	-0.124939
-0.317106	- 25 clock cycles,	-0.124939
-0.358613	to vectorize code explicitly	-0.124939
-1.541001	tell the compiler explicitly	-0.124939
-0.504063	to prefetch data explicitly	-0.124939
-0.354004	deallocate the space explicitly	-0.124939
-0.861137	the CPU dispatching explicitly	-0.124939
-0.540304	the algebraic reductions explicitly	-0.124939
-0.567492	throw(A,B,C) to tell explicitly	-0.124939
-0.490268	specify the alignment explicitly	-0.124939
-1.293556	the same memory space.	-0.124939
-0.459200	never takes memory space.	-0.124939
-0.658092	lot of cache space.	-0.124939
-0.658092	waste of cache space.	-0.124939
-0.346229	uses more cache space.	-0.124939
-0.634832	take up cache space.	-0.124939
-0.454459	bytes of storage space.	-0.124939
-0.449145	RAM and disk space.	-0.124939
-0.598919	solution is a fixed	-0.124939
-0.526370	advance, because a fixed	-0.124939
-0.725327	to insert a fixed	-0.124939
-0.526544	declare objects and fixed	-0.124939
-0.504312	a small and fixed	-0.124939
-0.065415	circular buffer with fixed	-0.124939
-0.355870	regular patterns with fixed	-0.124939
-0.540222	dynamically allocated memory Memory	-0.124939
-0.348343	and sound processing Memory	-0.124939
-0.682652	memory to disk. Memory	-0.124939
-0.048395	...................................................................................................... 21 3.13 Memory	-0.124939
-0.048395	loaded. 21 3.13 Memory	-0.124939
-0.382850	high precision math. Memory	-0.124939
-0.237918	cleaned up include: Memory	-0.124939
-0.237918	systems, and API's. Memory	-0.124939
-0.726755	first byte of zero.	-0.124939
-0.725167	an array to zero.	-0.124939
-0.462757	all elements to zero.	-0.124939
-0.358236	other bits to zero.	-0.124939
-0.358793	sign bit are zero.	-0.124939
-0.355750	this extra element zero.	-0.124939
-0.354505	imprecise or simply zero.	-0.124939
-0.350350	all 0's gives zero.	-0.124939
-0.959258	bits in a non-sequential	-0.124939
-0.319275	accessed in a non-sequential	-0.425969
-0.558302	indexed in a non-sequential	-0.124939
-0.659323	data structures with non-sequential	-0.124939
-0.460572	make the access non-sequential	-0.124939
-0.563017	fast ways of multiplying	-0.124939
-1.013717	hardware support for multiplying	-0.124939
-0.835321	be done by multiplying	-0.124939
-1.692995	is faster than multiplying	-0.124939
-0.358378	of 2 when multiplying	-0.124939
-0.341890	to double before multiplying	-0.124939
-0.341890	too big before multiplying	-0.124939
-0.341890	double precision before multiplying	-0.124939
-1.363338	to floating point Conversion	-0.124939
-0.653046	numbers and integers Conversion	-0.124939
-0.354341	of registers used. Conversion	-0.124939
-0.331919	to integer conversion Conversion	-0.124939
-0.331919	to float conversion Conversion	-0.124939
-1.183557	set is enabled. Conversion	-0.124939
-0.467621	to floating point. Conversion	-0.124939
-0.237918	modulo operator %. Conversion	-0.124939
-0.563111	a loop count down	-0.124939
-0.354172	time consumption was down	-0.124939
-0.118638	calls may slow down	-0.124939
-0.118638	writes may slow down	-0.124939
-0.282395	lookup operations slow down	-0.124939
-0.451182	ebx ; shift down	-0.124939
-0.435044	need to break down	-0.124939
-0.237926	program is shut down	-0.124939
-0.601113	architecture of the software.	-0.124939
-0.600803	account in the software.	-0.124939
-0.587336	efficient use of software.	-0.124939
-0.568613	and the application software.	-0.124939
-0.352059	lifetime of your software.	-0.124939
-0.336283	third party security software.	-0.124939
-0.314747	of their 23 software.	-0.124939
-0.294228	with some legacy software.	-0.124939
-0.596500	available because the interpreted	-0.124939
-0.578418	measured time is interpreted	-0.124939
-0.556770	a loop is interpreted	-0.124939
-0.763040	when i is interpreted	-0.124939
-0.358877	it is and interpreted	-0.124939
-1.094807	For example, in interpreted	-0.124939
-0.526539	such advantage in interpreted	-0.124939
-0.594240	integer will be interpreted	-0.124939
-1.777366	the code is exactly	-0.124939
-0.957026	overloaded operator is exactly	-0.124939
-1.016302	template parameters are exactly	-0.124939
-0.357997	disguise. Enums are exactly	-0.124939
-0.358442	different methods have exactly	-0.124939
-1.006833	compilers will make exactly	-0.124939
-0.521170	Sum3 are doing exactly	-0.124939
-0.498908	difficult to measure exactly	-0.124939
-1.161418	code has a jump	-0.124939
-1.193056	a table of jump	-0.124939
-0.463498	for threads that jump	-0.124939
-0.358419	can eliminate this jump	-0.124939
-0.566246	needs an extra jump	-0.124939
-0.459835	of array ; jump	-0.124939
-0.576735	makes the microprocessor jump	-0.124939
-0.530782	lists, switch statement jump	-0.124939
-0.578426	computation time is determined	-0.124939
-0.504196	of storage is determined	-0.124939
-0.358340	time slices is determined	-0.124939
-0.593705	available can be determined	-0.124939
-0.886537	sets can be determined	-0.124939
-0.586110	loaded cannot be determined	-0.124939
-0.721179	some cases be determined	-0.124939
-0.583551	task is often determined	-0.124939
-0.077623	bb[], short int cc[])	-1.028029
-0.358631	or every code line.	-0.124939
-0.362984	of a cache line.	-0.124939
-0.510707	loading a cache line.	-0.124939
-0.362984	occupying a cache line.	-0.124939
-0.492303	the same cache line.	-0.124939
-0.338556	an arbitrary cache line.	-0.124939
-0.843614	of a matrix line.	-0.124939
-0.596808	Patches should be easily	-0.124939
-0.588826	structure that can easily	-0.124939
-1.295716	The compiler can easily	-0.124939
-0.459754	in performance can easily	-0.124939
-0.355873	The heap can easily	-0.124939
-0.526781	readable and not easily	-0.124939
-0.349548	the problem cannot easily	-0.124939
-0.349548	encryption algorithms, cannot easily	-0.124939
-0.057357	for runtime type identification	-0.124939
-0.057357	use runtime type identification	-0.124939
-0.057357	require runtime type identification	-0.124939
-0.057357	No runtime type identification	-0.124939
-0.078292	(RTTI) Runtime type identification	-0.124939
-0.037384	7.21 Runtime type identification	-0.425969
-0.345265	Predefined macros Compiler identification	-0.124939
-0.600939	positions in the vectors.	-0.124939
-0.597623	integral number of vectors.	-0.124939
-0.358887	functions, etc. in vectors.	-0.124939
-0.598628	larger floating point vectors.	-0.124939
-0.357972	allows 256-bit integer vectors.	-0.124939
-0.462183	be organized into vectors.	-0.124939
-0.348340	things like adding vectors.	-0.124939
-0.449143	as two 128-bit vectors.	-0.124939
-0.357754	? (cc[i] + 2)	-0.124939
-0.357501	> v.i * 2)	-0.124939
-0.486011	100; i += 2)	-0.124939
-0.345266	size; i += 2)	-0.124939
-0.345266	20; i += 2)	-0.124939
-0.492245	&SelectAddMul_SSE41; (iset >= 2)	-0.124939
-0.023525	x*x*x*x*x*x*x*x = ((x2) 2)	-0.425969
-0.358406	with real time applications.	-0.124939
-0.556713	in many different applications.	-0.124939
-0.579363	best for all applications.	-0.124939
-0.503101	supported in such applications.	-0.124939
-0.524016	inefficient in large applications.	-0.124939
-0.784075	compiler for Windows applications.	-0.124939
-0.314737	for less intensive applications.	-0.124939
-0.237910	special mathe- matical applications.	-0.124939
-0.358193	is volatile. The volatile	-0.124939
-0.358193	enabled. Volatile The volatile	-0.124939
-0.570521	away. Note that volatile	-0.124939
-0.543040	of the keyword volatile	-0.124939
-0.352055	was not declared volatile	-0.124939
-0.341857	7.3. Explain volatile volatile	-0.124939
-0.237918	Example 7.3. Explain volatile	-0.124939
-0.237918	{ int dummy[4]; volatile	-0.124939
-0.458848	number of cache misses	-0.124939
-0.458848	penalty of cache misses	-0.124939
-0.342378	the code, cache misses	-0.124939
-0.342378	a thousand cache misses	-0.124939
-0.342378	disk. Provoke cache misses	-0.124939
-0.059034	matrix size causes misses	-0.425969
-0.339504	stored together Cache misses	-0.124939
-0.307539	not use lookup tables	-0.124939
-0.058916	14.1 Use lookup tables	-0.124939
-0.492266	example to produce tables	-0.124939
-0.269431	GOT and PLT tables	-0.425969
-0.102876	table lookup Lookup tables	-0.124939
-0.102876	table lookup. Lookup tables	-0.124939
-1.438386	accessed in a random	-0.124939
-0.107097	and deallocated in random	-0.425969
-1.297921	is useful for random	-0.124939
-0.358665	be caused by random	-0.124939
-1.692942	is faster than random	-0.124939
-0.462936	the data more random	-0.124939
-0.358268	can occur at random	-0.124939
-0.209578	and Mac OS X	-0.124939
-0.129482	for Mac OS X	-0.124939
-0.129482	64-bit Mac OS X	-0.124939
-0.127962	32-bit Mac OS X	-0.124939
-0.209578	Intel-based Mac OS X	-0.124939
-0.325421	#define Alignd(X) __declspec(align(16)) X	-0.124939
-0.382896	etc. #define Alignd(X) X	-0.124939
-0.525127	Converting class objects Conversions	-0.124939
-0.550497	10) { ... Conversions	-0.124939
-1.142254	registers are used. Conversions	-0.124939
-0.456648	point precision conversion Conversions	-0.124939
-0.510519	long double precision. Conversions	-0.124939
-1.183557	set is enabled. Conversions	-0.124939
-0.212298	another platform. 14.8 Conversions	-0.124939
-0.212298	double..................................................................................... 140 14.8 Conversions	-0.124939
-0.892176	variables in the YMM	-0.124939
-0.596568	change in the YMM	-0.124939
-0.525913	instruction set and YMM	-0.425969
-0.358839	only SSE). The YMM	-0.124939
-0.287628	XMM and 256-bit YMM	-0.124939
-0.287628	below). The 256-bit YMM	-0.124939
-0.294247	256-bit registers named YMM	-0.124939
-0.358337	while if is resolved	-0.124939
-0.804585	dynamic library is resolved	-0.124939
-0.462886	because #if is resolved	-0.124939
-0.883184	because they are resolved	-0.124939
-1.190962	function is not resolved	-0.124939
-0.543577	parameter is always resolved	-0.124939
-0.522734	parameters are always resolved	-0.124939
-0.599276	accurate for the purpose	-0.124939
-0.461101	loops // The purpose	-0.124939
-0.356934	is terminated. The purpose	-0.124939
-0.356934	in y. The purpose	-0.124939
-0.356934	using new. The purpose	-0.124939
-0.756506	for the specific purpose	-0.124939
-0.349121	libraries. Several special purpose	-0.124939
-0.573349	object without the -fpic	-0.124939
-0.557440	when compiled with -fpic	-0.124939
-0.601547	shared object without -fpic	-0.124939
-0.334666	when compiled without -fpic	-0.124939
-0.334666	object compiled without -fpic	-0.124939
-0.328785	of compiling without -fpic	-0.124939
-1.251962	with the option -fpic	-0.124939
-0.596862	considering is the D	-0.124939
-0.358834	class library). The D	-0.124939
-0.358831	and IDE's for D	-0.124939
-0.353729	B2; 54 class D	-0.124939
-0.353729	class B2; class D	-0.124939
-0.339458	the D language. D	-0.124939
-0.237926	of C++. Yet, D	-0.124939
-0.592974	as if it had	-0.124939
-0.591548	as if you had	-0.124939
-1.361144	if the program had	-0.124939
-0.559370	128-bit vector registers had	-0.124939
-0.350345	5. If columns had	-0.124939
-0.347484	size. Later models had	-0.124939
-0.314747	the first PC's had	-0.124939
-0.357977	functions with integer parameters.	-0.124939
-0.356927	to fourteen register parameters.	-0.124939
-0.356773	set of template parameters.	-0.124939
-0.460001	type and its parameters.	-0.124939
-0.355094	more than four parameters.	-0.124939
-0.354873	the same few parameters.	-0.124939
-0.325381	for transferring additional parameters.	-0.124939
-0.354640	eax ebx, 1 ebx,	-0.124939
-0.618401	The instruction add ebx,	-0.124939
-0.337697	two instructions add ebx,	-0.124939
-0.498318	edx = r ebx,	-0.124939
-0.211395	r ebx, eax ebx,	-0.124939
-0.211395	31 ebx, eax ebx,	-0.124939
-0.325401	eax ebx, 31 ebx,	-0.124939
-0.358929	This gives a measure	-0.124939
-0.569376	this function to measure	-0.124939
-0.881610	the program to measure	-0.124939
-1.161707	we want to measure	-0.124939
-1.021848	be difficult to measure	-0.124939
-0.527220	test data and measure	-0.124939
-0.872237	counts that you measure	-0.124939
-1.072006	If it is poorly	-0.124939
-0.955801	the value is poorly	-0.124939
-0.549839	the branch is poorly	-0.124939
-0.805911	to replace a poorly	-0.124939
-0.463472	the branches are poorly	-0.124939
-0.237934	of the original, poorly	-0.124939
-0.237934	applications to perform poorly	-0.124939
-0.889557	ways to do this:	-0.124939
-0.063317	may look like this:	-0.425969
-0.137467	typically look like this:	-0.124939
-0.070394	function looks like this:	-0.124939
-0.070394	code looks like this:	-0.124939
-0.070394	classes looks like this:	-0.124939
-1.199672	described in the sections	-0.124939
-0.358255	and read-only data sections	-0.124939
-0.588861	do. The following sections	-0.124939
-1.203389	in the above sections	-0.124939
-0.483879	cache. The subsequent sections	-0.124939
-0.048396	sections /Gy -ffunction- sections	-0.124939
-0.048396	functions) /Gy -ffunction- sections	-0.124939
-0.343730	resolve compatibility problems. Software	-0.124939
-0.478014	swapped to disk. Software	-0.124939
-0.439008	is restarted anyway. Software	-0.124939
-0.037167	"IA-32 Intel Architecture Software	-0.425969
-0.237926	user access rights. Software	-0.124939
-0.237926	API's. Memory swapping. Software	-0.124939
-0.348373	floating point calculations. Even	-0.124939
-0.700510	is not needed. Even	-0.124939
-0.493412	be predicted well. Even	-0.124939
-0.339459	a pre-calculated table. Even	-0.124939
-0.483902	Intel Pentium 4. Even	-0.124939
-0.237918	unfortunately very common. Even	-0.124939
-0.237918	years to come. Even	-0.124939
-1.360648	last byte at 19	-0.124939
-0.951168	listed in table 19	-0.124939
-0.314747	Program loading ....................................................................................................... 19	-0.124939
-0.294228	Automatic updates .................................................................................................... 19	-0.124939
-0.294228	compiler options....................................................................................... 160 19	-0.124939
-0.237918	_M_X64 _M_X64 162 19	-0.124939
-0.237918	or key press. 19	-0.124939
-0.308042	if speed is important.	-0.124939
-0.435343	when speed is important.	-0.124939
-0.435343	where speed is important.	-0.124939
-0.503362	when efficiency is important.	-0.124939
-0.357744	if portability is important.	-0.124939
-0.562262	more and more important.	-0.124939
-0.382896	is becoming increasingly important.	-0.124939
-0.891104	either in the carry	-0.124939
-0.596026	kept in the carry	-0.124939
-0.595649	register. If the carry	-0.124939
-0.587002	register into the carry	-0.124939
-0.590196	instructions where the carry	-0.124939
-0.504028	don't modify the carry	-0.124939
-0.358852	the next. The carry	-0.124939
-0.580436	times because of lazy	-0.124939
-0.659183	The principle of lazy	-0.124939
-0.351294	position-independent code and lazy	-0.425969
-0.358632	The delay on lazy	-0.124939
-0.355761	single session. But lazy	-0.124939
-0.487673	Some systems allow lazy	-0.124939
-0.659208	previous value as xn	-0.124939
-0.572590	x) { float xn	-0.124939
-0.548892	Here, each value xn	-0.124939
-0.523530	{ sum += xn	-0.124939
-0.458355	we will calculate xn	-0.124939
-0.237918	xn / nfac; xn	-0.124939
-0.237918	the series: ex xn	-0.124939
-0.783462	of the time stamp	-0.425969
-0.740812	with the time stamp	-0.124939
-0.513385	to) the time stamp	-0.124939
-0.570486	__rdtsc()). The time stamp	-0.124939
-0.344596	the so-called time stamp	-0.124939
-0.344596	// Returns time stamp	-0.124939
-0.596496	version because the debugging	-0.124939
-1.582456	is used for debugging	-0.124939
-0.358177	the IDE, for debugging	-0.124939
-0.352692	be removed after debugging	-0.124939
-0.554470	and turn off debugging	-0.124939
-0.347475	version with full debugging	-0.124939
-0.237926	cost of verifying, debugging	-0.124939
-0.832120	int x = 10;	-0.124939
-0.357677	int NumberOfTests = 10;	-0.124939
-0.872128	= b / 10;	-0.124939
-0.596902	(unsigned int)b / 10;	-0.124939
-0.326300	(unsigned int)a / 10;	-0.124939
-0.689397	= b % 10;	-0.124939
-0.556934	(unsigned int)b % 10;	-0.124939
-1.194518	according to the table.	-0.124939
-0.900583	listed in the table.	-0.124939
-1.484376	in the following table.	-0.124939
-0.559143	using the virtual table.	-0.124939
-1.203389	in the above table.	-0.124939
-0.961062	a procedure linkage table.	-0.124939
-0.237926	in a pre-calculated table.	-0.124939
-0.358918	a factor of 1,	-0.124939
-0.358723	{ Sunday = 1,	-0.124939
-1.192909	than 0 or 1,	-0.124939
-0.447830	allocations of sizes 1,	-0.124939
-0.629961	Developer’s Manual", Volume 1,	-0.124939
-0.023524	FactorialTable[13] = {1, 1,	-0.425969
-0.897368	elements of a vector,	-0.124939
-0.195056	Total size of vector,	-0.425969
-0.572162	appropriate type of vector,	-0.124939
-0.478691	int in one vector,	-0.124939
-0.478691	value in one vector,	-0.124939
-1.003024	in the next vector,	-0.124939
-0.058276	b) { if (b)	-0.726999
-0.494279	3; } if (b)	-0.124939
-0.064818	bool b; if (b)	-0.425969
-1.442288	beginning of the object,	-0.124939
-1.536067	to the same object,	-0.124939
-0.562347	copying a large object,	-0.124939
-0.500244	owns the allocated object,	-0.124939
-0.521727	in the shared object,	-0.124939
-1.091949	in a shared object,	-0.124939
-0.512954	returning a composite object,	-0.124939
-0.358932	template specialization is allowed	-0.124939
-0.596802	imprecisions should be allowed	-0.124939
-0.358001	STL container are allowed	-0.124939
-0.358001	and '$' are allowed	-0.124939
-1.142299	function is not allowed	-0.124939
-0.585760	name is not allowed	-0.124939
-0.503943	example. The only allowed	-0.124939
-0.575077	container than to delete	-0.124939
-0.659186	you forget to delete	-0.124939
-0.483078	with new and delete	-0.124939
-0.218204	using new and delete	-0.124939
-0.218204	uses new and delete	-0.124939
-0.218204	operators new and delete	-0.124939
-0.218204	over new and delete	-0.124939
-0.350837	the stack pointer. Likewise,	-0.124939
-0.738119	the shared object. Likewise,	-0.124939
-0.529612	and instruction sets. Likewise,	-0.124939
-0.343714	without generating overflow. Likewise,	-0.124939
-0.325381	a different type. Likewise,	-0.124939
-0.294228	a is false. Likewise,	-0.124939
-0.237918	the second operand. Likewise,	-0.124939
-0.532642	sets is as follows:	-0.124939
-0.451649	files are as follows:	-0.124939
-0.854099	be calculated as follows:	-0.124939
-0.349480	results were as follows:	-0.124939
-0.516892	if organized as follows:	-0.124939
-0.986645	be expressed as follows:	-0.124939
-0.349480	data elements, as follows:	-0.124939
-0.582606	of a vector simultaneously.	-0.124939
-0.357464	or modify objects simultaneously.	-0.124939
-0.343049	processes or threads simultaneously.	-0.124939
-0.343049	run eight threads simultaneously.	-0.124939
-0.331899	run many processes simultaneously.	-0.124939
-0.331899	do two jobs simultaneously.	-0.124939
-0.237926	simultaneously or seemingly simultaneously.	-0.124939
-1.844624	in the code itself	-0.124939
-0.598961	as the compiler itself	-0.124939
-0.594400	than the program itself	-0.124939
-0.757622	the test program itself	-0.124939
-0.568623	than the application itself	-0.124939
-0.352054	template is calling itself	-0.124939
-0.343738	on the device itself	-0.124939
-0.200103	be an efficient solution.	-0.124939
-1.304470	the most efficient solution.	-0.124939
-0.772982	be a better solution.	-0.124939
-0.741638	a very inefficient solution.	-0.124939
-0.339469	the most reliable solution.	-0.124939
-0.314766	and most up-to-date solution.	-0.124939
-0.358730	the rules of algebra	-0.124939
-0.249650	many rules of algebra	-0.124939
-0.358228	b Bit vector algebra	-0.124939
-0.585090	shift Floating point algebra	-0.124939
-0.354026	program optimization Integer algebra	-0.124939
-0.353436	xx-xx--x- reciprocal Boolean algebra	-0.124939
-0.345236	calculations including linear algebra	-0.124939
-0.598225	pieces of a suitable	-0.124939
-0.572458	times with a suitable	-0.124939
-0.572458	n with a suitable	-0.124939
-0.357988	of finding a suitable	-0.124939
-0.563025	several examples of suitable	-0.124939
-0.886617	instructions are not suitable	-0.124939
-0.579402	made for all suitable	-0.124939
-0.358740	template metaprogramming // Template	-0.124939
-0.483904	is the Windows Template	-0.124939
-0.522829	(ATL) and Windows Template	-0.124939
-0.347481	to be possible. Template	-0.124939
-0.294237	is the Standard Template	-0.124939
-0.237926	the STL (Standard Template	-0.124939
-0.237926	in the Active Template	-0.124939
-0.358764	heap manager can spend	-0.124939
-0.589999	it does not spend	-0.124939
-0.504413	The time you spend	-0.124939
-0.354499	may very well spend	-0.124939
-0.354003	input. Many programs spend	-0.124939
-0.497319	you will never spend	-0.124939
-0.496640	libraries Some applications spend	-0.124939
-0.349133	events as task switches	-0.124939
-0.021918	number of context switches	-0.124939
-0.095209	jobs. The context switches	-0.124939
-0.095209	program. Frequent context switches	-0.124939
-0.237960	caching. 3.14 Context switches	-0.124939
-0.172696	be renewed. Context switches	-0.124939
-0.434531	of memory to disk.	-0.124939
-0.434531	swap memory to disk.	-0.124939
-0.462765	even swapped to disk.	-0.124939
-0.462830	resource files from disk.	-0.124939
-0.374800	on the hard disk.	-0.124939
-0.287653	and fragmented hard disk.	-0.124939
-0.294256	to a floppy disk.	-0.124939
-0.143296	can become a serious	-0.124939
-0.143296	has become a serious	-0.124939
-0.885353	but there are serious	-0.124939
-0.358381	is possibly more serious	-0.124939
-0.358198	is unsafe because serious	-0.124939
-0.581342	Security The most serious	-0.124939
-0.347492	execution considerably. Another serious	-0.124939
-0.165978	2, b * c);	-0.124939
-0.438008	two, b * c);	-0.124939
-0.023525	= _mm_mullo_epi16 (b, c);	-0.425969
-0.237943	a = CriticalFunction(b, c);	-0.124939
-0.237943	a = (*CriticalFunction)(b, c);	-0.124939
-0.125758	the Microsoft Visual Studio	-0.124939
-0.125758	to Microsoft Visual Studio	-0.124939
-0.125758	below. Microsoft Visual Studio	-0.124939
-0.151209	multi-core processing. Visual Studio	-0.124939
-0.151209	for free. Visual Studio	-0.124939
-0.151209	respectively (MS Visual Studio	-0.124939
-0.237959	/ x64 (Visual Studio	-0.124939
-0.592459	7.22 short int a[100];	-0.124939
-0.829310	{ public: int a[100];	-0.124939
-0.341895	by 4 float a[100];	-0.124939
-0.341895	a list float a[100];	-0.124939
-0.341895	Example 7.26b float a[100];	-0.124939
-0.341895	Example 7.26a float a[100];	-0.124939
-0.570263	8.14a int i, a[100];	-0.124939
-0.358975	have used the trick	-0.124939
-0.460300	is true. The trick	-0.124939
-0.356303	/ (b1*b2); The trick	-0.124939
-0.356303	type-casting pointers: The trick	-0.124939
-0.356303	page 143. The trick	-0.124939
-0.356303	of sum. The trick	-0.124939
-0.491364	with a special trick	-0.124939
-0.841766	not have the disadvantages	-0.124939
-0.463495	advantages over the disadvantages	-0.124939
-0.659677	in advance. The disadvantages	-0.124939
-1.169342	However, there are disadvantages	-0.124939
-0.357361	does have some disadvantages	-0.124939
-0.356613	to overcome these disadvantages	-0.124939
-0.586690	have the following disadvantages	-0.124939
-0.356666	Only the registers eax,	-0.124939
-0.337693	PTR[ecx+eax*4],ebx eax, 1 eax,	-0.124939
-0.337693	ENDP ecx, 1 eax,	-0.124939
-0.314756	increment i++. cmp eax,	-0.124939
-0.538904	DWORD PTR [esp+8] eax,	-0.124939
-0.237926	[edx] DWORD PTR[ecx+eax*4],ebx eax,	-0.124939
-0.237926	mov mov 2:8+esp eax,	-0.124939
-1.364526	the code is distributed	-0.124939
-0.357486	program code is distributed	-0.124939
-0.358882	is compiled and distributed	-0.124939
-1.345898	need to be distributed	-0.124939
-1.215091	needs to be distributed	-0.124939
-0.586361	for function libraries distributed	-0.124939
-0.358636	different compilers is generally	-0.124939
-0.463266	WTL application is generally	-0.124939
-0.358882	so important and generally	-0.124939
-0.839307	Integer operations are generally	-0.124939
-0.503728	Container classes are generally	-0.124939
-1.398830	then you can generally	-0.124939
-0.586094	16. You can generally	-0.124939
-0.352146	portability to 64-bit mode,	-0.124939
-0.352146	respectively. (In 64-bit mode,	-0.124939
-0.535400	than in 32-bit mode,	-0.124939
-0.954745	especially in 32-bit mode,	-0.124939
-1.209147	in 64 bit mode,	-0.124939
-0.491182	in 32- bit mode,	-0.124939
-0.294256	file in exclusive mode,	-0.124939
-0.676283	64-bit Windows and Linux.	-0.124939
-0.676283	both Windows and Linux.	-0.124939
-0.557450	way as in Linux.	-0.124939
-0.358422	but rarely in Linux.	-0.124939
-0.658381	Intel compilers for Linux.	-0.124939
-1.132180	when compiling for Linux.	-0.124939
-1.013814	32- and 64-bit Linux.	-0.124939
-0.900571	test () { C1	-0.124939
-0.354191	void F1() { C1	-0.124939
-0.354191	void g() { C1	-0.124939
-0.345474	belongs to class C1	-0.124939
-0.486298	f(); }; class C1	-0.124939
-0.345474	"; Disp(); class C1	-0.124939
-0.345474	Example 7.44 class C1	-0.124939
-0.579125	so-called objects are instances	-0.124939
-0.543762	applied to all instances	-0.124939
-0.354415	// Make all instances	-0.124939
-0.357635	CParent::Hello() has multiple instances	-0.124939
-0.571122	template with many instances	-0.124939
-0.356778	or more template instances	-0.124939
-0.237926	hold many renamed instances	-0.124939
-1.090791	the function is called,	-0.124939
-0.773118	a function is called,	-0.425969
-0.862999	shared object is called,	-0.124939
-0.593155	function will be called,	-0.124939
-0.358052	most likely be called,	-0.124939
-0.527123	computer during the update	-0.124939
-0.358816	often abusing the update	-0.124939
-1.042786	the need to update	-0.124939
-0.463519	for updating. The update	-0.124939
-0.358705	an update, or update	-0.124939
-0.725880	can make an update	-0.124939
-0.356622	this important new update	-0.124939
-0.358538	= 2.0; x <=	-0.124939
-0.352408	min && i <=	-0.124939
-0.352408	= 2; i <=	-0.124939
-0.356315	the interval 0 <=	-0.124939
-0.352942	= 1; n <=	-0.124939
-0.237926	int)(i - min) <=	-0.124939
-0.237926	// Now 1.0 <=	-0.124939
-1.027824	floating point to integer.	-0.124939
-0.563968	preferably be an integer.	-0.124939
-0.435962	x as an integer.	-0.124939
-0.435962	treated as an integer.	-0.124939
-0.435962	represented as an integer.	-0.124939
-0.502627	of the 32-bit integer.	-0.124939
-0.408187	to the nearest integer.	-0.124939
-0.597378	call by the body	-0.124939
-0.891618	inefficient because the body	-0.124939
-0.598323	declaring the function body	-0.124939
-1.392602	if the loop body	-0.124939
-0.483037	better. The loop body	-0.124939
-0.483037	eax,0. The loop body	-0.124939
-0.460029	or if its body	-0.124939
-0.428513	for the hardware definition	-0.124939
-0.087314	and a hardware definition	-0.124939
-0.027200	in a hardware definition	-0.124939
-0.087314	where a hardware definition	-0.124939
-0.261718	them. The hardware definition	-0.124939
-0.598539	framework and the Java	-0.124939
-0.562533	Some implementations of Java	-0.124939
-0.504541	the features of Java	-0.124939
-1.592243	is used for Java	-0.124939
-0.565520	and the best Java	-0.124939
-0.552498	machine. The best Java	-0.124939
-0.541118	emulating the so-called Java	-0.124939
-0.357838	multi-threading, e.g. Intel Math	-0.124939
-0.355605	platforms. AMD AMD Math	-0.124939
-0.269227	version of Intel's Math	-0.124939
-0.269227	supplied in Intel's Math	-0.124939
-0.336321	_mm_exp_pd AMD Core Math	-0.124939
-0.284352	as the "Intel Math	-0.124939
-0.212309	(www.boost.org). The "Intel Math	-0.124939
-1.459630	that the compiler generates	-0.124939
-0.562668	that a compiler generates	-0.124939
-1.278386	} The compiler generates	-0.124939
-1.391310	The Intel compiler generates	-0.124939
-0.750732	the type conversion generates	-0.124939
-0.408187	1 from -128 generates	-0.124939
-0.294256	then the sampling generates	-0.124939
-0.358835	other CPUs for executing	-0.124939
-0.540938	of optimization by executing	-0.124939
-0.539090	execution time on executing	-0.124939
-0.502538	cycles spent on executing	-0.124939
-0.295615	before and after executing	-0.124939
-0.237934	x 43 speculatively executing	-0.124939
-0.358866	conventions. FreeBSD and Open	-0.124939
-0.348349	vector class library. Open	-0.124939
-0.346457	not optimize well. Open	-0.124939
-0.314747	v. 8.42n, 2004. Open	-0.124939
-0.294228	Open database connections. Open	-0.124939
-0.237918	image processing. Yeppp. Open	-0.124939
-0.237918	etc. Locked mutexes. Open	-0.124939
-1.562369	int size = 256;	-0.124939
-0.785366	0; i < 256;	-0.602060
-1.182409	different kinds of optimizations.	-0.124939
-0.586737	propagation and other optimizations.	-0.124939
-0.354498	it prevents certain optimizations.	-0.124939
-0.546683	possibility for further optimizations.	-0.124939
-0.093306	and enables interprocedural optimizations.	-0.124939
-0.093306	This enables interprocedural optimizations.	-0.124939
-0.294237	access to low-level optimizations.	-0.124939
-1.060149	0) { // Cache	-0.124939
-0.919114	be stored together Cache	-0.124939
-0.102873	more important. 9.2 Cache	-0.124939
-0.102873	......................................................................................... 87 9.2 Cache	-0.124939
-0.102873	.......................................................................................... 96 9.10 Cache	-0.124939
-0.102873	is opposite). 9.10 Cache	-0.124939
-0.294247	SSE2 Table 9.2. Cache	-0.124939
-0.583446	software to be slower	-0.124939
-1.520302	likely to be slower	-0.124939
-0.503083	dynamic link libraries slower	-0.124939
-0.554083	processor is much slower	-0.124939
-0.354775	will always run slower	-0.124939
-0.542300	likely to execute slower	-0.124939
-0.237926	neither faster nor slower	-0.124939
-0.599703	friendly. It is free	-0.124939
-0.764489	or malloc and free	-0.124939
-0.990293	is available for free	-0.124939
-0.880249	compilers do not free	-0.124939
-0.869113	be only one free	-0.124939
-0.350823	topic, see my free	-0.124939
-0.343731	though it could free	-0.124939
-0.590149	on the time consuming	-0.124939
-0.351451	function is time consuming	-0.124939
-0.351451	can be time consuming	-0.124939
-0.351451	because these time consuming	-0.124939
-0.172696	avoid the time- consuming	-0.124939
-0.172696	to put time- consuming	-0.124939
-0.237951	library functions. Time- consuming	-0.124939
-0.594943	container is to hold	-0.124939
-0.504067	extra register to hold	-0.124939
-0.550105	big enough to hold	-0.124939
-0.653786	each vector can hold	-0.124939
-0.459759	The CPU can hold	-0.124939
-0.459759	vector registers can hold	-0.124939
-0.355878	cache line can hold	-0.124939
-1.897444	parts of the memory,	-0.124939
-1.001002	a variable in memory,	-0.124939
-1.161097	is stored in memory,	-0.124939
-1.382946	be stored in memory,	-0.124939
-0.422291	in dynamically allocated memory,	-0.124939
-0.422291	as dynamically allocated memory,	-0.124939
-0.237943	Systems with segmented memory,	-0.124939
-0.401342	no cache (see p.	-0.124939
-0.309247	using templates (see p.	-0.124939
-0.309247	dependency chains (see p.	-0.124939
-0.309247	branch prediction (see p.	-0.124939
-0.309247	the throughput (see p.	-0.124939
-0.454470	(See thread-local storage p.	-0.124939
-0.435069	stack (see above, p.	-0.124939
-0.376985	0; c < SIZE;	-0.425969
-0.029039	0; r < SIZE;	-0.425969
-0.029039	1; r < SIZE;	-0.425969
-0.316431	0; r1 < SIZE;	-0.124939
-0.471281	loop in this case.	-0.124939
-0.471281	needed in this case.	-0.124939
-0.471281	message in this case.	-0.124939
-0.471281	reduction in this case.	-0.124939
-0.549203	formula in each case.	-0.124939
-0.502627	for the 32-bit case.	-0.124939
-0.346480	code in either case.	-0.124939
-0.358835	for calculations: for (	-0.124939
-0.065807	Array size Alignd (	-0.124939
-0.065807	aligned arrays Alignd (	-0.124939
-0.065807	bb[size] ); Alignd (	-0.124939
-0.237934	52 , longdoublevalue (	-0.124939
-0.237934	23 , doublevalue (	-0.124939
-0.237934	as follows: floatvalue (	-0.124939
-0.600603	handling can be expensive	-0.124939
-0.591724	write is more expensive	-0.124939
-0.355932	program development more expensive	-0.124939
-0.526031	the time, but expensive	-0.124939
-0.538805	cache are so expensive	-0.124939
-0.357241	can get very expensive	-0.124939
-0.814790	It is quite expensive	-0.124939
-0.358871	for sign and rounding	-0.124939
-1.214266	longer time than rounding	-0.124939
-1.676603	the floating point rounding	-0.124939
-0.591541	efficiency by using rounding	-0.124939
-0.810319	in speed between rounding	-0.124939
-1.193879	the difference between rounding	-0.124939
-0.355811	supports this). Use rounding	-0.124939
-0.551856	processors. See page 130	-0.124939
-0.551856	CPU. See page 130	-0.124939
-0.572903	processors (see page 130	-0.124939
-0.529975	CPUs. (See page 130	-0.124939
-0.314776	17.4 129 129 130	-0.124939
-0.382884	Intel compiler ......................................................................... 130	-0.124939
-0.237943	are the following: 130	-0.124939
-0.505015	the user is far	-0.124939
-1.541065	stored in a far	-0.124939
-0.358866	far pointers, and far	-0.124939
-0.354501	that have values far	-0.124939
-0.921816	using the keyword far	-0.124939
-1.007423	is of course far	-0.124939
-0.294228	huge). Far storage, far	-0.124939
-0.560814	sequentially in memory. They	-0.124939
-0.451973	called global variables. They	-0.124939
-0.336295	as three branches. They	-0.124939
-0.325381	publicly available information. They	-0.124939
-0.538888	32-bit and 64-bit. They	-0.124939
-0.237918	are very smart. They	-0.124939
-0.237918	are often unreliable. They	-0.124939
-0.562970	what kind of exceptions	-0.124939
-1.128840	to check for exceptions	-0.124939
-0.358690	left out if exceptions	-0.124939
-0.526148	error without using exceptions	-0.124939
-0.314747	does not throw exceptions	-0.124939
-0.294228	check that thrown exceptions	-0.124939
-0.382850	{ // Catch exceptions	-0.124939
-1.068308	depend on the system,	-0.124939
-0.504876	The smaller the system,	-0.124939
-1.186685	and the operating system,	-0.124939
-1.010593	by the operating system,	-0.124939
-0.325920	is no operating system,	-0.124939
-0.325920	the Windows operating system,	-0.124939
-0.596193	a protected operating system,	-0.124939
-1.075975	function of the absolute	-0.124939
-0.550447	can take the absolute	-0.124939
-1.542695	to calculate the absolute	-0.124939
-0.358405	the DLL use absolute	-0.124939
-0.462405	section contains no absolute	-0.124939
-0.357214	sometimes uses 32-bit absolute	-0.124939
-0.467649	bit to compare absolute	-0.124939
-1.164349	y; y = (a	-0.124939
-0.357677	< b) = (a	-0.124939
-0.499454	0; } if (a	-0.124939
-0.354946	c, d; if (a	-0.124939
-0.354946	Example 14.15b if (a	-0.124939
-0.354946	Example 14.15a if (a	-0.124939
-0.237951	function #define MAX(a,b) (a	-0.124939
-0.600941	appears in the machine	-0.124939
-2.074659	the number of machine	-0.124939
-0.504559	then transferred as machine	-0.124939
-0.357788	is translated into machine	-0.124939
-0.459512	the Java virtual machine	-0.124939
-1.021814	or a few machine	-0.124939
-0.294228	that the resulting machine	-0.124939
-0.659443	; ecx = Induction	-0.124939
-0.358534	int i; int Induction	-0.124939
-0.358325	+= 9; } Induction	-0.124939
-0.524610	for array elements Induction	-0.124939
-0.533623	other integer expressions Induction	-0.124939
-0.629940	invariant code motion Induction	-0.124939
-0.237918	temp; } 70 Induction	-0.124939
-0.358914	time slices to 120	-0.124939
-0.358866	page 95 and 120	-0.124939
-0.594016	aligned. See page 120	-0.124939
-0.579164	later instruction set. 120	-0.124939
-0.314747	12.10 Conclusion .......................................................................................................... 120	-0.124939
-0.294228	3-dimensional vectors ....................................................... 120	-0.124939
-0.237918	dynamically allocated memory................................................................. 120	-0.124939
-0.599655	Hence, it is hardly	-0.124939
-1.271055	that there is hardly	-0.124939
-0.526414	memory model is hardly	-0.124939
-0.463472	but these are hardly	-0.124939
-0.504628	64-bit integers with hardly	-0.124939
-0.841204	cache. This has hardly	-0.124939
-0.354181	that there was hardly	-0.124939
-0.597528	thing and the CPUID	-0.124939
-1.186794	based on the CPUID	-0.124939
-1.005251	time when the CPUID	-0.124939
-0.563524	task when the CPUID	-0.124939
-0.563524	33% when the CPUID	-0.124939
-0.574631	may call the CPUID	-0.124939
-0.503960	number. The only CPUID	-0.124939
-0.527350	that access the saved	-0.124939
-0.599374	bit can be saved	-0.124939
-0.594845	results should be saved	-0.124939
-0.584010	bit must be saved	-0.124939
-0.358798	function calls are saved	-0.124939
-0.462763	if F1 has saved	-0.124939
-0.498388	ebx that was saved	-0.124939
-0.599805	target if the changes	-0.124939
-0.505050	almost independent of changes	-0.124939
-0.502994	If the version changes	-0.124939
-0.502818	14, with some changes	-0.124939
-0.456036	dispatcher. The dispatcher changes	-0.124939
-0.642829	the last index changes	-0.124939
-0.331889	The keyword __fastcall changes	-0.124939
-0.358807	and c are integers,	-0.124939
-0.572487	integers and 64-bit integers,	-0.124939
-0.352150	to define 64-bit integers,	-0.124939
-0.336345	applied to 32-bit integers,	-0.124939
-0.544903	cycles for 32-bit integers,	-0.124939
-0.336345	b are 32-bit integers,	-0.124939
-0.435096	as two 32-bit integers,	-0.124939
-0.659866	have implemented a collection	-0.124939
-0.358347	time consuming. A collection	-0.124939
-0.206560	switches and garbage collection	-0.124939
-0.206560	deallocation and garbage collection	-0.124939
-0.207376	fragmented. This garbage collection	-0.124939
-0.207376	will start garbage collection	-0.124939
-0.237943	example, the Boost collection	-0.124939
-0.460939	that my optimization manuals	-0.124939
-0.568010	versions of these manuals	-0.124939
-0.347301	examples in these manuals	-0.124939
-0.460495	in the programming manuals	-0.124939
-0.347498	19 Literature Other manuals	-0.124939
-0.483879	manual. The subsequent manuals	-0.124939
-0.771240	series of five manuals	-0.124939
-1.384957	depends on the processor.	-0.124939
-0.929304	depending on the processor.	-0.124939
-0.802873	on an Intel processor.	-0.124939
-0.554707	the Pentium 4 processor.	-0.124939
-0.488039	on the actual processor.	-0.124939
-0.325414	a so-called soft processor.	-0.124939
-0.504775	14.12 Position-independent code Shared	-0.124939
-0.655567	at load time. Shared	-0.124939
-0.459177	32 bit Linux Shared	-0.124939
-0.296272	as explained below. Shared	-0.425969
-0.635291	objects in BSD Shared	-0.124939
-0.632865	for local references. Shared	-0.124939
-0.390899	the method of storing	-0.124939
-0.390899	C-style method of storing	-0.124939
-1.573130	is used for storing	-0.124939
-0.357537	a database for storing	-0.124939
-0.357537	as buffers for storing	-0.124939
-0.573929	innermost loop by storing	-0.124939
-0.564649	implemented simply by storing	-0.124939
-0.358839	users have. The developers	-0.124939
-0.562187	disadvantages that make developers	-0.124939
-0.452745	programmers and software developers	-0.124939
-0.493052	problems that software developers	-0.124939
-0.356095	Development time Some developers	-0.124939
-0.269227	compatibility problems. Software developers	-0.124939
-0.269227	Memory swapping. Software developers	-0.124939
-0.036493	CriticalFunction_386(int parm1, int parm2)	-0.425969
-0.036493	CriticalFunction_SSE2(int parm1, int parm2)	-0.425969
-0.036493	CriticalFunction_AVX(int parm1, int parm2)	-0.425969
-0.076336	CriticalFunction_Dispatch(int parm1, int parm2)	-0.124939
-0.462983	sum1 from time T	-0.124939
-0.358329	return N; } T	-0.124939
-0.559833	if the type T	-0.124939
-0.486706	elements of type T	-0.124939
-0.355695	const & a, T	-0.124939
-0.578039	T> static inline T	-0.124939
-0.237926	SafeArray { protected: T	-0.124939
-0.580192	compile time to eliminate	-0.124939
-0.550229	they fail to eliminate	-0.124939
-1.105729	The compiler can eliminate	-0.124939
-0.542534	A compiler can eliminate	-0.124939
-0.500751	if this can eliminate	-0.124939
-0.574182	Here we can eliminate	-0.124939
-1.033687	It can also eliminate	-0.124939
-2.004235	a power of 2:	-0.124939
-0.557564	as powers of 2:	-0.124939
-0.501375	printf("Beta"); break; case 2:	-0.124939
-0.381059	discussed in manual 2:	-0.124939
-0.381059	chapter in manual 2:	-0.124939
-0.536325	detail in manual 2:	-0.124939
-0.521017	a ; parameter 2:	-0.124939
-0.587325	objects of a composite	-0.124939
-0.587325	Objects of a composite	-0.124939
-0.586071	parameter has a composite	-0.124939
-0.462442	of returning a composite	-0.124939
-0.358924	a parameter of composite	-0.124939
-0.650148	the simplest cases, composite	-0.124939
-0.421414	method for transferring composite	-0.124939
-0.827891	a program. The profilers	-0.124939
-0.358640	common problems with profilers	-0.124939
-0.356084	the code. Some profilers	-0.124939
-0.355653	called CodeAnalyst. These profilers	-0.124939
-1.402681	There are various profilers	-0.124939
-0.349767	AMD CodeAnalyst. Unfortunately, profilers	-0.124939
-0.294228	are also third-party profilers	-0.124939
-0.505020	names. But a highly	-0.124939
-0.358882	many respects and highly	-0.124939
-0.866349	These functions are highly	-0.124939
-0.568720	function libraries are highly	-0.124939
-0.356417	such applications are highly	-0.124939
-0.356160	you consider making highly	-0.124939
-0.463559	interpreted again and again	-0.124939
-0.357058	a nearby address again	-0.124939
-0.348355	and reading them again	-0.124939
-0.481291	loop is interpreted again	-0.124939
-0.336320	from 0x4700. Reading again	-0.124939
-0.325399	three times. Then again	-0.124939
-0.294228	addresses is reused again	-0.124939
-0.358914	Adding 1 to 127	-0.124939
-0.570244	offset bigger than 127	-0.124939
-0.656975	// AVX version 127	-0.124939
-0.339448	33 11.8 127 127	-0.124939
-0.314747	char 8 -128 127	-0.124939
-0.714919	( 1)sign 2exponent 127	-0.124939
-0.237918	65 33 11.8 127	-0.124939
-0.496313	with in assembly language.	-0.124939
-0.567456	C++ or assembly language.	-0.124939
-0.567456	to use assembly language.	-0.124939
-0.459475	function using assembly language.	-0.124939
-0.402596	may need assembly language.	-0.124939
-0.339494	is the D language.	-0.124939
-1.136877	a hardware definition language.	-0.124939
-0.579567	programmer to be aware	-0.124939
-0.579567	dangers to be aware	-0.124939
-0.812567	you should be aware	-0.124939
-0.554444	You should be aware	-0.124939
-0.554444	developers should be aware	-0.124939
-0.780525	should therefore be aware	-0.124939
-0.237959	/ 2 (be aware	-0.124939
-0.499700	support intrinsic functions. Alternatively,	-0.124939
-0.742948	the cache size. Alternatively,	-0.124939
-0.347484	its own stack. Alternatively,	-0.124939
-1.240602	the function returns. Alternatively,	-0.124939
-0.341847	in such applications. Alternatively,	-0.124939
-0.325399	to OMF format. Alternatively,	-0.124939
-0.408151	IsProcessorFeaturePresent in Windows). Alternatively,	-0.124939
-0.598645	have floating point capabilities	-0.124939
-0.554378	between the optimization capabilities	-0.124939
-0.403598	Using the out-of-order capabilities	-0.124939
-0.032444	microprocessor with out-of-order capabilities	-0.124939
-0.067510	Microprocessors with out-of-order capabilities	-0.124939
-0.450248	parallel vector processing capabilities	-0.124939
-0.656443	((unsigned int)n < 4)	-0.124939
-0.831474	100; i += 4)	-0.124939
-0.297445	& 3) << 4)	-0.124939
-0.297445	calculated as(a << 4)	-0.124939
-0.297445	| (B << 4)	-0.124939
-0.355151	if (level >= 4)	-0.425969
-1.951665	is that the linker	-0.124939
-1.014863	done by the linker	-0.124939
-0.579545	relocated by the linker	-0.124939
-0.595637	-fpie because the linker	-0.124939
-0.550230	which allows the linker	-0.124939
-0.463535	(signed) address. The linker	-0.124939
-0.446300	from both compiler, linker	-0.124939
-0.449251	8 bytes = int64_t	-0.124939
-0.659418	long long or int64_t	-0.124939
-0.358538	unsigned 256 int int64_t	-0.124939
-0.539237	Vec4ui 64 2 int64_t	-0.124939
-0.458178	Iu32vec2 64 1 int64_t	-0.124939
-0.237926	64 -263 263-1 int64_t	-0.124939
-0.580436	large number of bits.	-0.124939
-0.580436	extended number of bits.	-0.124939
-0.348701	extended to 64 bits.	-0.124939
-0.348701	double uses 64 bits.	-0.124939
-0.356465	rather than 32 bits.	-0.124939
-0.781135	of the extra bits.	-0.124939
-0.352061	ignoring the higher bits.	-0.124939
-1.785149	to make the measurements	-0.124939
-0.541061	dynamically and that measurements	-0.124939
-0.580696	program. The time measurements	-0.124939
-0.356077	core during time measurements	-0.124939
-0.658666	to execute then measurements	-0.124939
-0.574743	spot and make measurements	-0.124939
-0.357361	to do some measurements	-0.124939
-1.194622	fact that the representation	-0.124939
-0.463524	DOS compilers). The representation	-0.124939
-1.676649	the floating point representation	-0.124939
-0.525904	8.15b. The integer representation	-0.124939
-0.439456	in a binary representation	-0.124939
-0.386784	1-bit in binary representation	-0.124939
-0.297433	of its binary representation	-0.124939
-0.351321	Example 8.7 int SomeFunction	-0.124939
-0.351321	Example 8.9b int SomeFunction	-0.124939
-0.351321	Example 8.9a int SomeFunction	-0.124939
-0.351321	Example 8.11b int SomeFunction	-0.124939
-0.351321	Example 8.11a int SomeFunction	-0.124939
-0.357662	Example 7.1 float SomeFunction	-0.124939
-0.356513	#include <malloc.h> void SomeFunction	-0.124939
-0.358235	speed or program size,	-0.124939
-0.357991	instruction sets, cache size,	-0.124939
-1.207933	the cache line size,	-0.124939
-0.348355	64 bits total size,	-0.124939
-0.345221	types available. declaration size,	-0.124939
-0.687954	buffer with fixed size,	-0.124939
-0.336295	following table. Type size,	-0.124939
-0.592974	message if it is.	-0.124939
-1.350551	inside the loop is.	-0.124939
-0.357058	the array address is.	-0.124939
-0.352350	than it actually is.	-0.124939
-0.349741	how advantageous vectorization is.	-0.124939
-0.507215	convoluted template metaprogramming is.	-0.124939
-0.339470	the compiler itself is.	-0.124939
-0.155725	Bit vector algebra reductions:	-0.124939
-0.155725	Floating point algebra reductions:	-0.124939
-0.155725	optimization Integer algebra reductions:	-0.124939
-0.155725	reciprocal Boolean algebra reductions:	-0.124939
-0.020849	point XMM (vector) reductions:	-0.124939
-0.020849	Integer XMM (vector) reductions:	-0.124939
-0.020849	Boolean XMM (vector) reductions:	-0.124939
-0.504605	a user is waiting	-0.124939
-0.504605	another thread is waiting	-0.124939
-0.579050	While we are waiting	-0.124939
-0.827377	of its time waiting	-0.124939
-0.356080	of their time waiting	-0.124939
-0.570534	threads are often waiting	-0.124939
-0.355958	can do while waiting	-0.124939
-1.994196	instruction set is available,	-0.124939
-0.598237	tools to be available,	-0.124939
-0.591973	whenever they are available,	-0.124939
-0.597355	Newest instruction set available,	-0.124939
-0.503576	best optimizing compilers available,	-0.124939
-0.576037	libraries are also available,	-0.124939
-0.331889	classes are currently available,	-0.124939
-1.289559	vectorize the code automatically.	-0.124939
-0.355884	trivial programming work automatically.	-0.124939
-0.459049	execution mechanism works automatically.	-0.124939
-0.349762	This advantage comes automatically.	-0.124939
-0.768091	cannot be vectorized automatically.	-0.124939
-0.348349	of this alignment automatically.	-0.124939
-0.343748	may not vectorize automatically.	-0.124939
-1.089173	works only for powers	-0.124939
-0.504836	the numbers are powers	-0.124939
-0.358610	are defined as powers	-0.124939
-1.051139	advantage of using powers	-0.124939
-0.524889	advise of using powers	-0.124939
-0.349056	by preferably using powers	-0.124939
-0.355903	all means avoid powers	-0.124939
-0.562481	for making a debug	-0.124939
-0.358609	program executable: a debug	-0.124939
-0.583197	more difficult to debug	-0.124939
-0.354893	are testing contains debug	-0.124939
-0.341848	profiler inserts temporary debug	-0.124939
-0.325407	lines. The 17 debug	-0.124939
-0.237926	most time. Uses debug	-0.124939
-0.463514	Using templates for polymorphism	-0.124939
-0.356584	desired functionality without polymorphism	-0.124939
-0.518500	than the runtime polymorphism	-0.124939
-0.828310	obtain the desired polymorphism	-0.124939
-0.237957	Example 7.43a. Runtime polymorphism	-0.124939
-0.237957	page 73. Runtime polymorphism	-0.124939
-0.237926	Example 7.43b. Compile-time polymorphism	-0.124939
-0.902872	Intel, Gnu and Clang	-0.124939
-0.658421	Windows platforms. The Clang	-0.124939
-0.358206	platforms. Clang The Clang	-0.124939
-0.350343	all Unix-like platforms. Clang	-0.124939
-0.300346	with the Gnu, Clang	-0.124939
-0.300346	as the Gnu, Clang	-0.124939
-0.224957	Intel, Microsoft, Gnu, Clang	-0.124939
-0.595340	time that is measured	-0.124939
-0.578422	If time is measured	-0.124939
-0.599376	counts. It is measured	-0.124939
-0.504008	4 computer. The measured	-0.124939
-0.358206	example 16.2. The measured	-0.124939
-0.596808	output should be measured	-0.124939
-0.351705	matrix sizes were measured	-0.124939
-0.587397	above code in details.	-0.124939
-0.356273	compiler manual for details.	-0.124939
-0.356273	vectorclass manual for details.	-0.124939
-0.355589	page 88 for details.	-0.124939
-0.355589	page 29 for details.	-0.124939
-0.355589	Compiler Documentation for details.	-0.124939
-0.355589	my blog for details.	-0.124939
-0.597509	faster when the factor	-0.124939
-0.358819	sizeof(float)). Now, the factor	-0.124939
-0.560826	improved by a factor	-0.124939
-0.560826	multiplied by a factor	-0.124939
-0.588974	2.0; } The factor	-0.124939
-0.582891	logarithm of each factor	-0.124939
-0.632961	is a risk factor	-0.124939
-0.115978	{ _mm_storeu_si128((__m128i *)d, x);	-0.425969
-0.122669	{ _mm_store_si128((__m128i *)d, x);	-0.124939
-0.048399	}; int order(int x);	-0.124939
-0.048399	j; int order(int x);	-0.124939
-0.237943	s = _mm_hadd_ps(x, x);	-0.124939
-0.237943	xxn(x4, x2*x, x2, x);	-0.124939
-0.600946	alone in the core.	-0.124939
-1.863607	in the same core.	-0.124939
-0.358148	its own CPU core.	-0.124939
-0.549194	threads in each core.	-0.124939
-0.107551	the same processor core.	-0.124939
-1.194518	according to the rules	-0.124939
-0.999100	even though the rules	-0.124939
-0.835581	clock cycles. The rules	-0.124939
-0.572273	unsigned The same rules	-0.124939
-0.461797	implement the many rules	-0.124939
-0.354498	to obey certain rules	-0.124939
-0.237926	The same coding rules	-0.124939
-1.285328	in terms of speed.	-0.124939
-0.463509	and optimizing for speed.	-0.124939
-0.725832	more important than speed.	-0.124939
-0.352047	and hence higher speed.	-0.124939
-0.347465	speed or full speed.	-0.124939
-0.237918	the expected real-time speed.	-0.124939
-0.237918	half the single-thread speed.	-0.124939
-0.358926	an obstacle to vectorization.	-0.124939
-0.462817	and better at vectorization.	-0.124939
-0.308075	OpenMP and automatic vectorization.	-0.124939
-0.460060	invoked with automatic vectorization.	-0.124939
-0.497057	rely on automatic vectorization.	-0.124939
-0.269922	Can do automatic vectorization.	-0.124939
-1.390589	transferred in registers anyway.	-0.124939
-0.355015	exception handling support anyway.	-0.124939
-0.352956	is rarely needed anyway.	-0.124939
-0.551735	will be loaded anyway.	-0.124939
-0.347486	to be true anyway.	-0.124939
-0.102870	computer is restarted anyway.	-0.124939
-0.102870	down and restarted anyway.	-0.124939
-0.862842	to use the smallest	-0.301030
-1.517369	by using the smallest	-0.124939
-0.462548	for even the smallest	-0.124939
-0.562571	resources. On the smallest	-0.124939
-0.462548	by putting the smallest	-0.124939
-0.800433	it is the responsibility	-0.124939
-0.051832	It is the responsibility	-0.903090
-0.352044	e.g. AVX, AVX2 Mathematical	-0.124939
-0.314766	and string manipulation Mathematical	-0.124939
-0.172686	......................... 142 14.10 Mathematical	-0.124939
-0.172686	by u[0]. 14.10 Mathematical	-0.124939
-0.538921	(see page 140). Mathematical	-0.124939
-0.102873	vectorization............................................................. 117 12.7 Mathematical	-0.124939
-0.102873	called. 118 12.7 Mathematical	-0.124939
-0.357622	increased from 64-bit MMX	-0.124939
-0.332727	32 2 64 MMX	-0.124939
-0.468767	16 4 64 MMX	-0.124939
-0.480559	8 8 64 MMX	-0.124939
-0.332727	64 1 64 MMX	-0.124939
-0.502757	set Header file MMX	-0.124939
-0.325411	available. The older MMX	-0.124939
-0.581528	comes from a reliable	-0.124939
-1.803206	to make a reliable	-0.124939
-0.649086	is often more reliable	-0.124939
-0.247008	it gives more reliable	-0.124939
-0.247008	often gives more reliable	-0.124939
-1.052834	and the most reliable	-0.124939
-1.087600	order to get reliable	-0.124939
-0.892739	Comes with the Borland	-0.124939
-0.570189	intended, while the Borland	-0.124939
-0.358667	included. Combining the Borland	-0.124939
-0.357838	PathScale Gnu Intel Borland	-0.124939
-0.738840	and 64-bit Windows. Borland	-0.124939
-0.336303	v. 9.0 CodeGear Borland	-0.124939
-0.237934	(Visual Studio 2005). Borland	-0.124939
-0.579493	operations in the sense	-0.124939
-0.579493	macro in the sense	-0.124939
-0.199512	portable in the sense	-0.425969
-0.579493	serial in the sense	-0.124939
-0.579493	overdetermined in the sense	-0.124939
-0.554832	where it makes sense	-0.124939
-0.600259	supported in the latest	-0.124939
-0.874840	one for the latest	-0.124939
-0.587705	122) for the latest	-0.124939
-1.086305	For example, the latest	-0.124939
-0.540386	2. Use the latest	-0.124939
-0.462737	user gets the latest	-0.124939
-0.358852	systems. 3 The latest	-0.124939
-0.581378	&CriticalFunction_386; } // Now	-0.124939
-0.357714	| 0x3F800000; // Now	-0.124939
-0.736060	r points to. Now	-0.124939
-0.473652	problems mentioned above. Now	-0.124939
-0.237926	(c + d); Now	-0.124939
-0.237926	been brutally interrupted. Now	-0.124939
-0.237926	sum = (s0+s1)+(s2+s3); Now	-0.124939
-0.504047	of the execution units	-0.124939
-0.059155	CPUs with execution units	-0.124939
-0.401327	the different execution units	-0.124939
-0.309234	full 128-bit execution units	-0.124939
-0.355684	largest vector. These units	-0.124939
-0.452044	CPU chip. Such units	-0.124939
-0.556796	want it to do.	-0.124939
-0.462761	obvious thing to do.	-0.124939
-0.358239	cleanup jobs to do.	-0.124939
-0.463192	it can not do.	-0.124939
-0.539015	reductions they cannot do.	-0.124939
-0.354020	which few programs do.	-0.124939
-0.237934	CPUs, but event-counters do.	-0.124939
-0.198872	cycle is the reciprocal	-0.425969
-0.566484	better: store the reciprocal	-0.124939
-0.725824	and insert the reciprocal	-0.124939
-0.358585	multiply by - reciprocal	-0.124939
-0.408187	reciprocal, fast approximate reciprocal	-0.124939
-0.237943	multiply by xx-xx--x- reciprocal	-0.124939
-0.027028	void StoreVector(void * d,	-0.602060
-0.340117	void StoreVectorA(void * d,	-0.124939
-0.420070	a, b, c, d,	-0.301030
-0.234565	code into multiple threads.	-0.124939
-0.101635	work into multiple threads.	-0.124939
-0.234565	tasks into multiple threads.	-0.124939
-0.234565	job into multiple threads.	-0.124939
-0.357241	and communicating between threads.	-0.124939
-0.237951	starting and stopping threads.	-0.124939
-0.806048	some cases, the log	-0.124939
-0.358866	sqrt, pow and log	-0.124939
-0.358830	a password. The log	-0.124939
-0.578998	here: a[i] = log	-0.124939
-0.463346	turn off or log	-0.124939
-0.352373	databases usually requires log	-0.124939
-0.771705	critical innermost loop. log	-0.124939
-0.541287	function stores the thousand	-0.124939
-0.590618	function on a thousand	-0.124939
-1.191281	every time a thousand	-0.124939
-0.724590	or even a thousand	-0.124939
-0.357988	loop repeats a thousand	-0.124939
-0.917459	an array of thousand	-0.124939
-0.550710	one time in thousand	-0.124939
-0.463627	for implementing a compile-time	-0.124939
-0.358866	time if and compile-time	-0.124939
-0.358699	time loops or compile-time	-0.124939
-0.504615	polymorphism or with compile-time	-0.124939
-1.343060	is known at compile-time	-0.124939
-0.502869	works for any compile-time	-0.124939
-0.353632	D language allows compile-time	-0.124939
-0.594541	here is to remove	-0.124939
-1.151850	you need to remove	-0.124939
-0.657815	the linker to remove	-0.124939
-0.462332	names. Remember to remove	-0.124939
-0.358718	doesn't add or remove	-0.124939
-0.591723	zero. You may remove	-0.124939
-0.294256	threads can add, remove	-0.124939
-0.358929	processors. Hyperthreading is Intel's	-0.124939
-0.593065	old version of Intel's	-0.124939
-0.463590	are supplied in Intel's	-0.124939
-0.358644	is supplied with Intel's	-0.124939
-0.351290	fit their CPUs. Intel's	-0.124939
-0.037167	you are overriding Intel's	-0.425969
-1.233147	rather than by 16.	-0.124939
-0.984707	is divisible by 16.	-0.124939
-0.826616	address divisible by 16.	-0.124939
-0.426644	addresses divisible by 16.	-0.124939
-1.683559	explained on page 16.	-0.124939
-0.325421	as i modulo 16.	-0.124939
-0.870927	be transferred in registers,	-0.124939
-0.659831	are transferred in registers,	-0.124939
-0.502990	be saved in registers,	-0.124939
-0.562601	works only on registers,	-0.124939
-0.339479	The older MMX registers,	-0.124939
-0.237943	saving and restoring registers,	-0.124939
-0.093303	// function to transpose	-0.301030
-0.897259	more time to transpose	-0.124939
-0.831141	long time to transpose	-0.124939
-1.807155	it takes to transpose	-0.124939
-0.461212	matrix // call transpose	-0.124939
-1.330461	don't have to wait	-0.124939
-0.469047	element has to wait	-0.124939
-0.469047	addition has to wait	-0.124939
-0.469047	actually has to wait	-0.124939
-0.358882	of seconds and wait	-0.124939
-0.526425	DelayFiveSeconds function will wait	-0.124939
-0.355976	of x must wait	-0.124939
-0.861509	to any other number.	-0.124939
-1.676649	the floating point number.	-0.124939
-0.760201	as a 32-bit number.	-0.124939
-0.648971	an 8-bit signed number.	-0.124939
-0.209081	name and model number.	-0.124939
-0.179185	family and model number.	-0.124939
-0.599502	chance that the break	-0.124939
-0.358821	spot. Repeating the break	-0.124939
-1.327353	of how to break	-0.124939
-1.375116	no need to break	-0.124939
-1.214194	then it will break	-0.124939
-0.294256	debugger and press break	-0.124939
-0.596135	multiplying with a constant.	-0.124939
-0.358793	point to are constant.	-0.124939
-0.463354	is large or constant.	-0.124939
-0.599111	by the same constant.	-0.124939
-0.357983	a positive integer constant.	-0.124939
-0.579041	a double precision constant.	-0.124939
-0.093317	in the procedure linkage	-0.124939
-0.014226	in a procedure linkage	-0.124939
-0.028934	uses a procedure linkage	-0.124939
-0.093317	functions, called procedure linkage	-0.124939
-0.093317	an ordinary procedure linkage	-0.124939
-0.497723	error code if possible,	-0.124939
-0.353704	static variables if possible,	-0.124939
-0.457001	possible implementation if possible,	-0.124939
-0.353704	be avoided, if possible,	-0.124939
-0.353704	always normalized, if possible,	-0.124939
-0.358625	few branches as possible,	-0.124939
-0.133075	that the bit scan	-0.124939
-0.133075	use the bit scan	-0.124939
-0.325911	of this bit scan	-0.124939
-0.133075	a slow bit scan	-0.124939
-0.133075	with slow bit scan	-0.124939
-0.237959	a BSF (bit scan	-0.124939
-0.441168	to 32 bit systems:	-0.124939
-0.441168	over 32 bit systems:	-0.124939
-0.022454	int in 16-bit systems:	-0.249877
-0.591731	operand is more predictable	-0.124939
-0.546916	comparisons are more predictable	-0.124939
-0.885349	put the most predictable	-0.124939
-0.759233	depending on how predictable	-0.124939
-0.384071	it is poorly predictable	-0.124939
-0.269234	replace a poorly predictable	-0.124939
-0.669115	{ cout << "Hello	-0.425969
-0.021918	directly // Writes "Hello	-0.124939
-0.010821	1" // Writes "Hello	-0.425969
-0.021918	p2->Hello(); // Writes "Hello	-0.124939
-0.557238	this way is equal	-0.124939
-0.462886	of list[i] is equal	-0.124939
-0.358337	each label is equal	-0.124939
-1.071604	happen to be equal	-0.124939
-0.358551	and put an equal	-0.124939
-0.574712	p is therefore equal	-0.124939
-0.358839	register keyword. The CodeGear	-0.124939
-0.461382	(three parameters on CodeGear	-0.124939
-0.357155	two (three on CodeGear	-0.124939
-0.357838	32-bit Mac Intel CodeGear	-0.124939
-0.355217	Windows. Borland / CodeGear	-0.124939
-0.237934	2008, v. 9.0 CodeGear	-0.124939
-1.060142	The code is compact	-0.124939
-0.556345	data is more compact	-0.124939
-0.556345	member is more compact	-0.124939
-0.548734	faster and more compact	-0.124939
-1.007353	the code more compact	-0.124939
-0.348682	be made more compact	-0.124939
-0.584363	calculation of this polynomial	-0.124939
-0.358332	+ C; } polynomial	-0.124939
-0.237952	counter // Calculate polynomial	-0.124939
-0.237952	Example 8.23b. Calculate polynomial	-0.124939
-0.314783	an n'th degree polynomial	-0.124939
-0.294247	vector parameters Vec4f polynomial	-0.124939
-0.573342	(Compile without the Common	-0.124939
-0.357983	subexpression elimin., integer Common	-0.124939
-0.351309	a += 2; Common	-0.124939
-0.779913	XMM (vector) reductions: Common	-0.124939
-0.429522	propagation Pointer elimination Common	-0.124939
-0.237926	VHDL or Verilog. Common	-0.124939
-1.029989	A function that reads	-0.124939
-0.358705	normal writes or reads	-0.124939
-0.590052	that a program reads	-0.124939
-0.550001	0x2710 and later reads	-0.124939
-0.345256	efficiency. Using unaligned reads	-0.124939
-0.294237	the program afterwards reads	-0.124939
-0.526329	value from memory plus	-0.124939
-1.238135	float or double plus	-0.124939
-0.357062	a base address plus	-0.124939
-0.580247	plus a constant plus	-0.124939
-0.354897	beginning of list plus	-0.124939
-0.341865	the preceding label plus	-0.124939
-0.373075	49 and manual 5:	-0.124939
-0.300495	explained in manual 5:	-0.124939
-0.300495	given in manual 5:	-0.124939
-0.300495	19 in manual 5:	-0.124939
-0.300495	code" in manual 5:	-0.124939
-0.286241	cases. See manual 5:	-0.124939
-2.117932	in order to increase	-0.124939
-0.876675	The way to increase	-0.124939
-0.594491	Windows you can increase	-0.124939
-0.572823	systems, you cannot increase	-0.124939
-0.352371	your modifications actually increase	-0.124939
-0.294247	a possible minor increase	-0.124939
-0.357595	casting // C++ casting	-0.124939
-0.325437	rather than type casting	-0.124939
-0.325437	over- loaded type casting	-0.124939
-0.325437	// C-style type casting	-0.124939
-0.325437	// Constructor-style type casting	-0.124939
-0.336329	is safer. Type casting	-0.124939
-0.357597	extra time, of course,	-0.124939
-0.357597	is inefficient, of course,	-0.124939
-0.357597	programming practice, of course,	-0.124939
-0.357597	be omitted, of course,	-0.124939
-0.357597	This requires, of course,	-0.124939
-0.237959	division faster. Of course,	-0.124939
-0.600398	visible in the scope	-0.124939
-1.776150	to make the scope	-0.124939
-0.027995	is beyond the scope	-0.602060
-0.550813	name, regardless of scope	-0.124939
-0.659949	example shows the principle	-0.124939
-1.024212	function calls. The principle	-0.124939
-0.358201	go undetected. The principle	-0.124939
-0.358589	this manually. This principle	-0.124939
-0.562320	normally use this principle	-0.124939
-1.655451	use the same principle	-0.124939
-0.597936	latency and the throughput	-0.124939
-0.893062	limited by the throughput	-0.124939
-1.451606	rather than the throughput	-0.124939
-0.787805	to increase the throughput	-0.124939
-0.054850	3.16 Execution unit throughput	-0.124939
-0.181346	the time is spent	-0.124939
-0.463031	you would have spent	-0.124939
-0.567501	only the time spent	-0.124939
-0.567501	reducing the time spent	-0.124939
-0.586037	the clock cycles spent	-0.124939
-1.562340	int size = 16;	-0.124939
-0.924295	= b / 16;	-0.124939
-0.623801	(unsigned int)b / 16;	-0.124939
-0.689377	= b % 16;	-0.124939
-0.556920	(unsigned int)b % 16;	-0.124939
-0.339486	1; n <= 16;	-0.124939
-0.463208	the name of Func	-0.124939
-0.504747	; start of Func	-0.124939
-1.056537	the first time Func	-0.124939
-0.578910	re-calculated every time Func	-0.124939
-0.504130	; return from Func	-0.124939
-0.356506	Example 8.25 void Func	-0.124939
-1.572637	you have to identify	-0.124939
-2.092829	in order to identify	-0.124939
-1.230081	best way to identify	-0.124939
-0.878921	discussed how to identify	-0.124939
-0.549051	be enough to identify	-0.124939
-0.357232	the debugger to identify	-0.124939
-0.593445	number, which is 15	-0.124939
-0.463565	between 2 and 15	-0.124939
-1.360678	last byte at 15	-0.124939
-0.490420	a memory pool. 15	-0.124939
-0.325391	programming .......................................................................................... 150 15	-0.124939
-0.237926	See page 90. 15	-0.124939
-0.357447	cycles. Division takes 14	-0.124939
-0.353430	in Mac systems. 14	-0.124939
-0.348353	sake of optimization. 14	-0.124939
-0.339487	compiler ......................................................................... 130 14	-0.124939
-0.237926	user interface framework........................................................................... 14	-0.124939
-0.237926	the C++ language...................................................... 14	-0.124939
-0.889530	how to do this.	-0.124939
-0.523446	type to avoid this.	-0.124939
-0.523446	measurements to avoid this.	-0.124939
-0.573596	able to see this.	-0.124939
-0.336312	not always avoiding this.	-0.124939
-0.839863	following example illustrates this.	-0.124939
-0.357983	Register variables, integer Register	-0.124939
-0.357644	subexpression elimin., float Register	-0.124939
-0.460937	in assembly code. Register	-0.124939
-0.488024	compilation is finished. Register	-0.124939
-0.325407	for integer constants. Register	-0.124939
-0.382862	temp / 4; Register	-0.124939
-0.591879	package on a complex	-0.124939
-0.558141	type is more complex	-0.124939
-0.558141	situation is more complex	-0.124939
-0.494080	But in more complex	-0.124939
-0.569454	expressions or more complex	-0.124939
-0.358354	of operations. A complex	-0.124939
-0.356805	to suboptimal code. Intrinsic	-0.124939
-0.353422	Function Assembly name Intrinsic	-0.124939
-0.352942	are summarized below. Intrinsic	-0.124939
-0.346459	few machine instructions. Intrinsic	-0.124939
-0.339468	in either case. Intrinsic	-0.124939
-0.237926	AVX2 Table 12.3. Intrinsic	-0.124939
-0.463632	and destructors to call.	-0.124939
-0.582677	for the function call.	-0.124939
-0.582677	across the function call.	-0.124939
-1.065366	through a function call.	-0.124939
-0.503215	Dispatch on first call.	-0.124939
-0.501077	Dispatch on every call.	-0.124939
-0.447612	compiler you will notice	-0.124939
-0.447612	compiler, you will notice	-0.124939
-0.141011	first thing we notice	-0.124939
-0.141011	second thing we notice	-0.124939
-0.093313	163 20 Copyright notice	-0.124939
-0.093313	links. 20 Copyright notice	-0.124939
-0.542440	+ i); // Add	-0.425969
-0.822297	the application program. Add	-0.124939
-0.350341	function local: 1. Add	-0.124939
-0.804931	of the library. Add	-0.124939
-0.550593	with CPU dispatching. Add	-0.124939
-0.329605	used is branch prediction.	-0.124939
-0.101565	explanation of branch prediction.	-0.124939
-0.329605	43 about branch prediction.	-0.124939
-0.329605	from poor branch prediction.	-0.124939
-0.559226	made the right prediction.	-0.124939
-1.064864	up with the expected	-0.124939
-0.599707	Gnu. It is expected	-0.124939
-1.605911	it can be expected	-0.124939
-0.589299	same can be expected	-0.124939
-0.589299	applications can be expected	-0.124939
-0.358802	instruction set are expected	-0.124939
-0.594936	data is to declare	-0.124939
-0.589472	also recommended to declare	-0.124939
-0.725786	is preferred to declare	-0.124939
-0.463579	#include directives and declare	-0.124939
-0.591723	called. You may declare	-0.124939
-0.591554	size if you declare	-0.124939
-0.896868	good for the application.	-0.124939
-0.596860	compiler with the application.	-0.124939
-0.659353	that fits the application.	-0.124939
-0.334755	in the particular application.	-0.124939
-0.950405	in a particular application.	-0.124939
-0.237951	than an MFC application.	-0.124939
-1.464956	at compile time here.	-0.124939
-0.358071	memory model used here.	-0.124939
-0.351298	simply not appropriate here.	-0.124939
-0.331925	a few pitfalls here.	-0.124939
-0.314756	a little odd here.	-0.124939
-0.294237	b is 400 here.	-0.124939
-1.632921	size of the largest	-0.124939
-0.595215	larger than the largest	-0.124939
-0.575207	know whether the largest	-0.124939
-0.031661	finding the numerically largest	-0.124939
-0.031661	finds the numerically largest	-0.124939
-0.065813	// Find numerically largest	-0.124939
-1.366887	even if the dispatched	-0.124939
-0.583206	table. If a dispatched	-0.124939
-0.463194	// go to dispatched	-0.124939
-0.358580	// Entry to dispatched	-0.124939
-0.358893	// continue in dispatched	-0.124939
-0.523116	function calls another dispatched	-0.124939
-1.115277	of the data members.	-0.124939
-0.545124	reordering the data members.	-0.124939
-1.047775	Alignment of data members.	-0.124939
-0.513754	copying all data members.	-0.124939
-0.346633	cannot modify data members.	-0.124939
-0.816130	the child class members.	-0.124939
-0.540530	data size that fits	-0.124939
-0.539976	the version that fits	-0.124939
-0.595888	small that it fits	-0.124939
-0.295535	depends on what fits	-0.124939
-0.295535	depending on what fits	-0.124939
-0.382884	of four float's fits	-0.124939
-0.358581	- xx(-)x- - x-xxxx--x	-0.124939
-0.514682	method Function inlining x-xxxx--x	-0.124939
-0.336293	xx(-)x- - x-xxxx--x x-xxxx--x	-0.124939
-0.538904	(a&b)|(a&c) = a&(b|c) x-xxxx--x	-0.124939
-0.237926	(a|b)&(a|c) = a|(b&c) x-xxxx--x	-0.124939
-0.237926	always true/false Loopunrolling x-xxxx--x	-0.124939
-1.352194	is used for giving	-0.124939
-0.970847	are used for giving	-0.124939
-0.358673	the CPU by giving	-0.124939
-0.764722	it. I am giving	-0.124939
-0.294247	DoThisThreeTimesAWeek(); } By giving	-0.124939
-0.237934	as a subset, giving	-0.124939
-0.579340	that floating point comparisons	-0.124939
-1.034871	makes floating point comparisons	-0.124939
-0.516751	cycles. Floating point comparisons	-0.124939
-0.516751	organized. Floating point comparisons	-0.124939
-0.455197	} The two comparisons	-0.124939
-0.352282	condition. Replacing two comparisons	-0.124939
-0.357838	page 131. Intel Performance	-0.124939
-0.657182	Report on C++ Performance	-0.124939
-0.451381	computation time. 4 Performance	-0.124939
-0.349268	....................................................................................... 22 4 Performance	-0.124939
-0.421418	and the "Intel Performance	-0.124939
-0.237934	Library" and "Integrated Performance	-0.124939
-0.343777	the stack (see above,	-0.124939
-0.343777	critical stride (see above,	-0.124939
-0.581594	pipelined, as explained above,	-0.124939
-0.340547	perfectly. As explained above,	-0.124939
-0.607456	in example 16.2 above,	-0.124939
-0.237943	the function bodies above,	-0.124939
-0.581595	pool, as explained above.	-0.124939
-0.340549	lookup mechanisms explained above.	-0.124939
-0.640509	the advice given above.	-0.124939
-0.237963	collection, as mentioned above.	-0.124939
-0.237963	the problems mentioned above.	-0.124939
-0.237963	storage methods mentioned above.	-0.124939
-0.345222	a memory address. Pointer	-0.124939
-0.331912	Microsoft Constant propagation Pointer	-0.124939
-0.294237	page 81). 77 Pointer	-0.124939
-0.294237	details about rounding. Pointer	-0.124939
-0.237926	functions like sin. Pointer	-0.124939
-0.237926	can be accessed. Pointer	-0.124939
-0.595723	purpose is to detect	-0.124939
-0.504913	smart. They can detect	-0.124939
-1.044996	that it may detect	-0.124939
-0.358338	[] operator will detect	-0.124939
-0.469050	compilers can automatically detect	-0.124939
-0.332934	program should automatically detect	-0.124939
-0.879469	without using the normal	-0.124939
-0.596053	just as a normal	-0.124939
-0.527272	priority back to normal	-0.124939
-0.358871	if nonzero and normal	-0.124939
-0.358644	nontemporal writes with normal	-0.124939
-1.397256	more time than normal	-0.124939
-0.352667	based on compilers. Several	-0.124939
-0.510532	standard function libraries. Several	-0.124939
-0.314756	163 Internet forums Several	-0.124939
-0.237926	not support SSE. Several	-0.124939
-0.237926	Windows Library (OWL). Several	-0.124939
-0.237926	example is Perl. Several	-0.124939
-1.896494	then it is convenient	-0.124939
-0.599574	list is a convenient	-0.124939
-1.820053	it may be convenient	-0.124939
-1.498347	can also be convenient	-0.124939
-0.935013	may be more convenient	-0.124939
-0.355940	is certainly more convenient	-0.124939
-0.463628	example only to show	-0.124939
-0.979717	a time and show	-0.124939
-0.358412	3 breakpoint and show	-0.124939
-0.540050	user but only show	-0.124939
-0.314766	134 and 135 show	-0.124939
-0.314783	in table 9.1 show	-0.124939
-0.358420	number 16 in column	-0.124939
-0.915637	cache lines in column	-0.124939
-0.463282	these elements with column	-0.124939
-0.596132	(column = 0; column	-0.124939
-0.331909	we are swapping column	-0.124939
-0.237934	from the leftmost column	-0.124939
-0.024498	parm1, int parm2) {...}	-0.903090
-0.872227	of function libraries Test	-0.124939
-0.550593	without CPU dispatching. Test	-0.124939
-0.439031	fragmented hard disk. Test	-0.124939
-0.698843	and branch mispredictions. Test	-0.124939
-0.102873	124 2 13.4 Test	-0.124939
-0.102873	reliable decision. 13.4 Test	-0.124939
-0.463629	full declaration of c1	-0.124939
-0.358877	Loop r1 and c1	-0.124939
-0.834371	about the class c1	-0.124939
-0.353735	Example 7.28 class c1	-0.124939
-0.596132	(c1 = 0; c1	-0.124939
-0.314766	c1 < r1; c1	-0.124939
-0.841702	+ d = x-	-0.124939
-1.680475	x x x x-	-0.602060
-0.352868	- xx x x-	-0.124939
-0.336346	d = x- x-	-0.124939
-0.358744	to measure // Number	-0.124939
-0.064171	each element, bits Number	-0.425969
-0.350341	evict number 1. Number	-0.124939
-0.294247	in this column. Number	-0.124939
-0.294247	class libraries 113 Number	-0.124939
-0.599663	believe that the portability	-0.124939
-1.582508	There is a portability	-0.124939
-1.486331	the sake of portability	-0.124939
-0.358693	not recommended if portability	-0.124939
-0.358378	viable compromise when portability	-0.124939
-0.314756	compromise between efficiency, portability	-0.124939
-0.358744	#include <pmmintrin.h> // SSE3	-0.124939
-0.646222	and double vectors SSE3	-0.124939
-0.382873	-msse2 /arch:SSE2 -msse2 SSE3	-0.124939
-0.102873	instruction set Suppl. SSE3	-0.124939
-0.102873	SSE3 pmmintrin.h Suppl. SSE3	-0.124939
-0.237934	xmmintrin.h SSE2 emmintrin.h SSE3	-0.124939
-1.767196	the compiler to evaluate	-0.124939
-0.857394	same time to evaluate	-0.124939
-1.570949	be able to evaluate	-0.124939
-0.497700	a needs to evaluate	-0.124939
-0.497700	b needs to evaluate	-0.124939
-0.460566	and they always evaluate	-0.124939
-0.065741	17 Optimization in embedded	-0.425969
-0.358653	and machines with embedded	-0.124939
-0.584820	used in some embedded	-0.124939
-0.341882	used in small embedded	-0.124939
-0.489132	except for small embedded	-0.124939
-0.357372	Other manuals by Agner	-0.124939
-0.357372	is copyrighted by Agner	-0.124939
-0.657764	Same example, using Agner	-0.124939
-0.357838	class library Intel Agner	-0.124939
-0.421401	Intel Vector class, Agner	-0.124939
-0.294247	Mac platforms By Agner	-0.124939
-0.897296	thanks to the availability	-0.124939
-0.597732	source, and the availability	-0.124939
-0.598600	test for the availability	-0.124939
-0.462930	will delay the availability	-0.124939
-0.358372	icon signaling the availability	-0.124939
-0.504912	the application. The availability	-0.124939
-0.599038	InstructionSet(): // Example 13.1	-0.124939
-1.010490	as in example 13.1	-0.124939
-0.762925	CriticalFunction in example 13.1	-0.124939
-0.762925	illustrated in example 13.1	-0.124939
-0.331928	instruction sets........................... 122 13.1	-0.124939
-0.294256	critical innermost loops. 13.1	-0.124939
-0.358932	a pointer, a reference,	-0.124939
-0.841262	a pointer or reference,	-0.124939
-0.441543	4 pointer or reference,	-0.124939
-0.441543	8 pointer or reference,	-0.124939
-0.237951	or a non-const reference,	-0.124939
-0.601120	runtime of the .NET	-0.124939
-1.192421	code for the .NET	-0.124939
-0.358206	runtime frameworks. The .NET	-0.124939
-0.358206	the mouse. The .NET	-0.124939
-0.429548	C#, Visual Basic .NET	-0.124939
-0.237943	languages in Microsoft's .NET	-0.124939
-0.498908	d; if (a !=	-0.124939
-0.325423	y && z !=	-0.124939
-0.325407	1.0; while (n !=	-0.124939
-0.595238	{ if (b !=	-0.124939
-0.237926	7.8 if (handle !=	-0.124939
-0.237926	string; while (*p !=	-0.124939
-0.458516	disk. A few files,	-0.124939
-0.299768	configuration files, resource files,	-0.124939
-0.299768	shared objects), resource files,	-0.124939
-0.343757	updates, remote help files,	-0.124939
-0.212323	resource files, configuration files,	-0.124939
-0.212323	of DLLs, configuration files,	-0.124939
-0.765477	Pointers and references Pointers	-0.124939
-0.319761	Pointers versus references Pointers	-0.124939
-0.527385	by each thread. Pointers	-0.124939
-0.345241	a valid address. Pointers	-0.124939
-0.102876	vector operations. 7.6 Pointers	-0.124939
-0.102876	Booleans................................................................................................................... 33 7.6 Pointers	-0.124939
-0.585543	at more than half	-0.124939
-0.576349	at less than half	-0.124939
-0.358283	is handled at half	-0.124939
-0.990624	there is only half	-0.124939
-0.351505	etc. of only half	-0.124939
-0.351505	by modifying only half	-0.124939
-1.582519	is used for converting	-0.124939
-0.816351	extra instructions for converting	-0.124939
-0.579050	7.4 we are converting	-0.124939
-0.462943	Use signed when converting	-0.124939
-0.357067	to signed before converting	-0.124939
-0.314783	line is implicitly converting	-0.124939
-0.358159	The problem only occurs	-0.124939
-0.155989	if an exception occurs	-0.124939
-0.351303	a task switch occurs	-0.124939
-0.346478	the same subexpression occurs	-0.124939
-0.442046	times an interrupt occurs	-0.124939
-0.719240	|= 0x80000000; // Set	-0.124939
-0.065390	= InstructionSet(); // Set	-0.425969
-0.355680	instrset_detect(); 116 // Set	-0.124939
-0.237951	// Example 7.6. Set	-0.124939
-0.237951	// Example 7.5. Set	-0.124939
-0.659274	The exception is costly	-0.124939
-0.659274	the conversion is costly	-0.124939
-0.358798	programming constructs are costly	-0.124939
-0.354346	slice are quite costly	-0.124939
-0.429535	which are relatively costly	-0.124939
-0.444382	abuse is extremely costly	-0.124939
-0.571500	program on the newest	-0.124939
-0.571500	best on the newest	-0.124939
-0.843980	running on the newest	-0.124939
-1.806860	to use the newest	-0.124939
-1.155185	of using the newest	-0.124939
-0.504912	to vectorization. The newest	-0.124939
-0.358835	a standard for specifying	-0.124939
-0.358673	is declared by specifying	-0.124939
-0.356590	an int, without specifying	-0.124939
-0.547419	a copy constructor specifying	-0.124939
-0.044153	of standard C, specifying	-0.425969
-1.106663	A branch that follows	-0.124939
-0.592976	statement if it follows	-0.124939
-1.208725	be implemented as follows	-0.124939
-0.461276	be vectorized as follows	-0.124939
-1.205573	the function pointer follows	-0.124939
-0.237934	arguments. This closely follows	-0.124939
-0.851785	The result of comparing	-0.124939
-0.564642	numbers simply by comparing	-0.124939
-0.357376	of doubles by comparing	-0.124939
-1.552374	more efficient than comparing	-0.124939
-0.356681	element. Rather than comparing	-0.124939
-0.382884	a loop counter, comparing	-0.124939
-0.598458	function. This is efficient,	-0.124939
-0.463565	is fast and efficient,	-0.124939
-1.046532	the code more efficient,	-0.124939
-0.358074	more primitive, but efficient,	-0.124939
-0.356670	to make pointers efficient,	-0.124939
-0.354337	is also quite efficient,	-0.124939
-0.805991	next generation of computers	-0.124939
-0.541065	development, and that computers	-0.124939
-1.307201	This is because computers	-0.124939
-0.493050	why all modern computers	-0.124939
-0.193374	have more powerful computers	-0.124939
-0.193374	ever more powerful computers	-0.124939
-0.589642	last all the B	-0.124939
-1.636452	the calculation of B	-0.124939
-0.552568	and the four B	-0.124939
-0.314783	values of A, B	-0.124939
-0.023525	A = 1.1, B	-0.425969
-0.502067	for system code. System	-0.124939
-0.102876	access................................................................................................................ 20 3.8 System	-0.124939
-0.102876	to finish. 3.8 System	-0.124939
-0.102876	code.................................................................................. 148 14.13 System	-0.124939
-0.102876	OS X. 14.13 System	-0.124939
-0.538937	of everything else. System	-0.124939
-0.437608	a series of five	-0.124939
-0.358427	This series of five	-0.124939
-0.575609	are up to five	-0.124939
-0.462555	example, then all five	-0.124939
-0.449160	value has changed five	-0.124939
-0.582891	result of each step	-0.124939
-1.283907	in a single step	-0.124939
-0.850836	to the next step	-0.124939
-0.775614	in the second step	-0.124939
-0.458799	through a second step	-0.124939
-0.237934	more space 91 step	-0.124939
-0.550787	Data caching is poor	-0.124939
-0.563017	many examples of poor	-0.124939
-0.550772	higher due to poor	-0.124939
-0.598285	usability may be poor	-0.124939
-0.462818	often suffer from poor	-0.124939
-0.237926	reasonably well. Very poor	-0.124939
-1.337219	don't have to prefetch	-0.124939
-0.358834	Prefetching data The prefetch	-0.124939
-0.504911	execution mechanism can prefetch	-0.124939
-0.357090	level-2 cache cannot prefetch	-0.124939
-0.356265	that modern processors prefetch	-0.124939
-0.353636	able to automatically prefetch	-0.124939
-0.463147	code gives an 9	-0.124939
-0.357494	= i * 9	-0.124939
-0.357225	perfectly varies between 9	-0.124939
-0.325407	and pop ebx. 9	-0.124939
-0.314777	3, 4, 6, 9	-0.124939
-0.294237	does ............................................................................. 84 9	-0.124939
-0.540901	spend time on deciding	-0.124939
-0.348592	fine-grained parallelism when deciding	-0.124939
-0.182879	into account when deciding	-0.602060
-0.450526	the disadvantages when deciding	-0.124939
-0.585983	GOT through a self-relative	-0.124939
-1.636452	the calculation of self-relative	-0.124939
-0.463520	no instruction for self-relative	-0.124939
-0.582612	procedure to calculate self-relative	-0.124939
-0.060617	instruction set supports self-relative	-0.124939
-0.358733	* DynamicArray = (float	-0.124939
-0.347508	8.1a float square (float	-0.124939
-0.037168	8.3a float parabola (float	-0.124939
-0.037168	a;} float parabola (float	-0.124939
-0.037168	8.1b float parabola (float	-0.124939
-0.294256	inline int lrintf (float	-0.124939
-1.399431	For example, a Core	-0.124939
-0.576026	for the Intel Core	-0.124939
-0.344143	128 bytes Intel Core	-0.124939
-0.344143	aligned operands Intel Core	-0.124939
-0.344143	unaligned op. Intel Core	-0.124939
-0.355615	_mm_exp_ps _mm_exp_pd AMD Core	-0.124939
-0.592237	stack in the debugger	-0.124939
-0.592237	see in the debugger	-0.124939
-0.592237	version) in the debugger	-0.124939
-0.599860	program in a debugger	-0.124939
-0.358843	prevent optimization. The debugger	-0.124939
-0.462899	with debugging. A debugger	-0.124939
-0.893499	bits with the ^	-0.124939
-0.881908	= a a ^	-0.124939
-1.005257	b = a ^	-0.124939
-0.563527	~b = a ^	-0.124939
-0.657993	= 0 a ^	-0.124939
-0.331943	(a&~b)|(~a&b)=a^b --------- ~a ^	-0.124939
-1.094784	the same time regardless	-0.124939
-0.599111	still the same regardless	-0.124939
-1.056365	in most cases, regardless	-0.124939
-0.341856	to be false regardless	-0.124939
-0.822877	transferred in registers, regardless	-0.124939
-0.237926	the same name, regardless	-0.124939
-0.874150	rounding instead of truncation	-0.124939
-0.764520	be changed to truncation	-0.124939
-0.550265	nor slower than truncation	-0.124939
-0.358401	to integers use truncation	-0.124939
-0.358198	is unfortunate because truncation	-0.124939
-0.607426	C/C++ standard specifies truncation	-0.124939
-1.635387	one of the base	-0.124939
-1.349911	pointer to a base	-0.124939
-0.890389	expressed as a base	-0.124939
-0.885019	deciding whether to base	-0.124939
-0.358259	help files, data base	-0.124939
-0.325401	if the image base	-0.124939
-1.200867	bits of the result.	-0.124939
-0.897212	produce the same result.	-0.124939
-1.298505	into a single result.	-0.124939
-0.458933	by the calculated result.	-0.124939
-0.521379	produces a negative result.	-0.124939
-0.350836	a low positive result.	-0.124939
-0.350349	CPU dispatching: 1. How	-0.124939
-0.212315	the compiler 8.1 How	-0.124939
-0.212315	.......................................................................................... 66 8.1 How	-0.124939
-0.314776	only four multiplications. How	-0.124939
-0.102876	................................................................................ 16 3.1 How	-0.124939
-0.102876	time consumers 3.1 How	-0.124939
-0.339765	break a dependency chain.	-0.124939
-0.143217	a long dependency chain.	-0.124939
-0.183327	a loop-carried dependency chain.	-0.124939
-0.183327	no loop-carried dependency chain.	-0.124939
-0.183327	No loop-carried dependency chain.	-0.124939
-0.356500	3.7 File access Reading	-0.124939
-0.842304	in the program. Reading	-0.124939
-0.347486	than random access. Reading	-0.124939
-0.687971	Use lookup tables Reading	-0.124939
-0.314756	0x20 = 0x1C. Reading	-0.124939
-0.294237	read from 0x4700. Reading	-0.124939
-0.591266	step where the compilation	-0.124939
-0.358718	of interpretation or compilation	-0.124939
-0.455339	language that requires compilation	-0.124939
-0.423685	code and just-in-time compilation	-0.124939
-0.224953	based on just-in-time compilation	-0.124939
-0.299316	machines use just-in-time compilation	-0.124939
-0.199259	finding the hot spots	-0.124939
-0.199259	once the hot spots	-0.124939
-0.197651	identifies any hot spots	-0.124939
-0.041630	to find hot spots	-0.124939
-0.197651	for identifying hot spots	-0.124939
-0.599335	says that the behavior	-0.124939
-0.956739	can change the behavior	-0.124939
-0.358670	to mimic the behavior	-0.124939
-0.463530	for details. The behavior	-0.124939
-0.500257	make the overflow behavior	-0.124939
-0.314776	version). This wasteful behavior	-0.124939
-0.358586	than normal. This happens	-0.124939
-0.358155	though this only happens	-0.124939
-0.462574	filled up, which happens	-0.124939
-0.457159	C++. This typically happens	-0.124939
-0.353207	look at what happens	-0.124939
-0.343745	languages where everything happens	-0.124939
-1.366871	or if the 7	-0.124939
-1.360678	last byte at 7	-0.124939
-0.459773	supported in Windows 7	-0.124939
-0.459796	Intel compiler versions 7	-0.124939
-0.314756	Development process...................................................................................................... 25 7	-0.124939
-0.237926	version control tool. 7	-0.124939
-0.353720	storage and page 87	-0.124939
-0.587677	contentions. See page 87	-0.124939
-0.336312	return from Func 87	-0.124939
-0.325401	memory access ............................................................................................. 87	-0.124939
-0.325401	and data ......................................................................................... 87	-0.124939
-0.237934	Cache organization ................................................................................................... 87	-0.124939
-0.540402	elements in vector Type	-0.124939
-0.339477	the following table. Type	-0.124939
-0.531582	elements, as follows: Type	-0.124939
-0.172686	this problem. 7.11 Type	-0.124939
-0.172686	..................................................................................................................... 38 7.11 Type	-0.124939
-0.294247	method is safer. Type	-0.124939
-0.556790	implemented in different places	-0.124939
-0.717767	around at different places	-0.124939
-0.355741	instructions at specific places	-0.124939
-0.355112	that is four places	-0.124939
-0.352950	that is n places	-0.124939
-0.351706	that lies r places	-0.124939
-0.506273	because the stack unwinding	-0.124939
-0.058112	cases of stack unwinding	-0.124939
-0.288074	situations: The stack unwinding	-0.124939
-0.288074	dependent. The stack unwinding	-0.124939
-0.392243	mechanism called stack unwinding	-0.124939
-1.615651	should preferably be static,	-0.124939
-0.113902	value. The keyword static,	-0.124939
-0.113902	optimizations. The keyword static,	-0.124939
-0.113902	context. The keyword static,	-0.124939
-0.113902	80. The keyword static,	-0.124939
-0.294265	is executed. Without static,	-0.124939
-0.125584	understand it. I am	-0.124939
-0.125584	recompile it. I am	-0.124939
-0.393594	optimization manuals. I am	-0.124939
-0.302969	to use. I am	-0.124939
-0.302969	this manual, I am	-0.124939
-0.302969	embedded microcontrollers. I am	-0.124939
-0.540926	function into a leaf	-0.124939
-0.540926	turned into a leaf	-0.124939
-0.833996	is called a leaf	-0.124939
-0.065403	other function. A leaf	-0.425969
-0.461491	a distinction between leaf	-0.124939
-1.082778	second operand is evaluated	-0.124939
-0.189335	should not be evaluated	-0.425969
-0.580080	macro parameters are evaluated	-0.124939
-0.358006	and || are evaluated	-0.124939
-0.896461	operand is not evaluated	-0.124939
-1.583382	be able to completely	-0.124939
-1.488856	code can be completely	-0.124939
-0.597421	count may be completely	-0.124939
-0.358711	but slow or completely	-0.124939
-0.463261	be used on completely	-0.124939
-0.347523	results or fail completely	-0.124939
-0.463586	reused again and again.	-0.124939
-0.043312	address and back again.	-0.124939
-0.021116	mode and back again.	-0.124939
-0.043312	truncation and back again.	-0.124939
-0.595280	interrupt 3 breakpoint again.	-0.124939
-1.089650	the availability of powerful	-0.124939
-0.358451	development tools have powerful	-0.124939
-0.497445	typically have more powerful	-0.124939
-0.456747	An even more powerful	-0.124939
-0.353504	in ever more powerful	-0.124939
-0.457826	are actually quite powerful	-0.124939
-0.592237	it in the form	-0.124939
-1.050354	classes in the form	-0.124939
-0.883661	either in the form	-0.124939
-1.037622	by any other form	-0.124939
-0.454925	processor model numbers form	-0.124939
-0.454459	stored in binary form	-0.124939
-1.433954	because it is deallocated	-0.124939
-0.107096	are allocated and deallocated	-0.425969
-0.591986	that they are deallocated	-0.124939
-0.576051	objects are also deallocated	-0.124939
-0.353644	space is automatically deallocated	-0.124939
-0.352942	other way three times.	-0.124939
-0.349787	and irregular response times.	-0.124939
-0.336303	has changed five times.	-0.124939
-0.488054	memory a hundred times.	-0.124939
-0.167672	collector at inconvenient times.	-0.124939
-0.167672	unpredictably at inconvenient times.	-0.124939
-0.567136	less useful in 32-	-0.124939
-0.884746	32-bit mode. The 32-	-0.124939
-0.179568	C++ compiler for 32-	-0.425969
-0.358341	two versions. A 32-	-0.124939
-0.350345	Intel libraries. Supports 32-	-0.124939
-0.584979	array a and edx	-0.124939
-0.462989	eax, ecx and edx	-0.124939
-0.805883	the value in edx	-0.124939
-0.358584	[edx] adds, not edx	-0.124939
-0.344077	= Induction ; edx	-0.124939
-0.344077	PTR [esp+12] ; edx	-0.124939
-0.520885	because it cannot rule	-0.124939
-0.667009	the compiler cannot rule	-0.124939
-0.348809	The compiler cannot rule	-0.124939
-0.056054	the strict aliasing rule	-0.425969
-0.336323	able to completely rule	-0.124939
-0.358895	two iterations in one.	-0.124939
-0.532287	making a new one.	-0.124939
-0.532287	create a new one.	-0.124939
-0.465144	of the preceding one.	-0.124939
-0.556388	on the previous one.	-0.124939
-1.314243	but this is permissible	-0.124939
-1.945301	This can be permissible	-0.124939
-0.358802	and multiplication are permissible	-0.124939
-1.619783	It is not permissible	-0.124939
-0.559923	reductions are not permissible	-0.124939
-0.559923	'>') are not permissible	-0.124939
-0.358523	to assume the worst	-0.124939
-0.981108	that gives the worst	-0.124939
-0.065754	to cover the worst	-0.425969
-0.526237	way, etc. The worst	-0.124939
-0.725117	is critical. The worst	-0.124939
-1.062370	this is the job	-0.124939
-0.580035	can do the job	-0.124939
-0.358521	have done the job	-0.124939
-1.223557	to divide the job	-0.124939
-0.581210	do the best job	-0.124939
-0.442054	that the background job	-0.124939
-0.357901	many commercial compilers due	-0.124939
-0.352054	to be higher due	-0.124939
-0.346471	a large delay due	-0.124939
-0.446277	in the future due	-0.124939
-0.294237	measurements are unstable due	-0.124939
-0.294237	are some differences due	-0.124939
-0.577981	double y = 1.0;	-0.124939
-0.065241	n, factorial = 1.0;	-0.124939
-0.354513	{ list[i].a = 1.0;	-0.124939
-0.354513	{ temp->a = 1.0;	-0.124939
-1.601104	x) { return 1.0;	-0.124939
-0.358824	Thin clients that depend	-0.124939
-0.358001	one iteration should depend	-0.124939
-1.114479	because it doesn't depend	-0.124939
-0.353433	the factorials don't depend	-0.124939
-0.353425	These workaround methods depend	-0.124939
-0.347498	other hardware-related details depend	-0.124939
-0.596384	access is the biggest	-0.124939
-1.258030	to fit the biggest	-0.124939
-0.065753	3 Finding the biggest	-0.425969
-0.358848	and compact. The biggest	-0.124939
-0.565514	c; // Define biggest	-0.124939
-0.353438	funny looking name ?Func@@YAXQAHAAH@Z	-0.124939
-0.051254	4 PUBLIC ?Func@@YAXQAHAAH@Z ?Func@@YAXQAHAAH@Z	-0.124939
-0.421427	pop ret ALIGN ?Func@@YAXQAHAAH@Z	-0.124939
-0.023525	ALIGN 4 PUBLIC ?Func@@YAXQAHAAH@Z	-0.425969
-0.592832	parallel because it defines	-0.124939
-0.556774	software programming language defines	-0.124939
-0.971119	hardware definition language defines	-0.124939
-0.353446	The type __m128i defines	-0.124939
-0.294247	The type __m128 defines	-0.124939
-0.237934	The type __m128d defines	-0.124939
-0.108383	ranges do not overlap.	-0.124939
-0.471816	ranges) do not overlap.	-0.124939
-1.601273	a and b overlap.	-0.124939
-0.341883	live ranges now overlap.	-0.124939
-0.124512	options. Supports parallel processing,	-0.124939
-0.124512	Mac. Supports parallel processing,	-0.124939
-0.325414	audio and video processing,	-0.124939
-0.325401	Examples are image processing,	-0.124939
-0.314766	video processing, signal processing,	-0.124939
-0.294247	image processing, sound processing,	-0.124939
-0.456405	x); } void SelectAddMul(short	-0.124939
-0.309588	with branch void SelectAddMul(short	-0.124939
-0.309588	vector classes void SelectAddMul(short	-0.124939
-0.499746	call inline void SelectAddMul(short	-0.124939
-0.309588	*)d, x);} void SelectAddMul(short	-0.124939
-0.309588	function vectorized: void SelectAddMul(short	-0.124939
-0.358970	often disturb the users	-0.124939
-0.454264	mind, that many users	-0.124939
-0.351545	used by many users	-0.124939
-0.502660	time for software users	-0.124939
-0.352972	RAM than end users	-0.124939
-0.350847	for many computer users	-0.124939
-0.358877	bits (YMM), and soon	-0.124939
-0.358610	becomes invalid as soon	-0.124939
-0.971523	then you will soon	-0.124939
-0.355703	my manual will soon	-0.124939
-0.331909	compiler for Basic soon	-0.124939
-0.325401	* 5). As soon	-0.124939
-0.582005	is using a six	-0.124939
-0.503943	now contains only six	-0.124939
-1.046768	that it takes six	-0.124939
-0.589822	Linux, the first six	-0.124939
-0.408190	variables is approximately six	-0.124939
-0.287628	There are approximately six	-0.124939
-0.346479	....................................................................................................... 150 16 Testing	-0.124939
-0.346479	to 15.1c). 16 Testing	-0.124939
-0.355760	16 Testing speed Testing	-0.124939
-0.237934	// Example 14.7b. Testing	-0.124939
-0.237934	// Example 14.7a. Testing	-0.124939
-0.237934	occurrence is rare. Testing	-0.124939
-0.426155	compiled code. In general,	-0.124939
-0.329207	the loop. In general,	-0.124939
-0.329207	equally fast. In general,	-0.124939
-0.329207	this condition. In general,	-0.124939
-0.329207	32-bit counterparts. In general,	-0.124939
-0.329207	number generators. In general,	-0.124939
-0.594153	trick is to roll	-0.124939
-0.874145	easy way to roll	-0.124939
-1.234558	be useful to roll	-0.124939
-1.160022	we want to roll	-0.124939
-1.379339	is advantageous to roll	-0.124939
-0.461730	understand when we roll	-0.124939
-1.001403	for intrinsic functions (i.e.	-0.124939
-1.037897	to do so (i.e.	-0.124939
-0.502759	All global variables (i.e.	-0.124939
-1.237848	powers of 2 (i.e.	-0.124939
-0.457838	function return addresses (i.e.	-0.124939
-0.730504	the same module (i.e.	-0.124939
-0.463565	into ecx and edx,	-0.124939
-0.504966	address is in edx,	-0.124939
-0.356993	1 eax, 8 edx,	-0.124939
-0.343723	[esp+8] eax, eax edx,	-0.124939
-0.339468	mov 2:8+esp eax, edx,	-0.124939
-0.314756	eax, edx, ecx, edx,	-0.124939
-0.615161	different implementations of C++,	-0.124939
-0.434659	Most implementations of C++,	-0.124939
-0.357539	Bounds checking In C++,	-0.124939
-0.346462	programming language, e.g. C++,	-0.124939
-0.325414	languages include C, C++,	-0.124939
-0.237934	code. C#, managed C++,	-0.124939
-0.026685	= LoadVector(cc + i);	-0.602060
-0.026685	= LoadVector(bb + i);	-0.602060
-0.567225	with members of mixed	-0.124939
-0.878127	they cannot be mixed	-0.124939
-0.784675	Do objects have mixed	-0.124939
-0.501272	where operands have mixed	-0.124939
-0.358341	careful optimization. A mixed	-0.124939
-0.294247	web application integration, mixed	-0.124939
-0.358700	6); Or, if protection	-0.124939
-0.586771	scanners and other protection	-0.124939
-0.394980	of a copy protection	-0.124939
-0.304093	protection. Some copy protection	-0.124939
-0.304093	updated. Most copy protection	-0.124939
-0.304093	breakdown. Many copy protection	-0.124939
-1.169997	of the loop counter.	-0.124939
-0.572500	from the loop counter.	-0.124939
-0.498150	a simple integer counter.	-0.124939
-0.354010	an additional integer counter.	-0.124939
-0.842195	the time stamp counter.	-0.124939
-0.384439	so-called time stamp counter.	-0.124939
-0.599811	integer to the structure.	-0.124939
-1.069674	member of a structure.	-0.124939
-1.183703	reference to a structure.	-0.124939
-0.915923	a class or structure.	-0.124939
-0.434482	object's class or structure.	-0.124939
-0.358250	the desired program structure.	-0.124939
-0.726795	an int is 4.	-0.124939
-0.964398	on a Pentium 4.	-0.124939
-0.277430	an Intel Pentium 4.	-0.124939
-0.509800	the old Pentium 4.	-0.124939
-0.237943	etc. for Linux) 4.	-0.124939
-0.237943	and compiler makers. 4.	-0.124939
-0.358831	the computer for security	-0.124939
-0.357497	in programs where security	-0.124939
-0.355112	overflow is another security	-0.124939
-0.851531	from the above security	-0.124939
-0.237926	is a compelling security	-0.124939
-0.237926	than third party security	-0.124939
-2.074725	the number of branches.	-0.124939
-0.527260	many calls and branches.	-0.124939
-0.562066	or no other branches.	-0.124939
-0.357481	Making too many branches.	-0.124939
-0.352933	implemented as three branches.	-0.124939
-0.408163	and other nearby branches.	-0.124939
-0.754918	short int 128 Is16vec8	-0.124939
-0.648951	b * c; Is16vec8	-0.124939
-0.930107	into vector c: Is16vec8	-0.124939
-0.930107	into vector b: Is16vec8	-0.124939
-0.836082	vector of (0,0,0,0,0,0,0,0) Is16vec8	-0.124939
-0.836082	vector of (2,2,2,2,2,2,2,2) Is16vec8	-0.124939
-0.524820	number of CPU cores.	-0.124939
-0.348081	between different CPU cores.	-0.124939
-0.771607	between multiple CPU cores.	-0.124939
-0.489908	Jumps between CPU cores.	-0.124939
-0.555691	CPU with multiple cores.	-0.124939
-0.355840	the multiple processor cores.	-0.124939
-1.048864	take care of communication	-0.124939
-0.526188	various methods for communication	-0.124939
-0.540084	be needed for communication	-0.124939
-0.595500	cache is that communication	-0.124939
-0.358200	fine-grained parallelism because communication	-0.124939
-0.355762	amount of necessary communication	-0.124939
-0.987491	used only for avoiding	-0.124939
-0.526201	suggests methods for avoiding	-0.124939
-0.523101	invalid, and by avoiding	-0.124939
-0.356082	this error by avoiding	-0.124939
-0.356082	may save by avoiding	-0.124939
-1.105473	are not always avoiding	-0.124939
-1.386643	or reference to anything	-0.124939
-1.013185	cannot rely on anything	-0.124939
-1.397256	more time than anything	-0.124939
-0.574437	necessary to optimize anything	-0.124939
-0.353224	does not cost anything	-0.124939
-0.575618	does not alias anything	-0.124939
-0.463409	} #endif // INSTRSET	-0.124939
-0.348372	The preprocessing macro INSTRSET	-0.124939
-0.279493	} } #if INSTRSET	-0.124939
-0.279493	instruction set #if INSTRSET	-0.124939
-0.102876	FUNCNAME SelectAddMul_SSE41 #elif INSTRSET	-0.124939
-0.102876	FUNCNAME SelectAddMul_SSE2 #elif INSTRSET	-0.124939
-0.356500	3.13 Memory access Accessing	-0.124939
-0.812056	a smart pointer. Accessing	-0.124939
-0.876573	and back again. Accessing	-0.124939
-0.325391	classes or structures. Accessing	-0.124939
-0.314756	data more compact. Accessing	-0.124939
-0.237926	a variable. Efficiency Accessing	-0.124939
-0.573311	internal variables and internal	-0.124939
-0.195797	not used for internal	-0.425969
-0.461863	and PLT for internal	-0.124939
-0.358653	function libraries with internal	-0.124939
-0.356511	we can access internal	-0.124939
-0.558442	int or by type-casting	-0.124939
-0.356078	child class by type-casting	-0.124939
-0.356078	different type by type-casting	-0.124939
-0.462948	aware of when type-casting	-0.124939
-0.343736	as C- style type-casting	-0.124939
-0.421414	than the C-style type-casting	-0.124939
-1.015815	determined by the requirements	-0.124939
-0.579895	influenced by the requirements	-0.124939
-0.596857	conflicting with the requirements	-0.124939
-0.355676	development process. These requirements	-0.124939
-0.554481	or turn off requirements	-0.124939
-0.490317	but the alignment requirements	-0.124939
-0.599811	all to the profiler.	-0.124939
-1.225033	is not a profiler.	-0.124939
-0.581486	to using a profiler.	-0.124939
-0.500575	a CPU- specific profiler.	-0.124939
-0.237943	using a ready-made profiler.	-0.124939
-0.550841	mode. Therefore, the __fastcall	-0.124939
-0.358696	optimize(...) Fastcall function __fastcall	-0.124939
-0.537659	functions The keyword __fastcall	-0.124939
-0.237934	__fastcall __attribute(( fastcall)) __fastcall	-0.124939
-0.237934	Fast function calling. __fastcall	-0.124939
-0.885038	may cause a loss	-0.124939
-0.828082	of overflow and loss	-0.124939
-0.504715	cause overflow or loss	-0.124939
-0.525045	with hardly any loss	-0.124939
-0.356084	to worry about loss	-0.124939
-0.580772	or any other cleanup	-0.124939
-0.539916	care of all cleanup	-0.124939
-0.552715	all the necessary cleanup	-0.124939
-0.456924	way of handling cleanup	-0.124939
-0.351307	functions that require cleanup	-0.124939
-0.212321	................................................................................................... 87 9.3 Functions	-0.124939
-0.212321	VIA CPUs". 9.3 Functions	-0.124939
-0.172696	Loops...................................................................................................................... 45 7.14 Functions	-0.124939
-0.172696	too big. 7.14 Functions	-0.124939
-0.237951	compiler, v. 10.1.020. Functions	-0.124939
-0.356167	approach to error handling.	-0.124939
-0.315861	debugging and exception handling.	-0.124939
-0.315861	relies on exception handling.	-0.124939
-0.409533	is no exception handling.	-0.124939
-0.409533	with structured exception handling.	-0.124939
-0.939726	of C++ and Fortran	-0.124939
-0.358418	C++, Pascal and Fortran	-0.124939
-0.463595	loops (except in Fortran	-0.124939
-0.237943	quite as versatile. Fortran	-0.124939
-0.237943	C++, D, Pascal, Fortran	-0.124939
-1.076752	discussion of the increment	-0.124939
-0.959327	the CPU to increment	-0.124939
-0.358583	used simply to increment	-0.124939
-0.891000	is the loop increment	-0.124939
-0.654210	said here about increment	-0.124939
-0.788799	function libraries and drivers	-0.124939
-0.193384	routines and device drivers	-0.124939
-0.193384	(except in device drivers	-0.124939
-0.193384	in 64-bit device drivers	-0.124939
-0.193384	C++. Critical device drivers	-0.124939
-0.614424	is important to economize	-0.301030
-0.435357	more important to economize	-0.124939
-0.504969	efficient library and economize	-0.124939
-1.574442	at compile time. Templates	-0.124939
-0.978939	a template parameter. Templates	-0.124939
-0.439051	x = 10; Templates	-0.124939
-0.314766	simple cases. 7.28 Templates	-0.124939
-0.237934	same template. 57 Templates	-0.124939
-0.294268	elements in row 28	-0.124939
-0.294268	elements from row 28	-0.124939
-0.336329	lines in column 28	-0.124939
-0.255928	elements with column 28	-0.124939
-0.294265	or 64-bit systems). 28	-0.124939
-0.358923	executes three to seven	-0.124939
-0.357158	algebraic expressions on seven	-0.124939
-0.357158	of experiments on seven	-0.124939
-0.500129	up to cause seven	-0.124939
-0.343747	precision of approximately seven	-0.124939
-1.074855	function can be turned	-0.124939
-0.462764	an STL vector turned	-0.124939
-0.159293	relevant optimization options turned	-0.301030
-0.842248	The order of inheritance	-0.124939
-0.448482	Alternative to multiple inheritance	-0.124939
-0.346975	such as multiple inheritance	-0.124939
-0.448482	may avoid multiple inheritance	-0.124939
-0.331930	Example 7.38a. Multiple inheritance	-0.124939
-0.874995	easiest way to overcome	-0.124939
-1.033736	for how to overcome	-0.124939
-0.740378	discussed how to overcome	-0.124939
-0.740378	discusses how to overcome	-0.124939
-0.900208	problem can be overcome	-0.124939
-0.647834	and difficult to maintain.	-0.124939
-0.647834	are difficult to maintain.	-0.124939
-0.455942	therefore difficult to maintain.	-0.124939
-0.937345	is easier to maintain.	-0.124939
-0.358893	to debug and maintain.	-0.124939
-0.157637	allow up to fourteen	-0.425969
-0.407590	totaling up to fourteen	-0.124939
-0.776941	32-bit systems and fourteen	-0.124939
-0.725923	operating systems and fourteen	-0.124939
-0.505058	program. Add to 122	-0.124939
-0.829397	system. See page 122	-0.124939
-0.563652	crash. See page 122	-0.124939
-0.237943	CPU dispatch strategies........................................................................................ 122	-0.124939
-0.237943	different instruction sets........................... 122	-0.124939
-0.351455	complicated and time consuming.	-0.124939
-0.351455	methods are time consuming.	-0.124939
-0.351455	is very time consuming.	-0.124939
-0.351455	be particularly time consuming.	-0.124939
-0.314803	and very time- consuming.	-0.124939
-0.358970	to justify the method.	-0.124939
-0.868388	discussion of this method.	-0.124939
-0.656029	on every call method.	-0.124939
-0.356783	this complicated template method.	-0.124939
-0.356252	contentions. Use simple method.	-0.124939
-1.482606	the sake of backwards	-0.124939
-0.550460	The sequence of backwards	-0.124939
-0.886617	systems are not backwards	-0.124939
-1.030464	data are accessed backwards	-0.124939
-0.314790	follow the track backwards	-0.124939
-0.463692	to mirror the remote	-0.124939
-0.596140	communication with a remote	-0.124939
-0.463628	locally. Access to remote	-0.124939
-0.358632	cache. Files on remote	-0.124939
-0.237934	for automatic updates, remote	-0.124939
-0.595308	types such as int,	-0.124939
-0.358551	you declare an int,	-0.124939
-0.461555	unsigned 2 2 int,	-0.124939
-0.345796	1 1 short int,	-0.124939
-0.345796	types: char, short int,	-0.124939
-0.358882	between c2 and bc	-0.124939
-1.040482	element in vector bc	-0.124939
-0.125388	and c __m128i bc	-0.425969
-0.325422	the inverted bit-mask: bc	-0.124939
-0.439922	compilers and development tools.	-0.124939
-0.311452	of advanced development tools.	-0.124939
-0.311452	of powerful development tools.	-0.124939
-0.317438	use standardized installation tools.	-0.124939
-0.317438	by individual installation tools.	-0.124939
-0.478697	integer in one operation.	-0.124939
-0.478697	additions in one operation.	-0.124939
-1.284005	in a single operation.	-0.124939
-0.282276	as a shift operation.	-0.124939
-0.282276	using a shift operation.	-0.124939
-0.587950	size in the future.	-0.124939
-1.163096	available in the future.	-0.124939
-0.587950	dominate in the future.	-0.124939
-0.587950	grow in the future.	-0.124939
-0.237959	a more distant future.	-0.124939
-0.463692	to force the swapping	-0.124939
-0.579050	time we are swapping	-0.124939
-0.358382	be careful when swapping	-0.124939
-0.358282	the excessive memory swapping	-0.124939
-0.341865	to disk. Memory swapping	-0.124939
-0.597643	bits when the AVX512	-0.124939
-0.053827	64 8 512 AVX512	-0.124939
-0.053827	32 16 512 AVX512	-0.124939
-1.575075	There is a considerable	-0.124939
-0.540112	collection takes a considerable	-0.124939
-0.462450	still give a considerable	-0.124939
-0.357995	of course a considerable	-0.124939
-0.358361	are called. A considerable	-0.124939
-0.505078	may remove the memset	-0.124939
-0.587346	explicit use of memset	-0.124939
-0.562999	by calls to memset	-0.124939
-0.358827	may report that memset	-0.124939
-0.577158	use the functions memset	-0.124939
-0.897301	interface to the rest	-0.124939
-0.597736	language and the rest	-0.124939
-0.600400	advice in the rest	-0.124939
-1.707313	so that the rest	-0.124939
-0.892706	called by the rest	-0.124939
-0.534479	advanced code version on,	-0.124939
-0.645114	the advanced version on,	-0.124939
-0.773711	it is running on,	-0.124939
-0.162719	optimization options turned on,	-0.124939
-0.357877	same example using Agner's	-0.124939
-0.573420	with vector classes Agner's	-0.124939
-0.538921	(see page 107). Agner's	-0.124939
-0.237934	the option -mveclibabi=acml. Agner's	-0.124939
-0.237934	Library amd_vrs4_expf amd_vrd2_exp Agner's	-0.124939
-0.598539	level, and the Digital	-0.124939
-0.358877	the Borland and Digital	-0.124939
-0.325401	xxxxxxxxx Codeplay Watcom Digital	-0.124939
-0.237934	v. 7.1-4, 2008. Digital	-0.124939
-0.237934	for vector intrinsics. Digital	-0.124939
-1.347139	information about the third	-0.124939
-0.586665	database, and a third	-0.124939
-0.570353	the code. The third	-0.124939
-0.358531	more reliable than third	-0.124939
-0.237934	The } 59 third	-0.124939
-0.354668	vector objects // Roll	-0.124939
-0.651388	b, c; // Roll	-0.124939
-0.065261	= _mm_set1_epi16(2); // Roll	-0.425969
-0.354668	Is16vec8 two(2,2,2,2,2,2,2,2); // Roll	-0.124939
-0.525511	before test // Critical	-0.124939
-0.761485	a, b; // Critical	-0.124939
-0.314776	consuming parts only. Critical	-0.124939
-0.408187	the CPU brand. Critical	-0.124939
-0.314776	C or C++. Critical	-0.124939
-0.039023	and manual 5: "Calling	-0.124939
-0.048404	in manual 5: "Calling	-0.602060
-0.039023	See manual 5: "Calling	-0.124939
-0.563063	resources. However, the CISC	-0.124939
-0.358882	between RISC and CISC	-0.124939
-0.462719	limited resource. The CISC	-0.124939
-0.358206	performance/price ratio. The CISC	-0.124939
-0.540801	PC processors with CISC	-0.124939
-0.358877	addition units, and 22	-0.124939
-0.325401	unit throughput ....................................................................................... 22	-0.124939
-0.294247	Dependency chains ................................................................................................ 22	-0.124939
-0.237934	3.14 Context switches..................................................................................................... 22	-0.124939
-0.237934	3.13 Memory access....................................................................................................... 22	-0.124939
-0.463530	or 1. The AND	-0.124939
-0.657455	_mm_cmpgt_epi16(b, zero); // AND	-0.124939
-0.357722	mask); 110 // AND	-0.124939
-1.133618	of the two AND	-0.124939
-0.518930	-1. The bitwise AND	-0.124939
-0.314730	rarely worth the effort	-0.124939
-0.314730	hardly worth the effort	-0.124939
-0.527863	concentrate the optimization effort	-0.124939
-0.137484	if your optimization effort	-0.124939
-0.137484	If your optimization effort	-0.124939
-1.223116	and floating point numbers.	-0.124939
-0.585319	or floating point numbers.	-0.124939
-0.350852	1 for negative numbers.	-0.124939
-0.505884	on a thousand numbers.	-0.124939
-0.294256	to use denormal numbers.	-0.124939
-0.462947	are becoming more popular	-0.124939
-0.358341	development tools. A popular	-0.124939
-0.357901	the 8 most popular	-0.124939
-0.357008	brand was less popular	-0.124939
-0.314766	development tools. One popular	-0.124939
-0.463414	= 8; // SIZE	-0.124939
-0.511186	} const int SIZE	-0.124939
-0.511186	9.5a const int SIZE	-0.124939
-0.511186	9.6a const int SIZE	-0.124939
-0.352962	> 256 && SIZE	-0.124939
-0.839896	type identification (RTTI) Runtime	-0.124939
-0.172691	........................................................................................ 53 7.21 Runtime	-0.124939
-0.172691	the effort. 7.21 Runtime	-0.124939
-0.237943	// Example 7.43a. Runtime	-0.124939
-0.237943	See page 73. Runtime	-0.124939
-0.356462	to certain programming principles	-0.124939
-0.454449	stored. The storage principles	-0.124939
-0.518879	on the advanced principles	-0.124939
-0.350844	are two main principles	-0.124939
-0.237934	and software engineering principles	-0.124939
-1.822921	the number of context	-0.124939
-1.331127	The number of context	-0.124939
-0.358843	background jobs. The context	-0.124939
-0.358347	Context switches A context	-0.124939
-0.237943	big program. Frequent context	-0.124939
-0.358696	addresses to function names.	-0.124939
-0.656824	names and variable names.	-0.124939
-0.558883	allowed in assembly names.	-0.124939
-0.355666	short or common names.	-0.124939
-0.237934	generation of identifier names.	-0.124939
-0.563021	various ways of reducing	-0.124939
-1.652535	be used for reducing	-0.124939
-0.462807	are better at reducing	-0.124939
-0.356590	point operations without reducing	-0.124939
-0.517660	compilers are actually reducing	-0.124939
-0.880008	code that can benefit	-0.124939
-0.462208	XMM registers can benefit	-0.124939
-0.462897	of memory will benefit	-0.124939
-0.287645	variable that could benefit	-0.124939
-0.287645	the code could benefit	-0.124939
-0.920079	may not be worth	-0.425969
-0.771904	It is rarely worth	-0.124939
-0.345250	DLL. Another alternative worth	-0.124939
-0.478056	it is hardly worth	-0.124939
-1.455754	the Gnu compiler manual.	-0.124939
-0.345580	scope of this manual.	-0.124939
-0.356949	read this first manual.	-0.124939
-0.429559	in the present manual.	-0.124939
-0.463512	casting operator that specifies	-0.124939
-0.560309	piece of software specifies	-0.124939
-0.136824	the C/C++ standard specifies	-0.124939
-0.136824	The C/C++ standard specifies	-0.124939
-0.648016	The volatile keyword specifies	-0.124939
-0.504954	longer used and searching	-0.124939
-0.358420	programs use time searching	-0.124939
-0.352962	functions for string searching	-0.124939
-0.444418	efficient solution. Is searching	-0.124939
-0.237957	binary tree. Is searching	-0.124939
-0.537804	of register stack versus	-0.124939
-0.435075	and references Pointers versus	-0.124939
-0.102881	145 14.11 Static versus	-0.124939
-0.102881	limitation). 14.11 Static versus	-0.124939
-0.314776	integer overflow. Signed versus	-0.124939
-0.083708	inlining and constant propagation	-0.124939
-0.039840	folding and constant propagation	-0.124939
-0.324150	to enable constant propagation	-0.124939
-0.325443	Borland Microsoft Constant propagation	-0.124939
-1.423662	to do the reduction	-0.124939
-0.591054	examples where the reduction	-0.124939
-1.127710	that a particular reduction	-0.124939
-0.172696	function call. Algebraic reduction	-0.124939
-0.172696	complicated reductions. Algebraic reduction	-0.124939
-0.358005	without taking cache effects	-0.124939
-0.319758	or the negative effects	-0.124939
-0.319758	variables. The negative effects	-0.124939
-0.350852	performance. The positive effects	-0.124939
-0.237943	operands has side effects	-0.124939
-0.048588	= y + 1.;	-0.301030
-0.338504	* Func1(x) + 1.;	-0.124939
-0.338504	= y.a + 1.;	-0.124939
-0.358852	range analysis The live	-0.124939
-0.294278	even when their live	-0.124939
-0.077822	b because their live	-0.124939
-0.037170	register because their live	-0.425969
-0.998238	to access a multidimensional	-0.124939
-0.358619	solution. Is a multidimensional	-0.124939
-0.065638	A matrix or multidimensional	-0.425969
-0.358354	0, sizeof(list)); A multidimensional	-0.124939
-1.124150	it takes to install	-0.425969
-0.358245	takes hours to install	-0.124939
-0.355981	the user must install	-0.124939
-0.294265	messages saying please install	-0.124939
-1.285371	in terms of development,	-0.124939
-0.462769	used during program development,	-0.124939
-0.540033	history of CPU development,	-0.124939
-0.560298	costs of software development,	-0.124939
-0.237934	facilities, easy GUI development,	-0.124939
-1.193228	rely on the strict	-0.124939
-0.358821	trick violates the strict	-0.124939
-0.589084	models have a strict	-0.124939
-0.358840	off requirements for strict	-0.124939
-0.560731	requirements are less strict	-0.124939
-0.085831	r++) { for (c	-0.602060
-0.356890	through rows for (c	-0.124939
-0.357773	+ b) + (c	-0.124939
-0.648008	page 146 below. Position-independent	-0.124939
-0.349780	the loader. 2. Position-independent	-0.124939
-0.382884	everywhere by default. Position-independent	-0.124939
-0.102876	libraries............................................................................ 146 14.12 Position-independent	-0.124939
-0.102876	code. 147 14.12 Position-independent	-0.124939
-0.726795	the parallelism is obvious	-0.124939
-1.524860	It may be obvious	-0.124939
-0.858798	it would be obvious	-0.124939
-0.989633	to be an obvious	-0.124939
-0.503120	not do such obvious	-0.124939
-0.358636	the diagonal is swapped	-0.124939
-0.358636	element matrix[r][c] is swapped	-0.124939
-0.598295	they may be swapped	-0.124939
-0.835485	and b are swapped	-0.124939
-0.524705	uncached or even swapped	-0.124939
-0.331909	system resources .......................................................................................... 21	-0.124939
-0.325401	Network access ...................................................................................................... 21	-0.124939
-0.314783	is heavily loaded. 21	-0.124939
-0.314766	Other databases ....................................................................................................... 21	-0.124939
-0.237934	3.10 Graphics ................................................................................................................. 21	-0.124939
-0.502422	fit the biggest vectors:	-0.124939
-0.002872	fit the eight-element vectors:	-0.726999
-0.835594	clock cycle. The OR	-0.124939
-0.463404	_mm_andnot_si128(mask, bc); // OR	-0.124939
-0.355827	gives zero. An OR	-0.124939
-0.494413	using the bitwise OR	-0.124939
-0.237934	and the EXCLUSIVE OR	-0.124939
-0.593164	N) { // Array	-0.124939
-0.356701	= 100; // Array	-0.124939
-0.356701	= 256; // Array	-0.124939
-0.341879	in large arrays. Array	-0.124939
-0.237951	// Example 7.15a. Array	-0.124939
-0.462635	by all other processes	-0.124939
-1.137511	shared between multiple processes	-0.124939
-0.352275	access. Run multiple processes	-0.124939
-0.357494	that run many processes	-0.124939
-0.341869	lot of background processes	-0.124939
-0.828190	C++ language is portable	-0.124939
-0.591886	that for a portable	-0.124939
-1.036192	will not be portable	-0.124939
-1.855602	it is not portable	-0.124939
-0.444389	C++ is fully portable	-0.124939
-1.748068	is likely to consume	-0.124939
-0.358586	virus scanners to consume	-0.124939
-0.357804	extra framework can consume	-0.124939
-0.357804	A database can consume	-0.124939
-0.358178	operators and functions consume	-0.124939
-0.314804	system standards. Such schemes	-0.124939
-0.314804	hardware identification. Such schemes	-0.124939
-0.109382	Some copy protection schemes	-0.124939
-0.109382	Most copy protection schemes	-0.124939
-0.109382	Many copy protection schemes	-0.124939
-0.356878	takes 40 - 80	-0.124939
-0.356878	multiplication (27 - 80	-0.124939
-0.594056	expressions. See page 80	-0.124939
-0.355138	local non-member functions. 80	-0.124939
-0.456677	and simply put 80	-0.124939
-0.632905	pointers and references. Arrays	-0.124939
-0.102876	.......................................................................................................... 38 7.10 Arrays	-0.124939
-0.102876	page 93. 7.10 Arrays	-0.124939
-0.538937	be allocated dynamically. Arrays	-0.124939
-0.237943	and unexpected behaviors. Arrays	-0.124939
-0.659679	use it for lists	-0.124939
-0.358711	complicated criteria or lists	-0.124939
-0.656553	The following table lists	-0.124939
-0.352392	faster than linked lists	-0.124939
-0.237934	type casting. Linked lists	-0.124939
-0.592239	fail in the event	-0.124939
-0.592239	recover in the event	-0.124939
-0.592239	resized in the event	-0.124939
-0.522625	than the specific event	-0.124939
-0.237951	results in meaningless event	-0.124939
-0.599555	memory in a computer.	-0.124939
-0.591455	else on a computer.	-0.124939
-0.950289	a Pentium 4 computer.	-0.124939
-0.355128	cycle on another computer.	-0.124939
-0.382884	a big mainframe computer.	-0.124939
-0.463255	64 bit code Static	-0.124939
-0.643852	dynamic linking are: Static	-0.124939
-0.345250	functions can not. Static	-0.124939
-0.172691	....................................................................................... 145 14.11 Static	-0.124939
-0.172691	this limitation). 14.11 Static	-0.124939
-0.358937	over. Virtualization is becoming	-0.124939
-0.852617	The compilers are becoming	-0.124939
-0.555004	vector processors are becoming	-0.124939
-0.461459	hand-held devices are becoming	-0.124939
-0.574721	caching is therefore becoming	-0.124939
-0.600948	advantage in the select	-0.124939
-1.349505	be possible to select	-0.124939
-0.526084	Unpredictable branches that select	-0.124939
-0.358111	preprocessing directives that select	-0.124939
-0.501613	compiler will always select	-0.124939
-0.897362	Sum of a list,	-0.124939
-0.577586	an element in list,	-0.124939
-0.595300	templates, such as list,	-0.124939
-0.453375	software. A negative list,	-0.124939
-0.237934	the same queue, list,	-0.124939
-0.358343	scan instruction is executed	-0.124939
-0.804282	control branch is executed	-0.124939
-0.504200	the latter is executed	-0.124939
-0.588315	code cannot be executed	-0.124939
-0.882112	can often be executed	-0.124939
-0.598034	optimal on the actual	-0.124939
-0.889073	time than the actual	-0.124939
-0.593521	cycles at the actual	-0.124939
-1.258036	to fit the actual	-0.124939
-0.751278	replaced by their actual	-0.124939
-0.577330	chains. In this case,	-0.124939
-0.486889	in the latter case,	-0.124939
-0.345900	In the latter case,	-0.124939
-0.481321	in the general case,	-0.124939
-0.294256	constant. // General case,	-0.124939
-0.785406	gain in performance over	-0.124939
-0.488599	weigh the advantages over	-0.124939
-0.323746	have several advantages over	-0.124939
-0.325411	advantages of alloca over	-0.124939
-0.237943	due to controversies over	-0.124939
-0.572902	tested with a realistic	-0.124939
-0.572902	performed with a realistic	-0.124939
-0.826609	to get a realistic	-0.124939
-0.549930	date. A more realistic	-0.124939
-0.358354	be considered. A realistic	-0.124939
-0.889349	the size of abc	-0.301030
-0.348393	Example 7.13 struct abc	-0.124939
-0.237951	b; int c;}; abc	-0.124939
-0.833219	of A is finished.	-0.124939
-0.358045	preceding addition is finished.	-0.124939
-0.503783	preceding iteration is finished.	-0.124939
-0.358045	the compilation is finished.	-0.124939
-0.358812	the loop are finished.	-0.124939
-0.349752	on the other hand,	-0.124939
-0.064626	On the other hand,	-0.124939
-0.358882	platform _M_IX86 and _WIN64	-0.124939
-0.356931	bit platform not _WIN64	-0.124939
-0.356931	not _WIN64 not _WIN64	-0.124939
-0.495432	64 bit platform _WIN64	-0.124939
-0.382884	platform _WIN64 _LP64 _WIN64	-0.124939
-1.810452	it takes to recover	-0.124939
-0.879486	know how to recover	-0.124939
-0.882657	be able to recover	-0.425969
-0.657156	is made to recover	-0.124939
-0.898609	goes to the console	-0.124939
-0.591524	inputs for a console	-0.124939
-0.591460	and use a console	-0.124939
-0.355780	output file. A console	-0.124939
-0.355780	user interface. A console	-0.124939
-1.066911	Most of the advice	-0.124939
-0.597931	Much of the advice	-0.124939
-0.526911	then follow the advice	-0.124939
-0.726592	non-sequential order. The advice	-0.124939
-0.572299	row. The same advice	-0.124939
-0.561832	data in different ways.	-0.124939
-0.787530	used in two ways.	-0.124939
-0.455181	more than two ways.	-0.124939
-0.357033	32 sets 4 ways.	-0.124939
-0.357005	512 kb, 8 ways.	-0.124939
-1.362385	this: // Example 16.2	-0.124939
-1.166661	code in example 16.2	-0.124939
-1.116546	as in example 16.2	-0.124939
-0.570619	crash the program. 16.2	-0.124939
-0.294256	counters .................................................................... 155 16.2	-0.124939
-0.588075	used inside the pow	-0.124939
-0.358877	as sqrt and pow	-0.124939
-0.588974	pow(x,10); } The pow	-0.124939
-0.591422	ipow faster than pow	-0.124939
-0.237934	functions like sqrt, pow	-0.124939
-0.726795	modern microprocessors is split	-0.124939
-0.877725	we need to split	-0.124939
-1.045506	not advantageous to split	-0.124939
-0.596808	lines should be split	-0.124939
-0.354191	128-bit operation was split	-0.124939
-0.463478	the factors are generated	-0.124939
-0.201242	at the code generated	-0.425969
-0.354367	compiler. Object files generated	-0.124939
-0.294256	of the comments generated	-0.124939
-0.129401	a string is created	-0.425969
-0.573128	Windows program that created	-0.124939
-0.358718	is declared or created	-0.124939
-0.350361	must be dynamically created	-0.124939
-1.314105	may be a hundred	-0.124939
-0.873015	more than a hundred	-0.124939
-0.462450	from memory a hundred	-0.124939
-0.357995	calculate *p+2 a hundred	-0.124939
-0.355694	cached, but several hundred	-0.124939
-0.527315	calculation time of 250	-0.124939
-0.358728	0.5 ns = 250	-0.124939
-0.805268	the library function 250	-0.124939
-0.587582	actually more than 250	-0.124939
-0.237934	loop? Certainly not! 250	-0.124939
-1.749023	a lot of computing	-0.124939
-0.562946	less memory and computing	-0.124939
-0.526195	temporary register for computing	-0.124939
-0.787797	function libraries for computing	-0.124939
-0.357012	applications have less computing	-0.124939
-0.587354	references instead of pointers,	-0.124939
-1.186609	are accessed through pointers,	-0.124939
-0.345237	bounds violations, invalid pointers,	-0.124939
-0.339485	Far storage, far pointers,	-0.124939
-0.408175	counters, function parameters, pointers,	-0.124939
-0.589099	faster way to limit	-0.124939
-0.354519	be no certain limit	-0.124939
-0.072979	a reasonable upper limit	-0.124939
-0.072979	no reasonable upper limit	-0.124939
-0.160748	a not-too-big upper limit	-0.124939
-0.358877	See page and 90	-0.124939
-0.594042	errors. See page 90	-0.124939
-0.350345	// Linux syntax 90	-0.124939
-0.325401	of data ...................................................................................................... 90	-0.124939
-0.314766	memory allocation ...................................................................................... 90	-0.124939
-1.014160	it needs to follow	-0.124939
-0.591552	C if you follow	-0.124939
-0.358331	want vectorization then follow	-0.124939
-0.814860	the cache lines follow	-0.124939
-0.314766	the case labels follow	-0.124939
-0.835855	is called a loop-carried	-0.124939
-1.872604	there is no loop-carried	-0.124939
-0.461990	8.23b has two loop-carried	-0.124939
-0.349757	iterations are: No loop-carried	-0.124939
-0.346488	dependency chains, especially loop-carried	-0.124939
-0.597042	use a function library,	-0.124939
-0.755087	a long vector library,	-0.124939
-0.536696	a short vector library,	-0.124939
-0.846361	the vector class library,	-0.124939
-0.548752	than a static library,	-0.124939
-0.358975	decades ago, the recommendation	-0.124939
-0.690622	have no specific recommendation	-0.124939
-0.482960	making any specific recommendation	-0.124939
-0.294275	scan instructions. My recommendation	-0.124939
-0.294275	file level. My recommendation	-0.124939
-1.410988	Dynamic memory allocation Objects	-0.124939
-1.090024	power of 2. Objects	-0.124939
-0.523934	re-allocation is needed. Objects	-0.124939
-0.343726	and often inefficient. Objects	-0.124939
-0.444382	called garbage collection. Objects	-0.124939
-0.599245	language is a compromise	-0.124939
-0.591140	must be a compromise	-0.124939
-1.473608	is necessary to compromise	-0.124939
-0.553543	solution that doesn't compromise	-0.124939
-0.538937	be a viable compromise	-0.124939
-0.038281	and the Digital Mars	-0.124939
-0.038281	Borland and Digital Mars	-0.124939
-0.038281	Codeplay Watcom Digital Mars	-0.124939
-0.038281	7.1-4, 2008. Digital Mars	-0.124939
-0.038281	vector intrinsics. Digital Mars	-0.124939
-0.955821	the value is already	-0.124939
-0.526673	the string is already	-0.124939
-0.358340	the CPU-type is already	-0.124939
-0.573133	a program that already	-0.124939
-0.549904	block that has already	-0.124939
-1.602839	if there is nothing	-0.124939
-0.593485	branch. There is nothing	-0.124939
-0.833815	instruction set has nothing	-0.124939
-0.358202	while loop because nothing	-0.124939
-0.357905	{ // do nothing	-0.124939
-0.653420	b : c (a&&b)	-0.124939
-0.294247	a+b+c=a+(b+c) (a+b)+c=a+(b+c) --xx----- (a&&b)	-0.124939
-0.294247	x--xx---- (a&&b)||(a&&!b)=a x--xx---- (a&&b)	-0.124939
-0.237934	c x-xx----- 75 (a&&b)	-0.124939
-0.237934	(a&&b&&c) = a&&b (a&&b)	-0.124939
-1.049001	for calculating the physical	-0.124939
-2.074791	the number of physical	-0.124939
-0.726457	is limited by physical	-0.124939
-0.587144	assigning a new physical	-0.124939
-0.355112	processor has four physical	-0.124939
-0.353707	int i; if ((unsigned	-0.124939
-0.457003	"Delta" }; if ((unsigned	-0.124939
-0.353707	39916800, 479001600}; if ((unsigned	-0.124939
-0.353707	Example 14.5b if ((unsigned	-0.124939
-0.353707	Example 14.4b if ((unsigned	-0.124939
-2.419663	- - - xxxxxxxxx	-0.124939
-0.294247	a/a=1 ----x---x a/1=a xxxxxxxxx	-0.124939
-0.294247	x-xxxxxx- x-xxxx-x- x-xxxxxxx xxxxxxxxx	-0.124939
-0.237934	inlining x-xxxx--x Constantfolding xxxxxxxxx	-0.124939
-0.237934	x-xxxxxxx xxxxxxxxx xxxxxxx-x xxxxxxxxx	-0.124939
-0.569914	allowed to have constructors	-0.124939
-0.761729	and before any constructors	-0.124939
-0.451806	destructors. The copy constructors	-0.124939
-0.320263	useful for copy constructors	-0.124939
-0.415001	are no copy constructors	-0.124939
-0.960732	clock frequency is increased	-0.124939
-0.594918	CPUs can be increased	-0.124939
-0.594918	abc can be increased	-0.124939
-0.523036	generation of CPUs increased	-0.124939
-0.860622	registers has been increased	-0.124939
-0.357584	on advanced C++ programming,	-0.124939
-0.460484	area of system programming,	-0.124939
-0.568549	timing, assembly language programming,	-0.124939
-0.331920	mixed language 11 programming,	-0.124939
-0.237934	structured and object-oriented programming,	-0.124939
-0.580806	not any other factor.	-0.124939
-0.021358	by the unroll factor.	-0.124939
-0.456748	the loop unroll factor.	-0.124939
-0.339479	they are available, i.e.	-0.124939
-0.048400	aligned by 16, i.e.	-0.425969
-0.237943	address is taken, i.e.	-0.124939
-0.237943	a quadratic matrix, i.e.	-0.124939
-0.764565	// f is nonzero	-0.124939
-0.358929	can multiply a nonzero	-0.124939
-0.550781	The values of nonzero	-0.124939
-0.560932	// check if nonzero	-0.124939
-0.357443	always 1 if nonzero	-0.124939
-0.505009	frequent cause of unacceptably	-0.124939
-0.358673	still frustrated by unacceptably	-0.124939
-0.358447	framework sometimes have unacceptably	-0.124939
-0.451198	inconsistent and sometimes unacceptably	-0.124939
-0.314766	user might experience unacceptably	-0.124939
-0.777118	different for each process.	-0.124939
-1.077529	instance for each process.	-0.124939
-0.790728	the software development process.	-0.124939
-0.646248	virtual function dispatch process.	-0.124939
-0.439034	during the update process.	-0.124939
-0.463404	Loop counter // Calculate	-0.124939
-0.294247	// Example 15.1c. Calculate	-0.124939
-0.294247	// Example 15.1b. Calculate	-0.124939
-0.237934	// Example 8.23b. Calculate	-0.124939
-0.237934	// Example 15.1a. Calculate	-0.124939
-0.358744	Example 14.21. // Only	-0.124939
-0.348367	AMD LIBM library. Only	-0.124939
-0.341865	the executable file. Only	-0.124939
-0.524504	arbitrary cache line. Only	-0.124939
-0.421418	value of ebx. Only	-0.124939
-1.601975	is that it adds	-0.124939
-0.805572	// This function adds	-0.124939
-0.352371	and temp++ actually adds	-0.124939
-0.966975	Runtime type identification adds	-0.124939
-0.237934	/ sar ebx,1 adds	-0.124939
-0.041069	}; void test ()	-0.425969
-0.086431	a[c][r]); void test ()	-0.124939
-0.336329	8.25 void Func ()	-0.124939
-0.294265	14.1c void CriticalInnerFunction ()	-0.124939
-0.659497	is slow // Division	-0.124939
-0.348384	for these calculations. Division	-0.124939
-0.564190	8 clock cycles. Division	-0.124939
-0.692153	is much faster. Division	-0.124939
-0.237934	where it matters: Division	-0.124939
-0.601249	beware of the pitfalls	-0.124939
-0.143174	program. 16.2 The pitfalls	-0.124939
-0.143174	155 16.2 The pitfalls	-0.124939
-1.061886	The most common pitfalls	-0.124939
-1.021905	are a few pitfalls	-0.124939
-0.590075	install a program package	-0.124939
-0.807235	of the software package	-0.124939
-0.344146	base a software package	-0.124939
-0.344146	install a software package	-0.124939
-0.344146	reinstall a software package	-0.124939
-1.456300	rather than the equivalent	-0.124939
-0.957881	overloaded operator is equivalent	-0.124939
-0.527156	the cases. The equivalent	-0.124939
-0.527096	#define directives are equivalent	-0.124939
-0.354774	best at doing equivalent	-0.124939
-0.585758	therefore important to understand	-0.124939
-1.147812	is difficult to understand	-0.124939
-0.938412	is easier to understand	-0.124939
-0.726685	to read and understand	-0.124939
-0.825074	if you don't understand	-0.124939
-0.847583	predefined vector classes Fortunately,	-0.124939
-0.331920	but not all. Fortunately,	-0.124939
-0.314766	and cache sizes. Fortunately,	-0.124939
-0.237934	one operator less. Fortunately,	-0.124939
-0.237934	c.y + d.y; Fortunately,	-0.124939
-0.593732	compiler from the command	-0.124939
-0.596483	respond to a command	-0.124939
-0.591041	specified on a command	-0.124939
-0.581042	called from a command	-0.124939
-0.358354	is servicing. A command	-0.124939
-0.463396	i++) b[i] = a[i];	-0.124939
-0.357317	No error return a[i];	-0.124939
-0.230070	i++) sum += a[i];	-0.124939
-0.328675	{ s0 += a[i];	-0.124939
-0.358973	rarely justifies the relatively	-0.124939
-0.505028	a condition is relatively	-0.124939
-0.599190	dangers of a relatively	-0.124939
-0.556251	comparisons, which are relatively	-0.124939
-0.462464	penalty. Branches are relatively	-0.124939
-0.524191	have a high priority.	-0.124939
-0.330848	performance has high priority.	-0.124939
-0.374787	threads with low priority.	-0.124939
-0.287642	have got low priority.	-0.124939
-0.442060	threads with lower priority.	-0.124939
-0.576439	size of data files.	-0.124939
-0.462064	object or library files.	-0.124939
-0.790101	in the source files.	-0.124939
-0.347490	or reading disk files.	-0.124939
-0.345260	modules and header files.	-0.124939
-0.888867	Position-independent code is inefficient,	-0.124939
-0.597981	block. This is inefficient,	-0.124939
-1.485907	This method is inefficient,	-0.124939
-0.555712	which is quite inefficient,	-0.124939
-0.444408	this is extremely inefficient,	-0.124939
-0.527350	you follow the guidelines	-0.124939
-0.580973	advantage of these guidelines	-0.124939
-0.877088	unsigned. The following guidelines	-0.124939
-0.356095	program logic. Some guidelines	-0.124939
-0.237934	resolutions, etc. Accessibility guidelines	-0.124939
-0.108449	e.g. Intel Math Kernel	-0.124939
-0.050848	of Intel's Math Kernel	-0.124939
-0.050848	in Intel's Math Kernel	-0.124939
-0.050848	the "Intel Math Kernel	-0.124939
-0.050848	The "Intel Math Kernel	-0.124939
-0.358940	or malloc) is necessarily	-0.124939
-0.865092	object is not necessarily	-0.124939
-0.582646	number is not necessarily	-0.124939
-0.586707	sequence are not necessarily	-0.124939
-0.583734	thread does not necessarily	-0.124939
-0.463572	to use and returns	-0.124939
-0.598320	until the function returns	-0.124939
-0.549469	member function which returns	-0.124939
-0.350359	// For unused returns	-0.124939
-0.314783	the beginning. ret returns	-0.124939
-1.369865	two or more jobs	-0.124939
-0.657292	to do two jobs	-0.124939
-0.237950	the necessary cleanup jobs	-0.124939
-0.237950	of handling cleanup jobs	-0.124939
-0.237943	ms for foreground jobs	-0.124939
-0.852972	of the class. Data	-0.124939
-0.325428	not always work. Data	-0.124939
-0.294247	keeping data together. Data	-0.124939
-0.294247	(see page 87). Data	-0.124939
-0.237934	same memory areas. Data	-0.124939
-0.463572	software layers and frameworks	-0.124939
-0.358798	virtual machine are frameworks	-0.124939
-0.352952	why such runtime frameworks	-0.124939
-0.454913	Several graphical interface frameworks	-0.124939
-0.349780	are running. Such frameworks	-0.124939
-0.885205	to see the excessive	-0.124939
-0.535178	software into an excessive	-0.124939
-0.354948	you avoid an excessive	-0.124939
-0.354948	19 Avoid an excessive	-0.124939
-0.358412	libraries available use excessive	-0.124939
-1.752964	that it is safer	-0.124939
-0.898068	to. It is safer	-0.124939
-0.463478	references. References are safer	-0.124939
-0.358347	increments seconds. A safer	-0.124939
-1.461619	It is therefore safer	-0.124939
-0.579199	supported instruction set. Aligning	-0.124939
-0.102879	exp exp 12.8 Aligning	-0.124939
-0.102879	vectors........................................................................ 119 12.8 Aligning	-0.124939
-0.102879	vector access. 12.9 Aligning	-0.124939
-0.102879	memory................................................................. 120 12.9 Aligning	-0.124939
-0.456752	advantage of out-of-order execution.	-0.124939
-0.245583	have no out-of-order execution.	-0.124939
-0.245583	can do out-of-order execution.	-0.124939
-0.245583	which prevents out-of-order execution.	-0.124939
-0.346491	sake of parallel execution.	-0.124939
-0.065526	= 1024; int a[size],	-0.124939
-0.459667	int i; float a[size],	-0.124939
-0.347093	= 1000; float a[size],	-0.124939
-1.015351	than by the latency	-0.124939
-0.859512	limited by the latency	-0.124939
-0.868474	same as the latency	-0.124939
-0.577066	distinction between the latency	-0.124939
-0.587382	chain has a latency	-0.124939
-0.463198	therefore, always to specify	-0.124939
-1.805136	is recommended to specify	-0.124939
-0.562622	so unless you specify	-0.124939
-0.539413	code if we specify	-0.124939
-0.842530	may as well specify	-0.124939
-0.636107	a[100]; int i; for(i=0;	-0.124939
-0.104918	list[300]; int i; for(i=0;	-0.602060
-0.448368	list[301]; int i; for(i=0;	-0.124939
-0.593549	benefit from the larger	-0.124939
-0.589895	in using the larger	-0.124939
-1.067284	size that is larger	-0.124939
-1.041361	may have a larger	-0.124939
-0.649381	that it allows larger	-0.124939
-0.595300	reductions such as -(-a)	-0.124939
-0.659175	= a*(b+c) - -(-a)	-0.124939
-0.596457	a*4 - n.a. -(-a)	-0.124939
-0.555647	change the expression -(-a)	-0.124939
-0.353648	write expressions like -(-a)	-0.124939
-0.810605	in some cases. Multiple	-0.124939
-0.325414	linking are: 146 Multiple	-0.124939
-0.237934	of a "function". Multiple	-0.124939
-0.237934	multiplication is exact. Multiple	-0.124939
-0.237934	// Example 7.38a. Multiple	-0.124939
-0.463653	by unit-testing is unfortunately	-0.124939
-0.494067	optimized function, but unfortunately	-0.124939
-0.510054	pure functions, but unfortunately	-0.124939
-0.347158	more readable but unfortunately	-0.124939
-0.448714	doesn't occur, but unfortunately	-0.124939
-0.886112	each value of n!	-0.124939
-0.197704	n) { // n!	-0.124939
-0.659238	previous value as n!	-0.124939
-0.356325	xn n 0 n!	-0.124939
-0.884272	most efficiently if pieces	-0.124939
-0.341882	divided into small pieces	-0.124939
-0.341882	are typically small pieces	-0.124939
-0.346480	by joining identical pieces	-0.124939
-0.331919	parts only. Critical pieces	-0.124939
-0.558874	interpreted version of Basic	-0.124939
-0.558874	popular version of Basic	-0.124939
-0.577497	A compiler for Basic	-0.124939
-0.299784	Basic is Visual Basic	-0.124939
-0.389673	as C#, Visual Basic	-0.124939
-0.339477	most reliable solution. (In	-0.124939
-0.615744	to avoid this. (In	-0.124939
-0.742206	for user input. (In	-0.124939
-0.408175	can be inlined. (In	-0.124939
-0.294247	and edx, respectively. (In	-0.124939
-0.574607	in the different microprocessors.	-0.124939
-0.817537	used with other microprocessors.	-0.124939
-0.458334	on most other microprocessors.	-0.124939
-0.525782	cycle on most microprocessors.	-0.124939
-0.237943	only on Intel/x86-compatible microprocessors.	-0.124939
-0.826064	stored in different modules.	-0.124939
-0.494522	accessible from other modules.	-0.124939
-0.336706	by any other modules.	-0.425969
-0.523662	files and system modules.	-0.124939
-0.358748	* _mm_load_ps(coef+i); // s	-0.124939
-0.339492	= _mm_hadd_ps(x, x); s	-0.124939
-0.403879	short int s; s	-0.124939
-0.212315	{ __m128 s; s	-0.124939
-0.237943	sum for(inti=0;i<16;i+=4){ //Loopby4 s	-0.124939
-1.075448	appear in the project	-0.124939
-0.599108	suited for the project	-0.124939
-0.582025	it from a project	-0.124939
-0.642898	the whole software project	-0.124939
-0.350360	a typical software project	-0.124939
-0.463646	a task is divided	-0.124939
-1.715518	that can be divided	-0.124939
-0.594918	job can be divided	-0.124939
-0.463199	tasks were not divided	-0.124939
-0.562543	area is usually divided	-0.124939
-0.459251	library function from www.agner.org/optimize/asmlib.zip.	-0.124939
-0.459251	purposes. Available from www.agner.org/optimize/asmlib.zip.	-0.124939
-0.441134	function library at www.agner.org/optimize/asmlib.zip.	-0.124939
-0.312354	asmlib library at www.agner.org/optimize/asmlib.zip.	-0.124939
-0.581619	in the library www.agner.org/optimize/asmlib.zip.	-0.124939
-0.134271	& (Tuesday | Wednesday	-0.124939
-0.134271	expression (Tuesday | Wednesday	-0.124939
-0.640503	|| Day == Wednesday	-0.124939
-0.442041	Tuesday = 4, Wednesday	-0.124939
-0.294256	bits for Tuesday, Wednesday	-0.124939
-0.462842	therefore suffer from mispredictions.	-0.124939
-0.102887	misses and branch mispredictions.	-0.124939
-0.435099	results for branch mispredictions.	-0.124939
-0.336347	generate many branch mispredictions.	-0.124939
-0.358827	disk. Software that relies	-0.124939
-0.599169	unless the code relies	-0.124939
-0.526282	unless your program relies	-0.124939
-0.456050	exceptions. The mechanism relies	-0.124939
-0.237934	in the MKL relies	-0.124939
-0.355586	invalid pointers, etc. And	-0.124939
-0.510544	as single precision. And	-0.124939
-0.800268	difficult to maintain. And	-0.124939
-0.538921	linkage table (PLT). And	-0.124939
-0.237934	programming languages. www.yeppp.info And	-0.124939
-0.499596	tested on different platforms,	-0.124939
-0.355047	different browsers, different platforms,	-0.124939
-0.838220	available for many platforms,	-0.124939
-0.357235	facilitate porting between platforms,	-0.124939
-0.355431	possible on Linux platforms,	-0.124939
-0.833817	sign bit to compare	-0.124939
-1.785567	you want to compare	-0.124939
-0.658500	allows us to compare	-0.124939
-0.358887	residual error and compare	-0.124939
-0.355973	to a[i+2] ; compare	-0.124939
-0.463646	or reference is valid	-0.124939
-1.070790	it is a valid	-0.124939
-0.596865	point to a valid	-0.124939
-0.358924	the bounds of valid	-0.124939
-0.505012	been initialized to valid	-0.124939
-1.514218	a piece of CPU-intensive	-0.124939
-0.834850	the throughput of CPU-intensive	-0.124939
-0.358923	that relate to CPU-intensive	-0.124939
-0.358840	assembly language for CPU-intensive	-0.124939
-0.525000	5-10% for some CPU-intensive	-0.124939
-0.809737	for the stack. Is	-0.124939
-0.297146	an efficient solution. Is	-0.124939
-0.201010	most efficient solution. Is	-0.124939
-0.294256	a binary tree. Is	-0.124939
-0.237943	(see page 38). Is	-0.124939
-1.179187	able to do so.	-0.124939
-0.544305	obvious to do so.	-0.124939
-0.544305	seem to do so.	-0.124939
-0.343755	array, or approximately so.	-0.124939
-0.237951	and often excessively so.	-0.124939
-0.463642	extra cost is seen	-0.124939
-0.596802	performance should be seen	-0.124939
-0.598727	misses is not seen	-0.124939
-0.587964	number. I have seen	-0.124939
-0.331909	I have ever seen	-0.124939
-0.349111	of computing resources. Typically,	-0.124939
-1.183655	set is enabled. Typically,	-0.124939
-0.534771	several execution units. Typically,	-0.124939
-0.861513	in the future. Typically,	-0.124939
-0.237934	have memory caches. Typically,	-0.124939
-1.571486	divisible by the 107	-0.124939
-0.594042	not. See page 107	-0.124939
-0.325401	Automatic vectorization ......................................................................................... 107	-0.124939
-0.237934	YMM registers ................................................................. 107	-0.124939
-0.237934	ZMM registers .......................................................... 107	-0.124939
-0.526858	allocated memory is contiguous	-0.124939
-0.593049	memory which is contiguous	-0.124939
-0.358653	container, preferably with contiguous	-0.124939
-0.574398	stored in one contiguous	-0.124939
-0.351294	the two modules contiguous	-0.124939
-0.358721	the pointer it gets	-0.124939
-0.358097	template class which gets	-0.124939
-0.762024	second generation class gets	-0.124939
-0.951612	the end user gets	-0.124939
-0.352962	the application programmer gets	-0.124939
-0.567171	this series of manuals.	-0.124939
-0.358877	relevant books and manuals.	-0.124939
-0.460947	for my optimization manuals.	-0.124939
-0.518304	in the subsequent manuals.	-0.124939
-0.771268	series of five manuals.	-0.124939
-0.463211	function definition. This tells	-0.124939
-0.758293	The map file tells	-0.124939
-0.456060	The const keyword tells	-0.124939
-0.141794	sampling: The profiler tells	-0.425969
-0.594939	method is to wrap	-0.124939
-1.802581	is recommended to wrap	-0.124939
-0.526854	are guaranteed to wrap	-0.124939
-0.590513	variables do not wrap	-0.124939
-0.594166	make the value wrap	-0.124939
-1.102157	function or class separately	-0.124939
-0.525396	moving each object separately	-0.124939
-0.355416	pixel or line separately	-0.124939
-0.649749	all code branches separately	-0.124939
-0.348371	and compile them separately	-0.124939
-0.630040	function is pure __attribute((	-0.124939
-0.331919	Fastcall function __fastcall __attribute((	-0.124939
-0.048399	16 __declspec( align(16)) __attribute((	-0.124939
-0.048399	aligned(16))) __declspec( align(16)) __attribute((	-0.124939
-0.382884	pure __attribute(( const)) __attribute((	-0.124939
-0.358932	other subtasks is necessary.	-0.124939
-1.192906	that may be necessary.	-0.124939
-1.855602	it is not necessary.	-0.124939
-1.346143	less efficient than necessary.	-0.124939
-0.657016	overflow checks where necessary.	-0.124939
-0.463646	of CPUs is increasing	-0.124939
-0.504668	the problem by increasing	-0.124939
-0.575960	used for an increasing	-0.124939
-0.356743	are seeing an increasing	-0.124939
-0.237943	represent a monotonically increasing	-0.124939
-0.618047	be aligned by 16,	-0.124939
-0.389733	are aligned by 16,	-0.124939
-0.581162	15 byte at 16,	-0.124939
-0.279495	other than 8, 16,	-0.124939
-0.279495	2, 4, 8, 16,	-0.124939
-0.556755	communication between different threads,	-0.124939
-0.558881	it into multiple threads,	-0.124939
-0.454528	shared between multiple threads,	-0.124939
-0.656497	and synchronization between threads,	-0.124939
-0.287635	designed program. 6 Development	-0.124939
-0.287635	....................................................................................... 24 6 Development	-0.124939
-0.520886	29 for details. Development	-0.124939
-0.237943	is very old-fashioned. Development	-0.124939
-0.237943	Most IDE's (Integrated Development	-0.124939
-0.487038	expression that is AND'ed	-0.425969
-0.358045	of cc[i]+2 is AND'ed	-0.124939
-0.358045	and bb[i]*cc[i] is AND'ed	-0.124939
-0.875374	Here, I have AND'ed	-0.124939
-0.409810	allows common subexpression elimination	-0.124939
-0.266072	2; Common subexpression elimination	-0.124939
-0.266072	reductions: Common subexpression elimination	-0.124939
-0.255938	Constant propagation Pointer elimination	-0.124939
-0.255938	like sin. Pointer elimination	-0.124939
-0.578883	cases, but not all.	-0.124939
-0.647134	extra code at all.	-0.124939
-0.352515	not supported at all.	-0.124939
-0.352515	no offset at all.	-0.124939
-0.348387	can reduce them all.	-0.124939
-1.069955	in the compiler ..........................................................................................	-0.124939
-0.654947	14.13 System programming ..........................................................................................	-0.124939
-0.648511	Other system resources ..........................................................................................	-0.124939
-0.683842	Test and maintenance ..........................................................................................	-0.124939
-0.595252	Access data sequentially ..........................................................................................	-0.124939
-1.060604	may use the upper	-0.124939
-0.284359	or a reasonable upper	-0.124939
-0.212315	when no reasonable upper	-0.124939
-0.660010	{ // Get upper	-0.124939
-0.237943	or a not-too-big upper	-0.124939
-0.463251	names and code addresses.	-0.124939
-0.504114	at different memory addresses.	-0.124939
-0.452757	only self- relative addresses.	-0.124939
-0.339485	uses 32-bit absolute addresses.	-0.124939
-0.421401	aligned at round addresses.	-0.124939
-0.201647	*p+2 is a loop-invariant	-0.124939
-0.358423	subexpression elimination and loop-invariant	-0.124939
-0.358423	constant propagation, and loop-invariant	-0.124939
-0.356479	can move out loop-invariant	-0.124939
-0.527314	an addition to sum1	-0.124939
-0.883499	+= 2) { sum1	-0.124939
-0.357318	two summation variables sum1	-0.124939
-0.538921	100; float list[size], sum1	-0.124939
-0.237934	sum2 += list[i+1];} sum1	-0.124939
-0.541098	^ -1 = ~a	-0.124939
-0.461831	a a & ~a	-0.124939
-0.801952	- a & ~a	-0.124939
-0.494362	0 a ^ ~a	-0.124939
-0.237943	-1 (a&~b)|(~a&b)=a^b --------- ~a	-0.124939
-0.832177	point induction variables Compilers	-0.124939
-0.356811	or C++ code. Compilers	-0.124939
-0.456668	the micro-op cache. Compilers	-0.124939
-1.097177	Mac OS X Compilers	-0.124939
-0.336312	ranges now overlap. Compilers	-0.124939
-0.237934	"m"(x) : "memory" );	-0.124939
-0.237934	& 3) <<6 );	-0.124939
-0.237934	short int bb[size] );	-0.124939
-0.237934	short int cc[size] );	-0.124939
-0.237934	short int aa[size] );	-0.124939
-0.358921	the goal of 18	-0.124939
-0.336330	number 1. Number 18	-0.124939
-0.294247	systems ............................................................................. 158 18	-0.124939
-0.294247	Program installation .................................................................................................. 18	-0.124939
-0.237934	p. 22). 159 18	-0.124939
-0.654196	do something about them.	-0.124939
-1.030268	how to avoid them.	-0.124939
-0.757709	function that needs them.	-0.124939
-0.489136	big before multiplying them.	-0.124939
-0.237934	wires that connect them.	-0.124939
-0.349534	b is floating point.	-0.124939
-0.817982	integer to floating point.	-0.124939
-0.349534	results as floating point.	-0.124939
-0.348393	three values per point.	-0.124939
-0.325421	serves as entry point.	-0.124939
-0.830987	and the time consumption	-0.124939
-0.564514	stores the time consumption	-0.124939
-0.576590	style. The time consumption	-0.124939
-0.351455	the exact time consumption	-0.124939
-0.355543	with low power consumption	-0.124939
-0.354793	b member by 8.	-0.124939
-0.539555	address divisible by 8.	-0.124939
-0.354793	the index by 8.	-0.124939
-0.294275	alias, if appropriate. 8.	-0.124939
-0.596606	key? If the key	-0.124939
-0.540748	actions like a key	-0.124939
-0.463244	to pressing a key	-0.124939
-0.357618	their index or key	-0.124939
-0.357618	mouse move or key	-0.124939
-0.179136	78 for an explanation.	-0.124939
-0.354948	works, here's an explanation.	-0.124939
-0.546711	Performance for further explanation.	-0.124939
-0.527391	needs a little explanation.	-0.124939
-0.358673	not advantageous by itself.	-0.124939
-0.599169	than the code itself.	-0.124939
-0.597224	into the program itself.	-0.124939
-0.351294	by the constructor itself.	-0.124939
-0.452739	including the profiler itself.	-0.124939
-1.277486	needs to be updated	-0.124939
-1.195924	library can be updated	-0.124939
-0.485519	has not been updated	-0.124939
-0.344910	Has not been updated	-0.124939
-0.237951	- 2014. Last updated	-0.124939
-0.358342	of i will appear	-0.124939
-1.831576	of the program appear	-0.124939
-0.582126	in which they appear	-0.425969
-0.709176	which the modules appear	-0.124939
-0.504894	inefficient way. The Codeplay	-0.124939
-0.635305	Optimizes reasonably well. Codeplay	-0.124939
-0.331909	x-xxxx--x Constantfolding xxxxxxxxx Codeplay	-0.124939
-0.237934	v. 1.4, 2005. Codeplay	-0.124939
-0.237934	these. The CodeGear, Codeplay	-0.124939
-0.357643	the same object (except	-0.124939
-0.775586	on integer expressions (except	-0.124939
-0.347500	the previous iteration (except	-0.124939
-0.635305	the two loops (except	-0.124939
-0.339469	floating point capabilities (except	-0.124939
-0.891864	cache. If the combined	-0.124939
-0.591051	model where the combined	-0.124939
-0.600610	s3 can be combined	-0.124939
-0.527103	the results are combined	-0.124939
-0.659074	The Clang compiler combined	-0.124939
-0.563007	15. C++ is definitely	-0.124939
-0.350067	complicated cases should definitely	-0.124939
-0.350067	.NET framework should definitely	-0.124939
-0.452392	These containers should definitely	-0.124939
-0.528818	But lazy binding definitely	-0.124939
-0.358882	with 100 and jumps	-0.124939
-0.595888	code that it jumps	-0.124939
-0.545720	if a thread jumps	-0.124939
-0.172691	identical branches Eliminate jumps	-0.124939
-0.172691	+ 1.; Eliminate jumps	-0.124939
-0.358237	the right vector elements.	-0.124939
-1.281925	structure or class elements.	-0.124939
-0.769342	addresses of array elements.	-0.124939
-0.351560	access to array elements.	-0.124939
-0.544045	search for finding elements.	-0.124939
-0.358084	transfer across all .cpp	-0.124939
-0.488376	combine the multiple .cpp	-0.124939
-0.346975	of compiling multiple .cpp	-0.124939
-0.346975	for combining multiple .cpp	-0.124939
-0.520575	(i.e. the current .cpp	-0.124939
-0.562336	compiler with many features,	-0.124939
-0.547766	library has many features,	-0.124939
-0.354516	many advanced optimizing features,	-0.124939
-0.345236	has full metaprogramming features,	-0.124939
-0.314776	need better backup features,	-0.124939
-0.572891	the position-independent code flag	-0.124939
-0.353216	use the zero flag	-0.124939
-0.091070	in the carry flag	-0.124939
-0.206447	modify the carry flag	-0.124939
-0.115510	256; i += 8)	-0.726999
-0.492283	{ (iset >= 8)	-0.124939
-1.064855	up with the ever	-0.124939
-0.358893	to invest in ever	-0.124939
-0.587964	compiler I have ever	-0.124939
-0.459309	if no exception ever	-0.124939
-0.478048	model is hardly ever	-0.124939
-0.355684	Called directly // Writes	-0.124939
-0.065391	"Hello 1" // Writes	-0.425969
-0.355684	&Object2; p2->Hello(); // Writes	-0.124939
-0.648545	Other system resources Writes	-0.124939
-0.463572	6, 9 and 13	-0.124939
-1.360709	last byte at 13	-0.124939
-0.339477	Conclusion .......................................................................................................... 120 13	-0.124939
-0.237934	header files. 121 13	-0.124939
-0.237934	0.95 0.6 1.19 13	-0.124939
-0.726697	the numbers in b[i]	-0.124939
-1.364792	{ a[i] = b[i]	-0.124939
-0.358695	by checking if b[i]	-0.124939
-1.354987	size; i++) { b[i]	-0.124939
-1.479330	< size; i++) b[i]	-0.124939
-0.857042	calculation time is doubled.	-0.124939
-0.787314	of registers is doubled.	-0.124939
-0.958362	clock frequency is doubled.	-0.124939
-0.598738	performance is not doubled.	-0.124939
-0.860651	registers has been doubled.	-0.124939
-0.504947	be read and written	-0.124939
-0.596802	code should be written	-0.124939
-0.357473	floating point value written	-0.124939
-0.354020	case with programs written	-0.124939
-0.237934	in a hand- written	-0.124939
-0.549365	standardization of programming languages,	-0.124939
-0.492395	in other programming languages,	-0.124939
-0.327208	supports multiple programming languages,	-0.124939
-0.423657	allocation. Some programming languages,	-0.124939
-0.237959	in interpreted script languages,	-0.124939
-0.721172	with new or malloc	-0.124939
-0.655053	and delete or malloc	-0.124939
-0.356515	and delete, or malloc	-0.124939
-0.854647	with the functions malloc	-0.124939
-0.343747	and delete (or malloc	-0.124939
-1.091316	the program that runs	-0.124939
-0.525630	make software that runs	-0.124939
-0.461689	a thread that runs	-0.124939
-1.361217	if the program runs	-0.124939
-0.357256	that most software runs	-0.124939
-0.525988	when a is true,	-0.124939
-1.120208	when b is true,	-0.124939
-0.358045	> 0 is true,	-0.124939
-0.358045	of || is true,	-0.124939
-0.358625	therefore count as true,	-0.124939
-0.557798	time doing the division.	-0.124939
-0.358711	involves multiplication or division.	-0.124939
-0.570137	for integer vector division.	-0.124939
-1.526458	of floating point division.	-0.124939
-0.462442	SSE4.1 and integer division.	-0.124939
-0.357677	double Y = C;	-0.124939
-0.462046	B; x.c = C;	-0.124939
-0.357767	+ B*x + C;	-0.124939
-0.037169	int A, B, C;	-0.124939
-0.314793	0.11 0.18 0.18 0.18	-0.124939
-0.237957	0.12 0.11 0.18 0.18	-0.124939
-0.314776	0.18 0.12 0.11 0.18	-0.124939
-0.314776	Core 2 0.12 0.18	-0.124939
-0.237943	2 0.63 0.75 0.18	-0.124939
-0.357768	not _WIN32 n.a. MS	-0.124939
-0.163456	relevant to optimization MS	-0.425969
-0.339479	long or int64_t MS	-0.124939
-0.325411	long or uint64_t MS	-0.124939
-0.658672	bb, cc); } #endif	-0.124939
-0.343732	dword ptr n; #endif	-0.124939
-0.408175	#else #define pure_function #endif	-0.124939
-0.237934	Alignd(X) X __attribute__((aligned(16))) #endif	-0.124939
-0.237934	#define FUNCNAME SelectAddMul_AVX2 #endif	-0.124939
-1.296350	rest of the present	-0.124939
-0.600810	techniques in the present	-0.124939
-0.358843	Agner Fog The present	-0.124939
-0.504895	are: Optimizing for present	-0.124939
-0.463199	that were not present	-0.124939
-0.996612	example 15.1b to 15.1c	-0.124939
-0.762813	example 15.1a to 15.1c	-0.124939
-0.358245	example 15.1d to 15.1c	-0.124939
-1.328140	code in example 15.1c	-0.124939
-0.237951	15.1a to 151 15.1c	-0.124939
-0.862249	int size = 1000;	-0.124939
-0.355567	int ArraySize = 1000;	-0.124939
-0.355567	int arraysize = 1000;	-0.124939
-2.107526	0; i < 1000;	-0.124939
-1.635064	versions of the strlen	-0.124939
-0.659651	have tested the strlen	-0.124939
-0.294256	0.35 0.29 0.28 strlen	-0.124939
-0.237943	are some examples: strlen	-0.124939
-0.237943	0.82 0.59 0.27 strlen	-0.124939
-1.060121	The code is __asm	-0.124939
-0.358718	int 3; or __asm	-0.124939
-0.352387	qword ptr x; __asm	-0.124939
-0.102876	Windows, Intel/MASM syntax: __asm	-0.124939
-0.102876	Linux, Gnu/AT&T syntax: __asm	-0.124939
-0.223375	or one clock cycle.	-0.124939
-0.325235	only one clock cycle.	-0.124939
-0.223375	just one clock cycle.	-0.124939
-0.062546	addition every clock cycle.	-0.124939
-1.360709	last byte at 11	-0.124939
-0.656916	Integer multiplication takes 11	-0.124939
-0.355749	integration, mixed language 11	-0.124939
-0.347490	is rarely needed. 11	-0.124939
-0.325401	Hyperthreading ..................................................................................................... 103 11	-0.124939
-0.358827	or *.so) that belong	-0.124939
-0.358071	These addresses all belong	-0.124939
-0.356871	library functions often belong	-0.124939
-0.356483	the stack always belong	-0.124939
-0.555708	these cache lines belong	-0.124939
-0.357539	is destroyed. In 50	-0.124939
-0.525114	the conversion takes 50	-0.124939
-0.325401	return types .............................................................................................. 50	-0.124939
-0.314766	Function parameters ............................................................................................... 50	-0.124939
-0.314766	This code took 50	-0.124939
-0.358451	Development Environments) have facilities	-0.124939
-0.453942	and using advanced facilities	-0.124939
-0.056054	added? If search facilities	-0.425969
-0.336321	tools have powerful facilities	-0.124939
-0.358583	Volume 1 - 5.	-0.124939
-0.558641	to the constant 5.	-0.124939
-0.351697	with j << 5.	-0.124939
-0.351297	and VIA CPUs. 5.	-0.124939
-0.382873	32 for AVX. 5.	-0.124939
-0.527000	Intel but is currently	-0.124939
-0.540892	Windows version is currently	-0.124939
-0.504843	vector classes are currently	-0.124939
-0.560138	used. The method currently	-0.124939
-0.355249	the Gnu manual currently	-0.124939
-0.353836	occur in multiplication here:	-0.124939
-0.343751	can be mentioned here:	-0.124939
-0.331909	two main principles here:	-0.124939
-0.331931	of the pitfalls here:	-0.124939
-0.237934	other error reporting here:	-0.124939
-0.810628	in some cases. Does	-0.124939
-0.890460	later instruction sets. Does	-0.124939
-0.384957	for 32-bit Windows. Does	-0.124939
-0.384957	only 32-bit Windows. Does	-0.124939
-0.421414	including an IDE. Does	-0.124939
-0.540794	A problem with macros	-0.124939
-0.566780	when used as macros	-0.124939
-0.874968	you should avoid macros	-0.124939
-0.355819	calls. 48 Use macros	-0.124939
-0.237934	Table 18.3. Predefined macros	-0.124939
-0.589489	dilemma. You may prefer	-0.124939
-0.356557	a programmer may prefer	-0.124939
-0.358502	Which solution you prefer	-0.124939
-0.358342	many users will prefer	-0.124939
-0.461719	double. Here we prefer	-0.124939
-1.098457	value of the divisor	-0.425969
-0.065616	// Faster if divisor	-0.425969
-0.580269	using a constant divisor	-0.124939
-0.172700	.................................................................................................... 19 3.5 Program	-0.124939
-0.172700	update process. 3.5 Program	-0.124939
-0.102881	.................................................................................. 16 3.3 Program	-0.124939
-0.102881	following sections. 3.3 Program	-0.124939
-0.505032	of processors is better.	-0.124939
-0.444426	model will work better.	-0.124939
-0.343761	next model work better.	-0.124939
-0.237951	solution is clearly better.	-0.124939
-0.855373	32-bit Linux and BSD,	-0.124939
-0.867367	is based on BSD,	-0.124939
-0.435797	and 64-bit Linux, BSD,	-0.124939
-0.308380	platforms (Windows, Linux, BSD,	-0.124939
-0.597238	c2 with the bit-mask:	-0.124939
-0.143324	and generate a bit-mask:	-0.425969
-0.538953	with the inverted bit-mask:	-0.124939
-2.004838	a power of two.	-0.124939
-0.358718	a year or two.	-0.124939
-0.596488	array rather than two.	-0.124939
-1.006863	compilers will make two.	-0.124939
-0.538969	minutes to start up,	-0.124939
-0.595266	to be cleaned up,	-0.124939
-0.314776	the computer starts up,	-0.124939
-0.314776	it is filled up,	-0.124939
-0.325422	and resources cleaned up.	-0.124939
-0.575653	the program starts up.	-0.124939
-0.408187	to be filled up.	-0.124939
-0.237943	can be broken up.	-0.124939
-0.357281	required for performance reasons.	-0.124939
-0.537297	C++ for several reasons.	-0.124939
-0.343747	possible for usability reasons.	-0.124939
-0.294256	version for marketing reasons.	-0.124939
-0.594056	order. See page 103	-0.124939
-0.314776	10.1 Hyperthreading ..................................................................................................... 103	-0.124939
-0.294256	order execution ................................................................................................. 103	-0.124939
-0.382884	this by writing: 103	-0.124939
-0.140834	time. 4 2 Choosing	-0.124939
-0.140834	............................................................................................... 4 2 Choosing	-0.124939
-0.319773	............................................................................................... 23 5 Choosing	-0.124939
-0.319773	a website. 5 Choosing	-0.124939
-1.286181	of the time slices	-0.124939
-0.540308	if the time slices	-0.124939
-0.540308	increase the time slices	-0.124939
-0.351458	will get time slices	-0.124939
-1.063799	case of an exception.	-0.124939
-0.529557	event of an exception.	-0.124939
-0.354952	actually throws an exception.	-0.124939
-0.355143	destructor causes another exception.	-0.124939
-0.356260	conditions using & enum	-0.124939
-0.355833	7.4 Enums An enum	-0.124939
-0.702583	Testing multiple conditions enum	-0.124939
-0.237943	float, double, bool, enum	-0.124939
-0.879431	in a program repeats	-0.124939
-0.584759	if a loop repeats	-0.124939
-0.651387	} This loop repeats	-0.124939
-0.357594	loop that also repeats	-0.124939
-0.597052	CPU with the highest	-0.124939
-0.597516	modules when the highest	-0.124939
-0.833725	clock cycle. The highest	-0.124939
-0.658439	is implemented. The highest	-0.124939
-0.501795	rows/columns in matrix 96	-0.124939
-0.331919	data sequentially .......................................................................................... 96	-0.124939
-0.294256	9.8 Strings ...................................................................................................................... 96	-0.124939
-0.237943	data structures ............................................................. 96	-0.124939
-0.463206	not going to recommend	-0.124939
-0.358589	software teachers to recommend	-0.124939
-0.048401	Some programming textbooks recommend	-0.124939
-0.048401	Nowadays, programming textbooks recommend	-0.124939
-0.589625	also likely to lead	-0.124939
-0.881599	code. This can lead	-0.124939
-0.356841	new insight can lead	-0.124939
-0.356841	the bottlenecks can lead	-0.124939
-0.501969	then make an additional	-0.124939
-0.524079	by making an additional	-0.124939
-0.598967	give the compiler additional	-0.124939
-0.421426	left for transferring additional	-0.124939
-1.890118	There is no 51	-0.124939
-0.887226	other. See page 51	-0.124939
-0.237943	Structures and classes............................................................................................ 51	-0.124939
-0.237943	members (properties) ............................................................................ 51	-0.124939
-0.358237	// 2-dimensional vector 56	-0.124939
-0.325411	Overloaded functions .............................................................................................. 56	-0.124939
-0.325411	Overloaded operators ............................................................................................. 56	-0.124939
-0.237943	7.25 Bitfields ................................................................................................................... 56	-0.124939
-0.882328	also be a type.	-0.124939
-0.583868	of a different type.	-0.124939
-0.357994	of each integer type.	-0.124939
-0.314776	to a wrong type.	-0.124939
-0.589162	them into a place	-0.124939
-1.807753	is recommended to place	-0.124939
-0.534169	only from one place	-0.124939
-0.354282	and shifts one place	-0.124939
-1.893051	then it is preferable	-0.124939
-1.041643	static linking is preferable	-0.124939
-1.830919	it may be preferable	-0.124939
-1.389615	It is often preferable	-0.124939
-0.959339	the CPU to overlap	-0.124939
-1.046354	is able to overlap	-0.124939
-0.358772	out-of-order capabilities can overlap	-0.124939
-0.590513	live-ranges do not overlap	-0.124939
-0.145137	to fit the eight-element	-0.726999
-0.494407	SSE2 typically takes 40	-0.124939
-0.715078	Integer division takes 40	-0.124939
-0.595280	short int s; 40	-0.124939
-0.237951	7.11 Type conversions.................................................................................................... 40	-0.124939
-0.358541	2 to x 43	-0.124939
-0.563665	mechanism. See page 43	-0.124939
-0.563665	delay. See page 43	-0.124939
-0.237951	and switch statements............................................................................. 43	-0.124939
-0.776930	32-bit systems and sixteen	-0.124939
-0.725913	operating systems and sixteen	-0.124939
-0.358724	involves eight or sixteen	-0.124939
-0.346488	can contain either sixteen	-0.124939
-1.652740	be used for turning	-0.124939
-0.558455	function or by turning	-0.124939
-0.460025	whole program by turning	-0.124939
-0.356087	significantly just by turning	-0.124939
-0.355389	Make pointer at initialization.	-0.124939
-0.523413	Load library at initialization.	-0.124939
-0.538523	object doesn't need initialization.	-0.124939
-0.552730	does the necessary initialization.	-0.124939
-0.350353	on PC platforms. Graphics	-0.124939
-0.314785	categories: File input/output Graphics	-0.124939
-0.102879	database access. 3.10 Graphics	-0.124939
-0.102879	....................................................................................................... 21 3.10 Graphics	-0.124939
-0.591266	predict where the obstacles	-0.124939
-0.580984	aware of these obstacles	-0.124939
-0.356103	them. Some important obstacles	-0.124939
-1.129486	the most common obstacles	-0.124939
-0.600814	provided in the asmlib	-0.124939
-0.584057	vectors, but the asmlib	-0.124939
-0.065106	instruction set, using asmlib	-0.425969
-0.759198	piece of code. Furthermore,	-0.124939
-0.527357	program is executed. Furthermore,	-0.124939
-0.294256	and system crash. Furthermore,	-0.124939
-0.237943	stored in edx. Furthermore,	-0.124939
-1.673089	is possible to obtain	-0.124939
-0.571410	sometimes possible to obtain	-0.124939
-1.263392	then you can obtain	-0.124939
-0.832837	cases, you can obtain	-0.124939
-1.855779	the value of ebx.	-0.124939
-0.570233	significant bit of ebx.	-0.124939
-0.358926	in edx, to ebx.	-0.124939
-0.314785	push and pop ebx.	-0.124939
-0.358718	a prediction or estimate	-0.124939
-0.421427	if a reasonable estimate	-0.124939
-0.237943	we can roughly estimate	-0.124939
-0.237943	see if our estimate	-0.124939
-1.295438	instruction set is enabled	-0.124939
-0.559118	SSE2 is always enabled	-0.124939
-0.348387	and leave them enabled	-0.124939
-0.463593	more efficient and enables	-0.124939
-0.065339	object file. This enables	-0.425969
-0.718319	other modules. This enables	-0.124939
-0.172700	this option. 8.4 Obstacles	-0.124939
-0.172700	....................................................................... 77 8.4 Obstacles	-0.124939
-0.102881	PathScale compilers. 8.3 Obstacles	-0.124939
-0.102881	compilers............................................................................. 74 8.3 Obstacles	-0.124939
-0.061311	a[], int & r)	-0.425969
-0.324812	FuncB (int & r)	-0.124939
-0.324812	int Sum3(S3 & r)	-0.124939
-0.358895	be arranged in regular	-0.124939
-0.527158	prefetch data for regular	-0.124939
-0.358279	the Internet at regular	-0.124939
-1.035179	follows a simple regular	-0.124939
-1.858408	the value of m	-0.124939
-0.965757	in the way m	-0.124939
-0.321856	the template function, m	-0.124939
-0.588651	the simple function, m	-0.124939
-0.356824	string as code. Metaprogramming	-0.124939
-0.255928	.......................................................................................... 150 15 Metaprogramming	-0.124939
-0.255928	page 90. 15 Metaprogramming	-0.124939
-0.421426	90. 15 Metaprogramming Metaprogramming	-0.124939
-0.352974	The following examples explain	-0.124939
-0.093313	code. Let me explain	-0.124939
-0.093313	sets. Let me explain	-0.124939
-0.314785	vector libraries. To explain	-0.124939
-0.356786	branching takes time. Dispatch	-0.124939
-0.352958	page 128 below. Dispatch	-0.124939
-0.496303	with different compilers. Dispatch	-0.124939
-0.237943	at different times: Dispatch	-0.124939
-0.463239	other platforms as well,	-0.124939
-0.459069	functions are optimized well,	-0.124939
-0.493758	way is predicted well,	-0.124939
-0.314790	Studio optimizes reasonably well,	-0.124939
-1.314256	but this is sufficiently	-0.124939
-0.609113	the arrays are sufficiently	-0.425969
-0.294265	written. This worked sufficiently	-0.124939
-0.352958	example 13.1 below. 126	-0.124939
-0.339479	11.8 127 127 126	-0.124939
-0.331919	and maintenance .......................................................................................... 126	-0.124939
-0.314776	13.5 Implementation ..................................................................................................... 126	-0.124939
-0.505028	and double is bad	-0.124939
-0.599860	programmer in a bad	-0.124939
-0.563025	more examples of bad	-0.124939
-0.349124	implementation works particularly bad	-0.124939
-0.015118	public: static double p(double	-0.726999
-0.198144	Everything that is said	-0.425969
-1.729745	it can be said	-0.124939
-0.635360	is often easier said	-0.124939
-0.599811	applies to the modulo	-0.124939
-0.550780	rules apply to modulo	-0.124939
-0.357683	same as i modulo	-0.124939
-0.585141	used to avoid modulo	-0.124939
-0.550698	data files and databases	-0.124939
-0.058474	20 3.9 Other databases	-0.124939
-0.331930	Access to remote databases	-0.124939
-0.132378	overflow: _controlfp_s(&dummy, 0, _EM_OVERFLOW);	-0.124939
-0.132378	_fpreset(); _controlfp_s(&dummy, 0, _EM_OVERFLOW);	-0.124939
-0.023527	_EM_OVERFLOW); // _controlfp(0, _EM_OVERFLOW);	-0.425969
-0.336328	Or, if protection against	-0.124939
-0.237943	I must warn against	-0.124939
-0.237943	should be weighed against	-0.124939
-0.237943	of possible remedies against	-0.124939
-0.452738	vector register size. Vectorized	-0.124939
-0.595245	and overloaded operators. Vectorized	-0.124939
-0.237943	// Example 12.4b. Vectorized	-0.124939
-0.237943	x); } 112 Vectorized	-0.124939
-0.237943	case 1: printf("Beta"); break;	-0.124939
-0.237943	case 2: printf("Gamma"); break;	-0.124939
-0.237943	case 3: printf("Delta"); break;	-0.124939
-0.237943	case 0: printf("Alpha"); break;	-0.124939
-1.958424	is that the loader	-0.124939
-0.597201	more by the loader	-0.124939
-0.358675	is loaded, the loader	-0.124939
-0.463593	compiler, linker and loader	-0.124939
-0.780012	and model number. Failure	-0.124939
-0.172696	is also deallocated. Failure	-0.124939
-0.343759	has been deallocated. Failure	-0.124939
-0.382896	of program flow. Failure	-0.124939
-0.569225	the class is declared.	-0.124939
-1.183427	the variable is declared.	-0.124939
-0.143126	time MemberPointer is declared.	-0.124939
-0.143126	before MemberPointer is declared.	-0.124939
-1.749023	a lot of resources,	-0.124939
-0.358389	or require more resources,	-0.124939
-1.365728	for the same resources,	-0.124939
-0.349783	interfaces to network resources,	-0.124939
-0.764565	if a is true.	-0.124939
-0.726573	and 1 for true.	-0.124939
-0.596808	it should be true.	-0.124939
-1.239466	is not always true.	-0.124939
-0.550690	variables, arrays and objects.	-0.124939
-0.579402	allocation for all objects.	-0.124939
-1.281925	structure or class objects.	-0.124939
-0.355121	for every four objects.	-0.124939
-0.760883	multiple calculations in parallel.	-0.124939
-0.785018	that run in parallel.	-0.124939
-0.815609	are running in parallel.	-0.124939
-0.656981	do things in parallel.	-0.124939
-0.504829	inserted, one by one,	-0.124939
-0.540056	one, and only one,	-0.124939
-0.559102	there is always one,	-0.124939
-0.336313	preceding label plus one,	-0.124939
-0.353126	Example 14.12b int list[300];	-0.124939
-0.353126	Example 14.13b int list[300];	-0.124939
-0.353126	Example 14.13a int list[300];	-0.124939
-0.353126	Example 14.12a int list[300];	-0.124939
-0.032669	r < SIZE; r++)	-0.726999
-1.576533	b; a = parabola	-0.124939
-0.347093	Example 8.3a float parabola	-0.124939
-0.347093	* a;} float parabola	-0.124939
-0.347093	Example 8.1b float parabola	-0.124939
-0.356705	// x^2 // x^4	-0.124939
-0.356705	F32vec4 xx4(x4); // x^4	-0.124939
-0.356705	* x2; // x^4	-0.124939
-0.237959	x^1, x^2, x^3, x^4	-0.124939
-0.541226	things like a mouse	-0.124939
-0.358887	to keyboard and mouse	-0.124939
-0.357618	key press or mouse	-0.124939
-0.357618	to keyboard or mouse	-0.124939
-0.331125	because partial template specialization	-0.124939
-0.062166	// Full template specialization	-0.425969
-0.331125	// Partial template specialization	-0.124939
-1.271415	of the loop index.	-0.124939
-1.020656	as an array index.	-0.124939
-0.579502	with a simple index.	-0.124939
-0.237943	with a top-of-stack index.	-0.124939
-0.357281	advanced system performance options.	-0.124939
-0.356818	many good optimization options.	-0.124939
-0.521403	on all relevant options.	-0.124939
-0.237943	of performance monitoring options.	-0.124939
-0.146612	c < SIZE; c++)	-0.425969
-0.037170	c < r; c++)	-0.425969
-0.461345	the data elements are.	-0.124939
-0.825222	obstacles to optimization are.	-0.124939
-0.764363	sure that they are.	-0.124939
-0.347339	that's what they are.	-0.124939
-0.598305	allocation may be needed,	-0.124939
-1.156953	when they are needed,	-0.124939
-0.065588	search facilities are needed,	-0.124939
-0.855423	the way of declaring	-0.124939
-0.972755	is done by declaring	-0.124939
-0.460025	even smaller by declaring	-0.124939
-0.356087	be inlined by declaring	-0.124939
-0.600953	names in the SVML	-0.124939
-0.348667	2 double Intel SVML	-0.124939
-0.348667	vmlsExp4 vmldExp2 Intel SVML	-0.124939
-0.348667	__svml_expf4 __svml_exp2 Intel SVML	-0.124939
-0.106949	library (*.dll or *.so).	-0.425969
-0.023527	shared objects (*.dll, *.so).	-0.124939
-0.651947	} u; if (u.i	-0.124939
-0.065297	u, v; if (u.i	-0.124939
-0.354951	n; 143 if (u.i	-0.124939
-0.355767	problems and necessary support.	-0.124939
-0.718665	and without AVX support.	-0.124939
-0.341869	program with profiling support.	-0.124939
-0.237943	compiler with C++0x support.	-0.124939
-0.463593	than addition and subtraction	-0.124939
-0.048402	time than addition, subtraction	-0.425969
-0.314803	involving integer addition, subtraction	-0.124939
-0.065652	_mm_add_epi16(c, two); // Multiply	-0.425969
-0.358551	Example 7.42 int Multiply	-0.124939
-0.237951	(a<b && b<c) Multiply	-0.124939
-0.172696	!= 0) *(p++) |=	-0.124939
-0.172696	0; i--) *(p++) |=	-0.124939
-0.237951	float x; *(int*)&x |=	-0.124939
-0.237951	= 2.0f; x.i |=	-0.124939
-0.969798	in a memory pool.	-0.124939
-0.349808	container or memory pool.	-0.124939
-1.266761	the same memory pool.	-0.124939
-0.452064	in one memory pool.	-0.124939
-0.788636	The version that performs	-0.124939
-0.354486	a code version performs	-0.124939
-0.354486	this code version performs	-0.124939
-0.355840	The Core2 processor performs	-0.124939
-0.598341	statistics, and the "Intel	-0.124939
-1.029646	such as the "Intel	-0.124939
-0.358848	purposes (www.boost.org). The "Intel	-0.124939
-0.325429	code optimization Intel: "Intel	-0.124939
-1.574507	at compile time. Are	-0.124939
-0.325422	a top-of-stack index. Are	-0.124939
-0.294256	be too small. Are	-0.124939
-0.237943	linked list. 94 Are	-0.124939
-0.358843	decrement operators The pre-increment	-0.124939
-0.462977	whether you use pre-increment	-0.124939
-0.556144	also situations where pre-increment	-0.124939
-0.350860	if you change pre-increment	-0.124939
-0.463579	allocated object, and ownership	-0.124939
-0.444407	constructor" to transfer ownership	-0.124939
-0.237943	operator that transfers ownership	-0.124939
-0.237943	object that looses ownership	-0.124939
-0.887226	other. See page 88	-0.124939
-0.237943	(new and delete). 88	-0.124939
-0.237943	stored together ...................................... 88	-0.124939
-0.237943	be stored together...................................... 88	-0.124939
-0.212331	x; *(int*)&x |= 0x80000000;	-0.124939
-0.212331	2.0f; x.i |= 0x80000000;	-0.124939
-0.102881	u; u.i ^= 0x80000000;	-0.124939
-0.102881	with u.i[1] ^= 0x80000000;	-0.124939
-1.077301	and it can move	-0.124939
-1.331449	that it can move	-0.124939
-0.358520	the container may move	-0.124939
-0.325421	like a mouse move	-0.124939
-0.462883	= c; } Can	-0.124939
-0.479811	supported at all. Can	-0.124939
-0.314776	updated since 2004. Can	-0.124939
-0.382884	to the container. Can	-0.124939
-0.577568	portable way of defining	-0.124939
-0.593919	when used for defining	-0.124939
-0.723178	this problem by defining	-0.124939
-0.357380	be overcome by defining	-0.124939
-1.039948	of code that produces	-0.124939
-0.572067	C++ program that produces	-0.124939
-0.351147	an unsigned variable produces	-0.124939
-0.351147	a signed variable produces	-0.124939
-0.358682	a loss of precision,	-0.124939
-0.358682	or loss of precision,	-0.124939
-0.430827	single or double precision,	-0.124939
-0.430827	precision or double precision,	-0.124939
-0.875713	functions have a non-inlined	-0.124939
-1.796134	to make a non-inlined	-0.124939
-0.562000	from making a non-inlined	-0.124939
-0.358598	another module. This non-inlined	-0.124939
-1.199876	many of the drawbacks	-0.124939
-0.065773	2.8 Overcoming the drawbacks	-0.425969
-0.463593	the advantages and drawbacks	-0.124939
-0.878999	Exp(float x) { __declspec(align(16))	-0.124939
-0.325422	// Example 12.2 __declspec(align(16))	-0.124939
-0.382884	compiler #define Alignd(X) __declspec(align(16))	-0.124939
-0.294256	work. Data alignment. __declspec(align(16))	-0.124939
-1.390891	sign bit of u.f	-0.124939
-0.550658	we know that u.f	-0.124939
-0.595628	v.i) { // u.f	-0.124939
-0.339486	Now 1.0 <= u.f	-0.124939
-0.599276	available for the commercial	-0.124939
-0.358347	Codeplay VectorC A commercial	-0.124939
-0.539503	missing in many commercial	-0.124939
-0.294256	Public License, optional commercial	-0.124939
-0.457635	read and write configuration	-0.124939
-0.435065	objects), resource files, configuration	-0.124939
-0.294256	of several drivers, configuration	-0.124939
-0.237943	number of DLLs, configuration	-0.124939
-0.583242	examples on page 134	-0.124939
-0.584887	range (see page 134	-0.124939
-0.538953	out of range"; 134	-0.124939
-0.294265	Bounds checking .................................................................................................. 134	-0.124939
-0.358631	functions or code lines.	-0.124939
-0.502049	the same cache lines.	-0.124939
-1.021936	than a few lines.	-0.124939
-0.577497	your compiler for restrictions	-0.124939
-0.354907	have very few restrictions	-0.124939
-0.103023	There are certain restrictions	-0.425969
-0.540217	elimination x n.a. Constant	-0.124939
-0.353217	Intel Borland Microsoft Constant	-0.124939
-0.237943	b = 6.0f; Constant	-0.124939
-0.237943	a few places. Constant	-0.124939
-0.389116	by the heap manager	-0.124939
-0.168187	order. The heap manager	-0.124939
-0.168187	heap. The heap manager	-0.124939
-0.168187	invalid. The heap manager	-0.124939
-0.357225	target buffer, branch pattern	-0.124939
-0.010299	a simple periodic pattern	-0.124939
-0.020849	A simple periodic pattern	-0.124939
-1.062732	mode because the x86-64	-0.124939
-0.107098	all x86 and x86-64	-0.425969
-0.314785	platform _M_IX86 _M_IX86 x86-64	-0.124939
-0.864485	to assume that *p+2	-0.124939
-0.462605	from assuming that *p+2	-0.124939
-0.437624	*p and calculate *p+2	-0.124939
-0.338358	you could calculate *p+2	-0.124939
-0.358887	CodeGear, Codeplay and Watcom	-0.124939
-0.269236	optimize well. Open Watcom	-0.124939
-0.269236	8.42n, 2004. Open Watcom	-0.124939
-0.331936	Constantfolding xxxxxxxxx Codeplay Watcom	-0.124939
-0.882924	and make a round	-0.124939
-0.358926	data members to round	-0.124939
-0.355389	are aligned at round	-0.124939
-0.523413	are loaded at round	-0.124939
-1.369865	two or more cores,	-0.124939
-0.358153	CPUs or CPU cores,	-0.124939
-0.357645	processing instructions, multiple cores,	-0.124939
-0.314776	have got RISC cores,	-0.124939
-0.312901	a branch that chooses	-0.425969
-0.879431	where a program chooses	-0.124939
-0.356503	the cache always chooses	-0.124939
-1.038277	the program is running.	-0.124939
-0.591999	programs they are running.	-0.124939
-0.358390	program itself when running.	-0.124939
-1.090696	of code is serial	-0.124939
-0.570809	above code is serial	-0.124939
-0.048401	2 12.6 Transforming serial	-0.124939
-0.048401	113 12.6 Transforming serial	-0.124939
-0.340384	consecutive elements from cc	-0.602060
-0.349900	vector b: from cc	-0.124939
-0.357725	first call // Header	-0.124939
-0.357725	or later // Header	-0.124939
-0.815686	follows: Instruction set Header	-0.124939
-0.237951	(Gnu) Table 12.2. Header	-0.124939
-0.583178	mathematical functions that 150	-0.124939
-0.594056	program. See page 150	-0.124939
-0.331919	System programming .......................................................................................... 150	-0.124939
-0.314776	15 Metaprogramming ....................................................................................................... 150	-0.124939
-0.357570	is quite efficient thanks	-0.124939
-0.353652	prefetch data automatically thanks	-0.124939
-0.345241	are very similar thanks	-0.124939
-0.442070	never becomes fragmented thanks	-0.124939
-0.561738	2, x = 2.0;	-0.124939
-0.897735	for (x = 2.0;	-0.124939
-0.355572	1.0; list[i].b = 2.0;	-0.124939
-0.355572	1.0; temp->b = 2.0;	-0.124939
-0.536032	branch into the pipeline	-0.124939
-0.536032	fed into the pipeline	-0.124939
-0.562586	executed. However, the pipeline	-0.124939
-1.365970	by using a pipeline	-0.124939
-0.589191	x, unsigned int n)	-0.124939
-0.060337	int factorial (int n)	-0.425969
-0.529453	void SomeFunction (int n)	-0.124939
-0.224692	times for user input.	-0.124939
-0.224692	waits for user input.	-0.124939
-0.224692	Waiting for user input.	-0.124939
-0.421439	keyboard or mouse input.	-0.124939
-1.069962	in the compiler 8.1	-0.124939
-0.951239	listed in table 8.1	-0.124939
-0.352954	of overflow. Table 8.1	-0.124939
-0.314776	compiler .......................................................................................... 66 8.1	-0.124939
-0.356322	the worst- case conditions.	-0.124939
-0.651362	or other hardware conditions.	-0.124939
-0.421414	time under worst-case conditions.	-0.124939
-0.237943	under the best-case conditions.	-0.124939
-0.461669	much more by choosing	-0.124939
-0.879869	is obtained by choosing	-0.124939
-1.138289	into account when choosing	-0.124939
-0.579246	help the programmer choosing	-0.124939
-1.319159	explained on page 146	-0.124939
-0.543526	detail on page 146	-0.124939
-0.643870	dynamic linking are: 146	-0.124939
-0.237951	versus dynamic libraries............................................................................ 146	-0.124939
-0.658358	7.26 Overloaded functions ..............................................................................................	-0.124939
-0.647493	Function return types ..............................................................................................	-0.124939
-0.646891	8.6 Optimization directives ..............................................................................................	-0.124939
-0.710942	Explicit cache control ..............................................................................................	-0.124939
-0.463355	the intrinsic function _mm256_zeroupper()	-0.124939
-0.166903	support then call _mm256_zeroupper()	-0.124939
-0.166903	dispatching then call _mm256_zeroupper()	-0.124939
-0.166903	support, then call _mm256_zeroupper()	-0.124939
-0.753011	to the user. Making	-0.124939
-0.237956	.......................................................................................................... 120 13 Making	-0.124939
-0.237956	files. 121 13 Making	-0.124939
-0.237951	for significant improvements. Making	-0.124939
-0.726542	to set the flush-to-zero	-0.124939
-0.527138	from setting the flush-to-zero	-0.124939
-0.255938	Example 7.6. Set flush-to-zero	-0.124939
-0.255938	Example 7.5. Set flush-to-zero	-0.124939
-0.599190	example of a Taylor	-0.124939
-0.595308	iterations such as Taylor	-0.124939
-0.237943	// Example 12.9b. Taylor	-0.124939
-0.237943	// Example 12.9a. Taylor	-0.124939
-0.357508	version FuncType * SelectAddMul_pointer	-0.124939
-0.341891	(iset >= 2) SelectAddMul_pointer	-0.124939
-0.331946	(iset >= 8) SelectAddMul_pointer	-0.124939
-0.314776	(iset >= 5) SelectAddMul_pointer	-0.124939
-1.072477	points to the dispatcher.	-0.124939
-0.597255	points to a dispatcher.	-0.124939
-0.065274	overriding Intel's CPU dispatcher.	-0.425969
-0.168189	and the Gnu, Clang,	-0.124939
-0.168189	by the Gnu, Clang,	-0.124939
-0.168189	use the Gnu, Clang,	-0.124939
-0.176512	such as Gnu, Clang,	-0.124939
-0.358882	example 14.8 and 14.9	-0.124939
-2.157755	Example: // Example 14.9	-0.124939
-0.314776	integers ................................... 141 14.9	-0.124939
-0.538937	= (double)(signed int)u; 14.9	-0.124939
-0.827470	depends only on n,	-0.124939
-0.356217	any compile-time constant n,	-0.124939
-0.277090	7.32a double x, n,	-0.124939
-0.277090	7.32b double x, n,	-0.124939
-2.157755	Example: // Example 14.8	-0.124939
-1.328086	code in example 14.8	-0.124939
-0.314776	to another platform. 14.8	-0.124939
-0.314776	and double..................................................................................... 140 14.8	-0.124939
-0.957749	no risk of overflow,	-0.124939
-0.659687	no checking for overflow,	-0.124939
-0.357994	bounds violation, integer overflow,	-0.124939
-0.500129	small to cause overflow,	-0.124939
-0.177103	x < 100; x++)	-0.425969
-0.444413	x <= n; x++)	-0.124939
-0.237951	>= 0; i--, x++)	-0.124939
-0.541035	caching conditions are optimal.	-0.124939
-1.184699	code is not optimal.	-0.124939
-0.882004	instructions are not optimal.	-0.124939
-0.504137	course far from optimal.	-0.124939
-0.006839	x) { _mm_storeu_si128((__m128i *)d,	-0.301030
-0.237959	x) { _mm_store_si128((__m128i *)d,	-0.124939
-1.289674	object of a class,	-0.124939
-0.294284	class, Intel Vector class,	-0.124939
-0.294284	vector, bits Vector class,	-0.124939
-0.516134	to a derived class,	-0.124939
-0.358335	= cos(x); } z	-0.124939
-0.352952	> y && z	-0.124939
-0.538937	y = cos(x); z	-0.124939
-0.538937	y = sin(x); z	-0.124939
-0.249079	is calculated in advance	-0.124939
-0.249079	has calculated in advance	-0.124939
-0.539491	memory needed in advance	-0.124939
-0.357484	doesn't know in advance	-0.124939
-0.050952	cc into vector c:	-0.249877
-0.846715	of b is guaranteed	-0.124939
-0.358639	of i&15 is guaranteed	-0.124939
-0.591999	- they are guaranteed	-0.124939
-0.598738	base is not guaranteed	-0.124939
-0.591723	way. You may think	-0.124939
-0.572390	model and then think	-0.124939
-0.356167	usability, but I think	-0.124939
-0.456673	a. I don't think	-0.124939
-0.849783	89 for an example.	-0.124939
-0.556648	this with an example.	-0.124939
-0.576351	(n!) as an example.	-0.124939
-0.590467	handling in this example.	-0.124939
-0.884761	is available. The older	-0.124939
-0.550334	the compatibility with older	-0.124939
-0.526999	significant effect on older	-0.124939
-0.348393	branch tree. On older	-0.124939
-0.463650	etc., as is commonly	-0.124939
-0.507208	variables The most commonly	-0.124939
-0.507208	purposes. The most commonly	-0.124939
-1.108042	There are two commonly	-0.124939
-0.573342	fill up the queue	-0.124939
-0.835865	to implement a queue	-0.124939
-0.358347	inconvenient times. A queue	-0.124939
-0.382884	example, a FIFO queue	-0.124939
-0.358973	when exiting the {}	-0.124939
-0.355352	declaring it inside {}	-0.124939
-0.408187	(0 < 5) {}	-0.124939
-0.237943	elements }; vector() {}	-0.124939
-1.360270	= a + 1.0f;	-0.124939
-0.122595	{ list[i] += 1.0f;	-0.425969
-0.328681	& 15] += 1.0f;	-0.124939
-0.172700	$B1$3: pop ret ALIGN	-0.124939
-0.172700	ja $B2$3: ret ALIGN	-0.124939
-0.023527	compiled to assembly: ALIGN	-0.425969
-0.357970	This requires no modification	-0.124939
-0.544902	program may need modification	-0.124939
-0.347529	will therefore need modification	-0.124939
-0.498859	if a certain modification	-0.124939
-0.345250	reveals that similar solutions	-0.124939
-0.102879	about it. Possible solutions	-0.124939
-0.102879	non-Intel machines? Possible solutions	-0.124939
-0.237951	chip. Such hybrid solutions	-0.124939
-0.062222	C++ An optimization guide	-0.124939
-0.062222	CPUs: An optimization guide	-0.124939
-0.062222	C++: An optimization guide	-0.124939
-0.062222	language: An optimization guide	-0.124939
-0.600951	examples in the appendix	-0.124939
-0.577338	provided in an appendix	-0.124939
-0.578344	available as an appendix	-0.124939
-0.355840	container classes. An appendix	-0.124939
-0.463535	code lines. The 17	-0.124939
-0.336341	this column. Number 17	-0.124939
-0.102879	testing ................................................................................................ 157 17	-0.124939
-0.102879	than normal. 157 17	-0.124939
-0.358975	should apply the empty	-0.124939
-0.358848	throw() specification. The empty	-0.124939
-0.570706	also have an empty	-0.124939
-0.356748	functions. While an empty	-0.124939
-0.249091	makes testing and maintenance	-0.124939
-0.249091	development, testing and maintenance	-0.124939
-0.065624	13.4 Test and maintenance	-0.124939
-0.526889	XOR'ing it with 1:	-0.124939
-0.501383	printf("Alpha"); break; case 1:	-0.124939
-0.295538	PROCNEAR ; parameter 1:	-0.124939
-0.295538	NEAR ; parameter 1:	-0.124939
-0.237956	rarely needed. 11 Out	-0.124939
-0.237956	..................................................................................................... 103 11 Out	-0.124939
-0.237951	on a First-In-Last- Out	-0.124939
-0.237951	on a First-In-First- Out	-0.124939
-0.881693	this in a protected	-0.124939
-0.591230	message in a protected	-0.124939
-0.358589	to switch to protected	-0.124939
-0.358589	of switching to protected	-0.124939
-1.152741	dynamic memory allocation. Container	-0.124939
-0.102879	...................................................................................... 90 9.7 Container	-0.124939
-0.102879	using alloca. 9.7 Container	-0.124939
-0.237951	cases. Multiple threads? Container	-0.124939
-1.174431	be used as alternatives	-0.124939
-0.886714	are more efficient alternatives	-0.124939
-0.357505	at the possible alternatives	-0.124939
-1.402788	There are various alternatives	-0.124939
-1.749023	a lot of modifications	-0.124939
-1.309597	be improved by modifications	-0.124939
-0.454934	check if your modifications	-0.124939
-0.351312	likely to require modifications	-0.124939
-0.494012	i+=3,i_div_3++){ list[i] += i_div_3;	-0.124939
-0.328681	i_div_3; list[i+1] += i_div_3;	-0.124939
-0.328681	i_div_3; list[i+2] += i_div_3;	-0.124939
-0.570271	list[300]; int i, i_div_3;	-0.124939
-0.583199	40 i = s;	-0.124939
-0.195901	i; short int s;	-0.124939
-0.294265	x) { __m128 s;	-0.124939
-0.048401	at the "worst case"	-0.124939
-0.048401	represent the "worst case"	-0.124939
-0.102881	is the "best case"	-0.124939
-0.102881	case" and "best case"	-0.124939
-1.330522	You have to distinguish	-0.124939
-0.556102	Make sure to distinguish	-0.124939
-1.615627	is important to distinguish	-0.124939
-0.549181	may fail to distinguish	-0.124939
-0.358848	and truncation. The missing	-0.124939
-0.888625	Operations that are missing	-0.124939
-0.585868	these functions are missing	-0.124939
-0.526442	input data. A missing	-0.124939
-0.347525	platforms. 2. Optimizing subroutines	-0.124939
-0.006839	manual 2: "Optimizing subroutines	-0.602060
-0.331939	that some development tools	-0.124939
-0.331939	network. Various development tools	-0.124939
-0.345247	for better metaprogramming tools	-0.124939
-0.341879	are offering profiling tools	-0.124939
-0.596833	have a = 0x2710	-0.124939
-0.167064	variable from address 0x2710	-0.124939
-0.167064	again from address 0x2710	-0.124939
-0.167064	reads from address 0x2710	-0.124939
-0.410849	isolate the hot spot	-0.124939
-0.191281	is a hot spot	-0.124939
-0.191281	When a hot spot	-0.124939
-0.252293	function or hot spot	-0.124939
-0.358210	recursive templates. The powN	-0.124939
-0.358210	the template. The powN	-0.124939
-0.562691	infinite loop if powN	-0.124939
-0.724492	int N> class powN	-0.124939
-0.869447	same as the C-style	-0.124939
-0.595432	safe than the C-style	-0.124939
-0.504898	type conversion // C-style	-0.124939
-0.351708	string. The old C-style	-0.124939
-0.930571	the C++ language While	-0.124939
-0.458820	to frame functions. While	-0.124939
-0.538937	in assembly language". While	-0.124939
-0.237943	use than others. While	-0.124939
-0.622882	int c:2; }; Bitfield	-0.124939
-0.340040	char abc; }; Bitfield	-0.124939
-0.351725	Example 7.40b union Bitfield	-0.124939
-0.348393	Example 7.40a struct Bitfield	-0.124939
-0.358586	has something to clean	-0.124939
-0.659180	is nothing to clean	-0.124939
-0.462342	simplest and most clean	-0.124939
-0.355981	the program must clean	-0.124939
-0.339486	the option -fpic according	-0.124939
-0.478056	a binary representation according	-0.124939
-0.314790	to always behave according	-0.124939
-0.294256	= 100. Now, according	-0.124939
-0.595632	13) { // Bounds	-0.124939
-0.102879	are constant. 14.2 Bounds	-0.124939
-0.102879	................................................................................................. 132 14.2 Bounds	-0.124939
-0.237951	minimizing memory fragmentation. Bounds	-0.124939
-0.437887	i; } u; u.i	-0.124939
-0.749196	u; int n; u.i	-0.124939
-0.429569	check if nonzero u.i	-0.124939
-1.185195	lead to a dramatic	-0.124939
-0.841045	is much more dramatic	-0.124939
-0.582186	has a very dramatic	-0.124939
-0.354355	can have quite dramatic	-0.124939
-0.460865	versions without an IDE.	-0.124939
-0.356748	Windows, including an IDE.	-0.124939
-0.559383	have its own IDE.	-0.124939
-0.870602	Microsoft Visual Studio IDE.	-0.124939
-0.358843	are smaller. The lengths	-0.124939
-0.584303	strings of different lengths	-0.124939
-0.357411	typically have variable lengths	-0.124939
-0.237943	gone to great lengths	-0.124939
-0.598738	functions is not expensive.	-0.124939
-0.528253	misses are very expensive.	-0.124939
-0.350363	but also very expensive.	-0.124939
-0.560737	cache are less expensive.	-0.124939
-1.482643	the sake of efficiency.	-0.124939
-0.540841	any loss of efficiency.	-0.124939
-0.557743	a difference in efficiency.	-0.124939
-0.530842	methods to improve efficiency.	-0.124939
-0.299778	..................................................................................................................... 163 20 Copyright	-0.124939
-0.299778	some links. 20 Copyright	-0.124939
-0.237951	available from www.agner.org/optimize. Copyright	-0.124939
-0.237951	University of Denmark. Copyright	-0.124939
-0.541229	integer registers is extended	-0.124939
-1.201224	method can be extended	-0.124939
-1.289458	XMM registers are extended	-0.124939
-0.562388	done with an extended	-0.124939
-0.358010	= (total cache size)	-0.124939
-0.492270	|| i >= size)	-0.124939
-0.102879	address) / (line size)	-0.124939
-0.102879	of sets) (line size)	-0.124939
-0.294256	nmmintrin.h (MS) smmintrin.h (Gnu)	-0.124939
-0.237943	AMD FMA4 fma4intrin.h (Gnu)	-0.124939
-0.237943	ammintrin.h (MS) xopintrin.h (Gnu)	-0.124939
-0.237943	intrin.h (MS) x86intrin.h (Gnu)	-0.124939
-0.463646	This information is contained	-0.124939
-1.352124	pointer to a contained	-0.124939
-0.358923	thread. Pointers to contained	-0.124939
-0.435056	can be completely contained	-0.124939
-1.089820	the overhead of transferring	-0.124939
-0.462699	preferred method for transferring	-0.124939
-0.358190	register left for transferring	-0.124939
-0.358682	32-bit Windows by transferring	-0.124939
-1.204760	caching less efficient. Access	-0.124939
-0.102879	...................................................................................................................... 96 9.9 Access	-0.124939
-0.102879	at www.agner.org/optimize/cppexamples.zip. 9.9 Access	-0.124939
-0.237951	remote data locally. Access	-0.124939
-0.548607	variables, and for saving	-0.124939
-1.632239	be used for saving	-0.124939
-0.357541	a strategy for saving	-0.124939
-0.237959	a stack frame, saving	-0.124939
-0.568415	market for many years	-0.124939
-0.500478	often take several years	-0.124939
-0.336321	using a six years	-0.124939
-0.294256	five or ten years	-0.124939
-0.353278	Example 14.16a double y,	-0.124939
-0.353278	Example 14.16b double y,	-0.124939
-0.467460	8.8a double x, y,	-0.124939
-0.395386	a; float x, y,	-0.124939
-0.463212	high priority of structured	-0.124939
-0.358594	the importance of structured	-0.124939
-1.008351	be compatible with structured	-0.124939
-0.540780	code relies on structured	-0.124939
-0.201033	See the compiler documentation	-0.425969
-0.336323	due to poor documentation	-0.124939
-0.314785	Mostly obsolete. Microprocessor documentation	-0.124939
-0.915829	test () { CChild1	-0.124939
-0.353747	the declaration class CChild1	-0.124939
-0.353747	// versions: class CChild1	-0.124939
-0.294265	Object1; CChild2 Object2; CChild1	-0.124939
-0.500763	Watcom Digital Mars PGI	-0.124939
-0.294256	PathScale compilers. (The PGI	-0.124939
-0.237943	cannot be tolerated. PGI	-0.124939
-0.237943	v. 3.1, 2007. PGI	-0.124939
-0.348368	per element. 100 As	-0.124939
-0.237943	y = pow(x,n) As	-0.124939
-0.237943	be predicted perfectly. As	-0.124939
-0.237943	b * 5). As	-0.124939
-0.172889	runtime type identification (RTTI)	-0.124939
-0.133201	Runtime type identification (RTTI)	-0.124939
-0.354797	double precision by default,	-0.124939
-0.065277	lazy binding by default,	-0.124939
-0.354797	Does not, by default,	-0.124939
-0.344269	y; } double xpow10(double	-0.124939
-0.063912	of 10 double xpow10(double	-0.425969
-0.344269	loop unrolled double xpow10(double	-0.124939
-0.023527	14.17b double a1, a2,	-0.124939
-0.023527	14.17a double a1, a2,	-0.124939
-0.011604	double y, a1, a2,	-0.425969
-0.352518	program flow at inconvenient	-0.124939
-0.352518	garbage collector at inconvenient	-0.124939
-0.352518	come unpredictably at inconvenient	-0.124939
-0.561152	time, but also inconvenient	-0.124939
-1.267960	has to be expressed	-0.124939
-0.876786	address can be expressed	-0.124939
-0.588709	offset can be expressed	-0.124939
-0.588709	formats can be expressed	-0.124939
-0.599657	speed if the bottleneck	-0.124939
-0.596414	calls. If the bottleneck	-0.124939
-0.882337	to be a bottleneck	-0.124939
-0.355758	maintain. Any specific bottleneck	-0.124939
-1.524502	by using the directive	-0.124939
-0.940701	cannot expect a directive	-0.124939
-0.351304	where a #define directive	-0.124939
-0.237943	or the __assume_aligned directive	-0.124939
-0.458089	non-Intel CPU. If not,	-0.124939
-0.458089	unroll factor. If not,	-0.124939
-0.536851	fact it does not,	-0.124939
-0.607497	32-bit Windows. Does not,	-0.124939
-0.598923	registers is a scarce	-0.124939
-0.165088	Registers are a scarce	-0.124939
-0.351719	disk space were scarce	-0.124939
-0.653080	the loop. Example 12.4b	-0.124939
-0.355522	chosen expression. Example 12.4b	-0.124939
-0.497500	way of example 12.4b	-0.124939
-0.586684	construction in example 12.4b	-0.124939
-1.200389	implementation of the lrint	-0.124939
-1.815380	to use the lrint	-0.124939
-0.203408	static inline int lrint	-0.425969
-0.657303	that have multiple versions.	-0.124939
-0.548759	software in two versions.	-0.124939
-0.555646	including the 64-bit versions.	-0.124939
-0.654890	static and dynamic versions.	-0.124939
-0.800344	Optimizing memory access .............................................................................................	-0.124939
-0.847608	Using vector classes .............................................................................................	-0.124939
-0.651852	7.27 Overloaded operators .............................................................................................	-0.124939
-0.754956	14.4 Integer multiplication .............................................................................................	-0.124939
-0.920382	divisible by 16. Alignment	-0.124939
-0.212321	register variables. 9.5 Alignment	-0.124939
-0.212321	...................................... 88 9.5 Alignment	-0.124939
-0.237951	16 Table 7.2. Alignment	-0.124939
-0.527297	processor model is going	-0.124939
-0.358924	50-50 chance of going	-0.124939
-0.659176	I am not going	-0.124939
-0.358386	performance penalty when going	-0.124939
-0.434488	the overflow and underflow	-0.124939
-0.434488	about overflow and underflow	-0.124939
-0.916124	will generate an underflow	-0.124939
-0.598649	generate floating point underflow	-0.124939
-0.021919	when their live ranges	-0.124939
-0.007184	because their live ranges	-0.301030
-0.917327	the loop and splitting	-0.124939
-0.504014	for classes. The splitting	-0.124939
-0.358210	a formalism. The splitting	-0.124939
-0.351712	256-bit instructions were splitting	-0.124939
-0.900963	waste of the user's	-0.124939
-0.358675	version satisfies the user's	-0.124939
-0.358675	can steal the user's	-0.124939
-0.352985	majority of end user's	-0.124939
-0.526789	_MSC_VER and not __INTEL_COMPILER	-0.124939
-0.355437	compiler Windows Linux __INTEL_COMPILER	-0.124939
-0.212321	and not __INTEL_COMPILER __INTEL_COMPILER	-0.124939
-0.212321	Windows Linux __INTEL_COMPILER __INTEL_COMPILER	-0.124939
-0.725966	need to be cleaned	-0.124939
-0.541035	allocated resources are cleaned	-0.124939
-0.456392	called and resources cleaned	-0.124939
-0.917507	the table is cached.	-0.124939
-1.618951	may not be cached.	-0.124939
-1.834347	it is not cached.	-0.124939
-0.591389	data are not cached.	-0.124939
-0.358887	for audio and video	-0.124939
-0.358724	streaming audio or video	-0.124939
-0.044155	12.9 Aligning RGB video	-0.425969
-0.072326	consecutive elements in aa:	-0.425969
-0.357645	the line number information.	-0.124939
-0.356220	on publicly available information.	-0.124939
-0.575392	save exception handling information.	-0.124939
-0.350868	receive new relevant information.	-0.124939
-0.755683	9.7 Container classes Whenever	-0.124939
-0.650782	is never used. Whenever	-0.124939
-0.345250	one big problem. Whenever	-0.124939
-0.325411	processors is better. Whenever	-0.124939
-0.599814	belongs to the area	-0.124939
-0.608751	the same memory area	-0.425969
-0.504095	The static data area	-0.124939
-0.596507	here because the consequence	-0.124939
-1.024253	function calls. The consequence	-0.124939
-0.358210	been wasted. The consequence	-0.124939
-0.314785	has the unfortunate consequence	-0.124939
-0.353278	Example 14.17b double a1,	-0.124939
-0.353278	Example 14.17a double a1,	-0.124939
-0.093316	14.16a double y, a1,	-0.124939
-0.093316	14.16b double y, a1,	-0.124939
-0.659877	the dividend is unsigned.	-0.124939
-0.557767	when converted to unsigned.	-0.124939
-0.541177	be signed or unsigned.	-0.124939
-0.358697	with zero-bits if unsigned.	-0.124939
-0.504640	arithmetic operations with pointers.	-0.124939
-0.456075	except for char pointers.	-0.124939
-0.122674	overflow, and invalid pointers.	-0.124939
-0.122674	violations and invalid pointers.	-0.124939
-0.594056	cached. See page 26	-0.124939
-0.540713	keyword, for floating 26	-0.124939
-0.237943	different C++ constructs........................................................................ 26	-0.124939
-0.237943	of variable storage............................................................................. 26	-0.124939
-0.539782	functions if possible. Smaller	-0.124939
-0.442041	in the software. Smaller	-0.124939
-0.314776	to optimize caching. Smaller	-0.124939
-0.237943	of small microcontrollers: Smaller	-0.124939
-0.887226	system. See page 29	-0.124939
-0.339479	-263 263-1 int64_t 29	-0.124939
-0.336321	are swapping column 29	-0.124939
-0.237943	variables and operators............................................................................... 29	-0.124939
-0.527317	another addition to sum2	-0.124939
-0.358882	variables sum1 and sum2	-0.124939
-0.555534	sum1 = 0, sum2	-0.124939
-0.382884	sum1 += list[i]; sum2	-0.124939
-0.358738	n; u.i = (n	-0.124939
-0.834566	0) { if (n	-0.124939
-0.462753	n) { if (n	-0.124939
-0.355973	= 1.0; while (n	-0.124939
-0.463531	intermediate object for (b	-0.124939
-1.576511	b; a = (b	-0.124939
-0.319033	0) { if (b	-0.124939
-0.357380	point number by 2n	-0.124939
-0.461669	can divide by 2n	-0.124939
-0.526889	AND'ing it with 2n	-0.124939
-1.214534	is less than 2n	-0.124939
-0.762129	little or no idea	-0.124939
-0.407952	be a good idea	-0.124939
-0.407952	not a good idea	-0.124939
-0.407952	therefore a good idea	-0.124939
-1.109649	Alignment of data ......................................................................................................	-0.124939
-0.655380	7.7 Function pointers ......................................................................................................	-0.124939
-0.655045	3.12 Network access ......................................................................................................	-0.124939
-0.640470	3.8 System database ......................................................................................................	-0.124939
-0.463598	be written in C,	-0.124939
-0.063047	rule of standard C,	-0.425969
-0.343755	Compiled languages include C,	-0.124939
-0.358748	Gnu compiler // Same	-0.124939
-0.237943	// Example 12.4c. Same	-0.124939
-0.237943	// Example 12.4e. Same	-0.124939
-0.237943	// Example 12.4d. Same	-0.124939
-0.532464	efficient. You can disable	-0.124939
-0.532464	compiler. You can disable	-0.124939
-0.856603	then you should disable	-0.124939
-0.294265	code to test. disable	-0.124939
-1.193238	rely on the assumption	-0.124939
-0.358824	Gnu compiler, the assumption	-0.124939
-0.504496	make such an assumption	-0.124939
-0.461716	cannot make any assumption	-0.124939
-0.463270	that x is treated	-0.124939
-1.327332	the object is treated	-0.124939
-0.586651	functions is also treated	-0.124939
-0.458039	function are simply treated	-0.124939
-0.352685	compatible across compilers. Fastcall	-0.124939
-0.237943	function #pragma optimize(...) Fastcall	-0.124939
-0.237943	member pointers /vms Fastcall	-0.124939
-0.237943	on CodeGear compiler). Fastcall	-0.124939
-0.646256	or 3-dimensional vectors RGB	-0.124939
-0.102883	access. 12.9 Aligning RGB	-0.124939
-0.102883	120 12.9 Aligning RGB	-0.124939
-0.237951	reciprocal square root, RGB	-0.124939
-0.358882	and C# and avoids	-0.124939
-0.960313	a way that avoids	-0.124939
-0.581853	time and it avoids	-0.124939
-0.358082	more time but avoids	-0.124939
-1.317896	the compiler is prevented	-0.124939
-0.463273	returning. F1 is prevented	-0.124939
-0.594925	contentions can be prevented	-0.124939
-0.594925	behavior can be prevented	-0.124939
-0.804204	14.10 Mathematical functions .......................................................................................	-0.124939
-0.970326	of hardware platform .......................................................................................	-0.124939
-0.807102	the optimal algorithm .......................................................................................	-0.124939
-0.615791	Execution unit throughput .......................................................................................	-0.124939
-0.965050	this feature is seldom	-0.124939
-0.463512	detecting errors that seldom	-0.124939
-0.358293	functions separate from seldom	-0.124939
-0.497368	functions, and put seldom	-0.124939
-0.314255	a penalty for mixing	-0.124939
-0.314255	no penalty for mixing	-0.124939
-0.764057	certain restrictions on mixing	-0.124939
-0.462953	a problem when mixing	-0.124939
-0.172696	member function. 7.12 Branches	-0.124939
-0.172696	conversions.................................................................................................... 40 7.12 Branches	-0.124939
-0.294265	in example 15.1b. Branches	-0.124939
-0.237951	branch misprediction penalty. Branches	-0.124939
-0.358973	alias upon the double.	-0.124939
-0.593453	2.5, which is double.	-0.124939
-0.525391	vector of two double.	-0.124939
-0.237943	types: long long, double.	-0.124939
-0.599038	www.agner.org/optimize/asmlib.zip. // Example 16.1	-0.124939
-0.549445	or from example 16.1	-0.124939
-0.314776	Testing speed.............................................................................................................. 153 16.1	-0.124939
-0.382884	counter (see below) 16.1	-0.124939
-0.065546	r, c; for (r	-0.425969
-0.106812	double temp; for (r	-0.425969
-0.886044	true, which is 50%	-0.124939
-0.358163	grows by only 50%	-0.124939
-0.489109	a is true 50%	-0.124939
-0.518333	will be mispredicted 50%	-0.124939
-0.582761	so in a suboptimal	-0.124939
-0.200191	CPUs in a suboptimal	-0.124939
-0.505020	vectorization leads to suboptimal	-0.124939
-0.176509	0.18 0.11 memcpy 16kB	-0.124939
-0.176509	0.44 0.12 memcpy 16kB	-0.124939
-0.176509	Test Processor memcpy 16kB	-0.124939
-0.176509	0.28 0.22 memcpy 16kB	-0.124939
-0.504003	priorities to different tasks.	-0.124939
-0.523361	even for simple tasks.	-0.124939
-0.353451	more complicated mathematical tasks.	-0.124939
-0.237943	multiple logically distinct tasks.	-0.124939
-0.599813	mode if the image	-0.124939
-0.358882	signal processing and image	-0.124939
-0.463478	calculations. Examples are image	-0.124939
-0.325422	3-dimensional vectors RGB image	-0.124939
-0.358975	to determine the worst-case	-0.124939
-0.450263	relevant when testing worst-case	-0.124939
-0.299778	response time under worst-case	-0.124939
-0.299778	be tested under worst-case	-0.124939
-0.347504	single result, true (1)	-0.124939
-0.382884	against this problem: (1)	-0.124939
-0.237943	public data object: (1)	-0.124939
-0.237943	this address. Step (1)	-0.124939
-0.898132	b is a float,	-0.124939
-0.527282	The representation of float,	-0.124939
-0.548593	conversion Conversions between float,	-0.124939
-0.331928	such as int, float,	-0.124939
-0.560418	suitable for example 9.5	-0.124939
-0.353537	we modify example 9.5	-0.124939
-0.492263	of register variables. 9.5	-0.124939
-0.325421	together ...................................... 88 9.5	-0.124939
-1.035068	{ a[i] = Induction;	-0.124939
-0.495249	; a[i] = Induction;	-0.124939
-0.142369	; a[i+1] = Induction;	-0.124939
-0.142369	Induction; a[i+1] = Induction;	-0.124939
-0.527317	are writing to uncached	-0.124939
-0.527103	program code are uncached	-0.124939
-0.504490	expensive than an uncached	-0.124939
-0.355833	memory store An uncached	-0.124939
-0.588055	N into the individual	-0.124939
-0.573297	the access to individual	-0.124939
-1.253868	rather than by individual	-0.124939
-0.521702	order to identify individual	-0.124939
-0.557723	allows it to begin	-0.124939
-0.358830	have names that begin	-0.124939
-0.835388	the microprocessor can begin	-0.124939
-0.355976	the array must begin	-0.124939
-0.828312	to the user interface.	-0.124939
-0.501478	than the user interface.	-0.124939
-0.274771	a graphical user interface.	-0.124939
-0.599038	alloca: // Example 9.3	-0.124939
-0.357275	100 As table 9.3	-0.124939
-0.336313	organization ................................................................................................... 87 9.3	-0.124939
-0.538937	and VIA CPUs". 9.3	-0.124939
-0.577723	explanation of this option.	-0.124939
-0.128374	turn on this option.	-0.124939
-0.339503	without the -fpic option.	-0.124939
-0.599322	column to the diagonal.	-0.124939
-0.593524	it at the diagonal.	-0.124939
-0.314594	28 above the diagonal.	-0.124939
-0.314594	position above the diagonal.	-0.124939
-0.463586	user interfaces and interfaces	-0.124939
-0.530678	development of user interfaces	-0.124939
-0.541789	own graphical user interfaces	-0.124939
-0.354661	access to hardware interfaces	-0.124939
-0.140434	function of 4 floats	-0.124939
-0.140434	Structure of 4 floats	-0.124939
-0.535454	vectors of four floats	-0.124939
-0.490322	array of 100 floats	-0.124939
-0.570154	one function to another.	-0.124939
-0.504539	one object to another.	-0.124939
-0.726303	one way or another.	-0.124939
-0.463145	one thread than another.	-0.124939
-0.587169	T, unsigned int N>	-0.124939
-0.356737	<bool IsPowerOf2, int N>	-0.124939
-0.143415	2 template <int N>	-0.124939
-0.143415	N template <int N>	-0.124939
-0.172700	128 bytes. 7.19 Class	-0.124939
-0.172700	............................................................................ 51 7.19 Class	-0.124939
-0.172700	on performance. 7.18 Class	-0.124939
-0.172700	classes............................................................................................ 51 7.18 Class	-0.124939
-0.842332	in the program. Small	-0.124939
-0.478759	run in parallel. Small	-0.124939
-0.538937	cannot be controlled. Small	-0.124939
-0.294256	make vectorization favorable: Small	-0.124939
-0.358830	the trick that N1	-0.124939
-0.460182	0. The constant N1	-0.124939
-0.351304	of N: #define N1	-0.124939
-0.237943	* powN<true,N-N1>::p(x); #undef N1	-0.124939
-1.459073	The advantages of alloca	-0.124939
-0.358386	space explicitly when alloca	-0.124939
-1.138206	function in which alloca	-0.124939
-1.240777	the function returns. alloca	-0.124939
-0.463593	pointer alignment and aliasing.	-0.124939
-0.332549	has no pointer aliasing.	-0.124939
-0.332549	about no pointer aliasing.	-0.124939
-0.468524	Assume no pointer aliasing.	-0.124939
-0.886551	branch can be eliminated	-0.124939
-0.593712	reference can be eliminated	-0.124939
-1.487978	can also be eliminated	-0.124939
-0.951186	can sometimes be eliminated	-0.124939
-0.590044	stage that a detailed	-0.124939
-0.659687	compiler documentation for detailed	-0.124939
-0.549924	sets A more detailed	-0.124939
-0.524596	abstraction which makes detailed	-0.124939
-0.353446	256 double 256 F32vec4	-0.124939
-0.366835	xx4(x4); // x^4 F32vec4	-0.124939
-0.212327	x^2, x^3, x^4 F32vec4	-0.124939
-0.325421	of four floats F32vec4	-0.124939
-0.358724	can clear or mask	-0.124939
-0.522737	zero); // Use mask	-0.124939
-0.062276	a bit-mask: __m128i mask	-0.425969
-1.614434	sure that the original	-0.124939
-0.893795	called when the original	-0.124939
-1.003142	checks whether the original	-0.124939
-0.358852	and disadvantages. The original	-0.124939
-0.504268	costly because all caches	-0.124939
-0.502087	details about how caches	-0.124939
-0.353663	heuristic guidelines. Most caches	-0.124939
-0.237943	invalidate each other's caches	-0.124939
-0.550764	products fail to recognize	-0.124939
-0.573073	case it will recognize	-0.124939
-1.003003	the compiler will recognize	-0.124939
-1.193219	Most compilers will recognize	-0.124939
-0.212321	378.7 168.5 513 513	-0.124939
-0.212321	2048 230.7 513 513	-0.124939
-0.237951	512 378.7 168.5 513	-0.124939
-0.237951	512 2048 230.7 513	-0.124939
-0.421437	method. 7.29 Threads Threads	-0.124939
-0.102879	template method. 7.29 Threads	-0.124939
-0.102879	7.28 Templates...............................................................................................................57 7.29 Threads	-0.124939
-0.237951	possible in Linux). Threads	-0.124939
-0.172700	overloaded functions. 7.27 Overloaded	-0.124939
-0.172700	.............................................................................................. 56 7.27 Overloaded	-0.124939
-0.102881	<<6 ); 7.26 Overloaded	-0.124939
-0.102881	................................................................................................................... 56 7.26 Overloaded	-0.124939
-1.090053	power of 2. Contentions	-0.124939
-0.294256	branch target buffer. Contentions	-0.124939
-0.237943	target buffer (BTB). Contentions	-0.124939
-0.237943	in my experiments. Contentions	-0.124939
-1.488091	This method is illustrated	-0.124939
-0.358639	This technique is illustrated	-0.124939
-0.600613	effect can be illustrated	-0.124939
-0.358620	bounds checking, as illustrated	-0.124939
-0.179254	size. In other words,	-0.124939
-0.179254	parameter. In other words,	-0.124939
-0.179254	for. In other words,	-0.124939
-0.179254	safe. In other words,	-0.124939
-0.358848	but risky. The returned	-0.124939
-1.176243	objects can be returned	-0.124939
-0.594921	type can be returned	-0.124939
-0.579141	composite objects are returned	-0.124939
-0.463535	new one. The existing	-0.124939
-0.939489	of compatibility with existing	-0.124939
-0.572923	functionality to an existing	-0.124939
-0.356748	function modify an existing	-0.124939
-1.210818	in the code. Let's	-0.124939
-0.347504	loss of precision. Let's	-0.124939
-0.237943	= 4 rows. Let's	-0.124939
-0.237943	of possible inputs. Let's	-0.124939
-0.314164	stored as it is,	-0.124939
-0.314164	executed as it is,	-0.124939
-0.461998	RAM than there is,	-0.124939
-0.569808	point. The reason is,	-0.124939
-0.634172	The following example illustrates	-0.124939
-0.107130	The pitfalls of unit-testing	-0.124939
-0.504674	measuring performance by unit-testing	-0.124939
-0.358595	software development. This unit-testing	-0.124939
-0.128643	int i; for(i=0; i<300;	-0.301030
-0.237959	i, i_div_3; for(i=i_div_3=0; i<300;	-0.124939
-0.556213	Sum2(S3 * p) {return	-0.124939
-0.502428	Sum3(S3 & r) {return	-0.124939
-0.237943	403 int ReadB() {return	-0.124939
-0.237943	b; int Sum1() {return	-0.124939
-0.355235	= a2 / b2;	-0.124939
-0.324554	a1, a2, b1, b2;	-0.425969
-0.294265	{ public: B2 b2;	-0.124939
-0.341874	real time applications. Remember	-0.124939
-0.331919	and variable names. Remember	-0.124939
-0.421414	model work better. Remember	-0.124939
-0.237943	up and down. Remember	-0.124939
-0.527169	simple cases. The explicit	-0.124939
-0.524079	and making an explicit	-0.124939
-0.356748	specified. Insert an explicit	-0.124939
-1.435824	programmer to make explicit	-0.124939
-0.594818	row-wise, then the mirror	-0.124939
-0.788934	be optimal to mirror	-0.124939
-0.591723	time. You may mirror	-0.124939
-0.356090	matrix[c][r] at its mirror	-0.124939
-0.598235	combination of a dedicated	-0.124939
-1.079588	rather than a dedicated	-0.124939
-0.778406	slower than a dedicated	-0.124939
-0.874815	also have a dedicated	-0.124939
-0.380858	public: virtual void Disp()	-0.425969
-0.426832	{ public: void Disp()	-0.425969
-0.181613	b[SIZE][SIZE]) { int r,	-0.425969
-0.353126	x=y; y=temp;} int r,	-0.124939
-0.353126	9.5a: 98 int r,	-0.124939
-0.594248	code. // Example 8.26a	-0.124939
-0.459307	mode): ; Example 8.26a	-0.124939
-0.923683	code from example 8.26a	-0.124939
-0.353544	compiler optimize example 8.26a	-0.124939
-0.505066	cannot set a breakpoint	-0.124939
-0.057021	the interrupt 3 breakpoint	-0.124939
-0.481347	insert a fixed breakpoint	-0.124939
-0.005763	double a1, a2, b1,	-0.425969
-0.005763	y, a1, a2, b1,	-0.425969
-0.358830	logical register that appears	-0.124939
-0.358723	automatically, although it appears	-0.124939
-0.540654	automatically if this appears	-0.124939
-0.355834	and better processor appears	-0.124939
-0.358973	for verifying the functionality	-0.124939
-0.354645	plug-ins that add functionality	-0.124939
-0.828356	obtain the desired functionality	-0.124939
-0.675797	with a well-defined functionality	-0.124939
-0.556474	found in other languages.	-0.124939
-0.513602	choose other programming languages.	-0.124939
-0.346495	and various programming languages.	-0.124939
-0.237951	other less well-known languages.	-0.124939
-0.358924	an algorithm of sequential	-0.124939
-1.342105	are accessed in sequential	-0.124939
-0.659315	switch statement with sequential	-0.124939
-0.294256	accessed in non- sequential	-0.124939
-0.202086	this manual at www.agner.org/optimize/cppexamples.zip	-0.124939
-0.352518	the appendix at www.agner.org/optimize/cppexamples.zip	-0.124939
-0.357056	the alignment. See www.agner.org/optimize/cppexamples.zip	-0.124939
-0.061762	matter of programming style.	-0.249877
-0.065714	example 9.6b. The MOVNTQ	-0.425969
-0.358752	_mm_stream_pi((__m64*)dest, *(__m64*)&source); // MOVNTQ	-0.124939
-1.052955	bytes without cache MOVNTQ	-0.124939
-0.358934	we assume is optimized.	-0.124939
-1.190962	code is not optimized.	-0.124939
-0.357012	functions, but less optimized.	-0.124939
-0.343742	not always fully optimized.	-0.124939
-1.257917	code and data .........................................................................................	-0.124939
-0.711683	7.32 Preprocessing directives .........................................................................................	-0.124939
-0.764675	12.3 Automatic vectorization .........................................................................................	-0.124939
-0.538937	Specific optimization topics .........................................................................................	-0.124939
-0.357946	virtual functions class CHello	-0.124939
-0.593070	C1 : public CHello	-0.124939
-0.419955	C2 : public CHello	-0.124939
-0.294265	Object1; C2 Object2; CHello	-0.124939
-0.594316	work can be found	-0.124939
-0.594316	(Examples can be found	-0.124939
-0.584031	correctness must be found	-0.124939
-0.346488	advanced features rarely found	-0.124939
-0.566871	was done by me	-0.124939
-0.102879	assembly code. Let me	-0.124939
-0.102879	and sets. Let me	-0.124939
-0.237951	who have sent me	-0.124939
-0.886585	value from the counts.	-0.124939
-0.503019	the two clock counts.	-0.124939
-0.749172	than the subsequent counts.	-0.124939
-0.595245	the "worst case" counts.	-0.124939
-0.538716	mispredictions. The performance measurement	-0.124939
-0.565208	is to put measurement	-0.124939
-0.563087	put the desired measurement	-0.124939
-0.237943	this method. Your measurement	-0.124939
-0.462006	go through multiple layers	-0.124939
-0.357249	other extra software layers	-0.124939
-0.355680	that requires several layers	-0.124939
-0.353440	number of separate layers	-0.124939
-0.576348	If an error handler	-0.124939
-0.261953	of the exception handler	-0.124939
-0.261953	to the exception handler	-0.124939
-0.261953	then the exception handler	-0.124939
-0.595709	offset that is coded	-0.124939
-0.598223	instructions. This is coded	-0.124939
-1.786351	that can be coded	-0.124939
-0.595613	instructions that are coded	-0.124939
-0.726685	is small and changing	-0.124939
-0.525009	possible and by changing	-0.124939
-0.525009	the multiplication by changing	-0.124939
-0.642892	the last index changing	-0.124939
-0.600812	time in the unit-test	-0.124939
-0.358824	but unfortunately the unit-test	-0.124939
-0.591879	rely on a unit-test	-0.124939
-0.358446	best under this unit-test	-0.124939
-0.550835	accessed through the implicit	-0.124939
-0.358848	by __fastcall. The implicit	-0.124939
-0.578344	transferred as an implicit	-0.124939
-0.548549	Sum1 has an implicit	-0.124939
-0.527220	packages faster and smaller.	-0.124939
-0.358802	or thread are smaller.	-0.124939
-0.355834	array 800 bytes smaller.	-0.124939
-0.354367	or make files smaller.	-0.124939
-1.050368	be in the interval	-0.124939
-0.883670	integer in the interval	-0.124939
-0.592241	number in the interval	-0.124939
-0.563112	of the desired interval	-0.124939
-0.596504	all because the 33	-0.124939
-0.444407	16.4 65 65 33	-0.124939
-0.294256	7.4 Enums ...................................................................................................................... 33	-0.124939
-0.237943	33 7.5 Booleans................................................................................................................... 33	-0.124939
-0.594056	incremented. See page 31	-0.124939
-0.349768	on integer variables. 31	-0.124939
-0.621845	ebx, eax ebx, 31	-0.124939
-0.325411	element 63 63 31	-0.124939
-0.018985	a = (unsigned int)b	-0.425969
-0.514697	for x86 platforms. 3.	-0.124939
-0.341880	instruction for interrupt 3.	-0.124939
-0.339498	better at vectorization. 3.	-0.124939
-0.237943	an anonymous namespace. 3.	-0.124939
-0.645545	something takes 10 μs	-0.124939
-0.350860	take only 5 μs	-0.124939
-0.237962	ns = 250 μs	-0.124939
-0.237962	Certainly not! 250 μs	-0.124939
-0.861574	from any other module.	-0.124939
-0.458827	system-independent, in another module.	-0.124939
-0.061396	called from another module.	-0.124939
-0.452056	extra code. Dynamic cast	-0.124939
-0.331937	can not. Static cast	-0.124939
-0.237943	to int. Reinterpret cast	-0.124939
-0.237943	VIA CPUs"). Const cast	-0.124939
-0.550490	are stored as 8-bit	-0.124939
-0.503027	expressed as an 8-bit	-0.124939
-0.503027	coded as an 8-bit	-0.124939
-0.849003	we are using 8-bit	-0.124939
-0.452750	specifying the size. Integers	-0.124939
-0.346484	operators Integer sizes Integers	-0.124939
-0.172696	using classes. 7.2 Integers	-0.124939
-0.172696	storage............................................................................. 26 7.2 Integers	-0.124939
-0.655380	7.7 Function pointers Calling	-0.124939
-0.852993	of the class. Calling	-0.124939
-0.331919	VIA CPUs. 5. Calling	-0.124939
-0.237943	then calls exit. Calling	-0.124939
-0.044155	inline int lrint (double	-0.425969
-0.294265	loop double ipow (double	-0.124939
-0.237951	inline double IntegerPower (double	-0.124939
-0.340044	Friday, Saturday }; Weekdays	-0.124939
-0.340044	= 0x40 }; Weekdays	-0.124939
-0.212327	using & enum Weekdays	-0.124939
-0.212327	multiple conditions enum Weekdays	-0.124939
-0.358840	not always for application-specific	-0.124939
-0.854892	efficient to store application-specific	-0.124939
-0.458031	than in optimizing application-specific	-0.124939
-0.507274	able to define application-specific	-0.124939
-1.085380	b and c first.	-0.124939
-0.348397	most predictable operand first.	-0.124939
-0.345255	data members come first.	-0.124939
-0.508513	calculated the fastest first.	-0.124939
-1.442314	some of the considerations	-0.124939
-0.541048	often determined by considerations	-0.124939
-0.588868	satisfactory. The following considerations	-0.124939
-0.294256	reflects the conflicting considerations	-0.124939
-0.502728	one fraction 2 63	-0.124939
-0.934675	Time per element 63	-0.124939
-0.429548	is a valid 63	-0.124939
-0.325411	per element 63 63	-0.124939
-0.505032	A double is represented	-0.124939
-1.663449	it can be represented	-0.124939
-0.594921	Zero can be represented	-0.124939
-1.095635	are in fact represented	-0.124939
-2.124561	in order to force	-0.124939
-0.587211	functions. You can force	-0.124939
-0.357797	automatically come into force	-0.124939
-0.352954	disk. Memory-hungry applications force	-0.124939
-0.939031	to do this manually.	-0.124939
-1.274338	have to do manually.	-0.124939
-0.453385	do the reductions manually.	-0.124939
-0.294256	set the parentheses manually.	-0.124939
-0.596813	containers should be identified	-0.124939
-0.358010	order but are identified	-0.124939
-1.032697	If objects are identified	-0.124939
-0.525719	index. Are objects identified	-0.124939
-0.787539	are given in www.agner.org/optimize/cppexamples.zip.	-0.124939
-0.358424	class templates in www.agner.org/optimize/cppexamples.zip.	-0.124939
-0.901941	this manual at www.agner.org/optimize/cppexamples.zip.	-0.124939
-0.656125	memory pool. See www.agner.org/optimize/cppexamples.zip.	-0.124939
-0.587373	user has a virus	-0.124939
-0.573297	network access to virus	-0.124939
-0.659687	not uncommon for virus	-0.124939
-0.237943	many users. Firewalls, virus	-0.124939
-0.550698	of arrays and structures.	-0.124939
-0.463378	into classes or structures.	-0.124939
-0.356557	have big data structures.	-0.124939
-0.247959	very big data structures.	-0.124939
-1.296543	vector class library exp	-0.124939
-0.346468	functions directly: Library exp	-0.124939
-0.595245	of 4 floats exp	-0.124939
-0.325422	class library exp exp	-0.124939
-1.051740	by the size (in	-0.124939
-0.566902	objects. The size (in	-0.124939
-0.459193	a matrix line (in	-0.124939
-0.828409	CPU clock frequency (in	-0.124939
-0.897485	p is a pointer,	-0.124939
-0.358619	simple type, a pointer,	-0.124939
-0.345265	pointers, references, 'this' pointer,	-0.124939
-0.237951	through an imported pointer,	-0.124939
-0.358937	carry bit is kept	-0.124939
-0.916754	should preferably be kept	-0.124939
-0.587154	Sometimes, functions are kept	-0.124939
-0.357844	+ A; double Y	-0.124939
-0.837509	Update induction variable Y	-0.124939
-0.565164	two induction variables Y	-0.124939
-0.237943	Table[x] = Y; Y	-0.124939
-0.358390	cross-module optimizations when interprocedural	-0.124939
-1.267935	compiler to do interprocedural	-0.124939
-0.212327	efficient and enables interprocedural	-0.124939
-0.366835	modules. This enables interprocedural	-0.124939
-0.462470	because these are incompatible	-0.124939
-0.462470	optimization options are incompatible	-0.124939
-1.427353	makes the code incompatible	-0.124939
-0.294265	difficult to use, incompatible	-0.124939
-0.353455	(128 or 256 bytes)	-0.124939
-0.057020	the size (in bytes)	-0.124939
-0.057020	The size (in bytes)	-0.124939
-0.122675	matrix line (in bytes)	-0.124939
-1.071985	points to the selected	-0.124939
-0.841789	not have the selected	-0.124939
-1.782485	the code is selected	-0.124939
-0.598300	program may be selected	-0.124939
-0.557742	its value is multiplied	-0.124939
-0.595834	counts should be multiplied	-0.124939
-0.585254	index must be multiplied	-0.124939
-0.452759	plus an index multiplied	-0.124939
-0.358423	more reliable and reproducible	-0.124939
-0.358423	as accurate and reproducible	-0.124939
-0.358393	to get more reproducible	-0.124939
-0.569881	difficult to get reproducible	-0.124939
-0.579136	Shared objects are normally	-0.124939
-0.590507	Compilers do not normally	-0.124939
-0.358592	everything else. This normally	-0.124939
-0.460721	and Mac systems normally	-0.124939
-0.593912	space used for constants.	-0.124939
-1.369865	two or more constants.	-0.124939
-0.525913	problems for integer constants.	-0.124939
-0.325411	used for defining constants.	-0.124939
-0.358631	data cache, code cache,	-0.124939
-0.571608	contents of data cache,	-0.124939
-0.936184	the level-1 data cache,	-0.124939
-1.533408	share the same cache,	-0.124939
-0.358620	pointer serves as entry	-0.124939
-0.653399	for the common entry	-0.124939
-0.364855	replaces the PLT entry	-0.124939
-0.279499	function. The PLT entry	-0.124939
-0.563037	The performance is inferior	-0.124939
-0.577505	64-bit compilers are inferior	-0.124939
-0.358551	will run an inferior	-0.124939
-0.355833	it supports. An inferior	-0.124939
-1.657325	likely to be obsolete.	-0.124939
-0.358052	will soon be obsolete.	-0.124939
-0.102881	books 1994. Mostly obsolete.	-0.124939
-0.102881	Wesley 1997. Mostly obsolete.	-0.124939
-0.653960	do calculations while simultaneously	-0.124939
-0.898243	doing multiple calculations simultaneously	-0.124939
-0.353219	Multiple applications running simultaneously	-0.124939
-0.331919	or more jobs simultaneously	-0.124939
-0.408211	An interrupt service routine	-0.124939
-0.065813	function. The initialization routine	-0.124939
-0.015542	has an initialization routine	-0.425969
-0.527103	smart pointers are auto_ptr	-0.124939
-0.786535	transferred from one auto_ptr	-0.124939
-0.325411	and only one, auto_ptr	-0.124939
-0.237943	auto_ptr and shared_ptr. auto_ptr	-0.124939
-0.561704	call. A branch tree	-0.124939
-0.439487	be a binary tree	-0.124939
-0.123744	case. A binary tree	-0.124939
-0.123744	moved. A binary tree	-0.124939
-0.620219	the compiler is unable	-0.425969
-0.974882	program will be unable	-0.124939
-0.564441	user will be unable	-0.124939
-0.527373	code is executed. Optimizes	-0.124939
-0.428547	and automatic vectorization. Optimizes	-0.124939
-0.302962	do automatic vectorization. Optimizes	-0.124939
-0.444408	standard calling conventions. Optimizes	-0.124939
-0.595556	variables, floating point constants,	-0.124939
-0.354697	floating 26 point constants,	-0.124939
-0.061970	point constants, string constants,	-0.124939
-1.076373	discussion of the techniques	-0.124939
-0.358824	before trying the techniques	-0.124939
-0.588872	runtime). The following techniques	-0.124939
-0.353657	expensive. Using complicated techniques	-0.124939
-1.013473	the function or otherwise	-0.124939
-0.358102	| operator which otherwise	-0.124939
-0.839504	even if they otherwise	-0.124939
-0.354788	errors that would otherwise	-0.124939
-0.102879	member pointer. 7.9 Smart	-0.124939
-0.102879	Member pointers.......................................................................................................37 7.9 Smart	-0.124939
-0.294265	pointer is deleted. Smart	-0.124939
-0.237951	than for auto_ptr. Smart	-0.124939
-1.186140	or if it opens	-0.124939
-1.069202	that a function opens	-0.124939
-0.358102	calling WritePrivateProfileString, which opens	-0.124939
-0.597365	new instruction set opens	-0.124939
-0.494005	that may be modified	-0.425969
-0.890290	data that are modified	-0.124939
-0.754205	that are never modified	-0.124939
-0.497500	can convert example 15.1a	-0.124939
-0.353544	automatically reduces example 15.1a	-0.124939
-0.400276	Intel compiler reduced 15.1a	-0.124939
-0.308384	the compilers reduced 15.1a	-0.124939
-0.642883	and x86-64 platforms. Comparison	-0.124939
-0.294265	- Table 8.1. Comparison	-0.124939
-0.102879	this optimization. 8.2 Comparison	-0.124939
-0.102879	............................................................................................ 66 8.2 Comparison	-0.124939
-1.196545	before it is finished	-0.124939
-0.504357	all threads have finished	-0.124939
-0.185464	before it has finished	-0.425969
-1.370699	when it is run.	-0.124939
-1.037355	the program is run.	-0.124939
-0.592347	before it can run.	-0.124939
-0.065346	9.9 Access data sequentially	-0.124939
-0.357080	not necessarily stored sequentially	-0.124939
-1.115746	can be accessed sequentially	-0.124939
-0.358293	programming manuals from Intel:	-0.124939
-0.356818	on code optimization Intel:	-0.124939
-0.325422	obsolete. Microprocessor documentation Intel:	-0.124939
-0.538937	are produced regularly. Intel:	-0.124939
-0.590458	libraries in this format.	-0.124939
-0.556261	the long double format.	-0.124939
-0.537468	usual object file format.	-0.124939
-0.237943	converted to OMF format.	-0.124939
-0.814654	errors in C++ programs.	-0.124939
-0.355334	obscured in optimized programs.	-0.124939
-0.347512	compatible with 16-bit programs.	-0.124939
-0.343767	than non-object oriented programs.	-0.124939
-1.152870	in a non-sequential manner	-0.124939
-0.382884	in a systematic manner	-0.124939
-0.237943	a rather unconventional manner	-0.124939
-0.237943	in a column-wise manner	-0.124939
-0.657827	of how compilers work.	-0.124939
-0.850941	do not always work.	-0.124939
-0.356103	concentrating on important work.	-0.124939
-0.352398	compilers and microprocessors work.	-0.124939
-0.659443	long long or uint64_t	-0.124939
-0.539255	Vec2q 64 2 uint64_t	-0.124939
-0.353438	int int64_t 256 uint64_t	-0.124939
-0.237943	64 0 264-1 uint64_t	-0.124939
-0.499461	&CriticalFunction_AVX; } if (level	-0.124939
-0.501349	} else if (level	-0.124939
-0.065297	many branches): if (level	-0.425969
-0.358895	worked well in tests	-0.124939
-0.659694	C++ compilers The tests	-0.124939
-0.357374	may make some tests	-0.124939
-0.357281	testing Most performance tests	-0.124939
-0.336321	way three times. Then	-0.124939
-0.325411	with profiling support. Then	-0.124939
-0.325411	the -fpic option. Then	-0.124939
-0.237943	somewhere in F1? Then	-0.124939
-0.572814	solution where a soft	-0.124939
-0.540748	processor. Such a soft	-0.124939
-0.348379	as a so-called soft	-0.124939
-0.314785	processors and FPGA soft	-0.124939
-0.577844	-100, b = 100,	-0.124939
-0.578561	100, c = 100,	-0.124939
-0.355572	int min = 100,	-0.124939
-0.355572	int NUMROWS = 100,	-0.124939
-0.682718	gives more reliable results.	-0.124939
-0.284365	reliable and reproducible results.	-0.124939
-0.212321	to get reproducible results.	-0.124939
-0.294265	may produce undesired results.	-0.124939
-0.358929	to tell a hyperthreading	-0.124939
-1.481161	advantageous to use hyperthreading	-0.124939
-0.358117	particular application. If hyperthreading	-0.124939
-1.055986	you can avoid hyperthreading	-0.124939
-0.463586	simplest expressions and operators.	-0.124939
-0.126065	classes and overloaded operators.	-0.124939
-0.126065	constructors and overloaded operators.	-0.124939
-0.408213	increment and decrement operators.	-0.124939
-0.892456	leaf function is simpler	-0.124939
-0.526862	The syntax is simpler	-0.124939
-0.554101	addresses is much simpler	-0.124939
-1.037056	the code becomes simpler	-0.124939
-1.676742	the floating point format	-0.124939
-0.139569	The intermediate file format	-0.124939
-0.139569	an intermediate file format	-0.124939
-1.086499	to the right format	-0.124939
-0.870566	time or a reasonable	-0.124939
-0.587005	(or if a reasonable	-0.124939
-0.526168	that the only reasonable	-0.124939
-0.462419	Useful when no reasonable	-0.124939
-1.064864	done with the resolution	-0.124939
-0.353211	a very high resolution	-0.124939
-0.454925	A much higher resolution	-0.124939
-0.294256	measured with millisecond resolution	-0.124939
-0.503712	or more integer units,	-0.124939
-0.355908	often have execution units,	-0.124939
-1.190247	floating point addition units,	-0.124939
-0.341880	flip-flops, multiplexers, arithmetic units,	-0.124939
-2.157755	Example: // Example 12.2	-0.124939
-0.356035	the library function. 12.2	-0.124939
-0.331919	registers ................................................................. 107 12.2	-0.124939
-0.325411	127 127 126 12.2	-0.124939
-0.050952	bb into vector b:	-0.249877
-0.358264	the time-consuming data processing.	-0.124939
-0.346472	directives for parallel processing.	-0.124939
-0.325411	processing and image processing.	-0.124939
-0.314790	directives for multi-core processing.	-0.124939
-0.585672	functionality and a well-defined	-0.124939
-0.846618	library with a well-defined	-0.124939
-0.846618	modules with a well-defined	-0.124939
-0.336346	the overflow behavior well-defined	-0.124939
-0.106586	of 2 // Still	-0.425969
-0.065391	is faster // Still	-0.425969
-0.356880	takes 14 - 45	-0.124939
-0.356880	multiplication (20 - 45	-0.124939
-0.591339	7.30b int i; 45	-0.124939
-0.237951	43 7.13 Loops...................................................................................................................... 45	-0.124939
-0.340384	consecutive elements from bb	-0.602060
-0.349900	} 115 from bb	-0.124939
-0.835150	are explained in detail	-0.124939
-0.566763	are described in detail	-0.124939
-0.504276	described in more detail	-0.124939
-0.601251	Many of the advices	-0.124939
-0.352394	Manual". developer.intel.com. Many advices	-0.124939
-0.336323	the above security advices	-0.124939
-0.357582	page 71). The conclusion	-0.124939
-0.357582	make utility. The conclusion	-0.124939
-0.357582	than 1.23456. The conclusion	-0.124939
-0.526866	points to is deleted	-0.124939
-1.327355	the object is deleted	-0.124939
-0.550030	function and later deleted	-0.124939
-0.598549	parameters and the 49	-0.124939
-0.587716	functions. See page 49	-0.124939
-0.542727	Windows (See page 49	-0.124939
-0.357473	loop counters, function parameters,	-0.124939
-0.357473	called from), function parameters,	-0.124939
-0.502041	size as template parameters,	-0.124939
-0.356824	faster vectorized code. Storing	-0.124939
-0.853015	of the class. Storing	-0.124939
-1.116882	in 32-bit mode. Storing	-0.124939
-0.463586	= 0x2710 and (set)	-0.124939
-0.787624	then we have (set)	-0.124939
-0.237951	by the formula: (set)	-0.124939
-0.358887	coarse-grained parallelism and fine-grained	-0.124939
-0.463293	parallelism than with fine-grained	-0.124939
-0.237951	methods for exploiting fine-grained	-0.124939
-0.341889	or simply zero. Execution	-0.124939
-0.102881	dependency chain. 3.16 Execution	-0.124939
-0.102881	................................................................................................ 22 3.16 Execution	-0.124939
-0.168457	__m128i c = LoadVector(cc	-0.425969
-0.447356	Is16vec8 c = LoadVector(cc	-0.124939
-0.802384	loaded at an arbitrary	-0.124939
-0.535190	loaded into an arbitrary	-0.124939
-0.354956	is just an arbitrary	-0.124939
-0.512942	inside the class. Which	-0.124939
-0.294265	"best case" values. Which	-0.124939
-0.294265	the same effect. Which	-0.124939
-0.504447	the compilers may behave	-0.124939
-0.357912	number). Different compilers behave	-0.124939
-0.501627	compiler to always behave	-0.124939
-0.356927	represented with 64 bits,	-0.124939
-0.460518	int is 32 bits,	-0.124939
-0.355130	i to four bits,	-0.124939
-0.358887	be platform-independent and compact.	-0.124939
-0.462957	making data more compact.	-0.124939
-0.502345	is slightly less compact.	-0.124939
-0.762486	container class that behaves	-0.124939
-0.503883	an object that behaves	-0.124939
-0.294275	142). 30 Overflow behaves	-0.124939
-0.550707	emulated processors and FPGA	-0.124939
-0.501975	core and an FPGA	-0.124939
-0.577344	microprocessor in an FPGA	-0.124939
-0.550698	AMD processors and earlier	-0.124939
-0.463293	possibly not with earlier	-0.124939
-0.356268	SVML v.10.2 & earlier	-0.124939
-0.350153	while (seconds < 5)	-0.124939
-0.350153	while (0 < 5)	-0.124939
-0.492283	&SelectAddMul_AVX2; (iset >= 5)	-0.124939
-0.419092	cycle. The operators &,	-0.124939
-0.692188	The bitwise operators &,	-0.124939
-0.432016	corresponding bitwise operators &,	-0.124939
-0.357936	chapter 10 page 101	-0.124939
-0.331930	subtasks is necessary. 101	-0.124939
-0.237951	99 10 Multithreading.............................................................................................................. 101	-0.124939
-0.109609	for the following reasons:	-0.301030
-0.565256	storing the elements consecutively	-0.124939
-0.583061	structure are stored consecutively	-0.124939
-0.870020	rows are accessed consecutively	-0.124939
-1.204760	caching less efficient. Extra	-0.124939
-0.352979	double. Misaligned data. Extra	-0.124939
-0.339494	Pentium 4 processor. Extra	-0.124939
-1.557458	in case of error.	-0.124939
-0.358555	line provokes an error.	-0.124939
-0.721074	a common programming error.	-0.124939
-0.900201	operations can be carried	-0.124939
-0.351712	The tests were carried	-0.124939
-0.237951	A longer loop- carried	-0.124939
-0.463324	made smaller by reordering	-0.124939
-0.358595	S1 ArrayOfStructures[100]; This reordering	-0.124939
-0.526578	can make this reordering	-0.124939
-0.499719	ported to another platform.	-0.124939
-0.353860	run on Mac platform.	-0.124939
-0.626434	on a PC platform.	-0.124939
-1.492986	if you are satisfied	-0.124939
-0.358015	Those who are satisfied	-0.124939
-1.171836	you are not satisfied	-0.124939
-0.583956	safer. It may catch	-0.124939
-0.840749	above code will catch	-0.124939
-0.358338	{ F1(); } catch	-0.124939
-0.294265	-msse4.1 /arch:SSE4.1 -mAVX /arch:AVX	-0.124939
-0.237951	Static linking (multithreaded) /arch:AVX	-0.124939
-0.237951	instruction set (/arch:SSE2, /arch:AVX	-0.124939
-0.594069	allocation. See page 93	-0.124939
-0.347518	stored are containers 93	-0.124939
-0.314785	Container classes ..................................................................................................... 93	-0.124939
-1.053261	of the virtual 53	-0.124939
-0.382896	member functions ........................................................................................ 53	-0.124939
-0.237951	member functions (methods)......................................................................... 53	-0.124939
-0.341883	Alignd(X) __declspec(align(16)) X #else	-0.124939
-0.331936	: "memory" ); #else	-0.124939
-0.294265	#define pure_function __attribute__((const)) #else	-0.124939
-0.501975	multiplication and an addition.	-0.124939
-0.501975	but only an addition.	-0.124939
-1.689471	a floating point addition.	-0.124939
-1.574572	at compile time. Text	-0.124939
-0.736070	efficient container classes. Text	-0.124939
-0.408199	needs. 9.8 Strings Text	-0.124939
-0.128790	to systems with big-endian	-0.425969
-0.459761	other platforms with big-endian	-0.124939
-0.538953	B1; class B2; 54	-0.124939
-0.237951	identification (RTTI) ........................................................................... 54	-0.124939
-0.237951	7.22 Inheritance .............................................................................................................. 54	-0.124939
-0.358887	page 145 and 119	-0.124939
-0.348384	any non-vector library. 119	-0.124939
-0.237951	functions for vectors........................................................................ 119	-0.124939
-0.549666	b) a && !a	-0.124939
-0.772314	false, a || !a	-0.124939
-0.382896	(a&&c) = a&&(b||c) !a	-0.124939
-0.504445	extra level of abstraction	-0.124939
-0.143192	several layers of abstraction	-0.124939
-0.143192	separate layers of abstraction	-0.124939
-0.336350	of 8 bits each,	-0.124939
-0.502435	of 16 bits each,	-0.124939
-0.548493	of 32 bits each,	-0.124939
-0.355156	if (level >= 11)	-0.425969
-0.294275	Out-of-order execution (chapter 11)	-0.124939
-0.505018	65 bytes of code).	-0.124939
-0.351708	to produce binary code).	-0.124939
-0.237951	intermediate code (byte code).	-0.124939
-1.411084	Dynamic memory allocation ......................................................................................	-0.124939
-0.595280	pitfalls of unit-testing ......................................................................................	-0.124939
-0.538953	a clock cycle? ......................................................................................	-0.124939
-0.596417	pipeline. If the wrong	-0.124939
-0.358826	has chosen the wrong	-0.124939
-0.893540	type-casted to a wrong	-0.124939
-0.168360	__m128i b = LoadVector(bb	-0.425969
-0.446989	Is16vec8 b = LoadVector(bb	-0.124939
-0.527324	32-bit integers to alias	-0.124939
-0.356510	pointer does not alias	-0.124939
-0.575080	copying of memory blocks,	-0.124939
-0.455626	keep multiple memory blocks,	-0.124939
-0.499172	of large memory blocks,	-0.124939
-0.356645	features. Take user feedback	-0.124939
-0.554980	where the main feedback	-0.124939
-0.314785	new features. User feedback	-0.124939
-0.321864	__attribute__((const)) #else #define pure_function	-0.124939
-0.321864	#ifdef __GNUC__ #define pure_function	-0.124939
-0.237959	#endif double Func1(double) pure_function	-0.124939
-1.482513	= a - a-a	-0.124939
-1.352899	a - n.a. a-a	-0.124939
-0.294265	xxxxxxxxx x-xxx---- a-(-b)=a+b a-a	-0.124939
-0.308597	no long dependency chains.	-0.124939
-0.127447	avoid long dependency chains.	-0.124939
-1.592371	is used for prefetching	-0.124939
-0.648537	rely on automatic prefetching	-0.124939
-0.325429	calculations while simultaneously prefetching	-0.124939
-0.885205	to see the compiler-generated	-0.124939
-0.788785	function libraries and compiler-generated	-0.124939
-0.331930	read and understand compiler-generated	-0.124939
-0.358361	good investment. A redesign	-0.124939
-0.172700	to a complete redesign	-0.124939
-0.172700	data. A complete redesign	-0.124939
-0.172700	compilers may behave differently	-0.124939
-0.172700	Different compilers behave differently	-0.124939
-0.314803	30 Overflow behaves differently	-0.124939
-0.572398	A and then B,	-0.124939
-0.037170	x; int A, B,	-0.425969
-0.403897	automatic vectorization. Optimizes reasonably	-0.124939
-0.212327	calling conventions. Optimizes reasonably	-0.124939
-0.314795	Visual Studio optimizes reasonably	-0.124939
-0.780012	same processor core. Two	-0.124939
-0.575671	to using templates. Two	-0.124939
-0.237951	to write _mm_add_epi16(a,b). Two	-0.124939
-0.294265	and destructors .................................................................................. 55	-0.124939
-0.237951	will give -2.0 55	-0.124939
-0.237951	7.24 Unions .................................................................................................................... 55	-0.124939
-0.239234	of vector math libraries:	-0.124939
-0.239234	long vector math libraries:	-0.124939
-0.345385	short vector math libraries:	-0.124939
-0.657611	be linked into projects	-0.124939
-0.561431	maintainability of C++ projects	-0.124939
-0.502680	are that software projects	-0.124939
-0.208794	, longdoublevalue ( 1)sign	-0.124939
-0.208794	, doublevalue ( 1)sign	-0.124939
-0.208794	follows: floatvalue ( 1)sign	-0.124939
-2.158402	Example: // Example 14.6	-0.124939
-0.596134	list[300] = 0; 14.6	-0.124939
-0.314785	Integer division...................................................................................................... 137 14.6	-0.124939
-0.596871	solution is the combination	-0.124939
-0.891349	constant with a combination	-0.124939
-0.331936	zero. An OR combination	-0.124939
-0.527077	optimized Intel function libraries,	-0.124939
-0.147288	or dynamic link libraries,	-0.425969
-0.355840	that volatile doesn't mean	-0.124939
-0.352074	data (low numbers mean	-0.124939
-0.408199	The square brackets mean	-0.124939
-0.594075	Instrumentation: The compiler inserts	-0.124939
-0.356886	Gnu compiler often inserts	-0.124939
-0.539708	Debugging. The profiler inserts	-0.124939
-0.137169	{ return _mm_loadu_si128((__m128i const*)p);	-0.425969
-0.237959	{ return _mm_load_si128((__m128i const*)p);	-0.124939
-0.585989	caller through a hidden	-0.124939
-0.596813	function) should be hidden	-0.124939
-0.531311	16 is actually hidden	-0.124939
-0.596459	a*(b+c) - n.a. x*x*x*x*x*x*x*x	-0.124939
-0.102881	d = ((a*x+b)*x+c)*x+d x*x*x*x*x*x*x*x	-0.124939
-0.102881	x x ((a*x+b)*x+c)*x+d x*x*x*x*x*x*x*x	-0.124939
-0.309542	to recover from errors.	-0.124939
-0.900845	to prevent such errors.	-0.124939
-0.498862	bytes of memory. One	-0.124939
-0.467677	powerful development tools. One	-0.124939
-0.408199	done only once. One	-0.124939
-0.304419	is called square blocking	-0.124939
-0.304419	techniques like square blocking	-0.124939
-0.237959	the effort. Square blocking	-0.124939
-0.065653	if unsigned // Faster	-0.425969
-0.314795	to find elsewhere. Faster	-0.124939
-0.343757	out some typical sources	-0.124939
-0.117642	schemes are frequent sources	-0.124939
-0.117642	frameworks are frequent sources	-0.124939
-0.358926	is limited to well-tested	-0.124939
-0.463324	replace arrays by well-tested	-0.124939
-0.354913	Boost collection contains well-tested	-0.124939
-0.341897	relevant to small devices,	-0.124939
-0.626451	on such small devices,	-0.124939
-0.535886	On the smallest devices,	-0.124939
-1.069028	with floating point multiplication,	-0.124939
-0.382896	of addition, subtraction, multiplication,	-0.124939
-0.237951	point operations (addition, multiplication,	-0.124939
-0.177034	leaving the AVX part.	-0.425969
-0.354044	on that particular part.	-0.124939
-0.463378	graphics library or API	-0.124939
-1.641634	the operating system API	-0.124939
-0.458209	should use standard API	-0.124939
-1.489167	when the program starts	-0.124939
-1.012046	before the program starts	-0.124939
-0.553377	time the computer starts	-0.124939
-1.961254	of the code only.	-0.124939
-0.352719	time consuming parts only.	-0.124939
-0.294265	calculated using multiplications only.	-0.124939
-0.354672	simple variables, loop counters,	-0.124939
-0.354672	temporary intermediates, loop counters,	-0.124939
-0.555699	loop with multiple counters,	-0.124939
-0.826774	the whole program execution,	-0.124939
-0.570276	advantage of out-of-order execution,	-0.124939
-0.311818	from doing out-of-order execution,	-0.124939
-0.596134	list[i] = 0; list[i+1]	-0.124939
-0.458804	list[i] += i_div_3; list[i+1]	-0.124939
-0.237951	i+=3){ list[i] =0; list[i+1]	-0.124939
-0.599816	delays if the distance	-0.124939
-0.358446	will call this distance	-0.124939
-0.294265	stride. Variables whose distance	-0.124939
-0.599073	values: // Example 14.28	-0.124939
-0.819726	2 in example 14.28	-0.124939
-0.558381	method in example 14.28	-0.124939
-0.573301	initializing pointers to zero,	-0.124939
-0.237951	use truncation towards zero,	-0.124939
-0.237951	a = select_gt(b, zero,	-0.124939
-0.596134	(r1 = 0; r1	-0.124939
-0.546170	TILESIZE // Loop r1	-0.124939
-0.544543	r1 < SIZE; r1	-0.124939
-0.929341	{ // Loop r2	-0.124939
-0.408199	(r2 = r1; r2	-0.124939
-0.237951	(r2 = r1+1; r2	-0.124939
-0.294265	AMD XOP ammintrin.h (MS)	-0.124939
-0.237951	(Gnu) all intrin.h (MS)	-0.124939
-0.237951	smmintrin.h SSE4.2 nmmintrin.h (MS)	-0.124939
-0.847361	for discussion of aligning	-0.124939
-0.358844	Define macro for aligning	-0.124939
-1.365007	the compiler from aligning	-0.124939
-1.161596	an option for assuming	-0.124939
-0.579064	since we are assuming	-0.124939
-0.658605	is prevented from assuming	-0.124939
-0.358744	int Induction = r;	-0.124939
-0.388321	0; c < r;	-0.425969
-0.131774	float Live range analysis	-0.124939
-0.131774	storage. Live range analysis	-0.124939
-0.237959	do a thorough analysis	-0.124939
-0.582494	branch. It may seem	-0.124939
-0.356563	The syntax may seem	-0.124939
-1.054723	I have tested seem	-0.124939
-0.577021	of Linux and perhaps	-0.124939
-0.358429	fetching, decoding and perhaps	-0.124939
-0.346502	much faster, except perhaps	-0.124939
-0.279502	code. An interrupt service	-0.124939
-0.279502	Device drivers, interrupt service	-0.124939
-0.237959	integer type. Interrupt service	-0.124939
-0.356108	32-bit Windows. Gnu Comes	-0.124939
-0.353223	also available. Microsoft Comes	-0.124939
-0.237951	CodeGear / Embarcadero Comes	-0.124939
-0.348035	1: 4 + esp	-0.124939
-0.348035	1: 8 + esp	-0.124939
-0.348035	?Func@@YAXQAHAAH@Z ENDP + esp	-0.124939
-0.546867	uses the new features.	-0.124939
-0.347335	and desired new features.	-0.124939
-0.355846	rather than processor features.	-0.124939
-0.357801	a ^ b ---xx----	-0.124939
-0.102881	0/a=0 ---x---xx (-a==-b)=(a==b) ---xx----	-0.124939
-0.102881	0/a=0 ---xx--xx (-a==-b)=(a==b) ---xx----	-0.124939
-0.812403	7.15 Function parameters ...............................................................................................	-0.124939
-0.716571	costs of optimizing ...............................................................................................	-0.124939
-0.630036	Performance and usability ...............................................................................................	-0.124939
-1.965174	is that the C/C++	-0.124939
-0.358848	is bad The C/C++	-0.124939
-0.421426	2004. Open Watcom C/C++	-0.124939
-0.358744	value -100+100+100 = 100.	-0.124939
-0.561076	if i < 100.	-0.124939
-0.561076	condition i < 100.	-0.124939
-0.805745	{int a; int b;};	-0.124939
-0.128635	{double a; double b;};	-0.425969
-0.463516	single task that consumes	-0.124939
-0.358106	extra overhead which consumes	-0.124939
-0.349136	level framework still consumes	-0.124939
-0.458811	hold e.g. four numbers,	-0.124939
-0.536269	names and model numbers,	-0.124939
-0.294265	2. Using hexadecimal numbers,	-0.124939
-0.444418	are particularly critical. 129	-0.124939
-0.314785	128 17.4 129 129	-0.124939
-0.237951	128 128 17.4 129	-0.124939
-1.124163	it takes to reload	-0.425969
-1.466502	is necessary to reload	-0.124939
-0.557793	doesn't give the 124	-0.124939
-0.237951	13.3 Difficult cases........................................................................................................ 124	-0.124939
-0.237951	Model-specific dispatching .................................................................................... 124	-0.124939
-0.756171	and loop-invariant code motion	-0.124939
-0.065382	Loop invariant code motion	-0.124939
-0.659879	to run a speed-critical	-0.124939
-0.569305	dispatching only for speed-critical	-0.124939
-0.358194	is preferable for speed-critical	-0.124939
-0.573302	long list of numbers:	-0.124939
-1.069028	with floating point numbers:	-0.124939
-0.490322	sum of 100 numbers:	-0.124939
-0.350856	The next section (page	-0.124939
-0.345250	the previous chapter (page	-0.124939
-0.325429	overflow. Table 8.1 (page	-0.124939
-0.659867	from 0 to 12.	-0.124939
-0.384969	described in chapter 12.	-0.124939
-0.269923	mentioned in chapter 12.	-0.124939
-0.878382	loop will take 1000	-0.124939
-0.212327	a program repeats 1000	-0.124939
-0.212327	that also repeats 1000	-0.124939
-1.165204	when they are long.	-0.124939
-0.352965	small or too long.	-0.124939
-0.331950	and sometimes unacceptably long.	-0.124939
-0.572215	details of cache organization	-0.124939
-0.114085	important. 9.2 Cache organization	-0.124939
-0.114085	87 9.2 Cache organization	-0.124939
-0.889774	software that is slow,	-0.124939
-0.834119	of A is slow,	-0.124939
-0.358346	divisions (Division is slow,	-0.124939
-0.065769	same operation is performed	-0.425969
-0.596819	test should be performed	-0.124939
-0.599863	software in a high-level	-0.124939
-0.355973	program size, while high-level	-0.124939
-0.453955	is an advanced high-level	-0.124939
-0.462401	reserving memory in advance.	-0.124939
-0.504155	be calculated in advance.	-0.124939
-0.539741	be given in advance.	-0.124939
-0.498435	code is fast anyway	-0.124939
-0.349131	in the database anyway	-0.124939
-0.483923	used by default anyway	-0.124939
-0.239420	dynamic link library (*.dll	-0.425969
-0.822831	the dynamic libraries (*.dll	-0.124939
-0.557581	BSD systems. The Intel-based	-0.124939
-1.258905	as well as Intel-based	-0.124939
-0.421426	(Windows, Linux, BSD, Intel-based	-0.124939
-0.587352	startup code and main()	-0.124939
-0.248691	parm2); } int main()	-0.124939
-0.248691	&CriticalFunction_386; } int main()	-0.124939
-0.463396	float x4 = x2	-0.124939
-0.525700	x) { double x2	-0.124939
-0.357662	1./1.30767E12, 1./2.09227E13}; float x2	-0.124939
-0.356464	just-in-time compilers, system database,	-0.124939
-0.331930	with a remote database,	-0.124939
-0.331930	same queue, list, database,	-0.124939
-0.856893	all x86 platforms. Works	-0.124939
-0.237951	library (VML, MKL). Works	-0.124939
-0.237951	Performance Primitives (IPP). Works	-0.124939
-1.228490	a series of calculations:	-0.124939
-0.504901	Main loop for calculations:	-0.124939
-0.325421	apply to modulo calculations:	-0.124939
-0.585168	chosen as the basis	-0.124939
-0.237951	First-In-First- Out (FIFO) basis	-0.124939
-0.237951	First-In-Last- Out (FILO) basis	-0.124939
-0.504906	important work. The updating	-0.124939
-0.909423	does not need updating	-0.124939
-0.346494	Automatic updates. Automatic updating	-0.124939
-0.853305	lot of data manipulation	-0.124939
-0.356985	Contains many bit manipulation	-0.124939
-0.496700	Memory and string manipulation	-0.124939
-0.357682	% 32 = 28.	-0.124939
-0.462053	when r = 28.	-0.124939
-0.462012	in set number 28.	-0.124939
-0.357946	8.19. Devirtualization class C0	-0.124939
-0.820845	C1 : public C0	-0.124939
-0.294265	{ C1 obj1; C0	-0.124939
-0.574467	order to optimize access,	-0.124939
-0.336323	files, data base access,	-0.124939
-0.237951	First-In-First-Out or First-In-Last-Out access,	-0.124939
-1.228490	a series of calculations,	-0.124939
-0.458376	size when doing calculations,	-0.124939
-0.353457	as heavy mathematical calculations,	-0.124939
-0.659696	OpenMP directives for multi-core	-0.124939
-0.835250	multiple CPUs or multi-core	-0.124939
-0.358639	processor core on multi-core	-0.124939
-0.358010	the last cache level,	-0.124939
-0.782300	the object file level,	-0.124939
-0.449165	a lower priority level,	-0.124939
-0.358287	Pragmatic Look at Exception	-0.124939
-0.701284	and error handling Exception	-0.124939
-0.545296	of exception handling Exception	-0.124939
-0.463636	it comes to optimization,	-0.124939
-0.562222	called whole program optimization,	-0.124939
-0.294265	} 73 Without optimization,	-0.124939
-0.355602	constant propagation, etc. Whether	-0.124939
-0.444413	many Boolean expressions. Whether	-0.124939
-0.237951	Sum2 and Sum3. Whether	-0.124939
-0.891632	performance because the contents	-0.124939
-0.504876	and copy the contents	-0.124939
-0.552413	and the entire contents	-0.124939
-0.462007	1996. These two books	-0.124939
-0.453411	in the relevant books	-0.124939
-0.237951	optimization", Coriolis group books	-0.124939
-0.358937	word static is removed	-0.124939
-0.598300	but may be removed	-0.124939
-0.358595	columns unused. This removed	-0.124939
-0.589669	listed on page 164	-0.124939
-0.314785	Copyright notice .......................................................................................................... 164	-0.124939
-0.237951	die. See www.gnu.org/copyleft/fdl.html. 164	0.000000
-0.347099	= 8; float matrix[rows][columns];	-0.124939
-0.347099	= 32; float matrix[rows][columns];	-0.124939
-0.347099	= 50; float matrix[rows][columns];	-0.124939
-0.082050	as a linked list.	-0.124939
-0.358852	Taylor series. The exponential	-0.124939
-0.023527	such as logarithms, exponential	-0.425969
-0.358848	mentioned above. The generality	-0.124939
-0.726583	is designed for generality	-0.124939
-0.523958	if the full generality	-0.124939
-0.552102	long time it takes.	-0.124939
-0.808333	much time it takes.	-0.124939
-0.502742	time each part takes.	-0.124939
-0.599863	threads in a multithreaded	-0.124939
-0.357545	objects simultaneously. In multithreaded	-0.124939
-0.354522	account when optimizing multithreaded	-0.124939
-0.163823	list[i+1] = 1; list[i+2]	-0.425969
-0.458818	list[i+1] += i_div_3; list[i+2]	-0.124939
-0.462278	element. Matrix size Total	-0.124939
-0.744115	Number of elements Total	-0.124939
-0.744115	Type of elements Total	-0.124939
-0.358725	to do it explicitly.	-0.124939
-1.289603	vectorize the code explicitly.	-0.124939
-0.655669	do this optimization explicitly.	-0.124939
-0.550227	calculations. In other programs,	-0.124939
-0.578811	program. In some programs,	-0.124939
-0.347518	to make 16-bit programs,	-0.124939
-0.358725	how well it optimizes	-0.124939
-0.598967	well the compiler optimizes	-0.124939
-0.520891	processing. Visual Studio optimizes	-0.124939
-0.341889	your own profiling instruments	-0.124939
-0.212327	to put measurement instruments	-0.124939
-0.212327	the desired measurement instruments	-0.124939
-0.357729	entry point. // After	-0.124939
-0.357729	the dispatcher. // After	-0.124939
-0.341889	kind of branch. After	-0.124939
-0.319773	while many reductions involving	-0.124939
-0.319773	expressions. Most reductions involving	-0.124939
-0.341905	class objects Conversions involving	-0.124939
-0.300346	Floating point XMM (vector)	-0.124939
-0.300346	- Integer XMM (vector)	-0.124939
-0.300346	76 Boolean XMM (vector)	-0.124939
-0.582112	occur has the unfortunate	-0.124939
-0.598465	rounding. This is unfortunate	-0.124939
-0.463163	code uses an unfortunate	-0.124939
-0.018820	in manual 2: "Optimizing	-0.602060
-1.072477	copied to the parameter,	-0.124939
-0.541226	treated like a parameter,	-0.124939
-1.276510	as a function parameter,	-0.124939
-1.008478	to recover from exceptions.	-0.124939
-0.526173	errors without using exceptions.	-0.124939
-0.354661	to catching hardware exceptions.	-0.124939
-1.162431	to avoid the time-	-0.124939
-0.461516	long and very time-	-0.124939
-0.565213	idea to put time-	-0.124939
-0.020849	bytes AMD Opteron K8	-0.124939
-0.020849	operands AMD Opteron K8	-0.124939
-0.020849	op. AMD Opteron K8	-0.124939
-1.737124	the program is loaded.	-0.124939
-0.860651	pointer has been loaded.	-0.124939
-0.294265	database is heavily loaded.	-0.124939
-0.657104	same as for (i=0;	-0.124939
-0.831729	= 0; for (i=0;	-0.124939
-0.357546	For example, for (i=0;	-0.124939
-0.358893	Example 14.23b and 14.30	-0.124939
-0.594248	search: // Example 14.30	-0.124939
-0.500258	} } Example 14.30	-0.124939
-0.582975	a; y = b;}	-0.124939
-0.578651	{return a + b;}	-0.124939
-0.325421	int ReadB() {return b;}	-0.124939
-0.358595	2008 version). This wasteful	-0.124939
-0.804569	to avoid this wasteful	-0.124939
-0.237951	allocation is unnecessarily wasteful	-0.124939
-0.356709	provoke error // Return	-0.124939
-0.356709	to zero // Return	-0.124939
-0.356709	return a[i]; // Return	-0.124939
-0.202183	static inline void StoreVector(void	-0.602060
-0.122678	the software. Smaller microcontrollers	-0.124939
-0.122678	optimize caching. Smaller microcontrollers	-0.124939
-0.122678	small microcontrollers: Smaller microcontrollers	-0.124939
-0.550745	storing strings in character	-0.124939
-0.358657	C style with character	-0.124939
-0.358620	C style as character	-0.124939
-0.969991	programming language is implemented.	-0.124939
-0.659292	example 15.1b is implemented.	-0.124939
-0.527116	member pointers are implemented.	-0.124939
-0.447239	code size. In fact,	-0.124939
-0.345991	assembly language. In fact,	-0.124939
-0.345991	can throw. In fact,	-0.124939
-0.352522	53 function at runtime.	-0.124939
-0.647148	rather than at runtime.	-0.124939
-0.756532	is resolved at runtime.	-0.124939
-1.042244	unroll a loop manually	-0.124939
-0.574056	must be done manually	-0.124939
-0.444408	loop-invariant code motion manually	-0.124939
-0.358927	the multiplication of xxn	-0.124939
-0.459308	//Loopby4 s += xxn	-0.124939
-0.237951	s += x^n/n! xxn	-0.124939
-0.020849	The operators &, |,	-0.124939
-0.010300	bitwise operators &, |,	-0.124939
-2.158402	Example: // Example 7.2	-0.124939
-0.347513	of using classes. 7.2	-0.124939
-0.325421	variable storage............................................................................. 26 7.2	-0.124939
-0.885288	for each thread. Thread-local	-0.124939
-0.343755	thread environment block. Thread-local	-0.124939
-0.336329	changed five times. Thread-local	-0.124939
-0.562222	doing whole program 81	-0.124939
-0.594069	available. See page 81	-0.124939
-0.237951	optimization options ................................................................................... 81	-0.124939
-2.158402	Example: // Example 7.1	-0.124939
-0.331936	series of manuals. 7.1	-0.124939
-0.325421	C++ constructs........................................................................ 26 7.1	-0.124939
-0.552387	makes the dispatcher signal	-0.124939
-0.336329	and video processing, signal	-0.124939
-0.294265	functions for statistics, signal	-0.124939
-0.639908	implemented as a circular	-0.425969
-0.550980	queue as a circular	-0.124939
-0.539657	double In example 7.4	-0.124939
-0.356475	operators ...................................................................... 32 7.4	-0.124939
-0.499730	about mathematical functions. 7.4	-0.124939
-1.780326	the compiler to ignore	-0.124939
-0.591727	model. You may ignore	-0.124939
-1.365115	in some cases ignore	-0.124939
-0.463586	Compiler directives and keywords	-0.124939
-0.503022	compilers have many keywords	-0.124939
-0.237951	where appropriate. Compiler-specific keywords	-0.124939
-1.735132	example: // Example 7.8	-0.124939
-0.294265	pointers ...................................................................................................... 37 7.8	-0.124939
-0.294265	pointer has changed. 7.8	-0.124939
-0.354828	calculate it only once.	-0.124939
-0.458426	is done only once.	-0.124939
-0.357073	is only called once.	-0.124939
-0.570385	{ union { 89	-0.124939
-0.563677	memory. See page 89	-0.124939
-0.563677	overlap. See page 89	-0.124939
-0.860700	Func2() { int list[100];	-0.124939
-0.357662	Example 7.16 float list[100];	-0.124939
-0.640509	double b;}; S1 list[100];	-0.124939
-0.600000	techniques can be considered	-0.124939
-0.597436	pointer may be considered	-0.124939
-0.237959	is not traditionally considered	-0.124939
-0.358426	(e.g. GetProcessAffinityMask in Windows).	-0.124939
-0.358426	(e.g. IsProcessorFeaturePresent in Windows).	-0.124939
-0.525378	drivers for 64-bit Windows).	-0.124939
-0.537229	want to compile for.	-0.124939
-0.626477	it is intended for.	-0.124939
-0.442094	code is intended for.	-0.124939
-1.425426	to do the divisions	-0.124939
-0.358887	up multiplications and divisions	-0.124939
-0.331930	is exact. Multiple divisions	-0.124939
-0.503275	10, columns = 8;	-0.124939
-0.357682	int TILESIZE = 8;	-0.124939
-0.823201	int exponent : 8;	-0.124939
-0.358832	zigzag course that reflects	-0.124939
-0.659199	innermost loop. This reflects	-0.124939
-0.815879	and long double reflects	-0.124939
-0.755703	9.7 Container classes .....................................................................................................	-0.124939
-0.408199	101 10.1 Hyperthreading .....................................................................................................	-0.124939
-0.382896	126 13.5 Implementation .....................................................................................................	-0.124939
-1.107033	the value that lies	-0.124939
-0.496720	80. The difference lies	-0.124939
-0.349148	for this efficiency lies	-0.124939
-0.358893	as logarithms and trigonometric	-0.124939
-0.059933	logarithms, exponential functions, trigonometric	-0.425969
-0.787981	allow you to manipulate	-0.124939
-0.659186	allows us to manipulate	-0.124939
-0.358730	can test or manipulate	-0.124939
-0.460816	: 23; // fractional	-0.124939
-0.356709	: 52; // fractional	-0.124939
-0.356709	: 63; // fractional	-0.124939
-0.355482	subtracting 1 from -128	-0.124939
-0.501845	which range from -128	-0.124939
-0.538316	stdint.h char 8 -128	-0.124939
-0.372679	happen to be spaced	-0.425969
-0.764252	the addresses are spaced	-0.124939
-0.725919	can make an approximate	-0.124939
-0.335584	saturated addition, fast approximate	-0.124939
-0.335584	approximate reciprocal, fast approximate	-0.124939
-0.527242	unsigned integers in comparisons,	-0.124939
-0.835499	the operands are comparisons,	-0.124939
-1.069028	two floating point comparisons,	-0.124939
-0.408199	desired new features. User	-0.124939
-0.294265	to be deleted. User	-0.124939
-0.237951	user feedback seriously. User	-0.124939
-0.593060	faster if the dividend	-0.425969
-0.358677	by changing the dividend	-0.124939
-0.355393	consume time at unpredictable	-0.124939
-0.459144	may start at unpredictable	-0.124939
-0.573466	overflow can cause unpredictable	-0.124939
-0.104525	static inline __m128i LoadVector(void	-0.602060
-0.358682	91 step by step.	-0.124939
-0.575178	for the next step.	-0.124939
-0.565147	at the second step.	-0.124939
-0.357850	= C; double Z	-0.124939
-0.837527	Update induction variable Z	-0.124939
-0.237951	Y += Z; Z	-0.124939
-0.358015	three clauses are separated	-0.124939
-0.358015	each clause are separated	-0.124939
-1.191004	code is not separated	-0.124939
-0.463586	between 9 and 64,	-0.124939
-0.463324	ZMM registers by 64,	-0.124939
-0.237951	8, 16, 32, 64,	-0.124939
-0.541158	library file and copies	-0.124939
-0.878272	a code that copies	-0.124939
-0.237951	/ shr ebx,31 copies	-0.124939
-1.682174	of the same brand.	-0.124939
-0.591863	limits the CPU brand.	-0.124939
-0.458361	dispatch by CPU brand.	-0.124939
-0.598465	software. This is annoying	-0.124939
-0.940468	protection schemes are annoying	-0.124939
-0.989645	to be an annoying	-0.124939
-0.876064	profiler is called CodeAnalyst.	-0.124939
-0.442600	VTune and AMD CodeAnalyst.	-0.124939
-0.342313	CPUs use AMD CodeAnalyst.	-0.124939
-0.269243	options....................................................................................... 160 19 Literature	-0.124939
-0.269243	_M_X64 162 19 Literature	-0.124939
-0.237959	list of titles. Literature	-0.124939
-0.570087	very useful to study	-0.124939
-1.624395	is important to study	-0.124939
-0.453404	mainly on my study	-0.124939
-0.571852	objects on the stack,	-0.124939
-0.662866	stored on the stack,	-0.124939
-0.573812	management and garbage collection.	-0.124939
-0.248668	need for garbage collection.	-0.124939
-0.248668	is called garbage collection.	-0.124939
-0.580504	costly when it occurs,	-0.124939
-0.566000	overflow before it occurs,	-0.124939
-0.353455	that overflow never occurs,	-0.124939
-1.088673	with the option -fno-pic	-0.124939
-0.479530	the compiler option -fno-pic	-0.124939
-0.340569	The compiler option -fno-pic	-0.124939
-0.325434	__linux__ x86 platform _M_IX86	-0.124939
-0.325434	_M_IX86 x86-64 platform _M_IX86	-0.124939
-0.408211	x86 platform _M_IX86 _M_IX86	-0.124939
-0.659883	the bottleneck is elsewhere	-0.124939
-0.354532	to seek information elsewhere	-0.124939
-0.350364	cause unpredictable errors elsewhere	-0.124939
-0.504733	different way or bypassing	-0.124939
-0.726205	this problem by bypassing	-0.124939
-0.896926	on the compiler bypassing	-0.124939
-0.042690	from 0x2700 to 0x273F	-0.124939
-0.090038	address 0x2700 to 0x273F	-0.124939
-0.358887	page 134 and 135	-0.124939
-0.658685	{ DoThisThreeTimesAWeek(); } 135	-0.124939
-0.237951	values at once................................... 135	-0.124939
-0.463351	the factorial function looks	-0.124939
-0.540767	The optimized code looks	-0.124939
-0.573447	Agner's vector classes looks	-0.124939
-0.351728	100 doubles: union {double	-0.124939
-0.373663	8.15a struct S1 {double	-0.124939
-0.373663	8.15b struct S1 {double	-0.124939
-1.400443	be used for implementing	-0.124939
-0.970893	are used for implementing	-0.124939
-0.355776	93 themselves. But implementing	-0.124939
-0.587363	int instead of int.	-0.124939
-0.358926	convert float to int.	-0.124939
-0.720736	numbers of type int.	-0.124939
-0.462827	always takes memory space,	-0.124939
-0.845309	waste of cache space,	-0.124939
-0.347524	may save RAM space,	-0.124939
-0.591137	branches that can skip	-0.124939
-0.591727	contention. You may skip	-0.124939
-0.294265	an explanation. Please skip	-0.124939
-0.803147	2 (See page 137	-0.124939
-0.339499	sign and rounding 137	-0.124939
-0.237951	14.5 Integer division...................................................................................................... 137	-0.124939
-0.878790	a cache line. 132	-0.124939
-0.325421	optimization topics ......................................................................................... 132	-0.124939
-0.294265	lookup tables ................................................................................................. 132	-0.124939
-0.358927	the needs of position-	-0.124939
-0.585275	data. This makes position-	-0.124939
-0.541151	use the so-called position-	-0.124939
-1.284058	} }; // Index	-0.124939
-0.023527	cout << "Error: Index	-0.425969
-0.237951	or #pragma optimize("a",on). Specifies	-0.124939
-0.237951	__attribute__((const)) (Linux only). Specifies	-0.124939
-0.237951	__declspec(align(16)) or __attribute__((aligned(16))). Specifies	-0.124939
-1.762833	value of the residual	-0.124939
-1.280789	calculation of the residual	-0.124939
-0.566803	repeated until the residual	-0.124939
-0.535550	advantage of vector operations,	-0.124939
-0.499800	Useful for vector operations,	-0.124939
-0.358005	with vector integer operations,	-0.124939
-0.940698	the drawbacks of C++.	-0.124939
-0.527207	be C or C++.	-0.124939
-0.719622	implemented in compiled C++.	-0.124939
-0.358147	or do other input/output	-0.124939
-0.356466	have many file input/output	-0.124939
-0.314785	these categories: File input/output	-0.124939
-0.358539	maintain. Most compiler packages	-0.124939
-0.642911	to make software packages	-0.124939
-0.350367	ever bigger software packages	-0.124939
-0.358975	by joining the operations:	-0.124939
-0.331936	the two AND operations:	-0.124939
-0.325421	to avoid modulo operations:	-0.124939
-0.816586	and VIA processors. Explicit	-0.124939
-0.102881	............................................................. 96 9.11 Explicit	-0.124939
-0.102881	SIAM 2001. 9.11 Explicit	-0.124939
-0.569714	designed for this purpose.	-0.124939
-1.109914	for a specific purpose.	-0.124939
-1.207448	for a particular purpose.	-0.124939
-0.351643	* b2 * reciprocal_divisor;	-0.124939
-0.351643	* b1 * reciprocal_divisor;	-0.124939
-0.237959	b2, y1, y2, reciprocal_divisor;	-0.124939
-0.502431	their values before compilation.	-0.124939
-0.513352	code and just-in-time compilation.	-0.124939
-0.364865	implementations use just-in-time compilation.	-0.124939
-0.358738	(critical stride) = (number	-0.124939
-0.355235	cache size) / (number	-0.124939
-0.347513	(line size) % (number	-0.124939
-0.496950	may have big endian	-0.124939
-0.331358	that use big endian	-0.124939
-0.331358	comparison. On big endian	-0.124939
-1.411820	a function that allocates	-0.124939
-0.835254	is called, it allocates	-0.124939
-0.237951	(doubly ended queue) allocates	-0.124939
-0.589669	given on page 136	-0.124939
-0.621847	int order(int x); 136	-0.124939
-0.325421	Integer multiplication ............................................................................................. 136	-0.124939
-0.462874	appropriate here. It reveals	-0.124939
-0.408199	The assembly listing reveals	-0.124939
-0.237951	my crystal ball reveals	-0.124939
-0.899291	time it is filled	-0.124939
-0.597254	heap to be filled	-0.124939
-0.593155	cache will be filled	-0.124939
-0.463378	only) -O3 or (requires	-0.124939
-1.829917	SSE2 instruction set (requires	-0.124939
-0.325421	linker and loader (requires	-0.124939
-0.576005	available. Some compilers offer	-0.124939
-0.566106	executable. Most compilers offer	-0.124939
-0.451340	format. Other compilers offer	-0.124939
-0.408211	integers. 7.25 Bitfields Bitfields	-0.124939
-0.172700	to integers. 7.25 Bitfields	-0.124939
-0.172700	.................................................................................................................... 55 7.25 Bitfields	-0.124939
-0.865601	} } // At	-0.124939
-0.294265	end user's computers. At	-0.124939
-0.237951	programming are dominating. At	-0.124939
-0.842504	will have an up-to-date	-0.124939
-0.356752	should choose an up-to-date	-0.124939
-0.462348	best and most up-to-date	-0.124939
-0.341909	security reasons before leaving	-0.124939
-0.235295	call _mm256_zeroupper() before leaving	-0.425969
-0.520912	88 for details. Inheritance	-0.124939
-0.172700	........................................................................... 54 7.22 Inheritance	-0.124939
-0.172700	alternative implementations. 7.22 Inheritance	-0.124939
-1.541979	when the program 153	-0.124939
-0.594069	takes. See page 153	-0.124939
-0.237951	16 Testing speed.............................................................................................................. 153	-0.124939
-0.555039	if a high degree	-0.124939
-0.505099	contain a typical degree	-0.124939
-0.237951	is an n'th degree	-0.124939
-0.379080	& x) { _mm_storeu_si128((__m128i	-0.602060
-0.455352	do such optimizations automatically,	-0.124939
-0.331930	to 151 15.1c automatically,	-0.124939
-0.294265	11.1a to 11.1b automatically,	-0.124939
-0.525758	a multidimensional array sequentially.	-0.124939
-0.876958	data are accessed sequentially.	-0.124939
-1.059436	elements are accessed sequentially.	-0.124939
-0.172700	...................................................................... 32 7.4 Enums	-0.124939
-0.172700	mathematical functions. 7.4 Enums	-0.124939
-0.237959	integer in disguise. Enums	-0.124939
-0.507286	for the CPU. Algebraic	-0.124939
-0.675771	the function call. Algebraic	-0.124939
-0.237951	more complicated reductions. Algebraic	-0.124939
-1.124584	the values of A,	-0.124939
-0.065526	Bitfield x; int A,	-0.425969
-0.585168	precision as the operands.	-0.124939
-0.354209	always evaluate both operands.	-0.124939
-0.714115	order of Boolean operands.	-0.124939
-1.082847	int i; for(i=0; i<100;	-0.124939
-0.463737	0; for (i=0; i<100;	-0.124939
-0.237951	float i2; for(i=0,i2=0; i<100;	-0.124939
-0.314804	0.18 0.18 0.18 0.11	-0.124939
-0.237967	0.63 0.75 0.18 0.11	-0.124939
-0.314795	0.12 0.18 0.12 0.11	-0.124939
-0.830991	Intel Core 2 0.12	-0.124939
-0.331943	2 0.12 0.18 0.12	-0.124939
-0.294265	1.21 0.57 0.44 0.12	-0.124939
-0.596711	"what is the nearest	-0.124939
-0.599651	number to the nearest	-0.124939
-0.358929	// Round to nearest	-0.124939
-0.347513	short vector libraries. To	-0.124939
-0.861582	in the future. To	-0.124939
-0.237951	computer is rebooted. To	-0.124939
-0.357435	x x-- x x--	-0.124939
-0.248642	reductions: x-- x x--	-0.124939
-0.498954	vector algebra reductions: x--	-0.124939
-0.565506	for the C++ language,	-0.124939
-0.460516	a software programming language,	-0.124939
-1.136877	a hardware definition language,	-0.124939
-0.597641	cases when the 145	-0.124939
-0.594069	0x8040); See page 145	-0.124939
-0.325421	Mathematical functions ....................................................................................... 145	-0.124939
-0.594069	this. See page 140	-0.124939
-0.357662	everything is float 140	-0.124939
-0.237951	float and double..................................................................................... 140	-0.124939
-0.594069	language. See page 141	-0.124939
-0.294265	SSE2 or x64 141	-0.124939
-0.237951	and integers ................................... 141	-0.124939
-0.504476	be better than RISC	-0.124939
-0.357241	The distinctions between RISC	-0.124939
-0.382896	sets have got RISC	-0.124939
-0.455342	than future processors. Consider	-0.124939
-0.640490	waste of resources. Consider	-0.124939
-0.294265	case in loops. Consider	-0.124939
-0.463637	The storage of text	-0.124939
-0.449158	these and handle text	-0.124939
-0.478044	buffers for storing text	-0.124939
-0.351305	a different compiler. Object	-0.124939
-0.325421	and delete). 88 Object	-0.124939
-0.237951	Borland's now discontinued Object	-0.124939
-0.599056	Examples: // Example 14.10	-0.124939
-0.294265	variables ......................... 142 14.10	-0.124939
-0.237951	u[1] by u[0]. 14.10	-0.124939
-0.897102	calculations: // Example 14.11	-0.124939
-0.314785	functions ....................................................................................... 145 14.11	-0.124939
-0.237951	overcome this limitation). 14.11	-0.124939
-0.339517	of 2 template <int	-0.124939
-0.339517	of N template <int	-0.124939
-0.339517	* m;} template <int	-0.124939
-0.341879	or more iterations back.	-0.124939
-0.336323	is four places back.	-0.124939
-0.331936	read and written back.	-0.124939
-1.735132	example: // Example 8.4	-0.124939
-0.652294	on this option. 8.4	-0.124939
-0.294265	compiler ....................................................................... 77 8.4	-0.124939
-2.158402	Example: // Example 8.7	-0.124939
-0.382896	see page 105. 8.7	-0.124939
-0.294265	directives .............................................................................................. 82 8.7	-0.124939
-0.444851	function. The assembly listing	-0.124939
-0.344098	-static Generate assembly listing	-0.124939
-0.990815	the assembly output listing	-0.124939
-0.589032	units are used twice	-0.124939
-0.357130	table has const twice	-0.124939
-0.571070	g(x) is calculated twice	-0.124939
-0.563010	early implementations of Pascal	-0.124939
-0.642883	all major platforms. Pascal	-0.124939
-0.615793	implementations of C++, Pascal	-0.124939
-0.600613	miss can be expected.	-0.124939
-1.009287	as good as expected.	-0.124939
-0.497533	// Cache contentions expected.	-0.124939
-0.348384	for the performance. 14.4	-0.124939
-0.339504	129 129 130 14.4	-0.124939
-0.314785	at once................................... 135 14.4	-0.124939
-1.396673	int cc[]) { Vec16s	-0.124939
-0.571973	for the class Vec16s	-0.124939
-0.237951	16 256 Vec32uc Vec16s	-0.124939
-0.456069	is equally efficient. Simple	-0.124939
-0.502400	generally very fast. Simple	-0.124939
-0.237951	-fp- model fast=2 Simple	-0.124939
-0.023527	Architecture Software Developer’s Manual",	-0.425969
-0.237959	"AMD64 Architecture Programmer’s Manual",	-0.124939
-0.358429	CPU cores and leave	-0.124939
-0.358429	512 520 and leave	-0.124939
-0.503737	No program should leave	-0.124939
-0.888935	problem can be solved	-0.124939
-0.594925	dilemma can be solved	-0.124939
-0.845849	Intel compiler has solved	-0.124939
-0.598465	(SVML). This is supplied	-0.124939
-0.873773	mathematical functions are supplied	-0.124939
-0.587975	that I have supplied	-0.124939
-0.347518	for demonstration purposes. Available	-0.124939
-0.347513	direct hardware access. Available	-0.124939
-0.336329	library Intel Agner Available	-0.124939
-1.065680	program code is translated	-0.124939
-0.358642	function call is translated	-0.124939
-0.580337	i=0; has been translated	-0.124939
-0.352155	int64_t 29 64-bit Linux:	-0.124939
-0.352155	unsigned __int64 64-bit Linux:	-0.124939
-0.237959	option (Windows: /Gy, Linux:	-0.124939
-0.753011	for the user. With	-0.124939
-0.331930	a thousand numbers. With	-0.124939
-0.314785	the next step. With	-0.124939
-0.339489	and 64-bit Linux. Has	-0.124939
-0.325421	Visual Studio IDE. Has	-0.124939
-0.237951	Borland/CodeGear/Embarcadero C++ builder Has	-0.124939
-0.359364	unless you are overriding	-0.425969
-0.543980	feature that allows overriding	-0.124939
-0.329406	128 bytes AMD Opteron	-0.124939
-0.329406	aligned operands AMD Opteron	-0.124939
-0.329406	unaligned op. AMD Opteron	-0.124939
-0.315205	compilers and operating systems".	-0.124939
-1.064328	up with the correct	-0.124939
-0.863742	it has the correct	-0.124939
-0.358940	our estimate is correct	-0.124939
-0.358631	less efficient code caching.	-0.124939
-0.358291	87 about memory caching.	-0.124939
-0.574467	data to optimize caching.	-0.124939
-0.957762	no risk of overflow:	-0.124939
-1.190635	for floating point overflow:	-0.124939
-0.325429	way that avoids overflow:	-0.124939
-0.572067	antivirus program that scans	-0.124939
-0.358117	virus scanner that scans	-0.124939
-0.358706	string length function scans	-0.124939
-0.530309	in the following way:	-0.124939
-0.577700	optimizes the code. Sometimes	-0.124939
-0.497586	particularly time consuming. Sometimes	-0.124939
-0.325421	for simple tasks. Sometimes	-0.124939
-0.656460	-fno-builtin Gnu 32-bit -fno-builtin	-0.124939
-0.572938	Gnu 64 bit -fno-builtin	-0.124939
-0.355525	Use 12 option -fno-builtin	-0.124939
-0.390896	small enough to justify	-0.124939
-0.390896	rarely enough to justify	-0.124939
-0.502417	performance can easily justify	-0.124939
-0.336235	structures. On the contrary,	-0.124939
-0.336235	hyperthreading. On the contrary,	-0.124939
-0.336235	9.1b. On the contrary,	-0.124939
-0.443497	same function calling conventions.	-0.124939
-0.300344	the standard calling conventions.	-0.124939
-0.300344	manual 5: calling conventions.	-0.124939
-0.855173	critical function. The initialization	-0.124939
-0.389576	program has an initialization	-0.124939
-0.389576	library has an initialization	-0.124939
-0.598405	forums on the Internet	-0.124939
-0.550603	updates through the Internet	-0.124939
-0.294275	5. www.amd.com. 163 Internet	-0.124939
-2.118108	in order to cover	-0.124939
-0.358589	very big to cover	-0.124939
-0.590034	manual does not cover	-0.124939
-0.331940	the constructor itself. Constructors	-0.124939
-0.172700	c; }; 7.23 Constructors	-0.124939
-0.172700	.............................................................................................................. 54 7.23 Constructors	-0.124939
-0.357241	CISC processors, between PC's	-0.124939
-0.589836	example, the first PC's	-0.124939
-0.354657	Connecting several standard PC's	-0.124939
-0.897102	conversion // Example 7.21	-0.124939
-0.314785	functions ........................................................................................ 53 7.21	-0.124939
-0.538953	worth the effort. 7.21	-0.124939
-0.504885	unfortunate method that delays	-0.124939
-0.522152	times and cause delays	-0.124939
-0.237951	can cause severe delays	-0.124939
-0.024424	StoreVector(aa + i, a);	-0.602060
-0.143242	in b[i] and c[i]	-0.124939
-0.143242	if b[i] and c[i]	-0.124939
-0.237959	a[i] + b[i]; c[i]	-0.124939
-0.358844	exception handlers for cleaning	-0.124939
-0.804508	lot of time cleaning	-0.124939
-0.658605	is prevented from cleaning	-0.124939
-0.599131	In the same way,	-0.124939
-0.864548	times the other way,	-0.124939
-0.358055	many times one way,	-0.124939
-0.716569	of RAM memory. Big	-0.124939
-0.331943	big mainframe computer. Big	-0.124939
-0.237951	be aware of. Big	-0.124939
-0.525924	instruction set and ZMM	-0.425969
-0.237959	and the 512-bit ZMM	-0.124939
-0.563032	The table of coefficients	-0.124939
-0.048401	polynomial // Polynomial coefficients	-0.124939
-0.048401	3.3; // Polynomial coefficients	-0.124939
-0.595316	memory, such as DOS	-0.124939
-0.865225	old operating systems DOS	-0.124939
-0.351708	some very old DOS	-0.124939
-0.463541	32-bit case. The -fpie	-0.124939
-1.162706	with the option -fpie	-0.124939
-0.481364	made with option -fpie	-0.124939
-0.571151	statement with many labels	-0.124939
-0.572405	if the case labels	-0.124939
-0.325421	statement with sequential labels	-0.124939
-0.116811	{1, 1, 2, 6,	-0.425969
-0.341889	2, 3, 4, 6,	-0.124939
-0.314785	jl $B1$3: pop ret	-0.124939
-0.237951	cmp ja $B2$3: ret	-0.124939
-0.237951	in the beginning. ret	-0.124939
-0.496696	is discussed below. Signed	-0.124939
-0.343751	for integer overflow. Signed	-0.124939
-0.237951	// Example 7.4. Signed	-0.124939
-0.358887	to log, and logarithms	-0.124939
-1.460003	functions such as logarithms	-0.124939
-0.458821	pow function uses logarithms	-0.124939
-0.971201	the array is stored.	-0.124939
-1.282357	needs to be stored.	-0.124939
-0.562927	how variables are stored.	-0.124939
-0.343751	in a standardized manner.	-0.124939
-1.152892	in a non-sequential manner.	-0.124939
-0.341883	in a random manner.	-0.124939
-0.350350	speed or size. Today,	-0.124939
-0.314785	was too slow. Today,	-0.124939
-0.294265	big mainframe computers. Today,	-0.124939
-0.527362	course be the easiest	-0.124939
-0.358215	aliasing (/Oa). The easiest	-0.124939
-0.358215	large delays. The easiest	-0.124939
-0.358887	to push and pop	-0.124939
-0.575671	i < 100. pop	-0.124939
-0.237951	cmp jl $B1$3: pop	-0.124939
-0.558662	Here, the constant 3.5	-0.124939
-0.339494	updates .................................................................................................... 19 3.5	-0.124939
-0.331930	the update process. 3.5	-0.124939
-0.454482	and the options -S	-0.124939
-0.102881	assembly listing /FA -S	-0.124939
-0.102881	- masm=intel /FA -S	-0.124939
-1.878496	the function is inlined.	-0.124939
-1.064919	certain to be inlined.	-0.124939
-1.073031	function can be inlined.	-0.124939
-0.337721	86 add add cmp	-0.124939
-0.337721	add mov add cmp	-0.124939
-0.237959	loop increment i++. cmp	-0.124939
-0.358273	data structure, data flow	-0.124939
-1.345042	in the program flow	-0.124939
-0.578509	determines the program flow	-0.124939
-0.648519	means of #include directives.	-0.124939
-0.343766	data. Use OpenMP directives.	-0.124939
-0.237951	in C++: Preprocessor directives.	-0.124939
-0.872812	allocated is also deallocated.	-0.124939
-0.443073	it has been deallocated.	-0.124939
-0.526736	possibly be more (128	-0.124939
-1.735081	SSE2 instruction set (128	-0.124939
-0.579143	SSE instruction set (128	-0.124939
-0.421426	to be obsolete. Programmers	-0.124939
-0.294265	bit scan instruction. Programmers	-0.124939
-0.237951	of macro expansions. Programmers	-0.124939
-1.628826	is important to focus	-0.124939
-0.886502	there is more focus	-0.124939
-0.493766	code. The main focus	-0.124939
-1.282684	to the function definition.	-0.124939
-0.566366	inside the class definition.	-0.124939
-1.011840	inside a class definition.	-0.124939
-0.527362	to follow the track	-0.124939
-0.482297	way to keep track	-0.124939
-0.299782	objects and keep track	-0.124939
-0.358451	worried about this condition.	-0.124939
-0.520187	trigger the error condition.	-0.124939
-0.445977	or other error condition.	-0.124939
-0.358887	s1, s2 and s3	-0.124939
-0.555547	s2 = 0, s3	-0.124939
-0.237951	s2 += a[i+2]; s3	-0.124939
-0.555547	s1 = 0, s2	-0.124939
-0.237951	s1 += a[i+1]; s2	-0.124939
-0.237951	Now s0, s1, s2	-0.124939
-0.525375	vector operations on contemporary	-0.124939
-0.357166	64 bytes on contemporary	-0.124939
-0.349799	processor core. Unfortunately, contemporary	-0.124939
-0.331930	the compiler .......................................................................................... 66	-0.124939
-0.660020	x + 1.0f;} 66	-0.124939
-0.294265	compilers optimize ............................................................................................ 66	-0.124939
-0.358937	array bounds is probably	-0.124939
-1.383115	the code can probably	-0.124939
-0.237951	show a disassembly, probably	-0.124939
-1.531023	the use of longjmp	-0.124939
-1.432408	when the function longjmp	-0.124939
-0.578942	Don't rely on longjmp	-0.124939
-0.020849	longdoublevalue ( 1)sign 2exponent	-0.124939
-0.020849	doublevalue ( 1)sign 2exponent	-0.124939
-0.020849	floatvalue ( 1)sign 2exponent	-0.124939
-0.530833	or switch statement leads	-0.124939
-0.517224	where automatic vectorization leads	-0.124939
-0.528818	on lazy binding leads	-0.124939
-0.462272	// Array size Alignd	-0.124939
-0.459804	three aligned arrays Alignd	-0.124939
-0.331936	int bb[size] ); Alignd	-0.124939
-0.658398	use it for improving	-0.124939
-1.642369	be used for improving	-0.124939
-0.463275	some tips on improving	-0.124939
-0.462470	sets and cache sizes.	-0.124939
-0.459197	with different matrix sizes.	-0.124939
-0.336323	members of mixed sizes.	-0.124939
-0.635335	3.5 Program loading .......................................................................................................	-0.124939
-0.421426	150 15 Metaprogramming .......................................................................................................	-0.124939
-0.595264	3.9 Other databases .......................................................................................................	-0.124939
-0.358832	an integer that holds	-0.124939
-0.356327	The float type holds	-0.124939
-0.343751	the array. eax holds	-0.124939
-0.563029	benchmark performance of competing	-0.124939
-0.541035	the threads are competing	-0.124939
-0.358354	Classes (MFC). A competing	-0.124939
-0.139573	to your programming questions	-0.124939
-0.139573	send your programming questions	-0.124939
-0.237959	time to answer questions	-0.124939
-1.541173	stored in a register,	-0.124939
-1.121592	into a vector register,	-0.124939
-0.718118	a 128-bit vector register,	-0.124939
-0.573156	including user interface etc.,	-0.124939
-0.351309	calls another function, etc.,	-0.124939
-0.538953	interpreters, just-in-time compilers, etc.,	-0.124939
-1.328800	to call the ReadTSC	-0.124939
-0.598325	with the function ReadTSC	-0.124939
-0.355138	www.agner.org/optimize/testp.zip or get ReadTSC	-0.124939
-0.608854	can be replaced with:	-0.425969
-0.314795	c; }; Replace with:	-0.124939
-0.358897	"Register usage in kernel	-0.124939
-1.641634	the operating system kernel	-0.124939
-0.524782	as in Linux kernel	-0.124939
-0.636052	AMD and VIA CPUs").	-0.301030
-0.148507	matrix[rows][columns]; int i, j;	-0.124939
-0.375959	list[size]; int i, j;	-0.124939
-0.540826	objects have a natural	-0.124939
-0.540826	elements have a natural	-0.124939
-0.651886	the code contains natural	-0.124939
-0.348395	allows parallel calculations. Examples	-0.124939
-0.435069	as explained above. Examples	-0.124939
-0.458804	it is run. Examples	-0.124939
-0.586995	if else { (iset	-0.124939
-0.237951	SelectAddMul_pointer = &SelectAddMul_AVX2; (iset	-0.124939
-0.237951	SelectAddMul_pointer = &SelectAddMul_SSE41; (iset	-0.124939
-0.788287	calls another function F2	-0.124939
-0.358682	exceptions thrown by F2	-0.124939
-0.864327	function in case F2	-0.124939
-0.357624	a key or moving	-0.124939
-0.357624	a button or moving	-0.124939
-0.596498	memcpy rather than moving	-0.124939
-0.599073	a: // Example 9.6b.	-0.124939
-0.558381	work in example 9.6b.	-0.124939
-0.558381	shown in example 9.6b.	-0.124939
-0.538953	(Intel CPU only) -O3	-0.124939
-0.237951	or -Ofast /O3 -O3	-0.124939
-0.237951	/O2 or /Ox -O3	-0.124939
-0.358728	Neither is it unusual	-0.124939
-1.706334	it is not unusual	-0.124939
-1.540712	It is not unusual	-0.124939
-0.350133	goes to cache misses,	-0.124939
-0.350133	has most cache misses,	-0.124939
-0.350133	instructions executed, cache misses,	-0.124939
-1.399112	= 0 - Divide	-0.124939
-0.458205	shift and add Divide	-0.124939
-0.237951	---xx---- (-a>-b)=(a<b) ---xx---x Divide	-0.124939
-0.591050	may use a sorted	-0.124939
-0.526641	small then a sorted	-0.124939
-0.504156	simplicity. But a sorted	-0.124939
-0.358927	conflicting considerations of efficiency,	-0.124939
-0.358010	to improve cache efficiency,	-0.124939
-0.656497	a compromise between efficiency,	-0.124939
-0.596551	code is the same.	-0.124939
-0.358677	and always the same.	-0.124939
-0.504828	are exactly the same.	-0.124939
-0.064986	standard template library (STL)	-0.124939
-0.509080	Standard Template Library (STL)	-0.124939
-0.762305	want to get rid	-0.124939
-0.421457	Then we get rid	-0.124939
-0.325446	we don't get rid	-0.124939
-0.351708	jobs and 10 ms	-0.124939
-0.339494	slices to 120 ms	-0.124939
-0.314785	of typically 30 ms	-0.124939
-0.462007	program has two arrays,	-0.124939
-0.355973	etc. In large arrays,	-0.124939
-0.459706	are no big arrays,	-0.124939
-0.832367	if the elements matrix[r][c]	-0.124939
-0.553502	i.e. each element matrix[r][c]	-0.124939
-0.443557	diagonal. Each element matrix[r][c]	-0.124939
-0.885040	the program to issue	-0.124939
-0.959216	is not an issue	-0.124939
-0.336329	is a portability issue	-0.124939
-0.588663	simplest way to solve	-0.124939
-0.659186	are designed to solve	-0.124939
-0.879362	This does not solve	-0.124939
-0.521561	not a problem since	-0.124939
-0.607458	not been updated since	-0.124939
-0.237951	of clock pulses since	-0.124939
-0.462897	different purposes is beyond	-0.124939
-0.358346	database queries is beyond	-0.124939
-0.358346	of coprocessors is beyond	-0.124939
-0.719860	code becomes more readable	-0.124939
-0.355948	assembly output more readable	-0.124939
-0.237959	is not human readable	-0.124939
-0.557742	an operand is infinity	-0.124939
-1.056167	result will be infinity	-0.124939
-0.463378	was zero or infinity	-0.124939
-1.749078	a lot of bookkeeping	-0.124939
-0.584379	costs of this bookkeeping	-0.124939
-0.346481	example explains why bookkeeping	-0.124939
-0.461669	combined by some formula	-0.124939
-0.349143	use the safe formula	-0.124939
-0.821249	finding the right formula	-0.124939
-0.588059	deeper into the technical	-0.124939
-0.580776	completely because of technical	-0.124939
-0.348387	the inlining causes technical	-0.124939
-0.355437	instr. set AVX instr.	-0.124939
-0.345257	instr. set SSE4.1 instr.	-0.124939
-0.435069	set Suppl. SSE3 instr.	-0.124939
-0.601124	indeed of the specified	-0.124939
-1.649406	depending on the specified	-0.124939
-0.497944	program are typically specified	-0.124939
-0.828247	smarter ways of organizing	-0.124939
-0.764447	performance penalty for organizing	-0.124939
-0.726205	the performance by organizing	-0.124939
-0.599073	matrix[c][r]. // Example 9.5a	-0.124939
-1.295194	loop in example 9.5a	-0.124939
-0.353544	matrix using example 9.5a	-0.124939
-0.358940	of && is false,	-0.124939
-0.462053	&& false = false,	-0.124939
-0.462053	&& !a = false,	-0.124939
-0.358887	is free and open	-0.124939
-0.358772	Function inlining can open	-0.124939
-0.347516	Open Watcom Another open	-0.124939
-0.569589	find the optimal decomposition	-0.124939
-0.237951	principles here: functional decomposition	-0.124939
-0.237951	data decomposition. Functional decomposition	-0.124939
-0.358927	The fallacy of measuring	-0.124939
-0.358887	hot spots and measuring	-0.124939
-0.575358	confirmed this by measuring	-0.124939
-0.102881	146 below. 3.7 File	-0.124939
-0.102881	....................................................... 20 3.7 File	-0.124939
-0.237959	of these categories: File	-0.124939
-1.056094	memory allocation is negligible	-0.124939
-0.764062	exception handling is negligible	-0.124939
-0.971238	is only a negligible	-0.124939
-0.588444	caching, but it took	-0.124939
-0.763793	} This code took	-0.124939
-0.352982	to 15.1c? We took	-0.124939
-0.357351	caller, and so on.	-0.124939
-0.773711	it is running on.	-0.124939
-0.861656	optimization options turned on.	-0.124939
-0.455352	eight logical processors. Hyperthreading	-0.124939
-0.102881	Multithreading.............................................................................................................. 101 10.1 Hyperthreading	-0.124939
-0.102881	2007 (www.intel.com/technology/itj/). 10.1 Hyperthreading	-0.124939
-0.573026	bits 0 - 30	-0.124939
-0.353854	slices of typically 30	-0.124939
-0.237951	(see page 142). 30	-0.124939
-0.503869	function pointer which initially	-0.124939
-0.657646	// Function pointer initially	-0.124939
-0.421426	The PLT entry initially	-0.124939
-0.588770	contentions do not occur.	-0.124939
-0.587924	aliasing does not occur.	-0.124939
-1.114546	that it doesn't occur.	-0.124939
-0.442067	with character arrays. Strings	-0.124939
-0.102881	..................................................................................................... 93 9.8 Strings	-0.124939
-0.102881	specific needs. 9.8 Strings	-0.124939
-0.711718	7.32 Preprocessing directives Preprocessing	-0.124939
-0.102881	time-critical code. 7.32 Preprocessing	-0.124939
-0.102881	.............................................................................. 65 7.32 Preprocessing	-0.124939
-1.947646	is possible to utilize	-0.124939
-2.118108	in order to utilize	-0.124939
-0.343760	way to fully utilize	-0.124939
-0.474312	a vector of (0,0,0,0,0,0,0,0)	-0.301030
-0.350881	in this example: 38	-0.124939
-0.314785	Smart pointers .......................................................................................................... 38	-0.124939
-0.294265	7.10 Arrays ..................................................................................................................... 38	-0.124939
-0.587364	the pointer or reference.	-0.124939
-0.525336	by a const reference.	-0.124939
-0.382896	returning a null reference.	-0.124939
-0.294288	== 2 #define FUNCNAME	-0.124939
-0.294288	== 8 #define FUNCNAME	-0.124939
-0.294288	== 5 #define FUNCNAME	-0.124939
-0.358975	program. During the history	-0.124939
-0.570368	precompiled code. The history	-0.124939
-0.294265	on the past history	-0.124939
-0.724492	} }; class CChild2	-0.124939
-0.294265	{ CChild1 Object1; CChild2	-0.124939
-0.237951	= &Object1; p1->Hello(); CChild2	-0.124939
-0.706961	except the sign bit:	-0.124939
-0.493017	inverting the sign bit:	-0.124939
-0.320271	shift out sign bit:	-0.124939
-0.348406	are various discussion forums	-0.124939
-0.314785	www.amd.com. 163 Internet forums	-0.124939
-0.294265	forums Several internet forums	-0.124939
-0.350371	support for relative addressing	-0.124939
-0.255935	instruction for self-relative addressing	-0.124939
-0.473714	set supports self-relative addressing	-0.124939
-0.608159	int size = 1024;	-0.301030
-0.889705	languages such as C#,	-0.124939
-1.318059	of the code. C#,	-0.124939
-0.294265	written in Java, C#,	-0.124939
-0.574982	beginning rather than allocating	-0.124939
-0.574982	advance rather than allocating	-0.124939
-0.294275	piecewise or re- allocating	-0.124939
-0.594108	example so that a+b	-0.124939
-0.596458	folding - n.a. a+b	-0.124939
-0.498939	Integer algebra reductions: a+b	-0.124939
-0.557126	This should be taken	-0.124939
-0.557126	problems should be taken	-0.124939
-0.557126	considerations should be taken	-0.124939
-0.952055	depending on the microprocessor.	-0.124939
-0.573265	each type of microprocessor.	-0.124939
-0.598325	allows the function argument	-0.124939
-0.570182	conclusion to this argument	-0.124939
-0.572299	operators). The same argument	-0.124939
-0.358119	... } If Func1	-0.124939
-0.356513	Example 8.21 void Func1	-0.124939
-0.862575	necessary information about Func1	-0.124939
-0.704364	shared objects in Unix-like	-0.124939
-0.990934	Shared objects in Unix-like	-0.124939
-0.858935	choice for all Unix-like	-0.124939
-0.358587	x --- - -----	-0.124939
-1.089401	x x- x -----	-0.124939
-0.294265	----- x---- x---- -----	-0.124939
-1.423598	= b * 2.5	-0.124939
-0.357012	language ............................................................................... 8 2.5	-0.124939
-0.496315	as C++ compilers. 2.5	-0.124939
-0.586817	same code and read-only	-0.124939
-0.358429	code section and read-only	-0.124939
-0.595618	Data that are read-only	-0.124939
-1.195398	variables in a well-structured	-0.124939
-0.527228	making clear and well-structured	-0.124939
-0.540597	to a more well-structured	-0.124939
-0.356423	the remaining bits represent	-0.124939
-0.446304	the subsequent counts represent	-0.124939
-0.237951	certain to truly represent	-0.124939
-0.840020	difficult to find elsewhere.	-0.124939
-0.473726	must be found elsewhere.	-0.124939
-0.294265	can be reused elsewhere.	-0.124939
-1.076759	use of the micro-op	-0.124939
-0.596150	processors with a micro-op	-0.124939
-0.358724	code cache or micro-op	-0.124939
-0.763820	which one is best.	-0.124939
-0.540899	which implementation is best.	-0.124939
-0.355349	which one works best.	-0.124939
-0.358927	it. Instead of returning	-0.124939
-0.358682	unconventional manner by returning	-0.124939
-0.504560	automatically deallocated when returning	-0.124939
-0.350856	registers. Disadvantages are: Long	-0.124939
-0.510570	and double precision. Long	-0.124939
-0.630036	out of order. Long	-0.124939
-0.657376	for (c2 = r1;	-0.124939
-0.657376	for (r2 = r1;	-0.124939
-0.357224	0; c1 < r1;	-0.124939
-0.527294	sampling requires a CPU-	-0.124939
-0.841915	library functions. The CPU-	-0.124939
-0.598249	complicated to make CPU-	-0.124939
-0.463640	; jump to top	-0.124939
-0.484387	of a ; top	-0.124939
-0.444842	$B1$2 ebx ; top	-0.124939
-1.628826	is important to decide	-0.124939
-0.504973	the factors that decide	-0.124939
-0.358520	annoying. We may decide	-0.124939
-0.419239	stored near each other.	-0.425969
-0.349456	underflow neutralize each other.	-0.124939
-0.304419	with a square brackets	-0.124939
-0.304419	ebx. The square brackets	-0.124939
-0.325431	exiting the {} brackets	-0.124939
-0.314785	been updated since 2004.	-0.124939
-0.237951	Compiler v. 8.42n, 2004.	-0.124939
-0.237951	VectorC v. 2.1.7, 2004.	-0.124939
-1.230757	repeat count is odd	-0.124939
-0.358555	11.2b was an odd	-0.124939
-0.527391	seem a little odd	-0.124939
-1.735132	example: // Example 7.7	-0.124939
-0.325421	and decrement operators. 7.7	-0.124939
-0.294265	references ............................................................................................ 36 7.7	-0.124939
-0.738877	Intel C++ Compiler Documentation	-0.124939
-0.294265	A GNU Free Documentation	-0.124939
-0.237951	std.org/jtc1/sc22/wg21/docs/TR18015.pdf. OpenMP. www.openmp.org. Documentation	-0.124939
-0.334102	incompatible or error prone.	-0.124939
-0.135685	and more error prone.	-0.124939
-0.135685	therefore more error prone.	-0.124939
-0.358283	and pow at compile-	-0.124939
-0.503678	if), but no compile-	-0.124939
-0.346484	C++ should allow compile-	-0.124939
-0.356041	from any function. Global	-0.124939
-0.348384	can avoid it. Global	-0.124939
-1.240835	the function returns. Global	-0.124939
-0.357281	(GOT). These table lookups	-0.124939
-0.968362	GOT and PLT lookups	-0.124939
-0.237951	Number of simultaneous lookups	-0.124939
-0.356825	x-xxxx--x Profile-guided optimization Whole	-0.124939
-0.535706	up a program. Whole	-0.124939
-0.237951	Interprocedural optimization /Og Whole	-0.124939
-0.358738	n.a. (-a)*(-b) = a*b	-0.124939
-0.237951	a+b = b+a a*b	-0.124939
-0.237951	a+b = b+a, a*b	-0.124939
-0.599111	option for the linker.	-0.124939
-0.593555	file from the linker.	-0.124939
-0.502738	load the dynamic linker.	-0.124939
-1.486480	the sake of security.	-0.124939
-0.143290	C++ relates to security.	-0.124939
-0.143290	language relates to security.	-0.124939
-0.563862	than the table lookup.	-0.124939
-1.098919	by a table lookup.	-0.124939
-0.630073	for vectorized table lookup.	-0.124939
-0.563677	aliasing. See page 78	-0.124939
-0.563677	occur. See page 78	-0.124939
-0.490342	a function library. 78	-0.124939
-0.572905	register size is handled	-0.124939
-0.358642	This triangle is handled	-0.124939
-0.596819	feedback should be handled	-0.124939
-0.354227	void Func1 (int a[],	-0.124939
-0.048401	8.26a void Func(int a[],	-0.124939
-0.048401	8.26b void Func(int a[],	-0.124939
-0.358937	last line is implicitly	-0.124939
-0.358702	the memcpy function implicitly	-0.124939
-0.546273	Multiplications are done implicitly	-0.124939
-0.159863	strings including the terminating	-0.425969
-0.357078	require cleanup before terminating	-0.124939
-0.358589	and not not _WIN32	-0.124939
-0.352067	_LP64 Windows platform _WIN32	-0.124939
-0.314785	Windows platform _WIN32 _WIN32	-0.124939
-0.463637	necessary calculations of (2n	-0.124939
-0.572292	as a * (2n	-0.124939
-0.460190	n. The constant (2n	-0.124939
-0.358117	performance test that measures	-0.124939
-0.358117	a counter that measures	-0.124939
-0.539718	processes. The profiler measures	-0.124939
-0.726685	of additions and multiplications.	-0.124939
-0.525877	additions and no multiplications.	-0.124939
-0.499712	with only four multiplications.	-0.124939
-0.357021	sufficient for less intensive	-0.124939
-0.102881	in a computationally intensive	-0.124939
-0.102881	are not computationally intensive	-0.124939
-1.778194	that can be moved	-0.124939
-0.597436	calculation may be moved	-0.124939
-0.463386	be copied or moved	-0.124939
-0.358738	CriticalFunction(); timediff[i] = ReadTSC()	-0.124939
-0.568916	<intrin.h> long long ReadTSC()	-0.124939
-0.522737	... // Use ReadTSC()	-0.124939
-0.956223	the result is valid.	-0.124939
-0.658702	the conversion is valid.	-0.124939
-1.080780	second operand is valid.	-0.124939
-0.326168	1024; int a[size], b[size];	-0.124939
-0.217345	i; float a[size], b[size];	-0.124939
-0.135932	1000; float a[size], b[size];	-0.124939
-0.351305	the Gnu compiler. Not	-0.124939
-0.349781	code for vectorization Not	-0.124939
-0.237951	Borland C++ builder. Not	-0.124939
-0.358393	is achieved when none	-0.124939
-0.354415	any expression, but none	-0.124939
-0.354415	to 15.1c, but none	-0.124939
-0.596493	X?" rather than "what	-0.124939
-0.237951	programmer typically thinks "what	-0.124939
-0.237951	of the kind: "what	-0.124939
-0.408199	only an addition. Comparing	-0.124939
-0.237951	the condition clause. Comparing	-0.124939
-0.237951	Microsoft Table 2.1. Comparing	-0.124939
-0.450260	cores, vector processing instructions,	-0.124939
-0.325421	algorithm of sequential instructions,	-0.124939
-0.237951	flush and fence instructions,	-0.124939
-0.846880	different instruction sets Microprocessor	-0.124939
-0.341879	the wrong branch. Microprocessor	-0.124939
-0.421426	1997. Mostly obsolete. Microprocessor	-0.124939
-0.460930	implemented with template metaprogramming.	-0.124939
-0.544907	we may need metaprogramming.	-0.124939
-0.514746	then we need metaprogramming.	-0.124939
-0.463351	below. Intrinsic function Size	-0.124939
-0.866286	Type of elements Size	-0.124939
-0.347513	of these classes. Size	-0.124939
-1.652740	be used for metaprogramming,	-0.124939
-0.449865	algorithm with template metaprogramming,	-0.124939
-0.449865	In C++ template metaprogramming,	-0.124939
-1.153562	optimizing compiler can bypass	-0.124939
-0.586099	__intel_cpu_feature_indicator_x. You can bypass	-0.124939
-0.358730	on. Replace or bypass	-0.124939
-0.500902	/Fa for assembly output.	-0.124939
-0.568570	for assembly language output.	-0.124939
-0.353454	that produce Boolean output.	-0.124939
-0.649785	Choice of microprocessor ...........................................................................................	-0.124939
-0.975183	Floating point division ...........................................................................................	-0.124939
-0.646252	the optimal platform ...........................................................................................	-0.124939
-1.042186	for finding the numerically	-0.124939
-0.358826	14.30 finds the numerically	-0.124939
-0.237959	2; // Find numerically	-0.124939
-0.352693	in an || expression.	-0.124939
-0.483918	gives the chosen expression.	-0.124939
-0.341888	is an arithmetic expression.	-0.124939
-0.758862	7.9 Smart pointers ..........................................................................................................	-0.124939
-0.615804	20 Copyright notice ..........................................................................................................	-0.124939
-0.382896	120 12.10 Conclusion ..........................................................................................................	-0.124939
-0.877335	0; } The InstructionSet()	-0.124939
-0.143220	Header file for InstructionSet()	-0.425969
-0.649780	Join identical branches Eliminate	-0.124939
-0.922297	y + 1.; Eliminate	-0.124939
-0.429560	branches Eliminate jumps Eliminate	-0.124939
-0.358932	for saving a backup	-0.124939
-0.456063	applications need better backup	-0.124939
-0.237951	and prevent legitimate backup	-0.124939
-0.579199	specific instruction set. 13.6	-0.124939
-0.444418	80.8 65 65 13.6	-0.124939
-0.325421	Implementation ..................................................................................................... 126 13.6	-0.124939
-1.113567	i++) { // Get	-0.124939
-0.569732	parm2) { // Get	-0.124939
-0.356709	function version // Get	-0.124939
-0.590027	function does not throw	-0.124939
-0.497364	F1 will never throw	-0.124939
-0.764732	that can possibly throw	-0.124939
-0.858512	higher instruction set. More	-0.124939
-0.351325	reference to a[i] More	-0.124939
-0.343755	avoid these problems. More	-0.124939
-0.314785	Automatic vectorization Devirtualization ---x-----	-0.124939
-0.294265	reductions: !(!a)=a x-xxxxxxx ---x-----	-0.124939
-0.294265	x-xxxx--x x-xx----- x--x----- ---x-----	-0.124939
-0.221679	p) { return _mm_loadu_si128((__m128i	-0.301030
-1.088085	of programming language Before	-0.124939
-0.754965	find hot spots Before	-0.124939
-0.325421	complicated mathematical tasks. Before	-0.124939
-0.821443	and 64-bit systems. Applications	-0.124939
-0.683887	the user interface. Applications	-0.124939
-0.237951	See page 141. Applications	-0.124939
-0.358587	approximately 12 - 25	-0.124939
-0.348384	of reduced performance. 25	-0.124939
-0.237951	6 Development process...................................................................................................... 25	-0.124939
-0.597241	processors with the AVX-512	-0.124939
-0.212327	library function. 12.2 AVX-512	-0.124939
-0.212327	................................................................. 107 12.2 AVX-512	-0.124939
-0.722986	1 fraction 2 23	-0.124939
-0.519541	versions of their 23	-0.124939
-0.314785	and usability ............................................................................................... 23	-0.124939
-0.648292	the cache will evict	-0.124939
-0.353102	Number 18 will evict	-0.124939
-0.353102	Number 17 will evict	-0.124939
-1.188881	of the function. Copying	-0.124939
-0.354521	to stack memory. Copying	-0.124939
-0.237951	forwards, not backwards. Copying	-0.124939
-0.503772	int x; for (x	-0.124939
-0.357546	= 1.0; for (x	-0.124939
-0.357546	+ B; for (x	-0.124939
-0.498217	on anything else being	-0.124939
-0.456065	consequence of n being	-0.124939
-0.237951	test data. That being	-0.124939
-0.358752	// x^n // sum,	-0.124939
-0.589836	to the first sum,	-0.124939
-0.565147	to the second sum,	-0.124939
-0.358848	has disadvantages: The unrolled	-0.124939
-0.358137	integer power, loop unrolled	-0.124939
-0.435069	may be completely unrolled	-0.124939
-0.504795	different cores is slow.	-0.124939
-0.358642	// Truncation is slow.	-0.124939
-0.352974	Basic was too slow.	-0.124939
-0.069285	elements in aa: StoreVector(aa	-0.602060
-1.735132	example: // Example 7.11	-0.124939
-0.520586	of this problem. 7.11	-0.124939
-0.314785	Arrays ..................................................................................................................... 38 7.11	-0.124939
-0.358975	processor enters the market	-0.124939
-0.659783	to develop and market	-0.124939
-0.574900	old. The CPU market	-0.124939
-1.428511	a vector of vectors,	-0.124939
-0.358897	integer division in vectors,	-0.124939
-0.353454	integers as Boolean vectors,	-0.124939
-0.355528	Any other allocated resource.	-0.124939
-0.512935	is a limited resource.	-0.124939
-0.458804	is a scarce resource.	-0.124939
-0.065076	Intel: "IA-32 Intel Architecture	-0.425969
-0.237959	developer.intel.com. AMD: "AMD64 Architecture	-0.124939
-2.158402	Example: // Example 7.12	-0.124939
-0.781168	a member function. 7.12	-0.124939
-0.325421	Type conversions.................................................................................................... 40 7.12	-0.124939
-0.540784	available registers is limited.	-0.124939
-0.358642	of allocations is limited.	-0.124939
-0.580888	registers is very limited.	-0.124939
-2.159050	Example: // Example 11.3	-0.124939
-1.166736	loop in example 11.3	-0.124939
-0.558381	variable in example 11.3	-0.124939
-0.358724	#define, const or typedef	-0.124939
-0.501382	define function type typedef	-0.124939
-0.355145	with desired parameters typedef	-0.124939
-0.106547	address range from 0x2700	-0.425969
-0.525993	bytes from address 0x2700	-0.124939
-0.652335	int c; }; Replace	-0.124939
-0.314785	is running on. Replace	-0.124939
-0.237951	// Example 7.34b. Replace	-0.124939
-0.294265	any specific model. Instead,	-0.124939
-0.237951	4.5.2, July 2011). Instead,	-0.124939
-0.237951	~ for NOT. Instead,	-0.124939
-0.541158	runtime libraries and frameworks,	-0.124939
-0.352968	on big runtime frameworks,	-0.124939
-0.352399	the large graphics frameworks,	-0.124939
-0.835293	example, x = *(p++)	-0.124939
-0.518932	(*p != 0) *(p++)	-0.124939
-0.237951	> 0; i--) *(p++)	-0.124939
-0.356043	small low-power CPUs (Intel	-0.124939
-0.355602	-mAVX -axSSE3, etc. (Intel	-0.124939
-0.538953	(Intel CPU only) (Intel	-0.124939
-0.585948	same or a nearby	-0.124939
-0.581562	branch and other nearby	-0.124939
-0.354770	disadvantage if other nearby	-0.124939
-0.352974	has become too fragmented.	-0.124939
-0.467448	space to become fragmented.	-0.124939
-0.467448	heap has become fragmented.	-0.124939
-0.874187	rounding instead of truncation.	-0.124939
-0.065742	between rounding and truncation.	-0.124939
-0.172700	of manuals. 7.1 Different	-0.124939
-0.172700	constructs........................................................................ 26 7.1 Different	-0.124939
-0.237959	(not a number). Different	-0.124939
-0.557389	than calculating the logarithm	-0.124939
-0.358677	Without static, the logarithm	-0.124939
-0.358677	overflow. Taking the logarithm	-0.124939
-0.726711	each bit in Day	-0.124939
-0.328382	== Wednesday || Day	-0.124939
-0.328382	== Tuesday || Day	-0.124939
-1.271894	code that is ported	-0.124939
-0.352076	code is later ported	-0.124939
-0.341888	and not easily ported	-0.124939
-0.527104	functions static or inline.	-0.124939
-0.583740	declare the function inline.	-0.124939
-0.583740	Declare the function inline.	-0.124939
-1.878462	the function is big.	-0.124939
-0.357260	can become very big.	-0.124939
-0.541522	count is too big.	-0.124939
-0.358547	The allocation and deallocation	-0.124939
-0.358547	dynamic allocation and deallocation	-0.124939
-0.237959	separately. The allocation, deallocation	-0.124939
-0.230991	procedure linkage table (PLT)	-0.124939
-0.599056	7.22. // Example 7.22	-0.124939
-0.314785	(RTTI) ........................................................................... 54 7.22	-0.124939
-0.294265	use alternative implementations. 7.22	-0.124939
-0.102881	access....................................................................................................... 22 3.14 Context	-0.124939
-0.102881	memory caching. 3.14 Context	-0.124939
-0.237959	to be renewed. Context	-0.124939
-2.158402	Example: // Example 7.23	-0.124939
-0.652335	int c; }; 7.23	-0.124939
-0.314785	Inheritance .............................................................................................................. 54 7.23	-0.124939
-0.463698	Consider running the services	-0.124939
-0.352394	Background services. Many services	-0.124939
-0.442054	performance for background services	-0.124939
-0.897102	conversion // Example 7.20	-0.124939
-0.347513	any non-static access. 7.20	-0.124939
-0.314785	functions (methods)......................................................................... 53 7.20	-0.124939
-1.311749	but this is extremely	-0.124939
-1.025581	this method is extremely	-0.124939
-0.358346	other abuse is extremely	-0.124939
-0.349149	cache is 8 kb	-0.124939
-0.740085	cache of 8 kb	-0.124939
-0.645575	cache is 512 kb	-0.124939
-0.460030	cache space by joining	-0.124939
-1.313030	be avoided by joining	-0.124939
-0.356091	more compact by joining	-0.124939
-0.971168	also applies to decrement	-0.124939
-0.358429	the increment and decrement	-0.124939
-0.358429	respectively. Increment and decrement	-0.124939
-0.358738	% 0x20 = 0x1C.	-0.124939
-0.357950	lines from set 0x1C.	-0.124939
-0.462009	to set number 0x1C.	-0.124939
-0.443910	or malloc and free.	-0.124939
-0.314416	functions malloc and free.	-0.124939
-0.990375	is available for free.	-0.124939
-1.063995	4) { // Check	-0.124939
-0.357729	x=y; y=temp;} // Check	-0.124939
-0.349796	code version. 2. Check	-0.124939
-0.587363	float instead of double,	-0.124939
-0.802931	of a 64-bit double,	-0.124939
-0.325429	as int, float, double,	-0.124939
-0.330626	follows a simple periodic	-0.425969
-0.505390	branches. A simple periodic	-0.124939
-0.599056	address: // Example 7.27	-0.124939
-0.355143	using overloaded functions. 7.27	-0.124939
-0.325421	functions .............................................................................................. 56 7.27	-0.124939
-2.158402	Example: // Example 7.24	-0.124939
-0.314785	destructors .................................................................................. 55 7.24	-0.124939
-0.237951	See page 53. 7.24	-0.124939
-1.456354	rather than the product	-0.124939
-0.357256	better performing software product	-0.124939
-0.314785	(MFC). A competing product	-0.124939
-0.599056	overflow: // Example 7.25	-0.124939
-0.630043	applied to integers. 7.25	-0.124939
-0.314785	Unions .................................................................................................................... 55 7.25	-0.124939
-2.158402	Example: // Example 7.28	-0.124939
-0.810650	in simple cases. 7.28	-0.124939
-0.325421	operators ............................................................................................. 56 7.28	-0.124939
-0.567105	includes pointers and references,	-0.124939
-0.331930	function parameters, pointers, references,	-0.124939
-0.237951	bit systems: Pointers, references,	-0.124939
-0.636052	AMD and VIA CPUs"	-0.301030
-0.357380	It takes some experience	-0.124939
-0.573631	deal of programming experience	-0.124939
-0.345254	the user might experience	-0.124939
-1.539708	more efficient to determine	-0.124939
-2.111693	in order to determine	-0.124939
-0.358251	in Windows) to determine	-0.124939
-0.339517	by template template <typename	-0.124939
-0.339517	bounds checking template <typename	-0.124939
-0.339517	template parameter: template <typename	-0.124939
-0.408199	masm=intel /FA -S Generate	-0.124939
-0.294265	-parallel -openmp -static Generate	-0.124939
-0.237951	map file /Fm Generate	-0.124939
-1.798085	then it is certainly	-0.124939
-1.050838	But it is certainly	-0.124939
-0.358346	often seen, is certainly	-0.124939
-0.352967	table 8.1 below. Devirtualization	-0.124939
-0.527376	expressions Automatic vectorization Devirtualization	-0.124939
-0.237951	// Example 8.19. Devirtualization	-0.124939
-0.898707	this in a pivot	-0.124939
-0.358620	for use as pivot	-0.124939
-0.498939	finding a suitable pivot	-0.124939
-0.546942	Align by 16 __declspec(	-0.124939
-0.341888	on) __restrict __restrict __declspec(	-0.124939
-0.538953	align(16)) __attribute(( aligned(16))) __declspec(	-0.124939
-1.557477	in case of mispredictions	-0.124939
-0.350154	can cause branch mispredictions	-0.124939
-0.350154	function. Provoke branch mispredictions	-0.124939
-0.375959	8.13a int i, a[100],	-0.124939
-0.375959	8.13b int i, a[100],	-0.124939
-0.375959	8.14b int i, a[100],	-0.124939
-2.074923	the number of allocations	-0.124939
-0.358291	cause seven memory allocations	-0.124939
-1.162574	there are many allocations	-0.124939
-0.356198	separate modules if necessary,	-0.124939
-0.356198	RAM space, if necessary,	-0.124939
-0.356198	are modified, if necessary,	-0.124939
-2.158402	Example: // Example 9.4	-0.124939
-0.523142	linked library functions. 9.4	-0.124939
-0.325421	stored together...................................... 88 9.4	-0.124939
-0.588005	bits than a float.	-0.124939
-0.535454	vector of four float.	-0.124939
-0.429569	char, short int, float.	-0.124939
-0.200672	* x + 1.0f;}	-0.124939
-0.348035	return square(x) + 1.0f;}	-0.124939
-1.777559	the code is indeed	-0.124939
-0.358642	Example 8.21 is indeed	-0.124939
-0.358812	thrown exceptions are indeed	-0.124939
-0.555108	results in table 9.1	-0.124939
-0.800357	Optimizing memory access 9.1	-0.124939
-0.336323	access ............................................................................................. 87 9.1	-0.124939
-0.562442	to each other (not	-0.124939
-0.350858	Library versions tested (not	-0.124939
-0.538953	infinity or NAN (not	-0.124939
-0.589089	CPUs have a built-in	-0.124939
-0.504906	string instructions. The built-in	-0.124939
-0.314785	compiler often inserts built-in	-0.124939
-0.570528	suitable choice of n.	-0.124939
-0.453396	only for positive n.	-0.124939
-0.237951	some positive value, n.	-0.124939
-1.185210	lead to a complete	-0.124939
-0.526442	the data. A complete	-0.124939
-0.354913	file http://www.agner.org/optimize/asmlib.zip contains complete	-0.124939
-2.633073	x x x (x)	-0.124939
-0.537463	x- x- x (x)	-0.124939
-0.651556	x (x) x (x)	-0.124939
-0.346484	+ esp ebx ecx,	-0.124939
-0.336329	2:8+esp eax, edx, ecx,	-0.124939
-0.294265	4 ?Func2@@YAXQAHAAH@Z ENDP ecx,	-0.124939
-0.065639	is created or modified.	-0.425969
-0.896483	object is not modified.	-0.124939
-0.122678	x n.a. Constant folding	-0.124939
-0.122678	= 6.0f; Constant folding	-0.124939
-0.122678	few places. Constant folding	-0.124939
-0.570364	variable where it expects	-0.124939
-0.573165	processor the user expects	-0.124939
-0.488879	insufficient. The user expects	-0.124939
-0.460816	Virtual function // Call	-0.124939
-0.106777	c; ... // Call	-0.425969
-0.897762	They can be joined	-0.124939
-0.972250	program will be joined	-0.124939
-0.563416	instances will be joined	-0.124939
-1.025604	to use vector classes,	-0.124939
-0.352970	to use string classes,	-0.124939
-0.349793	to well-tested functions, classes,	-0.124939
-0.573117	optimizing compilers can compute	-0.124939
-0.355981	the code must compute	-0.124939
-0.827061	of loop ; compute	-0.124939
-0.667661	a matter of interpreting	-0.425969
-0.358848	runtime framework for interpreting	-0.124939
-0.358893	the SVML and LIBM	-0.124939
-0.342313	page 131. AMD LIBM	-0.124939
-0.342313	__vrs4_expf __vrd2_exp AMD LIBM	-0.124939
-0.542535	The code that accesses	-0.124939
-0.542535	Any code that accesses	-0.124939
-0.354521	"override" feature. All accesses	-0.124939
-0.599814	back to the $B1$2	-0.124939
-0.450256	1 eax, 100 $B1$2	-0.124939
-0.294265	100 / jl $B1$2	-0.124939
-0.536042	for hard disk copying.	-0.124939
-0.341883	precision math. Memory copying.	-0.124939
-0.237951	effectively preventing illegitimate copying.	-0.124939
-0.015542	Software Developer’s Manual", Volume	-0.124939
-0.065815	Architecture Programmer’s Manual", Volume	-0.124939
-0.599388	13.1 can be placed	-0.124939
-0.461556	can then be placed	-0.124939
-0.584038	pragmas must be placed	-0.124939
-0.443505	around the hot spot.	-0.124939
-0.433264	identify a hot spot.	-0.124939
-0.282429	in this hot spot.	-0.124939
-1.621630	part of a variable,	-0.124939
-0.358622	information about a variable,	-0.124939
-0.587331	increment an integer variable,	-0.124939
-1.749078	a lot of jumping	-0.124939
-1.592371	is used for jumping	-0.124939
-0.352706	necessary destructors after jumping	-0.124939
-1.063538	a long time compared	-0.124939
-0.339499	the following disadvantages compared	-0.124939
-0.237951	short in duration compared	-0.124939
-0.828184	same applies to 3-dimensional	-0.124939
-0.065639	RGB video or 3-dimensional	-0.425969
-0.358927	the destructor of x.	-0.124939
-0.964968	the result in x.	-0.124939
-0.358639	const restriction on x.	-0.124939
-0.358926	expression -(-a) to a.	-0.124939
-0.541172	four results in a.	-0.124939
-0.357767	<< 4) + a.	-0.124939
-0.358926	change pre-increment to post-increment.	-0.124939
-0.358724	use pre-increment or post-increment.	-0.124939
-1.576198	more efficient than post-increment.	-0.124939
-0.592646	Often, it is sufficient	-0.124939
-0.884461	projects, it is sufficient	-0.124939
-0.598305	and may be sufficient	-0.124939
-1.434003	because it is evicted	-0.124939
-0.597254	table to be evicted	-0.124939
-0.593155	0x273F will be evicted	-0.124939
-0.358975	problems separating the flags	-0.124939
-0.353216	carry and zero flags	-0.124939
-0.294265	the so-called partial flags	-0.124939
-0.358897	and r in Sum2	-0.124939
-1.576198	more efficient than Sum2	-0.124939
-0.237951	three functions Sum1, Sum2	-0.124939
-0.474312	a vector of (2,2,2,2,2,2,2,2)	-0.301030
-0.128691	ebx, DWORD PTR [edx]	-0.124939
-0.312387	[esp+8] DWORD PTR [edx]	-0.124939
-1.735132	example: // Example 7.14	-0.124939
-0.325429	7.13 Loops...................................................................................................................... 45 7.14	-0.124939
-0.314785	is too big. 7.14	-0.124939
-0.599056	memset: // Example 7.16	-0.124939
-0.331930	parameters ............................................................................................... 50 7.16	-0.124939
-0.715017	and operating systems". 7.16	-0.124939
-0.599056	fastest: // Example 7.17	-0.124939
-0.511840	delete the object. 7.17	-0.124939
-0.331930	types .............................................................................................. 50 7.17	-0.124939
-0.630453	speed to using templates.	-0.124939
-0.630453	cost to using templates.	-0.124939
-0.237959	implemented as recursive templates.	-0.124939
-1.735132	example: // Example 7.13	-0.124939
-0.331930	the different microprocessors. 7.13	-0.124939
-0.325421	switch statements............................................................................. 43 7.13	-0.124939
-0.599056	conversions: // Example 7.19	-0.124939
-0.348398	first 128 bytes. 7.19	-0.124939
-0.325421	(properties) ............................................................................ 51 7.19	-0.124939
-0.428070	sprintf, etc. But beware	-0.124939
-0.330739	> b) But beware	-0.124939
-0.330739	be inlined. But beware	-0.124939
-0.599056	efficient: // Example 7.18	-0.124939
-0.348384	effect on performance. 7.18	-0.124939
-0.325421	and classes............................................................................................ 51 7.18	-0.124939
-0.802606	on a graphics card	-0.124939
-0.382907	a graphics accelerator card	-0.124939
-0.331940	is extremely inefficient, (4)	-0.124939
-0.237959	GOT, and finally (4)	-0.124939
-0.657318	to swap two elements:	-0.124939
-0.357512	swap two array elements:	-0.124939
-1.173107	can be a viable	-0.124939
-1.141353	may be a viable	-0.124939
-0.314795	pointers .......................................................................................................... 38 7.10	-0.124939
-0.538969	on page 93. 7.10	-0.124939
-0.065647	__m128i two = _mm_set1_epi16(2);	-0.425969
-0.549521	efficient container class templates,	-0.124939
-0.350363	STL. Some STL templates,	-0.124939
-0.358893	code bloat and complexity	-0.124939
-0.521025	With the high complexity	-0.124939
-0.314803	129 130 14.4 511	-0.124939
-0.294275	130 14.4 511 511	-0.124939
-0.172705	...................................................................................................... 37 7.8 Member	-0.124939
-0.172705	has changed. 7.8 Member	-0.124939
-0.358848	similar utility for modifying	-0.124939
-0.358686	a double by modifying	-0.124939
-0.566894	expressions may have undesired	-0.124939
-0.349796	and may produce undesired	-0.124939
-0.294275	Functions ................................................................................................................ 48 7.15	-0.124939
-0.237959	in this respect. 7.15	-0.124939
-0.358852	and GOT. The symbol	-0.124939
-0.450270	object. This so-called symbol	-0.124939
-0.580888	This is very problematic	-0.124939
-0.526416	strings are particularly problematic	-0.124939
-0.584751	who has to invest	-0.124939
-0.463210	is worthwhile to invest	-0.124939
-0.504969	to memset and memcpy,	-0.124939
-1.460054	functions such as memcpy,	-0.124939
-0.249573	than Sum2 and Sum3	-0.124939
-0.249573	Sum1, Sum2 and Sum3	-0.124939
-0.339500	in registers anyway. Pure	-0.124939
-0.331940	that needs them. Pure	-0.124939
-1.435858	but it is impossible	-0.124939
-0.595618	pointers that are impossible	-0.124939
-0.141813	multiple inheritance class B1;	-0.124939
-0.141813	Multiple inheritance class B1;	-0.124939
-0.238708	of a store forwarding	-0.124939
-0.238708	generate a store forwarding	-0.124939
-0.294275	be huge). Far storage,	-0.124939
-0.237959	etc.) have little-endian storage,	-0.124939
-0.549555	system (see page 107).	-0.124939
-0.803748	vectorization (see page 107).	-0.124939
-0.355244	= (10000 / 64)	-0.124939
-0.237959	line size (typically 64)	-0.124939
-0.581089	declare the table static.	-0.124939
-0.237959	make a lookup-table static.	-0.124939
-0.331944	_M_IX86 and _WIN64 _M_X64	-0.124939
-0.294275	and _WIN64 _M_X64 _M_X64	-0.124939
-0.356520	#include <asmlib.h> void CriticalFunction();	-0.124939
-0.237959	time1 = ReadTSC(); CriticalFunction();	-0.124939
-0.294275	x--x----- --xx----- x-xxx---x x-xxx---x	-0.124939
-0.294275	x-xx--xx- x--x----- --xx----- x-xxx---x	-0.124939
-0.461302	different platforms as shown	-0.124939
-0.357091	as _mm_empty() as shown	-0.124939
-0.358893	poor documentation and lack	-0.124939
-0.582722	Older operating systems lack	-0.124939
-0.143015	reciprocal_divisor; y2 = a2	-0.124939
-0.143015	b1; y2 = a2	-0.124939
-0.143015	b2); y1 = a1	-0.124939
-0.143015	y2; y1 = a1	-0.124939
-0.591012	profiling (see page 16)	-0.124939
-1.229433	256; i += 16)	-0.124939
-0.598878	window of a debugger.	-0.124939
-0.595695	trace with a debugger.	-0.124939
-0.585976	Mars compiler is mostly	-0.124939
-0.527284	bit mode and mostly	-0.124939
-0.568415	(vector const & a)	-0.124939
-0.336342	float square (float a)	-0.124939
-0.237959	__asm fld qword ptr	-0.124939
-0.237959	__asm fistp dword ptr	-0.124939
-0.358852	or __attribute__((fastcall)). The fastcall	-0.124939
-0.355842	common names. Use fastcall	-0.124939
-0.597644	optimal number of accumulators	-0.124939
-0.503239	and use multiple accumulators	-0.124939
-0.164486	"assume no pointer aliasing"	-0.124939
-0.102884	.......................................................................................... 126 13.5 Implementation	-0.124939
-0.102884	found elsewhere. 13.5 Implementation	-0.124939
-1.309031	improve the performance significantly	-0.124939
-0.356229	be speeded up significantly	-0.124939
-0.314803	for exploiting fine-grained parallelism.	-0.124939
-0.314795	code contains natural parallelism.	-0.124939
-0.023527	362880, 3628800, 39916800, 479001600};	-0.124939
-0.294275	in the book "Performance	-0.124939
-0.237959	and Adolfy Hoisie: "Performance	-0.124939
-0.463653	of x is type-casted	-0.124939
-0.527116	if pointers are type-casted	-0.124939
-0.357856	x *x; double x4	-0.124939
-0.357668	// x^2 float x4	-0.124939
-0.682804	of the clock frequency.	-0.124939
-0.478098	in the clock frequency.	-0.124939
-0.463641	second step of interpretation	-0.124939
-0.358730	requires compilation or interpretation	-0.124939
-0.576845	and vector operations (chapter	-0.124939
-0.355918	threads. Out-of-order execution (chapter	-0.124939
-0.132925	on my own research,	-0.124939
-0.132925	For my own research,	-0.124939
-0.538969	Numerically Intensive Codes", SIAM	-0.124939
-0.237959	and A. Hoisie, SIAM	-0.124939
-0.653991	= Induction; ; a[i+1]	-0.124939
-0.702772	a[i] = Induction; a[i+1]	-0.124939
-0.331940	xxxxxxxxx xxxxxxx-x xxxxxxxxx x-xxx----	-0.124939
-0.237959	x--x----- ---x----- x---x---x x-xxx----	-0.124939
-0.358730	static buffer or send	-0.124939
-0.353459	So please don't send	-0.124939
-0.329681	Example 7.31b char string[100],	-0.124939
-0.329681	Example 7.31a char string[100],	-0.124939
-0.654974	= char 16 SSSE3	-0.124939
-0.355611	horizontal add, etc. SSSE3	-0.124939
-0.567183	other types of expressions,	-0.124939
-0.896303	than floating point expressions,	-0.124939
-0.461819	appropriate function version CriticalFunctionType	-0.124939
-0.294275	// Function prototype CriticalFunctionType	-0.124939
-0.584912	one (see page 71).	-0.124939
-0.542741	2. (See page 71).	-0.124939
-1.532588	are stored in ASCII	-0.124939
-0.237959	converts a zero-terminated ASCII	-0.124939
-1.054746	objects are not overlapping	-0.124939
-0.901972	the CPU from overlapping	-0.124939
-0.599866	obtained in a computationally	-0.124939
-1.261036	that are not computationally	-0.124939
-0.352977	the Intel mechanism executes	-0.124939
-0.421439	loop. Example 12.4b executes	-0.124939
-0.429573	in the project window	-0.124939
-0.237959	in the disassembly window	-0.124939
-0.358730	to five or ten	-0.124939
-1.161320	the critical function ten	-0.124939
-0.358756	S1 aligned // Structure	-0.124939
-0.325431	800 bytes smaller. Structure	-0.124939
-1.182475	different kinds of jobs.	-0.124939
-0.442067	ms for background jobs.	-0.124939
-0.294275	from everybody. So please	-0.124939
-0.294275	pop-up messages saying please	-0.124939
-0.172705	.................................................................................. 55 7.24 Unions	-0.124939
-0.172705	page 53. 7.24 Unions	-0.124939
-0.294275	and 3B. developer.intel.com. AMD:	-0.124939
-0.538969	are produced regularly. AMD:	-0.124939
-0.841715	+ d = ((a*x+b)*x+c)*x+d	-0.124939
-2.634813	x x x ((a*x+b)*x+c)*x+d	-0.124939
-0.339507	and more important. 9.2	-0.124939
-0.336334	data ......................................................................................... 87 9.2	-0.124939
-0.358940	One kilobyte is 1024	-0.124939
-0.358929	register sizes to 1024	-0.124939
-0.232810	time it was programmed.	-0.124939
-0.294275	// Example 12.5. Aligned	-0.124939
-0.237959	mean good performance). Aligned	-0.124939
-1.190407	based on the past	-0.124939
-0.294275	C++ programs. Writing past	-0.124939
-0.651109	dynamically allocated memory. 9.6	-0.124939
-0.331944	data ...................................................................................................... 90 9.6	-0.124939
-0.599658	member of the object's	-0.124939
-0.324557	a1, a2, b1, b2,	-0.425969
-0.358206	function template because partial	-0.124939
-0.541168	to the so-called partial	-0.124939
-0.143015	a*b*c=a*(b*c) a+b+c+d = (a+b)+(c+d)	-0.124939
-0.143015	a+b+c=c+b+a a+b+c+d = (a+b)+(c+d)	-0.124939
-0.593836	than short int (16	-0.124939
-0.881467	the vector size (16	-0.124939
-0.969361	to the instruction xor	-0.124939
-0.347521	$B1$1: push mov xor	-0.124939
-0.325437	work better. Remember again,	-0.124939
-0.444421	calculating the logarithm again,	-0.124939
-0.325431	Strings ...................................................................................................................... 96 9.9	-0.124939
-0.325431	manual at www.agner.org/optimize/cppexamples.zip. 9.9	-0.124939
-0.566349	of time and resolve	-0.124939
-0.358434	to find and resolve	-0.124939
-1.650974	depending on the context.	-0.124939
-0.822358	to the new context.	-0.124939
-0.595672	the appropriate version (May	-0.425969
-0.196170	dispatcher. See page 131.	-0.124939
-0.599819	ignored if the goal	-0.124939
-0.331944	A more realistic goal	-0.124939
-0.835756	CPU dispatching and discovered	-0.124939
-0.463042	Many programmers have discovered	-0.124939
-0.352354	by 16 float Exp(float	-0.124939
-0.352354	Taylor series float Exp(float	-0.124939
-0.314795	classes ..................................................................................................... 93 9.8	-0.124939
-0.294275	fit specific needs. 9.8	-0.124939
-0.592926	__INTEL_COMPILER n.a. n.a. _MSC_VER	-0.124939
-0.294275	aligning data #ifdef _MSC_VER	-0.124939
-0.102884	unit-testing ...................................................................................... 156 16.3	-0.124939
-0.102884	unreasonably large. 156 16.3	-0.124939
-0.294275	is a 90% chance	-0.124939
-0.237959	with a 50-50 chance	-0.124939
-0.600617	Strings can be manipulated	-0.124939
-0.354211	the CPUID was manipulated	-0.124939
-1.136839	The calculation of c+b	-0.124939
-0.346498	if the subexpression c+b	-0.124939
-0.916258	allows you to override	-0.124939
-0.659192	the ability to override	-0.124939
-0.846217	do not use branches,	-0.124939
-0.498959	this can eliminate branches,	-0.124939
-0.556931	used in multiple applications,	-0.124939
-0.503129	efficient for such applications,	-0.124939
-0.587981	research, I have developed	-0.124939
-0.570732	yet as well developed	-0.124939
-0.331940	complicated template method. 7.29	-0.124939
-0.237959	56 7.28 Templates...............................................................................................................57 7.29	-0.124939
-0.505065	during execution of CriticalFunction.	-0.124939
-0.828218	the calls to CriticalFunction.	-0.124939
-0.537923	smaller. This manual discusses	-0.124939
-0.350865	languages. This section discusses	-0.124939
-0.237959	#define FUNCNAME SelectAddMul_SSE41 #elif	-0.124939
-0.237959	#define FUNCNAME SelectAddMul_SSE2 #elif	-0.124939
-0.331944	3) <<6 ); 7.26	-0.124939
-0.325431	Bitfields ................................................................................................................... 56 7.26	-0.124939
-0.023527	in manual 4: "Instruction	-0.425969
-0.847390	of b is 400	-0.124939
-0.358756	int a[100]; // 400	-0.124939
-0.317446	x x Loop invariant	-0.124939
-0.317446	the compiler. Loop invariant	-0.124939
-0.065031	+ b*x*x + c*x	-0.425969
-1.844757	in the code carefully	-0.124939
-0.357936	multiple versions, each carefully	-0.124939
-0.358393	the stack when CriticalInnerFunction	-0.124939
-0.356520	Example 14.1c void CriticalInnerFunction	-0.124939
-0.237959	---xxx--- a/a=1 --------x a/1=a	-0.124939
-0.237959	(-a)*(-b)=a*b a/a=1 ----x---x a/1=a	-0.124939
-1.333308	& x) { __m128	-0.124939
-0.523469	each. The type __m128	-0.124939
-0.524392	with the & operator;	-0.124939
-0.648039	using the | operator;	-0.124939
-0.596069	it as a subexpression.	-0.124939
-0.558673	around the constant subexpression.	-0.124939
-0.885087	memory space is freed	-0.124939
-1.830982	it may be freed	-0.124939
-0.331944	the bitwise OR operator,	-0.124939
-0.237959	an overloaded assignment operator,	-0.124939
-0.382907	p = &Object2; p->Hello();	-0.124939
-0.237959	= &Object1; p->NotPolymorphic(); p->Hello();	-0.124939
-0.549389	setup. on Intel CPUs:	-0.124939
-1.590226	AMD and VIA CPUs:	-0.124939
-0.522371	true, and all 0's	-0.124939
-0.838648	AND'ed with all 0's	-0.124939
-0.599581	device is a chip	-0.124939
-1.863870	in the same chip	-0.124939
-0.336334	with the ^ operator.	-0.124939
-0.237959	with the sizeof operator.	-0.124939
-0.463442	installation process can proceed	-0.124939
-0.503822	Uninstallation should also proceed	-0.124939
-0.143005	Lowest version int CriticalFunction_386(int	-0.425969
-0.358893	PC's, workstations and scientific	-0.124939
-0.358899	a niche in scientific	-0.124939
-0.917518	the exponent is biased	-0.124939
-1.061452	stored as a biased	-0.124939
-0.599581	bug is a minor	-0.124939
-0.461839	justify a possible minor	-0.124939
-0.598408	graphics on the screen.	-0.124939
-0.358829	to refresh the screen.	-0.124939
-0.585017	comes on the market.	-0.124939
-0.585017	appears on the market.	-0.124939
-0.048403	__declspec( align(16)) __attribute(( aligned(16)))	-0.124939
-0.600004	costs can be justified	-0.124939
-0.597441	time may be justified	-0.124939
-0.463640	repeat or to exit	-0.124939
-0.325437	calls exit. Calling exit	-0.124939
-0.467821	{ y = cos(x);	-0.124939
-0.656861	function or variable having	-0.124939
-0.294275	p1 and p2 having	-0.124939
-0.557988	the time. A for-loop	-0.124939
-0.355794	example 7.32b. A for-loop	-0.124939
-0.458217	bool 1 1 char,	-0.124939
-0.382907	Small data types: char,	-0.124939
-0.294275	a*0=0 a*1=a (-a)*(-b)=a*b a/a=1	-0.124939
-0.237959	x-xxxxx-x (-a)*(-b)=a*b ---xxx--- a/a=1	-0.124939
-0.053307	become a serious legal	-0.124939
-0.580806	for any other resource,	-0.124939
-0.652315	are a scarce resource,	-0.124939
-0.981913	OpenMP and automatic parallelization.	-0.124939
-0.330865	compilers. Use automatic parallelization.	-0.124939
-0.854546	efficient way of keeping	-0.124939
-1.228139	the cost of keeping	-0.124939
-0.294275	less reliable. Event-based sampling:	-0.124939
-0.237959	code line. Time-based sampling:	-0.124939
-0.599073	arrays. // Example 12.5.	-0.124939
-0.352973	F32vec8 F64vec4 Table 12.5.	-0.124939
-0.358598	in advance. This reduces	-0.124939
-0.353667	compiler that automatically reduces	-0.124939
-1.277538	applied to a non-member	-0.124939
-0.349144	to all local non-member	-0.124939
-1.488966	code can be vectorized,	-0.124939
-0.724750	can still be vectorized,	-0.124939
-0.023527	swapd(x,y) {temp=x; x=y; y=temp;}	-0.124939
-0.102884	remove all disturbing influences	-0.124939
-0.102884	conditions. All disturbing influences	-0.124939
-1.748083	The following example explains	-0.124939
-0.352974	non-reduced expression better explains	-0.124939
-2.124680	in order to emulate	-0.124939
-0.550715	class library can emulate	-0.124939
-0.358730	be three or four,	-0.124939
-1.202867	the loop by four,	-0.124939
-0.345002	issues, and I believe	-0.124939
-0.345002	as expected. I believe	-0.124939
-0.550748	maximum value in stdint.h	-0.124939
-0.547607	standard header file stdint.h	-0.124939
-0.287064	integer Common subexpression elimin.,	-0.124939
-0.287064	elimination Common subexpression elimin.,	-0.124939
-0.521856	and only one instance.	-0.124939
-0.872763	has only one instance.	-0.124939
-0.064233	copy matrix void TransposeCopy(double	-0.425969
-0.358559	slow CPU, an insufficient	-0.124939
-0.358251	selected. Compiler has insufficient	-0.124939
-0.659964	to overcome the dangers	-0.124939
-1.186718	a number of dangers	-0.124939
-0.858411	the objects are aligned.	-0.124939
-0.294275	not be optimally aligned.	-0.124939
-1.456381	rather than the external	-0.124939
-0.463299	to link with external	-0.124939
-0.669120	{ cout << "Error:	-0.425969
-0.345265	SSE3 tmmintrin.h SSE4.1 smmintrin.h	-0.124939
-0.314795	SSE4.2 nmmintrin.h (MS) smmintrin.h	-0.124939
-0.352976	of. Big runtime frameworks.	-0.124939
-0.352080	language and interface frameworks.	-0.124939
-0.102884	Weekdays { Sunday, Monday,	-0.124939
-0.102884	the constants Sunday, Monday,	-0.124939
-0.615843	and Mac OS X,	-0.124939
-0.791934	32-bit Mac OS X,	-0.124939
-0.358361	without restrictions. A GNU	-0.124939
-0.237959	in compiler price GNU	-0.124939
-0.357945	example 13.1 page 127.	-0.124939
-0.339503	from -128 generates 127.	-0.124939
-0.709637	for each version FuncType	-0.124939
-0.351496	the selected version FuncType	-0.124939
-0.583219	Virtual call to C1::f	-0.124939
-0.502351	it can call C1::f	-0.124939
-0.567455	program less efficient. Splitting	-0.124939
-0.237959	with this rule. Splitting	-0.124939
-0.511853	for vector operations. Algorithms	-0.124939
-0.237959	vectors and matrixes. Algorithms	-0.124939
-0.294275	-msse3 -mssse3 -msse4.1 -mAVX	-0.124939
-0.237959	/arch:SSSE2 -msse4.1 /arch:SSE4.1 -mAVX	-0.124939
-0.833830	not have to worry	-0.124939
-0.566051	do have to worry	-0.124939
-1.284054	in a single instruction.	-0.124939
-0.520608	this bit scan instruction.	-0.124939
-0.581406	x10; } // x^2	-0.124939
-0.357733	* x; // x^2	-0.124939
-1.729790	it can be disabled	-0.124939
-1.165228	when they are disabled	-0.124939
-0.898615	directly to the CPU-specific	-0.124939
-0.358812	monitor counters are CPU-specific	-0.124939
-0.594266	improvements). // Example 8.26b	-0.124939
-0.459320	8.26b: ; Example 8.26b	-0.124939
-1.136154	instruction set. The preprocessing	-0.124939
-0.566330	The library has preprocessing	-0.124939
-0.561851	streams with different strides.	-0.124939
-0.481358	patterns with fixed strides.	-0.124939
-0.659867	from 0 to 15.	-0.124939
-0.589682	below, on page 15.	-0.124939
-1.124838	it takes to develop	-0.425969
-0.585141	} }; // Full	-0.425969
-1.178978	x) { // (N	-0.124939
-0.325431	N: #define N1 (N	-0.124939
-0.358756	b[arraysize], c[arraysize]; // Enable	-0.124939
-0.294275	12.1b to 12.1a. Enable	-0.124939
-0.578738	simple in most cases:	-0.124939
-1.484489	in the following cases:	-0.124939
-0.181323	AVX code to non-AVX	-0.124939
-0.357687	- a+b+c = a+(b+c)	-0.124939
-0.357687	n.a. (a+b)+c = a+(b+c)	-0.124939
-0.065793	or moving the mouse.	-0.124939
-0.331940	1 - 5. www.amd.com.	-0.124939
-0.237959	Family 15h Processors". www.amd.com.	-0.124939
-0.358929	adding -100 to -56	-0.124939
-0.870293	give the result -56	-0.124939
-0.591741	libraries is more difficult.	-0.124939
-0.355952	detailed optimization more difficult.	-0.124939
-0.355611	/arch:AVX /QaxSSE3, etc. -msse3	-0.124939
-0.294275	/arch:AVX /openmp /MT -msse3	-0.124939
-0.421439	from example 8.26a (32-bit	-0.124939
-0.237959	allows bigger segments (32-bit	-0.124939
-0.336338	all the B values.	-0.124939
-0.421439	and "best case" values.	-0.124939
-0.382907	double) /arch:SSE2 -msse2 /arch:SSE2	-0.124939
-0.237959	ger or double) /arch:SSE2	-0.124939
-0.502994	Define vector objects Vec8s	-0.124939
-0.336334	int 128 Is16vec8 Vec8s	-0.124939
-0.357952	<typename MyChild> class CParent	-0.124939
-0.343763	"Hello 2" Here CParent	-0.124939
-0.538969	(see page 78). Adding	-0.124939
-0.237959	value wrap around. Adding	-0.124939
-0.237959	3A and 3B. developer.intel.com.	-0.124939
-0.237959	Optimization Reference Manual". developer.intel.com.	-0.124939
-0.102884	frame- pointer -fomit- frame-	-0.124939
-0.102884	frame /Oy -fomit- frame-	-0.124939
-0.330862	class library #include <stdio.h>	-0.124939
-0.330862	Example 16.2 #include <stdio.h>	-0.124939
-0.325431	structures ............................................................. 96 9.11	-0.124939
-0.382907	Hoisie, SIAM 2001. 9.11	-0.124939
-0.596511	-fpic because the relocations	-0.124939
-0.812480	it will generate relocations	-0.124939
-0.249155	be infinity or NAN	-0.124939
-0.249155	or infinity or NAN	-0.124939
-0.325431	sequentially .......................................................................................... 96 9.10	-0.124939
-0.237959	order is opposite). 9.10	-0.124939
-0.338200	// Writes "Hello 2"	-0.124939
-0.524927	factorial } return sum;	-0.124939
-0.555559	s3 = 0, sum;	-0.124939
-0.341892	number of vectors. 12.10	-0.124939
-0.339503	vectors ....................................................... 120 12.10	-0.124939
-1.115710	The overhead of semaphores,	-0.124939
-0.595324	threads, such as semaphores,	-0.124939
-0.331940	on most microprocessors. Multiplication	-0.124939
-0.575689	on the microprocessor. Multiplication	-0.124939
-0.358744	that N1 = N&(N-1)	-0.124939
-0.902028	of 2 then N&(N-1)	-0.124939
-0.314527	8.26a compiled to assembly:	-0.124939
-0.314527	8.26b compiled to assembly:	-0.124939
-0.023527	versions are produced regularly.	-0.124939
-0.463536	and searching for vacant	-0.124939
-0.598744	address is not vacant	-0.124939
-0.600953	factors in the early	-0.124939
-0.356111	just-in-time compilation. Some early	-0.124939
-0.461724	(CGrandParent) contains any non-polymorphic	-0.124939
-0.237959	templates // Place non-polymorphic	-0.124939
-0.065647	2.2, C = 3.3;	-0.124939
-0.461831	costly to many users.	-0.124939
-0.357264	hard working software users.	-0.124939
-0.504625	functions must have extern	-0.124939
-0.358145	common entry point extern	-0.124939
-1.456381	rather than the heap.	-0.124939
-0.572550	managing a memory heap.	-0.124939
-0.463541	double format. The formats	-0.124939
-0.356474	and standardized file formats	-0.124939
-0.600004	exceptions can be ruled	-0.124939
-0.876034	they cannot be ruled	-0.124939
-0.463653	memory addresses is reused	-0.124939
-0.600617	c+b can be reused	-0.124939
-0.346493	the last vector. Organize	-0.124939
-0.237959	is a bottleneck. Organize	-0.124939
-0.161083	CChild1 : public CParent<CChild1>	-0.425969
-1.070283	with: // Example 14.12b	-0.124939
-0.592946	unrolling in example 14.12b	-0.124939
-0.355338	favorable: Small data types:	-0.124939
-0.355338	favorable: Larger data types:	-0.124939
-0.599282	Correction for the FDIV	-0.124939
-0.358852	"FDIV bug". The FDIV	-0.124939
-0.590532	value before the decimal	-0.124939
-0.891358	constant with a decimal	-0.124939
-0.357668	= 1.f; float nfac	-0.124939
-0.531332	xn *= x; nfac	-0.124939
-0.452050	files and network connections.	-0.124939
-0.349140	mutexes. Open database connections.	-0.124939
-1.288691	than in a PC.	-0.124939
-0.358625	previously required a PC.	-0.124939
-1.377126	are based on hacks	-0.124939
-0.237959	rather than self-styled hacks	-0.124939
-0.356169	improve search times 24	-0.124939
-0.325431	optimal algorithm ....................................................................................... 24	-0.124939
-0.356894	switch statements often suffer	-0.124939
-0.523471	code can therefore suffer	-0.124939
-0.358893	try, catch, and throw.	-0.124939
-0.557633	a function can throw.	-0.124939
-0.654882	the same bits differently.	-0.124939
-0.355349	Dynamic linking works differently.	-0.124939
-0.172705	by 16 __declspec( align(16))	-0.124939
-0.172705	__attribute(( aligned(16))) __declspec( align(16))	-0.124939
-0.342755	Size of each element,	-0.425969
-0.635350	3.5 Program loading Often,	-0.124939
-0.325437	year or two. Often,	-0.124939
-0.408211	the error condition. Replacing	-0.124939
-0.575703	the function inline. Replacing	-0.124939
-0.065031	a*b+a*c=a*(b+c) a*x*x*x + b*x*x	-0.425969
-0.386766	subroutines in assembly language".	-0.124939
-0.463593	pointers efficient, and that's	-0.124939
-0.503845	different threads, but that's	-0.124939
-0.314795	dispatching .................................................................................... 124 13.3	-0.124939
-0.538969	time of programming. 13.3	-0.124939
-1.337247	that there are inherent	-0.124939
-1.131577	do not have inherent	-0.124939
-0.237959	/Qparallel -parallel -openmp -static	-0.124939
-0.237959	/Qopenmp -m32 -m64 -static	-0.124939
-0.311818	unused bytes S1 ArrayOfStructures[100];	-0.124939
-0.311818	19 }; S1 ArrayOfStructures[100];	-0.124939
-0.331944	dispatch strategies........................................................................................ 122 13.2	-0.124939
-0.331940	the source files. 13.2	-0.124939
-0.525929	element. The integer comparison	-0.124939
-0.314795	make an approximate comparison	-0.124939
-0.358978	compiler takes the hint	-0.124939
-0.971238	is only a hint	-0.124939
-0.325431	maintenance .......................................................................................... 126 13.5	-0.124939
-0.314795	be found elsewhere. 13.5	-0.124939
-0.357305	cases........................................................................................................ 124 2 13.4	-0.124939
-0.237959	a reliable decision. 13.4	-0.124939
-0.355145	compiler ......................................................................... 128 13.7	-0.124939
-0.314795	particularly critical. 129 13.7	-0.124939
-0.314795	usage in kernel code"	-0.124939
-0.237959	The name "position-independent code"	-0.124939
-0.358600	predicted well, of course.	-0.124939
-0.358600	not safe, of course.	-0.124939
-0.330862	series, vectorized #include <dvec.h>	-0.124939
-0.330862	classes 114 #include <dvec.h>	-0.124939
-0.588082	happens inside the loop,	-0.124939
-0.525183	including the while loop,	-0.124939
-0.649002	an 8-bit signed number,	-0.124939
-0.442075	the CPU family number,	-0.124939
-0.570525	illustrates such a case:	-0.124939
-0.341892	string to lower case:	-0.124939
-1.322403	be avoided by rolling	-0.124939
-0.357388	example 8.26a by rolling	-0.124939
-0.570609	operations outside the loop:	-0.124939
-0.351328	// Critical innermost loop:	-0.124939
-0.358744	x; x.f = 2.0f;	-0.124939
-0.901149	* x + 2.0f;	-0.124939
-0.587729	compact. See page 52.	-0.124939
-0.353754	example 7.35 page 52.	-0.124939
-1.347231	are useful for supporting	-0.124939
-0.358199	development tools for supporting	-0.124939
-0.463520	to check that thrown	-0.124939
-0.339500	check for exceptions thrown	-0.124939
-0.575362	do this by invoking	-0.124939
-0.356608	application program without invoking	-0.124939
-0.065647	int FactorialTable[13] = {1,	-0.425969
-1.952939	is possible to construct	-0.124939
-1.189381	Make the function construct	-0.124939
-1.195589	before it is compiled.	-0.124939
-1.732577	the program is compiled.	-0.124939
-0.346724	matrix 96 void transpose(double	-0.124939
-0.346724	Example 9.5b void transpose(double	-0.124939
-0.172705	page 105. 8.7 Checking	-0.124939
-0.172705	.............................................................................................. 82 8.7 Checking	-0.124939
-0.218159	as common subexpression elimination,	-0.124939
-0.218159	inlining, common subexpression elimination,	-0.124939
-2.021462	for (i = StringLength;	-0.124939
-0.570271	string; int i, StringLength;	-0.124939
-0.353864	integration, web application integration,	-0.124939
-0.349140	GUI development, database integration,	-0.124939
-1.021966	than a few kilobytes	-0.124939
-0.314803	Matrix size Total kilobytes	-0.124939
-0.356266	code size have got	-0.124939
-0.654557	instruction sets have got	-0.124939
-0.358728	to declare it locally	-0.124939
-0.648545	and other resources locally	-0.124939
-0.358931	only half of it,	-0.124939
-0.649410	program logic allows it,	-0.124939
-0.463403	* 4 = 32.	-0.124939
-0.351712	an integer, usually 32.	-0.124939
-0.596873	SSE2 is the minimum	-0.124939
-0.356429	declaration size, bits minimum	-0.124939
-1.071597	way to make thread-specific	-0.124939
-0.347521	class for containing thread-specific	-0.124939
-0.358556	Example 8.12b int a[2];	-0.124939
-0.570271	8.12a int i, a[2];	-0.124939
-0.462598	higher address which can't	-0.124939
-0.357299	be shared. You can't	-0.124939
-0.355150	with vector parameters Vec4f	-0.124939
-0.237959	Vec8ui Vec4q Vec4uq Vec4f	-0.124939
-0.473708	a function call. (2)	-0.124939
-0.408211	before it occurs, (2)	-0.124939
-0.518127	See the preceding paragraph	-0.124939
-0.311817	unwinding The preceding paragraph	-0.124939
-0.462903	The file will remain	-0.124939
-0.537305	at the diagonal remain	-0.124939
-0.212332	uncommon for virus scanners	-0.124939
-0.212332	users. Firewalls, virus scanners	-0.124939
-0.658291	} This loop calculates	-0.124939
-0.358111	following example, which calculates	-0.124939
-0.348397	files or accessing databases,	-0.124939
-0.325431	to network resources, databases,	-0.124939
-0.237959	Scott Meyers: "Effective C++".	-0.124939
-0.237959	and "More Effective C++".	-0.124939
-0.520907	(MS Visual Studio 2008	-0.124939
-0.237959	and Windows Server 2008	-0.124939
-0.621854	/Gy -ffunction- sections /Gy	-0.124939
-0.294275	unreferen- ced functions) /Gy	-0.124939
-0.429573	x.c = C; Assuming	-0.124939
-0.237959	features it has. Assuming	-0.124939
-0.516703	that a binary search,	-0.124939
-0.549385	even a linear search,	-0.124939
-0.875376	in Intel compiler .........................................................................	-0.124939
-1.026468	in Gnu compiler .........................................................................	-0.124939
-0.996108	of different C++ constructs	-0.124939
-0.356478	the advanced programming constructs	-0.124939
-0.065312	different platforms, different screen	-0.425969
-0.339503	for further optimizations. Loops	-0.124939
-0.314795	different microprocessors. 7.13 Loops	-0.124939
-0.356520	Example 8.5a void Plus2	-0.124939
-0.531463	2;} int a; Plus2	-0.124939
-1.482520	= a - a*0	-0.124939
-1.352904	a - n.a. a*0	-0.124939
-1.399118	= 0 - a*1	-0.124939
-1.358230	0 - n.a. a*1	-0.124939
-0.330862	vector classes #include "vectorclass.h"	-0.124939
-0.330862	CPU dispatching #include "vectorclass.h"	-0.124939
-0.444445	1024; int a[size], b[size],	-0.124939
-0.550901	i; float a[size], b[size],	-0.124939
-0.353284	Polynomial coefficients double Table[100];	-0.124939
-0.353284	= 3.3; double Table[100];	-0.124939
-0.269251	The following sections describe	-0.124939
-0.269251	The subsequent sections describe	-0.124939
-0.557739	still uses a GOT.	-0.124939
-0.358893	the PLT and GOT.	-0.124939
-0.527176	Dynamic cast The dynamic_cast	-0.124939
-0.357106	This check makes dynamic_cast	-0.124939
-0.294289	the program. 3 Finding	-0.124939
-0.294289	language...................................................... 14 3 Finding	-0.124939
-0.023527	2, 6, 24, 120,	-0.425969
-0.818340	and one for uninitialized	-0.124939
-1.381225	if they are uninitialized	-0.124939
-0.237959	(See page 81). 77	-0.124939
-0.237959	by compiler ....................................................................... 77	-0.124939
-1.292495	length of a string.	-0.124939
-0.237959	a false vendor string.	-0.124939
-2.018533	- - x 74	-0.124939
-0.237959	of different compilers............................................................................. 74	-0.124939
-0.358341	to C1::f } 73	-0.124939
-0.594082	required. See page 73	-0.124939
-0.314795	CChild1 Object1; CChild2 Object2;	-0.124939
-0.294275	C1 Object1; C2 Object2;	-0.124939
-1.349095	bigger than the destination	-0.124939
-0.358893	that source and destination	-0.124939
-0.431686	by a table lookup:	-0.425969
-0.358893	page 73 and 72	-0.124939
-0.350874	b + a; 72	-0.124939
-0.356275	if (Day & (Tuesday	-0.124939
-0.497679	efficiency. The expression (Tuesday	-0.124939
-0.065526	int a:4; int b:2;	-0.425969
-0.129289	string[100], *p = string;	-0.124939
-0.726593	is required for putting	-0.124939
-0.463330	to 2 by putting	-0.124939
-0.358662	example 14.14a with 14.14b	-0.124939
-1.573080	to: // Example 14.14b	-0.124939
-1.342115	are accessed in non-	-0.124939
-0.540786	mechanism relies on non-	-0.124939
-0.999595	example 15.1b to 15.1c.	-0.124939
-0.599073	reorganize: // Example 15.1c.	-0.124939
-0.549555	to (see page 73).	-0.124939
-0.549555	precision (see page 73).	-0.124939
-0.846597	and position-independent code .......................................................	-0.124939
-0.646272	or 3-dimensional vectors .......................................................	-0.124939
-0.294275	overhead of semaphores, mutexes,	-0.124939
-0.294275	allocated memory, windows, mutexes,	-0.124939
-0.505036	of register is volatile.	-0.124939
-0.532930	must be declared volatile.	-0.124939
-0.358416	Windows DLLs use relocation.	-0.124939
-0.785389	addresses that need relocation.	-0.124939
-0.134279	= (a&b) | (~a&c)	-0.124939
-0.134279	0, (a&b) | (~a&c)	-0.124939
-1.687151	there is a 90%	-0.124939
-0.535477	spot that uses 90%	-0.124939
-0.358556	<int m> int MultiplyBy	-0.124939
-0.358122	template parameter. If MultiplyBy	-0.124939
-0.593761	algorithms, are not suited	-0.124939
-0.500606	language is best suited	-0.124939
-0.053307	Intel Architecture Software Developer’s	-0.425969
-0.023527	a2, b1, b2, y1,	-0.124939
-0.599073	reciprocal: // Example 14.14a	-0.124939
-1.328195	code in example 14.14a	-0.124939
-0.356650	reinstalled and user settings	-0.124939
-0.294275	different system color settings	-0.124939
-1.188896	of the function. Compile	-0.124939
-0.339510	the following: 130 Compile	-0.124939
-0.356047	_mm_clflush intrinsic function. Provoke	-0.124939
-0.682749	memory to disk. Provoke	-0.124939
-0.577351	pointer in an import	-0.124939
-0.460875	goes through an import	-0.124939
-0.846419	a memory block turns	-0.124939
-0.341889	if the prediction turns	-0.124939
-0.358893	0x3700, 0x3F00 and 0x4700.	-0.124939
-0.540268	we read from 0x4700.	-0.124939
-0.314795	--- - ----- x----	-0.124939
-0.294275	- ----- x---- x----	-0.124939
-0.102884	all applications. 2.8 Overcoming	-0.124939
-0.102884	framework........................................................................... 14 2.8 Overcoming	-0.124939
-0.294275	0 a+0=a a*0=0 a*1=a	-0.124939
-0.237959	x-xxxxxx- a*0=0 --xxxx-xx a*1=a	-0.124939
-0.331940	efficient than necessary. Take	-0.124939
-0.408211	the new features. Take	-0.124939
-0.853320	lot of data shuffling,	-0.124939
-0.237959	Extra data conversion, shuffling,	-0.124939
-0.556829	threads with different priorities	-0.124939
-0.355066	for assigning different priorities	-0.124939
-0.348679	Index out of range";	-0.124939
-0.353227	complicated because various corrections	-0.124939
-0.325437	have sent me corrections	-0.124939
-0.357021	but also less safe.	-0.124939
-0.355536	your program exception safe.	-0.124939
-1.994354	instruction set is supported.	-0.124939
-0.598744	precision is not supported.	-0.124939
-0.537919	expression or an anonymous	-0.124939
-0.537919	class into an anonymous	-0.124939
-0.596999	user-defined function is pure.	-0.124939
-0.895536	function to be pure.	-0.124939
-0.537193	more vector instructions SSE4.2	-0.124939
-0.294275	tmmintrin.h SSE4.1 smmintrin.h SSE4.2	-0.124939
-0.357325	= __rdtsc(); return clock;	-0.124939
-0.568920	DontSkip; long long clock;	-0.124939
-1.295245	loop in example 12.4a	-0.124939
-0.353551	situations like example 12.4a	-0.124939
-0.141668	of different size matrices,	-0.124939
-0.141668	copying different size matrices,	-0.124939
-0.352408	(^) may give inconsistent	-0.124939
-0.352406	menu click becomes inconsistent	-0.124939
-0.237959	a = _mm_or_si128(c2, bc);	-0.124939
-0.237959	bc = _mm_andnot_si128(mask, bc);	-0.124939
-0.889755	optimization is to join	-0.124939
-0.659192	be better to join	-0.124939
-0.940395	is out of range.	-0.124939
-0.940395	Index out of range.	-0.124939
-0.065526	int b:2; int c:2;	-0.425969
-0.442071	Rounding is fast. Value	-0.124939
-0.408211	Truncation is slow. Value	-0.124939
-0.357952	grandparent class: class CGrandParent	-0.124939
-0.559003	CParent : public CGrandParent	-0.124939
-0.023527	c2 = _mm_add_epi16(c, two);	-0.425969
-2.634813	x x x --	-0.124939
-0.331940	- - xxxxxxxxx --	-0.124939
-0.358940	brand check is bypassed	-0.124939
-0.900208	mechanism can be bypassed	-0.124939
-0.355694	loading of several drivers,	-0.124939
-0.237959	System programming Device drivers,	-0.124939
-0.463653	the other is -0	-0.124939
-0.463386	is negative or -0	-0.124939
-0.743538	on a graphics accelerator	-0.124939
-0.326986	coprocessor or graphics accelerator	-0.124939
-0.347521	optimizing database access. 3.10	-0.124939
-0.331944	databases ....................................................................................................... 21 3.10	-0.124939
-0.331944	Graphics ................................................................................................................. 21 3.11	-0.124939
-0.408211	one is best. 3.11	-0.124939
-0.588446	solution, but it increases	-0.124939
-0.722962	A hash table increases	-0.124939
-0.237969	access ...................................................................................................... 21 3.13	-0.124939
-0.237969	heavily loaded. 21 3.13	-0.124939
-0.331940	Memory access....................................................................................................... 22 3.14	-0.124939
-0.314795	about memory caching. 3.14	-0.124939
-0.336338	with multiple cores. 3.15	-0.124939
-0.331940	Context switches..................................................................................................... 22 3.15	-0.124939
-0.516109	a dependency chain. 3.16	-0.124939
-0.331940	chains ................................................................................................ 22 3.16	-0.124939
-0.358238	Some compilers make Sum1	-0.124939
-0.355149	the three functions. Sum1	-0.124939
-0.023527	= (a+b)+(c+d) a*b+a*c=a*(b+c) a*x*x*x	-0.425969
-0.093318	0) *(p++) |= 0x20;	-0.124939
-0.093318	i--) *(p++) |= 0x20;	-0.124939
-0.871299	be divisible by TILESIZE	-0.124939
-0.590965	squares: const int TILESIZE	-0.124939
-0.357423	can reduce any expression,	-0.124939
-0.352972	in an && expression,	-0.124939
-0.065444	the biggest time consumers	-0.124939
-0.600953	counter in the CPU,	-0.124939
-0.705771	with a slow CPU,	-0.124939
-0.525461	p; p = &Object1;	-0.124939
-0.357687	p1; p1 = &Object1;	-0.124939
-0.655315	in embedded systems .............................................................................	-0.124939
-1.054596	the compiler does .............................................................................	-0.124939
-0.048403	} // Approximate exp(x)	-0.124939
-0.048403	n+1; // Approximate exp(x)	-0.124939
-0.241747	the time of programming.	-0.124939
-0.358589	= ReadTSC() - time1;	-0.124939
-0.568920	i; long long time1;	-0.124939
-0.337062	interrupts at certain events,	-0.124939
-0.337062	to count certain events,	-0.124939
-0.590856	intensive program is achieved	-0.124939
-0.504866	portability could be achieved	-0.124939
-0.330862	with SSE2 #include <emmintrin.h>	-0.124939
-0.330862	x64 141 #include <emmintrin.h>	-0.124939
-0.341892	for all applications. 2.8	-0.124939
-0.336338	interface framework........................................................................... 14 2.8	-0.124939
-0.563072	cannot find the answers	-0.124939
-0.824616	you can get answers	-0.124939
-1.098883	The cost of starting	-0.124939
-0.314803	programming language Before starting	-0.124939
-0.247902	stack also has disadvantages:	-0.124939
-0.247902	unrolling also has disadvantages:	-0.124939
-0.343763	microprocessor ........................................................................................... 6 2.3	-0.124939
-0.607477	of this manual. 2.3	-0.124939
-1.135767	loop control branch ahead	-0.124939
-1.190887	the loop counter ahead	-0.124939
-0.893027	critical function is inserted	-0.124939
-0.540507	Here, we have inserted	-0.124939
-1.104820	the data cache. 2.2	-0.124939
-0.350868	platform ....................................................................................... 5 2.2	-0.124939
-0.382907	vectors) /arch:SSE -msse /arch:SSE	-0.124939
-0.237959	bit float vectors) /arch:SSE	-0.124939
-0.646270	the optimal platform 2.1	-0.124939
-0.350868	platform ........................................................................................... 5 2.1	-0.124939
-1.178094	= b + 2.0	-0.124939
-0.357224	<= u.f < 2.0	-0.124939
-0.023527	120, 720, 5040, 40320,	-0.425969
-0.356741	Example 9.1a int Func(int);	-0.124939
-0.356741	Example 9.1b int Func(int);	-0.124939
-0.358351	the threads will invalidate	-0.124939
-0.237959	you may actively invalidate	-0.124939
-0.358219	accessed sequentially. The opposite	-0.124939
-0.358219	used most. The opposite	-0.124939
-0.358899	risk factor in itself,	-0.124939
-0.455349	of the framework itself,	-0.124939
-0.341892	function libraries........................................................................................ 12 2.7	-0.124939
-0.237959	names are undocumented. 2.7	-0.124939
-0.578337	(y) { int a[1000];	-0.124939
-0.356741	{ 89 int a[1000];	-0.124939
-0.351716	compiler .................................................................................................... 10 2.6	-0.124939
-0.351314	with another compiler. 2.6	-0.124939
-0.358686	Intensive Codes", by S.	-0.124939
-0.237959	vector processors. Henry S.	-0.124939
-0.545742	in a thread environment	-0.124939
-0.353456	The integrated development environment	-0.124939
-1.749543	} else { F2(b);	-0.124939
-0.382907	{ float b[1000]; F2(b);	-0.124939
-0.592839	efficient because it handles	-0.124939
-0.576779	how the microprocessor handles	-0.124939
-0.639045	whole program optimization. 2.4	-0.124939
-0.343763	operating system......................................................................................... 6 2.4	-0.124939
-1.628867	is important to note	-0.124939
-0.294275	subsequent manuals. Please note	-0.124939
-0.651898	may consider whether others	-0.124939
-0.325437	are optimized well, others	-0.124939
-0.355767	to fit specific needs.	-0.124939
-0.473725	satisfies the user's needs.	-0.124939
-0.355244	ebx, eax / sar	-0.124939
-0.354662	mov shr add sar	-0.124939
-0.048403	module __attribute__ ((visibility ("internal")))	-0.124939
-0.048403	("internal"))) __attribute__ ((visibility ("internal")))	-0.124939
-2.159050	Example: // Example 8.15a	-0.124939
-0.592946	b in example 8.15a	-0.124939
-0.854246	that the memory footprint	-0.124939
-0.355451	a larger memory footprint	-0.124939
-0.358893	example 14.12b and 14.13b	-0.124939
-1.070283	with: // Example 14.13b	-0.124939
-0.358730	of scope or namespaces.	-0.124939
-0.833140	speed to using namespaces.	-0.124939
-1.298082	is useful for preventing	-0.124939
-0.237959	copying without effectively preventing	-0.124939
-0.657478	#include "asmlib.h" // Lowest	-0.124939
-0.357733	= &CriticalFunction_Dispatch; // Lowest	-0.124939
-0.237959	Wednesday, Thursday, Friday, Saturday	-0.124939
-0.237959	Friday = 0x20, Saturday	-0.124939
-0.023527	a+b+c+d = (a+b)+(c+d) a*b+a*c=a*(b+c)	-0.425969
-0.023527	platforms, different screen resolutions,	-0.124939
-0.336338	int is 4. So	-0.124939
-0.237959	questions from everybody. So	-0.124939
-0.594266	array. // Example 9.6a	-0.124939
-0.653100	per element Example 9.6a	-0.124939
-0.143015	- a*b+a*c = a*(b+c)	-0.124939
-0.143015	n.a. a*b+a*c = a*(b+c)	-0.124939
-0.349799	not reproducible. Such events	-0.124939
-0.341892	caused by random events	-0.124939
-0.358686	not affected by __fastcall.	-0.124939
-1.339590	you are using __fastcall.	-0.124939
-1.399118	= 0 - a+0	-0.124939
-1.358230	0 - n.a. a+0	-0.124939
-0.339503	expected real-time speed. Delays	-0.124939
-0.237959	mask. Poor reproducibility. Delays	-0.124939
-0.294275	ISO/IEC TR18015 Technical Report	-0.124939
-0.237959	TR 18015, "Technical Report	-0.124939
-0.659495	{ aa[i] = (bb[i]	-0.124939
-0.354665	+ 2) : (bb[i]	-0.124939
-1.994354	instruction set is specified.	-0.124939
-0.597372	appropriate instruction set specified.	-0.124939
-1.281315	to a function prototype	-0.124939
-0.573541	parm2); // Function prototype	-0.124939
-0.589682	example on page 39	-0.124939
-0.237959	< columns; j++) 39	-0.124939
-0.593889	automatically. For example, let's	-0.124939
-0.237959	explain the difference, let's	-0.124939
-0.065617	Weekdays Day; if (Day	-0.124939
-1.618990	may not be visible	-0.124939
-0.598744	alignment is not visible	-0.124939
-0.356932	8 - 64 Kbytes	-0.124939
-0.353455	cache of 256 Kbytes	-0.124939
-0.599581	cache is a proxy	-0.124939
-0.504912	a computer. The proxy	-0.124939
-0.358341	temp; 104 } Microprocessors	-0.124939
-0.710973	Explicit cache control Microprocessors	-0.124939
-1.613864	explained on page 105.	-0.124939
-0.457064	operations, see page 105.	-0.124939
-0.356051	has been accessed recently	-0.124939
-0.494411	chooses the least recently	-0.124939
-0.541250	large cost to creating	-0.124939
-0.358848	is responsible for creating	-0.124939
-0.065647	{ j = order(i);	-0.124939
-0.462229	the member pointer refers	-0.124939
-0.341896	parallel. Coarse-grained parallelism refers	-0.124939
-0.597259	file to a floppy	-0.124939
-0.595324	media such as floppy	-0.124939
-0.957775	no risk of underflow.	-0.124939
-0.562961	on overflow and underflow.	-0.124939
-0.325431	Function pointers ...................................................................................................... 37	-0.124939
-0.294275	definitely be avoided. 37	-0.124939
-0.358899	pointer known in 36	-0.124939
-0.294275	and references ............................................................................................ 36	-0.124939
-0.345537	| ((C & 3)	-0.124939
-0.345537	| ((B & 3)	-0.124939
-0.503129	possibility that such contrived	-0.124939
-0.582201	indeed a very contrived	-0.124939
-0.336338	number of branches. Manual	-0.124939
-0.538969	available from www.intel.com. Manual	-0.124939
-0.502422	more cache space. Excessive	-0.124939
-0.237959	unroll too much. Excessive	-0.124939
-0.356520	Example 7.12 void FuncA	-0.124939
-0.237959	and calls alternately FuncA	-0.124939
-0.358893	as email and web	-0.124939
-0.294275	development, database integration, web	-0.124939
-0.504804	no overflow can occur,	-0.124939
-0.355846	the error doesn't occur,	-0.124939
-0.591231	actually able to reorder	-0.124939
-0.587646	A compiler may reorder	-0.124939
-0.358730	than seconds or microseconds	-0.124939
-0.502228	critical functions take microseconds	-0.124939
-0.596873	containers is the Standard	-0.124939
-0.575689	relates to security. Standard	-0.124939
-0.901478	effect of the const_cast	-0.124939
-0.527176	Const cast The const_cast	-0.124939
-0.345262	hardly ever used, though.	-0.124939
-0.237959	not always optimal, though.	-0.124939
-0.237974	or int64_t MS compiler:	-0.124939
-0.237974	or uint64_t MS compiler:	-0.124939
-0.048403	x[]); void F3(bool y)	-0.124939
-0.048403	9.2b void F3(bool y)	-0.124939
-0.294275	-m32 -m64 -static /MT	-0.124939
-0.237959	(multithreaded) /arch:AVX /openmp /MT	-0.124939
-0.583198	local object is overwritten,	-0.124939
-0.895536	variables to be overwritten,	-0.124939
-0.418252	ecx, DWORD PTR [esp+8]	-0.124939
-0.418252	[esp+4] DWORD PTR [esp+8]	-0.124939
-0.600617	time can be annoyingly	-0.124939
-0.352408	this would give annoyingly	-0.124939
-1.163314	int i; float list[size];	-0.124939
-0.349148	= 100; S1 list[size];	-0.124939
-0.325431	License, optional commercial license	-0.124939
-0.294275	no yes License license	-0.124939
-0.237959	(b1 * b2); y1	-0.124939
-0.237959	b2, y1, y2; y1	-0.124939
-0.408211	b2 * reciprocal_divisor; y2	-0.124939
-0.237959	a1 / b1; y2	-0.124939
-0.131775	two elements: #define swapd(x,y)	-0.124939
-0.131775	array elements: #define swapd(x,y)	-0.124939
-0.282429	i; if ((unsigned int)i	-0.124939
-0.282429	14.4b if ((unsigned int)i	-0.124939
-0.143005	SSE2 version int CriticalFunction_SSE2(int	-0.425969
-0.350582	frequency is 2 GHz	-0.124939
-0.350582	on a 2 GHz	-0.124939
-0.764580	if a is false.	-0.124939
-0.358393	all 0's when false.	-0.124939
-0.351716	for Windows. 10 Multithreading	-0.124939
-0.314795	is necessary. 101 Multithreading	-0.124939
-0.417003	for Intel CPUs. New	-0.124939
-0.417003	for AMD CPUs. New	-0.124939
-0.141165	function. typeof(CriticalFunction) * CriticalFunctionDispatch(void)	-0.124939
-0.141165	("CriticalFunction"); typeof(CriticalFunction) * CriticalFunctionDispatch(void)	-0.124939
-0.501824	examples for these methods.	-0.124939
-0.559827	inappropriate CPU dispatch methods.	-0.124939
-0.358539	a genuine compiler became	-0.124939
-0.336338	for Basic soon became	-0.124939
-0.540069	if, and only if,	-0.124939
-0.575234	caching is advantageous if,	-0.124939
-0.325437	of end user's computers.	-0.124939
-0.382907	yesterday's big mainframe computers.	-0.124939
-0.607477	A, B, C; x.abc	-0.124939
-0.237959	// Example 7.40c x.abc	-0.124939
-0.048403	...................................................................................... 156 16.3 Worst-case	-0.124939
-0.048403	large. 156 16.3 Worst-case	-0.124939
-0.343760	two simple expressions. Operations	-0.124939
-0.325431	alignment and aliasing. Operations	-0.124939
-0.356832	includes the libraries named	-0.124939
-0.356688	to 256-bit registers named	-0.124939
-0.294275	---xxx-x- a+0=a x-xxxxxx- a*0=0	-0.124939
-0.294275	= 0 a+0=a a*0=0	-0.124939
-0.358893	in p1 and p2	-0.124939
-0.237959	CChild2 * p2; p2	-0.124939
-0.463601	is contained in p1	-0.124939
-0.237959	CChild1 * p1; p1	-0.124939
-0.573773	exist for all major	-0.124939
-0.841464	supported on all major	-0.124939
-0.526534	application programs use internet	-0.124939
-0.336334	Internet forums Several internet	-0.124939
-0.351650	c;}; abc * p;	-0.124939
-0.351650	Object2; CHello * p;	-0.124939
-0.902365	static inline int lrintf	-0.124939
-0.854652	with the functions lrintf	-0.124939
-1.715928	so that the resulting	-0.124939
-1.496573	} } The resulting	-0.124939
-0.102884	transpose function swapd(a[r][c], a[c][r]);	-0.124939
-0.102884	below diagonal swapd(a[r][c], a[c][r]);	-0.124939
-0.625638	for high precision math.	-0.124939
-0.341474	set. High precision math.	-0.124939
-0.463403	/ 4 = 2048	-0.124939
-0.532493	38.7 512 512 2048	-0.124939
-0.657558	= d + 3.5;	-0.124939
-1.423642	= b * 3.5;	-0.124939
-0.358852	use relocation. The DLLs	-0.124939
-0.355915	current position. Windows DLLs	-0.124939
-0.577502	PathScale compiler for Unix	-0.124939
-0.357631	in registers. 64-bit Unix	-0.124939
-0.564441	Vectorized table lookup Lookup	-0.124939
-0.444421	the table lookup. Lookup	-0.124939
-1.083939	the template parameters differ	-0.124939
-0.331944	libraries and drivers differ	-0.124939
-0.065647	int level = InstructionSet();	-0.425969
-0.526437	~C1(); }; void F1()	-0.124939
-0.346724	function prototype: void F1()	-0.124939
-0.358598	less safe. This safety	-0.124939
-0.331944	that doesn't compromise safety	-0.124939
-0.358931	Two libraries of predefined	-0.124939
-0.355842	intrinsic functions Use predefined	-0.124939
-1.810703	to make a variable-size	-0.124939
-0.542361	is to allocate variable-size	-0.124939
-0.356275	p = & obj1;	-0.124939
-0.478063	g() { C1 obj1;	-0.124939
-0.023527	of Numerically Intensive Codes",	-0.124939
-0.816297	These instructions are summarized	-0.124939
-0.762240	The results are summarized	-0.124939
-0.358940	different targets is small.	-0.124939
-0.713015	to be too small.	-0.124939
-0.751285	and error handling ................................................................................	-0.124939
-0.538969	biggest time consumers ................................................................................	-0.124939
-0.582030	data from a buffer.	-0.124939
-0.945182	the branch target buffer.	-0.124939
-0.202035	= 100; float list[size],	-0.124939
-0.062130	for InstructionSet() #include "asmlib.h"	-0.425969
-0.358931	all occurrences of ArraySize	-0.124939
-0.590965	constant const int ArraySize	-0.124939
-0.357668	Register variables, float Live	-0.124939
-0.444421	from register storage. Live	-0.124939
-0.237959	= _mm_blendv_epi8(bc, c2, mask);	-0.124939
-0.237959	c2 = _mm_and_si128(c2, mask);	-0.124939
-0.358662	have names with suffixes	-0.124939
-0.355692	for AVX. These suffixes	-0.124939
-0.597568	manually by the programmer.	-0.124939
-0.838688	of the application programmer.	-0.124939
-0.294275	a-(-b)=a+b ---xxx-x- a+0=a x-xxxxxx-	-0.124939
-0.237959	- - x-xx----x x-xxxxxx-	-0.124939
-0.463647	been given a name.	-0.124939
-1.485097	with the same name.	-0.124939
-0.599196	layer of a third-party	-0.124939
-1.205280	There are also third-party	-0.124939
-0.538969	a*b = b*a (a+b)+c=a+(b+c)	-0.124939
-0.237959	a+b=b+a a*b=b*a a+b+c=a+(b+c) (a+b)+c=a+(b+c)	-0.124939
-1.009078	many functions for audio	-0.124939
-0.237959	that produce streaming audio	-0.124939
-0.358625	accept expressions as arguments	-0.124939
-1.485097	with the same arguments	-0.124939
-0.570124	would be an infinite	-0.124939
-0.435087	only for avoiding infinite	-0.124939
-1.447517	in the program flow.	-0.124939
-0.535679	cases of program flow.	-0.124939
-0.356569	overwritten, and even worse,	-0.124939
-0.339500	Pentium 4. Even worse,	-0.124939
-0.862037	because the cache miss	-0.124939
-0.839184	a level-2 cache miss	-0.124939
-0.599819	permissible if the unsafe	-0.124939
-0.358940	and memcpy is unsafe	-0.124939
-0.480155	expression is optimized away.	-0.124939
-0.341023	inlined, or optimized away.	-0.124939
-1.049033	for calculating the movements	-0.124939
-0.331944	calculating the physical movements	-0.124939
-0.249185	n.a. x*x*x*x*x*x*x*x = ((x2)	-0.124939
-0.358132	((a*x+b)*x+c)*x+d x*x*x*x*x*x*x*x = ((x2)	-0.124939
-0.358929	malloc. Handles to windows,	-0.124939
-0.621847	dynamically allocated memory, windows,	-0.124939
-0.358929	immediate response to pressing	-0.124939
-0.353669	simple tasks like pressing	-0.124939
-0.442071	as vector register. Factors	-0.124939
-0.339503	advantageous vectorization is. Factors	-0.124939
-0.595324	considerations such as price,	-0.124939
-0.555049	at a high price,	-0.124939
-0.429573	1.; Eliminate jumps Jumps	-0.124939
-0.325431	is not optimized. Jumps	-0.124939
-0.504327	verifying, debugging and maintaining	-0.124939
-0.358434	testing, verifying and maintaining	-0.124939
-0.314795	evaluate both operands. Nevertheless,	-0.124939
-0.382907	in a PC. Nevertheless,	-0.124939
-0.358893	input/output Graphics and sound	-0.124939
-0.336338	are image processing, sound	-0.124939
-0.358893	network resources and servers	-0.124939
-0.358643	is useful on servers	-0.124939
-0.142242	or a make utility.	-0.124939
-0.142242	as a make utility.	-0.124939
-2.006660	version of the executable.	-0.124939
-0.599138	into the same executable.	-0.124939
-0.542298	times cannot be controlled.	-0.124939
-0.542298	resources cannot be controlled.	-0.124939
-0.522638	or the specific literature	-0.124939
-0.481349	consult the general literature	-0.124939
-0.203704	int SIZE = 512;	-0.425969
-0.356047	to take extra precautions	-0.124939
-0.349140	to take special precautions	-0.124939
-0.638325	that there are smarter	-0.425969
-0.102884	Intensive Codes", SIAM 2001.	-0.124939
-0.102884	A. Hoisie, SIAM 2001.	-0.124939
-0.538969	(see page 73). Current	-0.124939
-0.237959	are called accumulators. Current	-0.124939
-0.065769	optimization effort is concentrated	-0.425969
-1.108497	size; i++) { aa[i]	-0.124939
-0.534350	256; i++) { aa[i]	-0.124939
-0.358625	// Return a null	-0.124939
-0.463252	by returning a null	-0.124939
-1.225975	Intel compiler is capable	-0.124939
-0.835512	Modern CPUs are capable	-0.124939
-0.358341	{ FuncB(i); } FuncC(i);	-0.124939
-0.382907	2) { FuncA(i); FuncC(i);	-0.124939
-0.527234	security reason for updating.	-0.124939
-0.237959	necessary support. Hardware updating.	-0.124939
-0.358893	MOVNTPS, MOVNTPD and MOVNTDQ	-0.124939
-1.052979	bytes without cache MOVNTDQ	-0.124939
-0.504912	in memory. The renaming	-0.124939
-0.538227	capable of register renaming	-0.124939
-0.348399	2 GB. When considering	-0.124939
-0.331954	Another alternative worth considering	-0.124939
-1.434003	Therefore, it is worthwhile	-0.124939
-1.530673	It may be worthwhile	-0.124939
-0.294275	xxxxxxx-x xxxxxxxxx x-xxx---- a-(-b)=a+b	-0.124939
-0.237959	a+a+a+a=a*4 -(-a)=a --xxxxxx- a-(-b)=a+b	-0.124939
-0.387195	public: virtual void f();	-0.425969
-0.522295	object is allocated separately.	-0.124939
-0.339500	should be measured separately.	-0.124939
-0.356523	for regular access patterns	-0.124939
-0.325431	arranged in regular patterns	-0.124939
-0.543551	classes on page 93.	-0.124939
-1.319295	explained on page 93.	-0.124939
-0.577633	in only the lowest	-0.124939
-0.237959	&SelectAddMul_SSE2; // Error: lowest	-0.124939
-0.351318	#include <math.h> #define EXCEPTION_FLT_OVERFLOW	-0.124939
-0.349149	__except (GetExceptionCode() == EXCEPTION_FLT_OVERFLOW	-0.124939
-0.358936	has defined a constructor,	-0.124939
-0.498877	pointer. The copy constructor,	-0.124939
-0.237959	32-bit Windows, Intel/MASM syntax:	-0.124939
-0.237959	32-bit Linux, Gnu/AT&T syntax:	-0.124939
-0.724765	= b * 1.2;	-0.425969
-1.613864	explained on page 26.	-0.124939
-0.587729	storage. See page 26.	-0.124939
-0.726896	to set the parentheses	-0.124939
-0.577286	Now the two parentheses	-0.124939
-0.500273	away an overflow check.	-0.124939
-0.350366	little more syntax check.	-0.124939
-1.228505	a series of experiments	-0.124939
-1.177463	necessary to do experiments	-0.124939
-0.653867	of order execution .................................................................................................	-0.124939
-0.688040	Use lookup tables .................................................................................................	-0.124939
-0.023527	manual 4: "Instruction tables".	-0.124939
-0.355244	eax, 100 / jl	-0.124939
-0.408211	mov add cmp jl	-0.124939
-0.953148	to the total computation	-0.124939
-0.382907	that the overall computation	-0.124939
-0.574548	compilers can make thread-local	-0.124939
-0.348405	global variables. (See thread-local	-0.124939
-0.483531	not up to date.	-0.124939
-0.483531	dispatchers up to date.	-0.124939
-0.877533	also have a physics	-0.124939
-0.490452	have a dedicated physics	-0.124939
-0.571078	a+b is calculated first,	-0.124939
-0.354529	the R values first,	-0.124939
-0.343780	a register (see below)	-0.124939
-0.343780	cycle counter (see below)	-0.124939
-0.550780	if branch is eliminated.	-0.124939
-0.358812	parameter transfer are eliminated.	-0.124939
-1.580806	eight consecutive elements c.load(cc+i);	-0.124939
-0.294275	16) { b.load(bb+i); c.load(cc+i);	-0.124939
-0.237959	algebra reductions: !(!a)=a x-xxxxxxx	-0.124939
-0.237959	x-xx----x x-xxxxxx- x-xxxx-x- x-xxxxxxx	-0.124939
-0.507298	on the CPU. Unrolling	-0.124939
-0.237959	FuncB, then FuncC. Unrolling	-0.124939
-0.504072	the processor has hyperthreading.	-0.124939
-0.833140	advantage to using hyperthreading.	-0.124939
-0.294275	kilobyte is 1024 bytes,	-0.124939
-0.294275	kb = 8192 bytes,	-0.124939
-0.105177	static link libraries (*.lib,	-0.425969
-1.749132	a lot of irrelevant	-0.124939
-1.669098	likely to be irrelevant	-0.124939
-0.586494	you must be careful	-0.124939
-0.353464	code still needs careful	-0.124939
-0.355338	signal processing, data compression	-0.124939
-0.355338	Encryption, decryption, data compression	-0.124939
-0.358427	resolution if time intervals	-0.124939
-0.408211	time at unpredictable intervals	-0.124939
-0.165070	r2++) { for (c2	-0.425969
-0.358559	user expects an immediate	-0.124939
-0.408211	The user expects immediate	-0.124939
-0.478063	() { C1 Object1;	-0.124939
-0.325437	() { CChild1 Object1;	-0.124939
-0.314795	operations (addition, multiplication, etc.)	-0.124939
-0.237959	Intel-based Mac OS, etc.)	-0.124939
-0.491866	Linux, 32-bit and 64-bit.	-0.124939
-0.491866	X, 32-bit and 64-bit.	-0.124939
-0.877335	0; } The indirect	-0.124939
-0.237959	feature called "Gnu indirect	-0.124939
-0.237959	xxxxxxxxx 0/a=0 ---x---xx (-a==-b)=(a==b)	-0.124939
-0.237959	x-xxx-x-- 0/a=0 ---xx--xx (-a==-b)=(a==b)	-0.124939
-0.958008	the microprocessor has hyperthreading,	-0.124939
-0.833140	advantage to using hyperthreading,	-0.124939
-0.596714	eee is the exponent,	-0.124939
-0.358829	sign bit, the exponent,	-0.124939
-1.071379	== 0) { FuncA(i);	-0.124939
-0.876386	+= 2) { FuncA(i);	-0.124939
-0.557797	portability. Unfortunately, the cross-platform	-0.124939
-1.486480	the sake of cross-platform	-0.124939
-0.023527	elements: #define swapd(x,y) {temp=x;	-0.425969
-0.355338	This is data decomposition.	-0.124939
-0.570683	decomposition and data decomposition.	-0.124939
-0.456759	and a template parameter:	-0.124939
-0.649105	as a template parameter:	-0.124939
-0.358111	a profiler which determines	-0.124939
-0.886685	the first operand determines	-0.124939
-0.172306	Class data members (properties)	-0.124939
-0.881176	static const int ABC	-0.124939
-0.351318	For example, #define ABC	-0.124939
-1.076765	Most of the comments	-0.124939
-0.582139	make a few comments	-0.124939
-0.429579	38 7.10 Arrays .....................................................................................................................	-0.124939
-0.408211	160 19 Literature .....................................................................................................................	-0.124939
-1.372645	whether it is profitable	-0.124939
-0.895536	appears to be profitable	-0.124939
-0.444428	explains the logic behind	-0.124939
-0.314795	is actually hidden behind	-0.124939
-0.382907	By Agner Fog. Technical	-0.124939
-0.237959	See ISO/IEC TR18015 Technical	-0.124939
-0.579208	same instruction set. Neither	-0.124939
-0.237959	than an hour. Neither	-0.124939
-0.587674	optimal for each calculation.	-0.124939
-0.575194	start the next calculation.	-0.124939
-0.237972	is pure __attribute(( const))	-0.124939
-0.237972	__attribute(( const)) __attribute(( const))	-0.124939
-0.550788	and easier to test,	-0.124939
-0.883761	the program under test,	-0.124939
-0.102884	-ffunction- sections /Gy -ffunction-	-0.124939
-0.102884	ced functions) /Gy -ffunction-	-0.124939
-0.102884	/arch:SSE -msse /arch:SSE -msse	-0.124939
-0.102884	float vectors) /arch:SSE -msse	-0.124939
-0.462118	= 0; // Initialize	-0.124939
-0.357733	// Constructor // Initialize	-0.124939
-0.357512	sizes and array indices	-0.124939
-0.350376	identified by consecutive indices	-0.124939
-0.527024	+= 4) { s0	-0.124939
-0.657348	float a[100]; float s0	-0.124939
-0.408211	Generate assembly listing /FA	-0.124939
-0.237959	-S - masm=intel /FA	-0.124939
-0.541098	new version for marketing	-0.124939
-0.341892	is no heavy marketing	-0.124939
-0.562955	CriticalFunctionType(int parm1, int parm2);	-0.124939
-0.237959	version return (*CriticalFunction)(parm1, parm2);	-0.124939
-0.294275	a*b=b*a a+b+c=a+(b+c) (a+b)+c=a+(b+c) --xx-----	-0.124939
-0.294275	(x) x-xx--xx- x--x----- --xx-----	-0.124939
-0.203704	int rows = 20,	-0.425969
-0.505088	that reflects the conflicting	-0.124939
-0.570565	requirements are often conflicting	-0.124939
-1.408611	0; i < 20;	-0.124939
-0.541295	backwards though the 61	-0.124939
-0.294275	error handling ................................................................................ 61	-0.124939
-0.358686	function uses by looking	-0.124939
-0.294275	details. The funny looking	-0.124939
-0.358662	more efficiently with coarse-grained	-0.124939
-0.656507	to distinguish between coarse-grained	-0.124939
-0.357130	the mirror elements matrix[c][r]	-0.124939
-0.653579	swapped with element matrix[c][r]	-0.124939
-0.294275	/QaxSSE3, etc. -msse3 -mssse3	-0.124939
-0.237959	/MT -msse3 /arch:SSE3 -mssse3	-0.124939
-1.241438	be useful to isolate	-0.124939
-0.358893	to identify and isolate	-0.124939
-0.460970	compiler-generated assembly code. Let	-0.124939
-0.345260	lines and sets. Let	-0.124939
-0.358428	the task in question.	-0.124939
-0.658865	the algorithm in question.	-0.124939
-0.726378	small x // x^n	-0.124939
-0.355140	// next four x^n	-0.124939
-0.358835	dispatch mechanism that treats	-0.124939
-0.567457	Intel CPU dispatcher treats	-0.124939
-0.064430	14 Specific optimization topics	-0.124939
-0.345260	a self-relative address. (3)	-0.124939
-0.237959	to wrap around, (3)	-0.124939
-0.527915	use a lookup table:	-0.124939
-0.375160	using a lookup table:	-0.124939
-0.659889	the network is unstable	-0.124939
-0.358812	that measurements are unstable	-0.124939
-0.502422	of CPU cores. 60	-0.124939
-0.237959	7.29 Threads .................................................................................................................. 60	-0.124939
-0.597644	this number of iterations.	-0.124939
-0.237959	expansions and Newton-Raphson iterations.	-0.124939
-0.575689	Live range analysis Join	-0.124939
-0.538969	same memory area. Join	-0.124939
-0.659495	{ aa[i] = bb[i]	-0.124939
-0.358393	all 1's when bb[i]	-0.124939
-0.594824	time then the sampling	-0.124939
-0.294275	exceptions, etc. Event-based sampling	-0.124939
-0.504760	formula: (set) = (memory	-0.124939
-0.966047	all the objects (memory	-0.124939
-0.463536	is necessary for verifying	-0.124939
-0.237959	of fine-tuning, testing, verifying	-0.124939
-0.339503	loading ....................................................................................................... 19 3.6	-0.124939
-0.237959	file, is acceptable. 3.6	-0.124939
-0.331944	installation .................................................................................................. 18 3.4	-0.124939
-0.314795	a standardized manner. 3.4	-0.124939
-0.497542	overflow: a[i] = log(b[i])	-0.124939
-0.497542	formula a[i] = log(b[i])	-0.124939
-0.314795	-100+100+100 = 100. Now,	-0.124939
-0.237959	(columns * sizeof(float)). Now,	-0.124939
-0.656625	((x2) 2) 2 a+a+a+a=a*4	-0.124939
-0.237959	x*x*x*x*x*x*x*x = ((x2)2)2 a+a+a+a=a*4	-0.124939
-0.550844	you are in doubt	-0.124939
-0.594800	execution is no doubt	-0.124939
-0.193099	aliasing (see page 78).	-0.124939
-0.358893	that u.f and v.f	-0.124939
-0.352704	// u.f > v.f	-0.124939
-0.382907	in a FIFO manner?	-0.124939
-0.237959	in a FILO manner?	-0.124939
-0.596498	mode rather than generating	-0.124939
-0.356608	in question without generating	-0.124939
-0.429573	got low priority. Especially	-0.124939
-0.331940	at round addresses. Especially	-0.124939
-0.787849	Choice of compiler ....................................................................................................	-0.124939
-0.635341	3.4 Automatic updates ....................................................................................................	-0.124939
-0.356475	cycle? ...................................................................................... 16 3.2	-0.124939
-0.237959	can be improved. 3.2	-0.124939
-0.658961	if (y) { F1(a);	-0.124939
-0.382907	{ int a[1000]; F1(a);	-0.124939
-0.356047	a single function. Switch	-0.124939
-0.429579	than two ways. Switch	-0.124939
-0.065647	__m128i zero = _mm_set1_epi16(0);	-0.425969
-0.846597	use position-independent code everywhere	-0.124939
-0.510604	branches are scattered everywhere	-0.124939
-0.357687	|| (a&&c) = a&&(b||c)	-0.124939
-0.657387	|| (a&&b&&c) = a&&(b||c)	-0.124939
-0.358744	| (b&c) = (a&b)	-0.124939
-0.555559	0 = 0, (a&b)	-0.124939
-0.356475	spots .................................................................................. 16 3.3	-0.124939
-0.237959	the following sections. 3.3	-0.124939
-0.356475	consumers ................................................................................ 16 3.1	-0.124939
-0.538969	biggest time consumers 3.1	-0.124939
-0.814593	branch that goes randomly	-0.124939
-0.736127	data are scattered randomly	-0.124939
-0.325431	arrays and structures. Useful	-0.124939
-0.237959	possible memory requirement. Useful	-0.124939
-0.346498	File access................................................................................................................ 20 3.8	-0.124939
-0.237959	operations to finish. 3.8	-0.124939
-0.299790	database ...................................................................................................... 20 3.9	-0.124939
-0.299790	(*.ini files). 20 3.9	-0.124939
-0.650495	How compilers optimize ............................................................................................	-0.124939
-0.858555	Pointers and references ............................................................................................	-0.124939
-0.775269	in a big mainframe	-0.124939
-0.343432	of yesterday's big mainframe	-0.124939
-0.764110	to test // (time	-0.124939
-0.358589	(time after) - (time	-0.124939
-0.348391	to software optimization. Everything	-0.124939
-0.341892	the same register. Everything	-0.124939
-0.463277	by 16 is required.	-0.124939
-0.358645	the strictness is required.	-0.124939
-1.023602	rule out the theoretical	-0.124939
-0.358852	efficient alternative. The theoretical	-0.124939
-0.358929	example 12.1b to 12.1a.	-0.124939
-2.159050	Example: // Example 12.1a.	-0.124939
-0.900879	data in the file,	-0.124939
-0.237959	as the .exe file,	-0.124939
-0.343774	to many hard working	-0.124939
-0.237959	by using indexes, working	-0.124939
-0.358852	to vectorize. The pragmas	-0.124939
-0.358625	optimization hints as pragmas	-0.124939
-0.358929	engineering principles to use.	-0.124939
-0.994607	is not in use.	-0.124939
-0.583210	slow, difficult to use,	-0.124939
-0.356959	rules about register use,	-0.124939
-0.357021	make vectorization less favorable:	-0.124939
-0.641781	that make vectorization favorable:	-0.124939
-0.356472	resolutions, different system color	-0.124939
-0.325437	square root, RGB color	-0.124939
-0.902963	critical stride is 8192	-0.124939
-0.358744	8 kb = 8192	-0.124939
-0.065647	__m128i mask = _mm_cmpgt_epi16(b,	-0.425969
-0.505036	older microprocessors is lost.	-0.124939
-0.358812	user settings are lost.	-0.124939
-0.102884	.................................................................................................................. 60 7.30 Exceptions	-0.124939
-0.102884	of multithreading. 7.30 Exceptions	-0.124939
-1.076765	out of the question	-0.124939
-0.726711	the numbers in question	-0.124939
-0.981730	a time and afterwards	-0.124939
-1.185144	If the program afterwards	-0.124939
-1.240893	the function returns. Every	-0.124939
-0.237959	not the columns. Every	-0.124939
-0.505088	addition, set the denormals-are-zero	-0.124939
-0.358893	Set flush-to-zero and denormals-are-zero	-0.124939
-1.282684	to the function declaration.	-0.124939
-0.571981	in the class declaration.	-0.124939
-0.562101	but no other exceptions:	-0.124939
-0.352711	to resume after exceptions:	-0.124939
-0.743697	and difficult to read.	-0.124939
-0.515092	code difficult to read.	-0.124939
-0.350865	oriented programming are: Non-static	-0.124939
-0.538969	only one instance. Non-static	-0.124939
-0.897382	form of a re-	-0.124939
-0.358730	allocating piecewise or re-	-0.124939
-0.562090	another dynamic library requiring	-0.124939
-0.352402	a complex framework requiring	-0.124939
-0.331944	7.13 struct abc {int	-0.124939
-0.294275	1024; struct Sab {int	-0.124939
-1.074423	disadvantage that the branching	-0.124939
-0.855173	critical function. The branching	-0.124939
-0.787489	function pointer has changed.	-0.124939
-0.795845	variable is never changed.	-0.124939
-0.588526	whether the object belongs	-0.124939
-0.325437	else. This normally belongs	-0.124939
-0.340385	--xx----- (a&&b) || (a&&c)	-0.124939
-0.340385	a&&b (a&&b) || (a&&c)	-0.124939
-0.590965	16.1 const int NumberOfTests	-0.124939
-0.237959	test // Repeat NumberOfTests	-0.124939
-0.463653	of platform is obviously	-0.124939
-0.593904	large then it obviously	-0.124939
-0.414402	table may go undetected.	-0.124939
-0.319781	would otherwise go undetected.	-0.124939
-0.467691	thread that runs alone	-0.124939
-0.237959	as a stand alone	-0.124939
-0.597387	indicated by the caller	-0.124939
-0.593558	runtime from the caller	-0.124939
-0.532192	to a better understanding	-0.124939
-0.237959	and a basic understanding	-0.124939
-0.463442	development process can influence	-0.124939
-0.570024	program. This has influence	-0.124939
-0.653439	b : c x-xx-----	-0.124939
-0.336334	- x-xxxx--x x-xxxx--x x-xx-----	-0.124939
-1.303467	function is called. Lazy	-0.124939
-0.314795	sometimes unacceptably long. Lazy	-0.124939
-0.358756	unused returns // Volatile	-0.124939
-0.545548	higher) is enabled. Volatile	-0.124939
-1.054610	may need to lock	-0.124939
-0.237959	than to temporarily lock	-0.124939
-0.463653	educational purposes is allowed.	-0.124939
-0.598744	mirroring is not allowed.	-0.124939
-0.886721	caching more efficient today	-0.124939
-0.356647	is brand new today	-0.124939
-0.318863	d; d = (double)(signed	-0.425969
-0.599866	faster in a programmable	-0.124939
-0.358361	logic devices A programmable	-0.124939
-0.461919	do have such checks.	-0.124939
-0.350366	or bypassing syntax checks.	-0.124939
-0.830852	and table lookup mechanisms	-0.124939
-0.237959	old version. Updating mechanisms	-0.124939
-0.555118	summarized in table 8.1.	-0.124939
-0.352973	n.a. - Table 8.1.	-0.124939
-0.589652	then all the G	-0.124939
-0.552605	vector, the four G	-0.124939
-0.783380	Intel compiler Linux Align	-0.124939
-0.336338	for Linux) 4. Align	-0.124939
-0.357773	for (b + c)	-0.124939
-0.535626	> b / c)	-0.124939
-0.527016	version CriticalFunction = &CriticalFunction_386;	-0.124939
-0.525551	Default version return &CriticalFunction_386;	-0.124939
-0.723811	the first two (three	-0.124939
-1.500884	on the stack (three	-0.124939
-1.050618	} else { goto	-0.124939
-1.171836	you are not testing.	-0.124939
-0.462842	feedback comes from testing.	-0.124939
-0.102884	need the "override" feature.	-0.124939
-0.102884	implement this "override" feature.	-0.124939
-0.102884	GOT. The symbol interposition	-0.124939
-0.102884	This so-called symbol interposition	-0.124939
-0.659679	initialization routine that loads	-0.124939
-0.725224	The application program loads	-0.124939
-0.023527	link libraries (*.lib, *.a)	-0.425969
-0.354781	keep their CPU dispatchers	-0.124939
-0.354781	properly. Many CPU dispatchers	-0.124939
-0.341892	up one register. Registers	-0.124939
-0.314795	pointer or reference. Registers	-0.124939
-0.355611	point exceptions, etc. Event-based	-0.124939
-0.237959	is less reliable. Event-based	-0.124939
-0.775887	all the problems associated	-0.124939
-0.452775	common programming errors associated	-0.124939
-1.330097	can be a time-consumer	-0.124939
-0.502422	is the biggest time-consumer	-0.124939
-1.266537	the CPU detection mechanism.	-0.124939
-0.502408	the branch prediction mechanism.	-0.124939
-0.140395	namespaces. 65 8 Optimizations	-0.124939
-0.140395	Namespaces........................................................................................................... 65 8 Optimizations	-0.124939
-0.463593	of devices and machines	-0.124939
-0.439064	The best Java machines	-0.124939
-0.537123	dealing with this problem:	-0.124939
-0.356232	remedies against this problem:	-0.124939
-0.354532	default constructors, copy constructors,	-0.124939
-0.343763	applies to default constructors,	-0.124939
-0.345257	when using references. References	-0.124939
-0.325431	a wrong type. References	-0.124939
-0.065690	instruction sets are mutually	-0.425969
-0.659258	is coded as _mm_empty()	-0.124939
-0.542347	have to execute _mm_empty()	-0.124939
-1.589145	The compiler may report	-0.124939
-0.356831	/Fm Generate optimization report	-0.124939
-0.358090	to remove all disturbing	-0.124939
-0.354521	best-case conditions. All disturbing	-0.124939
-0.358899	minor increase in develop-	-0.124939
-0.560332	principles of software develop-	-0.124939
-1.033766	will not be negative.	-0.124939
-0.503799	will never be negative.	-0.124939
-0.343765	sort and search facilities,	-0.124939
-0.439068	IDE, for debugging facilities,	-0.124939
-0.940711	will cause the creation	-0.124939
-0.358852	+ c) The creation	-0.124939
-0.570023	get a compiler warning	-0.124939
-0.357976	will get no warning	-0.124939
-0.590965	14.5a const int min	-0.124939
-0.641785	if (i >= min	-0.124939
-0.336334	to are constant. 14.2	-0.124939
-0.314795	tables ................................................................................................. 132 14.2	-0.124939
-0.481349	bits to zero. 14.3	-0.124939
-0.325431	checking .................................................................................................. 134 14.3	-0.124939
-0.314795	topics ......................................................................................... 132 14.1	-0.124939
-0.538969	Specific optimization topics 14.1	-0.124939
-0.570594	version. See the vectorclass	-0.124939
-0.358589	see http://www.agner.org/optimize/ - vectorclass	-0.124939
-0.408211	b1 * reciprocal_divisor; 14.7	-0.124939
-0.294275	division ........................................................................................... 139 14.7	-0.124939
-0.358893	languages, profiling and debugging.	-0.124939
-0.726159	are incompatible with debugging.	-0.124939
-0.346724	Example 8.26a void Func(int	-0.124939
-0.346724	Example 8.26b void Func(int	-0.124939
-0.358940	j by is (columns	-0.124939
-0.461850	+ j * (columns	-0.124939
-0.314795	multiplication ............................................................................................. 136 14.5	-0.124939
-0.237959	on page 96. 14.5	-0.124939
-0.971213	the array is defined.	-0.124939
-0.900208	limit can be defined.	-0.124939
-0.294275	etc. -msse3 -mssse3 -msse4.1	-0.124939
-0.237959	/arch:SSE3 -mssse3 /arch:SSSE2 -msse4.1	-0.124939
-0.358756	// x^8 // x^10	-0.124939
-0.502776	x^10 // return x^10	-0.124939
-0.463696	there are many branches):	-0.425969
-0.356331	== Friday) { DoThisThreeTimesAWeek();	-0.124939
-0.356331	| Friday)) { DoThisThreeTimesAWeek();	-0.124939
-0.102884	/arch:SSE2 -msse2 /arch:SSE2 -msse2	-0.124939
-0.102884	or double) /arch:SSE2 -msse2	-0.124939
-0.761190	functions such as logarithms,	-0.425969
-0.502865	position-independent code by default.	-0.124939
-0.357388	code everywhere by default.	-0.124939
-0.527234	development time for WTL	-0.124939
-0.358361	Library (WTL). A WTL	-0.124939
-0.129296	0, _EM_OVERFLOW); // _controlfp(0,	-0.425969
-1.124435	more time to load.	-0.124939
-0.554351	on the work load.	-0.124939
-1.251481	c; a = select(b	-0.124939
-0.576579	c.load(cc+i); a = select(b	-0.124939
-0.709244	the high level framework.	-0.124939
-0.435097	of the .NET framework.	-0.124939
-0.488082	on exception handling. 8.6	-0.124939
-0.314795	options ................................................................................... 81 8.6	-0.124939
-0.143243	for communication and synchronization	-0.124939
-0.143243	because communication and synchronization	-0.124939
-0.764722	Func is executed. Without	-0.124939
-0.294275	C1::f } 73 Without	-0.124939
-0.577413	QueryPerformanceCounter functions for millisecond	-0.124939
-0.358662	is measured with millisecond	-0.124939
-0.462888	of double, then sizeof(S1)	-0.124939
-0.339500	} The factor sizeof(S1)	-0.124939
-1.072633	or in a high-priority	-0.124939
-0.463593	system core and high-priority	-0.124939
-0.461521	separately in software development.	-0.124939
-0.444431	independence, and easy development.	-0.124939
-1.057197	doesn't have to push	-0.124939
-0.294275	2: 12 $B1$1: push	-0.124939
-0.065764	"Performance Optimization of Numerically	-0.425969
-0.358929	different CPUs to verify	-0.124939
-0.358893	test, maintain and verify	-0.124939
-0.332964	is standardized allows us	-0.124939
-0.332964	is biased allows us	-0.124939
-0.358893	as sorting and searching,	-0.124939
-0.237959	such as sorting, searching,	-0.124939
-0.884095	pointed to is known.	-0.124939
-1.327377	the object is known.	-0.124939
-0.294275	Position-independent code.................................................................................. 148 14.13	-0.124939
-0.237959	Mac OS X. 14.13	-0.124939
-0.325437	dynamic libraries............................................................................ 146 14.12	-0.124939
-0.237959	position-independent code. 147 14.12	-0.124939
-0.608760	the same memory area.	-0.124939
-0.599073	compilers. // Example 14.19	-0.124939
-0.592946	given in example 14.19	-0.124939
-0.596498	zero, rather than rounding.	-0.124939
-0.501073	for details about rounding.	-0.124939
-0.357773	= row + column;	-0.124939
-0.237959	matrix[NUMROWS][NUMCOLUMNS]; int row, column;	-0.124939
-1.309031	improve the performance dramatically	-0.124939
-0.294275	search times 24 dramatically	-0.124939
-0.352984	and static data. 148	-0.124939
-0.237959	14.12 Position-independent code.................................................................................. 148	-0.124939
-0.237959	with long latencies. 8.5	-0.124939
-0.237959	optimization by CPU.............................................................................81 8.5	-0.124939
-0.357733	} polynomial // Polynomial	-0.124939
-0.357733	= 3.3; // Polynomial	-0.124939
-0.460637	register, not even temporarily.	-0.124939
-0.554930	memory, at least temporarily.	-0.124939
-0.864241	of a very obscure	-0.124939
-0.294275	possible to construct obscure	-0.124939
-0.599073	keyword: // Example 14.1c	-0.124939
-0.592946	FactorialTable in example 14.1c	-0.124939
-0.899084	// fractional part 142	-0.124939
-0.237959	point variables ......................... 142	-0.124939
-0.841863	the option for "assume	-0.124939
-0.783522	the compiler option "assume	-0.124939
-1.308053	the arrays are properly	-0.124939
-0.408211	object is deleted properly	-0.124939
-0.460971	a software optimization issue.	-0.124939
-0.538969	a serious legal issue.	-0.124939
-0.885087	the computer is restarted	-0.124939
-0.358893	shut down and restarted	-0.124939
-0.358556	} module2.cpp int Func2()	-0.124939
-0.523744	} } void Func2()	-0.124939
-0.294275	x-xxxx--x x-xxxx--x x-xx----- x--x-----	-0.124939
-0.237959	x (x) x-xx--xx- x--x-----	-0.124939
-0.358546	computing power than PCs.	-0.124939
-0.354664	resources than standard PCs.	-0.124939
-0.348391	do this optimization. 8.2	-0.124939
-0.314795	optimize ............................................................................................ 66 8.2	-0.124939
-0.348393	something about it. Possible	-0.124939
-0.237959	on non-Intel machines? Possible	-0.124939
-0.358893	code. Compilers and IDE's	-0.124939
-0.353673	turned on. Most IDE's	-0.124939
-0.817453	and PathScale compilers. 8.3	-0.124939
-0.294275	different compilers............................................................................. 74 8.3	-0.124939
-1.778241	that can be obtained.	-0.124939
-0.724750	cannot easily be obtained.	-0.124939
-0.763277	Optimizing software in C++:	-0.124939
-0.358428	considered metaprogramming in C++:	-0.124939
-0.467821	{ y = sin(x);	-0.124939
-0.527362	can see the delay.	-0.124939
-0.575896	causes a long delay.	-0.124939
-0.786993	float sum = 1.f;	-0.124939
-0.357687	float nfac = 1.f;	-0.124939
-0.314795	this optimization explicitly. Divisions	-0.124939
-0.382907	unit as additions. Divisions	-0.124939
-0.356831	in time-critical code. 7.32	-0.124939
-0.343763	unwinding .............................................................................. 65 7.32	-0.124939
-0.570271	0; int i, largest_index	-0.124939
-0.237959	largest_abs = absvalue; largest_index	-0.124939
-0.537879	are 64 bits wide,	-0.124939
-0.513361	is 16 bits wide,	-0.124939
-1.418784	100; i++) { list[i].a	-0.124939
-0.525296	variable for accessing list[i].a	-0.124939
-0.355846	a particular processor model.	-0.124939
-0.500600	recommend any specific model.	-0.124939
-0.343763	directives ......................................................................................... 65 7.33	-0.124939
-0.294275	for a discussion. 7.33	-0.124939
-0.203864	integer operations for manipulating	-0.425969
-0.102884	multiple cores. 3.15 Dependency	-0.124939
-0.102884	switches..................................................................................................... 22 3.15 Dependency	-0.124939
-1.125716	as fast as additions.	-0.124939
-0.357091	same unit as additions.	-0.124939
-1.052979	bytes without cache MOVNTPD	-0.124939
-0.237959	16-byte instructions MOVNTPS, MOVNTPD	-0.124939
-0.822995	as an integer. 158	-0.124939
-0.294275	embedded systems ............................................................................. 158	-0.124939
-0.463403	C; x.a = A;	-0.124939
-0.657558	= A + A;	-0.124939
-0.152181	is a clock cycle?	-0.124939
-0.593735	far from the server.	-0.124939
-0.356686	a dedicated test server.	-0.124939
-0.314795	of unit-testing ...................................................................................... 156	-0.124939
-0.237959	is unreasonably large. 156	-0.124939
-0.358893	carefully optimized and fine-tuned	-0.124939
-0.595618	branches that are fine-tuned	-0.124939
-0.294275	Worst-case testing ................................................................................................ 157	-0.124939
-0.382907	random than normal. 157	-0.124939
-0.575082	bitmap than to draw	-0.124939
-0.358592	necessary here to draw	-0.124939
-0.347533	on different test examples.	-0.124939
-0.449188	in my test examples.	-0.124939
-0.510601	inside the derived class:	-0.124939
-0.237959	in the grandparent class:	-0.124939
-1.449352	The advantage of sharing	-0.124939
-0.541042	multiple threads are sharing	-0.124939
-1.797614	you want to 155	-0.124939
-0.237959	monitor counters .................................................................... 155	-0.124939
-0.294275	Threads .................................................................................................................. 60 7.30	-0.124939
-0.237959	techniques of multithreading. 7.30	-0.124939
-0.556486	prevented in other ways,	-0.124939
-0.357046	8192 bytes, 4 ways,	-0.124939
-0.898987	address can be predicted.	-0.124939
-0.595845	branch should be predicted.	-0.124939
-0.143418	as for (i=0; i<n;	-0.124939
-0.143418	example, for (i=0; i<n;	-0.124939
-1.737170	the program is dividing	-0.124939
-0.357078	unsigned int before dividing	-0.124939
-0.586782	functions, and other complications	-0.124939
-0.573466	generation can cause complications	-0.124939
-0.538425	code compiled without AVX,	-0.124939
-0.346491	set available, e.g. AVX,	-0.124939
-0.314795	and so on. 7.31	-0.124939
-0.294275	handling ................................................................................ 61 7.31	-0.124939
-0.826942	of overflow and redo	-0.124939
-0.358434	with _finite()) and redo	-0.124939
-0.358848	detect opportunities for parallelization	-0.124939
-1.047655	OpenMP and automatic parallelization	-0.124939
-0.483938	per array element. Matrix	-0.124939
-0.531617	were as follows: Matrix	-0.124939
-0.726952	Constructors and destructors ..................................................................................	-0.124939
-0.754983	find hot spots ..................................................................................	-0.124939
-0.873111	b * c); a.store(aa+i);	-0.124939
-1.023513	elements in aa: a.store(aa+i);	-0.124939
-0.064078	if (u.i & 0x7FFFFFFF)	-0.425969
-0.463442	multiple threads can add,	-0.124939
-0.237959	vectors SSE3 horizontal add,	-0.124939
-0.550780	wrong branch is fed	-0.124939
-1.437155	which can be fed	-0.124939
-0.901478	end of the array,	-0.124939
-0.580151	element in an array,	-0.124939
-0.462710	preferably 32 for AVX.	-0.124939
-0.358199	e.g. .R. for AVX.	-0.124939
-0.321871	Microsoft compiler #define Alignd(X)	-0.124939
-0.321871	compiler, etc. #define Alignd(X)	-0.124939
-0.106961	__m128i c2 = _mm_add_epi16(c,	-0.425969
-1.098452	the program that waits	-0.124939
-0.358089	is loaded, but waits	-0.124939
-0.596498	sets rather than loops,	-0.124939
-0.355980	and compile-time while loops,	-0.124939
-0.463536	the bits for Tuesday,	-0.124939
-0.382907	{ Sunday, Monday, Tuesday,	-0.124939
-0.358899	the case in loops.	-0.124939
-1.148155	the critical innermost loops.	-0.124939
-0.358756	+ log(c[i]); // Increment	-0.124939
-0.294275	and 137, respectively. Increment	-0.124939
-1.669098	likely to be cached	-0.124939
-0.858506	and data are cached	-0.124939
-0.064043	subexpression elimination, constant propagation,	-0.124939
-0.462804	program or data exceeds	-0.124939
-0.353455	user input never exceeds	-0.124939
-0.726711	be found in Wikipedia	-0.124939
-0.496328	Intel C++ compilers. Wikipedia	-0.124939
-0.639049	16.3 Worst-case testing ................................................................................................	-0.124939
-0.630046	3.15 Dependency chains ................................................................................................	-0.124939
-0.538969	with the inverted mask.	-0.124939
-0.237959	a thread affinity mask.	-0.124939
-0.504210	not. I will conclude	-0.124939
-0.523471	We can therefore conclude	-0.124939
-0.600004	sections can be shared.	-0.124939
-0.876034	it cannot be shared.	-0.124939
-0.358940	the DLL is relocated	-0.124939
-0.358812	The DLLs are relocated	-0.124939
-0.056056	execution of everything else.	-0.124939
-1.437259	accessed in a FIFO	-0.124939
-1.396069	For example, a FIFO	-0.124939
-0.110696	"Intel Math Kernel Library"	-0.124939
-0.527170	used methods for dealing	-0.124939
-0.591504	that you are dealing	-0.124939
-1.113376	{ public: void Hello()	-0.124939
-0.346724	void Disp(); void Hello()	-0.124939
-0.294275	graphics processing unit. Various	-0.124939
-0.237959	disk or network. Various	-0.124939
-0.525378	marketing of 64-bit software,	-0.124939
-0.533453	complexity of modern software,	-0.124939
-1.185129	i++) { // Overflow	-0.124939
-0.314795	page 142). 30 Overflow	-0.124939
-0.357893	be calculated using multiplications	-0.124939
-0.654485	to speed up multiplications	-0.124939
-0.461677	there are some differences	-0.124939
-0.349789	2.1.7, 2004. No differences	-0.124939
-0.065647	1.1, B = 2.2,	-0.425969
-0.897266	on the same machine.	-0.124939
-0.459536	so-called Java virtual machine.	-0.124939
-0.350363	not use STL containers.	-0.124939
-0.538969	creating and deleting containers.	-0.124939
-0.571274	to a branch tree.	-0.124939
-0.516703	or a binary tree.	-0.124939
-0.237959	funda- mentally flawed approach	-0.124939
-0.237959	and well thought-through approach	-0.124939
-0.500289	can be allocated dynamically.	-0.124939
-0.538969	typeof(CriticalFunction) * CriticalFunctionDispatch(void) __asm__	-0.124939
-0.237959	int CriticalFunction (); __asm__	-0.124939
-0.682633	Some compilers have difficulties	-0.425969
-0.143243	to creating and deleting	-0.124939
-0.143243	for creating and deleting	-0.124939
-0.591897	49 for a discussion.	-0.124939
-0.546720	140 for further discussion.	-0.124939
-0.408211	32 7.4 Enums ......................................................................................................................	-0.124939
-0.408211	93 9.8 Strings ......................................................................................................................	-0.124939
-0.356342	sizes (char, short int)	-0.124939
-0.343757	used char (or int)	-0.124939
-0.569371	y) { if (y)	-0.124939
-0.461757	b[1000]; }; if (y)	-0.124939
-0.569723	function for this purpose,	-0.124939
-1.109953	for a specific purpose,	-0.124939
-0.294275	as most sorting algorithms,	-0.124939
-0.237959	as many encryption algorithms,	-0.124939
-0.764079	supported CriticalFunction = &CriticalFunction_SSE2;	-0.124939
-0.461598	SSE2 supported return &CriticalFunction_SSE2;	-0.124939
-0.539077	NotPolymorphic(); virtual void Disp();	-0.124939
-0.538969	<< "Hello "; Disp();	-0.124939
-0.858665	response time is consistent	-0.124939
-1.309639	be improved by consistent	-0.124939
-0.358744	then 0+1.23456 = 1.23456.	-0.124939
-0.596498	0 rather than 1.23456.	-0.124939
-0.023527	i; } u, v;	-0.425969
-0.023527	bc = _mm_mullo_epi16 (b,	-0.425969
-0.548701	output can often reveal	-0.124939
-0.346504	and their implementations reveal	-0.124939
-0.835898	or structure is created.	-0.124939
-1.174630	which they are created.	-0.124939
-0.597161	reasons to use denormal	-0.124939
-0.294275	rather than generating denormal	-0.124939
-0.892670	updates should be optional	-0.124939
-0.237959	General Public License, optional	-0.124939
-0.057023	memcpy 16kB unaligned op.	-0.124939
-0.355846	each time. An experiment	-0.124939
-0.350866	results of my experiment	-0.124939
-0.355981	eax $B2$2 ; Induction++;	-0.124939
-0.702772	a[i+1] = Induction; Induction++;	-0.124939
-0.556763	Conversions between different precisions	-0.124939
-0.598653	different floating point precisions	-0.124939
-0.102884	.................................................................................... 124 13.3 Difficult	-0.124939
-0.102884	of programming. 13.3 Difficult	-0.124939
-0.524923	based on an interpreter	-0.124939
-0.356756	PC's had an interpreter	-0.124939
-0.486263	procedure linkage table (PLT).	-0.124939
-0.460856	UnusedFiller; }; int order(int	-0.124939
-0.356741	i, j; int order(int	-0.124939
-0.358931	circuits consisting of digital	-0.124939
-0.336334	operations. A complex digital	-0.124939
-0.152964	i; for(i=0; i<300; i++){	-0.425969
-0.294275	K8 0.38 0.44 0.40	-0.124939
-0.237959	2 0.77 0.89 0.40	-0.124939
-0.093318	double x, y, z;	-0.124939
-0.093318	float x, y, z;	-0.124939
-0.314795	point division ........................................................................................... 139	-0.124939
-0.294275	b / c) 139	-0.124939
-0.237959	0.11 1.21 0.57 0.44	-0.124939
-0.237959	Opteron K8 0.38 0.44	-0.124939
-0.023527	return (*SelectAddMul_pointer)(aa, bb, cc);	-0.425969
-0.495457	16 bit platform __GNUC__	-0.124939
-0.294275	Example 8.22 #ifdef __GNUC__	-0.124939
-0.504889	the line that covered	-0.124939
-0.557524	different processors are covered	-0.124939
-0.403456	is the old fashioned	-0.124939
-0.683084	in the old fashioned	-0.124939
-0.504646	variable-size arrays with alloca.	-0.124939
-0.725244	restrictions on using alloca.	-0.124939
-0.716583	of RAM memory. Efficient	-0.124939
-0.575689	rounding and truncation. Efficient	-0.124939
-0.725306	use different memory spaces	-0.124939
-0.501245	time cleaning up spaces	-0.124939
-0.521030	Induction; ; parameter $B1$1:	-0.124939
-0.341892	parameter 2: 12 $B1$1:	-0.124939
-0.023527	720, 5040, 40320, 362880,	-0.425969
-0.355244	(memory address) / (line	-0.124939
-0.237959	(number of sets) (line	-0.124939
-0.358730	each row or column.	-0.124939
-0.590467	0 in this column.	-0.124939
-0.558850	bigger and more complex,	-0.124939
-0.553703	source code more complex,	-0.124939
-0.648041	page 146 below. 3.7	-0.124939
-0.346498	code ....................................................... 20 3.7	-0.124939
-1.719335	This is a cheap	-0.124939
-0.429573	Branches are relatively cheap	-0.124939
-0.102028	reasons of mathematical purity.	-0.124939
-0.294275	F32vec4 s(0.f, 0.f, 0.f,	-0.124939
-0.237959	x^4 F32vec4 s(0.f, 0.f,	-0.124939
-0.594266	double: // Example 14.23b	-0.124939
-0.355532	big-endian storage. Example 14.23b	-0.124939
-0.343780	induction variables (see below).	-0.124939
-0.343780	by 16 (see below).	-0.124939
-0.527299	pointers requires a division,	-0.124939
-0.459203	time. Single precision division,	-0.124939
-0.358931	one unit of received	-0.124939
-0.331949	servicing. A command received	-0.124939
-0.325431	to a dramatic degradation	-0.124939
-0.294275	be a slight degradation	-0.124939
-0.527362	don't need the "override"	-0.124939
-0.358451	to implement this "override"	-0.124939
-0.599073	two: // Example 11.2b	-0.124939
-0.592946	list in example 11.2b	-0.124939
-1.826944	the address of matrix[j][0]	-0.124939
-0.538969	j = order(i); matrix[j][0]	-0.124939
-0.314795	!(!a)=a x-xxxxxxx ---x----- x--xx----	-0.124939
-0.237959	---x----- x--xx---- (a&&b)||(a&&!b)=a x--xx----	-0.124939
-0.356047	a separate function. Sometimes,	-0.124939
-0.444421	the hot spot. Sometimes,	-0.124939
-1.335043	don't have to distribute	-0.124939
-1.178121	not possible to distribute	-0.124939
-1.399857	look at the "worst	-0.124939
-0.358829	counts represent the "worst	-0.124939
-0.305208	Boolean variables are overdetermined	-0.124939
-0.237972	time of 250 ms.	-0.124939
-0.237972	more than 250 ms.	-0.124939
-1.246283	does the same thing.	-0.124939
-1.050945	doing the same thing.	-0.124939
-0.593040	Define size of squares:	-0.124939
-0.579421	c1 for all squares:	-0.124939
-1.521726	at the time MemberPointer	-0.124939
-0.357078	of c1 before MemberPointer	-0.124939
-0.611066	is available from www.intel.com.	-0.124939
-0.431952	Library, available from www.intel.com.	-0.124939
-0.341905	r1; c1 += TILESIZE)	-0.124939
-0.341905	SIZE; r1 += TILESIZE)	-0.124939
-1.186718	a number of sources.	-0.124939
-0.347530	come from unknown sources.	-0.124939
-0.898615	thanks to the first-in-last-out	-0.124939
-0.599866	organized in a first-in-last-out	-0.124939
-0.023527	5040, 40320, 362880, 3628800,	-0.425969
-1.157008	it is not uncommon	-0.425969
-0.541231	games. Such a coprocessor	-0.124939
-0.548918	with a graphics coprocessor	-0.124939
-0.823201	int fraction : 23;	-0.124939
-0.351719	+= n << 23;	-0.124939
-0.439060	compilers for Linux. 82	-0.124939
-0.325431	Optimization directives .............................................................................................. 82	-0.124939
-0.561437	disadvantage of C++ relates	-0.124939
-0.930616	the C++ language relates	-0.124939
-0.453410	same member pointer. 7.9	-0.124939
-0.237959	7.8 Member pointers.......................................................................................................37 7.9	-0.124939
-1.076765	care of the alignment.	-0.124939
-0.331944	always work. Data alignment.	-0.124939
-0.343765	efficient as integers. 7.5	-0.124939
-0.325431	Enums ...................................................................................................................... 33 7.5	-0.124939
-0.483770	done a good deal	-0.124939
-0.691928	get a good deal	-0.124939
-0.314795	and loader (requires binutils	-0.124939
-0.237959	example 13.1, Requires binutils	-0.124939
-0.511853	Boolean vector operations. 7.6	-0.124939
-0.325431	7.5 Booleans................................................................................................................... 33 7.6	-0.124939
-0.255942	Mac systems. 14 Specific	-0.124939
-0.255942	......................................................................... 130 14 Specific	-0.124939
-0.356047	anyway. Pure function. __attribute__((const))	-0.124939
-0.408211	__GNUC__ #define pure_function __attribute__((const))	-0.124939
-0.358812	the background are unnecessary	-0.124939
-0.348393	a program. Avoid unnecessary	-0.124939
-0.569430	else { float b[1000];	-0.124939
-0.352354	int a[1000]; float b[1000];	-0.124939
-0.325431	and operators............................................................................... 29 7.3	-0.124939
-0.325431	integer variables. 31 7.3	-0.124939
-0.566755	error simply by performing	-0.124939
-0.352974	usability A better performing	-0.124939
-0.462676	are done only once,	-0.124939
-0.652542	is only calculated once,	-0.124939
-0.555559	s0 = 0, s1	-0.124939
-0.467691	s0 += a[i]; s1	-0.124939
-1.065267	CISC instruction set (called	-0.124939
-0.345257	compile-time if statements (called	-0.124939
-1.047786	{ int i; 84	-0.124939
-0.294275	compiler does ............................................................................. 84	-0.124939
-0.102884	must have extern "C"	-0.124939
-0.102884	entry point extern "C"	-0.124939
-0.658189	in almost all respects	-0.124939
-0.539519	compiler in many respects	-0.124939
-0.314795	GNU Free Documentation License	-0.124939
-0.237959	control no yes License	-0.124939
-0.357056	the code. See ISO/IEC	-0.124939
-0.237959	compiler optimization. en.wikipedia.org/wiki/Compiler_optimization. ISO/IEC	-0.124939
-1.174471	be used as command-line	-0.124939
-0.462916	and debugging. A command-line	-0.124939
-0.459890	in array ; i++	-0.124939
-0.353864	the post-increment operator i++	-0.124939
-0.584912	constant (see page 137).	-0.124939
-0.791589	2 (See page 137).	-0.124939
-1.163468	branch inside the template.	-0.124939
-1.485097	with the same template.	-0.124939
-0.358756	integer constant. // General	-0.124939
-0.294275	compiler price GNU General	-0.124939
-1.076970	stored in the container,	-0.124939
-1.298674	into a single container,	-0.124939
-0.659712	64-bit Windows. The integrated	-0.124939
-0.358730	graphics card or integrated	-0.124939
-1.737170	the program is started.	-0.124939
-0.354211	the CPU was started.	-0.124939
-0.803634	a function which transposes	-0.124939
-1.748083	The following example transposes	-0.124939
-1.194573	access to the container.	-0.124939
-0.900607	objects in the container.	-0.124939
-0.127629	dispatched version return (*SelectAddMul_pointer)(aa,	-0.425969
-0.522570	system. It will crash	-0.124939
-0.355721	are disabled will crash	-0.124939
-0.065647	double A = 1.1,	-0.425969
-0.916258	allows you to reserve	-0.124939
-2.118167	in order to reserve	-0.124939
-0.331940	and integer division. Older	-0.124939
-0.237959	2, 4, etc.). Older	-0.124939
-1.004217	the pointer is deleted.	-0.124939
-1.424879	need to be deleted.	-0.124939
-0.621847	Windows and Linux. Asmlib	-0.124939
-0.331944	0.6 1.19 13 Asmlib	-0.124939
-0.652315	program is run. Both	-0.124939
-0.408221	for the linker. Both	-0.124939
-0.525461	p->Hello(); p = &Object2;	-0.124939
-0.357687	p2; p2 = &Object2;	-0.124939
-0.503021	struct { int a:4;	-0.124939
-0.503021	Bitfield { int a:4;	-0.124939
-0.294275	a+0=a a*0=0 a*1=a (-a)*(-b)=a*b	-0.124939
-0.237959	--xxxx-xx a*1=a x-xxxxx-x (-a)*(-b)=a*b	-0.124939
-0.600617	information can be left	-0.124939
-0.356959	one free register left	-0.124939
-0.237959	c2 < c1+TILESIZE; c2++)	-0.124939
-0.237959	c2 < r2; c2++)	-0.124939
-0.174153	on the processor. Nested	-0.425969
-0.582917	multiplication } // ipow	-0.124939
-0.357856	using loop double ipow	-0.124939
-0.587331	only an integer comparison,	-0.124939
-0.382907	as addition, subtraction, comparison,	-0.124939
-0.107023	New versions are produced	-0.425969
-0.435306	{ public: void NotPolymorphic();	-0.124939
-0.659187	= a+(b+c) - a*b+a*c	-0.124939
-0.596459	a+(b+c) - n.a. a*b+a*c	-0.124939
-0.599073	capability: // Example 11.1a	-0.124939
-1.328195	code in example 11.1a	-0.124939
-0.355846	the CPU doesn't support,	-0.124939
-0.355443	has no AVX support,	-0.124939
-0.589816	leaks if you forget	-0.124939
-0.587586	thread. If you forget	-0.124939
-0.578516	bc with the inverted	-0.124939
-0.578516	AND'ed with the inverted	-0.124939
-0.358929	example 11.1a to 11.1b	-0.124939
-0.599073	103 // Example 11.1b	-0.124939
-0.463403	* 8 = 80.	-0.124939
-0.594082	optimizations. See page 80.	-0.124939
-0.462012	of element number i.	-0.124939
-0.294275	holds the index, i.	-0.124939
-0.358598	line written. This worked	-0.124939
-0.463042	information. They have worked	-0.124939
-0.504622	against overflow is needed:	-0.124939
-0.358645	why bookkeeping is needed:	-0.124939
-0.337730	.......................................................................................................... 164 1 Introduction	-0.124939
-0.337730	2014-08-07. Contents 1 Introduction	-0.124939
-0.851911	i; for(i=0; i<300; i+=3){	-0.124939
-0.237959	i; for(i=0; i<301; i+=3){	-0.124939
-0.064567	assembly: ALIGN 4 PUBLIC	-0.425969
-0.102884	2) 2 a+a+a+a=a*4 -(-a)=a	-0.124939
-0.102884	= ((x2)2)2 a+a+a+a=a*4 -(-a)=a	-0.124939
-0.212332	manuals from Intel: "IA-32	-0.124939
-0.212332	Microprocessor documentation Intel: "IA-32	-0.124939
-1.038296	the program is loaded,	-0.124939
-0.513310	c (a&&b) || (a&&b&&c)	-0.124939
-0.328390	|| (a&&c) || (a&&b&&c)	-0.124939
-1.031340	a-a = 0 a+0=a	-0.124939
-0.237959	--xxxxxx- a-(-b)=a+b ---xxx-x- a+0=a	-0.124939
-0.599073	SafeArray: // Example 7.15b	-0.124939
-0.503590	parameters, as example 7.15b	-0.124939
-0.357859	the block size grows	-0.124939
-0.581673	in an array grows	-0.124939
-0.358940	then N&(N-1) is 0.	-0.124939
-0.539720	if c < 0.	-0.124939
-0.023527	r2 < r1+TILESIZE; r2++)	-0.425969
-0.358978	eax holds the index,	-0.124939
-0.408211	a square brackets index,	-0.124939
-1.077569	none of the time-consumers	-0.124939
-1.061972	The most common time-consumers	-0.124939
-0.472687	search, is fast enough.	-0.124939
-0.335589	the job fast enough.	-0.124939
-0.023527	6, 24, 120, 720,	-0.425969
-0.527408	12.4c is quite tedious	-0.124939
-0.908509	can be quite tedious	-0.124939
-0.355916	code versions work correctly.	-0.124939
-0.355349	code branches works correctly.	-0.124939
-0.358893	is safe and flexible,	-0.124939
-0.237959	STL are universal, flexible,	-0.124939
-0.590965	i; const int ARRAYSIZE	-0.124939
-0.722815	if (i < ARRAYSIZE	-0.124939
-0.507861	use the source annotation	-0.124939
-0.330864	option for source annotation	-0.124939
-0.336338	ecx and edx, respectively.	-0.124939
-0.237959	136 and 137, respectively.	-0.124939
-0.345267	Introduction ....................................................................................................................... 3 1.1	-0.124939
-0.325431	new relevant information. 1.1	-0.124939
-0.591012	mispredictions (see page 43).	-0.124939
-0.511992	prediction (see p. 43).	-0.124939
-0.595282	} u; u.i ^=	-0.124939
-0.237959	example with u.i[1] ^=	-0.124939
-0.457941	which is all 1's	-0.124939
-0.838648	AND'ed with all 1's	-0.124939
-0.358416	if we use hexadecimal	-0.124939
-0.353460	of 2. Using hexadecimal	-0.124939
-0.716599	int i; } u,	-0.425969
-0.560531	signed, or by extending	-0.124939
-0.357388	longer size by extending	-0.124939
-0.137171	operators &, |, ^,	-0.124939
-0.599564	handled in a systematic	-0.124939
-1.643850	to use a systematic	-0.124939
-0.715059	the & operator forces	-0.124939
-0.351728	variable. The union forces	-0.124939
-1.048818	the loop is rolled	-0.124939
-0.331940	of a list, rolled	-0.124939
-0.730585	from same module __attribute__	-0.124939
-0.538969	__attribute__ ((visibility ("internal"))) __attribute__	-0.124939
-0.463601	programs written in Java,	-0.124939
-0.595324	languages, such as Java,	-0.124939
-0.129485	ivdep __restrict #pragma ivdep	-0.124939
-0.129485	noalias) __restrict #pragma ivdep	-0.124939
-1.096090	Microsoft, Intel and Gnu.	-0.124939
-0.577151	highly compatible with Gnu.	-0.124939
-0.358978	This ends the recursion	-0.124939
-0.659712	is implemented. The recursion	-0.124939
-1.241151	n.a. - a ^a	-0.124939
-0.358625	= ~a a ^a	-0.124939
-0.726857	the rules of algebra,	-0.124939
-0.497382	case of Boolean algebra,	-0.124939
-0.462833	cost to memory management	-0.124939
-0.347528	cost of heap management	-0.124939
-0.595038	expressions. There are lots	-0.124939
-0.358662	and databases with lots	-0.124939
-0.294275	- 5. www.amd.com. 163	-0.124939
-0.294275	19 Literature ..................................................................................................................... 163	-0.124939
-0.237959	version. For team projects,	-0.124939
-0.237959	version. For one-man projects,	-0.124939
-0.294275	-m64 -static /MT 160	-0.124939
-0.237959	of compiler options....................................................................................... 160	-0.124939
-0.358756	uses SSE3. // (This	-0.124939
-0.450279	explicit induction variable. (This	-0.124939
-0.572627	BSD, Windows and Mac.	-0.124939
-1.030481	Windows, Linux and Mac.	-0.124939
-1.863870	in the same chip.	-0.124939
-1.274281	in the CPU chip.	-0.124939
-0.600617	file can be wrapped	-0.124939
-0.592006	unless they are wrapped	-0.124939
-0.369023	can be predicted perfectly	-0.124939
-0.023527	mask = _mm_cmpgt_epi16(b, zero);	-0.425969
-0.322533	objects have been added?	-0.425969
-0.294275	"More Effective C++". Addison-Wesley,	-0.124939
-0.237959	Jr.: "Hacker's Delight". Addison-Wesley,	-0.124939
-0.593281	for the loop counter,	-0.124939
-0.584770	incrementing a loop counter,	-0.124939
-0.802972	with the static modifier	-0.124939
-0.294275	__attribute__((fastcall)). The fastcall modifier	-0.124939
-0.463386	/Ox -O3 or -Ofast	-0.124939
-0.294275	specific option) better: -Ofast	-0.124939
-1.258485	the code to test.	-0.124939
-0.294275	want to 155 test.	-0.124939
-0.764552	be unable to respond	-0.124939
-0.519246	It should never respond	-0.124939
-0.023527	Optimization of Numerically Intensive	-0.425969
-0.600953	algorithms in the planning	-0.124939
-0.294275	in the early planning	-0.124939
-0.724506	} }; class C2	-0.124939
-0.294275	{ C1 Object1; C2	-0.124939
-0.878619	with all the R	-0.124939
-0.552605	with the four R	-0.124939
-0.530143	version and a release	-0.124939
-0.530143	development, and a release	-0.124939
-2.080684	part of the fraction.	-0.124939
-0.598064	decimals of the fraction.	-0.124939
-0.358730	Tuesday, Wednesday or Friday	-0.124939
-0.237959	Thursday = 0x10, Friday	-0.124939
-0.447890	functions Some programming textbooks	-0.124939
-0.346506	classes Nowadays, programming textbooks	-0.124939
-0.598266	division to be slower.	-0.124939
-1.064872	make the program slower.	-0.124939
-0.331944	allocation ...................................................................................... 90 9.7	-0.124939
-0.294275	on using alloca. 9.7	-0.124939
-0.659495	for (c2 = c1;	-0.124939
-0.357952	Example 7.14 class c1;	-0.124939
-0.117647	code, interpreters, just-in-time compilers,	-0.124939
-0.117647	frameworks, interpreters, just-in-time compilers,	-0.124939
-0.358893	generate -128, and subtracting	-0.124939
-0.659380	by 2n by subtracting	-0.124939
-0.594405	CriticalFunctionDispatch(void) { // Returns	-0.124939
-0.357733	<ia32intrin.h> etc. // Returns	-0.124939
-0.463541	tested it. The insight	-0.124939
-0.356647	problem. This new insight	-0.124939
-0.237959	algebra reductions: a+b=b+a a*b=b*a	-0.124939
-0.237959	(vector) reductions: a+b=b+a, a*b=b*a	-0.124939
-0.140705	r1; r2 < r1+TILESIZE;	-0.124939
-0.140705	r1+1; r2 < r1+TILESIZE;	-0.124939
-0.358625	obvious reductions as 0/a	-0.124939
-1.482520	= a - 0/a	-0.124939
-0.142654	We can only hope	-0.425969
-0.358756	Example 7.45 // Portability	-0.124939
-0.336338	of optimization. 14 Portability	-0.124939
-0.582370	with the other compilers).	-0.124939
-0.314803	very old DOS compilers).	-0.124939
-0.594405	arraysize) { // Catch	-0.124939
-0.862720	} } // Catch	-0.124939
-0.587598	programs, more than 99%	-0.124939
-0.314795	In other programs, 99%	-0.124939
-0.051258	cout << "Hello ";	-0.124939
-0.348399	= 1024; struct Sab	-0.124939
-0.314803	a; int b;}; Sab	-0.124939
-0.485999	is a significant contribution	-0.124939
-0.314795	only a negligible contribution	-0.124939
-0.541231	compiler makes a distinction	-0.124939
-0.654252	is an important distinction	-0.124939
-0.357733	Store result // Update	-0.124939
-0.357733	variable Y // Update	-0.124939
-0.486909	Microsoft, Gnu, Clang Supported	-0.124939
-0.237959	file dvec.h vectorclass.h Supported	-0.124939
-0.102884	increase in develop- ment	-0.124939
-0.102884	of software develop- ment	-0.124939
-0.827280	the dispatcher function. typeof(CriticalFunction)	-0.124939
-0.237959	CriticalFunctionDispatch(void) __asm__ ("CriticalFunction"); typeof(CriticalFunction)	-0.124939
-0.595439	instructions than the ones	-0.124939
-0.504880	or modify the ones	-0.124939
-1.732577	the program is busy	-0.124939
-0.358645	or she is busy	-0.124939
-1.618990	may not be optimally	-0.124939
-0.521216	part can run optimally	-0.124939
-0.331940	checks where necessary. Fast	-0.124939
-0.314795	appropriate. Compiler-specific keywords Fast	-0.124939
-0.463541	in details. The funny	-0.124939
-0.461677	it does some funny	-0.124939
-0.349140	etc. Optimizing database queries	-0.124939
-0.237959	simple cases. Database queries	-0.124939
-0.590075	on a program saying	-0.124939
-0.294275	nagging pop-up messages saying	-0.124939
-0.460798	much higher than normal.	-0.124939
-0.356695	more random than normal.	-0.124939
-0.338200	// Writes "Hello 1"	-0.425969
-0.358940	the hardware is updated.	-0.124939
-0.982685	a CPU dispatcher updated.	-0.124939
-0.358852	particular purpose. The clumsy	-0.124939
-0.352418	Intrinsic functions look clumsy	-0.124939
-0.023527	d = (double)(signed int)u;	-0.124939
-0.601253	much of the trivial	-0.124939
-0.504907	a loop for trivial	-0.124939
-0.048403	96 void transpose(double a[SIZE][SIZE])	-0.124939
-0.048403	9.5b void transpose(double a[SIZE][SIZE])	-0.124939
-0.550680	for SSE2 or x64	-0.124939
-0.355244	for 80x86 / x64	-0.124939
-0.503204	(32-bit or 64-bit systems).	-0.124939
-0.351723	bits in x86 systems).	-0.124939
-0.498543	much time is wasted	-0.124939
-0.498543	No time is wasted	-0.124939
-0.294275	price GNU General Public	-0.124939
-0.382907	by Agner Fog. Public	-0.124939
-0.566300	add an extra dummy	-0.124939
-0.354662	may even add dummy	-0.124939
-0.550839	library through the symbolic	-0.124939
-0.541231	program makes a symbolic	-0.124939
-1.074866	variable can be fetched	-0.124939
-0.557571	where instructions are fetched	-0.124939
-0.358744	0x20, Saturday = 0x40	-0.124939
-0.358730	entire 64 or 0x40	-0.124939
-2.060309	n.a. n.a. - (a&b)|(a&c)	-0.124939
-0.237959	----- ~(~a)=a x-xxxxx-- (a&b)|(a&c)	-0.124939
-1.346262	less efficient than relocation,	-0.124939
-0.785389	addresses that need relocation,	-0.124939
-0.817453	and PathScale compilers. (The	-0.124939
-0.666259	for an explanation. (The	-0.124939
-0.659520	* 1.2; // Mixing	-0.124939
-0.352702	the commercial compilers. Mixing	-0.124939
-0.358728	each iteration it decides	-0.124939
-0.526960	A dispatcher function decides	-0.124939
-0.330862	mode (SSE2): #include <xmmintrin.h>	-0.124939
-0.330862	mode (SSE): #include <xmmintrin.h>	-0.124939
-0.357523	return Func1(x) * Func1(x)	-0.124939
-1.601104	x) { return Func1(x)	-0.124939
-0.023527	24, 120, 720, 5040,	-0.425969
-0.557595	features, including the ability	-0.124939
-0.358829	we loose the ability	-0.124939
-0.463330	of multiplying by 3,	-0.124939
-0.514735	sizes 1, 2, 3,	-0.124939
-0.331944	resources .......................................................................................... 21 3.12	-0.124939
-0.331949	and system modules. 3.12	-0.124939
-1.170377	that do not 123	-0.124939
-0.294275	example, #define ABC 123	-0.124939
-0.575642	null reference to provoke	-0.124939
-0.572615	reference. This will provoke	-0.124939
-0.237969	reasonably well. Codeplay VectorC	-0.124939
-0.237969	1.4, 2005. Codeplay VectorC	-0.124939
-0.503936	Interference from other processes.	-0.124939
-1.157535	shared between multiple processes.	-0.124939
-0.102884	same module __attribute__ ((visibility	-0.124939
-0.102884	((visibility ("internal"))) __attribute__ ((visibility	-0.124939
-0.358940	dependency chains is stronger	-0.124939
-0.355699	is so much stronger	-0.124939
-0.818250	These instructions are accessible	-0.124939
-1.261036	that are not accessible	-0.124939
-0.358893	consistent modularity and reusable	-0.124939
-0.463601	put away in reusable	-0.124939
-0.463541	Installation problems. The procedures	-0.124939
-0.339507	pointers, and far procedures	-0.124939
-0.358744	&& !b = !(a	-0.124939
-0.596459	a*b - n.a. !(a	-0.124939
-0.237969	............................................................................. 158 18 Overview	-0.124939
-0.237969	22). 159 18 Overview	-0.124939
-0.237959	immintrin.h AMD SSE4A ammintrin.h	-0.124939
-0.237959	ammintrin.h AMD XOP ammintrin.h	-0.124939
-0.596498	connections rather than sequences	-0.124939
-0.355528	instructions or small sequences	-0.124939
-0.358434	to start and stop	-0.124939
-0.658877	error message and stop	-0.124939
-0.546720	expected for further expansions	-0.124939
-0.325431	such as Taylor expansions	-0.124939
-0.500749	without the sign bit.	-0.124939
-0.500749	about the sign bit.	-0.124939
-0.102884	of vectors. 12.10 Conclusion	-0.124939
-0.102884	....................................................... 120 12.10 Conclusion	-0.124939
-0.357644	not support static linking.	-0.124939
-0.654900	static and dynamic linking.	-0.124939
-0.421439	without an IDE. Free	-0.124939
-0.294275	restrictions. A GNU Free	-0.124939
-0.358978	and studying the bottlenecks	-0.124939
-0.357291	to identify performance bottlenecks	-0.124939
-0.550788	unstable due to interrupts	-0.124939
-0.543339	CPU to generate interrupts	-0.124939
-0.939505	of compatibility with legacy	-0.124939
-0.502862	compatibility with some legacy	-0.124939
-0.358273	a far data segment	-0.124939
-0.462534	data for one segment	-0.124939
-0.357772	Linux platform n.a. __unix__	-0.124939
-0.382907	n.a. __unix__ __linux__ __unix__	-0.124939
-0.504969	their address and attempts	-0.124939
-0.890838	event that it attempts	-0.124939
-0.341892	256-bit integer vectors. Code	-0.124939
-0.294275	transfer are eliminated. Code	-0.124939
-0.093318	Saturday }; Weekdays Day;	-0.124939
-0.093318	0x40 }; Weekdays Day;	-0.124939
-0.142601	c1+TILESIZE; c2++) { swapd(a[r2][c2],a[c2][r2]);	-0.124939
-0.142601	r2; c2++) { swapd(a[r2][c2],a[c2][r2]);	-0.124939
-0.023527	version return (*SelectAddMul_pointer)(aa, bb,	-0.425969
-0.526410	order to save power.	-0.124939
-0.348391	by the processing power.	-0.124939
-0.559631	be a time consumer	-0.124939
-0.356094	an annoying time consumer	-0.124939
-0.566785	to put a parenthesis	-0.124939
-0.358704	you put a parenthesis	-0.124939
-0.347521	and reusable classes. Security	-0.124939
-0.429585	on a computer. Security	-0.124939
-0.460049	counter with its limit,	-0.124939
-0.237959	exceeds an acceptable limit,	-0.124939
-0.462987	You cannot use ~	-0.124939
-0.538969	&, |, ^, ~	-0.124939
-0.358706	call transpose function swapd(a[r][c],	-0.124939
-0.630046	columns below diagonal swapd(a[r][c],	-0.124939
-0.356798	slightly more time. Single	-0.124939
-1.104820	the data cache. Single	-0.124939
-0.294275	n.a. 1.00 0.25 0.28	-0.124939
-0.237959	1.00 0.35 0.29 0.28	-0.124939
-0.850337	operators that have Booleans	-0.124939
-0.294275	as integers. 7.5 Booleans	-0.124939
-0.715041	AMD Opteron K8 0.24	-0.124939
-0.294275	K8 0.24 0.25 0.24	-0.124939
-0.172705	memory access 9.1 Caching	-0.124939
-0.172705	............................................................................................. 87 9.1 Caching	-0.124939
-0.526691	intervals which may interfere	-0.124939
-0.358351	A macro will interfere	-0.124939
-0.657654	-fomit- frame- pointer -fomit-	-0.124939
-0.237959	stack frame /Oy -fomit-	-0.124939
-0.294275	Opteron K8 0.24 0.25	-0.124939
-0.382907	0.24 n.a. 1.00 0.25	-0.124939
-0.106961	__m128i bc = _mm_mullo_epi16	-0.425969
-0.573356	CPUs without the FMA4	-0.124939
-0.355619	xopintrin.h (Gnu) AMD FMA4	-0.124939
-0.294275	Agner Fog. Public distribution	-0.124939
-0.237959	not allowed. Non-public distribution	-0.124939
-0.809032	(2n / b) >>	-0.124939
-0.237959	^, ~, <<, >>	-0.124939
-1.434469	are able to do,	-0.124939
-0.314795	macro expansions. Programmers do,	-0.124939
-0.358702	not alias, if appropriate.	-0.124939
-0.357514	functions static where appropriate.	-0.124939
-0.331944	the subsequent manuals. Please	-0.124939
-0.467697	here's an explanation. Please	-0.124939
-0.102884	.......................................................................................... 21 3.12 Network	-0.124939
-0.102884	system modules. 3.12 Network	-0.124939
-0.356995	MS compiler: unsigned __int64	-0.124939
-0.382907	int64_t MS compiler: __int64	-0.124939
-0.201343	cache misses, branch mispredictions,	-0.124939
-1.408611	0; i < rows;	-0.425969
-0.358273	of keeping data together.	-0.124939
-0.352411	modules are linked together.	-0.124939
-1.347948	be possible to organize	-0.124939
-1.375173	no need to organize	-0.124939
-1.069704	member of a bitfield	-0.124939
-0.358625	to compose a bitfield	-0.124939
-0.599073	integer: // Example 15.1b.	-0.124939
-1.328195	loop in example 15.1b.	-0.124939
-0.581827	message when it sees	-0.124939
-1.483964	that the compiler sees	-0.124939
-0.356798	and development time. Interpreted	-0.124939
-0.237959	UNIX shell script. Interpreted	-0.124939
-0.598969	make the compiler treat	-0.124939
-0.357600	of these also treat	-0.124939
-0.023527	matrix void TransposeCopy(double a[SIZE][SIZE],	-0.425969
-0.358744	&& a<c) = (a<b	-0.124939
-0.237959	(a+c==b+c)=(a==b) ----x---- !(a<b)=(a>=b) (a<b	-0.124939
-0.952060	depending on the processor).	-0.124939
-0.358460	cache misses have occurred.	-0.124939
-0.358251	point overflow has occurred.	-0.124939
-0.598348	! and the corresponding	-0.124939
-0.358829	CPU supports the corresponding	-0.124939
-0.358476	public: SafeArray() { memset(a,	-0.124939
-0.829958	a to zero memset(a,	-0.124939
-0.388811	return x * m;}	-0.124939
-0.358893	seldom occur and recovering	-0.124939
-0.527170	can use for recovering	-0.124939
-0.658961	enum Weekdays { Sunday,	-0.124939
-0.451226	if the constants Sunday,	-0.124939
-0.023527	sets are mutually incompatible.	-0.124939
-0.901149	= *p + 2;}	-0.124939
-0.355529	x *const_cast<int*>(&x) += 2;}	-0.124939
-0.880963	mode program is fast,	-0.124939
-0.237959	/fp:fast /fp:fast=2 -fp-model fast,	-0.124939
-0.716585	costs of optimizing University	-0.124939
-0.294275	Agner Fog. Technical University	-0.124939
-0.563012	of calls to log,	-0.124939
-0.237959	such as pow, log,	-0.124939
-0.358835	directives (everything that begins	-0.124939
-0.682756	The loop body begins	-0.124939
-0.594266	exponent: // Example 14.26	-0.124939
-0.500272	exponent } Example 14.26	-0.124939
-0.594266	integers: // Example 14.27	-0.124939
-0.500272	positive } Example 14.27	-0.124939
-0.868123	The method is somewhat	-0.124939
-0.355349	sequentially. It works somewhat	-0.124939
-0.384095	branch is poorly predictable.	-0.124939
-0.269252	branches are poorly predictable.	-0.124939
-0.287658	speed of addition, subtraction,	-0.124939
-0.287658	such as addition, subtraction,	-0.124939
-0.897137	bit: // Example 14.23	-0.124939
-1.257262	as in example 14.23	-0.124939
-1.320732	may be a slight	-0.124939
-0.547196	which may cause slight	-0.124939
-0.355918	of an execution unit.	-0.124939
-0.490339	no graphics processing unit.	-0.124939
-0.358893	self-styled hacks and direct	-0.124939
-0.543980	language that allows direct	-0.124939
-0.541309	typically get the generic	-0.124939
-0.586681	set, and a generic	-0.124939
-1.190330	floating point addition unit,	-0.124939
-0.490339	a graphics processing unit,	-0.124939
-0.358940	the handle is invalid.	-0.124939
-0.350363	block then become invalid.	-0.124939
-0.358940	or database is heavily	-0.124939
-0.496547	Algorithms that rely heavily	-0.124939
-0.540069	relocation, but only self-	-0.124939
-0.798340	used for calculating self-	-0.124939
-0.598251	faster to make log2	-0.124939
-0.526138	static const double log2	-0.124939
-0.818802	the performance monitor counters.	-0.124939
-0.452642	called performance monitor counters.	-0.124939
-0.343775	used where execution speed,	-0.124939
-0.343775	flexibility, while execution speed,	-0.124939
-0.340385	x--xx---- (a&&b) || (!a&&c)	-0.124939
-0.340385	75 (a&&b) || (!a&&c)	-0.124939
-0.358852	to zero. The []	-0.124939
-0.237959	{ // Safe []	-0.124939
-1.613864	explained on page 122.	-0.124939
-0.457064	registers; see page 122.	-0.124939
-0.460127	make appropriate error messages	-0.124939
-0.237959	with nagging pop-up messages	-0.124939
-0.237959	x[]); void F2(float x[]);	-0.124939
-0.237959	9.2a void F1(int x[]);	-0.124939
-0.142127	etc. (Intel CPU only)	-0.124939
-0.142127	only) (Intel CPU only)	-0.124939
-0.314795	to 11.1b automatically, although	-0.124939
-0.237959	also work, 133 although	-0.124939
-0.873787	Intrinsic functions are primitive	-0.124939
-0.331940	of a relatively primitive	-0.124939
-0.294275	Codes", by S. Goedecker	-0.124939
-0.237959	improving performance. Stefan Goedecker	-0.124939
-0.102884	strategies........................................................................................ 122 13.2 Model-specific	-0.124939
-0.102884	source files. 13.2 Model-specific	-0.124939
-0.237959	identification (RTTI) /GR– -fno-rtti	-0.124939
-0.237959	/GR– -fno-rtti /GR- -fno-rtti	-0.124939
-0.463031	don't want this initialization,	-0.124939
-0.237959	has three clauses: initialization,	-0.124939
-0.358476	> largest_abs) { largest_abs	-0.124939
-0.237959	unsigned int absvalue, largest_abs	-0.124939
-0.345265	and use alternative implementations.	-0.124939
-0.439064	the best Java implementations.	-0.124939
-0.037171	1, 2, 6, 24,	-0.425969
-0.463593	program performance and studying	-0.124939
-0.463536	spots, but for studying	-0.124939
-0.591012	cache (see page 87).	-0.124939
-0.511992	cache (see p. 87).	-0.124939
-0.457450	may cause cache contentions.	-0.124939
-0.354058	// No cache contentions.	-0.124939
-0.724506	int N> class SafeArray	-0.124939
-0.294275	// Example 7.15b SafeArray	-0.124939
-0.589682	7.43 on page 58	-0.124939
-0.759490	to do so. 58	-0.124939
-0.237972	Virtualization is becoming increasingly	-0.124939
-0.343773	processors are becoming increasingly	-0.124939
-0.358625	the measurements as accurate	-0.124939
-0.325437	this is sufficiently accurate	-0.124939
-0.358398	to invest more efforts	-0.124939
-0.554398	focus the optimization efforts	-0.124939
-0.857209	time the program starts.	-0.124939
-1.012058	before the program starts.	-0.124939
-0.354920	Now ebx contains i/2+r.	-0.124939
-0.429573	register for computing i/2+r.	-0.124939
-0.599653	clear to the reader	-0.124939
-0.599510	assumed that the reader	-0.124939
-0.358643	a manual on usability,	-0.124939
-0.348397	between development time, usability,	-0.124939
-0.325437	should be true. template<>	-0.124939
-0.294275	ends the recursion template<>	-0.124939
-0.541295	also includes the low-level	-0.124939
-0.573305	giving access to low-level	-0.124939
-1.991172	instruction set is available:	-0.124939
-0.504622	if SSE2 is available:	-0.124939
-0.314795	floating point overflow: _controlfp_s(&dummy,	-0.124939
-0.237959	point status: _fpreset(); _controlfp_s(&dummy,	-0.124939
-0.344098	by 4 ; mangled	-0.124939
-0.344098	esp ;alignby4 ; mangled	-0.124939
-0.102884	== 2 12.6 Transforming	-0.124939
-0.102884	............................................................................................. 113 12.6 Transforming	-0.124939
-0.358351	critical stride will contend	-0.124939
-0.560079	all dynamic libraries contend	-0.124939
-0.294291	has a garbage collector	-0.124939
-0.294291	very time-consuming garbage collector	-0.124939
-0.282429	}; if ((unsigned int)n	-0.124939
-0.282429	479001600}; if ((unsigned int)n	-0.124939
-0.294275	they are created. Far	-0.124939
-0.237959	also be huge). Far	-0.124939
-0.065764	// Table of factorials:	-0.124939
-1.012144	Virtual member functions ........................................................................................	-0.124939
-0.989485	Using intrinsic functions ........................................................................................	-0.124939
-0.358728	loop control it compares	-0.124939
-0.358329	< 100. It compares	-0.124939
-0.299791	the example below shows.	-0.124939
-0.299791	example 7.15b below shows.	-0.124939
-0.658691	{ DoThisThreeTimesAWeek(); } By	-0.124939
-0.349148	and Mac platforms By	-0.124939
-0.357270	is known with certainty	-0.124939
-0.357270	to predict with certainty	-0.124939
-0.237959	with the rightmost 1-bit	-0.124939
-0.237959	// Remove right-most 1-bit	-0.124939
-0.582917	clock; } // Or	-0.124939
-0.339500	same few parameters. Or	-0.124939
-0.325431	itself when running. Programs	-0.124939
-0.325431	under worst-case conditions. Programs	-0.124939
-0.356275	a single & operation,	-0.124939
-0.516522	with a shift operation,	-0.124939
-0.107140	the file is closed.	-0.425969
-0.339503	class library. Open source.	-0.124939
-0.314795	free and open source.	-0.124939
-0.197170	unsigned int sign :1;//signbit	-0.425969
-0.595845	storage should be avoided,	-0.124939
-0.588321	linking cannot be avoided,	-0.124939
-1.199709	described in the book	-0.124939
-0.237959	SIAM 2001. Advanced book	-0.124939
-0.726808	parameter transfer is avoided.	-0.124939
-0.902775	should definitely be avoided.	-0.124939
-0.358756	MOVNTQ _mm_empty(); // EMMS	-0.124939
-0.540799	followed by an EMMS	-0.124939
-0.595241	work to do immediately	-0.124939
-0.463743	must be placed immediately	-0.124939
-0.132380	{ memset(a, 0, sizeof(a));	-0.124939
-0.132380	zero memset(a, 0, sizeof(a));	-0.124939
-0.803748	registers (see page 105).	-0.124939
-0.549555	instructions (see page 105).	-0.124939
-0.538852	because the register usage	-0.124939
-0.237959	the chapter "Register usage	-0.124939
-1.173492	may have to fix	-0.124939
-0.526793	will try to fix	-0.124939
-0.065083	TransposeCopy(double a[SIZE][SIZE], double b[SIZE][SIZE])	-0.425969
-0.358893	a debugger and press	-0.124939
-0.429573	like a key press	-0.124939
-1.177780	tasks such as sorting	-0.124939
-0.357914	such as most sorting	-0.124939
-0.151378	called shared objects (*.dll,	-0.425969
-0.352831	0.44 0.40 n.a. 1.00	-0.124939
-0.352831	0.25 0.24 n.a. 1.00	-0.124939
-0.314795	fraction 2 23 ,	-0.124939
-0.294275	fraction 2 52 ,	-0.124939
-0.358398	is used more efficiently.	-0.124939
-0.502352	although slightly less efficiently.	-0.124939
-0.596609	memory. If the word	-0.124939
-0.599866	example, in a word	-0.124939
-0.352407	public B1, public B2	-0.124939
-0.580618	B1 { public: B2	-0.124939
-0.314795	10 Multithreading.............................................................................................................. 101 10.1	-0.124939
-0.237959	4, 2007 (www.intel.com/technology/itj/). 10.1	-0.124939
-0.666249	sum += a[i]; Converting	-0.124939
-0.382907	otherwise go undetected. Converting	-0.124939
-0.597259	columns to a matrix.	-0.124939
-0.773576	a 512 512 matrix.	-0.124939
-0.463403	A; x.b = B;	-0.124939
-0.657558	= A + B;	-0.124939
-0.354795	are not doing divisions.	-0.124939
-0.343763	on completely independent divisions.	-0.124939
-0.537091	Avoid long dependency chains,	-0.124939
-0.517515	two loop-carried dependency chains,	-0.124939
-1.148092	The use of coprocessors	-0.124939
-1.174471	be used as coprocessors	-0.124939
-0.354380	x, y; ... x.a	-0.124939
-0.607477	A, B, C; x.a	-0.124939
-0.897266	exactly the same effect.	-0.124939
-0.568950	apparently has no effect.	-0.124939
-0.725996	response times to keyboard	-0.124939
-0.358592	respond quickly to keyboard	-0.124939
-0.023527	40320, 362880, 3628800, 39916800,	-0.425969
-0.581406	_mm_cvtss_f32(s); } // Approximate	-0.124939
-0.357733	*= n+1; // Approximate	-0.124939
-0.065474	100; x++) { Table[x]	-0.425969
-0.308473	remove the const restriction	-0.124939
-0.308473	relieving the const restriction	-0.124939
-0.294275	x.b = B; x.c	-0.124939
-0.237959	y.b + 2.; x.c	-0.124939
-1.056186	result will be misleading	-0.124939
-0.352408	They sometimes give misleading	-0.124939
-0.358273	for aligning data #ifdef	-0.124939
-0.237959	// Example 8.22 #ifdef	-0.124939
-0.642913	3.3 Program installation ..................................................................................................	-0.124939
-0.741677	14.2 Bounds checking ..................................................................................................	-0.124939
-0.237959	// Example 12.8a. Sum	-0.124939
-0.237959	// Example 12.8b. Sum	-0.124939
-0.516578	y.a + 1.; x.b	-0.124939
-0.294275	x.a = A; x.b	-0.124939
-0.891320	AMD and VIA CPUs".	-0.124939
-0.355244	mov ebx,eax / shr	-0.124939
-0.347521	mov $B1$2: mov shr	-0.124939
-0.811338	of 64 bits each.	-0.124939
-0.563879	of 8 bytes each.	-0.124939
-0.331940	----x---x a/1=a xxxxxxxxx 0/a=0	-0.124939
-0.237959	--------x a/1=a x-xxx-x-- 0/a=0	-0.124939
-1.322403	be avoided by replacing	-0.124939
-0.357388	this fact by replacing	-0.124939
-0.596833	solution a = 1.0f	-0.124939
-0.447876	== 0) ? 1.0f	-0.124939
-1.038131	scope of this manual,	-0.124939
-0.575923	cycle? In this manual,	-0.124939
-0.504969	be fragmented and involve	-0.124939
-1.045008	because it may involve	-0.124939
-0.828658	#pragma vector always Optimize	-0.124939
-0.783380	Intel compiler Linux Optimize	-0.124939
-0.908765	i++) sum += list[i];	-0.124939
-0.442087	{ sum1 += list[i];	-0.124939
-0.358756	0.f, 1.f); // initialize	-0.124939
-0.314803	x^n // sum, initialize	-0.124939
-0.331940	n 0 n! 117	-0.124939
-0.237959	code for vectorization............................................................. 117	-0.124939
-0.358835	web browsing that previously	-0.124939
-0.237959	it was assigned previously	-0.124939
-0.460972	Vector class libraries 113	-0.124939
-0.325431	vector classes ............................................................................................. 113	-0.124939
-0.304426	precision division, square root	-0.124939
-0.304426	inefficient. Division, square root	-0.124939
-0.358893	linear algebra and statistics,	-0.124939
-1.009078	many functions for statistics,	-0.124939
-0.352972	be compiled three times,	-0.124939
-1.054117	unacceptably long response times,	-0.124939
-0.659964	to overcome the obstacle	-0.124939
-0.358559	is often an obstacle	-0.124939
-0.336346	copyrighted by Agner Fog.	-0.124939
-0.255942	platforms By Agner Fog.	-0.124939
-0.325437	library exp exp 12.8	-0.124939
-0.314795	for vectors........................................................................ 119 12.8	-0.124939
-0.593735	available from the IDE	-0.124939
-0.358559	builder Has an IDE	-0.124939
-0.347521	with vector access. 12.9	-0.124939
-0.339503	allocated memory................................................................. 120 12.9	-0.124939
-0.855423	efficient way of removing	-0.124939
-0.566755	measured simply by removing	-0.124939
-0.347533	code. A test setup	-0.124939
-0.347533	a simple test setup	-0.124939
-0.237969	bit platform _WIN64 _LP64	-0.124939
-0.237969	_WIN64 _LP64 _WIN64 _LP64	-0.124939
-0.579523	be a simple type,	-0.124939
-0.355042	object of known type,	-0.124939
-0.358476	+= 16) { b.load(bb+i);	-0.124939
-1.580806	eight consecutive elements b.load(bb+i);	-0.124939
-0.596511	necessary because the factorials	-0.124939
-0.505903	store the reciprocal factorials	-0.124939
-2.006660	version of the subroutine	-0.124939
-1.429494	in a separate subroutine	-0.124939
-0.358893	version 2.6.30 and later.	-0.124939
-0.550680	set SSE2 or later.	-0.124939
-0.331940	vectorization ......................................................................................... 107 12.4	-0.124939
-0.331940	integer vector division. 12.4	-0.124939
-0.294275	functions ........................................................................................ 109 12.5	-0.124939
-0.237959	the next section. 12.5	-0.124939
-0.304431	to use algebraic manipulations	-0.124939
-0.304431	is because algebraic manipulations	-0.124939
-0.579565	risk of memory leaks	-0.124939
-0.355451	to prevent memory leaks	-0.124939
-0.354664	official C standard says	-0.124939
-0.237959	register usage convention says	-0.124939
-0.831013	INSTRSET == 2 12.6	-0.124939
-0.294275	classes ............................................................................................. 113 12.6	-0.124939
-0.549555	double (see page 140).	-0.124939
-0.549555	time-consuming (see page 140).	-0.124939
-0.294275	for vectorization............................................................. 117 12.7	-0.124939
-0.237959	is called. 118 12.7	-0.124939
-0.723003	1 fraction 2 52	-0.124939
-0.355151	the structure }; 52	-0.124939
-1.056010	doesn't have to obey	-0.124939
-0.584751	code has to obey	-0.124939
-0.985075	explained on page 72.	-0.124939
-0.143015	b+a a*b = b*a	-0.124939
-0.143015	b+a, a*b = b*a	-0.124939
-0.505036	STL containers is 95	-0.124939
-0.594082	or See page 95	-0.124939
-0.331944	right vector elements. 12.1	-0.124939
-0.294275	vector operations............................................................................................... 105 12.1	-0.124939
-0.898619	performance if the time-critical	-0.124939
-0.358899	on longjmp in time-critical	-0.124939
-0.331940	more distant future. 12.3	-0.124939
-0.331940	registers .......................................................... 107 12.3	-0.124939
-0.835884	to implement a universal	-0.124939
-0.452040	execution time. No universal	-0.124939
-0.597372	SSE3 instruction set Suppl.	-0.124939
-0.237959	emmintrin.h SSE3 pmmintrin.h Suppl.	-0.124939
-0.788063	the program will crash.	-0.124939
-0.758326	problems and system crash.	-0.124939
-0.237959	x---x---x x-xxx---- a*b*c=a*(b*c) a+b+c+d	-0.124939
-0.237959	b*a (a+b)+c=a+(b+c) a+b+c=c+b+a a+b+c+d	-0.124939
-0.463640	not expect to 99	-0.124939
-0.325431	cache control .............................................................................................. 99	-0.124939
-1.178978	x) { // Remove	-0.124939
-0.353864	jumps Eliminate branches Remove	-0.124939
-0.452048	frameworks, intermediate code, interpreters,	-0.124939
-0.314795	large graphics frameworks, interpreters,	-0.124939
-0.463593	3, 5 and 9.	-0.124939
-0.637356	vector element level 9.	-0.124939
-0.357450	copy constructor, if any,	-0.124939
-0.357450	the destructor, if any,	-0.124939
-0.462987	programs must use thread-safe	-0.124939
-0.550050	thread-safe functions. A thread-safe	-0.124939
-0.448165	F2(float x[]); void F3(bool	-0.124939
-0.346724	Example 9.2b void F3(bool	-0.124939
-0.504978	a file in exclusive	-0.124939
-0.726593	a container for exclusive	-0.124939
-0.951287	listed in table 9.2.	-0.124939
-0.352973	_mm_stream_si128 SSE2 Table 9.2.	-0.124939
-1.408611	0; i < NumberOfTests;	-0.425969
-0.358351	The counters will stay	-0.124939
-0.488082	does not necessarily stay	-0.124939
-0.555660	in the 64-bit extension	-0.124939
-0.349144	instructions. A further extension	-0.124939
-1.063803	it is the "best	-0.124939
-0.358893	"worst case" and "best	-0.124939
-0.357230	structures (without member functions)	-0.124939
-0.237959	(remove unreferen- ced functions)	-0.124939
-0.505036	the iteration is repeated	-0.124939
-0.463500	will then be repeated	-0.124939
-0.803202	static const int FactorialTable[13]	-0.124939
-0.549250	factorials: const int FactorialTable[13]	-0.124939
-0.648039	| Wednesday | Friday)	-0.124939
-0.640526	|| Day == Friday)	-0.124939
-0.347532	call polymorphic child function:	-0.124939
-0.421446	use the lrint function:	-0.124939
-2.011637	Example: // Example 8.21	-0.124939
-0.653100	the loop. Example 8.21	-0.124939
-0.764079	supported CriticalFunction = &CriticalFunction_AVX;	-0.124939
-0.461598	AVX supported return &CriticalFunction_AVX;	-0.124939
-0.511853	as vector operations. 105	-0.124939
-0.237959	Using vector operations............................................................................................... 105	-0.124939
-0.598969	makes the compiler interpret	-0.124939
-0.572398	string and then interpret	-0.124939
-0.927443	0 or 1. Writing	-0.124939
-0.325431	in C++ programs. Writing	-0.124939
-0.102884	for the FDIV bug	-0.124939
-0.102884	bug". The FDIV bug	-0.124939
-0.129296	(b, c); // Compare	-0.425969
-0.023527	#define swapd(x,y) {temp=x; x=y;	-0.425969
-0.358730	of range"); or better,	-0.124939
-0.339500	not needed. Even better,	-0.124939
-1.797614	you want to flip	-0.124939
-0.504777	^= 0x80000000; // flip	-0.124939
-0.512799	structure of four float's	-0.124939
-0.439742	integers or four float's	-0.124939
-0.689836	can take several minutes	-0.124939
-0.342704	it took several minutes	-0.124939
-0.358341	* cc[i]); } 109	-0.124939
-0.382907	intrinsic functions ........................................................................................ 109	-0.124939
-0.860976	c; b = (a+1)	-0.124939
-0.580771	(a+1); c = (a+1)	-0.124939
-0.566208	needed for other reasons,	-0.124939
-0.356644	resources. For these reasons,	-0.124939
-0.339500	pre-calculated table. Even better:	-0.124939
-0.237959	no specific option) better:	-0.124939
-0.582913	advantages of each method,	-0.124939
-0.540967	is the simplest method,	-0.124939
-0.585535	but the variable whose	-0.124939
-0.348405	critical stride. Variables whose	-0.124939
-0.408020	with new and delete,	-0.124939
-0.865369	shared object is accessed,	-0.124939
-0.504622	array element is accessed,	-0.124939
-0.143015	- (a&b)|(a&c) = a&(b|c)	-0.124939
-0.143015	x-xxxxx-- (a&b)|(a&c) = a&(b|c)	-0.124939
-1.352732	are useful for assigning	-0.124939
-0.575362	does this by assigning	-0.124939
-0.578388	which is only 10%	-0.124939
-0.489133	b is true 10%	-0.124939
-0.237959	group books 1994. Mostly	-0.124939
-0.237959	Addison- Wesley 1997. Mostly	-0.124939
-0.465797	provided in manual 4:	-0.124939
-0.465797	listed in manual 4:	-0.124939
-0.340544	= temp / 4;	-0.124939
-0.340544	= (a+1) / 4;	-0.124939
-0.455360	that some microprocessors have.	-0.124939
-0.336334	than end users have.	-0.124939
-0.577573	a way of relieving	-0.124939
-1.592435	is used for relieving	-0.124939
-0.902657	int rows = 10,	-0.124939
-0.341896	long double 8, 10,	-0.124939
-0.336338	ret ALIGN ?Func@@YAXQAHAAH@Z ENDP	-0.124939
-0.237959	recently 4 ?Func2@@YAXQAHAAH@Z ENDP	-0.124939
-0.358756	int seconds; // incremented	-0.124939
-0.580337	seconds has been incremented	-0.124939
-0.640523	that it calls. 48	-0.124939
-0.237959	7.14 Functions ................................................................................................................ 48	-0.124939
-0.349149	if (Day == Tuesday	-0.124939
-0.347523	Monday = 2, Tuesday	-0.124939
-0.571078	matrix[j][0] is calculated internally	-0.124939
-0.559274	constructor is implemented internally	-0.124939
-0.452779	in an unused fourth	-0.124939
-0.294275	the columns. Every fourth	-0.124939
-0.357508	books contain many tips	-0.124939
-0.525019	errors, and some tips	-0.124939
-0.463536	are higher for shared_ptr	-0.124939
-0.237959	another by assignment. shared_ptr	-0.124939
-0.807268	in Linux and BSD.	-0.124939
-0.704377	64-bit Linux and BSD.	-0.124939
-0.129458	be worth the effort.	-0.124939
-0.356229	operating systems available today.	-0.124939
-0.339503	used for Java today.	-0.124939
-0.349796	different meaning. 2. Put	-0.124939
-0.237959	algorithm in question: Put	-0.124939
-0.539279	AVX version int CriticalFunction_AVX(int	-0.124939
-0.356741	version 127 int CriticalFunction_AVX(int	-0.124939
-0.358199	inside sqaure: for (r2	-0.124939
-0.358199	handled separately: for (r2	-0.124939
-0.573938	improve this by writing:	-0.124939
-0.461679	alignment explicitly by writing:	-0.124939
-0.065144	class B1; class B2;	-0.124939
-1.288428	assume that the overall	-0.124939
-0.463512	and measuring the overall	-0.124939
-0.172705	the object. 7.17 Structures	-0.124939
-0.172705	.............................................................................................. 50 7.17 Structures	-0.124939
-0.550707	object files and executables.	-0.124939
-0.358215	link. Use different executables.	-0.124939
-0.828216	definition language is inherently	-0.124939
-0.595618	Algorithms that are inherently	-0.124939
-0.358929	programming questions to me.	-0.124939
-0.462842	and one from me.	-0.124939
-1.096608	to call a non-virtual	-0.124939
-1.191934	more resources than non-virtual	-0.124939
-1.027732	this method is safer.	-0.124939
-0.561152	casting, but also safer.	-0.124939
-1.858433	the value of b+c	-0.124939
-0.325437	and c first. b+c	-0.124939
-0.102884	platform n.a. __unix__ __linux__	-0.124939
-0.102884	__unix__ __linux__ __unix__ __linux__	-0.124939
-0.537503	systems: int 16 -32768	-0.124939
-0.463410	reductions: a+b = b+a	-0.124939
-1.297022	calculation of the factorials,	-0.124939
-1.366623	a matter of convenience	-0.124939
-1.033807	be less than 231.	-0.124939
-1.601186	x) { return pow(x,10);	-0.124939
-0.357271	Even big software companies	-0.124939
-0.358355	64-bit systems will dominate	-0.124939
-0.355776	legacy code, specific preferences	-0.124939
-0.452064	__restrict or #pragma optimize("a",on).	-0.124939
-0.349808	Optimize function #pragma optimize(...)	-0.124939
-0.314805	x-xx----- x--x----- ---x----- x---x---x	-0.124939
-0.715066	( 1)sign 2exponent 16383	-0.124939
-0.893772	latest instruction set extensions.	-0.124939
-0.325444	takes 10 μs today,	-0.124939
-1.573249	to: // Example 14.5b	-0.124939
-0.599091	interval: // Example 14.5a	-0.124939
-0.336349	copy constructor specifying otherwise.	-0.124939
-0.237967	initialize sum for(inti=0;i<16;i+=4){ //Loopby4	-0.124939
-0.357530	/ (b1 * b2);	-0.124939
-0.237967	1./5040., 1./40320., 1./362880., 1./3628800.,	-0.124939
-0.589099	still have a niche	-0.124939
-0.357229	((unsigned int)i < 10)	-0.124939
-0.331950	ptr x; __asm fistp	-0.124939
-0.350375	this. (In Windows, SetThreadAffinityMask,	-0.124939
-1.096190	operating systems are common,	-0.124939
-0.576470	byte of data (low	-0.124939
-0.357272	is unfortunately very common.	-0.124939
-0.352091	that allows bigger segments	-0.124939
-0.237967	v. 5.5 Mac: Darwin8	-0.124939
-0.637369	vector element level 108	-0.124939
-0.294284	program is fast, compact,	-0.124939
-0.599712	interface. It is 102	-0.124939
-0.382918	into an anonymous namespace.	-0.124939
-0.586972	availability of an update,	-0.124939
-0.970869	of data. The similarity	-0.124939
-0.408223	operations on contemporary 106	-0.124939
-0.357430	defined outside any function)	-0.124939
-0.541261	it Use a "move	-0.124939
-0.294284	called "Gnu indirect function"	-0.124939
-0.237967	Tuesday, Wednesday, Thursday, Friday,	-0.124939
-0.294284	short int (16 bits),	-0.124939
-0.331950	floating point division. Correction	-0.124939
-0.358856	copying. Security. The vulnerability	-0.124939
-0.897557	unusual for the reinstallation	-0.124939
-0.566762	done simply by ignoring	-0.124939
-0.552617	add the four sums	-0.124939
-0.460274	// (N & N-1)==0	-0.124939
-0.237967	1/n! 1., 1./2., 1./6.,	-0.124939
-0.349803	a, float b) {x	-0.124939
-0.483947	current array element. Rather	-0.124939
-0.349805	compilers have inefficient code-based	-0.124939
-0.649004	assume that model N+1	-0.124939
-0.598553	32 and the 512-bit	-0.124939
-0.897173	available: // Example 7.6.	-0.124939
-0.358736	that violate or circumvent	-0.124939
-0.294284	a/1=a xxxxxxxxx 0/a=0 ---x---xx	-0.124939
-0.529025	about in my blog.	-0.124939
-0.237967	Performance". www.open- std.org/jtc1/sc22/wg21/docs/TR18015.pdf. OpenMP.	-0.124939
-0.351325	with non-Intel CPUs. Includes	-0.124939
-0.343769	8 edx, eax $B2$2	-0.124939
-0.294284	may have undesired effects.	-0.124939
-0.237967	Warren, Jr.: "Hacker's Delight".	-0.124939
-0.356485	will make 32 AND-operations	-0.124939
-0.504980	specific optimizations in precompiled	-0.124939
-0.527304	values because a typo	-0.124939
-0.591025	class (see page 51).	-0.124939
-0.237967	C++ Compiler Documentation". Included	-0.124939
-0.439073	and restarted anyway. Updates	-0.124939
-0.237967	The characters '?', '@'	-0.124939
-0.866082	Class member functions (methods)	-0.124939
-0.589406	is a loop count.	-0.124939
-0.358591	/FA -S - masm=intel	-0.124939
-0.355991	said, I must warn	-0.124939
-0.237967	reproducible time measurements: warm	-0.124939
-0.504756	compiler has not noticed	-0.124939
-0.996117	of different C++ constructs........................................................................	-0.124939
-2.060313	n.a. n.a. - a<<b<<c	-0.124939
-0.658610	known as memory leak.	-0.124939
-0.345269	or a similar utility	-0.124939
-0.504977	extremely complicated and clumsy,	-0.124939
-0.358856	the table. The 16-byte	-0.124939
-0.549468	array to all zeroes.	-0.124939
-0.525028	section for some caveats.	-0.124939
-0.341903	at the label $B1$2:.	-0.124939
-0.358898	as spell-checking and repagination	-0.124939
-0.575633	; point to a[i+2]	-0.124939
-0.598972	sure the compiler recognizes	-0.124939
-0.598749	keyword is not recognized	-0.124939
-0.237967	call to _endthread() cleans	-0.124939
-0.715076	the & operator (bitwise	-0.124939
-0.294284	sake of cross-platform portability.	-0.124939
-1.513519	can be calculated independently.	-0.124939
-0.599091	9.5b. // Example 9.5b	-0.124939
-0.580732	the time to answer	-0.124939
-0.797703	in the STL (Standard	-0.124939
-0.599091	used: // Example 13.2.	-0.124939
-0.836130	DWORD PTR [edx] adds,	-0.124939
-0.462992	code or use objconv	-0.124939
-1.109993	for a specific purpose:	-0.124939
-0.339512	there are serious limitations	-0.124939
-0.358749	double x10 = x8*x2;	-0.124939
-0.358457	to overcome this limitation).	-0.124939
-0.325441	to x 43 speculatively	-0.124939
-0.900884	code in the disassembly	-0.124939
-0.358019	under CPU cache (en.wikipedia.org/wiki/L2_cache).	-0.124939
-0.462433	necessary when no attempt	-0.124939
-0.572401	a+b=0, and then 0+1.23456	-0.124939
-0.294284	after) - (time before)	-0.124939
-0.584096	index of memory blocks.	-0.124939
-0.800274	the header file timingtest.h	-0.124939
-0.463643	The dispatching to C1::Disp()	-0.124939
-0.358145	a special loop predictor.	-0.124939
-0.358749	is 8*1024/64 = 128.	-0.124939
-1.105585	are not always accurate,	-0.124939
-0.325441	long long, double. Misaligned	-0.124939
-0.598553	www.agner.org/optimize and the FAQ	-0.124939
-0.357079	has been called before.	-0.124939
-2.122408	part of the Xnu	-0.124939
-0.314805	anything else being initialized.	-0.124939
-1.407788	able to do so).	-0.124939
-0.928324	in the compiler. Remember,	-0.124939
-0.237967	point -ffast-math /fp:fast /fp:fast=2	-0.124939
-0.355538	linking (e.g. option /MT).	-0.124939
-0.237967	math library (VML, MKL).	-0.124939
-0.820879	D : public B1	-0.124939
-1.288018	disadvantage is that CParent::Hello()	-0.124939
-0.590056	compilers that a user-defined	-0.124939
-0.349809	for finding hot spots,	-0.124939
-1.636528	the calculation of B.	-0.124939
-0.527177	useful methods for exploiting	-0.124939
-0.505027	% (number of sets).	-0.124939
-0.572248	diagonal have been lost	-0.124939
-0.858208	i++) a[i] = i+1;	-0.124939
-0.505040	a thread is terminated.	-0.124939
-0.339512	considerably. Another serious burden	-0.124939
-0.382918	(a&&b&&c) = a&&(b||c) (a&&!b)	-0.124939
-1.189594	how to use SafeArray:	-0.124939
-0.458838	16 char 128 Is8vec16	-0.124939
-0.863254	has a particular weakness	-0.124939
-0.357607	security. Standard C++ imple-	-0.124939
-0.325441	and image processing. Yeppp.	-0.124939
-0.348402	performance problems. Avoid nested	-0.124939
-0.568421	double const & source)	-0.124939
-0.314805	Coriolis group books 1994.	-0.124939
-0.358255	the operands has side	-0.124939
-0.867628	unless you have ample	-0.124939
-0.553776	be 64 bits (MMX),	-0.124939
-1.214577	is less than 1/50	-0.124939
-0.314805	linking (multithreaded) /arch:AVX /openmp	-0.124939
-0.525670	is because we forgot	-0.124939
-0.456084	increment. The three clauses	-0.124939
-0.237967	affinity mask. Poor reproducibility.	-0.124939
-0.981125	than x = -abs(x);.	-0.124939
-0.358591	x ----- - x-xxx	-0.124939
-0.294284	= ReadTSC(); CriticalFunction(); timediff[i]	-0.124939
-0.237967	to be signed. Be	-0.124939
-0.349157	// initialize sum for(inti=0;i<16;i+=4){	-0.124939
-0.576355	issuing an error message.	-0.124939
-0.357082	particular subtask before coordination	-0.124939
-0.916019	in a more distant	-0.124939
-0.658967	enum Weekdays { Sunday	-0.124939
-0.827104	unused label ; restore	-0.124939
-0.237967	memcpy(b, a, sizeof(b)); 47	-0.124939
-0.294284	Effective C++". Addison-Wesley, 1996.	-0.124939
-0.562285	always available from www.agner.org/optimize.	-0.124939
-0.237967	row < NUMROWS; row++)	-0.124939
-0.595737	loop is to resume	-0.124939
-0.237967	x---- ----- ~(~a)=a x-xxxxx--	-0.124939
-1.307705	the same memory areas.	-0.124939
-0.237967	syntax: __asm fld qword	-0.124939
-0.527040	each test // Repeat	-0.124939
-0.575426	No exception handling /EHs-	-0.124939
-0.599869	b in a union:	-0.124939
-0.357914	the SelectAddMul example (12.4e)	-0.124939
-0.355701	a low-priority thread steals	-0.124939
-0.237967	loop of ADC (add	-0.124939
-0.866159	an object is moved,	-0.124939
-0.598272	sequence to be moved.	-0.124939
-0.725317	use different memory areas,	-0.124939
-0.294284	2 52 , longdoublevalue	-0.124939
-0.356879	in a rather unconventional	-0.124939
-0.358544	can modify x *const_cast<int*>(&x)	-0.124939
-0.572900	using position-independent code (option	-0.124939
-0.294284	a*0=0 --xxxx-xx a*1=a x-xxxxx-x	-0.124939
-0.237967	new or malloc. Handles	-0.124939
-0.575622	all kinds of strange	-0.124939
-0.514738	point to become invalid,	-0.124939
-0.294284	without cache MOVNTDQ _mm_stream_si128	-0.124939
-0.237967	an option (Windows: /Gy,	-0.124939
-0.325441	leave them enabled (there	-0.124939
-1.460104	functions such as sqrt	-0.124939
-0.358749	d.x; a.y = b.y	-0.124939
-0.717772	do not support SSE.	-0.124939
-0.446335	setting the fraction bits:	-0.124939
-0.356339	int 64 0 264-1	-0.124939
-0.358638	priority than code generality.	-0.124939
-0.596838	double a = sin(0.8);	-0.124939
-0.358898	address esp+8 and esp+12	-0.124939
-0.348402	convenient for adding bounds-checking	-0.124939
-0.358838	one, auto_ptr that owns	-0.124939
-0.357028	i >= 0; i--,	-0.124939
-0.237967	compiler, or vice versa.	-0.124939
-0.598331	by the function body.	-0.124939
-0.358736	string, wstring or CString	-0.124939
-0.421455	168.5 513 513 58.7	-0.124939
-0.358760	// Serialize // Prevent	-0.124939
-0.451234	expensive. A limited "express"	-0.124939
-0.294284	bloat and complexity (en.wikipedia.org/wiki/Standard_Template_Library).	-0.124939
-0.835875	sign bit to zero:	-0.124939
-0.498979	7.27 float x; *(int*)&x	-0.124939
-0.237967	position-independent code (option -fno-pic).	-0.124939
-0.358066	For example, one tread	-0.124939
-0.357052	bytes = 4 rows.	-0.124939
-0.358943	memory bus is saturated.	-0.124939
-0.527365	lines follow the rows,	-0.124939
-0.459727	single result. An uncaught	-0.124939
-0.593493	than by a macro,	-0.124939
-0.237967	next year. Ignoring virtualization.	-0.124939
-0.358932	are advised to seek	-0.124939
-0.897388	instead of a macro.	-0.124939
-0.550641	way microprocessors are constructed.	-0.124939
-0.357649	for all static data,	-0.124939
-0.523754	2; } void FuncB	-0.124939
-0.504893	any option that limits	-0.124939
-0.421451	x^3, x^4 F32vec4 xx4(x4);	-0.124939
-0.599584	Time is a precious	-0.124939
-0.237967	for AMD Family 15h	-0.124939
-0.357271	of irrelevant software installed,	-0.124939
-0.598272	files to be installed.	-0.124939
-0.562503	converting to floating point:	-0.124939
-0.358852	v. 11.1 for IA-32/Intel64,	-0.124939
-0.294284	char 16 SSSE3 _mm_perm_epi8	-0.124939
-0.358591	((unsigned int)(i - min)	-0.124939
-0.597245	combined with the LLVM	-0.124939
-2.159700	Example: // Example 7.40a	-0.124939
-0.599091	way: // Example 7.40b	-0.124939
-0.897173	needed: // Example 7.40c	-0.124939
-0.314805	code must compute (FuncRow(i)*columns	-0.124939
-1.097320	Mac OS X (Darwin)	-0.124939
-0.578989	As we can see,	-0.124939
-0.358980	nowadays stress the importance	-0.124939
-1.239694	is not always comparable	-0.124939
-0.526902	are implemented with interpretation.	-0.124939
-0.463410	as xn = x∙xn-1,	-0.124939
-0.336344	this only happens rarely.	-0.124939
-0.349808	not aliased #pragma optimize("a",	-0.124939
-0.358093	of rounding, but neither	-0.124939
-0.950837	able to predict correctly	-0.124939
-0.505027	= (number of sets)	-0.124939
-0.841821	Choice of function libraries........................................................................................	-0.124939
-0.463643	that comes to mind.	-0.124939
-0.556893	about supported instruction sets,	-0.124939
-0.358856	calling conventions. The dot	-0.124939
-0.357779	= (a1*b2 + a2*b1)	-0.124939
-0.570575	they are often mispredicted.	-0.124939
-0.585543	in the variable Day.	-0.124939
-1.669167	likely to be mispredicted,	-0.124939
-0.764752	12.3 Automatic vectorization Good	-0.124939
-0.294284	test // (time after)	-0.124939
-0.832111	static const float OneOrTwo5[2]	-0.124939
-0.341900	variables are temporary intermediates,	-0.124939
-0.550805	before p is incremented.	-0.124939
-0.580353	p has been incremented,	-0.124939
-0.237967	how this works, here's	-0.124939
-0.573309	RAM size is insufficient.	-0.124939
-0.358736	while he or she	-0.124939
-0.871452	time or a not-too-big	-0.124939
-0.572221	because of cache evictions	-0.124939
-0.579817	to the sign bit,	-0.124939
-0.556178	name Instruction set Prefetch	-0.124939
-0.462668	in each CPU core).	-0.124939
-0.314809	__restrict __restrict __declspec( noalias)	-0.124939
-0.463542	9.1. Time for transposition	-0.124939
-0.357942	will invalidate each other's	-0.124939
-0.490902	guidelines are provided below,	-0.124939
-0.294284	without effectively preventing illegitimate	-0.124939
-0.451236	user and prevent legitimate	-0.124939
-0.331950	and header files. 121	-0.124939
-0.489149	in the logical architecture	-0.124939
-0.357514	can hold many renamed	-0.124939
-0.599712	move. It is unacceptable	-0.124939
-0.586794	integers and other hardware-related	-0.124939
-1.222903	first byte at 12,	-0.124939
-0.294284	vector operations (chapter 12)	-0.124939
-0.358898	Edition, 2005; and "More	-0.124939
-0.237967	processing. Scott Meyers: "Effective	-0.124939
-0.358760	= &SelectAddMul_dispatch; // Dispatcher	-0.124939
-0.357058	I die. See www.gnu.org/copyleft/fdl.html.	-0.124939
-1.089976	For example, the Boost	-0.124939
-0.294284	ebx,eax / shr ebx,31	-0.124939
-0.336344	programs where security matters.	-0.124939
-0.353233	// Or #include <ia32intrin.h>	-0.124939
-0.358932	disk operations to finish.	-0.124939
-0.526540	common programs use inappropriate	-0.124939
-0.356338	(n) { case 0:	-0.124939
-0.358898	"Intel 64 and IA-32	-0.124939
-0.358666	be mixed with x87	-0.124939
-0.294284	reductions: a+b=b+a a*b=b*a a+b+c=a+(b+c)	-0.124939
-0.581823	5.0f; b = 6.0f;	-0.124939
-0.325441	it inside {} brackets.	-0.124939
-0.463643	pointer set to NULL.	-0.124939
-0.463258	rather than as b*(2.0/3.0)	-0.124939
-0.358736	libraries (.lib or .a),	-0.124939
-0.898620	it to the tolerance	-0.124939
-0.336346	function libraries Test Processor	-0.124939
-0.353466	the disk cache. Files	-0.124939
-0.358265	time, usability, program compactness,	-0.124939
-0.463335	are implemented by (partial)	-0.124939
-0.237967	manipulation tricks Michael Abrash:	-0.124939
-0.463005	sum2 from time T+1	-0.124939
-0.354672	? 1 : 0]	-0.124939
-0.237967	C++". Addison-Wesley. Third Edition,	-0.124939
-0.592957	Friday) in example 14.7b	-0.124939
-0.683933	graphical user interface. Otherwise	-0.124939
-0.723826	the first two suggested	-0.124939
-2.159700	Example: // Example 14.3a	-0.124939
-0.897173	table: // Example 14.3b	-0.124939
-0.938381	dynamic link library (DLL)	-0.124939
-0.444434	5 #define FUNCNAME SelectAddMul_SSE41	-0.124939
-0.525537	a + 2 thenaandbcannot	-0.124939
-0.237967	4.1.0, 2006 (Red Hat).	-0.124939
-0.237967	32 -231 231-1 int32_t	-0.124939
-0.504913	recovering or for issuing	-0.124939
-0.659895	data member is unchanged	-0.124939
-0.552909	of the stack. Deallocation	-0.124939
-0.454947	at their own initiative	-0.124939
-0.504792	criticized for code bloat	-0.124939
-0.343768	numbers is inefficient. Division,	-0.124939
-0.331953	Linux syntax 90 Gives	-0.124939
-1.395201	the same as C-	-0.124939
-0.237967	Example 7.29b floata; boolb=0;	-0.124939
-0.358898	of Java and C#	-0.124939
-0.341902	(1) or false (0);	-0.124939
-0.358563	never exceeds an acceptable	-0.124939
-0.358898	alternately FuncA and FuncB,	-0.124939
-0.294284	the rightmost 1-bit removed.	-0.124939
-0.725444	and this will trigger	-0.124939
-0.586686	language and a basic	-0.124939
-0.237967	j < columns; j++)	-0.124939
-0.658590	multiple values at once...................................	-0.124939
-0.453987	as for switch statements,	-0.124939
-0.600159	one it is compiling.	-0.124939
-0.505027	typical sources of frustration	-0.124939
-0.526491	memory takes only 2-3	-0.124939
-0.358816	and map are prone	-0.124939
-0.408229	22 3.14 Context switches.....................................................................................................	-0.124939
-0.515475	want to go deeper	-0.124939
-0.352711	equivalent if(!(a || b))	-0.124939
-0.353872	has one operator less.	-0.124939
-0.599200	notion of a "function".	-0.124939
-0.314805	of simultaneous lookups Max.	-0.124939
-0.356283	} T & operator[]	-0.124939
-1.994385	instruction set is enabled:	-0.124939
-0.876443	that the object owns.	-0.124939
-1.513361	if there are wrapper	-0.124939
-0.421455	not __INTEL_COMPILER __INTEL_COMPILER 161	-0.124939
-0.294284	_WIN64 _M_X64 _M_X64 162	-0.124939
-0.659561	periodic pattern can be,	-0.124939
-0.345272	(e.g. in linear algebra)	-0.124939
-0.527295	it used to be.	-0.124939
-0.358760	doesn't work // Re-do	-0.124939
-0.358934	fundamental laws of algebra.	-0.124939
-0.562669	making longer time slices.	-0.124939
-0.594828	loops, then the transformation	-0.124939
-0.659527	// x^4 // x^8	-0.124939
-0.357502	2.20, glibc version 2.11	-0.124939
-0.325444	to test. disable power-save	-0.124939
-0.358162	in two other situations:	-0.124939
-0.237967	make a sensible balance	-0.124939
-0.382918	0.40 n.a. 1.00 0.35	-0.124939
-0.764587	CPU model is over.	-0.124939
-0.540664	operator, or an over-	-0.124939
-0.352981	&& b<c && a<c)	-0.124939
-0.865747	b2; y = a1/b1	-0.124939
-0.354672	: "m"(x) : "memory"	-0.124939
-0.237967	11.1 for IA-32/Intel64, 2009.	-0.124939
-0.584835	least in some situations,	-0.124939
-0.442082	have a false vendor	-0.124939
-0.237967	SafeArray <float, 100> list;	-0.124939
-0.582923	p->Hello(); } // Non-polymorphic	-0.124939
-0.575678	quite a good investment.	-0.124939
-0.356284	The 16-byte instructions MOVNTPS,	-0.124939
-0.858293	long double precision (80	-0.124939
-0.659129	= 100; int matrix[NUMROWS][NUMCOLUMNS];	-0.124939
-0.237967	of return prediction). 149	-0.124939
-0.851927	{ _mm_storeu_si128((__m128i *)d, x);}	-0.124939
-1.053981	The value of cc[i]+2	-0.124939
-0.356937	Is32vec2 32 64 Iu32vec2	-0.124939
-0.521735	unsigned int 128 Iu32vec4	-0.124939
-0.874722	NumberOfTests; i++) { time1	-0.124939
-0.798256	useful for making plug-ins	-0.124939
-0.353463	32 char 256 Vec32c	-0.124939
-0.358457	is outside this interval,	-0.124939
-1.733876	known at compile time?	-0.124939
-0.527024	8) SelectAddMul_pointer = &SelectAddMul_AVX2;	-0.124939
-0.336349	example, a Core i7	-0.124939
-0.542291	platform software development kit	-0.124939
-1.334980	of the array i)	-0.124939
-0.336344	parameters Vec4f polynomial (Vec4f	-0.124939
-0.598553	resources, and the transitions	-0.124939
-0.325441	table is cached. Usually	-0.124939
-0.237967	unsigned int u[2]} a[size];	-0.124939
-0.575707	9.2 Cache organization ...................................................................................................	-0.124939
-0.463643	nearest element to x?"	-0.124939
-0.354056	CPU has problems separating	-0.124939
-0.357652	a 32-bit number (the	-0.124939
-0.538326	f(); }; void g()	-0.124939
-0.358543	language programming, compiler technology,	-0.124939
-0.581234	and the array 800	-0.124939
-0.589424	CPUs cannot be tolerated.	-0.124939
-0.357313	can exceed 2 Gbytes.	-0.124939
-0.357530	Object2; CChild1 * p1;	-0.124939
-0.358560	parameters typedef int CriticalFunctionType(int	-0.124939
-0.237967	pointer is created, deleted,	-0.124939
-1.482527	= a - a/1	-0.124939
-0.314805	drawbacks of C++. Yet,	-0.124939
-0.356937	compiler: __int64 64 -263	-0.124939
-0.522123	value it was assigned	-0.124939
-0.331950	main principles here: functional	-0.124939
-0.358630	value written as 2eee	-0.124939
-1.855897	it is not human	-0.124939
-0.596074	intended as a plug-in	-0.124939
-0.382918	(SSE2): #include <xmmintrin.h> _mm_setcsr(_mm_getcsr()	-0.124939
-0.578781	difference less than 2-20,	-0.124939
-0.593769	D are not yet	-0.124939
-0.358898	Taylor expansions and Newton-Raphson	-0.124939
-0.596825	complaints should be regarded	-0.124939
-0.357942	to draw each pixel	-0.124939
-0.545753	setting a thread affinity	-0.124939
-0.615826	PUBLIC ?Func@@YAXQAHAAH@Z ?Func@@YAXQAHAAH@Z PROCNEAR	-0.124939
-0.357954	(See also page 119).	-0.124939
-1.366510	than on the stack).	-0.124939
-0.237967	A few decades ago,	-0.124939
-1.160733	Alternatively, you may reuse	-0.124939
-0.237967	get any answer. Beginners	-0.124939
-0.358760	= &SelectAddMul_SSE2; // Error:	-0.124939
-0.996075	in a memory pool,	-0.124939
-0.572248	code have been reordered,	-0.124939
-0.294284	2 23 , doublevalue	-0.124939
-0.237967	Intel Performance Primitives (IPP).	-0.124939
-0.358290	the project at hand.	-0.124939
-0.599869	typo in a hand-	-0.124939
-0.358898	operators (& and |)	-0.124939
-0.352983	efficiently by better standardization	-0.124939
-0.237967	and planned solutions. Patches	-0.124939
-0.358898	of structured and object-oriented	-0.124939
-0.357026	should not call WriteFile	-0.124939
-0.294284	through the symbolic link.	-0.124939
-0.822383	13.1 CPU dispatch strategies........................................................................................	-0.124939
-0.502735	set of performance monitoring	-0.124939
-0.325441	break; case 1: printf("Beta");	-0.124939
-1.546135	more efficient to re-use	-0.124939
-0.358943	non-inlined copy is dead	-0.124939
-0.570382	micro-op cache. The Core2	-0.124939
-0.867497	has a different meaning.	-0.124939
-0.863254	has a particular meaning,	-0.124939
-0.331950	2014. Last updated 2014-08-07.	-0.124939
-0.463258	calculated internally as (int)&matrix[0][0]	-0.124939
-0.595641	i) { // Safe	-0.124939
-0.358093	not i but i*12,	-0.124939
-0.921953	using the keyword __thread	-0.124939
-0.796126	branch target buffer (BTB).	-0.124939
-0.522541	keyword has several meanings	-0.124939
-0.355259	out-of- order calculation capabilities.	-0.124939
-1.003148	in the next paragraph.	-0.124939
-0.463169	b;} }; int Sum2(S3	-0.124939
-0.237967	en.wikipedia.org/wiki/Compiler_optimization. ISO/IEC TR 18015,	-0.124939
-0.358221	different microprocessors, different alignments	-0.124939
-0.598658	Reset floating point status:	-0.124939
-0.347530	available vector classes. Including	-0.124939
-0.354674	interrupt, e.g. every millisecond.	-0.124939
-0.640536	double b;}; S1 list[100],	-0.124939
-0.358856	of microprocessor The benchmark	-0.124939
-0.237967	bit in nn ifbit=1	-0.124939
-1.197940	In this example, f(x)	-0.124939
-0.531629	calculated as follows: floatvalue	-0.124939
-0.408229	a class definition. Inlining	-0.124939
-0.345269	assume that seconds remains	-0.124939
-0.505092	performance under the worst-	-0.124939
-0.998079	a list of titles.	-0.124939
-1.070335	numbers: // Example 11.2a	-0.124939
-0.839993	type identification (RTTI) /GR–	-0.124939
-0.354386	{ public: ... ~C1();	-0.124939
-0.357779	vector operator + (vector	-0.124939
-0.505092	cycles. Obviously, the initial	-0.124939
-0.237967	C++ compilers www.agner.org/ optimize/#vectorclass	-0.124939
-0.357530	return powN<true,N/2>::p(x) * powN<true,N/2>::p(x);	-0.124939
-0.547889	sum2 are called accumulators.	-0.124939
-1.683872	explained on page 22.	-0.124939
-1.095721	are in fact addressed	-0.124939
-0.523754	... } void F0()	-0.124939
-0.500342	BSD, Intel-based Mac OS,	-0.124939
-0.314805	v.10.2 & earlier vmlsExp4	-0.124939
-0.726445	to compile with -mcmodel=large,	-0.124939
-0.358932	from -128 to +127.	-0.124939
-0.357862	and b double precision:	-0.124939
-0.357229	<= n < 223	-0.124939
-1.281123	a + 1; x[1]	-0.124939
-0.463600	Stefan Goedecker and Adolfy	-0.124939
-0.902665	int SIZE = 64;	-0.124939
-0.357271	worse, many software products	-0.124939
-0.356481	optimize/#vectorclass Include file dvec.h	-0.124939
-0.560087	not dynamic libraries (.dll	-0.124939
-0.355701	decoded in several stages	-0.124939
-0.597002	this function is InstructionSet().The	-0.124939
-0.885238	line size of 64.	-0.124939
-0.237967	disks and USB sticks	-0.124939
-0.832598	by multiple threads Parallelization	-0.124939
-0.355447	bytes. Each line covers	-0.124939
-0.356178	force when I die.	-0.124939
-0.589696	described on page 153.	-0.124939
-0.356339	int 16 0 65535	-0.124939
-0.294284	Generate optimization report /Qopt-report	-0.124939
-0.590056	happen that a low-priority	-0.124939
-0.237967	// Example 12.1b. Vectorization	-0.124939
-0.336344	true/false Loopunrolling x-xxxx--x Profile-guided	-0.124939
-0.527290	have constructors and destructors.	-0.124939
-0.355619	graphic brushes, etc. Locked	-0.124939
-0.352085	of efficiency, platform independence,	-0.124939
-0.654905	Static versus dynamic libraries............................................................................	-0.124939
-1.230617	the cost of fine-tuning,	-0.124939
-1.021116	only when it exits.	-0.124939
-0.891019	that the loop exits,	-0.124939
-0.353463	details about name mangling	-0.124939
-0.581018	by the Gnu utilities	-0.124939
-0.382918	124 13.3 Difficult cases........................................................................................................	-0.124939
-0.580353	problem has been alleviated	-0.124939
-0.462024	calculated with two decimals,	-0.124939
-0.355448	times per matrix cell	-0.124939
-0.463523	or operator that transfers	-0.124939
-0.358749	order(i); list[j].a = list[j].b	-0.124939
-0.538986	j = order(i); list[j].a	-0.124939
-0.599091	arrays: // Example 12.4a.	-0.124939
-0.357914	faster than example 12.4a,	-0.124939
-0.461223	shift operation. For example,a	-0.124939
-0.596825	guidelines should be obeyed.	-0.124939
-0.541049	all resources are sufficient,	-0.124939
-0.966643	of the final product.	-0.124939
-0.346502	100. pop ebx restores	-0.124939
-0.237967	v. 4.5.2, July 2011).	-0.124939
-0.294284	that are inherently serial,	-0.124939
-1.421313	have to be restored	-0.124939
-0.314809	the code. C#, managed	-0.124939
-0.450286	the YMM registers. Disadvantages	-0.124939
-0.336349	with the expected real-time	-0.124939
-0.237967	1./120., 1./720., 1./5040., 1./40320.,	-0.124939
-0.355780	a processing speed exceeding	-0.124939
-0.408223	---x---xx (-a==-b)=(a==b) ---xx---- (a+c==b+c)=(a==b)	-0.124939
-0.352982	/Qopt-report -opt-report Table 18.2.	-0.124939
-0.237967	e.g. a menu click	-0.124939
-0.581880	1.0E8, c = 1.23456,	-0.124939
-0.353675	software programs automatically download	-0.124939
-0.237967	-Ofast -mveclibabi -fopenmp /Qopenmp	-0.124939
-0.358601	or tiling. This technique	-0.124939
-0.541212	frequent allocation and de-allocation	-0.124939
-1.013492	is replaced by x<<3,	-0.124939
-0.581226	not the best optimizer.	-0.124939
-0.356113	page 137 about division).	-0.124939
-0.463652	bits represent a monotonically	-0.124939
-0.596159	array with a top-of-stack	-0.124939
-0.503246	platforms or multiple configurations	-0.124939
-0.358898	turn on and off.	-0.124939
-0.237967	Vec16s Vec16us Vec8i Vec8ui	-0.124939
-0.461824	Requires binutils version 2.20	-0.124939
-0.357774	1.25 1.61 n.a. 2.23	-0.124939
-0.237967	C++ Performance". www.open- std.org/jtc1/sc22/wg21/docs/TR18015.pdf.	-0.124939
-0.463643	more references to relocate,	-0.124939
-0.358816	answer. Beginners are advised	-0.124939
-0.463652	like pressing a button	-0.124939
-0.821293	into the right positions	-0.124939
-0.526593	flow. However, this did	-0.124939
-0.237967	int 128 Iu16vec8 Vec8us	-0.124939
-0.353233	other compilers. #include <excpt.h>	-0.124939
-0.971250	is only a minimal	-0.124939
-0.732806	Intel's Math Kernel Library,	-0.124939
-0.294284	are created. Far Systems	-0.124939
-0.358736	whole workday or more.	-0.124939
-0.524721	update or even telling	-0.124939
-0.358898	of strange and unexpected	-0.124939
-0.341900	to make profiling feasible.	-0.124939
-0.463410	double x4 = x2*x2;	-0.124939
-1.061387	latest version of Mathcad	-0.124939
-1.252094	with the option -mveclibabi=acml.	-0.124939
-0.356655	a very user friendly	-0.124939
-0.331950	of background processes running,	-0.124939
-0.584324	number of different targets	-0.124939
-0.459818	and then calls exit.	-0.124939
-0.349151	cost of task switching.	-0.124939
-0.586722	consider the following alternatives:	-0.124939
-0.763314	12 Using vector operations...............................................................................................	-0.124939
-0.881293	registers (see page 27).	-0.124939
-0.812488	Compiler optimization options ...................................................................................	-0.124939
-0.657570	= log(b[i]) + log(c[i]);	-0.124939
-0.358690	is copied by assignment,	-0.124939
-0.358690	to another by assignment.	-0.124939
-0.583214	very difficult to diagnose.	-0.124939
-0.599091	statement: // Example 8.9b	-0.124939
-0.764560	reduced 15.1a to 15.1c).	-0.124939
-0.325441	}; vector() {} vector(float	-0.124939
-0.883811	int b; int c;};	-0.124939
-2.159700	Example: // Example 8.9a	-0.124939
-0.901764	for different instruction sets...........................	-0.124939
-0.738902	for 32-bit Windows. Integrates	-0.124939
-0.356527	alignment problem void AddTwo(int	-0.124939
-0.237967	the function scanf. Violation	-0.124939
-0.988502	divide the work evenly	-0.124939
-0.580353	spot has been identified,	-0.124939
-0.527329	113 Number of simultaneous	-0.124939
-0.902826	integer operations for incrementing	-0.124939
-0.493088	measurements can become imprecise	-0.124939
-0.357855	Intel. See Intel Technology	-0.124939
-1.068799	char a = -100,	-0.124939
-0.548837	whether the call p->f()	-0.124939
-0.237967	clock; __cpuid(dummy, 0); DontSkip	-0.124939
-0.237967	of 1/n! 1., 1./2.,	-0.124939
-1.170822	replaced by a blend	-0.124939
-0.657570	= d + e	-0.124939
-0.463600	full generality and flexibility	-0.124939
-0.237967	version FuncType SelectAddMul, SelectAddMul_SSE2,	-0.124939
-0.357136	const * const Greek[4]	-0.124939
-0.325448	clock frequency (in Windows:	-0.124939
-0.596159	language with a wealth	-0.124939
-1.966986	if it is correlated	-0.124939
-0.460533	be carried out independently	-0.124939
-0.358760	<< endl; // Output	-0.124939
-0.444434	checking template <typename T,	-0.124939
-0.444434	template template <typename T>	-0.124939
-0.382918	each version FuncType SelectAddMul,	-0.124939
-0.325448	specification. The empty throw()specification	-0.124939
-0.355257	* 16is calculated asa	-0.124939
-0.314805	are containers 93 themselves.	-0.124939
-0.578781	not less than ARRAYSIZE.	-0.124939
-0.336349	because it defines electrical	-0.124939
-0.526144	x; const double A2	-0.124939
-0.354672	%0 " : "=m"(n)	-0.124939
-0.463600	S. Goedecker and A.	-0.124939
-0.294284	integrated development environment (IDE)	-0.124939
-0.294284	optimization. en.wikipedia.org/wiki/Compiler_optimization. ISO/IEC TR	-0.124939
-0.510595	different function libraries. Numbers	-0.124939
-0.355988	that use large amounts	-0.124939
-0.429586	Intel/MASM syntax: __asm fld	-0.124939
-0.442082	which is fast. Calculating	-0.124939
-0.237967	version of Mathcad (v.	-0.124939
-0.452063	draws a whole polygon	-0.124939
-0.314805	x---- x---- ----- ~(~a)=a	-0.124939
-0.537826	Simple member pointers /vms	-0.124939
-0.358526	the profiler may sample	-0.124939
-0.358307	answer questions from everybody.	-0.124939
-0.358730	I consider it unwise	-0.124939
-1.269442	compilers and operating systems").	-0.124939
-0.504893	The object that looses	-0.124939
-0.382918	p2 = &Object2; p2->Hello();	-0.124939
-0.599091	variables: // Example 8.23b.	-0.124939
-0.357779	+ (c + d);	-0.124939
-0.325444	for char pointers. 144	-0.124939
-0.354673	language defines hardware circuits	-0.124939
-0.897173	table: // Example 14.1b	-0.124939
-0.749243	u; int n; 143	-0.124939
-1.362696	this: // Example 14.1a	-0.124939
-0.358943	element a[i] is ecx+eax*4.	-0.124939
-0.314805	float to int. Reinterpret	-0.124939
-0.478801	min = 100, max	-0.124939
-0.314805	Table 8.1 (page 77)	-0.124939
-0.356693	textbook on test theory.	-0.124939
-0.237967	Use a "move constructor"	-0.124939
-0.355707	versions 7 through 14,	-0.124939
-0.358367	NAN (Not A Number)	-0.124939
-0.358355	of cores will grow	-0.124939
-0.615826	32 bit systems: Pointers,	-0.124939
-0.493085	by vector size. Unpredictable	-0.124939
-0.358736	the GetTickCount or QueryPerformanceCounter	-0.124939
-1.225164	to divide the workload	-0.124939
-0.358690	replace u[1] by u[0].	-0.124939
-1.682250	of the same class).	-0.124939
-0.579079	Similarly, we are seeing	-0.124939
-0.339514	restarted anyway. Software distributors	-0.124939
-0.237967	Volume 1, 2A, 2B,	-0.124939
-0.237967	v. 4.1.0, 2006 (Red	-0.124939
-0.356054	handle current CPUs optimally.	-0.124939
-0.591834	aligning the data optimally,	-0.124939
-0.463645	technological point of view.	-0.124939
-0.341902	thanks to heavy competition.	-0.124939
-0.492293	Mars Compiler v. 8.42n,	-0.124939
-0.237967	code optimization", Coriolis group	-0.124939
-0.592462	I want to thank	-0.124939
-0.237967	www.open- std.org/jtc1/sc22/wg21/docs/TR18015.pdf. OpenMP. www.openmp.org.	-0.124939
-0.512957	set is particularly interesting	-0.124939
-0.336344	= a|(b&c) x-xxxx--x ~a&~b=~(a|b)	-0.124939
-1.044145	; unused label ;eax=addressofa	-0.124939
-0.446330	using the declaration "static"	-0.124939
-1.489297	a multiple of 0x800	-0.124939
-0.587218	heading You can subtract	-0.124939
-0.504359	know how this works,	-0.124939
-0.237967	__asm__ (".type CriticalFunction, @gnu_indirect_function");	-0.124939
-0.566217	possibility for other optimizations,	-0.124939
-0.358898	on vectors and matrixes.	-0.124939
-0.358221	have very different speeds.	-0.124939
-0.435094	pmmintrin.h Suppl. SSE3 tmmintrin.h	-0.124939
-0.237967	c1() : x(0) {};	-0.124939
-0.858619	that the programmer forgets	-0.124939
-0.336344	advice given above. 7.	-0.124939
-0.776063	is a positive integer:	-0.124939
-0.552631	Func1 when compiling module2.cpp.	-0.124939
-0.353677	the smallest members last:	-0.124939
-0.525122	9.6b 64 64 14.0	-0.124939
-0.237967	SelectAddMul, SelectAddMul_SSE2, SelectAddMul_SSE41, SelectAddMul_AVX2,	-0.124939
-0.237967	critical time consumers. Choose	-0.124939
-0.357779	= list[j].b + list[j].c;	-0.124939
-0.325441	& r) {return r.a	-0.124939
-0.556772	from many different places).	-0.124939
-0.463542	-mavx, etc. for Linux)	-0.124939
-0.358736	structure. Incrementing or decrementing	-0.124939
-0.355619	screen resolutions, etc. Accessibility	-0.124939
-0.580818	and any other constructors.	-0.124939
-0.357862	= x4*x4; double x10	-0.124939
-1.060203	else { // Generic	-0.124939
-0.237967	F32vec4 F64vec2 F32vec8 F64vec4	-0.124939
-0.358943	which supposedly is system-independent,	-0.124939
-0.237967	i++) { 92 DynamicArray[i]	-0.124939
-1.267979	compiler to do cross-module	-0.124939
-0.237967	the compiler. Remember, therefore,	-0.124939
-0.357779	compute (FuncRow(i)*columns + FuncCol(i))	-0.124939
-0.358690	in memory by requesting	-0.124939
-0.358278	the remote data locally.	-0.124939
-0.897173	calculations: // Example 8.3a	-0.124939
-0.897173	available: // Example 12.4c.	-0.124939
-0.350376	calculation here gives a+b=0,	-0.124939
-0.294284	produced regularly. AMD: "Software	-0.124939
-0.500914	or an assembly listing.	-0.124939
-0.851655	register variable in eax.	-0.124939
-0.643912	in a computer game	-0.124939
-1.178996	x) { // polynomial(x)	-0.124939
-0.357530	: (bb[i] * cc[i]);	-0.124939
-0.341902	every code line. Time-based	-0.124939
-0.358898	for installation and uninstallation	-0.124939
-0.498969	point algebra reductions: a+b=b+a	-0.124939
-1.715964	so that the remaining	-0.124939
-0.358760	sign bit // u.d	-0.124939
-0.237967	version 2.11 ifunc branch).	-0.124939
-0.435097	38 7.11 Type conversions....................................................................................................	-0.124939
-0.538270	and the system forbids	-0.124939
-1.683872	explained on page 107.	-0.124939
-0.343771	each allocated block. Walking	-0.124939
-0.237967	IEEE standard 754 (1985).	-0.124939
-0.872824	allocated is also de-allocated.	-0.124939
-0.421455	Templates...............................................................................................................57 7.29 Threads ..................................................................................................................	-0.124939
-0.294284	-O3 or -Ofast /O3	-0.124939
-0.358166	software uses CPU dispatching:	-0.124939
-0.462668	code with CPU dispatching,	-0.124939
-0.526902	a compiler with C++0x	-0.124939
-0.355253	+ a2*b1) / (b1*b2);	-0.124939
-0.237967	1./4.790016E8, 1./6.22702E9, 1./8.71782E10, 1./1.30767E12,	-0.124939
-0.237967	b;}; S1 list[100], *temp;	-0.124939
-0.358666	Far Systems with segmented	-0.124939
-0.358563	to get an integral	-0.124939
-0.435094	parameters on CodeGear compiler).	-0.124939
-0.354384	an existing program. Weighing	-0.124939
-0.355700	position-independent code. These workaround	-0.124939
-0.237967	is utilized appropriately. Users	-0.124939
-0.237967	where security matters. Problems	-0.124939
-0.237967	-mveclibabi -fopenmp /Qopenmp -m32	-0.124939
-0.314809	int, float, double, bool,	-0.124939
-0.294284	get the generic branch,	-0.124939
-0.294284	-mssse3 /arch:SSSE2 -msse4.1 /arch:SSE4.1	-0.124939
-0.325441	when testing worst-case performance:	-0.124939
-0.237967	Professional and Enterprise editions).	-0.124939
-0.461290	stored at address [ecx+eax*4].	-0.124939
-0.237967	the best optimizer. Borland/CodeGear/Embarcadero	-0.124939
-0.683933	testing and maintenance easier.	-0.124939
-0.594171	by the value 1000.	-0.124939
-1.241439	advantage of using ready	-0.124939
-1.393410	compilers I have studied	-0.124939
-1.178119	= b + 0.666666666666666666667;	-0.124939
-0.355536	Z; Z += A2;	-0.124939
-0.541184	in the so-called commpage.	-0.124939
-0.882411	much time it uses.	-0.124939
-0.462403	true. template<> class powN<true,0>	-0.124939
-0.237967	from addresses 0x2F00, 0x3700,	-0.124939
-0.237967	< NUMCOLUMNS; column++) matrix[row][column]	-0.124939
-0.355539	numbers mean good performance).	-0.124939
-0.345269	The next chapter describes	-0.124939
-0.358749	float lookup[2] = {2.6f,	-0.124939
-0.356055	cache and accessed non-sequentially	-0.124939
-0.578781	while less than 1%	-0.124939
-0.237967	recoverable and non-recoverable errors;	-0.124939
-0.460274	if (n & 1)	-0.124939
-1.109092	cache line size (typically	-0.124939
-0.358838	common error that hackers	-0.124939
-0.237967	to avoid hard-to-find errors,	-0.124939
-0.597571	address by the formula:	-0.124939
-0.237967	optimize this loop? Certainly	-0.124939
-0.642921	function call statement occupies	-0.124939
-0.336349	libraries with internal multi-threading,	-0.124939
-0.353233	Example 9.6b. #include "xmmintrin.h"	-0.124939
-0.885097	memory space is occupied	-0.124939
-0.345272	512 matrix. My experimental	-0.124939
-1.415509	the loop control condition:	-0.124939
-0.358290	everything happens at runtime).	-0.124939
-0.237967	int r1, r2, c1,	-0.124939
-0.237967	at Exception Specifications, Dr	-0.124939
-1.089845	the overhead of switching	-0.124939
-0.577296	interleave the two formulas	-0.124939
-0.835875	is easy to trace	-0.124939
-0.761086	is also called Single-Instruction-Multiple-Data	-0.124939
-0.294284	three clauses: initialization, condition,	-0.124939
-0.358479	class SafeArray { protected:	-0.124939
-0.358939	implementations reveal a zigzag	-0.124939
-0.526801	more on this topic,	-0.124939
-0.314805	feedback seriously. User complaints	-0.124939
-0.460468	(MMX), 128 bits (XMM),	-0.124939
-0.597049	make a function local:	-0.124939
-0.596503	times rather than 20.	-0.124939
-0.336346	include C, C++, D,	-0.124939
-0.353676	add statements like throw(A,B,C)	-0.124939
-0.581860	cache and it fills	-0.124939
-0.546379	should be made local.	-0.124939
-0.357026	that allows less precise	-0.124939
-0.237967	nontemporal Table 18.3. Predefined	-0.124939
-0.556777	global and one local,	-0.124939
-0.870034	diagonal are accessed column-wise.	-0.124939
-0.358709	CPU’s. Another function __intel_cpu_features_init_x()	-0.124939
-0.314805	so-called partial flags stall	-0.124939
-0.447889	i<300; i+=3){ list[i] =0;	-0.124939
-0.325441	PC platforms. Graphics accelerators	-0.124939
-0.294284	Meyers: "Effective C++". Addison-Wesley.	-0.124939
-0.463749	out sign bit: absvalue	-0.124939
-0.347530	efficient than mov eax,0.	-0.124939
-0.237967	x; __asm fistp dword	-0.124939
-0.237967	vectors of inte- ger	-0.124939
-0.504918	microprocessors work. The recommendations	-0.124939
-0.576470	case of data decomposition,	-0.124939
-0.341902	table of jump targets.	-0.124939
-0.598310	which may be undesired.	-0.124939
-0.460530	(16 or 32 bytes).	-0.124939
-0.237967	72 for discussions. Turn	-0.124939
-0.599091	instead: // Example 12.6.	-0.124939
-1.257309	as in example 7.32b.	-0.124939
-0.357855	CPUs use Intel VTune,	-0.124939
-0.742310	in the interval [1.0,	-0.124939
-0.357649	statements (called static if),	-0.124939
-0.358943	the macro is referencing	-0.124939
-0.876079	profiler is called VTune;	-0.124939
-0.237967	it does incredibly stupid	-0.124939
-0.356614	double precision without worrying	-0.124939
-0.357082	is checked before storing.	-0.124939
-0.823967	variables and operators ......................................................................	-0.124939
-0.897173	lookup: // Example 7.29b	-0.124939
-0.237967	S. Warren, Jr.: "Hacker's	-0.124939
-0.325441	arrays and objects. Storage	-0.124939
-0.237967	by a constant: Unsigned	-0.124939
-0.382918	are becoming increasingly blurred	-0.124939
-1.735621	example: // Example 7.29a	-0.124939
-0.589196	1000; unsigned int dummy;	-0.124939
-0.595332	obtain, such as eliminating	-0.124939
-0.457661	you may write FatalAppExitA(0,"Array	-0.124939
-0.504686	intermediate code by emulating	-0.124939
-0.357502	the current version satisfies	-0.124939
-2.021467	for (i = (int)n	-0.124939
-0.345271	compiling the module with,	-0.124939
-0.237967	information is utilized appropriately.	-0.124939
-0.237967	512 512 378.7 168.5	-0.124939
-0.357332	= x8*x2; return x10;	-0.124939
-0.237967	513 513 58.7 168.3	-0.124939
-0.452058	Programs that produce streaming	-0.124939
-0.461608	+ 2 return (2.5f	-0.124939
-1.221698	the difference between commas	-0.124939
-0.314805	y1, y2, reciprocal_divisor; reciprocal_divisor	-0.124939
-1.076772	instead of the usual	-0.124939
-0.357052	least recently 4 ?Func2@@YAXQAHAAH@Z	-0.124939
-0.353233	#include <excpt.h> #include <float.h>	-0.124939
-0.358898	+ 100*16, and temp++	-0.124939
-0.591508	what you are doing.	-0.124939
-0.358943	memory footprint is unreasonably	-0.124939
-0.358901	gained remarkably in popularity	-0.124939
-0.294284	DLL is relocated (rebased)	-0.124939
-0.408223	add add cmp ja	-0.124939
-0.460468	(XMM), 256 bits (YMM),	-0.124939
-0.237967	the interval [1.0, 2.0)	-0.124939
-0.527223	of software that dates	-0.124939
-1.284083	} }; // Called	-0.124939
-1.352908	a - n.a. (-a)*(-b)	-0.124939
-0.358560	int matrix[NUMROWS][NUMCOLUMNS]; int row,	-0.124939
-0.325444	at its mirror position	-0.124939
-0.764576	doesn't need a constructor.	-0.124939
-0.358898	Volume 2A and 2B.	-0.124939
-0.358939	NAN (not a number).	-0.124939
-0.524405	even-numbered logical processors (0,	-0.124939
-0.594095	motion. See page 78.	-0.124939
-0.494968	is the binary decimals	-0.124939
-0.353463	may make separate executables	-0.124939
-0.358816	protection means are among	-0.124939
-0.352983	code, see below. Installing	-0.124939
-0.352712	if (absvalue > largest_abs)	-0.124939
-0.885070	syntax in example 8.15b.	-0.124939
-0.336346	integers use truncation towards	-0.124939
-0.550891	in the multiplication b[i]*c[i],	-0.124939
-0.294284	static where appropriate. Compiler-specific	-0.124939
-1.994385	instruction set is maintained	-0.124939
-0.358402	address calculation more efficient:	-0.124939
-1.058759	static inline __m128i LoadVectorA(void	-0.124939
-0.237967	double a[arraysize], b[arraysize], c[arraysize];	-0.124939
-0.573378	only improve the performance,	-0.124939
-1.810780	to make a sensible	-0.124939
-0.314809	-Ofast /O3 -O3 Interprocedural	-0.124939
-0.355045	these instructions. Function Assembly	-0.124939
-0.503661	<int N> class powN<true,N>	-0.124939
-2.060313	n.a. n.a. - a+b+c	-0.124939
-1.687189	there is a compelling	-0.124939
-1.075876	included in the profile.	-0.124939
-0.557670	container elements are cumbersome	-0.124939
-1.360831	last byte at 403	-0.124939
-0.336344	SSE2 emmintrin.h SSE3 pmmintrin.h	-0.124939
-1.750968	is likely to experience.	-0.124939
-1.514373	of a program executable:	-0.124939
-0.563009	means not a vector).	-0.124939
-0.331950	IDE's (Integrated Development Environments)	-0.124939
-0.809097	well on non-Intel machines?	-0.124939
-0.358852	manuals are for those	-0.124939
-0.358749	bit: absvalue = a[i].u[1]	-0.124939
-0.510606	a copy protection scheme	-0.124939
-0.599285	preferences for the IDE,	-0.124939
-0.460274	N1 (N & (N-1))	-0.124939
-0.873965	very useful for investigating	-0.124939
-0.452064	vectorize, or #pragma novector	-0.124939
-0.358749	100, max = 110;	-0.124939
-0.294284	searching for vacant spaces.	-0.124939
-0.237967	a discrete icon signaling	-0.124939
-0.587273	need not be passed	-0.124939
-0.353233	classes (Intel) #include <pmmintrin.h>	-0.124939
-0.589849	on the first sub-vector.	-0.124939
-1.195226	according to the IEEE	-0.124939
-0.457213	The OR operator (|)	-0.124939
-0.331950	condition is relatively expensive,	-0.124939
-0.649004	assume that model N-1	-0.124939
-0.358816	above sections are dominating	-0.124939
-0.331955	function __fastcall __attribute(( fastcall))	-0.124939
-0.237967	has been brutally interrupted.	-0.124939
-0.458838	128 128 128 17.4	-0.124939
-0.442082	example, in interpreted script	-0.124939
-0.596838	1.5f}; a = lookup[b];	-0.124939
-0.836130	{ return _mm_loadu_si128((__m128i const*)p);}	-0.124939
-0.237967	char 128 Iu8vec16 Vec16uc	-0.124939
-0.325441	four floats F32vec4 xxn(x4,	-0.124939
-0.657570	= r + i/2;	-0.124939
-0.786430	point to integer According	-0.124939
-0.353873	technology, and microprocessor microarchitecture.	-0.124939
-0.461223	every version. For team	-0.124939
-0.352712	// abs(u.f) > abs(v.f)	-0.124939
-0.294284	CriticalFunction (); __asm__ (".type	-0.124939
-0.461223	intermediate version. For one-man	-0.124939
-0.851542	serial code for vectorization.............................................................	-0.124939
-0.408223	we need metaprogramming. None	-0.124939
-0.351325	inline function #define MAX(a,b)	-0.124939
-0.529919	cache, branch target buffer,	-0.124939
-0.858111	then we can roughly	-0.124939
-0.582987	return y = pow(x,n)	-0.124939
-0.382918	((C & 3) <<6	-0.124939
-0.900884	element in the arrays:	-0.124939
-0.237967	AMD Family 15h Processors".	-0.124939
-0.237967	different profiling methods: Instrumentation:	-0.124939
-1.053981	The value of i&15	-0.124939
-0.237967	"static" or "__attribute__((visibility ("hidden")))".	-0.124939
-0.566805	often used as buffers	-0.124939
-0.454946	double 2 AVX2 _mm256_i64gather_pd	-0.124939
-0.358730	is referencing it twice.	-0.124939
-0.237967	embedded systems. Today (2013)	-0.124939
-0.659969	have tested the capability	-0.124939
-0.866198	a call to _endthread()	-0.124939
-0.358749	// polynomial(x) = 2.5*x^2	-0.124939
-0.358736	("int 3"); or __debugbreak();.	-0.124939
-0.586794	interface and other system-	-0.124939
-0.504712	are many function calls,	-0.124939
-0.358932	is tempting to fine-	-0.124939
-0.859200	the function library asmlib,	-0.124939
-0.358943	This chapter is aiming	-0.124939
-0.572248	program have been found,	-0.124939
-0.863254	on a particular subtask	-0.124939
-0.358898	functions lrintf and lrint.	-0.124939
-0.357530	log (b[i] * c[i]);	-0.124939
-0.237967	int 32 -231 231-1	-0.124939
-0.657661	// Function pointer serves	-0.124939
-0.358169	Details about instruction latencies	-0.124939
-0.512959	to a limited audience	-0.124939
-0.237967	linking (remove unreferen- ced	-0.124939
-0.455364	that it becomes full.	-0.124939
-0.587373	#if instead of if.	-0.124939
-0.237967	the user. Feature bloat.	-0.124939
-0.463547	function library. The radical	-0.124939
-0.722477	of cache space. Putting	-0.124939
-0.463547	invalid pointers. The absence	-0.124939
-1.749565	} else { FuncB(i);	-0.124939
-0.577577	simple way of solving	-0.124939
-0.901482	waste of the programmers'	-0.124939
-1.574702	at compile time. Four	-0.124939
-1.636528	the calculation of (a+b).	-0.124939
-0.325444	Pointers to contained objects?	-0.124939
-1.327103	is stored in y.	-0.124939
-0.844650	for array bounds violations,	-0.124939
-0.598594	(depending on the processor)	-0.124939
-0.358367	"Inner Loops: A sourcebook	-0.124939
-0.336344	double vectors SSE3 horizontal	-0.124939
-1.689503	a floating point comparison.	-0.124939
-0.900215	chains can be broken	-0.124939
-0.336349	DynamicArray = (float *)alloca(n	-0.124939
-1.177811	tasks such as spell-checking	-0.124939
-0.341905	int)i >= (unsigned int)size)	-0.124939
-2.159700	Example: // Example 7.34a.	-0.124939
-0.462681	precision requires only SSE).	-0.124939
-0.999819	even though the CPU-type	-0.124939
-0.358898	of synchronizing and communicating	-0.124939
-0.577637	using only the even-numbered	-0.124939
-0.867497	has a different meaning	-0.124939
-0.358749	- a<<b<<c = a<<(b+c)	-0.124939
-1.295290	when the code mixes	-0.124939
-0.357514	such as many encryption	-0.124939
-0.356178	line size. I tried	-0.124939
-0.832111	static const float coef[16]	-0.124939
-0.463547	measured separately. The fallacy	-0.124939
-1.191025	function is not referenced	-0.124939
-0.294284	<math.h> #define EXCEPTION_FLT_OVERFLOW 0xC0000091L	-0.124939
-0.570530	to such a formalism.	-0.124939
-1.557495	in case of underflow:	-0.124939
-0.355988	; align ; mark	-0.124939
-0.357806	data set into sub-vectors	-0.124939
-0.339510	or with compile-time polymorphism.	-0.124939
-0.570584	aligned or the __assume_aligned	-0.124939
-0.339510	implementing a compile-time polymorphism,	-0.124939
-0.518557	first the runtime polymorphism:	-0.124939
-0.237967	lacks the self-explaining menus	-0.124939
-0.549193	Some other compilers (Microsoft,	-0.124939
-0.237967	c, d, e, f,	-0.124939
-0.575621	(not up to date):	-0.124939
-0.356999	dividend is unsigned Examples:	-0.124939
-1.053002	bytes without cache MOVNTPS	-0.124939
-0.592841	poor because it lacks	-0.124939
-0.600620	access can be arranged	-0.124939
-0.582626	start to calculate (c+d)	-0.124939
-0.463657	Register ebx is pushed	-0.124939
-0.358898	floppy disks and USB	-0.124939
-0.501083	rather than its brand,	-0.124939
-0.653880	the intermediate result (b+c)	-0.124939
-0.358736	stack frame" or "frame	-0.124939
-0.639073	:1;//signbit }; struct Sdouble	-0.124939
-0.354534	universal, flexible, well tested,	-0.124939
-0.358560	+ p->b;} int Sum3(S3	-0.124939
-0.498969	of a suitable duration.	-0.124939
-0.541312	obsolete within the lifetime	-0.124939
-0.237967	temp < &list[100]; temp++)	-0.124939
-0.599091	loop: // Example 14.13c	-0.124939
-0.237967	Are objects numbered consecutively?	-0.124939
-0.897173	operations: // Example 14.13a	-0.124939
-0.358526	dependency chain may fill	-0.124939
-1.573249	to: // Example 8.15b	-0.124939
-0.495467	float 8 AVX2 _mm_i64gather_pd	-0.124939
-0.358898	the GOT, and finally	-0.124939
-0.452060	same chip. Such hybrid	-0.124939
-0.314805	{ int list[100]; Func1(list,	-0.124939
-0.452063	take a whole workday	-0.124939
-0.463495	10; Templates are instantiated	-0.124939
-0.586794	limitation and other flaws	-0.124939
-0.456090	(except for char pointers).	-0.124939
-0.345272	the "generate map file"	-0.124939
-1.601186	x) { return IntegerPower<10>(x);	-0.124939
-0.351726	compiler versions were tested:	-0.124939
-0.505035	by testing and analyzing	-0.124939
-0.357958	Example 7.36 class S2	-0.124939
-0.357958	Example 7.37 class S3	-0.124939
-0.596460	b*a - n.a. (a+b)+c	-0.124939
-0.358749	float list[] = {1.1,	-0.124939
-0.576470	type of data elements,	-0.124939
-0.294284	polymorphic child function: (static_cast<MyChild*>(this))->Disp();	-0.124939
-1.786446	that can be cross-	-0.124939
-0.347528	system functions (e.g. GetLogicalProcessorInformation	-0.124939
-0.595448	(or part of it).	-0.124939
-0.980053	can be quite substantial.	-0.124939
-0.659506	{ Table[x] = A*x*x	-0.124939
-0.325441	(MS) smmintrin.h (Gnu) AES,	-0.124939
-0.541138	are certain that u	-0.124939
-0.343772	calls a device driver.	-0.124939
-0.600620	divisions can be combined.	-0.124939
-0.463600	has advantages and disadvantages.	-0.124939
-0.357433	process. Obviously, we loose	-0.124939
-0.237967	to heavy competition. Processors	-0.124939
-1.576323	more efficient than investing	-0.124939
-0.559653	values of its arguments.	-0.124939
-0.763392	function library at www.agner.org/optimize/asmlib.zip	-0.124939
-1.178996	x) { // Round	-0.124939
-0.565474	do more complicated reductions.	-0.124939
-0.357272	is admittedly very kludgy.	-0.124939
-0.450290	SetThreadAffinityMask, in Linux, sched_setaffinity).	-0.124939
-0.357430	not get any answer.	-0.124939
-0.356113	fully utilizing its out-of-	-0.124939
-0.358560	square blocking: int r1,	-0.124939
-0.343772	debugging facilities, easy GUI	-0.124939
-0.358898	as flush and fence	-0.124939
-0.237967	is the sign, eee	-0.124939
-0.237967	as a scalar (Scalar	-0.124939
-0.488097	and exception handling. Omitting	-0.124939
-0.596503	instructions rather than nine,	-0.124939
-0.237967	mov mov lea $B2$2:	-0.124939
-1.573249	to: // Example 7.10b	-0.124939
-0.357779	+ c.y + d.y;	-0.124939
-0.599091	1: // Example 7.10a	-0.124939
-0.463645	typical degree of randomness	-0.124939
-0.449185	a low priority thread,	-0.124939
-0.461530	fast 32-bit software development",	-0.124939
-0.358600	expressions when not selected.	-0.124939
-0.358560	allocation are: int BigArray[1024]	-0.124939
-0.456967	you will see shortly.	-0.124939
-0.462672	Lists of instruction latencies,	-0.124939
-0.237967	in the broader perspective	-0.124939
-0.461507	chains with long latencies.	-0.124939
-0.237967	|, ^, ~, <<,	-0.124939
-0.358560	Example 7.18 int FuncRow(int);	-0.124939
-0.358760	have multiple // versions:	-0.124939
-0.581823	-1.0E8, b = 1.0E8,	-0.124939
-0.358898	example 12.4b and 12.4c	-0.124939
-0.325444	produced regularly. Intel: "Intel®	-0.124939
-0.408223	It may seem illogical	-0.124939
-0.314809	AMD: "AMD64 Architecture Programmer’s	-0.124939
-0.294284	purpose. The clumsy AND-OR	-0.124939
-0.726362	s; s = (short	-0.124939
-0.352981	write if(!a && !b)	-0.124939
-1.118119	code in multiple versions,	-0.124939
-0.336349	machines with embedded microcontrollers.	-0.124939
-0.352712	expression -a > -b	-0.124939
-0.353678	the integer expression -a	-0.124939
-0.237967	set Prefetch PREFETCH _mm_prefetch	-0.124939
-0.352982	commercial license Table 12.4.	-0.124939
-0.444434	parameter: template <typename MyChild>	-0.124939
-1.054970	( short int bb[size]	-0.124939
-0.294284	consisting of digital building	-0.124939
-0.358816	the STL are universal,	-0.124939
-1.546135	more efficient to pool	-0.124939
-0.527297	integer representation of &list[100]	-0.124939
-0.582923	cc); } // Entry	-0.124939
-0.357674	static inline float add_elements(__m128	-0.124939
-0.237967	reference, or void. Returning	-0.124939
-0.538986	for (i=0; i<n; ++i).	-0.124939
-0.590968	9.4 const int NUMROWS	-0.124939
-0.541113	} sum = (s0+s1)+(s2+s3);	-0.124939
-0.336346	131. Intel Performance Primitives	-0.124939
-0.597263	confined to a narrow	-0.124939
-1.362696	this: // Example 12.4e.	-0.124939
-0.352984	lot of runtime DLL's	-0.124939
-0.237967	language", section 17.9: "Moving	-0.124939
-0.355360	for elements inside sqaure:	-0.124939
-0.527024	* SelectAddMul_pointer = &SelectAddMul_dispatch;	-0.124939
-0.237967	brushes, etc. Locked mutexes.	-0.124939
-0.421451	press or mouse move.	-0.124939
-0.355253	+= xn / nfac;	-0.124939
-1.133073	is intended for detecting	-0.124939
-0.237967	by a conditional move,	-0.124939
-0.237967	obsolete. Rick Booth: "Inner	-0.124939
-0.237967	doing multiple logically distinct	-0.124939
-0.358749	(~a&c) a&b&c&d = (a&b)&(c&d)	-0.124939
-0.294284	GNU General Public License,	-0.124939
-0.658341	only one CPU core,	-0.124939
-0.353676	in classes like string,	-0.124939
-0.237967	0.40 0.30 4.5 0.82	-0.124939
-0.237967	154 // Print heading	-0.124939
-0.878704	discussed on page 60.	-0.124939
-0.348401	relevant to optimization. Prefetching	-0.124939
-0.353237	that supports automatic vectorization,	-0.124939
-0.520603	to the current position.	-0.124939
-0.237967	Core 2 0.77 0.89	-0.124939
-0.442080	statement several iterations ahead.	-0.124939
-0.356283	... list[i & 15]	-0.124939
-0.325444	has a virus scanner	-0.124939
-0.597242	by the program logic.	-0.124939
-0.237967	glibc version 2.11 ifunc	-0.124939
-0.462694	/vms Fastcall functions /Gr	-0.124939
-0.358749	as n! = n∙(n-1)!.	-0.124939
-1.579258	} } } Transposing	-0.124939
-0.835369	be done by controlling	-0.124939
-0.358898	are auto_ptr and shared_ptr.	-0.124939
-0.805834	help files and databases.	-0.124939
-0.357082	programming experience before trying	-0.124939
-0.599869	tasks in a multitasking	-0.124939
-0.868289	subtraction and multiplication (27	-0.124939
-0.868289	subtraction and multiplication (20	-0.124939
-0.358749	(line size) = (total	-0.124939
-1.362696	by // Example 8.5b	-0.124939
-0.909955	Whole program optimization /GL	-0.124939
-2.159700	Example: // Example 8.5a	-0.124939
-0.595448	large part of it)	-0.124939
-0.822383	13.1 CPU dispatch strategies	-0.124939
-0.788977	of optimization is requested.	-0.124939
-0.358943	a response is delayed	-0.124939
-0.354932	< ArraySize; i++) List[i]++;	-0.124939
-0.357082	the job before you.	-0.124939
-0.382918	} Example 14.27 assumes	-0.124939
-0.598658	additional floating point variable:	-0.124939
-0.294284	"Hacker's Delight". Addison-Wesley, 2003.	-0.124939
-0.599712	faster. It is assumed	-0.124939
-0.461957	the Borland C++ builder.	-0.124939
-0.237967	operators &&, ||, !	-0.124939
-0.356484	courses in programming nowadays	-0.124939
-0.314805	............................................................................................. 56 7.28 Templates...............................................................................................................57	-0.124939
-0.343774	have tested implement OneOrTwo5[b!=0]	-0.124939
-0.341907	binary trees, hash maps	-0.124939
-0.512251	explained in chapter 9.10,	-0.124939
-0.548790	compiling in two steps.	-0.124939
-0.345273	the old version. Updating	-0.124939
-0.358736	option /QaxAVX or -axAVX.	-0.124939
-0.566762	number simply by inverting	-0.124939
-0.358943	of &list[100] is (int)(&list[100])	-0.124939
-0.892708	__m128i a = _mm_or_si128(c2,	-0.124939
-0.347530	The instructions mov ebx,eax	-0.124939
-0.356805	by (partial) template specialization.	-0.124939
-0.356805	a non-recursing template specialization,	-0.124939
-1.786446	that can be improved.	-0.124939
-0.527290	in C++ and Fortran.	-0.124939
-0.237967	parallel processing. Scott Meyers:	-0.124939
-0.865747	b2; y = (a1*b2	-0.124939
-0.896493	operand is not evaluated,	-0.124939
-0.596838	Writing a = OneOrTwo5[b!=0];	-0.124939
-0.354672	public: c1() : x(0)	-0.124939
-0.237967	& earlier vmlsExp4 vmldExp2	-0.124939
-0.594265	b+c will be rounded	-0.124939
-0.358760	= (int)d; // Truncation	-0.124939
-0.983747	= b / 1.2345;	-0.124939
-0.358901	is short in duration	-0.124939
-0.336346	of when type-casting pointers:	-0.124939
-0.358666	that begin with _mm.	-0.124939
-0.358290	for Nerds at Wikibooks.	-0.124939
-0.349151	interrupts and task switches;	-0.124939
-1.589163	The compiler may interleave	-0.124939
-0.876720	do not overlap. 27	-0.124939
-0.237967	0.18 0.11 1.21 0.57	-0.124939
-0.237967	long clock; __cpuid(dummy, 0);	-0.124939
-0.557735	Windows and to Eclipse	-0.124939
-0.463305	may interfere with real	-0.124939
-0.357779	= A*x*x + B*x	-0.124939
-0.237967	Dr Dobbs Journal, 2002).	-0.124939
-0.541312	listing. Use the "generate	-0.124939
-0.559150	that is always true/false	-0.124939
-0.598749	error is not detected	-0.124939
-0.314805	mainframe computer. Big supercomputers	-0.124939
-0.358690	of pointers, by initializing	-0.124939
-0.527310	the memory is mirrored	-0.124939
-1.453677	< 100; i++) matrix[FuncRow(i)][FuncCol(i)]	-0.124939
-0.358760	= static_cast<float>(i); // Implicit	-0.124939
-0.356901	Programmers very often underestimate	-0.124939
-0.540502	disagree with this rule.	-0.124939
-0.993203	the end user. Installation	-0.124939
-0.463495	Function names are undocumented.	-0.124939
-0.358736	whole polygon or bitmap	-0.124939
-0.460274	= (n & 0x7FFFFF)	-0.124939
-0.630061	Developer’s Manual", Volume 2A	-0.124939
-0.596074	regarded as a valuable	-0.124939
-0.421451	as the C-style type-casting.	-0.124939
-0.557016	very large data bases,	-0.124939
-0.358116	operating system which redirects	-0.124939
-0.458229	and denormals-are-zero mode (SSE2):	-0.124939
-1.099592	This can cause severe	-0.124939
-0.549699	after the last member.	-0.124939
-0.503264	(128 bit float vectors)	-0.124939
-0.237967	reveals a funda- mentally	-0.124939
-0.358838	the wires that connect	-0.124939
-0.237967	is the responsi- bility	-0.124939
-0.341907	x; nfac *= n+1;	-0.124939
-0.348409	specified types (See Sutter:	-0.124939
-0.549943	enabled. A more primitive,	-0.124939
-0.463542	9.3. Time for transposing	-0.124939
-0.897279	on the same computer,	-0.124939
-0.358943	two values is closest	-0.124939
-0.463643	// Loop to print	-0.124939
-0.599821	operands if the evaluation	-0.124939
-0.463542	30 ms for foreground	-0.124939
-0.237967	function. __attribute__((const)) (Linux only).	-0.124939
-0.527327	specific advantage to obtain,	-0.124939
-0.358898	further tested and investigated	-0.124939
-0.462471	15.1c. Calculate integer power,	-0.124939
-0.358704	Example 7.8 if (handle	-0.124939
-0.705776	C++ Compiler v. 11.1	-0.124939
-0.358760	type T // Constructor	-0.124939
-0.325441	63 63 31 11.6	-0.124939
-0.237967	remote or removable media	-0.124939
-0.358856	illegitimate copying. The benefits	-0.124939
-0.325441	65 65 33 11.8	-0.124939
-0.461608	of 2 return powN<(N	-0.124939
-0.599821	platforms if the bias	-0.124939
-0.463657	the information is utilized	-0.124939
-0.600955	dispatcher in the MKL	-0.124939
-0.237967	__int64 64 -263 263-1	-0.124939
-0.325441	double 256 F32vec4 F64vec2	-0.124939
-1.085719	subroutines in assembly language",	-0.124939
-0.563137	in Mac OS X.	-0.124939
-0.352416	Function level linking (remove	-0.124939
-0.355852	don't support processor X"	-0.124939
-0.358943	CPU market is developing	-0.124939
-1.308070	the arrays are aligned,	-0.124939
-0.819661	easier to write 2.0/3.0	-0.124939
-0.599091	counter: // Example 7.31b	-0.124939
-0.897173	case: // Example 7.31a	-0.124939
-0.357530	return powN<(N1&(N1-1))==0,N1>::p(x) * powN<true,N-N1>::p(x);	-0.124939
-0.336351	without the Common Language	-0.124939
-0.352414	write instructions becomes noticeable.	-0.124939
-0.461583	more than 2 gigabytes	-0.124939
-0.352983	*= x; n >>=	-0.124939
-0.591025	capabilities (see page 103)	-0.124939
-0.596074	work as a learning	-0.124939
-0.599091	(WTL): // Example 7.43b.	-0.124939
-0.525028	comp.lang.asm.x86 for some links.	-0.124939
-0.358560	int c; int UnusedFiller;	-0.124939
-0.596159	other with a 50-50	-0.124939
-0.358508	is slow, you know).	-0.124939
-0.325441	that a detailed overview	-0.124939
-0.463657	function F1 is supposed	-0.124939
-0.599091	comparison: // Example 14.4b	-0.124939
-0.599091	time. // Example 15.1a.	-0.124939
-0.461583	Kbytes to 2 Mbytes.	-0.124939
-0.358901	usability problem in interactive	-0.124939
-0.294284	Visual Studio 2008 version).	-0.124939
-0.237967	in nn ifbit=1 bitofn	-0.124939
-0.331950	no exception ever happens.	-0.124939
-0.726710	more complicated and error-prone.	-0.124939
-0.294284	C++ compilers. Wikipedia article	-0.124939
-0.462024	table with two entries.	-0.124939
-0.503850	is efficient, but risky.	-0.124939
-0.331950	from a project built	-0.124939
-0.451237	and child class. Members	-0.124939
-1.076975	running in the majority	-0.124939
-0.358777	Visual Studio can build	-0.124939
-0.455367	code Static linking (multithreaded)	-0.124939
-0.516583	14.5b if ((unsigned int)(i	-0.124939
-0.346499	that it rarely justifies	-0.124939
-2.159700	Example: // Example 8.13a	-0.124939
-1.573249	to: // Example 8.13b	-0.124939
-0.538986	&, |, ^, ~,	-0.124939
-0.358980	to reinvent the wheel.	-0.124939
-0.358932	many years to come.	-0.124939
-0.355355	currently doesn't works (gcc	-0.124939
-0.358980	it lacks the self-explaining	-0.124939
-1.160733	Alternatively, you may actively	-0.124939
-0.237967	0.3, -2.0, 4.4, 2.5};	-0.124939
-0.357779	return vector(x + a.x,	-0.124939
-1.628907	is important to weigh	-0.124939
-0.550843	that cause the resource-hungry	-0.124939
-0.358666	by extending with zero-bits	-0.124939
-0.583569	program is often reorganized	-0.124939
-0.481373	~a = -1 (a&~b)|(~a&b)=a^b	-0.124939
-0.237967	this block: 62 __try	-0.124939
-0.550644	speed and for minimizing	-0.124939
-0.358901	are cheap, in relation	-0.124939
-0.358749	{ a[c][r] = b[r][c];	-0.124939
-0.358666	are defined with enum,	-0.124939
-0.539678	explicitly. In example 8.21,	-0.124939
-1.362696	by // Example 14.15b	-0.124939
-0.541549	granularity is too fine	-0.124939
-0.237967	my free E-book Usability	-0.124939
-0.543423	language and automatic CPU-dispatching	-0.124939
-0.358736	inte- ger or double)	-0.124939
-0.996097	when called from main,	-0.124939
-0.550815	was certain to truly	-0.124939
-0.463547	allocated separately. The allocation,	-0.124939
-0.583569	as is often seen,	-0.124939
-0.237967	Microsoft Foundation Classes (MFC).	-0.124939
-0.237967	strcpy, strcat, strlen, sprintf,	-0.124939
-0.897388	sign of a double:	-0.124939
-0.237967	(); __asm__ (".type CriticalFunction,	-0.124939
-0.917366	the loop and reorganize:	-0.124939
-0.358898	how tortuous and convoluted	-0.124939
-0.357779	= a[i] + b[i];	-0.124939
-0.346500	such as e.g. .R.	-0.124939
-0.356614	be used without restrictions.	-0.124939
-0.358943	five manuals is copyrighted	-0.124939
-0.659481	hard disk or network.	-0.124939
-0.593545	; i < arraysize;	-0.124939
-1.952987	is possible to express	-0.124939
-0.354224	may be both cheaper	-0.124939
-0.358300	in case memory re-allocation	-0.124939
-0.462891	pointer is then de-referenced	-0.124939
-0.498670	should be used. Web	-0.124939
-0.659873	the user to restart	-0.124939
-0.237967	of runtime DLL's (dynamically	-0.124939
-0.358852	rows; i++) for (j	-0.124939
-0.358736	backwards. Copying or clearing	-0.124939
-0.504913	shared_ptr than for auto_ptr.	-0.124939
-0.237967	aliasing /Oa -fno-alias Non-strict	-0.124939
-0.659129	= 1000; int List[ArraySize];	-0.124939
-0.529025	matrix in my experiments.	-0.124939
-0.358943	.exe file, is acceptable.	-0.124939
-0.981125	than x = *(++p)	-0.124939
-0.237967	Vec8i Vec8ui Vec4q Vec4uq	-0.124939
-0.659527	is slow // Modulo	-0.124939
-0.237967	first two suggested improvements).	-0.124939
-1.780387	the compiler to vectorize,	-0.124939
-0.358934	approximate comparison of doubles	-0.124939
-0.358124	has hyperthreading. If so,	-0.124939
-0.237967	// Example 7.43b. Compile-time	-0.124939
-1.635473	one of the weekdays.	-0.124939
-0.358543	included in compiler price	-0.124939
-0.358736	kit (SDK or PSDK).	-0.124939
-0.331950	__attribute(( fastcall)) __fastcall Noncached	-0.124939
-1.020816	out of range printf(Greek[n]);	-0.124939
-0.314809	is not modified. Unlike	-0.124939
-0.584590	of the user interface,	-0.124939
-0.346499	Math Kernel Library (MKL	-0.124939
-0.818350	is waiting for response.	-0.124939
-0.504508	compact than an MFC	-0.124939
-0.358600	set SSE2 not supported");	-0.124939
-0.900305	cache misses, branch misprediction,	-0.124939
-0.598553	linker and the loader.	-0.124939
-0.458397	compiler will calculate (1./1.2345)	-0.124939
-0.981125	than x = array[++i]	-0.124939
-0.358560	8.20 module1.cpp int Func1(int	-0.124939
-0.294284	user expects immediate responses	-0.124939
-0.358816	CPU vendors are offering	-0.124939
-0.847682	Using vector classes Programming	-0.124939
-0.882978	accessed on a First-In-Last-	-0.124939
-1.318148	of the code. Inserting	-0.124939
-0.504768	have (set) = (10000	-0.124939
-0.444438	Prevent optimizing away cpuid	-0.124939
-0.885070	CriticalFunction in example 16.2.	-0.124939
-0.237967	on test theory. Advice	-0.124939
-0.596460	2 - n.a. a+a+a+a	-0.124939
-0.637383	PTR [edx] DWORD PTR[ecx+eax*4],ebx	-0.124939
-0.294284	example 8.26a (32-bit mode):	-0.124939
-0.358749	float OneOrTwo5[2] = {1.0f,	-0.124939
-0.568216	syntax is so kludgy	-0.124939
-0.496716	for-loop has three clauses:	-0.124939
-0.237967	space is occupied throughout	-0.124939
-0.357862	= x2*x2; double x8	-0.124939
-0.294284	induction variable. (This eliminates	-0.124939
-0.580563	metaprogramming would be straightforward.	-0.124939
-0.358898	delete it and create	-0.124939
-1.032548	pointer or a non-const	-0.124939
-0.884247	is obtained by dropping	-0.124939
-0.529921	is Microsoft Visual Studio.	-0.124939
-0.355624	AVX immintrin.h AMD SSE4A	-0.124939
-0.579057	member function or friend	-0.124939
-0.358709	by using function inlining,	-0.124939
-0.462011	to use static linking,	-0.124939
-0.539678	register. In example 12.2,	-0.124939
-1.057599	memory allocation is unnecessarily	-0.124939
-0.659895	The exception is caught	-0.124939
-0.237967	(e.g. an if-else structure),	-0.124939
-0.527339	each string is checked	-0.124939
-0.355536	a[i]; s1 += a[i+1];	-0.124939
-0.358704	Example 8.10a if (true)	-0.124939
-0.881293	vectorization (see page 107),	-0.124939
-0.591025	CPU-dispatching (see page 122)	-0.124939
-1.683872	explained on page 62.	-0.124939
-0.352087	price, compatibility, second source,	-0.124939
-1.683872	explained on page 96.	-0.124939
-0.650503	the software was coded.	-0.124939
-0.873160	can be optimized further.	-0.124939
-0.582923	x); } // Branch/loop	-0.124939
-0.886122	identified by a key?	-0.124939
-0.357530	(float *)alloca(n * sizeof(float));	-0.124939
-0.294284	a/a=1 --------x a/1=a x-xxx-x--	-0.124939
-0.463209	xxxxxxxxx -- - xx	-0.124939
-0.237967	by a unique key.	-0.124939
-0.993203	the end user. Menus,	-0.124939
-0.478801	not, by default, conform	-0.124939
-0.461859	* (columns * sizeof(float)).	-0.124939
-0.596159	on with a password.	-0.124939
-0.237967	pointer type casting. Linked	-0.124939
-0.549907	(using Intel vector classes):	-0.124939
-0.868314	Comparison of different compilers.............................................................................	-0.124939
-0.525667	_mm256_zeroupper() before any transition	-0.124939
-0.325448	addition and subtraction (3	-0.124939
-0.237967	should be obeyed. Copy	-0.124939
-0.349799	C++ compiler, v. 10.1.020.	-0.124939
-0.294284	code. See ISO/IEC TR18015	-0.124939
-0.505040	reader what is happening.	-0.124939
-0.562611	indices or by keys	-0.124939
-0.489150	constructor, an overloaded assignment	-0.124939
-0.294284	......................................................................................... 65 7.33 Namespaces...........................................................................................................	-0.124939
-1.014717	the different versions alternatingly	-0.124939
-0.873118	b, c, d, e,	-0.124939
-0.237967	C/C++ v. 1.4, 2005.	-0.124939
-0.421451	problem by defining _mm_malloc	-0.124939
-0.355924	error handler calls exit(),	-0.124939
-0.351726	list[100]; memset(list, 0, sizeof(list));	-0.124939
-0.325441	be controlled. Small hand-held	-0.124939
-0.572303	return a * a;}	-0.124939
-0.461957	2005). Borland C++ 5.82	-0.124939
-0.527304	irrelevant within a year	-0.124939
-0.237967	by the series: ex	-0.124939
-0.237967	1./6., 1./24., 1./120., 1./720.,	-0.124939
-0.237967	of instruction latencies, throughputs	-0.124939
-0.325441	int i, i_div_3; for(i=i_div_3=0;	-0.124939
-0.339516	be possible. Template meta-	-0.124939
-0.538986	else { goto CFALSE;	-0.124939
-0.587000	34 else { CFALSE:	-0.124939
-0.450286	returned in registers. Except	-0.124939
-0.237967	compilers www.agner.org/ optimize/#vectorclass Include	-0.124939
-0.595307	preferably be kept entirely	-0.124939
-0.358730	address of it (&ArraySize)	-0.124939
-0.294284	stored in ASCII form.	-0.124939
-0.358355	(bitwise and) will cut	-0.124939
-0.650503	the software was developed.	-0.124939
-0.849188	of the clock frequency,	-0.124939
-0.596159	compatibility with a lineage	-0.124939
-0.357136	but neither faster nor	-0.124939
-1.601186	x) { return x*x	-0.124939
-0.532931	define your own error-handling	-0.124939
-0.348404	is no clear correspondence	-0.124939
-0.541049	high-priority threads are areas	-0.124939
-0.358526	the CPU may occasionally	-0.124939
-0.826906	sequence of calculations forms	-0.124939
-0.237967	object-oriented programming, modularity, reusability	-0.124939
-0.594095	possible. See page 141.	-0.124939
-0.659341	data structures with First-In-First-Out	-0.124939
-0.580897	syntax is very old-fashioned.	-0.124939
-0.357332	n; #endif return n;}	-0.124939
-0.559021	have to replace u[1]	-0.124939
-0.357393	can give some indication	-0.124939
-0.607502	a shift operation. x*8	-0.124939
-0.593693	solution is more complicated.	-0.124939
-1.307705	the same memory block,	-0.124939
-0.237967	STL deque (doubly ended	-0.124939
-0.517879	etc. for Windows, -msse2,	-0.124939
-0.358980	consumers. Choose the strongest	-0.124939
-1.628907	is important to remember	-0.124939
-0.442082	same source file. Keep	-0.124939
-0.339512	files from disk. Memory-hungry	-0.124939
-0.597245	tests with the sizeof	-0.124939
-0.586686	traffic and a server	-0.124939
-0.358116	identification (RTTI), which affects	-0.124939
-0.541939	making the dispatch decision	-0.124939
-0.237967	Core 2 0.63 0.75	-0.124939
-0.831035	Intel Core 2 0.77	-0.124939
-0.860008	loop-carried dependency chain. Nothing	-0.124939
-0.356805	of 2: template <bool	-0.124939
-1.071797	it is a staircase	-0.124939
-0.358856	: 2.6f; The ?:	-0.124939
-0.658610	execution speed, memory economy	-0.124939
-0.237967	1./39916800., 1./4.790016E8, 1./6.22702E9, 1./8.71782E10,	-0.124939
-1.601186	x) { return ipow(x,10);	-0.124939
-0.294284	data conversion, shuffling, packing,	-0.124939
-0.237967	Delight". Addison-Wesley, 2003. Contains	-0.124939
-0.572868	Linux have an attribute	-0.124939
-0.339512	see my free E-book	-0.124939
-1.576555	b; a = (int)d;	-0.124939
-0.352982	or 16 Table 7.2.	-0.124939
-0.237967	(s0+s1)+(s2+s3); Now s0, s1,	-0.124939
-0.331953	v. 10.1.020. Functions _intel_fast_memcpy	-0.124939
-0.294284	Handles to windows, graphic	-0.124939
-0.855134	Mathematical functions for vectors........................................................................	-0.124939
-0.599091	structures: // Example 9.1a	-0.124939
-0.599091	follows: // Example 9.1b	-0.124939
-0.589196	a[size]; unsigned int absvalue,	-0.124939
-0.800274	the header file mathimf.h	-0.124939
-1.428609	can use the GetTickCount	-0.124939
-0.352091	support for XMM registers;	-0.124939
-0.356838	from static libraries (.lib	-0.124939
-0.591901	principle for a 2'nd	-0.124939
-1.230617	the cost of verifying,	-0.124939
-0.463547	quite fast. The lesson	-0.124939
-0.237967	on access. Sequential forward	-0.124939
-0.601255	problems of the original,	-0.124939
-0.463643	and attempts to translate	-0.124939
-0.237967	likely to experience. Occasionally,	-0.124939
-0.345268	possibility for significant improvements.	-0.124939
-0.463410	{ largest_abs = absvalue;	-0.124939
-1.087657	the code section position-independent,	-0.124939
-1.062213	branch by a conditional	-0.124939
-0.352982	38.1 97 Table 9.1.	-0.124939
-0.460061	up a stack frame,	-0.124939
-0.356115	for "standard stack frame"	-0.124939
-0.339510	8 -128 127 int8_t	-0.124939
-0.512008	throughput (see p. 104).	-0.124939
-0.237967	Vec4uq Vec4f Vec2d Vec8f	-0.124939
-0.237967	Vec32uc Vec16s Vec16us Vec8i	-0.124939
-0.237967	// x^1, x^2, x^3,	-0.124939
-0.650175	13.2 Model-specific dispatching ....................................................................................	-0.124939
-0.493454	= b ? 1.5f	-0.124939
-0.237967	program optimization /GL --combine	-0.124939
-0.517240	int 4 AVX2 _mm256_i32gather_epi32	-0.124939
-0.742310	no pointer aliasing. __declspec(noalias)	-0.124939
-0.757273	brands of CPUs unequally	-0.124939
-0.352983	| (~a&c) | (b&c)	-0.124939
-0.585177	directory as the .exe	-0.124939
-0.339514	break; case 2: printf("Gamma");	-0.124939
-0.357256	a 2'nd order polynomial:	-0.124939
-0.659389	are separated by commas.	-0.124939
-0.726710	is used and popped	-0.124939
-0.461530	process and software engineering	-0.124939
-0.463652	for calculating a polynomial.	-0.124939
-1.162464	to avoid the burdensome	-0.124939
-0.294284	an IDE. Free trial	-0.124939
-1.037125	the code becomes contiguous.	-0.124939
-0.721586	and YMM registers .................................................................	-0.124939
-0.353871	The programmer typically thinks	-0.124939
-0.357530	+= xxn * _mm_load_ps(coef+i);	-0.124939
-0.352089	helpful for later maintenance.	-0.124939
-0.505068	The installation of downloaded	-0.124939
-0.525559	chosen version return (*CriticalFunction)(parm1,	-0.124939
-0.355253	= a1 / b1;	-0.124939
-0.358290	is aiming at explaining	-0.124939
-1.226384	the loop count (ArraySize)	-0.124939
-0.892681	you should be prepared	-0.124939
-0.490351	cases. The so-called iterators	-0.124939
-0.352411	"position-independent code" actually implies	-0.124939
-0.596503	-56 rather than 200.	-0.124939
-0.237967	speed without jeopardizing safety,	-0.124939
-0.336344	= a&(b|c) x-xxxx--x (a|b)&(a|c)	-0.124939
-0.237967	C++ v. 4.1.0, 2006	-0.124939
-0.341900	11, Iss. 4, 2007	-0.124939
-0.237967	Denmark. Copyright © 2004	-0.124939
-0.591025	efficient (see page 53).	-0.124939
-0.582026	than using a ready-made	-0.124939
-1.355087	cc[]) { // Detect	-0.124939
-1.460104	functions such as GetPrivateProfileString	-0.124939
-0.495473	prevented by calling vector::reserve	-0.124939
-0.559012	CChild2 : public CParent<CChild2>	-0.124939
-0.961816	kinds of variable storage.............................................................................	-0.124939
-1.473706	is necessary to query	-0.124939
-0.599091	memcpy: // Example 7.33b	-0.124939
-0.358898	allocation (new and delete).	-0.124939
-1.149333	is faster to compose	-0.124939
-1.282697	to the function prototype:	-0.124939
-0.358300	for minimizing memory fragmentation.	-0.124939
-0.599712	references. It is OK,	-0.124939
-1.177811	tasks such as sorting,	-0.124939
-0.538986	b1, b2, y1, y2;	-0.124939
-0.522171	invalid and cause fatal	-0.124939
-0.892783	set is the scarcity	-0.124939
-0.421451	1994. Mostly obsolete. Rick	-0.124939
-1.053002	bytes without cache MOVNTI	-0.124939
-0.887452	get the value -100+100+100	-0.124939
-0.659805	7.17 Structures and classes............................................................................................	-0.124939
-0.357517	string constants, array initializer	-0.124939
-0.237967	compression and cryptography (www.intel.com).	-0.124939
-0.237967	1./720., 1./5040., 1./40320., 1./362880.,	-0.124939
-0.594265	operation will be non-zero,	-0.124939
-0.453984	data // constructor initializes	-0.124939
-0.358736	example, f(x) or g(x)	-0.124939
-0.562628	example when you discover	-0.124939
-0.455366	// will give -2.0	-0.124939
-0.591025	list (see page 93).	-0.124939
-0.788114	the optimized code (release	-0.124939
-0.526873	research, not on publicly	-0.124939
-0.728278	libraries are highly optimized,	-0.124939
-0.463652	only show a discrete	-0.124939
-0.237967	optimization report /Qopt-report -opt-report	-0.124939
-0.572248	isolation have been unsatisfied	-0.124939
-0.593769	conversions are not safe,	-0.124939
-0.356115	references, and stack entries	-0.124939
-0.595660	predictable than the other,	-0.124939
-0.463394	jobs simultaneously or seemingly	-0.124939
-0.463173	main through an imported	-0.124939
-0.598749	format is not standardized.	-0.124939
-0.358932	need modification to compensate	-0.124939
-0.294284	easier to test, maintain	-0.124939
-0.456965	complicated functions like sin.	-0.124939
-0.237967	pow, log, exp, sin,	-0.124939
-0.597245	N with the rightmost	-0.124939
-1.088134	of programming language ...............................................................................	-0.124939
-0.596515	large because the insertion	-0.124939
-0.462810	a public data object:	-0.124939
-0.355926	get library versions instead.	-0.124939
-1.282413	needs to be saved.	-0.124939
-0.504895	bit-mask: bc = _mm_andnot_si128(mask,	-0.124939
-1.573249	to: // Example 8.11b	-0.124939
-2.159700	Example: // Example 8.11a	-0.124939
-0.294284	on. Most IDE's (Integrated	-0.124939
-0.463657	storage order is opposite).	-0.124939
-0.294284	3B. developer.intel.com. AMD: "AMD64	-0.124939
-0.358749	0x10, Friday = 0x20,	-0.124939
-1.749565	} else { DTRUE:	-0.124939
-0.538986	else { goto DTRUE;	-0.124939
-0.505066	more common to exchange	-0.124939
-0.462604	the code, which supposedly	-0.124939
-0.494968	as the binary digits.	-0.124939
-1.683872	explained on page 44.	-0.124939
-0.345268	approximately seven significant digits,	-0.124939
-0.347530	with external libraries. www.agner.org/optimize/#vectorclass	-0.124939
-0.358704	element (approximately): if (absvalue	-0.124939
-1.328250	loop in example 8.23b	-0.124939
-0.352711	#if defined(__unix__) || defined(__GNUC__)	-0.124939
-0.355699	branch. The common excuse	-0.124939
-0.444434	2 #define FUNCNAME SelectAddMul_SSE2	-0.124939
-0.544008	are of course system-specific.	-0.124939
-0.347529	packing, unpacking needed. Predictable	-0.124939
-0.493085	larger vector size. Later	-0.124939
-0.581243	Microsoft C++ compilers www.agner.org/	-0.124939
-0.331953	limited by physical factors.	-0.124939
-0.349801	> and >= operators).	-0.124939
-0.352711	|| (!a&&c) || (b&&c)	-0.124939
-0.357028	z != 0; 35	-0.124939
-0.593497	CFALSE; } } 34	-0.124939
-0.237967	mangling. The characters '?',	-0.124939
-0.237967	= {1.1, 0.3, -2.0,	-0.124939
-0.530863	b = (unsigned int)a	-0.124939
-0.463542	enough bits for holding	-0.124939
-1.429540	in a separate module,	-0.124939
-0.325441	time-consuming data processing. Running	-0.124939
-0.550854	not take the hint,	-0.124939
-0.550518	doesn't depend on system-specific	-0.124939
-0.358749	int iset = instrset_detect();	-0.124939
-0.358736	resources locally or remotely.	-0.124939
-0.657847	comes with most distributions	-0.124939
-0.237967	Rick Booth: "Inner Loops:	-0.124939
-0.358838	common excuse that "we	-0.124939
-0.358367	each object. A little-known	-0.124939
-0.237967	Goedecker and Adolfy Hoisie:	-0.124939
-0.357899	call method using InstructionSet():	-0.124939
-0.237967	Mathematical functions Encryption, decryption,	-0.124939
-0.529025	look in my crystal	-0.124939
-0.294284	functions for millisecond resolution.	-0.124939
-0.336344	slow or completely absent	-0.124939
-0.382918	required a PC. Similarly,	-0.124939
-0.599145	having the same name,	-0.124939
-0.898056	numerically largest element (approximately):	-0.124939
-0.885097	the computer is reset	-0.124939
-0.463652	and show a disassembly,	-0.124939
-0.575707	have a natural ordering?	-0.124939
-0.659302	is concentrated on arranging	-0.124939
-0.356837	to insert optimization hints	-0.124939
-0.353673	and test their functionality.	-0.124939
-0.358898	are fetched and decoded	-0.124939
-0.314805	^ b ---xx---- a<<b<<c=a<<(b+c)	-0.124939
-0.870661	sets the variable __intel_cpu_feature_indicator	-0.124939
-0.237967	Intel Technology Journal Vol.	-0.124939
-0.294284	the diagonal remain unchanged.	-0.124939
-0.358943	all 1's is unchanged,	-0.124939
-0.917573	to put a tag	-0.124939
-0.237967	(C << 6); Or,	-0.124939
-0.237967	are also included. Combining	-0.124939
-0.835875	clock cycles to fetch	-0.124939
-0.563006	TILESIZE) { for (c1	-0.124939
-0.358943	the heap is reserved	-0.124939
-0.589099	preferably have a balanced	-0.124939
-0.457860	is actually quite convenient.	-0.124939
-0.356287	on most processors (when	-0.124939
-0.356614	backup copying without effectively	-0.124939
-0.688057	an empty throw() specification.	-0.124939
-0.541549	solution is too high.	-0.124939
-0.578162	microprocessors have no native	-0.124939
-0.659109	other purposes than rendering	-0.124939
-0.882978	accessed on a First-In-First-	-0.124939
-0.840259	loading a cache line:	-0.124939
-0.358601	is lost. This dilemma	-0.124939
-0.892708	expression a = (b*c)/d,	-0.124939
-0.358355	this value will propagate	-0.124939
-0.538986	be predicted perfectly varies	-0.124939
-1.201009	the same cache line,	-0.124939
-1.023208	of user interface framework...........................................................................	-0.124939
-0.456086	array of n floats:	-0.124939
-0.568924	time1; long long timediff[NumberOfTests];	-0.124939
-0.458834	of e.g. four floats.	-0.124939
-0.512008	chains (see p. 22).	-0.124939
-0.358760	with templates // Place	-0.124939
-0.352986	c:2; }; char abc;	-0.124939
-0.659895	unsigned integers is ambiguous	-0.124939
-0.816657	for specific CPU models.	-0.124939
-0.294284	do not 123 correspond	-0.124939
-0.358176	XOP, AMD only _mm_permutevar_ps	-0.124939
-1.070335	with: // Example 7.38b.	-0.124939
-0.659506	{ Table[x] = Y;	-0.124939
-0.358560	writing: __declspec(align(64)) int BigArray[1024];	-0.124939
-0.358898	and 3A and 3B.	-0.124939
-0.586722	through the following steps	-0.124939
-1.397455	more time than looping	-0.124939
-0.596838	int a = Func1(2);	-0.124939
-0.237967	a funda- mentally flawed	-0.124939
-0.806822	programmer to know about.	-0.124939
-0.237967	users with nagging pop-up	-0.124939
-0.358852	in Day for signifying	-0.124939
-0.356965	is called register renaming.	-0.124939
-0.325444	done by me manually,	-0.124939
-0.353233	Example 9.3 #include <malloc.h>	-0.124939
-0.557810	busy doing the spell	-0.124939
-1.105585	are not always sequential,	-0.124939
-0.358749	for (temp = &list[0];	-0.124939
-0.538986	infinity or NAN (Not	-0.124939
-0.357393	solutions may some day	-0.124939
-0.356053	to some extra complications.	-0.124939
-0.314805	are dominating. At least,	-0.124939
-0.357855	as AQtime, Intel VTune	-0.124939
-0.346499	Core Math Library __vrs4_expf	-0.124939
-2.419667	- - - x-xx----x	-0.124939
-0.539728	programs. The profiler identifies	-0.124939
-0.294284	(-a)*(-b)=a*b ---xxx--- a/a=1 --------x	-0.124939
-0.358749	x-xxxx--x (a|b)&(a|c) = a|(b&c)	-0.124939
-0.357024	can hold 8 double's	-0.124939
-0.352417	containers use linked lists.	-0.124939
-0.237967	constants, array initializer lists,	-0.124939
-1.786446	that can be programmed	-0.124939
-0.593023	will be used most.	-0.124939
-0.358760	function. 154 // Print	-0.124939
-0.314805	wrong branch. Microprocessor designers	-0.124939
-0.780061	same processor core. Try	-0.124939
-0.357084	are not stored contiguously	-0.124939
-0.599091	square: // Example 8.1b	-0.124939
-0.358760	x, y; // x,y	-0.124939
-2.159700	Example: // Example 8.1a	-0.124939
-0.358736	option -fwrapv or -fno-strict-overflow.	-0.124939
-0.294284	of a re- usable	-0.124939
-0.358760	has occurred. // Reset	-0.124939
-0.789035	to increase the likelihood	-0.124939
-0.575594	than when a fixed-size	-0.124939
-0.353678	behaviour is implementation dependent.	-0.124939
-0.294284	{ // Remove right-most	-0.124939
-0.237967	= -1 (a&~b)|(~a&b)=a^b ---------	-0.124939
-0.357954	example 14.23 page 143.	-0.124939
-0.358898	operators (&& and ||).	-0.124939
-1.059804	In order to facilitate	-0.124939
-0.237967	C++ v. 3.1, 2007.	-0.124939
-0.599091	classes): // Example 12.9b.	-0.124939
-0.461290	stack at address esp+8	-0.124939
-0.919192	be stored together ......................................	-0.124939
-0.658610	execution speed, memory economy,	-0.124939
-0.607497	not been updated lately.	-0.124939
-0.917495	an array of structures:	-0.124939
-0.870034	diagonal are accessed row-wise,	-0.124939
-1.810780	to make a lookup-table	-0.124939
-0.358824	which can't be reached	-0.124939
-2.159700	Example: // Example 8.16	-0.124939
-0.237967	frequency (in Windows: __rdtsc()).	-0.124939
-0.656602	global offset table (GOT).	-0.124939
-1.735621	example: // Example 8.17	-0.124939
-1.070335	numbers: // Example 8.18	-0.124939
-0.637364	files on access. Sequential	-0.124939
-0.357300	denormal numbers. You may,	-0.124939
-0.358934	the techniques of multithreading.	-0.124939
-0.358310	at runtime. Example 7.43	-0.124939
-0.599091	parameter: // Example 7.42	-0.124939
-1.421313	have to be renewed.	-0.124939
-0.897173	case: // Example 7.45	-0.124939
-0.897173	needed: // Example 7.44	-0.124939
-0.599091	unsigned. // Example 7.4.	-0.124939
-0.336344	of (2,2,2,2,2,2,2,2) Is16vec8 two(2,2,2,2,2,2,2,2);	-0.124939
-0.358736	A for-loop or while-loop	-0.124939
-0.502094	fully compiled code. Compiled	-0.124939
-0.563036	// table of 1/n!	-0.124939
-0.532499	80.9 512 512 378.7	-0.124939
-0.541242	cycles counter is counting	-0.124939
-0.237967	not supported fprintf(stderr, "\nError:	-0.124939
-0.595301	overflow and underflow neutralize	-0.124939
-0.294284	x.f = 2.0f; x.i	-0.124939
-0.237967	513 513 2056 38.1	-0.124939
-0.237967	511 511 2040 38.7	-0.124939
-0.456964	Gnu compiler allows "__attribute__((visibility("hidden")))".	-0.124939
-0.358898	computer games and animations	-0.124939
-0.237967	Architectures Optimization Reference Manual".	-0.124939
-0.463542	library made for demonstration	-0.124939
-0.455365	types of graphics cards,	-0.124939
-0.237967	1./3628800., 1./39916800., 1./4.790016E8, 1./6.22702E9,	-0.124939
-0.358690	table values by hand	-0.124939
-0.559414	to its own caller,	-0.124939
-0.429589	4, 8, 16, 32,	-0.124939
-0.355447	The next line provokes	-0.124939
-0.565165	of the second operand.	-0.124939
-0.598254	choose to make memory-hungry	-0.124939
-0.358943	error message is provoked	-0.124939
-0.726362	20, columns = 32;	-0.124939
-0.358749	92 DynamicArray[i] = WhateverFunction(i);	-0.124939
-0.592012	where they are unavoidable.	-0.124939
-0.630061	compiler option -fno-pic apparently	-0.124939
-0.599869	contained in a DLL.	-0.124939
-0.237967	without cache MOVNTPS _mm_stream_ps	-0.124939
-0.358932	resource-hungry applications to perform	-0.124939
-0.355988	ret ALIGN ; mark_end;	-0.124939
-0.591025	necessary (see page 96).	-0.124939
-0.358760	x2, x); // x^1,	-0.124939
-1.252094	with the option -ftrapv,	-0.124939
-0.503317	__intel_new_strlen in library libircmt.lib.	-0.124939
-0.355253	* (1. / 1.2345);	-0.124939
-1.782583	the code is repetitive.	-0.124939
-0.351327	int n; switch (n)	-0.124939
-0.358431	are critical time consumers.	-0.124939
-0.358939	cases ignore a request	-0.124939
-0.527297	binary representation of N:	-0.124939
-0.463067	Greek[4] = { "Alpha",	-0.124939
-0.357313	5 / 2 (be	-0.124939
-0.358943	the CPUID is artificially	-0.124939
-0.358736	computer game or animation.	-0.124939
-0.358898	the pros and cons	-0.124939
-0.498878	than a certain tolerance.	-0.124939
-0.237967	addresses 0x2F00, 0x3700, 0x3F00	-0.124939
-0.357530	4, anda * 17is	-0.124939
-0.501083	better than its reputation.	-0.124939
-0.461105	unsigned char 64 Iu8vec8	-0.124939
-0.864963	no pointer aliasing (/Oa).	-0.124939
-0.237967	operands: minimum, maximum, saturated	-0.124939
-0.564754	and 32 bit offsets).	-0.124939
-0.347530	mov xor mov $B1$2:	-0.124939
-0.851836	has a good knowledge	-0.124939
-0.507306	list[i].a = 1.0; list[i].b	-0.124939
-0.538271	an executable file stub.	-0.124939
-0.358736	(e.g. Quine–McCluskey or Espresso)	-0.124939
-0.343772	AMD: "Software Optimization Guide	-0.124939
-0.358898	Kernel Library" and "Integrated	-0.124939
-0.358932	time T+1 to T+6,	-0.124939
-0.461685	Here are some examples:	-0.124939
-0.355991	performance. We must bear	-0.124939
-0.541476	... } Here, log(2.0)	-0.124939
-2.060313	n.a. n.a. - andnot(a,a)	-0.124939
-1.070335	numbers: // Example 12.8a.	-0.124939
-0.237967	u; u.i &= 0x7FFFFFFF;	-0.124939
-0.237967	for Windows, -msse2, -mavx,	-0.124939
-0.358736	declaration "static" or "__attribute__((visibility	-0.124939
-0.489145	$B2$2: mov mov 2:8+esp	-0.124939
-0.237967	a = _mm_blendv_epi8(bc, c2,	-0.124939
-0.574486	start to optimize anything,	-0.124939
-0.357505	number of clock pulses	-0.124939
-0.582729	the operating systems disappears	-0.124939
-0.343772	there are search requests	-0.124939
-0.863726	but is less reliable.	-0.124939
-0.489143	work as possible. Typically	-0.124939
-0.358690	the destructor by constructing	-0.124939
-0.294284	is not allowed. Non-public	-0.124939
-0.357313	address below 2 GB,	-0.124939
-0.347528	be shared. Any writable	-0.124939
-0.237967	r1, r2, c1, c2;	-0.124939
-0.358736	file stdint.h or inttypes.h	-0.124939
-0.358307	two threads from attempting	-0.124939
-0.575707	time it takes. Debugging.	-0.124939
-0.591566	optimized by using indexes,	-0.124939
-0.382918	2 a+a+a+a=a*4 -(-a)=a --xxxxxx-	-0.124939
-0.541324	to what the preprocessor	-0.124939
-0.355852	a new processor enters	-0.124939
-0.237967	(gcc v. 4.5.2, July	-0.124939
-0.832111	static const float lookup[2]	-0.124939
-0.237967	rather than -156. Surprisingly,	-0.124939
-0.657174	are equally efficient because,	-0.124939
-0.356614	improve speed without jeopardizing	-0.124939
-0.585051	graphics function that draws	-0.124939
-0.358898	PHP, ASP and UNIX	-0.124939
-0.355449	int 4 AVX _mm256_permutevar_ps	-0.124939
-0.358898	'?', '@' and '$'	-0.124939
-0.352986	such contrived examples exist.	-0.124939
-0.352982	58.7 168.3 Table 9.3.	-0.124939
-1.789812	can be used freely	-0.124939
-0.356614	a unit-test without taking	-0.124939
-0.339514	to compare absolute values:	-0.124939
-0.237967	vectorization Automatic paralleli- zation	-0.124939
-0.357864	memory page size (4096).	-0.124939
-0.429586	24 6 Development process......................................................................................................	-0.124939
-0.294284	all the G values,	-0.124939
-0.237967	vector operands: minimum, maximum,	-0.124939
-0.356576	newsgroups contain useful discussions	-0.124939
-0.591888	well use a #define,	-0.124939
-0.237967	floating point status: _fpreset();	-0.124939
-0.351726	No differences were observed	-0.124939
-0.357914	actually reducing example 15.1d	-0.124939
-0.656556	the whole software package,	-0.124939
-0.575707	rather than allocating piecewise	-0.124939
-0.358011	that contains integer division:	-0.124939
-0.350376	last 8 columns unused.	-0.124939
-0.294284	int b;}; Sab ab[size];	-0.124939
-0.881519	something that can steal	-0.124939
-0.835324	example, x = array[i++]	-0.124939
-0.358898	are CPLDs and FPGAs.	-0.124939
-0.459326	// s += x^n/n!	-0.124939
-0.408223	20 3.7 File access................................................................................................................	-0.124939
-1.082873	be converted to OMF	-0.124939
-0.637376	the data fit nicely	-0.124939
-0.655406	program under test finishes	-0.124939
-0.549129	without the static keyword:	-0.124939
-0.802983	with the static keyword,	-0.124939
-0.237967	is 1 0.5ns. 2GHz	-0.124939
-0.358898	data compression and cryptography	-0.124939
-1.075876	only in the Professional	-0.124939
-0.538860	without the register keyword.	-0.124939
-1.460104	functions such as strcpy,	-0.124939
-0.599091	last: // Example 7.35b	-0.124939
-1.735621	example: // Example 7.35a	-0.124939
-0.593493	database by a plain	-0.124939
-0.382918	(SSE): #include <xmmintrin.h> _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);	-0.124939
-0.237967	without cache MOVNTI _mm_stream_si32	-0.124939
-0.461583	bigger than 2 GB.	-0.124939
-1.896596	then it is advisable	-0.124939
-0.237967	Standard C++ imple- mentations	-0.124939
-0.517240	int64_t 4 AVX2 _mm_i32gather_ps	-0.124939
-0.358749	first. b+c = 100000001.23456.	-0.124939
-0.658630	per element Example 9.6b	-0.124939
-0.552526	work on non-Intel processors).	-0.124939
-0.596825	scheme should be weighed	-0.124939
-0.346500	instruction set, e.g. /arch:SSE2.	-0.124939
-0.237967	C++ Builder 5, 2009).	-0.124939
-0.294284	we have inserted UnusedFiller	-0.124939
-0.522541	compilers has several flaws:	-0.124939
-0.294284	...................................................................................................................... 33 7.5 Booleans...................................................................................................................	-0.124939
-0.353467	follows the mathematical notion	-0.124939
-0.314809	low-power CPUs (Intel Atom).	-0.124939
-0.356937	data exceeds 64 kbytes.	-0.124939
-0.237967	other compilers (Microsoft, Intel)	-0.124939
-0.868636	by a single comparison:	-0.124939
-0.826879	are available from Intel.	-0.124939
-0.444434	function calling conventions. FreeBSD	-0.124939
-1.309053	improve the performance somewhat.	-0.124939
-1.792908	the SSE2 instruction set:	-0.124939
-0.356527	// Dispatcher void SelectAddMul_dispatch(short	-0.124939
-0.353233	Example 16.1 #include <intrin.h>	-0.124939
-1.786446	that can be reduced.	-0.124939
-0.237967	infinity or NAN. Avoiding	-0.124939
-0.460135	any other error reporting	-0.124939
-0.463305	problems associated with profiling,	-0.124939
-0.847376	a discussion of profiling.	-0.124939
-0.525728	small. Are objects numbered	-0.124939
-0.294284	in the planning phase	-0.124939
-0.355253	= 1. / (b1	-0.124939
-0.357607	optimizer. Borland/CodeGear/Embarcadero C++ builder	-0.124939
-0.294284	with fixed strides. Uncached	-0.124939
-1.063811	it is the responsi-	-0.124939
-0.237967	the programmer hasn't thought	-0.124939
-0.657436	vector math library (SVML).	-0.124939
-0.358591	- x-xxx - xx(-)x-	-0.124939
-0.599091	polynomial: // Example 8.23a.	-0.124939
-0.902770	the rows are indexed	-0.124939
-0.503850	multi-core CPUs, but event-counters	-0.124939
-0.354381	may happen quite often.	-0.124939
-0.655328	in embedded systems Microcontrollers	-0.124939
-0.575594	popularity when a genuine	-0.124939
-1.018102	long time to calculate.	-0.124939
-0.331950	and object-oriented programming, modularity,	-0.124939
-1.486517	the sake of modularity.	-0.124939
-0.358932	forward) instruction to localize	-0.124939
-1.797669	you want to optimize,	-0.124939
-0.348402	count on it. Instead	-0.124939
-0.444434	this hot spot. Repeating	-0.124939
-0.478801	running in parallel. Fine-grained	-0.124939
-0.237967	and UNIX shell script.	-0.124939
-0.339510	thread-local storage p. 28)	-0.124939
-0.357606	OneOrTwo5[b!=0]; will also work,	-0.124939
-0.435097	16 in column 28,	-0.124939
-0.860709	pointer has been calculated.	-0.124939
-0.599285	correction for the "FDIV	-0.124939
-0.492293	PathScale C++ v. 3.1,	-0.124939
-0.959548	floating point code slower,	-0.124939
-1.064892	multiplying with the reciprocal:	-0.124939
-0.358736	calling. __fastcall or __attribute__((fastcall)).	-0.124939
-1.054970	( short int cc[size]	-0.124939
-0.237967	("fldl %1 \n fistpl	-0.124939
-0.237967	libraries named MKL, VML	-0.124939
-0.237967	cache (e.g. Sandy Bridge)	-0.124939
-0.339510	Header file MMX mmintrin.h	-0.124939
-0.357253	data types: long long,	-0.124939
-1.132325	that the microprocessor wastes	-0.124939
-0.463657	The division is inexact	-0.124939
-0.586794	geometry and other odd-sized	-0.124939
-0.237967	1./362880., 1./3628800., 1./39916800., 1./4.790016E8,	-0.124939
-1.810780	to make a thread-like	-0.124939
-0.358932	time T to T+5,	-0.124939
-0.358898	Mac OS and Itanium	-0.124939
-0.350373	this problem: 1. Relocation.	-0.124939
-0.503952	discriminates between CPU brands,	-0.124939
-0.421455	230.7 513 513 2056	-0.124939
-0.357229	0; row < NUMROWS;	-0.124939
-0.237967	4. Instruction tables: Lists	-0.124939
-1.267665	+ 1; } module2.cpp	-0.124939
-0.599091	have: // Example 12.8b.	-0.124939
-0.237967	n.a. 1.00 0.35 0.29	-0.124939
-0.599091	1.2f; // Example 14.18c	-0.124939
-0.346504	MMX mmintrin.h SSE xmmintrin.h	-0.124939
-0.347531	expected. Use square blocking:	-0.124939
-0.237967	4.5 0.82 0.59 0.27	-0.124939
-0.294284	1.00 0.25 0.28 0.22	-0.124939
-0.648056	<< 4) | ((C	-0.124939
-0.352983	& 0x0F) | ((B	-0.124939
-0.463410	{ b[i] = Func(a[i]);	-0.124939
-1.055545	if a program creates	-0.124939
-0.586722	Instead, the following work-around	-0.124939
-0.354387	objects for intermediate results,	-0.124939
-0.358550	call (other than log)	-0.124939
-0.294284	---x----- x---x---x x-xxx---- a*b*c=a*(b*c)	-0.124939
-0.357430	faster than any non-vector	-0.124939
-0.358824	the container be recycled?	-0.124939
-0.358666	ADC (add with carry)	-0.124939
-0.353873	and perhaps Mac OS.	-0.124939
-1.496616	} } The FactorialTable	-0.124939
-0.351325	printf("\n%2i %10I64i", i, timediff[i]);	-0.124939
-0.579256	but the programmer can.	-0.124939
-0.237967	flag (e.g. DEC, JNZ).	-0.124939
-0.649438	a critical dependency chain,	-0.124939
-0.456088	(or in addition to)	-0.124939
-0.237967	---xx---- (a+c==b+c)=(a==b) ----x---- !(a<b)=(a>=b)	-0.124939
-0.589696	below on page 134.	-0.124939
-0.325444	As table 9.3 shows,	-0.124939
-0.504742	cycles whenever it feeds	-0.124939
-0.352712	&& list[i] > 1.0)	-0.124939
-0.341900	due to general improvements	-0.124939
-0.580353	function" has been introduced	-0.124939
-0.339514	few programs do. Hence,	-0.124939
-0.294284	instruction set (called x86)	-0.124939
-2.159700	Example: // Example 8.2a	-0.124939
-1.362696	by // Example 8.2b	-0.124939
-0.355924	times and calls alternately	-0.124939
-0.314805	type. Interrupt service routines	-0.124939
-0.598331	call the function billions	-0.124939
-0.715066	AMD Opteron K8 1.09	-0.124939
-0.358630	calculate xn as x4∙xn-4.	-0.124939
-0.591025	compiler (see page 103),	-0.124939
-0.350879	back to around 1980	-0.124939
-0.885070	2 in example 14.7b,	-0.124939
-0.599091	2: // Example 14.7b.	-0.124939
-0.237967	mainstream next year. Ignoring	-0.124939
-0.593769	parameters are not affected	-0.124939
-0.237967	y.c + 3.; x.d	-0.124939
-0.352413	i; } x; x.f	-0.124939
-1.362696	this: // Example 7.9b	-0.124939
-1.735621	example: // Example 7.9a	-0.124939
-0.570806	effect is simply identical.	-0.124939
-0.615826	an exception occurs somewhere	-0.124939
-0.352981	a&&(b||c) !a && !b	-0.124939
-0.581110	because the memory bus	-0.124939
-0.325444	a First-In-First- Out (FIFO)	-0.124939
-0.356484	a safe programming practice,	-0.124939
-1.328250	loop in example 8.24	-0.124939
-2.159700	Example: // Example 8.25	-0.124939
-0.537501	an object file disassembler.	-0.124939
-0.982355	the Boolean operators &&,	-0.124939
-2.159700	Example: // Example 8.20	-0.124939
-0.884044	be calculated as (critical	-0.124939
-2.159700	Example: // Example 8.22	-0.124939
-0.538702	capabilities are very smart.	-0.124939
-1.362696	this: // Example 12.9a.	-0.124939
-0.463542	by 16 for SSE2,	-0.124939
-0.357779	{return r.a + r.b;}	-0.124939
-0.348402	and shift operations. Multiplying	-0.124939
-0.598553	++i and the post-increment	-0.124939
-0.581345	of information about bugs,	-0.124939
-0.358736	to C0::f or C1::f.	-0.124939
-0.237967	256 F32vec4 F64vec2 F32vec8	-0.124939
-0.237967	{...} // Dispatcher. Will	-0.124939
-0.325441	must warn against overkill.	-0.124939
-1.362696	by // Example 8.3b	-0.124939
-0.463173	feature uses an ordinary	-0.124939
-0.356614	oriented programming without paying	-0.124939
-0.237967	See Intel Technology Journal	-0.124939
-0.596838	float a = -1.0E8,	-0.124939
-0.294284	may cause slight imprecision	-0.124939
-0.564452	chosen compiler doesn't provide	-0.124939
-0.560041	do integer operations in-between	-0.124939
-0.870661	sets the variable __intel_cpu_feature_indicator_x.	-0.124939
-0.763061	have to save recovery	-0.124939
-0.733594	Windows Template Library (WTL).	-0.124939
-0.237967	Glibc v. 2.7, 2.8.	-0.124939
-0.525188	where each bit indicates	-0.124939
-0.538986	*(p++) |= 0x20; 46	-0.124939
-0.733594	Windows Template Library (WTL):	-0.124939
-0.331950	suffer from mispredictions. 44	-0.124939
-0.439073	make a reliable decision.	-0.124939
-0.294284	in x86 systems). 42	-0.124939
-0.325441	into a place indicated	-0.124939
-0.237967	sets. Covers PC's, workstations	-0.124939
-0.237967	for(i=0,i2=0; i<100; i++,i2+=2.0f)a[i]=i2; 41	-0.124939
-0.294284	512 512 2048 230.7	-0.124939
-0.554112	-fpic is much faster,	-0.124939
-0.447886	> 0) ? (cc[i]	-0.124939
-0.585543	is the variable 85	-0.124939
-0.358265	--combine -fwhole- program /Qipo	-0.124939
-0.382918	is quite tedious indeed.	-0.124939
-0.561759	a[0] = 1; a[1]	-0.124939
-0.351726	| (C << 6);	-0.124939
-0.358852	of attack for hackers.	-0.124939
-0.659895	the multiplication is exact.	-0.124939
-0.358852	row, column; for (row	-0.124939
-0.356655	though less user friendly.	-0.124939
-0.478078	value is poorly predictable,	-0.124939
-0.997714	graphical user interface (OnIdle	-0.124939
-0.570575	distributors are often abusing	-0.124939
-0.356053	a leaf function. Leaf	-0.124939
-0.237967	"Alpha", "Beta", "Gamma", "Delta"	-0.124939
-0.358749	8, Thursday = 0x10,	-0.124939
-0.429589	45 7.14 Functions ................................................................................................................	-0.124939
-0.358736	to C1::Disp() or C2::Disp()	-0.124939
-0.357806	or goes into sleep	-0.124939
-0.237967	also called Single-Instruction-Multiple-Data (SIMD)	-0.124939
-0.358563	branch (e.g. an if-else	-0.124939
-0.345275	"C" int CriticalFunction ();	-0.124939
-0.591508	library, you are feeding	-0.124939
-0.294284	a*1=a (-a)*(-b)=a*b a/a=1 ----x---x	-0.124939
-0.358838	on hacks that violate	-0.124939
-0.594095	1. See page 34.	-0.124939
-0.237967	2.23 0.95 0.6 1.19	-0.124939
-0.294284	x-xxxxxxx ---x----- x--xx---- (a&&b)||(a&&!b)=a	-0.124939
-0.451232	except in special mathe-	-0.124939
-0.595332	language, such as VHDL	-0.124939
-0.892681	updates should be postponed	-0.124939
-0.350376	32-bit systems gives rise	-0.124939
-0.589196	d; unsigned int u[2]}	-0.124939
-0.294284	email and web browsing	-0.124939
-0.878140	use a loop counter:	-0.124939
-0.237967	list[] = {1.1, 0.3,	-0.124939
-0.352413	scientific vector processors. Henry	-0.124939
-0.357433	bit mode, we encounter	-0.124939
-1.104854	the data cache. Bit-fields	-0.124939
-0.358816	the output are unacceptable.	-0.124939
-0.353673	variable if their live-ranges	-0.124939
-0.294284	0.77 0.89 0.40 0.30	-0.124939
-0.358760	// Time // Serialize	-0.124939
-0.343772	IA-32 Architectures Optimization Reference	-0.124939
-0.358431	course also time consuming,	-0.124939
-0.348409	about bugs, compatibility problems,	-0.124939
-0.656602	global offset table (GOT)	-0.124939
-0.715066	AMD Opteron K8 0.38	-0.124939
-0.237967	(-a==-b)=(a==b) ---xx---- (a+c==b+c)=(a==b) ----x----	-0.124939
-0.851713	smart pointer is created,	-0.124939
-0.237967	__declspec(__align(64)) double matrix[SIZE][SIZE]; transpose(matrix);	-0.124939
-0.358019	to prevent cache contention.	-0.124939
-0.357332	& (N-1)) return powN<(N1&(N1-1))==0,N1>::p(x)	-0.124939
-0.358852	all squares: for (r1	-0.124939
-0.358690	intelligible way by wrapping	-0.124939
-0.600620	b can be omitted,	-0.124939
-0.325441	* p) {return p->a	-0.124939
-0.698937	is not necessarily newer.	-0.124939
-0.237967	defines hardware circuits consisting	-0.124939
-0.572868	we have an estimated	-0.124939
-0.550648	to CPU dispatching. Underestimating	-0.124939
-0.592745	and stored in edx.	-0.124939
-0.358898	variables Y and Z.	-0.124939
-0.237967	64 and IA-32 Architectures	-0.124939
-0.358736	global variables or hide	-0.124939
-0.481368	the empty throw() specification	-0.124939
-0.237967	done by fetching, decoding	-0.124939
-0.237967	handler calls exit(), abort(),	-0.124939
-0.659109	to use than others.	-0.124939
-0.355536	Y; Y += Z;	-0.124939
-0.822061	Aligning dynamically allocated memory.................................................................	-0.124939
-0.863736	should also be considered.	-0.124939
-1.036604	of code in general.	-0.124939
-0.599091	class: // Example 7.38a.	-0.124939
-0.595041	2B. There are hundreds	-0.124939
-0.357862	pure_function ; double Func2(double	-0.124939
-0.529921	date): Microsoft Visual studio	-0.124939
-1.337322	don't have to reinvent	-0.124939
-0.358934	exceeding that of yesterday's	-0.124939
-0.349799	works (gcc v. 4.5.2,	-0.124939
-0.237967	?Func@@YAXQAHAAH@Z ?Func@@YAXQAHAAH@Z PROC NEAR	-0.124939
-0.358856	with these. The CodeGear,	-0.124939
-0.460526	linking. The file http://www.agner.org/optimize/asmlib.zip	-0.124939
-0.435094	(three on CodeGear compiler)	-0.124939
-0.294284	14.4 511 511 2040	-0.124939
-0.599091	polymorphism: // Example 7.43a.	-0.124939
-0.841795	are implemented as recursive	-0.124939
-0.294284	are not testing. Trying	-0.124939
-0.355447	29 with line 29.	-0.124939
-0.356614	of F1 without returning.	-0.124939
-0.355707	b memcpy(b, a, sizeof(b));	-0.124939
-0.355701	operating system thread scheduler.	-0.124939
-0.656602	The following table summarizes	-0.124939
-0.463523	can happen that (b*c)	-0.124939
-0.354534	function that simply prints	-0.124939
-0.588831	"Zen of code optimization",	-0.124939
-0.325448	a = parabola (2.0f);	-0.124939
-0.873771	int i; ... list[i	-0.124939
-0.358852	// Template for pow(x,N)	-0.124939
-0.558539	[edx] DWORD PTR [eax+400]	-0.124939
-0.596503	100 rather than -156.	-0.124939
-1.074876	object can be speeded	-0.124939
-0.358749	3.; x.d = y.d	-0.124939
-0.600620	work-around can be used:	-0.124939
-0.585543	than the variable m.	-0.124939
-0.358560	(int x, int m)	-0.124939
-0.463410	... x.a = y.a	-0.124939
-0.463410	1.; x.b = y.b	-0.124939
-0.463410	2.; x.c = y.c	-0.124939
-0.237967	of code optimization", Coriolis	-0.124939
-0.314805	16, 32, 64, ...).	-0.124939
-0.463749	m;} template <int m>	-0.124939
-0.237967	INVALID_HANDLE_VALUE && WriteFile(handle, ...))	-0.124939
-0.600955	error in the oldest	-0.124939
-0.358898	point registers and correspondingly	-0.124939
-0.504077	processors). It has excellent	-0.124939
-0.527024	f=i; f = (float)i;	-0.124939
-0.900884	functions in the grandparent	-0.124939
-0.237967	Vec16us Vec8i Vec8ui Vec4q	-0.124939
-0.294284	eax / sar ebx,1	-0.124939
-0.659417	} u; if (u.i[1]	-0.124939
-0.559653	because of its simplicity.	-0.124939
-0.339514	many processes simultaneously. Actually,	-0.124939
-0.314809	applies to 3-dimensional geometry	-0.124939
-0.237967	int 128 Is32vec4 Vec4i	-0.124939
-0.567067	is one that saves	-0.124939
-0.358600	Assume pointer not aliased	-0.124939
-0.237967	Vec4f Vec2d Vec8f Vec4d	-0.124939
-0.237967	13 objects, respectively (MS	-0.124939
-0.237967	make a thread-like scheduling	-0.124939
-0.347528	endian storage (e.g. PowerPC).	-0.124939
-1.872906	there is no guarantee	-0.124939
-0.237967	model is over. Virtualization	-0.124939
-0.581226	find the best algorithm.	-0.124939
-0.501655	core will always compete	-0.124939
-0.595245	need to do searches	-0.124939
-0.764356	same register for both,	-0.124939
-0.346506	by using nontemporal writes.	-0.124939
-0.358736	by *p or p->member	-0.124939
-0.587987	element. I have confirmed	-0.124939
-0.358898	execution units and hence	-0.124939
-0.354672	? EXCEPTION_EXECUTE_HANDLER : EXCEPTION_CONTINUE_SEARCH)	-0.124939
-0.599091	enabled: // Example 14.21.	-0.124939
-0.358630	not quite as versatile.	-0.124939
-0.358898	modularity, reusability and systematization	-0.124939
-0.358300	a smaller memory footprint.	-0.124939
-0.504977	functions memset and memcpy:	-0.124939
-0.358690	this section by summing	-0.124939
-0.358300	execution units, memory ports,	-0.124939
-0.314805	an || expression. Assume,	-0.124939
-0.358856	hardware design. The ultimate	-0.124939
-0.788977	register stack is organized.	-0.124939
-0.358898	string searching and parsing	-0.124939
-0.325444	of Denmark. Copyright ©	-0.124939
-0.504918	security problem. The official	-0.124939
-1.041535	the function. This fragmentation	-0.124939
-0.237967	The clumsy AND-OR construction	-0.124939
-0.764264	the code are modified,	-0.124939
-1.046789	that it takes 40%	-0.124939
-0.349799	studio 2008, v. 9.0	-0.124939
-0.237967	the option -read_only_relocs suppress.	-0.124939
-0.505040	program efficiency is reflected,	-0.124939
-0.358898	page 136 and 137,	-0.124939
-0.346503	an error condition terminates	-0.124939
-0.339510	(see above, p. 26).	-0.124939
-1.067965	can be predicted perfectly.	-0.124939
-0.356485	64 64 32 16.4	-0.124939
-0.237967	optimization /GL --combine -fwhole-	-0.124939
-1.737215	the program is terminated	-0.124939
-0.325441	fraction 2 63 .	-0.124939
-0.358600	arrays forwards, not backwards.	-0.124939
-0.237967	{ "Alpha", "Beta", "Gamma",	-0.124939
-0.357272	like -(-a) very often,	-0.124939
-0.325444	buffer, branch pattern history,	-0.124939
-0.352419	int x; public: c1()	-0.124939
-0.596515	integer because the integer-to-float	-0.124939
-0.881182	{ const int arraysize	-0.124939
-0.314805	256 Vec32uc Vec16s Vec16us	-0.124939
-0.294284	-(-a)=a --xxxxxx- a-(-b)=a+b ---xxx-x-	-0.124939
-0.355699	'this' pointer, common subexpressions,	-0.124939
-0.505915	linker to remove unreferenced	-0.124939
-0.596159	end with a non-recursing	-0.124939
-1.555593	the Intel compiler puts	-0.124939
-0.237967	of Mathcad (v. 15.0)	-0.124939
-0.550816	compile-time generation of identifier	-0.124939
-0.355778	But this language gained	-0.124939
-0.294284	the early planning stage	-0.124939
-0.314805	i2; for(i=0,i2=0; i<100; i++,i2+=2.0f)a[i]=i2;	-0.124939
-1.194952	an intermediate code (byte	-0.124939
-0.357712	ReadTSC() from library asmlib..	-0.124939
-0.504913	optimization or for combining	-0.124939
-0.358901	or limited in scope.	-0.124939
-0.358690	120 ms by selecting	-0.124939
-0.799414	large data structures .............................................................	-0.124939
-0.349150	windows, mutexes, database connections,	-0.124939
-0.358852	list[100], *temp; for (temp	-0.124939
-0.586501	errors must be added.	-0.124939
-0.294284	a discussion. 7.33 Namespaces	-0.124939
-0.572401	structure and then merge	-0.124939
-0.591025	integers (see page 142).	-0.124939
-0.595332	9.2, such as flush	-0.124939
-0.237967	include JavaScript, PHP, ASP	-0.124939
-0.354534	are not well documented.	-0.124939
-0.586957	circumvent operating system standards.	-0.124939
-0.382918	37 7.8 Member pointers.......................................................................................................37	-0.124939
-0.358852	software module for correctness	-0.124939
-0.525163	involving class objects (rather	-0.124939
-0.294284	other is -0 (zero	-0.124939
-0.358560	first time int CriticalFunction_Dispatch(int	-0.124939
-1.199718	Contentions in the BTB	-0.124939
-0.463749	Some compilers offer profile-guided	-0.124939
-0.615826	PUBLIC ?Func@@YAXQAHAAH@Z ?Func@@YAXQAHAAH@Z PROC	-0.124939
-0.832426	// exponent + 0x3FF	-0.124939
-2.159700	Example: // Example 7.32a	-0.124939
-0.355988	= string; while (*p	-0.124939
-0.343771	problems are usability issues,	-0.124939
-0.572316	2016. The same coding	-0.124939
-0.460584	F1(int x[]); void F2(float	-0.124939
-0.787858	Overview of compiler options.......................................................................................	-0.124939
-0.832426	// exponent + 0x3FFF	-0.124939
-0.294284	add cmp jl $B1$3:	-0.124939
-0.858208	i++) a[i] = 0.0;	-0.124939
-0.237967	as example 12.4b, rewritten	-0.124939
-0.294284	such as semaphores, mutexes	-0.124939
-0.454946	int64_t 2 AVX2 _mm256_i64gather_epi32	-0.124939
-0.463645	possible point of attack	-0.124939
-0.294284	than the external clock.	-0.124939
-0.294284	as pow, log, exp,	-0.124939
-0.356528	different user access rights.	-0.124939
-0.570575	counts are often fluctuating	-0.124939
-1.185421	problem is to combine	-0.124939
-0.456084	!(a<b)=(a>=b) (a<b && b<c	-0.124939
-0.463445	This solution can incur	-0.124939
-0.576079	files are also included.	-0.124939
-0.527123	these directives are compiler-specific.	-0.124939
-1.486517	the sake of security,	-0.124939
-0.527182	Reinterpret cast The reinterpret_cast	-0.124939
-0.541184	bypassing the so-called CPU-dispatcher	-0.124939
-0.580451	+ b * 1.5f;	-0.124939
-0.358186	The three functions Sum1,	-0.124939
-0.457865	write configuration files (*.ini	-0.124939
-0.541242	even integer is returned.	-0.124939
-0.237967	Opteron K8 1.09 1.25	-0.124939
-0.408223	0.75 0.18 0.11 1.21	-0.124939
-1.292910	of a class (also	-0.124939
-0.341900	and advanced prediction mechanisms.	-0.124939
-0.294284	processors. Henry S. Warren,	-0.124939
-0.294284	columns; j++) 39 matrix[i][j]	-0.124939
-0.358932	OK, however, to pass	-0.124939
-0.559835	similar CPU dispatch mechanisms,	-0.124939
-0.336344	Function inlining x-xxxx--x Constantfolding	-0.124939
-0.527024	* CriticalFunction = &CriticalFunction_Dispatch;	-0.124939
-0.598749	package is not traditionally	-0.124939
-0.810696	in simple cases. Database	-0.124939
-0.314805	integers to alias upon	-0.124939
-0.353877	program - preferably isolated	-0.124939
-0.917633	to do a thorough	-0.124939
-0.346503	== EXCEPTION_FLT_OVERFLOW ? EXCEPTION_EXECUTE_HANDLER	-0.124939
-0.349801	bitwise AND operation isolates	-0.124939
-0.237967	in my crystal ball	-0.124939
-0.358096	fma4intrin.h (Gnu) all intrin.h	-0.124939
-0.358760	/ 10; // Convert	-0.124939
-1.460104	functions such as pow,	-0.124939
-0.558497	using a common denominator	-0.124939
-0.341905	min) <= (unsigned int)(max	-0.124939
-0.325441	a simple regular pattern,	-0.124939
-0.955205	should be aware of.	-0.124939
-0.237967	i<100; i++,i2+=2.0f)a[i]=i2; 41 Float	-0.124939
-0.697017	an efficient solution. Sort	-0.124939
-0.461060	delete, and often excessively	-0.124939
-0.237967	have been reordered, inlined,	-0.124939
-0.294284	then be repeated 1024/4	-0.124939
-0.439075	&& i <= max)	-0.124939
-0.568235	e.g. the option /QaxAVX	-0.124939
-0.463038	explanation why this delaying	-0.124939
-0.358736	speed /O2 or /Ox	-0.124939
-0.512963	No stack frame /Oy	-0.124939
-0.358898	by n and reorganize	-0.124939
-0.358736	use internet or intranet	-0.124939
-0.352712	* 2 > v.i	-0.124939
-0.347530	vector register containing (2,2,2,2),	-0.124939
-0.565949	the library functions directly:	-0.124939
-0.336351	if (handle != INVALID_HANDLE_VALUE	-0.124939
-0.237967	Borland C++ 5.82 (Embarcadero/CodeGear/Borland	-0.124939
-0.864963	no pointer aliasing /Oa	-0.124939
-0.356837	-O3 Interprocedural optimization /Og	-0.124939
-0.594095	(RTTI). See page 54.	-0.124939
-0.358898	latencies, throughputs and micro-operation	-0.124939
-0.237967	Boolean operators &&, ||,	-0.124939
-0.357805	a XOR b Bit	-0.124939
-0.912392	number of possible inputs.	-0.124939
-0.594171	generate the value infinity,	-0.124939
-0.455366	negative inputs give infinity.	-0.124939
-0.343769	it from fully utilizing	-0.124939
-1.014832	is a simple solution,	-0.124939
-0.294284	vectorization less favorable: Larger	-0.124939
-0.580353	STL has been criticized	-0.124939
-0.835287	Clang, Intel or PathScale.	-0.124939
-0.347536	possibility of algebraic reduction.	-0.124939
-0.897557	chosen for the label.	-0.124939
-0.761194	may be faster despite	-0.124939
-0.500619	Optimize for speed /O2	-0.124939
-0.869891	user has to reinstall	-0.124939
-0.505092	runs under the framework,	-0.124939
-0.355152	a vector, uses SSE3.	-0.124939
-1.020719	in a particular situation,	-0.124939
-0.352990	and % means modulo.	-0.124939
-0.596838	8.3b a = 5.0f;	-0.124939
-0.294284	8.12b int a[2]; a[0]	-0.124939
-1.030323	how to avoid hard-to-find	-0.124939
-0.357899	allocated memory, using new.	-0.124939
-0.237967	floating point -ffast-math /fp:fast	-0.124939
-0.586501	i must be adjusted	-0.124939
-0.358816	oriented programming are dominating.	-0.124939
-0.356614	the same without discriminating	-0.124939
-0.463643	was down to 36.	-0.124939
-0.237967	Last updated 2014-08-07. Contents	-0.124939
-0.237967	v. 2.7, 2.8. Asmlib:	-0.124939
-0.358939	example converts a zero-terminated	-0.124939
-0.463657	this unit is pipelined,	-0.124939
-0.875438	efficient than a polymorphous	-0.124939
-0.358898	defining _mm_malloc and _mm_free.	-0.124939
-0.550037	pipeline and later discovers	-0.124939
-0.294284	without cache MOVNTPD _mm_stream_pd	-0.124939
-0.357661	patterns containing multiple streams	-0.124939
-0.325444	without cache MOVNTQ _mm_stream_pi	-0.124939
-0.461824	(requires binutils version 2.20,	-0.124939
-0.358431	at regular time intervals.	-0.124939
-0.549247	modules (See page 81).	-0.124939
-0.657022	- 45 clock cycles).	-0.124939
-0.357530	For example,a * 16is	-0.124939
-0.567120	while pointers and non-constant	-0.124939
-0.358943	it (&ArraySize) is taken.	-0.124939
-0.504977	annoyingly long and irregular	-0.124939
-0.517727	that the linker extracts	-0.124939
-0.541242	its address is taken,	-0.124939
-0.643912	objects in computer games	-0.124939
-0.581823	1 b = lrint(d);	-0.124939
-0.521735	4 int 128 Is32vec4	-0.124939
-0.524357	unsigned int 64 Is32vec2	-0.124939
-0.355536	design of small microcontrollers:	-0.124939
-0.572993	or function call (other	-0.124939
-0.237967	-fwhole- program /Qipo -ipo	-0.124939
-0.358560	(int a, int x[])	-0.124939
-0.357862	int dummy; double a[arraysize],	-0.124939
-0.343779	break; case 3: printf("Delta");	-0.124939
-0.357358	by default, so 1.2	-0.124939
-0.504561	= 1. This ends	-0.124939
-0.294284	unable to respond quickly	-0.124939
-0.350374	relieving a syntax restriction,	-0.124939
-0.595332	purposes such as email	-0.124939
-0.349154	All x86 platforms (Windows,	-0.124939
-0.336344	of list plus i*sizeof(S1).	-0.124939
-0.600955	bytes in the end.	-0.124939
-0.460979	of position-independent code. 147	-0.124939
-1.239656	at a time packed	-0.124939
-0.354387	reads from addresses 0x2F00,	-0.124939
-1.115499	have an option (Windows:	-0.124939
-0.352711	a&&(b||c) (a&&!b) || (!a&&b)	-0.124939
-0.358898	a temp1 and temp2.	-0.124939
-0.237967	A limited "express" edition	-0.124939
-0.883343	to the critical stride,	-0.124939
-0.592074	distance the critical stride.	-0.124939
-0.237967	calculated as (critical stride)	-0.124939
-0.575596	thread than to temporarily	-0.124939
-0.502700	uncommon for software teachers	-0.124939
-0.358898	of starting and stopping	-0.124939
-0.594171	to the value 0x2C	-0.124939
-0.525859	first generation class (CGrandParent)	-0.124939
-0.358300	computers have memory caches.	-0.124939
-0.354224	v.f are both positive.	-0.124939
-0.237967	Mostly obsolete. Rick Booth:	-0.124939
-0.382918	High precision math. Libraries	-0.124939
-0.237967	metaprogramming so complicated? Because	-0.124939
-0.358760	* 2; // Find	-0.124939
-0.358736	in 2015 or 2016.	-0.124939
-0.498969	Boolean algebra reductions: !(!a)=a	-0.124939
-0.504883	is infinity or NAN.	-0.124939
-0.325441	each integer type. Interrupt	-0.124939
-0.598272	intended to be platform-independent	-0.124939
-0.237967	such as strcpy, strcat,	-0.124939
-0.237967	afterwards a BSF (bit	-0.124939
-0.358630	the arrays as required,	-0.124939
-0.347530	big file containing numerical	-0.124939
-0.358856	name mangling. The characters	-0.124939
-0.600620	estimate can be made)	-0.124939
-0.463645	all sizes of matrices.	-0.124939
-0.237967	Monday, Tuesday, Wednesday, Thursday,	-0.124939
-0.357230	to their 32-bit counterparts.	-0.124939
-0.594095	containers. See page 90.	-0.124939
-0.789014	mix float and double.....................................................................................	-0.124939
-1.574702	at compile time. (Of	-0.124939
-1.073431	of the C++ language......................................................	-0.124939
-0.237967	32 bits (rarely 64).	-0.124939
-0.600955	wasteful in the STL.	-0.124939
-1.115234	cases where it matters:	-0.124939
-1.088531	- a & 0=	-0.124939
-0.358901	be determined in advance,	-0.124939
-0.847400	Integers variables and operators...............................................................................	-0.124939
-0.591025	intended (see page 84).	-0.124939
-0.294284	80x86 / x64 (Visual	-0.124939
-0.352982	point multiply-and-add Table 13.1.	-0.124939
-0.357899	memory allocation using new/delete	-0.124939
-0.599091	point: // Example 14.22b	-0.124939
-2.159700	Example: // Example 14.22a	-0.124939
-0.582035	optimal from a technological	-0.124939
-0.356573	the performance even matters,	-0.124939
-0.358116	an interpreter which interprets	-0.124939
-0.637364	files on access. Run	-0.124939
-1.352769	are useful for vectorizing	-0.124939
-1.222903	first byte at 400,	-0.124939
-0.237967	on C++ Performance". www.open-	-0.124939
-0.599091	is. // Example 15.1d.	-0.124939
-0.343769	avoids the overflow. Taking	-0.124939
-1.282413	has to be reloaded	-0.124939
-0.586722	has the following features:	-0.124939
-0.358852	E-book Usability for Nerds	-0.124939
-1.636264	versions of the user-written	-0.124939
-0.358344	function. The } 59	-0.124939
-0.237967	LIBM Library amd_vrs4_expf amd_vrd2_exp	-0.124939
-0.357530	= 2 * 5;	-0.124939
-0.579312	This solution is clearly	-0.124939
-0.237967	pointer aliasing /Oa -fno-alias	-0.124939
-0.354381	compiler does quite ingenious	-0.124939
-0.294284	the same template. 57	-0.124939
-0.558497	making a common denominator:	-0.124939
-0.357862	pure_function #endif double Func1(double)	-0.124939
-0.343768	the SSE2 (or later)	-0.124939
-0.358934	Technical University of Denmark.	-0.124939
-1.156404	static inline void StoreNTD(double	-0.124939
-0.314805	than a float. (Both	-0.124939
-0.237967	(Embarcadero/CodeGear/Borland C++ Builder 5,	-0.124939
-0.358901	dependency chain in two:	-0.124939
-2.159700	Example: // Example 14.18a	-0.124939
-0.599091	precision: // Example 14.18b	-0.124939
-0.594095	function. See page 53.	-0.124939
-0.561899	zero, c + two,	-0.124939
-0.639073	:1;//signbit }; struct Slongdouble	-0.124939
-0.599091	union: // Example 9.2b	-0.124939
-2.159700	Example: // Example 9.2a	-0.124939
-0.525305	or 64-bit mode. Much	-0.124939
-0.580563	line would be evicted.	-0.124939
-0.358932	update mechanism to advertise	-0.124939
-0.408223	---xx--xx (-a==-b)=(a==b) ---xx---- (-a>-b)=(a<b)	-0.124939
-0.358816	time intervals are short.	-0.124939
-0.594095	requested. See page 45.	-0.124939
-0.237967	1./40320., 1./362880., 1./3628800., 1./39916800.,	-0.124939
-0.358096	can replace all occurrences	-0.124939
-0.294284	the processing power. Connecting	-0.124939
-0.591025	checking (see page 134)	-0.124939
-0.539678	16. In example 12.1a,	-0.124939
-0.237967	as memcpy, memmove, memset,	-0.124939
-0.358943	the original is destroyed.	-0.124939
-0.357433	= 4, we have:	-0.124939
-0.591566	zero by using memset:	-0.124939
-0.442080	(0, 2, 4, etc.).	-0.124939
-0.555128	mentioned in table 9.2,	-0.124939
-0.237967	does incredibly stupid things.	-0.124939
-0.463038	work around this limitation	-0.124939
-1.735621	example: // Example 8.24.	-0.124939
-0.325444	access to virus attacks	-0.124939
-0.599091	condition: // Example 7.32b	-0.124939
-0.358856	that doesn’t. The undocumented	-0.124939
-1.118387	that we can surely	-0.124939
-0.325444	it with 2n -1.	-0.124939
-0.788593	following conditions are met:	-0.124939
-1.737215	the program is shut	-0.124939
-1.484517	in the following sections.	-0.124939
-0.237967	strange and unexpected behaviors.	-0.124939
-0.635365	Optimizes reasonably well. Very	-0.124939
-0.346502	C++ compilers allow assembly-like	-0.124939
-0.237967	32-bit software development", Addison-	-0.124939
-0.541324	problem are the following:	-0.124939
-0.463705	To explain the difference,	-0.124939
-0.599584	access is a bottleneck.	-0.124939
-0.347532	form a logical sequence.	-0.124939
-0.915851	test () { __declspec(__align(64))	-0.124939
-0.570193	below. The function rounds	-0.124939
-0.348401	result of macro expansions.	-0.124939
-0.589099	and have a temp1	-0.124939
-0.355253	= (0x2710 / 0x40)	-0.124939
-0.593496	integer value of temp.	-0.124939
-0.358344	(time before) } printf("\nResults:");	-0.124939
-0.237967	project at hand. Low-level	-0.124939
-0.356283	= (A & 0x0F)	-0.124939
-0.839993	type identification (RTTI) ...........................................................................	-0.124939
-1.601186	x) { return powN<true,N/2>::p(x)	-0.124939
-0.237967	optimized code (release version)	-0.124939
-0.657621	a to b memcpy(b,	-0.124939
-0.237967	dummy; double a[arraysize], b[arraysize],	-0.124939
-0.599819	closest to the truth	-0.124939
-0.505092	will support the ADX	-0.124939
-0.527362	a loop of ADC	-0.124939
-1.628907	is important to realize	-0.124939
-0.237967	a specific purpose: Contain	-0.124939
-0.331953	9 and 13 objects,	-0.124939
-0.355152	2 int64_t 128 I64vec2	-0.124939
-0.598310	Windows may be mitigated	-0.124939
-0.294284	-mssse3 -msse4.1 -mAVX -axSSE3,	-0.124939
-0.573308	than pointers to objects)	-0.124939
-0.573271	be available in 2015	-0.124939
-0.595505	improved is that r+i/2	-0.124939
-0.463005	underestimate this time lag.	-0.124939
-0.358898	look clumsy and tedious.	-0.124939
-0.458230	in microprocessor hardware design.	-0.124939
-0.357271	well optimized software design,	-0.124939
-0.358307	user interfaces from scratch.	-0.124939
-0.831035	Intel Core 2 0.63	-0.124939
-0.294284	a*1=a x-xxxxx-x (-a)*(-b)=a*b ---xxx---	-0.124939
-0.350876	graphics processors. 5 Programmable	-0.124939
-0.599677	likely that the producer	-0.124939
-0.818049	more than it says.	-0.124939
-0.314805	(Windows: /Gy, Linux: -ffunction-sections)	-0.124939
-0.659895	memory block is re-allocated	-0.124939
-0.726716	each bit in nn	-0.124939
-0.347528	carry flag (e.g. DEC,	-0.124939
-0.823011	transferred in registers, whereas	-0.124939
-0.358760	(double)(signed int)u; // Faster,	-0.124939
-0.294284	Pure function. __attribute__((const)) (Linux	-0.124939
-0.237967	5 * 0.5 ns	-0.124939
-0.237967	2005; and "More Effective	-0.124939
-0.355624	SSE4A ammintrin.h AMD XOP	-0.124939
-0.594142	(!a&&b) = a XOR	-0.124939
-0.891199	used as a stand	-0.124939
-0.237967	-ffast-math /fp:fast /fp:fast=2 -fp-model	-0.124939
-0.237967	than a polymorphous class?	-0.124939
-0.527290	(or malloc and free)	-0.124939
-0.237967	1./6.22702E9, 1./8.71782E10, 1./1.30767E12, 1./2.09227E13};	-0.124939
-0.591025	1 (see page 135).	-0.124939
-0.347529	because of disk caching,	-0.124939
-0.507310	<stdio.h> // define fprintf	-0.124939
-0.505035	in Sum2 and Sum3.	-0.124939
-0.590472	exceptions in this block:	-0.124939
-0.237967	been doubled. Thin clients	-0.124939
-0.349799	Watcom C/C++ v. 1.4,	-0.124939
-0.237967	column < NUMCOLUMNS; column++)	-0.124939
-0.358255	another error has occurred	-0.124939
-0.352088	a version control tool.	-0.124939
-0.901482	overhead of the iterator	-0.124939
-0.358563	by performing an illegal	-0.124939
-2.159700	Example: // Example 8.6a	-0.124939
-1.362696	by // Example 8.6b	-0.124939
-1.810780	to make a zip	-0.124939
-0.599091	38 // Example 7.15a.	-0.124939
-0.587603	by more than 33%	-0.124939
-0.598272	a to be signed.	-0.124939
-0.541242	the integer is signed,	-0.124939
-0.599091	underflow: // Example 7.5.	-0.124939
-0.339512	= (s0+s1)+(s2+s3); Now s0,	-0.124939
-0.237967	this language gained remarkably	-0.124939
-0.456696	small embedded systems. Today	-0.124939
-0.358932	have gone to great	-0.124939
-0.237967	configuration files (*.ini files).	-0.124939
-0.314805	/arch:SSE4.1 -mAVX /arch:AVX /QaxSSE3,	-0.124939
-0.489149	tool for details (www.agner.org/optimize/testp.zip).	-0.124939
-0.463542	64-bit addresses for everything,	-0.124939
-0.314805	hard disk copying. Security.	-0.124939
-0.237967	over the C99 standard.	-0.124939
-0.341900	several different profiling methods:	-0.124939
-0.237967	and 13 objects, respectively	-0.124939
-0.458831	vmldExp2 Intel SVML v.10.3	-0.124939
-0.458831	double Intel SVML v.10.2	-0.124939
-0.358736	square blocking or tiling.	-0.124939
-0.490351	Array of 100 doubles:	-0.124939
-0.237967	rather than 200. Next,	-0.124939
-1.367477	the most efficient alternative.	-0.124939
-0.339516	STL (Standard Template Library)	-0.124939
-0.489145	mov mov mov lea	-0.124939
-1.156404	static inline void StoreVectorA(void	-0.124939
-0.586501	It must be emphasized	-0.124939
-0.504883	libraries (*.dll or *.so)	-0.124939
-0.348401	on compiler optimization. en.wikipedia.org/wiki/Compiler_optimization.	-0.124939
-0.237967	(Gnu) AES, PCLMUL wmmintrin.h	-0.124939
-0.237967	such as gates, flip-flops,	-0.124939
-0.358901	other languages in Microsoft's	-0.124939
-0.990845	the assembly output (/FAs	-0.124939
-0.599819	According to the standards	-0.124939
-0.887304	used. See page 140.	-0.124939
-0.237967	i; float i2; for(i=0,i2=0;	-0.124939
-0.314805	do the divisions (Division	-0.124939
-1.082872	int i; for(i=0; i<301;	-0.124939
-0.237967	pointer aliasing" (if valid)	-0.124939
-0.549395	feature on Intel CPU’s.	-0.124939
-1.355050	size; i++) { ab[i].b	-0.124939
-0.355923	for accessing arrays forwards,	-0.124939
-0.237967	in a Gauss elimination.	-0.124939
-0.570575	profilers are often unreliable.	-0.124939
-0.835875	is easy to port	-0.124939
-0.429589	Available from www.agner.org/optimize/asmlib.zip. Currently	-0.124939
-0.557531	and which are cheap,	-0.124939
-0.508625	Intel Math Kernel Library.	-0.124939
-0.237967	Iss. 4, 2007 (www.intel.com/technology/itj/).	-0.124939
-0.462672	details of instruction timing,	-0.124939
-0.351729	the matrix 512 520	-0.124939
-0.357079	class (also called properties)	-0.124939
-0.584491	produce a single result,	-0.124939
-0.563026	should get a reply	-0.124939
-1.573249	to: // Example 14.17b	-0.124939
-0.823221	int fraction : 52;	-0.124939
-0.358816	with #) are costless	-0.124939
-0.481366	the branch misprediction penalty.	-0.124939
-0.566881	has done by fetching,	-0.124939
-0.336344	back to normal afterwards.	-0.124939
-0.358749	int ABC = 123;	-0.124939
-0.785880	the data into groups	-0.124939
-1.175900	p) { return _mm_load_si128((__m128i	-0.124939
-0.358464	people who have sent	-0.124939
-0.346503	very small loops (less	-0.124939
-0.237967	SVML + ia32intrin.h _mm_exp_ps	-0.124939
-0.895547	enough to be noticeable	-0.124939
-0.237967	+ ia32intrin.h _mm_exp_ps _mm_exp_pd	-0.124939
-0.345269	through this address. Step	-0.124939
-0.325441	using the directive __declspec(cpu_dispatch(...)).	-0.124939
-0.382918	float a[size], b[size], c[size];	-0.124939
-0.358898	this mask, and bb[i]*cc[i]	-0.124939
-0.314805	7.16 float list[100]; memset(list,	-0.124939
-0.900884	seen in the broader	-0.124939
-0.357229	0; column < NUMCOLUMNS;	-0.124939
-0.358898	between commas and semicolons	-0.124939
-1.056882	and you can toggle	-0.124939
-2.159700	Example: // Example 14.7a.	-0.124939
-0.576855	Using vector operations Today's	-0.124939
-0.573477	alignment can cause holes	-0.124939
-0.352982	512 AVX512 Table 12.1.	-0.124939
-0.237967	contrived examples exist. Therefore	-0.124939
-0.568592	an assembly language output,	-0.124939
-0.237967	The loop initialisation i=0;	-0.124939
-1.284103	in a single session.	-0.124939
-1.682250	of the same algorithm,	-0.124939
-0.358591	© 2004 - 2014.	-0.124939
-0.355536	a[i+1]; s2 += a[i+2];	-0.124939
-0.341900	asa << 4, anda	-0.124939
-0.591735	-fno-strict-overflow. You may deviate	-0.124939
-0.341900	party security software. Background	-0.124939
-0.353233	compiled versions #include "instrset_detect.cpp"	-0.124939
-0.357914	to reduce example 12.1b	-0.124939
-0.819661	than to write _mm_add_epi16(a,b).	-0.124939
-0.598553	(&) and the EXCLUSIVE	-0.124939
-0.937365	code from example 8.26b:	-0.124939
-0.357674	Example 14.6 float list[16];	-0.124939
-0.331953	have a strict formalism	-0.124939
-0.805350	on CPUs with full-size	-0.124939
-0.898720	conditions in a graceful	-0.124939
-1.021116	only when it changes.	-0.124939
-0.509095	Active Template Library (ATL)	-0.124939
-0.237967	/MT 160 /Qparallel -parallel	-0.124939
-0.336349	n <= 16; n++)	-0.124939
-1.574702	at compile time. (Examples	-0.124939
-0.648056	| Wednesday | Friday))	-0.124939
-0.349805	turned on, including relaxed	-0.124939
-0.237967	& operator (bitwise and)	-0.124939
-0.500400	Guide for AMD Family	-0.124939
-0.458558	set not supported fprintf(stderr,	-0.124939
-0.592957	below in example 16.1.	-0.124939
-0.592349	than it can handle.	-0.124939
-0.356804	take most time. Uses	-0.124939
-0.358736	program creates or modifies	-0.124939
-0.650510	compiler can optimize specifically	-0.124939
-0.550792	compilers due to controversies	-0.124939
-0.356527	Example 9.2a void F1(int	-0.124939
-1.075876	included in the representation,	-0.124939
-0.314809	four places back. Thus,	-0.124939
-0.353678	many features, see http://www.agner.org/optimize/	-0.124939
-0.314809	65 65 13.6 80.9	-0.124939
-0.237967	64 64 14.0 80.8	-0.124939
-0.764473	more clear and intelligible	-0.124939
-0.659969	to utilize the computational	-0.124939
-0.358736	alignment. __declspec(align(16)) or __attribute__((aligned(16))).	-0.124939
-0.358431	are: Coarse time measurement.	-0.124939
-0.527123	Function addresses are obscured	-0.124939
-0.356652	balance between these considerations.	-0.124939
-1.514373	of a program dictates	-0.124939
-0.358838	illegal operation that crashes	-0.124939
-1.350712	member function is 83	-0.124939
-0.237967	Sutter: A Pragmatic Look	-0.124939
-0.350874	assembly language", section 17.9:	-0.124939
-0.237967	as gates, flip-flops, multiplexers,	-0.124939
-0.599819	iteration to the next.	-0.124939
-0.462471	point and integer representations	-0.124939
-1.377277	no need to deallocate	-0.124939
-1.333315	& x) { _mm_store_si128((__m128i	-0.124939
-0.502439	sometimes be eliminated completely.	-0.124939
-0.358816	calling conventions are different.	-0.124939
-0.525802	the different compilers succeeded	-0.124939
-0.358464	OS, etc.) have little-endian	-0.124939
-0.590472	described in this chapter.	-0.124939
-0.237967	char 128 Is8vec16 Vec16c	-0.124939
-0.580630	have CPU dispatching 125	-0.124939
-1.735621	example: // Example 14.16a	-0.124939
-0.358980	as eliminating the if-branch	-0.124939
-0.715715	manual is based mainly	-0.124939
-1.683872	explained on page 132.	-0.124939
-0.356338	unsigned integer type size_t	-0.124939
-1.438461	accessed in a FILO	-0.124939
-0.314805	0 to 12. Higher	-0.124939
-0.341903	PTR [eax+4], ecx 86	-0.124939
-0.237967	y; // x,y coordinates	-0.124939
-0.357530	= a1 * b2	-0.124939
-0.357530	= a2 * b1	-0.124939
-0.600955	example in the "Macro	-0.124939
-0.585192	into a and b.	-0.124939
-0.583274	as AMD and VIA.	-0.124939
-0.351326	purpose. It just happened	-0.124939
-0.444434	function at runtime. Polymorphism	-0.124939
-0.356436	by comparing bits 32-62.	-0.124939
-0.763810	and loop-invariant code motion.	-0.124939
-0.343771	many common purposes (www.boost.org).	-0.124939
-0.355988	= 0; while (seconds	-0.124939
-0.582923	return; } // continue	-0.124939
-0.562617	work only on Intel/x86-compatible	-0.124939
-0.599091	variable: // Example 7.26b	-0.124939
-0.237967	Specifications, Dr Dobbs Journal,	-0.124939
-0.540925	Intel C++ compiler (parallel	-0.124939
-2.159700	Example: // Example 7.26a	-0.124939
-1.148533	may improve the possibilities	-0.124939
-0.561938	coordination with other subtasks	-0.124939
-0.358736	particular weakness or bottleneck,	-0.124939
-0.527177	little data for analysis.	-0.124939
-0.435097	150 16 Testing speed..............................................................................................................	-0.124939
-0.294284	root, RGB color difference.	-0.124939
-0.294284	SSE4.1 smmintrin.h SSE4.2 nmmintrin.h	-0.124939
-0.237967	defined with enum, const,	-0.124939
-0.358898	MKL, VML and SVML.	-0.124939
-1.346302	less efficient than non-object	-0.124939
-0.358932	portability issue to catching	-0.124939
-0.358479	{ try { F1();	-0.124939
-0.593769	operations are not used).	-0.124939
-0.357502	Linux kernel version 2.6.30	-0.124939
-0.358019	very expensive cache contentions,	-0.124939
-0.358690	delays execution by causing	-0.124939
-0.763752	SIZE; c++) { StoreNTD(&a[c][r],	-0.124939
-0.463359	information for function F1.	-0.124939
-0.429586	Gnu/AT&T syntax: __asm ("fldl	-0.124939
-2.159700	Example: // Example 8.19.	-0.124939
-0.457222	or another. Therefore, micro-	-0.124939
-1.693261	is faster than 15.1b,	-0.124939
-0.237967	// Example 8.20 module1.cpp	-0.124939
-0.358736	CPU dispatching or memory-intensive	-0.124939
-0.358901	occurs somewhere in F1?	-0.124939
-0.357806	cache effects into account.	-0.124939
-0.237967	of the Xnu project.	-0.124939
-0.357271	a new software project,	-0.124939
-0.656518	to distinguish between recoverable	-0.124939
-0.537591	DOS and Windows 3.x.	-0.124939
-0.493453	array with bounds checking,	-0.124939
-0.237967	doing the spell checking.	-0.124939
-1.573249	to: // Example 8.10b	-0.124939
-0.599091	false: // Example 8.10a	-0.124939
-0.459095	not fully optimized yet.	-0.124939
-0.237967	0.30 4.5 0.82 0.59	-0.124939
-0.463169	dummy[4]; volatile int DontSkip;	-0.124939
-0.294284	int a; Plus2 (&a);	-0.124939
-0.354384	the whole program. During	-0.124939
-0.237967	(-a==-b)=(a==b) ---xx---- (-a>-b)=(a<b) ---xx---x	-0.124939
-0.590800	case, you may view	-0.124939
-0.341902	is Borland's now discontinued	-0.124939
-0.439073	rarely in Linux. Address	-0.124939
-0.596146	why there is virtually	-0.124939
-0.598749	STL is not satisfactory.	-0.124939
-1.683872	explained on page 87.	-0.124939
-0.294284	/fp:fast=2 -fp-model fast, -fp-	-0.124939
-1.129413	performance monitor counters ....................................................................	-0.124939
-1.652808	be used for fetching	-0.124939
-1.163340	int i; float i2;	-0.124939
-0.336344	of (0,0,0,0,0,0,0,0) Is16vec8 zero(0,0,0,0,0,0,0,0);	-0.124939
-0.382918	or graphics accelerator card.	-0.124939
-1.042289	one call to Func1,	-0.124939
-0.294284	the register usage convention	-0.124939
-0.346499	Object Windows Library (OWL).	-0.124939
-0.897557	(except for the <,	-0.124939
-0.595332	comparisons, such as <.	-0.124939
-0.237967	Examples include JavaScript, PHP,	-0.124939
-0.596515	be because the non-reduced	-0.124939
-0.782339	the object file level.	-0.124939
-0.581110	and the memory released	-0.124939
-0.654253	by type-casting its address:	-0.124939
-0.358898	the Professional and Enterprise	-0.124939
-0.355152	2 uint64_t 128 Vec2uq	-0.124939
-0.715087	Branches and switch statements.............................................................................	-0.124939
-0.515707	Mac Windows, Linux, Mac,	-0.124939
-0.826879	is available from www.agner.org/optimize/testp.zip.	-0.124939
-0.358736	as string or CString.	-0.124939
-0.358932	always happy to receive	-0.124939
-0.237967	bcc, v. 5.5 Mac:	-0.124939
-0.597002	inline function is expanded	-0.124939
-0.237967	problems and planned solutions.	-0.124939
-1.034703	of the following solutions,	-0.124939
-0.357530	= (a+1) * (a+1);	-0.124939
-0.599091	gives: // Example 7.30b	-0.124939
-2.159700	Example: // Example 7.30a	-0.124939
-0.237967	SSE2 not supported"); return;	-0.124939
-0.586380	Several function libraries published	-0.124939
-0.347528	system call (e.g. GetProcessAffinityMask	-0.124939
-0.358824	all software be reinstalled	-0.124939
-0.353678	may also see emulated	-0.124939
-0.688065	or a hash map.	-0.124939
-0.294284	-static /MT 160 /Qparallel	-0.124939
-0.659805	the exponent, and fffff	-0.124939
-0.447891	Java, C#, Visual Basic,	-0.124939
-0.596838	2.5f}; a = OneOrTwo5[b	-0.124939
-0.358852	an interpreter for Basic.	-0.124939
-0.764587	which one is fastest.	-0.124939
-1.601186	x) { return square(x)	-0.124939
-0.339512	last index changes fastest:	-0.124939
-0.339510	static inline T max(T	-0.124939
-0.541549	it is too late.	-0.124939
-0.294284	Example 7.15b SafeArray <float,	-0.124939
-0.504977	is smaller and closer	-0.124939
-0.527182	Static cast The static_cast	-0.124939
-0.237967	of their superior performance/price	-0.124939
-0.488097	give a considerable improvement	-0.124939
-0.358290	the table at runtime,	-0.124939
-0.237967	x); // x^1, x^2,	-0.124939
-0.581712	with sign bit set).	-0.124939
-0.358601	64 kb. This corresponds	-0.124939
-0.237967	show a discrete icon	-0.124939
-0.237967	or __asm ("int 3");	-0.124939
-0.237967	K8 1.09 1.25 1.61	-0.124939
-0.591025	pointer (see page 38).	-0.124939
-0.341902	#define Alignd(X) X __attribute__((aligned(16)))	-0.124939
-0.512254	"Intel® C++ Compiler Documentation".	-0.124939
-0.527293	is done in connection	-0.124939
-0.331950	the program runs satisfactorily	-0.124939
-0.237967	software development kit (SDK	-0.124939
-0.504652	variable-size array with alloca:	-0.124939
-0.343768	is very inefficient. Linear	-0.124939
-0.358560	Example 14.13c int list[301];	-0.124939
-0.429593	146 14.12 Position-independent code..................................................................................	-0.124939
-0.537591	7 and Windows Server	-0.124939
-0.237967	object file formats. Comments	-0.124939
-1.098901	The cost of synchronizing	-0.124939
-0.358647	you spend on redesigning	-0.124939
-0.504818	has allocated with alloca,	-0.124939
-0.237967	to windows, graphic brushes,	-0.124939
-1.601186	x) { return _mm_cvtss_si32(_mm_load_ss(&x));}	-0.124939
-0.358901	must bear in mind,	-0.124939
-0.615826	set supports self-relative addressing.	-0.124939
-0.237967	job before you. Optimized	-0.124939
-0.382918	tools for supporting multi-threaded	-0.124939
-2.159700	Example: // Example 7.3.	-0.124939
-0.435097	manuals by Agner Fog	-0.124939
-0.237967	unused label ;eax=addressofa ;edx=addressinr	-0.124939
-0.237967	counter //=2*A //=A*x*x+B*x+C //=DeltaY	-0.124939
-0.314805	short int, float. Similar	-0.124939
-0.490461	or memory pool. Alignment?	-0.124939
-1.297938	make sure the startup	-0.124939
-0.237967	lookup[2] = {2.6f, 1.5f};	-0.124939
-0.835714	and one that doesn’t.	-0.124939
-2.159700	Example: // Example 7.39	-0.124939
-0.724419	will convert example 12.8a	-0.124939
-0.358932	example 12.8a to 12.8b	-0.124939
-0.358736	to x?" or "how	-0.124939
-0.592957	explained in example 7.35	-0.124939
-1.735621	example: // Example 7.37	-0.124939
-0.358666	that begins with #)	-0.124939
-0.237967	it defines electrical connections	-0.124939
-2.159700	Example: // Example 7.36	-0.124939
-0.582987	b) y = MAX(f(x),	-0.124939
-0.598331	than the function add_horizontal)	-0.124939
-0.517724	pointers: The trick violates	-0.124939
-0.358591	= 2.5*x^2 - 8*x	-0.124939
-0.357058	code optimization. See www.agner.org/optimize	-0.124939
-0.540608	algebra, we may write:	-0.124939
-0.463048	hackers often have exploited.	-0.124939
-0.358601	its arguments. This closely	-0.124939
-0.358898	reflected, first and foremost,	-0.124939
-1.098422	The most important remedy	-0.124939
-0.356480	are highly system dependent	-0.124939
-0.358560	class c1; int c1::*MemberPointer;	-0.124939
-0.357028	i > 0; i--)	-0.124939
-0.358898	accessing list[i].a and list[i].b.	-0.124939
-0.358736	on remote or removable	-0.124939
-0.586698	too important to ignore,	-0.124939
-0.314805	volatile doesn't mean atomic.	-0.124939
-0.580451	expression b * 5).	-0.124939
-0.357954	(see above, page 87)	-0.124939
-1.271928	code that is distributed.	-0.124939
-0.331955	lazy binding definitely degrades	-0.124939
-0.237967	// Example 7.3. Explain	-0.124939
-0.657436	vector math library (VML,	-0.124939
-0.237967	2: template <bool IsPowerOf2,	-0.124939
-0.358749	{ ab[i].b = Func(ab[i].a);	-0.124939
-0.554746	The Pentium 4 (NetBurst)	-0.124939
-0.581860	compiler and it understands	-0.124939
-0.357652	have family number 6!	-0.124939
-2.107552	0; i < ArraySize;	-0.124939
-0.595301	code version performs poorly.	-0.124939
-1.366623	a matter of habit,	-0.124939
-0.358749	double log2 = log(2.0);	-0.124939
-0.349157	prediction. A Pentium M	-0.124939
-0.574273	by the clock period	-0.124939
-0.463600	for generality and flexibility,	-0.124939
-0.294284	in a first-in-last-out fashion.	-0.124939
-0.358898	2A, 2B, and 3A	-0.124939
-1.749187	a lot of CPU-time	-0.124939
-0.237967	in this block: 62	-0.124939
-0.237967	Common Language Runtime, CLR,	-0.124939
-0.237967	1.61 n.a. 2.23 0.95	-0.124939
-0.599819	n to the exponent:	-0.124939
-0.549707	available with vector operands:	-0.124939
-1.170641	in 64-bit systems. 67	-0.124939
-0.358344	= sin(x); } 68	-0.124939
-1.281123	a + 1; 69	-0.124939
-1.009873	page 130 for details).	-0.124939
-0.237967	Darwin8 g++ v 4.0.1.	-0.124939
-1.066075	excessive number of DLLs,	-0.124939
-0.336346	to the structure. Incrementing	-0.124939
-1.425485	to do the conversion.	-0.124939
-0.561861	tested in different browsers,	-0.124939
-0.726362	20, columns = 50;	-0.124939
-1.755866	that it is unrealistic	-0.124939
-0.354673	based on hardware identification.	-0.124939
-0.343771	will take approximately 500	-0.124939
-0.516721	compilers to choose between.	-0.124939
-1.337322	You have to consult	-0.124939
-0.816143	in the previous iteration.	-0.124939
-0.357430	counts. In any event,	-0.124939
-0.815174	will be very helpful	-0.124939
-0.586865	paying the performance costs.	-0.124939
-0.314809	hardware access. Available protocols	-0.124939
-0.353237	a constant reference instead:	-0.124939
-0.346502	of smaller sizes (char,	-0.124939
-0.595307	automatic vectorization. Optimizes moderately	-0.124939
-0.442085	AddTwo(int * __restrict aa,	-0.124939
-0.237967	happen that (b*c) overflows,	-0.124939
-0.352982	4 AVX2 Table 12.3.	-0.124939
-1.142005	it has been brutally	-0.124939
-0.357779	; i + sign(i)	-0.124939
-0.358355	cache contentions will occur:	-0.124939
-0.339512	9.0 CodeGear Borland bcc,	-0.124939
-0.237967	powN<(N1&(N1-1))==0,N1>::p(x) * powN<true,N-N1>::p(x); #undef	-0.124939
-0.237967	written as 2eee 1.fffff,	-0.124939
-0.237967	ISO/IEC TR 18015, "Technical	-0.124939
-1.748097	The following example converts	-0.124939
-0.461859	is (columns * sizeof(float))	-0.124939
-0.348406	as follows: struct Sfloat	-0.124939
-0.358760	powN is // erroneously	-0.124939
-0.314805	additions and multiplications. Subtractions	-0.124939
-0.357607	5.82 (Embarcadero/CodeGear/Borland C++ Builder	-0.124939
-0.355257	* 17is calculated as(a	-0.124939
-0.463600	to i and shifts	-0.124939
-0.358250	parallelization. Supports vector intrinsics	-0.124939
-0.237967	add cmp ja $B2$3:	-0.124939
-0.294284	Vec4q Vec4uq Vec4f Vec2d	-0.124939
-0.758347	problems and system breakdown.	-0.124939
-0.459325	into many small subtasks,	-0.124939
-0.358898	between x and y?"	-0.124939
-1.286795	call to a driver	-0.124939
-0.314805	instruction sets Microprocessor producers	-0.124939
-0.237967	int64_t 128 I64vec2 Vec2q	-0.124939
-0.352981	divided into three parts:	-0.124939
-0.461105	8 char 64 Is8vec8	-0.124939
-0.527196	cc[]); // function prototypes	-0.124939
-0.771379	series of five manuals:	-0.124939
-0.358749	is (int)(&list[100]) = (int)(&list[0])	-0.124939
-0.237967	smaller memory footprint. If,	-0.124939
-0.237967	throughputs and micro-operation breakdowns	-0.124939
-2.124740	in order to minimize	-0.124939
-0.237967	16 0 65535 uint16_t	-0.124939
-0.237967	int 16 -32768 32767	-0.124939
-1.004146	a variable in parts,	-0.124939
-0.358856	and SVML. The IPP	-0.124939
-0.294284	of optimizing University courses	-0.124939
-0.294284	(Gnu) AMD FMA4 fma4intrin.h	-0.124939
-0.358901	program. All in all,	-0.124939
-0.598331	at the function bodies	-0.124939
-0.651391	The instruction add eax,1	-0.124939
-0.358278	set. Aligning data Loading	-0.124939
-0.358704	will occur: if (SIZE	-0.124939
-0.237967	bit manipulation tricks Michael	-0.124939
-0.460274	responses to simple actions	-0.124939
-0.591508	then you are risking	-0.124939
-0.596876	where is the sign,	-0.124939
-0.356527	EXCEPTION_FLT_OVERFLOW 0xC0000091L void MathLoop()	-0.124939
-0.462968	satisfied with more heuristic	-0.124939
-0.463359	integer factorial function (n!)	-0.124939
-0.347531	approximate reciprocal square root,	-0.124939
-0.600955	object in the GOT,	-0.124939
-0.349799	Library (MKL v. 7.2).	-0.124939
-1.683872	explained on page 130.	-0.124939
-1.945434	This can be ameliorated	-0.124939
-0.354053	many such programs installed	-0.124939
-0.599869	pivot in a Gauss	-0.124939
-0.350875	issue. See my blog	-0.124939
-0.237967	Exception Specifications, Dr Dobbs	-0.124939
-0.590106	expressions using the fundamental	-0.124939
-0.558539	ecx DWORD PTR [eax+4],	-0.124939
-0.357530	void StoreNTD(double * dest,	-0.124939
-0.294284	* CriticalFunctionDispatch(void) __asm__ ("CriticalFunction");	-0.124939
-0.237967	b / 1.2345; Change	-0.124939
-0.596825	calculations, should be scheduled	-0.124939
-0.500280	well-defined with option -fwrapv	-0.124939
-0.357825	used for pointer conversions.	-0.124939
-0.358852	limited audience for educational	-0.124939
-0.356965	the YMM register state.	-0.124939
-0.314805	loop ; compute i/2	-0.124939
-0.237967	Math Library __vrs4_expf __vrd2_exp	-0.124939
-0.341902	network with heavy traffic	-0.124939
-0.294284	(set) = (memory address)	-0.124939
-0.753065	to the user. Compatibility	-0.124939
-0.356338	of doing type conversions:	-0.124939
-0.504862	key values are confined	-0.124939
-0.358736	speed. Delays or glitches	-0.124939
-0.358939	It reveals a funda-	-0.124939
-0.357862	{ __declspec(__align(64)) double matrix[SIZE][SIZE];	-0.124939
-0.356527	each version void FUNCNAME(short	-0.124939
-0.653591	swapped with element matrix[c][r].	-0.124939
-0.382918	explicitly by writing: __declspec(align(64))	-0.124939
-0.237967	__asm ("fldl %1 \n	-0.124939
-0.581823	Multiply(10,8); b = MultiplyBy<8>(10);	-0.124939
-0.350877	as constant references accept	-0.124939
-0.358898	me corrections and suggestions	-0.124939
-0.314805	reduced performance. 25 Since	-0.124939
-0.453416	positive and negative impacts	-0.124939
-0.517881	is allocated dynamically (with	-0.124939
-0.331950	of this method. Your	-0.124939
-0.578989	lesson we can learn	-0.124939
-0.294284	such as price, compatibility,	-0.124939
-0.461477	c1; c2 < c1+TILESIZE;	-0.124939
-0.550448	after each time slice	-0.124939
-0.457447	136 14.5 Integer division......................................................................................................	-0.124939
-0.352986	Instruction set needed _mm_shuffle_epi8	-0.124939
-0.352089	v.10.3 & later __svml_expf4	-0.124939
-0.348402	to use it. Complicated	-0.124939
-0.507306	temp->a = 1.0; temp->b	-0.124939
-0.357332	4.4, 2.5}; return list[x];	-0.124939
-0.358479	&list[100]; temp++) { temp->a	-0.124939
-0.459533	can eliminate common subexpressions	-0.124939
-1.114968	floating point operations (addition,	-0.124939
-0.356178	reason is, I guess,	-0.124939
-0.599091	e.g.: // Example 12.1b.	-0.124939
-0.592957	but in example 12.1b,	-0.124939
-0.527024	2) SelectAddMul_pointer = &SelectAddMul_SSE2;	-0.124939
-0.539984	opinions on which imprecisions	-0.124939
-0.998513	Define vector classes (Intel)	-0.124939
-0.237967	& later __svml_expf4 __svml_exp2	-0.124939
-0.358166	system and CPU hardware.	-0.124939
-0.586501	instruction must be followed	-0.124939
-0.237967	SelectAddMul_SSE2, SelectAddMul_SSE41, SelectAddMul_AVX2, SelectAddMul_dispatch;	-0.124939
-0.462024	have just two branches:	-0.124939
-0.457211	32-bit integer multiplication prior	-0.124939
-0.503573	a=a*2; to return a+1;.	-0.124939
-1.023592	and operating systems (but	-0.124939
-0.550420	Intel library function __intel_cpu_features_init()	-0.124939
-0.355536	for some small low-power	-0.124939
-0.355536	list[i]; sum2 += list[i+1];}	-0.124939
-0.347528	system calls (e.g. IsProcessorFeaturePresent	-0.124939
-0.357660	the variable two names,	-0.124939
-0.541049	certain conditions are satisfied.	-0.124939
-0.358856	to be. The distinctions	-0.124939
-0.325444	i_div_3; for(i=i_div_3=0; i<300; i+=3,i_div_3++){	-0.124939
-0.537828	do not need relocation	-0.124939
-0.590992	Sometimes it takes hours	-0.124939
-0.463410	n.a. a+b = b+a,	-0.124939
-0.788593	following conditions are satisfied:	-0.124939
-0.583961	one. It may neverthe-	-0.124939
-0.357674	{ static float list[]	-0.124939
-0.487723	in the condition clause.	-0.124939
-0.535325	making the container expandable,	-0.124939
-0.354671	the IEEE standard 754	-0.124939
-0.237967	floats F32vec4 xxn(x4, x2*x,	-0.124939
-0.345271	2 GHz CPU. Should	-0.124939
-0.358943	that memset is deprecated.	-0.124939
-0.237967	order to facilitate porting	-0.124939
-0.346499	AMD LIBM Library amd_vrs4_expf	-0.124939
-0.504508	more than an hour.	-0.124939
-0.567067	function, one that discriminates	-0.124939
-0.358901	compilers succeeded in applying	-0.124939
-0.358730	other features it has.	-0.124939
-0.715066	( 1)sign 2exponent 1023	-0.124939
-0.237967	it can handle. Waiting	-0.124939
-0.549907	of Intel vector classes:	-0.124939
-0.382918	p1 = &Object1; p1->Hello();	-0.124939
-0.892708	__m128i a = _mm_blendv_epi8(bc,	-0.124939
-2.159700	Example: // Example 8.12a	-0.124939
-1.573249	to: // Example 8.12b	-0.124939
-0.294284	Function prototype CriticalFunctionType CriticalFunction_Dispatch;	-0.124939
-0.502094	directly compiled code. (Compile	-0.124939
-0.354384	a big program. Frequent	-0.124939
-0.596503	frameworks, rather than isolating	-0.124939
-0.599712	_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON); It is strongly	-0.124939
-0.593693	program is more manageable	-0.124939
-0.460211	be cleaned up include:	-0.124939
-0.823011	transferred in registers, totaling	-0.124939
-0.358736	option -Wstrict-overflow=2, or (5)	-0.124939
-1.003433	nontemporal write instructions (MOVNT)	-0.124939
-0.429586	is Visual Basic .NET,	-0.124939
-0.599869	a in a column-wise	-0.124939
-0.325444	the user. Making exception-safe	-0.124939
-0.314805	the program 153 spends	-0.124939
-0.237967	0.89 0.40 0.30 4.5	-0.124939
-0.504719	sizeof(a)); } int Size()	-0.124939
-0.237967	uint64_t Table 7.1. Sizes	-0.124939
-0.357779	= y.d + 4.;	-0.124939
-0.863923	available from a website.	-0.124939
-0.237967	end user. Menus, buttons,	-0.124939
-0.937365	code from example 9.5a:	-0.124939
-0.357942	cache between each call,	-0.124939
-0.659094	optimization by compiler .......................................................................	-0.124939
-0.237967	1./24., 1./120., 1./720., 1./5040.,	-0.124939
-0.358560	at 403 int ReadB()	-0.124939
-0.351728	print out results printf("\n%2i	-0.124939
-0.352982	264-1 uint64_t Table 7.1.	-0.124939
-0.503246	one or multiple elements?	-0.124939
-0.538986	no pointer aliasing" (if	-0.124939
-0.598310	execution may be caused	-0.124939
-0.563524	f cout << x.f;	-0.124939
-0.294284	a/1=a x-xxx-x-- 0/a=0 ---xx--xx	-0.124939
-0.654987	= char 16 XOP,	-0.124939
-0.358736	with First-In-First-Out or First-In-Last-Out	-0.124939
-0.357530	+ FuncCol(i)) * sizeof(float)	-0.124939
-0.325441	and necessary support. Hardware	-0.124939
-0.314805	many bit manipulation tricks	-0.124939
-0.570153	--xxxx--- a & a=	-0.124939
-0.355448	lines for matrix a:	-0.124939
-1.863958	in the same directory	-0.124939
-0.237967	= { "Alpha", "Beta",	-0.124939
-0.884044	be calculated as (b*2.0)/3.0	-0.124939
-0.237967	container. STL deque (doubly	-0.124939
-0.596838	function a = CriticalFunction(b,	-0.124939
-0.237967	level linking (remove unreferen-	-0.124939
-0.596074	fast as a scalar	-0.124939
-0.512008	templates (see p. 57).	-0.124939
-0.823221	int fraction : 63;	-0.124939
-0.325441	for parallel processing. Scott	-0.124939
-0.355988	and cause large delays.	-0.124939
-0.358760	away cpuid // Read	-0.124939
-0.346506	Automatic vectorization Automatic paralleli-	-0.124939
-0.237967	8 0 255 uint8_t	-0.124939
-0.435094	can automatically detect opportunities	-0.124939
-0.341903	Wednesday = 8, Thursday	-0.124939
-0.710979	int 8 AVX2 _mm_i32gather_epi32	-0.124939
-0.358464	Microprocessor designers have gone	-0.124939
-0.358736	not overlapping or aliasing,	-0.124939
-0.524939	x^n } return add_elements(s);	-0.124939
-1.203304	parm2) {...} // Dispatcher.	-0.124939
-0.463394	a reference, or void.	-0.124939
-0.460643	leak. An even worse	-0.124939
-0.526848	is calculated as ((a+b)+c)+d.	-0.124939
-0.237967	is Microsoft Foundation Classes	-0.124939
-0.237967	as strcpy, strcat, strlen,	-0.124939
-0.659527	type casting // Constructor-style	-0.124939
-0.358852	all branches for correctness.	-0.124939
-0.569509	cause a cache miss.	-0.124939
-0.582035	resources from a higher-priority	-0.124939
-0.237967	security software. Background services.	-0.124939
-0.572780	my vector class library).	-0.124939
-0.820015	edx, DWORD PTR [esp+4]	-0.124939
-0.587721	Size() { return N;	-0.124939
-0.357530	floats: float * DynamicArray	-0.124939
-0.726362	s; s = _mm_hadd_ps(x,	-0.124939
-0.641802	if (i >= N)	-0.124939
-0.898620	due to the design	-0.124939
-0.586061	15 clock cycles (depending	-0.124939
-0.726815	the parallelism is obvious.	-0.124939
-0.875506	if this is obvious,	-0.124939
-0.356527	type typedef void FuncType(short	-0.124939
-1.423686	= b * (1.	-0.124939
-1.051232	can be changed freely.	-0.124939
-0.463749	(x) x (x) x-xx--xx-	-0.124939
-0.237967	c++) { StoreNTD(&a[c][r], b[r][c]);	-0.124939
-0.500612	(requires no specific option)	-0.124939
-0.358630	particular advantageous as replacements	-0.124939
-0.657226	Report on C++ Performance".	-0.124939
-0.801313	for the exception handler,	-0.124939
-0.358898	initialization, condition, and increment.	-0.124939
-0.462992	These systems use segmentation	-0.124939
-0.358630	comparing them as integers:	-0.124939
-0.237967	160 /Qparallel -parallel -openmp	-0.124939
-0.884302	the stack. This behaviour	-0.124939
-0.358690	or 1 by XOR'ing	-0.124939
-1.161650	an option for RTTI	-0.124939
-1.485143	with the same divisor.	-0.124939
-0.526510	a.x, y + a.y);}	-0.124939
-0.659506	for (r2 = r1+1;	-0.124939
-0.355701	the other thread increments	-0.124939
-0.355540	used by exception handlers	-0.124939
-0.764061	through pointers or references:	-0.124939
-0.237967	C++ compiler (parallel composer)	-0.124939
-0.595641	vector { // 2-dimensional	-0.124939
-0.554404	127 will generate -128,	-0.124939
-0.237967	Loop counter //=2*A //=A*x*x+B*x+C	-0.124939
-0.314805	when optimizing multithreaded applications:	-0.124939
-0.358736	options -S or /Fa	-0.124939
-0.358563	to handle an unrecoverable	-0.124939
-0.408223	other error condition. Things	-0.124939
-0.523687	Generate map file /Fm	-0.124939
-0.598553	pointers and the texts	-0.124939
-0.358934	several iterations of redesign.	-0.124939
-0.352983	& 0x7FFFFF) | 0x3F800000;	-0.124939
-0.587373	-fpie instead of -fpic.	-0.124939
-0.659876	good deal of research	-0.124939
-0.600159	event it is servicing.	-0.124939
-0.294284	long long clock; __cpuid(dummy,	-0.124939
-1.070400	// number of rows/columns	-0.124939
-0.382918	p = &Object1; p->NotPolymorphic();	-0.124939
-0.237967	Goedecker and A. Hoisie,	-0.124939
-0.331950	accessed through pointers, e.g.:	-0.124939
-0.835592	a destructor that destroys	-0.124939
-0.350372	the container. STL deque	-0.124939
-1.191896	because the compiler knows	-0.124939
-0.358901	Gnu utilities in 2010.	-0.124939
-0.358852	v. 14.00 for 80x86	-0.124939
-0.358898	frame, saving and restoring	-0.124939
-0.357779	= a1/b1 + a2/b2;	-0.124939
-0.357332	_mm_hadd_ps(s, s); return _mm_cvtss_f32(s);	-0.124939
-1.252094	with the option -read_only_relocs	-0.124939
-0.358898	internet forums and newsgroups	-0.124939
-0.503600	code as example 12.4b,	-0.124939
-0.599091	set: // Example 12.4b.	-0.124939
-0.568125	or 32 bits (rarely	-0.124939
-0.596838	address a = 10000,	-0.124939
-0.237967	the order a[0], b[0],	-0.124939
-0.591025	expressions (see page 72).	-0.124939
-1.347218	information about the dimensions	-0.124939
-0.237967	Mac: Darwin8 g++ v	-0.124939
-0.538986	b1, b2, y1, y2,	-0.124939
-0.325441	in parallel. Small lightweight	-0.124939
-0.444434	4 + esp ;alignby4	-0.124939
-0.350874	with profilers are: Coarse	-0.124939
-0.358709	// Branch/loop function vectorized:	-0.124939
-0.294284	platform is obviously influenced	-0.124939
-0.349799	Borland bcc, v. 5.5	-0.124939
-0.341902	and API's. Memory swapping.	-0.124939
-0.579079	Then we are breaking	-0.124939
-0.713037	are discussed below. Cannot	-0.124939
-0.581632	including the library libmmt.lib	-0.124939
-0.358852	special precautions for speeding	-0.124939
-0.435097	start of Func ;a	-0.124939
-0.358666	number (e.g. with _finite())	-0.124939
-0.382918	and data decomposition. Functional	-0.124939
-0.237967	their superior performance/price ratio.	-0.124939
-0.294284	process can proceed unattended.	-0.124939
-0.527295	model number to reflect	-0.124939
-0.237967	of Func ;a ;r	-0.124939
-0.237967	OneOrTwo5[2] = {1.0f, 2.5f};	-0.124939
-0.594095	exceptions. See page 61.	-0.124939
-0.237967	-fopenmp /Qopenmp -m32 -m64	-0.124939
-0.382918	she is busy concentrating	-0.124939
-0.833710	the bitwise operators (&	-0.124939
-0.345272	well-known languages. My preference	-0.124939
-0.237967	int list[100]; Func1(list, &list[8]);	-0.124939
-1.390723	transferred in registers (6	-0.124939
-0.355988	would be while (0	-0.124939
-0.460468	available, 256 bits (YMM)	-0.124939
-0.599091	classes: // Example 12.4d.	-0.124939
-0.348402	in the copying process,	-0.124939
-0.505092	done under the best-case	-0.124939
-0.357779	+ c.x + d.x;	-0.124939
-0.780055	XMM (vector) reductions: a+b=b+a,	-0.124939
-1.390723	transferred in registers (8	-0.124939
-0.463410	7.40c x.abc = (A	-0.124939
-0.648056	<< 4) | (C	-0.124939
-0.357271	systems. A software developer	-0.124939
-0.352983	= A | (B	-0.124939
-0.237967	Instruction set Prefetch PREFETCH	-0.124939
-0.353463	is called name mangling.	-0.124939
-0.560748	expressions are less susceptible	-0.124939
-0.348402	on improving performance. Stefan	-0.124939
-0.499073	Set flush-to-zero mode (SSE):	-0.124939
-0.541228	(128 vectors of inte-	-0.124939
-0.294284	/openmp /MT -msse3 /arch:SSE3	-0.124939
-0.593496	preceding value of sum.	-0.124939
-0.595332	function such as ReadB	-0.124939
-0.358736	as VHDL or Verilog.	-0.124939
-0.358943	model N-1 is inferior.	-0.124939
-0.589849	but the first dimension	-0.124939
-0.331950	reliable than third party	-0.124939
-0.294284	15h Processors". www.amd.com. Advices	-0.124939
-0.501167	form of error reporting.	-0.124939
-0.600620	hardware can be wired	-0.124939
-2.159700	Example: // Example 14.12a	-0.124939
-1.823775	it takes to refresh	-0.124939
-0.358856	pitfalls here: The inequality	-0.124939
-0.575707	to using templates. Ready	-0.124939
-0.358221	p2 having different types.	-0.124939
-0.863517	b; b = !a;	-0.124939
-0.341905	int n; #if defined(__unix__)	-0.124939
-0.598553	parameter, and the destructor,	-0.124939
-1.810780	to make a destructor.	-0.124939
-0.598553	etc. and the wires	-0.124939
-1.060632	may use the _mm_clflush	-0.124939
-0.463394	mixed types or sizes?	-0.124939
-1.785344	to make the SelectAddMul	-0.124939
-0.294284	a+0=a x-xxxxxx- a*0=0 --xxxx-xx	-0.124939
-0.314805	statements............................................................................. 43 7.13 Loops......................................................................................................................	-0.124939
-0.358560	instruction set int iset	-0.124939
-0.600955	saved in the beginning.	-0.124939
-1.473706	is necessary to adhere	-0.124939
-0.358749	dummy[0]; clock = __rdtsc();	-0.124939
-0.355253	+ 2.0 / 3.0;	-0.124939
-0.564231	256 clock cycles. Calculations	-0.124939
-0.352717	method. A longer loop-	-0.124939
-0.314805	i; for(i=0; i<100; i++)a[i]=2*i;	-0.124939
-0.358898	between recoverable and non-recoverable	-0.124939
-0.568959	that has no side-effects	-0.124939
-0.237967	incredibly stupid things. Looking	-0.124939
-0.725681	to optimize this loop?	-0.124939
-0.237967	v 4.0.1. Gnu: Glibc	-0.124939
-0.357958	object of class C1,	-0.124939
-0.355447	for each line written.	-0.124939
-1.284083	} }; // Partial	-0.124939
-0.584207	the elements in a[]	-0.124939
-0.294284	improved by consistent modularity	-0.124939
-0.460280	handle unknown processors properly.	-0.124939
-0.353463	fast, -fp- model fast=2	-0.124939
-1.054970	( short int aa[size]	-0.124939
-0.352411	case F2 actually throws	-0.124939
-0.358932	two branches to feed	-0.124939
-0.600955	while in the former	-0.124939
-0.237967	for the "FDIV bug".	-0.124939
-0.821637	down the execution considerably.	-0.124939
-0.339512	time Some developers feel	-0.124939
-0.527156	finding problems that relate	-0.124939
-0.892708	expression a = b++;	-0.124939
-0.357026	several other less well-known	-0.124939
-0.635362	in a vector. 6.	-0.124939
-0.463173	etc. Use an antivirus	-0.124939
-0.462738	can be different sizes,	-0.124939
-0.588066	feed into the pipeline.	-0.124939
-0.237967	\n fistpl %0 "	-0.124939
-0.595332	blocks such as gates,	-0.124939
-0.643912	objects in computer games.	-0.124939
-0.357332	page 134) return FactorialTable[n];	-0.124939
-0.356804	than last time. Newer	-0.124939
-0.357862	static inline double IntegerPower	-0.124939
-0.878704	discussed on page 158.	-0.124939
-0.347528	relational operators (e.g. '>')	-0.124939
-0.352982	compiler options Table 18.1.	-0.124939
-0.514738	certain to become obsolete	-0.124939
-0.541226	reduced 15.1b to 15.1c,	-0.124939
-0.357332	explanation of return prediction).	-0.124939
-0.537507	long int 32 -231	-0.124939
-0.454496	is the feature information,	-0.124939
-0.358932	reduce (a*b*c)+(c*b*a) to a*b*c*2.	-0.124939
-0.358666	the users with nagging	-0.124939
-0.331950	some cases. Multiple threads?	-0.124939
-1.257309	as in example 7.22.	-0.124939
-0.421451	various programming languages. www.yeppp.info	-0.124939
-0.357229	&list[0]; temp < &list[100];	-0.124939
-0.452781	// 32-bit Windows, Intel/MASM	-0.124939
-0.294284	for Linux. 82 Keywords	-0.124939
-0.540722	final program. This requires,	-0.124939
-0.353673	before leaving their workplace	-0.124939
-0.512957	allocation is particularly risky	-0.124939
-1.161650	an option for "standard	-0.124939
-1.966986	if it is cached,	-0.124939
-0.294284	= & obj1; p->f();	-0.124939
-0.346502	134 on bounds checking).	-0.124939
-0.314805	in a pivot search:	-0.124939
-0.358760	#include "instrset_detect.cpp" // instrset_detect	-0.124939
-0.562433	as a time measure.	-0.124939
-0.463600	a program and concentrate	-0.124939
-0.358666	two additions with double's.	-0.124939
-0.339512	in each case. Inlined	-0.124939
-0.358749	n.a. a+a+a+a = a*4	-0.124939
-0.453414	microprocessor that supports this).	-0.124939
-0.357530	= x2 * x2;	-0.124939
-0.237967	the Common Language Runtime,	-0.124939
-0.237967	calculation requires n-1 multiplications,	-0.124939
-0.648067	of test data. That	-0.124939
-0.461736	cache. When we reach	-0.124939
-0.237967	F32vec4 xxn(x4, x2*x, x2,	-0.124939
-2.419667	- - - 76	-0.124939
-0.294284	: c x-xx----- 75	-0.124939
-0.593839	char short int 832	-0.124939
-0.488097	course a considerable job,	-0.124939
-0.554408	of the optimization job.	-0.124939
-0.497603	index by 8. 71	-0.124939
-0.358344	= temp; } 70	-0.124939
-0.357058	Gnu compilers. See www.openmp.org	-0.124939
-0.504977	goes up and down.	-0.124939
-0.294284	by the programmer. 79	-0.124939
-0.463495	Common devices are CPLDs	-0.124939
-0.357517	object or array coincides	-0.124939
-1.573249	to: // Example 8.14b	-0.124939
-0.429586	move or key press.	-0.124939
-2.159700	Example: // Example 8.14a	-0.124939
-1.810780	to make a bit-mask	-0.124939
-0.496718	not be too worried	-0.124939
-0.354672	: "=m"(n) : "m"(x)	-0.124939
-0.358980	by extending the sign-bit	-0.124939
-2.159700	Example: // Example 7.33a	-0.124939
-0.461531	other things very stupid.	-0.124939
-0.358300	remedy is memory pooling.	-0.124939
-0.294284	the objects (memory pooling)	-0.124939
-0.463523	A thread that shares	-0.124939
-0.764473	more clear and modular.	-0.124939
-0.336349	BSF (bit scan forward)	-0.124939
-0.496716	This has three advantages:	-0.124939
-0.358898	as GetPrivateProfileString and WritePrivateProfileString	-0.124939
-0.339510	x64 (Visual Studio 2005).	-0.124939
-1.064656	reason to use try,	-0.124939
-0.598658	FMA3 floating point multiply-and-add	-0.124939
-0.343771	that it writes only,	-0.124939
-0.562890	square. // This triangle	-0.124939
-0.294284	- x-xx----x x-xxxxxx- x-xxxx-x-	-0.124939
-0.358760	= lrint(d); // Rounding	-0.124939
-0.347528	universal algorithm (e.g. Quine–McCluskey	-0.124939
-0.352415	This calculation requires n-1	-0.124939
-0.237967	using the fundamental laws	-0.124939
-0.358932	sum, initialize to x^0/0!	-0.124939
-0.659389	are separated by semicolons,	-0.124939
-0.597376	desired instruction set (/arch:SSE2,	-0.124939
-0.527024	(float)i; f = float(i);	-0.124939
-0.571171	it with many decimals.	-0.124939
-0.237967	this loop? Certainly not!	-0.124939
-1.309398	should be stored together......................................	-0.124939
-0.355152	as their uses (live	-0.124939
-0.598658	Non-strict floating point -ffast-math	-0.124939
-0.570551	large overhead of managing	-0.124939
-0.883811	int b; int Sum1()	-0.124939
-0.358943	the occurrence is rare.	-0.124939
-1.615705	should preferably be responded	-0.124939
-0.348402	we are adding -100	-0.124939
-0.357779	= y.b + 2.;	-0.124939
-0.599869	values in a pre-calculated	-0.124939
-0.294284	= b*a (a+b)+c=a+(b+c) a+b+c=c+b+a	-0.124939
-1.425485	to do the devirtualization	-0.124939
-0.575629	Intel's compilers and invoked	-0.124939
-0.525921	AVX2, or two 128-	-0.124939
-0.885070	illustrated in example 9.5b.	-0.124939
-0.597263	Writes to a printer	-0.124939
-0.237967	can proceed unattended. Uninstallation	-0.124939
-0.351325	on future CPUs. Half	-0.124939
-0.815713	follows: Instruction set Important	-0.124939
-0.358898	If Func1 and Func2	-0.124939
-0.294284	the difference, let's say	-0.124939
-0.539678	application. In example 12.3a,	-0.124939
-1.058360	floating point variables .........................	-0.124939
-0.449189	into two 128-bit reads.	-0.124939
-0.237967	1./2., 1./6., 1./24., 1./120.,	-0.124939
-0.849032	we are using unions	-0.124939
-0.237967	C, C++, D, Pascal,	-0.124939
-0.382918	loop-carried dependency chains, namely	-0.124939
-1.322185	of the data structure,	-0.124939
-0.352988	as writing data. Multidimensional	-0.124939
-0.358630	36 C++ as 'this'.	-0.124939
-1.136157	when compiling for AVX2,	-0.124939
-0.353233	#include <stdio.h> #include <asmlib.h>	-0.124939
-0.551369	run in both 16-bit,	-0.124939
-0.659341	above example with u.i[1]	-0.124939
-0.352983	compilers unroll too much.	-0.124939
-0.568063	second induction variable (eax)	-0.124939
-0.237967	aliased #pragma optimize("a", on)	-0.124939
-0.358980	to draw the attention	-0.124939
-0.900884	misses in the level-	-0.124939
-0.567098	development time and maintainability	-0.124939
-0.560514	i; float f; f=i;	-0.124939
-0.832426	// exponent + 0x7F	-0.124939
-0.858283	because we are relying	-0.124939
-0.237967	in the BIOS setup.	-0.124939
-0.339516	Sunday = 1, Monday	-0.124939
-0.585530	that is an n'th	-0.124939
-0.345269	in the chapter "Register	-0.124939
-0.538986	(a&b) | (~a&c) a&b&c&d	-0.124939
-0.357942	statements within each clause	-0.124939
-0.589424	work cannot be ignored	-0.124939
-1.647581	to use a union,	-0.124939
-0.358856	three advantages: The i<20	-0.124939
-0.237967	for the <, <=,	-0.124939
-0.358898	involving division and relational	-0.124939
-0.725884	x2 = x *x;	-0.124939
-0.336346	the "Intel Performance Primitives"	-0.124939
-0.408223	} Example 14.30 finds	-0.124939
-0.356523	true or always false:	-0.124939
-0.355360	the code inside square:	-0.124939
-0.237967	be obeyed. Copy protection.	-0.124939
-0.358736	are incremental or iterative	-0.124939
-0.354390	libraries or shared objects),	-0.124939
-0.512957	representation is particularly tricky.	-0.124939
-0.358980	recommendation was the opposite:	-0.124939
-0.588066	back into the for-loop:	-0.124939
-0.582118	inlining has the complication	-0.124939
-0.596838	case a = ++b;	-0.124939
-0.358939	only half a square.	-0.124939
-0.237967	FuncType SelectAddMul, SelectAddMul_SSE2, SelectAddMul_SSE41,	-0.124939
-0.877543	to have a strategy	-0.124939
-0.358690	not negative by AND'ing	-0.124939
-0.356527	x(0) {}; void xplus2()	-0.124939
-0.355852	best on processor X?"	-0.124939
-0.314809	Look at Exception Specifications,	-0.124939
-0.591568	even be a million	-0.124939
-1.483176	dynamic memory allocation (new	-0.124939
-0.569731	mispredicted for this reason.	-0.124939
-0.358457	systems". For this reason,	-0.124939
-0.237967	better: -Ofast -mveclibabi -fopenmp	-0.124939
-0.351325	matrix into smaller squares	-0.124939
-0.325441	conditions are optimal. Best-case	-0.124939
-0.356113	a reply about investigation	-0.124939
-0.917463	an integer in disguise.	-0.124939
-0.357332	the normal return route.	-0.124939
-0.713035	to be too small,	-0.124939
-0.358856	its reputation. The compactness	-0.124939
-0.595983	avoid the loop overhead.	-0.124939
-0.650825	// Loop counter //=2*A	-0.124939
-0.354672	? 1.5f : 2.6f;	-0.124939
-0.237967	out results printf("\n%2i %10I64i",	-0.124939
-0.357957	there are floating point-to-integer	-0.124939
-0.504354	with this instruction set?".	-0.124939
-0.788701	the loop. The loop-branch	-0.124939
-0.874870	a) { return vector(x	-0.124939
-1.362696	by // Example 8.8b	-0.124939
-0.549824	manipulation Mathematical functions Encryption,	-0.124939
-2.159700	Example: // Example 8.8a	-0.124939
-0.336344	--------- ~a ^ ~b	-0.124939
-0.294284	to many users. Firewalls,	-0.124939
-0.237967	b ---xx---- a<<b<<c=a<<(b+c) x-xxx--xx	-0.124939
-0.357779	= (int)(&list[0]) + 100*16,	-0.124939
-0.237967	RGB color difference. Newest	-0.124939
-0.559150	exponent is always normalized,	-0.124939
-0.459815	(OnIdle in Windows MFC).	-0.124939
-0.755002	of stack unwinding ..............................................................................	-0.124939
-0.550446	two threads with widely	-0.124939
-0.580563	logarithm would be re-calculated	-0.124939
-0.960420	or not. The advise	-0.124939
-0.353466	the same cache. Multithreaded	-0.124939
-0.353869	be mainstream next year.	-0.124939
-0.294284	instruction set specified. Insert	-0.124939
-0.237967	as function inlining. Reducible	-0.124939
-0.355155	add elements }; vector()	-0.124939
-0.458548	lines. A few decades	-0.124939
-0.358898	is high and decreased	-0.124939
-1.041792	to optimization by CPU.............................................................................81	-0.124939
-0.596838	pointer a = (*CriticalFunction)(b,	-0.124939
-0.599091	details. // Example 12.7.	-0.124939
-0.356528	require other access patterns.	-0.124939
-0.659805	to develop and publish	-0.124939
-0.357136	equivalent to const definitions	-0.124939
-1.576555	b; a = Multiply(10,8);	-0.124939
-0.505040	of overflow is "undefined".	-0.124939
-0.408223	triangle is handled separately:	-0.124939
-0.558539	mov DWORD PTR [ecx+eax*4],ebx	-0.124939
-0.575674	and call the std::unexpected()	-0.124939
-0.358749	7.41b a.x = b.x	-0.124939
-0.358934	by thousands of people.	-0.124939
-0.237967	other exceptions: __except (GetExceptionCode()	-0.124939
-1.252094	with the option -mveclibabi=svml.	-0.124939
-0.763752	SIZE; c++) { a[c][r]	-0.124939
-1.381265	if they are uninitialized,	-0.124939
-2.075055	the number of jumps,	-0.124939
-1.021996	or a few places.	-0.124939
-0.294284	niche in scientific computing,	-0.124939
-0.237967	in programming nowadays stress	-0.124939
-0.458838	unsigned char 128 Iu8vec16	-0.124939
-0.237967	a[0], b[0], a[1], b[1],	-0.124939
-0.580820	calculations unless the strictness	-0.124939
-0.352982	CodeGear Microsoft Table 2.1.	-0.124939
-0.355852	Type size, bytes alignment,	-0.124939
-0.349799	2.8. Asmlib: v. 2.00.	-0.124939
-0.331953	the value wrap around.	-0.124939
-0.237967	eliminate common sub-expressions. Why	-0.124939
-0.626461	21 3.13 Memory access.......................................................................................................	-0.124939
-0.502642	while (i < arraysize)	-0.124939
-0.356338	and pointer type casting.	-0.124939
-0.356338	a simple type casting,	-0.124939
-0.844650	for array bounds violations	-0.124939
-0.294284	option) better: -Ofast -mveclibabi	-0.124939
-0.726333	with new or malloc.	-0.124939
-0.237967	2004 - 2014. Last	-0.124939
-0.504750	(with new or malloc)	-0.124939
-0.237967	source) { _mm_stream_pi((__m64*)dest, *(__m64*)&source);	-0.124939
-0.357229	that u < 231	-0.124939
-0.576548	in a specific interval.	-0.124939
-0.358816	disturbing influences are removed,	-0.124939
-0.538326	d; }; void Func()	-0.124939
-0.498878	within a certain interval:	-0.124939
-0.659810	the algorithm in question:	-0.124939
-0.461477	r1; c2 < r2;	-0.124939
-0.844650	for array bounds violation,	-0.124939
-0.541049	development methods are incremental	-0.124939
-0.358943	example 7.43b is admittedly	-0.124939
-0.357056	processor activates critical application-	-0.124939
-0.835875	good idea to collect	-0.124939
-0.237967	data sets. Covers PC's,	-0.124939
-0.456694	code. The name "position-independent	-0.124939
-0.597571	needed by the application,	-0.124939
-0.349809	functions and hot spots.	-0.124939
-2.107552	0; i < list.Size();	-0.124939
-1.366510	than on the essential	-0.124939
-0.237967	blocking: int r1, r2,	-0.124939
-0.726225	= multiply by xx-xx--x-	-0.124939
-0.358898	Public distribution and mirroring	-0.124939
-0.841876	the option for "function	-0.124939
-0.572248	spots have been identified.	-0.124939
-0.237967	y = MAX(f(x), g(x));	-0.124939
-0.523153	executing library functions. Time-	-0.124939
-0.354221	instruction set was originally	-0.124939
-0.358943	of lines is 8*1024/64	-0.124939
-0.237967	a|(b&c) x-xxxx--x ~a&~b=~(a|b) --xxxx---	-0.124939
-0.435100	vectorized as follows (using	-0.124939
-0.597571	calculated by the series:	-0.124939
-0.357779	= y.c + 3.;	-0.124939
-0.352712	if (u.i > v.i)	-0.124939
-0.596838	Is16vec8 a = select_gt(b,	-0.124939
-1.203304	parm2) {...} // Prototype	-0.124939
-1.130972	on the stack. String	-0.124939
-0.463305	erroneously called with IsPowerOf2	-0.124939
-0.237967	binutils version 2.20, glibc	-0.124939
-0.599285	51 for the pros	-0.124939
-0.237967	in special mathe- matical	-0.124939
-0.538986	data members (properties) ............................................................................	-0.124939
-0.726815	first element is stored?	-0.124939
-0.357606	memory allocation also tends	-0.124939
-0.593738	goes from the leftmost	-0.124939
-1.096106	Microsoft, Intel and Gnu).	-0.124939
-0.325441	Example 12.9b. Taylor series,	-0.124939
-0.325441	of a Taylor series.	-0.124939
-0.460468	also 512 bits (ZMM).	-0.124939
-0.237967	options Table 18.1. Command	-0.124939
-0.349799	Codeplay VectorC v. 2.1.7,	-0.124939
-0.538986	worth the effort. Square	-0.124939
-0.990885	this example, the DelayFiveSeconds	-0.124939
-0.356842	to show how tortuous	-0.124939
-0.721586	and ZMM registers ..........................................................	-0.124939
-0.237967	Menus, buttons, dialog boxes,	-0.124939
-0.237967	s = _mm_hadd_ps(s, s);	-0.124939
-0.357981	set control no yes	-0.124939
-0.237967	user. Menus, buttons, dialog	-0.124939
-0.585530	it is an integer).	-0.124939
-0.836130	operators &, |, ~.	-0.124939
-0.780055	XMM (vector) reductions: ~(~a)	-0.124939
-0.237967	transposes a quadratic matrix,	-0.124939
-0.331950	x-xxxx-x- x-xxxxxxx xxxxxxxxx xxxxxxx-x	-0.124939
-0.352983	page 164 below. Those	-0.124939
-0.331955	the current .cpp file)	-0.124939
-0.357229	details on branch predictions	-0.124939
-0.350876	to some positive value,	-0.124939
-0.557586	int b, c; x[0]	-0.124939
-0.358479	& source) { _mm_stream_pi((__m64*)dest,	-0.124939
-0.358630	implement OneOrTwo5[b!=0] as OneOrTwo5[(b!=0)	-0.124939
-0.358932	was manipulated to fake	-0.124939
-0.460468	is 128 bits (XMM)	-0.124939
-0.526423	is doing multiple logically	-0.124939
-0.358166	monitoring options. CPU vendors	-0.124939
-0.587338	for an integer constant,	-0.124939
-0.358560	double d; int i[2];	-0.124939
-0.872248	requires that you analyze	-0.124939
-1.945434	This can be accomplished	-0.124939
-0.237967	to use try, catch,	-0.124939
-0.237967	Addison-Wesley. Third Edition, 2005;	-0.124939
-0.237967	smmintrin.h (Gnu) AES, PCLMUL	-0.124939
-0.456088	the const keyword wherever	-0.124939
-0.353674	based on complicated criteria	-0.124939
-0.408223	54 7.22 Inheritance ..............................................................................................................	-0.124939
-0.353872	The pre-increment operator ++i	-0.124939
-0.294284	program is dividing repeatedly	-0.124939
-0.456965	library functions like sqrt,	-0.124939
-0.463394	keyword __restrict or __restrict__,	-0.124939
-0.358852	microprocessor hardware for raising	-0.124939
-0.237967	Include file dvec.h vectorclass.h	-0.124939
-1.003148	in the next section.	-0.124939
-1.844790	in the code section,	-0.124939
-0.356530	induction variable method unfavorable,	-0.124939
-0.573473	Intel vector classes 114	-0.124939
-0.408223	55 7.25 Bitfields ...................................................................................................................	-0.124939
-0.579256	because the programmer hasn't	-0.124939
-0.294284	s(0.f, 0.f, 0.f, 1.f);	-0.124939
-0.356486	SSE xmmintrin.h SSE2 emmintrin.h	-0.124939
-1.239694	is not always optimal,	-0.124939
-0.599677	assuming that the occurrence	-0.124939
-0.237967	cost of fine-tuning, testing,	-0.124939
-1.164099	of the preceding row.	-0.124939
-0.866082	Class member functions (methods).........................................................................	-0.124939
-0.596503	calls rather than self-styled	-0.124939
-0.357779	Intel SVML + ia32intrin.h	-0.124939
-0.356283	return powN<(N & N-1)==0,N>::p(x);	-0.124939
-0.875438	more than a minute	-0.124939
-0.358932	and simple to develop.	-0.124939
-0.421451	that they are. Declare	-0.124939
-0.541312	can get the exact	-0.124939
-0.577600	insufficient amount of RAM,	-0.124939
-0.659969	to identify the circumstances	-0.124939
-0.462403	recursion template<> class powN<true,1>	-0.124939
-0.358852	use ~ for NOT.	-0.124939
-0.504768	and (set) = (0x2710	-0.124939
-0.710979	int 8 AVX2 _mm_i64gather_epi32	-0.124939
-0.355253	/ CodeGear / Embarcadero	-0.124939
-0.654376	a thousand times lower;	-0.124939
-0.357580	compilers have efficient table-based	-0.124939
-0.539610	able to reduce (a*b*c)+(c*b*a)	-0.124939
-0.237967	513 2056 38.1 97	-0.124939
-0.582626	possible to calculate pow(x,10)	-0.124939
-0.237967	is called VTune; AMD's	-0.124939
-0.463645	"Moving blocks of data",	-0.124939
-0.587644	n; i++) { 92	-0.124939
-0.463335	memory space by allowing	-0.124939
-0.538986	4: "Instruction tables". Tips	-0.124939
-0.358736	Linux compiler, or vice	-0.124939
-0.357652	for random number generators.	-0.124939
-0.358749	double x8 = x4*x4;	-0.124939
-0.587987	examples I have tested.	-0.124939
-0.357779	= b.x + c.x	-0.124939
-0.357779	= b.y + c.y	-0.124939
-0.653736	a soft processor activates	-0.124939
-0.336344	is n places back,	-0.124939
-0.578506	services only when activated	-0.124939
-0.237967	calculating a polynomial. Scheduling	-0.124939
-0.599091	template: // Example 7.34b.	-0.124939
-0.538986	__attribute__ ((visibility ("internal"))) Vectorize	-0.124939
-0.350875	replaced by my comments,	-0.124939
-0.456097	of two induction variables:	-0.124939
-0.345271	set Important features 80386	-0.124939
-0.599091	denominator: // Example 14.16b	-0.124939
-0.294284	7.45 // Portability note:	-0.124939
-0.237967	as example 13.1, Requires	-0.124939
-0.861601	slow unless the Pentium-II	-0.124939
-0.463657	competing product is Borland's	-0.124939
-0.341907	x^n/n! xxn *= xx4;	-0.124939
-0.294284	no other exceptions: __except	-0.124939
-0.358934	other methods of rounding,	-0.124939
-0.538986	a serious legal issue,	-0.124939
-0.294284	way of removing superfluous	-0.124939
-0.593893	subexpression. For example, b*2.0/3.0	-0.124939
-0.594265	today will be mainstream	-0.124939
-0.504768	x); s = _mm_hadd_ps(s,	-0.124939
-0.550826	An example is Perl.	-0.124939
-2.159700	Example: // Example 14.17a	-0.124939
-0.999609	example 15.1b to 15.1c?	-0.124939
-0.358852	and 72 for discussions.	-0.124939
-0.460061	is called stack unwinding.	-0.124939
-0.331950	3; or __asm ("int	-0.124939
-0.463643	are set to relax	-0.124939
-0.343768	instruction set (or higher)	-0.124939
-0.294284	array ; i++ ;checkifi<100	-0.124939
-0.494965	cases are usually dealt	-0.124939
-0.726574	((a*x+b)*x+c)*x+d x*x*x*x*x*x*x*x = ((x2)2)2	-0.124939
-1.085719	subroutines in assembly language:	-0.124939
-0.580357	ReadTSC() { int dummy[4];	-0.124939
-0.658341	the newest CPU model,	-0.124939
-0.581243	that C++ compilers exist	-0.124939
-1.735621	example: // Example 14.15a	-0.124939
-0.449182	use of structures (without	-0.124939
-0.237967	is always true/false Loopunrolling	-0.124939
-0.805960	size parameter is wrong,	-0.124939
-0.358543	guess, that compiler makers	-0.124939
-1.227418	is not a textbook	-0.124939
-0.823221	int exponent : 15;	-0.124939
-0.658610	known as memory leaks.	-0.124939
-0.237967	footprint is unreasonably large.	-0.124939
-0.294284	thread affinity mask. Poor	-0.124939
-0.237967	(see p. 22). 159	-0.124939
-0.356053	the ReadTSC function. 154	-0.124939
-0.294284	(RTTI) /GR– -fno-rtti /GR-	-0.124939
-0.358344	return IntegerPower<10>(x); } 152	-0.124939
-0.442087	expects a GOT entry.	-0.124939
-0.504712	such as function inlining.	-0.124939
-1.024955	the call to Object1.Hello(),	-0.124939
-0.764560	example 15.1a to 151	-0.124939
-0.354534	systematic and well thought-through	-0.124939
-0.463643	compiler not to vectorize.	-0.124939
-0.358898	this works and suggests	-0.124939
-1.214384	longer time than normally.	-0.124939
-0.517240	float 4 AVX2 _mm256_i32gather_ps	-0.124939
-0.351725	.............................................................................................. 99 10 Multithreading..............................................................................................................	-0.124939
-0.237967	calls exit(), abort(), _endthread(),	-0.124939
-0.237967	their uses (live ranges)	-0.124939
-0.353678	A metaprogramming implementation analogous	-0.124939
-0.462024	up. The two summation	-0.124939
-0.656020	Choice of operating system.........................................................................................	-0.124939
-0.461608	copying the return statement:	-0.124939
-0.356523	I am always happy	-0.124939
-1.269442	compilers and operating systems"	-0.124939
-0.348406	// 32-bit Linux, Gnu/AT&T	-0.124939
-0.654990	14.13 System programming Device	-0.124939
-0.358898	between PC's and mainframes,	-0.124939
-0.599819	code to the device.	-0.124939
-0.237967	in the "Macro loops"	-0.124939
-0.294284	http://www.agner.org/optimize/ - vectorclass www.agner.org/optimize/#vectorclass.	-0.124939
-0.237967	syntax: __asm ("fldl %1	-0.124939
-0.314805	or First-In-Last-Out access, sort	-0.124939
-0.237967	// Example 7.29b floata;	-0.124939
-0.346506	installation tools. Automatic updates.	-0.124939
-0.456408	intranet for automatic updates,	-0.124939
-0.339514	a[i] = log (b[i]	-0.124939
-0.882354	also be a level-3	-0.124939
-0.705776	C++ Compiler v. 14.00	-0.124939
-1.183849	set is enabled. Few	-0.124939
-0.585051	The function that detects	-0.124939
-0.519258	use the name _alloca)	-0.124939
-0.237967	error has occurred anywhere	-0.124939
-0.820879	D : public B1,	-0.124939
-0.353678	and references. Most importantly,	-0.124939
-0.421451	21 3.10 Graphics .................................................................................................................	-0.124939
-0.237967	order a[0], b[0], a[1],	-0.124939
-0.357530	* 5 * 0.5	-0.124939
-0.520601	by the user. Feature	-0.124939
-0.237967	n.a. 2.23 0.95 0.6	-0.124939
-0.540502	AND'ed with this mask,	-0.124939
-0.900994	= 100; float list[ARRAYSIZE];	-0.124939
-0.759499	short int 64 Is16vec4	-0.124939
-0.748159	on Intel processors. Details	-0.124939
-0.343771	run. Examples include JavaScript,	-0.124939
-0.478801	NUMROWS = 100, NUMCOLUMNS	-0.124939
-0.343768	of four (or eight)	-0.124939
-0.799616	the following way. First	-0.124939
-0.957788	the risk of losing	-0.124939
-0.358749	{ time1 = ReadTSC();	-0.124939
-0.787764	between multiple CPU cores:	-0.124939
-0.358543	programmers and compiler makers.	-0.124939
-0.343772	template metaprogramming. Don't panic	-0.124939
-0.358856	= sin(0.8); The sin	-0.124939
-0.885097	the computer is rebooted.	-0.124939
-0.495473	file by calling WritePrivateProfileString,	-0.124939
-0.314805	F1(); } catch (...)	-0.124939
-0.586794	attacks and other abuse	-0.124939
-0.358457	known at this place.	-0.124939
-0.358560	int FuncRow(int); int FuncCol(int);	-0.124939
-0.353872	the modulo operator %.	-0.124939
-0.836130	and VIA CPUs"). Const	-0.124939
-0.595641	2) { // abs(u.f)	-0.124939
-0.343777	makers. 4. Instruction tables:	-0.124939
-0.656473	((unsigned int)n < 13)	-0.124939
-0.237967	make profiling feasible. Interference	-0.124939
-0.504029	decision at different times:	-0.124939
-0.358591	(unsigned int)(max - min))	-0.124939
-0.635365	and other resource conflicts.	-0.124939
-0.237967	will also work, 133	-0.124939
-0.357358	template metaprogramming so complicated?	-0.124939
-0.237967	need any patch. 131	-0.124939
-0.353233	#include <float.h> #include <math.h>	-0.124939
-1.308815	of the program. Application	-0.124939
-0.461839	thank the many people	-0.124939
-0.294284	such as floppy disks	-0.124939
-0.343768	a one parameter. Further	-0.124939
-0.463705	than half the single-thread	-0.124939
-0.738912	management and garbage collection,	-0.124939
-0.358344	+= i_div_3; } 138	-0.124939
-0.566905	and virtual function tables.	-0.124939
-0.237967	%1 \n fistpl %0	-0.124939
-0.341902	switch statement jump tables,	-0.124939
-0.519567	because of their superior	-0.124939
-0.358402	have become more powerful.	-0.124939
-0.599285	FAQ for the newsgroup	-0.124939
-0.504431	first time you activate	-0.124939
-0.358736	class C1 or C2,	-0.124939
-0.358666	Big supercomputers with massively	-0.124939
-0.886122	identified by a unique	-0.124939
-0.505025	of costs to multithreading	-0.124939
-0.899295	And it is unlikely	-0.124939
-0.358934	the queue of pending	-0.124939
-1.158720	in the order a[0],	-0.124939
-0.356937	8 = 64 kb.	-0.124939
-0.645586	cache is 512 kb,	-0.124939
-1.001975	It is common practice	-0.124939
-1.045146	runtime type identification (RTTI).	-0.124939
-1.045146	runtime type identification (RTTI),	-0.124939
-0.358307	file timingtest.h from www.agner.org/optimize/testp.zip	-0.124939
-0.382918	55 7.24 Unions ....................................................................................................................	-0.124939
-0.463600	when portability and ease	-0.124939
-1.424912	need to be resized	-0.124939
-0.451241	CPUs. The Pentium Pro	-0.124939
-0.345273	updates may come unpredictably	-0.124939
-0.653112	numbers and integers ...................................	-0.124939
-0.559425	through function calls. Internal	-0.124939
-0.358011	predictable than integer comparisons.	-0.124939
-0.294284	wrap around, (3) trap	-0.124939
-0.600620	data can be overridden	-0.124939
-0.237967	int 128 Iu32vec4 Vec4ui	-0.124939
-0.459533	to eliminate common sub-expressions.	-0.124939
-0.331950	bounds of valid addresses,	-0.124939
-0.725132	there are different opinions	-0.124939
-0.586047	cases for different microprocessors,	-0.124939
-0.456966	write the members individually.	-0.124939
-0.357779	{return p->a + p->b;}	-0.124939
-0.576473	eax. The loop initialisation	-0.124939
-1.413448	the code that matters	-0.124939
-0.237967	tempting to fine- tune	-0.124939
-0.352982	vector nontemporal Table 18.3.	-0.124939
-0.587603	was more than doubled	-0.124939
-0.581016	one of these categories:	-0.124939
-0.408223	drivers, interrupt service routines,	-0.124939
-0.358934	broader perspective of usability.	-0.124939
-0.352983	<xmmintrin.h> _mm_setcsr(_mm_getcsr() | 0x8040);	-0.124939
-1.004425	there are a couple	-0.124939
-0.840291	Use 64-bit mode Parameter	-0.124939
-1.503765	can also be huge).	-0.124939
-0.503600	Same as example 13.1,	-0.124939
-0.343772	the division faster. Of	-0.124939
-0.314805	next section (page 131)	-0.124939
-0.463600	program itself and recompile	-0.124939
-0.595641	__try { // Main	-0.124939
-0.294284	the libraries named MKL,	-0.124939
-0.657570	= log(b[i]) + log(c[i]);.	-0.124939
-0.314805	data. That being said,	-0.124939
-0.505027	/ (number of ways).	-0.124939
-0.350875	solution. (In my tests,	-0.124939
-0.351725	search facilities, binary trees,	-0.124939
-0.463652	make it a template:	-0.124939
-0.237967	double precision (80 bits).	-0.124939
-0.596838	... a = FactorialTable[b];	-0.124939
-0.331953	has been doubled. Thin	-0.124939
-1.601186	x) { return _mm_cvtsd_si32(_mm_load_sd(&x));}	-0.124939
-0.237967	s = (short int)i;	-0.124939
-1.393410	compilers I have tried.	-0.124939
-0.358901	(e.g. GetLogicalProcessorInformation in Windows)	-0.124939
-0.354805	a multidimensional structure needed?	-0.124939
-0.339516	Manual", Volume 1, 2A,	-0.124939
-0.358630	is now as follows.	-0.124939
-0.527024	float(i); f = static_cast<float>(i);	-0.124939
-0.590472	Windows in this respect.	-0.124939
-1.037125	the code becomes bulky	-0.124939
-0.358367	to load. A light-weight	-0.124939
-0.314805	of memory. One kilobyte	-0.124939
-0.540905	not turn on correction	-0.124939
-0.358730	instruction set it supports.	-0.124939
-0.595221	sets the CPU supports,	-0.124939
-0.635370	temp * temp; 104	-0.124939
-0.726225	the performance by 5-10%	-0.124939
-0.354671	cycle is 1 0.5ns.	-0.124939
-0.358943	certain modification is profitable.	-0.124939
-0.356339	char 8 0 255	-0.124939
-0.358856	be straightforward. The MASM	-0.124939
-0.357954	explained at page 150.	-0.124939
-0.833589	the CPUID instruction directly,	-0.124939
-0.358730	as accessing it directly.	-0.124939
-0.358526	once. One may argue	-0.124939
-0.557689	the problems and planned	-0.124939
-1.137401	advantage of this capability:	-0.124939
-0.348402	on its final destination,	-0.124939
-0.237967	ASP and UNIX shell	-0.124939
-0.358736	libraries (.dll or .so).	-0.124939
-0.586005	index out of range");	-0.124939
-0.586501	sign must be reversed	-0.124939
-0.237967	{1.1, 0.3, -2.0, 4.4,	-0.124939
-1.395201	the same as reflecting	-0.124939
-0.598331	Avoid the function scanf.	-0.124939
-0.583215	strlen function in isolation	-0.124939
-0.541049	shared resources are limiting	-0.124939
-0.524789	with a 32-bit (signed)	-0.124939
-0.598658	mispredictions, floating point exceptions,	-0.124939
-0.358431	more reproducible time measurements:	-0.124939
-0.439075	0x3F800000; // Now 1.0	-0.124939
-0.358736	registers (XMM or YMM)	-0.124939
-0.632969	large data sets. Covers	-0.124939
-0.294284	Sunday, Monday, Tuesday, Wednesday,	-0.124939
-0.559423	sixteen vector registers (XMM	-0.124939
-0.759499	short int 64 Iu16vec4	-0.124939
-0.461732	not need any patch.	-0.124939
-0.237967	table of 1/n! 1.,	-0.124939
-0.598749	conditions is not met	-0.124939
-0.314805	XOP ammintrin.h (MS) xopintrin.h	-0.124939
-0.237967	FuncA(i); FuncC(i); FuncB(i+1); FuncC(i+1);	-0.124939
-0.358939	= (a&b)&(c&d) a ^0	-0.124939
-0.463705	controversies over the C99	-0.124939
-0.755004	short int 128 Iu16vec8	-0.124939
-0.314805	all intrin.h (MS) x86intrin.h	-0.124939
-0.237967	conversion, shuffling, packing, unpacking	-0.124939
-0.358736	output (/FAs or -fsource-asm).	-0.124939
-0.314805	Take user feedback seriously.	-0.124939
-0.237967	are: int BigArray[1024] __attribute__((aligned(64)));	-0.124939
-0.345271	fixed address might clash	-0.124939
-0.294284	to the first-in-last-out nature	-0.124939
-0.331953	than the equivalent if(!(a	-0.124939
-0.358300	maximum possible memory requirement.	-0.124939
-0.382918	Codes", SIAM 2001. Advanced	-0.124939
-0.900215	constant can be propagated	-0.124939
-0.541253	p->f() goes to C0::f	-0.124939
-0.356937	1 int64_t 64 I64vec1	-0.124939
-0.762083	second generation class (CParent<>)	-0.124939
-0.325441	in a bad dilemma.	-0.124939
-0.579217	newest instruction set. High	-0.124939
-0.237967	g++ v 4.0.1. Gnu:	-0.124939
-0.358898	operating systems, and API's.	-0.124939
-0.357518	explanation and possible workaround.	-0.124939
-0.492293	Gnu C++ v. 4.1.0,	-0.124939
-0.325444	*(__m64*)&source); // MOVNTQ _mm_empty();	-0.124939
-0.659810	do things in parallel:	-0.124939
-0.959813	to see if our	-0.124939
-0.597580	* x - 8.0f)	-0.124939
-0.463657	mutexes, etc. is considerable.	-0.124939
-0.237967	frame" or "frame pointer".	-0.124939
-0.237967	tricks Michael Abrash: "Zen	-0.124939
-1.547260	pointer or reference parameters).	-0.124939
-0.488097	is a considerable debate	-0.124939
-0.358367	(See Sutter: A Pragmatic	-0.124939
-0.294284	5 and 9. Multiplications	-0.124939
-0.358250	the constant vector (1,2,3,4),	-0.124939
-0.314805	previous chapter (page 146).	-0.124939
-0.358901	my comments, in green.	-0.124939
-0.237967	integer registers. Typical candidates	-0.124939
-0.382918	metaprogramming in C++: Preprocessor	-0.124939
-1.032153	into multiple threads. Out-of-order	-0.124939
-0.355988	the computer while he	-0.124939
-0.835149	are used by thousands	-0.124939
-0.358939	to e.g. a menu	-0.124939
-0.577341	code. In this chapter,	-0.124939
-0.358898	data conversion and shuffling	-0.124939
-0.659506	|| (a&&b&&c) = a&&b	-0.124939
-0.357825	by avoiding pointer arithmetics	-0.124939
-0.649004	16 16 256 Vec32uc	-0.124939
-0.349799	types of variables. Move	-0.124939
-0.558345	prefer to write if(!a	-0.124939
-0.358852	NUMROWS; row++) for (column	-0.124939
-0.723826	loop by two gives:	-0.124939
-0.358932	be rounded to 100000000.	-0.124939
-0.537501	different object file formats.	-0.124939
-0.237967	Journal Vol. 11, Iss.	-0.124939
-1.133859	if it has incomplete	-0.124939
-0.237967	{ case 0: printf("Alpha");	-0.124939
-0.331950	} 59 third generations	-0.124939
-0.659895	unsigned integers is costless.	-0.124939
-0.237967	with more heuristic guidelines.	-0.124939
-0.462848	optimally, or from knowing	-0.124939
-0.885070	syntax in example 7.43b	-0.124939
-0.583203	example i = 18,	-0.124939
-0.237967	this time lag. Thinking	-0.124939
-0.446328	actual load address. Relocation	-0.124939
-0.348402	in integer registers. Typical	-0.124939
-0.237967	16 -32768 32767 int16_t	-0.124939
-0.294284	Free Documentation License shall	-0.124939
-0.357530	p1->Hello(); CChild2 * p2;	-0.124939
-0.352419	T a[N]; public: SafeArray()	-0.124939
-0.358898	which opens and closes	-0.124939
-0.237967	32 0 232-1 uint32_t	-0.124939
-0.354672	? 1.0f : 2.5f;	-0.124939
-0.339516	inserts temporary debug breakpoints	-0.124939
-0.504980	Intel compiler in favor	-0.124939
-0.237967	5.5 Mac: Darwin8 g++	-0.124939
-0.382918	Contents 1 Introduction .......................................................................................................................	-0.124939
-0.495931	who is still frustrated	-0.124939
-0.349799	Gnu: Glibc v. 2.7,	-0.124939
-0.355536	a[i+2]; s3 += a[i+3];	-0.124939
-0.550843	rows, not the columns.	-0.124939
-0.339510	{ protected: T a[N];	-0.124939
-0.355619	exp, sin, etc. Overriding	-0.124939
-0.357229	0; j < columns;	-0.124939
-0.237967	// Example 7.41b a.x	-0.124939
-0.237967	c.x + d.x; a.y	-0.124939
-0.358278	data. Extra data conversion,	-0.124939
-0.990742	or class is responsible	-0.124939
-0.237967	into the for-loop: i++;	-0.124939
-0.382918	prototype: void F1() throw();	-0.124939
-0.331950	the loop increment i++.	-0.124939
-0.353463	of good development tools,	-0.124939
-0.524358	main will take precedence,	-0.124939
-0.525954	and then call __intel_cpu_features_init_x().	-0.124939
-0.358760	* sizeof(float)); // (Some	-0.124939
-0.358898	in reusable and well-	-0.124939
-0.358307	a jump from a=a*2;	-0.124939
-0.787913	to generate an interrupt,	-0.124939
-0.492293	PGI C++ v. 7.1-4,	-0.124939
-0.358666	integer division with truncation,	-0.124939
-0.325441	or ten years old.	-0.124939
-0.356339	long 32 0 232-1	-0.124939
-0.460058	framework in its API.	-0.124939
-0.523477	float. The type __m128d	-0.124939
-0.237967	software development", Addison- Wesley	-0.124939
-0.591025	devirtualization (see page 73)	-0.124939
-0.594095	so. See page 73.	-0.124939
-0.421451	// x^4 F32vec4 s(0.f,	-0.124939
-1.549717	used in the Active	-0.124939
-0.596074	language as a subset,	-0.124939
-0.912392	number of possible remedies	-0.124939
-0.598553	alignment and the resultant	-0.124939
-0.600955	back in the sequence,	-0.124939
-1.477555	the compiler can safely	-0.124939
-0.356283	= OneOrTwo5[b & 1];	-0.124939
-0.601255	searches of the kind:	-0.124939
-0.237967	in a multitasking environment,	-0.124939
-0.357957	and 8 floating point).	-0.124939
-0.495927	directives Preprocessing directives (everything	-0.124939
-0.237967	Microsoft Visual studio 2008,	-0.124939
-0.442085	int * __restrict bb)	-0.124939
-0.237967	C++ v. 7.1-4, 2008.	-0.124939
-0.358250	compiler supports vector intrinsics,	-0.124939
-0.504069	performance for vector intrinsics.	-0.124939
-0.979705	is an extra layer	-0.124939
-0.325444	a First-In-Last- Out (FILO)	-0.124939
-0.347532	for "function level linking"	-0.124939
-1.735621	example: // Example 14.2a	-0.124939
-0.897173	lookup: // Example 14.2b	-0.124939
-0.358341	and FuncB, then FuncC.	-0.124939
-0.358898	CPU brands and similarly	-0.124939
-0.294284	Windows Server 2008 R2	-0.124939
-0.237967	multiple of 0x800 apart.	-0.124939
-0.352086	bit integer vectors FMA3	-0.124939
-0.504977	logical structure and clarity	-0.124939
-0.353676	difficult cases like these,	-0.124939
-0.577157	mostly compatible with these.	-0.124939
-0.237967	and CPU hardware. Porting	-0.124939
-0.580353	time has been wasted.	-0.124939
-0.294284	such as memcpy, memmove,	-0.124939
-0.820015	edx, DWORD PTR [esp+12]	-0.124939
-0.659713	a feature for reserving	-0.124939
-0.599712	updated. It is tempting	-0.124939
-0.352981	two or three levels	-0.124939
-0.356527	another thread void DelayFiveSeconds()	-0.124939
-0.726790	is intended to mimic	-0.124939
-0.237967	"Effective C++". Addison-Wesley. Third	-0.124939
-1.352769	are useful for identifying	-0.124939
-0.356178	multiple functions. I disagree	-0.124939
-0.595332	profilers such as AQtime,	-0.124939
-0.599091	bits: // Example 14.29	-0.124939
-0.599091	zero: // Example 14.24	-0.124939
-0.897173	bit: // Example 14.25	-0.124939
-0.520608	have this problem. Vectors	-0.124939
-0.408223	addition, fast approximate reciprocal,	-0.124939
-0.599091	function: // Example 14.20	-0.124939
-1.328250	code in example 14.21	-0.124939
-0.594611	caches have to adapt	-0.124939
-0.873802	two functions are unrelated	-0.124939
-0.900884	seen in the unit-	-0.124939
-0.444434	8 #define FUNCNAME SelectAddMul_AVX2	-0.124939
-0.570296	double d = 1.6;	-0.124939
-0.336351	would have spent fighting	-0.124939
-0.358898	Functions _intel_fast_memcpy and __intel_new_strlen	-0.124939
-0.505035	Supports x86 and ARM	-0.124939
-0.541215	this result in a[i].	-0.124939
-0.532588	core is running at,	-0.124939
-0.592349	worse, it can overwrite	-0.124939
-0.237967	other thread increments seconds.	-0.124939
-1.360831	last byte at 399	-0.124939
-0.355449	PCLMUL wmmintrin.h AVX immintrin.h	-0.124939
-0.358939	and afterwards a BSF	-0.124939
-0.463169	volatile volatile int seconds;	-0.124939
-1.423686	= b * 1.2f;	-0.124939
-0.358630	to 15.1c as intended,	-0.124939
-0.358736	project window or makefile.	-0.124939
-0.357514	or modifies many strings.	-0.124939
-0.883698	operating system may supply	-0.124939
-1.049353	then use a queue.	-0.124939
-0.897279	sharing the same queue,	-0.124939
-0.357079	function was called from),	-0.124939
-0.577420	necessary functions for distinguishing	-0.124939
-0.237967	deque (doubly ended queue)	-0.124939
-0.237967	for the newsgroup comp.lang.asm.x86	-0.124939
-0.592957	structure in example 9.1b.	-0.124939
-1.540013	bb[], short int cc[]);	-0.124939
-0.478801	things in parallel. Coarse-grained	-0.124939
-0.339514	Hyperthreading is Intel's term	-0.124939
-0.237967	classes like string, wstring	-0.124939
-0.352414	to override public symbols,	-0.124939
-0.325441	have quite dramatic consequences.	-0.124939
-0.957788	the risk of activating	-0.124939
-0.459326	list[i+1];} sum1 += sum2;	-0.124939
-0.463657	between threads is minimized.	-0.124939
-0.351726	10 elements were inserted,	-0.124939
-0.357332	reporting here: return *(T*)0;	-0.124939
-0.592957	if-branch in example 7.30b.	-0.124939
-0.659061	clock frequency may vary	-0.124939
-1.362696	this: // Example 14.4a	-0.124939
-0.463445	and data can exceed	-0.124939
-0.352981	!= INVALID_HANDLE_VALUE && WriteFile(handle,	-0.124939
-0.527024	5) SelectAddMul_pointer = &SelectAddMul_SSE41;	-0.124939
-0.237967	Technology Journal Vol. 11,	-0.124939
-0.355444	has already been allocated.	-0.124939
-0.738899	described in chapter 11.	-0.124939
-0.463657	This cost is minimized	-0.124939
-0.590525	pointers do not alias,	-0.124939
-0.358344	+ 2.0f; } 115	-0.124939
-0.549395	and on Intel Atom	-0.124939
-0.237967	7.15b SafeArray <float, 100>	-0.124939
-0.237967	iset = instrset_detect(); 116	-0.124939
-0.902036	i, a); } 111	-0.124939
-0.294284	= _mm_and_si128(c2, mask); 110	-0.124939
-0.479834	guaranteed to wrap around,	-0.124939
-0.834112	*)d, x); } 112	-0.124939
-0.500280	optimizations with option -Wstrict-overflow=2,	-0.124939
-0.294284	{ FuncA(i); FuncC(i); FuncB(i+1);	-0.124939
-0.237967	with vector operands: minimum,	-0.124939
-1.303514	function is called. 118	-0.124939
-0.823221	int exponent : 11;	-0.124939
-1.062213	Division by a constant:	-0.124939
-0.237967	Henry S. Warren, Jr.:	-0.124939
-0.591025	profitable (see page 70).	-0.124939
-0.353872	The AND operator (&)	-0.124939
-0.846055	= 2; } list[300]	-0.124939
-0.982355	the Boolean operators (&&	-0.124939
-0.582923	&CriticalFunction_SSE2; } // Default	-0.124939
-0.294284	file will remain locked	-0.124939
-0.237967	1., 1./2., 1./6., 1./24.,	-0.124939
-0.356284	of machine instructions executed,	-0.124939
-0.895547	enough to be annoying.	-0.124939
-0.715066	a linked list. 94	-0.124939
-0.498212	allocating more space 91	-0.124939
-0.355536	temp; temp += 9;	-0.124939
-0.237967	from example 9.5a: 98	-0.124939
-0.353873	on the Mac platform,	-0.124939
-0.462962	the memory when exiting	-0.124939
-0.584835	imprecision in some rare	-0.124939
-0.358934	the program of occupying	-0.124939
-0.504895	bit-mask: c2 = _mm_and_si128(c2,	-0.124939
-0.358898	right format and getting	-0.124939
-0.659810	is possible in Linux).	-0.124939
-2.159700	Example: // Example 7.41a	-0.124939
-0.897173	operations: // Example 7.41b	-0.124939
-0.341900	bit are zero. Zero	-0.124939
-0.351726	<< list[i] << endl;	-0.124939
-0.347530	/ 0x40) % 0x20	-0.124939
-0.558539	[eax+400] DWORD PTR [eax],	-0.124939
-0.237967	// Example 7.38b. Alternative	-0.124939
-0.456698	make a Boolean NOT	-0.124939
-0.294284	will be misleading reports	-0.124939
-0.355850	in the big registration	-0.124939
-1.243589	case of an error;	-0.124939
-0.358709	keywords Fast function calling.	-0.124939
-0.339514	the keyword far (arrays	-0.124939
-0.358736	keyword __thread or __declspec(thread).	-0.124939
-1.472528	= a * 2.5;	-0.124939
-0.596612	slow. If the granularity	-0.124939
-0.595301	} u; u.i &=	-0.124939
-0.900215	to can be accessed.	-0.124939
-0.600955	options in the BIOS	-0.124939
-0.463652	which transposes a quadratic	-0.124939
-0.462891	calculated first, then d+e,	-0.124939
-0.505040	by r is re-loaded	-0.124939
-0.558683	with the constant 2.5,	-0.124939
-0.458551	it often contains writeable	-0.124939
-0.358736	using new/delete or malloc/free	-0.124939
-0.595301	set is enabled (single	-0.124939
-0.502433	A feature called "Gnu	-0.124939
-0.294284	the instruction xor eax,eax.	-0.124939
-0.358901	or iterative in nature,	-0.124939
-0.294284	-msse3 /arch:SSE3 -mssse3 /arch:SSSE2	-0.124939
-1.683872	explained on page 27.	-0.124939
-0.237967	development", Addison- Wesley 1997.	-0.124939
-0.355446	Structures and classes Nowadays,	-0.124939
-0.347528	micro-op cache (e.g. Sandy	-0.124939
-0.336346	and "Integrated Performance Primitives".	-0.124939
-0.774338	mangled function name ;startofFunc	-0.124939
-0.541178	This results in meaningless	-0.124939
-0.726255	x x-- x ---	-0.124939
-0.356901	Updating mechanisms often disturb	-0.124939
-0.828365	to put the task-specific	-0.124939
-0.294284	and network connections. Temporary	-0.124939
-0.504561	is 1. This '1'	-0.124939
-0.358749	0); DontSkip = dummy[0];	-0.124939
-0.562944	dispatcher function and replaces	-0.124939
-0.884444	temp; temp = a+1;	-0.124939
-1.261073	that are not reproducible.	-0.124939
-0.536860	sometimes it does incredibly	-0.124939
-0.450289	require a variable. Efficiency	-0.124939
-0.463749	result is valid. Re-interpreting	-0.124939
-0.565686	penalty to using inheritance.	-0.124939
-0.357661	declared. Avoid multiple inheritance,	-0.124939
-0.314805	the same brand. Future	-0.124939
-0.565165	on the second sub-vector	-0.124939
-0.457213	EXCLUSIVE OR operator (^)	-0.124939
-0.358943	for 'this' is incurred	-0.124939
-0.456084	= (a<b && b<c)	-0.124939
-0.456406	C++ is Microsoft Foundation	-0.124939
-0.582378	in the other volumes	-0.124939
-0.352982	x86intrin.h (Gnu) Table 12.2.	-0.124939

\5-grams:
-0.569879	efficiency then it is the
-0.485558	checks. But it is the
-0.344938	interrupted. Now it is the
-0.226621	surely rely on is the
-0.345756	the machine code is the
-0.315993	method. // This is the
-0.315993	time1; // This is the
-0.289507	a program. This is the
-0.377068	the data. This is the
-0.289507	memory pointer. This is the
-0.289507	as well. This is the
-0.289507	subsequent counts. This is the
-0.289507	each other. This is the
-0.289507	is compiled. This is the
-0.289507	function declaration. This is the
-0.289507	also de-allocated. This is the
-0.289507	ever happens. This is the
-0.389941	position-independent because this is the
-0.300002	size. If this is the
-0.300002	operating system this is the
-0.334107	the arrays. It is the
-0.432288	same core. It is the
-0.334107	of course. It is the
-0.334107	to diagnose. It is the
-0.334107	for response. It is the
-0.334107	memory leaks. It is the
-0.316143	intermediate code, which is the
-0.316143	a latency which is the
-0.316143	generic branch, which is the
-0.534999	the instruction set is the
-0.534999	x86 instruction set is the
-0.432373	the function pointer is the
-0.327536	a simple array is the
-0.226621	2eee 1.fffff, where is the
-0.482773	of a variable is the
-0.404256	that memory access is the
-0.301296	same executable. SSE2 is the
-0.415522	function. The stack is the
-0.226621	virtual function calls is the
-0.329262	2. The result is the
-0.491993	Transposing a matrix is the
-0.339711	more powerful solution is the
-0.226621	(i=0; i<n; i++) is the
-0.317906	a sorted list is the
-0.281389	to. A reference is the
-0.311603	back, where n is the
-0.301296	sequence, where r is the
-0.329262	method used here is the
-0.226621	to handle strings is the
-0.301296	set of containers is the
-0.305757	A light-weight alternative is the
-0.226621	where template metaprogramming is the
-0.134478	a clock cycle is the
-0.301296	parallel. Fine-grained parallelism is the
-0.226621	looking name ?Func@@YAXQAHAAH@Z is the
-0.226621	the kind: "what is the
-0.226621	/ jl $B1$2 is the
-0.226621	alternative worth considering is the
-0.226621	Another serious burden is the
-0.226621	the sign, eee is the
-0.226621	exponent, and fffff is the
-0.226621	instruction add eax,1 is the
-0.210581	b should be of the
-0.178371	CPU detection function of the
-0.178371	monotonically increasing function of the
-0.178371	a staircase function of the
-0.163623	prefetching the code of the
-0.528273	economize the use of the
-0.226819	can make use of the
-0.226819	make better use of the
-0.163623	one or more of the
-0.179372	compile time because of the
-0.179372	register variables because of the
-0.179372	some systems because of the
-0.179372	be avoided because of the
-0.179372	is inefficient because of the
-0.074151	is discussed which of the
-0.074151	in advance which of the
-0.210581	efficient if all of the
-0.004660	This is one of the
-0.009371	Polymorphism is one of the
-0.048270	pointer to one of the
-0.048270	identical to one of the
-0.028742	example, only one of the
-0.014133	read into one of the
-0.014133	0x273F into one of the
-0.028742	may choose one of the
-0.028742	line. Only one of the
-0.028742	for signifying one of the
-0.210581	if the class of the
-0.210581	and after each of the
-0.110074	object where most of the
-0.110074	one way most of the
-0.110074	Windows, while most of the
-0.110074	versa. But most of the
-0.110074	are predicted most of the
-0.110074	software runs most of the
-0.110074	can obtain most of the
-0.110074	that consumes most of the
-0.088530	by the size of the
-0.173418	when the size of the
-0.199851	unless the size of the
-0.199851	fit the size of the
-0.199851	half the size of the
-0.199851	Return the size of the
-0.199851	increases the size of the
-0.361749	efficient. The size of the
-0.251997	reasons: The size of the
-0.003471	is a multiple of the
-0.008731	by a multiple of the
-0.017642	size a multiple of the
-0.191822	where the object of the
-0.315629	to an object of the
-0.094617	in an object of the
-0.072471	IDE with many of the
-0.034726	D has many of the
-0.034726	Pascal has many of the
-0.072471	and avoids many of the
-0.067515	the same version of the
-0.295733	known which version of the
-0.067515	of each version of the
-0.067515	best possible version of the
-0.067515	the new version of the
-0.067515	// specific version of the
-0.067515	the optimized version of the
-0.067515	the optimal version of the
-0.067515	a better version of the
-0.003138	the appropriate version of the
-0.012689	The appropriate version of the
-0.067515	the desired version of the
-0.007004	the right version of the
-0.067515	the final version of the
-0.067515	a newer version of the
-0.032447	17 debug version of the
-0.032447	Uses debug version of the
-0.090974	the latest version of the
-0.067515	An inferior version of the
-0.067515	A command-line version of the
-0.131184	of the value of the
-0.167831	that the value of the
-0.128675	if the value of the
-0.131184	If the value of the
-0.131184	sure the value of the
-0.131184	after the value of the
-0.211626	read the value of the
-0.131184	hold the value of the
-0.227518	counts. The value of the
-0.144348	each different value of the
-0.144348	the final value of the
-0.031842	the absolute value of the
-0.117295	speed in any of the
-0.117295	true, if any of the
-0.117295	can use any of the
-0.174206	units. If any of the
-0.117295	used, but any of the
-0.043859	care of some of the
-0.043859	programmers to some of the
-0.043859	comes with some of the
-0.043859	have described some of the
-0.043859	common. Even some of the
-0.043859	others. While some of the
-0.043859	sections describe some of the
-0.136081	where the performance of the
-0.136081	about the performance of the
-0.136081	influence the performance of the
-0.115960	change the order of the
-0.035321	swap the order of the
-0.136355	The opposite order of the
-0.254654	the dispatch branch of the
-0.131290	that is member of the
-0.149878	be a member of the
-0.068505	function a member of the
-0.131290	other (not member of the
-0.108855	of the address of the
-0.033343	to the address of the
-0.051026	up the address of the
-0.024764	contains the address of the
-0.108855	simply the address of the
-0.108855	find the address of the
-0.113740	the return address of the
-0.163623	on every call of the
-0.260141	languages are out of the
-0.260141	the conversions out of the
-0.480681	be moved out of the
-0.080811	only the part of the
-0.132849	that is part of the
-0.132849	often a part of the
-0.080811	(Darwin) are part of the
-0.080811	included as part of the
-0.038528	that this part of the
-0.038528	on this part of the
-0.008271	the same part of the
-0.080811	see which part of the
-0.080811	reasons, but part of the
-0.038528	in each part of the
-0.038528	times each part of the
-0.080811	a static part of the
-0.080811	include any part of the
-0.451485	the critical part of the
-0.163798	same critical part of the
-0.116224	most critical part of the
-0.080811	an important part of the
-0.080811	a small part of the
-0.080811	The optimized part of the
-0.080811	and another part of the
-0.080811	a particular part of the
-0.080811	most significant part of the
-0.080811	most time-consuming part of the
-0.080811	the time-critical part of the
-0.080811	the task-specific part of the
-0.236167	interpret the bits of the
-0.077199	or 16 bits of the
-0.077199	lower 16 bits of the
-0.171152	significant n bits of the
-0.258884	In the case of the
-0.163623	Xnu project. Some of the
-0.028186	or more versions of the
-0.051584	the different versions of the
-0.110132	with different versions of the
-0.110132	If different versions of the
-0.113549	make multiple versions of the
-0.113549	generate multiple versions of the
-0.183828	are two versions of the
-0.125724	make special versions of the
-0.125724	the CPU-specific versions of the
-0.040182	for the result of the
-0.012985	on the result of the
-0.040182	when the result of the
-0.040182	needs the result of the
-0.160712	fast. The result of the
-0.163623	the first element of the
-0.163623	that do much of the
-0.034534	check for overflow of the
-0.244968	types to integers of the
-0.316714	high processing power of the
-0.316714	the computational power of the
-0.344792	and the calculation of the
-0.238770	efficient the calculation of the
-0.238770	out the calculation of the
-0.238770	up the calculation of the
-0.238770	case, the calculation of the
-0.163623	for the parameters of the
-0.163623	The worst problem of the
-0.260422	that takes advantage of the
-0.260422	The main advantage of the
-0.260422	We took advantage of the
-0.163623	system for support of the
-0.163623	where only few of the
-0.227434	the whole structure of the
-0.121823	use the copy of the
-0.027404	a non-inlined copy of the
-0.210581	with the problems of the
-0.163623	larger address space of the
-0.098996	standard. An implementation of the
-0.098996	a good implementation of the
-0.153405	a complicated implementation of the
-0.098996	A typical implementation of the
-0.048002	procedure 4 Most of the
-0.048002	interface framework Most of the
-0.048002	limited resources. Most of the
-0.273798	they are members of the
-0.203352	saved variable members of the
-0.203352	instance. Non-static members of the
-0.351217	most important disadvantage of the
-0.163623	shares the resources of the
-0.242158	of the end of the
-0.242158	at the end of the
-0.163623	common language runtime of the
-0.008020	than the parts of the
-0.008020	optimize the parts of the
-0.016190	possible when parts of the
-0.016190	and make parts of the
-0.029268	the different parts of the
-0.029268	in different parts of the
-0.091607	between different parts of the
-0.008020	to other parts of the
-0.008020	affects other parts of the
-0.016190	most used parts of the
-0.003190	the critical parts of the
-0.003190	in critical parts of the
-0.003190	or critical parts of the
-0.003190	most critical parts of the
-0.003190	less critical parts of the
-0.016190	system- specific parts of the
-0.016190	that certain parts of the
-0.016190	most time-consuming parts of the
-0.016190	brand. Critical parts of the
-0.016190	other nearby parts of the
-0.230427	built-in code instead of the
-0.230427	file format instead of the
-0.230427	and |) instead of the
-0.163623	are unacceptable. Each of the
-0.210581	do interprocedural optimizations of the
-0.210581	other microprocessors. Many of the
-0.210581	OR the results of the
-0.163623	operands The operands of the
-0.128516	at the start of the
-0.128516	framework, during start of the
-0.353834	invoking the overhead of the
-0.234046	the extra overhead of the
-0.059522	both during installation of the
-0.059522	itself, during installation of the
-0.083396	whenever an instance of the
-0.083396	than one instance of the
-0.083396	with each instance of the
-0.019396	a new instance of the
-0.210581	compile the output of the
-0.163623	the essential task of the
-0.295947	on the efficiency of the
-0.378328	etc. The efficiency of the
-0.199921	for more discussion of the
-0.200967	a further discussion of the
-0.334525	for further discussion of the
-0.134501	to the offset of the
-0.134501	if the offset of the
-0.897315	for the sake of the
-0.134501	{ The effect of the
-0.134501	loop. The effect of the
-0.163623	The clock frequency of the
-0.074151	in one iteration of the
-0.074151	for every iteration of the
-0.163623	for future models of the
-0.282313	for. The names of the
-0.227434	of lazy loading of the
-0.282313	off the reading of the
-0.163623	worst case situation of the
-0.480446	two different implementations of the
-0.210581	and different sizes of the
-0.074151	a large fraction of the
-0.074151	a small fraction of the
-0.327498	if the length of the
-0.327498	If the length of the
-0.103453	of the beginning of the
-0.009546	to the beginning of the
-0.039512	that the beginning of the
-0.039512	into the beginning of the
-0.210581	} The declaration of the
-0.227434	time- consuming features of the
-0.566495	be a waste of the
-0.239878	a total waste of the
-0.059522	it is independent of the
-0.059522	11.3 is independent of the
-0.163623	instructions without help of the
-0.258925	more detailed explanation of the
-0.163623	faster. The logic of the
-0.061870	that takes care of the
-0.366742	can take care of the
-0.341403	// The purpose of the
-0.221487	to all instances of the
-0.158484	many renamed instances of the
-0.282313	by the body of the
-0.163623	if the changes of the
-0.236112	that the representation of the
-0.000398	is the responsibility of the
-0.017385	is the reciprocal of the
-0.163623	n'th degree polynomial of the
-0.464402	in the scope of the
-0.330374	by the throughput of the
-0.210581	of each step of the
-0.202968	most cases, regardless of the
-0.202968	be false regardless of the
-0.282313	use just-in-time compilation of the
-0.244697	change the behavior of the
-0.158484	details. The behavior of the
-0.163623	is the job of the
-0.465552	by the requirements of the
-0.005488	to the rest of the
-0.005488	and the rest of the
-0.005488	in the rest of the
-0.005488	that the rest of the
-0.005488	by the rest of the
-0.282313	by the latency of the
-0.163623	using advanced facilities of the
-0.163623	prediction or estimate of the
-0.048002	to transfer ownership of the
-0.048002	that transfers ownership of the
-0.048002	that looses ownership of the
-0.062064	Overcoming the drawbacks of the
-0.163623	requires no modification of the
-0.163623	improved by modifications of the
-0.163623	smaller. The lengths of the
-0.048002	which is 50% of the
-0.048002	is true 50% of the
-0.048002	be mispredicted 50% of the
-0.401187	size (in bytes) of the
-0.163623	with the resolution of the
-0.282313	a complete redesign of the
-0.163623	a thorough analysis of the
-0.282313	copy the contents of the
-0.163623	above. The generality of the
-0.282313	to keep track of the
-0.115357	to get rid of the
-0.115357	don't get rid of the
-0.163623	the optimal decomposition of the
-0.163623	OpenMP. www.openmp.org. Documentation of the
-0.048002	achieved when none of the
-0.023338	expression, but none of the
-0.023338	15.1c, but none of the
-0.163623	exceptions are indeed of the
-0.163623	b) But beware of the
-0.163623	that uses 90% of the
-0.163623	only the lowest of the
-0.210581	a better understanding of the
-0.074151	more than 99% of the
-0.074151	other programs, 99% of the
-0.163623	for further expansions of the
-0.074151	is only 10% of the
-0.074151	is true 10% of the
-0.163623	less than 1/50 of the
-0.163623	the logical architecture of the
-0.163623	generality and flexibility of the
-0.163623	the binary decimals of the
-0.163623	need metaprogramming. None of the
-0.163623	the responsi- bility of the
-0.163623	if the evaluation of the
-0.163623	if the bias of the
-0.163623	a detailed overview of the
-0.163623	a good knowledge of the
-0.163623	set (called x86) of the
-0.163623	to do searches of the
-0.163623	reusability and systematization of the
-0.163623	function. This fragmentation of the
-0.163623	64-bit mode. Much of the
-0.163623	systems use segmentation of the
-0.163623	about the dimensions of the
-0.163623	reply about investigation of the
-0.163623	reputation. The compactness of the
-0.163623	the first-in-last-out nature of the
-0.163623	structure and clarity of the
-0.315459	and reduce a to the
-0.242910	and compare it to the
-0.242910	which redirects it to the
-0.315154	as machine code to the
-0.261955	rules apply as to the
-0.557531	by the compiler to the
-0.111519	template for x to the
-0.111519	to get x to the
-0.111519	15.1a. Calculate x to the
-0.339329	from static memory to the
-0.261955	converting the data to the
-0.261955	CPU dispatching only to the
-0.210407	makes it point to the
-0.210407	it will point to the
-0.043830	types cannot point to the
-0.209444	visible at all to the
-0.318738	one more integer to the
-0.282552	and a pointer to the
-0.282552	stores a pointer to the
-0.282552	returns a pointer to the
-0.282552	Returns a pointer to the
-0.172910	reference or pointer to the
-0.037208	a function pointer to the
-0.037208	// Set pointer to the
-0.280971	the first object to the
-0.532886	floating point number to the
-0.021284	the keyword static to the
-0.187363	than one call to the
-0.280182	only one call to the
-0.245552	make any call to the
-0.245552	the first call to the
-0.779633	table of pointers to the
-0.275746	allows multiple pointers to the
-0.199114	threads have access to the
-0.199114	fastest possible access to the
-0.199114	to get access to the
-0.199114	which gives access to the
-0.261955	actually adds 16 to the
-0.209444	adding new instructions to the
-0.261955	be made available to the
-0.021284	adding a constant to the
-0.529608	performance is important to the
-0.402655	avoid the calls to the
-0.209444	down the execution to the
-0.317558	constant and known to the
-0.209444	do not add to the
-0.209444	the threads write to the
-0.092217	as a parameter to the
-0.092217	an implicit parameter to the
-0.247283	returns a reference to the
-0.553406	pointer or reference to the
-0.261955	by adding n to the
-0.191974	code in addition to the
-0.191974	most important addition to the
-0.209444	m is transferred to the
-0.021284	a well-defined interface to the
-0.215003	The output goes to the
-0.215003	software project goes to the
-0.261955	a symbolic link to the
-0.343593	dispatch is made to the
-0.302950	call it points to the
-0.177050	pointer initially points to the
-0.177050	entry initially points to the
-0.261955	the function go to the
-0.261955	very little overhead to the
-0.028617	code are relative to the
-0.028617	each function relative to the
-0.014073	the member relative to the
-0.014073	data member relative to the
-0.028617	the offset relative to the
-0.028617	fact addressed relative to the
-0.085465	more threads writing to the
-0.085465	multiple threads writing to the
-0.209444	it more clear to the
-0.209444	from one iteration to the
-0.290801	is artificially changed to the
-0.209444	install automatic updates to the
-0.156115	Make calls directly to the
-0.156115	be fed directly to the
-0.167628	object is copied to the
-0.111519	has been copied to the
-0.111519	entire contents copied to the
-0.343593	which is similar to the
-0.290801	and jumps back to the
-0.209444	of a row to the
-0.139159	c is added to the
-0.139159	members is added to the
-0.139159	f is added to the
-0.432576	This also applies to the
-0.209444	array pointer eax to the
-0.209444	by adding throw() to the
-0.209444	all the inputs to the
-0.021284	to be distributed to the
-0.011222	way is equal to the
-0.011222	list[i] is equal to the
-0.011222	label is equal to the
-0.034574	to be equal to the
-0.034574	is therefore equal to the
-0.209444	writes or reads to the
-0.209444	the leftmost column to the
-0.231063	large delay due to the
-0.231063	some differences due to the
-0.343593	may be obvious to the
-0.261955	may be swapped to the
-0.092217	no certain limit to the
-0.092217	reasonable upper limit to the
-0.215003	stack always belong to the
-0.215003	cache lines belong to the
-0.209444	shifts one place to the
-0.191974	quite efficient thanks to the
-0.191974	becomes fragmented thanks to the
-0.280971	used as alternatives to the
-0.209444	lot of modifications to the
-0.043665	option -fpic according to the
-0.043665	binary representation according to the
-0.043665	always behave according to the
-0.043665	100. Now, according to the
-0.280971	can be extended to the
-0.209444	but also inconvenient to the
-0.261955	performance is inferior to the
-0.092217	This is annoying to the
-0.092217	schemes are annoying to the
-0.261955	has been translated to the
-0.280971	switch statement leads to the
-0.261955	long time compared to the
-0.209444	Coarse-grained parallelism refers to the
-0.261955	This normally belongs to the
-0.209444	from the caller to the
-0.092217	a significant contribution to the
-0.092217	a negligible contribution to the
-0.209444	appropriate error messages to the
-0.209444	the 64-bit extension to the
-0.209444	restarted anyway. Updates to the
-0.209444	It is unacceptable to the
-0.209444	to integer According to the
-0.209444	values is closest to the
-0.209444	by default, conform to the
-0.209444	smaller and closer to the
-0.209444	have to adapt to the
-0.309660	the inlined function and the
-0.284449	The Gnu compiler and the
-0.024714	by the CPU and the
-0.024714	both the CPU and the
-0.476352	the code cache and the
-0.233644	the level-2 cache and the
-0.370887	asmlib function library and the
-0.265283	power of 2 and the
-0.212391	multiple data elements and the
-0.817244	function is called and the
-0.314876	both the pointers and the
-0.265283	aligned by 32 and the
-0.300415	one source file and the
-0.314861	one is 0 and the
-0.304499	of physical processors and the
-0.245638	way two times and the
-0.245638	= 256 times and the
-0.451010	compiler for Windows and the
-0.212391	floating point calculations and the
-0.071793	by the processor and the
-0.071793	on the processor and the
-0.390449	or assembly language and the
-0.212391	arithmetic units, etc. and the
-0.294338	block is allocated and the
-0.212391	six integer parameters and the
-0.194095	The first count and the
-0.409387	maximum repeat count and the
-0.265283	by the microprocessor and the
-0.301926	with preceding branches and the
-0.212391	structured software development and the
-0.265283	child class name and the
-0.212391	for Unix applications and the
-0.265283	The .NET framework and the
-0.212391	clock cycles later and the
-0.212391	a and b, and the
-0.212391	relevant optimization options and the
-0.347612	the copy constructor and the
-0.157850	the library function, and the
-0.157850	the select function, and the
-0.284449	8 bytes smaller and the
-0.212391	removed the contentions and the
-0.294358	on all platforms and the
-0.212391	both the level-1 and the
-0.212391	for fast math and the
-0.265283	restrictions on alignment and the
-0.642461	the same thing and the
-0.403998	execute the program, and the
-0.212391	with the compiler, and the
-0.093339	or class declaration and the
-0.093339	extern "C" declaration and the
-0.212391	simply optimized away and the
-0.265283	an inlined 15.1b and the
-0.265283	biased binary integer, and the
-0.212391	the next vector, and the
-0.265283	by the linker and the
-0.294358	branches as possible, and the
-0.212391	Visual Basic .NET and the
-0.212391	parallelism is obvious and the
-0.212391	between the latency and the
-0.212391	lot of resources, and the
-0.265283	checking for overflow, and the
-0.489259	calculated in advance and the
-0.212391	with millisecond resolution and the
-0.212391	to four bits, and the
-0.212391	and then B, and the
-0.212391	operating system API and the
-0.265283	object file level, and the
-0.212391	to the parameter, and the
-0.212391	be the easiest and the
-0.265283	the program flow and the
-0.212391	only a hint and the
-0.212391	factor in itself, and the
-0.347612	bit, the exponent, and the
-0.212391	is deleted properly and the
-0.212391	CPU doesn't support, and the
-0.212391	be quite tedious and the
-0.212391	algebra and statistics, and the
-0.212391	or false (0); and the
-0.212391	resources are sufficient, and the
-0.212391	&&, ||, ! and the
-0.212391	compatibility, second source, and the
-0.212391	T+1 to T+6, and the
-0.212391	program is terminated and the
-0.212391	optimization. See www.agner.org/optimize and the
-0.212391	between each call, and the
-0.212391	the library libmmt.lib and the
-0.212391	the copying process, and the
-0.212391	leaving their workplace and the
-0.212391	compilers. See www.openmp.org and the
-0.212391	pre-increment operator ++i and the
-0.212391	thousand times lower; and the
-0.212391	AND operator (&) and the
-0.023208	likely to be in the
-0.023208	guaranteed to be in the
-0.047727	would still be in the
-0.277789	compiler manual or in the
-0.234842	to make it in the
-0.528868	If a function in the
-0.469066	time. The code in the
-0.280429	the compiler-generated code in the
-0.254392	it is not in the
-0.166418	interface is not in the
-0.287547	double to int in the
-0.292254	separate file than in the
-0.226189	by the compiler in the
-0.243671	takes longer time in the
-0.233161	processing the data in the
-0.233161	pointers to data in the
-0.127855	= *(++p) because in the
-0.127855	= array[++i] because in the
-0.210226	the different functions in the
-0.210226	Place non-polymorphic functions in the
-0.047727	in registers only in the
-0.047727	this option only in the
-0.047727	size comes only in the
-0.153707	near each other in the
-0.299416	a message loop in the
-0.129217	and is used in the
-0.129217	which is used in the
-0.129217	mode is used in the
-0.129217	feature is used in the
-0.138109	that are used in the
-0.147117	processors are used in the
-0.160989	or data used in the
-0.242527	using the integer in the
-0.266922	convert an integer in the
-0.127855	Friday is set in the
-0.127855	the same set in the
-0.162548	as an example in the
-0.226189	the register size in the
-0.226189	the data object in the
-0.162548	floating point number in the
-0.257809	of the value in the
-0.189721	four B value in the
-0.246031	all the objects in the
-0.246031	there are objects in the
-0.274527	of the variable in the
-0.203972	some other variable in the
-0.390979	a global variable in the
-0.050459	single precision variables in the
-0.186265	calculate the table in the
-0.127855	from a table in the
-0.209383	than one way in the
-0.264740	alias any elements in the
-0.264740	all subsequent elements in the
-0.464965	class are stored in the
-0.275678	have been stored in the
-0.275678	are usually stored in the
-0.253770	is usually called in the
-0.209383	the function address in the
-0.162548	a factor 4 in the
-0.207481	overflow. For example, in the
-0.207481	post-increment. For example, in the
-0.226189	the sign bit in the
-0.253770	the vector registers in the
-0.162548	code to test in the
-0.759871	can be useful in the
-0.162548	exception handling even in the
-0.209383	are primitive operations in the
-0.162548	than to type in the
-0.162548	of pending instructions in the
-0.253020	function is available in the
-0.165296	is also available in the
-0.165296	logical processors available in the
-0.165296	will become available in the
-0.162548	a minor error in the
-0.255785	or multiple times in the
-0.253770	the call stack in the
-0.835924	elements are accessed in the
-0.162548	been incremented, while in the
-0.209383	as static arrays in the
-0.008558	of function calls in the
-0.008558	and function calls in the
-0.008558	on function calls in the
-0.008558	nested function calls in the
-0.311078	4 unused bytes in the
-0.157687	running multiple threads in the
-0.321686	run two threads in the
-0.162548	can be necessary in the
-0.259137	a new element in the
-0.259137	for every element in the
-0.162548	hardware definition language in the
-0.209383	called function. But in the
-0.162548	be kept small in the
-0.162548	doesn't cause overflow in the
-0.073713	exception handling option in the
-0.073713	loop unroll option in the
-0.023208	that container classes in the
-0.023208	make container classes in the
-0.047727	multiple parent classes in the
-0.035295	optimization. This works in the
-0.035295	addresses. This works in the
-0.086439	classes, as explained in the
-0.086439	branches, as explained in the
-0.086439	use, as explained in the
-0.086439	linking, as explained in the
-0.116880	is further explained in the
-0.396168	class is implemented in the
-0.667656	can be implemented in the
-0.311078	gives an advantage in the
-0.162548	and profiling support in the
-0.328796	set is supported in the
-0.317373	services that run in the
-0.162548	implemented in hardware in the
-0.186265	insert the values in the
-0.127855	four G values in the
-0.226189	store the information in the
-0.162548	2 clock cycles in the
-0.121237	pointers and addresses in the
-0.121237	code. All addresses in the
-0.178703	generate relative addresses in the
-0.209383	time stamp counter in the
-0.035295	occupies a space in the
-0.035295	up more space in the
-0.035295	too much space in the
-0.035295	takes little space in the
-0.494330	explicit CPU dispatching in the
-0.162548	few files, preferably in the
-0.162548	that you see in the
-0.209383	as error handling in the
-0.162548	often used members in the
-0.162548	Using the methods in the
-0.162548	modifying the name in the
-0.162548	seconds remains zero in the
-0.156997	code is running in the
-0.156997	Two threads running in the
-0.156997	higher-priority thread running in the
-0.209383	the CPU dispatcher in the
-0.226189	See the examples in the
-0.028610	CPU dispatch mechanism in the
-0.073713	All newer microprocessors in the
-0.073713	operators Modern microprocessors in the
-0.162548	to use later in the
-0.256999	then linked together in the
-0.175152	preferably be declared in the
-0.240815	any objects declared in the
-0.162548	polymorphic function goes in the
-0.162548	disable power-save options in the
-0.162548	and Func2 were in the
-0.162548	few unused points in the
-0.256999	scattered randomly around in the
-0.040245	to cause contentions in the
-0.040245	and cause contentions in the
-0.040245	can cause contentions in the
-0.234842	pointers and references in the
-0.162548	no extra overhead in the
-0.162548	of a change in the
-0.162548	floating point-to-integer conversions in the
-0.209383	only one statement in the
-0.305474	code, as described in the
-0.153944	the cases described in the
-0.153944	the syntax described in the
-0.153944	are further described in the
-0.240118	the 4 lines in the
-0.162548	Each graphics operation in the
-0.240118	access, as given in the
-0.162548	instance of S1 in the
-0.209383	big registration database in the
-0.162548	All identical constants in the
-0.338595	handle text strings in the
-0.162548	to a macro in the
-0.162548	i with 100 in the
-0.209383	wheel. The containers in the
-0.162548	widely different priority in the
-0.162548	The function names in the
-0.209383	make the rows in the
-0.162548	example may fail in the
-0.162548	multiple data structures in the
-0.121237	overflow can occur in the
-0.027286	when contentions occur in the
-0.162548	can be improved in the
-0.280900	time-consumers are discussed in the
-0.017289	store forwarding delay in the
-0.073713	be saved either in the
-0.073713	memory blocks, either in the
-0.127855	a register except in the
-0.127855	the representation, except in the
-0.162548	r places back in the
-0.162548	same can happen in the
-0.209383	to go away in the
-0.300784	parsing are provided in the
-0.162548	Long dependency chains in the
-0.226189	the time-consumers mentioned in the
-0.178703	time is included in the
-0.121237	functions are included in the
-0.121237	is not included in the
-0.162548	prediction into account in the
-0.209383	flow and algorithms in the
-0.209383	four float additions in the
-0.162548	many unknown factors in the
-0.370705	functions are listed in the
-0.215014	set, as listed in the
-0.162548	time is interpreted in the
-0.017289	size causes misses in the
-0.162548	registers named YMM in the
-0.162548	available for free in the
-0.226189	that was saved in the
-0.162548	independent of changes in the
-0.162548	different execution units in the
-0.162548	insert the reciprocal in the
-0.073713	time is spent in the
-0.073713	the time spent in the
-0.209383	an exception occurs in the
-0.162548	the next step in the
-0.035295	any hot spots in the
-0.035295	identifying hot spots in the
-0.162548	at specific places in the
-0.162548	|| are evaluated in the
-0.073713	language is portable in the
-0.073713	is fully portable in the
-0.162548	how to recover in the
-0.162548	of the advice in the
-0.162548	value is already in the
-0.127855	should be seen in the
-0.127855	is not seen in the
-0.162548	index or key in the
-0.011449	which they appear in the
-0.047727	the modules appear in the
-0.162548	position-independent code flag in the
-0.162548	were not present in the
-0.162548	from one place in the
-0.162548	code is serial in the
-0.162548	to require modifications in the
-0.280900	that are missing in the
-0.162548	of different lengths in the
-0.035295	of 2. Contentions in the
-0.035295	target buffer. Contentions in the
-0.035295	buffer (BTB). Contentions in the
-0.035295	my experiments. Contentions in the
-0.073713	set a breakpoint in the
-0.073713	a fixed breakpoint in the
-0.162548	register that appears in the
-0.162548	the exception handler in the
-0.162548	last index changing in the
-0.209383	bit is kept in the
-0.162548	trying the techniques in the
-0.162548	and an FPGA in the
-0.209383	are stored consecutively in the
-0.162548	layers of abstraction in the
-0.162548	not need updating in the
-0.162548	own profiling instruments in the
-0.162548	is unnecessarily wasteful in the
-0.073713	The difference lies in the
-0.073713	this efficiency lies in the
-0.162548	unpredictable errors elsewhere in the
-0.209383	I have supplied in the
-0.162548	and cause delays in the
-0.162548	function uses logarithms in the
-0.162548	operating system kernel in the
-0.162548	linkage table (PLT) in the
-0.280900	platforms as shown in the
-0.162548	can be disabled in the
-0.073713	because the relocations in the
-0.073713	will generate relocations in the
-0.162548	declare it locally in the
-0.162548	find the answers in the
-0.162548	function is inserted in the
-0.162548	is not visible in the
-0.162548	are scattered everywhere in the
-0.162548	hints as pragmas in the
-0.162548	that runs alone in the
-0.162548	the biggest time-consumer in the
-0.017289	65 8 Optimizations in the
-0.162548	opportunities for parallelization in the
-0.162548	variables are overdetermined in the
-0.162548	card or integrated in the
-0.162548	for source annotation in the
-0.162548	was assigned previously in the
-0.162548	not necessarily stay in the
-0.162548	systems will dominate in the
-0.162548	conventions. The dot in the
-0.162548	has been alleviated in the
-0.162548	the right positions in the
-0.162548	function libraries. Numbers in the
-0.162548	cores will grow in the
-0.162548	and other flaws in the
-0.162548	memory is mirrored in the
-0.162548	vector classes Programming in the
-0.162548	dependency chain. Nothing in the
-0.162548	not stored contiguously in the
-0.162548	have inserted UnusedFiller in the
-0.162548	code (release version) in the
-0.162548	first and foremost, in the
-0.162548	Delays or glitches in the
-0.162548	on branch predictions in the
-0.162548	has occurred anywhere in the
-0.162548	to be resized in the
-0.263835	at all is for the
-0.263835	reflect this or for the
-0.237934	calling the function for the
-0.237934	the inlined function for the
-0.197161	and 64-bit code for the
-0.197161	and intermediate code for the
-0.197161	exactly identical code for the
-0.197161	can build code for the
-0.181423	to save time for the
-0.181423	that saves time for the
-0.273280	for prefetching data for the
-0.194866	down a program for the
-0.293579	with intrinsic functions for the
-0.492101	space is used for the
-0.229397	three times, one for the
-0.229397	two branches: one for the
-0.086594	levels of cache for the
-0.086594	an extra cache for the
-0.041142	the instruction set for the
-0.041142	supported instruction set for the
-0.194866	the vector size for the
-0.245539	a temporary object for the
-0.245539	a linear array for the
-0.105643	make it possible for the
-0.105643	is therefore possible for the
-0.105643	is rarely possible for the
-0.279047	32- bit version for the
-0.402006	point induction variables for the
-0.279047	definitely degrades performance for the
-0.194866	function is called for the
-0.273280	256-bit vector register for the
-0.245539	from using registers for the
-0.263835	eliminates the need for the
-0.194866	how to test for the
-0.468841	It is useful for the
-0.376603	appropriate header file for the
-0.285734	may reorder instructions for the
-0.437423	versions are available for the
-0.245539	It is important for the
-0.194866	are too large for the
-0.237528	that is compiled for the
-0.152597	file and compiled for the
-0.152597	necessary, each compiled for the
-0.214688	and programs compiled for the
-0.194866	is too big for the
-0.290817	map file" option for the
-0.038753	it is good for the
-0.527736	are highly optimized for the
-0.426248	how to check for the
-0.287869	must then check for the
-0.194866	most efficient solution for the
-0.268058	to make support for the
-0.605947	has hardware support for the
-0.263835	b with 1 for the
-0.245539	save some information for the
-0.245539	different source files for the
-0.194866	disadvantages mentioned above for the
-0.194866	the same space for the
-0.194866	in code caching for the
-0.456680	disable exception handling for the
-0.263835	the same name for the
-0.194866	often a disadvantage for the
-0.245539	makes no difference for the
-0.279047	is not needed for the
-0.013286	it is difficult for the
-0.006592	It is difficult for the
-0.041142	is more difficult for the
-0.245539	the available options for the
-0.194866	is most appropriate for the
-0.194866	function a constructor for the
-0.194866	it is relevant for the
-0.194866	call the destructor for the
-0.194866	or hide them for the
-0.437680	strict when compiling for the
-0.013286	makes it easier for the
-0.055771	this reordering easier for the
-0.194866	is exactly identical for the
-0.273280	the stack, except for the
-0.194866	The critical stride for the
-0.456680	is big enough for the
-0.086594	C++ is chosen for the
-0.086594	compiler has chosen for the
-0.194866	checking is included for the
-0.194866	are limiting factors for the
-0.194866	to perform poorly for the
-0.165304	has to wait for the
-0.194866	is rare. Testing for the
-0.147460	integer expressions (except for the
-0.147460	previous iteration (except for the
-0.194866	See page 51 for the
-0.456680	the compiler documentation for the
-0.194866	like square blocking for the
-0.194866	threads are competing for the
-0.020084	is not unusual for the
-0.245539	is best suited for the
-0.194866	is a proxy for the
-0.194866	time is consistent for the
-0.194866	background are unnecessary for the
-0.194866	is sufficiently accurate for the
-0.086594	stride will contend for the
-0.086594	dynamic libraries contend for the
-0.194866	of the subroutine for the
-0.194866	code, specific preferences for the
-0.194866	point division. Correction for the
-0.194866	and the FAQ for the
-0.194866	set is maintained for the
-0.194866	(see page 122) for the
-0.194866	in registers. Except for the
-0.194866	modification to compensate for the
-0.194866	will always compete for the
-0.194866	to the standards for the
-0.194866	can optimize specifically for the
-0.194866	{...} // Prototype for the
-0.194866	turn on correction for the
-0.572982	intermediate code is that the
-0.129037	Intel compiler is that the
-0.187617	for this is that the
-0.129037	the point is that the
-0.280498	instruction set is that the
-0.129037	this method is that the
-0.129037	definition language is that the
-0.129037	32-bit Linux is that the
-0.020803	The disadvantage is that the
-0.065662	Another disadvantage is that the
-0.129037	function parameter is that the
-0.180390	The reason is that the
-0.129037	static linking is that the
-0.187617	static here is that the
-0.187617	cache contentions is that the
-0.129037	function inlining is that the
-0.129037	lazy binding is that the
-0.280498	we notice is that the
-0.129037	with macros is that the
-0.129037	The consequence is that the
-0.129037	an assumption is that the
-0.280498	The conclusion is that the
-0.129037	this argument is that the
-0.265976	or 1 and that the
-0.475368	check the code that the
-0.427364	tell the compiler that the
-0.250807	tells the compiler that the
-0.233424	insert an instruction that the
-0.260378	about the class that the
-0.264266	compile time so that the
-0.195234	sign bit so that the
-0.195234	through pointers so that the
-0.195234	the calculations so that the
-0.195234	32-bit mode so that the
-0.195234	the start so that the
-0.195234	are inlined so that the
-0.195234	loop unrolling so that the
-0.041207	be organized so that the
-0.195234	an integer, so that the
-0.195234	if possible, so that the
-0.195234	more compact so that the
-0.195234	value 0x2C so that the
-0.184070	is so long that the
-0.228229	never be sure that the
-0.331381	you are sure that the
-0.299030	to make sure that the
-0.201721	must make sure that the
-0.201721	Therefore, make sure that the
-0.039427	This makes sure that the
-0.130978	reference makes sure that the
-0.082355	in the case that the
-0.082355	the likely case that the
-0.123894	It is important that the
-0.233424	is other work that the
-0.309389	order to avoid that the
-0.233424	a runtime check that the
-0.233424	avoid the problem that the
-0.437150	has the advantage that the
-0.465934	it is likely that the
-0.184070	we can calculate that the
-0.184070	block to copy that the
-0.278395	is quite certain that the
-0.309389	are so fast that the
-0.260378	involves the problems that the
-0.031189	compiler can see that the
-0.309389	any memory block that the
-0.184070	an arbitrary name that the
-0.005593	has the disadvantage that the
-0.468588	cases. This means that the
-0.279075	non-member function, means that the
-0.275756	but it requires that the
-0.202342	problem and assume that the
-0.502485	You can assume that the
-0.042445	can generally assume that the
-0.202342	can safely assume that the
-0.251206	the special feature that the
-0.107688	vector operations require that the
-0.107688	these instructions require that the
-0.107688	Some profilers require that the
-0.107688	and MOVNTDQ require that the
-0.271393	couple of things that the
-0.184070	All the reductions that the
-0.018443	way. The fact that the
-0.018443	underflow. The fact that the
-0.018443	bit. The fact that the
-0.018443	20. The fact that the
-0.291181	} } Assume that the
-0.143942	of the possibility that the
-0.227025	out the possibility that the
-0.184070	from this discussion that the
-0.117262	desired version. Note that the
-0.117262	Windows system. Note that the
-0.117262	character arrays. Note that the
-0.117262	for details. Note that the
-0.117262	less optimized. Note that the
-0.117262	file disassembler. Note that the
-0.184070	may occasionally predict that the
-0.184070	actual clock frequency that the
-0.184070	we must consider that the
-0.184070	the time delay that the
-0.233424	a higher risk that the
-0.082355	member function, provided that the
-0.082355	use branches, provided that the
-0.412040	in the sense that the
-0.437150	you will notice that the
-0.184070	can be expected that the
-0.184070	They can detect that the
-0.184070	can roughly estimate that the
-0.184070	can be said that the
-0.309389	on the assumption that the
-0.123373	compiler will recognize that the
-0.123373	compilers will recognize that the
-0.233424	we are assuming that the
-0.184070	a 90% chance that the
-0.309389	expected. I believe that the
-0.233424	= C; Assuming that the
-0.184070	known with certainty that the
-0.233424	C standard says that the
-0.184070	the programmer forgets that the
-0.184070	may seem illogical that the
-0.184070	It is assumed that the
-0.184070	must be emphasized that the
-0.184070	has the complication that the
-0.184070	it is unlikely that the
-0.184070	or from knowing that the
-0.450642	expect this to be the
-1.250166	is likely to be the
-0.458718	critical function may be the
-0.378959	would of course be the
-0.704935	a and b are the
-0.234563	to this problem are the
-0.234563	only allowed inputs are the
-0.234563	rules of algebra are the
-0.234563	The storage principles are the
-0.230185	in a vector or the
-0.396755	before the loop or the
-0.419931	of an array or the
-0.396755	randomly one way or the
-0.308606	#pragma vector aligned or the
-0.230185	whether the positive or the
-0.285435	network is overloaded or the
-0.230185	code if possible, or the
-0.285435	pointer or reference, or the
-0.230185	sorting and searching, or the
-0.237025	than to access it the
-0.284140	code is that if the
-0.172615	be predicted or if the
-0.172615	side effects or if the
-0.172615	memory blocks, or if the
-0.172615	is correct or if the
-0.172615	is unstable or if the
-0.172615	this initialization, or if the
-0.255719	inlining a function if the
-0.212130	n. But not if the
-0.115871	are variables than if the
-0.115871	binary form than if the
-0.143242	from the compiler if the
-0.203913	of the memory if the
-0.187933	virtual member functions if the
-0.172583	thread, and only if the
-0.172583	are possible only if the
-0.237829	14.13b works only if the
-0.143242	a blend instruction if the
-0.143242	to floating point if the
-0.280919	after the loop if the
-0.187958	is no loop if the
-0.397733	still be used if the
-0.143242	joined into one if the
-0.274458	of an integer if the
-0.143242	rather than double if the
-0.110598	compact and efficient if the
-0.166580	are most efficient if the
-0.110598	is less efficient if the
-0.115871	is not possible if the
-0.261954	is only possible if the
-0.051817	power of 2 if the
-0.065742	improve the performance if the
-0.065742	improvement in performance if the
-0.187933	a single branch if the
-0.143242	more efficient way if the
-0.070563	method is faster if the
-0.070563	access is faster if the
-0.033850	constant is faster if the
-0.143368	vector goes faster if the
-0.328936	factor. For example, if the
-0.328936	frequency. For example, if the
-0.143242	use 64-bit systems if the
-0.065742	method is useful if the
-0.065742	may be useful if the
-0.113497	clock cycles even if the
-0.113497	an Intel, even if the
-0.113497	be called, even if the
-0.113497	starts up, even if the
-0.113497	more resources, even if the
-0.113497	program execution, even if the
-0.143242	protected operating system if the
-0.187933	registers are available if the
-0.143242	be filled up if the
-0.143242	detect an error if the
-0.187933	contemporary 106 CPUs if the
-0.143242	vectorization works best if the
-0.187933	This is necessary if the
-0.143242	an array element if the
-0.129370	was programmed. But if the
-0.129370	the delay. But if the
-0.129370	cache miss. But if the
-0.143242	actually reduce speed if the
-0.476734	a separate thread if the
-0.143242	use 64-bit integers if the
-0.280919	need to check if the
-0.187958	as input check if the
-0.031635	function is advantageous if the
-0.031635	method is advantageous if the
-0.115916	may be advantageous if the
-0.065757	is more advantageous if the
-0.065742	is no problem if the
-0.065742	very big problem if the
-0.143242	not an advantage if the
-0.187933	64 bit mode if the
-0.187933	usually predicted well if the
-0.187933	is very fast if the
-0.065742	compilers without problems if the
-0.065742	can cause problems if the
-0.325181	listing to see if the
-0.187933	the software implementation if the
-0.143242	little more complicated if the
-0.143242	the above methods if the
-0.255719	at a disadvantage if the
-0.143242	a const reference if the
-0.143242	a table lookup if the
-0.015526	is not needed if the
-0.143242	by using vectors if the
-0.143242	give inconsistent results if the
-0.143242	swap the operands if the
-0.143242	at runtime here if the
-0.031628	be cache contentions if the
-0.031628	cause cache contentions if the
-0.143242	address is predicted if the
-0.187933	can cause errors if the
-0.187933	be very inefficient if the
-0.143242	for Linux platforms if the
-0.255719	can be vectorized if the
-0.143242	is usually inlined if the
-0.143242	the loop further if the
-0.143242	the hard disk if the
-0.143242	performance is obtained if the
-0.212130	works less efficiently if the
-0.143242	specific CPU models if the
-0.143287	likely to fail if the
-0.226233	trick will fail if the
-0.143242	predict the target if the
-0.083879	a program, especially if the
-0.083879	32-bit systems, especially if the
-0.083879	the file, especially if the
-0.143242	need the updates if the
-0.403591	you may consider if the
-0.143242	the function directly if the
-0.187933	with the loops if the
-0.065742	errors can happen if the
-0.065742	This will happen if the
-0.143242	it doesn't matter if the
-0.143242	function is pure if the
-0.143242	one clock cycle if the
-0.143242	are more frequent if the
-0.143242	be needed, however, if the
-0.143242	second operand. Likewise, if the
-0.143242	is more compact if the
-0.143242	omitted, of course, if the
-0.143242	is more complex if the
-0.143242	CPU dispatching. Test if the
-0.143242	This typically happens if the
-0.143242	can be permissible if the
-0.143242	do so (i.e. if the
-0.006832	can be eliminated if the
-0.013773	also be eliminated if the
-0.143242	code is selected if the
-0.143242	cause severe delays if the
-0.143242	template library (STL) if the
-0.143242	Windows) to determine if the
-0.143242	cause branch mispredictions if the
-0.143242	not call WriteFile if the
-0.143242	256 bits (YMM) if the
-0.143242	extending the sign-bit if the
-0.143242	cannot be ignored if the
-0.143242	128 bits (XMM) if the
-0.143242	than a minute if the
-0.143242	cost is minimized if the
-0.263059	clock period and by the
-0.186317	operating system, not by the
-0.178574	code rather than by the
-0.178574	units rather than by the
-0.178574	tools, rather than by the
-0.235943	(rebased) once more by the
-0.828791	out the loop by the
-0.364111	data memory used by the
-0.186317	for information stored by the
-0.142330	which is called by the
-0.202864	library functions called by the
-0.235943	particular memory address by the
-0.186317	a function call by the
-0.253831	be ruled out by the
-0.235943	objects and arrays by the
-0.292494	Relocation is done by the
-0.197278	standardized and done by the
-0.197278	not necessarily done by the
-0.819601	can be calculated by the
-0.235943	is typically implemented by the
-0.128701	set is supported by the
-0.105182	Linux and supported by the
-0.100122	functions are supported by the
-0.100122	registers are supported by the
-0.049414	available if supported by the
-0.049414	__restrict__, if supported by the
-0.186317	often inlined automatically by the
-0.186317	is actually needed by the
-0.401435	to be aligned by the
-0.198980	that is divisible by the
-0.097395	which is divisible by the
-0.045973	count is divisible by the
-0.201028	to be divisible by the
-0.031142	is not divisible by the
-0.580278	an address divisible by the
-0.299095	to addresses divisible by the
-0.202571	memory addresses divisible by the
-0.387566	function is replaced by the
-0.478455	will be replaced by the
-0.186317	to be predicted by the
-0.202864	performance is limited by the
-0.142330	to be limited by the
-0.263059	sometimes be obtained by the
-0.186317	call to square by the
-0.186317	14.7b is converted by the
-0.186317	just two additions by the
-0.061870	time is determined by the
-0.061870	slices is determined by the
-0.193351	cases be determined by the
-0.009574	the code generated by the
-0.039630	Object files generated by the
-0.039630	the comments generated by the
-0.186317	can be illustrated by the
-0.102142	value is multiplied by the
-0.048075	should be multiplied by the
-0.048075	must be multiplied by the
-0.012812	may be modified by the
-0.053686	are never modified by the
-0.186317	be done manually by the
-0.186317	occurrences of ArraySize by the
-0.186317	DLLs are relocated by the
-0.186317	compose a bitfield by the
-0.186317	tested and investigated by the
-0.186317	exception is caught by the
-0.186317	a place indicated by the
-0.186317	is obviously influenced by the
-0.186317	only when activated by the
-0.253851	and delete or with the
-0.263079	or replace it with the
-0.187232	making another function with the
-0.187232	a pure function with the
-0.263079	to each compiler with the
-0.364137	need a CPU with the
-0.043120	use this library with the
-0.186334	class member variable with the
-0.039633	out multiple bits with the
-0.039633	toggle multiple bits with the
-0.268712	The first processors with the
-0.053691	we end up with the
-0.026016	to keep up with the
-0.026016	always keep up with the
-0.333831	should be accessed with the
-0.292657	are normally compiled with the
-0.210756	number of threads with the
-0.210756	no more threads with the
-0.299475	is to compile with the
-0.142340	when you compile with the
-0.186334	trap integer overflow with the
-0.253851	as 8-bit integers with the
-0.273238	allocation is done with the
-0.365880	can be done with the
-0.235962	are always calculated with the
-0.268712	most serious problem with the
-0.186334	of different types with the
-0.235962	for variables declared with the
-0.235962	-fno-pic and link with the
-0.186334	(or eight) points with the
-0.235962	some funny things with the
-0.527311	are not compatible with the
-0.277853	library that comes with the
-0.235962	cannot be vectorized with the
-0.193361	performance is obtained with the
-0.029836	can be obtained with the
-0.186334	value of N with the
-0.186334	point representation directly with the
-0.124247	elements that come with the
-0.124247	mathimf.h that come with the
-0.186334	errors can happen with the
-0.186334	The libraries included with the
-0.186334	in vector c2 with the
-0.186334	make a DLL with the
-0.102149	done by multiplying with the
-0.048078	double before multiplying with the
-0.048078	precision before multiplying with the
-0.186334	in vector bc with the
-0.186334	compile them separately with the
-0.438534	bb[i]*cc[i] is AND'ed with the
-0.186334	Clang compiler combined with the
-0.186334	the PLT entry with the
-0.186334	make some tests with the
-0.253851	are not satisfied with the
-0.142340	Windows. Gnu Comes with the
-0.142340	/ Embarcadero Comes with the
-0.186334	in duration compared with the
-0.235962	cache control Microprocessors with the
-0.186334	are often conflicting with the
-0.186334	or multiple configurations with the
-0.186334	have been unsatisfied with the
-0.186334	example 12.4b, rewritten with the
-0.186334	or array coincides with the
-0.186334	is dividing repeatedly with the
-0.186334	have spent fighting with the
-0.009664	register rather than on the
-0.002396	registers rather than on the
-0.050625	user interface than on the
-0.164180	the Gnu compiler on the
-0.258621	more CPU time on the
-0.164180	that allocates memory on the
-0.164180	a speed-critical program on the
-0.341967	value depends only on the
-0.211201	an inferior version on the
-0.164180	with other objects on the
-0.023405	it is stored on the
-0.023405	will be stored on the
-0.003814	function are stored on the
-0.007662	parameters are stored on the
-0.023405	26). Variables stored on the
-0.228078	other virtual processors on the
-0.093010	Gnu directives work on the
-0.093010	Microsoft directives work on the
-0.122126	finished the calculations on the
-0.122126	to do calculations on the
-0.122126	to start calculations on the
-0.512749	that works best on the
-0.164180	depends very much on the
-0.211201	of possible overflow on the
-0.164180	preferably be done on the
-0.211201	Storing the parameters on the
-0.164180	solution is optimal on the
-0.164180	amount of space on the
-0.286196	libraries when running on the
-0.192213	other processes running on the
-0.035598	parameters are transferred on the
-0.035598	r are transferred on the
-0.211201	prevents all optimizations on the
-0.164180	than rendering graphics on the
-0.164180	to keep together on the
-0.164180	replaced by storage on the
-0.276876	manual is based on the
-0.338865	recommendations are based on the
-0.205968	will go based on the
-0.205968	be chosen based on the
-0.211201	etc. scattered around on the
-0.071914	each vector depends on the
-0.071914	each value depends on the
-0.201595	control branch depends on the
-0.016893	each calculation depends on the
-0.071914	final application depends on the
-0.071914	each addition depends on the
-0.071914	be predicted depends on the
-0.071914	of sum depends on the
-0.071914	The gain depends on the
-0.164180	are fully compatible on the
-0.065614	various ways depending on the
-0.065614	frequency dynamically depending on the
-0.002443	clock cycles, depending on the
-0.065614	32-bit integers, depending on the
-0.065614	and 64, depending on the
-0.065614	or four, depending on the
-0.065614	several meanings depending on the
-0.065614	conditional move, depending on the
-0.164180	new model comes on the
-0.084979	compilers that rely on the
-0.084979	optimizations that rely on the
-0.284359	You cannot rely on the
-0.190733	cannot always rely on the
-0.158897	hardly any effect on the
-0.158897	very dramatic effect on the
-0.164180	dependency chain, especially on the
-0.164180	which is 15 on the
-0.142774	iteration should depend on the
-0.142774	workaround methods depend on the
-0.142774	hardware-related details depend on the
-0.164180	A negative list, on the
-0.164180	necessary to compromise on the
-0.242069	the MKL relies on the
-0.164180	better processor appears on the
-0.211201	only 5 μs on the
-0.164180	is more focus on the
-0.164180	various discussion forums on the
-0.164180	compilation or interpretation on the
-0.164180	This has influence on the
-0.164180	the optimization efforts on the
-0.164180	for discussions. Turn on the
-0.164180	and objects. Storage on the
-0.164180	ebx is pushed on the
-0.164180	clock cycles (depending on the
-0.164180	we are relying on the
-0.236621	ReadB needs to code the
-0.274712	the same time as the
-0.751695	is the same as the
-0.537615	does the same as the
-0.316021	Good compilers such as the
-0.316021	also available, such as the
-0.316021	certain events, such as the
-0.518434	time as long as the
-0.499761	fraction is stored as the
-0.201082	optimized as good as the
-0.201082	optimize as good as the
-0.220729	the same precision as the
-0.502463	program as well as the
-0.294308	the source code, as the
-0.220729	the same features as the
-0.220729	language is chosen as the
-0.274712	invalid as soon as the
-0.220729	exact time consumption as the
-0.220729	becoming increasingly blurred as the
-0.220729	the same directory as the
-0.833746	but it is not the
-0.536401	as this is not the
-0.343084	the debugger is not the
-0.306149	compile- time, but not the
-0.306149	more complex, but not the
-0.231281	follow the rows, not the
-0.482505	takes more time than the
-0.482505	take more time than the
-0.287766	is no more than the
-0.287766	often much more than the
-0.186512	to the CPU than the
-0.263292	a library other than the
-0.186512	higher instruction set than the
-0.917833	is more efficient than the
-0.762928	is less efficient than the
-0.400941	function is faster than the
-0.118545	implementation is faster than the
-0.282125	(This is faster than the
-0.223757	is often faster than the
-0.223757	calculated much faster than the
-0.223757	is increasing faster than the
-0.230856	function is less than the
-0.230856	delay is less than the
-0.249759	to be less than the
-0.162631	fact be less than the
-0.345060	constant 8 rather than the
-0.263170	the stack rather than the
-0.263170	each factor rather than the
-0.263170	single step rather than the
-0.263170	of xxn rather than the
-0.263170	&& !b) rather than the
-0.263170	that matters rather than the
-0.263170	running at, rather than the
-0.186512	cache control instructions than the
-0.186512	faster to calculate than the
-0.145601	use more resources than the
-0.066727	much more resources than the
-0.334081	version is better than the
-0.236162	is usually higher than the
-0.034770	program is bigger than the
-0.034770	matrix is bigger than the
-0.034770	parameter is bigger than the
-0.113972	may be bigger than the
-0.113972	that are bigger than the
-0.113972	innermost loop bigger than the
-0.186512	in other modules than the
-0.186512	execution units smaller than the
-0.186512	therefore more safe than the
-0.142447	the same priority than the
-0.142447	with lower priority than the
-0.282704	to be slower than the
-0.312651	is more predictable than the
-0.186512	that is larger than the
-0.186512	do other input/output than the
-0.186512	larger memory footprint than the
-0.282589	are sure to have the
-0.282589	be convenient to have the
-0.268896	it will not have the
-0.410722	that do not have the
-0.309803	not all libraries have the
-0.548260	64 bit systems have the
-0.413332	because it doesn't have the
-0.618766	the compiler doesn't have the
-0.334846	I simply don't have the
-0.225140	However, these languages have the
-0.121800	library. Add to this the
-0.121800	dispatching. Add to this the
-0.233395	Add to 122 this the
-0.981259	unknown at the time the
-0.425406	popular at the time the
-0.322507	cycles before the time the
-0.027178	rather than each time the
-0.120700	the value each time the
-0.120700	for updates each time the
-0.228247	only the first time the
-0.228247	until the first time the
-0.189425	is done every time the
-0.189425	the list every time the
-0.189425	of branches every time the
-0.189425	be loaded every time the
-0.189425	for updates every time the
-0.189425	a misprediction every time the
-0.269098	until the next time the
-0.352228	same as last time the
-0.279964	performance is to use the
-0.279964	recommendation is to use the
-0.197369	allows you to use the
-0.197369	compiler has to use the
-0.158623	is possible to use the
-0.244866	it possible to use the
-0.266769	often faster to use the
-0.199047	for how to use the
-0.199047	shows how to use the
-0.197369	some cases to use the
-0.197369	may want to use the
-0.249350	is advantageous to use the
-0.337956	not advantageous to use the
-0.197369	is likely to use the
-0.389900	is recommended to use the
-0.231660	not recommended to use the
-0.266769	be preferred to use the
-0.197369	may prefer to use the
-0.184810	eliminate i and use the
-0.184810	one local, and use the
-0.198516	on instructions that use the
-0.198516	all modules that use the
-0.077776	that it can use the
-0.077776	called, it can use the
-0.220650	The compiler can use the
-0.220650	optimizing compiler can use the
-0.204013	Windows, you can use the
-0.204013	Alternatively, you can use the
-0.237834	test. You can use the
-0.220456	instruction directly, or use the
-0.277980	It will not use the
-0.277980	libraries do not use the
-0.362340	optimizations. Do not use the
-0.278746	or you may use the
-0.215696	efficient. You may use the
-0.215696	precision. You may use the
-0.184810	The loop will use the
-0.252070	Most compilers will use the
-0.172476	If you do use the
-0.077731	The best compilers use the
-0.077731	// (Some compilers use the
-0.220456	that they cannot use the
-0.237704	is to always use the
-0.220456	when several applications use the
-0.220456	Mac systems normally use the
-0.172476	square brackets mean use the
-0.172476	on Intel CPUs: use the
-0.172476	+ 2 thenaandbcannot use the
-0.172476	and multiplications. Subtractions use the
-0.011045	64-bit mode or when the
-0.066243	the power function when the
-0.066243	the pow function when the
-0.189260	in vectorized code when the
-0.111264	virtualization. The time when the
-0.111264	a long time when the
-0.111264	no extra time when the
-0.189260	loaded into memory when the
-0.144440	a big program when the
-0.246694	register size only when the
-0.205291	is also used when the
-0.144440	simply put there when the
-0.213533	is less efficient when the
-0.189260	will be faster when the
-0.066243	destructor is called when the
-0.066243	must be called when the
-0.144440	in 32-bit systems when the
-0.144185	This is useful when the
-0.144185	can be useful when the
-0.252367	are needed even when the
-0.144440	and 512 bits when the
-0.257271	using vector operations when the
-0.144440	in most cases when the
-0.144440	what you want when the
-0.144440	It is best when the
-0.144440	preferred programming language when the
-0.144440	i<n; ++i). But when the
-0.189260	transpose the matrix when the
-0.031860	and double precision when the
-0.031860	than double precision when the
-0.189260	have this problem when the
-0.144440	as loop counter when the
-0.144440	load several files when the
-0.144440	dynamic memory allocation when the
-0.144440	of CPU-intensive programs when the
-0.144440	schemes cause problems when the
-0.066243	it goes automatically when the
-0.066243	or update automatically when the
-0.144440	is a disadvantage when the
-0.144440	assembly language modules when the
-0.276079	size is relevant when the
-0.015638	be allocated dynamically when the
-0.144440	comparisons are inefficient when the
-0.144440	for this task when the
-0.144440	efficiency is obtained when the
-0.238822	works most efficiently when the
-0.020978	that is initialized when the
-0.020978	table is initialized when the
-0.043021	to be initialized when the
-0.144440	code slower, especially when the
-0.238822	becomes more fragmented when the
-0.144440	and mouse inputs when the
-0.066243	library is resolved when the
-0.066243	is not resolved when the
-0.144440	make an update when the
-0.144440	start garbage collection when the
-0.144440	slower than truncation when the
-0.144440	at different places when the
-0.116621	it is deallocated when the
-0.116621	they are deallocated when the
-0.144440	frequency is increased when the
-0.144440	to is deleted when the
-0.144440	allocation is negligible when the
-0.144440	space is freed when the
-0.144440	can be bypassed when the
-0.144440	floating point precisions when the
-0.144440	or four float's when the
-0.144440	on the processor) when the
-0.144440	more than 33% when the
-0.144440	the memory released when the
-0.144440	high and decreased when the
-0.174018	another dispatched function then the
-0.263181	a short time then the
-0.174018	bytes or more then the
-0.248422	to frame functions then the
-0.222178	can be set then the
-0.121279	power of 2 then the
-0.174018	not enough registers then the
-0.174018	is the case then the
-0.222178	a thousand times then the
-0.174018	used cache line then the
-0.222178	as template parameters then the
-0.174018	different set values then the
-0.174018	of these methods then the
-0.174018	objects is high then the
-0.174018	the innermost function, then the
-0.263181	in this range then the
-0.222178	j as index then the
-0.078351	at a time, then the
-0.078351	at any time, then the
-0.174018	pointer has changed then the
-0.296025	time to execute then the
-0.239496	the same module then the
-0.174018	are equally near then the
-0.174018	more than once then the
-0.037411	the shared object, then the
-0.037411	a shared object, then the
-0.174018	are 32-bit integers, then the
-0.174018	the carry flag then the
-0.174018	|| is true, then the
-0.174018	into the pipeline then the
-0.174018	small and changing then the
-0.174018	A is slow, then the
-0.174018	&& is false, then the
-0.222178	the second sum, then the
-0.222178	a 64-bit double, then the
-0.174018	is not vacant then the
-0.174018	with different priorities then the
-0.174018	is 2 GHz then the
-0.174018	compile-time while loops, then the
-0.174018	in chapter 9.10, then the
-0.174018	are accessed row-wise, then the
-0.174018	important to ignore, then the
-0.174018	it writes only, then the
-0.174018	i = 18, then the
-0.227851	a single function from the
-0.227851	previous value than from the
-0.518250	invoking the compiler from the
-0.227851	the level-1 cache from the
-0.201697	subtract this value from the
-0.201697	calculate each value from the
-0.323694	is to return from the
-0.311035	that is called from the
-0.212106	class are called from the
-0.311035	also when called from the
-0.179092	address. A call from the
-0.179092	a map file from the
-0.285635	is also available from the
-0.254451	even when accessed from the
-0.139024	value is calculated from the
-0.070037	xn is calculated from the
-0.070037	n! is calculated from the
-0.179092	different instruction sets from the
-0.179092	by subtracting n from the
-0.179092	transferred at runtime from the
-0.179092	that are needed from the
-0.259972	to 99 read from the
-0.179092	example 9.5a goes from the
-0.179092	array size right from the
-0.179092	reading and writing from the
-0.179092	calculated more efficiently from the
-0.245402	user is far from the
-0.259972	memory will benefit from the
-0.179092	factors are generated from the
-0.179092	beginning. ret returns from the
-0.179092	pointer it gets from the
-0.179092	static is removed from the
-0.179092	is not separated from the
-0.179092	deallocated when returning from the
-0.179092	to be evicted from the
-0.179092	get no warning from the
-0.179092	can be fetched from the
-0.179092	used and popped from the
-0.179092	You may deviate from the
-0.224983	as reflecting it at the
-0.176527	to stack memory at the
-0.030298	are never used at the
-0.176527	with other compilers at the
-0.176527	write the variable at the
-0.182291	diagonal. The elements at the
-0.182291	add dummy elements at the
-0.176527	applications run faster at the
-0.330888	inlining is done at the
-0.176527	counting clock cycles at the
-0.176527	8. Avoid branches at the
-0.176527	floating point multiplication at the
-0.176527	than its name at the
-0.176527	seconds to zero at the
-0.176527	Avoid table lookup at the
-0.083961	You may look at the
-0.026224	if you look at the
-0.026224	If you look at the
-0.026224	When you look at the
-0.083961	may also look at the
-0.136403	code. Let's look at the
-0.083961	example, let's look at the
-0.176527	all installation options at the
-0.176527	do multiple things at the
-0.024875	that was unknown at the
-0.000968	that were unknown at the
-0.176527	than one thing at the
-0.176527	cache, at least at the
-0.176527	CPU, which counts at the
-0.176527	the same DLL at the
-0.176527	it will break at the
-0.176527	was less popular at the
-0.176527	loop body begins at the
-0.176527	have been lost at the
-0.176527	stupid things. Looking at the
-0.440771	safe if it has the
-0.298381	function, but it has the
-0.041119	the function. This has the
-0.194733	is called. This has the
-0.194733	is executed. This has the
-0.211257	public CParent<CChild1> { has the
-0.519543	a pointer. It has the
-0.211257	the variable always has the
-0.613110	that the microprocessor has the
-0.211257	variable in main has the
-0.211257	copy Function inlining has the
-0.211257	compiled as position-independent has the
-0.211257	overflow doesn't occur has the
-0.211257	the main executable has the
-0.211257	and shared_ptr. auto_ptr has the
-0.375939	problem is to make the
-0.529022	solution is to make the
-0.236251	can do to make the
-0.236251	the array to make the
-0.345603	is possible to make the
-0.236251	fourth value to make the
-0.201595	in order to make the
-0.560103	shows how to make the
-0.441676	be useful to make the
-0.733857	you want to make the
-0.236251	be advantageous to make the
-0.236251	better solution to make the
-0.236251	ingenious things to make the
-0.236251	you forget to make the
-0.236251	I tried to make the
-0.236251	also tends to make the
-0.257817	the problem and make the
-0.257817	the conversions and make the
-0.267140	long does not make the
-0.223096	and this will make the
-0.223096	integer overflow will make the
-0.223096	= b++; will make the
-0.082736	two loops would make the
-0.082736	dependency chain would make the
-0.185033	may of course make the
-0.185033	compile time. Templates make the
-0.185033	-Wstrict-overflow=2, or (5) make the
-0.180143	time. This is because the
-0.180143	size. This is because the
-0.180143	returns. This is because the
-0.141510	This may be because the
-0.141510	the program or because the
-0.177568	the simple function because the
-0.177568	a frame function because the
-0.072331	the execution time because the
-0.072331	total execution time because the
-0.141510	to store data because the
-0.141510	class member functions because the
-0.141510	evaluated at all because the
-0.141510	the level-2 cache because the
-0.141510	is an integer because the
-0.141510	a contained object because the
-0.159165	Templates are efficient because the
-0.159165	be less efficient because the
-0.141510	other optimizations possible because the
-0.141510	the optimized version because the
-0.053612	reduce the performance because the
-0.053612	reducing the performance because the
-0.141510	for 32-bit software because the
-0.141510	loop is long because the
-0.201922	method is faster because the
-0.141510	happens quite often because the
-0.141510	the unit- test because the
-0.141510	soon became available because the
-0.201922	a hundred times because the
-0.141510	list is large because the
-0.141510	extra function calls because the
-0.141510	the correct result because the
-0.141510	is not necessary because the
-0.141510	less than 128 because the
-0.186013	an optimal solution because the
-0.004496	in 64-bit mode because the
-0.141510	in interactive programs because the
-0.141510	in a microprocessor because the
-0.141510	will be better because the
-0.141510	for critical applications because the
-0.141510	would be needed because the
-0.141510	of simple types because the
-0.141510	an uncached read because the
-0.141510	the allocation process because the
-0.186013	of the operands because the
-0.186013	on n here because the
-0.224660	other is inefficient because the
-0.141987	is very inefficient because the
-0.141510	is not copied because the
-0.141510	compiled without -fpic because the
-0.141510	problem only occurs because the
-0.141510	in column 28 because the
-0.186013	is calculated twice because the
-0.141510	with option -fpie because the
-0.141510	i but i*12, because the
-0.141510	partial flags stall because the
-0.141510	is not evaluated, because the
-0.141510	same cache line, because the
-0.302259	15.1c automatically, and only the
-0.213897	make dispatcher in only the
-0.303314	for CPUs with only the
-0.213897	can rely on only the
-0.311219	profiler measures not only the
-0.286227	and then use only the
-0.213897	hyperthreading by using only the
-0.213897	it is initialized only the
-0.213897	Static linking includes only the
-0.213897	set and insert only the
-0.213897	On other processors, only the
-0.213897	processes simultaneously. Actually, only the
-0.213897	and it understands only the
-0.158662	the error code. If the
-0.158662	the simplest code. If the
-0.131065	the virtual function. If the
-0.189939	loaded into memory. If the
-0.131065	can be used. If the
-0.060607	the data cache. If the
-0.060607	a level-3 cache. If the
-0.174456	on some systems. If the
-0.174456	is not efficient. If the
-0.163863	desired instruction set. If the
-0.108209	in each set. If the
-0.131065	many function calls. If the
-0.131065	to the object. If the
-0.174456	different function library. If the
-0.131065	for test purposes. If the
-0.131065	the following way. If the
-0.174456	in the CPU. If the
-0.174456	becomes a problem. If the
-0.131065	non- sequential order. If the
-0.131065	statement was executed. If the
-0.131065	another source file. If the
-0.131065	in a register. If the
-0.174456	the above table. If the
-0.060607	or threads simultaneously. If the
-0.060607	or seemingly simultaneously. If the
-0.174456	8-bit signed number. If the
-0.131065	large or constant. If the
-0.131065	the data members. If the
-0.131065	difficult to maintain. If the
-0.131065	with lower priority. If the
-0.131065	Common subexpression elimination If the
-0.131065	will work better. If the
-0.131065	variable is declared. If the
-0.131065	pool. See www.agner.org/optimize/cppexamples.zip. If the
-0.131065	and an addition. If the
-0.131065	bytes of code). If the
-0.131065	in chapter 12. If the
-0.131065	or too long. If the
-0.131065	exactly the same. If the
-0.131065	cores is slow. If the
-0.131065	from set 0x1C. If the
-0.131065	*(p++) |= 0x20; If the
-0.131065	and deleting containers. If the
-0.131065	than 250 ms. If the
-0.131065	(see page 105). If the
-0.131065	at compile time? If the
-0.131065	the same class). If the
-0.131065	given above. 7. If the
-0.131065	on page 62. If the
-0.131065	software was coded. If the
-0.131065	by a key? If the
-0.131065	is more complicated. If the
-0.131065	using nontemporal writes. If the
-0.131065	data for analysis. If the
-0.131065	or multiple elements? If the
-0.131065	pointers or references: If the
-0.131065	into the pipeline. If the
-0.131065	element is stored? If the
-0.131065	sum1 += sum2; If the
-0.131065	already been allocated. If the
-0.658819	the function in which the
-0.184500	of code in which the
-0.313123	the order in which the
-0.184500	{} brackets in which the
-0.223891	an error code which the
-0.268886	get rid of all the
-0.175308	executable file and all the
-0.175308	if statement and all the
-0.268886	monitor counters in all the
-0.257818	of memory for all the
-0.257818	to check for all the
-0.298243	This means that all the
-0.156791	the loop if all the
-0.156791	cycles. But if all the
-0.156791	at runtime if all the
-0.261426	the data with all the
-0.261426	command line with all the
-0.283076	that works on all the
-0.236124	values first, then all the
-0.142427	or double because all the
-0.142427	hexadecimal numbers because all the
-0.186478	values, and last all the
-0.186478	may not load all the
-0.019377	function by inlining all the
-0.186478	loop without checking all the
-0.186478	STL vector stores all the
-0.186478	test or manipulate all the
-0.186478	does not solve all the
-0.186478	have to distribute all the
-0.186478	efficient to pool all the
-0.202069	size of all but the
-0.202069	very contrived example, but the
-0.367806	for intrinsic functions, but the
-0.281922	at compile time, but the
-0.272289	also quite efficient, but the
-0.202069	adds, not edx but the
-0.356094	into multiple threads, but the
-0.202069	based on BSD, but the
-0.253641	platforms as well, but the
-0.202069	registers by 64, but the
-0.202069	division in vectors, but the
-0.202069	still be vectorized, but the
-0.253641	overflow can occur, but the
-0.202069	to using hyperthreading, but the
-0.202069	by a macro, but the
-0.202069	(see page 103), but the
-0.202069	a particular situation, but the
-0.202069	overlapping or aliasing, but the
-0.380700	tricky. I have used the
-0.249962	You have to set the
-0.249962	strongly recommended to set the
-0.231089	may, in addition, set the
-0.434886	you have to do the
-0.272422	don't have to do the
-0.544723	is possible to do the
-0.458691	be possible to do the
-0.333078	specifies how to do the
-0.533790	therefore necessary to do the
-0.312141	is able to do the
-0.440848	are able to do the
-0.253224	be better to do the
-0.253224	more safe to do the
-0.315070	simple algorithm can do the
-0.272734	which compiler will do the
-0.202448	make the program do the
-0.254068	and therefore cannot do the
-0.202448	then you must do the
-0.347383	A disadvantage of using the
-0.311932	The advantages of using the
-0.240798	the possibility of using the
-0.240032	integer counter and using the
-0.174478	speed advantage in using the
-0.240032	test tool for using the
-0.407737	sure you are using the
-0.242645	operating systems are using the
-0.194688	of x by using the
-0.194688	speed-critical functions by using the
-0.194688	is set by using the
-0.194688	CPU clock by using the
-0.194688	global variables by using the
-0.194688	code explicitly by using the
-0.194688	alias anything by using the
-0.194688	be hidden by using the
-0.194688	if necessary, by using the
-0.194688	data segment by using the
-0.194630	certain restrictions on using the
-0.135159	www.intel.com. Manual on using the
-0.174478	www.agner.org/optimize/cppexamples.zip. An array using the
-0.166503	an exception without using the
-0.166503	C1::f directly without using the
-0.174478	simple algebraic expressions using the
-0.174478	a single operation using the
-0.174478	it is finished using the
-0.232821	example, you can double the
-0.232821	log(c[i]);. This would double the
-0.260785	the right data into the
-0.179767	bit of i into the
-0.179767	data as possible into the
-0.179767	feeds a branch into the
-0.179767	the flags register into the
-0.018802	what fits best into the
-0.179767	old memory block into the
-0.179767	to be put into the
-0.303656	should be linked into the
-0.179767	a test feature into the
-0.170391	and copies them into the
-0.170391	and getting them into the
-0.228606	the data fit into the
-0.179767	splitting of N into the
-0.179767	measurement instruments directly into the
-0.179767	and go back into the
-0.179767	desired measurement instruments into the
-0.179767	branch is fed into the
-0.179767	to go deeper into the
-0.179767	32-bit Windows. Integrates into the
-0.179767	data fit nicely into the
-0.179767	branches to feed into the
-0.273791	under test but also the
-0.273791	hot spot but also the
-0.236044	a[i]. Note how efficient the
-0.249113	to the function. In the
-0.087830	which is faster. In the
-0.087830	is much faster. In the
-0.249113	hence higher speed. In the
-0.198046	reduce them all. In the
-0.198046	power of two. In the
-0.198046	it doesn't occur. In the
-0.198046	become very big. In the
-0.198046	false vendor string. In the
-0.198046	the same name. In the
-0.198046	easily be obtained. In the
-0.198046	on page 60. In the
-0.198046	chapter (page 146). In the
-0.147142	large arrays and where the
-0.147142	than a program where the
-0.147142	x86 instruction set where the
-0.113120	size in cases where the
-0.113120	automatically in cases where the
-0.113120	containers in cases where the
-0.129117	in most cases where the
-0.129117	etc. In cases where the
-0.129117	are many cases where the
-0.129117	in simple cases where the
-0.129117	in special cases where the
-0.147142	with carry) instructions where the
-0.147142	in 64-bit mode where the
-0.147142	large data sets where the
-0.147142	large memory model where the
-0.147142	construct obscure examples where the
-0.147142	a learning process where the
-0.147142	Pentium 4 computer where the
-0.147142	you can predict where the
-0.345253	in the situation where the
-0.189674	a use situation where the
-0.189674	A common situation where the
-0.147142	form of templates where the
-0.197901	aware of situations where the
-0.415653	useful in situations where the
-0.147142	storage is determined where the
-0.147142	the second step where the
-0.147142	return addresses (i.e. where the
-0.147142	(except in Fortran where the
-0.147142	a column-wise manner where the
-0.534541	that the compiler takes the
-0.224710	power of 2, so the
-0.224710	time in thousand so the
-0.224710	standard specifies truncation so the
-0.224710	seven significant digits, so the
-0.234459	library function will return the
-0.189869	the cache in between the
-0.447592	difference in performance between the
-0.250989	as the difference between the
-0.360452	Note the difference between the
-0.205321	a minimal difference between the
-0.189869	third-party graphics framework between the
-0.447592	communication and synchronization between the
-0.239927	an important distinction between the
-0.189869	data. The similarity between the
-0.189869	and the transitions between the
-0.189869	the work evenly between the
-0.189869	differences were observed between the
-0.189869	functions for distinguishing between the
-0.334501	has no virtual member the
-0.270831	because of the way the
-0.270831	depends on the way the
-0.647740	There is no way the
-0.354068	Integer division is faster the
-0.287012	at all. This makes the
-0.287012	is stored. This makes the
-0.171658	code, but this makes the
-0.171658	smaller functions only makes the
-0.245622	access non-sequential which makes the
-0.171658	comparisons by one makes the
-0.236754	time, it also makes the
-0.171658	The function call makes the
-0.171658	A non-Intel processor makes the
-0.036979	manual. This option makes the
-0.036979	-fsource-asm). This option makes the
-0.171658	unsigned integers simply makes the
-0.219542	while dynamic linking makes the
-0.171658	use of templates makes the
-0.171658	of such checks makes the
-0.171658	multiple memory blocks makes the
-0.171658	with many instances makes the
-0.240017	this the time before the
-0.174465	terminates the program before the
-0.131074	that the value before the
-0.131074	time it takes before the
-0.131074	a virtual table before the
-0.131074	branch misprediction long before the
-0.039453	that is called before the
-0.039453	must be called before the
-0.039453	is usually called before the
-0.131074	billions of times before the
-0.240017	from the stack before the
-0.131074	such a check before the
-0.060611	store is known before the
-0.060611	the size known before the
-0.189949	to desired values before the
-0.131074	a pointer well before the
-0.131074	few clock cycles before the
-0.131074	a new addition before the
-0.131074	because it comes before the
-0.131074	the thread priority before the
-0.131074	of one iteration before the
-0.131074	they are resolved before the
-0.131074	nearby address again before the
-0.131074	calculation of B before the
-0.131074	may be freed before the
-0.060611	to do immediately before the
-0.060611	be placed immediately before the
-0.131074	to be restored before the
-0.531432	processor. This is called the
-0.226284	part of memory called the
-0.226284	a special cache called the
-0.249056	with contiguous memory. See the
-0.197995	standardized across platforms. See the
-0.197995	into sleep mode. See the
-0.197995	best possible version. See the
-0.197995	loop control branch. See the
-0.197995	structured exception handling. See the
-0.328073	one memory pool. See the
-0.197995	you are doing. See the
-0.197995	the directive __declspec(cpu_dispatch(...)). See the
-0.197995	parallelism is obvious. See the
-0.140218	the code to call the
-0.031042	you have to call the
-0.597970	it takes to call the
-0.140218	you want to call the
-0.140218	exception handler to call the
-0.140218	is supposed to call the
-0.238260	by F2 and call the
-0.256246	parameter. It can call the
-0.188383	Alternatively, you may call the
-0.188383	no other modules call the
-0.188383	} // Now call the
-0.242193	} In this example, the
-0.156430	b; In this example, the
-0.156430	55 In this example, the
-0.301335	for optimization. For example, the
-0.551451	CPU dispatching. For example, the
-0.301335	easy development. For example, the
-0.301335	of sources. For example, the
-0.035148	In the above example, the
-0.073392	(In the above example, the
-0.234966	following example shows first the
-0.234761	into a single register the
-0.433246	u.f We can take the
-0.212493	compiler may not take the
-0.212493	a and b take the
-0.212493	= 28. We take the
-0.212493	point calculations usually take the
-0.212493	possible inputs. Let's take the
-0.450716	long. This is often the
-0.225909	specialization. This is how the
-0.314370	machine code and how the
-0.300451	few comments about how the
-0.309636	user may not need the
-0.308848	evicted before we need the
-0.574259	because it doesn't need the
-0.309636	data that don't need the
-0.265681	For example, to test the
-0.265681	is necessary to test the
-0.265681	more relevant to test the
-0.219121	then you should test the
-0.271000	innermost loop and without the
-0.183588	represented with or without the
-0.308746	a shared object without the
-0.183588	a new version without the
-0.183588	same dynamic libraries without the
-0.183588	example 11.3 even without the
-0.183588	for old processors without the
-0.183588	performance on CPUs without the
-0.183588	sequence of calculations without the
-0.183588	cannot be changed without the
-0.183588	compiled code. (Compile without the
-0.229571	hash table for even the
-0.229571	in some cases even the
-0.887743	if you are sure the
-0.291970	is to make sure the
-0.256321	programmer to make sure the
-0.291970	subexpression to make sure the
-0.291970	manner to make sure the
-0.306102	compiler, then make sure the
-0.273297	destructor that makes sure the
-0.273297	that it makes sure the
-0.232202	a variable. Make sure the
-0.289008	is small and always the
-0.287232	several seconds to access the
-0.287232	be unable to access the
-0.271809	the functions that access the
-0.218164	and finally (4) access the
-0.139477	We can shift out the
-0.139477	14.28 will shift out the
-0.045720	compiler cannot rule out the
-0.100189	to completely rule out the
-0.142630	is to roll out the
-0.142630	way to roll out the
-0.142630	want to roll out the
-0.114128	when we roll out the
-0.038784	avoided by rolling out the
-0.038784	8.26a by rolling out the
-0.285082	to use in case the
-0.285082	all operands in case the
-0.285082	program errors in case the
-0.216934	in the latter case the
-0.415488	cycles. In most cases the
-0.402145	though. In some cases the
-0.402145	34. In some cases the
-0.242122	is to set up the
-0.299018	used to speed up the
-0.152251	necessary to look up the
-0.097977	to first look up the
-0.097977	address. (3) look up the
-0.224701	need to split up the
-0.176275	time measurements: warm up the
-0.176275	to _endthread() cleans up the
-0.176275	and it fills up the
-0.176275	chain may fill up the
-0.176275	section by summing up the
-0.305403	convenient way of making the
-0.067720	alternative solution of making the
-0.067720	radical solution of making the
-0.147986	The advice of making the
-0.258549	program code for making the
-0.358110	be avoided by making the
-0.182322	faster either by making the
-0.182322	cache misses by making the
-0.358110	be solved by making the
-0.182322	branch mispredictions by making the
-0.174855	in different places making the
-0.269074	Then again two times the
-0.268216	way, then many times the
-0.605230	count how many times the
-0.269074	way and three times the
-0.373642	handling and you want the
-0.373642	anyway and you want the
-0.498008	macro. If you want the
-0.262774	dependency chain. We want the
-0.210169	If you just want the
-0.214740	worrying too much about the
-0.063876	needs all information about the
-0.063876	saved all information about the
-0.138794	has no information about the
-0.138794	memory. No information about the
-0.138794	the full information about the
-0.138794	of added information about the
-0.198802	class gets information about the
-0.138794	has incomplete information about the
-0.167354	have to care about the
-0.167354	threads, but that's about the
-0.167354	programmer hasn't thought about the
-0.195527	a function that does the
-0.195527	default constructor that does the
-0.273443	pointer conversions. It does the
-0.203050	'this' pointer which does the
-0.203050	The static_cast operator does the
-0.203050	Another function __intel_cpu_features_init_x() does the
-0.176390	the program, and while the
-0.176390	registers are used, while the
-0.176390	function is called, while the
-0.176390	c are integers, while the
-0.176390	and press break while the
-0.176390	done only once, while the
-0.176390	is relatively expensive, while the
-0.176390	1's is unchanged, while the
-0.176390	register for both, while the
-0.176390	15.1c as intended, while the
-0.234129	objects in BSD work the
-0.285605	test program that calls the
-0.213371	each statement that calls the
-0.207431	call statement always calls the
-0.207431	in example 16.2 calls the
-0.207431	loaded, the loader calls the
-0.227865	best way to avoid the
-0.227865	you want to avoid the
-0.227865	be able to avoid the
-0.522764	various ways to avoid the
-0.227865	completely unrolled to avoid the
-0.291830	because you can avoid the
-0.186199	then we can avoid the
-0.341938	time. You can avoid the
-0.241559	The compiler may avoid the
-0.224160	devices if you avoid the
-0.234200	in a word processor the
-0.206020	or PathScale. 2. Use the
-0.206020	of this option. Use the
-0.206020	pointers are implemented. Use the
-0.206020	a hot spot. Use the
-0.206020	an assembly listing. Use the
-0.199416	for double precision. But the
-0.199416	the destination array. But the
-0.199416	as an integer. But the
-0.199416	the constant 5. But the
-0.199416	in other languages. But the
-0.199416	on the market. But the
-0.199226	loads the library through the
-0.502746	class are accessed through the
-0.250441	The code goes through the
-0.268949	public variables go through the
-0.199226	automatically download updates through the
-0.199226	value will propagate through the
-0.265686	the framework and compile the
-0.158060	course, that you compile the
-0.158060	way. First you compile the
-0.212748	references. If we compile the
-0.195403	resource problems that cause the
-0.435967	modified. This can cause the
-0.390528	reasons. This may cause the
-0.105862	do so will cause the
-0.105862	overloaded operators will cause the
-0.248006	address 0x2710 will cause the
-0.290362	whether others have done the
-0.340479	operating system, and therefore the
-0.082824	they must be inside the
-0.082824	piece of memory inside the
-0.082824	algorithm is used inside the
-0.082824	by making objects inside the
-0.082824	the shared variable inside the
-0.009529	predictable the branch inside the
-0.009529	avoids the branch inside the
-0.019273	with a branch inside the
-0.019273	integers. The branch inside the
-0.082824	fixed size arrays inside the
-0.025892	on the calculations inside the
-0.025892	depends on calculations inside the
-0.025892	floating point calculations inside the
-0.082824	is a counter inside the
-0.086865	should be declared inside the
-0.086865	preferably be declared inside the
-0.082824	the overflow condition inside the
-0.121443	static object defined inside the
-0.082824	the function body inside the
-0.082824	at what happens inside the
-0.082824	loop because nothing inside the
-0.082824	(addition, multiplication, etc.) inside the
-0.082824	(other than log) inside the
-0.335455	operand that is calculated the
-0.365556	log(2.0) is only calculated the
-0.396610	other code that uses the
-0.269893	Intel CPUs. It uses the
-0.216470	the user never uses the
-0.541869	then you can get the
-0.330821	we will not get the
-0.279472	Here, y will get the
-0.200029	b will both get the
-0.200029	programming will typically get the
-0.429872	best way to check the
-0.160951	The program can check the
-0.160951	makefile. You can check the
-0.378119	operations is more advantageous the
-0.431468	unknown processors that support the
-0.289010	Windows. Does not support the
-0.216252	Future processors will support the
-0.206673	variable (eax) which contains the
-0.258829	stack). ecx now contains the
-0.206673	address is. ecx contains the
-0.206673	a and edx contains the
-0.142968	will be efficient whether the
-0.142968	know for sure whether the
-0.142968	be made about whether the
-0.255363	table to see whether the
-0.142968	The table shows whether the
-0.142968	dispatcher to know whether the
-0.142968	difficult to predict whether the
-0.042632	CPU-dispatcher that checks whether the
-0.042632	class, it checks whether the
-0.042632	CPU dispatcher checks whether the
-0.142968	known at compile-time whether the
-0.142968	first operand determines whether the
-0.552661	different ways of doing the
-0.255182	two functions are doing the
-0.179706	be accomplished by doing the
-0.179706	never spend time doing the
-0.472471	the compiler from doing the
-0.179706	are in fact doing the
-0.179706	program is busy doing the
-0.249380	alternative is to run the
-0.249380	processor models to run the
-0.206055	__debugbreak();. If you run the
-0.600495	then it will run the
-0.063759	2.0/3.0 than to calculate the
-0.006639	it takes to calculate the
-0.063759	induction variables to calculate the
-0.063759	in order to calculate the
-0.063759	you want to calculate the
-0.063759	are able to calculate the
-0.063759	is recommended to calculate the
-0.063759	typical application to calculate the
-0.063759	don't care to calculate the
-0.063759	more convenient to calculate the
-0.063759	is safer to calculate the
-0.198779	consecutively and can calculate the
-0.219801	the compiler to inline the
-0.219801	is optimal to inline the
-0.215374	Therefore, it cannot inline the
-0.279588	would have to add the
-0.273814	return add_elements(s); // add the
-0.324457	modules. You may add the
-0.246040	other module then add the
-0.195312	a vector register, add the
-0.269519	you have to store the
-0.091689	same class and store the
-0.091689	containing (2,2,2,2), and store the
-0.091689	vector (1,2,3,4), and store the
-0.161363	so we can store the
-0.208063	the system may store the
-0.161363	The compiler will store the
-0.161363	optimizing compiler might store the
-0.161363	table. Even better: store the
-0.222534	library is needed. All the
-0.222534	they cannot do. All the
-0.158957	takes time to copy the
-0.158957	be useful to copy the
-0.295855	memory block and copy the
-0.297614	the requirements of optimizing the
-0.277872	algorithm than by optimizing the
-0.164358	depends on how well the
-0.164358	for checking how well the
-0.444368	function pointer is simply the
-0.219159	operations than to write the
-0.156469	is possible to write the
-0.156469	several minutes to write the
-0.156469	from attempting to write the
-0.289766	very important to optimize the
-0.289766	this information to optimize the
-0.211776	compiler can often optimize the
-0.202164	lines we used above the
-0.202164	with column 28 above the
-0.202164	mirror elements matrix[c][r] above the
-0.202164	its mirror position above the
-0.163771	the next function. However, the
-0.163771	were scarce resources. However, the
-0.163771	many different purposes. However, the
-0.163771	large data sets. However, the
-0.163771	they are executed. However, the
-0.163771	with its value. However, the
-0.163771	of a debugger. However, the
-0.163771	the next calculation. However, the
-0.232020	ago, the recommendation was the
-0.289474	cache line in both the
-0.191563	is supported by both the
-0.191563	const twice because both the
-0.191563	run time. Therefore, both the
-0.191563	independent and checks both the
-0.100364	very long time unless the
-0.100364	an induction variable unless the
-0.100364	do the optimization unless the
-0.100364	in 32-bit systems unless the
-0.100364	floating point calculations unless the
-0.140651	in 32-bit mode unless the
-0.100364	for exception handling unless the
-0.047289	which is slow unless the
-0.047289	comparisons are slow unless the
-0.100364	but not safe unless the
-0.100364	program more clear unless the
-0.100364	time than rounding unless the
-0.100364	int (16 bits), unless the
-0.100364	an integer constant, unless the
-0.100364	variable method unfavorable, unless the
-0.253435	cases In most cases, the
-0.253435	strings. In most cases, the
-0.304956	efficient. In many cases, the
-0.283351	function. In some cases, the
-0.283351	users. In some cases, the
-0.229700	In 50 simple cases, the
-0.434362	is possible to replace the
-0.149977	be necessary to replace the
-0.149977	is advantageous to replace the
-0.149977	is expected to replace the
-1.135332	The compiler may replace the
-0.227864	Some compilers will replace the
-0.179557	contains writeable data. Therefore, the
-0.179557	constructors are called. Therefore, the
-0.179557	in 64-bit mode. Therefore, the
-0.179557	can be critical. Therefore, the
-0.303377	are time consuming. Therefore, the
-0.179557	self- relative addresses. Therefore, the
-0.331050	also possible to see the
-0.251536	you want to see the
-0.251536	therefore fail to see the
-0.282989	the user can see the
-0.298878	consequence that it allows the
-0.167749	is "undefined". This allows the
-0.167749	F1() throw(); This allows the
-0.176170	Linux: -ffunction-sections) which allows the
-0.176170	a const reference allows the
-0.176170	the out-of-order mechanism allows the
-0.312234	on which instruction sets the
-0.198093	The above example sets the
-0.198093	library function __intel_cpu_features_init() sets the
-0.198093	brands and similarly sets the
-0.232669	an intermediate code like the
-0.185446	the level-2 cache. Using the
-0.185446	at least temporarily. Using the
-0.185446	(see page 105). Using the
-0.185446	in this chapter. Using the
-0.185446	in chapter 11. Using the
-0.232181	unknown brand or model the
-0.220118	because they can block the
-0.220118	thread can possibly block the
-0.234573	module, and to put the
-0.438986	be advantageous to put the
-0.215277	the code and put the
-0.153108	then you may put the
-0.153108	after they have put the
-0.045292	the other then put the
-0.045292	128 bytes then put the
-0.045292	the other, then put the
-0.233429	where each iteration needs the
-0.183518	serious limitations to what the
-0.183518	following example shows what the
-0.250563	programmer to know what the
-0.039130	105. 8.7 Checking what the
-0.039130	82 8.7 Checking what the
-0.357965	models to avoid running the
-0.219950	of resources. Consider running the
-0.306252	CriticalFunction, @gnu_indirect_function"); // Make the
-0.181567	the object's class. Make the
-0.181567	for the object. Make the
-0.181567	the function returns. Make the
-0.181567	the following alternatives: Make the
-0.288160	in x, and last the
-0.395084	program before and after the
-0.143105	data member or after the
-0.143105	do the check after the
-0.031602	two clock cycles after the
-0.031602	few clock cycles after the
-0.143105	are then output after the
-0.143105	to execute _mm_empty() after the
-0.143105	will remain locked after the
-0.297098	testing. Trying to read the
-0.191792	Furthermore, you may read the
-0.191792	16. If you read the
-0.242086	implementation would only read the
-0.148873	assembly code to give the
-0.148873	wherever appropriate to give the
-0.068848	an overflow and give the
-0.068848	an underflow and give the
-0.150705	unit-test does not give the
-0.150705	CPUID instruction doesn't give the
-0.150705	The subsequent counts give the
-0.334806	resulting machine code becomes the
-0.307583	seconds because it requires the
-0.229424	the cache to load the
-0.229424	it takes to load the
-0.229424	it needs to load the
-0.189710	optimized code will load the
-0.094871	easy way to control the
-0.094871	various options to control the
-0.321198	it has to assume the
-0.254951	be avoided by calling the
-0.187276	cycles more than calling the
-0.083621	temporary array before calling the
-0.083621	call _mm256_zeroupper() before calling the
-0.734729	The following example shows the
-0.164835	be possible to improve the
-0.111030	stack. This can improve the
-0.167071	cycles. You can improve the
-0.111030	64-bit systems can improve the
-0.098721	this did not improve the
-0.050345	advantages that may improve the
-0.050345	entries. This may improve the
-0.172764	then you may improve the
-0.050345	because this may improve the
-0.098721	can not only improve the
-0.098721	valid) can possibly improve the
-0.172384	= i; } Here, the
-0.172384	+ 1.; } Here, the
-0.143055	p + i; Here, the
-0.143055	matrix[FuncRow(i)][FuncCol(i)] += x; Here, the
-0.143055	d + 3.5; Here, the
-0.143055	ArraySize; i++) List[i]++; Here, the
-0.143055	c1; int c1::*MemberPointer; Here, the
-0.563219	the compiler doesn't know the
-0.336045	overflow condition will generate the
-0.337852	link order is usually the
-0.270396	is possible to reduce the
-0.219862	but you can reduce the
-0.219862	context switches can reduce the
-0.184601	example, compilers cannot reduce the
-0.287918	only when it goes the
-0.215328	branch that always goes the
-0.211910	than done to choose the
-0.115754	operating system and choose the
-0.115754	point operations and choose the
-0.194616	available, we may choose the
-0.310844	question. You may choose the
-0.143055	The compiler will choose the
-0.143055	compilers will automatically choose the
-0.094387	the microprocessor has made the
-0.094387	This reordering has made the
-0.373356	of the simple function, the
-0.307303	therefore fail to start the
-0.214658	the iterations and start the
-0.087716	is faster the smaller the
-0.087716	more advantageous the smaller the
-0.197751	bigger systems. The smaller the
-0.491395	put a parenthesis around the
-0.213519	identify the circumstances around the
-0.286015	77) shows which reductions the
-0.316563	simply predicted to go the
-0.233778	Programmers that have tested the
-0.362767	manually. I have tested the
-0.396908	that the CPU supports the
-0.133137	not allowed to change the
-0.104945	the loop can change the
-0.104945	dynamic library can change the
-0.104945	caveats. We can change the
-0.242671	a compiler may change the
-0.176746	optimizing compiler will change the
-0.133137	result if we change the
-0.178762	program to turn off the
-0.178762	user to turn off the
-0.178762	useful to turn off the
-0.127695	off or log off the
-0.049807	program by turning off the
-0.049807	just by turning off the
-0.127695	and) will cut off the
-0.230675	(SDK or PSDK). Supports the
-0.286237	parameters. In 64-bit Windows, the
-0.042730	the one that gives the
-0.042730	the method that gives the
-0.042730	the option that gives the
-0.143338	AVX2 instruction set gives the
-0.143338	of these two gives the
-0.143338	N1 = N&(N-1) gives the
-0.193172	away p and inlining the
-0.190645	be avoided by inlining the
-0.190645	be improved by inlining the
-0.121071	different integer types Unfortunately, the
-0.121071	is never called. Unfortunately, the
-0.121071	pure function calls. Unfortunately, the
-0.121071	of these purposes. Unfortunately, the
-0.121071	automatic CPU dispatching. Unfortunately, the
-0.121071	to do this. Unfortunately, the
-0.121071	of cross-platform portability. Unfortunately, the
-0.113885	software is to find the
-0.112112	in order to find the
-0.113885	of bytes to find the
-0.113885	be able to find the
-0.170321	be difficult to find the
-0.105739	if you cannot find the
-0.105739	function call. (2) find the
-0.258784	are sure to produce the
-0.190553	The compiler will produce the
-0.190553	optimizing compiler should produce the
-0.121071	different compiler by including the
-0.121071	AMD and VIA including the
-0.013412	handle the strings including the
-0.121071	full metaprogramming features, including the
-0.121071	only on n, including the
-0.121071	the same computer, including the
-0.103353	when it is outside the
-0.066243	to stack memory outside the
-0.066243	a function but outside the
-0.066243	a temporary variable outside the
-0.066243	the extra operations outside the
-0.066243	the last element outside the
-0.066243	check for overflow outside the
-0.066243	to be done outside the
-0.066243	arithmetic calculations go outside the
-0.066243	it can move outside the
-0.155253	but it is still the
-0.155253	multiplication, etc. is still the
-0.187579	factors that can prevent the
-0.187579	in the code prevent the
-0.187579	definition. This will prevent the
-0.230134	constructor and no destructor the
-0.076266	function, and it prevents the
-0.076266	optimal because it prevents the
-0.044135	in memory. This prevents the
-0.044135	another thread. This prevents the
-0.044135	preceding one. This prevents the
-0.044135	declared volatile. This prevents the
-0.081586	nontemporal write instruction prevents the
-0.081586	this. It also prevents the
-0.081586	The integer division prevents the
-0.046062	the code to tell the
-0.046062	also possible to tell the
-0.046062	vector always to tell the
-0.046062	variable declaration to tell the
-0.046062	__assume_aligned directive to tell the
-0.046062	function prototype to tell the
-0.046062	we forgot to tell the
-0.046062	#pragma novector to tell the
-0.081586	or references then tell the
-0.231367	of precision. Let's repeat the
-0.226121	no reason to unroll the
-0.226121	be worthwhile to unroll the
-0.088545	several different CPUs. On the
-0.088545	by the compiler. On the
-0.088545	uses few resources. On the
-0.088545	big data structures. On the
-0.088545	optimization more difficult. On the
-0.088545	to using hyperthreading. On the
-0.088545	modification is profitable. On the
-0.088545	in example 9.1b. On the
-0.305076	Windows. In 64-bit Linux, the
-0.207771	factorial *= x; Note the
-0.207771	the variable Day. Note the
-0.143319	the dispatcher function. When the
-0.143319	the cache size. When the
-0.143319	than single precision. When the
-0.143319	every call method. When the
-0.143319	and branch mispredictions. When the
-0.183696	of a function. Avoid the
-0.183696	Possible solutions are: Avoid the
-0.183696	on page 93. Avoid the
-0.132646	is done by copying the
-0.132646	be avoided by copying the
-0.132646	this jump by copying the
-0.321507	be used for accessing the
-0.071783	them off or until the
-0.071783	is valid only until the
-0.071783	by a variable until the
-0.071783	access the file until the
-0.071783	can be loaded until the
-0.122686	seconds and wait until the
-0.071783	loaded, but waits until the
-0.071783	iteration is repeated until the
-0.071783	should be postponed until the
-0.304739	each row by adding the
-0.205761	size needed before adding the
-0.206012	of course, and causes the
-0.206012	malloc and free) causes the
-0.228743	more time than processing the
-0.021090	case is to divide the
-0.010417	cores is to divide the
-0.506126	in order to divide the
-0.066616	we need to divide the
-0.066616	several ways to divide the
-0.147255	vector library, you divide the
-0.327292	are able to mix the
-0.034935	by 16 to fit the
-0.002103	by eight to fit the
-0.034935	if necessary, to fit the
-0.136983	into sub-vectors that fit the
-0.444303	be able to predict the
-0.255807	the microprocessor can predict the
-0.100781	to temp even though the
-0.100781	is executed even though the
-0.100781	point expressions, even though the
-0.100781	|| b)) even though the
-0.136520	the track backwards though the
-0.553084	it takes to execute the
-0.273914	the microprocessor can execute the
-0.204533	for interpreting or compiling the
-0.256417	a module by compiling the
-0.255807	constant and then convert the
-0.203992	faster to first convert the
-0.262014	aligned by at least the
-0.262014	N+1 supports at least the
-0.080629	be a class containing the
-0.080629	a simple class containing the
-0.179720	then the line containing the
-0.046376	sufficiently large to handle the
-0.157948	of the performance during the
-0.157948	off the computer during the
-0.157948	which will change during the
-0.157948	may be selected during the
-0.128371	innermost loop that includes the
-0.194753	other compilers. This includes the
-0.128371	C++ language also includes the
-0.128371	in this way includes the
-0.128371	The map file includes the
-0.269995	way is to insert the
-0.167344	compile time and insert the
-0.167344	by hand and insert the
-0.510186	object, you may consider the
-0.228331	you will be loading the
-0.151615	would all be below the
-0.151615	from row 28 below the
-0.069225	the elements matrix[r][c] below the
-0.069225	Each element matrix[r][c] below the
-0.252801	to optimize, and reading the
-0.201323	in connection with reading the
-0.201323	i which will delay the
-0.201323	store operation doesn't delay the
-0.086437	table instead of calculating the
-0.129097	often used for calculating the
-0.077480	physics processor for calculating the
-0.077480	hardware support for calculating the
-0.077480	unit intended for calculating the
-0.086437	is faster than calculating the
-0.086437	done implicitly when calculating the
-0.077939	do is to enable the
-0.209736	is recommended to enable the
-0.077939	compiler options to enable the
-0.086212	64-bit mode or enable the
-0.086212	inline. This may enable the
-0.125149	inline. This will enable the
-0.086212	newer instruction sets enable the
-0.227707	the module with, e.g. the
-0.243888	be advantageous to keep the
-0.243888	be preferable to keep the
-0.228019	the compiler can align the
-0.228019	changed. This will allow the
-0.227396	the time and rarely the
-0.175620	test the performance under the
-0.175620	tests are done under the
-0.175620	program that runs under the
-0.079214	table if you expect the
-0.079214	& unless you expect the
-0.286623	and you cannot expect the
-0.042370	if all bits except the
-0.042370	testing all bits except the
-0.463551	1. The reason why the
-0.228407	pointers to zero whenever the
-0.087619	to gain by unrolling the
-0.087619	performance dramatically by unrolling the
-0.144136	column; Do not swap the
-0.020940	Here, you cannot swap the
-0.020940	Here you cannot swap the
-0.042940	operands. You cannot swap the
-0.217189	be necessary to modify the
-0.217189	container classes or modify the
-0.169550	flag and don't modify the
-0.015596	* c); // Store the
-0.066056	_mm_or_si128(c2, bc); // Store the
-0.066056	c2, mask); // Store the
-0.281472	of the Gnu compiler, the
-0.143853	the test and setting the
-0.161116	absolute value by setting the
-0.161116	[1.0, 2.0) by setting the
-0.143853	can benefit from setting the
-0.119859	is accessed from within the
-0.119859	references to data within the
-0.119859	is used only within the
-0.119859	all data members within the
-0.119859	to become obsolete within the
-0.369628	Therefore, you should apply the
-0.169850	two clock cycles. Obviously, the
-0.169850	is not needed. Obviously, the
-0.169850	the .NET framework. Obviously, the
-0.330877	is preferable to allocate the
-0.222068	shows how to implement the
-0.295895	quite difficult to implement the
-0.163556	The child classes implement the
-0.280931	that it has chosen the
-0.225837	third generations classes contain the
-0.074124	this is to help the
-0.074124	in order to help the
-0.163556	that we can help the
-0.355363	compiler to optimize away the
-0.180488	compiler can optimize away the
-0.015309	same time to share the
-0.005043	that threads can share the
-0.005043	and c can share the
-0.005043	running simultaneously can share the
-0.015309	make different objects share the
-0.015309	if the threads share the
-0.015309	where data members share the
-0.015309	logical processors usually share the
-0.015309	in row 28 share the
-0.228118	repeat count is near the
-0.050751	ten times and stores the
-0.050751	a matrix and stores the
-0.149288	and the function stores the
-0.050751	time. It simply stores the
-0.050751	member pointer simply stores the
-0.089385	is used for finding the
-0.089648	be useful for finding the
-0.089648	are useful for finding the
-0.089385	is required for finding the
-0.107939	anything else than finding the
-0.226596	but for most purposes the
-0.056510	you have to vectorize the
-0.056510	We want to vectorize the
-0.056510	is unable to vectorize the
-0.082865	12.8b automatically and vectorize the
-0.178989	The compiler will vectorize the
-0.082865	current compilers don't vectorize the
-0.281361	You have to include the
-0.194286	risky because it involves the
-0.134858	processor. However, this involves the
-0.134858	This method also involves the
-0.134858	to a driver involves the
-0.226596	and Newton-Raphson iterations. Here the
-0.301267	or otherwise optimize across the
-0.225194	piece of code once the
-0.225620	mechanism should never interrupt the
-0.225620	a loop where almost the
-0.226474	hardware support for multiplying the
-0.204533	writes may slow down the
-0.143781	lookup operations slow down the
-0.201487	template parameters are exactly the
-0.155452	different methods have exactly the
-0.155452	Sum3 are doing exactly the
-0.223398	size. Later models had the
-0.308275	this function to measure the
-0.363378	value in one vector, the
-0.278285	you forget to delete the
-0.223398	and instruction sets. Likewise, the
-0.278285	respectively. (In 64-bit mode, the
-0.223398	the need to update the
-0.172123	} The compiler generates the
-0.172123	The Intel compiler generates the
-0.183010	other CPUs for executing the
-0.435255	before and after executing the
-0.182128	compilers do not free the
-0.182128	though it could free the
-0.139801	extra register to hold the
-0.139801	big enough to hold the
-0.278285	The smaller the system, the
-0.182128	dispatcher. The dispatcher changes the
-0.182128	The keyword __fastcall changes the
-0.277735	implemented simply by storing the
-0.182128	problems mentioned above. Now the
-0.182128	(c + d); Now the
-0.144187	here is to remove the
-0.144187	names. Remember to remove the
-0.144442	zero. You may remove the
-0.420838	more time to transpose the
-0.222141	depending on how predictable the
-0.221577	value from memory plus the
-0.027216	in order to increase the
-0.027216	The way to increase the
-0.056253	Windows you can increase the
-0.056253	systems, you cannot increase the
-0.056253	your modifications actually increase the
-0.199851	you have to identify the
-0.199851	the debugger to identify the
-0.223270	function local: 1. Add the
-0.239458	also recommended to declare the
-0.173984	called. You may declare the
-0.078742	data size that fits the
-0.078742	the version that fits the
-0.360202	is used for giving the
-0.275673	perfectly. As explained above, the
-0.221577	that it may detect the
-0.276311	a time and show the
-0.222141	and branch mispredictions. Test the
-0.311691	be able to evaluate the
-0.641109	a pointer or reference, the
-0.222705	at more than half the
-0.240045	etc. of only half the
-0.277591	extra instructions for converting the
-0.174489	is declared by specifying the
-0.174489	an int, without specifying the
-0.222705	arguments. This closely follows the
-0.221577	a loop counter, comparing the
-0.222705	execution mechanism can prefetch the
-0.223270	is executed. Without static, the
-0.222141	16 Testing speed Testing the
-0.323551	this condition. In general, the
-0.221577	the same module (i.e. the
-0.222141	suggests methods for avoiding the
-0.239458	may save by avoiding the
-0.272802	the CPU to increment the
-0.070888	is important to economize the
-0.046192	more important to economize the
-0.068434	efficient library and economize the
-0.157899	easiest way to overcome the
-0.243983	discussed how to overcome the
-0.219042	be careful when swapping the
-0.503217	optimization options turned on, the
-0.113074	various ways of reducing the
-0.113074	be used for reducing the
-0.113074	point operations without reducing the
-0.007987	may not be worth the
-0.068434	It is rarely worth the
-0.068434	it is hardly worth the
-0.219714	piece of software specifies the
-0.219714	_mm_andnot_si128(mask, bc); // OR the
-0.219042	The following table lists the
-0.209702	preprocessing directives that select the
-0.162834	compiler will always select the
-0.219042	an element in list, the
-0.162834	chains. In this case, the
-0.281275	in the latter case, the
-0.209702	weigh the advantages over the
-0.162834	due to controversies over the
-0.812788	On the other hand, the
-0.273563	not advantageous to split the
-0.220388	faster way to limit the
-0.068434	it needs to follow the
-0.068434	C if you follow the
-0.068434	want vectorization then follow the
-0.068434	the cache lines follow the
-0.219042	generation of CPUs increased the
-0.162834	the executable file. Only the
-0.162834	value of ebx. Only the
-0.219042	// This function adds the
-0.219042	and cache sizes. Fortunately, the
-0.154617	therefore, always to specify the
-0.113074	code if we specify the
-0.113074	may as well specify the
-0.302403	optimized function, but unfortunately the
-0.219042	can be inlined. (In the
-0.292310	you want to compare the
-0.163425	for the stack. Is the
-0.163425	(see page 38). Is the
-0.163425	set is enabled. Typically, the
-0.163425	in the future. Typically, the
-0.163425	the end user gets the
-0.163425	the application programmer gets the
-0.068434	function definition. This tells the
-0.068434	The map file tells the
-0.007987	sampling: The profiler tells the
-0.128031	method is to wrap the
-0.128031	is recommended to wrap the
-0.219714	the problem by increasing the
-0.220388	15. C++ is definitely the
-0.215266	32-bit Linux and BSD, the
-0.008545	time. 4 2 Choosing the
-0.008545	............................................................................................... 4 2 Choosing the
-0.017262	............................................................................................... 23 5 Choosing the
-0.017262	a website. 5 Choosing the
-0.215266	is recommended to place the
-0.041492	the CPU to overlap the
-0.041492	is able to overlap the
-0.087371	out-of-order capabilities can overlap the
-0.297992	function or by turning the
-0.032281	is possible to obtain the
-0.032281	sometimes possible to obtain the
-0.026719	object file. This enables the
-0.260109	code. Let me explain the
-0.146627	vector libraries. To explain the
-0.215266	should be weighed against the
-0.117987	is done by declaring the
-0.117987	be inlined by declaring the
-0.215266	the container may move the
-0.215266	to the container. Can the
-0.216934	the cache always chooses the
-0.192476	much more by choosing the
-0.147339	help the programmer choosing the
-0.216934	etc., as is commonly the
-0.215266	the overhead of transferring the
-0.216099	the loop and splitting the
-0.216099	one big problem. Whenever the
-0.146627	time and it avoids the
-0.146627	more time but avoids the
-0.215266	the microprocessor can begin the
-0.468464	safe. In other words, the
-0.755549	The following example illustrates the
-0.146627	be optimal to mirror the
-0.146627	time. You may mirror the
-0.268531	possible and by changing the
-0.146627	in order to force the
-0.146627	disk. Memory-hungry applications force the
-0.146627	or if it opens the
-0.146627	new instruction set opens the
-0.087371	all threads have finished the
-0.010007	before it has finished the
-0.209044	in 32-bit mode. Storing the
-0.209044	made smaller by reordering the
-0.209044	calculations while simultaneously prefetching the
-0.209044	will call this distance the
-0.209044	the compiler from aligning the
-0.023290	it takes to reload the
-0.209044	} 73 Without optimization, the
-0.210140	many Boolean expressions. Whether the
-0.209044	columns unused. This removed the
-0.209044	how well it optimizes the
-0.280500	to zero // Return the
-0.101747	code size. In fact, the
-0.101747	can throw. In fact, the
-0.209044	model. You may ignore the
-0.047649	zigzag course that reflects the
-0.047649	innermost loop. This reflects the
-0.047649	and long double reflects the
-0.056253	allow you to manipulate the
-0.056253	allows us to manipulate the
-0.120892	a code that copies the
-0.120892	/ shr ebx,31 copies the
-0.056253	very useful to study the
-0.056253	is important to study the
-0.120892	this problem by bypassing the
-0.120892	on the compiler bypassing the
-0.209044	an explanation. Please skip the
-0.209044	is called, it allocates the
-0.101747	executable. Most compilers offer the
-0.101747	format. Other compilers offer the
-0.209044	} } // At the
-0.023290	call _mm256_zeroupper() before leaving the
-0.120892	than future processors. Consider the
-0.120892	case in loops. Consider the
-0.261504	512 520 and leave the
-0.210140	for the user. With the
-0.210140	optimizes the code. Sometimes the
-0.343049	small enough to justify the
-0.398806	9.1b. On the contrary, the
-0.056253	in order to cover the
-0.056253	very big to cover the
-0.209044	In the same way, the
-0.120892	was too slow. Today, the
-0.120892	big mainframe computers. Today, the
-0.209044	is important to focus the
-0.209044	array bounds is probably the
-0.261504	use it for improving the
-0.210140	the array. eax holds the
-0.056253	a key or moving the
-0.056253	a button or moving the
-0.209044	of clock pulses since the
-0.015309	different purposes is beyond the
-0.015309	database queries is beyond the
-0.015309	of coprocessors is beyond the
-0.120892	smarter ways of organizing the
-0.120892	the performance by organizing the
-0.209044	Function inlining can open the
-0.120892	hot spots and measuring the
-0.120892	confirmed this by measuring the
-0.056253	is possible to utilize the
-0.056253	in order to utilize the
-0.210140	the subsequent counts represent the
-0.056253	performance test that measures the
-0.056253	a counter that measures the
-0.023171	optimizing compiler can bypass the
-0.023171	__intel_cpu_feature_indicator_x. You can bypass the
-0.047649	on. Replace or bypass the
-0.292524	the cache will evict the
-0.209044	of the function. Copying the
-0.209044	to develop and market the
-0.209044	4.5.2, July 2011). Instead, the
-0.280500	be avoided by joining the
-0.280500	more efficient to determine the
-0.013395	a matter of interpreting the
-0.196863	similar utility for modifying the
-0.196863	Older operating systems lack the
-0.196863	C++ programs. Writing past the
-0.247783	allows you to override the
-0.196863	repeat or to exit the
-0.196863	function or variable having the
-0.196863	in advance. This reduces the
-0.196863	non-reduced expression better explains the
-0.196863	in order to emulate the
-0.196863	12.1b to 12.1a. Enable the
-0.196863	(see page 78). Adding the
-0.073589	the last vector. Organize the
-0.073589	is a bottleneck. Organize the
-0.196863	including the while loop, the
-0.073589	do this by invoking the
-0.073589	application program without invoking the
-0.196863	following example, which calculates the
-0.035238	the program. 3 Finding the
-0.035238	language...................................................... 14 3 Finding the
-0.073589	is required for putting the
-0.073589	to 2 by putting the
-0.035238	all applications. 2.8 Overcoming the
-0.035238	framework........................................................................... 14 2.8 Overcoming the
-0.196863	efficient than necessary. Take the
-0.073589	solution, but it increases the
-0.073589	A hash table increases the
-0.196863	you may actively invalidate the
-0.196863	int is 4. So the
-0.196863	evaluate both operands. Nevertheless, the
-0.196863	FuncB, then FuncC. Unrolling the
-0.196863	a profiler which determines the
-0.073589	explains the logic behind the
-0.073589	is actually hidden behind the
-0.073589	be useful to isolate the
-0.073589	to identify and isolate the
-0.196863	is necessary for verifying the
-0.196863	(columns * sizeof(float)). Now, the
-0.073589	got low priority. Especially the
-0.073589	at round addresses. Especially the
-0.196863	another dynamic library requiring the
-0.196863	development process can influence the
-0.073589	initialization routine that loads the
-0.073589	The application program loads the
-0.247783	necessary here to draw the
-0.073589	The advantage of sharing the
-0.073589	multiple threads are sharing the
-0.035238	of overflow and redo the
-0.035238	with _finite()) and redo the
-0.326547	for creating and deleting the
-0.196863	the line that covered the
-0.196863	the hot spot. Sometimes, the
-0.247783	are disabled will crash the
-0.247783	allows you to reserve the
-0.196863	program is run. Both the
-0.460336	the program is loaded, the
-0.247783	longer size by extending the
-0.073589	the & operator forces the
-0.073589	variable. The union forces the
-0.035238	to start and stop the
-0.035238	error message and stop the
-0.247783	be possible to organize the
-0.196863	that the compiler sees the
-0.196863	program performance and studying the
-0.196863	loop control it compares the
-0.035238	may have to fix the
-0.035238	will try to fix the
-0.196863	because it may involve the
-0.196863	measured simply by removing the
-0.196863	makes the compiler interpret the
-0.196863	you want to flip the
-0.196863	resources. For these reasons, the
-0.196863	is used for relieving the
-0.196863	different meaning. 2. Put the
-0.162244	done simply by ignoring the
-0.162244	one, auto_ptr that owns the
-0.162244	any option that limits the
-0.162244	to the sign bit, the
-0.162244	graphical user interface. Otherwise the
-0.162244	and this will trigger the
-0.162244	doesn't work // Re-do the
-0.162244	CPU has problems separating the
-0.162244	A few decades ago, the
-0.162244	Alternatively, you may reuse the
-0.162244	available vector classes. Including the
-0.162244	100. pop ebx restores the
-0.162244	update or even telling the
-0.162244	which is fast. Calculating the
-0.162244	I want to thank the
-0.162244	critical time consumers. Choose the
-0.162244	and the system forbids the
-0.162244	an existing program. Weighing the
-0.162244	obtain, such as eliminating the
-0.162244	intermediate code by emulating the
-0.162244	the current version satisfies the
-0.162244	protection means are among the
-0.162244	a discrete icon signaling the
-0.162244	simple way of solving the
-0.162244	poor because it lacks the
-0.162244	process. Obviously, we loose the
-0.162244	and exception handling. Omitting the
-0.162244	be done by controlling the
-0.162244	programming experience before trying the
-0.162244	number simply by inverting the
-0.162244	The compiler may interleave the
-0.162244	that it rarely justifies the
-0.162244	is important to weigh the
-0.162244	the user to restart the
-0.162244	space is occupied throughout the
-0.162244	induction variable. (This eliminates the
-0.162244	is obtained by dropping the
-0.162244	register. In example 12.2, the
-0.162244	(e.g. an if-else structure), the
-0.162244	likely to experience. Occasionally, the
-0.162244	is aiming at explaining the
-0.162244	enough bits for holding the
-0.162244	are also included. Combining the
-0.162244	clock cycles to fetch the
-0.162244	the destructor by constructing the
-0.162244	a new processor enters the
-0.162244	something that can steal the
-0.162244	infinity or NAN. Avoiding the
-0.162244	forward) instruction to localize the
-0.162244	this hot spot. Repeating the
-0.162244	16 in column 28, the
-0.162244	(or in addition to) the
-0.162244	As table 9.3 shows, the
-0.162244	oriented programming without paying the
-0.162244	chosen compiler doesn't provide the
-0.162244	do integer operations in-between the
-0.162244	distributors are often abusing the
-0.162244	intelligible way by wrapping the
-0.162244	to CPU dispatching. Underestimating the
-0.162244	don't have to reinvent the
-0.162244	The following table summarizes the
-0.162244	an error condition terminates the
-0.162244	the Intel compiler puts the
-0.162244	structure and then merge the
-0.162244	problem is to combine the
-0.162244	integers to alias upon the
-0.162244	bitwise AND operation isolates the
-0.162244	an efficient solution. Sort the
-0.162244	by n and reorganize the
-0.162244	may be faster despite the
-0.162244	that the linker extracts the
-0.162244	= 1. This ends the
-0.162244	metaprogramming so complicated? Because the
-0.162244	an interpreter which interprets the
-0.162244	avoids the overflow. Taking the
-0.162244	16. In example 12.1a, the
-0.162244	following conditions are met: the
-0.162244	contrived examples exist. Therefore the
-0.162244	illegal operation that crashes the
-0.162244	no need to deallocate the
-0.162244	the whole program. During the
-0.162244	case, you may view the
-0.162244	pointers: The trick violates the
-0.162244	You have to consult the
-0.162244	counts. In any event, the
-0.162244	in order to minimize the
-0.162244	but in example 12.1b, the
-0.162244	compilers succeeded in applying the
-0.162244	it takes to refresh the
-0.162244	a program and concentrate the
-0.162244	A thread that shares the
-0.162244	loop-carried dependency chains, namely the
-0.162244	} Example 14.30 finds the
-0.162244	case a = ++b; the
-0.162244	in programming nowadays stress the
-0.162244	good idea to collect the
-0.162244	that they are. Declare the
-0.162244	tempting to fine- tune the
-0.162244	solution. (In my tests, the
-0.162244	types of variables. Move the
-0.162244	which opens and closes the
-0.162244	exp, sin, etc. Overriding the
-0.162244	is intended to mimic the
-0.162244	worse, it can overwrite the
-0.162244	the risk of activating the
-0.162244	the memory when exiting the
-0.162244	Updating mechanisms often disturb the
-0.162244	dispatcher function and replaces the
-0.162244	result is valid. Re-interpreting the
-0.352426	for example, that a is
-0.061615	be evaluated if a is
-0.345606	evaluate b when a is
-0.444649	object it points to is
-0.279501	of object pointed to is
-0.397494	the variable pointed to is
-0.208197	the target pointed to is
-0.321891	all class objects and is
-0.290979	function is big and is
-0.290979	requires OS support and is
-0.439771	for CPU dispatching and is
-0.290979	some syntax checking and is
-0.235062	as it is, and is
-0.340260	that you optimized for is
-0.443595	in a function that is
-0.368526	local A function that is
-0.257251	a const function that is
-0.257251	that every function that is
-0.411470	is the code that is
-0.339094	and for code that is
-0.339094	says. A code that is
-0.258224	149 All code that is
-0.258224	a complicated code that is
-0.268902	loop. The time that is
-0.215594	part of memory that is
-0.320497	more well-structured program that is
-0.399263	like the one that is
-0.350998	set and one that is
-0.351990	an instruction set that is
-0.298229	function or class that is
-0.092798	an integer size that is
-0.103249	smallest integer size that is
-0.395284	of the library that is
-0.288232	array or object that is
-0.304340	a generic version that is
-0.057402	on the value that is
-0.124250	from the value that is
-0.057402	Here, the value that is
-0.034768	from a value that is
-0.034768	constant a value that is
-0.288232	thread. A variable that is
-0.215594	of the table that is
-0.196396	wasted on software that is
-0.196396	computer. Security software that is
-0.324670	branches Remove branch that is
-0.021779	a memory address that is
-0.288232	most important method that is
-0.215594	choose the type that is
-0.215594	be a constant that is
-0.147663	while the expression that is
-0.147663	mask. The expression that is
-0.147663	variables An expression that is
-0.147663	counter. Any expression that is
-0.215594	register to zero that is
-0.215594	with an offset that is
-0.021779	put the operand that is
-0.288232	make sure everything that is
-0.215594	gives a measure that is
-0.215594	the runtime polymorphism that is
-0.215594	called stack unwinding that is
-0.215594	a constant divisor that is
-0.495337	an initialization routine that is
-0.215594	linkage table (PLT) that is
-0.094554	software optimization. Everything that is
-0.094554	same register. Everything that is
-0.215594	integer vectors. Code that is
-0.265297	VIA processors, and it is
-0.322709	code is that it is
-0.221375	double is that it is
-0.221375	problem is that it is
-0.221375	storage is that it is
-0.365197	function so that it is
-0.365197	switches; so that it is
-0.145609	so high that it is
-0.149238	This means that it is
-0.068240	variable means that it is
-0.193607	compiler optimizations that it is
-0.193607	so expensive that it is
-0.193607	I think that it is
-0.193607	may argue that it is
-0.077224	small or if it is
-0.055199	pattern or if it is
-0.068012	a function if it is
-0.068012	but not if it is
-0.068012	extra time if it is
-0.068012	RAM memory if it is
-0.118449	works only if it is
-0.068012	a loop if it is
-0.068012	an integer if it is
-0.068012	more efficient if it is
-0.068012	function call if it is
-0.068012	separate thread if it is
-0.068012	quite well if it is
-0.016031	clock cycles if it is
-0.068012	calculated fast if it is
-0.068012	to see if it is
-0.068012	variable global if it is
-0.068012	the costs if it is
-0.068012	a destructor if it is
-0.068012	most efficiently if it is
-0.068012	may consider if it is
-0.068012	programming style if it is
-0.068012	separate subroutine if it is
-0.160803	as long as it is
-0.160803	is distributed as it is
-0.173658	such systems than it is
-0.173658	other purposes than it is
-0.314162	memory each time it is
-0.314162	re-allocated every time it is
-0.137678	is used when it is
-0.219453	inlined even when it is
-0.137678	is compiled when it is
-0.137678	by line when it is
-0.137678	process running when it is
-0.137678	is permissible when it is
-0.125338	a code then it is
-0.125338	different compilers then it is
-0.125338	is available then it is
-0.125338	program execution then it is
-0.125338	not advantageous then it is
-0.125338	already known then it is
-0.125338	73) automatically then it is
-0.125338	CPU core then it is
-0.125338	array index then it is
-0.125338	CPU efficiency then it is
-0.125338	separate module then it is
-0.125338	non-sequential manner then it is
-0.125338	big arrays, then it is
-0.125338	one segment then it is
-0.125338	CPU dispatching, then it is
-0.125338	been found, then it is
-0.125338	too fine then it is
-0.125338	poorly predictable, then it is
-0.125338	be made) then it is
-0.125338	too small, then it is
-0.125338	not met then it is
-0.209911	induction variable because it is
-0.209911	eight times because it is
-0.209911	function just because it is
-0.209911	is inefficient because it is
-0.209911	main executable because it is
-0.209911	Sandy Bridge) because it is
-0.209911	with alloca, because it is
-0.135092	for the CPU it is
-0.040534	a branch. If it is
-0.040534	come first. If it is
-0.040534	so. 58 If it is
-0.178908	processors on which it is
-0.179630	reasonably well, but it is
-0.179630	a pointer, but it is
-0.179630	such applications, but it is
-0.179630	64-bit software, but it is
-0.179630	small subtasks, but it is
-0.179630	container expandable, but it is
-0.179630	is wrong, but it is
-0.135092	than the one it is
-0.113911	the point where it is
-0.134768	be cases where it is
-0.134768	are cases where it is
-0.113911	data cache, where it is
-0.113911	of data", where it is
-0.138775	the stack before it is
-0.138775	actual values before it is
-0.138775	of temp before it is
-0.138775	the misprediction before it is
-0.135092	in most libraries it is
-0.103785	to make sure it is
-0.103785	you make sure it is
-0.116399	In most cases it is
-0.116399	In many cases it is
-0.291124	In some cases it is
-0.135092	the more important it is
-0.135092	running on, while it is
-0.106044	modern CPU. But it is
-0.106044	such checks. But it is
-0.106044	optimization issue. But it is
-0.135092	memory and therefore it is
-0.019794	find out whether it is
-0.019794	may consider whether it is
-0.019794	to evaluate whether it is
-0.004865	when deciding whether it is
-0.019794	to determine whether it is
-0.135092	the size. However, it is
-0.062314	In other cases, it is
-0.062314	In some cases, it is
-0.135092	fits the microprocessor it is
-0.082311	different applications. Therefore, it is
-0.082311	other number. Therefore, it is
-0.082311	type int. Therefore, it is
-0.082311	library. 78 Therefore, it is
-0.082311	was programmed. Therefore, it is
-0.082311	than PCs. Therefore, it is
-0.082311	been calculated. Therefore, it is
-0.135092	In multithreaded applications it is
-0.135092	&list[8]); } Here, it is
-0.245188	nine, even though it is
-0.062314	of the program, it is
-0.062314	the final program, it is
-0.135092	the reason why it is
-0.194554	own initiative whenever it is
-0.135092	allocation is used, it is
-0.135092	back again. Obviously, it is
-0.135092	} } Here it is
-0.135092	generating overflow. Likewise, it is
-0.289106	object is called, it is
-0.135092	brutally interrupted. Now it is
-0.135092	function bodies above, it is
-0.135092	generators. In general, it is
-0.135092	checking In C++, it is
-0.135092	the specific event it is
-0.135092	the other hand, it is
-0.135092	vector classes Fortunately, it is
-0.062314	pointers, etc. And it is
-0.062314	to maintain. And it is
-0.135092	porting between platforms, it is
-0.135092	interpreted script languages, it is
-0.135092	system crash. Furthermore, it is
-0.135092	In other words, it is
-0.178908	simple tasks. Sometimes it is
-0.135092	or size. Today, it is
-0.178908	or two. Often, it is
-0.135092	a PC. Nevertheless, it is
-0.135092	of modern software, it is
-0.135092	of Boolean algebra, it is
-0.062314	For team projects, it is
-0.062314	For one-man projects, it is
-0.135092	of each method, it is
-0.135092	object is accessed, it is
-0.135092	we can see, it is
-0.135092	improve the performance, it is
-0.135092	programs do. Hence, it is
-0.135092	optimized software design, it is
-0.135092	weakness or bottleneck, it is
-0.135092	new software project, it is
-0.135092	matter of habit, it is
-0.135092	cases like these, it is
-0.135092	iterative in nature, it is
-0.299167	function and the function is
-0.290858	disadvantage that the function is
-0.195963	means that the function is
-0.195963	possibility that the function is
-0.124317	advantageous if the function is
-0.124317	reference if the function is
-0.124317	pure if the function is
-0.016319	each time the function is
-0.024714	first time the function is
-0.074218	every time the function is
-0.050919	next time the function is
-0.434980	initialized when the function is
-0.224827	in case the function is
-0.224827	many times the function is
-0.224827	that calls the function is
-0.224827	to calculate the function is
-0.299167	clear unless the function is
-0.224827	method. When the function is
-0.438414	objects to a function is
-0.203018	know that a function is
-0.203018	says that a function is
-0.115309	next time a function is
-0.115309	Every time a function is
-0.272757	comes when a function is
-0.203018	problem. If a function is
-0.203018	part. If a function is
-0.272757	definition. Inlining a function is
-0.249592	first call. The function is
-0.249592	Intel/x86-compatible microprocessors. The function is
-0.161642	different compilers. This function is
-0.161642	CPU dispatching. This function is
-0.161642	for overflow. This function is
-0.363463	name of this function is
-0.299011	function body. A function is
-0.194296	call any other function is
-0.378651	many times each function is
-0.201802	if the member function is
-0.325971	Calling a member function is
-0.148852	A static member function is
-0.201802	a virtual member function is
-0.201802	a polymorphic member function is
-0.259136	time the critical function is
-0.259136	When the critical function is
-0.323090	example, the template function is
-0.189564	of the virtual function is
-0.189564	when the virtual function is
-0.263166	functions An inline function is
-0.356468	that the dispatcher function is
-0.485719	to a graphics function is
-0.194296	of a linked function is
-0.363463	functions. A frame function is
-0.263166	#pragma ivdep Assume function is
-0.288560	functions A pure function is
-0.272596	if the dispatched function is
-0.040497	function. A leaf function is
-0.194296	of the lrint function is
-0.194296	} The InstructionSet() function is
-0.194296	that a user-defined function is
-0.194296	sin(0.8); The sin function is
-0.237140	compile time while if is
-0.236609	to multiply j by is
-0.347999	can surely rely on is
-0.095455	version of the code is
-0.867617	part of the code is
-0.280663	sure that the code is
-0.280663	notice that the code is
-0.219438	see if the code is
-0.219438	vectorized if the code is
-0.219438	efficiently if the code is
-0.107486	relevant when the code is
-0.107486	obtained when the code is
-0.107486	fragmented when the code is
-0.107486	precisions when the code is
-0.422537	time, then the code is
-0.197507	purposes. If the code is
-0.197507	12. If the code is
-0.263099	in case the code is
-0.263099	checks whether the code is
-0.263099	needed. All the code is
-0.111825	new branch of code is
-0.111825	particular branch of code is
-0.262819	A lot of code is
-0.887344	a piece of code is
-0.284246	trigonometric functions. The code is
-0.284246	Clang compilers. The code is
-0.284246	interrupt 3. The code is
-0.195821	instance. The function code is
-0.274423	problem with this code is
-0.139072	piece of program code is
-0.063993	compilation. The program code is
-0.063993	interpretation. The program code is
-0.195821	and maintaining such code is
-0.067738	that the system code is
-0.067738	therefore the system code is
-0.193009	disadvantage of intermediate code is
-0.374228	based on intermediate code is
-0.096786	using an intermediate code is
-0.411424	a[i]; The above code is
-0.246611	code). The source code is
-0.086966	remember that your code is
-0.086966	years before your code is
-0.302333	for special position-independent code is
-0.274423	in the machine code is
-0.148030	146 below. Position-independent code is
-0.148030	by default. Position-independent code is
-0.246611	register size. Vectorized code is
-0.246611	instructions. The built-in code is
-0.195821	if the unsafe code is
-0.195821	shell script. Interpreted code is
-0.195821	use it. Complicated code is
-0.235481	just-in-time compilers, etc., as is
-0.235481	vector of vectors, as is
-0.107992	= 0 // This is
-0.050648	/ 16; // This is
-0.050648	% 16; // This is
-0.107992	simple method. // This is
-0.107992	- time1; // This is
-0.363681	= 1; } This is
-0.467323	of the code. This is
-0.034543	of the time. This is
-0.034543	at a time. This is
-0.034543	the first time. This is
-0.034543	any extra time. This is
-0.158535	systems. 10 Gnu This is
-0.411622	inside the function. This is
-0.221546	so-called intrinsic functions. This is
-0.158535	a || b; This is
-0.235324	library into memory. This is
-0.235324	of a program. This is
-0.125381	on the data. This is
-0.183436	block of data. This is
-0.418314	function is called. This is
-0.275637	for different CPUs. This is
-0.392431	the innermost loop. This is
-0.204916	a memory pointer. This is
-0.204916	in both cases. This is
-0.158535	level-1 cache size. This is
-0.221546	its child class. This is
-0.158535	pointer to it. This is
-0.204916	a local variable. This is
-0.204916	for other purposes. This is
-0.158535	floating point instructions. This is
-0.158535	into the vector. This is
-0.204916	languages as well. This is
-0.158535	the function returns. This is
-0.275637	bigger memory block. This is
-0.158535	the same value. This is
-0.158535	the operating system. This is
-0.158535	their 23 software. This is
-0.158535	a cache line. This is
-0.158535	and its parameters. This is
-0.204916	a vector simultaneously. This is
-0.158535	Microsoft Visual Studio This is
-0.158535	on the processor. This is
-0.125381	number of bits. This is
-0.125381	to 64 bits. This is
-0.204916	jobs to do. This is
-0.204916	than by 16. This is
-0.204916	a hundred times. This is
-0.158535	time stamp counter. This is
-0.158535	class or structure. This is
-0.158535	intrinsics. Digital Mars This is
-0.158535	be filled up. This is
-0.158535	for all objects. This is
-0.158535	and invalid pointers. This is
-0.158535	the subsequent counts. This is
-0.158535	get reproducible results. This is
-0.158535	floating point addition. This is
-0.158535	they are long. This is
-0.158535	of #include directives. This is
-0.392431	and VIA CPUs"). This is
-0.158535	is the same. This is
-0.158535	neutralize each other. This is
-0.158535	instead of truncation. This is
-0.158535	destructor of x. This is
-0.158535	and interface frameworks. This is
-0.158535	it is compiled. This is
-0.158535	4 = 32. This is
-0.158535	the function declaration. This is
-0.158535	code by default. This is
-0.158535	rather than rounding. This is
-0.158535	not even temporarily. This is
-0.158535	should be predicted. This is
-0.158535	arrays with alloca. This is
-0.158535	the label $B1$2:. This is
-0.158535	been called before. This is
-0.158535	is also de-allocated. This is
-0.158535	at address [ecx+eax*4]. This is
-0.158535	b + 0.666666666666666666667; This is
-0.158535	for vacant spaces. This is
-0.158535	instead of if. This is
-0.158535	(partial) template specialization. This is
-0.158535	exception ever happens. This is
-0.158535	!= 0; 35 This is
-0.158535	exceeds 64 kbytes. This is
-0.158535	math library (SVML). This is
-0.158535	happen quite often. This is
-0.158535	list plus i*sizeof(S1). This is
-0.158535	string or CString. This is
-0.158535	memset is deprecated. This is
-0.158535	calculated as ((a+b)+c)+d. This is
-0.158535	a time measure. This is
-0.158535	perspective of usability. This is
-0.158535	instruction xor eax,eax. This is
-0.170675	size of an int is
-0.170675	wide, while an int is
-0.353586	int. A short int is
-0.457670	maintain. If the compiler is
-0.477586	aliasing, but the compiler is
-0.132650	cases where the compiler is
-0.132650	situations where the compiler is
-0.420385	above example, the compiler is
-0.324586	exist. Therefore the compiler is
-0.320618	up-to-date solution. The compiler is
-0.320618	is big. The compiler is
-0.320618	in x. The compiler is
-0.270272	very well. This compiler is
-0.505372	of the Intel compiler is
-0.450988	in the Intel compiler is
-0.319658	If the Intel compiler is
-0.339001	this. The Intel compiler is
-0.339001	details). The Intel compiler is
-0.225151	of the C++ compiler is
-0.225151	The Gnu C++ compiler is
-0.515460	Mac. The Gnu compiler is
-0.353650	platforms. The Clang compiler is
-0.216806	the Digital Mars compiler is
-0.329837	The address of x is
-0.236264	result is that x is
-0.318472	The purpose of this is
-0.564802	The reason for this is
-0.280704	the compiler that this is
-0.214819	functions, or if this is
-0.214819	don't know if this is
-0.209218	as long as this is
-0.261700	can learn from this is
-0.247015	always position-independent because this is
-0.247015	^= 0x80000000; because this is
-0.261700	cache size. If this is
-0.163691	of course, but this is
-0.163691	function library, but this is
-0.163691	it occurs, but this is
-0.163691	the factorials, but this is
-0.163691	than 2-20, but this is
-0.163691	option -ftrapv, but this is
-0.209218	the preceding example, this is
-0.209218	way to test this is
-0.209218	an operating system this is
-0.290528	later maintenance. However, this is
-0.092130	all variables. Obviously, this is
-0.092130	is finished. Obviously, this is
-0.209218	occur, but unfortunately this is
-0.167210	99% of the time is
-0.219722	network resources. This time is
-0.219722	time measurement. If time is
-0.308678	users and much time is
-0.073549	if the calculation time is
-0.073549	but the calculation time is
-0.219722	integer. The conversion time is
-0.391677	that the response time is
-0.275061	If the response time is
-0.219722	template parameter. No time is
-0.219722	16.2. The measured time is
-0.219722	4 processor. Extra time is
-0.219722	the overall computation time is
-0.292635	Which method you use is
-0.120705	the value of A is
-0.115117	the calculation of A is
-0.028845	or -0 } It is
-0.028845	} 109 } It is
-0.059743	Using intrinsic functions It is
-0.059743	of an object It is
-0.059743	in two libraries It is
-0.014183	of the code. It is
-0.014183	on integer code. It is
-0.014183	any extra code. It is
-0.014183	the source code. It is
-0.062163	a long time. It is
-0.071301	takes longer time. It is
-0.140506	take longer time. It is
-0.059743	and function pointers It is
-0.059743	hash maps etc. It is
-0.061758	used intrinsic functions. It is
-0.061758	remove unreferenced functions. It is
-0.061758	in the memory. It is
-0.061758	dynamically allocated memory. It is
-0.082643	set is used. It is
-0.035992	they are used. It is
-0.035992	no longer used. It is
-0.035992	is seldom used. It is
-0.059743	in 64-bit systems. It is
-0.059743	on these data. It is
-0.059743	necessary instruction set. It is
-0.059743	and VIA processors. It is
-0.059743	to be called. It is
-0.059743	the same compiler. It is
-0.059743	vector registers are: It is
-0.059743	during the loop. It is
-0.272172	to a pointer. It is
-0.122550	a 'this' pointer. It is
-0.059743	the best cases. It is
-0.059743	with induction variables. It is
-0.059743	library function calls. It is
-0.059743	a shared object. It is
-0.028845	64-bit integer calculations. It is
-0.028845	do mathematical calculations. It is
-0.059743	of full optimization. It is
-0.059743	to improve performance. It is
-0.059743	for each thread. It is
-0.059743	large data structures It is
-0.059743	pointers or references. It is
-0.059743	database in Windows. It is
-0.061758	applied to integers. It is
-0.061758	than signed integers. It is
-0.028845	want it to. It is
-0.028845	to apply to. It is
-0.059743	are not critical. It is
-0.059743	it was executed. It is
-0.059743	their software faster. It is
-0.028845	no cache problems. It is
-0.028845	of alignment problems. It is
-0.059743	floating point expressions. It is
-0.059743	of the arrays. It is
-0.059743	extra element zero. It is
-0.059743	or assembly language. It is
-0.059743	if it is. It is
-0.059743	the code automatically. It is
-0.028845	in the core. It is
-0.028845	the same core. It is
-0.059743	and automatic vectorization. It is
-0.059743	it to do. It is
-0.059743	double precision constant. It is
-0.059743	modify data members. It is
-0.059743	irregular response times. It is
-0.059743	desired program structure. It is
-0.059743	not a profiler. It is
-0.059743	no exception handling. It is
-0.059743	b[i] = a[i]; It is
-0.059743	prevents out-of-order execution. It is
-0.059743	or mouse input. It is
-0.059743	its own IDE. It is
-0.059743	and dynamic versions. It is
-0.059743	line number information. It is
-0.059743	the user interface. It is
-0.059743	pitfalls of unit-testing It is
-0.059743	of programming style. It is
-0.059743	from the counts. It is
-0.059743	make files smaller. It is
-0.059743	with 16-bit programs. It is
-0.059743	that particular part. It is
-0.059743	9.2 Cache organization It is
-0.096277	a specific purpose. It is
-0.059743	a non-sequential manner. It is
-0.059743	restriction on x. It is
-0.059743	the new context. It is
-0.059743	objects are aligned. It is
-0.059743	catch, and throw. It is
-0.059743	safe, of course. It is
-0.059743	(see page 73). It is
-0.059743	also has disadvantages: It is
-0.059743	or optimized away. It is
-0.059743	is data decomposition. It is
-0.059743	settings are lost. It is
-0.059743	difficult to read. It is
-0.059743	static data. 148 It is
-0.059743	compatible with Gnu. It is
-0.059743	CPU dispatcher updated. It is
-0.059743	7.15b below shows. It is
-0.059743	used more efficiently. It is
-0.059743	not doing divisions. It is
-0.059743	on page 72. It is
-0.059743	an error message. It is
-0.059743	the final product. It is
-0.059743	on and off. It is
-0.059743	difficult to diagnose. It is
-0.059743	user. Feature bloat. It is
-0.059743	with compile-time polymorphism. It is
-0.059743	or mouse move. It is
-0.059743	several iterations ahead. It is
-0.059743	CPU dispatch strategies It is
-0.059743	the C-style type-casting. It is
-0.059743	waiting for response. It is
-0.059743	what is happening. It is
-0.059743	is not standardized. It is
-0.059743	actually quite convenient. It is
-0.059743	game or animation. It is
-0.059743	#include <xmmintrin.h> _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON); It is
-0.059743	the programmer can. It is
-0.059743	quite tedious indeed. It is
-0.059743	attack for hackers. It is
-0.059743	less user friendly. It is
-0.059743	See page 54. It is
-0.059743	between these considerations. It is
-0.059743	version performs poorly. It is
-0.059743	a first-in-last-out fashion. It is
-0.059743	to choose between. It is
-0.059743	on page 130. It is
-0.059743	(see p. 57). It is
-0.059743	branches for correctness. It is
-0.059743	See page 61. It is
-0.059743	types or sizes? It is
-0.059743	(a*b*c)+(c*b*a) to a*b*c*2. It is
-0.059743	additions with double's. It is
-0.059743	is memory pooling. It is
-0.059743	with many decimals. It is
-0.059743	simple to develop. It is
-0.059743	as memory leaks. It is
-0.059743	integers is costless. It is
-0.059743	use a queue. It is
-0.521895	part of the memory is
-0.337282	whose distance in memory is
-0.337742	memory. The static memory is
-0.419463	collection. The allocated memory is
-0.306824	advantage of static data is
-0.231270	overflow on input data is
-0.286668	to make thread-specific data is
-0.231270	file containing numerical data is
-0.525070	part of the program is
-0.286666	only if the program is
-0.286666	even if the program is
-0.292775	every time the program is
-0.082959	memory when the program is
-0.082959	there when the program is
-0.082959	files when the program is
-0.039502	initialized when the program is
-0.082959	inputs when the program is
-0.082959	resolved when the program is
-0.219434	time, but the program is
-0.320259	resolved before the program is
-0.096003	and while the program is
-0.096003	break while the program is
-0.219434	locked after the program is
-0.219434	function. When the program is
-0.219434	postponed until the program is
-0.642916	part of a program is
-0.381896	speed of a program is
-0.363633	data in a program is
-0.301853	are called. The program is
-0.368177	by the test program is
-0.191059	a highly optimized program is
-0.058473	A console mode program is
-0.191059	to the calling program is
-0.191059	a computationally intensive program is
-0.306218	that calls other functions is
-0.314585	classes and member functions is
-0.575305	pointer in member functions is
-0.306218	implementation of these functions is
-0.533295	frequency of the CPU is
-0.234348	Pentium 4 (NetBurst) CPU is
-0.448690	0 and the other is
-0.339075	far from each other is
-0.658807	the bit scan instruction is
-0.285588	example, but the point is
-0.537005	integers to floating point is
-0.381534	Conversion to floating point is
-0.285588	before the decimal point is
-0.385652	But if the loop is
-0.385652	further if the loop is
-0.724936	calculations inside the loop is
-0.322328	variable unless the loop is
-0.531609	body of a loop is
-0.423220	of the while loop is
-0.512786	only the innermost loop is
-0.166664	a new compiler which is
-0.166664	in stack memory which is
-0.306452	a 'this' pointer which is
-0.166664	floating point library which is
-0.166664	a shared object which is
-0.166664	on a variable which is
-0.213970	a memory address which is
-0.166664	a single bit which is
-0.166664	full debugging support which is
-0.166664	a complicated process which is
-0.166664	a shift operation which is
-0.213970	an intermediate code, which is
-0.166664	Intel C++ compiler, which is
-0.166664	a special trick which is
-0.166664	two 32-bit integers, which is
-0.166664	cache line size, which is
-0.166664	has a latency which is
-0.036059	a is true, which is
-0.036059	b is true, which is
-0.213970	to start up, which is
-0.406455	on the stack, which is
-0.166664	CPU family number, which is
-0.166664	requires a division, which is
-0.166664	an integer comparison, which is
-0.166664	the loop counter, which is
-0.166664	a garbage collector which is
-0.075386	single & operation, which is
-0.075386	a shift operation, which is
-0.166664	link library (DLL) which is
-0.166664	replaced by x<<3, which is
-0.166664	the generic branch, which is
-0.166664	function library asmlib, which is
-0.166664	a compile-time polymorphism, which is
-0.166664	addresses for everything, which is
-0.166664	assembly language output, which is
-0.166664	Visual Basic .NET, which is
-0.166664	make a bit-mask which is
-0.166664	the constant 2.5, which is
-0.322661	are used at all is
-0.230155	designed by Intel but is
-0.230155	advantages of C++ but is
-0.285401	Intel Atom processors, but is
-0.230155	program under test, but is
-0.535396	method that is used is
-0.229536	negative or if one is
-0.167849	recommendation of which one is
-0.167849	find out which one is
-0.371191	of the preceding one is
-0.327693	know how a cache is
-0.502766	that the code cache is
-0.272072	and data A cache is
-0.251192	index. The data cache is
-0.507518	the level-1 data cache is
-0.510428	sharing the same cache is
-0.446604	for the level-2 cache is
-0.285408	cache. The level-2 cache is
-0.311936	the entire level-1 cache is
-0.333525	sign-bit if the integer is
-0.319936	check whether an integer is
-0.319936	operations. When an integer is
-0.226827	then the even integer is
-0.301541	= s; An integer is
-0.245820	to the instruction set is
-0.128951	efficient. This instruction set is
-0.128951	view. This instruction set is
-0.211369	detect which instruction set is
-0.062652	the SSE2 instruction set is
-0.022167	The SSE2 instruction set is
-0.070214	145 SSE2 instruction set is
-0.149719	AVX 32 instruction set is
-0.099820	the AVX instruction set is
-0.146109	The AVX instruction set is
-0.051418	or later instruction set is
-0.211369	next higher instruction set is
-0.338751	the x86 instruction set is
-0.032873	the SSE4.1 instruction set is
-0.149719	the newest instruction set is
-0.149719	the AVX512 instruction set is
-0.234030	the CISC instruction set is
-0.149719	(or later) instruction set is
-0.470622	object of the class is
-0.515106	object of a class is
-0.367715	same function or class is
-0.388552	the structure or class is
-0.244541	}; 52 or class is
-0.264905	that a template class is
-0.195780	the above template class is
-0.314409	class. The child class is
-0.624476	of a derived class is
-0.214736	to a base class is
-0.901014	you have to do is
-0.780475	that you can do is
-0.223136	Portability note: This example is
-0.339632	MultiplyBy in this example is
-0.339632	1.2 in this example is
-0.223136	is executed. An example is
-0.097392	illustrates this. My example is
-0.097392	an example. My example is
-0.323430	files from different compilers is
-0.492142	Mixing float and double is
-0.230483	extra complications. A double is
-0.372504	of a 64-bit double is
-0.528523	matter if the size is
-0.263419	speed. Optimizing for size is
-0.210741	today where cache size is
-0.270500	application. The integer size is
-0.270500	a particular integer size is
-0.292366	largest vector register size is
-0.210741	make sure its size is
-0.210741	of a specific size is
-0.668243	The cache line size is
-0.210741	to a smaller size is
-0.210741	if the RAM size is
-0.502381	value of the pointer is
-0.217409	deleted when the pointer is
-0.217409	well before the pointer is
-0.207478	a pointer. The pointer is
-0.721278	of the function pointer is
-0.280718	a member function pointer is
-0.207478	last member. This pointer is
-0.259737	integer, and this pointer is
-0.278655	Pointer arithmetic A pointer is
-0.259737	__attribute(( aligned(16))) Assume pointer is
-0.285650	whenever a smart pointer is
-0.285650	pointers A smart pointer is
-0.192115	the value of b is
-0.192115	The offset of b is
-0.325057	the time and b is
-0.209639	we assume that b is
-0.111597	but not if b is
-0.052224	& b if b is
-0.052224	| b if b is
-0.043699	a only when b is
-0.043699	is efficient when b is
-0.043699	different implementation when b is
-0.043699	inefficient, however, when b is
-0.276306	useful if the library is
-0.276306	resolved when the library is
-0.360972	function from the library is
-0.221268	a long vector library is
-0.297869	in a dynamic library is
-0.201589	which a dynamic library is
-0.099035	positive number when i is
-0.099035	is invalid when i is
-0.227538	to 15. If i is
-0.227538	The loop counter i is
-0.309074	class of the object is
-0.309074	fail if the object is
-0.092636	called when the object is
-0.092636	released when the object is
-0.210543	object. If the object is
-0.210543	make sure the object is
-0.210543	are met: the object is
-0.380873	no variable or object is
-0.285929	every time an object is
-0.285929	called whenever an object is
-0.189431	sure that no object is
-0.316558	for a new object is
-0.215853	in a shared object is
-0.189431	following reasons: Each object is
-0.316558	and the local object is
-0.189431	that the original object is
-1.026942	a floating point number is
-0.233459	with a higher number is
-0.235858	If the word static is
-0.175126	doesn't matter and there is
-0.175126	are common, and there is
-0.158985	a way that there is
-0.323581	can assume that there is
-0.158985	the feature that there is
-0.323581	be aware that there is
-0.158985	more complex, that there is
-0.125824	out or if there is
-0.125824	the program if there is
-0.125824	signed integer if there is
-0.125824	separate thread if there is
-0.125824	becomes bigger if there is
-0.125824	becomes smaller if there is
-0.125824	the exponent if there is
-0.125824	consuming, especially if there is
-0.125824	will consider if there is
-0.125824	taken, i.e. if there is
-0.125824	multiple accumulators if there is
-0.094178	of convenience - there is
-0.044541	more efficient when there is
-0.044541	is critical when there is
-0.011676	CPU time then there is
-0.011676	compile time then there is
-0.023674	in memory then there is
-0.023674	member functions then there is
-0.023674	by two then there is
-0.023674	the performance then there is
-0.023674	an error then there is
-0.023674	_endthread(), etc. then there is
-0.023674	is elsewhere then there is
-0.147948	is negligible because there is
-0.174642	is running. If there is
-0.174642	to calculate. If there is
-0.138619	double precision, but there is
-0.138619	data bases, but there is
-0.133869	vector operations where there is
-0.094178	In this case there is
-0.196777	smart pointer. But there is
-0.094178	program and whether there is
-0.029181	by default unless there is
-0.029181	large object, unless there is
-0.029181	loop manually unless there is
-0.061797	In most cases, there is
-0.071052	in some cases, there is
-0.140214	In some cases, there is
-0.094178	the reason why there is
-0.094178	time. (Of course there is
-0.094178	registers are used, there is
-0.094178	At the diagonal there is
-0.094178	In 64-bit systems, there is
-0.094178	many cases, however, there is
-0.094178	code. In general, there is
-0.094178	functions, but unfortunately there is
-0.133869	memory caches. Typically, there is
-0.094178	set is enabled there is
-0.266960	for Windows and C++ is
-0.309289	old-fashioned. Development in C++ is
-0.213876	these disadvantages when C++ is
-0.213876	major platforms. However, C++ is
-0.213876	optimized function libraries. C++ is
-0.213876	for several reasons. C++ is
-0.213876	on page 15. C++ is
-0.213876	optimization. 14 Portability C++ is
-0.127540	everything is double There is
-0.127540	as different functions. There is
-0.127540	min)) { ... There is
-0.127540	between the systems. There is
-0.170562	of logical processors. There is
-0.127540	6 Development process There is
-0.127540	alloca was called. There is
-0.127540	and free are: There is
-0.127540	the vector size. There is
-0.127540	the same object. There is
-0.127540	a different way. There is
-0.127540	for internal references. There is
-0.127540	the function returns. There is
-0.127540	dynamic memory allocation. There is
-0.127540	caching becomes inefficient. There is
-0.127540	a try block. There is
-0.127540	a template parameter. There is
-0.127540	the maximum value. There is
-0.127540	loop control branch. There is
-0.127540	than four parameters. There is
-0.127540	the higher bits. There is
-0.127540	advantage comes automatically. There is
-0.127540	Execution unit throughput There is
-0.127540	very time- consuming. There is
-0.127540	no out-of-order execution. There is
-0.127540	without AVX support. There is
-0.127540	refresh the screen. There is
-0.127540	the application programmer. There is
-0.127540	structure is created. There is
-0.127540	(see p. 43). There is
-0.127540	Intel and Gnu. There is
-0.127540	vectors. 12.10 Conclusion There is
-0.127540	(see p. 87). There is
-0.127540	container be recycled? There is
-0.127540	xn as x4∙xn-4. There is
-0.127540	discussion. 7.33 Namespaces There is
-0.127540	integer is returned. There is
-0.127540	(.dll or .so). There is
-0.459353	size of the array is
-0.325824	dimensions of the array is
-0.267167	in which the array is
-0.347958	size of each array is
-0.489738	then a simple array is
-0.300725	7.10 Arrays An array is
-0.552918	matrix or multidimensional array is
-0.212645	when a fixed-size array is
-0.235948	as 2eee 1.fffff, where is
-0.099458	time. Each code version is
-0.099458	initialization. Each code version is
-0.276813	is. The 64-bit version is
-0.222584	BSD. The Windows version is
-0.222584	the directly compiled version is
-0.524636	and that the value is
-0.322389	range then the value is
-0.322389	unfavorable, unless the value is
-0.305108	sure that a value is
-0.311785	63 . The value is
-0.406297	account that each value is
-0.269186	pointer then its value is
-0.038049	the number of objects is
-0.150438	variable number of objects is
-0.079755	total number of objects is
-0.236474	even if the variable is
-0.236474	declared. If the variable is
-0.048161	in which the variable is
-0.236474	makes sure the variable is
-0.403201	range of a variable is
-0.202013	sure that a variable is
-0.202013	tells that a variable is
-0.336520	the function or variable is
-0.294443	of data A variable is
-0.189564	Failure to do so is
-0.486719	of floating point variables is
-0.742299	floating point register variables is
-0.249090	of integer register variables is
-1.701313	a power of 2 is
-0.275197	sure that the table is
-0.275197	fast if the table is
-0.693536	to calculate the table is
-0.303554	This so-called virtual table is
-0.415870	with a lookup table is
-0.308452	hyperthreading, but the performance is
-0.308452	this case, the performance is
-0.296542	Intel processors. The performance is
-0.209226	for good code performance is
-0.209226	ment time when performance is
-0.018851	calls. The best performance is
-0.018851	system. The best performance is
-0.018851	compilers). The best performance is
-0.231056	in the application software is
-0.231056	piece of CPU-intensive software is
-0.232572	where the storage order is
-0.232572	together. The link order is
-0.324920	performance if the branch is
-0.297149	each function and branch is
-0.269232	4. The if branch is
-0.561375	the loop control branch is
-0.273488	i<20 loop control branch is
-0.215886	If the wrong branch is
-0.376300	accessing a data member is
-0.250990	a class data member is
-0.345969	define in this way is
-0.489925	goes the same way is
-0.133654	times the other way is
-0.133654	rarely the other way is
-0.263923	ways. The first way is
-0.211187	code. The second way is
-0.211187	The most compatible way is
-1.174178	the number of elements is
-0.657125	total number of elements is
-0.290205	and types of elements is
-0.218565	address. If this address is
-0.218565	register if its address is
-0.224940	then the target address is
-0.161469	predicted. The target address is
-0.218565	the variable whose address is
-0.344838	each intrinsic function call is
-0.404264	If the carry bit is
-0.316973	The opposite of register is
-0.225864	main memory. A register is
-0.324274	of each vector register is
-0.265716	highest level of optimization is
-0.196471	high degree of optimization is
-0.269021	of 18 software optimization is
-0.215699	optimizations when interprocedural optimization is
-0.215699	whole program 81 optimization is
-0.350840	compilers or function libraries is
-0.316327	7.28 Templates A template is
-0.046338	templates. The powN template is
-0.046338	template. The powN template is
-0.071686	The number of registers is
-0.214591	is advantageous because registers is
-0.287047	of the integer registers is
-0.214591	number of available registers is
-0.291180	of using smart pointers is
-0.340086	overloaded or the user is
-0.279153	times when a user is
-0.647719	that the end user is
-0.204740	of bits. The method is
-0.204740	assembly language". The method is
-0.102361	never called. This method is
-0.102361	different CPUs. This method is
-0.102361	Intel compiler. This method is
-0.102361	the library. This method is
-0.102361	modulo 16. This method is
-0.102361	are finished. This method is
-0.102361	multiple versions. This method is
-0.102361	< 2.0 This method is
-0.102361	less efficiently. This method is
-0.102361	be added. This method is
-0.158377	advantage of this method is
-0.158377	pointers because this method is
-0.158377	metaprogramming, but this method is
-0.119401	also discussed which method is
-0.119401	to consider which method is
-0.148895	A more general method is
-0.218396	or column. The access is
-0.238894	likely that memory access is
-0.238894	method if memory access is
-0.301621	storage. Optimizing file access is
-0.228979	factor sizeof(S1) = 16 is
-0.325101	when alignment by 16 is
-0.224071	denormals-are-zero mode if SSE2 is
-0.224071	set if possible. SSE2 is
-0.224071	the same executable. SSE2 is
-0.222996	allocated resources. The system is
-0.968225	of the operating system is
-0.320052	OS X operating system is
-0.243050	program before the file is
-0.243050	makes sure the file is
-0.317848	access to a file is
-0.234991	possible. Template meta- programming is
-0.234712	sizes to 1024 bits is
-0.345940	use of vector operations is
-0.312536	when bb[i] > 0 is
-0.428906	of a composite type is
-0.205880	parameter of composite type is
-0.304750	solution in this case is
-0.304750	columns in this case is
-0.217150	repetitive. The simplest case is
-0.298165	etc. The worst case is
-0.460822	next generation of processors is
-0.277283	counter in Intel processors is
-0.671628	the standard PC processors is
-0.292144	faster if the constant is
-0.082075	division by a constant is
-0.050464	Division by a constant is
-0.107573	Modulo by a constant is
-0.304432	pipeline then the error is
-0.288321	worse kind of error is
-0.268987	accessed, and this error is
-0.409104	until the residual error is
-0.641402	memory to the stack is
-0.198526	new function. The stack is
-0.198526	sections below. The stack is
-0.287275	way the register stack is
-0.214785	floating point register stack is
-0.313404	The speed of CPUs is
-0.228102	compatibility with old CPUs is
-0.234668	strings in character arrays is
-0.347494	of virtual function calls is
-0.234735	critical. The fastest execution is
-0.370677	away and the result is
-0.284277	evaluated, because the result is
-0.284277	make sure the result is
-0.289204	be 2. The result is
-0.194943	then the first result is
-0.194943	last the second result is
-0.194943	because the 33 result is
-0.155249	even if the processor is
-0.155249	selected if the processor is
-0.191850	checks whether the processor is
-0.206778	only one logical processor is
-0.339966	Such a soft processor is
-0.234455	bigger than 127 bytes is
-0.282040	The use of threads is
-0.367950	necessary communication between threads is
-0.407231	if the array element is
-0.063092	before the first element is
-0.092521	needed. The C++ language is
-0.092521	work. The C++ language is
-0.222161	way the programming language is
-0.476327	choice of programming language is
-0.222161	decide which programming language is
-0.286137	d in assembly language is
-0.308831	a hardware definition language is
-0.181929	The hardware definition language is
-0.276600	and precision. The speed is
-0.261472	critical. Optimizing for speed is
-0.192850	static version if speed is
-0.085807	be avoided when speed is
-0.085807	is preferred when speed is
-0.085807	the code where speed is
-0.085807	are areas where speed is
-0.235155	then d+e, then c is
-0.075323	16 3.1 How much is
-0.075323	consumers 3.1 How much is
-0.313139	used when a thread is
-0.292695	threads where one thread is
-0.302797	calculations while another thread is
-0.226094	addition, subtraction, multiplication, etc. is
-0.226094	of semaphores, mutexes, etc. is
-0.098832	of overflow. The exception is
-0.098832	support anyway. The exception is
-0.404366	everything that is allocated is
-0.226694	that has been allocated is
-0.563350	in case of overflow is
-0.218969	pool. 15 Integer overflow is
-0.218969	if protection against overflow is
-0.672075	signed and unsigned integers is
-0.370266	signed with unsigned integers is
-0.366950	The assembly output option is
-0.226470	case. The -fpie option is
-0.344598	size of the matrix is
-0.238618	2 and the matrix is
-0.173068	columns in a matrix is
-0.197695	2 if a matrix is
-0.197695	} Transposing a matrix is
-0.331029	numbers. Therefore, 64-bit Linux is
-0.314689	-fpic in 32-bit Linux is
-0.226256	run only if AVX is
-0.226256	the operating system. AVX is
-0.629810	functions or vector classes is
-0.393318	when long double precision is
-0.393318	are: Long double precision is
-0.271671	data cache. Single precision is
-0.233772	negative. The last line is
-0.234140	program that already works is
-0.310051	the code is optimized is
-0.142331	8 below. This manual is
-0.142331	1 Introduction This manual is
-0.142331	page 158. This manual is
-0.089372	of the present manual is
-0.089372	Fog The present manual is
-0.234611	in Linux. Address calculation is
-0.234939	the CPU brand check is
-0.191050	models if the problem is
-0.191050	better. If the problem is
-0.191050	of solving the problem is
-0.259571	execution units. The problem is
-0.067391	solution to this problem is
-0.178783	to solve this problem is
-0.050748	time, then the solution is
-0.050748	ignore, then the solution is
-0.174476	; mark_end; This solution is
-0.108221	cost of this solution is
-0.108221	edx. Furthermore, this solution is
-0.131084	to see which solution is
-0.131084	case. The best solution is
-0.174476	then the optimal solution is
-0.131084	A more complicated solution is
-0.174476	inlined. An alternative solution is
-0.131084	even more powerful solution is
-0.131084	and most clean solution is
-0.131084	the only reasonable solution is
-0.131084	time. No universal solution is
-0.316455	elements? If the container is
-0.365533	the array or container is
-0.341278	of using bitwise operators is
-0.234850	for (i=0; i<n; i++) is
-0.277642	inefficient if the list is
-0.303111	market. Such a list is
-0.096767	block. A linked list is
-0.096767	lists. A linked list is
-0.382597	then a sorted list is
-0.235031	model, which quite likely is
-0.057883	the class or structure is
-0.233755	a class or structure is
-0.233424	754 (1985). This standard is
-0.331995	problems when the hardware is
-0.223640	= b + 1 is
-0.223640	security. b & 1 is
-0.289438	64-bit mode. 16-bit mode is
-0.346328	of elements to store is
-0.233864	of these two values is
-0.234451	the fraction. The sign is
-0.322038	module. This non-inlined copy is
-0.279010	make sure the information is
-0.224522	is known. This information is
-0.278062	range of memory addresses is
-0.223686	to calculate self-relative addresses is
-0.284650	and the loop counter is
-0.284650	if the loop counter is
-0.193410	core clock cycles counter is
-0.321899	A performance monitor counter is
-0.454023	core clock cycle counter is
-0.033569	if the loop count is
-0.069951	when the loop count is
-0.069951	If the loop count is
-0.251627	way. The first count is
-0.030084	if the repeat count is
-0.062408	when the repeat count is
-0.062408	If the repeat count is
-0.119689	a high repeat count is
-0.119689	the typical repeat count is
-0.223803	Especially the memory allocation is
-0.648817	of dynamic memory allocation is
-0.331635	Whenever dynamic memory allocation is
-0.304329	code. Dynamic memory allocation is
-0.304329	allocation Dynamic memory allocation is
-0.233269	store An uncached write is
-0.288994	solution to these problems is
-0.200760	the heap. The space is
-0.208759	that the memory space is
-0.208759	purposes. This memory space is
-0.208759	efficient. Extra memory space is
-0.396208	sum2; If the microprocessor is
-0.305089	most cases the microprocessor is
-0.232810	that contains several branches is
-0.232922	operator. The & operator is
-0.049206	constructor or overloaded operator is
-0.049206	Using an overloaded operator is
-0.049206	operators An overloaded operator is
-0.168357	cast The dynamic_cast operator is
-0.215859	cast The const_cast operator is
-0.168357	cast The reinterpret_cast operator is
-0.253732	advance and the multiplication is
-0.253732	integers, while the multiplication is
-0.221363	a heavy graphics application is
-0.221363	(WTL). A WTL application is
-0.168474	or when code caching is
-0.168474	situations where code caching is
-0.177157	memory addresses. If caching is
-0.177157	storing data without caching is
-0.177157	page 87). Data caching is
-0.177157	RAM memory. Efficient caching is
-0.297570	of the instruction sets is
-0.297570	backwards compatible instruction sets is
-0.310744	result of the expression is
-0.209838	+ d; This expression is
-0.209838	avoid that some expression is
-0.187570	where a vector implementation is
-0.187570	information about which implementation is
-0.241925	But the software implementation is
-0.241925	However, a software implementation is
-0.274244	much more complicated implementation is
-0.362245	for the exception handling is
-0.277354	compilers. If exception handling is
-0.041695	error handling Exception handling is
-0.041695	exception handling Exception handling is
-0.343069	parent class data members is
-0.348178	This large memory model is
-0.182611	a known CPU model is
-0.182611	a particular CPU model is
-0.275238	for each processor model is
-0.296909	but this memory block is
-0.543608	new bigger memory block is
-0.325197	in the function name is
-0.200017	check that the conversion is
-0.200017	this example, the conversion is
-0.193861	read-only data. The disadvantage is
-0.193861	program starts. The disadvantage is
-0.193861	is avoided. The disadvantage is
-0.271761	different types. A disadvantage is
-0.271761	program slower. Another disadvantage is
-0.339229	an integer to zero is
-0.207692	so fast that what is
-0.297593	C++ based on what is
-0.207692	to the reader what is
-0.220186	value of the parameter is
-0.237423	significant if a parameter is
-0.293664	than a function parameter is
-0.018147	if the size parameter is
-0.246305	because the template parameter is
-0.220063	< 0. The division is
-0.330449	the microprocessor. Integer division is
-0.561226	returned pointer or reference is
-0.220430	points to. A reference is
-0.330027	cycle if the source is
-0.220920	task switching. This cost is
-0.294534	231. This extra cost is
-0.203595	clock cycles. The reason is
-0.203595	performs well. The reason is
-0.203595	Pentium 4. The reason is
-0.203595	have tested. The reason is
-0.193194	the fact that n is
-0.085942	index than when n is
-0.085942	more serious when n is
-0.193194	places back, where n is
-0.193434	length of the string is
-0.038565	every time a string is
-0.193434	length of each string is
-0.338979	variable. The register keyword is
-0.206051	if the inline keyword is
-0.206051	Therefore, the __fastcall keyword is
-0.365272	code and table lookup is
-0.279842	132. Unfortunately, table lookup is
-0.231786	first operand of && is
-0.333020	some of the difference is
-0.293673	two functions. The difference is
-0.375406	before the preceding addition is
-0.206051	return route. This mechanism is
-0.303042	virtual function dispatch mechanism is
-0.477342	The stack unwinding mechanism is
-0.231918	first operand of || is
-0.288057	this kind of optimizations is
-0.286915	a specific graphics framework is
-0.142093	mechanism of static linking is
-0.142093	clear that static linking is
-0.142093	not if static linking is
-0.142093	file when static linking is
-0.270880	application if dynamic linking is
-0.083829	speed of modern microprocessors is
-0.083829	core of modern microprocessors is
-0.203667	compatibility with older microprocessors is
-0.021959	when the work load is
-0.289121	function which we assume is
-0.335123	to floating point numbers is
-0.335123	and floating point numbers is
-0.216131	The choice of platform is
-0.216131	to a different platform is
-0.231919	is called, a dispatch is
-0.421530	that the user interface is
-0.297697	simplest possible user interface is
-0.231166	AVX only when AVX2 is
-0.330664	The log on process is
-0.199914	slow GOT lookup process is
-0.199914	why this delaying process is
-0.200504	pointed to by r is
-0.200504	the sequence, where r is
-0.200504	not clear whether r is
-0.251049	The type of storage is
-0.251049	of binary data storage is
-0.269584	each thread. Thread-local storage is
-0.089011	variables into a union is
-0.089011	} Using a union is
-0.285220	7.24 Unions A union is
-0.231444	will recognize that 10 is
-0.170932	The indirect function feature is
-0.218732	in 2010. This feature is
-0.024213	pointer, but this feature is
-0.024213	symbols, but this feature is
-0.049857	prefetching so this feature is
-0.198049	and destructors A constructor is
-0.385567	initialization. A copy constructor is
-0.267567	constructor. A default constructor is
-0.232759	of array element a[i] is
-0.231586	macro declared with #define is
-0.232591	the number of points is
-0.231920	switches A context switch is
-0.490100	being out of range is
-0.138543	the method used here is
-0.138543	of using static here is
-0.138543	because the speed here is
-0.138543	etc. The problem here is
-0.063770	the const_cast operator here is
-0.063770	The ?: operator here is
-0.138543	languages. www.yeppp.info And here is
-0.317409	that the CPU core is
-0.438077	X The code section is
-0.284844	processes. The data section is
-0.310884	for level-1 cache contentions is
-0.213576	consequence of such contentions is
-0.181545	automatically when the computer is
-0.081355	or until the computer is
-0.081355	file until the computer is
-0.179371	cycle on one computer is
-0.230793	of different type conversions is
-0.566157	to prevent such errors is
-0.231546	is faster when columns is
-0.065923	add i to p is
-0.065923	is added to p is
-0.188411	is clear that p is
-0.143674	pointed to by p is
-0.143674	be read before p is
-0.143674	equally fast whether p is
-0.097914	sense that the syntax is
-0.097914	efficient, but the syntax is
-0.152179	this. Unfortunately, the syntax is
-0.241947	pointers are: The syntax is
-0.352586	performance of the STL is
-0.244864	purposes. However, the STL is
-0.193478	page 153. A profiler is
-0.193478	their CPUs. Intel's profiler is
-0.193478	called VTune; AMD's profiler is
-0.224347	error if the index is
-0.175958	// Check that index is
-0.187674	case the array index is
-0.280568	if an array index is
-0.262378	disadvantage of function inlining is
-0.193623	allocation and function inlining is
-0.219415	problems if the network is
-0.156692	situation where the network is
-0.432514	constant (2n / b) is
-0.231321	if such a response is
-0.230519	the number of lines is
-0.043734	if the same operation is
-0.043734	where the same operation is
-0.281900	method of bounds checking is
-0.291752	memory fragmentation. Bounds checking is
-0.279402	situation where a task is
-0.208112	for a given task is
-0.324029	clock frequency is limited is
-0.230420	factor. A little math is
-0.229562	the network or database is
-0.285458	a table of constants is
-0.230849	lookup[b]; If a bool is
-0.410312	The standard stack frame is
-0.358917	process, and the destructor is
-0.188061	object owns. A destructor is
-0.188061	necessary. A virtual destructor is
-0.188646	the question when efficiency is
-0.188646	priority of program efficiency is
-0.188646	implemented. The highest efficiency is
-0.208112	The choice of algorithm is
-0.208112	only. The following algorithm is
-0.230306	way to handle strings is
-0.084258	representation of the exponent is
-0.019582	function when the exponent is
-0.066105	negative numbers. The exponent is
-0.066105	binary digits. The exponent is
-0.091530	innermost loop. Another possibility is
-0.091530	a GOT. Another possibility is
-0.230306	any of these conditions is
-0.229382	are optimal. Best-case testing is
-0.304578	so that the alignment is
-0.230768	Unfortunately, the cross-platform compatibility is
-0.229152	twice because the macro is
-0.106507	microprocessors when an operand is
-0.024279	prediction. If one operand is
-0.024279	first. If one operand is
-0.005944	and the second operand is
-0.002962	then the second operand is
-0.005944	whether the second operand is
-0.216127	the data. The effect is
-0.216127	or post-increment. The effect is
-0.185282	reason why this effect is
-0.180496	used set of containers is
-0.180496	using ready made containers is
-0.229422	of the STL containers is
-0.521572	with the same priority is
-0.182356	if the clock frequency is
-0.182356	when the clock frequency is
-0.265341	load. The clock frequency is
-0.347682	the CPU clock frequency is
-0.180272	iterations. Here the iteration is
-0.246778	error for each iteration is
-0.180272	before the preceding iteration is
-0.157955	& N-1)==0 if N is
-0.157955	1-bit removed. If N is
-0.157955	for pow(x,N) where N is
-0.157955	// General case, N is
-0.229672	The most important thing is
-0.228674	WriteFile if the handle is
-0.327680	memory called the heap is
-0.430412	the #pragma vector nontemporal is
-0.313744	Violation of array bounds is
-0.809791	that can be improved is
-0.202611	or structure. The situation is
-0.470938	the worst case situation is
-0.333024	checking). An error message is
-0.136005	extra time. The delay is
-0.136005	matrix line. The delay is
-0.175869	250 ms. This delay is
-0.449921	eliminated if the condition is
-0.175628	because testing a condition is
-0.298160	efficient loop control condition is
-0.175628	between the different cores is
-0.140073	with multiple CPU cores is
-0.140073	use multiple CPU cores is
-0.201076	ecx+eax*4. The result ebx is
-0.201076	assembly code. Register ebx is
-0.201843	The address of list[i] is
-0.201843	because the expression list[i] is
-0.318540	well. A switch statements is
-0.227433	polymorphous class? This chapter is
-0.427611	The branch target buffer is
-0.302616	much. Excessive loop unrolling is
-0.198669	number of times CriticalFunction is
-0.198669	depends on whether CriticalFunction is
-0.283667	integer, and the fraction is
-0.229227	when the row length is
-0.368759	sign bit of f is
-0.066295	else { // f is
-0.066295	- 30 // f is
-0.144564	first sum, then f is
-0.283328	so the misprediction penalty is
-0.198111	} The function F1 is
-0.198111	F1 without returning. F1 is
-0.170120	profiler. A simple alternative is
-0.217826	become fragmented. An alternative is
-0.170120	load. A light-weight alternative is
-0.252019	that the critical stride is
-0.252019	because the critical stride is
-0.286002	each. The critical stride is
-0.228628	parameter transfer for 'this' is
-0.227433	of elements per row is
-0.312228	however, where template metaprogramming is
-0.302971	or a hash map is
-0.226277	of objects they contain is
-0.368055	A programmable logic device is
-0.138988	overhead of parameter transfer is
-0.163195	64-bit mode Parameter transfer is
-0.262701	of big memory blocks is
-0.244452	or writing big blocks is
-0.106893	used in example 15.1b is
-0.106893	branch in example 15.1b is
-0.259387	branch of the latter is
-0.074093	64-bit systems. The latter is
-0.074093	bit mode. The latter is
-0.327224	effect of dependency chains is
-0.226277	of a particular brand is
-0.656129	matrix[r][c] below the diagonal is
-0.193899	algorithms for different purposes is
-0.193899	audience for educational purposes is
-0.227270	for the user. Time is
-0.086456	a, b; // everything is
-0.086456	* 1.2; // everything is
-0.226938	order calculation capabilities. Here is
-0.227933	Use OpenMP directives. OpenMP is
-0.035265	of a clock cycle is
-0.318298	of possible pointer aliasing is
-0.211061	from www.agner.org/optimize/testp.zip. This tool is
-0.074326	language and development tool is
-0.074326	One popular development tool is
-0.226322	of memset and memcpy is
-0.016609	cases where the parallelism is
-0.155008	in parallel. Fine-grained parallelism is
-0.189373	than if because #if is
-0.189373	same source code. #if is
-0.189032	unit, but this unit is
-0.189032	below. The time unit is
-0.225950	sequence where each label is
-0.225207	the number of iterations is
-0.300058	from a branch misprediction is
-0.323406	principle of lazy binding is
-0.225207	alternative. The theoretical background is
-0.323450	chains. A dependency chain is
-0.225578	disadvantage of complicated algorithms is
-0.225578	number of possible inputs is
-0.225950	software packages and who is
-0.279786	vacant then the DLL is
-0.226322	amount of memory required is
-0.225950	of the keyword volatile is
-0.414693	penalty of cache misses is
-0.317537	is terminated. The purpose is
-0.405803	object compiled without -fpic is
-0.224258	of C++. Yet, D is
-0.224258	Here, each value xn is
-0.492312	using new and delete is
-0.182853	in the code itself is
-0.182853	on the device itself is
-0.563769	number of context switches is
-0.318334	of sum. The trick is
-0.249787	if the loop body is
-0.182853	or if its body is
-0.312852	that the compiler generates is
-0.223835	error without using exceptions is
-0.459004	33% when the CPUID is
-0.224682	series of five manuals is
-0.278231	if the type T is
-0.299497	of its binary representation is
-0.278710	page 73. Runtime polymorphism is
-0.278231	faster when the factor is
-0.224682	critical innermost loop. log is
-0.222996	this manually. This principle is
-0.078468	the first time Func is
-0.078468	re-calculated every time Func is
-0.037544	first thing we notice is
-0.037544	second thing we notice is
-0.222504	not recommended if portability is
-0.419049	see in the debugger is
-0.222013	if the image base is
-0.222504	step where the compilation is
-0.222504	funny looking name ?Func@@YAXQAHAAH@Z is
-0.223982	The preprocessing macro INSTRSET is
-0.219475	quite as versatile. Fortran is
-0.220061	The order of inheritance is
-0.219475	to disk. Memory swapping is
-0.220648	may report that memset is
-0.059529	if your optimization effort is
-0.059529	If your optimization effort is
-0.618130	folding and constant propagation is
-0.273956	complicated reductions. Algebraic reduction is
-0.118398	the size of abc is
-0.273292	scan instructions. My recommendation is
-0.219475	install a program package is
-0.273292	of handling cleanup jobs is
-0.219475	each value of n! is
-0.357312	popular version of Basic is
-0.301445	with new or malloc is
-0.219475	code in example 15.1c is
-0.219475	A problem with macros is
-0.220648	Which solution you prefer is
-0.008001	value of the divisor is
-0.008001	// Faster if divisor is
-0.422401	of the time slices is
-0.216422	7.4 Enums An enum is
-0.215695	see if our estimate is
-0.087531	in the way m is
-0.041564	the template function, m is
-0.041564	the simple function, m is
-0.312598	because partial template specialization is
-0.215695	also situations where pre-increment is
-0.217150	allocated object, and ownership is
-0.067269	to assume that *p+2 is
-0.067269	from assuming that *p+2 is
-0.216422	example 14.8 and 14.9 is
-0.215695	if a certain modification is
-0.216422	infinite loop if powN is
-0.067269	speed if the bottleneck is
-0.067269	calls. If the bottleneck is
-0.216422	The static data area is
-0.269838	been wasted. The consequence is
-0.216422	make such an assumption is
-0.216422	compatible across compilers. Fastcall is
-0.215695	this address. Step (1) is
-0.215695	space explicitly when alloca is
-0.288351	called when the original is
-0.146902	measuring performance by unit-testing is
-0.146902	software development. This unit-testing is
-0.215695	of the desired interval is
-0.269838	Certainly not! 250 μs is
-0.297580	matrix line (in bytes) is
-0.215695	particular application. If hyperthreading is
-0.146902	the floating point format is
-0.260466	The intermediate file format is
-0.101894	page 71). The conclusion is
-0.101894	make utility. The conclusion is
-0.366682	separate layers of abstraction is
-0.209468	lot of data manipulation is
-0.023321	faster if the dividend is
-0.209468	The table of coefficients is
-0.209468	statement with sequential labels is
-0.209468	code. The main focus is
-0.209468	when the function longjmp is
-0.483744	standard template library (STL) is
-0.263060	i.e. each element matrix[r][c] is
-0.209468	example explains why bookkeeping is
-0.209468	eight logical processors. Hyperthreading is
-0.210423	example so that a+b is
-0.209468	conclusion to this argument is
-0.210423	of the kind: "what is
-0.209468	old. The CPU market is
-0.343625	loop in example 11.3 is
-0.210423	example, x = *(p++) is
-0.121122	better performing software product is
-0.121122	(MFC). A competing product is
-0.210423	the number of allocations is
-0.209468	100 / jl $B1$2 is
-0.197275	A more realistic goal is
-0.197275	the stack when CriticalInnerFunction is
-0.197275	"Hello 2" Here CParent is
-0.197275	of 2 then N&(N-1) is
-0.197275	element. The integer comparison is
-0.197275	even a linear search, is
-0.248246	that the memory footprint is
-0.197275	a computer. The proxy is
-0.197275	that doesn't compromise safety is
-0.197275	Another alternative worth considering is
-0.197275	than an hour. Neither is
-0.197275	as the .exe file, is
-0.197275	critical function. The branching is
-0.248246	This so-called symbol interposition is
-0.197275	FactorialTable in example 14.1c is
-0.197275	the address of matrix[j][0] is
-0.073744	at the time MemberPointer is
-0.073744	of c1 before MemberPointer is
-0.248246	AND'ed with all 1's is
-0.197275	uses SSE3. // (This is
-0.197275	Tuesday, Wednesday or Friday is
-0.197275	etc. Optimizing database queries is
-0.197275	to identify performance bottlenecks is
-0.248246	member of a bitfield is
-0.197275	The use of coprocessors is
-0.248246	copy constructor, if any, is
-0.248246	the loop. Example 8.21 is
-0.248246	bug". The FDIV bug is
-0.162624	necessary when no attempt is
-0.162624	considerably. Another serious burden is
-0.162624	leave them enabled (there is
-0.162624	combined with the LLVM is
-0.162624	while he or she is
-0.162624	Friday) in example 14.7b is
-0.162624	The value of cc[i]+2 is
-0.162624	or tiling. This technique is
-0.162624	number of different targets is
-0.162624	specification. The empty throw()specification is
-0.162624	sign bit // u.d is
-0.162624	by a constant: Unsigned is
-0.162624	assume that model N-1 is
-0.162624	The value of i&15 is
-0.162624	even though the CPU-type is
-0.162624	is the sign, eee is
-0.162624	example 12.4b and 12.4c is
-0.162624	integer representation of &list[100] is
-0.162624	= (int)d; // Truncation is
-0.162624	don't support processor X" is
-0.162624	as is often seen, is
-0.162624	in case memory re-allocation is
-0.162624	address of it (&ArraySize) is
-0.162624	a shift operation. x*8 is
-0.162624	the loop count (ArraySize) is
-0.162624	example, f(x) or g(x) is
-0.162624	the code, which supposedly is
-0.162624	A for-loop or while-loop is
-0.162624	computer games and animations is
-0.162624	... } Here, log(2.0) is
-0.162624	file stdint.h or inttypes.h is
-0.162624	example, x = array[i++] is
-0.162624	because the memory bus is
-0.162624	to C1::Disp() or C2::Disp() is
-0.162624	model is over. Virtualization is
-0.162624	by *p or p->member is
-0.162624	of Mathcad (v. 15.0) is
-0.162624	+ b * 1.5f; is
-0.162624	A limited "express" edition is
-0.162624	do the divisions (Division is
-0.162624	this mask, and bb[i]*cc[i] is
-0.162624	unsigned integer type size_t is
-0.162624	function at runtime. Polymorphism is
-0.162624	coordination with other subtasks is
-0.162624	the exponent, and fffff is
-0.162624	The most important remedy is
-0.162624	(see above, page 87) is
-0.162624	The instruction add eax,1 is
-0.162624	the stack. This behaviour is
-0.162624	well-known languages. My preference is
-0.162624	square. // This triangle is
-0.162624	= lrint(d); // Rounding is
-0.162624	the loop. The loop-branch is
-0.162624	calculations unless the strictness is
-0.162624	eliminate common sub-expressions. Why is
-0.162624	(with new or malloc) is
-0.162624	Public distribution and mirroring is
-0.162624	assuming that the occurrence is
-0.162624	instruction set (or higher) is
-0.162624	attacks and other abuse is
-0.162624	of memory. One kilobyte is
-0.162624	syntax in example 7.43b is
-0.162624	actual load address. Relocation is
-0.162624	code in example 14.21 is
-0.162624	slow. If the granularity is
-0.162624	is 1. This '1' is
-0.421393	object pointed to is a
-0.682528	a value that is a
-0.480451	An expression that is a
-0.316308	constant divisor that is a
-0.518058	not if it is a
-0.518058	see if it is a
-0.524623	variable because it is a
-0.485826	virtual member function is a
-0.316081	A frame function is a
-0.316081	A pure function is a
-0.578036	A leaf function is a
-0.335018	it. Complicated code is a
-0.406134	extra time. This is a
-0.273152	the vector. This is a
-0.273152	Visual Studio This is a
-0.273152	by 16. This is a
-0.273152	stamp counter. This is a
-0.273152	Digital Mars This is a
-0.273152	filled up. This is a
-0.273152	invalid pointers. This is a
-0.273152	= 32. This is a
-0.273152	with alloca. This is a
-0.273152	of if. This is a
-0.273152	plus i*sizeof(S1). This is a
-0.273152	as ((a+b)+c)+d. This is a
-0.273152	xor eax,eax. This is a
-0.258985	well. This compiler is a
-0.559363	The Intel compiler is a
-0.258985	The Gnu compiler is a
-0.258985	The Clang compiler is a
-0.336003	compiler that this is a
-0.215069	method you use is a
-0.348246	on x. It is a
-0.297594	distance in memory is a
-0.297594	on input data is a
-0.331686	start up, which is a
-0.321044	data A cache is a
-0.346243	in this example is a
-0.078492	this. My example is a
-0.078492	example. My example is a
-0.237938	if the size is a
-0.237938	where cache size is a
-0.237938	sure its size is a
-0.260114	efficient when b is a
-0.260114	however, when b is a
-0.377099	program if there is a
-0.377099	i.e. if there is a
-0.447662	time then there is a
-0.317198	memory then there is a
-0.317198	functions then there is a
-0.105079	running. If there is a
-0.105079	calculate. If there is a
-0.321996	precision, but there is a
-0.067157	default unless there is a
-0.067157	object, unless there is a
-0.067157	manually unless there is a
-0.243984	cases, however, there is a
-0.243984	caches. Typically, there is a
-0.237663	Development process There is a
-0.237663	vector size. There is a
-0.237663	becomes inefficient. There is a
-0.237663	try block. There is a
-0.237663	time- consuming. There is a
-0.237663	AVX support. There is a
-0.237663	application programmer. There is a
-0.237663	and Gnu. There is a
-0.237663	12.10 Conclusion There is a
-0.237663	be recycled? There is a
-0.313012	of each array is a
-0.494338	to do so is a
-0.287611	memory. A register is a
-0.527331	The powN template is a
-0.303696	advantageous because registers is a
-0.386982	if memory access is a
-0.196019	The simplest case is a
-0.196019	The worst case is a
-0.331114	if the constant is a
-0.219555	to the stack is a
-0.292918	below. The stack is a
-0.442020	of programming language is a
-0.021737	3.1 How much is a
-0.239906	of the matrix is a
-0.081384	in a matrix is a
-0.330055	the optimal solution is a
-0.555658	A linked list is a
-0.215069	which quite likely is a
-0.649706	class or structure is a
-0.303696	performance monitor counter is a
-0.312441	addresses. If caching is a
-0.386982	than when n is a
-0.287611	clear whether r is a
-0.296492	Unions A union is a
-0.215069	A context switch is a
-0.331088	?: operator here is a
-0.251568	www.yeppp.info And here is a
-0.215069	faster when columns is a
-0.235986	clear that p is a
-0.235986	fast whether p is a
-0.607408	when the exponent is a
-0.287611	for each iteration is a
-0.113755	N-1)==0 if N is a
-0.113755	removed. If N is a
-0.113755	pow(x,N) where N is a
-0.268308	worst case situation is a
-0.287611	loop control condition is a
-0.215069	A switch statements is a
-0.062315	the critical stride is a
-0.215069	elements per row is a
-0.215069	programmable logic device is a
-0.215069	the user. Time is a
-0.215069	calculation capabilities. Here is a
-0.215069	OpenMP directives. OpenMP is a
-0.215069	A dependency chain is a
-0.094355	the code itself is a
-0.094355	the device itself is a
-0.215069	the type T is a
-0.215069	when the factor is a
-0.215069	innermost loop. log is a
-0.215069	disk. Memory swapping is a
-0.215069	reductions. Algebraic reduction is a
-0.494338	size of abc is a
-0.215069	solution you prefer is a
-0.041344	Faster if divisor is a
-0.044620	assume that *p+2 is a
-0.044620	assuming that *p+2 is a
-0.215069	the desired interval is a
-0.215069	line (in bytes) is a
-0.215069	layers of abstraction is a
-0.215069	template library (STL) is a
-0.215069	2" Here CParent is a
-0.215069	The FDIV bug is a
-0.215069	with the LLVM is a
-0.601812	| 0 = a a
-0.328927	|| false = a a
-0.328927	a ^0 = a a
-0.236739	a & a= a a
-0.264184	whenever a function of a
-0.487419	a linear function of a
-0.267605	non-AVX code because of a
-0.267605	< b because of a
-0.267605	than intended because of a
-0.634181	the member functions of a
-0.269199	the newest CPU of a
-0.025833	critical innermost loop of a
-0.114067	the message loop of a
-0.215857	declaring an integer of a
-0.311792	at the example of a
-0.454587	of the size of a
-0.645727	if the size of a
-0.454587	increase the size of a
-0.215857	to a pointer of a
-0.288704	as an object of a
-0.288704	accessing an object of a
-0.176646	a new object of a
-0.079403	declared. An object of a
-0.079403	Inheritance An object of a
-0.159386	time which version of a
-0.159386	testing which version of a
-0.159386	certainty which version of a
-0.970359	that the value of a
-0.509454	calculate the value of a
-0.315378	void. Returning objects of a
-0.320812	the overall performance of a
-0.302964	accessing a member of a
-0.302964	Accessing a member of a
-0.230478	a data member of a
-0.220192	adds the elements of a
-0.220192	on all elements of a
-0.575190	that the address of a
-0.328190	if each bit of a
-0.529257	be moved out of a
-0.120796	for jumping out of a
-0.120796	after jumping out of a
-0.333016	parameter is part of a
-0.253172	is not part of a
-0.253172	cases: If part of a
-0.190875	the critical part of a
-0.253317	a critical part of a
-0.632149	most critical part of a
-0.253172	you access part of a
-0.129181	example 32 bits of a
-0.129181	accessing 32 bits of a
-0.209216	upper 32 bits of a
-0.318311	The return type of a
-0.498248	two different versions of a
-0.443689	making multiple versions of a
-0.381941	make two versions of a
-0.471340	Testing the speed of a
-0.305119	A positive overflow of a
-0.269199	if the uses of a
-0.338190	the full advantage of a
-0.288542	the logic structure of a
-0.370192	that the values of a
-0.258541	make the values of a
-0.044753	test the sign of a
-0.044753	change the sign of a
-0.251578	that are members of a
-0.184388	The data members of a
-0.082481	class. Data members of a
-0.082481	together. Data members of a
-0.269199	during the development of a
-0.323860	have the disadvantage of a
-0.424883	in the end of a
-0.261181	point to end of a
-0.752857	between different parts of a
-0.321826	defining integer types of a
-0.304911	inline function instead of a
-0.304911	or typedef instead of a
-0.269199	off all optimizations of a
-0.304662	The live range of a
-0.215857	all the modules of a
-0.215857	updating. The change of a
-0.327406	classes. Each instance of a
-0.269199	the assembly output of a
-0.488264	Loops The efficiency of a
-0.269199	is a sum of a
-0.454640	stores the offset of a
-0.163468	to the length of a
-0.163468	then the length of a
-0.163468	adding the length of a
-0.058574	doubled. The length of a
-0.058574	started. The length of a
-0.553776	with the beginning of a
-0.215857	class. The transfer of a
-0.096288	integer conversion Conversion of a
-0.096288	float conversion Conversion of a
-0.352350	because the body of a
-0.269199	consuming. A collection of a
-0.501344	make the scope of a
-0.146792	in the form of a
-0.215857	often inefficient. Objects of a
-0.215857	these calculations. Division of a
-0.352350	as the latency of a
-0.402257	into small pieces of a
-0.352350	A complete redesign of a
-0.288542	is the combination of a
-0.215857	the disassembly window of a
-0.215857	overcome the dangers of a
-0.269199	is slow. Value of a
-0.269199	c) The creation of a
-0.094653	Example 12.8a. Sum of a
-0.094653	Example 12.8b. Sum of a
-0.215857	the self-explaining menus of a
-0.215857	copying. The benefits of a
-0.215857	because the insertion of a
-0.215857	the mathematical notion of a
-0.215857	an extra layer of a
-0.420655	must convert it to a
-0.510905	tune the code to a
-0.335838	sure to point to a
-0.271588	Converting an integer to a
-0.201473	the unsigned integer to a
-0.201473	a signed integer to a
-0.224367	to one class to a
-0.274915	is a pointer to a
-0.464088	to a pointer to a
-0.274915	when a pointer to a
-0.274915	has a pointer to a
-0.274915	make a pointer to a
-0.233394	its 'this' pointer to a
-0.298621	such an object to a
-0.224367	transferring composite objects to a
-0.305217	across a call to a
-0.305217	a function call to a
-0.229920	driver. A call to a
-0.229920	because each call to a
-0.431561	a single call to a
-0.333791	Sequential forward access to a
-0.224367	write the file to a
-0.326810	"function". Multiple calls to a
-0.097853	fix the thread to a
-0.097853	lock a thread to a
-0.325150	passed as parameters to a
-0.278834	such an application to a
-0.321673	pointer or reference to a
-0.221429	a relative reference to a
-0.278834	differently. The link to a
-0.480875	which initially points to a
-0.224367	add unused columns to a
-0.312952	Reading or writing to a
-0.400868	may be changed to a
-0.278834	the main executable to a
-0.411046	parameter is copied to a
-0.364045	template is similar to a
-0.439693	integer is added to a
-0.248805	can be added to a
-0.423395	can be applied to a
-0.016935	static, when applied to a
-0.068243	integer is converted to a
-0.068243	class is converted to a
-0.149503	can be converted to a
-0.278834	the microprocessor jump to a
-0.364045	by the linker to a
-0.278834	operator is equivalent to a
-0.224367	can be updated to a
-0.224367	system resources Writes to a
-0.046176	likely to lead to a
-0.014853	This can lead to a
-0.014853	insight can lead to a
-0.014853	bottlenecks can lead to a
-0.224367	by the loader to a
-0.298621	lazy binding leads to a
-0.097853	x is type-casted to a
-0.097853	pointers are type-casted to a
-0.224367	should never respond to a
-0.224367	allowed. Non-public distribution to a
-0.224367	not always comparable to a
-0.224367	n < 223 to a
-0.224367	-a > -b to a
-0.224367	values are confined to a
-0.339065	for all functions and a
-0.398816	a parent class and a
-0.231589	a debug version and a
-0.337441	with existing systems and a
-0.374038	for the user and a
-0.309726	or VIA processor and a
-0.456934	C++ programming language and a
-0.374038	own memory block and a
-0.231589	a function parameter and a
-0.231589	SSE2 instruction set, and a
-0.231589	floating point addition, and a
-0.287030	of the object, and a
-0.287030	language, e.g. C++, and a
-0.287030	during program development, and a
-0.287030	instructions, multiple cores, and a
-0.231589	a well-defined functionality and a
-0.287030	a remote database, and a
-0.231589	linkage table (PLT) and a
-0.231589	- 64 Kbytes and a
-0.231589	background processes running, and a
-0.231589	with heavy traffic and a
-0.198279	to matrix a in a
-0.107029	a function or in a
-0.107029	system code or in a
-0.107029	carry flag or in a
-0.277371	and store it in a
-0.041652	to a function in a
-0.087727	as a function in a
-0.087727	call a function in a
-0.087727	calls a function in a
-0.087727	Whenever a function in a
-0.249376	are dealing with in a
-0.232332	of the code in a
-0.297969	organize the code in a
-0.238468	hyperthreading or not in a
-0.238468	GB, but not in a
-0.193901	dynamic library than in a
-0.193901	times less than in a
-0.071241	memory rather than in a
-0.193901	logic device than in a
-0.087921	cannot avoid this in a
-0.087921	implemented like this in a
-0.249376	the main memory in a
-0.344094	of all data in a
-0.262371	of received data in a
-0.350701	run the program in a
-0.328457	write the same in a
-0.238468	all suitable functions in a
-0.238468	and internal functions in a
-0.320827	that a loop in a
-0.283193	intrinsic functions, but in a
-0.489391	cache is used in a
-0.487942	functions are used in a
-0.283822	is only used in a
-0.198279	is number one in a
-0.380755	of the integer in a
-0.462936	a and b in a
-0.249376	the desired version in a
-0.572063	movements of objects in a
-0.269543	pointer. A variable in a
-0.269543	a public variable in a
-0.198279	or does so in a
-0.197995	often used variables in a
-0.197995	that most variables in a
-0.197995	have public variables in a
-0.197995	class. Storing variables in a
-0.277371	with making software in a
-0.263904	number of elements in a
-0.241936	requests for elements in a
-0.267837	be executed faster in a
-0.498532	variable is stored in a
-0.332193	to be stored in a
-0.228869	may be stored in a
-0.099529	cannot be stored in a
-0.458934	variables are stored in a
-0.325516	numbers are stored in a
-0.222900	a pointer stored in a
-0.222900	is never stored in a
-0.149497	CriticalFunction is called in a
-0.149497	are actually called in a
-0.461657	tasks. For example, in a
-0.249376	is the first in a
-0.249376	put file access in a
-0.267837	the entire file in a
-0.181886	the same bits in a
-0.181886	set multiple bits in a
-0.181886	writing small bits in a
-0.282940	to look up in a
-0.288902	used many times in a
-0.099514	data are accessed in a
-0.099514	objects are accessed in a
-0.201659	elements are accessed in a
-0.099514	rows are accessed in a
-0.106651	memory or accessed in a
-0.024309	Are objects accessed in a
-0.041739	treats non-Intel CPUs in a
-0.041739	treat non-Intel CPUs in a
-0.277371	redo the calculations in a
-0.163826	return the result in a
-0.163826	stores the result in a
-0.362266	of unused bytes in a
-0.288902	between different threads in a
-0.282842	list. Each element in a
-0.519082	numerically largest element in a
-0.183901	is all done in a
-0.183901	is usually done in a
-0.068347	redo the calculation in a
-0.068347	Re-do the calculation in a
-0.080221	version is implemented in a
-0.635408	can be implemented in a
-0.198279	bits is likely in a
-0.283193	process should run in a
-0.369047	store the values in a
-0.267837	store application-specific information in a
-0.249376	may be fast in a
-0.198279	calls and branches in a
-0.198279	priority level, typically in a
-0.198279	is more complicated in a
-0.198279	puts the programmer in a
-0.198279	also a lookup in a
-0.249376	objects come last in a
-0.193176	they are declared in a
-0.261854	and objects declared in a
-0.198279	piece by piece in a
-0.198279	switches is smaller in a
-0.198279	is provoked here in a
-0.015918	number of columns in a
-0.283193	to 16 lines in a
-0.417877	store all strings in a
-0.221268	to store strings in a
-0.149497	testing multiple conditions in a
-0.149497	from error conditions in a
-0.198279	between different tasks in a
-0.198279	possibly be obtained in a
-0.198279	is overwritten, possibly in a
-0.249376	distance between rows in a
-0.462936	an error message in a
-0.043064	function is defined in a
-0.198279	of the sequence in a
-0.249376	Don't put something in a
-0.198279	would be invalid in a
-0.198279	that is organized in a
-0.198279	by transferring 'this' in a
-0.198279	difficult to implement in a
-0.392462	code is included in a
-0.198279	rules of algebra in a
-0.267837	calls are saved in a
-0.249376	be completely contained in a
-0.087921	can be coded in a
-0.087921	that are coded in a
-0.198279	several standard PC's in a
-0.198279	should be handled in a
-0.198279	use as pivot in a
-0.328457	then be placed in a
-0.198279	should also proceed in a
-0.198279	because a typo in a
-0.198279	efficient than investing in a
-0.198279	or completely absent in a
-0.198279	can be programmed in a
-0.198279	under test finishes in a
-0.198279	rows are indexed in a
-0.198279	commas and semicolons in a
-0.198279	should be scheduled in a
-0.198279	clock cycles. Calculations in a
-0.225681	to use that for a
-0.328266	the right function for a
-0.438588	lines to use for a
-0.365859	allocation of memory for a
-0.280324	C function library for a
-0.365859	A code branch for a
-0.280324	See page 16 for a
-0.414361	a header file for a
-0.666642	code is compiled for a
-0.225681	cycles per element for a
-0.326350	function, each optimized for a
-0.390161	choosing a container for a
-0.225681	the best implementation for a
-0.514752	disable exception handling for a
-0.225681	virtual table lookup for a
-0.225681	of hardware platform for a
-0.493101	registers when compiling for a
-0.225681	See page 3 for a
-0.514752	is big enough for a
-0.390161	CPU is designed for a
-0.225681	program. The inputs for a
-0.858901	has to wait for a
-0.280324	shows the principle for a
-0.225681	See page 87 for a
-0.280324	See page 90 for a
-0.225681	expect a directive for a
-0.514752	same memory area for a
-0.225681	See page 49 for a
-0.225681	faster, except perhaps for a
-0.225681	following: 130 Compile for a
-0.280324	optimized and fine-tuned for a
-0.225681	can be wired for a
-0.993073	The reason is that a
-0.537842	study the code that a
-0.904505	tell the compiler that a
-0.318154	tell these compilers that a
-0.226821	Assume, for example, that a
-0.549680	can make sure that a
-0.529461	keyword makes sure that a
-0.367435	are so small that a
-0.229030	it is certain that a
-0.229030	cannot be certain that a
-0.371432	ways. This means that a
-0.371432	28. This means that a
-0.281617	page 16) shows that a
-0.246245	compiler cannot know that a
-0.246245	who would know that a
-0.329328	This may require that a
-0.155118	exception-safe code Assume that a
-0.155118	memory access. Assume that a
-0.155118	in general. Assume that a
-0.660629	out the possibility that a
-0.281617	can often happen that a
-0.281617	volatile keyword specifies that a
-0.226821	const keyword tells that a
-0.226821	is it unusual that a
-0.281617	usage convention says that a
-0.226821	early planning stage that a
-0.226821	Some developers feel that a
-0.436927	want this to be a
-0.543135	less likely to be a
-0.549432	But it can be a
-0.569273	times. This can be a
-0.339361	mixed implementation can be a
-0.339361	copy constructor can be a
-0.339361	program loading can be a
-0.339361	same chip can be a
-0.533831	But it may be a
-0.466991	then there may be a
-0.490851	inheritance. There may be a
-0.317111	efficient solution may be a
-0.317111	clock frequency may be a
-0.317111	just-in-time compilation may be a
-0.099804	will not only be a
-0.099804	would not only be a
-0.343416	template parameter should be a
-0.510916	parameter can also be a
-0.397307	There may also be a
-0.278553	memory may even be a
-0.224119	size should always be a
-0.335311	interface framework must be a
-0.318196	examples will therefore be a
-0.229773	dimension may preferably be a
-0.291778	function should preferably be a
-0.291778	object should preferably be a
-0.589768	objects should preferably be a
-0.291778	count should preferably be a
-0.363704	should of course be a
-0.303755	then this might be a
-0.298327	the function could be a
-0.504925	Note that there are a
-0.418569	integers. But there are a
-0.475514	automatically. However, there are a
-0.330965	size, etc. There are a
-0.330965	overflow check. There are a
-0.330965	= -abs(x);. There are a
-0.101347	one register. Registers are a
-0.101347	or reference. Registers are a
-0.055931	else { a = a
-0.019666	(b) { a = a
-0.271899	a && a = a
-0.271899	a | a = a
-0.312571	x, y; x = a
-0.348554	0, b; b = a
-0.266062	= a; b = a
-0.266062	parabola (2.0f); b = a
-0.034518	a | 0 = a
-0.241952	changed to c = a
-0.241952	fast division c = a
-0.049044	c, d; c = a
-0.244988	the expression y = a
-0.212852	d, y; y = a
-0.132203	100, y; y = a
-0.132203	1.23456, y; y = a
-0.114972	& b; d = a
-0.114972	&& b; d = a
-0.266341	a || false = a
-0.348891	stack ; ecx = a
-0.158401	a*(b+c) - -(-a) = a
-0.158401	- n.a. -(-a) = a
-0.093695	0 - a*1 = a
-0.093695	- n.a. a*1 = a
-0.093695	0 - a+0 = a
-0.093695	- n.a. a+0 = a
-0.213328	(a&&b) || (!a&&c) = a
-0.213328	a - a/1 = a
-0.213328	(!a&&c) || (b&&c) = a
-0.213328	(a&&!b) || (!a&&b) = a
-0.213328	~a ^ ~b = a
-0.213328	(vector) reductions: ~(~a) = a
-0.213328	(a&b)&(c&d) a ^0 = a
-0.022509	at compile time or a
-0.224805	from the same or a
-0.243224	program has one or a
-0.243224	from only one or a
-0.243224	to just one or a
-0.856421	through a pointer or a
-0.453160	Unlike a pointer or a
-0.301131	a simple pointer or a
-0.279331	a function library or a
-0.224805	an import table or a
-0.423452	with multiple CPUs or a
-0.513050	a command line or a
-0.279331	a sorted list or a
-0.364650	is a reference or a
-0.224805	a different module or a
-0.224805	a runtime DLL or a
-0.074754	a binary tree or a
-0.074754	A binary tree or a
-0.224805	or use objconv or a
-0.335265	parameters then make it a
-0.236671	class or give it a
-0.063058	to make the function a
-0.496499	alternatives: Make the function a
-0.337797	for giving the function a
-0.618793	// Call critical function a
-0.561103	This means that if a
-0.525531	powers of 2 if a
-0.529262	This is faster if a
-0.267202	vector. For example, if a
-0.267202	structures. For example, if a
-0.267202	exits. For example, if a
-0.337481	(|) works even if a
-0.323475	resource conflicts. But if a
-0.222615	can be optimized if a
-0.324869	We can check if a
-0.419985	not be advantageous if a
-0.452411	at to see if a
-0.276848	be quite inefficient if a
-0.222615	and complicated algorithm if a
-0.222615	buffer can occur if a
-0.276848	will unroll loops if a
-0.222615	delay is significant if a
-0.097197	would be invalid if a
-0.097197	counter becomes invalid if a
-0.222615	is stored (or if a
-0.022337	not be evaluated if a
-0.210443	a function is by a
-0.263083	then replace it by a
-0.316451	other ways than by a
-0.400973	the memory used by a
-0.292768	has replaced i by a
-0.210443	an integer variable by a
-0.059489	replace the branch by a
-0.059489	replace a branch by a
-0.059489	poorly predictable branch by a
-0.895992	can be calculated by a
-0.210443	a loop counter by a
-0.292006	replace integer multiplication by a
-0.012511	faster than division by a
-0.052364	Floating point division by a
-0.012511	2 Integer division by a
-0.012511	time. Integer division by a
-0.012511	processor). Integer division by a
-0.012511	division: Integer division by a
-0.035418	can be replaced by a
-0.210443	replace a database by a
-0.210443	An array initialized by a
-0.511581	will be improved by a
-0.282150	it can multiply by a
-0.403865	can be determined by a
-0.059489	slow // Division by a
-0.059489	much faster. Division by a
-0.059489	it matters: Division by a
-0.261289	but are identified by a
-0.192694	Are objects identified by a
-0.308409	an index multiplied by a
-0.021365	to be spaced by a
-0.210443	slow // Modulo by a
-0.210443	shift operations. Multiplying by a
-0.191736	Replacing a function with a
-0.191736	a simple function with a
-0.195370	requires log on with a
-0.289628	running this code with a
-0.246105	template specialization, not with a
-0.377818	on a CPU with a
-0.195370	or the other with a
-0.081451	well. A loop with a
-0.081451	prediction. A loop with a
-0.246105	divide an integer with a
-0.067627	is a class with a
-0.067627	into a class with a
-0.195370	8 kb size with a
-0.147761	a && b with a
-0.147761	a || b with a
-0.167435	A function library with a
-0.167435	math function library with a
-0.264425	a linear array with a
-0.246105	to distinguish elements with a
-0.195370	a function call with a
-0.365050	avoided on processors with a
-0.147761	by a constant with a
-0.147761	a single constant with a
-0.246105	function many times with a
-0.264425	It is accessed with a
-0.380021	function on CPUs with a
-0.195370	advanced high-level language with a
-0.264425	addition of integers with a
-0.392623	can be done with a
-0.195370	a linear list with a
-0.246105	a test run with a
-0.195370	PC platform. However, with a
-0.195370	accesses data members with a
-0.195370	inferior. A model with a
-0.195370	b) >> n with a
-0.195370	must always end with a
-0.246105	on a platform with a
-0.086790	templates or modules with a
-0.086790	most critical modules with a
-0.195370	should be tested with a
-0.264425	an old computer with a
-0.440449	processor is compatible with a
-0.283553	of backwards compatibility with a
-0.299832	no doubt obtained with a
-0.298904	2 when multiplying with a
-0.195370	name of Func with a
-0.195370	care of communication with a
-0.195370	C- style type-casting with a
-0.195370	should be performed with a
-0.195370	copied or moved with a
-0.195370	further optimizations. Loops with a
-0.195370	bytes, 4 ways, with a
-0.195370	easy to trace with a
-0.195370	by calling vector::reserve with a
-0.195370	can't be reached with a
-0.264575	than it is on a
-0.264575	the same function on a
-0.211765	for very long on a
-0.211765	writing a file on a
-0.283709	of the processors on a
-0.021472	objects are accessed on a
-0.409457	compiler to work on a
-0.211765	be cross- compiled on a
-0.211765	Running multiple threads on a
-0.627144	that works best on a
-0.211765	64 64 matrix on a
-0.264575	is preferably implemented on a
-0.283709	can still run on a
-0.283709	worked sufficiently fast on a
-0.211765	than anything else on a
-0.211765	floating point addition on a
-0.488074	should be tested on a
-0.458125	you cannot rely on a
-0.211765	processing unit, either on a
-0.211765	sizes were measured on a
-0.211765	a software package on a
-0.211765	works particularly bad on a
-0.264575	= 250 μs on a
-0.488074	operation is performed on a
-0.211765	are typically specified on a
-0.211765	using example 9.5a on a
-0.211765	the cache miss on a
-0.211765	be predicted perfectly on a
-0.211765	put a tag on a
-0.211765	program runs satisfactorily on a
-0.211765	a Boolean NOT on a
-0.192002	function parameter, or as a
-0.192002	compiler recognizes it as a
-0.192002	fixed size, not as a
-0.242321	circular buffer than as a
-1.013925	is the same as a
-0.381773	may be used as a
-0.267469	also be used as a
-0.192002	a / b as a
-0.292043	composite type such as a
-0.292043	other applications such as a
-0.045666	is as efficient as a
-0.097527	i is stored as a
-0.097527	sign is stored as a
-0.097527	exponent is stored as a
-0.192002	occur quite often as a
-0.192002	object oriented programming as a
-0.242321	software development work as a
-0.242321	possibly be compiled as a
-0.192002	low-level C language as a
-0.192002	can be done as a
-0.054262	code is implemented as a
-0.054262	software is implemented as a
-0.181078	can be implemented as a
-0.049622	should be implemented as a
-0.116286	is often implemented as a
-0.693401	just as fast as a
-0.032111	the same name as a
-0.192002	of thousand numbers as a
-0.296763	brackets index, just as a
-0.192002	matrix in STL as a
-0.192002	It is intended as a
-0.192002	class is given as a
-0.192002	code the offset as a
-0.192002	expressions may occur as a
-0.277847	be linked either as a
-0.192002	It uses ebx as a
-0.179338	can be organized as a
-0.179338	point registers organized as a
-0.320007	contain is provided as a
-0.242321	will be interpreted as a
-0.192002	value in edx as a
-0.192002	i will appear as a
-0.192002	implement a queue as a
-0.185402	to be expressed as a
-0.313489	can be expressed as a
-0.192002	in an FPGA as a
-0.192002	or get ReadTSC as a
-0.192002	seconds or microseconds as a
-0.242321	is implemented internally as a
-0.192002	should be regarded as a
-0.192002	copied by assignment, as a
-0.418976	usability. This is not a
-0.536209	but this is not a
-0.380978	However, this is not a
-0.527626	poorly. It is not a
-0.591613	a union is not a
-0.323456	case, N is not a
-0.323456	This tool is not a
-0.230108	scalar (Scalar means not a
-0.712080	- n.a. n.a. - a
-0.450835	int Func2() { int a
-0.223994	can take more than a
-0.223994	may take more than a
-1.032178	is more efficient than a
-0.859528	is less efficient than a
-0.737127	function is faster than a
-0.391800	to be faster than a
-0.301512	83 called faster than a
-0.325012	or write less than a
-0.305655	full use rather than a
-0.305655	class template rather than a
-0.305655	software implementation rather than a
-0.305655	template parameter rather than a
-0.305655	big blocks rather than a
-0.214173	uses more bits than a
-0.319203	more memory resources than a
-0.158896	that is longer than a
-0.158896	should be longer than a
-0.214173	dynamic_cast more time-consuming than a
-0.214173	error is lower than a
-0.235202	is much slower than a
-0.235202	always run slower than a
-0.214173	function is simpler than a
-0.214173	maintain and verify than a
-0.319198	b; } else { a
-0.244371	1; } else { a
-0.130911	2; } else { a
-0.032328	{ if (b) { a
-0.226207	8.10a if (true) { a
-0.266710	solution is to have a
-0.266710	is important to have a
-0.260079	the loop and have a
-0.312134	some processors that have a
-0.224298	operating system may have a
-0.224298	virtual processor may have a
-0.224298	this unit-test may have a
-0.264899	Non-static member functions have a
-0.264899	case. Inlined functions have a
-0.291145	compiler Intel compilers have a
-0.482876	compilers. Some compilers have a
-0.040422	Some systems also have a
-0.401372	map. Do objects have a
-0.294775	hexadecimal numbers, we have a
-0.512504	used if elements have a
-0.279012	applications. Some systems have a
-0.260079	It may even have a
-0.294775	counters Many CPUs have a
-0.155133	container class must have a
-0.155133	This task must have a
-0.207781	in the thread have a
-0.260079	should therefore preferably have a
-0.207781	processing capabilities still have a
-0.207781	software development models have a
-0.220118	do this every time a
-0.220118	for example every time a
-0.045469	memory block every time a
-0.220118	be updated every time a
-0.282651	returns. The next time a
-0.227732	functions (methods) Each time a
-0.227732	function returns. Every time a
-0.456143	problem is to use a
-0.323462	generates is to use a
-0.248135	the program to use a
-0.050029	more efficient to use a
-0.326968	no need to use a
-0.642111	not advantageous to use a
-0.280901	is recommended to use a
-0.326968	be optimal to use a
-0.326968	is preferred to use a
-0.326968	is safer to use a
-0.291112	user interface and use a
-0.466894	if you can use a
-0.296949	user interface can use a
-0.306002	allocation. Do not use a
-0.306002	resource. Do not use a
-0.428555	then you may use a
-0.194664	one element then use a
-0.086515	(FIFO) basis then use a
-0.086515	(FILO) basis then use a
-0.196864	Floating point variables use a
-0.247784	The vector operations use a
-0.247784	Mathematical functions must use a
-0.196864	may as well use a
-0.247784	Many software applications use a
-0.196864	example, some programmers use a
-0.196864	such applications. Alternatively, use a
-0.020251	appropriate version (May use a
-0.295454	more efficient than when a
-0.331171	new branch only when a
-0.295454	may be used when a
-0.221696	to evaluate b when a
-0.221696	valid. For example, when a
-0.221696	at inconvenient times when a
-0.221696	therefore be advantageous when a
-0.221696	considerable delay comes when a
-0.221696	shared object. Likewise, when a
-0.221696	normal. This happens when a
-0.221696	or re- allocating when a
-0.221696	remarkably in popularity when a
-0.232495	elements is small then a
-0.169558	a limited range then a
-0.169558	a narrow range then a
-0.232495	in the container, then a
-0.285420	and call it from a
-0.285420	or send data from a
-0.296942	reading the value from a
-0.222951	Reading a value from a
-0.455987	compiler when called from a
-0.340005	tasks are available from a
-0.258980	be easily available from a
-0.703480	value is calculated from a
-0.213214	condition is known from a
-0.213214	is not optimal from a
-0.213214	thread steals resources from a
-0.391690	cycles to read from a
-0.213214	waiting for response from a
-0.266212	initialized or comes from a
-0.465563	takes to recover from a
-0.213214	the const restriction from a
-0.321838	is re-loaded from memory a
-0.235019	calculate how much memory a
-0.208143	be responded to at a
-0.208143	method may be at a
-0.087821	handle eight elements at a
-0.087821	handles eight elements at a
-0.174578	it is stored at a
-0.037513	16, i.e. stored at a
-0.208143	a small bit at a
-0.208143	double 32 bits at a
-0.208143	test 16 bytes at a
-0.043443	code one line at a
-0.043443	than one line at a
-0.208143	with four numbers at a
-0.208143	a small piece at a
-0.298009	is typically loaded at a
-0.208143	and BSD comes at a
-0.208143	handle one square at a
-0.208143	a few kilobytes at a
-0.208143	uses by looking at a
-0.298818	function library that has a
-0.171722	Whenever the code has a
-0.115116	automatically. The code has a
-0.115116	} This code has a
-0.115116	Not all code has a
-0.305148	level-2 cache. This has a
-0.229861	+= list[i]; This has a
-0.229861	32 bytes). This has a
-0.201996	the unit-test but has a
-0.253560	a 32-bit integer has a
-0.253560	a polymorphic class has a
-0.489310	7.2). This library has a
-0.285949	A shared object has a
-0.201996	functions, where static has a
-0.524449	if the user has a
-0.272204	Whenever a processor has a
-0.201996	if the application has a
-0.201996	if the parameter has a
-0.333481	the static keyword has a
-0.201996	Each dependency chain has a
-0.201996	The heap manager has a
-0.201996	that the reader has a
-0.148343	this is to make a
-0.148343	way is to make a
-0.232361	solution is to make a
-0.148343	jobs is to make a
-0.148343	goal is to make a
-0.219392	maintenance - to make a
-0.219392	compiler has to make a
-0.045347	more efficient to make a
-0.579249	is possible to make a
-0.286633	often possible to make a
-0.414930	it takes to make a
-0.996718	in order to make a
-0.377265	shows how to make a
-0.263999	discusses how to make a
-0.414930	be useful to make a
-0.219392	are sure to make a
-0.219392	compiler needs to make a
-0.219392	three ways to make a
-0.219392	be safe to make a
-0.219392	is convenient to make a
-0.414930	the effort to make a
-0.219392	often preferable to make a
-0.219392	good idea to make a
-0.219392	is sufficient to make a
-0.258201	possible case and make a
-0.258201	of truncation and make a
-0.406347	Instead, you can make a
-0.350490	function. Do not make a
-0.252837	support. Then you make a
-0.291417	Instead, I will make a
-0.185466	OMF format. Alternatively, make a
-0.231607	using assembly language because a
-0.231607	in the values because a
-0.231607	can be vectorized, because a
-0.231607	determined in advance, because a
-0.379996	systems, there is only a
-0.266102	register keyword is only a
-0.266102	misprediction penalty is only a
-0.321265	should include not only a
-0.297625	code that use only a
-0.410401	the function has only a
-0.312180	Usually it takes only a
-0.297625	container that contains only a
-0.205668	a smart pointer. If a
-0.257696	around this problem. If a
-0.205668	procedures are inefficient. If a
-0.257696	procedure linkage table. If a
-0.257696	be an integer. If a
-0.257696	a 32-bit number. If a
-0.205668	long dependency chain. If a
-0.205668	in the future. If a
-0.257696	any other factor. If a
-0.020977	the AVX part. If a
-0.205668	difficult to read. If a
-0.205668	can be obtained. If a
-0.205668	Linux and BSD. If a
-0.205668	instruction set extensions. If a
-0.205668	a = lookup[b]; If a
-0.205668	(number of ways). If a
-0.599378	processor models on which a
-0.234481	memory address at which a
-0.043650	a[size], b[size]; // set a
-0.231666	The debugger cannot set a
-0.505042	may have to do a
-0.328969	is good to do a
-0.328969	clock cycles to do a
-0.409336	Modern compilers can do a
-0.315702	Modern CPUs can do a
-0.251201	(v. 15.0) is using a
-0.324940	The advantage of using a
-0.369673	The disadvantage of using a
-0.477359	The trick of using a
-0.300843	various alternatives to using a
-0.232480	static or by using a
-0.232480	of 2 by using a
-0.232480	64-bit systems by using a
-0.474319	in speed by using a
-0.232480	significantly simply by using a
-0.232480	is obtained by using a
-0.531912	be improved by using a
-0.232480	these guidelines by using a
-0.199901	as efficient as using a
-0.199901	code rather than using a
-0.236140	// Example 8.4 double a
-0.292530	make the array size a
-0.346951	function through function pointer a
-0.223604	the frame function into a
-0.175294	the allocated memory into a
-0.255400	data Loading data into a
-0.175294	elements of b into a
-0.175294	the allocated array into a
-0.175294	Putting simple variables into a
-0.175294	time- consuming calculations into a
-0.030151	multiple .cpp files into a
-0.175294	the structure y into a
-0.175294	the objects together into a
-0.175294	put a task into a
-0.267621	by copying them into a
-0.223604	does not fit into a
-0.078862	that it fits into a
-0.078862	four float's fits into a
-0.175294	can be turned into a
-0.175294	simply put 80 into a
-0.175294	can be combined into a
-0.175294	by some formula into a
-0.297716	can be joined into a
-0.078862	can be wrapped into a
-0.078862	they are wrapped into a
-0.175294	- preferably isolated into a
-0.175294	a time packed into a
-0.236027	// Example 8.18 float a
-0.274415	However, C++ is also a
-0.274415	do so is also a
-0.274415	target buffer is also a
-0.274415	or while-loop is also a
-0.220305	C++ programs and also a
-0.220305	table and possibly also a
-0.260476	performance costs to such a
-0.279426	often reorganized in such a
-0.260476	the user if such a
-0.260476	do not have such a
-0.279426	possible to do such a
-0.208133	running. Programs using such a
-0.208133	a parenthesis around such a
-0.208133	following example illustrates such a
-0.208133	enough to justify such a
-0.208133	system may supply such a
-0.236551	on different processors. In a
-0.544241	element of the array a
-0.610132	may be cases where a
-0.203603	But a solution where a
-0.203603	take memory space where a
-0.138616	is the situation where a
-0.138616	to the situation where a
-0.165025	space. A situation where a
-0.165025	in any situation where a
-0.242634	may be situations where a
-0.242634	There are situations where a
-0.203603	as multiple inheritance where a
-0.203603	of sequential instructions, where a
-0.320138	Any task that takes a
-0.163734	double to integer takes a
-0.163734	to an integer takes a
-0.296344	function pointer typically takes a
-0.222447	and garbage collection takes a
-0.236287	the same address so a
-0.268730	with new and return a
-0.268730	Make the function return a
-0.485976	} else { return a
-0.263320	& b) { return a
-0.263320	if (b) { return a
-0.345240	(float a) { return a
-0.278284	* 3; } return a
-0.020431	a * 2; return a
-0.020431	a * 3; return a
-0.227993	6 unused bytes between a
-0.451212	illustrates the difference between a
-0.280181	FPGAs. The difference between a
-0.336262	satisfied with the way a
-0.232048	to predict which way a
-0.340724	been loaded. This makes a
-0.219251	functions. The compiler makes a
-0.219251	another class. It makes a
-0.219251	The installation program makes a
-0.219251	code section position-independent, makes a
-0.499846	other function is called a
-0.307263	other functions is called a
-0.307263	preceding one is called a
-0.339788	read from memory address a
-0.202310	longer time to call a
-0.149051	it takes to call a
-0.202310	that need to call a
-0.202310	that needs to call a
-0.233681	header files For example, a
-0.233681	ample resources. For example, a
-0.233681	the variable. For example, a
-0.309694	point expressions. For example, a
-0.233681	execution units. For example, a
-0.233681	each core. For example, a
-0.309694	more constants. For example, a
-0.233681	in question. For example, a
-0.233681	a matrix. For example, a
-0.233681	of algebra. For example, a
-0.233681	algebraic reduction. For example, a
-0.325605	reinstallation work to take a
-0.306622	big objects that take a
-0.314965	The branches may take a
-0.217489	precision. These conversions take a
-0.217489	log, and logarithms take a
-0.408469	frameworks. This is often a
-0.315003	that there is often a
-0.225796	compiler e.g. how often a
-0.483615	useful to know how a
-0.231162	or no idea how a
-0.220375	Therefore, you only need a
-0.421927	because it doesn't need a
-0.223839	A class doesn't need a
-0.402536	then you don't need a
-0.448644	on how to test a
-0.202923	a hundred or even a
-0.202923	binary search, or even a
-0.224718	as you have even a
-0.206207	always possible to access a
-0.090978	is faster to access a
-0.090978	much faster to access a
-0.206207	following steps to access a
-0.372434	storage. If you access a
-0.313560	useful to roll out a
-0.313560	advantageous to roll out a
-0.562457	& ~a = 0 a
-0.562457	a ^a = 0 a
-0.446024	example, in the case a
-0.902565	n.a. - a & a
-0.235053	By giving each constant a
-0.217063	modules that make up a
-0.217063	needed for setting up a
-0.095109	less efficient. Splitting up a
-0.095109	this rule. Splitting up a
-0.280074	have facilities for making a
-0.373500	unless you are making a
-0.278275	precision or by making a
-0.278275	one division by making a
-0.146061	be faster than making a
-0.146061	object rather than making a
-0.869658	the compiler from making a
-0.192529	code for actually making a
-0.536706	library. If you want a
-0.348845	compiler additional information about a
-0.227968	uses 32 bits while a
-0.227968	a frame function, while a
-0.234800	function name ;startofFunc ; a
-0.283205	function which then calls a
-0.228221	with AVX support calls a
-0.206921	of copying it Use a
-0.206921	blocks, for example: Use a
-0.206921	are satisfied: 1. Use a
-0.091252	...................................................................................... 16 3.2 Use a
-0.091252	be improved. 3.2 Use a
-0.234656	in doubt how big a
-0.220730	to function names. But a
-0.220730	advantageous by itself. But a
-0.220730	of its simplicity. But a
-0.096212	to the function through a
-0.096212	Calling a function through a
-0.034290	its child class through a
-0.034290	third generation class through a
-0.034290	a derived class through a
-0.112246	that accesses b through a
-0.096212	variable or object through a
-0.096212	accessing an object through a
-0.112246	accessing a variable through a
-0.112246	function is called through a
-0.112246	its own address through a
-0.205896	it is accessed through a
-0.205896	is necessarily accessed through a
-0.168454	but must go through a
-0.112246	find the GOT through a
-0.112246	an extra jump through a
-0.112246	by the caller through a
-0.112246	allocated block. Walking through a
-0.112246	time than looping through a
-0.112246	can be propagated through a
-0.088066	& a = a, a
-0.088066	&& true = a, a
-0.020399	& -1 = a, a
-0.426550	is possible to compile a
-0.899002	columns in a matrix a
-0.226630	function writes to matrix a
-0.328023	columns had not been a
-0.237123	that it may cause a
-0.237123	modules. This may cause a
-0.171975	class members may cause a
-0.564890	address 0x2710 will cause a
-0.290979	easier. I have done a
-0.557653	code. It is therefore a
-0.238100	if it is inside a
-0.188241	declaring the table inside a
-0.026229	and objects declared inside a
-0.054145	class Variables declared inside a
-0.054145	inefficient. Variables declared inside a
-0.238100	body is defined inside a
-0.267028	second application that uses a
-0.310552	that the compiler uses a
-0.345966	if the program uses a
-0.174183	time. The program uses a
-0.234401	a particular application uses a
-0.234401	} This implementation uses a
-0.184941	where it still uses a
-0.234821	operating systems". The parameters a
-0.239825	is possible to get a
-0.606342	in order to get a
-0.177219	information elsewhere and get a
-0.037993	then you may get a
-0.037993	fact, you may get a
-0.177219	appropriately. Users should get a
-0.177219	you will soon get a
-0.177219	extremely inefficient, (4) get a
-0.647577	float a; int b; a
-0.290704	float a; double b; a
-0.088288	m;} int a, b; a
-0.088288	1.6; int a, b; a
-0.064477	14.14b double a, b; a
-0.064477	14.14a double a, b; a
-0.064477	8.2a double a, b; a
-0.027209	66 float a, b; a
-0.027209	14.18a float a, b; a
-0.027209	14.18b float a, b; a
-0.247712	float a; bool b; a
-0.098656	instruction and have implemented a
-0.098656	call. I have implemented a
-0.234855	integer comparisons. The solution a
-0.657159	of processors that support a
-0.043322	if the software contains a
-0.043322	If the software contains a
-0.259693	Vectorized code often contains a
-0.207439	if the expression contains a
-0.234827	GB. When considering whether a
-0.320768	if you are doing a
-0.261234	64-bit programs to run a
-0.261234	will prefer to run a
-0.331962	Many processors can calculate a
-0.328609	more likely to inline a
-0.278555	vector size then add a
-0.224121	100000000. When we add a
-0.097927	0, sizeof(a)); // copy a
-0.097927	= 0.0; // copy a
-0.234244	the best job optimizing a
-0.076669	code. It is simply a
-0.076669	costless. It is simply a
-0.169837	or structure is simply a
-0.076669	the difference is simply a
-0.076669	The difference is simply a
-0.334747	1000; i++) { ... a
-0.301307	it may be quite a
-0.044290	memory can take quite a
-0.044290	which can take quite a
-0.919912	XMM registers are used. a
-0.427267	just easier to write a
-0.223147	example if you write a
-0.633327	you want to optimize a
-0.234211	the x86 CPUs. However, a
-0.289858	pointers In simple cases, a
-0.280809	is possible to replace a
-0.187868	be possible to replace a
-0.381697	The compiler can replace a
-0.080572	Likewise, you cannot replace a
-0.080572	1; You cannot replace a
-0.228394	compiler can automatically replace a
-0.376621	efficient. 64-bit Windows allows a
-0.221710	CPU dispatcher then sets a
-0.221710	The initialization routine sets a
-0.252820	example, in the expression a
-0.332593	both, while the expression a
-0.281960	the time. The expression a
-0.233364	used twice for handling a
-0.237570	to simple things like a
-0.187768	is also treated like a
-0.314332	object that behaves like a
-0.187768	function is expanded like a
-0.187768	to simple actions like a
-0.221407	for each element __m128i a
-0.221407	two AND operations: __m128i a
-0.233612	F2(b); } } Using a
-0.094744	is recommended to put a
-0.094744	therefore recommended to put a
-0.216098	would like to put a
-0.186185	b*(2.0/3.0) unless you put a
-0.235795	stopping threads. Don't put a
-0.221575	+= 1.0f; This needs a
-0.221575	If a loop needs a
-0.440806	: b * c; a
-0.186108	= b / c; a
-0.424573	int a, b, c; a
-0.186108	= b % c; a
-0.220643	be called, or what a
-0.274615	You can change what a
-0.233520	monitor counters before running a
-0.583758	= true a && a
-0.219179	equivalent expression b && a
-0.612921	= a, a | a
-0.023382	cc[]) { // Make a
-0.024653	= _mm_set1_epi16(0); // Make a
-0.108319	Is16vec8 zero(0,0,0,0,0,0,0,0); // Make a
-0.147219	with C++0x support. Make a
-0.147219	expressions and operators. Make a
-0.206885	// Example 7.10b char a
-0.206885	// Example 8.17 char a
-0.206885	// Example 7.9b char a
-0.327344	the pointer is needed a
-0.218123	CPU dispatcher should give a
-0.218123	solution can still give a
-0.217925	Unrolling a loop becomes a
-0.271539	big that caching becomes a
-0.258378	memory block. This requires a
-0.317598	between two pointers requires a
-0.190206	on such processors requires a
-0.190206	etc. Event-based sampling requires a
-0.287332	more time to load a
-0.287332	all writes to load a
-0.232634	as fast as calling a
-1.405088	The following example shows a
-0.217569	section (page 131) shows a
-0.208508	we want to generate a
-0.208508	is likely to generate a
-0.018059	to 0 and generate a
-0.363149	created it will generate a
-0.232242	constant propagation and reduce a
-0.299351	it is to choose a
-0.483459	today. You may choose a
-0.216085	compilers I have made a
-0.216085	consequences. I once made a
-0.278056	vector classes is just a
-0.250018	the cache in just a
-0.198850	memory even when just a
-0.232522	table lookup or require a
-0.168013	pointer does not require a
-0.215475	time measurements may require a
-0.168013	or global arrays require a
-0.168013	and non-constant references require a
-0.308172	is possible to start a
-0.215360	that it can start a
-0.231446	processor model N supports a
-1.018944	the number of columns a
-0.284256	CPUs unequally can become a
-0.294162	suboptimal way has become a
-0.231979	external clock. This gives a
-0.231443	a request for inlining a
-0.231153	= !(a || b) a
-0.155277	going either way. Such a
-0.155277	so-called soft processor. Such a
-0.155277	hardware definition language. Such a
-0.155277	on the market. Such a
-0.155277	in computer games. Such a
-0.231868	The preceding paragraph described a
-0.286535	The Boolean operators produce a
-0.231439	If you are including a
-0.261740	processor may be given a
-0.209253	has not been given a
-0.230976	compilers will make temp a
-0.838579	a, b, c, d; a
-0.305564	overlap. You can save a
-0.231435	tables if this prevents a
-0.629440	no way to tell a
-0.136452	not have to unroll a
-0.136452	not necessary to unroll a
-0.136452	an advantage to unroll a
-0.169085	Compilers will usually unroll a
-0.230113	code faster because testing a
-0.319162	than reading or writing a
-0.167505	access Reading or writing a
-0.167505	0x1C. Reading or writing a
-0.230607	discussion of profiling. When a
-0.230113	function implicitly when copying a
-0.241769	The code for accessing a
-0.074275	more time than accessing a
-0.074275	less efficient than accessing a
-0.163929	Pointer aliasing When accessing a
-0.306622	x must wait until a
-0.240469	each address by adding a
-0.240469	be calculated by adding a
-0.230277	be mispredicted, which causes a
-0.474602	sometimes able to predict a
-0.090314	|| true = true a
-0.090314	|| !a = true a
-0.304887	A debugger can execute a
-0.826784	template specialization for N a
-0.336985	library (or at least a
-0.255797	often possible to insert a
-0.188000	down. Remember to insert a
-0.268605	static memory and insert a
-0.089329	to memory without loading a
-0.089329	a double without loading a
-0.313734	induction variables for calculating a
-0.202650	it to begin calculating a
-0.228892	response time to e.g. a
-0.069244	a > b ? a
-0.069244	(a > b ? a
-0.229668	the programmer has defined a
-0.267586	optimizations you can expect a
-0.027352	case. You cannot expect a
-0.027352	examples. You cannot expect a
-0.027352	compiler-specific. You cannot expect a
-0.486830	works is of course a
-0.171020	principle is useful whenever a
-0.171020	an extra cost whenever a
-0.171020	table (PLT). And whenever a
-0.283430	not recommended to modify a
-0.198872	common way of setting a
-0.293108	CPU core by setting a
-0.145024	an integer is within a
-0.145024	byte of zero within a
-0.145024	to be irrelevant within a
-0.145024	or by keys within a
-0.228206	in member functions counts a
-0.198872	processors. On many processors, a
-0.198872	tree. On older processors, a
-0.228632	} } } Obviously, a
-0.144315	pooling) than to allocate a
-0.144315	is necessary to allocate a
-0.144315	and delete to allocate a
-0.144672	The string classes allocate a
-0.229059	called. I have added a
-0.229699	problems and they waste a
-0.277605	transpose matrix // define a
-0.198474	Alternatively, you may define a
-0.167816	more efficient to implement a
-0.036272	is possible to implement a
-0.227134	test data should contain a
-0.368520	that reads or writes a
-0.282508	usually faster to transfer a
-0.315930	can easily optimize away a
-0.282508	} We can multiply a
-0.245885	// This function stores a
-0.195175	the Gnu mechanism stores a
-0.228079	the purpose of finding a
-0.369501	the compiler will vectorize a
-0.210747	would be to include a
-0.163772	newest instruction sets include a
-0.163772	Most compiler packages include a
-0.282776	do an integer addition, a
-0.227606	member is unchanged across a
-0.226857	browsing that previously required a
-0.397004	calls may slow down a
-0.224688	as if it had a
-0.094982	= b / 10; a
-0.094982	(unsigned int)b / 10; a
-0.051672	= b % 10; a
-0.051672	(unsigned int)b % 10; a
-0.224688	a different type. Likewise, a
-0.225291	heap manager can spend a
-0.697495	the function is called, a
-0.514583	before and after executing a
-0.224688	what kind of exceptions a
-0.290271	long time to transpose a
-0.217319	it takes to transpose a
-0.223213	spot. Repeating the break a
-0.223213	of how to break a
-0.222862	a base address plus a
-0.223564	= b / 16; a
-0.223564	= b % 16; a
-0.320937	be enough to identify a
-0.223213	3 breakpoint and show a
-0.174943	user but only show a
-0.406324	a needs to evaluate a
-0.223213	or a non-const reference, a
-0.297668	there is only half a
-0.175572	is used for converting a
-0.132075	7.4 we are converting a
-0.132075	line is implicitly converting a
-0.131790	A branch that follows a
-0.131790	statement if it follows a
-0.131790	the function pointer follows a
-0.222862	deciding whether to base a
-0.223915	Use lookup tables Reading a
-0.223915	processor model numbers form a
-0.131790	The type __m128i defines a
-0.131790	The type __m128 defines a
-0.131790	The type __m128d defines a
-0.222862	b * c; Is16vec8 a
-0.131790	classes or structures. Accessing a
-0.131790	data more compact. Accessing a
-0.131790	a variable. Efficiency Accessing a
-0.022322	it takes to install a
-0.113674	the user must install a
-0.053128	extra framework can consume a
-0.053128	A database can consume a
-0.113674	operators and functions consume a
-0.817821	On the other hand, a
-0.220737	Windows program that created a
-0.221575	the case labels follow a
-0.163731	to use and returns a
-0.163731	member function which returns a
-0.359622	most efficient solution. Is a
-0.221156	of computing resources. Typically, a
-0.221156	^ -1 = ~a a
-0.221156	double. Here we prefer a
-0.270549	if a loop repeats a
-0.269962	be overcome by defining a
-0.126931	of code that produces a
-0.041704	an unsigned variable produces a
-0.041704	a signed variable produces a
-0.270549	*p and calculate *p+2 a
-0.193076	is obtained by choosing a
-0.147880	into account when choosing a
-0.289340	a strategy for saving a
-0.147438	is never used. Whenever a
-0.147438	processors is better. Whenever a
-0.216532	b; int Sum1() {return a
-0.147438	7.7 Function pointers Calling a
-0.147438	of the class. Calling a
-0.217050	functions. You can force a
-0.270549	simple type, a pointer, a
-0.217050	that a function opens a
-0.262913	The syntax may seem a
-0.210292	level framework still consumes a
-0.210292	well the compiler optimizes a
-0.281972	return a[i]; // Return a
-0.210974	Example: // Example 7.2 a
-0.210292	in some cases ignore a
-0.163976	pointer may be considered a
-0.121571	is not traditionally considered a
-0.210974	the addresses are spaced a
-0.227842	be used for implementing a
-0.121571	93 themselves. But implementing a
-0.210292	__attribute__((const)) (Linux only). Specifies a
-0.210292	appropriate here. It reveals a
-0.210292	SSE2 instruction set (requires a
-0.210292	everything is float 140 a
-0.210292	No program should leave a
-0.121571	a thousand numbers. With a
-0.121571	the next step. With a
-0.210974	string length function scans a
-0.210292	performance can easily justify a
-0.121571	an integer that holds a
-0.121571	The float type holds a
-0.210292	program has two arrays, a
-0.056545	&& false = false, a
-0.056545	&& !a = false, a
-0.121571	the remaining bits represent a
-0.121571	certain to truly represent a
-0.121571	it. Instead of returning a
-0.121571	unconventional manner by returning a
-0.210974	require cleanup before terminating a
-0.281972	cache space by joining a
-0.281972	often seen, is certainly a
-0.262913	Example 8.21 is indeed a
-0.210292	infinity or NAN (not a
-0.210292	variable where it expects a
-0.210292	optimizing compilers can compute a
-0.198076	horizontal add, etc. SSSE3 a
-0.198076	the Intel mechanism executes a
-0.198076	research, I have developed a
-0.249148	the cost of keeping a
-0.198076	class library can emulate a
-0.198076	the function inline. Replacing a
-0.198076	programming language Before starting a
-0.198076	if, and only if, a
-0.198076	libraries and drivers differ a
-0.074046	immediate response to pressing a
-0.074046	simple tasks like pressing a
-0.249148	testing, verifying and maintaining a
-0.198076	16) { b.load(bb+i); c.load(cc+i); a
-0.198076	on the CPU. Unrolling a
-0.198076	a time and afterwards a
-0.074046	may need to lock a
-0.074046	than to temporarily lock a
-0.198076	and their implementations reveal a
-0.328183	float x, y, z; a
-0.074046	a function which transposes a
-0.074046	The following example transposes a
-0.249148	CriticalFunctionDispatch(void) { // Returns a
-0.198076	message when it sees a
-0.198076	make the compiler treat a
-0.198076	faster to make log2 a
-0.198076	spots, but for studying a
-0.249148	this fact by replacing a
-0.198076	be fragmented and involve a
-0.198076	be a simple type, a
-0.198076	0 or 1. Writing a
-0.198076	does this by assigning a
-0.198076	a way of relieving a
-0.163364	of irrelevant software installed, a
-0.163364	Linux syntax 90 Gives a
-0.163364	more efficient to re-use a
-0.163364	a class definition. Inlining a
-0.163364	integer operations for incrementing a
-0.163364	structure. Incrementing or decrementing a
-0.163364	in memory by requesting a
-0.163364	function call statement occupies a
-0.163364	code, see below. Installing a
-0.163364	of a program executable: a
-0.163364	(~a&c) a&b&c&d = (a&b)&(c&d) a
-0.163364	} } } Transposing a
-0.163364	by // Example 8.5b a
-0.163364	is not modified. Unlike a
-0.163364	delete it and create a
-0.163364	sequence of calculations forms a
-0.163364	is faster to compose a
-0.163364	graphics function that draws a
-0.163364	cycles whenever it feeds a
-0.163364	by // Example 8.2b a
-0.163364	by // Example 8.3b a
-0.163364	where each bit indicates a
-0.163364	This solution can incur a
-0.163364	OK, however, to pass a
-0.163364	user has to reinstall a
-0.163364	below. The function rounds a
-0.163364	to: // Example 8.10b a
-0.163364	be used for fetching a
-0.163364	you spend on redesigning a
-0.163364	lookup[2] = {2.6f, 1.5f}; a
-0.163364	The following example converts a
-0.163364	Multiply(10,8); b = MultiplyBy<8>(10); a
-0.163364	frameworks, rather than isolating a
-0.163364	--xxxx--- a & a= a
-0.163364	OneOrTwo5[2] = {1.0f, 2.5f}; a
-0.163364	large overhead of managing a
-0.163364	b ---xx---- a<<b<<c=a<<(b+c) x-xxx--xx a
-0.163364	to develop and publish a
-0.163364	a|(b&c) x-xxxx--x ~a&~b=~(a|b) --xxxx--- a
-0.163364	insufficient amount of RAM, a
-0.163364	first time you activate a
-0.163364	the program of occupying a
-0.350729	1; } This is of
-0.350729	all objects. This is of
-0.236707	that already works is of
-0.236707	games and animations is of
-0.237167	processing speed exceeding that of
-0.356102	this table may be of
-0.355291	and b should be of
-0.292872	system calls. These are of
-0.023400	use a table // of
-0.353467	useful whenever a function of
-0.023080	is a linear function of
-0.866647	the CPU detection function of
-0.232146	a monotonically increasing function of
-0.100743	directly: Library exp function of
-0.100743	4 floats exp function of
-0.232146	is a staircase function of
-0.357270	simultaneously prefetching the code of
-0.568906	code then you may of
-0.321632	unknown at the time of
-0.310701	an estimated calculation time of
-0.014060	said that the use of
-0.004636	obtained by the use of
-0.004636	additions by the use of
-0.004636	bitfield by the use of
-0.006973	directly with the use of
-0.006973	rewritten with the use of
-0.014060	This makes the use of
-0.014060	it prevents the use of
-0.014060	function. Avoid the use of
-0.003472	to economize the use of
-0.014060	addresses. Especially the use of
-0.043623	the program. The use of
-0.043623	other purposes. The use of
-0.043623	size vector. The use of
-0.043623	multiple threads. The use of
-0.209196	application can make use of
-0.209196	obstacles to efficient use of
-0.209196	should avoid any use of
-0.209196	to make better use of
-0.209196	cases. The explicit use of
-0.209196	cache space. Excessive use of
-0.515557	using one or more of
-1.209300	parts of the program of
-0.064544	STL as a vector of
-0.064544	organized as a vector of
-0.234202	fit into a vector of
-0.003345	// Make a vector of
-0.147818	a 128 bit vector of
-0.283974	to non-AVX code because of
-0.306144	at compile time because of
-0.283974	a < b because of
-0.211989	make register variables because of
-0.211989	in some systems because of
-0.283974	the subsequent times because of
-0.211989	than half speed because of
-0.211989	as function parameters because of
-0.264829	most efficient solution because of
-0.211989	slower than intended because of
-0.488498	should be avoided because of
-0.426670	This is inefficient because of
-0.211989	a hard disk because of
-0.211989	processors are preferred because of
-0.211989	or fail completely because of
-0.121011	have the member functions of
-0.121011	If the member functions of
-0.289352	52. The member functions of
-0.337641	should give a CPU of
-0.532091	on the newest CPU of
-0.234376	also a possible point of
-0.234376	from a technological point of
-0.513606	done in a loop of
-0.092773	the critical innermost loop of
-0.285914	from the message loop of
-0.290079	it is discussed which of
-0.234271	know in advance which of
-0.331188	you access to all of
-0.332905	most efficient if all of
-0.067767	Gnu This is one of
-0.067767	CPUs"). This is one of
-0.148099	runtime. Polymorphism is one of
-0.307768	a pointer to one of
-0.148099	is identical to one of
-0.148099	often belong to one of
-0.326272	For example, only one of
-0.238535	be read into one of
-0.238535	to 0x273F into one of
-0.213128	You may choose one of
-0.213128	cache line. Only one of
-0.213128	Day for signifying one of
-0.339698	example is a cache of
-0.468847	a level-1 data cache of
-0.440781	and a level-2 cache of
-0.292866	to NULL. There should of
-0.351218	of declaring an integer of
-0.043875	to use a set of
-0.043875	operations use a set of
-0.210676	can calculate which set of
-0.210676	most commonly used set of
-0.210676	(there is one set of
-0.210676	instance for each set of
-0.210676	for a particular set of
-0.210676	has its own set of
-0.210676	on a typical set of
-0.210676	with a suitable set of
-0.021384	with a realistic set of
-0.341862	functions if the class of
-0.233549	doesn't know what class of
-0.233582	a structure or each of
-0.289295	before and after each of
-0.283227	look at the example of
-0.055403	80 for an example of
-0.055403	89 for an example of
-0.118923	58 shows an example of
-0.207440	shared object where most of
-0.207440	goes one way most of
-0.207440	for Windows, while most of
-0.207440	vice versa. But most of
-0.207440	the application uses most of
-0.207440	likely to run most of
-0.207440	they are predicted most of
-0.091451	Many programs spend most of
-0.091451	Some applications spend most of
-0.207440	most software runs most of
-0.207440	you can obtain most of
-0.207440	task that consumes most of
-0.207440	program 153 spends most of
-0.067876	multiple of the size of
-0.067876	documentation for the size of
-0.162294	memory if the size of
-0.249348	happen if the size of
-0.075183	divisible by the size of
-0.145063	multiplied by the size of
-0.067876	well as the size of
-0.021468	matrix when the size of
-0.021468	allocation when the size of
-0.021468	dynamically when the size of
-0.067876	i*12, because the size of
-0.067876	addition. If the size of
-0.118296	and where the size of
-0.067876	above example, the size of
-0.067876	slow unless the size of
-0.067876	that fit the size of
-0.067876	cannot increase the size of
-0.067876	only half the size of
-0.067876	// Return the size of
-0.067876	table increases the size of
-0.036493	more efficient. The size of
-0.036493	less efficient. The size of
-0.076337	execution units. The size of
-0.076337	class elements. The size of
-0.076337	another module. The size of
-0.076337	following reasons: The size of
-0.076337	number i. The size of
-0.040417	with a line size of
-0.153360	registers. The maximum size of
-0.153360	temp; // Define size of
-0.153360	operations. The total size of
-0.033566	If the combined size of
-0.033566	where the combined size of
-0.016459	of elements Total size of
-0.799769	converted to a pointer of
-0.323593	For example, a library of
-0.003745	memory is a multiple of
-0.003745	array is a multiple of
-0.003745	matrix is a multiple of
-0.001868	stride is a multiple of
-0.004686	spaced by a multiple of
-0.019054	array size a multiple of
-0.019054	are spaced a multiple of
-0.324804	determined where the object of
-0.039762	points to an object of
-0.039762	together in an object of
-0.187055	called on an object of
-0.187055	way as an object of
-0.254693	when accessing an object of
-0.339489	time a new object of
-0.154333	is declared. An object of
-0.154333	7.22 Inheritance An object of
-0.005216	n is the number of
-0.005216	r is the number of
-0.010495	equal to the number of
-0.003470	processors and the number of
-0.003470	branches and the number of
-0.003470	flow and the number of
-0.005216	calculate that the number of
-0.005216	expected that the number of
-0.010495	array or the number of
-0.002600	faster if the number of
-0.002600	example, if the number of
-0.002600	complicated if the number of
-0.002600	minimized if the number of
-0.005216	and by the number of
-0.005216	divisible by the number of
-0.010495	depends on the number of
-0.010495	such as the number of
-0.005216	more than the number of
-0.005216	priority than the number of
-0.005216	useful when the number of
-0.005216	negligible when the number of
-0.010495	to make the number of
-0.017371	code. If the number of
-0.017371	used. If the number of
-0.017371	cache. If the number of
-0.017371	problem. If the number of
-0.017371	time? If the number of
-0.010495	would double the number of
-0.003470	set where the number of
-0.003470	cases where the number of
-0.003470	situations where the number of
-0.010495	distinguishing between the number of
-0.010495	of making the number of
-0.010495	critical. Therefore, the number of
-0.010495	to reduce the number of
-0.010495	of reducing the number of
-0.010495	that measures the number of
-0.033262	used in a number of
-0.008080	There are a number of
-0.033262	available from a number of
-0.007132	application program. The number of
-0.007132	is available. The number of
-0.007132	the system. The number of
-0.007132	bit systems: The number of
-0.007132	by 8. The number of
-0.007132	is small. The number of
-0.007132	overlap. 27 The number of
-0.004146	= 512; // number of
-0.016827	= 64; // number of
-0.088482	always use this number of
-0.025491	of a variable number of
-0.025491	time. A variable number of
-0.052572	a very large number of
-0.052572	finished. The optimal number of
-0.025491	only a limited number of
-0.025491	storage A limited number of
-0.005537	27). The maximum number of
-0.005537	weekdays. The maximum number of
-0.005537	67 The maximum number of
-0.052572	have a reduced number of
-0.005537	when the total number of
-0.002760	If the total number of
-0.052572	get a realistic number of
-0.005537	into an excessive number of
-0.005537	avoid an excessive number of
-0.005537	Avoid an excessive number of
-0.052572	by the 107 number of
-0.012558	for an increasing number of
-0.012558	seeing an increasing number of
-0.052572	with an extended number of
-0.052572	simultaneous lookups Max. number of
-0.052572	get an integral number of
-0.364810	applies to an array of
-0.397443	data as an array of
-0.279462	are feeding an array of
-0.222571	// Make dynamic array of
-0.222571	list; // Make array of
-0.461048	standard libraries for many of
-0.328070	an IDE with many of
-0.254641	language. D has many of
-0.254641	platforms. Pascal has many of
-0.222129	C# and avoids many of
-0.130205	calls the same version of
-0.061351	compile time which version of
-0.061351	is known which version of
-0.061351	when testing which version of
-0.061351	with certainty which version of
-0.188954	speed of each version of
-0.130205	The best possible version of
-0.266208	gets the new version of
-0.226995	only the SSE2 version of
-0.130205	AVX2 // specific version of
-0.238901	in the optimized version of
-0.130205	then the optimal version of
-0.130205	with a better version of
-0.130205	six years old version of
-0.236348	to the appropriate version of
-0.104800	choose the appropriate version of
-0.104800	loads the appropriate version of
-0.083261	systems. The appropriate version of
-0.344959	to the desired version of
-0.006295	to the right version of
-0.012683	finding the right version of
-0.130205	in the final version of
-0.130205	If a future version of
-0.130205	uses a newer version of
-0.130205	because the interpreted version of
-0.133472	The 17 debug version of
-0.133472	time. Uses debug version of
-0.012683	example, the latest version of
-0.012683	Use the latest version of
-0.012683	gets the latest version of
-0.130205	8 most popular version of
-0.173505	supports. An inferior version of
-0.130205	debugging. A command-line version of
-0.344959	and a release version of
-0.177423	regardless of the value of
-0.182718	is that the value of
-0.107031	means that the value of
-0.107031	assume that the value of
-0.107031	detect that the value of
-0.021511	used if the value of
-0.021511	possible if the value of
-0.021511	way if the value of
-0.021511	predicted if the value of
-0.021511	course, if the value of
-0.120115	executed. If the value of
-0.120115	digits, so the value of
-0.120115	Make sure the value of
-0.267907	to calculate the value of
-0.120115	cycles after the value of
-0.055918	you read the value of
-0.055918	only read the value of
-0.120115	doesn't know the value of
-0.120115	to change the value of
-0.120115	N&(N-1) gives the value of
-0.120115	to hold the value of
-0.120115	ebx restores the value of
-0.075309	little explanation. The value of
-0.075309	clock counts. The value of
-0.075309	when false. The value of
-0.123837	for each different value of
-0.123837	to the integer value of
-0.155210	x∙xn-1, and each value of
-0.155210	serial because each value of
-0.057519	for the new value of
-0.057519	calculating a new value of
-0.123837	off the binary value of
-0.123837	A possible negative value of
-0.123837	on the preceding value of
-0.123837	when the final value of
-0.099095	of the absolute value of
-0.099095	calculate the absolute value of
-0.123837	Obviously, the initial value of
-0.074193	becomes fragmented when objects of
-0.074193	become fragmented when objects of
-0.222435	possible to store objects of
-0.222435	strings and similar objects of
-0.222435	or void. Returning objects of
-0.263873	execution speed in any of
-0.263873	as true, if any of
-0.308847	is bypassed by any of
-0.211143	union can use any of
-0.092864	execution units. If any of
-0.092864	identification (RTTI) If any of
-0.211143	be used, but any of
-0.282975	old microprocessors without any of
-0.282296	take care of some of
-0.292156	software programmers to some of
-0.292156	dispatch mechanisms, and some of
-0.282296	compiler comes with some of
-0.210567	I have described some of
-0.210567	very common. Even some of
-0.210567	than others. While some of
-0.210567	following sections describe some of
-0.191918	pointer to a table of
-0.191918	(PLT) and a table of
-0.191918	implemented as a table of
-0.260380	value from a table of
-0.191918	object has a table of
-0.303257	p. 104). The table of
-0.209084	= { // table of
-0.209084	Loop to make table of
-0.250501	unsatisfied with the performance of
-0.250501	cases where the performance of
-0.250501	information about the performance of
-0.250501	to compare the performance of
-0.250501	can influence the performance of
-0.209392	Table 2.1. Comparing performance of
-0.209392	measuring the overall performance of
-0.209392	microprocessor The benchmark performance of
-0.155332	is that the order of
-0.155332	can check the order of
-0.155332	we change the order of
-0.045869	not swap the order of
-0.022329	cannot swap the order of
-0.155332	when swapping the order of
-0.151728	template parameter. The order of
-0.151728	7.5 Booleans The order of
-0.202030	sequentially. The opposite order of
-0.368784	maintaining a new branch of
-0.227797	that each particular branch of
-0.227797	then the dispatch branch of
-0.205151	function that is member of
-0.199373	could be a member of
-0.041930	the function a member of
-0.199373	than accessing a member of
-0.199373	compact. Accessing a member of
-0.445947	Accessing a data member of
-0.257113	call the polymorphic member of
-0.205151	each other (not member of
-0.343401	Programming in the way of
-0.237680	types Unfortunately, the way of
-0.404283	It is a way of
-0.187777	resource. The C++ way of
-0.143209	be an efficient way of
-0.143209	a very efficient way of
-0.237580	is a useful way of
-0.187777	time. A simple way of
-0.187777	is a common way of
-0.237580	is a good way of
-0.187777	is a convenient way of
-0.187777	primitive, but efficient, way of
-0.187777	for a portable way of
-0.327715	function adds the elements of
-0.292437	operations on all elements of
-0.302532	addresses of array elements of
-0.219149	will read four elements of
-0.219149	Array with N elements of
-0.047322	calculation of the address of
-0.015208	row to the address of
-0.015208	eax to the address of
-0.015208	equal to the address of
-0.047322	is that the address of
-0.011356	look up the address of
-0.011356	which contains the address of
-0.011356	now contains the address of
-0.011356	ecx contains the address of
-0.011356	edx contains the address of
-0.023017	to calculate the address of
-0.023017	can calculate the address of
-0.047322	is simply the address of
-0.047322	constant, unless the address of
-0.047322	} Here, the address of
-0.047322	to find the address of
-0.023017	for calculating the address of
-0.023017	when calculating the address of
-0.047322	file tells the address of
-0.047322	4. So the address of
-0.066593	its address. The address of
-0.066593	odd here. The address of
-0.145279	overwrite the return address of
-0.379668	dispatching on every call of
-0.286997	example, if each bit of
-0.174340	with the sign bit of
-0.174340	example, the sign bit of
-0.174340	sets the sign bit of
-0.174340	copies the sign bit of
-0.313275	// set sign bit of
-0.151904	shift down sign bit of
-0.151904	// Set sign bit of
-0.151904	// flip sign bit of
-0.196217	the least significant bit of
-0.317751	language when the optimization of
-0.061872	Many advices on optimization of
-0.061872	Advanced book on optimization of
-0.061872	www.amd.com. Advices on optimization of
-0.235252	write _mm_add_epi16(a,b). Two libraries of
-0.305556	C, specifying that pointers of
-0.308621	specifying that two pointers of
-0.235429	make multiple versions even of
-0.279526	9.3 shows, the method of
-0.254812	iterations back. The method of
-0.254812	loop count. The method of
-0.208218	used. A newer method of
-0.208218	The old C-style method of
-0.208218	disadvantages. The original method of
-0.013381	the index is out of
-0.006639	array index is out of
-0.138501	Interpreted languages are out of
-0.138501	calculations simultaneously or out of
-0.138501	return 0 if out of
-0.138501	index is not out of
-0.063753	can execute instructions out of
-0.063753	by executing instructions out of
-0.138501	Move the conversions out of
-0.138501	write FatalAppExitA(0,"Array index out of
-0.041446	}; // Index out of
-0.009997	<< "Error: Index out of
-0.030708	can be moved out of
-0.030708	may be moved out of
-0.138501	of n being out of
-0.063753	used for jumping out of
-0.063753	destructors after jumping out of
-0.138501	we are breaking out of
-0.234712	make a zip file of
-0.019812	includes only the part of
-0.009793	software that is part of
-0.009793	a parameter is part of
-0.009793	stack is a part of
-0.009793	how often a part of
-0.019812	X (Darwin) are part of
-0.019812	usually included as part of
-0.019812	that is not part of
-0.009793	certain that this part of
-0.009793	measurements on this part of
-0.019812	of time. A part of
-0.000718	in the same part of
-0.019812	following cases: If part of
-0.019812	to see which part of
-0.019812	other reasons, but part of
-0.034612	occurs in each part of
-0.034612	many times each part of
-0.019812	in a static part of
-0.019812	not include any part of
-0.002670	in the critical part of
-0.013519	if the critical part of
-0.005911	in a critical part of
-0.005911	the same critical part of
-0.000105	the most critical part of
-0.000734	The most critical part of
-0.019812	If you access part of
-0.019812	is an important part of
-0.019812	least a large part of
-0.019812	in a small part of
-0.019812	framework. The optimized part of
-0.019812	support and another part of
-0.019812	activate a particular part of
-0.019812	the most significant part of
-0.019812	the most time-consuming part of
-0.019812	C++ program (or part of
-0.019812	if the time-critical part of
-0.019812	put the task-specific part of
-0.141793	manipulate all the bits of
-0.141793	compiler interpret the bits of
-0.174539	8 or 16 bits of
-0.174539	the lower 16 bits of
-0.177223	for example 32 bits of
-0.177223	with accessing 32 bits of
-0.079634	the upper 32 bits of
-0.079634	Get upper 32 bits of
-0.185427	least significant n bits of
-0.185427	into the individual bits of
-0.447520	Using the vector operations of
-0.143234	size of the type of
-0.143234	processor and the type of
-0.143234	done on the type of
-0.143234	templates where the type of
-0.143234	valid. Re-interpreting the type of
-0.178407	the size and type of
-0.253636	class declaration. The type of
-0.178407	different for each type of
-0.178407	work with any type of
-0.178407	types The return type of
-0.178407	for the appropriate type of
-0.035745	faster. In the case of
-0.035745	all. In the case of
-0.035745	occur. In the case of
-0.035745	60. In the case of
-0.127540	a function in case of
-0.076098	long time in case of
-0.076098	the program in case of
-0.076098	safe way in case of
-0.036384	clean up in case of
-0.036384	cleaned up in case of
-0.017811	an exception in case of
-0.076098	signed integers in case of
-0.076098	denormal numbers in case of
-0.076098	up everything in case of
-0.076098	be justified in case of
-0.217485	in all possible cases of
-0.045027	on. 7.31 Other cases of
-0.045027	61 7.31 Other cases of
-0.217485	in some rare cases of
-0.235008	the Xnu project. Some of
-0.275138	advice applies to arrays of
-0.221105	you can make arrays of
-0.221105	or a few arrays of
-0.323291	to overlap the calculations of
-0.227528	do the necessary calculations of
-0.012188	two or more versions of
-0.070410	to the different versions of
-0.070410	contain the different versions of
-0.029362	functions The different versions of
-0.029362	sets. The different versions of
-0.060853	compatible with different versions of
-0.060853	stub. If different versions of
-0.060853	have two different versions of
-0.224935	may make multiple versions of
-0.161465	for making multiple versions of
-0.161465	automatically generate multiple versions of
-0.093920	there are two versions of
-0.093920	to make two versions of
-0.108684	to advertise new versions of
-0.108684	10.1 Hyperthreading Some versions of
-0.108684	Currently includes optimized versions of
-0.108684	to make special versions of
-0.108684	available in newer versions of
-0.108684	3 The latest versions of
-0.108684	to the CPU-specific versions of
-0.108684	where necessary. Fast versions of
-0.112802	can block the execution of
-0.112802	possibly block the execution of
-0.221162	switch occurs during execution of
-0.193124	wait for the result of
-0.026793	depends on the result of
-0.055348	depend on the result of
-0.193124	But when the result of
-0.193124	to see the result of
-0.193124	iteration needs the result of
-0.059700	often as a result of
-0.059700	occur as a result of
-0.178072	very fast. The result of
-0.178072	as <. The result of
-0.283206	store the intermediate result of
-0.328967	double takes 8 bytes of
-0.220613	covers 64 consecutive bytes of
-0.220613	(less than 65 bytes of
-0.461277	of the first element of
-0.131854	faster than the speed of
-0.131854	used, while the speed of
-0.131854	can improve the speed of
-0.131854	speed Testing the speed of
-0.131854	that measures the speed of
-0.276812	used data. The speed of
-0.193024	statements The high speed of
-0.234836	facilities that do much of
-0.233988	units, memory ports, etc. of
-0.116622	no check for overflow of
-0.211647	negative result. An overflow of
-0.211647	variables. A positive overflow of
-0.247919	these types to integers of
-0.247919	each, or two integers of
-0.196983	bits each, four integers of
-0.196983	bits each, eight integers of
-0.196983	instructions cannot multiply integers of
-0.196983	contain either sixteen integers of
-0.003160	x to the power of
-0.000058	to is a power of
-0.000029	that is a power of
-0.000058	This is a power of
-0.000029	size is a power of
-0.000058	constant is a power of
-0.000058	matrix is a power of
-0.000058	columns is a power of
-0.000029	N is a power of
-0.000058	factor is a power of
-0.000058	abc is a power of
-0.000029	divisor is a power of
-0.000058	interval is a power of
-0.000234	always be a power of
-0.000078	preferably be a power of
-0.000311	division by a power of
-0.000311	multiply by a power of
-0.000311	Multiplying by a power of
-0.000935	is not a power of
-0.000935	a matrix a power of
-0.000935	not been a power of
-0.000935	of columns a power of
-0.000935	for N a power of
-0.003281	is a high power of
-0.026965	the high processing power of
-0.026965	utilize the computational power of
-0.095676	is a common cause of
-0.095676	the most common cause of
-0.218566	is a frequent cause of
-0.234122	type holds a precision of
-0.053262	burden is the calculation of
-0.025815	count and the calculation of
-0.025815	B, and the calculation of
-0.053262	Nothing in the calculation of
-0.053262	needed for the calculation of
-0.053262	how efficient the calculation of
-0.053262	B before the calculation of
-0.053262	roll out the calculation of
-0.053262	speed up the calculation of
-0.053262	and start the calculation of
-0.053262	software specifies the calculation of
-0.053262	latter case, the calculation of
-0.053262	can begin the calculation of
-0.053262	has finished the calculation of
-0.075076	function calls. The calculation of
-0.075076	Calculate polynomial The calculation of
-0.075076	generates 127. The calculation of
-0.075076	not supported. The calculation of
-0.225457	compiler if the uses of
-0.225457	time. Four typical uses of
-0.321336	space for the parameters of
-0.234362	critical. The worst problem of
-0.280423	size. The alternative solution of
-0.225769	library. The radical solution of
-0.183357	set gives the advantage of
-0.009053	list[x]; } The advantage of
-0.009053	the program. The advantage of
-0.009053	level-1 cache. The advantage of
-0.009053	C++ compilers. The advantage of
-0.009053	is enabled. The advantage of
-0.009053	calculation faster. The advantage of
-0.009053	of iterations. The advantage of
-0.009053	variable m. The advantage of
-0.078328	version that takes advantage of
-0.004503	order to take advantage of
-0.004503	how to take advantage of
-0.001496	that can take advantage of
-0.001496	you can take advantage of
-0.001496	program can take advantage of
-0.000747	You can take advantage of
-0.001496	We can take advantage of
-0.078328	set. The main advantage of
-0.078328	to take maximum advantage of
-0.078328	Typically, the full advantage of
-0.078328	15.1c? We took advantage of
-0.234286	operating system for support of
-0.233558	libraries where only few of
-0.061844	here is a list of
-0.061844	Here is a list of
-0.133981	3 for a list of
-0.150404	instruction set?". A list of
-0.150404	of a long list of
-0.014360	use a negative list of
-0.014360	make a negative list of
-0.014360	contains a negative list of
-0.110051	make a positive list of
-0.110051	contains a positive list of
-0.150404	even the smallest list of
-0.234223	15.1a to 15.1c would of
-0.215664	4 floats A structure of
-0.215664	of the whole structure of
-0.215664	where the logic structure of
-0.054579	recognize that the values of
-0.054579	Assuming that the values of
-0.117017	only on the values of
-0.117017	will make the values of
-0.117017	and show the values of
-0.273090	bit }; The values of
-0.223222	and message systems. All of
-0.223222	file formats. Comments All of
-0.312366	to test the sign of
-0.312366	can change the sign of
-0.246401	will use the copy of
-0.195633	making an unused copy of
-0.149175	make a non-inlined copy of
-0.149175	making a non-inlined copy of
-0.195633	saving a backup copy of
-0.120369	to calculate the addresses of
-0.120369	to control the addresses of
-0.120369	file includes the addresses of
-0.120369	for calculating the addresses of
-0.277206	the object. The allocation of
-0.222931	if it involves allocation of
-0.312820	fighting with the problems of
-0.222494	less susceptible to problems of
-0.233368	the larger address space of
-0.027936	advantageous if a lot of
-0.027936	Func with a lot of
-0.027936	can use a lot of
-0.009114	to do a lot of
-0.004533	can do a lot of
-0.013744	that take a lot of
-0.013744	conversions take a lot of
-0.027936	may cause a lot of
-0.013744	program uses a lot of
-0.013744	application uses a lot of
-0.027936	soon get a lot of
-0.027936	often contains a lot of
-0.027936	or require a lot of
-0.027936	can save a lot of
-0.027936	they waste a lot of
-0.027936	can spend a lot of
-0.006817	can consume a lot of
-0.027936	still consumes a lot of
-0.027936	software installed, a lot of
-0.027936	of RAM, a lot of
-0.015740	mathematical functions. A lot of
-0.015740	into vectors. A lot of
-0.334034	latency of the multiplication of
-0.177036	C99 standard. An implementation of
-0.225552	has a good implementation of
-0.265651	than a hardware implementation of
-0.136715	make a complicated implementation of
-0.136715	the most complicated implementation of
-0.177036	infinity. A typical implementation of
-0.210135	of procedure 4 Most of
-0.210135	user interface framework Most of
-0.210135	with limited resources. Most of
-0.191211	class containing the members of
-0.066976	Variables that are members of
-0.066976	if they are members of
-0.146199	or class with members of
-0.249159	(properties) The data members of
-0.146199	the saved variable members of
-0.066976	the class. Data members of
-0.066976	data together. Data members of
-0.146199	one instance. Non-static members of
-0.232923	faster than other methods of
-0.274533	performance during the development of
-0.220571	fast and easy development of
-0.208659	zero within a block of
-0.208659	allocate one big block of
-0.208659	handle its own block of
-0.304132	?Func@@YAXQAHAAH@Z is the name of
-0.274417	library www.agner.org/optimize/asmlib.zip. The name of
-0.233765	because of the needs of
-0.319665	i<100; i++)a[i]=2*i; The conversion of
-0.242622	languages have the disadvantage of
-0.121815	explained below. The disadvantage of
-0.121815	never called. The disadvantage of
-0.121815	64-bit Windows. The disadvantage of
-0.121815	another array. The disadvantage of
-0.094581	page 107. A disadvantage of
-0.094581	ASCII form. A disadvantage of
-0.051400	The most important disadvantage of
-0.051400	important. An important disadvantage of
-0.094581	CPU time. Another disadvantage of
-0.094581	code itself. Another disadvantage of
-0.109711	compact. The biggest disadvantage of
-0.309195	transfer of a parameter of
-0.195643	is a useful source of
-0.195643	is a common source of
-0.195643	is a frequent source of
-0.195643	as a valuable source of
-0.017779	itself, and the cost of
-0.017779	consider if the cost of
-0.017779	faster at the cost of
-0.017779	market. But the cost of
-0.017779	NAN. Avoiding the cost of
-0.017779	dispatching. Underestimating the cost of
-0.026949	the thread. The cost of
-0.026949	cores. 60 The cost of
-0.026949	be defined. The cost of
-0.026949	multithreaded applications: The cost of
-0.161767	a high overhead cost of
-0.233262	that shares the resources of
-0.320624	function scans a string of
-0.201416	address of the end of
-0.297652	handling in the end of
-0.201416	elements at the end of
-0.201416	doing. See the end of
-0.201416	Writing past the end of
-0.147219	= point to end of
-0.147219	; compare with end of
-0.147219	align ; mark end of
-0.043654	and 90 for examples of
-0.043654	page 103 for examples of
-0.043654	See www.agner.org/optimize/cppexamples.zip for examples of
-0.146844	also find more examples of
-0.146844	have seen many examples of
-0.146844	have provided several examples of
-0.146844	at www.agner.org/optimize/cppexamples.zip contains examples of
-0.146844	to a[i] More examples of
-0.232829	precision math allow addition of
-0.288309	(*.dll, *.so). The mechanism of
-0.226411	() { // Table of
-0.226411	// n! // Table of
-0.232601	the common language runtime of
-0.207169	programming as a means of
-0.021099	into one by means of
-0.110570	find the first byte of
-0.110570	localize the first byte of
-0.206269	clock cycles per byte of
-0.007278	rather than the parts of
-0.007278	to optimize the parts of
-0.014680	not possible when parts of
-0.014680	times and make parts of
-0.003624	manipulate the different parts of
-0.003624	stored in different parts of
-0.001808	or between different parts of
-0.001808	Switch between different parts of
-0.007278	costs to other parts of
-0.007278	which affects other parts of
-0.014680	the most used parts of
-0.002897	identify the critical parts of
-0.002897	advices in critical parts of
-0.002897	important or critical parts of
-0.002897	the most critical parts of
-0.002897	Optimizing less critical parts of
-0.014680	other system- specific parts of
-0.014680	likelihood that certain parts of
-0.014680	the most time-consuming parts of
-0.014680	CPU brand. Critical parts of
-0.014680	if other nearby parts of
-0.131791	the number and types of
-0.127213	by using different types of
-0.058965	have two different types of
-0.058965	correspondingly two different types of
-0.127213	of CPUs, different types of
-0.131791	can reduce other types of
-0.190771	of defining integer types of
-0.131791	mix the two types of
-0.131791	can reduce some types of
-0.006220	an inline function instead of
-0.006220	inserts built-in code instead of
-0.006220	using short int instead of
-0.006220	of test data instead of
-0.006220	it has i instead of
-0.006220	8.15a were float instead of
-0.006220	to the object instead of
-0.006220	a lookup table instead of
-0.006220	stored in registers instead of
-0.006220	error handling system instead of
-0.006220	by using references instead of
-0.006220	performance monitor counters instead of
-0.006220	effect with templates instead of
-0.006220	example use #if instead of
-0.003099	by using rounding instead of
-0.003099	this). Use rounding instead of
-0.006220	48 Use macros instead of
-0.006220	intermediate file format instead of
-0.006220	the option -fpie instead of
-0.006220	const or typedef instead of
-0.006220	char (or int) instead of
-0.006220	(& and |) instead of
-0.232342	output are unacceptable. Each of
-0.271689	turning off all optimizations of
-0.218058	to do interprocedural optimizations of
-0.217332	in the code. Many of
-0.217332	with other microprocessors. Many of
-0.270445	only have four numbers of
-0.216959	can have eight numbers of
-0.221636	to use the vectors of
-0.173532	a time in vectors of
-0.221636	parallel calculations on vectors of
-0.173532	x^4 // Define vectors of
-0.173532	instruction set (128 vectors of
-0.017365	and after the piece of
-0.017365	to insert the piece of
-0.003418	version of a piece of
-0.003418	that if a piece of
-0.000852	to make a piece of
-0.003418	part. If a piece of
-0.003418	idea how a piece of
-0.003418	to optimize a piece of
-0.003418	to generate a piece of
-0.003418	compiler optimizes a piece of
-0.003418	for studying a piece of
-0.008596	share the same piece of
-0.008596	executing the same piece of
-0.035454	executing a critical piece of
-0.120592	is a small piece of
-0.035454	optimizing a particular piece of
-0.231891	allocation are: The process of
-0.094177	most of the advantages of
-0.094177	many of the advantages of
-0.002070	is used. The advantages of
-0.008338	end user. The advantages of
-0.008338	with pointers. The advantages of
-0.008338	programming style. The advantages of
-0.008338	the operands. The advantages of
-0.008338	+ 1.0f;} The advantages of
-0.008338	allocated dynamically. The advantages of
-0.008338	(80 bits). The advantages of
-0.081523	Weighing the above advantages of
-0.269526	// OR the results of
-0.297358	matrix sizes. The results of
-0.268925	compile time. The storage of
-0.268925	can make thread-local storage of
-0.039365	there are different ways of
-0.019237	are several different ways of
-0.019237	has several different ways of
-0.130747	are other possible ways of
-0.130747	also have fast ways of
-0.270808	135 show various ways of
-0.014347	there are smarter ways of
-0.231701	Boolean operands The operands of
-0.075849	variable is the range of
-0.075849	to limit the range of
-0.167807	and underflow. The range of
-0.167807	because the same range of
-0.167807	analysis The live range of
-0.329045	memory at the start of
-0.198715	function name ; start of
-0.198715	the framework, during start of
-0.306817	load all the modules of
-0.231643	access. The execution core of
-0.021258	is that the overhead of
-0.021258	to avoid the overhead of
-0.021258	driver involves the overhead of
-0.021258	without invoking the overhead of
-0.010499	the function. The overhead of
-0.010499	member function. The overhead of
-0.021258	inlining are: The overhead of
-0.021258	between threads. The overhead of
-0.152763	away the extra overhead of
-0.131584	of the large overhead of
-0.231797	Hardware updating. The change of
-0.262875	in use. The installation of
-0.086275	time both during installation of
-0.086275	framework itself, during installation of
-0.045300	foremost, in the choice of
-0.022059	is that the choice of
-0.022059	discussion that the choice of
-0.045300	computers. Today, the choice of
-0.015681	hardware platform The choice of
-0.015681	Windows applications. The choice of
-0.015681	B values. The choice of
-0.015681	Graphics accelerators The choice of
-0.015681	best algorithm. The choice of
-0.084766	with a suitable choice of
-0.286459	or with an index of
-0.122003	declared whenever an instance of
-0.272923	more than one instance of
-0.122003	stored with each instance of
-0.027440	make a new instance of
-0.027440	generate a new instance of
-0.122003	that the next instance of
-0.122003	polymorphic classes. Each instance of
-0.261012	you compile the output of
-0.450971	what the assembly output of
-0.231419	that are declared outside of
-0.230110	on the essential task of
-0.122100	because of the costs of
-0.122100	focus on the costs of
-0.122100	for avoiding the costs of
-0.039712	an exception. The costs of
-0.019402	3 1.1 The costs of
-0.019402	information. 1.1 The costs of
-0.598604	to call the destructor of
-0.004880	............................................................................... 8 2.5 Choice of
-0.004880	C++ compilers. 2.5 Choice of
-0.004880	........................................................................................... 6 2.3 Choice of
-0.004880	this manual. 2.3 Choice of
-0.004880	data cache. 2.2 Choice of
-0.004880	....................................................................................... 5 2.2 Choice of
-0.004880	optimal platform 2.1 Choice of
-0.004880	........................................................................................... 5 2.1 Choice of
-0.004880	libraries........................................................................................ 12 2.7 Choice of
-0.004880	are undocumented. 2.7 Choice of
-0.004880	.................................................................................................... 10 2.6 Choice of
-0.004880	another compiler. 2.6 Choice of
-0.004880	program optimization. 2.4 Choice of
-0.004880	system......................................................................................... 6 2.4 Choice of
-0.053553	influence on the efficiency of
-0.053553	difference between the efficiency of
-0.025951	databases, etc. The efficiency of
-0.012782	25 7 The efficiency of
-0.012782	tool. 7 The efficiency of
-0.025951	7.13 Loops The efficiency of
-0.114650	explaining the relative efficiency of
-0.230110	language defines an algorithm of
-0.209497	which calculates the sum of
-0.209497	case is a sum of
-0.230516	different types or strings of
-0.212083	platforms and the possibility of
-0.212083	thought about the possibility of
-0.212083	compilers offer the possibility of
-0.164510	a very obscure possibility of
-0.040208	memory. See the discussion of
-0.009708	16 for a discussion of
-0.009708	87 for a discussion of
-0.019639	and 120 for discussion of
-0.019639	page 93 for discussion of
-0.040208	31 for more discussion of
-0.040208	error prone. A discussion of
-0.009708	for a further discussion of
-0.003212	150 for further discussion of
-0.003212	101 for further discussion of
-0.003212	153 for further discussion of
-0.230114	Windows allows a maximum of
-0.206429	alignment automatically. The alignment of
-0.206429	or __attribute__((aligned(16))). Specifies alignment of
-0.157772	added to the offset of
-0.157772	compact if the offset of
-0.157772	simply stores the offset of
-0.163647	b;} }; The offset of
-0.212976	if the first operand of
-0.132306	If the first operand of
-0.000062	is for the sake of
-0.000062	or for the sake of
-0.000062	function for the sake of
-0.000062	cache for the sake of
-0.000062	size for the sake of
-0.000062	version for the sake of
-0.000062	instructions for the sake of
-0.000062	1 for the sake of
-0.000062	files for the sake of
-0.000062	them for the sake of
-0.000062	chosen for the sake of
-0.000062	included for the sake of
-0.000062	maintained for the sake of
-0.164337	only, then the effect of
-0.134684	xplus2() { The effect of
-0.134684	infinite loop. The effect of
-0.134684	parentheses manually. The effect of
-0.001745	lower; and the amount of
-0.001745	so that the amount of
-0.001745	useful when the amount of
-0.001745	it increases the amount of
-0.001745	to reserve the amount of
-0.001745	to minimize the amount of
-0.010580	because the total amount of
-0.010580	consume a significant amount of
-0.002621	to the required amount of
-0.002621	allocates the required amount of
-0.010580	put an equal amount of
-0.010580	takes a considerable amount of
-0.010580	CPU, an insufficient amount of
-0.230114	table takes extra time, of
-0.229712	avoid this wasteful copying of
-0.229913	the most frequent causes of
-0.229030	have a balanced mix of
-0.204310	powerful. The high priority of
-0.256166	and the low priority of
-0.479317	Multithreading The clock frequency of
-0.284617	temp in one iteration of
-0.204310	again for every iteration of
-0.229465	choice for future models of
-0.090408	code addresses. The names of
-0.090408	compile for. The names of
-0.002271	cache. The different kinds of
-0.002271	to do different kinds of
-0.002271	are two different kinds of
-0.002271	are doing different kinds of
-0.002271	to mix different kinds of
-0.011473	time than other kinds of
-0.011473	can cause all kinds of
-0.011473	between the two kinds of
-0.011473	There are four kinds of
-0.011473	can make certain kinds of
-0.002840	manuals. 7.1 Different kinds of
-0.002840	26 7.1 Different kinds of
-0.256397	cache (en.wikipedia.org/wiki/L2_cache). The details of
-0.204515	into the technical details of
-0.180728	adds an extra level of
-0.180728	microarchitecture. A higher level of
-0.180728	when the highest level of
-0.284286	next paragraph. The target of
-0.228699	go outside the bounds of
-0.079096	it requires the loading of
-0.079096	may involve the loading of
-0.175879	because of lazy loading of
-0.089291	latter case the reading of
-0.089291	turn off the reading of
-0.523434	the worst case situation of
-0.015758	of two different implementations of
-0.015758	make two different implementations of
-0.103940	can run. Some implementations of
-0.032109	The most common implementations of
-0.032109	93). All common implementations of
-0.066783	executable code. Most implementations of
-0.066783	have particularly slow implementations of
-0.066783	compilation. Some early implementations of
-0.210153	vector). The first generation of
-0.107660	development, each new generation of
-0.024518	that the next generation of
-0.024518	on the next generation of
-0.208133	in the second generation of
-0.107660	loops or compile-time generation of
-0.285447	alignments and different sizes of
-0.201375	efficiently on all sizes of
-0.023366	changed without the risk of
-0.011526	it involves the risk of
-0.011526	also involves the risk of
-0.074245	// Faster, but risk of
-0.002535	there is no risk of
-0.198850	if a large fraction of
-0.198850	only a small fraction of
-0.163061	step in the sequence of
-0.107503	disadvantage if the sequence of
-0.085809	performed on a sequence of
-0.085809	are doing a sequence of
-0.096161	earlier CPUs. The sequence of
-0.096161	where a long sequence of
-0.008401	limit to the length of
-0.008401	integer if the length of
-0.008401	GHz then the length of
-0.008401	0x20; If the length of
-0.008401	safe unless the length of
-0.008401	by adding the length of
-0.025708	is doubled. The length of
-0.025708	was started. The length of
-0.198607	alternative version. The penalty of
-0.249745	get a misprediction penalty of
-0.049030	not permissible for reasons of
-0.003126	address of the beginning of
-0.000779	relative to the beginning of
-0.012641	sure that the beginning of
-0.012641	coincides with the beginning of
-0.012641	block into the beginning of
-0.003131	use is a matter of
-0.003131	prefer is a matter of
-0.000250	is simply a matter of
-0.006286	is just a matter of
-0.198121	printf(Greek[n]); } The declaration of
-0.198121	making the full declaration of
-0.032976	loop or the series of
-0.005325	chain is a series of
-0.002654	one in a series of
-0.002654	first in a series of
-0.005325	propagated through a series of
-0.005325	have made a series of
-0.005325	mechanism executes a series of
-0.032976	Copyright notice This series of
-0.032976	volumes in this series of
-0.170130	many of the features of
-0.217837	using the optimization features of
-0.170130	the time- consuming features of
-0.002654	stack is a waste of
-0.002654	situation is a waste of
-0.005325	user and a waste of
-0.001325	only be a waste of
-0.005325	may cause a waste of
-0.032976	of frustration and waste of
-0.032976	is a big waste of
-0.032976	is a total waste of
-0.015366	platforms. 3. The microarchitecture of
-0.000034	manual 3: "The microarchitecture of
-0.000236	Manual 3: "The microarchitecture of
-0.074083	if it is independent of
-0.074083	example 11.3 is independent of
-0.163456	that is almost independent of
-0.571251	copy constructors and destructors of
-0.004433	the function in terms of
-0.004433	expensive - in terms of
-0.001104	no cost in terms of
-0.002211	the costs in terms of
-0.002211	also costs in terms of
-0.004433	are costless in terms of
-0.004433	lag. Thinking in terms of
-0.227817	reorder instructions without help of
-0.227240	or class. The transfer of
-0.194178	ways of copying blocks of
-0.194178	section 17.9: "Moving blocks of
-0.026142	32 for an explanation of
-0.026142	43 for an explanation of
-0.026142	81 for an explanation of
-0.012874	CPUs" for an explanation of
-0.083129	A more detailed explanation of
-0.026041	Test with different brands of
-0.026041	on seven different brands of
-0.026041	that treats different brands of
-0.083336	not for other brands of
-0.083336	well on all brands of
-0.083336	performance of competing brands of
-0.281438	optimally on any brand of
-0.227240	times faster. The logic of
-0.086430	the book "Performance Optimization of
-0.086430	Adolfy Hoisie: "Performance Optimization of
-0.009345	class that takes care of
-0.009345	library that takes care of
-0.018895	the compiler takes care of
-0.006990	destructors to take care of
-0.006990	coprocessors to take care of
-0.006990	thread can take care of
-0.006990	tread can take care of
-0.226239	should save one unit of
-0.008576	statements is a kind of
-0.008576	is also a kind of
-0.005699	not make this kind of
-0.005699	CPU supports this kind of
-0.005699	To prevent this kind of
-0.017325	use a different kind of
-0.017325	tell explicitly what kind of
-0.017325	An even worse kind of
-0.280224	prepared for several iterations of
-0.225916	for prediction and misprediction of
-0.323758	systems allow lazy binding of
-0.225593	information about the chain of
-0.425720	on the x86 family of
-0.094974	to floating point Conversion of
-0.094974	of registers used. Conversion of
-0.044896	to integer conversion Conversion of
-0.044896	to float conversion Conversion of
-0.094974	to floating point. Conversion of
-0.226239	example to produce tables of
-0.227212	registers are used. Conversions of
-0.110128	accurate for the purpose of
-0.069126	loops // The purpose of
-0.069126	in y. The purpose of
-0.069126	using new. The purpose of
-0.193263	is true. The trick of
-0.193263	/ (b1*b2); The trick of
-0.183103	in advance. The disadvantages of
-0.183103	However, there are disadvantages of
-0.110128	so-called objects are instances of
-0.051582	applied to all instances of
-0.051582	// Make all instances of
-0.110128	hold many renamed instances of
-0.081972	call by the body of
-0.081972	inefficient because the body of
-0.224588	target if the changes of
-0.182768	have implemented a collection of
-0.182768	time consuming. A collection of
-0.058704	programmer to be aware of
-0.058704	dangers to be aware of
-0.126603	should therefore be aware of
-0.110411	/ 2 (be aware of
-0.308685	Using the out-of-order capabilities of
-0.110128	fact that the representation of
-0.110128	DOS compilers). The representation of
-0.110128	8.15b. The integer representation of
-0.166044	1-bit in binary representation of
-0.019858	works only for powers of
-0.019858	the numbers are powers of
-0.019858	are defined as powers of
-0.003247	advantage of using powers of
-0.003247	advise of using powers of
-0.006519	by preferably using powers of
-0.019858	all means avoid powers of
-0.363842	multiplied by a factor of
-0.066576	according to the rules of
-0.066576	even though the rules of
-0.145238	implement the many rules of
-0.000396	it is the responsibility of
-0.000066	It is the responsibility of
-0.037150	cycle is the reciprocal of
-0.222396	an n'th degree polynomial of
-0.267685	rather than type casting of
-0.175363	is safer. Type casting of
-0.010716	visible in the scope of
-0.010716	to make the scope of
-0.001178	is beyond the scope of
-0.078736	function calls. The principle of
-0.078736	go undetected. The principle of
-0.078736	latency and the throughput of
-0.078736	limited by the throughput of
-0.078736	to increase the throughput of
-0.092368	to measure // Number of
-0.010527	each element, bits Number of
-0.092368	class libraries 113 Number of
-0.004553	thanks to the availability of
-0.004553	source, and the availability of
-0.004553	test for the availability of
-0.004553	will delay the availability of
-0.004553	icon signaling the availability of
-0.023258	the application. The availability of
-0.297299	by modifying only half of
-0.174595	result of each step of
-0.222824	through a second step of
-0.023258	the same time regardless of
-0.023258	still the same regardless of
-0.023258	in most cases, regardless of
-0.023258	to be false regardless of
-0.023258	transferred in registers, regardless of
-0.023258	the same name, regardless of
-0.135230	based on just-in-time compilation of
-0.135230	machines use just-in-time compilation of
-0.028661	says that the behavior of
-0.028661	can change the behavior of
-0.028661	to mimic the behavior of
-0.092368	for details. The behavior of
-0.174595	elements in vector Type of
-0.174595	elements, as follows: Type of
-0.009345	it in the form of
-0.009345	classes in the form of
-0.009345	either in the form of
-0.092368	by any other form of
-0.018453	the strict aliasing rule of
-0.306470	this is the job of
-0.019383	determined by the requirements of
-0.019383	influenced by the requirements of
-0.039672	conflicting with the requirements of
-0.028063	may cause a loss of
-0.028063	of overflow and loss of
-0.028063	cause overflow or loss of
-0.028063	with hardly any loss of
-0.028063	to worry about loss of
-0.220366	care of all cleanup of
-0.219856	to force the swapping of
-0.005469	interface to the rest of
-0.005469	language and the rest of
-0.005469	advice in the rest of
-0.005469	so that the rest of
-0.005469	called by the rest of
-0.219856	on the advanced principles of
-0.074062	or the negative effects of
-0.074062	variables. The negative effects of
-0.294484	= 100; // Array of
-0.219856	complicated criteria or lists of
-0.450526	recover in the event of
-0.220366	non-sequential order. The advice of
-0.273723	have no specific recommendation of
-0.219856	and often inefficient. Objects of
-0.220877	for these calculations. Division of
-0.025708	program. 16.2 The pitfalls of
-0.025708	155 16.2 The pitfalls of
-0.113457	The most common pitfalls of
-0.293274	block. This is inefficient, of
-0.221301	limited by the latency of
-0.158323	same as the latency of
-0.028063	most efficiently if pieces of
-0.013805	divided into small pieces of
-0.013805	are typically small pieces of
-0.028063	by joining identical pieces of
-0.028063	parts only. Critical pieces of
-0.221301	stores the time consumption of
-0.158323	style. The time consumption of
-0.220366	and using advanced facilities of
-0.298808	will get time slices of
-0.216072	a prediction or estimate of
-0.216705	way is predicted well, of
-0.087671	constructor" to transfer ownership of
-0.087671	operator that transfers ownership of
-0.087671	object that looses ownership of
-0.011473	many of the drawbacks of
-0.002840	2.8 Overcoming the drawbacks of
-0.035371	the advantages and drawbacks of
-0.216705	fill up the queue of
-0.216072	This requires no modification of
-0.067370	rarely needed. 11 Out of
-0.067370	..................................................................................................... 103 11 Out of
-0.216072	be improved by modifications of
-0.216072	are smaller. The lengths of
-0.035371	divisible by 16. Alignment of
-0.017325	register variables. 9.5 Alignment of
-0.017325	...................................... 88 9.5 Alignment of
-0.035371	16 Table 7.2. Alignment of
-0.067370	for classes. The splitting of
-0.067370	a formalism. The splitting of
-0.216705	belongs to the area of
-0.147144	here because the consequence of
-0.192259	function calls. The consequence of
-0.087671	true, which is 50% of
-0.087671	a is true 50% of
-0.087671	will be mispredicted 50% of
-0.216072	for verifying the functionality of
-0.147684	that requires several layers of
-0.147684	number of separate layers of
-0.216705	specifying the size. Integers of
-0.216072	reflects the conflicting considerations of
-0.055138	the size (in bytes) of
-0.055138	The size (in bytes) of
-0.269443	discussion of the techniques of
-0.035371	and x86-64 platforms. Comparison of
-0.035371	- Table 8.1. Comparison of
-0.017325	this optimization. 8.2 Comparison of
-0.017325	............................................................................................ 66 8.2 Comparison of
-0.216072	done with the resolution of
-0.209840	"best case" values. Which of
-0.056439	to a complete redesign of
-0.056439	data. A complete redesign of
-0.047832	solution is the combination of
-0.047832	constant with a combination of
-0.047832	zero. An OR combination of
-0.047832	out some typical sources of
-0.011473	schemes are frequent sources of
-0.011473	frameworks are frequent sources of
-0.209840	do a thorough analysis of
-0.209840	Automatic updates. Automatic updating of
-0.056439	performance because the contents of
-0.056439	and copy the contents of
-0.210672	mentioned above. The generality of
-0.210672	mainly on my study of
-0.047832	(critical stride) = (number of
-0.047832	cache size) / (number of
-0.047832	(line size) % (number of
-0.121325	if a high degree of
-0.121325	contain a typical degree of
-0.210672	feature that allows overriding of
-0.056439	way to keep track of
-0.056439	objects and keep track of
-0.015366	want to get rid of
-0.015366	Then we get rid of
-0.015366	we don't get rid of
-0.209840	find the optimal decomposition of
-0.047832	program. During the history of
-0.047832	precompiled code. The history of
-0.047832	on the past history of
-0.047832	support for relative addressing of
-0.023258	instruction for self-relative addressing of
-0.023258	set supports self-relative addressing of
-0.047832	; jump to top of
-0.023258	of a ; top of
-0.023258	$B1$2 ebx ; top of
-0.209840	std.org/jtc1/sc22/wg21/docs/TR18015.pdf. OpenMP. www.openmp.org. Documentation of
-0.047832	is achieved when none of
-0.023258	any expression, but none of
-0.023258	to 15.1c, but none of
-0.047832	below. Intrinsic function Size of
-0.047832	Type of elements Size of
-0.047832	of these classes. Size of
-0.281438	overflow. Taking the logarithm of
-0.027304	The allocation and deallocation of
-0.027304	dynamic allocation and deallocation of
-0.121325	cause seven memory allocations of
-0.121325	there are many allocations of
-0.209840	thrown exceptions are indeed of
-0.292953	> b) But beware of
-0.197636	With the high complexity of
-0.197636	poor documentation and lack of
-0.197636	in the disassembly window of
-0.197636	S1 aligned // Structure of
-0.197636	ignored if the goal of
-0.197636	with a 50-50 chance of
-0.197636	to overcome the dangers of
-0.197636	make an approximate comparison of
-0.197636	spot that uses 90% of
-0.073880	Rounding is fast. Value of
-0.073880	Truncation is slow. Value of
-0.073880	loop control branch ahead of
-0.073880	the loop counter ahead of
-0.248653	used most. The opposite of
-0.073880	for calculating the movements of
-0.073880	calculating the physical movements of
-0.073880	Intel compiler is capable of
-0.073880	Modern CPUs are capable of
-0.197636	in only the lowest of
-0.197636	is no heavy marketing of
-0.073880	to a better understanding of
-0.073880	and a basic understanding of
-0.073880	will cause the creation of
-0.073880	+ c) The creation of
-0.197636	OpenMP and automatic parallelization of
-0.197636	to a dramatic degradation of
-0.017325	done a good deal of
-0.017325	get a good deal of
-0.073880	expressions. There are lots of
-0.073880	and databases with lots of
-0.073880	programs, more than 99% of
-0.073880	In other programs, 99% of
-0.035371	............................................................................. 158 18 Overview of
-0.035371	22). 159 18 Overview of
-0.073880	connections rather than sequences of
-0.073880	instructions or small sequences of
-0.197636	expected for further expansions of
-0.035371	memory access 9.1 Caching of
-0.035371	............................................................................................. 87 9.1 Caching of
-0.197636	Agner Fog. Technical University of
-0.073880	// Example 12.8a. Sum of
-0.073880	// Example 12.8b. Sum of
-0.197636	to overcome the obstacle of
-0.248653	is because algebraic manipulations of
-0.197636	instructions. A further extension of
-0.073880	which is only 10% of
-0.073880	b is true 10% of
-0.197636	the columns. Every fourth of
-0.162958	copying. Security. The vulnerability of
-0.162958	is less than 1/50 of
-0.162958	nowadays stress the importance of
-0.162958	9.1. Time for transposition of
-0.162958	in the logical architecture of
-0.162958	loops, then the transformation of
-0.162958	efficiently by better standardization of
-0.162958	frequent allocation and de-allocation of
-0.162958	the function scanf. Violation of
-0.162958	full generality and flexibility of
-0.162958	language with a wealth of
-0.162958	be carried out independently of
-0.162958	that use large amounts of
-0.162958	for installation and uninstallation of
-0.162958	is the binary decimals of
-0.162958	we need metaprogramming. None of
-0.162958	invalid pointers. The absence of
-0.162958	measured separately. The fallacy of
-0.162958	lacks the self-explaining menus of
-0.162958	obsolete within the lifetime of
-0.162958	in the broader perspective of
-0.162958	is the responsi- bility of
-0.162958	operands if the evaluation of
-0.162958	illegitimate copying. The benefits of
-0.162958	platforms if the bias of
-0.162958	more than 2 gigabytes of
-0.162958	that a detailed overview of
-0.162958	and child class. Members of
-0.162958	running in the majority of
-0.162958	compatibility with a lineage of
-0.162958	can give some indication of
-0.162958	set is the scarcity of
-0.162958	conversions are not safe, of
-0.162958	large because the insertion of
-0.162958	comes with most distributions of
-0.162958	can hold 8 double's of
-0.162958	the pros and cons of
-0.162958	has a good knowledge of
-0.162958	follows the mathematical notion of
-0.162958	count on it. Instead of
-0.162958	4. Instruction tables: Lists of
-0.162958	instruction set (called x86) of
-0.162958	call the function billions of
-0.162958	a safe programming practice, of
-0.162958	the data cache. Bit-fields of
-0.162958	b can be omitted, of
-0.162958	defines hardware circuits consisting of
-0.162958	2B. There are hundreds of
-0.162958	need to do searches of
-0.162958	modularity, reusability and systematization of
-0.162958	the function. This fragmentation of
-0.162958	or 64-bit mode. Much of
-0.162958	can replace all occurrences of
-0.162958	the data into groups of
-0.162958	alignment can cause holes of
-0.162958	uint64_t Table 7.1. Sizes of
-0.162958	due to the design of
-0.162958	These systems use segmentation of
-0.162958	information about the dimensions of
-0.162958	final program. This requires, of
-0.162958	using the fundamental laws of
-0.162958	to draw the attention of
-0.162958	development time and maintainability of
-0.162958	a reply about investigation of
-0.162958	its reputation. The compactness of
-0.162958	or not. The advise of
-0.162958	when portability and ease of
-0.162958	there are a couple of
-0.162958	to the first-in-last-out nature of
-0.162958	tricks Michael Abrash: "Zen of
-0.162958	are used by thousands of
-0.162958	Intel compiler in favor of
-0.162958	is an extra layer of
-0.162958	logical structure and clarity of
-0.162958	two or three levels of
-0.162958	have this problem. Vectors of
-0.162958	will be misleading reports of
-0.352328	more important it is to
-0.323107	purpose of this is to
-0.323107	to test this is to
-0.313988	make thread-specific data is to
-0.327685	the while loop is to
-0.099422	have to do is to
-0.099422	you can do is to
-0.332956	good code performance is to
-0.283614	of CPU-intensive software is to
-0.203752	The first way is to
-0.203752	The second way is to
-0.203752	most compatible way is to
-0.230444	18 software optimization is to
-0.230444	program 81 optimization is to
-0.228581	using smart pointers is to
-0.341847	more general method is to
-0.579400	in this case is to
-0.313988	kind of error is to
-0.228581	code is optimized is to
-0.235908	solving the problem is to
-0.021828	to this problem is to
-0.044812	solve this problem is to
-0.358707	then the solution is to
-0.182721	The best solution is to
-0.182721	more complicated solution is to
-0.182721	An alternative solution is to
-0.182721	most clean solution is to
-0.182721	only reasonable solution is to
-0.283614	If the container is to
-0.369869	and table lookup is to
-0.428440	const_cast operator here is to
-0.228581	prevent such errors is to
-0.228581	frequency is limited is to
-0.046872	loop. Another possibility is to
-0.046872	GOT. Another possibility is to
-0.228581	most important thing is to
-0.036176	multiple CPU cores is to
-0.167296	A simple alternative is to
-0.167296	fragmented. An alternative is to
-0.228581	possible pointer aliasing is to
-0.228581	terminated. The purpose is to
-0.228581	new and delete is to
-0.228581	sum. The trick is to
-0.228581	the compiler generates is to
-0.228581	without using exceptions is to
-0.228581	instructions. My recommendation is to
-0.228581	handling cleanup jobs is to
-0.228581	more realistic goal is to
-0.228581	doesn't compromise safety is to
-0.228581	identify performance bottlenecks is to
-0.036923	b[size]; // set a to
-0.291599	When we add a to
-0.048021	sizeof(a)); // copy a to
-0.048021	0.0; // copy a to
-0.235608	propagation and reduce a to
-0.171350	we are converting a to
-0.171350	is implicitly converting a to
-0.235608	Here we prefer a to
-0.337928	inline the function and to
-0.483177	compiling for Windows and to
-0.036841	the terminating zero and to
-0.234837	of these obstacles and to
-0.234837	vector integer operations, and to
-0.234837	a separate module, and to
-0.348203	ultimate solution would be to
-0.235459	whether to repeat or to
-0.235459	to the console or to
-0.288906	that we want it to
-0.233240	iteration. This allows it to
-0.101147	231 then convert it to
-0.101147	compiler must convert it to
-0.233240	efficient than comparing it to
-0.233240	error and compare it to
-0.233240	system which redirects it to
-0.352611	you want the function to
-0.352577	only). Specifies a function to
-0.119866	a[SIZE][SIZE]) { // function to
-0.027009	in matrix // function to
-0.322908	can use this function to
-0.230670	function decides which function to
-0.306109	transferred from one function to
-0.531163	force a member function to
-0.230670	test // Critical function to
-1.373035	part of the code to
-0.494198	dispatching in the code to
-0.494198	pragmas in the code to
-0.329388	immediately before the code to
-0.329388	you want the code to
-0.329388	fine- tune the code to
-0.806422	the piece of code to
-0.640519	of the critical code to
-0.319834	actually add extra code to
-0.261722	compilers need assembly code to
-0.261722	use inline assembly code to
-0.046408	going from AVX code to
-0.046408	transition from AVX code to
-0.310566	transferred as machine code to
-0.235514	be designed so as to
-0.235514	coding rules apply as to
-0.235270	tell the compiler not to
-0.235270	signed. Be sure not to
-0.237187	testing and maintenance - to
-0.098077	in the program than to
-0.098077	load a program than to
-0.224968	a + b than to
-0.224968	on compiler optimization than to
-0.224968	and | operations than to
-0.279515	for each thread than to
-0.224968	re-use a container than to
-0.224968	big memory block than to
-0.224968	been accessed recently than to
-0.224968	polygon or bitmap than to
-0.224968	to write 2.0/3.0 than to
-0.224968	objects (memory pooling) than to
-0.035225	possible for the compiler to
-0.035225	difficult for the compiler to
-0.035225	easier for the compiler to
-0.389575	converted by the compiler to
-0.481941	rely on the compiler to
-0.070853	it allows the compiler to
-0.033983	This allows the compiler to
-0.621348	to tell the compiler to
-0.070853	may enable the compiler to
-0.070853	will enable the compiler to
-0.070853	sets enable the compiler to
-0.260899	will allow the compiler to
-0.260899	cannot expect the compiler to
-0.052027	This enables the compiler to
-0.260899	operator forces the compiler to
-0.053470	cannot expect a compiler to
-0.211724	for a Windows compiler to
-0.211724	expect a particular compiler to
-0.234124	Function template for x to
-0.234124	template to get x to
-0.234124	Example 15.1a. Calculate x to
-0.234124	// constructor initializes x to
-0.031091	code that allows you to
-0.031091	container that allows you to
-0.064580	Intel compiler allows you to
-0.100870	instructions that allow you to
-0.100870	32-bit systems allow you to
-0.286657	saving registers that have to
-0.365821	we do not have to
-0.312512	programmer does not have to
-0.171725	cases you may have to
-0.171725	The program may have to
-0.171725	cores. You may have to
-0.171725	history, etc. may have to
-0.159427	so that you have to
-0.032063	calculations then you have to
-0.032063	vectors then you have to
-0.032063	numbers, then you have to
-0.104305	endian systems you have to
-0.104305	this case you have to
-0.104305	operations. All you have to
-0.104305	which optimizations you have to
-0.104305	way. Here you have to
-0.104305	out-of-order execution, you have to
-0.104305	optimize anything, you have to
-0.281999	four, we will have to
-0.235402	data. The data have to
-0.299043	unwinding. All functions have to
-0.235402	But we do have to
-0.268108	multithreading that we have to
-0.025959	111 } You have to
-0.025959	intrinsic functions You have to
-0.025959	error handling. You have to
-0.025959	this manual. You have to
-0.025959	page 72. You have to
-0.025959	optimization job. You have to
-0.185834	Therefore, micro- processors have to
-0.185834	runtime address calculations have to
-0.185834	if different versions have to
-0.067747	that it doesn't have to
-0.452034	the compiler doesn't have to
-0.235402	then we would have to
-0.185834	all five values have to
-0.008741	that you don't have to
-0.017661	Therefore, you don't have to
-0.008741	that we don't have to
-0.017661	so we don't have to
-0.185834	because all caches have to
-0.289379	that you want this to
-0.233656	instruction and expect this to
-0.233656	sar ebx,1 adds this to
-0.345878	don't have the time to
-0.404952	it takes more time to
-0.260543	and take more time to
-0.260543	functions take more time to
-0.246832	takes 40% more time to
-0.354579	at the same time to
-0.354579	take the same time to
-0.218470	it obviously takes time to
-0.455933	take a long time to
-0.210684	times as long time to
-0.210684	takes too long time to
-1.106812	known at compile time to
-0.464252	It takes longer time to
-0.391007	that the response time to
-0.274548	because the response time to
-0.303935	can cause the memory to
-0.303935	free) causes the memory to
-0.335939	the swapping of memory to
-0.036836	data from static memory to
-0.036836	table from static memory to
-0.036836	list from static memory to
-0.036836	copied from static memory to
-0.225103	system to swap memory to
-0.344538	you should look at to
-0.352344	for converting the data to
-0.234616	need to organize data to
-1.118039	parts of the program to
-0.338630	just want the program to
-0.338630	to modify the program to
-0.317075	switch in your program to
-0.303571	extra iteration that has to
-0.321811	declared. Therefore, it has to
-0.316390	else. System code has to
-0.353129	because the compiler has to
-0.338091	module. The compiler has to
-0.233511	37 A compiler has to
-0.400193	error-prone. The program has to
-0.390860	then the pointer has to
-0.206559	b because b has to
-0.316916	if the user has to
-0.154411	that a user has to
-0.206559	the array element has to
-0.206559	Each cache line has to
-0.206559	point rounding mode has to
-0.206559	where each addition has to
-0.206559	the user actually has to
-0.206559	then the offset has to
-0.258701	exception then F1 has to
-0.206559	end user who has to
-0.234438	giving this example only to
-0.290268	apply CPU dispatching only to
-0.453369	possible for the CPU to
-0.321417	We want the CPU to
-0.321417	mechanism allows the CPU to
-0.321417	profiler tells the CPU to
-0.236932	(bit scan forward) instruction to
-0.215147	are sure to point to
-0.215147	; edx = point to
-0.215147	and makes it point to
-0.215147	call it will point to
-0.164289	than from floating point to
-0.164289	conversion from floating point to
-0.164289	Conversion from floating point to
-0.021744	different types cannot point to
-0.094385	the objects they point to
-0.094385	the texts they point to
-0.215147	; Induction++; ; point to
-0.322713	be visible at all to
-0.837987	method can be used to
-0.506010	metaprogramming can be used to
-0.230265	important than it used to
-0.230265	experience to get used to
-0.350627	will cause the cache to
-0.315634	clause. Comparing an integer to
-0.315634	a[i]; Converting an integer to
-0.296339	adding one more integer to
-0.097133	avoid conversions from integer to
-0.097133	enabled. Conversion from integer to
-0.306527	convert the unsigned integer to
-0.397982	of a signed integer to
-0.233687	certain options are set to
-0.233687	have its pointer set to
-0.235756	pointer to one class to
-0.725677	things you can do to
-0.322680	the programmer can do to
-0.341235	some cases, for example to
-0.314981	brands of C++ compilers to
-0.536185	several different C++ compilers to
-0.193216	of float or double to
-0.085950	from float or double to
-0.312571	buffer with fixed size to
-0.328784	there is a pointer to
-0.107127	class to a pointer to
-0.107127	pointer to a pointer to
-0.182832	converted to a pointer to
-0.107127	type-casted to a pointer to
-0.162540	block and a pointer to
-0.162540	example, when a pointer to
-0.162540	class has a pointer to
-0.162540	to make a pointer to
-0.226179	and return a pointer to
-0.376945	class through a pointer to
-0.162540	mechanism stores a pointer to
-0.162540	for converting a pointer to
-0.162540	and returns a pointer to
-0.162540	// Returns a pointer to
-0.185352	a reference or pointer to
-0.117304	sets a function pointer to
-0.279670	type-casting its 'this' pointer to
-0.019281	InstructionSet(); // Set pointer to
-0.023188	each element in b to
-0.413501	binary value of i to
-0.278563	The conversion of i to
-0.227556	takes to add i to
-0.227556	or by type-casting i to
-0.235587	example, to convert float to
-0.345013	to such an object to
-0.229800	block from one object to
-0.229800	adding the first object to
-0.326662	you want a number to
-0.336234	a floating point number to
-0.227367	a false model number to
-0.047565	add the keyword static to
-0.047565	Add the keyword static to
-0.057892	it is more efficient to
-0.027982	It is more efficient to
-0.042317	may be more efficient to
-0.201602	is often more efficient to
-0.271740	is much more efficient to
-0.421780	is slightly more efficient to
-0.534101	end of the array to
-0.312805	to set an array to
-0.312805	or setting an array to
-0.001164	that it is possible to
-0.001164	if it is possible to
-0.000388	cases it is possible to
-0.001164	But it is possible to
-0.001164	whether it is possible to
-0.001164	cases, it is possible to
-0.001164	Here it is possible to
-0.001164	Nevertheless, it is possible to
-0.001164	algebra, it is possible to
-0.001164	see, it is possible to
-0.001164	design, it is possible to
-0.006618	} It is possible to
-0.006618	object It is possible to
-0.006618	time. It is possible to
-0.006618	used. It is possible to
-0.006618	processors. It is possible to
-0.006618	object. It is possible to
-0.006618	critical. It is possible to
-0.006618	vectorization. It is possible to
-0.006618	input. It is possible to
-0.006618	purpose. It is possible to
-0.006618	context. It is possible to
-0.006618	148 It is possible to
-0.006618	happening. It is possible to
-0.006618	indeed. It is possible to
-0.006618	57). It is possible to
-0.006618	sizes? It is possible to
-0.006852	it may be possible to
-0.006852	It may be possible to
-0.013815	It should be possible to
-0.013815	neverthe- less be possible to
-0.006852	it might be possible to
-0.006852	It might be possible to
-0.048433	to make it possible to
-0.113807	that make it possible to
-0.007065	it makes it possible to
-0.003518	This makes it possible to
-0.007065	library makes it possible to
-0.028975	How was it possible to
-0.023724	it is not possible to
-0.065906	is therefore not possible to
-0.010320	It is also possible to
-0.020890	it is often possible to
-0.020890	It is often possible to
-0.090365	is not always possible to
-0.090365	It is sometimes possible to
-0.331038	on deciding which version to
-0.236193	an unused fourth value to
-0.292005	for transferring composite objects to
-0.117533	time that it takes to
-0.051637	more than it takes to
-0.008958	the time it takes to
-0.000865	The time it takes to
-0.350239	union forces the variable to
-0.224451	can cause other variables to
-0.224451	by setting these variables to
-0.330357	doesn't need induction variables to
-0.224451	16-bit Windows, allow variables to
-0.234842	then it must return to
-0.227339	2;} // add 2 to
-0.022707	i); // Add 2 to
-0.310206	you expect the table to
-0.310206	that copies the table to
-0.405739	in a virtual table to
-0.341638	may cause the software to
-0.306626	is common for software to
-0.001465	disable it in order to
-0.001465	the code in order to
-0.000732	of data in order to
-0.000732	arranging data in order to
-0.001465	each other in order to
-0.001465	instruction set in order to
-0.001465	line size in order to
-0.001465	of i in order to
-0.001465	point variables in order to
-0.001465	of 2 in order to
-0.001465	non-sequential order in order to
-0.001465	language elements in order to
-0.001465	declared const in order to
-0.001465	by 8 in order to
-0.001465	to unsigned in order to
-0.001465	of operations in order to
-0.000366	several times in order to
-0.001465	very big in order to
-0.001465	array element in order to
-0.001465	debug information in order to
-0.001465	round addresses in order to
-0.001465	the end in order to
-0.001465	than needed in order to
-0.001465	joined together in order to
-0.001465	of bool in order to
-0.001465	worst-case conditions in order to
-0.001465	the right in order to
-0.001465	of cores in order to
-0.001465	user input in order to
-0.001465	multiple blocks in order to
-0.001465	is low in order to
-0.001465	different algorithms in order to
-0.001465	specific purpose in order to
-0.001465	calling itself in order to
-0.001465	programming principles in order to
-0.001465	software package in order to
-0.001465	there is, in order to
-0.001465	of bookkeeping in order to
-0.001465	disturbing influences in order to
-0.001465	do experiments in order to
-0.001465	develop- ment in order to
-0.001465	of randomness in order to
-0.001465	then de-referenced in order to
-0.001465	planning phase in order to
-0.001465	table (GOT) in order to
-0.001465	* sizeof(float) in order to
-0.023546	2.0; } In order to
-0.023546	of B. In order to
-0.023546	course system-specific. In order to
-0.312979	of which code branch to
-0.297630	there is a way to
-0.223532	131) shows a way to
-0.066133	fine-grained parallelism. The way to
-0.066133	physical factors. The way to
-0.066133	is the only way to
-0.066133	aliasing. The only way to
-0.107518	there is no way to
-0.183298	There is no way to
-0.144177	an even faster way to
-0.188969	a very useful way to
-0.008257	is the best way to
-0.008257	Obviously, the best way to
-0.008257	Sometimes, the best way to
-0.012445	shows. The best way to
-0.012445	duration. The best way to
-0.188969	exploited. A good way to
-0.188969	called. The safe way to
-0.144177	module2.cpp. The simplest way to
-0.015613	is no easy way to
-0.144177	fastest. The typical way to
-0.031809	cases, the fastest way to
-0.031809	still the fastest way to
-0.031809	(/Oa). The easiest way to
-0.031809	delays. The easiest way to
-0.845523	total number of elements to
-0.307102	constructor sets all elements to
-0.280508	general, it is faster to
-0.207401	time. It is faster to
-0.207401	smaller. It is faster to
-0.280508	one operand is faster to
-0.280508	development tool is faster to
-0.335350	It is often faster to
-0.255114	would be even faster to
-0.255114	is usually much faster to
-0.203376	it is usually faster to
-0.145731	may replace the call to
-0.145731	and inlining the call to
-0.145731	by removing the call to
-0.076270	certain that a call to
-0.076270	unchanged across a call to
-0.342734	as a function call to
-0.313838	at each function call to
-0.216408	device driver. A call to
-0.049331	more than one call to
-0.023965	is only one call to
-0.023965	make only one call to
-0.168850	time because each call to
-0.168850	will make any call to
-0.263264	before the first call to
-0.036463	with a single call to
-0.036463	make a single call to
-0.168850	p->f(); // Virtual call to
-0.250017	is used, for example, to
-0.250017	can be, for example, to
-0.349897	the fraction. For example, to
-0.561220	out the sign bit to
-0.398320	setting the sign bit to
-0.521981	// set sign bit to
-0.333841	of setting a register to
-0.366170	requires an extra register to
-0.225906	a new physical register to
-0.007046	an example of how to
-0.007046	for examples of how to
-0.003548	page 130 for how to
-0.014370	page 120 for how to
-0.003548	page 122 for how to
-0.014370	page 107 for how to
-0.014370	at www.agner.org/optimize/cppexamples.zip for how to
-0.176065	theory. Advice on how to
-0.168497	tables". Tips about how to
-0.001457	following example shows how to
-0.139443	and to know how to
-0.139443	order to know how to
-0.012547	It is discussed how to
-0.112283	operator that specifies how to
-0.112283	advanced C++ programming, how to
-0.112283	following example illustrates how to
-0.052523	This manual discusses how to
-0.052523	This section discusses how to
-0.235261	} // Use template to
-0.343454	implementation uses XMM registers to
-0.023283	version without the need to
-0.023283	libraries without the need to
-0.023283	calculations without the need to
-0.097985	member functions that need to
-0.097985	Temporary files that need to
-0.097985	allocated resources that need to
-0.284314	list does not need to
-0.127169	Things that may need to
-0.127169	allocated array may need to
-0.127169	cores. You may need to
-0.058129	algorithm, then you need to
-0.058129	ordering? If you need to
-0.058129	C++ so you need to
-0.058129	other words, you need to
-0.021891	there is no need to
-0.025973	There is no need to
-0.035409	inlined - no need to
-0.126373	this case we need to
-0.126373	CPU cores, we need to
-0.120835	different registers. You need to
-0.120835	the dynamic libraries need to
-0.120835	the object files need to
-0.012981	a table of pointers to
-0.253133	the programmer that pointers to
-0.185720	exchange data or pointers to
-0.253133	objects (rather than pointers to
-0.185720	shared_ptr allows multiple pointers to
-0.185720	new block. Any pointers to
-0.185720	has to keep pointers to
-0.185720	zero, by setting pointers to
-0.185720	pointers, by initializing pointers to
-0.321180	even telling the user to
-0.321180	system forbids the user to
-0.400883	organization It is useful to
-0.702184	it can be useful to
-0.033108	it may be useful to
-0.050659	It may be useful to
-0.256637	is 102 also useful to
-0.519382	can be very useful to
-0.188718	It is often useful to
-0.089931	because it is sure to
-0.089931	directives. This is sure to
-0.073166	if they are sure to
-0.073166	cases they are sure to
-0.161206	AMD processors are sure to
-0.161206	same arguments are sure to
-0.255233	and executables. Make sure to
-0.312722	choice of which method to
-0.428896	the #pragma vector always to
-0.228243	compiler. Remember, therefore, always to
-0.231244	blocks makes the access to
-0.248936	will give you access to
-0.182123	other threads have access to
-0.182123	of fastest possible access to
-0.182123	order to get access to
-0.182123	instructions for fast access to
-0.182123	double which gives access to
-0.248936	software with network access to
-0.182123	a subset, giving access to
-0.182123	that allows direct access to
-0.182123	access. Sequential forward access to
-0.325156	out loop by 16 to
-0.229024	temp++ actually adds 16 to
-0.099903	memory block turns out to
-0.099903	the prediction turns out to
-0.661097	of the operating system to
-0.391769	tells the operating system to
-0.391769	force the operating system to
-0.332713	to write the file to
-0.234755	sets all other bits to
-0.234810	waiting for disk operations to
-0.099627	the integers from 0 to
-0.099627	the interval from 0 to
-0.234916	of the same type to
-0.547592	possible in some cases to
-0.234276	fast, compact, and simple to
-0.290414	keep adding new instructions to
-0.342130	ones that are available to
-0.228956	can be made available to
-0.059573	by adding a constant to
-0.170982	CPUs that are up to
-0.170982	make the code up to
-0.170982	is currently not up to
-0.235969	can be set up to
-0.260695	This may take up to
-0.170982	making it count up to
-0.077130	Unix systems allow up to
-0.077130	and Mac allow up to
-0.170982	STL vector turned up to
-0.170982	versions tested (not up to
-0.170982	their CPU dispatchers up to
-0.170982	in registers, totaling up to
-0.306225	// Number of times to
-0.160331	unacceptably long response times to
-0.059929	big arrays and want to
-0.014225	and you may want to
-0.014225	example, you may want to
-0.092466	function and you want to
-0.092466	functions and you want to
-0.092466	efficient and you want to
-0.092466	access and you want to
-0.064966	set that you want to
-0.064966	arrays that you want to
-0.064966	say that you want to
-0.009356	example if you want to
-0.009356	example, if you want to
-0.009356	CPUs if you want to
-0.009356	option if you want to
-0.009356	help if you want to
-0.048937	of code you want to
-0.023779	do when you want to
-0.023779	indices when you want to
-0.048937	a program you want to
-0.064966	www.agner.org/optimize/asmlib.zip. If you want to
-0.064966	results. If you want to
-0.064966	152 If you want to
-0.097080	case where you want to
-0.048937	for example, you want to
-0.048937	Sum3. Whether you want to
-0.036062	the function we want to
-0.017657	but if we want to
-0.017657	example, if we want to
-0.036062	n∙(n-1)!. If we want to
-0.059929	and manuals. I want to
-0.096479	Intel compilers. We want to
-0.059929	However, we still want to
-0.028932	software developers who want to
-0.028932	for those who want to
-0.001719	then it is important to
-0.005176	because it is important to
-0.005176	but it is important to
-0.005176	Therefore, it is important to
-0.005176	project, it is important to
-0.005176	these, it is important to
-0.005176	nature, it is important to
-0.107622	variables. It is important to
-0.107622	language. It is important to
-0.107622	is. It is important to
-0.107622	structure. It is important to
-0.107622	decomposition. It is important to
-0.107622	off. It is important to
-0.048972	when performance is important to
-0.216494	is even more important to
-0.097658	it is very important to
-0.097658	It is very important to
-0.120422	It is therefore important to
-0.120422	problem is too important to
-0.290658	have many different CPUs to
-0.022714	arrays are sufficiently large to
-0.227512	it some heavy work to
-0.227512	for the reinstallation work to
-0.084063	in between the calls to
-0.084063	to avoid the calls to
-0.188401	the number of calls to
-0.305963	the 61 function calls to
-0.188401	such loops by calls to
-0.188401	program contains no calls to
-0.188401	a program contains calls to
-0.188401	a "function". Multiple calls to
-0.234244	operations with other calculations to
-0.620617	slow down the execution to
-0.378829	check the final result to
-0.234496	tell a hyperthreading processor to
-0.321315	Yet, D is compiled to
-0.213958	files are first compiled to
-0.213958	; Example 8.26a compiled to
-0.213958	; Example 8.26b compiled to
-0.234496	a string of bytes to
-0.320707	the arrays very big to
-0.079624	that it is necessary to
-0.009332	then it is necessary to
-0.018868	Therefore, it is necessary to
-0.018868	Here, it is necessary to
-0.018868	called, it is necessary to
-0.018868	Sometimes it is necessary to
-0.018868	accessed, it is necessary to
-0.021721	calculations. It is necessary to
-0.021721	hackers. It is necessary to
-0.065994	may not be necessary to
-0.031744	it may be necessary to
-0.031744	It may be necessary to
-0.103273	linking makes it necessary to
-0.173012	it is not necessary to
-0.173012	It is not necessary to
-0.023608	it is often necessary to
-0.023608	It is often necessary to
-0.003418	It is therefore necessary to
-0.103273	it is rarely necessary to
-0.227234	is the nearest element to
-0.227234	an extra dummy element to
-0.047027	terms of execution speed to
-0.233728	This example is specific to
-0.200191	etc. It is common to
-0.200191	away. It is common to
-0.219073	It is more common to
-0.318009	to fix the thread to
-0.322264	to lock a thread to
-0.234519	the time slices allocated to
-0.289688	C are too small to
-0.283381	point Conversion of integers to
-0.283381	than two 32-bit integers to
-0.270104	Conversion of unsigned integers to
-0.270104	to convert unsigned integers to
-0.315124	product. It is good to
-0.226729	It is not good to
-0.234735	easier said than done to
-0.335842	b from single precision to
-0.346229	an entire cache line to
-0.270880	only four function parameters to
-0.200870	be passed as parameters to
-0.252291	maximum of four parameters to
-0.020583	up to fourteen parameters to
-0.015417	then it is advantageous to
-0.005078	whether it is advantageous to
-0.015417	Therefore, it is advantageous to
-0.015417	Likewise, it is advantageous to
-0.101966	to. It is advantageous to
-0.024508	it can be advantageous to
-0.012081	It can be advantageous to
-0.129805	It may be advantageous to
-0.078109	can also be advantageous to
-0.078109	some cases be advantageous to
-0.209966	can therefore be advantageous to
-0.030811	it is not advantageous to
-0.063975	It is not advantageous to
-0.100493	it is less advantageous to
-0.100493	is almost always advantageous to
-0.228657	variable which is known to
-0.099451	the result is known to
-0.099451	33 result is known to
-0.228657	on process is known to
-0.200089	is constant and known to
-0.291112	sizes. Fortunately, the solution to
-0.208388	speed. A simple solution to
-0.208388	storing. The standard solution to
-0.208388	be a better solution to
-0.235575	there is an advantage to
-0.440592	can be an advantage to
-0.208946	there is no advantage to
-0.208946	is a specific advantage to
-0.009413	const*)p); } // Function to
-0.093105	intrinsic functions // Function to
-0.093105	vector classes // Function to
-0.093105	with SSE4.1 // Function to
-0.093105	cc[size] ); // Function to
-0.093105	_mm_loadu_si128((__m128i const*)p);} // Function to
-0.002586	out loop by eight to
-0.146527	account when deciding whether to
-0.157737	disadvantages when deciding whether to
-0.207629	iteration it decides whether to
-0.023084	optimized for is likely to
-0.185093	program, it is likely to
-0.011388	the code is likely to
-0.011388	your code is likely to
-0.023084	The compiler is likely to
-0.023084	because this is likely to
-0.023084	test program is likely to
-0.023084	collector which is likely to
-0.023084	target address is likely to
-0.023084	end user is likely to
-0.023084	which method is likely to
-0.023084	The system is likely to
-0.023084	the problem is likely to
-0.023084	CPU model is likely to
-0.023084	different platform is likely to
-0.023084	speed here is likely to
-0.023084	particular brand is likely to
-0.023084	integer comparison is likely to
-0.023084	page 87) is likely to
-0.044964	static data are likely to
-0.044964	compiler is more likely to
-0.044964	it is also likely to
-0.132189	which is very likely to
-0.044964	instructions are less likely to
-0.044964	variables and therefore likely to
-0.044964	and are equally likely to
-0.331559	UnusedFiller in the structure to
-0.233382	wrap around. Adding 1 to
-0.233197	operations do not add to
-0.224543	then use this information to
-0.224543	identification adds extra information to
-0.233806	additions. When used simply to
-0.000853	the compiler is able to
-0.003424	the microprocessor is able to
-0.002096	code to be able to
-0.002096	compiler to be able to
-0.002096	want to be able to
-0.000523	may not be able to
-0.001571	might not be able to
-0.002096	compiler may be able to
-0.002096	compilers may be able to
-0.002096	processor may be able to
-0.003147	program will be able to
-0.003147	103) will be able to
-0.000213	the compilers are able to
-0.000213	all compilers are able to
-0.000213	C++ compilers are able to
-0.000213	Some compilers are able to
-0.000213	few compilers are able to
-0.000213	Few compilers are able to
-0.000640	Intel microprocessors are able to
-0.000640	Modern microprocessors are able to
-0.005146	they are not able to
-0.005146	is usually not able to
-0.010354	is not always able to
-0.010354	CPUs are actually able to
-0.005146	whether they were able to
-0.005146	have tested were able to
-0.010354	processors are sometimes able to
-0.141935	& 1 is certain to
-0.141935	with #define is certain to
-0.185663	count is not certain to
-0.185663	It is therefore certain to
-0.185663	CPUID instruction was certain to
-0.311516	list is almost certain to
-0.926778	a few clock cycles to
-0.298824	- 5 clock cycles to
-0.388493	a hundred clock cycles to
-0.278097	by causing return addresses to
-0.223717	to translate these addresses to
-0.233865	nothing while seconds count to
-0.233668	framework requiring many files to
-0.000051	then it is recommended to
-0.000154	Therefore, it is recommended to
-0.000154	used, it is recommended to
-0.000154	platforms, it is recommended to
-0.000154	projects, it is recommended to
-0.000077	functions. It is recommended to
-0.000077	used. It is recommended to
-0.000077	called. It is recommended to
-0.000077	pointer. It is recommended to
-0.000077	calls. It is recommended to
-0.000077	problems. It is recommended to
-0.000077	members. It is recommended to
-0.000077	handling. It is recommended to
-0.000077	versions. It is recommended to
-0.000077	divisions. It is recommended to
-0.000077	message. It is recommended to
-0.000077	54. It is recommended to
-0.000077	61. It is recommended to
-0.000077	decimals. It is recommended to
-0.004247	It is not recommended to
-0.023285	it is also recommended to
-0.181291	It is therefore recommended to
-0.023285	It is strongly recommended to
-0.233316	if the threads write to
-0.232835	can expect 64-bit programs to
-0.094188	whether it is optimal to
-0.094188	cases, it is optimal to
-0.144982	may not be optimal to
-0.303274	it may be optimal to
-0.442952	it is not optimal to
-0.328022	most efficient memory space to
-0.384940	causes the heap space to
-0.059217	There is a lot to
-0.309681	is often a lot to
-0.222407	risk of overflow Integer to
-0.222407	for further discussion. Integer to
-0.276380	"Hello 2" The dispatching to
-0.338724	compiler supports CPU dispatching to
-0.232864	of the two branches to
-0.275490	Porting such an application to
-0.221416	in a typical application to
-0.233349	expect the && expression to
-0.439928	it is more complicated to
-0.233008	but who would like to
-0.343118	will align data members to
-0.320604	that use these methods to
-0.232811	of the data block to
-0.095118	b && a needs to
-0.111962	Any function that needs to
-0.062233	User work that needs to
-0.062233	a destructor that needs to
-0.039592	calls and it needs to
-0.039592	layers and it needs to
-0.154446	time because it needs to
-0.021899	that the compiler needs to
-0.021899	optimization, the compiler needs to
-0.095118	a && b needs to
-0.095118	the executable file needs to
-0.095118	only one constant needs to
-0.095118	a positive list needs to
-0.095118	such as ReadB needs to
-0.309570	point if the conversion to
-0.220691	signed integers before conversion to
-0.293710	numbers as a parameter to
-0.220224	as an implicit parameter to
-0.325748	causes floating point division to
-0.209416	which returns a reference to
-0.031600	a pointer or reference to
-0.027017	any pointer or reference to
-0.109188	sees a relative reference to
-0.109188	error // Return reference to
-0.109188	Return a null reference to
-0.276089	is virtually no cost to
-0.184428	is no performance cost to
-0.251625	is no extra cost to
-0.184428	is a large cost to
-0.233827	a large overhead cost to
-0.001990	there is no reason to
-0.002989	There is no reason to
-0.441953	for the CPU dispatcher to
-0.218847	23; // add n to
-0.218847	2n by adding n to
-0.232492	a zero-terminated ASCII string to
-0.002723	responsibility of the programmer to
-0.013787	bility of the programmer to
-0.020848	useful for the programmer to
-0.020848	important for the programmer to
-0.020848	difficult for the programmer to
-0.020848	relevant for the programmer to
-0.231852	Example 12.4b executes three to
-0.041802	it may be better to
-0.415196	add the static keyword to
-0.231980	cause the resource-hungry applications to
-0.231852	cases. Don't change && to
-0.244113	definition code in addition to
-0.244113	is doing an addition to
-0.193597	The most important addition to
-0.193597	can do another addition to
-0.232492	abusing the update mechanism to
-0.233519	15 Metaprogramming Metaprogramming means to
-0.232780	will convert these types to
-0.103558	code that is difficult to
-0.180753	functions It is difficult to
-0.180753	between. It is difficult to
-0.103558	process which is difficult to
-0.024504	very long and difficult to
-0.024504	becomes bulky and difficult to
-0.016182	it can be difficult to
-0.003989	it may be difficult to
-0.024504	space and are difficult to
-0.024504	functions that are difficult to
-0.050474	makes the code difficult to
-0.086204	clear and more difficult to
-0.050474	that are very difficult to
-0.050474	understand and therefore difficult to
-0.050474	It is quite difficult to
-0.050474	that is slow, difficult to
-0.414580	way m is transferred to
-0.413615	the data are aligned to
-0.232394	assembly or easy linking to
-0.532318	from floating point numbers to
-0.231970	wasted on runtime dispatch to
-0.044881	and a well-defined interface to
-0.044881	with a well-defined interface to
-0.448386	for the installation process to
-0.170561	of the time goes to
-0.170561	file. The output goes to
-0.170561	typical software project goes to
-0.170561	the call p->f() goes to
-0.170561	less than 1% goes to
-0.371484	52. You may choose to
-0.240238	software developer may choose to
-0.200539	etc. Whether you choose to
-0.268819	Use appropriate compiler options to
-0.215520	compilers have various options to
-0.130681	There are several ways to
-0.050885	there are various ways to
-0.054424	There are various ways to
-0.087597	sections describe various ways to
-0.014341	There are three ways to
-0.269494	works differently. The link to
-0.216117	makes a symbolic link to
-0.159687	a dispatch is made to
-0.159687	no attempt is made to
-0.231961	const keyword wherever appropriate to
-0.087624	the object it points to
-0.087624	first call it points to
-0.139103	what a pointer points to
-0.098953	that p always points to
-0.098953	original pointer actually points to
-0.200805	add what r points to
-0.098953	of object p points to
-0.030546	pointer which initially points to
-0.030546	Function pointer initially points to
-0.030546	PLT entry initially points to
-0.231961	it needs to switch to
-0.052811	algorithm before you start to
-0.025603	spots Before you start to
-0.025603	tasks. Before you start to
-0.231961	The CPU will start to
-0.231961	it is necessary here to
-0.148714	it is more relevant to
-0.148714	manual is also relevant to
-0.148714	Command line options relevant to
-0.148714	these are hardly relevant to
-0.148714	directives and keywords relevant to
-0.148714	almost all respects relevant to
-0.213782	There are two things to
-0.213782	does quite ingenious things to
-0.213119	cc[]) { // go to
-0.213119	to the function go to
-0.231263	labels is simply predicted to
-0.213285	will have more references to
-0.213285	function calls. Internal references to
-0.296641	may need extra overhead to
-0.214281	is very little overhead to
-0.143550	the code are relative to
-0.143550	of each function relative to
-0.065871	of the member relative to
-0.065871	a data member relative to
-0.143550	If the offset relative to
-0.143550	in fact addressed relative to
-0.231582	to add unused columns to
-0.358141	that it is intended to
-0.273975	symbol interposition is intended to
-0.245172	The examples are intended to
-0.044929	3.2 Use a profiler to
-0.624259	i++) { // Loop to
-0.212490	// Example 8.23a. Loop to
-0.098776	However, it is inefficient to
-0.098776	words, it is inefficient to
-0.231354	expects an immediate response to
-0.336431	set of cache lines to
-0.263112	first algorithm that comes to
-0.263112	advantages when it comes to
-0.324085	unsafe code is limited to
-0.294758	memory plus the costs to
-0.188674	four kinds of costs to
-0.188674	are inherent performance costs to
-0.300025	information about the destructor to
-0.402296	must have a destructor to
-0.238159	why it is safe to
-0.444744	may not be safe to
-0.238159	It is more safe to
-0.229443	intrinsic vectors requires alignment to
-0.276867	// define a macro to
-0.338856	{ // Define macro to
-0.229668	that you want them to
-0.163652	where we are writing to
-0.433669	program. Reading or writing to
-0.074163	or more threads writing to
-0.074163	avoid multiple threads writing to
-0.070389	bytes can be reduced to
-0.070389	condition can be reduced to
-0.323933	makes it more clear to
-0.284347	to give higher priority to
-0.314774	saved from one iteration to
-0.045001	of which processor models to
-0.204316	+ 1 is changed to
-0.206593	has to be changed to
-0.206593	2.5 may be changed to
-0.157996	CPUID is artificially changed to
-0.158824	so. It may fail to
-0.116896	software companies often fail to
-0.116896	code because they fail to
-0.054527	not, and therefore fail to
-0.054527	developers may therefore fail to
-0.116896	many software products fail to
-0.336935	algorithm The first thing to
-0.204546	be an obvious thing to
-0.331656	Alignment of data structures to
-0.327724	will cause the heap to
-0.062711	it can be initialized to
-0.062711	array can be initialized to
-0.175910	b have been initialized to
-0.304790	you want the executable to
-0.387765	from the main executable to
-0.228971	around such a subexpression to
-0.228174	to install automatic updates to
-0.175910	below. Make calls directly to
-0.175910	These instructions write directly to
-0.175910	can be fed directly to
-0.211188	the object is copied to
-0.149562	the parameter is copied to
-0.151627	file has been copied to
-0.151627	the entire contents copied to
-0.228440	of vector register sizes to
-0.032933	program that is easier to
-0.032933	convenient. It is easier to
-0.032933	example 15.1b is easier to
-0.107394	more manageable and easier to
-0.308162	It is often easier to
-0.107394	It is just easier to
-0.079015	pointed to is identical to
-0.079015	by p is identical to
-0.224030	Open BSD are identical to
-0.228971	been reduced from 20 to
-0.202130	we do not expect to
-0.202130	line that we expect to
-0.087662	compiler which is similar to
-0.087662	A template is similar to
-0.144588	convert the result back to
-0.144588	setting the priority back to
-0.144588	100 and jumps back to
-0.144588	software that dates back to
-0.197613	attempts to set seconds to
-0.248627	can take several seconds to
-0.410826	elements in the sequence to
-0.228372	function also has something to
-0.316634	hardly any performance penalty to
-0.228372	you have special reasons to
-0.228080	attention of software programmers to
-0.228080	object. A little-known alternative to
-0.170419	of the program happen to
-0.170419	that several variables happen to
-0.170419	a big matrix happen to
-0.096260	times may be enough to
-0.045468	but not long enough to
-0.045468	is just long enough to
-0.497718	that is big enough to
-0.096260	This is small enough to
-0.096260	mechanism is rarely enough to
-0.022094	2 does not apply to
-0.022094	argument does not apply to
-0.045373	given here may apply to
-0.045373	the advices may apply to
-0.096045	does not always apply to
-0.135914	The same rules apply to
-0.227497	length of a row to
-0.227788	to a variable declaration to
-0.227788	to add new features to
-0.017584	value that is added to
-0.017584	an integer is added to
-0.017584	then c is added to
-0.017584	data members is added to
-0.017584	then f is added to
-0.195497	keyword can be added to
-0.023326	are: It is easy to
-0.023326	develop. It is easy to
-0.047976	this error is easy to
-0.228289	useful in test situations to
-0.281813	afterwards reads or writes to
-0.163243	} This function writes to
-0.163243	stride causes all writes to
-0.017989	member functions. This applies to
-0.017989	random manner. This applies to
-0.008901	order. The same applies to
-0.008901	floats. The same applies to
-0.011909	137). This also applies to
-0.011909	about Linux also applies to
-0.011909	increment operators also applies to
-0.036755	The same advice applies to
-0.001889	code can be applied to
-0.001889	which can be applied to
-0.000943	can only be applied to
-0.003026	32 results when applied to
-0.000188	keyword static, when applied to
-0.500877	copy constructors and destructors to
-0.193950	wrapper classes with destructors to
-0.105347	transformation of example 15.1b to
-0.049487	conversion from example 15.1b to
-0.049487	come from example 15.1b to
-0.105347	will convert example 15.1b to
-0.108648	Gnu compiler reduced 15.1b to
-0.226671	the array pointer eax to
-0.228289	If you don't care to
-0.227093	bit Linux The procedure to
-0.226003	function by adding throw() to
-0.123785	optimizing compiler may try to
-0.123785	the producer will try to
-0.123785	has hyperthreading, then try to
-0.123785	CPU. Should we try to
-0.049507	An integer is converted to
-0.049507	base class is converted to
-0.010027	need to be converted to
-0.003317	integer can be converted to
-0.001655	pointer can be converted to
-0.041576	positive number when converted to
-0.049507	type of object pointed to
-0.049507	division). The object pointed to
-0.010027	that the value pointed to
-0.010027	because the value pointed to
-0.010027	that the variable pointed to
-0.010027	time the variable pointed to
-0.041576	if the target pointed to
-0.225640	are using advanced algorithms to
-0.225640	It is OK, however, to
-0.084461	CPU dispatchers are designed to
-0.084461	instructions (MOVNT) are designed to
-0.280278	if all the inputs to
-0.123226	then it is preferred to
-0.123226	aligned. It is preferred to
-0.155053	it may be preferred to
-0.227093	modulo operator %. Conversion to
-0.189081	a loop count down to
-0.189081	time consumption was down to
-0.188748	of array ; jump to
-0.188748	makes the microprocessor jump to
-0.232114	STL container are allowed to
-0.307828	function is not allowed to
-0.492384	uses new and delete to
-0.039086	need to be distributed to
-0.039086	needs to be distributed to
-0.312908	that a compiler generates to
-0.223906	sum1 from time T to
-0.262082	relocated by the linker to
-0.193370	which allows the linker to
-0.231693	core during time measurements to
-0.182524	to do some measurements to
-0.278311	sizeof(float)). Now, the factor to
-0.223906	increased from 64-bit MMX to
-0.225975	where it makes sense to
-0.017989	this way is equal to
-0.017989	of list[i] is equal to
-0.017989	each label is equal to
-0.056368	happen to be equal to
-0.056368	p is therefore equal to
-0.223044	normal writes or reads to
-0.131759	the application program. Add to
-0.131759	of the library. Add to
-0.131759	with CPU dispatching. Add to
-0.131369	Gnu. It is expected to
-0.283808	applications can be expected to
-0.131369	instruction set are expected to
-0.092252	then it is convenient to
-0.131759	it may be convenient to
-0.043681	may be more convenient to
-0.043681	is certainly more convenient to
-0.222563	from the leftmost column to
-0.222563	the sake of portability to
-0.223044	by each thread. Pointers to
-0.175222	Use signed when converting to
-0.175222	to signed before converting to
-0.222563	abuse is extremely costly to
-0.362221	ever more powerful computers to
-0.419160	stack in the debugger to
-0.302509	It is not permissible to
-0.023225	many commercial compilers due to
-0.023225	to be higher due to
-0.023225	a large delay due to
-0.023225	in the future due to
-0.023225	measurements are unstable due to
-0.023225	are some differences due to
-0.222563	address is in edx, to
-0.035507	rarely worth the effort to
-0.035507	hardly worth the effort to
-0.219545	and software engineering principles to
-0.073974	It may be obvious to
-0.073974	it would be obvious to
-0.163690	they may be swapped to
-0.163690	uncached or even swapped to
-0.163690	will not be portable to
-0.163690	it is not portable to
-0.163690	be no certain limit to
-0.334863	no reasonable upper limit to
-0.073974	if there is nothing to
-0.073974	branch. There is nothing to
-0.357408	abc can be increased to
-0.163188	overloaded operator is equivalent to
-0.163188	#define directives are equivalent to
-0.273372	the necessary cleanup jobs to
-0.032936	that it is safer to
-0.032936	to. It is safer to
-0.068576	references. References are safer to
-0.068576	It is therefore safer to
-0.220691	change the expression -(-a) to
-0.273372	library can be updated to
-0.220691	of the program appear to
-0.221265	Other system resources Writes to
-0.028023	or *.so) that belong to
-0.028023	These addresses all belong to
-0.028023	library functions often belong to
-0.028023	the stack always belong to
-0.028023	these cache lines belong to
-0.052970	dilemma. You may prefer to
-0.052970	a programmer may prefer to
-0.113311	many users will prefer to
-0.422513	increase the time slices to
-0.035320	also likely to lead to
-0.011457	code. This can lead to
-0.011457	new insight can lead to
-0.011457	the bottlenecks can lead to
-0.269095	and shifts one place to
-0.126619	then it is preferable to
-0.087557	it may be preferable to
-0.087557	It is often preferable to
-0.087557	predict where the obstacles to
-0.087557	them. Some important obstacles to
-0.087557	the most common obstacles to
-0.017301	this option. 8.4 Obstacles to
-0.017301	....................................................................... 77 8.4 Obstacles to
-0.017301	PathScale compilers. 8.3 Obstacles to
-0.017301	compilers............................................................................. 74 8.3 Obstacles to
-0.288434	more by the loader to
-0.035320	and model number. Failure to
-0.017301	is also deallocated. Failure to
-0.017301	has been deallocated. Failure to
-0.035320	of program flow. Failure to
-0.215765	if you change pre-increment to
-0.035320	is quite efficient thanks to
-0.035320	prefetch data automatically thanks to
-0.035320	are very similar thanks to
-0.035320	never becomes fragmented thanks to
-0.017301	of b is guaranteed to
-0.017301	of i&15 is guaranteed to
-0.035320	- they are guaranteed to
-0.035320	base is not guaranteed to
-0.269095	program may need modification to
-0.269095	non-Intel machines? Possible solutions to
-0.041576	provided in an appendix to
-0.041576	available as an appendix to
-0.087557	container classes. An appendix to
-0.087557	be used as alternatives to
-0.087557	at the possible alternatives to
-0.087557	There are various alternatives to
-0.215765	a lot of modifications to
-0.215765	for better metaprogramming tools to
-0.035320	the option -fpic according to
-0.035320	a binary representation according to
-0.035320	to always behave according to
-0.035320	= 100. Now, according to
-0.215765	gone to great lengths to
-0.087557	integer registers is extended to
-0.087557	method can be extended to
-0.087557	XMM registers are extended to
-0.147552	caching less efficient. Access to
-0.147552	remote data locally. Access to
-0.215765	market for many years to
-0.216474	time, but also inconvenient to
-0.215765	or the __assume_aligned directive to
-0.146947	processor model is going to
-0.146947	I am not going to
-0.008901	be a good idea to
-0.008901	not a good idea to
-0.008901	therefore a good idea to
-0.216474	user interfaces and interfaces to
-0.216474	zero); // Use mask to
-0.146947	and variable names. Remember to
-0.146947	up and down. Remember to
-0.146947	automatically, although it appears to
-0.146947	automatically if this appears to
-0.215765	plug-ins that add functionality to
-0.409253	of the exception handler to
-0.147552	The performance is inferior to
-0.147552	64-bit compilers are inferior to
-0.215765	transferred from one auto_ptr to
-0.004261	the compiler is unable to
-0.008564	program will be unable to
-0.008564	user will be unable to
-0.017301	can convert example 15.1a to
-0.017301	automatically reduces example 15.1a to
-0.017301	Intel compiler reduced 15.1a to
-0.017301	the compilers reduced 15.1a to
-0.215765	in a systematic manner to
-0.292789	than 1.23456. The conclusion to
-0.209536	with floating point multiplication, to
-0.209536	I have tested seem to
-0.262059	which range from -128 to
-0.292789	by changing the dividend to
-0.121160	software. This is annoying to
-0.121160	protection schemes are annoying to
-0.209536	the assembly output listing to
-0.163523	function call is translated to
-0.121160	i=0; has been translated to
-0.210469	Use 12 option -fno-builtin to
-0.047762	or switch statement leads to
-0.047762	where automatic vectorization leads to
-0.047762	on lazy binding leads to
-0.343719	send your programming questions to
-0.209536	is a portability issue to
-0.209536	allows the function argument to
-0.209536	annoying. We may decide to
-0.209536	may be completely unrolled to
-0.005691	address range from 0x2700 to
-0.047762	bytes from address 0x2700 to
-0.047762	code that is ported to
-0.047762	code is later ported to
-0.047762	and not easily ported to
-0.209536	It takes some experience to
-0.281080	are modified, if necessary, to
-0.292789	Virtual function // Call to
-0.209536	"override" feature. All accesses to
-0.121160	a long time compared to
-0.121160	the following disadvantages compared to
-0.027270	Often, it is sufficient to
-0.027270	projects, it is sufficient to
-0.197341	but it is impossible to
-0.073769	of x is type-casted to
-0.073769	if pointers are type-casted to
-0.197341	the CPUID was manipulated to
-0.197341	in the code carefully to
-0.197341	a number of dangers to
-0.248321	uncommon for virus scanners to
-0.248321	for assigning different priorities to
-0.197341	you can get answers to
-0.197341	to a function prototype to
-0.197341	cache of 256 Kbytes to
-0.197341	parallel. Coarse-grained parallelism refers to
-0.197341	critical functions take microseconds to
-0.197341	to take extra precautions to
-0.073769	Therefore, it is worthwhile to
-0.073769	It may be worthwhile to
-0.197341	whether it is profitable to
-0.248321	// Constructor // Initialize to
-0.073769	whether the object belongs to
-0.073769	else. This normally belongs to
-0.248321	runtime from the caller to
-0.197341	unused returns // Volatile to
-0.035320	is standardized allows us to
-0.035320	is biased allows us to
-0.073769	funda- mentally flawed approach to
-0.073769	and well thought-through approach to
-0.073769	disadvantage of C++ relates to
-0.073769	the C++ language relates to
-0.197341	code in example 11.1a to
-0.035320	leaks if you forget to
-0.035320	thread. If you forget to
-0.197341	It should never respond to
-0.073769	is a significant contribution to
-0.073769	only a negligible contribution to
-0.035320	features, including the ability to
-0.035320	we loose the ability to
-0.073769	their address and attempts to
-0.073769	event that it attempts to
-0.248321	an annoying time consumer to
-0.197341	not allowed. Non-public distribution to
-0.197341	make appropriate error messages to
-0.197341	be used as coprocessors to
-0.197341	x^n // sum, initialize to
-0.197341	is often an obstacle to
-0.197341	in the 64-bit extension to
-0.035320	can take several minutes to
-0.035320	it took several minutes to
-0.197341	seconds has been incremented to
-0.162686	and restarted anyway. Updates to
-0.162686	there are serious limitations to
-0.162686	is because we forgot to
-0.162686	new or malloc. Handles to
-0.162686	convenient for adding bounds-checking to
-0.162686	is not always comparable to
-0.162686	move. It is unacceptable to
-0.162686	sum2 from time T+1 to
-0.162686	and map are prone to
-0.162686	intended as a plug-in to
-0.162686	<= n < 223 to
-0.162686	answer. Beginners are advised to
-0.162686	I consider it unwise to
-0.162686	Use a "move constructor" to
-0.162686	the overhead of switching to
-0.162686	add statements like throw(A,B,C) to
-0.162686	container elements are cumbersome to
-0.162686	vectorize, or #pragma novector to
-0.162686	point to integer According to
-0.162686	have tested the capability to
-0.162686	line size. I tried to
-0.162686	x) { // Round to
-0.162686	expression -a > -b to
-0.162686	cc); } // Entry to
-0.162686	b+c will be rounded to
-0.162686	two values is closest to
-0.162686	function F1 is supposed to
-0.162686	are cheap, in relation to
-0.162686	user expects immediate responses to
-0.162686	not, by default, conform to
-0.162686	do not 123 correspond to
-0.162686	through the following steps to
-0.162686	same processor core. Try to
-0.162686	two threads from attempting to
-0.162686	actually reducing example 15.1d to
-0.162686	then it is advisable to
-0.162686	32-bit systems gives rise to
-0.162686	the empty throw() specification to
-0.162686	are not testing. Trying to
-0.162686	/ 10; // Convert to
-0.162686	i<100; i++,i2+=2.0f)a[i]=i2; 41 Float to
-0.162686	unable to respond quickly to
-0.162686	uncommon for software teachers to
-0.162686	= 2 * 5; to
-0.162686	is easy to port to
-0.162686	to reduce example 12.1b to
-0.162686	purpose. It just happened to
-0.162686	is smaller and closer to
-0.162686	64 kb. This corresponds to
-0.162686	will convert example 12.8a to
-0.162686	Common Language Runtime, CLR, to
-0.162686	then you are risking to
-0.162686	key values are confined to
-0.162686	32-bit integer multiplication prior to
-0.162686	Sometimes it takes hours to
-0.162686	Microprocessor designers have gone to
-0.162686	expressions are less susceptible to
-0.162686	is necessary to adhere to
-0.162686	finding problems that relate to
-0.162686	as GetPrivateProfileString and WritePrivateProfileString to
-0.162686	should preferably be responded to
-0.162686	we are adding -100 to
-0.162686	memory allocation also tends to
-0.162686	able to reduce (a*b*c)+(c*b*a) to
-0.162686	A metaprogramming implementation analogous to
-0.162686	I am always happy to
-0.162686	It is common practice to
-0.162686	(e.g. GetLogicalProcessorInformation in Windows) to
-0.162686	a jump from a=a*2; to
-0.162686	updated. It is tempting to
-0.162686	caches have to adapt to
-0.162686	two functions are unrelated to
-0.162686	// Example 7.38b. Alternative to
-0.461809	distributed as it is and
-0.344979	the uses of a and
-0.064005	the values of a and
-0.350621	memory area for a and
-0.279851	is faster if a and
-0.632046	For example, if a and
-0.279851	works even if a and
-0.279851	be optimized if a and
-0.316262	32-bit number. If a and
-0.316262	other factor. If a and
-0.351113	put 80 into a and
-0.233466	of the array a and
-0.309438	unused bytes between a and
-0.434901	or by making a and
-0.233466	systems". The parameters a and
-0.233466	registers are used. a and
-0.233466	has two arrays, a and
-0.233466	space by joining a and
-0.233466	b = MultiplyBy<8>(10); a and
-0.344159	a pointer points to and
-0.236865	than to delete it and
-0.456487	to inline the function and
-0.324168	to inline this function and
-0.307321	created by one function and
-0.426878	of times each function and
-0.334928	of a critical function and
-0.324168	deleted by another function and
-0.411907	calls the dispatcher function and
-0.374177	to the inlined function and
-0.293055	allow compile- time if and
-0.235279	in multiple versions with and
-0.436028	mixing code compiled with and
-0.330311	you can turn on and
-1.410802	part of the code and
-0.560906	parallelization in the code and
-0.436941	to optimize the code and
-0.278732	combined size of code and
-0.278732	total amount of code and
-0.054736	9.1 Caching of code and
-0.161568	execution time when code and
-0.161568	execute CriticalFunction when code and
-0.445843	share the same code and
-0.445843	on floating point code and
-0.405794	generate any extra code and
-0.291945	C++, directly compiled code and
-0.523835	based on intermediate code and
-0.453492	with an intermediate code and
-0.218733	compiled to binary code and
-0.241097	X make position-independent code and
-0.241097	compiler uses position-independent code and
-0.241097	the burdensome position-independent code and
-0.302029	translated into machine code and
-0.218733	clear and well-structured code and
-0.218733	sure the startup code and
-0.355984	comes before the compiler and
-0.537624	calls. The Gnu compiler and
-0.234261	with the best compiler and
-0.237204	are there between x and
-0.631589	most of the time and
-0.445430	50% of the time and
-0.284037	elements at a time and
-0.190472	bytes at a time and
-0.190472	line at a time and
-0.416060	branch ahead of time and
-0.542382	(1./1.2345) at compile time and
-0.365997	automatically. The development time and
-0.225781	should take installation time and
-0.355246	which function to use and
-0.290617	Both code cache use and
-0.292041	is therefore becoming more and
-0.849252	the calculation of A and
-0.335306	optimized versions of memory and
-0.202643	table in static memory and
-0.202643	stored in static memory and
-0.224584	have much less memory and
-0.224584	the most common memory and
-0.364345	than the main memory and
-0.309127	1980 where RAM memory and
-0.224584	writing to uncached memory and
-0.451523	consuming. Therefore, the data and
-0.560210	set of test data and
-0.230778	to store intermediate data and
-0.286109	for containing thread-specific data and
-1.054886	part of a program and
-0.233914	in the final program and
-1.322159	the programmer to make and
-0.217547	All accesses to functions and
-0.271110	by several different functions and
-0.290541	PLT for all functions and
-0.378332	includes the critical functions and
-0.217547	and optimize both functions and
-0.325667	of different intrinsic functions and
-0.290541	memory and string functions and
-0.095292	overriding of public functions and
-0.095292	relocation. All public functions and
-0.290541	is pure. Virtual functions and
-0.217547	distinction between leaf functions and
-0.482834	instructions in the CPU and
-0.059316	supported by the CPU and
-0.128034	by both the CPU and
-0.128034	checks both the CPU and
-0.225008	the bit scan instruction and
-0.142275	slow bit scan instruction and
-0.573575	store the floating point and
-0.937207	roll out the loop and
-0.609355	to unroll the loop and
-0.609355	by unrolling the loop and
-0.692538	to unroll a loop and
-0.518799	outside the innermost loop and
-0.136161	time it is used and
-0.136161	before it is used and
-0.231971	are no longer used and
-0.311682	and will have one and
-0.251994	to the code cache and
-0.405498	in the code cache and
-0.228904	likely that code cache and
-0.224436	resources, such as cache and
-0.651142	the level-1 data cache and
-0.517465	than the level-2 cache and
-0.283471	SSE2 128 bit integer and
-0.407020	as an unsigned integer and
-0.228455	advantageous to mix integer and
-0.228455	in registers (6 integer and
-0.351655	inefficient way. See page and
-0.410279	for this instruction set and
-0.153900	12.1 AVX instruction set and
-0.578733	the latest instruction set and
-0.098942	12.2 AVX-512 instruction set and
-0.221436	such as list, set and
-0.523863	of the same class and
-0.395531	of a parent class and
-0.206926	functions of parent class and
-0.531883	particular compiler to do and
-0.347311	the compiler can do and
-0.209433	that come with compilers and
-0.209433	can use only compilers and
-0.306566	127. The Intel compilers and
-0.013320	for different C++ compilers and
-0.343578	study of how compilers and
-0.261943	some very good compilers and
-0.209433	supplied with Intel's compilers and
-0.542968	compiler you are using and
-0.100850	representation of float, double and
-0.100850	Conversions between float, double and
-0.352428	depends on the size and
-0.288619	between optimizing for size and
-0.340240	generated by the Intel and
-0.095347	available, one from Intel and
-0.095347	Agner Available from Intel and
-0.045062	tried. The Microsoft, Intel and
-0.045062	Linux with Microsoft, Intel and
-0.045062	compilers from Microsoft, Intel and
-0.045062	functions (i.e. Microsoft, Intel and
-0.217693	parallelization. The Gnu, Intel and
-0.525812	changes the function pointer and
-0.317489	is inexact if b and
-0.225028	this example, a, b and
-0.225028	operands and add b and
-0.022527	two); // Multiply b and
-0.308620	a separate function library and
-0.308620	the asmlib function library and
-0.229347	the most efficient library and
-0.233186	adds this to i and
-0.233186	can also eliminate i and
-0.296969	AVX 256 bit float and
-0.022365	14.7 Don't mix float and
-0.222974	1.2; // Mixing float and
-0.222974	the code mixes float and
-0.568406	the loop by two and
-0.341172	bits of the number and
-0.541560	www.agner.org/optimize/cppexamples.zip. If the number and
-0.314757	thread-local storage of static and
-0.099657	available in both static and
-0.099657	work with both static and
-0.019327	on optimization of C++ and
-0.323914	parallel processing in C++ and
-0.353638	makes inlining more efficient and
-0.485880	implementations are less efficient and
-0.450507	bounds-checking to an array and
-0.235498	of 2 if possible and
-0.418862	making a debug version and
-0.651990	to calculate the value and
-0.309905	to all class objects and
-0.225224	to align large objects and
-0.406904	to all allocated objects and
-0.225224	preferred to declare objects and
-0.100692	7.3 Floating point variables and
-0.198781	is that all variables and
-0.198781	program contains many variables and
-0.249940	are used. Such variables and
-0.249940	with other local variables and
-0.088115	Likewise, all non-static variables and
-0.088115	constructed. All non-static variables and
-0.198781	can access internal variables and
-0.041827	classes. 7.2 Integers variables and
-0.041827	26 7.2 Integers variables and
-0.402273	of call and return and
-1.667302	a power of 2 and
-0.230279	It takes between 2 and
-0.378244	in an import table and
-0.286659	and analyzing program performance and
-0.286659	have very good performance and
-0.232411	accessed in sequential order and
-0.232411	have a natural order and
-0.090486	to be very long and
-0.090486	can be very long and
-0.227528	would give annoyingly long and
-0.338214	64-bit Windows and 32-bit and
-0.255195	optimization capabilities for 32-bit and
-0.255195	separate executables for 32-bit and
-0.239457	in performance between 32-bit and
-0.189450	Gnu libraries support 32-bit and
-0.189450	Today (2013) both 32-bit and
-0.189450	utility. It supports 32-bit and
-0.084476	are not. Supports 32-bit and
-0.084476	binary code). Supports 32-bit and
-0.189450	many platforms, including 32-bit and
-0.189450	Windows and Linux, 32-bit and
-0.189450	Mac OS X, 32-bit and
-0.189450	in both 16-bit, 32-bit and
-0.235130	history of that branch and
-0.291721	times the first way and
-0.291204	on multiple data elements and
-0.362628	makes function calls faster and
-0.223339	between threads becomes faster and
-0.223339	application is generally faster and
-0.223339	make software packages faster and
-0.361375	read the time before and
-0.276638	inside your program before and
-0.222429	time stamp counter before and
-0.222429	the clock count before and
-0.114978	the function is called and
-0.271807	each function is called and
-0.315959	all destructors are called and
-0.226640	a different code address and
-0.329328	an arbitrary memory address and
-0.226640	program by their address and
-0.335055	pattern, while Pentium 4 and
-0.330057	to overlap the call and
-0.230453	The overhead of call and
-0.234447	sizeof(S1) would be 8 and
-0.234700	nothing between 8 bit and
-0.234590	efficiency is reflected, first and
-0.516756	than in a register and
-0.234852	optimization Intel: "Intel 64 and
-0.406123	in Intel function libraries and
-0.313111	Various graphics function libraries and
-0.215166	libraries: long vector libraries and
-0.215166	to try different libraries and
-0.215166	including all runtime libraries and
-0.472764	of floating point registers and
-0.194664	because both the pointers and
-0.245312	as well as pointers and
-0.194664	you analyze all pointers and
-0.323586	disadvantages of using pointers and
-0.278802	complications with member pointers and
-0.194664	as arguments while pointers and
-0.194664	1. Relocation. All pointers and
-0.194664	for the link pointers and
-0.194664	variables. This includes pointers and
-0.321932	priority before the test and
-0.621827	every time a new and
-0.117496	of memory with new and
-0.117496	an object with new and
-0.117496	memory allocation with new and
-0.117496	allocated dynamically with new and
-0.193513	alternative to using new and
-0.193513	or CString uses new and
-0.193513	with the operators new and
-0.193513	of alloca over new and
-0.367022	available in 64-bit systems and
-0.131926	bits in 32-bit systems and
-0.131926	bytes in 32-bit systems and
-0.131926	eight in 32-bit systems and
-0.131926	six in 32-bit systems and
-0.347052	for 64-bit operating systems and
-0.052630	in 32-bit operating systems and
-0.193513	compatibility with existing systems and
-0.452346	annoying to the user and
-0.320662	unnecessary for the user and
-0.289292	You should avoid these and
-0.315288	bottleneck than memory access and
-0.217799	relevant when CPU access and
-0.550672	to put file access and
-0.217799	structures with non-sequential access and
-0.290834	Not optimized for SSE2 and
-0.904815	and the operating system and
-0.324451	both compiler, operating system and
-0.228950	preferably aligned by 32 and
-0.228950	than 8, 16, 32 and
-0.211401	from the library file and
-0.211401	C or C++ file and
-0.211401	in one source file and
-0.170241	both the executable file and
-0.170241	Both the executable file and
-0.331340	the 64-bit vector operations and
-0.494686	of floating point operations and
-0.222623	subtraction, comparison, bit operations and
-0.257359	if one is 0 and
-0.033632	in b to 0 and
-0.339116	other values than 0 and
-0.257359	comparisons i < 0 and
-0.339046	by specifying the type and
-0.303263	about the function type and
-0.234357	the worst possible case and
-0.546920	efficient in some cases and
-0.337425	versions for different processors and
-0.204907	and between simple processors and
-0.256838	Intel processors. AMD processors and
-0.256838	number of physical processors and
-0.204907	effect on older processors and
-0.204907	also see emulated processors and
-0.227562	count (ArraySize) is constant and
-0.227562	the double precision constant and
-0.296282	tool can set up and
-0.222395	clock frequency goes up and
-0.296282	handlers for cleaning up and
-0.619842	of the residual error and
-0.248342	one way two times and
-0.197360	1024/4 = 256 times and
-0.197360	loop repeats 20 times and
-0.197360	occur at random times and
-0.327217	program repeats 1000 times and
-0.197360	start at unpredictable times and
-0.197360	critical function ten times and
-0.690718	stored on the stack and
-0.431208	parameters on the stack and
-0.009754	the Microsoft, Intel, Gnu and
-0.019732	platforms. Microsoft, Intel, Gnu and
-0.234493	function is so important and
-0.227519	supported by most CPUs and
-0.227519	on all 64-bit CPUs and
-0.280048	Specifies alignment of arrays and
-0.251885	has several large arrays and
-0.088784	recommended that big arrays and
-0.088784	you have big arrays and
-0.200509	for simple variables, arrays and
-0.200509	Linux) 4. Align arrays and
-0.216521	best performance. The Windows and
-0.182679	Intel compiler for Windows and
-0.124718	interface library for Windows and
-0.124718	when compiling for Windows and
-0.094451	32-bit and 64-bit Windows and
-0.094451	32- and 64-bit Windows and
-0.233611	intended for 32-bit Windows and
-0.168951	in 64 bit Windows and
-0.036482	work in both Windows and
-0.036482	syntax in both Windows and
-0.168951	64-bit Linux, BSD, Windows and
-0.386652	chain of function calls and
-0.297326	contain pure function calls and
-0.213715	program has many calls and
-0.213715	number of jumps, calls and
-0.529949	contains floating point calculations and
-0.312230	distinguish these two versions and
-0.311785	thanks to out-of-order execution and
-0.257982	supported by the processor and
-0.257982	depending on the processor and
-0.219759	AMD or VIA processor and
-0.910274	the code is compiled and
-0.318189	member function is big and
-0.226161	the compiled code big and
-0.339937	cores: Define multiple threads and
-0.347453	libraries have the best and
-0.500478	choice of programming language and
-0.235447	the C++ programming language and
-0.235447	a particular programming language and
-0.265380	C++ or assembly language and
-0.347730	optimized, using assembly language and
-0.328101	optimized for execution speed and
-0.234796	if a, b, c and
-0.218718	in speed between single and
-0.218718	Do not mix single and
-0.218718	penalty for mixing single and
-0.232780	multiplexers, arithmetic units, etc. and
-0.118299	at all on AMD and
-0.118299	of performance on AMD and
-0.118299	work well on AMD and
-0.126214	CPUs such as AMD and
-0.126214	yet. Supports both AMD and
-0.000266	microarchitecture of Intel, AMD and
-0.022101	breakdowns for Intel, AMD and
-0.022101	microprocessors from Intel, AMD and
-0.022101	supports both Intel, AMD and
-0.301727	memory block is allocated and
-0.041772	different sizes are allocated and
-0.104265	loop count is small and
-0.104265	repeat count is small and
-0.319405	loop with a small and
-0.248881	result because the overflow and
-0.342854	in case of overflow and
-0.172086	to problems of overflow and
-0.448633	not check for overflow and
-0.182076	wrap around on overflow and
-0.248881	will generate an overflow and
-0.182076	worry much about overflow and
-0.182076	Higher inputs give overflow and
-0.273965	The size of integers and
-0.027959	need conversions between integers and
-0.027959	Avoid conversions between integers and
-0.057843	enabled. Conversions between integers and
-0.273965	efficiency of 32-bit integers and
-0.343095	example transposes a matrix and
-0.146418	newer versions of Linux and
-0.085261	and data in Linux and
-0.085261	been introduced in Linux and
-0.085261	be overridden in Linux and
-0.401505	32-bit and 64-bit Linux and
-0.210786	available for 64-bit Linux and
-0.146418	option /MT). In Linux and
-0.145667	addressing. In 32-bit Linux and
-0.145667	difference between 32-bit Linux and
-0.146418	but also supports Linux and
-0.010483	guide for Windows, Linux and
-0.043542	and 64-bit Windows, Linux and
-0.446992	compiling for the AVX and
-0.209389	the use of classes and
-0.312576	are using vector classes and
-0.209389	vectors into C++ classes and
-0.310248	www.agner.org/optimize/cppexamples.zip containing container classes and
-0.289169	describes how this works and
-0.233251	versions, each carefully optimized and
-0.320766	the complicated address calculation and
-0.311585	first six integer parameters and
-0.270337	may ignore the problem and
-0.270337	to fix the problem and
-0.547072	compiled with AVX support and
-0.224575	method requires OS support and
-0.232698	and free. These operators and
-0.233312	dummy element to list and
-0.296392	The alignment of structure and
-0.268167	its own data structure and
-0.214944	for the logical structure and
-0.424262	is able to inline and
-0.435674	always 0 or 1 and
-0.371517	in 32 bit mode and
-0.256492	optimized for 16-bit mode and
-0.042835	switch to protected mode and
-0.042835	switching to protected mode and
-0.234077	various corrections for sign and
-0.332806	// Increment loop counter and
-0.222245	adding an integer counter and
-0.334816	possible. The first count and
-0.105075	the maximum repeat count and
-0.105075	worst-case maximum repeat count and
-0.243974	and fixed repeat count and
-0.232972	Use large data files and
-0.250735	version of object files and
-0.140731	to store help files and
-0.201026	configuration files, help files and
-0.183666	database connections. Open files and
-0.232972	several drivers, configuration files and
-0.296140	character arrays is fast and
-0.312550	code and for fast and
-0.241747	64-bit integers. The allocation and
-0.191491	to optimize register allocation and
-0.191491	process of dynamic allocation and
-0.191491	advance. The frequent allocation and
-0.191491	is finished. Register allocation and
-0.231873	errors in C++ programs and
-0.249331	investigation of the problems and
-0.045107	causes of compatibility problems and
-0.045107	sources of compatibility problems and
-0.149386	problems and compatibility problems and
-0.217961	sources of resource problems and
-0.132574	terms of usability problems and
-0.132574	compatibility problems, usability problems and
-0.318543	take up cache space and
-0.365103	override the CPU dispatching and
-0.279703	intended for CPU dispatching and
-0.279703	research on CPU dispatching and
-0.336451	supported by the microprocessor and
-0.359178	than a dedicated microprocessor and
-0.184096	The number of branches and
-0.184096	The target of branches and
-0.249677	program with many branches and
-0.198547	correlated with preceding branches and
-0.287097	to make a multiplication and
-0.231887	12.8a to 12.8b automatically and
-0.321042	This makes code caching and
-0.296982	as supported instruction sets and
-0.296982	on what instruction sets and
-0.268621	the code more complicated and
-0.268621	array elements more complicated and
-0.208186	method is extremely complicated and
-0.343923	on structured exception handling and
-0.321916	compiler can look like and
-0.231811	the various optimization methods and
-0.185391	behaves differently on signed and
-0.185391	speed between using signed and
-0.082877	The conversion between signed and
-0.082877	... Conversions between signed and
-0.185391	not to mix signed and
-0.319641	a specific CPU model and
-0.328637	of structured software development and
-0.296260	its own memory block and
-0.542464	new bigger memory block and
-0.357110	the child class name and
-0.219328	have any brand name and
-0.232312	vectors. This data conversion and
-0.287190	work load is high and
-0.306021	these variables to zero and
-0.021014	including the terminating zero and
-0.273219	Windows platforms. The Microsoft and
-0.273219	Gnu, Clang, Intel, Microsoft and
-0.375199	between a function parameter and
-0.232251	many reductions involving division and
-0.232251	operands means that source and
-0.232424	the program starts running and
-0.400045	relies on network resources and
-0.286678	the loop by n and
-0.319638	that produces a string and
-0.231098	compilers are becoming better and
-0.230918	compiler for Unix applications and
-0.230737	the Boolean operators && and
-0.218445	longer time than addition and
-0.498072	for floating point addition and
-0.231355	the <, <=, > and
-0.270317	some types of expressions and
-0.216846	only the simplest expressions and
-0.217633	get used to read and
-0.217633	and WritePrivateProfileString to read and
-0.275242	line to be read and
-0.216630	means of #include directives and
-0.216630	Table 18.2. Compiler directives and
-0.231240	GOT for all public and
-0.269208	to load the framework and
-0.388164	frameworks. The .NET framework and
-0.406986	by using static linking and
-0.083601	19 3.6 Dynamic linking and
-0.083601	acceptable. 3.6 Dynamic linking and
-0.319816	almost all modern microprocessors and
-0.135848	between floating point numbers and
-0.322686	on the hardware platform and
-0.231530	20 clock cycles later and
-0.231530	whole software project together and
-0.324745	MMX to 128-bit XMM and
-0.420700	place the user interface and
-0.420700	the graphical user interface and
-0.232155	projects have become bigger and
-0.286018	mathematical operations on vectors and
-0.230076	and one for AVX2 and
-0.230975	the use of << and
-0.173691	processors. Supports all x86 and
-0.173691	workaround. Supports all x86 and
-0.199941	source library. Supports x86 and
-0.286084	which software development process and
-0.455244	discussion of the advantages and
-0.216554	Each type has advantages and
-0.232728	arrays, a and b, and
-0.285835	26 about data storage and
-0.509621	the relevant optimization options and
-0.246770	to the copy constructor and
-0.246770	has no copy constructor and
-0.232439	32 bits of a[i] and
-0.248253	to inline the function, and
-0.197280	to the library function, and
-0.197280	in the select function, and
-0.678321	order of the operands and
-0.284847	Don't use an advanced and
-0.700501	is out of range and
-0.327975	it takes to start and
-0.229897	well- tested library modules and
-0.266666	The proxy is smaller and
-0.197280	make the code smaller and
-0.197280	structure 8 bytes smaller and
-0.212693	service routines, system core and
-0.212693	a dedicated microprocessor core and
-0.231135	lot of jumping around and
-0.212224	is typically between 5 and
-0.212224	multiplying by 3, 5 and
-0.499978	only the code section and
-0.229914	can be further tested and
-0.230891	This removed the contentions and
-0.265094	is 0 for positive and
-0.265094	style has both positive and
-0.230646	of arrays in C and
-0.213162	two names, one global and
-0.213162	page 26. Avoid global and
-0.305233	can avoid the conversions and
-0.284855	loop, the if statement and
-0.230971	then turn it off and
-0.212154	same thing as p and
-0.212154	at optimizing away p and
-0.326615	history of programming languages and
-0.230452	The procedures for installation and
-0.230971	frequency may vary dynamically and
-0.318069	able do function inlining and
-0.229955	or accessing databases, network and
-0.305259	running, and a slow and
-0.209957	from seldom used functions, and
-0.209957	multiple inheritance, virtual functions, and
-0.229678	are organized into lines and
-0.879666	in order to find and
-0.209430	has some syntax checking and
-0.366627	arrays with bounds checking and
-0.435126	apply to other platforms and
-0.167698	recommendation about which platforms and
-0.167698	assembly on all platforms and
-0.167698	x86 and ARM platforms and
-0.341780	in both the level-1 and
-0.323136	possible inputs is limited and
-0.229683	options for fast math and
-0.229387	the stack. String constants and
-0.229091	by constant = shift and
-0.285201	CString. This is safe and
-0.230572	memory economy, cache efficiency and
-0.304942	compile time. Text strings and
-0.232408	you gain by testing and
-0.183162	This also makes testing and
-0.183162	terms of development, testing and
-0.205200	few restrictions on alignment and
-0.205200	information about pointer alignment and
-0.282705	compares eax with 100 and
-0.184884	on the stack Variables and
-0.184884	other in memory. Variables and
-0.234337	of variable storage Variables and
-0.228097	for statistics, signal processing and
-0.242707	in a more clear and
-0.176777	making software more clear and
-0.162736	the code less clear and
-0.162736	good for making clear and
-0.229005	is stored in x, and
-0.440748	as Intel-based Mac OS and
-0.118531	correspondence between function names and
-0.118531	information about function names and
-0.118531	to define function names and
-0.157601	at CPU brand names and
-0.229350	of CPU time, RAM and
-0.020857	// number of rows and
-0.033700	doing the same thing and
-0.070237	exactly the same thing and
-0.282532	are instances of structures and
-0.228198	errors that seldom occur and
-0.229326	some things very smart and
-0.194471	Microprocessors with the SSE and
-0.194471	not support the SSE and
-0.200905	because we are reading and
-0.200905	is spent on reading and
-0.312609	per vector. The simplest and
-0.410016	issue an error message and
-0.248403	an appropriate error message and
-0.316130	all the CPU cores and
-0.227448	intended for array sizes and
-0.201258	work on the PathScale and
-0.201258	by Microsoft, Intel, PathScale and
-0.068228	distributions of Linux, BSD and
-0.068228	objects in Linux, BSD and
-0.068228	whereas 64-bit Linux, BSD and
-0.118690	with Windows, Linux, BSD and
-0.234070	start of the program, and
-0.310157	modified by the program, and
-0.149751	to execute the program, and
-0.197366	multiplication prior to SSE4.1 and
-0.197366	set, one for SSE4.1 and
-0.226409	in a memory buffer and
-0.226409	the value of seconds and
-0.280683	comes with the compiler, and
-0.301533	Pointers can be invalid and
-0.226409	used for file input and
-0.217047	use for many programmers and
-0.169422	guide for assembly programmers and
-0.169422	is for advanced programmers and
-0.952117	of the critical stride and
-0.806020	the SSE2 instruction set, and
-0.169783	on Pentium 4 processors, and
-0.169783	AMD and VIA processors, and
-0.169783	best on future processors, and
-0.283958	the size doesn't matter and
-0.196981	structure or class declaration and
-0.196981	have extern "C" declaration and
-0.281616	its many optimization features and
-0.228057	elements have been added and
-0.226054	function is OS independent and
-0.226054	is simply optimized away and
-0.315600	analogous to example 15.1b and
-0.194776	to an inlined 15.1b and
-0.822171	by constant = multiply and
-0.536641	130 for an explanation and
-0.282824	discriminating between CPU brands and
-0.459772	28 below the diagonal and
-0.227427	necessary to reload *p and
-0.226511	a floating point addition, and
-0.134915	and 64-bit. Supports OpenMP and
-0.014744	Supports parallel processing, OpenMP and
-0.134915	(see page 107), OpenMP and
-0.192656	programs should be standardized and
-0.192656	syntax is fully standardized and
-0.279755	member functions of parent and
-0.128512	class. Members of parent and
-0.163617	members of both parent and
-0.279083	programming languages, operating systems, and
-0.224587	value 0 for false and
-0.365056	compiled on a PC and
-0.279664	distinguish between coarse-grained parallelism and
-0.225612	to choose between c2 and
-0.224075	The rules for prediction and
-0.224075	can overlap the iterations and
-0.353877	pointer to an integer, and
-0.187780	a biased binary integer, and
-0.224587	general literature on algorithms and
-0.280246	go through the PLT and
-0.070506	balanced mix of additions and
-0.070506	a combination of additions and
-0.154721	calculated by n additions and
-0.188250	and loaded into ecx and
-0.188250	the registers eax, ecx and
-0.224075	function parameters, local variables, and
-0.224587	not always accurate, however, and
-0.224075	multiple programming languages, profiling and
-0.155157	the code is fragmented and
-0.155157	memory to be fragmented and
-0.201158	memory to become fragmented and
-0.200189	newer. The CPU family and
-0.154285	based on its family and
-0.154285	than its brand, family and
-0.188250	increasing number of devices and
-0.188250	again. Accessing system devices and
-0.165761	rid of the GOT and
-0.123190	will not use GOT and
-0.123190	has no effect. GOT and
-0.123190	option -read_only_relocs suppress. GOT and
-0.224587	and sound processing Memory and
-0.225099	program is shut down and
-0.267218	number of cache misses and
-0.197751	the code, cache misses and
-0.223290	when compiled with -fpic and
-0.328202	register into the carry and
-0.188806	is used for debugging and
-0.144030	and turn off debugging and
-0.144030	cost of verifying, debugging and
-0.223290	in the next vector, and
-0.182120	beginning of the object, and
-0.182120	owns the allocated object, and
-0.223874	imprecisions should be allowed and
-0.231240	the test program itself and
-0.182120	than the application itself and
-0.224458	calculations including linear algebra and
-0.222707	events as task switches and
-0.597471	program code is distributed and
-0.306082	than in 32-bit mode, and
-0.181589	file in exclusive mode, and
-0.277613	the features of Java and
-0.223290	friendly. It is free and
-0.276953	program development more expensive and
-0.081583	in speed between rounding and
-0.081583	the difference between rounding and
-0.260824	and the operating system, and
-0.192297	is no operating system, and
-0.306848	applied to 32-bit integers, and
-0.181589	loop is interpreted again and
-0.181589	addresses is reused again and
-0.361155	done by the linker and
-0.182120	from both compiler, linker and
-0.224458	more difficult to debug and
-0.064297	with the Gnu, Clang and
-0.064297	as the Gnu, Clang and
-0.385825	it gives more reliable and
-0.297344	intended, while the Borland and
-0.313082	full 128-bit execution units and
-0.223290	saving and restoring registers, and
-0.154533	// function to transpose and
-0.081225	static variables if possible, and
-0.081225	possible implementation if possible, and
-0.081225	be avoided, if possible, and
-0.091809	few branches as possible, and
-0.173470	The code is compact and
-0.333059	data is more compact and
-0.222245	efficiency. Using unaligned reads and
-0.310845	is inefficient, of course, and
-0.319219	discussed how to identify and
-0.395657	type is more complex and
-0.078131	computation time. 4 Performance and
-0.078131	....................................................................................... 22 4 Performance and
-0.078131	124 2 13.4 Test and
-0.078131	reliable decision. 13.4 Test and
-0.173470	viable compromise when portability and
-0.173470	compromise between efficiency, portability and
-0.310845	same time to evaluate and
-0.222925	C#, Visual Basic .NET and
-0.174078	Pointers versus references Pointers and
-0.060461	vector operations. 7.6 Pointers and
-0.060461	Booleans................................................................................................................... 33 7.6 Pointers and
-0.221567	programming constructs are costly and
-0.174686	the code more efficient, and
-0.174686	to make pointers efficient, and
-0.222245	next generation of computers and
-0.221567	values of A, B and
-0.174078	perfectly varies between 9 and
-0.174078	3, 4, 6, 9 and
-0.315765	for the Intel Core and
-0.220890	program in a debugger and
-0.173470	rounding instead of truncation and
-0.173470	be changed to truncation and
-0.413704	finding the hot spots and
-0.221567	supported in Windows 7 and
-0.221567	are actually quite powerful and
-0.014450	C++ compiler for 32- and
-0.131824	Intel libraries. Supports 32- and
-0.221567	image processing, sound processing, and
-0.220890	for many computer users and
-0.173470	programming language, e.g. C++, and
-0.173470	code. C#, managed C++, and
-0.222925	various methods for communication and
-0.174686	fine-grained parallelism because communication and
-0.218358	C++, D, Pascal, Fortran and
-0.218358	discussion of the increment and
-0.219166	data are accessed backwards and
-0.218358	the excessive memory swapping and
-0.112753	explicit use of memset and
-0.112753	by calls to memset and
-0.112753	use the functions memset and
-0.218358	are becoming more popular and
-0.219976	functions for string searching and
-0.243549	inlining and constant propagation and
-0.157542	to enable constant propagation and
-0.162353	used during program development, and
-0.162353	costs of software development, and
-0.219166	the parallelism is obvious and
-0.218358	type casting. Linked lists and
-0.219166	functions like sqrt, pow and
-0.218358	Far storage, far pointers, and
-0.218358	Dynamic memory allocation Objects and
-0.068241	allowed to have constructors and
-0.021577	destructors. The copy constructors and
-0.021577	useful for copy constructors and
-0.021577	are no copy constructors and
-0.272943	always 1 if nonzero and
-0.426695	install a software package and
-0.291501	is difficult to understand and
-0.218358	which is quite inefficient, and
-0.218358	ms for foreground jobs and
-0.308167	distinction between the latency and
-0.219166	possible on Linux platforms, and
-0.218358	all code branches separately and
-0.291501	allows common subexpression elimination and
-0.219166	two summation variables sum1 and
-0.218358	or C++ code. Compilers and
-0.219166	these. The CodeGear, Codeplay and
-0.162353	many advanced optimizing features, and
-0.162353	need better backup features, and
-0.218358	use the zero flag and
-0.163064	the numbers in b[i] and
-0.163064	by checking if b[i] and
-0.067431	and delete or malloc and
-0.067431	and delete, or malloc and
-0.068241	with the functions malloc and
-0.068241	and delete (or malloc and
-0.308167	> 0 is true, and
-0.214588	categories: File input/output Graphics and
-0.216594	aware of these obstacles and
-0.216594	the value of m and
-0.214588	a lot of resources, and
-0.214588	there is always one, and
-0.526344	search facilities are needed, and
-0.214588	names in the SVML and
-0.006628	time than addition, subtraction and
-0.027143	involving integer addition, subtraction and
-0.350613	single or double precision, and
-0.214588	we know that u.f and
-0.267764	examples on page 134 and
-0.268897	you could calculate *p+2 and
-0.146193	two or more cores, and
-0.146193	processing instructions, multiple cores, and
-0.374054	branch into the pipeline and
-0.270033	Example 7.6. Set flush-to-zero and
-0.215590	code in example 14.8 and
-0.146193	no checking for overflow, and
-0.146193	bounds violation, integer overflow, and
-0.066684	is calculated in advance and
-0.066684	has calculated in advance and
-0.350613	represent the "worst case" and
-0.147049	have a = 0x2710 and
-0.394056	reads from address 0x2710 and
-0.297012	isolate the hot spot and
-0.214588	a stack frame, saving and
-0.268897	the importance of structured and
-0.215590	due to poor documentation and
-0.215590	fact it does not, and
-0.267764	way of example 12.4b and
-0.214588	will generate an underflow and
-0.215590	is less than 2n and
-0.067330	member function. 7.12 Branches and
-0.067330	conversions.................................................................................................... 40 7.12 Branches and
-0.191204	development of user interfaces and
-0.146193	access to hardware interfaces and
-0.214588	invalidate each other's caches and
-0.350613	stored as it is, and
-0.495329	the interrupt 3 breakpoint and
-0.214588	with a well-defined functionality and
-0.147049	go through multiple layers and
-0.147049	other extra software layers and
-0.214588	two induction variables Y and
-0.214588	smart pointers are auto_ptr and
-0.495329	point constants, string constants, and
-0.215590	calling WritePrivateProfileString, which opens and
-0.215590	to the right format and
-0.214588	measured with millisecond resolution and
-0.214588	floating point addition units, and
-0.260750	Windows (See page 49 and
-0.208376	i to four bits, and
-0.209694	rows are accessed consecutively and
-0.209694	Out-of-order execution (chapter 11) and
-0.209694	A and then B, and
-0.208376	the effort. Square blocking and
-0.208376	the operating system API and
-0.208376	TILESIZE // Loop r1 and
-0.209694	{ // Loop r2 and
-0.208376	code is fast anyway and
-0.120528	just-in-time compilers, system database, and
-0.120528	with a remote database, and
-0.208376	size when doing calculations, and
-0.120528	the last cache level, and
-0.120528	the object file level, and
-0.208376	in the relevant books and
-0.120528	is designed for generality and
-0.120528	if the full generality and
-0.208376	copied to the parameter, and
-0.208376	compilers have many keywords and
-0.279713	with the option -fno-pic and
-0.260750	_M_IX86 x86-64 platform _M_IX86 and
-0.208376	to seek information elsewhere and
-0.208376	with vector integer operations, and
-0.260750	ever bigger software packages and
-0.208376	given on page 136 and
-0.208376	to 151 15.1c automatically, and
-0.208376	0x8040); See page 145 and
-0.208376	The distinctions between RISC and
-0.208376	implementations of C++, Pascal and
-0.056096	c; }; 7.23 Constructors and
-0.056096	.............................................................................................................. 54 7.23 Constructors and
-0.208376	CISC processors, between PC's and
-0.120528	memory, such as DOS and
-0.120528	old operating systems DOS and
-0.208376	// Example 7.4. Signed and
-0.208376	functions such as logarithms and
-0.209694	course be the easiest and
-0.120528	data structure, data flow and
-0.226511	determines the program flow and
-0.208376	Now s0, s1, s2 and
-0.208376	calls another function, etc., and
-0.208376	exceptions thrown by F2 and
-0.208376	is not human readable and
-0.208376	principles here: functional decomposition and
-0.208376	forums Several internet forums and
-0.208376	... } If Func1 and
-0.208376	repeat count is odd and
-0.208376	integers as Boolean vectors, and
-0.209694	separately. The allocation, deallocation and
-0.631735	procedure linkage table (PLT) and
-0.208376	bit systems: Pointers, references, and
-0.101515	= 6.0f; Constant folding and
-0.101515	few places. Constant folding and
-0.047495	and r in Sum2 and
-0.047495	more efficient than Sum2 and
-0.047495	three functions Sum1, Sum2 and
-0.196213	800 bytes smaller. Structure and
-0.196213	__INTEL_COMPILER n.a. n.a. _MSC_VER and
-0.196213	with the & operator; and
-0.196213	monitor counters are CPU-specific and
-0.008519	it takes to develop and
-0.196213	on most microprocessors. Multiplication and
-0.196213	unrolling in example 14.12b and
-0.196213	rather than self-styled hacks and
-0.196213	is only a hint and
-0.247053	See the preceding paragraph and
-0.247053	users. Firewalls, virus scanners and
-0.196213	required. See page 73 and
-0.196213	different system color settings and
-0.196213	have sent me corrections and
-0.196213	menu click becomes inconsistent and
-0.196213	The cost of starting and
-0.196213	risk factor in itself, and
-0.196213	8 - 64 Kbytes and
-0.073344	large cost to creating and
-0.073344	is responsible for creating and
-0.196213	and calls alternately FuncA and
-0.196213	variables to be overwritten, and
-0.196213	caching is advantageous if, and
-0.196213	is contained in p1 and
-0.196213	with the functions lrintf and
-0.196213	many functions for audio and
-0.196213	at a high price, and
-0.196213	capable of register renaming and
-0.247053	signal processing, data compression and
-0.035126	eee is the exponent, and
-0.035126	sign bit, the exponent, and
-0.196213	of fine-tuning, testing, verifying and
-0.035126	.................................................................................................................. 60 7.30 Exceptions and
-0.035126	of multithreading. 7.30 Exceptions and
-0.196213	default constructors, copy constructors, and
-0.196213	doesn't have to push and
-0.196213	such as sorting, searching, and
-0.196213	object is deleted properly and
-0.196213	variable for accessing list[i].a and
-0.196213	16-byte instructions MOVNTPS, MOVNTPD and
-0.196213	and 137, respectively. Increment and
-0.459145	subexpression elimination, constant propagation, and
-0.459145	"Intel Math Kernel Library" and
-0.196213	to speed up multiplications and
-0.196213	updates should be optional and
-0.196213	16 bit platform __GNUC__ and
-0.247053	big-endian storage. Example 14.23b and
-0.196213	compiler in many respects and
-0.196213	the CPU doesn't support, and
-0.247053	can be quite tedious and
-0.247053	to use a systematic and
-0.073344	cost to memory management and
-0.073344	cost of heap management and
-0.196213	Intrinsic functions look clumsy and
-0.196213	where instructions are fetched and
-0.196213	example, #define ABC 123 and
-0.196213	put away in reusable and
-0.196213	such as Taylor expansions and
-0.196213	unstable due to interrupts and
-0.196213	Agner Fog. Public distribution and
-0.196213	of calls to log, and
-0.073344	Codes", by S. Goedecker and
-0.073344	improving performance. Stefan Goedecker and
-0.196213	the measurements as accurate and
-0.196213	tasks such as sorting and
-0.247053	response times to keyboard and
-0.035126	precision division, square root and
-0.035126	inefficient. Division, square root and
-0.196213	linear algebra and statistics, and
-0.247053	to prevent memory leaks and
-0.196213	or See page 95 and
-0.459145	with new and delete, and
-0.247053	array element is accessed, and
-0.035126	the object. 7.17 Structures and
-0.035126	.............................................................................................. 50 7.17 Structures and
-0.161644	operating systems are common, and
-0.161644	program is fast, compact, and
-0.161644	The characters '?', '@' and
-0.161644	use different memory areas, and
-0.161644	all kinds of strange and
-0.161644	point to become invalid, and
-0.161644	functions such as sqrt and
-0.161644	address esp+8 and esp+12 and
-0.161644	as xn = x∙xn-1, and
-0.161644	because of cache evictions and
-0.161644	time, usability, program compactness, and
-0.161644	criticized for code bloat and
-0.161644	of Java and C# and
-0.161644	(1) or false (0); and
-0.161644	typical sources of frustration and
-0.161644	least in some situations, and
-0.161644	language programming, compiler technology, and
-0.161644	different microprocessors, different alignments and
-0.161644	of efficiency, platform independence, and
-0.161644	all resources are sufficient, and
-0.161644	of background processes running, and
-0.161644	Volume 1, 2A, 2B, and
-0.161644	calculation here gives a+b=0, and
-0.161644	to avoid hard-to-find errors, and
-0.161644	three clauses: initialization, condition, and
-0.161644	global and one local, and
-0.161644	the difference between commas and
-0.161644	(XMM), 256 bits (YMM), and
-0.161644	above sections are dominating and
-0.161644	tasks such as spell-checking and
-0.161644	universal, flexible, well tested, and
-0.161644	a low priority thread, and
-0.161644	operators &&, ||, ! and
-0.161644	Developer’s Manual", Volume 2A and
-0.161644	9.3. Time for transposing and
-0.161644	the arrays are aligned, and
-0.161644	may be both cheaper and
-0.161644	price, compatibility, second source, and
-0.161644	problem by defining _mm_malloc and
-0.161644	of instruction latencies, throughputs and
-0.161644	object-oriented programming, modularity, reusability and
-0.161644	execution speed, memory economy and
-0.161644	v. 10.1.020. Functions _intel_fast_memcpy and
-0.161644	functions such as GetPrivateProfileString and
-0.161644	operation will be non-zero, and
-0.161644	easier to test, maintain and
-0.161644	in a separate module, and
-0.161644	unsigned integers is ambiguous and
-0.161644	are not always sequential, and
-0.161644	as AQtime, Intel VTune and
-0.161644	stack at address esp+8 and
-0.161644	table values by hand and
-0.161644	to its own caller, and
-0.161644	addresses 0x2F00, 0x3700, 0x3F00 and
-0.161644	time T+1 to T+6, and
-0.161644	all the G values, and
-0.161644	only in the Professional and
-0.161644	function calling conventions. FreeBSD and
-0.161644	you want to optimize, and
-0.161644	libraries named MKL, VML and
-0.161644	discriminates between CPU brands, and
-0.161644	type. Interrupt service routines and
-0.161644	sets. Covers PC's, workstations and
-0.161644	done by fetching, decoding and
-0.161644	applies to 3-dimensional geometry and
-0.161644	the program is terminated and
-0.161644	'this' pointer, common subexpressions, and
-0.161644	9.2, such as flush and
-0.161644	include JavaScript, PHP, ASP and
-0.161644	problems are usability issues, and
-0.161644	such as semaphores, mutexes and
-0.161644	counts are often fluctuating and
-0.161644	similar CPU dispatch mechanisms, and
-0.161644	vector register containing (2,2,2,2), and
-0.161644	generate the value infinity, and
-0.161644	objects in computer games and
-0.161644	purposes such as email and
-0.161644	intended to be platform-independent and
-0.161644	work around this limitation and
-0.161644	access to virus attacks and
-0.161644	and have a temp1 and
-0.161644	STL (Standard Template Library) and
-0.161644	the matrix 512 520 and
-0.161644	Active Template Library (ATL) and
-0.161644	Linux kernel version 2.6.30 and
-0.161644	is faster than 15.1b, and
-0.161644	to distinguish between recoverable and
-0.161644	all software be reinstalled and
-0.161644	The cost of synchronizing and
-0.161644	code optimization. See www.agner.org/optimize and
-0.161644	are highly system dependent and
-0.161644	by the clock period and
-0.161644	2A, 2B, and 3A and
-0.161644	hardware access. Available protocols and
-0.161644	parallelization. Supports vector intrinsics and
-0.161644	object in the GOT, and
-0.161644	network with heavy traffic and
-0.161644	program is more manageable and
-0.161644	cache between each call, and
-0.161644	127 will generate -128, and
-0.161644	including the library libmmt.lib and
-0.161644	number (e.g. with _finite()) and
-0.161644	the bitwise operators (& and
-0.161644	in the copying process, and
-0.161644	that has no side-effects and
-0.161644	improved by consistent modularity and
-0.161644	can be different sizes, and
-0.161644	before leaving their workplace and
-0.161644	Gnu compilers. See www.openmp.org and
-0.161644	Common devices are CPLDs and
-0.161644	dynamic memory allocation (new and
-0.161644	matrix into smaller squares and
-0.161644	= (int)(&list[0]) + 100*16, and
-0.161644	for array bounds violations and
-0.161644	51 for the pros and
-0.161644	to show how tortuous and
-0.161644	to use try, catch, and
-0.161644	Addison-Wesley. Third Edition, 2005; and
-0.161644	The pre-increment operator ++i and
-0.161644	a thousand times lower; and
-0.161644	between PC's and mainframes, and
-0.161644	or First-In-Last-Out access, sort and
-0.161644	AND'ed with this mask, and
-0.161644	such as floppy disks and
-0.161644	switch statement jump tables, and
-0.161644	the code becomes bulky and
-0.161644	the constant vector (1,2,3,4), and
-0.161644	by avoiding pointer arithmetics and
-0.161644	integer division with truncation, and
-0.161644	The AND operator (&) and
-0.161644	the Boolean operators (&& and
-0.161644	case of an error; and
-0.496687	inside the loop is in
-0.346052	arithmetic A pointer is in
-0.330395	variable whose address is in
-0.293121	writes to matrix a in
-0.341048	in 64-bit systems and in
-0.235223	on Linux platforms, and in
-0.291162	last cache level, and in
-0.235223	a high price, and in
-0.235223	faster than 15.1b, and in
-0.561589	very likely to be in
-0.704447	is guaranteed to be in
-0.312057	0x273F would still be in
-0.310337	the stack and are in
-0.548586	systems. If you are in
-0.320863	well-structured C++ program are in
-0.344661	example 14.7b, we are in
-0.432777	efficient because they are in
-0.432777	integers, but they are in
-0.633936	and PathScale compilers can in
-0.345539	of a function or in
-0.310135	in system code or in
-0.234051	the compiler manual or in
-0.234051	the carry flag or in
-0.432118	advisable to make it in
-0.235126	*p+2 and store it in
-0.291050	read or write it in
-0.235126	you should disable it in
-0.373488	performance of the function in
-0.373488	address of the function in
-0.491807	declared in the function in
-0.326671	returning from the function in
-0.654952	declared inside the function in
-0.438483	out of a function in
-0.308495	link to a function in
-0.308495	executable to a function in
-0.449805	name as a function in
-0.446746	BSD. If a function in
-0.296732	to call a function in
-0.296732	support calls a function in
-0.296732	used. Whenever a function in
-0.225791	A very time-consuming function in
-0.839897	the CPU detection function in
-0.366011	tested the strlen function in
-0.225791	call the std::unexpected() function in
-0.291668	you are dealing with in
-0.235668	are usually dealt with in
-0.465777	Most of the code in
-0.676614	parts of the code in
-0.411419	modifications to the code in
-0.466163	order. If the code in
-0.317380	will replace the code in
-0.317380	will change the code in
-0.317380	to organize the code in
-0.785548	a piece of code in
-0.401623	same piece of code in
-0.297923	the range of code in
-0.259640	timediff[i]); } The code in
-0.340800	same time. The code in
-0.259640	integer calculations. The code in
-0.259640	CPU dispatching. The code in
-0.259640	return _mm_cvtsd_si32(_mm_load_sd(&x));} The code in
-0.446351	produce the same code in
-0.045909	13 Making critical code in
-0.313144	explain the above code in
-0.219034	the hardware definition code in
-0.272794	see the compiler-generated code in
-0.694663	the same way as in
-0.522443	R2 as well as in
-0.231059	CPU dispatching explicitly as in
-0.231059	variable in memory, as in
-0.231059	the same principle as in
-0.231059	with multiple counters, as in
-0.231059	use a union, as in
-0.844879	if it is not in
-0.844879	when it is not in
-0.342503	user interface is not in
-0.285774	use hyperthreading or not in
-0.342243	2 GB, but not in
-0.285774	saved in registers, not in
-0.230483	operating systems (but not in
-0.237136	is quite expensive - in
-0.286682	or double to int in
-0.347965	65535 uint16_t unsigned int in
-0.459280	uint8_t unsigned short int in
-0.325771	of type short int in
-0.325771	127 int8_t short int in
-0.231283	-32768 32767 int16_t int in
-0.222804	optimizing library functions than in
-0.222804	a dynamic library than in
-0.341010	and C++ faster than in
-0.333767	million times less than in
-0.061755	in memory rather than in
-1.153859	in registers rather than in
-0.222804	a separate file than in
-0.006504	in 64-bit Linux than in
-0.097268	in 64-bit mode than in
-0.097268	64 bit mode than in
-0.222804	programmable logic device than in
-0.523174	implemented by the compiler in
-0.974003	of the Intel compiler in
-0.813183	to the Gnu compiler in
-0.237319	compiler to store x in
-0.520414	hand, the compiler may in
-0.325699	choice of compiler may in
-0.348035	clock cycles. It may in
-0.318489	redesigning a program may in
-0.232275	const int declaration may in
-0.332345	You cannot avoid this in
-0.234776	be implemented like this in
-0.761119	elements at a time in
-0.422605	a waste of time in
-0.229993	mispredicted only one time in
-0.704435	take a long time in
-0.691572	most of its time in
-0.474382	that takes longer time in
-0.354673	which version to use in
-0.233754	than optimizing CPU use in
-0.289490	to economize resource use in
-0.236513	the loop exits, when in
-0.378164	for the main memory in
-0.234556	feature for reserving memory in
-0.342108	than processing the data in
-0.327141	realistic set of data in
-0.383912	Any pointers to data in
-0.329778	public functions and data in
-0.311463	Func(a[i]); } The data in
-0.221392	accessing the same data in
-0.200548	size of all data in
-0.200548	operations on all data in
-0.221392	unit of received data in
-0.221392	concentrated on arranging data in
-0.346527	to run the program in
-0.346527	and stop the program in
-0.375541	making the entire program in
-0.002809	Store the result vector in
-0.235664	It will look different in
-0.309139	use than pointers because in
-0.233215	x = *(++p) because in
-0.233215	x = array[++i] because in
-0.494347	algebra are the same in
-0.351278	to write the same in
-0.309823	the order of functions in
-0.279730	put the different functions in
-0.225157	have information about functions in
-0.299559	page 96). Virtual functions in
-0.225157	for all suitable functions in
-0.225157	variables and internal functions in
-0.225157	// Place non-polymorphic functions in
-0.232442	returned in registers only in
-0.232442	supports this option only in
-0.232442	register size comes only in
-0.172848	waiting for each other in
-0.075420	stored near each other in
-0.166747	together near each other in
-0.292129	with a decimal point in
-0.496053	iteration of the loop in
-0.601427	by unrolling the loop in
-0.328721	to vectorize the loop in
-0.336401	Assume that a loop in
-0.379219	Z } The loop in
-0.291265	value 1000. The loop in
-0.417079	emulate the while loop in
-0.220761	diagonal. The c loop in
-0.274748	in a message loop in
-0.333930	calls another function which in
-0.406378	supports intrinsic functions, but in
-0.228029	in 32-bit systems, but in
-0.228029	the general case, but in
-0.228029	of system programming, but in
-0.228029	arrays as required, but in
-0.245425	dispatching and is used in
-0.245425	Position-independent code is used in
-0.323721	trick which is used in
-0.323721	level-1 cache is used in
-0.245425	This standard is used in
-0.245425	16-bit mode is used in
-0.245425	function feature is used in
-0.485411	lookup can be used in
-0.485411	tool can be used in
-0.398069	instruction cannot be used in
-0.212881	functions that are used in
-0.378016	variables that are used in
-0.264578	libraries that are used in
-0.264578	iterators that are used in
-0.276685	virtual functions are used in
-0.205805	function they are used in
-0.205805	x86 processors are used in
-0.190544	size or data used in
-0.190544	variable is only used in
-0.521295	mechanism is also used in
-0.240685	pow The method used in
-0.190544	solutions are now used in
-0.190544	embedded systems Microcontrollers used in
-0.235826	manual is number one in
-0.417574	fetched from the cache in
-0.322330	that uses the cache in
-0.375235	bits of the integer in
-0.288009	and using the integer in
-0.388808	bits of an integer in
-0.299080	is simply an integer in
-0.299080	can convert an integer in
-0.286585	or Friday is set in
-0.231197	to the same set in
-0.353566	the highest instruction set in
-0.428887	and the derived class in
-0.587242	of a parent class in
-0.323774	provided as an example in
-0.338680	the default integer size in
-0.316688	of the register size in
-0.719308	the cache line size in
-0.335929	same as a pointer in
-0.536945	jump through a pointer in
-0.283057	parameters). The this pointer in
-0.417340	The implicit 'this' pointer in
-0.658421	If a and b in
-0.462719	joining a and b in
-0.787612	sign bit of i in
-0.311788	of i to float in
-0.346250	may move the object in
-0.371391	of the data object in
-0.315326	to store each object in
-1.037539	a floating point number in
-0.571050	transfer is more efficient in
-0.406413	mode, and more efficient in
-0.406413	point code more efficient in
-0.313344	function calling more efficient in
-0.476384	libraries are less efficient in
-0.336624	if this is possible in
-0.578802	animation. It is possible in
-0.308416	object, then the version in
-0.528332	to the desired version in
-0.433512	independent of the value in
-0.335083	mean use the value in
-0.219108	are transferred by value in
-0.219108	minimum value maximum value in
-0.219108	the four B value in
-0.219108	the four R value in
-0.418813	stores all the objects in
-0.109406	the movements of objects in
-0.109406	physical movements of objects in
-0.196144	than there are objects in
-0.226462	used in shared objects in
-0.226462	up 64-bit shared objects in
-0.196144	calculation of graphics objects in
-0.057457	Position-independent code Shared objects in
-0.057457	load time. Shared objects in
-0.013668	explained below. Shared objects in
-0.057457	in BSD Shared objects in
-0.057457	local references. Shared objects in
-0.534970	address of the variable in
-0.291993	optimize away the variable in
-0.378703	writing to a variable in
-0.265107	value from a variable in
-0.265107	you access a variable in
-0.286186	a pointer. A variable in
-0.197519	to some other variable in
-0.041607	as a register variable in
-0.041607	temp a register variable in
-0.327431	whenever a public variable in
-0.128522	as a global variable in
-0.128522	when a global variable in
-0.235915	automatically or does so in
-0.199174	registers, not on variables in
-0.441613	in floating point variables in
-0.250383	most often used variables in
-0.199174	conclude that most variables in
-0.159208	floating point register variables in
-0.041895	four single precision variables in
-0.041895	eight single precision variables in
-0.250383	can't have public variables in
-0.250383	data and local variables in
-0.199174	the class. Storing variables in
-1.009201	a power of 2 in
-0.301929	The multiplication by 2 in
-0.791610	to calculate the table in
-0.310065	will store the table in
-0.441047	read from a table in
-0.235558	64) can improve performance in
-0.277510	satisfied with making software in
-0.045982	manuals: 1. Optimizing software in
-0.223199	to make memory-hungry software in
-0.050230	stored in the order in
-0.107040	consecutively in the order in
-0.182525	is usually the order in
-0.182525	This reflects the order in
-0.182525	by controlling the order in
-0.291929	page 51). The order in
-0.209401	in a non-sequential order in
-0.291532	how the if branch in
-0.325772	more than one way in
-0.287207	in a safe way in
-0.284147	one of the elements in
-0.157106	the number of elements in
-0.160805	The number of elements in
-0.160805	Max. number of elements in
-0.319973	bits Number of elements in
-0.216200	search requests for elements in
-0.168663	not alias any elements in
-0.168663	different C++ language elements in
-0.261467	The first eight elements in
-0.056132	in eight consecutive elements in
-0.168663	causes all subsequent elements in
-0.368671	make function calls faster in
-0.227716	may run slightly faster in
-0.227716	often be executed faster in
-0.236075	should be declared const in
-0.129481	table that is stored in
-0.307861	if it is stored in
-0.028933	the variable is stored in
-0.059933	first result is stored in
-0.059933	second result is stored in
-0.149109	in advance and stored in
-0.047266	variable to be stored in
-0.047266	variables to be stored in
-0.047266	which can be stored in
-0.047266	variables can be stored in
-0.100312	code may be stored in
-0.124031	3.5 will be stored in
-0.124031	modifier will be stored in
-0.285037	array should be stored in
-0.047266	object cannot be stored in
-0.047266	variable cannot be stored in
-0.094227	Variables that are stored in
-0.043958	the data are stored in
-0.148004	a class are stored in
-0.094227	and objects are stored in
-0.092873	used variables are stored in
-0.092873	Global variables are stored in
-0.094227	the elements are stored in
-0.094227	point numbers are stored in
-0.094227	point constants are stored in
-0.108066	through a pointer stored in
-0.068186	(en.wikipedia.org/wiki/Standard_Template_Library). The objects stored in
-0.032756	only for objects stored in
-0.032756	principle for objects stored in
-0.108066	objects have been stored in
-0.108066	child are typically stored in
-0.108066	variable is never stored in
-0.108066	functions are usually stored in
-0.442533	times CriticalFunction is called in
-0.226419	functions are actually called in
-0.366879	any, is usually called in
-0.231306	up the function address in
-0.231306	to any other address in
-0.234991	by a factor 4 in
-0.290901	address divisible by 8 in
-0.392543	makes code. For example, in
-0.302115	cause overflow. For example, in
-0.302115	different tasks. For example, in
-0.302115	than post-increment. For example, in
-0.302115	error reporting. For example, in
-0.203947	fact using each bit in
-0.203947	get next each bit in
-0.553462	flip the sign bit in
-0.322730	converting a to unsigned in
-0.347979	This is the first in
-0.230492	&& expression, or first in
-0.350748	to distribute function libraries in
-0.336821	systems, but in registers in
-0.458246	of the vector registers in
-0.298919	and fourteen integer registers in
-0.487623	fact accessed through pointers in
-0.346437	of code to test in
-0.441833	functions. This is useful in
-0.292103	it can be useful in
-0.166706	This can be useful in
-0.196964	pointers can be useful in
-0.196964	Metaprogramming can be useful in
-0.344140	It is also useful in
-0.193649	option is less useful in
-0.235168	use exception handling even in
-0.333972	Gauss elimination. The method in
-0.229646	the function calling method in
-0.575695	to put file access in
-0.304417	access and network access in
-0.234537	reach element number 16 in
-0.317680	function opens a file in
-0.223022	plain old data file in
-0.223022	mirror the entire file in
-0.200988	the number of bits in
-0.467932	interpreting the same bits in
-0.285133	can set multiple bits in
-0.381703	systems and 64 bits in
-0.268692	size_t is 32 bits in
-0.268692	(Both use 32 bits in
-0.200988	or writing small bits in
-0.283939	different kinds of operations in
-0.228867	functions are primitive operations in
-0.235023	used by element 0 in
-0.234774	program than to type in
-0.344509	is often the case in
-0.236020	if it is short in
-0.234078	problems is quite simple in
-0.234395	queue of pending instructions in
-0.222319	support and is available in
-0.222319	InstructionSet() function is available in
-0.296193	asmlib, which is available in
-0.176301	expected to be available in
-0.234809	function libraries are available in
-0.181067	vector registers are available in
-0.181067	stack registers are available in
-0.299052	function is also available in
-0.224730	floating point registers available in
-0.299052	or logical processors available in
-0.176301	feature will become available in
-0.409802	needs to look up in
-0.385886	nothing to clean up in
-0.277000	resources are cleaned up in
-0.234381	is a minor error in
-0.215868	once or multiple times in
-0.321837	are used many times in
-0.094658	software package several times in
-0.094658	versions alternatingly several times in
-0.678944	stored on the stack in
-0.600145	transferred on the stack in
-0.221602	Use the call stack in
-0.234104	you can read about in
-0.343597	or object is accessed in
-0.408933	when data are accessed in
-0.349205	the objects are accessed in
-0.116102	the elements are accessed in
-0.111750	when elements are accessed in
-0.193818	the addresses are accessed in
-0.375457	the rows are accessed in
-0.172587	in memory or accessed in
-0.037149	time. Are objects accessed in
-0.037149	94 Are objects accessed in
-0.231964	dispatcher treats non-Intel CPUs in
-0.231964	also treat non-Intel CPUs in
-0.234007	has been incremented, while in
-0.312835	the use of arrays in
-0.282539	want as static arrays in
-0.328027	are intended to work in
-0.282218	this does not work in
-0.310320	Linux and 32-bit Windows in
-0.318385	number of function calls in
-0.449266	branches and function calls in
-0.240965	spent on function calls in
-0.240965	Avoid nested function calls in
-0.306011	and redo the calculations in
-0.213613	do simple integer calculations in
-0.134554	and doing multiple calculations in
-0.134554	CPU doing multiple calculations in
-0.290698	will return the result in
-0.290698	and store the result in
-0.290698	and stores the result in
-0.207816	[ecx+eax*4],ebx stores this result in
-0.207816	to ; store result in
-0.343156	a function is compiled in
-0.330044	pointer is 4 bytes in
-0.322811	systems and 8 bytes in
-0.325121	holes of unused bytes in
-0.325121	also 4 unused bytes in
-0.320472	is made very big in
-0.285099	environment, between different threads in
-0.313480	for running multiple threads in
-0.047780	to run two threads in
-0.323452	This can be necessary in
-0.154448	to access an element in
-0.016304	2 to each element in
-0.016304	// AND each element in
-0.016304	// Compare each element in
-0.233926	of each array element in
-0.154448	of a new element in
-0.154448	expressions for every element in
-0.200371	linked list. Each element in
-0.052262	the numerically largest element in
-0.796964	a hardware definition language in
-0.282251	the called function. But in
-0.227380	a and b. But in
-0.328566	improve the execution speed in
-0.317789	belong to the thread in
-0.508728	making a separate thread in
-0.530147	functions, trigonometric functions, etc. in
-0.260742	will catch an exception in
-0.260742	for raising an exception in
-0.322155	Objects that are allocated in
-0.233737	preferably be kept small in
-0.290114	that doesn't cause overflow in
-0.283221	systems or 64-bit integers in
-0.283221	to use 32-bit integers in
-0.760663	signed and unsigned integers in
-0.264108	behavior of signed integers in
-0.226357	the exception handling option in
-0.226357	the loop unroll option in
-0.335958	But implementing a matrix in
-0.366275	a 512 512 matrix in
-0.234125	are identical to Linux in
-0.260024	Remember that container classes in
-0.260024	to make container classes in
-0.217750	from multiple parent classes in
-0.420172	if it is done in
-0.598088	should preferably be done in
-0.211082	This is all done in
-0.211082	This is usually done in
-0.355163	keep the same precision in
-0.553757	single and double precision in
-0.302630	precision. Using double precision in
-0.346060	a new cache line in
-0.229847	profile-guided optimization. This works in
-0.165706	absolute addresses. This works in
-0.133634	These factors are explained in
-0.133634	name mangling are explained in
-0.237773	cache space, as explained in
-0.237773	vector classes, as explained in
-0.237773	eliminate branches, as explained in
-0.237773	register use, as explained in
-0.237773	other ways, as explained in
-0.237773	static linking, as explained in
-0.171977	background is further explained in
-0.327619	/ b) is calculated in
-0.552317	variable can be calculated in
-0.217354	which it has calculated in
-0.317736	and redo the calculation in
-0.317736	// Re-do the calculation in
-0.300970	instructions for address calculation in
-0.393540	line. This is advantageous in
-0.302925	lookup table is advantageous in
-0.172935	derived class is implemented in
-0.037213	code version is implemented in
-0.188664	code can be implemented in
-0.188664	loop can be implemented in
-0.188664	mechanism can be implemented in
-0.188664	etc., can be implemented in
-0.188664	8.24 can be implemented in
-0.170815	some day be implemented in
-0.249736	functions, etc. are implemented in
-0.155245	even for programs implemented in
-0.234056	to a usability problem in
-0.234518	the implicit pointer known in
-0.237521	a very efficient solution in
-0.237521	matrices. An efficient solution in
-0.216825	be a viable solution in
-0.235538	is only an advantage in
-0.235538	SSE4.1 gives an advantage in
-0.208894	is no such advantage in
-0.208894	hardly any speed advantage in
-0.233964	debugging and profiling support in
-1.156212	instruction set is supported in
-0.330898	system. AVX is supported in
-0.251410	when AVX2 is supported in
-0.234321	register variables is eight in
-0.233846	of elements in list in
-0.352191	1024 bits is likely in
-0.319645	for making the structure in
-0.224602	more clear program structure in
-0.148941	multiple threads that run in
-0.148941	Many services that run in
-0.276253	x86 family can run in
-0.197347	update process should run in
-0.197347	consumption of each run in
-0.233690	are implemented in hardware in
-0.269450	can store the values in
-0.269450	and insert the values in
-0.213970	the four G values in
-0.232721	the application program. All in
-0.233657	They have worked well in
-0.267854	may store the information in
-0.214667	testing contains debug information in
-0.214667	to store application-specific information in
-0.350218	- 2 clock cycles in
-0.193976	All pointers and addresses in
-0.193976	Position-independent code. All addresses in
-0.146927	will generate relative addresses in
-0.146927	calculating self- relative addresses in
-0.193976	members to round addresses in
-0.361977	useful performance monitor counter in
-0.771013	the time stamp counter in
-0.222789	access may be fast in
-0.222789	Integer operations are fast in
-1.051961	use dynamic memory allocation in
-0.233694	the other thread. However, in
-0.319821	35 This is optimal in
-0.552940	it may be optimal in
-0.200634	statement occupies a space in
-0.270603	takes up more space in
-0.200634	takes too much space in
-0.200634	and takes little space in
-0.353207	drivers differ a lot in
-0.194917	an explicit CPU dispatching in
-0.086614	set. 13.6 CPU dispatching in
-0.086614	126 13.6 CPU dispatching in
-0.086614	128 13.7 CPU dispatching in
-0.086614	129 13.7 CPU dispatching in
-0.194917	Example 13.2. CPU dispatching in
-0.308765	to implement a microprocessor in
-0.232662	jumps, calls and branches in
-0.232543	lower priority level, typically in
-0.233856	A few files, preferably in
-0.462222	vectorize the code automatically in
-0.197890	do this optimization automatically in
-0.197890	use vector operations automatically in
-0.197890	insert nontemporal writes automatically in
-0.233202	code that you see in
-0.035392	than the hardware implementation in
-0.439606	calculation is more complicated in
-0.304423	such as error handling in
-0.331180	alternatives to exception handling in
-0.232955	most often used members in
-0.288062	temporarily. Using the methods in
-0.318603	for modifying the name in
-0.232136	that seconds remains zero in
-0.308653	instructions for integer division in
-0.085156	there is no cost in
-0.085156	There is no cost in
-0.208351	freely without any cost in
-0.315075	your code is running in
-0.126863	tasks that are running in
-0.126863	and repagination are running in
-0.276739	systems disappears when running in
-0.160938	Mac operating system running in
-0.160938	core. Two threads running in
-0.160938	a higher-priority thread running in
-0.219048	2 // make dispatcher in
-0.423736	that the CPU dispatcher in
-0.347917	compiler puts the programmer in
-0.332702	possibly also a lookup in
-0.490458	points in the end in
-0.277286	pool. See the examples in
-0.277286	well documented. The examples in
-0.258427	allowed. The code examples in
-0.158587	may be a difference in
-0.013203	there is no difference in
-0.100011	is simply no difference in
-0.158587	is no big difference in
-0.221888	the CPU dispatch mechanism in
-0.323356	The CPU dispatch mechanism in
-0.205921	the CPU detection mechanism in
-0.275764	hash map is needed in
-0.258989	Fastcall is not needed in
-0.258989	(1) is not needed in
-0.181423	matrix longer than needed in
-0.181423	amount of memory needed in
-0.219319	most often true last in
-0.219319	big objects come last in
-0.001113	parameters to be transferred in
-0.018160	parameters would be transferred in
-0.041087	the parameters are transferred in
-0.041087	point parameters are transferred in
-0.020058	integer parameters are transferred in
-0.041087	four parameters are transferred in
-0.067279	Function parameters are transferred in
-0.233071	are one byte longer in
-0.255322	8.1. Comparison of optimizations in
-0.203561	make CPU- specific optimizations in
-0.203561	statement can improve optimizations in
-0.373649	supply such a framework in
-0.233300	embedded systems. A look in
-0.217731	system All newer microprocessors in
-0.306845	and operators Modern microprocessors in
-0.151567	Assume that the numbers in
-0.151567	to hold the numbers in
-0.201760	than generating denormal numbers in
-0.232101	expect to use later in
-0.221802	store many objects together in
-0.199583	class are stored together in
-0.199583	are always stored together in
-0.221802	are then linked together in
-0.173681	will be joined together in
-0.517659	should preferably be declared in
-0.346935	how they are declared in
-0.483684	variables and objects declared in
-0.187922	if any objects declared in
-0.233389	calculations piece by piece in
-0.308052	the microprocessor doesn't know in
-0.287648	as p and r in
-0.170694	event-counters do. This results in
-0.170694	store the four results in
-0.244480	to store intermediate results in
-0.170694	stores the thousand results in
-0.170694	matrix. My experimental results in
-0.231625	a polymorphic function goes in
-0.231456	test. disable power-save options in
-0.231625	Func1 and Func2 were in
-0.231275	precision in all operands in
-0.232525	a few unused points in
-0.319793	to insert a switch in
-0.306829	context switches is smaller in
-0.231810	message is provoked here in
-0.127934	fragmented and scattered around in
-0.059273	that are scattered around in
-0.059273	data are scattered around in
-0.127934	many functions scattered around in
-0.163745	are scattered randomly around in
-0.134566	advantageous to do things in
-0.134566	ways to do things in
-0.328945	methods and algebraic reductions in
-0.470153	systems should be tested in
-0.265676	have not been tested in
-0.047992	likely to cause contentions in
-0.047992	stride and cause contentions in
-0.047992	this can cause contentions in
-0.059511	96 9.10 Cache contentions in
-0.059511	opposite). 9.10 Cache contentions in
-0.254635	all pointers and references in
-0.228025	and mostly relative references in
-0.179247	DLL use absolute references in
-0.179247	calculation of self-relative references in
-0.318405	be no extra overhead in
-0.231450	because of a change in
-0.230644	are floating point-to-integer conversions in
-0.264164	135 The if statement in
-0.211401	be only one statement in
-0.072469	common cause of errors in
-0.072469	frequent source of errors in
-0.159501	for preventing program errors in
-0.125977	checks for such errors in
-0.276168	to prevent such errors in
-0.060424	the number of columns in
-0.015545	of rows and columns in
-0.143439	The multiplication by columns in
-0.286185	C++, and other languages in
-0.262264	cache space. The syntax in
-0.244033	behind the C++ syntax in
-0.193526	same inline assembly syntax in
-0.995305	This can be avoided in
-0.286134	flexible, but quite inefficient in
-0.281640	This method is described in
-0.090502	These algorithms are described in
-0.047386	CPU-intensive code, as described in
-0.007547	modern CPUs, as described in
-0.015228	multi-core CPUs, as described in
-0.090502	for the cases described in
-0.090502	Unfortunately, the syntax described in
-0.090502	methods are further described in
-0.171618	eight different cache lines in
-0.225571	the four cache lines in
-0.142741	only four cache lines in
-0.154383	of the 4 lines in
-0.154383	corresponds to 16 lines in
-0.229946	resources. Each graphics operation in
-0.231656	object, then the instance in
-0.194723	lrint function is given in
-0.194723	space can be given in
-0.147871	container classes are given in
-0.147871	Further details are given in
-0.149364	optimize access, as given in
-0.229607	requirements of the task in
-0.229835	be overloaded or limited in
-0.312214	weighed against the costs in
-0.209256	the STL also costs in
-0.285557	next instance of S1 in
-0.208603	value of register temp in
-0.208603	but will save temp in
-0.207734	access the system database in
-0.207734	the big registration database in
-0.230064	stored. All identical constants in
-0.230750	int) instead of bool in
-0.229835	reasons for this shift in
-0.286858	example 15.1b and d in
-0.021163	best into the algorithm in
-0.027932	to store all strings in
-0.027932	may store all strings in
-0.124456	how to store strings in
-0.124456	method of storing strings in
-0.057785	storage of text strings in
-0.057785	and handle text strings in
-0.252023	for testing multiple conditions in
-0.184769	recovering from error conditions in
-0.184769	tested under worst-case conditions in
-0.503494	place to the right in
-0.304087	similar to a macro in
-0.283776	comparing i with 100 in
-0.284892	switch between different tasks in
-0.228969	for specifying parallel processing in
-0.204146	the wheel. The containers in
-0.204146	Use these example containers in
-0.229024	with widely different priority in
-0.318213	can possibly be obtained in
-0.203644	one set of counters in
-0.658693	more performance monitor counters in
-0.229024	object is overwritten, possibly in
-0.325675	119 The function names in
-0.229291	on non- standardized details in
-0.304769	to make the rows in
-0.204649	the distance between rows in
-0.285630	above example may fail in
-0.228227	sequentially. Some applications (e.g. in
-0.285024	This works by compiling in
-0.262132	and memcpy, at least in
-0.262132	to do, at least in
-0.331383	the multiple data structures in
-0.186998	that overflow can occur in
-0.171617	// Overflow may occur in
-0.128495	the break will occur in
-0.028738	matrix when contentions occur in
-0.028738	dramatic when contentions occur in
-0.151970	code is inefficient, especially in
-0.151970	loss of precision, especially in
-0.151970	a scarce resource, especially in
-0.151970	efficient than relocation, especially in
-0.552893	code can be improved in
-0.195018	These methods are discussed in
-0.195018	common time-consumers are discussed in
-0.229089	function ReadTSC listed below in
-0.268839	generate an error message in
-0.268839	provoke an error message in
-0.020632	a store forwarding delay in
-0.228509	are a scarce resource in
-0.429319	reduced number of cores in
-0.201197	can be saved either in
-0.201197	multiple memory blocks, either in
-0.050094	the function is defined in
-0.050094	sin function is defined in
-0.128726	could have been defined in
-0.028783	12.5. Vector classes defined in
-0.028783	12.1. Vector classes defined in
-0.227641	Mac programs but rarely in
-0.176272	in a register except in
-0.176272	overflow and underflow except in
-0.176272	in the representation, except in
-0.227256	the "Macro loops" chapter in
-0.228848	lies r places back in
-0.282110	suitable containers class templates in
-0.302428	} The loop unrolling in
-0.198577	different versions of CriticalFunction in
-0.198577	the call to CriticalFunction in
-0.316779	step of the sequence in
-0.198279	opposite: Don't put something in
-0.198279	vectorized code. Storing something in
-0.392928	reduction would be invalid in
-0.315608	instead of user input in
-0.283555	memory that is organized in
-0.228529	Windows by transferring 'this' in
-0.499264	a lot to gain in
-0.161879	high priority. The gain in
-0.119668	quite substantial. This gain in
-0.161879	How much you gain in
-0.119668	the relatively small gain in
-0.313537	The same can happen in
-0.226938	can be considered metaprogramming in
-0.227574	constants we can define in
-0.429559	is difficult to implement in
-0.228554	calculates four consecutive terms in
-0.226786	data into multiple blocks in
-0.193757	preferably be put away in
-0.193757	likely to go away in
-0.226786	work load is low in
-0.254695	instruction sets is provided in
-0.128547	above. Examples are provided in
-0.128547	and parsing are provided in
-0.301492	in registers by default in
-0.327035	order. Long dependency chains in
-0.226786	available for general purposes in
-0.163673	the vector operations mentioned in
-0.163673	of the time-consumers mentioned in
-0.163673	than the ones mentioned in
-0.041005	................................................................................................ 157 17 Optimization in
-0.041005	normal. 157 17 Optimization in
-0.227492	must clean up everything in
-0.226080	counters instead of (or in
-0.084740	the code is included in
-0.084740	This time is included in
-0.094520	important functions are included in
-0.094520	'1' is not included in
-0.094520	yes License license included in
-0.225011	is doing two iterations in
-0.333324	branch prediction into account in
-0.323205	splitting the dependency chain in
-0.188533	data flow and algorithms in
-0.238428	test several different algorithms in
-0.189261	get four float additions in
-0.189261	can do four additions in
-0.226199	so many unknown factors in
-0.038808	These functions are listed in
-0.038808	The results are listed in
-0.038808	These suffixes are listed in
-0.038808	instruction latencies are listed in
-0.067218	instruction set, as listed in
-0.067218	using the instructions listed in
-0.225407	the algebraic reductions explicitly in
-0.299855	measured time is interpreted in
-0.315001	loaded cannot be determined in
-0.019582	matrix size causes misses in
-0.226199	256-bit registers named YMM in
-0.224993	for the specific purpose in
-0.172262	shared object without -fpic in
-0.172262	of compiling without -fpic in
-0.223640	128-bit vector registers had in
-0.224091	listed in table 19 in
-0.231920	and '$' are allowed in
-0.307597	name is not allowed in
-0.224542	template is calling itself in
-0.364910	many rules of algebra in
-0.224091	is available for free in
-0.223640	handling can be expensive in
-0.223640	{ // Catch exceptions in
-0.205459	results should be saved in
-0.144586	function calls are saved in
-0.144586	ebx that was saved in
-0.224091	almost independent of changes in
-0.297759	time that is measured in
-0.223640	is a risk factor in
-0.314231	the different execution units in
-0.312698	and insert the reciprocal in
-0.223917	a possible minor increase in
-0.421221	the time is spent in
-0.297464	only the time spent in
-0.419567	if an exception occurs in
-0.174164	times an interrupt occurs in
-0.277133	be implemented as follows in
-0.222342	to the next step in
-0.200010	identifies any hot spots in
-0.200010	for identifying hot spots in
-0.221819	instructions at specific places in
-0.276539	and || are evaluated in
-0.014474	are allocated and deallocated in
-0.132076	objects are also deallocated in
-0.222342	and multiplication are permissible in
-0.275946	used by many users in
-0.276539	variables is approximately six in
-0.357905	32-bit systems and fourteen in
-0.219282	to certain programming principles in
-0.273781	to do the reduction in
-0.163551	C++ language is portable in
-0.163551	C++ is fully portable in
-0.219282	faster than linked lists in
-0.323529	know how to recover in
-0.390404	Most of the advice in
-0.300378	the value is already in
-0.219906	they are available, i.e. in
-0.302694	of the software package in
-0.113187	extra cost is seen in
-0.113187	performance should be seen in
-0.113187	misses is not seen in
-0.219906	the two modules contiguous in
-0.219282	function or class separately in
-0.219282	is very old-fashioned. Development in
-0.273073	their index or key in
-0.012637	in which they appear in
-0.113187	which the modules appear in
-0.164101	the two loops (except in
-0.164101	floating point capabilities (except in
-0.219282	the position-independent code flag in
-0.163002	code should be written in
-0.163002	case with programs written in
-0.219906	that were not present in
-0.268800	only from one place in
-0.032310	32-bit systems and sixteen in
-0.032310	operating systems and sixteen in
-0.215504	SSE2 is always enabled in
-0.352927	of code is serial in
-0.215504	likely to require modifications in
-0.067218	Operations that are missing in
-0.067218	these functions are missing in
-0.035277	platforms. 2. Optimizing subroutines in
-0.001257	manual 2: "Optimizing subroutines in
-0.215504	strings of different lengths in
-0.146780	This information is contained in
-0.146780	can be completely contained in
-0.215504	generate floating point underflow in
-0.352927	contentions can be prevented in
-0.035277	power of 2. Contentions in
-0.035277	branch target buffer. Contentions in
-0.035277	target buffer (BTB). Contentions in
-0.035277	in my experiments. Contentions in
-0.041532	This method is illustrated in
-0.041532	This technique is illustrated in
-0.087460	bounds checking, as illustrated in
-0.032310	objects can be returned in
-0.032310	type can be returned in
-0.215504	RAM than there is, in
-0.146780	cannot set a breakpoint in
-0.146780	insert a fixed breakpoint in
-0.216278	logical register that appears in
-0.038217	work can be found in
-0.038217	(Examples can be found in
-0.087460	advanced features rarely found in
-0.408846	to the exception handler in
-0.147441	that can be coded in
-0.147441	instructions that are coded in
-0.215504	the last index changing in
-0.146780	carry bit is kept in
-0.146780	Sometimes, functions are kept in
-0.268800	before trying the techniques in
-0.146780	not necessarily stored sequentially in
-0.146780	can be accessed sequentially in
-0.215504	addresses is much simpler in
-0.163368	are described in detail in
-0.121020	described in more detail in
-0.209279	the above security advices in
-0.261769	core and an FPGA in
-0.121020	storing the elements consecutively in
-0.121020	structure are stored consecutively in
-0.366411	several layers of abstraction in
-0.209279	stride. Variables whose distance in
-0.209279	used by default anyway in
-0.209279	does not need updating in
-0.209279	your own profiling instruments in
-0.209279	allocation is unnecessarily wasteful in
-0.121020	80. The difference lies in
-0.121020	for this efficiency lies in
-0.209279	cause unpredictable errors elsewhere in
-0.209279	be better than RISC in
-0.121020	mathematical functions are supplied in
-0.121020	that I have supplied in
-0.209279	Connecting several standard PC's in
-0.209279	times and cause delays in
-0.209279	pow function uses logarithms in
-0.209279	Don't rely on longjmp in
-0.209279	the operating system kernel in
-0.209279	a lot of bookkeeping in
-0.209279	finding the right formula in
-0.209279	exiting the {} brackets in
-0.209279	feedback should be handled in
-0.634132	procedure linkage table (PLT) in
-0.209279	for use as pivot in
-0.101829	13.1 can be placed in
-0.101829	can then be placed in
-0.248040	who has to invest in
-0.326854	than Sum2 and Sum3 in
-0.035277	different platforms as shown in
-0.035277	as _mm_empty() as shown in
-0.197091	Uninstallation should also proceed in
-0.248040	time may be justified in
-0.248040	remove all disturbing influences in
-0.197091	it can be disabled in
-0.073675	-fpic because the relocations in
-0.073675	it will generate relocations in
-0.197091	usage in kernel code" in
-0.197091	to declare it locally in
-0.197091	template parameter. If MultiplyBy in
-0.197091	cannot find the answers in
-0.197091	critical function is inserted in
-0.197091	mask. Poor reproducibility. Delays in
-0.197091	alignment is not visible in
-0.248040	The results are summarized in
-0.197091	necessary to do experiments in
-0.197091	branches are scattered everywhere in
-0.197091	optimization hints as pragmas in
-0.197091	thread that runs alone in
-0.197091	is the biggest time-consumer in
-0.017280	namespaces. 65 8 Optimizations in
-0.017280	Namespaces........................................................................................................... 65 8 Optimizations in
-0.197091	detect opportunities for parallelization in
-0.197091	different processors are covered in
-0.197091	be a slight degradation in
-0.460754	Boolean variables are overdetermined in
-0.197091	graphics card or integrated in
-0.248040	option for source annotation in
-0.248040	of software develop- ment in
-0.197091	to invest more efforts in
-0.197091	// Remove right-most 1-bit in
-0.197091	the chapter "Register usage in
-0.197091	it was assigned previously in
-0.197091	does not necessarily stay in
-0.197091	| Wednesday | Friday) in
-0.197091	algorithm in question: Put in
-0.162454	64-bit systems will dominate in
-0.162454	still have a niche in
-0.162454	this. (In Windows, SetThreadAffinityMask, in
-0.162454	will make 32 AND-operations in
-0.162454	values because a typo in
-0.162454	keyword is not recognized in
-0.162454	calling conventions. The dot in
-0.162454	? 1 : 0] in
-0.162454	by the Gnu utilities in
-0.162454	problem has been alleviated in
-0.162454	into the right positions in
-0.162454	different function libraries. Numbers in
-0.162454	of cores will grow in
-0.162454	which supposedly is system-independent, in
-0.162454	interleave the two formulas in
-0.162454	access can be arranged in
-0.162454	limitation and other flaws in
-0.162454	system functions (e.g. GetLogicalProcessorInformation in
-0.162454	more efficient than investing in
-0.162454	typical degree of randomness in
-0.162454	the memory is mirrored in
-0.162454	program is often reorganized in
-0.162454	pointer is then de-referenced in
-0.162454	Using vector classes Programming in
-0.162454	traffic and a server in
-0.162454	loop-carried dependency chain. Nothing in
-0.162454	slow or completely absent in
-0.162454	are fetched and decoded in
-0.162454	that can be programmed in
-0.162454	are not stored contiguously in
-0.162454	denormal numbers. You may, in
-0.162454	performance. We must bear in
-0.162454	are equally efficient because, in
-0.162454	program under test finishes in
-0.162454	we have inserted UnusedFiller in
-0.162454	in the planning phase in
-0.162454	the rows are indexed in
-0.162454	} } The FactorialTable in
-0.162454	due to general improvements in
-0.162454	function" has been introduced in
-0.162454	an exception occurs somewhere in
-0.162454	may cause slight imprecision in
-0.162454	graphical user interface (OnIdle in
-0.162454	global offset table (GOT) in
-0.162454	make a thread-like scheduling in
-0.162454	The clumsy AND-OR construction in
-0.162454	by default, so 1.2 in
-0.162454	optimized code (release version) in
-0.162454	overhead of the iterator in
-0.162454	this language gained remarkably in
-0.162454	and which are cheap, in
-0.162454	with #) are costless in
-0.162454	between commas and semicolons in
-0.162454	Function addresses are obscured in
-0.162454	point and integer representations in
-0.162454	the different compilers succeeded in
-0.162454	as eliminating the if-branch in
-0.162454	return; } // continue in
-0.162454	system call (e.g. GetProcessAffinityMask in
-0.162454	give a considerable improvement in
-0.162454	reflected, first and foremost, in
-0.162454	a lot of CPU-time in
-0.162454	of optimizing University courses in
-0.162454	calculations, should be scheduled in
-0.162454	loop ; compute i/2 in
-0.162454	speed. Delays or glitches in
-0.162454	system calls (e.g. IsProcessorFeaturePresent in
-0.162454	+ FuncCol(i)) * sizeof(float) in
-0.162454	// number of rows/columns in
-0.162454	256 clock cycles. Calculations in
-0.162454	are incremental or iterative in
-0.162454	details on branch predictions in
-0.162454	replaced by my comments, in
-0.162454	error has occurred anywhere in
-0.162454	need to be resized in
-0.162454	data can be overridden in
-0.162454	this time lag. Thinking in
-0.162454	Functions _intel_fast_memcpy and __intel_new_strlen in
-0.162454	in the other volumes in
-0.536346	(static_cast<MyChild*>(this))->Disp(); } }; // The
-0.236034	outside both loops // The
-0.605938	sign bit of x The
-0.236713	{}; void xplus2() { The
-0.423286	b[r][c]); } } } The
-0.178101	{ ... } } The
-0.178101	FactorialTable[b]; ... } } The
-0.469630	- 1; } } The
-0.431349	+ 2; } } The
-0.047070	i, a); } } The
-0.305059	aa: a.store(aa+i); } } The
-0.229787	Induction; Induction++; } } The
-0.229787	+ i/2; } } The
-0.039610	... return 0; } The
-0.504425	b + 1; } The
-0.571186	list[i+2] = 2; } The
-0.295679	9 + 3; } The
-0.545015	y + 1.; } The
-0.332753	temp->b = 2.0; } The
-0.468803	list[i] += 1.0f; } The
-0.201458	induction variable Z } The
-0.201458	{ return pow(x,10); } The
-0.201458	abs(u.f) > abs(v.f) } The
-0.201458	of range printf(Greek[n]); } The
-0.201458	b[i] = Func(a[i]); } The
-0.201458	%10I64i", i, timediff[i]); } The
-0.201458	2.5}; return list[x]; } The
-0.236101	to optimization. Prefetching data The
-0.427924	u[0]. 14.10 Mathematical functions The
-0.375289	); 7.26 Overloaded functions The
-0.232489	names. Use fastcall functions The
-0.536131	in different C++ compilers The
-0.314894	on all C++ compilers The
-0.236015	on Mac platform. Intel The
-0.235510	/ 4; Register variables The
-0.335856	of elements in table The
-0.925494	Still faster if unsigned The
-0.299194	possibly improve the code. The
-0.270513	small sequences of code. The
-0.126633	any floating point code. The
-0.126633	style floating point code. The
-0.242943	of an intermediate code. The
-0.482320	the same source code. The
-0.192555	so-called position- independent code. The
-0.192555	C++ and Fortran code. The
-0.242943	in optimizing application-specific code. The
-0.192555	optimizations in precompiled code. The
-0.534403	10% of the time. The
-0.348340	considerable amount of time. The
-0.894226	at the same time. The
-0.281171	again takes extra time. The
-0.417442	etc. at compile time. The
-0.774303	known at compile time. The
-0.326213	Dispatch at load time. The
-0.196615	calculations to save time. The
-0.326213	steal the user's time. The
-0.557896	set and YMM registers The
-0.557896	set and ZMM registers The
-0.655528	version of the function. The
-0.250952	parameter to the function. The
-0.082269	return from the function. The
-0.082269	returns from the function. The
-0.237929	a class member function. The
-0.191263	of the critical function. The
-0.114198	to the critical function. The
-0.160716	of the new function. The
-0.160716	to the desired function. The
-0.207343	of an inlined function. The
-0.160716	own error message function. The
-0.160716	call a polymorphic function. The
-0.160716	some examples: strlen function. The
-0.211046	data base access, etc. The
-0.211046	the other way, etc. The
-0.211046	network resources, databases, etc. The
-0.211046	mutexes, database connections, etc. The
-0.289816	in 64 bit Linux The
-0.225727	// sign bit }; The
-0.280376	ReadB() {return b;} }; The
-0.157214	of the library functions. The
-0.157214	useful for library functions. The
-0.161910	for the two functions. The
-0.166507	with virtual member functions. The
-0.230775	all non-static member functions. The
-0.166507	any non-polymorphic member functions. The
-0.208673	instead of virtual functions. The
-0.161910	the intrinsic hardware functions. The
-0.161910	of the polymorphic functions. The
-0.161910	logarithms and trigonometric functions. The
-0.288935	Increment and decrement operators The
-0.285979	part of the memory. The
-0.316335	registers, not in memory. The
-0.213687	in the code memory. The
-0.555264	part of the program. The
-0.265469	beginning of the program. The
-0.180634	modified by the program. The
-0.180634	that crashes the program. The
-0.185259	performance of a program. The
-0.185259	development of a program. The
-0.165608	a console mode program. The
-0.333291	by the application program. The
-0.223172	graphics framework is used. The
-0.420875	static linking is used. The
-0.223172	when alloca is used. The
-0.202832	objects are not used. The
-0.528713	2.2 Choice of microprocessor The
-0.375374	area. Join identical branches The
-0.232925	up to date. Mac The
-0.082854	be in the cache. The
-0.082854	not in the cache. The
-0.079238	actively invalidate the cache. The
-0.014531	in the code cache. The
-0.470666	in the data cache. The
-0.405594	in the level-2 cache. The
-0.252679	in the level-1 cache. The
-0.109225	the same level-1 cache. The
-0.176233	cache or micro-op cache. The
-0.405377	32-bit and 64-bit systems. The
-0.480554	registers in 64-bit systems. The
-0.312220	CPUs and operating systems. The
-0.163359	available for Linux systems. The
-0.163359	used on bigger systems. The
-0.163359	applies to BSD systems. The
-0.163359	OS and Itanium systems. The
-0.700121	+ b + c; The
-0.289271	blocks is more efficient. The
-0.216473	function calls more efficient. The
-0.180957	data caching very efficient. The
-0.248960	makes it less efficient. The
-0.683109	data caching less efficient. The
-0.432866	optimizations, as explained below. The
-0.229647	function as described below. The
-0.180697	experiment are given below. The
-0.180697	in the sections below. The
-0.180697	in example 14.19 below. The
-0.216373	can prefetch the data. The
-0.134039	relative addressing of data. The
-0.215063	self-relative addressing of data. The
-0.157752	2 gigabytes of data. The
-0.146861	other odd-sized vector data. The
-0.146861	the most used data. The
-0.372614	set of test data. The
-0.146861	code and read-only data. The
-0.528219	7.16 Function return types The
-0.039500	the available instruction set. The
-0.185588	the AVX instruction set. The
-0.185588	a given instruction set. The
-0.185588	a lower instruction set. The
-0.185588	the specified instruction set. The
-0.274711	on seven different compilers. The
-0.274711	most modern C++ compilers. The
-0.204129	Gnu and Clang compilers. The
-0.348845	works on Intel processors. The
-0.238990	on most newer processors. The
-0.189034	implemented in PC processors. The
-0.189034	bytes on contemporary processors. The
-0.679575	Choice of hardware platform The
-0.754645	should be stored together The
-0.725005	the function is called. The
-0.232257	when CriticalInnerFunction is called. The
-0.163431	local objects are called. The
-0.163431	all destructors are called. The
-0.235189	functions are never called. The
-0.214350	all brands of CPUs. The
-0.214350	supported on AMD CPUs. The
-0.167005	incompatible with old CPUs. The
-0.167005	by all modern CPUs. The
-0.167005	not with earlier CPUs. The
-0.286550	order of Boolean operands The
-0.330466	version of the compiler. The
-0.285914	Cannot optimize across modules The
-0.195201	rather than pointers are: The
-0.457291	dynamic memory allocation are: The
-0.195201	of function inlining are: The
-0.323359	calculations inside the loop. The
-0.221891	to exit the loop. The
-0.163339	after the test loop. The
-0.476773	the critical innermost loop. The
-0.163339	be an infinite loop. The
-0.320648	actually making a pointer. The
-0.213189	through a hidden pointer. The
-0.230532	problem. 7.11 Type conversions The
-0.320802	Linux and Windows platforms. The
-0.157081	cases on Windows platforms. The
-0.373154	sections. 3.3 Program installation The
-0.175728	50% of the cases. The
-0.348262	lists in most cases. The
-0.175728	solution in such cases. The
-0.348262	least in simple cases. The
-0.487545	than 0 and 1. The
-0.695230	than 0 or 1. The
-0.316364	know about. Function inlining The
-0.229903	arrays of variable size. The
-0.611885	high power of 2. The
-0.172074	result will be 2. The
-0.029766	divide i by 2. The
-0.229370	avoided for these variables. The
-0.229239	cleanup of allocated resources. The
-0.288262	a structure or class. The
-0.168143	of the same class. The
-0.168143	into a container class. The
-0.168143	the // parent class. The
-0.346609	functions that it calls. The
-0.134838	binding of function calls. The
-0.134838	branches or function calls. The
-0.134838	through multiple function calls. The
-0.194262	involves pure function calls. The
-0.131210	system-specific graphical interface calls. The
-0.609416	Choosing the optimal algorithm The
-0.205838	together and tested it. The
-0.205838	than to execute it. The
-0.140793	of bigger vector registers. The
-0.140793	of special vector registers. The
-0.233087	and 256-bit YMM registers. The
-0.108685	faster in 32-bit mode. The
-0.108685	references in 32-bit mode. The
-0.184687	especially in 32-bit mode. The
-0.209873	in 64 bit mode. The
-0.230376	abc is 12 bytes. The
-0.480479	reference to the object. The
-0.183769	in a global object. The
-0.183769	or an anonymous object. The
-0.625039	version of the library. The
-0.276723	a separate function library. The
-0.074162	anywhere in the calculations. The
-0.074162	program do the calculations. The
-0.210612	all the integer calculations. The
-0.163651	CPU from overlapping calculations. The
-0.226426	called core clock cycles. The
-0.226426	- 100 clock cycles. The
-0.301065	- 20 clock cycles. The
-0.257888	than doing arithmetic operations. The
-0.205838	called Single-Instruction-Multiple-Data (SIMD) operations. The
-0.163209	optimizations on that variable. The
-0.035418	is a register variable. The
-0.035418	be a register variable. The
-0.210119	by an induction variable. The
-0.228831	debugging options prevent optimization. The
-0.205838	impacts on program performance. The
-0.205838	version for best performance. The
-0.179791	between multiple dynamic libraries. The
-0.179791	for very large libraries. The
-0.179791	Intel vector math libraries. The
-0.433138	support for the stack. The
-0.542197	instruction set if possible. The
-0.273694	as good as possible. The
-0.289262	of optimization is needed. The
-0.203264	evaluated only when needed. The
-0.179791	of structures and classes. The
-0.179791	allowed only for classes. The
-0.255283	by well-tested container classes. The
-0.228912	and stop the thread. The
-0.458727	for many different purposes. The
-0.304350	used for other purposes. The
-0.304350	variable for test purposes. The
-0.179791	good performance and precision. The
-0.179791	relaxed floating point precision. The
-0.179791	risk of losing precision. The
-0.246217	or by memory access. The
-0.179791	addresses at each access. The
-0.179791	done at every access. The
-0.369619	The loop control condition The
-0.297508	use the AVX instructions. The
-0.175137	memory and string instructions. The
-0.175137	delay the subsequent instructions. The
-0.229312	+ e + f; The
-0.548648	in the following way. The
-0.223429	a very inefficient way. The
-0.421279	in a suboptimal way. The
-0.223429	object to the vector. The
-0.175137	a full size vector. The
-0.421279	of elements per vector. The
-0.252545	be smaller as well. The
-0.201095	code version performs well. The
-0.668100	with automatic CPU dispatching. The
-0.412330	Branches and switch statements The
-0.197263	must calculate its address. The
-0.197263	a 32-bit (signed) address. The
-0.524315	of these instruction sets. The
-0.197263	the different instructions sets. The
-0.108560	a loop or not. The
-0.108560	of 2 or not. The
-0.108560	be advantageous or not. The
-0.164261	properly aligned or not. The
-0.248932	to reduce this problem. The
-0.169908	we encounter another problem. The
-0.169908	is another security problem. The
-0.228121	and operating systems. 3 The
-0.143900	changes for each version. The
-0.143900	than the 32-bit version. The
-0.143900	than the alternative version. The
-0.143900	have an up-to-date version. The
-0.844016	to the end user. The
-0.551282	when the function returns. The
-0.551282	before the function returns. The
-0.359710	32-bit and 64-bit Windows. The
-0.359710	than in 64-bit Windows. The
-0.118351	in a non-sequential order. The
-0.400047	deallocated in random order. The
-0.225932	reserved for dynamic allocation. The
-0.244557	advantage of 64-bit integers. The
-0.193992	of eight 16-bit integers. The
-0.407083	instruction set is enabled. The
-0.226670	in the future. 6 The
-0.225932	This is quite inefficient. The
-0.128111	when speed is critical. The
-0.279186	code caching is critical. The
-0.281445	use can be critical. The
-0.004308	instruction set is available. The
-0.134602	math function libraries available. The
-0.321301	and branch is executed. The
-0.163283	approximately three times faster. The
-0.163283	the address calculation faster. The
-0.163283	of code execute faster. The
-0.244172	for investigating performance problems. The
-0.193650	end user. Installation problems. The
-0.420102	and the template parameter. The
-0.281027	in case of overflow. The
-0.134602	only a single element. The
-0.134602	of the matrix element. The
-0.352193	clock cycles per element. The
-0.134602	a suitable pivot element. The
-0.280608	function for register storage. The
-0.154788	can change the value. The
-0.154788	a function return value. The
-0.154788	with the calculated value. The
-0.225277	or an input file. The
-0.315195	than in a register. The
-0.238294	into a vector register. The
-0.225277	multiple values at once The
-0.188794	available in the system. The
-0.285077	have an operating system. The
-0.265562	which is very fast. The
-0.188414	is accessed quite fast. The
-0.223405	Half size execution units. The
-0.223405	with full-size execution units. The
-0.224863	compatible with that branch. The
-0.154788	run in an array. The
-0.154788	results in another array. The
-0.154788	as a normal array. The
-0.310977	take up cache space. The
-0.299209	an array to zero. The
-0.505735	the same cache line. The
-0.188414	of a matrix line. The
-0.188414	things like adding vectors. The
-0.188414	as two 128-bit vectors. The
-0.188414	inefficient in large applications. The
-0.188414	compiler for Windows applications. The
-0.504877	64-bit Mac OS X The
-0.278377	listed in the table. The
-0.223964	and most up-to-date solution. The
-0.362839	both Windows and Linux. The
-0.182630	floating point to integer. The
-0.370351	x as an integer. The
-0.182202	different kinds of optimizations. The
-0.306897	This enables interprocedural optimizations. The
-0.385885	message in this case. The
-0.182202	for the 32-bit case. The
-0.223964	on an Intel processor. The
-0.258367	extended number of bits. The
-0.190197	double uses 64 bits. The
-0.145285	of the extra bits. The
-0.182202	inside the loop is. The
-0.182202	the compiler itself is. The
-0.144498	trivial programming work automatically. The
-0.144498	of this alignment automatically. The
-0.144498	may not vectorize automatically. The
-0.224436	all Unix-like platforms. Clang The
-0.182202	above code in details. The
-0.267473	my blog for details. The
-0.144891	an obstacle to vectorization. The
-0.179770	invoked with automatic vectorization. The
-0.179770	rely on automatic vectorization. The
-0.223492	exception handling support anyway. The
-0.249527	obvious thing to do. The
-0.182630	it can not do. The
-0.194593	code into multiple threads. The
-0.289154	work into multiple threads. The
-0.144498	and communicating between threads. The
-0.422124	name and model number. The
-0.221672	multiplying with a constant. The
-0.361086	over 32 bit systems: The
-0.275780	counter // Calculate polynomial The
-0.221672	not always avoiding this. The
-0.221672	Dispatch on first call. The
-0.222220	made the right prediction. The
-0.190032	compiler with the application. The
-0.174546	in the particular application. The
-0.131146	than an MFC application. The
-0.174546	memory model used here. The
-0.174546	a little odd here. The
-0.221672	the child class members. The
-0.295425	collection, as mentioned above. The
-0.222220	a low positive result. The
-0.174055	Development process...................................................................................................... 25 7 The
-0.174055	version control tool. 7 The
-0.585043	cases of stack unwinding The
-0.222220	interrupt 3 breakpoint again. The
-0.174055	two iterations in one. The
-0.296074	making a new one. The
-0.361086	the time stamp counter. The
-0.222220	member of a structure. The
-0.296074	a class or structure. The
-0.134901	on a Pentium 4. The
-0.134901	the old Pentium 4. The
-0.174055	many calls and branches. The
-0.174055	and other nearby branches. The
-0.219135	all to the profiler. The
-0.302516	is easier to maintain. The
-0.292421	of advanced development tools. The
-0.219135	1 for negative numbers. The
-0.219135	allowed in assembly names. The
-0.219135	read this first manual. The
-0.154665	memory in a computer. The
-0.113118	a Pentium 4 computer. The
-0.113118	cycle on another computer. The
-0.302516	preceding addition is finished. The
-0.209775	used in two ways. The
-0.162900	512 kb, 8 ways. The
-0.162900	crash the program. 16.2 The
-0.162900	counters .................................................................... 155 16.2 The
-0.219789	ipow faster than pow The
-0.535718	i++) sum += a[i]; The
-0.272908	performance has high priority. The
-0.302516	advantage of out-of-order execution. The
-0.219135	only on Intel/x86-compatible microprocessors. The
-0.281362	function library at www.agner.org/optimize/asmlib.zip. The
-0.162900	in the library www.agner.org/optimize/asmlib.zip. The
-0.302516	results for branch mispredictions. The
-0.414527	seem to do so. The
-0.219135	names and code addresses. The
-0.219135	wires that connect them. The
-0.292421	b is floating point. The
-0.157948	b member by 8. The
-0.322067	address divisible by 8. The
-0.162900	Performance for further explanation. The
-0.162900	needs a little explanation. The
-0.162900	structure or class elements. The
-0.209775	access to array elements. The
-0.128071	calculation time is doubled. The
-0.128071	clock frequency is doubled. The
-0.219135	involves multiplication or division. The
-0.092439	only one clock cycle. The
-0.092439	just one clock cycle. The
-0.231054	addition every clock cycle. The
-0.219135	32 for AVX. 5. The
-0.219135	of the pitfalls here: The
-0.215358	solution is clearly better. The
-0.215358	can be broken up. The
-0.215358	possible for usability reasons. The
-0.375167	event of an exception. The
-0.215358	also be a type. The
-0.191752	Make pointer at initialization. The
-0.146687	does the necessary initialization. The
-0.191752	significant bit of ebx. The
-0.146687	in edx, to ebx. The
-0.215358	and double is bad The
-0.146687	if a is true. The
-0.146687	is not always true. The
-0.215358	structure or class objects. The
-0.146687	of the loop index. The
-0.146687	as an array index. The
-0.010011	library (*.dll or *.so). The
-0.276842	shared objects (*.dll, *.so). The
-0.146687	functions or code lines. The
-0.372322	the same cache lines. The
-0.494889	the program is running. The
-0.408619	waits for user input. The
-0.216169	points to a dispatcher. The
-0.215358	course far from optimal. The
-0.287953	(n!) as an example. The
-0.216169	= a + 1.0f; The
-0.268636	any loss of efficiency. The
-0.215358	including the 64-bit versions. The
-0.268636	data are not cached. The
-0.146687	the dividend is unsigned. The
-0.146687	be signed or unsigned. The
-0.146687	arithmetic operations with pointers. The
-0.260187	overflow, and invalid pointers. The
-0.215358	vector of two double. The
-0.058641	column to the diagonal. The
-0.028331	28 above the diagonal. The
-0.028331	position above the diagonal. The
-0.269552	one object to another. The
-0.408619	about no pointer aliasing. The
-0.190970	matter of programming style. The
-0.215358	the two clock counts. The
-0.215358	or thread are smaller. The
-0.087406	for x86 platforms. 3. The
-0.087406	instruction for interrupt 3. The
-0.087406	an anonymous namespace. 3. The
-0.087406	from any other module. The
-0.080090	system-independent, in another module. The
-0.212651	called from another module. The
-0.035253	extra code. Dynamic cast The
-0.035253	can not. Static cast The
-0.035253	to int. Reinterpret cast The
-0.035253	VIA CPUs"). Const cast The
-0.215358	set the parentheses manually. The
-0.527926	the program is run. The
-0.146687	the long double format. The
-0.146687	usual object file format. The
-0.215358	obscured in optimized programs. The
-0.087406	of how compilers work. The
-0.087406	concentrating on important work. The
-0.087406	compilers and microprocessors work. The
-0.209136	be platform-independent and compact. The
-0.106712	for the following reasons: The
-0.120942	in case of error. The
-0.120942	a common programming error. The
-0.209136	any non-vector library. 119 The
-0.209136	intermediate code (byte code). The
-0.483120	to recover from errors. The
-0.209136	calculated using multiplications only. The
-0.343173	storage. Live range analysis The
-0.209136	rather than processor features. The
-0.101779	reserving memory in advance. The
-0.101779	be given in advance. The
-0.209136	in set number 28. The
-0.343173	long time it takes. The
-0.120942	errors without using exceptions. The
-0.120942	to catching hardware exceptions. The
-0.056275	programming language is implemented. The
-0.056275	example 15.1b is implemented. The
-0.209136	is only called once. The
-0.209136	drivers for 64-bit Windows). The
-0.209136	want to compile for. The
-0.209136	at the second step. The
-0.261608	dispatch by CPU brand. The
-0.280608	need for garbage collection. The
-0.209136	{ DoThisThreeTimesAWeek(); } 135 The
-0.210201	for a particular purpose. The
-0.120942	their values before compilation. The
-0.163283	implementations use just-in-time compilation. The
-0.343173	elements are accessed sequentially. The
-0.209136	precision as the operands. The
-0.120942	or more iterations back. The
-0.120942	read and written back. The
-0.209136	miss can be expected. The
-0.633752	compilers and operating systems". The
-0.280608	manual 5: calling conventions. The
-0.209136	how variables are stored. The
-0.485125	it has been deallocated. The
-0.209136	with different matrix sizes. The
-0.027226	work in example 9.6b. The
-0.027226	shown in example 9.6b. The
-0.280608	and always the same. The
-0.261608	contentions do not occur. The
-0.366205	therefore more error prone. The
-0.163283	file from the linker. The
-0.120942	load the dynamic linker. The
-0.209136	additions and no multiplications. The
-0.261608	we may need metaprogramming. The
-0.120942	/Fa for assembly output. The
-0.120942	that produce Boolean output. The
-0.209136	is an arithmetic expression. The
-0.120942	Any other allocated resource. The
-0.120942	is a limited resource. The
-0.483120	between rounding and truncation. The
-0.209136	the function is big. The
-0.209136	vector of four float. The
-0.242353	* x + 1.0f;} The
-0.101779	return square(x) + 1.0f;} The
-0.209136	suitable choice of n. The
-0.209136	effectively preventing illegitimate copying. The
-0.209136	the result in x. The
-0.209136	use pre-increment or post-increment. The
-0.209136	implemented as recursive templates. The
-0.326667	vectorization (see page 107). The
-0.247883	found elsewhere. 13.5 Implementation The
-0.073623	for exploiting fine-grained parallelism. The
-0.073623	code contains natural parallelism. The
-0.326667	in the clock frequency. The
-0.247883	2. (See page 71). The
-0.196952	ms for background jobs. The
-0.196952	depending on the context. The
-0.073623	with the ^ operator. The
-0.073623	with the sizeof operator. The
-0.247883	compilers. Use automatic parallelization. The
-0.073623	less reliable. Event-based sampling: The
-0.073623	code line. Time-based sampling: The
-0.326667	has only one instance. The
-0.196952	of. Big runtime frameworks. The
-0.073623	example 13.1 page 127. The
-0.073623	from -128 generates 127. The
-0.196952	in a single instruction. The
-0.196952	simple in most cases: The
-0.460499	or moving the mouse. The
-0.247883	libraries is more difficult. The
-0.196952	all the B values. The
-0.460499	// Writes "Hello 2" The
-0.073623	rather than the heap. The
-0.073623	managing a memory heap. The
-0.196952	Dynamic linking works differently. The
-0.460499	subroutines in assembly language". The
-0.247883	compact. See page 52. The
-0.196952	different microprocessors. 7.13 Loops The
-0.196952	the PLT and GOT. The
-0.196952	length of a string. The
-0.196952	of register is volatile. The
-0.196952	Windows DLLs use relocation. The
-0.196952	precision is not supported. The
-0.326667	Index out of range. The
-0.460499	the time of programming. The
-0.326667	unrolling also has disadvantages: The
-0.196952	satisfies the user's needs. The
-0.196952	not affected by __fastcall. The
-0.196952	instruction set is specified. The
-0.073623	no risk of underflow. The
-0.073623	on overflow and underflow. The
-0.196952	all 0's when false. The
-0.196952	for Windows. 10 Multithreading The
-0.196952	inappropriate CPU dispatch methods. The
-0.196952	different targets is small. The
-0.326667	as a make utility. The
-0.326667	times cannot be controlled. The
-0.073623	security reason for updating. The
-0.073623	necessary support. Hardware updating. The
-0.073623	object is allocated separately. The
-0.073623	should be measured separately. The
-0.247883	storage. See page 26. The
-0.460499	Class data members (properties) The
-0.196952	of CPU cores. 60 The
-0.196952	this number of iterations. The
-0.247883	by 16 is required. The
-0.196952	is not in use. The
-0.196952	in the class declaration. The
-0.247883	table may go undetected. The
-0.196952	higher) is enabled. Volatile The
-0.196952	educational purposes is allowed. The
-0.196952	summarized in table 8.1. The
-0.196952	for (b + c) The
-0.196952	the CPU detection mechanism. The
-0.035253	will not be negative. The
-0.035253	will never be negative. The
-0.196952	limit can be defined. The
-0.196952	on the work load. The
-0.196952	the high level framework. The
-0.460499	the same memory area. The
-0.196952	resources than standard PCs. The
-0.247883	in my test examples. The
-0.247883	address can be predicted. The
-0.196952	with the inverted mask. The
-0.073623	on the same machine. The
-0.073623	so-called Java virtual machine. The
-0.460499	can be allocated dynamically. The
-0.196952	0 rather than 1.23456. The
-0.460499	procedure linkage table (PLT). The
-0.196952	each row or column. The
-0.247883	by 16 (see below). The
-0.196952	come from unknown sources. The
-0.247883	2 (See page 137). The
-0.196952	branch inside the template. The
-0.196952	the CPU was started. The
-0.073623	* 8 = 80. The
-0.073623	optimizations. See page 80. The
-0.196952	of element number i. The
-0.073623	then N&(N-1) is 0. The
-0.073623	if c < 0. The
-0.196952	code versions work correctly. The
-0.073623	Introduction ....................................................................................................................... 3 1.1 The
-0.073623	new relevant information. 1.1 The
-0.196952	mispredictions (see page 43). The
-0.247883	BSD, Windows and Mac. The
-0.326667	decimals of the fraction. The
-0.073623	with the other compilers). The
-0.073623	very old DOS compilers). The
-0.073623	Interference from other processes. The
-0.073623	shared between multiple processes. The
-0.326667	without the sign bit. The
-0.196952	static and dynamic linking. The
-0.196952	and reusable classes. Security The
-0.196952	as integers. 7.5 Booleans The
-0.196952	modules are linked together. The
-0.196952	block then become invalid. The
-0.035253	explained on page 122. The
-0.035253	registers; see page 122. The
-0.017269	time the program starts. The
-0.017269	before the program starts. The
-0.073623	Now ebx contains i/2+r. The
-0.073623	register for computing i/2+r. The
-0.247883	the example below shows. The
-0.008549	the file is closed. The
-0.196952	parameter transfer is avoided. The
-0.073623	of 64 bits each. The
-0.073623	of 8 bytes each. The
-0.196952	version 2.6.30 and later. The
-0.326667	double (see page 140). The
-0.196952	as vector operations. 105 The
-0.247883	= (a+1) / 4; The
-0.196952	than end users have. The
-0.326667	64-bit Linux and BSD. The
-0.162326	class (see page 51). The
-0.162326	is a loop count. The
-0.162326	can be calculated independently. The
-0.162326	under CPU cache (en.wikipedia.org/wiki/L2_cache). The
-0.162326	anything else being initialized. The
-0.162326	i++) a[i] = i+1; The
-0.162326	a thread is terminated. The
-0.162326	priority than code generality. The
-0.162326	double a = sin(0.8); The
-0.162326	bloat and complexity (en.wikipedia.org/wiki/Standard_Template_Library). The
-0.162326	next year. Ignoring virtualization. The
-0.162326	files to be installed. The
-0.162326	are implemented with interpretation. The
-0.162326	this only happens rarely. The
-0.162326	RAM size is insufficient. The
-0.162326	in each CPU core). The
-0.162326	it used to be. The
-0.162326	in two other situations: The
-0.162326	(See also page 119). The
-0.162326	in the next paragraph. The
-0.162326	interrupt, e.g. every millisecond. The
-0.162326	have constructors and destructors. The
-0.162326	page 137 about division). The
-0.162326	registers (see page 27). The
-0.162326	for char pointers. 144 The
-0.162326	element a[i] is ecx+eax*4. The
-0.162326	handle current CPUs optimally. The
-0.162326	Func1 when compiling module2.cpp. The
-0.162326	register variable in eax. The
-0.162326	+ a2*b1) / (b1*b2); The
-0.162326	when testing worst-case performance: The
-0.162326	by the value 1000. The
-0.162326	everything happens at runtime). The
-0.162326	times rather than 20. The
-0.162326	PC platforms. Graphics accelerators The
-0.162326	efficient than mov eax,0. The
-0.162326	is checked before storing. The
-0.162326	syntax in example 8.15b. The
-0.162326	means not a vector). The
-0.162326	different profiling methods: Instrumentation: The
-0.162326	is stored in y. The
-0.162326	precision requires only SSE). The
-0.162326	to such a formalism. The
-0.162326	of a suitable duration. The
-0.162326	has advantages and disadvantages. The
-0.162326	is admittedly very kludgy. The
-0.162326	SetThreadAffinityMask, in Linux, sched_setaffinity). The
-0.162326	you will see shortly. The
-0.162326	help files and databases. The
-0.162326	of when type-casting pointers: The
-0.162326	for Nerds at Wikibooks. The
-0.162326	do not overlap. 27 The
-0.162326	write instructions becomes noticeable. The
-0.162326	is slow, you know). The
-0.162326	more complicated and error-prone. The
-0.162326	is efficient, but risky. The
-0.162326	to reinvent the wheel. The
-0.162326	one of the weekdays. The
-0.162326	CriticalFunction in example 16.2. The
-0.162326	metaprogramming would be straightforward. The
-0.162326	can be optimized further. The
-0.162326	on with a password. The
-0.162326	throughput (see p. 104). The
-0.162326	the code becomes contiguous. The
-0.162326	get library versions instead. The
-0.162326	as the binary digits. The
-0.162326	explained on page 44. The
-0.162326	limited by physical factors. The
-0.162326	> and >= operators). The
-0.162326	the diagonal remain unchanged. The
-0.162326	an empty throw() specification. The
-0.162326	of e.g. four floats. The
-0.162326	is called register renaming. The
-0.162326	will be used most. The
-0.162326	behaviour is implementation dependent. The
-0.162326	example 14.23 page 143. The
-0.162326	operators (&& and ||). The
-0.162326	frequency (in Windows: __rdtsc()). The
-0.162326	* (1. / 1.2345); The
-0.162326	the code is repetitive. The
-0.162326	than a certain tolerance. The
-0.162326	better than its reputation. The
-0.162326	no pointer aliasing (/Oa). The
-0.162326	time it takes. Debugging. The
-0.162326	are CPLDs and FPGAs. The
-0.162326	without the register keyword. The
-0.162326	first. b+c = 100000001.23456. The
-0.162326	instruction set, e.g. /arch:SSE2. The
-0.162326	compilers has several flaws: The
-0.162326	low-power CPUs (Intel Atom). The
-0.162326	improve the performance somewhat. The
-0.162326	thread-local storage p. 28) The
-0.162326	calling. __fastcall or __attribute__((fastcall)). The
-0.162326	below on page 134. The
-0.162326	{return r.a + r.b;} The
-0.162326	is not necessarily newer. The
-0.162326	than the variable m. The
-0.162326	find the best algorithm. The
-0.162326	fraction 2 63 . The
-0.162326	are not well documented. The
-0.162326	allocated memory, using new. The
-0.162326	bytes in the end. The
-0.162326	in 2015 or 2016. The
-0.162326	intended (see page 84). The
-0.162326	has the following features: The
-0.162326	it with 2n -1. The
-0.162326	integer value of temp. The
-0.162326	look clumsy and tedious. The
-0.162326	in microprocessor hardware design. The
-0.162326	hard disk copying. Security. The
-0.162326	the most efficient alternative. The
-0.162326	in a Gauss elimination. The
-0.162326	Intel Math Kernel Library. The
-0.162326	back to normal afterwards. The
-0.162326	iteration to the next. The
-0.162326	as AMD and VIA. The
-0.162326	many common purposes (www.boost.org). The
-0.162326	MKL, VML and SVML. The
-0.162326	int a; Plus2 (&a); The
-0.162326	STL is not satisfactory. The
-0.162326	comparisons, such as <. The
-0.162326	which one is fastest. The
-0.162326	manuals by Agner Fog The
-0.162326	and one that doesn’t. The
-0.162326	code that is distributed. The
-0.162326	have family number 6! The
-0.162326	in 64-bit systems. 67 The
-0.162326	page 130 for details). The
-0.162326	to do the conversion. The
-0.162326	paying the performance costs. The
-0.162326	a=a*2; to return a+1;. The
-0.162326	certain conditions are satisfied. The
-0.162326	and cause large delays. The
-0.162326	my vector class library). The
-0.162326	can be changed freely. The
-0.162326	initialization, condition, and increment. The
-0.162326	when optimizing multithreaded applications: The
-0.162326	expressions (see page 72). The
-0.162326	their superior performance/price ratio. The
-0.162326	is called name mangling. The
-0.162326	preceding value of sum. The
-0.162326	+ 2.0 / 3.0; The
-0.162326	i; for(i=0; i<100; i++)a[i]=2*i; The
-0.162326	for the "FDIV bug". The
-0.162326	index by 8. 71 The
-0.162326	more clear and modular. The
-0.162326	This has three advantages: The
-0.162326	into two 128-bit reads. The
-0.162326	? 1.5f : 2.6f; The
-0.162326	functions and hot spots. The
-0.162326	of a Taylor series. The
-0.162326	operators &, |, ~. The
-0.162326	of the preceding row. The
-0.162326	examples I have tested. The
-0.162326	compiler not to vectorize. The
-0.162326	http://www.agner.org/optimize/ - vectorclass www.agner.org/optimize/#vectorclass. The
-0.162326	and virtual function tables. The
-0.162326	have become more powerful. The
-0.162326	predictable than integer comparisons. The
-0.162326	double precision (80 bits). The
-0.162326	x) { return _mm_cvtsd_si32(_mm_load_sd(&x));} The
-0.162326	compilers I have tried. The
-0.162326	is now as follows. The
-0.162326	as accessing it directly. The
-0.162326	frame" or "frame pointer". The
-0.162326	pointer or reference parameters). The
-0.162326	or ten years old. The
-0.162326	mostly compatible with these. The
-0.162326	time has been wasted. The
-0.162326	if-branch in example 7.30b. The
-0.162326	profitable (see page 70). The
-0.236905	used at all is for
-0.478845	Introduction This manual is for
-0.236905	languages. My preference is for
-0.350355	and well-structured code and for
-0.235313	to an array and for
-0.340761	the link pointers and for
-0.235313	for execution speed and for
-0.235313	many optimization features and for
-0.235313	parameters, local variables, and for
-0.237195	recommended to use that for
-0.313378	The subsequent manuals are for
-0.235082	to reflect this or for
-0.235082	whole program optimization or for
-0.235082	use for recovering or for
-0.102378	reason to use it for
-0.102378	you can use it for
-0.456905	before calling the function for
-0.496166	versions of a function for
-0.475400	variables in a function for
-0.337566	may have a function for
-0.374854	of the template function for
-0.374854	of the inlined function for
-0.232176	find the right function for
-0.374854	of the strlen function for
-0.346438	sizeof operator. The code for
-0.419898	in the program code for
-0.227582	32-bit and 64-bit code for
-0.416710	produce any extra code for
-0.052327	C++ and assembly code for
-0.338001	code and intermediate code for
-0.282480	produce the optimal code for
-0.227582	make exactly identical code for
-0.022726	12.6 Transforming serial code for
-0.227582	Studio can build code for
-0.510318	is the same as for
-0.236426	for Intel CPUs, not for
-0.234482	for single precision than for
-0.234482	level-2 cache contentions than for
-0.234482	higher for shared_ptr than for
-0.314925	for Basic. A compiler for
-0.441050	and the Intel compiler for
-0.441050	on the Intel compiler for
-0.229661	platforms. PathScale C++ compiler for
-0.229661	tolerated. PGI C++ compiler for
-0.574712	and the Gnu compiler for
-0.407547	with the Gnu compiler for
-0.308664	Intel or Microsoft compiler for
-0.224203	Another open source compiler for
-0.224203	manual for your compiler for
-0.224203	Intel or PathScale compiler for
-0.224203	VectorC A commercial compiler for
-0.224203	is a cheap compiler for
-0.331249	the availability of x for
-1.364674	short int cc[]) { for
-0.068473	< SIZE; r++) { for
-0.369744	r1 += TILESIZE) { for
-0.022797	< r1+TILESIZE; r2++) { for
-0.228491	* __restrict bb) { for
-0.236386	I am using this for
-0.427054	and waste of time for
-0.375781	application. The development time for
-0.232843	is to save time for
-0.232843	one that saves time for
-0.339178	code branch to use for
-0.339178	cache lines to use for
-0.339178	are cumbersome to use for
-0.348099	calling function can use for
-0.414870	The allocation of memory for
-0.320157	big block of memory for
-0.286908	has too much data for
-0.231482	generates too little data for
-0.307076	to automatically prefetch data for
-0.231482	used for prefetching data for
-0.352788	slow down a program for
-0.232494	specific size is different for
-0.288059	section will be different for
-0.308281	branch prediction are different for
-0.501695	principles are the same for
-0.214554	class library have functions for
-0.214554	well as efficient functions for
-0.053075	library contains many functions for
-0.053075	Library" contains many functions for
-0.113551	CPUs. Includes many functions for
-0.214554	lack the necessary functions for
-0.322469	implemented with intrinsic functions for
-0.214554	www.agner.org/optimize/asmlib.zip contains various functions for
-0.267727	efficient than frame functions for
-0.101992	117 12.7 Mathematical functions for
-0.101992	118 12.7 Mathematical functions for
-0.303919	Table 12.3. Intrinsic functions for
-0.214554	GetTickCount or QueryPerformanceCounter functions for
-0.213405	This is used only for
-0.213405	therefore be used only for
-0.151485	pointers are used only for
-0.219707	use this method only for
-0.168491	overflow and works only for
-0.168491	This code works only for
-0.168491	Intel compiler works only for
-0.168491	this method works only for
-0.273556	do the dispatching only for
-0.219707	specialization is allowed only for
-0.235089	set has no instruction for
-0.235089	an inline assembly instruction for
-0.639338	to use a loop for
-0.327948	return x^10 // loop for
-0.232596	{ // Main loop for
-0.234472	finding hot spots, but for
-0.234472	in scientific computing, but for
-0.324089	one that is used for
-0.173403	fashion. It is used for
-0.173403	virtual table is used for
-0.173403	one thread is used for
-0.173403	memory space is used for
-0.050479	dynamic_cast operator is used for
-0.050479	const_cast operator is used for
-0.050479	reinterpret_cast operator is used for
-0.238782	lookup process is used for
-0.173403	stack frame is used for
-0.173403	macro INSTRSET is used for
-0.173403	function longjmp is used for
-0.159661	more popular and used for
-0.211154	instruction can be used for
-0.211154	which can be used for
-0.211154	class can be used for
-0.211154	register can be used for
-0.348439	method can be used for
-0.211154	union can be used for
-0.211154	guidelines can be used for
-0.203785	that may be used for
-0.203785	methods may be used for
-0.203785	Templates may be used for
-0.172971	can also be used for
-0.137492	might also be used for
-0.237344	optimization cannot be used for
-0.355795	Virtual functions are used for
-0.272042	directives which are used for
-0.272042	Threads Threads are used for
-0.159661	return ipow(x,10); // used for
-0.017030	tables are not used for
-0.206168	input. The time used for
-0.206168	const definitions when used for
-0.159661	inside the CPU used for
-0.324568	memory is also used for
-0.349852	variables are often used for
-0.159661	of cache space used for
-0.159661	branches. The algorithms used for
-0.159661	The method currently used for
-0.258088	the program, and one for
-0.258088	for SSE4.1 and one for
-0.223479	by the program, one for
-0.223479	SSE2 instruction set, one for
-0.223479	compiled three times, one for
-0.223479	into three parts: one for
-0.223479	just two branches: one for
-0.342830	three levels of cache for
-0.234034	have an extra cache for
-0.511927	in the instruction set for
-0.450339	minimum supported instruction set for
-0.906402	a structure or class for
-0.191403	Microsoft and Intel compilers for
-0.191403	PathScale and Intel compilers for
-0.236331	that are used most for
-0.736915	by the vector size for
-0.535908	copy a to b for
-0.345993	Gnu C function library for
-0.436570	popular user interface library for
-0.233019	// makes intermediate object for
-0.233019	of a temporary object for
-0.232692	preference is for C++ for
-0.288284	languages such as C++ for
-0.288753	variable. This is efficient for
-0.444197	systems are most efficient for
-0.232598	reuse the same array for
-0.435829	than a linear array for
-0.694873	that make it possible for
-0.310943	as standardized as possible for
-0.226078	It is therefore possible for
-0.226078	It is rarely possible for
-0.276998	and a 64-bit version for
-0.222747	A 32- bit version for
-0.306896	to each new version for
-0.222747	instruction set, another version for
-0.222747	implemented a separate version for
-0.236128	creation of temporary objects for
-0.508807	optimizations of a variable for
-0.337689	the same induction variable for
-0.221805	to use induction variables for
-0.221805	not make induction variables for
-0.295584	floating point induction variables for
-0.150245	array elements Induction variables for
-0.150245	integer expressions Induction variables for
-0.150245	code motion Induction variables for
-0.311928	use a hash table for
-0.274529	get a good performance for
-0.220567	by selecting optimize performance for
-0.220567	give almost identical performance for
-0.220567	well. Very poor performance for
-0.220567	binding definitely degrades performance for
-0.345245	of optimizing the software for
-0.169173	correctly. A code branch for
-0.169173	run any code branch for
-0.549520	a function is called for
-0.237999	float sum = 0; for
-0.237999	i, sum = 0; for
-0.336397	i, largest_index = 0; for
-0.354907	simply identical. For example, for
-0.323015	// Convert to unsigned for
-0.318981	one 256-bit vector register for
-0.204659	use the same register for
-0.221639	as a temporary register for
-0.313808	are various function libraries for
-0.313808	you. Optimized function libraries for
-0.269426	compilers include standard libraries for
-0.216058	discussed below. Many libraries for
-0.216058	collection contains well-tested libraries for
-0.235470	}; // Function template for
-0.230126	benefit from using registers for
-0.524181	use the XMM registers for
-0.317683	(This eliminates the need for
-0.225342	of data. The need for
-1.278389	There is no need for
-0.448349	for how to test for
-0.175084	performance. It is useful for
-0.118065	mode program is useful for
-0.118065	output, which is useful for
-0.113957	This method is useful for
-0.118065	empty throw()specification is useful for
-0.687481	This can be useful for
-0.293337	library can be useful for
-0.293337	binding can be useful for
-0.293337	tables can be useful for
-0.457745	Bitfields may be useful for
-0.037876	available which are useful for
-0.037876	function libraries are useful for
-0.037876	#if directives are useful for
-0.037876	These profilers are useful for
-0.037876	Linux). Threads are useful for
-0.037876	type. References are useful for
-0.037876	^, ~ are useful for
-0.132981	profiler is most useful for
-0.260634	operator is also useful for
-0.152047	tested, and very useful for
-0.313483	can be very useful for
-0.219199	in most cases, even for
-0.219199	overflow never occurs, even for
-0.219199	be a time-consumer even for
-0.219199	long response times, even for
-0.334242	and choose this method for
-0.229929	returns. The preferred method for
-0.344297	applications, but not always for
-0.325479	big structures by 16 for
-0.229283	files. See page 16 for
-0.991550	and the operating system for
-0.229693	used. See page 32 for
-0.229693	for SSE2, preferably 32 for
-0.309748	and closes the file for
-0.233726	including a header file for
-0.437632	the appropriate header file for
-0.071815	call // Header file for
-0.071815	later // Header file for
-0.307832	any of the bits for
-0.229216	integer has enough bits for
-0.214420	typically use integer operations for
-0.044510	14.9 Using integer operations for
-0.284449	bit which is 0 for
-0.229316	with the value 0 for
-0.235332	so many different cases for
-0.199310	this kind of instructions for
-0.199310	There are no instructions for
-0.088320	lot of extra instructions for
-0.088320	a few extra instructions for
-0.199310	There are intrinsic instructions for
-0.250536	compiler may reorder instructions for
-0.199310	support the ADX instructions for
-0.046261	C++ compiler is available for
-0.224881	"express" edition is available for
-0.206990	Vector operations are available for
-0.206990	trial versions are available for
-0.206990	class templates are available for
-0.206990	interface frameworks are available for
-0.182082	an extra register available for
-0.231199	six integer registers available for
-0.182082	LIBM library. Only available for
-0.622310	of the residual error for
-0.155480	because the response times for
-0.617528	unacceptably long response times for
-0.155480	of longer response times for
-0.560682	space on the stack for
-0.566357	do. It is important for
-0.456808	algorithm is very important for
-0.234997	cycles than other CPUs for
-0.234492	that are too large for
-0.328472	is impossible to work for
-0.319315	this method doesn't work for
-0.253178	make multiple code versions for
-0.326303	available in different versions for
-0.047581	code in multiple versions for
-0.201658	functions have several versions for
-0.234737	a dedicated physics processor for
-0.286607	Code that is compiled for
-0.180796	the code is compiled for
-0.187683	C++ file and compiled for
-0.364372	on mixing code compiled for
-0.187683	if necessary, each compiled for
-0.083781	systems and programs compiled for
-0.083781	precision in programs compiled for
-0.290215	container is too big for
-0.310723	universal solution is best for
-0.344330	This unit-testing is necessary for
-0.720268	handling is not necessary for
-0.322466	clock cycles per element for
-0.342370	use of assembly language for
-0.328143	CPUs optimally. The speed for
-0.385873	p) { int i; for
-0.053515	= 0; int i; for
-0.270617	= 1.0; int i; for
-0.270617	S1 list[100]; int i; for
-0.270617	Example 7.30a int i; for
-0.486571	bloat. It is common for
-0.226475	set (/arch:SSE2, /arch:AVX etc. for
-0.226475	Windows, -msse2, -mavx, etc. for
-0.426436	it possible to compile for
-0.234985	c[arraysize]; // Enable exception for
-0.334319	memory will be allocated for
-0.227760	Turn on the option for
-0.302650	option. Use the option for
-0.060741	compilers have an option for
-0.019807	compiler has an option for
-0.195292	use a compiler option for
-0.195292	the same compiler option for
-0.169001	"generate map file" option for
-0.040927	if it is good for
-0.044059	high-level languages are good for
-0.044059	Low-level languages are good for
-0.510937	element in a matrix for
-0.289990	and loss of precision for
-0.328437	of code that works for
-0.246407	data cache is optimized for
-0.228816	these examples are optimized for
-0.179954	microprocessors are not optimized for
-0.179954	processor that you optimized for
-0.179954	same function, each optimized for
-0.076948	functions are highly optimized for
-0.076948	libraries are highly optimized for
-0.228816	C++ builder. Not optimized for
-0.042419	mode. See the manual for
-0.042419	handling. See the manual for
-0.333747	and the compiler manual for
-0.303221	basis for this manual for
-0.202193	See the vectorclass manual for
-0.234654	int i, a[100], b; for
-0.162747	can bypass the check for
-0.148656	F1 has to check for
-0.210144	a way to check for
-0.148656	for how to check for
-0.148656	function calls to check for
-0.226419	14.26 does not check for
-0.120456	function must then check for
-0.006625	There is no check for
-0.006625	functions have no check for
-0.120456	that doesn't automatically check for
-0.120456	is no automatic check for
-0.120456	example. We might check for
-0.120456	data. A missing check for
-0.120456	this problem: (1) check for
-0.234865	multiple cores are advantageous for
-0.762164	the most efficient solution for
-0.271855	when choosing a container for
-0.271855	temporarily lock a container for
-0.216954	not use one container for
-0.060560	compilers that have support for
-0.060560	Some compilers have support for
-0.060560	instruction set has support for
-0.060560	operating system has support for
-0.130955	takes to make support for
-0.130955	compiler has some support for
-0.006327	CPU has hardware support for
-0.012747	microprocessor has hardware support for
-0.130955	systems need better support for
-0.014367	to turn off support for
-0.130955	not have inherent support for
-0.130955	It has excellent support for
-0.269111	and using overloaded operators for
-0.054404	14.3 Use bitwise operators for
-0.533036	i < rows; i++) for
-0.289504	OpenMP is a standard for
-0.289863	on the microprocessor hardware for
-0.094141	for positive and 1 for
-0.094141	for false and 1 for
-0.214504	AND'ed b with 1 for
-0.224024	for size and optimizing for
-0.224024	the choice between optimizing for
-0.224663	to save some information for
-0.224663	to save recovery information for
-0.350623	- 80 clock cycles for
-0.233992	a[size], b[size]; // ... for
-0.263593	ab[size]; int i; ... for
-0.173916	a[size], b[size], i; ... for
-0.184576	order(int x); 136 ... for
-0.184576	int i, j; ... for
-0.184576	1000; int List[ArraySize]; ... for
-0.223896	use full 64-bit addresses for
-0.223896	only calculate element addresses for
-0.297672	in different source files for
-0.223567	Table 12.2. Header files for
-0.329748	resources are not recommended for
-1.053625	use dynamic memory allocation for
-0.632808	you want to optimize for
-0.233943	the disadvantages mentioned above for
-0.289337	are no caching problems for
-0.331802	type that is optimal for
-0.233438	use the same space for
-0.476910	useful in some cases, for
-0.233177	to test all branches for
-0.330618	i, f = 1; for
-0.673620	* b + 1; for
-0.321683	degradation in code caching for
-0.233577	select the best implementation for
-0.058122	can disable exception handling for
-0.196940	two commonly used methods for
-0.196940	are more useful methods for
-0.196940	There are various methods for
-0.196940	works and suggests methods for
-0.232997	modified should be separate for
-0.333235	allocates one memory block for
-0.220741	allocate a small block for
-0.208534	function a different name for
-0.801756	has the same name for
-0.208534	use the local name for
-0.102919	{ int r, c; for
-0.321855	is often a disadvantage for
-0.232896	can be annoyingly high for
-0.622288	set a to zero for
-0.233317	order to reserve resources for
-0.240667	intermediate code. The reason for
-0.240667	the end. The reason for
-0.240667	it directly. The reason for
-0.194754	a compelling security reason for
-0.340331	the virtual table lookup for
-0.288765	contains complete code examples for
-0.427930	#define makes no difference for
-0.220340	test // Time difference for
-0.192446	intermediate code is needed for
-0.192446	Extra time is needed for
-0.230851	variables may be needed for
-0.477385	A is not needed for
-0.181772	the extra work needed for
-0.232208	chooses between two expressions for
-0.196377	that it is difficult for
-0.233776	memory. It is difficult for
-0.233776	disadvantages: It is difficult for
-0.243669	code is more difficult for
-0.073083	of the OpenMP directives for
-0.073083	Supports the OpenMP directives for
-0.399325	a large runtime framework for
-0.337683	to specify static linking for
-0.503651	double Table[100]; int x; for
-0.152523	int i; float x; for
-0.152523	i, j; float x; for
-0.681590	choice of hardware platform for
-0.217023	these functions is higher for
-0.217023	These costs are higher for
-0.528378	the compiler cannot know for
-0.232365	to get reliable results for
-0.269157	we specify the options for
-0.215820	study the available options for
-0.325302	compilers have a feature for
-0.160162	have such a feature for
-0.471274	dispatching can be made for
-0.215820	own function library made for
-0.232195	what is most appropriate for
-0.231775	the function a constructor for
-0.310296	that it is relevant for
-0.306641	end of this section for
-0.434409	log off the computer for
-0.003746	is a good choice for
-0.034937	a very good choice for
-0.160633	be the optimal choice for
-0.286367	are used in STL for
-0.122895	This function is intended for
-0.122895	temporarily. This is intended for
-0.180596	systems. It is intended for
-0.122895	Exception handling is intended for
-0.122895	This feature is intended for
-0.154746	libraries that are intended for
-0.113192	It is not intended for
-0.113192	physics processing unit intended for
-0.231950	parameter transfer is avoided for
-0.336738	loading any cache lines for
-0.080539	variables have one instance for
-0.080539	and make one instance for
-0.080539	will get one instance for
-0.018778	section needs one instance for
-0.092649	There is no checking for
-0.092649	mentations have no checking for
-0.305548	constructor may be inlined for
-0.285246	applications use a database for
-0.598757	to call the destructor for
-0.096147	it opens the possibility for
-0.096147	set opens the possibility for
-0.219817	can open the possibility for
-0.371288	arrays // Define macro for
-0.229991	variables or hide them for
-0.229750	to have separate containers for
-0.180788	CPU dispatching are: Optimizing for
-0.180788	caching is critical. Optimizing for
-0.180788	optimizing for speed. Optimizing for
-0.523526	// loop through rows for
-0.039722	vector registers when compiling for
-0.039722	Visual Studio when compiling for
-0.039722	less strict when compiling for
-0.039722	option -fno-pic when compiling for
-0.039722	class Vec16s when compiling for
-0.039722	to Eclipse when compiling for
-0.332082	algorithms and data structures for
-0.229010	is a precious resource for
-0.030277	r, c; double temp; for
-0.176347	c[size]; float register temp; for
-0.037908	This makes it easier for
-0.037908	declaration makes it easier for
-0.176751	make this reordering easier for
-0.284101	code is exactly identical for
-0.463728	redesign of the program, for
-0.152232	the same time, except for
-0.152232	the same object, except for
-0.152232	make 16-bit programs, except for
-0.152232	on the stack, except for
-0.197964	container classes and templates for
-0.197964	page 150. Using templates for
-0.229124	#include "xmmintrin.h" // header for
-0.144314	there is a penalty for
-0.144314	There is no penalty for
-0.144091	is no performance penalty for
-0.144091	no 51 performance penalty for
-0.228623	called once. The reasons for
-0.228123	test a software module for
-0.322174	type-casting. It is used, for
-0.198664	There are no checks for
-0.198664	to make explicit checks for
-0.472141	ways. The critical stride for
-0.228623	manuals. See page 3 for
-0.227873	in meaningless event counts for
-0.209157	that is big enough for
-0.129132	size is big enough for
-0.228123	function libraries have features for
-0.228151	temp; temp = 3; for
-0.262812	when C++ is chosen for
-0.244559	the compiler has chosen for
-0.647558	sure that all destructors for
-0.845857	overhead of parameter transfer for
-0.163763	user's needs. The search for
-0.163763	intervals. Some programs search for
-0.163763	then use binary search for
-0.194508	97 Table 9.1. Time for
-0.194508	168.3 Table 9.3. Time for
-0.413567	calls can be mispredicted for
-0.211698	developed a test tool for
-0.211698	for my test tool for
-0.306011	Bounds checking is included for
-0.226626	Loop r2 and c2 for
-0.281043	the graphics processing unit for
-0.076625	any specific calling conventions for
-0.000318	manual 5: "Calling conventions for
-0.041644	CPUs. 5. Calling conventions for
-0.226938	extra precautions to account for
-0.280690	discussion of different algorithms for
-0.189653	position-independent, makes a PLT for
-0.441871	use GOT and PLT for
-0.189082	is executed only once for
-0.189082	the function. Compile once for
-0.070861	(NetBurst) CPU is designed for
-0.070861	the STL is designed for
-0.155582	feature was never designed for
-0.226004	mode program. The inputs for
-0.226626	resources are limiting factors for
-0.070861	little math is required for
-0.070861	data manipulation is required for
-0.155582	after debugging if required for
-0.281396	functions and a GOT for
-0.225029	applications to perform poorly for
-0.224320	instructions are not suitable for
-0.225383	template metaprogramming // Template for
-0.225383	8.14a int i, a[100]; for
-0.364470	especially in 32-bit mode, for
-0.012337	processors. See page 130 for
-0.012337	CPU. See page 130 for
-0.025035	processors (see page 130 for
-0.025035	CPUs. (See page 130 for
-0.182847	page 95 and 120 for
-0.182847	aligned. See page 120 for
-0.224674	14, with some changes for
-0.224674	interpreted again and again for
-0.224320	between the optimization capabilities for
-0.009820	a user is waiting for
-0.009820	another thread is waiting for
-0.019867	While we are waiting for
-0.009820	of its time waiting for
-0.009820	of their time waiting for
-0.019867	threads are often waiting for
-0.019867	can do while waiting for
-0.225383	clock cycles. The rules for
-0.025147	don't have to wait for
-0.008222	element has to wait for
-0.008222	addition has to wait for
-0.008222	actually has to wait for
-0.175039	example shows the principle for
-0.175039	normally use this principle for
-0.421107	same can be expected for
-0.222908	Report on C++ Performance for
-0.278113	can also be convenient for
-0.222496	Loop r1 and c1 for
-0.276714	contentions. See page 87 for
-0.062392	reductions are not permissible for
-0.062392	'>') are not permissible for
-0.572098	n, factorial = 1.0; for
-0.222908	occurrence is rare. Testing for
-0.223731	or turn off requirements for
-0.158375	(except in device drivers for
-0.158375	in 64-bit device drivers for
-0.035467	system. See page 122 for
-0.035467	crash. See page 122 for
-0.220937	between c2 and bc for
-0.163907	longer used and searching for
-0.163907	programs use time searching for
-0.028075	fit the biggest vectors: for
-0.000425	fit the eight-element vectors: for
-0.220446	expressions. See page 80 for
-0.163476	See page and 90 for
-0.163476	errors. See page 90 for
-0.273836	file level. My recommendation for
-0.220446	Example 14.21. // Only for
-0.219955	not. See page 107 for
-0.219955	Mac OS X Compilers for
-0.113887	the same object (except for
-0.113887	on integer expressions (except for
-0.113887	the previous iteration (except for
-0.220446	Development Environments) have facilities for
-0.216171	order. See page 103 for
-0.216171	other. See page 51 for
-0.270931	static linking is preferable for
-0.352781	mechanism. See page 43 for
-0.013967	// Full template specialization for
-0.058782	// Partial template specialization for
-0.216171	other. See page 88 for
-0.298927	by the heap manager for
-0.216171	program. See page 150 for
-0.002129	C++ An optimization guide for
-0.002129	CPUs: An optimization guide for
-0.002129	C++: An optimization guide for
-0.002129	language: An optimization guide for
-0.269555	network. Various development tools for
-0.015894	See the compiler documentation for
-0.216171	cannot expect a directive for
-0.015894	the same memory area for
-0.216171	system. See page 29 for
-0.216171	array of 100 floats for
-0.208476	the appendix at www.agner.org/optimize/cppexamples.zip for
-0.147207	the alignment. See www.agner.org/optimize/cppexamples.zip for
-0.216171	incremented. See page 31 for
-0.216779	7.30b int i; 45 for
-0.262512	functions. See page 49 for
-0.209937	chapter 10 page 101 for
-0.209937	allocation. See page 93 for
-0.209937	page 145 and 119 for
-0.281553	copying of memory blocks, for
-0.262512	techniques like square blocking for
-0.210737	int Induction = r; for
-0.209937	much faster, except perhaps for
-0.209937	details of cache organization for
-0.210737	Main loop for calculations: for
-0.210737	chosen as the basis for
-0.209937	available. See page 81 for
-0.027314	memory. See page 89 for
-0.027314	overlap. See page 89 for
-0.209937	takes. See page 153 for
-0.209937	this. See page 140 for
-0.209937	language. See page 141 for
-0.210737	units are used twice for
-0.209937	the threads are competing for
-0.027314	it is not unusual for
-0.027314	It is not unusual for
-0.121378	jobs and 10 ms for
-0.121378	of typically 30 ms for
-0.209937	Intel C++ Compiler Documentation for
-0.209937	GOT and PLT lookups for
-0.027314	aliasing. See page 78 for
-0.027314	occur. See page 78 for
-0.209937	processor enters the market for
-0.210737	each bit in Day for
-0.106836	AMD and VIA CPUs" for
-0.262512	part of a variable, for
-0.210737	and may be sufficient for
-0.197731	a graphics accelerator card for
-0.197731	optimal number of accumulators for
-0.248759	costs can be justified for
-0.197731	s3 = 0, sum; for
-0.197731	happens inside the loop, for
-0.197731	// Critical innermost loop: for
-0.197731	string; int i, StringLength; for
-0.197731	only half of it, for
-0.197731	8.12a int i, a[2]; for
-0.197731	page 73 and 72 for
-0.073916	algorithms, are not suited for
-0.073916	language is best suited for
-0.197731	the following: 130 Compile for
-0.197731	complicated because various corrections for
-0.017333	} // Approximate exp(x) for
-0.017333	n+1; // Approximate exp(x) for
-0.248759	interrupts at certain events, for
-0.197731	cache is a proxy for
-0.197731	or the specific literature for
-0.197731	to take special precautions for
-0.197731	arrays and structures. Useful for
-0.197731	get a compiler warning for
-0.197731	matrix[NUMROWS][NUMCOLUMNS]; int row, column; for
-0.197731	search times 24 dramatically for
-0.197731	code. Compilers and IDE's for
-0.248759	float nfac = 1.f; for
-0.073916	carefully optimized and fine-tuned for
-0.073916	branches that are fine-tuned for
-0.197731	the program that waits for
-0.197731	response time is consistent for
-0.248759	PC's had an interpreter for
-0.197731	use different memory spaces for
-0.197731	c1 for all squares: for
-0.008580	it is not uncommon for
-0.197731	the background are unnecessary for
-0.197731	{ int i; 84 for
-0.197731	one free register left for
-0.197731	is so much stronger for
-0.197731	Installation problems. The procedures for
-0.197731	You cannot use ~ for
-0.197731	this is sufficiently accurate for
-0.073916	critical stride will contend for
-0.073916	all dynamic libraries contend for
-0.197731	= A + B; for
-0.197731	Intel compiler Linux Optimize for
-0.197731	version of the subroutine for
-0.163045	legacy code, specific preferences for
-0.163045	floating point division. Correction for
-0.163045	or a similar utility for
-0.163045	www.agner.org/optimize and the FAQ for
-0.163045	row < NUMROWS; row++) for
-0.163045	periodic pattern can be, for
-0.163045	is outside this interval, for
-0.163045	calculated with two decimals, for
-0.163045	times per matrix cell for
-0.163045	b;}; S1 list[100], *temp; for
-0.163045	CPUs use Intel VTune, for
-0.163045	may make separate executables for
-0.163045	instruction set is maintained for
-0.163045	preferences for the IDE, for
-0.163045	often used as buffers for
-0.163045	to a limited audience for
-0.163045	"Inner Loops: A sourcebook for
-0.163045	has a different meaning for
-0.163045	for elements inside sqaure: for
-0.163045	a response is delayed for
-0.163045	C++ Compiler v. 11.1 for
-0.163045	my free E-book Usability for
-0.163045	such as e.g. .R. for
-0.163045	CPU-dispatching (see page 122) for
-0.163045	returned in registers. Except for
-0.163045	you should be prepared for
-0.163045	need modification to compensate for
-0.163045	the heap is reserved for
-0.163045	time1; long long timediff[NumberOfTests]; for
-0.163045	cases ignore a request for
-0.163045	AMD: "Software Optimization Guide for
-0.163045	there are search requests for
-0.163045	with the static keyword, for
-0.163045	core will always compete for
-0.163045	an || expression. Assume, for
-0.163045	possible point of attack for
-0.163045	use internet or intranet for
-0.163045	STL has been criticized for
-0.163045	High precision math. Libraries for
-0.163045	(time before) } printf("\nResults:"); for
-0.163045	According to the standards for
-0.163045	compiler can optimize specifically for
-0.163045	have CPU dispatching 125 for
-0.163045	may improve the possibilities for
-0.163045	will be very helpful for
-0.163045	cc[]); // function prototypes for
-0.163045	smaller memory footprint. If, for
-0.163045	throughputs and micro-operation breakdowns for
-0.163045	a variable in parts, for
-0.163045	issue. See my blog for
-0.163045	me corrections and suggestions for
-0.163045	it can handle. Waiting for
-0.163045	can automatically detect opportunities for
-0.163045	particular advantageous as replacements for
-0.163045	used by exception handlers for
-0.163045	options -S or /Fa for
-0.163045	hardware can be wired for
-0.163045	application. In example 12.3a, for
-0.163045	to have a strategy for
-0.163045	triangle is handled separately: for
-0.163045	parm2) {...} // Prototype for
-0.163045	that C++ compilers exist for
-0.163045	compilers and operating systems" for
-0.163045	C++ Compiler v. 14.00 for
-0.163045	use the name _alloca) for
-0.163045	was more than doubled for
-0.163045	not turn on correction for
-0.163045	the performance by 5-10% for
-0.163045	integer registers. Typical candidates for
-0.163045	or class is responsible for
-0.163045	for the newsgroup comp.lang.asm.x86 for
-0.163045	Hyperthreading is Intel's term for
-0.317245	with this code is that
-0.158664	of intermediate code is that
-0.072126	an intermediate code is that
-0.866060	the Intel compiler is that
-0.323543	reason for this is that
-0.323543	learn from this is that
-0.314744	of static data is that
-0.317397	but the point is that
-0.335072	the same cache is that
-1.030826	AVX instruction set is that
-0.536310	newest instruction set is that
-0.304364	a 64-bit double is that
-0.327189	long vector library is that
-0.482156	of this method is that
-0.467221	and the result is that
-0.605321	hardware definition language is that
-0.284319	in 32-bit Linux is that
-0.333170	units. The problem is that
-0.012038	data. The disadvantage is that
-0.012038	starts. The disadvantage is that
-0.012038	avoided. The disadvantage is that
-0.037164	types. A disadvantage is that
-0.037164	slower. Another disadvantage is that
-0.325378	a function parameter is that
-0.011276	cycles. The reason is that
-0.011276	well. The reason is that
-0.011276	4. The reason is that
-0.011276	tested. The reason is that
-0.229202	kind of optimizations is that
-0.488050	of static linking is that
-0.304364	binary data storage is that
-0.262508	using static here is that
-0.262508	The problem here is that
-0.099653	level-1 cache contentions is that
-0.099653	of such contentions is that
-0.370729	of function inlining is that
-0.304364	ready made containers is that
-0.229202	can be improved is that
-0.229202	of lazy binding is that
-0.229202	of complicated algorithms is that
-0.229202	the keyword volatile is that
-0.022852	thing we notice is that
-0.229202	problem with macros is that
-0.229202	wasted. The consequence is that
-0.229202	such an assumption is that
-0.046974	71). The conclusion is that
-0.046974	utility. The conclusion is that
-0.229202	to this argument is that
-0.314076	the past history of that
-0.322922	threads becomes faster and that
-0.337281	repeats 1000 times and that
-0.235907	0 or 1 and that
-0.235907	may vary dynamically and that
-0.291939	of software development, and that
-0.237053	software writing style are that
-0.490695	arrays inside the function that
-0.035468	function is a function that
-0.116491	This is a function that
-0.466021	object to a function that
-0.419555	piece in a function that
-0.296211	as using a function that
-0.296211	function, while a function that
-0.330157	using exceptions. The function that
-0.154100	innermost loop A function that
-0.154100	functions local A function that
-0.154100	a destructor. A function that
-0.224821	to a const function that
-0.279349	recommend that every function that
-0.547589	to a graphics function that
-0.224821	}; // Any function that
-0.835924	the CPU detection function that
-0.224821	your own error-handling function that
-0.847052	that is compatible with that
-0.292360	from doing optimizations on that
-0.236277	the optimization effort on that
-0.409400	which is the code that
-0.725799	parts of the code that
-0.409400	feature into the code that
-0.315753	to check the code that
-0.315753	debugger. However, the code that
-0.315753	by copying the code that
-0.315753	to study the code that
-0.220446	example shows a code that
-0.220446	and insert a code that
-0.478710	a piece of code that
-0.384853	small pieces of code that
-0.338903	class members. The code that
-0.160069	features and for code that
-0.160069	good choice for code that
-0.160069	same time. A code that
-0.160069	it says. A code that
-0.216172	means to make code that
-0.216172	there is other code that
-0.216172	prediction). 149 All code that
-0.216172	compiler to optimize code that
-0.216172	such a complicated code that
-0.216172	400 here. Any code that
-0.298928	is a loop-invariant code that
-0.216172	used for improving code that
-0.260815	to tell the compiler that
-0.335227	This tells the compiler that
-0.286711	by using a compiler that
-0.286711	1. Use a compiler that
-0.321655	is possible. A compiler that
-0.349320	test loop. The time that
-0.505918	in the same time that
-1.404737	is recommended to use that
-0.452602	a part of memory that
-0.342541	if pieces of data that
-0.233417	multiple threads, while data that
-0.233417	data, including local data that
-1.020677	part of the program that
-0.486757	installation of the program that
-0.508823	elsewhere in the program that
-0.324285	to update the program that
-0.543482	redesign of a program that
-0.293461	Make a C++ program that
-0.199563	make a test program that
-0.199563	a small test program that
-0.220014	made a Windows program that
-0.220014	a more well-structured program that
-0.220014	Use an antivirus program that
-0.115057	performance for the functions that
-0.115057	to make the functions that
-0.026034	inlining all the functions that
-0.115057	class containing the functions that
-0.115057	classes implement the functions that
-0.115057	linker extracts the functions that
-0.115057	to collect the functions that
-0.288691	versions even of functions that
-0.207694	calling conventions for functions that
-0.259980	most efficiently if functions that
-0.207694	to return from functions that
-0.278909	and some other functions that
-0.318662	contains any member functions that
-0.207694	memory. If several functions that
-0.207694	for a few functions that
-0.322434	tables of mathematical functions that
-0.330419	the type of CPU that
-0.291150	to insert an instruction that
-0.235213	any particularly slow instruction that
-0.327820	For example, a loop that
-0.327820	many processors, a loop that
-0.230888	is inside another loop that
-0.523730	also the innermost loop that
-0.236870	important and generally used that
-0.089858	which is the one that
-0.089858	may be the one that
-0.089858	code like the one that
-0.089858	to find the one that
-0.318231	software product is one that
-0.255118	instruction set and one that
-0.255118	CPU brands, and one that
-0.333037	is the only one that
-0.219643	CPU detection function, one that
-0.344024	is also a cache that
-0.351631	in fact an integer that
-0.738740	for the instruction set that
-0.348602	has an instruction set that
-0.617696	information about the class that
-0.497031	No function or class that
-0.106657	into a container class that
-0.106657	defining a container class that
-0.337420	may choose the compilers that
-0.226874	works only for compilers that
-0.226874	not work on compilers that
-0.226874	are accessible from compilers that
-0.281677	to tell these compilers that
-0.226224	the smallest data size that
-0.187463	use an integer size that
-0.017255	the smallest integer size that
-0.434610	part of the library that
-0.307495	other than the library that
-0.273546	Use another function library that
-0.273546	a standard function library that
-0.273546	an up-to-date function library that
-0.305642	to another. The object that
-0.316053	Any array or object that
-0.345406	pointer is an object that
-0.296954	to call the version that
-0.097327	it takes. The version that
-0.097327	CPU brand. The version that
-0.222962	to make one version that
-0.222962	and a generic version that
-0.313350	depends on the value that
-0.129007	value from the value that
-0.129007	calculated from the value that
-0.313350	i; Here, the value that
-0.199202	calculated from a value that
-0.199202	each constant a value that
-0.515778	many variables and objects that
-0.288548	only for big objects that
-0.630690	address of the variable that
-0.483176	reference to a variable that
-0.324967	one thread. A variable that
-0.107268	are used in so that
-0.107268	directives around it so that
-0.107268	the thread function so that
-0.050331	reorganize the code so that
-0.050331	the critical code so that
-0.050331	ahead of time so that
-0.050331	at compile time so that
-0.107268	a 128-bit vector so that
-0.107268	and b different so that
-0.107268	in this example so that
-0.107268	each function call so that
-0.107268	one register less so that
-0.107268	the sign bit so that
-0.107268	accessed through pointers so that
-0.107268	two 64-bit operations so that
-0.107268	through the calculations so that
-0.107268	in separate threads so that
-0.107268	throw any exception so that
-0.107268	in 32-bit mode so that
-0.107268	a reliable source so that
-0.107268	at the start so that
-0.107268	never be negative so that
-0.107268	the code section so that
-0.107268	after this statement so that
-0.107268	operators are inlined so that
-0.107268	like a macro so that
-0.107268	it by 100 so that
-0.107268	14.9 is changed so that
-0.107268	constants are identical so that
-0.107268	excessive loop unrolling so that
-0.012046	should be organized so that
-0.107268	is an integer, so that
-0.107268	normalized, if possible, so that
-0.107268	code more compact so that
-0.107268	as explained above, so that
-0.107268	modify example 9.5 so that
-0.107268	in example 12.4a so that
-0.107268	the reciprocal factorials so that
-0.107268	and task switches; so that
-0.107268	the value 0x2C so that
-0.271141	will choose the variables that
-0.271141	is intended for variables that
-0.217574	making sure that variables that
-0.217574	may add counter variables that
-0.217574	one for initialized variables that
-0.217574	one for uninitialized variables that
-0.450857	copy of the table that
-0.236161	cycle. The highest performance that
-0.324323	a lineage of software that
-0.224072	is wasted on software that
-0.363639	possible to make software that
-0.224072	a computer. Security software that
-0.236282	time is so long that
-0.232244	case is a branch that
-0.435264	code has a branch that
-0.232244	For example, a branch that
-0.288314	a kind of branch that
-0.161859	other way. A branch that
-0.161859	of course. A branch that
-0.161859	often mispredicted. A branch that
-0.161859	it changes. A branch that
-0.204922	Eliminate branches Remove branch that
-0.044648	code in a way that
-0.021750	calculation in a way that
-0.150637	in such a way that
-0.236168	standard user interface elements that
-0.175693	at a memory address that
-0.335140	expression. Assume, for example, that
-0.235703	though the logical register that
-0.225938	operating system or libraries that
-0.588037	compilers and function libraries that
-0.321523	for general function libraries that
-0.235684	and for saving registers that
-0.235778	do things with pointers that
-0.291507	include a performance test that
-0.224964	available in all systems that
-0.313709	a different operating systems that
-0.406865	on old operating systems that
-0.051506	you cannot be sure that
-0.051506	can never be sure that
-0.226605	if you are sure that
-0.168184	have to make sure that
-0.168184	order to make sure that
-0.076001	way to make sure that
-0.076001	want to make sure that
-0.375134	programmer to make sure that
-0.168184	carefully to make sure that
-0.186856	aligned, and make sure that
-0.128372	We can make sure that
-0.128372	you must make sure that
-0.128372	parameters. Therefore, make sure that
-0.018911	x; This makes sure that
-0.018911	bytes. This makes sure that
-0.018911	static. This makes sure that
-0.059399	handling system makes sure that
-0.059399	const reference makes sure that
-0.059399	volatile keyword makes sure that
-0.059399	the product makes sure that
-0.008171	way of making sure that
-0.033650	solved by making sure that
-0.298983	may choose the method that
-0.224672	The most important method that
-0.224672	uses an unfortunate method that
-0.332921	to access a file that
-0.347858	= i = 0 that
-0.343679	and choose the type that
-0.438873	cycles in the case that
-0.229495	in the likely case that
-0.199689	must rely on instructions that
-0.284908	microprocessors have vector instructions that
-0.199689	critical application- specific instructions that
-0.199689	operators are single instructions that
-0.199689	include a few instructions that
-0.199689	sets have certain instructions that
-0.199689	to define application-specific instructions that
-0.238752	version on the processors that
-0.146101	first generation of processors that
-0.146101	second generation of processors that
-0.084093	the time on processors that
-0.084093	its time on processors that
-0.188821	only on some processors that
-0.143837	way, the first processors that
-0.204597	registers. The first processors that
-0.238752	for all unknown processors that
-0.348577	preferably be a constant that
-0.235209	is a common error that
-0.357858	used. It is important that
-0.357858	pointer. It is important that
-0.357858	calculations. It is important that
-0.228681	is compatible with CPUs that
-0.228681	compatible with all CPUs that
-0.234947	variables is so large that
-0.313663	a number of arrays that
-0.283311	always apply to arrays that
-0.228149	there is other work that
-0.228149	be deleted. User work that
-0.735831	in order to avoid that
-0.322948	test situations to avoid that
-0.235110	this time, any processor that
-0.099117	matrix is so big that
-0.099117	c[i] are so big that
-0.227827	event counts for threads that
-0.332327	divided into multiple threads that
-0.220474	is also a language that
-0.332159	choosing a programming language that
-0.220474	the device. Any language that
-0.323145	slower than a thread that
-0.227411	not doubled. A thread that
-0.098794	object is so small that
-0.098794	that are so small that
-0.420872	implemented. Use the option that
-0.329400	you specify an option that
-0.219462	compiled without any option that
-0.338105	of example container classes that
-0.380097	line then the line that
-0.469868	evict the cache line that
-0.389277	fetching a cache line that
-0.337332	overloaded operators. Function parameters that
-0.337963	extra code to check that
-0.226598	makes a runtime check that
-0.332261	can avoid the problem that
-0.225855	new version causes problem that
-0.333210	A more efficient solution that
-0.511888	to use a container that
-0.278652	be considered a container that
-0.035946	This has the advantage that
-0.216221	they come from operators that
-0.216221	sense that all operators that
-0.216221	or 1, but operators that
-0.231759	then it is likely that
-0.231759	bottleneck, it is likely that
-0.353316	it is very likely that
-0.234670	has a parallel structure that
-0.332142	so we can calculate that
-0.312208	data block to copy that
-0.290774	The only CPUID information that
-0.280653	if it is certain that
-0.195458	You cannot be certain that
-0.280653	If you are certain that
-0.195458	it is quite certain that
-0.324653	it is almost certain that
-0.540541	The few clock cycles that
-0.224173	contain pointers or addresses that
-0.224173	contains no absolute addresses that
-0.377515	This is a counter that
-0.117449	the maximum loop count that
-0.117449	The maximum loop count that
-0.234278	network connections. Temporary files that
-0.533259	It is therefore recommended that
-0.097534	CPUs are so fast that
-0.097534	is developing so fast that
-0.097463	But if I write that
-0.097463	speeds. If I write that
-0.293674	CPU use in programs that
-0.264643	annoyingly high for programs that
-0.211825	you are making programs that
-0.287245	this involves the problems that
-0.253122	and other resource problems that
-0.201608	useful for finding problems that
-0.285635	as important usability problems that
-0.309836	set (requires a microprocessor that
-0.290607	a lot of branches that
-0.200125	you are making branches that
-0.200125	vector size. Unpredictable branches that
-0.200125	unpacking needed. Predictable branches that
-0.222259	friend function or operator that
-0.276446	loaded type casting operator that
-0.233661	Installing a second application that
-0.071317	the compiler can see that
-0.071317	optimizing compiler can see that
-0.263062	optimizing compiler will see that
-0.345580	unchanged, while the expression that
-0.231253	inverted mask. The expression that
-0.029143	b is an expression that
-0.231253	Induction variables An expression that
-0.166919	loop counter. Any expression that
-0.166919	is a loop-invariant expression that
-0.233423	programming is so complicated that
-0.343840	allowing two data members that
-0.233392	likely is a model that
-0.386934	of a memory block that
-0.297555	destroys any memory block that
-0.233392	just an arbitrary name that
-0.034015	it has the disadvantage that
-0.016675	This has the disadvantage that
-0.013995	code is so high that
-0.058909	may be so high that
-0.340391	a register to zero that
-0.289397	there are allocated resources that
-0.234057	for the same reason that
-0.482260	has a CPU dispatcher that
-0.348413	obvious to the programmer that
-0.273325	an advantage in applications that
-0.219504	are advantageous for applications that
-0.497004	A CPU dispatch mechanism that
-0.085107	class member function means that
-0.011184	2, etc. This means that
-0.011184	most cases. This means that
-0.011184	multiplication units. This means that
-0.011184	4 ways. This means that
-0.011184	out-of-order execution. This means that
-0.011184	clock cycle. This means that
-0.011184	= 28. This means that
-0.040472	local const variable means that
-0.040472	a global variable means that
-0.085107	20 to 10 means that
-0.085107	a non-member function, means that
-0.085107	performance). Aligned operands means that
-0.085107	Functional decomposition here means that
-0.288373	however, often write expressions that
-0.232881	library has preprocessing directives that
-0.258574	core, but it requires that
-0.258574	this option. This requires that
-0.190374	other hardware often requires that
-0.317823	array. This method requires that
-0.204281	from doing the optimizations that
-0.204281	of the compiler optimizations that
-0.336576	compiler from making optimizations that
-0.217871	choose a software framework that
-0.378802	very large runtime framework that
-0.529402	compatible with old microprocessors that
-0.185759	the compiler to assume that
-0.060111	not permissible to assume that
-0.035550	the problem and assume that
-0.004288	cases, you can assume that
-0.004288	general, you can assume that
-0.004288	code. You can assume that
-0.004288	result. You can assume that
-0.035550	ignore overflow or assume that
-0.035550	Neither can you assume that
-0.070024	2.5f; If we assume that
-0.017411	then you cannot assume that
-0.017411	cycles. You cannot assume that
-0.035550	optimizing compiler would assume that
-0.008618	you can generally assume that
-0.008618	You can generally assume that
-0.035550	that compiler makers assume that
-0.035550	compiler can safely assume that
-0.354880	element. The table shows that
-0.217703	(see page 16) shows that
-0.157458	programmers do not know that
-0.203718	sequence. If you know that
-0.157458	assumes that we know that
-0.390588	the compiler cannot know that
-0.157458	And who would know that
-0.157458	compilers (Microsoft, Intel) know that
-0.217137	are also other advantages that
-0.217137	there are specific advantages that
-0.336642	have various optimization options that
-0.252510	auto_ptr has the feature that
-0.201065	has the special feature that
-0.201065	The symbol interposition feature that
-0.307984	with a default constructor that
-0.199199	full. This may require that
-0.153394	efficient vector operations require that
-0.153394	All these instructions require that
-0.153394	Alignment? Some applications require that
-0.153394	code. Some profilers require that
-0.153394	MOVNTPD and MOVNTDQ require that
-0.214908	option for all modules that
-0.268127	across all .cpp modules that
-0.247963	a couple of things that
-0.197023	listing reveals three things that
-0.197023	can often reveal things that
-0.287395	do. All the reductions that
-0.231534	so that each statement that
-0.265847	may catch programming errors that
-0.212891	intended for detecting errors that
-0.265595	faster than other languages that
-0.304841	called from programming languages that
-0.325678	packages include a profiler that
-0.231107	performing an illegal operation that
-0.155208	advantage of the fact that
-0.033916	suboptimal way. The fact that
-0.033916	of underflow. The fact that
-0.033916	sign bit. The fact that
-0.033916	than 20. The fact that
-0.231154	be portable to platforms that
-0.209042	is a single task that
-0.209042	spell checking. Any task that
-0.231020	parts: one for constants that
-0.135323	course be a destructor that
-0.135323	class with a destructor that
-0.194819	thread have a destructor that
-0.150411	Making exception-safe code Assume that
-0.150411	} } } Assume that
-0.150411	another memory access. Assume that
-0.150411	terms of speed. Assume that
-0.150411	code in general. Assume that
-0.230754	optimizing the first algorithm that
-0.212157	sake of the possibility that
-0.044127	rule out the possibility that
-0.164709	out the theoretical possibility that
-0.231495	clear from this discussion that
-0.230920	are satisfied. The conditions that
-0.230633	accessed with an offset that
-0.056067	0 and 1. Note that
-0.056067	the desired version. Note that
-0.056067	the Windows system. Note that
-0.056067	as character arrays. Note that
-0.056067	Documentation for details. Note that
-0.056067	for an explanation. Note that
-0.056067	but less optimized. Note that
-0.056067	is optimized away. Note that
-0.056067	object file disassembler. Note that
-0.056067	any patch. 131 Note that
-0.043392	to put the operand that
-0.043392	then put the operand that
-0.230633	independently of other tasks that
-0.125247	caching more efficient. Variables that
-0.168030	or static storage Variables that
-0.125247	positive effects are: Variables that
-0.125247	for temporary storage. Variables that
-0.058124	library functions. 9.4 Variables that
-0.058124	together...................................... 88 9.4 Variables that
-0.043299	above, it is clear that
-0.043299	method, it is clear that
-0.230182	CPU may occasionally predict that
-0.340845	the actual clock frequency that
-0.230027	is an extra iteration that
-0.204841	Other brands or models that
-0.204841	on all newer models that
-0.305160	habit, it is true that
-0.372302	vector functions have names that
-0.230182	are also other details that
-0.205134	dependency chains. Another thing that
-0.205134	code. The third thing that
-0.332740	and other data structures that
-0.284827	However, we must consider that
-0.229649	calculate the time delay that
-0.229480	and FPGA soft cores that
-0.229311	the value of ebx that
-0.228268	the series of statements that
-0.228454	reveal a zigzag course that
-0.330168	there is a risk that
-0.199546	is a higher risk that
-0.228454	up the loop buffer that
-0.199026	if there is something that
-0.199026	it is certainly something that
-0.351024	Occasionally, the clock counts that
-0.198506	the "best case" counts that
-0.278267	(b*c)/d, it can happen that
-0.199026	it can often happen that
-0.302232	relatively primitive programming style that
-0.195097	non-virtual member function, provided that
-0.195097	not use branches, provided that
-0.164312	make sure that everything that
-0.164312	to make sure everything that
-0.164312	time to eliminate everything that
-0.226566	accessed column-wise. Assume now that
-0.431763	you take into account that
-0.202011	summing up the factors that
-0.016692	There are several factors that
-0.226566	tell the compiler explicitly that
-0.225223	This gives a measure that
-0.225487	swapped to disk. Software that
-0.225223	have used the trick that
-0.225487	does have some disadvantages that
-0.225750	CParent::Hello() has multiple instances that
-0.224960	cache are so expensive that
-0.176462	you should be aware that
-0.176462	You should be aware that
-0.225223	than the runtime polymorphism that
-0.001265	operations in the sense that
-0.001265	macro in the sense that
-0.000632	portable in the sense that
-0.001265	serial in the sense that
-0.001265	overdetermined in the sense that
-0.313607	This requires, of course, that
-0.037666	compiler you will notice that
-0.037666	compiler, you will notice that
-0.421778	it can be expected that
-0.223133	smart. They can detect that
-0.223439	in table 9.1 show that
-0.018425	of standard C, specifying that
-0.321173	mechanism called stack unwinding that
-0.220953	or any other cleanup that
-0.074272	................................................................................................... 87 9.3 Functions that
-0.074272	VIA CPUs". 9.3 Functions that
-0.282703	The C/C++ standard specifies that
-0.163921	The volatile keyword specifies that
-0.220953	be allocated dynamically. Arrays that
-0.220589	use it for lists that
-0.115475	fail in the event that
-0.115475	resized in the event that
-0.220589	called garbage collection. Objects that
-0.220953	same memory areas. Data that
-0.221318	virtual machine are frameworks that
-0.221683	The const keyword tells that
-0.220953	tools have powerful facilities that
-0.221683	using a constant divisor that
-0.192775	software teachers to recommend that
-0.261384	Some programming textbooks recommend that
-0.216799	we can roughly estimate that
-0.217251	it can be said that
-0.035491	way. You may think that
-0.035491	model and then think that
-0.035491	usability, but I think that
-0.035491	a. I don't think that
-0.217703	are more efficient alternatives that
-0.216799	are offering profiling tools that
-0.389555	is a hot spot that
-0.216799	typically have variable lengths that
-0.217251	has the unfortunate consequence that
-0.067563	rely on the assumption that
-0.067563	Gnu compiler, the assumption that
-0.027382	case it will recognize that
-0.027382	the compiler will recognize that
-0.027382	Most compilers will recognize that
-0.217251	real time applications. Remember that
-0.216799	some of the considerations that
-0.026753	has an initialization routine that
-0.216799	and only one, auto_ptr that
-0.121714	since we are assuming that
-0.121714	is prevented from assuming that
-0.210556	or #pragma optimize("a",on). Specifies that
-0.210556	my crystal ball reveals that
-0.210556	statement with many labels that
-0.210556	bit scan instruction. Programmers that
-0.210556	calls another function F2 that
-0.211150	Neither is it unusual that
-0.047996	and 64-bit systems. Applications that
-0.047996	the user interface. Applications that
-0.047996	See page 141. Applications that
-0.637538	procedure linkage table (PLT) that
-0.210556	Background services. Many services that
-0.164134	4) { // Check that
-0.121714	code version. 2. Check that
-0.102271	sprintf, etc. But beware that
-0.102271	be inlined. But beware that
-0.198333	work better. Remember again, that
-0.074142	CPU dispatching and discovered that
-0.074142	Many programmers have discovered that
-0.198333	is a 90% chance that
-0.198333	device is a chip that
-0.035491	issues, and I believe that
-0.035491	as expected. I believe that
-0.074142	for vector operations. Algorithms that
-0.074142	vectors and matrixes. Algorithms that
-0.198333	are based on hacks that
-0.074142	x.c = C; Assuming that
-0.074142	features it has. Assuming that
-0.074142	is important to note that
-0.074142	subsequent manuals. Please note that
-0.198333	caused by random events that
-0.074142	two simple expressions. Operations that
-0.074142	alignment and aliasing. Operations that
-0.074142	as vector register. Factors that
-0.074142	advantageous vectorization is. Factors that
-0.198333	is useful on servers that
-0.074142	to software optimization. Everything that
-0.074142	the same register. Everything that
-0.198333	The compiler may report that
-0.198333	different CPUs to verify that
-0.198333	functions, and other complications that
-0.198333	We can therefore conclude that
-0.198333	time cleaning up spaces that
-0.249436	bigger and more complex, that
-0.008604	We can only hope that
-0.249436	or modify the ones that
-0.198333	on a program saying that
-0.198333	256-bit integer vectors. Code that
-0.249436	is known with certainty that
-0.198333	under worst-case conditions. Programs that
-0.074142	official C standard says that
-0.074142	register usage convention says that
-0.198333	string and then interpret that
-0.163601	compiler has not noticed that
-0.163601	useful for making plug-ins that
-0.163601	a processing speed exceeding that
-0.163601	that the programmer forgets that
-0.163601	data set into sub-vectors that
-0.163601	It may seem illogical that
-0.163601	has a virus scanner that
-0.163601	} Example 14.27 assumes that
-0.163601	faster. It is assumed that
-0.163601	syntax is so kludgy that
-0.163601	is important to remember that
-0.163601	the header file mathimf.h that
-0.163601	cases. The so-called iterators that
-0.163601	example when you discover that
-0.163601	branch. The common excuse that
-0.163601	to increase the likelihood that
-0.163601	(e.g. Quine–McCluskey or Espresso) that
-0.163601	email and web browsing that
-0.163601	there is no guarantee that
-0.163601	the early planning stage that
-0.163601	bypassing the so-called CPU-dispatcher that
-0.163601	pipeline and later discovers that
-0.163601	is important to realize that
-0.163601	been doubled. Thin clients that
-0.163601	It must be emphasized that
-0.163601	libraries (*.dll or *.so) that
-0.163601	have a strict formalism that
-0.163601	of a program dictates that
-0.163601	must bear in mind, that
-0.163601	that it is unrealistic that
-0.163601	can eliminate common subexpressions that
-0.163601	reason is, I guess, that
-0.163601	other error condition. Things that
-0.163601	because the compiler knows that
-0.163601	etc. and the wires that
-0.163601	time Some developers feel that
-0.163601	for Linux. 82 Keywords that
-0.163601	the difference, let's say that
-0.163601	inlining has the complication that
-0.163601	of costs to multithreading that
-0.163601	And it is unlikely that
-0.163601	once. One may argue that
-0.163601	optimally, or from knowing that
-0.163601	directives Preprocessing directives (everything that
-0.326538	we prefer a to be
-0.280394	Specifies a function to be
-0.280394	a member function to be
-0.504168	want the code to be
-0.566749	expect the compiler to be
-0.272437	registers that have to be
-0.405164	etc. may have to be
-0.272437	The data have to be
-0.272437	micro- processors have to be
-0.272437	address calculations have to be
-0.272437	different versions have to be
-0.272437	five values have to be
-0.162660	you want this to be
-0.162660	and expect this to be
-0.424723	cause the memory to be
-0.236970	iteration that has to be
-0.236970	the pointer has to be
-0.236970	cache line has to be
-0.236970	rounding mode has to be
-0.236970	the offset has to be
-0.304296	want a number to be
-0.220603	forces the variable to be
-0.199985	cause other variables to be
-0.199985	Windows, allow variables to be
-0.382766	expect the table to be
-0.274570	cause the software to be
-0.178136	files that need to be
-0.178136	resources that need to be
-0.229845	does not need to be
-0.178136	that may need to be
-0.178136	array may need to be
-0.229845	dynamic libraries need to be
-0.229845	object files need to be
-0.411799	it is sure to be
-0.045550	block turns out to be
-0.045550	prediction turns out to be
-1.047018	and you want to be
-0.220603	entire cache line to be
-0.090142	four function parameters to be
-0.090142	of four parameters to be
-0.020843	to fourteen parameters to be
-0.020843	which is known to be
-0.010296	result is known to be
-0.020843	process is known to be
-0.147768	for is likely to be
-0.147768	it is likely to be
-0.147768	program is likely to be
-0.147768	address is likely to be
-0.147768	model is likely to be
-0.147768	here is likely to be
-0.147768	brand is likely to be
-0.147768	comparison is likely to be
-0.145554	data are likely to be
-0.145554	is very likely to be
-0.145554	are less likely to be
-0.145554	and therefore likely to be
-0.145554	are equally likely to be
-0.243806	#define is certain to be
-0.177721	is not certain to be
-0.177721	is therefore certain to be
-0.274570	causing return addresses to be
-0.220603	requiring many files to be
-0.163389	work that needs to be
-0.163389	destructor that needs to be
-0.205213	executable file needs to be
-0.205213	one constant needs to be
-0.205213	positive list needs to be
-0.220603	floating point division to be
-1.089806	for the programmer to be
-0.382766	it is intended to be
-0.220603	cause the heap to be
-0.274570	want the executable to be
-0.220603	in the sequence to be
-0.061860	the program happen to be
-0.061860	several variables happen to be
-0.061860	big matrix happen to be
-0.105014	not long enough to be
-0.105014	just long enough to be
-0.162660	can be expected to be
-0.162660	set are expected to be
-0.054112	b is guaranteed to be
-0.054112	i&15 is guaranteed to be
-0.115938	is not guaranteed to be
-0.220603	better metaprogramming tools to be
-0.274570	model is going to be
-0.096443	although it appears to be
-0.096443	if this appears to be
-0.220603	the function argument to be
-0.220603	number of dangers to be
-0.220603	It just happened to be
-0.026447	object pointed to can be
-0.026447	variable pointed to can be
-0.159037	binary code and can be
-0.144918	loop-invariant code that can be
-0.091499	a cache that can be
-0.091499	a way that can be
-0.091499	application-specific instructions that can be
-0.091499	programming language that can be
-0.021132	loop count that can be
-0.144918	for applications that can be
-0.091499	write expressions that can be
-0.091499	specific advantages that can be
-0.043344	three things that can be
-0.043344	reveal things that can be
-0.091499	third thing that can be
-0.144918	is something that can be
-0.091499	efficient alternatives that can be
-0.091499	a chip that can be
-0.138734	functions and it can be
-0.138734	cores, and it can be
-0.133063	is that it can be
-0.194580	so that it can be
-0.133063	means that it can be
-0.176101	program then it can be
-0.109150	constants because it can be
-0.109150	one, because it can be
-0.196952	hint, but it can be
-0.026826	some cases it can be
-0.118957	resources. But it can be
-0.118957	At least, it can be
-0.140061	unless the function can be
-0.140061	A frame function can be
-0.140061	The exponential function can be
-0.187626	of the code can be
-0.125523	if the code can be
-0.085355	pieces of code can be
-0.137975	branches The code can be
-0.085355	overflow, this code can be
-0.085355	the same code can be
-0.085355	The above code can be
-0.009572	3; } This can be
-0.019360	break; } This can be
-0.110444	non-AVX code. This can be
-0.060879	program memory. This can be
-0.060879	*= x; This can be
-0.060879	member pointer. This can be
-0.060879	integer operations. This can be
-0.060879	the variable. This can be
-0.060879	very fast. This can be
-0.060879	is important. This can be
-0.060879	inconvenient times. This can be
-0.060879	dispatch process. This can be
-0.060879	/ b2; This can be
-0.060879	code only. This can be
-0.060879	VIA CPUs"). This can be
-0.060879	and free. This can be
-0.186784	or modified. This can be
-0.060879	is saturated. This can be
-0.060879	of it). This can be
-0.060879	thread scheduler. This can be
-0.060879	bits 32-62. This can be
-0.060879	this place. This can be
-0.117089	former case x can be
-0.173971	shows how this can be
-0.117089	The load time can be
-0.117089	data cache use can be
-0.170442	compiler does It can be
-0.170442	input/output operations. It can be
-0.170442	the CPU. It can be
-0.159037	and public data can be
-0.222126	of each vector can be
-0.222126	reads. The same can be
-0.099317	Using intrinsic functions can be
-0.099317	The missing functions can be
-0.117089	The prefetch instruction can be
-0.345651	inside the loop can be
-0.050622	the CPU which can be
-0.050622	of i which can be
-0.050622	vector register which can be
-0.050622	conversion instructions which can be
-0.050622	and references, which can be
-0.050622	an attribute which can be
-0.050622	or YMM) which can be
-0.054610	double to integer can be
-0.054610	or an integer can be
-0.173971	in the set can be
-0.117089	A template class can be
-0.013022	in this example can be
-0.232671	of these compilers can be
-0.054610	of variable size can be
-0.054610	i >= size can be
-0.035634	Likewise, a pointer can be
-0.035634	conversion A pointer can be
-0.035634	No link pointer can be
-0.117089	check on b can be
-0.019147	A dynamic library can be
-0.078332	user interface library can be
-0.035634	if the object can be
-0.035634	a shared object can be
-0.035634	The existing object can be
-0.072279	a simple array can be
-0.072279	A large array can be
-0.072279	27. An array can be
-0.056829	number of objects can be
-0.056829	on when objects can be
-0.056829	containing many objects can be
-0.056829	and new objects can be
-0.035634	that a variable can be
-0.035634	expensive. A variable can be
-0.035634	the induction variable can be
-0.099317	number of variables can be
-0.099317	158 Integer variables can be
-0.117089	power of 2 can be
-0.159037	efficient. The performance can be
-0.153769	} A branch can be
-0.099317	the optimal branch can be
-0.117089	data are stored can be
-0.054610	if the address can be
-0.054610	the target address can be
-0.117089	The carry bit can be
-0.054610	The same register can be
-0.054610	128-bit XMM register can be
-0.026447	code Function libraries can be
-0.026447	libraries Function libraries can be
-0.054610	with invalid pointers can be
-0.054610	auto_ptr. Smart pointers can be
-0.159037	sizes, and they can be
-0.013022	all. This method can be
-0.013022	executables. This method can be
-0.026447	The same method can be
-0.026447	A similar method can be
-0.117089	if data access can be
-0.013022	the operating system can be
-0.117089	writes a file can be
-0.117089	Object oriented programming can be
-0.054610	sequence of operations can be
-0.054610	The Boolean operations can be
-0.117089	a composite type can be
-0.173971	on non-Intel processors can be
-0.117089	logical processors available can be
-0.054610	by a constant can be
-0.054610	subexpression. A constant can be
-0.117089	for the stack can be
-0.181635	current Intel CPUs can be
-0.117089	Objects and arrays can be
-0.117089	how caches work can be
-0.117089	Even function calls can be
-0.117089	that the result can be
-0.117089	of unused bytes can be
-0.117089	no branches inside can be
-0.054610	caching. This problem can be
-0.054610	This safety problem can be
-0.117089	a sorted list can be
-0.117089	because the hardware can be
-0.117089	stack unwinding information can be
-0.117089	max) { ... can be
-0.008638	the loop counter can be
-0.004298	a loop counter can be
-0.026447	time stamp counter can be
-0.222126	dynamic memory allocation can be
-0.117089	swapped then both can be
-0.117089	object oriented programs can be
-0.117089	of memory space can be
-0.117089	The automatic dispatching can be
-0.117089	is that branches can be
-0.117089	then the multiplication can be
-0.054610	various instruction sets can be
-0.054610	the 32 sets can be
-0.117089	A mixed implementation can be
-0.117089	why exception handling can be
-0.117089	its data members can be
-0.117089	pointer or reference can be
-0.117089	The register keyword can be
-0.117089	of table lookup can be
-0.117089	for WTL applications can be
-0.099317	The dispatching mechanism can be
-0.099317	CPU dispatch mechanism can be
-0.117089	floating point numbers can be
-0.239442	space. A union can be
-0.117089	The copy constructor can be
-0.117089	the code section can be
-0.117089	the cache contentions can be
-0.159037	time. These conversions can be
-0.159037	No general statement can be
-0.117089	other programming languages can be
-0.117089	CPUs. These costs can be
-0.117089	between two constants can be
-0.117089	because the offset can be
-0.117089	effects. This effect can be
-0.117089	etc. These counters can be
-0.117089	But program loading can be
-0.054610	if the condition can be
-0.054610	the if condition can be
-0.117089	The critical stride can be
-0.117089	explain how metaprogramming can be
-0.117089	A hash map can be
-0.026447	such dependency chains can be
-0.026447	Such dependency chains can be
-0.239442	My test tool can be
-0.117089	called. Lazy binding can be
-0.117089	in a DLL can be
-0.117089	lookup Lookup tables can be
-0.117089	read-only data sections can be
-0.099317	global variables. They can be
-0.099317	three branches. They can be
-0.117089	out if exceptions can be
-0.117089	in these manuals can be
-0.117089	chip. Such units can be
-0.117089	of this polynomial can be
-0.117089	in example 13.1 can be
-0.117089	valid address. Pointers can be
-0.117089	This wasteful behavior can be
-0.117089	ecx and edx can be
-0.117089	the background job can be
-0.117089	size of abc can be
-0.026447	reasonable upper limit can be
-0.026447	not-too-big upper limit can be
-0.117089	The following guidelines can be
-0.117089	a reasonable estimate can be
-0.117089	as code. Metaprogramming can be
-0.117089	in example 12.4b can be
-0.117089	Integer sizes Integers can be
-0.117089	The following techniques can be
-0.117089	much higher resolution can be
-0.117089	of C++ projects can be
-0.117089	in example 14.28 can be
-0.117089	exact. Multiple divisions can be
-0.117089	s2 and s3 can be
-0.117089	user interface etc., can be
-0.117089	character arrays. Strings can be
-0.117089	that are read-only can be
-0.117089	the subexpression c+b can be
-0.117089	the same chip can be
-0.117089	format. The formats can be
-0.117089	level-2 cache miss can be
-0.117089	Eliminate jumps Jumps can be
-0.117089	the two parentheses can be
-0.117089	intermediate result (b+c) can be
-0.117089	lost. This dilemma can be
-0.117089	the following work-around can be
-0.117089	in example 8.24 can be
-0.117089	compile time. (Examples can be
-0.117089	b = !a; can be
-0.117089	are zero. Zero can be
-0.036631	that it may not be
-0.036631	then it may not be
-0.080142	the compiler may not be
-0.059365	The compiler may not be
-0.024074	arrays It may not be
-0.024074	registers. It may not be
-0.024074	objects? It may not be
-0.076639	allocated memory may not be
-0.076639	inlined functions may not be
-0.076639	level-1 cache may not be
-0.076639	returns. alloca may not be
-0.076639	Calling exit may not be
-0.076639	USB sticks may not be
-0.354419	that it will not be
-0.246293	the code will not be
-0.246293	The program will not be
-0.045812	expression that should not be
-0.218316	a class need not be
-0.301524	You should therefore not be
-0.218316	cases it might not be
-0.191475	quite powerful and may be
-0.043348	initialized variables that may be
-0.043348	uninitialized variables that may be
-0.091508	few instructions that may be
-0.091508	other cleanup that may be
-0.036674	however, and it may be
-0.036674	fluctuating and it may be
-0.043665	necessary then it may be
-0.043665	problem then it may be
-0.043665	resource then it may be
-0.043665	not, then it may be
-0.043665	below) then it may be
-0.043665	identified, then it may be
-0.043665	obvious, then it may be
-0.017950	cases, but it may be
-0.017950	function, but it may be
-0.017950	set, but it may be
-0.017950	job, but it may be
-0.208104	For example, it may be
-0.076734	best optimization it may be
-0.076734	this case it may be
-0.076734	parameter. But it may be
-0.076734	large arrays, it may be
-0.076734	allows it, it may be
-0.117868	the same function may be
-0.117868	the critical function may be
-0.191475	the error code may be
-0.161140	RAM memory. This may be
-0.161140	a; 72 This may be
-0.161140	be reduced. This may be
-0.161140	page 45. This may be
-0.473765	cases, the compiler may be
-0.146437	The extra time may be
-0.121707	a pointer. It may be
-0.121707	one vector. It may be
-0.121707	disk space. It may be
-0.121707	true anyway. It may be
-0.121707	time here. It may be
-0.121707	poorly predictable. It may be
-0.121707	the profile. It may be
-0.121707	too high. It may be
-0.121707	are unavoidable. It may be
-0.409427	of the program may be
-0.215876	intermediate results, which may be
-0.146437	of security, but may be
-0.146437	operations An integer may be
-0.207589	though future compilers may be
-0.146437	A smart pointer may be
-0.146437	user interface library may be
-0.032244	used, then there may be
-0.032244	problematic because there may be
-0.032244	parameter, so there may be
-0.032244	critical. However, there may be
-0.054948	Model-specific dispatching There may be
-0.054948	end user. There may be
-0.054948	much faster. There may be
-0.054948	to 36. There may be
-0.054948	using inheritance. There may be
-0.146437	it. Global variables may be
-0.191475	in this table may be
-0.191475	typically use pointers may be
-0.146437	needed, or they may be
-0.067075	MFC). This method may be
-0.067075	short vector method may be
-0.146437	The network access may be
-0.146437	a few times may be
-0.146437	of 64-bit Windows may be
-0.146437	in program execution may be
-0.174860	The virtual processor may be
-0.117868	Pentium M processor may be
-0.146437	in www.agner.org/optimize/cppexamples.zip. These may be
-0.146437	motion A calculation may be
-0.146437	most efficient solution may be
-0.146437	low repeat count may be
-0.146437	Dynamic memory allocation may be
-0.146437	addition and multiplication may be
-0.146437	The following methods may be
-0.146437	pointer or reference may be
-0.146437	stack unwinding mechanism may be
-0.067075	A simple constructor may be
-0.067075	A copy constructor may be
-0.146437	up. Some modules may be
-0.146437	in a network may be
-0.259862	The clock frequency may be
-0.146437	calls. The usability may be
-0.146437	in memory. They may be
-0.146437	and just-in-time compilation may be
-0.146437	template parameter. Templates may be
-0.146437	A binary tree may be
-0.146437	7.25 Bitfields Bitfields may be
-0.146437	b * 2.5 may be
-0.146437	to the tolerance may be
-0.227368	Value of a will be
-0.469737	set then it will be
-0.253937	a virtual function will be
-0.295205	but the code will be
-0.199454	The resulting code will be
-0.199454	the resultant code will be
-0.290884	their functionality. This will be
-0.284889	your program, you will be
-0.227368	time. No memory will be
-0.188665	Otherwise the program will be
-0.129952	The application program will be
-0.129952	the entire program will be
-0.302184	that the cache will be
-0.178660	A negative integer will be
-0.178660	the same class will be
-0.227368	Value of b will be
-0.178660	areas, and there will be
-0.178660	Func(ab[i].a); } There will be
-0.178660	a linear array will be
-0.178660	variables and objects will be
-0.178660	predict which variables will be
-0.302184	Such a branch will be
-0.227368	and the user will be
-0.051793	and the result will be
-0.051793	profiler. The result will be
-0.051793	the final result will be
-0.178660	and the speed will be
-0.178660	a cache line will be
-0.178660	cases this multiplication will be
-0.178660	eliminated. Code caching will be
-0.178660	the code section will be
-0.302184	instance in main will be
-0.227368	the & operation will be
-0.178660	correctly whether vectorization will be
-0.178660	containing only constants will be
-0.178660	more template instances will be
-0.178660	0x2700 to 0x273F will be
-0.178660	the constant 3.5 will be
-0.178660	brand new today will be
-0.178660	the static modifier will be
-0.178660	value of b+c will be
-0.178660	(see page 103) will be
-0.178660	For example, b*2.0/3.0 will be
-0.312458	the code can then be
-0.236000	This operation will then be
-0.233078	operator, which can only be
-0.233078	which otherwise can only be
-0.248876	16 will not only be
-0.248876	This would not only be
-0.229842	Loop unrolling should only be
-0.237311	and d would all be
-0.153105	false where it should be
-0.167833	code. System code should be
-0.111699	actual calculations. This should be
-0.202681	manual, but you should be
-0.202681	testing. Here, you should be
-0.119416	became available. It should be
-0.119416	installation tools. It should be
-0.232313	sched_setaffinity). The program should be
-0.111699	structure or class should be
-0.111699	in this example should be
-0.111699	a and b should be
-0.153105	by commas. There should be
-0.025348	or multidimensional array should be
-0.025348	A multidimensional array should be
-0.228414	the software. You should be
-0.111699	134. The table should be
-0.111699	that software performance should be
-0.111699	problems. All software should be
-0.111699	The loop branch should be
-0.167833	spots. The test should be
-0.111699	used. Web systems should be
-0.111699	function or method should be
-0.111699	by a constant should be
-0.052268	memory. Big arrays should be
-0.052268	data. Multidimensional arrays should be
-0.052268	in multiple versions should be
-0.052268	all three versions should be
-0.111699	than 16 bytes should be
-0.111699	dialog boxes, etc. should be
-0.111699	uninstallation of programs should be
-0.111699	server. These problems should be
-0.111699	and the dispatching should be
-0.111699	The template parameter should be
-0.111699	data and resources should be
-0.001542	are used together should be
-0.111699	All intermediate results should be
-0.111699	block. Thread-local storage should be
-0.111699	a few lines should be
-0.111699	input and output should be
-0.153105	Objects inside containers should be
-0.052268	search for updates should be
-0.052268	downloaded program updates should be
-0.111699	state. This penalty should be
-0.111699	the clock counts should be
-0.153105	that software developers should be
-0.111699	etc. Accessibility guidelines should be
-0.052268	times. A queue should be
-0.052268	a FIFO queue should be
-0.111699	The following considerations should be
-0.111699	that are modified should be
-0.111699	features. User feedback should be
-0.111699	heavy mathematical calculations, should be
-0.111699	standardized file formats should be
-0.111699	resources and servers should be
-0.111699	64 bits wide, should be
-0.111699	outside any function) should be
-0.111699	planned solutions. Patches should be
-0.111699	seriously. User complaints should be
-0.111699	copy protection scheme should be
-0.111699	on which imprecisions should be
-0.111699	current .cpp file) should be
-0.111699	new/delete or malloc/free should be
-0.061504	but it can also be
-0.070853	Linux. It can also be
-0.070853	numbers. It can also be
-0.061504	induction variables can also be
-0.061504	A branch can also be
-0.061504	operating systems can also be
-0.061504	Container classes can also be
-0.061504	template parameter can also be
-0.061504	A union can also be
-0.061504	periodic pattern can also be
-0.061504	far (arrays can also be
-0.179035	Mbytes. There may also be
-0.179035	hash map may also be
-0.145496	a function should also be
-0.145496	or video should also be
-0.191586	a coprocessor might also be
-0.236822	requires that all software be
-0.140161	be inlined or cannot be
-0.220975	so that it cannot be
-0.158041	only if it cannot be
-0.208521	that the function cannot be
-0.140161	The intermediate code cannot be
-0.339048	handling then you cannot be
-0.140161	The MOVNTQ instruction cannot be
-0.140161	a function which cannot be
-0.140161	the final size cannot be
-0.140161	above. An object cannot be
-0.140161	below). A variable cannot be
-0.245349	a thread. You cannot be
-0.140161	particular memory address cannot be
-0.140161	Whole program optimization cannot be
-0.158041	avoided because they cannot be
-0.158041	cases where they cannot be
-0.140161	More complicated cases cannot be
-0.140161	that access times cannot be
-0.140161	of Intel CPUs cannot be
-0.140161	make it work cannot be
-0.140161	that the name cannot be
-0.140161	for network resources cannot be
-0.140161	132 Table lookup cannot be
-0.140161	where dynamic linking cannot be
-0.140161	floating point operands cannot be
-0.140161	library is loaded cannot be
-0.236740	It may neverthe- less be
-0.178849	class objects can often be
-0.178849	digital operation can often be
-0.178849	Database queries can often be
-0.296374	preferred language will often be
-0.230974	common denominator can even be
-0.306472	RAM memory may even be
-0.292423	The size should always be
-0.486793	can in most cases be
-0.358989	may in some cases be
-0.116909	means that a must be
-0.116909	runtime framework that must be
-0.211604	first. However, you must be
-0.116909	do manually. It must be
-0.116909	The MOVNTQ instruction must be
-0.116909	compilers cannot do must be
-0.116909	value of i must be
-0.116909	that an object must be
-0.116909	the carry bit must be
-0.116909	costly because they must be
-0.116909	between multiple threads must be
-0.116909	The inequality sign must be
-0.116909	user interface framework must be
-0.116909	128-bit XMM vectors must be
-0.116909	Any copy constructor must be
-0.116909	rounding 137 errors must be
-0.116909	i. This index must be
-0.116909	8; // SIZE must be
-0.116909	vectorize. The pragmas must be
-0.116909	destructor, if any, must be
-0.116909	module for correctness must be
-0.188481	time. It can therefore be
-0.188481	memory allocation can therefore be
-0.256496	above examples will therefore be
-0.315130	them. You should therefore be
-0.238239	Lazy binding should therefore be
-0.329020	container. Can the container be
-0.055401	even when it would be
-0.055401	again, but it would be
-0.161051	then the compiler would be
-0.224456	0 because this would be
-0.118917	value. The loop would be
-0.118917	5) {} which would be
-0.118917	no induction variable would be
-0.118917	the cache line would be
-0.118917	mode, the parameters would be
-0.118917	The ultimate solution would be
-0.118917	15.1b to metaprogramming would be
-0.161051	a particular reduction would be
-0.118917	if they otherwise would be
-0.118917	static, the logarithm would be
-0.118917	double, then sizeof(S1) would be
-0.235538	function will most likely be
-0.118134	first dimension may preferably be
-0.014014	a function should preferably be
-0.014014	innermost loop should preferably be
-0.014014	each object should preferably be
-0.004621	the objects should preferably be
-0.002305	and objects should preferably be
-0.014014	speed test should preferably be
-0.014014	a list should preferably be
-0.014014	loop counter should preferably be
-0.014014	loop count should preferably be
-0.014014	switch statements should preferably be
-0.014014	Loop unrolling should preferably be
-0.014014	other device should preferably be
-0.014014	an interrupt should preferably be
-0.080058	code should therefore preferably be
-0.080058	It should therefore preferably be
-0.092495	that i can never be
-0.092495	compiler. We can never be
-0.281831	that a will never be
-0.289751	instruction set may actually be
-0.336862	program may in fact be
-0.074484	of optimization can sometimes be
-0.074484	float conversions can sometimes be
-0.074484	explicitly. Divisions can sometimes be
-0.074484	c) 139 can sometimes be
-0.177616	The loop can still be
-0.177616	described above can still be
-0.189639	to 0x273F would still be
-0.239590	performance that can possibly be
-0.174098	the code can possibly be
-0.159495	line size may possibly be
-0.159495	following methods could possibly be
-0.242757	There should of course be
-0.242757	15.1c would of course be
-0.219680	parameters. Or it might be
-0.171781	elements then this might be
-0.171781	better solution. It might be
-0.165059	or the function could be
-0.165059	that the portability could be
-0.165059	is that r+i/2 could be
-0.227970	The code can now be
-0.228000	structure that can easily be
-0.071341	the problem cannot easily be
-0.071341	encryption algorithms, cannot easily be
-0.279286	my manual will soon be
-0.034928	complicated cases should definitely be
-0.034928	.NET framework should definitely be
-0.034928	These containers should definitely be
-0.218368	= c; } Can be
-0.212102	the code can probably be
-0.199836	higher address which can't be
-0.164988	solutions may some day be
-0.164988	{...} // Dispatcher. Will be
-0.448729	texts they point to are
-0.536471	on the stack and are
-0.236708	up cache space and are
-0.236708	time to evaluate and are
-0.470289	pieces of code that are
-0.160667	pieces of data that are
-0.160667	threads, while data that are
-0.474407	in the program that are
-0.219495	for the functions that are
-0.219495	extracts the functions that are
-0.219495	collect the functions that are
-0.141374	even of functions that are
-0.141374	conventions for functions that are
-0.141374	efficiently if functions that are
-0.141374	some other functions that are
-0.141374	If several functions that are
-0.141374	a few functions that are
-0.306303	choose the compilers that are
-0.270711	variables and objects that are
-0.178744	choose the variables that are
-0.178744	intended for variables that are
-0.178744	sure that variables that are
-0.072946	and function libraries that are
-0.072946	general function libraries that are
-0.217194	things with pointers that are
-0.313427	application- specific instructions that are
-0.270711	compatible with CPUs that are
-0.270711	apply to arrays that are
-0.217194	operators. Function parameters that are
-0.298200	high for programs that are
-0.300165	are making branches that are
-0.217194	two data members that are
-0.217194	one for constants that are
-0.217194	of other tasks that are
-0.029462	more efficient. Variables that are
-0.029462	static storage Variables that are
-0.029462	effects are: Variables that are
-0.029462	temporary storage. Variables that are
-0.014481	functions. 9.4 Variables that are
-0.014481	88 9.4 Variables that are
-0.044978	87 9.3 Functions that are
-0.044978	CPUs". 9.3 Functions that are
-0.217194	allocated dynamically. Arrays that are
-0.217194	it for lists that are
-0.217194	garbage collection. Objects that are
-0.217194	memory areas. Data that are
-0.217194	have variable lengths that are
-0.217194	of the considerations that are
-0.290124	user interface. Applications that are
-0.270711	and matrixes. Algorithms that are
-0.217194	by random events that are
-0.095159	simple expressions. Operations that are
-0.095159	and aliasing. Operations that are
-0.217194	cleaning up spaces that are
-0.217194	modify the ones that are
-0.217194	The so-called iterators that are
-0.104998	declared inside a function are
-0.291259	outside of any function are
-0.235309	of an overloaded function are
-0.343905	addresses in the code are
-0.426593	of the program code are
-0.659411	of the critical code are
-0.328949	slow instruction that you are
-0.217477	is that if you are
-0.217477	methods only if you are
-0.217477	the loop if you are
-0.217477	to unsigned if you are
-0.217477	this section if you are
-0.217477	is organized if you are
-0.217477	following explanation if you are
-0.217477	database anyway if you are
-0.217477	pointer aliasing" if you are
-0.255775	that the code you are
-0.664800	as long as you are
-0.336145	for the compiler you are
-0.204022	comes first when you are
-0.204022	the counters when you are
-0.204022	more readable when you are
-0.341522	CPU supports then you are
-0.225031	64-bit systems. If you are
-0.225031	instruction set. If you are
-0.225031	= u; If you are
-0.225031	many branches. If you are
-0.299409	reliable results. If you are
-0.225031	these methods. If you are
-0.225031	cryptography (www.intel.com). If you are
-0.274517	and make sure you are
-0.255775	regardless of whether you are
-0.042735	non-Intel CPUs unless you are
-0.204022	OS X, unless you are
-0.287539	you know what you are
-0.336145	long vector library, you are
-0.490185	cache and the data are
-0.140125	than if the data are
-0.140125	efficient if the data are
-0.140125	faster if the data are
-0.265836	efficiently when the data are
-0.265836	cases where the data are
-0.131899	when code and data are
-0.215761	applications require that data are
-0.269091	is poor if data are
-0.021793	less efficiently when data are
-0.215761	order in which data are
-0.288430	problems because static data are
-1.340900	part of the program are
-0.447193	and compile the program are
-0.307527	a well-structured C++ program are
-0.459177	a console mode program are
-0.121372	most of the functions are
-0.121372	few of the functions are
-0.250382	code memory. The functions are
-0.088267	and these two functions are
-0.088267	} These two functions are
-0.437301	systems. Virtual member functions are
-0.268887	lrint. Unfortunately, these functions are
-0.199173	Microsoft compiler. Some functions are
-0.199173	the most important functions are
-0.088267	so-called commpage. These functions are
-0.088267	with _mm. These functions are
-0.284281	called. If virtual functions are
-0.476699	root and mathematical functions are
-0.257740	of advanced mathematical functions are
-0.199173	fully optimized. Library functions are
-0.268887	member functions Virtual functions are
-0.184550	suboptimal code. Intrinsic functions are
-0.184550	machine instructions. Intrinsic functions are
-0.250382	CodeGear compiler). Fastcall functions are
-0.199173	the program. Small functions are
-0.199173	separate function. Sometimes, functions are
-0.199173	leaf function. Leaf functions are
-0.324641	used near each other are
-0.324641	called near each other are
-0.799417	calculations inside the loop are
-0.309402	are costly and which are
-0.224810	sure that functions which are
-0.224810	various profilers available which are
-0.224810	keywords and directives which are
-0.224810	has three conditions which are
-0.224810	older MMX registers, which are
-0.224810	floating point comparisons, which are
-0.236982	no specific order but are
-0.431410	cache and data cache are
-0.770021	in the level-2 cache are
-0.484276	in the level-1 cache are
-0.502966	the AVX-512 instruction set are
-0.406139	declared in a class are
-0.626592	declared inside a class are
-0.327837	parent and child class are
-0.354753	and a derived class are
-0.246554	class and derived class are
-0.321877	which reductions the compilers are
-0.213842	the compiler. The compilers are
-0.213842	do so. The compilers are
-0.208012	less. Fortunately, all compilers are
-0.208012	mode. Some 64-bit compilers are
-0.428071	optimizations. Most C++ compilers are
-0.260339	Intel and Gnu compilers are
-0.335568	compile time. Some compilers are
-0.208012	stupid. Some common compilers are
-0.208012	table. Unfortunately, few compilers are
-0.208012	Codeplay and Watcom compilers are
-0.208012	page 73). Current compilers are
-0.208012	is enabled. Few compilers are
-0.236769	Vectors of 256-bit size are
-0.597516	of a and b are
-0.722455	if a and b are
-0.531215	If a and b are
-0.419382	destructors of each object are
-0.140132	natural order and there are
-0.140132	is limited and there are
-0.140132	are dominating and there are
-0.141774	times and that there are
-0.298662	can assume that there are
-0.141774	1. Note that there are
-0.298662	be aware that there are
-0.141774	have discovered that there are
-0.141774	you discover that there are
-0.164508	be used if there are
-0.164508	code cache if there are
-0.035659	function pointers if there are
-0.164508	32-bit programs if there are
-0.074511	be safe if there are
-0.074511	exception safe if there are
-0.164508	object separately if there are
-0.164508	function calls, if there are
-0.180871	memory blocks than there are
-0.196589	behave differently because there are
-0.124217	32-bit mode. If there are
-0.057682	and again. If there are
-0.057682	back again. If there are
-0.161743	superfluous code, but there are
-0.161743	small devices, but there are
-0.180871	cases, however, where there are
-0.111869	member function. But there are
-0.111869	unsigned integers. But there are
-0.063613	as possible. However, there are
-0.063613	set. 120 However, there are
-0.063613	works automatically. However, there are
-0.063613	they are. However, there are
-0.582600	In some cases, there are
-0.136866	S1 ArrayOfStructures[100]; Here, there are
-0.136866	not all. Fortunately, there are
-0.180871	execution units. Typically, there are
-0.136866	cannot be avoided, there are
-0.049199	Choice of compiler There are
-0.049199	optimization by compiler There are
-0.104692	vectorizing mathematical code. There are
-0.049199	the same time. There are
-0.049199	no extra time. There are
-0.104692	the memcpy function. There are
-0.104692	cache size, etc. There are
-0.104692	caching less efficient. There are
-0.104692	as explained below. There are
-0.145403	on future processors. There are
-0.104692	functions for vectors There are
-0.104692	or other resources. There are
-0.104692	across function calls. There are
-0.104692	that support it. There are
-0.104692	kind of registers. There are
-0.104692	gain in performance. There are
-0.104692	Cache control instructions. There are
-0.104692	the same address. There are
-0.104692	speed or not. There are
-0.104692	set is enabled. There are
-0.104692	than Boolean expressions. There are
-0.104692	for unaligned arrays. There are
-0.104692	floating point vectors. There are
-0.104692	own CPU core. There are
-0.104692	into multiple threads. There are
-0.104692	the present manual. There are
-0.104692	divisible by 8. There are
-0.104692	objects (*.dll, *.so). There are
-0.104692	are not optimal. There are
-0.104692	Test and maintenance There are
-0.104692	the code explicitly. There are
-0.104692	GetProcessAffinityMask in Windows). There are
-0.104692	and AMD CodeAnalyst. There are
-0.104692	the following way: There are
-0.104692	relates to security. There are
-0.104692	is very limited. There are
-0.104692	set number 0x1C. There are
-0.104692	math. Memory copying. There are
-0.104692	pre-increment to post-increment. There are
-0.104692	an overflow check. There are
-0.104692	4: "Instruction tables". There are
-0.104692	to save power. There are
-0.104692	x = -abs(x);. There are
-0.104692	time it uses. There are
-0.104692	2A and 2B. There are
-0.104692	time than normally. There are
-0.104692	8 floating point). There are
-0.236765	knowing that the objects are
-0.313371	2 if the objects are
-0.433090	storage Variables and objects are
-0.244179	of time. The objects are
-0.026854	FIFO manner? If objects are
-0.026854	FILO manner? If objects are
-0.055478	numbered consecutively? If objects are
-0.247093	that all allocated objects are
-0.247093	different dynamically allocated objects are
-0.298995	even when shared objects are
-0.193656	destructors for local objects are
-0.193656	modular. The so-called objects are
-0.322999	bit Linux Shared objects are
-0.244179	simplest cases, composite objects are
-0.307341	Assume now that we are
-0.192222	at the time we are
-0.179499	anything here because we are
-0.179499	example 9.5 because we are
-0.192222	in cases where we are
-0.192222	In this example, we are
-0.192222	In these examples we are
-0.192222	assembly language". While we are
-0.242569	in F1? Then we are
-0.192222	In example 7.4 we are
-0.192222	a problem since we are
-0.192222	a PC. Similarly, we are
-0.192222	in example 14.7b, we are
-0.192222	than 200. Next, we are
-0.263235	most commonly used variables are
-0.313330	candidates for register variables are
-0.210578	to understand how variables are
-0.151389	for true. Boolean variables are
-0.151389	are overdetermined Boolean variables are
-0.151389	is invalid. Boolean variables are
-0.311389	9; } Induction variables are
-0.263235	any function. Global variables are
-0.349162	Numbers in the table are
-0.236100	for supporting multi-threaded software are
-0.177278	b, and the elements are
-0.177278	sure that the elements are
-0.243290	2 if the elements are
-0.177278	in which the elements are
-0.211507	applies only when elements are
-0.211507	or "how many elements are
-0.211507	for accessing container elements are
-0.324611	if the objects stored are
-0.558987	except the sign bit are
-0.440471	important obstacles to optimization are
-0.213500	The Gnu function libraries are
-0.213500	The best function libraries are
-0.093760	vectors. These function libraries are
-0.093760	Primitives". These function libraries are
-0.213500	Some common function libraries are
-0.213500	functions. Many function libraries are
-0.229464	work when Intel libraries are
-0.224078	libraries. The dynamic libraries are
-0.224078	or more dynamic libraries are
-0.229464	Unfortunately, the standard libraries are
-0.180533	less efficient. Dynamic libraries are
-0.180533	Several special purpose libraries are
-0.180533	SVML and LIBM libraries are
-0.585479	are stored in registers are
-0.265891	The XMM vector registers are
-0.377692	the floating point registers are
-0.005419	floating point stack registers are
-0.031392	when the XMM registers are
-0.051503	check if XMM registers are
-0.051503	costly if XMM registers are
-0.109947	The 128-bit XMM registers are
-0.243034	SSE). The YMM registers are
-0.220538	addresses, or if pointers are
-0.310413	the way member pointers are
-0.274496	implementations of smart pointers are
-0.304217	is deleted. Smart pointers are
-0.338957	platforms and operating systems are
-0.234192	the two operating systems are
-0.310303	and 64-bit operating systems are
-0.234192	and some operating systems are
-0.234192	Unfortunately, contemporary operating systems are
-0.285622	total size, because these are
-0.285622	fence instructions, but these are
-0.117451	same thing and they are
-0.117451	many programmers and they are
-0.254306	in so that they are
-0.091905	inside the function they are
-0.103600	overlap or if they are
-0.103600	arrays even if they are
-0.054767	other values if they are
-0.054767	stored together if they are
-0.054767	fatal errors if they are
-0.054767	but expensive if they are
-0.054767	relatively cheap if they are
-0.091905	unsigned integers - they are
-0.091905	evaluated every time they are
-0.021218	track of when they are
-0.021218	loaded only when they are
-0.021218	monitor counters when they are
-0.021218	is stronger when they are
-0.133079	equally efficient because they are
-0.133079	program performance because they are
-0.049758	function in which they are
-0.357538	order in which they are
-0.105963	thread in which they are
-0.043526	= a, but they are
-0.043526	64-bit integers, but they are
-0.131379	only situation where they are
-0.091905	several stages before they are
-0.091905	depending on how they are
-0.091905	in most cases they are
-0.131379	regardless of whether they are
-0.091905	than the programs they are
-0.091905	as pointers unless they are
-0.091905	point calculations whenever they are
-0.334131	access and memory access are
-0.875020	of object oriented programming are
-0.435506	32 and 64 bits are
-0.287954	processors (when vector operations are
-0.282316	purposes. Floating point operations are
-0.292388	advantage because integer operations are
-0.186139	vectors, and these operations are
-0.083173	Integer operators Integer operations are
-0.083173	specific size. Integer operations are
-0.053643	register variables. Vector operations are
-0.053643	instruction sets. Vector operations are
-0.053643	bits (ZMM). Vector operations are
-0.235744	address. Pointer arithmetic operations are
-0.235864	is best. These cases are
-0.200005	a pipeline where instructions are
-0.200005	used, though. Some instructions are
-0.150524	data cache. These instructions are
-0.150524	table lookup. These instructions are
-0.189818	the nontemporal write instructions are
-0.189818	The nontemporal write instructions are
-0.269864	spent on executing instructions are
-0.200271	processors and vector processors are
-0.331147	organization for different processors are
-0.251617	and earlier Intel processors are
-0.251617	4 and AMD processors are
-0.200271	versions. The x86 processors are
-0.610574	the standard PC processors are
-0.200271	last time. Newer processors are
-0.116644	critical resources. Modern CPUs are
-0.116644	in parallel. Modern CPUs are
-0.116644	and temp2. Modern CPUs are
-0.039977	compiler that the arrays are
-0.039977	require that the arrays are
-0.084011	disadvantage when the arrays are
-0.019529	make sure the arrays are
-0.039977	efficient whether the arrays are
-0.039977	sure whether the arrays are
-0.176682	more efficient when arrays are
-0.176682	vector. 6. If arrays are
-0.176682	very inefficient. Linear arrays are
-0.330032	Intel compilers for Windows are
-0.638739	branches and function calls are
-0.299317	inputs to the calculations are
-0.260545	but these address calculations are
-0.341892	that double precision calculations are
-0.208194	stack are: All calculations are
-0.208194	sure that certain calculations are
-0.045775	Intel CPUs. New versions are
-0.045775	AMD CPUs. New versions are
-0.221953	IDE. Free trial versions are
-0.434509	advantage if the threads are
-0.278130	means that different threads are
-0.397387	example, if multiple threads are
-0.304064	considerable. If two threads are
-0.207033	core and high-priority threads are
-0.505898	if b and c are
-0.227675	and system calls. These are
-0.227675	to improve efficiency. These are
-0.291138	each task or thread are
-0.518094	functions, trigonometric functions, etc. are
-0.227395	constants Sunday, Monday, etc. are
-0.291248	integer. If two integers are
-0.420827	of predefined vector classes are
-0.415905	examples of container classes are
-0.302485	Multiple threads? Container classes are
-0.193518	mode where the parameters are
-0.185643	efficient. Simple function parameters are
-0.059007	eight floating point parameters are
-0.059007	parameters. Floating point parameters are
-0.105831	first two integer parameters are
-0.105831	CodeGear compiler) integer parameters are
-0.071208	that the template parameters are
-0.071208	if the template parameters are
-0.071208	because the template parameters are
-0.170309	the first four parameters are
-0.099884	Function parameters Function parameters are
-0.022901	in memory. Function parameters are
-0.099884	using __fastcall. Function parameters are
-0.127311	beware that macro parameters are
-0.484932	solutions to this problem are
-0.235108	in an STL container are
-0.270029	adding vectors. The operators are
-0.362019	Nevertheless, the bitwise operators are
-0.394435	||). The bitwise operators are
-0.856449	a class or structure are
-0.159326	is called. The values are
-0.159326	an array. The values are
-0.214905	If the key values are
-0.243987	see that the addresses are
-0.243987	occurs because the addresses are
-0.204395	profiler itself. Function addresses are
-0.287888	smaller because relative addresses are
-0.204228	the necessary library files are
-0.204228	format. The intermediate files are
-0.274829	steps. All source files are
-0.256074	of the header files are
-0.290097	will fail if both are
-0.290164	prone. All these problems are
-0.222585	especially if the branches are
-0.222585	where the dispatch branches are
-0.729656	addition, subtraction and multiplication are
-0.298433	used if instruction sets are
-0.298433	only when instruction sets are
-0.234119	each of its members are
-0.240870	All of these methods are
-0.175199	Note that these methods are
-0.235957	of memory. These methods are
-0.186329	Since most development methods are
-0.186329	blocking and similar methods are
-0.233859	and ease of development are
-0.184553	to predict which resources are
-0.184553	are removed, all resources are
-0.233966	make sure allocated resources are
-0.184553	of the shared resources are
-0.331332	depend on network resources are
-0.233387	screen. However, such applications are
-0.293866	each version. The examples are
-0.274290	purposes. All these examples are
-0.234083	and other protection means are
-0.233300	operators && and || are
-0.233240	many cases. Integer expressions are
-0.240798	Many of these directives are
-0.190645	and Fortran. These directives are
-0.240798	at runtime. #define directives are
-0.240798	is compiled. #if directives are
-0.319392	in Microsoft's .NET framework are
-0.211616	competing brands of microprocessors are
-0.164552	versions of Intel microprocessors are
-0.164552	in the way microprocessors are
-0.257647	inefficient. The modern microprocessors are
-0.159172	branch prediction. Modern microprocessors are
-0.159172	prediction mechanisms. Modern microprocessors are
-0.286684	because all the numbers are
-0.308434	variables Floating point numbers are
-0.254583	family and model numbers are
-0.561531	that are used together are
-0.232969	The 256-bit YMM vectors are
-0.288778	parameters a and r are
-0.235814	loop if the results are
-0.142260	different compilers. The results are
-0.142260	of optimizations. The results are
-0.262921	precision, and intermediate results are
-0.528666	kinds of variable storage are
-0.321586	options. Many optimization options are
-0.216668	only if certain options are
-0.283667	provided that the operands are
-0.167323	than if the operands are
-0.167323	advantageous if the operands are
-0.563714	in which the modules are
-0.330451	compiler. Many algebraic reductions are
-0.449618	references Pointers and references are
-0.232481	A, B and C are
-0.318223	syntax checks. These conversions are
-0.342836	Several other programming languages are
-0.177145	size, while high-level languages are
-0.177145	development time. Interpreted languages are
-0.177145	at hand. Low-level languages are
-0.430933	containers in the STL are
-0.231754	= 128. These lines are
-0.286716	glitches in the output are
-0.287254	to another. These costs are
-0.195750	expressions. Whether the constants are
-0.032982	and floating point constants are
-0.032982	all floating point constants are
-0.195750	that the two constants are
-0.150289	for constants. Integer constants are
-0.306754	container classes. Text strings are
-0.031866	of the following conditions are
-0.031866	if the following conditions are
-0.144474	parallel if certain conditions are
-0.144474	and the caching conditions are
-0.144474	from www.agner.org/optimize. Copyright conditions are
-0.286370	for many standard tasks are
-0.612769	of parent and child are
-0.512638	The performance monitor counters are
-0.307488	but the function names are
-0.205431	library libircmt.lib. Function names are
-0.230542	one parameter. Further details are
-0.100231	see that the rows are
-0.022974	2 if the rows are
-0.230211	the arrays or structures are
-0.229632	remotely. If frequent updates are
-0.229872	Processors with multiple cores are
-0.230353	case if alternative implementations are
-0.033294	objects of different sizes are
-0.229872	FreeBSD and Open BSD are
-0.176708	C++ template metaprogramming, loops are
-0.018537	the processor. Nested loops are
-0.228806	two ways. Switch statements are
-0.284019	made container class templates are
-0.229466	are allocated in sequence are
-0.304050	afterwards. The clock counts are
-0.229202	list, set and map are
-0.227945	in software writing style are
-0.207320	sure that all destructors are
-0.127604	guarantee that all destructors are
-0.417521	return and parameter transfer are
-0.304706	matrix[c][r] above the diagonal are
-0.469115	matrix[r][c] below the diagonal are
-0.228238	libraries for special purposes are
-0.228238	in assembly language. Here are
-0.311907	used for branch prediction are
-0.282792	the function calling conventions are
-0.281674	run in the background are
-0.227035	of microprocessor. These algorithms are
-0.227364	if all the additions are
-0.227035	The only allowed inputs are
-0.227200	164 below. Those who are
-0.240445	if all the factors are
-0.190330	link libraries. These factors are
-0.190180	or Verilog. Common devices are
-0.190180	controlled. Small hand-held devices are
-0.227200	stored together Cache misses are
-0.016705	GOT and PLT tables are
-0.202172	table lookup. Lookup tables are
-0.225681	and IDE's for D are
-0.225681	counts that you measure are
-0.225681	in the above sections are
-0.366376	the rules of algebra are
-0.280112	be renewed. Context switches are
-0.280324	Some implementations of Java are
-0.225494	check that thrown exceptions are
-0.226055	the Java virtual machine are
-0.145689	that my optimization manuals are
-0.190645	versions of these manuals are
-0.145689	manual. The subsequent manuals are
-0.145689	a program. The profilers are
-0.145689	called CodeAnalyst. These profilers are
-0.145689	AMD CodeAnalyst. Unfortunately, profilers are
-0.438284	Microprocessors with out-of-order capabilities are
-0.225681	dynamically and that measurements are
-0.225494	largest vector. These units are
-0.225868	sqrt, pow and log are
-0.130853	that floating point comparisons are
-0.037724	cycles. Floating point comparisons are
-0.037724	organized. Floating point comparisons are
-0.175932	development process. These requirements are
-0.175932	but the alignment requirements are
-0.275152	C++, Pascal and Fortran are
-0.310201	routines and device drivers are
-0.164292	x = 10; Templates are
-0.164292	same template. 57 Templates are
-0.221117	stored. The storage principles are
-0.106378	system standards. Such schemes are
-0.007149	Some copy protection schemes are
-0.007149	Most copy protection schemes are
-0.007149	Many copy protection schemes are
-0.164292	pointers and references. Arrays are
-0.164292	and unexpected behaviors. Arrays are
-0.221895	and before any constructors are
-0.221117	program logic. Some guidelines are
-0.114049	why such runtime frameworks are
-0.114049	Several graphical interface frameworks are
-0.114049	are running. Such frameworks are
-0.221377	with low power consumption are
-0.017445	added? If search facilities are
-0.221117	when used as macros are
-0.217323	chip. Such hybrid solutions are
-0.217323	variables sum1 and sum2 are
-0.148218	in example 15.1b. Branches are
-0.148218	branch misprediction penalty. Branches are
-0.217323	heuristic guidelines. Most caches are
-0.147944	method. 7.29 Threads Threads are
-0.147944	possible in Linux). Threads are
-0.217323	testing Most performance tests are
-0.211495	startup code and main() are
-0.211073	up multiplications and divisions are
-0.211073	integer in disguise. Enums are
-0.211495	the constructor itself. Constructors are
-0.027438	in b[i] and c[i] are
-0.027438	if b[i] and c[i] are
-0.121995	allows parallel calculations. Examples are
-0.121995	as explained above. Examples are
-0.211073	(GOT). These table lookups are
-0.329208	Sum1, Sum2 and Sum3 are
-0.250001	conditions. All disturbing influences are
-0.198835	the advanced programming constructs are
-0.198835	reinstalled and user settings are
-0.198835	are optimized well, others are
-0.198835	use relocation. The DLLs are
-0.198835	for AVX. These suffixes are
-0.198835	with the same arguments are
-0.198835	resolution if time intervals are
-0.198835	that u.f and v.f are
-0.250001	properly. Many CPU dispatchers are
-0.074331	up one register. Registers are
-0.074331	pointer or reference. Registers are
-0.074331	when using references. References are
-0.074331	a wrong type. References are
-0.198835	sizes (char, short int) are
-0.198835	as most sorting algorithms, are
-0.198835	results of my experiment are
-0.198835	the post-increment operator i++ are
-0.198835	The most common time-consumers are
-0.198835	pointers, and far procedures are
-0.198835	&, |, ^, ~ are
-0.164065	as spell-checking and repagination are
-0.164065	increment. The three clauses are
-0.164065	Mac OS X (Darwin) are
-0.164065	vector operations (chapter 12) are
-0.164065	get any answer. Beginners are
-0.164065	details about name mangling are
-0.164065	restarted anyway. Software distributors are
-0.164065	microprocessors work. The recommendations are
-0.164065	Details about instruction latencies are
-0.164065	'?', '@' and '$' are
-0.164065	string searching and parsing are
-0.164065	than pointers to objects) are
-0.164065	class (also called properties) are
-0.164065	int ABC = 123; are
-0.164065	that begins with #) are
-0.164065	after each time slice are
-0.164065	nontemporal write instructions (MOVNT) are
-0.164065	relational operators (e.g. '>') are
-0.164065	statements within each clause are
-0.164065	monitoring options. CPU vendors are
-0.164065	5 and 9. Multiplications are
-0.352071	The object pointed to can
-0.495449	the variable pointed to can
-0.351814	to binary code and can
-0.236956	are accessed consecutively and can
-0.417061	choice for code that can
-0.321918	a loop-invariant code that can
-0.425193	using a compiler that can
-0.430117	a test program that can
-0.225837	also a cache that can
-0.225837	The highest performance that can
-0.333433	kind of branch that can
-0.740513	in a way that can
-0.324240	define application-specific instructions that can
-0.300365	a programming language that can
-0.225837	a parallel structure that can
-0.022590	maximum loop count that can
-0.310649	needed. Predictable branches that can
-0.098401	advantage in applications that can
-0.098401	advantageous for applications that can
-0.225837	often write expressions that can
-0.280500	are specific advantages that can
-0.165705	reveals three things that can
-0.165705	often reveal things that can
-0.225837	include a profiler that can
-0.280500	The third thing that can
-0.098401	there is something that can
-0.098401	is certainly something that can
-0.035882	are several factors that can
-0.225837	more efficient alternatives that can
-0.225837	another function F2 that can
-0.225837	is a chip that can
-0.225837	Quine–McCluskey or Espresso) that can
-0.244026	intrinsic functions and it can
-0.244026	sequential order and it can
-0.454223	function calls and it can
-0.244026	more cores, and it can
-0.483710	data is that it can
-0.261985	it so that it can
-0.261985	source so that it can
-0.261985	statement so that it can
-0.261985	100 so that it can
-0.261985	above, so that it can
-0.272638	loop-invariant expression that it can
-0.461844	10 means that it can
-0.272638	compiler knows that it can
-0.347991	large arrays if it can
-0.316246	more data than it can
-0.337032	the program then it can
-0.337032	to T+5, then it can
-0.311429	This is because it can
-0.311429	other constants because it can
-0.311429	plus one, because it can
-0.296767	clock cycles, but it can
-0.296767	multiple threads, but it can
-0.296767	the hint, but it can
-0.216439	class C1, so it can
-0.328406	or compilation before it can
-0.339183	In some cases it can
-0.315817	same resources. But it can
-0.333814	than another. Therefore, it can
-0.269858	do and what it can
-0.578775	function is called, it can
-0.216439	and even worse, it can
-0.216439	a = (b*c)/d, it can
-0.216439	dominating. At least, it can
-0.457794	optimization unless the function can
-0.354111	of exceptions a function can
-0.326056	0 // this function can
-0.309138	sure that one function can
-0.288877	which the calling function can
-0.422219	exception. A frame function can
-0.233214	series. The exponential function can
-0.366797	part of the code can
-0.841178	parts of the code can
-0.512030	double if the code can
-0.289032	functions then the code can
-0.289032	parameters then the code can
-0.445090	Critical pieces of code can
-0.323589	identical branches The code can
-0.323589	you know). The code can
-0.311185	cause overflow, this code can
-0.458647	because the same code can
-0.451912	sources. The above code can
-0.105256	/ 3; } This can
-0.105256	% 3; } This can
-0.244473	printf("Delta"); break; } This can
-0.372196	of the code. This can
-0.191669	to non-AVX code. This can
-0.296289	in program memory. This can
-0.261477	factorial *= x; This can
-0.261477	the member pointer. This can
-0.209020	of integer operations. This can
-0.261477	on the variable. This can
-0.405410	on the stack. This can
-0.191669	each their stack. This can
-0.209020	are very fast. This can
-0.343015	speed is important. This can
-0.261477	at inconvenient times. This can
-0.209020	function dispatch process. This can
-0.209020	of data files. This can
-0.209020	not be cached. This can
-0.209020	a2 / b2; This can
-0.209020	the code only. This can
-0.482903	and VIA CPUs"). This can
-0.209020	malloc and free. This can
-0.021250	created or modified. This can
-0.209020	array is defined. This can
-0.209020	bus is saturated. This can
-0.209020	calculation of (a+b). This can
-0.209020	part of it). This can
-0.209020	system thread scheduler. This can
-0.209020	comparing bits 32-62. This can
-0.209020	other access patterns. This can
-0.209020	at this place. This can
-0.435876	index then the compiler can
-0.454795	103), but the compiler can
-0.308439	2, so the compiler can
-0.400344	For example, the compiler can
-0.308439	some cases the compiler can
-0.457762	know what the compiler can
-0.400344	List[i]++; Here, the compiler can
-0.308439	sets. Likewise, the compiler can
-0.308439	example 12.1a, the compiler can
-0.316817	output of a compiler can
-0.282773	Function inlining The compiler can
-0.282773	anonymous object. The compiler can
-0.282773	induction variable. The compiler can
-0.282773	is executed. The compiler can
-0.282773	page 84). The compiler can
-0.282773	return a+1;. The compiler can
-0.297690	same result. A compiler can
-0.523837	122. The Intel compiler can
-0.338729	CPUs. The Gnu compiler can
-0.338729	vectorization. The Gnu compiler can
-0.199009	because a good compiler can
-0.354153	index. A good compiler can
-0.251074	because an optimizing compiler can
-0.251074	cases, an optimizing compiler can
-0.194710	used. An optimizing compiler can
-0.194710	Devirtualization An optimizing compiler can
-0.210166	hand, a just-in-time compiler can
-0.237544	the former case x can
-0.251269	points to and you can
-0.251269	point code and you can
-0.251269	& operator; and you can
-0.307340	method requires that you can
-0.231704	optimization options that you can
-0.231704	Another thing that you can
-0.231704	then think that you can
-0.231704	is unrealistic that you can
-0.291877	not necessary if you can
-0.291877	is good if you can
-0.291877	variables global if you can
-0.277255	legal issue, as you can
-0.330762	of code then you can
-0.251297	virtual functions then you can
-0.251297	the cache then you can
-0.251297	are independent then you can
-0.251297	an integer, then you can
-0.251297	particular meaning, then you can
-0.251297	If so, then you can
-0.258397	is fastest because you can
-0.316476	oriented programs. If you can
-0.267723	operating system, but you can
-0.190223	On most compilers you can
-0.276402	the Internet where you can
-0.258397	32 bits, so you can
-0.389528	12.3a, for example, you can
-0.084779	like and how you can
-0.084779	39 shows how you can
-0.240324	in most cases you can
-0.190223	one vector, while you can
-0.190223	input. (In Windows you can
-0.084779	In most cases, you can
-0.084779	In such cases, you can
-0.317620	of which optimizations you can
-0.258397	my blog. Here, you can
-0.084779	lots of things you can
-0.084779	are various things you can
-0.240324	short. In Windows, you can
-0.307590	intrinsic functions. Alternatively, you can
-0.125741	fast. In general, you can
-0.125741	counterparts. In general, you can
-0.190223	for NOT. Instead, you can
-0.190223	the | operator; you can
-0.190223	For this reason, you can
-0.328461	a loop if this can
-0.310613	critical stride then this can
-0.310613	12.4b shows how this can
-0.237195	it. The load time can
-0.293336	and data cache use can
-0.669470	take longer time. It can
-0.223558	the compiler does It can
-0.223558	3.8 System database It can
-0.223558	file input/output operations. It can
-0.223558	in the CPU. It can
-0.303305	vector containing integers. It can
-0.223558	a template parameter. It can
-0.223558	compiling for Linux. It can
-0.223558	floating point numbers. It can
-0.223558	list[i].a and list[i].b. It can
-0.487605	something in static memory can
-0.417482	data from RAM memory can
-0.974699	of code and data can
-0.291249	functions and public data can
-0.345679	the calculations. The program can
-0.235220	if the 7 program can
-0.400938	size of each vector can
-0.210420	available then each vector can
-0.296850	data cache. The same can
-0.296850	128-bit reads. The same can
-0.309173	x, while other functions can
-0.482145	lookup Using intrinsic functions can
-0.233243	truncation. The missing functions can
-0.523876	calls because the CPU can
-0.345318	register renaming. The CPU can
-0.237235	data The prefetch instruction can
-0.532395	branch inside the loop can
-0.378306	nothing inside the loop can
-0.343022	or not. The loop can
-0.220848	inside the CPU which can
-0.359199	function of i which can
-0.220848	a vector register which can
-0.220848	precision conversion instructions which can
-0.220848	pointers and references, which can
-0.220848	bitwise OR operator, which can
-0.220848	have an attribute which can
-0.220848	requires n-1 multiplications, which can
-0.220848	(XMM or YMM) which can
-0.599235	or double to integer can
-0.349571	integer, or an integer can
-0.232012	lines in the set can
-1.105514	the AVX instruction set can
-0.347422	a low instruction set can
-0.324217	polymorphism A template class can
-0.347199	data in this example can
-0.347199	float in this example can
-0.304862	expressions and other compilers can
-0.459188	Clang and Intel compilers can
-0.265615	All of these compilers can
-0.437065	a compiler. Some compilers can
-0.284795	variable because optimizing compilers can
-0.245410	Thread-local storage Most compilers can
-0.245410	Algebraic reductions Most compilers can
-0.245410	Algebraic reduction Most compilers can
-0.021546	Intel and PathScale compilers can
-0.212685	compilers optimize Modern compilers can
-0.234152	objects of variable size can
-0.234152	and i >= size can
-0.352589	type. Likewise, a pointer can
-0.307011	type conversion A pointer can
-0.286846	array. No link pointer can
-0.236824	extra check on b can
-0.862534	Agner's vector class library can
-0.049504	it. A dynamic library can
-0.049504	process. A dynamic library can
-0.049504	linking. A dynamic library can
-0.169533	the same dynamic library can
-0.419114	The user interface library can
-0.236884	not noticed that i can
-0.699471	needed if the object can
-0.515455	of a shared object can
-0.285799	one. The existing object can
-0.501383	end of an array can
-0.516223	then a simple array can
-0.226436	here: A large array can
-0.317678	page 27. An array can
-0.760418	variable number of objects can
-0.302788	details on when objects can
-0.307096	Structure and class objects can
-0.277185	block containing many objects can
-0.222912	needed, and new objects can
-0.490647	specifies that a variable can
-0.325335	very expensive. A variable can
-0.470555	of the induction variable can
-0.205787	conclusion is that we can
-0.238878	vector so that we can
-0.238878	12.4a so that we can
-0.205787	CPUID information that we can
-0.233707	clock cycles, then we can
-0.233707	or C2, then we can
-0.283292	run faster because we can
-0.190167	on x so we can
-0.190167	1024 bytes, so we can
-0.192211	In 64-bit systems we can
-0.192211	number of constants we can
-0.242556	+ a2/b2; Here we can
-0.192211	= pow(x,n) As we can
-0.192211	fast. The lesson we can
-0.228869	limited number of variables can
-0.228869	integer. 158 Integer variables can
-0.334742	method of induction variables can
-1.710242	a power of 2 can
-0.161777	// Read time You can
-0.161777	faster if unsigned You can
-0.161777	the instruction code. You can
-0.208525	lot of time. You can
-0.161777	for member functions. You can
-0.208525	and more efficient. You can
-0.161777	in the compiler. You can
-0.279888	500 clock cycles. You can
-0.161777	impossible with references. You can
-0.161777	divisible by 16. You can
-0.161777	of the result. You can
-0.161777	the preceding one. You can
-0.208525	do not overlap. You can
-0.161777	for positive n. You can
-0.161777	4) + a. You can
-0.161777	to 155 test. You can
-0.161777	referencing it twice. You can
-0.161777	// Print heading You can
-0.161777	the variable __intel_cpu_feature_indicator_x. You can
-0.161777	effects into account. You can
-0.161777	a GOT entry. You can
-0.161777	window or makefile. You can
-0.326365	test examples. The table can
-0.399924	data. A hash table can
-0.422813	This gain in performance can
-0.325185	less efficient. The performance can
-0.339504	Automatic updating of software can
-0.274190	} } A branch can
-0.274190	= b; A branch can
-0.228344	cases, the optimal branch can
-0.497317	which data are stored can
-0.352479	element if the address can
-0.402342	if the target address can
-0.312620	next. The carry bit can
-0.327048	storage. The same register can
-0.231393	a 128-bit XMM register can
-0.419122	higher level of optimization can
-0.100345	position-independent code Function libraries can
-0.100345	dynamic libraries Function libraries can
-0.468758	If the vector registers can
-0.340625	underflow in XMM registers can
-0.230793	Problems with invalid pointers can
-0.316682	for auto_ptr. Smart pointers can
-0.327013	6 The 64-bit systems can
-0.345599	though these operating systems can
-0.350836	no way the user can
-0.324934	different sizes, and they can
-0.331866	particularly critical because they can
-0.311787	at all. This method can
-0.311787	different executables. This method can
-0.219422	closed. The same method can
-0.219422	138 A similar method can
-0.235831	explicitly if data access can
-0.827264	of the operating system can
-0.473214	to the operating system can
-0.333368	or writes a file can
-0.337211	88 Object oriented programming can
-0.554053	then the critical part can
-0.285142	long sequence of operations can
-0.229927	43). The Boolean operations can
-0.648027	of a composite type can
-0.291553	set. These new instructions can
-0.278042	CPU. These virtual processors can
-0.223668	newer processors. Many processors can
-0.363082	performance on non-Intel processors can
-0.379963	of logical processors available can
-0.556159	counter by a constant can
-0.229289	constant subexpression. A constant can
-0.347126	that detects an error can
-0.348719	large for the stack can
-0.298635	on current Intel CPUs can
-0.215930	All modern x86 CPUs can
-0.269282	is because modern CPUs can
-0.311838	by CPU Modern CPUs can
-0.291500	allocation Objects and arrays can
-0.235399	about how caches work can
-0.348222	well. Even function calls can
-0.235308	consider if intermediate calculations can
-0.351493	so that the result can
-0.337663	work that the processor can
-0.435586	number of unused bytes can
-0.228110	faster and that threads can
-0.430510	safe if multiple threads can
-0.505887	a, b and c can
-0.277130	so that one thread can
-0.286872	user interface, another thread can
-0.286872	multiple threads. Each thread can
-0.206184	and a third thread can
-0.206184	in a high-priority thread can
-0.273676	so big that overflow can
-0.219814	sure that no overflow can
-0.219814	arrays. An array overflow can
-0.321733	memory allocation. Container classes can
-0.448788	64. Each cache line can
-0.235538	and no branches inside can
-0.280816	code caching. This problem can
-0.226115	safe. This safety problem can
-0.280990	DEC, JNZ). This solution can
-0.305475	time. But this solution can
-0.439494	But a sorted list can
-0.332974	microprocessor because the hardware can
-0.235033	the stack unwinding information can
-0.335280	<= max) { ... can
-0.404215	of the loop counter can
-0.164746	of a loop counter can
-0.708002	the time stamp counter can
-0.981720	of dynamic memory allocation can
-0.504645	limited. Dynamic memory allocation can
-0.234547	The method described above can
-0.234265	are swapped then both can
-0.234084	why object oriented programs can
-0.340063	amount of memory space can
-0.234279	function. The automatic dispatching can
-0.605105	is that the microprocessor can
-0.268139	slow, then the microprocessor can
-0.268139	how well the microprocessor can
-0.234042	contentions is that branches can
-0.613558	2 then the multiplication can
-0.443872	integers if the application can
-0.334571	of various instruction sets can
-0.276361	of the 32 sets can
-0.234206	optimization. A mixed implementation can
-0.346273	reason why exception handling can
-0.344185	but its data members can
-0.320532	so). A template parameter can
-0.566364	A pointer or reference can
-0.439442	things that the programmer can
-0.293846	usability reasons. The programmer can
-0.376830	compilers The register keyword can
-0.341336	principle of table lookup can
-0.233361	time for WTL applications can
-0.075647	is zero } We can
-0.075647	can be used. We can
-0.075647	the level-1 cache. We can
-0.075647	bit to zero We can
-0.075647	by the compiler. We can
-0.075647	floating point number. We can
-0.075647	of identifier names. We can
-0.075647	bit of u.f We can
-0.075647	15.1b to 15.1c. We can
-0.075647	the sign bit. We can
-0.075647	for some caveats. We can
-0.075647	32, 64, ...). We can
-0.075647	storage (e.g. PowerPC). We can
-0.075647	sign bit set). We can
-0.075647	C++ as 'this'. We can
-0.479375	the out-of-order execution mechanism can
-0.207139	44. The dispatching mechanism can
-0.665457	The CPU dispatch mechanism can
-0.232988	Such an extra framework can
-0.321292	Dependency chains Modern microprocessors can
-0.533422	nonzero floating point numbers can
-0.486455	A graphical user interface can
-0.270092	how the development process can
-0.423933	of the installation process can
-0.201582	class, structure or union can
-0.151462	memory space. A union can
-0.151462	an example. A union can
-0.333719	value. The copy constructor can
-0.504595	Therefore, the code section can
-0.886506	compilers I have tested can
-0.330677	If the cache contentions can
-0.212923	Integer to float conversions can
-0.295000	extra time. These conversions can
-0.212842	an empty throw() statement can
-0.212842	53). No general statement can
-0.213167	But the same errors can
-0.213167	unsafe because serious errors can
-0.425923	over other programming languages can
-0.318195	inlined function. Function inlining can
-0.231551	A complex digital operation can
-0.209551	have Booleans as output can
-0.209551	at the compiler output can
-0.287240	of CPUs. These costs can
-0.231095	user data. A database can
-0.286799	chooses between two constants can
-0.231192	if a simple algorithm can
-0.230778	data members. This alignment can
-0.333210	128 because the offset can
-0.231404	undesired effects. This effect can
-0.230297	mispredictions, etc. These counters can
-0.326481	dynamic allocation. The heap can
-0.229851	etc. But program loading can
-0.501829	eliminated if the condition can
-0.202456	that the if condition can
-0.229851	processor with four cores can
-0.230465	in the same generation can
-0.608057	the branch target buffer can
-0.473437	lines. The critical stride can
-0.228776	examples explain how metaprogramming can
-0.395283	interval. A hash map can
-0.229467	if such dependency chains can
-0.229467	chain. Such dependency chains can
-0.211764	Visual Studio. This tool can
-0.192536	counter. The test tool can
-0.261104	identified. My test tool can
-0.281830	is called. Lazy binding can
-0.603917	in the x86 family can
-0.391857	variable in a DLL can
-0.282021	table lookup Lookup tables can
-0.225656	and read-only data sections can
-0.310197	program. Frequent context switches can
-0.320720	for free. Visual Studio can
-0.145831	called global variables. They can
-0.145831	as three branches. They can
-0.145831	are very smart. They can
-0.225465	left out if exceptions can
-0.402805	switches and garbage collection can
-0.280512	examples in these manuals can
-0.620592	microprocessor with out-of-order capabilities can
-0.225656	to execute then measurements can
-0.225465	CPU chip. Such units can
-0.223636	calculation of this polynomial can
-0.421956	CriticalFunction in example 13.1 can
-0.224080	a valid address. Pointers can
-0.223636	with debugging. A debugger can
-0.224302	version). This wasteful behavior can
-0.278257	eax, ecx and edx can
-0.223636	that the background job can
-0.816679	the size of abc can
-0.187641	a reasonable upper limit can
-0.129057	a not-too-big upper limit can
-0.221088	unsigned. The following guidelines can
-0.221618	I have ever seen can
-0.217294	if a reasonable estimate can
-0.217294	string as code. Metaprogramming can
-0.424978	order. The heap manager can
-0.185117	a simple periodic pattern can
-0.109046	A simple periodic pattern can
-0.270825	construction in example 12.4b can
-0.217623	operators Integer sizes Integers can
-0.217623	Multiple applications running simultaneously can
-0.217294	runtime). The following techniques can
-0.217294	| operator which otherwise can
-0.217294	A much higher resolution can
-0.211476	good investment. A redesign can
-0.211044	maintainability of C++ projects can
-0.345773	method in example 14.28 can
-0.211044	is exact. Multiple divisions can
-0.211044	s1, s2 and s3 can
-0.211044	including user interface etc., can
-0.211044	with character arrays. Strings can
-0.211044	Data that are read-only can
-0.198807	line size (typically 64) can
-0.198807	if the subexpression c+b can
-0.198807	in the same chip can
-0.198807	double format. The formats can
-0.249970	a level-2 cache miss can
-0.198807	1.; Eliminate jumps Jumps can
-0.198807	Now the two parentheses can
-0.198807	same instruction set. Neither can
-0.198807	this optimization explicitly. Divisions can
-0.198807	b / c) 139 can
-0.198807	__attribute__((fastcall)). The fastcall modifier can
-0.198807	problem. This new insight can
-0.198807	simple cases. Database queries can
-0.198807	and studying the bottlenecks can
-0.198807	^, ~, <<, >> can
-0.164039	For example, one tread can
-0.164039	the intermediate result (b+c) can
-0.164039	brands of CPUs unequally can
-0.164039	is lost. This dilemma can
-0.164039	to what the preprocessor can
-0.164039	Instead, the following work-around can
-0.164039	loop in example 8.24 can
-0.164039	Contentions in the BTB can
-0.164039	using a common denominator can
-0.164039	pointer aliasing" (if valid) can
-0.164039	at compile time. (Examples can
-0.164039	many such programs installed can
-0.164039	b; b = !a; can
-0.164039	data conversion and shuffling can
-0.164039	bit are zero. Zero can
-0.164039	the keyword far (arrays can
-0.358251	function goes in the //
-0.237443	loop if powN is //
-0.313849	x^10 // loop for //
-0.236565	here // Virtual function //
-0.236565	"instrset_detect.cpp" // instrset_detect function //
-0.007871	may replace this by //
-0.032378	will replace this by //
-0.545504	sometimes be replaced by //
-0.814250	dispatching in Gnu compiler //
-0.023326	exp(x) for small x //
-0.235336	multiply // square x //
-0.226504	float coef[16] = { //
-0.177888	7.41a class vector { //
-0.728754	< size; i++) { //
-0.464714	< NumberOfTests; i++) { //
-0.250462	< arraysize; i++) { //
-0.250462	< list.Size(); i++) { //
-0.570696	} } else { //
-0.404810	nonzero } else { //
-0.404810	&CriticalFunction_SSE2; } else { //
-0.599111	(double const x) { //
-0.977563	const & x) { //
-0.958553	double p(double x) { //
-0.272617	IntegerPower (double x) { //
-0.203574	(u.i[1] < 0) { //
-0.203574	(n > 0) { //
-0.314114	128 == 0) { //
-0.047084	short int cc[]) { //
-0.253019	v.i * 2) { //
-0.177888	parm1, int parm2) { //
-0.098650	int)n < 4) { //
-0.022642	(level >= 4) { //
-0.258522	void CriticalInnerFunction () { //
-0.002287	i += 8) { //
-0.976952	< SIZE; r++) { //
-0.036493	< r; c++) { //
-0.064850	factorial (int n) { //
-0.177888	(seconds < 5) { //
-0.018639	(level >= 11) { //
-0.177888	typeof(CriticalFunction) * CriticalFunctionDispatch(void) { //
-0.018639	(u.i & 0x7FFFFFFF) { //
-0.301158	c1 += TILESIZE) { //
-0.018639	void transpose(double a[SIZE][SIZE]) { //
-0.177888	the array i) { //
-0.177888	block: 62 __try { //
-0.177888	EXCEPTION_EXECUTE_HANDLER : EXCEPTION_CONTINUE_SEARCH) { //
-0.177888	(i >= N) { //
-0.177888	(i < arraysize) { //
-0.177888	(u.i > v.i) { //
-0.177888	int)n < 13) { //
-0.781782	swapd(a[r2][c2],a[c2][r2]); } } } //
-0.328414	* c[i]); } } //
-0.213296	used for multiplication } //
-0.571569	_mm_storeu_si128((__m128i *)d, x); } //
-0.014277	return _mm_loadu_si128((__m128i const*)p); } //
-0.060159	return _mm_load_si128((__m128i const*)p); } //
-0.213296	= &Object2; p->Hello(); } //
-0.213296	__rdtsc(); return clock; } //
-0.266305	CriticalFunction = &CriticalFunction_386; } //
-0.266305	supported return &CriticalFunction_SSE2; } //
-0.490972	(*SelectAddMul_pointer)(aa, bb, cc); } //
-0.213296	x8*x2; return x10; } //
-0.213296	not supported"); return; } //
-0.213296	s); return _mm_cvtss_f32(s); } //
-0.213296	powN<(N & N-1)==0,N>::p(x); } //
-0.213296	here: return *(T*)0; } //
-0.292935	}; // constant data //
-0.426588	header for intrinsic functions //
-0.301495	Define SSE2 intrinsic functions //
-0.509770	by cache line size //
-0.535683	functions that have multiple //
-0.292622	the desired function version //
-0.312949	biggest possible vector objects //
-1.010744	a power of 2 //
-0.606193	#if INSTRSET == 2 //
-0.058509	(May use a table //
-0.380417	12.4a. Loop with branch //
-0.380452	diagonal // swap elements //
-0.533976	a constant is faster //
-0.427169	dispatching on first call //
-0.578397	int i = 0; //
-0.309950	else { return 0; //
-0.349626	// test sign bit //
-0.228616	Still faster if unsigned //
-0.419345	// 2 bytes. first //
-0.705521	// 4 bytes. first //
-0.419345	// 8 bytes. first //
-0.347192	for improving the code. //
-1.247441	known at compile time. //
-0.297727	Critical function to test //
-0.297727	of times to test //
-0.220095	difference for each test //
-0.220095	// Time before test //
-0.330377	SelectAddMul_AVX2 #endif // SSE2 //
-0.308168	initializes x to 0 //
-0.344526	for N = 0 //
-0.235257	reference to provoke error //
-0.235442	// Repeat NumberOfTests times //
-0.000178	in the code. Example: //
-0.000711	pieces of code. Example: //
-0.000711	any extra code. Example: //
-0.002850	the same time. Example: //
-0.001423	the called function. Example: //
-0.001423	a pure function. Example: //
-0.001423	stored in memory. Example: //
-0.001423	in static memory. Example: //
-0.002850	they are used. Example: //
-0.002850	function is called. Example: //
-0.002850	unroll a loop. Example: //
-0.000355	power of 2. Example: //
-0.002850	of consecutive variables. Example: //
-0.002850	across function calls. Example: //
-0.002850	versus XMM registers. Example: //
-0.002850	a float variable. Example: //
-0.002850	function is needed. Example: //
-0.002850	for detailed instructions. Example: //
-0.002850	a non-sequential order. Example: //
-0.002850	it jumps to. Example: //
-0.001423	check for overflow. Example: //
-0.001423	doesn't cause overflow. Example: //
-0.002850	the previous value. Example: //
-0.002850	a previous branch. Example: //
-0.002850	the same constant. Example: //
-0.002850	poor branch prediction. Example: //
-0.002850	the calculated result. Example: //
-0.001423	the loop counter. Example: //
-0.001423	additional integer counter. Example: //
-0.002850	a single operation. Example: //
-0.002850	iteration is finished. Example: //
-0.002850	in different ways. Example: //
-0.002850	of parallel execution. Example: //
-0.002850	of array elements. Example: //
-0.002850	it only once. Example: //
-0.002850	registers is limited. Example: //
-0.002850	a lookup-table static. Example: //
-0.002850	to is known. Example: //
-0.002850	the same thing. Example: //
-0.002850	completely independent divisions. Example: //
-0.002850	SSE2 or later. Example: //
-0.002850	to all zeroes. Example: //
-0.002850	may be undesired. Example: //
-0.002850	32 bit offsets). Example: //
-0.002850	the loop overhead. Example: //
-0.002850	the members individually. Example: //
-0.235297	Example 12.5. Aligned arrays //
-0.328438	if above doesn't work //
-0.235498	//=DeltaY // Store result //
-0.435249	// 6 unused bytes //
-0.234640	Or #include <ia32intrin.h> etc. //
-0.034292	and columns in matrix //
-0.211287	size // define matrix //
-0.211287	function to transpose matrix //
-0.794847	// Define vector classes //
-0.104058	virtual function } }; //
-0.104058	index operator } }; //
-0.104058	return x; } }; //
-0.104058	return 1.0; } }; //
-0.104058	#undef N1 } }; //
-0.104058	* powN<true,N/2>::p(x); } }; //
-0.104058	function: (static_cast<MyChild*>(this))->Disp(); } }; //
-0.162708	decoding and perhaps }; //
-0.162708	public: void NotPolymorphic(); }; //
-0.162708	y.d + 4.; }; //
-0.312431	at 399 int b; //
-0.281053	S1 { double b; //
-0.477949	14.18c double a, b; //
-0.293156	int i, a, b; //
-0.234440	Volatile to prevent optimizing //
-0.021645	a, b, c; ... //
-0.213919	<asmlib.h> void CriticalFunction(); ... //
-0.511092	Table // Loop counter //
-0.475194	Returns time stamp counter //
-0.233785	constructor // sum operator //
-0.097279	int one : 1; //
-0.097279	int sign : 1; //
-0.261828	performance. Integer size conversion //
-0.209331	Signed / unsigned conversion //
-0.290665	// Implicit type conversion //
-0.354412	Vec16s a, b, c; //
-0.354412	Vec8s a, b, c; //
-0.340470	// Initialize to zero //
-0.226840	Polynomial coefficients // Table //
-0.226840	+= A2; // Table //
-0.356536	Intel and Gnu compilers. //
-0.218910	specific to Microsoft compilers. //
-0.233365	instances of S1 aligned //
-0.232766	= x * x; //
-1.021841	int size = 100; //
-0.339514	version 2.20 or later //
-0.232327	// SSE4.1 // AVX2 //
-0.268527	default constructor // constructor //
-0.287840	coordinates // default constructor //
-0.328175	= a[i].u[1] * 2; //
-0.232601	Non-polymorphic functions go here //
-0.422666	11 short int a; //
-0.030499	necessary. Take the example: //
-0.002461	compile time. For example: //
-0.002461	comparisons, etc. For example: //
-0.002461	many cases. For example: //
-0.001229	pointed to. For example: //
-0.001229	refers to. For example: //
-0.002461	a structure. For example: //
-0.002461	mixed sizes. For example: //
-0.002461	table lookup. For example: //
-0.002461	is valid. For example: //
-0.002461	poorly predictable. For example: //
-0.002461	be combined. For example: //
-0.002461	eliminated completely. For example: //
-0.030499	by the following example: //
-0.030499	than ARRAYSIZE. Another example: //
-0.034282	// This is slow //
-0.084209	double b; int d; //
-0.084209	at 7 int d; //
-0.445609	+ c + d; //
-0.524240	// loop through rows //
-0.229699	}; // Called directly //
-0.308347	c1, c2; double temp; //
-0.229537	counter outside both loops //
-0.198903	// SSE2 // SSE4.1 //
-0.198903	example, vectorized with SSE4.1 //
-0.283549	Compile-time polymorphism with templates //
-0.013855	reduces the code to: //
-0.000682	may reduce this to: //
-0.002736	may change this to: //
-0.002736	by changing this to: //
-0.002736	1.2345; Change this to: //
-0.013855	can be optimized to: //
-0.013855	Can be reduced to: //
-0.001708	can be changed to: //
-0.313702	power using template metaprogramming //
-0.227880	loop for // multiply //
-0.518656	loop columns below diagonal //
-0.303027	prevent optimizing // Time //
-0.313375	public: float x, y; //
-0.226413	how to align arrays. //
-0.227077	<pmmintrin.h> // SSE3 required //
-0.226855	in a different array. //
-0.309984	we want to measure //
-0.001266	may look like this: //
-0.002535	typically look like this: //
-0.002535	function looks like this: //
-0.002535	code looks like this: //
-0.002535	classes looks like this: //
-0.305096	(unsigned int)a / 10; //
-0.325800	if organized as follows: //
-0.166684	2, b * c); //
-0.093523	two, b * c); //
-0.012388	= _mm_mullo_epi16 (b, c); //
-0.280452	{ public: int a[100]; //
-0.226046	int size = 256; //
-0.279594	elements of type T //
-0.280166	as powers of 2: //
-0.225289	convoluted template metaprogramming is. //
-0.415298	vectorclass manual for details. //
-0.225542	xxn(x4, x2*x, x2, x); //
-0.223210	a positive integer constant. //
-0.223210	+ C; } polynomial //
-0.167423	// C-style type casting //
-0.167423	// Constructor-style type casting //
-0.223796	(unsigned int)b / 16; //
-0.223796	(unsigned int)b % 16; //
-0.000105	parm1, int parm2) {...} //
-0.223503	InstructionSet(): // Example 13.1 //
-0.000421	= LoadVector(cc + i); //
-0.000421	= LoadVector(bb + i); //
-0.220665	contentions. Use simple method. //
-0.220665	No error return a[i]; //
-0.221014	// For unused returns //
-0.505046	n) { // n! //
-0.275035	library function from www.agner.org/optimize/asmlib.zip. //
-0.163974	short int cc[size] ); //
-0.163974	short int aa[size] ); //
-0.220665	serves as entry point. //
-0.221014	last byte at 13 //
-0.163974	bb, cc); } #endif //
-0.163974	#define FUNCNAME SelectAddMul_AVX2 #endif //
-0.216875	this by writing: 103 //
-0.008607	overflow: _controlfp_s(&dummy, 0, _EM_OVERFLOW); //
-0.008607	_fpreset(); _controlfp_s(&dummy, 0, _EM_OVERFLOW); //
-0.004282	_EM_OVERFLOW); // _controlfp(0, _EM_OVERFLOW); //
-0.118860	// x^2 // x^4 //
-0.118860	* x2; // x^4 //
-0.041761	x; *(int*)&x |= 0x80000000; //
-0.041761	2.0f; x.i |= 0x80000000; //
-0.127071	u; u.i ^= 0x80000000; //
-0.217308	points to the dispatcher. //
-0.216875	XOR'ing it with 1: //
-0.217308	when converted to unsigned. //
-0.216875	Update induction variable Y //
-0.211200	line provokes an error. //
-0.121755	Alignd(X) __declspec(align(16)) X #else //
-0.121755	: "memory" ); #else //
-0.048013	long list of numbers: //
-0.048013	with floating point numbers: //
-0.048013	sum of 100 numbers: //
-0.121755	a series of calculations: //
-0.121755	apply to modulo calculations: //
-0.164179	int TILESIZE = 8; //
-0.121755	int exponent : 8; //
-0.121755	by joining the operations: //
-0.121755	to avoid modulo operations: //
-0.210631	no risk of overflow: //
-0.637737	in the following way: //
-0.345210	polynomial // Polynomial coefficients //
-0.005719	can be replaced with: //
-0.048013	c; }; Replace with: //
-0.210631	bits 0 - 30 //
-0.210631	in this example: 38 //
-0.048144	except the sign bit: //
-0.048144	inverting the sign bit: //
-0.015422	1024; int a[size], b[size]; //
-0.007643	i; float a[size], b[size]; //
-0.007643	1000; float a[size], b[size]; //
-0.008607	__m128i two = _mm_set1_epi16(2); //
-0.198405	aligning data #ifdef _MSC_VER //
-0.463169	swapd(x,y) {temp=x; x=y; y=temp;} //
-0.249518	x10; } // x^2 //
-0.249518	class library #include <stdio.h> //
-0.463169	2.2, C = 3.3; //
-0.035503	series, vectorized #include <dvec.h> //
-0.035503	classes 114 #include <dvec.h> //
-0.074169	illustrates such a case: //
-0.074169	string to lower case: //
-0.198405	operations outside the loop: //
-0.035503	vector classes #include "vectorclass.h" //
-0.035503	CPU dispatching #include "vectorclass.h" //
-0.008607	by a table lookup: //
-0.074169	a = _mm_or_si128(c2, bc); //
-0.074169	bc = _mm_andnot_si128(mask, bc); //
-0.008607	c2 = _mm_add_epi16(c, two); //
-0.198405	be divisible by TILESIZE //
-0.198405	= ReadTSC() - time1; //
-0.249518	with SSE2 #include <emmintrin.h> //
-0.249518	below diagonal swapd(a[r][c], a[c][r]); //
-0.008607	int level = InstructionSet(); //
-0.008607	for InstructionSet() #include "asmlib.h" //
-0.198405	= _mm_blendv_epi8(bc, c2, mask); //
-0.008607	int SIZE = 512; //
-0.008607	= b * 1.2; //
-0.198405	eight consecutive elements c.load(cc+i); //
-0.328627	and a template parameter: //
-0.198405	CriticalFunctionType(int parm1, int parm2); //
-0.198405	small x // x^n //
-0.017389	use a lookup table: //
-0.017389	using a lookup table: //
-0.008607	__m128i zero = _mm_set1_epi16(0); //
-0.074169	// x^8 // x^10 //
-0.074169	x^10 // return x^10 //
-0.198405	inside the derived class: //
-0.074169	int fraction : 23; //
-0.074169	+= n << 23; //
-0.035503	against overflow is needed: //
-0.035503	why bookkeeping is needed: //
-0.008607	mask = _mm_cmpgt_epi16(b, zero); //
-0.463169	cout << "Hello "; //
-0.008607	// Writes "Hello 1" //
-0.463169	d = (double)(signed int)u; //
-0.198405	point overflow has occurred. //
-0.198405	x *const_cast<int*>(&x) += 2;} //
-0.035503	instruction set is available: //
-0.035503	if SSE2 is available: //
-0.328627	zero memset(a, 0, sizeof(a)); //
-0.198405	eight consecutive elements b.load(bb+i); //
-0.198405	use the lrint function: //
-0.163668	how to use SafeArray: //
-0.163668	b in a union: //
-0.163668	setting the fraction bits: //
-0.163668	sign bit to zero: //
-0.163668	x^3, x^4 F32vec4 xx4(x4); //
-0.163668	converting to floating point: //
-0.163668	instruction set is enabled: //
-0.163668	// x^4 // x^8 //
-0.163668	SafeArray <float, 100> list; //
-0.163668	and b double precision: //
-0.163668	int SIZE = 64; //
-0.163668	= log(b[i]) + log(c[i]); //
-0.163668	p2 = &Object2; p2->Hello(); //
-0.163668	__asm__ (".type CriticalFunction, @gnu_indirect_function"); //
-0.163668	is a positive integer: //
-0.163668	the smallest members last: //
-0.163668	Z; Z += A2; //
-0.163668	Example 9.6b. #include "xmmintrin.h" //
-0.163668	the loop control condition: //
-0.163668	address calculation more efficient: //
-0.163668	double a[arraysize], b[arraysize], c[arraysize]; //
-0.163668	classes (Intel) #include <pmmintrin.h> //
-0.163668	{ return _mm_loadu_si128((__m128i const*)p);} //
-0.163668	element in the arrays: //
-0.163668	in case of underflow: //
-0.163668	first the runtime polymorphism: //
-0.163668	dividend is unsigned Examples: //
-0.163668	* SelectAddMul_pointer = &SelectAddMul_dispatch; //
-0.163668	additional floating point variable: //
-0.163668	x; nfac *= n+1; //
-0.163668	type T // Constructor //
-0.163668	in nn ifbit=1 bitofn //
-0.163668	sign of a double: //
-0.163668	the loop and reorganize: //
-0.163668	first two suggested improvements). //
-0.163668	Prevent optimizing away cpuid //
-0.163668	(float *)alloca(n * sizeof(float)); //
-0.163668	(using Intel vector classes): //
-0.163668	x) { return ipow(x,10); //
-0.163668	b; a = (int)d; //
-0.163668	a 2'nd order polynomial: //
-0.163668	+= xxn * _mm_load_ps(coef+i); //
-0.163668	#if defined(__unix__) || defined(__GNUC__) //
-0.163668	call method using InstructionSet(): //
-0.163668	writing: __declspec(align(64)) int BigArray[1024]; //
-0.163668	an array of structures: //
-0.163668	case: // Example 7.45 //
-0.163668	of (2,2,2,2,2,2,2,2) Is16vec8 two(2,2,2,2,2,2,2,2); //
-0.163668	92 DynamicArray[i] = WhateverFunction(i); //
-0.163668	u; u.i &= 0x7FFFFFFF; //
-0.163668	to compare absolute values: //
-0.163668	without the static keyword: //
-0.163668	by a single comparison: //
-0.163668	the SSE2 instruction set: //
-0.163668	multiplying with the reciprocal: //
-0.163668	Windows Template Library (WTL): //
-0.163668	use a loop counter: //
-0.163668	// Time // Serialize //
-0.163668	work-around can be used: //
-0.163668	enabled: // Example 14.21. //
-0.163668	functions memset and memcpy: //
-0.163668	ReadTSC() from library asmlib.. //
-0.163668	i++) a[i] = 0.0; //
-0.163668	* CriticalFunction = &CriticalFunction_Dispatch; //
-0.163668	a vector, uses SSE3. //
-0.163668	1 b = lrint(d); //
-0.163668	making a common denominator: //
-0.163668	dependency chain in two: //
-0.163668	= 4, we have: //
-0.163668	zero by using memset: //
-0.163668	<stdio.h> // define fprintf //
-0.163668	int fraction : 52; //
-0.163668	compiled versions #include "instrset_detect.cpp" //
-0.163668	y; // x,y coordinates //
-0.163668	of (0,0,0,0,0,0,0,0) Is16vec8 zero(0,0,0,0,0,0,0,0); //
-0.163668	by type-casting its address: //
-0.163668	last index changes fastest: //
-0.163668	variable-size array with alloca: //
-0.163668	counter //=2*A //=A*x*x+B*x+C //=DeltaY //
-0.163668	n to the exponent: //
-0.163668	a constant reference instead: //
-0.163668	of doing type conversions: //
-0.163668	swapped with element matrix[c][r]. //
-0.163668	2) SelectAddMul_pointer = &SelectAddMul_SSE2; //
-0.163668	SelectAddMul_SSE2, SelectAddMul_SSE41, SelectAddMul_AVX2, SelectAddMul_dispatch; //
-0.163668	of Intel vector classes: //
-0.163668	Function prototype CriticalFunctionType CriticalFunction_Dispatch; //
-0.163668	f cout << x.f; //
-0.163668	lines for matrix a: //
-0.163668	int fraction : 63; //
-0.163668	x^n } return add_elements(s); //
-0.163668	comparing them as integers: //
-0.163668	& 0x7FFFFF) | 0x3F800000; //
-0.163668	accessed through pointers, e.g.: //
-0.163668	page 134) return FactorialTable[n]; //
-0.163668	as in example 7.22. //
-0.163668	= & obj1; p->f(); //
-0.163668	in a pivot search: //
-0.163668	= x2 * x2; //
-0.163668	sum, initialize to x^0/0! //
-0.163668	illustrated in example 9.5b. //
-0.163668	true or always false: //
-0.163668	the code inside square: //
-0.163668	only half a square. //
-0.163668	source) { _mm_stream_pi((__m64*)dest, *(__m64*)&source); //
-0.163668	within a certain interval: //
-0.163668	s(0.f, 0.f, 0.f, 1.f); //
-0.163668	of two induction variables: //
-0.163668	x^n/n! xxn *= xx4; //
-0.163668	int exponent : 15; //
-0.163668	the ReadTSC function. 154 //
-0.163668	copying the return statement: //
-0.163668	make it a template: //
-0.163668	float(i); f = static_cast<float>(i); //
-0.163668	advantage of this capability: //
-0.163668	are: int BigArray[1024] __attribute__((aligned(64))); //
-0.163668	*(__m64*)&source); // MOVNTQ _mm_empty(); //
-0.163668	loop by two gives: //
-0.163668	volatile volatile int seconds; //
-0.163668	= b * 1.2f; //
-0.163668	bb[], short int cc[]); //
-0.163668	iset = instrset_detect(); 116 //
-0.163668	= _mm_and_si128(c2, mask); 110 //
-0.163668	int exponent : 11; //
-0.163668	<< list[i] << endl; //
-0.163668	= a * 2.5; //
-0.346091	be changed to a =
-0.320368	Call critical function a =
-0.226305	Func2() { int a =
-0.000706	} else { a =
-0.001104	if (b) { a =
-0.018006	if (true) { a =
-0.340098	numbers, we have a =
-0.226305	Example 8.4 double a =
-0.226305	through function pointer a =
-0.226305	Example 8.18 float a =
-0.226305	from memory address a =
-0.529326	expressions. For example, a =
-0.226305	in the case a =
-0.226305	- a & a =
-0.039351	a; int b; a =
-0.002358	int a, b; a =
-0.001570	double a, b; a =
-0.001570	float a, b; a =
-0.039351	a; bool b; a =
-0.226305	comparisons. The solution a =
-0.226305	i++) { ... a =
-0.075107	in the expression a =
-0.075107	while the expression a =
-0.098576	each element __m128i a =
-0.098576	AND operations: __m128i a =
-0.036791	b * c; a =
-0.036791	b / c; a =
-0.008910	a, b, c; a =
-0.036791	b % c; a =
-0.281032	true a && a =
-0.226305	a, a | a =
-0.063172	Example 7.10b char a =
-0.063172	Example 8.17 char a =
-0.063172	Example 7.9b char a =
-0.226305	b, c, d; a =
-0.022627	b / 10; a =
-0.022627	int)b / 10; a =
-0.022627	b % 10; a =
-0.022627	int)b % 10; a =
-0.098576	b / 16; a =
-0.098576	b % 16; a =
-0.226305	* c; Is16vec8 a =
-0.226305	// Example 7.2 a =
-0.226305	is float 140 a =
-0.226305	{ b.load(bb+i); c.load(cc+i); a =
-0.226305	x, y, z; a =
-0.226305	or 1. Writing a =
-0.226305	// Example 8.2b a =
-0.226305	// Example 8.3b a =
-0.226305	// Example 8.10b a =
-0.226305	= {2.6f, 1.5f}; a =
-0.226305	= {1.0f, 2.5f}; a =
-0.099578	5; to int x =
-0.099578	will reduce int x =
-0.015090	more efficient than x =
-0.063788	is faster than x =
-0.046941	used. For example, x =
-0.046941	efficiency. For example, x =
-0.228999	(int)n - 2, x =
-0.228999	f, x, y; x =
-0.047985	of const double A =
-0.047985	variables const double A =
-0.001154	vectorization const int size =
-0.001154	x); const int size =
-0.001154	#endif const int size =
-0.001154	14.30 const int size =
-0.001154	11.3 const int size =
-0.000576	Func(int); const int size =
-0.001154	11.2b const int size =
-0.001154	11.2a const int size =
-0.001154	7.33b const int size =
-0.001154	7.33a const int size =
-0.001154	14.4a const int size =
-0.087610	bool a, b; b =
-0.087610	= 0, b; b =
-0.197477	will be 1 b =
-0.020302	vector b: __m128i b =
-0.087610	b; double c; b =
-0.087610	a, b, c; b =
-0.197477	a = 0, b =
-0.197477	x[0] = a; b =
-0.197477	vector b: Is16vec8 b =
-0.197477	a = -100, b =
-0.197477	a = -1.0E8, b =
-0.197477	= parabola (2.0f); b =
-0.197477	a = 5.0f; b =
-0.197477	a = Multiply(10,8); b =
-0.197477	temp = a+1; b =
-0.259577	; eax = i =
-0.191231	doesn't work int i =
-0.191231	interval, for example i =
-0.000458	{ for (int i =
-0.000916	0; for (int i =
-0.000458	... for (int i =
-0.000183	vectors: for (int i =
-0.000916	floats for (int i =
-0.000916	sum; for (int i =
-0.000916	_alloca) for (int i =
-0.191231	int s; 40 i =
-0.023176	of (2,2,2,2,2,2,2,2) __m128i two =
-0.236888	DontSkip = dummy[0]; clock =
-0.231723	= 8 * 4 =
-0.231723	is 8192 / 4 =
-0.231652	value 10 * 8 =
-0.231652	512 kb / 8 =
-0.235975	/ 64) % 32 =
-0.223942	a, a & 0 =
-0.046106	a, a | 0 =
-0.046106	x-xxx--xx a | 0 =
-0.044965	b<c) Multiply by constant =
-0.014477	- Divide by constant =
-0.014477	add Divide by constant =
-0.014477	---xx---x Divide by constant =
-0.099494	+ i); // result =
-0.099494	elements c.load(cc+i); // result =
-0.035305	_mm_permutevar_ps 4 4 bytes =
-0.035305	_mm256_permutevar_ps 4 4 bytes =
-0.017294	_mm256_i32gather_epi32 unlimited 4 bytes =
-0.017294	_mm_i32gather_ps unlimited 4 bytes =
-0.017294	_mm_i32gather_epi32 unlimited 4 bytes =
-0.017294	_mm256_i32gather_ps unlimited 4 bytes =
-0.033064	_mm256_i64gather_pd unlimited 8 bytes =
-0.033064	_mm_i64gather_pd unlimited 8 bytes =
-0.033064	_mm256_i64gather_epi32 unlimited 8 bytes =
-0.033064	_mm_i64gather_epi32 unlimited 8 bytes =
-0.169189	4 = 2048 bytes =
-0.135596	is changed to c =
-0.135596	!= 0) { c =
-0.135596	not overlap. If c =
-0.135596	x[1] = b; c =
-0.014808	vector c: __m128i c =
-0.135596	for fast division c =
-0.030139	b, c, d; c =
-0.030139	0, c, d; c =
-0.135596	temp * temp; c =
-0.135596	vector c: Is16vec8 c =
-0.135596	b = 100, c =
-0.135596	b * 3.5; c =
-0.135596	b = 1.0E8, c =
-0.135596	else { CFALSE: c =
-0.135596	(a+1) * (a+1); c =
-0.001217	to b for (i =
-0.001217	= 0; for (i =
-0.000034	int i; for (i =
-0.001217	a[100], b; for (i =
-0.000152	i; ... for (i =
-0.000304	136 ... for (i =
-0.000304	j; ... for (i =
-0.000608	= 1; for (i =
-0.000608	+ 1; for (i =
-0.001217	to zero for (i =
-0.000304	float x; for (i =
-0.001217	register temp; for (i =
-0.001217	= 3; for (i =
-0.001217	i, a[100]; for (i =
-0.001217	i; 45 for (i =
-0.001217	= r; for (i =
-0.001217	innermost loop: for (i =
-0.001217	i, StringLength; for (i =
-0.001217	i, a[2]; for (i =
-0.001217	i; 84 for (i =
-0.001217	long timediff[NumberOfTests]; for (i =
-0.001217	} printf("\nResults:"); for (i =
-0.008069	} else { y =
-0.008069	68 else { y =
-0.004016	if (b) { y =
-0.069182	n) { double y =
-0.069182	bitofn // return y =
-0.069182	then the expression y =
-0.069182	b + c; y =
-0.069182	{x = a; y =
-0.069182	a : b) y =
-0.004016	c, d, y; y =
-0.016290	= 100, y; y =
-0.016290	= 1.23456, y; y =
-0.008069	a2, b1, b2; y =
-0.069182	we may write: y =
-0.022196	of (0,0,0,0,0,0,0,0) __m128i zero =
-0.220148	a vector. If n =
-0.220148	1.f; for (int n =
-0.162376	_mm_shuffle_epi8 16 1 byte =
-0.162376	_mm_perm_epi8 32 1 byte =
-0.216987	& r) { r =
-0.216987	for example when r =
-0.000143	100; i++) { a[i] =
-0.000861	size; i++) { a[i] =
-0.006061	+= 2) { a[i] =
-0.051033	of loop ; a[i] =
-0.024767	< 2; i++) a[i] =
-0.024767	< size; i++) a[i] =
-0.051033	in multiplication here: a[i] =
-0.051033	that avoids overflow: a[i] =
-0.051033	the safe formula a[i] =
-0.021693	B = 2.2, C =
-0.020094	rows = 20, columns =
-0.194987	rows = 10, columns =
-0.177475	obj1; C0 * p =
-0.177475	p; int i; p =
-0.177475	&Object1; p->NotPolymorphic(); p->Hello(); p =
-0.177475	CHello * p; p =
-0.231824	n.a. !(a < b) =
-0.169329	size; i++) { temp =
-0.095064	a[100], b, temp; temp =
-0.095064	b, c, temp; temp =
-0.095064	i, a[100], temp; temp =
-0.066848	== 0) { d =
-0.066848	Example 14.20 double d =
-0.007815	+ c*x + d =
-0.032139	a & b; d =
-0.032139	a && b; d =
-0.002301	u; double d; d =
-0.066848	else { DTRUE: d =
-0.150638	+= a[i+3]; } sum =
-0.068820	= x; float sum =
-0.068820	float a[100]; float sum =
-0.150638	a[100]; int i, sum =
-0.150638	100; float list[size], sum =
-0.230885	sign(i) ; shift right =
-0.205194	true a && true =
-0.205194	false, a || true =
-0.129294	template specialization for N =
-0.016822	14.8 const int rows =
-0.016822	7.17 const int rows =
-0.016822	FuncCol(int); const int rows =
-0.020954	asmlib library int level =
-0.471199	for(i=0; i<300; i++){ list[i] =
-0.254410	for(i=0; i<301; i+=3){ list[i] =
-0.145047	version CriticalFunctionType * CriticalFunction =
-0.145047	// Generic version CriticalFunction =
-0.066496	// SSE2 supported CriticalFunction =
-0.066496	// AVX supported CriticalFunction =
-0.229024	void DelayFiveSeconds() { seconds =
-0.145148	n! int i, f =
-0.145148	f = (float)i; f =
-0.145148	f = float(i); f =
-0.145148	float f; f=i; f =
-0.014843	* p) { *p =
-0.030211	7.31b char string[100], *p =
-0.030211	7.31a char string[100], *p =
-0.020131	double x, n, factorial =
-0.228176	variable 85 ; eax =
-0.190105	a a && false =
-0.190105	a, a || false =
-0.016722	vector c __m128i c2 =
-0.156259	with the bit-mask: c2 =
-0.084788	on stack ; ecx =
-0.084788	;eax=addressofa ;edx=addressinr ; ecx =
-0.040328	size; i++) { j =
-0.040328	rows; i++) { j =
-0.021925	- a & -1 =
-0.021925	0 a & -1 =
-0.021925	a a | -1 =
-0.021925	- a | -1 =
-0.095243	a a ^ -1 =
-0.183835	previous value as xn =
-0.183835	x) { float xn =
-0.226283	int i; int Induction =
-0.018442	A = 1.1, B =
-0.078990	= Induction ; edx =
-0.078990	PTR [esp+12] ; edx =
-0.012727	and c __m128i bc =
-0.114093	the inverted bit-mask: bc =
-0.011294	} const int SIZE =
-0.011294	9.5a const int SIZE =
-0.011294	9.6a const int SIZE =
-0.001777	r++) { for (c =
-0.016260	through rows for (c =
-0.114093	reductions such as -(-a) =
-0.114093	= a*(b+c) - -(-a) =
-0.114093	a*4 - n.a. -(-a) =
-0.221211	previous value as n! =
-0.114281	= _mm_hadd_ps(x, x); s =
-0.053392	short int s; s =
-0.053392	{ __m128 s; s =
-0.221452	Tuesday = 4, Wednesday =
-0.221452	100; float list[size], sum1 =
-0.025838	a a & ~a =
-0.025838	- a & ~a =
-0.114093	0 a ^ ~a =
-0.164569	size; i++) { b[i] =
-0.164569	< size; i++) b[i] =
-0.035592	version FuncType * SelectAddMul_pointer =
-0.035592	(iset >= 2) SelectAddMul_pointer =
-0.035592	(iset >= 8) SelectAddMul_pointer =
-0.035592	(iset >= 5) SelectAddMul_pointer =
-0.088171	= cos(x); } z =
-0.088171	y = cos(x); z =
-0.088171	y = sin(x); z =
-0.217417	u; int n; u.i =
-0.270963	of sets) (line size) =
-0.217417	sum1 = 0, sum2 =
-0.002141	r, c; for (r =
-0.002141	double temp; for (r =
-0.217417	the trick that N1 =
-0.015968	a bit-mask: __m128i mask =
-0.217417	+ A; double Y =
-0.048136	= 0x2710 and (set) =
-0.048136	then we have (set) =
-0.048136	by the formula: (set) =
-0.122045	b) a && !a =
-0.122045	false, a || !a =
-0.048136	= a - a-a =
-0.048136	a - n.a. a-a =
-0.048136	xxxxxxxxx x-xxx---- a-(-b)=a+b a-a =
-0.048136	a*(b+c) - n.a. x*x*x*x*x*x*x*x =
-0.023401	d = ((a*x+b)*x+c)*x+d x*x*x*x*x*x*x*x =
-0.023401	x x ((a*x+b)*x+c)*x+d x*x*x*x*x*x*x*x =
-0.122045	list[i] = 0; list[i+1] =
-0.122045	i+=3){ list[i] =0; list[i+1] =
-0.122045	x) { double x2 =
-0.122045	1./1.30767E12, 1./2.09227E13}; float x2 =
-0.013507	list[i+1] = 1; list[i+2] =
-0.211165	= C; double Z =
-0.211556	a[i] + b[i]; c[i] =
-0.211165	s2 = 0, s3 =
-0.211165	s1 = 0, s2 =
-0.122045	folding - n.a. a+b =
-0.122045	Integer algebra reductions: a+b =
-0.122045	a+b = b+a a*b =
-0.122045	a+b = b+a, a*b =
-0.015460	int x; for (x =
-0.015460	= 1.0; for (x =
-0.015460	+ B; for (x =
-0.263898	cache is 8 kb =
-0.074364	x *x; double x4 =
-0.074364	// x^2 float x4 =
-0.074364	= Induction; ; a[i+1] =
-0.074364	a[i] = Induction; a[i+1] =
-0.198925	= 1.f; float nfac =
-0.074364	= a - a*0 =
-0.074364	a - n.a. a*0 =
-0.074364	= 0 - a*1 =
-0.074364	0 - n.a. a*1 =
-0.198925	squares: const int TILESIZE =
-0.198925	Friday = 0x20, Saturday =
-0.074364	= 0 - a+0 =
-0.074364	0 - n.a. a+0 =
-0.074364	(b1 * b2); y1 =
-0.074364	b2, y1, y2; y1 =
-0.074364	b2 * reciprocal_divisor; y2 =
-0.074364	a1 / b1; y2 =
-0.074364	A, B, C; x.abc =
-0.074364	// Example 7.40c x.abc =
-0.198925	CChild2 * p2; p2 =
-0.198925	CChild1 * p1; p1 =
-0.198925	constant const int ArraySize =
-0.017432	size; i++) { aa[i] =
-0.017432	256; i++) { aa[i] =
-0.008628	r2++) { for (c2 =
-0.198925	static const int ABC =
-0.198925	float a[100]; float s0 =
-0.329328	--xx----- (a&&b) || (a&&c) =
-0.198925	16.1 const int NumberOfTests =
-0.198925	14.5a const int min =
-0.198925	} The factor sizeof(S1) =
-0.074364	0; int i, largest_index =
-0.074364	largest_abs = absvalue; largest_index =
-0.198925	100; i++) { list[i].a =
-0.198925	j = order(i); matrix[j][0] =
-0.198925	s0 = 0, s1 =
-0.074364	= a+(b+c) - a*b+a*c =
-0.074364	a+(b+c) - n.a. a*b+a*c =
-0.035592	c (a&&b) || (a&&b&&c) =
-0.035592	|| (a&&c) || (a&&b&&c) =
-0.198925	i; const int ARRAYSIZE =
-0.035592	n.a. - a ^a =
-0.035592	= ~a a ^a =
-0.198925	Thursday = 0x10, Friday =
-0.074364	obvious reductions as 0/a =
-0.074364	= a - 0/a =
-0.074364	n.a. n.a. - (a&b)|(a&c) =
-0.074364	----- ~(~a)=a x-xxxxx-- (a&b)|(a&c) =
-0.198925	static const double log2 =
-0.329328	x--xx---- (a&&b) || (!a&&c) =
-0.074364	> largest_abs) { largest_abs =
-0.074364	unsigned int absvalue, largest_abs =
-0.074364	x, y; ... x.a =
-0.074364	A, B, C; x.a =
-0.008628	100; x++) { Table[x] =
-0.074364	x.b = B; x.c =
-0.074364	y.b + 2.; x.c =
-0.074364	y.a + 1.; x.b =
-0.074364	x.a = A; x.b =
-0.074364	x---x---x x-xxx---- a*b*c=a*(b*c) a+b+c+d =
-0.074364	b*a (a+b)+c=a+(b+c) a+b+c=c+b+a a+b+c+d =
-0.017432	static const int FactorialTable[13] =
-0.017432	factorials: const int FactorialTable[13] =
-0.198925	Monday = 2, Tuesday =
-0.035592	inside sqaure: for (r2 =
-0.035592	handled separately: for (r2 =
-0.198925	and c first. b+c =
-0.164147	a, float b) {x =
-0.164147	n.a. n.a. - a<<b<<c =
-0.164147	a+b=0, and then 0+1.23456 =
-0.164147	= ReadTSC(); CriticalFunction(); timediff[i] =
-0.164147	enum Weekdays { Sunday =
-0.164147	static const float OneOrTwo5[2] =
-0.164147	&& b<c && a<c) =
-0.164147	NumberOfTests; i++) { time1 =
-0.164147	= a - a/1 =
-0.164147	a + 1; x[1] =
-0.164147	j = order(i); list[j].a =
-0.164147	clock; __cpuid(dummy, 0); DontSkip =
-0.164147	const * const Greek[4] =
-0.164147	x; const double A2 =
-0.164147	min = 100, max =
-0.164147	= x4*x4; double x10 =
-0.164147	i++) { 92 DynamicArray[i] =
-0.164147	x) { // polynomial(x) =
-0.164147	< NUMCOLUMNS; column++) matrix[row][column] =
-0.164147	out sign bit: absvalue =
-0.164147	y1, y2, reciprocal_divisor; reciprocal_divisor =
-0.164147	a - n.a. (-a)*(-b) =
-0.164147	n.a. n.a. - a+b+c =
-0.164147	static const float coef[16] =
-0.164147	b*a - n.a. (a+b)+c =
-0.164147	9.4 const int NUMROWS =
-0.164147	of &list[100] is (int)(&list[100]) =
-0.164147	rows; i++) for (j =
-0.164147	2 - n.a. a+a+a+a =
-0.164147	= x2*x2; double x8 =
-0.164147	| (~a&c) | (b&c) =
-0.164147	= a&(b|c) x-xxxx--x (a|b)&(a|c) =
-0.164147	get the value -100+100+100 =
-0.164147	|| (!a&&c) || (b&&c) =
-0.164147	TILESIZE) { for (c1 =
-0.164147	list[i].a = 1.0; list[i].b =
-0.164147	n.a. n.a. - andnot(a,a) =
-0.164147	static const float lookup[2] =
-0.164147	y.c + 3.; x.d =
-0.164147	i; } x; x.f =
-0.164147	a&&(b||c) !a && !b =
-0.164147	a[0] = 1; a[1] =
-0.164147	row, column; for (row =
-0.164147	all squares: for (r1 =
-0.164147	{ const int arraysize =
-0.164147	list[100], *temp; for (temp =
-0.164147	then be repeated 1024/4 =
-0.164147	8.12b int a[2]; a[0] =
-0.164147	a&&(b||c) (a&&!b) || (!a&&b) =
-0.164147	calculated as (critical stride) =
-0.164147	5 * 0.5 ns =
-0.164147	size; i++) { ab[i].b =
-0.164147	is (columns * sizeof(float)) =
-0.164147	temp->a = 1.0; temp->b =
-0.164147	&list[100]; temp++) { temp->a =
-0.164147	{ static float list[] =
-0.164147	Wednesday = 8, Thursday =
-0.164147	floats: float * DynamicArray =
-0.164147	instruction set int iset =
-0.164147	Sunday = 1, Monday =
-0.164147	(a&b) | (~a&c) a&b&c&d =
-0.164147	--------- ~a ^ ~b =
-0.164147	SIZE; c++) { a[c][r] =
-0.164147	of lines is 8*1024/64 =
-0.164147	erroneously called with IsPowerOf2 =
-0.164147	XMM (vector) reductions: ~(~a) =
-0.164147	int b, c; x[0] =
-0.164147	NUMROWS = 100, NUMCOLUMNS =
-0.164147	= (a&b)&(c&d) a ^0 =
-0.164147	NUMROWS; row++) for (column =
-0.164147	// Example 7.41b a.x =
-0.164147	c.x + d.x; a.y =
-0.164147	= 2; } list[300] =
-0.164147	/ 0x40) % 0x20 =
-0.237151	from library asmlib.. // or
-0.540144	even when the function or
-0.439002	cannot inline the function or
-0.339453	2. Put the function or
-0.515808	end of a function or
-0.461990	If the same function or
-0.228232	recommend that no function or
-0.326177	to test each function or
-0.283218	interfere with any function or
-0.528346	Make a member function or
-0.428879	only a single function or
-0.283218	breakpoints at every function or
-0.228232	separate storage. No function or
-0.414545	inlining the frame function or
-0.228232	function or friend function or
-0.293190	can be represented with or
-0.312786	function in system code or
-0.337680	to use assembly code or
-0.311773	to use vectorized code or
-0.352811	i an unsigned int or
-0.236709	number to reflect this or
-0.500462	known at compile time or
-0.447575	involves allocation of memory or
-0.283778	on variables in memory or
-0.520695	scattered around in memory or
-0.236743	common to exchange data or
-0.353893	logic behind the program or
-0.326147	the size of program or
-0.343761	at initialization. The program or
-0.452396	elements in a vector or
-0.538078	read from the same or
-0.323545	pointers and virtual functions or
-0.297838	to use intrinsic functions or
-0.297838	by using intrinsic functions or
-0.231185	to identify individual functions or
-0.381191	for a particular CPU or
-0.445429	immediately before the loop or
-0.485031	element outside the loop or
-0.348788	will vectorize a loop or
-0.354283	inline keyword is used or
-0.321070	executable file and one or
-0.290575	a program has one or
-0.330890	called from only one or
-0.271143	improved by using one or
-0.217575	above, but read one or
-0.271143	translated to just one or
-0.217575	up and enable one or
-0.217575	units, and 22 one or
-0.217575	more integer units, one or
-0.217575	specific purpose: Contain one or
-0.871810	in the code cache or
-0.460608	support this instruction set or
-0.185353	member of the class or
-0.185353	instance of the class or
-0.242112	reference to the class or
-0.297115	member of a class or
-0.471261	members of a class or
-0.335655	variables in a class or
-0.472778	wrapped into a class or
-0.219912	of the object's class or
-0.433117	compiler with other compilers or
-0.313099	making the code size or
-0.010081	the Gnu, Clang, Intel or
-0.020401	as Gnu, Clang, Intel or
-0.671289	value of the pointer or
-0.391858	there is a pointer or
-0.204543	give it a pointer or
-0.204543	function with a pointer or
-0.275199	function return a pointer or
-0.151805	function through a pointer or
-0.151805	b through a pointer or
-0.151805	object through a pointer or
-0.151805	variable through a pointer or
-0.204543	to transfer a pointer or
-0.204543	modified. Unlike a pointer or
-0.204543	to pass a pointer or
-0.250317	Pointer elimination A pointer or
-0.082053	never return any pointer or
-0.082053	avoid making any pointer or
-0.183307	taken. A const pointer or
-0.183307	mode 4 4 pointer or
-0.183307	unsigned 8 8 pointer or
-0.183307	is a simple pointer or
-0.183307	plus 6 integer, pointer or
-0.183307	risky. The returned pointer or
-0.183307	about a variable, pointer or
-0.447547	as a function library or
-0.233373	to a graphics library or
-0.196282	b is a float or
-0.196282	integer to a float or
-0.215435	used. Conversions of float or
-0.221738	4 bytes = float or
-0.094494	Efficient conversion from float or
-0.094494	avoid conversions from float or
-0.215435	in registers (8 float or
-0.212189	value that is two or
-0.212189	There may be two or
-0.323752	Typically, there are two or
-0.378550	way: There are two or
-0.199994	good to have two or
-0.199994	modern CPUs have two or
-0.424818	program chooses between two or
-0.265054	used for doing two or
-0.212189	instruction set. Make two or
-0.491996	beginning of the object or
-0.233305	libraries distributed as object or
-0.227174	efficient. Access to static or
-0.282017	functions inline or static or
-0.227174	should rely on static or
-0.282017	declare all functions static or
-0.233022	implemented in compiled C++ or
-0.233022	written in C, C++ or
-0.433157	however, if the array or
-0.454133	size of an array or
-0.307966	as copying an array or
-0.222898	a fixed size array or
-0.222898	memory allocation Any array or
-0.323513	as small as possible or
-0.313859	Efficiency Accessing a variable or
-0.313859	compiler treat a variable or
-0.225353	sure that no variable or
-0.279952	a global const variable or
-0.312669	preferably avoid global variables or
-1.707397	a power of 2 or
-0.380064	through an import table or
-0.313810	instructions out of order or
-0.290410	uint32_t unsigned long long or
-0.290410	231-1 int32_t long long or
-0.352484	system running in 32-bit or
-0.334864	the first data member or
-0.227918	in a different way or
-0.229911	pointers in one way or
-0.229911	goes randomly one way or
-0.236420	use a #define, const or
-0.235766	of 1, 2, 4 or
-0.347818	the destructor to call or
-0.317276	12 or 16 8 or
-0.231280	only the lower 8 or
-0.235917	load the entire 64 or
-0.425932	for whole program optimization or
-0.235673	DLL's (dynamically linked libraries or
-0.220286	rather than by pointers or
-0.174041	are accessed through pointers or
-0.220286	section may contain pointers or
-0.235520	<<, >> can test or
-0.050543	Memory allocated with new or
-0.224957	allocated dynamically (with new or
-0.235510	integers in 16-bit systems or
-0.322484	bottleneck is file access or
-0.235366	can be 8, 16 or
-0.388609	systems when the SSE2 or
-0.388609	operations when the SSE2 or
-0.441451	time unless the SSE2 or
-0.610355	to enable the SSE2 or
-0.260386	// Only for SSE2 or
-0.260386	the instruction set SSE2 or
-0.380060	can be ruled out or
-0.571469	with the operating system or
-0.041189	known to be 0 or
-0.041189	guaranteed to be 0 or
-0.009950	other value than 0 or
-0.284541	other values than 0 or
-0.195133	b is always 0 or
-0.236201	avoid macros with short or
-0.235236	testing single assembly instructions or
-0.235539	function libraries. Use Gnu or
-0.615919	of the most important or
-0.062212	computer with multiple CPUs or
-0.062212	To use multiple CPUs or
-0.062212	parallel: Using multiple CPUs or
-0.328909	intrinsic functions, inline assembly or
-0.303114	repeat count is large or
-0.327144	library is very large or
-0.343883	7. If the arrays or
-0.532984	fast floating point calculations or
-0.321940	class is 128 bytes or
-0.321500	actually increase the speed or
-0.277932	the software for speed or
-0.301051	for optimizing execution speed or
-0.206864	run with reduced speed or
-0.259045	handled at half speed or
-0.234876	are done with single or
-0.347865	with an Intel, AMD or
-0.617175	case of an exception or
-0.237528	if it is small or
-0.237528	the function is small or
-0.264506	body is very small or
-0.264506	Execution time too small or
-0.292907	Number) if an overflow or
-0.273373	reduction would cause overflow or
-0.219546	compiler to ignore overflow or
-0.227207	whether they are integers or
-0.227207	of eight 16-bit integers or
-0.098757	page 78). A matrix or
-0.098757	structure needed? A matrix or
-0.341779	12.1a. Enable the AVX or
-0.226887	code compiled for AVX or
-0.226553	organizing data into classes or
-0.329232	your own container classes or
-0.296779	of two double precision or
-0.296779	hold four double precision or
-0.248479	are using single precision or
-0.248479	the constant single precision or
-0.075177	on a command line or
-0.075177	from a command line or
-0.379026	in the compiler manual or
-0.345166	vectorization will be advantageous or
-0.624962	to use a container or
-0.532820	point operations involves eight or
-0.234149	a loop with few or
-0.475707	of a linked list or
-0.424460	use a sorted list or
-0.426599	beginning of the structure or
-0.026866	bytes in a structure or
-0.026866	last in a structure or
-0.055504	such as a structure or
-0.055504	how big a structure or
-0.055504	may define a structure or
-0.119155	an array of structure or
-0.119155	to arrays of structure or
-0.148501	each thread. This structure or
-0.148501	of the same structure or
-0.148501	of a class, structure or
-0.234722	making critical functions inline or
-0.234029	function that doesn't add or
-0.307399	follows in 64-bit mode or
-0.307399	i.e. in 64-bit mode or
-0.202119	2 In 64-bit mode or
-0.272347	point. Use 64-bit mode or
-0.234452	initialized to valid values or
-0.234272	more time loading files or
-0.234008	inlining causes technical problems or
-0.320438	can save cache space or
-0.450639	contains automatic CPU dispatching or
-0.289374	program has many branches or
-0.233512	the code involves multiplication or
-0.530214	vectorize the code automatically or
-0.302066	to be an expression or
-0.295828	constant propagation An expression or
-0.056744	any non-static data members or
-0.233458	have inefficient code-based methods or
-0.186377	they can be signed or
-0.186377	4 64-bit integer, signed or
-0.083266	2 2 int, signed or
-0.083266	1 short int, signed or
-0.186377	1 1 char, signed or
-0.233534	is no try block or
-0.220325	type conversion takes zero or
-0.220325	if a was zero or
-0.289300	projects made with Microsoft or
-0.162854	r is a reference or
-0.162854	example: Use a reference or
-0.233232	classes, such as string or
-0.232821	multiplication may be three or
-0.340883	heavily on table lookup or
-0.495438	objects of different types or
-0.219336	objects have mixed types or
-0.436088	for floating point expressions or
-0.219336	memory buffer and read or
-0.219336	time. Do not read or
-0.522420	the arrays are aligned or
-0.391703	use #pragma vector aligned or
-0.204443	arrays are properly aligned or
-0.375993	a class is declared or
-0.216274	CPU cores. A process or
-0.423332	during the installation process or
-0.232735	sometimes give misleading results or
-0.287765	the conversion. The constructor or
-0.231922	the loading of modules or
-0.213896	// u.d is negative or
-0.213896	if both are negative or
-0.335287	size cannot be predicted or
-0.412565	the library is loaded or
-0.231901	about whether the positive or
-0.180179	will often be C or
-0.229068	linked together with C or
-0.180179	in a separate C or
-0.180179	may choose either C or
-0.521815	has to turn off or
-0.212997	you turn them off or
-0.231759	BigArray[1024]; // Windows syntax or
-0.231641	identified by their index or
-0.401821	minute if the network or
-0.231473	test setup but slow or
-0.231598	CPU-time in library functions, or
-0.231146	for supporting multiple platforms or
-0.230744	allocated to each task or
-0.306197	which cannot be inlined or
-0.231818	decides whether to repeat or
-0.230625	operator; you can clear or
-0.220990	of a hard disk or
-0.220990	from a hard disk or
-0.205127	the network is overloaded or
-0.205127	name cannot be overloaded or
-0.229862	reduced to always true or
-0.204685	table-based methods with little or
-0.204685	Most programmers have little or
-0.411941	sure it is initialized or
-0.195139	math and the SSE or
-0.195139	microprocessor has the SSE or
-0.089460	is faster than reading or
-0.089460	operation rather than reading or
-0.546432	the number of cores or
-0.442695	The multiple CPU cores or
-0.388349	objects can be copied or
-0.202260	is created, deleted, copied or
-0.424293	a Windows, Linux, BSD or
-0.229641	in a multithreaded program, or
-0.229471	no compile- time loops or
-0.228443	well-tested functions, classes, templates or
-0.228443	in a static buffer or
-0.228443	cycles rather than seconds or
-0.228256	on a Linux compiler, or
-0.228630	in a different module or
-0.039226	waiting for user input or
-0.283457	container for each row or
-0.228817	at a link map or
-0.228432	is __asm int 3; or
-0.227810	writes with normal writes or
-0.246223	for specific CPU brands or
-0.195475	known processors. Other brands or
-0.227396	CPU of unknown brand or
-0.228432	an object by *p or
-0.226556	double 8, 10, 12 or
-0.226324	vector::reserve with a prediction or
-0.411973	converted to an integer, or
-0.226556	CriticalFunction is called once or
-0.189802	aliasing. __declspec(noalias) or __restrict or
-0.189802	using the keyword __restrict or
-0.281053	as a runtime DLL or
-0.222132	operators new and delete or
-0.222132	over new and delete or
-0.313730	belongs to class C1 or
-0.280094	function will be called, or
-0.224948	this important new update or
-0.366678	software to be slower or
-0.225213	desired functionality without polymorphism or
-0.225478	threads can add, remove or
-0.321542	error code if possible, or
-0.175412	A function that reads or
-0.175412	the program afterwards reads or
-0.224353	name, regardless of scope or
-0.175137	a pointer, a reference, or
-0.527186	a pointer or reference, or
-0.223122	are up to five or
-0.092636	3.7 File access Reading or
-0.092636	in the program. Reading or
-0.092636	than random access. Reading or
-0.092636	0x20 = 0x1C. Reading or
-0.223429	language that requires compilation or
-0.220577	Fast function calling. __fastcall or
-0.220577	cache. Files on remote or
-0.220944	operands has side effects or
-0.274540	access. Run multiple processes or
-0.221679	goes to the console or
-0.017411	a string is created or
-0.304709	may be a hundred or
-0.301505	respond to a command or
-0.402030	than by the latency or
-0.220944	bits for Tuesday, Wednesday or
-0.274540	to pressing a key or
-0.588364	in the carry flag or
-0.217698	live-ranges do not overlap or
-0.289643	when they are needed, or
-0.216788	whether you use pre-increment or
-0.216788	like a mouse move or
-0.216788	work. Data alignment. __declspec(align(16)) or
-0.635795	a simple periodic pattern or
-0.217242	maintain. Any specific bottleneck or
-0.015930	12.9 Aligning RGB video or
-0.217698	grows by only 50% or
-0.216788	single result, true (1) or
-0.216788	program code are uncached or
-0.217698	difficult to use, incompatible or
-0.147601	doing multiple calculations simultaneously or
-0.147601	or more jobs simultaneously or
-0.087937	call. A branch tree or
-0.080438	be a binary tree or
-0.132429	moved. A binary tree or
-0.216788	advantageous to use hyperthreading or
-0.282270	of 32 bits each, or
-0.282270	of large memory blocks, or
-0.210545	to using templates. Two or
-0.263198	is called square blocking or
-0.005717	dynamic link library (*.dll or
-0.047993	the dynamic libraries (*.dll or
-0.211142	same queue, list, database, or
-0.210545	as a function parameter, or
-0.211142	buffers for storing text or
-0.210545	our estimate is correct or
-0.210545	and the options -S or
-0.210545	possibly be more (128 or
-0.121708	(Intel CPU only) -O3 or
-0.121708	/O2 or /Ox -O3 or
-0.047993	an operand is infinity or
-0.047993	result will be infinity or
-0.047993	was zero or infinity or
-0.211142	the function returns. Global or
-0.210545	forwards, not backwards. Copying or
-0.210545	is running on. Replace or
-0.211142	runtime framework for interpreting or
-0.198322	on a graphics card or
-0.198322	second step of interpretation or
-0.198322	objects are not overlapping or
-0.198322	in the project window or
-0.198322	the vector size (16 or
-0.198322	an overloaded assignment operator, or
-0.249424	the time. A for-loop or
-0.198322	standard header file stdint.h or
-0.198322	allows bigger segments (32-bit or
-0.198322	an 8-bit signed number, or
-0.198322	and other resources locally or
-0.198322	that a binary search, or
-0.198322	if they are uninitialized or
-0.198322	in an && expression, or
-0.198322	expected real-time speed. Delays or
-0.198322	application programs use internet or
-0.198322	that produce streaming audio or
-0.198322	identified by consecutive indices or
-0.198322	the network is unstable or
-0.008604	link libraries (*.lib, *.a) or
-0.198322	as sorting and searching, or
-0.198322	element in an array, or
-0.198322	function for this purpose, or
-0.198322	with a graphics coprocessor or
-0.198322	can use for recovering or
-0.198322	don't want this initialization, or
-0.198322	like a key press or
-0.249424	respond quickly to keyboard or
-0.198322	the structure }; 52 or
-0.463015	with new and delete, or
-0.163591	availability of an update, or
-0.163591	code or use objconv or
-0.163591	The dispatching to C1::Disp() or
-0.163591	has a particular weakness or
-0.163591	nearest element to x?" or
-0.163591	to draw each pixel or
-0.163591	using the keyword __thread or
-0.163591	In this example, f(x) or
-0.163591	not dynamic libraries (.dll or
-0.163591	Requires binutils version 2.20 or
-0.163591	like pressing a button or
-0.163591	measurements can become imprecise or
-0.163591	draws a whole polygon or
-0.163591	aligning the data optimally, or
-0.163591	using the declaration "static" or
-0.163591	in a computer game or
-0.163591	vectors of inte- ger or
-0.163591	take a whole workday or
-0.163591	the compiler to vectorize, or
-0.163591	cache misses, branch misprediction, or
-0.163591	irrelevant within a year or
-0.163591	data structures with First-In-First-Out or
-0.163591	can use the GetTickCount or
-0.163591	from static libraries (.lib or
-0.163591	for "standard stack frame" or
-0.163591	no pointer aliasing. __declspec(noalias) or
-0.163591	the computer is reset or
-0.163591	rather than allocating piecewise or
-0.163591	if a program creates or
-0.163591	on hacks that violate or
-0.163591	language, such as VHDL or
-0.163591	have been reordered, inlined, or
-0.163591	e.g. the option /QaxAVX or
-0.163591	Optimize for speed /O2 or
-0.163591	memory allocation using new/delete or
-0.163591	as memcpy, memmove, memset, or
-0.163591	be available in 2015 or
-0.163591	the integer is signed, or
-0.163591	the assembly output (/FAs or
-0.163591	defined with enum, const, or
-0.163591	or __asm ("int 3"); or
-0.163591	software development kit (SDK or
-0.163591	to the structure. Incrementing or
-0.163591	well-defined with option -fwrapv or
-0.163591	universal algorithm (e.g. Quine–McCluskey or
-0.163591	Writes to a printer or
-0.163591	when compiling for AVX2, or
-0.163591	development methods are incremental or
-0.163591	based on complicated criteria or
-0.163591	slow unless the Pentium-II or
-0.163591	file timingtest.h from www.agner.org/optimize/testp.zip or
-0.163591	bounds of valid addresses, or
-0.163591	the CPUID instruction directly, or
-0.163591	index out of range"); or
-0.163591	sixteen vector registers (XMM or
-0.163591	p->f() goes to C0::f or
-0.163591	the computer while he or
-0.163591	classes like string, wstring or
-0.163591	optimizations with option -Wstrict-overflow=2, or
-0.237695	an hour. Neither is it
-0.582981	unless the address of it
-0.309517	before the compiler and it
-0.776227	at a time and it
-0.341329	different intrinsic functions and it
-0.665758	the code cache and it
-0.289239	in sequential order and it
-0.233532	in some cases and it
-0.092121	of function calls and it
-0.092121	pure function calls and it
-0.311277	inline the function, and it
-0.309517	and VIA processors, and it
-0.233532	always accurate, however, and it
-0.233532	compiled with -fpic and it
-0.233532	is quite inefficient, and it
-0.289239	or more cores, and it
-0.289239	through multiple layers and it
-0.233532	are often fluctuating and it
-0.197452	intermediate code is that it
-0.251032	static data is that it
-0.465648	instruction set is that it
-0.251032	64-bit double is that it
-0.251032	The problem is that it
-0.251032	of optimizations is that it
-0.251032	data storage is that it
-0.251032	complicated algorithms is that it
-0.251032	keyword volatile is that it
-0.465648	we notice is that it
-0.522543	copying the code that it
-0.269963	the same time that it
-0.156801	all the functions that it
-0.212751	around it so that it
-0.212751	thread function so that it
-0.284873	of time so that it
-0.212751	register less so that it
-0.212751	any exception so that it
-0.212751	reliable source so that it
-0.212751	code section so that it
-0.212751	this statement so that it
-0.212751	by 100 so that it
-0.212751	is changed so that it
-0.212751	explained above, so that it
-0.212751	example 9.5 so that it
-0.212751	task switches; so that it
-0.625316	you are sure that it
-0.353275	is so small that it
-0.497126	has the advantage that it
-0.327781	a loop-invariant expression that it
-0.009578	is so high that it
-0.019371	be so high that it
-0.205588	member function means that it
-0.388056	units. This means that it
-0.090741	const variable means that it
-0.090741	global variable means that it
-0.205588	to 10 means that it
-0.289341	the compiler optimizations that it
-0.332820	overflow or assume that it
-0.269963	The table shows that it
-0.216532	are so expensive that it
-0.216532	table 9.1 show that it
-0.021854	in the event that it
-0.299364	but I think that it
-0.216532	the unfortunate consequence that it
-0.216532	a program saying that it
-0.216532	is so kludgy that it
-0.216532	and later discovers that it
-0.216532	the compiler knows that it
-0.216532	One may argue that it
-0.033394	is small or if it
-0.069571	very small or if it
-0.254974	periodic pattern or if it
-0.348905	inline a function if it
-0.266352	or object as if it
-0.418613	memory, but not if it
-0.266352	no extra time if it
-0.285566	from RAM memory if it
-0.352251	this works only if it
-0.269117	is needed only if it
-0.407980	of a loop if it
-0.372251	as an integer if it
-0.313476	much more efficient if it
-0.266352	best possible branch if it
-0.213338	virtual function call if it
-0.295100	of member pointers if it
-0.213338	of large arrays if it
-0.645013	a separate thread if it
-0.314609	function must check if it
-0.266352	predicted quite well if it
-0.044327	hundred clock cycles if it
-0.044327	2-3 clock cycles if it
-0.266352	is calculated fast if it
-0.436780	result to see if it
-0.266352	a variable global if it
-0.213338	a switch statement if it
-0.213338	about the costs if it
-0.213338	make a destructor if it
-0.302798	is only safe if it
-0.418613	accessed most efficiently if it
-0.541127	you may consider if it
-0.213338	an error message if it
-0.213338	oriented programming style if it
-0.213338	a time consumer if it
-0.213338	a separate subroutine if it
-0.532620	significant as long as it
-0.333645	distributed and stored as it
-0.310950	code is distributed as it
-0.234735	cannot be executed as it
-0.592788	clock cycles more than it
-0.324089	actually implies more than it
-0.231038	sample more data than it
-0.231038	on such systems than it
-0.306547	become less important than it
-0.340826	treated as bigger than it
-0.525236	for other purposes than it
-0.195459	This is the time it
-0.388476	1/50 of the time it
-0.086825	equal to the time it
-0.086825	compared to the time it
-0.264530	call, and the time it
-0.264530	compared with the time it
-0.112428	more than the time it
-0.116558	less than the time it
-0.071632	to this the time it
-0.500317	unknown at the time it
-0.195459	way includes the time it
-0.195459	processors. Consider the time it
-0.195459	by measuring the time it
-0.131314	user's time. The time it
-0.131314	Program installation The time it
-0.131314	12 bytes. The time it
-0.131314	the calculations. The time it
-0.131314	right prediction. The time it
-0.131314	is doubled. The time it
-0.131314	is run. The time it
-0.131314	certain tolerance. The time it
-0.131314	performance costs. The time it
-0.311632	from memory each time it
-0.304882	measure how long time it
-0.068875	and how much time it
-0.329311	is re-allocated every time it
-0.329689	way as last time it
-0.826994	no reason to use it
-0.530837	how you can use it
-0.216723	the object x when it
-0.022136	is mispredicted only when it
-0.289567	process is used when it
-0.209397	dispatch mechanism even when it
-0.209397	always inlined even when it
-0.209397	memory space, even when it
-0.216723	and is compiled when it
-0.216723	line by line when it
-0.216723	background process running when it
-0.216723	has many advantages when it
-0.216723	an error message when it
-0.216723	exception is costly when it
-0.216723	this is permissible when it
-0.216723	code motion manually when it
-0.269096	of a code then it
-0.230272	of the program then it
-0.257024	in library functions then it
-0.230272	particular instruction set then it
-0.181254	with different compilers then it
-0.181254	local object static then it
-0.305636	inttypes.h is available then it
-0.181254	to clean up then it
-0.181254	object is large then it
-0.181254	during program execution then it
-0.181254	updates are necessary then it
-0.181254	is not advantageous then it
-0.181254	is a problem then it
-0.181254	is already known then it
-0.181254	10 clock cycles then it
-0.181254	page 73) automatically then it
-0.181254	one CPU core then it
-0.230272	an array index then it
-0.181254	on CPU efficiency then it
-0.181254	the same resource then it
-0.247922	a separate module then it
-0.181254	in the debugger then it
-0.038723	code version on, then it
-0.038723	advanced version on, then it
-0.305636	CPU. If not, then it
-0.181254	a non-sequential manner then it
-0.181254	no big arrays, then it
-0.181254	register (see below) then it
-0.181254	for one segment then it
-0.181254	10 μs today, then it
-0.181254	has been identified, then it
-0.181254	with CPU dispatching, then it
-0.181254	have been found, then it
-0.181254	is too fine then it
-0.181254	T to T+5, then it
-0.181254	is poorly predictable, then it
-0.181254	can be made) then it
-0.181254	this is obvious, then it
-0.181254	be too small, then it
-0.181254	is not met then it
-1.417201	in order to make it
-0.347674	is advisable to make it
-0.338403	software package and make it
-0.271691	vector instructions that make it
-0.271691	The conditions that make it
-0.316764	of parameters then make it
-0.474455	deprecated. This is because it
-0.072905	non-static member function because it
-0.243312	fully optimized code because it
-0.177297	the Intel compiler because it
-0.271663	the first time because it
-0.177297	preprocessor can do because it
-0.177297	static link library because it
-0.271663	be very efficient because it
-0.225844	an induction variable because it
-0.243312	that uses pointers because it
-0.177297	testing is useful because it
-0.177297	from cleaning up because it
-0.243312	reloaded eight times because it
-0.018588	is not optimal because it
-0.177297	has no cost because it
-0.177297	Gnu compiler mechanism because it
-0.177297	a function just because it
-0.384390	storage is inefficient because it
-0.177297	to different platforms because it
-0.177297	by other constants because it
-0.177297	the main executable because it
-0.177297	is inherently parallel because it
-0.177297	for several seconds because it
-0.038008	is time consuming because it
-0.038008	be time consuming because it
-0.177297	may be poor because it
-0.177297	between multiple processes because it
-0.177297	label plus one, because it
-0.177297	code becomes simpler because it
-0.177297	is particularly interesting because it
-0.177297	and accessed non-sequentially because it
-0.177297	(e.g. Sandy Bridge) because it
-0.177297	allocated with alloca, because it
-0.177297	is particularly risky because it
-0.500661	specifically for the CPU it
-0.233217	is a branch. If it
-0.288880	members come first. If it
-0.233217	do so. 58 If it
-0.328322	of processors on which it
-0.235043	of the array, which it
-0.259837	in most cases, but it
-0.145417	of the function, but it
-0.145417	the latter function, but it
-0.319270	SSE2 instruction set, but it
-0.259837	than other CPUs, but it
-0.191453	few clock cycles, but it
-0.341037	between multiple threads, but it
-0.241705	optimizes reasonably well, but it
-0.241705	is a pointer, but it
-0.191453	the logarithm again, but it
-0.241705	for such applications, but it
-0.191453	of 64-bit software, but it
-0.191453	the simplest method, but it
-0.191453	one CPU core, but it
-0.191453	take the hint, but it
-0.191453	associated with profiling, but it
-0.191453	a simple solution, but it
-0.191453	a syntax restriction, but it
-0.191453	of disk caching, but it
-0.191453	many small subtasks, but it
-0.191453	the container expandable, but it
-0.191453	a considerable job, but it
-0.191453	the code section, but it
-0.191453	parameter is wrong, but it
-0.331477	modules than the one it
-0.460924	checks which instruction set it
-0.459662	optimized is to do it
-0.342929	entry with the pointer it
-0.455869	feature that the object it
-0.207228	to the point where it
-0.207228	a public variable where it
-0.457281	may be cases where it
-0.245908	there are cases where it
-0.105777	the few cases where it
-0.105777	a few cases where it
-0.207228	IsPowerOf2 = false where it
-0.207228	32- bit mode, where it
-0.207228	level-1 data cache, where it
-0.207228	blocks of data", where it
-0.355487	always has the value it
-0.236730	of class C1, so it
-0.250541	function pointer and makes it
-0.291223	cases and it makes it
-0.281166	is doubled. This makes it
-0.281166	have occurred. This makes it
-0.281166	made local. This makes it
-0.278614	by default, which makes it
-0.199314	vector class library makes it
-0.199314	variable. Using pointers makes it
-0.250541	used. Dynamic linking makes it
-0.199314	The static declaration makes it
-0.202807	is an Intel before it
-0.334579	on the stack before it
-0.202807	check for overflow before it
-0.273157	their actual values before it
-0.202807	value of temp before it
-0.202807	detect the misprediction before it
-0.202807	interpretation or compilation before it
-0.202807	to calculate (c+d) before it
-0.202807	the second sub-vector before it
-0.282402	best compiler and call it
-0.047037	// After first call it
-0.341513	execution unit. For example, it
-0.341513	of modularity. For example, it
-0.236216	provide the best optimization it
-0.236244	and in most libraries it
-0.561183	important to make sure it
-0.310847	only you make sure it
-0.403321	library then make sure it
-0.345191	recently than to access it
-0.276205	and in this case it
-0.276205	but in this case it
-0.346726	71). In this case it
-0.381880	optimizations. In most cases it
-0.252556	purity. In many cases it
-0.201055	calculations. In some cases it
-0.201055	unrolling In some cases it
-0.201055	have. In some cases it
-0.201055	44 In some cases it
-0.201105	in more complex cases it
-0.313203	to zero than making it
-0.454326	does what you want it
-0.327802	optimizations that we want it
-0.323588	system, the more important it
-0.235644	is running on, while it
-0.337459	later and the work it
-0.207348	the same resources. But it
-0.207348	a modern CPU. But it
-0.207348	as function parameter. But it
-0.207348	have such checks. But it
-0.207348	software optimization issue. But it
-0.342322	main memory and therefore it
-0.191303	to find out whether it
-0.450191	you may consider whether it
-0.191303	compiler to evaluate whether it
-0.297358	parallelism when deciding whether it
-0.597171	account when deciding whether it
-0.191303	order to determine whether it
-0.280336	this function and calculate it
-0.225692	the compiler may calculate it
-0.323622	calculate *p+2 and store it
-0.312404	to see how well it
-0.278378	the value and write it
-0.363491	and read or write it
-0.234829	of the size. However, it
-0.234434	four multiplications. How was it
-0.223626	__intel_cpu_features_init_x(). In other cases, it
-0.515229	2002). In some cases, it
-0.347538	that fits the microprocessor it
-0.223199	const variable or replace it
-0.223199	poorly predictable then replace it
-0.131598	for internal references. Therefore, it
-0.131598	pointer points to. Therefore, it
-0.131598	many different applications. Therefore, it
-0.131598	any other number. Therefore, it
-0.131598	MemberPointer is declared. Therefore, it
-0.175045	thread than another. Therefore, it
-0.131598	of type int. Therefore, it
-0.131598	function library. 78 Therefore, it
-0.131598	it was programmed. Therefore, it
-0.131598	power than PCs. Therefore, it
-0.131598	has been calculated. Therefore, it
-0.322108	previous iteration. This allows it
-0.221315	can do and what it
-0.275375	reference cannot change what it
-0.233781	simultaneously. In multithreaded applications it
-0.219947	access an object after it
-0.219947	object is accessed after it
-0.233635	the class or give it
-0.812417	in the loop control it
-0.331538	Func1(list, &list[8]); } Here, it
-0.232877	have #if directives around it
-0.232780	for RTTI then turn it
-0.336277	exits, when in fact it
-0.287190	more efficient, and sometimes it
-0.287190	the CPU and prevent it
-0.328974	is compiling. This prevents it
-0.287474	bit. We can tell it
-0.231320	a waste of time, it
-0.286572	object instead of copying it
-0.231320	as fast as accessing it
-0.231589	the code and divide it
-0.306228	branch. After each iteration it
-0.244456	function returns even though it
-0.244456	than nine, even though it
-0.257631	< 231 then convert it
-0.205611	the compiler must convert it
-0.230279	complicated that I consider it
-0.433681	part of the program, it
-0.202859	in the final program, it
-0.861330	is the reason why it
-0.171573	several clock cycles whenever it
-0.171573	to be mispredicted whenever it
-0.171573	their own initiative whenever it
-0.323472	memory allocation is used, it
-0.229516	and back again. Obviously, it
-0.229429	sets and other features it
-0.228586	example, you should multiply it
-0.228586	F2(b); } } Here it
-0.280757	container than to delete it
-0.225940	without generating overflow. Likewise, it
-0.061575	a function is called, it
-0.112105	shared object is called, it
-0.226063	been brutally interrupted. Now it
-0.298485	data is to declare it
-0.224109	the CPU by giving it
-0.224109	the function bodies above, it
-0.278541	more efficient than comparing it
-0.324972	number generators. In general, it
-0.224252	Bounds checking In C++, it
-0.224396	or reference to anything it
-0.221730	than the specific event it
-0.822749	On the other hand, it
-0.221730	is declared or created it
-0.221558	predefined vector classes Fortunately, it
-0.305454	more readable but unfortunately it
-0.164601	invalid pointers, etc. And it
-0.164601	difficult to maintain. And it
-0.221730	facilitate porting between platforms, it
-0.221558	residual error and compare it
-0.221558	in interpreted script languages, it
-0.217761	and system crash. Furthermore, it
-0.298826	even smaller by declaring it
-0.217761	to a derived class, it
-0.217973	then you should disable it
-0.470348	size. In other words, it
-0.211504	it comes to optimization, it
-0.122229	particularly time consuming. Sometimes it
-0.122229	for simple tasks. Sometimes it
-0.211782	speed or size. Today, it
-0.211504	etc. In large arrays, it
-0.211504	increment an integer variable, it
-0.074488	3.5 Program loading Often, it
-0.074488	year or two. Often, it
-0.199254	program logic allows it, it
-0.199254	overwritten, and even worse, it
-0.199254	in a PC. Nevertheless, it
-0.199254	complexity of modern software, it
-0.199254	case of Boolean algebra, it
-0.074488	version. For team projects, it
-0.074488	version. For one-man projects, it
-0.199254	to 11.1b automatically, although it
-0.199254	same few parameters. Or it
-0.199254	advantages of each method, it
-0.250473	shared object is accessed, it
-0.164451	sure the compiler recognizes it
-0.164451	As we can see, it
-0.164451	table is cached. Usually it
-0.164451	the macro is referencing it
-0.164451	only improve the performance, it
-0.164451	operating system which redirects it
-0.164451	expression a = (b*c)/d, it
-0.164451	are dominating. At least, it
-0.164451	work as possible. Typically it
-0.164451	few programs do. Hence, it
-0.164451	well optimized software design, it
-0.164451	particular weakness or bottleneck, it
-0.164451	a new software project, it
-0.164451	a matter of habit, it
-0.164451	program. All in all, it
-0.164451	or 1 by XOR'ing it
-0.164451	not negative by AND'ing it
-0.164451	and references. Most importantly, it
-0.164451	the same as reflecting it
-0.164451	difficult cases like these, it
-0.164451	or iterative in nature, it
-0.831552	// This is the function
-0.806011	the value of the function
-0.800938	the performance of the function
-0.560916	return address of the function
-0.346967	the changes of the function
-0.346967	the scope of the function
-0.062917	keyword static to the function
-0.502862	any call to the function
-0.336738	made available to the function
-0.336738	adding throw() to the function
-0.343438	inlined function and the function
-0.444018	library function, and the function
-0.306996	is available in the function
-0.306996	also available in the function
-0.449175	be declared in the function
-0.347523	The dot in the function
-0.350892	is consistent for the function
-0.607429	the compiler that the function
-0.195504	the disadvantage that the function
-0.429541	function, means that the function
-0.607429	the possibility that the function
-0.429541	function, provided that the function
-0.334876	or reference, or the function
-0.719394	is advantageous if the function
-0.340619	const reference if the function
-0.340619	table lookup if the function
-0.340619	is pure if the function
-0.451824	is replaced by the function
-0.704524	be obtained with the function
-0.341176	as soon as the function
-1.044326	is faster than the function
-0.043203	than each time the function
-0.091184	value each time the function
-0.032103	the first time the function
-0.144941	list every time the function
-0.144941	branches every time the function
-0.144941	misprediction every time the function
-0.145702	the next time the function
-0.309507	also used when the function
-0.309507	needed even when the function
-0.401663	goes automatically when the function
-0.458766	be initialized when the function
-0.127746	is deallocated when the function
-0.127746	are deallocated when the function
-0.309507	is freed when the function
-0.348057	when returning from the function
-0.972076	you look at the function
-0.715251	is to make the function
-0.498063	want to make the function
-0.803240	64-bit mode because the function
-0.342526	intrinsic functions, but the function
-0.345522	addresses (i.e. where the function
-0.414731	be called before the function
-0.292575	the stack before the function
-0.292575	be freed before the function
-0.292575	be restored before the function
-0.778131	have to call the function
-0.445382	use in case the function
-0.484577	to look up the function
-0.412155	how many times the function
-0.453982	If you want the function
-0.533912	added information about the function
-0.419491	statement that calls the function
-0.302146	size arrays inside the function
-0.058151	be declared inside the function
-0.960253	takes to calculate the function
-0.232529	compiler to inline the function
-0.168018	it cannot inline the function
-0.307677	the optimization unless the function
-0.307677	more clear unless the function
-0.329534	const reference allows the function
-0.092855	object's class. Make the function
-0.092855	the object. Make the function
-0.092855	function returns. Make the function
-0.092855	following alternatives: Make the function
-0.206563	more than calling the function
-0.277577	array before calling the function
-0.321868	call method. When the function
-0.305109	page 93. Avoid the function
-0.334552	a variable until the function
-0.229829	otherwise optimize across the function
-0.099886	The dispatcher changes the function
-0.099886	keyword __fastcall changes the function
-0.285031	You may declare the function
-0.229829	used for giving the function
-0.371598	inlined by declaring the function
-0.229829	meaning. 2. Put the function
-0.229829	they are. Declare the function
-0.163944	frame function is a function
-0.163944	pure function is a function
-0.163944	leaf function is a function
-0.542299	alloca. This is a function
-0.667759	My example is a function
-0.672163	jumping out of a function
-0.334669	return type of a function
-0.471425	multiple versions of a function
-0.432992	the end of a function
-0.315408	an object to a function
-0.315408	composite objects to a function
-0.315408	The link to a function
-0.315408	main executable to a function
-0.476621	be applied to a function
-0.502804	used variables in a function
-0.342163	and branches in a function
-0.342163	by piece in a function
-0.395024	cannot know that a function
-0.430106	code Assume that a function
-0.304129	convention says that a function
-0.338774	different module or a function
-0.307315	parameter, or as a function
-0.141725	is implemented as a function
-0.562145	same name as a function
-0.307315	by assignment, as a function
-0.325913	template rather than a function
-0.325913	parameter rather than a function
-0.478256	system may have a function
-0.272835	The next time a function
-0.272835	returns. Every time a function
-0.342348	functions must use a function
-0.332069	delay comes when a function
-0.280581	how much memory a function
-0.560466	ways to make a function
-0.281921	this problem. If a function
-0.517496	AVX part. If a function
-0.281921	and BSD. If a function
-0.334811	efficient as using a function
-0.390492	the difference between a function
-0.494270	time to call a function
-0.090043	efficient. Splitting up a function
-0.090043	rule. Splitting up a function
-0.280581	frame function, while a function
-0.280581	AVX support calls a function
-0.380914	a function through a function
-0.292649	is called through a function
-0.292649	own address through a function
-0.203921	the table inside a function
-0.102849	objects declared inside a function
-0.310736	the expression contains a function
-0.225908	likely to inline a function
-0.321271	compiler can replace a function
-0.098428	dispatcher then sets a function
-0.098428	initialization routine sets a function
-0.280581	called, or what a function
-0.225908	request for inlining a function
-0.300450	is useful whenever a function
-0.225908	} } Obviously, a function
-0.225908	kind of exceptions a function
-0.280581	never used. Whenever a function
-0.280581	Function pointers Calling a function
-0.225908	(Linux only). Specifies a function
-0.225908	function inline. Replacing a function
-0.225908	fact by replacing a function
-0.225908	class definition. Inlining a function
-0.225908	develop and publish a function
-0.023375	a table // of function
-0.825910	an excessive number of function
-0.507492	below. The disadvantage of function
-0.557938	1.0f;} The advantages of function
-0.128118	10 2.6 Choice of function
-0.128118	compiler. 2.6 Choice of function
-0.235977	allow lazy binding of function
-0.235977	about the chain of function
-0.323618	x86-64 platforms. Comparison of function
-0.293971	translate these addresses to function
-0.344412	pure. Virtual functions and function
-0.303386	use only compilers and function
-0.303386	The Intel compilers and function
-0.330175	finished. Register allocation and function
-0.282885	target of branches and function
-0.211067	with many branches and function
-0.817157	... } } The function
-0.622070	at compile time. The function
-0.727629	from the function. The function
-0.327405	example 14.19 below. The function
-0.234303	on first call. The function
-0.234303	on Intel/x86-compatible microprocessors. The function
-0.531688	the following reasons: The function
-0.234303	non-vector library. 119 The function
-0.290115	without using exceptions. The function
-0.234303	only one instance. The function
-0.572773	cannot be used for function
-0.293393	save recovery information for function
-0.650834	transpose(double a[SIZE][SIZE]) { // function
-0.043895	columns in matrix // function
-0.235789	short int cc[]); // function
-0.236081	with other compilers or function
-0.236081	has many branches or function
-0.236081	no try block or function
-0.314225	the time spent on function
-0.456801	other optimizations such as function
-0.235630	allow vector objects as function
-0.312017	transferring the variable as function
-0.277378	intrinsic functions // This function
-0.277378	SSE3 required // This function
-0.690728	} } } This function
-0.288419	for different compilers. This function
-0.232811	the CPU dispatching. This function
-0.232811	check for overflow. This function
-0.317476	the performance of this function
-0.317476	The name of this function
-0.232212	to 0 // this function
-0.483166	You can use this function
-0.232212	able to inline this function
-0.231285	the innermost loop A function
-0.231285	Make functions local A function
-0.231285	are mutually incompatible. A function
-0.231285	the function body. A function
-0.231285	make a destructor. A function
-0.237214	AVX, AVX2 Mathematical vector function
-0.339533	other details that make function
-0.344595	fastcall modifier can make function
-0.348204	by using a different function
-0.349317	Comparing performance of different function
-0.342689	complication that the same function
-0.443075	class). If the same function
-0.689802	are using the same function
-0.342689	to calculate the same function
-0.640457	doesn't call any other function
-0.237102	dispatcher function decides which function
-0.288344	making sure that one function
-0.288344	dynamically created by one function
-0.421495	or transferred from one function
-0.324203	to recommend that no function
-0.619321	the address of each function
-0.438700	only once for each function
-0.279492	extra code at each function
-0.224948	practice to test each function
-0.165189	number of times each function
-0.332674	how many times each function
-0.237008	would be able do function
-0.313676	generally used that most function
-0.354197	code further by using function
-0.700532	Note that the Intel function
-0.327151	detection function in Intel function
-0.595100	are using an Intel function
-0.229166	the well optimized Intel function
-0.558538	to call the library function
-0.305306	and economize the library function
-0.201204	log is a library function
-0.201204	ReadTSC as a library function
-0.276502	vector register. The library function
-0.276502	The undocumented Intel library function
-0.292953	line separately through multiple function
-1.014665	if there are many function
-0.291300	An application with many function
-0.291300	CPU-intensive applications with many function
-0.288648	declared outside of any function
-0.288648	will interfere with any function
-0.236566	no clear correspondence between function
-0.265678	(i.e. if the member function
-0.162729	object, and a member function
-0.162729	pointer or a member function
-0.319515	internally as a member function
-0.162729	support. Make a member function
-0.162729	class. Calling a member function
-0.162729	can force a member function
-0.216981	variable, pointer or member function
-0.452403	to a class member function
-0.169363	members. But each member function
-0.007862	functions. A static member function
-0.015869	stack. A static member function
-0.169363	to. A const member function
-0.347130	call a virtual member function
-0.169363	static static Assume member function
-0.038252	than a non-static member function
-0.216981	of a polymorphic member function
-0.324278	call to a const function
-0.348913	the stack. This makes function
-0.232326	standard stack frame makes function
-0.855191	version of the critical function
-0.286367	every time the critical function
-0.119996	that calls the critical function
-0.119996	16.2 calls the critical function
-0.286367	size. When the critical function
-0.390832	versions of a critical function
-0.021048	... // Call critical function
-0.378283	instance of the template function
-0.290500	above example, the template function
-0.308207	faster than the simple function
-0.342036	as calling a simple function
-0.342651	page 107). The Gnu function
-0.349575	probably without information about function
-0.329285	faster despite the extra function
-0.235578	manual for details. Use function
-0.343167	application-specific code. The best function
-0.294386	handling for a single function
-0.383043	not only a single function
-0.294386	when just a single function
-0.228046	etc. in vectors. These function
-0.228046	"Integrated Performance Primitives". These function
-0.454866	version of the virtual function
-0.181674	bypassed when the virtual function
-0.181674	by avoiding the virtual function
-0.224521	version of a virtual function
-0.224521	lookup for a virtual function
-0.228356	and misprediction of virtual function
-0.228356	// Call to virtual function
-0.228356	jump tables, and virtual function
-0.179543	avoid the inefficient virtual function
-0.307207	Call critical function through function
-0.228434	pointers to data through function
-0.311796	works best. Some common function
-0.425823	locally in the thread function
-0.324181	calculation of the power function
-0.227163	good compilers and optimized function
-0.517640	of the best optimized function
-0.235254	Therefore, the dispatcher 128 function
-0.311349	Windows allows only four function
-0.291910	later deleted by another function
-0.201535	code by making another function
-0.083221	if F1 calls another function
-0.083221	If F1 calls another function
-0.201535	object file. Use another function
-0.216530	Use macro as inline function
-0.269960	by using an inline function
-0.216530	inline functions An inline function
-0.225392	textbooks recommend that every function
-0.279996	debug breakpoints at every function
-0.279796	to have a standard function
-0.225215	Intel compiler includes standard function
-0.276294	by calling the intrinsic function
-0.222125	sense that each intrinsic function
-0.501893	defined in a separate function
-0.454793	isolated into a separate function
-0.552864	vectors There are various function
-0.242282	Note that the dispatcher function
-0.242282	loader calls the dispatcher function
-0.245538	being initialized. The dispatcher function
-0.194865	hardware conditions. A dispatcher function
-0.233632	exception handling information. Each function
-0.162940	call to a graphics function
-0.204733	processing unit. Various graphics function
-0.233514	for speed-critical functions. Many function
-0.439556	address of a linked function
-0.271727	code which the calling function
-0.218092	global object. The calling function
-0.233292	Linux. Asmlib My own function
-0.531961	point to the appropriate function
-0.528731	in the Gnu C function
-0.867943	pointer to the desired function
-0.231906	need separate storage. No function
-0.210063	use the Intel math function
-0.210063	the best optimized math function
-0.156381	copy of the inlined function
-0.319782	call to the inlined function
-0.068840	by inlining the frame function
-0.068840	by turning the frame function
-0.196191	simpler than a frame function
-0.068840	frame functions. A frame function
-0.068840	an exception. A frame function
-0.189497	throw() throw() throw() Assume function
-0.189497	const)) __attribute(( const)) Assume function
-0.189497	__restrict #pragma ivdep Assume function
-0.339429	to find the right function
-0.283867	#include "asmlib.h" // Define function
-0.283867	SelectAddMul_AVX2, SelectAddMul_dispatch; // Define function
-0.230810	on Intel CPU’s. Another function
-0.306439	versions of an overloaded function
-0.230605	NotPolymorphic(); }; // Any function
-0.229881	function. The string length function
-0.108230	that is a linear function
-0.108230	This is a linear function
-0.278785	the ability to define function
-0.278785	define fprintf // define function
-0.329506	code. If the latter function
-0.303574	critical. A very time-consuming function
-0.150183	calls to a pure function
-0.109043	Pure functions A pure function
-0.109043	loop-invariant code containing pure function
-0.109043	subexpressions that contain pure function
-0.109043	when it involves pure function
-0.195632	implementation of the factorial function
-0.195632	take the integer factorial function
-0.264536	and to optimize across function
-0.246211	from making optimizations across function
-0.282588	to use the memcpy function
-0.079268	use the CPU detection function
-0.037828	replace the CPU detection function
-0.079268	bypass the CPU detection function
-0.071263	the Intel CPU detection function
-0.571767	to call a polymorphic function
-0.282703	go here // Virtual function
-0.282242	be justified for general function
-0.225993	be predicted well. Even function
-0.300551	is used for storing function
-0.226225	matrix // call transpose function
-0.176275	Function Assembly name Intrinsic function
-0.176275	are summarized below. Intrinsic function
-0.093121	even if the dispatched function
-0.093121	table. If a dispatched function
-0.132711	// Entry to dispatched function
-0.093121	function calls another dispatched function
-0.224162	not support SSE. Several function
-0.313011	instrset_detect(); 116 // Set function
-0.039865	function into a leaf function
-0.039865	turned into a leaf function
-0.010595	other function. A leaf function
-0.275711	before test // Critical function
-0.164638	used inside the pow function
-0.164638	pow(x,10); } The pow function
-0.221772	represent a monotonically increasing function
-0.074621	versions of the strlen function
-0.074621	have tested the strlen function
-0.271862	vectors, but the asmlib function
-0.217813	and make a round function
-0.271637	implementation of the lrint function
-0.218012	function #pragma optimize(...) Fastcall function
-0.148257	functions directly: Library exp function
-0.148257	of 4 floats exp function
-0.211555	of the virtual 53 function
-0.211555	graphics library or API function
-0.264339	temporary intermediates, loop counters, function
-0.211817	Taylor series. The exponential function
-0.264339	should choose an up-to-date function
-0.211555	is equally efficient. Simple function
-0.211817	0; } The InstructionSet() function
-0.199304	0; } The indirect function
-0.199304	backwards though the 61 function
-0.250529	not possible to distribute function
-0.199304	appropriate. Compiler-specific keywords Fast function
-0.035657	by 4 ; mangled function
-0.035657	esp ;alignby4 ; mangled function
-0.199304	#pragma vector always Optimize function
-0.199304	thread-safe functions. A thread-safe function
-0.164498	compilers that a user-defined function
-0.164498	performance problems. Avoid nested function
-0.164498	member function or friend function
-0.164498	x); } // Branch/loop function
-0.164498	define your own error-handling function
-0.164498	it is a staircase function
-0.164498	job before you. Optimized function
-0.164498	#include "instrset_detect.cpp" // instrset_detect function
-0.164498	and call the std::unexpected() function
-0.164498	this example, the DelayFiveSeconds function
-0.164498	= sin(0.8); The sin function
-0.164498	function was called from), function
-0.576943	0x2C so that the if
-0.314139	This is how the if
-0.237409	the while loop, the if
-0.711727	the same thing and if
-0.236851	another function, etc., and if
-0.381399	a Pentium 4. The if
-0.236878	DoThisThreeTimesAWeek(); } 135 The if
-0.511575	this code is that if
-0.344655	vector library is that if
-0.376095	execution. This means that if
-0.376095	cycle. This means that if
-0.044055	// _controlfp(0, _EM_OVERFLOW); // if
-0.226364	keyword is used or if
-0.226364	be ruled out or if
-0.281098	is very large or if
-0.055088	it is small or if
-0.055088	function is small or if
-0.118193	is very small or if
-0.226364	to valid values or if
-0.281098	both are negative or if
-0.226364	cannot be predicted or if
-0.226364	in library functions, or if
-0.226364	has side effects or if
-0.226364	do not overlap or if
-0.226364	simple periodic pattern or if
-0.226364	large memory blocks, or if
-0.226364	estimate is correct or if
-0.226364	network is unstable or if
-0.226364	want this initialization, or if
-0.226364	of valid addresses, or if
-0.348085	to inline a function if
-0.348085	for inlining a function if
-0.538238	and vectorize the code if
-0.404622	return an error code if
-0.235531	copy is dead code if
-0.550436	becomes the same as if
-0.236157	variable or object as if
-0.274427	is used, but not if
-0.274427	in memory, but not if
-0.274427	a float, but not if
-0.234121	value, n. But not if
-0.237083	& operator[] (unsigned int if
-1.290183	is more efficient than if
-0.234910	operands are variables than if
-0.234910	in binary form than if
-0.357454	warning from the compiler if
-0.345068	< 20; i++) { if
-0.005205	a, bool b) { if
-0.350470	(a == 0) { if
-0.179109	(a != 0) { if
-0.179109	(n != 0) { if
-0.463734	SomeFunction (int n) { if
-0.516093	void F3(bool y) { if
-0.646731	takes no extra time if
-0.291817	should allow compile- time if
-0.417032	c = 0; } if
-0.600284	a * 3; } if
-0.289883	supported return &CriticalFunction_AVX; } if
-0.524947	segmentation of the memory if
-0.339497	accessed sequentially in memory if
-0.414897	variable from RAM memory if
-0.559001	handler in the program if
-0.449970	to virtual member functions if
-0.328313	functions. Avoid virtual functions if
-0.312099	priority thread, and only if
-0.312099	14.14b automatically but only if
-0.221908	objects) are possible only if
-0.270603	course, this works only if
-0.270603	and 14.13b works only if
-0.221908	set can run only if
-0.221908	use such methods only if
-0.221908	polymorphism is needed only if
-0.221908	specification to F1 only if
-0.237196	by a blend instruction if
-0.548411	conversion to floating point if
-0.427358	check before the loop if
-0.330170	check after the loop if
-0.465273	variable outside the loop if
-0.510859	out of a loop if
-0.320342	roll out a loop if
-0.223275	there is no loop if
-0.223275	;checkifi<100 ; repeat loop if
-0.223275	for avoiding infinite loop if
-0.236899	by me manually, but if
-0.781746	method can be used if
-0.478915	map can be used if
-0.501416	tree may be used if
-0.296948	should only be used if
-0.296948	can still be used if
-0.335251	be joined into one if
-0.872139	in the code cache if
-0.420983	range of an integer if
-0.325065	variable as an integer if
-0.412465	to a signed integer if
-1.609253	the SSE2 instruction set if
-1.175955	or later instruction set if
-0.284058	of it, for example if
-0.284058	in parts, for example if
-0.292570	float rather than double if
-0.342855	most efficient integer size if
-0.347167	// Set function pointer if
-0.234014	with a & b if
-0.234014	with a | b if
-0.236470	a re- usable library if
-0.236505	by making them static if
-0.223820	more compact and efficient if
-0.452456	made much more efficient if
-0.295572	type is most efficient if
-0.384498	statements are most efficient if
-0.543964	It is less efficient if
-0.515886	propagation is not possible if
-0.047069	This is only possible if
-0.047069	this is only possible if
-0.236585	use the static version if
-0.292615	or remove any objects if
-0.236163	more than one variable if
-0.236320	global and static variables if
-0.474300	a power of 2 if
-0.385867	avoid powers of 2 if
-0.441441	use a lookup table if
-0.908877	may improve the performance if
-0.326441	considerable improvement in performance if
-0.232247	the best possible branch if
-0.232247	into a single branch if
-0.313610	a more efficient way if
-0.364756	time. This is faster if
-0.364756	The method is faster if
-0.279418	The access is faster if
-0.510799	a constant is faster if
-0.199734	a vector goes faster if
-0.005032	2 // Still faster if
-0.005032	faster // Still faster if
-0.345556	a virtual function call if
-0.255493	Boolean vector. For example, if
-0.255493	one operation. For example, if
-0.255493	unroll factor. For example, if
-0.255493	case conditions. For example, if
-0.255493	data structures. For example, if
-0.255493	clock frequency. For example, if
-0.255493	it exits. For example, if
-0.255493	means modulo. For example, if
-0.255493	is minimized. For example, if
-0.323409	the dividend to unsigned if
-0.866775	stored in a register if
-0.035849	// of function pointers if
-0.316566	implementation of member pointers if
-0.333504	to use 64-bit systems if
-0.493603	unacceptable to the user if
-0.877850	This method is useful if
-0.551202	This may be useful if
-0.183816	for different arrays even if
-0.183816	operator (|) works even if
-0.183816	10 clock cycles even if
-0.183816	not an Intel, even if
-0.183816	can be mispredicted even if
-0.183816	likely be called, even if
-0.183816	computer starts up, even if
-0.183816	require more resources, even if
-0.183816	whole program execution, even if
-0.183816	that (b*c) overflows, even if
-0.183816	the exception handler, even if
-0.339950	may avoid this method if
-0.236023	can be left out if
-0.351353	a protected operating system if
-0.235759	0; // return 0 if
-0.446383	This is the case if
-0.489722	YMM registers are available if
-0.371148	size are only available if
-0.235797	will be filled up if
-0.347020	will detect an error if
-0.235641	stored can be important if
-0.283905	on several different CPUs if
-0.228837	on contemporary 106 CPUs if
-0.235162	at compile time while if
-0.291380	case of large arrays if
-0.337052	efficient than 64-bit Windows if
-0.235590	get the same result if
-0.330557	automatic vectorization works best if
-0.344499	often. This is necessary if
-0.326119	they are not necessary if
-0.332951	of an array element if
-0.200236	three clock cycles. But if
-0.200236	soon be obsolete. But if
-0.200236	it was programmed. But if
-0.200236	see the delay. But if
-0.200236	a cache miss. But if
-0.200236	other resource conflicts. But if
-0.235370	can actually reduce speed if
-0.354138	Example 7.20 int i; if
-0.058484	in a separate thread if
-0.205500	into a separate thread if
-0.404097	may use 64-bit integers if
-0.235096	the source annotation option if
-0.322918	Single precision is good if
-0.336710	may use single precision if
-0.234820	remove the memset line if
-0.453748	code can be optimized if
-0.226468	a[1000]; float b[1000]; }; if
-0.226468	"Beta", "Gamma", "Delta" }; if
-0.166046	x, y; bool b; if
-0.166046	y, z; bool b; if
-0.216929	You have to check if
-0.216929	You need to check if
-0.216929	often necessary to check if
-0.266477	zero We can check if
-0.178047	0x7FFFFFFF) { // check if
-0.301369	dispatcher does not check if
-0.178047	the function must check if
-0.178047	variables as input check if
-0.291549	a function is advantageous if
-0.291549	This method is advantageous if
-0.284157	may not be advantageous if
-0.370531	This may be advantageous if
-0.332894	operators is more advantageous if
-0.226011	This is no problem if
-0.226011	a very big problem if
-0.334205	is not an advantage if
-0.234378	1; // always 1 if
-0.566009	in 64 bit mode if
-0.279390	set the denormals-are-zero mode if
-0.478277	might have other values if
-0.224540	be predicted quite well if
-0.304092	is usually predicted well if
-0.420395	several hundred clock cycles if
-0.324594	only 2-3 clock cycles if
-0.441671	110; int i; ... if
-0.224151	i; float list[size]; ... if
-0.329844	and therefore not recommended if
-0.223682	constants is very fast if
-0.223682	which is calculated fast if
-0.234495	for function F1. However, if
-0.233922	faster than 32-bit programs if
-0.223125	Gnu compilers without problems if
-0.277426	software can cause problems if
-0.158186	function pointer if else if
-0.322414	if else if else if
-0.348762	= &CriticalFunction_AVX; } else if
-0.233889	of the first application if
-0.233793	unroll a loop automatically if
-0.219691	look at to see if
-0.219691	final result to see if
-0.219691	some measurements to see if
-0.219691	output listing to see if
-0.222179	the simplest possible implementation if
-0.403436	use the software implementation if
-0.341492	a little more complicated if
-0.233689	of the above methods if
-0.200552	therefore be a disadvantage if
-0.200552	be at a disadvantage if
-0.309414	The value is zero if
-0.233838	it returns. But what if
-0.415366	away a const reference if
-0.341159	with a table lookup if
-0.309343	rather than at runtime if
-0.116180	constructor is not needed if
-0.331254	are also stored together if
-0.233329	the code becomes bigger if
-0.232780	speed by using vectors if
-0.233085	options. I don't know if
-0.232920	may give inconsistent results if
-0.232467	entirely inside one function, if
-0.339090	cannot swap the operands if
-0.232206	placed in separate modules if
-0.232467	The code becomes smaller if
-0.232728	done at runtime here if
-0.307646	may skip this section if
-0.235397	will be cache contentions if
-0.235397	can cause cache contentions if
-0.307975	target address is predicted if
-0.232329	more resources than C if
-0.214484	make a variable global if
-0.214484	not make variables global if
-0.421911	predict a switch statement if
-0.213057	it can cause errors if
-0.213057	and cause fatal errors if
-0.293215	can be very inefficient if
-0.264214	can be quite inefficient if
-0.231581	for overflow by checking if
-0.231351	choice for Linux platforms if
-0.215223	that can be vectorized if
-0.215223	can also be vectorized if
-0.331090	much about the costs if
-0.231017	function is usually inlined if
-0.839796	a, b, c, d; if
-0.327017	not make a destructor if
-0.410941	may not be safe if
-0.169241	This is only safe if
-0.169241	generally not thread safe if
-0.169241	program is exception safe if
-0.231128	unroll the loop further if
-0.231017	advanced and complicated algorithm if
-0.321374	n from the exponent if
-0.328019	to the hard disk if
-0.765869	best performance is obtained if
-0.090631	is accessed most efficiently if
-0.101665	cache works most efficiently if
-0.250753	cache works less efficiently if
-0.230256	of specific CPU models if
-0.159213	is likely to fail if
-0.046871	above code will fail if
-0.046871	positive. It will fail if
-0.046871	The trick will fail if
-0.305127	target buffer can occur if
-0.317987	to predict the target if
-0.152595	of a program, especially if
-0.152595	in 32-bit systems, especially if
-0.152595	in the file, especially if
-0.152595	also time consuming, especially if
-0.229420	not need the updates if
-0.201934	then you may consider if
-0.152358	then we will consider if
-0.198048	purpose, you must consider if
-0.229844	calling the function directly if
-0.511117	need an error message if
-0.229420	multiple calculations in parallel if
-0.230268	The calculation becomes easier if
-0.202317	happen with the loops if
-0.202317	compilers will unroll loops if
-0.905286	int i; } u; if
-0.235389	int i[2]; } u; if
-0.228573	The delay is significant if
-0.351594	this would be invalid if
-0.250082	stamp counter becomes invalid if
-0.284486	a cache is organized if
-0.722316	a lot to gain if
-0.363606	serious errors can happen if
-0.199197	87. This will happen if
-0.284839	case it doesn't matter if
-0.553690	object oriented programming style if
-0.228401	be of some help if
-0.228573	skip the following explanation if
-0.369620	a function is pure if
-0.227712	element is stored (or if
-0.437707	takes one clock cycle if
-0.226832	switches are more frequent if
-0.227025	to the $B1$2 label if
-0.226639	and floating point variables, if
-0.226832	may be needed, however, if
-0.281849	on such small devices if
-0.226832	to prefetch data explicitly if
-0.301777	not use lookup tables if
-0.225703	be removed after debugging if
-0.225263	the second operand. Likewise, if
-0.225263	the time, but expensive if
-0.225263	D language allows compile-time if
-0.407589	member is more compact if
-0.313978	be omitted, of course, if
-0.399469	situation is more complex if
-0.223435	purpose is to detect if
-0.223690	without CPU dispatching. Test if
-0.278067	the conversion is costly if
-0.223435	Data caching is poor if
-0.223435	C++. This typically happens if
-0.018421	should not be evaluated if
-0.223690	This can be permissible if
-0.223435	to do so (i.e. if
-0.820079	On the other hand, if
-0.221193	address is taken, i.e. if
-0.220888	moving each object separately if
-0.221193	therefore count as true, if
-0.270601	will therefore need modification if
-0.028472	branch can be eliminated if
-0.028472	reference can be eliminated if
-0.058943	can also be eliminated if
-0.217096	the code is selected if
-0.217096	a very high resolution if
-0.013491	if unsigned // Faster if
-0.210849	in the database anyway if
-0.211345	example: // Example 7.8 if
-0.210849	may save RAM space, if
-0.210849	can cause severe delays if
-0.210849	the use of longjmp if
-0.345507	to your programming questions if
-0.486345	standard template library (STL) if
-0.264102	x=y; y=temp;} // Check if
-0.282628	in Windows) to determine if
-0.263541	can cause branch mispredictions if
-0.198617	and use multiple accumulators if
-0.463559	"assume no pointer aliasing" if
-0.463559	362880, 3628800, 39916800, 479001600}; if
-0.198617	pointer. The copy constructor, if
-0.198617	// u.f > v.f if
-0.008616	there are many branches): if
-0.008616	i; } u, v; if
-0.198617	Branches are relatively cheap if
-0.017406	Saturday }; Weekdays Day; if
-0.017406	0x40 }; Weekdays Day; if
-0.249757	be a time consumer if
-0.249757	storage should be avoided, if
-0.198617	in a separate subroutine if
-0.249757	risk of memory leaks if
-0.163864	to: // Example 14.5b if
-0.163864	// (N & N-1)==0 if
-0.163864	should not call WriteFile if
-0.163864	u; int n; 143 if
-0.163864	NAN (Not A Number) if
-0.163864	are many function calls, if
-0.163864	comparison: // Example 14.4b if
-0.163864	by extending with zero-bits if
-0.163864	by // Example 14.15b if
-0.163864	numerically largest element (approximately): if
-0.163864	(C << 6); Or, if
-0.163864	The division is inexact if
-0.163864	the code are modified, if
-0.163864	i must be adjusted if
-0.163864	false: // Example 8.10a if
-0.163864	the table at runtime, if
-0.163864	cache contentions will occur: if
-0.163864	available, 256 bits (YMM) if
-0.163864	parameter, and the destructor, if
-0.163864	by extending the sign-bit if
-0.163864	work cannot be ignored if
-0.163864	exponent is always normalized, if
-0.163864	if they are uninitialized, if
-0.163864	is 128 bits (XMM) if
-0.163864	keyword __restrict or __restrict__, if
-0.163864	more than a minute if
-0.163864	example: // Example 14.15a if
-0.163864	= 100; float list[ARRAYSIZE]; if
-0.163864	template metaprogramming. Don't panic if
-0.163864	sign must be reversed if
-0.163864	for "function level linking" if
-0.163864	This cost is minimized if
-0.163864	pointers do not alias, if
-0.564252	to a function is by
-0.053266	the value pointed to by
-0.236479	2 if possible and by
-0.313625	using static linking and by
-0.236479	to become invalid, and by
-0.236479	the clock period and by
-0.344296	the frame function or by
-0.232705	an unsigned int or by
-0.320615	inline or static or by
-0.413448	constant single precision or by
-0.288299	the installation process or by
-0.232705	by the latency or by
-0.232705	by consecutive indices or by
-0.232705	integer is signed, or by
-0.293039	predictable then replace it by
-0.236873	you should multiply it by
-0.441316	for a single function by
-0.044002	into a leaf function by
-0.344594	interprets the intermediate code by
-0.444786	systems use position-independent code by
-0.291624	libraries and compiler-generated code by
-0.237084	the operating system, not by
-0.400748	point code rather than by
-0.400748	by 8 rather than by
-0.308766	operating system rather than by
-0.308766	the container rather than by
-0.308766	execution units rather than by
-0.308766	development tools, rather than by
-0.228794	or vector classes than by
-0.228794	in other ways than by
-0.228794	the best algorithm than by
-0.237313	with a different compiler by
-0.606533	sign bit of x by
-0.808668	take advantage of this by
-0.223327	can tell it this by
-0.587174	you can do this by
-0.223327	value. It does this by
-0.581660	You can avoid this by
-0.011049	compiler may replace this by
-0.074405	compiler will replace this by
-0.223327	You can improve this by
-0.223327	I have confirmed this by
-0.343278	can obtain much more by
-0.235743	relocated (rebased) once more by
-0.342833	of functions in memory by
-0.557223	spots in the program by
-0.438845	for the whole program by
-0.237028	only for speed-critical functions by
-0.355693	warm up the CPU by
-0.121520	roll out the loop by
-0.055791	rolling out the loop by
-0.308937	FuncC. Unrolling the loop by
-0.269126	will optimize this loop by
-0.001363	// Roll out loop by
-0.497409	of the innermost loop by
-0.335878	class that is used by
-0.335878	unwinding that is used by
-0.558091	code that are used by
-0.323690	optimization manuals are used by
-0.274466	also the time used by
-0.096408	free the memory used by
-0.096408	and data memory used by
-0.448166	addresses are often used by
-0.220511	line that was used by
-0.252156	source files into one by
-0.252156	.cpp modules into one by
-0.232527	elements were inserted, one by
-0.348333	the contrary, you should by
-0.292975	of f is set by
-0.437326	of its child class by
-0.334665	to modify a double by
-0.236796	to a longer size by
-0.230928	compiler has replaced i by
-0.022986	order to divide i by
-0.350637	pointer. Accessing an object by
-0.520346	nonzero floating point number by
-0.472798	of the CPU clock by
-0.462813	take the absolute value by
-0.236340	replacing an integer variable by
-0.313002	static and global variables by
-1.681866	a power of 2 by
-0.287833	be reduced to 2 by
-0.236277	x // align table by
-0.505620	can improve the performance by
-0.567841	may improve the performance by
-0.228706	fallacy of measuring performance by
-0.335609	to replace the branch by
-0.337550	automatically replace a branch by
-0.228385	a poorly predictable branch by
-0.236686	align its b member by
-0.236547	clear and intelligible way by
-0.236616	make member functions faster by
-0.236613	and for information stored by
-0.343370	(DLL) which is called by
-0.099017	if all functions called by
-0.099017	any library functions called by
-0.613866	a particular memory address by
-0.232037	can calculate each address by
-0.486618	replace a function call by
-0.305899	a lot of optimization by
-0.017699	8.4 Obstacles to optimization by
-0.017699	8.3 Obstacles to optimization by
-1.152309	are transferred in registers by
-0.306234	the 512-bit ZMM registers by
-0.431815	avoided in 64-bit systems by
-0.235883	container for exclusive access by
-0.364639	cannot be ruled out by
-0.098013	loop is rolled out by
-0.098013	a list, rolled out by
-0.333452	that created a file by
-0.291813	had a different type by
-0.291573	can avoid this error by
-0.476960	if it is accessed by
-0.045868	it is not accessed by
-0.045868	function is not accessed by
-0.283661	large objects and arrays by
-0.228623	is to replace arrays by
-0.311777	Sum3 in 32-bit Windows by
-0.235599	method that delays execution by
-0.235712	get a better result by
-0.334250	increased to 16 bytes by
-0.331753	can double the speed by
-0.100374	to gain in speed by
-0.100374	you gain in speed by
-0.518223	might check for overflow by
-0.094260	called. This is done by
-0.094260	loop. This is done by
-0.214818	address. Relocation is done by
-0.175180	be standardized and done by
-0.877923	This can be done by
-0.262064	constant should be done by
-0.175180	work it has done by
-0.175180	to 15.1c was done by
-0.175180	is not necessarily done by
-0.348225	constants are double precision by
-0.227041	may replace this line by
-0.227041	is and interpreted line by
-0.405461	program optimization. This works by
-0.321930	can often be optimized by
-0.177274	function can be calculated by
-0.177274	which can be calculated by
-0.159119	counter can be calculated by
-0.177274	condition can be calculated by
-0.175994	r+i/2 could be calculated by
-0.290874	memory a function uses by
-0.311105	one auto_ptr to another by
-0.329985	and therefore not advantageous by
-0.320825	15.1b. Branches are implemented by
-0.226765	This is typically implemented by
-0.325528	can reduce the problem by
-0.256534	to avoid this problem by
-0.256534	has solved this problem by
-0.061161	instruction set is supported by
-0.097620	reasons. C++ is supported by
-0.151846	if AVX is supported by
-0.180250	in Linux and supported by
-0.040859	Intrinsic functions are supported by
-0.040859	XMM registers are supported by
-0.040859	These directives are supported by
-0.062827	only available if supported by
-0.062827	or __restrict__, if supported by
-0.439024	be 0 or 1 by
-0.234774	calculate the table values by
-0.160313	floating point number simply by
-0.160313	signal an error simply by
-0.160313	size is done simply by
-0.160313	array is implemented simply by
-0.160313	floating point numbers simply by
-0.160313	can be copied simply by
-0.160313	for CPU brand simply by
-0.160313	It is measured simply by
-0.160313	the performance significantly simply by
-0.881412	of a loop counter by
-0.329131	for saving memory space by
-0.399141	lot of cache space by
-0.299032	may avoid the multiplication by
-0.088764	abs(v.f) } The multiplication by
-0.088764	matrix element. The multiplication by
-0.251826	often replace integer multiplication by
-0.234014	are often inlined automatically by
-0.294396	point number is zero by
-0.324584	be initialized to zero by
-0.016229	is faster than division by
-0.343485	cycles). Floating point division by
-0.150851	can eliminate one division by
-0.136539	of 2 Integer division by
-0.136539	compile time. Integer division by
-0.136539	the processor). Integer division by
-0.136539	integer division: Integer division by
-0.233830	that is actually needed by
-0.996958	Function parameters are transferred by
-0.048273	have to be aligned by
-0.048273	class should be aligned by
-0.048273	vectors must be aligned by
-0.443473	the arrays are aligned by
-0.164687	malloc is typically aligned by
-0.164687	vectors are preferably aligned by
-0.233148	makes sense to dispatch by
-0.376469	template class is declared by
-0.233148	// incremented every second by
-0.233781	heavy background calculations piece by
-0.000190	address that is divisible by
-0.000380	constant that is divisible by
-0.001140	address which is divisible by
-0.000285	loop count is divisible by
-0.003430	certain to be divisible by
-0.003430	SIZE must be divisible by
-0.001712	points is not divisible by
-0.001712	iterations is not divisible by
-0.006888	arrays. Array size divisible by
-0.000273	to an address divisible by
-0.000068	at an address divisible by
-0.000855	alignment to addresses divisible by
-0.000855	structures to addresses divisible by
-0.001712	both have addresses divisible by
-0.001712	round memory addresses divisible by
-0.232825	speeded up significantly just by
-0.215631	a variable even smaller by
-0.215631	often be made smaller by
-0.583006	a specific CPU core by
-0.308140	been incremented to 5 by
-0.013640	the function is replaced by
-0.013640	function, m is replaced by
-0.013640	operation. x*8 is replaced by
-0.042278	if possible, and replaced by
-0.091066	size can be replaced by
-0.091066	multiplication can be replaced by
-0.091066	constants can be replaced by
-0.091066	12.4b can be replaced by
-0.045149	pointers may be replaced by
-0.045149	constants will be replaced by
-0.045149	can sometimes be replaced by
-0.042278	template parameters are replaced by
-0.042278	compiler have been replaced by
-0.042278	has its parameters replaced by
-0.232376	2n and not negative by
-0.307963	will conclude this section by
-0.335773	likely to be predicted by
-0.307819	are: Avoid the conversions by
-0.232422	be optional and off by
-0.287618	for multiplying the index by
-0.058784	This can be avoided by
-0.029138	example can be avoided by
-0.029138	pointers can be avoided by
-0.029138	constant can be avoided by
-0.029138	Jumps can be avoided by
-0.029138	(b+c) can be avoided by
-0.238857	penalty should be avoided by
-0.078738	can sometimes be avoided by
-0.231802	advantage of this fact by
-0.198297	the CPU is limited by
-0.198297	the performance is limited by
-0.188918	likely to be limited by
-0.306829	function to be inlined by
-0.286570	to replace a database by
-0.425025	constructor and the destructor by
-0.286468	that we may save by
-0.231364	optimize the code further by
-0.231815	you may improve efficiency by
-0.231725	odd and you unroll by
-0.230860	vector operations when alignment by
-0.230763	Example 7.34b. Replace macro by
-0.208067	n. You can divide by
-0.208067	shift right = divide by
-0.406346	best performance is obtained by
-0.164944	modern microprocessors is obtained by
-0.164944	user interface is obtained by
-0.252680	can sometimes be obtained by
-0.316558	be achieved more efficiently by
-0.498587	variable can be changed by
-0.230481	the call to square by
-0.230272	arrays and big structures by
-0.230035	} An array initialized by
-0.096114	that can be improved by
-0.016268	This can be improved by
-0.033170	performance can be improved by
-0.033170	processors can be improved by
-0.033170	projects can be improved by
-0.032024	speed will be improved by
-0.032024	can probably be improved by
-0.229920	5 times faster either by
-0.412390	an object is copied by
-0.254031	swap elements // align by
-0.254031	+ esp ; align by
-0.229920	automatically replace such loops by
-0.229123	functions in a module by
-0.405257	a lot to gain by
-0.164320	is nothing to gain by
-0.219383	The insight you gain by
-0.284087	address of each row by
-0.211403	that it can multiply by
-0.118724	by constant = multiply by
-0.044449	code and lazy binding by
-0.306759	example 14.7b is converted by
-0.282279	with just two additions by
-0.227405	set was originally designed by
-0.227249	factor to multiply j by
-0.190081	to vectorize code explicitly by
-0.190081	specify the alignment explicitly by
-0.190511	fast ways of multiplying by
-0.190511	is faster than multiplying by
-0.227093	can eliminate this jump by
-0.085204	computation time is determined by
-0.085204	time slices is determined by
-0.162247	available can be determined by
-0.106787	some cases be determined by
-0.095232	task is often determined by
-0.321305	disk. Provoke cache misses by
-0.567550	number of context switches by
-0.225915	19 Literature Other manuals by
-0.314597	be made more compact by
-0.278580	condition. Replacing two comparisons by
-0.223937	more space 91 step by
-0.224143	does not alias anything by
-0.295137	may avoid multiple inheritance by
-0.221428	problem can be overcome by
-0.008053	at the code generated by
-0.069038	compiler. Object files generated by
-0.069038	of the comments generated by
-0.221428	must be dynamically created by
-0.221182	references instead of pointers, by
-0.359658	CPUs can be increased by
-0.114079	is slow // Division by
-0.114079	is much faster. Division by
-0.114079	where it matters: Division by
-0.221182	advantage of these guidelines by
-0.221428	the results are combined by
-0.217693	(a<b && b<c) Multiply by
-0.217693	32-bit Windows. Does not, by
-0.067720	point number by 2n by
-0.067720	can divide by 2n by
-0.354866	behavior can be prevented by
-0.217998	effect can be illustrated by
-0.217693	composite objects are returned by
-0.270930	compiler optimize example 8.26a by
-0.035588	containers should be identified by
-0.017429	order but are identified by
-0.017429	If objects are identified by
-0.035588	index. Are objects identified by
-0.035588	its value is multiplied by
-0.017429	counts should be multiplied by
-0.017429	index must be multiplied by
-0.035588	plus an index multiplied by
-0.010090	that may be modified by
-0.088160	that are never modified by
-0.217387	a rather unconventional manner by
-0.217387	you can avoid hyperthreading by
-0.211136	function and later deleted by
-0.211136	function) should be hidden by
-0.211136	initializing pointers to zero, by
-0.211136	must be done manually by
-0.013506	happen to be spaced by
-0.056743	three clauses are separated by
-0.056743	each clause are separated by
-0.027445	problem can be solved by
-0.027445	dilemma can be solved by
-0.048129	= 0 - Divide by
-0.048129	shift and add Divide by
-0.048129	---xx---- (-a>-b)=(a<b) ---xx---x Divide by
-0.211537	slices to 120 ms by
-0.263865	function. Provoke branch mispredictions by
-0.282967	RAM space, if necessary, by
-0.198897	check for exceptions thrown by
-0.198897	brand check is bypassed by
-0.464073	of Numerically Intensive Codes", by
-0.198897	all occurrences of ArraySize by
-0.198897	use position-independent code everywhere by
-0.198897	Intel compiler Linux Align by
-0.198897	improve the performance dramatically by
-0.198897	unsigned int before dividing by
-0.198897	The DLLs are relocated by
-0.198897	servicing. A command received by
-0.198897	the block size grows by
-0.198897	a far data segment by
-0.250071	to compose a bitfield by
-0.164121	by multiple threads Parallelization by
-0.164121	the interval [1.0, 2.0) by
-0.164121	further tested and investigated by
-0.164121	five manuals is copyrighted by
-0.164121	is slow // Modulo by
-0.164121	approximate comparison of doubles by
-0.164121	The exception is caught by
-0.164121	have to replace u[1] by
-0.164121	vectorization Automatic paralleli- zation by
-0.164121	parameters are not affected by
-0.164121	and shift operations. Multiplying by
-0.164121	into a place indicated by
-0.164121	Windows may be mitigated by
-0.164121	Several function libraries published by
-0.164121	This can be ameliorated by
-0.164121	instruction must be followed by
-0.164121	execution may be caused by
-0.164121	platform is obviously influenced by
-0.164121	This can be accomplished by
-0.164121	services only when activated by
-0.164121	who is still frustrated by
-0.291742	a linked list or with
-0.534531	new and delete or with
-0.235734	functionality without polymorphism or with
-0.291516	value and write it with
-0.291516	variable or replace it with
-0.235535	1 by XOR'ing it with
-0.235535	negative by AND'ing it with
-0.355356	inline. Replacing a function with
-0.291230	calling a simple function with
-0.328620	by making another function with
-0.329287	to a pure function with
-0.314258	usually requires log on with
-0.581148	breakpoint in the code with
-0.320436	are running this code with
-0.233871	12.7. Vector class code with
-0.326870	making highly optimized code with
-0.233871	of the user-written code with
-0.236068	set, but possibly not with
-0.236068	non-recursing template specialization, not with
-0.236027	faster with signed than with
-0.236027	with coarse-grained parallelism than with
-0.341259	This requires a compiler with
-0.234049	belong to each compiler with
-0.234049	Combining the Borland compiler with
-0.234049	very user friendly compiler with
-0.404848	able to optimize this with
-0.235684	Let me explain this with
-0.350478	and de-allocation of memory with
-0.457554	vector. Organize the data with
-0.458979	you compile the program with
-0.235195	this reason. A program with
-0.324223	the speed of functions with
-0.341255	CPU dispatching works only with
-0.206829	CPUs or a CPU with
-0.206829	run on a CPU with
-0.206829	only need a CPU with
-0.349670	way or the other with
-0.493252	semicolons in a loop with
-0.327989	is inside a loop with
-0.168729	predicted well. A loop with
-0.168729	branch prediction. A loop with
-1.031884	can also be used with
-0.351988	Intel libraries are used with
-0.349541	you divide an integer with
-0.310540	to mix simple integer with
-0.410833	T is a class with
-0.579550	wrapped into a class with
-0.890432	a structure or class with
-0.788006	things you can do with
-0.253843	in the above example with
-0.165969	repeat the above example with
-0.236733	of 8 kb size with
-0.373698	replace a && b with
-0.231343	replace a || b with
-0.231343	I have AND'ed b with
-0.357798	publish a function library with
-0.273693	incompatible. A function library with
-0.273693	Intel math function library with
-0.022516	can use this library with
-0.236870	Rather than comparing i with
-0.350582	function construct an object with
-0.343966	behaves like an array with
-0.431279	use a linear array with
-0.229743	make a variable-size array with
-0.416023	executable: a debug version with
-0.529535	and a release version with
-0.419845	small dynamically allocated objects with
-0.236251	A class member variable with
-0.235976	an error can return with
-0.514348	replaced by a table with
-0.339458	The vulnerability of software with
-0.232104	and swap these elements with
-0.232104	fail to distinguish elements with
-0.650508	floating point is faster with
-0.518271	class will be stored with
-0.347756	this example is called with
-0.231726	is // erroneously called with
-0.773377	on a Pentium 4 with
-0.486515	replacing a function call with
-0.351582	details. Use function libraries with
-0.329534	template parameters. A template with
-0.096233	not portable to systems with
-0.096233	easily ported to systems with
-0.220047	separate thread in systems with
-0.220047	to fully utilize systems with
-0.235980	memory blocks. A method with
-0.292128	tests were carried out with
-0.291451	threads on a system with
-0.235892	replace j * 32 with
-0.168042	mask out multiple bits with
-0.168042	can toggle multiple bits with
-0.224173	call and return operations with
-0.224173	1. This makes operations with
-0.278614	can do arithmetic operations with
-0.312121	// Define function type with
-0.345386	is commonly the case with
-0.199797	works best on processors with
-0.199797	be avoided on processors with
-0.382386	registers The first processors with
-0.640830	the standard PC processors with
-0.211785	parallel. Small lightweight processors with
-0.379932	that are only available with
-0.552363	multiplication by a constant with
-0.223087	replace an integer constant with
-0.223087	even a single constant with
-0.223420	Surprisingly, we end up with
-0.097498	computers to keep up with
-0.097498	they always keep up with
-0.335121	critical function many times with
-0.229251	library function 250 times with
-0.324063	shows. It is accessed with
-0.406187	both can be accessed with
-0.245102	arrays should be accessed with
-0.089850	strlen function for CPUs with
-0.089850	another version for CPUs with
-0.152464	this function on CPUs with
-0.152464	optimal only on CPUs with
-0.203271	Intel compiler. Use CPUs with
-0.203271	integer division. Older CPUs with
-0.305637	contains examples of arrays with
-0.275822	to make aligned arrays with
-0.221709	to allocate variable-size arrays with
-0.321800	the function to work with
-0.221586	systems, this may work with
-0.404380	but it doesn't work with
-0.291216	to mix mathematical calculations with
-0.525955	compiled in multiple versions with
-0.235372	a Core i7 processor with
-0.391220	a program is compiled with
-0.263505	have to be compiled with
-0.245223	critical code are compiled with
-0.181218	piece of code compiled with
-0.247880	when mixing code compiled with
-0.245223	faster than when compiled with
-0.194585	objects are normally compiled with
-0.251514	the number of threads with
-0.251514	have no more threads with
-0.251514	running in other threads with
-0.200179	not divided into threads with
-0.296926	avoid running two threads with
-0.251514	tasks into separate threads with
-0.291111	an advanced high-level language with
-0.742954	into a separate thread with
-0.097720	solution is to compile with
-0.097720	possibility is to compile with
-0.300926	code when you compile with
-0.219821	anything it has allocated with
-0.096148	allocated memory Memory allocated with
-0.096148	up include: Memory allocated with
-0.311569	(3) trap integer overflow with
-0.293191	allow addition of integers with
-0.381579	can use 64-bit integers with
-0.273645	stored as 8-bit integers with
-0.322858	Windows and 32-bit Linux with
-0.234893	there are wrapper classes with
-0.252814	memory allocation is done with
-0.252814	the multiplication is done with
-0.103139	This can be done with
-0.092636	integer can be done with
-0.092636	2 can be done with
-0.092636	polynomial can be done with
-0.200861	point operations are done with
-0.296960	All calculations are done with
-0.312390	from the command line with
-0.290992	thread. This method works with
-0.341938	number to be calculated with
-0.226609	results are always calculated with
-0.319472	.NET, which is implemented with
-0.557273	this can be implemented with
-0.304066	programming languages are implemented with
-0.209898	shows this calculation implemented with
-0.398087	This is a problem with
-0.279953	remain unchanged. The problem with
-0.200430	b; } A problem with
-0.200430	the double. Another problem with
-0.200430	The most serious problem with
-0.700842	if it is known with
-0.235110	// Example 12.6. Function with
-0.234859	use a linear list with
-0.299820	compiled code may run with
-0.225377	make a test run with
-0.234650	x86 platforms. Works well with
-0.234729	read from different addresses with
-0.524184	comparing the loop counter with
-1.056301	use dynamic memory allocation with
-0.234568	a PC platform. However, with
-0.320644	be useful in programs with
-0.234295	problems. Some common problems with
-0.348939	vector c: CPU dispatching with
-0.097201	dependency chain. A microprocessor with
-0.097201	integer counter. A microprocessor with
-0.234527	a single container, preferably with
-0.234001	operating systems"). An application with
-0.310285	same thing. An expression with
-0.344149	that accesses data members with
-0.233798	have efficient table-based methods with
-0.234139	result of comparing signed with
-0.233741	is inferior. A model with
-0.309909	/ means integer division with
-0.233437	/ b) >> n with
-0.233991	recursion must always end with
-0.219891	is on mathematical applications with
-0.219891	for some CPU-intensive applications with
-0.274102	register, do an addition with
-0.500982	mix floating point addition with
-0.505503	arrays of different types with
-0.289101	warning for such optimizations with
-0.217447	implemented on a platform with
-0.217447	the standard PC platform with
-0.439215	for AVX or later with
-0.288843	can be linked together with
-0.217857	used for variables declared with
-0.217857	functions A macro declared with
-0.216895	no need to link with
-0.216895	option -fno-pic and link with
-0.216608	64-bit shared object made with
-0.216608	linked into projects made with
-0.233126	four (or eight) points with
-0.215300	classes, templates or modules with
-0.215300	the most critical modules with
-0.287982	resources of the core with
-0.446135	possible to do things with
-0.214560	does some funny things with
-0.491291	program should be tested with
-0.326754	smaller in a computer with
-0.197016	never used. A computer with
-0.197016	Use an old computer with
-0.004267	one that is compatible with
-0.004267	set that is compatible with
-0.004267	version that is compatible with
-0.012928	the processor is compatible with
-0.008576	may not be compatible with
-0.008576	will not be compatible with
-0.017324	it will be compatible with
-0.068486	compilers are not compatible with
-0.068486	names are not compatible with
-0.054195	is not even compatible with
-0.054195	respects and highly compatible with
-0.090245	are not backwards compatible with
-0.054195	compiler is mostly compatible with
-0.290746	processors, a switch statement with
-0.217720	targets. A switch statement with
-0.844239	can be allocated dynamically with
-0.312777	with branch // Loop with
-0.213222	// Example 12.4a. Loop with
-0.287254	tested on a network with
-0.221494	the library that comes with
-0.173406	Linux. The compiler comes with
-0.173406	open source. It comes with
-0.173406	Library (STL) which comes with
-0.281276	implemented on other platforms with
-0.209702	the most common platforms with
-0.394824	lookup cannot be vectorized with
-0.209798	12.4c. Same example, vectorized with
-0.231146	to express any algorithm with
-0.125384	is that the compatibility with
-0.026980	the sake of compatibility with
-0.119720	the requirements of compatibility with
-0.125384	instruction set when compatibility with
-0.125384	sake of backwards compatibility with
-0.231383	the desired polymorphism effect with
-0.337401	the compiler to predict with
-0.204969	polymorphism that is obtained with
-0.478714	best performance is obtained with
-0.064005	counter can be obtained with
-0.064005	resolution can be obtained with
-0.137670	is no doubt obtained with
-0.316451	Multithreading works more efficiently with
-0.372671	specific functions have names with
-0.305891	the value of N with
-0.230017	valid 63 number (e.g. with
-0.247666	have big data structures with
-0.247666	more advanced data structures with
-0.229939	floating point representation directly with
-0.285301	the constants are defined with
-0.049929	interface elements that come with
-0.049929	or libraries that come with
-0.049929	file mathimf.h that come with
-0.127669	as a circular buffer with
-0.408271	same errors can happen with
-0.554007	old fashioned C style with
-0.283151	Don't mix nontemporal writes with
-0.328747	especially loop-carried dependency chains with
-0.228017	100. It compares eax with
-0.227486	Mac The libraries included with
-0.227312	element in vector c2 with
-0.282173	to do two additions with
-0.301495	Alternatively, make a DLL with
-0.281975	important on small devices with
-0.124670	be done by multiplying with
-0.124670	of 2 when multiplying with
-0.104153	to double before multiplying with
-0.104153	double precision before multiplying with
-0.409483	sets can be determined with
-0.225411	floating point calculations. Even with
-0.233006	Example 7.43a. Runtime polymorphism with
-0.183696	Example 7.43b. Compile-time polymorphism with
-0.299860	If time is measured with
-0.277945	Example 8.23b. Calculate polynomial with
-0.278206	the name of Func with
-0.223813	fragmented hard disk. Test with
-0.363601	have more powerful computers with
-0.223582	often disturb the users with
-0.223582	they cannot be mixed with
-0.224276	take care of communication with
-0.223813	as C- style type-casting with
-0.221586	element in vector bc with
-0.074498	the diagonal is swapped with
-0.074498	element matrix[r][c] is swapped with
-0.228422	N) { // Array with
-0.164476	// Example 7.15a. Array with
-0.221035	to a[i+2] ; compare with
-0.074399	allocated memory is contiguous with
-0.074399	memory which is contiguous with
-0.221035	and compile them separately with
-0.004006	expression that is AND'ed with
-0.016249	of cc[i]+2 is AND'ed with
-0.016249	and bb[i]*cc[i] is AND'ed with
-0.221311	The Clang compiler combined with
-0.221035	you should avoid macros with
-0.217242	data files and databases with
-0.217583	// Example 12.4b. Vectorized with
-0.217242	are swapping column 29 with
-0.217242	have names that begin with
-0.148183	A double is represented with
-0.262130	Zero can be represented with
-0.041823	because these are incompatible with
-0.041823	optimization options are incompatible with
-0.088106	makes the code incompatible with
-0.270765	replaces the PLT entry with
-0.217242	may make some tests with
-0.217583	the overflow behavior well-defined with
-0.023383	if you are satisfied with
-0.023383	Those who are satisfied with
-0.048096	you are not satisfied with
-0.048096	32-bit Windows. Gnu Comes with
-0.048096	also available. Microsoft Comes with
-0.048096	CodeGear / Embarcadero Comes with
-0.211441	test should be performed with
-0.121951	library (VML, MKL). Works with
-0.121951	Performance Primitives (IPP). Works with
-0.211441	(SVML). This is supplied with
-0.211441	be copied or moved with
-0.211441	short in duration compared with
-0.198757	pointers that are impossible with
-0.198757	Strings can be manipulated with
-0.198757	for further optimizations. Loops with
-0.198757	code in example 14.14a with
-0.074301	temp; 104 } Microprocessors with
-0.074301	Explicit cache control Microprocessors with
-0.198757	arranged in regular patterns with
-0.198757	requirements are often conflicting with
-0.198757	by using indexes, working with
-0.074301	all the problems associated with
-0.074301	common programming errors associated with
-0.198757	of devices and machines with
-0.198757	8192 bytes, 4 ways, with
-0.198757	generation can cause complications with
-0.074301	used methods for dealing with
-0.074301	that you are dealing with
-0.249914	signed, or by extending with
-0.074301	intervals which may interfere with
-0.074301	A macro will interfere with
-0.198757	directives (everything that begins with
-0.198757	builder Has an IDE with
-0.163992	C++ Compiler Documentation". Included with
-0.163992	particular subtask before coordination with
-0.163992	the SelectAddMul example (12.4e) with
-0.163992	loop of ADC (add with
-0.163992	// Example 12.1b. Vectorization with
-0.163992	platforms or multiple configurations with
-0.163992	are created. Far Systems with
-0.163992	if it is correlated with
-0.163992	versions 7 through 14, with
-0.163992	where security matters. Problems with
-0.163992	is easy to trace with
-0.163992	to heavy competition. Processors with
-0.163992	mainframe computer. Big supercomputers with
-0.163992	from a project built with
-0.163992	prevented by calling vector::reserve with
-0.163992	isolation have been unsatisfied with
-0.163992	which can't be reached with
-0.163992	other is -0 (zero with
-0.163992	as example 12.4b, rewritten with
-0.163992	patterns containing multiple streams with
-0.163992	is done in connection with
-0.163992	object or array coincides with
-0.163992	Intel's compilers and invoked with
-0.163992	program is dividing repeatedly with
-0.163992	possible to calculate pow(x,10) with
-0.163992	cases are usually dealt with
-0.163992	fixed address might clash with
-0.163992	multiple functions. I disagree with
-0.163992	would have spent fighting with
-0.462344	systems than it is on
-0.237463	The main focus is on
-0.335567	on older processors and on
-0.427728	performance of this function on
-0.476722	calculate the same function on
-0.308760	AMD processors, but not on
-0.308760	Intel CPUs, but not on
-0.290178	only on registers, not on
-0.234359	my own research, not on
-0.318205	a register rather than on
-0.069612	in registers rather than on
-0.373133	on integer expressions than on
-0.230936	the user interface than on
-0.558261	replace the Gnu compiler on
-0.829669	most of the time on
-0.287489	spend more CPU time on
-0.697386	most of its time on
-0.319961	of their execution time on
-0.287489	does not spend time on
-0.293443	economize the resource use on
-0.237196	of software. For more on
-0.237090	function that allocates memory on
-0.237035	run a speed-critical program on
-0.223928	code will work only on
-0.332937	renaming mechanism works only on
-0.022440	are predicted well only on
-0.223928	implementation is optimal only on
-0.097689	everything that depends only on
-0.097689	return value depends only on
-0.223928	have been tested only on
-0.324026	or not at all on
-0.237061	and model numbers, but on
-0.354028	can even be used on
-0.235004	and frameworks typically used on
-0.236624	a type. The example on
-0.342990	smaller the integer size on
-0.352717	by constructing the object on
-0.236694	compilers is generally possible on
-0.376444	run the advanced version on
-0.288997	run an inferior version on
-0.236679	contiguous with other objects on
-0.294844	dramatic degradation of performance on
-0.311204	of CPUs. The performance on
-0.006467	library has reduced performance on
-0.324159	work for very long on
-0.474608	that it is stored on
-0.507935	objects will be stored on
-0.055530	a function are stored on
-0.284083	Function parameters are stored on
-0.215242	p. 26). Variables stored on
-0.550447	virtual function is called on
-0.347756	you need to test on
-0.488543	memory. This is useful on
-0.236101	for many applications even on
-0.333589	or writing a file on
-0.460457	to use vector operations on
-0.218696	possible to do operations on
-0.218696	processing, and mathematical operations on
-0.218696	int, float. Similar operations on
-0.802045	and in some cases on
-0.278157	power of the processors on
-0.312403	negative list of processors on
-0.278157	for other virtual processors on
-0.284455	but is less important on
-0.229322	that are particularly important on
-0.100341	If objects are accessed on
-0.329289	support for inline assembly on
-0.229931	Windows compiler to work on
-0.229931	is sure to work on
-0.195442	82 Keywords that work on
-0.246186	pointers may not work on
-0.195442	to make this work on
-0.086819	the Gnu directives work on
-0.086819	the Microsoft directives work on
-0.299490	has finished the calculations on
-0.291069	takes to do calculations on
-0.208335	useful when doing calculations on
-0.208335	CPU to start calculations on
-0.208335	for doing parallel calculations on
-0.235700	can be cross- compiled on
-0.235548	is typically 64 bytes on
-0.342134	processing. Running multiple threads on
-0.207077	likely to work best on
-0.040916	one that works best on
-0.040916	version that works best on
-0.040916	than "what works best on
-0.040916	thinks "what works best on
-0.329007	number 6! The speed on
-0.235565	bookkeeping depends very much on
-0.227611	aware of possible overflow on
-0.227611	check for buffer overflow on
-0.379151	a 64 64 matrix on
-0.438106	discussion of container classes on
-0.635246	should preferably be done on
-0.291112	same regardless of precision on
-0.416045	only one that works on
-0.227043	functions. It also works on
-0.235451	is not a manual on
-0.151947	This method is explained on
-0.210640	variable storage are explained on
-0.132004	optimizing code, as explained on
-0.132004	non-Intel processors, as explained on
-0.132004	be static, as explained on
-0.132004	of precision, as explained on
-0.132004	out-of-order execution, as explained on
-0.132004	vector operations, as explained on
-0.132004	class templates, as explained on
-0.132004	without AVX, as explained on
-0.132004	switch statements, as explained on
-0.132004	clock frequency, as explained on
-0.132004	critical stride, as explained on
-0.132004	cache contentions, as explained on
-0.006148	functions for reasons explained on
-0.006148	precision for reasons explained on
-0.006148	manual for reasons explained on
-0.006148	mode, for reasons explained on
-0.311609	mode. Storing the parameters on
-0.226626	the stack (three parameters on
-0.226808	135). This extra check on
-0.226808	added a bounds check on
-0.226808	need modification if implemented on
-0.226808	application is preferably implemented on
-0.235275	be the fastest solution on
-1.292750	instruction set is supported on
-0.270698	fully standardized and supported on
-0.217182	is currently only supported on
-0.290758	applies to decrement operators on
-0.216309	thread can then run on
-0.216309	all. Can only run on
-0.216309	set can still run on
-0.090666	not always work well on
-0.090666	it doesn't work well on
-0.090666	library that works well on
-0.090666	sure it works well on
-0.324846	- 4 clock cycles on
-0.324846	takes 11 clock cycles on
-0.234789	destination, but don't count on
-0.022474	that scans all files on
-0.213476	that runs quite fast on
-0.213476	operations are particularly fast on
-0.213476	This worked sufficiently fast on
-0.332705	which solution is optimal on
-0.234403	required amount of space on
-0.311217	time than anything else on
-0.339707	13.1 // CPU dispatching on
-0.223359	because it makes dispatching on
-0.349314	the code is running on
-0.101459	chosen only when running on
-0.101459	instruction set when running on
-0.101459	other libraries when running on
-0.184661	all other processes running on
-0.233630	Core2 processor performs better on
-0.309978	instruction set. The examples on
-0.758884	a floating point addition on
-0.289126	reduce various algebraic expressions on
-0.527660	function parameters are transferred on
-0.288070	and r are transferred on
-0.272641	and prevents all optimizations on
-0.218899	compiler from doing optimizations on
-0.233429	purposes than rendering graphics on
-0.233369	want to keep together on
-0.238425	It uses the dispatch on
-0.238425	to implement the dispatch on
-0.232953	and replaced by storage on
-0.032768	operating system is based on
-0.149614	This manual is based on
-0.017141	dispatching should be based on
-0.002106	programs that are based on
-0.002106	these methods are based on
-0.002106	.NET framework are based on
-0.002106	of Java are based on
-0.002106	and Fortran are based on
-0.000526	protection schemes are based on
-0.002106	The recommendations are based on
-0.017141	an unknown CPU based on
-0.017141	C or C++ based on
-0.017141	important. A language based on
-0.017141	a CPU dispatcher based on
-0.017141	high level framework based on
-0.017141	branch will go based on
-0.017141	can be chosen based on
-0.233244	a specific CPU feature on
-0.232626	the same processor core on
-0.326774	files etc. scattered around on
-0.214717	do not wrap around on
-0.180510	to do more reductions on
-0.180510	the most simple reductions on
-0.205618	cannot make algebraic reductions on
-0.205618	do any algebraic reductions on
-0.042296	method to use depends on
-0.042296	in each vector depends on
-0.042296	of a loop depends on
-0.077332	that each value depends on
-0.005067	loop control branch depends on
-0.010194	that each calculation depends on
-0.010194	where each calculation depends on
-0.042296	the final application depends on
-0.042296	if each addition depends on
-0.042296	can be predicted depends on
-0.042296	value of sum depends on
-0.042296	parallelism. The gain depends on
-0.042296	to the truth depends on
-0.215245	software should be tested on
-0.215245	servers should be tested on
-0.233017	Windows are fully compatible on
-0.008633	Define function name depending on
-0.008633	in various ways depending on
-0.008633	clock frequency dynamically depending on
-0.000342	16 clock cycles, depending on
-0.000342	10 clock cycles, depending on
-0.000342	6 clock cycles, depending on
-0.000342	80 clock cycles, depending on
-0.000342	25 clock cycles, depending on
-0.008633	of the memory, depending on
-0.008633	for 32-bit integers, depending on
-0.008633	9 and 64, depending on
-0.008633	three or four, depending on
-0.008633	has several meanings depending on
-0.008633	than example 12.4a, depending on
-0.008633	a conditional move, depending on
-0.008633	the following solutions, depending on
-0.347007	should preferably be avoided on
-0.256865	aliasing is to turn on
-0.753094	is recommended to turn on
-0.207371	are using and turn on
-0.160741	that you can turn on
-0.160741	editions). Do not turn on
-0.232286	by the methods described on
-0.374290	Any floating point operation on
-0.231991	next new model comes on
-0.010777	solution is to rely on
-0.010777	more convenient to rely on
-0.036002	on compilers that rely on
-0.036002	making optimizations that rely on
-0.005355	and you can rely on
-0.005355	cases you can rely on
-0.021829	multiple threads should rely on
-0.007155	128 function cannot rely on
-0.007155	then you cannot rely on
-0.007155	not. You cannot rely on
-0.021829	you cannot always rely on
-0.021829	loop branch must rely on
-0.021829	if possible. Don't rely on
-0.021829	we can surely rely on
-0.319641	and divisions are given on
-0.231162	speed for certain tasks on
-0.164838	has hardly any effect on
-0.164838	has no negative effect on
-0.164838	has a significant effect on
-0.164838	a very dramatic effect on
-0.230782	functions should work efficiently on
-0.045135	list of processor models on
-0.276286	operating systems" for details on
-0.205467	CPUs" gives more details on
-0.230334	critical dependency chain, especially on
-0.297122	of threads is discussed on
-0.202910	small devices, as discussed on
-0.230234	checking is explained below on
-0.308565	dynamic linker. The delay on
-0.230034	graphics processing unit, either on
-0.229934	label ; save ebx on
-0.229467	to actually doing something on
-0.408076	to a clock cycle on
-0.203915	2GHz A clock cycle on
-0.274460	only one clock cycle on
-0.282079	which method is fastest on
-0.326866	Copyright conditions are listed on
-0.226025	The time you spend on
-0.225870	spot and make measurements on
-0.225714	matrix sizes were measured on
-0.145820	some cases, the log on
-0.145820	a password. The log on
-0.145820	databases usually requires log on
-0.348711	the time is spent on
-0.241837	reducing the time spent on
-0.132489	the clock cycles spent on
-0.224065	number, which is 15 on
-0.223884	more time than normal on
-0.023414	Thin clients that depend on
-0.023414	one iteration should depend on
-0.023414	because it doesn't depend on
-0.023414	the factorials don't depend on
-0.023414	These workaround methods depend on
-0.023414	other hardware-related details depend on
-0.301870	concentrate the optimization effort on
-0.221335	software. A negative list, on
-0.221551	is necessary to compromise on
-0.431515	base a software package on
-0.028252	disk. Software that relies on
-0.028252	unless the code relies on
-0.028252	unless your program relies on
-0.028252	exceptions. The mechanism relies on
-0.028252	in the MKL relies on
-0.148310	branching takes time. Dispatch on
-0.148310	at different times: Dispatch on
-0.217539	implementation works particularly bad on
-0.271102	range (see page 134 on
-0.035612	your compiler for restrictions on
-0.035612	have very few restrictions on
-0.004295	There are certain restrictions on
-0.217806	and better processor appears on
-0.148082	take only 5 μs on
-0.193300	ns = 250 μs on
-0.217806	(128 or 256 bytes) on
-0.217539	worked well in tests on
-0.264431	are explained in detail on
-0.211285	Manual". developer.intel.com. Many advices on
-0.164571	compilers may behave differently on
-0.122110	30 Overflow behaves differently on
-0.013514	same operation is performed on
-0.211285	list of titles. Literature on
-0.211285	there is more focus on
-0.211285	program are typically specified on
-0.264034	matrix using example 9.5a on
-0.211285	are various discussion forums on
-0.211285	carry and zero flags on
-0.199042	requires compilation or interpretation on
-0.074408	ISO/IEC TR18015 Technical Report on
-0.074408	TR 18015, "Technical Report on
-0.199042	available from www.intel.com. Manual on
-0.250234	because the cache miss on
-0.199042	consult the general literature on
-0.008633	optimization effort is concentrated on
-0.199042	a series of experiments on
-0.199042	program. This has influence on
-0.199042	the first two (three on
-0.250234	system. It will crash on
-0.464341	can be predicted perfectly on
-0.199042	part can run optimally on
-0.017441	much time is wasted on
-0.017441	No time is wasted on
-0.199042	Algorithms that rely heavily on
-0.199042	focus the optimization efforts on
-0.199042	SIAM 2001. Advanced book on
-0.329487	remove the const restriction on
-0.199042	available from the IDE on
-0.250234	to use algebraic manipulations on
-0.199042	The counters will stay on
-0.074408	books contain many tips on
-0.074408	errors, and some tips on
-0.164255	guidelines are provided below, on
-0.164255	the disk cache. Files on
-0.164255	72 for discussions. Turn on
-0.164255	arrays and objects. Storage on
-0.164255	Register ebx is pushed on
-0.164255	C++ compilers. Wikipedia article on
-0.164255	on test theory. Advice on
-0.164255	to put a tag on
-0.164255	at runtime. Example 7.43 on
-0.164255	manual is based mainly on
-0.164255	the program runs satisfactorily on
-0.164255	positive and negative impacts on
-0.164255	15 clock cycles (depending on
-0.164255	good deal of research on
-0.164255	she is busy concentrating on
-0.164255	15h Processors". www.amd.com. Advices on
-0.164255	because we are relying on
-0.164255	in the BIOS setup. on
-0.164255	is not a textbook on
-0.164255	there are different opinions on
-0.164255	make a Boolean NOT on
-0.164255	for 'this' is incurred on
-0.807587	// This is the code
-0.478000	code, which is the code
-0.475517	possible version of the code
-0.678677	debug version of the code
-0.475517	inferior version of the code
-0.327683	a part of the code
-0.461882	this part of the code
-0.327683	but part of the code
-0.244045	critical part of the code
-0.327683	optimized part of the code
-0.327683	time-critical part of the code
-0.453671	framework Most of the code
-0.104263	the parts of the code
-0.241740	when parts of the code
-0.191193	critical parts of the code
-0.241740	specific parts of the code
-0.241740	Critical parts of the code
-0.241740	nearby parts of the code
-0.321639	This fragmentation of the code
-0.321639	The compactness of the code
-0.446559	in addition to the code
-0.345452	of modifications to the code
-0.335120	is obvious and the code
-0.335120	for overflow, and the code
-0.335120	quite tedious and the code
-0.584970	each other in the code
-0.414501	the values in the code
-0.082883	and addresses in the code
-0.082883	All addresses in the code
-0.082883	relative addresses in the code
-0.159570	more space in the code
-0.159570	much space in the code
-0.159570	little space in the code
-0.319861	CPU dispatching in the code
-0.843607	cause contentions in the code
-0.319861	and references in the code
-0.319861	can happen in the code
-0.319861	dependency chains in the code
-0.414501	fixed breakpoint in the code
-0.319861	profiling instruments in the code
-0.131125	the relocations in the code
-0.131125	generate relocations in the code
-0.319861	as pragmas in the code
-0.319861	for parallelization in the code
-0.551963	inlining is that the code
-0.537909	compact so that the code
-1.029433	make sure that the code
-0.490425	profilers require that the code
-0.333574	will notice that the code
-0.331432	than double if the code
-0.331432	to see if the code
-0.331432	using vectors if the code
-0.331432	be vectorized if the code
-0.331432	less efficiently if the code
-0.466997	systems, especially if the code
-0.347191	to square by the code
-0.793586	more resources than the code
-0.321593	is relevant when the code
-0.321593	is obtained when the code
-0.321593	slower, especially when the code
-0.321593	more fragmented when the code
-0.321593	point precisions when the code
-0.310595	frame functions then the code
-0.310595	template parameters then the code
-0.403009	a time, then the code
-0.310595	64-bit double, then the code
-0.283524	you look at the code
-0.365223	do to make the code
-0.709880	order to make the code
-0.365223	things to make the code
-0.365223	tends to make the code
-0.367179	conversions and make the code
-0.344903	be better because the code
-0.335606	simultaneously. Actually, only the code
-0.328592	test purposes. If the code
-0.328592	sequential order. If the code
-0.328592	chapter 12. If the code
-0.339679	be vectorized, but the code
-0.318687	test feature into the code
-0.318687	instruments directly into the code
-0.251845	all. This makes the code
-0.184616	but this makes the code
-0.184616	by one makes the code
-0.184616	it also makes the code
-0.361553	This option makes the code
-0.184616	of templates makes the code
-0.184616	such checks makes the code
-0.445975	do immediately before the code
-0.333951	you are sure the code
-0.441060	operands in case the code
-0.501936	misses by making the code
-0.637967	and you want the code
-0.301976	way to check the code
-0.471669	that checks whether the code
-0.282039	is needed. All the code
-0.230758	information to optimize the code
-0.166492	can often optimize the code
-0.328258	a debugger. However, the code
-0.339575	exception handling unless the code
-0.325092	compilers will replace the code
-0.325092	relative addresses. Therefore, the code
-0.327856	+= x; Here, the code
-0.325942	compiler will change the code
-0.427231	avoided by copying the code
-0.080213	have to vectorize the code
-0.080213	unable to vectorize the code
-0.075316	automatically and vectorize the code
-0.075316	compiler will vectorize the code
-0.075316	compilers don't vectorize the code
-0.282039	mentioned above. Now the code
-0.227193	big problem. Whenever the code
-0.227193	while simultaneously prefetching the code
-0.367949	useful to study the code
-0.227193	On the contrary, the code
-0.227193	advance. This reduces the code
-0.227193	possible to organize the code
-0.227193	n and reorganize the code
-0.227193	to fine- tune the code
-1.335754	critical part of a code
-0.293236	models on which a code
-0.293236	following example shows a code
-0.237047	debugger can execute a code
-0.313707	memory and insert a code
-0.647234	the combined size of code
-0.169983	a new branch of code
-0.169983	each particular branch of code
-0.449891	vectors. A lot of code
-0.058941	after the piece of code
-0.058941	insert the piece of code
-0.097948	of a piece of code
-0.046218	make a piece of code
-0.097948	If a piece of code
-0.097948	generate a piece of code
-0.097948	studying a piece of code
-0.277834	the same piece of code
-0.127156	a critical piece of code
-0.422248	is the range of code
-0.344176	the total amount of code
-0.300286	the two kinds of code
-0.300286	make certain kinds of code
-0.772933	costs in terms of code
-0.310286	typically small pieces of code
-0.234178	only. Critical pieces of code
-0.233233	and automatic parallelization of code
-0.047634	access 9.1 Caching of code
-0.047634	87 9.1 Caching of code
-0.233233	Michael Abrash: "Zen of code
-0.348987	as ReadB needs to code
-0.237358	economy, cache efficiency and code
-0.473208	between function names and code
-0.237802	a slight degradation in code
-0.344531	i, timediff[i]); } The code
-0.279123	the same time. The code
-0.279123	to save time. The code
-0.335420	and trigonometric functions. The code
-0.229406	Join identical branches The code
-0.664337	addressing of data. The code
-0.304607	and Clang compilers. The code
-0.229406	be stored together The code
-0.314993	the integer calculations. The code
-0.304607	at every access. The code
-0.229406	automatic CPU dispatching. The code
-0.229406	Mac OS X The code
-0.304607	not vectorize automatically. The code
-0.229406	child class members. The code
-0.304607	for interrupt 3. The code
-0.284551	the sizeof operator. The code
-0.229406	set is specified. The code
-0.229406	purposes is allowed. The code
-0.371012	on page 122. The code
-0.229406	slow, you know). The code
-0.229406	code becomes contiguous. The code
-0.229406	the following features: The code
-0.229406	clumsy and tedious. The code
-0.229406	{ return _mm_cvtsd_si32(_mm_load_sd(&x));} The code
-0.335025	optimization features and for code
-0.817722	a good choice for code
-0.236915	has been criticized for code
-0.314513	is very likely that code
-0.325002	identify individual functions or code
-0.344883	one instance. The function code
-0.237546	AVX or later with code
-0.237569	of titles. Literature on code
-0.603863	} } } This code
-0.392315	+ 1; } This code
-0.234756	#endif return n;} This code
-0.234756	in example 16.1. This code
-0.314405	has higher priority than code
-0.326525	The problem with this code
-0.233592	models on which this code
-0.233592	we are running this code
-0.233592	to cause overflow, this code
-0.329463	as possible or when code
-0.329463	the execution time when code
-0.234625	to execute CriticalFunction when code
-0.617637	the same time. A code
-0.234340	branches works correctly. A code
-0.234340	than it says. A code
-1.149677	parts of the program code
-0.540943	overhead in the program code
-0.322740	a piece of program code
-0.301660	just-in-time compilation. The program code
-0.301660	with interpretation. The program code
-0.357293	Metaprogramming means to make code
-0.493575	jump to a different code
-0.443224	copied because the same code
-0.342808	example shows the same code
-0.443224	should produce the same code
-0.924599	can share the same code
-0.381716	whether there is other code
-0.549369	by the floating point code
-0.698159	reductions on floating point code
-0.240050	it makes floating point code
-0.346427	set makes floating point code
-0.486151	A list of which code
-0.293877	2. Check that all code
-0.293877	to verify that all code
-0.230926	that can call all code
-0.230926	for vectorization Not all code
-0.313728	Example 12.7. Vector class code
-0.313980	dispatching to make multiple code
-0.825083	for 32-bit and 64-bit code
-0.236984	debugging and maintaining such code
-0.348425	around and less efficient code
-0.476759	RISC in situations where code
-0.236739	dispatching and run any code
-0.351640	cache lines. This makes code
-0.684901	version of the critical code
-0.427998	versions of the critical code
-0.318359	called by the critical code
-0.045323	120 13 Making critical code
-0.045323	121 13 Making critical code
-0.341119	bit code 64 bit code
-0.337835	OpenMP directives 32 bit code
-0.302213	forgets that the system code
-0.227393	and therefore the system code
-0.279309	time-consuming function in system code
-0.313840	possible, or the error code
-0.300614	return with an error code
-0.300614	may return an error code
-0.235936	contain useful discussions about code
-0.323706	conversion generates no extra code
-0.241102	doesn't generate any extra code
-0.731285	not produce any extra code
-0.209916	may actually add extra code
-0.209916	The compiler inserts extra code
-0.020239	of C++ and assembly code
-0.326350	need to use assembly code
-0.247619	Other compilers need assembly code
-0.020239	generates the following assembly code
-0.281295	to use inline assembly code
-0.200860	is that the compiled code
-0.200860	instances makes the compiled code
-0.295610	of C++, directly compiled code
-0.235600	memory economy and small code
-0.235715	My recommendation for good code
-0.099065	when going from AVX code
-0.099065	any transition from AVX code
-0.195782	you run the optimized code
-0.195782	example 12.2, the optimized code
-0.255331	Boolean output. The optimized code
-0.255331	in the fully optimized code
-0.293788	consider making highly optimized code
-0.235241	every function or every code
-0.234949	return prediction). 149 All code
-0.235672	which interprets the intermediate code
-0.141103	Another disadvantage of intermediate code
-0.185562	compiled code and intermediate code
-0.201454	is distributed. The intermediate code
-0.031214	language based on intermediate code
-0.031214	framework based on intermediate code
-0.092285	implemented with an intermediate code
-0.092285	languages use an intermediate code
-0.092285	Pascal used an intermediate code
-0.043696	of using an intermediate code
-0.043696	for using an intermediate code
-0.979620	the compiler to optimize code
-0.311262	me explain the above code
-0.137893	+= a[i]; The above code
-0.137893	unknown sources. The above code
-0.137893	happens rarely. The above code
-0.193584	actually is. This above code
-0.333769	will produce the optimal code
-0.223717	compilers produce less optimal code
-0.316037	by using a particular code
-0.316037	cases where a particular code
-0.329514	addresses in 32-bit Mac code
-0.310645	justify such a complicated code
-0.315731	Templates make the source code
-0.221454	(byte code). The source code
-0.219118	at installation time. Each code
-0.219118	library at initialization. Each code
-0.218334	to remember that your code
-0.218334	several years before your code
-0.233377	is compiled to binary code
-0.233060	run the most advanced code
-0.098924	turning off the position-independent code
-0.007302	Dynamic linking and position-independent code
-0.029976	systems often use position-independent code
-0.029976	Unix-like systems use position-independent code
-0.062176	OS X make position-independent code
-0.062176	by not using position-independent code
-0.062176	shared objects without position-independent code
-0.062176	the compiler uses position-independent code
-0.062176	need for special position-independent code
-0.062176	avoid the burdensome position-independent code
-0.189522	writes automatically in vectorized code
-0.189522	error prone. The vectorized code
-0.189522	profitable to use vectorized code
-0.230858	is 400 here. Any code
-0.285705	will make exactly identical code
-0.303914	This makes position- independent code
-0.319949	is possible to vectorize code
-0.331195	for the hardware definition code
-0.111015	appears in the machine code
-0.111015	then transferred as machine code
-0.111015	is translated into machine code
-0.111015	that the resulting machine code
-0.176265	for system code. System code
-0.176265	of everything else. System code
-0.114396	page 146 below. Position-independent code
-0.114396	everywhere by default. Position-independent code
-0.156072	code. 147 14.12 Position-independent code
-0.249038	*p+2 is a loop-invariant code
-0.033236	subexpression elimination and loop-invariant code
-0.033236	constant propagation, and loop-invariant code
-0.069229	can move out loop-invariant code
-0.148413	vector register size. Vectorized code
-0.148413	and overloaded operators. Vectorized code
-0.032623	2 12.6 Transforming serial code
-0.032623	113 12.6 Transforming serial code
-0.148413	certain restrictions on mixing code
-0.148413	a problem when mixing code
-0.218196	this method. Your measurement code
-0.271689	contents of data cache, code
-0.122388	to see the compiler-generated code
-0.122388	function libraries and compiler-generated code
-0.264612	be used for improving code
-0.211797	making clear and well-structured code
-0.122388	string instructions. The built-in code
-0.122388	compiler often inserts built-in code
-0.211978	file http://www.agner.org/optimize/asmlib.zip contains complete code
-0.035698	x x Loop invariant code
-0.035698	the compiler. Loop invariant code
-0.465258	AVX code to non-AVX code
-0.199539	} } The resulting code
-0.199539	permissible if the unsafe code
-0.199539	for the linker. Both code
-0.199539	UNIX shell script. Interpreted code
-0.164714	non-inlined copy is dead code
-0.164714	Visual Studio can build code
-0.164714	versions of the user-written code
-0.164714	make sure the startup code
-0.164714	to use it. Complicated code
-0.164714	the user. Making exception-safe code
-0.164714	alignment and the resultant code
-0.546403	a member function is as
-0.476799	or overloaded operator is as
-0.381480	compatible instruction sets is as
-0.172114	owns. A destructor is as
-0.172114	A virtual destructor is as
-0.356473	the executable to be as
-0.355877	boxes, etc. should be as
-0.324142	the header files are as
-0.236905	post-increment operator i++ are as
-0.237129	a function parameter, or as
-0.237547	the compiler recognizes it as
-0.889993	the CPU detection function as
-0.477773	shows the same code as
-0.237131	with fixed size, not as
-0.355369	as (b*2.0)/3.0 rather than as
-0.236135	a circular buffer than as
-0.237566	order to access x as
-0.546200	instance then you may as
-0.334963	automatically but you may as
-0.313863	CPU dispatcher should have as
-0.728740	at the same time as
-0.647065	takes no extra time as
-0.237204	a matrix for use as
-0.352934	by organizing the data as
-0.291341	get as much data as
-0.514920	size of the program as
-0.514920	clarity of the program as
-0.293267	emulate a 256-bit vector as
-0.033212	which is the same as
-0.033212	pointer is the same as
-0.033212	calls is the same as
-0.033212	result is the same as
-0.033212	matrix is the same as
-0.033212	i++) is the same as
-0.033212	reference is the same as
-0.323107	It does the same as
-0.323107	operator does the same as
-0.314574	code becomes the same as
-0.313725	of common string functions as
-0.237038	such optimizations automatically, but as
-0.330292	loop counter is used as
-0.330292	a bool is used as
-0.476715	compilers can be used as
-0.476715	units can be used as
-0.372114	integer may be used as
-0.372114	These may be used as
-0.911574	can also be used as
-0.274496	#define directives when used as
-0.448212	Arrays are often used as
-0.783265	in the level-2 cache as
-0.353756	may try to do as
-0.234378	service routine should do as
-0.234230	the type and size as
-0.234230	execution units same size as
-0.236888	compute a / b as
-0.649004	a variable or object as
-0.236568	known in 36 C++ as
-0.093740	a member function such as
-0.010669	are using functions such as
-0.002643	for mathematical functions such as
-0.002643	or mathematical functions such as
-0.002643	common mathematical functions such as
-0.002643	computing mathematical functions such as
-0.010669	with C functions such as
-0.010669	common math functions such as
-0.010669	or memory-intensive functions such as
-0.093740	vectorization Good compilers such as
-0.093740	Simple integer operations such as
-0.093740	a composite type such as
-0.093740	are special cases such as
-0.093740	brands of CPUs such as
-0.093740	seldom used branches such as
-0.093740	from other applications such as
-0.093740	for simple types such as
-0.044346	of other optimizations such as
-0.044346	to do optimizations such as
-0.093740	simple algebraic reductions such as
-0.044346	in compiled languages such as
-0.044346	This includes languages such as
-0.021607	important for tasks such as
-0.021607	for standard tasks such as
-0.021607	priority. Other tasks such as
-0.021607	for trivial tasks such as
-0.093740	a long time, such as
-0.093740	digital building blocks such as
-0.093740	number of purposes such as
-0.093740	in mathematical iterations such as
-0.093740	type of vector, such as
-0.093740	with segmented memory, such as
-0.093740	also third-party profilers such as
-0.093740	are also available, such as
-0.093740	synchronization between threads, such as
-0.093740	Some programming languages, such as
-0.093740	the same resources, such as
-0.093740	risk of overflow, such as
-0.093740	determined by considerations such as
-0.093740	integers in comparisons, such as
-0.093740	hardware definition language, such as
-0.093740	use string classes, such as
-0.093740	Some STL templates, such as
-0.093740	any other resource, such as
-0.093740	of data shuffling, such as
-0.093740	count certain events, such as
-0.093740	names with suffixes such as
-0.093740	are inherently serial, such as
-0.093740	supports automatic vectorization, such as
-0.093740	advantage to obtain, such as
-0.093740	or removable media such as
-0.093740	in table 9.2, such as
-0.093740	the feature information, such as
-0.019839	operator is as efficient as
-0.009806	destructor is as efficient as
-0.062460	are therefore as efficient as
-0.030108	is exactly as efficient as
-0.030108	are exactly as efficient as
-0.136066	from the previous value as
-0.313237	not allow vector objects as
-0.344474	of transferring the variable as
-0.229275	a floating point variable as
-0.470742	use the induction variable as
-0.236627	to be designed so as
-0.228952	used for multiple variables as
-0.234345	operations with Boolean variables as
-0.234345	that have Boolean variables as
-0.052840	extra time as long as
-0.052840	the program as long as
-0.052840	automatically, but as long as
-0.052840	multiple variables as long as
-0.052840	point calculations as long as
-0.052840	not significant as long as
-0.052840	64-bit integers, as long as
-0.079148	in the same way as
-0.079148	work the same way as
-0.079148	go the same way as
-0.229087	counter i is stored as
-0.229087	The sign is stored as
-0.229087	The exponent is stored as
-0.229087	the fraction is stored as
-0.268489	is distributed and stored as
-0.487090	Boolean variables are stored as
-0.291973	also occur quite often as
-0.530699	compatibility is not always as
-0.522972	recommend object oriented programming as
-0.346987	efficiency. These are available as
-0.235761	it takes six times as
-0.353993	that 150 you want as
-0.344071	can align the arrays as
-0.228565	the software development work as
-0.228565	do as little work as
-0.533933	and floating point calculations as
-0.435454	code that is compiled as
-0.303606	can possibly be compiled as
-0.235343	the low-level C language as
-0.291514	to do as much as
-0.235308	in the same thread as
-0.235175	to be as small as
-0.235382	bitwise operators using integers as
-0.044169	not always as good as
-0.044169	Not optimized as good as
-0.044169	not optimize as good as
-0.044169	are cached as good as
-0.618174	32-bit and 64-bit Linux as
-0.549280	it can be done as
-0.098918	arithmetic operations are therefore as
-0.098918	itself. Constructors are therefore as
-0.378968	with the same precision as
-0.291081	Gnu compiler. Not optimized as
-0.320859	This expression is calculated as
-0.406391	numbers can be calculated as
-0.406391	stride can be calculated as
-0.265907	b*2.0/3.0 will be calculated as
-0.444808	and want to get as
-0.235356	tables are particular advantageous as
-0.226642	the code is implemented as
-0.226642	application software is implemented as
-0.372469	it can be implemented as
-0.372469	functions can be implemented as
-0.041776	queue should be implemented as
-0.198491	cannot easily be implemented as
-0.177051	conditions which are implemented as
-0.177051	metaprogramming, loops are implemented as
-0.163067	objects is often implemented as
-0.098602	source of error known as
-0.098602	common programming error known as
-0.073435	the program as well as
-0.073435	string functions as well as
-0.073435	64-bit Linux as well as
-0.073435	.NET framework as well as
-0.073435	to reading as well as
-0.073435	software users as well as
-0.073435	bool, enum as well as
-0.073435	2008 R2 as well as
-0.234755	non-zero, and therefore count as
-0.234363	but is not quite as
-0.025690	function is as fast as
-0.025690	i++ are as fast as
-0.025690	are therefore as fast as
-0.008396	be just as fast as
-0.008396	are just as fast as
-0.008396	vector just as fast as
-0.378004	sets. Does not optimize as
-0.234181	have as few branches as
-0.130315	has the same name as
-0.343679	correct child class name as
-0.233728	then interpret that string as
-0.233376	constant references accept expressions as
-0.287093	pointer which is transferred as
-0.206367	PC and then transferred as
-0.206367	Arrays are always transferred as
-0.319566	for the .NET framework as
-0.233270	array of thousand numbers as
-0.414484	parameters that are declared as
-0.319553	by storing intermediate results as
-0.233024	The measured results were as
-0.183584	reference may be just as
-0.183584	precision calculations are just as
-0.183584	calculate a vector just as
-0.183584	square brackets index, just as
-0.232750	it may be smaller as
-0.232445	do such obvious reductions as
-0.329183	other compiled programming languages as
-0.287612	a matrix in STL as
-0.442870	IDE. It is intended as
-0.192082	useful for optimizing code, as
-0.192082	of the source code, as
-0.192082	language for CPU-intensive code, as
-0.262413	different for different platforms as
-0.516685	apply to other platforms as
-0.287045	child class is given as
-0.304097	can now be vectorized as
-0.209929	code is indeed vectorized as
-0.333327	to code the offset as
-0.230846	Example 7.34a. Use macro as
-0.231026	simply by comparing them as
-0.499104	does the same thing as
-0.285329	such expressions may occur as
-0.230092	This applies to reading as
-0.018546	can be implemented either as
-0.176808	can be linked either as
-0.229880	label. It uses ebx as
-0.285450	Monday, etc. are defined as
-0.228962	table is not significant as
-0.284311	then it becomes invalid as
-0.239516	register can be organized as
-0.189961	These lines are organized as
-0.145072	in memory if organized as
-0.145072	floating point registers organized as
-0.494325	or SSE2 instruction set, as
-0.229427	well on non-Intel processors, as
-0.284707	same coding rules apply as
-0.229195	of the same features as
-0.554513	old fashioned C style as
-0.303208	C++ language is chosen as
-0.147811	assembly language is provided as
-0.147811	they contain is provided as
-0.228229	should be as standardized as
-0.227606	constants are usually included as
-0.227171	of inheritance is now as
-0.227316	use the same unit as
-0.034144	operations of modern CPUs, as
-0.034144	capabilities of modern CPUs, as
-0.202556	CPUs or multi-core CPUs, as
-0.227316	i instead of j as
-0.227461	with many different factors as
-0.227171	the CPU dispatching explicitly as
-0.258316	when i is interpreted as
-0.190153	integer will be interpreted as
-0.240694	overloaded operator is exactly as
-0.240694	disguise. Enums are exactly as
-0.225814	we will calculate xn as
-0.430765	program code is distributed as
-0.145918	is compiled and distributed as
-0.145918	for function libraries distributed as
-0.280474	portability to 64-bit mode, as
-0.300142	a variable in memory, as
-0.280474	depend on the system, as
-0.280287	integers and 64-bit integers, as
-0.225814	to make the measurements as
-0.224202	use the same principle as
-0.224394	should preferably be static, as
-0.224011	the value in edx as
-0.223819	time for software users as
-0.175657	becomes invalid as soon as
-0.175657	* 5). As soon as
-0.275325	code cannot be executed as
-0.310314	the exact time consumption as
-0.221728	of i will appear as
-0.221499	floating point value written as
-0.294949	example 15.1d to 15.1c as
-0.217758	to be cleaned up, as
-0.217758	float, double, bool, enum as
-0.354567	a loss of precision, as
-0.217758	to implement a queue as
-0.008631	has to be expressed as
-0.002858	address can be expressed as
-0.002858	offset can be expressed as
-0.002858	formats can be expressed as
-0.218042	Gnu compiler // Same as
-0.041861	that x is treated as
-0.041861	the object is treated as
-0.088193	function are simply treated as
-0.067843	offset that is coded as
-0.067843	instructions. This is coded as
-0.262260	it can be represented as
-0.148282	are in fact represented as
-0.271350	as accurate and reproducible as
-0.211222	size as template parameters, as
-0.263962	microprocessor in an FPGA as
-0.263962	on such small devices, as
-0.211222	loop with multiple counters, as
-0.263962	from doing out-of-order execution, as
-0.211222	order to optimize access, as
-0.211222	waste of cache space, as
-0.263962	advantage of vector operations, as
-0.211222	interpreters, just-in-time compilers, etc., as
-0.211222	www.agner.org/optimize/testp.zip or get ReadTSC as
-0.211222	be used for metaprogramming, as
-0.211222	a vector of vectors, as
-0.211222	to use vector classes, as
-0.198980	efficient container class templates, as
-0.198980	this can eliminate branches, as
-0.198980	yet as well developed as
-0.198980	not reproducible. Such events as
-0.198980	than seconds or microseconds as
-0.198980	rules about register use, as
-0.198980	is coded as _mm_empty() as
-0.198980	prevented in other ways, as
-0.198980	code compiled without AVX, as
-0.198980	and data are cached as
-0.198980	operators that have Booleans as
-0.074385	matrix[j][0] is calculated internally as
-0.074385	constructor is implemented internally as
-0.164199	extremely complicated and clumsy, as
-0.164199	as for switch statements, as
-0.164199	D are not yet as
-0.164199	complaints should be regarded as
-0.164199	in a memory pool, as
-0.164199	is copied by assignment, as
-0.164199	possibility for other optimizations, as
-0.164199	are becoming increasingly blurred as
-0.164199	need not be passed as
-0.164199	// Function pointer serves as
-0.164199	type of data elements, as
-0.164199	have tested implement OneOrTwo5[b!=0] as
-0.164199	to use static linking, as
-0.164199	of the clock frequency, as
-0.164199	to insert optimization hints as
-0.164199	this unit is pipelined, as
-0.164199	to the critical stride, as
-0.164199	very expensive cache contentions, as
-0.164199	array with bounds checking, as
-0.164199	integer factorial function (n!) as
-0.164199	in the same directory as
-0.164199	to use a union, as
-0.164199	a serious legal issue, as
-0.164199	management and garbage collection, as
-0.164199	Windows Server 2008 R2 as
-0.313205	class objects and is not
-0.511513	All code that is not
-0.355790	is that it is not
-0.594557	means that it is not
-0.431597	memory if it is not
-0.610531	cycles if it is not
-0.431597	destructor if it is not
-0.236557	used when it is not
-0.236557	even when it is not
-0.236557	running when it is not
-0.459801	core then it is not
-0.459801	fine then it is not
-0.413802	first. If it is not
-0.291873	on which it is not
-0.350050	well, but it is not
-0.350050	pointer, but it is not
-0.627725	cases where it is not
-0.291873	other hand, it is not
-0.291873	crash. Furthermore, it is not
-0.291873	size. Today, it is not
-0.291873	modern software, it is not
-0.925733	that the function is not
-0.581841	that a function is not
-0.581841	If a function is not
-0.319117	a linked function is not
-0.919438	of the code is not
-0.308775	The function code is not
-0.564773	the system code is not
-0.308775	The built-in code is not
-0.340141	the code. This is not
-0.340141	of usability. This is not
-0.472113	x. The compiler is not
-0.218518	long as this is not
-0.263192	occurs, but this is not
-0.263192	factorials, but this is not
-0.218518	preceding example, this is not
-0.218518	maintenance. However, this is not
-0.291690	variables. Obviously, this is not
-0.218518	but unfortunately this is not
-0.414973	value of A is not
-0.401553	-0 } It is not
-0.309418	two libraries It is not
-0.309418	the loop. It is not
-0.309418	each thread. It is not
-0.309418	data structures It is not
-0.401553	cache problems. It is not
-0.309418	element zero. It is not
-0.309418	a profiler. It is not
-0.309418	16-bit programs. It is not
-0.309418	are lost. It is not
-0.309418	not standardized. It is not
-0.309418	performs poorly. It is not
-0.393461	and member functions is not
-0.308400	of C++ but is not
-1.539871	SSE2 instruction set is not
-0.423445	The integer size is not
-0.393461	invalid when i is not
-0.301337	a new object is not
-0.301337	the original object is not
-0.273230	a higher number is not
-0.637671	of the array is not
-0.770135	number of objects is not
-0.436679	calculate the table is not
-0.419921	but the performance is not
-0.309038	If this address is not
-0.292757	standard PC processors is not
-0.302861	then the error is not
-0.273230	with old CPUs is not
-0.458025	if the processor is not
-0.223068	one logical processor is not
-0.381047	Long double precision is not
-0.895563	the repeat count is not
-0.219420	contains several branches is not
-0.268845	If exception handling is not
-0.383565	handling Exception handling is not
-0.219420	the function name is not
-0.292757	the __fastcall keyword is not
-0.357236	the user interface is not
-0.073477	into a union is not
-0.073477	Using a union is not
-0.161969	A copy constructor is not
-0.161969	A default constructor is not
-0.219420	number of points is not
-0.273230	The data section is not
-0.302861	on one computer is not
-0.410921	added to p is not
-0.357236	of the STL is not
-0.314011	Check that index is not
-0.219420	of these conditions is not
-0.219420	that the alignment is not
-0.219420	the cross-platform compatibility is not
-0.254182	the second operand is not
-0.314011	General case, N is not
-0.219420	Excessive loop unrolling is not
-0.219420	the row length is not
-0.292757	www.agner.org/optimize/testp.zip. This tool is not
-0.219420	number of iterations is not
-0.219420	of memory required is not
-0.219420	of cache misses is not
-0.219420	in the debugger is not
-0.219420	the image base is not
-0.219420	and constant propagation is not
-0.219420	a program package is not
-0.564097	of the divisor is not
-0.219420	across compilers. Fastcall is not
-0.219420	address. Step (1) is not
-0.219420	application. If hyperthreading is not
-0.273230	intermediate file format is not
-0.219420	distribution and mirroring is not
-0.219420	1. This '1' is not
-0.236943	less than 2n and not
-0.236943	not human readable and not
-0.236943	n.a. n.a. _MSC_VER and not
-0.236943	bit platform __GNUC__ and not
-0.403295	of functions that are not
-0.403295	for functions that are not
-0.301989	variable lengths that are not
-0.301989	interface. Applications that are not
-0.301989	random events that are not
-0.497255	explanation if you are not
-0.293624	long as you are not
-0.443819	counters when you are not
-0.471433	results. If you are not
-0.614185	code and data are not
-0.329924	compiler). Fastcall functions are not
-0.296638	and Watcom compilers are not
-0.296638	73). Current compilers are not
-0.479018	mode. If there are not
-0.262963	time. The objects are not
-0.344809	dynamically allocated objects are not
-0.262963	when shared objects are not
-0.423843	Gnu function libraries are not
-0.257897	the standard libraries are not
-0.257897	and LIBM libraries are not
-0.247152	some operating systems are not
-0.247152	contemporary operating systems are not
-0.413904	programmers and they are not
-0.413904	a, but they are not
-0.321822	(when vector operations are not
-0.050680	nontemporal write instructions are not
-0.316365	earlier Intel processors are not
-0.437920	Floating point parameters are not
-0.309194	on network resources are not
-0.317111	The modern microprocessors are not
-0.292908	and model numbers are not
-0.219547	Many algebraic reductions are not
-0.219547	checks. These conversions are not
-0.273373	the function names are not
-0.219547	allocated in sequence are not
-0.035200	and PLT tables are not
-0.219547	IDE's for D are not
-0.292908	program. The profilers are not
-0.219547	most sorting algorithms, are not
-0.219547	operators (e.g. '>') are not
-0.354380	and what it can not
-0.237219	investment. A redesign can not
-0.330731	with reduced speed or not
-0.236985	to use hyperthreading or not
-0.325059	static linking and by not
-0.324942	platform __GNUC__ and not not
-1.376199	to tell the compiler not
-0.462905	kludgy that it may not
-0.550560	functions then it may not
-0.252705	and the compiler may not
-0.252705	but the compiler may not
-0.252705	reference, the compiler may not
-0.493526	access. The compiler may not
-0.284058	clearing arrays It may not
-0.284058	vector registers. It may not
-0.284058	contained objects? It may not
-0.271591	The allocated memory may not
-0.217972	of inlined functions may not
-0.217972	the level-1 cache may not
-0.291044	that current compilers may not
-0.271591	casting of pointers may not
-0.217972	starts. The user may not
-0.681425	The operating system may not
-0.217972	function returns. alloca may not
-0.217972	exit. Calling exit may not
-0.217972	and USB sticks may not
-0.293773	and 64-bit. They have not
-0.237504	skip large expressions when not
-0.349256	-fpic and it will not
-0.266643	sure that it will not
-0.266643	section, but it will not
-0.464566	then the code will not
-0.306852	in memory. It will not
-0.320539	below. The program will not
-0.310510	on, the compilers will not
-0.310510	precision. The compilers will not
-0.306852	1.23456. But we will not
-0.222711	to me. You will not
-0.222711	the code 16 will not
-0.347588	even when it has not
-0.482580	2. The compiler has not
-0.622456	vector class library has not
-0.282695	in some cases, but not
-0.292565	at compile- time, but not
-0.263605	linking is used, but not
-0.263605	on AMD processors, but not
-0.282695	certain Intel CPUs, but not
-0.210906	4 or 8, but not
-0.210906	stored in memory, but not
-0.282695	fast and efficient, but not
-0.210906	is a float, but not
-0.263605	in multiple applications, but not
-0.210906	code more complex, but not
-0.210906	(.lib or .a), but not
-0.210906	below 2 GB, but not
-0.210906	to be noticeable but not
-0.022783	an expression that should not
-0.340230	operands because you should not
-0.686702	The CPU dispatcher should not
-0.228304	The performance measurement should not
-0.356923	Error: lowest instruction set not
-0.102809	operating systems that do not
-0.102809	old microprocessors that do not
-0.157729	programming languages that do not
-0.102809	soft cores that do not
-0.077751	However, most compilers do not
-0.077751	reason why compilers do not
-0.133968	memory and we do not
-0.133968	large that we do not
-0.172524	Floating point variables do not
-0.077751	Intel function libraries do not
-0.077751	the Intel libraries do not
-0.172524	explicitly that pointers do not
-0.172524	but 32-bit systems do not
-0.172524	operators because they do not
-0.172524	these integer operations do not
-0.172524	that these directives do not
-0.172524	The bigger vectors do not
-0.172524	as when contentions do not
-0.172524	because relative references do not
-0.172524	pointer. These conversions do not
-0.172524	other STL containers do not
-0.172524	example, many programmers do not
-0.172524	now overlap. Compilers do not
-0.005305	their live ranges do not
-0.172524	I have studied do not
-0.172524	if their live-ranges do not
-0.172524	uses (live ranges) do not
-0.293377	vector aligned Assume pointer not
-0.236529	of a class need not
-0.236502	be signed. Be sure not
-0.292505	"\nError: Instruction set SSE2 not
-0.209994	advantage that it does not
-0.084677	const)) Assume function does not
-0.040278	ready-made profiler. This does not
-0.040278	per point. This does not
-0.108323	integer. The compiler does not
-0.108323	composer) This compiler does not
-0.108323	The Microsoft compiler does not
-0.084677	used. However, this does not
-0.137210	syntax check. It does not
-0.084677	inside the loop does not
-0.026433	that the pointer does not
-0.026433	decrementing a pointer does not
-0.026433	a specific pointer does not
-0.084677	The IPP library does not
-0.084677	member the object does not
-0.019672	power of 2 does not
-0.019672	powers of 2 does not
-0.084677	it is long does not
-0.084677	process or thread does not
-0.084677	important. This manual does not
-0.084677	that the list does not
-0.084677	The CPU dispatcher does not
-0.084677	70). The programmer does not
-0.084677	that pointer aliasing does not
-0.084677	the other hand, does not
-0.084677	unfortunately the unit-test does not
-0.084677	The same argument does not
-0.084677	} Example 14.26 does not
-0.235974	positive value, n. But not
-0.530757	compiler. It is therefore not
-0.267835	dedicated microprocessor and therefore not
-0.267835	system dependent and therefore not
-0.414639	late. You should therefore not
-0.311744	from scratch. This would not
-0.345188	processor X" is simply not
-0.234925	thread. If seconds was not
-0.234424	a scalar (Scalar means not
-0.309758	161 32 bit platform not
-0.341283	the compiler is usually not
-0.327627	causes problem that were not
-0.217381	the different tasks were not
-0.037536	at a time. Do not
-0.037536	a member function. Do not
-0.037536	registers are used. Do not
-0.037536	caching less efficient. Do not
-0.037536	current instruction set. Do not
-0.037536	dynamic memory allocation. Do not
-0.037536	contiguous memory block. Do not
-0.037536	prevents certain optimizations. Do not
-0.037536	a linked list. Do not
-0.037536	a scarce resource. Do not
-0.037536	row + column; Do not
-0.037536	and Enterprise editions). Do not
-0.231162	instruction set, but possibly not
-0.231221	for this function, though not
-0.285051	complex cases it might not
-0.228991	performance test should include not
-0.282968	only for Intel CPUs, not
-0.226457	5. If columns had not
-0.281260	Container classes are generally not
-0.415381	by the operating system, not
-0.226457	buffer with fixed size, not
-0.260904	be saved in registers, not
-0.184511	works only on registers, not
-0.212191	to use. I am not
-0.212191	embedded microcontrollers. I am not
-0.276309	bit platform not _WIN64 not
-0.276232	Windows version is currently not
-0.069288	in some cases. Does not
-0.069288	later instruction sets. Does not
-0.162114	only 32-bit Windows. Does not
-0.069288	including an IDE. Does not
-0.212004	Visual Studio IDE. Has not
-0.212004	stored in a register, not
-0.212116	processes. The profiler measures not
-0.330430	on my own research, not
-0.199741	STL containers is 95 not
-0.164900	DWORD PTR [edx] adds, not
-0.164900	lines follow the rows, not
-0.164900	flow. However, this did not
-0.164900	a non-recursing template specialization, not
-0.164900	for accessing arrays forwards, not
-0.164900	and operating systems (but not
-0.164900	main will take precedence, not
-0.376305	for intrinsic functions // This
-0.288883	N = 0 // This
-0.233220	// SSE3 required // This
-0.101139	int)b / 16; // This
-0.101139	int)b % 16; // This
-0.233220	Use simple method. // This
-0.233220	ReadTSC() - time1; // This
-0.233220	half a square. // This
-0.701495	} } } } This
-0.437104	b[r][c]; } } } This
-0.319542	of range } } This
-0.386688	d = 1; } This
-0.893961	a + 1; } This
-0.220663	i; return f; } This
-0.224071	i / 3; } This
-0.224071	i % 3; } This
-0.220663	3: printf("Delta"); break; } This
-0.220663	FuncB(i); } FuncC(i); } This
-0.220663	the four sums } This
-0.220663	FuncC(i); FuncB(i+1); FuncC(i+1); } This
-0.561793	version of the code. This
-0.355606	rest of the code. This
-0.275321	compiling the intermediate code. This
-0.221267	code to non-AVX code. This
-0.589588	10% of the time. This
-0.449846	square at a time. This
-0.221101	it the first time. This
-0.311106	take any extra time. This
-0.235689	operating systems. 10 Gnu This
-0.249054	version of the function. This
-0.206509	modules call the function. This
-0.206509	defined inside the function. This
-0.399346	from the dispatcher function. This
-0.234849	evict number 2, etc. This
-0.324914	any other member functions. This
-0.271476	of the virtual functions. This
-0.290923	the so-called intrinsic functions. This
-0.235099	= a || b; This
-0.347116	contiguously in the memory. This
-0.296193	save temp in memory. This
-0.325050	contiguous in program memory. This
-0.246534	the library into memory. This
-0.264873	results in RAM memory. This
-0.257085	part of the program. This
-0.639007	part of a program. This
-0.194366	in a C++ program. This
-0.194366	in the final program. This
-0.221579	the level- 1 cache. This
-0.590705	in the level-2 cache. This
-0.334655	point comparisons more efficient. This
-0.233347	on page 8 below. This
-0.288342	calculations on the data. This
-0.250232	own block of data. This
-0.567788	self-relative addressing of data. This
-0.348117	the FMA4 instruction set. This
-0.292403	versions for different compilers. This
-0.272891	used with other compilers. This
-0.271977	39 matrix[i][j] += x; This
-0.562797	x++) factorial *= x; This
-0.621766	the function is called. This
-0.343714	critical function is called. This
-0.498949	function is never called. This
-0.379308	versions for different CPUs. This
-0.196349	to support different CPUs. This
-0.268551	of the Intel compiler. This
-0.215283	the Intel C++ compiler. This
-0.171659	in the innermost loop. This
-0.171659	outside the innermost loop. This
-0.214329	as a memory pointer. This
-0.267473	of the member pointer. This
-0.539403	Linux and Windows platforms. This
-0.589360	with all x86 platforms. This
-0.404636	value in most cases. This
-0.212802	same in both cases. This
-0.194501	decimal point is 1. This
-0.245129	for N = 1. This
-0.645437	than 0 or 1. This
-0.317692	the level-1 cache size. This
-0.306742	most for register variables. This
-0.230939	input or network resources. This
-0.315883	same structure or class. This
-0.238870	to its child class. This
-0.188927	of the derived class. This
-0.525899	+ c + d; This
-0.285948	a pointer to it. This
-0.285948	the scarcity of registers. This
-0.286613	is typically 64 bytes. This
-0.578786	from the shared object. This
-0.693435	version of the library. This
-0.231105	than the actual calculations. This
-0.230636	use of integer operations. This
-0.259792	optimizations on the variable. This
-0.207527	to a local variable. This
-0.340531	for whole program optimization. This
-0.207194	compilers offer profile-guided optimization. This
-0.177372	than on the stack. This
-0.175097	cleans up the stack. This
-0.158685	have each their stack. This
-0.329506	usable library if possible. This
-0.230030	more space than needed. This
-0.406002	instance for each thread. This
-0.192030	work into each thread. This
-0.305650	changed by another thread. This
-0.337826	available for other purposes. This
-0.257171	for all these purposes. This
-0.371267	any floating point instructions. This
-0.284760	fit into the vector. This
-0.253949	programming languages as well. This
-0.202342	executed. Optimizes very well. This
-0.334678	of the CPU dispatching. This
-0.229064	structure of the problem. This
-0.970530	before the function returns. This
-0.519441	deallocated in random order. This
-0.515992	of dynamic memory allocation. This
-0.181491	of the memory block. This
-0.181491	new bigger memory block. This
-0.591816	time Func is executed. This
-0.519112	no check for overflow. This
-0.227052	have the same value. This
-0.031865	a single object file. This
-0.226863	the same logical register. This
-0.478095	to the operating system. This
-0.311898	operations are very fast. This
-0.226863	floating point multiplication units. This
-0.281879	elements in the array. This
-0.226675	of their 23 software. This
-0.454786	loading a cache line. This
-0.226863	positions in the vectors. This
-0.225298	type and its parameters. This
-0.288555	where speed is important. This
-0.194112	if portability is important. This
-0.183804	of a vector simultaneously. This
-0.233127	run eight threads simultaneously. This
-0.452145	below. Microsoft Visual Studio This
-0.645447	depending on the processor. This
-0.259204	large number of bits. This
-0.190913	extended to 64 bits. This
-0.145930	rather than 32 bits. This
-0.225513	than it actually is. This
-0.225513	more important than speed. This
-0.250896	cleanup jobs to do. This
-0.183804	CPUs, but event-counters do. This
-0.265419	rather than by 16. This
-0.183609	as i modulo 16. This
-0.363497	links. 20 Copyright notice This
-0.314022	Alignment of data members. This
-0.454289	address and back again. This
-0.175397	memory a hundred times. This
-0.297853	unpredictably at inconvenient times. This
-0.510945	of the preceding one. This
-0.363153	so-called time stamp counter. This
-0.363153	object's class or structure. This
-0.220923	using a ready-made profiler. This
-0.220923	the Gnu compiler manual. This
-0.220923	the loop are finished. This
-0.221221	32 sets 4 ways. This
-0.310887	vector intrinsics. Digital Mars This
-0.220923	virtual function dispatch process. This
-0.220923	size of data files. This
-0.304684	can do out-of-order execution. This
-0.113958	stored in different modules. This
-0.022361	by any other modules. This
-0.128844	extra code at all. This
-0.128844	no offset at all. This
-0.220923	uses 32-bit absolute addresses. This
-0.220923	big before multiplying them. This
-0.220923	three values per point. This
-0.301433	of registers is doubled. This
-0.584571	addition every clock cycle. This
-0.217131	to be filled up. This
-0.217131	version for marketing reasons. This
-0.217131	allocation for all objects. This
-0.498974	the same cache lines. This
-0.298446	& 15] += 1.0f; This
-0.217131	that have multiple versions. This
-0.217131	may not be cached. This
-0.217500	with zero-bits if unsigned. This
-0.354602	violations and invalid pointers. This
-0.531574	turn on this option. This
-0.217500	= a2 / b2; This
-0.270640	choose other programming languages. This
-0.217131	than the subsequent counts. This
-0.217131	packages faster and smaller. This
-0.546850	called from another module. This
-0.217131	to do this manually. This
-0.271058	to get reproducible results. This
-0.210883	a floating point addition. This
-0.210883	of the code only. This
-0.210883	when they are long. This
-0.293514	be calculated in advance. This
-0.263580	% 32 = 28. This
-0.121892	the program is loaded. This
-0.121892	pointer has been loaded. This
-0.210883	implemented in compiled C++. This
-0.210883	less efficient code caching. This
-0.210883	the array is stored. This
-0.210883	in a random manner. This
-0.210883	means of #include directives. This
-0.121892	to the function definition. This
-0.164330	inside the class definition. This
-0.106984	AMD and VIA CPUs"). This
-0.282669	code is the same. This
-0.210883	returning a null reference. This
-0.293514	underflow neutralize each other. This
-0.121892	has become too fragmented. This
-0.164330	heap has become fragmented. This
-0.210883	rounding instead of truncation. This
-0.121892	functions static or inline. This
-0.228253	Declare the function inline. This
-0.345554	or malloc and free. This
-0.013492	is created or modified. This
-0.210883	the destructor of x. This
-0.210883	four results in a. This
-0.198651	declare the table static. This
-0.198651	language and interface frameworks. This
-0.463621	or moving the mouse. This
-0.249794	unused bytes S1 ArrayOfStructures[100]; This
-0.249794	before it is compiled. This
-0.198651	* 4 = 32. This
-0.198651	b + a; 72 This
-0.198651	must be declared volatile. This
-0.198651	but also less safe. This
-0.198651	function to be pure. This
-0.328959	is out of range. This
-0.198651	<= u.f < 2.0 This
-0.198651	older microprocessors is lost. This
-0.198651	to the function declaration. This
-0.198651	variable is never changed. This
-0.249794	need the "override" feature. This
-0.198651	the array is defined. This
-0.249794	position-independent code by default. This
-0.198651	separately in software development. This
-0.249794	the object is known. This
-0.198651	zero, rather than rounding. This
-0.198651	register, not even temporarily. This
-0.249794	branch should be predicted. This
-0.463621	execution of everything else. This
-0.198651	variable-size arrays with alloca. This
-0.249794	time of 250 ms. This
-0.249794	constant (see page 137). This
-0.198651	holds the index, i. This
-0.249794	.......................................................................................................... 164 1 Introduction This
-0.249794	much higher than normal. This
-0.198651	cache misses have occurred. This
-0.198651	although slightly less efficiently. This
-0.249794	i++) sum += list[i]; This
-0.198651	link. Use different executables. This
-0.163895	be less than 231. This
-0.163895	may have undesired effects. This
-0.163895	at the label $B1$2:. This
-0.163895	has been called before. This
-0.163895	memory bus is saturated. This
-0.163895	one it is compiling. This
-0.163895	making longer time slices. This
-0.163895	can exceed 2 Gbytes. This
-0.163895	cost of task switching. This
-0.163895	technological point of view. This
-0.163895	version 2.11 ifunc branch). This
-0.163895	IEEE standard 754 (1985). This
-0.163895	allocated is also de-allocated. This
-0.163895	stored at address [ecx+eax*4]. This
-0.163895	= b + 0.666666666666666666667; This
-0.163895	should be made local. This
-0.163895	(16 or 32 bytes). This
-0.163895	searching for vacant spaces. This
-0.163895	that it becomes full. This
-0.163895	#if instead of if. This
-0.163895	the calculation of (a+b). This
-0.163895	(or part of it). This
-0.163895	can be quite substantial. This
-0.163895	values of its arguments. This
-0.163895	option /QaxAVX or -axAVX. This
-0.163895	by (partial) template specialization. This
-0.163895	after the last member. This
-0.163895	Visual Studio 2008 version). This
-0.163895	no exception ever happens. This
-0.163895	table with two entries. This
-0.163895	is Microsoft Visual Studio. This
-0.163895	n; #endif return n;} This
-0.163895	z != 0; 35 This
-0.163895	and test their functionality. This
-0.163895	ret ALIGN ; mark_end; This
-0.163895	memory page size (4096). This
-0.163895	last 8 columns unused. This
-0.163895	data exceeds 64 kbytes. This
-0.163895	that can be reduced. This
-0.163895	vector math library (SVML). This
-0.163895	may happen quite often. This
-0.163895	flag (e.g. DEC, JNZ). This
-0.163895	operating system thread scheduler. This
-0.163895	errors must be added. This
-0.163895	than the external clock. This
-0.163895	of list plus i*sizeof(S1). This
-0.163895	requested. See page 45. This
-0.163895	user interfaces from scratch. This
-0.163895	than a polymorphous class? This
-0.163895	1 (see page 135). This
-0.163895	square blocking or tiling. This
-0.163895	below in example 16.1. This
-0.163895	by comparing bits 32-62. This
-0.163895	explained on page 87. This
-0.163895	is available from www.agner.org/optimize/testp.zip. This
-0.163895	as string or CString. This
-0.163895	in the previous iteration. This
-0.163895	Library (MKL v. 7.2). This
-0.163895	the YMM register state. This
-0.163895	that memset is deprecated. This
-0.163895	is calculated as ((a+b)+c)+d. This
-0.163895	C++ compiler (parallel composer) This
-0.163895	-fpie instead of -fpic. This
-0.163895	Gnu utilities in 2010. This
-0.163895	for each line written. This
-0.163895	discussed on page 158. This
-0.163895	as a time measure. This
-0.163895	the normal return route. This
-0.163895	(OnIdle in Windows MFC). This
-0.163895	require other access patterns. This
-0.163895	of overflow is "undefined". This
-0.163895	with the option -mveclibabi=svml. This
-0.163895	7.45 // Portability note: This
-0.163895	known at this place. This
-0.163895	8 = 64 kb. This
-0.163895	broader perspective of usability. This
-0.163895	= log(b[i]) + log(c[i]);. This
-0.163895	s = (short int)i; This
-0.163895	output (/FAs or -fsource-asm). This
-0.163895	prototype: void F1() throw(); This
-0.163895	the instruction xor eax,eax. This
-0.512738	| a = a -
-0.460245	| 0 = a -
-0.106435	- -(-a) = a -
-0.106435	n.a. -(-a) = a -
-0.106435	- a*1 = a -
-0.106435	n.a. a*1 = a -
-0.106435	- a+0 = a -
-0.106435	n.a. a+0 = a -
-0.247728	- a/1 = a -
-0.247728	reductions: ~(~a) = a -
-0.441213	else { return a -
-0.546773	* 3; return a -
-0.575069	constant = multiply by -
-0.195129	- - - - -
-0.084374	x - - - -
-0.016573	n.a. - - - -
-0.110499	-- - - - -
-0.265996	- x - - -
-0.478910	x x - - -
-0.222305	-1 x - - -
-0.053755	- n.a. - - -
-0.378937	x n.a. - - -
-0.133550	x -- - - -
-0.163119	- - x - -
-0.154680	x - x - -
-0.033607	n.a. - x - -
-0.397852	- x x - -
-0.529157	x x x - -
-0.117622	n.a. x x - -
-0.153077	= -1 x - -
-0.096303	a - n.a. - -
-0.220232	-1 - n.a. - -
-0.220232	a&(b|c) - n.a. - -
-0.220232	a<<(b+c) - n.a. - -
-0.247795	n.a. x n.a. - -
-0.123083	- n.a. n.a. - -
-0.169768	x x -- - -
-0.126821	((x2)2)2 a+a+a+a=a*4 -(-a)=a - -
-0.166086	a = a x -
-0.270209	- - - x -
-0.228338	x - - x -
-0.172884	-(-a)=a - - x -
-0.343174	- x - x -
-0.714623	x x - x -
-0.016250	- n.a. - x -
-0.568939	- - x x -
-0.474028	x - x x -
-0.945076	- x x x -
-0.637780	x x x x -
-0.389499	n.a. x x x -
-0.082191	- n.a. x x -
-0.239212	x (x) x x -
-0.019097	- n.a. n.a. x -
-0.280093	return (2.5f * x -
-0.166086	-1 = -1 x -
-0.166086	vectorization Devirtualization ---x----- x -
-0.166086	--xx----- x-xxx---x x-xxx---x x -
-1.424007	part of the program -
-0.359056	= a - n.a. -
-0.268116	- n.a. - n.a. -
-0.152592	x n.a. - n.a. -
-0.152592	reciprocal n.a. - n.a. -
-0.258869	= 0 - n.a. -
-0.203275	0= 0 - n.a. -
-0.236609	= -1 - n.a. -
-0.236609	= a&(b|c) - n.a. -
-0.236609	= a<<(b+c) - n.a. -
-0.221149	- n.a. x n.a. -
-0.139083	x n.a. x n.a. -
-0.033173	- - n.a. n.a. -
-0.007539	x - n.a. n.a. -
-0.123541	by - reciprocal n.a. -
-0.536654	= ((x2) 2) 2 -
-0.100785	or double takes 4 -
-0.100785	microprocessor. Multiplication takes 4 -
-0.419477	data cache of 8 -
-0.262997	& ~a = 0 -
-0.107623	- a-a = 0 -
-0.107623	n.a. a-a = 0 -
-0.054406	- a*0 = 0 -
-0.054406	n.a. a*0 = 0 -
-0.262997	a ^a = 0 -
-0.173432	- 0/a = 0 -
-0.116616	- andnot(a,a) = 0 -
-0.179494	// test bits 0 -
-0.179494	instruction takes typically 0 -
-0.179494	a & 0= 0 -
-0.343093	(2) use unsigned integers -
-0.235289	Programmer’s Manual", Volume 1 -
-0.298351	misprediction penalty of 10 -
-0.217382	not detected until 10 -
-0.232505	= (a >= b) -
-0.232095	The code is inlined -
-0.171796	4 processors, and 3 -
-0.171796	point addition takes 3 -
-0.171796	it may take 3 -
-0.227923	misprediction is approximately 12 -
-0.554375	| -1 = -1 -
-0.226505	It is quite expensive -
-0.224721	cycles. Division takes 14 -
-0.222118	the conversion takes 50 -
-0.271980	Integer division takes 40 -
-0.391813	development, testing and maintenance -
-0.264898	assembly listing /FA -S -
-0.212050	x x- x ----- -
-0.212147	n.a. (-a)*(-b) = a*b -
-0.212050	CriticalFunction(); timediff[i] = ReadTSC() -
-0.294141	x n.a. Constant folding -
-0.035740	- a+b+c = a+(b+c) -
-0.035740	n.a. (a+b)+c = a+(b+c) -
-0.074687	x x x -- -
-0.074687	- - xxxxxxxxx -- -
-0.017502	- a*b+a*c = a*(b+c) -
-0.017502	n.a. a*b+a*c = a*(b+c) -
-0.251071	= ((x2)2)2 a+a+a+a=a*4 -(-a)=a -
-0.330491	b+a, a*b = b*a -
-0.330491	- (a&b)|(a&c) = a&(b|c) -
-0.164942	a matter of convenience -
-0.164942	x ----- - x-xxx -
-0.164942	test // (time after) -
-0.164942	for (i = (int)n -
-0.164942	// polynomial(x) = 2.5*x^2 -
-0.164942	- a<<b<<c = a<<(b+c) -
-0.164942	subtraction and multiplication (27 -
-0.164942	subtraction and multiplication (20 -
-0.164942	14.5b if ((unsigned int)(i -
-0.164942	addition and subtraction (3 -
-0.164942	Denmark. Copyright © 2004 -
-0.164942	- x-xxx - xx(-)x- -
-0.164942	min) <= (unsigned int)(max -
-0.164942	many features, see http://www.agner.org/optimize/ -
-0.164942	n.a. a+a+a+a = a*4 -
-0.164942	x x-- x --- -
-0.521076	Any expression that is an
-0.585612	fast if it is an
-0.707366	of a program is an
-0.444058	A smart pointer is an
-0.069575	not if b is an
-0.033396	b if b is an
-0.549325	two then there is an
-0.326214	this case there is an
-0.326214	pointer. But there is an
-0.676982	some cases, there is an
-0.337530	function libraries. C++ is an
-0.340454	{ ... There is an
-0.340454	unit throughput There is an
-0.463039	whether the processor is an
-0.599475	the loop counter is an
-0.234523	if the source is an
-0.416208	serious when n is an
-0.234523	recognize that 10 is an
-0.657099	when the exponent is an
-1.023689	when the size of an
-0.529368	i. The size of an
-0.184853	calculating the address of an
-0.442919	all the bits of an
-0.514349	Re-interpreting the type of an
-0.218353	function in case of an
-0.218353	program in case of an
-0.095596	up in case of an
-0.218353	everything in case of an
-0.804349	The different versions of an
-0.321567	result. An overflow of an
-0.328804	an unused copy of an
-0.515754	past the end of an
-0.423036	limit the range of an
-0.327756	registers used. Conversion of an
-0.437658	and the throughput of an
-0.499484	signaling the availability of an
-0.377031	on just-in-time compilation of an
-0.233743	in the event of an
-0.233743	verifying the functionality of an
-0.291582	the console or to an
-1.104465	to a pointer to an
-0.589914	floating point number to an
-0.323266	are first compiled to an
-0.235593	data are aligned to an
-0.292395	p always points to an
-0.292395	pointer actually points to an
-0.620672	The same applies to an
-0.901226	can be converted to an
-0.235593	that add functionality to an
-0.417834	compiler reduced 15.1a to an
-0.235593	for adding bounds-checking to an
-0.237006	make a multiplication and an
-0.293190	dedicated microprocessor core and an
-0.293190	in exclusive mode, and an
-0.416949	through a pointer in an
-1.233806	number of elements in an
-0.495889	The objects stored in an
-0.289801	expression, or first in an
-0.335956	number of bits in an
-0.633740	numerically largest element in an
-0.327063	of each run in an
-0.234027	dynamic memory allocation in an
-0.234027	implement a microprocessor in an
-0.289801	often true last in an
-0.101727	are stored together in an
-0.101727	always stored together in an
-0.402404	Examples are provided in an
-0.234027	in question: Put in an
-0.234027	a thread-like scheduling in an
-0.350082	popular and used for an
-0.288810	See page 32 for an
-0.233155	will be allocated for an
-0.694920	See page 130 for an
-0.233155	See page 80 for an
-0.233155	See page 43 for an
-0.233155	See page 81 for an
-0.023158	See page 89 for an
-0.023158	See page 78 for an
-0.023158	and VIA CPUs" for an
-0.452675	You cannot assume that an
-0.237267	a program dictates that an
-0.894388	is known to be an
-0.441405	it appears to be an
-0.341364	function argument to be an
-0.837160	cases it can be an
-0.580915	operations. This can be an
-0.351498	oriented programming can be an
-0.351045	linear array will be an
-0.447924	map may also be an
-0.344602	{} which would be an
-0.568535	counter should preferably be an
-0.532828	a command line or an
-0.290768	be an expression or an
-0.234877	a link map or an
-0.234877	to an integer, or an
-0.234877	overloaded assignment operator, or an
-0.290673	function, etc., and if an
-0.476512	necessary to check if an
-0.234793	returns. But what if an
-0.454881	code will fail if an
-0.234793	(Not A Number) if an
-0.337567	installation process or by an
-0.512740	can be calculated by an
-0.294919	could be calculated by an
-0.234552	A command received by an
-0.234552	must be followed by an
-0.307640	linked list or with an
-0.325981	in the code with an
-0.287447	me explain this with an
-0.231956	error can return with an
-0.399356	can be accessed with an
-0.437996	multiplication is done with an
-0.318099	which is implemented with an
-0.287447	standard PC platform with an
-0.234997	function is called on an
-0.236187	code is running on an
-0.341498	only when running on an
-0.546150	Java are based on an
-0.225341	to access x as an
-0.279938	organizing the data as an
-0.116650	counter is used as an
-0.116650	bool is used as an
-0.299777	floating point variable as an
-0.678240	the same way as an
-0.225341	These are available as an
-0.299777	which is transferred as an
-0.365389	language is provided as an
-0.279938	i is interpreted as an
-0.779428	can be expressed as an
-0.389665	x is treated as an
-0.365389	that is coded as an
-0.279938	can be represented as an
-0.225341	factorial function (n!) as an
-0.440509	code. This is not an
-0.533150	example, this is not an
-0.137749	the processor is not an
-0.137749	logical processor is not an
-0.497375	to take more than an
-0.235276	is more expensive than an
-0.235276	and more compact than an
-0.244678	end user will have an
-0.244678	a processor will have an
-0.325484	optimization Some compilers have an
-0.325484	places). Some compilers have an
-0.238010	loop. Most compilers have an
-0.238010	slower. Many compilers have an
-0.313556	by F1 also have an
-0.409627	times then we have an
-0.279579	you don't even have an
-0.795066	the compiler doesn't have an
-0.225024	compiler for Linux have an
-0.349514	are called every time an
-0.649370	is inefficient to use an
-0.515704	expensive. You may use an
-0.336204	output option then use an
-0.231987	modern programming languages use an
-0.231987	against overkill. Don't use an
-0.237369	on some microprocessors when an
-0.237415	not declared volatile then an
-0.107730	should be stored at an
-0.107730	preferably be stored at an
-0.284249	has to start at an
-0.090927	to be loaded at an
-0.090927	can be loaded at an
-0.229140	array must begin at an
-0.503532	branch if it has an
-0.431990	If the compiler has an
-0.380164	the Intel compiler has an
-0.431299	*.so). The program has an
-0.334612	program or library has an
-0.228738	three functions. Sum1 has an
-0.424506	that you can make an
-0.389856	PowerPC). We can make an
-0.321584	point counter then make an
-0.237369	not an issue because an
-0.485572	bits. This is only an
-0.326301	a multiplication but only an
-0.289101	two pointers requires only an
-0.237287	resources cleaned up. If an
-0.237220	implementations of Pascal used an
-0.335385	fastest way to set an
-1.043438	is possible to do an
-0.234558	128-bit vector register, do an
-0.484555	biggest disadvantage of using an
-0.301690	The reason for using an
-0.145908	If you are using an
-0.347264	avoid this by using an
-0.226217	function or class into an
-0.226217	splitting of software into an
-0.226217	code is compiled into an
-0.311112	cannot be loaded into an
-0.226217	one by one, into an
-0.237038	this by making i an
-0.285747	or reference to such an
-0.230459	compiler doesn't make such an
-0.230459	CPU hardware. Porting such an
-0.236472	The function may return an
-0.287896	calls faster and makes an
-0.232351	because the linker makes an
-0.350519	table lookup is often an
-0.329985	and we don't need an
-0.230792	as command-line versions without an
-0.230792	but in applications without an
-0.295120	error is to access an
-0.383945	In order to access an
-0.217114	by two and making an
-0.607139	be avoided by making an
-0.238835	the compiler from making an
-0.508431	counts how many times an
-0.235815	make any assumption about an
-0.235737	16 bits wide, while an
-0.291879	long as you avoid an
-0.228530	graphics cards, etc. Use an
-0.283556	lots of data. Use an
-0.235791	C0::f or C1::f. But an
-0.283470	a DLL goes through an
-0.228454	accessed from main through an
-0.226642	32-bit Mac code uses an
-0.226642	branch). This feature uses an
-0.905295	in order to get an
-0.235368	want to check whether an
-0.311367	the microprocessor is doing an
-0.660311	then it will run an
-0.234924	the loop or add an
-0.345002	An enum is simply an
-0.234558	in example 11.2b was an
-0.499909	because, in most cases, an
-0.473213	optimizing compiler can replace an
-0.378019	class that behaves like an
-0.234402	to a function. Using an
-0.310588	multiple threads and put an
-0.464203	library because it needs an
-0.309745	Most importantly, it requires an
-0.233661	on page 58 shows an
-0.209021	operating system to generate an
-0.209021	are able to generate an
-0.175930	-56 which will generate an
-0.175930	The linker will generate an
-0.175930	of c+b will generate an
-0.233435	program, you should choose an
-0.319548	$B1$2:. This is just an
-0.213543	kinds of code gives an
-0.213543	class library, SSE4.1 gives an
-0.232281	the application software. Such an
-0.336401	pointer is in fact an
-0.232281	for 32-bit Windows, including an
-0.231469	integer arithmetic operations. When an
-0.231297	key press. 19 Avoid an
-0.231297	tasks such as copying an
-0.208040	extra cost to accessing an
-0.208040	functions or when accessing an
-0.332294	be improved by adding an
-0.231355	because the write causes an
-0.287088	signed when you divide an
-0.230684	has a branch (e.g. an
-0.286282	number. We can convert an
-0.425647	safe way to handle an
-0.324399	are risking to insert an
-0.199722	may be called whenever an
-0.199722	they are declared whenever an
-0.229535	Make the function modify an
-0.229683	an array or setting an
-0.579581	compiler to optimize away an
-0.226071	the first PC's had an
-0.224239	plus a constant plus an
-0.224361	size if you declare an
-0.224239	[] operator will detect an
-0.278965	software programming language defines an
-0.224483	a smart pointer. Accessing an
-0.275798	used simply to increment an
-0.221688	is that it adds an
-0.221978	so unless you specify an
-0.218069	the way of declaring an
-0.217889	to frame functions. While an
-0.211630	above code will catch an
-0.211630	makes the dispatcher signal an
-0.211630	Borland/CodeGear/Embarcadero C++ builder Has an
-0.211630	the program to issue an
-0.211630	the condition clause. Comparing an
-0.211630	that can possibly throw an
-0.264423	processor the user expects an
-0.199377	Make the function construct an
-0.199377	with a slow CPU, an
-0.199377	has defined a constructor, an
-0.199377	user input never exceeds an
-0.199377	error simply by performing an
-0.199377	reference. This will provoke an
-0.199377	sum += a[i]; Converting an
-0.250611	be avoided by replacing an
-0.164565	how this works, here's an
-0.164565	recovering or for issuing an
-0.164565	Similarly, we are seeing an
-0.164565	The next line provokes an
-0.164565	library, you are feeding an
-0.164565	function that simply prints an
-0.164565	case F2 actually throws an
-0.164565	instruction set specified. Insert an
-0.164565	was manipulated to fake an
-0.164565	microprocessor hardware for raising an
-0.164565	The function that detects an
-0.835586	float or double to int
-0.237504	2 * 5; to int
-0.396428	unlimited 4 bytes = int
-0.054801	bytes = float or int
-0.452802	The size of an int
-0.329357	will fail if an int
-0.235354	bits wide, while an int
-0.355293	unsigned 256 short int int
-0.449101	14.9 struct S1 { int
-0.220662	Bitfield { struct { int
-0.538029	int * p) { int
-0.063184	int & r) { int
-0.274636	7.40a struct Bitfield { int
-0.022183	} int main() { int
-0.220662	long long ReadTSC() { int
-0.096465	module2.cpp int Func2() { int
-0.096465	} void Func2() { int
-0.358943	{ if (y) { int
-0.022183	a[SIZE][SIZE], double b[SIZE][SIZE]) { int
-0.220662	a, int x[]) { int
-0.335731	called only first time int
-0.234317	return (*CriticalFunction)(parm1, parm2); } int
-0.290130	version return &CriticalFunction_386; } int
-0.234317	memset(a, 0, sizeof(a)); } int
-0.236981	Round to nearest integer int
-0.460946	Detect supported instruction set int
-0.036742	set, using asmlib library int
-0.035609	{...} // SSE2 version int
-0.510158	{...} // AVX version int
-0.046001	"asmlib.h" // Lowest version int
-0.046001	&CriticalFunction_Dispatch; // Lowest version int
-0.292555	Iu16vec4 32 2 2 int
-0.303828	64-bit Linux: unsigned long int
-0.228752	in 16-bit systems: long int
-0.228752	29 64-bit Linux: long int
-0.223349	value. However, the const int
-0.059195	class c1 { const int
-0.059195	void MathLoop() { const int
-0.127752	// EMMS } const int
-0.195040	123 and static const int
-0.195040	of factorials: static const int
-0.127752	8.24. Integer constant const int
-0.127752	unsigned int i; const int
-0.127752	12.1a. Automatic vectorization const int
-0.127752	int order(int x); const int
-0.127752	X __attribute__((aligned(16))) #endif const int
-0.127752	// Example 14.8 const int
-0.127752	from example 16.1 const int
-0.127752	// Example 14.30 const int
-0.127752	// Example 9.5a const int
-0.127752	// Example 11.3 const int
-0.127752	// Example 9.4 const int
-0.127752	// Example 7.17 const int
-0.028590	9.1a int Func(int); const int
-0.028590	9.1b int Func(int); const int
-0.127752	// Example 9.6a const int
-0.127752	// Example 11.2b const int
-0.127752	size of squares: const int
-0.127752	Table of factorials: const int
-0.127752	// Example 14.5a const int
-0.127752	// Example 11.2a const int
-0.127752	// Example 7.33b const int
-0.127752	// Example 7.33a const int
-0.127752	FuncRow(int); int FuncCol(int); const int
-0.127752	// Example 14.4a const int
-0.330391	Iu16vec8 Vec8us 32 4 int
-0.484136	list[size], sum = 0; int
-0.336545	0, sum2 = 0; int
-0.336545	absvalue, largest_abs = 0; int
-0.202456	type-casting i to unsigned int
-0.230096	making i an unsigned int
-0.178777	2 2 int unsigned int
-0.040503	struct Sdouble { unsigned int
-0.040503	struct Slongdouble { unsigned int
-0.040503	struct Sfloat { unsigned int
-0.194419	Vec4i 32 4 unsigned int
-0.014749	// fractional part unsigned int
-0.134974	union {double d; unsigned int
-0.134974	ipow (double x, unsigned int
-0.134974	{ float f; unsigned int
-0.352807	in 16-bit systems: unsigned int
-0.134974	nonzero and normal unsigned int
-0.134974	arraysize = 1000; unsigned int
-0.134974	// Example 7.7 unsigned int
-0.134974	// Example 7.25 unsigned int
-0.134974	fractional part 142 unsigned int
-0.134974	int u[2]} a[size]; unsigned int
-0.134974	template <typename T, unsigned int
-0.134974	exponent + 0x3FF unsigned int
-0.134974	exponent + 0x3FFF unsigned int
-0.134974	// Example 14.22b unsigned int
-0.134974	// Example 14.22a unsigned int
-0.134974	0 65535 uint16_t unsigned int
-0.134974	exponent + 0x7F unsigned int
-0.329794	16 8 128 SSE2 int
-0.043439	size other than short int
-0.043439	struct S1 { short int
-0.043439	of int. A short int
-0.043439	speed by using short int
-0.043439	Iu8vec8 16 4 short int
-0.043439	Vec16uc 16 8 short int
-0.014002	16 4 unsigned short int
-0.014002	16 8 unsigned short int
-0.014002	255 uint8_t unsigned short int
-0.043439	16 128 SSE2 short int
-0.043439	numbers of type short int
-0.010459	7.21 int i; short int
-0.010459	7.23 int i; short int
-0.043439	4 unsigned 256 short int
-0.043439	Vec32c unsigned char short int
-0.043439	32 256 AVX2 short int
-0.000057	short int bb[], short int
-0.000085	SelectAddMul(short int aa[], short int
-0.000511	SelectAddMul_dispatch(short int aa[], short int
-0.000511	FUNCNAME(short int aa[], short int
-0.000511	FuncType(short int aa[], short int
-0.004617	size Alignd ( short int
-0.004617	arrays Alignd ( short int
-0.004617	); Alignd ( short int
-0.043439	8 64 MMX short int
-0.043439	byte at 11 short int
-0.043439	// Example 7.22 short int
-0.043439	-128 127 int8_t short int
-0.329179	above line doesn't work int
-0.715978	r) { int i; int
-1.156909	int SomeFunction (int a, int
-0.503759	Signed and unsigned integers int
-0.291470	64 4 256 AVX int
-0.281571	a + b;} }; int
-0.226780	c; int UnusedFiller; }; int
-0.063112	int a; int b; int
-0.136981	{int a; int b; int
-0.367347	int a; double b; int
-0.272802	#include <emmintrin.h> static inline int
-0.272802	Example 14.19 static inline int
-0.272802	return _mm_cvtss_si32(_mm_load_ss(&x));} static inline int
-0.234913	3628800, 39916800, 479001600}; ... int
-0.290005	int 256 unsigned 256 int
-0.311943	int b; int c; int
-0.281470	public B2 { public: int
-0.281470	class S2 { public: int
-0.281470	class S3 { public: int
-0.045261	c:2; }; Bitfield x; int
-0.045261	abc; }; Bitfield x; int
-0.812289	int size = 100; int
-0.257094	100, NUMCOLUMNS = 100; int
-0.319700	16 16 256 AVX2 int
-0.233234	good compilers will reduce int
-0.528380	dynamic memory allocation are: int
-0.165699	S1 { int a; int
-0.165699	{ public: int a; int
-0.147870	Example 14.2a float a; int
-0.147870	Example 14.2b float a; int
-0.068292	struct abc {int a; int
-0.068292	struct Sab {int a; int
-0.321690	union { double d; int
-0.230957	int Multiply (int x, int
-0.019019	union { float f; int
-0.382709	int i; } u; int
-0.002145	version int CriticalFunction_386(int parm1, int
-0.002145	version int CriticalFunction_SSE2(int parm1, int
-0.004301	version int CriticalFunction_AVX(int parm1, int
-0.004301	127 int CriticalFunction_AVX(int parm1, int
-0.017464	typedef int CriticalFunctionType(int parm1, int
-0.017464	time int CriticalFunction_Dispatch(int parm1, int
-0.227688	T & operator[] (unsigned int
-0.190537	7.3. Explain volatile volatile int
-0.190537	{ int dummy[4]; volatile int
-0.281084	int NumberOfTests = 10; int
-0.191090	7.22 short int a[100]; int
-0.166603	Example 7.26b float a[100]; int
-0.166603	Example 7.26a float a[100]; int
-0.226011	// AVX version 127 int
-0.310862	16 4 64 MMX int
-0.902296	int in 16-bit systems: int
-0.224443	int size = 16; int
-0.224312	last byte at 7 int
-0.575873	n, factorial = 1.0; int
-0.003821	x); } void SelectAddMul(short int
-0.003821	with branch void SelectAddMul(short int
-0.003821	vector classes void SelectAddMul(short int
-0.003821	call inline void SelectAddMul(short int
-0.003821	*)d, x);} void SelectAddMul(short int
-0.003821	function vectorized: void SelectAddMul(short int
-0.506903	n) { // n! int
-0.323962	int size = 1000; int
-0.159246	int ArraySize = 1000; int
-0.221629	The code is __asm int
-0.008644	Example 14.12b int list[300]; int
-0.008644	Example 14.13b int list[300]; int
-0.008644	Example 14.13a int list[300]; int
-0.008644	Example 14.12a int list[300]; int
-0.218025	{ public: B2 b2; int
-0.015489	= 8; float matrix[rows][columns]; int
-0.015489	= 32; float matrix[rows][columns]; int
-0.015489	= 50; float matrix[rows][columns]; int
-0.211828	{ union { 89 int
-0.211573	double b;}; S1 list[100]; int
-0.211828	Examples: // Example 14.10 int
-0.211828	calculations: // Example 14.11 int
-0.211828	Example: // Example 8.7 int
-0.211828	conversion // Example 7.21 int
-0.567646	matrix[rows][columns]; int i, j; int
-0.107091	int size = 1024; int
-0.048229	void Func1 (int a[], int
-0.011564	8.26a void Func(int a[], int
-0.011564	8.26b void Func(int a[], int
-0.211573	with desired parameters typedef int
-0.211828	Example: // Example 7.23 int
-0.211828	conversion // Example 7.20 int
-0.211828	conversions: // Example 7.19 int
-0.211828	efficient: // Example 7.18 int
-0.464856	swapd(x,y) {temp=x; x=y; y=temp;} int
-0.199321	with: // Example 14.12b int
-0.035660	Polynomial coefficients double Table[100]; int
-0.035660	= 3.3; double Table[100]; int
-0.008644	int a:4; int b:2; int
-0.464856	string[100], *p = string; int
-0.199321	with: // Example 14.13b int
-0.199321	= 100; S1 list[size]; int
-0.250548	c;}; abc * p; int
-0.250548	entry point extern "C" int
-0.017464	struct { int a:4; int
-0.017464	Bitfield { int a:4; int
-0.199321	Example 7.14 class c1; int
-0.464856	return x * m;} int
-0.199321	= *p + 2;} int
-0.164513	Example: // Example 14.3a int
-0.164513	table: // Example 14.3b int
-0.164513	= 100; int matrix[NUMROWS][NUMCOLUMNS]; int
-0.164513	statement: // Example 8.9b int
-0.164513	Example: // Example 8.9a int
-0.164513	table: // Example 14.1b int
-0.164513	this: // Example 14.1a int
-0.164513	last byte at 403 int
-0.164513	100, max = 110; int
-0.164513	loop: // Example 14.13c int
-0.164513	operations: // Example 14.13a int
-0.164513	Example 7.18 int FuncRow(int); int
-0.164513	Example: // Example 8.13a int
-0.164513	to: // Example 8.13b int
-0.164513	structures: // Example 9.1a int
-0.164513	follows: // Example 9.1b int
-0.164513	to: // Example 8.11b int
-0.164513	Example: // Example 8.11a int
-0.164513	parameter: // Example 7.42 int
-0.164513	int b;}; Sab ab[size]; int
-0.164513	// Dispatcher void SelectAddMul_dispatch(short int
-0.164513	+ 1; } module2.cpp int
-0.164513	expected. Use square blocking: int
-0.164513	m;} template <int m> int
-0.164513	Example: // Example 8.6a int
-0.164513	by // Example 8.6b int
-0.164513	Example 14.6 float list[16]; int
-0.164513	// Example 8.20 module1.cpp int
-0.164513	gives: // Example 7.30b int
-0.164513	Example: // Example 7.30a int
-0.164513	Example 14.13c int list[301]; int
-0.164513	2: template <bool IsPowerOf2, int
-0.164513	AddTwo(int * __restrict aa, int
-0.164513	each version void FUNCNAME(short int
-0.164513	explicitly by writing: __declspec(align(64)) int
-0.164513	Example: // Example 8.12a int
-0.164513	to: // Example 8.12b int
-0.164513	type typedef void FuncType(short int
-0.164513	Example: // Example 14.12a int
-0.164513	to: // Example 8.14b int
-0.164513	Example: // Example 8.14a int
-0.164513	{return p->a + p->b;} int
-0.164513	16 -32768 32767 int16_t int
-0.164513	double d = 1.6; int
-0.164513	last byte at 399 int
-0.164513	from example 9.5a: 98 int
-0.057198	takes no more time than
-0.057198	take no more time than
-0.141974	list takes more time than
-0.141974	conversion takes more time than
-0.192312	may take more time than
-0.192312	sometimes take more time than
-0.027658	takes much more time than
-0.123089	to consume more time than
-0.205379	multiplication takes longer time than
-0.125988	to take longer time than
-0.012324	takes much longer time than
-0.450400	is faster to use than
-0.450400	are safer to use than
-0.435958	if there is more than
-0.279008	is accessed in more than
-0.294771	same register for more than
-0.260075	be increased by more than
-0.279008	if functions have more than
-0.207778	will run at more than
-0.297672	priority is no more than
-0.279008	order or do more than
-0.213735	process to take more than
-0.213735	it can take more than
-0.286035	process may take more than
-0.318121	are often much more than
-0.288793	the program uses more than
-0.021149	few clock cycles more than
-0.207778	non-Intel CPUs was more than
-0.207778	time is actually more than
-0.207778	need to load more than
-0.207778	that can go more than
-0.207778	same subexpression occurs more than
-0.207778	cache cannot prefetch more than
-0.207778	In some programs, more than
-0.207778	code" actually implies more than
-0.237325	may sample more data than
-0.557608	table in the program than
-0.352093	to load a program than
-0.342920	in optimizing library functions than
-0.355915	closer to the CPU than
-0.231379	of any size other than
-0.231379	in a library other than
-0.231379	Bit-fields of sizes other than
-0.231379	the class c1 other than
-0.461029	a higher instruction set than
-0.529097	write a + b than
-0.699698	in a dynamic library than
-0.232532	that is more efficient than
-0.232532	which is more efficient than
-0.232532	class is more efficient than
-0.232532	Linux is more efficient than
-0.232532	#if is more efficient than
-0.232532	pre-increment is more efficient than
-0.232532	*(p++) is more efficient than
-0.232532	array[i++] is more efficient than
-0.264581	cheaper and more efficient than
-0.264581	functions are more efficient than
-0.264581	data caching more efficient than
-0.086842	is sometimes more efficient than
-0.086842	are sometimes more efficient than
-0.290285	Sum1 slightly more efficient than
-0.078520	This is less efficient than
-0.078520	compiler is less efficient than
-0.078520	list is less efficient than
-0.078520	numbers is less efficient than
-0.078520	bitfield is less efficient than
-0.190221	can be less efficient than
-0.131311	as input less efficient than
-0.030474	have no other value than
-0.030474	produce no other value than
-0.063249	have any other value than
-0.822856	from the previous value than
-0.236782	the operands are variables than
-0.236754	it goes another way than
-0.057955	member function is faster than
-0.057955	template function is faster than
-0.182833	before. This is faster than
-0.283219	integers. It is faster than
-0.274581	floating point is faster than
-0.124853	of 2 is faster than
-0.124853	a file is faster than
-0.438159	a constant is faster than
-0.028012	software implementation is faster than
-0.124853	big blocks is faster than
-0.124853	example 15.1c is faster than
-0.124853	// (This is faster than
-0.124853	constant: Unsigned is faster than
-0.124853	example 14.21 is faster than
-0.126665	likely to be faster than
-0.184905	method may be faster than
-0.056395	integer operations are faster than
-0.056395	Linear arrays are faster than
-0.121221	C and C++ faster than
-0.121221	is 83 called faster than
-0.227395	this is often faster than
-0.101957	is many times faster than
-0.101957	to seven times faster than
-0.163590	are calculated much faster than
-0.121221	functions are calculated faster than
-0.244927	it will run faster than
-0.121221	the code execute faster than
-0.121221	run a little faster than
-0.121221	CPUs is increasing faster than
-0.121221	} // ipow faster than
-0.213863	the function is less than
-0.213863	or class is less than
-0.213863	a value is less than
-0.213863	The delay is less than
-0.213863	250 μs is less than
-0.064822	likely to be less than
-0.064822	guaranteed to be less than
-0.141047	in fact be less than
-0.181582	i is not less than
-0.181582	may run at less than
-0.181582	a million times less than
-0.181582	data files while less than
-0.181582	read or write less than
-0.181582	a relative difference less than
-0.002469	into the code rather than
-0.002469	floating point code rather than
-0.000616	at compile time rather than
-0.004951	in full use rather than
-0.001233	x in memory rather than
-0.001233	stored in memory rather than
-0.004951	a 64-bit integer rather than
-0.004951	to use float rather than
-0.004951	an existing object rather than
-0.004951	for one array rather than
-0.002469	aligned by 8 rather than
-0.002469	the constant 8 rather than
-0.004951	in a register rather than
-0.004951	a class template rather than
-0.000197	variables in registers rather than
-0.000049	transferred in registers rather than
-0.004951	of using pointers rather than
-0.004951	or cache access rather than
-0.004951	the operating system rather than
-0.004951	use 64 bits rather than
-0.004951	we get 0 rather than
-0.004951	only six instructions rather than
-0.004951	for present processors rather than
-0.004951	executed 10 times rather than
-0.004951	on the stack rather than
-0.004951	standard API calls rather than
-0.004951	in the container rather than
-0.004951	the flush-to-zero mode rather than
-0.004951	CPU clock cycles rather than
-0.004951	working with sets rather than
-0.004951	a software implementation rather than
-0.004951	a template parameter rather than
-0.004951	are integer expressions rather than
-0.004951	using static linking rather than
-0.004951	of using references rather than
-0.004951	program is loaded rather than
-0.004951	in one operation rather than
-0.004951	the result 100 rather than
-0.004951	specific processor models rather than
-0.004951	from the beginning rather than
-0.004951	in big blocks rather than
-0.004951	call to memcpy rather than
-0.004951	of each factor rather than
-0.004951	the execution units rather than
-0.004951	a single step rather than
-0.004951	needed in advance rather than
-0.004951	truncation towards zero, rather than
-0.004951	multiplication of xxn rather than
-0.004951	libraries and frameworks, rather than
-0.004951	the result -56 rather than
-0.004951	only calculated once, rather than
-0.004951	if(!a && !b) rather than
-0.004951	defines electrical connections rather than
-0.004951	calculated as (b*2.0)/3.0 rather than
-0.004951	are using unions rather than
-0.004951	on processor X?" rather than
-0.004951	code that matters rather than
-0.004951	the CPU supports, rather than
-0.004951	good development tools, rather than
-0.004951	is running at, rather than
-0.236419	rely on compiler optimization than
-0.236315	use on such systems than
-0.236164	in a separate file than
-0.236191	int uses more bits than
-0.236201	<< and | operations than
-0.236086	other cache control instructions than
-0.201797	access is more important than
-0.201797	development are more important than
-0.277443	has become less important than
-0.303281	priority to one thread than
-0.313635	containers for each thread than
-0.235934	memory and computing power than
-0.027454	efficient in 64-bit Linux than
-0.056762	faster in 64-bit Linux than
-0.634438	functions or vector classes than
-0.337355	higher for single precision than
-0.341649	to re-use a container than
-0.350454	is faster to calculate than
-0.538559	efficient in 64-bit mode than
-0.567323	in 64 bit mode than
-0.087170	b have other values than
-0.087170	operands have other values than
-0.019887	have no other values than
-0.351782	uses more clock cycles than
-0.311015	not allocate more space than
-0.311339	to optimize anything else than
-0.234560	is faster with signed than
-0.345909	one big memory block than
-0.341746	count down to zero than
-0.026322	may use more resources than
-0.026322	C++ take more resources than
-0.012961	take much more resources than
-0.012961	uses much more resources than
-0.026322	only slightly more resources than
-0.151220	takes more memory resources than
-0.266083	network and other resources than
-0.151220	have less computing resources than
-0.091516	instruction set is better than
-0.091516	64-bit version is better than
-0.297518	may actually be better than
-0.297523	reductions on integer expressions than
-0.223442	at reducing integer expressions than
-0.206511	function that is longer than
-0.206511	method should be longer than
-0.206511	in the matrix longer than
-0.487142	on the user interface than
-0.271929	measure are much higher than
-0.218270	count is usually higher than
-0.034082	a program is bigger than
-0.034082	the matrix is bigger than
-0.034082	size parameter is bigger than
-0.111501	library may be bigger than
-0.111501	arrays that are bigger than
-0.111501	is treated as bigger than
-0.111501	the innermost loop bigger than
-0.111501	even for arrays bigger than
-0.111501	a total offset bigger than
-0.111501	of 2. Objects bigger than
-0.233686	defined in other ways than
-0.288768	functions in other modules than
-0.233208	with execution units smaller than
-0.428624	for level-2 cache contentions than
-0.460504	is an array index than
-0.287551	is therefore more safe than
-0.231894	choosing the best algorithm than
-0.433316	with the same priority than
-0.231022	size has higher priority than
-0.231022	thread with lower priority than
-0.341606	a higher clock frequency than
-0.317071	be cached more efficiently than
-0.090863	computers with more RAM than
-0.090863	to allocate more RAM than
-0.826370	as a circular buffer than
-0.370319	a programmable logic device than
-0.303942	allocate more memory blocks than
-0.228787	makes dynamic_cast more time-consuming than
-0.041283	unit for other purposes than
-0.041283	card for other purposes than
-0.282711	efficiently with coarse-grained parallelism than
-0.227719	residual error is lower than
-0.227719	the data more random than
-0.136275	likely to be slower than
-0.048304	dynamic link libraries slower than
-0.048304	processor is much slower than
-0.048304	will always run slower than
-0.048304	likely to execute slower than
-0.048304	neither faster nor slower than
-0.280995	write is more expensive than
-0.300883	is often more reliable than
-0.079193	operand is more predictable than
-0.079193	comparisons are more predictable than
-0.315326	faster and more compact than
-0.224705	stored in binary form than
-0.221888	size that is larger than
-0.218218	is often easier said than
-0.218218	to be a bottleneck than
-0.271722	leaf function is simpler than
-0.211997	to find elsewhere. Faster than
-0.211826	or do other input/output than
-0.250825	a larger memory footprint than
-0.199567	has been accessed recently than
-0.199567	test, maintain and verify than
-0.199567	are higher for shared_ptr than
-0.164740	current array element. Rather than
-0.164740	whole polygon or bitmap than
-0.164740	easier to write 2.0/3.0 than
-0.164740	involving class objects (rather than
-0.164740	or function call (other than
-0.164740	very small loops (less than
-0.164740	the objects (memory pooling) than
-0.546971	describe some of the compiler
-0.350961	without help of the compiler
-0.349808	and known to the compiler
-0.341608	a hint and the compiler
-0.341608	See www.openmp.org and the compiler
-0.449848	are listed in the compiler
-0.064406	8 Optimizations in the compiler
-0.230635	therefore possible for the compiler
-0.230635	rarely possible for the compiler
-0.227657	is difficult for the compiler
-0.322645	available options for the compiler
-0.100184	it easier for the compiler
-0.495495	point is that the compiler
-0.879343	disadvantage is that the compiler
-0.323002	the code that the compiler
-0.521108	time so that the compiler
-0.515873	be sure that the compiler
-0.323002	arbitrary name that the compiler
-0.698248	generally assume that the compiler
-0.323002	seem illogical that the compiler
-0.447856	to check if the compiler
-0.346480	runtime here if the compiler
-0.326135	typically implemented by the compiler
-0.326135	is converted by the compiler
-0.414328	code generated by the compiler
-0.292271	comments generated by the compiler
-0.303624	cannot rely on the compiler
-0.303624	always rely on the compiler
-0.620571	as good as the compiler
-0.322610	as index then the compiler
-0.322610	same module then the compiler
-0.322610	than once then the compiler
-0.346407	no warning from the compiler
-0.348339	things. Looking at the compiler
-0.838802	possible to make the compiler
-0.413550	simple function because the compiler
-0.413550	less efficient because the compiler
-0.319096	optimizations possible because the compiler
-0.413550	is inefficient because the compiler
-0.329353	some systems. If the compiler
-0.329353	not efficient. If the compiler
-0.329353	to maintain. If the compiler
-0.259445	multiple threads, but the compiler
-0.259445	on BSD, but the compiler
-0.259445	page 103), but the compiler
-0.259445	or aliasing, but the compiler
-0.517515	many cases where the compiler
-0.422086	in situations where the compiler
-0.313299	of 2, so the compiler
-0.342511	integers simply makes the compiler
-0.345682	it comes before the compiler
-0.285213	across platforms. See the compiler
-0.285213	is obvious. See the compiler
-0.421060	optimization. For example, the compiler
-0.641645	the above example, the compiler
-1.087079	to make sure the compiler
-0.554331	In some cases the compiler
-0.323897	constant 5. But the compiler
-0.335812	to predict whether the compiler
-0.369085	checking how well the compiler
-0.247285	In many cases, the compiler
-0.247285	50 simple cases, the compiler
-0.182182	that it allows the compiler
-0.081607	"undefined". This allows the compiler
-0.081607	throw(); This allows the compiler
-0.155725	to know what the compiler
-0.034014	8.7 Checking what the compiler
-0.425576	appropriate to give the compiler
-0.260075	i++) List[i]++; Here, the compiler
-0.260075	int c1::*MemberPointer; Here, the compiler
-0.228015	the simple function, the compiler
-0.328745	function calls. Unfortunately, the compiler
-0.166968	that can prevent the compiler
-0.166968	This will prevent the compiler
-0.210886	and it prevents the compiler
-0.207490	thread. This prevents the compiler
-0.207490	volatile. This prevents the compiler
-0.149300	It also prevents the compiler
-0.149300	integer division prevents the compiler
-0.002471	code to tell the compiler
-0.002471	possible to tell the compiler
-0.002471	always to tell the compiler
-0.002471	declaration to tell the compiler
-0.002471	directive to tell the compiler
-0.002471	prototype to tell the compiler
-0.002471	forgot to tell the compiler
-0.002471	novector to tell the compiler
-0.020171	references then tell the compiler
-0.204778	This may enable the compiler
-0.204778	This will enable the compiler
-0.204778	instruction sets enable the compiler
-0.228015	This will allow the compiler
-0.302952	you cannot expect the compiler
-0.228015	The reason why the compiler
-0.302952	we can help the compiler
-0.228015	instruction sets. Likewise, the compiler
-0.228015	pointer or reference, the compiler
-0.228015	element in list, the compiler
-0.228015	the other hand, the compiler
-0.302952	always to specify the compiler
-0.313299	definition. This tells the compiler
-0.022760	file. This enables the compiler
-0.228015	73 Without optimization, the compiler
-0.369085	throw. In fact, the compiler
-0.228015	the code. Sometimes the compiler
-0.228015	page 78). Adding the compiler
-0.282971	this by invoking the compiler
-0.282971	& operator forces the compiler
-0.228015	In example 12.1a, the compiler
-0.228015	examples exist. Therefore the compiler
-0.228015	in example 12.1b, the compiler
-0.228015	a = ++b; the compiler
-0.356481	assembly output of a compiler
-0.352204	the code that a compiler
-0.843856	is to use a compiler
-0.545855	or by using a compiler
-0.549529	algebra. For example, a compiler
-0.330240	satisfied: 1. Use a compiler
-0.339986	inefficient, (4) get a compiler
-0.323333	block. This requires a compiler
-0.152491	You cannot expect a compiler
-0.519252	applications. The choice of compiler
-0.128334	8 2.5 Choice of compiler
-0.128334	compilers. 2.5 Choice of compiler
-0.048260	158 18 Overview of compiler
-0.048260	159 18 Overview of compiler
-0.314562	for assembly programmers and compiler
-0.331872	License license included in compiler
-0.352374	1; } } The compiler
-0.352374	Induction++; } } The compiler
-0.260298	+ 1; } The compiler
-0.260298	+ 3; } The compiler
-0.260298	+ 1.; } The compiler
-0.424977	for library functions. The compiler
-0.406798	CriticalInnerFunction is called. The compiler
-0.223172	optimize across modules The compiler
-0.223172	about. Function inlining The compiler
-0.562320	i by 2. The compiler
-0.297204	an anonymous object. The compiler
-0.307412	an induction variable. The compiler
-0.297204	at each access. The compiler
-0.223172	e + f; The compiler
-0.297204	encounter another problem. The compiler
-0.509884	set is enabled. The compiler
-0.223172	branch is executed. The compiler
-0.223172	most up-to-date solution. The compiler
-0.223172	Windows and Linux. The compiler
-0.277480	as an integer. The compiler
-0.386502	on automatic vectorization. The compiler
-0.544129	into multiple threads. The compiler
-0.277480	model used here. The compiler
-0.277480	iterations in one. The compiler
-0.223172	multiplication or division. The compiler
-0.277480	the necessary initialization. The compiler
-0.223172	a + 1.0f; The compiler
-0.386502	from another module. The compiler
-0.223172	function is big. The compiler
-0.362398	square(x) + 1.0f;} The compiler
-0.223172	result in x. The compiler
-0.223172	(a+1) / 4; The compiler
-0.223172	a[i] = i+1; The compiler
-0.223172	profiling methods: Instrumentation: The compiler
-0.223172	(1. / 1.2345); The compiler
-0.223172	set, e.g. /arch:SSE2. The compiler
-0.223172	(see page 84). The compiler
-0.223172	value of temp. The compiler
-0.223172	a; Plus2 (&a); The compiler
-0.223172	to return a+1;. The compiler
-0.223172	(see page 72). The compiler
-0.223172	2.0 / 3.0; The compiler
-0.237797	is, I guess, that compiler
-0.245647	Obstacles to optimization by compiler
-0.449905	convenient to rely on compiler
-0.236827	compilers. Wikipedia article on compiler
-0.292859	Optimizes very well. This compiler
-0.236715	compiler (parallel composer) This compiler
-0.231609	it is possible. A compiler
-0.231609	the same result. A compiler
-0.231609	be avoided. 37 A compiler
-0.231609	interpreter for Basic. A compiler
-0.231609	a polynomial. Scheduling A compiler
-0.453486	library with a different compiler
-0.723764	are using the same compiler
-0.477707	difficult to predict which compiler
-0.313933	of expressions, but no compiler
-0.335562	that belong to each compiler
-0.086102	disadvantage of the Intel compiler
-0.086102	behavior of the Intel compiler
-0.086102	bias of the Intel compiler
-0.134263	compiler and the Intel compiler
-0.029878	mechanism in the Intel compiler
-0.315092	Note that the Intel compiler
-0.134263	work on the Intel compiler
-0.193603	to use the Intel compiler
-0.134263	references: If the Intel compiler
-0.134263	my tests, the Intel compiler
-0.041598	CPU dispatching in Intel compiler
-0.058440	} } The Intel compiler
-0.058440	platform. Intel The Intel compiler
-0.058440	Itanium systems. The Intel compiler
-0.058440	or not. The Intel compiler
-0.058440	avoiding this. The Intel compiler
-0.058440	is required. The Intel compiler
-0.058440	page 122. The Intel compiler
-0.058440	for details). The Intel compiler
-0.266885	in tests on Intel compiler
-0.292418	Linux Intel compiler Intel compiler
-0.018064	Intel compiler Windows Intel compiler
-0.005274	Gnu compiler Linux Intel compiler
-0.485793	version of the C++ compiler
-0.327283	PathScale compilers. Intel C++ compiler
-0.278910	libraries. The Gnu C++ compiler
-0.278910	all platforms. PathScale C++ compiler
-0.278910	be tolerated. PGI C++ compiler
-0.352416	LLVM is a new compiler
-0.353249	point precision. The following compiler
-0.047133	similar to the Gnu compiler
-0.047133	according to the Gnu compiler
-0.047133	Windows and the Gnu compiler
-0.047133	15.1b and the Gnu compiler
-0.154555	included with the Gnu compiler
-0.100010	efficient than the Gnu compiler
-0.100010	and only the Gnu compiler
-0.100010	to replace the Gnu compiler
-0.100010	} Here, the Gnu compiler
-0.003938	CPU dispatching in Gnu compiler
-0.073246	AMD CPUs. The Gnu compiler
-0.073246	function calls. The Gnu compiler
-0.073246	32-bit version. The Gnu compiler
-0.073246	automatic vectorization. The Gnu compiler
-0.073246	namespace. 3. The Gnu compiler
-0.073246	and Mac. The Gnu compiler
-0.003938	MS compiler Windows Gnu compiler
-0.292088	directive for a Windows compiler
-0.349799	DLL with the best compiler
-0.328173	language because a good compiler
-0.173006	2; } A good compiler
-0.173006	one operation. A good compiler
-0.173006	simple index. A good compiler
-0.026482	assume that an optimizing compiler
-0.026482	volatile then an optimizing compiler
-0.026482	issue because an optimizing compiler
-0.026482	C1::f. But an optimizing compiler
-0.026482	most cases, an optimizing compiler
-0.033363	first program. An optimizing compiler
-0.033363	is used. An optimizing compiler
-0.033363	below. Devirtualization An optimizing compiler
-0.033363	char pointers). An optimizing compiler
-0.152290	performance. A good optimizing compiler
-0.349918	can expect a particular compiler
-0.234795	debug and maintain. Most compiler
-0.247820	} } The Microsoft compiler
-0.196896	Clang, Intel or Microsoft compiler
-0.196896	_MSC_VER // If Microsoft compiler
-0.196896	an explanation. (The Microsoft compiler
-0.234518	Watcom Another open source compiler
-0.234012	piece of code. Each compiler
-0.289643	the manual for your compiler
-0.233436	vectorization. 3. Use appropriate compiler
-0.230676	Clang, Intel or PathScale compiler
-0.304130	cases where the chosen compiler
-0.228041	other hand, a just-in-time compiler
-0.082563	Windows platforms. The Clang compiler
-0.082563	platforms. Clang The Clang compiler
-0.301222	included. Combining the Borland compiler
-0.224683	register keyword. The CodeGear compiler
-0.312371	and the Digital Mars compiler
-0.222129	timing, assembly language programming, compiler
-0.222186	inefficient way. The Codeplay compiler
-0.114567	not _WIN32 n.a. MS compiler
-0.012774	relevant to optimization MS compiler
-0.218327	Codeplay VectorC A commercial compiler
-0.218397	PathScale compilers. (The PGI compiler
-0.199797	as a stand alone compiler
-0.199797	This is a cheap compiler
-0.164952	a very user friendly compiler
-0.164952	popularity when a genuine compiler
-0.568967	&& a = a x
-0.455809	here. The address of x
-0.751973	the sign bit of x
-0.395023	Set sign bit of x
-0.381704	case the reading of x
-0.505690	delay the availability of x
-0.314722	// add 2 to x
-0.237785	// Function template for x
-0.356177	the result is that x
-0.355891	y, z; a = x
-0.102346	{ double x2 = x
-0.102346	1./2.09227E13}; float x2 = x
-0.336109	factorials don't depend on x
-0.588091	- - - - x
-0.902879	x - - - x
-0.576682	- x - - x
-0.282937	x x - - x
-0.255972	a+a+a+a=a*4 -(-a)=a - - x
-0.151213	= a x - x
-0.504682	- - x - x
-0.524079	x - x - x
-0.222089	- x x - x
-0.152319	x x x - x
-0.140855	(x) x x - x
-0.151213	Devirtualization ---x----- x - x
-0.151213	x-xxx---x x-xxx---x x - x
-0.096668	0 - n.a. - x
-0.292710	* 5; to int x
-0.236584	compilers will reduce int x
-0.611978	is more efficient than x
-0.573939	It is faster than x
-0.352884	- - - x x
-0.389744	x - - x x
-0.091737	a x - x x
-0.262691	- x - x x
-0.093100	x x - x x
-0.091737	---x----- x - x x
-0.091737	x-xxx---x x - x x
-0.151364	- - x x x
-0.177966	x - x x x
-0.058757	- x x x x
-0.117697	x x x x x
-0.184190	x- x x x x
-0.233429	x n.a. x x x
-0.007541	x x- x x x
-0.073108	x 74 x x x
-0.227715	a - n.a. x x
-0.227715	0 - n.a. x x
-0.252666	a*b=b*a x n.a. x x
-0.019919	x x x- x x
-0.195425	x x (x) x x
-0.095987	- x 74 x x
-1.031744	= a - n.a. x
-0.401556	- n.a. - n.a. x
-1.025090	= 0 - n.a. x
-0.532086	- n.a. x n.a. x
-0.224164	a+b=b+a, a*b=b*a x n.a. x
-1.133945	- - n.a. n.a. x
-0.353072	destructor for the object x
-0.330623	{ return x * x
-0.226994	2 return (2.5f * x
-0.226994	x - 8.0f) * x
-0.355805	(int x) { return x
-0.264533	(float x) { return x
-0.284481	int m) { return x
-0.236852	elements are there between x
-0.064191	for (x = 0; x
-0.341657	is used. For example, x
-0.341657	in efficiency. For example, x
-0.446530	in order to access x
-0.236331	in the former case x
-0.036115	Approximate exp(x) for small x
-0.344764	Use template to get x
-0.348091	the compiler to store x
-0.327053	1) y *= x; x
-0.231239	= (int)n - 2, x
-0.231239	// multiply // square x
-0.371735	this function can modify x
-0.314617	e, f, x, y; x
-0.554432	| -1 = -1 x
-0.001486	x x x x- x
-0.013557	- xx x x- x
-0.056968	d = x- x- x
-0.222268	// Example 15.1a. Calculate x
-0.385044	reductions: Common subexpression elimination x
-0.307487	for (x = 2.0; x
-0.011592	x x-- x x-- x
-0.011592	reductions: x-- x x-- x
-0.048350	vector algebra reductions: x-- x
-0.212102	Automatic vectorization Devirtualization ---x----- x
-0.102806	x x x (x) x
-0.102806	x- x- x (x) x
-0.199836	x--x----- --xx----- x-xxx---x x-xxx---x x
-0.199836	- - x 74 x
-0.199836	(vector) reductions: a+b=b+a, a*b=b*a x
-0.164988	xxxxxxxxx -- - xx x
-0.164988	data // constructor initializes x
-0.237328	actually quite powerful and may
-0.237328	integers is ambiguous and may
-0.254747	for initialized variables that may
-0.254747	for uninitialized variables that may
-0.336610	a few instructions that may
-0.291670	also other advantages that may
-0.235670	any other cleanup that may
-0.235670	error condition. Things that may
-0.304159	accurate, however, and it may
-0.304159	often fluctuating and it may
-0.541632	optimizations is that it may
-0.550399	time so that it may
-0.328492	so kludgy that it may
-0.253595	library functions then it may
-0.253595	are necessary then it may
-0.253595	a problem then it may
-0.253595	same resource then it may
-0.253595	If not, then it may
-0.253595	(see below) then it may
-0.253595	μs today, then it may
-0.253595	been identified, then it may
-0.253595	is obvious, then it may
-0.059821	time consuming because it may
-0.313996	accessed non-sequentially because it may
-0.279309	most cases, but it may
-0.364623	latter function, but it may
-0.279309	instruction set, but it may
-0.279309	considerable job, but it may
-0.045508	unit. For example, it may
-0.045508	modularity. For example, it may
-0.220354	the best optimization it may
-0.587845	in this case it may
-0.318767	function parameter. But it may
-0.220354	In large arrays, it may
-0.220354	logic allows it, it may
-0.343251	the function. The function may
-0.475756	that the same function may
-0.518258	of the critical function may
-0.313288	or the error code may
-0.406343	that the compiled code may
-0.319874	in RAM memory. This may
-0.228215	and back again. This may
-0.303190	in different modules. This may
-0.228215	for marketing reasons. This may
-0.283199	the function inline. This may
-0.228215	+ a; 72 This may
-0.228215	out of range. This may
-0.228215	it becomes full. This may
-0.228215	with two entries. This may
-0.228215	can be reduced. This may
-0.228215	See page 45. This may
-0.413521	hint and the compiler may
-0.450195	once then the compiler may
-0.469763	threads, but the compiler may
-0.413521	simple cases, the compiler may
-0.319072	or reference, the compiler may
-0.319072	other hand, the compiler may
-0.319072	In fact, the compiler may
-0.321195	For example, a compiler may
-0.302932	The choice of compiler may
-0.169176	} } The compiler may
-0.095627	1; } The compiler may
-0.095627	3; } The compiler may
-0.095627	1.; } The compiler may
-0.222090	each access. The compiler may
-0.222090	+ f; The compiler may
-0.222090	necessary initialization. The compiler may
-0.222090	+ 1.0f; The compiler may
-0.222090	+ 1.0f;} The compiler may
-0.222090	/ 4; The compiler may
-0.222090	= i+1; The compiler may
-0.222090	Plus2 (&a); The compiler may
-0.302932	polynomial. Scheduling A compiler may
-0.492269	program. An optimizing compiler may
-0.331831	optimizing features, and you may
-0.205168	this purpose, or you may
-0.195884	members are then you may
-0.265028	the code then you may
-0.195884	a program then you may
-0.195884	smart pointer then you may
-0.195884	profiler works then you may
-0.195884	data structure then you may
-0.195884	b[1], ... then you may
-0.195884	the code, then you may
-0.195884	one instance then you may
-0.195884	version changes then you may
-0.195884	cache efficiency, then you may
-0.195884	and y?" then you may
-0.285649	arrays automatically but you may
-0.379610	dispatching. For example, you may
-0.257132	In some cases you may
-0.257132	example, in Windows, you may
-0.205168	start to code, you may
-0.257132	In 64-bit systems, you may
-0.205168	a composite object, you may
-0.085513	cache size. Alternatively, you may
-0.085513	own stack. Alternatively, you may
-0.085513	function returns. Alternatively, you may
-0.085513	in Windows). Alternatively, you may
-0.205168	the latter case, you may
-0.205168	is executed. Furthermore, you may
-0.205168	language. In fact, you may
-0.205168	needed. Even better, you may
-0.205168	without jeopardizing safety, you may
-0.337810	member function because this may
-0.236079	and Mac systems, this may
-0.324792	structure. The extra time may
-0.211659	or clearing arrays It may
-0.475965	through a pointer. It may
-0.211659	the vector registers. It may
-0.211659	20 clock cycles. It may
-0.211659	in one vector. It may
-0.211659	of the branch. It may
-0.264456	and disk space. It may
-0.211659	some legacy software. It may
-0.211659	be true anyway. It may
-0.264456	compile time here. It may
-0.211659	the previous one. It may
-0.211659	or approximately so. It may
-0.211659	are poorly predictable. It may
-0.211659	but also safer. It may
-0.211659	in the profile. It may
-0.211659	to contained objects? It may
-0.211659	is too high. It may
-0.211659	they are unavoidable. It may
-0.426032	error. The allocated memory may
-0.322643	amount of RAM memory may
-0.512330	version of the program may
-0.512330	logic of the program may
-0.349009	on redesigning a program may
-0.342667	always true. The program may
-0.237268	names of inlined functions may
-0.355887	changing then the CPU may
-0.231108	oldest Pentium CPUs which may
-0.231108	at unpredictable intervals which may
-0.231108	object is moved, which may
-0.231108	for intermediate results, which may
-0.237230	sake of security, but may
-0.492898	for the level-1 cache may
-0.313740	vector operations An integer may
-0.721959	The CISC instruction set may
-0.674422	in the above example may
-0.441924	emphasized that the compilers may
-0.287709	Object1.Hello(), though future compilers may
-0.287709	code that current compilers may
-0.736040	The cache line size may
-0.428471	used. A smart pointer may
-0.442766	A user interface library may
-0.342666	is used, then there may
-0.302755	particularly problematic because there may
-0.227849	a parameter, so there may
-0.326904	most critical. However, there may
-0.220597	13.2 Model-specific dispatching There may
-0.220597	the end user. There may
-0.220597	accessed much faster. There may
-0.220597	to 2 Mbytes. There may
-0.220597	down to 36. There may
-0.220597	to using inheritance. There may
-0.293001	inefficient. An allocated array may
-0.334741	clock cycles that we may
-0.276935	not the case we may
-0.222691	In the future we may
-0.222691	to be available, we may
-0.222691	rules of algebra, we may
-0.292888	avoid it. Global variables may
-0.127152	an error. // You may
-0.170133	at a time. You may
-0.127152	of memory used. You may
-0.170133	is less efficient. You may
-0.127152	the guidelines below. You may
-0.127152	function is called. You may
-0.127152	the 'this' pointer. You may
-0.170133	use vector operations. You may
-0.127152	95 not needed. You may
-0.127152	the base classes. You may
-0.127152	for double precision. You may
-0.127152	a graceful way. You may
-0.127152	elements per vector. You may
-0.127152	elements to zero. You may
-0.127152	rarely needed anyway. You may
-0.127152	for the application. You may
-0.028470	different CPU cores. You may
-0.028470	multiple CPU cores. You may
-0.127152	from other modules. You may
-0.127152	the program itself. You may
-0.127152	is not expensive. You may
-0.127152	with a debugger. You may
-0.127152	7.35 page 52. You may
-0.127152	algorithm in question. You may
-0.127152	particular processor model. You may
-0.127152	are mutually incompatible. You may
-0.127152	systems available today. You may
-0.127152	element level 108 You may
-0.127152	-fwrapv or -fno-strict-overflow. You may
-0.127152	prevent cache contention. You may
-0.127152	bits (rarely 64). You may
-0.127152	are not used). You may
-0.127152	a bad dilemma. You may
-0.232757	examples in this table may
-0.232757	a hand- written table may
-0.423214	Type casting of pointers may
-0.231132	that typically use pointers may
-0.236260	storage, but other systems may
-0.312859	program starts. The user may
-0.236444	are needed, or they may
-0.347542	Windows MFC). This method may
-0.230704	the short vector method may
-0.312727	controlled. The network access may
-0.406355	Note that the system may
-0.078607	used. The operating system may
-0.078607	cache. The operating system may
-0.078607	databases. The operating system may
-0.236040	break a few times may
-0.337581	disadvantage of 64-bit Windows may
-0.342341	with many function calls may
-0.228958	7.14 Functions Function calls may
-0.235798	of error. The calculations may
-0.291944	Delays in program execution may
-0.096861	machine. The virtual processor may
-0.096861	important. A virtual processor may
-0.221718	A Pentium M processor may
-0.235627	templates in www.agner.org/optimize/cppexamples.zip. These may
-0.322636	the contrary, each thread may
-0.235555	branch pattern history, etc. may
-0.235585	code motion A calculation may
-0.764307	the most efficient solution may
-0.328803	object because the container may
-0.345134	very low repeat count may
-0.553728	allocation. Dynamic memory allocation may
-0.234585	from optimal. The branches may
-0.321250	point addition and multiplication may
-0.234601	117 A C++ implementation may
-0.234552	structure and class members may
-0.290185	branches. The following methods may
-1.378710	a pointer or reference may
-0.234190	For example, a programmer may
-0.234327	to be annoying. We may
-0.531300	The stack unwinding mechanism may
-0.415580	of floating point expressions may
-0.219806	often, but such expressions may
-0.309657	and the runtime framework may
-0.376504	the log on process may
-0.216071	members. A simple constructor may
-0.409519	performance. A copy constructor may
-0.233038	starts up. Some modules may
-0.233248	the advice given here may
-0.308628	Therefore, the data section may
-0.308432	of x The syntax may
-0.288158	execute then the profiler may
-0.287847	PC's in a network may
-0.623284	that the clock frequency may
-0.397107	PCs. The clock frequency may
-0.230265	these time consuming updates may
-0.229626	the const int declaration may
-0.396027	elements. A hash map may
-0.228777	unaligned reads and writes may
-0.307481	deallocated. The program logic may
-0.228777	interface calls. The usability may
-0.326406	A long dependency chain may
-0.226450	sequentially in memory. They may
-0.311179	fragmented. This garbage collection may
-0.226272	users have. The developers may
-0.280994	program. The time measurements may
-0.304024	code and just-in-time compilation may
-0.310704	C++. Critical device drivers may
-0.221923	a template parameter. Templates may
-0.218000	reveals that similar solutions may
-0.218000	the function returns. alloca may
-0.218000	best under this unit-test may
-0.388823	case. A binary tree may
-0.211739	Many of the advices may
-0.211739	done only once. One may
-0.211739	integers. 7.25 Bitfields Bitfields may
-0.211940	= b * 2.5 may
-0.264547	are not computationally intensive may
-0.199483	calls exit. Calling exit may
-0.199483	i++) { // Overflow may
-0.250731	code. A test setup may
-0.164663	it to the tolerance may
-0.164663	disks and USB sticks may
-0.164663	systems. A software developer may
-0.164663	but the first dimension may
-0.164663	EXCLUSIVE OR operator (^) may
-0.234512	pointer points to and you
-0.337515	a critical function and you
-0.349643	floating point code and you
-0.342470	several different functions and you
-0.403119	separate function library and you
-0.290352	are less efficient and you
-0.321218	with non-sequential access and you
-0.234512	structured exception handling and you
-0.290352	advanced optimizing features, and you
-0.234512	is fast anyway and you
-0.234512	count is odd and you
-0.234512	the & operator; and you
-0.234512	not always sequential, and you
-0.353161	made containers is that you
-0.542088	However, the code that you
-0.914023	tell the compiler that you
-0.284834	particularly slow instruction that you
-0.371358	the instruction set that you
-0.439343	critical code so that you
-0.339725	function call so that you
-0.284834	number of arrays that you
-0.229656	time, any processor that you
-0.206440	option. This requires that you
-0.206440	This method requires that you
-0.229656	various optimization options that you
-0.325749	of speed. Assume that you
-0.284834	chains. Another thing that you
-0.229656	series of statements that you
-0.099821	the clock counts that you
-0.099821	"best case" counts that you
-0.229656	requires, of course, that you
-0.315297	and then think that you
-0.229656	it is unrealistic that you
-0.229656	difference, let's say that you
-0.237740	instruction set. Neither can you
-0.237531	for this purpose, or you
-0.269502	same thing and if you
-0.388551	library is that if you
-0.269502	the same as if you
-0.325125	such methods only if you
-0.448325	outside the loop if you
-0.044798	it, for example if you
-0.044798	parts, for example if you
-0.216124	efficient integer size if you
-0.216124	a lookup table if you
-0.530899	conditions. For example, if you
-0.216124	dividend to unsigned if you
-0.216124	can be important if you
-0.269502	several different CPUs if you
-0.269502	are not necessary if you
-0.216124	source annotation option if you
-0.216124	precision is good if you
-0.216124	use single precision if you
-0.216124	the memset line if you
-0.216124	skip this section if you
-0.216124	resources than C if you
-0.269502	make variables global if you
-0.352716	also be vectorized if you
-0.216124	cache is organized if you
-0.216124	of some help if you
-0.216124	the following explanation if you
-0.216124	such small devices if you
-0.216124	the database anyway if you
-0.216124	your programming questions if you
-0.216124	no pointer aliasing" if you
-0.216124	of memory leaks if you
-0.216124	must be adjusted if you
-0.216124	metaprogramming. Don't panic if you
-0.539118	require that the code you
-0.826693	the piece of code you
-0.205168	program as long as you
-0.205168	calculations as long as you
-0.205168	integers, as long as you
-0.288837	5). As soon as you
-0.233179	complicated and clumsy, as you
-0.233179	serious legal issue, as you
-0.554836	options for the compiler you
-0.748030	generated by the compiler you
-0.935113	less than the time you
-0.348724	to maintain. The time you
-0.333069	performance: The first time you
-0.282635	without position-independent code when you
-0.227718	thing to do when you
-0.368675	program, for example when you
-0.227718	parameter comes first when you
-0.282635	faster than signed when you
-0.282635	of the counters when you
-0.227718	becomes more readable when you
-0.227718	and array indices when you
-0.257568	its members are then you
-0.144253	in the code then you
-0.144253	piece of code then you
-0.239529	of a program then you
-0.266877	avoid virtual functions then you
-0.189515	a big loop then you
-0.189515	than the cache then you
-0.239529	their smart pointer then you
-0.189515	deleting the object then you
-0.189515	sequence of calculations then you
-0.189515	a profiler works then you
-0.189515	other data structure then you
-0.189515	a[1], b[1], ... then you
-0.189515	for your application then you
-0.189515	using exception handling then you
-0.189515	the preceding addition then you
-0.189515	the 64-bit vectors then you
-0.189515	the CPU supports then you
-0.189515	of the code, then you
-0.189515	only one instance then you
-0.189515	or specific models then you
-0.189515	particular instruction set, then you
-0.189515	additions are independent then you
-0.189515	is an integer, then you
-0.189515	the version changes then you
-0.316670	factor. If not, then you
-0.189515	e.g. four numbers, then you
-0.189515	improve cache efficiency, then you
-0.189515	template parameters differ then you
-0.189515	an acceptable limit, then you
-0.189515	a particular meaning, then you
-0.189515	hyperthreading. If so, then you
-0.189515	the same algorithm, then you
-0.189515	x and y?" then you
-0.353644	before running a program you
-0.289492	the Boolean operands because you
-0.233755	code is fastest because you
-0.233755	practice, of course, because you
-0.237307	at runtime, if only you
-0.187571	another exception. 64 If you
-0.187852	floating point code. If you
-0.187852	for application-specific code. If you
-0.255297	than static memory. If you
-0.237350	in 64-bit systems. If you
-0.237350	are equally efficient. If you
-0.358468	corresponding instruction set. If you
-0.237350	floating point library. If you
-0.187571	40 clock cycles. If you
-0.237350	a different thread. If you
-0.187571	d = u; If you
-0.187571	big endian storage. If you
-0.187571	on page 16. If you
-0.187571	too many branches. If you
-0.187571	library at www.agner.org/optimize/asmlib.zip. If you
-0.187571	non-object oriented programs. If you
-0.083737	more reliable results. If you
-0.083737	and reproducible results. If you
-0.187571	recover from errors. If you
-0.187571	for these methods. If you
-0.187571	of a macro. If you
-0.187571	3"); or __debugbreak();. If you
-0.187571	and cryptography (www.intel.com). If you
-0.187571	a natural ordering? If you
-0.187571	x86 systems). 42 If you
-0.187571	a logical sequence. If you
-0.187571	IntegerPower<10>(x); } 152 If you
-0.286405	static arrays automatically but you
-0.231038	protected operating system, but you
-0.231038	of this manual, but you
-0.231038	of known type, but you
-0.313874	output. On most compilers you
-0.230189	of the function where you
-0.230189	the general case where you
-0.230189	on the Internet where you
-0.229403	allowed in C++ so you
-0.229403	in the code, so you
-0.229403	is 32 bits, so you
-0.236700	the optimal algorithm before you
-0.179545	two decimals, for example, you
-0.179545	footprint. If, for example, you
-0.179545	example 12.3a, for example, you
-0.612274	CPU dispatching. For example, you
-0.334476	elements are. For example, you
-0.319672	look like and how you
-0.338902	page 39 shows how you
-0.236343	On big endian systems you
-0.949448	if you are sure you
-0.225342	you are not sure you
-0.451754	possible, and make sure you
-0.236362	same effect. Which method you
-0.502970	divisor. In this case you
-0.497541	However, in most cases you
-0.550647	mind. In some cases you
-0.235915	in one vector, while you
-0.235913	user input. (In Windows you
-0.313132	dispatching: 1. How much you
-0.235523	the class. Which solution you
-0.366708	time regardless of whether you
-0.226295	makes no difference whether you
-0.235016	simple standard operations. All you
-0.235016	the fastest first. However, you
-0.234858	this kind of problems you
-0.162741	not do so unless you
-0.162741	&& to & unless you
-0.017306	on non-Intel CPUs unless you
-0.209598	the flush-to-zero mode unless you
-0.162741	definitely be avoided unless you
-0.162741	Mac OS X, unless you
-0.162741	than as b*(2.0/3.0) unless you
-0.503585	etc. In most cases, you
-0.223815	calculations. In such cases, you
-0.180240	higher instruction set. Therefore, you
-0.180240	reference to it. Therefore, you
-0.304286	very time consuming. Therefore, you
-0.180240	throws an exception. Therefore, you
-0.180240	with different strides. Therefore, you
-0.180240	scope or namespaces. Therefore, you
-0.232478	the code that allows you
-0.232478	a container that allows you
-0.263780	The Intel compiler allows you
-0.208987	the compiler does what you
-0.280433	sure you know what you
-0.208987	to measure exactly what you
-0.289649	header file will give you
-0.095930	indication of which optimizations you
-0.095930	do and which optimizations you
-0.201742	1.0) { ... Here, you
-0.201742	comes from testing. Here, you
-0.201742	in my blog. Here, you
-0.268264	are lots of things you
-0.215029	There are various things you
-0.213605	For example, in Windows, you
-0.213605	are short. In Windows, you
-0.232368	you start to code, you
-0.231624	than references are: When you
-0.231737	will stay on until you
-0.203005	certain instructions that allow you
-0.273389	Many 32-bit systems allow you
-0.203088	In a C++ program, you
-0.203088	point in your program, you
-0.284815	generated by the compiler, you
-0.229785	bad CPU dispatching. Obviously, you
-0.195701	...)) { ... Here you
-0.195701	a suboptimal way. Here you
-0.190695	step. In most systems, you
-0.318253	efficient. In 64-bit systems, you
-0.226459	returning a composite object, you
-0.226320	a is false. Likewise, you
-0.078549	support intrinsic functions. Alternatively, you
-0.078549	the cache size. Alternatively, you
-0.078549	its own stack. Alternatively, you
-0.078549	the function returns. Alternatively, you
-0.078549	IsProcessorFeaturePresent in Windows). Alternatively, you
-0.122513	the loop. In general, you
-0.122513	equally fast. In general, you
-0.122513	32-bit counterparts. In general, you
-0.360827	In the latter case, you
-0.074656	a long vector library, you
-0.074656	a short vector library, you
-0.218134	program is executed. Furthermore, you
-0.218134	mathematical functions that 150 you
-0.160743	parameter. In other words, you
-0.160743	for. In other words, you
-0.218253	with profiling support. Then you
-0.211872	On the smallest devices, you
-0.264696	advantage of out-of-order execution, you
-0.283835	divisions (Division is slow, you
-0.122429	constant propagation, etc. Whether you
-0.122429	Sum2 and Sum3. Whether you
-0.294045	assembly language. In fact, you
-0.403191	structures. On the contrary, you
-0.122429	find hot spots Before you
-0.122429	complicated mathematical tasks. Before you
-0.211872	~ for NOT. Instead, you
-0.199612	using the | operator; you
-0.199612	for a specific purpose, you
-0.199612	tested it. The insight you
-0.199612	not needed. Even better, you
-0.164782	explicitly. In example 8.21, you
-0.164782	speed without jeopardizing safety, you
-0.164782	start to optimize anything, you
-0.164782	systems". For this reason, you
-0.164782	the following way. First you
-0.236931	* const Greek[4] = {
-0.236931	const float coef[16] = {
-0.237306	Example 7.41a class vector {
-0.035414	i < 100; i++) {
-0.052354	i < size; i++) {
-0.105626	i < n; i++) {
-0.068330	i < 256; i++) {
-0.068330	i < 1000; i++) {
-0.068330	i < 20; i++) {
-0.247679	i < rows; i++) {
-0.007976	i < NumberOfTests; i++) {
-0.068330	i < arraysize; i++) {
-0.068330	i < list.Size(); i++) {
-0.319511	if else if else {
-0.003748	} } } else {
-0.003748	DTRUE; } } else {
-0.007529	= 0; } else {
-0.007529	= b; } else {
-0.002495	= 1; } else {
-0.001246	+ 1; } else {
-0.007529	Table lookup } else {
-0.001870	* 2; } else {
-0.007529	+ 1.; } else {
-0.007529	is nonzero } else {
-0.007529	range"; 134 } else {
-0.007529	of range"; } else {
-0.007529	{ FuncA(i); } else {
-0.003748	{ F1(a); } else {
-0.003748	a[1000]; F1(a); } else {
-0.007529	= &CriticalFunction_SSE2; } else {
-0.007529	1; 69 } else {
-0.046510	} } 34 else {
-0.046510	sin(x); } 68 else {
-0.001695	lrintf (float const x) {
-0.000423	lrint (double const x) {
-0.000035	__m128i const & x) {
-0.000141	(Vec4f const & x) {
-0.000141	add_elements(__m128 const & x) {
-0.002546	float SomeFunction (int x) {
-0.002546	int MultiplyBy (int x) {
-0.000188	float parabola (float x) {
-0.000079	static double p(double x) {
-0.000317	} double xpow10(double x) {
-0.000159	10 double xpow10(double x) {
-0.000317	unrolled double xpow10(double x) {
-0.005106	double IntegerPower (double x) {
-0.001271	16 float Exp(float x) {
-0.001271	series float Exp(float x) {
-0.005106	module1.cpp int Func1(int x) {
-0.005106	; double Func2(double x) {
-0.105815	F3(bool y) { union {
-0.105815	// Example 14.28 union {
-0.105815	// Example 14.23b union {
-0.105815	// Example 14.26 union {
-0.105815	// Example 14.27 union {
-0.105815	// Example 14.23 union {
-0.105815	// Example 7.39 union {
-0.105815	// Example 14.29 union {
-0.105815	// Example 14.24 union {
-0.105815	// Example 14.25 union {
-0.155993	T const & b) {
-0.002053	(int a, bool b) {
-0.110696	12.2 __declspec(align(16)) struct S1 {
-0.110696	Example 14.9 struct S1 {
-0.110696	Example 7.35b struct S1 {
-0.110696	Example 7.35a struct S1 {
-0.231561	union Bitfield { struct {
-0.061298	if (u.i[1] < 0) {
-0.097969	if (n > 0) {
-0.036580	% 2 == 0) {
-0.036580	% 128 == 0) {
-0.036580	if (a == 0) {
-0.083298	if (b == 0) {
-0.046228	if (a != 0) {
-0.046228	while (n != 0) {
-0.046228	if (b != 0) {
-0.227819	void F0() { try {
-0.018000	a[], int * p) {
-0.001472	LoadVector(void const * p) {
-0.004431	LoadVectorA(void const * p) {
-0.008907	Plus2 (int * p) {
-0.008907	FuncA (int * p) {
-0.000033	bb[], short int cc[]) {
-0.124990	> v.i * 2) {
-0.012239	100; i += 2) {
-0.012239	size; i += 2) {
-0.012239	20; i += 2) {
-0.000175	b) { if (b) {
-0.002806	3; } if (b) {
-0.000700	bool b; if (b) {
-0.314732	Example 7.44 class C1 {
-0.547027	CriticalFunction_Dispatch(int parm1, int parm2) {
-0.111069	((unsigned int)n < 4) {
-0.111069	100; i += 4) {
-0.012427	if (level >= 4) {
-0.278857	Example 7.28 class c1 {
-0.001526	}; void test () {
-0.003057	a[c][r]); void test () {
-0.028317	8.25 void Func () {
-0.028317	14.1c void CriticalInnerFunction () {
-0.001001	256; i += 8) {
-0.014040	a[], int & r) {
-0.059106	FuncB (int & r) {
-0.000536	r < SIZE; r++) {
-0.004305	c < SIZE; c++) {
-0.004305	c < r; c++) {
-0.035694	x, unsigned int n) {
-0.002865	int factorial (int n) {
-0.011575	void SomeFunction (int n) {
-0.016015	x < 100; x++) {
-0.218178	int N> class powN {
-0.148521	Example 7.40b union Bitfield {
-0.148521	Example 7.40a struct Bitfield {
-0.218035	|| i >= size) {
-0.002147	public: virtual void Disp() {
-0.002147	{ public: void Disp() {
-0.088400	virtual functions class CHello {
-0.020471	C1 : public CHello {
-0.020471	C2 : public CHello {
-0.067942	using & enum Weekdays {
-0.067942	multiple conditions enum Weekdays {
-0.264586	while (seconds < 5) {
-0.013540	if (level >= 11) {
-0.027515	parm2); } int main() {
-0.027515	&CriticalFunction_386; } int main() {
-0.122375	8.19. Devirtualization class C0 {
-0.122375	C1 : public C0 {
-0.211774	<intrin.h> long long ReadTSC() {
-0.199517	256; i += 16) {
-0.074586	(vector const & a) {
-0.074586	float square (float a) {
-0.008652	CChild1 : public CParent<CChild1> {
-0.074586	grandparent class: class CGrandParent {
-0.074586	CParent : public CGrandParent {
-0.017480	x[]); void F3(bool y) {
-0.017480	9.2b void F3(bool y) {
-0.330128	("CriticalFunction"); typeof(CriticalFunction) * CriticalFunctionDispatch(void) {
-0.250769	~C1(); }; void F1() {
-0.074586	} module2.cpp int Func2() {
-0.074586	} } void Func2() {
-0.008652	if (u.i & 0x7FFFFFFF) {
-0.035694	{ public: void Hello() {
-0.035694	void Disp(); void Hello() {
-0.035694	y) { if (y) {
-0.035694	b[1000]; }; if (y) {
-0.035694	r1; c1 += TILESIZE) {
-0.035694	SIZE; r1 += TILESIZE) {
-0.074586	c2 < c1+TILESIZE; c2++) {
-0.074586	c2 < r2; c2++) {
-0.008652	r2 < r1+TILESIZE; r2++) {
-0.017480	96 void transpose(double a[SIZE][SIZE]) {
-0.017480	9.5b void transpose(double a[SIZE][SIZE]) {
-0.199517	int N> class SafeArray {
-0.008652	TransposeCopy(double a[SIZE][SIZE], double b[SIZE][SIZE]) {
-0.199517	public B1, public B2 {
-0.199517	|| Day == Friday) {
-0.164694	((unsigned int)i < 10) {
-0.164694	D : public B1 {
-0.164694	double const & source) {
-0.164694	of the array i) {
-0.164694	f(); }; void g() {
-0.164694	... } void F0() {
-0.164694	true. template<> class powN<true,0> {
-0.164694	if (absvalue > largest_abs) {
-0.164694	<int N> class powN<true,N> {
-0.164694	int)i >= (unsigned int)size) {
-0.164694	:1;//signbit }; struct Sdouble {
-0.164694	temp < &list[100]; temp++) {
-0.164694	Example 7.36 class S2 {
-0.164694	Example 7.37 class S3 {
-0.164694	this block: 62 __try {
-0.164694	Example 8.10a if (true) {
-0.164694	CChild2 : public CParent<CChild2> {
-0.164694	int n; switch (n) {
-0.164694	&& list[i] > 1.0) {
-0.164694	(int x, int m) {
-0.164694	INVALID_HANDLE_VALUE && WriteFile(handle, ...)) {
-0.164694	? EXCEPTION_EXECUTE_HANDLER : EXCEPTION_CONTINUE_SEARCH) {
-0.164694	&& i <= max) {
-0.164694	(int a, int x[]) {
-0.164694	:1;//signbit }; struct Slongdouble {
-0.164694	n <= 16; n++) {
-0.164694	| Wednesday | Friday)) {
-0.164694	as follows: struct Sfloat {
-0.164694	EXCEPTION_FLT_OVERFLOW 0xC0000091L void MathLoop() {
-0.164694	sizeof(a)); } int Size() {
-0.164694	if (i >= N) {
-0.164694	x(0) {}; void xplus2() {
-0.164694	while (i < arraysize) {
-0.164694	d; }; void Func() {
-0.164694	if (u.i > v.i) {
-0.164694	recursion template<> class powN<true,1> {
-0.164694	F1(); } catch (...) {
-0.164694	((unsigned int)n < 13) {
-0.164694	(unsigned int)(max - min)) {
-0.164694	T a[N]; public: SafeArray() {
-0.164694	int * __restrict bb) {
-0.164694	another thread void DelayFiveSeconds() {
-0.545234	reasonable solution is to have
-0.564036	much more efficient to have
-0.710990	they are sure to have
-1.326236	it is important to have
-0.643496	is not necessary to have
-0.292099	is not good to have
-0.433086	1 is certain to have
-0.292099	container are allowed to have
-0.323094	may be convenient to have
-0.535155	it is sufficient to have
-0.537837	bit scan instruction and have
-0.466729	unroll the loop and have
-0.409867	containing the functions that have
-0.409867	implement the functions that have
-0.326130	accessible from compilers that have
-0.233273	for saving registers that have
-0.309209	in all systems that have
-0.341108	on some processors that have
-0.170006	that all operators that have
-0.170006	1, but operators that have
-0.170006	use in programs that have
-0.170006	are making programs that have
-0.233273	with many labels that have
-0.233273	scan instruction. Programmers that have
-1.155887	so that it can have
-0.353983	be cached. This can have
-0.534707	independent then you can have
-0.341818	bits, so you can have
-0.311914	These virtual processors can have
-1.466009	parts of the code have
-0.480926	and it will not have
-0.439986	microprocessors that do not have
-0.364480	that we do not have
-0.279191	32-bit systems do not have
-0.279191	bigger vectors do not have
-0.279191	STL containers do not have
-0.462582	This compiler does not have
-0.328197	The programmer does not have
-0.761225	generated by the compiler have
-0.346710	some cases you may have
-0.314152	true. The program may have
-0.635787	CPU cores. You may have
-0.228715	but other systems may have
-0.713426	The operating system may have
-0.399099	A virtual processor may have
-0.228715	pattern history, etc. may have
-0.283767	floating point expressions may have
-0.228715	under this unit-test may have
-0.407528	call so that you have
-0.314244	speed. Assume that you have
-0.325260	be important if you have
-0.325260	single precision if you have
-0.311822	As soon as you have
-0.316409	of calculations then you have
-0.316409	64-bit vectors then you have
-0.316409	four numbers, then you have
-0.218305	big endian systems you have
-0.218305	In this case you have
-0.218305	standard operations. All you have
-0.265893	flush-to-zero mode unless you have
-0.265893	be avoided unless you have
-0.355706	and which optimizations you have
-0.271968	suboptimal way. Here you have
-0.448968	loop. In general, you have
-0.218305	of out-of-order execution, you have
-0.218305	to optimize anything, you have
-0.229962	static memory and will have
-0.473013	addition then you will have
-0.315670	by four, we will have
-0.285182	the end user will have
-0.229962	Such a processor will have
-0.229962	that the loader will have
-0.329100	vector data. The data have
-0.235670	vectors RGB image data have
-1.213373	parts of the program have
-0.276722	64-bit Windows if functions have
-0.222503	The intrinsic vector functions have
-0.324133	Many Intel library functions have
-0.334617	are: Non-static member functions have
-0.296411	beware that these functions have
-0.222503	The CPU- specific functions have
-0.222503	stack unwinding. All functions have
-0.296411	C style string functions have
-0.222503	each case. Inlined functions have
-0.331460	while you can only have
-0.308583	Your measurement code should have
-0.232748	the memory block should have
-0.699594	The CPU dispatcher should have
-0.432497	other languages that do have
-0.312175	applications. But we do have
-0.385702	overhead while other compilers have
-0.302938	Intel compiler Intel compilers have
-0.297942	options All C++ compilers have
-0.387410	efficient. Most C++ compilers have
-0.149369	program optimization Some compilers have
-0.068294	64-bit systems. Some compilers have
-0.068294	operating systems. Some compilers have
-0.149369	all compilers. Some compilers have
-0.149369	Optimization directives Some compilers have
-0.210966	the compiler. Some compilers have
-0.149369	optimal order. Some compilers have
-0.149369	different places). Some compilers have
-0.327537	the loop. Most compilers have
-0.205952	be slower. Many compilers have
-0.313787	efficiency and code size have
-0.293324	libraries published by Intel have
-0.416798	if a and b have
-0.540408	the vector class library have
-0.227416	constant. The compilers also have
-0.046680	for. Some systems also have
-0.046680	card. Some systems also have
-0.227416	called by F1 also have
-0.062467	needed before all objects have
-0.030111	only after all objects have
-0.030111	needed after all objects have
-0.097429	unique key. Do objects have
-0.097429	hash map. Do objects have
-0.334785	to multithreading that we have
-0.257516	1000 times then we have
-0.257516	= 10000, then we have
-0.222740	list[j].c; } Here, we have
-0.222740	Using hexadecimal numbers, we have
-0.292953	or __declspec(thread). Such variables have
-0.217854	} 111 } You have
-0.217854	for intrinsic functions You have
-0.217854	to error handling. You have
-0.217854	of this manual. You have
-0.217854	on page 72. You have
-0.217854	the optimization job. You have
-0.022779	be used if elements have
-0.303235	array after all elements have
-0.231659	new vector size often have
-0.231659	error that hackers often have
-0.582238	compilers and function libraries have
-0.318374	that most function libraries have
-0.221900	though not all libraries have
-0.221900	www.agner.org/optimize/#vectorclass All these libraries have
-0.236397	register stack. These registers have
-0.046408	efficient. 64 bit systems have
-0.046408	-fno-pic). 64 bit systems have
-0.314236	intensive applications. Some systems have
-0.236483	things only after they have
-0.306340	software. It may even have
-0.230863	devices, you don't even have
-0.236109	possible. The AVX instructions have
-0.236219	another. Therefore, micro- processors have
-0.149838	detection function that I have
-0.108729	but no compiler I have
-0.000917	of the compilers I have
-0.001376	all the compilers I have
-0.013963	71 The compilers I have
-0.013963	of different compilers I have
-0.108729	of the examples I have
-0.050971	| b; Here, I have
-0.050971	& 1]; Here, I have
-0.108729	function is called. I have
-0.108729	lot in performance. I have
-0.108729	the next element. I have
-0.108729	and initialized arrays. I have
-0.108729	and model number. I have
-0.108729	destructors to call. I have
-0.108729	a new one. I have
-0.108729	the reductions manually. I have
-0.108729	my own research, I have
-0.108729	and maintenance easier. I have
-0.108729	is particularly tricky. I have
-0.108729	In this chapter, I have
-0.291451	all newer Intel CPUs have
-0.209983	16 bytes. Some CPUs have
-0.209983	monitor counters Many CPUs have
-0.262563	12. Most modern CPUs have
-0.209983	called accumulators. Current CPUs have
-0.329702	to optimization, it does have
-0.276744	difficult. The functions must have
-0.222523	The container class must have
-0.222523	mouse. This task must have
-0.291912	the runtime address calculations have
-0.344289	or if different versions have
-0.097059	so that it doesn't have
-0.385153	code because it doesn't have
-0.187377	because the compiler doesn't have
-0.187377	If the compiler doesn't have
-0.187377	But the compiler doesn't have
-0.236287	modules The compiler doesn't have
-0.221430	p. 28) The threads have
-0.275506	if no other threads have
-0.221430	end when all threads have
-0.426156	declared in the thread have
-0.312059	Intel compiler for Linux have
-0.225881	the time you would have
-0.225881	number then we would have
-0.235169	then all five values have
-0.234888	source and destination both have
-0.211967	Strings Text strings typically have
-0.211967	that such devices typically have
-0.211967	swapping. Software developers typically have
-0.347258	decomposition, we should preferably have
-0.452789	calculations should therefore preferably have
-0.298856	and SSE2 instruction sets have
-0.298856	with CISC instruction sets have
-0.083886	is that you don't have
-0.083886	so that you don't have
-0.187950	strides. Therefore, you don't have
-0.027548	so that we don't have
-0.180193	cache so we don't have
-0.164943	people. I simply don't have
-0.234457	operator These different methods have
-0.234108	in small embedded applications have
-0.310386	well, but the examples have
-0.219233	if possible. Smaller microprocessors have
-0.219233	vector operations Today's microprocessors have
-0.339744	certainty that the operands have
-0.410901	results if the operands have
-0.199730	of expressions where operands have
-0.232708	implementations. However, these languages have
-0.231996	such a framework sometimes have
-0.231961	vector processing capabilities still have
-0.231049	Some software development models have
-0.229727	that the variables might have
-0.199741	C++ constructs Most programmers have
-0.199741	to program. Many programmers have
-0.426663	used above the diagonal have
-0.228930	The constant N1 could have
-0.282671	check if the inputs have
-0.605431	in the x86 family have
-0.227812	the many people who have
-0.321865	a thousand cache misses have
-0.184524	publicly available information. They have
-0.184524	32-bit and 64-bit. They have
-0.132662	development, and that computers have
-0.132662	This is because computers have
-0.132662	why all modern computers have
-0.417588	once the hot spots have
-0.271762	that some development tools have
-0.218122	costly because all caches have
-0.211860	are that software projects have
-0.015509	the software. Smaller microcontrollers have
-0.015509	optimize caching. Smaller microcontrollers have
-0.015509	small microcontrollers: Smaller microcontrollers have
-0.199601	be shared. You can't have
-0.199601	may consider whether others have
-0.199601	Intel-based Mac OS, etc.) have
-0.164771	IDE's (Integrated Development Environments) have
-0.164771	wrong branch. Microprocessor designers have
-0.164771	Standard C++ imple- mentations have
-0.164771	strlen function in isolation have
-0.522557	with the performance of this
-0.511954	polynomial The calculation of this
-0.500046	iterations. The advantage of this
-0.409622	to take advantage of this
-0.415003	can take advantage of this
-0.290181	www.agner.org/optimize/asmlib.zip. The name of this
-0.524519	if the cost of this
-0.516465	See the end of this
-0.480420	exception. The costs of this
-0.543350	for a discussion of this
-0.742835	for further discussion of this
-0.337895	particularly slow implementations of this
-1.010645	for an explanation of this
-0.483868	compiler takes care of this
-0.452852	y. The purpose of this
-0.039194	beyond the scope of this
-0.341849	we add a to this
-0.210962	A simple solution to this
-0.210962	The standard solution to this
-0.171593	the library. Add to this
-0.171593	CPU dispatching. Add to this
-0.236030	machines? Possible solutions to this
-0.031461	in an appendix to this
-0.031461	as an appendix to this
-0.065380	classes. An appendix to this
-0.236030	1.23456. The conclusion to this
-0.572677	it is used and this
-0.323736	an unsigned integer and this
-0.292696	to an integer, and this
-0.236572	element is accessed, and this
-0.236572	the value infinity, and this
-0.324387	than 15.1b, and in this
-0.531050	dispatching. The code in this
-0.338440	} The data in this
-0.486847	unrolling the loop in this
-0.322513	general case, but in this
-0.230351	i to float in this
-0.285623	any other address in this
-0.230351	distribute function libraries in this
-0.230351	by element 0 in this
-0.230351	and 32-bit Windows in this
-0.396998	An efficient solution in this
-0.285623	to exception handling in this
-0.305730	documented. The examples in this
-0.324387	map is needed in this
-0.285623	The if statement in this
-0.331559	multiplication by columns in this
-0.902253	CPUs, as described in this
-0.326269	break will occur in this
-0.523885	an error message in this
-0.230351	we can define in this
-0.230351	// Catch exceptions in this
-0.230351	that is measured in this
-0.230351	do the reduction in this
-0.308738	checking, as illustrated in this
-0.230351	parameter. If MultiplyBy in this
-0.230351	1 : 0] in this
-0.230351	the two formulas in this
-0.230351	default, so 1.2 in this
-0.230351	the other volumes in this
-0.237654	or reference parameters). The this
-0.475806	have a function for this
-0.344021	the optimal code for this
-0.166136	end. The reason for this
-0.166136	directly. The reason for this
-0.234331	once. The reasons for this
-0.234331	can be mispredicted for this
-0.310468	was never designed for this
-0.234331	as the basis for this
-0.234331	CPU dispatching 125 for this
-0.234331	more than doubled for this
-0.939073	tell the compiler that this
-0.330901	is almost certain that this
-0.293079	important to note that this
-0.293899	x to 0 // this
-0.237686	We can tell it this
-0.349291	library functions, or if this
-0.437701	out a loop if this
-0.234833	a loop automatically if this
-0.234833	I don't know if this
-0.234833	use lookup tables if this
-0.424031	best on processors with this
-0.327508	unchanged. The problem with this
-0.486858	cc[i]+2 is AND'ed with this
-0.290209	methods for dealing with this
-0.234386	functions. I disagree with this
-0.235059	software. For more on this
-0.102031	is to turn on this
-0.102031	recommended to turn on this
-0.235059	and make measurements on this
-0.535417	but as long as this
-0.513038	vectors do not have this
-0.350427	In general, you have this
-0.351368	optimization than to use this
-0.118845	Gnu compiler can use this
-0.379460	time You can use this
-0.332462	compiler can then use this
-0.393645	overlap. You should use this
-0.303010	count and always use this
-0.283027	do not normally use this
-0.234396	the array elements then this
-0.234396	the critical stride then this
-0.531873	5 clock cycles, then this
-0.235724	should be clear from this
-0.235724	we can learn from this
-0.350747	is already known at this
-0.545618	for how to make this
-0.341855	to list and make this
-0.340601	the compiler can make this
-0.420479	compilers do not make this
-0.471801	or member function because this
-0.301156	casting of pointers because this
-0.226503	0/a = 0 because this
-0.226503	recognize VIA processors because this
-0.226503	is always position-independent because this
-0.226503	very time-consuming tasks because this
-0.226503	u.i[1] ^= 0x80000000; because this
-0.235308	and cache size. If this
-0.235308	specific load address. If this
-0.605743	processor models on which this
-0.265757	more efficient code, but this
-0.212811	time, of course, but this
-0.212811	a function library, but this
-0.265757	an imported pointer, but this
-0.212811	when it occurs, but this
-0.212811	with template metaprogramming, but this
-0.212811	point addition unit, but this
-0.212811	of the factorials, but this
-0.212811	less than 2-20, but this
-0.212811	compile with -mcmodel=large, but this
-0.212811	same memory block, but this
-0.212811	the option -ftrapv, but this
-0.212811	override public symbols, but this
-0.688944	therefore necessary to do this
-0.744978	are able to do this
-0.327629	therefore safer to do this
-0.493318	that you can do this
-0.350538	compilers you can do this
-0.388934	most compilers will do this
-0.381637	it. I am using this
-0.149795	5 } } In this
-0.149795	return c; } In this
-0.249939	of system code. In this
-0.198780	a += b; In this
-0.198780	of the resources. In this
-0.249939	the single-thread speed. In this
-0.198780	long dependency chains. In this
-0.198780	give -2.0 55 In this
-0.198780	be reused elsewhere. In this
-0.198780	(see page 71). In this
-0.198780	a clock cycle? In this
-0.198780	the same divisor. In this
-0.198780	= MAX(f(x), g(x)); In this
-0.236840	on automatic prefetching so this
-0.236588	apart. I will call this
-0.236750	and operating systems". For this
-0.236604	In the preceding example, this
-0.336554	Example 12.4b shows how this
-0.479230	want to know how this
-0.226803	next chapter describes how this
-0.348096	typical way to test this
-0.352081	without an operating system this
-0.554949	element. In some cases this
-0.534595	compiler that you want this
-0.229698	if you don't want this
-0.235940	be too worried about this
-0.312623	previous value. It does this
-0.304431	is possible to avoid this
-0.689449	various ways to avoid this
-0.192630	twice. You can avoid this
-0.192630	entry. You can avoid this
-0.356335	used. You may avoid this
-0.253831	CriticalFunction. You cannot avoid this
-0.228602	consumes CPU time. But this
-0.228602	for Java today. But this
-0.313201	the data object through this
-0.328780	all CPUs that support this
-0.426307	be able to inline this
-0.291597	be able to optimize this
-0.291597	we try to optimize this
-0.213688	good compiler will optimize this
-0.203585	nontemporal is used. However, this
-0.203585	the actual processor. However, this
-0.203585	the program flow. However, this
-0.203585	for later maintenance. However, this
-0.039462	The compiler may replace this
-0.063796	debugger. You may replace this
-0.229035	The compiler will replace this
-0.234571	can be implemented like this
-0.311955	that we are running this
-0.234135	are the same after this
-0.289865	language need only read this
-0.344818	The next example shows this
-0.434297	one. You can improve this
-0.291192	great lengths to reduce this
-0.020643	The compiler may reduce this
-0.311276	future processors, and choose this
-0.214993	way to work around this
-0.214993	are various ways around this
-0.297271	(The Microsoft compiler supports this
-0.287295	instruction. The CPU supports this
-0.376014	The compiler may change this
-0.232390	non-member functions. 80 Unfortunately, this
-0.287673	If i is outside this
-0.231953	is rebooted. To prevent this
-0.231098	the multiplication b[i]*c[i], though this
-0.231098	speculatively executing instructions during this
-0.230478	that performs best under this
-0.230580	scan instruction and expect this
-0.421694	occur. The reason why this
-0.202902	have no explanation why this
-0.199702	of all variables. Obviously, this
-0.199702	A is finished. Obviously, this
-0.332989	in order to implement this
-0.229015	DWORD PTR [ecx+eax*4],ebx stores this
-0.227698	Linux and Mac systems, this
-0.320391	the Windows operating system, this
-0.403816	The compiler can eliminate this
-0.224417	division faster. Of course, this
-0.224417	it. I am giving this
-0.438979	for how to overcome this
-0.221974	program. Add to 122 this
-0.222084	messages saying please install this
-0.221864	/ sar ebx,1 adds this
-0.305825	doesn't occur, but unfortunately this
-0.218064	stored in edx. Furthermore, this
-0.355562	sets. Let me explain this
-0.218064	of possible remedies against this
-0.218200	small to cause overflow, this
-0.271696	the multiplication by changing this
-0.211803	contention. You may skip this
-0.211803	end user's computers. At this
-0.211982	Intel compiler has solved this
-0.056898	simplest way to solve this
-0.056898	are designed to solve this
-0.199545	how the microprocessor handles this
-0.199545	not. I will conclude this
-0.164720	heading You can subtract this
-0.164720	Programmers very often underestimate this
-0.164720	element. I have confirmed this
-0.164720	b / 1.2345; Change this
-0.164720	model number to reflect this
-0.568835	counts. This is the time
-0.419856	way most of the time
-0.419856	runs most of the time
-1.176667	the value of the time
-0.496333	The value of the time
-0.445203	large fraction of the time
-0.344378	The lengths of the time
-0.484784	true 50% of the time
-0.138913	than 99% of the time
-0.138913	programs, 99% of the time
-0.344378	than 1/50 of the time
-0.529091	therefore equal to the time
-0.350919	time compared to the time
-0.347472	the contentions and the time
-0.347472	each call, and the time
-0.354786	more frequent if the time
-0.694993	be obtained with the time
-0.345015	duration compared with the time
-0.353699	optimization efforts on the time
-0.415357	much more than the time
-0.072594	is less than the time
-0.246311	be less than the time
-0.339976	simply don't have the time
-0.015335	Add to this the time
-0.064883	to 122 this the time
-0.254481	its name at the time
-0.033345	was unknown at the time
-0.006467	were unknown at the time
-0.254481	less popular at the time
-0.254481	been lost at the time
-0.343479	measures not only the time
-0.377151	test but also the time
-0.350577	clock cycles before the time
-0.562491	application to calculate the time
-0.321646	you may read the time
-0.326817	this way includes the time
-0.423168	times and stores the time
-0.326817	you can increase the time
-0.309869	used for reducing the time
-0.289575	future processors. Consider the time
-0.289575	this by measuring the time
-0.233828	in addition to) the time
-0.356764	a graphics function is time
-0.543251	constructor can be a time
-0.355134	or microseconds as a time
-0.046503	eight elements at a time
-0.226342	16 bytes at a time
-0.425882	one line at a time
-0.226342	four numbers at a time
-0.226342	small piece at a time
-0.498112	waste a lot of time
-0.498112	spend a lot of time
-0.653554	is a waste of time
-0.286966	frustration and waste of time
-0.102476	control branch ahead of time
-0.102476	loop counter ahead of time
-0.407822	elements more complicated and time
-0.336961	the user's time. The time
-0.611904	of a program. The time
-0.324502	are given below. The time
-0.324502	the test loop. The time
-0.231959	3.3 Program installation The time
-0.231959	is 12 bytes. The time
-0.412317	do the calculations. The time
-0.231959	the right prediction. The time
-0.231959	easier to maintain. The time
-0.374552	time is doubled. The time
-0.231959	for user input. The time
-0.527051	of programming style. The time
-0.231959	program is run. The time
-0.231959	year. Ignoring virtualization. The time
-0.231959	(in Windows: __rdtsc()). The time
-0.231959	a certain tolerance. The time
-0.231959	the performance costs. The time
-0.658607	operating system can be time
-0.429141	of these methods are time
-0.237707	very high resolution if time
-0.237623	or network resources. This time
-0.236182	executing instructions during this time
-0.236182	very often underestimate this time
-0.324848	time. Other programs use time
-0.272038	many programs use more time
-0.198384	object takes no more time
-0.198384	calculations take no more time
-0.089479	way that takes more time
-0.089479	Often, it takes more time
-0.089479	linked list takes more time
-0.089479	integer-to-float conversion takes more time
-0.170034	itself and take more time
-0.234868	calculations may take more time
-0.170034	mathematical functions take more time
-0.170034	can sometimes take more time
-0.117331	memory takes much more time
-0.117331	often takes much more time
-0.218366	scanners to consume more time
-0.218366	it takes 40% more time
-0.235785	addition to sum1 from time
-0.235785	addition to sum2 from time
-0.561458	additions in the same time
-0.601557	used at the same time
-0.425633	zero at the same time
-0.136943	b take the same time
-0.136943	usually take the same time
-0.534495	90% of the CPU time
-0.235397	well spend more CPU time
-0.237366	Coarse time measurement. If time
-0.351315	is mispredicted only one time
-0.046306	loaded rather than each time
-0.046306	once, rather than each time
-0.225149	loaded from memory each time
-0.225149	calculating the value each time
-0.279721	context switches after each time
-0.225149	searching for updates each time
-0.236936	is of course also time
-0.236932	then it obviously takes time
-0.491970	library which is very time
-0.293854	integer takes a long time
-0.096346	may take a long time
-0.096346	logarithms take a long time
-0.416427	take quite a long time
-0.332725	six times as long time
-0.299425	takes a very long time
-0.209679	and measure how long time
-0.209679	program takes too long time
-0.292884	memory access are critical time
-0.374626	called for the first time
-0.287510	initialized only the first time
-0.287510	only calculated the first time
-0.287510	waits until the first time
-0.326855	worst-case performance: The first time
-0.213959	be called only first time
-0.292415	very problematic because these time
-0.292500	finishes in a short time
-0.060341	uses most of its time
-0.060341	run most of its time
-0.060341	spends most of its time
-0.216364	a structure. The extra time
-0.249582	often takes no extra time
-0.249582	handling takes no extra time
-0.300366	double take no extra time
-0.260861	effect on the execution time
-0.260861	counts give the execution time
-0.215473	most of their execution time
-0.351824	on the total execution time
-0.267019	computer users and much time
-0.025677	called and how much time
-0.113303	to measure how much time
-0.177904	its value at compile time
-0.177904	some calculations at compile time
-0.244018	is calculated at compile time
-0.108380	is known at compile time
-0.241943	be known at compile time
-0.692286	not known at compile time
-0.146650	is resolved at compile time
-0.230307	always resolved at compile time
-0.177904	calculate (1./1.2345) at compile time
-0.317966	or if the calculation time
-0.317966	occur, but the calculation time
-0.218955	have an estimated calculation time
-0.322437	Each thread will get time
-0.115624	can do this every time
-0.115624	events, for example every time
-0.115624	object are called every time
-0.115624	branching is done every time
-0.115624	into the list every time
-0.115624	series of branches every time
-0.012878	new memory block every time
-0.115624	must be loaded every time
-0.115624	search for updates every time
-0.115624	make a misprediction every time
-0.115624	parameters are evaluated every time
-0.115624	to be updated every time
-0.115624	block is re-allocated every time
-0.115624	would be re-calculated every time
-0.332871	only until the next time
-0.320890	function returns. The next time
-0.587628	spend most of their time
-0.097010	work automatically. The development time
-0.097010	MFC application. The development time
-0.321283	to integer. The conversion time
-0.096537	the same as last time
-0.096537	same way as last time
-0.084317	one that takes longer time
-0.084317	started. It takes longer time
-0.084317	Integer multiplication takes longer time
-0.265771	appear to take longer time
-0.144247	switches by making longer time
-0.006873	division takes much longer time
-0.013856	truncation takes much longer time
-0.233917	member functions (methods) Each time
-0.233978	execute it. The load time
-0.232789	developers should take installation time
-0.027827	sure that the response time
-0.027827	advantage that the response time
-0.106726	programs because the response time
-0.057560	ms. If the response time
-0.057560	should test the response time
-0.232303	a template parameter. No time
-0.231992	system can be particularly time
-0.324468	optimization is to save time
-0.606381	by using the so-called time
-0.231178	specific CPU core during time
-0.184509	it does not spend time
-0.184509	you will never spend time
-0.281123	example 16.2. The measured time
-0.036264	3 Finding the biggest time
-0.276339	is likely to consume time
-0.222000	29 for details. Development time
-0.218198	the Internet at regular time
-0.218301	to get more reproducible time
-0.212070	or simply zero. Execution time
-0.211935	Pentium 4 processor. Extra time
-0.212070	to be an annoying time
-0.122463	if), but no compile- time
-0.122463	C++ should allow compile- time
-0.199674	that the overall computation time
-0.199674	the function returns. Every time
-0.250945	<ia32intrin.h> etc. // Returns time
-0.250945	increase in develop- ment time
-0.164838	may interfere with real time
-0.164838	is one that saves time
-0.164838	away cpuid // Read time
-0.164838	with profilers are: Coarse time
-0.164838	can get the exact time
-0.357126	be said that the use
-0.341194	be obtained by the use
-0.341194	two additions by the use
-0.341194	a bitfield by the use
-0.347882	representation directly with the use
-0.347882	12.4b, rewritten with the use
-0.452294	stored. This makes the use
-0.443687	because it prevents the use
-0.313542	a function. Avoid the use
-0.152674	important to economize the use
-0.293080	round addresses. Especially the use
-0.357987	completely absent in a use
-0.285886	code performance is to use
-0.212512	this problem is to use
-0.458272	complicated solution is to use
-0.285886	table lookup is to use
-0.285886	compiler generates is to use
-0.285886	My recommendation is to use
-0.332825	decides which function to use
-0.331217	compiler optimization than to use
-0.444426	compiler allows you to use
-0.436011	modify the program to use
-0.477888	A compiler has to use
-0.636994	is more efficient to use
-1.322506	it is possible to use
-0.538992	make it possible to use
-0.644791	makes it possible to use
-0.224101	deciding which version to use
-0.224101	which code branch to use
-0.439326	parallelism. The way to use
-0.419959	tool is faster to use
-0.275539	is often faster to use
-0.761487	example of how to use
-0.459534	107 for how to use
-0.933510	example shows how to use
-0.294287	example illustrates how to use
-0.472481	so you need to use
-1.033448	is no need to use
-0.224101	of which method to use
-0.224101	in some cases to use
-0.623801	you may want to use
-0.718810	It is necessary to use
-0.314420	is rarely necessary to use
-0.159707	it is advantageous to use
-0.139096	It is advantageous to use
-0.140054	is not advantageous to use
-0.179779	is less advantageous to use
-0.179779	almost always advantageous to use
-0.699488	when deciding whether to use
-0.569170	compiler is likely to use
-0.240841	it is recommended to use
-0.318216	It is recommended to use
-0.484866	is not recommended to use
-0.426673	it is optimal to use
-0.301558	not be optimal to use
-0.126284	is no reason to use
-0.298305	Whether you choose to use
-0.022454	it is inefficient to use
-0.224101	of cache lines to use
-0.298305	it is safe to use
-0.321863	is often easier to use
-0.278532	that we expect to use
-0.224101	have special reasons to use
-0.228677	it is preferred to use
-0.164697	may be preferred to use
-0.272776	It is safer to use
-0.202483	References are safer to use
-0.387855	You may prefer to use
-0.224101	it is profitable to use
-0.224101	consider it unwise to use
-0.224101	elements are cumbersome to use
-0.330306	unroll a loop and use
-0.292777	also eliminate i and use
-0.381071	graphical user interface and use
-0.236643	turn it off and use
-0.236643	and one local, and use
-0.718389	of the program. The use
-0.313206	for other purposes. The use
-0.313206	full size vector. The use
-0.572796	into multiple threads. The use
-0.237715	in a matrix for use
-0.806754	of the code that use
-0.336680	rely on instructions that use
-0.235725	example container classes that use
-0.291733	for all modules that use
-0.235725	portable to platforms that use
-0.312131	64-bit systems. Applications that use
-1.087701	so that it can use
-0.331880	is called, it can use
-0.332273	the calling function can use
-0.446788	variable. The compiler can use
-0.056296	The Gnu compiler can use
-0.605421	An optimizing compiler can use
-0.427267	necessary if you can use
-0.392405	and how you can use
-0.302004	In Windows, you can use
-0.302004	functions. Alternatively, you can use
-0.302004	this reason, you can use
-0.337157	and Intel compilers can use
-0.336273	64-bit systems we can use
-0.319979	Read time You can use
-0.319979	155 test. You can use
-0.282916	reasons. The programmer can use
-0.227966	graphical user interface can use
-0.302895	structure or union can use
-0.313582	use assembly code or use
-0.236943	CPUID instruction directly, or use
-0.416143	but it will not use
-0.293640	memory. It will not use
-0.424646	function libraries do not use
-0.328001	because they do not use
-0.344508	the object does not use
-0.161052	instruction set. Do not use
-0.161052	memory allocation. Do not use
-0.161052	memory block. Do not use
-0.161052	certain optimizations. Do not use
-0.161052	linked list. Do not use
-0.161052	scarce resource. Do not use
-0.311972	purpose, or you may use
-0.518314	y?" then you may use
-0.311972	64-bit systems, you may use
-0.285477	less efficient. You may use
-0.285477	double precision. You may use
-0.285477	the application. You may use
-0.285477	not expensive. You may use
-0.285477	not used). You may use
-0.228739	the runtime framework may use
-0.236445	effect. Which method you use
-0.292551	no difference whether you use
-0.307062	-mcmodel=large, but this will use
-0.321463	loop. The loop will use
-0.466900	two. Some compilers will use
-0.478998	sets. Most compilers will use
-0.231471	vector class library will use
-0.288157	been added and then use
-0.288157	doing calculations, and then use
-0.301587	The compiler can then use
-0.226866	only one element then use
-0.226866	assembly output option then use
-0.367496	compiler is used, then use
-0.098785	Out (FIFO) basis then use
-0.098785	Out (FILO) basis then use
-0.346837	the application can make use
-0.237323	important than optimizing CPU use
-0.237209	The above examples all use
-0.341221	linker. Both code cache use
-0.434997	use and data cache use
-0.252325	point operations. You should use
-0.252325	b overlap. You should use
-0.232768	access rights. Software should use
-0.237176	code. If you do use
-0.237074	Preprocessor directives. For example use
-0.234726	available. The best compilers use
-0.234726	sizeof(float)); // (Some compilers use
-0.348779	The compiler can also use
-0.237025	common obstacles to efficient use
-0.236825	you should avoid any use
-0.330901	becomes easier if we use
-0.507156	systems. Floating point variables use
-0.339604	pitfalls here. You cannot use
-0.422718	so that they cannot use
-0.355404	is supported. For example, use
-0.236474	systems. Mac systems often use
-0.292615	other container class libraries use
-0.231064	Windows 3.x. These systems use
-0.231064	objects in Unix-like systems use
-0.299420	possibility is to always use
-0.279597	repeat count and always use
-0.279597	installation process should always use
-0.340753	105 The vector operations use
-0.334749	because the integer operations use
-0.236225	optimized function libraries available use
-0.315003	profiler. For Intel CPUs use
-0.284561	VTune, for AMD CPUs use
-0.284383	140). Mathematical functions must use
-0.229258	cache. Multithreaded programs must use
-0.484428	up if the threads use
-0.291792	point numbers to integers use
-0.330421	many standard container classes use
-0.282529	implementations of string classes use
-0.632858	you may as well use
-0.202357	code. But many programs use
-0.202357	that many common programs use
-0.202357	access Some application programs use
-0.202357	user's time. Other programs use
-0.277540	data structures that typically use
-0.277540	point calculations will typically use
-0.416167	thread-safe function should never use
-0.234195	order to make better use
-0.359034	databases Many software applications use
-0.220728	except when several applications use
-0.329826	Several modern programming languages use
-0.231143	inefficient solution. Many containers use
-0.231002	a server in full use
-0.203092	to economize the resource use
-0.203092	important to economize resource use
-0.285998	the loop. Some implementations use
-0.229831	For example, some programmers use
-0.228982	warn against overkill. Don't use
-0.282689	data within the DLL use
-0.226625	in such applications. Alternatively, use
-0.218198	simple cases. The explicit use
-0.148503	Compilers do not normally use
-0.148503	and Mac systems normally use
-0.211935	The square brackets mean use
-0.211935	in the future. To use
-0.008659	the appropriate version (May use
-0.199674	setup. on Intel CPUs: use
-0.199674	before it occurs, (2) use
-0.199674	more cache space. Excessive use
-0.199674	current position. Windows DLLs use
-0.199674	The best Java machines use
-0.199674	languages, such as Java, use
-0.164838	a + 2 thenaandbcannot use
-0.164838	references, and stack entries use
-0.164838	than a float. (Both use
-0.164838	additions and multiplications. Subtractions use
-0.434753	time, but not the more
-0.237750	smaller the system, the more
-0.350053	to zero that is more
-0.818659	high that it is more
-0.565881	known then it is more
-0.743751	cases where it is more
-0.341602	on, while it is more
-0.341602	again. Obviously, it is more
-0.349886	size. Vectorized code is more
-0.486756	big. The compiler is more
-0.483126	integer code. It is more
-0.462961	long time. It is more
-0.483126	seldom used. It is more
-0.328474	in Windows. It is more
-0.425238	signed integers. It is more
-0.328474	and throw. It is more
-0.328474	memory pooling. It is more
-0.328474	a queue. It is more
-0.317020	containing numerical data is more
-0.346768	the calling program is more
-0.348220	compile-time polymorphism, which is more
-0.436335	a template class is more
-0.519310	complex, that there is more
-0.558187	bigger if there is more
-0.373319	a data member is more
-0.231070	or function libraries is more
-0.317020	Optimizing file access is more
-0.231070	of vector operations is more
-0.373319	a composite type is more
-0.286441	Therefore, 64-bit Linux is more
-0.231070	Linux. Address calculation is more
-0.629388	then the solution is more
-0.231070	using bitwise operators is more
-0.231070	An uncached write is more
-0.607682	If one operand is more
-0.286441	structure. The situation is more
-0.306586	mode Parameter transfer is more
-0.286441	big memory blocks is more
-0.398055	systems. The latter is more
-0.286441	source code. #if is more
-0.231070	situations where pre-increment is more
-0.231070	x = *(p++) is more
-0.231070	x = array[i++] is more
-0.523204	to lead to a more
-0.350279	the same in a more
-0.496985	be implemented in a more
-0.350279	is likely in a more
-0.237067	syntax 90 Gives a more
-0.235815	therefore becoming more and more
-0.322810	is generally faster and more
-0.235815	have become bigger and more
-0.312238	the code smaller and more
-0.322810	code less clear and more
-0.291835	in 32-bit mode, and more
-0.235815	development more expensive and more
-0.235815	be both cheaper and more
-0.346105	object is accessed in more
-0.293500	and b. But in more
-0.343286	method is described in more
-0.334130	an array and for more
-0.591290	the same register for more
-0.236201	See page 31 for more
-0.236201	145 and 119 for more
-0.236201	the specific literature for more
-0.566958	it, it may be more
-0.436561	here. It may be more
-0.436561	high. It may be more
-0.323470	size may possibly be more
-0.349508	function. Leaf functions are more
-0.353157	all. Fortunately, there are more
-0.235391	ease of development are more
-0.235391	renewed. Context switches are more
-0.440297	floating point comparisons are more
-0.235391	operations (chapter 12) are more
-0.203952	file and one or more
-0.203952	by using one or more
-0.203952	but read one or more
-0.203952	and enable one or more
-0.048972	that is two or more
-0.023796	there are two or more
-0.023796	There are two or more
-0.023796	to have two or more
-0.023796	CPUs have two or more
-0.048972	chooses between two or more
-0.048972	for doing two or more
-0.048972	set. Make two or more
-0.228818	is 128 bytes or more
-0.228818	floating point expressions or more
-0.228818	using templates. Two or more
-0.345068	Using pointers makes it more
-0.548099	may be replaced by more
-0.236857	can be increased by more
-0.236780	more powerful computers with more
-0.406468	you are satisfied with more
-0.132765	to make the code more
-0.530714	This makes the code more
-0.668740	makes floating point code more
-0.288025	make the source code more
-0.232465	makes position- independent code more
-0.333205	the loader will have more
-0.341214	Windows if functions have more
-0.311219	Software developers typically have more
-0.341920	runtime framework may use more
-0.323161	But many programs use more
-0.230037	13.1. Instruction sets A more
-0.230037	optimization is enabled. A more
-0.230037	and garbage collection. A more
-0.230037	up to date. A more
-0.230037	or "__attribute__((visibility ("hidden")))". A more
-0.230037	_mm_malloc and _mm_free. A more
-0.293625	thread will run at more
-0.353157	by making the data more
-0.235670	useful for making data more
-0.502021	not make the program more
-0.656624	the compiler to make more
-0.457993	code cache is used more
-0.237187	bytes by adding one more
-0.350403	same priority is no more
-0.341937	should preferably have no more
-0.328059	structure object takes no more
-0.308579	precision calculations take no more
-0.931953	the compiler to do more
-0.535622	not able to do more
-0.287713	of order or do more
-0.320791	a way that takes more
-0.349454	loading Often, it takes more
-0.223073	a linked list takes more
-0.307292	the integer-to-float conversion takes more
-0.223073	A runtime DLL takes more
-0.236757	vector instructions SSE4.1 some more
-0.292787	means of making software more
-0.323904	to individual array elements more
-0.236761	use of software. For more
-0.312035	installation process to take more
-0.204781	application itself and take more
-0.330473	but it can take more
-0.226976	The calculations may take more
-0.226976	on process may take more
-0.275479	and mathematical functions take more
-0.204781	functions in C++ take more
-0.204781	shuffling can sometimes take more
-0.509495	execution. It is often more
-0.320039	operating system is often more
-0.236441	which makes detailed optimization more
-0.230863	Therefore, it is even more
-0.286206	some cases. An even more
-0.292277	unrolled loop takes up more
-0.450533	This makes function calls more
-0.218572	applications it is much more
-0.218572	The effect is much more
-0.173044	inheritance where a much more
-0.210829	of memory takes much more
-0.210829	disk often takes much more
-0.173044	that typically take much more
-0.173044	program are often much more
-0.173044	framework typically uses much more
-0.173044	can be made much more
-0.173044	you can obtain much more
-0.338664	checking and is therefore more
-0.333381	to make and therefore more
-0.235605	necessary. 101 Multithreading works more
-1.005535	counter can be calculated more
-0.590013	make the address calculation more
-0.209965	code big and uses more
-0.209965	CPUs, but it uses more
-0.209965	if an int uses more
-0.548527	if the program uses more
-0.344572	various ways to get more
-0.349979	etc. SSSE3 a few more
-0.399421	a few clock cycles more
-0.234784	on non-Intel CPUs was more
-0.607276	This makes data caching more
-0.486972	the code makes caching more
-0.234457	it makes program development more
-0.188584	122. The code becomes more
-0.188584	tedious. The code becomes more
-0.256859	The memory space becomes more
-0.326709	calculation time is actually more
-0.343677	the need to load more
-0.320236	that make function calling more
-0.503569	code can be made more
-0.309296	be slower or require more
-0.232957	branch that can go more
-0.288230	that computers have become more
-0.195322	useful because it gives more
-0.195322	but it often gives more
-0.195322	and VIA CPUs" gives more
-0.232677	modules. This makes inlining more
-0.232527	you can also find more
-0.485406	makes the assembly output more
-0.262836	to zero is sometimes more
-0.262836	as macros are sometimes more
-0.231049	of range is possibly more
-0.212020	type-casting with a little more
-0.212020	loop becomes a little more
-0.279318	core. Try to allocate more
-0.171736	cases. Does not allocate more
-0.171736	prone to even allocate more
-0.094291	Therefore, it is slightly more
-0.094291	The latter is slightly more
-0.044592	C++ takes only slightly more
-0.044592	precision takes only slightly more
-0.109261	compilers make Sum1 slightly more
-0.227750	is relocated (rebased) once more
-0.226450	may very well spend more
-0.411256	makes floating point comparisons more
-0.224558	the same subexpression occurs more
-0.224641	level-2 cache cannot prefetch more
-0.276287	virus scanners to consume more
-0.228964	hand-held devices are becoming more
-0.164943	caching is therefore becoming more
-0.221923	to invest in ever more
-0.211860	program. In some programs, more
-0.346887	beginning rather than allocating more
-0.370122	then it is certainly more
-0.250863	is worthwhile to invest more
-0.199601	This check makes dynamic_cast more
-0.199601	portability could be achieved more
-0.199601	likely to be cached more
-0.199601	The method is somewhat more
-0.164771	the profiler may sample more
-0.164771	"position-independent code" actually implies more
-0.164771	that it takes 40% more
-0.651029	to be aware of when
-0.382215	and keep track of when
-0.322521	and virtual functions or when
-0.234782	small as possible or when
-0.038307	in 64-bit mode or when
-0.080323	In 64-bit mode or when
-0.236831	of the power function when
-0.292991	inside the pow function when
-0.293697	systems" for details on when
-0.345070	objects without position-independent code when
-0.313179	automatically in vectorized code when
-0.237336	the level-2 cache as when
-0.564536	caching more efficient than when
-0.352184	will run faster than when
-0.235203	an array index than when
-0.237425	a stand alone compiler when
-0.237598	for the object x when
-0.346814	Ignoring virtualization. The time when
-0.494302	takes a long time when
-0.449050	take no extra time when
-0.414701	give the execution time when
-0.232058	in develop- ment time when
-0.449324	could free the memory when
-0.291542	are loaded into memory when
-0.237128	of a big program when
-0.213052	to evaluate a only when
-0.310338	take precedence, not only when
-0.406308	should be used only when
-0.213052	new register size only when
-0.213052	a new branch only when
-0.213052	example, use AVX only when
-0.213052	may be loaded only when
-0.213052	code is chosen only when
-0.213052	of 2 applies only when
-0.036539	way is mispredicted only when
-0.076437	count is mispredicted only when
-0.213052	operand is evaluated only when
-0.213052	running the services only when
-0.455375	delaying process is used when
-0.552515	mechanism may be used when
-0.617842	mechanism is also used when
-0.348836	best possible instruction set when
-0.348836	a newer instruction set when
-0.355791	first thing to do when
-0.284434	the program, for example when
-0.284434	the loop, for example when
-0.236945	of the default size when
-0.236979	needs to evaluate b when
-0.047745	a large positive number when
-0.047745	very large positive number when
-0.237122	are simply put there when
-0.340449	from main, but also when
-0.282048	* 1.5f; is efficient when
-0.438310	the code more efficient when
-0.338904	code becomes more efficient when
-0.545692	array is less efficient when
-0.519929	this is not possible when
-0.895414	using powers of 2 when
-0.352865	this case is faster when
-0.320380	multiplication will be faster when
-0.348047	the destructor is called when
-0.432319	constructor must be called when
-0.236593	system code is critical when
-0.355290	is valid. For example, when
-0.236330	integer parameter comes first when
-0.236227	than most other libraries when
-0.339639	128- bit vector registers when
-0.347881	two things to test when
-0.542615	and in 32-bit systems when
-0.470692	value. This is useful when
-1.206649	This can be useful when
-0.336142	Vector operations are useful when
-0.314161	counter is very useful when
-0.199062	load into memory even when
-0.199062	for different objects even when
-0.199062	the dispatch mechanism even when
-0.199062	lookups are needed even when
-0.199062	is always inlined even when
-0.199062	will be used, even when
-0.199062	binding by default, even when
-0.199062	takes memory space, even when
-0.329907	a single executable file when
-0.292012	system, and 512 bits when
-0.306146	problem with vector operations when
-0.306146	by using vector operations when
-0.508416	implementation in most cases when
-0.235898	flow at inconvenient times when
-0.703956	memory to the stack when
-0.457390	exactly what you want when
-0.235672	These methods also work when
-0.344211	is, and is compiled when
-0.311940	cases. It is best when
-0.736967	handling is not necessary when
-0.344506	the preferred programming language when
-0.235714	(i=0; i<n; ++i). But when
-0.330155	to transpose the matrix when
-0.617669	to transpose a matrix when
-0.568230	single and double precision when
-0.310689	faster than double precision when
-0.235295	interpreted line by line when
-0.633092	can therefore be advantageous when
-0.439672	There is a problem when
-0.331094	you have this problem when
-0.350362	more time to calculate when
-0.345199	variable as loop counter when
-0.234828	to load several files when
-1.057528	use dynamic memory allocation when
-0.234441	throughput of CPU-intensive programs when
-0.290457	Such schemes cause problems when
-0.222355	because it goes automatically when
-0.222355	update, or update automatically when
-0.234451	uses a different implementation when
-0.276139	is faster than signed when
-0.221989	2.5; // Use signed when
-0.322227	There is a disadvantage when
-0.234196	a background process running when
-0.490970	structures in the end when
-0.233605	can skip large expressions when
-0.289283	a name. #define directives when
-0.233701	to do cross-module optimizations when
-0.289431	normal on some microprocessors when
-0.233158	instance for each process when
-0.233589	C++ has many advantages when
-0.233302	operators produce 32 results when
-0.232791	to assembly language modules when
-0.149087	for size is relevant when
-0.149087	for speed is relevant when
-0.197592	could possibly be relevant when
-0.146110	can be allocated dynamically when
-0.636434	should definitely be avoided when
-0.232184	point comparisons are inefficient when
-0.232123	A considerable delay comes when
-0.231579	doubled for this task when
-0.334510	highest efficiency is obtained when
-0.205449	reading of the counters when
-0.663861	the performance monitor counters when
-0.800350	cache works most efficiently when
-0.139611	and much less efficiently when
-0.139611	works somewhat less efficiently when
-0.168396	(PLT) that is initialized when
-0.168396	the table is initialized when
-0.265662	need to be initialized when
-0.230424	point code slower, especially when
-0.511662	makes an error message when
-0.230424	a static library, except when
-0.229676	takes to execute CriticalFunction when
-0.317749	is a performance penalty when
-0.229405	expression list[i] is invalid when
-0.282401	parallelism and fine-grained parallelism when
-0.107026	to take into account when
-0.057827	compatibility problems into account when
-0.006058	be taken into account when
-0.227400	method is inefficient, however, when
-0.156525	space becomes more fragmented when
-0.202680	heap space becomes fragmented when
-0.202680	can easily become fragmented when
-0.227400	keyboard and mouse inputs when
-0.306470	compiled version is preferred when
-0.227400	deallocate the space explicitly when
-0.271689	dynamic library is resolved when
-0.184433	function is not resolved when
-0.225910	the shared object. Likewise, when
-0.280873	than the program itself when
-0.226038	is possibly more serious when
-0.453177	to Microsoft Visual Studio when
-0.233572	advantages over the disadvantages when
-0.184201	to overcome these disadvantages when
-0.225910	can make an update when
-0.310894	will start garbage collection when
-0.278676	The exception is costly when
-0.224228	nor slower than truncation when
-0.224079	than normal. This happens when
-0.278508	around at different places when
-0.005264	value. The keyword static, when
-0.005264	optimizations. The keyword static, when
-0.005264	context. The keyword static, when
-0.005264	80. The keyword static, when
-0.132688	because it is deallocated when
-0.132688	that they are deallocated when
-0.132688	space is automatically deallocated when
-0.224228	but this is permissible when
-0.221706	requirements are less strict when
-0.221706	be a viable compromise when
-0.221529	clock frequency is increased when
-0.295256	is easier to understand when
-0.217731	is much more dramatic when
-0.217951	automatically come into force when
-0.271319	The syntax is simpler when
-0.264248	points to is deleted when
-0.211475	loop-invariant code motion manually when
-0.369568	the compiler option -fno-pic when
-0.211475	for the class Vec16s when
-0.264248	code becomes more readable when
-0.264248	memory allocation is negligible when
-0.211475	piecewise or re- allocating when
-0.211475	necessary information about Func1 when
-0.122213	the memcpy function implicitly when
-0.122213	Multiplications are done implicitly when
-0.264248	0x273F will be evicted when
-0.199226	memory space is freed when
-0.250441	true, and all 0's when
-0.199226	mechanism can be bypassed when
-0.199226	intensive program is achieved when
-0.199226	you must be careful when
-0.199226	sizes and array indices when
-0.199226	possible memory requirement. Useful when
-0.199226	out of the question when
-0.199226	different floating point precisions when
-0.250441	which is all 1's when
-0.199226	dependency chains is stronger when
-0.250441	integers or four float's when
-0.164426	that the loop exits, when
-0.164426	gained remarkably in popularity when
-0.164426	(depending on the processor) when
-0.164426	Windows and to Eclipse when
-0.164426	the operating systems disappears when
-0.164426	by more than 33% when
-0.164426	and the memory released when
-0.164426	is high and decreased when
-0.164426	equivalent to const definitions when
-1.248631	if the value of A
-0.551178	and the calculation of A
-0.391394	before the calculation of A
-0.391394	specifies the calculation of A
-0.236100	C; double Z = A
-0.292158	B, C; x.abc = A
-0.236100	const double A2 = A
-0.814913	dispatching in Gnu compiler A
-0.496205	+ 1; } } A
-0.288325	a : b; } A
-0.520255	b[i] + 2; } A
-0.528572	list[i] += 1.0f; } A
-0.730268	9.5 Alignment of data A
-0.975195	of code and data A
-0.322092	macros instead of functions A
-0.235227	needs them. Pure functions A
-0.902201	in the innermost loop A
-0.209621	table of const double A
-0.209621	induction variables const double A
-0.236091	executing the critical code. A
-0.388578	50% of the time. A
-0.418650	line at a time. A
-0.356580	significant amount of time. A
-0.287971	at the same time. A
-0.788915	known at compile time. A
-0.423806	instantiated at compile time. A
-0.202409	the total calculation time. A
-0.418699	pointer. 7.9 Smart pointers A
-0.099588	least one other function. A
-0.099588	call any other function. A
-0.241459	members or member functions. A
-0.318976	or non-static member functions. A
-0.262321	many useful mathematical functions. A
-0.193575	names from string functions. A
-0.244088	functions and frame functions. A
-0.193575	must use thread-safe functions. A
-0.404296	8.10b a = b; A
-0.234812	instead of main memory. A
-0.323877	double precision is used. A
-0.350029	program is never used. A
-0.350029	is no longer used. A
-0.234287	Table 13.1. Instruction sets A
-0.325427	of the 64-bit systems. A
-0.275872	in some embedded systems. A
-0.320817	rounding. Pointer type conversion A
-0.273363	of organizing the data. A
-0.308028	smallest list of data. A
-0.194936	for storing user data. A
-0.194936	text or input data. A
-0.348416	for each instruction set. A
-0.319816	only with Intel processors. A
-0.232991	integer constants. Register storage A
-0.320980	functions that are called. A
-0.333013	table or a pointer. A
-0.231671	signed and unsigned variables. A
-0.231525	mode. Make functions local A
-0.231033	program that calls it. A
-0.231033	be copied into registers. A
-0.456136	enabled in 64-bit mode. A
-0.231033	block for each object. A
-0.231033	or a static library. A
-0.231033	than sequences of operations. A
-0.230952	still needs careful optimization. A
-0.288675	no difference in performance. A
-0.207680	inlined for improved performance. A
-0.230547	behavior of static libraries. A
-0.918090	than on the stack. A
-0.230459	whenever it is possible. A
-0.230635	them into one thread. A
-0.371797	to the AVX instructions. A
-0.229973	goes the other way. A
-0.152102	loop is predicted well. A
-0.152102	is not predicted well. A
-0.229181	to a different address. A
-0.573921	7.23 Constructors and destructors A
-0.332856	interprocedural optimization is enabled. A
-0.181814	what it points to. A
-0.357348	that r points to. A
-0.286145	own allocated memory block. A
-0.195296	to the next block. A
-0.283462	speed is particularly critical. A
-0.520150	4 Performance and usability A
-0.227270	to an output file. A
-0.282275	be accessed. Pointer arithmetic A
-0.227401	5 Programmable logic devices A
-0.071177	the same memory space. A
-0.071177	never takes memory space. A
-0.302378	waste of cache space. A
-0.227139	first byte of zero. A
-0.227139	lifetime of your software. A
-0.227270	be organized into vectors. A
-0.225762	set of template parameters. A
-0.278430	when efficiency is important. A
-0.184104	is becoming increasingly important. A
-0.280415	caching. 3.14 Context switches A
-0.233312	on the hard disk. A
-0.183969	to a floppy disk. A
-0.458020	needed in this case. A
-0.225910	Using templates for polymorphism A
-0.225910	speed or full speed. A
-0.223931	Dispatch on every call. A
-0.365887	explanation of branch prediction. A
-0.187495	43 about branch prediction. A
-0.314590	copying all data members. A
-0.278340	lookup mechanisms explained above. A
-0.224104	produce the same result. A
-0.449842	no loop-carried dependency chain. A
-0.363683	collector at inconvenient times. A
-0.278536	a simple integer counter. A
-0.224104	or no other branches. A
-0.312711	Jumps between CPU cores. A
-0.275452	to using a profiler. A
-0.221588	simple cases. 7.28 Templates A
-0.310396	complicated and time consuming. A
-0.221382	to justify the method. A
-0.295082	compilers and development tools. A
-0.360217	additions in one operation. A
-0.305490	the loop unroll factor. A
-0.359933	instance for each process. A
-0.221382	it is not necessary. A
-0.275452	like sin. Pointer elimination A
-0.221588	search for finding elements. A
-0.221588	performance is not doubled. A
-0.221382	can be mentioned here: A
-0.378389	case of an exception. A
-0.217841	object doesn't need initialization. A
-0.271154	database access. 3.10 Graphics A
-0.217841	with a simple index. A
-0.217841	than a few lines. A
-0.217586	or other hardware conditions. A
-0.290587	89 for an example. A
-0.067772	misses are very expensive. A
-0.067772	but also very expensive. A
-0.217586	software in two versions. A
-0.217586	multiple logically distinct tasks. A
-0.550213	a graphical user interface. A
-0.354719	Structure of 4 floats A
-0.499627	9.9 Access data sequentially A
-0.567098	avoid long dependency chains. A
-0.519694	Loop invariant code motion A
-0.283197	management and garbage collection. A
-0.211331	int instead of int. A
-0.211331	by a const reference. A
-0.369361	and more error prone. A
-0.250284	page 53. 7.24 Unions A
-0.199087	around the constant subexpression. A
-0.199087	the same bits differently. A
-0.250284	predicted well, of course. A
-0.199087	definitely be avoided. 37 A
-0.329547	dispatchers up to date. A
-0.464423	aliasing (see page 78). A
-0.074425	languages, profiling and debugging. A
-0.074425	are incompatible with debugging. A
-0.199087	more time to load. A
-0.250284	induction variables (see below). A
-0.250284	search, is fast enough. A
-0.199087	code branches works correctly. A
-0.250284	reasonably well. Codeplay VectorC A
-0.199087	not support static linking. A
-0.464423	sets are mutually incompatible. A
-0.329547	called performance monitor counters. A
-0.464423	AMD and VIA CPUs". A
-0.199087	n 0 n! 117 A
-0.164297	index of memory blocks. A
-0.164297	able to do so). A
-0.164297	sequence to be moved. A
-0.164297	by the function body. A
-0.164297	they are often mispredicted. A
-0.164297	that the object owns. A
-0.164297	quite a good investment. A
-0.164297	described on page 153. A
-0.164297	and any other constructors. A
-0.164297	explained on page 107. A
-0.164297	table of jump targets. A
-0.164297	as in example 7.32b. A
-0.164297	doesn't need a constructor. A
-0.164297	on the first sub-vector. A
-0.164297	technology, and microprocessor microarchitecture. A
-0.164297	"static" or "__attribute__((visibility ("hidden")))". A
-0.164297	calls a device driver. A
-0.164297	specified types (See Sutter: A
-0.164297	Microsoft Foundation Classes (MFC). A
-0.164297	be used without restrictions. A
-0.164297	list[100]; memset(list, 0, sizeof(list)); A
-0.164297	stored in ASCII form. A
-0.164297	the software was developed. A
-0.164297	Rick Booth: "Inner Loops: A
-0.164297	functions for millisecond resolution. A
-0.164297	infinity or NAN (Not A
-0.164297	to some extra complications. A
-0.164297	containers use linked lists. A
-0.164297	is 1 0.5ns. 2GHz A
-0.164297	Windows Template Library (WTL). A
-0.164297	*(p++) |= 0x20; 46 A
-0.164297	should also be considered. A
-0.164297	or limited in scope. A
-0.164297	negative inputs give infinity. A
-0.164297	defining _mm_malloc and _mm_free. A
-0.164297	it (&ArraySize) is taken. A
-0.164297	more than it says. A
-0.164297	tool for details (www.agner.org/optimize/testp.zip). A
-0.164297	only when it changes. A
-0.164297	an interpreter for Basic. A
-0.164297	hackers often have exploited. A
-0.164297	event it is servicing. A
-0.164297	model N-1 is inferior. A
-0.164297	p2 having different types. A
-0.164297	to make a destructor. A
-0.164297	mispredicted for this reason. A
-0.164297	with this instruction set?". A
-0.164297	in a specific interval. A
-0.164297	calculating a polynomial. Scheduling A
-0.164297	+= i_div_3; } 138 A
-0.164297	a multidimensional structure needed? A
-0.164297	other thread increments seconds. A
-0.357511	slow. Value of a will
-0.456722	is certain that a will
-0.625676	in static memory and will
-0.237611	cout << x.f; // will
-0.310887	with -fpic and it will
-0.310887	quite inefficient, and it will
-0.352389	are sure that it will
-0.301622	instruction set then it will
-0.301622	clock cycles then it will
-0.301622	the debugger then it will
-0.058076	version on, then it will
-0.347534	code section, but it will
-0.556956	After first call it will
-0.608871	in this case it will
-0.341399	internal references. Therefore, it will
-0.229251	declared or created it will
-0.229251	All in all, it will
-0.333280	register. The library function will
-0.441382	of a virtual function will
-0.322453	initialized. The dispatcher function will
-0.235523	example, the DelayFiveSeconds function will
-0.511312	double, then the code will
-0.344417	vectorized, but the code will
-0.344417	above. Now the code will
-0.347951	is specified. The code will
-0.315577	return n;} This code will
-0.417087	12.2, the optimized code will
-0.335524	rarely. The above code will
-0.231493	is. This above code will
-0.229886	} The resulting code will
-0.229886	and the resultant code will
-0.322270	the dispatcher function. This will
-0.283204	the class definition. This will
-0.228220	a null reference. This will
-0.283204	static or inline. This will
-0.228220	is never changed. This will
-0.228220	longer time slices. This will
-0.228220	/QaxAVX or -axAVX. This will
-0.228220	test their functionality. This will
-0.228220	page size (4096). This will
-0.228220	on page 87. This will
-0.228220	instead of -fpic. This will
-0.339910	predict whether the compiler will
-0.439576	many cases, the compiler will
-0.339910	= ++b; the compiler will
-0.276032	is called. The compiler will
-0.276032	is enabled. The compiler will
-0.276032	automatic vectorization. The compiler will
-0.276032	or division. The compiler will
-0.276032	/ 1.2345); The compiler will
-0.276032	e.g. /arch:SSE2. The compiler will
-0.276032	/ 3.0; The compiler will
-0.220621	to predict which compiler will
-0.404308	only the Gnu compiler will
-0.404308	Here, the Gnu compiler will
-0.162699	} A good compiler will
-0.162699	operation. A good compiler will
-0.311425	that an optimizing compiler will
-0.311425	But an optimizing compiler will
-0.324359	and clumsy, as you will
-0.369598	by the compiler you will
-0.320098	preceding addition then you will
-0.320098	specific models then you will
-0.320098	parameters differ then you will
-0.303393	of course, because you will
-0.303393	the code, so you will
-0.283392	in your program, you will
-0.228385	by the compiler, you will
-0.235456	is used and this will
-0.235456	unsigned integer and this will
-0.346224	with -mcmodel=large, but this will
-0.232842	Microsoft compilers. // It will
-0.310727	stored in memory. It will
-0.232842	the operating system. It will
-0.232842	are both positive. It will
-0.348916	large amounts of memory will
-0.235686	compile time. No memory will
-0.445990	support, and the program will
-0.345001	interface. Otherwise the program will
-0.341011	described below. The program will
-0.397239	processors. The application program will
-0.372549	in the entire program will
-0.346993	calculated independently. The CPU will
-0.455821	estimate that the loop will
-0.341197	innermost loop. The loop will
-0.286769	cycles, then this loop will
-0.231359	and the whole loop will
-0.373382	value of i which will
-0.231116	whole program optimization, which will
-0.231116	-100 to -56 which will
-0.231116	elements in a[] which will
-0.237237	point register variables, but will
-0.323183	is that the cache will
-0.323183	column 28, the cache will
-0.237084	differently. A negative integer will
-0.537136	of the same class will
-0.312536	turned on, the compilers will
-0.290993	losing precision. The compilers will
-0.371848	Func1, while other compilers will
-0.266036	d.y; Fortunately, most compilers will
-0.247649	notice that some compilers will
-0.241580	Loop unrolling Some compilers will
-0.241580	cache line. Some compilers will
-0.241580	the division. Some compilers will
-0.241580	than two. Some compilers will
-0.247649	example, all good compilers will
-0.266036	fast. All optimizing compilers will
-0.166851	static memory. Most compilers will
-0.166851	or cache. Most compilers will
-0.166851	simple variable. Most compilers will
-0.166851	instruction sets. Most compilers will
-0.166851	sizeof(b)); 47 Most compilers will
-0.196744	the processor). Optimizing compilers will
-0.247649	hope that future compilers will
-0.322115	fast. Value of b will
-0.567689	MultiplyBy<8>(10); a and b will
-0.540325	The vector class library will
-0.503845	negative value of i will
-0.335836	memory areas, and there will
-0.237103	= Func(ab[i].a); } There will
-0.442635	then a linear array will
-0.293090	infinity, and this value will
-0.519561	Such variables and objects will
-0.328871	a result then we will
-0.280900	= 1.23456. But we will
-0.226189	loop by four, we will
-0.226189	places back. Thus, we will
-0.888529	Failure to do so will
-0.236749	to predict which variables will
-0.236933	questions to me. You will
-0.295336	which way a branch will
-0.295336	way. Such a branch will
-0.324251	But these eight elements will
-0.334218	that the 64-bit systems will
-0.346135	terminated and the user will
-0.664822	that the end user will
-0.236159	of the code 16 will
-0.285695	is closed. The file will
-0.600928	the appropriate header file will
-0.346414	the time of programming will
-0.236173	same brand. Future processors will
-0.223469	code or not. I will
-0.223469	specific model. Instead, I will
-0.223469	of 0x800 apart. I will
-0.777398	does floating point calculations will
-0.445019	bits, and the result will
-0.316265	the profiler. The result will
-0.361026	that the final result will
-0.291853	language. Such a processor will
-0.329255	line, because the threads will
-0.235724	reasons, the preferred language will
-0.342264	times and the speed will
-0.304688	chains then each thread will
-0.304688	core with another thread will
-0.304688	jobs simultaneously. Each thread will
-0.312075	+127. An integer overflow will
-0.449377	so a cache line will
-0.235626	computers and my manual will
-0.534084	= a & b; will
-0.291157	with multiple overloaded operators will
-0.234621	zero. The [] operator will
-0.234553	some cases this multiplication will
-0.234681	are eliminated. Code caching will
-0.321025	the next processor model will
-0.234673	+ c; Here, y will
-0.289998	in the above examples will
-0.233505	hope that such feature will
-0.232946	in the same core will
-0.728136	in the code section will
-0.428569	if level-2 cache contentions will
-0.159282	the version in main will
-0.159282	the instance in main will
-0.211835	in a. This operation will
-0.211835	of the & operation will
-0.232157	predict correctly whether vectorization will
-0.231919	subexpression containing only constants will
-0.286773	in scope. A macro will
-0.230917	CPU core). The counters will
-0.285670	An uncaught overflow condition will
-0.611610	the number of cores will
-0.229701	to assume that F1 will
-0.962479	of the critical stride will
-0.320350	page 143. The trick will
-0.226460	or more template instances will
-0.226201	Adding 1 to 127 will
-0.226374	(signed) address. The linker will
-0.278950	chance that the break will
-0.278837	mind, that many users will
-0.221817	the micro-op cache. Compilers will
-0.221937	number 1. Number 18 will
-0.291098	is that the loader will
-0.426145	invalid. The heap manager will
-0.218165	this column. Number 17 will
-0.109285	variable from address 0x2710 will
-0.109285	again from address 0x2710 will
-0.346746	2 in example 14.28 will
-0.211757	Example 14.23b and 14.30 will
-0.568064	from 0x2700 to 0x273F will
-0.211951	Here, the constant 3.5 will
-0.199500	The calculation of c+b will
-0.199500	when they are disabled will
-0.199500	is brand new today will
-0.199500	with the static modifier will
-0.199500	the value of b+c will
-0.164678	Writing a = OneOrTwo5[b!=0]; will
-0.164678	capabilities (see page 103) will
-0.164678	likely that the producer will
-0.164678	& operator (bitwise and) will
-0.164678	expression a = b++; will
-0.164678	subexpression. For example, b*2.0/3.0 will
-0.347996	a polymorphic function. The }
-0.343421	Call to virtual function }
-0.273241	elements } } } }
-0.389299	swapd(a[r2][c2],a[c2][r2]); } } } }
-0.273241	transpose(matrix); } } } }
-0.210602	swap elements } } }
-0.365193	// ... } } }
-0.043863	{ swapd(a[r2][c2],a[c2][r2]); } } }
-0.210602	= b[r][c]; } } }
-0.210602	matrix[SIZE][SIZE]; transpose(matrix); } } }
-0.210602	StoreNTD(&a[c][r], b[r][c]); } } }
-0.210602	for-loop: i++; } } }
-0.181536	// swap elements } }
-0.400193	largest_index = i; } }
-0.153708	WhateverFunction(i); // ... } }
-0.153708	(...) { ... } }
-0.153708	= FactorialTable[b]; ... } }
-0.048412	a - 1; } }
-0.759983	a + 1; } }
-0.386879	*p + 2; } }
-0.271388	bb[i] + 2; } }
-0.181536	out of range } }
-0.181536	count to 5 } }
-0.123240	+ i, a); } }
-0.081351	else { F2(b); } }
-0.081351	float b[1000]; F2(b); } }
-0.081351	* c); a.store(aa+i); } }
-0.081351	in aa: a.store(aa+i); } }
-0.181536	= Induction; Induction++; } }
-0.018954	c2++) { swapd(a[r2][c2],a[c2][r2]); } }
-0.181536	r + i/2; } }
-0.181536	(b[i] * c[i]); } }
-0.181536	a[c][r] = b[r][c]; } }
-0.181536	{ goto CFALSE; } }
-0.181536	{ goto DTRUE; } }
-0.181536	double matrix[SIZE][SIZE]; transpose(matrix); } }
-0.181536	{ StoreNTD(&a[c][r], b[r][c]); } }
-0.181536	the for-loop: i++; } }
-0.381094	a[c][r]); // swap elements }
-0.343479	CFALSE: c = 0; }
-0.343479	{ d = 0; }
-0.035612	c); ... return 0; }
-0.333609	// Output array element }
-0.077965	absvalue; largest_index = i; }
-0.077965	order(i); matrix[j][0] = i; }
-0.312720	f; unsigned int i; }
-0.003734	float f; int i; }
-0.391848	{ a = b; }
-0.226835	? a : b; }
-0.235007	= WhateverFunction(i); // ... }
-0.274194	catch (...) { ... }
-0.185482	{ C1 x; ... }
-0.185482	a = Func1(2); ... }
-0.185482	log2 = log(2.0); ... }
-0.185482	a = FactorialTable[b]; ... }
-0.234608	[] array index operator }
-0.191479	{ c = 1; }
-0.191479	DTRUE: d = 1; }
-0.012676	return a - 1; }
-0.050250	return a + 1; }
-0.237555	* b + 1; }
-0.081609	return x*x + 1; }
-0.012676	{ cout << 1; }
-0.113579	x; n >>= 1; }
-0.234538	// used for multiplication }
-0.222070	{ a = c; }
-0.222070	+ 1; return c; }
-0.310299	// f is zero }
-0.289861	FactorialTable[n]; // Table lookup }
-0.233710	x) { return x; }
-0.022996	1; list[i+2] = 2; }
-0.016041	= r + 2; }
-0.003955	= *p + 2; }
-0.016041	= b[i] + 2; }
-0.016041	= bb[i] + 2; }
-0.300224	= a * 2; }
-0.009956	{ cout << 2; }
-0.492267	if out of range }
-0.308717	seconds count to 5 }
-0.288475	v.f if both positive }
-0.231684	add n to exponent }
-0.524164	{ a[i] = temp; }
-0.230550	*= i; return f; }
-0.109176	* 9 + 3; }
-0.302263	= a * 3; }
-0.109176	+= i / 3; }
-0.109176	= i % 3; }
-0.228846	1; } return y; }
-0.228916	x^0/0! // n factorial }
-0.010464	{ _mm_storeu_si128((__m128i *)d, x); }
-0.043458	{ _mm_store_si128((__m128i *)d, x); }
-0.224352	x) { return 1.0; }
-0.103330	= y + 1.; }
-0.094292	* Func1(x) + 1.; }
-0.221923	// f is nonzero }
-0.221800	+ B*x + C; }
-0.218304	case 3: printf("Delta"); break; }
-0.218000	out of range"; 134 }
-0.147132	1.0; list[i].b = 2.0; }
-0.147132	1.0; temp->b = 2.0; }
-0.026850	{ list[i] += 1.0f; }
-0.291077	i_div_3; list[i+2] += i_div_3; }
-0.218000	* powN<true,N-N1>::p(x); #undef N1 }
-0.005748	{ return _mm_loadu_si128((__m128i const*)p); }
-0.048267	{ return _mm_load_si128((__m128i const*)p); }
-0.211739	Update induction variable Z }
-0.001695	StoreVector(aa + i, a); }
-0.199483	p = &Object2; p->Hello(); }
-0.465154	{ y = cos(x); }
-0.199483	Virtual call to C1::f }
-0.199483	factorial } return sum; }
-0.199483	* x + 2.0f; }
-0.465154	Index out of range"; }
-0.199483	= __rdtsc(); return clock; }
-0.199483	is negative or -0 }
-0.074574	} else { F2(b); }
-0.074574	{ float b[1000]; F2(b); }
-0.199483	{ FuncB(i); } FuncC(i); }
-0.250731	== 0) { FuncA(i); }
-0.199483	version return (*CriticalFunction)(parm1, parm2); }
-0.199483	// next four x^n }
-0.074574	if (y) { F1(a); }
-0.074574	{ int a[1000]; F1(a); }
-0.074574	version CriticalFunction = &CriticalFunction_386; }
-0.074574	Default version return &CriticalFunction_386; }
-0.035688	== Friday) { DoThisThreeTimesAWeek(); }
-0.035688	| Friday)) { DoThisThreeTimesAWeek(); }
-0.465154	{ y = sin(x); }
-0.074574	b * c); a.store(aa+i); }
-0.074574	elements in aa: a.store(aa+i); }
-0.074574	supported CriticalFunction = &CriticalFunction_SSE2; }
-0.074574	SSE2 supported return &CriticalFunction_SSE2; }
-0.199483	a[i+1] = Induction; Induction++; }
-0.008651	return (*SelectAddMul_pointer)(aa, bb, cc); }
-0.017478	c1+TILESIZE; c2++) { swapd(a[r2][c2],a[c2][r2]); }
-0.017478	r2; c2++) { swapd(a[r2][c2],a[c2][r2]); }
-0.199483	MOVNTQ _mm_empty(); // EMMS }
-0.330083	{ memset(a, 0, sizeof(a)); }
-0.074574	supported CriticalFunction = &CriticalFunction_AVX; }
-0.074574	AVX supported return &CriticalFunction_AVX; }
-0.199483	* cc[i]); } 109 }
-0.164663	x) { return pow(x,10); }
-0.164663	add the four sums }
-0.164663	after) - (time before) }
-0.164663	return powN<true,N/2>::p(x) * powN<true,N/2>::p(x); }
-0.164663	= list[j].b + list[j].c; }
-0.164663	: (bb[i] * cc[i]); }
-0.164663	= x8*x2; return x10; }
-0.164663	= r + i/2; }
-0.164663	// abs(u.f) > abs(v.f) }
-0.164663	log (b[i] * c[i]); }
-0.164663	} else { FuncB(i); }
-0.164663	x) { return IntegerPower<10>(x); }
-0.164663	polymorphic child function: (static_cast<MyChild*>(this))->Disp(); }
-0.164663	{ a[c][r] = b[r][c]; }
-0.164663	out of range printf(Greek[n]); }
-0.164663	else { goto CFALSE; }
-0.164663	else { goto DTRUE; }
-0.164663	{ b[i] = Func(a[i]); }
-0.164663	printf("\n%2i %10I64i", i, timediff[i]); }
-0.164663	__declspec(__align(64)) double matrix[SIZE][SIZE]; transpose(matrix); }
-0.164663	{ try { F1(); }
-0.164663	SSE2 not supported"); return; }
-0.164663	{ ab[i].b = Func(ab[i].a); }
-0.164663	a + 1; 69 }
-0.164663	4.4, 2.5}; return list[x]; }
-0.164663	Size() { return N; }
-0.164663	c++) { StoreNTD(&a[c][r], b[r][c]); }
-0.164663	_mm_hadd_ps(s, s); return _mm_cvtss_f32(s); }
-0.164663	int list[100]; Func1(list, &list[8]); }
-0.164663	double d; int i[2]; }
-0.164663	return powN<(N & N-1)==0,N>::p(x); }
-0.164663	temp * temp; 104 }
-0.164663	FuncA(i); FuncC(i); FuncB(i+1); FuncC(i+1); }
-0.164663	a[i+2]; s3 += a[i+3]; }
-0.164663	into the for-loop: i++; }
-0.164663	reporting here: return *(T*)0; }
-0.164663	i, a); } 111 }
-0.164663	temp; temp += 9; }
-0.346979	and this pointer is then
-0.293753	The result ebx is then
-0.234793	calculation of A and then
-0.290673	double precision constant and then
-0.311020	own data structure and then
-0.234793	specific CPU model and then
-0.312282	variables to zero and then
-0.234793	produces a string and then
-0.378494	appropriate error message and then
-0.234793	have been added and then
-0.234793	on a PC and then
-0.234793	when doing calculations, and then
-0.234793	here gives a+b=0, and then
-0.406116	array. The values are then
-0.323698	The intermediate files are then
-0.236542	of its members are then
-1.122913	of the code can then
-0.541116	executed. The compiler can then
-0.330539	threads. Each thread can then
-0.324885	calls another dispatched function then
-0.582425	values in the code then
-0.329727	part of a code then
-0.561154	critical piece of code then
-0.290536	of the CPU time then
-0.234674	in a short time then
-1.184549	known at compile time then
-0.348997	128 bytes or more then
-0.293421	a. This operation will then
-0.628502	scattered around in memory then
-1.412202	part of the program then
-1.060349	part of a program then
-0.335569	time in library functions then
-0.445182	one virtual member functions then
-0.323969	can avoid virtual functions then
-0.286961	calls to frame functions then
-0.349926	calculate than the other then
-0.237116	out a big loop then
-0.335179	or API function which then
-0.715073	bigger than the cache then
-0.237155	cores. Each thread should then
-0.378169	limit can be set then
-0.355443	a particular instruction set then
-0.324287	compiled with different compilers then
-0.738975	by the vector size then
-0.546726	added to a pointer then
-0.328259	each their smart pointer then
-0.348871	an Intel function library then
-0.313177	you unroll by two then
-0.352775	and deleting the object then
-0.236991	was an odd number then
-0.236771	the local object static then
-0.762862	a power of 2 then
-0.352094	factors for the performance then
-0.323679	all the array elements then
-0.501731	from the above example, then
-0.236155	are not enough registers then
-0.446749	this is the case then
-0.290527	instruction set is available then
-0.290527	or inttypes.h is available then
-0.405353	something to clean up then
-0.347327	case of an error then
-0.371110	repeats a thousand times then
-0.371110	also repeats 1000 times then
-0.311958	the object is large then
-0.291833	The calling function must then
-0.440586	a sequence of calculations then
-0.291754	grows during program execution then
-0.313109	wait for a result then
-0.322581	the first 128 bytes then
-0.235709	frequent updates are necessary then
-0.235595	hold only one element then
-0.235289	exit(), abort(), _endthread(), etc. then
-0.337388	possibly throw an exception then
-0.334134	of elements is small then
-0.379396	an assembly output option then
-0.347461	87 used cache line then
-0.235334	way a profiler works then
-0.226687	any type of parameters then
-0.323524	factors as template parameters then
-0.505904	hyperthreading is not advantageous then
-0.454272	caching is a problem then
-0.379281	string is already known then
-0.311494	compiled without AVX support then
-0.291035	or other data structure then
-0.234927	with different set values then
-0.454246	takes 10 clock cycles then
-0.234826	b[0], a[1], b[1], ... then
-0.234696	a floating point counter then
-0.349183	AVX using CPU dispatching then
-0.234333	necessary for your application then
-0.234231	(see page 73) automatically then
-0.346602	of using exception handling then
-0.417057	any of these methods then
-0.234160	in the old block then
-0.289772	of objects is high then
-0.343184	set. A CPU dispatcher then
-0.377269	of the preceding addition then
-0.233259	use the 64-bit vectors then
-0.232899	in the innermost function, then
-0.199619	address in this range then
-0.199619	within a limited range then
-0.199619	to a narrow range then
-0.319026	only one CPU core then
-0.232659	through pointers or references then
-0.400139	than the CPU supports then
-0.213204	of j as index then
-0.434418	as an array index then
-0.681962	part of the code, then
-0.336498	has only one instance then
-0.231825	where you want vectorization then
-0.231941	concentrated on CPU efficiency then
-0.207893	bits at a time, then
-0.207893	added at any time, then
-0.230702	brands or specific models then
-0.286114	function pointer has changed then
-0.239166	long time to execute then
-0.239166	take microseconds to execute then
-0.230134	to the same resource then
-0.229165	using an Intel compiler, then
-0.323424	in the same module then
-0.171282	from any other module then
-0.171282	or a separate module then
-0.206805	Intel compiler is used, then
-0.206805	the expression is used, then
-0.961431	of the critical stride then
-0.336455	a particular instruction set, then
-0.228517	the additions are independent then
-0.228943	integers are equally near then
-0.329172	or long dependency chains then
-0.413188	counter is an integer, then
-0.227348	occurs more than once then
-0.051796	takes 5 clock cycles, then
-0.227468	was not declared volatile then
-0.082392	in the shared object, then
-0.082392	in a shared object, then
-0.225987	If the version changes then
-0.310667	b are 32-bit integers, then
-0.278620	it is poorly predictable then
-0.422212	version) in the debugger then
-0.074524	advanced code version on, then
-0.074524	the advanced version on, then
-0.221848	and b are swapped then
-0.590450	in the carry flag then
-0.310461	of || is true, then
-0.378516	fed into the pipeline then
-0.217907	any compile-time constant n, then
-0.067795	non-Intel CPU. If not, then
-0.067795	unroll factor. If not, then
-0.217673	is small and changing then
-0.217673	in a non-sequential manner then
-0.211418	hold e.g. four numbers, then
-0.283299	of A is slow, then
-0.122182	First-In-First- Out (FIFO) basis then
-0.122182	First-In-Last- Out (FILO) basis then
-0.211418	the bottleneck is elsewhere then
-0.211418	many times one way, then
-0.211418	to improve cache efficiency, then
-0.211418	are no big arrays, then
-0.211725	of && is false, then
-0.122182	to the first sum, then
-0.122182	to the second sum, then
-0.122182	float instead of double, then
-0.122182	of a 64-bit double, then
-0.199170	address is not vacant then
-0.250379	threads with different priorities then
-0.250379	frequency is 2 GHz then
-0.199170	the template parameters differ then
-0.074456	a+b is calculated first, then
-0.074456	the R values first, then
-0.250379	a register (see below) then
-0.199170	the microprocessor has hyperthreading, then
-0.199170	and compile-time while loops, then
-0.199170	stored in the container, then
-0.199170	has no AVX support, then
-0.199170	data for one segment then
-0.199170	exceeds an acceptable limit, then
-0.164374	takes 10 μs today, then
-0.164374	alternately FuncA and FuncB, then
-0.164374	has a particular meaning, then
-0.164374	spot has been identified, then
-0.164374	code with CPU dispatching, then
-0.164374	program have been found, then
-0.164374	explained in chapter 9.10, then
-0.164374	granularity is too fine then
-0.164374	has hyperthreading. If so, then
-0.164374	predictable than the other, then
-0.164374	diagonal are accessed row-wise, then
-0.164374	time T to T+5, then
-0.164374	value is poorly predictable, then
-0.164374	estimate can be made) then
-0.164374	of the same algorithm, then
-0.164374	too important to ignore, then
-0.164374	between x and y?" then
-0.164374	if this is obvious, then
-0.164374	an option for RTTI then
-0.164374	address a = 10000, then
-0.164374	that it writes only, then
-0.164374	to be too small, then
-0.164374	that u < 231 then
-0.164374	class C1 or C2, then
-0.164374	conditions is not met then
-0.164374	example i = 18, then
-0.164374	calculated first, then d+e, then
-0.293700	to Microsoft compilers. // It
-0.235768	negative or -0 } It
-0.235768	cc[i]); } 109 } It
-0.698574	12.4 Using intrinsic functions It
-0.350718	type of an object It
-0.236148	defined in two libraries It
-0.520894	part of the code. It
-0.221554	reductions on integer code. It
-0.590646	produce any extra code. It
-0.295285	in the source code. It
-0.216548	quite a long time. It
-0.611067	takes no extra time. It
-0.114340	integer takes longer time. It
-0.106135	would take longer time. It
-0.106135	Divisions take longer time. It
-0.313377	functions and function pointers It
-0.717993	what the compiler does It
-0.235682	Copying or clearing arrays It
-0.235228	trees, hash maps etc. It
-0.291199	had used intrinsic functions. It
-0.291199	contains optimized mathematical functions. It
-0.218103	to remove unreferenced functions. It
-0.374622	around in the memory. It
-0.411929	objects stored in memory. It
-0.351151	align dynamically allocated memory. It
-0.259395	instruction set is used. It
-0.259395	register stack is used. It
-0.588051	which they are used. It
-0.323518	are no longer used. It
-0.194614	feature is seldom used. It
-0.520817	bits in 64-bit systems. It
-0.234090	something on these data. It
-0.348424	the necessary instruction set. It
-0.701426	AMD and VIA processors. It
-0.233111	needs to be called. It
-0.288520	for different Intel CPUs. It
-0.232726	of the same compiler. It
-0.232419	the vector registers are: It
-0.338509	change during the loop. It
-0.099426	converted to a pointer. It
-0.099426	restriction from a pointer. It
-0.099426	accessed through a pointer. It
-0.099426	behaves like a pointer. It
-0.211695	need a 'this' pointer. It
-0.232270	in the best cases. It
-0.231683	this with induction variables. It
-0.231684	pointer to another class. It
-0.373757	finish. 3.8 System database It
-0.339637	the library function calls. It
-0.309291	into the vector registers. It
-0.316987	making a shared object. It
-0.231043	the Gnu C library. It
-0.260319	of 64-bit integer calculations. It
-0.290794	can do mathematical calculations. It
-0.441721	and 20 clock cycles. It
-0.231043	many file input/output operations. It
-0.230963	result of full optimization. It
-0.231043	order to improve performance. It
-0.457794	separate for each thread. It
-0.610569	in large data structures It
-0.229983	terms in one vector. It
-0.229088	by pointers or references. It
-0.408289	units in the CPU. It
-0.229295	system database in Windows. It
-0.402885	be applied to integers. It
-0.164605	efficient than signed integers. It
-0.164605	bit vector containing integers. It
-0.195412	you want it to. It
-0.195412	them to apply to. It
-0.228456	that are not critical. It
-0.228571	genuine compiler became available. It
-0.283211	time it was executed. It
-0.228571	make their software faster. It
-0.195305	are no cache problems. It
-0.195305	because of alignment problems. It
-0.688049	through a template parameter. It
-0.657570	on floating point expressions. It
-0.368248	using the previous value. It
-0.478626	by the operating system. It
-0.517618	overflow of the arrays. It
-0.227151	rid of the branch. It
-0.190491	bytes of storage space. It
-0.190491	RAM and disk space. It
-0.227151	this extra element zero. It
-0.227151	with some legacy software. It
-0.225921	be a better solution. It
-0.280429	when compiling for Linux. It
-0.316861	C++ or assembly language. It
-0.225921	message if it is. It
-0.226068	vectorize the code automatically. It
-0.183978	alone in the core. It
-0.183978	in the same core. It
-0.591084	OpenMP and automatic vectorization. It
-0.225773	to be true anyway. It
-0.300640	want it to do. It
-0.223943	a double precision constant. It
-0.223943	able to see this. It
-0.175903	at compile time here. It
-0.175903	simply not appropriate here. It
-0.314605	cannot modify data members. It
-0.224114	and irregular response times. It
-0.224114	on the previous one. It
-0.224114	the desired program structure. It
-0.275465	is not a profiler. It
-0.305254	is no exception handling. It
-0.275465	use standardized installation tools. It
-0.359950	and floating point numbers. It
-0.221394	i++) b[i] = a[i]; It
-0.305254	which prevents out-of-order execution. It
-0.221394	array, or approximately so. It
-0.217597	keyboard or mouse input. It
-0.217597	have its own IDE. It
-0.217597	static and dynamic versions. It
-0.217597	the line number information. It
-0.390743	than the user interface. It
-0.499644	The pitfalls of unit-testing It
-0.889405	matter of programming style. It
-0.217597	value from the counts. It
-0.217597	or make files smaller. It
-0.217597	have to do manually. It
-0.217597	compatible with 16-bit programs. It
-0.211675	on that particular part. It
-0.346181	condition i < 100. It
-0.346181	important. 9.2 Cache organization It
-0.122141	designed for this purpose. It
-0.122141	for a specific purpose. It
-0.346181	data are accessed sequentially. It
-0.211343	in a non-sequential manner. It
-0.211343	const restriction on x. It
-0.199098	to the new context. It
-0.199098	the objects are aligned. It
-0.199098	try, catch, and throw. It
-0.250297	not safe, of course. It
-0.329562	precision (see page 73). It
-0.329562	stack also has disadvantages: It
-0.199098	data from a buffer. It
-0.250297	inlined, or optimized away. It
-0.329562	or a make utility. It
-0.199098	little more syntax check. It
-0.250297	This is data decomposition. It
-0.199098	user settings are lost. It
-0.329562	and difficult to read. It
-0.199098	and static data. 148 It
-0.199098	the program is started. It
-0.199098	highly compatible with Gnu. It
-0.199098	a CPU dispatcher updated. It
-0.250297	branches are poorly predictable. It
-0.250297	example 7.15b below shows. It
-0.199098	free and open source. It
-0.199098	is used more efficiently. It
-0.199098	are not doing divisions. It
-0.464444	explained on page 72. It
-0.199098	casting, but also safer. It
-0.164307	issuing an error message. It
-0.164307	of the final product. It
-0.164307	turn on and off. It
-0.164307	very difficult to diagnose. It
-0.164307	included in the profile. It
-0.164307	the user. Feature bloat. It
-0.164307	Pointers to contained objects? It
-0.164307	or with compile-time polymorphism. It
-0.164307	press or mouse move. It
-0.164307	statement several iterations ahead. It
-0.164307	13.1 CPU dispatch strategies It
-0.164307	as the C-style type-casting. It
-0.164307	is waiting for response. It
-0.164307	reader what is happening. It
-0.164307	format is not standardized. It
-0.164307	is actually quite convenient. It
-0.164307	solution is too high. It
-0.164307	where they are unavoidable. It
-0.164307	computer game or animation. It
-0.164307	(SSE): #include <xmmintrin.h> _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON); It
-0.164307	work on non-Intel processors). It
-0.164307	but the programmer can. It
-0.164307	is quite tedious indeed. It
-0.164307	of attack for hackers. It
-0.164307	though less user friendly. It
-0.164307	(RTTI). See page 54. It
-0.164307	chosen for the label. It
-0.164307	v.f are both positive. It
-0.164307	balance between these considerations. It
-0.164307	accessing list[i].a and list[i].b. It
-0.164307	volatile doesn't mean atomic. It
-0.164307	code version performs poorly. It
-0.164307	in a first-in-last-out fashion. It
-0.164307	compilers to choose between. It
-0.164307	explained on page 130. It
-0.164307	used for pointer conversions. It
-0.164307	templates (see p. 57). It
-0.164307	all branches for correctness. It
-0.164307	exceptions. See page 61. It
-0.164307	mixed types or sizes? It
-0.164307	reduce (a*b*c)+(c*b*a) to a*b*c*2. It
-0.164307	two additions with double's. It
-0.164307	remedy is memory pooling. It
-0.164307	it with many decimals. It
-0.164307	and simple to develop. It
-0.164307	known as memory leaks. It
-0.164307	my comments, in green. It
-0.164307	unsigned integers is costless. It
-0.164307	then use a queue. It
-0.000801	replace this by // Example
-0.020480	be replaced by // Example
-0.140483	improving the code. // Example
-0.140483	at compile time. // Example
-0.000313	the code. Example: // Example
-0.000626	of code. Example: // Example
-0.000626	extra code. Example: // Example
-0.002508	same time. Example: // Example
-0.001252	called function. Example: // Example
-0.001252	pure function. Example: // Example
-0.001252	in memory. Example: // Example
-0.001252	static memory. Example: // Example
-0.002508	are used. Example: // Example
-0.002508	is called. Example: // Example
-0.002508	a loop. Example: // Example
-0.000626	of 2. Example: // Example
-0.002508	consecutive variables. Example: // Example
-0.002508	function calls. Example: // Example
-0.002508	XMM registers. Example: // Example
-0.002508	float variable. Example: // Example
-0.002508	is needed. Example: // Example
-0.002508	detailed instructions. Example: // Example
-0.002508	non-sequential order. Example: // Example
-0.002508	jumps to. Example: // Example
-0.001252	for overflow. Example: // Example
-0.001252	cause overflow. Example: // Example
-0.002508	previous value. Example: // Example
-0.002508	previous branch. Example: // Example
-0.002508	same constant. Example: // Example
-0.002508	branch prediction. Example: // Example
-0.002508	calculated result. Example: // Example
-0.001252	loop counter. Example: // Example
-0.001252	integer counter. Example: // Example
-0.002508	single operation. Example: // Example
-0.002508	is finished. Example: // Example
-0.002508	different ways. Example: // Example
-0.002508	parallel execution. Example: // Example
-0.002508	array elements. Example: // Example
-0.002508	only once. Example: // Example
-0.002508	is limited. Example: // Example
-0.002508	lookup-table static. Example: // Example
-0.002508	is known. Example: // Example
-0.002508	same thing. Example: // Example
-0.002508	independent divisions. Example: // Example
-0.002508	or later. Example: // Example
-0.002508	all zeroes. Example: // Example
-0.002508	be undesired. Example: // Example
-0.002508	bit offsets). Example: // Example
-0.002508	loop overhead. Example: // Example
-0.002508	members individually. Example: // Example
-0.114142	Integer size conversion // Example
-0.114142	/ unsigned conversion // Example
-0.184876	and Gnu compilers. // Example
-0.008076	Take the example: // Example
-0.000667	time. For example: // Example
-0.000667	etc. For example: // Example
-0.000667	cases. For example: // Example
-0.000334	to. For example: // Example
-0.000667	structure. For example: // Example
-0.000667	sizes. For example: // Example
-0.000667	lookup. For example: // Example
-0.000667	valid. For example: // Example
-0.000667	predictable. For example: // Example
-0.000667	combined. For example: // Example
-0.000667	completely. For example: // Example
-0.008076	the following example: // Example
-0.008076	ARRAYSIZE. Another example: // Example
-0.012172	the code to: // Example
-0.001202	reduce this to: // Example
-0.002407	change this to: // Example
-0.002407	changing this to: // Example
-0.002407	Change this to: // Example
-0.012172	be optimized to: // Example
-0.012172	be reduced to: // Example
-0.003011	be changed to: // Example
-0.140483	to align arrays. // Example
-0.140483	a different array. // Example
-0.001113	look like this: // Example
-0.001113	looks like this: // Example
-0.140483	organized as follows: // Example
-0.140483	powers of 2: // Example
-0.140483	template metaprogramming is. // Example
-0.140483	manual for details. // Example
-0.140483	function from www.agner.org/optimize/asmlib.zip. // Example
-0.140483	by writing: 103 // Example
-0.140483	it with 1: // Example
-0.140483	converted to unsigned. // Example
-0.041973	list of numbers: // Example
-0.041973	floating point numbers: // Example
-0.041973	of 100 numbers: // Example
-0.064586	series of calculations: // Example
-0.064586	to modulo calculations: // Example
-0.064586	joining the operations: // Example
-0.064586	avoid modulo operations: // Example
-0.140483	risk of overflow: // Example
-0.140483	the following way: // Example
-0.010119	be replaced with: // Example
-0.041973	}; Replace with: // Example
-0.140483	this example: 38 // Example
-0.015269	the sign bit: // Example
-0.064586	such a case: // Example
-0.064586	to lower case: // Example
-0.140483	outside the loop: // Example
-0.015269	a table lookup: // Example
-0.140483	a template parameter: // Example
-0.015269	a lookup table: // Example
-0.140483	the derived class: // Example
-0.031093	overflow is needed: // Example
-0.031093	bookkeeping is needed: // Example
-0.031093	set is available: // Example
-0.031093	SSE2 is available: // Example
-0.140483	the lrint function: // Example
-0.140483	to use SafeArray: // Example
-0.140483	in a union: // Example
-0.140483	the fraction bits: // Example
-0.140483	bit to zero: // Example
-0.140483	to floating point: // Example
-0.140483	set is enabled: // Example
-0.140483	b double precision: // Example
-0.140483	a positive integer: // Example
-0.140483	smallest members last: // Example
-0.140483	loop control condition: // Example
-0.140483	calculation more efficient: // Example
-0.140483	in the arrays: // Example
-0.140483	case of underflow: // Example
-0.140483	the runtime polymorphism: // Example
-0.140483	is unsigned Examples: // Example
-0.140483	floating point variable: // Example
-0.140483	of a double: // Example
-0.140483	loop and reorganize: // Example
-0.140483	two suggested improvements). // Example
-0.140483	Intel vector classes): // Example
-0.140483	2'nd order polynomial: // Example
-0.140483	method using InstructionSet(): // Example
-0.140483	array of structures: // Example
-0.140483	compare absolute values: // Example
-0.140483	the static keyword: // Example
-0.140483	a single comparison: // Example
-0.140483	SSE2 instruction set: // Example
-0.140483	with the reciprocal: // Example
-0.140483	Template Library (WTL): // Example
-0.140483	a loop counter: // Example
-0.140483	can be used: // Example
-0.140483	memset and memcpy: // Example
-0.140483	a common denominator: // Example
-0.140483	chain in two: // Example
-0.140483	4, we have: // Example
-0.140483	by using memset: // Example
-0.140483	type-casting its address: // Example
-0.140483	index changes fastest: // Example
-0.140483	array with alloca: // Example
-0.140483	to the exponent: // Example
-0.140483	constant reference instead: // Example
-0.140483	doing type conversions: // Example
-0.140483	with element matrix[c][r]. // Example
-0.140483	Intel vector classes: // Example
-0.140483	for matrix a: // Example
-0.140483	them as integers: // Example
-0.140483	through pointers, e.g.: // Example
-0.140483	in example 7.22. // Example
-0.140483	a pivot search: // Example
-0.140483	in example 9.5b. // Example
-0.140483	or always false: // Example
-0.140483	code inside square: // Example
-0.140483	a certain interval: // Example
-0.140483	two induction variables: // Example
-0.140483	the return statement: // Example
-0.140483	it a template: // Example
-0.140483	of this capability: // Example
-0.140483	by two gives: // Example
-0.140483	b * 1.2f; // Example
-0.353623	= i; } } Example
-0.234533	if both positive } Example
-0.234533	n to exponent } Example
-0.229417	8.26a (32-bit mode): ; Example
-0.229417	from example 8.26b: ; Example
-0.163778	size Time per element Example
-0.163778	9.6a Time per element Example
-0.350630	out of the loop. Example
-0.382176	branch inside the loop. Example
-0.608779	systems with big-endian storage. Example
-0.284331	rather than at runtime. Example
-0.212292	gives the chosen expression. Example
-0.236815	library asmlib.. // or from
-0.236815	the data optimally, or from
-0.312889	compiler and call it from
-0.236362	CPU and prevent it from
-0.236362	compiling. This prevents it from
-0.433432	as a library function from
-0.442736	just a single function from
-0.460722	This is the code from
-0.053108	the following assembly code from
-0.324472	the previous value than from
-0.574924	point is faster than from
-0.131835	can prevent the compiler from
-0.131835	will prevent the compiler from
-0.048044	it prevents the compiler from
-0.023358	This prevents the compiler from
-0.048044	also prevents the compiler from
-0.048044	division prevents the compiler from
-0.322057	by invoking the compiler from
-0.289709	memory. Copying constant data from
-0.233946	Memory access Accessing data from
-0.233946	buffer or send data from
-0.275960	load aligned integer vector from
-0.063279	load unaligned integer vector from
-0.120942	it is called only from
-0.056274	member function called only from
-0.056274	Assume function called only from
-0.333052	code prevent the CPU from
-0.062424	This prevents the CPU from
-0.336809	from Intel and one from
-0.234971	are currently available, one from
-0.534673	prevents the level-2 cache from
-0.488836	reload the level-1 cache from
-0.237135	in the latest compilers from
-0.237070	necessary to convert b from
-0.337140	with reading the value from
-0.337140	to reload the value from
-0.311907	tables Reading a value from
-0.277546	can subtract this value from
-0.320955	to calculate each value from
-0.348126	to fetch the variable from
-0.347365	or writing a variable from
-0.167540	exceptions is to return from
-0.167540	is recommended to return from
-0.284093	unused label ; return from
-0.349553	to copy the table from
-0.321386	We take the elements from
-0.090681	Load eight consecutive elements from
-0.276416	big and is called from
-0.361105	function that is called from
-0.508070	if it is called from
-0.179440	library can be called from
-0.245808	function may be called from
-0.179440	function cannot be called from
-0.282765	derived class are called from
-0.055737	alone compiler when called from
-0.055737	not only when called from
-0.055737	but also when called from
-0.372316	function is also called from
-0.292673	different address. A call from
-0.323149	requesting a map file from
-0.273746	and it is available from
-0.357862	compiler, which is available from
-0.280079	multi-threaded software are available from
-0.280079	standard tasks are available from
-0.322776	option is also available from
-0.194062	manuals are always available from
-0.194062	should be easily available from
-0.194062	Math Kernel Library, available from
-0.410124	or variable is accessed from
-0.262001	DLL can be accessed from
-0.262001	They can be accessed from
-0.216322	used, even when accessed from
-0.235814	64 or 0x40 bytes from
-0.334059	doesn't prevent two threads from
-0.235662	inputs are the integers from
-0.092050	the value is calculated from
-0.092050	each value is calculated from
-0.209008	value xn is calculated from
-0.209008	of n! is calculated from
-0.347779	the condition is known from
-0.235370	library. It requires support from
-0.235293	copying the entire list from
-0.235045	-128, and subtracting 1 from
-0.298913	compilers. Mixing object files from
-0.224613	modules or resource files from
-0.487240	processors is not optimal from
-0.634376	for different instruction sets from
-0.234356	often used functions separate from
-0.634085	of the memory block from
-0.291457	as well. The conversion from
-0.209987	64-bit mode. A conversion from
-0.209987	and truncation. Efficient conversion from
-0.234340	low-priority thread steals resources from
-0.234043	2n by subtracting n from
-0.310158	is transferred at runtime from
-0.289978	functions that are needed from
-0.303729	and ownership is transferred from
-0.220136	deleted, copied or transferred from
-0.202520	we want to read from
-0.202520	clock cycles to read from
-0.180393	evicted when we read from
-0.180393	the program had read from
-0.180393	expect to 99 read from
-0.233854	of library functions linked from
-0.289604	x86 family of microprocessors from
-0.233616	is useful for calling from
-0.233388	in example 9.5a goes from
-0.199770	8-bit integers which range from
-0.041999	containing the address range from
-0.041999	covered the address range from
-0.435042	has to be loaded from
-0.195111	C++ language, all conversions from
-0.086690	order to avoid conversions from
-0.086690	you cannot avoid conversions from
-0.232429	is waiting for response from
-0.694882	the four cache lines from
-0.211932	is initialized or comes from
-0.211932	the main feedback comes from
-0.231384	final array size right from
-0.373960	are reading and writing from
-0.231482	count has been reduced from
-0.231482	It should be clear from
-0.316998	be calculated more efficiently from
-0.231052	names and variable names from
-0.230312	label ; restore ebx from
-0.318523	example 14.1c is copied from
-0.171620	it possible to come from
-0.171620	are uninitialized or come from
-0.171620	or if they come from
-0.190866	numbers and integers Conversion from
-0.190866	set is enabled. Conversion from
-0.227630	code has a jump from
-0.146092	the user is far from
-0.146092	that have values far from
-0.146092	is of course far from
-0.300978	bit must be saved from
-0.226352	in the programming manuals from
-0.226262	from 0x4700. Reading again from
-0.176140	that a program reads from
-0.176140	0x2710 and later reads from
-0.013925	code that can benefit from
-0.013925	XMM registers can benefit from
-0.028311	of memory will benefit from
-0.013925	variable that could benefit from
-0.013925	the code could benefit from
-0.039553	it takes to recover from
-0.009556	be able to recover from
-0.039553	is made to recover from
-0.222164	the factors are generated from
-0.221788	registers has been increased from
-0.221913	the beginning. ret returns from
-0.222038	the pointer it gets from
-0.221913	an addition to sum1 from
-0.218143	performance penalty when going from
-0.217988	another addition to sum2 from
-0.067879	the compiler is prevented from
-0.067879	returning. F1 is prevented from
-0.271785	own graphical user interfaces from
-0.412730	be in the interval from
-0.889986	bb into vector b: from
-0.211728	word static is removed from
-0.211932	code is not separated from
-0.211728	88 for details. Inheritance from
-0.122350	for demonstration purposes. Available from
-0.122350	library Intel Agner Available from
-0.211728	time to answer questions from
-0.211932	automatically deallocated when returning from
-0.211728	... // Use ReadTSC() from
-0.264534	table to be evicted from
-0.074570	switch statements often suffer from
-0.074570	code can therefore suffer from
-0.199472	will get no warning from
-0.199472	variable can be fetched from
-0.074570	These instructions are accessible from
-0.074570	that are not accessible from
-0.199472	seldom occur and recovering from
-0.330068	relieving the const restriction from
-0.164653	the header file timingtest.h from
-0.164653	function is not referenced from
-0.164653	_mm256_zeroupper() before any transition from
-0.164653	is used and popped from
-0.164653	-fno-strict-overflow. You may deviate from
-0.164653	lesson we can learn from
-0.164653	make profiling feasible. Interference from
-0.164653	+ 2.0f; } 115 from
-0.164653	by r is re-loaded from
-0.875431	this part of the memory
-0.089008	transfer ownership of the memory
-0.089008	transfers ownership of the memory
-0.089008	looses ownership of the memory
-0.352549	use segmentation of the memory
-0.356093	deleted properly and the memory
-0.356644	efficient solution for the memory
-1.123538	disadvantage is that the memory
-0.569208	unrolling so that the memory
-0.835586	addresses divisible by the memory
-0.652281	execution time because the memory
-0.334585	This can cause the memory
-0.292687	and free) causes the memory
-0.102370	do not free the memory
-0.102370	it could free the memory
-0.292687	low priority. Especially the memory
-0.237818	most important remedy is memory
-0.898879	the size of a memory
-0.349522	entire file in a memory
-0.140509	all strings in a memory
-0.140509	store strings in a memory
-0.354623	in edx as a memory
-0.345028	re- allocating when a memory
-0.078253	is stored at a memory
-0.037367	i.e. stored at a memory
-0.292072	integer that holds a memory
-0.236023	overhead of managing a memory
-0.445405	is a part of memory
-0.344538	time. A part of memory
-0.350602	includes optimized versions of memory
-0.101575	object. The allocation of memory
-0.101575	it involves allocation of memory
-0.310553	one big block of memory
-0.348973	a small piece of memory
-0.327527	the same range of memory
-0.234402	with an index of memory
-0.202589	when the amount of memory
-0.202589	increases the amount of memory
-0.202589	reserve the amount of memory
-0.440471	the required amount of memory
-0.234402	this wasteful copying of memory
-0.672907	involves the risk of memory
-0.234402	force the swapping of memory
-0.531884	allocation and deallocation of memory
-0.234402	allocation and de-allocation of memory
-0.234402	use large amounts of memory
-0.331600	large overhead cost to memory
-0.314527	instructions write directly to memory
-0.325105	when CPU access and memory
-0.235686	to store x in memory
-0.337455	order of functions in memory
-0.486099	to a variable in memory
-0.344907	not on variables in memory
-0.790911	variable is stored in memory
-0.099174	are scattered around in memory
-0.291688	be accessed sequentially in memory
-0.235686	Variables whose distance in memory
-0.313634	very large libraries. The memory
-0.236986	for the stack. The memory
-0.236986	linkage table (PLT). The memory
-0.575256	it is likely that memory
-0.237567	use a container or memory
-0.237706	avoid this method if memory
-0.341464	the latency or by memory
-0.237603	mix mathematical calculations with memory
-0.048195	of error known as memory
-0.048195	programming error known as memory
-0.293890	all these purposes. This memory
-0.237521	be a bottleneck than memory
-0.314231	all modern computers have memory
-0.348043	memory block, but this memory
-0.331014	runtime DLL takes more memory
-0.312540	to even allocate more memory
-0.424578	reload the value from memory
-0.421260	want to read from memory
-0.232593	to be loaded from memory
-0.232593	r is re-loaded from memory
-0.982650	of code and data memory
-0.170251	compilers will use different memory
-0.170251	the threads use different memory
-0.566468	scattered around at different memory
-0.417117	write to the same memory
-0.417117	reads to the same memory
-0.491370	objects in the same memory
-0.491370	lengths in the same memory
-0.448964	to use the same memory
-0.448964	not use the same memory
-0.412512	objects share the same memory
-0.412512	members share the same memory
-0.344431	all strings in one memory
-0.235076	ended queue) allocates one memory
-0.234427	of it) load into memory
-0.321114	libraries are loaded into memory
-0.334422	A method with multiple memory
-0.233935	is to keep multiple memory
-0.153216	the table in static memory
-0.438523	be stored in static memory
-0.153216	Storing something in static memory
-0.309964	the memory. The static memory
-0.088724	constant data from static memory
-0.088724	the table from static memory
-0.088724	entire list from static memory
-0.088724	is copied from static memory
-1.061913	is the most efficient memory
-0.236959	of the maximum possible memory
-0.233431	point constant always takes memory
-0.233431	#define directive never takes memory
-0.236822	destructor that destroys any memory
-0.292798	typically have much less memory
-0.236622	in some cases take memory
-0.337766	to allocate a new memory
-0.233256	classes allocate a new memory
-0.075626	typical uses of dynamic memory
-0.075626	The cost of dynamic memory
-0.206606	The advantages of dynamic memory
-0.075626	the costs of dynamic memory
-0.075626	The disadvantages of dynamic memory
-0.142159	errors associated with dynamic memory
-0.010222	whether to use dynamic memory
-0.010222	reason to use dynamic memory
-0.020691	class libraries use dynamic memory
-0.010222	container classes use dynamic memory
-0.010222	string classes use dynamic memory
-0.020691	as Java, use dynamic memory
-0.142159	drawbacks of using dynamic memory
-0.142159	or container without dynamic memory
-0.020691	size to avoid dynamic memory
-0.020691	how to avoid dynamic memory
-0.042418	possible, and avoid dynamic memory
-0.142159	multiple purposes. All dynamic memory
-0.142159	Container classes Whenever dynamic memory
-0.350356	the object in case memory
-0.331124	static memory to stack memory
-0.164134	the table to stack memory
-0.223134	is stored in stack memory
-0.236032	and page 87 about memory
-0.324071	complicated in a large memory
-0.276666	many allocations of large memory
-0.222454	2 Gbytes. This large memory
-0.228766	and deallocation of big memory
-0.303845	together in one big memory
-0.324176	can calculate how much memory
-0.742100	of the most common memory
-0.507604	to wrap the allocated memory
-0.090647	programming error. The allocated memory
-0.090647	garbage collection. The allocated memory
-0.205345	in its own allocated memory
-0.577534	12.8 Aligning dynamically allocated memory
-0.235470	to wait for another memory
-0.477700	use for a particular memory
-0.465639	means that a particular memory
-0.478393	list has its own memory
-0.057852	of the new bigger memory
-0.027963	that a new bigger memory
-0.027963	allocate a new bigger memory
-0.337634	contents of the old memory
-0.288960	but has a smaller memory
-0.263559	proxy for the main memory
-0.263559	CPU than the main memory
-0.130769	in system code. Dynamic memory
-0.091348	Dynamic memory allocation Dynamic memory
-0.091348	dynamic memory allocation. Dynamic memory
-0.091348	data caching inefficient. Dynamic memory
-0.091348	64-bit systems). 28 Dynamic memory
-0.091348	to optimization are. Dynamic memory
-0.091348	allocations is limited. Dynamic memory
-0.043277	allocated memory. 9.6 Dynamic memory
-0.043277	...................................................................................................... 90 9.6 Dynamic memory
-0.287835	at compile time. No memory
-0.428890	best way to prevent memory
-0.090857	pop ebx. 9 Optimizing memory
-0.090857	............................................................................. 84 9 Optimizing memory
-0.251238	the amount of RAM memory
-0.072457	Accessing data from RAM memory
-0.072457	the variable from RAM memory
-0.159472	around 1980 where RAM memory
-0.308388	operating system to swap memory
-0.222078	up to cause seven memory
-0.222162	to see the excessive memory
-0.221994	may have a larger memory
-0.222078	stored in one contiguous memory
-0.271841	are loaded at round memory
-0.291305	be used for saving memory
-0.218192	are writing to uncached memory
-0.218192	often have execution units, memory
-0.283903	loaded at an arbitrary memory
-0.211929	caching less efficient. Extra memory
-0.211929	a function that allocates memory
-0.035720	used where execution speed, memory
-0.035720	flexibility, while execution speed, memory
-0.164833	speed and for minimizing memory
-0.164833	with fixed strides. Uncached memory
-0.164833	a feature for reserving memory
-0.237734	preferably be responded to at
-0.461403	vector method may be at
-0.237421	library is loaded or at
-0.293190	function and calculate it at
-0.237006	same as reflecting it at
-0.237600	the virtual 53 function at
-0.473095	should be aligned by at
-0.236939	generates no extra code at
-0.236939	compiler inserts extra code at
-0.293626	reduced speed or not at
-0.064266	compile time rather than at
-0.357592	known to the compiler at
-0.237421	likely to consume time at
-0.407202	table to stack memory at
-0.293548	If the class has at
-0.350953	runtime frameworks are used at
-0.047603	that are never used at
-0.047603	they are never used at
-0.433766	compatible with other compilers at
-0.237050	below. 126 Make pointer at
-0.562916	in the function library at
-0.307744	time. The function library at
-0.228131	function call. Load library at
-0.306965	in the asmlib library at
-0.324090	as much as possible at
-0.293067	replaced by its value at
-0.351495	to write the variable at
-0.924308	to calculate the table at
-0.224017	the diagonal. The elements at
-0.202424	can handle eight elements at
-0.202424	it handles eight elements at
-0.224017	even add dummy elements at
-0.313355	make applications run faster at
-0.677322	if it is stored at
-0.502835	bytes should be stored at
-0.312578	therefore preferably be stored at
-0.215328	ebx is then stored at
-0.021758	by 16, i.e. stored at
-0.340407	libraries. The memory address at
-0.236484	writing a small bit at
-0.343845	64-bit double 32 bits at
-0.235998	giving specific optimization instructions at
-0.347125	the calculations are available at
-0.826760	transferred on the stack at
-0.427004	a function that calls at
-0.235765	for doing some calculations at
-0.334530	to test 16 bytes at
-0.235710	compilers that are best at
-0.235517	(2n / b) etc. at
-0.322621	are not very good at
-0.280643	function inlining is done at
-0.280643	or C2::Disp() is done at
-0.424313	have to be done at
-0.437057	certain calculations are done at
-0.099004	a code one line at
-0.099004	more than one line at
-0.011397	appendix to this manual at
-0.352763	for metaprogramming, as explained at
-0.336962	of coefficients is calculated at
-0.342224	or cannot be calculated at
-0.216434	If it is known at
-0.135176	of objects is known at
-0.030057	of elements is known at
-0.135176	that n is known at
-0.135176	the divisor is known at
-0.111071	it cannot be known at
-0.003279	array is not known at
-0.003279	objects is not known at
-0.003279	length is not known at
-0.003279	required is not known at
-0.003279	divisor is not known at
-0.016649	that are not known at
-0.111071	is an integer known at
-0.314014	Is the size known at
-0.111071	is a constant known at
-0.214476	CPU-type is already known at
-0.291423	processors are not supported at
-0.300289	each thread may run at
-0.319839	each thread will run at
-0.022527	for checking multiple values at
-0.351701	is counting clock cycles at
-0.235050	for calculating row addresses at
-0.234549	appropriate. 8. Avoid branches at
-0.378083	a floating point multiplication at
-0.234303	other than its name at
-0.341605	set seconds to zero at
-0.274500	becoming better and better at
-0.220542	The compilers are better at
-0.341925	9. Avoid table lookup at
-0.012100	2 bytes. first byte at
-0.006008	4 bytes. first byte at
-0.012100	8 bytes. first byte at
-0.012100	400 bytes. first byte at
-0.060118	4 unused bytes byte at
-0.167847	byte at 1 byte at
-0.000780	at 0, last byte at
-0.001758	at 8, last byte at
-0.007075	at 16, last byte at
-0.007075	at 12, last byte at
-0.007075	at 400, last byte at
-0.060118	byte at 15 byte at
-0.415679	function, m is transferred at
-0.415257	that data are aligned at
-0.116786	dispatcher should not look at
-0.215787	108 You may look at
-0.035550	and if you look at
-0.035550	efficient. If you look at
-0.035550	are: When you look at
-0.116786	that you should look at
-0.116786	you may also look at
-0.054479	the code. Let's look at
-0.054479	4 rows. Let's look at
-0.116786	For example, let's look at
-0.289307	times with four numbers at
-0.377249	than a small piece at
-0.233327	select all installation options at
-0.309147	b has to start at
-0.216147	garbage collection may start at
-0.110293	to be scattered around at
-0.110293	may be scattered around at
-0.232983	to do multiple things at
-0.232823	at doing equivalent reductions at
-0.298666	sure to be loaded at
-0.298666	section can be loaded at
-0.431439	dynamic libraries are loaded at
-0.180871	which is typically loaded at
-0.232743	that model N+1 supports at
-0.232276	Linux and BSD comes at
-0.231457	number, or no offset at
-0.117389	model that was unknown at
-0.001374	processors that were unknown at
-0.002062	models that were unknown at
-0.230918	and handle one square at
-0.231031	more than one thing at
-0.337818	same cache, at least at
-0.305794	garbage collection can occur at
-0.229520	the CPU, which counts at
-0.371543	objects can be added at
-0.228591	the entire library (or at
-0.227515	use the same DLL at
-0.095438	while if is resolved at
-0.095438	because #if is resolved at
-0.051982	parameter is always resolved at
-0.051982	parameters are always resolved at
-0.390826	be stored in memory, at
-0.224415	then it will break at
-0.224304	languages where everything happens at
-0.224415	operand is not evaluated at
-0.221753	brand was less popular at
-0.221885	as sqrt and pow at
-0.275871	suited for the project at
-0.148486	page 128 below. Dispatch at
-0.148486	with different compilers. Dispatch at
-0.218281	examples in the appendix at
-0.217953	the array must begin at
-0.217953	share the same cache, at
-0.264495	updates through the Internet at
-0.346953	in the program flow at
-0.264495	register size is handled at
-0.199439	to memset and memcpy, at
-0.199439	than a few kilobytes at
-0.199439	may not be visible at
-0.199439	function uses by looking at
-0.199439	swapped with element matrix[c][r] at
-0.199439	CPU to generate interrupts at
-0.199439	are able to do, at
-0.199439	The loop body begins at
-0.250680	very time-consuming garbage collector at
-0.164622	diagonal have been lost at
-0.164622	This chapter is aiming at
-0.164622	10; Templates are instantiated at
-0.164622	compiler will calculate (1./1.2345) at
-0.164622	making the dispatch decision at
-0.164622	E-book Usability for Nerds at
-0.164622	Sutter: A Pragmatic Look at
-0.164622	do not need relocation at
-0.164622	incredibly stupid things. Looking at
-0.164622	updates may come unpredictably at
-0.164622	inserts temporary debug breakpoints at
-0.492932	better use of the data
-1.407381	the address of the data
-0.765947	new instance of the data
-0.452636	the efficiency of the data
-0.350260	different sizes of the data
-0.350260	thorough analysis of the data
-0.140242	code cache and the data
-0.140242	level-2 cache and the data
-0.202766	cause contentions in the data
-0.445521	form than if the data
-0.485131	and efficient if the data
-0.523903	goes faster if the data
-0.344629	106 CPUs if the data
-0.355233	most efficiently when the data
-0.577497	value to make the data
-0.349793	as possible into the data
-0.552166	special cases where the data
-0.348389	non-sequential which makes the data
-0.321819	finally (4) access the data
-0.343043	to split up the data
-0.516759	mispredictions by making the data
-0.253367	writeable data. Therefore, the data
-0.253367	time consuming. Therefore, the data
-0.403845	advantageous the smaller the data
-0.235004	time than processing the data
-0.336839	library, you divide the data
-0.235004	instructions for converting the data
-0.235004	smaller by reordering the data
-0.235004	compiler from aligning the data
-0.378786	you to manipulate the data
-0.290912	performance by organizing the data
-0.101796	last vector. Organize the data
-0.101796	a bottleneck. Organize the data
-0.461688	of data. This is data
-0.357259	the offset of a data
-0.237243	In simple cases, a data
-0.324555	code for accessing a data
-0.313941	or structures. Accessing a data
-0.632750	a realistic set of data
-0.582988	as the size of data
-0.343343	size and type of data
-0.983970	In the case of data
-0.496437	if a lot of data
-0.496437	require a lot of data
-0.311870	cycles per byte of data
-0.329509	efficiently if pieces of data
-0.048004	by 16. Alignment of data
-0.023339	variables. 9.5 Alignment of data
-0.023339	88 9.5 Alignment of data
-0.048004	Table 7.2. Alignment of data
-0.379488	because the contents of data
-0.299310	data or pointers to data
-0.299310	block. Any pointers to data
-0.293634	calls. Internal references to data
-0.048825	size of code and data
-0.048825	amount of code and data
-0.023727	Caching of code and data
-0.103843	time when code and data
-0.103843	CriticalFunction when code and data
-0.290639	code cache use and data
-0.443169	of public functions and data
-0.469139	that code cache and data
-0.234764	makes code caching and data
-0.234764	literature on algorithms and data
-0.234764	here: functional decomposition and data
-0.351651	= Func(a[i]); } The data
-0.339696	odd-sized vector data. The data
-0.292309	an array index. The data
-0.236232	data members (properties) The data
-0.292309	between multiple processes. The data
-0.336208	Some applications require that data
-0.313523	size of program or data
-0.236893	the code size or data
-0.236976	prefetch data explicitly if data
-0.236976	caching is poor if data
-0.237599	in the vectors. This data
-0.237430	profiler may sample more data
-0.077380	much less efficiently when data
-0.077380	somewhat less efficiently when data
-0.357527	for accessing the same data
-0.349718	Linked lists and other data
-0.333478	list, database, or other data
-0.550968	The order in which data
-0.418742	combined size of all data
-0.438916	Similar operations on all data
-0.230961	simply by copying all data
-0.230961	possible to contain all data
-0.777710	the most often used data
-0.537321	value of a class data
-0.322454	of the parent class data
-0.309950	then merge the multiple data
-0.233896	is performed on multiple data
-0.236890	space by allowing two data
-0.316714	The advantage of static data
-0.324852	function tables. The static data
-0.230819	caching problems because static data
-0.236928	is a structure where data
-0.204144	the memory. This makes data
-0.204144	the program. This makes data
-0.204144	or class. This makes data
-0.204144	than needed. This makes data
-0.204144	random order. This makes data
-0.204144	32 bits. This makes data
-0.204144	become fragmented. This makes data
-0.288386	the stack, which makes data
-0.521330	table before the first data
-0.239334	a set of test data
-0.239334	suitable set of test data
-0.300503	test data. The test data
-0.229813	perhaps }; // constant data
-0.229813	stack memory. Copying constant data
-0.611433	be useful for making data
-0.235967	simplest cases, but its data
-0.235978	See page 26 about data
-0.038979	Cache contentions in large data
-0.247483	highly optimized for large data
-0.196596	mathematical applications with large data
-0.196596	doing calculations on large data
-0.375062	dramatically for very large data
-0.196596	test server. Use large data
-0.235902	between multiple threads, while data
-0.200781	programs that have big data
-0.386080	if you have big data
-0.305647	apply to very big data
-0.283416	to get as much data
-0.303419	program has too much data
-0.347926	memory space to store data
-0.378953	possible to store intermediate data
-0.299779	to access a public data
-0.356879	public functions and public data
-0.339653	each thread its own data
-0.233425	A disadvantage of binary data
-0.233425	by a plain old data
-0.233117	and for more advanced data
-0.375474	mirrored in the level-1 data
-0.219456	bigger than the level-1 data
-0.219456	computer where the level-1 data
-0.169889	there is a level-1 data
-0.231956	static data, including local data
-0.339707	for putting the right data
-0.374073	on reading and writing data
-0.231002	sampling generates too little data
-0.372417	Most compilers will align data
-0.229722	member function cannot modify data
-0.229671	buffer overflow on input data
-0.032553	cannot access any non-static data
-0.228814	have finished the time-consuming data
-0.226446	stored in a far data
-0.366718	the method of storing data
-0.857400	to use the smallest data
-0.224553	updates, remote help files, data
-0.132659	don't have to prefetch data
-0.132659	that modern processors prefetch data
-0.132659	able to automatically prefetch data
-0.224553	video processing, signal processing, data
-0.224637	3.13 Memory access Accessing data
-0.221917	to mirror the remote data
-0.222116	supported instruction set. Aligning data
-0.067957	...................................................................................................................... 96 9.9 Access data
-0.067957	at www.agner.org/optimize/cppexamples.zip. 9.9 Access data
-0.218116	3-dimensional vectors RGB image data
-0.067957	on performance. 7.18 Class data
-0.067957	classes............................................................................................ 51 7.18 Class data
-0.218116	make vectorization favorable: Small data
-0.211855	double. Misaligned data. Extra data
-0.211855	is used for prefetching data
-0.211855	Define macro for aligning data
-0.212016	performance penalty for organizing data
-0.264677	code section and read-only data
-0.346879	The code that accesses data
-0.199595	static buffer or send data
-0.250857	efficient way of keeping data
-0.074616	way to make thread-specific data
-0.074616	class for containing thread-specific data
-0.199595	one unit of received data
-0.250857	no need to organize data
-0.164766	relevant to optimization. Prefetching data
-0.164766	more common to exchange data
-0.164766	Mathematical functions Encryption, decryption, data
-0.164766	is concentrated on arranging data
-0.164766	be shared. Any writable data
-0.164766	vectorization less favorable: Larger data
-0.164766	big file containing numerical data
-0.164766	set. Aligning data Loading data
-0.164766	of the data structure, data
-1.172799	the size of the program
-1.168960	appropriate version of the program
-0.262627	same part of the program
-0.352285	which part of the program
-0.495747	each part of the program
-0.352285	any part of the program
-0.647312	critical part of the program
-0.352285	another part of the program
-0.352285	time-consuming part of the program
-0.376879	make parts of the program
-0.530361	other parts of the program
-0.818369	critical parts of the program
-0.376879	certain parts of the program
-0.376879	time-consuming parts of the program
-0.604232	during installation of the program
-0.330216	The logic of the program
-0.330216	no modification of the program
-0.330216	and clarity of the program
-0.344858	data elements and the program
-0.344858	doesn't support, and the program
-0.441453	the table in the program
-0.341402	use later in the program
-0.341402	extra overhead in the program
-0.441453	time spent in the program
-0.625498	hot spots in the program
-0.341402	exception handler in the program
-0.341402	errors elsewhere in the program
-0.341402	cause delays in the program
-0.471222	and only if the program
-0.334520	64-bit systems if the program
-0.513368	up, even if the program
-0.471222	miss. But if the program
-0.334520	dispatching. Test if the program
-0.334520	typically happens if the program
-0.350914	ruled out by the program
-0.806274	more resources than the program
-0.495642	updates each time the program
-0.506812	loaded every time the program
-0.297043	into memory when the program
-0.297043	big program when the program
-0.297043	put there when the program
-0.297043	you want when the program
-0.297043	several files when the program
-0.094949	is initialized when the program
-0.297043	mouse inputs when the program
-0.386305	not resolved when the program
-0.315304	does not make the program
-0.445107	this will make the program
-0.315304	of course make the program
-0.323671	too long. If the program
-0.323671	set 0x1C. If the program
-0.323671	deleting containers. If the program
-0.323671	for analysis. If the program
-0.344046	compile time, but the program
-0.345741	measurement instruments into the program
-0.345223	functions only makes the program
-0.439478	usually called before the program
-0.311122	desired values before the program
-0.311122	are resolved before the program
-0.323609	you just want the program
-0.288152	program, and while the program
-0.288152	press break while the program
-0.207563	framework and compile the program
-0.278755	First you compile the program
-0.411224	is to run the program
-0.333368	remain locked after the program
-0.323609	dispatcher function. When the program
-0.336144	be postponed until the program
-0.306784	necessary to modify the program
-0.231237	need to update the program
-0.231237	In other words, the program
-0.231237	profiler which determines the program
-0.286630	the logic behind the program
-0.373550	message and stop the program
-0.231237	user interface. Otherwise the program
-0.231237	error condition terminates the program
-0.098318	innermost loop of a program
-0.298913	If part of a program
-0.331911	critical part of a program
-0.442515	two versions of a program
-0.313380	the speed of a program
-0.313380	logic structure of a program
-0.313380	complete redesign of a program
-0.453318	all data in a program
-0.350798	a loop in a program
-0.491140	access. Assume that a program
-0.681295	For example, if a program
-0.301053	quite inefficient if a program
-0.301053	can occur if a program
-0.349491	a tag on a program
-0.347815	and verify than a program
-0.341017	be advantageous when a program
-0.287057	smart pointer. If a program
-0.287057	are inefficient. If a program
-0.287057	of ways). If a program
-0.270065	A situation where a program
-0.270065	any situation where a program
-0.232900	counters before running a program
-0.375859	time to load a program
-0.232900	may slow down a program
-0.564753	takes to install a program
-0.232900	spend on redesigning a program
-0.866740	if the size of program
-0.324649	all possible cases of program
-0.563636	how a piece of program
-0.293298	the low priority of program
-0.538031	costless in terms of program
-0.496907	functions scattered around in program
-0.237329	two modules contiguous in program
-0.237329	Poor reproducibility. Delays in program
-0.338669	at load time. The program
-0.326369	as described below. The program
-0.422609	objects are called. The program
-0.414603	in the calculations. The program
-0.289164	these instruction sets. The program
-0.289164	pointer at initialization. The program
-0.289164	not always true. The program
-0.565971	(*.dll or *.so). The program
-0.289164	use just-in-time compilation. The program
-0.233466	has been deallocated. The program
-0.233466	implemented with interpretation. The program
-0.233466	in Linux, sched_setaffinity). The program
-0.233466	complicated and error-prone. The program
-0.331594	optimizing execution speed or program
-0.237685	and negative impacts on program
-0.237465	for this reason. A program
-0.306160	operators. Make a C++ program
-0.230713	that produces another C++ program
-0.230713	in a well-structured C++ program
-0.619093	is that it makes program
-0.200271	test in the test program
-0.200271	used by the test program
-0.395828	to make a test program
-0.221004	make a small test program
-0.292127	once made a Windows program
-0.435971	modules of a big program
-0.292077	Visual Basic, etc. But program
-0.323407	But a highly optimized program
-0.170643	for a console mode program
-0.053343	file. A console mode program
-0.053343	interface. A console mode program
-0.093256	PC processors. The application program
-0.093256	the library. The application program
-0.265035	scheduling in an application program
-0.289634	interface to the calling program
-0.239180	a switch in your program
-0.275573	you must make your program
-0.189203	performance counters inside your program
-0.189203	stack frame unless your program
-0.308771	or *.so). The installation program
-0.441820	appropriate for the desired program
-0.165134	handling for the whole program
-0.165134	occupied throughout the whole program
-0.050078	an option for whole program
-0.050078	have support for whole program
-0.106693	that can do whole program
-0.106693	a feature called whole program
-0.106693	allows "__attribute__((visibility("hidden")))". Use whole program
-0.106693	way of doing whole program
-0.232490	is actually used. No program
-0.500890	times in the final program
-0.421885	Gives a more clear program
-0.206030	which is used during program
-0.206030	an array grows during program
-0.258674	constants in the entire program
-0.258674	of making the entire program
-0.230598	program. Application programmers rarely program
-0.224776	or if the 7 program
-0.212114	to run a speed-critical program
-0.212114	to a more well-structured program
-0.048353	x-xxxx--x Profile-guided optimization Whole program
-0.048353	up a program. Whole program
-0.048353	Interprocedural optimization /Og Whole program
-0.264969	in a computationally intensive program
-0.199847	is useful for preventing program
-0.199847	between development time, usability, program
-0.164998	by testing and analyzing program
-0.164998	The installation of downloaded program
-0.164998	optimization /GL --combine -fwhole- program
-0.164998	etc. Use an antivirus program
-1.031809	is a function that has
-0.463841	up-to-date function library that has
-0.235686	access a file that has
-0.379737	a memory block that has
-0.235686	an extra iteration that has
-0.312084	sure that everything that has
-0.561749	changed so that it has
-0.341157	later discovers that it has
-0.307961	possible branch if it has
-0.307961	member pointers if it has
-0.307961	must check if it has
-0.307961	only safe if it has
-0.479196	space, even when it has
-0.332493	Intel compiler because it has
-0.332493	cleaning up because it has
-0.281481	the array, which it has
-0.447072	the function, but it has
-0.280607	calculate (c+d) before it has
-0.280607	second sub-vector before it has
-0.226701	and the work it has
-0.339896	is declared. Therefore, it has
-0.098723	an object after it has
-0.098723	is accessed after it has
-0.226701	reference to anything it has
-0.525814	lookup if the function has
-0.350949	But each member function has
-0.350747	are sure the code has
-0.350747	problem. Whenever the code has
-0.350185	vectorize automatically. The code has
-0.414342	1; } This code has
-0.321156	vectorization Not all code has
-0.288968	everything else. System code has
-0.339963	FuncB(i+1); FuncC(i+1); } This has
-0.097736	of the function. This has
-0.321039	a C++ program. This has
-0.099636	level- 1 cache. This has
-0.099636	the level-2 cache. This has
-0.556758	function is called. This has
-0.229158	Func is executed. This has
-0.229158	sum += list[i]; This has
-0.229158	or 32 bytes). This has
-0.534576	name that the compiler has
-0.497408	efficient because the compiler has
-0.476584	efficient. If the compiler has
-0.437712	c1::*MemberPointer; Here, the compiler has
-0.325165	by 2. The compiler has
-0.325165	in one. The compiler has
-0.325165	another module. The compiler has
-0.318243	avoided. 37 A compiler has
-0.534177	that the Intel compiler has
-0.517388	required. The Intel compiler has
-0.226893	way. The Codeplay compiler has
-0.538072	: public CParent<CChild1> { has
-0.293703	instructions during this time has
-0.227681	from a pointer. It has
-0.227681	like a pointer. It has
-0.234400	on non-Intel processors). It has
-1.349138	part of the program has
-0.761364	part of a program has
-0.194394	example, if a program has
-0.194394	occur if a program has
-0.257682	advantageous when a program has
-0.369083	inefficient. If a program has
-0.476603	situation where a program has
-0.295632	or *.so). The program has
-0.295632	and error-prone. The program has
-0.164092	needed because the CPU has
-0.164092	types because the CPU has
-0.164092	stall because the CPU has
-0.237252	in the unit-test but has
-0.342849	important that the integer has
-0.290722	of a 32-bit integer has
-0.341108	the 64-bit instruction set has
-0.480273	bit x86 instruction set has
-0.341108	number (the instruction set has
-0.343044	members. If the class has
-0.234675	of a polymorphic class has
-0.537519	statement in this example has
-0.335503	mechanism in Intel compilers has
-0.313754	and small code size has
-0.630111	value of the pointer has
-0.278990	registers then the pointer has
-0.489848	of the function pointer has
-0.309473	time the function pointer has
-0.281150	the previous link pointer has
-0.237095	and b because b has
-0.327042	compiler, and the library has
-0.267033	source code. The library has
-0.267033	The program or library has
-0.041846	Windows platforms. This library has
-0.041846	x86 platforms. This library has
-0.088158	v. 7.2). This library has
-0.088158	option -mveclibabi=svml. This library has
-0.336171	Intel vector class library has
-0.336171	My vector class library has
-0.327655	single register the object has
-0.327655	no destructor the object has
-0.437161	libraries. A shared object has
-0.236948	member functions, where static has
-0.236882	C++ language While C++ has
-0.230514	the latter function also has
-0.230514	The register stack also has
-0.230514	eliminated. Loop unrolling also has
-0.355535	wait until the value has
-0.451793	declaration of the table has
-0.313347	the optimization of performance has
-0.236746	in a suboptimal way has
-0.344297	read. If a template has
-0.467844	the number of registers has
-0.333235	size of vector registers has
-0.227837	problem if the user has
-0.227837	especially if the user has
-0.280449	unusual that a user has
-0.236032	that the variable always has
-0.572759	that the operating system has
-0.334440	is because the file has
-0.236159	point instructions. Each type has
-0.236013	overflow or another error has
-0.475893	determine if the processor has
-0.275870	better. Whenever a processor has
-0.221752	threads simultaneously. This processor has
-0.431718	of the array element has
-0.343917	The MASM assembly language has
-0.322686	the stack. Each thread has
-0.291705	// Floating point overflow has
-0.449411	29. Each cache line has
-0.291392	of registers. This problem has
-0.490455	in a linked list has
-0.235306	However, the pipeline structure has
-0.235169	floating point rounding mode has
-0.532864	that the repeat count has
-0.403499	when the heap space has
-0.226132	case that the microprocessor has
-0.226132	require that the microprocessor has
-0.233724	process because the microprocessor has
-0.309745	simultaneously. If the microprocessor has
-0.444666	thread if the application has
-0.322158	and each CPU model has
-0.290202	complex if the parameter has
-0.348894	only if the programmer has
-0.269922	because the static keyword has
-0.269922	class. The static keyword has
-0.289968	chain where each addition has
-0.233678	that the user actually has
-0.686289	choice of hardware platform has
-0.478338	evaluation of the operands has
-0.310778	global variable in main has
-0.337112	simultaneously. If the computer has
-0.232805	after the pointer p has
-0.288250	conversions The C++ syntax has
-0.333694	In fact, the STL has
-0.318918	non-inlined copy Function inlining has
-0.232538	time. A template instance has
-0.232576	is compiled as position-independent has
-0.333703	more then the offset has
-0.426102	places when the heap has
-0.230474	integer overflow doesn't occur has
-0.611931	in the main executable has
-0.229617	will wait until seconds has
-0.250972	only possible if F1 has
-0.199698	an exception then F1 has
-0.229730	when not selected. Compiler has
-0.555781	object oriented programming style has
-0.329687	even though the latter has
-0.326481	Z. Each dependency chain has
-0.227763	the end user who has
-0.226323	the D language. D has
-0.426211	heap. The heap manager has
-0.391430	When a hot spot has
-0.218058	auto_ptr and shared_ptr. auto_ptr has
-0.211797	S1 ArrayOfStructures[100]; This reordering has
-0.211797	all major platforms. Pascal has
-0.250794	example 7.32b. A for-loop has
-0.199539	the three functions. Sum1 has
-0.250794	assumed that the reader has
-0.199539	structures (without member functions) has
-0.164714	called "Gnu indirect function" has
-0.164714	disadvantage is that CParent::Hello() has
-0.164714	of the stack. Deallocation has
-0.164714	loop in example 8.23b has
-0.164714	compiler option -fno-pic apparently has
-0.164714	The loop initialisation i=0; has
-0.653508	instruction set is the vector
-1.347127	the size of the vector
-1.328990	a multiple of the vector
-0.356190	function library and the vector
-0.655470	instruction set for the vector
-0.340972	be aligned by the vector
-0.634987	not divisible by the vector
-0.447640	address divisible by the vector
-0.354833	units smaller than the vector
-0.340992	Avoid branches at the vector
-0.340992	table lookup at the vector
-0.355430	the CPU. If the vector
-0.748479	advantages of using the vector
-0.351574	fit nicely into the vector
-0.330326	chapter 11. Using the vector
-0.459269	all elements of a vector
-0.928163	of elements in a vector
-0.646740	the result in a vector
-0.343705	in STL as a vector
-0.444355	be organized as a vector
-0.290533	Loading data into a vector
-0.290533	structure y into a vector
-0.290533	not fit into a vector
-0.290533	time packed into a vector
-0.443473	are situations where a vector
-0.234971	processors can calculate a vector
-0.004654	{ // Make a vector
-0.006999	_mm_set1_epi16(0); // Make a vector
-0.014114	zero(0,0,0,0,0,0,0,0); // Make a vector
-0.516572	vector. The use of vector
-0.551952	units. The size of vector
-1.220317	can take advantage of vector
-0.522670	two different kinds of vector
-0.237053	A further extension of vector
-0.335692	between simple processors and vector
-0.237448	execution (chapter 11) and vector
-0.552781	Number of elements in vector
-0.069709	to each element in vector
-0.069709	AND each element in vector
-0.406794	bigger vector registers. The vector
-0.313652	uses 64 bits. The vector
-0.237001	vector operations. 105 The vector
-0.330731	Very poor performance for vector
-0.293166	are not suited for vector
-0.236985	and structures. Useful for vector
-0.382531	dispatching #include "vectorclass.h" // vector
-0.093046	use intrinsic functions or vector
-0.093046	using intrinsic functions or vector
-0.351943	Array size divisible by vector
-0.289392	mix simple integer with vector
-0.233668	are only available with vector
-0.309678	make aligned arrays with vector
-0.326618	is a problem with vector
-0.233668	Example 12.6. Function with vector
-0.233668	c: CPU dispatching with vector
-0.292843	units same size as vector
-0.528051	easily be implemented as vector
-0.293753	operations Today's microprocessors have vector
-0.683991	it possible to use vector
-0.413991	less advantageous to use vector
-0.413991	always advantageous to use vector
-0.330890	often easier to use vector
-0.346801	Intel compilers can use vector
-0.230781	compiler can also use vector
-0.237475	instructions SSE4.1 some more vector
-0.300154	have functions for integer vector
-0.290112	a few more integer vector
-0.095155	to store aligned integer vector
-0.095155	to load aligned integer vector
-0.003176	to store unaligned integer vector
-0.003176	to load unaligned integer vector
-0.237161	// Example 7.41a class vector
-0.322865	maximum size of each vector
-0.322865	total size of each vector
-0.326280	of elements in each vector
-0.285139	is available then each vector
-0.790568	when you are using vector
-0.352601	this loop by using vector
-0.308638	the use of Intel vector
-0.321644	both AMD and Intel vector
-0.339931	vectorclass www.agner.org/optimize/#vectorclass. The Intel vector
-0.224181	Same example, using Intel vector
-0.511839	vector math libraries: Intel vector
-0.224181	as follows (using Intel vector
-0.003590	elements from cc into vector
-0.010860	b: from cc into vector
-0.003590	elements from bb into vector
-0.010860	115 from bb into vector
-0.338320	_mm_empty() after the 64-bit vector
-0.449118	} The most efficient vector
-0.236973	// Define biggest possible vector
-0.298962	of using a long vector
-0.298962	numbers. With a long vector
-0.224990	list of some long vector
-0.224990	vector math libraries: long vector
-0.014588	defines a 128 bit vector
-0.223106	or two 128- bit vector
-0.352357	that support a new vector
-0.212769	function. However, the short vector
-0.265709	step. With a short vector
-0.212769	a list of short vector
-0.212769	vector libraries and short vector
-0.212769	math libraries: Intel short vector
-0.323857	table lists the available vector
-0.339593	register, add the constant vector
-0.012860	// Store the result vector
-0.322420	an addition with another vector
-0.087973	memory access. 12 Using vector
-0.087973	................................................................................................. 103 12 Using vector
-0.087973	........................................................................................ 109 12.5 Using vector
-0.087973	next section. 12.5 Using vector
-0.222199	are useful for Boolean vector
-0.222199	many branch mispredictions. Boolean vector
-0.234678	two double. The intrinsic vector
-0.233857	64-bit Windows). The XMM vector
-0.233857	the advantage of bigger vector
-0.319366	The Intel compiler supports vector
-0.325812	be found in my vector
-0.232783	and automatic parallelization. Supports vector
-0.266639	one, into an STL vector
-0.213592	every four objects. STL vector
-0.043263	to use the #pragma vector
-0.043263	code when the #pragma vector
-0.091317	used, then use #pragma vector
-0.091317	#pragma vector always #pragma vector
-0.091317	__fastcall Noncached write #pragma vector
-0.043263	pointer is aligned #pragma vector
-0.043263	#pragma vector aligned #pragma vector
-0.091317	#pragma vector nontemporal #pragma vector
-0.091317	((visibility ("internal"))) Vectorize #pragma vector
-0.231990	a set of special vector
-0.439448	data into the right vector
-0.253515	cc[]) { // Define vector
-0.039582	#include <dvec.h> // Define vector
-0.186047	#include "vectorclass.h" // Define vector
-0.151107	b into a 128-bit vector
-0.151107	combined into a 128-bit vector
-0.182109	processors that supported 128-bit vector
-0.230478	supercomputers with massively parallel vector
-0.230513	compiler does not allow vector
-0.229852	one from me. My vector
-0.195770	can emulate a 256-bit vector
-0.195770	will use one 256-bit vector
-0.226694	e.g. AVX, AVX2 Mathematical vector
-0.299030	know whether the largest vector
-0.224647	Same example, using Agner vector
-0.028341	same example using Agner's vector
-0.028341	with vector classes Agner's vector
-0.028341	(see page 107). Agner's vector
-0.028341	the option -mveclibabi=acml. Agner's vector
-0.028341	Library amd_vrs4_expf amd_vrd2_exp Agner's vector
-0.276185	in using the larger vector
-0.218323	involves eight or sixteen vector
-0.218323	have got RISC cores, vector
-0.212089	a; y = b;} vector
-0.199701	PC's, workstations and scientific vector
-0.074656	Two libraries of predefined vector
-0.074656	intrinsic functions Use predefined vector
-0.164864	on most processors (when vector
-0.164864	geometry and other odd-sized vector
-0.164864	a XOR b Bit vector
-0.164864	a.x, y + a.y);} vector
-0.164864	vector { // 2-dimensional vector
-0.351799	command line or a make
-0.460211	applications such as a make
-0.357634	test this is to make
-0.389712	compatible way is to make
-0.273558	smart pointers is to make
-0.433872	the problem is to make
-0.313773	the solution is to make
-0.313773	alternative solution is to make
-0.273558	cleanup jobs is to make
-0.273558	realistic goal is to make
-0.050702	terminating zero and to make
-0.273549	designed so as to make
-0.219702	and maintenance - to make
-0.947115	for the compiler to make
-0.810744	enable the compiler to make
-0.520061	functions You have to make
-0.472495	The compiler has to make
-0.357623	you can do to make
-0.911919	is more efficient to make
-0.429731	slightly more efficient to make
-0.293091	of the array to make
-0.779972	it is possible to make
-0.505269	It is possible to make
-0.537787	is often possible to make
-0.219702	unused fourth value to make
-1.009074	time it takes to make
-0.311503	it in order to make
-0.311503	2 in order to make
-0.311503	order in order to make
-0.311503	unsigned in order to make
-0.439990	times in order to make
-0.311503	bool in order to make
-0.311503	conditions in order to make
-0.311503	input in order to make
-0.311503	purpose in order to make
-0.311503	influences in order to make
-0.311503	ment in order to make
-0.122914	the only way to make
-0.122914	The only way to make
-0.540216	The easiest way to make
-0.325643	be even faster to make
-0.478904	example of how to make
-0.478904	examples of how to make
-0.580627	130 for how to make
-0.381399	example shows how to make
-0.342359	manual discusses how to make
-0.361383	may be useful to make
-0.466212	processors are sure to make
-0.510329	you may want to make
-0.698150	if you want to make
-0.560071	when you want to make
-0.397530	Whether you want to make
-0.362715	developers who want to make
-1.200956	it is important to make
-0.035217	It is common to make
-0.536884	therefore be advantageous to make
-0.308660	a better solution to make
-0.219702	in the structure to make
-1.487898	It is recommended to make
-0.733393	is not recommended to make
-0.273549	supports CPU dispatching to make
-0.219702	is more complicated to make
-0.611244	the compiler needs to make
-0.012747	of the programmer to make
-0.305534	for the programmer to make
-0.219702	Metaprogramming Metaprogramming means to make
-0.381457	developer may choose to make
-0.598843	are three ways to make
-0.273549	quite ingenious things to make
-0.273549	Example 8.23a. Loop to make
-0.273549	have a destructor to make
-0.293091	not be safe to make
-0.219702	such a subexpression to make
-0.586325	It is easy to make
-0.303203	it is convenient to make
-0.022107	worth the effort to make
-0.293091	is often preferable to make
-0.662427	a good idea to make
-0.219702	a systematic manner to make
-0.503193	it is sufficient to make
-0.219702	the code carefully to make
-0.357623	If you forget to make
-0.219702	size. I tried to make
-0.219702	it is advisable to make
-0.219702	allocation also tends to make
-0.321747	containing thread-specific data and make
-0.234945	worst possible case and make
-0.336240	at random times and make
-0.378704	fix the problem and make
-0.234945	element to list and make
-0.234945	avoid the conversions and make
-0.453817	implementation if possible, and make
-0.290845	instead of truncation and make
-0.234945	a software package and make
-0.234945	the hot spot and make
-0.234945	arrays are aligned, and make
-0.234945	of an error; and make
-0.335748	have vector instructions that make
-0.290892	all .cpp modules that make
-0.234986	satisfied. The conditions that make
-0.234986	also other details that make
-0.234986	have some disadvantages that make
-0.101790	vector register. Factors that make
-0.101790	vectorization is. Factors that make
-0.234986	and other complications that make
-0.643396	several factors that can make
-0.554489	but the compiler can make
-0.503105	think that you can make
-0.423839	shows how you can make
-0.327354	NOT. Instead, you can make
-0.479769	storage Most compilers can make
-0.231801	These new instructions can make
-0.231801	if the application can make
-0.308918	64, ...). We can make
-0.308918	(e.g. PowerPC). We can make
-0.307456	Studio. This tool can make
-0.231801	The fastcall modifier can make
-0.314658	INSTRSET == 2 // make
-0.237731	save cache space or make
-0.430200	why compilers do not make
-0.332440	have studied do not make
-0.350089	is long does not make
-0.302067	member function. Do not make
-0.302067	less efficient. Do not make
-0.544354	are then you may make
-0.333449	Even better, you may make
-0.335295	'this' pointer. You may make
-0.335295	mutually incompatible. You may make
-0.235344	runtime, if only you make
-0.351842	exception. 64 If you make
-0.235344	profiling support. Then you make
-0.206512	dispatcher function. This will make
-0.206512	time slices. This will make
-0.206512	or -axAVX. This will make
-0.206512	size (4096). This will make
-0.606844	the Gnu compiler will make
-0.381742	used and this will make
-0.281639	while other compilers will make
-0.281639	that some compilers will make
-0.428488	cache. Most compilers will make
-0.293324	model. Instead, I will make
-0.219898	An integer overflow will make
-0.219898	a & b; will make
-0.219898	a = b++; will make
-0.232970	Intel function library then make
-0.288600	type of parameters then make
-0.232970	floating point counter then make
-0.232970	an Intel compiler, then make
-0.350055	programming style. Some compilers make
-0.321538	The CodeGear compiler cannot make
-0.428579	sequential, and you cannot make
-0.412334	reason that they cannot make
-0.224040	induction variables Compilers cannot make
-0.231087	of problems you must make
-0.231087	other words, you must make
-0.537960	why the compiler doesn't make
-0.226059	the two loops would make
-0.226059	carried dependency chain would make
-0.235014	transferring additional parameters. Therefore, make
-0.333043	you may of course make
-0.732683	32-bit Mac OS X make
-0.226740	to OMF format. Alternatively, make
-0.222295	at compile time. Templates make
-0.199925	of range"); or better, make
-0.165071	recoverable and non-recoverable errors; make
-0.165071	option -Wstrict-overflow=2, or (5) make
-0.358022	for support of the different
-0.461461	of pointers to the different
-0.357843	branch predictions in the different
-0.355697	lower priority. If the different
-0.307457	and synchronization between the different
-0.307457	work evenly between the different
-0.457119	necessary to test the different
-0.346313	table shows whether the different
-0.340921	code and put the different
-0.236936	generations classes contain the different
-0.381480	us to manipulate the different
-0.236936	following table summarizes the different
-0.346379	a specific size is different
-0.355236	a pointer of a different
-0.338180	an application to a different
-0.338180	microprocessor jump to a different
-0.338180	the loader to a different
-0.341500	the integer in a different
-0.480812	same bits in a different
-0.625687	the result in a different
-0.341500	overwritten, possibly in a different
-0.625687	is defined in a different
-0.493978	giving the function a different
-0.628699	function library with a different
-0.443545	critical modules with a different
-1.018213	recommended to use a different
-0.334461	point variables use a different
-0.325041	where static has a different
-0.325041	static keyword has a different
-0.542597	simply by using a different
-0.335913	the compiler uses a different
-0.234643	if it had a different
-1.038776	and the number of different
-0.034626	fragmented when objects of different
-0.158976	to store objects of different
-0.338541	2.1. Comparing performance of different
-0.101588	specifying that pointers of different
-0.101588	that two pointers of different
-0.310597	can make arrays of different
-0.090978	7 The efficiency of different
-0.206903	the relative efficiency of different
-0.234438	types or strings of different
-0.343207	prone. A discussion of different
-0.378000	The time consumption of different
-0.092366	optimization. 8.2 Comparison of different
-0.092366	66 8.2 Comparison of different
-0.234438	Time for transposition of different
-0.234438	There are hundreds of different
-0.234438	Table 7.1. Sizes of different
-0.346023	programmer that pointers to different
-0.237454	assigning different priorities to different
-0.237454	easy to port to different
-0.237491	system color settings and different
-0.237491	microprocessors, different alignments and different
-0.343488	the same data in different
-0.543755	objects are stored in different
-0.338222	are typically stored in different
-0.485151	which is available in different
-1.018935	can be implemented in different
-0.312430	Comparison of optimizations in different
-0.292018	should be tested in different
-0.292018	functions are kept in different
-0.313263	7.26 Overloaded functions The different
-0.444240	the level-1 cache. The different
-0.323861	on that variable. The different
-0.292813	different instructions sets. The different
-0.469818	of a function for different
-0.308775	size is different for different
-0.230397	many different cases for different
-0.030871	multiple code versions for different
-0.030871	in different versions for different
-0.003324	in multiple versions for different
-0.030871	have several versions for different
-0.230397	possible to compile for different
-0.002399	5: "Calling conventions for different
-0.064350	5. Calling conventions for different
-0.230397	of different algorithms for different
-0.523976	same memory area for different
-0.230397	of cache organization for different
-0.230397	different memory spaces for different
-0.230397	per matrix cell for different
-0.349044	decomposition here means that different
-0.358063	sizes Integers can be different
-0.355048	code section will be different
-0.339649	differently because there are different
-0.339649	however, where there are different
-0.236827	for branch prediction are different
-0.351175	very large or if different
-0.232132	a Pentium 4 with different
-0.264766	to be compiled with different
-0.264766	code are compiled with different
-0.330249	divided into threads with different
-0.232132	from different addresses with different
-0.341464	not even compatible with different
-0.232132	hard disk. Test with different
-0.232132	containing multiple streams with different
-0.535027	should be tested on different
-0.235983	method is fastest on different
-0.292026	may behave differently on different
-0.314384	are simply treated as different
-0.425687	Some compilers will use different
-0.290711	the integer operations use different
-0.234827	if the threads use different
-0.291858	Mixing object files from different
-0.329305	program had read from different
-0.023237	be scattered around at different
-0.234172	the dispatch decision at different
-0.802257	of how to make different
-0.237394	executable file stub. If different
-0.352629	template function for each different
-0.553752	were able to do different
-0.354383	the conversions by using different
-0.990643	of a and b different
-0.302322	the performance of two different
-0.312653	integer representations in two different
-0.481810	code. There are two different
-0.203280	that you have two different
-0.203280	x86 family have two different
-0.272726	tool can make two different
-0.218975	registers and correspondingly two different
-0.311122	many users in many different
-0.272814	very useful for many different
-0.356731	are available for many different
-0.322124	is called with many different
-0.289893	necessary to have many different
-0.216999	is called from many different
-0.270491	There are so many different
-0.313423	because computers have very different
-0.213456	multithreaded program, or between different
-0.213456	is a switch between different
-0.310760	double precision. Conversions between different
-0.295195	needed for communication between different
-0.213456	a thread jumps between different
-0.213456	single function. Switch between different
-0.213456	a multitasking environment, between different
-0.235954	the symbolic link. Use different
-0.235865	C++ casting operator These different
-0.190681	to optimize for several different
-0.190681	and templates for several different
-0.273676	compiler There are several different
-0.182101	CodeAnalyst. There are several different
-0.182101	copying. There are several different
-0.179620	is accessed by several different
-0.179620	to test on several different
-0.255079	C++ syntax has several different
-0.179620	have to test several different
-0.235551	the code to support different
-0.235526	will go into eight different
-0.322274	different threads are doing different
-0.234263	moderately well. Supports three different
-0.234133	// It will look different
-0.231664	for transposing and copying different
-0.330905	do is to mix different
-0.227994	is recommended to try different
-0.228032	different brands of CPUs, different
-0.074706	algebraic expressions on seven different
-0.074706	of experiments on seven different
-0.074706	tested on different platforms, different
-0.074706	different browsers, different platforms, different
-0.355814	no penalty for mixing different
-0.199780	p1 and p2 having different
-0.465701	platforms, different screen resolutions, different
-0.199780	dispatch mechanism that treats different
-0.199780	are useful for assigning different
-0.164936	tested in different browsers, different
-0.164936	two threads with widely different
-0.164936	cases for different microprocessors, different
-0.465851	first time. This is because
-0.316303	both cases. This is because
-0.316303	cache size. This is because
-0.316303	function returns. This is because
-0.316303	hundred times. This is because
-0.316303	reproducible results. This is because
-0.316303	is deprecated. This is because
-0.316303	time measure. This is because
-0.524661	reduced. This may be because
-0.314098	behind the program or because
-0.295349	pointer or member function because
-0.057176	a non-static member function because
-0.290661	than the simple function because
-0.328000	than a frame function because
-0.314545	more efficient than if because
-0.329256	the fully optimized code because
-0.291813	produce less optimal code because
-0.235796	code to non-AVX code because
-0.567009	use the Intel compiler because
-0.527386	piece at a time because
-0.495705	for the first time because
-0.279483	on the execution time because
-0.208182	the total execution time because
-1.171478	known at compile time because
-0.237231	space to store data because
-0.350515	for class member functions because
-0.443195	in the while loop because
-0.324276	not evaluated at all because
-0.538339	from the level-2 cache because
-0.455146	b is an integer because
-0.351432	the preprocessor can do because
-0.773236	a float or double because
-0.461252	for a and b because
-0.461252	between a and b because
-0.231565	to a < b because
-0.324435	a static link library because
-0.236868	to a contained object because
-0.224073	57 Templates are efficient because
-0.706732	is slightly more efficient because
-0.298272	will be very efficient because
-0.438251	will be less efficient because
-0.363640	references are equally efficient because
-0.236831	various other optimizations possible because
-0.381374	not the optimized version because
-0.448893	division by a variable because
-0.438426	by an induction variable because
-0.335276	to make register variables because
-0.323851	can reduce the performance because
-0.323851	without reducing the performance because
-0.284010	terms of program performance because
-0.292595	performance for 32-bit software because
-0.292837	the loop is long because
-0.454186	This method is faster because
-0.228275	makes the code faster because
-0.393956	code will run faster because
-0.236638	drivers are particularly critical because
-0.540413	use the same register because
-0.253994	share the same register because
-0.292313	which happens quite often because
-0.236347	than a function template because
-0.418332	type casting of pointers because
-0.300447	to use than pointers because
-0.225906	code that uses pointers because
-0.236202	in the unit- test because
-0.236185	efficient in some systems because
-0.347164	Best-case testing is useful because
-0.449946	as 0/a = 0 because
-0.236113	to recognize VIA processors because
-0.236091	Basic soon became available because
-0.312585	prevented from cleaning up because
-0.223286	be reloaded eight times because
-0.223286	than the subsequent times because
-0.223286	*p+2 a hundred times because
-0.312135	the list is large because
-0.348486	the extra function calls because
-0.235916	with the correct result because
-0.511188	this is not necessary because
-0.444435	by using assembly language because
-0.291780	less than half speed because
-0.235296	is less than 128 because
-0.311796	objects as function parameters because
-0.448750	purposes. This is advantageous because
-0.739531	the most efficient solution because
-0.281260	not an optimal solution because
-0.612019	can be an advantage because
-0.483039	than the Boolean operators because
-0.246990	running in 64-bit mode because
-0.355313	needed in 64-bit mode because
-0.246990	simpler in 64-bit mode because
-0.340510	type in the values because
-0.234560	problem in interactive programs because
-0.290562	can cause caching problems because
-0.202626	union is not optimal because
-0.202626	unrolling is not optimal because
-0.310726	than in a microprocessor because
-0.342255	is somewhat more complicated because
-0.322233	Deallocation has no cost because
-0.321730	caching will be better because
-0.233887	recommended for critical applications because
-0.234019	the Gnu compiler mechanism because
-0.289904	variable would be needed because
-0.289745	elements of simple types because
-0.233978	than an uncached read because
-0.233542	we use hexadecimal numbers because
-0.233269	to the allocation process because
-0.233136	up a function just because
-0.651408	order of the operands because
-0.269342	of the Boolean operands because
-0.308944	position-independent code is smaller because
-0.216105	check on n here because
-0.216105	not cost anything here because
-0.232770	execute slower than intended because
-0.219929	wide, should be avoided because
-0.219929	malloc/free should be avoided because
-0.138207	it. This is inefficient because
-0.138207	each other is inefficient because
-0.138207	Thread-local storage is inefficient because
-0.248202	code is very inefficient because
-0.232526	section is always position-independent because
-0.287349	port to different platforms because
-0.231816	multiplying by other constants because
-0.286819	as very time-consuming tasks because
-0.708785	on a hard disk because
-0.611687	in the main executable because
-0.230150	language is inherently parallel because
-0.230355	code is not copied because
-0.321307	replacements for switch statements because
-0.284601	delayed for several seconds because
-0.282528	than with fine-grained parallelism because
-0.282422	of code is fastest because
-0.227624	PC processors are preferred because
-0.407976	when compiled without -fpic because
-0.173641	function is time consuming because
-0.173641	can be time consuming because
-0.226059	64 bits total size, because
-0.314955	programming practice, of course, because
-0.224351	The problem only occurs because
-0.175962	slice are quite costly because
-0.175962	which are relatively costly because
-0.224227	usability may be poor because
-0.224227	results or fail completely because
-0.275785	lines in column 28 because
-0.275785	shared between multiple processes because
-0.217877	preceding label plus one, because
-0.271898	with u.i[1] ^= 0x80000000; because
-0.355370	above code is serial because
-0.271691	suitable for example 9.5 because
-0.217877	the code becomes simpler because
-0.264682	Different compilers behave differently because
-0.211619	rounding. This is unfortunate because
-0.122291	table has const twice because
-0.122291	g(x) is calculated twice because
-0.264410	made with option -fpie because
-0.211619	is not an issue because
-0.264410	exception handling is negligible because
-0.074530	This is very problematic because
-0.074530	strings are particularly problematic because
-0.250599	code can be vectorized, because
-0.199366	and memcpy is unsafe because
-0.164554	not i but i*12, because
-0.164554	set is particularly interesting because
-0.164554	cache and accessed non-sequentially because
-0.164554	so-called partial flags stall because
-0.164554	operand is not evaluated, because
-0.164554	than x = *(++p) because
-0.164554	than x = array[++i] because
-0.164554	the same cache line, because
-0.164554	cache (e.g. Sandy Bridge) because
-0.164554	be determined in advance, because
-0.164554	has allocated with alloca, because
-0.164554	allocation is particularly risky because
-0.420828	latency which is the same
-0.297169	function pointer is the same
-0.297169	function calls is the same
-0.297169	The result is the same
-0.297169	a matrix is the same
-0.297169	i<n; i++) is the same
-0.297169	A reference is the same
-0.334822	should be of the same
-0.492217	with many of the same
-0.569043	newer version of the same
-0.344341	is member of the same
-0.344341	(not member of the same
-0.636878	more versions of the same
-0.774071	different versions of the same
-0.085733	are members of the same
-0.085733	variable members of the same
-0.085733	Non-static members of the same
-0.334822	future models of the same
-0.334822	different implementations of the same
-0.433185	renamed instances of the same
-0.120219	cannot point to the same
-0.412923	multiple pointers to the same
-0.469083	get access to the same
-0.318591	threads write to the same
-0.060458	threads writing to the same
-0.318591	or reads to the same
-0.412923	lines belong to the same
-0.578872	is not in the same
-0.578872	each other in the same
-0.420708	are used in the same
-0.410373	the objects in the same
-0.316538	one way in the same
-0.130047	multiple threads in the same
-0.130047	two threads in the same
-0.316538	definition language in the same
-0.446772	parent classes in the same
-0.410373	is implemented in the same
-0.316538	files, preferably in the same
-0.227331	threads running in the same
-0.227331	thread running in the same
-0.316538	Func2 were in the same
-0.316538	different priority in the same
-0.316538	float additions in the same
-0.316538	different lengths in the same
-0.316538	an FPGA in the same
-0.316538	necessarily stay in the same
-0.324237	induction variables for the same
-0.324237	using registers for the same
-0.324237	are competing for the same
-0.132537	will contend for the same
-0.132537	libraries contend for the same
-0.324237	always compete for the same
-0.346858	the complication that the same
-0.154007	and b are the same
-0.154007	of algebra are the same
-0.154007	storage principles are the same
-0.449108	no loop if the same
-0.344839	the loop by the same
-0.121919	another function with the same
-0.121919	pure function with the same
-0.121919	of threads with the same
-0.121919	more threads with the same
-0.292034	always calculated with the same
-0.292034	different types with the same
-0.292034	multiple configurations with the same
-0.292034	dividing repeatedly with the same
-0.339497	virtual processors on the same
-0.439057	processes running on the same
-0.424993	sure to have the same
-0.591837	possible to use the same
-0.320008	modules that use the same
-0.054739	it can use the same
-0.054739	compiler can use the same
-0.117385	You can use the same
-0.349332	will not use the same
-0.242322	they cannot use the same
-0.242322	several applications use the same
-0.242322	2 thenaandbcannot use the same
-0.242322	multiplications. Subtractions use the same
-0.413097	are called from the same
-0.291340	instruction sets from the same
-0.291340	99 read from the same
-0.291340	and writing from the same
-0.291340	are generated from the same
-0.051909	never used at the same
-0.260135	the variable at the same
-0.260135	point multiplication at the same
-0.260135	to zero at the same
-0.260135	multiple things at the same
-0.260135	one thing at the same
-0.260135	same DLL at the same
-0.276546	CParent<CChild1> { has the same
-0.276546	in main has the same
-0.276546	main executable has the same
-0.333115	store data because the same
-0.333115	not copied because the same
-0.334943	subexpression elimination If the same
-0.334943	same class). If the same
-0.446391	possibility of using the same
-0.125403	you are using the same
-0.125403	systems are using the same
-0.338652	be linked into the same
-0.336487	higher speed. In the same
-0.339881	data sets where the same
-0.244344	and b take the same
-0.244344	calculations usually take the same
-0.279144	some cases even the same
-0.125984	function that does the same
-0.074715	conversions. It does the same
-0.074715	pointer which does the same
-0.074715	static_cast operator does the same
-0.074715	function __intel_cpu_features_init_x() does the same
-0.224640	in BSD work the same
-0.318714	statement always calls the same
-0.319693	destination array. But the same
-0.318714	will not get the same
-0.199845	ways of doing the same
-0.199845	functions are doing the same
-0.199845	in fact doing the same
-0.553771	want to calculate the same
-0.455855	possible to write the same
-0.224640	machine code becomes the same
-0.512730	following example shows the same
-0.279144	that always goes the same
-0.224640	predicted to go the same
-0.165010	sure to produce the same
-0.165010	compiler should produce the same
-0.364423	etc. is still the same
-0.224640	used for accessing the same
-0.364423	supports at least the same
-0.364423	preferable to keep the same
-0.227275	accessed from within the same
-0.227275	used only within the same
-0.019938	time to share the same
-0.006545	threads can share the same
-0.006545	c can share the same
-0.006545	simultaneously can share the same
-0.019938	different objects share the same
-0.019938	the threads share the same
-0.019938	data members share the same
-0.019938	processors usually share the same
-0.019938	row 28 share the same
-0.165010	methods have exactly the same
-0.165010	are doing exactly the same
-0.279144	CPUs for executing the same
-0.022496	matter of interpreting the same
-0.224640	or variable having the same
-0.224640	dynamic library requiring the same
-0.097955	advantage of sharing the same
-0.097955	threads are sharing the same
-0.224640	you may reuse the same
-0.817069	a); } } The same
-0.234271	faster if unsigned The same
-0.340541	the data cache. The same
-0.567702	a non-sequential order. The same
-0.234271	for register storage. The same
-0.531625	file is closed. The same
-0.234271	and >= operators). The same
-0.234271	e.g. four floats. The same
-0.234271	2015 or 2016. The same
-0.234271	two 128-bit reads. The same
-0.234271	the preceding row. The same
-0.065515	function called only from same
-0.581736	CPUs with execution units same
-0.559799	where most of the functions
-0.356631	only few of the functions
-0.356862	degrades performance for the functions
-0.347798	delete or with the functions
-0.701248	be done with the functions
-0.581041	faster to use the functions
-0.579780	solution to make the functions
-0.062265	by inlining all the functions
-0.406525	a class containing the functions
-0.313435	child classes implement the functions
-0.236819	the linker extracts the functions
-0.236819	idea to collect the functions
-0.544991	check the order of functions
-0.237227	multiple versions even of functions
-0.510618	improve the speed of functions
-0.351878	Use macros instead of functions
-0.237802	feature. All accesses to functions
-0.237686	free. These operators and functions
-0.314060	the code memory. The functions
-0.237343	is more difficult. The functions
-0.339427	specific calling conventions for functions
-0.497792	by making sure that functions
-0.236990	than 64-bit Windows if functions
-0.649804	works most efficiently if functions
-0.237481	vector class library have functions
-0.407356	recommended to return from functions
-0.237366	double. The intrinsic vector functions
-0.344861	and put the different functions
-0.340947	accessed by several different functions
-0.289033	functions and some other functions
-0.311133	modify x, while other functions
-0.233352	function that calls other functions
-0.338758	misleading reports of which functions
-0.343877	a PLT for all functions
-0.332238	F1 only if all functions
-0.233055	directives and declare all functions
-0.338684	file. Keep often used functions
-0.545867	if you are using functions
-0.598396	to call the library functions
-0.213999	the addresses of library functions
-0.286348	its time in library functions
-0.213999	version of most library functions
-0.267100	dispatching. Many Intel library functions
-0.213999	linking for any library functions
-0.213999	know that standard library functions
-0.213999	efforts in optimizing library functions
-0.213999	functions. Time- consuming library functions
-0.320439	function and these two functions
-0.289626	2; } These two functions
-0.339178	as well as efficient functions
-1.004267	if there are many functions
-0.275014	Primitives" library contains many functions
-0.204386	Kernel Library" contains many functions
-0.226769	non-Intel CPUs. Includes many functions
-0.208991	to have the member functions
-0.208991	file. If the member functions
-0.172919	of classes and member functions
-0.037210	this pointer in member functions
-0.037210	'this' pointer in member functions
-0.172919	page 52. The member functions
-0.134209	recommended to make member functions
-0.134209	You may make member functions
-0.247118	work for class member functions
-0.172919	(CParent<>) contains any member functions
-0.198908	dispatch to virtual member functions
-0.198908	least one virtual member functions
-0.050357	32-bit systems. Virtual member functions
-0.024449	access. 7.20 Virtual member functions
-0.024449	53 7.20 Virtual member functions
-0.037210	bytes. 7.19 Class member functions
-0.037210	51 7.19 Class member functions
-0.172919	programming are: Non-static member functions
-0.336139	resolution and the critical functions
-0.336139	that includes the critical functions
-0.227920	mitigated by making critical functions
-0.338110	hardware implementation of these functions
-0.310532	But beware that these functions
-0.225740	and lrint. Unfortunately, these functions
-0.352169	query certain operating system functions
-0.313329	with Microsoft compiler. Some functions
-0.349784	doesn't have information about functions
-0.617447	of the most important functions
-0.336954	systems lack the necessary functions
-0.312191	functions. The CPU- specific functions
-0.228323	the so-called commpage. These functions
-0.228323	begin with _mm. These functions
-0.259073	member pointers and virtual functions
-0.259073	Runtime polymorphism with virtual functions
-0.206889	is called. If virtual functions
-0.206889	you can avoid virtual functions
-0.206889	non-virtual functions. Avoid virtual functions
-0.235798	program memory. If several functions
-0.350027	library for a few functions
-0.235381	if possible. Use inline functions
-0.235056	called stack unwinding. All functions
-0.234932	inline and optimize both functions
-0.342552	not the more complicated functions
-0.121946	have support for intrinsic functions
-0.071125	Header files for intrinsic functions
-0.071125	// header for intrinsic functions
-0.114535	be implemented with intrinsic functions
-0.114535	choose to use intrinsic functions
-0.114535	hundreds of different intrinsic functions
-0.114535	result by using intrinsic functions
-0.114535	// Define SSE2 intrinsic functions
-0.114535	assembly language Use intrinsic functions
-0.034926	table lookup Using intrinsic functions
-0.017112	107 12.4 Using intrinsic functions
-0.017112	division. 12.4 Using intrinsic functions
-0.247085	produce tables of mathematical functions
-0.033242	square root and mathematical functions
-0.154434	intrinsic instructions for mathematical functions
-0.154434	memmove, memset, or mathematical functions
-0.200354	The most common mathematical functions
-0.154434	lot of advanced mathematical functions
-0.154434	libraries for computing mathematical functions
-0.234393	at www.agner.org/optimize/asmlib.zip contains various functions
-0.364204	of memory and string functions
-0.207741	versions of common string functions
-0.207741	the C style string functions
-0.289931	+ r.b;} The three functions
-0.234334	in 64-bit mode. Make functions
-0.219205	allows overriding of public functions
-0.219205	need relocation. All public functions
-0.233278	function into multiple smaller functions
-0.288697	be manipulated with C functions
-0.210295	a library of math functions
-0.210295	the most common math functions
-0.231976	The names of inlined functions
-0.344793	contains calls to frame functions
-0.210325	more efficient than frame functions
-0.230394	always fully optimized. Library functions
-0.156720	Virtual member functions Virtual functions
-0.156720	function is pure. Virtual functions
-0.156720	(see page 96). Virtual functions
-0.226368	made for all suitable functions
-0.048322	and string manipulation Mathematical functions
-0.023489	......................... 142 14.10 Mathematical functions
-0.023489	by u[0]. 14.10 Mathematical functions
-0.048322	(see page 140). Mathematical functions
-0.023489	vectorization............................................................. 117 12.7 Mathematical functions
-0.023489	called. 118 12.7 Mathematical functions
-0.093213	to suboptimal code. Intrinsic functions
-0.093213	few machine instructions. Intrinsic functions
-0.093213	in either case. Intrinsic functions
-0.093213	AVX2 Table 12.3. Intrinsic functions
-0.224753	a distinction between leaf functions
-0.224680	internal variables and internal functions
-0.218288	and truncation. The missing functions
-0.148492	member pointers /vms Fastcall functions
-0.148492	on CodeGear compiler). Fastcall functions
-0.218181	order to identify individual functions
-0.218181	in the program. Small functions
-0.067968	<<6 ); 7.26 Overloaded functions
-0.067968	................................................................................................................... 56 7.26 Overloaded functions
-0.264748	dispatching only for speed-critical functions
-0.199657	that needs them. Pure functions
-0.199657	common names. Use fastcall functions
-0.199657	templates // Place non-polymorphic functions
-0.199657	a separate function. Sometimes, functions
-0.199657	a program. Avoid unnecessary functions
-0.164823	p->Hello(); } // Non-polymorphic functions
-0.164823	the GetTickCount or QueryPerformanceCounter functions
-0.164823	a leaf function. Leaf functions
-0.164823	CPU dispatching or memory-intensive functions
-0.164823	in each case. Inlined functions
-0.347758	used here is the only
-0.347758	template metaprogramming is the only
-0.357680	to copy that the only
-0.347063	but that's about the only
-0.582615	subroutine if it is only
-0.649300	make sure it is only
-0.344147	child class. This is only
-0.444912	64 bits. This is only
-0.344147	of x. This is only
-0.453873	finished. Obviously, this is only
-0.647887	is true, which is only
-0.549492	smaller if there is only
-0.336051	the diagonal there is only
-0.336051	64-bit systems, there is only
-0.522108	if the variable is only
-0.311973	The register keyword is only
-0.235593	the misprediction penalty is only
-0.235593	} Here, log(2.0) is only
-0.237772	needs to evaluate a only
-0.237769	memory ports, etc. of only
-0.236591	will have one and only
-0.236591	is always one, and only
-0.236591	151 15.1c automatically, and only
-0.236591	is advantageous if, and only
-0.236591	low priority thread, and only
-0.294105	// make dispatcher in only
-0.236946	and model number. The only
-0.236946	as an example. The only
-0.236946	no pointer aliasing. The only
-0.355552	are identical so that only
-0.524231	so there may be only
-0.356138	commas. There should be only
-0.458612	expressions. Operations that are only
-0.236281	of 256-bit size are only
-0.355161	number 0x1C. There are only
-0.236281	(char, short int) are only
-0.354186	vector, while you can only
-0.342743	OR operator, which can only
-0.312488	identifier names. We can only
-0.312488	to 15.1c. We can only
-0.235538	operator which otherwise can only
-0.293973	compiler may calculate it only
-0.237684	table at runtime, if only
-0.237586	block size grows by only
-0.322895	thread in systems with only
-0.235189	on a system with only
-0.431456	version for CPUs with only
-0.235189	to calculate pow(x,10) with only
-0.639843	you can rely on only
-0.288402	A redesign can not only
-0.342878	code 16 will not only
-0.232796	scratch. This would not only
-0.232796	test should include not only
-0.232796	The profiler measures not only
-0.232796	will take precedence, not only
-0.336110	instruction set. Therefore, you only
-0.334496	then you will have only
-0.329753	accumulators. Current CPUs have only
-0.237395	multiplication b[i]*c[i], though this only
-0.332981	the code that use only
-0.529826	reason, you can use only
-0.438223	calculations, and then use only
-0.486917	it is called from only
-0.100768	if the function has only
-0.100768	each member function has only
-0.232213	If a template has only
-0.232213	If the computer has only
-0.347721	Gnu compiler will make only
-0.237307	into multiple smaller functions only
-0.228965	interrupt the user but only
-0.228965	require a multiplication but only
-0.284050	with 14.14b automatically but only
-0.228965	facilities are needed, but only
-0.228965	that need relocation, but only
-0.511328	function that is used only
-0.334201	// This is used only
-0.337084	versions should be used only
-0.337084	should therefore be used only
-0.348666	Smart pointers are used only
-0.381946	factor. Loop unrolling should only
-0.340988	am giving this example only
-0.354304	avoid hyperthreading by using only
-0.593338	a new register size only
-0.236921	for large libraries where only
-0.236921	to objects) are possible only
-0.350627	cached. Usually it takes only
-0.281230	variable in memory takes only
-0.226480	assembly language. C++ takes only
-0.226480	Long double precision takes only
-0.381194	make a new branch only
-0.621574	if it is called only
-0.326543	Dispatcher. Will be called only
-0.097572	Assume member function called only
-0.097572	throw() Assume function called only
-0.459127	algebraic expressions. For example, only
-0.311385	single instructions that take only
-0.415127	then it may take only
-0.433672	whole loop will take only
-0.222663	and shift operations take only
-0.347693	be returned in registers only
-0.236465	a high-level language need only
-0.340360	should use this method only
-0.291934	This code will work only
-0.235817	char 16 XOP, AMD only
-0.235697	compiler supports this option only
-0.235652	For example, use AVX only
-0.437321	as it is done only
-0.459110	address calculations are done only
-0.181098	for overflow and works only
-0.181098	16.1. This code works only
-0.181098	the Intel compiler works only
-0.230096	Of course, this works only
-0.230096	that this method works only
-0.181098	Explicit CPU dispatching works only
-0.230096	The renaming mechanism works only
-0.181098	14.12b and 14.13b works only
-0.322361	the cache. The problem only
-0.290109	a container that contains only
-0.354166	and the code contains only
-0.270698	loop body now contains only
-0.235311	A safer implementation would only
-0.322174	instruction set can run only
-0.035802	loops are predicted well only
-0.333074	vector implementation is optimal only
-0.278104	to do the dispatching only
-0.340017	should apply CPU dispatching only
-0.378189	different. 64-bit Windows allows only
-0.234439	should use such methods only
-0.464292	simpler because it needs only
-0.328258	Runtime polymorphism is needed only
-0.356895	Comparing two pointers requires only
-0.219171	enabled (single precision requires only
-0.233063	ways of doing things only
-0.215190	eliminate everything that depends only
-0.268445	its return value depends only
-0.288529	examples have been tested only
-0.336385	modules may be loaded only
-0.213552	the Microsoft compiler. Supports only
-0.213552	later instruction sets. Supports only
-0.232377	new register size comes only
-0.336603	registers had in fact only
-0.230987	expression or subexpression containing only
-0.329008	are designed to handle only
-0.412922	that it is initialized only
-0.230501	are: Static linking includes only
-0.318590	instruction set and insert only
-0.229760	throw() specification to F1 only
-0.229812	predictor. On other processors, only
-0.303884	the code is chosen only
-0.229147	powers of 2 applies only
-0.030241	control branch is mispredicted only
-0.007365	other way is mispredicted only
-0.030241	repeat count is mispredicted only
-0.226433	template specialization is allowed only
-0.305548	container is to hold only
-0.224539	second operand is evaluated only
-0.295695	scan instruction is executed only
-0.221900	or reference is valid only
-0.276038	Intel but is currently only
-0.218099	supported at all. Can only
-0.211837	Consider running the services only
-0.199578	a double by modifying only
-0.164751	many processes simultaneously. Actually, only
-0.164751	compiler and it understands only
-0.562018	two versions of the CPU
-0.350491	clock frequency of the CPU
-0.707348	takes care of the CPU
-0.350491	the resolution of the CPU
-0.350491	uses 90% of the CPU
-0.354228	and closer to the CPU
-0.348142	pending instructions in the CPU
-0.064417	forwarding delay in the CPU
-0.348142	or integrated in the CPU
-0.348142	other flaws in the CPU
-0.491778	it possible for the CPU
-0.913954	is difficult for the CPU
-0.344710	optimize specifically for the CPU
-0.821291	reason is that the CPU
-0.338534	an instruction that the CPU
-1.054634	make sure that the CPU
-0.524530	details. Note that the CPU
-0.338534	clock frequency that the CPU
-0.543728	cycles even if the CPU
-0.672134	is supported by the CPU
-0.594457	if supported by the CPU
-0.353240	MKL relies on the CPU
-0.351154	instruction set than the CPU
-0.351888	directly, or use the CPU
-0.351327	and changing then the CPU
-0.352193	which counts at the CPU
-0.324545	function calls because the CPU
-0.324545	be needed because the CPU
-0.324545	simple types because the CPU
-0.324545	flags stall because the CPU
-0.455124	instruction set. If the CPU
-0.349023	counters in all the CPU
-0.546161	better to do the CPU
-0.341097	measurements: warm up the CPU
-0.326205	chain. We want the CPU
-0.326441	of memory inside the CPU
-0.326441	a counter inside the CPU
-0.234259	supported by both the CPU
-0.234259	and checks both the CPU
-0.239464	possible to replace the CPU
-0.239464	necessary to replace the CPU
-0.332471	out-of-order mechanism allows the CPU
-0.321192	which instruction sets the CPU
-0.334497	CPU dispatching. Unfortunately, the CPU
-0.311119	the code prevent the CPU
-0.279991	memory. This prevents the CPU
-0.279991	one. This prevents the CPU
-0.401384	is to help the CPU
-0.584823	The profiler tells the CPU
-0.233334	clock pulses since the CPU
-0.309281	Replace or bypass the CPU
-0.233334	you to override the CPU
-0.233334	option that limits the CPU
-0.350749	multiple CPUs or a CPU
-0.353434	still run on a CPU
-0.351422	This library has a CPU
-0.580210	effort to make a CPU
-0.323962	you only need a CPU
-0.292907	dispatcher should give a CPU
-0.236758	cost of keeping a CPU
-0.880226	than the number of CPU
-0.520712	on the type of CPU
-0.856529	consume a lot of CPU
-0.314092	most common pitfalls of CPU
-0.313729	During the history of CPU
-0.294126	mentally flawed approach to CPU
-0.382591	compiler, operating system and CPU
-0.494753	crashes the program. The CPU
-0.320323	an up-to-date version. The CPU
-0.233778	an Intel processor. The CPU
-0.233778	elsewhere. 13.5 Implementation The CPU
-0.233778	a single instruction. The CPU
-0.233778	in most cases: The CPU
-0.233778	time of programming. The CPU
-0.233778	CPU detection mechanism. The CPU
-0.233778	be calculated independently. The CPU
-0.233778	called register renaming. The CPU
-0.233778	not necessarily newer. The CPU
-0.233778	ten years old. The CPU
-0.349943	bypass the check for CPU
-0.514443	feature is intended for CPU
-0.237705	// Example 13.1 // CPU
-0.443865	use multiple CPUs or CPU
-0.942661	Obstacles to optimization by CPU
-0.236916	sense to dispatch by CPU
-0.330835	highly optimized code with CPU
-0.479051	a function library with CPU
-0.536667	effort is concentrated on CPU
-0.236806	deal of research on CPU
-0.356508	cache access rather than CPU
-0.420767	most function libraries have CPU
-0.237491	very well spend more CPU
-0.407539	speed is relevant when CPU
-0.235931	each instruction set. A CPU
-0.235931	software was developed. A CPU
-0.345007	should not look at CPU
-0.338816	thread jumps between different CPU
-0.417977	systems with only one CPU
-0.455046	computer has only one CPU
-0.290634	are CPU-specific and each CPU
-0.332324	of counters in each CPU
-0.234636	and without AVX using CPU
-0.378275	manual, I am using CPU
-0.461854	fact that the Intel CPU
-0.327662	etc. Overriding the Intel CPU
-0.224923	Kernel Library. The multiple CPU
-0.323094	utilize systems with multiple CPU
-0.299281	way to use multiple CPU
-0.268715	that jump between multiple CPU
-0.268715	the workload between multiple CPU
-0.228872	not optimized. Jumps between CPU
-0.228872	same without discriminating between CPU
-0.228872	one that discriminates between CPU
-0.548205	CPUs. This is called CPU
-0.236418	most library functions without CPU
-0.120620	code to a specific CPU
-0.056136	thread to a specific CPU
-0.215569	bit indicates a specific CPU
-0.244493	or lists of specific CPU
-0.086231	separate version for specific CPU
-0.086231	are fine-tuned for specific CPU
-0.235556	test when software uses CPU
-0.235589	truly represent a known CPU
-0.235236	more important than optimizing CPU
-0.446339	with in a particular CPU
-0.477795	optimized for a particular CPU
-0.234642	fail to keep their CPU
-0.234513	have features for automatic CPU
-0.047843	class code with automatic CPU
-0.047843	user-written code with automatic CPU
-0.101616	example (12.4e) with automatic CPU
-0.185041	the program contains automatic CPU
-0.233919	unknown processors properly. Many CPU
-0.339834	run on its own CPU
-0.319411	threads. The compiler supports CPU
-0.231250	assumption about an unknown CPU
-0.230643	AVX instr. set Automatic CPU
-0.230611	found in Wikipedia under CPU
-0.229812	by Intel have similar CPU
-0.371768	Obviously, you should apply CPU
-0.019213	you are overriding Intel's CPU
-0.176258	instruction sets........................... 122 13.1 CPU
-0.176258	critical innermost loops. 13.1 CPU
-0.160207	program on the newest CPU
-0.160207	best on the newest CPU
-0.224624	many examples of poor CPU
-0.218268	more examples of bad CPU
-0.218268	of performance monitoring options. CPU
-0.890403	cc into vector c: CPU
-0.271927	specified. Insert an explicit CPU
-0.212004	extra overhead which consumes CPU
-0.212116	and VIA processors. Explicit CPU
-0.122500	specific instruction set. 13.6 CPU
-0.122500	Implementation ..................................................................................................... 126 13.6 CPU
-0.122500	-mAVX -axSSE3, etc. (Intel CPU
-0.122500	(Intel CPU only) (Intel CPU
-0.074670	compiler ......................................................................... 128 13.7 CPU
-0.074670	particularly critical. 129 13.7 CPU
-0.164900	used: // Example 13.2. CPU
-0.164900	common programs use inappropriate CPU
-0.164900	The Pentium 4 (NetBurst) CPU
-0.350181	is 0 and the other
-0.452536	two times and the other
-0.539570	further explained in the other
-0.356601	prefetching data for the other
-0.343407	one way or the other
-0.355585	not compatible with the other
-0.356194	negative list, on the other
-0.349833	same time as the other
-0.354671	to calculate than the other
-0.283165	then many times the other
-0.211304	and three times the other
-0.292629	when it goes the other
-0.176038	different CPUs. On the other
-0.176038	the compiler. On the other
-0.176038	more difficult. On the other
-0.176038	is profitable. On the other
-0.236513	time and rarely the other
-0.560290	thread if there is other
-0.346425	and whether there is other
-0.636981	as a result of other
-0.237654	carried out independently of other
-0.314200	plus the costs to other
-0.109226	here may apply to other
-0.109226	advices may apply to other
-0.232007	of that branch and other
-0.425748	that big arrays and other
-0.327508	size of integers and other
-0.425748	store help files and other
-0.374620	what instruction sets and other
-0.287505	types of expressions and other
-0.374620	the user interface and other
-0.232007	accessing databases, network and other
-0.287505	inheritance, virtual functions, and other
-0.318162	to other platforms and other
-0.232007	things very smart and other
-0.287505	C#, managed C++, and other
-0.232007	excessive memory swapping and other
-0.374620	enable constant propagation and other
-0.232007	casting. Linked lists and other
-0.287505	compilers, system database, and other
-0.232007	Firewalls, virus scanners and other
-0.232007	prevent memory leaks and other
-0.232007	of cache evictions and other
-0.232007	to 3-dimensional geometry and other
-0.232007	around this limitation and other
-0.232007	to virus attacks and other
-0.232007	(Standard Template Library) and other
-0.347886	C++ faster than in other
-0.236248	will look different in other
-0.337963	information about functions in other
-0.437128	repagination are running in other
-0.330245	have been defined in other
-0.236248	can be prevented in other
-0.312754	features rarely found in other
-0.314569	to the vector. The other
-0.234810	Intel CPUs, not for other
-0.985638	also be used for other
-0.468303	elements Induction variables for other
-0.341634	extra register available for other
-0.234810	to reserve resources for other
-0.424687	code is needed for other
-0.439367	open the possibility for other
-0.234810	graphics processing unit for other
-0.234810	graphics accelerator card for other
-0.344063	explained below. There are other
-0.344063	control instructions. There are other
-0.310696	allocation of memory or other
-0.234522	a particular CPU or other
-0.234522	of an exception or other
-0.532123	a hard disk or other
-0.234522	queue, list, database, or other
-0.234522	to a printer or other
-0.382598	be a disadvantage if other
-0.293958	faster than multiplying by other
-0.318326	the Borland compiler with other
-0.100741	also be used with other
-0.100741	libraries are used with other
-0.307861	and return operations with other
-0.625641	are not compatible with other
-0.047456	memory is contiguous with other
-0.047456	which is contiguous with other
-0.232142	subtask before coordination with other
-0.293944	modification if implemented on other
-0.797387	much more time than other
-0.550725	14.21 is faster than other
-0.333764	code execute faster than other
-0.233385	more clock cycles than other
-0.233385	higher clock frequency than other
-0.533173	a and b have other
-0.403914	if the operands have other
-0.235051	the variables might have other
-0.234248	useful for calling from other
-0.290053	are not accessible from other
-0.234248	profiling feasible. Interference from other
-0.311520	time used by all other
-0.291150	number and sets all other
-0.237352	have little-endian storage, but other
-0.381982	calls at least one other
-0.304943	with few or no other
-0.162971	dead code if no other
-0.162971	any objects if no other
-0.264218	certain to have no other
-0.264218	it can have no other
-0.264218	the operands have no other
-0.294792	point overflow but no other
-0.221137	output can produce no other
-0.318117	are unrelated to each other
-0.334367	often waiting for each other
-0.220511	values far from each other
-0.104690	are used near each other
-0.181429	are stored near each other
-0.448938	also stored near each other
-0.104690	are called near each other
-0.104690	code together near each other
-0.293450	a command or do other
-0.321515	clock cycles on most other
-0.234755	elsewhere. Faster than most other
-0.237205	integers of any size other
-0.324647	function in a library other
-0.433620	also used in two other
-0.315298	vectors. There are also other
-0.315298	point). There are also other
-0.274716	available register size. In other
-0.382953	doing mathematical calculations. In other
-0.220732	as template parameter. In other
-0.220732	is intended for. In other
-0.220732	program exception safe. In other
-0.220732	then call __intel_cpu_features_init_x(). In other
-0.142206	comparing it to any other
-0.142206	or writes to any other
-0.186111	copy constructors, and any other
-0.253590	be used for any other
-0.253590	to call or any other
-0.023355	not accessed by any other
-0.102057	this line by any other
-0.186111	as efficient as any other
-0.186111	8, but not any other
-0.235712	the inputs have any other
-0.142206	be called from any other
-0.142206	not referenced from any other
-0.019346	that doesn't call any other
-0.186111	You may insert any other
-0.319649	is identical to some other
-0.319649	string functions and some other
-0.292257	will make two. Some other
-0.222553	or no overhead while other
-0.222553	can modify x, while other
-0.222553	call to Func1, while other
-0.427244	A function that calls other
-0.235904	Pascal, Fortran and several other
-0.346328	an array can cause other
-0.234498	keyword also makes various other
-0.471610	other compilers can reduce other
-0.233677	that make developers choose other
-0.233455	in linear algebra) require other
-0.231794	special loop predictor. On other
-0.231140	to be saved. Any other
-0.285909	cache. Bit-fields of sizes other
-0.224689	based on compilers. Several other
-0.279199	about the class c1 other
-0.222191	gain in performance over other
-0.164957	identification (RTTI), which affects other
-0.591091	as part of the instruction
-0.356906	detailed explanation of the instruction
-0.350019	new instructions to the instruction
-0.452331	important addition to the instruction
-0.350019	been translated to the instruction
-0.358058	are missing in the instruction
-0.352579	header file for the instruction
-0.517964	programs compiled for the instruction
-0.293870	the technical details of instruction
-0.237604	Instruction tables: Lists of instruction
-0.335988	for different processors and instruction
-0.593875	i by 2. The instruction
-0.293570	to array elements. The instruction
-0.732275	can be used if instruction
-0.350265	function name depending on instruction
-0.235729	comparisons more efficient. This instruction
-0.235729	FMA4 instruction set. This instruction
-0.235729	point of view. This instruction
-0.335136	if it has an instruction
-0.236517	risking to insert an instruction
-0.341357	optimal code for this instruction
-0.328157	on processors with this instruction
-0.234910	CPUs that support this instruction
-0.348602	be used only when instruction
-0.260367	multiple versions for different instruction
-0.300121	to compile for different instruction
-0.357557	at least the same instruction
-0.326045	be based on which instruction
-0.233205	before it checks which instruction
-0.233205	should automatically detect which instruction
-0.344226	instruction set has no instruction
-0.352550	different name for each instruction
-0.338290	mode because the 64-bit instruction
-0.457144	use the best possible instruction
-0.341144	VIA CPUs". A branch instruction
-0.346066	systems. The 64 bit instruction
-0.352325	only when a new instruction
-0.444448	without any of these instruction
-0.314815	the availability of these instruction
-0.030124	functions for the SSE2 instruction
-0.030124	one for the SSE2 instruction
-0.030124	implementation if the SSE2 instruction
-0.030124	(XMM) if the SSE2 instruction
-0.025676	or when the SSE2 instruction
-0.082086	truncation when the SSE2 instruction
-0.082086	float's when the SSE2 instruction
-0.112257	with only the SSE2 instruction
-0.062495	processors without the SSE2 instruction
-0.062495	some cases the SSE2 instruction
-0.093810	mode unless the SSE2 instruction
-0.093810	rounding unless the SSE2 instruction
-0.062495	105). Using the SSE2 instruction
-0.140772	to enable the SSE2 instruction
-0.071528	or enable the SSE2 instruction
-0.121080	the SSE and SSE2 instruction
-0.036733	more efficient. The SSE2 instruction
-0.036733	modern CPUs. The SSE2 instruction
-0.036733	page 140). The SSE2 instruction
-0.121080	the SSE or SSE2 instruction
-0.121080	when the 145 SSE2 instruction
-0.121080	-msse /arch:SSE -msse SSE2 instruction
-0.236334	if the AVX 32 instruction
-0.206648	depending on the available instruction
-0.206648	CPUs increased the available instruction
-0.236020	Intel processors. Details about instruction
-0.329611	with an inline assembly instruction
-0.336950	that support the necessary instruction
-0.407088	compiled for the specific instruction
-0.499688	Compile for a specific instruction
-0.033153	advantage of the AVX instruction
-0.151189	YMM in the AVX instruction
-0.213064	compiled for the AVX instruction
-0.213064	(YMM) if the AVX instruction
-0.151189	105). If the AVX instruction
-0.223737	and later. The AVX instruction
-0.078910	vector elements. 12.1 AVX instruction
-0.078910	operations............................................................................................... 105 12.1 AVX instruction
-0.183193	check for the supported instruction
-0.183193	information, such as supported instruction
-0.183193	CPUID information about supported instruction
-0.039071	{ // Get supported instruction
-0.039071	version // Get supported instruction
-0.183193	is the minimum supported instruction
-0.183193	{ // Detect supported instruction
-0.427723	Using the nontemporal write instruction
-0.477679	compiled for a particular instruction
-0.316138	N supports a particular instruction
-0.335306	contains i/2+r. The next instruction
-0.234389	The availability of various instruction
-0.322212	numbers, but on what instruction
-0.129090	for SSE2 and later instruction
-0.129090	the AVX and later instruction
-0.129090	the SSE and later instruction
-0.003487	the SSE2 or later instruction
-0.108861	the AVX or later instruction
-0.059466	the Pentium-II or later instruction
-0.270858	file for a higher instruction
-0.270858	CPU with a higher instruction
-0.175115	the SSE or higher instruction
-0.175115	set or any higher instruction
-0.175115	when the next higher instruction
-0.218197	function, and the AVX2 instruction
-0.218197	performance somewhat. The AVX2 instruction
-0.236702	problem of the x86 instruction
-0.236702	extension to the x86 instruction
-0.201762	The 32- bit x86 instruction
-0.343806	separately with the appropriate instruction
-0.288873	sequence of backwards compatible instruction
-0.287984	or any particularly slow instruction
-0.364438	compiled for the desired instruction
-0.279157	to enable the desired instruction
-0.374703	implementation for a given instruction
-0.230550	/Gy -ffunction- sections SSE instruction
-0.088501	instruction if the SSE4.1 instruction
-0.088501	bits), unless the SSE4.1 instruction
-0.251013	may choose a newer instruction
-0.199734	instruction set. The newer instruction
-0.321740	vectorized with the current instruction
-0.304025	branch for a low instruction
-0.282724	compiling for a lower instruction
-0.131315	thing and the CPUID instruction
-0.211782	time when the CPUID instruction
-0.131315	may call the CPUID instruction
-0.098943	one for the latest instruction
-0.098943	122) for the latest instruction
-0.049200	that the bit scan instruction
-0.049200	use the bit scan instruction
-0.159870	a slow bit scan instruction
-0.224529	-msse2 /arch:SSE2 -msse2 SSE3 instruction
-0.171838	to use the newest instruction
-0.171838	of using the newest instruction
-0.132750	to vectorization. The newest instruction
-0.224676	Prefetching data The prefetch instruction
-0.221976	bits when the AVX512 instruction
-0.069262	resources. However, the CISC instruction
-0.033252	limited resource. The CISC instruction
-0.033252	performance/price ratio. The CISC instruction
-0.069262	PC processors with CISC instruction
-0.271821	CPU with the highest instruction
-0.218283	mode because the x86-64 instruction
-0.016012	example 9.6b. The MOVNTQ instruction
-0.271821	not have the selected instruction
-0.264742	depending on the specified instruction
-0.048307	processors with the AVX-512 instruction
-0.023482	library function. 12.2 AVX-512 instruction
-0.023482	................................................................. 107 12.2 AVX-512 instruction
-0.199651	&SelectAddMul_SSE2; // Error: lowest instruction
-0.199651	CPUs without the FMA4 instruction
-0.250920	CPU supports the corresponding instruction
-0.199651	followed by an EMMS instruction
-0.164818	a 32-bit number (the instruction
-0.164818	replaced by a blend instruction
-0.164818	the SSE2 (or later) instruction
-0.164818	BSF (bit scan forward) instruction
-0.164818	RGB color difference. Newest instruction
-0.164818	CPUs. The Pentium Pro instruction
-0.357972	the execution to the point
-0.351195	contrived example, but the point
-0.716553	they are sure to point
-0.382659	Induction ; edx = point
-0.345140	pointer and makes it point
-0.348821	first call it will point
-0.000417	value of the floating point
-0.000417	parts of the floating point
-0.000834	fact that the floating point
-0.000834	determined by the floating point
-0.000139	time when the floating point
-0.000278	inefficient when the floating point
-0.000834	truncation so the floating point
-0.000834	long before the floating point
-0.000834	might store the floating point
-0.000834	precision. When the floating point
-0.000834	double reflects the floating point
-0.000834	operations in-between the floating point
-0.000278	sign of a floating point
-0.000278	Conversion of a floating point
-0.000278	latency of a floating point
-0.000834	223 to a floating point
-0.000834	addition, and a floating point
-0.000834	check if a floating point
-0.000834	faster than a floating point
-0.000834	chain. If a floating point
-0.000834	to do a floating point
-0.000834	to access a floating point
-0.000834	loop needs a floating point
-0.000834	integer addition, a floating point
-0.000834	function rounds a floating point
-0.001206	any use of floating point
-0.001206	maximum number of floating point
-0.001206	the order of floating point
-0.001206	rare cases of floating point
-0.000201	different types of floating point
-0.000402	two types of floating point
-0.001206	The range of floating point
-0.001206	algebraic manipulations of floating point
-0.132688	from integer to floating point
-0.009775	of integers to floating point
-0.009775	unsigned integers to floating point
-0.019775	before conversion to floating point
-0.019775	not apply to floating point
-0.019775	%. Conversion to floating point
-0.002173	mix integer and floating point
-0.000241	between integers and floating point
-0.002173	String constants and floating point
-0.005452	integer calculations in floating point
-0.005452	precision, especially in floating point
-0.010974	hardware functions. The floating point
-0.016873	induction variables for floating point
-0.016873	XMM registers for floating point
-0.016873	Enable exception for floating point
-0.016873	of accumulators for floating point
-0.010974	makers assume that floating point
-0.010974	are integers or floating point
-0.003627	above example with floating point
-0.003627	point addition with floating point
-0.003627	are incompatible with floating point
-0.003627	expressions than on floating point
-0.000904	algebraic reductions on floating point
-0.005452	are faster than floating point
-0.005452	integer expressions than floating point
-0.010974	systems that have floating point
-0.010974	memory space. A floating point
-0.002718	faster than from floating point
-0.002718	A conversion from floating point
-0.002718	all conversions from floating point
-0.002718	integers Conversion from floating point
-0.003627	easy to make floating point
-0.001810	compiler cannot make floating point
-0.001810	Compilers cannot make floating point
-0.010974	for mixing different floating point
-0.010974	specifies that all floating point
-0.010974	have only one floating point
-0.010974	loop. If each floating point
-0.000904	one or two floating point
-0.003627	would require two floating point
-0.002718	and before any floating point
-0.002718	instruction before any floating point
-0.001357	14.8 Conversions between floating point
-0.003627	because it makes floating point
-0.000904	instruction set makes floating point
-0.010974	start a new floating point
-0.001357	have difficulties making floating point
-0.002718	code that does floating point
-0.002718	loop that does floating point
-0.010974	requires a big floating point
-0.003627	There are eight floating point
-0.003627	49 first eight floating point
-0.003627	operations involves eight floating point
-0.010974	a loop contains floating point
-0.001357	method of doing floating point
-0.010974	to enable fast floating point
-0.010974	Applications that generate floating point
-0.010974	compare two positive floating point
-0.010974	there are 100 floating point
-0.010974	FDIV bug causes floating point
-0.010974	advantageous to mix floating point
-0.010974	execution units. Any floating point
-0.010974	loading the entire floating point
-0.010974	with x87 style floating point
-0.010974	includes static variables, floating point
-0.010974	requirements for strict floating point
-0.005452	multiply a nonzero floating point
-0.005452	values of nonzero floating point
-0.010974	it allows larger floating point
-0.010974	making an additional floating point
-0.001357	operations for manipulating floating point
-0.010974	} // Catch floating point
-0.010974	misses, branch mispredictions, floating point
-0.010974	allows less precise floating point
-0.010974	/Oa -fno-alias Non-strict floating point
-0.010974	have no native floating point
-0.010974	occurred. // Reset floating point
-0.010974	on, including relaxed floating point
-0.010974	set to relax floating point
-0.010974	integer vectors FMA3 floating point
-0.293286	and also a possible point
-0.023112	of different types cannot point
-0.286687	whenever the objects they point
-0.231287	and the texts they point
-0.236196	$B2$2 ; Induction++; ; point
-0.006923	EXCEPTION_CONTINUE_SEARCH) { // Floating point
-0.006923	float and double Floating point
-0.006923	b) - n.a. Floating point
-0.006923	floating point variables Floating point
-0.006923	in 64-bit systems. Floating point
-0.006923	Floating point division Floating point
-0.006923	multiply and shift Floating point
-0.006923	45 clock cycles. Floating point
-0.006923	for multiple purposes. Floating point
-0.006923	floating point expressions. Floating point
-0.006923	with integer parameters. Floating point
-0.006923	the 32-bit integer. Floating point
-0.003448	= 0; 14.6 Floating point
-0.003448	division...................................................................................................... 137 14.6 Floating point
-0.006923	on page 105. Floating point
-0.003448	operators............................................................................... 29 7.3 Floating point
-0.003448	variables. 31 7.3 Floating point
-0.006923	stack is organized. Floating point
-0.006923	45 clock cycles). Floating point
-0.006923	the programmer. 79 Floating point
-0.218519	keyword, for floating 26 point
-0.218519	for the common entry point
-0.074760	value before the decimal point
-0.074760	constant with a decimal point
-0.165122	optimal from a technological point
-0.343992	jl $B1$2 is the loop
-0.343992	add eax,1 is the loop
-0.492831	staircase function of the loop
-1.467014	the value of the loop
-0.452544	one iteration of the loop
-0.642559	is independent of the loop
-0.350187	degree polynomial of the loop
-0.352303	point calculations and the loop
-0.456364	the integer in the loop
-0.353202	with 100 in the loop
-0.456716	iteration (except for the loop
-0.349790	occasionally predict that the loop
-0.349790	roughly estimate that the loop
-0.338737	a vector or the loop
-0.332195	But not if the loop
-0.468039	works only if the loop
-0.468039	less efficient if the loop
-0.468039	delay. But if the loop
-0.488447	be advantageous if the loop
-0.332195	loop further if the loop
-0.332195	the loops if the loop
-0.816587	rather than by the loop
-0.353855	is best when the loop
-0.338845	thousand times then the loop
-0.338845	carry flag then the loop
-0.350808	value than from the loop
-0.351765	or constant. If the loop
-0.490930	common situation where the loop
-0.330665	a check before the loop
-0.427977	placed immediately before the loop
-0.112721	to roll out the loop
-0.106739	we roll out the loop
-0.038823	by rolling out the loop
-0.340542	it fills up the loop
-0.510858	unrolled to avoid the loop
-0.338586	the branch inside the loop
-0.233900	The branch inside the loop
-0.186631	on calculations inside the loop
-0.186631	point calculations inside the loop
-0.244299	overflow condition inside the loop
-0.244299	because nothing inside the loop
-0.344954	induction variable unless the loop
-0.335419	the check after the loop
-0.764977	to turn off the loop
-0.245517	temporary variable outside the loop
-0.245517	last element outside the loop
-0.245517	for overflow outside the loop
-0.047573	reason to unroll the loop
-0.047573	worthwhile to unroll the loop
-0.288472	microprocessor can predict the loop
-0.288472	microprocessor can execute the loop
-0.047573	gain by unrolling the loop
-0.047573	dramatically by unrolling the loop
-0.481949	want to vectorize the loop
-0.232858	able to evaluate the loop
-0.232858	loop counter, comparing the loop
-0.232858	CPU to increment the loop
-0.232858	then FuncC. Unrolling the loop
-0.354103	when n is a loop
-0.134717	a function of a loop
-0.134717	linear function of a loop
-0.466478	moved out of a loop
-0.331052	The efficiency of a loop
-0.331052	the body of a loop
-0.331052	calculations. Division of a loop
-0.343874	many times in a loop
-0.444568	usually done in a loop
-0.343874	and semicolons in a loop
-0.343874	cycles. Calculations in a loop
-0.491389	general. Assume that a loop
-0.804129	For example, if a loop
-0.671112	efficient to use a loop
-0.470763	optimal to use a loop
-0.847698	efficient to make a loop
-0.315964	an integer. If a loop
-0.315964	be obtained. If a loop
-0.543057	units. For example, a loop
-0.529276	to roll out a loop
-0.336868	it is inside a loop
-0.110350	have to unroll a loop
-0.110350	necessary to unroll a loop
-0.120800	will usually unroll a loop
-0.288730	On many processors, a loop
-0.233085	compiler will vectorize a loop
-0.233085	the CPU. Unrolling a loop
-0.233085	operations for incrementing a loop
-0.293523	overlap the calculations of loop
-0.065665	jump to top of loop
-0.031593	a ; top of loop
-0.031593	ebx ; top of loop
-0.327494	= 2; } The loop
-0.327494	variable Z } The loop
-0.326384	critical innermost loop. The loop
-0.319958	from overlapping calculations. The loop
-0.471017	aligned or not. The loop
-0.309453	change the value. The loop
-0.233479	for AVX. 5. The loop
-0.233479	is clearly better. The loop
-0.233479	variable in eax. The loop
-0.233479	the value 1000. The loop
-0.233479	than mov eax,0. The loop
-0.233479	be changed freely. The loop
-0.233479	in example 7.30b. The loop
-0.333712	SIZE; r++) { // loop
-0.062513	r; c++) { // loop
-0.235430	loop through rows // loop
-0.291397	// return x^10 // loop
-0.314442	the induction variable as loop
-0.303473	} FuncC(i); } This loop
-0.303473	four sums } This loop
-0.312806	clock cycles, then this loop
-0.312806	compiler will optimize this loop
-0.337579	total calculation time. A loop
-0.378172	is predicted well. A loop
-0.378172	of branch prediction. A loop
-0.584847	where there is no loop
-0.293451	Calculate integer power using loop
-0.449194	condition The most efficient loop
-0.006814	objects // Roll out loop
-0.006814	c; // Roll out loop
-0.003394	_mm_set1_epi16(2); // Roll out loop
-0.006814	two(2,2,2,2,2,2,2,2); // Roll out loop
-0.116724	purpose of the while loop
-0.116724	zero in the while loop
-0.116724	to emulate the while loop
-0.337069	roll out a big loop
-0.236031	the diagonal. The c loop
-0.235632	loop is inside another loop
-0.088177	out of the innermost loop
-0.088976	used in the innermost loop
-0.088976	calls in the innermost loop
-0.088976	spent in the innermost loop
-0.088177	processors, only the innermost loop
-0.088177	but also the innermost loop
-0.141162	but outside the innermost loop
-0.201263	if the critical innermost loop
-0.201263	If the critical innermost loop
-0.120976	tasks. A critical innermost loop
-0.334696	T+6, and the whole loop
-0.307886	that have a special loop
-0.232296	i++ ;checkifi<100 ; repeat loop
-0.289532	explained above, the maximum loop
-0.309733	the same. The maximum loop
-0.203239	called from the message loop
-0.203239	typically in a message loop
-0.282932	such as simple variables, loop
-0.222293	libraries available use excessive loop
-0.212142	has disadvantages: The unrolled loop
-0.199875	unroll too much. Excessive loop
-0.199875	only for avoiding infinite loop
-0.251171	= 0; // Initialize loop
-0.199875	+ log(c[i]); // Increment loop
-0.165024	variables are temporary intermediates, loop
-0.165024	15.1c. Calculate integer power, loop
-0.165024	three advantages: The i<20 loop
-0.165024	__try { // Main loop
-0.237546	data #ifdef _MSC_VER // If
-0.335847	= Func1(2); ... } If
-0.236427	causes another exception. 64 If
-0.451536	with floating point code. If
-0.221704	for the error code. If
-0.221704	gives the simplest code. If
-0.275817	always for application-specific code. If
-0.235837	of the virtual function. If
-0.351344	around in program memory. If
-0.268369	is loaded into memory. If
-0.287674	efficiently than static memory. If
-0.403565	array can be used. If
-0.695984	in the data cache. If
-0.221981	be a level-3 cache. If
-0.499993	bytes in 64-bit systems. If
-0.221919	IDE on some systems. If
-0.220550	and is not efficient. If
-0.274510	they are equally efficient. If
-0.299018	the desired instruction set. If
-0.299018	the corresponding instruction set. If
-0.206210	lines in each set. If
-0.233741	expensive in some compilers. If
-0.540908	member function is called. If
-0.476920	etc.) inside the loop. If
-0.436198	need a smart pointer. If
-0.318533	access and cache size. If
-0.339783	with many function calls. If
-0.738519	especially in 32-bit mode. If
-0.580171	reference to the object. If
-0.279143	a different function library. If
-0.207893	entire floating point library. If
-0.341807	takes 40 clock cycles. If
-0.205601	in a different thread. If
-0.338368	5 by another thread. If
-0.373081	branch for test purposes. If
-0.692139	in the following way. If
-0.523567	elements in a vector. If
-0.370919	lookups for local references. If
-0.229733	d; d = u; If
-0.284564	a specific load address. If
-0.364018	registers in the CPU. If
-0.199479	on a non-Intel CPU. If
-0.199553	caching becomes a problem. If
-0.284743	ways around this problem. If
-0.228651	in non- sequential order. If
-0.228477	far procedures are inefficient. If
-0.283496	the statement was executed. If
-0.424089	of the template parameter. If
-0.228477	use big endian storage. If
-0.282387	in another source file. If
-0.368372	or in a register. If
-0.328188	cache and execution units. If
-0.227402	here is a branch. If
-0.184173	in the above table. If
-0.184173	a procedure linkage table. If
-0.233654	processes or threads simultaneously. If
-0.184274	simultaneously or seemingly simultaneously. If
-0.260622	preferably be an integer. If
-0.184274	to the nearest integer. If
-0.226023	will be loaded anyway. If
-0.226135	explained on page 16. If
-0.184173	as a 32-bit number. If
-0.184173	an 8-bit signed number. If
-0.224192	is large or constant. If
-0.582279	explanation of branch prediction. If
-0.278929	in a particular application. If
-0.408355	reordering the data members. If
-0.583635	a long dependency chain. If
-0.175935	reused again and again. If
-0.375911	truncation and back again. If
-0.735576	ranges do not overlap. If
-0.224321	Making too many branches. If
-0.432011	therefore difficult to maintain. If
-0.450759	available in the future. If
-0.164659	not any other factor. If
-0.541644	by the unroll factor. If
-0.221641	threads with lower priority. If
-0.360502	asmlib library at www.agner.org/optimize/asmlib.zip. If
-0.221641	that may be necessary. If
-0.384274	2; Common subexpression elimination If
-0.221641	at different memory addresses. If
-0.221641	with j << 5. If
-0.271445	model will work better. If
-0.217842	and resources cleaned up. If
-0.300950	the variable is declared. If
-0.499629	the program is running. If
-0.629918	runtime type identification (RTTI) If
-0.148276	most predictable operand first. If
-0.148276	data members come first. If
-0.217842	memory pool. See www.agner.org/optimize/cppexamples.zip. If
-0.217842	than non-object oriented programs. If
-0.148276	gives more reliable results. If
-0.193515	reliable and reproducible results. If
-0.264371	multiplication and an addition. If
-0.211584	65 bytes of code). If
-0.487733	to recover from errors. If
-0.013530	leaving the AVX part. If
-0.346510	mentioned in chapter 12. If
-0.211584	small or too long. If
-0.283496	are exactly the same. If
-0.264371	different cores is slow. If
-0.211584	lines from set 0x1C. If
-0.199332	the calls to CriticalFunction. If
-0.199332	from 0 to 15. If
-0.199332	in the following cases: If
-0.329879	0) *(p++) |= 0x20; If
-0.199332	examples for these methods. If
-0.199332	the processor has hyperthreading. If
-0.074517	in a FIFO manner? If
-0.074517	in a FILO manner? If
-0.329879	code difficult to read. If
-0.250561	that can be obtained. If
-0.199332	creating and deleting containers. If
-0.250561	more than 250 ms. If
-0.008645	objects have been added? If
-0.199332	to do so. 58 If
-0.329879	registers (see page 105). If
-0.329879	in Linux and BSD. If
-0.164524	latest instruction set extensions. If
-0.164524	instead of a macro. If
-0.164524	the rightmost 1-bit removed. If
-0.164524	known at compile time? If
-0.164524	of the same class). If
-0.164524	have very different speeds. If
-0.164524	advice given above. 7. If
-0.164524	1.5f}; a = lookup[b]; If
-0.164524	("int 3"); or __debugbreak();. If
-0.164524	Are objects numbered consecutively? If
-0.164524	as n! = n∙(n-1)!. If
-0.164524	explained on page 62. If
-0.164524	the software was coded. If
-0.164524	identified by a key? If
-0.164524	solution is more complicated. If
-0.164524	compression and cryptography (www.intel.com). If
-0.164524	resources locally or remotely. If
-0.164524	have a natural ordering? If
-0.164524	an executable file stub. If
-0.164524	long time to calculate. If
-0.164524	in x86 systems). 42 If
-0.164524	by using nontemporal writes. If
-0.164524	form a logical sequence. If
-0.164524	are: Coarse time measurement. If
-0.164524	little data for analysis. If
-0.164524	one or multiple elements? If
-0.164524	through pointers or references: If
-0.164524	in a vector. 6. If
-0.164524	feed into the pipeline. If
-0.164524	first element is stored? If
-0.164524	return IntegerPower<10>(x); } 152 If
-0.164524	/ (number of ways). If
-0.164524	mutexes, etc. is considerable. If
-0.164524	? 1.0f : 2.5f; If
-0.164524	list[i+1];} sum1 += sum2; If
-0.164524	has already been allocated. If
-0.256572	set?". A list of which
-0.581318	a negative list of which
-0.474766	a positive list of which
-0.518506	values. The choice of which
-0.236597	no specific recommendation of which
-0.236597	give some indication of which
-0.236597	be misleading reports of which
-0.292947	compiler to do and which
-0.236792	should be allowed and which
-0.236792	constructs are costly and which
-0.236792	in some situations, and which
-0.182807	of the function in which
-0.107106	in the function in which
-0.107106	from the function in which
-0.107106	inside the function in which
-0.493031	range of code in which
-0.002949	in the order in which
-0.008908	usually the order in which
-0.008908	reflects the order in which
-0.008908	controlling the order in which
-0.056417	51). The order in which
-0.290124	to the thread in which
-0.234311	the {} brackets in which
-0.533760	example is a function which
-0.346602	expression contains a function which
-0.429053	is a library function which
-0.535156	as a member function which
-0.598541	F1 calls another function which
-0.234120	library or API function which
-0.310308	list of processors on which
-0.347516	should be based on which
-0.023239	of processor models on which
-0.234196	are different opinions on which
-0.407605	with an error code which
-0.237506	is a new compiler which
-1.198956	known at compile time which
-0.313997	stored in stack memory which
-0.237337	The memory address at which
-0.237261	making sure that functions which
-0.459764	counter inside the CPU which
-0.420373	is a template class which
-0.335090	bits of a double which
-0.753639	through a function pointer which
-0.232939	have a 'this' pointer which
-0.308810	an implicit 'this' pointer which
-0.236920	big floating point library which
-0.284248	linear function of i which
-0.421275	new value of i which
-0.933910	in a shared object which
-0.350850	NOT on a variable which
-0.867138	at a memory address which
-0.232268	at a higher address which
-0.236568	Consider the following example, which
-0.236500	as a single bit which
-0.337569	in a vector register which
-0.535710	want to find out which
-0.846196	by the operating system which
-0.236017	require precision conversion instructions which
-0.236135	are various profilers available which
-0.438881	124 necessary information about which
-0.222935	any specific recommendation about which
-0.222935	a considerable debate about which
-0.235962	the oldest Pentium CPUs which
-0.291657	are using 8-bit integers which
-0.701201	if it is known which
-0.235366	with full debugging support which
-0.332936	used. We can calculate which
-0.234595	using the | operator which
-0.293578	critical code to see which
-0.382053	it possible to see which
-0.211052	different libraries and see which
-0.233757	many keywords and directives which
-0.233710	8.1 (page 77) shows which
-0.233350	is a complicated process which
-0.319512	an 9 extra overhead which
-0.326590	run with a profiler which
-0.232167	by a shift operation which
-0.627462	part of the code, which
-0.264713	into an intermediate code, which
-0.231578	example has three conditions which
-0.286791	very useful when testing which
-0.144369	in order to predict which
-0.302391	is difficult to predict which
-0.144369	advanced algorithms to predict which
-0.144369	is unable to predict which
-0.297315	And it is discussed which
-0.203105	it is also discussed which
-0.230424	is important to consider which
-0.371118	with Intel C++ compiler, which
-0.284930	Intel before it checks which
-0.326384	a critical dependency chain which
-0.227705	make the access non-sequential which
-0.226257	with a special trick which
-0.311050	as two 32-bit integers, which
-0.226166	the cache line size, which
-0.226257	The older MMX registers, which
-0.278796	program should automatically detect which
-0.224758	spend time on deciding which
-0.221908	chain has a latency which
-0.159325	when a is true, which
-0.159325	when b is true, which
-0.148365	minutes to start up, which
-0.148365	it is filled up, which
-0.307221	doesn't know in advance which
-0.217982	(0 < 5) {} which
-0.551067	lazy binding by default, which
-0.283659	extra level of abstraction which
-0.211722	called whole program optimization, which
-0.211722	two floating point comparisons, which
-0.098469	objects on the stack, which
-0.172544	stored on the stack, which
-0.211722	Standard Template Library (STL) which
-0.211722	is important to decide which
-0.211722	includes pointers and references, which
-0.199467	the bitwise OR operator, which
-0.199467	adding -100 to -56 which
-0.199467	the CPU family number, which
-0.199467	counter in the CPU, which
-0.199467	time at unpredictable intervals which
-0.199467	end of the array, which
-0.250712	based on an interpreter which
-0.199467	pointers requires a division, which
-0.199467	only an integer comparison, which
-0.250712	for the loop counter, which
-0.199467	A dispatcher function decides which
-0.250712	has a garbage collector which
-0.250712	to predict with certainty which
-0.074567	a single & operation, which
-0.074567	with a shift operation, which
-0.164647	an object is moved, which
-0.164647	likely to be mispredicted, which
-0.164647	dynamic link library (DLL) which
-0.164647	is replaced by x<<3, which
-0.164647	get the generic branch, which
-0.164647	the function library asmlib, which
-0.164647	implementing a compile-time polymorphism, which
-0.164647	Linux have an attribute which
-0.164647	objects for intermediate results, which
-0.164647	the performance even matters, which
-0.164647	(Windows: /Gy, Linux: -ffunction-sections) which
-0.164647	64-bit addresses for everything, which
-0.164647	an assembly language output, which
-0.164647	is Visual Basic .NET, which
-0.164647	the elements in a[] which
-0.164647	calculation requires n-1 multiplications, which
-0.164647	to make a bit-mask which
-0.164647	second induction variable (eax) which
-0.164647	the newest CPU model, which
-0.164647	file by calling WritePrivateProfileString, which
-0.164647	runtime type identification (RTTI), which
-0.164647	registers (XMM or YMM) which
-0.164647	with the constant 2.5, which
-0.357309	address [ecx+eax*4]. This is all
-0.355031	a bit-mask which is all
-0.772195	efficient. The size of all
-0.621972	the combined size of all
-0.512757	show the values of all
-0.714787	to take care of all
-0.463064	we get rid of all
-0.406568	setting an array to all
-0.345458	to keep pointers to all
-0.345458	give you access to all
-0.293010	adds extra information to all
-0.236848	the static keyword to all
-0.734956	can be applied to all
-0.605044	the executable file and all
-0.236918	one for AVX2 and all
-0.236918	the if statement and all
-0.236918	0 is true, and all
-0.695804	registers are available in all
-0.313367	the same precision in all
-0.292912	performance monitor counters in all
-0.313850	are also deallocated in all
-0.236762	multiplication are permissible in all
-0.375384	block of memory for all
-0.559987	process is used for all
-0.288130	choose this method for all
-0.232557	on the stack for all
-0.232557	solution is best for all
-0.437619	same compiler option for all
-0.508990	calls to check for all
-0.232557	dynamic memory allocation for all
-0.288130	can be made for all
-0.369029	a good choice for all
-0.226067	very good choice for all
-0.288130	makes a PLT for all
-0.232557	and a GOT for all
-0.232557	r1 and c1 for all
-0.232557	C++ compilers exist for all
-0.458217	from this is that all
-0.429069	cannot be sure that all
-0.532957	system makes sure that all
-0.831128	It is important that all
-0.543631	etc. This means that all
-0.321533	hardware often requires that all
-1.127200	in the sense that all
-0.289435	C/C++ standard specifies that all
-0.289435	version. 2. Check that all
-0.233705	CPUs to verify that all
-0.233705	is no guarantee that all
-0.340532	to F1 only if all
-0.475545	before the loop if all
-0.426040	is most efficient if all
-0.333163	clock cycles. But if all
-0.234160	value is zero if all
-0.234160	than at runtime if all
-0.341462	the time used by all
-0.236159	contrary, you should by all
-1.147596	set is supported by all
-0.229751	Organize the data with all
-0.284942	a release version with all
-0.229751	were carried out with all
-0.229751	the command line with all
-0.229751	This method works with all
-0.476873	will be compatible with all
-0.229751	point calculations. Even with all
-0.090864	that is AND'ed with all
-0.099856	(VML, MKL). Works with all
-0.099856	Primitives (IPP). Works with all
-0.282573	the advanced version on all
-0.205023	to do operations on all
-0.205023	float. Similar operations on all
-0.227664	for inline assembly on all
-0.342828	sure to work on all
-0.261322	Keywords that work on all
-0.282573	one that works on all
-0.166765	set is supported on all
-0.166765	standardized and supported on all
-0.409685	that works well on all
-0.321718	using and turn on all
-0.227664	should work efficiently on all
-0.227664	'this' is incurred on all
-0.237534	this function, though not all
-0.237488	in the end when all
-0.235963	the above example, then all
-0.292004	R values first, then all
-0.232507	speed or not at all
-0.310459	frameworks are used at all
-0.232507	is not evaluated at all
-0.232507	not be visible at all
-0.510948	(4096). This will make all
-0.233773	float or double because all
-0.233773	use hexadecimal numbers because all
-0.289512	are quite costly because all
-0.292879	Is searching needed before all
-0.313249	program that can call all
-0.355410	compile time. For example, all
-0.348198	in order to test all
-0.235978	old Pentium 4, while all
-0.508719	defined. This can cause all
-0.235376	c and d would all
-0.337613	safety is to store all
-0.280161	safety, you may store all
-0.235171	and 0x4700. These addresses all
-0.675394	the compiler can replace all
-0.222762	the number and sets all
-0.222762	array. The constructor sets all
-0.234660	the exception handler needs all
-0.343973	4.; }; // Make all
-0.290134	144 The above examples all
-0.290134	G values, and last all
-0.258687	needed, but only after all
-0.206547	Sort the array after all
-0.206547	Is searching needed after all
-0.233992	system may not load all
-0.310736	used for turning off all
-0.195300	the newest processors. Supports all
-0.195300	library. Open source. Supports all
-0.195300	and possible workaround. Supports all
-0.042158	leaf function by inlining all
-0.232488	whole software package, including all
-0.232434	the loop without checking all
-0.232184	a register and prevents all
-0.287029	is zero by testing all
-0.327194	copied simply by copying all
-0.208236	in the list causes all
-0.208236	the critical stride causes all
-0.863178	is the reason why all
-0.229848	same shared object. Obviously, all
-0.228866	not possible to contain all
-0.229086	objects. STL vector stores all
-0.228954	and parameter transfer across all
-0.085010	is used in almost all
-0.085010	to Linux in almost all
-0.226410	the stack pointer. Likewise, all
-0.226522	if F1 has saved all
-0.314934	you need to remove all
-0.224642	#include directives and declare all
-0.222102	be possible to select all
-0.222023	one operator less. Fortunately, all
-0.218222	AMD FMA4 fma4intrin.h (Gnu) all
-0.212085	can test or manipulate all
-0.211958	for the C++ language, all
-0.056934	antivirus program that scans all
-0.056934	virus scanner that scans all
-0.212085	This does not solve all
-0.212085	code for vectorization Not all
-0.250970	optimization is to join all
-0.250970	don't have to distribute all
-0.164859	more efficient to pool all
-0.164859	disturbing influences are removed, all
-0.164859	requires that you analyze all
-0.503012	table inside a function but
-0.517037	that takes more time but
-0.428100	The size of all but
-0.293209	originally designed by Intel but
-0.237033	p is not i but
-0.340774	the advantages of C++ but
-0.236869	have no specific order but
-0.236518	a very contrived example, but
-0.535455	the program under test but
-0.351107	never interrupt the user but
-0.292171	has four physical processors but
-0.330937	as -(-a) = a, but
-0.291589	Catch floating point overflow but
-0.234555	Windows and Mac programs but
-0.464591	time in most cases, but
-0.453967	iterator in some cases, but
-0.348260	in the simplest cases, but
-0.290258	not require a multiplication but
-0.222458	large static arrays automatically but
-0.222458	14.14a with 14.14b automatically but
-0.250813	copy of the function, but
-0.199557	of an optimized function, but
-0.199557	inlining the latter function, but
-0.134764	support for intrinsic functions, but
-0.134764	compiler supports intrinsic functions, but
-0.173829	library contains similar functions, but
-0.173829	log are pure functions, but
-0.211807	make more efficient code, but
-0.211807	of removing superfluous code, but
-0.164750	most of the time, but
-0.164750	known at compile time, but
-0.164750	pow at compile- time, but
-0.164750	of the programmers' time, but
-0.293734	dynamic linking is used, but
-0.250751	sets can be used, but
-0.613740	the SSE2 instruction set, but
-0.259118	any higher instruction set, but
-0.199643	size on AMD processors, but
-0.199643	on Intel Atom processors, but
-0.368409	stack in 32-bit systems, but
-0.156586	frequency than other CPUs, but
-0.202747	on certain Intel CPUs, but
-0.202747	core on multi-core CPUs, but
-0.227431	floating point register variables, but
-0.227620	2, 4 or 8, but
-0.337085	a few clock cycles, but
-0.226375	than 0 or 1, but
-0.390705	is stored in memory, but
-0.320224	a protected operating system, but
-0.280745	to define 64-bit integers, but
-0.314947	extra time, of course, but
-0.224221	in example 16.2 above, but
-0.132626	function. This is efficient, but
-0.132626	is fast and efficient, but
-0.132626	is also quite efficient, but
-0.224346	[edx] adds, not edx but
-0.221819	in the general case, but
-0.221819	use a function library, but
-0.221670	area of system programming, but
-0.114308	communication between different threads, but
-0.097535	it into multiple threads, but
-0.236498	shared between multiple threads, but
-0.275947	compiler with many features, but
-0.305590	in other programming languages, but
-0.217871	is based on BSD, but
-0.148294	other platforms as well, but
-0.148294	Studio optimizes reasonably well, but
-0.533103	search facilities are needed, but
-0.355111	precision or double precision, but
-0.300985	function or hot spot but
-0.218056	b is a float, but
-0.355111	executed as it is, but
-0.271478	time in the unit-test but
-0.193536	p is a pointer, but
-0.148294	through an imported pointer, but
-0.211613	represented with 64 bits, but
-0.211855	optimized Intel function libraries, but
-0.264404	relevant to small devices, but
-0.211613	names and model numbers, but
-0.211613	ZMM registers by 64, but
-0.264404	costly when it occurs, but
-0.211613	do such optimizations automatically, but
-0.264404	assembly output more readable but
-0.211613	flush and fence instructions, but
-0.264404	algorithm with template metaprogramming, but
-0.211613	integer division in vectors, but
-0.199360	etc.) have little-endian storage, but
-0.199360	other types of expressions, but
-0.199360	calculating the logarithm again, but
-0.074528	used in multiple applications, but
-0.074528	efficient for such applications, but
-0.250592	can still be vectorized, but
-0.199360	can reduce any expression, but
-0.074528	no overflow can occur, but
-0.074528	the error doesn't occur, but
-0.199360	advantage to using hyperthreading, but
-0.199360	the program under test, but
-0.199360	marketing of 64-bit software, but
-0.250592	source code more complex, but
-0.464928	the program is loaded, but
-0.199360	is safe and flexible, but
-0.199360	addresses that need relocation, but
-0.199360	floating point addition unit, but
-0.199360	a manual on usability, but
-0.250592	scope of this manual, but
-0.250592	a simple test setup but
-0.199360	object of known type, but
-0.199360	needed for other reasons, but
-0.199360	is the simplest method, but
-0.164549	calculation of the factorials, but
-0.164549	for finding hot spots, but
-0.164549	than by a macro, but
-0.164549	libraries (.lib or .a), but
-0.164549	difference less than 2-20, but
-0.164549	to compile with -mcmodel=large, but
-0.164549	more references to relocate, but
-0.164549	statements (called static if), but
-0.164549	only one CPU core, but
-0.164549	very large data bases, but
-0.164549	enabled. A more primitive, but
-0.164549	when called from main, but
-0.164549	the same memory block, but
-0.164549	not take the hint, but
-0.164549	done by me manually, but
-0.164549	with the option -ftrapv, but
-0.164549	address below 2 GB, but
-0.164549	problems associated with profiling, but
-0.164549	compiler (see page 103), but
-0.164549	like -(-a) very often, but
-0.164549	the sake of security, but
-0.164549	is a simple solution, but
-0.164549	in a particular situation, but
-0.164549	relieving a syntax restriction, but
-0.164549	the arrays as required, but
-0.164549	(double)(signed int)u; // Faster, but
-0.164549	because of disk caching, but
-0.164549	enough to be noticeable but
-0.164549	into many small subtasks, but
-0.164549	making the container expandable, but
-0.164549	not overlapping or aliasing, but
-0.164549	reduced 15.1b to 15.1c, but
-0.164549	if it is cached, but
-0.164549	course a considerable job, but
-0.164549	niche in scientific computing, but
-0.164549	a simple type casting, but
-0.164549	in the code section, but
-0.164549	other methods of rounding, but
-0.164549	size parameter is wrong, but
-0.164549	on its final destination, but
-0.164549	to override public symbols, but
-0.164549	on the Mac platform, but
-0.329476	CPU dispatching and is used
-0.461475	A function that is used
-0.406231	the one that is used
-0.313197	or class that is used
-0.313197	important method that is used
-0.313197	stack unwinding that is used
-0.454425	each time it is used
-0.516639	stack before it is used
-0.453823	below. Position-independent code is used
-0.536881	0 // This is used
-0.355712	first-in-last-out fashion. It is used
-0.335532	debugging support which is used
-0.335532	special trick which is used
-0.282870	the code cache is used
-0.282870	entire level-1 cache is used
-0.325151	so-called virtual table is used
-0.308267	where one thread is used
-0.232483	(1985). This standard is used
-0.232483	mode. 16-bit mode is used
-0.594761	the loop counter is used
-0.449749	Extra memory space is used
-0.205120	The dynamic_cast operator is used
-0.205120	The const_cast operator is used
-0.205120	The reinterpret_cast operator is used
-0.308267	the inline keyword is used
-0.169551	GOT lookup process is used
-0.169551	this delaying process is used
-0.326505	indirect function feature is used
-0.232483	If a bool is used
-0.232483	standard stack frame is used
-0.288046	The following algorithm is used
-0.232483	preprocessing macro INSTRSET is used
-0.232483	the function longjmp is used
-0.237797	becoming more popular and used
-0.517258	alternatives that can be used
-0.313618	prefetch instruction can be used
-0.488115	YMM) which can be used
-0.313618	template class can be used
-0.313618	these compilers can be used
-0.406752	same register can be used
-0.243377	This method can be used
-0.157401	same method can be used
-0.157401	similar method can be used
-0.313618	table lookup can be used
-0.313618	A union can be used
-0.313618	These conversions can be used
-0.313618	how metaprogramming can be used
-0.313618	hash map can be used
-0.313618	test tool can be used
-0.313618	these manuals can be used
-0.313618	Such units can be used
-0.313618	following guidelines can be used
-0.473442	instructions that may be used
-0.309899	An integer may be used
-0.309899	www.agner.org/optimize/cppexamples.zip. These may be used
-0.309899	following methods may be used
-0.309899	unwinding mechanism may be used
-0.309899	parameter. Templates may be used
-0.309899	binary tree may be used
-0.338214	which variables will be used
-0.319964	unrolling should only be used
-0.441329	multiple versions should be used
-0.228320	it can also be used
-0.331496	It can also be used
-0.228320	systems can also be used
-0.228320	union can also be used
-0.220504	coprocessor might also be used
-0.316107	MOVNTQ instruction cannot be used
-0.316107	program optimization cannot be used
-0.276090	denominator can even be used
-0.409751	binding should therefore be used
-0.384717	above can still be used
-0.228946	of code that are used
-0.304060	of data that are used
-0.428661	the functions that are used
-0.268130	if functions that are used
-0.268130	several functions that are used
-0.177603	the variables that are used
-0.177603	that variables that are used
-0.430013	function libraries that are used
-0.198528	are: Variables that are used
-0.088017	9.4 Variables that are used
-0.046932	9.3 Functions that are used
-0.228946	so-called iterators that are used
-0.527347	where the data are used
-0.315055	If virtual functions are used
-0.315055	functions Virtual functions are used
-0.325197	and directives which are used
-0.338241	when Intel libraries are used
-0.311576	deleted. Smart pointers are used
-0.342639	the function they are used
-0.325197	The x86 processors are used
-0.301271	my optimization manuals are used
-0.226599	vector. These units are used
-0.301271	such runtime frameworks are used
-0.281365	7.29 Threads Threads are used
-0.237751	{ return ipow(x,10); // used
-0.339291	less important than it used
-0.063527	PLT tables are not used
-0.426312	b; Here, I have used
-0.329333	particularly tricky. I have used
-0.355221	but also the time used
-0.349695	user input. The time used
-0.236143	name. #define directives when used
-0.236143	to const definitions when used
-0.449747	not free the memory used
-0.235854	code and data memory used
-0.293708	code size or data used
-0.348641	the variable is only used
-0.459929	memory inside the CPU used
-0.540661	compilation of the most used
-0.439719	access to the most used
-0.302363	static memory is also used
-0.125383	This mechanism is also used
-0.125383	unwinding mechanism is also used
-0.236988	the cache lines we used
-0.232532	Induction variables are often used
-0.232532	relative addresses are often used
-0.232532	behaviors. Arrays are often used
-0.034016	and the most often used
-0.034016	put the most often used
-0.034016	choose the most often used
-0.208807	source file. Keep often used
-0.306346	macro, but the method used
-0.335138	than pow The method used
-0.344782	some experience to get used
-0.311318	cache line that was used
-0.416938	amount of cache space used
-0.234877	layers and frameworks typically used
-0.310847	for the memory model used
-0.271220	members that are never used
-0.201160	if they are never used
-0.404610	that are no longer used
-0.231787	fast as additions. When used
-0.227990	hybrid solutions are now used
-0.227990	nearby branches. The algorithms used
-0.226582	as if you had used
-0.226614	so important and generally used
-0.224749	return from Func 87 used
-0.222194	used. The method currently used
-0.020499	variables The most commonly used
-0.020499	purposes. The most commonly used
-0.088533	There are two commonly used
-0.148626	functions separate from seldom used
-0.148626	functions, and put seldom used
-0.212125	early implementations of Pascal used
-0.165009	in embedded systems Microcontrollers used
-0.501532	branch, which is the one
-0.324888	function may be the one
-0.355780	other modules than the one
-0.237515	intermediate code like the one
-0.758341	order to find the one
-0.350996	10 Gnu This is one
-0.350996	VIA CPUs"). This is one
-0.293350	performing software product is one
-0.237147	them enabled (there is one
-0.237147	at runtime. Polymorphism is one
-0.237809	member functions counts a one
-0.472831	in the calculation of one
-0.472831	start the calculation of one
-0.483703	through a pointer to one
-0.483703	converting a pointer to one
-0.237039	give higher priority to one
-0.406851	p is identical to one
-0.331031	functions often belong to one
-0.528974	latest instruction set and one
-0.439877	one from Intel and one
-0.603199	the executable file and one
-0.292189	names, one global and one
-0.441478	by the program, and one
-0.292189	one for SSE4.1 and one
-0.236127	between CPU brands, and one
-0.321680	or write it in one
-0.482144	type short int in one
-0.463536	of an integer in one
-0.290782	a parent class in one
-0.333131	four R value in one
-0.564739	should be stored in one
-0.234890	accessed through pointers in one
-0.328896	many objects together in one
-0.290782	of register temp in one
-0.609726	store all strings in one
-0.234890	four consecutive terms in one
-0.290782	do four additions in one
-0.324688	too much data for one
-0.293583	calculate element addresses for one
-0.355432	separate threads so that one
-0.714273	of making sure that one
-0.339211	though. Some instructions are one
-0.293844	conversion takes zero or one
-0.351162	are negative or if one
-0.236894	Replacing two comparisons by one
-0.236894	be dynamically created by one
-0.443886	A clock cycle on one
-0.331617	can execute a code one
-0.353324	part 142 unsigned int one
-0.243463	there is more than one
-0.243463	accessed in more than one
-0.243463	register for more than one
-0.243463	or do more than one
-0.243463	to load more than one
-0.243463	cannot prefetch more than one
-0.424983	then you can have one
-0.333246	memory and will have one
-0.235001	__declspec(thread). Such variables have one
-0.532553	block. Do not use one
-0.330136	class library will use one
-0.458544	is called only from one
-0.231018	the memory block from one
-0.100326	ownership is transferred from one
-0.100326	copied or transferred from one
-0.231018	must be saved from one
-0.525425	when a program has one
-0.233965	the pipeline structure has one
-0.233965	though the latter has one
-0.538334	may want to make one
-0.345278	thread-specific data and make one
-0.368512	here is the only one
-0.446551	if there is only one
-0.288394	have one and only one
-0.202551	identical so that only one
-0.089573	there may be only one
-0.089573	There should be only one
-0.186997	in systems with only one
-0.186997	a system with only one
-0.089573	you will have only one
-0.089573	Current CPUs have only one
-0.202551	is called from only one
-0.164484	member function has only one
-0.108756	a template has only one
-0.108756	the computer has only one
-0.202551	compiler will make only one
-0.202551	expressions. For example, only one
-0.186997	instructions that take only one
-0.186997	shift operations take only one
-0.485801	branch is mispredicted only one
-0.202551	is to hold only one
-0.235347	of branch prediction. If one
-0.291302	predictable operand first. If one
-0.334377	specific recommendation of which one
-0.233223	to find out which one
-0.309149	libraries and see which one
-0.290462	this solution is using one
-0.817147	be improved by using one
-0.303367	all source files into one
-0.223635	must be read into one
-0.223635	multiple .cpp modules into one
-0.312279	to join them into one
-0.223635	0x2700 to 0x273F into one
-0.363036	will be joined into one
-0.237114	present manual is number one
-0.236968	making two threads where one
-0.313578	unsigned. This typically takes one
-0.355408	of jobs. For example, one
-0.292319	'this' pointer takes up one
-0.341973	that goes many times one
-0.235790	be kept entirely inside one
-0.417725	then you will get one
-0.097063	The code section needs one
-0.097063	writable data section needs one
-0.234169	16.2 above, but read one
-0.619244	A branch that goes one
-0.506179	operations. You may choose one
-0.216270	is translated to just one
-0.269667	32 AND-operations in just one
-0.233305	the CPU detection function, one
-0.319411	for example, to go one
-0.231949	Typically it should save one
-0.844944	result of the preceding one
-0.365636	constant to the preceding one
-0.332602	16 bytes by adding one
-0.263472	class has at least one
-0.263472	that calls at least one
-0.286422	smaller squares and handle one
-0.230651	set up and enable one
-0.604129	modified by the program, one
-0.815141	the SSE2 instruction set, one
-0.332250	more efficient to allocate one
-0.311408	Here we can eliminate one
-0.226404	classes are currently available, one
-0.222018	addition units, and 22 one
-0.222097	arbitrary cache line. Only one
-0.218216	or more integer units, one
-0.211952	(doubly ended queue) allocates one
-0.199690	branch that goes randomly one
-0.199690	be compiled three times, one
-0.164854	( 1)sign 2exponent 16383 one
-0.164854	in Day for signifying one
-0.164854	a specific purpose: Contain one
-0.164854	to i and shifts one
-0.164854	divided into three parts: one
-0.164854	have just two branches: one
-0.164854	the variable two names, one
-0.164854	10 elements were inserted, one
-0.357366	is already in the cache
-1.143279	disadvantage is that the cache
-0.348002	and arrays by the cache
-0.819242	addresses divisible by the cache
-0.732228	be less than the cache
-0.581890	is bigger than the cache
-0.367997	loop bigger than the cache
-0.339964	be evicted from the cache
-0.339964	be fetched from the cache
-0.633421	execution time because the cache
-0.345502	column 28 because the cache
-0.447136	each set. If the cache
-0.345909	nontemporal writes. If the cache
-0.351949	means that all the cache
-0.352612	address again before the cache
-0.470769	0x2710 will cause the cache
-0.312754	code that uses the cache
-0.380521	by at least the cache
-0.236248	cache will evict the cache
-0.236248	in column 28, the cache
-0.721394	My example is a cache
-0.356476	the beginning of a cache
-0.491286	buffer is also a cache
-0.236236	same address so a cache
-0.292313	to know how a cache
-0.323324	0x2710 will cause a cache
-0.048122	memory without loading a cache
-0.048122	double without loading a cache
-0.236236	used for fetching a cache
-0.236236	program of occupying a cache
-0.348233	half speed because of cache
-0.345578	calculate which set of cache
-0.355557	a realistic number of cache
-0.713464	take a lot of cache
-0.496979	save a lot of cache
-0.538029	minimize the amount of cache
-0.291980	(en.wikipedia.org/wiki/L2_cache). The details of cache
-0.291980	version. The penalty of cache
-0.458906	is a waste of cache
-0.325496	cause a waste of cache
-0.235943	or three levels of cache
-0.331825	the time goes to cache
-0.324855	than memory access and cache
-0.382249	supported instruction sets and cache
-0.692886	in the cache. The cache
-0.324299	on contemporary processors. The cache
-0.293221	same cache line. The cache
-0.355523	and there will be cache
-0.237634	is file access or cache
-0.293945	elements // align by cache
-0.435258	addition to the code cache
-0.312907	space in the code cache
-0.450056	happen in the code cache
-0.514063	so that the code cache
-0.422686	save time. The code cache
-0.326431	stored together The code cache
-0.230828	very likely that code cache
-0.230828	the linker. Both code cache
-0.355951	same resources, such as cache
-0.324876	big and uses more cache
-0.291992	code and data A cache
-0.235953	Access data sequentially A cache
-0.532096	efficiency of the data cache
-0.294246	cache use and data cache
-0.294246	code cache and data cache
-0.318977	array index. The data cache
-0.108937	than the level-1 data cache
-0.108937	where the level-1 data cache
-0.118631	is a level-1 data cache
-0.237345	go into eight different cache
-0.162776	writing to the same cache
-0.146178	contend for the same cache
-0.530963	28 share the same cache
-0.431402	of sharing the same cache
-0.237367	in Wikipedia under CPU cache
-0.381958	instructions. There are other cache
-0.237352	from Func 87 used cache
-0.740415	and there are no cache
-0.799072	Smaller microcontrollers have no cache
-0.237227	the program has most cache
-0.237004	more efficient today where cache
-0.236890	cache from loading any cache
-0.352402	to load a new cache
-0.349633	Every fourth of these cache
-0.035338	Store 4 bytes without cache
-0.035338	Store 8 bytes without cache
-0.003787	Store 16 bytes without cache
-0.075976	branches that take up cache
-0.075976	instances that take up cache
-0.343133	will have an extra cache
-0.493911	stack. This can cause cache
-0.424386	because it may cause cache
-0.050505	one of the four cache
-0.291486	There are only four cache
-0.335615	least at the last cache
-0.219293	size of 64. Each cache
-0.219293	with line 29. Each cache
-0.597977	in order to improve cache
-0.246568	occur in the level-2 cache
-0.160018	Contentions in the level-2 cache
-0.069379	stride for the level-2 cache
-0.069379	is that the level-2 cache
-0.069379	bigger than the level-2 cache
-0.069379	cache from the level-2 cache
-0.069379	instruction prevents the level-2 cache
-0.023515	Kbytes and a level-2 cache
-0.023515	only if, a level-2 cache
-0.048377	level-2 cache. The level-2 cache
-0.048377	much stronger for level-2 cache
-0.048377	// Check if level-2 cache
-0.478413	loading of the code, cache
-0.287965	else { // No cache
-0.165171	multiple of the level-1 cache
-0.317420	Contentions in the level-1 cache
-0.165171	blocking for the level-1 cache
-0.165171	to reload the level-1 cache
-0.132857	contentions than for level-1 cache
-0.132857	almost the entire level-1 cache
-0.307767	saved in a special cache
-0.331462	you want to prevent cache
-0.307740	if it can save cache
-0.230549	write causes an entire cache
-0.226493	can get very expensive cache
-0.314948	every time a thousand cache
-0.284032	loaded into an arbitrary cache
-0.056953	............................................................. 96 9.11 Explicit cache
-0.056953	SIAM 2001. 9.11 Explicit cache
-0.212139	processors with a micro-op cache
-0.199774	memory to disk. Provoke cache
-0.164931	about supported instruction sets, cache
-0.164931	(line size) = (total cache
-0.164931	execution speed, memory economy, cache
-0.164931	a unit-test without taking cache
-0.164931	of machine instructions executed, cache
-0.053263	is an expression that should
-0.344558	= false where it should
-0.237036	as possible. Typically it should
-0.498983	type of a function should
-0.478085	branches in a function should
-0.339519	} Obviously, a function should
-0.235543	functions. A thread-safe function should
-0.312298	prone. The vectorized code should
-0.291892	system code. System code should
-0.235866	method. Your measurement code should
-0.237589	the actual calculations. This should
-0.344944	A good optimizing compiler should
-0.442195	case" counts that you should
-0.333253	your application then you should
-0.333253	acceptable limit, then you should
-0.296712	Boolean operands because you should
-0.306909	this manual, but you should
-0.442197	decimals, for example, you should
-0.123495	to it. Therefore, you should
-0.123495	time consuming. Therefore, you should
-0.123495	an exception. Therefore, you should
-0.123495	or namespaces. Therefore, you should
-0.296712	from testing. Here, you should
-0.277010	a C++ program, you should
-0.222757	CPU dispatching. Obviously, you should
-0.222757	On the contrary, you should
-0.232850	compiler became available. It should
-0.288463	of storage space. It should
-0.232850	standardized installation tools. It should
-0.232850	from a buffer. It should
-0.314036	data. The test data should
-0.304717	instruction sets. The program should
-0.304717	Linux, sched_setaffinity). The program should
-0.233908	actually used. No program should
-0.293525	library of math functions should
-0.491557	A critical innermost loop should
-0.538345	This structure or class should
-0.537536	code in this example should
-0.344830	by 8. The size should
-0.570419	used. a and b should
-0.419921	size of each object should
-0.340882	future version of C++ should
-0.233755	set to NULL. There should
-0.233755	separated by commas. There should
-0.398991	matrix or multidimensional array should
-0.209164	sizeof(list)); A multidimensional array should
-0.333059	bytes) of the objects should
-0.439891	all variables and objects should
-0.390313	memory. Variables and objects should
-0.236860	of data decomposition, we should
-0.271453	floating point operations. You should
-0.217850	dynamic memory allocation. You should
-0.217850	of the software. You should
-0.271453	and b overlap. You should
-0.217850	something about them. You should
-0.217850	is too late. You should
-0.330683	page 134. The table should
-0.236756	believe that software performance should
-0.236624	Compatibility problems. All software should
-0.292869	calculations. The loop branch should
-0.300478	hot spots. The test should
-0.280607	A realistic performance test should
-0.225931	correctly. The speed test should
-0.236311	be used. Web systems should
-0.236340	no function or method should
-0.292289	needed. These complicated cases should
-1.124223	division by a constant should
-0.228986	RAM memory. Big arrays should
-0.228986	writing data. Multidimensional arrays should
-0.777553	does floating point calculations should
-0.521623	program in multiple versions should
-0.229071	and all three versions should
-0.334594	bigger than 16 bytes should
-0.442883	used by multiple threads should
-0.322706	processor cores. Each thread should
-0.235621	buttons, dialog boxes, etc. should
-0.334157	initialized by a list should
-0.345440	time. A loop counter should
-0.341719	5. The loop count should
-0.234727	and uninstallation of programs should
-0.234825	the server. These problems should
-0.290686	possible, and the dispatching should
-0.634203	of the memory block should
-0.321073	functions. The template parameter should
-0.290217	the data and resources should
-0.172853	version. The CPU dispatcher should
-0.172853	cases: The CPU dispatcher should
-0.172853	programming. The CPU dispatcher should
-0.234166	work. The updating mechanism should
-0.414998	mouse. The .NET framework should
-0.146610	that are used together should
-0.300203	installed. The installation process should
-0.217225	updating. The update process should
-0.319998	integer). All intermediate results should
-0.309355	environment block. Thread-local storage should
-0.232328	than a few lines should
-0.231927	file input and output should
-0.205819	excessively so. These containers should
-0.205819	needed. Objects inside containers should
-0.579637	calculation of one iteration should
-0.357304	The search for updates should
-0.202917	of downloaded program updates should
-0.416623	branches and switch statements should
-0.207060	this case. Loop unrolling should
-0.207060	unroll factor. Loop unrolling should
-0.229801	register state. This penalty should
-0.395954	event, the clock counts should
-0.228902	printer or other device should
-0.282630	unacceptably long. Lazy binding should
-0.282706	received by an interrupt should
-0.226420	user access rights. Software should
-0.233753	problems that software developers should
-0.233753	compatibility problems. Software developers should
-0.221882	resolutions, etc. Accessibility guidelines should
-0.148428	inconvenient times. A queue should
-0.148428	example, a FIFO queue should
-0.218213	streaming audio or video should
-0.218213	mispredictions. The performance measurement should
-0.218081	satisfactory. The following considerations should
-0.218213	An interrupt service routine should
-0.218345	data that are modified should
-0.211820	new features. User feedback should
-0.211820	as heavy mathematical calculations, should
-0.199562	and standardized file formats should
-0.199562	network resources and servers should
-0.250819	are 64 bits wide, should
-0.164735	defined outside any function) should
-0.164735	and planned solutions. Patches should
-0.164735	is utilized appropriately. Users should
-0.164735	feedback seriously. User complaints should
-0.164735	a copy protection scheme should
-0.164735	opinions on which imprecisions should
-0.164735	can proceed unattended. Uninstallation should
-0.164735	the current .cpp file) should
-0.164735	using new/delete or malloc/free should
-0.841243	The size of the integer
-0.524084	the bits of the integer
-0.357273	adds 16 to the integer
-0.656428	is important that the integer
-0.357308	the sign-bit if the integer
-0.653235	the performance because the integer
-0.455738	statement and all the integer
-0.352894	counter and using the integer
-0.335341	inputs. Let's take the integer
-0.324264	compilers cannot reduce the integer
-0.406800	faster the smaller the integer
-0.831367	with the use of integer
-0.829764	The maximum number of integer
-0.186146	from floating point to integer
-0.136954	float or double to integer
-0.237046	i++,i2+=2.0f)a[i]=i2; 41 Float to integer
-0.237454	the floating point and integer
-0.293700	prior to SSE4.1 and integer
-0.834998	can be stored in integer
-0.323824	suitable pivot element. The integer
-0.313227	the particular application. The integer
-0.292779	the loop index. The integer
-0.236645	in example 8.15b. The integer
-0.346663	library have functions for integer
-0.337808	are no instructions for integer
-0.349393	no automatic check for integer
-0.236619	no caching problems for integer
-0.444003	that the operands are integer
-0.237623	speed of functions with integer
-0.235122	to decrement operators on integer
-0.210992	do more reductions on integer
-0.210992	most simple reductions on integer
-0.235122	use algebraic manipulations on integer
-0.515392	therefore as fast as integer
-0.822189	if b is an integer
-0.305317	the exponent is an integer
-0.308644	the bits of an integer
-0.308644	the range of an integer
-0.324906	point number to an integer
-0.334318	be allocated for an integer
-0.303057	an integer, or an integer
-0.329532	point variable as an integer
-0.308686	inefficient to use an integer
-0.286995	pointers requires only an integer
-0.267719	possible to do an integer
-0.214548	to check whether an integer
-0.214548	enum is simply an integer
-0.214548	compiler can replace an integer
-0.214548	is in fact an integer
-0.214548	arithmetic operations. When an integer
-0.214548	improved by adding an integer
-0.214548	when you divide an integer
-0.214548	We can convert an integer
-0.214548	simply to increment an integer
-0.214548	way of declaring an integer
-0.214548	condition clause. Comparing an integer
-0.214548	+= a[i]; Converting an integer
-0.214548	avoided by replacing an integer
-0.382326	are more predictable than integer
-0.293749	calculations will typically use integer
-0.795980	are two or more integer
-0.234699	by adding one more integer
-0.234699	SSSE3 a few more integer
-0.405026	to avoid conversions from integer
-0.291822	is enabled. Conversion from integer
-0.335620	simple integer with vector integer
-0.344898	table summarizes the different integer
-0.349543	7.1. Sizes of different integer
-0.237441	be an advantage because integer
-0.343043	Induction variables for other integer
-0.771035	the size of each integer
-1.052361	is possible to do integer
-0.313615	mode. The first two integer
-0.332107	by using a 64-bit integer
-0.586119	take advantage of 64-bit integer
-0.549331	select the most efficient integer
-0.319113	bit of a 32-bit integer
-0.232787	vector, such as 32-bit integer
-0.562088	that the most critical integer
-0.327563	vectors SSE2 128 bit integer
-0.287589	vectors AVX2 256 bit integer
-0.223177	first convert the unsigned integer
-0.223177	extra bits. The unsigned integer
-0.272001	Conversion of an unsigned integer
-0.201824	interpreted as an unsigned integer
-0.323471	can assume that these integer
-0.236421	near then the even integer
-0.473134	condition is a simple integer
-0.278355	advantageous to do simple integer
-0.223945	multiplication, to mix simple integer
-0.221847	Boolean vector operations An integer
-0.221847	i = s; An integer
-0.221847	-128 to +127. An integer
-0.311762	improving code that contains integer
-0.349837	considering whether a particular integer
-0.234897	compilers will often replace integer
-0.097044	................................... 141 14.9 Using integer
-0.097044	(double)(signed int)u; 14.9 Using integer
-0.211229	Conversion of a signed integer
-0.149597	integer to a signed integer
-0.268036	the conversion to signed integer
-0.198448	the assumption that signed integer
-0.234413	sets). Here, / means integer
-0.219336	Function to store aligned integer
-0.219336	Function to load aligned integer
-0.288690	bits differently. A negative integer
-0.425004	N is a positive integer
-0.605372	be advantageous to mix integer
-0.003232	Function to store unaligned integer
-0.003232	Function to load unaligned integer
-0.228968	set also allows 256-bit integer
-0.284053	to use the default integer
-0.282736	elimin., float Register variables, integer
-0.125795	to use the smallest integer
-0.145435	by using the smallest integer
-0.309140	expressions or more complex integer
-0.176236	Linux, the first six integer
-0.224657	There are approximately six integer
-0.360943	operating systems and fourteen integer
-0.222190	are better at reducing integer
-0.165005	// Example 15.1c. Calculate integer
-0.165005	// Example 15.1b. Calculate integer
-0.271894	then make an additional integer
-0.218239	portable way of defining integer
-0.264813	expressions. Most reductions involving integer
-0.211975	// Round to nearest integer
-0.211975	generally very fast. Simple integer
-0.330393	elimination Common subexpression elimin., integer
-0.164875	(three on CodeGear compiler) integer
-0.164875	transferred in registers (6 integer
-0.164875	for array bounds violation, integer
-0.164875	wrap around, (3) trap integer
-0.352162	be predicted. This is no
-0.770477	when the object is no
-0.076229	matter and there is no
-0.076229	common, and there is no
-0.277338	way that there is no
-0.204358	integer if there is no
-0.204358	exponent if there is no
-0.204358	especially if there is no
-0.204358	accumulators if there is no
-0.168749	convenience - there is no
-0.076229	efficient when there is no
-0.076229	critical when there is no
-0.252321	time then there is no
-0.164725	performance then there is no
-0.164725	error then there is no
-0.164725	etc. then there is no
-0.164725	elsewhere then there is no
-0.168749	negligible because there is no
-0.233376	bases, but there is no
-0.168749	operations where there is no
-0.140948	most cases, there is no
-0.223404	some cases, there is no
-0.168749	(Of course there is no
-0.168749	In general, there is no
-0.168749	but unfortunately there is no
-0.168749	is enabled there is no
-0.106132	is double There is no
-0.106132	different functions. There is no
-0.106132	the systems. There is no
-0.106132	logical processors. There is no
-0.106132	was called. There is no
-0.106132	same object. There is no
-0.106132	different way. There is no
-0.106132	internal references. There is no
-0.106132	function returns. There is no
-0.106132	memory allocation. There is no
-0.106132	template parameter. There is no
-0.106132	maximum value. There is no
-0.106132	four parameters. There is no
-0.106132	higher bits. There is no
-0.106132	comes automatically. There is no
-0.106132	out-of-order execution. There is no
-0.106132	the screen. There is no
-0.106132	is created. There is no
-0.106132	p. 43). There is no
-0.106132	p. 87). There is no
-0.106132	as x4∙xn-4. There is no
-0.106132	7.33 Namespaces There is no
-0.106132	is returned. There is no
-0.106132	or .so). There is no
-0.229326	The fastest execution is no
-0.229326	the same priority is no
-0.237032	would be 8 and no
-0.457670	fixed repeat count and no
-0.381613	no copy constructor and no
-0.313689	by n additions and no
-0.690189	to make sure that no
-0.517074	product makes sure that no
-0.292734	teachers to recommend that no
-0.524415	because there may be no
-0.355109	} There will be no
-0.353276	up spaces that are no
-0.216327	limited and there are no
-0.216327	dominating and there are no
-0.471441	assume that there are no
-0.510771	separately if there are no
-0.593007	again. If there are no
-0.342455	is enabled. There are no
-0.342455	to security. There are no
-0.515290	of when they are no
-0.235835	loop with few or no
-0.102102	methods with little or no
-0.102102	programmers have little or no
-0.235835	8-bit signed number, or no
-0.312860	is dead code if no
-0.236337	remove any objects if no
-0.345555	exception handler, even if no
-0.237702	code is inlined - no
-0.328534	is certain to have no
-0.311743	that it can have no
-0.275380	that these functions have no
-0.275380	style string functions have no
-0.540883	used if elements have no
-0.315656	in performance. I have no
-0.315656	initialized arrays. I have no
-0.275720	we should preferably have no
-0.275720	possible. Smaller microprocessors have no
-0.384242	that the operands have no
-0.019728	software. Smaller microcontrollers have no
-0.019728	caching. Smaller microcontrollers have no
-0.019728	microcontrollers: Smaller microcontrollers have no
-0.221619	C++ imple- mentations have no
-0.236160	is not necessary when no
-0.236160	memory requirement. Useful when no
-0.319870	a function that has no
-0.422636	sure the code has no
-0.487817	Here, the compiler has no
-0.419329	x86 instruction set has no
-0.328113	and the library has no
-0.074135	register the object has no
-0.074135	destructor the object has no
-0.222191	(without member functions) has no
-0.222191	the stack. Deallocation has no
-0.222191	option -fno-pic apparently has no
-0.233219	floating point overflow but no
-0.233219	types of expressions, but no
-0.233219	(called static if), but no
-0.223197	and the code takes no
-0.223197	or structure object takes no
-0.277508	size conversion often takes no
-0.223197	that exception handling takes no
-0.307443	int)i; This conversion takes no
-0.332651	integer variable, it makes no
-0.232529	const, or #define makes no
-0.227393	and long double take no
-0.227393	double precision calculations take no
-0.227393	between different precisions take no
-0.236208	takes the hint about no
-0.417950	because you will get no
-0.622851	of a program contains no
-0.226372	The code section contains no
-0.345271	used, there is simply no
-0.310197	the program. This requires no
-0.233919	BSD Instruction set control no
-0.588553	the compiler to assume no
-0.232598	as output can produce no
-0.210493	-fno-rtti /GR- -fno-rtti Assume no
-0.210493	See page 78. Assume no
-0.226655	the type conversion generates no
-0.212228	an option for assuming no
-0.212171	only) -O3 or (requires no
-0.074731	the option for "assume no
-0.074731	the compiler option "assume no
-0.165050	why there is virtually no
-0.237868	about data storage and page
-0.214032	type. The example on page
-0.214032	of container classes on page
-0.009491	method is explained on page
-0.009491	storage are explained on page
-0.000783	code, as explained on page
-0.000783	processors, as explained on page
-0.000783	static, as explained on page
-0.000783	precision, as explained on page
-0.000783	execution, as explained on page
-0.000783	operations, as explained on page
-0.000783	templates, as explained on page
-0.000783	AVX, as explained on page
-0.000783	statements, as explained on page
-0.000783	frequency, as explained on page
-0.000783	stride, as explained on page
-0.000783	contentions, as explained on page
-0.000587	for reasons explained on page
-0.214032	set. The examples on page
-0.214032	the methods described on page
-0.214032	divisions are given on page
-0.093962	threads is discussed on page
-0.093962	devices, as discussed on page
-0.214032	is explained below on page
-0.214032	conditions are listed on page
-0.214032	explained in detail on page
-0.214032	are provided below, on page
-0.214032	runtime. Example 7.43 on page
-0.349881	divisible by the memory page
-0.237535	metaprogramming, as explained at page
-0.237129	from www.intel.com. (See also page
-0.057136	Windows syntax or See page
-0.057136	virtual member function. See page
-0.057136	efficient than functions. See page
-0.093443	piece of memory. See page
-0.057136	the main program. See page
-0.006744	registers are used. See page
-0.057136	and VIA processors. See page
-0.057136	0 or 1. See page
-0.057136	variables, if possible. See page
-0.057136	an inefficient way. See page
-0.057136	an Intel CPU. See page
-0.057136	aligned or not. See page
-0.057136	out of order. See page
-0.057136	dynamic memory allocation. See page
-0.057136	linking" if available. See page
-0.057136	complex integer expressions. See page
-0.057136	kinds of storage. See page
-0.013595	the operating system. See page
-0.013595	and operating system. See page
-0.057136	enables interprocedural optimizations. See page
-0.057136	using assembly language. See page
-0.057136	to avoid this. See page
-0.057136	do not overlap. See page
-0.057136	reading disk files. See page
-0.057136	to do so. See page
-0.057136	of five manuals. See page
-0.006744	Intel's CPU dispatcher. See page
-0.057136	is not cached. See page
-0.057136	no pointer aliasing. See page
-0.057136	slightly less compact. See page
-0.057136	prevent such errors. See page
-0.057136	each part takes. See page
-0.057136	recover from exceptions. See page
-0.057136	does not occur. See page
-0.006744	near each other. See page
-0.057136	be optimally aligned. See page
-0.057136	strictness is required. See page
-0.057136	branch prediction mechanism. See page
-0.057136	a long delay. See page
-0.057136	use STL containers. See page
-0.057136	cause cache contentions. See page
-0.057136	program will crash. See page
-0.057136	p is incremented. See page
-0.057136	optimization is requested. See page
-0.057136	perhaps Mac OS. See page
-0.057136	loop-invariant code motion. See page
-0.057136	type identification (RTTI). See page
-0.057136	_mm_setcsr(_mm_getcsr() | 0x8040); See page
-0.050680	p points to (see page
-0.050680	of the compiler (see page
-0.050680	the preceding one (see page
-0.086428	the data cache (see page
-0.050680	the derived class (see page
-0.050680	float and double (see page
-0.050680	a smart pointer (see page
-0.050680	are less efficient (see page
-0.024601	stored in registers (see page
-0.024601	the XMM registers (see page
-0.050680	the operating system (see page
-0.050680	use vector instructions (see page
-0.050680	on non-Intel processors (see page
-0.050680	with a constant (see page
-0.050680	checks where necessary (see page
-0.050680	with unsigned integers (see page
-0.050680	floating point precision (see page
-0.050680	a linked list (see page
-0.050680	0 or 1 (see page
-0.050680	floating point expressions (see page
-0.050680	out of range (see page
-0.050680	vectorized as intended (see page
-0.012126	and automatic vectorization (see page
-0.012126	intrinsics, automatic vectorization (see page
-0.050680	// Bounds checking (see page
-0.050680	be quite time-consuming (see page
-0.024601	no pointer aliasing (see page
-0.024601	rule out aliasing (see page
-0.050680	If a profiling (see page
-0.050680	with out-of-order capabilities (see page
-0.050680	case of mispredictions (see page
-0.050680	to be profitable (see page
-0.050680	and automatic CPU-dispatching (see page
-0.050680	do the devirtualization (see page
-0.222957	for vector operations, see page
-0.222957	for XMM registers; see page
-0.233759	variables. See chapter 10 page
-0.058354	power of 2 (See page
-0.058354	dividing by 2 (See page
-0.125783	in 64-bit Windows (See page
-0.125783	for different CPUs. (See page
-0.125783	optimizations across modules (See page
-0.125783	out by 2. (See page
-0.279421	critical stride (see above, page
-0.423600	as in example 13.1 page
-0.199987	as in example 14.23 page
-0.165128	explained in example 7.35 page
-0.358520	4 lines in the set
-0.325136	bit of f is set
-0.237718	Wednesday or Friday is set
-0.551473	program to use a set
-0.337468	vector operations use a set
-0.500093	second way is to set
-0.544475	72. You have to set
-0.646941	the fastest way to set
-0.353160	is strongly recommended to set
-0.330905	addresses all belong to set
-0.293084	that it attempts to set
-0.666030	four cache lines in set
-0.552945	CPU which can be set
-0.653847	upper limit can be set
-0.294080	if certain options are set
-0.552908	meaning, then you can set
-0.407169	The test tool can set
-0.405347	x.i |= 0x80000000; // set
-0.065524	float a[size], b[size]; // set
-0.236022	u.i &= 0x7FFFFFFF; // set
-0.237498	four cache lines from set
-0.354554	you divide the data set
-0.341212	different addresses with different set
-0.565623	belong to the same set
-0.171803	addition to the instruction set
-0.097844	missing in the instruction set
-0.046172	file for the instruction set
-0.046172	compiled for the instruction set
-0.102510	name depending on instruction set
-0.031556	more efficient. This instruction set
-0.031556	instruction set. This instruction set
-0.031556	of view. This instruction set
-0.143006	it has an instruction set
-0.089931	code for this instruction set
-0.089931	that support this instruction set
-0.089931	it checks which instruction set
-0.089931	automatically detect which instruction set
-0.102510	because the 64-bit instruction set
-0.102510	the best possible instruction set
-0.102510	The 64 bit instruction set
-0.102510	when a new instruction set
-0.029650	if the SSE2 instruction set
-0.011618	when the SSE2 instruction set
-0.061474	cases the SSE2 instruction set
-0.029650	unless the SSE2 instruction set
-0.061474	Using the SSE2 instruction set
-0.029650	enable the SSE2 instruction set
-0.018171	efficient. The SSE2 instruction set
-0.018171	CPUs. The SSE2 instruction set
-0.018171	140). The SSE2 instruction set
-0.056967	the 145 SSE2 instruction set
-0.056967	/arch:SSE -msse SSE2 instruction set
-0.102510	the AVX 32 instruction set
-0.118140	of the AVX instruction set
-0.052158	for the AVX instruction set
-0.052158	if the AVX instruction set
-0.052158	If the AVX instruction set
-0.046401	later. The AVX instruction set
-0.022581	elements. 12.1 AVX instruction set
-0.022581	105 12.1 AVX instruction set
-0.161713	the minimum supported instruction set
-0.161713	// Detect supported instruction set
-0.203642	for a particular instruction set
-0.053473	SSE2 or later instruction set
-0.049000	AVX or later instruction set
-0.049000	Pentium-II or later instruction set
-0.213710	for a higher instruction set
-0.151749	the next higher instruction set
-0.048237	and the AVX2 instruction set
-0.048237	somewhat. The AVX2 instruction set
-0.015491	of the x86 instruction set
-0.015491	to the x86 instruction set
-0.031556	32- bit x86 instruction set
-0.102510	with the appropriate instruction set
-0.203642	enable the desired instruction set
-0.102510	-ffunction- sections SSE instruction set
-0.023449	if the SSE4.1 instruction set
-0.023449	unless the SSE4.1 instruction set
-0.143006	choose a newer instruction set
-0.102510	for a low instruction set
-0.011566	for the latest instruction set
-0.102510	/arch:SSE2 -msse2 SSE3 instruction set
-0.220223	using the newest instruction set
-0.102510	when the AVX512 instruction set
-0.065643	However, the CISC instruction set
-0.031583	resource. The CISC instruction set
-0.031583	ratio. The CISC instruction set
-0.102510	with the highest instruction set
-0.102510	because the x86-64 instruction set
-0.031556	with the AVX-512 instruction set
-0.015491	function. 12.2 AVX-512 instruction set
-0.015491	107 12.2 AVX-512 instruction set
-0.102510	// Error: lowest instruction set
-0.102510	32-bit number (the instruction set
-0.102510	SSE2 (or later) instruction set
-0.102510	color difference. Newest instruction set
-0.102510	The Pentium Pro instruction set
-0.237389	We can calculate which set
-0.574463	The most commonly used set
-0.331496	enabled (there is one set
-1.075192	one instance for each set
-0.237236	should have its pointer set
-0.236830	optimization. The debugger cannot set
-0.527738	fine-tuned for a particular set
-0.478795	model has its own set
-0.059895	of vector, bits Instruction set
-0.059895	each table element Instruction set
-0.059895	Intrinsic function name Instruction set
-0.059895	Linux, Mac, BSD Instruction set
-0.014217	is as follows: Instruction set
-0.014217	are as follows: Instruction set
-0.059895	supported fprintf(stderr, "\nError: Instruction set
-0.314604	best on a typical set
-0.229146	You may, in addition, set
-0.404270	times with a suitable set
-0.222241	templates, such as list, set
-0.059903	tested with a realistic set
-0.059903	performed with a realistic set
-0.048366	instr. set AVX instr. set
-0.048366	instr. set SSE4.1 instr. set
-0.048366	set Suppl. SSE3 instr. set
-0.523485	the object of the class
-0.959489	a member of the class
-0.535945	an instance of the class
-0.724311	or reference to the class
-0.724400	they appear in the class
-0.357040	vector register for the class
-0.357326	member functions if the class
-0.355786	data members. If the class
-0.377296	no information about the class
-0.377296	incomplete information about the class
-0.352295	function body inside the class
-0.351437	powN template is a class
-0.351437	type T is a class
-0.509089	new object of a class
-0.608230	the value of a class
-0.468236	data member of a class
-0.163843	are members of a class
-0.163843	data members of a class
-0.251241	Data members of a class
-0.346132	as parameters to a class
-0.946230	when applied to a class
-0.517424	Storing variables in a class
-0.455105	objects declared in a class
-0.351352	parameter should be a class
-0.134508	be wrapped into a class
-0.134508	are wrapped into a class
-0.103562	Variables declared inside a class
-0.207533	is defined inside a class
-0.798253	to an object of class
-0.294147	the object belongs to class
-0.314251	alignment of structure and class
-0.237503	bytes smaller. Structure and class
-0.294060	method doesn't work for class
-0.346819	Put the function or class
-0.240357	the same function or class
-0.240357	test each function or class
-0.240357	storage. No function or class
-0.118839	of the structure or class
-0.050277	in a structure or class
-0.107146	define a structure or class
-0.055367	array of structure or class
-0.055367	arrays of structure or class
-0.118839	thread. This structure or class
-0.230795	structure }; 52 or class
-0.235972	copied into registers. A class
-0.235972	any other constructors. A class
-0.303532	library and the vector class
-0.303532	set for the vector class
-0.293891	vector registers. The vector class
-0.220377	#include "vectorclass.h" // vector class
-0.317978	www.agner.org/optimize/#vectorclass. The Intel vector class
-0.220377	found in my vector class
-0.220377	from me. My vector class
-0.090075	vector classes Agner's vector class
-0.090075	page 107). Agner's vector class
-0.090075	option -mveclibabi=acml. Agner's vector class
-0.090075	amd_vrs4_expf amd_vrd2_exp Agner's vector class
-0.329102	members of the same class
-0.331244	polymorphism with virtual functions class
-0.335495	extra information to all class
-0.606565	a pointer to one class
-0.285009	CParent is a template class
-0.285009	know that a template class
-0.311956	for polymorphism A template class
-0.221792	using the above template class
-0.349008	object of a simple class
-0.056919	<< 1; } }; class
-0.500876	virtual void f(); }; class
-0.333314	array into a container class
-0.253420	by defining a container class
-0.192748	container class. The container class
-0.192748	Library) and other container class
-0.278453	by more efficient container class
-0.192748	templates. Ready made container class
-0.310710	compiler doesn't know what class
-0.046237	offset of the child class
-0.046237	name for the child class
-0.342693	of parent and child class
-0.138048	parent class. The child class
-0.086999	member of its child class
-0.086999	information about its child class
-0.097991	has the correct child class
-0.231235	examples of suitable containers class
-0.268562	follows. The first generation class
-0.027445	functions. The second generation class
-0.153122	about the third generation class
-0.171855	license Table 12.4. Vector class
-0.171855	been updated lately. Vector class
-0.171855	// Example 12.7. Vector class
-0.285111	sure that the declaration class
-0.237149	object of the derived class
-0.098008	file and the derived class
-0.013560	object of a derived class
-0.056981	class and a derived class
-0.083867	parent class and derived class
-0.136297	size of the parent class
-0.030277	functions of a parent class
-0.030277	members of a parent class
-0.434977	member functions of parent class
-0.406241	instance of a polymorphic class
-0.279199	pointer to a base class
-0.229036	Alternative to multiple inheritance class
-0.165006	Example 7.38a. Multiple inheritance class
-0.042004	T, unsigned int N> class
-0.042004	<bool IsPowerOf2, int N> class
-0.186046	2 template <int N> class
-0.212067	B1; class B2; 54 class
-0.212067	class objects Conversions involving class
-0.212067	Example: // Example 7.28 class
-0.212067	// Example 8.19. Devirtualization class
-0.212158	example: // Example 7.14 class
-0.017504	multiple inheritance class B1; class
-0.017504	Multiple inheritance class B1; class
-0.465743	member of the object's class
-0.199802	in the grandparent class: class
-0.199802	<< "Hello "; Disp(); class
-0.074693	should be true. template<> class
-0.074693	ends the recursion template<> class
-0.199802	otherwise go undetected. Converting class
-0.465743	class B1; class B2; class
-0.164957	have multiple // versions: class
-0.164957	parameter: template <typename MyChild> class
-0.164957	needed: // Example 7.44 class
-0.164957	example: // Example 7.37 class
-0.164957	Example: // Example 7.36 class
-0.164957	Example: // Example 7.41a class
-0.858956	absolute value of the floating
-1.028960	different parts of the floating
-0.953313	The fact that the floating
-0.720131	is determined by the floating
-0.240789	long time when the floating
-0.240789	extra time when the floating
-0.341662	are inefficient when the floating
-0.324060	specifies truncation so the floating
-0.353107	misprediction long before the floating
-0.342327	compiler might store the floating
-0.330548	single precision. When the floating
-0.313457	long double reflects the floating
-0.236838	integer operations in-between the floating
-0.513475	implementation when b is floating
-0.635694	the sign of a floating
-0.635694	conversion Conversion of a floating
-0.346672	the latency of a floating
-0.354865	< 223 to a floating
-0.349589	point addition, and a floating
-0.350554	can check if a floating
-0.497340	be faster than a floating
-0.348965	dependency chain. If a floating
-0.465392	cycles to do a floating
-0.692385	faster to access a floating
-0.291691	a loop needs a floating
-0.235689	an integer addition, a floating
-0.235689	The function rounds a floating
-0.350743	avoid any use of floating
-0.826678	The maximum number of floating
-0.544457	that the order of floating
-0.323920	some rare cases of floating
-0.099320	two different types of floating
-0.236677	the two types of floating
-0.329888	underflow. The range of floating
-0.236306	because algebraic manipulations of floating
-0.113765	conversions from integer to floating
-0.113765	Conversion from integer to floating
-0.211275	Conversion of integers to floating
-0.283131	of unsigned integers to floating
-0.292583	integers before conversion to floating
-0.612859	does not apply to floating
-0.236473	operator %. Conversion to floating
-0.292583	signed before converting to floating
-0.323859	to mix integer and floating
-0.022912	conversions between integers and floating
-0.047100	Conversions between integers and floating
-0.236674	stack. String constants and floating
-0.324959	simple integer calculations in floating
-0.325207	of precision, especially in floating
-0.345004	intrinsic hardware functions. The floating
-0.470783	make induction variables for floating
-0.292337	the XMM registers for floating
-0.236256	// Enable exception for floating
-0.236256	number of accumulators for floating
-0.236256	the static keyword, for floating
-0.350719	compiler makers assume that floating
-0.717774	again. If there are floating
-0.293873	they are integers or floating
-0.535152	the above example with floating
-0.292097	floating point addition with floating
-0.405383	these are incompatible with floating
-0.338796	integer expressions than on floating
-0.092958	make algebraic reductions on floating
-0.092958	any algebraic reductions on floating
-0.237589	storing intermediate results as floating
-0.456549	operations are faster than floating
-0.380867	reducing integer expressions than floating
-0.346994	all systems that have floating
-0.407364	takes memory space. A floating
-0.288204	is faster than from floating
-0.308433	mode. A conversion from floating
-0.308433	language, all conversions from floating
-0.288204	and integers Conversion from floating
-0.355753	is easy to make floating
-0.209454	CodeGear compiler cannot make floating
-0.209454	variables Compilers cannot make floating
-0.237317	penalty for mixing different floating
-0.345828	standard specifies that all floating
-0.454012	CPUs have only one floating
-0.237179	the loop. If each floating
-0.091402	22 one or two floating
-0.091402	units, one or two floating
-0.230884	This would require two floating
-0.280467	operations and before any floating
-0.209016	EMMS instruction before any floating
-0.107780	platform. 14.8 Conversions between floating
-0.107780	140 14.8 Conversions between floating
-0.423662	interesting because it makes floating
-0.046810	SSE2 instruction set makes floating
-0.046810	Pro instruction set makes floating
-0.292771	(6 integer and 8 floating
-0.455314	to start a new floating
-0.022899	compilers have difficulties making floating
-0.208338	A code that does floating
-0.208338	a loop that does floating
-0.336929	processors requires a big floating
-0.217352	registers. There are eight floating
-0.270890	the 49 first eight floating
-0.498690	point operations involves eight floating
-0.235428	If a loop contains floating
-0.105607	newer method of doing floating
-0.105607	original method of doing floating
-0.235053	recommended to enable fast floating
-0.233617	141. Applications that generate floating
-0.233063	to compare two positive floating
-0.231562	that there are 100 floating
-0.231646	the FDIV bug causes floating
-0.605411	be advantageous to mix floating
-0.231062	point execution units. Any floating
-0.336682	be loading the entire floating
-0.228902	mixed with x87 style floating
-0.227825	This includes static variables, floating
-0.222130	off requirements for strict floating
-0.164952	can multiply a nonzero floating
-0.164952	The values of nonzero floating
-0.222059	that it allows larger floating
-0.271914	by making an additional floating
-0.008661	integer operations for manipulating floating
-0.251008	} } // Catch floating
-0.465608	cache misses, branch mispredictions, floating
-0.164890	that allows less precise floating
-0.164890	aliasing /Oa -fno-alias Non-strict floating
-0.164890	microprocessors have no native floating
-0.164890	has occurred. // Reset floating
-0.164890	turned on, including relaxed floating
-0.164890	are set to relax floating
-0.164890	bit integer vectors FMA3 floating
-0.493806	for the size of each
-0.493806	where the size of each
-0.474560	elements. The size of each
-0.296045	The maximum size of each
-0.296045	The total size of each
-0.741089	calculate the address of each
-0.513549	tells the address of each
-0.345382	the intermediate result of each
-0.505750	measures the speed of each
-0.345023	the above advantages of each
-0.536288	unless the length of each
-0.234626	constructors and destructors of each
-0.378261	the time consumption of each
-0.532330	size (in bytes) of each
-0.065063	Intrinsic function Size of each
-0.065063	of elements Size of each
-0.065063	these classes. Size of each
-0.234626	Taking the logarithm of each
-0.037056	// Add 2 to each
-0.236874	time slices allocated to each
-0.236874	add new features to each
-0.330867	*.so) that belong to each
-0.236874	functions are unrelated to each
-0.237491	counters are CPU-specific and each
-0.237491	xn = x∙xn-1, and each
-1.241990	number of elements in each
-0.592825	run two threads in each
-0.662379	four cache lines in each
-0.292627	set of counters in each
-0.292627	an interrupt occurs in each
-0.236511	the right formula in each
-0.331878	the template function for each
-0.443920	branch to use for each
-0.167503	will be different for each
-0.167503	prediction are different for each
-0.322984	closes the file for each
-0.228939	the residual error for each
-0.304051	use one container for each
-0.228939	that is optimal for each
-0.228939	should be separate for each
-0.284020	a small block for each
-0.304051	a different name for each
-0.284020	// Time difference for each
-0.007176	have one instance for each
-0.007176	make one instance for each
-0.007176	get one instance for each
-0.003573	needs one instance for each
-0.228939	have separate containers for each
-0.099555	executed only once for each
-0.099555	function. Compile once for each
-0.228939	with some changes for each
-0.329744	are often waiting for each
-0.228939	c2 and bc for each
-0.228939	the heap manager for each
-0.228939	// function prototypes for each
-0.235756	type of CPU that each
-0.441744	the code so that each
-0.341633	a macro so that each
-0.235756	take into account that each
-0.437158	in the sense that each
-0.529782	big a structure or each
-0.593146	This means that if each
-0.550031	operation. For example, if each
-0.237671	for exclusive access by each
-0.236859	dynamically allocated objects with each
-0.236859	will be stored with each
-0.347106	is loaded rather than each
-0.347106	calculated once, rather than each
-0.314291	28) The threads have each
-0.473367	measure how much time each
-0.380154	set is available then each
-0.235985	long dependency chains then each
-0.314175	have values far from each
-0.324793	be loaded from memory each
-0.379911	inserts extra code at each
-0.235811	calculating row addresses at each
-0.329621	at a time because each
-0.235619	code is serial because each
-0.237394	inside the loop. If each
-0.237182	are in fact using each
-0.314201	amount of work into each
-0.510740	in a loop where each
-0.223614	follow a sequence where each
-0.223614	a dependency chain where each
-0.223614	series of calculations, where each
-0.223614	the variable __intel_cpu_feature_indicator where each
-0.355576	of calculating the value each
-0.236822	from the cache between each
-0.236714	only 50% or less each
-0.348255	common practice to test each
-0.563057	the number of times each
-0.367145	count how many times each
-0.256182	tell how many times each
-0.235994	the data members. But each
-0.345951	will have to calculate each
-0.321361	and it can calculate each
-0.348053	block than to store each
-0.234765	x // get next each
-0.472586	time before and after each
-0.220198	The context switches after each
-0.321817	is possible to give each
-0.331850	return sum; } Here, each
-0.233374	of the same function, each
-0.305972	time searching for updates each
-0.230034	while multiple statements within each
-0.059866	that are used near each
-0.014211	together are stored near each
-0.001556	are also stored near each
-0.059866	which are called near each
-0.059866	the code together near each
-0.224666	DoThisThreeTimesAWeek(); } By giving each
-0.074706	_mm_cmpgt_epi16(b, zero); // AND each
-0.074706	mask); 110 // AND each
-0.222172	history of CPU development, each
-0.222172	a quadratic matrix, i.e. each
-0.212143	kind of branch. After each
-0.403459	hyperthreading. On the contrary, each
-0.212143	memcpy rather than moving each
-0.284039	separate modules if necessary, each
-0.199780	the threads will invalidate each
-0.251064	bitmap than to draw each
-0.008663	(b, c); // Compare each
-0.164936	code in multiple versions, each
-0.164936	overflow and underflow neutralize each
-0.336461	is optimized is to do
-0.336461	is limited is to do
-0.652970	for the compiler to do
-0.365846	on the compiler to do
-0.578375	enable the compiler to do
-0.514734	enables the compiler to do
-0.266107	a particular compiler to do
-0.441925	program may have to do
-0.346412	case you have to do
-0.346412	All you have to do
-0.346412	optimizations you have to do
-0.746709	we don't have to do
-0.817888	it is possible to do
-0.620549	It is possible to do
-0.131177	might be possible to do
-0.645484	make it possible to do
-0.663461	is not possible to do
-0.507086	that it takes to do
-1.298848	time it takes to do
-0.853298	example of how to do
-0.329548	that specifies how to do
-0.505330	If you need to do
-1.101749	it is important to do
-0.578352	is very important to do
-0.282012	some heavy work to do
-0.838580	it is necessary to do
-0.503392	is often necessary to do
-0.175725	is therefore necessary to do
-0.282012	It is good to do
-1.117756	it is advantageous to do
-0.511240	cases be advantageous to do
-0.502517	compiler is able to do
-0.608402	not be able to do
-0.573149	may be able to do
-0.399548	compilers are able to do
-0.327301	are not able to do
-0.327301	they were able to do
-0.427193	5 clock cycles to do
-0.321226	is not optimal to do
-0.517652	may be better to do
-0.872010	are various ways to do
-0.500661	are three ways to do
-0.301948	is more safe to do
-0.282012	The first thing to do
-0.312270	compiler may try to do
-0.367915	would be obvious to do
-0.315529	is therefore safer to do
-0.055224	also deallocated. Failure to do
-0.055224	been deallocated. Failure to do
-0.118506	program flow. Failure to do
-0.227169	have tested seem to do
-0.227169	We may decide to do
-0.404994	old operating systems that do
-0.235782	with old microprocessors that do
-0.102083	than other languages that do
-0.102083	from programming languages that do
-0.235782	FPGA soft cores that do
-0.235782	have powerful facilities that do
-0.347622	a compiler that can do
-0.448298	T+5, then it can do
-0.549480	what the compiler can do
-0.411841	to and you can do
-0.303277	thing that you can do
-0.303277	unrealistic that you can do
-0.290391	most compilers you can do
-0.121363	of things you can do
-0.121363	various things you can do
-0.281745	because the CPU can do
-0.412009	reduction Most compilers can do
-0.290518	optimize Modern compilers can do
-0.705954	so that we can do
-0.311984	CPU Modern CPUs can do
-0.226934	that the processor can do
-0.229121	that one thread can do
-0.229121	a third thread can do
-0.281745	that the programmer can do
-0.226934	chains Modern microprocessors can do
-0.226934	a simple algorithm can do
-0.226934	what the preprocessor can do
-0.356191	< 5) { // do
-0.237067	out of order or do
-0.237067	to a command or do
-0.124257	the compilers will not do
-0.124257	The compilers will not do
-0.456900	point code. If you do
-0.347200	predict which compiler will do
-0.319318	Fortunately, most compilers will do
-0.319318	that future compilers will do
-0.502157	course make the program do
-0.237353	interrupt service routine should do
-0.311025	brackets. However, most compilers do
-0.234797	The reason why compilers do
-0.285072	uncached memory and we do
-0.341165	so large that we do
-0.285072	matical applications. But we do
-0.507272	expressions. Floating point variables do
-0.288076	that the compilers cannot do
-0.232509	m and therefore cannot do
-0.449942	the Intel function libraries do
-0.287170	However, the Intel libraries do
-0.313125	compiler explicitly that pointers do
-0.342532	bits, but 32-bit systems do
-0.338001	Boolean operators because they do
-0.340400	that these integer operations do
-0.330159	not, then you must do
-0.347905	compiler would be able do
-0.289848	Note that these directives do
-0.233863	code. The bigger vectors do
-0.311025	cache as when contentions do
-0.288851	efficient because relative references do
-0.319244	the pointer. These conversions do
-0.286683	The other STL containers do
-0.285200	For example, many programmers do
-0.222206	ranges now overlap. Compilers do
-0.218403	updated since 2004. Can do
-0.050883	when their live ranges do
-0.074193	because their live ranges do
-0.264995	a 128-bit vector register, do
-0.165019	compilers I have studied do
-0.165019	variable if their live-ranges do
-0.165019	their uses (live ranges) do
-0.351539	source code, as the example
-0.553006	let's look at the example
-0.449636	in the way of example
-0.293747	implemented a collection of example
-0.237496	then the transformation of example
-0.237877	metaprogramming implementation analogous to example
-0.274217	If the code in example
-0.274217	replace the code in example
-0.274217	change the code in example
-0.172253	} The code in example
-0.172253	calculations. The code in example
-0.172253	_mm_cvtsd_si32(_mm_load_sd(&x));} The code in example
-0.104847	dispatching explicitly as in example
-0.104847	in memory, as in example
-0.104847	same principle as in example
-0.104847	multiple counters, as in example
-0.104847	a union, as in example
-0.104721	of the loop in example
-0.104721	vectorize the loop in example
-0.052478	} The loop in example
-0.052478	1000. The loop in example
-0.112181	the while loop in example
-0.112181	The c loop in example
-0.318056	as required, but in example
-0.345278	The method used in example
-0.516820	a and b in example
-0.616485	a register variable in example
-0.334207	power of 2 in example
-0.166231	multiplication by 2 in example
-0.226743	the if branch in example
-0.281528	elimination. The method in example
-0.281528	does not work in example
-0.532730	space, as explained in example
-0.226743	elements in list in example
-0.281528	making the structure in example
-0.166231	space. The syntax in example
-0.166231	the C++ syntax in example
-0.318056	function is given in example
-0.226743	ReadTSC listed below in example
-0.226743	The loop unrolling in example
-0.098739	versions of CriticalFunction in example
-0.098739	call to CriticalFunction in example
-0.075210	method is illustrated in example
-0.075210	technique is illustrated in example
-0.367326	_mm_empty() as shown in example
-0.226743	Wednesday | Friday) in example
-0.226743	} The FactorialTable in example
-0.226743	clumsy AND-OR construction in example
-0.226743	eliminating the if-branch in example
-0.237812	be a type. The example
-0.234851	in some cases, for example
-0.234851	of the program, for example
-0.234851	are not suitable for example
-0.234851	of a variable, for example
-0.234851	inside the loop, for example
-0.234851	half of it, for example
-0.234851	at certain events, for example
-0.234851	outside this interval, for example
-0.234851	variable in parts, for example
-0.235928	the same code as example
-0.235928	compiler // Same as example
-0.235928	as template parameters, as example
-0.237690	// Portability note: This example
-0.306550	page 80 for an example
-0.560769	page 89 for an example
-0.346877	is provided as an example
-0.234575	page 58 shows an example
-0.457845	seven times faster than example
-0.247859	The code in this example
-0.247859	The data in this example
-0.247859	to float in this example
-0.247859	if statement in this example
-0.247859	If MultiplyBy in this example
-0.247859	two formulas in this example
-0.247859	so 1.2 in this example
-0.228793	I am giving this example
-0.284696	asmlib.. // or from example
-0.063909	is the code from example
-0.015117	following assembly code from example
-0.304759	well. The conversion from example
-0.304759	possible to come from example
-0.345912	} } The same example
-0.237248	transpose a matrix using example
-0.223978	converting to double In example
-0.223978	128-bit XMM register. In example
-0.223978	divisible by 16. In example
-0.223978	fits the application. In example
-0.223978	do it explicitly. In example
-0.236816	C++: Preprocessor directives. For example
-0.236503	performance reasons. Use these example
-0.088320	message function. The following example
-0.088320	member functions. The following example
-0.088320	very efficient. The following example
-0.088320	instruction set. The following example
-0.088320	newer processors. The following example
-0.088320	the loop. The following example
-0.088320	of 2. The following example
-0.088320	or not. The following example
-0.088320	128-bit vectors. The following example
-0.088320	breakpoint again. The following example
-0.088320	at www.agner.org/optimize/asmlib.zip. The following example
-0.088320	further explanation. The following example
-0.088320	from errors. The following example
-0.088320	before compilation. The following example
-0.088320	no multiplications. The following example
-0.088320	becomes noticeable. The following example
-0.088320	(Intel Atom). The following example
-0.148395	function is InstructionSet().The following example
-0.236072	it is executed. An example
-0.235128	help the compiler optimize example
-0.275492	code in the above example
-0.275492	bit in the above example
-0.256898	Let's repeat the above example
-0.306212	a register. The above example
-0.335617	need metaprogramming. The next example
-0.234821	useful in situations like example
-0.428384	the compiler to reduce example
-0.231304	have tested can convert example
-0.122640	Gnu compiler will convert example
-0.122640	good compiler will convert example
-0.229981	anyway. If we modify example
-0.199948	example illustrates this. My example
-0.199948	with an example. My example
-0.222307	compilers are actually reducing example
-0.199897	compiler that automatically reduces example
-0.165045	to make the SelectAddMul example
-0.566176	only one of the compilers
-0.106296	but none of the compilers
-0.354148	metaprogramming. None of the compilers
-0.353245	the reductions that the compilers
-0.353245	be emphasized that the compilers
-0.332066	works on all the compilers
-0.429731	double because all the compilers
-0.437584	we may choose the compilers
-0.237030	shows which reductions the compilers
-0.237030	options turned on, the compilers
-0.236326	of the compiler. The compilers
-0.312847	of losing precision. The compilers
-0.236326	with a constant. The compilers
-0.236326	to do so. The compilers
-0.236326	by 8. 71 The compilers
-0.506946	code works only for compilers
-0.463523	libraries that come with compilers
-0.339246	may not work on compilers
-0.293698	instructions are accessible from compilers
-0.341139	shows whether the different compilers
-0.635206	8.2 Comparison of different compilers
-0.432580	are compiled with different compilers
-0.287502	object files from different compilers
-0.314117	you can use only compilers
-0.340866	of expressions and other compilers
-0.269677	Borland compiler with other compilers
-0.269677	not compatible with other compilers
-0.227762	make two. Some other compilers
-0.166822	no overhead while other compilers
-0.166822	to Func1, while other compilers
-0.237293	operator less. Fortunately, all compilers
-0.232287	{} brackets. However, most compilers
-0.232287	language output. On most compilers
-0.232287	+ d.y; Fortunately, most compilers
-0.153795	The Microsoft and Intel compilers
-0.153795	the PathScale and Intel compilers
-0.316021	Gnu, Clang and Intel compilers
-0.321993	detection mechanism in Intel compilers
-0.339948	page 127. The Intel compilers
-0.363849	in Intel compiler Intel compilers
-0.237074	bit mode. Some 64-bit compilers
-0.297601	different brands of C++ compilers
-0.191583	the sense that C++ compilers
-0.079908	optimizations in different C++ compilers
-0.002032	conventions for different C++ compilers
-0.079908	are several different C++ compilers
-0.191583	work on all C++ compilers
-0.191583	code. Furthermore, most C++ compilers
-0.191583	optimization options All C++ compilers
-0.085312	less efficient. Most C++ compilers
-0.085312	low-level optimizations. Most C++ compilers
-0.241851	Intel and Microsoft C++ compilers
-0.319654	will notice that some compilers
-0.233230	= 2; Unfortunately, some compilers
-0.355438	mathematical purity. For example, compilers
-0.277794	my study of how compilers
-0.277794	basic understanding of how compilers
-0.444615	Comments All of these compilers
-0.231112	way to tell these compilers
-0.563751	Microsoft, Intel and Gnu compilers
-0.229687	with Microsoft or Gnu compilers
-0.147309	whole program optimization Some compilers
-0.147309	at compile time. Some compilers
-0.067438	in 64-bit systems. Some compilers
-0.067438	64-bit operating systems. Some compilers
-0.147309	on all compilers. Some compilers
-0.147309	8.6 Optimization directives Some compilers
-0.118412	on the compiler. Some compilers
-0.118412	in a compiler. Some compilers
-0.147309	} Loop unrolling Some compilers
-0.147309	the optimal order. Some compilers
-0.147309	optimization option available. Some compilers
-0.147309	a cache line. Some compilers
-0.147309	doing the division. Some compilers
-0.192442	rather than two. Some compilers
-0.147309	of programming style. Some compilers
-0.147309	many different places). Some compilers
-0.343475	is available. The best compilers
-0.312308	very stupid. Some common compilers
-0.228047	For example, all good compilers
-0.313338	by some very good compilers
-0.235430	virtual table. Unfortunately, few compilers
-0.215417	a variable because optimizing compilers
-0.215417	of the best optimizing compilers
-0.215417	very fast. All optimizing compilers
-0.128060	in static memory. Most compilers
-0.128060	memory or cache. Most compilers
-0.128060	times. Thread-local storage Most compilers
-0.128060	CPU. Algebraic reductions Most compilers
-0.128060	outside the loop. Most compilers
-0.128060	a simple variable. Most compilers
-0.128060	these instruction sets. Most compilers
-0.128060	call. Algebraic reduction Most compilers
-0.128060	of the executable. Most compilers
-0.128060	decryption, data compression Most compilers
-0.128060	a, sizeof(b)); 47 Most compilers
-0.233981	to be slower. Many compilers
-0.231239	on the processor). Optimizing compilers
-0.205985	in this format. Other compilers
-0.205985	Intel and Gnu). Other compilers
-0.097621	the Intel and PathScale compilers
-0.097621	Gnu, Intel and PathScale compilers
-0.468509	do. The reason why compilers
-0.199860	only hope that future compilers
-0.199860	to Object1.Hello(), though future compilers
-0.199834	a code that current compilers
-0.199834	example 12.4a where current compilers
-0.229170	How compilers optimize Modern compilers
-0.329217	supported in the latest compilers
-0.226608	is supplied with Intel's compilers
-0.079302	the compiler 8.1 How compilers
-0.079302	.......................................................................................... 66 8.1 How compilers
-0.312385	Borland and Digital Mars compilers
-0.218338	missing in many commercial compilers
-0.218338	CodeGear, Codeplay and Watcom compilers
-0.212162	(not a number). Different compilers
-0.199808	(see page 73). Current compilers
-0.199808	file dvec.h vectorclass.h Supported compilers
-0.164962	12.3 Automatic vectorization Good compilers
-0.164962	set is enabled. Few compilers
-0.164962	* sizeof(float)); // (Some compilers
-0.463090	because this is the most
-0.328569	simple array is the most
-0.328569	The stack is the most
-0.328569	sorted list is the most
-0.348738	after each of the most
-0.543982	to some of the most
-0.348738	project. Some of the most
-0.659944	more versions of the most
-0.659944	multiple versions of the most
-0.490820	code instead of the most
-0.348738	just-in-time compilation of the most
-0.351588	dispatching only to the most
-0.516515	possible access to the most
-0.442881	class declaration and the most
-0.342535	the easiest and the most
-0.342535	the exponent, and the most
-0.434907	are used in the most
-0.350827	handling even in the most
-0.989939	function calls in the most
-1.149473	make sure that the most
-0.355431	above methods if the most
-0.577863	has to use the most
-0.352622	is high then the most
-0.577041	useful to make the most
-0.344445	rely on only the most
-0.516065	solved by making the most
-0.416386	models to run the most
-0.563262	able to calculate the most
-0.205286	bytes then put the most
-0.205286	other, then put the most
-0.335312	will automatically choose the most
-0.327824	branch mispredictions. When the most
-0.328649	else than finding the most
-0.290499	will always select the most
-0.290499	the programmer choosing the most
-0.234641	bounds is probably the most
-0.290499	identify and isolate the most
-0.234641	means are among the most
-0.653865	the operand that is most
-0.382074	of composite type is most
-0.314083	based on what is most
-0.314328	153. A profiler is most
-0.355126	the SSE2 version of most
-0.237528	have the best and most
-0.237528	vector. The simplest and most
-0.328398	high price, and in most
-0.234389	PathScale compilers can in most
-0.331861	waste of time in most
-0.311960	than pointers because in most
-0.332607	transferred by value in most
-0.234389	is quite simple in most
-0.377931	table is advantageous in most
-0.290213	operations are fast in most
-0.234389	other thread. However, in most
-0.290213	This is optimal in most
-0.023253	the hardware implementation in most
-0.234389	than linked lists in most
-0.234389	equally efficient because, in most
-0.554083	2; } } The most
-0.308160	14.10 Mathematical functions The most
-0.232392	4; Register variables The most
-0.727761	available instruction set. The most
-0.318632	in such cases. The most
-0.287943	doing arithmetic operations. The most
-0.308160	many different purposes. The most
-0.232392	loop control condition The most
-0.308160	reduce this problem. The most
-0.318632	function libraries available. The most
-0.308160	code execute faster. The most
-0.318632	a single element. The most
-0.232392	of out-of-order execution. The most
-0.232392	CPU dispatch methods. The most
-0.232392	reusable classes. Security The most
-0.232392	than code generality. The most
-0.294082	scientific computing, but for most
-0.236996	and generally used that most
-0.236996	better. Remember again, that most
-0.236996	can therefore conclude that most
-0.507301	and operating systems are most
-0.330573	predict which resources are most
-0.236858	ways. Switch statements are most
-1.150085	set is supported by most
-0.211570	source. It comes with most
-0.211570	(STL) which comes with most
-0.291104	is less important on most
-0.235172	regardless of precision on most
-0.379021	4 clock cycles on most
-0.439947	one clock cycle on most
-0.355996	inherently serial, such as most
-0.237608	find elsewhere. Faster than most
-0.324877	The dispatcher function will most
-0.343202	of the program has most
-0.852492	variables that are used most
-0.214380	13.3 Difficult cases In most
-0.286797	its limit, etc. In most
-0.214380	versus unsigned integers In most
-0.214380	4 clock cycles. In most
-0.214380	and other optimizations. In most
-0.214380	step by step. In most
-0.214380	for each calculation. In most
-0.214380	modifies many strings. In most
-0.237034	large shared object where most
-0.330834	that goes one way most
-0.236711	slow. Today, the 8 most
-0.313294	of which functions take most
-0.441577	A variable is accessed most
-0.236093	compiler for Windows, while most
-0.236028	or vice versa. But most
-0.115494	cache. The cache works most
-0.261427	The code cache works most
-0.115494	sequentially A cache works most
-0.291624	If the application uses most
-0.337700	is likely to run most
-0.235139	inside {} brackets. However, most
-0.310992	if they are predicted most
-0.231817	assembly language output. On most
-0.184629	input. Many programs spend most
-0.184629	libraries Some applications spend most
-0.222183	c.y + d.y; Fortunately, most
-0.222183	that most software runs most
-0.218379	piece of code. Furthermore, most
-0.355887	then you can obtain most
-0.212114	single task that consumes most
-0.164998	reduced performance. 25 Since most
-0.164998	the program 153 spends most
-0.450959	Furthermore, this solution is using
-0.237703	Mathcad (v. 15.0) is using
-0.152345	} The advantage of using
-0.152345	compilers. The advantage of using
-0.152345	enabled. The advantage of using
-0.152345	faster. The advantage of using
-0.152345	m. The advantage of using
-0.390522	array. The disadvantage of using
-0.343168	107. A disadvantage of using
-0.261603	The biggest disadvantage of using
-0.348136	handling system instead of using
-0.158564	user. The advantages of using
-0.158564	pointers. The advantages of using
-0.158564	style. The advantages of using
-0.158564	operands. The advantages of using
-0.158564	bits). The advantages of using
-0.452087	and the possibility of using
-0.452087	new. The purpose of using
-0.047743	true. The trick of using
-0.047743	(b1*b2); The trick of using
-0.289655	there are disadvantages of using
-0.321711	advantages and drawbacks of using
-0.233899	pros and cons of using
-0.233899	not. The advise of using
-0.023401	of execution speed to using
-0.283002	is an advantage to using
-0.211166	is no advantage to using
-0.236650	virtually no cost to using
-0.236650	no performance cost to using
-0.236318	any performance penalty to using
-0.236318	A little-known alternative to using
-0.312838	are various alternatives to using
-0.324585	into C++ classes and using
-0.293487	an integer counter and using
-0.293487	Accessing system devices and using
-0.325318	any speed advantage in using
-0.472943	code. The reason for using
-0.420407	no performance penalty for using
-0.381639	a test tool for using
-0.442775	anyway if you are using
-0.242302	the compiler you are using
-0.185465	first when you are using
-0.185465	readable when you are using
-0.301416	methods. If you are using
-0.301416	(www.intel.com). If you are using
-0.242302	make sure you are using
-0.242302	of whether you are using
-0.306147	this example, we are using
-0.306147	these examples we are using
-0.502483	two operating systems are using
-0.428350	prediction. Modern microprocessors are using
-0.237734	make a round function using
-0.311604	or static or by using
-0.319927	vector classes than by using
-0.213944	bit of x by using
-0.326842	can avoid this by using
-0.213944	for speed-critical functions by using
-0.326842	optimize this loop by using
-0.213944	f is set by using
-0.213944	the CPU clock by using
-0.213944	and global variables by using
-0.267037	power of 2 by using
-0.213944	in 64-bit systems by using
-0.213944	a better result by using
-0.060311	double the speed by using
-0.014311	gain in speed by using
-0.213944	often be optimized by using
-0.313346	performance significantly simply by using
-0.267037	initialized to zero by using
-0.213944	Avoid the conversions by using
-1.056731	can be avoided by using
-0.213944	the code further by using
-0.213944	may improve efficiency by using
-0.419587	microprocessors is obtained by using
-0.216767	can be improved by using
-0.267037	vectorize code explicitly by using
-0.213944	not alias anything by using
-0.213944	of these guidelines by using
-0.213944	can avoid hyperthreading by using
-0.213944	should be hidden by using
-0.213944	space, if necessary, by using
-0.213944	far data segment by using
-0.213944	can be ameliorated by using
-0.210948	compiler for restrictions on using
-0.401756	are certain restrictions on using
-0.236010	from www.intel.com. Manual on using
-0.764299	exactly as efficient as using
-0.237622	linking and by not using
-0.460615	the code rather than using
-0.237546	syntax is simpler when using
-0.428819	code could benefit from using
-0.237202	} The same example using
-0.330776	in www.agner.org/optimize/cppexamples.zip. An array using
-0.711756	difference in speed between using
-0.169420	Example 12.4e. Same example, using
-0.169420	Example 12.4d. Same example, using
-0.220449	an unrecoverable error without using
-0.220449	of an exception without using
-0.220449	of handling errors without using
-0.220449	call C1::f directly without using
-0.236470	on first call method using
-0.228157	15.1b. Calculate integer power using
-0.228157	Example 15.1d. Integer power using
-0.633314	to transpose a matrix using
-0.571047	with and without AVX using
-0.563774	result can be calculated using
-0.343051	do with bitwise operators using
-0.561042	All dynamic memory allocation using
-0.234967	of this by preferably using
-0.289988	reduce simple algebraic expressions using
-0.232527	in a single operation using
-0.508099	we are in fact using
-0.399129	14.7b. Testing multiple conditions using
-0.051807	Get supported instruction set, using
-0.367096	in dynamically allocated memory, using
-0.284272	recompile it. I am using
-0.212242	this manual, I am using
-0.218497	before it is finished using
-0.199853	itself when running. Programs using
-0.165004	libraries are highly optimized, using
-0.721960	before multiplying with the double
-0.382710	1.2; // everything is double
-0.502529	this example is a double
-0.899950	32 bits of a double
-0.550905	variable. For example, a double
-0.293082	32 bits while a double
-0.236911	recommended to modify a double
-0.293082	This function stores a double
-0.443752	implicitly converting a to double
-0.237419	from single precision to double
-0.293660	signed when converting to double
-0.322855	128 bit integer and double
-0.094604	256 bit float and double
-0.232466	Don't mix float and double
-0.094604	// Mixing float and double
-0.094604	code mixes float and double
-0.065340	speed between single and double
-0.065340	not mix single and double
-0.065340	for mixing single and double
-0.314058	single precision than for double
-0.324676	most cases, even for double
-0.336130	do not know that double
-0.606821	floating point constants are double
-0.355706	for example, you can double
-0.350846	unlimited 8 bytes = double
-0.043773	is a float or double
-0.043773	to a float or double
-0.092458	Conversions of float or double
-0.043773	conversion from float or double
-0.043773	conversions from float or double
-0.092458	registers (8 float or double
-0.233191	done with single or double
-0.414186	using single precision or double
-0.353322	are calculated faster than double
-0.355635	use float rather than double
-1.057028	double xpow10(double x) { double
-0.343710	Example 14.23b union { double
-0.471741	7.35b struct S1 { double
-0.321710	unsigned int n) { double
-0.522281	used). You may use double
-0.237345	some extra complications. A double
-0.237471	} return y; } double
-0.237320	integer power using loop double
-0.570458	making a and b double
-0.324184	on vectors of two double
-0.011223	powN { public: static double
-0.011223	powN<true,0> { public: static double
-0.011223	powN<true,N> { public: static double
-0.011223	powN<true,1> { public: static double
-0.467890	bits of a 64-bit double
-0.252398	you write a 64-bit double
-0.353000	exp function of 2 double
-0.213451	except in the long double
-0.021608	float, double and long double
-0.266480	are done with long double
-0.213451	These registers have long double
-0.213451	to calculate when long double
-0.213451	double 8 8 long double
-0.224169	make table of const double
-0.327647	() { static const double
-0.224169	with induction variables const double
-0.224169	Table[100]; int x; const double
-0.330607	8 float 4 4 double
-0.235967	double Func1(double) pure_function ; double
-0.291711	32 8 256 AVX double
-0.235506	uint64_t 256 float 128 double
-0.235409	vector can hold four double
-0.493467	integers int a, b; double
-0.311696	+ log(c[i]);. This would double
-0.348471	<int N> static inline double
-0.518722	calculation. In most cases, double
-0.234578	use single precision. Using double
-0.234492	double 128 float 256 double
-0.238358	y=temp;} int r, c; double
-0.238358	98 int r, c; double
-0.034953	the power of 10 double
-0.359874	{ short int a; double
-0.272966	Example 7.24 float a; double
-0.018913	struct S1 {double a; double
-0.230562	32 4 128 SSE double
-0.016056	7.25 unsigned int u; double
-0.016056	14.22b unsigned int u; double
-0.016056	14.22a unsigned int u; double
-0.559751	32 16 512 AVX512 double
-0.276145	double Y = C; double
-0.222078	#else #define pure_function #endif double
-0.148499	The representation of float, double
-0.148499	conversion Conversions between float, double
-0.212066	example: // Example 8.4 double
-0.346981	3.3; // Polynomial coefficients double
-0.122460	registers. Disadvantages are: Long double
-0.122460	and double precision. Long double
-0.211929	integer power, loop unrolled double
-0.465495	2.2, C = 3.3; double
-0.199668	to: // Example 14.14b double
-0.199668	reciprocal: // Example 14.14a double
-0.199668	= A + A; double
-0.008658	matrix void TransposeCopy(double a[SIZE][SIZE], double
-0.164833	double x4 = x2*x2; double
-0.164833	1000; unsigned int dummy; double
-0.164833	r1, r2, c1, c2; double
-0.164833	1.2f; // Example 14.18c double
-0.164833	Example: // Example 8.2a double
-0.164833	Example: // Example 7.32a double
-0.164833	condition: // Example 7.32b double
-0.164833	test () { __declspec(__align(64)) double
-0.164833	to: // Example 14.17b double
-0.164833	example: // Example 14.16a double
-0.164833	void StoreNTD(double * dest, double
-0.164833	x2 = x *x; double
-0.164833	by // Example 8.8b double
-0.164833	Example: // Example 8.8a double
-0.164833	double x8 = x4*x4; double
-0.164833	denominator: // Example 14.16b double
-0.164833	Example: // Example 14.17a double
-0.164833	function: // Example 14.20 double
-1.331264	a multiple of the size
-0.355804	compiler documentation for the size
-0.341436	the memory if the size
-0.341436	cause errors if the size
-0.137994	can happen if the size
-0.137994	will happen if the size
-0.341436	doesn't matter if the size
-1.022494	is divisible by the size
-0.239917	is multiplied by the size
-0.346257	be multiplied by the size
-0.565247	vector depends on the size
-0.348621	as well as the size
-0.340882	the matrix when the size
-0.340882	memory allocation when the size
-0.624499	allocated dynamically when the size
-0.354044	but i*12, because the size
-0.354389	an addition. If the size
-0.333850	arrays and where the size
-0.887348	in cases where the size
-0.788403	the above example, the size
-0.449192	is slow unless the size
-0.337791	sub-vectors that fit the size
-0.328987	you cannot increase the size
-0.291566	of only half the size
-0.102008	the stack. Is the size
-0.102008	page 38). Is the size
-0.235579	zero // Return the size
-0.291566	hash table increases the size
-0.294077	specifying the type and size
-0.311533	calls more efficient. The size
-0.311533	caching less efficient. The size
-0.378138	size execution units. The size
-0.378138	divisible by 8. The size
-0.290382	or class elements. The size
-0.234538	or class objects. The size
-0.403158	in another module. The size
-0.532155	the following reasons: The size
-0.234538	an arithmetic expression. The size
-0.234538	element number i. The size
-0.293654	choice between optimizing for size
-0.314145	for speed. Optimizing for size
-0.356713	by making the code size
-0.292003	cache efficiency and code size
-0.235963	economy and small code size
-0.183269	Automatic vectorization const int size
-0.183269	order(int x); const int size
-0.183269	__attribute__((aligned(16))) #endif const int size
-0.183269	Example 14.30 const int size
-0.183269	Example 11.3 const int size
-0.039085	int Func(int); const int size
-0.183269	Example 11.2b const int size
-0.183269	Example 11.2a const int size
-0.183269	Example 7.33b const int size
-0.183269	Example 7.33a const int size
-0.183269	Example 14.4a const int size
-0.237451	use the smallest data size
-0.074694	aligned by the vector size
-0.035743	divisible by the vector size
-0.232257	support a new vector size
-0.349611	for transposition of different size
-0.235589	transposing and copying different size
-0.237365	with execution units same size
-0.237301	efficient today where cache size
-0.324948	the smaller the integer size
-0.302882	particular application. The integer size
-0.334117	to use an integer size
-0.219437	the most efficient integer size
-0.219437	whether a particular integer size
-0.219437	use the default integer size
-0.009677	use the smallest integer size
-0.019575	using the smallest integer size
-0.237174	by the memory page size
-0.346241	to make the array size
-0.233626	allocate the final array size
-0.330925	similar objects of variable size
-0.293103	multiply integers of any size
-0.317031	expansions of the register size
-0.320617	the largest vector register size
-0.045940	of a new register size
-0.045940	use a new register size
-0.236152	to make sure its size
-0.449397	integer of a specific size
-0.368768	The 64 64 matrix size
-0.368768	The 512 512 matrix size
-0.040246	size with a line size
-0.040246	ways, with a line size
-0.322223	by the cache line size
-0.220991	least the cache line size
-0.085628	processors. The cache line size
-0.085628	line. The cache line size
-0.192392	align by cache line size
-0.235014	cost in performance. Integer size
-0.234650	often because the block size
-0.234289	converted to a longer size
-0.289109	integer to a smaller size
-0.308302	0 and i >= size
-0.327318	vector registers. The maximum size
-0.392400	estimate of the final size
-0.275613	allocated. If the final size
-0.342875	double temp; // Define size
-0.231734	(SIMD) operations. The total size
-0.231176	for handling a full size
-0.231301	disk if the RAM size
-0.229070	problem. Vectors of 256-bit size
-0.284170	integers of the default size
-0.219481	solution is a fixed size
-0.202928	declare objects and fixed size
-0.415070	circular buffer with fixed size
-0.229114	= 256; // Array size
-0.165074	in large arrays. Array size
-0.074723	cache. If the combined size
-0.074723	model where the combined size
-0.027551	Number of elements Total size
-0.027551	Type of elements Total size
-0.264956	cache of 8 kb size
-0.074706	per array element. Matrix size
-0.074706	were as follows: Matrix size
-0.164988	on future CPUs. Half size
-0.497419	detection function of the Intel
-0.353486	important disadvantage of the Intel
-0.456723	The behavior of the Intel
-0.353486	the bias of the Intel
-0.356433	is inferior to the Intel
-0.355507	Gnu compiler and the Intel
-0.065351	dispatch mechanism in the Intel
-0.356156	highly optimized for the Intel
-0.920245	The fact that the Intel
-0.381705	optimized. Note that the Intel
-0.381705	disassembler. Note that the Intel
-0.523300	files generated by the Intel
-0.652224	that come with the Intel
-0.653451	directives work on the Intel
-0.563972	prefer to use the Intel
-0.692592	You may use the Intel
-0.354786	or references: If the Intel
-0.343247	directive __declspec(cpu_dispatch(...)). See the Intel
-0.339391	data sets. However, the Intel
-0.611905	In some cases, the Intel
-0.235991	(In my tests, the Intel
-0.235991	sin, etc. Overriding the Intel
-0.830091	with the use of Intel
-0.352301	Hyperthreading Some versions of Intel
-0.313968	the optimization features of Intel
-0.237266	compiler in favor of Intel
-0.350594	Supports both AMD and Intel
-0.292840	platforms. The Microsoft and Intel
-0.292840	on the PathScale and Intel
-0.023430	the Gnu, Clang and Intel
-0.349991	CPU detection function in Intel
-0.292626	performance monitor counter in Intel
-0.135094	13.7 CPU dispatching in Intel
-0.313650	CPU detection mechanism in Intel
-0.604777	Vector classes defined in Intel
-0.557287	i/2; } } The Intel
-0.234511	Mac platform. Intel The Intel
-0.335148	and Itanium systems. The Intel
-0.472802	advantageous or not. The Intel
-0.234511	always avoiding this. The Intel
-0.290351	13.1 page 127. The Intel
-0.234511	16 is required. The Intel
-0.378101	see page 122. The Intel
-0.234511	130 for details). The Intel
-0.234511	- vectorclass www.agner.org/optimize/#vectorclass. The Intel
-0.636277	and assembly code for Intel
-0.506229	compiler works only for Intel
-0.237634	libraries. Use Gnu or Intel
-0.236930	was originally designed by Intel
-0.236930	function libraries published by Intel
-0.236857	dispatching works only with Intel
-0.236857	Compiler Documentation". Included with Intel
-0.233478	older processors and on Intel
-0.414620	processors, but not on Intel
-0.289177	It also works on Intel
-0.233478	specific CPU feature on Intel
-0.233478	well in tests on Intel
-0.233478	the BIOS setup. on Intel
-0.347095	the processor is an Intel
-0.091837	is running on an Intel
-0.091837	when running on an Intel
-0.047869	you are using an Intel
-0.232486	manipulated to fake an Intel
-0.610658	dispatching in Intel compiler Intel
-0.864197	compiler Linux Intel compiler Intel
-0.293778	For Intel CPUs use Intel
-0.237517	methods also work when Intel
-0.291855	currently available, one from Intel
-0.291855	Intel Agner Available from Intel
-0.351954	a function for different Intel
-0.381818	12.4d. Same example, using Intel
-0.237150	function of 2 double Intel
-0.342767	lately. Vector class library Intel
-0.236810	available from Intel. See Intel
-0.236793	CPU- specific profiler. For Intel
-0.236198	Mars PGI PathScale Gnu Intel
-0.084965	Linux Intel compiler Windows Intel
-0.418454	0.28 strlen 128 bytes Intel
-0.014455	Windows Gnu compiler Linux Intel
-0.291701	use the well optimized Intel
-0.235225	zero flags on certain Intel
-0.329634	-fno-builtin Gnu 32-bit Mac Intel
-0.620663	Gnu and PathScale compilers. Intel
-0.233950	automatic CPU dispatching. Many Intel
-0.335792	Intel Core and later Intel
-0.529840	memcpy 16kB aligned operands Intel
-0.230549	with internal multi-threading, e.g. Intel
-0.285084	4, while all newer Intel
-0.229844	certain tasks on current Intel
-0.136314	have tried. The Microsoft, Intel
-0.136314	32-bit Linux with Microsoft, Intel
-0.136314	latest compilers from Microsoft, Intel
-0.136314	intrinsic functions (i.e. Microsoft, Intel
-0.228068	automatic parallelization. The Gnu, Intel
-0.069250	and the Gnu, Clang, Intel
-0.069250	use the Gnu, Clang, Intel
-0.059152	such as Gnu, Clang, Intel
-0.271966	vector, bits Vector class, Intel
-0.212039	AMD processors and earlier Intel
-0.212039	run on Mac platform. Intel
-0.098541	long vector math libraries: Intel
-0.098541	short vector math libraries: Intel
-0.465691	dispatcher. See page 131. Intel
-0.465691	memcpy 16kB unaligned op. Intel
-0.035738	manuals from Intel: "IA-32 Intel
-0.035738	Microprocessor documentation Intel: "IA-32 Intel
-0.164931	& earlier vmlsExp4 vmldExp2 Intel
-0.164931	C++ Builder 5, 2009). Intel
-0.164931	that doesn’t. The undocumented Intel
-0.164931	& later __svml_expf4 __svml_exp2 Intel
-0.164931	2.8. Asmlib: v. 2.00. Intel
-0.164931	vectorized as follows (using Intel
-0.164931	profilers such as AQtime, Intel
-0.563538	the value of the pointer
-0.570728	are sure that the pointer
-0.356249	PLT entry with the pointer
-0.356650	is deleted when the pointer
-0.355172	enough registers then the pointer
-0.353422	pointer well before the pointer
-0.624615	clock cycles after the pointer
-0.685157	if there is a pointer
-0.783051	then there is a pointer
-0.648203	the value of a pointer
-0.313241	one class to a pointer
-0.481634	'this' pointer to a pointer
-0.406285	is added to a pointer
-0.402466	is converted to a pointer
-0.402466	be converted to a pointer
-0.406285	is type-casted to a pointer
-0.344836	memory block and a pointer
-0.287322	or give it a pointer
-0.451846	simple function with a pointer
-0.350630	the same as a pointer
-0.339666	For example, when a pointer
-0.346191	polymorphic class has a pointer
-0.571221	safe to make a pointer
-0.296573	new and return a pointer
-0.296573	the function return a pointer
-0.280625	the function through a pointer
-0.363811	derived class through a pointer
-0.209150	accesses b through a pointer
-0.280625	or object through a pointer
-0.209150	a variable through a pointer
-0.280625	is accessed through a pointer
-0.209150	extra jump through a pointer
-0.287322	can change what a pointer
-0.231846	faster to transfer a pointer
-0.287322	Gnu mechanism stores a pointer
-0.231846	different type. Likewise, a pointer
-0.309932	used for converting a pointer
-0.287322	use and returns a pointer
-0.231846	{ // Returns a pointer
-0.231846	Incrementing or decrementing a pointer
-0.231846	not modified. Unlike a pointer
-0.231846	however, to pass a pointer
-0.237794	avoiding pointer arithmetics and pointer
-0.294084	making a pointer. The pointer
-0.988830	operator is used for pointer
-0.237814	#pragma optimize("a",on). Specifies that pointer
-0.382524	Use a reference or pointer
-0.111565	value of the function pointer
-0.262085	changes of the function pointer
-0.991988	every time the function pointer
-0.425890	dispatcher changes the function pointer
-0.229560	function through a function pointer
-0.229560	called through a function pointer
-0.131382	then sets a function pointer
-0.131382	routine sets a function pointer
-0.320653	or what a function pointer
-0.529701	and a member function pointer
-0.284549	critical function through function pointer
-0.229405	116 // Set function pointer
-0.237749	they are uninitialized, if pointer
-0.237677	the last member. This pointer
-0.330277	an integer, and this pointer
-0.236280	reference parameters). The this pointer
-0.234547	Pointer type conversion A pointer
-0.234547	accessed. Pointer arithmetic A pointer
-0.234547	sin. Pointer elimination A pointer
-0.323871	the code has no pointer
-0.221165	the hint about no pointer
-0.221165	compiler to assume no pointer
-0.096653	/GR- -fno-rtti Assume no pointer
-0.096653	page 78. Assume no pointer
-0.221165	option for assuming no pointer
-0.096653	option for "assume no pointer
-0.096653	compiler option "assume no pointer
-0.349865	it compares the array pointer
-0.330795	the obstacle of possible pointer
-0.233363	should never return any pointer
-0.289046	should avoid making any pointer
-0.334004	class that the member pointer
-0.486778	cases, a data member pointer
-0.313471	is taken. A const pointer
-0.330723	32-bit mode 4 4 pointer
-0.334773	or unsigned 8 8 pointer
-0.491253	p is a simple pointer
-0.292233	block should have its pointer
-0.349900	has insufficient information about pointer
-0.347714	compiler that a specific pointer
-0.300023	instrset_detect function // Function pointer
-0.300023	CriticalFunctionType CriticalFunction_Dispatch; // Function pointer
-0.234436	13.1 below. 126 Make pointer
-0.217496	linear array. No link pointer
-0.217496	until the previous link pointer
-0.210471	#pragma vector aligned Assume pointer
-0.210471	align(16)) __attribute(( aligned(16))) Assume pointer
-0.191540	then use a smart pointer
-0.191540	cost whenever a smart pointer
-0.060172	Smart pointers A smart pointer
-0.060172	longer used. A smart pointer
-0.130044	with each their smart pointer
-0.162875	functions have a 'this' pointer
-0.120573	member functions. The 'this' pointer
-0.120573	by type-casting its 'this' pointer
-0.056115	__fastcall. The implicit 'this' pointer
-0.056115	has an implicit 'this' pointer
-0.227985	double plus 6 integer, pointer
-0.036288	= InstructionSet(); // Set pointer
-0.299112	invalid, and by avoiding pointer
-0.291533	checks whether the original pointer
-0.218441	but risky. The returned pointer
-0.218441	accessed through the implicit pointer
-0.264976	information about a variable, pointer
-0.035751	frame- pointer -fomit- frame- pointer
-0.035751	frame /Oy -fomit- frame- pointer
-1.248993	if the value of b
-0.331263	read four elements of b
-0.324808	}; The offset of b
-0.293494	is fast. Value of b
-0.056016	// copy a to b
-0.020628	uses of a and b
-0.010192	values of a and b
-0.065080	area for a and b
-0.015379	faster if a and b
-0.015379	example, if a and b
-0.015379	even if a and b
-0.015379	optimized if a and b
-0.031322	number. If a and b
-0.031322	factor. If a and b
-0.065080	bytes between a and b
-0.065080	by making a and b
-0.065080	are used. a and b
-0.065080	by joining a and b
-0.065080	= MultiplyBy<8>(10); a and b
-0.619923	of the time and b
-0.147406	Compare each element in b
-0.350735	If we assume that b
-0.224490	changed to a = b
-0.224490	For example, a = b
-0.114729	a, b; a = b
-0.140753	bool b; a = b
-0.096343	/ c; a = b
-0.045505	b, c; a = b
-0.096343	% c; a = b
-0.224490	c, d; a = b
-0.357728	/ 10; a = b
-0.357728	% 10; a = b
-0.224490	Example 7.2 a = b
-0.224490	float 140 a = b
-0.224490	Example 8.2b a = b
-0.046119	i); // result = b
-0.046119	c.load(cc+i); // result = b
-0.306939	overlap. If c = b
-0.306939	= b; c = b
-1.128076	i++) { a[i] = b
-0.435888	b, temp; temp = b
-0.454904	float, but not if b
-0.102016	a & b if b
-0.102016	a | b if b
-0.235601	division is inexact if b
-0.293941	This extra check on b
-0.344014	evaluate a only when b
-0.319769	1.5f; is efficient when b
-0.233324	a different implementation when b
-0.233324	is inefficient, however, when b
-0.574603	a and b because b
-0.154529	x = a + b
-0.045661	y = a + b
-0.209942	to write a + b
-0.328078	y = c + b
-0.220623	a = 1.0f + b
-0.407285	a[i] = b * b
-0.407285	temp = b * b
-0.236824	-b to a < b
-0.344871	b with a & b
-0.236122	order to align its b
-0.236026	In this example, a, b
-0.235596	can compute a / b
-0.342791	7.10a bool a, b; b
-0.226983	a = 0, b; b
-0.235298	a will be 1 b
-0.022570	c + 2 : b
-0.291226	the operands and add b
-0.222849	on the intermediate expression b
-0.222849	cases. The equivalent expression b
-0.022310	into vector b: __m128i b
-0.222255	a, b; double c; b
-1.115400	int a, b, c; b
-0.253438	cannot replace a && b
-0.253438	The expression a && b
-0.336208	b with a | b
-0.327218	cannot replace a || b
-0.206519	{ return a > b
-0.091098	14.15a if (a > b
-0.091098	#define MAX(a,b) (a > b
-0.619398	char a = 0, b
-0.288847	c; x[0] = a; b
-0.021000	0, c + 2, b
-0.306778	therefore necessary to convert b
-0.042590	(!a&&c) = a ? b
-0.042590	(b&&c) = a ? b
-0.409106	b needs to evaluate b
-0.401343	~b = a ^ b
-0.224683	into vector b: Is16vec8 b
-0.222301	Here, I have AND'ed b
-0.016021	_mm_add_epi16(c, two); // Multiply b
-0.212062	the sake of security. b
-0.347162	Any code that accesses b
-0.164952	char a = -100, b
-0.164952	float a = -1.0E8, b
-0.164952	a = parabola (2.0f); b
-0.164952	8.3b a = 5.0f; b
-0.164952	zero, c + two, b
-0.164952	(!a&&b) = a XOR b
-0.164952	b; a = Multiply(10,8); b
-0.164952	temp; temp = a+1; b
-0.237710	code and divide it into
-0.654756	Splitting up a function into
-0.428028	turning the frame function into
-0.353186	automatic parallelization of code into
-0.331210	wrap the allocated memory into
-0.333560	split up the data into
-0.431602	bottleneck. Organize the data into
-0.230713	putting the right data into
-0.230713	penalty for organizing data into
-0.230713	Aligning data Loading data into
-0.275991	store aligned integer vector into
-0.063281	store unaligned integer vector into
-0.237174	divide the data set into
-0.509280	the function or class into
-0.324678	four elements of b into
-0.492472	instance of the library into
-0.789523	sign bit of i into
-0.293088	wrap the allocated array into
-0.324193	much data as possible into
-0.236817	space. Putting simple variables into
-0.340266	The splitting of software into
-0.345376	it feeds a branch into
-0.236541	separating the flags register into
-0.341048	we have to take into
-0.231920	vectorized if you take into
-0.236247	splitting 256-bit read operations into
-0.292295	should be split up into
-0.169759	to divide the work into
-0.222145	equal amount of work into
-0.235911	put time- consuming calculations into
-0.511310	source code is compiled into
-0.022804	on what fits best into
-0.339344	to divide the matrix into
-0.287041	join all source files into
-0.044539	compiling multiple .cpp files into
-0.044539	combining multiple .cpp files into
-0.419100	time and compatibility problems into
-0.345958	the old memory block into
-0.290430	have to be put into
-0.234709	load the structure y into
-0.310240	a must be read into
-0.199021	versions should be linked into
-0.199021	most cases be linked into
-0.233957	part of it) load into
-0.289494	all the objects together into
-0.289389	by wrapping the vectors into
-0.233512	is reset or goes into
-0.289298	put a test feature into
-0.288848	the multiple .cpp modules into
-0.288622	eight elements will go into
-0.475006	the program is loaded into
-0.180996	and esp+12 and loaded into
-0.279014	address cannot be loaded into
-0.431661	dynamic libraries are loaded into
-0.307646	Don't put a task into
-0.164898	returned by copying them into
-0.164898	file and copies them into
-0.164898	better to join them into
-0.164898	format and getting them into
-0.208225	to split the tasks into
-0.260580	to put time-consuming tasks into
-0.205980	This does not fit into
-0.338882	make the data fit into
-0.306644	The splitting of N into
-0.230543	put measurement instruments directly into
-0.432552	members can be copied into
-0.229852	License shall automatically come into
-0.229943	counter and go back into
-0.290357	can easily be organized into
-0.251147	Most caches are organized into
-0.312952	to take branch prediction into
-0.176313	small that it fits into
-0.176313	of four float's fits into
-0.309046	to divide the job into
-0.222055	function can be turned into
-0.222055	without taking cache effects into
-0.222055	and simply put 80 into
-0.222055	128-bit operation was split into
-0.028333	a task is divided into
-0.006912	that can be divided into
-0.006912	job can be divided into
-0.028333	tasks were not divided into
-0.028333	area is usually divided into
-0.222055	s3 can be combined into
-0.218163	inserted, one by one, into
-0.000953	consecutive elements from cc into
-0.008657	vector b: from cc into
-0.000953	consecutive elements from bb into
-0.008657	} 115 from bb into
-0.264729	the desired measurement instruments into
-0.403235	address 0x2700 to 0x273F into
-0.264894	program code is translated into
-0.211901	combined by some formula into
-0.005109	This should be taken into
-0.005109	problems should be taken into
-0.005109	considerations should be taken into
-0.102737	They can be joined into
-0.157648	instances will be joined into
-0.199640	wrong branch is fed into
-0.074632	file can be wrapped into
-0.074632	unless they are wrapped into
-0.164807	want to go deeper into
-0.164807	for 32-bit Windows. Integrates into
-0.164807	the data fit nicely into
-0.164807	program - preferably isolated into
-0.164807	at a time packed into
-0.164807	two branches to feed into
-0.280428	y; x = a +
-0.207356	a; b = a +
-0.207356	(2.0f); b = a +
-0.416047	to c = a +
-0.043308	y; y = a +
-0.287602	(b) { return a +
-0.182880	3; } return a +
-0.039015	* 2; return a +
-0.358946	* 3; return a +
-0.291489	easier to write a +
-0.235511	int Sum1() {return a +
-0.027386	return x * x +
-0.121734	- 8.0f) * x +
-0.171532	double Z = A +
-0.171532	double A2 = A +
-0.904722	b; a = b +
-0.426784	d; a = b +
-0.426784	8.2b a = b +
-0.111408	If c = b +
-0.111408	b; c = b +
-0.013093	= a + b +
-0.098663	= c + b +
-0.021160	= b * b +
-0.293335	of i ; i +
-0.236651	; parameter 1: 4 +
-0.236638	; parameter 1: 8 +
-0.186352	write: y = c +
-0.039636	= b + c +
-0.039636	+ b + c +
-0.019366	select(b > 0, c +
-0.019366	> 0 ? c +
-0.186352	= select_gt(b, zero, c +
-0.234796	= b;} vector operator +
-0.018076	} z = y +
-0.018076	cos(x); z = y +
-0.018076	sin(x); z = y +
-0.198537	vector(x + a.x, y +
-0.160804	{ r = r +
-0.160804	{ a[i] = r +
-0.233497	{ temp = a[i] +
-0.232895	i; p = p +
-0.232461	= (a + b) +
-0.092607	3.5; c = d +
-0.092607	c; y = d +
-0.053657	: 8; // exponent +
-0.053657	: 15; // exponent +
-0.053657	: 11; // exponent +
-0.229801	column++) matrix[row][column] = row +
-0.048369	{ a[i] = *p +
-0.011597	{ *p = *p +
-0.281249	y; y = (a +
-0.305781	calculated as(a << 4) +
-0.224731	= i * 9 +
-0.222272	+ b) + (c +
-0.222201	{ a[i] = b[i] +
-0.291381	__svml_expf4 __svml_exp2 Intel SVML +
-0.218345	intermediate object for (b +
-0.002548	__m128i c = LoadVector(cc +
-0.005111	Is16vec8 c = LoadVector(cc +
-0.002548	__m128i b = LoadVector(bb +
-0.005111	Is16vec8 b = LoadVector(bb +
-0.001697	elements in aa: StoreVector(aa +
-0.008661	+ b*x*x + c*x +
-0.008661	a*b+a*c=a*(b+c) a*x*x*x + b*x*x +
-0.008661	= (a+b)+(c+d) a*b+a*c=a*(b+c) a*x*x*x +
-0.199729	{ aa[i] = bb[i] +
-0.017498	overflow: a[i] = log(b[i]) +
-0.017498	formula a[i] = log(b[i]) +
-0.199729	return Func1(x) * Func1(x) +
-0.199729	solution a = 1.0f +
-0.199729	ret ALIGN ?Func@@YAXQAHAAH@Z ENDP +
-0.164890	d.x; a.y = b.y +
-0.164890	code must compute (FuncRow(i)*columns +
-0.164890	b2; y = a1/b1 +
-0.164890	calculated internally as (int)&matrix[0][0] +
-0.164890	order(i); list[j].a = list[j].b +
-0.164890	= d + e +
-0.164890	& r) {return r.a +
-0.164890	{ Table[x] = A*x*x +
-0.164890	b2; y = (a1*b2 +
-0.164890	= A*x*x + B*x +
-0.164890	x) { return x*x +
-0.164890	> 0) ? (cc[i] +
-0.164890	* p) {return p->a +
-0.164890	3.; x.d = y.d +
-0.164890	... x.a = y.a +
-0.164890	1.; x.b = y.b +
-0.164890	2.; x.c = y.c +
-0.164890	x) { return square(x) +
-0.164890	= 2.5*x^2 - 8*x +
-0.164890	is (int)(&list[100]) = (int)(&list[0]) +
-0.164890	a) { return vector(x +
-0.164890	7.41b a.x = b.x +
-0.164890	= b.x + c.x +
-0.164890	= b.y + c.y +
-0.103880	a = a - n.a.
-0.103880	0 = a - n.a.
-0.178969	-(-a) = a - n.a.
-0.178969	a*1 = a - n.a.
-0.178969	a+0 = a - n.a.
-0.103880	~(~a) = a - n.a.
-1.010860	- - - - n.a.
-0.716898	- x - - n.a.
-0.896089	x x - - n.a.
-0.003681	n.a. n.a. - - n.a.
-0.026907	n.a. n.a. x - n.a.
-0.061386	n.a. - n.a. - n.a.
-0.526865	n.a. x n.a. - n.a.
-0.287343	- reciprocal n.a. - n.a.
-0.180461	((x2) 2) 2 - n.a.
-0.094205	~a = 0 - n.a.
-0.167492	a-a = 0 - n.a.
-0.167492	a*0 = 0 - n.a.
-0.094205	^a = 0 - n.a.
-0.094205	andnot(a,a) = 0 - n.a.
-0.124579	& 0= 0 - n.a.
-0.180461	(a >= b) - n.a.
-0.180461	-1 = -1 - n.a.
-0.180461	(-a)*(-b) = a*b - n.a.
-0.180461	n.a. Constant folding - n.a.
-0.304580	(a+b)+c = a+(b+c) - n.a.
-0.430709	a*b+a*c = a*(b+c) - n.a.
-0.180461	a*b = b*a - n.a.
-0.180461	(a&b)|(a&c) = a&(b|c) - n.a.
-0.180461	a<<b<<c = a<<(b+c) - n.a.
-0.180461	a+a+a+a = a*4 - n.a.
-0.109102	n.a. - n.a. x n.a.
-0.312990	n.a. x n.a. x n.a.
-0.233398	Common subexpression elimination x n.a.
-0.233398	reductions: a+b=b+a, a*b=b*a x n.a.
-0.000620	- - - n.a. n.a.
-0.000517	x - - n.a. n.a.
-0.000148	n.a. - - n.a. n.a.
-0.011158	n.a. x - n.a. n.a.
-0.148180	Linux __INTEL_COMPILER __INTEL_COMPILER n.a. n.a.
-0.233981	_WIN32 _WIN32 Linux platform n.a.
-0.226752	multiply by - reciprocal n.a.
-0.272261	Windows Linux __INTEL_COMPILER __INTEL_COMPILER n.a.
-0.212281	and not not _WIN32 n.a.
-0.200009	K8 0.38 0.44 0.40 n.a.
-0.200009	K8 0.24 0.25 0.24 n.a.
-0.165148	K8 1.09 1.25 1.61 n.a.
-0.589838	the part of the library
-0.568209	CPU-specific versions of the library
-0.535457	one instance of the library
-0.461027	a parameter to the library
-0.356053	the compiler, and the library
-0.357557	have supplied in the library
-0.461092	is useful if the library
-0.354684	library other than the library
-0.460180	is resolved when the library
-0.340207	single function from the library
-0.340207	are needed from the library
-0.520086	have to call the library
-0.369638	want to call the library
-0.419805	_mm256_zeroupper() before calling the library
-0.337950	compiler by including the library
-0.324121	library and economize the library
-0.292642	application program loads the library
-0.357478	loop. log is a library
-1.199823	a function in a library
-0.355868	get ReadTSC as a library
-0.551761	matrix. For example, a library
-0.490351	includes the addresses of library
-0.335544	of its time in library
-0.237329	lot of CPU-time in library
-0.237329	_intel_fast_memcpy and __intel_new_strlen in library
-0.344577	same source code. The library
-0.293670	a vector register. The library
-0.540821	throw()specification is useful for library
-0.313714	initialization. The program or library
-0.293243	distributed as object or library
-0.126355	available in the function library
-0.747276	implemented as a function library
-0.342995	and publish a function library
-0.333840	compile time. The function library
-0.324531	mutually incompatible. A function library
-0.316320	using an Intel function library
-0.319646	file. Use another function library
-0.282989	have a standard function library
-0.369107	in a separate function library
-0.228030	Asmlib My own function library
-0.228030	the Gnu C function library
-0.282989	the Intel math function library
-0.228030	but the asmlib function library
-0.228030	choose an up-to-date function library
-0.289668	Intel C++ compiler. This library
-0.101394	and Windows platforms. This library
-0.101394	all x86 platforms. This library
-0.233910	(MKL v. 7.2). This library
-0.233910	the option -mveclibabi=svml. This library
-0.090867	compiler can use this library
-0.237471	// Use ReadTSC() from library
-0.420660	using a long vector library
-0.357492	a big floating point library
-0.140177	and the vector class library
-0.087305	registers. The vector class library
-0.087305	"vectorclass.h" // vector class library
-0.087305	The Intel vector class library
-0.087305	me. My vector class library
-0.064457	classes Agner's vector class library
-0.064457	107). Agner's vector class library
-0.064457	amd_vrd2_exp Agner's vector class library
-0.291650	updated lately. Vector class library
-0.237258	SSE2 version of most library
-0.234598	CPU dispatching. Many Intel library
-0.234598	doesn’t. The undocumented Intel library
-0.549386	finding the most efficient library
-0.313576	static linking for any library
-0.100548	to the standard template library
-0.100548	classes. The standard template library
-0.024308	function in a dynamic library
-0.106646	at which a dynamic library
-0.056367	calls it. A dynamic library
-0.056367	each process. A dynamic library
-0.056367	static linking. A dynamic library
-0.327182	of the same dynamic library
-0.197334	clash with another dynamic library
-0.337069	problems if the necessary library
-0.344773	option -fno-builtin to get library
-0.235345	Intel) know that standard library
-0.291247	more efforts in optimizing library
-0.866688	call to a graphics library
-0.234085	addresses of dynamically linked library
-0.256753	systems. The user interface library
-0.256753	time. A user interface library
-0.256753	A popular user interface library
-0.278268	in a static link library
-0.164561	as a dynamic link library
-0.035669	a separate dynamic link library
-0.215131	The AMD math core library
-0.215131	AMD AMD Math core library
-0.233147	reusable and well- tested library
-0.318873	libraries: Intel vector math library
-0.318873	Intel short vector math library
-0.336814	linking makes the entire library
-0.229211	the function call. Load library
-0.281446	execution time on executing library
-0.226604	library functions. Time- consuming library
-0.127682	provided in the asmlib library
-0.010128	instruction set, using asmlib library
-0.164998	of a re- usable library
-0.164998	and SVML. The IPP library
-0.164998	the "Intel Performance Primitives" library
-0.623995	a linear function of i
-0.405668	the new value of i
-0.312743	the binary value of i
-0.312743	possible negative value of i
-0.312743	the initial value of i
-0.285123	the sign bit of i
-0.279948	down sign bit of i
-0.236348	i++)a[i]=2*i; The conversion of i
-0.314728	ebx,1 adds this to i
-0.336207	i < 0 and i
-0.237815	has not noticed that i
-1.190506	i++) { a[i] = i
-0.292626	i<300; i++){ list[i] = i
-0.236511	85 ; eax = i
-0.237750	the $B1$2 label if i
-1.209959	is the same as i
-0.357516	to p is not i
-0.237628	line doesn't work int i
-0.535346	large positive number when i
-0.236143	list[i] is invalid when i
-0.453395	so that it has i
-0.237415	0 to 15. If i
-0.343015	this interval, for example i
-0.001254	(int i = 0; i
-0.000108	for (i = 0; i
-0.348329	it this by making i
-0.229300	bit of i ; i
-0.229300	calculations: for ( ; i
-0.323506	i<300; i++){ list[i] += i
-0.328680	it takes to add i
-0.345653	freely. The loop counter i
-0.009848	cc[]) { for (int i
-0.009848	bb) { for (int i
-0.019924	= 0; for (int i
-0.009848	// ... for (int i
-0.009848	List[ArraySize]; ... for (int i
-0.003912	biggest vectors: for (int i
-0.000975	eight-element vectors: for (int i
-0.019924	100 floats for (int i
-0.019924	0, sum; for (int i
-0.019924	name _alloca) for (int i
-0.234340	(i >= min && i
-0.234206	(i < 0 || i
-0.882324	0; i < 100; i
-0.321355	for (i = 2; i
-0.233303	The compiler has replaced i
-0.174845	in order to divide i
-0.230671	is the loop condition i
-1.579928	0; i < size; i
-0.027252	0; i < 256; i
-0.226614	It can also eliminate i
-0.279350	} The two comparisons i
-0.279266	element. Rather than comparing i
-0.304288	int or by type-casting i
-0.218391	short int s; 40 i
-0.307503	2, x = 2.0; i
-0.199858	for (i = StringLength; i
-0.465846	0; i < 20; i
-0.382707	b; // everything is float
-0.656874	when b is a float
-0.501396	signed integer to a float
-0.354847	integer variable by a float
-0.324583	be vectorized, because a float
-0.237797	are used. Conversions of float
-0.420612	conversion of i to float
-0.102683	of overflow Integer to float
-0.102683	further discussion. Integer to float
-0.237707	b+c = 100000001.23456. The float
-0.473280	expressions Induction variables for float
-0.067504	4 4 bytes = float
-0.264766	unlimited 4 bytes = float
-0.844441	F1(a); } else { float
-0.639068	float Exp(float x) { float
-0.070392	Example 14.28 union { float
-0.070392	Example 14.26 union { float
-0.070392	Example 14.27 union { float
-0.070392	Example 14.23 union { float
-0.070392	Example 7.39 union { float
-0.070392	Example 14.29 union { float
-0.070392	Example 14.24 union { float
-0.070392	Example 14.25 union { float
-0.457308	__declspec(align(16)) struct S1 { float
-1.090882	is advantageous to use float
-0.312196	truncation. Efficient conversion from float
-0.404990	cannot avoid conversions from float
-0.293192	(int x) { static float
-0.183807	int b; static const float
-0.183807	{ __declspec(align(16)) static const float
-0.183807	floata; boolb=0; static const float
-0.292738	rolled out by 4 float
-0.334625	64-bit mode 8 8 float
-0.324168	mode SSE 128 bit float
-0.282439	instructions AVX 256 bit float
-0.227546	instruction set (128 bit float
-0.334214	align table by 16 float
-0.603770	64 2 128 SSE2 float
-0.270903	= 100; int i; float
-0.498713	float a[100]; int i; float
-0.270903	= 16; int i; float
-0.270903	= 1000; int i; float
-0.270903	float matrix[rows][columns]; int i; float
-0.270903	Example 7.19 int i; float
-0.235991	vector() {} vector(float a, float
-0.235502	float 128 double 128 float
-0.235404	example, you get four float
-0.334208	Sum of a list float
-0.348470	function add_horizontal) static inline float
-0.234487	int64_t 256 uint64_t 256 float
-0.234034	2-dimensional vector 56 public: float
-0.233867	float xn = x; float
-0.273153	int size = 100; float
-0.190076	int ARRAYSIZE = 100; float
-0.320128	64 4 256 AVX2 float
-0.233529	in example 8.15a were float
-0.233111	Example 7.11 bool a; float
-0.070190	reciprocal_divisor; 14.7 Don't mix float
-0.070190	139 14.7 Don't mix float
-0.306663	for example, to convert float
-0.230040	Example 12.9a. Taylor series float
-0.282675	variables, integer Register variables, float
-0.186341	by 4 float a[100]; float
-0.186341	a list float a[100]; float
-0.559739	64 8 512 AVX512 float
-0.568508	int size = 1000; float
-0.212062	Example: // Example 14.6 float
-0.212062	Example: // Example 7.1 float
-0.264911	10, columns = 8; float
-0.211924	x + 1.0f;} 66 float
-0.568443	matrix[rows][columns]; int i, j; float
-0.212062	address: // Example 7.27 float
-0.212062	Example: // Example 7.24 float
-0.212062	memset: // Example 7.16 float
-0.330325	integer Common subexpression elimin., float
-0.250932	* x; // x^2 float
-0.250932	{ 89 int a[1000]; float
-0.250932	float sum = 1.f; float
-0.199662	capability: // Example 11.1a float
-0.199662	103 // Example 11.1b float
-0.199662	* 1.2; // Mixing float
-0.164828	calculations: // Example 8.3a float
-0.164828	example: // Example 7.29a float
-0.164828	when the code mixes float
-0.164828	return a * a;} float
-0.164828	array of n floats: float
-0.164828	square: // Example 8.1b float
-0.164828	Example: // Example 8.1a float
-0.164828	Example: // Example 8.16 float
-0.164828	numbers: // Example 8.18 float
-0.164828	20, columns = 32; float
-0.164828	Example: // Example 14.18a float
-0.164828	precision: // Example 14.18b float
-0.164828	1./6.22702E9, 1./8.71782E10, 1./1.30767E12, 1./2.09227E13}; float
-0.164828	float a[size], b[size], c[size]; float
-0.164828	variable: // Example 7.26b float
-0.164828	Example: // Example 7.26a float
-0.164828	20, columns = 50; float
-0.164828	transferred in registers (8 float
-0.164828	example: // Example 14.2a float
-0.164828	lookup: // Example 14.2b float
-0.382570	order to utilize the multiple
-0.237717	and then merge the multiple
-0.237717	is to combine the multiple
-0.340723	in memory is a multiple
-0.340723	each array is a multiple
-0.479743	the matrix is a multiple
-0.063445	critical stride is a multiple
-0.063342	be spaced by a multiple
-0.236424	the array size a multiple
-0.236424	addresses are spaced a multiple
-0.314433	that is ported to multiple
-0.237656	Example 7.38b. Alternative to multiple
-0.896054	of the code in multiple
-0.565831	piece of code in multiple
-0.055302	Making critical code in multiple
-0.312756	the entire program in multiple
-1.175691	that are used in multiple
-0.236250	function is compiled in multiple
-0.237772	Math Kernel Library. The multiple
-0.601190	can be used for multiple
-0.293218	the same array for multiple
-0.344474	purpose: Contain one or multiple
-0.236401	supporting multiple platforms or multiple
-0.236401	is called once or multiple
-0.550037	minimized. For example, if multiple
-0.324581	not thread safe if multiple
-0.627542	that is used by multiple
-0.236944	Automatic paralleli- zation by multiple
-0.436357	or a CPU with multiple
-0.415694	in a loop with multiple
-0.320820	fully utilize systems with multiple
-0.232928	blocks. A method with multiple
-0.232928	thing. An expression with multiple
-0.308798	in a computer with multiple
-0.232928	heavy competition. Processors with multiple
-0.538393	operation is performed on multiple
-0.355972	special cases such as multiple
-0.058222	the functions that have multiple
-0.355300	The way to use multiple
-0.328845	a loop and use multiple
-0.234838	the future. To use multiple
-0.237453	for details. Inheritance from multiple
-0.237453	is that CParent::Hello() has multiple
-0.355767	CPU dispatching to make multiple
-0.415309	better, you may make multiple
-0.505468	-axAVX. This will make multiple
-0.293472	then you can set multiple
-0.459913	limited is to do multiple
-0.213133	and divide it into multiple
-0.266120	up a function into multiple
-0.213133	parallelization of code into multiple
-0.391567	up the data into multiple
-0.213133	be split up into multiple
-0.034496	divide the work into multiple
-0.266120	split the tasks into multiple
-0.213133	divide the job into multiple
-0.551428	can be divided into multiple
-0.034608	that is shared between multiple
-0.034608	address and shared between multiple
-0.034608	can be shared between multiple
-0.034608	that are shared between multiple
-0.034608	is not shared between multiple
-0.209695	threads that jump between multiple
-0.209695	code is distributed between multiple
-0.209695	divide the workload between multiple
-0.236448	clear or mask out multiple
-0.334169	a feature for making multiple
-0.236063	separated by semicolons, while multiple
-0.799491	in order to avoid multiple
-0.395204	classes. You may avoid multiple
-0.303618	it may go through multiple
-0.228575	or line separately through multiple
-0.278870	the function is doing multiple
-0.207661	register renaming and doing multiple
-0.529140	the CPU from doing multiple
-0.207661	help the CPU doing multiple
-0.234747	by assignment. shared_ptr allows multiple
-0.234663	things in parallel: Using multiple
-0.234537	Intel's term for running multiple
-0.233666	compiler can automatically generate multiple
-0.233095	development environment (IDE) supports multiple
-0.034387	bitwise operators for checking multiple
-0.287144	also useful for testing multiple
-0.231690	MemberPointer is declared. Avoid multiple
-0.231754	multiple CPU cores: Define multiple
-0.286686	the possibility of compiling multiple
-0.231193	regular access patterns containing multiple
-0.333756	alternative is to keep multiple
-0.176311	// Example 14.7b. Testing multiple
-0.176311	// Example 14.7a. Testing multiple
-0.212073	cores, vector processing instructions, multiple
-0.251096	are useful for supporting multiple
-0.164962	time-consuming data processing. Running multiple
-0.164962	optimization or for combining multiple
-0.164962	files on access. Run multiple
-0.164962	and you can toggle multiple
-0.457359	advance which of the two
-0.687355	the order of the two
-0.355994	opposite order of the two
-0.353987	the results of the two
-0.356825	exactly identical for the two
-0.656048	will recognize that the two
-0.355351	32-bit software because the two
-0.614717	the difference between the two
-0.307313	the transitions between the two
-0.236776	able to mix the two
-0.381257	advantageous to keep the two
-0.292928	+ d); Now the two
-0.236776	compiler may interleave the two
-0.236776	dependency chains, namely the two
-0.782083	a value that is two
-0.973904	by the use of two
-0.634280	128 bit vector of two
-0.525108	compare the performance of two
-0.331100	calculations on vectors of two
-0.661843	can be used in two
-0.329894	is also used in two
-0.323672	make memory-hungry software in two
-0.236521	works by compiling in two
-0.604796	Vector classes defined in two
-0.236521	and integer representations in two
-0.352888	+= 1.0f; } The two
-0.237417	be broken up. The two
-0.538663	standard C, specifying that two
-0.539291	faster. There may be two
-0.338245	some cases, there are two
-0.338245	units. Typically, there are two
-0.309850	mathematical code. There are two
-0.309850	same address. There are two
-0.309850	multiple threads. There are two
-0.309850	and maintenance There are two
-0.309850	following way: There are two
-0.293297	and 22 one or two
-0.293297	integer units, one or two
-0.235786	32 bits each, or two
-0.235786	compiling for AVX2, or two
-0.525593	out the loop by two
-0.313694	Unrolling the loop by two
-0.236213	and you unroll by two
-0.236879	by a table with two
-0.293046	to be calculated with two
-0.236763	a 256-bit vector as two
-0.292914	in fact represented as two
-0.351774	can go more than two
-0.355696	64-bit integer rather than two
-0.341022	not good to have two
-0.451332	Assume that you have two
-0.326804	Most modern CPUs have two
-0.233818	the x86 family have two
-0.765758	if a program has two
-0.235732	in example 8.23b has two
-0.655135	is common to make two
-0.344840	This tool can make two
-0.291342	the nearest integer. If two
-0.235382	etc. is considerable. If two
-0.542913	not possible to do two
-0.631434	it takes to do two
-0.234474	256-bit read operations into two
-0.234474	operation was split into two
-0.351725	to give the variable two
-0.512895	Calculating the difference between two
-0.221129	branches that select between two
-0.014686	branch that chooses between two
-0.061982	a program chooses between two
-0.330827	to go one way two
-0.426066	need for the first two
-0.329136	so that the first two
-0.338519	32-bit mode. The first two
-0.304394	values. Which of these two
-0.304394	OR combination of these two
-0.274653	another function and these two
-0.220677	sure to distinguish these two
-0.340809	is capable of making two
-0.228456	+ 2; } These two
-0.228456	C++". Addison-Wesley, 1996. These two
-0.300518	though it is doing two
-0.300518	are used for doing two
-0.261864	are able to run two
-0.261864	then try to run two
-0.335525	of ebx. The next two
-0.022311	vector of (2,2,2,2,2,2,2,2) __m128i two
-0.378151	system to avoid running two
-0.234421	selected instruction set. Make two
-0.216355	be done with just two
-0.216355	sufficient to have just two
-0.233468	them. This would require two
-0.232169	atomic. It doesn't prevent two
-0.068759	a macro to swap two
-0.068759	Define macro to swap two
-0.195865	of x for approximately two
-0.195865	can be accessed approximately two
-0.226583	three times. Then again two
-0.296003	allows us to compare two
-0.212091	only an addition. Comparing two
-0.199825	the error condition. Replacing two
-0.164978	point registers and correspondingly two
-0.164978	memory space by allowing two
-0.354592	the class of the object
-1.332042	the size of the object
-1.207825	the beginning of the object
-0.564233	or pointer to the object
-0.458885	"C" declaration and the object
-0.356972	the name in the object
-0.355890	the destructor for the object
-0.352128	memory block that the object
-0.352128	special feature that the object
-0.064506	not needed if the object
-0.450818	will fail if the object
-0.355420	fully compatible on the object
-0.450085	be called when the object
-0.348244	memory released when the object
-0.353888	other compilers at the object
-0.354486	the object. If the object
-0.351890	is determined where the object
-0.235680	no virtual member the object
-0.235680	a single register the object
-1.124295	to make sure the object
-0.344833	at compile-time whether the object
-0.235680	and no destructor the object
-0.235680	container may move the object
-0.235680	creating and deleting the object
-0.235680	destructor by constructing the object
-0.235680	conditions are met: the object
-0.293101	know what class of object
-0.354856	a release version of object
-0.520448	of the type of object
-0.635852	of the advantages of object
-0.048235	the negative effects of object
-0.048235	The negative effects of object
-0.313756	structures and classes. The object
-0.237088	object to another. The object
-0.237088	137 about division). The object
-0.329831	allocation Any array or object
-0.056669	Accessing a variable or object
-0.056669	treat a variable or object
-0.121858	that no variable or object
-0.314455	function libraries distributed as object
-0.335267	smart pointer is an object
-0.336298	the type of an object
-0.120642	always points to an object
-0.120642	actually points to an object
-0.057026	stored together in an object
-0.273349	program dictates that an object
-0.308497	is called on an object
-0.333853	same way as an object
-0.219525	called every time an object
-0.223154	You may use an object
-0.223154	option then use an object
-0.292882	reference to such an object
-0.357381	is to access an object
-0.096037	cost to accessing an object
-0.096037	or when accessing an object
-0.273349	be called whenever an object
-0.219525	smart pointer. Accessing an object
-0.219525	the function construct an object
-0.517650	address of the data object
-0.337352	(4) access the data object
-0.237422	well. Supports three different object
-0.833031	point to the same object
-0.331181	memory block from one object
-0.650591	make sure that no object
-0.487593	The size of each object
-0.314379	and destructors of each object
-0.230033	than to store each object
-0.230033	rather than moving each object
-0.335282	reference to a static object
-0.353661	before adding the first object
-0.327652	memory for a new object
-0.461840	Each time a new object
-0.221962	methods mentioned above. An object
-0.221962	class is declared. An object
-0.221962	implementations. 7.22 Inheritance An object
-0.143094	files into a single object
-0.857291	a class or structure object
-0.244590	we compile the shared object
-0.078421	part of a shared object
-0.127540	function in a shared object
-0.060220	not in a shared object
-0.060220	variable in a shared object
-0.078421	to compile a shared object
-0.068474	static libraries. A shared object
-0.068474	explained above. A shared object
-0.195209	in a 64-bit shared object
-0.149802	a very large shared object
-0.235265	d; // makes intermediate object
-0.234090	the following reasons: Each object
-0.156695	called and the local object
-0.156695	to make the local object
-0.230669	the main reasons why object
-0.313286	creation of a temporary object
-0.355941	Nowadays, programming textbooks recommend object
-0.218476	pointer to a contained object
-0.291588	sure that the original object
-0.148652	new one. The existing object
-0.193933	function modify an existing object
-0.199897	the commercial compilers. Mixing object
-0.165045	instead of the usual object
-0.345425	where n is the number
-0.345425	where r is the number
-0.523786	n bits of the number
-0.904556	is equal to the number
-0.342544	physical processors and the number
-0.342544	preceding branches and the number
-0.342544	program flow and the number
-0.351275	can calculate that the number
-0.351275	be expected that the number
-0.341025	an array or the number
-1.001561	is faster if the number
-0.631198	For example, if the number
-0.344355	more complicated if the number
-0.344355	is minimized if the number
-0.346556	period and by the number
-1.044949	is divisible by the number
-0.563987	gain depends on the number
-0.488987	events, such as the number
-0.442147	no more than the number
-0.442147	same priority than the number
-0.449263	is useful when the number
-0.347593	is negligible when the number
-0.577052	advantageous to make the number
-0.402052	simplest code. If the number
-0.309821	be used. If the number
-0.402052	data cache. If the number
-0.309821	a problem. If the number
-0.309821	See www.agner.org/optimize/cppexamples.zip. If the number
-0.309821	compile time? If the number
-0.290509	This would double the number
-0.315698	instruction set where the number
-0.503631	most cases where the number
-0.409332	of situations where the number
-0.345112	for distinguishing between the number
-0.502922	advice of making the number
-0.332879	be critical. Therefore, the number
-0.321387	possible to reduce the number
-0.310848	ways of reducing the number
-0.378293	counter that measures the number
-0.429481	The present manual is number
-0.503039	is used in a number
-0.206589	check. There are a number
-0.206589	-abs(x);. There are a number
-0.451535	are available from a number
-0.237160	If you want a number
-0.338942	the application program. The number
-0.734913	set is available. The number
-0.291633	in the system. The number
-0.235637	32 bit systems: The number
-0.379669	member by 8. The number
-0.235637	targets is small. The number
-0.235637	not overlap. 27 The number
-0.023422	SIZE = 512; // number
-0.236594	SIZE = 64; // number
-0.344204	add a to this number
-0.340043	and always use this number
-0.227065	of a floating point number
-0.288822	to a floating point number
-0.288822	if a floating point number
-0.288822	rounds a floating point number
-0.426285	a nonzero floating point number
-0.333139	all belong to set number
-0.234898	cache lines in set number
-0.510719	collection of a variable number
-0.330602	compile time. A variable number
-0.419806	expressed as a 32-bit number
-0.495603	with a very large number
-0.228750	the address of element number
-0.228750	When we reach element number
-0.312227	doesn't have the line number
-0.235035	is finished. The optimal number
-0.234667	given a false model number
-0.429773	model with a higher number
-0.094379	as a large positive number
-0.094379	a very large positive number
-0.291973	has only a limited number
-0.263051	Register storage A limited number
-0.133980	page 27). The maximum number
-0.133980	the weekdays. The maximum number
-0.133980	systems. 67 The maximum number
-0.231787	may have a reduced number
-0.202773	dynamically when the total number
-0.089658	complicated. If the total number
-0.089658	stored? If the total number
-0.228036	Intel CPUs have family number
-0.228011	is useful for random number
-0.302263	to get a realistic number
-0.034935	software into an excessive number
-0.034935	you avoid an excessive number
-0.034935	19 Avoid an excessive number
-0.222218	divisible by the 107 number
-0.074737	used for an increasing number
-0.074737	are seeing an increasing number
-0.218512	done with an extended number
-0.218414	is a valid 63 number
-0.212148	11.2b was an odd number
-0.102822	Number 18 will evict number
-0.102822	Number 17 will evict number
-0.165029	of simultaneous lookups Max. number
-0.165029	to get an integral number
-0.348389	member variable with the static
-0.348389	variables declared with the static
-1.128792	recommended to use the static
-0.356090	member functions because the static
-0.345962	loop and without the static
-0.331312	You may add the static
-0.875433	or reference to a static
-0.527999	library than in a static
-0.452553	received data in a static
-0.815818	are stored in a static
-0.350961	runtime DLL or a static
-0.351946	memory resources than a static
-0.557725	program. The advantage of static
-0.237269	*.so). The mechanism of static
-0.293490	make thread-local storage of static
-0.457966	mimic the behavior of static
-0.294141	less efficient. Access to static
-0.237230	for all public and static
-0.293445	26. Avoid global and static
-0.237230	#define ABC 123 and static
-0.407058	store the table in static
-0.165599	will be stored in static
-0.525295	constants are stored in static
-0.292933	code. Storing something in static
-0.346437	an inlined function. The static
-0.312835	of the memory. The static
-0.323423	the same class. The static
-0.312835	any other module. The static
-0.236316	virtual function tables. The static
-0.538640	it is clear that static
-0.237014	critical functions inline or static
-0.237014	function returns. Global or static
-0.458443	used, but not if static
-0.348872	threads should rely on static
-0.235872	150 you want as static
-0.036950	be implemented either as static
-0.236524	cached more efficiently than static
-0.334838	link libraries slower than static
-0.456335	SomeFunction (int x) { static
-0.329962	void Func () { static
-0.554529	lookup is to use static
-0.236176	function should never use static
-0.237521	single executable file when static
-0.108307	or member functions. A static
-0.108307	non-static member functions. A static
-0.234503	on the stack. A static
-0.306576	Copying constant data from static
-0.231062	copy the table from static
-0.231062	the entire list from static
-0.231062	library functions linked from static
-0.231062	14.1c is copied from static
-0.237457	cause caching problems because static
-0.311876	and declare all functions static
-0.450621	to make member functions static
-0.348970	is used for all static
-0.985450	The advantage of using static
-0.985450	The advantages of using static
-0.812191	be improved by using static
-0.381670	make the local object static
-0.233253	module static static static static
-0.168642	same module static static static
-0.230909	from same module static static
-0.002646	integer vector from array static
-0.002646	integer vector into array static
-0.237008	with member functions, where static
-0.292098	compilers will align large static
-0.750051	float a; int b; static
-0.570472	that do not support static
-0.324744	are available in both static
-0.224312	may work with both static
-0.241064	then add the keyword static
-0.241064	1. Add the keyword static
-0.234018	operating system, this requires static
-0.251127	class powN { public: static
-0.251127	class powN<true,0> { public: static
-0.251127	class powN<true,N> { public: static
-0.251127	class powN<true,1> { public: static
-0.231693	faster by making them static
-0.316498	"override" feature. This includes static
-0.577113	only from same module static
-0.320837	Example 14.3b int n; static
-0.276419	is recommended to specify static
-0.218315	Exp(float x) { __declspec(align(16)) static
-0.355921	N template <int N> static
-0.251071	x64 141 #include <emmintrin.h> static
-0.199785	compilers. // Example 14.19 static
-0.199785	compile-time if statements (called static
-0.465712	// Table of factorials: static
-0.199785	memory. If the word static
-0.164942	Example 7.29b floata; boolb=0; static
-0.164942	template template <typename T> static
-0.164942	loading a cache line: static
-0.164942	x) { return _mm_cvtss_si32(_mm_load_ss(&x));} static
-0.164942	than the function add_horizontal) static
-0.358315	address space of the 64-bit
-0.358159	been alleviated in the 64-bit
-0.357530	quite certain that the 64-bit
-0.355783	you do use the 64-bit
-0.827207	64-bit mode because the 64-bit
-0.341170	execute _mm_empty() after the 64-bit
-0.338890	and VIA including the 64-bit
-0.202027	32 bits of a 64-bit
-0.351190	existing systems and a 64-bit
-0.525012	public variables in a 64-bit
-0.547363	systems by using a 64-bit
-0.293162	if you write a 64-bit
-0.522932	can take advantage of 64-bit
-0.509556	Windows. The disadvantage of 64-bit
-0.237301	no heavy marketing of 64-bit
-0.237872	sake of portability to 64-bit
-0.054630	Windows and 32-bit and 64-bit
-0.026457	capabilities for 32-bit and 64-bit
-0.026457	executables for 32-bit and 64-bit
-0.054630	performance between 32-bit and 64-bit
-0.054630	libraries support 32-bit and 64-bit
-0.054630	(2013) both 32-bit and 64-bit
-0.054630	It supports 32-bit and 64-bit
-0.026457	not. Supports 32-bit and 64-bit
-0.026457	code). Supports 32-bit and 64-bit
-0.054630	platforms, including 32-bit and 64-bit
-0.054630	both 16-bit, 32-bit and 64-bit
-0.329098	of 32-bit integers and 64-bit
-0.015350	compiler for 32- and 64-bit
-0.064953	libraries. Supports 32- and 64-bit
-0.023126	64-bit Linux than in 64-bit
-0.490165	cannot be used in 64-bit
-0.096734	is more efficient in 64-bit
-0.096734	and more efficient in 64-bit
-0.096734	calling more efficient in 64-bit
-0.306439	run slightly faster in 64-bit
-0.168664	but in registers in 64-bit
-0.168664	fourteen integer registers in 64-bit
-0.333174	and 64 bits in 64-bit
-0.683722	registers are available in 64-bit
-0.316870	and 8 bytes in 64-bit
-0.333174	disappears when running in 64-bit
-0.047489	is not needed in 64-bit
-0.230947	can be avoided in 64-bit
-0.230947	registers by default in 64-bit
-0.230947	implemented as follows in 64-bit
-0.230947	systems and fourteen in 64-bit
-0.230947	are available, i.e. in 64-bit
-0.022988	systems and sixteen in 64-bit
-0.230947	is always enabled in 64-bit
-0.230947	is much simpler in 64-bit
-0.230947	by default anyway in 64-bit
-0.230947	is not recognized in 64-bit
-0.237442	the future. 6 The 64-bit
-0.293687	compiler itself is. The 64-bit
-0.343800	library. Only available for 64-bit
-0.440812	in programs compiled for 64-bit
-0.347740	have inherent support for 64-bit
-0.381139	in device drivers for 64-bit
-0.236447	running in 32-bit or 64-bit
-0.236447	in 16-bit systems or 64-bit
-0.236447	bigger segments (32-bit or 64-bit
-1.312130	is more efficient than 64-bit
-0.459047	is necessary to use 64-bit
-0.349606	systems we can use 64-bit
-0.479703	systems, you may use 64-bit
-0.237487	has been increased from 64-bit
-0.237421	had in fact only 64-bit
-0.449260	is supported on all 64-bit
-0.293278	was split into two 64-bit
-0.220755	will be 2 In 64-bit
-0.359071	are less efficient. In 64-bit
-0.220755	in 64-bit Windows. In 64-bit
-0.220755	fourteen register parameters. In 64-bit
-0.220755	one clock cycle. In 64-bit
-0.220755	integer, usually 32. In 64-bit
-0.330740	or unsigned 4 4 64-bit
-0.236319	precautions for speeding up 64-bit
-0.236229	32 bit mode. Some 64-bit
-0.228922	or library files. Use 64-bit
-0.228922	as floating point. Use 64-bit
-0.235001	floating point numbers. Therefore, 64-bit
-0.434233	latter is more efficient. 64-bit
-0.287220	be transferred in registers. 64-bit
-0.231225	this will use full 64-bit
-0.372866	general, you can expect 64-bit
-0.610491	used for internal references. 64-bit
-0.315675	allow you to define 64-bit
-0.456136	4 pointer or reference, 64-bit
-0.222218	and edx, respectively. (In 64-bit
-0.218414	-263 263-1 int64_t 29 64-bit
-0.199881	MS compiler: unsigned __int64 64-bit
-0.165029	transferred in registers, whereas 64-bit
-0.165029	calling conventions are different. 64-bit
-0.292607	a natural order and there
-0.236494	inputs is limited and there
-0.236494	size doesn't matter and there
-0.236494	systems are common, and there
-0.236494	different memory areas, and there
-0.236494	sections are dominating and there
-0.326230	1000 times and that there
-0.354146	64-bit operations so that there
-0.470801	such a way that there
-0.405167	you can assume that there
-0.405167	You can assume that there
-0.309305	has the feature that there
-0.342908	and 1. Note that there
-0.023174	should be aware that there
-0.289036	programmers have discovered that there
-0.233354	and more complex, that there
-0.233354	when you discover that there
-0.339379	"how many elements are there
-0.342478	ruled out or if there
-0.224408	in the program if there
-0.492490	only be used if there
-0.224408	the code cache if there
-0.298670	a signed integer if there
-0.035728	of function pointers if there
-0.675599	a separate thread if there
-0.224408	than 32-bit programs if there
-0.224408	code becomes bigger if there
-0.224408	code becomes smaller if there
-0.202702	not be safe if there
-0.202702	is exception safe if there
-0.224408	from the exponent if there
-0.317105	time consuming, especially if there
-0.308913	we will consider if there
-0.224408	is taken, i.e. if there
-0.224408	each object separately if there
-0.224408	use multiple accumulators if there
-0.224408	many function calls, if there
-0.237698	matter of convenience - there
-0.380969	allocate more RAM than there
-0.236570	more memory blocks than there
-0.418684	becomes more efficient when there
-0.236152	code is critical when there
-0.164597	the CPU time then there
-0.164597	at compile time then there
-0.223929	around in memory then there
-0.308331	virtual member functions then there
-0.223929	unroll by two then there
-0.223929	for the performance then there
-0.223929	of an error then there
-0.223929	abort(), _endthread(), etc. then there
-0.363441	expression is used, then there
-0.223929	bottleneck is elsewhere then there
-0.233812	compilers behave differently because there
-0.233812	handling is negligible because there
-0.289557	are particularly problematic because there
-0.229380	in 32-bit mode. If there
-0.099719	again and again. If there
-0.099719	and back again. If there
-0.229380	program is running. If there
-0.229380	time to calculate. If there
-0.286520	removing superfluous code, but there
-0.231140	or double precision, but there
-0.231140	to small devices, but there
-0.231140	large data bases, but there
-0.233661	use vector operations where there
-0.376918	few cases, however, where there
-0.236966	like a parameter, so there
-0.503150	speed. In this case there
-0.275713	the member function. But there
-0.221613	a smart pointer. But there
-0.221613	and unsigned integers. But there
-0.235577	final program and whether there
-0.193764	reproducible as possible. However, there
-0.193764	are most critical. However, there
-0.193764	instruction set. 120 However, there
-0.193764	mechanism works automatically. However, there
-0.193764	what they are. However, there
-0.213196	off by default unless there
-0.213196	a large object, unless there
-0.213196	a loop manually unless there
-0.475062	integers In most cases, there
-0.299822	optimizations in some cases, there
-0.258978	function In some cases, there
-0.258978	API. In some cases, there
-0.290584	values are simply put there
-0.233722	}; S1 ArrayOfStructures[100]; Here, there
-0.863952	is the reason why there
-0.229946	compile time. (Of course there
-0.371788	point registers are used, there
-0.329902	// At the diagonal there
-0.369080	cycle. In 64-bit systems, there
-0.306869	In many cases, however, there
-0.325344	compiled code. In general, there
-0.222218	but not all. Fortunately, there
-0.306254	pure functions, but unfortunately there
-0.165098	several execution units. Typically, there
-0.165098	have memory caches. Typically, there
-0.500724	instruction set is enabled there
-0.251178	linking cannot be avoided, there
-0.585655	command-line version of the C++
-0.065208	the drawbacks of the C++
-0.354259	good knowledge of the C++
-0.357189	the standards for the C++
-0.356239	serious problem with the C++
-0.335386	other languages. But the C++
-0.293414	actually hidden behind the C++
-0.237203	so complicated? Because the C++
-0.462151	are declared in a C++
-0.237493	different processors. In a C++
-0.341683	and operators. Make a C++
-0.354737	a future version of C++
-0.037019	advices on optimization of C++
-0.037019	book on optimization of C++
-0.037019	Advices on optimization of C++
-0.447227	An important disadvantage of C++
-0.635259	of the advantages of C++
-0.471655	seven different brands of C++
-0.236522	time and maintainability of C++
-0.487055	library for Windows and C++
-0.237509	arrays in C and C++
-0.337965	96). Virtual functions in C++
-0.591403	1. Optimizing software in C++
-0.102382	cause of errors in C++
-0.102382	source of errors in C++
-0.236251	specifying parallel processing in C++
-0.292330	is not allowed in C++
-0.236251	very old-fashioned. Development in C++
-0.323438	database connections, etc. The C++
-0.236329	7.11 Type conversions The C++
-0.292419	optimization is needed. The C++
-0.312850	how compilers work. The C++
-0.292419	other allocated resource. The C++
-0.314576	My preference is for C++
-1.136760	in the sense that C++
-0.382592	Constructor-style type casting // C++
-0.122079	together with C or C++
-0.122079	a separate C or C++
-0.122079	choose either C or C++
-0.102469	TR18015 Technical Report on C++
-0.102469	18015, "Technical Report on C++
-0.458536	compiled languages such as C++
-0.236758	as well developed as C++
-0.293788	overcome these disadvantages when C++
-0.237442	0 n! 117 A C++
-0.036128	The efficiency of different C++
-0.075537	relative efficiency of different C++
-0.318886	of optimizations in different C++
-0.005943	"Calling conventions for different C++
-0.030562	Calling conventions for different C++
-0.736397	There are several different C++
-0.449191	that work on all C++
-0.237244	of code. Furthermore, most C++
-0.319985	come with the Intel C++
-0.319985	__declspec(cpu_dispatch(...)). See the Intel C++
-0.305596	optimization features of Intel C++
-0.275784	Documentation". Included with Intel C++
-0.221676	and PathScale compilers. Intel C++
-0.221676	Builder 5, 2009). Intel C++
-0.221676	Asmlib: v. 2.00. Intel C++
-0.237213	wrapping the vectors into C++
-0.237069	you need it. In C++
-0.339062	math libraries. The Gnu C++
-0.229689	for IA-32/Intel64, 2009. Gnu C++
-0.405407	be implemented in compiled C++
-0.235584	program that produces another C++
-0.235199	Compiler optimization options All C++
-0.235122	all major platforms. However, C++
-0.222869	pointers less efficient. Most C++
-0.222869	to low-level optimizations. Most C++
-0.221622	from Intel and Microsoft C++
-0.221622	versions were tested: Microsoft C++
-0.233340	many tips on advanced C++
-0.232866	comes with most modern C++
-0.317176	and optimized function libraries. C++
-0.203241	for all platforms. PathScale C++
-0.203241	2006 (Red Hat). PathScale C++
-0.317800	may need assembly language. C++
-0.251792	Comes with the Borland C++
-0.184571	(Visual Studio 2005). Borland C++
-0.218344	C++ for several reasons. C++
-0.218344	the C++ language While C++
-0.148596	cannot be tolerated. PGI C++
-0.148596	v. 3.1, 2007. PGI C++
-0.218410	be written in C, C++
-0.212079	variables in a well-structured C++
-0.199813	below, on page 15. C++
-0.199813	pointer known in 36 C++
-0.199813	relates to security. Standard C++
-0.199813	of optimization. 14 Portability C++
-0.164967	the best optimizer. Borland/CodeGear/Embarcadero C++
-0.164967	produced regularly. Intel: "Intel® C++
-0.164967	Borland C++ 5.82 (Embarcadero/CodeGear/Borland C++
-0.527138	data", where it is also
-0.520481	most cases it is also
-0.349623	the performance, it is also
-0.997669	that the function is also
-0.446854	microprocessors. The function is also
-0.355154	vector simultaneously. This is also
-0.512184	the code. It is also
-0.348621	full optimization. It is also
-0.348621	code automatically. It is also
-0.320791	The static memory is also
-0.415659	in member functions is also
-0.337072	platforms. However, C++ is also
-0.531409	to do so is also
-0.101487	that is allocated is also
-0.101487	has been allocated is also
-0.289955	assembly output option is also
-0.424862	the present manual is also
-0.336079	The & operator is also
-0.170518	route. This mechanism is also
-0.170518	stack unwinding mechanism is also
-0.234162	branch target buffer is also
-0.234162	as versatile. Fortran is also
-0.234162	for-loop or while-loop is also
-0.237800	in C++ programs and also
-0.325257	inside another loop that also
-0.412945	of the program are also
-0.023098	near each other are also
-0.308565	point vectors. There are also
-0.308565	by 8. There are also
-0.308565	to post-increment. There are also
-0.308565	it uses. There are also
-0.308565	floating point). There are also
-0.444714	all allocated objects are also
-0.343407	special purpose libraries are also
-0.318609	necessary library files are also
-0.527870	are used together are also
-0.232373	for special purposes are also
-0.492019	threads, but it can also
-0.533716	84). The compiler can also
-0.350860	blog. Here, you can also
-0.244185	for Linux. It can also
-0.244185	point numbers. It can also
-0.244185	and list[i].b. It can also
-0.309020	of induction variables can also
-0.401062	b; A branch can also
-0.286025	these operating systems can also
-0.230704	allocation. Container classes can also
-0.230704	A template parameter can also
-0.397517	example. A union can also
-0.524580	simple periodic pattern can also
-0.230704	keyword far (arrays can also
-0.237765	waste of time, it also
-0.237737	If the latter function also
-0.236736	library if possible. This also
-0.236736	(see page 137). This also
-0.566267	efficiency, then you may also
-0.334816	2 Mbytes. There may also
-0.327400	the future we may also
-0.234299	A hash map may also
-0.237513	a = OneOrTwo5[b!=0]; will also
-0.313241	optimized mathematical functions. It also
-0.235997	to see this. It also
-0.222947	program under test but also
-0.307139	the programmers' time, but also
-0.222947	with many features, but also
-0.222947	other programming languages, but also
-0.222947	or hot spot but also
-0.222947	called from main, but also
-0.222947	simple type casting, but also
-0.222947	the Mac platform, but also
-0.452428	in a function should also
-0.232821	audio or video should also
-0.232821	proceed unattended. Uninstallation should also
-0.461130	The AVX2 instruction set also
-0.331270	a constant. The compilers also
-0.207527	intended for. Some systems also
-0.207527	accelerator card. Some systems also
-0.349710	and some of these also
-0.350198	memory allocation. This method also
-0.330204	cache. The register stack also
-0.470893	But the C++ language also
-0.235827	said here about Linux also
-0.235498	here about increment operators also
-0.554067	inefficient. Dynamic memory allocation also
-0.290523	of CPU. These methods also
-0.417369	function. The static keyword also
-0.234201	function inlining. Reducible expressions also
-0.470511	generality of the STL also
-0.231842	available from www.intel.com. (See also
-0.286658	import table and possibly also
-0.489569	animations is of course also
-0.323991	is eliminated. Loop unrolling also
-0.229952	Such a coprocessor might also
-0.229974	functions called by F1 also
-0.224790	bits (YMM), and soon also
-0.013558	or dynamic link libraries, also
-0.165014	interface. It is 102 also
-0.324781	a useful source of such
-0.293669	calls. The consequence of such
-0.237427	pointers. The absence of such
-1.295977	pointer or reference to such
-0.314389	inherent performance costs to such
-0.314005	a viable solution in such
-0.443356	AVX2 is supported in such
-0.237297	is often reorganized in such
-0.293161	are most efficient for such
-0.293161	make explicit checks for such
-0.236980	a compiler warning for such
-0.324512	the theoretical possibility that such
-0.536956	can only hope that such
-0.236951	important to realize that such
-0.539282	or a member function such
-0.236998	to the user if such
-0.236998	lot to gain if such
-0.235114	the resource use on such
-0.235114	many applications even on such
-0.311401	runs quite fast on such
-0.235114	floating point operation on such
-0.513130	containers do not have such
-0.292326	languages that do have such
-0.407516	operations. You should use such
-0.237388	the compiler doesn't make such
-0.224363	you are using functions such
-0.170554	instructions for mathematical functions such
-0.170554	memset, or mathematical functions such
-0.170554	most common mathematical functions such
-0.170554	for computing mathematical functions such
-0.224363	manipulated with C functions such
-0.278829	most common math functions such
-0.224363	dispatching or memory-intensive functions such
-0.237310	-(-a) very often, but such
-0.584751	course there is no such
-0.810744	be possible to do such
-0.527587	compilers will not do such
-0.399759	future compilers will do such
-0.237216	Automatic vectorization Good compilers such
-0.237131	when running. Programs using such
-0.406841	and mathematical calculations. In such
-0.345221	A computer with many such
-0.340294	fast. Simple integer operations such
-0.456688	has a composite type such
-0.292359	there are special cases such
-0.418585	other brands of CPUs such
-0.235057	on the screen. However, such
-0.290779	compilers will automatically replace such
-0.234751	put seldom used branches such
-0.234192	calling from other applications such
-0.289963	efficient for simple types such
-0.273079	result of other optimizations such
-0.219287	necessary to do optimizations such
-0.529359	put a parenthesis around such
-0.331189	do simple algebraic reductions such
-0.213631	useful in compiled languages such
-0.213631	speed. This includes languages such
-0.158704	in order to prevent such
-0.221741	good way to prevent such
-0.158704	extra overhead to prevent such
-0.164961	very important for tasks such
-0.212072	structures for standard tasks such
-0.164961	high priority. Other tasks such
-0.164961	loop for trivial tasks such
-0.231645	takes a long time, such
-0.332394	The main reason why such
-0.228949	of digital building blocks such
-0.228949	increasing number of purposes such
-0.227783	is in mathematical iterations such
-0.305629	appropriate type of vector, such
-0.226404	Systems with segmented memory, such
-0.226518	are also third-party profilers such
-0.226404	libraries are also available, such
-0.222176	and synchronization between threads, such
-0.306011	allocation. Some programming languages, such
-0.218216	for the same resources, such
-0.218314	no risk of overflow, such
-0.765871	The following example illustrates such
-0.218216	often determined by considerations such
-0.211952	unsigned integers in comparisons, such
-0.211952	a hardware definition language, such
-0.347013	rarely enough to justify such
-0.211952	to use string classes, such
-0.199690	STL. Some STL templates, such
-0.199690	for any other resource, such
-0.199690	lot of data shuffling, such
-0.250964	to count certain events, such
-0.199690	have names with suffixes such
-0.250964	verifying, debugging and maintaining such
-0.164854	that are inherently serial, such
-0.164854	that supports automatic vectorization, such
-0.164854	specific advantage to obtain, such
-0.164854	remote or removable media such
-0.164854	mentioned in table 9.2, such
-0.164854	is the feature information, such
-0.164854	and CPU hardware. Porting such
-0.164854	operating system may supply such
-0.357358	local variable. This is efficient
-0.237736	b * 1.5f; is efficient
-0.447803	93 for discussion of efficient
-0.314754	most common obstacles to efficient
-0.294157	is more compact and efficient
-0.500391	resultant code will be efficient
-0.294113	template. 57 Templates are efficient
-0.157992	overloaded operator is as efficient
-0.071851	A destructor is as efficient
-0.071851	virtual destructor is as efficient
-0.375290	Constructors are therefore as efficient
-0.525247	functions as well as efficient
-0.100870	operator is exactly as efficient
-0.100870	Enums are exactly as efficient
-0.399746	programming can be an efficient
-0.246984	array will be an efficient
-0.246984	may also be an efficient
-0.551022	compiler. Some compilers have efficient
-0.148389	zero that is more efficient
-0.124160	that it is more efficient
-0.124160	then it is more efficient
-0.124160	Obviously, it is more efficient
-0.073354	time. It is more efficient
-0.073354	Windows. It is more efficient
-0.073354	integers. It is more efficient
-0.073354	throw. It is more efficient
-0.073354	pooling. It is more efficient
-0.073354	queue. It is more efficient
-0.148389	polymorphism, which is more efficient
-0.148389	template class is more efficient
-0.148389	64-bit Linux is more efficient
-0.148389	Parameter transfer is more efficient
-0.148389	code. #if is more efficient
-0.148389	where pre-increment is more efficient
-0.148389	= *(p++) is more efficient
-0.148389	= array[i++] is more efficient
-0.394169	implemented in a more efficient
-0.226106	32-bit mode, and more efficient
-0.226106	both cheaper and more efficient
-0.143285	it may be more efficient
-0.226230	It may be more efficient
-0.207489	Leaf functions are more efficient
-0.207489	Fortunately, there are more efficient
-0.231813	be replaced by more efficient
-0.525772	make the code more efficient
-0.218030	floating point code more efficient
-0.267995	garbage collection. A more efficient
-0.182631	compiler to make more efficient
-0.307469	It is often more efficient
-0.328765	it is much more efficient
-0.249632	be made much more efficient
-0.296270	makes data caching more efficient
-0.140105	code makes caching more efficient
-0.463184	The code becomes more efficient
-0.182631	make function calling more efficient
-0.182631	This makes inlining more efficient
-0.081785	zero is sometimes more efficient
-0.081785	macros are sometimes more efficient
-0.061037	it is slightly more efficient
-0.061037	latter is slightly more efficient
-0.132078	make Sum1 slightly more efficient
-0.051013	this is the most efficient
-0.051013	array is the most efficient
-0.051013	stack is the most efficient
-0.051013	list is the most efficient
-0.254382	high then the most efficient
-0.254382	than finding the most efficient
-0.254382	always select the most efficient
-0.254382	programmer choosing the most efficient
-0.289537	composite type is most efficient
-0.301661	} } The most efficient
-0.301661	control condition The most efficient
-0.155496	operating systems are most efficient
-0.155496	Switch statements are most efficient
-0.658429	This is a very efficient
-0.574394	can be a very efficient
-0.429018	code will be very efficient
-0.169277	default. This is less efficient
-0.169277	Intel compiler is less efficient
-0.169277	ahead. It is less efficient
-0.169277	multidimensional array is less efficient
-0.169277	linked list is less efficient
-0.169277	point numbers is less efficient
-0.169277	a bitfield is less efficient
-0.181745	jumping around and less efficient
-0.214604	programs can be less efficient
-0.214604	code will be less efficient
-0.183333	member functions are less efficient
-0.183333	Dynamic libraries are less efficient
-0.183333	alternative implementations are less efficient
-0.181745	variables as input less efficient
-0.236707	in a[i]. Note how efficient
-0.236115	sizes of matrices. An efficient
-0.338454	in C++ is quite efficient
-0.290470	bounds checking and various efficient
-0.200006	and references are equally efficient
-0.200006	= 123; are equally efficient
-0.500231	Intel CPU detection function In
-0.353551	to 5 } } In
-0.290229	1; return c; } In
-0.377950	list[i].b = 2.0; } In
-0.313707	when converting to double In
-0.236660	b will be 2 In
-0.307128	discussion of system code. In
-0.307128	not the compiled code. In
-0.236423	changed. 7.8 Member pointers In
-0.236233	programming. 13.3 Difficult cases In
-0.441518	transferred to the function. In
-0.649927	of the critical function. In
-0.220543	language 11 programming, etc. In
-0.220543	variables, loop counters, etc. In
-0.220543	with its limit, etc. In
-0.342986	Signed versus unsigned integers In
-0.235558	b; a += b; In
-0.542065	parts of the program. In
-0.287693	bits are less efficient. In
-0.287693	only slightly less efficient. In
-0.233813	fastest on different processors. In
-0.438253	done outside the loop. In
-0.213447	terms of code size. In
-0.266476	largest available register size. In
-0.232225	for the same variables. In
-0.412924	constant. 14.2 Bounds checking In
-0.231889	most of the resources. In
-0.231501	sure you need it. In
-0.142187	searching, and mathematical calculations. In
-0.142187	loop doing mathematical calculations. In
-0.186081	the heavy graphics calculations. In
-0.342027	to 4 clock cycles. In
-0.323767	execution time. Loop unrolling In
-0.581329	than in 64-bit Windows. In
-0.195744	x<<3, which is faster. In
-0.488557	which is much faster. In
-0.327915	name as template parameter. In
-0.304041	of an array element. In
-0.227750	a 128-bit XMM register. In
-0.227750	are accessed equally fast. In
-0.226308	to fourteen register parameters. In
-0.226450	or modify objects simultaneously. In
-0.226379	propagation and other optimizations. In
-0.317521	to use assembly language. In
-0.184395	and hence higher speed. In
-0.184395	half the single-thread speed. In
-0.467694	is divisible by 16. In
-0.298946	that fits the application. In
-0.276065	threads with low priority. In
-0.222022	can reduce them all. In
-0.468712	or one clock cycle. In
-0.218244	a power of two. In
-0.218122	the "worst case" counts. In
-0.403173	no long dependency chains. In
-0.211860	will give -2.0 55 In
-0.211860	to do it explicitly. In
-0.346887	code is intended for. In
-0.211860	91 step by step. In
-0.211860	worried about this condition. In
-0.211860	that it doesn't occur. In
-0.211860	can be reused elsewhere. In
-0.211860	can become very big. In
-0.250863	one (see page 71). In
-0.199601	hard working software users. In
-0.199601	a function can throw. In
-0.199601	an integer, usually 32. In
-0.199601	a false vendor string. In
-0.199601	your program exception safe. In
-0.199601	not always optimal, though. In
-0.199601	with the same name. In
-0.199601	optimal for each calculation. In
-0.250863	cannot easily be obtained. In
-0.465371	is a clock cycle? In
-0.465371	reasons of mathematical purity. In
-0.199601	that some microprocessors have. In
-0.164771	copy constructor specifying otherwise. In
-0.164771	linking (e.g. option /MT). In
-0.164771	the calculation of B. In
-0.164771	that comes to mind. In
-0.164771	discussed on page 60. In
-0.164771	Dr Dobbs Journal, 2002). In
-0.164771	are of course system-specific. In
-0.164771	suffer from mispredictions. 44 In
-0.164771	1. See page 34. In
-0.164771	to their 32-bit counterparts. In
-0.164771	time intervals are short. In
-0.164771	the original is destroyed. In
-0.164771	set supports self-relative addressing. In
-0.164771	with the same divisor. In
-0.164771	y = MAX(f(x), g(x)); In
-0.164771	for random number generators. In
-0.164771	previous chapter (page 146). In
-0.164771	and then call __intel_cpu_features_init_x(). In
-0.164771	framework in its API. In
-0.164771	or modifies many strings. In
-0.009442	{ a = a *
-0.386291	division c = a *
-0.354655	/ b as a *
-0.507485	a) { return a *
-0.313685	Func1 (int a[], int *
-0.236555	* __restrict aa, int *
-0.401432	float x2 = x *
-0.003778	x) { return x *
-0.011433	m) { return x *
-0.245634	to a = b *
-0.245634	example, a = b *
-0.231351	b; a = b *
-0.245634	7.2 a = b *
-0.245634	140 a = b *
-0.190520	{ a[i] = b *
-0.190520	temp; temp = b *
-0.317288	= 1.0f + b *
-0.020326	+ 2 : b *
-0.248795	the intermediate expression b *
-0.020326	c + 2, b *
-0.197762	c + two, b *
-0.313822	{ a[i] = i *
-0.237039	of n floats: float *
-0.236787	int x = 2 *
-0.220068	n; static char const *
-0.006442	inline __m128i LoadVector(void const *
-0.220068	inline __m128i LoadVectorA(void const *
-0.236662	* sizeof(float)) = 8 *
-0.224430	8.5a void Plus2 (int *
-0.224430	7.12 void FuncA (int *
-0.233573	get the value 10 *
-0.288742	* 100 * 5 *
-0.156660	a+1; b = temp *
-0.156660	b[i]; c[i] = temp *
-0.231605	take 1000 * 100 *
-0.190866	as (int)&matrix[0][0] + j *
-0.190866	compiler can replace j *
-0.311495	Example 14.15b if (a *
-0.222163	b; int c;}; abc *
-0.562367	u, v; if (u.i *
-0.218375	Object1; CChild2 Object2; CChild1 *
-0.218453	Object1; C2 Object2; CHello *
-0.212135	loop will take 1000 *
-0.212135	float x4 = x2 *
-0.212135	{ C1 obj1; C0 *
-0.001697	static inline void StoreVector(void *
-0.212033	//Loopby4 s += xxn *
-0.212033	= &Object1; p1->Hello(); CChild2 *
-0.330468	reciprocal_divisor; y2 = a2 *
-0.330468	b2); y1 = a1 *
-0.199769	appropriate function version CriticalFunctionType *
-0.251052	the selected version FuncType *
-0.199769	+ 2) : (bb[i] *
-0.074681	j by is (columns *
-0.074681	+ j * (columns *
-0.074681	the dispatcher function. typeof(CriticalFunction) *
-0.074681	CriticalFunctionDispatch(void) __asm__ ("CriticalFunction"); typeof(CriticalFunction) *
-0.199769	x) { return Func1(x) *
-0.251052	c; b = (a+1) *
-0.164926	b;} }; int Sum2(S3 *
-0.164926	shift operation. For example,a *
-0.164926	alignment problem void AddTwo(int *
-0.164926	compute (FuncRow(i)*columns + FuncCol(i)) *
-0.164926	+ 2 return (2.5f *
-0.164926	bit: absvalue = a[i].u[1] *
-0.164926	DynamicArray = (float *)alloca(n *
-0.164926	= 1. / (b1 *
-0.164926	& (N-1)) return powN<(N1&(N1-1))==0,N1>::p(x) *
-0.164926	* 2 > v.i *
-0.164926	static inline void StoreNTD(double *
-0.164926	x) { return powN<true,N/2>::p(x) *
-0.164926	static inline void StoreVectorA(void *
-0.164926	asa << 4, anda *
-0.164926	= a1 * b2 *
-0.164926	= a2 * b1 *
-0.164926	a[i] = log (b[i] *
-0.164926	* x - 8.0f) *
-0.603977	2.5 Choice of compiler There
-0.535978	to optimization by compiler There
-0.237437	ab[i].b = Func(ab[i].a); } There
-0.237020	// everything is double There
-0.236403	for vectorizing mathematical code. There
-0.989288	at the same time. There
-0.456667	take no extra time. There
-0.235976	of the memcpy function. There
-0.235616	sets, cache size, etc. There
-0.235505	treated as different functions. There
-0.335827	- min)) { ... There
-0.378505	files. 13.2 Model-specific dispatching There
-0.234466	performance between the systems. There
-0.526921	makes caching less efficient. There
-0.486584	up, as explained below. There
-0.272845	number of logical processors. There
-0.272845	bytes) on future processors. There
-0.233592	Mathematical functions for vectors There
-0.233421	program. 6 Development process There
-0.233450	which alloca was called. There
-0.232855	malloc and free are: There
-0.308332	of the vector size. There
-0.231844	memory or other resources. There
-0.439591	optimize across function calls. There
-0.231460	processors that support it. There
-0.286884	different kind of registers. There
-0.231460	to the same object. There
-0.317496	small gain in performance. There
-0.230386	9.2. Cache control instructions. There
-0.230386	in a different way. There
-0.609656	used for internal references. There
-0.229633	requiring the same address. There
-0.333563	the speed or not. There
-0.855690	to the end user. There
-0.974350	when the function returns. There
-0.748416	use dynamic memory allocation. There
-1.106585	instruction set is enabled. There
-0.228716	and caching becomes inefficient. There
-0.228838	is a try block. There
-0.304003	is accessed much faster. There
-0.482242	as a template parameter. There
-0.283836	rather than Boolean expressions. There
-0.227777	below the maximum value. There
-0.227640	account for unaligned arrays. There
-0.518573	the loop control branch. There
-0.227708	larger floating point vectors. There
-0.226261	more than four parameters. There
-0.226572	ignoring the higher bits. There
-0.226416	This advantage comes automatically. There
-0.226339	its own CPU core. There
-0.710004	work into multiple threads. There
-0.512669	3.16 Execution unit throughput There
-0.221984	and very time- consuming. There
-0.221876	in the present manual. There
-0.305840	have no out-of-order execution. There
-0.568359	address divisible by 8. There
-0.500586	shared objects (*.dll, *.so). There
-0.218076	and without AVX support. There
-0.271709	instructions are not optimal. There
-0.551268	13.4 Test and maintenance There
-0.211814	vectorize the code explicitly. There
-0.264631	(e.g. GetProcessAffinityMask in Windows). There
-0.264631	VTune and AMD CodeAnalyst. There
-0.640909	in the following way: There
-0.346824	C++ relates to security. There
-0.211814	registers is very limited. There
-0.211814	to set number 0x1C. There
-0.211814	precision math. Memory copying. There
-0.211814	change pre-increment to post-increment. There
-0.250813	to refresh the screen. There
-0.199556	of the application programmer. There
-0.199556	away an overflow check. There
-0.465289	manual 4: "Instruction tables". There
-0.199556	or structure is created. There
-0.199556	prediction (see p. 43). There
-0.199556	Microsoft, Intel and Gnu. There
-0.250813	of vectors. 12.10 Conclusion There
-0.199556	order to save power. There
-0.199556	cache (see p. 87). There
-0.164730	than x = -abs(x);. There
-0.164730	pointer set to NULL. There
-0.164730	much time it uses. There
-0.164730	Volume 2A and 2B. There
-0.164730	Kbytes to 2 Mbytes. There
-0.164730	are separated by commas. There
-0.164730	the container be recycled? There
-0.164730	calculate xn as x4∙xn-4. There
-0.164730	a discussion. 7.33 Namespaces There
-0.164730	even integer is returned. There
-0.164730	was down to 36. There
-0.164730	longer time than normally. There
-0.164730	libraries (.dll or .so). There
-0.164730	and 8 floating point). There
-0.164730	penalty to using inheritance. There
-0.653408	the size of the array
-1.404769	the address of the array
-0.349934	first element of the array
-1.014343	the calculation of the array
-0.642063	the end of the array
-0.349934	the dimensions of the array
-0.356059	bytes smaller and the array
-0.357561	of S1 in the array
-0.353193	operating system if the array
-0.353193	needed, however, if the array
-0.579417	array to make the array
-0.529351	function in which the array
-0.352232	without checking all the array
-0.456445	errors in case the array
-0.236530	control it compares the array
-0.236530	efficient solution. Sort the array
-0.582786	So the address of array
-0.209811	calculate the addresses of array
-0.209811	calculating the addresses of array
-0.341085	compare with end of array
-0.237108	function scanf. Violation of array
-0.346411	for fast access to array
-0.237803	for array sizes and array
-0.331856	; store result in array
-0.470886	motion Induction variables for array
-0.349172	doesn't automatically check for array
-0.512446	It is intended for array
-0.380616	have no checking for array
-0.292406	are no checks for array
-0.293974	of the object or array
-0.337062	when n is an array
-0.354481	the size of an array
-0.053566	the address of an array
-0.270958	the end of an array
-0.289992	same applies to an array
-0.289992	adding bounds-checking to an array
-0.329983	of elements in an array
-0.315585	to check if an array
-0.270197	the data as an array
-0.053451	is used as an array
-0.221491	way to set an array
-0.221491	that behaves like an array
-0.221491	such as copying an array
-0.221491	array or setting an array
-0.221491	you are feeding an array
-0.011407	aligned integer vector from array
-0.003769	unaligned integer vector from array
-0.357671	may reuse the same array
-0.293561	element addresses for one array
-0.706933	the size of each array
-0.318910	(in bytes) of each array
-0.313931	is a fixed size array
-0.011270	aligned integer vector into array
-0.003724	unaligned integer vector into array
-0.537205	macro to swap two array
-0.236462	{ // Make dynamic array
-0.059540	range then a simple array
-0.292170	mentioned here: A large array
-0.208278	array element } An array
-0.208278	of the arrays. An array
-0.208278	93. 7.10 Arrays An array
-0.208278	given in www.agner.org/optimize/cppexamples.zip. An array
-0.208278	on page 27. An array
-0.236040	{ // Loop through array
-0.554470	to wrap the allocated array
-0.228080	caching inefficient. An allocated array
-0.344036	100> list; // Make array
-0.501268	is clock cycles per array
-0.340861	to allocate the final array
-0.231213	Dynamic memory allocation Any array
-0.183656	efficient than a linear array
-0.250724	then use a linear array
-0.183656	container, then a linear array
-0.322000	address of the current array
-0.406296	sequence in a temporary array
-0.106712	to access a multidimensional array
-0.008084	A matrix or multidimensional array
-0.069327	0, sizeof(list)); A multidimensional array
-0.218403	the access to individual array
-0.500800	point constants, string constants, array
-0.199869	to make a variable-size array
-0.199869	{ // Safe [] array
-0.165019	<< endl; // Output array
-0.165019	than when a fixed-size array
-0.336034	several large arrays and where
-0.549603	scope of the function where
-1.466148	parts of the code where
-0.353692	verify than a program where
-0.293548	execution to the point where
-0.296482	times in a loop where
-0.296482	Calculations in a loop where
-0.581381	that can be used where
-0.722104	the x86 instruction set where
-0.344321	very large shared object where
-0.293222	make member functions static where
-0.381371	to a public variable where
-0.236528	useful for large libraries where
-0.489495	also use vector operations where
-0.236267	to the general case where
-0.199448	146). In the cases where
-0.026681	integer size in cases where
-0.026681	is advantageous in cases where
-0.026681	operations automatically in cases where
-0.026681	such errors in cases where
-0.026681	example containers in cases where
-0.016482	There may be cases where
-0.153618	However, there are cases where
-0.364982	advantageous in most cases where
-0.153618	programming, etc. In cases where
-0.199448	there are many cases where
-0.153618	automatically in simple cases where
-0.070052	of the few cases where
-0.070052	are a few cases where
-0.199448	optimal in special cases where
-0.236189	(add with carry) instructions where
-0.334215	of making two threads where
-0.235569	itself. But a solution where
-0.335278	union is a structure where
-0.795305	needed in 64-bit mode where
-0.321644	such errors in programs where
-0.340728	cases take memory space where
-0.234709	on large data sets where
-0.403151	a large memory model where
-0.234333	to construct obscure examples where
-0.289845	The calculation of expressions where
-0.233550	as a learning process where
-0.233076	a Pentium 4 computer where
-0.232795	advantage in interpreted languages where
-0.232502	work with member functions, where
-0.286560	that you can predict where
-0.021315	parallelism is the situation where
-0.021315	refers to the situation where
-0.021315	useful in the situation where
-0.067365	in a use situation where
-0.067365	cache space. A situation where
-0.067365	about the only situation where
-0.067365	used in any situation where
-0.067365	46 A common situation where
-0.284987	the form of templates where
-0.308410	labels follow a sequence where
-0.020499	to make overflow checks where
-0.059851	be aware of situations where
-0.059907	be useful in situations where
-0.059907	also useful in situations where
-0.045528	than RISC in situations where
-0.059851	There may be situations where
-0.059851	it. There are situations where
-0.059851	There are also situations where
-0.227860	with IsPowerOf2 = false where
-0.326721	is a dependency chain where
-0.145016	There are cases, however, where
-0.145016	a few cases, however, where
-0.306860	of storage is determined where
-0.281237	in 32- bit mode, where
-0.279168	in the second step where
-0.224600	function return addresses (i.e. where
-0.222047	loops (except in Fortran where
-0.295957	such as multiple inheritance where
-0.218245	by using a pipeline where
-0.271900	the level-1 data cache, where
-0.218245	in a column-wise manner where
-0.211981	a series of calculations, where
-0.264820	forums on the Internet where
-0.211981	algorithm of sequential instructions, where
-0.250995	situations like example 12.4a where
-0.199718	caching more efficient today where
-0.199718	each time. An experiment where
-0.164880	high-priority threads are areas where
-0.164880	sets the variable __intel_cpu_feature_indicator where
-0.164880	back to around 1980 where
-0.164880	// Template for pow(x,N) where
-0.164880	written as 2eee 1.fffff, where
-0.164880	"Moving blocks of data", where
-0.164880	is n places back, where
-0.164880	back in the sequence, where
-0.407983	difficult to implement the many
-0.237805	want to thank the many
-0.339418	precision. The speed is many
-0.237657	is extremely costly to many
-0.237657	annoying time consumer to many
-0.313396	the Gnu compiler in many
-0.447745	away the variable in many
-0.236786	algebraic reductions explicitly in many
-0.236786	by many users in many
-0.381270	functions are missing in many
-0.452992	cumbersome to use for many
-0.327582	a good performance for many
-0.222060	Optimized function libraries for many
-0.158980	include standard libraries for many
-0.158980	contains well-tested libraries for many
-0.450728	and very useful for many
-0.581597	compiler is available for many
-0.430115	templates are available for many
-0.234446	a precious resource for many
-0.234446	enters the market for many
-0.293632	dispatching and discovered that many
-0.237394	bear in mind, that many
-0.187317	cache if there are many
-0.083637	pointers if there are many
-0.187317	programs if there are many
-0.187317	calls, if there are many
-0.421642	120 However, there are many
-0.752512	calls the critical function many
-0.443982	that are used by many
-0.315474	and write it with many
-0.315474	user friendly compiler with many
-0.285000	reason. A program with many
-0.285000	example is called with many
-0.229802	parameters. A template with many
-0.229802	useful in programs with many
-0.229802	systems"). An application with many
-0.285000	some CPU-intensive applications with many
-0.305077	used. A computer with many
-0.371560	A switch statement with many
-0.229802	Has an IDE with many
-0.355975	data shuffling, such as many
-0.342289	not necessary to have many
-0.444849	in programs that have many
-0.548760	directives Some compilers have many
-0.237528	times one way, then many
-0.487061	and is called from many
-0.441224	compiler because it has many
-0.316848	of a program has many
-0.447190	if a program has many
-0.609638	vector class library has many
-0.227199	language While C++ has many
-0.227199	D language. D has many
-0.227199	major platforms. Pascal has many
-0.353347	the data are used many
-0.331014	task is divided into many
-0.372475	slightly less efficient. In many
-0.230462	with low priority. In many
-0.230462	of mathematical purity. In many
-0.376267	there may be so many
-0.326030	function. There are so many
-0.459203	this code. For example, many
-0.097001	call to count how many
-0.097001	variables that count how many
-0.222092	that can tell how many
-0.222092	The profiler counts how many
-0.236135	can benefit from its many
-0.236067	in all cases, while many
-0.236011	to CPU-intensive code. But many
-0.323279	If a program uses many
-0.410933	If a program contains many
-0.191319	compiler. This library contains many
-0.191319	Performance Primitives" library contains many
-0.208534	Math Kernel Library" contains many
-0.312792	on servers that run many
-0.638429	more efficient to store many
-0.234320	significant improvements. Making too many
-0.332752	&& expression to generate many
-0.619466	A branch that goes many
-0.232564	such container classes. Unfortunately, many
-0.231800	on some processors. On many
-0.231198	A large block containing many
-0.228991	These two books contain many
-0.311545	The CPU can hold many
-0.222254	number. I have seen many
-0.218410	and C# and avoids many
-0.212079	and 64-bit Linux. Has many
-0.199813	Pentium 4. Even worse, many
-0.199813	a complex framework requiring many
-0.164967	with non-Intel CPUs. Includes many
-0.164967	Delight". Addison-Wesley, 2003. Contains many
-0.164967	program creates or modifies many
-0.164967	to x?" or "how many
-0.553090	Let's look at the possible
-0.887411	means that it is possible
-0.540780	consider if it is possible
-0.159282	many cases it is possible
-0.072379	some cases it is possible
-0.450136	CPU. But it is possible
-0.509501	consider whether it is possible
-0.413466	other cases, it is possible
-0.319028	} Here it is possible
-0.319028	PC. Nevertheless, it is possible
-0.319028	Boolean algebra, it is possible
-0.319028	can see, it is possible
-0.319028	software design, it is possible
-0.451290	know if this is possible
-0.386393	109 } It is possible
-0.297115	an object It is possible
-0.594265	longer time. It is possible
-0.439015	are used. It is possible
-0.297115	VIA processors. It is possible
-0.297115	shared object. It is possible
-0.297115	not critical. It is possible
-0.297115	automatic vectorization. It is possible
-0.297115	mouse input. It is possible
-0.297115	specific purpose. It is possible
-0.297115	new context. It is possible
-0.297115	data. 148 It is possible
-0.297115	is happening. It is possible
-0.297115	or animation. It is possible
-0.297115	tedious indeed. It is possible
-0.297115	p. 57). It is possible
-0.297115	or sizes? It is possible
-0.336008	programs and also a possible
-0.237699	can easily justify a possible
-1.000898	where the number of possible
-0.722532	are a number of possible
-0.442116	a limited number of possible
-0.324402	2 (be aware of possible
-0.237117	overcome the obstacle of possible
-0.237819	for an explanation and possible
-1.298461	then it may be possible
-0.555371	space. It may be possible
-0.458123	tools. It should be possible
-0.235620	may neverthe- less be possible
-0.171357	Or it might be possible
-0.171357	solution. It might be possible
-0.237786	pointers to objects) are possible
-0.252637	order to make it possible
-0.082839	instructions that make it possible
-0.082839	conditions that make it possible
-0.204720	and it makes it possible
-0.163091	doubled. This makes it possible
-0.163091	occurred. This makes it possible
-0.204720	class library makes it possible
-0.233215	multiplications. How was it possible
-1.172523	power of 2 if possible
-0.290973	as much data as possible
-0.235057	do as much as possible
-0.235057	be as small as possible
-0.235057	be as standardized as possible
-0.641665	then it is not possible
-0.451967	If it is not possible
-0.451967	where it is not possible
-0.525330	Obviously, this is not possible
-0.334782	constant propagation is not possible
-0.319342	It is therefore not possible
-0.324868	+= 1.0f; } A possible
-0.437634	x. This is only possible
-0.309749	Obviously, this is only possible
-0.382085	below. There are other possible
-0.331184	also deallocated in all possible
-0.232771	code. It is also possible
-0.232771	optimization. It is also possible
-0.414965	languages, it is often possible
-0.509759	information. It is often possible
-0.532529	It is not always possible
-0.236200	We cannot change its possible
-0.277596	pointer to the best possible
-0.277596	not use the best possible
-0.277596	or model the best possible
-0.324230	several flaws: The best possible
-0.559118	72. It is therefore possible
-0.289903	makes various other optimizations possible
-0.287752	polymorphism. It is sometimes possible
-0.317909	situation of the maximum possible
-0.552680	that gives the simplest possible
-0.283330	development tools. The simplest possible
-0.423633	pointers It is rarely possible
-0.228027	the sake of fastest possible
-0.281423	different compilers is generally possible
-0.182244	to assume the worst possible
-0.182244	that gives the worst possible
-0.224820	c; // Define biggest possible
-0.065523	the reciprocal of the clock
-0.357853	of changes in the clock
-0.349234	the problem that the clock
-0.349234	so fast that the clock
-0.349234	the problems that the clock
-0.656545	For example, if the clock
-0.722570	be multiplied by the clock
-0.356482	CPU-intensive programs when the clock
-0.236951	function to measure the clock
-0.236951	to experience. Occasionally, the clock
-0.236951	In any event, the clock
-0.294170	The time unit is clock
-0.065062	How much is a clock
-0.368759	the length of a clock
-0.098414	The length of a clock
-0.356059	always comparable to a clock
-0.591170	measures the number of clock
-0.236700	Windows. 10 Multithreading The clock
-0.236700	the work load. The clock
-0.236700	than standard PCs. The clock
-0.236700	to normal afterwards. The clock
-0.324908	but it uses more clock
-0.237457	1 0.5ns. 2GHz A clock
-0.500568	resolution of the CPU clock
-0.331787	even if the CPU clock
-0.331787	counts at the CPU clock
-0.287091	I am using CPU clock
-0.228788	takes zero or one clock
-0.130021	that take only one clock
-0.130021	operations take only one clock
-0.228788	This typically takes one clock
-0.283849	AND-operations in just one clock
-0.440637	difference between the two clock
-0.100303	x for approximately two clock
-0.100303	be accessed approximately two clock
-0.236838	typically 0 - 2 clock
-0.232290	take up to 4 clock
-0.232290	and 3 - 4 clock
-0.236704	takes 4 - 8 clock
-0.236384	takes 4 - 16 clock
-0.228482	This can save several clock
-0.228482	the microprocessor wastes several clock
-0.199376	function is a few clock
-0.199376	takes only a few clock
-0.199376	typically takes a few clock
-0.199376	is needed a few clock
-0.199376	in just a few clock
-0.199376	wait until a few clock
-0.183134	very kludgy. The few clock
-0.165601	floating point addition every clock
-0.229725	have one addition every clock
-0.234729	CPUs can change their clock
-0.234630	will take only 256 clock
-0.234318	one addition every three clock
-0.429732	designed for a higher clock
-0.201799	subtraction (3 - 10 clock
-0.469433	that something takes 10 clock
-0.201799	will still take 10 clock
-0.212186	CPUs: use the core clock
-0.074737	clock cycles. The core clock
-0.074737	clock frequency. The core clock
-0.165064	the table are core clock
-0.165064	processors is called core clock
-0.197761	take 3 - 5 clock
-0.087720	point addition takes 5 clock
-0.087720	point operation takes 5 clock
-0.231677	takes 50 - 100 clock
-0.177419	between 5 and 20 clock
-0.038030	of 10 - 20 clock
-0.038030	until 10 - 20 clock
-0.229070	takes 3 - 6 clock
-0.224766	between 2 and 15 clock
-0.074723	takes 40 - 80 clock
-0.074723	multiplication (27 - 80 clock
-0.306197	cycles at the actual clock
-0.237776	more than a hundred clock
-0.165031	cached, but several hundred clock
-0.222219	Integer multiplication takes 11 clock
-0.222171	This code took 50 clock
-0.272039	SSE2 typically takes 40 clock
-0.067980	takes 14 - 45 clock
-0.067980	multiplication (20 - 45 clock
-0.212102	approximately 12 - 25 clock
-0.017507	of different size matrices, clock
-0.017507	copying different size matrices, clock
-0.164988	memory takes only 2-3 clock
-0.164988	cycles counter is counting clock
-0.164988	will take approximately 500 clock
-0.164988	0); DontSkip = dummy[0]; clock
-0.653367	shared object, then the version
-0.356445	virtual function. If the version
-0.544595	code to call the version
-1.099860	recommended to use a version
-0.237400	time it takes. The version
-0.237400	by CPU brand. The version
-0.237012	to the appropriate function version
-0.237012	to the desired function version
-0.328244	on which a code version
-0.320879	on which this code version
-0.101513	installation time. Each code version
-0.101513	at initialization. Each code version
-0.234234	the most advanced code version
-0.357627	always calls the same version
-0.229242	at compile time which version
-0.229242	it is known which version
-0.229242	useful when testing which version
-0.229242	time on deciding which version
-0.229242	predict with certainty which version
-0.293510	want to make one version
-0.345673	the speed of each version
-0.422295	Compile once for each version
-0.326117	function prototypes for each version
-0.335405	to use the static version
-0.332142	systems and a 64-bit version
-0.289706	itself is. The 64-bit version
-0.324268	flaws: The best possible version
-0.313264	versions. A 32- bit version
-0.262857	or if the new version
-0.262857	programmer gets the new version
-0.337146	updated to a new version
-0.274665	features to each new version
-0.452357	insert only the SSE2 version
-0.047138	parm2) {...} // SSE2 version
-0.292080	and BSD. The Windows version
-0.312517	Obviously, the directly compiled version
-0.235890	// AVX2 // specific version
-0.043333	parm2) {...} // AVX version
-0.207364	support in the optimized version
-0.207364	is not the optimized version
-0.235562	AVX instruction set, another version
-0.445535	case then the optimal version
-0.347647	have implemented a separate version
-0.327395	compatible with a better version
-0.233588	a six years old version
-0.028826	pointer to the appropriate version
-0.059702	link to the appropriate version
-0.059702	leads to the appropriate version
-0.112015	and choose the appropriate version
-0.112015	that loads the appropriate version
-0.139760	64-bit systems. The appropriate version
-0.196854	will run the advanced version
-0.196854	avoid running the advanced version
-0.466759	pointer to the desired version
-0.294094	made to the desired version
-0.210641	point to the right version
-0.210641	pointer to the right version
-0.404689	for finding the right version
-0.500803	disabled in the final version
-0.229885	future. If a future version
-0.285094	that uses a newer version
-0.321900	updates if the current version
-0.304117	Now call the chosen version
-0.227923	available because the interpreted version
-0.052020	for making a debug version
-0.052020	program executable: a debug version
-0.111130	lines. The 17 debug version
-0.111130	most time. Uses debug version
-0.144871	For example, the latest version
-0.144871	2. Use the latest version
-0.144871	user gets the latest version
-0.224871	// go to dispatched version
-0.176427	// continue in dispatched version
-0.222118	the 8 most popular version
-0.271980	points to the selected version
-0.148640	will run an inferior version
-0.148640	it supports. An inferior version
-0.212050	as in Linux kernel version
-0.035740	#include "asmlib.h" // Lowest version
-0.035740	= &CriticalFunction_Dispatch; // Lowest version
-0.074687	and loader (requires binutils version
-0.074687	example 13.1, Requires binutils version
-0.199785	and debugging. A command-line version
-0.017502	version and a release version
-0.017502	development, and a release version
-0.199785	set, and a generic version
-0.164942	else { // Generic version
-0.164942	binutils version 2.20, glibc version
-0.164942	&CriticalFunction_SSE2; } // Default version
-0.651661	is independent of the value
-0.458399	false regardless of the value
-0.354763	the compiler to the value
-0.906832	disadvantage is that the value
-0.506643	notice is that the value
-0.331447	1 and that the value
-0.534510	possible, so that the value
-0.428956	This means that the value
-0.500067	safely assume that the value
-0.331447	can detect that the value
-0.340207	be used if the value
-0.439950	only possible if the value
-0.340207	efficient way if the value
-0.340207	is predicted if the value
-0.340207	of course, if the value
-0.353436	of ArraySize by the value
-0.353182	8-bit integers with the value
-0.563078	value depends on the value
-0.352506	brackets mean use the value
-0.351965	this range then the value
-0.437133	each value from the value
-0.870765	is calculated from the value
-0.346819	variable always has the value
-0.493621	overflow will make the value
-0.352298	hundred times because the value
-0.352845	was executed. If the value
-0.320567	significant digits, so the value
-0.341848	variable. Make sure the value
-0.234775	y will get the value
-0.234775	will both get the value
-0.444584	than to calculate the value
-0.444584	recommended to calculate the value
-0.346018	method unfavorable, unless the value
-0.616763	clock cycles after the value
-0.209508	If you read the value
-0.209508	would only read the value
-0.335194	+ i; Here, the value
-0.233978	compiler doesn't know the value
-0.233978	condition will generate the value
-0.334476	allowed to change the value
-0.332178	= N&(N-1) gives the value
-0.339248	and wait until the value
-0.289745	connection with reading the value
-0.335194	instead of calculating the value
-0.377359	register to hold the value
-0.531044	takes to reload the value
-0.233978	pop ebx restores the value
-0.456458	make sure that a value
-0.349624	is calculated from a value
-0.237352	giving each constant a value
-0.237352	lookup tables Reading a value
-0.292887	a little explanation. The value
-0.236739	two clock counts. The value
-0.236739	0's when false. The value
-0.236739	2 63 . The value
-0.237733	parameters are transferred by value
-0.330316	value infinity, and this value
-0.236319	You can subtract this value
-0.237439	function for each different value
-0.427062	to have no other value
-0.275858	can produce no other value
-0.345009	inputs have any other value
-0.575405	reflects the floating point value
-0.345754	16 to the integer value
-0.279877	= x∙xn-1, and each value
-0.323102	code so that each value
-0.244908	into account that each value
-0.279877	is serial because each value
-0.279877	have to calculate each value
-0.225287	sum; } Here, each value
-0.236911	side-effects and its return value
-0.334413	wait for the new value
-0.347348	begin calculating a new value
-0.229695	is replaced by its value
-0.229695	a pointer then its value
-0.309725	cut off the binary value
-0.233225	} A possible negative value
-0.342008	depends on the preceding value
-0.231820	bits minimum value maximum value
-0.340916	counter when the final value
-0.028062	calculated from the previous value
-0.058063	efficiently from the previous value
-0.043531	function of the absolute value
-0.043531	can take the absolute value
-0.043531	to calculate the absolute value
-0.224845	and the four B value
-0.199925	declaration size, bits minimum value
-0.199925	with the four R value
-0.199925	in an unused fourth value
-0.165071	cycles. Obviously, the initial value
-0.358259	(in bytes) of the objects
-0.357457	from knowing that the objects
-0.997263	of 2 if the objects
-0.353802	is necessary if the objects
-0.405477	memory for all the objects
-0.312589	vector stores all the objects
-0.312589	to pool all the objects
-0.237305	to zero whenever the objects
-0.527597	or the number of objects
-1.058455	If the number of objects
-0.133558	a variable number of objects
-0.133558	A variable number of objects
-0.854875	the total number of objects
-0.519708	where the type of objects
-0.102362	calculating the movements of objects
-0.102362	the physical movements of objects
-0.183063	that all variables and objects
-0.183063	contains many variables and objects
-0.183063	used. Such variables and objects
-0.081956	all non-static variables and objects
-0.081956	All non-static variables and objects
-0.065365	the stack Variables and objects
-0.065365	in memory. Variables and objects
-0.065365	variable storage Variables and objects
-0.343171	amount of time. The objects
-0.237440	and complexity (en.wikipedia.org/wiki/Standard_Template_Library). The objects
-0.485065	are used only for objects
-0.293674	use this principle for objects
-0.355100	blocks than there are objects
-0.234749	for details on when objects
-0.170856	space becomes fragmented when objects
-0.170856	easily become fragmented when objects
-0.233996	Define biggest possible vector objects
-0.500609	{ // Define vector objects
-0.233996	does not allow vector objects
-0.350665	memory area for different objects
-0.235613	how to make different objects
-0.625148	is contiguous with other objects
-0.101199	a FIFO manner? If objects
-0.101199	a FILO manner? If objects
-0.233381	objects numbered consecutively? If objects
-0.233192	searching needed before all objects
-0.169959	but only after all objects
-0.169959	searching needed after all objects
-0.285375	smaller. Structure and class objects
-0.230132	information to all class objects
-0.230132	objects Conversions involving class objects
-0.230132	go undetected. Converting class objects
-0.233610	efficient to store many objects
-0.233610	large block containing many objects
-0.289062	to detect if any objects
-0.233377	add or remove any objects
-0.236513	are needed, and new objects
-0.348337	multiple inheritance by making objects
-0.292175	choose to align large objects
-0.228922	method only for big objects
-0.228922	arrays and other big objects
-0.093519	pointers to all allocated objects
-0.093519	important that all allocated objects
-0.262091	to different dynamically allocated objects
-0.262091	many small dynamically allocated objects
-0.449927	is possible to store objects
-0.185734	is used in shared objects
-0.185734	default, even when shared objects
-0.185734	possible to make shared objects
-0.235290	speeding up 64-bit shared objects
-0.019313	libraries, also called shared objects
-0.289853	The calculation of graphics objects
-0.310208	all destructors for local objects
-0.210537	a unique key. Do objects
-0.210537	a hash map. Do objects
-0.307358	and modular. The so-called objects
-0.285158	Text strings and similar objects
-0.285182	add, remove or modify objects
-0.227980	the creation of temporary objects
-0.020070	14.12 Position-independent code Shared objects
-0.020070	at load time. Shared objects
-0.020070	32 bit Linux Shared objects
-0.002459	as explained below. Shared objects
-0.020070	objects in BSD Shared objects
-0.020070	for local references. Shared objects
-0.184701	the simplest cases, composite objects
-0.184701	method for transferring composite objects
-0.299135	is preferred to declare objects
-0.035755	at compile time. Are objects
-0.035755	a top-of-stack index. Are objects
-0.035755	be too small. Are objects
-0.035755	linked list. 94 Are objects
-0.165024	reference, or void. Returning objects
-0.294091	code is compact and takes
-0.501989	be the one that takes
-0.590335	a container class that takes
-0.463997	standard function library that takes
-0.329235	make one version that takes
-0.773916	in a way that takes
-0.291794	checking. Any task that takes
-0.327681	same time that it takes
-0.327681	table shows that it takes
-0.327681	9.1 show that it takes
-0.412852	cycles more than it takes
-0.035225	is the time it takes
-0.035225	of the time it takes
-0.017256	to the time it takes
-0.035225	and the time it takes
-0.035225	with the time it takes
-0.011427	than the time it takes
-0.017256	this the time it takes
-0.035225	includes the time it takes
-0.035225	Consider the time it takes
-0.035225	measuring the time it takes
-0.007520	time. The time it takes
-0.007520	installation The time it takes
-0.007520	bytes. The time it takes
-0.007520	calculations. The time it takes
-0.007520	prediction. The time it takes
-0.007520	doubled. The time it takes
-0.007520	run. The time it takes
-0.007520	tolerance. The time it takes
-0.007520	costs. The time it takes
-0.625970	not optimal because it takes
-0.272257	time consuming. Sometimes it takes
-0.272257	Program loading Often, it takes
-0.218560	is cached. Usually it takes
-0.503360	overflow, and the code takes
-0.428213	sure that the compiler takes
-0.428213	assume that the compiler takes
-0.427398	stack is used. It takes
-0.234438	program is started. It takes
-0.234438	comments, in green. It takes
-0.349122	and deallocation of memory takes
-0.341674	a variable in memory takes
-0.524270	analysis. If the program takes
-0.237452	CPUs". A branch instruction takes
-0.237398	disadvantages: The unrolled loop takes
-0.595400	or double to integer takes
-0.347858	number to an integer takes
-0.413531	of an unsigned integer takes
-0.332116	For example, a double takes
-0.769925	a float or double takes
-0.331208	functions. The 'this' pointer takes
-0.237108	class or structure object takes
-0.237061	need assembly language. C++ takes
-0.349790	function. Copying the table takes
-0.231857	The size conversion often takes
-0.231857	a hard disk often takes
-0.236354	floating point constant always takes
-0.450899	precision. Long double precision takes
-0.490745	through a linked list takes
-0.265036	if unsigned. This typically takes
-0.212173	a function pointer typically takes
-0.212173	integer without SSE2 typically takes
-0.201960	longer time. Integer multiplication takes
-0.201960	Integer multiplication Integer multiplication takes
-0.347225	think that exception handling takes
-0.234648	a #define directive never takes
-0.289100	enabled. Typically, the conversion takes
-0.198493	(short int)i; This conversion takes
-0.362578	that the type conversion takes
-0.198493	because the integer-to-float conversion takes
-0.438031	division Floating point division takes
-0.263401	Integer division Integer division takes
-0.263401	other microprocessors. Integer division takes
-0.585174	a floating point addition takes
-0.263105	105. Floating point addition takes
-0.375350	each floating point operation takes
-0.020506	I write that something takes
-0.282912	library. A runtime DLL takes
-0.404246	deallocation and garbage collection takes
-0.226614	and reading them again takes
-0.224785	is unfortunate because truncation takes
-0.222282	8 clock cycles. Division takes
-0.199858	on the microprocessor. Multiplication takes
-0.199858	disadvantage that the branching takes
-0.199858	large then it obviously takes
-0.571135	pointer. This is the variable
-0.779142	the address of the variable
-0.461536	is set in the variable
-0.647087	the possibility that the variable
-0.352491	the assumption that the variable
-0.547153	execution, even if the variable
-0.555508	8 rather than the variable
-0.496624	before the time the variable
-0.354909	is declared. If the variable
-0.323859	code in which the variable
-0.323859	brackets in which the variable
-0.349324	not edx but the variable
-0.445161	it makes sure the variable
-0.475589	attempting to write the variable
-0.211025	function __intel_cpu_features_init() sets the variable
-0.211025	and similarly sets the variable
-0.436560	code to give the variable
-0.380340	to optimize away the variable
-0.236119	overhead of transferring the variable
-0.292180	The union forces the variable
-0.236119	cycles to fetch the variable
-0.341175	all optimizations of a variable
-0.341175	live range of a variable
-0.341175	A collection of a variable
-0.341175	the scope of a variable
-0.818627	or reference to a variable
-0.338067	or writing to a variable
-0.437259	be added to a variable
-0.404447	makes sure that a variable
-0.311757	keyword specifies that a variable
-0.311757	keyword tells that a variable
-0.338604	memory used by a variable
-0.785634	than division by a variable
-0.351144	Boolean NOT on a variable
-0.447484	the value from a variable
-0.576100	useful to make a variable
-0.327671	If you access a variable
-0.621602	Reading or writing a variable
-0.321225	aliasing When accessing a variable
-0.310691	variable. Efficiency Accessing a variable
-0.234517	the compiler treat a variable
-0.331332	and similar objects of variable
-0.313806	a few arrays of variable
-0.404706	The different kinds of variable
-0.052589	7.1 Different kinds of variable
-0.167520	about function names and variable
-0.167520	define function names and variable
-0.439436	when the function or variable
-0.311091	with any function or variable
-0.314375	Text strings typically have variable
-0.285517	Alignment of data A variable
-0.607834	at compile time. A variable
-0.230257	or a pointer. A variable
-0.230257	into one thread. A variable
-0.372191	are very expensive. A variable
-0.230257	variables (see below). A variable
-0.293637	identical to some other variable
-0.576172	access a floating point variable
-0.514990	for more than one variable
-0.352579	by replacing an integer variable
-0.650650	make sure that no variable
-0.324176	registers. A class member variable
-0.232621	log2 a global const variable
-0.232621	to a local const variable
-0.420102	overflow of an unsigned variable
-0.382887	stored as a register variable
-0.294259	make temp a register variable
-0.333567	and store the shared variable
-0.405057	overflow of a signed variable
-0.038131	value of the induction variable
-0.038131	and use the induction variable
-0.038131	would make the induction variable
-0.246981	calculated by an induction variable
-0.105123	from making an induction variable
-0.126195	use the same induction variable
-0.126195	8 and no induction variable
-0.126195	by a second induction variable
-0.028280	result // Update induction variable
-0.028280	Y // Update induction variable
-0.161966	reference to a public variable
-0.161966	And whenever a public variable
-0.140149	applied to a global variable
-0.140149	name as a global variable
-0.140149	Likewise, when a global variable
-0.406404	it in a temporary variable
-0.226705	that access the saved variable
-0.335932	cannot multiply integers of any
-0.237639	are declared outside of any
-0.338948	than comparing it to any
-0.314184	reads or writes to any
-0.237447	not 123 correspond to any
-0.237742	constructors, copy constructors, and any
-0.712988	can be used in any
-0.237579	the execution speed in any
-1.293912	can be used for any
-0.237009	code that works for any
-0.237009	specify static linking for any
-0.236369	this instruction set or any
-0.236369	destructor to call or any
-0.236369	Any specific bottleneck or any
-0.237014	is to detect if any
-0.237014	count as true, if any
-0.036906	is not accessed by any
-0.291417	replace this line by any
-0.235448	check is bypassed by any
-0.313471	function to work with any
-0.293012	macro will interfere with any
-0.237652	can run optimally on any
-0.527102	therefore as efficient as any
-0.348786	or 8, but not any
-0.457768	many times faster than any
-0.329852	virtual processors can have any
-0.236277	if the inputs have any
-0.351420	or union can use any
-0.482895	cannot be called from any
-0.586832	can be accessed from any
-0.234227	is not referenced from any
-0.237448	can be added at any
-0.508203	function. This will make any
-0.323327	and you cannot make any
-0.235364	and execution units. If any
-0.235364	type identification (RTTI) If any
-0.293565	can be used, but any
-0.986848	the compiler to do any
-0.237053	"worst case" counts. In any
-0.236777	function should never return any
-0.097635	vector operations and before any
-0.097635	starts running and before any
-0.223785	an EMMS instruction before any
-0.312417	intrinsic function _mm256_zeroupper() before any
-0.023085	function that doesn't call any
-0.236661	code and doesn't take any
-0.456454	library does not need any
-0.323691	if they don't need any
-0.319740	main() are compiled without any
-0.225673	with old microprocessors without any
-0.225673	be used freely without any
-0.100239	that it cannot access any
-0.100239	member function cannot access any
-0.229812	I am not making any
-0.229812	you should avoid making any
-0.736237	Therefore, you should avoid any
-0.379572	You will not get any
-0.226277	generation class (CGrandParent) contains any
-0.226277	generation class (CParent<>) contains any
-0.235400	CPU dispatching and run any
-0.233817	cleaning up and calling any
-0.233634	but it doesn't generate any
-0.334765	Espresso) that can reduce any
-0.055232	conversions do not produce any
-0.026738	compiler does not produce any
-0.026738	It does not produce any
-0.232236	(i.e. variables defined outside any
-0.231695	computers. At this time, any
-0.231644	C-style type-casting without adding any
-0.230623	// You may insert any
-0.230593	level-2 cache from loading any
-0.229005	measurement should not include any
-0.167170	that there is hardly any
-0.111117	64-bit integers with hardly any
-0.111117	cache. This has hardly any
-0.111117	that there was hardly any
-0.226569	doesn't add or remove any
-0.299012	this error by avoiding any
-0.272038	not going to recommend any
-0.488559	pointer does not alias any
-0.212021	F1 will never throw any
-0.251039	of time and resolve any
-0.251039	doesn't have to obey any
-0.164916	is possible to express any
-0.164916	programs. The profiler identifies any
-0.164916	a destructor that destroys any
-0.341382	to uncached memory and we
-0.237500	out of range and we
-0.650807	The conclusion is that we
-0.315389	128-bit vector so that we
-0.315389	be negative so that we
-0.315389	example 12.4a so that we
-0.315389	reciprocal factorials so that we
-0.232926	is so large that we
-0.400783	a cache line that we
-0.232926	only CPUID information that we
-0.232926	few clock cycles that we
-0.308795	doing the optimizations that we
-0.232926	column-wise. Assume now that we
-0.232926	Example 14.27 assumes that we
-0.232926	costs to multithreading that we
-0.357363	This is the function we
-0.311137	vectorize the code if we
-0.234892	me manually, but if we
-0.548087	modulo. For example, if we
-0.234892	the same result if we
-0.234892	calculation becomes easier if we
-0.564495	lost at the time we
-0.236120	easier to understand when we
-0.236120	will be evicted when we
-0.228399	an odd number then we
-0.283408	repeats 1000 times then we
-0.228399	for a result then we
-0.520057	5 clock cycles, then we
-0.228399	compile-time constant n, then we
-0.228399	a = 10000, then we
-0.228399	C1 or C2, then we
-0.535885	times. This is because we
-0.307661	will run faster because we
-0.287467	cost anything here because we
-0.231973	for example 9.5 because we
-0.231355	for local references. If we
-0.231355	be loaded anyway. If we
-0.231355	n! = n∙(n-1)!. If we
-0.231355	1.0f : 2.5f; If we
-0.335555	a library function which we
-0.293341	a to this number we
-0.533039	advantageous in cases where we
-0.225787	depend on x so we
-0.225787	in the cache so we
-0.225787	in this case so we
-0.225787	is 1024 bytes, so we
-0.236773	it is evicted before we
-0.526338	elsewhere. In this example, we
-0.334458	32. In 64-bit systems we
-0.340110	is not the case we
-0.495676	resources. In this case we
-0.228727	mathe- matical applications. But we
-0.228727	0+1.23456 = 1.23456. But we
-0.224390	in 64-bit code. However, we
-0.224390	specific CPU models. However, we
-0.290204	variables. In these examples we
-0.331860	+ list[j].c; } Here, we
-0.437626	all the cache lines we
-0.287664	maximum number of constants we
-0.208357	in the cache. When we
-0.208357	rounded to 100000000. When we
-0.338910	further. The first thing we
-0.206000	137). The second thing we
-0.285105	string. In the future we
-0.229924	for each process. Obviously, we
-0.195844	which is double. Here we
-0.195844	a1/b1 + a2/b2; Here we
-0.282838	If n = 4, we
-0.281318	in 64 bit mode, we
-0.226517	tools to be available, we
-0.218397	CPUs or CPU cores, we
-0.218327	in assembly language". While we
-0.218327	y = pow(x,n) As we
-0.148585	the -fpic option. Then we
-0.148585	somewhere in F1? Then we
-0.212062	2. Using hexadecimal numbers, we
-0.212155	double In example 7.4 we
-0.212062	not a problem since we
-0.199797	the loop by four, we
-0.199797	the rules of algebra, we
-0.164952	case of data decomposition, we
-0.164952	quite fast. The lesson we
-0.164952	required a PC. Similarly, we
-0.164952	rather than -156. Surprisingly, we
-0.164952	2 in example 14.7b, we
-0.164952	rather than 200. Next, we
-0.164952	four places back. Thus, we
-0.164952	2 GHz CPU. Should we
-0.293745	table may be of some
-0.697337	is a list of some
-0.715738	to take care of some
-0.346883	it count up to some
-0.407229	to is identical to some
-0.237295	of software programmers to some
-0.237295	systems gives rise to some
-0.345402	and string functions and some
-0.324581	for 16-bit mode and some
-0.237026	avoid hard-to-find errors, and some
-0.237026	CPU dispatch mechanisms, and some
-0.233833	Linux platforms, and in some
-0.233833	cache level, and in some
-0.334851	systems (but not in some
-0.044337	the compiler may in some
-0.044337	of compiler may in some
-0.093720	cycles. It may in some
-0.093720	int declaration may in some
-0.349582	are now used in some
-0.433476	code more efficient in some
-0.235901	are less efficient in some
-0.375725	It is possible in some
-0.232803	can improve performance in some
-1.027186	can be useful in some
-0.400602	very efficient solution in some
-0.288410	clear program structure in some
-0.308648	can improve optimizations in some
-0.375725	do, at least in some
-0.232803	can be expensive in some
-0.232803	cause slight imprecision in some
-0.232803	of the iterator in some
-0.236697	of this section for some
-0.323888	same time, except for some
-0.236697	performance by 5-10% for some
-0.236697	the newsgroup comp.lang.asm.x86 for some
-0.381014	situations to avoid that some
-0.236602	it is true that some
-0.236602	the loop buffer that some
-0.536260	you will notice that some
-0.458312	devices, but there are some
-0.237331	assembly language. Here are some
-0.237771	CPU by giving it some
-0.545230	C++ is supported by some
-0.236979	results are combined by some
-0.323747	The compiler comes with some
-0.670765	sake of compatibility with some
-0.236117	7 through 14, with some
-0.054451	predicted well only on some
-0.235189	time than normal on some
-0.235189	from the IDE on some
-0.237641	that similar solutions may some
-0.237586	optimization, it does have some
-0.344438	The Codeplay compiler has some
-0.570910	a pointer. It has some
-0.420690	then you may make some
-0.356028	may decide to do some
-0.195836	CPU detection function In some
-0.246629	the critical function. In some
-0.195836	of the program. In some
-0.264972	heavy graphics calculations. In some
-0.195836	time. Loop unrolling In some
-0.195836	an array element. In some
-0.195836	working software users. In some
-0.195836	always optimal, though. In some
-0.195836	some microprocessors have. In some
-0.195836	comes to mind. In some
-0.195836	Dobbs Journal, 2002). In some
-0.195836	from mispredictions. 44 In some
-0.195836	See page 34. In some
-0.195836	in its API. In some
-0.313680	in green. It takes some
-0.355460	fast enough. For example, some
-0.236478	following list points out some
-0.416613	is that it does some
-0.333907	code. Each compiler does some
-0.311811	as C++ for doing some
-0.234095	The table can give some
-0.471715	Some compilers can reduce some
-0.232652	chapter, I have described some
-0.232615	a[1] = 2; Unfortunately, some
-0.420601	functions have to save some
-0.229995	integer vector instructions SSE4.1 some
-0.226618	unfortunately very common. Even some
-0.218426	use than others. While some
-0.251190	The following sections describe some
-0.500454	dispatching. This function is so
-0.727615	branch of code is so
-0.344037	maintaining such code is so
-0.636945	the response time is so
-0.542281	met: the object is so
-0.312825	floating point variables is so
-0.236308	Template meta- programming is so
-0.500908	if a matrix is so
-0.456076	Unfortunately, the syntax is so
-0.312825	why this effect is so
-0.237717	its own caller, and so
-0.557265	they are used in so
-0.515517	However, there may be so
-0.350905	the tolerance may be so
-0.354660	for lists that are so
-0.312331	the level-2 cache are so
-0.354954	memcpy function. There are so
-0.441103	resources. Modern CPUs are so
-0.534848	b[i] and c[i] are so
-0.237727	#if directives around it so
-0.237688	in the thread function so
-0.357199	and reorganize the code so
-0.466093	by the critical code so
-0.237662	don't depend on x so
-0.432301	counter ahead of time so
-0.557447	value at compile time so
-0.574452	into a 128-bit vector so
-0.237306	a and b different so
-0.351523	already in the cache so
-0.320461	not optimal to do so
-0.039520	deallocated. Failure to do so
-0.082999	flow. Failure to do so
-0.517991	compilers will not do so
-0.537652	formulas in this example so
-0.338263	not allowed in C++ so
-0.236734	from the same address so
-0.447428	for each function call so
-0.236682	using one register less so
-0.821320	out the sign bit so
-0.876440	are accessed through pointers so
-0.236292	into two 64-bit operations so
-0.526091	0] in this case so
-0.236189	code automatically or does so
-0.333833	propagate through the calculations so
-0.291959	access in separate threads so
-0.235848	never throw any exception so
-0.404254	method in 32-bit mode so
-0.234465	from a reliable source so
-0.376535	options at the start so
-0.233050	can never be negative so
-0.728770	in the code section so
-0.232743	same after this statement so
-0.325107	reciprocal in the code, so
-0.232028	The operators are inlined so
-0.307185	expanded like a macro so
-0.231544	multiply it by 100 so
-0.231139	a power of 2, so
-0.286555	and 14.9 is changed so
-0.285859	two constants are identical so
-0.305102	use excessive loop unrolling so
-0.082750	array should be organized so
-0.082750	resources should be organized so
-0.315407	Why is template metaprogramming so
-0.413965	10 is an integer, so
-0.227949	have to be designed so
-0.226481	one time in thousand so
-0.322095	always normalized, if possible, so
-0.315484	the code more compact so
-0.279092	pipelined, as explained above, so
-0.224657	C/C++ standard specifies truncation so
-0.301431	double precision by default, so
-0.271998	we modify example 9.5 so
-0.211975	int is 32 bits, so
-0.211975	rely on automatic prefetching so
-0.211975	treated like a parameter, so
-0.250989	loop in example 12.4a so
-0.199713	kilobyte is 1024 bytes, so
-0.199713	store the reciprocal factorials so
-0.164875	interrupts and task switches; so
-0.164875	CPU market is developing so
-0.164875	approximately seven significant digits, so
-0.164875	to the value 0x2C so
-0.164875	object of class C1, so
-0.587987	assumption is that the variables
-0.339309	compiler will choose the variables
-0.461267	A limited number of variables
-0.354975	is also used for variables
-0.514511	This is intended for variables
-0.714869	of making sure that variables
-0.629530	if the operands are variables
-0.325086	on registers, not on variables
-0.428768	efficient. Do not make variables
-0.237316	array can cause other variables
-0.529500	range of floating point variables
-0.433707	calculations in floating point variables
-0.062717	for manipulating floating point variables
-0.266518	64-bit systems. Floating point variables
-0.266518	point expressions. Floating point variables
-0.113128	29 7.3 Floating point variables
-0.113128	31 7.3 Floating point variables
-0.477730	unable to predict which variables
-0.345884	this is that all variables
-0.770711	the most often used variables
-0.569846	The most commonly used variables
-0.313948	therefore conclude that most variables
-0.574426	be used for multiple variables
-0.313753	Avoid global and static variables
-0.324522	a program contains many variables
-0.286380	Typical candidates for register variables
-0.214027	compiler to make register variables
-0.163395	of floating point register variables
-0.074058	making floating point register variables
-0.214027	number of integer register variables
-0.236659	important to understand how variables
-0.236445	simply by setting these variables
-0.236236	cache space. Putting simple variables
-0.591161	of pointers to its variables
-0.235920	a risk that several variables
-0.261394	or four single precision variables
-0.261394	or eight single precision variables
-0.235170	You may add counter variables
-0.235007	an integer. 158 Integer variables
-0.187059	makes operations with Boolean variables
-0.187059	operators that have Boolean variables
-0.187059	1 for true. Boolean variables
-0.187059	variables are overdetermined Boolean variables
-0.187059	handle is invalid. Boolean variables
-0.136959	The method of induction variables
-0.136959	common subexpressions, and induction variables
-0.180974	Calculate polynomial with induction variables
-0.136959	how to use induction variables
-0.136959	do not make induction variables
-0.063103	make floating point induction variables
-0.063103	79 Floating point induction variables
-0.180974	namely the two induction variables
-0.136959	compiler doesn't need induction variables
-0.357114	to functions and public variables
-0.219331	You can't have public variables
-0.197809	of static and global variables
-0.197809	may preferably avoid global variables
-0.197809	public variables. All global variables
-0.212190	they are used. Such variables
-0.212190	__thread or __declspec(thread). Such variables
-0.210370	intermediate data and local variables
-0.210370	contiguous with other local variables
-0.230669	program, one for initialized variables
-0.230621	and 16-bit Windows, allow variables
-0.246752	pointer. Likewise, all non-static variables
-0.195945	are constructed. All non-static variables
-0.078618	+= 9; } Induction variables
-0.078618	for array elements Induction variables
-0.078618	other integer expressions Induction variables
-0.078618	invariant code motion Induction variables
-0.078618	temp; } 70 Induction variables
-0.224756	temp / 4; Register variables
-0.224713	All global variables (i.e. variables
-0.224799	we can access internal variables
-0.067977	using classes. 7.2 Integers variables
-0.067977	storage............................................................................. 26 7.2 Integers variables
-0.212091	of the class. Storing variables
-0.122547	from any function. Global variables
-0.122547	can avoid it. Global variables
-0.199825	and one for uninitialized variables
-0.164978	up. The two summation variables
-0.444194	jump by copying the return
-0.237818	it can overwrite the return
-1.021904	for an explanation of return
-0.355905	using exceptions is to return
-1.623867	It is recommended to return
-0.237480	jump from a=a*2; to return
-0.102630	overlap the call and return
-0.102630	overhead of call and return
-0.510173	object with new and return
-0.237797	Function return types The return
-0.237796	detects an error can return
-0.292714	{ return 0; // return
-0.292714	x^8 // x^10 // return
-0.236588	nn ifbit=1 bitofn // return
-0.523620	class. Make the function return
-0.536756	or as a function return
-0.236338	used for storing function return
-0.314542	function. The function may return
-0.484133	lookup } else { return
-0.484133	69 } else { return
-0.121659	(float const x) { return
-0.200187	(double const x) { return
-0.198233	MultiplyBy (int x) { return
-0.013364	parabola (float x) { return
-0.072690	double p(double x) { return
-0.072690	double xpow10(double x) { return
-0.138299	int Func1(int x) { return
-0.138299	double Func2(double x) { return
-0.316854	const & b) { return
-0.031784	const * p) { return
-0.533184	} if (b) { return
-0.091693	const & a) { return
-0.091693	square (float a) { return
-0.208073	x, int m) { return
-0.208073	} int Size() { return
-0.324886	The library function will return
-0.344930	n >>= 1; } return
-0.598184	a * 3; } return
-0.232968	// n factorial } return
-0.232968	next four x^n } return
-0.226852	call the chosen version return
-0.098779	go to dispatched version return
-0.098779	continue in dispatched version return
-0.226852	} // Default version return
-1.687214	a power of 2 return
-0.320905	- 8*x + 2 return
-0.236229	} // No error return
-0.292243	no side-effects and its return
-0.236158	up then it must return
-0.284435	ebx from stack ; return
-0.689579	; unused label ; return
-0.236017	i++) f *= i; return
-0.098727	............................................................................................... 50 7.16 Function return
-0.098727	operating systems". 7.16 Function return
-0.516185	{ // SSE2 supported return
-0.516185	{ // AVX supported return
-0.098035	= CriticalFunction(b, c); ... return
-0.098035	= (*CriticalFunction)(b, c); ... return
-0.483364	= b + 1; return
-0.416405	a function should never return
-0.328591	= a * 2; return
-0.317879	= a * 3; return
-0.224754	without using the normal return
-0.222243	dword ptr n; #endif return
-0.222200	other error reporting here: return
-0.165014	double x10 = x8*x2; return
-0.165014	N1 (N & (N-1)) return
-0.165014	0.3, -2.0, 4.4, 2.5}; return
-0.165014	checking (see page 134) return
-0.165014	delays execution by causing return
-0.165014	dummy[0]; clock = __rdtsc(); return
-0.165014	s = _mm_hadd_ps(s, s); return
-0.683701	the clock frequency is 2
-0.354576	250 μs on a 2
-0.436906	floats exp function of 2
-0.065127	is a power of 2
-0.056638	be a power of 2
-0.017659	by a power of 2
-0.055288	matrix a power of 2
-0.055288	been a power of 2
-0.055288	columns a power of 2
-0.055288	N a power of 2
-0.106795	only for powers of 2
-0.016073	of using powers of 2
-0.032764	preferably using powers of 2
-0.106795	means avoid powers of 2
-0.538415	can be reduced to 2
-0.237682	of 256 Kbytes to 2
-0.355581	of b will be 2
-0.236599	short int a; // 2
-0.406200	b; int d; // 2
-0.236599	byte at 13 // 2
-0.444059	reduce int x = 2
-0.418821	} The multiplication by 2
-0.292320	right = divide by 2
-0.236241	int before dividing by 2
-0.346319	takes typically 0 - 2
-0.351802	program uses more than 2
-0.344383	for arrays bigger than 2
-0.537492	8 bytes = double 2
-0.545105	c = a + 2
-0.053908	0 ? c + 2
-0.228845	2.5*x^2 - 8*x + 2
-0.237010	v; if (u.i * 2
-0.233042	signed or unsigned 2 2
-0.288681	64 Iu16vec4 32 2 2
-0.236882	used. It takes between 2
-0.287866	to execution time. 4 2
-0.232325	of optimizing ............................................................................................... 4 2
-0.680592	int, signed or unsigned 2
-0.291363	128 SSE double 64 2
-0.445014	SSE2 long long 64 2
-0.291363	4 32 4 64 2
-0.218242	128 I64vec2 Vec2q 64 2
-0.218242	128 Iu32vec4 Vec4ui 64 2
-0.323006	64 MMX int 32 2
-0.230749	int 64 Iu16vec4 32 2
-0.235682	b * 5 / 2
-0.322255	+= 2;} // add 2
-0.135849	#endif // INSTRSET == 2
-0.062634	} #if INSTRSET == 2
-0.062634	set #if INSTRSET == 2
-0.231268	{ if (i % 2
-0.230732	at an address below 2
-0.171872	2exponent 16383 one fraction 2
-0.077489	2exponent 127 1 fraction 2
-0.077489	2exponent 1023 1 fraction 2
-0.019760	x*x*x*x*x*x*x*x = ((x2) 2) 2
-0.516577	8 bytes = int64_t 2
-0.018513	+ i); // Add 2
-0.079307	128 bytes Intel Core 2
-0.079307	aligned operands Intel Core 2
-0.079307	unaligned op. Intel Core 2
-0.212160	13.3 Difficult cases........................................................................................................ 124 2
-0.165040	and data can exceed 2
-0.237668	provokes an error. // You
-0.237495	cpuid // Read time You
-0.237465	a); } 111 } You
-0.488205	files for intrinsic functions You
-0.927520	Still faster if unsigned You
-0.236462	of the instruction code. You
-0.467574	kilobytes at a time. You
-0.307077	a lot of time. You
-0.337773	meaning for member functions. You
-0.235040	amount of memory used. You
-0.234867	= a ^ 1; You
-0.320929	smaller and more efficient. You
-0.329123	which is less efficient. You
-0.234190	follow the guidelines below. You
-1.134936	the function is called. You
-0.669668	option in the compiler. You
-0.288688	need the 'this' pointer. You
-0.231542	in two different registers. You
-0.281371	save several clock cycles. You
-0.281371	approximately 500 clock cycles. You
-0.289257	to use vector operations. You
-0.208163	than floating point operations. You
-0.306514	is 95 not needed. You
-0.231048	of the base classes. You
-0.231085	before terminating a thread. You
-0.410937	even for double precision. You
-0.230464	in a graceful way. You
-0.524108	of elements per vector. You
-0.229675	are impossible with references. You
-0.229810	allowed and which not. You
-0.748733	use dynamic memory allocation. You
-0.302621	all elements to zero. You
-0.282655	architecture of the software. You
-0.458891	reduction in this case. You
-0.226356	is rarely needed anyway. You
-0.467765	address divisible by 16. You
-0.298983	good for the application. You
-0.224672	a few pitfalls here. You
-0.224598	bits of the result. You
-0.512647	of the preceding one. You
-0.595566	ranges do not overlap. You
-0.176183	a and b overlap. You
-0.167758	between different CPU cores. You
-0.167758	between multiple CPU cores. You
-0.221970	approach to error handling. You
-0.221970	to use denormal numbers. You
-0.507562	scope of this manual. You
-0.302176	accessible from other modules. You
-0.221970	do something about them. You
-0.221970	into the program itself. You
-0.218279	functions is not expensive. You
-0.211906	order of Boolean operands. You
-0.211906	only for positive n. You
-0.211906	<< 4) + a. You
-0.250913	trace with a debugger. You
-0.199646	during execution of CriticalFunction. You
-0.250913	example 7.35 page 52. You
-0.250913	the algorithm in question. You
-0.199646	a particular processor model. You
-0.250913	on different test examples. You
-0.250913	it cannot be shared. You
-0.199646	want to 155 test. You
-0.465454	sets are mutually incompatible. You
-0.465454	explained on page 72. You
-0.199646	operating systems available today. You
-0.199646	programming questions to me. You
-0.164813	vector element level 108 You
-0.164813	is referencing it twice. You
-0.164813	154 // Print heading You
-0.164813	option -fwrapv or -fno-strict-overflow. You
-0.164813	sets the variable __intel_cpu_feature_indicator_x. You
-0.164813	to prevent cache contention. You
-0.164813	these directives are compiler-specific. You
-0.164813	32 bits (rarely 64). You
-0.164813	operations are not used). You
-0.164813	cache effects into account. You
-0.164813	it is too late. You
-0.164813	of the optimization job. You
-0.164813	expects a GOT entry. You
-0.164813	in a bad dilemma. You
-0.164813	project window or makefile. You
-0.501559	the copy of the table
-0.356454	The declaration of the table
-0.357569	libraries. Numbers in the table
-1.087408	makes sure that the table
-0.356940	very fast if the table
-0.651457	more time than the table
-0.441186	takes to calculate the table
-0.276746	care to calculate the table
-0.276746	convenient to calculate the table
-0.276746	safer to calculate the table
-0.341950	compiler will store the table
-0.406116	useful to copy the table
-0.406116	if you expect the table
-0.292662	recommended to declare the table
-0.380930	done by declaring the table
-0.292662	code that copies the table
-0.236542	the function. Copying the table
-1.068541	a pointer to a table
-0.349664	table (PLT) and a table
-0.312246	replace it by a table
-0.224996	the branch by a table
-0.224996	predictable branch by a table
-0.798256	be replaced by a table
-0.354027	function call with a table
-0.973612	be implemented as a table
-0.062758	version (May use a table
-0.408397	a value from a table
-0.314945	to read from a table
-0.350348	shared object has a table
-0.382797	undetected. The principle of table
-0.498925	burdensome position-independent code and table
-0.237565	complicated address calculation and table
-1.241195	number of elements in table
-0.329845	My experimental results in table
-0.312782	the ones mentioned in table
-0.191497	results are listed in table
-0.191497	suffixes are listed in table
-0.187370	the instructions listed in table
-0.236271	results are summarized in table
-0.323493	cycles per element. The table
-0.236373	in table 8.1. The table
-0.236373	my test examples. The table
-0.236373	(see p. 104). The table
-0.236373	on page 134. The table
-0.356207	coef[16] = { // table
-0.237710	that rely heavily on table
-0.353964	The examples in this table
-0.357361	8.23a. Loop to make table
-0.497315	function Size of each table
-0.292632	object. Obviously, all these table
-0.331220	in table The following table
-0.428672	or unsigned. The following table
-0.235973	offset table (GOT). These table
-0.319101	can bypass the virtual table
-0.243326	pointer to a virtual table
-0.243326	up in a virtual table
-0.214066	functions. This so-called virtual table
-0.174473	function with a lookup table
-0.346397	to use a lookup table
-0.174473	implementation uses a lookup table
-0.232629	on page 132. Unfortunately, table
-0.092610	Intrinsic functions for vectorized table
-0.092610	be used for vectorized table
-0.091825	in the global offset table
-0.091825	variables called global offset table
-0.231786	element level 9. Avoid table
-0.286020	small x // align table
-0.219583	programmers use a hash table
-0.172202	of data. A hash table
-0.172202	fast enough. A hash table
-0.028923	in the procedure linkage table
-0.113764	in a procedure linkage table
-0.048396	uses a procedure linkage table
-0.028923	functions, called procedure linkage table
-0.028923	an ordinary procedure linkage table
-0.222285	in a hand- written table
-0.218489	x); } 112 Vectorized table
-0.218449	per element. 100 As table
-0.035762	pointer in an import table
-0.035762	goes through an import table
-0.357371	keep track of the performance
-0.356100	limiting factors for the performance
-0.355028	been unsatisfied with the performance
-0.349115	using hyperthreading, but the performance
-0.351798	tool for using the performance
-0.947424	in cases where the performance
-0.352340	of times before the performance
-0.455442	relevant to test the performance
-0.344119	to set up the performance
-0.543804	full information about the performance
-0.323571	Trying to read the performance
-0.150298	possible to improve the performance
-0.129319	You can improve the performance
-0.129319	systems can improve the performance
-0.093205	that may improve the performance
-0.093205	This may improve the performance
-0.093205	you may improve the performance
-0.418340	switches can reduce the performance
-0.291961	optimize, and reading the performance
-0.312370	operations without reducing the performance
-0.291961	In this case, the performance
-0.235926	want to compare the performance
-0.235926	process can influence the performance
-0.235926	programming without paying the performance
-0.569013	support. There is a performance
-0.314467	be to include a performance
-0.347062	its own set of performance
-0.325150	when the optimization of performance
-0.237481	a dramatic degradation of performance
-0.865722	is no difference in performance
-0.268533	no big difference in performance
-0.237267	priority. The gain in performance
-0.237267	substantial. This gain in performance
-0.236794	a considerable improvement in performance
-0.427574	it less efficient. The performance
-0.323459	on Intel processors. The performance
-0.329938	brands of CPUs. The performance
-0.292439	investigating performance problems. The performance
-0.236346	for branch mispredictions. The performance
-0.314597	debugging if required for performance
-0.237684	causes technical problems or performance
-0.237687	recommendation for good code performance
-0.283062	read one or more performance
-0.283062	enable one or more performance
-0.331642	develop- ment time when performance
-0.237468	performance monitor counters. A performance
-0.329194	in terms of program performance
-0.235746	testing and analyzing program performance
-0.525318	functions. There is no performance
-0.525318	parameter. There is no performance
-0.324186	there is hardly any performance
-0.313453	I believe that software performance
-0.313352	built-in test feature called performance
-0.236533	(www.agner.org/optimize/testp.zip). A particularly useful performance
-0.236366	services under advanced system performance
-0.236833	it calls. The best performance
-0.236833	operating system. The best performance
-0.236833	other compilers). The best performance
-0.440848	to get a good performance
-0.313367	libraries have very good performance
-0.235094	ms by selecting optimize performance
-0.234825	16.3 Worst-case testing Most performance
-0.097075	speed.............................................................................................................. 153 16.1 Using performance
-0.097075	(see below) 16.1 Using performance
-0.335859	(typically 64) can improve performance
-0.005644	This library has reduced performance
-0.230667	systems give almost identical performance
-0.322573	best way to identify performance
-0.224743	reasonably well. Very poor performance
-0.222234	be considered. A realistic performance
-0.272059	clock cycle. The highest performance
-0.218385	There is no 51 performance
-0.212193	The fallacy of measuring performance
-0.212119	Microsoft Table 2.1. Comparing performance
-0.199853	that there are inherent performance
-0.251146	and measuring the overall performance
-0.165004	of microprocessor The benchmark performance
-0.165004	very useful for investigating performance
-0.165004	lazy binding definitely degrades performance
-0.237913	risk of activating the very
-0.569218	expensive that it is very
-0.546019	expandable, but it is very
-0.452628	final program, it is very
-0.353700	script. Interpreted code is very
-0.355784	operating system. This is very
-0.356854	these data. It is very
-0.324447	point library which is very
-0.324447	the stack, which is very
-0.420211	& operation, which is very
-0.469736	if the library is very
-0.354880	free are: There is very
-0.601002	number of registers is very
-0.328494	clock cycle counter is very
-0.454209	but the syntax is very
-0.235181	table of constants is very
-0.291114	choice of algorithm is very
-0.291114	the loop body is very
-0.480644	Studio This is a very
-0.480644	if. This is a very
-0.515910	Gnu compiler is a very
-0.444053	code itself is a very
-0.493539	b because of a very
-0.350697	the disadvantage of a very
-0.355817	functions, but in a very
-0.351874	except perhaps for a very
-0.368718	This can be a very
-0.368718	chip can be a very
-0.343300	of integers with a very
-0.343300	optimizations. Loops with a very
-0.353510	be interpreted as a very
-0.491737	cache. This has a very
-0.425726	an integer takes a very
-0.328093	measurements may require a very
-0.234858	seen, is certainly a very
-0.234858	8.21 is indeed a very
-0.615647	does not apply to very
-0.574811	be very long and very
-0.237555	flexible, well tested, and very
-0.484427	be used only for very
-0.293262	impossible to work for very
-0.237070	times 24 dramatically for very
-0.354710	is going to be very
-0.496968	does It can be very
-0.350261	sorted list can be very
-0.350261	These counters can be very
-0.642704	dependency chains can be very
-0.478897	resulting code will be very
-0.340109	functionality. This will be very
-0.354729	the program that are very
-0.342724	and these operations are very
-0.334269	brands of microprocessors are very
-0.235979	together Cache misses are very
-0.235979	with out-of-order capabilities are very
-0.237704	processor performs better on very
-0.237674	the same thread as very
-0.459089	Current compilers are not very
-0.237643	not computationally intensive may very
-0.321928	All these libraries have very
-0.235093	The AVX instructions have very
-0.311377	is because computers have very
-0.237498	is particularly critical. A very
-0.340909	many features, but also very
-0.349346	(but not in some very
-0.288960	is supported by some very
-0.351677	in registers are accessed very
-0.344333	of making the arrays very
-0.624405	then you can get very
-0.550829	which makes data caching very
-0.311405	fixed-size array is made very
-0.215186	smart and other things very
-0.215186	compiler does some things very
-0.233310	of this bookkeeping depends very
-0.308770	the code can become very
-0.281423	Integer operations are generally very
-0.222307	write expressions like -(-a) very
-0.222236	by unit-testing is unfortunately very
-0.218476	code is executed. Optimizes very
-0.212165	to be obsolete. Programmers very
-0.165045	example 7.43b is admittedly very
-1.029079	different parts of the software
-0.356716	and systematization of the software
-0.586913	language is that the software
-0.502692	programmed. But if the software
-0.099510	at the time the software
-0.459102	best compilers use the software
-0.355711	was coded. If the software
-0.335070	double precision. But the software
-0.335070	This may cause the software
-0.293127	requirements of optimizing the software
-0.236951	you may view the software
-0.738205	advantage of using a software
-0.405956	The difference between a software
-0.236434	how to test a software
-0.236434	x86 CPUs. However, a software
-0.292539	is to choose a software
-0.236434	whether to base a software
-0.572376	takes to install a software
-0.292539	not traditionally considered a software
-0.236434	has to reinstall a software
-0.563386	if a piece of software
-0.471666	on the costs of software
-0.236530	the advanced principles of software
-0.380913	formalism. The splitting of software
-0.236530	updates. Automatic updating of software
-0.236530	Security. The vulnerability of software
-0.236530	with a lineage of software
-0.236530	draw the attention of software
-0.336232	all respects relevant to software
-0.237521	software development process and software
-0.314273	for advanced programmers and software
-0.237595	for this shift in software
-0.237595	or class separately in software
-0.324597	waste of time for software
-0.237043	It is common for software
-0.537140	is not uncommon for software
-0.236993	writing style are that software
-0.324250	important usability problems that software
-0.381559	and I believe that software
-0.538414	time is wasted on software
-0.237540	things to test when software
-0.293704	the 64-bit systems. A software
-1.100070	is possible to make software
-1.432616	in order to make software
-0.314076	considerable debate about which software
-0.345890	often requires that all software
-0.313953	Remember again, that most software
-0.236970	4. Even worse, many software
-0.340671	identical performance for 32-bit software
-0.232836	sourcebook for fast 32-bit software
-0.352441	Before starting a new software
-0.336734	a means of making software
-0.229851	are satisfied with making software
-0.236205	iterations of redesign. Some software
-0.236151	database, and other extra software
-0.236004	to come. Even big software
-0.291749	with a well optimized software
-0.235219	user. Compatibility problems. All software
-0.344207	operation in the application software
-0.234729	want to make their software
-0.205084	3.4 Automatic updates Many software
-0.205084	3.9 Other databases Many software
-0.205084	workday or more. Many software
-0.233822	in the Microsoft platform software
-0.233922	with the ever bigger software
-0.249360	have put the whole software
-0.249360	mispredictions. Test the whole software
-0.090900	five manuals: 1. Optimizing software
-0.407866	code in a typical software
-0.276345	a piece of CPU-intensive software
-0.222219	the goal of 18 software
-0.272107	high priority of structured software
-0.199836	a lot of irrelevant software
-0.199836	to many hard working software
-0.199836	usability A better performing software
-0.199836	on a computer. Security software
-0.164988	choose to make memory-hungry software
-0.164988	tools for supporting multi-threaded software
-0.246389	are stored in the order
-0.246389	usually stored in the order
-0.352319	are accessed in the order
-0.352319	stored consecutively in the order
-0.586746	argument is that the order
-0.406524	You can check the order
-0.236818	order is usually the order
-0.338059	if we change the order
-0.122238	Do not swap the order
-0.052088	you cannot swap the order
-0.236818	careful when swapping the order
-0.313434	loop. This reflects the order
-0.236818	done by controlling the order
-0.454628	execute instructions out of order
-0.048331	needed. 11 Out of order
-0.048331	103 11 Out of order
-0.310501	should disable it in order
-0.551645	to the code in order
-0.284496	set of data in order
-0.284496	on arranging data in order
-0.478136	for each other in order
-0.305032	highest instruction set in order
-0.300221	cache line size in order
-0.225715	bit of i in order
-0.337889	floating point variables in order
-0.558048	power of 2 in order
-0.337982	a non-sequential order in order
-0.341577	C++ language elements in order
-0.225715	be declared const in order
-0.225715	divisible by 8 in order
-0.225715	a to unsigned in order
-0.280362	kinds of operations in order
-0.089990	package several times in order
-0.089990	alternatingly several times in order
-0.225715	made very big in order
-0.339863	each array element in order
-0.300221	contains debug information in order
-0.322795	to round addresses in order
-0.225715	in the end in order
-0.319781	longer than needed in order
-0.319781	be joined together in order
-0.225715	instead of bool in order
-0.305032	under worst-case conditions in order
-0.225715	to the right in order
-0.225715	number of cores in order
-0.225715	of user input in order
-0.225715	into multiple blocks in order
-0.225715	load is low in order
-0.280362	several different algorithms in order
-0.225715	the specific purpose in order
-0.225715	is calling itself in order
-0.225715	certain programming principles in order
-0.225715	the software package in order
-0.225715	than there is, in order
-0.225715	lot of bookkeeping in order
-0.225715	all disturbing influences in order
-0.225715	to do experiments in order
-0.225715	software develop- ment in order
-0.225715	degree of randomness in order
-0.225715	is then de-referenced in order
-0.225715	the planning phase in order
-0.225715	offset table (GOT) in order
-0.225715	FuncCol(i)) * sizeof(float) in order
-0.237128	the template parameter. The order
-0.237128	integers. 7.5 Booleans The order
-0.237128	(see page 51). The order
-0.308868	= 2.0; } In order
-0.230513	calculation of B. In order
-0.230513	of course system-specific. In order
-0.405416	elements have no specific order
-0.289497	Fortran where the storage order
-0.289523	linked together. The link order
-0.518882	indexed in a non-sequential order
-0.218525	are accessed in sequential order
-0.347429	elements have a natural order
-0.251297	accessed sequentially. The opposite order
-0.165128	fully utilizing its out-of- order
-0.165128	principle for a 2'nd order
-0.463106	representation, except in the long
-0.556338	just because it is long
-0.715945	if the loop is long
-0.356080	a sum of a long
-0.493022	list[i]; This has a long
-0.510508	disadvantage of using a long
-0.736337	the situation where a long
-0.236177	task that takes a long
-0.312669	to integer takes a long
-0.236626	branches may take a long
-0.236626	and logarithms take a long
-0.036935	can take quite a long
-0.235728	mispredicted, which causes a long
-0.291735	thousand numbers. With a long
-0.235728	of calculations forms a long
-0.048332	of float, double and long
-0.048332	between float, double and long
-0.237681	misses, branch misprediction, or long
-0.443284	calculations are done with long
-0.236889	loop-carried dependency chains with long
-0.287006	no extra time as long
-0.526280	of the program as long
-0.231568	optimizations automatically, but as long
-0.309710	for multiple variables as long
-0.231568	takes six times as long
-0.231568	floating point calculations as long
-0.231568	is not significant as long
-0.231568	and 64-bit integers, as long
-0.348832	be noticeable but not long
-0.237566	stack. These registers have long
-0.237544	time to calculate when long
-0.237465	the first sub-vector. A long
-0.747222	and there are no long
-0.313550	a list of some long
-0.344101	response time is so long
-0.339302	integer takes a very long
-0.304585	to work for very long
-0.259385	going to be very long
-0.418517	chains can be very long
-0.272549	232-1 uint32_t unsigned long long
-0.202290	4 128 SSE2 long long
-0.202290	10; int i; long long
-0.202290	8 256 AVX2 long long
-0.202290	2 64 MMX long long
-0.202290	16 512 AVX512 long long
-0.202290	long long time1; long long
-0.202290	-231 231-1 int32_t long long
-0.202290	16.1 #include <intrin.h> long long
-0.202290	volatile int DontSkip; long long
-0.334769	4 double 8 8 long
-0.518663	in 16-bit systems: unsigned long
-0.227687	__int64 64-bit Linux: unsigned long
-0.227687	0 232-1 uint32_t unsigned long
-0.292804	data and measure how long
-0.330065	32 4 128 SSE2 long
-0.323505	thing is to avoid long
-0.323505	you have to avoid long
-0.354359	= 10; int i; long
-0.290168	the program takes too long
-0.320360	32 8 256 AVX2 long
-0.319926	This delay is just long
-0.231725	on page 22. Avoid long
-0.302911	resolve any branch misprediction long
-0.311541	32 2 64 MMX long
-0.903229	int in 16-bit systems: long
-0.560162	32 16 512 AVX512 long
-0.069320	frequent cause of unacceptably long
-0.069320	still frustrated by unacceptably long
-0.069320	framework sometimes have unacceptably long
-0.069320	user might experience unacceptably long
-0.444337	of vector math libraries: long
-0.264969	int64_t 29 64-bit Linux: long
-0.251140	favorable: Larger data types: long
-0.199847	i; long long time1; long
-0.199847	this would give annoyingly long
-0.164998	32 -231 231-1 int32_t long
-0.164998	Example 16.1 #include <intrin.h> long
-0.164998	dummy[4]; volatile int DontSkip; long
-0.237757	PC's and mainframes, and between
-0.382759	uses the cache in between
-0.237642	a multithreaded program, or between
-0.454270	evicted from the cache between
-0.237170	many elements are there between
-0.313626	is used. It takes between
-0.101418	no difference in performance between
-0.101418	big difference in performance between
-0.436408	are 6 unused bytes between
-0.014746	no difference in speed between
-0.195179	variable that is shared between
-0.195179	memory address and shared between
-0.195179	read-only can be shared between
-0.195179	variables that are shared between
-0.195179	section is not shared between
-0.333053	conversion time is typically between
-0.321372	positive result. The conversion between
-0.044482	compensate for the difference between
-0.044482	consumption as the difference between
-0.021672	x; Note the difference between
-0.021672	Day. Note the difference between
-0.044482	example illustrates the difference between
-0.044482	fast. Calculating the difference between
-0.196663	and FPGAs. The difference between
-0.488356	There is no difference between
-0.136930	only a minimal difference between
-0.289709	a third-party graphics framework between
-0.320188	Use mask to choose between
-0.321280	switch is a switch between
-0.213681	you don't need conversions between
-0.213681	page 140. Avoid conversions between
-0.329483	compilers offer the choice between
-0.229945	RISC and CISC processors, between
-0.227923	for threads that jump between
-0.067966	10) { ... Conversions between
-0.067966	point precision conversion Conversions between
-0.067966	long double precision. Conversions between
-0.067966	set is enabled. Conversions between
-0.032655	another platform. 14.8 Conversions between
-0.032655	double..................................................................................... 140 14.8 Conversions between
-0.426344	the code is distributed between
-0.176382	be needed for communication between
-0.132808	cache is that communication between
-0.132808	amount of necessary communication between
-0.276352	Unpredictable branches that select between
-0.222177	modern microprocessors is split between
-0.074708	language is a compromise between
-0.074708	must be a compromise between
-0.222177	instruction set has nothing between
-0.222118	if a thread jumps between
-0.010126	a branch that chooses between
-0.088504	where a program chooses between
-0.147283	You have to distinguish between
-0.147283	is important to distinguish between
-0.212050	delays if the distance between
-0.199785	is not optimized. Jumps between
-0.199785	a single function. Switch between
-0.017502	for communication and synchronization between
-0.017502	because communication and synchronization between
-0.074687	compiler makes a distinction between
-0.074687	is an important distinction between
-0.164942	of data. The similarity between
-0.164942	make a sensible balance between
-0.164942	resources, and the transitions between
-0.164942	divide the work evenly between
-0.164942	to divide the workload between
-0.164942	of synchronizing and communicating between
-0.164942	is no clear correspondence between
-0.164942	be predicted perfectly varies between
-0.164942	No differences were observed between
-0.164942	the same without discriminating between
-0.164942	to be. The distinctions between
-0.164942	order to facilitate porting between
-0.164942	function, one that discriminates between
-0.164942	in a multitasking environment, between
-0.164942	necessary functions for distinguishing between
-0.763555	16 bits of the 32-bit
-0.357645	mentioned above for the 32-bit
-0.356027	is better than the 32-bit
-0.357334	each bit of a 32-bit
-0.355817	be reached with a 32-bit
-0.345917	the offset as a 32-bit
-0.634226	be expressed as a 32-bit
-0.438993	between the efficiency of 32-bit
-0.531985	results when applied to 32-bit
-0.547562	and 64-bit Windows and 32-bit
-0.299145	for 32-bit Windows and 32-bit
-0.449520	between 32-bit Linux and 32-bit
-0.325506	64-bit systems and in 32-bit
-0.445175	64-bit mode than in 32-bit
-0.057226	point register variables in 32-bit
-0.307070	function calls faster in 32-bit
-0.286903	function calling method in 32-bit
-0.431719	is 32 bits in 32-bit
-0.340440	point registers available in 32-bit
-0.036485	on the stack in 32-bit
-0.317516	is 4 bytes in 32-bit
-0.317516	or 64-bit integers in 32-bit
-0.398652	Using double precision in 32-bit
-0.231477	variables is eight in 32-bit
-0.423536	self- relative addresses in 32-bit
-0.333653	operating system running in 32-bit
-0.317516	of self-relative references in 32-bit
-0.120178	is inefficient, especially in 32-bit
-0.120178	scarce resource, especially in 32-bit
-0.120178	than relocation, especially in 32-bit
-0.231477	a scarce resource in 32-bit
-0.231477	for general purposes in 32-bit
-0.373883	compiling without -fpic in 32-bit
-0.231477	is approximately six in 32-bit
-0.231477	Sum2 and Sum3 in 32-bit
-0.269141	open source compiler for 32-bit
-0.269141	A commercial compiler for 32-bit
-0.269141	a cheap compiler for 32-bit
-0.327621	almost identical performance for 32-bit
-0.234477	80 clock cycles for 32-bit
-0.508978	function is intended for 32-bit
-0.509526	-fno-pic when compiling for 32-bit
-0.234477	the optimization capabilities for 32-bit
-0.234477	OS X Compilers for 32-bit
-0.234477	make separate executables for 32-bit
-0.714506	a and b are 32-bit
-0.293387	"memory" ); #else // 32-bit
-0.237179	defined(__unix__) || defined(__GNUC__) // 32-bit
-0.356020	of vector, such as 32-bit
-0.354365	a little faster than 32-bit
-0.655772	is inefficient to use 32-bit
-0.101987	Microsoft compiler. Supports only 32-bit
-0.101987	instruction sets. Supports only 32-bit
-0.237385	with 64 bits, but 32-bit
-0.289801	fact represented as two 32-bit
-0.289801	integer rather than two 32-bit
-0.237091	supports self-relative addressing. In 32-bit
-0.528902	difference in performance between 32-bit
-0.339455	is no difference between 32-bit
-0.099846	Gnu 32-bit -fno-builtin Gnu 32-bit
-0.099846	64 bit -fno-builtin Gnu 32-bit
-0.235657	The compiler sometimes uses 32-bit
-0.235614	The Gnu libraries support 32-bit
-0.328450	A sourcebook for fast 32-bit
-0.235112	systems. Today (2013) both 32-bit
-0.234775	are inferior to their 32-bit
-0.234054	for 64-bit integers. Many 32-bit
-0.233170	make utility. It supports 32-bit
-0.213770	others are not. Supports 32-bit
-0.213770	produce binary code). Supports 32-bit
-0.232611	for many platforms, including 32-bit
-0.231806	both Windows and Linux, 32-bit
-0.281380	way as in Linux. 32-bit
-0.456144	8 pointer or reference, 32-bit
-0.330627	and Mac OS X, 32-bit
-0.165035	run in both 16-bit, 32-bit
-0.525356	a space in the branch
-0.523296	buffer. Contentions in the branch
-0.461791	the performance if the branch
-0.356414	be predicted by the branch
-0.172275	This is called the branch
-0.172275	special cache called the branch
-0.501570	advantageous to replace the branch
-0.237217	on how predictable the branch
-0.293430	and it avoids the branch
-0.237877	that is used is branch
-0.460544	worst case is a branch
-0.355239	convert it to a branch
-0.354410	specialization, not with a branch
-0.348091	to recover from a branch
-0.292522	the code has a branch
-0.292522	The code has a branch
-0.292147	predict which way a branch
-0.549214	constants. For example, a branch
-0.333991	can automatically replace a branch
-0.329620	either way. Such a branch
-0.236090	whenever it feeds a branch
-0.322987	for an explanation of branch
-0.441312	is a kind of branch
-0.341023	times each function and branch
-0.048291	of cache misses and branch
-0.048291	code, cache misses and branch
-0.293681	eight 16-bit integers. The branch
-0.314173	can be critical. The branch
-0.354998	The algorithms used for branch
-0.237429	get reliable results for branch
-0.237819	past history of that branch
-0.313699	is how the if branch
-0.293229	Pentium 4. The if branch
-0.102493	branch // Loop with branch
-0.102493	Example 12.4a. Loop with branch
-0.293973	gives more details on branch
-0.313223	works correctly. A code branch
-0.235975	list of which code branch
-0.235975	and run any code branch
-0.312478	1; } } A branch
-0.227340	a = b; A branch
-0.227340	the other way. A branch
-0.227340	on every call. A branch
-0.227340	well, of course. A branch
-0.227340	and VIA CPUs". A branch
-0.227340	are often mispredicted. A branch
-0.227340	when it changes. A branch
-0.458888	flag then the loop branch
-0.345633	overlapping calculations. The loop branch
-0.805363	Smaller microcontrollers have no branch
-0.236993	expression to generate many branch
-0.457297	model the best possible branch
-0.236951	time and resolve any branch
-0.634742	no need to take branch
-0.599395	to make a new branch
-0.327635	and maintaining a new branch
-0.236183	See page 43 about branch
-0.540872	joined into a single branch
-0.346384	the BTB can cause branch
-0.344694	most cases, the optimal branch
-0.234981	CPU that each particular branch
-0.182789	if the loop control branch
-0.182789	then the loop control branch
-0.182789	where the loop control branch
-0.182789	execute the loop control branch
-0.166447	The i<20 loop control branch
-0.332080	function then the dispatch branch
-0.279319	replace a poorly predictable branch
-0.224760	often suffer from poor branch
-0.218403	data cache, code cache, branch
-0.264995	pipeline. If the wrong branch
-0.015529	goes to cache misses, branch
-0.015529	has most cache misses, branch
-0.015529	instructions executed, cache misses, branch
-0.199869	_mm_clflush intrinsic function. Provoke branch
-0.199869	jumps Eliminate branches Remove branch
-0.165019	cache, branch target buffer, branch
-0.356921	> -b to a <
-0.023424	(x = 0; x <
-0.126141	$B1$2 label if i <
-0.000006	i = 0; i <
-0.000003	(i = 0; i <
-0.169018	for ( ; i <
-0.126141	the loop condition i <
-0.126141	The two comparisons i <
-0.206997	be reversed if c <
-0.002580	(c = 0; c <
-0.300457	list[size]; ... if (i <
-0.225914	float list[ARRAYSIZE]; if (i <
-0.218053	after exceptions: while (i <
-0.234436	interval 0 <= n <
-0.019407	(r = 0; r <
-0.019407	(r = 1; r <
-0.232265	(temp = &list[0]; temp <
-0.230000	(row = 0; row <
-0.191005	(c2 = r1; c2 <
-0.191005	(c2 = c1; c2 <
-0.228081	(j = 0; j <
-0.224855	(column = 0; column <
-0.224831	(c1 = 0; c1 <
-0.218473	Now 1.0 <= u.f <
-0.212206	(r1 = 0; r1 <
-0.122610	(r2 = r1; r2 <
-0.122610	(r2 = r1+1; r2 <
-0.330695	i; if ((unsigned int)i <
-0.199937	a*b - n.a. !(a <
-0.017515	}; if ((unsigned int)n <
-0.017515	479001600}; if ((unsigned int)n <
-0.165081	are certain that u <
-0.165081	} u; if (u.i[1] <
-0.165081	= 0; while (seconds <
-0.165081	would be while (0 <
-1.453082	the address of the member
-0.522521	complicated implementation of the member
-0.653408	the offset of the member
-0.357530	the class that the member
-0.357615	so (i.e. if the member
-0.445398	convenient to have the member
-0.356137	source file. If the member
-0.523568	const function that is member
-0.349651	the object, and a member
-0.352274	function could be a member
-0.705375	a pointer or a member
-0.099186	make the function a member
-0.196411	as efficient as a member
-0.334714	implemented internally as a member
-0.340846	C++0x support. Make a member
-0.418056	efficient than accessing a member
-0.312146	more compact. Accessing a member
-0.291748	the class. Calling a member
-0.235739	You can force a member
-0.435026	most complicated implementation of member
-0.325248	use of classes and member
-0.212076	The this pointer in member
-0.212076	implicit 'this' pointer in member
-0.237804	See page 52. The member
-0.237805	a different meaning for member
-0.352401	a variable, pointer or member
-0.537202	non-static data members or member
-0.313536	it doesn't work with member
-0.236904	can cause complications with member
-0.046970	offset of a data member
-0.046970	simple cases, a data member
-0.046970	for accessing a data member
-0.046970	structures. Accessing a data member
-0.284291	of a class data member
-0.229177	before the first data member
-0.459636	is recommended to make member
-0.338085	other complications that make member
-0.415353	pointer. You may make member
-0.574070	that use the same member
-0.349630	efficient as any other member
-0.129717	parameters to a class member
-0.129717	applied to a class member
-0.230135	doesn't work for class member
-0.285379	into registers. A class member
-0.237264	data members. But each member
-0.237221	to align its b member
-0.015190	member functions. A static member
-0.064232	the stack. A static member
-0.293136	class (CParent<>) contains any member
-0.340840	to control the way member
-0.313480	points to. A const member
-0.235747	efficient as a virtual member
-0.235747	to call a virtual member
-0.251313	runtime dispatch to virtual member
-0.251313	is obtained with virtual member
-0.200001	at least one virtual member
-0.200001	object has no virtual member
-0.232269	static static static Assume member
-0.014877	faster than a non-static member
-0.136327	data members or non-static member
-0.180274	incurred on all non-static member
-0.241112	can call the polymorphic member
-0.351580	version of a polymorphic member
-0.156821	in 32-bit systems. Virtual member
-0.071370	non-static access. 7.20 Virtual member
-0.071370	(methods)......................................................................... 53 7.20 Virtual member
-0.068009	128 bytes. 7.19 Class member
-0.068009	............................................................................ 51 7.19 Class member
-0.212148	-fp- model fast=2 Simple member
-0.212148	to each other (not member
-0.199881	(CGrandParent) contains any non-polymorphic member
-0.199881	oriented programming are: Non-static member
-0.199881	to call a non-virtual member
-0.165029	use of structures (without member
-0.540201	variables because of the way
-0.142029	difference lies in the way
-0.142029	efficiency lies in the way
-0.354460	classes Programming in the way
-0.356336	not satisfied with the way
-0.567589	application depends on the way
-0.381994	options to control the way
-0.338795	integer types Unfortunately, the way
-0.352993	x. It is a way
-0.838428	If there is a way
-0.809872	the code in a way
-0.064670	the calculation in a way
-0.343927	reorganized in such a way
-0.292958	(page 131) shows a way
-0.293684	exploiting fine-grained parallelism. The way
-0.237440	by physical factors. The way
-0.334504	can define in this way
-0.334504	is measured in this way
-0.529206	bits in a different way
-0.567190	implemented in the same way
-0.342978	BSD work the same way
-0.342978	always goes the same way
-0.342978	to go the same way
-0.417718	metaprogramming is the only way
-0.311882	pointer aliasing. The only way
-0.374620	times and the other way
-0.374620	many times the other way
-0.287505	and rarely the other way
-0.477772	algorithms to predict which way
-0.337716	through pointers in one way
-0.498909	in more than one way
-0.228809	branch that goes one way
-0.228809	example, to go one way
-0.228809	that goes randomly one way
-0.555977	unfortunately there is no way
-0.514739	processors. There is no way
-0.514739	screen. There is no way
-0.331062	allocated resource. The C++ way
-0.458247	can be an efficient way
-0.352856	in a more efficient way
-0.397397	is a very efficient way
-0.293025	is an even faster way
-0.349431	two times the first way
-0.342478	two ways. The first way
-0.373267	This is a useful way
-0.325064	is a very useful way
-0.329851	a time. A simple way
-0.271674	This is the best way
-0.271674	needed. Obviously, the best way
-0.271674	spot. Sometimes, the best way
-0.269926	below shows. The best way
-0.269926	suitable duration. The best way
-0.738814	This is a common way
-0.514023	structure is a good way
-0.328069	have exploited. A good way
-0.235632	whenever it goes another way
-0.321702	of code. The second way
-0.233281	set. The most compatible way
-0.263094	calculations in a safe way
-0.210453	are called. The safe way
-0.316527	compiling module2.cpp. The simplest way
-0.020173	There is no easy way
-0.229057	is fastest. The typical way
-0.178550	most cases, the fastest way
-0.178550	is still the fastest way
-0.224868	list is a convenient way
-0.224868	more primitive, but efficient, way
-0.222293	that for a portable way
-0.637212	CPUs in a suboptimal way
-0.056977	aliasing (/Oa). The easiest way
-0.056977	large delays. The easiest way
-0.165024	more clear and intelligible way
-0.845306	to one of the elements
-0.356775	and b, and the elements
-1.090508	makes sure that the elements
-0.353743	is that if the elements
-0.997119	of 2 if the elements
-0.945343	order in which the elements
-0.335576	28. We take the elements
-0.237230	simply by storing the elements
-0.237230	This function adds the elements
-0.137191	is the number of elements
-0.329100	if the number of elements
-0.477199	by the number of elements
-0.338875	on the number of elements
-0.380261	If the number of elements
-0.446350	available. The number of elements
-0.175867	the total number of elements
-0.274068	the 107 number of elements
-0.274068	lookups Max. number of elements
-0.340501	number and types of elements
-0.043760	element, bits Number of elements
-0.101675	in vector Type of elements
-0.101675	as follows: Type of elements
-0.444200	to the diagonal. The elements
-0.237450	r2 and c2 for elements
-0.237450	are search requests for elements
-0.370805	can be used if elements
-0.259014	may be used if elements
-0.348713	2 applies only when elements
-0.353272	the smaller the data elements
-0.291839	performed on multiple data elements
-0.442569	do operations on all elements
-0.288883	The constructor sets all elements
-0.311028	the array after all elements
-0.339141	checking all the array elements
-0.582172	the addresses of array elements
-0.323737	Induction variables for array elements
-0.226971	access to individual array elements
-0.237027	x?" or "how many elements
-0.236984	does not alias any elements
-0.236520	diagonal and swap these elements
-0.334323	of different C++ language elements
-0.235645	code will read four elements
-0.235647	STL for accessing container elements
-0.261039	diagonal. The first eight elements
-0.208632	line. But these eight elements
-0.208632	we can handle eight elements
-0.208632	because it handles eight elements
-0.322284	sum operator // add elements
-0.346314	use standard user interface elements
-0.233702	An experiment where 10 elements
-0.000283	vector in eight consecutive elements
-0.000071	// Load eight consecutive elements
-0.231320	// Array with N elements
-0.088585	below diagonal // swap elements
-0.088585	swapd(a[r][c], a[c][r]); // swap elements
-0.229123	list causes all subsequent elements
-0.307548	may fail to distinguish elements
-0.218494	row-wise, then the mirror elements
-0.199920	may even add dummy elements
-0.354900	In general, it is faster
-0.766982	static member function is faster
-0.344613	the template function is faster
-0.511670	the time. This is faster
-0.348268	called before. This is faster
-0.701699	longer time. It is faster
-0.449775	to integers. It is faster
-0.347998	files smaller. It is faster
-0.043517	to floating point is faster
-0.232665	power of 2 is faster
-0.410779	language". The method is faster
-0.522925	versions. This method is faster
-0.318964	column. The access is faster
-0.310585	to a file is faster
-0.586292	in this case is faster
-0.001675	by a constant is faster
-0.101325	the software implementation is faster
-0.101325	a software implementation is faster
-0.288253	microprocessor. Integer division is faster
-0.610942	If one operand is faster
-0.288253	writing big blocks is faster
-0.400399	and development tool is faster
-0.232665	in example 15.1c is faster
-0.232665	SSE3. // (This is faster
-0.232665	a constant: Unsigned is faster
-0.232665	in example 14.21 is faster
-0.237881	if this prevents a faster
-1.312392	is likely to be faster
-0.514549	memory. This may be faster
-0.452614	This method may be faster
-0.354157	this multiplication will be faster
-0.344463	because integer operations are faster
-0.344791	inefficient. Linear arrays are faster
-0.555853	one makes the code faster
-0.453263	may make member functions faster
-0.293285	in C and C++ faster
-0.236783	function is 83 called faster
-0.320241	but this is often faster
-0.509769	a[i]; It is often faster
-0.231008	it would be even faster
-0.231008	There is an even faster
-0.329491	speed is many times faster
-0.223579	up to 5 times faster
-0.223579	three to seven times faster
-0.310177	can make function calls faster
-0.402492	frame makes function calls faster
-0.228566	operators are calculated much faster
-0.228566	It is usually much faster
-0.291754	mathematical functions are calculated faster
-0.322177	Therefore, it will run faster
-0.220954	the code will run faster
-0.216800	will make applications run faster
-0.234091	communication between threads becomes faster
-0.341484	then it is usually faster
-0.233711	into a vector goes faster
-0.286683	makes the code execute faster
-0.325313	to run a little faster
-0.229206	calls may run slightly faster
-0.281434	WTL application is generally faster
-0.276432	can often be executed faster
-0.222281	of CPUs is increasing faster
-0.002151	of 2 // Still faster
-0.002151	is faster // Still faster
-0.265041	to make software packages faster
-0.199909	multiplication } // ipow faster
-0.165055	of rounding, but neither faster
-1.129656	recommended to use the const
-0.341461	its value. However, the const
-0.407712	is to remove the const
-0.237621	used for relieving the const
-0.538941	a call to a const
-0.354873	function is by a const
-0.237300	easily optimize away a const
-0.237300	a non-const reference, a const
-0.341775	to make table of const
-0.294135	directives are equivalent to const
-0.314551	for test purposes. The const
-0.352599	A const pointer or const
-0.236360	7.28 class c1 { const
-0.236360	0xC0000091L void MathLoop() { const
-0.378074	it points to. A const
-0.234492	a const reference. A const
-0.234492	(&ArraySize) is taken. A const
-0.237503	_mm_empty(); // EMMS } const
-0.237439	of the table has const
-0.237146	StoreNTD(double * dest, double const
-0.295609	ABC 123 and static const
-0.275955	Func () { static const
-0.221827	a; int b; static const
-0.221827	x) { __declspec(align(16)) static const
-0.221827	Table of factorials: static const
-0.221827	7.29b floata; boolb=0; static const
-0.331439	static char const * const
-0.342692	polynomial with induction variables const
-0.236195	Example 8.24. Integer constant const
-0.457813	7.7 unsigned int i; const
-0.003332	StoreVector(void * d, __m128i const
-0.010073	StoreVectorA(void * d, __m128i const
-0.234364	int n; static char const
-0.567033	double Table[100]; int x; const
-0.424483	table should be declared const
-0.328378	make log2 a global const
-0.325082	Example 12.1a. Automatic vectorization const
-0.374752	applied to a local const
-0.226487	const & a, T const
-0.367098	}; int order(int x); const
-0.224759	inline int lrintf (float const
-0.222163	Alignd(X) X __attribute__((aligned(16))) #endif const
-0.218375	Example: // Example 14.8 const
-0.218375	or from example 16.1 const
-0.016019	inline int lrint (double const
-0.264878	search: // Example 14.30 const
-0.001697	static inline __m128i LoadVector(void const
-0.212033	matrix[c][r]. // Example 9.5a const
-0.212033	Example: // Example 11.3 const
-0.212135	Example: // Example 9.4 const
-0.212135	fastest: // Example 7.17 const
-0.035737	Example 9.1a int Func(int); const
-0.035737	Example 9.1b int Func(int); const
-0.251052	array. // Example 9.6a const
-0.199769	two: // Example 11.2b const
-0.199769	Define size of squares: const
-0.465681	// Table of factorials: const
-0.164926	interval: // Example 14.5a const
-0.164926	parameters Vec4f polynomial (Vec4f const
-0.164926	numbers: // Example 11.2a const
-0.164926	vector operator + (vector const
-0.164926	static inline __m128i LoadVectorA(void const
-0.164926	static inline float add_elements(__m128 const
-0.164926	memcpy: // Example 7.33b const
-0.164926	well use a #define, const
-0.164926	static inline T max(T const
-0.164926	Example: // Example 7.33a const
-0.164926	int FuncRow(int); int FuncCol(int); const
-0.164926	this: // Example 14.4a const
-0.237537	the function pointer and makes
-0.324915	function calls faster and makes
-0.353143	to make code that makes
-0.443536	with a destructor that makes
-0.314549	c + d; // makes
-0.347043	some cases and it makes
-0.462498	algorithms is that it makes
-0.462498	volatile is that it makes
-0.337297	compiler mechanism because it makes
-0.337297	particularly interesting because it makes
-0.733055	few cases where it makes
-0.233855	an integer variable, it makes
-0.536088	fragmentation of the code makes
-0.536088	compactness of the code makes
-0.308734	in the memory. This makes
-0.564700	of the program. This makes
-0.389680	addressing of data. This makes
-0.272950	matrix[i][j] += x; This makes
-0.292465	0 or 1. This makes
-0.292465	structure or class. This makes
-0.219173	typically 64 bytes. This makes
-0.616465	on the stack. This makes
-0.219173	space than needed. This makes
-0.219173	in random order. This makes
-0.299789	than 32 bits. This makes
-0.535796	any other modules. This makes
-0.356897	offset at all. This makes
-0.219173	registers is doubled. This makes
-0.219173	same cache lines. This makes
-0.272950	has been loaded. This makes
-0.219173	array is stored. This makes
-0.272950	has become fragmented. This makes
-0.219173	the table static. This makes
-0.219173	misses have occurred. This makes
-0.219173	be made local. This makes
-0.355553	library functions. The compiler makes
-0.348117	efficient code, but this makes
-0.237562	to another class. It makes
-0.237450	*.so). The installation program makes
-0.237414	multiple smaller functions only makes
-0.231287	the access non-sequential which makes
-0.231287	binding by default, which makes
-0.231287	level of abstraction which makes
-0.525728	on the stack, which makes
-0.293558	two comparisons by one makes
-1.610015	the SSE2 instruction set makes
-0.349002	Pentium Pro instruction set makes
-0.916063	Agner's vector class library makes
-0.230711	of time, it also makes
-0.286033	if possible. This also makes
-0.230711	The static keyword also makes
-0.346221	reasons: The function call makes
-0.236556	simple variable. Using pointers makes
-0.292472	C++ exception handling system makes
-0.236056	processors. A non-Intel processor makes
-0.099229	compiler manual. This option makes
-0.099229	or -fsource-asm). This option makes
-0.235722	derived class. This check makes
-0.235309	and unsigned integers simply makes
-0.321312	reference. A const reference makes
-0.377937	volatile. The volatile keyword makes
-0.313488	application, while dynamic linking makes
-0.308364	is used. Dynamic linking makes
-0.233452	enum, const, or #define makes
-0.412728	the standard stack frame makes
-0.285146	Excessive use of templates makes
-0.229996	absence of such checks makes
-0.229952	module. The static declaration makes
-0.304235	with multiple memory blocks makes
-0.226681	template with many instances makes
-0.320711	-fpie because the linker makes
-0.212201	rather than the product makes
-0.165014	the code section position-independent, makes
-0.237691	cannot be inlined or cannot
-0.565194	section so that it cannot
-0.528549	function means that it cannot
-0.457674	needed only if it cannot
-0.353208	uses pointers because it cannot
-0.344863	points to. Therefore, it cannot
-0.547738	compiler that the function cannot
-0.846946	A static member function cannot
-0.322107	A const member function cannot
-0.235640	the dispatcher 128 function cannot
-0.346278	distributed. The intermediate code cannot
-0.796044	is that the compiler cannot
-0.346165	calls. Unfortunately, the compiler cannot
-0.346165	example 12.1b, the compiler cannot
-0.353538	page 72). The compiler cannot
-0.233353	keyword. The CodeGear compiler cannot
-0.302513	function library and you cannot
-0.302513	always sequential, and you cannot
-0.346193	programming questions if you cannot
-0.318890	big loop then you cannot
-0.318890	exception handling then you cannot
-0.318890	instruction set, then you cannot
-0.343858	clock cycles. If you cannot
-0.309716	known type, but you cannot
-0.299453	{ ... Here, you cannot
-0.279629	{ ... Here you cannot
-0.279629	In most systems, you cannot
-0.225068	is false. Likewise, you cannot
-0.537959	9.6b. The MOVNTQ instruction cannot
-0.434155	contains a function which cannot
-0.538739	that the level-2 cache cannot
-0.444615	reductions that the compilers cannot
-0.234795	purity. For example, compilers cannot
-0.381893	If the final size cannot
-0.314128	mentioned above. An object cannot
-0.335069	(see below). A variable cannot
-0.203193	a ^ 1; You cannot
-0.335102	several clock cycles. You cannot
-0.203193	terminating a thread. You cannot
-0.203193	and which not. You cannot
-0.203193	in this case. You cannot
-0.203193	few pitfalls here. You cannot
-0.203193	of Boolean operands. You cannot
-0.203193	execution of CriticalFunction. You cannot
-0.203193	different test examples. You cannot
-0.203193	directives are compiler-specific. You cannot
-0.623958	a particular memory address cannot
-0.465405	program. Whole program optimization cannot
-0.293488	different so that they cannot
-0.220037	same reason that they cannot
-0.315411	be avoided because they cannot
-0.268982	in cases where they cannot
-0.215665	and which reductions they cannot
-0.292454	set. More complicated cases cannot
-0.329867	bits. The vector instructions cannot
-0.236251	is that access times cannot
-0.323259	favor of Intel CPUs cannot
-0.236088	and make it work cannot
-0.342791	of m and therefore cannot
-0.340030	overview of the problem cannot
-0.321391	is that the name cannot
-0.321312	pointer or const reference cannot
-0.310753	times for network resources cannot
-0.290207	line. 132 Table lookup cannot
-0.234470	assume is optimized. We cannot
-0.108089	pointers of different types cannot
-0.328097	cases where dynamic linking cannot
-0.233440	of floating point operands cannot
-0.414199	dynamic library is loaded cannot
-0.224754	prevent optimization. The debugger cannot
-0.222200	point induction variables Compilers cannot
-0.199864	as many encryption algorithms, cannot
-0.314211	64-bit vector operations and before
-0.237469	program starts running and before
-0.353343	i to unsigned int before
-0.490944	122 this the time before
-0.343818	may read the time before
-0.355618	condition terminates the program before
-0.322657	counters inside your program before
-0.237428	by an EMMS instruction before
-0.313806	converting a to double before
-0.335359	processor is an Intel before
-0.236966	in a temporary array before
-0.554737	so that the value before
-1.719366	the time it takes before
-0.419670	to a virtual table before
-0.236846	any branch misprediction long before
-0.444586	routine that is called before
-0.427774	any, must be called before
-0.369054	function is usually called before
-0.323260	function billions of times before
-0.514290	pushed on the stack before
-0.313417	popped from the stack before
-0.291967	c[i] are too big before
-0.518865	(1) check for overflow before
-0.291835	integers to signed integers before
-0.348806	precision to double precision before
-0.235684	do such a check before
-0.342870	to store is known before
-0.516690	Is the size known before
-0.215390	replaced by their values before
-0.215390	initialized to desired values before
-0.215390	by their actual values before
-0.235236	of a pointer well before
-1.097844	a few clock cycles before
-0.815904	the time stamp counter before
-0.235168	measure the clock count before
-0.310850	convert it to signed before
-0.234316	start a new addition before
-0.220893	the final size needed before
-0.359261	solution. Is searching needed before
-0.310311	x can be read before
-0.288058	do because it comes before
-0.232129	new value of temp before
-0.434961	choose the optimal algorithm before
-0.231162	increasing the thread priority before
-0.579959	calculation of one iteration before
-0.742178	the performance monitor counters before
-0.229917	computer for security reasons before
-0.304159	NumberOfTests times // Time before
-0.282819	may detect the misprediction before
-0.226656	because they are resolved before
-0.226507	a nearby address again before
-0.224624	full declaration of c1 before
-0.224682	the calculation of B before
-0.224682	of interpretation or compilation before
-0.309176	have done the job before
-0.222139	functions that require cleanup before
-0.035732	the intrinsic function _mm256_zeroupper() before
-0.003828	support then call _mm256_zeroupper() before
-0.003828	dispatching then call _mm256_zeroupper() before
-0.003828	support, then call _mm256_zeroupper() before
-0.218268	often take several years before
-0.212004	deal of programming experience before
-0.212004	because it is evicted before
-0.199741	it may be freed before
-0.074670	work to do immediately before
-0.074670	must be placed immediately before
-0.164900	decoded in several stages before
-0.164900	have to be restored before
-0.164900	on a particular subtask before
-0.164900	start to calculate (c+d) before
-0.164900	each string is checked before
-0.164900	on the second sub-vector before
-0.354532	the table that is stored
-0.840777	so that it is stored
-0.532234	integer if it is stored
-0.532234	efficiently if it is stored
-0.322884	loop counter i is stored
-0.305113	If the variable is stored
-0.305113	sure the variable is stored
-0.267646	the first result is stored
-0.267646	the second result is stored
-0.571168	the first element is stored
-0.235876	fraction. The sign is stored
-0.426339	numbers. The exponent is stored
-0.235876	and the fraction is stored
-0.237592	code is distributed and stored
-0.538237	calculated in advance and stored
-0.344088	the variable to be stored
-0.444838	allow variables to be stored
-0.546386	references, which can be stored
-0.454498	of variables can be stored
-0.351058	error code may be stored
-0.311183	same class will be stored
-0.311183	and objects will be stored
-0.311183	constant 3.5 will be stored
-0.311183	static modifier will be stored
-0.566038	multidimensional array should be stored
-0.309476	16 bytes should be stored
-0.014058	used together should be stored
-0.325079	An object cannot be stored
-0.325079	A variable cannot be stored
-0.632744	should therefore preferably be stored
-0.541597	storage. Variables that are stored
-0.043216	inside a function are stored
-0.240803	if the data are stored
-0.272557	in which data are stored
-0.309227	inside a class are stored
-0.309227	a derived class are stored
-0.341973	Variables and objects are stored
-0.220230	commonly used variables are stored
-0.374351	overdetermined Boolean variables are stored
-0.220230	function. Global variables are stored
-0.508767	which the elements are stored
-0.747139	memory. Function parameters are stored
-0.230191	class or structure are stored
-0.305539	Floating point numbers are stored
-0.523570	are used together are stored
-0.589501	floating point constants are stored
-0.500075	The objects are not stored
-0.293829	result ebx is then stored
-0.554026	accessed through a pointer stored
-0.230791	the program are also stored
-0.047235	each other are also stored
-0.230791	used together are also stored
-0.426476	necessary if the objects stored
-0.281551	complexity (en.wikipedia.org/wiki/Standard_Template_Library). The objects stored
-0.098746	used only for objects stored
-0.098746	this principle for objects stored
-0.330082	called properties) are always stored
-0.882821	all objects have been stored
-0.235398	pointers and for information stored
-0.311227	and child are typically stored
-0.610700	a variable is never stored
-0.309744	The functions are usually stored
-0.231883	above, p. 26). Variables stored
-0.017519	aligned by 16, i.e. stored
-0.306361	sequence are not necessarily stored
-0.358602	the body of the called
-0.357964	the caller to the called
-0.330345	is big and is called
-0.502772	a function that is called
-0.342141	initialization routine that is called
-0.351604	or if it is called
-1.214767	time the function is called
-0.482412	times the function is called
-0.500768	when a function is called
-0.396427	call. The function is called
-0.305267	any other function is called
-0.305267	times each function is called
-0.558469	the virtual function is called
-0.470965	a time. This is called
-0.319921	different CPUs. This is called
-0.319921	its parameters. This is called
-0.319921	the processor. This is called
-0.319921	to do. This is called
-0.319921	the same. This is called
-0.319921	vacant spaces. This is called
-0.319590	calls other functions is called
-0.350399	library (DLL) which is called
-0.319590	the preceding one is called
-0.608233	in this example is called
-0.309095	in Intel processors is called
-0.169951	CPUs. Intel's profiler is called
-0.169951	VTune; AMD's profiler is called
-0.310994	and the destructor is called
-0.101124	of times CriticalFunction is called
-0.101124	on whether CriticalFunction is called
-0.877892	dynamic library can be called
-0.451234	same function may be called
-0.451234	copy constructor may be called
-0.350625	the function cannot be called
-0.323757	copy constructor must be called
-0.323757	if any, must be called
-0.235206	// Dispatcher. Will be called
-0.323226	of any function are called
-0.336458	that functions which are called
-0.427337	and derived class are called
-0.235550	of each object are called
-0.534164	that all destructors are called
-0.235550	sum1 and sum2 are called
-0.351047	static Assume member function called
-0.313725	throw() throw() Assume function called
-0.234774	stand alone compiler when called
-0.345617	precedence, not only when called
-0.234774	main, but also when called
-0.453781	A part of memory called
-0.311952	only if all functions called
-0.340719	for any library functions called
-0.450630	if it is only called
-0.237352	in a special cache called
-0.417013	the function is also called
-0.321879	simultaneously. This is also called
-0.022729	dynamic link libraries, also called
-0.236964	pointers to its variables called
-0.516853	if it has been called
-0.235128	where the function was called
-0.234453	paragraph described a mechanism called
-0.320707	the functions are actually called
-0.363453	dispatcher function is usually called
-0.278347	if any, is usually called
-0.524336	compilers have a feature called
-0.201916	Gnu compiler A feature called
-0.253469	a built-in test feature called
-0.232647	pointers to its functions, called
-0.165086	of a class (also called
-0.165086	member function is 83 called
-0.165086	powN is // erroneously called
-1.049302	the calculation of the address
-0.348815	a row to the address
-0.348815	pointer eax to the address
-0.881941	is equal to the address
-0.585431	binding is that the address
-0.356327	array element if the address
-0.522220	order to make the address
-0.216882	first look up the address
-0.216882	(3) look up the address
-0.048047	(eax) which contains the address
-0.048047	ecx now contains the address
-0.048047	is. ecx contains the address
-0.048047	and edx contains the address
-0.522148	order to calculate the address
-0.310142	and can calculate the address
-0.235769	pointer is simply the address
-0.347718	integer constant, unless the address
-0.436084	i; } Here, the address
-0.752575	order to find the address
-0.312183	the line containing the address
-0.398104	support for calculating the address
-0.267225	implicitly when calculating the address
-0.322754	map file tells the address
-0.235769	is 4. So the address
-0.235769	line that covered the address
-0.293701	calculate its address. The address
-0.293701	little odd here. The address
-0.438834	few extra instructions for address
-0.357387	look up the function address
-0.237706	to a different code address
-0.342295	are aligned to an address
-0.017914	be stored at an address
-0.076567	to start at an address
-0.207878	be loaded at an address
-0.076567	must begin at an address
-0.293837	load address. If this address
-0.288300	writing a variable from address
-0.232706	or 0x40 bytes from address
-0.232706	0x4700. Reading again from address
-0.288300	a program reads from address
-0.021564	stored at a memory address
-0.305484	large libraries. The memory address
-0.311188	to read from memory address
-0.046493	for a particular memory address
-0.046493	that a particular memory address
-0.226280	at an arbitrary memory address
-0.335028	is then stored at address
-0.235860	on the stack at address
-0.539339	writing from the same address
-0.451871	writes to any other address
-0.293502	it can calculate each address
-0.545536	calculation of the array address
-0.293067	can overwrite the return address
-0.292627	to relocate, but these address
-0.292282	a register if its address
-0.234803	rather than the complicated address
-0.321565	the program by their address
-0.321106	once, while the runtime address
-0.340042	(1) get its own address
-0.332120	loaded at a higher address
-0.233319	directly if the target address
-0.168699	changed then the target address
-0.226029	be predicted. The target address
-0.303005	advance, because a fixed address
-0.279327	expressed as a base address
-0.276432	benefit from the larger address
-0.212177	same or a nearby address
-0.199909	but the variable whose address
-0.540785	but any of the 4
-0.347417	member. This pointer is 4
-0.441790	Library exp function of 4
-0.237670	aligned // Structure of 4
-0.347431	may take up to 4
-0.142715	4 bytes. first // 4
-0.142715	8 bytes. first // 4
-0.323039	399 int b; // 4
-0.405318	7 int d; // 4
-0.346358	= 2048 bytes = 4
-0.406738	list, rolled out by 4
-0.293141	esp ; align by 4
-0.314477	processors, and 3 - 4
-0.536239	4 bytes = int 4
-0.536239	= float or int 4
-0.537446	8 bytes = double 4
-0.922113	4 bytes = float 4
-0.234058	mode 8 8 float 4
-0.527475	8. There are also 4
-0.236982	sizeof(float)) = 8 * 4
-0.289212	float or double takes 4
-0.233508	the microprocessor. Multiplication takes 4
-0.273073	8 8 float 4 4
-0.219281	signed or unsigned 4 4
-0.219281	reference, 32-bit mode 4 4
-0.219281	AMD only _mm_permutevar_ps 4 4
-0.219281	4 AVX _mm256_permutevar_ps 4 4
-0.680553	int, signed or unsigned 4
-0.291341	256 AVX double 64 4
-0.444982	AVX2 long long 64 4
-0.291341	8 64 4 64 4
-0.073192	2 32 8 64 4
-0.073192	8 32 8 64 4
-0.307224	relation to execution time. 4
-0.231607	the total computation time. 4
-0.446024	MMX short int 16 4
-0.225062	char 64 Iu8vec8 16 4
-0.225062	int 64 Is16vec4 16 4
-0.302451	128 SSE2 int 32 4
-0.286412	128 SSE2 float 32 4
-0.214054	4 64 4 32 4
-0.214054	128 Iu16vec8 Vec8us 32 4
-0.214054	128 Is32vec4 Vec4i 32 4
-0.235647	stride is 8192 / 4
-0.311708	or reference, 32-bit mode 4
-0.290678	organized as 32 sets 4
-0.115434	especially on the Pentium 4
-0.056903	matrix on a Pentium 4
-0.056903	measured on a Pentium 4
-0.056903	9.5a on a Pentium 4
-0.157214	another computer. The Pentium 4
-0.115434	clock cycles on Pentium 4
-0.115434	regular pattern, while Pentium 4
-0.319281	factor of 1, 2, 4
-0.308467	PREFETCH _mm_prefetch SSE Store 4
-0.228100	mark end of procedure 4
-0.125176	4 AVX2 _mm256_i32gather_epi32 unlimited 4
-0.125176	4 AVX2 _mm_i32gather_ps unlimited 4
-0.125176	8 AVX2 _mm_i32gather_epi32 unlimited 4
-0.125176	4 AVX2 _mm256_i32gather_ps unlimited 4
-0.516496	8 bytes = int64_t 4
-0.367096	improved by a factor 4
-0.222188	unit throughput ....................................................................................... 22 4
-0.016025	compiled to assembly: ALIGN 4
-0.355816	NEAR ; parameter 1: 4
-0.212119	costs of optimizing ............................................................................................... 4
-0.199853	chooses the least recently 4
-0.199853	kb = 8192 bytes, 4
-0.165004	XOP, AMD only _mm_permutevar_ps 4
-0.165004	int 4 AVX _mm256_permutevar_ps 4
-0.237588	// Windows syntax or See
-0.792402	parts of the code. See
-0.329592	a virtual member function. See
-0.235584	more efficient than functions. See
-0.299560	same piece of memory. See
-0.225158	preferably with contiguous memory. See
-0.235094	in the main program. See
-0.224091	XMM registers are used. See
-0.377484	PathScale and Gnu compilers. See
-0.702981	AMD and VIA processors. See
-0.232709	not standardized across platforms. See
-0.232758	in the simplest cases. See
-0.763576	than 0 or 1. See
-0.232334	never use static variables. See
-0.231623	goes into sleep mode. See
-0.231563	discussions about code optimization. See
-0.330808	point variables, if possible. See
-0.285808	in an inefficient way. See
-0.229813	on an Intel CPU. See
-0.431706	are aligned or not. See
-0.229891	the best possible version. See
-0.370392	or out of order. See
-0.748931	avoid dynamic memory allocation. See
-0.229002	level linking" if available. See
-0.228915	more complex integer expressions. See
-0.228872	other kinds of storage. See
-0.463030	and the operating system. See
-0.214705	platform and operating system. See
-0.518876	the loop control branch. See
-0.366951	and enables interprocedural optimizations. See
-0.317653	function using assembly language. See
-0.364343	type to avoid this. See
-0.736634	ranges do not overlap. See
-0.306025	with structured exception handling. See
-0.222029	or reading disk files. See
-0.419074	obvious to do so. See
-0.222106	series of five manuals. See
-0.147241	the same memory pool. See
-0.147241	in one memory pool. See
-0.016015	overriding Intel's CPU dispatcher. See
-0.271880	it is not cached. See
-0.413105	has no pointer aliasing. See
-0.211964	is slightly less compact. See
-0.211964	to prevent such errors. See
-0.211964	time each part takes. See
-0.212089	to recover from exceptions. See
-0.264800	aliasing does not occur. See
-0.023501	stored near each other. See
-0.199701	not be optimally aligned. See
-0.250976	the strictness is required. See
-0.199701	the branch prediction mechanism. See
-0.199701	a serious legal issue. See
-0.199701	causes a long delay. See
-0.199701	not use STL containers. See
-0.199701	care of the alignment. See
-0.250976	may cause cache contentions. See
-0.199701	the program will crash. See
-0.164864	before p is incremented. See
-0.164864	force when I die. See
-0.164864	what you are doing. See
-0.164864	of optimization is requested. See
-0.164864	are available from Intel. See
-0.164864	and perhaps Mac OS. See
-0.164864	using the directive __declspec(cpu_dispatch(...)). See
-0.164864	and loop-invariant code motion. See
-0.164864	the parallelism is obvious. See
-0.164864	runtime type identification (RTTI). See
-0.164864	<xmmintrin.h> _mm_setcsr(_mm_getcsr() | 0x8040); See
-0.220591	a multiple of the critical
-0.492980	each version of the critical
-0.492980	optimal version of the critical
-0.970191	appropriate version of the critical
-0.346349	every call of the critical
-1.045060	different versions of the critical
-0.519479	first call to the critical
-0.530457	be equal to the critical
-0.354421	millisecond resolution and the critical
-0.349231	to int in the critical
-0.306368	function calls in the critical
-0.349231	kept small in the critical
-0.349231	point-to-integer conversions in the critical
-0.355249	the subroutine for the critical
-0.355472	to avoid that the critical
-0.351932	is obtained if the critical
-0.351932	branch mispredictions if the critical
-0.457770	functions called by the critical
-0.540784	done every time the critical
-0.352908	these methods then the critical
-0.819301	This is because the critical
-0.353762	page 62. If the critical
-0.351072	column-wise manner where the critical
-0.311906	program that calls the critical
-0.235537	example 16.2 calls the critical
-0.350746	than log) inside the critical
-0.342938	it is outside the critical
-0.328181	cache size. When the critical
-0.328181	loop that includes the critical
-0.290827	and after executing the critical
-0.378682	have to identify the critical
-0.234929	call this distance the critical
-0.654481	the system code is critical
-0.566607	is part of a critical
-0.496033	different versions of a critical
-0.503070	are used in a critical
-0.330984	loaded. This makes a critical
-0.237189	and after executing a critical
-0.237891	above security advices in critical
-0.293320	kb, 8 ways. The critical
-0.293320	same cache lines. The critical
-0.293320	8 bytes each. The critical
-0.237843	are not recommended for critical
-0.314089	and data cache are critical
-0.237367	and memory access are critical
-0.237759	the most important or critical
-0.237547	logically distinct tasks. A critical
-0.539388	called from the same critical
-0.134552	versions of the most critical
-0.310330	only to the most critical
-0.207816	used in the most critical
-0.128017	even in the most critical
-0.128017	calls in the most critical
-0.234215	sure that the most critical
-0.234215	to make the most critical
-0.234215	by making the most critical
-0.234215	and isolate the most critical
-0.336101	such cases. The most critical
-0.236802	total time. Optimizing less critical
-0.348379	be mitigated by making critical
-0.324882	device drivers are particularly critical
-0.068017	.......................................................................................................... 120 13 Making critical
-0.068017	files. 121 13 Making critical
-0.023521	c; ... // Call critical
-0.165112	a soft processor activates critical
-0.331480	hot spot. Use the call
-0.346968	to see whether the call
-0.335849	compiler may replace the call
-0.314237	p and inlining the call
-0.407519	able to overlap the call
-0.237491	simply by removing the call
-0.456898	be certain that a call
-0.237710	is unchanged across a call
-0.512174	are: The overhead of call
-0.771402	in the code to call
-0.457804	that you have to call
-0.740339	then you have to call
-0.346910	takes longer time to call
-0.453251	than it takes to call
-0.914233	time it takes to call
-0.498136	functions that need to call
-1.193864	if you want to call
-0.494073	function that needs to call
-0.291559	about the destructor to call
-0.235572	the exception handler to call
-0.235572	F1 is supposed to call
-0.314336	the best compiler and call
-0.237574	thrown by F2 and call
-0.353620	test program that can call
-0.354009	C1, so it can call
-0.344309	template parameter. It can call
-0.324737	// define matrix // call
-0.237196	<< "Hello "; // call
-0.543874	provided that the function call
-0.730660	implemented as a function call
-0.336369	can replace a function call
-0.336369	by replacing a function call
-0.336237	following reasons: The function call
-0.305453	try block or function call
-0.249116	once for each function call
-0.249116	code at each function call
-0.434639	for a virtual function call
-0.285359	that each intrinsic function call
-0.230118	handling information. Each function call
-0.315859	Entry to dispatched function call
-0.331654	because you should not call
-0.520229	Windows). Alternatively, you may call
-0.236057	a different address. A call
-0.236057	a device driver. A call
-0.314299	0x800 apart. I will call
-0.340676	to zero and then call
-0.232972	without AVX support then call
-0.232972	using CPU dispatching then call
-0.232972	no AVX support, then call
-0.506900	is more than one call
-0.320662	there is only one call
-0.320662	will make only one call
-0.293516	a time because each call
-0.293173	This will make any call
-0.503197	called before the first call
-0.161451	CPU dispatching on first call
-0.161451	the dispatch on first call
-0.045203	point. // After first call
-0.045203	dispatcher. // After first call
-0.292534	determined by a system call
-0.171178	a function that doesn't call
-0.322139	moved with a single call
-0.322139	to make a single call
-0.165628	makes dispatching on every call
-0.165628	the dispatch on every call
-0.289165	if no other modules call
-0.283083	obj1; p->f(); // Virtual call
-0.281457	&CriticalFunction_386; } // Now call
-0.051644	work int i = 0;
-0.000288	for (int i = 0;
-0.320510	{ CFALSE: c = 0;
-0.012540	b for (i = 0;
-0.012540	0; for (i = 0;
-0.022804	i; for (i = 0;
-0.012540	b; for (i = 0;
-0.003101	... for (i = 0;
-0.072353	1; for (i = 0;
-0.012540	zero for (i = 0;
-0.006225	x; for (i = 0;
-0.012540	temp; for (i = 0;
-0.012540	3; for (i = 0;
-0.012540	a[100]; for (i = 0;
-0.012540	45 for (i = 0;
-0.012540	r; for (i = 0;
-0.012540	loop: for (i = 0;
-0.012540	a[2]; for (i = 0;
-0.012540	84 for (i = 0;
-0.012540	timediff[NumberOfTests]; for (i = 0;
-0.012540	printf("\nResults:"); for (i = 0;
-0.307823	0) { d = 0;
-0.202467	a[100]; float sum = 0;
-0.141984	int i, sum = 0;
-0.141984	float list[size], sum = 0;
-0.252853	i<301; i+=3){ list[i] = 0;
-0.201368	DelayFiveSeconds() { seconds = 0;
-0.003370	{ for (c = 0;
-0.010190	rows for (c = 0;
-0.201368	= 0, sum2 = 0;
-0.101007	c; for (r = 0;
-0.129977	x; for (x = 0;
-0.129977	B; for (x = 0;
-0.252853	int i, largest_index = 0;
-0.252853	int absvalue, largest_abs = 0;
-0.201368	i++) for (j = 0;
-0.201368	{ for (c1 = 0;
-0.201368	column; for (row = 0;
-0.201368	squares: for (r1 = 0;
-0.201368	row++) for (column = 0;
-0.201368	2; } list[300] = 0;
-0.634328	} else { return 0;
-0.047015	CriticalFunction(b, c); ... return 0;
-0.047015	(*CriticalFunction)(b, c); ... return 0;
-0.234368	= StringLength; i > 0;
-0.308526	= 2.0; i >= 0;
-0.224954	y && z != 0;
-0.294202	too slow. Today, the 8
-0.444311	level-1 data cache is 8
-0.172313	is a cache of 8
-0.172313	level-1 data cache of 8
-0.335488	either sixteen integers of 8
-0.237284	hold 8 double's of 8
-0.324892	registers (6 integer and 8
-0.901438	in 32-bit systems and 8
-0.349554	then sizeof(S1) would be 8
-0.237157	6 unused bytes // 8
-0.324450	{ double b; // 8
-0.237701	(columns * sizeof(float)) = 8
-0.335287	is typically aligned by 8
-1.071574	an address divisible by 8
-0.382528	Multiplication takes 4 - 8
-0.536220	4 bytes = int 8
-0.536220	= float or int 8
-1.106458	reasons explained on page 8
-0.237183	float 4 4 double 8
-0.928342	4 bytes = float 8
-0.236966	the value 10 * 8
-0.293181	example, a double takes 8
-0.236849	set has nothing between 8
-0.214697	4 4 double 8 8
-0.214697	signed or unsigned 8 8
-0.214697	reference, 64-bit mode 8 8
-0.303240	Instruction set char 8 8
-0.214697	Vector class, Agner 8 8
-0.214697	char 64 Is8vec8 8 8
-0.476677	integer, signed or unsigned 8
-0.307677	512 AVX512 double 64 8
-0.468442	AVX512 long long 64 8
-0.292841	10, 12 or 16 8
-0.436795	SSE2 short int 16 8
-0.219490	128 Is16vec8 Vec8s 16 8
-0.219490	128 Iu8vec16 Vec16uc 16 8
-0.302438	256 AVX2 int 32 8
-0.286399	256 AVX2 float 32 8
-0.214043	4 64 2 32 8
-0.286399	16 32 8 32 8
-0.214043	Vec4d 16 16 32 8
-0.339703	compiler sees the constant 8
-0.235626	is 512 kb / 8
-0.333201	has made the structure 8
-0.344676	or reference, 64-bit mode 8
-0.182914	bits Instruction set char 8
-0.259001	long int unsigned char 8
-0.182914	2 128 SSE2 char 8
-0.182914	1 64 MMX char 8
-0.182914	value in stdint.h char 8
-0.335653	and leave the last 8
-0.424220	SelectAddMul_SSE41 #elif INSTRSET == 8
-0.285225	MOVNTI _mm_stream_si32 SSE2 Store 8
-0.195870	to using namespaces. 65 8
-0.195870	65 7.33 Namespaces........................................................................................................... 65 8
-0.125172	2 AVX2 _mm256_i64gather_pd unlimited 8
-0.125172	8 AVX2 _mm_i64gather_pd unlimited 8
-0.125172	2 AVX2 _mm256_i64gather_epi32 unlimited 8
-0.125172	8 AVX2 _mm_i64gather_epi32 unlimited 8
-0.282914	use only the lower 8
-0.281353	ENDP ecx, 1 eax, 8
-0.311564	cache line can hold 8
-0.224761	Intel Vector class, Agner 8
-0.355784	PROCNEAR ; parameter 1: 8
-0.164983	of programming language ............................................................................... 8
-0.164983	char 128 Is8vec16 Vec16c 8
-0.164983	2 uint64_t 128 Vec2uq 8
-0.164983	8 char 64 Is8vec8 8
-0.164983	cache is 512 kb, 8
-0.164983	1 int64_t 64 I64vec1 8
-0.537494	point where it is less
-0.582994	calculate the function is less
-0.355779	by default. This is less
-0.884433	the Intel compiler is less
-0.356851	iterations ahead. It is less
-0.352462	32-bit integers, which is less
-0.210355	Atom processors, but is less
-0.210355	under test, but is less
-0.481573	structure or class is less
-0.338354	or multidimensional array is less
-0.336487	that a value is less
-0.291105	The -fpie option is less
-0.600982	A linked list is less
-0.533416	floating point numbers is less
-0.404095	line. The delay is less
-0.235173	not! 250 μs is less
-0.235173	of a bitfield is less
-0.237803	of jumping around and less
-0.237800	may be sufficient for less
-1.257414	is likely to be less
-0.491997	not guaranteed to be less
-0.357294	oriented programs can be less
-0.497663	the code will be less
-0.236041	may in fact be less
-0.348547	Virtual member functions are less
-0.310809	the level-1 cache are less
-0.345412	efficient. Dynamic libraries are less
-0.234616	and 64 bits are less
-0.335885	on executing instructions are less
-0.234616	cases. Integer expressions are less
-0.234616	if alternative implementations are less
-0.290471	the alignment requirements are less
-0.237695	by only 50% or less
-0.345119	default, which makes it less
-0.555840	also makes the code less
-0.357520	when i is not less
-0.237576	small embedded applications have less
-0.293741	thread may run at less
-0.356882	only makes the program less
-0.237349	Fortran and several other less
-0.324721	contains similar functions, but less
-0.340890	programming languages, but also less
-0.236688	is using one register less
-0.330202	that make member pointers less
-0.236254	be a million times less
-0.236110	writing data files while less
-0.283575	accessed backwards and much less
-0.228546	devices typically have much less
-0.592818	The code cache works less
-0.378940	not read or write less
-0.235055	because this brand was less
-0.148322	makes the data caching less
-0.148322	caching and data caching less
-0.327560	This makes data caching less
-0.426698	the code makes caching less
-0.333029	an option that allows less
-0.234482	with a relative difference less
-0.319228	hardware platform has become less
-0.232572	Digital Mars compilers produce less
-0.528177	Factors that make vectorization less
-0.231284	the total time. Optimizing less
-0.231301	optimizing compilers available, though less
-0.523069	Boolean variables as input less
-0.255902	127 bytes is slightly less
-0.255902	int) are only slightly less
-0.165141	work, 133 although slightly less
-0.199869	sequentially. It works somewhat less
-0.165019	one. It may neverthe- less
-0.293983	time stamp counter // For
-0.515007	spots in the code. For
-0.226762	microprocessor handles this code. For
-0.226762	code that makes code. For
-0.455929	possible at compile time. For
-0.455929	calculated at compile time. For
-0.235768	operands are comparisons, etc. For
-0.342706	the pointer is used. For
-0.291068	in Intel header files For
-0.375701	variable in many cases. For
-0.344690	big waste of resources. For
-0.210249	you have ample resources. For
-0.287132	size of the variable. For
-0.231596	the possibilities for optimization. For
-0.230543	as a Boolean vector. For
-0.255187	is called CPU dispatching. For
-0.255187	of poor CPU dispatching. For
-0.199868	file of every version. For
-0.199868	of every intermediate version. For
-0.195833	the object pointed to. For
-0.195833	member pointer refers to. For
-0.407763	of floating point expressions. For
-0.195760	reduce complicated algebraic expressions. For
-0.284030	calculations can cause overflow. For
-0.328594	use different execution units. For
-0.227831	efficient use of software. For
-0.226552	cannot be vectorized automatically. For
-0.226502	threads in each core. For
-0.279185	reference to a structure. For
-0.222065	a CPU- specific profiler. For
-0.284065	integer in one operation. For
-0.284065	as a shift operation. For
-0.693640	by the unroll factor. For
-0.218262	the data elements are. For
-0.218262	the worst- case conditions. For
-0.218262	a difference in efficiency. For
-0.218262	priorities to different tasks. For
-0.355647	have big data structures. For
-0.148544	two or more constants. For
-0.148544	used for defining constants. For
-0.641404	compilers and operating systems". For
-0.211998	in C++: Preprocessor directives. For
-0.211998	members of mixed sizes. For
-0.283985	by a table lookup. For
-0.102770	the conversion is valid. For
-0.102770	second operand is valid. For
-0.211998	more efficient than post-increment. For
-0.330423	of the clock frequency. For
-0.199735	different kinds of jobs. For
-0.199735	it as a subexpression. For
-0.199735	instruction set is supported. For
-0.251014	the task in question. For
-0.199735	independence, and easy development. For
-0.465619	reasons of mathematical purity. For
-0.199735	a number of sources. For
-0.251014	the job fast enough. For
-0.330423	part of the fraction. For
-0.251014	branch is poorly predictable. For
-0.199735	of an execution unit. For
-0.199735	columns to a matrix. For
-0.164895	fundamental laws of algebra. For
-0.164895	only when it exits. For
-0.164895	divisions can be combined. For
-0.164895	the sake of modularity. For
-0.164895	effect is simply identical. For
-0.164895	possibility of algebraic reduction. For
-0.164895	and % means modulo. For
-0.164895	sometimes be eliminated completely. For
-0.164895	form of error reporting. For
-0.164895	between threads is minimized. For
-0.236012	It is used, for example,
-0.236012	pattern can be, for example,
-0.236012	with two decimals, for example,
-0.236012	|| expression. Assume, for example,
-0.236012	memory footprint. If, for example,
-0.236012	In example 12.3a, for example,
-0.078812	} } In this example,
-0.078812	c; } In this example,
-0.175169	+= b; In this example,
-0.175169	-2.0 55 In this example,
-0.175169	reused elsewhere. In this example,
-0.175169	MAX(f(x), g(x)); In this example,
-0.064631	handles this code. For example,
-0.064631	that makes code. For example,
-0.241209	at compile time. For example,
-0.064038	pointer is used. For example,
-0.064038	Intel header files For example,
-0.100952	have ample resources. For example,
-0.064038	of the variable. For example,
-0.064038	possibilities for optimization. For example,
-0.064038	a Boolean vector. For example,
-0.015146	called CPU dispatching. For example,
-0.015146	poor CPU dispatching. For example,
-0.030840	floating point expressions. For example,
-0.030840	complicated algebraic expressions. For example,
-0.064038	can cause overflow. For example,
-0.064038	different execution units. For example,
-0.064038	be vectorized automatically. For example,
-0.064038	in each core. For example,
-0.100952	in one operation. For example,
-0.064038	the unroll factor. For example,
-0.064038	data elements are. For example,
-0.064038	worst- case conditions. For example,
-0.064038	difference in efficiency. For example,
-0.064038	to different tasks. For example,
-0.064038	big data structures. For example,
-0.030840	or more constants. For example,
-0.030840	for defining constants. For example,
-0.155622	conversion is valid. For example,
-0.064038	efficient than post-increment. For example,
-0.064038	the clock frequency. For example,
-0.064038	kinds of jobs. For example,
-0.064038	as a subexpression. For example,
-0.064038	set is supported. For example,
-0.064038	task in question. For example,
-0.064038	and easy development. For example,
-0.064038	of mathematical purity. For example,
-0.064038	number of sources. For example,
-0.064038	job fast enough. For example,
-0.064038	of the fraction. For example,
-0.064038	an execution unit. For example,
-0.064038	to a matrix. For example,
-0.064038	laws of algebra. For example,
-0.064038	when it exits. For example,
-0.064038	sake of modularity. For example,
-0.064038	is simply identical. For example,
-0.064038	of algebraic reduction. For example,
-0.064038	% means modulo. For example,
-0.064038	of error reporting. For example,
-0.064038	threads is minimized. For example,
-0.352363	loops. Consider the following example,
-0.303497	removed from the above example,
-0.099382	faster. In the above example,
-0.099382	big. In the above example,
-0.228473	inlined. (In the above example,
-0.342079	two. In the preceding example,
-0.088595	// Example 12.4c. Same example,
-0.088595	// Example 12.4e. Same example,
-0.088595	// Example 12.4d. Same example,
-0.200021	indeed a very contrived example,
-0.357904	must consider that the bit
-0.864880	is to use the bit
-0.351296	slow implementations of this bit
-0.285297	For example, if each bit
-0.230064	in fact using each bit
-0.322158	variable __intel_cpu_feature_indicator where each bit
-0.230064	// get next each bit
-0.237050	Addison-Wesley, 2003. Contains many bit
-0.236777	has nothing between 8 bit
-0.026177	Shared objects in 64 bit
-0.026177	address calculation in 64 bit
-0.026177	byte longer in 64 bit
-0.026177	relative references in 64 bit
-0.026177	without -fpic in 64 bit
-0.026177	is seen in 64 bit
-0.237527	operating systems. The 64 bit
-0.187729	32 bit code 64 bit
-0.187729	13 Asmlib Gnu 64 bit
-0.187729	code more efficient. 64 bit
-0.187729	_WIN64 not _WIN64 64 bit
-0.187729	code (option -fno-pic). 64 bit
-0.236463	macros Compiler identification 16 bit
-0.192816	disadvantages compared to 32 bit
-0.192816	8 bit and 32 bit
-0.055274	mode than in 32 bit
-0.055274	Shared objects in 32 bit
-0.055274	absolute references in 32 bit
-0.192816	by OpenMP directives 32 bit
-0.192816	several advantages over 32 bit
-0.192816	__INTEL_COMPILER __INTEL_COMPILER 161 32 bit
-0.192816	Important features 80386 32 bit
-0.351324	stored as a single bit
-0.339621	or writing a small bit
-0.018331	__m128i defines a 128 bit
-0.018331	__m128 defines a 128 bit
-0.018331	__m128d defines a 128 bit
-0.202016	float vectors SSE2 128 bit
-0.202016	bit mode SSE 128 bit
-0.098725	things with the sign bit
-0.098725	this example, the sign bit
-0.022658	shift out the sign bit
-0.098725	example sets the sign bit
-0.238137	bits except the sign bit
-0.098725	by setting the sign bit
-0.098725	ebx,31 copies the sign bit
-0.098725	to flip the sign bit
-0.097278	: 1; // sign bit
-0.097278	-0 (zero with sign bit
-0.022354	0x80000000; // set sign bit
-0.022354	0x7FFFFFFF; // set sign bit
-0.097278	{ // test sign bit
-0.097278	; shift down sign bit
-0.097278	0x80000000; // Set sign bit
-0.097278	0x80000000; // flip sign bit
-0.222323	search instructions AVX 256 bit
-0.222323	double vectors AVX2 256 bit
-0.370701	CPUs with a slow bit
-0.212262	for CPUs with slow bit
-0.371841	into the least significant bit
-0.205844	register. If the carry bit
-0.205844	instructions where the carry bit
-0.146424	the next. The carry bit
-0.132896	less useful in 32- bit
-0.132896	32-bit mode. The 32- bit
-0.132896	two versions. A 32- bit
-0.347382	SSE instruction set (128 bit
-0.199953	as addition, subtraction, comparison, bit
-0.165097	AVX2, or two 128- bit
-0.588690	is part of the operating
-0.456752	the overhead of the operating
-1.316768	the responsibility of the operating
-0.353509	advanced facilities of the operating
-0.352728	automatic updates to the operating
-0.352728	anyway. Updates to the operating
-0.014603	the CPU and the operating
-0.598276	the processor and the operating
-0.327037	the microprocessor and the operating
-0.502619	are included in the operating
-0.523553	MOVNTDQ require that the operating
-0.469338	and done by the operating
-0.764620	are supported by the operating
-0.668854	is determined by the operating
-0.333144	is caught by the operating
-0.652292	that come with the operating
-0.306635	graphics framework between the operating
-0.306635	The similarity between the operating
-0.590897	The profiler tells the operating
-0.292076	Memory-hungry applications force the operating
-0.128452	6 2.3 Choice of operating
-0.128452	manual. 2.3 Choice of operating
-0.003811	different C++ compilers and operating
-0.101869	by most CPUs and operating
-0.101869	all 64-bit CPUs and operating
-0.235200	all modern microprocessors and operating
-0.235200	the hardware platform and operating
-0.322059	about which platforms and operating
-0.472982	framework is used. The operating
-0.484473	invalidate the cache. The operating
-0.237096	files and databases. The operating
-0.345253	don't even have an operating
-0.292761	in applications without an operating
-0.453607	to use a different operating
-0.869360	when there is no operating
-0.293328	is ported to multiple operating
-0.348118	software because the two operating
-0.554510	both 32-bit and 64-bit operating
-0.643068	and sixteen in 64-bit operating
-0.316818	programs compiled for 64-bit operating
-0.324208	16-bit mode and some operating
-0.329164	registers available in 32-bit operating
-0.329164	general purposes in 32-bit operating
-0.236516	operating systems, though these operating
-0.312598	name. In the Windows operating
-0.235858	The Windows and Linux operating
-0.235323	necessary to query certain operating
-0.234949	Linux, BSD or Mac operating
-0.474435	used in the old operating
-0.217487	will crash on old operating
-0.285182	details in both compiler, operating
-0.229979	backup features, and current operating
-0.259781	and Mac OS X operating
-0.259781	Intel-based Mac OS X operating
-0.306297	standardization of programming languages, operating
-0.032678	this in a protected operating
-0.032678	message in a protected operating
-0.212183	processor core. Unfortunately, contemporary operating
-0.199914	2, 4, etc.). Older operating
-0.199914	Microsoft, Gnu, Clang Supported operating
-0.165060	that violate or circumvent operating
-0.294201	to first convert the unsigned
-0.538780	if the dividend is unsigned
-0.331837	floating point. Conversion of unsigned
-0.443572	are converting a to unsigned
-0.324594	by type-casting i to unsigned
-0.237274	changing the dividend to unsigned
-0.237274	10; // Convert to unsigned
-0.038091	differently on signed and unsigned
-0.038091	between using signed and unsigned
-0.018628	conversion between signed and unsigned
-0.018628	Conversions between signed and unsigned
-0.038091	to mix signed and unsigned
-0.236463	Example 7.4. Signed and unsigned
-0.314581	the extra bits. The unsigned
-0.094584	64-bit integer, signed or unsigned
-0.044722	2 int, signed or unsigned
-0.044722	short int, signed or unsigned
-0.094584	1 char, signed or unsigned
-0.010404	// Still faster if unsigned
-0.293046	with signed than with unsigned
-0.236879	of comparing signed with unsigned
-0.325516	An overflow of an unsigned
-0.325516	used. Conversion of an unsigned
-0.346851	is interpreted as an unsigned
-0.234544	by making i an unsigned
-0.236579	32 2 2 int unsigned
-0.313149	64-bit Linux: long int unsigned
-0.235160	}; struct Sdouble { unsigned
-0.235160	}; struct Slongdouble { unsigned
-0.235160	follows: struct Sfloat { unsigned
-0.237542	it occurs, (2) use unsigned
-0.588829	32 8 64 4 unsigned
-0.302816	64 Is16vec4 16 4 unsigned
-0.321952	Is32vec4 Vec4i 32 4 unsigned
-0.329130	64 Is8vec8 8 8 unsigned
-0.318405	Is16vec8 Vec8s 16 8 unsigned
-0.312908	Is8vec16 Vec16c 8 16 unsigned
-0.140799	23; // fractional part unsigned
-0.140799	52; // fractional part unsigned
-0.235621	discussed below. Signed / unsigned
-0.234621	short int int 256 unsigned
-0.232187	doubles: union {double d; unsigned
-0.231273	double ipow (double x, unsigned
-0.306801	more efficient to convert unsigned
-1.416926	union { float f; unsigned
-0.204564	int in 16-bit systems: unsigned
-0.224713	if nonzero and normal unsigned
-0.222210	integer overflow. Signed versus unsigned
-0.310968	int arraysize = 1000; unsigned
-0.264943	unsigned __int64 64-bit Linux: unsigned
-0.212174	example: // Example 7.7 unsigned
-0.212174	overflow: // Example 7.25 unsigned
-0.251115	or uint64_t MS compiler: unsigned
-0.199825	// fractional part 142 unsigned
-0.164978	32 char 256 Vec32c unsigned
-0.164978	unsigned int u[2]} a[size]; unsigned
-0.164978	checking template <typename T, unsigned
-0.164978	// exponent + 0x3FF unsigned
-0.164978	// exponent + 0x3FFF unsigned
-0.164978	point: // Example 14.22b unsigned
-0.164978	Example: // Example 14.22a unsigned
-0.164978	16 0 65535 uint16_t unsigned
-0.164978	8 0 255 uint8_t unsigned
-0.164978	// exponent + 0x7F unsigned
-0.164978	32 0 232-1 uint32_t unsigned
-0.570202	well. This is the first
-1.440799	the address of the first
-0.354278	the output of the first
-0.457728	the behavior of the first
-0.826454	is added to the first
-0.356632	used members in the first
-0.350842	is called for the first
-0.350842	the need for the first
-0.573876	mode so that the first
-0.235190	to access it the first
-0.540245	effects or if the first
-0.352138	operand. Likewise, if the first
-0.499480	the calculations on the first
-0.353167	second sum, then the first
-0.345097	is initialized only the first
-0.354014	following way. If the first
-0.348319	of all but the first
-0.296550	virtual table before the first
-0.420006	is called before the first
-0.123440	is known before the first
-0.123440	size known before the first
-0.502671	development. For example, the first
-0.322896	again two times the first
-0.291123	is only calculated the first
-0.291123	than by optimizing the first
-0.235190	In 64-bit Windows, the first
-0.519200	bytes to find the first
-0.235190	In 64-bit Linux, the first
-0.340621	but waits until the first
-0.291123	needed before adding the first
-0.329194	data members within the first
-0.235190	the same way, the first
-0.235190	instruction to localize the first
-0.748602	It is faster to first
-1.246549	it is necessary to first
-0.289642	and ZMM registers The first
-0.233887	the optimal algorithm The first
-0.309940	256-bit YMM registers. The first
-0.729411	in 32-bit mode. The first
-0.289642	good as possible. The first
-0.309940	the following way. The first
-0.289642	in two ways. The first
-0.620066	above the diagonal. The first
-0.233887	testing worst-case performance: The first
-0.233887	not a vector). The first
-0.233887	be optimized further. The first
-0.233887	now as follows. The first
-0.325270	All source files are first
-0.237770	an && expression, or first
-0.292114	// CPU dispatching on first
-0.380259	implement the dispatch on first
-0.292114	takes time. Dispatch on first
-0.237618	need only read this first
-0.325142	Will be called only first
-1.434247	The following example shows first
-0.232670	critical integer parameter comes first
-0.020741	a; // 2 bytes. first
-0.020741	d; // 2 bytes. first
-0.009096	first // 4 bytes. first
-0.009096	b; // 4 bytes. first
-0.009096	d; // 4 bytes. first
-0.020741	bytes // 8 bytes. first
-0.020741	b; // 8 bytes. first
-0.089666	a[100]; // 400 bytes. first
-0.212257	parameters and the 49 first
-0.057004	entry point. // After first
-0.057004	the dispatcher. // After first
-0.165128	program efficiency is reflected, first
-0.358429	further expansions of the register
-0.827781	This is because the register
-0.750786	advantages of using the register
-0.407636	of the way the register
-0.346085	11.3 even without the register
-0.721145	if it is a register
-0.746124	rather than in a register
-0.346149	is stored in a register
-0.334422	be stored in a register
-0.342540	transferring 'this' in a register
-0.455652	this to be a register
-0.794690	is stored as a register
-0.445686	registers organized as a register
-0.236101	will make temp a register
-0.292160	way of setting a register
-0.563745	prevents the use of register
-1.151845	that the value of register
-1.019494	for an explanation of register
-0.237119	most. The opposite of register
-0.293319	CPUs are capable of register
-0.381695	all C++ compilers The register
-0.630793	the code cache. The register
-0.593309	a register variable. The register
-0.479995	in a function for register
-0.237072	are used most for register
-0.237072	registers. Typical candidates for register
-0.237502	of main memory. A register
-0.428857	that could benefit from register
-0.342321	smaller than the vector register
-0.444009	result in a vector register
-0.318572	further extension of vector register
-0.577012	size of each vector register
-0.227160	addition with another vector register
-0.282001	use one 256-bit vector register
-0.227160	whether the largest vector register
-0.656731	the compiler to make register
-0.427994	can use the same register
-0.393545	thenaandbcannot use the same register
-0.920457	can share the same register
-0.338252	register storage. The same register
-0.950755	when the floating point register
-0.520277	number of floating point register
-0.234222	to make floating point register
-0.338996	cannot make floating point register
-0.061922	difficulties making floating point register
-0.293582	solution is using one register
-0.293541	maximum number of integer register
-0.237119	a[size], b[size], c[size]; float register
-0.548364	same. This is called register
-0.424215	advantage of a new register
-0.327655	to use a new register
-0.236325	than the largest available register
-0.236208	obey certain rules about register
-0.285642	and makes an extra register
-0.285642	it requires an extra register
-0.540905	fits into a single register
-0.980300	the compiler to optimize register
-0.423364	example, a 128-bit XMM register
-0.306868	even though the logical register
-0.313293	ebx as a temporary register
-0.369170	change in the YMM register
-0.226655	be only one free register
-0.451967	totaling up to fourteen register
-0.222276	assigning a new physical register
-0.212171	problems separating the flags register
-0.356400	this code with a 64
-0.651821	a line size of 64
-0.335727	or two integers of 64
-0.331339	use the vectors of 64
-0.314674	registers is extended to 64
-0.901449	in 32-bit systems and 64
-0.293777	8, 16, 32 and 64
-0.778888	below. Shared objects in 64
-0.313660	for address calculation in 64
-0.236523	one byte longer in 64
-0.323675	mostly relative references in 64
-0.380903	object without -fpic in 64
-0.313660	cost is seen in 64
-0.338822	and operating systems. The 64
-0.237422	can be expected. The 64
-0.358343	each vector can be 64
-0.339244	MMX registers, which are 64
-0.293984	kb / 8 = 64
-0.293954	double is represented with 64
-0.293958	directives 32 bit code 64
-0.237689	cache of 8 - 64
-0.310699	Linux: unsigned long int 64
-0.350736	2 int unsigned int 64
-0.340392	16 4 short int 64
-0.479286	4 unsigned short int 64
-0.237547	and stack entries use 64
-0.231989	8 256 AVX double 64
-0.231989	4 128 SSE double 64
-0.231989	16 512 AVX512 double 64
-0.292998	MMX int 32 2 64
-0.200108	128 SSE2 long long 64
-0.200108	256 AVX2 long long 64
-0.200108	64 MMX long long 64
-0.200108	512 AVX512 long long 64
-0.588842	32 8 64 4 64
-0.302824	short int 16 4 64
-0.321959	64 4 32 4 64
-0.323590	set char 8 8 64
-0.231823	64 2 32 8 64
-0.231823	32 8 32 8 64
-0.222758	code with a 64 64
-0.277011	be expected. The 64 64
-0.222758	63 31 11.6 64 64
-0.222758	element Example 9.6b 64 64
-0.236426	int 64 Is32vec2 32 64
-0.236226	1.19 13 Asmlib Gnu 64
-0.235622	while a double uses 64
-0.291281	long long 64 1 64
-0.321089	size, which is typically 64
-0.243226	line size is typically 64
-0.335621	independent code more efficient. 64
-0.220950	Agner 8 8 char 64
-0.304716	8 8 unsigned char 64
-0.336801	to load the entire 64
-0.226558	Iu32vec2 64 1 int64_t 64
-0.276400	not _WIN64 not _WIN64 64
-0.218368	destructor causes another exception. 64
-0.218428	code optimization Intel: "Intel 64
-0.199836	program or data exceeds 64
-0.199836	int64_t MS compiler: __int64 64
-0.164988	position-independent code (option -fno-pic). 64
-0.164988	Is32vec2 32 64 Iu32vec2 64
-0.164988	bytes. Each line covers 64
-0.164988	63 63 31 11.6 64
-0.164988	per element Example 9.6b 64
-0.164988	int64_t 128 I64vec2 Vec2q 64
-0.164988	int 128 Iu32vec4 Vec4ui 64
-0.353805	that we have to take
-0.492236	the compiler has to take
-0.379832	programmer can do to take
-0.498894	B. In order to take
-1.142703	example shows how to take
-0.467596	is no need to take
-0.291766	the reinstallation work to take
-0.235754	the installation process to take
-0.291766	classes with destructors to take
-0.235754	the program appear to take
-0.235754	used as coprocessors to take
-0.294115	the application itself and take
-0.292732	for big objects that take
-0.337788	are single instructions that take
-0.323774	lot of branches that take
-0.236604	has multiple instances that take
-0.452238	in applications that can take
-0.492026	cycles, but it can take
-0.493772	good if you can take
-0.339689	System database It can take
-0.286033	from RAM memory can take
-0.286033	the 7 program can take
-0.340677	n-1 multiplications, which can take
-0.322292	if unsigned You can take
-0.322292	+ a. You can take
-0.324745	interface, another thread can take
-0.307877	level-1 cache. We can take
-0.307877	of u.f We can take
-0.230711	example, one tread can take
-0.230711	such programs installed can take
-0.931455	the compiler may not take
-0.553059	today, then it may take
-0.609365	For example, it may take
-0.341707	back again. This may take
-0.232103	error. The calculations may take
-0.232103	optimal. The branches may take
-0.232103	log on process may take
-0.354468	be vectorized if you take
-0.120786	that the loop will take
-0.120786	then this loop will take
-0.120786	the whole loop will take
-0.376069	version in main will take
-0.233685	reports of which functions take
-0.401902	and the critical functions take
-0.621240	root and mathematical functions take
-0.293594	problems. Software developers should take
-0.620328	double and long double take
-1.115694	if a and b take
-0.338394	Virtual functions in C++ take
-0.313231	all, it will often take
-0.236407	operations and shift operations take
-1.061429	may in some cases take
-0.380311	cases, double precision calculations take
-0.236076	extra code and doesn't take
-0.235489	then the multiplication would take
-0.290806	are frameworks that typically take
-0.234582	microprocessors. Multiplication and division take
-0.234483	r = 28. We take
-0.233654	Floating point calculations usually take
-0.319272	single precision. These conversions take
-0.329097	and shuffling can sometimes take
-0.232212	then it will still take
-0.218476	of possible inputs. Let's take
-0.212165	to log, and logarithms take
-0.199897	unit as additions. Divisions take
-0.199897	Conversions between different precisions take
-0.353048	In C++, it is often
-0.353048	script languages, it is often
-0.290551	of vectors, as is often
-0.349499	are long. This is often
-0.349499	interface frameworks. This is often
-0.539979	library, but this is often
-0.427039	the core. It is often
-0.329915	response times. It is often
-0.329915	= a[i]; It is often
-0.329915	out-of-order execution. It is often
-0.329915	number information. It is often
-0.329915	programming style. It is often
-0.329915	to read. It is often
-0.329915	page 130. It is often
-0.350675	highly optimized program is often
-0.535963	aware that there is often
-0.827017	number of objects is often
-0.403377	the operating system is often
-0.378345	Unfortunately, table lookup is often
-0.290551	a given task is often
-0.237574	is more complex and often
-0.237574	new and delete, and often
-0.414349	compile the program are often
-0.346913	program. Small functions are often
-0.338195	} Induction variables are often
-0.543375	expensive if they are often
-0.326162	If two threads are often
-0.319738	because relative addresses are often
-0.233299	The clock counts are often
-0.309239	CodeAnalyst. Unfortunately, profilers are often
-0.288974	process. These requirements are often
-0.288974	unexpected behaviors. Arrays are often
-0.233299	anyway. Software distributors are often
-0.352806	another. Therefore, it can often
-0.454192	a good compiler can often
-0.330517	and class objects can often
-0.235069	complex digital operation can often
-0.290986	the compiler output can often
-0.235069	cases. Database queries can often
-0.354573	multiple processes because it often
-0.352689	with profiling, but it often
-0.293996	overloaded operators. Vectorized code often
-0.541971	version. The Gnu compiler often
-0.346317	in all, it will often
-0.348615	processor). Optimizing compilers will often
-0.234552	the preferred language will often
-0.343110	Time- consuming library functions often
-0.457354	declaration and the most often
-0.593279	then put the most often
-0.324354	automatically choose the most often
-0.315575	operand that is most often
-0.324579	a new vector size often
-0.236904	be obsolete. Programmers very often
-0.236696	the compiler e.g. how often
-0.292670	Unix-like systems. Mac systems often
-0.379375	disk or other hardware often
-0.224850	expressions also occur quite often
-0.224850	up, which happens quite often
-0.290603	expression. The size conversion often
-0.709832	on a hard disk often
-0.322055	statements because switch statements often
-0.228058	expansions. Programmers do, however, often
-0.199931	old version. Updating mechanisms often
-0.165076	Even big software companies often
-0.165076	common error that hackers often
-0.165076	same source file. Keep often
-0.358005	provoked here in a rather
-0.461447	directly into the code rather
-0.476753	the floating point code rather
-0.421822	calculations at compile time rather
-0.595854	resolved at compile time rather
-0.237511	server in full use rather
-0.285767	store x in memory rather
-0.285767	is stored in memory rather
-0.293439	using a 64-bit integer rather
-0.237025	advantageous to use float rather
-0.293244	modify an existing object rather
-0.236969	addresses for one array rather
-0.287676	typically aligned by 8 rather
-0.232158	sees the constant 8 rather
-0.520313	'this' in a register rather
-0.236550	is a class template rather
-0.200814	local variables in registers rather
-0.293971	be transferred in registers rather
-0.151863	are transferred in registers rather
-0.380867	advantages of using pointers rather
-0.236379	access or cache access rather
-0.846752	by the operating system rather
-0.337413	entries use 64 bits rather
-0.236303	number we get 0 rather
-0.236208	contains only six instructions rather
-0.236284	Optimizing for present processors rather
-0.236186	is executed 10 times rather
-0.562497	memory on the stack rather
-0.236050	use standard API calls rather
-0.328964	key in the container rather
-0.404286	setting the flush-to-zero mode rather
-0.351895	using CPU clock cycles rather
-0.234727	indexes, working with sets rather
-0.417781	using a software implementation rather
-0.321205	using a template parameter rather
-0.328103	operands are integer expressions rather
-0.438127	of using static linking rather
-0.288745	advantages of using references rather
-0.584339	the program is loaded rather
-0.232428	it in one operation rather
-0.231580	give the result 100 rather
-0.326879	of specific processor models rather
-0.343240	right from the beginning rather
-0.284085	done in big blocks rather
-0.227972	single call to memcpy rather
-0.226463	logarithm of each factor rather
-0.317712	of the execution units rather
-0.224687	in a single step rather
-0.307423	memory needed in advance rather
-0.212010	use truncation towards zero, rather
-0.212010	the multiplication of xxn rather
-0.212010	runtime libraries and frameworks, rather
-0.199746	give the result -56 rather
-0.199746	is only calculated once, rather
-0.164906	write if(!a && !b) rather
-0.164906	it defines electrical connections rather
-0.164906	be calculated as (b*2.0)/3.0 rather
-0.164906	we are using unions rather
-0.164906	best on processor X?" rather
-0.164906	the code that matters rather
-0.164906	sets the CPU supports, rather
-0.164906	of good development tools, rather
-0.164906	core is running at, rather
-0.592088	important part of the optimization
-0.356768	programming language when the optimization
-0.349063	therefore cannot do the optimization
-0.456479	Manual on using the optimization
-0.347839	were observed between the optimization
-0.237398	important to focus the optimization
-0.237398	program and concentrate the optimization
-1.012072	do a lot of optimization
-0.172327	A higher level of optimization
-0.172327	the highest level of optimization
-0.293534	a high degree of optimization
-0.254691	line options relevant to optimization
-0.254691	and keywords relevant to optimization
-0.171878	where the obstacles to optimization
-0.171878	Some important obstacles to optimization
-0.023417	option. 8.4 Obstacles to optimization
-0.023417	77 8.4 Obstacles to optimization
-0.023417	compilers. 8.3 Obstacles to optimization
-0.023417	74 8.3 Obstacles to optimization
-0.236029	developer.intel.com. Many advices on optimization
-0.236029	2001. Advanced book on optimization
-0.236029	Processors". www.amd.com. Advices on optimization
-0.237704	titles. Literature on code optimization
-0.293927	to rely on compiler optimization
-0.364974	safer to do this optimization
-0.254500	compilers will do this optimization
-0.357582	option for whole program optimization
-0.273515	"__attribute__((visibility("hidden")))". Use whole program optimization
-0.064178	Profile-guided optimization Whole program optimization
-0.064178	a program. Whole program optimization
-0.064178	optimization /Og Whole program optimization
-0.237015	benefit from its many optimization
-0.339490	traditionally considered a software optimization
-0.232937	goal of 18 software optimization
-0.215078	software in C++ An optimization
-0.215078	and VIA CPUs: An optimization
-0.215078	software in C++: An optimization
-0.215078	in assembly language: An optimization
-0.349868	doesn't provide the best optimization
-0.235994	used for giving specific optimization
-0.235913	Linux. Has many good optimization
-0.221663	in applying the various optimization
-0.506968	C++ compilers have various optimization
-0.234069	all relevant options. Many optimization
-0.272215	other hand, if your optimization
-0.218523	to CriticalFunction. If your optimization
-0.230207	with all the relevant optimization
-0.030851	version with all relevant optimization
-0.030851	out with all relevant optimization
-0.030851	Even with all relevant optimization
-0.268387	and suggestions for my optimization
-0.215139	Please note that my optimization
-0.420597	is possible to insert optimization
-0.088569	long latencies. 8.5 Compiler optimization
-0.088569	by CPU.............................................................................81 8.5 Compiler optimization
-0.218438	abstraction which makes detailed optimization
-0.218481	cross-module optimizations when interprocedural optimization
-0.212171	doing whole program 81 optimization
-0.212171	map file /Fm Generate optimization
-0.035760	Mac systems. 14 Specific optimization
-0.035760	......................................................................... 130 14 Specific optimization
-0.165050	true/false Loopunrolling x-xxxx--x Profile-guided optimization
-0.165050	-Ofast /O3 -O3 Interprocedural optimization
-0.165050	consumers. Choose the strongest optimization
-0.331886	compilers. This includes the libraries
-0.237817	to date. Mac The libraries
-0.237720	the operating system or libraries
-0.519815	2.6 Choice of function libraries
-0.283267	platforms. Comparison of function libraries
-0.105352	only compilers and function libraries
-0.105352	Intel compilers and function libraries
-0.276067	be used for function libraries
-0.295726	other compilers or function libraries
-0.221926	used that most function libraries
-0.200930	that the Intel function libraries
-0.200930	function in Intel function libraries
-0.221926	107). The Gnu function libraries
-0.221926	for details. Use function libraries
-0.221926	code. The best function libraries
-0.096939	in vectors. These function libraries
-0.096939	Performance Primitives". These function libraries
-0.221926	best. Some common function libraries
-0.276067	the best optimized function libraries
-0.221926	There are various function libraries
-0.295726	unit. Various graphics function libraries
-0.221926	speed-critical functions. Many function libraries
-0.276067	best optimized math function libraries
-0.221926	justified for general function libraries
-0.221926	support SSE. Several function libraries
-0.221926	possible to distribute function libraries
-0.221926	before you. Optimized function libraries
-0.324839	math libraries: long vector libraries
-0.237429	recommended to try different libraries
-0.293614	Faster than most other libraries
-0.237357	function, though not all libraries
-0.333134	and other container class libraries
-0.311139	Table 12.4. Vector class libraries
-0.348844	price, and in most libraries
-0.350901	sets. However, the Intel libraries
-0.234624	also work when Intel libraries
-0.335259	classes defined in two libraries
-0.331540	functions linked from static libraries
-0.323661	libraries. www.agner.org/optimize/#vectorclass All these libraries
-0.068117	and all the dynamic libraries
-0.068117	distribute all the dynamic libraries
-0.197349	dynamic libraries. The dynamic libraries
-0.197349	.a), but not dynamic libraries
-0.197349	one or more dynamic libraries
-0.327202	share the same dynamic libraries
-0.197349	will make all dynamic libraries
-0.461227	14.11 Static versus dynamic libraries
-0.342806	versions instead. The Gnu libraries
-0.292203	be useful for large libraries
-0.226732	and position-independent code Function libraries
-0.226732	versus dynamic libraries Function libraries
-0.333279	purposes. Unfortunately, the standard libraries
-0.225656	Most compilers include standard libraries
-0.234420	package, including all runtime libraries
-0.234074	are discussed below. Many libraries
-0.234123	runtime DLL's (dynamically linked libraries
-0.039667	either as static link libraries
-0.313384	can make dynamic link libraries
-0.232682	caching less efficient. Dynamic libraries
-0.226708	libraries. Several special purpose libraries
-0.212177	to write _mm_add_epi16(a,b). Two libraries
-0.212177	Boost collection contains well-tested libraries
-0.212177	the SVML and LIBM libraries
-0.357451	template specialization. This is how
-0.018117	for an example of how
-0.037022	shows an example of how
-0.062628	90 for examples of how
-0.062628	103 for examples of how
-0.062628	www.agner.org/optimize/cppexamples.zip for examples of how
-0.236552	on my study of how
-0.292673	a basic understanding of how
-0.351882	into machine code and how
-0.152708	function is called and how
-0.237032	can look like and how
-0.209009	(see page 130 for how
-0.209009	(See page 130 for how
-0.291581	See page 120 for how
-0.023346	See page 122 for how
-0.235592	See page 107 for how
-0.291581	appendix at www.agner.org/optimize/cppexamples.zip for how
-0.347877	a loop depends on how
-0.316615	the memory, depending on how
-0.316615	example 12.4a, depending on how
-0.235193	test theory. Advice on how
-0.297298	problems. More details about how
-0.223252	a few comments about how
-0.223252	"Instruction tables". Tips about how
-0.333242	account. You can calculate how
-0.279419	function call to count how
-0.224883	counter variables that count how
-0.346547	compiler generates to see how
-0.169219	The following example shows how
-0.122284	InstructionSet().The following example shows how
-0.135979	expression. Example 12.4b shows how
-0.135979	on page 39 shows how
-0.130791	obstacles and to know how
-0.130791	in order to know how
-0.130791	is useful to know how
-0.130791	you want to know how
-0.310532	is useful for checking how
-0.287815	profiler that can tell how
-0.231885	result in a[i]. Note how
-0.092437	part. It is discussed how
-0.092437	considerations. It is discussed how
-0.230674	tell the compiler e.g. how
-0.229968	millisecond. The profiler counts how
-0.261064	the program to measure how
-0.184645	test data and measure how
-0.224825	example only to show how
-0.222276	casting operator that specifies how
-0.222241	on advanced C++ programming, how
-0.296100	therefore important to understand how
-0.218481	The following examples explain how
-0.218523	little or no idea how
-0.766655	The following example illustrates how
-0.212171	the factors that decide how
-0.074731	smaller. This manual discusses how
-0.074731	languages. This section discusses how
-0.199903	you are in doubt how
-0.165050	The next chapter describes how
-0.495566	better version of the code.
-0.495566	final version of the code.
-0.711141	debug version of the code.
-1.048088	same part of the code.
-0.756780	other parts of the code.
-0.522773	used parts of the code.
-1.014853	the rest of the code.
-0.493186	other variable in the code.
-0.643059	hot spots in the code.
-0.350443	specific places in the code.
-0.350443	require modifications in the code.
-0.350443	assigned previously in the code.
-0.345621	can possibly improve the code.
-0.236705	well it optimizes the code.
-0.236705	it for improving the code.
-0.538178	optimizes a piece of code.
-0.319454	a particular piece of code.
-0.331291	joining identical pieces of code.
-0.293526	or small sequences of code.
-0.237665	interpret that string as code.
-0.237560	the microprocessor handles this code.
-0.441577	part of the instruction code.
-0.487035	incompatible with floating point code.
-0.634399	before any floating point code.
-0.346006	x87 style floating point code.
-0.420694	simple reductions on integer code.
-0.353461	default anyway in 64-bit code.
-0.442998	with C or C++ code.
-0.292990	make code that makes code.
-0.354691	after executing the critical code.
-0.279615	further discussion of system code.
-0.279615	resource use in system code.
-0.225056	are intended for system code.
-0.329806	check for the error code.
-0.140698	not produce any extra code.
-0.181369	without adding any extra code.
-0.332358	a pointer in assembly code.
-0.229372	and understand compiler-generated assembly code.
-0.310759	but not the compiled code.
-0.295793	well as directly compiled code.
-0.221982	with a fully compiled code.
-0.318956	or compiling the intermediate code.
-0.333551	compilation of an intermediate code.
-0.344246	separated from the application code.
-0.234723	useful for vectorizing mathematical code.
-0.390516	appear in the source code.
-0.132887	with the same source code.
-0.132887	from the same source code.
-0.265167	features of the position-independent code.
-0.212289	the costs of position-independent code.
-0.232241	prevents a faster vectorized code.
-0.230725	distributed as binary executable code.
-0.606867	that gives the simplest code.
-0.304251	the so-called position- independent code.
-0.276398	of C++ and Fortran code.
-0.222297	the loader. 2. Position-independent code.
-0.222218	that relate to CPU-intensive code.
-0.218463	vectorization leads to suboptimal code.
-0.148641	not always for application-specific code.
-0.148641	than in optimizing application-specific code.
-0.465887	AVX code to non-AVX code.
-0.199881	on longjmp in time-critical code.
-0.165029	specific optimizations in precompiled code.
-0.500806	mispredicted 50% of the time.
-0.142474	only 10% of the time.
-0.142474	true 10% of the time.
-0.272368	small bit at a time.
-0.501191	one line at a time.
-0.272368	one square at a time.
-0.272368	few kilobytes at a time.
-0.857778	take a lot of time.
-0.308015	a significant amount of time.
-0.308015	a considerable amount of time.
-0.606948	takes only slightly more time.
-0.274505	used at the same time.
-0.182772	variable at the same time.
-0.182772	multiplication at the same time.
-0.182772	things at the same time.
-0.182772	DLL at the same time.
-0.235507	access rather than CPU time.
-0.235507	overhead which consumes CPU time.
-0.237289	50% or less each time.
-0.237293	which functions take most time.
-0.237039	that the branching takes time.
-0.637451	take quite a long time.
-0.353676	access it the first time.
-0.247708	code takes no extra time.
-0.247708	conversion takes no extra time.
-0.296163	precisions take no extra time.
-0.262744	them again takes extra time.
-0.318695	doesn't take any extra time.
-0.222341	in relation to execution time.
-0.276538	program compactness, and execution time.
-0.361254	to the total execution time.
-0.128213	calculate it at compile time.
-0.128213	the compiler at compile time.
-0.128213	as possible at compile time.
-0.128213	are available at compile time.
-0.128213	b) etc. at compile time.
-0.128213	are done at compile time.
-0.186675	be calculated at compile time.
-0.682496	is known at compile time.
-0.131454	not known at compile time.
-0.190891	integer known at compile time.
-0.190891	constant known at compile time.
-0.406628	always resolved at compile time.
-0.128213	are instantiated at compile time.
-0.235780	to the total calculation time.
-0.235500	loaded or at run time.
-0.310920	efficiency, portability and development time.
-0.234460	another way than last time.
-0.297690	unsigned integer takes longer time.
-0.105070	multiplication would take longer time.
-0.105070	and division take longer time.
-0.105070	additions. Divisions take longer time.
-0.096000	compilers. Dispatch at load time.
-0.096000	need relocation at load time.
-0.232959	below. Dispatch at installation time.
-0.324804	other calculations to save time.
-0.338529	fraction of the total time.
-0.119261	waste of the user's time.
-0.119261	can steal the user's time.
-0.199925	to the total computation time.
-0.579650	different value of the template
-0.783278	new instance of the template
-0.356675	class name and the template
-0.357979	function. But in the template
-0.656641	the sense that the template
-0.357409	into one if the template
-0.447815	are efficient because the template
-0.346447	is faster because the template
-0.355887	the same. If the template
-0.793418	the above example, the template
-0.237879	common sub-expressions. Why is template
-0.356638	Here CParent is a template
-0.460552	typedef instead of a template
-0.350307	function parameter and a template
-0.455130	would know that a template
-0.344917	is given as a template
-0.344917	is provided as a template
-0.349452	to read. If a template
-0.737682	advantage of using a template
-0.231530	child class through a template
-0.231530	generation class through a template
-0.347435	for each set of template
-0.345079	of virtual functions. The template
-0.655682	rather than a function template
-0.237712	7.34b. Replace macro by template
-0.324138	this calculation implemented with template
-0.236901	express any algorithm with template
-0.291950	type and size as template
-0.313177	child class name as template
-0.235916	many different factors as template
-0.349332	templates. Two or more template
-0.611189	at compile time. A template
-0.231660	of template parameters. A template
-0.231660	templates for polymorphism A template
-0.231660	cases. 7.28 Templates A template
-0.231660	to do so). A template
-0.453215	template is a class template
-0.293451	15.1d. Integer power using template
-0.339723	complicated? Because the C++ template
-0.233888	need it. In C++ template
-0.381633	are cases, however, where template
-1.716553	a power of 2 template
-0.236619	Replace macro by template template
-0.323072	N-1)==0,N>::p(x); } // Use template
-0.346534	} }; // Function template
-0.431224	alternatives to the standard template
-0.300122	container classes. The standard template
-0.347371	array using the above template
-0.234773	to use this complicated template
-0.400268	Array with bounds checking template
-0.306844	the power of N template
-0.281433	a power of 2: template
-0.067991	recursive templates. The powN template
-0.067991	the template. The powN template
-0.199875	function template because partial template
-0.008667	} }; // Full template
-0.330612	as a template parameter: template
-0.465877	return x * m;} template
-0.165024	are implemented by (partial) template
-0.165024	how tortuous and convoluted template
-0.165024	end with a non-recursing template
-0.165024	} }; // Partial template
-0.294243	of ebx. Only the registers
-1.001380	where the number of registers
-0.399205	systems: The number of registers
-0.399205	27 The number of registers
-0.521221	and the type of registers
-0.327543	32-bit systems, but in registers
-0.344014	and local variables in registers
-0.777501	can be stored in registers
-0.426388	that are stored in registers
-0.602688	variables are stored in registers
-0.143216	to be transferred in registers
-0.101209	would be transferred in registers
-0.116291	parameters are transferred in registers
-0.531909	can be returned in registers
-0.359402	size of the vector registers
-0.275014	CPU. If the vector registers
-0.275014	of using the vector registers
-0.316515	The size of vector registers
-0.313988	two 128- bit vector registers
-0.225493	Windows). The XMM vector registers
-0.299957	that supported 128-bit vector registers
-0.225493	eight or sixteen vector registers
-0.237497	This is advantageous because registers
-0.549568	When the floating point registers
-0.908908	types of floating point registers
-0.240121	are eight floating point registers
-0.240121	involves eight floating point registers
-0.440321	size of the integer registers
-0.288433	are approximately six integer registers
-0.232823	systems and fourteen integer registers
-0.237290	could benefit from using registers
-0.236354	the number of available registers
-0.009810	the floating point stack registers
-0.019846	The floating point stack registers
-0.236018	a register stack. These registers
-0.020151	variables in the XMM registers
-0.020151	will use the XMM registers
-0.003294	precision when the XMM registers
-0.006614	processor) when the XMM registers
-0.111649	point underflow in XMM registers
-0.052246	not check if XMM registers
-0.052246	is costly if XMM registers
-0.111649	good implementation uses XMM registers
-0.246374	registers The 128-bit XMM registers
-0.230085	there are not enough registers
-0.229182	are extended to 256-bit registers
-0.016776	instruction set and YMM registers
-0.156851	only SSE). The YMM registers
-0.291678	variables, and for saving registers
-0.005761	instruction set and ZMM registers
-0.048382	and the 512-bit ZMM registers
-0.257357	new version without the need
-0.257357	dynamic libraries without the need
-0.257357	of calculations without the need
-0.237668	variable. (This eliminates the need
-0.687675	addressing of data. The need
-0.349943	any member functions that need
-0.102240	pointers or addresses that need
-0.102240	no absolute addresses that need
-0.236211	connections. Temporary files that need
-0.236211	are allocated resources that need
-0.350072	The user may not need
-0.350950	relative references do not need
-0.312686	IPP library does not need
-0.312686	the list does not need
-0.312686	other hand, does not need
-0.331981	condition. Things that may need
-0.579642	of the program may need
-0.231020	An allocated array may need
-0.323341	the case we may need
-0.639716	CPU cores. You may need
-0.231020	The program logic may need
-0.231020	Critical device drivers may need
-0.353030	same algorithm, then you need
-0.350061	natural ordering? If you need
-0.308939	in C++ so you need
-0.308939	are not sure you need
-0.529201	In other words, you need
-0.237450	set. Therefore, you only need
-0.939311	then there is no need
-0.336152	called. There is no need
-0.336152	references. There is no need
-0.336152	returns. There is no need
-0.336152	execution. There is no need
-0.336152	43). There is no need
-0.336152	87). There is no need
-0.221218	is inlined - no need
-0.915101	members of a class need
-0.293537	and Gnu). Other compilers need
-0.329042	constant n, then we need
-0.226378	is evicted before we need
-0.281115	In this case we need
-0.226378	or CPU cores, we need
-0.236987	two different registers. You need
-0.623686	all the dynamic libraries need
-0.350572	and current operating systems need
-0.110191	function because it doesn't need
-0.417859	temp. The compiler doesn't need
-0.208328	constructors. A class doesn't need
-0.208328	if the object doesn't need
-0.312538	variable. The different threads need
-0.292104	in a high-level language need
-0.291899	and 14.30 will therefore need
-0.311590	and the object files need
-0.187121	local data that don't need
-0.249101	long as you don't need
-0.249101	object then you don't need
-0.293522	range and we don't need
-0.187121	static if they don't need
-0.377994	more. Many software applications need
-0.331893	twice because both the pointers
-0.165616	to a table of pointers
-0.165616	and a table of pointers
-0.165616	has a table of pointers
-0.102570	than type casting of pointers
-0.102570	safer. Type casting of pointers
-0.237004	to the programmer that pointers
-0.237004	the compiler explicitly that pointers
-0.537063	standard C, specifying that pointers
-0.237699	to exchange data or pointers
-0.056861	table // of function pointers
-0.334989	Virtual functions and function pointers
-0.351190	valid addresses, or if pointers
-0.534839	container rather than by pointers
-0.293974	to do things with pointers
-0.533724	enum as well as pointers
-0.313399	are always transferred as pointers
-0.379499	safer to use than pointers
-0.354872	using references rather than pointers
-0.235515	class objects (rather than pointers
-0.293824	structures that typically use pointers
-0.357354	so as to make pointers
-0.237336	that you analyze all pointers
-0.990953	The advantages of using pointers
-0.326022	are disadvantages of using pointers
-0.237106	assignment. shared_ptr allows multiple pointers
-0.230981	C, specifying that two pointers
-0.323293	the difference between two pointers
-0.230981	an addition. Comparing two pointers
-0.221001	complicated implementation of member pointers
-0.275020	cause complications with member pointers
-0.301256	complications that make member pointers
-0.221001	control the way member pointers
-0.221001	model fast=2 Simple member pointers
-0.236114	expressions as arguments while pointers
-0.130993	arrays are accessed through pointers
-0.134053	structures are accessed through pointers
-0.165977	in fact accessed through pointers
-0.425988	optimize code that uses pointers
-0.098729	decrement operators. 7.7 Function pointers
-0.098729	............................................................................................ 36 7.7 Function pointers
-0.235255	problem: 1. Relocation. All pointers
-0.234704	a simple variable. Using pointers
-0.233719	used for the link pointers
-0.231219	the new block. Any pointers
-0.203310	common implementations of smart pointers
-0.203310	purpose of using smart pointers
-0.316573	register variables. This includes pointers
-0.333814	program has to keep pointers
-0.229962	matters. Problems with invalid pointers
-0.324041	to zero, by setting pointers
-0.229057	data section may contain pointers
-0.017510	member pointer. 7.9 Smart pointers
-0.017510	Member pointers.......................................................................................................37 7.9 Smart pointers
-0.035755	pointer is deleted. Smart pointers
-0.035755	than for auto_ptr. Smart pointers
-0.251171	has changed. 7.8 Member pointers
-0.165024	of pointers, by initializing pointers
-0.358343	to test in the test
-0.356820	memory used by the test
-0.353797	thread priority before the test
-0.341508	then output after the test
-0.555309	convenient to make a test
-0.337022	Then you make a test
-0.664951	recommended to put a test
-0.237349	I have developed a test
-0.429738	use a set of test
-0.228773	a typical set of test
-0.228773	a suitable set of test
-0.429738	a realistic set of test
-0.342776	// Critical function to test
-0.345812	piece of code to test
-0.518985	you may have to test
-0.590322	set in order to test
-0.350701	The typical way to test
-0.311483	fraction. For example, to test
-0.751806	122 for how to test
-0.334603	Advice on how to test
-0.235182	uses XMM registers to test
-0.515887	words, you need to test
-0.311483	Number of times to test
-1.225870	it is necessary to test
-0.333436	is more relevant to test
-0.291115	are two things to test
-0.235182	is common practice to test
-0.237828	code branches separately and test
-1.052351	can be useful in test
-0.340793	of test data. The test
-0.237096	time stamp counter. The test
-0.237096	and hot spots. The test
-0.293693	of a variable for test
-0.382194	any code branch for test
-0.237807	~, <<, >> can test
-0.486117	< 0) { // test
-0.633110	& 0x7FFFFFFF) { // test
-0.237710	not a textbook on test
-0.237509	the critical code. A test
-0.314167	behave differently on different test
-0.450533	limit, then you should test
-0.352712	Time difference for each test
-0.288681	to include a performance test
-0.233041	considered. A realistic performance test
-0.236819	times // Time before test
-0.046748	2; } }; void test
-0.225324	function swapd(a[r][c], a[c][r]); void test
-0.349102	fast in a simple test
-0.329581	work correctly. The speed test
-0.339588	to make a small test
-0.303793	algebraic reductions in my test
-0.268398	the manual for my test
-0.130698	in the program under test
-0.130698	If the program under test
-0.199960	performance monitor counters. My test
-0.199960	have been identified. My test
-0.399237	rather than a dedicated test
-0.212235	CPUs have a built-in test
-0.165060	seen in the unit- test
-0.356956	the parameters of the new
-1.220360	the beginning of the new
-0.497849	contents copied to the new
-0.353795	to adapt to the new
-0.656602	to wait for the new
-0.548638	unstable or if the new
-0.314039	user never uses the new
-0.293553	application programmer gets the new
-0.354264	the LLVM is a new
-0.349486	full advantage of a new
-0.349486	the insertion of a new
-0.352620	be updated to a new
-0.350460	of memory for a new
-0.349344	may require that a new
-0.272210	this every time a new
-0.272210	updated every time a new
-0.222687	(methods) Each time a new
-0.836541	is to use a new
-0.341514	branch only when a new
-0.947115	is to make a new
-0.504525	needs to make a new
-0.434616	rather than making a new
-0.233287	processors that support a new
-0.376398	writes to load a new
-0.327304	it will generate a new
-0.101164	possible to start a new
-0.101164	it can start a new
-0.288960	to begin calculating a new
-0.110401	necessary to allocate a new
-0.110401	delete to allocate a new
-0.120878	string classes allocate a new
-0.233287	language Before starting a new
-0.233287	verifying and maintaining a new
-0.233287	this by assigning a new
-0.233287	it and create a new
-0.237850	facilities are needed, and new
-0.233778	de-allocation of memory with new
-0.233778	construct an object with new
-0.076854	memory Memory allocated with new
-0.076854	include: Memory allocated with new
-0.233778	dynamic memory allocation with new
-0.233778	be allocated dynamically with new
-0.237707	of the problem. This new
-0.333118	new features to each new
-0.234878	of CPU development, each new
-0.342990	little-known alternative to using new
-0.236276	please install this important new
-0.236004	the instruction set. These new
-0.235704	wstring or CString uses new
-0.235567	done with the operators new
-0.328780	for software to add new
-0.448742	update when the next new
-0.232622	usability problems and desired new
-0.231822	Microprocessor producers keep adding new
-0.229141	that what is brand new
-0.222318	advantages of alloca over new
-0.165097	update mechanism to advertise new
-0.165097	always happy to receive new
-0.165097	is allocated dynamically (with new
-0.293979	is not portable to systems
-0.314485	not easily ported to systems
-0.294192	a separate thread in systems
-0.237407	little-endian storage, but other systems
-0.331223	are available in all systems
-0.319128	certain that the 64-bit systems
-0.326342	are available in 64-bit systems
-0.326342	be avoided in 64-bit systems
-0.275875	future. 6 The 64-bit systems
-0.301861	necessary to use 64-bit systems
-0.319417	usually 32. In 64-bit systems
-0.324669	resource use on such systems
-0.454224	less efficient in some systems
-0.223454	systems and in 32-bit systems
-0.421319	register variables in 32-bit systems
-0.223454	32 bits in 32-bit systems
-0.223454	4 bytes in 32-bit systems
-0.223454	double precision in 32-bit systems
-0.223454	is eight in 32-bit systems
-0.223454	approximately six in 32-bit systems
-0.205644	64 bits, but 32-bit systems
-0.205644	64-bit integers. Many 32-bit systems
-0.298833	more efficient. 64 bit systems
-0.298833	(option -fno-pic). 64 bit systems
-0.432651	similarity between the operating systems
-0.284261	most CPUs and operating systems
-0.212232	modern microprocessors and operating systems
-0.212232	which platforms and operating systems
-0.161319	use a different operating systems
-0.161319	because the two operating systems
-0.127098	32-bit and 64-bit operating systems
-0.127098	compiled for 64-bit operating systems
-0.161319	mode and some operating systems
-0.035065	available in 32-bit operating systems
-0.035065	purposes in 32-bit operating systems
-0.161319	systems, though these operating systems
-0.161319	Windows and Linux operating systems
-0.073212	in the old operating systems
-0.073212	crash on old operating systems
-0.161319	features, and current operating systems
-0.161319	core. Unfortunately, contemporary operating systems
-0.161319	4, etc.). Older operating systems
-0.161319	Gnu, Clang Supported operating systems
-0.216971	less intensive applications. Some systems
-0.216971	speed is important. Some systems
-0.216971	is intended for. Some systems
-0.216971	graphics accelerator card. Some systems
-0.236000	and Windows 3.x. These systems
-0.931032	Linux, BSD and Mac systems
-0.223469	in Unix-like systems. Mac systems
-0.328654	32-bit integers in 16-bit systems
-0.018514	17 Optimization in embedded systems
-0.218516	of compatibility with existing systems
-0.284243	comparison. On big endian systems
-0.212258	way to fully utilize systems
-0.347374	Shared objects in Unix-like systems
-0.199948	in registers. 64-bit Unix systems
-0.165091	should be used. Web systems
-0.503020	take care of the user
-0.451174	project goes to the user
-0.451174	are annoying to the user
-0.349105	is unacceptable to the user
-0.355644	is terminated and the user
-0.356269	are unnecessary for the user
-0.574057	code is that the user
-0.348556	so long that the user
-0.639367	is important that the user
-0.342910	is overloaded or the user
-0.535677	resources, even if the user
-0.451256	big problem if the user
-0.491418	file, especially if the user
-0.355833	CPU time on the user
-0.457682	lower priority than the user
-0.354093	different priorities then the user
-0.312607	is no way the user
-0.236125	a word processor the user
-0.236125	should never interrupt the user
-0.236125	recommended to place the user
-0.236125	or even telling the user
-0.236125	the system forbids the user
-0.353632	it unusual that a user
-0.347209	inconvenient times when a user
-0.293340	and easy development of user
-0.351777	test data instead of user
-0.737326	that the choice of user
-0.128344	12 2.7 Choice of user
-0.128344	undocumented. 2.7 Choice of user
-0.641633	the response time to user
-0.237853	software be reinstalled and user
-0.338435	for Linux systems. The user
-0.537285	the program starts. The user
-0.237116	size is insufficient. The user
-0.441844	longer response times for user
-0.113565	its time waiting for user
-0.113565	their time waiting for user
-0.236355	program that waits for user
-0.236355	can handle. Waiting for user
-0.625412	the same time. A user
-0.293707	color settings and different user
-0.293276	tools. The simplest possible user
-0.743030	This is a very user
-0.236797	compilers available, though less user
-0.291395	programmer can use standard user
-0.321795	disadvantage for the end user
-0.105018	delay that the end user
-0.105018	unlikely that the end user
-0.243816	time before the end user
-0.232658	of the code, including user
-0.068024	by dropping the graphical user
-0.010623	loop of a graphical user
-0.010623	menus of a graphical user
-0.021512	application has a graphical user
-0.068024	3.10 Graphics A graphical user
-0.068024	program their own graphical user
-0.301378	a database for storing user
-0.222300	development tools. A popular user
-0.199959	the new features. Take user
-0.290529	access to all of these
-0.493760	belong to one of these
-0.330316	libraries for many of these
-0.224164	bypassed by any of these
-0.298381	(RTTI) If any of these
-0.224164	microprocessors without any of these
-0.340860	mechanisms, and some of these
-0.350759	The latest versions of these
-1.215548	can take advantage of these
-0.101673	message systems. All of these
-0.101673	formats. Comments All of these
-0.334278	a hardware implementation of these
-0.290529	the code. Many of these
-0.643819	to be aware of these
-0.501190	for the availability of these
-0.234667	case" values. Which of these
-0.310869	An OR combination of these
-0.234667	columns. Every fourth of these
-0.325363	Fortunately, the solution to these
-0.341381	by another function and these
-0.237558	as Boolean vectors, and these
-0.314698	The code examples in these
-0.347246	contains various functions for these
-0.237072	complete code examples for these
-0.237072	transfer is avoided for these
-0.747226	you can assume that these
-0.293671	an explanation. Note that these
-0.293671	patch. 131 Note that these
-0.381018	etc. But beware that these
-0.237706	actually doing something on these
-0.335935	container classes that use these
-0.235646	bits total size, because these
-0.291642	is very problematic because these
-0.346554	the stack for all these
-0.235277	shared object. Obviously, all these
-0.235300	and fence instructions, but these
-0.235300	references to relocate, but these
-0.237096	the same variables. In these
-0.236888	a sensible balance between these
-0.292975	waste of resources. For these
-0.345457	assembly code to access these
-0.799595	various ways to avoid these
-0.314714	allocation. You should avoid these
-0.236051	for performance reasons. Use these
-0.236056	same cache line. But these
-0.257871	several different purposes. All these
-0.205823	or error prone. All these
-0.205823	in table 9.2. All these
-0.205823	external libraries. www.agner.org/optimize/#vectorclass All these
-0.235169	best Java implementations. However, these
-0.232622	lrintf and lrint. Unfortunately, these
-0.630208	no way to tell these
-0.231322	X operating systems, though these
-0.458762	the compiler will convert these
-0.230039	the diagonal and swap these
-0.324059	brand simply by setting these
-0.439352	discusses how to overcome these
-0.307536	Make sure to distinguish these
-0.165050	and attempts to translate these
-0.485715	and compatibility problems and they
-0.711527	the same thing and they
-0.313394	for many programmers and they
-0.236784	to 32-bit integers, and they
-0.236784	be different sizes, and they
-0.341773	used in so that they
-0.341773	b different so that they
-1.067627	to make sure that they
-0.472752	and make sure that they
-0.236206	the same reason that they
-0.237734	they are needed, or they
-0.723205	declared inside the function they
-0.317874	valid values or if they
-0.317874	not overlap or if they
-0.231394	making them static if they
-0.296335	different arrays even if they
-0.296335	be mispredicted even if they
-0.231394	have other values if they
-0.231394	also stored together if they
-0.286808	cause fatal errors if they
-0.231394	time, but expensive if they
-0.231394	are relatively cheap if they
-0.237708	use unsigned integers - they
-0.349616	are evaluated every time they
-0.289059	keep track of when they
-0.344069	be loaded only when they
-0.289059	performance monitor counters when they
-0.233374	chains is stronger when they
-0.301252	less optimal code because they
-0.320645	are equally efficient because they
-0.305727	of program performance because they
-0.226584	are particularly critical because they
-0.226584	the Boolean operators because they
-0.516510	should be avoided because they
-0.281348	are relatively costly because they
-0.215209	the function in which they
-0.313294	the order in which they
-0.185390	the thread in which they
-0.235311	-(-a) = a, but they
-0.235311	define 64-bit integers, but they
-0.531554	errors in cases where they
-0.339862	the only situation where they
-0.337039	zero whenever the objects they
-0.339121	the type of objects they
-0.236826	in several stages before they
-0.419513	memory, depending on how they
-0.509051	because in most cases they
-0.366887	registers, regardless of whether they
-0.366887	compilers to see whether they
-0.235042	resources than the programs they
-0.235131	transferred as pointers unless they
-0.234609	efficient, and that's what they
-0.290139	doing things only after they
-0.288895	situations, and which reductions they
-0.230054	floating point calculations whenever they
-0.165076	pointers and the texts they
-0.102618	multiple versions with and without
-0.102618	code compiled with and without
-0.331047	the innermost loop and without
-0.237652	be represented with or without
-0.293692	write directly to memory without
-0.237435	method of storing data without
-0.314151	in an application program without
-0.539675	__intel_cpu_features_init_x() does the same without
-0.343042	of most library functions without
-0.501406	overflow outside the loop without
-0.581390	manuals can be used without
-0.605835	floating point to integer without
-0.293476	Microsoft or Gnu compilers without
-0.335340	function stores a double without
-0.290691	compile the shared object without
-0.459499	compile a shared object without
-0.324284	to a new version without
-0.335538	to make shared objects without
-0.340360	the same dynamic libraries without
-0.236464	in example 11.3 even without
-0.876690	of object oriented programming without
-0.511873	the floating point operations without
-0.292326	able to reorder instructions without
-0.236309	compiled for old processors without
-0.236188	handle an unrecoverable error without
-0.313362	reduced performance on CPUs without
-0.625296	the sequence of calculations without
-0.236119	used as command-line versions without
-0.408924	the program is compiled without
-0.260643	and main() are compiled without
-0.298136	later with code compiled without
-0.260643	each process when compiled without
-0.208281	A shared object compiled without
-0.327495	SSE Store 4 bytes without
-0.318363	SSE2 Store 8 bytes without
-0.049676	SSE2 Store 16 bytes without
-0.024128	SSE Store 16 bytes without
-0.235999	want to improve speed without
-0.618284	case of an exception without
-0.348849	may use double precision without
-0.379602	size array or container without
-0.290109	programming, but in applications without
-0.531202	compatible with old microprocessors without
-0.232893	ways of handling errors without
-0.231685	prevent legitimate backup copying without
-0.333435	operands cannot be changed without
-0.286682	The disadvantage of compiling without
-0.230653	can call C1::f directly without
-0.229928	breaking out of F1 without
-0.224736	than the C-style type-casting without
-0.222191	you declare an int, without
-0.218333	obtain the desired functionality without
-0.218333	rely on a unit-test without
-0.212067	show a disassembly, probably without
-0.199802	the numbers in question without
-0.164957	can be used freely without
-0.164957	directly compiled code. (Compile without
-0.344321	intrinsic functions. This is useful
-0.344321	into memory. This is useful
-0.344321	same value. This is useful
-0.353225	improve performance. It is useful
-0.353225	Cache organization It is useful
-0.646061	console mode program is useful
-0.353198	language output, which is useful
-0.328373	library. This method is useful
-0.328373	16. This method is useful
-0.328373	2.0 This method is useful
-0.235884	optimal. Best-case testing is useful
-0.235884	manually. This principle is useful
-0.235884	The empty throw()specification is useful
-0.571874	time. This is a useful
-0.353725	library (STL) is a useful
-0.462625	but it can be useful
-0.658274	cases it can be useful
-0.392076	variable. This can be useful
-0.392076	fast. This can be useful
-0.392076	free. This can be useful
-0.392076	it). This can be useful
-0.392076	32-62. This can be useful
-0.795823	dynamic library can be useful
-0.427636	Smart pointers can be useful
-0.330392	Lazy binding can be useful
-0.330392	Lookup tables can be useful
-0.330392	code. Metaprogramming can be useful
-0.547884	then it may be useful
-0.429359	optimization it may be useful
-0.465943	72 This may be useful
-0.409735	predictable. It may be useful
-0.409735	unavoidable. It may be useful
-0.316368	Bitfields Bitfields may be useful
-0.335353	profilers available which are useful
-0.775177	These function libraries are useful
-0.480200	(ZMM). Vector operations are useful
-0.321416	compiled. #if directives are useful
-0.310877	CodeAnalyst. These profilers are useful
-0.290536	in Linux). Threads are useful
-0.290536	wrong type. References are useful
-0.234673	|, ^, ~ are useful
-0.335960	(chapter 12) are more useful
-0.324652	A profiler is most useful
-0.472633	automatically. It is also useful
-0.324273	& operator is also useful
-0.230789	It is 102 also useful
-0.420436	This library contains many useful
-0.333219	cycle counter is very useful
-0.493604	itself is a very useful
-0.275489	well tested, and very useful
-0.242794	It can be very useful
-0.242794	counters can be very useful
-0.350264	-fpie option is less useful
-0.551887	times. It is often useful
-0.232286	details (www.agner.org/optimize/testp.zip). A particularly useful
-0.229183	forums and newsgroups contain useful
-0.355837	equally near then the even
-0.556487	PCs. Therefore, it is even
-0.237842	map are prone to even
-0.237757	to be overwritten, and even
-0.237763	a hash table for even
-0.451712	but it would be even
-0.237778	a common denominator can even
-0.235761	important new update or even
-0.235761	be a hundred or even
-0.235761	code are uncached or even
-0.235761	a binary search, or even
-0.572670	standardized. It is not even
-0.236646	in a register, not even
-0.454529	... There is an even
-0.349835	legacy software. It may even
-0.291341	of RAM memory may even
-0.352545	per vector. You may even
-0.351113	soon as you have even
-0.293684	it) load into memory even
-0.293172	area for different objects even
-0.515724	scope of a variable even
-0.352336	times before the performance even
-0.802838	and in some cases even
-0.236064	spaces for different arrays even
-0.442784	will make multiple versions even
-0.228889	in some cases. An even
-0.228889	as memory leak. An even
-0.235721	OR operator (|) works even
-0.454738	take 10 clock cycles even
-0.500271	implementation in most cases, even
-0.347159	to use exception handling even
-0.340939	smallest devices, you don't even
-0.234282	performance for many applications even
-0.332550	bypassing the dispatch mechanism even
-0.290196	table lookups are needed even
-0.289923	is not an Intel, even
-0.232160	physical register to temp even
-0.232095	function is always inlined even
-0.285094	main will be used, even
-0.415819	branches can be mispredicted even
-0.281355	most likely be called, even
-0.295954	the latter is executed even
-0.222177	until the function returns even
-0.218389	the computer starts up, even
-0.218315	or require more resources, even
-0.551784	lazy binding by default, even
-0.212050	the whole program execution, even
-0.212050	that overflow never occurs, even
-0.212050	always takes memory space, even
-0.347146	variable in example 11.3 even
-0.199785	than floating point expressions, even
-0.199785	can be a time-consumer even
-0.199785	unacceptably long response times, even
-0.164942	equivalent if(!(a || b)) even
-0.164942	instructions rather than nine, even
-0.164942	happen that (b*c) overflows, even
-0.164942	for the exception handler, even
-0.556385	executable because it is sure
-0.357366	#include directives. This is sure
-0.237860	compiler cannot know for sure
-0.353056	then you cannot be sure
-0.407396	We can never be sure
-0.250597	only if you are sure
-0.250597	loop if you are sure
-0.250597	unsigned if you are sure
-0.250597	aliasing" if you are sure
-0.520429	or if they are sure
-0.334971	most cases they are sure
-0.335363	and AMD processors are sure
-0.234681	the same arguments are sure
-0.522126	if you are not sure
-0.496478	pointers is to make sure
-0.252998	You have to make sure
-1.083211	in order to make sure
-0.085353	only way to make sure
-0.217394	you want to make sure
-0.252998	is important to make sure
-0.252998	the structure to make sure
-0.083801	the programmer to make sure
-0.252998	a destructor to make sure
-0.252998	a subexpression to make sure
-0.252998	systematic manner to make sure
-0.252998	code carefully to make sure
-0.265793	if possible, and make sure
-0.265793	are aligned, and make sure
-0.385215	...). We can make sure
-0.262864	if only you make sure
-0.180820	function library then make sure
-0.180820	Intel compiler, then make sure
-0.322743	words, you must make sure
-0.194038	additional parameters. Therefore, make sure
-0.194038	and non-recoverable errors; make sure
-0.255303	a destructor that makes sure
-0.542809	is that it makes sure
-0.283592	+= x; This makes sure
-0.283592	64 bytes. This makes sure
-0.283592	table static. This makes sure
-0.203544	exception handling system makes sure
-0.203544	A const reference makes sure
-0.203544	The volatile keyword makes sure
-0.203544	than the product makes sure
-0.202342	useful way of making sure
-0.202342	good way of making sure
-0.617864	be solved by making sure
-0.221058	by a variable. Make sure
-0.221058	files and executables. Make sure
-0.165148	to be signed. Be sure
-0.351091	a macro, but the method
-0.438731	You may choose the method
-0.237752	table 9.3 shows, the method
-0.323001	are not used. The method
-0.311629	number of bits. The method
-0.235305	faster than pow The method
-0.291254	more iterations back. The method
-0.235305	in assembly language". The method
-0.379206	not be negative. The method
-0.235305	a loop count. The method
-0.235305	a Gauss elimination. The method
-0.348945	that no function or method
-0.296804	is never called. This method
-0.361933	support different CPUs. This method
-0.277097	the Intel compiler. This method
-0.222835	of the library. This method
-0.386011	into each thread. This method
-0.222835	dynamic memory allocation. This method
-0.222835	in the array. This method
-0.277097	i modulo 16. This method
-0.222835	loop are finished. This method
-0.361933	code at all. This method
-0.222835	have multiple versions. This method
-0.277097	program is loaded. This method
-0.222835	u.f < 2.0 This method
-0.222835	slightly less efficiently. This method
-0.222835	Use different executables. This method
-0.222835	must be added. This method
-0.222835	in Windows MFC). This method
-0.501270	The advantage of this method
-0.303905	to note that this method
-0.331742	You should use this method
-0.331248	of pointers because this method
-0.342032	template metaprogramming, but this method
-0.324898	You may avoid this method
-0.228817	processors, and choose this method
-0.228817	functions. 80 Unfortunately, this method
-0.237528	of memory blocks. A method
-0.331478	However, the short vector method
-0.345943	is closed. The same method
-0.334506	The choice of which method
-0.289023	is also discussed which method
-0.233343	important to consider which method
-0.484303	make the induction variable method
-0.428120	dispatch on first call method
-0.508826	problem. The most important method
-0.415332	changes the function calling method
-0.230006	} 138 A similar method
-0.230019	is used. A newer method
-0.229174	to the table. Optimization method
-0.228052	_mm_free. A more general method
-0.228086	function returns. The preferred method
-0.218479	string. The old C-style method
-0.218479	and disadvantages. The original method
-0.212211	the same effect. Which method
-0.212211	code uses an unfortunate method
-0.355312	Remove branch that is always
-1.027669	that the function is always
-0.345331	assume that b is always
-0.536920	feature that there is always
-0.313307	if possible. SSE2 is always
-0.334770	the template parameter is always
-0.292855	The code section is always
-0.427636	digits. The exponent is always
-0.653880	Another possibility is to always
-0.648290	expect a compiler to always
-0.538057	can be reduced to always
-0.574858	count is small and always
-0.651584	maximum repeat count and always
-0.510482	mispredicted. A branch that always
-0.805648	the template parameters are always
-0.323033	and intermediate results are always
-0.312455	of these manuals are always
-0.292042	and references. Arrays are always
-0.235997	(also called properties) are always
-0.382659	one : 1; // always
-0.237738	to always true or always
-0.324907	The compiler is not always
-0.512253	unfortunately this is not always
-0.529562	structures It is not always
-0.324907	instruction set is not always
-0.324907	one computer is not always
-0.324907	cross-platform compatibility is not always
-0.431879	standard libraries are not always
-0.558806	operating systems are not always
-0.305455	model numbers are not always
-0.305455	The profilers are not always
-0.337750	multiple applications, but not always
-0.422104	Intel libraries do not always
-0.325964	these directives do not always
-0.626562	of 2 does not always
-0.539364	division. The compiler will always
-0.310735	with another thread will always
-0.234555	the same core will always
-0.313240	use the #pragma vector always
-0.236655	vector always #pragma vector always
-0.236655	("internal"))) Vectorize #pragma vector always
-0.454342	set. If the cache always
-0.235101	8. The size should always
-0.291022	The installation process should always
-0.454577	assumption that the variable always
-0.346204	type, but you cannot always
-0.325302	32-bit integers, and they always
-0.595043	make sure that they always
-0.236308	A floating point constant always
-0.970337	stored on the stack always
-0.236194	implemented. The recursion must always
-0.375915	the function call statement always
-0.288625	will see that p always
-0.316367	double's. It is almost always
-0.325377	optimization manuals. I am always
-0.165081	the compiler. Remember, therefore, always
-0.458052	loops would make the access
-0.350760	memory blocks makes the access
-0.354550	of error is to access
-0.448170	need assembly code to access
-0.345147	accessed recently than to access
-0.354806	not always possible to access
-0.580092	de-referenced in order to access
-0.483948	} In order to access
-0.434467	it is faster to access
-0.285894	usually much faster to access
-0.291985	take several seconds to access
-0.590715	will be unable to access
-0.235947	the following steps to access
-0.237812	row or column. The access
-0.459957	problem here is that access
-0.556382	make the functions that access
-0.348937	faster because we can access
-0.331958	endian storage. If you access
-0.331958	systems). 42 If you access
-0.235338	file will give you access
-0.314354	no other threads have access
-0.229450	CPU access and memory access
-0.229450	is likely that memory access
-0.229450	this method if memory access
-0.229450	a bottleneck than memory access
-0.047015	ebx. 9 Optimizing memory access
-0.047015	84 9 Optimizing memory access
-0.293729	data explicitly if data access
-0.237421	is relevant when CPU access
-0.237370	linear algebra) require other access
-0.237331	file access or cache access
-0.237055	sake of fastest possible access
-0.422825	means that it cannot access
-0.413167	static member function cannot access
-0.236550	settings and different user access
-0.219521	the bottleneck is file access
-0.045369	useful to put file access
-0.045369	advantageous to put file access
-0.219521	big-endian storage. Optimizing file access
-0.906845	in order to get access
-0.328459	of instructions for fast access
-0.232963	a double which gives access
-0.243167	file access and network access
-0.192755	be controlled. The network access
-0.192755	of software with network access
-0.369102	loaded. 21 3.13 Memory access
-0.228050	data structures with non-sequential access
-0.224790	as a subset, giving access
-0.218432	prefetch data for regular access
-0.265028	146 below. 3.7 File access
-0.199897	GOT, and finally (4) access
-0.035759	.......................................................................................... 21 3.12 Network access
-0.035759	system modules. 3.12 Network access
-0.199897	language that allows direct access
-0.199897	a container for exclusive access
-0.165045	on access. Sequential forward access
-0.647596	+ 2; } } void
-0.332168	C1 x; ... } void
-0.752942	*p + 2; } void
-0.617842	_mm_storeu_si128((__m128i *)d, x); } void
-0.406863	once for each version void
-0.381391	// Loop with branch void
-0.005176	CHello { public: virtual void
-0.005176	C0 { public: virtual void
-0.206968	public: void NotPolymorphic(); virtual void
-0.322983	second by another thread void
-0.022744	transpose and copy matrix void
-0.799176	// Define vector classes void
-0.056130	<< 2; } }; void
-0.202008	b, c, d; }; void
-0.469821	virtual void f(); }; void
-0.202008	public: ... ~C1(); }; void
-0.235613	Vectorization with alignment problem void
-0.045606	into array static inline void
-0.204974	cache line: static inline void
-0.190192	dispatched function call inline void
-0.376708	class CHello { public: void
-0.222709	public CParent<CChild1> { public: void
-0.097233	class CGrandParent { public: void
-0.097233	public CGrandParent { public: void
-0.222709	public CParent<CChild2> { public: void
-0.218408	rows/columns in matrix 96 void
-0.272085	code. // Example 8.26a void
-0.212208	Example: // Example 7.12 void
-0.212142	define function type typedef void
-0.251171	improvements). // Example 8.26b void
-0.251171	transpose function swapd(a[r][c], a[c][r]); void
-0.199875	keyword: // Example 14.1c void
-0.199875	NotPolymorphic(); virtual void Disp(); void
-0.074721	x[]); void F2(float x[]); void
-0.074721	9.2a void F1(int x[]); void
-0.251171	Example: // Example 8.21 void
-0.165024	9.5b. // Example 9.5b void
-0.165024	= &SelectAddMul_dispatch; // Dispatcher void
-0.165024	{ _mm_storeu_si128((__m128i *)d, x);} void
-0.165024	c1() : x(0) {}; void
-0.165024	<math.h> #define EXCEPTION_FLT_OVERFLOW 0xC0000091L void
-0.165024	Example: // Example 8.5a void
-0.165024	to the function prototype: void
-0.165024	Example 9.3 #include <malloc.h> void
-0.165024	Example: // Example 8.25 void
-0.165024	union: // Example 9.2b void
-0.165024	Example: // Example 9.2a void
-0.165024	// Branch/loop function vectorized: void
-0.165024	#include <stdio.h> #include <asmlib.h> void
-0.314700	A short int is 16
-0.335981	each, eight integers of 16
-0.314460	within a block of 16
-0.237676	can be increased to 16
-0.237676	kb. This corresponds to 16
-0.237725	The factor sizeof(S1) = 16
-0.102326	or 16 8 or 16
-0.102326	the lower 8 or 16
-0.236444	8, 10, 12 or 16
-1.171092	Roll out loop by 16
-0.234027	// align table by 16
-0.924367	that is divisible by 16
-0.234027	operations when alignment by 16
-0.234027	and big structures by 16
-0.234027	compiler Linux Align by 16
-1.466334	parts of the code 16
-0.382541	double takes 4 - 16
-0.349874	16-bit systems: unsigned int 16
-0.326988	128 SSE2 short int 16
-0.326988	256 AVX2 short int 16
-0.326988	64 MMX short int 16
-0.233515	in 16-bit systems: int 16
-0.345054	2. Objects bigger than 16
-0.355402	disk files. See page 16
-0.293362	we reach element number 16
-0.319352	64 MMX char 8 16
-0.227792	128 Is8vec16 Vec16c 8 16
-0.227792	int64_t 64 I64vec1 8 16
-0.348319	XMM registers to test 16
-0.446049	AVX2 short int 16 16
-0.225077	int 832 256 16 16
-0.225077	Vec2d Vec8f Vec4d 16 16
-0.322995	256 AVX int 32 16
-0.306194	512 AVX512 float 32 16
-0.234662	short int 832 256 16
-0.022207	1 byte = char 16
-0.219773	MOVNTPD _mm_stream_pd SSE2 Store 16
-0.133565	MOVNTPS _mm_stream_ps SSE Store 16
-0.133565	MOVNTQ _mm_stream_pi SSE Store 16
-0.282962	simply stores the lower 16
-0.228032	variables can be 8, 16
-0.228057	Predefined macros Compiler identification 16
-0.222212	and temp++ actually adds 16
-0.218408	15 Metaprogramming ....................................................................................................... 150 16
-0.212142	a clock cycle? ...................................................................................... 16
-0.199875	int 128 Is16vec8 Vec8s 16
-0.199875	biggest time consumers ................................................................................ 16
-0.199875	find hot spots .................................................................................. 16
-0.165024	reduced 15.1a to 15.1c). 16
-0.165024	char 128 Iu8vec16 Vec16uc 16
-0.165024	unsigned char 64 Iu8vec8 16
-0.165024	Vec4f Vec2d Vec8f Vec4d 16
-0.165024	Instruction set needed _mm_shuffle_epi8 16
-0.165024	short int 64 Is16vec4 16
-0.351557	intrinsic functions for the SSE2
-0.454278	times, one for the SSE2
-0.352804	software implementation if the SSE2
-0.352804	bits (XMM) if the SSE2
-0.025526	mode or when the SSE2
-0.313123	32-bit systems when the SSE2
-0.313123	vector operations when the SSE2
-0.313123	than truncation when the SSE2
-0.313123	four float's when the SSE2
-0.306292	CPUs with only the SSE2
-0.306292	and insert only the SSE2
-0.344473	old processors without the SSE2
-0.571515	In some cases the SSE2
-0.250824	long time unless the SSE2
-0.250824	32-bit systems unless the SSE2
-0.250824	32-bit mode unless the SSE2
-0.250824	than rounding unless the SSE2
-0.329553	page 105). Using the SSE2
-0.165086	is to enable the SSE2
-0.165086	recommended to enable the SSE2
-0.208019	mode or enable the SSE2
-0.382750	with the SSE and SSE2
-0.428524	is more efficient. The SSE2
-0.330887	all modern CPUs. The SSE2
-0.237111	(see page 140). The SSE2
-0.341259	builder. Not optimized for SSE2
-0.237462	14.21. // Only for SSE2
-0.104326	>= 4) { // SSE2
-0.437017	int parm2) {...} // SSE2
-0.291429	FUNCNAME SelectAddMul_AVX2 #endif // SSE2
-0.382608	has the SSE or SSE2
-0.294061	the denormals-are-zero mode if SSE2
-0.237727	Example 12.4b. Vectorized with SSE2
-0.756252	for the instruction set SSE2
-0.335677	fprintf(stderr, "\nError: Instruction set SSE2
-0.236543	point to integer without SSE2
-0.042388	double 64 2 128 SSE2
-0.042388	long 64 2 128 SSE2
-0.333502	int 32 4 128 SSE2
-0.202012	int 16 8 128 SSE2
-0.202012	char 8 16 128 SSE2
-0.233931	128 bit float vectors SSE2
-0.342931	#include <emmintrin.h> // Define SSE2
-0.605920	instruction set if possible. SSE2
-0.212217	cases when the 145 SSE2
-0.199948	into the same executable. SSE2
-0.251253	/arch:SSE -msse /arch:SSE -msse SSE2
-0.165091	without cache MOVNTDQ _mm_stream_si128 SSE2
-0.165091	without cache MOVNTI _mm_stream_si32 SSE2
-0.165091	MMX mmintrin.h SSE xmmintrin.h SSE2
-0.165091	without cache MOVNTPD _mm_stream_pd SSE2
-0.122518	if the index is out
-0.056953	the array index is out
-0.056953	an array index is out
-0.325283	time. Interpreted languages are out
-0.294003	multiple calculations simultaneously or out
-0.237762	// return 0 if out
-0.357537	that index is not out
-0.230172	CPUs can execute instructions out
-0.305518	optimization by executing instructions out
-0.233542	The following list points out
-0.308795	variables. Move the conversions out
-0.232926	may write FatalAppExitA(0,"Array index out
-0.056656	you want to find out
-0.267093	of a[i] and shift out
-0.189695	set). We can shift out
-0.189695	example 14.28 will shift out
-0.028923	because it cannot rule out
-0.014221	the compiler cannot rule out
-0.014221	The compiler cannot rule out
-0.093278	able to completely rule out
-0.004601	trick is to roll out
-0.004601	easy way to roll out
-0.004601	be useful to roll out
-0.004601	we want to roll out
-0.004601	is advantageous to roll out
-0.023511	understand when we roll out
-0.005528	vector objects // Roll out
-0.005528	b, c; // Roll out
-0.001375	= _mm_set1_epi16(2); // Roll out
-0.005528	Is16vec8 two(2,2,2,2,2,2,2,2); // Roll out
-0.355896	and it can move out
-0.218485	can clear or mask out
-0.122594	operations can be carried out
-0.122594	The tests were carried out
-0.048367	} }; // Index out
-0.005759	cout << "Error: Index out
-0.056986	that can be moved out
-0.056986	calculation may be moved out
-0.212177	consequence of n being out
-0.122594	is used for jumping out
-0.122594	necessary destructors after jumping out
-0.035761	exceptions can be ruled out
-0.035761	they cannot be ruled out
-0.035761	be avoided by rolling out
-0.035761	example 8.26a by rolling out
-0.074733	a memory block turns out
-0.074733	if the prediction turns out
-0.199909	information can be left out
-0.074733	the loop is rolled out
-0.074733	of a list, rolled out
-0.165055	// Loop to print out
-0.165055	Then we are breaking out
-0.354672	if all of the following
-0.566951	choose one of the following
-0.354672	unacceptable. Each of the following
-0.342718	the compiler in the following
-0.342718	be necessary in the following
-0.628034	This works in the following
-0.342718	as given in the following
-0.342718	be improved in the following
-0.342718	are discussed in the following
-0.342718	is interpreted in the following
-0.342718	are evaluated in the following
-0.342718	table (PLT) in the following
-0.346776	a program for the following
-0.346776	linear array for the following
-0.346776	code caching for the following
-0.356356	works best if the following
-0.355116	be illustrated by the following
-0.342502	bit systems have the following
-0.348448	as position-independent has the following
-0.333634	code goes through the following
-0.235805	you may consider the following
-0.048053	The compiler generates the following
-0.048053	Intel compiler generates the following
-0.235805	explanation. Please skip the following
-0.291823	in loops. Consider the following
-0.235805	July 2011). Instead, the following
-0.228211	elements in table The following
-0.336812	error message function. The following
-0.470573	virtual member functions. The following
-0.322261	caching very efficient. The following
-0.497795	given instruction set. The following
-0.313537	most newer processors. The following
-0.414511	inside the loop. The following
-0.313537	power of 2. The following
-0.303185	floating point precision. The following
-0.461953	loop or not. The following
-0.283194	two 128-bit vectors. The following
-0.283194	can not do. The following
-0.228211	3 breakpoint again. The following
-0.283194	calls and branches. The following
-0.283194	library at www.agner.org/optimize/asmlib.zip. The following
-0.283194	for further explanation. The following
-0.099285	dividend is unsigned. The following
-0.099285	signed or unsigned. The following
-0.228211	recover from errors. The following
-0.228211	using multiplications only. The following
-0.283194	values before compilation. The following
-0.228211	and no multiplications. The following
-0.228211	happens at runtime). The following
-0.228211	will see shortly. The following
-0.228211	Nerds at Wikibooks. The following
-0.228211	instructions becomes noticeable. The following
-0.228211	CPUs (Intel Atom). The following
-0.228211	is not satisfactory. The following
-0.165184	this function is InstructionSet().The following
-0.357140	their workplace and the system
-0.544029	system. Note that the system
-0.353706	programmer forgets that the system
-0.420872	seconds to access the system
-0.237587	system, and therefore the system
-0.355179	be determined by a system
-0.354421	multiple threads on a system
-0.853897	for further discussion of system
-0.237709	to the area of system
-0.335218	drivers, configuration files and system
-0.088801	of compatibility problems and system
-0.293261	to hardware interfaces and system
-0.350670	very time-consuming function in system
-0.314394	economize resource use in system
-0.237844	of allocated resources. The system
-0.341743	that are intended for system
-0.237737	can be determined with system
-0.237467	different screen resolutions, different system
-0.017912	part of the operating system
-0.017912	overhead of the operating system
-0.017912	responsibility of the operating system
-0.017912	facilities of the operating system
-0.128061	Updates to the operating system
-0.095098	CPU and the operating system
-0.076560	included in the operating system
-0.076560	require that the operating system
-0.105112	done by the operating system
-0.105112	caught by the operating system
-0.076560	come with the operating system
-0.128061	framework between the operating system
-0.076560	profiler tells the operating system
-0.076560	applications force the operating system
-0.347194	2.3 Choice of operating system
-0.039586	is used. The operating system
-0.039586	the cache. The operating system
-0.039586	and databases. The operating system
-0.175010	applications without an operating system
-0.131566	to query certain operating system
-0.131566	BSD or Mac operating system
-0.131566	in both compiler, operating system
-0.347194	Mac OS X operating system
-0.347194	in a protected operating system
-0.131566	violate or circumvent operating system
-0.307116	your own error handling system
-0.333707	The C++ exception handling system
-0.233499	background services under advanced system
-0.090935	................................................................................................................. 21 3.11 Other system
-0.090935	is best. 3.11 Other system
-0.315139	such applications are highly system
-0.224901	and back again. Accessing system
-0.330741	frameworks, interpreters, just-in-time compilers, system
-0.165112	drivers, interrupt service routines, system
-0.572844	Only one of the 32
-0.407823	while an int is 32
-0.237696	integer type size_t is 32
-0.336200	each, four integers of 32
-0.294157	following disadvantages compared to 32
-0.237782	between 8 bit and 32
-0.450639	bit mode than in 32
-0.780504	below. Shared objects in 32
-0.324656	use absolute references in 32
-0.237042	be 8, 16 or 32
-0.237042	vector size (16 or 32
-0.336060	are preferably aligned by 32
-0.325144	lines are organized as 32
-0.309474	16-bit systems: long int 32
-0.233496	8 128 SSE2 int 32
-0.233496	4 256 AVX int 32
-0.233496	16 256 AVX2 int 32
-0.233496	4 64 MMX int 32
-0.356543	64 bits rather than 32
-0.237545	a float. (Both use 32
-0.347831	& b; will make 32
-0.652750	are used. See page 32
-0.342996	a variable, for example 32
-0.381824	write a 64-bit double 32
-0.231036	2 128 SSE2 float 32
-0.231036	4 256 AVX2 float 32
-0.231036	8 512 AVX512 float 32
-0.293144	can replace j * 32
-0.330827	32 4 64 2 32
-0.313510	16-bit systems: unsigned long 32
-0.330709	64 4 64 4 32
-0.324332	16 16 32 8 32
-0.319318	128 SSE2 char 8 32
-0.227765	uint64_t 128 Vec2uq 8 32
-0.324262	31 11.6 64 64 32
-0.313547	Vec8f Vec4d 16 16 32
-0.448666	available if the AVX 32
-0.235618	because a float uses 32
-0.234958	16 for SSE2, preferably 32
-0.311680	Parallelization by OpenMP directives 32
-0.288111	The bitwise operators produce 32
-0.231749	Another problem with accessing 32
-0.231213	(10000 / 64) % 32
-0.276395	have several advantages over 32
-0.276395	other than 8, 16, 32
-0.165114	may use the upper 32
-0.165114	{ // Get upper 32
-0.164983	char 16 SSSE3 _mm_perm_epi8 32
-0.164983	not __INTEL_COMPILER __INTEL_COMPILER 161 32
-0.164983	int 128 Iu16vec8 Vec8us 32
-0.164983	variables and operators ...................................................................... 32
-0.164983	int 128 Is32vec4 Vec4i 32
-0.164983	unsigned int 64 Is32vec2 32
-0.164983	set Important features 80386 32
-0.164983	short int 64 Iu16vec4 32
-0.827527	This is because the file
-0.353656	the program before the file
-0.447181	that makes sure the file
-0.420725	unable to access the file
-0.477973	minutes to write the file
-0.237491	opens and closes the file
-0.382819	if the bottleneck is file
-0.356088	forward access to a file
-0.696143	faster to access a file
-0.627676	Reading or writing a file
-0.236998	reads or writes a file
-0.236998	program that created a file
-0.236998	a function opens a file
-0.237465	and dynamic linking. The file
-0.537982	file is closed. The file
-0.355456	The time used for file
-0.237495	a plain old data file
-0.452410	needed from the library file
-0.322527	compatible on the object file
-0.322527	compilers at the object file
-0.441050	then use an object file
-0.225030	Supports three different object file
-0.225030	of the usual object file
-0.443044	separate C or C++ file
-0.313692	programs that have many file
-0.236077	floppy disk. A big file
-0.299262	second step. The intermediate file
-0.333566	compiled to an intermediate file
-0.556015	information in a separate file
-0.369795	also useful to put file
-0.520309	be advantageous to put file
-0.234603	class in one source file
-0.231328	with big-endian storage. Optimizing file
-0.336908	may mirror the entire file
-0.103205	Therefore, both the executable file
-0.103205	file. Only the executable file
-0.103205	run. Both the executable file
-0.130047	or by an executable file
-0.130047	in a single executable file
-0.086406	libmmt.lib and the header file
-0.086406	can use the header file
-0.097079	are including a header file
-0.097079	If the standard header file
-0.022312	include the appropriate header file
-0.022312	Including the appropriate header file
-0.145467	by requesting a map file
-0.066671	the program. The map file
-0.066671	the linker. The map file
-0.145467	/FA -S Generate map file
-0.229133	Available protocols and standardized file
-0.042027	first call // Header file
-0.042027	or later // Header file
-0.088561	follows: Instruction set Header file
-0.165076	compilers www.agner.org/ optimize/#vectorclass Include file
-0.165076	to make a zip file
-0.358462	manual or in the programming
-0.408020	on the way the programming
-0.294186	obtained by choosing a programming
-0.848954	at the time of programming
-0.268528	in the choice of programming
-0.383152	that the choice of programming
-0.128084	optimization. 2.4 Choice of programming
-0.128084	6 2.4 Choice of programming
-0.078399	is a matter of programming
-0.299466	simply a matter of programming
-0.312218	code. The history of programming
-0.534660	a good deal of programming
-0.235799	by better standardization of programming
-0.237878	optimizing University courses in programming
-0.487120	can be called from programming
-0.331515	faster than in other programming
-0.231627	make developers choose other programming
-0.231627	on compilers. Several other programming
-0.231627	in performance over other programming
-0.237391	important to decide which programming
-0.237124	environment (IDE) supports multiple programming
-0.503872	knowledge of the C++ programming
-0.343217	is that the software programming
-0.339494	difference between a software programming
-0.229718	Avoid unnecessary functions Some programming
-0.229718	dynamic memory allocation. Some programming
-0.236107	platforms and other compiled programming
-0.568114	This is a common programming
-0.263345	is also a common programming
-0.221193	leaks and other common programming
-0.235319	to adhere to certain programming
-0.514234	feel that a particular programming
-0.290424	ARM platforms and various programming
-0.218526	get answers to your programming
-0.218526	please don't send your programming
-0.321287	which of the advanced programming
-0.232937	is Perl. Several modern programming
-0.287790	is not a safe programming
-0.008316	advantages of object oriented programming
-0.004138	effects of object oriented programming
-0.025441	classes. The object oriented programming
-0.025441	use an object oriented programming
-0.025441	textbooks recommend object oriented programming
-0.059897	delete). 88 Object oriented programming
-0.283021	is definitely the preferred programming
-0.079321	code.................................................................................. 148 14.13 System programming
-0.079321	OS X. 14.13 System programming
-0.212177	safer. It may catch programming
-0.199909	much of the trivial programming
-0.199909	of a relatively primitive programming
-0.165055	be possible. Template meta- programming
-0.165055	Structures and classes Nowadays, programming
-0.430585	file and all the dynamic
-0.332748	to distribute all the dynamic
-0.458457	needs to load the dynamic
-0.502050	a function in a dynamic
-0.355886	be compiled as a dynamic
-0.293576	address at which a dynamic
-0.292888	Four typical uses of dynamic
-0.506809	defined. The cost of dynamic
-0.236741	are: The process of dynamic
-0.592378	used. The advantages of dynamic
-0.419490	dynamically. The advantages of dynamic
-0.471968	of the costs of dynamic
-0.292888	advance. The disadvantages of dynamic
-0.077729	in both static and dynamic
-0.077729	with both static and dynamic
-0.314623	multiple dynamic libraries. The dynamic
-0.237814	heap is reserved for dynamic
-0.023460	libraries (*.lib, *.a) or dynamic
-0.237761	the first application if dynamic
-0.293988	programming errors associated with dynamic
-0.348861	or .a), but not dynamic
-0.356571	static linking rather than dynamic
-0.345516	deciding whether to use dynamic
-0.796823	no reason to use dynamic
-0.230861	container class libraries use dynamic
-0.100268	standard container classes use dynamic
-0.100268	of string classes use dynamic
-0.230861	such as Java, use dynamic
-0.516743	and one or more dynamic
-0.234581	that calls it. A dynamic
-0.234581	for each process. A dynamic
-0.234581	support static linking. A dynamic
-0.346946	factors that can make dynamic
-0.568133	version of the same dynamic
-0.959247	can share the same dynamic
-0.237354	This will make all dynamic
-0.352163	and drawbacks of using dynamic
-0.341097	is distributed between multiple dynamic
-0.349936	In the cases where dynamic
-0.236516	array or container without dynamic
-0.236135	by the application, while dynamic
-0.318705	fixed size to avoid dynamic
-0.449699	of how to avoid dynamic
-0.222330	if possible, and avoid dynamic
-0.322613	might clash with another dynamic
-0.291226	for multiple purposes. All dynamic
-0.158159	implemented in a separate dynamic
-0.516343	0) { // Make dynamic
-0.035766	145 14.11 Static versus dynamic
-0.035766	limitation). 14.11 Static versus dynamic
-0.218481	9.7 Container classes Whenever dynamic
-0.348338	linking includes only the part
-0.460231	Security software that is part
-0.336031	if a parameter is part
-0.462173	The stack is a part
-0.314487	e.g. how often a part
-0.237788	OS X (Darwin) are part
-0.237677	are usually included as part
-0.357534	code that is not part
-0.312824	almost certain that this part
-0.323921	make measurements on this part
-0.341309	amount of time. A part
-0.173159	used in the same part
-0.465510	way in the same part
-0.237427	the following cases: If part
-0.407368	possible to see which part
-0.237391	for other reasons, but part
-0.329418	interrupt occurs in each part
-0.232436	how much time each part
-0.567712	how many times each part
-0.472271	stored in a static part
-0.236973	should not include any part
-0.039560	int in the critical part
-0.019330	calls in the critical part
-0.039560	small in the critical part
-0.039560	conversions in the critical part
-0.327043	mispredictions if the critical part
-0.248197	methods then the critical part
-0.265259	used in a critical part
-0.170856	from the same critical part
-0.026618	of the most critical part
-0.054975	to the most critical part
-0.017563	in the most critical part
-0.054975	isolate the most critical part
-0.073979	cases. The most critical part
-0.406009	42 If you access part
-0.380526	program is an important part
-0.341431	at least a large part
-0.339579	used in a small part
-0.291801	level framework. The optimized part
-0.235654	AVX support and another part
-0.350021	you activate a particular part
-0.229950	and the most significant part
-0.370597	if the most time-consuming part
-0.229087	another C++ program (or part
-0.015531	: 23; // fractional part
-0.015531	: 52; // fractional part
-0.015531	: 63; // fractional part
-0.199903	performance if the time-critical part
-0.165050	to put the task-specific part
-0.540635	if any of the bits
-0.353454	or manipulate all the bits
-0.237748	the compiler interpret the bits
-0.591196	to the number of bits
-0.324954	an int uses more bits
-0.065013	of interpreting the same bits
-0.293628	and sets all other bits
-0.334109	is zero if all bits
-0.235291	zero by testing all bits
-0.231034	you can set multiple bits
-0.231034	or mask out multiple bits
-0.231034	you can toggle multiple bits
-0.323964	sixteen integers of 8 bits
-0.156036	two integers of 64 bits
-0.156036	the vectors of 64 bits
-0.092166	32-bit systems and 64 bits
-0.092166	16, 32 and 64 bits
-0.209311	vector can be 64 bits
-0.209311	registers, which are 64 bits
-0.209311	stack entries use 64 bits
-0.380965	0x7FFFFFFF) { // test bits
-0.219552	short int is 16 bits
-0.273379	eight integers of 16 bits
-0.381239	lower 8 or 16 bits
-0.219552	stores the lower 16 bits
-0.237428	type size_t is 32 bits
-0.187641	four integers of 32 bits
-0.237428	8, 16 or 32 bits
-0.187641	float. (Both use 32 bits
-0.187641	variable, for example 32 bits
-0.187641	a 64-bit double 32 bits
-0.187641	a float uses 32 bits
-0.187641	problem with accessing 32 bits
-0.083764	use the upper 32 bits
-0.083764	// Get upper 32 bits
-0.235894	reading or writing small bits
-0.281837	vector register is 128 bits
-0.227015	64 bits (MMX), 128 bits
-0.222301	set is available, 256 bits
-0.222301	128 bits (XMM), 256 bits
-0.234427	the least significant n bits
-0.217536	operating system, and 512 bits
-0.217536	and soon also 512 bits
-0.230066	the integer has enough bits
-0.031258	Total size of vector, bits
-0.226654	types available. declaration size, bits
-0.279347	of doubles by comparing bits
-0.218461	N into the individual bits
-0.199925	register sizes to 1024 bits
-0.008669	Size of each element, bits
-0.165071	so that the remaining bits
-0.523605	mix different kinds of operations
-0.336124	a long sequence of operations
-0.299269	set is the vector operations
-0.299269	11. Using the vector operations
-0.302133	The use of vector operations
-0.266868	(chapter 11) and vector operations
-0.286105	operations. 105 The vector operations
-0.306234	a problem with vector operations
-0.392670	advantageous to use vector operations
-0.173596	compilers can use vector operations
-0.173596	can also use vector operations
-0.266868	loop by using vector operations
-0.213794	after the 64-bit vector operations
-0.213794	The most efficient vector operations
-0.393902	access. 12 Using vector operations
-0.266868	branch mispredictions. Boolean vector operations
-0.213794	most processors (when vector operations
-0.539384	in-between the floating point operations
-0.884094	types of floating point operations
-0.062492	of doing floating point operations
-0.333555	are 100 floating point operations
-0.343397	multiple purposes. Floating point operations
-0.327552	performance because the integer operations
-0.221683	will typically use integer operations
-0.221683	an advantage because integer operations
-0.221683	possible to do integer operations
-0.221683	assume that these integer operations
-0.045730	141 14.9 Using integer operations
-0.045730	int)u; 14.9 Using integer operations
-0.221683	very fast. Simple integer operations
-0.558979	it possible to do operations
-0.237134	split into two 64-bit operations
-0.406675	the call and return operations
-0.351725	or 1. This makes operations
-0.236759	addition, subtraction, comparison, bit operations
-0.292652	Boolean vectors, and these operations
-0.329774	must do the extra operations
-0.278391	results. Integer operators Integer operations
-0.223977	a specific size. Integer operations
-0.290613	page 43). The Boolean operations
-0.322495	sound processing, and mathematical operations
-0.342389	all these table lookup operations
-0.234430	of << and | operations
-0.234314	were splitting 256-bit read operations
-0.318452	bit operations and shift operations
-0.231297	while waiting for disk operations
-0.171906	point register variables. Vector operations
-0.171906	later instruction sets. Vector operations
-0.171906	512 bits (ZMM). Vector operations
-0.190989	you can do arithmetic operations
-0.241184	memory address. Pointer arithmetic operations
-0.199937	Intrinsic functions are primitive operations
-0.165081	short int, float. Similar operations
-0.355103	single bit which is 0
-0.325146	or if one is 0
-0.325162	constructor initializes x to 0
-0.023492	element in b to 0
-0.929476	is known to be 0
-0.708284	is guaranteed to be 0
-0.348431	eax = i = 0
-0.523158	specialization for N = 0
-0.036326	a & ~a = 0
-0.064011	a - a-a = 0
-0.064011	- n.a. a-a = 0
-0.064011	x-xxx---- a-(-b)=a+b a-a = 0
-0.099942	a - a*0 = 0
-0.099942	- n.a. a*0 = 0
-0.047102	- a ^a = 0
-0.047102	~a a ^a = 0
-0.099942	reductions as 0/a = 0
-0.099942	a - 0/a = 0
-0.229981	n.a. - andnot(a,a) = 0
-0.017854	no other value than 0
-0.036473	any other value than 0
-0.005642	have other values than 0
-0.005642	no other values than 0
-0.235907	are the integers from 0
-0.235907	in the interval from 0
-0.355602	integers with the value 0
-0.313550	return 0; // return 0
-0.353594	two comparisons i < 0
-0.400685	... if (i < 0
-0.330455	int unsigned char 8 0
-0.323927	unsigned long int 64 0
-0.339945	that b is always 0
-0.330065	systems: unsigned int 16 0
-0.236473	systems: unsigned long 32 0
-0.236426	{ // test bits 0
-0.344967	= a, a & 0
-0.236057	was used by element 0
-0.291691	this number we get 0
-0.234939	branch instruction takes typically 0
-0.234431	series: ex xn n 0
-0.474061	= a, a | 0
-0.256145	a<<b<<c=a<<(b+c) x-xxx--xx a | 0
-0.021054	result = b > 0
-0.206618	1's when bb[i] > 0
-0.413480	integer in the interval 0
-0.165076	- a & 0= 0
-1.360156	the size of the type
-0.655391	the processor and the type
-0.538777	can assume that the type
-0.548414	initialization, or if the type
-0.356779	be done on the type
-0.353484	of templates where the type
-0.344427	relatively expensive, while the type
-0.437764	operations and choose the type
-0.293346	declared by specifying the type
-0.237143	is valid. Re-interpreting the type
-0.331490	with N elements of type
-0.102714	have four numbers of type
-0.102714	have eight numbers of type
-0.449055	the program than to type
-0.294126	on the size and type
-0.292881	the loop is. The type
-0.236734	of four float. The type
-0.236734	the class declaration. The type
-0.292881	64 bits each. The type
-0.356447	information about the function type
-0.380668	"asmlib.h" // Define function type
-0.292448	fprintf // define function type
-0.356575	using unions rather than type
-0.348697	it had a different type
-0.349655	time consumption of different type
-0.575674	be of the same type
-0.324651	bits. The unsigned integer type
-0.455742	are different for each type
-0.237240	pointer arithmetics and pointer type
-0.237126	= 100000001.23456. The float type
-0.293161	to work with any type
-0.236902	return types The return type
-0.349102	time-consuming than a simple type
-0.670146	different ways of doing type
-0.195246	off support for runtime type
-0.195246	does not use runtime type
-0.195246	methods or require runtime type
-0.195246	frame- pointer No runtime type
-0.234103	floating point instructions. Each type
-0.343898	set for the appropriate type
-0.233235	or an over- loaded type
-0.033415	objects of a composite type
-0.033415	Objects of a composite type
-0.069617	parameter has a composite type
-0.111208	a parameter of composite type
-0.224862	details about rounding. Pointer type
-0.114606	type identification (RTTI) Runtime type
-0.053534	........................................................................................ 53 7.21 Runtime type
-0.053534	the effort. 7.21 Runtime type
-0.218449	type conversion // C-style type
-0.165060	= static_cast<float>(i); // Implicit type
-0.165060	type casting // Constructor-style type
-0.561290	program. This is the case
-0.488819	If this is the case
-0.654146	For example, in the case
-0.356059	clock cycles in the case
-0.502737	most efficient if the case
-0.472045	this is not the case
-0.314003	is faster. In the case
-0.237295	them all. In the case
-0.237295	doesn't occur. In the case
-0.237295	page 60. In the case
-0.236991	This is often the case
-0.236991	as is commonly the case
-0.537637	of a function in case
-0.317310	the std::unexpected() function in case
-0.330899	a long time in case
-0.309622	version to use in case
-0.401806	stop the program in case
-0.309622	move the object in case
-0.289339	a safe way in case
-0.170206	to clean up in case
-0.170206	are cleaned up in case
-0.047697	catch an exception in case
-0.047697	raising an exception in case
-0.320130	of signed integers in case
-0.309622	generating denormal numbers in case
-0.233620	in all operands in case
-0.327635	preventing program errors in case
-0.233620	clean up everything in case
-0.233620	may be justified in case
-0.237649	n; switch (n) { case
-0.278054	15.1b, and in this case
-0.278054	case, but in this case
-0.278054	efficient solution in this case
-0.278054	by columns in this case
-0.278054	: 0] in this case
-0.233959	the resources. In this case
-0.233959	single-thread speed. In this case
-0.233959	page 71). In this case
-0.233959	same divisor. In this case
-0.381702	assume the worst possible case
-0.235569	updating in the likely case
-0.316653	is repetitive. The simplest case
-0.603895	because in the latter case
-0.303061	extended to the general case
-0.034750	to cover the worst case
-0.176480	way, etc. The worst case
-0.088584	case 1: printf("Beta"); break; case
-0.088584	case 2: printf("Gamma"); break; case
-0.088584	case 0: printf("Alpha"); break; case
-0.165133	performance under the worst- case
-0.165133	while in the former case
-0.357734	stack, except for the cases
-0.348292	(page 146). In the cases
-0.313430	default integer size in cases
-0.381311	This is advantageous in cases
-0.324387	vector operations automatically in cases
-0.428157	for such errors in cases
-0.292972	these example containers in cases
-0.352589	dispatching There may be cases
-0.352589	36. There may be cases
-0.521703	possible. However, there are cases
-0.338960	are so many different cases
-0.210633	compilers can in most cases
-0.210633	pointers because in most cases
-0.210633	is advantageous in most cases
-0.210633	thread. However, in most cases
-0.401267	hardware implementation in most cases
-0.273109	clock cycles. In most cases
-0.273109	other optimizations. In most cases
-0.313783	11 programming, etc. In cases
-0.515998	However, there are many cases
-0.311381	mathematical purity. In many cases
-0.237079	deallocated in all possible cases
-0.076985	platforms, and in some cases
-0.076985	level, and in some cases
-0.018005	compiler may in some cases
-0.036789	It may in some cases
-0.036789	declaration may in some cases
-0.235551	more efficient in some cases
-0.170622	is possible in some cases
-0.127933	graphics calculations. In some cases
-0.127933	Loop unrolling In some cases
-0.127933	array element. In some cases
-0.127933	optimal, though. In some cases
-0.127933	microprocessors have. In some cases
-0.127933	to mind. In some cases
-0.127933	mispredictions. 44 In some cases
-0.127933	page 34. In some cases
-0.420216	code automatically in simple cases
-0.236004	implementation is best. These cases
-0.226384	one of the few cases
-0.686177	there are a few cases
-0.222906	is needed. These complicated cases
-0.222906	instruction set. More complicated cases
-0.234375	specifying otherwise. In difficult cases
-0.263099	be optimal in special cases
-0.210457	But there are special cases
-0.090931	so on. 7.31 Other cases
-0.090931	................................................................................ 61 7.31 Other cases
-0.309449	But in more complex cases
-0.251260	of programming. 13.3 Difficult cases
-0.165097	imprecision in some rare cases
-0.341831	next function. However, the short
-0.588188	thread if it is short
-0.357911	test finishes in a short
-0.293979	next step. With a short
-0.698136	is a list of short
-0.331766	long vector libraries and short
-0.237704	should avoid macros with short
-0.325031	any size other than short
-0.478198	7.35a struct S1 { short
-0.237498	instead of int. A short
-0.498734	the speed by using short
-0.537548	vector math libraries: Intel short
-0.313361	64 Iu8vec8 16 4 short
-0.323943	Iu8vec16 Vec16uc 16 8 short
-0.302584	Is16vec4 16 4 unsigned short
-0.282619	Vec8s 16 8 unsigned short
-0.227705	0 255 uint8_t unsigned short
-0.330091	8 16 128 SSE2 short
-0.405849	eight numbers of type short
-0.336350	Example 7.21 int i; short
-0.336350	Example 7.23 int i; short
-0.291336	or unsigned 1 1 short
-0.290544	64 4 unsigned 256 short
-0.321134	256 Vec32c unsigned char short
-0.320423	8 32 256 AVX2 short
-0.000021	aa[], short int bb[], short
-0.000047	void SelectAddMul(short int aa[], short
-0.001698	void SelectAddMul_dispatch(short int aa[], short
-0.001698	void FUNCNAME(short int aa[], short
-0.001698	void FuncType(short int aa[], short
-0.043538	Array size Alignd ( short
-0.043538	aligned arrays Alignd ( short
-0.043538	bb[size] ); Alignd ( short
-0.311607	8 8 64 MMX short
-0.222271	last byte at 11 short
-0.212224	7.22. // Example 7.22 short
-0.199897	Small data types: char, short
-0.165045	8 -128 127 int8_t short
-0.165045	of smaller sizes (char, short
-0.557375	The result of the &
-0.655369	multiple bits with the &
-0.355617	= 18, then the &
-0.335961	an integer. But the &
-0.323168	& a= a a &
-0.721669	d; c = a &
-0.514498	expression y = a &
-0.457922	&& b with a &
-0.085511	n.a. n.a. - a &
-0.380325	~a = 0 a &
-0.699740	-1 = a, a &
-0.236108	x-xxxx--x ~a&~b=~(a|b) --xxxx--- a &
-0.237886	Don't change && to &
-0.294121	the ^ operator. The &
-0.325177	C0 * p = &
-0.037030	void Func(int a[], int &
-0.237264	Testing multiple conditions using &
-0.237237	sake of security. b &
-0.200031	* dest, double const &
-0.002512	* d, __m128i const &
-0.200031	& a, T const &
-0.200031	Vec4f polynomial (Vec4f const &
-0.200031	operator + (vector const &
-0.200031	inline float add_elements(__m128 const &
-0.200031	inline T max(T const &
-0.453956	calculated by a single &
-0.235207	} void FuncB (int &
-0.226654	return N; } T &
-0.147353	} u; if (u.i &
-0.147353	n; 143 if (u.i &
-0.148671	n; u.i = (n &
-0.262765	0) { if (n &
-0.074739	x) { // (N &
-0.074739	N: #define N1 (N &
-0.465970	Weekdays Day; if (Day &
-0.165071	+ p->b;} int Sum3(S3 &
-0.165071	of 2 return powN<(N &
-0.165071	<< 4) | ((C &
-0.165071	& 0x0F) | ((B &
-0.165071	int i; ... list[i &
-0.165071	vmldExp2 Intel SVML v.10.3 &
-0.165071	double Intel SVML v.10.2 &
-0.165071	2.5f}; a = OneOrTwo5[b &
-0.165071	7.40c x.abc = (A &
-0.659104	the case of the simple
-1.073020	is faster than the simple
-0.348214	the function. In the simple
-0.563218	vector. This is a simple
-0.449267	whether p is a simple
-0.347596	control condition is a simple
-0.781713	an object of a simple
-0.355960	be fast in a simple
-1.026743	should preferably be a simple
-0.348864	a reference or a simple
-0.349950	complicated algorithm if a simple
-0.353242	linear list with a simple
-0.350010	more time-consuming than a simple
-0.092531	limited range then a simple
-0.092531	narrow range then a simple
-0.235052	fast as calling a simple
-0.417011	time than accessing a simple
-0.065159	branch that follows a simple
-0.065159	if it follows a simple
-0.065159	function pointer follows a simple
-0.331877	of array elements of simple
-0.575124	long response times to simple
-0.237699	expects immediate responses to simple
-0.237846	is fast, compact, and simple
-0.313748	old data file in simple
-0.211706	the code automatically in simple
-0.211706	this optimization automatically in simple
-0.381683	memcpy, at least in simple
-0.236721	are the same for simple
-0.292865	This is efficient for simple
-0.323917	response times, even for simple
-0.442431	long response times for simple
-0.356057	of overflow, such as simple
-0.333964	at a time. A simple
-0.231708	or full speed. A simple
-0.231708	all data members. A simple
-0.231708	no other branches. A simple
-0.231708	using a profiler. A simple
-0.314188	the code contains only simple
-0.457502	be advantageous to do simple
-0.451160	Most compilers can do simple
-0.355139	on only the most simple
-0.330909	that select between two simple
-0.237109	7.8 Member pointers In simple
-0.236912	and mainframes, and between simple
-0.236087	No cache contentions. Use simple
-0.338435	these problems is quite simple
-0.471781	Most compilers can reduce simple
-0.331113	point multiplication, to mix simple
-0.222289	is destroyed. In 50 simple
-0.165091	of cache space. Putting simple
-0.457129	restrictions on using the instructions
-0.481179	supports this kind of instructions
-0.294087	for computing i/2+r. The instructions
-0.348897	branch must rely on instructions
-0.305968	64 bits. The vector instructions
-0.230551	Today's microprocessors have vector instructions
-0.485505	possible to use vector instructions
-0.230551	SSE4.1 some more vector instructions
-0.341463	few more integer vector instructions
-0.346919	support of the different instructions
-0.443442	enabled. There are no instructions
-0.237068	ebx. The next two instructions
-0.237039	using a pipeline where instructions
-0.236607	for giving specific optimization instructions
-0.231182	instruction set. These new instructions
-0.231182	producers keep adding new instructions
-0.323612	table 9.2. All these instructions
-0.236217	ever used, though. Some instructions
-0.522217	a lot of extra instructions
-0.229502	require a few extra instructions
-0.236170	for testing single assembly instructions
-0.235955	activates critical application- specific instructions
-0.235950	bitwise operators are single instructions
-0.221162	the data cache. These instructions
-0.221162	solve this problem. These instructions
-0.221162	vectorized table lookup. These instructions
-0.291844	if possible. The AVX instructions
-0.350202	sets include a few instructions
-0.235283	instruction sets have certain instructions
-0.137650	then the nontemporal write instructions
-0.085067	effect of nontemporal write instructions
-0.085067	area. The nontemporal write instructions
-0.085067	The so-called nontemporal write instructions
-0.234739	time. There are intrinsic instructions
-0.290549	precision require precision conversion instructions
-0.309940	are other cache control instructions
-0.306786	x86 CPUs can execute instructions
-0.229089	processors that supported 256-bit instructions
-0.229114	instructions SSE4.2 string search instructions
-0.146362	of optimization by executing instructions
-0.191391	cycles spent on executing instructions
-0.146362	x 43 speculatively executing instructions
-0.226678	the number of machine instructions
-0.224785	now contains only six instructions
-0.218446	able to define application-specific instructions
-0.074714	actually able to reorder instructions
-0.074714	A compiler may reorder instructions
-0.165009	the table. The 16-byte instructions
-0.165009	ADC (add with carry) instructions
-0.165009	will support the ADX instructions
-0.165009	the queue of pending instructions
-0.463213	processing power of the processors
-0.357414	inferior version on the processors
-0.797871	a negative list of processors
-0.187732	The first generation of processors
-0.366245	the next generation of processors
-0.187732	the second generation of processors
-0.235747	of the time on processors
-0.235747	of its time on processors
-0.690962	"what works best on processors
-0.235191	preferably be avoided on processors
-0.293718	simple processors and vector processors
-0.516032	several versions for different processors
-0.325345	cache organization for different processors
-0.324601	of precision on most processors
-0.332847	monitor counter in Intel processors
-0.234619	processors and earlier Intel processors
-0.324653	point operation on such processors
-0.592985	well only on some processors
-0.345264	same way, the first processors
-0.296290	ZMM registers The first processors
-0.296290	YMM registers. The first processors
-0.236293	mainframes, and between simple processors
-0.228582	resources for other virtual processors
-0.228582	hardware CPU. These virtual processors
-0.283343	Pentium 4 and AMD processors
-0.228342	later Intel processors. AMD processors
-0.234064	on newer processors. Many processors
-0.233731	64-bit versions. The x86 processors
-0.233683	is compiled for old processors
-0.233235	fail to recognize VIA processors
-0.232928	reason is that modern processors
-0.411102	The performance on non-Intel processors
-0.258976	when running on non-Intel processors
-0.206074	method for all unknown processors
-0.206074	Failure to handle unknown processors
-0.276967	The number of logical processors
-0.034734	of cores or logical processors
-0.034734	CPU cores or logical processors
-0.159550	only the even-numbered logical processors
-0.065609	of the standard PC processors
-0.065609	as the standard PC processors
-0.065609	purposes the standard PC processors
-0.165076	the number of physical processors
-0.165076	processor has four physical processors
-0.222271	are: Optimizing for present processors
-0.218432	significant effect on older processors
-0.165045	or another. Therefore, micro- processors
-0.165045	may also see emulated processors
-0.165045	in parallel. Small lightweight processors
-0.165045	than last time. Newer processors
-0.165045	the same brand. Future processors
-0.573562	ways depending on the available
-0.237666	following table lists the available
-0.237666	of CPUs increased the available
-0.382498	important to study the available
-0.334349	OS support and is available
-0.357230	processors, and it is available
-0.355712	The InstructionSet() function is available
-0.132488	the C++ compiler is available
-0.132488	Gnu C++ compiler is available
-0.339428	C++ compiler, which is available
-0.339428	library asmlib, which is available
-1.163753	AVX instruction set is available
-0.236376	stdint.h or inttypes.h is available
-0.236376	limited "express" edition is available
-0.881637	when the number of available
-0.461620	are expected to be available
-0.351840	the ones that are available
-0.232409	supporting multi-threaded software are available
-0.530813	Many function libraries are available
-0.268976	XMM vector registers are available
-0.770063	point stack registers are available
-0.268976	The YMM registers are available
-0.476223	variables. Vector operations are available
-0.327808	to the calculations are available
-0.310381	Free trial versions are available
-0.287962	improve efficiency. These are available
-0.232409	many standard tasks are available
-0.232409	container class templates are available
-0.308180	graphical interface frameworks are available
-0.210619	Operations that are only available
-0.210619	256-bit size are only available
-0.422938	The function is also available
-0.326633	output option is also available
-0.381198	makes an extra register available
-0.352174	best optimized function libraries available
-0.665718	eight floating point registers available
-0.306926	approximately six integer registers available
-0.493362	CPUs and operating systems available
-0.330054	these manuals are always available
-0.206844	number of logical processors available
-0.395404	cores or logical processors available
-0.504075	object can be made available
-0.232945	such feature will become available
-0.228086	Patches should be easily available
-0.226710	There are various profilers available
-0.299232	larger than the largest available
-0.222309	AMD LIBM library. Only available
-0.199942	for Basic soon became available
-0.165086	Intel's Math Kernel Library, available
-0.165086	research, not on publicly available
-0.357538	reduce a to the constant
-1.038536	is faster if the constant
-0.720801	before multiplying with the constant
-0.521211	either by making the constant
-0.331153	vector register, add the constant
-0.338817	+ 3.5; Here, the constant
-0.293553	a parenthesis around the constant
-0.237325	the compiler sees the constant
-0.237902	loop count (ArraySize) is constant
-0.355671	per row is a constant
-1.026747	should preferably be a constant
-0.250433	loop counter by a constant
-0.250433	integer multiplication by a constant
-0.174424	point division by a constant
-0.112318	Integer division by a constant
-0.190202	// Division by a constant
-0.190202	faster. Division by a constant
-0.250433	// Modulo by a constant
-0.353244	an integer with a constant
-1.087547	recommended to use a constant
-0.543433	guidelines by using a constant
-0.047931	address by adding a constant
-0.047931	calculated by adding a constant
-0.235054	base address plus a constant
-0.237323	do function inlining and constant
-0.048299	6.0f; Constant folding and constant
-0.048299	places. Constant folding and constant
-0.237475	choice of n. The constant
-0.293724	N&(N-1) is 0. The constant
-0.345063	and perhaps }; // constant
-0.235530	&& b<c) Multiply by constant
-0.065267	0 - Divide by constant
-0.065267	and add Divide by constant
-0.065267	(-a>-b)=(a<b) ---xx---x Divide by constant
-0.237702	that are declared as constant
-0.237536	the constant subexpression. A constant
-0.357526	space. A floating point constant
-0.351473	so that only one constant
-0.352589	can replace an integer constant
-0.237305	} By giving each constant
-0.351324	have even a single constant
-0.348953	with the double precision constant
-0.235087	// Example 8.24. Integer constant
-0.326589	in order to enable constant
-0.226683	works for any compile-time constant
-0.212223	to stack memory. Copying constant
-0.017516	as common subexpression elimination, constant
-0.017516	inlining, common subexpression elimination, constant
-0.356186	with CPUs that are up
-0.540199	and make the code up
-0.237641	version is currently not up
-0.341426	.cpp modules that make up
-0.330658	way is to set up
-0.375337	which can be set up
-0.288092	test tool can set up
-0.233523	The unrolled loop takes up
-0.233523	The 'this' pointer takes up
-0.204825	of branches that take up
-0.204825	multiple instances that take up
-0.325293	again. This may take up
-0.099492	be used to speed up
-0.099492	about how to speed up
-0.235247	than making it count up
-0.234470	-156. Surprisingly, we end up
-0.085263	is necessary to look up
-0.085263	compiler needs to look up
-0.191458	necessary to first look up
-0.191458	self-relative address. (3) look up
-0.233694	the clock frequency goes up
-0.303475	powerful computers to keep up
-0.203247	that they always keep up
-0.273655	64-bit Unix systems allow up
-0.203231	BSD and Mac allow up
-0.230010	is needed for setting up
-0.222262	an STL vector turned up
-0.212191	we need to split up
-0.165068	lines should be split up
-0.042019	has something to clean up
-0.042019	is nothing to clean up
-0.088543	the program must clean up
-0.375619	need to be cleaned up
-0.148645	allocated resources are cleaned up
-0.265015	cache will be filled up
-0.048362	exception handlers for cleaning up
-0.048362	lot of time cleaning up
-0.048362	is prevented from cleaning up
-0.212154	Library versions tested (not up
-0.074725	program less efficient. Splitting up
-0.074725	with this rule. Splitting up
-0.251184	keep their CPU dispatchers up
-0.165035	reproducible time measurements: warm up
-0.165035	call to _endthread() cleans up
-0.165035	cache and it fills up
-0.165035	dependency chain may fill up
-0.165035	object can be speeded up
-0.165035	this section by summing up
-0.165035	transferred in registers, totaling up
-0.165035	special precautions for speeding up
-0.461835	then check for the error
-0.344772	if possible, or the error
-0.351219	as long as the error
-0.355536	the pipeline then the error
-0.237580	this will trigger the error
-0.324883	a common source of error
-0.341320	even worse kind of error
-0.325026	any other form of error
-0.294197	well thought-through approach to error
-0.048340	60 7.30 Exceptions and error
-0.048340	multithreading. 7.30 Exceptions and error
-0.237741	to use, incompatible or error
-0.356054	used branches such as error
-1.036017	in case of an error
-0.298782	exclusive mode, and an error
-0.324865	can return with an error
-0.224503	cleaned up. If an error
-0.224503	function may return an error
-0.278988	the linker makes an error
-0.224503	we don't need an error
-0.445093	linker will generate an error
-0.224503	operator will detect an error
-0.224503	the dispatcher signal an error
-0.224503	program to issue an error
-0.224503	This will provoke an error
-0.224503	or for issuing an error
-0.224503	function that detects an error
-0.330325	is accessed, and this error
-0.611920	You can avoid this error
-0.339920	more expensive and more error
-0.292254	make and therefore more error
-0.237517	occur and recovering from error
-0.333744	an exception or other error
-0.347348	may insert any other error
-0.406011	is a common programming error
-0.236101	on bounds checking). An error
-0.512295	data is a common error
-0.235685	an overflow or another error
-0.097087	you make your own error
-0.097087	better, make your own error
-0.216437	simply prints an appropriate error
-0.216437	error; and make appropriate error
-0.288171	*(T*)0; } // No error
-0.007698	value of the residual error
-0.007698	calculation of the residual error
-0.015534	repeated until the residual error
-0.199942	bug is a minor error
-0.199942	null reference to provoke error
-0.165086	to handle an unrecoverable error
-0.237791	are usability issues, and I
-0.351265	CPU detection function that I
-0.237404	is so complicated that I
-0.336165	be obsolete. But if I
-0.237634	expressions, but no compiler I
-0.237544	come into force when I
-0.237412	very different speeds. If I
-0.237370	manual on usability, but I
-0.103926	one of the compilers I
-0.179024	none of the compilers I
-0.103926	None of the compilers I
-0.078805	on all the compilers I
-0.078805	because all the compilers I
-0.316664	8. 71 The compilers I
-0.306690	Comparison of different compilers I
-0.235663	up into multiple functions. I
-0.310565	any of the examples I
-0.217493	a | b; Here, I
-0.217493	OneOrTwo5[b & 1]; Here, I
-1.135781	the function is called. I
-0.232831	the matrix line size. I
-0.208337	you don't understand it. I
-0.208337	itself and recompile it. I
-0.317819	a lot in performance. I
-0.333722	vectorized code or not. I
-0.229106	need the next element. I
-0.227951	constants, and initialized arrays. I
-0.602553	family and model number. I
-0.224737	and destructors to call. I
-0.364609	create a new one. I
-0.165039	relevant books and manuals. I
-0.165039	for my optimization manuals. I
-0.218379	advanced system performance options. I
-0.218379	point. The reason is, I
-0.218379	do the reductions manually. I
-0.212114	as good as expected. I
-0.212114	any specific model. Instead, I
-0.212114	expression -(-a) to a. I
-0.330574	For my own research, I
-0.199847	engineering principles to use. I
-0.251140	cycle? In this manual, I
-0.164998	testing and maintenance easier. I
-0.164998	machines with embedded microcontrollers. I
-0.164998	representation is particularly tricky. I
-0.164998	by thousands of people. I
-0.164998	data. That being said, I
-0.164998	code. In this chapter, I
-0.164998	multiple of 0x800 apart. I
-0.164998	have quite dramatic consequences. I
-0.270484	a useful way of making
-0.270484	a good way of making
-0.270484	a convenient way of making
-0.102376	The alternative solution of making
-0.102376	The radical solution of making
-0.313705	as a means of making
-0.236580	order. The advice of making
-0.292705	compiler is capable of making
-0.237868	loop by two and making
-0.345630	the program code for making
-0.556903	can be useful for making
-0.332646	may be useful for making
-0.591949	languages are good for making
-0.380171	have a feature for making
-0.235997	Environments) have facilities for making
-0.525049	organized if you are making
-0.505340	branches. If you are making
-0.468302	X, unless you are making
-0.328670	single precision or by making
-0.302364	and compiler-generated code by making
-0.338554	tell it this by making
-0.227520	member functions faster by making
-0.338565	eliminate one division by making
-0.472468	can be avoided by making
-0.227520	times faster either by making
-0.227520	Provoke cache misses by making
-0.227520	of context switches by making
-0.227520	avoid multiple inheritance by making
-0.022721	can be solved by making
-0.227520	Provoke branch mispredictions by making
-0.227520	may be mitigated by making
-0.407893	who are satisfied with making
-0.382543	microcontrollers. I am not making
-0.455491	may be faster than making
-0.354915	existing object rather than making
-0.235569	down to zero than making
-0.306689	This prevents it from making
-0.262622	prevent the compiler from making
-0.153205	prevents the compiler from making
-0.736623	Therefore, you should avoid making
-0.234139	extra code for actually making
-0.230780	code. If you consider making
-0.279421	implemented in different places making
-0.008671	Some compilers have difficulties making
-0.938926	and the number of times
-0.804747	by the number of times
-0.324840	measure // Number of times
-0.237308	the function billions of times
-0.313795	called once or multiple times
-0.234039	go one way two times
-0.234039	times. Then again two times
-0.210632	The speed is many times
-0.210632	the critical function many times
-0.210632	one way, then many times
-0.210632	data are used many times
-0.021380	to count how many times
-0.021380	that count how many times
-0.043868	can tell how many times
-0.043868	profiler counts how many times
-0.210632	branch that goes many times
-0.292580	here is that access times
-0.292197	} } The execution times
-0.228530	a software package several times
-0.228530	different versions alternatingly several times
-0.235581	to be reloaded eight times
-0.350241	the break a few times
-0.234685	repeated 1024/4 = 256 times
-0.220930	first way and three times
-0.220930	This is approximately three times
-0.233688	branch is executed 10 times
-0.309129	code up to 5 times
-0.390333	applications because the response times
-0.065507	of unacceptably long response times
-0.065507	have unacceptably long response times
-0.065507	experience unacceptably long response times
-0.156289	cost of longer response times
-0.230722	This loop repeats 20 times
-0.415888	slower than the subsequent times
-0.229146	table can improve search times
-0.228032	can occur at random times
-0.173967	or even a thousand times
-0.173967	loop repeats a thousand times
-0.224825	that it takes six times
-0.222276	executes three to seven times
-0.306325	calculate *p+2 a hundred times
-0.222311	the library function 250 times
-0.299233	program flow at inconvenient times
-0.056984	a program repeats 1000 times
-0.056984	that also repeats 1000 times
-0.265034	may start at unpredictable times
-0.199903	the critical function ten times
-0.199903	test // Repeat NumberOfTests times
-0.165050	even be a million times
-0.064579	static memory to the stack
-0.543396	addressed relative to the stack
-0.356566	too large for the stack
-1.017904	rather than on the stack
-0.296254	allocates memory on the stack
-0.202841	are stored on the stack
-0.247749	Variables stored on the stack
-0.296254	the parameters on the stack
-0.296254	of space on the stack
-0.057306	are transferred on the stack
-0.296254	by storage on the stack
-0.296254	objects. Storage on the stack
-0.296254	is pushed on the stack
-0.354090	and popped from the stack
-0.458668	frame function because the stack
-0.650902	Do not use a stack
-0.325222	for setting up a stack
-0.044126	7.31 Other cases of stack
-0.345095	from static memory to stack
-0.407554	copies the table to stack
-0.237862	systems: Pointers, references, and stack
-0.544657	that is stored in stack
-0.346967	the new function. The stack
-0.330454	the sections below. The stack
-0.236762	two other situations: The stack
-0.236762	is implementation dependent. The stack
-0.237734	; save ebx on stack
-0.237533	; restore ebx from stack
-0.325216	when the floating point stack
-0.346036	functions. The floating point stack
-0.541358	do. This is called stack
-0.232456	described a mechanism called stack
-0.335374	spot. Use the call stack
-0.222369	of using the register stack
-0.222369	the way the register stack
-0.307973	an explanation of register stack
-0.299292	code cache. The register stack
-0.510564	the floating point register stack
-0.333321	handling. Omitting the standard stack
-0.300209	"frame pointer". The standard stack
-0.232629	exception handling /EHs- No stack
-0.165117	an option for "standard stack
-0.434958	have big arrays and want
-0.335236	features, and you may want
-0.335236	For example, you may want
-0.165424	critical function and you want
-0.165424	different functions and you want
-0.165424	less efficient and you want
-0.165424	non-sequential access and you want
-0.165424	exception handling and you want
-0.165424	fast anyway and you want
-0.239093	the compiler that you want
-0.239093	instruction set that you want
-0.239093	of arrays that you want
-0.239093	of statements that you want
-0.239093	let's say that you want
-0.498104	for example if you want
-0.270543	For example, if you want
-0.270543	different CPUs if you want
-0.270543	annotation option if you want
-0.270543	some help if you want
-0.254831	piece of code you want
-0.255251	to do when you want
-0.255251	array indices when you want
-0.203125	running a program you want
-0.255966	point library. If you want
-0.255966	at www.agner.org/optimize/asmlib.zip. If you want
-0.336376	reproducible results. If you want
-0.255966	a macro. If you want
-0.255966	} 152 If you want
-0.152378	the function where you want
-0.152378	general case where you want
-0.410125	If, for example, you want
-0.152378	compiler does what you want
-0.152378	measure exactly what you want
-0.203125	functions that 150 you want
-0.254831	and Sum3. Whether you want
-0.334939	the optimizations that we want
-0.222912	is the function we want
-0.225884	manually, but if we want
-0.225884	For example, if we want
-0.307096	= n∙(n-1)!. If we want
-0.292436	books and manuals. I want
-0.441026	line if you don't want
-0.221066	and Intel compilers. We want
-0.221066	long dependency chain. We want
-0.233560	errors. If you just want
-0.232306	code. However, we still want
-0.191042	and software developers who want
-0.191042	are for those who want
-0.309811	variable in the code. Example:
-0.309811	previously in the code. Example:
-0.306027	identical pieces of code. Example:
-0.419077	adding any extra code. Example:
-1.004582	at the same time. Example:
-0.371130	of the called function. Example:
-0.229491	is a pure function. Example:
-0.426356	are stored in memory. Example:
-0.548479	stored in static memory. Example:
-0.477105	time they are used. Example:
-1.135757	the function is called. Example:
-0.233185	to unroll a loop. Example:
-0.207679	a power of 2. Example:
-0.232484	sequence of consecutive variables. Example:
-0.439814	optimizations across function calls. Example:
-0.231720	stack versus XMM registers. Example:
-0.231758	by a float variable. Example:
-0.323567	virtual function is needed. Example:
-0.230636	documentation for detailed instructions. Example:
-0.825704	in a non-sequential order. Example:
-0.229102	that it jumps to. Example:
-0.458485	no check for overflow. Example:
-0.246648	integer doesn't cause overflow. Example:
-0.369072	to the previous value. Example:
-0.227945	from a previous branch. Example:
-0.224731	by the same constant. Example:
-0.318843	from poor branch prediction. Example:
-0.224771	by the calculated result. Example:
-0.299100	of the loop counter. Example:
-0.224771	an additional integer counter. Example:
-0.222224	in a single operation. Example:
-0.306204	preceding iteration is finished. Example:
-0.222224	data in different ways. Example:
-0.222177	sake of parallel execution. Example:
-0.276406	addresses of array elements. Example:
-0.264963	calculate it only once. Example:
-0.264963	available registers is limited. Example:
-0.199841	make a lookup-table static. Example:
-0.251134	pointed to is known. Example:
-0.330567	doing the same thing. Example:
-0.199841	on completely independent divisions. Example:
-0.199841	set SSE2 or later. Example:
-0.164993	array to all zeroes. Example:
-0.164993	which may be undesired. Example:
-0.164993	and 32 bit offsets). Example:
-0.164993	avoid the loop overhead. Example:
-0.164993	write the members individually. Example:
-0.559566	while most of the Gnu
-0.654953	the case of the Gnu
-0.353165	is similar to the Gnu
-0.518823	-fpic according to the Gnu
-0.350228	for Windows and the Gnu
-0.350228	inlined 15.1b and the Gnu
-0.405391	is used in the Gnu
-0.519968	syntax described in the Gnu
-0.558651	and supported by the Gnu
-0.347557	libraries included with the Gnu
-0.449217	Gnu Comes with the Gnu
-0.458293	less efficient than the Gnu
-0.346726	automatically, and only the Gnu
-0.343829	is called, while the Gnu
-0.500814	expected to replace the Gnu
-0.437159	1.; } Here, the Gnu
-0.259028	The Microsoft, Intel and Gnu
-0.259028	with Microsoft, Intel and Gnu
-0.293568	Microsoft, Intel, PathScale and Gnu
-0.087001	13.6 CPU dispatching in Gnu
-0.195911	13.2. CPU dispatching in Gnu
-0.328222	on AMD CPUs. The Gnu
-0.488998	pure function calls. The Gnu
-0.311221	vector math libraries. The Gnu
-0.321768	the 32-bit version. The Gnu
-0.403784	with automatic vectorization. The Gnu
-0.311221	anonymous namespace. 3. The Gnu
-0.234962	(see page 107). The Gnu
-0.234962	Windows and Mac. The Gnu
-0.234962	library versions instead. The Gnu
-0.294092	__declspec(align(16)) X #else // Gnu
-0.237766	made with Microsoft or Gnu
-0.052399	n.a. MS compiler Windows Gnu
-0.025410	optimization MS compiler Windows Gnu
-0.236113	vector function libraries. Use Gnu
-0.051459	for the Microsoft, Intel, Gnu
-0.051459	as the Microsoft, Intel, Gnu
-0.109845	x86 platforms. Microsoft, Intel, Gnu
-0.233755	multiple operating systems. 10 Gnu
-0.230784	Digital Mars PGI PathScale Gnu
-0.581879	Supports only 32-bit Windows. Gnu
-0.122635	-fno-builtin Gnu 32-bit -fno-builtin Gnu
-0.122635	Gnu 64 bit -fno-builtin Gnu
-0.199981	0.6 1.19 13 Asmlib Gnu
-0.165122	11.1 for IA-32/Intel64, 2009. Gnu
-0.237557	for details. Development time Some
-0.237405	program. Avoid unnecessary functions Some
-0.427434	Use whole program optimization Some
-0.454982	Choice of function libraries Some
-0.922478	version of the code. Some
-0.568480	it at compile time. Some
-0.380761	modules. 3.12 Network access Some
-0.500546	sixteen in 64-bit systems. Some
-0.306290	in 64-bit operating systems. Some
-0.289966	work on all compilers. Some
-0.377431	handling. 8.6 Optimization directives Some
-0.291898	much on the compiler. Some
-0.330592	algebra in a compiler. Some
-0.251155	Comes with Microsoft compiler. Some
-0.438443	iteration of the loop. Some
-0.287179	in 32 bit mode. Some
-0.231803	block of 16 bytes. Some
-0.323953	} } Loop unrolling Some
-0.229056	in the optimal order. Some
-0.518421	using dynamic memory allocation. Some
-0.229086	strongest optimization option available. Some
-0.229056	because of technical problems. Some
-0.456631	of a cache line. Some
-0.227954	for less intensive applications. Some
-0.452367	if speed is important. Some
-0.222153	how to avoid them. Some
-0.222153	time doing the division. Some
-0.148600	array rather than two. Some
-0.148600	compilers will make two. Some
-0.218350	the program starts up. Some
-0.890524	matter of programming style. Some
-0.218350	before it can run. Some
-0.347193	it is intended for. Some
-0.265033	code and just-in-time compilation. Some
-0.212085	a multidimensional array sequentially. Some
-0.264937	2007 (www.intel.com/technology/itj/). 10.1 Hyperthreading Some
-0.212085	which one works best. Some
-0.199819	hardly ever used, though. Some
-0.164973	from many different places). Some
-0.164973	by the program logic. Some
-0.164973	at regular time intervals. Some
-0.164973	wasteful in the STL. Some
-0.164973	of the Xnu project. Some
-0.164973	or graphics accelerator card. Some
-0.164973	or memory pool. Alignment? Some
-0.164973	several iterations of redesign. Some
-0.164973	other things very stupid. Some
-0.164973	be obeyed. Copy protection. Some
-0.348712	efficient solution because of its
-0.292682	structure or each of its
-0.271409	application uses most of its
-0.271409	to run most of its
-0.271409	153 spends most of its
-0.343178	the polymorphic member of its
-0.344373	the individual bits of its
-0.512139	on the values of its
-1.106436	to a pointer to its
-0.237306	it must return to its
-0.184622	table of pointers to its
-0.293833	the function type and its
-0.237571	has no side-effects and its
-0.314383	store each object in its
-0.237613	such a framework in its
-0.350735	is used or if its
-0.237053	in a register if its
-0.491128	m is replaced by its
-0.313979	an integer constant with its
-0.236924	the loop counter with its
-0.236038	constructing the object on its
-0.312503	can then run on its
-0.349376	unknown CPU based on its
-0.322473	class c1 other than its
-0.354891	CPU supports, rather than its
-0.404635	set is better than its
-0.440147	compiler does not have its
-0.312878	memory block should have its
-0.293817	to a pointer then its
-0.428873	that can benefit from its
-0.237512	with element matrix[c][r] at its
-0.232336	stack. Each thread has its
-0.232336	a linked list has its
-0.232336	each CPU model has its
-0.232336	A template instance has its
-0.314128	the simplest cases, but its
-0.581696	structure to make sure its
-0.452235	which gets information about its
-0.323028	to give each thread its
-0.235684	data object: (1) get its
-0.235505	the compiler must calculate its
-0.288920	optimized. We cannot change its
-0.231285	thread should then handle its
-0.316581	in order to align its
-0.136330	child class by type-casting its
-0.136330	different type by type-casting its
-0.165071	it from fully utilizing its
-0.303615	without worrying too much about
-0.228572	have to worry much about
-0.089941	valuable source of information about
-0.089941	compiler doesn't have information about
-0.089941	119 for more information about
-0.042646	handler needs all information about
-0.042646	has saved all information about
-0.089941	compiler has no information about
-0.089941	disassembly, probably without information about
-0.042646	have the necessary information about
-0.042646	the 124 necessary information about
-0.089941	in memory. No information about
-0.089941	give the full information about
-0.089941	lot of added information about
-0.129229	on the CPUID information about
-0.042646	class which gets information about
-0.042646	generation class gets information about
-0.089941	the compiler additional information about
-0.089941	Compiler has insufficient information about
-0.089941	it has incomplete information about
-0.234305	as you can read about
-0.504042	statement can be made about
-0.021846	that is said here about
-0.231320	assembly names. The details about
-0.249014	page 141 for details about
-0.182190	these problems. More details about
-0.020509	important to do something about
-0.229210	don't have to care about
-0.226715	to obey certain rules about
-0.279340	storage and page 87 about
-0.276445	making any specific recommendation about
-0.355912	delay. See page 43 about
-0.218455	cached. See page 26 about
-0.218494	cannot make any assumption about
-0.212188	2 (See page 137 about
-0.330672	do have to worry about
-0.199920	different threads, but that's about
-0.199920	compiler takes the hint about
-0.199920	make a few comments about
-0.165066	newsgroups contain useful discussions about
-0.165066	the programmer hasn't thought about
-0.165066	should get a reply about
-0.165066	not be too worried about
-0.165066	4: "Instruction tables". Tips about
-0.165066	on Intel processors. Details about
-0.165066	is a considerable debate about
-0.455226	code then it is important
-0.455226	index then it is important
-0.455226	found, then it is important
-0.515582	Bridge) because it is important
-0.521375	applications, but it is important
-0.515582	programmed. Therefore, it is important
-0.331803	software project, it is important
-0.331803	like these, it is important
-0.331803	in nature, it is important
-0.474729	longer used. It is important
-0.417880	a pointer. It is important
-0.322576	induction variables. It is important
-0.417880	mathematical calculations. It is important
-0.322576	assembly language. It is important
-0.322576	it is. It is important
-0.322576	to do. It is important
-0.322576	program structure. It is important
-0.322576	data decomposition. It is important
-0.322576	and off. It is important
-0.338789	time when performance is important
-0.358381	are stored can be important
-0.237734	is busy concentrating on important
-0.535557	users as well as important
-0.324321	a program is an important
-0.420055	throughput There is an important
-0.237611	saying please install this important
-0.289118	the system, the more important
-0.352859	file access is more important
-0.331601	of development are more important
-0.289118	it is even more important
-0.399571	each of the most important
-0.399571	Some of the most important
-0.224739	this problem. The most important
-0.224739	libraries available. The most important
-0.224739	execute faster. The most important
-0.224739	out-of-order execution. The most important
-0.224739	code generality. The most important
-0.344157	This function is so important
-0.401447	that it is very important
-0.282510	data. It is very important
-0.282510	of algorithm is very important
-0.449591	processors, but is less important
-0.232313	platform has become less important
-0.236281	to avoid them. Some important
-0.236122	speed is important. An important
-0.559181	memory. It is therefore important
-0.331956	the problem is too important
-0.420762	considerations that are particularly important
-0.575802	global if it is accessed
-0.540593	inefficient because it is accessed
-0.524730	cache, where it is accessed
-0.348021	and therefore it is accessed
-0.357437	below shows. It is accessed
-0.318384	variable or object is accessed
-0.318384	that no object is accessed
-0.293168	function or variable is accessed
-0.293168	data A variable is accessed
-0.336252	level-1 data cache and accessed
-0.639061	pointed to can be accessed
-0.639061	this example can be accessed
-0.348400	then both can be accessed
-0.348400	a DLL can be accessed
-0.450282	variables. They can be accessed
-0.458182	Multidimensional arrays should be accessed
-0.440450	when the data are accessed
-0.053641	efficiently when data are accessed
-0.307962	in a class are accessed
-0.232226	and child class are accessed
-0.353275	if the objects are accessed
-0.089139	manner? If objects are accessed
-0.093597	and the elements are accessed
-0.093597	that the elements are accessed
-0.093597	if the elements are accessed
-0.151194	only when elements are accessed
-0.342708	stored in registers are accessed
-0.428649	when the arrays are accessed
-0.242778	efficient when arrays are accessed
-0.242778	6. If arrays are accessed
-0.406988	that the addresses are accessed
-0.139986	that the rows are accessed
-0.222240	if the rows are accessed
-0.228433	arrays or structures are accessed
-0.046848	above the diagonal are accessed
-0.046848	below the diagonal are accessed
-0.407947	around in memory or accessed
-1.008344	that it is not accessed
-0.517041	the function is not accessed
-0.341556	be used, even when accessed
-0.209255	compile time. Are objects accessed
-0.209255	list. 94 Are objects accessed
-0.451697	file that has been accessed
-0.508324	program are in fact accessed
-0.222348	or malloc) is necessarily accessed
-0.338685	data. The speed of CPUs
-0.335802	each new generation of CPUs
-0.366102	treats different brands of CPUs
-0.255375	for other brands of CPUs
-0.341327	the strlen function for CPUs
-0.331308	set, another version for CPUs
-0.849704	that is compatible with CPUs
-0.292086	of this function on CPUs
-0.339755	is optimal only on CPUs
-0.859520	has reduced performance on CPUs
-0.336994	to have many different CPUs
-0.341132	test on several different CPUs
-0.331229	clock cycles than other CPUs
-0.346012	be compatible with all CPUs
-0.237290	is supported by most CPUs
-0.315007	in favor of Intel CPUs
-0.229418	specific profiler. For Intel CPUs
-0.229418	while all newer Intel CPUs
-0.229418	tasks on current Intel CPUs
-0.330766	a computer with multiple CPUs
-0.306540	future. To use multiple CPUs
-0.231031	in parallel: Using multiple CPUs
-0.237126	supported on all 64-bit CPUs
-0.236250	of 16 bytes. Some CPUs
-0.236065	or Intel compiler. Use CPUs
-0.303354	Intel VTune, for AMD CPUs
-0.328295	but not on AMD CPUs
-0.234085	performance monitor counters Many CPUs
-0.233744	execution All modern x86 CPUs
-0.321530	when compatibility with old CPUs
-0.213794	This is because modern CPUs
-0.213794	chapter 12. Most modern CPUs
-0.232287	in the oldest Pentium CPUs
-0.064117	reduced performance on non-Intel CPUs
-0.166224	The speed on non-Intel CPUs
-0.144979	CPU dispatcher treats non-Intel CPUs
-0.144979	these also treat non-Intel CPUs
-0.229984	that doesn't handle current CPUs
-0.136356	optimization by CPU Modern CPUs
-0.136356	are critical resources. Modern CPUs
-0.136356	calculations in parallel. Modern CPUs
-0.136356	temp1 and temp2. Modern CPUs
-0.199920	are called accumulators. Current CPUs
-0.199920	and integer division. Older CPUs
-0.165066	operations on contemporary 106 CPUs
-0.165066	for some small low-power CPUs
-0.459829	specific version of the function.
-0.459829	desired version of the function.
-0.317630	right version of the function.
-0.455935	the start of the function.
-0.456839	implicit parameter to the function.
-0.353577	is transferred to the function.
-0.340677	to return from the function.
-0.340677	ret returns from the function.
-0.346557	other modules call the function.
-0.352318	object defined inside the function.
-0.723736	jumping out of a function.
-0.356751	is equivalent to a function.
-0.235465	at least one other function.
-0.636981	doesn't call any other function.
-0.350079	before calling the library function.
-0.314030	be accessed from any function.
-0.453049	address of the member function.
-0.096340	efficient as a member function.
-0.557617	to a class member function.
-0.412985	as a virtual member function.
-0.100846	body of the called function.
-0.100846	caller to the called function.
-0.688354	version of the critical function.
-0.429906	call of the critical function.
-0.416790	call to the critical function.
-0.440094	parameters of the new function.
-0.351303	than isolating a single function.
-0.824705	version of the virtual function.
-0.347146	code of the next function.
-0.234765	use the _mm_clflush intrinsic function.
-0.556010	or in a separate function.
-0.192490	points to the dispatcher function.
-0.192490	gets from the dispatcher function.
-0.192490	// Make the dispatcher function.
-0.869618	pointer to the desired function.
-0.536324	call to the inlined function.
-0.263080	copy of an inlined function.
-0.334999	your own error message function.
-0.284292	this is a pure function.
-0.283058	implementation of the memcpy function.
-0.573049	to call a polymorphic function.
-0.304376	is called a leaf function.
-0.222325	are some examples: strlen function.
-0.212194	to call the ReadTSC function.
-0.199925	in registers anyway. Pure function.
-0.502494	make use of the extra
-0.538476	time because of the extra
-0.349306	you must do the extra
-0.382386	can optimize away the extra
-0.237586	be faster despite the extra
-0.499174	contains a lot of extra
-0.717084	consume a lot of extra
-0.294141	of a structure. The extra
-0.236766	less than 231. This extra
-0.236766	(see page 135). This extra
-0.149985	then there is an extra
-0.149985	But there is an extra
-0.149985	cases, there is an extra
-0.436767	processor will have an extra
-0.283545	faster and makes an extra
-0.228520	loop or add an extra
-0.228520	because it needs an extra
-0.228520	importantly, it requires an extra
-0.228520	application software. Such an extra
-0.228520	that it adds an extra
-0.324999	list and make this extra
-0.352330	system database, and other extra
-0.571715	object. There is no extra
-0.272725	There will be no extra
-0.089659	the code takes no extra
-0.089659	conversion often takes no extra
-0.089659	exception handling takes no extra
-0.089659	This conversion takes no extra
-0.161708	long double take no extra
-0.161708	different precisions take no extra
-0.218974	type conversion generates no extra
-0.233556	Copying the table takes extra
-0.233556	reading them again takes extra
-0.219408	and doesn't take any extra
-0.219408	it doesn't generate any extra
-0.019573	do not produce any extra
-0.009676	does not produce any extra
-0.219408	type-casting without adding any extra
-0.324248	gives rise to some extra
-0.346227	compiler has to take extra
-0.338297	program logic may need extra
-0.350294	arrays require a few extra
-0.235422	compiler may actually add extra
-0.224897	code gives an 9 extra
-0.222306	Runtime type identification adds extra
-0.212234	Instrumentation: The compiler inserts extra
-0.553069	using a function that does
-0.455320	time. A code that does
-0.419380	example, a loop that does
-0.236608	a default constructor that does
-0.237724	the code automatically or does
-0.558820	notice is that it does
-0.345013	the advantage that it does
-0.235167	when in fact it does
-0.235167	efficient, and sometimes it does
-0.235167	comes to optimization, it does
-0.314550	__attribute(( const)) Assume function does
-0.236751	a ready-made profiler. This does
-0.236751	values per point. This does
-0.439719	check if the compiler does
-0.103600	Checking what the compiler does
-0.340024	code. Sometimes the compiler does
-0.352042	an integer. The compiler does
-0.285432	(parallel composer) This compiler does
-0.315937	} The Microsoft compiler does
-0.230182	of code. Each compiler does
-0.324965	is used. However, this does
-0.234444	the previous value. It does
-0.234444	more syntax check. It does
-0.234444	for pointer conversions. It does
-0.547475	condition inside the loop does
-0.407376	implicit 'this' pointer which does
-0.338577	sure that the pointer does
-0.352755	or decrementing a pointer does
-0.231865	that a specific pointer does
-0.237157	SVML. The IPP library does
-0.353107	virtual member the object does
-1.542559	a power of 2 does
-0.823392	using powers of 2 does
-0.293087	because it is long does
-0.292028	A process or thread does
-0.330328	is important. This manual does
-0.322500	is that the list does
-0.234937	cast The static_cast operator does
-0.527395	program. The CPU dispatcher does
-0.310614	page 70). The programmer does
-0.321011	Specifies that pointer aliasing does
-0.472446	on the other hand, does
-0.272132	but unfortunately the unit-test does
-0.212183	operators). The same argument does
-0.251216	exponent } Example 14.26 does
-0.165060	CPU’s. Another function __intel_cpu_features_init_x() does
-0.358350	source annotation in the assembly
-0.552852	may look at the assembly
-0.643423	This option makes the assembly
-0.331733	example shows what the assembly
-0.976939	by the use of assembly
-0.237894	or easy linking to assembly
-0.152860	optimization of C++ and assembly
-0.292059	usually dealt with in assembly
-0.419219	as a pointer in assembly
-0.236012	15.1b and d in assembly
-0.292059	'$' are allowed in assembly
-0.048086	2. Optimizing subroutines in assembly
-0.005087	2: "Optimizing subroutines in assembly
-0.736811	from the function. The assembly
-0.293724	for assembly output. The assembly
-0.444916	a compiler option for assembly
-0.928380	An optimization guide for assembly
-0.237094	-S or /Fa for assembly
-0.102574	in compiled C++ or assembly
-0.102574	in C, C++ or assembly
-0.329039	link map or an assembly
-0.344312	compiler doesn't have an assembly
-0.425944	able to generate an assembly
-0.450511	you need to use assembly
-0.450511	rarely necessary to use assembly
-0.232247	a round function using assembly
-0.350956	classes than by using assembly
-0.232247	are highly optimized, using assembly
-0.333566	device drivers may need assembly
-0.231381	Gnu). Other compilers need assembly
-0.061575	compiler generates the following assembly
-0.236091	to do this: Use assembly
-0.236016	this for testing single assembly
-0.198917	excellent support for inline assembly
-0.250094	code with an inline assembly
-0.198917	is to use inline assembly
-0.198917	using the same inline assembly
-0.198917	assembly-like intrinsic functions, inline assembly
-0.212262	read and understand compiler-generated assembly
-0.212223	-parallel -openmp -static Generate assembly
-0.165097	details of instruction timing, assembly
-0.165097	be straightforward. The MASM assembly
-0.540725	avoided because of the large
-0.345105	if you avoid the large
-0.544371	If the object is large
-0.331446	if the list is large
-0.970664	the repeat count is large
-0.457354	inefficient. There is a large
-0.457354	recycled? There is a large
-0.357006	more complicated in a large
-0.351295	conflicts. But if a large
-0.355051	will appear as a large
-0.236471	implicitly when copying a large
-0.236471	(or at least a large
-0.313020	user must install a large
-0.236471	solution can incur a large
-0.567213	justified in case of large
-0.293982	are many allocations of large
-0.445331	on all data in large
-0.048346	9.10 Cache contentions in large
-0.237082	but quite inefficient in large
-0.976384	can be useful for large
-0.625224	are highly optimized for large
-0.294010	on mathematical applications with large
-0.331774	when doing calculations on large
-0.237706	exceed 2 Gbytes. This large
-0.335954	systems. Applications that use large
-0.292123	allocated memory block. A large
-0.236069	be mentioned here: A large
-0.313781	loop counters, etc. In large
-0.344145	point variables is so large
-0.325142	the library is very large
-0.328653	disadvantage of a very large
-0.249539	perhaps for a very large
-0.328653	integers with a very large
-0.249539	interpreted as a very large
-0.158694	used only for very large
-0.158694	24 dramatically for very large
-0.236087	dedicated test server. Use large
-0.323047	a program has several large
-0.322883	other's caches and cause large
-0.310599	Arrays that are too large
-0.283380	may choose to align large
-0.335222	Some compilers will align large
-0.016030	the arrays are sufficiently large
-0.212217	branches that can skip large
-0.649608	This means that a must
-0.294107	large runtime framework that must
-0.355496	clean up then it must
-0.802232	disadvantage that the function must
-0.293219	object. The calling function must
-0.357740	x; Here, the code must
-0.517703	inefficient because the compiler must
-0.352400	in list, the compiler must
-0.331592	the reading of x must
-0.353022	If not, then you must
-0.233025	fastest first. However, you must
-0.233025	kind of problems you must
-0.529158	In other words, you must
-0.233025	a specific purpose, you must
-0.237560	to do manually. It must
-0.356874	other words, the program must
-0.291518	more difficult. The functions must
-0.333807	page 140). Mathematical functions must
-0.537952	9.6b. The MOVNTQ instruction must
-0.237370	as it is, but must
-0.335594	class. The container class must
-0.293473	the compilers cannot do must
-0.503980	initial value of i must
-0.351017	dictates that an object must
-0.349862	S1 in the array must
-0.293164	CPU models. However, we must
-0.293023	then the loop branch must
-0.406364	where the carry bit must
-0.494439	is that the user must
-0.337993	relatively costly because they must
-0.236286	That being said, I must
-0.342759	shared between multiple threads must
-0.235353	here: The inequality sign must
-0.234970	same cache. Multithreaded programs must
-0.234464	worst possible performance. We must
-0.531100	of user interface framework must
-0.233844	The 128-bit XMM vectors must
-0.334597	object. Any copy constructor must
-0.232921	and rounding 137 errors must
-0.232876	index, i. This index must
-0.232152	the mouse. This task must
-0.222275	= 8; // SIZE must
-0.199847	to vectorize. The pragmas must
-0.199847	is implemented. The recursion must
-0.251140	the destructor, if any, must
-0.164998	software module for correctness must
-0.358473	The purpose of the while
-0.358328	remains zero in the while
-0.339155	on n, including the while
-0.237637	order to emulate the while
-0.444136	of the program, and while
-0.349554	The loop would be while
-0.821665	resolved at compile time while
-0.351654	the processor can do while
-0.356451	{ seconds = 0; while
-0.344039	float uses 32 bits while
-0.075762	CPU can do calculations while
-0.075762	thread can do calculations while
-0.291134	and writing data files while
-0.235029	permissible in all cases, while
-0.233414	called a frame function, while
-0.337324	to restart the computer while
-0.233252	little or no overhead while
-0.320789	Gnu compiler for Windows, while
-0.231277	function can modify x, while
-0.371728	XMM registers are used, while
-0.227933	the old Pentium 4, while
-0.367113	int in one vector, while
-0.699275	the function is called, while
-0.226552	and c are integers, while
-0.226552	speed or program size, while
-0.226552	time if and compile-time while
-0.224761	debugger and press break while
-0.315561	double y = 1.0; while
-0.222215	it is running on, while
-0.222215	{ // do nothing while
-0.553113	shared between multiple threads, while
-0.465794	string[100], *p = string; while
-0.199830	accept expressions as arguments while
-0.199830	to resume after exceptions: while
-0.251121	is 16 bits wide, while
-0.199830	are done only once, while
-0.164983	p has been incremented, while
-0.164983	condition is relatively expensive, while
-0.164983	all 1's is unchanged, while
-0.164983	same register for both, while
-0.164983	a simple regular pattern, while
-0.164983	one call to Func1, while
-0.164983	for generality and flexibility, while
-0.164983	are separated by semicolons, while
-0.164983	needed by the application, while
-0.164983	to 15.1c as intended, while
-0.461779	to end of a ;
-0.355436	; ecx = a ;
-0.237510	name ;startofFunc ; a ;
-0.345182	what r points to ;
-0.036696	to top of loop ;
-0.017960	; top of loop ;
-0.789691	sign bit of i ;
-0.326584	with end of array ;
-0.233640	store result in array ;
-0.293026	from stack ; return ;
-0.313483	= divide by 2 ;
-0.292882	; align by 4 ;
-0.229680	save ebx on stack ;
-0.229680	restore ebx from stack ;
-0.599775	; mangled function name ;
-0.233712	; a ; r ;
-0.231243	repeat loop if true ;
-0.203212	compute i/2 in ebx ;
-0.203212	eax, 100 $B1$2 ebx ;
-0.285970	; return ; align ;
-0.065604	loop ; unused label ;
-0.065604	r ; unused label ;
-0.065604	true ; unused label ;
-0.226712	for calculations: for ( ;
-0.226712	; ecx = Induction ;
-0.272132	ja $B2$3: ret ALIGN ;
-0.208608	; a[i] = Induction; ;
-0.208608	; a[i+1] = Induction; ;
-0.212131	#endif double Func1(double) pure_function ;
-0.284141	1: 8 + esp ;
-0.199864	eax $B2$2 ; Induction++; ;
-0.165014	8 edx, eax $B2$2 ;
-0.165014	; point to a[i+2] ;
-0.165014	PUBLIC ?Func@@YAXQAHAAH@Z ?Func@@YAXQAHAAH@Z PROCNEAR ;
-0.165014	example 8.26a (32-bit mode): ;
-0.165014	is the variable 85 ;
-0.165014	?Func@@YAXQAHAAH@Z ?Func@@YAXQAHAAH@Z PROC NEAR ;
-0.165014	code from example 8.26b: ;
-0.165014	unused label ;eax=addressofa ;edx=addressinr ;
-0.165014	; i + sign(i) ;
-0.165014	4 + esp ;alignby4 ;
-0.165014	of Func ;a ;r ;
-0.165014	array ; i++ ;checkifi<100 ;
-0.165014	edx, DWORD PTR [esp+12] ;
-0.165014	mangled function name ;startofFunc ;
-0.648758	the compiler that the arrays
-0.519076	operations require that the arrays
-0.356605	a disadvantage when the arrays
-0.355896	above. 7. If the arrays
-0.436139	to make sure the arrays
-0.731020	solution of making the arrays
-0.303544	be efficient whether the arrays
-0.303544	for sure whether the arrays
-0.237143	compiler can align the arrays
-0.564118	makes the use of arrays
-0.523919	in a number of arrays
-0.341279	www.agner.org/optimize/cppexamples.zip contains examples of arrays
-0.293538	__attribute__((aligned(16))). Specifies alignment of arrays
-0.336055	not always apply to arrays
-0.341546	same advice applies to arrays
-0.324949	align large objects and arrays
-0.237565	memory allocation Objects and arrays
-0.325260	never occurs, even for arrays
-0.420844	code more efficient when arrays
-0.488343	how you can make arrays
-0.352019	memory spaces for different arrays
-0.237430	a vector. 6. If arrays
-0.313955	objects and fixed size arrays
-0.311681	you want as static arrays
-0.234039	will align large static arrays
-0.284472	in case of large arrays
-0.229337	program has several large arrays
-0.228952	therefore recommended that big arrays
-0.580023	if you have big arrays
-0.810344	one or a few arrays
-0.340011	errors is to replace arrays
-0.219436	how to make aligned arrays
-0.219436	// Make three aligned arrays
-0.376353	to static or global arrays
-0.324320	is optimized for accessing arrays
-0.282980	same for simple variables, arrays
-0.212235	storing strings in character arrays
-0.212183	of RAM memory. Big arrays
-0.199914	// Example 12.5. Aligned arrays
-0.199914	is to allocate variable-size arrays
-0.199914	for Linux) 4. Align arrays
-0.165060	backwards. Copying or clearing arrays
-0.165060	is very inefficient. Linear arrays
-0.165060	as writing data. Multidimensional arrays
-0.356950	cycles later and the work
-0.573259	dynamically depending on the work
-0.349328	is increased when the work
-0.349328	and decreased when the work
-0.313825	is to divide the work
-0.185652	order to divide the work
-0.185652	need to divide the work
-0.348329	an equal amount of work
-0.344159	want the function to work
-0.352492	a Windows compiler to work
-0.455132	shows a way to work
-0.437100	This is sure to work
-0.578638	method is likely to work
-0.313323	examples are intended to work
-0.236725	it is impossible to work
-0.237833	Linux. 82 Keywords that work
-0.336191	package and make it work
-0.351220	of pointers may not work
-0.352833	However, this does not work
-0.293923	Mac systems, this may work
-0.324969	how to make this work
-0.343297	n;} This code will work
-0.236040	next processor model will work
-0.382108	if there is other work
-0.237367	of math functions should work
-0.237088	CPU. These methods also work
-0.448700	libraries do not always work
-0.236458	of the trivial programming work
-0.426847	because of the extra work
-0.292232	that all code versions work
-0.438515	saying that it doesn't work
-0.362831	method, but it doesn't work
-0.208312	Unfortunately, this method doesn't work
-0.208312	if above line doesn't work
-0.208312	// if above doesn't work
-0.234699	make the next model work
-0.429970	view the software development work
-0.219402	of the Gnu directives work
-0.219402	of the Microsoft directives work
-0.231293	should do as little work
-0.524610	Shared objects in BSD work
-0.228048	giving it some heavy work
-0.218455	details about how caches work
-0.212188	to be deleted. User work
-0.165066	unusual for the reinstallation work
-0.345174	object p points to (see
-0.462022	help of the compiler (see
-0.382007	to the preceding one (see
-0.336350	of the data cache (see
-0.290961	microcontrollers have no cache (see
-0.434145	of the derived class (see
-0.501352	mixes float and double (see
-0.428646	use a smart pointer (see
-0.490492	functions are less efficient (see
-0.342724	subexpressions, and induction variables (see
-0.869510	stored in a register (see
-0.690483	are stored in registers (see
-0.524872	in the XMM registers (see
-0.334360	is divisible by 16 (see
-0.997450	and the operating system (see
-0.329843	to use vector instructions (see
-0.380626	running on non-Intel processors (see
-0.349428	integer with a constant (see
-0.562578	storage on the stack (see
-0.236028	overflow checks where necessary (see
-0.443621	than with unsigned integers (see
-0.291792	relax floating point precision (see
-0.490725	use a linked list (see
-0.440198	than 0 or 1 (see
-0.533425	core clock cycle counter (see
-0.438350	in floating point expressions (see
-0.706300	is out of range (see
-0.232961	indeed vectorized as intended (see
-0.193899	intrinsics and automatic vectorization (see
-0.193899	vector intrinsics, automatic vectorization (see
-0.318818	{ // Bounds checking (see
-0.229903	cost to using templates (see
-0.963547	of the critical stride (see
-0.329856	break down dependency chains (see
-0.229043	can be quite time-consuming (see
-0.398504	assume no pointer aliasing (see
-0.195874	cannot rule out aliasing (see
-0.313206	have no branch prediction (see
-0.227939	extensions. If a profiling (see
-0.623314	microprocessor with out-of-order capabilities (see
-0.317352	rather than the throughput (see
-0.212102	in case of mispredictions (see
-0.199836	appears to be profitable (see
-0.164988	language and automatic CPU-dispatching (see
-0.164988	to do the devirtualization (see
-0.356632	light-weight alternative is the Windows
-0.358412	registration database in the Windows
-0.348218	same name. In the Windows
-0.354450	a directive for a Windows
-0.294004	I once made a Windows
-0.399113	for 64-bit Linux and Windows
-0.307443	also supports Linux and Windows
-0.236803	in Windows 7 and Windows
-0.292959	operating systems DOS and Windows
-0.236803	Template Library (ATL) and Windows
-0.443878	AVX is supported in Windows
-0.237621	user interface (OnIdle in Windows
-0.293729	for best performance. The Windows
-0.237480	Linux and BSD. The Windows
-0.561722	the Intel compiler for Windows
-0.307080	or Microsoft compiler for Windows
-0.535773	and Intel compilers for Windows
-0.292452	user interface library for Windows
-0.513076	Studio when compiling for Windows
-0.237792	__declspec(align(64)) int BigArray[1024]; // Windows
-0.237729	in some cases on Windows
-0.311960	Intel compiler Intel compiler Windows
-0.188925	compiler Linux Intel compiler Windows
-0.064542	_WIN32 n.a. MS compiler Windows
-0.015259	to optimization MS compiler Windows
-0.302051	The disadvantage of 64-bit Windows
-0.501560	supports 32-bit and 64-bit Windows
-0.421858	Supports 32- and 64-bit Windows
-0.880670	Linux than in 64-bit Windows
-0.218751	more efficient than 64-bit Windows
-0.218751	is more efficient. 64-bit Windows
-0.218751	conventions are different. 64-bit Windows
-0.303993	32-bit Linux and 32-bit Windows
-0.348234	and Sum3 in 32-bit Windows
-0.337129	is intended for 32-bit Windows
-0.534424	seen in 64 bit Windows
-0.256784	to work in both Windows
-0.256784	assembly syntax in both Windows
-0.222306	for user input. (In Windows
-0.272191	and 64-bit Linux, BSD, Windows
-0.212234	Borland's now discontinued Object Windows
-0.251272	_WIN64 _LP64 _WIN64 _LP64 Windows
-0.165107	to the current position. Windows
-0.348274	cache in between the calls
-0.520351	able to avoid the calls
-0.591199	double the number of calls
-0.339375	repeats 20 times and calls
-0.940017	is a function that calls
-0.454695	loop A function that calls
-0.440163	update the program that calls
-0.387410	small test program that calls
-0.236208	that each statement that calls
-0.287025	excessive number of function calls
-0.287025	the chain of function calls
-0.106734	of branches and function calls
-0.106734	many branches and function calls
-0.227374	time spent on function calls
-0.282244	modifier can make function calls
-0.392636	application with many function calls
-0.098974	stack. This makes function calls
-0.098974	stack frame makes function calls
-0.227374	despite the extra function calls
-0.333258	misprediction of virtual function calls
-0.321430	that contain pure function calls
-0.227374	predicted well. Even function calls
-0.312519	If a dispatched function calls
-0.227374	though the 61 function calls
-0.227374	problems. Avoid nested function calls
-0.237738	replace such loops by calls
-0.344235	error message and then calls
-0.236026	API function which then calls
-0.293544	a program contains no calls
-0.619946	a program has many calls
-0.236434	function call statement always calls
-0.236434	be determined with system calls
-0.235660	big. 7.14 Functions Function calls
-0.570665	compiled with AVX support calls
-0.646130	of a program contains calls
-0.234479	is described below. Make calls
-0.233010	function which in turn calls
-0.251285	F1. However, if F1 calls
-0.199976	be necessary. If F1 calls
-0.361203	code in example 16.2 calls
-0.222277	of a "function". Multiple calls
-0.291637	is loaded, the loader calls
-0.218473	If an error handler calls
-0.212206	should use standard API calls
-0.165081	the number of jumps, calls
-0.357677	the inputs to the calculations
-0.567844	branch depends on the calculations
-0.335749	will propagate through the calculations
-0.407521	CPU to overlap the calculations
-0.574675	it has finished the calculations
-0.382256	_finite()) and redo the calculations
-0.083848	in the sequence of calculations
-0.083848	if the sequence of calculations
-0.255625	doing a sequence of calculations
-0.294127	case of error. The calculations
-0.640320	control branch depends on calculations
-0.341270	return operations with other calculations
-0.515529	before the floating point calculations
-0.479401	integer and floating point calculations
-0.467292	registers for floating point calculations
-0.060283	that does floating point calculations
-0.317324	loop contains floating point calculations
-0.317324	enable fast floating point calculations
-0.317324	for strict floating point calculations
-0.338531	and double Floating point calculations
-0.314049	to do simple integer calculations
-0.646425	it takes to do calculations
-0.321393	the CPU can do calculations
-0.416408	one thread can do calculations
-0.120009	renaming and doing multiple calculations
-0.120009	CPU from doing multiple calculations
-0.120009	the CPU doing multiple calculations
-0.236973	C++ for doing some calculations
-0.232458	relocate, but these address calculations
-0.232458	while the runtime address calculations
-0.337123	to do the necessary calculations
-0.311137	know that double precision calculations
-0.311137	most cases, double precision calculations
-0.291464	are useful when doing calculations
-0.235311	register stack are: All calculations
-0.291292	make sure that certain calculations
-0.235284	must consider if intermediate calculations
-0.276546	functions for common mathematical calculations
-0.222348	and to mix mathematical calculations
-0.330774	the CPU to start calculations
-0.230707	available for doing parallel calculations
-0.228046	doing the heavy background calculations
-0.228081	uninitialized, if pointer arithmetic calculations
-0.281463	to put time- consuming calculations
-0.420187	Check that all code versions
-0.236859	to make multiple code versions
-0.352899	tests on Intel compiler versions
-0.236591	precision. The following compiler versions
-0.545127	are two or more versions
-0.387197	Make two or more versions
-0.250625	pointers to the different versions
-0.250625	to test the different versions
-0.250625	classes contain the different versions
-0.321159	is available in different versions
-0.200667	Overloaded functions The different versions
-0.200667	instructions sets. The different versions
-0.221558	large or if different versions
-0.321159	even compatible with different versions
-0.221558	file stub. If different versions
-0.420501	you have two different versions
-0.237172	-fno-builtin to get library versions
-0.070635	the code in multiple versions
-0.033883	critical code in multiple versions
-0.100432	entire program in multiple versions
-0.100432	is compiled in multiple versions
-0.158351	you may make multiple versions
-0.158351	This will make multiple versions
-0.213244	feature for making multiple versions
-0.213244	can automatically generate multiple versions
-0.429649	cases, there are two versions
-0.286391	common to make two versions
-0.316966	to distinguish these two versions
-0.236557	mechanism to advertise new versions
-0.236263	(www.intel.com/technology/itj/). 10.1 Hyperthreading Some versions
-0.323752	lowest of the compiled versions
-0.292051	library functions have several versions
-0.235812	www.agner.org/optimize/asmlib.zip. Currently includes optimized versions
-0.234423	AVX2 and all three versions
-0.232230	effort to make special versions
-0.230700	divisible by 16. Library versions
-0.230019	is available in newer versions
-0.226729	systems. 3 The latest versions
-0.199942	directly to the CPU-specific versions
-0.035767	for Intel CPUs. New versions
-0.035767	for AMD CPUs. New versions
-0.199942	be used as command-line versions
-0.199942	checks where necessary. Fast versions
-0.165086	an IDE. Free trial versions
-0.358264	the throughput of the execution
-0.461106	dramatic effect on the execution
-0.102645	they can block the execution
-0.102645	can possibly block the execution
-0.338804	subsequent counts give the execution
-0.346048	did not improve the execution
-0.048297	may slow down the execution
-0.048297	operations slow down the execution
-0.141346	cost in terms of execution
-0.237880	cheap, in relation to execution
-0.335922	such as cache and execution
-0.237561	usability, program compactness, and execution
-0.561751	} } } The execution
-0.314194	by memory access. The execution
-0.341709	examples are optimized for execution
-0.255029	compiler. Use CPUs with execution
-0.255029	division. Older CPUs with execution
-0.352434	the throughput of an execution
-0.293859	vector size often have execution
-0.313063	reproducibility. Delays in program execution
-0.291788	array grows during program execution
-0.446025	evenly between the different execution
-0.312952	integer operations use different execution
-0.357512	no native floating point execution
-0.237252	future CPUs. Half size execution
-0.237121	in fact only 64-bit execution
-0.237060	can be used where execution
-0.036644	11 Out of order execution
-0.236139	generality and flexibility, while execution
-0.235981	is split between several execution
-0.291289	are good for optimizing execution
-0.588114	spend most of their execution
-0.177705	cache because the out-of-order execution
-0.177705	In general, the out-of-order execution
-0.189762	automatically thanks to out-of-order execution
-0.533449	contribution to the total execution
-0.265371	effect on the total execution
-0.231340	had the full 128-bit execution
-0.231326	task switch occurs during execution
-0.228037	is critical. The fastest execution
-0.212177	unfortunate method that delays execution
-0.165055	on CPUs with full-size execution
-0.165055	into multiple threads. Out-of-order execution
-0.352543	important thing is to avoid
-0.564925	execution, you have to avoid
-0.585597	can be used to avoid
-0.233678	with fixed size to avoid
-1.639676	It is possible to avoid
-0.519473	size in order to avoid
-0.519473	variables in order to avoid
-0.519473	needed in order to avoid
-0.883292	the best way to avoid
-0.817767	examples of how to avoid
-0.578703	to know how to avoid
-0.316445	C++ programming, how to avoid
-0.702325	the operating system to avoid
-0.233678	the same type to avoid
-1.107103	and you want to avoid
-0.974447	may be able to avoid
-0.194884	are various ways to avoid
-0.164576	describe various ways to avoid
-0.530451	which processor models to avoid
-0.233678	in test situations to avoid
-0.289405	during time measurements to avoid
-0.233678	be completely unrolled to avoid
-0.458692	variables if possible, and avoid
-0.447536	global if you can avoid
-0.501986	so, then you can avoid
-0.317104	fastest because you can avoid
-0.317104	programs. If you can avoid
-0.444760	C2, then we can avoid
-0.302196	of time. You can avoid
-0.302196	it twice. You can avoid
-0.302196	GOT entry. You can avoid
-1.081963	} The compiler may avoid
-0.336251	memory used. You may avoid
-0.336251	base classes. You may avoid
-0.353726	small devices if you avoid
-0.766452	as long as you avoid
-0.142302	it. Therefore, you should avoid
-0.142302	consuming. Therefore, you should avoid
-0.142302	namespaces. Therefore, you should avoid
-0.330194	memory allocation. You should avoid
-0.341163	cycles. If you cannot avoid
-0.339739	of CriticalFunction. You cannot avoid
-0.290945	anyway. You may preferably avoid
-0.234527	should by all means avoid
-0.349795	optimized away and the result
-0.349795	four bits, and the result
-0.654574	to wait for the result
-0.575175	integer, so that the result
-0.633942	calculation depends on the result
-0.446961	addition depends on the result
-0.484708	should depend on the result
-0.355956	++i). But when the result
-0.354646	not evaluated, because the result
-0.236130	function will return the result
-1.126561	to make sure the result
-0.480710	(1,2,3,4), and store the result
-0.455781	want to see the result
-0.236130	each iteration needs the result
-0.113493	overflow and give the result
-0.113493	underflow and give the result
-0.292193	and then convert the result
-0.005730	c); // Store the result
-0.011536	bc); // Store the result
-0.011536	mask); // Store the result
-0.426734	matrix and stores the result
-0.354295	to wait for a result
-0.346112	quite often as a result
-0.346112	may occur as a result
-0.323529	will be 2. The result
-0.292504	is very fast. The result
-0.236403	to the profiler. The result
-0.236403	a[i] is ecx+eax*4. The result
-0.236403	such as <. The result
-0.950207	LoadVector(cc + i); // result
-0.237216	consecutive elements c.load(cc+i); // result
-0.237614	PTR [ecx+eax*4],ebx stores this result
-0.357729	not get the same result
-0.353706	sum, then the first result
-0.235454	points to ; store result
-0.227509	object for the intermediate result
-0.227509	to store the intermediate result
-0.327612	may get a better result
-0.342769	and last the second result
-0.275736	so that the final result
-0.275736	can check the final result
-0.328023	//=A*x*x+B*x+C //=DeltaY // Store result
-0.218519	all because the 33 result
-0.265125	up with the correct result
-0.357534	other work that the processor
-0.537203	Intel, even if the processor
-0.350163	is selected if the processor
-0.350163	to determine if the processor
-0.962857	is supported by the processor
-1.195757	cycles, depending on the processor
-0.488214	dispatcher checks whether the processor
-0.331620	definition language. Such a processor
-0.293981	is better. Whenever a processor
-0.677707	a negative list of processor
-0.548286	a positive list of processor
-0.335775	If you know that processor
-0.293661	it has. Assuming that processor
-0.698184	"what works best on processor
-0.293971	eight threads simultaneously. This processor
-0.356571	processor models rather than processor
-0.160085	threads in the same processor
-0.416359	priority in the same processor
-0.416359	stay in the same processor
-0.199648	negative list of which processor
-0.199648	positive list of which processor
-0.352705	to use for each processor
-0.313795	to utilize the multiple processor
-0.236973	At this time, any processor
-0.711891	every time a new processor
-0.292038	in terms of specific processor
-0.228584	same machine. The virtual processor
-0.283618	increasingly important. A virtual processor
-0.235623	that "we don't support processor
-0.452334	bad on a particular processor
-0.488576	assume that the next processor
-0.290225	a new and better processor
-0.233238	Intel, AMD or VIA processor
-0.231817	Intel processors. A non-Intel processor
-0.231308	with only one logical processor
-0.067998	solution where a soft processor
-0.067998	processor. Such a soft processor
-0.218438	to tell a hyperthreading processor
-0.199903	have a dedicated physics processor
-0.199903	example, in a word processor
-0.165050	example, a Core i7 processor
-0.165050	micro-op cache. The Core2 processor
-0.165050	prediction. A Pentium M processor
-0.358494	the lowest of the compiled
-0.587815	compiler is that the compiled
-0.434628	complex, but not the compiled
-0.350629	many instances makes the compiled
-0.334143	it is, and is compiled
-0.520058	A code that is compiled
-0.344905	vectors. Code that is compiled
-0.828183	If a function is compiled
-0.393315	when the code is compiled
-0.606642	If the code is compiled
-0.323298	The source code is compiled
-0.808132	of the program is compiled
-0.670204	of a program is compiled
-0.236212	C++. Yet, D is compiled
-0.331808	or C++ file and compiled
-1.049681	can be useful in compiled
-0.836537	can be implemented in compiled
-0.299338	for programs implemented in compiled
-0.553840	versions have to be compiled
-0.355881	this example should be compiled
-0.419900	code can possibly be compiled
-0.324701	the critical code are compiled
-0.237362	code and main() are compiled
-1.203755	a piece of code compiled
-0.235150	or later with code compiled
-0.101850	restrictions on mixing code compiled
-0.101850	problem when mixing code compiled
-0.312674	run faster than when compiled
-0.236181	for each process when compiled
-0.352325	other platforms and other compiled
-0.237308	modules if necessary, each compiled
-0.445283	above. A shared object compiled
-0.236752	source files are first compiled
-0.223966	operating systems and programs compiled
-0.308376	double precision in programs compiled
-0.177493	framework. Obviously, the directly compiled
-0.177493	as well as directly compiled
-0.177493	implementations of C++, directly compiled
-0.229160	obtained with a fully compiled
-0.272184	mode): ; Example 8.26a compiled
-0.218524	Shared objects are normally compiled
-0.251266	8.26b: ; Example 8.26b compiled
-0.165102	that can be cross- compiled
-0.237520	Output array element } An
-0.237408	possible. Use inline functions An
-0.338342	Optimizing software in C++ An
-0.331165	} 70 Induction variables An
-0.236581	from the application code. An
-0.236576	or less each time. An
-0.347863	mispredictions. Boolean vector operations An
-0.379433	functions. 7.27 Overloaded operators An
-0.235353	strides. Uncached memory store An
-0.235178	of the first program. An
-0.342764	the variable is used. An
-0.436244	solution in some cases. An
-0.411180	of efficient container classes. An
-0.521240	makes data caching inefficient. An
-0.325131	when it is executed. An
-0.519133	overflow of the arrays. An
-0.227927	all 0's gives zero. An
-0.452373	when speed is important. An
-0.299032	storage methods mentioned above. An
-0.176324	into a single result. An
-0.176324	produces a negative result. An
-0.622169	folding and constant propagation An
-0.276390	page 93. 7.10 Arrays An
-0.301572	the class is declared. An
-0.218419	40 i = s; An
-0.272026	are given in www.agner.org/optimize/cppexamples.zip. An
-0.264943	alternative implementations. 7.22 Inheritance An
-0.264943	mathematical functions. 7.4 Enums An
-0.212091	the function is inlined. An
-0.265037	space to become fragmented. An
-0.212091	table 8.1 below. Devirtualization An
-0.199825	AMD and VIA CPUs: An
-0.251115	Optimizing software in C++: An
-0.330544	does the same thing. An
-0.164978	known as memory leak. An
-0.164978	from -128 to +127. An
-0.164978	compilers and operating systems"). An
-0.164978	(except for char pointers). An
-0.164978	all sizes of matrices. An
-0.164978	over the C99 standard. An
-0.164978	134 on bounds checking). An
-0.164978	subroutines in assembly language: An
-0.164978	instruction set it supports. An
-0.164978	explained on page 27. An
-0.348629	& N-1)==0,N>::p(x); } // Use
-0.313247	void CriticalFunction(); ... // Use
-0.535069	= _mm_cmpgt_epi16(b, zero); // Use
-0.236004	a * 2.5; // Use
-0.237763	instead of copying it Use
-0.346945	language Use intrinsic functions Use
-0.344189	this: Use assembly language Use
-0.235873	of graphics cards, etc. Use
-0.304767	decomposition of the data. Use
-0.327194	with lots of data. Use
-0.290005	works with all compilers. Use
-0.289100	Gnu or Intel compiler. Use
-0.233303	memory blocks, for example: Use
-0.232885	conditions are satisfied: 1. Use
-0.232563	Intel or PathScale. 2. Use
-0.317225	Mathematical vector function libraries. Use
-0.331002	one function, if possible. Use
-0.306852	in the object file. Use
-0.226741	ways to do this: Use
-0.417351	compiler manual for details. Use
-0.222194	short or common names. Use
-0.222194	object or library files. Use
-0.296045	results as floating point. Use
-0.218391	required for performance reasons. Use
-0.272066	code is not optimal. Use
-0.291540	explanation of this option. Use
-0.218500	better at vectorization. 3. Use
-0.212197	member pointers are implemented. Use
-0.212125	// Cache contentions expected. Use
-0.284134	identify a hot spot. Use
-0.074714	cycle? ...................................................................................... 16 3.2 Use
-0.074714	can be improved. 3.2 Use
-0.074714	bits to zero. 14.3 Use
-0.074714	checking .................................................................................................. 134 14.3 Use
-0.074714	topics ......................................................................................... 132 14.1 Use
-0.074714	Specific optimization topics 14.1 Use
-0.199858	a dedicated test server. Use
-0.251153	// No cache contentions. Use
-0.199858	that it calls. 48 Use
-0.165009	through the symbolic link. Use
-0.165009	or an assembly listing. Use
-0.165009	Example: // Example 7.34a. Use
-0.165009	Gnu compiler allows "__attribute__((visibility("hidden")))". Use
-0.165009	microprocessor that supports this). Use
-0.237902	scans a string of bytes
-0.206715	This pointer is 4 bytes
-0.212791	only _mm_permutevar_ps 4 4 bytes
-0.212791	AVX _mm256_permutevar_ps 4 4 bytes
-0.206715	_mm_prefetch SSE Store 4 bytes
-0.043198	AVX2 _mm256_i32gather_epi32 unlimited 4 bytes
-0.043198	AVX2 _mm_i32gather_ps unlimited 4 bytes
-0.043198	AVX2 _mm_i32gather_epi32 unlimited 4 bytes
-0.043198	AVX2 _mm256_i32gather_ps unlimited 4 bytes
-0.281956	8 double's of 8 bytes
-0.253673	32-bit systems and 8 bytes
-0.202097	a double takes 8 bytes
-0.202097	made the structure 8 bytes
-0.202097	_mm_stream_si32 SSE2 Store 8 bytes
-0.042403	AVX2 _mm256_i64gather_pd unlimited 8 bytes
-0.042403	AVX2 _mm_i64gather_pd unlimited 8 bytes
-0.042403	AVX2 _mm256_i64gather_epi32 unlimited 8 bytes
-0.042403	AVX2 _mm_i64gather_epi32 unlimited 8 bytes
-0.381226	size is typically 64 bytes
-0.261093	be increased to 16 bytes
-0.208680	Objects bigger than 16 bytes
-0.208680	registers to test 16 bytes
-0.059073	_mm_stream_pd SSE2 Store 16 bytes
-0.028533	_mm_stream_ps SSE Store 16 bytes
-0.028533	_mm_stream_pi SSE Store 16 bytes
-0.262825	or class is 128 bytes
-0.344642	within the first 128 bytes
-0.092510	0.29 0.28 strlen 128 bytes
-0.092510	0.59 0.27 strlen 128 bytes
-0.059739	The number of unused bytes
-0.059739	cause holes of unused bytes
-0.129026	13 // 2 unused bytes
-0.059739	first // 4 unused bytes
-0.059739	are also 4 unused bytes
-0.059739	there are 6 unused bytes
-0.059739	first // 6 unused bytes
-0.233025	line covers 64 consecutive bytes
-0.229191	loops (less than 65 bytes
-0.226713	offset bigger than 127 bytes
-0.226713	following table. Type size, bytes
-0.199981	/ 4 = 2048 bytes
-0.199981	entire 64 or 0x40 bytes
-0.165122	and the array 800 bytes
-0.165122	Type size, bytes alignment, bytes
-0.358525	file than in the big
-0.200019	integer size that is big
-0.546912	the member function is big
-0.447115	particular integer size is big
-0.461282	device itself is a big
-0.352207	different parts of a big
-0.352207	the modules of a big
-0.353667	between rows in a big
-0.353667	than investing in a big
-0.536696	to roll out a big
-0.324039	such processors requires a big
-0.538823	allocation and deallocation of big
-0.336181	4. Align arrays and big
-0.325367	preferably be done in big
-0.345088	this method only for big
-0.237836	is therefore recommended that big
-0.550490	that are based on big
-0.407850	makes the compiled code big
-0.443081	making programs that have big
-0.341445	other systems may have big
-0.131149	important if you have big
-0.131149	precision if you have big
-0.335947	to platforms that use big
-0.293776	a floppy disk. A big
-0.352300	big arrays and other big
-0.342292	objects together in one big
-0.308980	pipeline structure has one big
-0.233081	efficient to allocate one big
-0.865291	and there is no big
-0.513300	If there are no big
-0.339823	a matrix is so big
-0.326084	and c[i] are so big
-0.621561	can be a very big
-0.225207	not apply to very big
-0.225207	making the arrays very big
-0.225207	array is made very big
-0.236696	are in doubt how big
-0.315128	or container is too big
-0.294592	and c[i] are too big
-0.231858	floating point comparison. On big
-0.496794	access. Reading or writing big
-0.226660	years to come. Even big
-0.165076	exceeding that of yesterday's big
-0.352608	any extra code and doesn't
-0.488081	is a function that doesn't
-0.290283	to a function that doesn't
-0.290283	while a function that doesn't
-0.904319	smallest integer size that doesn't
-0.235400	more efficient solution that doesn't
-0.235400	a CPU dispatcher that doesn't
-0.235400	primitive programming style that doesn't
-0.465918	less so that it doesn't
-0.465918	exception so that it doesn't
-0.323754	or assume that it doesn't
-0.323754	program saying that it doesn't
-0.058859	member function because it doesn't
-0.307139	optimized code because it doesn't
-0.307139	different platforms because it doesn't
-0.326718	simplest method, but it doesn't
-0.326718	syntax restriction, but it doesn't
-0.433743	In this case it doesn't
-0.427754	here if the compiler doesn't
-0.486002	possible because the compiler doesn't
-0.465705	systems. If the compiler doesn't
-0.330486	5. But the compiler doesn't
-0.330486	simple function, the compiler doesn't
-0.330486	reason why the compiler doesn't
-0.338673	across modules The compiler doesn't
-0.338673	of temp. The compiler doesn't
-0.229137	where the chosen compiler doesn't
-0.237573	doesn't mean atomic. It doesn't
-0.536836	instruction that the CPU doesn't
-0.443638	and the CPUID instruction doesn't
-0.324678	conversion to signed integer doesn't
-0.293537	other constructors. A class doesn't
-0.457165	cases where the size doesn't
-0.713362	needed if the object doesn't
-0.340520	80 Unfortunately, this method doesn't
-0.329867	long as the error doesn't
-0.312383	that signed integer overflow doesn't
-0.235862	// if above line doesn't
-0.533451	_EM_OVERFLOW); // if above doesn't
-0.347948	if-else structure), the microprocessor doesn't
-0.232622	on a store operation doesn't
-0.228095	away. Note that volatile doesn't
-0.222294	the Gnu manual currently doesn't
-0.346627	filled up if the threads
-0.346627	an advantage if the threads
-0.064220	cache contentions if the threads
-0.356234	cache line, because the threads
-0.517705	threads. The use of threads
-1.171868	if the number of threads
-0.237842	storage p. 28) The threads
-0.237841	meaningless event counts for threads
-0.331800	becomes faster and that threads
-0.237756	Run multiple processes or threads
-0.800118	have two or more threads
-0.323817	preferably have no more threads
-0.321695	that variable. The different threads
-0.233882	here means that different threads
-0.335090	multitasking environment, between different threads
-0.336388	are running in other threads
-0.439119	objects if no other threads
-0.237392	the end when all threads
-0.331070	were not divided into threads
-0.092566	For example, if multiple threads
-0.092566	thread safe if multiple threads
-0.092566	is used by multiple threads
-0.092566	paralleli- zation by multiple threads
-0.323911	be divided into multiple threads
-0.489001	is shared between multiple threads
-0.262988	order to avoid multiple threads
-0.210359	term for running multiple threads
-0.210359	CPU cores: Define multiple threads
-0.210359	data processing. Running multiple threads
-0.276209	is considerable. If two threads
-0.222051	capable of making two threads
-0.045791	able to run two threads
-0.045791	try to run two threads
-0.222051	to avoid running two threads
-0.222051	It doesn't prevent two threads
-0.169813	is that communication between threads
-0.169813	of necessary communication between threads
-0.235608	cores can run eight threads
-0.276533	network access in separate threads
-0.222337	time-consuming tasks into separate threads
-0.212234	same processor core. Two threads
-0.199965	system core and high-priority threads
-0.571414	de-allocated. This is the best
-0.982830	is one of the best
-0.411870	with some of the best
-0.411870	Even some of the best
-0.831648	function pointer to the best
-0.355922	Basic .NET and the best
-0.357468	factor 4 in the best
-0.355475	a DLL with the best
-0.471023	it is not the best
-0.343258	all libraries have the best
-0.499289	do not use the best
-0.347768	compiler will do the best
-0.236396	brand or model the best
-0.521523	is to find the best
-0.312931	not needed. Obviously, the best
-0.292496	directives that select the best
-0.292496	more by choosing the best
-0.236396	hot spot. Sometimes, the best
-0.236396	compiler doesn't provide the best
-0.357866	best cases. It is best
-0.480688	which programming language is best
-0.348834	No universal solution is best
-0.340984	optimizing application-specific code. The best
-0.332147	that it calls. The best
-0.731701	set is available. The best
-0.290473	an operating system. The best
-0.290473	in this case. The best
-0.290473	Java virtual machine. The best
-0.290473	the other compilers). The best
-0.234618	example below shows. The best
-0.234618	a suitable duration. The best
-0.234618	has several flaws: The best
-0.331820	a 64-bit version for best
-0.356234	the compilers that are best
-0.337585	is likely to work best
-0.282199	the one that works best
-0.210485	The version that works best
-0.203875	The automatic vectorization works best
-0.090082	rather than "what works best
-0.090082	typically thinks "what works best
-0.037861	depends on what fits best
-0.037861	depending on what fits best
-0.218554	The version that performs best
-0.461995	without problems if the necessary
-0.445442	compiler doesn't have the necessary
-0.456266	check for all the necessary
-0.812849	have to do the necessary
-0.434819	constructor that does the necessary
-0.314152	processors that support the necessary
-0.237420	operating systems lack the necessary
-0.678740	so that it is necessary
-0.475557	think that it is necessary
-0.502630	advantageous then it is necessary
-0.502630	small, then it is necessary
-0.516653	78 Therefore, it is necessary
-0.332505	} Here, it is necessary
-0.332505	is called, it is necessary
-0.332505	tasks. Sometimes it is necessary
-0.332505	is accessed, it is necessary
-0.356216	quite often. This is necessary
-0.456392	integer calculations. It is necessary
-0.353225	for hackers. It is necessary
-0.291911	development. This unit-testing is necessary
-0.539844	that the amount of necessary
-0.438940	of usability problems and necessary
-0.586753	code. This can be necessary
-0.841938	it may not be necessary
-1.302801	then it may be necessary
-0.556332	profile. It may be necessary
-0.237827	If frequent updates are necessary
-0.345141	Dynamic linking makes it necessary
-0.555671	hand, it is not necessary
-0.761254	but this is not necessary
-0.542843	zero. It is not necessary
-0.135912	exception handling is not necessary
-0.135912	Exception handling is not necessary
-0.452686	and they are not necessary
-0.537240	make overflow checks where necessary
-0.237027	up and calling any necessary
-0.415039	C++, it is often necessary
-0.509839	core. It is often necessary
-0.315959	expressions. It is therefore necessary
-0.315959	constant. It is therefore necessary
-0.315959	can. It is therefore necessary
-0.327309	Fortunately, it is rarely necessary
-0.212257	doesn't give the 124 necessary
-0.865890	calculate the address of element
-0.343481	that was used by element
-0.048236	diagonal is swapped with element
-0.048236	matrix[r][c] is swapped with element
-0.382504	order to access an element
-0.129024	branches at the vector element
-0.129024	lookup at the vector element
-0.351470	to hold only one element
-0.048861	Add 2 to each element
-0.334453	and bc for each element
-0.045547	zero); // AND each element
-0.045547	110 // AND each element
-0.220589	quadratic matrix, i.e. each element
-0.022177	c); // Compare each element
-0.482667	address of the array element
-0.392773	system if the array element
-0.310230	the address of array element
-0.708614	address of an array element
-0.358569	bytes) of each array element
-0.220390	endl; // Output array element
-0.236927	Size of each table element
-0.442919	address of the first element
-0.119048	known before the first element
-0.455487	insertion of a new element
-0.236221	and make this extra element
-0.235517	it needs only calculate element
-0.311801	two expressions for every element
-0.335730	to add the last element
-0.219423	at the diagonal. Each element
-0.219423	a linked list. Each element
-0.423004	50 clock cycles per element
-0.048387	Matrix size Time per element
-0.048387	Total kilobytes Time per element
-0.048387	Example 9.6a Time per element
-0.019508	finding the numerically largest element
-0.019508	finds the numerically largest element
-0.039935	// Find numerically largest element
-0.265086	"what is the nearest element
-0.199948	add an extra dummy element
-0.165091	cache. When we reach element
-0.494298	C++ is also a language
-0.293886	Java today. But this language
-0.293818	efficiency is important. A language
-0.481555	drawbacks of the C++ language
-0.225065	problem with the C++ language
-0.225065	languages. But the C++ language
-0.224702	is needed. The C++ language
-0.224702	compilers work. The C++ language
-0.777870	efficiency of different C++ language
-0.231736	the way the programming language
-0.182562	by choosing a programming language
-0.042251	the choice of programming language
-0.042251	2.4 Choice of programming language
-0.182562	to decide which programming language
-0.182562	of the C++ programming language
-0.231736	that the software programming language
-0.182562	that a particular programming language
-0.182562	definitely the preferred programming language
-0.172517	the use of assembly language
-0.172517	easy linking to assembly language
-0.269725	and d in assembly language
-0.237753	compiler option for assembly language
-0.294040	compiled C++ or assembly language
-0.237753	to generate an assembly language
-0.133964	than by using assembly language
-0.133964	highly optimized, using assembly language
-0.172517	do this: Use assembly language
-0.172517	of instruction timing, assembly language
-0.172517	straightforward. The MASM assembly language
-0.380212	code for the common language
-0.233286	includes the low-level C language
-0.231339	to the device. Any language
-0.283092	these reasons, the preferred language
-0.226731	class library). The D language
-0.090778	and a hardware definition language
-0.190637	in a hardware definition language
-0.090778	where a hardware definition language
-0.091180	them. The hardware definition language
-0.224885	web application integration, mixed language
-0.122638	software in a high-level language
-0.122638	is an advanced high-level language
-0.236588	relate to CPU-intensive code. But
-0.292708	which consumes CPU time. But
-0.321447	of the member function. But
-0.371125	to the called function. But
-0.228229	strcat, strlen, sprintf, etc. But
-0.228229	C#, Visual Basic, etc. But
-0.436760	or a smart pointer. But
-0.232545	* c > b) But
-0.232124	for the same resources. But
-0.342220	every three clock cycles. But
-0.411195	than for double precision. But
-0.229928	on a modern CPU. But
-0.229903	x when it returns. But
-0.229098	signed and unsigned integers. But
-0.229015	variable as function parameter. But
-0.228000	than the destination array. But
-0.592328	the same cache line. But
-0.227970	special mathe- matical applications. But
-0.440137	treated as an integer. But
-0.409170	of the data members. But
-0.222171	addresses to function names. But
-0.222171	not advantageous by itself. But
-0.222171	to the constant 5. But
-0.218368	found in other languages. But
-0.272039	will soon be obsolete. But
-0.264956	certain to be inlined. But
-0.212102	some positive value, n. But
-0.465804	time it was programmed. But
-0.330559	comes on the market. But
-0.199836	do have such checks. But
-0.199836	a software optimization issue. But
-0.199836	can see the delay. But
-0.199836	then 0+1.23456 = 1.23456. But
-0.199836	used for Java today. But
-0.164988	compiler, or vice versa. But
-0.164988	are containers 93 themselves. But
-0.164988	for (i=0; i<n; ++i). But
-0.164988	to C0::f or C1::f. But
-0.164988	because of its simplicity. But
-0.164988	in a single session. But
-0.164988	into a and b. But
-0.164988	cause a cache miss. But
-0.164988	and other resource conflicts. But
-0.460916	256 times and the speed
-0.553064	increasing faster than the speed
-0.355860	is long because the speed
-0.293460	you can double the speed
-0.344529	are used, while the speed
-0.492983	This can improve the speed
-0.331051	modifications actually increase the speed
-0.237243	Testing speed Testing the speed
-0.381908	test that measures the speed
-0.594707	can be used to speed
-0.354216	Tips about how to speed
-0.188838	is no difference in speed
-0.157071	simply no difference in speed
-0.237282	lot to gain in speed
-0.237282	much you gain in speed
-0.339908	most used data. The speed
-0.312933	performance and precision. The speed
-0.236398	versions work correctly. The speed
-0.236398	current CPUs optimally. The speed
-0.236398	family number 6! The speed
-0.237101	optimizing the software for speed
-0.313772	is critical. Optimizing for speed
-0.237101	compiler Linux Optimize for speed
-0.237777	the static version if speed
-0.236185	definitely be avoided when speed
-0.236185	version is preferred when speed
-0.233699	of the code where speed
-0.233699	threads are areas where speed
-0.324278	there was hardly any speed
-0.314247	not improve the execution speed
-0.021247	in terms of execution speed
-0.208985	are optimized for execution speed
-0.208985	good for optimizing execution speed
-0.290470	switch statements The high speed
-0.327008	you want to improve speed
-0.233751	This can actually reduce speed
-0.231854	may run with reduced speed
-0.231829	cores, and a processing speed
-0.224901	at less than half speed
-0.176454	is handled at half speed
-0.279420	to 15.1c). 16 Testing speed
-0.518489	each compiled for the specific
-0.352937	big enough for the specific
-0.344890	and searching, or the specific
-0.355954	other input/output than the specific
-0.336297	note: This example is specific
-0.997324	unless there is a specific
-0.351123	an integer of a specific
-0.351123	integer types of a specific
-0.331215	the code to a specific
-0.134769	the thread to a specific
-0.134769	a thread to a specific
-0.331215	the linker to a specific
-0.524336	for elements in a specific
-0.303657	a container for a specific
-0.303657	big enough for a specific
-0.303657	130 Compile for a specific
-0.303657	be wired for a specific
-0.351399	the compiler that a specific
-0.235412	computing resources. Typically, a specific
-0.235412	each bit indicates a specific
-0.538960	Thinking in terms of specific
-0.237707	criteria or lists of specific
-0.331332	a separate version for specific
-0.293718	that are fine-tuned for specific
-0.563807	used if there are specific
-0.237792	SSE4.1 // AVX2 // specific
-0.294043	specific CPU brands or specific
-0.237527	specific optimization instructions at specific
-0.307328	if elements have no specific
-0.398971	performance. I have no specific
-0.232670	-O3 or (requires no specific
-0.285094	am not making any specific
-0.229885	going to recommend any specific
-0.229885	have to obey any specific
-0.232647	compatibility with legacy code, specific
-0.337131	are available to fit specific
-0.231315	difficult to maintain. Any specific
-0.364727	are used for giving specific
-0.048380	sampling requires a CPU- specific
-0.048380	library functions. The CPU- specific
-0.048380	complicated to make CPU- specific
-0.165107	interface and other system- specific
-0.165107	processor activates critical application- specific
-0.325335	1 is changed to c
-0.038130	inexact if b and c
-0.038130	example, a, b and c
-0.038130	and add b and c
-0.009223	// Multiply b and c
-0.629691	above the diagonal. The c
-0.350671	may write: y = c
-0.237762	must be reversed if c
-0.490674	(b != 0) { c
-0.237553	first, then d+e, then c
-0.245096	each element in vector c
-0.237428	do not overlap. If c
-0.739379	a = b + c
-1.001439	a + b + c
-0.237022	14.15b if (a * c
-0.014602	for (c = 0; c
-0.312108	1; x[1] = b; c
-0.022576	a ? b : c
-0.022315	into vector c: __m128i c
-0.234589	unsigned for fast division c
-0.349361	even if a, b, c
-0.021931	= select(b > 0, c
-0.552450	a, b, c, d; c
-0.215827	= 0, c, d; c
-0.020780	b > 0 ? c
-0.524654	= temp * temp; c
-0.224802	into vector c: Is16vec8 c
-0.301678	-100, b = 100, c
-0.212177	a = select_gt(b, zero, c
-0.199909	= b * 3.5; c
-0.165055	-1.0E8, b = 1.0E8, c
-0.165055	34 else { CFALSE: c
-0.165055	= (a+1) * (a+1); c
-0.357547	multithreaded applications it is much
-0.339934	shift operation which is much
-0.439606	shift operation, which is much
-0.330874	a soft processor is much
-0.293048	calculate self-relative addresses is much
-0.406617	data. The effect is much
-0.236881	compiled without -fpic is much
-0.346411	multiple inheritance where a much
-0.237577	many computer users and much
-0.237577	are accessed backwards and much
-0.237804	that you measure are much
-0.292978	try to do as much
-0.236820	want to get as much
-0.314375	such devices typically have much
-0.237524	for millisecond resolution. A much
-0.335930	powerful facilities that do much
-0.277524	deallocation of memory takes much
-0.277524	hard disk often takes much
-0.164179	Floating point division takes much
-0.228078	division Integer division takes much
-0.223211	unfortunate because truncation takes much
-0.344140	this effect is so much
-0.236907	this bookkeeping depends very much
-0.236740	frameworks that typically take much
-0.345148	the program are often much
-0.042226	is called and how much
-0.222133	You can calculate how much
-0.276302	program to measure how much
-0.501925	therefore it is accessed much
-0.291774	bitwise operators are calculated much
-0.235692	This framework typically uses much
-0.207922	a program has too much
-0.260238	because it takes too much
-0.207922	precision without worrying too much
-0.341517	manner. It is usually much
-0.504067	operations can be made much
-0.132873	CPU dispatching: 1. How much
-0.061374	................................................................................ 16 3.1 How much
-0.061374	time consumers 3.1 How much
-0.355983	cases, you can obtain much
-0.330695	not have to worry much
-0.570888	however, there is a single
-0.436325	internal functions in a single
-0.436325	actually called in a single
-0.475068	multiple bits in a single
-0.436325	all done in a single
-0.436325	multiple conditions in a single
-0.337324	is included in a single
-0.350782	exception handling for a single
-0.337992	be calculated by a single
-0.882458	be replaced by a single
-0.351660	or moved with a single
-0.816428	is stored as a single
-0.849624	efficient to make a single
-0.275345	include not only a single
-0.275345	that contains only a single
-0.050861	.cpp files into a single
-0.253410	objects together into a single
-0.333301	it fits into a single
-0.253410	some formula into a single
-0.253410	be joined into a single
-0.309652	you have even a single
-0.309652	even when just a single
-0.233645	Boolean operators produce a single
-0.233645	rather than isolating a single
-0.325358	the result back to single
-0.294143	functions is higher for single
-0.408005	The bitwise operators are single
-0.444193	operations are done with single
-0.858935	just as fast as single
-0.809451	no more time than single
-0.520807	application. You may use single
-0.236250	above examples all use single
-0.237530	to convert b from single
-0.546131	whether you are using single
-0.711945	difference in speed between single
-0.339822	by making the constant single
-0.291695	double precision or four single
-0.235611	double precision or eight single
-0.287312	using this for testing single
-0.231330	used. Do not mix single
-0.356019	a penalty for mixing single
-0.293420	= absvalue; largest_index = i;
-0.237208	= order(i); matrix[j][0] = i;
-0.255547	* p) { int i;
-0.051195	& r) { int i;
-0.130199	sum = 0; int i;
-0.130199	sum2 = 0; int i;
-0.303441	float f; unsigned int i;
-0.303441	Example 7.7 unsigned int i;
-0.333429	size = 100; int i;
-0.000471	{ float f; int i;
-0.201958	NumberOfTests = 10; int i;
-0.069253	7.26b float a[100]; int i;
-0.069253	7.26a float a[100]; int i;
-0.201958	size = 16; int i;
-0.201958	factorial = 1.0; int i;
-0.333429	size = 1000; int i;
-0.074391	14.13b int list[300]; int i;
-0.074391	14.13a int list[300]; int i;
-0.074391	14.12a int list[300]; int i;
-0.436555	32; float matrix[rows][columns]; int i;
-0.201958	b;}; S1 list[100]; int i;
-0.201958	// Example 7.21 int i;
-0.201958	// Example 7.23 int i;
-0.201958	// Example 7.20 int i;
-0.201958	// Example 7.19 int i;
-0.201958	abc * p; int i;
-0.201958	max = 110; int i;
-0.201958	b;}; Sab ab[size]; int i;
-0.201958	14.6 float list[16]; int i;
-0.201958	// Example 7.30b int i;
-0.201958	// Example 7.30a int i;
-0.201958	14.13c int list[301]; int i;
-0.237258	p = p + i;
-0.228170	n; i++) f *= i;
-0.251354	1024; int a[size], b[size], i;
-0.525668	r + 2; } These
-0.292715	of the position-independent code. These
-0.662568	takes no extra time. These
-0.235854	misses, branch mispredictions, etc. These
-0.311605	copying blocks of memory. These
-0.290768	// C++ casting operator These
-0.509908	manipulate the data cache. These
-0.348931	to the instruction set. These
-0.289125	particular set of CPUs. These
-0.233181	for holding the pointer. These
-0.232195	interfaces and system calls. These
-0.231218	than static link libraries. These
-0.231239	as a register stack. These
-0.323560	complicated implementation is needed. These
-0.317200	back to single precision. These
-0.230631	of the largest vector. These
-0.199882	any brand of CPU. These
-0.199882	any known hardware CPU. These
-0.322021	to solve this problem. These
-0.227970	functions, etc. in vectors. These
-0.222171	the software development process. These
-0.222171	and often excessively so. These
-0.218368	methods to improve efficiency. These
-0.272107	one function to another. These
-0.272039	class templates in www.agner.org/optimize/cppexamples.zip. These
-0.212102	profiler is called CodeAnalyst. These
-0.212102	each type of microprocessor. These
-0.264956	which implementation is best. These
-0.284107	for vectorized table lookup. These
-0.347217	functions malloc and free. These
-0.199836	0x3700, 0x3F00 and 0x4700. These
-0.199836	or bypassing syntax checks. These
-0.199836	far from the server. These
-0.251127	e.g. .R. for AVX. These
-0.164988	is 8*1024/64 = 128. These
-0.164988	Effective C++". Addison-Wesley, 1996. These
-0.164988	in the so-called commpage. These
-0.164988	in C++ and Fortran. These
-0.164988	that begin with _mm. These
-0.164988	global offset table (GOT). These
-0.164988	DOS and Windows 3.x. These
-0.164988	and "Integrated Performance Primitives". These
-0.498561	same version of the virtual
-0.498561	which version of the virtual
-0.892164	right version of the virtual
-1.078683	different versions of the virtual
-0.356720	be bypassed when the virtual
-0.456383	directly without using the virtual
-0.293551	save by avoiding the virtual
-0.407271	compiler can bypass the virtual
-0.830322	which version of a virtual
-1.074635	a pointer to a virtual
-0.357399	look up in a virtual
-0.353805	table lookup for a virtual
-0.897717	as efficient as a virtual
-0.730882	takes to call a virtual
-0.352412	with templates instead of virtual
-0.237702	prediction and misprediction of virtual
-0.237699	on runtime dispatch to virtual
-0.237699	function // Call to virtual
-0.343334	with member pointers and virtual
-0.237583	statement jump tables, and virtual
-0.294133	the same machine. The virtual
-0.428629	that is obtained with virtual
-0.293108	7.43a. Runtime polymorphism with virtual
-0.292123	becoming increasingly important. A virtual
-0.236069	is not necessary. A virtual
-0.343215	reserve resources for other virtual
-0.237439	function is called. If virtual
-0.382092	has at least one virtual
-0.631369	the object has no virtual
-0.500353	If you can avoid virtual
-0.292045	known hardware CPU. These virtual
-0.084934	public CHello { public: virtual
-0.107669	class C0 { public: virtual
-0.107669	public C0 { public: virtual
-0.232651	can avoid the inefficient virtual
-0.231817	than non-virtual functions. Avoid virtual
-0.287267	virtual functions. This so-called virtual
-0.184682	framework and the Java virtual
-0.184682	emulating the so-called Java virtual
-0.466011	{ public: void NotPolymorphic(); virtual
-0.165091	declared. Avoid multiple inheritance, virtual
-0.408095	involve the loading of several
-0.237828	D, Pascal, Fortran and several
-0.237880	fetched and decoded in several
-0.292428	is for C++ for several
-0.236336	want to optimize for several
-0.292428	classes and templates for several
-0.236336	response is delayed for several
-0.236336	should be prepared for several
-0.294130	is a risk that several
-0.124461	of compiler There are several
-0.124461	by compiler There are several
-0.299599	(*.dll, *.so). There are several
-0.299599	in Windows). There are several
-0.299599	AMD CodeAnalyst. There are several
-0.299599	Memory copying. There are several
-0.314619	it is accessed by several
-0.237710	need to test on several
-0.342394	Intel library functions have several
-0.572188	64 bit systems have several
-0.237569	static library, except when several
-0.523060	where a program has several
-0.232332	in Intel compilers has several
-0.375071	The static keyword has several
-0.232332	The C++ syntax has several
-0.314164	in program memory. If several
-0.237395	it is cached, but several
-0.236894	microprocessors is split between several
-0.306420	database It can take several
-0.306420	programs installed can take several
-0.227398	it will often take several
-0.348346	may have to test several
-0.311923	a loop that contains several
-0.289897	strict formalism that requires several
-0.343813	it necessary to load several
-0.232920	the loop control statement several
-0.307919	(a+b). This can save several
-0.229172	one. I have provided several
-0.433006	reinstall a software package several
-0.212183	caching, but it took several
-0.165060	the different versions alternatingly several
-0.165060	that the microprocessor wastes several
-0.165060	the processing power. Connecting several
-0.537453	available to the function through
-0.356002	pointers Calling a function through
-0.623023	// Call critical function through
-0.407522	or pointers to data through
-0.182039	r++) { // loop through
-0.273599	c++) { // loop through
-0.431681	about its child class through
-0.318795	the third generation class through
-0.663492	of a derived class through
-0.237237	code that accesses b through
-0.350079	program loads the library through
-0.634579	a variable or object through
-0.447295	to accessing an object through
-0.373279	access the data object through
-0.351133	When accessing a variable through
-0.551071	The function is called through
-0.236821	get its own address through
-0.422889	because it is accessed through
-0.316920	a class are accessed through
-0.066211	the arrays are accessed through
-0.066211	when arrays are accessed through
-0.066211	If arrays are accessed through
-0.239738	or structures are accessed through
-0.191591	are in fact accessed through
-0.191591	malloc) is necessarily accessed through
-0.217512	data. The code goes through
-0.217512	in a DLL goes through
-0.248870	because it may go through
-0.197829	and public variables go through
-0.197829	is, but must go through
-0.233225	be accessed from main through
-0.670952	i++) { // Loop through
-0.230696	programs automatically download updates through
-0.283058	(2) find the GOT through
-0.228053	needs an extra jump through
-0.224845	Intel compiler versions 7 through
-0.222265	pixel or line separately through
-0.251228	indicated by the caller through
-0.165071	each allocated block. Walking through
-0.165071	this value will propagate through
-0.165071	more time than looping through
-0.165071	constant can be propagated through
-0.518700	intermediate code for the common
-0.353081	// Prototype for the common
-0.346100	maps etc. It is common
-0.346100	of unit-testing It is common
-0.346100	optimized away. It is common
-0.346100	Feature bloat. It is common
-0.477627	up. This is a common
-0.477627	eax,eax. This is a common
-0.340898	input data is a common
-0.340898	do so is a common
-0.340898	of abstraction is a common
-0.512251	trick of using a common
-0.492035	so is also a common
-0.439978	division by making a common
-0.352673	necessary. Fast versions of common
-0.237839	with that branch. The common
-0.702232	contains many functions for common
-0.237752	macros with short or common
-0.459996	do optimizations such as common
-0.560959	used. It is more common
-0.237539	|= 0x20; 46 A common
-0.352325	memory leaks and other common
-0.372030	some of the most common
-0.372030	instead of the most common
-0.683078	used in the most common
-0.293574	to calculate the most common
-0.293574	is probably the most common
-0.248894	Mathematical functions The most common
-0.248894	arithmetic operations. The most common
-0.248894	single element. The most common
-0.248894	dispatch methods. The most common
-0.479183	well-tested libraries for many common
-0.289391	and discovered that many common
-0.223314	of technical problems. Some common
-0.223314	one works best. Some common
-0.223314	things very stupid. Some common
-0.235331	(see page 93). All common
-0.322593	be pure. This allows common
-0.234121	they fail to eliminate common
-0.342584	A compiler can eliminate common
-0.218524	pointers, references, 'this' pointer, common
-0.165102	by using function inlining, common
-0.355140	a & a = a,
-0.291334	a && true = a,
-0.048151	a & -1 = a,
-0.312745	such as -(-a) = a,
-0.346397	(b*c) overflows, even if a,
-0.058987	int main() { int a,
-0.229480	and unsigned integers int a,
-0.229480	// Example 14.10 int a,
-0.229480	// Example 14.11 int a,
-0.229480	x * m;} int a,
-0.229480	// Example 8.6a int a,
-0.229480	// Example 8.6b int a,
-0.229480	d = 1.6; int a,
-0.237506	y + a.y);} vector a,
-0.229498	// Example 14.14b double a,
-0.229498	// Example 14.14a double a,
-0.229498	// Example 14.18c double a,
-0.229498	// Example 8.2a double a,
-0.337688	struct S1 { float a,
-0.219289	+ 1.0f;} 66 float a,
-0.219289	// Example 11.1a float a,
-0.219289	// Example 11.1b float a,
-0.219289	// Example 8.16 float a,
-0.219289	// Example 14.18a float a,
-0.219289	// Example 14.18b float a,
-0.763039	} In this example, a,
-0.344246	T max(T const & a,
-0.017229	8.7 int SomeFunction (int a,
-0.017229	8.9b int SomeFunction (int a,
-0.017229	8.9a int SomeFunction (int a,
-0.017229	8.11b int SomeFunction (int a,
-0.017229	8.11a int SomeFunction (int a,
-0.345064	479001600}; ... int i, a,
-0.189775	the following way: bool a,
-0.189775	// Example 7.10a bool a,
-0.189775	// Example 7.9a bool a,
-0.212252	int cc[]) { Vec16s a,
-0.199981	Define vector objects Vec8s a,
-0.165122	}; vector() {} vector(float a,
-0.165122	a to b memcpy(b, a,
-0.462147	always belong to the thread
-0.460512	objects declared in the thread
-0.356470	it locally in the thread
-0.237587	problem by increasing the thread
-0.382388	have to fix the thread
-0.565266	pointer stored in a thread
-0.454595	becomes invalid if a thread
-0.454851	run slower than a thread
-0.346298	be used when a thread
-0.293196	core by setting a thread
-0.293196	need to lock a thread
-0.293332	cores. A process or thread
-0.237131	to each task or thread
-0.237688	classes are generally not thread
-0.237547	is not doubled. A thread
-0.584539	not in the same thread
-0.350422	time as the other thread
-0.328326	higher priority to one thread
-0.288750	threads so that one thread
-0.233102	two threads where one thread
-0.344818	separate containers for each thread
-0.285308	dependency chains then each thread
-0.230073	possible to give each thread
-0.230073	On the contrary, each thread
-1.098716	of the operating system thread
-0.299905	every second by another thread
-0.291711	the core with another thread
-0.210199	do calculations while another thread
-0.210199	the user interface, another thread
-0.301733	access in a separate thread
-0.301733	scheduled in a separate thread
-0.159872	calculations into a separate thread
-0.159872	task into a separate thread
-0.199419	than making a separate thread
-0.191441	on the stack. Each thread
-0.191441	two jobs simultaneously. Each thread
-0.191441	into multiple threads. Each thread
-0.191441	multiple processor cores. Each thread
-0.222312	database, and a third thread
-0.199970	or in a high-priority thread
-0.165112	happen that a low-priority thread
-0.165112	resources from a higher-priority thread
-0.405505	resource files, help files etc.
-0.435752	of (2n / b) etc.
-0.021507	exponential functions, trigonometric functions, etc.
-0.231243	will evict number 2, etc.
-0.285094	#else // Gnu compiler, etc.
-0.226564	instruction sets, cache size, etc.
-0.222177	bounds violations, invalid pointers, etc.
-0.222177	mixed language 11 programming, etc.
-0.218373	flip-flops, multiplexers, arithmetic units, etc.
-0.212108	instruction set (/arch:SSE2, /arch:AVX etc.
-0.212108	of addition, subtraction, multiplication, etc.
-0.264963	simple variables, loop counters, etc.
-0.212108	files, data base access, etc.
-0.212108	the operands are comparisons, etc.
-0.212108	times the other way, etc.
-0.251134	the constants Sunday, Monday, etc.
-0.199841	to network resources, databases, etc.
-0.199841	overhead of semaphores, mutexes, etc.
-0.465815	platforms, different screen resolutions, etc.
-0.199841	vectors SSE3 horizontal add, etc.
-0.199841	sets rather than loops, etc.
-0.465815	subexpression elimination, constant propagation, etc.
-0.199841	counter with its limit, etc.
-0.465815	cache misses, branch mispredictions, etc.
-0.164993	// Or #include <ia32intrin.h> etc.
-0.164993	binary trees, hash maps etc.
-0.164993	strcpy, strcat, strlen, sprintf, etc.
-0.164993	pow, log, exp, sin, etc.
-0.164993	types of graphics cards, etc.
-0.164993	for Windows, -msse2, -mavx, etc.
-0.164993	execution units, memory ports, etc.
-0.164993	buffer, branch pattern history, etc.
-0.164993	windows, mutexes, database connections, etc.
-0.164993	-mssse3 -msse4.1 -mAVX -axSSE3, etc.
-0.164993	/arch:SSE4.1 -mAVX /arch:AVX /QaxSSE3, etc.
-0.164993	Java, C#, Visual Basic, etc.
-0.164993	to windows, graphic brushes, etc.
-0.164993	Menus, buttons, dialog boxes, etc.
-0.164993	calls exit(), abort(), _endthread(), etc.
-0.164993	mispredictions, floating point exceptions, etc.
-0.237583	while Pentium 4 and AMD
-0.237583	AQtime, Intel VTune and AMD
-0.237834	also page 119). The AMD
-0.635721	and assembly code for AMD
-0.237091	use Intel VTune, for AMD
-0.237091	"Software Optimization Guide for AMD
-0.414719	CPUs, but not on AMD
-0.233543	not at all on AMD
-0.233543	the integer size on AMD
-0.328655	degradation of performance on AMD
-0.311285	currently only supported on AMD
-0.416396	always work well on AMD
-0.356057	of CPUs such as AMD
-0.293857	for AMD CPUs use AMD
-0.418617	0.27 strlen 128 bytes AMD
-0.235993	and x86-64 platforms. AMD AMD
-0.235161	optimized yet. Supports both AMD
-0.320736	and later Intel processors. AMD
-0.000943	The microarchitecture of Intel, AMD
-0.000118	"The microarchitecture of Intel, AMD
-0.083931	micro-operation breakdowns for Intel, AMD
-0.122654	platform with an Intel, AMD
-0.083931	of microprocessors from Intel, AMD
-0.083931	tool supports both Intel, AMD
-0.530108	memcpy 16kB aligned operands AMD
-0.528992	x86 and x86-64 platforms. AMD
-0.218484	ammintrin.h (MS) xopintrin.h (Gnu) AMD
-0.466011	dispatcher. See page 131. AMD
-0.466011	memcpy 16kB unaligned op. AMD
-0.199948	immintrin.h AMD SSE4A ammintrin.h AMD
-0.165091	+ ia32intrin.h _mm_exp_ps _mm_exp_pd AMD
-0.165091	Math Library __vrs4_expf __vrd2_exp AMD
-0.165091	= char 16 XOP, AMD
-0.165091	PCLMUL wmmintrin.h AVX immintrin.h AMD
-0.532926	best solution is to compile
-0.632613	Another possibility is to compile
-1.572125	It is possible to compile
-0.993863	makes it possible to compile
-1.017912	that you want to compile
-0.495929	of the code and compile
-0.293903	load the framework and compile
-0.350643	of course, that you compile
-0.339840	position-independent code when you compile
-0.235368	following way. First you compile
-0.242530	and calculate it at compile
-0.192188	to the compiler at compile
-0.192188	much as possible at compile
-0.192188	by its value at compile
-0.192188	calculations are available at compile
-0.192188	doing some calculations at compile
-0.192188	/ b) etc. at compile
-0.270073	calculations are done at compile
-0.085548	coefficients is calculated at compile
-0.085548	cannot be calculated at compile
-0.025857	objects is known at compile
-0.012736	elements is known at compile
-0.025857	n is known at compile
-0.025857	divisor is known at compile
-0.036046	cannot be known at compile
-0.001155	is not known at compile
-0.005804	are not known at compile
-0.036046	an integer known at compile
-0.036046	the size known at compile
-0.036046	a constant known at compile
-0.159706	#if is resolved at compile
-0.049137	is always resolved at compile
-0.049137	are always resolved at compile
-0.192188	Templates are instantiated at compile
-0.192188	will calculate (1./1.2345) at compile
-0.324339	local references. If we compile
-1.325018	the responsibility of the exception
-0.462044	redirects it to the exception
-0.455833	make support for the exception
-0.352784	some information for the exception
-0.355447	innermost function, then the exception
-0.620881	by turning off the exception
-0.354143	called. The program is exception
-0.531417	Avoiding the cost of exception
-0.314708	the possible alternatives to exception
-0.314646	used for debugging and exception
-0.237462	case of overflow. The exception
-0.237462	handling support anyway. The exception
-0.640220	turn off support for exception
-0.325279	You may think that exception
-0.629484	that is used by exception
-0.331637	your program relies on exception
-0.372403	in case of an exception
-0.232816	etc., and if an exception
-0.232816	But what if an exception
-0.231535	code will catch an exception
-0.231535	can possibly throw an exception
-0.231535	hardware for raising an exception
-0.461040	is optimal to use exception
-0.324846	must make your program exception
-0.237433	in some compilers. If exception
-0.865285	when there is no exception
-0.312425	handler, even if no exception
-0.352173	system instead of using exception
-0.331090	connections, etc. The C++ exception
-0.237067	cannot change its possible exception
-0.236987	will never throw any exception
-0.232575	program /Qipo -ipo No exception
-0.232203	registers, and possibly save exception
-0.864140	is the reason why exception
-0.148671	be compatible with structured exception
-0.148671	code relies on structured exception
-0.032673	efficient. You can disable exception
-0.032673	compiler. You can disable exception
-0.199925	b[arraysize], c[arraysize]; // Enable exception
-0.048369	is to wrap the allocated
-0.048369	recommended to wrap the allocated
-0.237759	auto_ptr that owns the allocated
-0.345986	or object that is allocated
-0.345986	sure everything that is allocated
-0.350268	reasons: Each object is allocated
-0.382125	bigger memory block is allocated
-0.237899	of all cleanup of allocated
-0.293732	common programming error. The allocated
-0.237482	for garbage collection. The allocated
-0.450271	variable size can be allocated
-0.490338	large array can be allocated
-0.511848	of objects can be allocated
-0.348391	the stack can be allocated
-0.348391	and arrays can be allocated
-0.353234	No memory will be allocated
-0.355121	collection. Objects that are allocated
-0.827858	safe if there are allocated
-0.023413	of different sizes are allocated
-0.350897	to anything it has allocated
-0.237424	be saved. Any other allocated
-0.333025	keep pointers to all allocated
-0.344030	is important that all allocated
-0.352746	heap manager for each allocated
-0.353397	non-recoverable errors; make sure allocated
-0.236119	data caching inefficient. An allocated
-0.451646	everything that has been allocated
-0.340119	object in its own allocated
-0.113872	objects stored in dynamically allocated
-0.113872	resource, such as dynamically allocated
-0.113872	pointers to different dynamically allocated
-0.113872	uses many small dynamically allocated
-0.113872	how to align dynamically allocated
-0.025793	exp 12.8 Aligning dynamically allocated
-0.025793	119 12.8 Aligning dynamically allocated
-0.113872	discussion of aligning dynamically allocated
-0.191002	dynamically allocated memory Memory allocated
-0.191002	cleaned up include: Memory allocated
-0.426937	if the time slices allocated
-0.587551	function if it is small
-1.028413	if the function is small
-0.356938	point addition. This is small
-0.658966	number of elements is small
-0.872317	the loop count is small
-0.799482	the repeat count is small
-0.356939	A register is a small
-0.502513	only used in a small
-0.652122	A loop with a small
-0.532390	blocks rather than a small
-1.200513	is to make a small
-0.340436	that use only a small
-0.442315	reading or writing a small
-0.472978	than to allocate a small
-0.237891	to the design of small
-0.336268	is also relevant to small
-0.237846	speed, memory economy and small
-0.353179	systems Microcontrollers used in small
-0.324370	16-bit programs, except for small
-0.023461	// Approximate exp(x) for small
-0.237745	single assembly instructions or small
-0.294006	are particularly important on small
-0.293979	executable to be as small
-0.605828	can be divided into small
-0.209434	applications even on such small
-0.209434	quite fast on such small
-0.233659	is divided into many small
-0.233659	a program uses many small
-0.324235	time, except for some small
-0.339830	the object is so small
-0.326092	lists that are so small
-0.345551	loop body is very small
-0.232980	performs better on very small
-0.311210	Library functions are typically small
-0.294610	and C are too small
-0.220983	zero. Execution time too small
-0.606131	than reading or writing small
-0.222289	rarely justifies the relatively small
-0.500918	should preferably be kept small
-0.357658	rare. Testing for the overflow
-0.354460	or (5) make the overflow
-0.356416	correct result because the overflow
-0.640015	exception in case of overflow
-0.450900	integers in case of overflow
-0.293549	susceptible to problems of overflow
-0.338694	Faster, but risk of overflow
-0.316101	way to check for overflow
-0.205114	does not check for overflow
-0.103085	have no check for overflow
-0.205114	We might check for overflow
-0.205114	problem: (1) check for overflow
-1.404735	to make sure that overflow
-0.382171	are so big that overflow
-0.294004	not wrap around on overflow
-0.329618	A Number) if an overflow
-0.463722	c+b will generate an overflow
-0.235616	to optimize away an overflow
-0.064774	zero and to make overflow
-0.356187	// Catch floating point overflow
-0.349973	{ // Floating point overflow
-0.308654	to +127. An integer overflow
-0.319138	assumption that signed integer overflow
-0.232807	around, (3) trap integer overflow
-0.457756	makes sure that no overflow
-0.330831	the arrays. An array overflow
-0.330842	(be aware of possible overflow
-0.292315	to worry much about overflow
-0.292160	a negative result. An overflow
-0.282794	size that doesn't cause overflow
-0.227859	the reduction would cause overflow
-0.235080	memory pool. 15 Integer overflow
-0.289920	12. Higher inputs give overflow
-0.233239	unsigned variables. A positive overflow
-0.230006	missing check for buffer overflow
-0.218479	Or, if protection against overflow
-0.212211	the compiler to ignore overflow
-0.165086	single result. An uncaught overflow
-0.346451	a; double b; a +=
-0.237712	// Example 8.5b a +=
-0.021630	i < 100; i +=
-0.213732	i < size; i +=
-0.001353	i < 256; i +=
-0.213732	i < 20; i +=
-0.321952	a[i] = temp; temp +=
-0.169962	16; n++) { sum +=
-0.049612	< 100; i++) sum +=
-0.049612	< size; i++) sum +=
-0.049612	(i=0; i<100; i++) sum +=
-0.016437	} else { list[i] +=
-0.383198	for(i=0; i<300; i++){ list[i] +=
-0.153123	for(i=i_div_3=0; i<300; i+=3,i_div_3++){ list[i] +=
-0.224831	c1 < r1; c1 +=
-0.165154	* _mm_load_ps(coef+i); // s +=
-0.165154	sum for(inti=0;i<16;i+=4){ //Loopby4 s +=
-0.165105	+= 2) { sum1 +=
-0.165105	sum2 += list[i+1];} sum1 +=
-0.218473	check if nonzero u.i +=
-0.218473	sum1 += list[i]; sum2 +=
-0.218473	Table[x] = Y; Y +=
-0.212251	list[i] += i_div_3; list[i+1] +=
-0.212206	r1 < SIZE; r1 +=
-0.212251	list[i+1] += i_div_3; list[i+2] +=
-0.212206	Y += Z; Z +=
-0.212206	s2 += a[i+2]; s3 +=
-0.212206	s1 += a[i+1]; s2 +=
-0.199937	+= 4) { s0 +=
-0.199937	s0 += a[i]; s1 +=
-0.165081	can modify x *const_cast<int*>(&x) +=
-0.165081	... list[i & 15] +=
-0.165081	< 100; i++) matrix[FuncRow(i)][FuncCol(i)] +=
-0.165081	columns; j++) 39 matrix[i][j] +=
-0.331928	allowed inputs are the integers
-0.552295	module. The size of integers
-0.237518	math allow addition of integers
-0.331507	floating point Conversion of integers
-0.237704	convert these types to integers
-0.237704	floating point numbers to integers
-0.023499	floating point numbers and integers
-0.354640	of whether they are integers
-0.237287	with bitwise operators using integers
-0.321877	bits each, or two integers
-0.289861	nearest integer. If two integers
-0.306417	16-bit systems or 64-bit integers
-0.168653	we can use 64-bit integers
-0.168653	you may use 64-bit integers
-0.099574	don't need conversions between integers
-0.099574	140. Avoid conversions between integers
-0.326968	is enabled. Conversions between integers
-0.228890	the efficiency of 32-bit integers
-0.228890	inefficient to use 32-bit integers
-0.283965	rather than two 32-bit integers
-0.197411	point. Conversion of unsigned integers
-0.048834	between signed and unsigned integers
-0.103863	mix signed and unsigned integers
-0.113233	7.4. Signed and unsigned integers
-0.087584	signed than with unsigned integers
-0.087584	comparing signed with unsigned integers
-0.197411	occurs, (2) use unsigned integers
-0.197411	efficient to convert unsigned integers
-0.197411	overflow. Signed versus unsigned integers
-0.235687	16 bits each, four integers
-0.235608	8 bits each, eight integers
-0.222363	the behavior of signed integers
-0.296244	unsigned integers to signed integers
-0.373710	vector of eight 16-bit integers
-0.229177	vector instructions cannot multiply integers
-0.218529	can contain either sixteen integers
-0.148720	are stored as 8-bit integers
-0.148720	we are using 8-bit integers
-0.059226	this library with the option
-0.309742	normally compiled with the option
-0.127823	to compile with the option
-0.127823	you compile with the option
-0.309742	integer overflow with the option
-0.309742	and link with the option
-0.356714	discussions. Turn on the option
-0.237254	this option. Use the option
-0.237254	are implemented. Use the option
-0.237073	module with, e.g. the option
-0.236154	for such optimizations with option
-0.292221	shared object made with option
-0.236154	overflow behavior well-defined with option
-0.236770	Gnu compiler manual. This option
-0.236770	(/FAs or -fsource-asm). This option
-0.021708	Some compilers have an option
-0.044558	Most compilers have an option
-0.044558	Many compilers have an option
-0.107786	the compiler has an option
-0.107786	Intel compiler has an option
-0.231556	unless you specify an option
-0.350842	to specify the compiler option
-0.350842	78). Adding the compiler option
-0.340584	to use a compiler option
-0.353556	used here. The compiler option
-0.233390	using the same compiler option
-0.293879	Microsoft compiler supports this option
-0.313675	are compiled without any option
-0.236683	Choose the strongest optimization option
-0.448932	off the exception handling option
-0.231981	output. The assembly output option
-0.231981	have an assembly output option
-0.375035	off the loop unroll option
-0.231327	requires static linking (e.g. option
-0.228100	not optimal. Use 12 option
-0.212246	32-bit case. The -fpie option
-0.251285	use the source annotation option
-0.165117	the "generate map file" option
-0.536674	costs if it is good
-0.536674	style if it is good
-0.357793	final product. It is good
-0.314137	cache. Single precision is good
-0.167777	This compiler is a good
-0.167777	Intel compiler is a good
-0.167777	Clang compiler is a good
-0.344040	or structure is a good
-0.542000	it can be a good
-0.526457	It is not a good
-0.326005	library that has a good
-0.326005	the reader has a good
-0.322548	assembly language because a good
-0.235600	I have done a good
-0.235600	It is therefore a good
-0.362092	possible to get a good
-0.277228	elsewhere and get a good
-0.312925	may be quite a good
-0.507196	and the availability of good
-0.237855	level. My recommendation for good
-0.211917	while high-level languages are good
-0.211917	hand. Low-level languages are good
-0.235108	is not always as good
-0.235108	compiler. Not optimized as good
-0.235108	Does not optimize as good
-0.235108	data are cached as good
-0.573745	thread. It is not good
-0.316078	+ 2; } A good
-0.285563	difference in performance. A good
-0.230297	byte of zero. A good
-0.230297	in one operation. A good
-0.230297	a simple index. A good
-0.230297	often have exploited. A good
-0.237414	time. For example, all good
-0.237080	64-bit Linux. Has many good
-0.498794	compiler is a very good
-0.225248	compilers are not very good
-0.299667	these libraries have very good
-0.279833	supported by some very good
-0.212269	data (low numbers mean good
-1.055406	the calculation of the power
-0.088612	for x to the power
-0.088612	get x to the power
-0.088612	Calculate x to the power
-0.297150	pointed to is a power
-0.216698	value that is a power
-0.216698	divisor that is a power
-0.493752	32. This is a power
-0.216698	the size is a power
-0.216698	its size is a power
-0.297150	the constant is a power
-0.594335	a matrix is a power
-0.297150	when columns is a power
-0.216698	if N is a power
-0.216698	If N is a power
-0.297150	the factor is a power
-0.297150	of abc is a power
-0.057436	if divisor is a power
-0.297150	desired interval is a power
-0.299733	should always be a power
-0.203497	may preferably be a power
-0.225904	should preferably be a power
-1.063318	Integer division by a power
-0.323863	can multiply by a power
-0.323863	operations. Multiplying by a power
-0.521646	N is not a power
-0.288782	in a matrix a power
-0.233131	had not been a power
-0.233131	number of columns a power
-0.233131	specialization for N a power
-0.293614	Example 15.1b. Calculate integer power
-0.235125	// Example 15.1d. Integer power
-0.112196	matrix is a high power
-0.112196	bytes) is a high power
-0.231874	use the high processing power
-0.284339	lightweight processors with low power
-0.222359	less memory and computing power
-0.165153	to utilize the computational power
-1.340226	the size of the matrix
-1.325777	a multiple of the matrix
-1.452752	the address of the matrix
-0.356880	of 2 and the matrix
-0.358118	the rows in the matrix
-0.580428	tried to make the matrix
-0.891485	is to divide the matrix
-0.237334	time to transpose the matrix
-0.201778	the size of a matrix
-0.447172	largest element in a matrix
-0.027340	of columns in a matrix
-0.350802	of 2 if a matrix
-0.048076	time to transpose a matrix
-0.048076	takes to transpose a matrix
-0.291989	themselves. But implementing a matrix
-0.291989	following example transposes a matrix
-0.235951	} } Transposing a matrix
-0.314731	This function writes to matrix
-0.051208	rows and columns in matrix
-0.237358	number of rows/columns in matrix
-0.237851	any cache lines for matrix
-0.236095	(see page 78). A matrix
-0.236095	multidimensional structure needed? A matrix
-0.350717	matrix cell for different matrix
-0.339000	Pentium 4 with different matrix
-0.208152	with a 64 64 matrix
-0.208152	expected. The 64 64 matrix
-0.436141	rows in a big matrix
-0.035829	to transpose and copy matrix
-0.295311	for a 512 512 matrix
-0.221575	instructions. The 512 512 matrix
-0.231873	The execution times per matrix
-0.315786	line size // define matrix
-0.854165	// function to transpose matrix
-0.354601	to work on a Linux
-0.352674	in newer versions of Linux
-0.314722	BSD are identical to Linux
-0.346385	performance. The Windows and Linux
-0.339040	as well as in Linux
-0.344482	functions and data in Linux
-0.237085	has been introduced in Linux
-0.237085	can be overridden in Linux
-0.637558	the Intel compiler for Linux
-0.505856	frameworks are available for Linux
-0.818141	a good choice for Linux
-0.237792	int BigArray[1024] __attribute__((aligned(64))); // Linux
-0.237729	is generally possible on Linux
-0.062161	compiler Windows Intel compiler Linux
-0.024698	compiler Windows Gnu compiler Linux
-0.416489	and 32-bit and 64-bit Linux
-0.416489	support 32-bit and 64-bit Linux
-0.186992	more efficient in 64-bit Linux
-0.306214	slightly faster in 64-bit Linux
-0.302051	Only available for 64-bit Linux
-0.218751	point numbers. Therefore, 64-bit Linux
-0.237114	(e.g. option /MT). In Linux
-0.389065	32-bit Windows and 32-bit Linux
-0.345935	without -fpic in 32-bit Linux
-0.224930	self-relative addressing. In 32-bit Linux
-0.279472	no difference between 32-bit Linux
-0.528157	objects in 64 bit Linux
-0.476087	objects in 32 bit Linux
-0.535566	is said here about Linux
-0.482887	compiler Intel compiler Windows Linux
-0.233244	platform, but also supports Linux
-0.038835	optimization guide for Windows, Linux
-0.246239	32- and 64-bit Windows, Linux
-0.212234	Windows platform _WIN32 _WIN32 Linux
-0.233986	64-bit. They have not been
-0.170417	when it has not been
-0.170417	class library has not been
-0.233986	If columns had not been
-0.233986	Studio IDE. Has not been
-0.222888	of the code have been
-0.222888	by the compiler have been
-0.222888	of the program have been
-0.509334	a and b have been
-0.045293	before all objects have been
-0.022056	after all objects have been
-0.296867	after all elements have been
-0.222888	but the examples have been
-0.222888	above the diagonal have been
-0.222888	constant N1 could have been
-0.222888	the hot spots have been
-0.222888	function in isolation have been
-0.227914	a file that has been
-0.227914	that everything that has been
-0.366030	check if it has been
-0.321144	up because it has been
-0.104820	object after it has been
-0.104820	accessed after it has been
-0.205851	during this time has been
-0.284092	of the pointer has been
-0.212089	previous link pointer has been
-0.090841	number of registers has been
-0.090841	of vector registers has been
-0.205851	because the file has been
-0.205851	registers. This problem has been
-0.205851	the repeat count has been
-0.205851	the pointer p has been
-0.205851	fact, the STL has been
-0.205851	wait until seconds has been
-0.205851	a hot spot has been
-0.205851	"Gnu indirect function" has been
-0.205851	loop initialisation i=0; has been
-0.222394	block that has already been
-0.347103	vector turned up to cause
-0.237523	are too small to cause
-0.579235	87) is likely to cause
-0.338559	at unpredictable times and cause
-0.237087	can be invalid and cause
-0.237087	the critical stride and cause
-0.237087	each other's caches and cause
-0.325307	other resource problems that cause
-0.496373	is because it can cause
-0.394689	their stack. This can cause
-0.555946	or modified. This can cause
-0.303857	is defined. This can cause
-0.303857	access patterns. This can cause
-0.306187	stride then this can cause
-0.286060	in static memory can cause
-0.321010	of an array can cause
-0.230735	updating of software can cause
-0.230735	if intermediate calculations can cause
-0.306187	An array overflow can cause
-0.230735	members. This alignment can cause
-0.230735	the same generation can cause
-0.230735	in the BTB can cause
-0.469073	is that it may cause
-0.469073	non-sequentially because it may cause
-0.295505	different modules. This may cause
-0.295505	marketing reasons. This may cause
-0.318322	Pentium CPUs which may cause
-0.232138	and class members may cause
-0.233096	to do so will cause
-0.233096	multiple overloaded operators will cause
-0.023154	from address 0x2710 will cause
-0.329840	integer size that doesn't cause
-0.229028	to signed integer doesn't cause
-0.500137	abstraction is a common cause
-0.500652	probably the most common cause
-0.291531	where the reduction would cause
-0.369240	swapping is a frequent cause
-0.276591	hardware identification. Such schemes cause
-0.248663	takes advantage of the AVX
-0.248663	main advantage of the AVX
-0.357186	apply as to the AVX
-0.357817	named YMM in the AVX
-0.517536	is compiled for the AVX
-0.352286	when compiling for the AVX
-0.353483	are available if the AVX
-0.353483	bits (YMM) if the AVX
-0.862248	how to use the AVX
-0.355662	page 105). If the AVX
-0.023446	_mm256_zeroupper() before leaving the AVX
-0.236900	to 12.1a. Enable the AVX
-0.293732	set if possible. The AVX
-0.237482	2.6.30 and later. The AVX
-0.341807	mixing code compiled for AVX
-0.063918	>= 11) { // AVX
-0.437295	int parm2) {...} // AVX
-0.343611	can run only if AVX
-0.268466	program is compiled with AVX
-0.351463	of code compiled with AVX
-0.237601	supported. For example, use AVX
-0.235926	penalty when going from AVX
-0.235926	before any transition from AVX
-0.344460	the library has no AVX
-0.314047	set SSE4.1 instr. set AVX
-0.292958	float or int 4 AVX
-0.074988	versions with and without AVX
-0.074988	compiled with and without AVX
-0.319864	program is compiled without AVX
-0.236361	SSE4.2 string search instructions AVX
-0.361248	double 64 4 256 AVX
-0.361248	float 32 8 256 AVX
-0.685317	and the operating system. AVX
-0.074756	right vector elements. 12.1 AVX
-0.074756	vector operations............................................................................................... 105 12.1 AVX
-0.165112	(Gnu) AES, PCLMUL wmmintrin.h AVX
-0.565261	that the use of classes
-0.382780	object. 7.17 Structures and classes
-0.435002	of text strings in classes
-0.021769	intrinsic functions or vector classes
-0.308302	CPU dispatching with vector classes
-0.268756	you are using vector classes
-0.312854	example, using Intel vector classes
-0.087154	109 12.5 Using vector classes
-0.087154	section. 12.5 Using vector classes
-0.036739	<dvec.h> // Define vector classes
-0.076876	"vectorclass.h" // Define vector classes
-0.215465	example, using Agner vector classes
-0.320637	example using Agner's vector classes
-0.094505	libraries of predefined vector classes
-0.094505	functions Use predefined vector classes
-0.331079	for organizing data into classes
-0.237134	the vectors into C++ classes
-0.079468	More examples of container classes
-0.079468	the discussion of container classes
-0.176808	applications. Remember that container classes
-0.176808	common to make container classes
-0.176808	collection of example container classes
-0.176808	Unfortunately, many standard container classes
-0.176808	write your own container classes
-0.176808	at www.agner.org/optimize/cppexamples.zip containing container classes
-0.221028	common implementations of string classes
-0.275051	large applications. The string classes
-0.286803	}; // The child classes
-0.200035	F64vec4 Table 12.5. Vector classes
-0.200035	AVX512 Table 12.1. Vector classes
-0.229225	Inheritance from multiple parent classes
-0.035774	dynamic memory allocation. Container classes
-0.017519	...................................................................................... 90 9.7 Container classes
-0.017519	using alloca. 9.7 Container classes
-0.035774	cases. Multiple threads? Container classes
-0.165128	if there are wrapper classes
-0.165128	} 59 third generations classes
-0.583094	time if it is done
-0.457383	long as it is done
-0.350436	is called. This is done
-0.350436	innermost loop. This is done
-0.344672	a smaller size is done
-0.714524	Dynamic memory allocation is done
-0.380488	and the multiplication is done
-0.380488	and function inlining is done
-0.236225	function. The branching is done
-0.236225	C1::Disp() or C2::Disp() is done
-0.236225	load address. Relocation is done
-0.294182	should be standardized and done
-0.537803	calculations have to be done
-0.522039	that has to be done
-0.811628	because it can be done
-0.477486	memory. This can be done
-0.477486	process. This can be done
-0.477486	only. This can be done
-0.442634	to integer can be done
-0.342339	of 2 can be done
-0.342339	this polynomial can be done
-0.351698	a constant should be done
-0.345821	cannot do must be done
-0.464705	test should preferably be done
-0.464705	device should preferably be done
-0.342212	Floating point operations are done
-0.159551	these address calculations are done
-0.159551	are: All calculations are done
-0.159551	that certain calculations are done
-0.235578	Most performance tests are done
-0.235578	and 9. Multiplications are done
-0.237683	often easier said than done
-0.351688	maintenance easier. I have done
-0.236387	consider whether others have done
-0.350908	the work it has done
-0.293658	[ecx+eax*4]. This is all done
-0.235190	15.1b to 15.1c was done
-0.341597	bits. This is usually done
-0.397840	object is not necessarily done
-0.333534	syntax checking and is therefore
-0.470152	source code. It is therefore
-0.413862	the memory. It is therefore
-0.319347	same compiler. It is therefore
-0.319347	was executed. It is therefore
-0.319347	point expressions. It is therefore
-0.319347	precision constant. It is therefore
-0.319347	page 73). It is therefore
-0.319347	more efficiently. It is therefore
-0.319347	page 72. It is therefore
-0.319347	programmer can. It is therefore
-0.319347	for correctness. It is therefore
-0.334003	memory. Efficient caching is therefore
-0.432158	i to p is therefore
-0.339186	the main memory and therefore
-0.235524	programmer to make and therefore
-0.345079	other local variables and therefore
-0.291503	a dedicated microprocessor and therefore
-0.379511	no operating system, and therefore
-0.235524	difficult to understand and therefore
-0.235524	value of m and therefore
-0.235524	it does not, and therefore
-0.235524	will be non-zero, and therefore
-0.235524	highly system dependent and therefore
-0.344533	Pointer arithmetic operations are therefore
-0.237393	constructor itself. Constructors are therefore
-1.122084	of the code can therefore
-0.343904	longer time. It can therefore
-0.380432	Dynamic memory allocation can therefore
-0.347958	as 'this'. We can therefore
-0.237686	have. The developers may therefore
-0.236087	the above examples will therefore
-0.236087	14.23b and 14.30 will therefore
-0.300748	The vectorized code should therefore
-0.314600	storage space. It should therefore
-0.247705	about them. You should therefore
-0.247705	too late. You should therefore
-0.226159	floating point calculations should therefore
-0.226159	long. Lazy binding should therefore
-0.294220	float type holds a precision
-0.336141	the same regardless of precision
-0.331706	overflow and loss of precision
-0.554432	calculated with the same precision
-0.352783	to keep the same precision
-0.356205	to relax floating point precision
-0.349995	32-bit integer. Floating point precision
-0.197338	multiplying with the double precision
-0.285964	example is a double precision
-0.266733	single precision to double precision
-0.187455	mix single and double precision
-0.187455	mixing single and double precision
-0.197338	not know that double precision
-0.197338	point constants are double precision
-0.248317	calculated faster than double precision
-0.197338	You may use double precision
-0.197338	vectors of two double precision
-0.231690	registers have long double precision
-0.231690	calculate when long double precision
-0.197338	can hold four double precision
-0.197338	In most cases, double precision
-0.197338	single precision. Using double precision
-0.087556	Disadvantages are: Long double precision
-0.087556	double precision. Long double precision
-0.193163	is higher for single precision
-0.243625	You may use single precision
-0.193163	convert b from single precision
-0.193163	you are using single precision
-0.193163	making the constant single precision
-0.193163	precision or four single precision
-0.193163	precision or eight single precision
-0.096864	ADX instructions for high precision
-0.096864	math. Libraries for high precision
-0.233561	have mixed precision require precision
-0.279434	where operands have mixed precision
-0.074767	slightly more time. Single precision
-0.074767	the data cache. Single precision
-0.165138	newest instruction set. High precision
-0.165138	set is enabled (single precision
-0.445989	it doesn't have the line
-0.343570	cache line then the line
-0.343570	set values then the line
-0.346451	kb size with a line
-0.346451	4 ways, with a line
-0.237756	draw each pixel or line
-0.294036	and interpreted line by line
-0.237734	swapping column 29 with line
-0.527330	You may replace this line
-0.235236	execute a code one line
-0.510957	prefetch more than one line
-0.101594	arrays by the cache line
-0.101594	divisible by the cache line
-0.234454	again before the cache line
-0.234454	at least the cache line
-0.234454	will evict the cache line
-0.269284	address so a cache line
-0.269284	for fetching a cache line
-0.154563	contemporary processors. The cache line
-0.154563	cache line. The cache line
-0.206816	// align by cache line
-0.206816	Func 87 used cache line
-0.206816	load a new cache line
-0.091212	of 64. Each cache line
-0.091212	line 29. Each cache line
-0.206816	causes an entire cache line
-0.352743	the file for each line
-0.465987	multiple of the matrix line
-0.618562	size of a matrix line
-0.533464	_EM_OVERFLOW); // if above line
-0.335702	of range. The next line
-0.234484	be negative. The last line
-0.234142	of 64 bytes. Each line
-0.228090	it is and interpreted line
-0.222350	may remove the memset line
-0.114606	compiler from the command line
-0.097727	specified on a command line
-0.097727	called from a command line
-0.165107	options Table 18.1. Command line
-0.341754	check for overflow and works
-0.710774	piece of code that works
-0.423843	find the one that works
-0.286119	the only one that works
-0.464718	another function library that works
-0.426849	brand. The version that works
-0.463609	then make sure it works
-0.325130	example 16.1. This code works
-0.102094	whole program optimization. This works
-0.102094	offer profile-guided optimization. This works
-0.235812	32-bit absolute addresses. This works
-0.835958	in the Intel compiler works
-0.312839	chapter describes how this works
-0.236319	faster. Of course, this works
-0.237569	are accessed sequentially. It works
-0.314328	and see which one works
-0.308949	the cache. The cache works
-0.118554	time. The code cache works
-0.118554	together The code cache works
-0.285924	data sequentially A cache works
-0.293287	mathematical functions. It also works
-0.347632	each thread. This method works
-0.335166	note that this method works
-0.236094	Gnu manual currently doesn't works
-0.349628	processors. Explicit CPU dispatching works
-0.378697	that all code branches works
-0.234859	a particular code implementation works
-0.505659	the out-of-order execution mechanism works
-0.220983	memory. The renaming mechanism works
-0.321920	end user. Dynamic linking works
-0.326959	the way a profiler works
-0.320505	instructions. The automatic vectorization works
-0.222295	a program that already works
-0.122603	X?" rather than "what works
-0.122603	programmer typically thinks "what works
-0.199925	example 14.12b and 14.13b works
-0.199925	is necessary. 101 Multithreading works
-0.165071	The OR operator (|) works
-0.358349	profiling support in the optimized
-0.473207	debugger is not the optimized
-0.325075	If you run the optimized
-0.237668	In example 12.2, the optimized
-0.861110	that the code is optimized
-0.443931	The data cache is optimized
-0.314321	that some expression is optimized
-0.347457	very good compilers and optimized
-0.237886	addresses are obscured in optimized
-0.293721	produce Boolean output. The optimized
-0.237472	high level framework. The optimized
-0.848117	things that can be optimized
-0.553895	above code can be optimized
-0.353220	= !a; can be optimized
-0.456412	queries can often be optimized
-0.351950	compiler. Some functions are optimized
-0.293589	All these examples are optimized
-0.237745	been reordered, inlined, or optimized
-0.355380	modern microprocessors are not optimized
-0.352449	the functionality of an optimized
-0.352210	any processor that you optimized
-0.237302	the same function, each optimized
-0.098746	some of the best optimized
-0.418718	Math core library contains optimized
-0.225314	to use the well optimized
-0.225314	However, with a well optimized
-0.446359	The pointer is simply optimized
-0.230753	from www.agner.org/optimize/asmlib.zip. Currently includes optimized
-0.195947	breakpoint in the fully optimized
-0.195947	libraries are not fully optimized
-0.111206	names. But a highly optimized
-0.119039	These functions are highly optimized
-0.266395	function libraries are highly optimized
-0.111206	you consider making highly optimized
-0.122616	the Gnu compiler. Not optimized
-0.122616	Borland C++ builder. Not optimized
-0.199948	multiple versions, each carefully optimized
-0.588063	well if it is inside
-0.716010	if the loop is inside
-0.352208	because they must be inside
-0.237778	smaller by declaring it inside
-0.357756	square by the code inside
-0.351151	small piece of memory inside
-0.354537	following algorithm is used inside
-0.237047	inheritance by making objects inside
-0.236986	store the shared variable inside
-0.349828	by declaring the table inside
-0.277855	how predictable the branch inside
-0.277855	it avoids the branch inside
-0.334291	not with a branch inside
-0.279440	16-bit integers. The branch inside
-0.293008	and c2 for elements inside
-0.236139	and fixed size arrays inside
-0.316796	depends on the calculations inside
-0.222311	branch depends on calculations inside
-0.508219	the floating point calculations inside
-0.379114	counter is a counter inside
-0.234935	count and no branches inside
-0.241798	arrays should be declared inside
-0.348661	should preferably be declared inside
-0.227942	variables and objects declared inside
-0.144698	Variables and objects declared inside
-0.073404	a class Variables declared inside
-0.073404	course inefficient. Variables declared inside
-0.231293	reading the performance counters inside
-0.286042	for the overflow condition inside
-0.287003	its body is defined inside
-0.203299	a static object defined inside
-0.226693	declaring the function body inside
-0.224814	look at what happens inside
-0.222259	re-allocation is needed. Objects inside
-0.222290	while loop because nothing inside
-0.199920	operations (addition, multiplication, etc.) inside
-0.165066	preferably be kept entirely inside
-0.165066	call (other than log) inside
-0.294181	sleep mode. See the manual
-0.294181	exception handling. See the manual
-0.530979	This is not a manual
-0.237608	(See page 49 and manual
-0.237608	the preceding paragraph and manual
-0.441985	mangling are explained in manual
-0.425320	details are given in manual
-0.379086	methods are discussed in manual
-0.235219	"Macro loops" chapter in manual
-0.311527	sets is provided in manual
-0.500540	latencies are listed in manual
-0.235219	in table 19 in manual
-0.101876	described in detail in manual
-0.101876	in more detail in manual
-0.235219	in kernel code" in manual
-0.235219	processors are covered in manual
-0.233954	page 8 below. This manual
-0.377325	portability is important. This manual
-0.233954	faster and smaller. This manual
-0.233954	164 1 Introduction This manual
-0.233954	on page 158. This manual
-0.455378	www.openmp.org and the compiler manual
-0.495941	listed in the compiler manual
-0.032543	an appendix to this manual
-0.067724	An appendix to this manual
-0.339988	the basis for this manual
-0.236850	the simplest cases. See manual
-0.496593	described in the Gnu manual
-0.233276	of computers and my manual
-0.017523	in the CPU (See manual
-0.165175	on AMD CPUs (See manual
-0.165175	to be mispredicted (See manual
-0.212279	rest of the present manual
-0.165147	Agner Fog The present manual
-0.199993	version. See the vectorclass manual
-0.237877	compilers can compute a /
-0.833345	b; a = b /
-0.407455	c; a = b /
-0.548729	10; a = b /
-0.396775	if (a > b /
-0.237178	i++){ list[i] += i /
-0.233731	(number of sets). Here, /
-0.288863	= b * 5 /
-0.288539	reciprocal_divisor; reciprocal_divisor = 1. /
-0.310244	temp; c = temp /
-0.287205	i++. cmp eax, 100 /
-0.307740	instructions add ebx, eax /
-0.226650	{ sum += xn /
-0.226650	and 64-bit Windows. Borland /
-0.224790	Windows. Borland / CodeGear /
-0.218432	= (total cache size) /
-0.309691	a = (unsigned int)b /
-0.212165	is discussed below. Signed /
-0.048364	necessary calculations of (2n /
-0.048364	as a * (2n /
-0.048364	n. The constant (2n /
-0.212165	cache is 512 kb /
-0.330642	b1; y2 = a2 /
-0.330642	y2; y1 = a1 /
-0.199897	= b + 2.0 /
-0.199897	critical stride is 8192 /
-0.251197	(a+1); c = (a+1) /
-0.165045	= (a1*b2 + a2*b1) /
-0.165045	The instructions mov ebx,eax /
-0.165045	have (set) = (10000 /
-0.165045	b = (unsigned int)a /
-0.165045	(set) = (memory address) /
-0.165045	= b * (1. /
-0.165045	v. 14.00 for 80x86 /
-0.165045	and (set) = (0x2710 /
-0.561646	CPUs. This method is explained
-0.294041	of bounds checking is explained
-0.236941	of variable storage are explained
-0.293117	libraries. These factors are explained
-0.236941	about name mangling are explained
-0.290167	for optimizing code, as explained
-0.217231	on non-Intel processors, as explained
-0.217231	to 64-bit mode, as explained
-0.217231	on the system, as explained
-0.217231	preferably be static, as explained
-0.217231	be cleaned up, as explained
-0.217231	loss of precision, as explained
-0.217231	doing out-of-order execution, as explained
-0.217231	of cache space, as explained
-0.217231	of vector operations, as explained
-0.217231	used for metaprogramming, as explained
-0.217231	use vector classes, as explained
-0.217231	container class templates, as explained
-0.217231	can eliminate branches, as explained
-0.217231	about register use, as explained
-0.217231	in other ways, as explained
-0.217231	compiled without AVX, as explained
-0.217231	for switch statements, as explained
-0.217231	a memory pool, as explained
-0.217231	for other optimizations, as explained
-0.217231	use static linking, as explained
-0.217231	the clock frequency, as explained
-0.217231	unit is pipelined, as explained
-0.217231	the critical stride, as explained
-0.217231	expensive cache contentions, as explained
-0.232321	theoretical background is further explained
-0.103527	frame functions for reasons explained
-0.103527	of precision for reasons explained
-0.103527	this manual for reasons explained
-0.103527	32-bit mode, for reasons explained
-0.218566	be predicted perfectly. As explained
-0.200026	and table lookup mechanisms explained
-0.461189	be replaced by the calculated
-0.356859	replace it with the calculated
-0.651457	the operand that is calculated
-0.353379	loop counter, which is calculated
-0.381805	unless the value is calculated
-0.267493	that each value is calculated
-0.312529	d; This expression is calculated
-0.236059	(2n / b) is calculated
-0.236059	each value xn is calculated
-0.236059	value of n! is calculated
-0.236059	table of coefficients is calculated
-0.236059	so that a+b is calculated
-0.236059	address of matrix[j][0] is calculated
-0.236059	f(x) or g(x) is calculated
-0.351304	a number to be calculated
-0.469771	exponential function can be calculated
-0.518115	i which can be calculated
-0.469771	induction variable can be calculated
-0.333461	the result can be calculated
-0.048155	loop counter can be calculated
-0.333461	point numbers can be calculated
-0.431478	if condition can be calculated
-0.333461	critical stride can be calculated
-0.333461	two parentheses can be calculated
-0.348994	example, b*2.0/3.0 will be calculated
-0.346853	inlined or cannot be calculated
-0.307439	that r+i/2 could be calculated
-0.454826	and mathematical functions are calculated
-0.407367	the bitwise operators are calculated
-0.350912	array, which it has calculated
-0.402016	sure it is only calculated
-0.309792	Here, log(2.0) is only calculated
-0.330145	intermediate results are always calculated
-0.165153	4, anda * 17is calculated
-0.165153	For example,a * 16is calculated
-0.355591	serious burden is the calculation
-0.452493	repeat count and the calculation
-0.350147	then B, and the calculation
-0.357526	chain. Nothing in the calculation
-0.356572	not needed for the calculation
-0.547597	correct or if the calculation
-0.349714	can occur, but the calculation
-0.236479	Note how efficient the calculation
-0.352805	of B before the calculation
-0.923915	to roll out the calculation
-0.344764	to speed up the calculation
-0.292590	iterations and start the calculation
-0.236479	of software specifies the calculation
-0.292590	the latter case, the calculation
-0.236479	microprocessor can begin the calculation
-0.572475	it has finished the calculation
-0.380843	overflow and redo the calculation
-0.236479	work // Re-do the calculation
-0.236053	b + c; The calculation
-0.490957	multiple function calls. The calculation
-0.236053	// Calculate polynomial The calculation
-0.236053	set number 28. The calculation
-0.292105	-128 generates 127. The calculation
-0.236053	is not supported. The calculation
-0.347323	return f; } This calculation
-0.237627	next example shows this calculation
-0.237569	invariant code motion A calculation
-0.609758	the sense that each calculation
-0.328152	of calculations, where each calculation
-0.236967	utilizing its out-of- order calculation
-0.062046	to make the address calculation
-0.223878	extra instructions for address calculation
-0.223878	than the complicated address calculation
-0.680725	contribution to the total calculation
-0.165143	we have an estimated calculation
-0.165143	rarely in Linux. Address calculation
-0.222526	to virtual function } };
-0.222526	array index operator } };
-0.057637	cout << 1; } };
-0.222526	{ return x; } };
-0.055931	cout << 2; } };
-0.222526	{ return 1.0; } };
-0.222526	powN<true,N-N1>::p(x); #undef N1 } };
-0.222526	powN<true,N/2>::p(x) * powN<true,N/2>::p(x); } };
-0.222526	child function: (static_cast<MyChild*>(this))->Disp(); } };
-0.236858	operator // add elements };
-0.350147	1; // sign bit };
-0.610005	beginning of the structure };
-0.163674	{ public: int c; };
-0.163674	B2 b2; int c; };
-0.842796	a, b, c, d; };
-0.226696	last byte at 19 };
-0.265086	fetching, decoding and perhaps };
-0.122616	{return a + b;} };
-0.122616	int ReadB() {return b;} };
-0.008670	int b:2; int c:2; };
-0.199948	Wednesday, Thursday, Friday, Saturday };
-0.008670	public: virtual void f(); };
-0.251253	int a[1000]; float b[1000]; };
-0.466011	{ public: void NotPolymorphic(); };
-0.199948	0x20, Saturday = 0x40 };
-0.008670	unsigned int sign :1;//signbit };
-0.165091	{ public: ... ~C1(); };
-0.165091	int c; int UnusedFiller; };
-0.165091	c:2; }; char abc; };
-0.165091	"Alpha", "Beta", "Gamma", "Delta" };
-0.165091	= y.d + 4.; };
-0.485632	52 or class is 128
-0.314517	each vector register is 128
-0.065717	type __m128i defines a 128
-0.065717	type __m128 defines a 128
-0.065717	type __m128d defines a 128
-0.234566	Vec8us 32 4 int 128
-0.350772	32 4 unsigned int 128
-0.340415	16 8 short int 128
-0.479318	8 unsigned short int 128
-0.528984	class is less than 128
-0.355459	Mac OS. See page 128
-0.237243	256 float 128 double 128
-0.237136	256 uint64_t 256 float 128
-0.234048	SSE double 64 2 128
-0.234048	long long 64 2 128
-0.233469	SSE2 int 32 4 128
-0.233469	SSE2 float 32 4 128
-0.323969	short int 16 8 128
-0.333215	members in the first 128
-0.333215	members within the first 128
-0.312993	MMX char 8 16 128
-0.236483	bit float vectors SSE2 128
-0.281841	126 12.2 128 128 128
-0.227018	127 126 12.2 128 128
-0.336975	called. Therefore, the dispatcher 128
-0.304784	8 16 unsigned char 128
-0.221006	I64vec1 8 16 char 128
-0.231303	256 && SIZE % 128
-0.230741	32 bit mode SSE 128
-0.226660	Vec4ui 64 2 int64_t 128
-0.165126	0.35 0.29 0.28 strlen 128
-0.165126	0.82 0.59 0.27 strlen 128
-0.218467	Vec2q 64 2 uint64_t 128
-0.218502	127 127 126 12.2 128
-0.251234	in Gnu compiler ......................................................................... 128
-0.165076	be 64 bits (MMX), 128
-0.358035	the compiler if the uses
-0.294136	compiled code big and uses
-0.331362	is other code that uses
-0.331362	to optimize code that uses
-0.236206	a second application that uses
-0.292280	a software framework that uses
-0.236206	a hot spot that uses
-0.353116	other CPUs, but it uses
-0.356422	much memory a function uses
-0.293250	} The pow function uses
-0.237714	in 32-bit Mac code uses
-0.314565	fail if an int uses
-0.553919	illogical that the compiler uses
-0.517724	BSD, but the compiler uses
-0.236005	different Intel CPUs. It uses
-0.236005	for the label. It uses
-0.382065	systems if the program uses
-0.382065	happens if the program uses
-0.491475	pointer. If a program uses
-0.342888	load time. The program uses
-0.335436	bits while a double uses
-0.324682	vectorized, because a float uses
-0.236891	to test when software uses
-0.234939	code. This framework typically uses
-0.331971	library. If the application uses
-0.223442	that a particular application uses
-0.222926	} } This implementation uses
-0.277200	zero. A good implementation uses
-0.234816	as long as their uses
-0.234708	if the user never uses
-0.289487	ifunc branch). This feature uses
-0.232251	problem. The compiler sometimes uses
-0.232240	mode, where it still uses
-0.229117	compile time. Four typical uses
-0.226680	elements of a vector, uses
-0.165076	string, wstring or CString uses
-0.174875	into one of the four
-0.356958	next vector, and the four
-0.356435	eight) points with the four
-0.331258	add_elements(s); // add the four
-0.482959	(2,2,2,2), and store the four
-0.237410	in one vector, the four
-0.908461	the value that is four
-0.634044	128 bit vector of four
-0.313811	floats A structure of four
-0.330916	// Define vectors of four
-0.237134	allows a maximum of four
-0.237134	data into groups of four
-0.421337	value of i to four
-0.355972	CPU core. There are four
-0.293316	eight 16-bit integers or four
-0.420155	two double precision or four
-0.293108	function 250 times with four
-0.236934	Core i7 processor with four
-0.352721	functions have more than four
-0.237612	you can only have four
-0.314251	simultaneously. This processor has four
-0.321485	0x1C. There are only four
-0.321485	calculate pow(x,10) with only four
-0.233653	64-bit Windows allows only four
-0.351711	that we can do four
-0.353688	64-bit Windows, the first four
-0.235700	For example, you get four
-0.311801	memory block for every four
-0.234934	*= xx4; // next four
-0.234321	The code will read four
-0.203271	a vector of e.g. four
-0.203271	registers can hold e.g. four
-0.311694	each vector can hold four
-0.284243	of 16 bits each, four
-0.199948	} This loop calculates four
-0.838775	sometimes more efficient than functions.
-0.237453	simply treated as different functions.
-0.480650	versions of the library functions.
-0.228395	is useful for library functions.
-0.228395	of dynamically linked library functions.
-0.228395	time on executing library functions.
-0.344642	split up into multiple functions.
-0.348145	identical for the two functions.
-0.213272	different meaning for member functions.
-0.266278	data members or member functions.
-0.213272	as any other member functions.
-0.310568	obtained with virtual member functions.
-0.200520	members or non-static member functions.
-0.200520	on all non-static member functions.
-0.213272	contains any non-polymorphic member functions.
-0.489567	versions of the virtual functions.
-0.283637	templates instead of virtual functions.
-0.235436	than the intrinsic hardware functions.
-0.210309	you had used intrinsic functions.
-0.210309	compilers that support intrinsic functions.
-0.210309	using the so-called intrinsic functions.
-0.210294	contains many useful mathematical functions.
-0.210294	more information about mathematical functions.
-0.210294	library contains optimized mathematical functions.
-0.234457	variable names from string functions.
-0.234429	code for the three functions.
-0.345026	no calls to frame functions.
-0.210496	leaf functions and frame functions.
-0.286759	penalty for using overloaded functions.
-0.283039	versions of the polymorphic functions.
-0.265086	is preferable for speed-critical functions.
-0.212258	as logarithms and trigonometric functions.
-0.199948	to all local non-member functions.
-0.199948	programs must use thread-safe functions.
-0.199948	more resources than non-virtual functions.
-0.165091	linker to remove unreferenced functions.
-0.314723	15 Integer overflow is another
-1.106664	to a pointer to another
-0.237504	from one auto_ptr to another
-0.314252	is later ported to another
-0.294143	with AVX support and another
-0.293580	the derived class in another
-0.331182	the thousand results in another
-0.237349	supposedly is system-independent, in another
-0.490324	have to wait for another
-0.314536	if an overflow or another
-0.235526	incremented every second by another
-0.235526	incremented to 5 by another
-0.235526	can be changed by another
-0.235526	and later deleted by another
-0.291305	do an addition with another
-0.235349	of the core with another
-0.235349	a project built with another
-0.235349	address might clash with another
-0.444038	a clock cycle on another
-0.422020	may be called from another
-0.298065	is also called from another
-0.351708	then it can do another
-0.348366	compiler-generated code by making another
-0.535388	can do calculations while another
-0.330407	a dispatched function calls another
-0.215609	which in turn calls another
-0.094560	However, if F1 calls another
-0.094560	necessary. If F1 calls another
-0.236082	the object file. Use another
-0.291889	the loop is inside another
-0.309759	mispredicted whenever it goes another
-0.231822	if the destructor causes another
-0.337132	the AVX instruction set, another
-0.272239	C++ program that produces another
-0.165086	of the user interface, another
-0.165086	bit mode, we encounter another
-0.357592	same space for the parameters
-0.354063	64-bit mode where the parameters
-0.237675	(In 64-bit mode, the parameters
-0.237675	32-bit mode. Storing the parameters
-0.346414	with any type of parameters
-0.237847	and operating systems". The parameters
-0.312904	vector objects as function parameters
-0.236374	allows only four function parameters
-0.236374	equally efficient. Simple function parameters
-0.237713	not be passed as parameters
-0.335763	12.6. Function with vector parameters
-0.502513	first eight floating point parameters
-0.349986	integer parameters. Floating point parameters
-0.232826	The first two integer parameters
-0.288436	the first six integer parameters
-0.232826	on CodeGear compiler) integer parameters
-0.197517	sense that the template parameters
-0.197517	one if the template parameters
-0.266943	efficient because the template parameters
-0.197517	same. If the template parameters
-0.289967	different factors as template parameters
-0.324403	template instance has its parameters
-0.318408	a maximum of four parameters
-0.227027	Windows, the first four parameters
-0.184743	7.15 Function parameters Function parameters
-0.039349	than in memory. Function parameters
-0.039349	around in memory. Function parameters
-0.184743	and overloaded operators. Function parameters
-0.082621	................................................................................................................ 48 7.15 Function parameters
-0.082621	this respect. 7.15 Function parameters
-0.184743	are using __fastcall. Function parameters
-0.232642	function type with desired parameters
-0.231835	But beware that macro parameters
-0.054120	allow up to fourteen parameters
-0.199976	on the stack (three parameters
-1.552849	it is possible to get
-0.701486	data in order to get
-0.489663	other in order to get
-0.489663	end in order to get
-0.489663	randomness in order to get
-0.235777	// Use template to get
-0.336758	arrays and want to get
-0.336758	we still want to get
-0.704053	may be difficult to get
-1.002782	are various ways to get
-0.235777	12 option -fno-builtin to get
-0.235777	takes some experience to get
-0.237859	seek information elsewhere and get
-0.387921	code then you can get
-0.387921	cache then you can get
-0.330552	Internet where you can get
-0.314599	// square x // get
-0.237759	timingtest.h from www.agner.org/optimize/testp.zip or get
-0.299025	But we will not get
-0.299025	me. You will not get
-0.546520	changes then you may get
-0.335226	In fact, you may get
-0.429119	are. For example, you get
-0.402608	differ then you will get
-0.283393	course, because you will get
-0.308980	simultaneously. Each thread will get
-0.233082	c; Here, y will get
-0.237381	utilized appropriately. Users should get
-0.233458	to this number we get
-0.289154	-fpic option. Then we get
-0.235178	and b will both get
-0.290875	of programming will typically get
-0.426598	case so we don't get
-0.279420	then you will soon get
-0.218508	public data object: (1) get
-0.199970	is extremely inefficient, (4) get
-0.564986	(true) { a = b;
-0.347500	Example 8.10b a = b;
-0.236582	+ 1; x[1] = b;
-0.036598	{ int a; int b;
-0.036598	public: int a; int b;
-0.036598	14.2a float a; int b;
-0.036598	14.2b float a; int b;
-0.128068	abc {int a; int b;
-0.232542	byte at 399 int b;
-0.318252	struct S1 { double b;
-0.209551	short int a; double b;
-0.209551	7.24 float a; double b;
-0.122342	c = a & b;
-0.122342	y = a & b;
-0.219508	unsigned integers int a, b;
-0.219508	* m;} int a, b;
-0.219508	= 1.6; int a, b;
-0.034898	Example 14.14b double a, b;
-0.034898	Example 14.14a double a, b;
-0.034898	Example 14.18c double a, b;
-0.034898	Example 8.2a double a, b;
-0.186688	1.0f;} 66 float a, b;
-0.186688	Example 14.18a float a, b;
-0.186688	Example 14.18b float a, b;
-0.160426	... int i, a, b;
-0.252024	Example 7.10a bool a, b;
-0.291987	double b; a += b;
-0.533978	b ? a : b;
-0.335730	c = a && b;
-0.336366	d = a | b;
-0.327433	d = a || b;
-0.619778	char a = 0, b;
-0.189780	7.29a float a; bool b;
-0.189780	double x, y; bool b;
-0.189780	x, y, z; bool b;
-0.403799	8.13a int i, a[100], b;
-0.814436	possible to do the check
-0.408037	You can bypass the check
-0.345187	to do such a check
-0.346980	add extra code to check
-0.543394	job. You have to check
-0.350117	then F1 has to check
-0.427452	is a way to check
-0.715633	The best way to check
-0.802201	122 for how to check
-0.352040	registers. You need to check
-0.579243	where you want to check
-0.339890	61 function calls to check
-0.643725	is often necessary to check
-0.292861	calculations. The program can check
-0.351602	or makefile. You can check
-0.348508	to zero We can check
-0.654469	& 0x7FFFFFFF) { // check
-0.333623	CPU dispatcher does not check
-0.333623	Example 14.26 does not check
-0.314495	the derived class. This check
-0.237564	calling function must then check
-0.522304	bits. There is no check
-0.522304	returned. There is no check
-0.126377	these functions have no check
-0.126377	string functions have no check
-0.292301	page 135). This extra check
-0.292276	that the function must check
-0.234851	style that doesn't automatically check
-0.234628	There is no automatic check
-0.290289	It makes a runtime check
-0.230734	have added a bounds check
-0.230029	this example. We might check
-0.523228	Boolean variables as input check
-0.304291	If the CPU brand check
-0.218520	input data. A missing check
-0.218490	against this problem: (1) check
-0.565766	predictable, then it is advantageous
-0.264997	out whether it is advantageous
-0.378560	deciding whether it is advantageous
-0.264997	determine whether it is advantageous
-0.530484	int. Therefore, it is advantageous
-0.341506	overflow. Likewise, it is advantageous
-0.562700	Inlining a function is advantageous
-0.350232	other purposes. This is advantageous
-0.350232	cache line. This is advantageous
-0.461380	apply to. It is advantageous
-0.329373	a lookup table is advantageous
-0.559507	finished. This method is advantageous
-0.334176	data without caching is advantageous
-0.567709	then it can be advantageous
-0.245913	operations. It can be advantageous
-0.245913	CPU. It can be advantageous
-0.982116	It may not be advantageous
-0.510861	45. This may be advantageous
-0.553136	anyway. It may be advantageous
-0.350883	whether vectorization will be advantageous
-0.819796	It can also be advantageous
-0.566063	in some cases be advantageous
-0.101574	It can therefore be advantageous
-0.101574	allocation can therefore be advantageous
-0.237841	with multiple cores are advantageous
-0.729605	then it is not advantageous
-0.506712	which it is not advantageous
-0.550152	problems. It is not advantageous
-0.340238	If hyperthreading is not advantageous
-0.415368	microprocessor and therefore not advantageous
-0.341283	vector operations is more advantageous
-0.341283	bitwise operators is more advantageous
-0.350270	where it is less advantageous
-0.236724	factors that decide how advantageous
-0.236513	It is almost always advantageous
-0.235111	Lookup tables are particular advantageous
-0.858915	If the code is implemented
-0.354076	Basic .NET, which is implemented
-0.343848	a derived class is implemented
-0.340333	Arrays An array is implemented
-0.048243	Each code version is implemented
-0.292880	the application software is implemented
-0.313828	destructors A constructor is implemented
-0.787752	because it can be implemented
-0.875186	the code can be implemented
-0.333459	how this can be implemented
-0.431476	missing functions can be implemented
-0.333459	the loop can be implemented
-0.062479	Function libraries can be implemented
-0.431476	dispatch mechanism can be implemented
-0.333459	example 14.28 can be implemented
-0.333459	interface etc., can be implemented
-0.333459	example 8.24 can be implemented
-0.138331	A queue should be implemented
-0.138331	FIFO queue should be implemented
-0.399104	algorithms, cannot easily be implemented
-0.231785	may some day be implemented
-0.337066	three conditions which are implemented
-0.292081	trigonometric functions, etc. are implemented
-0.323668	other programming languages are implemented
-0.313268	template metaprogramming, loops are implemented
-0.292081	example 15.1b. Branches are implemented
-0.237786	therefore need modification if implemented
-0.292488	scan instruction and have implemented
-0.351690	to call. I have implemented
-0.350965	of objects is often implemented
-0.235817	example shows this calculation implemented
-0.291031	time-consumer even for programs implemented
-0.333250	b; This is typically implemented
-0.235030	graphics application is preferably implemented
-0.358277	detailed overview of the problem
-0.357567	CPU models if the problem
-0.356079	work better. If the problem
-0.484891	you can avoid the problem
-0.420483	you can reduce the problem
-0.237332	You may ignore the problem
-0.382032	try to fix the problem
-0.237332	way of solving the problem
-0.565770	16. This is a problem
-0.558171	Gnu. There is a problem
-0.349407	If caching is a problem
-0.769109	this is not a problem
-0.323970	base access, etc. The problem
-0.692118	in the cache. The problem
-0.381240	full-size execution units. The problem
-0.236764	diagonal remain unchanged. The problem
-0.236771	scarcity of registers. This problem
-0.236771	efficient code caching. This problem
-0.105305	simple solution to this problem
-0.105305	standard solution to this problem
-0.244608	Possible solutions to this problem
-0.285317	general, you have this problem
-0.422742	possible to avoid this problem
-0.230081	compiler has solved this problem
-0.371946	way to solve this problem
-0.324936	: b; } A problem
-0.355864	predicted. This is no problem
-0.323177	be a very big problem
-0.231848	12.1b. Vectorization with alignment problem
-0.231855	the new version causes problem
-0.231356	upon the double. Another problem
-0.229191	leads to a usability problem
-0.226726	Security The most serious problem
-0.279466	is critical. The worst problem
-0.199981	less safe. This safety problem
-0.532124	efficient if it is known
-0.532124	call if it is known
-0.493380	58 If it is known
-0.353021	a variable which is known
-0.831050	number of objects is known
-0.225415	number of elements is known
-0.142611	types of elements is known
-0.381842	because the result is known
-0.267521	the 33 result is known
-0.235714	elements to store is known
-0.322686	fact that n is known
-0.313015	log on process is known
-0.312117	if the condition is known
-0.591466	of the divisor is known
-0.294216	to truly represent a known
-0.546511	on an object of known
-0.294171	(ArraySize) is constant and known
-0.456774	if it cannot be known
-0.334817	the array is not known
-0.334817	of objects is not known
-0.334817	row length is not known
-0.334817	memory required is not known
-0.334817	the divisor is not known
-0.528158	lengths that are not known
-0.237467	designed to handle only known
-0.455611	exponent is an integer known
-0.135574	stack. Is the size known
-0.135574	38). Is the size known
-0.237269	through the implicit pointer known
-0.314061	123 correspond to any known
-0.349514	row is a constant known
-0.305212	common source of error known
-0.229915	a common programming error known
-0.129450	the string is already known
-0.129450	the CPU-type is already known
-0.227599	a to b for (i
-0.427875	largest_index = 0; for (i
-0.004949	{ int i; for (i
-0.002467	0; int i; for (i
-0.004949	1.0; int i; for (i
-0.004949	list[100]; int i; for (i
-0.004949	7.30a int i; for (i
-0.227599	i, a[100], b; for (i
-0.058775	int i; ... for (i
-0.058775	b[size], i; ... for (i
-0.126768	x); 136 ... for (i
-0.126768	i, j; ... for (i
-0.099058	f = 1; for (i
-0.099058	b + 1; for (i
-0.227599	a to zero for (i
-0.075411	i; float x; for (i
-0.075411	j; float x; for (i
-0.306540	float register temp; for (i
-0.227599	temp = 3; for (i
-0.227599	int i, a[100]; for (i
-0.227599	int i; 45 for (i
-0.227599	Induction = r; for (i
-0.227599	Critical innermost loop: for (i
-0.227599	int i, StringLength; for (i
-0.227599	int i, a[2]; for (i
-0.227599	int i; 84 for (i
-0.227599	long long timediff[NumberOfTests]; for (i
-0.227599	before) } printf("\nResults:"); for (i
-0.234951	operator[] (unsigned int if (i
-0.343342	20; i++) { if (i
-0.101777	int i; ... if (i
-0.101777	float list[size]; ... if (i
-0.234951	100; float list[ARRAYSIZE]; if (i
-0.236245	resume after exceptions: while (i
-0.444176	any time, then the solution
-0.343563	to ignore, then the solution
-0.237751	cache sizes. Fortunately, the solution
-0.314713	by itself. But a solution
-0.237829	than integer comparisons. The solution
-0.236758	ALIGN ; mark_end; This solution
-0.236758	(e.g. DEC, JNZ). This solution
-0.348350	the cost of this solution
-0.290981	CPU time. But this solution
-0.235064	in edx. Furthermore, this solution
-0.407392	code to see which solution
-0.349021	collection. A more efficient solution
-0.271344	is the most efficient solution
-0.261884	then the most efficient solution
-0.383504	be a very efficient solution
-0.221112	of matrices. An efficient solution
-0.329911	full speed. A simple solution
-0.343554	this case. The best solution
-0.311757	before storing. The standard solution
-0.432173	set then the optimal solution
-0.223972	is not an optimal solution
-0.342838	("hidden")))". A more complicated solution
-0.424099	may be a better solution
-0.199976	variable size. The alternative solution
-0.251285	is inlined. An alternative solution
-0.316351	to be the fastest solution
-0.304344	An even more powerful solution
-0.218541	simplest and most clean solution
-0.218507	that the only reasonable solution
-0.212206	inside the class. Which solution
-0.330695	can be a viable solution
-0.199937	execution time. No universal solution
-0.165081	function library. The radical solution
-0.165081	hardware design. The ultimate solution
-0.358292	or key in the container
-0.356233	contained object because the container
-0.356322	multiple elements? If the container
-0.508381	way of making the container
-0.237584	the container. Can the container
-0.474360	need to use a container
-0.474360	preferred to use a container
-0.331216	allocated memory into a container
-0.331216	allocated array into a container
-0.236474	overcome by defining a container
-0.292585	account when choosing a container
-0.292585	may be considered a container
-0.292585	to temporarily lock a container
-0.236474	efficient to re-use a container
-0.341654	a[i] More examples of container
-0.346259	See the discussion of container
-0.325285	a container class. The container
-0.237845	time applications. Remember that container
-0.237295	if the array or container
-0.237295	fixed size array or container
-0.656751	is common to make container
-0.352325	Template Library) and other container
-0.293617	Do not use one container
-0.313982	a collection of example container
-0.313801	useful source of such container
-0.230651	for discussion of efficient container
-0.352867	replaced by more efficient container
-0.230651	checking and various efficient container
-0.291395	classes. Unfortunately, many standard container
-0.327942	may write your own container
-0.233742	using templates. Ready made container
-0.288587	stored in an STL container
-0.324362	in STL for accessing container
-0.231328	manual at www.agner.org/optimize/cppexamples.zip containing container
-0.212229	replace arrays by well-tested container
-0.123076	function. This has the advantage
-0.336134	instruction set gives the advantage
-0.350704	return list[x]; } The advantage
-0.497578	by the program. The advantage
-0.442065	same level-1 cache. The advantage
-0.311653	modern C++ compilers. The advantage
-0.533717	set is enabled. The advantage
-0.311653	address calculation faster. The advantage
-0.235325	number of iterations. The advantage
-0.235325	the variable m. The advantage
-0.237717	same logical register. This advantage
-0.518667	case there is an advantage
-0.212715	it can be an advantage
-0.212715	This can be an advantage
-0.669043	processor is not an advantage
-0.308378	This is only an advantage
-0.288151	library, SSE4.1 gives an advantage
-1.221325	then there is no advantage
-0.237135	there is no such advantage
-0.335403	one version that takes advantage
-0.272316	In order to take advantage
-0.272316	shows how to take advantage
-0.175209	applications that can take advantage
-0.175209	if you can take advantage
-0.175209	7 program can take advantage
-0.078828	unsigned You can take advantage
-0.078828	a. You can take advantage
-0.240882	cache. We can take advantage
-0.236097	was hardly any speed advantage
-0.347837	there is a specific advantage
-0.309212	instruction set. The main advantage
-0.231866	do to take maximum advantage
-0.323735	future. Typically, the full advantage
-0.212257	to 15.1c? We took advantage
-0.286835	// instrset_detect function // Function
-0.036235	_mm_loadu_si128((__m128i const*)p); } // Function
-0.075770	_mm_load_si128((__m128i const*)p); } // Function
-0.373801	SSE2 intrinsic functions // Function
-0.231418	Define vector classes // Function
-0.529255	1.0; } }; // Function
-0.286835	vectorized with SSE4.1 // Function
-0.286835	int cc[size] ); // Function
-0.231418	parm1, int parm2); // Function
-0.231418	return _mm_loadu_si128((__m128i const*)p);} // Function
-0.231418	prototype CriticalFunctionType CriticalFunction_Dispatch; // Function
-0.634863	linking and position-independent code Function
-0.340463	Static versus dynamic libraries Function
-0.236515	the table. Optimization method Function
-0.292301	to the inlined function. Function
-0.437047	respect. 7.15 Function parameters Function
-0.471507	have a non-inlined copy Function
-0.275043	rather than in memory. Function
-0.275043	scattered around in memory. Function
-0.230734	to access these instructions. Function
-0.276512	too big. 7.14 Functions Function
-0.222294	including the profiler itself. Function
-0.355960	constructors and overloaded operators. Function
-0.122619	and decrement operators. 7.7 Function
-0.122619	references ............................................................................................ 36 7.7 Function
-0.122619	parameters ............................................................................................... 50 7.16 Function
-0.122619	and operating systems". 7.16 Function
-0.074750	Functions ................................................................................................................ 48 7.15 Function
-0.074750	in this respect. 7.15 Function
-0.199953	you are using __fastcall. Function
-0.165097	instead: // Example 12.6. Function
-0.165097	/vms Fastcall functions /Gr Function
-0.165097	programmer to know about. Function
-0.165097	__intel_new_strlen in library libircmt.lib. Function
-0.535506	of the code to support
-0.237831	the operating system for support
-0.329769	only for compilers that support
-0.048277	generation of processors that support
-0.237191	all unknown processors that support
-0.292284	with all CPUs that support
-0.297643	systems that do not support
-0.297643	languages that do not support
-0.323453	32-bit Windows. Does not support
-0.345633	from compilers that have support
-0.549911	order. Some compilers have support
-0.237548	brand. Future processors will support
-0.440912	64-bit instruction set has support
-0.235774	the operating system has support
-0.656745	it takes to make support
-0.293157	Codeplay compiler has some support
-0.236665	instead. The Gnu libraries support
-0.045448	is compiled with AVX support
-0.045448	code compiled with AVX support
-0.293439	is compiled without AVX support
-0.014428	the CPU has hardware support
-0.060832	the microprocessor has hardware support
-0.347301	its possible exception handling support
-0.234739	excuse that "we don't support
-0.290265	operating systems need better support
-0.234131	C library. It requires support
-0.136521	recommended to turn off support
-0.231358	This method requires OS support
-0.228052	off debugging and profiling support
-0.226710	version with full debugging support
-0.199942	do not have inherent support
-0.165086	processors). It has excellent support
-0.462219	to check for the supported
-0.501476	This instruction set is supported
-0.356394	which instruction set is supported
-0.642447	SSE2 instruction set is supported
-0.356394	32 instruction set is supported
-0.341878	AVX instruction set is supported
-0.339679	several reasons. C++ is supported
-0.102243	only if AVX is supported
-0.102243	operating system. AVX is supported
-0.236218	only when AVX2 is supported
-0.489905	introduced in Linux and supported
-0.293868	is fully standardized and supported
-0.120347	the first processors that supported
-0.120347	The first processors that supported
-0.454090	instructions. Intrinsic functions are supported
-0.785434	if XMM registers are supported
-0.324161	Fortran. These directives are supported
-0.293259	are only available if supported
-0.237067	__restrict or __restrict__, if supported
-0.356078	feature information, such as supported
-0.354375	Intel processors are not supported
-0.236756	lowest instruction set not supported
-0.237464	but is currently only supported
-0.047703	4) { // SSE2 supported
-0.349976	the CPUID information about supported
-0.043342	11) { // AVX supported
-0.157785	parm2) { // Get supported
-0.102858	function version // Get supported
-0.199981	SSE2 is the minimum supported
-0.165122	cc[]) { // Detect supported
-0.408132	point register variables is eight
-0.815129	as a vector of eight
-0.331627	time in vectors of eight
-0.002864	the result vector in eight
-0.355976	of registers. There are eight
-0.421129	four double precision or eight
-0.060619	Roll out loop by eight
-0.429045	so you can have eight
-0.237414	four physical processors but eight
-0.237259	elements will go into eight
-0.342522	the diagonal. The first eight
-0.232133	and the 49 first eight
-0.236556	cache line. But these eight
-0.322453	four cores can run eight
-0.231324	that we can handle eight
-0.000278	8) { // Load eight
-0.000495	+ i); // Load eight
-0.004474	elements b.load(bb+i); // Load eight
-0.020178	floating point operations involves eight
-0.284263	of 8 bits each, eight
-0.199965	efficient because it handles eight
-0.165107	has to be reloaded eight
-0.502227	is done with the operators
-0.051472	Floating point variables and operators
-0.476001	7.2 Integers variables and operators
-0.293738	like adding vectors. The operators
-0.628879	one clock cycle. The operators
-0.314290	if they come from operators
-0.345976	the sense that all operators
-0.237421	0 or 1, but operators
-0.236027	malloc and free. These operators
-0.235104	produce undesired results. Integer operators
-0.083339	instead of the Boolean operators
-0.083339	operands of the Boolean operators
-0.079960	faster than the Boolean operators
-0.079960	difference between the Boolean operators
-0.236871	|, ~. The Boolean operators
-0.258194	classes and using overloaded operators
-0.206110	expression with multiple overloaded operators
-0.051248	to use the bitwise operators
-0.051248	operands. Nevertheless, the bitwise operators
-0.037858	when needed. The bitwise operators
-0.037858	at once The bitwise operators
-0.037858	and ||). The bitwise operators
-0.044143	can do with bitwise operators
-0.044143	trick of using bitwise operators
-0.010622	zero. 14.3 Use bitwise operators
-0.010622	134 14.3 Use bitwise operators
-0.044143	and the corresponding bitwise operators
-0.222324	said here about increment operators
-0.068028	overloaded functions. 7.27 Overloaded operators
-0.068028	.............................................................................................. 56 7.27 Overloaded operators
-0.122635	also applies to decrement operators
-0.165149	respectively. Increment and decrement operators
-0.165122	involving division and relational operators
-0.996952	is one of the few
-0.524443	member function is a few
-0.351267	function library for a few
-0.174146	that there are a few
-0.174146	But there are a few
-0.377103	etc. There are a few
-0.078297	has one or a few
-0.078297	only one or a few
-0.078297	just one or a few
-0.304083	write less than a few
-0.125954	is longer than a few
-0.125954	be longer than a few
-0.350187	I will make a few
-0.337699	it takes only a few
-0.328195	pointer typically takes a few
-0.289980	size then add a few
-0.234184	pointer is needed a few
-0.310294	cache in just a few
-0.327258	global arrays require a few
-0.234184	must wait until a few
-0.310294	instruction sets include a few
-0.289980	Repeating the break a few
-0.234184	add, etc. SSSE3 a few
-0.237857	admittedly very kludgy. The few
-0.421206	inside a loop with few
-0.237725	dispatcher should have as few
-0.292161	the hard disk. A few
-0.236102	a few lines. A few
-0.539410	generated from the same few
-0.237472	large libraries where only few
-0.237434	performance even matters, which few
-0.313586	AVX instructions have very few
-0.329180	software framework that uses few
-0.232683	the virtual table. Unfortunately, few
-0.352775	for improving code that contains
-0.420013	processors, a loop that contains
-0.381602	considered a container that contains
-0.496542	obvious and the code contains
-0.544515	vectors if the code contains
-1.395097	part of the program contains
-0.160278	loop of a program contains
-0.442151	ways). If a program contains
-0.456747	obtained. If a loop contains
-0.237407	induction variable (eax) which contains
-0.327176	C++ compiler. This library contains
-0.099352	AMD math core library contains
-0.099352	AMD Math core library contains
-0.228392	"Intel Performance Primitives" library contains
-0.300536	But if the software contains
-0.300536	coded. If the software contains
-0.287407	processes because it often contains
-0.231921	operators. Vectorized code often contains
-0.338459	possible if the expression contains
-0.473089	features: The code section contains
-0.231812	code you are testing contains
-0.230725	points to. Now ebx contains
-0.190978	the stack). ecx now contains
-0.190978	The loop body now contains
-0.228086	array address is. ecx contains
-0.226691	example, the Boost collection contains
-0.279392	array a and edx contains
-0.534426	this manual at www.agner.org/optimize/cppexamples.zip contains
-0.466001	"Intel Math Kernel Library" contains
-0.165086	function library at www.agner.org/optimize/asmlib.zip contains
-0.165086	linking. The file http://www.agner.org/optimize/asmlib.zip contains
-0.165086	first generation class (CGrandParent) contains
-0.165086	second generation class (CParent<>) contains
-0.255754	same time regardless of whether
-0.255754	in registers, regardless of whether
-0.294133	the final program and whether
-0.349042	the truth depends on whether
-0.237113	code will be efficient whether
-0.236538	cannot know for sure whether
-0.536036	want to find out whether
-0.236225	can be made about whether
-0.343848	you want to check whether
-0.235170	p->member is equally fast whether
-0.300163	C++ compilers to see whether
-0.300163	virtual table to see whether
-0.439314	it makes no difference whether
-0.377331	8.1. The table shows whether
-0.336471	CPU dispatcher to know whether
-0.231820	it is not clear whether
-0.619199	is difficult to predict whether
-0.558901	then you may consider whether
-0.268266	code, you may consider whether
-0.171875	so-called CPU-dispatcher that checks whether
-0.219785	derived class, it checks whether
-0.171875	The CPU dispatcher checks whether
-0.226654	is known at compile-time whether
-0.315685	the compiler to evaluate whether
-0.010815	fine-grained parallelism when deciding whether
-0.001188	into account when deciding whether
-0.010815	the disadvantages when deciding whether
-0.284216	in order to determine whether
-0.199925	2 GB. When considering whether
-0.199925	the first operand determines whether
-0.199925	each iteration it decides whether
-0.165071	able to predict correctly whether
-0.062782	0; i < 100; i++)
-0.233565	0; i < 2; i++)
-0.011732	0; i < size; i++)
-0.195981	0; i < n; i++)
-0.246792	2; i <= n; i++)
-1.257688	0; i < 256; i++)
-0.222347	0; i < 1000; i++)
-0.212257	0; for (i=0; i<100; i++)
-0.466083	0; i < 20; i++)
-0.330763	example, for (i=0; i<n; i++)
-0.008671	0; i < rows; i++)
-0.008671	0; i < NumberOfTests; i++)
-0.165128	; i < arraysize; i++)
-0.165128	0; i < ArraySize; i++)
-0.165128	0; i < list.Size(); i++)
-0.462909	new element in the list
-0.587822	here is that the list
-0.357840	very inefficient if the list
-0.352670	be put into the list
-0.456312	And here is a list
-0.353161	capabilities. Here is a list
-0.461289	12.8a. Sum of a list
-0.353815	page 3 for a list
-0.354666	array initialized by a list
-0.330768	the market. Such a list
-0.820871	of the beginning of list
-0.294211	extra dummy element to list
-1.246588	number of elements in list
-0.237554	this instruction set?". A list
-0.347592	sum of a long list
-0.353302	at Wikibooks. The following list
-0.213360	form of a linked list
-0.151446	element in a linked list
-0.151446	not use a linked list
-0.151446	Walking through a linked list
-0.074774	next block. A linked list
-0.074774	linked lists. A linked list
-0.106864	Alternatively, use a negative list
-0.106864	to make a negative list
-0.106864	software contains a negative list
-0.140151	is that a positive list
-0.140151	to make a positive list
-0.140151	software contains a positive list
-0.336965	by copying the entire list
-0.434182	can use a linear list
-0.329369	for even the smallest list
-0.015537	may use a sorted list
-0.015537	small then a sorted list
-0.015537	simplicity. But a sorted list
-0.294131	catch programming errors that would
-0.492048	mechanism even when it would
-0.352687	logarithm again, but it would
-0.235810	before multiplying them. This would
-0.235810	interfaces from scratch. This would
-0.235810	log(b[i]) + log(c[i]);. This would
-0.502515	module then the compiler would
-0.527818	then an optimizing compiler would
-0.314434	than the time you would
-0.267985	= 0 because this would
-0.267985	time-consuming tasks because this would
-0.347809	the value. The loop would
-0.237397	< 5) {} which would
-0.344012	and no induction variable would
-0.338655	odd number then we would
-0.527939	before the cache line would
-0.322684	64-bit mode, the parameters would
-0.235656	design. The ultimate solution would
-0.615580	2 then the multiplication would
-0.234855	seconds. A safer implementation would
-0.287849	b, c and d would
-0.524610	of the two loops would
-0.229968	example 15.1b to metaprogramming would
-0.326991	loop- carried dependency chain would
-0.190977	function libraries, but who would
-0.190977	single precision. And who would
-0.212219	examples where the reduction would
-0.165093	that a particular reduction would
-0.296121	example 15.1a to 15.1c would
-0.218455	even if they otherwise would
-0.569045	from 0x2700 to 0x273F would
-0.284209	Without static, the logarithm would
-0.199920	of double, then sizeof(S1) would
-0.358538	need updating in the likely
-0.234371	you optimized for is likely
-0.575657	manner then it is likely
-0.451993	the program, it is likely
-0.349752	or bottleneck, it is likely
-0.560274	then the code is likely
-0.442565	that your code is likely
-0.490923	solution. The compiler is likely
-0.452731	0x80000000; because this is likely
-0.350335	the test program is likely
-0.351633	garbage collector which is likely
-0.424008	the target address is likely
-0.310516	the end user is likely
-0.449389	consider which method is likely
-0.310516	resources. The system is likely
-0.234371	to 1024 bits is likely
-0.474509	if the problem is likely
-0.415977	particular CPU model is likely
-0.290192	a different platform is likely
-0.336268	the speed here is likely
-0.234371	a particular brand is likely
-0.234371	The integer comparison is likely
-0.234371	above, page 87) is likely
-0.349058	because static data are likely
-0.355402	The compiler is more likely
-0.237327	dispatcher function will most likely
-0.495809	performance, it is also likely
-0.444503	program, it is very likely
-0.444503	stack, which is very likely
-0.340809	executing instructions are less likely
-0.342899	local variables and therefore likely
-0.235288	CPU model, which quite likely
-0.322186	evaluate and are equally likely
-0.530450	the beginning of the structure
-0.358235	inserted UnusedFiller in the structure
-0.345903	code for making the structure
-0.324873	code will load the structure
-0.382270	reordering has made the structure
-0.357229	A union is a structure
-0.353808	unused bytes in a structure
-0.353808	come last in a structure
-0.459367	type such as a structure
-0.237013	doubt how big a structure
-0.293198	you may define a structure
-0.466949	to an array of structure
-0.314273	applies to arrays of structure
-0.293777	automatically. The alignment of structure
-0.195433	of the class or structure
-0.117688	to the class or structure
-0.040489	of a class or structure
-0.085144	in a class or structure
-0.407849	for each thread. This structure
-0.237550	of 4 floats A structure
-0.291868	database, or other data structure
-0.235845	thread its own data structure
-0.237506	a more clear program structure
-1.028014	members of the same structure
-0.433084	understanding of the whole structure
-0.306930	good for the logical structure
-0.230745	code has a parallel structure
-0.284302	situation where the logic structure
-0.276578	solution. Is a multidimensional structure
-0.291685	executed. However, the pipeline structure
-0.218514	object of a class, structure
-0.357977	even though it is doing
-0.584873	unless the function is doing
-0.382362	If the microprocessor is doing
-0.347295	but efficient, way of doing
-0.255238	A newer method of doing
-0.255238	The original method of doing
-0.174800	are different ways of doing
-0.264679	several different ways of doing
-0.422533	are smarter ways of doing
-0.237859	of register renaming and doing
-0.499001	Threads are used for doing
-0.293298	such as C++ for doing
-0.505860	operations are available for doing
-0.561444	that if you are doing
-0.453372	These two functions are doing
-0.330084	that different threads are doing
-0.236464	Sum2 and Sum3 are doing
-0.237752	can be accomplished by doing
-0.522091	as you are not doing
-0.341618	and other resources than doing
-0.293878	will never spend time doing
-0.236185	the default size when doing
-0.323809	operations are useful when doing
-0.540776	prevent the compiler from doing
-0.830523	prevents the compiler from doing
-0.141541	prevent the CPU from doing
-0.224120	prevents the CPU from doing
-0.237529	that are best at doing
-0.356048	to help the CPU doing
-0.903707	in the innermost loop doing
-0.234121	1% goes to actually doing
-0.732219	they are in fact doing
-0.251279	the program is busy doing
-0.458966	simple alternative is to run
-0.857470	code is likely to run
-0.813891	microprocessors are able to run
-0.236742	expect 64-bit programs to run
-0.536540	which processor models to run
-0.323943	hyperthreading, then try to run
-0.313343	users will prefer to run
-0.444265	the CPU dispatching and run
-0.293221	into multiple threads that run
-0.237034	services. Many services that run
-0.237034	useful on servers that run
-0.405572	AVX instruction set can run
-0.236174	the critical part can run
-0.236174	with four cores can run
-0.236174	the x86 family can run
-0.291413	the compiled code may run
-0.291413	many function calls may run
-0.235444	contrary, each thread may run
-0.353642	or __debugbreak();. If you run
-0.120841	on, then it will run
-0.272948	references. Therefore, it will run
-0.477030	Now the code will run
-0.307218	then each thread will run
-0.314331	Each thread can then run
-0.237531	is loaded or at run
-0.237462	at all. Can only run
-0.293619	The update process should run
-0.350668	time consumption of each run
-0.419367	you make a test run
-0.313022	another thread will always run
-0.234461	This will make applications run
-0.320223	instruction set can still run
-0.343379	write 2.0/3.0 than to calculate
-0.352596	we will have to calculate
-0.731293	take more time to calculate
-1.542815	it is possible to calculate
-0.775688	time it takes to calculate
-0.322214	need induction variables to calculate
-0.589767	element in order to calculate
-0.513150	operand is faster to calculate
-1.007417	that you want to calculate
-1.174507	compilers are able to calculate
-1.347210	it is recommended to calculate
-0.290279	a typical application to calculate
-0.321139	CPU will start to calculate
-0.234447	you don't care to calculate
-0.234447	Linux The procedure to calculate
-0.416092	be more convenient to calculate
-0.417430	it is safer to calculate
-0.341463	inline this function and calculate
-0.237623	to reload *p and calculate
-0.291009	accessed consecutively and can calculate
-0.520577	order and it can calculate
-0.446537	bytes, so we can calculate
-0.350156	into account. You can calculate
-0.311372	processors. Many processors can calculate
-0.346824	be used. We can calculate
-0.550813	then the compiler may calculate
-0.541908	1.2345); The compiler will calculate
-0.323139	back. Thus, we will calculate
-0.237481	because it needs only calculate
-0.380503	list, the compiler must calculate
-0.229230	example 8.21, you could calculate
-0.358046	usually inlined if the inline
-1.263632	for the compiler to inline
-0.353091	is more likely to inline
-0.674301	compiler is able to inline
-0.969851	to be able to inline
-0.605938	it is optimal to inline
-0.349022	has excellent support for inline
-0.237719	7.34a. Use macro as inline
-0.340240	the code with an inline
-0.330981	this by using an inline
-0.555461	solution is to use inline
-0.503345	of using the same inline
-0.314216	by making critical functions inline
-0.001233	vector from array static inline
-0.001233	vector into array static inline
-0.195782	template <int N> static inline
-0.195782	141 #include <emmintrin.h> static inline
-0.195782	// Example 14.19 static inline
-0.195782	template <typename T> static inline
-0.195782	a cache line: static inline
-0.195782	{ return _mm_cvtss_si32(_mm_load_ss(&x));} static inline
-0.195782	the function add_horizontal) static inline
-0.330856	to. Therefore, it cannot inline
-0.346287	to dispatched function call inline
-0.236130	Use inline functions An inline
-0.236118	function, if possible. Use inline
-0.310594	allow assembly-like intrinsic functions, inline
-0.237698	a zip file of every
-0.331755	a backup copy of every
-0.293282	one memory block for every
-0.237087	between two expressions for every
-0.237087	again and again for every
-0.294137	programming textbooks recommend that every
-0.348942	at every function or every
-0.292093	it makes dispatching on every
-0.380233	uses the dispatch on every
-0.292093	different times: Dispatch on every
-0.615076	you can do this every
-0.322878	to be done at every
-0.235870	temporary debug breakpoints at every
-0.343071	certain events, for example every
-0.334881	each object are called every
-0.344809	The branching is done every
-0.322517	put into the list every
-0.322675	a series of branches every
-0.057634	a new memory block every
-0.480718	new floating point addition every
-0.091644	can have one addition every
-0.091644	have only one addition every
-0.336663	that must be loaded every
-0.397521	programs search for updates every
-0.230707	generate an interrupt, e.g. every
-0.283027	to make a misprediction every
-0.279387	macro parameters are evaluated every
-0.276465	needs to be updated every
-0.199937	int seconds; // incremented every
-0.165081	memory block is re-allocated every
-0.165081	logarithm would be re-calculated every
-0.358219	(called x86) of the standard
-0.353728	as alternatives to the standard
-0.353728	default, conform to the standard
-0.524262	is based on the standard
-0.350782	increasingly blurred as the standard
-0.355993	above table. If the standard
-0.338728	these purposes. Unfortunately, the standard
-0.237243	for most purposes the standard
-0.237243	exception handling. Omitting the standard
-0.357794	directives. OpenMP is a standard
-0.456337	is to have a standard
-0.023508	strict aliasing rule of standard
-0.313795	well-tested container classes. The standard
-0.237121	checked before storing. The standard
-0.237121	or "frame pointer". The standard
-0.237843	and data structures for standard
-0.336222	(Microsoft, Intel) know that standard
-0.237712	standard 754 (1985). This standard
-0.341618	less computing resources than standard
-0.350553	The programmer can use standard
-0.312756	rights. Software should use standard
-0.479195	function libraries for many standard
-0.233674	container classes. Unfortunately, many standard
-0.236351	code contains only simple standard
-0.236025	processing power. Connecting several standard
-0.233275	problem. The official C standard
-0.230768	The Intel compiler includes standard
-0.229182	compression Most compilers include standard
-0.122628	is that the C/C++ standard
-0.122628	is bad The C/C++ standard
-0.165112	according to the IEEE standard
-0.461753	extra cache for the hardware
-0.502482	details depend on the hardware
-0.374532	is faster than the hardware
-0.356836	cause problems when the hardware
-0.356145	a microprocessor because the hardware
-0.351231	e.g. C++, and a hardware
-0.140731	be coded in a hardware
-0.140731	are coded in a hardware
-0.350242	be programmed in a hardware
-0.532843	implementation rather than a hardware
-0.345389	sequential instructions, where a hardware
-0.455778	Today, the choice of hardware
-0.445512	platform The choice of hardware
-0.128383	platform 2.1 Choice of hardware
-0.128383	5 2.1 Choice of hardware
-0.346443	allows direct access to hardware
-0.346444	etc. are implemented in hardware
-0.237849	that connect them. The hardware
-0.805482	schemes are based on hardware
-0.136252	because the CPU has hardware
-0.472036	because the microprocessor has hardware
-0.253804	particular CPU or other hardware
-0.253804	hard disk or other hardware
-0.235686	correspond to any known hardware
-0.338615	relying on the microprocessor hardware
-0.223488	general improvements in microprocessor hardware
-0.290665	faster than the intrinsic hardware
-0.279449	hardware definition language defines hardware
-0.199981	self-styled hacks and direct hardware
-0.165122	portability issue to catching hardware
-0.836040	a clock cycle is 1
-0.293826	0 for positive and 1
-0.237565	0 for false and 1
-0.355592	of a will be 1
-0.422844	to be 0 or 1
-0.665128	value than 0 or 1
-0.224420	is always 0 or 1
-0.314606	have AND'ed b with 1
-0.935282	0, last byte at 1
-0.757356	c = b + 1
-0.476730	char, signed or unsigned 1
-0.468515	MMX long long 64 1
-0.232029	32 64 Iu32vec2 64 1
-0.236411	: 1; // always 1
-0.236437	set needed _mm_shuffle_epi8 16 1
-0.236464	16 SSSE3 _mm_perm_epi8 32 1
-0.236310	of security. b & 1
-0.225655	signed or unsigned 1 1
-0.225655	alignment, bytes bool 1 1
-0.232273	bytes alignment, bytes bool 1
-0.230715	OneOrTwo5[b!=0] as OneOrTwo5[(b!=0) ? 1
-0.367283	31 ebx, eax ebx, 1
-0.226665	[edx] DWORD PTR[ecx+eax*4],ebx eax, 1
-0.226642	( 1)sign 2exponent 127 1
-0.212235	Copyright notice .......................................................................................................... 164 1
-0.212183	4 ?Func2@@YAXQAHAAH@Z ENDP ecx, 1
-0.284202	Architecture Programmer’s Manual", Volume 1
-0.199914	value wrap around. Adding 1
-0.199914	generate -128, and subtracting 1
-0.165060	Last updated 2014-08-07. Contents 1
-0.165060	( 1)sign 2exponent 1023 1
-0.165060	misses in the level- 1
-0.023508	> b ? a :
-0.237372	142 unsigned int one :
-0.023262	= a ? b :
-0.043567	? c + 2 :
-0.235412	as OneOrTwo5[(b!=0) ? 1 :
-0.461818	0x3FFF unsigned int sign :
-0.008498	part unsigned int exponent :
-0.017166	normal unsigned int exponent :
-0.005290	{ unsigned int fraction :
-0.228119	? (cc[i] + 2) :
-0.082595	B2; 54 class D :
-0.082595	class B2; class D :
-0.173990	f(); }; class C1 :
-0.173990	"; Disp(); class C1 :
-0.068009	the declaration class CChild1 :
-0.068009	// versions: class CChild1 :
-0.212211	} }; class CChild2 :
-0.199942	<typename MyChild> class CParent :
-0.199942	} }; class C2 :
-0.199942	== 0) ? 1.0f :
-0.165086	%0 " : "=m"(n) :
-0.165086	= b ? 1.5f :
-0.165086	int x; public: c1() :
-0.165086	== EXCEPTION_FLT_OVERFLOW ? EXCEPTION_EXECUTE_HANDLER :
-0.165086	\n fistpl %0 " :
-0.165086	: "=m"(n) : "m"(x) :
-0.355067	we would have to add
-0.874282	is not possible to add
-1.726812	time it takes to add
-0.293319	common for software to add
-1.081591	is no reason to add
-0.237583	of the operands and add
-0.237583	constant = shift and add
-0.237841	for making plug-ins that add
-0.236039	// sum operator // add
-0.292090	n << 23; // add
-0.236039	*const_cast<int*>(&x) += 2;} // add
-0.236039	} return add_elements(s); // add
-0.407895	outside the loop or add
-0.353989	integer operations do not add
-0.337166	other modules. You may add
-0.337166	program itself. You may add
-0.236029	the vector size then add
-0.312492	any other module then add
-0.101996	by 2. The instruction add
-0.101996	array elements. The instruction add
-0.293214	to 100000000. When we add
-0.313094	vector. You may even add
-0.236346	The next two instructions add
-0.236172	divide by 2 ; add
-0.870054	a function that doesn't add
-0.235408	[eax+4], ecx 86 add add
-0.289882	the compiler may actually add
-0.231318	add sar add mov add
-0.265086	into a vector register, add
-0.199948	mov shr add sar add
-0.199948	mov $B1$2: mov shr add
-0.165091	PTR [eax+4], ecx 86 add
-0.721422	more efficient in 64-bit mode
-0.233902	when running in 64-bit mode
-0.047743	not needed in 64-bit mode
-0.233902	as follows in 64-bit mode
-0.233902	available, i.e. in 64-bit mode
-0.233902	much simpler in 64-bit mode
-0.303956	be 2 In 64-bit mode
-0.091255	library files. Use 64-bit mode
-0.091255	floating point. Use 64-bit mode
-0.206928	pointer or reference, 64-bit mode
-0.598183	register variables in 32-bit mode
-0.326986	calling method in 32-bit mode
-0.228910	pointer or reference, 32-bit mode
-0.328916	calculation in 64 bit mode
-0.328916	longer in 64 bit mode
-0.391356	references in 32 bit mode
-0.274815	features 80386 32 bit mode
-0.206125	not optimized for 16-bit mode
-0.206125	and 64-bit mode. 16-bit mode
-0.226765	the floating point rounding mode
-0.033301	inputs for a console mode
-0.033301	and use a console mode
-0.033301	output file. A console mode
-0.033301	user interface. A console mode
-0.042041	to set the flush-to-zero mode
-0.042041	from setting the flush-to-zero mode
-0.127753	Example 7.5. Set flush-to-zero mode
-0.068034	to switch to protected mode
-0.068034	of switching to protected mode
-0.074773	addition, set the denormals-are-zero mode
-0.074773	Set flush-to-zero and denormals-are-zero mode
-0.503026	intended because of a store
-0.354252	cache miss on a store
-0.429081	likely to generate a store
-0.353559	compromise safety is to store
-0.343828	memory block than to store
-0.578439	forces the compiler to store
-0.979390	then you have to store
-0.650362	is more efficient to store
-1.558578	It is possible to store
-0.987343	makes it possible to store
-0.290711	number of elements to store
-0.547328	www.agner.org/optimize/cppexamples.zip for how to store
-0.812039	without the need to store
-0.060135	} // Function to store
-0.213192	const*)p);} // Function to store
-0.732361	when deciding whether to store
-0.290711	efficient memory space to store
-0.313759	the same class and store
-0.237090	could calculate *p+2 and store
-0.237090	register containing (2,2,2,2), and store
-0.237090	constant vector (1,2,3,4), and store
-0.451003	x so we can store
-0.353224	jeopardizing safety, you may store
-0.323728	that the system may store
-0.544398	called. The compiler will store
-0.237545	fixed strides. Uncached memory store
-0.236217	r points to ; store
-0.230077	An optimizing compiler might store
-0.200009	pre-calculated table. Even better: store
-0.358115	to type in the values
-0.649063	will recognize that the values
-0.353493	C; Assuming that the values
-0.356952	depends only on the values
-0.498198	b++; will make the values
-0.342954	we can store the values
-0.407280	hand and insert the values
-0.237329	time and show the values
-0.293320	sign bit }; The values
-0.428271	function is called. The values
-0.313795	in an array. The values
-0.347121	many labels that have values
-0.063963	and b have other values
-0.063963	the operands have other values
-0.063963	variables might have other values
-0.203004	can have no other values
-0.203004	operands have no other values
-0.237332	addresses with different set values
-0.023231	operators for checking multiple values
-0.420196	Which of these two values
-0.925534	to calculate the table values
-0.416708	are replaced by their values
-0.234451	image data have three values
-0.232637	be initialized to desired values
-0.224867	example, then all five values
-0.222312	replaced by their actual values
-0.222312	been initialized to valid values
-0.222312	key? If the key values
-0.199970	vector, the four G values
-0.199970	with all the R values
-0.236629	loader. 2. Position-independent code. All
-0.352358	Choice of operating system All
-0.535321	Out of order execution All
-0.440037	of the application program. All
-0.234703	mutexes and message systems. All
-0.618654	8.5 Compiler optimization options All
-0.233181	the register stack are: All
-0.232545	avoiding any public variables. All
-0.231771	only simple standard operations. All
-0.323635	the library is needed. All
-0.276967	for several different purposes. All
-0.338971	array for multiple purposes. All
-0.229122	the user. Compatibility problems. All
-0.430227	platforms with big-endian storage. All
-0.313314	are accessed very fast. All
-0.226676	reductions they cannot do. All
-0.218432	under the best-case conditions. All
-0.212165	needs to be stored. All
-0.284182	incompatible or error prone. All
-0.199897	addresses that need relocation. All
-0.251197	implement this "override" feature. All
-0.199897	listed in table 9.2. All
-0.165045	way microprocessors are constructed. All
-0.165045	of return prediction). 149 All
-0.165045	compiling in two steps. All
-0.165045	list (see page 93). All
-0.165045	with external libraries. www.agner.org/optimize/#vectorclass All
-0.165045	this problem: 1. Relocation. All
-0.165045	object file formats. Comments All
-0.165045	it is an integer). All
-0.165045	is called stack unwinding. All
-0.524416	gives access to the sign
-0.355713	funny things with the sign
-0.791849	In this example, the sign
-0.456640	example, to test the sign
-0.345116	with or without the sign
-0.124295	can shift out the sign
-0.124295	will shift out the sign
-0.345973	to care about the sign
-0.324234	above example sets the sign
-0.475782	We can change the sign
-0.023427	all bits except the sign
-0.419440	value by setting the sign
-0.292782	shr ebx,31 copies the sign
-0.236648	want to flip the sign
-0.236648	simply by inverting the sign
-0.237854	of the fraction. The sign
-0.237853	because various corrections for sign
-0.382690	sign : 1; // sign
-0.237747	is -0 (zero with sign
-0.311371	+ 0x3FF unsigned int sign
-0.311371	+ 0x3FFF unsigned int sign
-0.311371	+ 0x7F unsigned int sign
-0.210911	|= 0x80000000; // set sign
-0.210911	&= 0x7FFFFFFF; // set sign
-0.381026	0) { // test sign
-0.313652	a[i] and shift out sign
-0.228126	ebx ; shift down sign
-0.313458	|= 0x80000000; // Set sign
-0.199993	^= 0x80000000; // flip sign
-0.165133	pitfalls here: The inequality sign
-0.357970	the calls to the copy
-0.460192	loop will use the copy
-1.004804	unless there is a copy
-0.357632	The benefits of a copy
-0.349252	obviously takes time to copy
-1.072088	may be useful to copy
-0.237513	the data block to copy
-0.382032	bigger memory block and copy
-0.023479	function to transpose and copy
-0.293320	a hidden pointer. The copy
-0.313795	function return value. The copy
-0.237121	constructors and destructors. The copy
-0.547703	References are useful for copy
-0.237211	memset(a, 0, sizeof(a)); // copy
-0.237211	a[i] = 0.0; // copy
-0.292140	for improved performance. A copy
-0.236083	doesn't need initialization. A copy
-0.513338	if there are no copy
-0.625631	the object has no copy
-0.236278	obeyed. Copy protection. Some copy
-0.234889	hardware is updated. Most copy
-0.234131	and system breakdown. Many copy
-0.288643	from making an unused copy
-0.231321	the entire object. Any copy
-0.011600	functions have a non-inlined copy
-0.011600	to make a non-inlined copy
-0.011600	from making a non-inlined copy
-0.035772	another module. This non-inlined copy
-0.212240	for saving a backup copy
-0.199970	applies to default constructors, copy
-0.085930	1.1 The costs of optimizing
-0.443709	with the requirements of optimizing
-0.294154	optimizing for size and optimizing
-0.348899	library functions than in optimizing
-0.237620	invest more efforts in optimizing
-0.976402	can be useful for optimizing
-0.594454	languages are good for optimizing
-0.342819	best algorithm than by optimizing
-0.406805	lot to gain by optimizing
-0.289294	cannot assume that an optimizing
-0.233581	declared volatile then an optimizing
-0.233581	an issue because an optimizing
-0.233581	or C1::f. But an optimizing
-0.233581	in most cases, an optimizing
-0.407772	is more important than optimizing
-0.516323	take into account when optimizing
-0.237525	not very good at optimizing
-0.293746	by a variable because optimizing
-0.236918	offer the choice between optimizing
-0.215112	the first program. An optimizing
-0.215112	variable is used. An optimizing
-0.215112	8.1 below. Devirtualization An optimizing
-0.215112	for char pointers). An optimizing
-0.496635	one of the best optimizing
-0.334656	in performance. A good optimizing
-0.235331	accessed very fast. All optimizing
-0.233488	it has many advanced optimizing
-0.331582	// Volatile to prevent optimizing
-0.224855	do the best job optimizing
-0.165102	// Serialize // Prevent optimizing
-0.592256	static part of the memory.
-0.356594	randomly around in the memory.
-0.356594	stored contiguously in the memory.
-0.314282	64 consecutive bytes of memory.
-0.643461	the same piece of memory.
-0.293785	of copying blocks of memory.
-0.337516	in registers, not in memory.
-0.494406	registers rather than in memory.
-0.875252	near each other in memory.
-0.543569	elements are stored in memory.
-0.679613	for objects stored in memory.
-0.495190	and scattered around in memory.
-0.291765	will save temp in memory.
-0.291765	necessarily stored sequentially in memory.
-0.291765	the elements consecutively in memory.
-0.583807	other in the code memory.
-0.171471	scattered around in program memory.
-0.171471	modules contiguous in program memory.
-0.234526	of the library into memory.
-0.321236	program is loaded into memory.
-0.272667	be stored in static memory.
-0.181284	are stored in static memory.
-0.286373	more efficiently than static memory.
-0.573281	static memory to stack memory.
-0.272734	to align dynamically allocated memory.
-0.272734	of aligning dynamically allocated memory.
-0.233285	registers instead of main memory.
-0.139876	the use of RAM memory.
-0.139876	the speed of RAM memory.
-0.182252	intermediate results in RAM memory.
-0.222356	container, preferably with contiguous memory.
-0.582671	want to use the well
-0.356454	platform. However, with a well
-0.237865	use a systematic and well
-0.046950	then you may as well
-0.046950	but you may as well
-0.521339	of the program as well
-0.229054	common string functions as well
-0.229054	and 64-bit Linux as well
-0.229054	the .NET framework as well
-0.229054	applies to reading as well
-0.229054	for software users as well
-0.229054	double, bool, enum as well
-0.229054	are not yet as well
-0.229054	Server 2008 R2 as well
-0.500089	LIBM libraries are not well
-0.354786	value of a pointer well
-0.236935	computationally intensive may very well
-0.312001	loop depends on how well
-0.226949	generates to see how well
-0.226949	useful for checking how well
-0.229208	do not always work well
-0.416044	that it doesn't work well
-0.321677	function library that works well
-0.227623	make sure it works well
-0.235267	also be predicted quite well
-0.032785	Nested loops are predicted well
-0.197877	loop-branch is usually predicted well
-0.212281	all x86 platforms. Works well
-0.199981	information. They have worked well
-0.199981	STL are universal, flexible, well
-1.135218	to make sure the information
-0.343596	system may store the information
-0.325337	a valuable source of information
-0.336161	link pointers and for information
-0.237697	object is known. This information
-0.498594	The compiler doesn't have information
-0.341472	can then use this information
-0.331453	and 119 for more information
-0.235287	exception handler needs all information
-0.235287	F1 has saved all information
-0.344423	the compiler has no information
-0.236963	have to save some information
-0.236526	a disassembly, probably without information
-0.236204	type identification adds extra information
-0.327950	doesn't have the necessary information
-0.228792	give the 124 necessary information
-0.232569	consecutively in memory. No information
-0.323646	not give the full information
-0.230048	a lot of added information
-0.291373	based on the CPUID information
-0.184700	number. The only CPUID information
-0.226715	are testing contains debug information
-0.322635	because the stack unwinding information
-0.165120	template class which gets information
-0.165120	second generation class gets information
-0.218455	give the compiler additional information
-0.218494	efficient to store application-specific information
-0.199920	selected. Compiler has insufficient information
-0.165066	are advised to seek information
-0.165066	have to save recovery information
-0.165066	if it has incomplete information
-0.519008	extra code. It is simply
-0.353292	is costless. It is simply
-0.302369	pointer. The pointer is simply
-0.392855	member function pointer is simply
-0.355889	are used, there is simply
-0.709319	class or structure is simply
-0.102179	of the difference is simply
-0.102179	functions. The difference is simply
-0.405379	post-increment. The effect is simply
-0.236043	Enums An enum is simply
-0.236043	with sequential labels is simply
-0.236043	support processor X" is simply
-0.346367	optimize both functions and simply
-0.351627	own error-handling function that simply
-0.324889	an overloaded function are simply
-0.407328	called. The values are simply
-0.237752	can become imprecise or simply
-0.331662	no extra time. It simply
-0.237414	as additions. When used simply
-0.293477	a data member pointer simply
-1.039918	a floating point number simply
-0.347763	dispatcher signal an error simply
-0.236326	thousands of people. I simply
-0.834321	signed and unsigned integers simply
-0.344817	smaller size is done simply
-0.340144	An array is implemented simply
-0.534288	positive floating point numbers simply
-0.432899	object can be copied simply
-0.304298	check for CPU brand simply
-0.301378	counts. It is measured simply
-0.199959	improve the performance significantly simply
-0.366599	If the compiler is able
-0.366599	example, the compiler is able
-0.382382	cases the microprocessor is able
-0.340381	the code to be able
-0.340381	the compiler to be able
-0.340381	you want to be able
-0.090914	compiler may not be able
-0.279458	it might not be able
-0.341120	the compiler may be able
-0.341120	future compilers may be able
-0.441098	M processor may be able
-0.475180	application program will be able
-0.337406	page 103) will be able
-0.343388	the compiler would be able
-0.175853	reductions the compilers are able
-0.175853	Fortunately, all compilers are able
-0.175853	Most C++ compilers are able
-0.175853	time. Some compilers are able
-0.175853	Unfortunately, few compilers are able
-0.175853	enabled. Few compilers are able
-0.253091	of Intel microprocessors are able
-0.332918	mechanisms. Modern microprocessors are able
-0.457883	but they are not able
-0.236778	compiler is usually not able
-0.532775	compiler is not always able
-0.320805	Modern CPUs are actually able
-0.217580	see whether they were able
-0.217580	I have tested were able
-0.287860	Newer processors are sometimes able
-0.587939	only if it is certain
-0.293820	b & 1 is certain
-0.237560	declared with #define is certain
-0.352304	to see if a certain
-0.352543	is lower than a certain
-0.324914	integer is within a certain
-0.237891	necessary to adhere to certain
-0.237831	optimally. The speed for certain
-1.404735	to make sure that certain
-0.237432	increase the likelihood that certain
-0.353503	thread. You cannot be certain
-0.550879	u; If you are certain
-0.343864	not optimal. There are certain
-0.343864	"Instruction tables". There are certain
-0.342999	automatically but only if certain
-0.237057	calculations in parallel if certain
-0.237721	and zero flags on certain
-0.357551	repeat count is not certain
-0.382419	SSE2 instruction sets have certain
-0.237518	to generate interrupts at certain
-0.346964	new instructions can make certain
-0.293547	there may be no certain
-0.559154	executed. It is therefore certain
-0.291223	set up to count certain
-0.338428	but it is quite certain
-0.235128	the CPUID instruction was certain
-0.310292	but unfortunately it prevents certain
-0.178603	then it is almost certain
-0.178603	a list is almost certain
-0.251247	code has to obey certain
-0.165086	is necessary to query certain
-0.428903	fast that the clock cycles
-0.147364	time unit is clock cycles
-0.147364	it uses more clock cycles
-0.241653	am using CPU clock cycles
-0.280043	accessed approximately two clock cycles
-0.147364	0 - 2 clock cycles
-0.192503	3 - 4 clock cycles
-0.192503	microprocessor wastes several clock cycles
-0.029351	is a few clock cycles
-0.029351	only a few clock cycles
-0.029351	takes a few clock cycles
-0.029351	needed a few clock cycles
-0.029351	until a few clock cycles
-0.046870	kludgy. The few clock cycles
-0.118447	something takes 10 clock cycles
-0.118447	still take 10 clock cycles
-0.248064	cycles. The core clock cycles
-0.181376	table are core clock cycles
-0.241243	3 - 5 clock cycles
-0.449716	10 - 20 clock cycles
-0.147364	2 and 15 clock cycles
-0.261066	(27 - 80 clock cycles
-0.067461	than a hundred clock cycles
-0.067461	but several hundred clock cycles
-0.147364	multiplication takes 11 clock cycles
-0.147364	code took 50 clock cycles
-0.015909	different size matrices, clock cycles
-0.147364	takes only 2-3 clock cycles
-0.147364	counter is counting clock cycles
-0.463182	int a[size], b[size]; // ...
-0.237209	DynamicArray[i] = WhateverFunction(i); // ...
-0.347804	< 1000; i++) { ...
-0.230366	int)i < 10) { ...
-0.230366	list[i] > 1.0) { ...
-0.230366	&& WriteFile(handle, ...)) { ...
-0.230366	i <= max) { ...
-0.230366	} catch (...) { ...
-0.230366	int)(max - min)) { ...
-0.315878	= 110; int i; ...
-0.315878	Sab ab[size]; int i; ...
-0.315878	float list[16]; int i; ...
-0.214084	int a[size], b[size], i; ...
-0.437673	int a, b, c; ...
-0.349685	class C1 { public: ...
-0.234132	F1() { C1 x; ...
-0.314729	{ S1 x, y; ...
-0.184739	a = CriticalFunction(b, c); ...
-0.184739	a = (*CriticalFunction)(b, c); ...
-0.212234	int order(int x); 136 ...
-0.403754	list[size]; int i, j; ...
-0.199965	#include <asmlib.h> void CriticalFunction(); ...
-0.466042	362880, 3628800, 39916800, 479001600}; ...
-0.199965	int i; float list[size]; ...
-0.165107	= 1000; int List[ArraySize]; ...
-0.165107	int a = Func1(2); ...
-0.165107	double log2 = log(2.0); ...
-0.165107	a[0], b[0], a[1], b[1], ...
-0.165107	... a = FactorialTable[b]; ...
-0.657247	can see that the addresses
-0.356137	only occurs because the addresses
-0.565971	variables to calculate the addresses
-0.382262	way to control the addresses
-0.331366	map file includes the addresses
-0.498237	used for calculating the addresses
-0.237701	vectors requires alignment to addresses
-0.237701	of data structures to addresses
-0.343636	Relocation. All pointers and addresses
-0.325174	may contain pointers or addresses
-0.237614	and destination both have addresses
-0.293777	and later reads from addresses
-0.349177	same range of memory addresses
-0.235898	loaded at round memory addresses
-0.293703	had read from different addresses
-0.237142	will use full 64-bit addresses
-0.309051	for storing function return addresses
-0.233141	execution by causing return addresses
-0.236547	attempts to translate these addresses
-0.236070	needs only calculate element addresses
-0.236004	0x3F00 and 0x4700. These addresses
-0.235667	the profiler itself. Function addresses
-0.235326	2. Position-independent code. All addresses
-0.246247	is smaller because relative addresses
-0.195497	This will generate relative addresses
-0.246247	for calculating self- relative addresses
-0.230017	needed for calculating row addresses
-0.226718	section contains no absolute addresses
-0.224869	procedure to calculate self-relative addresses
-0.218490	data members to round addresses
-0.571880	counter. This is a counter
-0.353729	monitor counter is a counter
-0.576196	needs a floating point counter
-0.466110	polynomial of the loop counter
-0.308129	calculations and the loop counter
-0.500811	not if the loop counter
-0.308129	counter, comparing the loop counter
-0.308129	to increment the loop counter
-0.102835	function of a loop counter
-0.237832	Division of a loop counter
-0.322690	changed freely. The loop counter
-0.214345	induction variable as loop counter
-0.295910	calculation time. A loop counter
-0.214345	0; // Initialize loop counter
-0.214345	log(c[i]); // Increment loop counter
-0.352622	by adding an integer counter
-0.379422	itself. You may add counter
-0.454917	The core clock cycles counter
-0.049850	// Table // Loop counter
-0.288613	counters. A performance monitor counter
-0.288613	particularly useful performance monitor counter
-0.113035	the core clock cycle counter
-0.113035	The core clock cycle counter
-0.027050	of the time stamp counter
-0.055898	with the time stamp counter
-0.069455	__rdtsc()). The time stamp counter
-0.069455	// Returns time stamp counter
-0.540395	If any of the shared
-0.504049	global variable in the shared
-0.480215	when called from the shared
-0.341066	when accessed from the shared
-0.325018	If we compile the shared
-0.483123	class and store the shared
-0.356426	A variable that is shared
-0.571247	not part of a shared
-0.497362	a function in a shared
-0.447975	but not in a shared
-0.447975	public variable in a shared
-0.340463	you are making a shared
-0.236835	possible to compile a shared
-0.314679	arbitrary memory address and shared
-0.552579	code is used in shared
-0.358381	are read-only can be shared
-0.502534	for variables that are shared
-0.237763	(dynamically linked libraries or shared
-0.357565	data section is not shared
-0.341546	by default, even when shared
-0.236087	of static libraries. A shared
-0.236087	mechanisms explained above. A shared
-1.119825	is possible to make shared
-0.462109	from within the same shared
-0.332234	variables in a 64-bit shared
-0.234031	for speeding up 64-bit shared
-0.043492	link libraries, also called shared
-0.495733	for a very large shared
-0.453179	each function call to count
-0.347291	be set up to count
-0.336283	add counter variables that count
-0.237800	zero than making it count
-0.472181	value of the loop count
-0.377381	only if the loop count
-0.377381	loops if the loop count
-0.312340	best when the loop count
-0.312340	constant. If the loop count
-0.337583	to make a loop count
-0.330915	AVX. 5. The loop count
-0.096935	above, the maximum loop count
-0.096935	same. The maximum loop count
-0.346887	to measure the clock count
-0.299908	as possible. The first count
-0.299908	following way. The first count
-0.342903	be non-zero, and therefore count
-0.234783	final destination, but don't count
-0.009926	fact that the repeat count
-0.004935	problem if the repeat count
-0.004935	well if the repeat count
-0.009926	problem when the repeat count
-0.009926	code). If the repeat count
-0.052066	with a high repeat count
-0.025253	near the maximum repeat count
-0.025253	the worst-case maximum repeat count
-0.052066	a very low repeat count
-0.052066	if the typical repeat count
-0.052066	small and fixed repeat count
-0.230071	do nothing while seconds count
-0.748804	each part of the program.
-1.288405	critical part of the program.
-0.518101	small part of the program.
-1.240296	critical parts of the program.
-0.350226	essential task of the program.
-1.185418	the beginning of the program.
-1.015690	the rest of the program.
-0.356059	one place in the program.
-0.356059	biggest time-consumer in the program.
-0.722627	be modified by the program.
-0.236991	disabled will crash the program.
-0.236991	operation that crashes the program.
-0.343194	overall performance of a program.
-0.632313	critical part of a program.
-0.343194	the development of a program.
-0.324737	that make up a program.
-0.458731	before you start to program.
-0.313814	declared in a C++ program.
-0.497733	output of the first program.
-0.436148	parts of a big program.
-0.419547	use a console mode program.
-0.320043	version of the application program.
-0.242351	not by the application program.
-0.242351	and market the application program.
-0.485667	loop in the main program.
-0.433094	optimizations of the whole program.
-0.501095	arrays in the final program.
-0.228136	the original, poorly designed program.
-0.272246	functionality to an existing program.
-0.555724	software, but it is quite
-0.356734	64 kbytes. This is quite
-0.353575	dispatch strategies It is quite
-0.353575	to a*b*c*2. It is quite
-0.354064	for everything, which is quite
-0.340319	Development in C++ is quite
-0.236722	to these problems is quite
-0.236722	12.4b and 12.4c is quite
-0.580247	modified. This can be quite
-0.453321	intrinsic functions can be quite
-0.544933	instructions which can be quite
-0.350801	programming languages can be quite
-1.126634	but it may be quite
-0.237820	each time slice are quite
-0.357563	C++ but is not quite
-0.331521	cached. This can have quite
-0.237421	newest CPU model, which quite
-0.237417	safe and flexible, but quite
-0.352313	versatile. Fortran is also quite
-0.308812	RAM memory can take quite
-0.308812	multiplications, which can take quite
-0.501955	where it is accessed quite
-0.500086	Sometimes the compiler does quite
-0.309032	0.666666666666666666667; This is actually quite
-0.302856	power consumption are actually quite
-0.336682	can also be predicted quite
-0.230768	Reducible expressions also occur quite
-0.230064	moved, which may happen quite
-0.224867	filled up, which happens quite
-0.296184	make software that runs quite
-1.604749	SSE2 instruction set is used.
-0.486578	before the pointer is used.
-0.757131	which the variable is used.
-0.427635	point register stack is used.
-0.405899	long double precision is used.
-0.236395	specific graphics framework is used.
-0.228332	if static linking is used.
-0.228332	when static linking is used.
-0.236395	#pragma vector nontemporal is used.
-0.236395	explicitly when alloca is used.
-0.355674	the set can be used.
-0.500470	simple array can be used.
-0.355903	file formats should be used.
-0.150191	point stack registers are used.
-0.163169	the XMM registers are used.
-0.275559	if XMM registers are used.
-0.319250	every time they are used.
-0.290830	in which they are used.
-0.500113	shared objects are not used.
-0.859629	the amount of memory used.
-0.324215	the type of registers used.
-0.245046	libraries it is never used.
-0.245046	the program is never used.
-0.162488	object is no longer used.
-0.226120	they are no longer used.
-0.327230	the program is actually used.
-0.218564	this feature is seldom used.
-0.337591	server. Use large data files
-0.235837	reading and writing data files
-0.237495	cache space or make files
-0.047973	program that scans all files
-0.047973	scanner that scans all files
-0.237181	if the necessary library files
-0.347519	declaration and the object files
-0.330564	release version of object files
-0.231058	commercial compilers. Mixing object files
-0.237053	complex framework requiring many files
-0.236017	necessary to load several files
-0.311617	file format. The intermediate files
-0.209175	kept in different source files
-0.209175	to join all source files
-0.209175	two steps. All source files
-0.230750	use more time loading files
-0.230750	of modules or resource files
-0.284353	names of the header files
-0.200024	defined in Intel header files
-0.165146	whether to store help files
-0.074770	files, resource files, help files
-0.074770	files, configuration files, help files
-0.226706	Open database connections. Open files
-0.129449	of compiling multiple .cpp files
-0.129449	for combining multiple .cpp files
-0.148717	read and write configuration files
-0.148717	of several drivers, configuration files
-0.218553	(Gnu) Table 12.2. Header files
-0.212229	a different compiler. Object files
-0.165102	and network connections. Temporary files
-0.460801	compilers then it is recommended
-0.460801	available then it is recommended
-0.460801	segment then it is recommended
-0.524586	calculated. Therefore, it is recommended
-0.337679	is used, it is recommended
-0.337679	between platforms, it is recommended
-0.436772	team projects, it is recommended
-0.400039	unreferenced functions. It is recommended
-0.454450	is used. It is recommended
-0.308192	be called. It is recommended
-0.400039	'this' pointer. It is recommended
-0.308192	function calls. It is recommended
-0.400039	alignment problems. It is recommended
-0.308192	data members. It is recommended
-0.308192	exception handling. It is recommended
-0.308192	dynamic versions. It is recommended
-0.308192	doing divisions. It is recommended
-0.308192	error message. It is recommended
-0.308192	page 54. It is recommended
-0.308192	page 61. It is recommended
-0.308192	many decimals. It is recommended
-0.395184	} It is not recommended
-0.395184	libraries It is not recommended
-0.395184	programs. It is not recommended
-0.351347	network resources are not recommended
-0.415387	dependent and therefore not recommended
-0.495831	cases it is also recommended
-0.420794	efficiently. It is therefore recommended
-0.420794	correctness. It is therefore recommended
-0.165174	_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON); It is strongly recommended
-0.357521	temporary object for the intermediate
-0.357196	possible overflow on the intermediate
-0.343289	have to store the intermediate
-0.293857	interpreting or compiling the intermediate
-0.237592	interpreter which interprets the intermediate
-0.449098	itself. Another disadvantage of intermediate
-0.352391	directly compiled code and intermediate
-0.237605	or double precision, and intermediate
-0.293329	object file format. The intermediate
-0.237128	the second step. The intermediate
-0.237128	that is distributed. The intermediate
-0.237851	of temporary objects for intermediate
-0.325213	you must consider if intermediate
-0.320861	A language based on intermediate
-0.320861	level framework based on intermediate
-0.346118	just-in-time compilation of an intermediate
-0.340368	first compiled to an intermediate
-0.332496	is implemented with an intermediate
-0.324583	programming languages use an intermediate
-0.230548	of Pascal used an intermediate
-0.233505	disadvantage of using an intermediate
-0.233505	reason for using an intermediate
-0.322757	is compiled into an intermediate
-0.236867	+ d; // makes intermediate
-0.291440	backup copy of every intermediate
-0.399311	it possible to store intermediate
-0.307604	the need to store intermediate
-0.235357	is an integer). All intermediate
-0.281501	innermost loop by storing intermediate
-0.212257	on big runtime frameworks, intermediate
-1.142605	when the code is fast
-0.237568	in character arrays is fast
-0.237568	a linear search, is fast
-0.334329	well-structured code and for fast
-0.236360	Convert to unsigned for fast
-0.337481	kind of instructions for fast
-0.292455	specify the options for fast
-0.236360	Loops: A sourcebook for fast
-0.357376	network access may be fast
-0.446088	size. Integer operations are fast
-0.328517	member function is as fast
-0.289040	operator i++ are as fast
-0.376496	operations are therefore as fast
-0.120905	may be just as fast
-0.120905	calculations are just as fast
-0.120905	a vector just as fast
-0.325127	The compilers also have fast
-0.326102	Modern CPUs are so fast
-0.233251	market is developing so fast
-0.349766	of constants is very fast
-0.345422	counter, which is calculated fast
-0.235257	software that runs quite fast
-0.324882	Vector operations are particularly fast
-0.597453	is recommended to enable fast
-0.229194	minimum, maximum, saturated addition, fast
-0.283077	or p->member is equally fast
-0.309471	can do the job fast
-0.218533	written. This worked sufficiently fast
-0.165112	addition, fast approximate reciprocal, fast
-0.358052	little overhead to the allocation
-0.314253	to the object. The allocation
-0.293758	of 64-bit integers. The allocation
-0.314573	priority. Especially the memory allocation
-0.065134	uses of dynamic memory allocation
-0.065134	cost of dynamic memory allocation
-0.065134	advantages of dynamic memory allocation
-0.065134	disadvantages of dynamic memory allocation
-0.197952	to use dynamic memory allocation
-0.119793	libraries use dynamic memory allocation
-0.197952	classes use dynamic memory allocation
-0.119793	Java, use dynamic memory allocation
-0.285772	and avoid dynamic memory allocation
-0.127512	purposes. All dynamic memory allocation
-0.127512	classes Whenever dynamic memory allocation
-0.018636	system code. Dynamic memory allocation
-0.018636	memory allocation Dynamic memory allocation
-0.018636	memory allocation. Dynamic memory allocation
-0.018636	caching inefficient. Dynamic memory allocation
-0.018636	systems). 28 Dynamic memory allocation
-0.018636	optimization are. Dynamic memory allocation
-0.018636	is limited. Dynamic memory allocation
-0.009218	memory. 9.6 Dynamic memory allocation
-0.009218	90 9.6 Dynamic memory allocation
-0.236785	compiler to optimize register allocation
-0.338724	The process of dynamic allocation
-0.304403	consumer if it involves allocation
-0.228142	in advance. The frequent allocation
-0.224929	compilation is finished. Register allocation
-0.276805	int cc[]) { for (int
-0.276805	__restrict bb) { for (int
-0.618058	sum = 0; for (int
-0.254237	b[size]; // ... for (int
-0.254237	int List[ArraySize]; ... for (int
-0.037658	the biggest vectors: for (int
-0.002260	the eight-element vectors: for (int
-0.233061	of 100 floats for (int
-0.233061	= 0, sum; for (int
-0.233061	nfac = 1.f; for (int
-0.233061	the name _alloca) for (int
-0.087045	Example 14.1b int factorial (int
-0.087045	Example 14.1a int factorial (int
-0.003943	Example 8.7 int SomeFunction (int
-0.003943	Example 8.9b int SomeFunction (int
-0.003943	Example 8.9a int SomeFunction (int
-0.003943	Example 8.11b int SomeFunction (int
-0.003943	Example 8.11a int SomeFunction (int
-0.020083	Example 7.1 float SomeFunction (int
-0.020083	#include <malloc.h> void SomeFunction (int
-0.218568	Example 7.42 int Multiply (int
-0.212286	Example 8.21 void Func1 (int
-0.200015	Example 8.5a void Plus2 (int
-0.200015	<int m> int MultiplyBy (int
-0.200015	Example 7.12 void FuncA (int
-0.165153	2; } void FuncB (int
-0.356616	uncached read because the write
-0.302914	+ b than to write
-0.302914	| operations than to write
-1.660108	It is possible to write
-0.365249	It is easier to write
-0.254713	is just easier to write
-0.406132	programmer may prefer to write
-0.380945	took several minutes to write
-0.236553	threads from attempting to write
-0.237605	calculate the value and write
-0.407687	WritePrivateProfileString to read and write
-0.102582	buffer and read or write
-0.102582	Do not read or write
-0.353218	in Windows, you may write
-0.353574	not needed. You may write
-0.651015	for example if you write
-0.236717	Programmers do, however, often write
-0.313540	this problem. These instructions write
-0.229940	obsolete. But if I write
-0.229940	different speeds. If I write
-0.693536	contentions if the threads write
-0.060185	9.10, then the nontemporal write
-0.060185	cache. Using the nontemporal write
-0.130073	the effect of nontemporal write
-0.130073	memory area. The nontemporal write
-0.130073	back. The so-called nontemporal write
-0.230066	don't think that programmers write
-0.218525	memory store An uncached write
-0.165128	__attribute(( fastcall)) __fastcall Noncached write
-0.336525	the function and to optimize
-0.689562	for the compiler to optimize
-0.608983	allows the compiler to optimize
-0.384352	allow the compiler to optimize
-0.541040	enables the compiler to optimize
-0.291145	to organize data to optimize
-0.590342	addresses in order to optimize
-0.951026	if you want to optimize
-0.728305	when you want to optimize
-0.641226	is very important to optimize
-0.492254	not be necessary to optimize
-0.291145	use this information to optimize
-1.075275	not be able to optimize
-0.645210	Before you start to optimize
-0.322069	Should we try to optimize
-0.237874	able to inline and optimize
-0.431351	A good compiler can optimize
-0.333359	a just-in-time compiler can optimize
-0.211482	instruction sets. Does not optimize
-0.211482	an IDE. Does not optimize
-0.357685	can help the compiler optimize
-0.643081	A good compiler will optimize
-0.047897	compiler 8.1 How compilers optimize
-0.047897	66 8.1 How compilers optimize
-0.335310	good compiler can often optimize
-0.313439	The compiler can easily optimize
-0.218537	the function or otherwise optimize
-0.165138	120 ms by selecting optimize
-0.165138	are discussed below. Cannot optimize
-0.357894	or more of the above
-0.452896	The code in the above
-0.350465	sign bit in the above
-0.350465	the methods in the above
-0.350465	time-consumers mentioned in the above
-0.350465	as shown in the above
-0.340394	is removed from the above
-0.340394	may deviate from the above
-0.352623	An array using the above
-0.399048	much faster. In the above
-0.307390	very big. In the above
-0.236738	precision. Let's repeat the above
-0.236738	be inlined. (In the above
-0.292885	Let me explain the above
-0.236738	existing program. Weighing the above
-0.292518	in a register. The above
-0.236416	sum += a[i]; The above
-0.236416	from unknown sources. The above
-0.236416	only happens rarely. The above
-0.236416	char pointers. 144 The above
-0.023459	_controlfp(0, _EM_OVERFLOW); // if above
-0.237724	it actually is. This above
-0.237431	cache lines we used above
-0.232703	negative. The method described above
-0.229228	have the disadvantages mentioned above
-0.276552	elements with column 28 above
-0.200009	the mirror elements matrix[c][r] above
-0.165148	at its mirror position above
-0.236633	anyway in 64-bit code. However,
-0.236193	of the next function. However,
-0.342815	vector nontemporal is used. However,
-0.233480	in the x86 CPUs. However,
-0.375859	on all major platforms. However,
-0.288506	regardless of the size. However,
-0.232191	space were scarce resources. However,
-0.306816	and reproducible as possible. However,
-0.231293	for the other thread. However,
-0.561342	for many different purposes. However,
-0.371790	for large data sets. However,
-0.229126	resources are most critical. However,
-0.229087	before they are executed. However,
-0.228054	constant with its value. However,
-0.226655	later instruction set. 120 However,
-0.226655	on the actual processor. However,
-0.226680	execution mechanism works automatically. However,
-0.272118	that's what they are. However,
-0.218481	calculated the fastest first. However,
-0.212171	on a PC platform. However,
-0.251203	window of a debugger. However,
-0.251203	graphics on the screen. However,
-0.251203	in the program flow. However,
-0.199903	start the next calculation. However,
-0.199903	the best Java implementations. However,
-0.165050	it inside {} brackets. However,
-0.165050	helpful for later maintenance. However,
-0.165050	for specific CPU models. However,
-0.165050	information for function F1. However,
-0.455650	be invalid if a was
-0.406826	the cache line that was
-0.237022	is a model that was
-0.237022	value of ebx that was
-0.182516	at the time it was
-0.319855	as last time it was
-0.235823	has the value it was
-0.357393	(i.e. where the function was
-0.356037	pulses since the CPU was
-0.443632	when the CPUID instruction was
-0.503240	set. This instruction set was
-0.347286	operations so that there was
-0.057921	the time the software was
-0.478618	speed on non-Intel CPUs was
-0.233754	the C++ template feature was
-0.232940	last time the statement was
-0.232606	units. Each 128-bit operation was
-0.230000	another thread. If seconds was
-0.229123	processors because this brand was
-0.460855	task when the CPUID was
-0.224901	only four multiplications. How was
-0.222277	decades ago, the recommendation was
-0.361165	interpreted version of Basic was
-0.403578	and the time consumption was
-0.296142	example 15.1b to 15.1c was
-0.218473	function in which alloca was
-0.199937	list in example 11.2b was
-0.343732	containing the members of both
-0.380561	are the same in both
-0.485540	libraries are available in both
-0.292361	intended to work in both
-0.236277	new cache line in both
-0.329853	family can run in both
-0.313464	inline assembly syntax in both
-0.236277	non- standardized details in both
-0.357365	a network may be both
-0.237806	u.f and v.f are both
-0.457702	It will fail if both
-0.237057	u.f > v.f if both
-0.545739	AVX is supported by both
-0.314515	this may work with both
-0.614948	a lot of time both
-0.293807	a and b will both
-0.237561	b are swapped then both
-0.237517	It requires support from both
-0.237501	oriented programming style has both
-0.293742	has const twice because both
-0.235161	to inline and optimize both
-0.235018	at run time. Therefore, both
-0.233223	My test tool supports both
-0.232961	fully optimized yet. Supports both
-0.232307	Initialize loop counter outside both
-0.230046	OS independent and checks both
-0.224837	and they always evaluate both
-0.199942	that source and destination both
-0.165086	embedded systems. Today (2013) both
-0.827822	more resources than the programs
-0.237886	installation and uninstallation of programs
-0.483767	64-bit operating systems and programs
-0.313744	optimizing CPU use in programs
-1.048300	can be useful in programs
-0.406908	and double precision in programs
-0.428484	prevent such errors in programs
-0.324815	a time-consumer even for programs
-0.237455	be annoyingly high for programs
-0.237719	commonly the case with programs
-0.237131	you can expect 64-bit programs
-0.619776	of errors in C++ programs
-0.237116	computer with many such programs
-0.237034	CPU-intensive code. But many programs
-0.313953	Automatic updates Many software programs
-0.236908	little faster than 32-bit programs
-0.441747	if you are making programs
-0.236257	regular time intervals. Some programs
-0.292022	discovered that many common programs
-0.235537	even matters, which few programs
-0.338561	bit Windows and Mac programs
-0.234944	Network access Some application programs
-0.234095	for user input. Many programs
-0.231328	the user's time. Other programs
-0.336340	reasons why object oriented programs
-0.276458	the throughput of CPU-intensive programs
-0.165076	usability problem in interactive programs
-0.165076	the same cache. Multithreaded programs
-0.358443	about investigation of the problems
-0.356606	spent fighting with the problems
-0.332602	rid of all the problems
-0.332602	not solve all the problems
-0.325099	However, this involves the problems
-0.481250	prevent this kind of problems
-0.237903	are less susceptible to problems
-0.835599	because the CPU has problems
-0.231248	the solution to these problems
-0.317236	error prone. All these problems
-0.236563	or Gnu compilers without problems
-0.236027	from the server. These problems
-0.312485	technical problems. Some common problems
-0.336167	of software can cause problems
-0.227883	identification. Such schemes cause problems
-0.222965	there are no caching problems
-0.222965	memory can cause caching problems
-0.207376	frequent causes of compatibility problems
-0.207376	frequent sources of compatibility problems
-0.074780	installation time and compatibility problems
-0.074780	resource problems and compatibility problems
-0.203317	frequent sources of resource problems
-0.335270	swapping and other resource problems
-0.480981	most useful for finding problems
-0.165140	in terms of usability problems
-0.165140	well as important usability problems
-0.165140	bugs, compatibility problems, usability problems
-0.212252	the inlining causes technical problems
-0.341549	a very long time unless
-0.444748	making an induction variable unless
-0.330975	will not do so unless
-0.337847	cannot do the optimization unless
-0.292714	always transferred as pointers unless
-0.542963	variables in 32-bit systems unless
-0.236318	change && to & unless
-0.080718	performance on non-Intel CPUs unless
-0.535296	strict floating point calculations unless
-0.390163	variables in 32-bit mode unless
-0.390163	set the flush-to-zero mode unless
-0.347285	support for exception handling unless
-0.636768	should definitely be avoided unless
-0.294218	division, which is slow unless
-0.212243	point comparisons are slow unless
-0.318480	use a stack frame unless
-0.232268	efficient, but not safe unless
-0.325847	the program more clear unless
-0.304296	and off by default unless
-0.226697	copying a large object, unless
-0.226697	longer time than rounding unless
-0.212194	unroll a loop manually unless
-0.330680	32-bit Mac OS X, unless
-0.165071	short int (16 bits), unless
-0.165071	rather than as b*(2.0/3.0) unless
-0.165071	for an integer constant, unless
-0.165071	induction variable method unfavorable, unless
-0.462313	different functions in the optimal
-0.419977	likely to be the optimal
-0.342839	be set then the optimal
-0.342839	the case then the optimal
-0.613905	In most cases, the optimal
-0.338289	done to choose the optimal
-0.522688	able to find the optimal
-0.314040	compiler will produce the optimal
-0.011568	4 2 Choosing the optimal
-0.023454	23 5 Choosing the optimal
-0.023454	website. 5 Choosing the optimal
-0.355651	the type that is optimal
-0.546256	evaluate whether it is optimal
-0.458056	some cases, it is optimal
-0.356952	0; 35 This is optimal
-0.348415	see which solution is optimal
-0.331068	a vector implementation is optimal
-0.237869	addition is finished. The optimal
-0.991656	It may not be optimal
-1.078925	then it may be optimal
-0.927703	but it may be optimal
-0.971918	when it is not optimal
-0.340913	PC processors is not optimal
-0.624559	a union is not optimal
-0.340913	loop unrolling is not optimal
-0.478343	This is not an optimal
-0.236829	Mars compilers produce less optimal
-0.237937	need to deallocate the space
-0.237909	call statement occupies a space
-0.638964	the required amount of space
-0.294155	than the heap. The space
-0.234817	loop takes up more space
-0.311048	Does not allocate more space
-0.234817	rather than allocating more space
-0.433919	is that the memory space
-0.493184	required amount of memory space
-0.165068	the stack. The memory space
-0.165068	table (PLT). The memory space
-0.224739	these purposes. This memory space
-0.224739	the most efficient memory space
-0.224739	some cases take memory space
-0.224739	used for saving memory space
-0.224739	less efficient. Extra memory space
-1.198741	can use the same space
-0.537860	a lot of cache space
-0.293642	the amount of cache space
-0.524468	that take up cache space
-0.230647	it can save cache space
-0.236849	from the larger address space
-0.312526	it takes too much space
-0.286760	RAM memory and disk space
-0.231358	compact and takes little space
-0.278265	collection when the heap space
-0.207147	and causes the heap space
-0.289778	page 26. The heap space
-0.355987	in performance. There are cases,
-0.336028	call __intel_cpu_features_init_x(). In other cases,
-0.331271	are permissible in all cases,
-0.238162	of time in most cases,
-0.238162	are fast in most cases,
-0.444748	hardware implementation in most cases,
-0.238162	efficient because, in most cases,
-0.126902	Difficult cases In most cases,
-0.126902	limit, etc. In most cases,
-0.126902	unsigned integers In most cases,
-0.126902	each calculation. In most cases,
-0.126902	many strings. In most cases,
-0.237143	mathematical calculations. In such cases,
-0.170252	less efficient. In many cases,
-0.170252	low priority. In many cases,
-0.286671	be useful in some cases,
-0.286671	improve optimizations in some cases,
-0.286671	the iterator in some cases,
-0.210496	detection function In some cases,
-0.210496	critical function. In some cases,
-0.210496	software users. In some cases,
-0.210496	Journal, 2002). In some cases,
-0.210496	its API. In some cases,
-0.230234	Member pointers In simple cases,
-0.230234	destroyed. In 50 simple cases,
-0.493034	There are a few cases,
-0.313947	except in the simplest cases,
-0.237248	Except for the simplest cases,
-0.236368	Set function pointer if else
-0.077454	pointer if else if else
-0.077454	else if else if else
-0.902073	} } } } else
-0.326281	goto DTRUE; } } else
-0.388978	d = 0; } else
-0.261967	a = b; } else
-0.335135	c = 1; } else
-0.297792	a + 1; } else
-0.209455	// Table lookup } else
-0.054290	a * 2; } else
-0.562852	y + 1.; } else
-0.209455	f is nonzero } else
-0.209455	of range"; 134 } else
-0.209455	out of range"; } else
-0.209455	0) { FuncA(i); } else
-0.092221	(y) { F1(a); } else
-0.092221	int a[1000]; F1(a); } else
-0.261967	CriticalFunction = &CriticalFunction_SSE2; } else
-0.261967	CriticalFunction = &CriticalFunction_AVX; } else
-0.209455	+ 1; 69 } else
-0.132911	cannot rely on anything else
-0.132911	more time than anything else
-0.132911	necessary to optimize anything else
-0.165174	CFALSE; } } 34 else
-0.165174	= sin(x); } 68 else
-0.454967	consuming. There is a lot
-0.454967	Conclusion There is a lot
-0.348476	be advantageous if a lot
-0.351492	of Func with a lot
-0.452435	interface can use a lot
-0.244717	have to do a lot
-0.072059	compilers can do a lot
-0.072059	CPUs can do a lot
-0.235284	objects that take a lot
-0.235284	These conversions take a lot
-0.401623	there is often a lot
-0.451422	This may cause a lot
-0.347428	the program uses a lot
-0.265131	particular application uses a lot
-0.336935	will soon get a lot
-0.319979	code often contains a lot
-0.326406	lookup or require a lot
-0.233496	You can save a lot
-0.233496	and they waste a lot
-0.233496	manager can spend a lot
-0.076789	framework can consume a lot
-0.076789	database can consume a lot
-0.233496	framework still consumes a lot
-0.233496	and drivers differ a lot
-0.233496	irrelevant software installed, a lot
-0.233496	amount of RAM, a lot
-0.334436	useful mathematical functions. A lot
-0.236139	organized into vectors. A lot
-1.710645	x x - - Integer
-1.716885	a power of 2 Integer
-0.465478	optimization Whole program optimization Integer
-1.220600	known at compile time. Integer
-0.470102	division take longer time. Integer
-0.323570	but risk of overflow Integer
-0.425940	Integers variables and operators Integer
-0.226378	undesired results. Integer operators Integer
-0.417985	performance. 14.4 Integer multiplication Integer
-0.339767	96. 14.5 Integer division Integer
-0.375921	explicitly in many cases. Integer
-0.232916	of a specific size. Integer
-0.317912	any cost in performance. Integer
-0.224850	a memory pool. 15 Integer
-0.276458	on most other microprocessors. Integer
-0.218502	space used for constants. Integer
-0.218502	may produce undesired results. Integer
-0.122607	for the performance. 14.4 Integer
-0.122607	at once................................... 135 14.4 Integer
-0.488896	depending on the microprocessor. Integer
-0.074742	multiplication ............................................................................................. 136 14.5 Integer
-0.074742	on page 96. 14.5 Integer
-0.199931	as an integer. 158 Integer
-0.199931	140 for further discussion. Integer
-0.465980	depending on the processor). Integer
-0.165076	that contains integer division: Integer
-0.165076	is. // Example 15.1d. Integer
-0.165076	example: // Example 8.24. Integer
-0.357420	as possible, and the dispatching
-0.814513	possible to do the dispatching
-0.237510	Writes "Hello 2" The dispatching
-0.237510	on page 44. The dispatching
-0.319700	to do the CPU dispatching
-0.319700	to override the CPU dispatching
-0.288054	common pitfalls of CPU dispatching
-0.253870	is intended for CPU dispatching
-0.202272	Example 13.1 // CPU dispatching
-0.253870	of research on CPU dispatching
-0.202272	function libraries have CPU dispatching
-0.253870	without AVX using CPU dispatching
-0.530973	code with automatic CPU dispatching
-0.223342	program contains automatic CPU dispatching
-0.202272	The compiler supports CPU dispatching
-0.202272	you should apply CPU dispatching
-0.202272	into vector c: CPU dispatching
-0.202272	Insert an explicit CPU dispatching
-0.202272	VIA processors. Explicit CPU dispatching
-0.089465	instruction set. 13.6 CPU dispatching
-0.089465	..................................................................................................... 126 13.6 CPU dispatching
-0.089465	......................................................................... 128 13.7 CPU dispatching
-0.089465	critical. 129 13.7 CPU dispatching
-0.202272	// Example 13.2. CPU dispatching
-0.437350	mechanism because it makes dispatching
-0.290532	the function. The automatic dispatching
-0.035782	strategies........................................................................................ 122 13.2 Model-specific dispatching
-0.035782	source files. 13.2 Model-specific dispatching
-0.358544	cause overflow in the particular
-0.354614	newest CPU of a particular
-0.347941	dealing with in a particular
-0.449703	or not in a particular
-0.347941	be invalid in a particular
-0.287343	right function for a particular
-0.287343	to use for a particular
-0.287343	is compiled for a particular
-0.287343	each optimized for a particular
-0.287343	and fine-tuned for a particular
-0.536772	This means that a particular
-0.293022	16) shows that a particular
-0.293022	the possibility that a particular
-0.293022	developers feel that a particular
-0.333676	very long on a particular
-0.333676	particularly bad on a particular
-0.324236	32-bit integer has a particular
-0.324236	a processor has a particular
-0.540975	speed by using a particular
-0.341692	be cases where a particular
-0.233844	When considering whether a particular
-0.233844	best job optimizing a particular
-0.233844	model N supports a particular
-0.321660	you can expect a particular
-0.233844	time you activate a particular
-0.294170	optimization effort on that particular
-0.314717	lookup. Lookup tables are particular
-0.335704	of CPU that each particular
-0.524795	method is that the microprocessor
-0.524795	consequence is that the microprocessor
-0.446096	the case that the microprocessor
-0.507041	instructions require that the microprocessor
-0.960453	is supported by the microprocessor
-0.356327	are relying on the microprocessor
-0.354621	is slow, then the microprocessor
-0.355220	allocation process because the microprocessor
-0.447623	threads simultaneously. If the microprocessor
-0.346295	+= sum2; If the microprocessor
-0.349778	function call makes the microprocessor
-0.313241	code and how the microprocessor
-0.313241	In most cases the microprocessor
-0.381090	on how well the microprocessor
-0.381090	version that fits the microprocessor
-0.236657	an if-else structure), the microprocessor
-0.539521	device than in a microprocessor
-0.629067	possible to implement a microprocessor
-0.237564	instruction set (requires a microprocessor
-0.128459	cache. 2.2 Choice of microprocessor
-0.128459	5 2.2 Choice of microprocessor
-0.237890	programming, compiler technology, and microprocessor
-0.237903	to general improvements in microprocessor
-0.236121	loop-carried dependency chain. A microprocessor
-0.236121	simple integer counter. A microprocessor
-0.147403	combination of a dedicated microprocessor
-0.208701	slower than a dedicated microprocessor
-0.355093	such errors is to replace
-0.568093	systems you have to replace
-0.847153	it is possible to replace
-0.750947	may be possible to replace
-0.708992	may be necessary to replace
-1.202842	it is advantageous to replace
-0.313691	It is expected to replace
-0.389338	so the compiler can replace
-0.389338	Here, the compiler can replace
-0.455353	inlining The compiler can replace
-0.617817	an optimizing compiler can replace
-0.325338	global const variable or replace
-0.407450	} The compiler may replace
-0.237445	1.0f; The compiler may replace
-0.237445	1.0f;} The compiler may replace
-0.237445	4; The compiler may replace
-0.237445	(&a); The compiler may replace
-0.349716	a debugger. You may replace
-0.541904	3.0; The compiler will replace
-0.513947	division. Some compilers will replace
-0.237580	is poorly predictable then replace
-0.341163	false. Likewise, you cannot replace
-0.339739	^ 1; You cannot replace
-0.313339	Optimizing compilers will often replace
-0.543690	the compiler can automatically replace
-0.362109	Most compilers will automatically replace
-0.357788	the code of the next
-1.125561	a pointer to the next
-0.456332	fed directly to the next
-0.340841	as explained in the next
-0.457321	G values in the next
-0.460744	the function for the next
-0.574604	this is that the next
-0.450947	likely case that the next
-0.526089	and assume that the next
-0.356250	5 μs on the next
-0.348807	size only when the next
-0.348807	an update when the next
-0.456492	problem and make the next
-0.323737	before we need the next
-0.292698	fail to start the next
-0.342189	valid only until the next
-0.735133	in 32-bit mode. The next
-0.534478	the function returns. The next
-0.291712	bit of ebx. The next
-0.235707	may need metaprogramming. The next
-0.235707	out of range. The next
-0.291712	ebx contains i/2+r. The next
-0.235707	AMD and VIA. The next
-0.237821	xxn *= xx4; // next
-0.235762	square x // get next
-0.165174	today will be mainstream next
-0.503773	program, especially if the branches
-0.554642	small. The number of branches
-0.578541	get a lot of branches
-0.237323	paragraph. The target of branches
-0.530419	executes a series of branches
-0.325334	of jumps, calls and branches
-0.237834	far from optimal. The branches
-0.460154	such contentions is that branches
-0.283455	verify that all code branches
-0.211550	can call all code branches
-0.237382	order to test all branches
-0.293650	and put seldom used branches
-0.324653	repeat count and no branches
-0.515420	which of the two branches
-0.341476	A program with many branches
-0.612980	a program has many branches
-0.441761	If you are making branches
-0.236009	loop that contains several branches
-0.235551	should have as few branches
-0.332142	program where the dispatch branches
-0.232263	is correlated with preceding branches
-0.231817	if appropriate. 8. Avoid branches
-0.089858	range analysis Join identical branches
-0.089858	memory area. Join identical branches
-0.212258	branches Eliminate jumps Eliminate branches
-0.165091	by vector size. Unpredictable branches
-0.165091	packing, unpacking needed. Predictable branches
-0.356942	|| b; This is typically
-0.348147	The conversion time is typically
-0.340113	shared object which is typically
-0.340113	line size, which is typically
-0.345549	cache line size is typically
-0.237060	new or malloc is typically
-0.237899	get time slices of typically
-0.237440	other data structures that typically
-0.237440	machine are frameworks that typically
-0.324155	console mode program are typically
-0.351402	optimized. Library functions are typically
-0.236915	parent and child are typically
-0.236768	zero-bits if unsigned. This typically
-0.236768	in compiled C++. This typically
-0.346259	of range. This may typically
-0.236062	time of programming will typically
-0.236062	floating point calculations will typically
-0.765127	through a function pointer typically
-0.237058	A branch instruction takes typically
-0.236500	to integer without SSE2 typically
-0.310655	processor features. The programmer typically
-0.234126	intermediate code. This framework typically
-0.307537	9.8 Strings Text strings typically
-0.228108	realize that such devices typically
-0.281498	Memory swapping. Software developers typically
-0.222353	software layers and frameworks typically
-0.212274	a lower priority level, typically
-0.348952	or friend function or operator
-0.237494	y = b;} vector operator
-0.314102	the latter has one operator
-0.202505	18, then the & operator
-0.202505	integer. But the & operator
-0.224132	^ operator. The & operator
-0.378009	by using the | operator
-0.326981	Safe [] array index operator
-0.232303	// constructor // sum operator
-0.182221	The constructor or overloaded operator
-0.249050	function. Using an overloaded operator
-0.182221	Overloaded operators An overloaded operator
-0.176462	casting // C++ casting operator
-0.268704	over- loaded type casting operator
-0.222318	or 1. The AND operator
-0.165118	clock cycle. The OR operator
-0.165118	and the EXCLUSIVE OR operator
-0.218490	applies to the modulo operator
-0.218490	decrement operators The pre-increment operator
-0.199953	Dynamic cast The dynamic_cast operator
-0.074750	effect of the const_cast operator
-0.074750	Const cast The const_cast operator
-0.199953	to zero. The [] operator
-0.165097	: 2.6f; The ?: operator
-0.165097	++i and the post-increment operator
-0.165097	Reinterpret cast The reinterpret_cast operator
-0.165097	Static cast The static_cast operator
-0.294229	heavy graphics application is preferably
-0.237838	256-bit YMM vectors are preferably
-0.347362	advantage of this by preferably
-0.237724	of the program - preferably
-0.353580	needed anyway. You may preferably
-0.236566	the first dimension may preferably
-0.420091	of a function should preferably
-0.206631	critical innermost loop should preferably
-0.206631	of each object should preferably
-0.058589	of the objects should preferably
-0.028307	variables and objects should preferably
-0.028307	Variables and objects should preferably
-0.206631	data decomposition, we should preferably
-0.277657	The speed test should preferably
-0.206631	by a list should preferably
-0.206631	A loop counter should preferably
-0.206631	The loop count should preferably
-0.206631	and switch statements should preferably
-0.339766	case. Loop unrolling should preferably
-0.206631	or other device should preferably
-0.206631	by an interrupt should preferably
-0.180040	vectorized code should therefore preferably
-0.180040	space. It should therefore preferably
-0.180040	point calculations should therefore preferably
-0.224919	disk. A few files, preferably
-0.200009	into a single container, preferably
-0.165148	by 16 for SSE2, preferably
-0.346353	0) { c = 1;
-0.288639	for (int n = 1;
-0.340189	{ DTRUE: d = 1;
-0.319379	int i, f = 1;
-0.104044	temp; for (r = 1;
-0.101060	= 0; list[i+1] = 1;
-0.101060	list[i] =0; list[i+1] = 1;
-0.233005	int a[2]; a[0] = 1;
-0.125835	{ return a - 1;
-0.125835	3; return a - 1;
-0.553810	b = a + 1;
-0.030511	{ return a + 1;
-0.030511	} return a + 1;
-0.014988	2; return a + 1;
-0.030511	3; return a + 1;
-0.540404	c = b + 1;
-0.050228	b * b + 1;
-0.212756	{ return x*x + 1;
-0.225743	unsigned int one : 1;
-0.225743	unsigned int sign : 1;
-0.405783	Disp() { cout << 1;
-0.401708	b = a ^ 1;
-0.165164	*= x; n >>= 1;
-0.236637	or at run time. Therefore,
-0.234480	often contains writeable data. Therefore,
-0.451032	a higher instruction set. Therefore,
-0.321550	any constructors are called. Therefore,
-0.287266	or reference to it. Therefore,
-0.457303	default in 64-bit mode. Therefore,
-0.431648	PLT for internal references. Therefore,
-0.317360	function pointer points to. Therefore,
-0.370650	that can be critical. Therefore,
-0.228053	in many different applications. Therefore,
-0.226654	for transferring additional parameters. Therefore,
-0.226675	to any other number. Therefore,
-0.159576	methods are time consuming. Therefore,
-0.159576	is very time consuming. Therefore,
-0.361149	or floating point numbers. Therefore,
-0.222265	only self- relative addresses. Therefore,
-0.291623	actually throws an exception. Therefore,
-0.392031	time MemberPointer is declared. Therefore,
-0.148671	one way or another. Therefore,
-0.148671	one thread than another. Therefore,
-0.212194	numbers of type int. Therefore,
-0.212243	a function library. 78 Therefore,
-0.465970	time it was programmed. Therefore,
-0.199925	streams with different strides. Therefore,
-0.199925	of scope or namespaces. Therefore,
-0.199925	computing power than PCs. Therefore,
-0.165071	pointer has been calculated. Therefore,
-0.357517	Gnu compiler on the Mac
-0.344467	64 bit Windows and Mac
-0.270305	/MT). In Linux and Mac
-0.089215	for Windows, Linux and Mac
-0.011532	of Linux, BSD and Mac
-0.011532	in Linux, BSD and Mac
-0.011532	64-bit Linux, BSD and Mac
-0.011532	Windows, Linux, BSD and Mac
-0.449905	64-bit shared objects in Mac
-0.293897	not been tested in Mac
-0.638872	the Gnu compiler for Mac
-0.237773	Windows, Linux, BSD or Mac
-0.314534	Can only run on Mac
-0.237160	for internal references. 64-bit Mac
-0.343655	relative addresses in 32-bit Mac
-0.280725	when compiling for 32-bit Mac
-0.280725	X Compilers for 32-bit Mac
-0.359429	32-bit -fno-builtin Gnu 32-bit Mac
-0.221016	as in Linux. 32-bit Mac
-0.234769	objects in Unix-like systems. Mac
-0.319854	operating systems Windows, Linux, Mac
-0.265138	of Linux and perhaps Mac
-0.048387	BSD systems. The Intel-based Mac
-0.048387	as well as Intel-based Mac
-0.048387	(Windows, Linux, BSD, Intel-based Mac
-0.330771	not up to date. Mac
-0.358336	the latency of the multiplication
-0.356973	in advance and the multiplication
-0.504365	can occur in the multiplication
-0.063778	of 2 then the multiplication
-0.344714	are integers, while the multiplication
-0.344573	compiler may avoid the multiplication
-0.582008	takes to make a multiplication
-0.331661	does not require a multiplication
-0.293282	floating point addition and multiplication
-0.010271	than addition, subtraction and multiplication
-0.020790	integer addition, subtraction and multiplication
-0.331904	Overflow may occur in multiplication
-0.352972	> abs(v.f) } The multiplication
-0.324866	the matrix element. The multiplication
-0.355492	ipow(x,10); // used for multiplication
-0.237627	In some cases this multiplication
-0.570303	and a floating point multiplication
-0.710229	or two floating point multiplication
-0.291023	such as 32-bit integer multiplication
-0.235101	will often replace integer multiplication
-0.254275	take longer time. Integer multiplication
-0.202632	14.4 Integer multiplication Integer multiplication
-0.089604	the performance. 14.4 Integer multiplication
-0.089604	once................................... 135 14.4 Integer multiplication
-0.229225	if the code involves multiplication
-0.874963	latest version of the application
-0.356741	the job of the application
-0.356529	system API and the application
-0.357880	graphics operation in the application
-0.353554	separate thread if the application
-0.353554	64-bit integers if the application
-0.356205	system, not by the application
-0.545959	be bigger than the application
-0.354560	not separated from the application
-0.355749	function library. If the application
-0.353235	it takes before the application
-0.236990	develop and market the application
-0.324857	in PC processors. The application
-0.293741	of the library. The application
-0.348222	thread-like scheduling in an application
-0.313245	hardware. Porting such an application
-0.497729	behavior of the first application
-0.236287	3.12 Network access Some application
-0.236130	and operating systems"). An application
-0.514345	shows that a particular application
-0.289952	example, a heavy graphics application
-0.289760	not necessary for your application
-0.310043	below. Installing a second application
-0.480091	efficiency of the final application
-0.408110	called in a typical application
-0.199987	development, database integration, web application
-0.199987	Library (WTL). A WTL application
-0.228432	int lrintf (float const x)
-0.022792	int lrint (double const x)
-0.006714	d, __m128i const & x)
-0.122308	polynomial (Vec4f const & x)
-0.122308	float add_elements(__m128 const & x)
-0.332179	7.1 float SomeFunction (int x)
-0.224509	m> int MultiplyBy (int x)
-0.012911	8.3a float parabola (float x)
-0.012911	a;} float parabola (float x)
-0.012911	8.1b float parabola (float x)
-0.000537	public: static double p(double x)
-0.008672	y; } double xpow10(double x)
-0.002152	of 10 double xpow10(double x)
-0.008672	loop unrolled double xpow10(double x)
-0.218564	inline double IntegerPower (double x)
-0.035778	by 16 float Exp(float x)
-0.035778	Taylor series float Exp(float x)
-0.165148	8.20 module1.cpp int Func1(int x)
-0.165148	pure_function ; double Func2(double x)
-0.325357	heap. The space is automatically
-0.816426	microprocessors are able to automatically
-0.341760	possible. A compiler that automatically
-0.425683	cases the compiler can automatically
-0.425683	Likewise, the compiler can automatically
-0.633093	and PathScale compilers can automatically
-0.498250	to vectorize the code automatically
-0.354082	will vectorize the code automatically
-0.320450	All optimizing compilers will automatically
-0.483948	47 Most compilers will automatically
-0.314248	modern processors prefetch data automatically
-0.497445	usually unroll a loop automatically
-0.407345	sets. The program should automatically
-0.381096	will do this optimization automatically
-0.489729	can use vector operations automatically
-0.292218	align large static arrays automatically
-0.337495	programming style that doesn't automatically
-0.235051	updates Many software programs automatically
-0.309759	cost because it goes automatically
-0.232240	functions are often inlined automatically
-0.284270	can insert nontemporal writes automatically
-0.226671	an update, or update automatically
-0.199942	example 14.14a with 14.14b automatically
-0.165086	example 12.8a to 12.8b automatically
-0.165086	Free Documentation License shall automatically
-0.165086	devirtualization (see page 73) automatically
-0.346306	the critical code to see
-0.235595	should look at to see
-0.379610	of C++ compilers to see
-0.989492	makes it possible to see
-0.630585	is also possible to see
-0.311975	a virtual table to see
-1.193997	if you want to see
-0.235595	the final result to see
-0.455240	usually not able to see
-0.431365	may therefore fail to see
-0.235595	a compiler generates to see
-0.291584	do some measurements to see
-0.235595	assembly output listing to see
-0.331838	try different libraries and see
-0.534760	then the compiler can see
-0.701221	an optimizing compiler can see
-0.236728	way the user can see
-0.352218	the code that you see
-0.639954	an optimizing compiler will see
-0.341895	clumsy, as you will see
-0.324685	future we may also see
-0.232678	of position- independent code, see
-0.276560	library has many features, see
-0.265151	Useful for vector operations, see
-0.165143	more on this topic, see
-0.165143	support for XMM registers; see
-0.357498	are sufficient, and the caching
-0.314690	to become fragmented and caching
-0.382761	is so big that caching
-0.235162	slight degradation in code caching
-0.312576	possible or when code caching
-0.235162	in situations where code caching
-0.235162	lines. This makes code caching
-0.343127	which makes the data caching
-0.331182	code caching and data caching
-0.003122	memory. This makes data caching
-0.003122	program. This makes data caching
-0.003122	class. This makes data caching
-0.003122	needed. This makes data caching
-0.003122	order. This makes data caching
-0.003122	bits. This makes data caching
-0.003122	fragmented. This makes data caching
-0.022344	stack, which makes data caching
-0.237451	different memory addresses. If caching
-0.517312	that there are no caching
-0.023112	of the code makes caching
-0.236569	of storing data without caching
-0.346485	static memory can cause caching
-0.222352	(see page 87). Data caching
-0.199993	of RAM memory. Efficient caching
-0.199993	transfer are eliminated. Code caching
-0.551364	into the code that allows
-0.405044	different operating systems that allows
-0.312239	also a language that allows
-0.312239	specify an option that allows
-0.379918	use a container that allows
-0.312239	symbol interposition feature that allows
-0.559750	set is that it allows
-0.345907	unfortunate consequence that it allows
-0.234890	to be pure. This allows
-0.234890	the previous iteration. This allows
-0.234890	overflow is "undefined". This allows
-0.234890	void F1() throw(); This allows
-0.554612	not. The Intel compiler allows
-0.540619	3. The Gnu compiler allows
-0.237429	/Gy, Linux: -ffunction-sections) which allows
-0.237129	AVX2 instruction set also allows
-0.262514	more efficient. 64-bit Windows allows
-0.262514	are different. 64-bit Windows allows
-0.236081	library). The D language allows
-0.416400	reference, a const reference allows
-0.234486	However, the out-of-order mechanism allows
-0.067780	and the program logic allows
-0.067780	If the program logic allows
-0.229186	point format is standardized allows
-0.199987	the exponent is biased allows
-0.199987	another by assignment. shared_ptr allows
-0.382789	of the number and sets
-0.237749	using indexes, working with sets
-0.237576	A CPU dispatcher then sets
-0.339117	calculations on large data sets
-0.417206	explanation of the instruction sets
-0.213086	be used if instruction sets
-0.213086	used only when instruction sets
-0.215449	versions for different instruction sets
-0.134359	compile for different instruction sets
-0.294897	based on which instruction sets
-0.347712	SSE and SSE2 instruction sets
-0.316999	such as supported instruction sets
-0.213086	availability of various instruction sets
-0.213086	but on what instruction sets
-0.213086	of backwards compatible instruction sets
-0.266068	set. The newer instruction sets
-0.285269	vectorization. The newest instruction sets
-0.310063	processors with CISC instruction sets
-0.324837	register. The above example sets
-0.230801	one of the 32 sets
-0.230801	are organized as 32 sets
-0.289253	normal array. The constructor sets
-0.229238	multiply-and-add Table 13.1. Instruction sets
-0.299293	function. The initialization routine sets
-0.165138	Intel library function __intel_cpu_features_init() sets
-0.165138	CPU brands and similarly sets
-1.257481	the result of the expression
-0.658248	For example, in the expression
-0.461910	not possible if the expression
-0.355293	32-bit integers, then the expression
-0.355960	the operands because the expression
-0.293711	is unchanged, while the expression
-0.293711	for both, while the expression
-0.338712	compiler may change the expression
-0.342821	of the time. The expression
-0.237131	loss of efficiency. The expression
-0.237131	the inverted mask. The expression
-0.237719	c + d; This expression
-0.192738	if b is an expression
-0.483135	argument to be an expression
-0.345799	cannot reduce the integer expression
-0.324270	to avoid that some expression
-0.222021	70 Induction variables An expression
-0.222021	and constant propagation An expression
-0.222021	the same thing. An expression
-0.329313	overflow on the intermediate expression
-0.234473	you expect the && expression
-0.231345	the loop counter. Any expression
-0.222352	the cases. The equivalent expression
-0.508359	*p+2 is a loop-invariant expression
-0.165133	be because the non-reduced expression
-0.237905	stack. This behaviour is implementation
-0.382583	where a particular code implementation
-0.488844	range } } This implementation
-0.350558	situations where a vector implementation
-0.351068	compiler uses a different implementation
-0.314150	necessary information about which implementation
-0.237121	n! 117 A C++ implementation
-0.293278	gives the simplest possible implementation
-0.294491	compilers use the software implementation
-0.294491	precision. But the software implementation
-0.278093	of using a software implementation
-0.278093	CPUs. However, a software implementation
-0.236115	the C99 standard. An implementation
-0.349908	that select the best implementation
-0.440910	that has a good implementation
-0.328109	of zero. A good implementation
-0.048240	faster than the hardware implementation
-0.327114	rather than a hardware implementation
-0.283159	and make a complicated implementation
-0.315529	a much more complicated implementation
-0.211299	use the most complicated implementation
-0.230016	string functions. A metaprogramming implementation
-0.229153	give infinity. A typical implementation
-0.224861	careful optimization. A mixed implementation
-0.222371	increments seconds. A safer implementation
-0.236771	end of procedure 4 Most
-0.236644	as binary executable code. Most
-0.570003	stored in static memory. Most
-0.234731	in memory or cache. Most
-0.343744	member pointers less efficient. Most
-0.531241	of user interface framework Most
-0.309719	five times. Thread-local storage Most
-0.233221	the CPU. Algebraic reductions Most
-0.438531	move outside the loop. Most
-0.232208	devices with limited resources. Most
-0.526722	156 16.3 Worst-case testing Most
-0.374354	accessing a simple variable. Most
-0.371791	using pointers and references. Most
-0.597004	of these instruction sets. Most
-0.659673	on floating point expressions. Most
-0.226670	access to low-level optimizations. Most
-0.222259	to debug and maintain. Most
-0.276480	function call. Algebraic reduction Most
-0.347335	described in chapter 12. Most
-0.212188	optimization options turned on. Most
-0.199920	of different C++ constructs Most
-0.199920	version of the executable. Most
-0.251222	Encryption, decryption, data compression Most
-0.199920	the hardware is updated. Most
-0.165066	memcpy(b, a, sizeof(b)); 47 Most
-0.165066	with more heuristic guidelines. Most
-0.558710	step rather than the complicated
-0.357657	Algebraic reduction is a complicated
-0.457597	case and make a complicated
-0.344790	to justify such a complicated
-0.449098	time. Another disadvantage of complicated
-0.237868	use an advanced and complicated
-0.351092	CPU dispatcher based on complicated
-0.341513	than to use this complicated
-0.278266	but not the more complicated
-0.518416	while it is more complicated
-0.333952	Address calculation is more complicated
-0.314509	specific literature for more complicated
-0.778157	make the code more complicated
-0.318730	"__attribute__((visibility ("hidden")))". A more complicated
-0.387513	able to do more complicated
-0.223866	individual array elements more complicated
-0.332614	where a much more complicated
-0.363354	becomes a little more complicated
-0.223866	method is somewhat more complicated
-0.355147	to use the most complicated
-0.344162	meta- programming is so complicated
-0.236031	implementation is needed. These complicated
-0.234772	are less expensive. Using complicated
-0.428495	a compiler to reduce complicated
-0.212257	higher instruction set. More complicated
-0.284291	this method is extremely complicated
-0.348145	The C++ way of handling
-0.341679	other possible ways of handling
-0.237872	are used twice for handling
-0.021926	7.30 Exceptions and error handling
-0.217432	branches such as error handling
-0.498844	make your own error handling
-0.262299	support for the exception handling
-0.193556	turning off the exception handling
-0.125211	the cost of exception handling
-0.125211	possible alternatives to exception handling
-0.125211	off support for exception handling
-0.125211	may think that exception handling
-0.125211	optimal to use exception handling
-0.125211	some compilers. If exception handling
-0.125211	instead of using exception handling
-0.125211	etc. The C++ exception handling
-0.125211	change its possible exception handling
-0.125211	/Qipo -ipo No exception handling
-0.125211	and possibly save exception handling
-0.125211	the reason why exception handling
-0.167992	relies on structured exception handling
-0.013815	You can disable exception handling
-0.057016	and error handling Exception handling
-0.057016	of exception handling Exception handling
-0.525932	used an intermediate code like
-0.340730	that standard library functions like
-0.235585	the more complicated functions like
-0.236408	otherwise. In difficult cases like
-0.235868	text strings in classes like
-0.565819	14.28 can be implemented like
-0.291506	libraries, but who would like
-0.290106	that programmers write expressions like
-0.191480	a compiler can look like
-0.145433	C++ implementation may look like
-0.145433	test setup may look like
-0.191480	This may typically look like
-0.215223	overloaded operators for things like
-0.215223	times to simple things like
-0.231849	times for simple tasks like
-0.230016	reason to add statements like
-0.468699	is useful in situations like
-0.218556	functions is also treated like
-0.218502	expensive. Using complicated techniques like
-0.056999	container class that behaves like
-0.056999	an object that behaves like
-0.048380	the factorial function looks like
-0.048380	The optimized code looks like
-0.048380	Agner's vector classes looks like
-0.165107	inline function is expanded like
-0.165107	responses to simple actions like
-0.237937	loop and splitting the dependency
-0.577657	((a+b)+c)+d. This is a dependency
-0.294015	how to break a dependency
-0.473679	manually. The effect of dependency
-0.237562	long dependency chains. A dependency
-0.293339	to gain if such dependency
-0.287825	This has a long dependency
-0.287825	calculations forms a long dependency
-0.209798	branch misprediction, or long dependency
-0.209798	first sub-vector. A long dependency
-0.209798	there are no long dependency
-0.043726	is to avoid long dependency
-0.043726	have to avoid long dependency
-0.209798	page 22. Avoid long dependency
-0.309500	part of a critical dependency
-0.233518	This makes a critical dependency
-0.234164	Y and Z. Each dependency
-0.232680	loop-carried dependency chain. Such dependency
-0.228126	need to break down dependency
-0.028381	is called a loop-carried dependency
-0.028381	there is no loop-carried dependency
-0.028381	8.23b has two loop-carried dependency
-0.028381	iterations are: No loop-carried dependency
-0.028381	dependency chains, especially loop-carried dependency
-0.212289	A longer loop- carried dependency
-0.212289	out of order. Long dependency
-0.478605	than to write the members
-0.408056	simple class containing the members
-0.547634	efficient. Variables that are members
-0.550219	together if they are members
-0.314638	structure or class with members
-0.307146	members (properties) The data members
-0.306982	to contain all data members
-0.217880	most often used data members
-0.271487	the parent class data members
-0.217880	by allowing two data members
-0.217880	a structure where data members
-0.217880	cases, but its data members
-0.217880	compilers will align data members
-0.021962	access any non-static data members
-0.045093	performance. 7.18 Class data members
-0.045093	51 7.18 Class data members
-0.217880	code that accesses data members
-0.778184	the most often used members
-0.293571	of structure and class members
-0.237039	access the saved variable members
-0.340297	or each of its members
-0.329384	by putting the smallest members
-0.165155	of the class. Data members
-0.165155	keeping data together. Data members
-0.200004	only one instance. Non-static members
-0.349311	are preferred because of their
-0.127324	programs spend most of their
-0.127324	applications spend most of their
-0.352339	advertise new versions of their
-0.294207	compilers are inferior to their
-0.237859	of programming languages and their
-0.237777	than one variable if their
-0.291519	in the program by their
-0.309965	parameters are replaced by their
-0.309965	its parameters replaced by their
-0.323215	should be identified by their
-0.535634	variables as long as their
-0.341544	different objects even when their
-0.237529	doing equivalent reductions at their
-0.237503	Application programmers rarely program their
-0.539353	who want to make their
-0.566756	a and b because their
-0.023210	the same register because their
-0.290779	allocated objects with each their
-0.234887	The threads have each their
-0.236596	branches separately and test their
-0.328459	modern CPUs can change their
-0.286795	profiling tools that fit their
-0.333896	often fail to keep their
-0.294243	security reasons before leaving their
-0.323551	loop is. The type __m128i
-0.341019	bc for each element __m128i
-0.104878	Multiply b and c __m128i
-0.021660	element in vector c __m128i
-0.047048	from array static inline __m128i
-0.002735	void StoreVector(void * d, __m128i
-0.025254	void StoreVectorA(void * d, __m128i
-0.016034	and generate a bit-mask: __m128i
-0.191920	cc into vector c: __m128i
-0.191920	bb into vector b: __m128i
-0.212297	the two AND operations: __m128i
-0.107200	a vector of (0,0,0,0,0,0,0,0) __m128i
-0.107200	a vector of (2,2,2,2,2,2,2,2) __m128i
-0.459042	{ F2(b); } } Using
-0.292301	equivalent to a function. Using
-0.439273	and the level-2 cache. Using
-0.342405	for vectorized table lookup Using
-0.330778	are powers of 2. Using
-0.374389	or a simple variable. Using
-0.317328	all use single precision. Using
-0.190987	with memory access. 12 Using
-0.190987	execution ................................................................................................. 103 12 Using
-0.148689	integers ................................... 141 14.9 Using
-0.148689	= (double)(signed int)u; 14.9 Using
-0.218520	cache are less expensive. Using
-0.272178	the sake of efficiency. Using
-0.148689	Testing speed.............................................................................................................. 153 16.1 Using
-0.148689	counter (see below) 16.1 Using
-0.199953	memory, at least temporarily. Using
-0.330718	instructions (see page 105). Using
-0.074750	vectorization ......................................................................................... 107 12.4 Using
-0.074750	integer vector division. 12.4 Using
-0.074750	functions ........................................................................................ 109 12.5 Using
-0.074750	the next section. 12.5 Using
-0.165097	described in this chapter. Using
-0.165097	explained at page 150. Using
-0.165097	do things in parallel: Using
-0.165097	described in chapter 11. Using
-1.035809	the order of the Boolean
-0.500663	|) instead of the Boolean
-0.355812	The operands of the Boolean
-0.553579	much faster than the Boolean
-0.701763	the difference between the Boolean
-0.654521	be used as a Boolean
-0.354379	you can make a Boolean
-0.498751	swapping the order of Boolean
-0.382614	Booleans The order of Boolean
-0.987945	In the case of Boolean
-0.237482	(see page 43). The Boolean
-0.237482	&, |, ~. The Boolean
-0.547703	~ are useful for Boolean
-0.314530	This makes operations with Boolean
-0.237710	operators using integers as Boolean
-0.356597	integer expressions rather than Boolean
-0.448667	all operators that have Boolean
-0.345443	in programs with many Boolean
-0.288237	from operators that produce Boolean
-0.226716	multiply by xx-xx--x- reciprocal Boolean
-0.306369	generate many branch mispredictions. Boolean
-0.218533	and 1 for true. Boolean
-0.466052	Boolean variables are overdetermined Boolean
-0.199970	the handle is invalid. Boolean
-0.165112	- - - 76 Boolean
-0.499137	still be in the cache.
-0.651488	is not in the cache.
-0.458288	same set in the cache.
-0.237682	may actively invalidate the cache.
-0.407947	variables in memory or cache.
-0.924178	space in the code cache.
-0.517679	contentions in the code cache.
-0.425942	use of the data cache.
-0.505118	cache and the data cache.
-0.054129	contentions in the data cache.
-0.274682	possible into the data cache.
-0.274682	to manipulate the data cache.
-0.565460	threads share the same cache.
-0.235460	in the level- 1 cache.
-0.239034	level-1 and the level-2 cache.
-0.367557	occur in the level-2 cache.
-0.256502	misses in the level-2 cache.
-0.253788	be in the level-1 cache.
-0.253788	table in the level-1 cache.
-0.189799	even the same level-1 cache.
-0.231363	copied to the disk cache.
-0.122647	use of the micro-op cache.
-0.122647	code cache or micro-op cache.
-0.165143	also be a level-3 cache.
-0.237874	the zero flag and don't
-0.314721	including local data that don't
-0.320871	containers is that you don't
-0.415759	code so that you don't
-0.331747	memset line if you don't
-0.331747	Don't panic if you don't
-0.739834	as long as you don't
-0.351258	the object then you don't
-0.346522	static memory. If you don't
-0.330384	different strides. Therefore, you don't
-0.228491	the smallest devices, you don't
-0.237427	its final destination, but don't
-0.293558	12.4a where current compilers don't
-0.277179	of range and we don't
-0.285264	negative so that we don't
-0.285264	factorials so that we don't
-0.205171	the cache so we don't
-0.205171	this case so we don't
-0.344583	them static if they don't
-0.229944	system performance options. I don't
-0.229944	-(-a) to a. I don't
-0.235389	of people. I simply don't
-0.199998	from everybody. So please don't
-0.199998	necessary because the factorials don't
-0.165138	common excuse that "we don't
-0.314749	a level-2 cache of 256
-0.237763	be repeated 1024/4 = 256
-0.237745	be more (128 or 256
-0.237667	256 short int int 256
-0.324970	loop will take only 256
-0.237253	128 float 256 double 256
-0.237146	128 double 128 float 256
-0.233478	AVX double 64 4 256
-0.233478	long long 64 4 256
-0.234551	AVX2 int 32 8 256
-0.234551	AVX2 float 32 8 256
-0.307968	8 64 4 unsigned 256
-0.232232	int int 256 unsigned 256
-0.168553	short int 16 16 256
-0.168553	832 256 16 16 256
-0.313032	SSE2 char 8 32 256
-0.235878	string search instructions AVX 256
-0.234474	Vec2uq 8 32 char 256
-0.234298	occur: if (SIZE > 256
-0.233912	and double vectors AVX2 256
-0.226677	unsigned 256 int int64_t 256
-0.226677	instruction set is available, 256
-0.218484	int int64_t 256 uint64_t 256
-0.165091	(MMX), 128 bits (XMM), 256
-0.165091	char short int 832 256
-0.553961	often faster than the intrinsic
-0.325330	avoided by calling the intrinsic
-0.565243	Especially the use of intrinsic
-0.237839	of two double. The intrinsic
-0.402958	that have support for intrinsic
-0.310554	has some support for intrinsic
-0.292871	12.2. Header files for intrinsic
-0.236726	"xmmintrin.h" // header for intrinsic
-0.459883	extra time. There are intrinsic
-0.325153	can be implemented with intrinsic
-0.356894	you choose to use intrinsic
-0.351276	are hundreds of different intrinsic
-0.237414	if you had used intrinsic
-0.614517	the sense that each intrinsic
-0.354454	better result by using intrinsic
-0.236495	<emmintrin.h> // Define SSE2 intrinsic
-0.236096	Use assembly language Use intrinsic
-0.329083	for compilers that support intrinsic
-0.210291	vectorized table lookup Using intrinsic
-0.092540	......................................................................................... 107 12.4 Using intrinsic
-0.092540	vector division. 12.4 Using intrinsic
-0.319665	(The PGI compiler supports intrinsic
-0.607110	by using the so-called intrinsic
-0.165102	C++ compilers allow assembly-like intrinsic
-0.165102	may use the _mm_clflush intrinsic
-0.356989	and investigated by the methods
-0.331795	least temporarily. Using the methods
-0.237460	casting operator These different methods
-0.428728	is faster than other methods
-0.314145	are two commonly used methods
-0.237126	You should use such methods
-0.292810	applying the various optimization methods
-0.430598	by any of these methods
-0.395478	systems. All of these methods
-0.395493	131 Note that these methods
-0.220780	classes that use these methods
-0.236561	12) are more useful methods
-0.488649	Each of the following methods
-0.350353	and branches. The following methods
-0.228562	blocks of memory. These methods
-0.283592	brand of CPU. These methods
-0.347411	more of the above methods
-0.234736	25 Since most development methods
-0.553303	processors. There are various methods
-0.289470	any of the storage methods
-0.285250	Square blocking and similar methods
-0.165102	compilers have inefficient code-based methods
-0.165102	position-independent code. These workaround methods
-0.165102	compilers have efficient table-based methods
-0.165102	this works and suggests methods
-0.352764	positive overflow of a signed
-0.647625	conversion Conversion of a signed
-0.501762	unsigned integer to a signed
-0.458751	that the behavior of signed
-0.438457	then convert it to signed
-0.420936	convert unsigned integers to signed
-0.293767	if the conversion to signed
-0.382752	compiler, the assumption that signed
-0.358380	and they can be signed
-0.237737	point is faster with signed
-0.294016	Overflow behaves differently on signed
-1.070364	is less efficient than signed
-0.575195	Unsigned is faster than signed
-0.237290	in speed between using signed
-0.232941	result. The conversion between signed
-0.331094	{ ... Conversions between signed
-0.323164	* 2.5; // Use signed
-0.331141	sure not to mix signed
-0.228095	4 4 64-bit integer, signed
-0.224867	The result of comparing signed
-0.165130	unsigned 2 2 int, signed
-0.212260	1 1 short int, signed
-0.032682	expressed as an 8-bit signed
-0.032682	coded as an 8-bit signed
-0.199970	bool 1 1 char, signed
-0.357943	quite likely is a model
-0.292979	any brand name and model
-0.324393	CPU brand names and model
-0.065558	The CPU family and model
-0.065558	on its family and model
-0.065558	its brand, family and model
-0.319377	can you assume that model
-0.413900	you cannot assume that model
-0.237777	of unknown brand or model
-0.237565	N-1 is inferior. A model
-0.346222	solution for the memory model
-0.170600	in a large memory model
-0.170600	Gbytes. This large memory model
-0.287171	CPU-specific and each CPU model
-0.892016	to a specific CPU model
-0.231713	represent a known CPU model
-0.374211	in a particular CPU model
-0.236593	when the next new model
-0.094381	you know that processor model
-0.094381	has. Assuming that processor model
-0.215138	use for each processor model
-0.215138	that the next processor model
-0.347235	and make the next model
-0.283092	be given a false model
-0.165138	/fp:fast=2 -fp-model fast, -fp- model
-0.314662	comments about how the development
-0.325294	the performance during the development
-0.237901	portability and ease of development
-0.347128	come with compilers and development
-0.466632	particular programming language and development
-0.293564	between efficiency, portability and development
-0.314229	programming work automatically. The development
-0.314229	an MFC application. The development
-0.237506	that it makes program development
-0.237315	performance. 25 Since most development
-0.324257	is true that some development
-0.375863	systematization of the software development
-0.288522	may view the software development
-0.217600	debate about which software development
-0.217600	of redesign. Some software development
-0.217600	the Microsoft platform software development
-0.217600	priority of structured software development
-0.381467	be a compromise between development
-0.235952	the availability of good development
-0.309485	and lack of advanced development
-0.284314	for fast and easy development
-0.224889	the availability of powerful development
-0.222318	development tools. One popular development
-0.199976	disk or network. Various development
-0.199976	64-bit Windows. The integrated development
-0.237929	This closely follows the mathematical
-0.023493	permissible for reasons of mathematical
-0.237516	to produce tables of mathematical
-0.237063	processing, sound processing, and mathematical
-0.237063	as sorting, searching, and mathematical
-0.048256	division, square root and mathematical
-0.048256	Division, square root and mathematical
-0.314711	the loop is in mathematical
-0.339348	are intrinsic instructions for mathematical
-0.237752	memcpy, memmove, memset, or mathematical
-0.294011	main focus is on mathematical
-0.454480	third thread can do mathematical
-0.236561	library contains many useful mathematical
-0.349963	for more information about mathematical
-0.228556	many functions for common mathematical
-0.487877	functions The most common mathematical
-0.235824	core library contains optimized mathematical
-0.235505	the innermost loop doing mathematical
-0.342860	literature for more complicated mathematical
-0.309464	A lot of advanced mathematical
-0.331127	operations, and to mix mathematical
-0.228085	time, such as heavy mathematical
-0.276492	function libraries for computing mathematical
-0.165102	are useful for vectorizing mathematical
-0.357665	most libraries it is never
-0.575391	case the function is never
-0.347766	the dispatched function is never
-0.855465	if the program is never
-0.094220	that a variable is never
-0.344620	data members that are never
-0.344620	for constants that are never
-0.643879	of the functions are never
-0.548702	even if they are never
-0.237277	noticed that i can never
-0.349088	the compiler. We can never
-0.290435	certain that a will never
-0.340414	code, so you will never
-0.234584	assume that F1 will never
-0.304784	Obviously, a function should never
-0.207120	A thread-safe function should never
-0.318693	a buffer. It should never
-0.230613	The updating mechanism should never
-0.498052	even if the user never
-0.292006	make sure that overflow never
-0.235190	C++ template feature was never
-0.440867	stack. The memory space never
-0.318190	time to user input never
-0.218543	where a #define directive never
-0.458021	function or in a separate
-0.740756	the code in a separate
-0.324845	file access in a separate
-0.100623	is implemented in a separate
-0.324845	application-specific information in a separate
-0.594193	is defined in a separate
-0.324845	put something in a separate
-0.324845	be placed in a separate
-0.324845	be scheduled in a separate
-0.349084	function library or a separate
-0.310175	consuming calculations into a separate
-0.310175	a task into a separate
-0.310175	preferably isolated into a separate
-0.437743	faster than making a separate
-0.379131	and have implemented a separate
-0.830443	an excessive number of separate
-0.293902	and network access in separate
-0.382451	can be placed in separate
-0.356751	are modified should be separate
-0.344937	more efficient to have separate
-0.420765	incompatible. You may make separate
-0.237485	Keep often used functions separate
-0.293495	put time-consuming tasks into separate
-0.236639	The different threads need separate
-0.356619	quite often because the block
-0.325372	of zero within a block
-0.294123	critical because they can block
-0.237695	ownership of the memory block
-0.282080	size of a memory block
-0.282080	allocating when a memory block
-0.216930	block, but this memory block
-0.270413	queue) allocates one memory block
-0.216930	that destroys any memory block
-0.021886	allocate a new memory block
-0.270413	in one big memory block
-0.216930	has its own memory block
-0.062605	a new bigger memory block
-0.216930	of the old memory block
-0.544029	sizes of the data block
-0.292281	memory block. A large block
-0.312611	to allocate one big block
-0.339659	to allocate a small block
-0.340154	then handle its own block
-0.496651	data in the old block
-0.325399	high-priority thread can possibly block
-0.228140	there is no try block
-0.356567	name ?Func@@YAXQAHAAH@Z is the name
-0.587824	macros is that the name
-0.459988	(Some compilers use the name
-0.237678	utility for modifying the name
-0.344657	position- independent code. The name
-0.293741	the library www.agner.org/optimize/asmlib.zip. The name
-0.523560	dot in the function name
-0.378757	SelectAddMul_dispatch; // Define function name
-0.290888	Assembly name Intrinsic function name
-0.047919	4 ; mangled function name
-0.047919	;alignby4 ; mangled function name
-0.351086	the function a different name
-0.088155	{ has the same name
-0.088155	main has the same name
-0.088155	executable has the same name
-0.349037	for the child class name
-0.266461	the correct child class name
-0.548447	parameters. This is called name
-0.312779	c1 other than its name
-0.312781	names. The details about name
-0.310284	and use the local name
-0.284291	can have any brand name
-0.284291	is just an arbitrary name
-0.199987	details. The funny looking name
-0.165128	these instructions. Function Assembly name
-0.348374	in performance between the systems.
-0.307950	space of the 64-bit systems.
-0.580504	for 32-bit and 64-bit systems.
-0.411480	between 32-bit and 64-bit systems.
-0.108001	in registers in 64-bit systems.
-0.108001	integer registers in 64-bit systems.
-0.252079	64 bits in 64-bit systems.
-0.252079	8 bytes in 64-bit systems.
-0.252079	and fourteen in 64-bit systems.
-0.467366	and sixteen in 64-bit systems.
-0.324270	the IDE on some systems.
-0.352874	scarce resource in 32-bit systems.
-1.018951	C++ compilers and operating systems.
-0.378286	64-bit CPUs and operating systems.
-0.223290	ported to multiple operating systems.
-0.303091	sixteen in 64-bit operating systems.
-0.312346	are available for Linux systems.
-0.290911	been tested in Mac systems.
-0.233998	typically used on bigger systems.
-0.230776	semaphores, mutexes and message systems.
-0.230776	also applies to BSD systems.
-0.176468	used in some embedded systems.
-0.224917	except for small embedded systems.
-0.347437	shared objects in Unix-like systems.
-0.165133	Mac OS and Itanium systems.
-0.354752	performance bottlenecks is to put
-0.337573	separate module, and to put
-0.292655	102 also useful to put
-0.292655	is often useful to put
-0.611706	can be advantageous to put
-0.385973	may be advantageous to put
-1.530644	It is recommended to put
-0.332745	is therefore recommended to put
-0.236176	who would like to put
-0.709715	a good idea to put
-0.495588	optimize the code and put
-0.237357	Define multiple threads and put
-0.293589	seldom used functions, and put
-0.554567	values have to be put
-0.643081	should therefore preferably be put
-0.569718	works then you may put
-0.341675	as b*(2.0/3.0) unless you put
-0.237642	only after they have put
-0.234517	than the other then put
-0.234517	first 128 bytes then put
-0.234517	than the other, then put
-0.225353	both functions and simply put
-0.279952	The values are simply put
-0.196018	and stopping threads. Don't put
-0.196018	was the opposite: Don't put
-0.540816	systems because of the needs
-0.294211	expression b && a needs
-0.322717	inside the function that needs
-0.322717	// Any function that needs
-0.292758	deleted. User work that needs
-0.442279	have a destructor that needs
-0.576706	function calls and it needs
-0.315353	multiple layers and it needs
-0.323647	first time because it needs
-0.323647	link library because it needs
-0.323647	becomes simpler because it needs
-0.237714	15] += 1.0f; This needs
-0.811635	is that the compiler needs
-0.352422	Without optimization, the compiler needs
-0.456765	integer. If a loop needs
-0.381926	expression a && b needs
-0.478441	Only the executable file needs
-0.236335	that only one constant needs
-0.461977	that a positive list needs
-0.442233	access. The code section needs
-0.287761	Any writable data section needs
-0.232278	of the code still needs
-0.306921	loop where each iteration needs
-0.413553	then the exception handler needs
-0.165117	function such as ReadB needs
-0.065502	cos(x); } z = y
-0.065502	= cos(x); z = y
-0.065502	= sin(x); z = y
-0.555611	1.; } else { y
-0.325719	} 68 else { y
-0.146315	b; if (b) { y
-0.324588	int n) { double y
-0.313593	ifbit=1 bitofn // return y
-0.333297	will load the structure y
-0.338480	integers, then the expression y
-0.705530	+ b + c; y
-0.234320	a = x > y
-0.233776	b + c; Here, y
-0.288963	b) {x = a; y
-0.232656	? a : b) y
-0.014879	b, c, d, y; y
-0.136346	c = 100, y; y
-0.136346	c = 1.23456, y; y
-0.016032	a1, a2, b1, b2; y
-0.165117	if (n & 1) y
-0.165117	return vector(x + a.x, y
-0.165117	algebra, we may write: y
-0.357768	runtime check that the conversion
-0.357841	floating point if the conversion
-0.795186	In this example, the conversion
-0.293954	is enabled. Typically, the conversion
-0.292918	smaller as well. The conversion
-0.292918	point to integer. The conversion
-0.236767	low positive result. The conversion
-0.236767	for(i=0; i<100; i++)a[i]=2*i; The conversion
-0.237717	= (short int)i; This conversion
-0.237558	in 64-bit mode. A conversion
-0.237522	the vectors. This data conversion
-0.331199	41 Float to integer conversion
-0.343590	arithmetic expression. The size conversion
-0.234681	in performance. Integer size conversion
-0.407043	discussion. Integer to float conversion
-0.236848	to signed integers before conversion
-0.236802	below. Signed / unsigned conversion
-0.283140	assume that the type conversion
-0.283140	expensive, while the type conversion
-0.218566	about rounding. Pointer type conversion
-0.218566	static_cast<float>(i); // Implicit type conversion
-0.282816	integer. Floating point precision conversion
-0.227878	mixed precision require precision conversion
-0.199987	rounding and truncation. Efficient conversion
-0.165128	integer because the integer-to-float conversion
-1.238265	else { a = c;
-0.734886	a; int b; int c;
-0.440679	B2 { public: int c;
-0.235629	public: B2 b2; int c;
-0.237289	int a, b; double c;
-0.116697	a + b + c;
-0.059815	2 : b * c;
-0.236978	b + 1; return c;
-0.852567	a = b / c;
-0.131282	x[]) { int b, c;
-0.030455	{ int a, b, c;
-0.063208	14.10 int a, b, c;
-0.063208	14.11 int a, b, c;
-0.063208	8.6a int a, b, c;
-0.162781	{ Vec16s a, b, c;
-0.162781	objects Vec8s a, b, c;
-0.695579	a = b % c;
-0.002152	b[SIZE][SIZE]) { int r, c;
-0.008672	x=y; y=temp;} int r, c;
-0.008672	9.5a: 98 int r, c;
-0.037143	one by means of #include
-0.540691	// vector class library #include
-0.236488	12.4b. Vectorized with SSE2 #include
-0.236173	of the compiled versions #include
-0.346417	using Agner vector classes #include
-0.451849	with automatic CPU dispatching #include
-0.290090	different in other compilers. #include
-0.232278	12.9b. Taylor series, vectorized #include
-0.222309	this: // Example 16.2 #include
-0.218511	www.agner.org/optimize/asmlib.zip. // Example 16.1 #include
-0.218511	alloca: // Example 9.3 #include
-0.212211	SSE2 or x64 141 #include
-0.212254	a: // Example 9.6b. #include
-0.013563	Header file for InstructionSet() #include
-0.251247	Example 16.2 #include <stdio.h> #include
-0.199942	clock; } // Or #include
-0.165086	other compilers. #include <excpt.h> #include
-0.165086	#include <excpt.h> #include <float.h> #include
-0.165086	and denormals-are-zero mode (SSE2): #include
-0.165086	Define vector classes (Intel) #include
-0.165086	Set flush-to-zero mode (SSE): #include
-0.165086	Intel vector classes 114 #include
-0.237939	succeeded in applying the various
-0.336270	application. The availability of various
-0.293878	with bounds checking and various
-0.325005	and ARM platforms and various
-1.026396	can be implemented in various
-0.351477	be avoided, there are various
-0.259070	less efficient. There are various
-0.259070	future processors. There are various
-0.259070	for vectors There are various
-0.259070	other resources. There are various
-0.259070	or not. There are various
-0.259070	unaligned arrays. There are various
-0.259070	present manual. There are various
-0.259070	code explicitly. There are various
-0.259070	save power. There are various
-0.259070	than normally. There are various
-0.128153	All C++ compilers have various
-0.128153	Most C++ compilers have various
-0.237503	somewhat more complicated because various
-0.313496	static keyword also makes various
-0.235608	library at www.agner.org/optimize/asmlib.zip contains various
-0.331090	the capability to reduce various
-0.224909	134 and 135 show various
-0.251310	The subsequent sections describe various
-0.344910	these languages have the disadvantage
-0.373539	but it has the disadvantage
-0.280620	called. This has the disadvantage
-0.280620	executed. This has the disadvantage
-0.568704	size. There is a disadvantage
-0.353942	will therefore be a disadvantage
-0.351371	may be at a disadvantage
-0.407363	This is often a disadvantage
-0.329139	as explained below. The disadvantage
-0.339024	and read-only data. The disadvantage
-0.329139	are never called. The disadvantage
-0.379759	in 64-bit Windows. The disadvantage
-0.312103	in another array. The disadvantage
-0.534468	the program starts. The disadvantage
-0.235702	transfer is avoided. The disadvantage
-0.234662	on page 107. A disadvantage
-0.234662	in ASCII form. A disadvantage
-0.234662	having different types. A disadvantage
-0.496752	available. The most important disadvantage
-0.229783	is important. An important disadvantage
-0.182255	than CPU time. Another disadvantage
-0.182255	the code itself. Another disadvantage
-0.182255	the program slower. Another disadvantage
-0.224934	and compact. The biggest disadvantage
-0.462917	be implemented in the high
-0.582307	cases to use the high
-0.356342	optimal solution because the high
-0.237685	the user. With the high
-0.839129	number of objects is high
-0.538553	the work load is high
-0.712823	a matrix is a high
-0.352890	(in bytes) is a high
-0.351484	unroll loops if a high
-0.652172	A loop with a high
-0.454805	task must have a high
-0.350931	BSD comes at a high
-0.445973	be situations where a high
-0.236671	fragmented and involve a high
-0.237502	and switch statements The high
-0.237502	become more powerful. The high
-0.338909	the ADX instructions for high
-0.237491	precision math. Libraries for high
-0.237527	optimization of performance has high
-0.119756	of code is so high
-0.119756	such code is so high
-0.371214	tolerance may be so high
-0.350377	may require a very high
-0.200015	time can be annoyingly high
-0.460309	instructions that use the zero
-0.293843	floating point number is zero
-0.339093	. The value is zero
-0.420860	{ // f is zero
-0.575586	the first byte of zero
-0.055802	// set a to zero
-0.437277	Comparing an integer to zero
-0.323623	setting these variables to zero
-0.441245	set sign bit to zero
-0.312436	setting a register to zero
-0.344649	by setting pointers to zero
-0.571397	can be initialized to zero
-0.292024	to set seconds to zero
-0.292024	loop count down to zero
-0.235982	Constructor // Initialize to zero
-0.237880	into the carry and zero
-0.324349	the type conversion takes zero
-0.235196	invalid if a was zero
-0.022319	vector of (0,0,0,0,0,0,0,0) __m128i zero
-0.013566	strings including the terminating zero
-0.165148	assume that seconds remains zero
-0.562407	But most of the Microsoft
-0.358354	for free in the Microsoft
-0.351342	same features as the Microsoft
-0.352668	Windows. Integrates into the Microsoft
-0.341610	Windows and C++ is Microsoft
-0.407886	popular development tool is Microsoft
-0.237708	example is specific to Microsoft
-0.237708	as a plug-in to Microsoft
-0.441959	Available from Intel and Microsoft
-0.561797	a.store(aa+i); } } The Microsoft
-0.382245	on Windows platforms. The Microsoft
-0.714453	Gnu, Clang, Intel or Microsoft
-0.293122	into projects made with Microsoft
-0.313997	available. Microsoft Comes with Microsoft
-0.237446	#ifdef _MSC_VER // If Microsoft
-0.234469	compilers are mentioned below. Microsoft
-0.219485	vectorclass.h Supported compilers Intel, Microsoft
-0.219485	the Gnu, Clang, Intel, Microsoft
-0.229198	purposes are also available. Microsoft
-0.226721	PathScale Gnu Intel Borland Microsoft
-0.224873	32-bit Mac Intel CodeGear Microsoft
-0.199976	for an explanation. (The Microsoft
-0.165117	(not up to date): Microsoft
-0.165117	compiler versions were tested: Microsoft
-0.237900	are serious limitations to what
-0.294161	compiler can do and what
-0.382752	developing so fast that what
-0.237759	will be called, or what
-0.235218	model numbers, but on what
-0.348548	or C++ based on what
-0.347888	to use depends on what
-0.349488	following solutions, depending on what
-0.446054	rows. Let's look at what
-0.500086	if the compiler does what
-0.236090	when it returns. But what
-0.235427	by 2 ; add what
-1.434230	The following example shows what
-0.572797	the programmer to know what
-0.253492	are sure you know what
-0.500793	the compiler doesn't know what
-0.314897	references. You can change what
-0.268479	const reference cannot change what
-0.228095	throw(A,B,C) to tell explicitly what
-0.228134	difficult to measure exactly what
-0.199970	pointers efficient, and that's what
-0.035772	page 105. 8.7 Checking what
-0.035772	.............................................................................................. 82 8.7 Checking what
-0.251279	clear to the reader what
-1.517628	the value of the parameter
-0.357983	more complex if the parameter
-0.357504	The transfer of a parameter
-0.352325	is significant if a parameter
-0.356090	thousand numbers as a parameter
-0.372785	that the overhead of parameter
-0.102481	function. The overhead of parameter
-0.237617	call and return and parameter
-0.331605	optimize register allocation and parameter
-0.638965	rather than a function parameter
-0.348351	difference between a function parameter
-0.237375	the most critical integer parameter
-0.343685	errors if the size parameter
-0.483827	happen if the size parameter
-0.433309	faster because the template parameter
-0.334921	of using a template parameter
-0.221899	virtual functions. The template parameter
-0.312088	do so). A template parameter
-0.288836	ecx = a ; parameter
-0.352690	a[i+1] = Induction; ; parameter
-0.216105	?Func@@YAXQAHAAH@Z ?Func@@YAXQAHAAH@Z PROCNEAR ; parameter
-0.216105	?Func@@YAXQAHAAH@Z PROC NEAR ; parameter
-0.272261	transferred as an implicit parameter
-1.222982	order to make the division
-0.237880	most microprocessors. Multiplication and division
-0.294164	c < 0. The division
-0.178487	constant is faster than division
-0.352225	bug causes floating point division
-0.270698	point division Floating point division
-0.114591	0; 14.6 Floating point division
-0.114591	137 14.6 Floating point division
-0.270698	clock cycles). Floating point division
-0.237408	we can eliminate one division
-0.319185	loop index. The integer division
-0.319185	no instructions for integer division
-0.232845	Here, / means integer division
-0.328542	to unsigned for fast division
-0.162785	power of 2 Integer division
-0.209647	at compile time. Integer division
-0.162785	14.5 Integer division Integer division
-0.162785	most other microprocessors. Integer division
-0.162785	on the microprocessor. Integer division
-0.209647	page 96. 14.5 Integer division
-0.162785	on the processor). Integer division
-0.162785	contains integer division: Integer division
-0.265158	while many reductions involving division
-0.357666	whether r is a reference
-0.331550	for example: Use a reference
-0.293823	function which returns a reference
-0.121888	is a pointer or reference
-0.121888	it a pointer or reference
-0.121888	with a pointer or reference
-0.121888	return a pointer or reference
-0.355103	through a pointer or reference
-0.121888	transfer a pointer or reference
-0.121888	pass a pointer or reference
-0.138035	elimination A pointer or reference
-0.063556	return any pointer or reference
-0.063556	making any pointer or reference
-0.138035	6 integer, pointer or reference
-0.138035	The returned pointer or reference
-0.382379	r points to. A reference
-0.202600	optimize away a const reference
-0.202600	non-const reference, a const reference
-0.224265	const pointer or const reference
-0.298500	const reference. A const reference
-0.349533	to use a constant reference
-0.288667	it sees a relative reference
-0.284331	provoke error // Return reference
-0.251335	// Return a null reference
-0.358385	by modifications of the source
-0.720930	they appear in the source
-0.356411	scattered everywhere in the source
-0.357702	clock cycle if the source
-0.500769	Do not use the source
-0.354213	time. Templates make the source
-0.237844	code (byte code). The source
-0.752660	has an option for source
-0.349079	Aligned operands means that source
-0.341265	are kept in different source
-0.572935	other in the same source
-0.547608	configurations with the same source
-0.524503	sets from the same source
-0.237396	is to join all source
-0.346908	parent class in one source
-0.380961	(STL) is a useful source
-0.512335	so is a common source
-0.312108	derived class in another source
-0.235341	in two steps. All source
-0.369196	This is a frequent source
-0.226716	image processing. Yeppp. Open source
-0.281481	comes from a reliable source
-0.212240	Open Watcom Another open source
-0.165112	regarded as a valuable source
-0.357062	in itself, and the cost
-0.357710	may consider if the cost
-0.355209	run faster at the cost
-0.335773	the market. But the cost
-0.237511	or NAN. Avoiding the cost
-0.237511	CPU dispatching. Underestimating the cost
-0.236777	stop the thread. The cost
-0.236777	CPU cores. 60 The cost
-0.236777	can be defined. The cost
-0.236777	optimizing multithreaded applications: The cost
-0.353802	the loop does not cost
-0.237724	of task switching. This cost
-0.566136	general, there is no cost
-0.570033	Namespaces There is no cost
-0.335581	stack. Deallocation has no cost
-0.230382	there is virtually no cost
-0.313700	used freely without any cost
-0.537000	There is no performance cost
-0.277319	than 231. This extra cost
-0.749297	there is an extra cost
-0.333134	There is no extra cost
-0.625753	There is a large cost
-0.268522	is a large overhead cost
-0.215258	involve a high overhead cost
-0.354634	the CPU it is running
-0.354634	the microprocessor it is running
-0.564108	whether the code is running
-0.445830	before your code is running
-0.237234	the CPU core is running
-0.237851	is Intel's term for running
-0.355493	other tasks that are running
-0.347898	now that we are running
-0.236923	spell-checking and repagination are running
-0.344092	is chosen only when running
-0.376548	possible instruction set when running
-0.233395	most other libraries when running
-0.233395	operating systems disappears when running
-0.236848	performance monitor counters before running
-0.352428	or Mac operating system running
-0.323556	operating system to avoid running
-0.323556	processor models to avoid running
-0.236091	processor core. Two threads running
-0.236038	from a higher-priority thread running
-0.234471	are: 146 Multiple applications running
-0.233764	leave a background process running
-0.222330	by all other processes running
-0.347429	before the program starts running
-0.212285	waste of resources. Consider running
-0.427393	using assembly language and automatic
-0.048174	64-bit. Supports OpenMP and automatic
-0.011552	parallel processing, OpenMP and automatic
-0.048174	page 107), OpenMP and automatic
-0.236555	Supports vector intrinsics and automatic
-0.511041	of the function. The automatic
-0.314238	the AVX instructions. The automatic
-0.237482	libraries have features for automatic
-0.237482	internet or intranet for automatic
-0.235892	Vector class code with automatic
-0.235892	the user-written code with automatic
-0.235371	SelectAddMul example (12.4e) with automatic
-0.235371	compilers and invoked with automatic
-0.403404	is to rely on automatic
-0.568638	you can rely on automatic
-0.585575	value. There is no automatic
-0.237321	since 2004. Can do automatic
-0.681041	useful in situations where automatic
-0.236122	with all compilers. Use automatic
-0.322552	of the program contains automatic
-0.288942	a compiler that supports automatic
-0.302351	takes hours to install automatic
-0.165133	compiler supports vector intrinsics, automatic
-0.237936	thread that shares the resources
-0.324998	Therefore, the data and resources
-0.325112	destructors are called and resources
-0.287555	framework may use more resources
-0.339080	in C++ take more resources
-0.289567	typically take much more resources
-0.289567	typically uses much more resources
-0.599218	takes only slightly more resources
-0.293791	DLL takes more memory resources
-0.326906	help files and other resources
-0.326906	databases, network and other resources
-0.477865	order to predict which resources
-0.237407	influences are removed, all resources
-0.047232	21 3.11 Other system resources
-0.047232	best. 3.11 Other system resources
-0.316424	if there are allocated resources
-0.228143	errors; make sure allocated resources
-0.333582	any of the shared resources
-0.192814	response times for network resources
-0.085793	that depend on network resources
-0.085793	that relies on network resources
-0.222330	applications have less computing resources
-0.251297	in order to reserve resources
-0.165128	a low-priority thread steals resources
-0.581626	final value of the induction
-0.460091	i and use the induction
-0.457971	chain would make the induction
-0.435032	back. The method of induction
-0.237865	pointer, common subexpressions, and induction
-0.293124	to optimize this with induction
-0.236948	8.23b. Calculate polynomial with induction
-0.048101	be calculated by an induction
-0.590005	compiler from making an induction
-0.525080	of how to use induction
-0.428866	studied do not make induction
-1.198716	can use the same induction
-0.722616	cannot make floating point induction
-0.349988	programmer. 79 Floating point induction
-0.324677	be 8 and no induction
-0.344505	chains, namely the two induction
-0.320705	the use of two induction
-0.330620	The compiler doesn't need induction
-0.310038	i by a second induction
-0.272211	and making an explicit induction
-0.035774	Store result // Update induction
-0.035774	variable Y // Update induction
-0.323245	data. This is the reason
-0.323245	other. This is the reason
-0.323245	compiled. This is the reason
-0.323245	happens. This is the reason
-0.340535	an intermediate code. The reason
-0.290073	0 and 1. The reason
-0.438495	100 clock cycles. The reason
-0.290073	version performs well. The reason
-0.290073	thing to do. The reason
-0.377759	old Pentium 4. The reason
-0.234266	is floating point. The reason
-0.234266	do not occur. The reason
-0.234266	in the end. The reason
-0.234266	I have tested. The reason
-0.234266	accessing it directly. The reason
-0.549267	variables for the same reason
-0.800476	then there is no reason
-0.445734	but there is no reason
-0.632056	cases, there is no reason
-0.494363	allocation. There is no reason
-0.494363	automatically. There is no reason
-0.309256	is running. The main reason
-0.224932	is a compelling security reason
-0.723752	initially points to the dispatcher
-0.549018	version. Note that the dispatcher
-0.354962	it gets from the dispatcher
-0.350429	non-Intel processor makes the dispatcher
-0.331422	the loader calls the dispatcher
-0.335788	are called. Therefore, the dispatcher
-0.331693	@gnu_indirect_function"); // Make the dispatcher
-0.237512	to a dispatcher. The dispatcher
-0.237512	else being initialized. The dispatcher
-0.237592	other hardware conditions. A dispatcher
-0.237526	== 2 // make dispatcher
-0.465791	difficult for the CPU dispatcher
-0.498743	is that the CPU dispatcher
-0.200168	library has a CPU dispatcher
-0.200168	to make a CPU dispatcher
-0.200168	of keeping a CPU dispatcher
-0.184681	the program. The CPU dispatcher
-0.184681	up-to-date version. The CPU dispatcher
-0.184681	Intel processor. The CPU dispatcher
-0.184681	most cases: The CPU dispatcher
-0.184681	of programming. The CPU dispatcher
-0.270212	instruction set. A CPU dispatcher
-0.353577	that the Intel CPU dispatcher
-0.908467	the value that is n
-0.331623	Make dynamic array of n
-0.293984	because the consequence of n
-0.331897	of the fact that n
-0.237788	initialize to x^0/0! // n
-0.958345	out the loop by n
-0.992758	can be calculated by n
-0.294009	a bounds check on n
-0.312671	array index than when n
-0.236179	possibly more serious when n
-0.237440	in a vector. If n
-0.237079	n places back, where n
-0.235921	if nonzero u.i += n
-0.322319	<< 23; // add n
-0.348653	= 1.f; for (int n
-0.341250	(int n = 1; n
-0.327181	x; x *= x; n
-0.332861	by 2n by adding n
-0.371841	isolates the least significant n
-0.226701	the series: ex xn n
-0.226701	the interval 0 <= n
-0.199953	by 2n by subtracting n
-0.199953	(2n / b) >> n
-0.659334	the length of the string
-0.127704	block every time a string
-0.314091	code that produces a string
-0.237369	length function scans a string
-0.441989	All common implementations of string
-0.278898	versions of memory and string
-0.278898	most common memory and string
-0.237335	sound processing Memory and string
-0.347827	examples: strlen function. The string
-0.293735	in large applications. The string
-0.348244	as efficient functions for string
-0.237849	and then interpret that string
-0.356074	string classes, such as string
-0.814406	problem is to use string
-0.237533	and variable names from string
-0.350668	the length of each string
-0.236018	Fast versions of common string
-0.304320	that the C style string
-0.068018	variables, floating point constants, string
-0.068018	floating 26 point constants, string
-0.199976	converts a zero-terminated ASCII string
-0.199976	more vector instructions SSE4.2 string
-0.094876	the responsibility of the programmer
-0.351203	responsi- bility of the programmer
-0.356846	be obvious to the programmer
-0.342803	is useful for the programmer
-0.342803	is important for the programmer
-0.509535	more difficult for the programmer
-0.342803	is relevant for the programmer
-0.352798	of things that the programmer
-0.352798	higher risk that the programmer
-0.502180	possible only if the programmer
-0.355037	program or because the programmer
-0.349724	particular situation, but the programmer
-0.406038	order to help the programmer
-0.236489	Intel compiler puts the programmer
-0.552998	reduction. For example, a programmer
-0.237153	for usability reasons. The programmer
-0.237153	than processor features. The programmer
-0.237153	(see page 70). The programmer
-0.344385	takes before the application programmer
-0.525639	identical code for the three
-0.237853	the first way and three
-0.237477	r.a + r.b;} The three
-0.237477	condition, and increment. The three
-0.357372	and multiplication may be three
-0.444855	same time. There are three
-0.344101	function calls. There are three
-0.343718	may be two or three
-0.446124	which are implemented as three
-0.293885	RGB image data have three
-0.342770	FuncC(i+1); } This has three
-0.234061	in this example has three
-0.234061	7.32b. A for-loop has three
-0.324914	for AVX2 and all three
-0.331068	is usually divided into three
-0.462933	and the other way three
-0.312614	example should be compiled three
-0.405749	only one addition every three
-0.344068	aa[size] ); // Make three
-0.232974	Optimizes moderately well. Supports three
-0.304329	truncation. This is approximately three
-0.212229	The assembly listing reveals three
-0.199959	loop. Example 12.4b executes three
-0.588952	CISC instruction set is better
-0.331731	The 64-bit version is better
-0.876194	can lead to a better
-0.516074	it may be a better
-0.336311	this might be a better
-0.355656	is compatible with a better
-0.624813	you may get a better
-0.343572	time a new and better
-0.237608	are becoming better and better
-0.339389	we are waiting for better
-1.077639	then it may be better
-0.926643	but it may be better
-0.354198	Code caching will be better
-0.236530	set may actually be better
-0.450171	so. The compilers are better
-0.237761	achieved more efficiently by better
-0.237562	Performance and usability A better
-1.465369	in order to make better
-0.231396	current operating systems need better
-0.231396	Many software applications need better
-0.234898	because the non-reduced expression better
-0.296250	The compilers are becoming better
-0.218531	The Core2 processor performs better
-0.658796	The effect of the keyword
-0.344959	variables by using the keyword
-0.344959	anything by using the keyword
-0.344959	segment by using the keyword
-0.331388	module then add the keyword
-0.237514	local: 1. Add the keyword
-0.312960	Use fastcall functions The keyword
-0.312960	the calculated value. The keyword
-0.292524	enables interprocedural optimizations. The keyword
-0.236421	on the context. The keyword
-0.292524	See page 80. The keyword
-0.247249	functions because the static keyword
-0.247249	may add the static keyword
-0.229956	inlined function. The static keyword
-0.229956	same class. The static keyword
-0.320559	to use the const keyword
-0.232645	test purposes. The const keyword
-0.169365	C++ compilers The register keyword
-0.169365	register variable. The register keyword
-0.235561	inlined if the inline keyword
-0.085103	is volatile. The volatile keyword
-0.085103	enabled. Volatile The volatile keyword
-0.222365	mode. Therefore, the __fastcall keyword
-0.357591	objects and is not efficient.
-0.337204	memory blocks is more efficient.
-0.337204	The latter is more efficient.
-0.332308	code smaller and more efficient.
-0.330163	position- independent code more efficient.
-0.229327	makes function calls more efficient.
-0.563233	makes data caching more efficient.
-0.229327	floating point comparisons more efficient.
-0.236970	makes data caching very efficient.
-0.324498	integers, which is less efficient.
-0.295898	64 bits are less efficient.
-0.189804	which makes it less efficient.
-0.189804	makes the program less efficient.
-0.189804	make member pointers less efficient.
-0.005156	the data caching less efficient.
-0.005156	and data caching less efficient.
-0.001712	makes data caching less efficient.
-0.026415	code makes caching less efficient.
-0.257907	are only slightly less efficient.
-0.241267	structure. This is equally efficient.
-0.285347	and they are equally efficient.
-0.459242	a function with a lookup
-0.677775	efficient to use a lookup
-0.474952	advantageous to use a lookup
-0.547465	improved by using a lookup
-0.335171	and possibly also a lookup
-0.338499	This implementation uses a lookup
-0.534337	set. Do not use lookup
-0.321410	call with a table lookup
-0.199599	The principle of table lookup
-0.088432	position-independent code and table lookup
-0.088432	address calculation and table lookup
-0.199599	rely heavily on table lookup
-0.199599	Obviously, all these table lookup
-0.278956	bypass the virtual table lookup
-0.199599	page 132. Unfortunately, table lookup
-0.330239	functions for vectorized table lookup
-0.199599	level 9. Avoid table lookup
-0.199599	} 112 Vectorized table lookup
-0.099592	......................................................................................... 132 14.1 Use lookup
-0.099592	optimization topics 14.1 Use lookup
-0.319294	return FactorialTable[n]; // Table lookup
-0.221055	cache line. 132 Table lookup
-0.228162	BSD, the slow GOT lookup
-1.472464	the address of the end
-0.345861	is important to the end
-0.064120	be distributed to the end
-0.345861	also inconvenient to the end
-0.354013	error handling in the end
-0.354013	unused points in the end
-0.354013	data structures in the end
-0.352079	a disadvantage for the end
-0.352079	perform poorly for the end
-0.352935	time delay that the end
-0.352935	is unlikely that the end
-0.458127	dummy elements at the end
-0.352953	the time before the end
-0.343927	are doing. See the end
-0.236655	programs. Writing past the end
-0.237915	in the majority of end
-0.347488	edx = point to end
-0.237759	a[i+2] ; compare with end
-0.382534	with more RAM than end
-0.237059	than -156. Surprisingly, we end
-0.236518	The recursion must always end
-0.165159	; align ; mark end
-0.331519	system programming, but in applications
-0.420922	only an advantage in applications
-0.237838	cores are advantageous for applications
-0.511115	slices. This will make applications
-0.314147	for calling from other applications
-0.237126	the screen. However, such applications
-0.344217	good performance for many applications
-0.169835	Other databases Many software applications
-0.169835	or more. Many software applications
-0.236829	not recommended for critical applications
-0.223314	of function libraries Some applications
-0.223314	multidimensional array sequentially. Some applications
-0.223314	memory pool. Alignment? Some applications
-0.236017	library, except when several applications
-0.234764	focus is on mathematical applications
-0.279431	used in small embedded applications
-0.222300	linking are: 146 Multiple applications
-0.222300	5-10% for some CPU-intensive applications
-0.212229	objects simultaneously. In multithreaded applications
-0.199959	PathScale compiler for Unix applications
-0.199959	development time for WTL applications
-0.165102	that cause the resource-hungry applications
-0.165102	files from disk. Memory-hungry applications
-0.236791	explained on page 8 below.
-0.284394	64-bit mode, as explained below.
-0.284394	the system, as explained below.
-0.284394	cleaned up, as explained below.
-0.284394	other optimizations, as explained below.
-0.235719	OS. See page 128 below.
-0.234888	position- independent code, see below.
-0.489090	This method is described below.
-0.319022	detection function as described below.
-0.320236	my experiment are given below.
-0.271884	type conversions is discussed below.
-0.182793	to optimization are discussed below.
-0.182793	function libraries are discussed below.
-0.229209	common compilers are mentioned below.
-0.226721	described in the sections below.
-0.423585	illustrated in example 13.1 below.
-0.222318	you follow the guidelines below.
-0.218538	listed in table 8.1 below.
-0.032679	explained on page 146 below.
-0.032679	detail on page 146 below.
-0.212277	listed on page 164 below.
-0.251285	These instructions are summarized below.
-0.199976	given in example 14.19 below.
-0.408173	unless you expect the &&
-0.456951	false = a a &&
-0.755011	d; c = a &&
-0.433314	You cannot replace a &&
-0.313909	time. The expression a &&
-0.236835	!(a || b) a &&
-0.048220	true = true a &&
-0.048220	!a = true a &&
-0.538853	the first operand of &&
-0.349456	true last in an &&
-0.293474	The equivalent expression b &&
-0.692085	of the Boolean operators &&
-0.234745	if (SIZE > 256 &&
-0.234797	= x > y &&
-0.233284	most cases. Don't change &&
-0.212277	(a&&c) = a&&(b||c) !a &&
-0.199976	if (i >= min &&
-0.199976	if (i < ARRAYSIZE &&
-0.074758	&& a<c) = (a<b &&
-0.074758	(a+c==b+c)=(a==b) ----x---- !(a<b)=(a>=b) (a<b &&
-0.165117	!(a<b)=(a>=b) (a<b && b<c &&
-0.165117	if (handle != INVALID_HANDLE_VALUE &&
-0.165117	prefer to write if(!a &&
-0.542012	x by using the |
-0.333789	single operation using the |
-0.456963	0 = a a |
-0.652358	b; d = a |
-0.458972	|| b with a |
-1.198320	n.a. n.a. - a |
-0.209671	a = a, a |
-0.307980	-1 = a, a |
-0.236843	---xx---- a<<b<<c=a<<(b+c) x-xxx--xx a |
-0.237874	use of << and |
-0.314490	C; x.abc = A |
-0.141386	& 3) << 4) |
-0.141386	| (B << 4) |
-0.035779	& (Tuesday | Wednesday |
-0.035779	expression (Tuesday | Wednesday |
-0.074767	if (Day & (Tuesday |
-0.074767	efficiency. The expression (Tuesday |
-0.330778	0, (a&b) | (~a&c) |
-0.074767	| (b&c) = (a&b) |
-0.074767	0 = 0, (a&b) |
-0.165138	(SSE2): #include <xmmintrin.h> _mm_setcsr(_mm_getcsr() |
-0.165138	= (n & 0x7FFFFF) |
-0.165138	= (A & 0x0F) |
-0.452743	> 0) { // Make
-0.236338	int cc[]) { // Make
-0.338410	+ 4.; }; // Make
-0.287506	int aa[size] ); // Make
-0.023070	zero = _mm_set1_epi16(0); // Make
-0.232008	<float, 100> list; // Make
-0.232008	(".type CriticalFunction, @gnu_indirect_function"); // Make
-0.232008	(0,0,0,0,0,0,0,0) Is16vec8 zero(0,0,0,0,0,0,0,0); // Make
-0.290320	method is described below. Make
-0.349038	the selected instruction set. Make
-0.232304	of the object's class. Make
-0.457397	recognized in 64-bit mode. Make
-0.317980	constructor for the object. Make
-0.287350	division by a variable. Make
-0.532935	as the function returns. Make
-0.218531	example 13.1 below. 126 Make
-0.218531	compiler with C++0x support. Make
-0.218531	simplest expressions and operators. Make
-0.199993	object files and executables. Make
-0.165133	consider the following alternatives: Make
-0.237555	f is zero } We
-0.404209	set can be used. We
-0.571042	in the level-1 cache. We
-0.342152	sign bit to zero We
-0.234276	Clang and Intel compilers. We
-0.431966	supported by the compiler. We
-0.231807	the worst possible performance. We
-0.226685	the floating point number. We
-0.585100	a long dependency chain. We
-0.222277	generation of identifier names. We
-0.218473	sign bit of u.f We
-0.218473	handling in this example. We
-0.218473	we assume is optimized. We
-0.265073	when r = 28. We
-0.199937	example 15.1b to 15.1c. We
-0.330695	about the sign bit. We
-0.165081	section for some caveats. We
-0.165081	16, 32, 64, ...). We
-0.165081	endian storage (e.g. PowerPC). We
-0.165081	with sign bit set). We
-0.165081	36 C++ as 'this'. We
-0.165081	example 15.1b to 15.1c? We
-0.165081	enough to be annoying. We
-0.540650	in any of the examples
-0.351103	as well, but the examples
-0.345062	memory pool. See the examples
-0.514534	specified instruction set. The examples
-0.324415	for each version. The examples
-0.237128	not well documented. The examples
-0.293306	page and 90 for examples
-0.237108	See page 103 for examples
-0.293306	alignment. See www.agner.org/optimize/cppexamples.zip for examples
-0.352524	is allowed. The code examples
-0.236875	http://www.agner.org/optimize/asmlib.zip contains complete code examples
-0.237595	can also find more examples
-0.237072	I have seen many examples
-0.231252	same variables. In these examples
-0.317242	different purposes. All these examples
-0.353303	see shortly. The following examples
-0.236038	I have provided several examples
-0.235600	manual at www.agner.org/optimize/cppexamples.zip contains examples
-0.516084	methods in the above examples
-0.321872	pointers. 144 The above examples
-0.212257	reference to a[i] More examples
-0.199987	possibility that such contrived examples
-0.199987	possible to construct obscure examples
-0.324833	same object, except for char
-0.314414	same object (except for char
-0.048274	16 1 byte = char
-0.048274	32 1 byte = char
-0.382149	Here, I have used char
-0.338704	vector, bits Instruction set char
-0.237153	14.3b int n; static char
-0.334860	class, Agner 8 8 char
-0.277586	Linux: long int unsigned char
-0.277586	Is8vec8 8 8 unsigned char
-0.223266	Vec16c 8 16 unsigned char
-0.223266	char 256 Vec32c unsigned char
-0.313020	64 I64vec1 8 16 char
-0.604063	64 2 128 SSE2 char
-0.313042	128 Vec2uq 8 32 char
-0.534536	b:2; int c:2; }; char
-0.311693	64 1 64 MMX char
-0.199965	maximum value in stdint.h char
-0.165107	to: // Example 7.10b char
-0.165107	counter: // Example 7.31b char
-0.165107	case: // Example 7.31a char
-0.165107	example: // Example 8.17 char
-0.165107	this: // Example 7.9b char
-0.556930	While some of the difference
-0.357382	to compensate for the difference
-0.351022	time consumption as the difference
-0.102687	*= x; Note the difference
-0.102687	variable Day. Note the difference
-0.237428	following example illustrates the difference
-0.237428	is fast. Calculating the difference
-0.543864	there may be a difference
-0.344320	the two functions. The difference
-0.293346	8 = 80. The difference
-0.237143	CPLDs and FPGAs. The difference
-0.405165	- there is no difference
-0.713543	then there is no difference
-0.571217	cases, there is no difference
-0.405165	enabled there is no difference
-0.534220	parameters. There is no difference
-0.096688	variable, it makes no difference
-0.096688	or #define makes no difference
-0.221258	there is simply no difference
-0.292211	there is no big difference
-0.288667	elements with a relative difference
-0.304396	to test // Time difference
-0.165159	is only a minimal difference
-0.352496	hardware definition code in addition
-0.237630	instead of (or in addition
-0.292804	vector register, do an addition
-0.236667	microprocessor is doing an addition
-0.528995	takes longer time than addition
-0.461414	If a floating point addition
-0.461414	do a floating point addition
-0.474887	accumulators for floating point addition
-0.322687	only one floating point addition
-0.646511	or two floating point addition
-0.322687	a new floating point addition
-0.322687	to mix floating point addition
-0.340173	page 105. Floating point addition
-0.311573	you can have one addition
-0.451380	will have only one addition
-0.290801	means that if each addition
-0.328152	dependency chain where each addition
-0.455531	can start a new addition
-0.508881	faster. The most important addition
-0.235733	it can do another addition
-0.845389	result of the preceding addition
-0.365882	addition before the preceding addition
-0.230779	high precision math allow addition
-0.358500	optimal decomposition of the data.
-0.502709	do calculations on the data.
-0.237679	mechanism can prefetch the data.
-0.293956	ways of organizing the data.
-0.345386	the smallest list of data.
-0.313377	its own block of data.
-0.065546	for relative addressing of data.
-0.031538	for self-relative addressing of data.
-0.031538	supports self-relative addressing of data.
-0.292922	databases with lots of data.
-0.236770	than 2 gigabytes of data.
-0.237511	and other odd-sized vector data.
-0.382162	to the most used data.
-0.313849	all public and static data.
-0.240726	typical set of test data.
-0.240726	realistic set of test data.
-0.236599	database for storing user data.
-0.236578	doing something on these data.
-0.231876	as well as writing data.
-0.230054	storing text or input data.
-0.265138	same code and read-only data.
-0.165133	long long, double. Misaligned data.
-0.165133	it often contains writeable data.
-0.525426	misprediction before it is too
-0.477186	If the problem is too
-0.450291	of this solution is too
-0.293259	array or container is too
-0.977134	the loop count is too
-0.237067	If the granularity is too
-0.064723	turns out to be too
-0.351823	should therefore not be too
-0.355495	dynamically. Arrays that are too
-0.236926	B and C are too
-0.536906	b[i] and c[i] are too
-0.325329	time too small or too
-0.237619	simply zero. Execution time too
-0.530584	If a program has too
-0.353048	optimal because it takes too
-0.233567	If the program takes too
-0.235179	version of Basic was too
-0.319360	heap space has become too
-0.232315	Unfortunately, some compilers unroll too
-0.226736	then the sampling generates too
-0.218551	for significant improvements. Making too
-0.165133	double precision without worrying too
-0.237905	preceding paragraph described a mechanism
-0.314232	objects (*.dll, *.so). The mechanism
-0.293738	catching hardware exceptions. The mechanism
-0.237716	normal return route. This mechanism
-0.558655	than the Gnu compiler mechanism
-0.352561	some cases, the Intel mechanism
-0.349857	called, while the Gnu mechanism
-0.075791	because the out-of-order execution mechanism
-0.075791	general, the out-of-order execution mechanism
-0.291019	page 44. The dispatching mechanism
-0.257043	compiler bypassing the dispatch mechanism
-0.279930	the virtual function dispatch mechanism
-0.145991	that the CPU dispatch mechanism
-0.066890	Implementation The CPU dispatch mechanism
-0.066890	mechanism. The CPU dispatch mechanism
-0.145991	developed. A CPU dispatch mechanism
-0.320263	calculation. However, the out-of-order mechanism
-0.536682	Unfortunately, the CPU detection mechanism
-0.281495	often abusing the update mechanism
-0.089371	situations: The stack unwinding mechanism
-0.089371	dependent. The stack unwinding mechanism
-0.212252	important work. The updating mechanism
-0.199981	in memory. The renaming mechanism
-0.354089	CriticalInnerFunction () { // Table
-0.235463	{ // n! // Table
-0.235463	// Polynomial coefficients // Table
-0.235463	Z += A2; // Table
-0.235463	134) return FactorialTable[n]; // Table
-2.034298	- n.a. n.a. - Table
-0.406005	16 8 or 16 Table
-0.236495	cache MOVNTDQ _mm_stream_si128 SSE2 Table
-0.234621	Mac Intel CodeGear Microsoft Table
-0.320500	= double 4 AVX2 Table
-0.289477	Overview of compiler options Table
-0.432916	nontemporal #pragma vector nontemporal Table
-0.284271	obscure possibility of overflow. Table
-0.560418	64 8 512 AVX512 Table
-0.218496	intrin.h (MS) x86intrin.h (Gnu) Table
-0.218496	64 0 264-1 uint64_t Table
-0.212229	a cache line. 132 Table
-0.199959	License, optional commercial license Table
-0.165102	F32vec4 F64vec2 F32vec8 F64vec4 Table
-0.165102	513 513 58.7 168.3 Table
-0.165102	optimization report /Qopt-report -opt-report Table
-0.165102	FMA3 floating point multiply-and-add Table
-0.165102	513 2056 38.1 97 Table
-0.357229	the program, and the runtime
-0.459861	more efficient than the runtime
-0.237675	example shows first the runtime
-0.344971	only once, while the runtime
-0.356249	linked either as a runtime
-0.331649	class. It makes a runtime
-0.858904	uses a lot of runtime
-0.640267	turn off support for runtime
-0.538519	time is wasted on runtime
-0.346163	object does not use runtime
-0.237550	a static library. A runtime
-0.531575	time rather than at runtime
-0.415786	C2::Disp() is done at runtime
-0.234246	m is transferred at runtime
-0.237399	software package, including all runtime
-0.237131	main reason why such runtime
-0.332844	must install a large runtime
-0.483437	of a very large runtime
-0.236111	are based on big runtime
-0.236072	for the common language runtime
-0.309537	code-based methods or require runtime
-0.232629	-fomit- frame- pointer No runtime
-0.212246	be aware of. Big runtime
-0.524907	on intermediate code is needed
-0.348344	processor. Extra time is needed
-0.487952	of the pointer is needed
-0.237238	a hash map is needed
-0.237238	73. Runtime polymorphism is needed
-0.357013	Global variables may be needed
-0.349119	induction variable would be needed
-0.973482	the functions that are needed
-0.237380	These table lookups are needed
-0.335483	of A is not needed
-0.136122	copy constructor is not needed
-0.136122	default constructor is not needed
-0.335483	compilers. Fastcall is not needed
-0.335483	Step (1) is not needed
-0.314582	the matrix longer than needed
-0.859608	the amount of memory needed
-0.338725	table element Instruction set needed
-0.381977	of the final size needed
-0.236176	of the extra work needed
-0.327216	library that is actually needed
-0.598813	this feature is rarely needed
-0.074777	efficient solution. Is searching needed
-0.074777	binary tree. Is searching needed
-0.356421	oriented programming as a means
-0.351584	a class member function means
-0.077604	files into one by means
-0.077604	modules into one by means
-0.232080	number 2, etc. This means
-0.287588	in most cases. This means
-0.232080	point multiplication units. This means
-0.232080	sets 4 ways. This means
-0.232080	do out-of-order execution. This means
-0.232080	every clock cycle. This means
-0.232080	32 = 28. This means
-0.314132	you should by all means
-0.289140	a local const variable means
-0.460365	to a global variable means
-0.235764	of sets). Here, / means
-0.233755	from 20 to 10 means
-0.233532	to a non-member function, means
-0.233532	good performance). Aligned operands means
-0.233547	decomposition. Functional decomposition here means
-0.231348	with truncation, and % means
-0.224909	scanners and other protection means
-0.218519	90. 15 Metaprogramming Metaprogramming means
-0.165122	as a scalar (Scalar means
-0.462686	B value in the last
-0.854266	organized so that the last
-0.356447	be accessed with the last
-0.355145	at least at the last
-0.331274	have to add the last
-0.341208	member or after the last
-0.237423	520 and leave the last
-0.237611	stored in x, and last
-0.237611	the G values, and last
-0.382764	never be negative. The last
-1.205838	is the same as last
-0.711725	the same way as last
-0.237681	goes another way than last
-0.009066	first byte at 0, last
-0.018324	bytes byte at 0, last
-0.231363	is most often true last
-0.230082	other big objects come last
-0.040468	first byte at 8, last
-0.040468	1 byte at 8, last
-0.222356	15 byte at 16, last
-0.165138	first byte at 12, last
-0.165138	first byte at 400, last
-0.237417	Some instructions are one byte
-0.313494	to find the first byte
-0.313494	to localize the first byte
-0.277959	// 2 bytes. first byte
-0.117974	// 4 bytes. first byte
-0.277959	// 8 bytes. first byte
-0.127244	// 400 bytes. first byte
-0.436610	// 4 unused bytes byte
-0.216217	last byte at 1 byte
-0.216217	needed _mm_shuffle_epi8 16 1 byte
-0.216217	SSSE3 _mm_perm_epi8 32 1 byte
-0.004671	byte at 0, last byte
-0.015971	byte at 8, last byte
-0.148044	byte at 16, last byte
-0.148044	byte at 12, last byte
-0.148044	byte at 400, last byte
-0.501392	core clock cycles per byte
-0.224934	last byte at 15 byte
-0.558560	matters rather than the parts
-0.408115	important to optimize the parts
-0.237598	is not possible when parts
-0.347089	random times and make parts
-0.341261	to manipulate the different parts
-0.432761	are stored in different parts
-0.263861	program, or between different parts
-0.263861	function. Switch between different parts
-0.312857	the costs to other parts
-0.235514	(RTTI), which affects other parts
-0.382162	of the most used parts
-0.347626	to identify the critical parts
-0.219387	security advices in critical parts
-0.219387	most important or critical parts
-0.556672	making the most critical parts
-0.219387	time. Optimizing less critical parts
-0.236073	and other system- specific parts
-0.291337	the likelihood that certain parts
-0.370715	When the most time-consuming parts
-0.315131	on the time consuming parts
-0.222336	the CPU brand. Critical parts
-0.265138	disadvantage if other nearby parts
-0.652649	b; d = a ||
-0.489684	true = a, a ||
-0.433879	you cannot replace a ||
-0.048278	false = false, a ||
-0.048278	!a = false, a ||
-0.538860	the first operand of ||
-0.237868	Boolean operators && and ||
-0.349462	or first in an ||
-0.292521	if (i < 0 ||
-0.028380	b : c (a&&b) ||
-0.028380	a+b+c=a+(b+c) (a+b)+c=a+(b+c) --xx----- (a&&b) ||
-0.028380	x--xx---- (a&&b)||(a&&!b)=a x--xx---- (a&&b) ||
-0.028380	c x-xx----- 75 (a&&b) ||
-0.028380	(a&&b&&c) = a&&b (a&&b) ||
-0.222347	|| Day == Wednesday ||
-0.330763	a&&b (a&&b) || (a&&c) ||
-0.199987	&& !b = !(a ||
-0.330763	75 (a&&b) || (!a&&c) ||
-0.199987	if (Day == Tuesday ||
-0.165128	(a&&b&&c) = a&&(b||c) (a&&!b) ||
-0.165128	int n; #if defined(__unix__) ||
-0.165128	than the equivalent if(!(a ||
-0.509048	b) { return a >
-0.314479	z; a = x >
-0.061986	// result = b >
-0.237195	(i = StringLength; i >
-0.236931	if (u.i * 2 >
-0.236059	if (a * c >
-0.230771	< ARRAYSIZE && list[i] >
-0.261119	Example 14.15a if (a >
-0.184691	function #define MAX(a,b) (a >
-0.562614	u, v; if (u.i >
-0.218496	v.i) { // u.f >
-0.356007	n) { if (n >
-0.199959	{ aa[i] = (bb[i] >
-0.199959	all 1's when bb[i] >
-0.017517	c; a = select(b >
-0.017517	c.load(cc+i); a = select(b >
-0.165102	the integer expression -a >
-0.165102	element (approximately): if (absvalue >
-0.165102	will occur: if (SIZE >
-0.165102	for the <, <=, >
-0.165102	2) { // abs(u.f) >
-0.382793	If the number and types
-0.414995	store objects of different types
-0.111921	that pointers of different types
-0.111921	two pointers of different types
-0.263091	make arrays of different types
-0.225077	conversions by using different types
-0.340419	family have two different types
-0.259324	and correspondingly two different types
-0.225077	brands of CPUs, different types
-0.237448	compilers can reduce other types
-0.101000	summarizes the different integer types
-0.101000	Sizes of different integer types
-0.232842	way of defining integer types
-0.348190	to mix the two types
-0.237017	compilers can reduce some types
-0.047626	50 7.16 Function return types
-0.047626	systems". 7.16 Function return types
-0.236587	compiler will convert these types
-0.230230	array elements of simple types
-0.315996	is efficient for simple types
-0.279441	Do objects have mixed types
-0.265151	indeed of the specified types
-0.515842	supported. The calculation of expressions
-0.343522	reduce some types of expressions
-0.546517	manipulations of floating point expressions
-0.447303	especially in floating point expressions
-0.508430	variables for floating point expressions
-0.228353	the operands are integer expressions
-0.276340	more reductions on integer expressions
-0.205513	algebraic manipulations on integer expressions
-0.228353	variables for other integer expressions
-0.228353	better at reducing integer expressions
-0.237172	Induction variables for float expressions
-0.664108	that chooses between two expressions
-0.237137	very often, but such expressions
-0.236203	that can skip large expressions
-0.224477	do, however, often write expressions
-0.224477	think that programmers write expressions
-0.235111	in many cases. Integer expressions
-0.258216	can reduce simple algebraic expressions
-0.206129	to reduce various algebraic expressions
-0.331722	understands only the simplest expressions
-0.165133	as constant references accept expressions
-0.165133	as function inlining. Reducible expressions
-0.535995	complicated code that is difficult
-1.138104	is that it is difficult
-0.345897	intrinsic functions It is difficult
-0.447121	allocated memory. It is difficult
-0.345897	has disadvantages: It is difficult
-0.345897	choose between. It is difficult
-0.354249	complicated process which is difficult
-0.574938	be very long and difficult
-0.237614	code becomes bulky and difficult
-0.855483	and it can be difficult
-0.185213	and it may be difficult
-0.314108	cache space and are difficult
-0.563899	other functions that are difficult
-0.555885	templates makes the code difficult
-0.354555	Vectorized code is more difficult
-0.339949	less clear and more difficult
-0.237126	constructor specifying otherwise. In difficult
-0.330686	program that are very difficult
-0.342899	to understand and therefore difficult
-0.437801	a*b*c*2. It is quite difficult
-0.284311	software that is slow, difficult
-0.440527	instructions to the instruction set.
-0.200508	least the same instruction set.
-0.200508	name for each instruction set.
-0.042127	on the available instruction set.
-0.042127	increased the available instruction set.
-0.200508	support the necessary instruction set.
-0.251884	for the specific instruction set.
-0.526156	in the AVX instruction set.
-0.305570	for the supported instruction set.
-1.077673	SSE2 or later instruction set.
-0.287660	with a higher instruction set.
-0.215110	SSE or higher instruction set.
-0.331467	for the desired instruction set.
-0.200508	for a given instruction set.
-0.200508	with the current instruction set.
-0.200508	for a lower instruction set.
-0.353871	use the newest instruction set.
-0.200508	have the selected instruction set.
-0.200508	on the specified instruction set.
-0.200508	without the FMA4 instruction set.
-0.200508	supports the corresponding instruction set.
-0.335585	cache lines in each set.
-0.314563	using an inline function instead
-0.294003	often inserts built-in code instead
-0.355451	by using short int instead
-0.574698	set of test data instead
-0.237192	that it has i instead
-0.237146	example 8.15a were float instead
-0.353122	pointer to the object instead
-0.442763	uses a lookup table instead
-0.489637	be stored in registers instead
-0.292549	own error handling system instead
-0.288919	avoided by using references instead
-0.742780	more performance monitor counters instead
-0.285238	polymorphism effect with templates instead
-0.228106	For example use #if instead
-0.184699	efficiency by using rounding instead
-0.184699	supports this). Use rounding instead
-0.222289	calls. 48 Use macros instead
-0.355995	an intermediate file format instead
-0.265086	with the option -fpie instead
-0.212217	#define, const or typedef instead
-0.199948	used char (or int) instead
-0.165091	operators (& and |) instead
-0.550527	Fortran are based on compilers.
-0.540920	different versions for different compilers.
-0.435590	be compiled with different compilers.
-0.377249	expressions on seven different compilers.
-0.336414	look different in other compilers.
-0.438653	are used with other compilers.
-0.344351	method works with all compilers.
-0.445996	to work on all compilers.
-0.606552	Gnu, Clang and Intel compilers.
-0.286126	well developed as C++ compilers.
-0.333035	features of Intel C++ compilers.
-0.230793	with most modern C++ compilers.
-0.351531	be expensive in some compilers.
-0.336790	Microsoft, Intel and Gnu compilers.
-0.167983	Intel, PathScale and Gnu compilers.
-0.290501	is specific to Microsoft compilers.
-0.030914	Intel, Gnu and PathScale compilers.
-0.139563	Intel, Microsoft and PathScale compilers.
-0.229201	are not compatible across compilers.
-0.226748	Intel, Gnu and Clang compilers.
-0.218531	available for the commercial compilers.
-0.354783	'this' pointer which is transferred
-0.172389	the way m is transferred
-0.237604	simple function, m is transferred
-0.237417	object, and ownership is transferred
-0.063076	function parameters to be transferred
-0.063076	four parameters to be transferred
-0.030394	fourteen parameters to be transferred
-0.347563	the parameters would be transferred
-0.125236	where the parameters are transferred
-0.125236	Simple function parameters are transferred
-0.183270	floating point parameters are transferred
-0.058119	two integer parameters are transferred
-0.058119	compiler) integer parameters are transferred
-0.125236	first four parameters are transferred
-0.082039	parameters Function parameters are transferred
-0.153126	memory. Function parameters are transferred
-0.082039	__fastcall. Function parameters are transferred
-0.233794	a and r are transferred
-0.294092	created, deleted, copied or transferred
-0.346054	a PC and then transferred
-0.330166	references. Arrays are always transferred
-0.523616	every function that is longer
-0.760822	is converted to a longer
-0.531464	at the cost of longer
-0.356744	or method should be longer
-0.237565	justify the method. A longer
-0.352332	the object is no longer
-0.282394	spaces that are no longer
-0.282394	when they are no longer
-0.324528	the one that takes longer
-0.301335	is started. It takes longer
-0.305783	an unsigned integer takes longer
-0.367203	multiplication Integer multiplication takes longer
-0.331329	program appear to take longer
-0.222842	the multiplication would take longer
-0.222842	Multiplication and division take longer
-0.222842	as additions. Divisions take longer
-0.348391	context switches by making longer
-0.070662	point division takes much longer
-0.070662	Integer division takes much longer
-0.155098	because truncation takes much longer
-0.339610	rows in the matrix longer
-0.234378	instructions are one byte longer
-0.048257	the time before and after
-0.048257	your program before and after
-0.048257	stamp counter before and after
-0.048257	clock count before and after
-0.237759	first data member or after
-0.503333	b are the same after
-0.328959	are needed, but only after
-0.235557	of doing things only after
-0.351067	to access an object after
-0.349927	solution. Sort the array after
-0.441673	no object is accessed after
-0.291765	to do the check after
-0.325397	approximately two clock cycles after
-1.006655	a few clock cycles after
-0.378068	tree. Is searching needed after
-0.232273	values are then output after
-0.229182	calling any necessary destructors after
-0.311701	jobs. The context switches after
-0.212240	but may be removed after
-0.199970	have to execute _mm_empty() after
-0.165112	loop is to resume after
-0.165112	file will remain locked after
-0.324417	to get used to read
-0.752251	if we want to read
-0.443088	few clock cycles to read
-0.237130	not testing. Trying to read
-0.237130	GetPrivateProfileString and WritePrivateProfileString to read
-0.237865	a memory buffer and read
-0.356487	cache line to be read
-0.357853	case x can be read
-0.351163	that a must be read
-0.355753	issue, as you can read
-0.347360	a time. Do not read
-0.354139	executed. Furthermore, you may read
-0.353643	page 16. If you read
-0.344859	specified. The code will read
-0.235562	high-level language need only read
-0.235562	safer implementation would only read
-0.237421	example 16.2 above, but read
-0.293230	be evicted when we read
-0.229191	instructions were splitting 256-bit read
-0.226713	if the program had read
-0.218519	expensive than an uncached read
-0.199981	not expect to 99 read
-0.450255	inline assembly code to give
-1.564776	it is possible to give
-0.237323	a hyperthreading processor to give
-0.237323	keyword wherever appropriate to give
-0.341495	generate an overflow and give
-0.237601	generate an underflow and give
-0.294118	examples. The table can give
-0.700516	of the class or give
-0.353787	the unit-test does not give
-0.237671	OR operator (^) may give
-0.236067	<< x.f; // will give
-0.292122	appropriate header file will give
-0.713320	The CPU dispatcher should give
-0.350590	and Linux operating systems give
-0.236128	the CPUID instruction doesn't give
-0.379528	tasks because this would give
-0.232289	often unreliable. They sometimes give
-0.320227	this solution can still give
-0.285275	cached. The subsequent counts give
-0.191011	overflow and negative inputs give
-0.191011	to 12. Higher inputs give
-0.419463	particular piece of code. Each
-0.236650	Dispatch at installation time. Each
-0.232236	lot of extra resources. Each
-0.287345	size of 64 bytes. Each
-0.514918	stored on the stack. Each
-0.231318	for implementing polymorphic classes. Each
-0.372846	of floating point instructions. Each
-0.328809	only 64-bit execution units. Each
-0.226714	do two jobs simultaneously. Each
-0.495227	job into multiple threads. Each
-0.224864	the multiple processor cores. Each
-0.272207	Load library at initialization. Each
-0.218484	save exception handling information. Each
-0.313468	it at the diagonal. Each
-0.806085	for the following reasons: Each
-0.641992	as a linked list. Each
-0.165091	Class member functions (methods) Each
-0.165091	line size of 64. Each
-0.165091	the output are unacceptable. Each
-0.165091	variables Y and Z. Each
-0.165091	29 with line 29. Each
-0.654146	the event that it becomes
-0.355355	object static then it becomes
-0.480457	tedious and the code becomes
-0.519812	is that the code becomes
-0.341242	better because the code becomes
-0.341242	the contrary, the code becomes
-0.305917	page 122. The code becomes
-0.305917	becomes contiguous. The code becomes
-0.305917	and tedious. The code becomes
-0.317857	the resulting machine code becomes
-0.353530	CPU. Unrolling a loop becomes
-0.488868	of nontemporal write instructions becomes
-0.380315	that communication between threads becomes
-0.335000	number 28. The calculation becomes
-0.495934	The time stamp counter becomes
-0.427044	(PLT). The memory space becomes
-0.298198	26. The heap space becomes
-0.222975	become fragmented and caching becomes
-0.222975	so big that caching becomes
-0.234768	The memory space never becomes
-0.165143	e.g. a menu click becomes
-0.347466	aligned(16))) Assume pointer is aligned
-0.553864	data have to be aligned
-0.355896	or class should be aligned
-0.351169	XMM vectors must be aligned
-0.877112	if the data are aligned
-0.310807	require that data are aligned
-0.508623	that the arrays are aligned
-0.508623	whether the arrays are aligned
-0.236663	then use #pragma vector aligned
-0.102406	is aligned #pragma vector aligned
-0.102406	vector aligned #pragma vector aligned
-0.802346	of how to make aligned
-1.052089	// Function to store aligned
-0.333236	or malloc is typically aligned
-0.235024	YMM vectors are preferably aligned
-0.234473	); // Make three aligned
-1.046594	// Function to load aligned
-0.287842	all instances of S1 aligned
-0.167985	0.44 0.12 memcpy 16kB aligned
-0.167985	Test Processor memcpy 16kB aligned
-0.199993	the arrays are properly aligned
-0.237871	have many keywords and directives
-0.344068	code. Many of these directives
-0.411254	explanation. Note that these directives
-0.452130	most of the Gnu directives
-0.236036	C++ and Fortran. These directives
-0.532335	by means of #include directives
-0.321378	most of the Microsoft directives
-0.216458	resolved at runtime. #define directives
-0.216458	given a name. #define directives
-0.230071	-opt-report Table 18.2. Compiler directives
-0.087037	exception handling. 8.6 Optimization directives
-0.087037	................................................................................... 81 8.6 Optimization directives
-0.074780	Documentation of the OpenMP directives
-0.074780	PSDK). Supports the OpenMP directives
-0.165171	threads Parallelization by OpenMP directives
-0.191039	code should have #if directives
-0.191039	program is compiled. #if directives
-0.048387	7.32 Preprocessing directives Preprocessing directives
-0.023520	time-critical code. 7.32 Preprocessing directives
-0.023520	.............................................................................. 65 7.32 Preprocessing directives
-0.199993	The library has preprocessing directives
-0.314181	device. Any language that requires
-0.237445	a strict formalism that requires
-0.354131	several seconds because it requires
-0.352272	CPU core, but it requires
-0.236488	references. Most importantly, it requires
-0.602510	of the program. This requires
-0.379938	the memory block. This requires
-0.235831	on this option. This requires
-0.237618	Windows operating system, this requires
-0.237577	Gnu C library. It requires
-0.236717	or other hardware often requires
-0.168917	difference between two pointers requires
-0.168917	addition. Comparing two pointers requires
-0.316772	the array. This method requires
-0.316772	is loaded. This method requires
-0.236391	operation on such processors requires
-0.235891	is enabled (single precision requires
-0.235807	f; } This calculation requires
-0.233965	use of intrinsic vectors requires
-0.233746	to remote databases usually requires
-0.199987	exceptions, etc. Event-based sampling requires
-0.339485	compiler from doing the optimizations
-0.480919	make this kind of optimizations
-0.325211	Table 8.1. Comparison of optimizations
-0.462053	some of the compiler optimizations
-0.291483	a result of other optimizations
-0.235506	also makes various other optimizations
-0.336721	some indication of which optimizations
-0.322290	to do and which optimizations
-0.235326	for turning off all optimizations
-0.235326	register and prevents all optimizations
-0.523069	is necessary to do optimizations
-0.309937	compiler warning for such optimizations
-0.309937	compilers will do such optimizations
-0.233128	prevents it from making optimizations
-0.743740	the compiler from making optimizations
-0.312534	to make CPU- specific optimizations
-0.589753	the compiler from doing optimizations
-0.335922	throw() statement can improve optimizations
-0.286125	optimization, which will enable optimizations
-0.218542	compiler to do interprocedural optimizations
-0.165122	compiler to do cross-module optimizations
-0.463304	computational power of the graphics
-0.222463	function call to a graphics
-0.222463	A call to a graphics
-0.222463	single call to a graphics
-0.355256	a platform with a graphics
-0.336622	the processors on a graphics
-0.336622	unit, either on a graphics
-0.352168	Some systems have a graphics
-0.515842	calls. The calculation of graphics
-0.510327	CPUs, different types of graphics
-0.237773	a graphics coprocessor or graphics
-1.154573	if there is no graphics
-0.292276	you avoid the large graphics
-0.347843	resources. Typically, a specific graphics
-0.234164	of extra resources. Each graphics
-0.241220	some of the heavy graphics
-0.191021	For example, a heavy graphics
-0.212289	manual does not cover graphics
-0.199993	layer of a third-party graphics
-0.199993	graphics processing unit. Various graphics
-0.165133	other purposes than rendering graphics
-0.523878	relative reference to a public
-0.487370	steps to access a public
-0.314317	(PLT). And whenever a public
-0.237912	that allows overriding of public
-0.299419	accesses to functions and public
-0.389225	All public functions and public
-0.237639	shared. You can't have public
-0.349141	a GOT for all public
-0.237041	error by avoiding any public
-0.035393	54 class D : public
-0.035393	B2; class D : public
-0.035393	}; class C1 : public
-0.035393	Disp(); class C1 : public
-0.035393	declaration class CChild1 : public
-0.035393	versions: class CChild1 : public
-0.163078	}; class CChild2 : public
-0.163078	MyChild> class CParent : public
-0.163078	}; class C2 : public
-0.235377	that need relocation. All public
-0.251323	the ability to override public
-0.165148	D : public B1, public
-0.217361	7.44 class C1 { public:
-0.217361	N> class powN { public:
-0.061108	functions class CHello { public:
-0.014490	: public CHello { public:
-0.095221	Devirtualization class C0 { public:
-0.095221	: public C0 { public:
-0.498707	: public CParent<CChild1> { public:
-0.095221	class: class CGrandParent { public:
-0.095221	: public CGrandParent { public:
-0.217361	B1, public B2 { public:
-0.217361	: public B1 { public:
-0.217361	template<> class powN<true,0> { public:
-0.217361	N> class powN<true,N> { public:
-0.217361	7.36 class S2 { public:
-0.217361	7.37 class S3 { public:
-0.217361	: public CParent<CChild2> { public:
-0.217361	template<> class powN<true,1> { public:
-0.310313	{ const int x; public:
-0.218584	// 2-dimensional vector 56 public:
-0.165179	{ protected: T a[N]; public:
-0.659233	during installation of the framework
-0.458644	takes to load the framework
-0.294047	Programs using such a framework
-0.294047	may supply such a framework
-0.325220	the intermediate code. This framework
-0.342901	to choose a software framework
-0.343298	software. Such an extra framework
-0.289041	program, and the runtime framework
-0.091659	install a large runtime framework
-0.091659	a very large runtime framework
-0.219465	Typically, a specific graphics framework
-0.219465	of a third-party graphics framework
-0.217888	choice of user interface framework
-0.318309	Choice of user interface framework
-0.216480	because the high level framework
-0.154150	where a high level framework
-0.224903	package on a complex framework
-0.132901	code for the .NET framework
-0.044147	runtime frameworks. The .NET framework
-0.044147	the mouse. The .NET framework
-0.093295	languages in Microsoft's .NET framework
-1.246534	it is necessary to look
-0.640118	the compiler needs to look
-0.353754	of a compiler can look
-0.331682	CPU dispatcher should not look
-0.352602	level 108 You may look
-0.235446	A C++ implementation may look
-0.235446	A test setup may look
-0.352964	thing and if you look
-0.351851	equally efficient. If you look
-0.235356	references are: When you look
-0.293814	some embedded systems. A look
-0.324949	compilers. // It will look
-0.324990	either case. Intrinsic functions look
-0.348616	counts that you should look
-0.324672	then you may also look
-0.292916	is necessary to first look
-0.234981	range. This may typically look
-0.148708	in the code. Let's look
-0.148708	= 4 rows. Let's look
-0.199981	automatically. For example, let's look
-0.199981	a self-relative address. (3) look
-0.298721	The mechanism of static linking
-0.216001	is clear that static linking
-0.216001	but not if static linking
-0.216001	executable file when static linking
-0.223204	advantages of using static linking
-0.159969	improved by using static linking
-0.216001	system, this requires static linking
-0.216001	recommended to specify static linking
-0.596752	The advantages of dynamic linking
-0.213717	first application if dynamic linking
-0.213717	linking rather than dynamic linking
-0.213717	the cases where dynamic linking
-0.213717	the application, while dynamic linking
-0.174220	linking is used. Dynamic linking
-0.174220	the end user. Dynamic linking
-0.078432	....................................................................................................... 19 3.6 Dynamic linking
-0.078432	is acceptable. 3.6 Dynamic linking
-0.231400	functions /Gr Function level linking
-0.229238	inline assembly or easy linking
-0.165182	64 bit code Static linking
-0.165182	dynamic linking are: Static linking
-0.528391	places in the code. Many
-0.235718	preferable for speed-critical functions. Many
-0.235263	you start to program. Many
-0.403022	libraries are discussed below. Many
-0.289903	fast on newer processors. Many
-0.376682	implement in a compiler. Many
-0.742780	Using performance monitor counters Many
-0.372830	manner. 3.4 Automatic updates Many
-0.473408	for automatic CPU dispatching. Many
-0.284291	support for 64-bit integers. Many
-0.226696	a very inefficient solution. Many
-0.276479	used with other microprocessors. Many
-0.500858	20 3.9 Other databases Many
-0.218484	on all relevant options. Many
-0.413507	Waiting for user input. Many
-0.199948	Optimization Reference Manual". developer.intel.com. Many
-0.199948	division to be slower. Many
-0.165091	whole workday or more. Many
-0.165091	problems and system breakdown. Many
-0.165091	security software. Background services. Many
-0.165091	handle unknown processors properly. Many
-0.237511	workstations and scientific vector processors.
-0.314225	is fastest on different processors.
-0.284598	works only with Intel processors.
-0.248533	but not on Intel processors.
-0.248533	also works on Intel processors.
-0.229448	Core and later Intel processors.
-0.593124	well only on some processors.
-0.235690	to handle only known processors.
-0.234164	does not cover graphics processors.
-0.092696	on AMD and VIA processors.
-0.339058	the number of logical processors.
-0.206110	processors but eight logical processors.
-0.269867	256 bytes) on future processors.
-0.200008	processors rather than future processors.
-0.200008	particularly fast on newer processors.
-0.200008	important on most newer processors.
-0.228126	be implemented in PC processors.
-0.470694	running on the newest processors.
-0.265138	64 bytes on contemporary processors.
-0.355800	the library that is actually
-0.357050	+ 0.666666666666666666667; This is actually
-0.638948	the calculation time is actually
-0.855916	if the program is actually
-0.293452	sizeof(S1) = 16 is actually
-0.331892	than 1% goes to actually
-0.347421	any extra code for actually
-0.643867	of the functions are actually
-0.347370	and Gnu compilers are actually
-0.442035	parallel. Modern CPUs are actually
-0.236474	low power consumption are actually
-0.354637	data files. This can actually
-0.339327	as bigger than it actually
-0.549880	fact, the compiler may actually
-0.236561	CISC instruction set may actually
-0.237272	whether the original pointer actually
-0.494538	long that the user actually
-0.218531	check if your modifications actually
-0.212263	function in case F2 actually
-0.199993	The name "position-independent code" actually
-0.165133	+ 100*16, and temp++ actually
-0.020744	3. The microarchitecture of Intel,
-0.000317	3: "The microarchitecture of Intel,
-0.237863	and micro-operation breakdowns for Intel,
-0.340255	PC platform with an Intel,
-0.680325	processor is not an Intel,
-0.237551	family of microprocessors from Intel,
-0.237335	dvec.h vectorclass.h Supported compilers Intel,
-0.235214	test tool supports both Intel,
-0.290517	Supported compilers Intel, Microsoft Intel,
-0.062860	code for the Microsoft, Intel,
-0.062860	good as the Microsoft, Intel,
-0.136383	are supported by Microsoft, Intel,
-0.136383	all x86 platforms. Microsoft, Intel,
-0.462509	by the Gnu, Clang, Intel,
-0.352090	the address of a linked
-0.646297	the form of a linked
-0.461378	Each element in a linked
-0.335564	size, not as a linked
-0.335564	buffer than as a linked
-0.526604	often implemented as a linked
-0.648767	Do not use a linked
-0.351161	block. Walking through a linked
-0.355411	code and can be linked
-0.522115	interface library can be linked
-0.459255	three versions should be linked
-0.313100	in most cases be linked
-0.237838	which the modules are linked
-0.457907	arrays are faster than linked
-0.237617	solution. Many containers use linked
-0.292170	the next block. A linked
-0.236110	use linked lists. A linked
-0.314341	intermediate files are then linked
-0.343150	addresses of library functions linked
-0.233023	the addresses of dynamically linked
-0.165148	of runtime DLL's (dynamically linked
-0.294086	{ float xn = x;
-0.456053	c1 { const int x;
-0.048024	coefficients double Table[100]; int x;
-0.048024	3.3; double Table[100]; int x;
-1.318198	f; int i; } x;
-0.512643	matrix[rows][columns]; int i; float x;
-0.231138	int i, j; float x;
-0.231138	// Example 7.27 float x;
-0.331302	x2 = x * x;
-0.990734	p(double x) { return x;
-0.228142	100; i++) matrix[FuncRow(i)][FuncCol(i)] += x;
-0.228142	j++) 39 matrix[i][j] += x;
-0.095667	*= x; x *= x;
-0.095667	& 1) y *= x;
-0.022014	n; x++) factorial *= x;
-0.022014	i--, x++) factorial *= x;
-0.095667	/ nfac; xn *= x;
-0.301446	void F1() { C1 x;
-0.068032	int c:2; }; Bitfield x;
-0.068032	char abc; }; Bitfield x;
-0.200004	__asm fld qword ptr x;
-0.336086	of competing brands of microprocessors
-0.237718	the x86 family of microprocessors
-0.347475	of how compilers and microprocessors
-0.324612	Some versions of Intel microprocessors
-0.319802	loop buffer that some microprocessors
-0.319802	than normal on some microprocessors
-0.685778	lies in the way microprocessors
-0.087736	is compatible with old microprocessors
-0.087736	be compatible with old microprocessors
-0.200947	high speed of modern microprocessors
-0.200947	execution core of modern microprocessors
-0.177905	quite inefficient. The modern microprocessors
-0.244020	in almost all modern microprocessors
-0.230067	operating system All newer microprocessors
-0.136378	variables and operators Modern microprocessors
-0.136378	3.15 Dependency chains Modern microprocessors
-0.136378	is branch prediction. Modern microprocessors
-0.136378	advanced prediction mechanisms. Modern microprocessors
-0.218537	the compatibility with older microprocessors
-0.218573	functions if possible. Smaller microprocessors
-0.165138	Using vector operations Today's microprocessors
-0.510415	takes more time to load
-0.235983	cause the cache to load
-1.719270	time it takes to load
-0.815068	without the need to load
-0.350597	makes it necessary to load
-0.213261	functions // Function to load
-0.213261	classes // Function to load
-0.213261	SSE4.1 // Function to load
-0.213261	); // Function to load
-0.489112	because it needs to load
-0.312438	causes all writes to load
-0.294167	to execute it. The load
-0.351628	operating system may not load
-0.344874	the optimized code will load
-0.291931	different compilers. Dispatch at load
-0.235900	not need relocation at load
-0.111722	increased when the work load
-0.111722	decreased when the work load
-0.519176	linker to a specific load
-0.306426	to fit the actual load
-0.165153	large part of it) load
-0.648530	no easy way to control
-0.294009	have various options to control
-0.116850	integer in the loop control
-0.116850	100 in the loop control
-0.466518	efficient if the loop control
-0.277201	than by the loop control
-0.362059	times then the loop control
-0.277201	situation where the loop control
-0.277201	can predict the loop control
-0.277201	can execute the loop control
-0.277201	to evaluate the loop control
-0.324741	example 7.30b. The loop control
-0.216234	The most efficient loop control
-0.216234	advantages: The i<20 loop control
-0.232898	There are other cache control
-0.047580	96 9.11 Explicit cache control
-0.047580	2001. 9.11 Explicit cache control
-0.338745	Mac, BSD Instruction set control
-0.237105	to use a version control
-0.226783	SSE2 Table 9.2. Cache control
-0.836518	allows the compiler to assume
-0.727290	tell the compiler to assume
-0.351160	Therefore, it has to assume
-0.237330	is not permissible to assume
-0.382793	ignore the problem and assume
-0.442512	most cases, you can assume
-0.627117	In general, you can assume
-0.326891	instruction code. You can assume
-0.326891	the result. You can assume
-0.314582	to ignore overflow or assume
-0.237674	set. Neither can you assume
-0.319950	: 2.5f; If we assume
-0.233472	library function which we assume
-0.480341	set, then you cannot assume
-0.339734	clock cycles. You cannot assume
-0.291531	an optimizing compiler would assume
-0.082616	then you can generally assume
-0.082616	16. You can generally assume
-0.165143	guess, that compiler makers assume
-0.165143	the compiler can safely assume
-0.301209	const int size = 100;
-0.234215	const int ARRAYSIZE = 100;
-0.234215	= 100, NUMCOLUMNS = 100;
-0.019747	= 0; x < 100;
-0.480434	= 0; i < 100;
-0.357846	} Assume that the numbers
-0.456713	numbers because all the numbers
-0.382645	enough to hold the numbers
-0.725700	integers to floating point numbers
-0.798583	integers and floating point numbers
-0.500365	conversions from floating point numbers
-0.061021	Conversions between floating point numbers
-0.322691	two positive floating point numbers
-0.418023	of nonzero floating point numbers
-0.340178	point variables Floating point numbers
-0.281892	250 times with four numbers
-0.227063	can only have four numbers
-0.235631	you can have eight numbers
-0.468891	CPU family and model numbers
-0.397877	Assuming that processor model numbers
-0.226757	an array of thousand numbers
-0.200015	rather than generating denormal numbers
-0.200015	if we use hexadecimal numbers
-0.165153	byte of data (low numbers
-0.354610	preferably implemented on a platform
-0.520512	accelerators The choice of platform
-0.494087	application to a different platform
-0.336555	not _WIN64 64 bit platform
-0.227725	Compiler identification 16 bit platform
-0.333606	__INTEL_COMPILER 161 32 bit platform
-0.236170	_LP64 _WIN64 _LP64 Windows platform
-0.235906	platform _WIN32 _WIN32 Linux platform
-0.294410	depend on the hardware platform
-0.020326	the choice of hardware platform
-0.020326	The choice of hardware platform
-0.010044	2.1 Choice of hardware platform
-0.118970	2 Choosing the optimal platform
-0.321378	free in the Microsoft platform
-0.233786	__linux__ __unix__ __linux__ x86 platform
-0.497888	on the standard PC platform
-0.218551	platform _M_IX86 _M_IX86 x86-64 platform
-0.212263	conflicting considerations of efficiency, platform
-0.581067	case the code is later
-0.339797	by one function and later
-0.236311	optimized for SSE2 and later
-0.236311	for the AVX and later
-0.380608	support the SSE and later
-0.236311	the Intel Core and later
-0.236311	into the pipeline and later
-0.292399	from address 0x2710 and later
-0.237868	be very helpful for later
-0.014055	when the SSE2 or later
-0.028580	unless the SSE2 or later
-0.028580	enable the SSE2 or later
-0.101212	Enable the AVX or later
-0.101212	compiled for AVX or later
-0.233417	binutils version 2.20 or later
-0.233417	unless the Pentium-II or later
-0.356909	we expect to use later
-0.236396	Intel SVML v.10.3 & later
-0.352072	- 20 clock cycles later
-1.512264	part of the code together
-0.241051	data that are used together
-0.066504	Variables that are used together
-0.104012	Functions that are used together
-0.474733	pool all the objects together
-0.289346	to store many objects together
-0.136688	together should be stored together
-0.439459	derived class are stored together
-0.491946	together are also stored together
-0.215453	properties) are always stored together
-0.400369	and can be linked together
-0.219484	files are then linked together
-0.333940	you want to keep together
-0.276565	the whole software project together
-0.382902	program will be joined together
-0.355474	dispatched function then the dispatch
-0.353891	a program where the dispatch
-0.345922	different places making the dispatch
-0.314268	CPUs. It uses the dispatch
-0.407558	how to implement the dispatch
-0.293772	the compiler bypassing the dispatch
-0.237922	function is called, a dispatch
-0.237919	it makes sense to dispatch
-0.437474	avoiding the virtual function dispatch
-0.286892	the inefficient virtual function dispatch
-0.519563	Note that the CPU dispatch
-0.293721	13.5 Implementation The CPU dispatch
-0.293721	detection mechanism. The CPU dispatch
-0.276499	was developed. A CPU dispatch
-0.222306	instr. set Automatic CPU dispatch
-0.222306	Intel have similar CPU dispatch
-0.097082	sets........................... 122 13.1 CPU dispatch
-0.097082	innermost loops. 13.1 CPU dispatch
-0.222306	programs use inappropriate CPU dispatch
-0.234518	is wasted on runtime dispatch
-0.657960	well-defined interface to the calling
-0.339410	error code which the calling
-0.575591	The powN template is calling
-0.314690	for cleaning up and calling
-0.314670	a global object. The calling
-0.540930	program is useful for calling
-0.350134	mode because the function calling
-0.452477	__fastcall changes the function calling
-0.291683	details that make function calling
-0.474830	using the same function calling
-0.236284	created a file by calling
-0.542863	should be avoided by calling
-0.236284	can be prevented by calling
-0.515659	is as fast as calling
-0.647577	clock cycles more than calling
-0.232482	a temporary array before calling
-0.844423	then call _mm256_zeroupper() before calling
-0.312546	to obey any specific calling
-0.441745	conform to the standard calling
-0.506501	19 in manual 5: calling
-0.237902	within the lifetime of your
-0.237903	can get answers to your
-0.237625	a decimal point in your
-0.237625	insert a switch in your
-0.293726	is not necessary for your
-0.606584	See the manual for your
-0.237851	important to remember that your
-0.479987	have to check if your
-0.237067	the other hand, if your
-0.310075	64 If you make your
-0.377390	problems you must make your
-0.234001	range"); or better, make your
-0.237448	calls to CriticalFunction. If your
-0.236845	take several years before your
-0.235874	the performance counters inside your
-0.291128	needed. You may write your
-0.235141	a stack frame unless your
-0.315780	more efficient to define your
-0.199981	So please don't send your
-0.165122	of the code. Inserting your
-0.256501	must return to its own
-0.229780	each object in its own
-0.247410	then run on its own
-0.229780	does not have its own
-0.099867	Each thread has its own
-0.099867	linked list has its own
-0.099867	CPU model has its own
-0.180815	give each thread its own
-0.180815	object: (1) get its own
-0.180815	should then handle its own
-0.222976	equivalent reductions at their own
-0.222976	programmers rarely program their own
-0.135694	If you make your own
-0.135694	or better, make your own
-0.175357	You may write your own
-0.175357	efficient to define your own
-0.175357	the code. Inserting your own
-0.268517	are based on my own
-0.215253	the code. For my own
-0.230104	and Linux. Asmlib My own
-0.294065	of a class is declared
-0.382650	above template class is declared
-0.346634	The table should be declared
-0.448051	Big arrays should be declared
-0.350109	multiple threads must be declared
-0.320913	objects should preferably be declared
-0.344626	Function parameters that are declared
-0.535988	storage Variables that are declared
-0.956417	in which they are declared
-0.336833	on how they are declared
-0.237717	If seconds was not declared
-0.104685	non-static variables and objects declared
-0.318635	stack Variables and objects declared
-0.281577	detect if any objects declared
-0.293185	also used for variables declared
-0.287360	of functions A macro declared
-0.208509	inside a class Variables declared
-0.208509	of course inefficient. Variables declared
-0.658609	precision variables in the XMM
-0.459888	compilers will use the XMM
-0.063631	double precision when the XMM
-0.342131	the processor) when the XMM
-0.237898	floating point underflow in XMM
-0.237859	for 64-bit Windows). The XMM
-0.451080	system has support for XMM
-0.340907	does not check if XMM
-0.237072	conversion is costly if XMM
-0.351656	- n.a. Floating point XMM
-0.291748	A good implementation uses XMM
-0.235118	x - - Integer XMM
-0.234782	- - 76 Boolean XMM
-0.319204	fits into a 128-bit XMM
-0.162404	For example, a 128-bit XMM
-0.138116	64-bit MMX to 128-bit XMM
-0.063590	point code. The 128-bit XMM
-0.063590	YMM registers The 128-bit XMM
-0.222361	of register stack versus XMM
-1.514525	the value of the second
-0.831125	is added to the second
-0.356630	false (0); and the second
-0.501101	comes only in the second
-0.356126	linked together in the second
-0.501943	start calculations on the second
-0.342925	is true, then the second
-0.342925	is false, then the second
-0.354905	is done at the second
-0.346493	operand determines whether the second
-0.237089	x, and last the second
-0.355066	replaced i by a second
-0.351804	must go through a second
-0.237567	see below. Installing a second
-0.343761	sequences of code. The second
-0.415399	non-polymorphic member functions. The second
-0.293078	the polymorphic functions. The second
-0.236789	(See page 137). The second
-0.235494	seconds; // incremented every second
-0.165174	such as price, compatibility, second
-0.121849	functions. The following example shows
-0.121849	efficient. The following example shows
-0.121849	set. The following example shows
-0.121849	processors. The following example shows
-0.121849	not. The following example shows
-0.121849	vectors. The following example shows
-0.121849	again. The following example shows
-0.121849	www.agner.org/optimize/asmlib.zip. The following example shows
-0.121849	multiplications. The following example shows
-0.121849	Atom). The following example shows
-0.114522	is InstructionSet().The following example shows
-0.210743	metaprogramming. The next example shows
-0.234060	per element. The table shows
-0.234060	table 8.1. The table shows
-0.272257	chosen expression. Example 12.4b shows
-0.200021	profiling (see page 16) shows
-0.200021	example on page 39 shows
-0.200021	7.43 on page 58 shows
-0.165159	Table 8.1 (page 77) shows
-0.165159	next section (page 131) shows
-0.467589	of programming language and interface
-0.352475	important that the user interface
-0.244777	time on the user interface
-0.244777	to place the user interface
-0.123914	the choice of user interface
-0.027824	2.7 Choice of user interface
-0.231891	Linux systems. The user interface
-0.167468	same time. A user interface
-0.167468	The simplest possible user interface
-0.167468	can use standard user interface
-0.167468	the code, including user interface
-0.155279	dropping the graphical user interface
-0.441177	of a graphical user interface
-0.155279	Graphics A graphical user interface
-0.167468	tools. A popular user interface
-0.191079	Library (OWL). Several graphical interface
-0.191079	depend on system-specific graphical interface
-0.119310	functionality and a well-defined interface
-0.176504	modules with a well-defined interface
-0.798502	may be possible to improve
-0.555339	code in order to improve
-0.814189	data in order to improve
-1.017906	If you want to improve
-0.237146	use these methods to improve
-0.456990	the stack. This can improve
-0.325517	clock cycles. You can improve
-0.325517	preceding one. You can improve
-0.290390	A hash table can improve
-0.290390	The 64-bit systems can improve
-0.290390	empty throw() statement can improve
-0.234545	size (typically 64) can improve
-0.237723	However, this did not improve
-0.333386	other advantages that may improve
-0.342646	two entries. This may improve
-0.457536	... then you may improve
-0.457536	code, then you may improve
-0.288917	function because this may improve
-0.335843	redesign can not only improve
-0.325426	(if valid) can possibly improve
-0.237936	simply by ignoring the higher
-0.325374	of these functions is higher
-0.568322	programmer. There is a higher
-0.336909	header file for a higher
-0.336909	is designed for a higher
-0.345675	a CPU with a higher
-0.345675	A model with a higher
-0.351144	typically loaded at a higher
-0.461646	be expected to be higher
-0.237827	another. These costs are higher
-0.382643	and the SSE or higher
-0.237558	and microprocessor microarchitecture. A higher
-0.237517	small code size has higher
-0.313683	instruction set or any higher
-0.228606	you measure are much higher
-0.228606	millisecond resolution. A much higher
-0.448793	only when the next higher
-0.321954	hyperthreading processor to give higher
-0.341577	first count is usually higher
-0.165128	execution units and hence higher
-0.497847	in a program is bigger
-0.434683	and the matrix is bigger
-0.614874	the size parameter is bigger
-0.352672	gives the advantage of bigger
-0.632667	floating point code. The bigger
-0.357378	interface library may be bigger
-0.356220	to arrays that are bigger
-0.294018	frameworks typically used on bigger
-0.407848	object is treated as bigger
-0.903717	in the innermost loop bigger
-0.425419	beginning of the new bigger
-0.322833	require that a new bigger
-0.671936	to allocate a new bigger
-0.236170	occurs, even for arrays bigger
-0.333113	operating systems that allows bigger
-0.496296	that the code becomes bigger
-0.288604	software projects have become bigger
-0.231858	with a total offset bigger
-0.222318	power of 2. Objects bigger
-0.222318	up with the ever bigger
-1.130821	recommended to use the vectors
-0.237847	way by wrapping the vectors
-0.336250	at a time in vectors
-0.638756	12.7 Mathematical functions for vectors
-0.324460	and mathematical operations on vectors
-0.331156	doing parallel calculations on vectors
-0.293588	AVX2 256 bit integer vectors
-0.716429	in speed by using vectors
-0.276779	bit integer and double vectors
-0.411069	bit float and double vectors
-0.313845	SSE 128 bit float vectors
-0.338480	do use the 64-bit vectors
-0.234785	the use of intrinsic vectors
-0.423419	code. The 128-bit XMM vectors
-0.233990	point code. The bigger vectors
-0.342945	// x^4 // Define vectors
-0.283096	below). The 256-bit YMM vectors
-0.347413	SSE2 instruction set (128 vectors
-0.013564	RGB video or 3-dimensional vectors
-0.356225	: EXCEPTION_CONTINUE_SEARCH) { // Floating
-0.501458	mix float and double Floating
-0.356607	>= b) - n.a. Floating
-0.719824	manipulating floating point variables Floating
-0.755203	registers in 64-bit systems. Floating
-0.689345	14.6 Floating point division Floating
-0.318481	= multiply and shift Floating
-0.342326	- 45 clock cycles. Floating
-0.373710	used for multiple purposes. Floating
-0.463589	to floating point expressions. Floating
-0.226695	functions with integer parameters. Floating
-0.226727	of the 32-bit integer. Floating
-0.122625	list[300] = 0; 14.6 Floating
-0.122625	Integer division...................................................................................................... 137 14.6 Floating
-0.251272	explained on page 105. Floating
-0.074754	and operators............................................................................... 29 7.3 Floating
-0.074754	integer variables. 31 7.3 Floating
-0.165107	register stack is organized. Floating
-0.165107	- 45 clock cycles). Floating
-0.165107	by the programmer. 79 Floating
-0.461822	select function, and the AVX2
-0.237859	the performance somewhat. The AVX2
-0.438926	SSE4.1 and one for AVX2
-0.294102	SSE2 // SSE4.1 // AVX2
-0.348748	use AVX only when AVX2
-0.233126	bytes = double 2 AVX2
-0.233126	bytes = int64_t 2 AVX2
-0.278043	bytes = int 4 AVX2
-0.223670	bytes = double 4 AVX2
-0.278043	bytes = float 4 AVX2
-0.223670	bytes = int64_t 4 AVX2
-0.099158	bytes = int 8 AVX2
-0.099158	float or int 8 AVX2
-0.227870	bytes = float 8 AVX2
-0.328849	long 64 4 256 AVX2
-0.328849	int 32 8 256 AVX2
-0.328849	int 16 16 256 AVX2
-0.198569	char 8 32 256 AVX2
-0.377361	float and double vectors AVX2
-0.200004	set available, e.g. AVX, AVX2
-0.341762	before and after the piece
-0.314675	is to insert the piece
-0.828736	which version of a piece
-0.351149	means that if a piece
-0.509252	- to make a piece
-0.733858	possible to make a piece
-0.641198	AVX part. If a piece
-0.292406	no idea how a piece
-0.236317	want to optimize a piece
-0.427538	want to generate a piece
-0.236317	the compiler optimizes a piece
-0.236317	but for studying a piece
-0.237773	background calculations piece by piece
-0.558384	to share the same piece
-0.352795	for executing the same piece
-0.330842	after executing a critical piece
-0.236187	the heavy background calculations piece
-0.270052	register is a small piece
-0.270052	rather than a small piece
-0.350135	job optimizing a particular piece
-0.062813	memory address that is divisible
-0.335959	a constant that is divisible
-0.354433	memory address which is divisible
-0.280016	the loop count is divisible
-0.502246	not certain to be divisible
-0.351730	// SIZE must be divisible
-0.351959	of points is not divisible
-0.351959	of iterations is not divisible
-0.293533	large arrays. Array size divisible
-0.073532	aligned to an address divisible
-0.022231	stored at an address divisible
-0.045662	start at an address divisible
-0.045662	begin at an address divisible
-0.090484	requires alignment to addresses divisible
-0.090484	data structures to addresses divisible
-0.204919	destination both have addresses divisible
-0.256852	at round memory addresses divisible
-0.977032	by the use of <<
-0.234498	nonzero u.i += n <<
-0.230801	array cout << list[i] <<
-0.001721	>= size) { cout <<
-0.000107	void Disp() { cout <<
-0.000430	void Hello() { cout <<
-0.001721	(unsigned int)size) { cout <<
-0.013960	Loop through array cout <<
-0.013960	bit of f cout <<
-0.228139	* 32 with j <<
-0.251323	| ((B & 3) <<
-0.165148	* 16is calculated asa <<
-0.165148	* 17is calculated as(a <<
-0.165148	<< 4) | (C <<
-0.165148	= A | (B <<
-0.439879	matrix[j][0] = i; } Here,
-0.329668	= log(2.0); ... } Here,
-0.431671	Func1(x) + 1.; } Here,
-0.229989	} return sum; } Here,
-0.229989	list[j].b + list[j].c; } Here,
-0.229989	list[100]; Func1(list, &list[8]); } Here,
-0.236065	= p + i; Here,
-0.235744	= a | b; Here,
-0.336142	> 1.0) { ... Here,
-0.705550	+ b + c; Here,
-0.289942	i++) matrix[FuncRow(i)][FuncCol(i)] += x; Here,
-0.251297	19 }; S1 ArrayOfStructures[100]; Here,
-0.199987	= d + 3.5; Here,
-0.199987	feedback comes from testing. Here,
-0.165128	about in my blog. Here,
-0.165128	% (number of sets). Here,
-0.165128	< ArraySize; i++) List[i]++; Here,
-0.165128	class c1; int c1::*MemberPointer; Here,
-0.165128	= OneOrTwo5[b & 1]; Here,
-0.358392	worst problem of the x86
-0.357693	64-bit extension to the x86
-0.354602	in hardware in the x86
-0.142072	newer microprocessors in the x86
-0.142072	Modern microprocessors in the x86
-0.524629	are based on the x86
-0.439001	use 32 bits in x86
-0.237862	the 64-bit versions. The x86
-0.929970	An optimization guide for x86
-0.122353	MKL). Works with all x86
-0.122353	(IPP). Works with all x86
-0.063827	newest processors. Supports all x86
-0.063827	Open source. Supports all x86
-0.063827	possible workaround. Supports all x86
-0.313412	mode. The 32- bit x86
-0.235377	with big-endian storage. All x86
-0.233011	Open source library. Supports x86
-0.233011	order execution All modern x86
-0.251323	__unix__ __linux__ __unix__ __linux__ x86
-0.314667	memory allocation are: The process
-0.172092	cases, the log on process
-0.172092	password. The log on process
-0.237558	between CPU cores. A process
-1.075366	one instance for each process
-0.235201	overhead to the allocation process
-0.311112	reduction is a complicated process
-0.276552	about how the development process
-0.316849	about which software development process
-0.234483	the slow GOT lookup process
-0.051605	rest of the installation process
-0.051605	unusual for the installation process
-0.051605	selected during the installation process
-0.244017	be installed. The installation process
-0.228100	should leave a background process
-0.226719	for updating. The update process
-0.276525	designed program. 6 Development process
-0.165128	work as a learning process
-0.165128	explanation why this delaying process
-0.356640	and fffff is the binary
-0.351461	is stored as the binary
-0.339295	will cut off the binary
-0.833002	are stored in a binary
-0.353300	so small that a binary
-0.543016	solution may be a binary
-0.351432	sorted list or a binary
-0.449103	form. A disadvantage of binary
-0.325390	D is compiled to binary
-0.832222	data are stored in binary
-0.237628	Remove right-most 1-bit in binary
-0.314516	compiled and distributed as binary
-0.441659	added and then use binary
-0.236102	in this case. A binary
-0.236102	to be moved. A binary
-0.340293	individual bits of its binary
-0.308494	Runtime, CLR, to produce binary
-0.199998	stored as a biased binary
-0.199998	sort and search facilities, binary
-0.338199	these obstacles and to know
-0.591508	bookkeeping in order to know
-0.344029	It is useful to know
-1.016373	If you want to know
-0.236754	the CPU dispatcher to know
-0.399307	for the programmer to know
-0.354020	many programmers do not know
-0.352752	logical sequence. If you know
-0.313076	you are sure you know
-0.347591	14.27 assumes that we know
-0.181015	Unfortunately, the compiler cannot know
-0.181015	12.1b, the compiler cannot know
-0.345518	if the compiler doesn't know
-0.345518	function, the compiler doesn't know
-0.222036	structure), the microprocessor doesn't know
-0.291538	precision. And who would know
-0.290665	performance options. I don't know
-0.165153	other compilers (Microsoft, Intel) know
-0.120370	the level-2 cache is 512
-0.120370	The level-2 cache is 512
-0.357937	16 lines in a 512
-0.354462	per element for a 512
-0.382793	the operating system, and 512
-0.314676	the subsequent instructions. The 512
-0.237139	(YMM), and soon also 512
-0.100806	AVX512 double 64 8 512
-0.100806	long long 64 8 512
-0.100242	AVX int 32 16 512
-0.100242	AVX512 float 32 16 512
-0.339614	to make the matrix 512
-0.077653	lines in a 512 512
-0.077653	element for a 512 512
-0.172282	subsequent instructions. The 512 512
-0.172282	511 2040 38.7 512 512
-0.172282	65 13.6 80.9 512 512
-0.165143	511 511 2040 38.7 512
-0.165143	65 65 13.6 80.9 512
-0.477037	tells the CPU to generate
-0.712030	the operating system to generate
-0.751790	if we want to generate
-0.578809	this is likely to generate
-1.185217	compilers are able to generate
-0.236953	the && expression to generate
-0.051221	b to 0 and generate
-0.314684	page 141. Applications that generate
-0.394366	inefficient, and it will generate
-0.303595	or created it will generate
-0.336448	of -fpic. This will generate
-0.312317	to -56 which will generate
-0.227208	uncaught overflow condition will generate
-0.227208	1 to 127 will generate
-0.227208	address. The linker will generate
-0.227208	calculation of c+b will generate
-0.446022	restriction, but it doesn't generate
-0.569088	the compiler can automatically generate
-0.558743	obtain most of the advantages
-0.756822	has many of the advantages
-0.719692	further discussion of the advantages
-0.237686	important to weigh the advantages
-0.166413	linking is used. The advantages
-0.166413	alloca is used. The advantages
-0.234980	the end user. The advantages
-0.290884	operations with pointers. The advantages
-0.533031	of programming style. The advantages
-0.234980	as the operands. The advantages
-0.378753	x + 1.0f;} The advantages
-0.234980	be allocated dynamically. The advantages
-0.234980	precision (80 bits). The advantages
-0.237529	instructions. Each type has advantages
-0.537974	There are also other advantages
-0.338568	While C++ has many advantages
-0.236098	if there are specific advantages
-0.292116	bit systems have several advantages
-0.347441	program. Weighing the above advantages
-0.351610	The parameters a and r
-0.293875	thing as p and r
-0.314183	of the variable that r
-0.237446	i = 0 that r
-0.292708	r) { r = r
-1.190886	i++) { a[i] = r
-0.380986	[esp+12] ; edx = r
-0.538575	value pointed to by r
-0.463511	(int & r) { r
-0.382404	loop, for example when r
-0.237096	in the sequence, where r
-0.064193	for (r = 0; r
-0.312702	;startofFunc ; a ; r
-0.235623	is not clear whether r
-0.054448	for (r = 1; r
-0.234646	2 ; add what r
-0.212289	the value that lies r
-0.584549	met then it is usually
-0.347166	body. A function is usually
-0.347166	the dispatcher function is usually
-0.460463	of bits. This is usually
-0.542081	but the compiler is usually
-0.357297	non-sequential manner. It is usually
-0.292309	The link order is usually
-0.344679	The first count is usually
-0.236231	static data area is usually
-0.236231	constructor, if any, is usually
-0.236231	loop. The loop-branch is usually
-0.351434	memory. The functions are usually
-0.236941	best. These cases are usually
-0.330677	constants. Integer constants are usually
-0.237585	micro-op cache. Compilers will usually
-0.592650	cores or logical processors usually
-0.341503	double Floating point calculations usually
-0.319790	bits in an integer, usually
-0.218566	Access to remote databases usually
-0.462430	the loop if the results
-0.237851	bc); // OR the results
-0.313807	seven different compilers. The results
-0.293332	kinds of optimizations. The results
-0.237131	different matrix sizes. The results
-0.294002	but event-counters do. This results
-0.236513	Loop to print out results
-0.236503	bitwise operators produce 32 results
-0.337072	and store the four results
-0.256833	double precision, and intermediate results
-0.337419	need to store intermediate results
-0.204902	an integer). All intermediate results
-0.204902	loop by storing intermediate results
-0.281508	4 computer. The measured results
-0.226725	order to get reliable results
-0.226736	function stores the thousand results
-0.199993	(^) may give inconsistent results
-0.199993	They sometimes give misleading results
-0.165133	512 matrix. My experimental results
-0.351709	two arrays, a and b,
-0.349581	int x[]) { int b,
-0.135896	overflows, even if a, b,
-0.022314	main() { int a, b,
-0.097091	Example 14.10 int a, b,
-0.097091	Example 14.11 int a, b,
-0.097091	Example 8.6a int a, b,
-0.097091	Example 8.6b int a, b,
-0.135896	+ a.y);} vector a, b,
-0.129345	S1 { float a, b,
-0.129345	Example 11.1a float a, b,
-0.129345	Example 11.1b float a, b,
-0.129345	Example 8.16 float a, b,
-0.111258	following way: bool a, b,
-0.111258	Example 7.9a bool a, b,
-0.135896	cc[]) { Vec16s a, b,
-0.135896	vector objects Vec8s a, b,
-0.403879	8.13b int i, a[100], b,
-0.540739	use any of the storage
-0.354258	in Fortran where the storage
-0.346182	declaration. The type of storage
-0.314510	takes 8 bytes of storage
-0.629035	at compile time. The storage
-0.237497	variables are stored. The storage
-0.348988	possible, and replaced by storage
-0.235858	page 26 about data storage
-0.235858	disadvantage of binary data storage
-0.293378	returns. Global or static storage
-0.180802	different kinds of variable storage
-0.272073	Different kinds of variable storage
-0.224914	for integer constants. Register storage
-0.048390	for each thread. Thread-local storage
-0.048390	thread environment block. Thread-local storage
-0.048390	changed five times. Thread-local storage
-0.284311	may have big endian storage
-0.074769	compilers can make thread-local storage
-0.074769	global variables. (See thread-local storage
-0.356293	handle strings is the old
-0.358282	the contents of the old
-0.357552	artificially changed to the old
-0.455697	to data in the old
-1.089671	is used in the old
-0.352676	text strings in the old
-0.352676	not present in the old
-0.356963	is 15 on the old
-0.237864	of a string. The old
-0.687877	code is compiled for old
-0.727009	that is compatible with old
-0.611211	not be compatible with old
-0.333103	set when compatibility with old
-0.311721	the code incompatible with old
-0.237748	It will crash on old
-0.293978	of data. Use an old
-0.293133	not in some very old
-0.218554	using a six years old
-0.165153	database by a plain old
-0.957892	enable the compiler to reduce
-0.606080	expect a compiler to reduce
-1.663004	It is possible to reduce
-0.456739	tested were able to reduce
-0.236951	to great lengths to reduce
-0.236951	tested the capability to reduce
-0.382810	and constant propagation and reduce
-0.352295	or Espresso) that can reduce
-0.353503	system, but you can reduce
-0.255007	and other compilers can reduce
-0.255007	compiler. Some compilers can reduce
-0.365629	reductions Most compilers can reduce
-0.234544	Frequent context switches can reduce
-0.234544	have ever seen can reduce
-0.798395	} The compiler may reduce
-0.460201	i+1; The compiler may reduce
-0.351023	all good compilers will reduce
-0.293045	For example, compilers cannot reduce
-0.234179	files. This can actually reduce
-0.390610	is a branch that goes
-0.228801	way. A branch that goes
-0.228801	course. A branch that goes
-0.237766	computer is reset or goes
-0.806903	mispredicted only when it goes
-0.354130	no cost because it goes
-0.313038	be mispredicted whenever it goes
-0.237775	call a polymorphic function goes
-0.353084	of data. The code goes
-0.567135	fraction of the time goes
-0.515017	data into a vector goes
-0.236479	A branch that always goes
-0.232283	input file. The output goes
-0.732593	that the clock frequency goes
-0.393690	function in a DLL goes
-0.276519	a typical software project goes
-0.265125	loop in example 9.5a goes
-0.165122	whether the call p->f() goes
-0.165122	while less than 1% goes
-0.353342	simple variables into a union
-0.237725	} } Using a union
-0.595027	a register variable. The union
-0.346413	a class, structure or union
-0.538341	void F3(bool y) { union
-0.403292	same memory space. A union
-0.234629	for an example. A union
-0.234629	53. 7.24 Unions A union
-0.212246	values: // Example 14.28 union
-0.251285	double: // Example 14.23b union
-0.251285	exponent: // Example 14.26 union
-0.251285	integers: // Example 14.27 union
-0.199976	bit: // Example 14.23 union
-0.165117	way: // Example 7.40b union
-0.165117	Array of 100 doubles: union
-0.165117	Example: // Example 7.39 union
-0.165117	bits: // Example 14.29 union
-0.165117	zero: // Example 14.24 union
-0.165117	bit: // Example 14.25 union
-0.242631	7.10b char a = 0,
-0.242631	7.9b char a = 0,
-0.345554	= 0, b = 0,
-0.310859	a & 0 = 0,
-0.233008	float list[size], sum1 = 0,
-0.233008	= 0, s3 = 0,
-0.233008	= 0, s2 = 0,
-0.233008	a[100]; float s0 = 0,
-0.233008	= 0, s1 = 0,
-0.343705	bytes. first byte at 0,
-0.283468	unused bytes byte at 0,
-0.022155	a = select(b > 0,
-0.074779	public: SafeArray() { memset(a, 0,
-0.074779	a to zero memset(a, 0,
-0.074779	floating point overflow: _controlfp_s(&dummy, 0,
-0.074779	point status: _fpreset(); _controlfp_s(&dummy, 0,
-0.165169	7.16 float list[100]; memset(list, 0,
-0.122937	time the function is called.
-0.293077	when the function is called.
-0.451233	polymorphic member function is called.
-0.535649	the critical function is called.
-0.236404	stack when CriticalInnerFunction is called.
-0.784752	that needs to be called.
-0.971507	the functions that are called.
-0.347795	for local objects are called.
-0.536047	that all destructors are called.
-0.236495	before any constructors are called.
-0.235224	in which alloca was called.
-0.102367	the function is never called.
-0.102367	dispatched function is never called.
-0.300026	the functions are never called.
-0.197917	to the power of 10
-0.293777	a misprediction penalty of 10
-0.237901	reduced from 20 to 10
-0.237862	for foreground jobs and 10
-0.463661	it will recognize that 10
-0.237717	and subtraction (3 - 10
-0.237088	time. An experiment where 10
-0.459418	both get the value 10
-0.023190	write that something takes 10
-0.236757	it will still take 10
-0.321518	to multiple operating systems. 10
-0.231888	is not detected until 10
-0.230038	static variables. See chapter 10
-0.230048	device drivers for Windows. 10
-0.296191	control branch is executed 10
-0.199976	Choice of compiler .................................................................................................... 10
-0.199976	cache control .............................................................................................. 99 10
-0.407650	X operating system is based
-0.182743	below. This manual is based
-0.182743	158. This manual is based
-0.356748	the dispatching should be based
-0.353679	for programs that are based
-0.424488	that these methods are based
-0.234681	Microsoft's .NET framework are based
-0.234681	implementations of Java are based
-0.234681	Pascal and Fortran are based
-0.152061	copy protection schemes are based
-0.234681	work. The recommendations are based
-0.237470	about an unknown CPU based
-0.443114	either C or C++ based
-0.236097	is important. A language based
-0.484012	make a CPU dispatcher based
-0.377619	a high level framework based
-0.288968	a branch will go based
-0.229208	branch can be chosen based
-0.355777	important it is to choose
-0.382037	different C++ compilers to choose
-0.237335	said than done to choose
-0.237335	// Use mask to choose
-0.382071	the operating system and choose
-0.314080	floating point operations and choose
-0.314080	on future processors, and choose
-0.323370	be available, we may choose
-0.287227	guidelines below. You may choose
-0.287227	vector operations. You may choose
-0.287227	page 52. You may choose
-0.287227	in question. You may choose
-0.287227	available today. You may choose
-0.231044	A software developer may choose
-0.293955	propagation, etc. Whether you choose
-0.544406	enabled. The compiler will choose
-0.348626	C++ program, you should choose
-0.378648	optimizing compilers will automatically choose
-0.226762	disadvantages that make developers choose
-0.357414	optimization options and the options
-0.314672	if we specify the options
-0.604333	18 Overview of compiler options
-0.236613	3. Use appropriate compiler options
-0.255196	compilers have various optimization options
-0.203449	relevant options. Many optimization options
-0.042636	all the relevant optimization options
-0.004536	with all relevant optimization options
-0.042636	latencies. 8.5 Compiler optimization options
-0.042636	CPU.............................................................................81 8.5 Compiler optimization options
-0.323980	to study the available options
-0.235906	Table 18.1. Command line options
-0.291350	but only if certain options
-0.532372	C++ compilers have various options
-0.233015	to select all installation options
-0.226761	version because the debugging options
-0.165148	to test. disable power-save options
-0.356710	rely on is the feature
-0.350272	shared_ptr. auto_ptr has the feature
-0.134860	Intel compilers have a feature
-0.134860	Some compilers have a feature
-0.344794	not have such a feature
-0.237781	} The indirect function feature
-0.236776	2.11 ifunc branch). This feature
-0.236776	utilities in 2010. This feature
-0.306574	imported pointer, but this feature
-0.306574	public symbols, but this feature
-0.235098	automatic prefetching so this feature
-0.237565	in Gnu compiler A feature
-0.505315	indicates a specific CPU feature
-0.313817	only hope that such feature
-0.292825	Because the C++ template feature
-0.317399	to put a test feature
-0.231381	have a built-in test feature
-0.232286	It has the special feature
-0.251310	GOT. The symbol interposition feature
-0.402242	where there are different ways
-0.640568	There are several different ways
-0.283618	syntax has several different ways
-0.338870	been defined in other ways
-0.237107	There are other possible ways
-0.532232	Windows). There are several ways
-0.235227	compilers also have fast ways
-0.151434	be implemented in various ways
-0.133147	avoided, there are various ways
-0.153248	efficient. There are various ways
-0.153248	resources. There are various ways
-0.153248	explicitly. There are various ways
-0.153248	power. There are various ways
-0.151434	and 135 show various ways
-0.151434	subsequent sections describe various ways
-0.045623	time. There are three ways
-0.045623	calls. There are three ways
-0.008673	that there are smarter ways
-0.237016	on the processors that were
-0.048249	time on processors that were
-0.291836	version causes problem that were
-0.102095	brands or models that were
-0.102095	all newer models that were
-0.236875	experiment where 10 elements were
-0.292740	to see whether they were
-0.236373	that supported 256-bit instructions were
-0.292261	The following compiler versions were
-0.235099	memory and disk space were
-0.233778	computer. The measured results were
-0.890463	compilers I have tested were
-0.287345	If the different tasks were
-0.230764	for different matrix sizes were
-0.218525	C++ compilers The tests were
-0.199987	b in example 8.15a were
-0.199987	2.1.7, 2004. No differences were
-0.165128	If Func1 and Func2 were
-0.357819	is used for the link
-0.351697	by looking at a link
-0.550561	- no need to link
-0.237880	the option -fno-pic and link
-0.237500	linking works differently. The link
-0.237500	are linked together. The link
-0.456656	than in a static link
-0.036111	implemented either as static link
-0.282919	libraries slower than static link
-0.307000	compiled as a dynamic link
-0.021183	(*.lib, *.a) or dynamic link
-0.208196	that can make dynamic link
-0.021183	in a separate dynamic link
-0.232665	a linear array. No link
-0.338834	loaded until the previous link
-0.200009	program makes a symbolic link
-0.341408	a fixed-size array is made
-0.237580	called, a dispatch is made
-0.237580	when no attempt is made
-0.545863	The code can be made
-0.489993	existing object can be made
-0.449957	Boolean operations can be made
-0.348142	automatic dispatching can be made
-0.348142	general statement can be made
-0.354217	.cpp file) should be made
-0.454300	objects can often be made
-0.552334	different compilers I have made
-0.677904	that the microprocessor has made
-0.235798	ArrayOfStructures[100]; This reordering has made
-0.349216	My own function library made
-0.344466	a 64-bit shared object made
-0.228132	dramatic consequences. I once made
-0.212281	be linked into projects made
-0.165148	advantage of using ready made
-0.165148	to using templates. Ready made
-0.509268	will point to the appropriate
-0.165759	Set pointer to the appropriate
-0.342561	symbolic link to the appropriate
-0.342561	statement leads to the appropriate
-0.656193	instruction set for the appropriate
-0.356127	them separately with the appropriate
-0.437671	system and choose the appropriate
-0.237085	have to include the appropriate
-0.293279	routine that loads the appropriate
-0.237085	vector classes. Including the appropriate
-0.438913	and 64-bit systems. The appropriate
-0.237717	X" is simply not appropriate
-0.237701	that simply prints an appropriate
-0.347099	an error; and make appropriate
-0.324667	on what is most appropriate
-0.236144	at vectorization. 3. Use appropriate
-0.165159	the const keyword wherever appropriate
-0.453613	largest_abs = 0; int i,
-0.224460	39916800, 479001600}; ... int i,
-0.304028	short int a[100]; int i,
-0.224460	{ // n! int i,
-0.496178	14.12b int list[300]; int i,
-0.138541	8; float matrix[rows][columns]; int i,
-0.138541	50; float matrix[rows][columns]; int i,
-0.224460	*p = string; int i,
-0.224460	100; S1 list[size]; int i,
-0.224460	// Example 8.13a int i,
-0.224460	// Example 8.13b int i,
-0.224460	// Example 8.12a int i,
-0.224460	// Example 8.14b int i,
-0.224460	// Example 8.14a int i,
-0.006702	in aa: StoreVector(aa + i,
-0.165184	out results printf("\n%2i %10I64i", i,
-0.502438	necessarily done by the constructor
-0.497550	Make the function a constructor
-0.314253	a normal array. The constructor
-0.237505	do the conversion. The constructor
-0.237231	// constant data // constructor
-0.293446	// default constructor // constructor
-0.237580	Constructors and destructors A constructor
-0.329993	data members. A simple constructor
-0.227023	calls to the copy constructor
-0.227023	there is a copy constructor
-0.244539	return value. The copy constructor
-0.080084	improved performance. A copy constructor
-0.080084	need initialization. A copy constructor
-0.227023	object has no copy constructor
-0.178351	entire object. Any copy constructor
-0.165168	class with a default constructor
-0.165168	x,y coordinates // default constructor
-0.165168	a constructor. A default constructor
-0.347291	a particular set of CPUs.
-0.336086	on all brands of CPUs.
-0.364958	code versions for different CPUs.
-0.689574	multiple versions for different CPUs.
-0.436103	optimize for several different CPUs.
-0.232122	code to support different CPUs.
-0.290522	assembly code for Intel CPUs.
-0.234661	function for different Intel CPUs.
-0.303398	assembly code for AMD CPUs.
-0.328326	only supported on AMD CPUs.
-0.234878	tools that fit their CPUs.
-0.481495	hardware in the x86 CPUs.
-0.321592	code incompatible with old CPUs.
-1.493753	Intel, AMD and VIA CPUs.
-0.308887	supported by all modern CPUs.
-0.231880	Works well with non-Intel CPUs.
-0.305393	fastest solution on future CPUs.
-0.212269	possibly not with earlier CPUs.
-0.878819	1; for (i = 2;
-0.023377	= 1; list[i+2] = 2;
-0.236002	= 1; a[1] = 2;
-0.366571	r = r + 2;
-0.139174	a[i] = *p + 2;
-0.221259	*p = *p + 2;
-0.226197	a[i] = b[i] + 2;
-0.226197	aa[i] = bb[i] + 2;
-0.311107	a = a * 2;
-0.223856	absvalue = a[i].u[1] * 2;
-2.093552	= 0; i < 2;
-0.292019	Example 8.5b a += 2;
-0.405784	Disp() { cout << 2;
-0.357154	label $B1$2:. This is just
-0.462191	intrinsic functions. It is just
-0.237404	or vector classes is just
-0.314133	ms. This delay is just
-0.294213	call is translated to just
-0.382442	from the cache in just
-0.237626	make 32 AND-operations in just
-0.357383	or reference may be just
-0.331851	double precision calculations are just
-0.655725	Splitting up a function just
-1.106794	can be done with just
-0.353644	from errors. If you just
-0.344925	is sufficient to have just
-0.341550	into memory even when just
-0.293839	for this purpose. It just
-0.350565	can calculate a vector just
-0.199987	be speeded up significantly just
-0.199987	a square brackets index, just
-0.747321	upper 32 bits of a[i]
-0.347492	// Return reference to a[i]
-0.325232	i++) { temp = a[i]
-0.034471	< 100; i++) { a[i]
-0.654573	< size; i++) { a[i]
-0.714842	i += 2) { a[i]
-0.709836	top of loop ; a[i]
-0.334009	address of array element a[i]
-0.226464	i < 2; i++) a[i]
-1.446796	i < size; i++) a[i]
-0.222353	occur in multiplication here: a[i]
-0.212281	way that avoids overflow: a[i]
-0.212281	use the safe formula a[i]
-0.726033	non-inlined copy of the function,
-0.408110	optimal to inline the function,
-0.344813	dispatching 125 for this function,
-0.851923	versions of the same function,
-0.237390	kept entirely inside one function,
-0.350105	parameter to the library function,
-0.236942	call a non-virtual member function,
-0.344446	But in the template function,
-0.168237	case of the simple function,
-0.168237	function. In the simple function,
-0.235835	functionality of an optimized function,
-0.323374	in turn calls another function,
-0.528510	occurs in the innermost function,
-0.287837	is called a frame function,
-0.330023	by inlining the latter function,
-0.536680	of the CPU detection function,
-0.222337	advantage in the select function,
-0.199976	applied to a non-member function,
-0.342292	the order of the operands
-0.355600	the evaluation of the operands
-0.456656	branches, provided that the operands
-0.353432	with certainty that the operands
-0.452373	variables than if the operands
-0.514272	more advantageous if the operands
-0.350052	inconsistent results if the operands
-0.473327	You cannot swap the operands
-0.237867	of Boolean operands The operands
-0.564919	order of floating point operands
-0.331280	same precision in all operands
-0.237107	calculation of expressions where operands
-0.472518	order of the Boolean operands
-0.385359	The order of Boolean operands
-0.045363	0.12 memcpy 16kB aligned operands
-0.045363	Processor memcpy 16kB aligned operands
-0.200021	mean good performance). Aligned operands
-0.503890	conversions out of the innermost
-0.558349	data used in the innermost
-0.989642	function calls in the innermost
-0.453201	is spent in the innermost
-0.350706	exception occurs in the innermost
-0.350706	index changing in the innermost
-0.347359	other processors, only the innermost
-0.381697	spot but also the innermost
-0.352341	must be inside the innermost
-0.293691	stack memory outside the innermost
-0.293691	function but outside the innermost
-0.289510	subroutine for the critical innermost
-0.377071	obtained if the critical innermost
-0.289510	62. If the critical innermost
-0.289510	log) inside the critical innermost
-0.289510	is outside the critical innermost
-0.215149	distinct tasks. A critical innermost
-0.276599	a, b; // Critical innermost
-0.579519	platform is likely to require
-0.350819	return from functions that require
-0.236511	inefficient code-based methods or require
-0.236511	on table lookup or require
-0.236511	to be slower or require
-0.500262	a pointer does not require
-0.345352	becomes full. This may require
-0.236557	The time measurements may require
-0.347981	most efficient vector operations require
-0.236369	9.2. All these instructions require
-0.236173	static or global arrays require
-0.235887	operands have mixed precision require
-0.311905	multiplying them. This would require
-0.310630	pool. Alignment? Some applications require
-0.233277	pointers and non-constant references require
-0.226739	the code. Some profilers require
-0.199981	MOVNTPS, MOVNTPD and MOVNTDQ require
-0.165122	(e.g. in linear algebra) require
-0.877599	latest version of the compiler.
-0.142054	handling option in the compiler.
-0.142054	unroll option in the compiler.
-0.354543	code flag in the compiler.
-0.803232	if supported by the compiler.
-0.349066	inlined automatically by the compiler.
-0.357043	very much on the compiler.
-0.354338	to implement in a compiler.
-0.354338	of algebra in a compiler.
-0.453711	modules with a different compiler.
-0.851982	versions of the same compiler.
-0.521448	function of the Intel compiler.
-0.234668	Use Gnu or Intel compiler.
-0.438150	with the Intel C++ compiler.
-0.452138	Comes with the Gnu compiler.
-0.322721	project built with another compiler.
-0.305681	features as the Microsoft compiler.
-0.275864	Microsoft Comes with Microsoft compiler.
-0.463095	discussed which of the advanced
-0.357280	to compromise on the advanced
-0.325092	it will run the advanced
-0.293959	to avoid running the advanced
-0.455801	functions. A lot of advanced
-0.237531	documentation and lack of advanced
-0.237531	with a wealth of advanced
-0.237877	to out-of-order execution and advanced
-0.314675	This manual is for advanced
-0.294030	contain many tips on advanced
-0.350904	libraries. C++ is an advanced
-0.330661	overkill. Don't use an advanced
-0.331497	array and for more advanced
-0.355151	to run the most advanced
-0.311009	system devices and using advanced
-0.344118	Modern microprocessors are using advanced
-0.338556	because it has many advanced
-0.230791	for background services under advanced
-0.346424	memory space where a #define
-0.237759	with enum, const, or #define
-0.314572	macro as inline function #define
-0.294022	A macro declared with #define
-0.325085	// If Microsoft compiler #define
-0.627531	#if INSTRSET == 2 #define
-0.236787	#elif INSTRSET == 8 #define
-0.459254	defining constants. For example, #define
-0.235968	// Gnu compiler, etc. #define
-0.233262	#elif INSTRSET == 5 #define
-0.212274	#define pure_function __attribute__((const)) #else #define
-0.284270	is resolved at runtime. #define
-0.074756	to swap two elements: #define
-0.074756	swap two array elements: #define
-0.199970	been given a name. #define
-0.199970	Example 8.22 #ifdef __GNUC__ #define
-0.165112	binary representation of N: #define
-0.165112	#include <float.h> #include <math.h> #define
-1.259200	If the number of points
-0.236492	that the object it points
-0.572502	After first call it points
-0.292605	cannot change what it points
-0.353777	change what a pointer points
-0.524404	what a function pointer points
-0.236501	see that p always points
-0.235616	Wikibooks. The following list points
-0.234156	the original pointer actually points
-0.089340	the variable that r points
-0.089340	= 0 that r points
-0.201949	; add what r points
-0.233024	add a few unused points
-0.233024	class of object p points
-0.048390	function pointer which initially points
-0.048390	// Function pointer initially points
-0.048390	The PLT entry initially points
-0.165143	of four (or eight) points
-0.357520	context switch is a switch
-0.237378	able to predict a switch
-0.407352	Remember to insert a switch
-0.293614	On older processors, a switch
-0.704206	and it needs to switch
-0.420750	number of branches and switch
-0.048303	function. 7.12 Branches and switch
-0.048303	40 7.12 Branches and switch
-0.538025	the same as for switch
-0.237487	advantageous as replacements for switch
-0.314661	A branch tree or switch
-0.380322	not predicted well. A switch
-0.236106	of jump targets. A switch
-0.237504	for switch statements because switch
-0.308046	happens when a task switch
-0.321094	Example 14.3a int n; switch
-0.222388	Context switches A context switch
-0.165143	constants, array initializer lists, switch
-0.356711	a variable is the range
-0.237854	way to limit the range
-0.168549	index is out of range
-0.253735	0 if out of range
-0.253735	is not out of range
-0.253735	n being out of range
-0.294161	overflow and underflow. The range
-0.353997	other address in this range
-0.462128	data because the same range
-0.237437	using 8-bit integers which range
-0.331917	line containing the address range
-0.331917	that covered the address range
-0.318521	keys within a limited range
-0.222388	range analysis The live range
-0.074769	Register variables, float Live range
-0.074769	from register storage. Live range
-0.165143	confined to a narrow range
-0.341829	stack memory at the start
-0.341829	installation options at the start
-0.350820	because b has to start
-0.477028	want the CPU to start
-1.561851	it is possible to start
-1.725662	time it takes to start
-0.433490	and therefore fail to start
-0.381495	take several minutes to start
-0.237883	overlap the iterations and start
-1.161581	so that it can start
-0.237682	This garbage collection may start
-0.235363	optimal algorithm before you start
-0.101929	hot spots Before you start
-0.101929	mathematical tasks. Before you start
-0.236082	independently. The CPU will start
-0.236082	The heap manager will start
-0.236221	mangled function name ; start
-0.231393	under the framework, during start
-0.247517	order in which the modules
-0.353472	not load all the modules
-0.408133	requires the loading of modules
-0.237773	functions, classes, templates or modules
-0.336414	about functions in other modules
-0.439143	code if no other modules
-0.349128	compiler option for all modules
-0.237199	and well- tested library modules
-0.348181	to keep the two modules
-0.562116	make the most critical modules
-0.236290	program starts up. Some modules
-0.344304	linking to assembly language modules
-0.290637	be placed in separate modules
-0.265157	below. Cannot optimize across modules
-0.246807	will enable optimizations across modules
-0.165160	transfer across all .cpp modules
-0.255918	combine the multiple .cpp modules
-0.237850	division is faster the smaller
-0.237850	is more advantageous the smaller
-0.355918	special position-independent code is smaller
-0.237573	of context switches is smaller
-0.237573	computer. The proxy is smaller
-0.502001	an integer to a smaller
-0.352457	unit-test but has a smaller
-0.237904	the size. Integers of smaller
-0.339364	on bigger systems. The smaller
-0.583214	case it may be smaller
-1.088156	to make the code smaller
-0.237265	divide the matrix into smaller
-0.344658	a function into multiple smaller
-0.236552	of a variable even smaller
-0.342211	the structure 8 bytes smaller
-0.485186	contiguous. The code becomes smaller
-0.334207	can often be made smaller
-0.581601	CPUs with execution units smaller
-0.293661	but the method used here
-0.407431	advantage of using static here
-0.828540	that it is necessary here
-0.342443	long because the speed here
-0.334992	+ c; The calculation here
-0.322632	access, etc. The problem here
-0.277829	of the const_cast operator here
-0.223480	2.6f; The ?: operator here
-0.234469	bounds check on n here
-0.310639	is done at runtime here
-0.233267	// Non-polymorphic functions go here
-0.375015	of the advice given here
-0.224905	does not cost anything here
-0.222337	programming languages. www.yeppp.info And here
-0.016032	Everything that is said here
-0.212246	data decomposition. Functional decomposition here
-0.165117	error message is provoked here
-0.358611	the resources of the core
-0.356225	Intel CPUs: use the core
-0.443675	core clock cycles. The core
-0.237495	the clock frequency. The core
-0.237833	in the table are core
-0.868761	running in the same core
-0.530286	frequency that the CPU core
-0.374211	has only one CPU core
-0.202912	to a specific CPU core
-0.353052	Intel processors is called core
-0.236484	interrupt service routines, system core
-0.292255	memory access. The execution core
-1.022124	in the same processor core
-0.378786	of a dedicated microprocessor core
-0.232308	119). The AMD math core
-0.226773	platforms. AMD AMD Math core
-0.358479	the answers in the relevant
-0.456819	line with all the relevant
-0.579487	optimizations that it is relevant
-0.346093	Optimizing for size is relevant
-0.339163	Optimizing for speed is relevant
-0.325286	methods could possibly be relevant
-0.538056	where it is more relevant
-0.253108	release version with all relevant
-0.253108	carried out with all relevant
-0.253108	calculations. Even with all relevant
-0.339712	and turn on all relevant
-0.352329	present manual is also relevant
-0.236596	happy to receive new relevant
-0.233777	18.1. Command line options relevant
-0.226766	but these are hardly relevant
-0.212275	Compiler directives and keywords relevant
-0.200004	in almost all respects relevant
-0.478992	using the vector registers are:
-0.313193	references rather than pointers are:
-0.876999	of object oriented programming are:
-0.427490	using the register stack are:
-0.392576	of dynamic memory allocation are:
-0.349661	pitfalls of CPU dispatching are:
-0.223098	advantages of dynamic linking are:
-0.223098	rather than dynamic linking are:
-0.233277	pointers rather than references are:
-0.415759	advantages of function inlining are:
-0.228094	calculations of loop iterations are:
-0.226726	or malloc and free are:
-0.226739	common problems with profilers are:
-0.222342	performance. The positive effects are:
-0.272211	about it. Possible solutions are:
-0.165122	the YMM registers. Disadvantages are:
-0.325373	that dates back to around
-0.337581	a way to work around
-0.289971	should have #if directives around
-1.068934	There are various ways around
-0.059921	is fragmented and scattered around
-0.028928	likely to be scattered around
-0.028928	They may be scattered around
-0.079340	objects that are scattered around
-0.131192	the data are scattered around
-0.059921	are many functions scattered around
-0.059921	help files etc. scattered around
-0.222366	variables do not wrap around
-0.212300	a lot of jumping around
-0.200009	data are scattered randomly around
-0.017521	to put a parenthesis around
-0.017521	you put a parenthesis around
-0.165148	to identify the circumstances around
-0.347100	the code up to 5
-0.237520	while seconds count to 5
-0.237520	has been incremented to 5
-0.314510	may take 3 - 5
-0.324988	it may take only 5
-1.207363	a = b * 5
-0.233731	1000 * 100 * 5
-0.376788	floating point addition takes 5
-0.233567	floating point operation takes 5
-0.236936	time is typically between 5
-0.234156	not cover graphics processors. 5
-0.424313	SelectAddMul_SSE2 #elif INSTRSET == 5
-0.218531	of hardware platform ....................................................................................... 5
-0.212263	the optimal platform ........................................................................................... 5
-0.212263	and usability ............................................................................................... 23 5
-0.199993	of multiplying by 3, 5
-0.165133	available from a website. 5
-0.584886	calls the function is replaced
-0.407852	template function, m is replaced
-0.237585	shift operation. x*8 is replaced
-0.458703	avoided, if possible, and replaced
-0.347699	} This can be replaced
-0.446611	>= size can be replaced
-0.345493	the multiplication can be replaced
-0.345493	two constants can be replaced
-0.345493	example 12.4b can be replaced
-0.354414	use pointers may be replaced
-0.351836	only constants will be replaced
-0.472572	139 can sometimes be replaced
-0.808855	the template parameters are replaced
-0.487159	one. The compiler has replaced
-0.345955	the compiler have been replaced
-0.235763	instance has its parameters replaced
-0.237202	float b) {x = a;
-0.237202	b, c; x[0] = a;
-0.346019	struct S1 { int a;
-0.339863	S1 { short int a;
-0.339863	at 11 short int a;
-0.437400	S3 { public: int a;
-0.233581	*p + 2;} int a;
-0.530369	c + b + a;
-0.228162	// Example 7.24 float a;
-0.228162	// Example 7.29a float a;
-0.228162	// Example 14.2a float a;
-0.228162	// Example 14.2b float a;
-0.232333	// Example 7.11 bool a;
-0.027572	8.15a struct S1 {double a;
-0.027572	8.15b struct S1 {double a;
-0.074777	7.13 struct abc {int a;
-0.074777	1024; struct Sab {int a;
-0.293997	There are lots of things
-0.237715	are a couple of things
-0.314717	using overloaded operators for things
-0.352349	very smart and other things
-0.961410	is possible to do things
-0.431281	is advantageous to do things
-0.431281	three ways to do things
-0.237166	is to do multiple things
-0.510704	maintenance There are two things
-0.293190	Each compiler does some things
-0.292459	response times to simple things
-0.470229	smarter ways of doing things
-0.553321	arrays. There are various things
-0.234467	assembly listing reveals three things
-0.199987	output can often reveal things
-0.199987	it does some funny things
-0.165128	compiler does quite ingenious things
-0.345232	the positive or the negative
-0.237915	bit // u.d is negative
-0.354066	applications. Alternatively, use a negative
-1.206130	is to make a negative
-0.593957	the software contains a negative
-0.407350	signed variable produces a negative
-0.341505	inputs give overflow and negative
-0.293878	has both positive and negative
-0.237857	for these variables. The negative
-0.408058	positive and 1 for negative
-0.408033	i can never be negative
-0.237833	fail if both are negative
-0.325119	than 2n and not negative
-0.236102	of your software. A negative
-0.236102	same bits differently. A negative
-0.344481	member functions) has no negative
-0.237098	1.0f; } A possible negative
-0.182137	relocations in the code section
-0.337015	Actually, only the code section
-0.523565	option makes the code section
-0.337015	addresses. Therefore, the code section
-0.305923	every access. The code section
-0.305923	OS X The code section
-0.305923	following features: The code section
-0.237731	other programming languages. This section
-0.348410	the end of this section
-0.235116	You may skip this section
-0.235116	I will conclude this section
-0.454874	data. Therefore, the data section
-0.327284	multiple processes. The data section
-0.234205	shared. Any writable data section
-0.335787	and VIA. The next section
-0.165169	subroutines in assembly language", section
-0.555505	safe to do the reductions
-0.294157	cannot do. All the reductions
-0.407687	compiler to do more reductions
-0.322306	some situations, and which reductions
-0.235402	(page 77) shows which reductions
-0.237088	all cases, while many reductions
-0.236382	only the most simple reductions
-0.234910	floating point expressions. Most reductions
-0.117629	to do the algebraic reductions
-0.117629	optimization methods and algebraic reductions
-0.117629	they cannot make algebraic reductions
-0.117629	to do any algebraic reductions
-0.159632	can do simple algebraic reductions
-0.117629	a compiler. Many algebraic reductions
-0.222366	not do such obvious reductions
-0.222366	best at doing equivalent reductions
-0.212300	for the CPU. Algebraic reductions
-0.407278	be, for example, to go
-0.457706	those who want to go
-0.579089	problem is likely to go
-0.237328	is simply predicted to go
-0.294178	Increment loop counter and go
-0.354300	of branch that can go
-1.195515	int cc[]) { // go
-0.538906	call to the function go
-0.714456	consuming because it may go
-0.292685	hand- written table may go
-0.380278	way a branch will go
-0.236074	these eight elements will go
-0.237479	} // Non-polymorphic functions go
-0.293173	functions and public variables go
-0.236225	it is, but must go
-0.236171	if pointer arithmetic calculations go
-0.218537	errors that would otherwise go
-0.314668	to eliminate everything that depends
-0.356900	which method to use depends
-0.325024	elements in each vector depends
-0.545206	efficiency of a loop depends
-0.429403	so that each value depends
-0.233620	and its return value depends
-0.285465	the loop control branch depends
-0.098971	sense that each calculation depends
-0.098971	calculations, where each calculation depends
-0.234989	of the final application depends
-0.290328	that if each addition depends
-0.723115	that can be predicted depends
-0.232318	each value of sum depends
-0.285330	natural parallelism. The gain depends
-0.212257	costs of this bookkeeping depends
-0.165128	closest to the truth depends
-0.237947	than necessary. Take the example:
-0.237870	of memory blocks, for example:
-0.354007	as illustrated in this example:
-0.447487	at compile time. For example:
-0.189811	are comparisons, etc. For example:
-0.189811	in many cases. For example:
-0.084617	object pointed to. For example:
-0.084617	pointer refers to. For example:
-0.189811	to a structure. For example:
-0.189811	of mixed sizes. For example:
-0.189811	a table lookup. For example:
-0.317068	operand is valid. For example:
-0.189811	is poorly predictable. For example:
-0.189811	can be combined. For example:
-0.189811	be eliminated completely. For example:
-0.352368	illustrated by the following example:
-0.231397	less than ARRAYSIZE. Another example:
-0.237890	software project together and tested
-0.330251	The program should be tested
-0.330251	All software should be tested
-0.330251	Web systems should be tested
-0.330251	and servers should be tested
-0.450688	video should also be tested
-0.339831	instruction. Programmers that have tested
-0.089611	the compilers I have tested
-0.264646	reductions manually. I have tested
-0.236205	by 16. Library versions tested
-0.321950	They have not been tested
-0.336442	the examples have been tested
-0.232321	code can be further tested
-0.165164	in reusable and well- tested
-0.237947	unused. This removed the contentions
-0.234807	level-2 cache as when contentions
-0.290688	transpose a matrix when contentions
-0.234807	much more dramatic when contentions
-0.443351	writes. If the cache contentions
-0.226237	there will be cache contentions
-0.280954	This can cause cache contentions
-0.293029	stronger for level-2 cache contentions
-0.293029	Check if level-2 cache contentions
-0.321681	than for level-1 cache contentions
-0.313830	The consequence of such contentions
-0.293500	is likely to cause contentions
-0.303621	critical stride and cause contentions
-0.326137	then this can cause contentions
-0.146449	0) { // Cache contentions
-0.067081	.......................................................................................... 96 9.10 Cache contentions
-0.067081	is opposite). 9.10 Cache contentions
-0.339029	the innermost loop is predicted
-0.339241	the same way is predicted
-0.428994	The target address is predicted
-0.569975	equally likely to be predicted
-0.181578	count that can be predicted
-1.073736	that it can be predicted
-0.350037	branches inside can be predicted
-0.559797	pattern can also be predicted
-0.350192	final size cannot be predicted
-0.345995	they otherwise would be predicted
-0.549482	cheap if they are predicted
-0.037063	processor. Nested loops are predicted
-0.357588	several branches is not predicted
-0.345337	sequential labels is simply predicted
-0.341631	The loop-branch is usually predicted
-0.995445	is one of the main
-0.354487	a function in the main
-0.354487	message loop in the main
-0.651030	This works in the main
-0.357311	a proxy for the main
-0.355593	the CPU than the main
-0.354883	A call from the main
-0.353704	learning process where the main
-0.352656	in registers instead of main
-0.293601	then the version in main
-0.636239	a global variable in main
-0.237367	then the instance in main
-0.344220	and Fortran code. The main
-0.514572	AVX instruction set. The main
-0.237148	program is running. The main
-0.594373	can be accessed from main
-0.510752	threads. There are two main
-0.342768	analyze all pointers and references
-0.065617	versus references Pointers and references
-0.031570	operations. 7.6 Pointers and references
-0.031570	33 7.6 Pointers and references
-0.594868	accessed through pointers or references
-0.356608	using pointers rather than references
-0.314367	loader will have more references
-1.066988	The advantages of using references
-0.352718	be avoided by using references
-0.236350	are declared as constant references
-0.266946	more efficient because relative references
-0.213863	mode and mostly relative references
-0.226752	the DLL use absolute references
-0.224909	the calculation of self-relative references
-0.222356	and references Pointers versus references
-0.165138	while pointers and non-constant references
-0.165138	through function calls. Internal references
-0.623989	when the program is loaded
-0.366218	when the library is loaded
-0.472937	a dynamic library is loaded
-0.237893	esp+8 and esp+12 and loaded
-0.525259	pointer has to be loaded
-0.348373	is sure to be loaded
-0.500802	link pointer can be loaded
-0.354366	code section can be loaded
-0.354788	Some modules may be loaded
-0.352313	cache line will be loaded
-0.350192	memory address cannot be loaded
-0.348505	framework that must be loaded
-0.127225	The dynamic libraries are loaded
-0.127225	more dynamic libraries are loaded
-0.431240	object which is typically loaded
-0.165169	operator, or an over- loaded
-0.347499	made about whether the positive
-0.353309	the exponent is a positive
-0.499815	where N is a positive
-0.353132	reason is that a positive
-0.581028	idea to make a positive
-0.593561	the software contains a positive
-0.294164	on program performance. The positive
-0.506418	and works only for positive
-0.293740	which is 0 for positive
-0.237573	and unsigned variables. A positive
-0.237170	us to compare two positive
-0.324283	count up to some positive
-0.332876	appear as a large positive
-0.483482	as a very large positive
-0.278966	> v.f if both positive
-0.224484	programming style has both positive
-0.304379	variable produces a low positive
-0.502203	moved out of the loop.
-0.461077	every iteration of the loop.
-0.670499	the branch inside the loop.
-0.456497	the calculations inside the loop.
-0.307094	multiplication, etc.) inside the loop.
-0.293806	be done outside the loop.
-0.293806	can move outside the loop.
-0.324577	will change during the loop.
-0.237261	or to exit the loop.
-0.473710	advantage to unroll a loop.
-0.324227	output after the test loop.
-0.428482	changing in the innermost loop.
-0.236777	be inside the innermost loop.
-0.313385	memory outside the innermost loop.
-0.269461	inside the critical innermost loop.
-0.269461	outside the critical innermost loop.
-0.200037	would be an infinite loop.
-0.542122	updates every time the computer
-0.460846	update automatically when the computer
-0.460027	seemingly simultaneously. If the computer
-0.607293	to turn off the computer
-0.268675	or log off the computer
-0.287032	off or until the computer
-0.287032	the file until the computer
-0.237342	user to restart the computer
-0.354339	of objects in a computer
-0.354339	is smaller in a computer
-0.563591	movements of objects in computer
-0.308119	of graphics objects in computer
-0.314343	is never used. A computer
-0.237414	clock cycle on one computer
-0.344265	precious resource for many computer
-0.776121	on a Pentium 4 computer
-0.233788	data. Use an old computer
-0.587837	parameter is that the overhead
-0.520066	want to avoid the overhead
-0.325188	a driver involves the overhead
-0.293966	program without invoking the overhead
-0.452926	to the function. The overhead
-0.307104	class member function. The overhead
-0.313394	function inlining are: The overhead
-0.313394	communicating between threads. The overhead
-0.420538	with little or no overhead
-0.305542	optimize away the extra overhead
-0.328502	will be no extra overhead
-0.216574	logic may need extra overhead
-0.216574	gives an 9 extra overhead
-0.284566	because of the large overhead
-0.609275	There is a large overhead
-0.338225	and involve a high overhead
-0.231388	There is very little overhead
-0.013244	all on AMD and VIA
-0.013244	performance on AMD and VIA
-0.013244	well on AMD and VIA
-0.000362	of Intel, AMD and VIA
-0.003273	for Intel, AMD and VIA
-0.003273	from Intel, AMD and VIA
-0.003273	both Intel, AMD and VIA
-0.237809	an Intel, AMD or VIA
-0.218599	products fail to recognize VIA
-0.237945	bits for holding the pointer.
-0.759256	be converted to a pointer.
-0.351047	import table or a pointer.
-0.349230	const restriction from a pointer.
-0.340707	for actually making a pointer.
-0.454102	necessarily accessed through a pointer.
-0.330784	that behaves like a pointer.
-0.346114	edx as a memory pointer.
-0.470573	implementation of the member pointer.
-0.232918	use the same member pointer.
-0.491907	relative to the stack pointer.
-0.139571	pointer or a smart pointer.
-0.139571	don't need a smart pointer.
-0.139571	object through a smart pointer.
-0.200044	doesn't need the 'this' pointer.
-0.251362	doesn't need a 'this' pointer.
-0.212292	caller through a hidden pointer.
-0.441360	Use a compiler that supports
-0.237451	(requires a microprocessor that supports
-0.354064	multiple threads. The compiler supports
-0.552114	systems. The Intel compiler supports
-0.321167	explanation. (The Microsoft compiler supports
-0.234469	compilers. (The PGI compiler supports
-0.237579	a make utility. It supports
-0.517322	sure that the CPU supports
-0.343075	set than the CPU supports
-0.344271	single instruction. The CPU supports
-0.349040	64 bit instruction set supports
-0.349040	the x86-64 instruction set supports
-0.340986	Mac platform, but also supports
-0.231384	that processor model N supports
-0.421442	counters. My test tool supports
-0.165148	assume that model N+1 supports
-0.165148	integrated development environment (IDE) supports
-0.549563	arrays. Note that the C
-0.237871	of A, B and C
-0.294203	use of arrays in C
-0.325280	language will often be C
-0.236953	be linked together with C
-0.236953	can be manipulated with C
-0.525453	slightly more resources than C
-0.098669	used in the Gnu C
-0.556065	placed in a separate C
-0.230776	You may choose either C
-0.008671	1.1, B = 2.2, C
-0.017519	is the old fashioned C
-0.017519	in the old fashioned C
-0.199993	also includes the low-level C
-0.165133	security problem. The official C
-0.434984	and one that is compatible
-0.336256	instruction set that is compatible
-0.336256	generic version that is compatible
-0.665087	if the processor is compatible
-0.542311	alloca may not be compatible
-0.461713	program will not be compatible
-0.354686	then it will be compatible
-0.327889	Fastcall functions are not compatible
-0.424507	Watcom compilers are not compatible
-0.327889	function names are not compatible
-0.350017	instruction set. The most compatible
-0.292689	It is not even compatible
-0.229213	for Windows are fully compatible
-0.226773	many respects and highly compatible
-0.212297	The sequence of backwards compatible
-0.165163	systems are not backwards compatible
-0.200015	Mars compiler is mostly compatible
-0.503413	code because of a change
-0.294221	is not allowed to change
-0.294164	support. Hardware updating. The change
-0.570644	inside the loop can change
-0.509513	same dynamic library can change
-0.350640	with references. You can change
-0.322588	because modern CPUs can change
-0.347387	some caveats. We can change
-0.328702	example, a compiler may change
-1.015119	} The compiler may change
-0.354483	be adjusted if you change
-0.639959	an optimizing compiler will change
-0.323133	in a[] which will change
-0.331045	same result if we change
-0.232563	or const reference cannot change
-0.232563	is optimized. We cannot change
-0.229222	in most cases. Don't change
-0.504473	the variable in the global
-0.988116	when applied to a global
-1.102399	be stored in a global
-0.653528	same name as a global
-0.346541	object. Likewise, when a global
-0.237200	to make log2 a global
-0.314738	storage of static and global
-0.211756	Access to static or global
-0.211756	rely on static or global
-0.237405	variable two names, one global
-0.351181	to make a variable global
-0.236994	Do not make variables global
-0.329464	any function are called global
-0.232473	to its variables called global
-0.236187	You may preferably avoid global
-0.235372	any public variables. All global
-0.231868	on page 26. Avoid global
-0.294219	sizes. The results of my
-0.237877	generation of computers and my
-0.236825	can read about in my
-0.292984	512 512 matrix in my
-0.236825	systems. A look in my
-0.236825	and algebraic reductions in my
-0.573225	can be found in my
-0.606602	See the manual for my
-0.237487	corrections and suggestions for my
-0.294159	manuals. Please note that my
-0.348988	have been replaced by my
-0.549193	methods are based on my
-0.236905	is based mainly on my
-0.236852	serious legal issue. See my
-0.313909	in the code. For my
-0.234904	on this topic, see my
-0.222348	most reliable solution. (In my
-0.485664	You can avoid the conversions
-0.314569	solutions are: Avoid the conversions
-0.237770	of variables. Move the conversions
-0.237421	the C++ language, all conversions
-0.407063	overflow Integer to float conversions
-0.427519	as you don't need conversions
-0.292528	consumption of different type conversions
-0.799712	in order to avoid conversions
-0.284354	If you cannot avoid conversions
-0.214078	no extra time. These conversions
-0.214078	holding the pointer. These conversions
-0.214078	to single precision. These conversions
-0.214078	bypassing syntax checks. These conversions
-0.231873	See page 140. Avoid conversions
-0.279460	this problem. 7.11 Type conversions
-0.165148	there are floating point-to-integer conversions
-0.350868	as last time the statement
-0.313738	while loop, the if statement
-0.293266	} 135 The if statement
-0.237630	the same after this statement
-0.454212	should be only one statement
-0.434270	macro so that each statement
-0.298947	that the function call statement
-0.298947	information. Each function call statement
-0.554670	evaluate the loop control statement
-0.162397	to predict a switch statement
-0.162397	older processors, a switch statement
-0.168910	branch tree or switch statement
-0.216476	jump targets. A switch statement
-0.168910	array initializer lists, switch statement
-0.393756	While an empty throw() statement
-0.228124	page 53). No general statement
-0.407856	most common cause of errors
-0.325137	a frequent source of errors
-0.237518	useful for preventing program errors
-0.357738	array. But the same errors
-0.306104	explicit checks for such errors
-0.140794	order to prevent such errors
-0.140794	way to prevent such errors
-0.306260	and other common programming errors
-0.230797	It may catch programming errors
-0.346490	because it can cause errors
-0.290779	possible ways of handling errors
-0.226742	is unsafe because serious errors
-0.212269	overflow can cause unpredictable errors
-0.212269	sign and rounding 137 errors
-0.165138	is intended for detecting errors
-0.165138	invalid and cause fatal errors
-0.237890	should be optional and off
-0.237804	RTTI then turn it off
-0.068321	your program to turn off
-0.068321	user has to turn off
-0.068321	the user to turn off
-0.068321	be useful to turn off
-0.091352	is recommended to turn off
-0.155527	two versions and turn off
-0.113901	point calculations or turn off
-0.231892	until you turn them off
-0.226773	turn off or log off
-0.088597	be used for turning off
-0.080871	whole program by turning off
-0.080871	significantly just by turning off
-0.165164	(bitwise and) will cut off
-0.555070	8. The number of unused
-0.237724	can cause holes of unused
-0.348237	question: Put in an unused
-0.592357	compiler from making an unused
-0.314019	at 13 // 2 unused
-0.588983	bytes. first // 4 unused
-0.232402	There are also 4 unused
-0.236838	stamp counter // For unused
-0.652541	top of loop ; unused
-0.216109	a ; r ; unused
-0.216109	loop if true ; unused
-0.216109	Func ;a ;r ; unused
-0.350338	then add a few unused
-0.328845	not possible to add unused
-0.196012	Here, there are 6 unused
-0.196012	bytes. first // 6 unused
-0.237939	aiming at explaining the relative
-0.356260	distinguish elements with a relative
-0.237733	when it sees a relative
-0.451077	set has support for relative
-0.594995	in the code are relative
-0.336151	address of each function relative
-0.329667	slightly more efficient because relative
-0.235666	code is smaller because relative
-0.470566	offset of the member relative
-0.486829	of a data member relative
-0.337918	-fpic. This will generate relative
-0.334012	number. If the offset relative
-0.199998	bit mode and mostly relative
-0.074767	relocation, but only self- relative
-0.074767	used for calculating self- relative
-0.165138	are in fact addressed relative
-0.478838	make the number of columns
-0.936987	If the number of columns
-0.781592	where the number of columns
-0.478838	making the number of columns
-0.023501	number of rows and columns
-0.421148	element. The multiplication by columns
-0.293871	case is faster when columns
-0.548857	c++) { // loop columns
-0.236481	through rows // loop columns
-0.237455	j << 5. If columns
-0.236812	leave the last 8 columns
-0.233027	possible to add unused columns
-0.008672	int rows = 20, columns
-0.200009	int rows = 10, columns
-0.325128	to add i to p
-0.520469	that is added to p
-0.314394	compiler will see that p
-0.537942	it is clear that p
-0.325217	int i; p = p
-0.538570	value pointed to by p
-0.237719	the same thing as p
-0.343179	cycles after the pointer p
-0.335682	what class of object p
-0.237078	C1 obj1; C0 * p
-0.236848	can be read before p
-0.354384	* p; int i; p
-0.235621	is equally fast whether p
-0.284312	good at optimizing away p
-0.199987	= &Object1; p->NotPolymorphic(); p->Hello(); p
-0.251297	Object2; CHello * p; p
-0.640518	good choice for all platforms.
-0.098259	64-bit Linux and Windows platforms.
-0.098259	supports Linux and Windows platforms.
-0.222380	some cases on Windows platforms.
-0.680815	Windows, Linux and Mac platforms.
-0.186868	optimization guide for x86 platforms.
-0.031575	Works with all x86 platforms.
-0.300592	source. Supports all x86 platforms.
-0.229219	is not standardized across platforms.
-0.228144	tested only on PC platforms.
-0.016035	all x86 and x86-64 platforms.
-0.212286	choice for all Unix-like platforms.
-0.035779	exist for all major platforms.
-0.035779	supported on all major platforms.
-0.350079	managed C++, and other languages
-0.425798	execute faster than other languages
-0.236591	Java implementations. However, these languages
-0.328480	The history of programming languages
-0.208704	be called from programming languages
-0.191441	compilers. Several other programming languages
-0.191441	performance over other programming languages
-0.208704	and other compiled programming languages
-0.208704	Perl. Several modern programming languages
-0.312642	be useful in compiled languages
-0.316684	than speed. This includes languages
-0.283104	such advantage in interpreted languages
-0.212300	program size, while high-level languages
-0.200009	and development time. Interpreted languages
-0.165148	fully compiled code. Compiled languages
-0.165148	project at hand. Low-level languages
-1.055676	the rest of the installation
-0.657357	not unusual for the installation
-0.325199	be selected during the installation
-0.573905	(*.dll or *.so). The installation
-0.237138	not in use. The installation
-0.237138	to be installed. The installation
-0.237860	problems. The procedures for installation
-0.293802	128 below. Dispatch at installation
-0.237421	possible to select all installation
-0.236772	Software developers should take installation
-0.206126	of time both during installation
-0.206126	the framework itself, during installation
-0.229208	should always use standardized installation
-0.068033	.................................................................................. 16 3.3 Program installation
-0.068033	following sections. 3.3 Program installation
-0.218549	rather than by individual installation
-0.327976	// Define function name depending
-0.339261	implemented in various ways depending
-0.233017	their clock frequency dynamically depending
-0.085981	- 16 clock cycles, depending
-0.085981	- 10 clock cycles, depending
-0.085981	- 6 clock cycles, depending
-0.085981	- 80 clock cycles, depending
-0.085981	- 25 clock cycles, depending
-0.226731	parts of the memory, depending
-0.311737	cycles for 32-bit integers, depending
-0.212269	between 9 and 64, depending
-0.199998	be three or four, depending
-0.165138	keyword has several meanings depending
-0.165138	faster than example 12.4a, depending
-0.165138	by a conditional move, depending
-0.165138	of the following solutions, depending
-0.657557	the sense that the syntax
-0.351016	quite efficient, but the syntax
-0.268989	never called. Unfortunately, the syntax
-0.268989	do this. Unfortunately, the syntax
-0.237914	way of relieving a syntax
-0.237138	bit of x The syntax
-0.313816	than pointers are: The syntax
-0.237138	up cache space. The syntax
-0.382413	with a little more syntax
-0.339797	hidden behind the C++ syntax
-0.327975	Type conversions The C++ syntax
-0.293206	pointer. It has some syntax
-0.329800	the same inline assembly syntax
-0.236182	int BigArray[1024]; // Windows syntax
-0.235916	BigArray[1024] __attribute__((aligned(64))); // Linux syntax
-0.212300	different way or bypassing syntax
-0.504657	is 50% of the cases.
-0.275247	by value in most cases.
-0.275247	is optimal in most cases.
-0.275247	linked lists in most cases.
-0.313823	viable solution in such cases.
-0.234555	the variable in many cases.
-0.234555	reductions explicitly in many cases.
-0.296048	improve performance in some cases.
-0.296048	efficient solution in some cases.
-0.296048	program structure in some cases.
-0.117339	data file in simple cases.
-0.174256	optimization automatically in simple cases.
-0.117339	at least in simple cases.
-0.349941	4 in the best cases.
-0.336531	the same in both cases.
-0.429327	only in the simplest cases.
-0.234141	on the newest processors. Supports
-0.289220	as the Microsoft compiler. Supports
-0.231843	Yeppp. Open source library. Supports
-0.231343	to the Intel libraries. Supports
-0.230763	vectorization. Optimizes moderately well. Supports
-0.744878	and later instruction sets. Supports
-0.230058	well, others are not. Supports
-0.218514	many good optimization options. Supports
-0.212246	to produce binary code). Supports
-0.251285	OpenMP and automatic parallelization. Supports
-0.330748	X, 32-bit and 64-bit. Supports
-0.251285	Windows, Linux and Mac. Supports
-0.199976	class library. Open source. Supports
-0.165117	kit (SDK or PSDK). Supports
-0.165117	not fully optimized yet. Supports
-0.165117	explanation and possible workaround. Supports
-0.358306	and foremost, in the choice
-0.581757	conclusion is that the choice
-0.353720	this discussion that the choice
-0.382413	Most compilers offer the choice
-0.293871	mainframe computers. Today, the choice
-0.236428	of hardware platform The choice
-0.292532	for Windows applications. The choice
-0.236428	the B values. The choice
-0.236428	platforms. Graphics accelerators The choice
-0.236428	the best algorithm. The choice
-0.041261	compiler is a good choice
-0.295030	is a very good choice
-0.344800	to be the optimal choice
-0.404485	n with a suitable choice
-0.325400	the decimal point is 1.
-0.051219	values than 0 and 1.
-0.537440	specialization for N = 1.
-0.237193	y2, reciprocal_divisor; reciprocal_divisor = 1.
-0.348427	to be 0 or 1.
-0.252926	value than 0 or 1.
-0.064182	values than 0 or 1.
-0.381832	17 will evict number 1.
-0.251323	dealing with this problem: 1.
-0.165148	software uses CPU dispatching: 1.
-0.165148	make a function local: 1.
-0.165148	series of five manuals: 1.
-0.165148	following conditions are satisfied: 1.
-0.826283	the performance of the STL
-0.355710	The generality of the STL
-0.355710	and flexibility of the STL
-0.720813	container classes in the STL
-0.356360	The containers in the STL
-0.341216	different purposes. However, the STL
-0.382167	size. In fact, the STL
-1.178138	that are used in STL
-0.293904	implementing a matrix in STL
-0.348240	objects stored in an STL
-0.330345	by one, into an STL
-0.534332	list. Do not use STL
-0.237461	the vector. The other STL
-0.236305	in the STL. Some STL
-0.218560	for every four objects. STL
-0.251335	objects in the container. STL
-1.125768	is that it is intended
-0.457523	purposes than it is intended
-0.500550	overflow. This function is intended
-0.754201	branch of code is intended
-0.356536	even temporarily. This is intended
-0.353442	64-bit systems. It is intended
-0.353442	own IDE. It is intended
-0.592638	handling Exception handling is intended
-0.330398	2010. This feature is intended
-0.236402	so-called symbol interposition is intended
-0.653781	function libraries that are intended
-0.293634	version. The examples are intended
-0.294027	is indeed vectorized as intended
-0.573764	profiler. It is not intended
-0.336064	to execute slower than intended
-0.283133	a physics processing unit intended
-0.490401	control the addresses of dynamically
-0.717664	for objects stored in dynamically
-0.352251	an object must be dynamically
-0.356088	other resource, such as dynamically
-0.314230	that pointers to different dynamically
-0.383804	object that is allocated dynamically
-0.168296	array can be allocated dynamically
-0.168296	objects can be allocated dynamically
-0.168296	arrays can be allocated dynamically
-0.291987	program uses many small dynamically
-0.341871	change their clock frequency dynamically
-0.410523	for how to align dynamically
-0.074777	exp exp 12.8 Aligning dynamically
-0.074777	vectors........................................................................ 119 12.8 Aligning dynamically
-0.212269	for discussion of aligning dynamically
-0.165138	clock frequency may vary dynamically
-0.435052	on a sequence of consecutive
-0.421242	objects are identified by consecutive
-0.236782	Each line covers 64 consecutive
-0.235755	This loop calculates four consecutive
-0.001920	result vector in eight consecutive
-0.000479	{ // Load eight consecutive
-0.000639	i); // Load eight consecutive
-0.001920	b.load(bb+i); // Load eight consecutive
-0.355815	to execute then the profiler
-0.339398	same computer, including the profiler
-0.355671	test run with a profiler
-0.293422	with the way a profiler
-0.102663	16 3.2 Use a profiler
-0.102663	improved. 3.2 Use a profiler
-0.313901	compiler packages include a profiler
-0.236065	in optimized programs. The profiler
-0.102187	reliable. Event-based sampling: The profiler
-0.102187	line. Time-based sampling: The profiler
-0.292119	from other processes. The profiler
-0.236065	e.g. every millisecond. The profiler
-0.236065	it takes. Debugging. The profiler
-0.237588	on page 153. A profiler
-0.226778	fit their CPUs. Intel's profiler
-0.165169	is called VTune; AMD's profiler
-0.441457	causes the memory to become
-0.448503	objects they point to become
-0.335822	is almost certain to become
-0.293567	the heap space to become
-0.805226	then the code can become
-0.236733	execute then measurements can become
-0.236733	of CPUs unequally can become
-0.312931	and that computers have become
-0.236397	that software projects have become
-0.237585	that such feature will become
-0.237583	the old block then become
-0.232371	a suboptimal way has become
-0.232371	the heap space has become
-0.232371	of hardware platform has become
-0.232371	when the heap has become
-0.313467	The heap can easily become
-0.352344	VIA processor and a Windows,
-0.508542	reporting. For example, in Windows,
-0.636942	the Gnu compiler for Windows,
-0.292893	(/arch:SSE2, /arch:AVX etc. for Windows,
-0.230732	An optimization guide for Windows,
-0.294038	most common platforms with Windows,
-0.697382	for 32- and 64-bit Windows,
-0.332247	register parameters. In 64-bit Windows,
-0.237126	intervals are short. In Windows,
-0.488055	cheap compiler for 32-bit Windows,
-0.288524	); #else // 32-bit Windows,
-0.350602	Clang Supported operating systems Windows,
-0.235010	systems Windows, Linux, Mac Windows,
-0.231380	as DOS and 16-bit Windows,
-0.222348	to avoid this. (In Windows,
-0.357984	an error if the index
-0.237857	support for multiplying the index
-0.294163	{ // Check that index
-0.237733	instead of j as index
-0.237726	the index, i. This index
-0.340255	list or with an index
-0.236672	a constant plus an index
-0.335661	in case the array index
-0.280300	n is an array index
-0.280300	check if an array index
-0.561544	used as an array index
-0.223707	// Safe [] array index
-0.321684	be identified by their index
-0.253649	so that the last index
-0.253649	accessed with the last index
-0.165153	you may write FatalAppExitA(0,"Array index
-0.354616	point addition on a modern
-0.237154	the vector operations of modern
-0.338483	The high speed of modern
-0.237154	The execution core of modern
-0.237154	the out-of-order capabilities of modern
-0.237154	the high complexity of modern
-0.237862	is quite inefficient. The modern
-1.003212	The reason is that modern
-0.542155	results. This is because modern
-0.309212	is supported by all modern
-0.233276	the reason why all modern
-0.376382	used in almost all modern
-0.382028	which comes with most modern
-0.235377	of order execution All modern
-0.234910	in chapter 12. Most modern
-0.224909	example is Perl. Several modern
-0.503796	is the one that gives
-0.313695	choose the method that gives
-0.313695	Use the option that gives
-0.355035	is useful because it gives
-0.456466	two kinds of code gives
-0.237717	the external clock. This gives
-0.237429	of a double which gives
-0.461171	the AVX2 instruction set gives
-0.420213	combination of these two gives
-0.292861	profiling, but it often gives
-0.542994	precision in 32-bit systems gives
-0.233550	c; The calculation here gives
-0.230066	vector class library, SSE4.1 gives
-0.806116	AMD and VIA CPUs" gives
-0.251297	AND'ed with all 0's gives
-0.199987	that N1 = N&(N-1) gives
-0.296844	NumberOfTests; i++) { // Loop
-0.296844	list.Size(); i++) { // Loop
-0.332743	+= TILESIZE) { // Loop
-0.234323	Loop with branch // Loop
-0.047812	coefficients // Table // Loop
-0.047812	A2; // Table // Loop
-0.234323	divisible by TILESIZE // Loop
-1.617449	- - x x Loop
-0.718416	- 1; } } Loop
-0.313278	the total execution time. Loop
-0.670563	option in the compiler. Loop
-0.459457	loop in this case. Loop
-0.694324	by the unroll factor. Loop
-0.200009	if branch is eliminated. Loop
-0.165148	arrays: // Example 12.4a. Loop
-0.165148	polynomial: // Example 8.23a. Loop
-0.575630	of parameter transfer is avoided
-0.475475	pointer. This can be avoided
-0.475475	CPUs"). This can be avoided
-0.475475	place. This can be avoided
-0.622296	this example can be avoided
-0.439354	invalid pointers can be avoided
-0.439354	a constant can be avoided
-0.339734	jumps Jumps can be avoided
-0.339734	result (b+c) can be avoided
-0.334341	This penalty should be avoided
-0.334341	bits wide, should be avoided
-0.334341	or malloc/free should be avoided
-0.567188	unrolling should preferably be avoided
-0.468127	conversions can sometimes be avoided
-0.141205	framework should definitely be avoided
-0.141205	containers should definitely be avoided
-0.355095	pointer aliasing is to turn
-0.323724	in your program to turn
-0.452912	the user has to turn
-0.380960	forbids the user to turn
-0.531318	can be useful to turn
-0.670753	It is recommended to turn
-0.237623	you are using and turn
-0.237623	these two versions and turn
-0.237901	another function which in turn
-0.537394	options that you can turn
-0.237791	floating point calculations or turn
-0.347374	Enterprise editions). Do not turn
-0.237678	stay on until you turn
-0.237582	option for RTTI then turn
-0.358054	a function if the inlining
-0.294196	optimizing away p and inlining
-0.237868	ignore a request for inlining
-0.292734	The disadvantage of function inlining
-0.292734	The advantages of function inlining
-0.334446	Register allocation and function inlining
-0.235692	be able do function inlining
-0.036918	a leaf function by inlining
-1.169658	can be avoided by inlining
-0.539950	probably be improved by inlining
-0.351744	other modules. This makes inlining
-0.209486	table. Optimization method Function inlining
-0.209486	the inlined function. Function inlining
-0.209486	a non-inlined copy Function inlining
-0.209486	to know about. Function inlining
-0.463238	cases, regardless of the size.
-0.294159	int, without specifying the size.
-0.331727	software for speed or size.
-0.353291	in terms of code size.
-0.448644	multiple of the vector size.
-0.234055	size divisible by vector size.
-0.234055	using the larger vector size.
-0.229245	less than the cache size.
-0.332670	bigger than the cache size.
-0.285972	memory access and cache size.
-0.481302	of the level-1 cache size.
-0.331037	few arrays of variable size.
-0.332184	than the vector register size.
-0.232158	the largest available register size.
-0.449604	types of a specific size.
-0.291943	of the matrix line size.
-0.457588	cause problems if the network
-0.354167	a minute if the network
-0.498367	use situation where the network
-0.357938	standard PC's in a network
-0.354434	be tested on a network
-0.237912	interfaces and interfaces to network
-0.325013	put file access and network
-0.335905	connections. Open files and network
-0.237862	cannot be controlled. The network
-0.444262	the response times for network
-0.538620	for user input or network
-0.237754	vulnerability of software with network
-0.335462	clients that depend on network
-0.330634	Software that relies on network
-0.200009	files or accessing databases, network
-0.237943	Linux and BSD, the slow
-0.141994	16; // This is slow
-0.354953	a division, which is slow
-0.351907	processes running, and a slow
-0.346273	on CPUs with a slow
-0.346273	old computer with a slow
-0.629749	Floating point comparisons are slow
-0.434821	function for CPUs with slow
-0.292692	Functions Function calls may slow
-0.236568	reads and writes may slow
-0.237434	simple test setup but slow
-0.236481	these table lookup operations slow
-0.210520	Some CPUs have particularly slow
-0.210520	bottleneck or any particularly slow
-0.237240	y = (a + b)
-0.237182	{} vector(float a, float b)
-0.236934	- n.a. !(a < b)
-0.344260	a, T const & b)
-0.061505	calculations of (2n / b)
-0.061505	a * (2n / b)
-0.061505	The constant (2n / b)
-0.534002	b ? a : b)
-0.234340	!b = !(a || b)
-0.234346	(a * c > b)
-0.232674	b) = (a >= b)
-0.002204	SomeFunction (int a, bool b)
-0.237890	<, <=, > and >=
-0.231229	< 0 and i >=
-0.231229	< 0 || i >=
-0.231229	x = 2.0; i >=
-0.231237	(unsigned int if (i >=
-0.306784	i; ... if (i >=
-0.281556	< b) = (a >=
-0.008673	&CriticalFunction_AVX; } if (level >=
-0.008673	} else if (level >=
-0.002152	many branches): if (level >=
-0.048395	if else { (iset >=
-0.048395	SelectAddMul_pointer = &SelectAddMul_AVX2; (iset >=
-0.048395	SelectAddMul_pointer = &SelectAddMul_SSE41; (iset >=
-0.330816	14.4b if ((unsigned int)i >=
-0.658141	the length of the desired
-0.179204	a pointer to the desired
-0.342496	is made to the desired
-0.342496	function go to the desired
-0.517665	and compiled for the desired
-0.352374	most appropriate for the desired
-0.340986	you may put the desired
-0.486431	options to enable the desired
-0.023454	possible to obtain the desired
-0.575609	can be initialized to desired
-0.438982	problems, usability problems and desired
-0.237769	Define function type with desired
-0.681472	which they are used. Such
-0.230764	of going either way. Such
-0.228100	and the application software. Such
-0.226731	a so-called soft processor. Such
-0.226719	a hardware definition language. Such
-0.451448	a loop-carried dependency chain. Such
-0.218525	programs they are running. Such
-0.330763	appears on the market. Such
-0.074763	in the same chip. Such
-0.074763	in the CPU chip. Such
-0.165128	circumvent operating system standards. Such
-0.165128	based on hardware identification. Such
-0.165128	objects in computer games. Such
-0.165128	keyword __thread or __declspec(thread). Such
-0.165128	that are not reproducible. Such
-0.582554	you to use the #pragma
-0.357060	vectorized code when the #pragma
-0.293357	__declspec(noalias) or __restrict or #pragma
-0.237152	compiler to vectorize, or #pragma
-0.237782	vector always Optimize function #pragma
-0.341567	is used, then use #pragma
-0.442079	Vectorize #pragma vector always #pragma
-0.235209	fastcall)) __fastcall Noncached write #pragma
-0.219481	Assume pointer is aligned #pragma
-0.585811	aligned #pragma vector aligned #pragma
-0.432958	write #pragma vector nontemporal #pragma
-0.191038	__restrict #pragma ivdep __restrict #pragma
-0.191038	__restrict __declspec( noalias) __restrict #pragma
-0.165143	Assume pointer not aliased #pragma
-0.165143	__attribute__ ((visibility ("internal"))) Vectorize #pragma
-0.307438	use in system code. Dynamic
-0.614973	produce any extra code. Dynamic
-0.628340	static linking is used. Dynamic
-0.554186	28 Dynamic memory allocation Dynamic
-1.013907	data caching less efficient. Dynamic
-0.857696	to the end user. Dynamic
-0.518794	without dynamic memory allocation. Dynamic
-0.521604	makes data caching inefficient. Dynamic
-0.222342	or 64-bit systems). 28 Dynamic
-0.218537	obstacles to optimization are. Dynamic
-0.265145	of allocations is limited. Dynamic
-0.074767	dynamically allocated memory. 9.6 Dynamic
-0.074767	data ...................................................................................................... 90 9.6 Dynamic
-0.074767	loading ....................................................................................................... 19 3.6 Dynamic
-0.074767	file, is acceptable. 3.6 Dynamic
-0.293671	separate from seldom used functions,
-0.313895	of CPU-time in library functions,
-0.293125	doesn't work with member functions,
-0.591473	of pointers to its functions,
-0.236071	Avoid multiple inheritance, virtual functions,
-0.400218	some support for intrinsic functions,
-0.210330	PGI compiler supports intrinsic functions,
-0.210330	compilers allow assembly-like intrinsic functions,
-0.230065	core library contains similar functions,
-0.229225	and log are pure functions,
-0.212275	is limited to well-tested functions,
-0.013566	such as logarithms, exponential functions,
-0.013566	logarithms, exponential functions, trigonometric functions,
-0.357023	interprocedural optimizations of the whole
-0.357023	better understanding of the whole
-0.356979	to T+6, and the whole
-0.357383	exception handling for the whole
-0.341391	they have put the whole
-0.237430	branch mispredictions. Test the whole
-0.237430	is occupied throughout the whole
-0.331788	work to take a whole
-0.237742	function that draws a whole
-0.867939	have an option for whole
-0.450574	compilers have support for whole
-0.351748	compiler that can do whole
-0.313477	have a feature called whole
-0.236149	compiler allows "__attribute__((visibility("hidden")))". Use whole
-0.333825	efficient, way of doing whole
-0.485977	we can avoid the inefficient
-0.354641	size. However, it is inefficient
-0.354641	other words, it is inefficient
-0.357056	to it. This is inefficient
-0.293463	from each other is inefficient
-0.313944	thread. Thread-local storage is inefficient
-0.629762	Floating point comparisons are inefficient
-0.349482	memory allocation in an inefficient
-0.348919	while other compilers have inefficient
-0.337323	Interpreted code is very inefficient
-0.310722	but in a very inefficient
-0.310722	is certainly a very inefficient
-0.493303	list can be very inefficient
-0.465177	This can be quite inefficient
-0.224934	and flexible, but quite inefficient
-0.356718	the level-1 and the level-2
-0.105997	contentions occur in the level-2
-0.647228	causes misses in the level-2
-0.517941	experiments. Contentions in the level-2
-0.357165	critical stride for the level-2
-0.587193	contentions is that the level-2
-0.929470	is bigger than the level-2
-0.354728	level-1 cache from the level-2
-0.343326	write instruction prevents the level-2
-0.352136	64 Kbytes and a level-2
-0.237746	and only if, a level-2
-0.345149	the level-2 cache. The level-2
-0.237872	so much stronger for level-2
-0.237793	y=temp;} // Check if level-2
-1.139699	make sure that the response
-0.353647	the advantage that the response
-0.346856	interactive programs because the response
-0.346856	critical applications because the response
-0.356256	250 ms. If the response
-0.324889	you should test the response
-0.345215	user if such a response
-0.438940	thread is waiting for response
-0.046300	cause of unacceptably long response
-0.046300	frustrated by unacceptably long response
-0.046300	sometimes have unacceptably long response
-0.046300	might experience unacceptably long response
-0.234373	the cost of longer response
-0.200026	user expects an immediate response
-0.165164	annoyingly long and irregular response
-0.430322	compiler. This method is described
-0.430322	added. This method is described
-0.237838	microprocessor. These algorithms are described
-0.234244	CPU detection function as described
-0.310365	for CPU-intensive code, as described
-0.015356	of modern CPUs, as described
-0.064977	or multi-core CPUs, as described
-0.352912	this chapter, I have described
-0.340543	be negative. The method described
-0.292534	except for the cases described
-0.290652	investigated by the methods described
-0.417674	called. Unfortunately, the syntax described
-0.232306	similar methods are further described
-0.251323	unwinding The preceding paragraph described
-1.016577	is a power of 2.
-1.005419	be a power of 2.
-0.057234	a high power of 2.
-0.338977	numbers are powers of 2.
-0.500429	the result will be 2.
-0.036995	to divide i by 2.
-0.405752	is rolled out by 2.
-0.233010	Linux and Mac platforms. 2.
-0.230097	a particular code version. 2.
-0.165159	has a different meaning. 2.
-0.165159	linker and the loader. 2.
-0.165159	Clang, Intel or PathScale. 2.
-0.510536	using different types of variables.
-0.549244	registers for the same variables.
-0.331263	the values of all variables.
-0.324896	decrement operators on integer variables.
-0.293375	should never use static variables.
-0.519948	on signed and unsigned variables.
-0.319082	the use of register variables.
-0.302428	used most for register variables.
-0.749323	make floating point register variables.
-0.313153	is avoided for these variables.
-0.290374	optimize this with induction variables.
-0.234168	by avoiding any public variables.
-0.351507	on static or global variables.
-0.268503	function are called global variables.
-0.233031	a sequence of consecutive variables.
-0.881684	that the number of lines
-0.404682	28 because the cache lines
-0.311947	that all the cache lines
-0.329398	which set of cache lines
-0.219674	into eight different cache lines
-0.219674	from loading any cache lines
-0.219674	fourth of these cache lines
-0.014611	of the four cache lines
-0.061645	are only four cache lines
-0.293499	caches are organized into lines
-0.236838	any of the 4 lines
-0.292628	This corresponds to 16 lines
-0.236063	8*1024/64 = 128. These lines
-0.707022	longer than a few lines
-0.293963	the circumstances around the hot
-0.713986	useful for finding the hot
-0.237685	of code once the hot
-0.293963	useful to isolate the hot
-0.851744	If there is a hot
-0.237560	of profiling. When a hot
-0.237560	enough to identify a hot
-0.346403	the critical functions and hot
-0.348988	a single function or hot
-0.354001	will occur in this hot
-0.237045	The profiler identifies any hot
-0.056659	a profiler to find hot
-0.333192	not intended for finding hot
-0.165153	are useful for identifying hot
-0.402867	of different integer types Unfortunately,
-0.566612	function is never called. Unfortunately,
-0.321402	1; a[1] = 2; Unfortunately,
-0.439923	containing pure function calls. Unfortunately,
-0.317359	of such container classes. Unfortunately,
-0.286774	many of these purposes. Unfortunately,
-0.675397	with automatic CPU dispatching. Unfortunately,
-0.226726	using the virtual table. Unfortunately,
-0.682182	the same processor core. Unfortunately,
-0.224879	how to do this. Unfortunately,
-0.222342	local non-member functions. 80 Unfortunately,
-0.265125	CPUs use AMD CodeAnalyst. Unfortunately,
-0.165122	sake of cross-platform portability. Unfortunately,
-0.165122	functions lrintf and lrint. Unfortunately,
-0.165122	explained on page 132. Unfortunately,
-0.286133	IA-32/Intel64, 2009. Gnu C++ v.
-0.286133	(Red Hat). PathScale C++ v.
-0.286133	3.1, 2007. PGI C++ v.
-0.371916	2.00. Intel C++ compiler, v.
-0.228581	2009). Intel C++ Compiler v.
-0.164613	tested: Microsoft C++ Compiler v.
-0.171913	2008. Digital Mars Compiler v.
-0.212297	2004. Open Watcom C/C++ v.
-0.251316	1.4, 2005. Codeplay VectorC v.
-0.165143	currently doesn't works (gcc v.
-0.165143	Math Kernel Library (MKL v.
-0.165143	v. 2.7, 2.8. Asmlib: v.
-0.165143	9.0 CodeGear Borland bcc, v.
-0.165143	v 4.0.1. Gnu: Glibc v.
-0.165143	Microsoft Visual studio 2008, v.
-0.237719	results in a. This operation
-0.352780	loop if the same operation
-0.352780	sets where the same operation
-0.351752	If each floating point operation
-0.351752	units. Any floating point operation
-0.346921	write it in one operation
-0.323979	result of the & operation
-0.539037	bits in a single operation
-0.312814	miss on a store operation
-0.234164	extra resources. Each graphics operation
-0.320236	replaced by a shift operation
-0.231385	execution units. Each 128-bit operation
-0.222352	-1. The bitwise AND operation
-0.199993	operations. A complex digital operation
-0.165133	by performing an illegal operation
-1.388608	critical part of the code,
-0.553688	task-specific part of the code,
-0.354510	lazy loading of the code,
-1.035888	the rest of the code,
-0.358301	the reciprocal in the code,
-0.652192	Before you start to code,
-0.355456	to make more efficient code,
-0.291357	be useful for optimizing code,
-0.333589	compiled into an intermediate code,
-0.224943	big runtime frameworks, intermediate code,
-0.332191	modifications of the source code,
-0.304379	needs of position- independent code,
-0.222353	assembly language for CPU-intensive code,
-0.200009	of compatibility with legacy code,
-0.165148	way of removing superfluous code,
-0.653825	shared object, then the instance
-0.293981	are declared whenever an instance
-0.491257	load more than one instance
-0.298997	Such variables have one instance
-0.279193	data and make one instance
-0.476373	template has only one instance
-0.224684	you will get one instance
-0.046229	code section needs one instance
-0.046229	data section needs one instance
-0.293573	be stored with each instance
-0.330374	compile time. A template instance
-0.599562	to make a new instance
-0.327724	will generate a new instance
-0.488775	is that the next instance
-0.234185	implementing polymorphic classes. Each instance
-0.428776	than the library that comes
-0.237446	the first algorithm that comes
-0.237773	it is initialized or comes
-0.349639	many advantages when it comes
-0.354583	can do because it comes
-0.355573	and Linux. The compiler comes
-0.237577	and open source. It comes
-0.237432	Template Library (STL) which comes
-0.593762	a new register size comes
-0.235704	logical register. This advantage comes
-0.234759	the next new model comes
-0.234646	most critical integer parameter comes
-0.230783	called. A considerable delay comes
-0.230776	in Linux and BSD comes
-0.212263	where the main feedback comes
-0.504825	took advantage of the fact
-0.312176	A pointer is in fact
-0.077315	stack and are in fact
-0.077315	C++ program are in fact
-0.077315	14.7b, we are in fact
-0.036939	because they are in fact
-0.036939	but they are in fact
-0.330862	a program may in fact
-0.235763	loop exits, when in fact
-0.235763	vector registers had in fact
-0.313403	a suboptimal way. The fact
-0.292946	risk of underflow. The fact
-0.236792	the sign bit. The fact
-0.236792	rather than 20. The fact
-0.860111	take advantage of this fact
-0.354419	CPU-intensive software is to find
-0.753466	times in order to find
-0.520836	algorithms in order to find
-0.520836	sizeof(float) in order to find
-0.906848	and you want to find
-0.506649	example, you want to find
-0.235799	string of bytes to find
-1.077552	not be able to find
-0.448865	can be difficult to find
-0.412298	that are difficult to find
-0.023362	Use a profiler to find
-0.348863	Here, you can also find
-0.346252	questions if you cannot find
-0.200043	a function call. (2) find
-0.547146	clean solution is to rely
-0.421074	certainly more convenient to rely
-0.330805	work on compilers that rely
-0.313705	from making optimizations that rely
-0.293234	vector operations. Algorithms that rely
-0.482848	code and you can rely
-0.342976	most cases you can rely
-0.237394	by multiple threads should rely
-0.313644	dispatcher 128 function cannot rely
-0.473450	loop then you cannot rely
-0.335371	which not. You cannot rely
-0.236513	but you cannot always rely
-0.236233	the loop branch must rely
-0.229226	longjmp if possible. Don't rely
-0.165153	that we can surely rely
-0.826255	} else { // No
-0.350020	return *(T*)0; } // No
-0.381948	pointer -fomit- frame- pointer No
-0.307324	compactness, and execution time. No
-0.561575	compiler at compile time. No
-0.340840	elements consecutively in memory. No
-0.235291	program is actually used. No
-0.233271	of loop iterations are: No
-0.690813	through a template parameter. No
-0.229183	threads need separate storage. No
-0.228126	through a linear array. No
-0.212263	VectorC v. 2.1.7, 2004. No
-0.165133	No exception handling /EHs- No
-0.165133	efficient (see page 53). No
-0.165133	-fwhole- program /Qipo -ipo No
-0.237527	cases, for example to produce
-0.498167	arguments are sure to produce
-0.237527	Language Runtime, CLR, to produce
-0.314400	come from operators that produce
-0.237453	worst-case conditions. Programs that produce
-0.294127	Booleans as output can produce
-0.352498	These conversions do not produce
-0.468836	The compiler does not produce
-0.332777	check. It does not produce
-0.293959	is ambiguous and may produce
-0.544402	/arch:SSE2. The compiler will produce
-0.237394	good optimizing compiler should produce
-0.237335	and Digital Mars compilers produce
-0.317684	~. The Boolean operators produce
-0.471781	needed. The bitwise operators produce
-0.358611	consuming features of the position-independent
-0.621634	by turning off the position-independent
-0.473641	avoiding the costs of position-independent
-0.037134	3.6 Dynamic linking and position-independent
-0.294008	that is compiled as position-independent
-0.236261	Mac systems often use position-independent
-0.292342	in Unix-like systems use position-independent
-0.237511	Mac OS X make position-independent
-0.237306	and by not using position-independent
-0.236573	make shared objects without position-independent
-0.340032	code section is always position-independent
-0.379805	but the compiler uses position-independent
-0.287822	The need for special position-independent
-0.165138	to avoid the burdensome position-independent
-0.637180	Transforming serial code for vectorization
-0.117019	register. Factors that make vectorization
-0.117019	is. Factors that make vectorization
-0.457466	function where you want vectorization
-0.235767	that decide how advantageous vectorization
-0.235637	to predict correctly whether vectorization
-0.293649	vector intrinsics and automatic vectorization
-0.247974	AVX instructions. The automatic vectorization
-0.197032	in situations where automatic vectorization
-0.197032	supports vector intrinsics, automatic vectorization
-0.130091	for float expressions Automatic vectorization
-0.130091	Automatic CPU dispatch Automatic vectorization
-0.130091	// Example 12.1a. Automatic vectorization
-0.060193	distant future. 12.3 Automatic vectorization
-0.060193	.......................................................... 107 12.3 Automatic vectorization
-0.551669	set. If you are including
-0.237759	a different compiler by including
-0.292230	for common mathematical calculations including
-1.493732	Intel, AMD and VIA including
-0.288634	compiler for 32-bit Windows, including
-0.478709	rest of the code, including
-0.021206	to handle the strings including
-0.508288	optimization options turned on, including
-0.222347	available for many platforms, including
-0.222347	has full metaprogramming features, including
-0.218546	depends only on n, including
-0.165128	for all static data, including
-0.165128	on the same computer, including
-0.165128	the whole software package, including
-0.539668	which is useful for checking
-0.037082	Use bitwise operators for checking
-0.237776	check for overflow by checking
-0.583834	way. There is no checking
-0.346443	imple- mentations have no checking
-0.236590	outside the loop without checking
-0.233024	It has some syntax checking
-0.177527	The method of bounds checking
-0.137013	of arrays with bounds checking
-0.137013	7.15a. Array with bounds checking
-0.035781	13) { // Bounds checking
-0.017522	are constant. 14.2 Bounds checking
-0.017522	................................................................................................. 132 14.2 Bounds checking
-0.035781	minimizing memory fragmentation. Bounds checking
-0.356344	level-2 cache because the out-of-order
-0.341543	next calculation. However, the out-of-order
-0.331601	this chapter. Using the out-of-order
-0.237686	condition. In general, the out-of-order
-0.773355	to take advantage of out-of-order
-0.328246	take maximum advantage of out-of-order
-0.325397	data automatically thanks to out-of-order
-0.048112	chain. A microprocessor with out-of-order
-0.048112	counter. A microprocessor with out-of-order
-0.292243	104 } Microprocessors with out-of-order
-0.805592	Smaller microcontrollers have no out-of-order
-0.351746	Modern microprocessors can do out-of-order
-0.589817	the CPU from doing out-of-order
-0.232334	dependency chain which prevents out-of-order
-0.294219	not be portable to platforms
-0.312093	to port to different platforms
-0.350725	is different for different platforms
-0.036711	may apply to other platforms
-0.233605	if implemented on other platforms
-0.314172	specific recommendation about which platforms
-0.347691	inline assembly on all platforms
-0.237175	useful for supporting multiple platforms
-0.514692	in the most common platforms
-0.312354	good choice for Linux platforms
-0.680796	Windows, Linux and Mac platforms
-0.233792	big-endian storage. All x86 platforms
-0.165143	Supports x86 and ARM platforms
-1.608886	SSE2 instruction set is particularly
-0.438452	areas where speed is particularly
-0.718115	Dynamic memory allocation is particularly
-0.237415	its binary representation is particularly
-0.658799	operating system can be particularly
-0.551434	few functions that are particularly
-0.344272	the considerations that are particularly
-0.482607	sets. Vector operations are particularly
-0.236039	classes. Text strings are particularly
-0.236039	and device drivers are particularly
-0.331552	bytes. Some CPUs have particularly
-0.237584	for details (www.agner.org/optimize/testp.zip). A particularly
-0.313713	specific bottleneck or any particularly
-0.235872	particular code implementation works particularly
-0.356716	the lrint function is given
-0.345024	The child class is given
-0.337536	best implementation for a given
-0.337536	hardware platform for a given
-0.358130	memory space can be given
-0.461216	virtual processor may be given
-0.313043	of container classes are given
-0.236490	parameter. Further details are given
-0.236490	multiplications and divisions are given
-0.236490	of my experiment are given
-0.237739	to optimize access, as given
-0.427034	it has not been given
-0.188110	Much of the advice given
-0.129468	then follow the advice given
-0.358482	or glitches in the output
-0.421331	that you compile the output
-0.237887	for file input and output
-0.237867	an input file. The output
-0.237736	that have Booleans as output
-0.347275	console or to an output
-0.357689	Looking at the compiler output
-0.314345	The values are then output
-0.042596	annotation in the assembly output
-0.042596	look at the assembly output
-0.042596	option makes the assembly output
-0.042596	shows what the assembly output
-0.254935	assembly output. The assembly output
-0.273639	doesn't have an assembly output
-1.335058	a multiple of the level-1
-0.712074	to be in the level-1
-0.455553	a table in the level-1
-0.517941	2. Contentions in the level-1
-0.352563	is mirrored in the level-1
-0.357165	square blocking for the level-1
-0.546223	are bigger than the level-1
-0.353518	4 computer where the level-1
-0.331166	line in both the level-1
-0.537403	takes to reload the level-1
-0.576070	Typically, there is a level-1
-0.314692	cache contentions than for level-1
-0.357758	cases even the same level-1
-0.337027	where almost the entire level-1
-0.562643	consumes most of the resources.
-0.460277	and a waste of resources.
-0.287503	a big waste of resources.
-0.549240	competing for the same resources.
-0.335796	of memory or other resources.
-0.293001	data cache are critical resources.
-0.535557	a lot of extra resources.
-0.235961	all cleanup of allocated resources.
-0.235590	framework that uses few resources.
-0.232674	user input or network resources.
-0.232292	small devices with limited resources.
-0.222336	a lot of computing resources.
-0.218531	disk space were scarce resources.
-0.165133	unless you have ample resources.
-0.549938	permissible when it is outside
-0.325168	15. If i is outside
-0.407587	memory to stack memory outside
-0.237425	inside a function but outside
-0.237032	in a temporary variable outside
-0.236467	do the extra operations outside
-0.236094	add the last element outside
-0.519056	to check for overflow outside
-0.448234	has to be done outside
-0.345749	// Initialize loop counter outside
-0.415398	Variables that are declared outside
-0.233281	pointer arithmetic calculations go outside
-0.230791	variables (i.e. variables defined outside
-0.356016	that it can move outside
-0.358667	the requirements of the task
-0.346997	This happens when a task
-0.740292	the situation where a task
-0.331543	threads. Don't put a task
-0.508985	thread. The cost of task
-0.237874	due to interrupts and task
-0.237725	reproducible. Such events as task
-0.237721	moving the mouse. This task
-0.344828	than doubled for this task
-0.335681	slices allocated to each task
-0.351358	there is a single task
-0.375037	platform for a given task
-0.231351	the spell checking. Any task
-0.165138	than on the essential task
-0.355616	the unsafe code is limited
-0.293465	of the CPU is limited
-0.441222	case, the performance is limited
-0.681927	the clock frequency is limited
-0.237248	of possible inputs is limited
-0.503755	cache size is a limited
-0.356452	Non-public distribution to a limited
-0.341257	function has only a limited
-0.324732	by keys within a limited
-1.324880	is likely to be limited
-0.294092	cannot be overloaded or limited
-0.237764	on small devices with limited
-0.236124	constants. Register storage A limited
-0.380348	also very expensive. A limited
-0.325383	nontemporal writes automatically in vectorized
-0.237864	more error prone. The vectorized
-0.347787	12.3. Intrinsic functions for vectorized
-0.998094	may be used for vectorized
-0.579698	way that can be vectorized
-0.561163	variables can also be vectorized
-0.331003	complicated cases cannot be vectorized
-0.331003	Table lookup cannot be vectorized
-0.236106	code can now be vectorized
-0.356907	is profitable to use vectorized
-0.236896	this prevents a faster vectorized
-0.313899	Example 12.4c. Same example, vectorized
-0.265164	the code is indeed vectorized
-0.165153	Example 12.9b. Taylor series, vectorized
-0.357939	compile-time polymorphism. It is sometimes
-0.237754	integer to zero is sometimes
-0.293896	code more efficient, and sometimes
-0.237626	click becomes inconsistent and sometimes
-0.338785	time. Newer processors are sometimes
-0.237393	used as macros are sometimes
-0.235637	level of optimization can sometimes
-0.291632	to float conversions can sometimes
-0.235637	optimization explicitly. Divisions can sometimes
-0.235637	/ c) 139 can sometimes
-0.235637	conversion and shuffling can sometimes
-0.355578	another problem. The compiler sometimes
-0.377642	using such a framework sometimes
-0.226780	are often unreliable. They sometimes
-0.357327	is called and the local
-0.460101	local, and use the local
-0.580979	forget to make the local
-0.348802	is copied to a local
-0.956963	when applied to a local
-0.325339	store intermediate data and local
-0.313795	the local name for local
-0.237120	that all destructors for local
-0.237120	and PLT lookups for local
-0.237485	64-bit mode. Make functions local
-0.625369	is contiguous with other local
-0.335663	static keyword to all local
-0.232694	all static data, including local
-0.265164	called from), function parameters, local
-0.540409	inefficient because of the costs
-0.357127	more focus on the costs
-0.346998	too much about the costs
-0.237517	from memory plus the costs
-0.293772	methods for avoiding the costs
-0.237517	be weighed against the costs
-0.347485	are four kinds of costs
-0.237148	of an exception. The costs
-0.102584	....................................................................................................................... 3 1.1 The costs
-0.102584	relevant information. 1.1 The costs
-0.237157	of the STL also costs
-0.236984	there are inherent performance costs
-0.228619	set of CPUs. These costs
-0.228619	function to another. These costs
-0.339388	the next instance of S1
-0.421182	Make all instances of S1
-0.237660	}; void Func() { S1
-0.337551	// 2 unused bytes S1
-0.235764	byte at 19 }; S1
-1.023222	int size = 100; S1
-0.125788	Example 12.2 __declspec(align(16)) struct S1
-0.125788	// Example 14.9 struct S1
-0.125788	// Example 8.15a struct S1
-0.125788	// Example 8.15b struct S1
-0.125788	// Example 7.35b struct S1
-0.125788	// Example 7.35a struct S1
-0.013567	{double a; double b;}; S1
-0.237913	example, a library of math
-0.320731	different kinds of vector math
-0.248064	AMD and Intel vector math
-0.248064	math libraries: Intel vector math
-0.314387	of some long vector math
-0.230708	list of short vector math
-0.230708	libraries: Intel short vector math
-0.455562	may use the Intel math
-0.514710	calculate the most common math
-0.236025	page 119). The AMD math
-0.380052	Libraries for high precision math
-0.534786	of the best optimized math
-0.328546	the options for fast math
-0.231378	unroll factor. A little math
-0.459053	a new value of temp
-0.314741	new physical register to temp
-0.349039	= a+1; b = temp
-0.349263	* temp; c = temp
-0.236594	+ b[i]; c[i] = temp
-1.162173	< size; i++) { temp
-0.489674	Most compilers will make temp
-0.330478	the value of register temp
-0.232299	variables, but will save temp
-0.383263	{ a[i] = temp; temp
-0.153161	i, a[100], b, temp; temp
-0.153161	a, b, c, temp; temp
-0.153161	int i, a[100], temp; temp
-0.165153	for (temp = &list[0]; temp
-0.725919	non-inlined copy of the inlined
-0.128407	one call to the inlined
-0.501260	compilers. The code is inlined
-0.382844	addresses. The names of inlined
-0.460553	member function to be inlined
-0.460735	simple constructor may be inlined
-0.352578	function which cannot be inlined
-0.314654	vectors. The operators are inlined
-0.351562	unused copy of an inlined
-0.346281	reduced 15.1a to an inlined
-0.345211	Small functions are often inlined
-0.340053	the function is always inlined
-0.441716	A function is usually inlined
-0.556453	wrong, but it is still
-0.293845	subtraction, multiplication, etc. is still
-0.237581	packages and who is still
-0.312675	not. The loop can still
-0.405584	low instruction set can still
-0.292252	But this solution can still
-0.236182	method described above can still
-0.345146	bit mode, where it still
-1.512257	part of the code still
-0.528989	cycles then it will still
-0.293247	64-bit code. However, we still
-0.235555	0x2700 to 0x273F would still
-0.377627	the high level framework still
-0.226749	parallel vector processing capabilities still
-0.934889	an object of the class.
-0.536582	each instance of the class.
-0.459755	all instances of the class.
-0.352779	shared variable inside the class.
-0.486662	as a structure or class.
-0.299455	the same structure or class.
-0.851982	member of the same class.
-0.312150	a pointer to another class.
-0.442003	memory into a container class.
-0.394316	both parent and child class.
-0.289296	pointer to its child class.
-0.580506	object of the derived class.
-0.229238	in the // parent class.
-0.466135	member of the object's class.
-0.358537	the information in the database
-0.354430	software applications use a database
-0.615262	possible to replace a database
-0.237777	if the network or database
-0.324950	storing user data. A database
-0.330479	to access the system database
-0.291350	to gain by optimizing database
-0.231376	than loops, etc. Optimizing database
-0.226742	etc. Locked mutexes. Open database
-0.079338	access................................................................................................................ 20 3.8 System database
-0.079338	to finish. 3.8 System database
-0.222356	facilities, easy GUI development, database
-0.199998	allocated memory, windows, mutexes, database
-0.165138	in the big registration database
-0.357983	single branch if the constants
-0.237856	Boolean expressions. Whether the constants
-0.829994	The maximum number of constants
-0.525492	from a table of constants
-0.339396	three parts: one for constants
-0.237476	or subexpression containing only constants
-0.237453	than multiplying by other constants
-0.530355	constants and floating point constants
-0.351757	that all floating point constants
-0.344528	recognize that the two constants
-0.656006	that chooses between two constants
-0.235122	used for constants. Integer constants
-0.230789	be stored. All identical constants
-0.165148	on the stack. String constants
-0.350832	= lookup[b]; If a bool
-0.352646	(or int) instead of bool
-0.236142	size, bytes alignment, bytes bool
-0.043068	int SomeFunction (int a, bool
-0.321167	Example 7.29a float a; bool
-0.314763	8.8b double x, y; bool
-0.642147	in the following way: bool
-0.212297	example: // Example 7.11 bool
-0.330786	double x, y, z; bool
-0.165143	1: // Example 7.10a bool
-0.165143	example: // Example 7.9a bool
-0.476552	bit at a time. Do
-0.603476	as a member function. Do
-0.746958	stack registers are used. Do
-1.013865	data caching less efficient. Do
-0.349035	the current instruction set. Do
-0.749943	avoid dynamic memory allocation. Do
-0.317391	one contiguous memory block. Do
-0.226731	it prevents certain optimizations. Do
-0.642100	as a linked list. Do
-0.212285	is a scarce resource. Do
-0.199987	= row + column; Do
-0.165128	Professional and Enterprise editions). Do
-0.165128	by a unique key. Do
-0.165128	or a hash map. Do
-0.408065	improved by inlining the frame
-0.237860	or by turning the frame
-0.352753	is simpler than a frame
-0.444072	functions is called a frame
-0.279272	contains no calls to frame
-0.279272	program contains calls to frame
-0.346410	between leaf functions and frame
-0.568778	are more efficient than frame
-0.334417	and frame functions. A frame
-0.236121	of an exception. A frame
-0.270503	not use a stack frame
-0.095089	Omitting the standard stack frame
-0.095089	pointer". The standard stack frame
-0.217009	handling /EHs- No stack frame
-0.236978	if (i % 2 ==
-0.235750	&& SIZE % 128 ==
-0.311775	0; } if (a ==
-0.057012	} #endif // INSTRSET ==
-0.027571	} } #if INSTRSET ==
-0.027571	instruction set #if INSTRSET ==
-0.027571	FUNCNAME SelectAddMul_SSE41 #elif INSTRSET ==
-0.027571	FUNCNAME SelectAddMul_SSE2 #elif INSTRSET ==
-0.148734	b; a = (b ==
-0.375770	0) { if (b ==
-0.057012	== Wednesday || Day ==
-0.057012	== Tuesday || Day ==
-0.466145	Weekdays Day; if (Day ==
-0.165159	other exceptions: __except (GetExceptionCode() ==
-0.323858	a; double b; int d;
-0.236672	byte at 7 int d;
-0.315144	14.23b union { double d;
-0.006654	unsigned int u; double d;
-0.054492	b + c + d;
-0.174965	vector a, b, c, d;
-0.565912	float a, b, c, d;
-0.078730	bool a, b, c, d;
-0.109379	b = 0, c, d;
-0.212320	100 doubles: union {double d;
-0.350351	pointer. It has the special
-0.357807	are saved in a special
-0.654155	be done with a special
-0.352991	processors that have a special
-0.637263	use a set of special
-0.293901	may be optimal in special
-0.314542	and underflow except in special
-0.331599	below. Many libraries for special
-0.314234	data. The need for special
-0.458837	function. But there are special
-0.453781	mode unless you have special
-0.656769	the effort to make special
-0.634870	no need to take special
-0.224909	standard function libraries. Several special
-0.591655	principles in order to prevent
-0.823906	the best way to prevent
-0.330992	A good way to prevent
-1.123217	and you want to prevent
-0.293125	need extra overhead to prevent
-0.236949	returns // Volatile to prevent
-0.524582	in the CPU and prevent
-0.382438	to the user and prevent
-0.650665	several factors that can prevent
-0.583813	chains in the code prevent
-0.346144	class definition. This will prevent
-0.236152	mean atomic. It doesn't prevent
-0.233791	because the debugging options prevent
-0.212292	computer is rebooted. To prevent
-0.944959	be replaced by a shift
-0.653769	be done with a shift
-0.355925	be done as a shift
-0.548189	2 by using a shift
-0.313770	comparison, bit operations and shift
-0.237100	bits of a[i] and shift
-0.237100	constant = multiply and shift
-0.406940	combination of additions and shift
-0.349662	bit set). We can shift
-0.502365	Multiply by constant = shift
-0.344848	The reasons for this shift
-0.237585	in example 14.28 will shift
-0.284568	i/2 in ebx ; shift
-0.229421	i + sign(i) ; shift
-0.351196	copy constructor and the destructor
-0.351196	copying process, and the destructor
-0.462095	to fail if the destructor
-0.370442	handler to call the destructor
-0.370442	supposed to call the destructor
-0.798105	all information about the destructor
-0.353766	of course be a destructor
-0.653379	a class with a destructor
-0.428554	class must have a destructor
-0.331126	the thread have a destructor
-0.353772	Do not make a destructor
-0.237592	the object owns. A destructor
-0.324716	copy constructor and no destructor
-0.292141	not necessary. A virtual destructor
-0.459417	software optimization is to save
-0.342692	All functions have to save
-0.689809	it doesn't have to save
-0.591802	low in order to save
-0.237144	with other calculations to save
-0.354024	arrays if it can save
-0.354325	of (a+b). This can save
-0.351617	not overlap. You can save
-0.330218	cycles that we may save
-0.353585	(rarely 64). You may save
-0.237585	register variables, but will save
-0.293635	possible. Typically it should save
-0.709873	; unused label ; save
-0.286806	restoring registers, and possibly save
-0.237887	in a register and prevents
-0.349183	the function, and it prevents
-0.650335	not optimal because it prevents
-0.236496	readable but unfortunately it prevents
-0.326983	temp in memory. This prevents
-0.310029	by another thread. This prevents
-0.233962	the preceding one. This prevents
-0.233962	be declared volatile. This prevents
-0.233962	it is compiling. This prevents
-0.331624	lookup tables if this prevents
-0.237486	the nontemporal write instruction prevents
-0.237445	critical dependency chain which prevents
-0.293354	see this. It also prevents
-0.310861	index. The integer division prevents
-1.428493	the address of the preceding
-0.144136	the result of the preceding
-0.649141	a constant to the preceding
-0.898258	is equal to the preceding
-0.567183	sum depends on the preceding
-0.347546	of two. In the preceding
-0.334002	new addition before the preceding
-0.334002	one iteration before the preceding
-0.344286	control branch. See the preceding
-0.237879	of stack unwinding The preceding
-0.237771	it is correlated with preceding
-0.356311	to always use the safe
-0.358091	reason why it is safe
-0.357366	or CString. This is safe
-0.357938	the calculations in a safe
-0.530640	union is not a safe
-0.429422	destructors are called. The safe
-0.760709	It may not be safe
-0.468313	exit may not be safe
-0.348917	and efficient, but not safe
-0.560457	code. It is more safe
-0.292287	and is therefore more safe
-0.490768	class. This is only safe
-0.236054	are generally not thread safe
-0.235977	The program is exception safe
-0.237623	a, b, c and d
-0.293892	to example 15.1b and d
-0.349753	* 3.5; c = d
-0.350174	+ c; y = d
-0.504821	(b == 0) { d
-0.237296	// Example 14.20 double d
-0.023259	b*x*x + c*x + d
-0.517496	= a & b; d
-0.227089	= a && b; d
-0.010361	int u; double d; d
-0.165159	} else { DTRUE: d
-0.122660	language ............................................................................... 8 2.5 Choice
-0.122660	as C++ compilers. 2.5 Choice
-0.074777	microprocessor ........................................................................................... 6 2.3 Choice
-0.074777	of this manual. 2.3 Choice
-0.074777	the data cache. 2.2 Choice
-0.074777	platform ....................................................................................... 5 2.2 Choice
-0.074777	the optimal platform 2.1 Choice
-0.074777	platform ........................................................................................... 5 2.1 Choice
-0.074777	function libraries........................................................................................ 12 2.7 Choice
-0.074777	names are undocumented. 2.7 Choice
-0.074777	compiler .................................................................................................... 10 2.6 Choice
-0.074777	with another compiler. 2.6 Choice
-0.074777	whole program optimization. 2.4 Choice
-0.074777	operating system......................................................................................... 6 2.4 Choice
-0.772748	in the code to tell
-0.651730	is also possible to tell
-0.062026	is no way to tell
-0.292038	#pragma vector always to tell
-0.235994	a variable declaration to tell
-0.235994	the __assume_aligned directive to tell
-0.235994	a function prototype to tell
-0.235994	because we forgot to tell
-0.235994	statements like throw(A,B,C) to tell
-0.235994	or #pragma novector to tell
-0.353972	a profiler that can tell
-0.349098	sign bit. We can tell
-0.237589	pointers or references then tell
-0.357522	chain, especially on the Pentium
-0.305176	64 matrix on a Pentium
-0.305176	were measured on a Pentium
-0.305176	example 9.5a on a Pentium
-0.305176	predicted perfectly on a Pentium
-0.331373	with old CPUs. The Pentium
-0.314250	on another computer. The Pentium
-0.382613	11 clock cycles on Pentium
-0.382373	about branch prediction. A Pentium
-0.335503	to fake an Intel Pentium
-0.236220	simple regular pattern, while Pentium
-0.262607	changed to the old Pentium
-0.262607	15 on the old Pentium
-0.165153	error in the oldest Pentium
-0.237922	The theoretical background is further
-0.354627	page 90 for a further
-0.624349	opens the possibility for further
-0.235642	can be expected for further
-0.235642	on C++ Performance for further
-0.235642	See page 150 for further
-0.235642	10 page 101 for further
-0.235642	See page 153 for further
-0.235642	See page 140 for further
-0.562137	of code can be further
-0.331796	and similar methods are further
-0.462174	often optimize the code further
-0.237580	the AVX instructions. A further
-0.654819	to unroll the loop further
-0.237734	user. Making exception-safe code Assume
-0.569006	i++; } } } Assume
-0.407436	static static static static Assume
-0.620780	aligned #pragma vector aligned Assume
-0.306923	for another memory access. Assume
-0.398513	throw() throw() throw() throw() Assume
-0.226731	in terms of speed. Assume
-0.466083	__declspec( align(16)) __attribute(( aligned(16))) Assume
-0.251297	__attribute(( const)) __attribute(( const)) Assume
-0.330763	ivdep __restrict #pragma ivdep Assume
-0.199987	/GR– -fno-rtti /GR- -fno-rtti Assume
-0.165128	diagonal are accessed column-wise. Assume
-0.165128	motion. See page 78. Assume
-0.165128	of code in general. Assume
-0.357441	has influence on the efficiency
-0.490202	minimal difference between the efficiency
-0.323986	resources, databases, etc. The efficiency
-0.102448	process...................................................................................................... 25 7 The efficiency
-0.102448	control tool. 7 The efficiency
-0.236777	microprocessors. 7.13 Loops The efficiency
-0.632129	The reason for this efficiency
-0.237605	of the question when efficiency
-0.331399	low priority of program efficiency
-0.293718	is concentrated on CPU efficiency
-0.237385	speed, memory economy, cache efficiency
-0.603798	then you may improve efficiency
-0.233023	at explaining the relative efficiency
-0.272244	is implemented. The highest efficiency
-0.955827	The fact that the repeat
-0.457418	no problem if the repeat
-0.354033	predicted well if the repeat
-0.356897	this problem when the repeat
-0.356336	of code). If the repeat
-0.325375	it decides whether to repeat
-0.236221	; i++ ;checkifi<100 ; repeat
-0.338215	loop with a high repeat
-0.289651	is near the maximum repeat
-0.208490	determine the worst-case maximum repeat
-0.229219	with a very low repeat
-0.229207	advantageous if the typical repeat
-0.283118	a small and fixed repeat
-0.218568	loss of precision. Let's repeat
-0.613467	is divisible by the unroll
-0.365503	be divisible by the unroll
-0.514251	not divisible by the unroll
-0.458759	does not have to unroll
-0.645792	is not necessary to unroll
-0.420514	be an advantage to unroll
-1.081628	is no reason to unroll
-0.293347	may be worthwhile to unroll
-0.348266	is odd and you unroll
-0.515689	unrolling Some compilers will unroll
-0.346101	vector or the loop unroll
-0.346101	turn off the loop unroll
-0.293570	2; Unfortunately, some compilers unroll
-0.233786	cache. Compilers will usually unroll
-0.064127	the functions that it calls.
-0.340322	lazy binding of function calls.
-0.307136	many branches or function calls.
-0.425007	economize the library function calls.
-0.231533	separately through multiple function calls.
-0.398734	applications with many function calls.
-0.286966	to data through function calls.
-0.232814	code containing pure function calls.
-0.232814	it involves pure function calls.
-0.100516	to optimize across function calls.
-0.100516	making optimizations across function calls.
-0.323672	hardware interfaces and system calls.
-0.289789	on system-specific graphical interface calls.
-0.061955	fits best into the algorithm
-0.520518	algorithm. The choice of algorithm
-0.237694	programming language defines an algorithm
-0.237038	possible to express any algorithm
-0.353717	by optimizing the first algorithm
-0.353308	multiplications only. The following algorithm
-0.349177	algorithm if a simple algorithm
-0.349934	by choosing the best algorithm
-0.245318	to choose the optimal algorithm
-0.110742	5 Choosing the optimal algorithm
-0.234886	an advanced and complicated algorithm
-0.200004	to implement a universal algorithm
-0.237940	example, which calculates the sum
-0.462386	simplest case is a sum
-0.459050	because each value of sum
-0.294102	constructor // constructor // sum
-0.237653	<= 16; n++) { sum
-0.237576	s3 += a[i+3]; } sum
-0.234148	xn = x; float sum
-0.377595	list float a[100]; float sum
-1.426047	i < 100; i++) sum
-1.437251	i < size; i++) sum
-0.217479	for (i=0; i<100; i++) sum
-0.345079	int a[100]; int i, sum
-0.466114	= 100; float list[size], sum
-0.200004	0.f, 1.f); // initialize sum
-0.023519	large to handle the strings
-0.294084	of different types or strings
-0.101924	is to store all strings
-0.101924	you may store all strings
-0.348219	for how to store strings
-0.426474	fastest way to handle strings
-0.367343	C-style method of storing strings
-0.048393	at compile time. Text strings
-0.048393	efficient container classes. Text strings
-0.048393	needs. 9.8 Strings Text strings
-0.122656	The storage of text strings
-0.122656	these and handle text strings
-0.234156	only on some processors. On
-0.321386	for several different CPUs. On
-0.432046	automatically by the compiler. On
-0.232281	that uses few resources. On
-0.356016	very big data structures. On
-0.212289	for assembly language output. On
-0.251304	detailed optimization more difficult. On
-0.199993	advantage to using hyperthreading. On
-0.199993	to a branch tree. On
-0.165133	a special loop predictor. On
-0.165133	a floating point comparison. On
-0.165133	certain modification is profitable. On
-0.165133	structure in example 9.1b. On
-0.358507	the representation of the exponent
-0.140505	power function when the exponent
-0.140505	pow function when the exponent
-0.355196	subtracting n from the exponent
-0.294231	// add n to exponent
-0.237512	for negative numbers. The exponent
-0.237512	the binary digits. The exponent
-0.292791	exponent : 8; // exponent
-0.236655	exponent : 15; // exponent
-0.236655	exponent : 11; // exponent
-0.059456	fractional part unsigned int exponent
-0.311387	and normal unsigned int exponent
-0.237917	with most distributions of Linux,
-0.635204	in both Windows and Linux,
-0.536811	code Shared objects in Linux,
-0.237634	(In Windows, SetThreadAffinityMask, in Linux,
-0.554568	including 32-bit and 64-bit Linux,
-0.329021	64-bit Windows. In 64-bit Linux,
-0.230956	in registers, whereas 64-bit Linux,
-0.293123	|| defined(__GNUC__) // 32-bit Linux,
-0.177926	processor and a Windows, Linux,
-0.177926	common platforms with Windows, Linux,
-0.177926	Supported operating systems Windows, Linux,
-0.177926	Windows, Linux, Mac Windows, Linux,
-0.165164	All x86 platforms (Windows, Linux,
-0.358231	the sake of the possibility
-0.356807	all platforms and the possibility
-0.318395	cannot rule out the possibility
-0.217956	completely rule out the possibility
-0.346698	hasn't thought about the possibility
-0.102626	if it opens the possibility
-0.102626	instruction set opens the possibility
-0.381935	Other compilers offer the possibility
-0.237262	inlining can open the possibility
-0.206142	the innermost loop. Another possibility
-0.206142	uses a GOT. Another possibility
-0.200043	rule out the theoretical possibility
-0.200043	of a very obscure possibility
-0.345248	contiguous memory. See the discussion
-0.337534	page 16 for a discussion
-0.337534	page 87 for a discussion
-0.293745	95 and 120 for discussion
-0.237494	See page 93 for discussion
-0.293907	be clear from this discussion
-0.331506	page 31 for more discussion
-0.237580	more error prone. A discussion
-0.553344	manual. There are various discussion
-0.169980	90 for a further discussion
-0.162225	page 150 for further discussion
-0.162225	page 101 for further discussion
-0.162225	page 153 for further discussion
-0.237862	conditions are satisfied. The conditions
-0.231079	useful for testing multiple conditions
-0.100348	Example 14.7b. Testing multiple conditions
-0.100348	Example 14.7a. Testing multiple conditions
-0.492342	If any of these conditions
-0.460659	all of the following conditions
-0.326785	best if the following conditions
-0.236326	and recovering from error conditions
-0.291350	in parallel if certain conditions
-0.234915	sufficient, and the caching conditions
-0.310658	this example has three conditions
-0.218564	available from www.agner.org/optimize. Copyright conditions
-0.272244	be tested under worst-case conditions
-0.354624	runs satisfactorily on a non-Intel
-0.237764	platforms. Works well with non-Intel
-0.157701	CPUs. The performance on non-Intel
-0.130963	has reduced performance on non-Intel
-0.334047	make this work on non-Intel
-0.231913	6! The speed on non-Intel
-0.279320	doesn't work well on non-Intel
-0.279320	it works well on non-Intel
-0.461548	set when running on non-Intel
-0.237588	with Intel processors. A non-Intel
-0.200032	Intel CPU dispatcher treats non-Intel
-0.200032	of these also treat non-Intel
-0.575642	return a pointer to it.
-1.296227	pointer or reference to it.
-0.237742	but don't count on it.
-0.356903	it unwise to use it.
-0.331295	not sure you need it.
-0.535616	to do something about it.
-0.427373	the program that calls it.
-0.500383	if you can avoid it.
-0.660165	of processors that support it.
-0.233281	project together and tested it.
-0.332365	program than to execute it.
-0.222342	if you don't understand it.
-0.165138	program itself and recompile it.
-0.139839	delay in the CPU (See
-1.687993	a power of 2 (See
-0.309033	before dividing by 2 (See
-0.292341	not on AMD CPUs (See
-0.337899	than in 64-bit Windows (See
-0.234354	of the specified types (See
-0.587796	versions for different CPUs. (See
-0.289245	enable optimizations across modules (See
-0.310595	rolled out by 2. (See
-0.288246	static or global variables. (See
-0.416059	addresses to be mispredicted (See
-0.330786	Library, available from www.intel.com. (See
-0.341593	a different kind of registers.
-0.237726	is the scarcity of registers.
-1.017954	to be transferred in registers.
-0.538318	can be returned in registers.
-0.347105	nicely into the vector registers.
-0.234058	advantage of bigger vector registers.
-0.234058	set of special vector registers.
-0.339092	representations in two different registers.
-0.237382	be stored in integer registers.
-0.237276	can be copied into registers.
-0.234012	register stack versus XMM registers.
-0.318738	variables in the YMM registers.
-0.241260	XMM and 256-bit YMM registers.
-0.358506	case situation of the maximum
-0.325278	all be below the maximum
-0.237689	count is near the maximum
-0.237689	As explained above, the maximum
-0.237922	64-bit Windows allows a maximum
-0.405944	special vector registers. The maximum
-0.236426	always the same. The maximum
-0.236426	(see page 27). The maximum
-0.236426	of the weekdays. The maximum
-0.236426	64-bit systems. 67 The maximum
-0.237096	size, bits minimum value maximum
-0.346259	can do to take maximum
-0.218572	to determine the worst-case maximum
-0.547693	16-bit, 32-bit and 64-bit mode.
-0.308958	by default in 64-bit mode.
-0.308958	always enabled in 64-bit mode.
-0.308958	not recognized in 64-bit mode.
-0.299173	in 32-bit or 64-bit mode.
-0.265004	calls faster in 32-bit mode.
-0.488792	the stack in 32-bit mode.
-0.265004	self-relative references in 32-bit mode.
-0.198599	resource, especially in 32-bit mode.
-0.198599	relocation, especially in 32-bit mode.
-0.528216	references in 64 bit mode.
-0.476145	than in 32 bit mode.
-0.165179	or goes into sleep mode.
-0.670531	the number of elements per
-0.394954	107 number of elements per
-0.236348	} The execution times per
-0.235413	data have three values per
-0.254105	unit is clock cycles per
-0.334137	are core clock cycles per
-0.254105	took 50 clock cycles per
-0.050970	size matrices, clock cycles per
-0.165183	follows: Matrix size Time per
-0.165183	size Total kilobytes Time per
-0.165183	element Example 9.6a Time per
-0.237491	am using this for testing
-0.352175	is also useful for testing
-0.354455	the code you are testing
-0.293219	number is zero by testing
-0.313688	insight you gain by testing
-0.323825	is very useful when testing
-0.312699	possibly be relevant when testing
-0.314255	the code faster because testing
-0.313503	possible. This also makes testing
-0.222370	in terms of development, testing
-0.017521	...................................................................................... 156 16.3 Worst-case testing
-0.017521	large. 156 16.3 Worst-case testing
-0.165153	conditions are optimal. Best-case testing
-0.577440	pointers so that the alignment
-0.351108	by 64, but the alignment
-0.314567	as well specify the alignment
-0.349761	function parameters because of alignment
-0.314676	this alignment automatically. The alignment
-0.237752	Example 12.1b. Vectorization with alignment
-0.325240	very few restrictions on alignment
-0.237723	of data members. This alignment
-0.351328	takes care of this alignment
-0.382410	with vector operations when alignment
-0.237276	insufficient information about pointer alignment
-0.234176	of intrinsic vectors requires alignment
-0.212275	__declspec(align(16)) or __attribute__((aligned(16))). Specifies alignment
-0.346214	the data to the right
-0.513292	it point to the right
-0.810895	function pointer to the right
-0.346214	one place to the right
-0.328784	right data into the right
-0.425626	getting them into the right
-0.381697	microprocessor has made the right
-0.522865	difficult to find the right
-0.228741	used for finding the right
-0.228741	required for finding the right
-0.293288	required for putting the right
-0.293542	the final array size right
-0.287883	+ sign(i) ; shift right
-0.831970	is added to the offset
-0.357646	more compact if the offset
-0.237431	needs to code the offset
-0.355389	or more then the offset
-0.356065	than 128 because the offset
-0.356174	signed number. If the offset
-0.428753	pointer simply stores the offset
-0.294175	{return b;} }; The offset
-0.341568	be accessed with an offset
-0.324712	signed number, or no offset
-0.215260	variable in the global offset
-0.268524	its variables called global offset
-0.287379	members with a total offset
-0.588159	set is that the compatibility
-0.678470	for the sake of compatibility
-0.237158	most frequent causes of compatibility
-0.628068	by the requirements of compatibility
-0.573948	are frequent sources of compatibility
-0.343329	take installation time and compatibility
-0.339139	of resource problems and compatibility
-0.382418	newer instruction set when compatibility
-0.276577	the sake of backwards compatibility
-0.251335	to find and resolve compatibility
-0.200021	portability. Unfortunately, the cross-platform compatibility
-0.165159	of information about bugs, compatibility
-0.356624	calculated twice because the macro
-0.356613	is similar to a macro
-0.331444	is expanded like a macro
-0.293820	matrix // define a macro
-0.637307	as a result of macro
-0.382770	inlined. But beware that macro
-0.292174	instead of functions A macro
-0.236113	limited in scope. A macro
-0.236140	// Example 7.34a. Use macro
-0.370556	a[SIZE][SIZE]) { // Define macro
-0.284177	Aligned arrays // Define macro
-0.212286	// Example 7.34b. Replace macro
-0.200015	instruction set. The preprocessing macro
-0.169929	int a; // 2 bytes.
-0.169929	int d; // 2 bytes.
-0.266112	bytes. first // 4 bytes.
-0.118838	int b; // 4 bytes.
-0.118838	int d; // 4 bytes.
-0.100810	unused bytes // 8 bytes.
-0.100810	double b; // 8 bytes.
-0.310120	line size of 64 bytes.
-0.374724	which is typically 64 bytes.
-0.292624	a block of 16 bytes.
-0.379827	in the first 128 bytes.
-0.228142	of abc is 12 bytes.
-0.200021	int a[100]; // 400 bytes.
-0.247315	a reference to the object.
-0.355730	or reference to the object.
-0.357603	a constructor for the object.
-0.237688	forget to delete the object.
-0.565704	pointers to the same object.
-0.352781	small block for each object.
-0.098872	called from the shared object.
-0.098872	accessed from the shared object.
-0.321315	are making a shared object.
-0.204924	within the same shared object.
-0.328488	stored in a global object.
-0.337015	to copy the entire object.
-0.251342	expression or an anonymous object.
-0.331413	// Make array of 100
-0.293792	calculates the sum of 100
-0.237535	100; // Array of 100
-0.545899	and that there are 100
-0.294061	should multiply it by 100
-0.236963	than comparing i with 100
-0.236963	It compares eax with 100
-0.237725	conversion takes 50 - 100
-0.237098	will take 1000 * 100
-0.645674	and give the result 100
-0.521676	clock cycles per element. 100
-0.234173	PTR[ecx+eax*4],ebx eax, 1 eax, 100
-0.184738	increment i++. cmp eax, 100
-0.598625	x++) factorial *= x; Note
-0.529096	than 0 and 1. Note
-0.230079	to the desired version. Note
-0.228126	in the Windows system. Note
-0.283075	style as character arrays. Note
-0.322304	Compiler Documentation for details. Note
-0.542412	78 for an explanation. Note
-0.218531	functions, but less optimized. Note
-0.251304	expression is optimized away. Note
-0.165133	in the variable Day. Note
-0.165133	an object file disassembler. Note
-0.165133	need any patch. 131 Note
-0.165133	this result in a[i]. Note
-0.348389	functions faster by making them
-0.536837	statements that you want them
-0.292080	the code and compile them
-0.334967	ever seen can reduce them
-0.233029	on until you turn them
-0.327396	are returned by copying them
-0.286115	the stack and reading them
-0.279428	numbers simply by comparing them
-0.212289	library file and copies them
-0.265138	CPU cores and leave them
-0.251304	be better to join them
-0.165133	global variables or hide them
-0.165133	right format and getting them
-0.102761	we are reading and writing
-0.102761	spent on reading and writing
-0.349008	cases where we are writing
-0.047868	faster than reading or writing
-0.047868	rather than reading or writing
-0.047868	File access Reading or writing
-0.047868	the program. Reading or writing
-0.047868	random access. Reading or writing
-0.047868	= 0x1C. Reading or writing
-0.535613	reading as well as writing
-0.293141	this shift in software writing
-0.283909	two or more threads writing
-0.333618	to avoid multiple threads writing
-0.499671	new version of the library.
-0.518309	appropriate version of the library.
-0.356019	module or a function library.
-0.292485	using a different function library.
-0.380713	into a separate function library.
-0.357546	the entire floating point library.
-0.517956	-mveclibabi=acml. Agner's vector class library.
-0.335353	DLL or a static library.
-0.234656	processing. Yeppp. Open source library.
-0.529713	in the Gnu C library.
-0.265164	page 131. AMD LIBM library.
-0.165153	faster than any non-vector library.
-0.293924	7.40b union Bitfield { struct
-0.022687	int sign :1;//signbit }; struct
-0.327369	be expressed as follows: struct
-0.218537	// Example 12.2 __declspec(align(16)) struct
-0.218555	Example: // Example 14.9 struct
-0.806125	int size = 1024; struct
-0.212269	example: // Example 7.13 struct
-0.199998	Example: // Example 8.15a struct
-0.165138	Example: // Example 7.40a struct
-0.165138	to: // Example 8.15b struct
-0.165138	last: // Example 7.35b struct
-0.165138	example: // Example 7.35a struct
-0.358481	occurred anywhere in the calculations.
-0.349657	the program do the calculations.
-0.357546	less precise floating point calculations.
-0.343165	and all the integer calculations.
-0.291030	advantage of 64-bit integer calculations.
-0.313169	various functions for these calculations.
-0.300023	sorting, searching, and mathematical calculations.
-0.210327	thread can do mathematical calculations.
-0.210327	innermost loop doing mathematical calculations.
-0.289976	of the heavy graphics calculations.
-0.230783	program logic allows parallel calculations.
-0.306426	time than the actual calculations.
-0.200015	the CPU from overlapping calculations.
-0.364760	advantageous to put the operand
-0.430425	other then put the operand
-0.237706	some microprocessors when an operand
-0.101895	branch prediction. If one operand
-0.101895	operand first. If one operand
-0.129128	or if the first operand
-0.129128	Likewise, if the first operand
-0.313721	way. If the first operand
-0.184703	(0); and the second operand
-0.082605	true, then the second operand
-0.082605	false, then the second operand
-0.184703	determines whether the second operand
-0.224939	put the most predictable operand
-0.497306	processor may have a reduced
-0.408147	a common cause of reduced
-0.355677	unused bytes can be reduced
-0.459505	the condition can be reduced
-0.236978	c; } Can be reduced
-0.294047	code may run with reduced
-0.566156	tests, the Intel compiler reduced
-0.817822	and the Gnu compiler reduced
-0.059430	platforms. This library has reduced
-0.128303	-mveclibabi=svml. This library has reduced
-0.748268	none of the compilers reduced
-0.349539	repeat count has been reduced
-0.349886	for approximately two clock cycles.
-0.248730	up to 4 clock cycles.
-0.197705	4 - 8 clock cycles.
-0.248730	can save several clock cycles.
-0.197705	take only 256 clock cycles.
-0.197705	addition every three clock cycles.
-0.301612	is called core clock cycles.
-0.197705	50 - 100 clock cycles.
-0.149154	5 and 20 clock cycles.
-0.309294	10 - 20 clock cycles.
-0.197705	typically takes 40 clock cycles.
-0.327682	14 - 45 clock cycles.
-0.197705	take approximately 500 clock cycles.
-0.825488	the performance of the final
-0.459205	The efficiency of the final
-0.355441	or estimate of the final
-0.352448	multiple times in the final
-0.352448	static arrays in the final
-0.352448	be disabled in the final
-0.352448	is inserted in the final
-0.576388	calculations so that the final
-0.356518	loop counter when the final
-0.355766	been allocated. If the final
-0.406804	program can check the final
-0.237007	preferable to allocate the final
-0.312840	the object on its final
-0.304243	all is for the sake
-0.304243	this or for the sake
-0.395164	inlined function for the sake
-0.395164	of cache for the sake
-0.304243	vector size for the sake
-0.304243	bit version for the sake
-0.304243	reorder instructions for the sake
-0.304243	with 1 for the sake
-0.304243	source files for the sake
-0.304243	hide them for the sake
-0.395164	is chosen for the sake
-0.304243	is included for the sake
-0.304243	is maintained for the sake
-0.294223	rather than sequences of operations.
-0.308089	not suited for vector operations.
-0.287876	be implemented as vector operations.
-0.698200	advantageous to use vector operations.
-0.287876	useful for Boolean vector operations.
-0.461880	faster than floating point operations.
-0.293614	the use of integer operations.
-0.235470	contains only simple standard operations.
-0.318533	of additions and shift operations.
-0.191047	fast as integer arithmetic operations.
-0.191047	resources than doing arithmetic operations.
-0.212286	have many file input/output operations.
-0.165153	also called Single-Instruction-Multiple-Data (SIMD) operations.
-0.441674	to the dispatcher function. When
-0.471901	set in the cache. When
-0.233271	rather than references are: When
-0.584025	than the cache size. When
-0.287336	as integer arithmetic operations. When
-0.317371	time than single precision. When
-0.229201	81). 77 Pointer aliasing When
-0.222336	on every call method. When
-0.560495	misses and branch mispredictions. When
-0.251304	as fast as additions. When
-0.165133	bigger than 2 GB. When
-0.165133	a discussion of profiling. When
-0.165133	be rounded to 100000000. When
-0.237942	advantageous to split the tasks
-0.294162	is very important for tasks
-0.345096	priority. If the different tasks
-0.337054	a switch between different tasks
-0.293698	out independently of other tasks
-0.323503	response times for simple tasks
-0.225729	data structures for standard tasks
-0.280379	libraries for many standard tasks
-0.235389	The speed for certain tasks
-0.231384	a high priority. Other tasks
-0.265165	thread as very time-consuming tasks
-0.196001	useful to put time-consuming tasks
-0.200009	a loop for trivial tasks
-0.292328	out of a function. Avoid
-0.235742	resources than non-virtual functions. Avoid
-0.709699	part of a program. Avoid
-0.233271	it. Possible solutions are: Avoid
-0.284318	problems or performance problems. Avoid
-0.226736	or key press. 19 Avoid
-0.222352	alias, if appropriate. 8. Avoid
-0.392135	before MemberPointer is declared. Avoid
-0.330771	explained on page 93. Avoid
-0.251304	explained on page 26. Avoid
-0.199993	vector element level 9. Avoid
-0.165133	explained on page 22. Avoid
-0.165133	used. See page 140. Avoid
-0.355896	writes only, then the effect
-0.236418	void xplus2() { The effect
-0.339933	prefetch the data. The effect
-0.330028	an infinite loop. The effect
-0.236418	the parentheses manually. The effect
-0.236418	pre-increment or post-increment. The effect
-0.237726	have undesired effects. This effect
-0.293903	The reason why this effect
-0.324313	This has hardly any effect
-0.233299	functions) has no negative effect
-0.305396	This has a significant effect
-0.226757	obtain the desired polymorphism effect
-0.218554	has a very dramatic effect
-0.357066	times lower; and the amount
-0.854510	organized so that the amount
-0.460987	be useful when the amount
-0.293770	but it increases the amount
-0.237516	you to reserve the amount
-0.237516	order to minimize the amount
-0.338600	test because the total amount
-0.305411	functions consume a significant amount
-0.085107	limit to the required amount
-0.085107	it allocates the required amount
-0.224956	and put an equal amount
-0.306440	collection takes a considerable amount
-0.200026	slow CPU, an insufficient amount
-1.363907	the size of the variable.
-0.357444	all optimizations on the variable.
-0.812478	than division by a variable.
-0.331668	non-constant references require a variable.
-0.294165	doing optimizations on that variable.
-0.324731	variable by a float variable.
-0.294309	it is a register variable.
-0.294309	to be a register variable.
-0.312100	reference or a simple variable.
-0.312100	than accessing a simple variable.
-0.551417	calculated by an induction variable.
-0.221077	making an explicit induction variable.
-0.375048	copied to a local variable.
-0.562643	predicted most of the time,
-0.351694	32 bits at a time,
-0.771304	be a waste of time,
-0.237621	user's computers. At this time,
-0.562369	thing at the same time,
-0.331323	a lot of CPU time,
-0.237031	be added at any time,
-0.449266	that takes a long time,
-0.292328	the table takes extra time,
-1.313472	not known at compile time,
-0.234762	a compromise between development time,
-0.212289	and pow at compile- time,
-0.165133	waste of the programmers' time,
-0.708012	declared inside a class Variables
-0.562729	Storage on the stack Variables
-0.340848	each other in memory. Variables
-0.335758	data caching more efficient. Variables
-0.217561	Global or static storage Variables
-0.499091	kinds of variable storage Variables
-0.233282	The positive effects are: Variables
-0.229195	is of course inefficient. Variables
-0.229195	used for temporary storage. Variables
-0.122647	linked library functions. 9.4 Variables
-0.122647	stored together...................................... 88 9.4 Variables
-0.165143	(see above, p. 26). Variables
-0.165143	distance the critical stride. Variables
-0.358541	usually called in the copying
-0.352439	the object instead of copying
-0.700304	several different ways of copying
-0.237887	Time for transposing and copying
-0.685282	This is done by copying
-0.339760	be copied simply by copying
-1.165381	can be avoided by copying
-0.234823	eliminate this jump by copying
-0.234823	objects are returned by copying
-0.523130	trivial tasks such as copying
-0.293876	memcpy function implicitly when copying
-0.212292	to avoid this wasteful copying
-0.212292	and prevent legitimate backup copying
-1.378459	for the sake of optimization.
-0.336285	are hardly relevant to optimization.
-0.237855	improve the possibilities for optimization.
-0.237738	useful discussions about code optimization.
-0.293963	Wikipedia article on compiler optimization.
-0.473233	able to do this optimization.
-0.362709	support for whole program optimization.
-0.277735	can do whole program optimization.
-0.236939	respects relevant to software optimization.
-0.232302	the debugging options prevent optimization.
-0.231351	the result of full optimization.
-0.199998	code still needs careful optimization.
-0.165138	Some compilers offer profile-guided optimization.
-0.331901	no extra cost to accessing
-0.346001	operator. The code for accessing
-0.992906	also be used for accessing
-0.292480	same induction variable for accessing
-0.339887	cache is optimized for accessing
-0.236382	used in STL for accessing
-0.237791	time loading files or accessing
-0.331691	double. Another problem with accessing
-0.859041	just as fast as accessing
-0.806530	no more time than accessing
-1.070506	is less efficient than accessing
-0.331688	virtual functions or when accessing
-0.231895	77 Pointer aliasing When accessing
-0.294072	turn them off or until
-0.237744	counters will stay on until
-0.237474	reference is valid only until
-0.453802	used by a variable until
-0.334805	to access the file until
-0.435560	pointer can be loaded until
-0.146451	of seconds and wait until
-0.146451	DelayFiveSeconds function will wait until
-0.146451	of x must wait until
-0.200004	is loaded, but waits until
-0.200004	the iteration is repeated until
-0.165143	error is not detected until
-0.165143	updates should be postponed until
-0.357819	no difference for the performance.
-0.237094	differ a lot in performance.
-0.314115	without any cost in performance.
-1.000139	is no difference in performance.
-0.331305	relatively small gain in performance.
-0.325242	no negative effect on performance.
-0.237523	negative impacts on program performance.
-0.381712	gives the worst possible performance.
-0.236100	64-bit version for best performance.
-0.598271	in order to improve performance.
-0.231883	common cause of reduced performance.
-0.230812	be inlined for improved performance.
-0.212281	some tips on improving performance.
-0.237868	also be convenient for adding
-0.349005	200. Next, we are adding
-0.311138	a single function by adding
-0.289040	calculate each address by adding
-0.233358	to 16 bytes by adding
-0.978498	can be calculated by adding
-1.164946	can be improved by adding
-0.233358	of each row by adding
-0.376496	number by 2n by adding
-0.293027	final size needed before adding
-0.236590	the C-style type-casting without adding
-0.290808	operators for things like adding
-0.230802	sets Microprocessor producers keep adding
-1.148067	int cc[]) { // Define
-0.623198	transpose(double a[SIZE][SIZE]) { // Define
-0.231456	12.5. Aligned arrays // Define
-0.526061	a, b, c; // Define
-0.231456	c2; double temp; // Define
-0.373855	x2; // x^4 // Define
-0.047344	vectorized #include <dvec.h> // Define
-0.047344	114 #include <dvec.h> // Define
-0.373855	classes #include "vectorclass.h" // Define
-0.231456	SSE2 #include <emmintrin.h> // Define
-0.526061	InstructionSet() #include "asmlib.h" // Define
-0.231456	SelectAddMul_SSE41, SelectAddMul_AVX2, SelectAddMul_dispatch; // Define
-0.165190	between multiple CPU cores: Define
-0.237874	inefficient, of course, and causes
-0.237434	to be mispredicted, which causes
-0.101679	64 64 matrix size causes
-0.101679	512 512 matrix size causes
-0.420117	if the new version causes
-0.322562	element in the list causes
-0.235206	read because the write causes
-0.232998	function if the inlining causes
-0.329257	fail if the destructor causes
-0.528522	where the critical stride causes
-0.228121	among the most frequent causes
-0.251310	for the FDIV bug causes
-0.165138	(or malloc and free) causes
-0.830344	rather than by the processing
-0.352346	multiple cores, and a processing
-0.809505	much more time than processing
-0.235782	with massively parallel vector processing
-0.235782	got RISC cores, vector processing
-0.322396	to use the high processing
-0.205243	power of the graphics processing
-0.304182	systems have a graphics processing
-0.205243	there is no graphics processing
-0.230778	standard for specifying parallel processing
-0.212281	functions for statistics, signal processing
-0.200009	input/output Graphics and sound processing
-0.200009	also have a physics processing
-0.334321	this case is to divide
-0.062594	CPU cores is to divide
-0.521330	i in order to divide
-0.521330	operations in order to divide
-0.521330	right in order to divide
-0.455361	cores, we need to divide
-0.340794	are several ways to divide
-0.496254	in the code and divide
-0.352595	positive n. You can divide
-0.237811	; shift right = divide
-0.340758	than signed when you divide
-0.380904	short vector library, you divide
-0.461985	delay due to the so-called
-0.358188	system kernel in the so-called
-0.355823	systems normally use the so-called
-0.440843	clock by using the so-called
-0.440843	explicitly by using the so-called
-0.293677	problem by bypassing the so-called
-0.237434	code by emulating the so-called
-0.356441	an FPGA as a so-called
-0.324446	in most cases. The so-called
-0.293357	and written back. The so-called
-0.237153	clear and modular. The so-called
-0.313400	the virtual functions. This so-called
-0.236790	the shared object. This so-called
-0.354956	bodies above, it is clear
-0.354956	each method, it is clear
-0.460872	available. It should be clear
-0.355759	| operator; you can clear
-0.868177	but it is not clear
-0.337748	same in a more clear
-0.233242	90 Gives a more clear
-0.232065	pointers makes it more clear
-0.232065	make the program more clear
-0.232065	of making software more clear
-0.584923	that there is no clear
-0.236826	makes the code less clear
-0.334305	are good for making clear
-0.462751	small fraction of the total
-0.350032	not add to the total
-0.140666	significant contribution to the total
-0.140666	negligible contribution to the total
-0.461046	any effect on the total
-0.655383	allocated dynamically when the total
-0.355881	unit- test because the total
-0.346866	more complicated. If the total
-0.346866	is stored? If the total
-0.357813	up, which is a total
-0.356276	data members with a total
-0.294181	Single-Instruction-Multiple-Data (SIMD) operations. The total
-0.436740	vectors of 64 bits total
-0.458989	can do is to mix
-0.338208	integer operations, and to mix
-0.292913	Be sure not to mix
-0.612177	can be advantageous to mix
-0.386253	also be advantageous to mix
-1.184400	compilers are able to mix
-0.236763	floating point multiplication, to mix
-0.347382	are used. Do not mix
-0.074787	* reciprocal_divisor; 14.7 Don't mix
-0.074787	........................................................................................... 139 14.7 Don't mix
-0.165187	would be evicted. Don't mix
-0.165179	preferably have a balanced mix
-0.294196	such as DOS and 16-bit
-0.129039	uint16_t unsigned int in 16-bit
-0.115695	unsigned short int in 16-bit
-0.115695	int8_t short int in 16-bit
-0.129039	32767 int16_t int in 16-bit
-0.324050	use 32-bit integers in 16-bit
-0.341773	are not optimized for 16-bit
-0.348156	not backwards compatible with 16-bit
-0.461669	not recommended to make 16-bit
-0.098634	a vector of eight 16-bit
-0.098634	in vectors of eight 16-bit
-0.324419	32-bit and 64-bit mode. 16-bit
-0.659242	the offset of the child
-0.357751	same name for the child
-0.031599	functions of parent and child
-0.031599	Members of parent and child
-0.065680	of both parent and child
-0.293763	} }; // The child
-0.324882	// parent class. The child
-0.327960	polymorphic member of its child
-0.307622	a pointer to its child
-0.223344	gets information about its child
-0.228152	"; // call polymorphic child
-0.265184	it has the correct child
-0.347476	commonly used set of containers
-0.237859	reinvent the wheel. The containers
-0.237835	the objects stored are containers
-0.237309	reasons. Use these example containers
-0.236045	often excessively so. These containers
-0.235880	is needed. Objects inside containers
-0.234771	efficient to have separate containers
-0.234163	very inefficient solution. Many containers
-0.233777	of using ready made containers
-0.442608	flexibility of the STL containers
-0.213846	vector. The other STL containers
-0.226737	several examples of suitable containers
-0.292913	loop by 16 to fit
-0.292913	that are available to fit
-0.002861	loop by eight to fit
-0.236763	modified, if necessary, to fit
-0.237461	offering profiling tools that fit
-0.237461	set into sub-vectors that fit
-0.457149	point. This does not fit
-0.511393	CPUs if the data fit
-0.337402	to make the data fit
-1.260669	for the compiler to predict
-0.591225	phase in order to predict
-0.868704	may be able to predict
-0.317756	not always able to predict
-0.317756	are sometimes able to predict
-0.120208	It is difficult to predict
-0.236379	using advanced algorithms to predict
-0.591694	compiler is unable to predict
-0.536941	requires that you can predict
-0.443340	well the microprocessor can predict
-0.165184	the CPU may occasionally predict
-0.325413	test and setting the priority
-0.237499	threads with widely different priority
-0.163449	threads with the same priority
-0.329586	by increasing the thread priority
-0.290514	more powerful. The high priority
-0.218577	code size has higher priority
-0.218577	processor to give higher priority
-0.196016	development and the low priority
-0.265183	run in a low priority
-0.241247	to at a lower priority
-0.241247	separate thread with lower priority
-0.503796	been copied to the disk
-0.349768	hard disk because of disk
-0.341526	where RAM memory and disk
-0.237629	CPU time, RAM and disk
-0.339407	do while waiting for disk
-0.230810	user input or reading disk
-0.122673	swapped to the hard disk
-0.019515	change of a hard disk
-0.009648	file on a hard disk
-0.009648	fast on a hard disk
-0.019515	response from a hard disk
-0.083948	better support for hard disk
-0.164864	problem that the clock frequency
-0.164864	problems that the clock frequency
-0.207652	example, if the clock frequency
-0.207652	programs when the clock frequency
-0.108072	10 Multithreading The clock frequency
-0.108072	work load. The clock frequency
-0.108072	standard PCs. The clock frequency
-0.151514	if the CPU clock frequency
-0.151514	at the CPU clock frequency
-0.200856	can change their clock frequency
-0.200856	for a higher clock frequency
-0.200856	at the actual clock frequency
-0.294225	give a CPU of unknown
-0.237701	any assumption about an unknown
-0.314312	uninitialized or come from unknown
-0.349149	this method for all unknown
-0.293292	may be so many unknown
-0.311513	a model that was unknown
-0.019632	the processors that were unknown
-0.009705	on processors that were unknown
-0.029791	or models that were unknown
-0.029791	newer models that were unknown
-0.329463	number. Failure to handle unknown
-0.355500	runtime polymorphism that is obtained
-0.019907	The best performance is obtained
-0.573418	of modern microprocessors is obtained
-0.381449	possible user interface is obtained
-0.313549	The highest efficiency is obtained
-0.524770	stamp counter can be obtained
-0.355421	higher resolution can be obtained
-0.476346	optimization can sometimes be obtained
-0.419299	that can possibly be obtained
-0.200049	execution is no doubt obtained
-0.235689	AVX2 Mathematical vector function libraries.
-0.291691	performance of different function libraries.
-0.291691	compilers and optimized function libraries.
-0.291691	compiler includes standard function libraries.
-0.331509	libraries and short vector libraries.
-0.352570	inferior to the Intel libraries.
-0.324475	the behavior of static libraries.
-0.236500	distributed between multiple dynamic libraries.
-0.436299	only for very large libraries.
-0.321619	slower than static link libraries.
-0.430651	and Intel vector math libraries.
-0.200015	to link with external libraries.
-0.237946	Newton-Raphson iterations. Here the iteration
-0.022992	the calculation of one iteration
-0.340068	register temp in one iteration
-0.323323	be saved from one iteration
-0.347479	residual error for each iteration
-0.325178	a loop where each iteration
-0.232505	of branch. After each iteration
-0.790628	there is an extra iteration
-0.311848	and again for every iteration
-0.442311	iteration before the preceding iteration
-0.338846	result of the previous iteration
-0.358672	the reading of the counters
-0.347485	is one set of counters
-0.237872	each CPU core). The counters
-0.352434	and reading the performance counters
-0.236067	branch mispredictions, etc. These counters
-0.086635	up the performance monitor counters
-0.086635	read the performance monitor counters
-0.084875	problems. The performance monitor counters
-0.019715	or more performance monitor counters
-0.019715	16.1 Using performance monitor counters
-0.236687	of the total time. Optimizing
-0.236001	rather than loops, etc. Optimizing
-0.233287	of CPU dispatching are: Optimizing
-0.213854	of five manuals: 1. Optimizing
-0.232682	and Mac platforms. 2. Optimizing
-0.563072	code caching is critical. Optimizing
-0.608750	systems with big-endian storage. Optimizing
-0.226752	and optimizing for speed. Optimizing
-0.176479	and pop ebx. 9 Optimizing
-0.176479	does ............................................................................. 84 9 Optimizing
-0.466125	depending on the processor). Optimizing
-0.311188	of b into a 128-bit
-0.403743	float's fits into a 128-bit
-0.311188	be combined into a 128-bit
-0.551883	files For example, a 128-bit
-0.237917	from 64-bit MMX to 128-bit
-0.631825	floating point code. The 128-bit
-0.293761	and YMM registers The 128-bit
-0.289908	256-bit vector as two 128-bit
-0.289908	read operations into two 128-bit
-0.534346	first processors that supported 128-bit
-0.234190	64-bit execution units. Each 128-bit
-0.323787	models had the full 128-bit
-0.237924	out of range is possibly
-0.237626	an import table and possibly
-0.237626	and restoring registers, and possibly
-0.334986	highest performance that can possibly
-0.334986	function F2 that can possibly
-1.120822	of the code can possibly
-0.329638	a high-priority thread can possibly
-0.235637	aliasing" (if valid) can possibly
-0.237686	cache line size may possibly
-0.382180	higher instruction set, but possibly
-0.229234	the following methods could possibly
-0.200026	local object is overwritten, possibly
-0.795011	result is stored in x,
-0.229521	// Example 7.32a double x,
-0.229521	// Example 7.32b double x,
-0.229521	// Example 8.8b double x,
-0.229521	// Example 8.8a double x,
-0.234158	vector 56 public: float x,
-0.234158	7.11 bool a; float x,
-0.235239	7.42 int Multiply (int x,
-0.232323	void Func() { S1 x,
-0.371955	one function can modify x,
-0.218573	loop double ipow (double x,
-0.165159	c, d, e, f, x,
-0.358231	first-in-last-out nature of the stack.
-0.352592	too big for the stack.
-0.455590	hardware support for the stack.
-0.201291	rather than on the stack.
-0.531404	is stored on the stack.
-0.334103	keep together on the stack.
-0.345678	_endthread() cleans up the stack.
-0.446335	organized as a register stack.
-0.290816	threads have each their stack.
-0.479035	thread has its own stack.
-1.757475	is a power of 2,
-0.237802	= 1, Monday = 2,
-0.237726	(i = (int)n - 2,
-0.054490	> 0, c + 2,
-0.381836	18 will evict number 2,
-0.111240	a factor of 1, 2,
-0.111240	allocations of sizes 1, 2,
-0.012444	FactorialTable[13] = {1, 1, 2,
-0.224920	powers of 2 (i.e. 2,
-0.165159	even-numbered logical processors (0, 2,
-0.357780	library (STL) if the full
-0.521740	avoided by making the full
-0.339115	does not give the full
-0.237600	Later models had the full
-0.293866	the future. Typically, the full
-0.237918	twice for handling a full
-0.552616	see the result of full
-0.237901	and a server in full
-0.331732	at half speed or full
-0.294047	a debug version with full
-0.331609	but this will use full
-0.237529	MASM assembly language has full
-0.292820	rather than CPU time. Another
-0.475482	inside the innermost loop. Another
-0.222342	than the code itself. Another
-0.272231	optimize well. Open Watcom Another
-0.218537	alias upon the double. Another
-0.569229	avoid long dependency chains. Another
-0.199998	still uses a GOT. Another
-0.199998	make the program slower. Another
-0.165138	not less than ARRAYSIZE. Another
-0.165138	contained in a DLL. Another
-0.165138	feature on Intel CPU’s. Another
-0.165138	down the execution considerably. Another
-0.382856	where the network is overloaded
-0.325020	using vector classes and overloaded
-0.473436	for copy constructors and overloaded
-0.353542	the name cannot be overloaded
-0.237791	conversion. The constructor or overloaded
-0.350654	different versions of an overloaded
-0.235649	a function. Using an overloaded
-0.235649	defined a constructor, an overloaded
-0.311021	C++ classes and using overloaded
-0.311021	performance penalty for using overloaded
-0.338521	An expression with multiple overloaded
-0.236151	7.27 Overloaded operators An overloaded
-0.358203	initiative whenever it is possible.
-0.357387	just happened to be possible.
-0.289241	Avoid virtual functions if possible.
-0.047683	SSE2 instruction set if possible.
-0.047683	later instruction set if possible.
-0.233534	re- usable library if possible.
-0.233534	inside one function, if possible.
-0.233534	floating point variables, if possible.
-0.233534	use of longjmp if possible.
-0.292046	as little work as possible.
-0.488482	cached as good as possible.
-0.236001	accurate and reproducible as possible.
-0.233454	101 Multithreading works more efficiently
-0.233454	can be calculated more efficiently
-0.233454	could be achieved more efficiently
-0.233454	to be cached more efficiently
-0.229932	variable is accessed most efficiently
-0.020303	The cache works most efficiently
-0.020303	code cache works most efficiently
-0.020303	A cache works most efficiently
-0.282839	backwards and much less efficiently
-0.227898	code cache works less efficiently
-0.227898	It works somewhat less efficiently
-0.236205	math functions should work efficiently
-0.294084	processors. Other brands or models
-0.338979	lists of specific CPU models
-0.043482	negative list of processor models
-0.043482	positive list of processor models
-0.021197	list of which processor models
-0.208370	terms of specific processor models
-0.236098	CPU brands or specific models
-0.332355	redesign. Some software development models
-0.230086	optimal choice for future models
-0.285323	version on all newer models
-0.165159	larger vector size. Later models
-0.502105	compilers. This function is OS
-0.215479	Linux, BSD and Mac OS
-0.185512	shared objects in Mac OS
-0.141058	Gnu compiler for Mac OS
-0.141058	internal references. 64-bit Mac OS
-0.051256	compiling for 32-bit Mac OS
-0.051256	Compilers for 32-bit Mac OS
-0.109381	in Linux. 32-bit Mac OS
-0.114503	systems. The Intel-based Mac OS
-0.114503	well as Intel-based Mac OS
-0.377679	loaded. This method requires OS
-0.654719	the virtual function is needed.
-0.472694	from the library is needed.
-0.428695	level of optimization is needed.
-0.331238	more complicated implementation is needed.
-0.237248	case memory re-allocation is needed.
-1.006051	when it is not needed.
-0.351272	old CPUs is not needed.
-0.235847	containers is 95 not needed.
-0.237697	allocate more space than needed.
-0.348759	is evaluated only when needed.
-0.598897	this feature is rarely needed.
-0.165169	conversion, shuffling, packing, unpacking needed.
-0.237883	instances of structures and classes.
-0.345134	is allowed only for classes.
-0.237520	lists the available vector classes.
-0.352212	and cons of using classes.
-0.349840	to all of these classes.
-0.209480	source of such container classes.
-0.156136	discussion of efficient container classes.
-0.156136	and various efficient container classes.
-0.209480	arrays by well-tested container classes.
-0.228137	used for implementing polymorphic classes.
-0.224914	one of the base classes.
-0.200015	consistent modularity and reusable classes.
-0.294047	b + 1 is changed
-0.237759	14.8 and 14.9 is changed
-0.534916	mode has to be changed
-0.579766	b2; This can be changed
-0.492988	a variable can be changed
-0.350300	{ ... can be changed
-0.350300	and edx can be changed
-0.355164	* 2.5 may be changed
-0.350675	point operands cannot be changed
-0.603470	the function pointer has changed
-0.235810	until the value has changed
-0.165179	the CPUID is artificially changed
-0.324969	example, that a is true
-0.357985	of habit, it is true
-0.346143	time and b is true
-0.954529	is known to be true
-0.293407	a || true = true
-0.293407	a || !a = true
-0.341702	; repeat loop if true
-0.324696	that is most often true
-0.313070	be reduced to always true
-0.614699	= true a && true
-0.599063	= false, a || true
-0.165153	produce a single result, true
-0.382889	start and stop the thread.
-0.237920	cleanup before terminating a thread.
-0.529391	possibly in a different thread.
-0.350456	data for the other thread.
-0.335889	join them into one thread.
-0.301259	be separate for each thread.
-0.356714	one instance for each thread.
-0.227716	exclusive access by each thread.
-0.227716	of work into each thread.
-0.204607	to 5 by another thread.
-0.204607	be changed by another thread.
-0.237510	and code addresses. The names
-0.237510	to compile for. The names
-0.355532	functions, but the function names
-0.341849	library. 119 The function names
-0.234997	clear correspondence between function names
-0.234997	without information about function names
-0.290904	ability to define function names
-0.286485	intrinsic vector functions have names
-0.286485	CPU- specific functions have names
-0.537166	function names and variable names
-0.235704	in library libircmt.lib. Function names
-0.304391	look at CPU brand names
-0.209709	register to temp even though
-0.209709	latter is executed even though
-0.209709	the function returns even though
-0.209709	floating point expressions, even though
-0.209709	if(!(a || b)) even though
-0.209709	rather than nine, even though
-0.233567	125 for this function, though
-0.283122	OS X operating systems, though
-0.226761	best optimizing compilers available, though
-0.222380	follow the track backwards though
-0.165164	in the multiplication b[i]*c[i], though
-0.165164	the call to Object1.Hello(), though
-0.447391	a program than to execute
-0.984925	then you have to execute
-0.490280	too long time to execute
-1.031976	time it takes to execute
-0.857503	code is likely to execute
-0.236765	functions take microseconds to execute
-0.323938	modern x86 CPUs can execute
-0.442458	that the microprocessor can execute
-0.236738	debugging. A debugger can execute
-0.555090	checks makes the code execute
-0.455634	certain kinds of code execute
-0.237883	division with truncation, and %
-0.435350	c; a = b %
-0.630833	10; a = b %
-0.313902	i++){ list[i] = i %
-0.330276	i++) { if (i %
-0.222381	> 256 && SIZE %
-0.272251	address) / (line size) %
-0.309700	a = (unsigned int)b %
-0.200015	= (10000 / 64) %
-0.165153	= (0x2710 / 0x40) %
-1.312645	is more efficient than mov
-0.237483	i/2+r. The next instruction mov
-0.236388	computing i/2+r. The instructions mov
-0.235460	shr add sar add mov
-0.249070	parameter $B1$1: mov mov mov
-0.182238	; parameter $B1$1: mov mov
-0.182238	mov lea $B2$2: mov mov
-0.200009	$B1$1: push mov xor mov
-0.200009	2: 12 $B1$1: push mov
-0.200009	Induction; ; parameter $B1$1: mov
-0.165148	mov mov lea $B2$2: mov
-0.165148	mov xor mov $B1$2: mov
-0.584159	gives the value of N
-0.896725	to the power of N
-0.382317	classes. The splitting of N
-0.010272	Full template specialization for N
-0.020792	Partial template specialization for N
-0.237789	(N & N-1)==0 if N
-0.294047	{ // Array with N
-0.237458	rightmost 1-bit removed. If N
-0.237107	Template for pow(x,N) where N
-0.416601	know that processor model N
-0.222375	constant. // General case, N
-0.318467	level-1 cache. The different kinds
-0.230367	able to do different kinds
-0.332649	There are two different kinds
-0.230367	threads are doing different kinds
-0.230367	is to mix different kinds
-0.331326	more time than other kinds
-0.237431	This can cause all kinds
-0.450041	transitions between the two kinds
-0.235744	core. There are four kinds
-0.235401	instructions can make certain kinds
-0.057014	of manuals. 7.1 Different kinds
-0.057014	constructs........................................................................ 26 7.1 Different kinds
-0.237502	in assembly names. The details
-0.237502	CPU cache (en.wikipedia.org/wiki/L2_cache). The details
-0.381737	my test tool for details
-0.237120	See page 141 for details
-0.237120	and operating systems" for details
-0.314523	VIA CPUs" gives more details
-0.537966	There are also other details
-0.229213	relies on non- standardized details
-0.212286	deeper into the technical details
-0.212286	avoid these problems. More details
-0.165153	integers and other hardware-related details
-0.165153	a one parameter. Further details
-0.358053	hard disk if the RAM
-0.831055	economize the use of RAM
-0.511199	than the speed of RAM
-0.539503	and the amount of RAM
-0.331868	store intermediate results in RAM
-0.292292	powerful computers with more RAM
-0.312717	Try to allocate more RAM
-0.312398	access Accessing data from RAM
-0.291988	fetch the variable from RAM
-0.237107	to around 1980 where RAM
-0.287844	64). You may save RAM
-0.231891	lot of CPU time, RAM
-0.657569	can see that the rows
-0.308845	of 2 if the rows
-0.861711	is to make the rows
-0.105440	512; // number of rows
-0.317726	Example 14.8 const int rows
-0.317726	Example 7.17 const int rows
-0.317726	int FuncCol(int); const int rows
-0.236960	if the distance between rows
-0.022809	{ // loop through rows
-0.356460	is accessed with a square
-0.493729	replace the call to square
-0.294158	edx, to ebx. The square
-0.237805	for // multiply // square
-0.237402	squares and handle one square
-0.237176	// Example 8.1a float square
-0.548458	time. This is called square
-0.236127	Cache contentions expected. Use square
-0.234892	Using complicated techniques like square
-0.226742	reciprocal, fast approximate reciprocal square
-0.199998	time. Single precision division, square
-0.165138	numbers is inefficient. Division, square
-0.579528	system is likely to fail
-0.237791	give misleading results or fail
-0.350522	approximately so. It may fail
-0.236570	the above example may fail
-0.441972	This above code will fail
-0.321316	both positive. It will fail
-0.234591	143. The trick will fail
-0.236743	big software companies often fail
-0.338055	optimal code because they fail
-0.333698	does not, and therefore fail
-0.227907	The developers may therefore fail
-0.165159	worse, many software products fail
-0.112778	useful for many different purposes.
-0.112778	available for many different purposes.
-0.438665	templates for several different purposes.
-0.285542	be used for other purposes.
-0.285542	register available for other purposes.
-0.340507	be used for multiple purposes.
-0.170500	same array for multiple purposes.
-0.100468	a variable for test purposes.
-0.100468	code branch for test purposes.
-0.344101	for many of these purposes.
-0.286687	stack for all these purposes.
-0.165169	library made for demonstration purposes.
-0.237479	certain operating system functions (e.g.
-0.237379	with a micro-op cache (e.g.
-0.237186	a valid 63 number (e.g.
-0.633485	code has a branch (e.g.
-0.236813	by a system call (e.g.
-0.236184	determined with system calls (e.g.
-0.235601	division and relational operators (e.g.
-0.310648	array sequentially. Some applications (e.g.
-0.338892	this requires static linking (e.g.
-0.233769	have big endian storage (e.g.
-0.232292	implement a universal algorithm (e.g.
-0.419566	modify the carry flag (e.g.
-0.510230	called. The disadvantage of compiling
-0.458443	offer the possibility of compiling
-0.237805	framework for interpreting or compiling
-0.237043	optimization. This works by compiling
-0.237043	in a module by compiling
-0.229254	bit vector registers when compiling
-0.229254	Microsoft Visual Studio when compiling
-0.229254	are less strict when compiling
-0.229254	compiler option -fno-pic when compiling
-0.229254	the class Vec16s when compiling
-0.229254	information about Func1 when compiling
-0.229254	and to Eclipse when compiling
-1.350488	is more efficient to convert
-0.407580	used, for example, to convert
-0.892597	is therefore necessary to convert
-0.349095	point number. We can convert
-0.237283	I have tested can convert
-0.408189	cases, the compiler will convert
-0.527360	the Gnu compiler will convert
-0.527360	A good compiler will convert
-0.344264	precision constant and then convert
-0.236050	u < 231 then convert
-0.292949	is faster to first convert
-0.380511	because the compiler must convert
-0.521996	which does the same thing
-0.241523	of doing the same thing
-0.241523	are doing the same thing
-0.443531	doing exactly the same thing
-0.515076	do more than one thing
-0.299913	optimal algorithm The first thing
-0.299913	optimized further. The first thing
-0.508896	execution. The most important thing
-0.321803	page 137). The second thing
-0.231388	long dependency chains. Another thing
-0.222365	the code. The third thing
-0.222375	to be an obvious thing
-0.352782	of i into the least
-0.237781	cache always chooses the least
-0.237781	AND operation isolates the least
-0.224565	be aligned by at least
-0.224565	the class has at least
-0.224565	function that calls at least
-0.224565	model N+1 supports at least
-0.224565	entire library (or at least
-0.224565	stored in memory, at least
-0.224565	the same cache, at least
-0.224565	memset and memcpy, at least
-0.224565	able to do, at least
-0.237858	structure or class for containing
-0.325164	move out loop-invariant code containing
-0.851304	a 128 bit vector containing
-0.348713	should be a class containing
-0.234950	of a simple class containing
-0.338003	with another vector register containing
-0.236492	disk. A big file containing
-0.405170	values then the line containing
-0.234774	block. A large block containing
-0.230791	An expression or subexpression containing
-0.534526	this manual at www.agner.org/optimize/cppexamples.zip containing
-0.200004	for regular access patterns containing
-0.236951	u; if (u.i[1] < 0)
-0.220316	{ if (n > 0)
-0.220316	aa[i] = (bb[i] > 0)
-0.151063	(i % 2 == 0)
-0.151063	SIZE % 128 == 0)
-0.151063	} if (a == 0)
-0.068996	a = (b == 0)
-0.068996	{ if (b == 0)
-0.093310	d; if (a != 0)
-0.093310	1.0; while (n != 0)
-0.093310	{ if (b != 0)
-0.093310	string; while (*p != 0)
-0.331905	worry about loss of precision.
-0.294199	very good performance and precision.
-0.357551	including relaxed floating point precision.
-0.481620	between single and double precision.
-0.099774	precision than for double precision.
-0.099774	cases, even for double precision.
-0.330380	done with long double precision.
-0.214099	result back to single precision.
-0.214099	as fast as single precision.
-0.214099	more time than single precision.
-0.267212	examples all use single precision.
-0.165169	the risk of losing precision.
-0.555681	necessary to do the algebraic
-0.458736	about the possibility of algebraic
-0.237877	various optimization methods and algebraic
-0.356904	is safe to use algebraic
-0.325028	that they cannot make algebraic
-0.542154	cases. This is because algebraic
-0.237038	compiler to do any algebraic
-0.285486	compilers can do simple algebraic
-0.230230	compilers can reduce simple algebraic
-0.234886	compiler to reduce complicated algebraic
-0.234644	capability to reduce various algebraic
-0.234163	in a compiler. Many algebraic
-0.517750	program. The use of structures
-0.325230	objects are instances of structures
-0.237805	If the arrays or structures
-0.489602	16. Alignment of data structures
-0.334684	on algorithms and data structures
-0.280660	lists and other data structures
-0.280660	merge the multiple data structures
-0.051891	contentions in large data structures
-0.396397	that have big data structures
-0.225977	for more advanced data structures
-0.236163	Align arrays and big structures
-0.355667	style type-casting with a little
-0.381856	programs to run a little
-0.293417	1.0f; This needs a little
-0.293417	a loop becomes a little
-0.237206	syntax may seem a little
-0.237759	efficient table-based methods with little
-0.294021	routine should do as little
-0.293916	constructs Most programmers have little
-0.237580	loop unroll factor. A little
-0.237079	is compact and takes little
-0.349799	are: There is very little
-0.234505	the sampling generates too little
-0.345083	void NotPolymorphic(); }; // Any
-0.554186	are. Dynamic memory allocation Any
-0.231863	copy the entire object. Any
-0.229205	to the new block. Any
-0.328860	floating point execution units. Any
-0.224921	b is 400 here. Any
-0.364793	from the loop counter. Any
-0.433149	are difficult to maintain. Any
-0.251310	sections can be shared. Any
-0.165138	needs to be saved. Any
-0.165138	doing the spell checking. Any
-0.165138	code to the device. Any
-0.358422	of abstraction in the logical
-0.657362	is good for the logical
-0.487751	temp even though the logical
-0.237918	model numbers form a logical
-1.024050	and the number of logical
-0.545164	system. The number of logical
-0.102590	number of cores or logical
-0.102590	multiple CPU cores or logical
-0.575759	instances of the same logical
-0.454220	system with only one logical
-0.235633	physical processors but eight logical
-0.165159	using only the even-numbered logical
-0.023428	using asmlib library int level
-0.343310	it adds an extra level
-0.022824	at the vector element level
-0.235699	Fastcall functions /Gr Function level
-0.191802	implemented in the high level
-0.191802	solution because the high level
-0.310046	situations where a high level
-0.234002	microprocessor microarchitecture. A higher level
-0.272257	modules when the highest level
-0.165159	the option for "function level
-0.023447	scans all files on access.
-0.234312	latency or by memory access.
-0.234312	mathematical calculations with memory access.
-0.234312	wait for another memory access.
-0.335784	aligned arrays with vector access.
-0.293570	row addresses at each access.
-0.291457	be done at every access.
-0.235477	hacks and direct hardware access.
-0.232303	gain by optimizing database access.
-0.307849	don't need any non-static access.
-0.228137	is faster than random access.
-0.864696	possible to use the bitwise
-0.564565	set by using the bitwise
-0.237775	both operands. Nevertheless, the bitwise
-0.292940	only when needed. The bitwise
-0.236787	values at once The bitwise
-0.236787	(&& and ||). The bitwise
-0.236787	with 2n -1. The bitwise
-0.237764	you can do with bitwise
-0.646552	The trick of using bitwise
-0.099592	to zero. 14.3 Use bitwise
-0.099592	.................................................................................................. 134 14.3 Use bitwise
-0.251348	! and the corresponding bitwise
-0.358055	call WriteFile if the handle
-0.330996	The safe way to handle
-0.605699	the fastest way to handle
-0.023450	are sufficiently large to handle
-0.381503	dispatchers are designed to handle
-0.324827	model number. Failure to handle
-0.237629	should avoid these and handle
-0.237629	into smaller squares and handle
-0.739658	so that we can handle
-0.237585	Each thread should then handle
-0.337555	CPU dispatcher that doesn't handle
-0.356694	information stored by the heap
-0.349403	garbage collection when the heap
-0.349403	different places when the heap
-0.314455	of memory called the heap
-0.472956	so will cause the heap
-0.293777	course, and causes the heap
-0.346445	high overhead cost of heap
-0.312975	in random order. The heap
-0.236433	for dynamic allocation. The heap
-0.292538	a memory heap. The heap
-0.236433	See page 26. The heap
-0.236433	then become invalid. The heap
-0.231378	The next instruction mov DWORD
-0.228144	DWORD PTR [eax], ecx DWORD
-0.184752	eax ebx, 1 ebx, DWORD
-0.234189	The instruction add ebx, DWORD
-0.176474	[esp+8] eax, eax edx, DWORD
-0.176474	eax, edx, ecx, edx, DWORD
-0.212286	+ esp ebx ecx, DWORD
-0.172697	ebx, DWORD PTR [edx] DWORD
-0.098597	[esp+8] DWORD PTR [edx] DWORD
-0.330801	[esp+4] DWORD PTR [esp+8] DWORD
-0.165153	[edx] DWORD PTR [eax+400] DWORD
-0.165153	edx, DWORD PTR [esp+4] DWORD
-0.381137	of the user's time. Other
-0.234176	handle only known processors. Other
-0.276559	have a high priority. Other
-0.218568	libraries in this format. Other
-0.265164	_M_X64 162 19 Literature Other
-0.074773	Graphics ................................................................................................................. 21 3.11 Other
-0.074773	one is best. 3.11 Other
-0.035779	database ...................................................................................................... 20 3.9 Other
-0.035779	(*.ini files). 20 3.9 Other
-0.074773	and so on. 7.31 Other
-0.074773	handling ................................................................................ 61 7.31 Other
-0.165153	Microsoft, Intel and Gnu). Other
-0.458074	support which is used during
-0.352420	track of the performance during
-0.312912	43 speculatively executing instructions during
-0.235200	lot of time both during
-0.584734	a specific CPU core during
-0.436478	turn off the computer during
-0.288973	a[] which will change during
-0.224909	a task switch occurs during
-0.218537	program may be selected during
-0.199998	of the framework itself, during
-0.199998	in an array grows during
-0.165138	runs under the framework, during
-0.355968	table (PLT) that is initialized
-1.003072	means that it is initialized
-0.651545	make sure it is initialized
-0.466771	that the table is initialized
-0.237893	constants, string constants, and initialized
-0.339407	the program, one for initialized
-0.554456	not need to be initialized
-1.102203	that it can be initialized
-0.500479	An array can be initialized
-0.330885	element } An array initialized
-0.345962	and b have been initialized
-0.313332	big that overflow can occur
-0.236733	branch target buffer can occur
-0.236733	and garbage collection can occur
-0.292696	but such expressions may occur
-0.236572	{ // Overflow may occur
-0.237585	that the break will occur
-0.237153	inlining. Reducible expressions also occur
-0.236156	signed integer overflow doesn't occur
-0.159531	a matrix when contentions occur
-0.159531	more dramatic when contentions occur
-0.218577	detecting errors that seldom occur
-0.354108	function directly if the target
-0.895469	be eliminated if the target
-0.355649	has changed then the target
-0.293973	able to predict the target
-0.237517	can be predicted. The target
-0.237517	the next paragraph. The target
-0.098690	space in the branch target
-0.098690	Contentions in the branch target
-0.301284	cache called the branch target
-0.275060	be critical. The branch target
-0.221036	cache, code cache, branch target
-0.230791	part of a program, especially
-0.369240	integers in 32-bit systems, especially
-0.296226	Position-independent code is inefficient, especially
-0.356032	or loss of precision, especially
-0.200004	are a scarce resource, especially
-0.200004	data in the file, especially
-0.200004	less efficient than relocation, especially
-0.251316	Avoid long dependency chains, especially
-0.165143	floating point code slower, especially
-0.165143	a critical dependency chain, especially
-0.165143	course also time consuming, especially
-0.494305	simple pointer or a smart
-0.497990	element then use a smart
-0.324515	you don't need a smart
-0.454267	an object through a smart
-0.313901	extra cost whenever a smart
-0.442015	most common implementations of smart
-0.236124	7.9 Smart pointers A smart
-0.312607	no longer used. A smart
-0.352220	The purpose of using smart
-0.293145	does some things very smart
-0.290804	objects with each their smart
-0.325313	the innermost loop that includes
-0.290794	with other compilers. This includes
-0.234900	for register variables. This includes
-0.234900	important than speed. This includes
-0.234900	the "override" feature. This includes
-0.555884	Intel The Intel compiler includes
-0.237150	the C++ language also includes
-0.381422	measured in this way includes
-0.419222	program. The map file includes
-0.289987	linking are: Static linking includes
-0.165159	Available from www.agner.org/optimize/asmlib.zip. Currently includes
-0.356722	is allocated and the entire
-0.358011	identical constants in the entire
-0.350216	dynamic linking makes the entire
-0.731132	solution of making the entire
-0.407058	time to copy the entire
-0.457522	cache to load the entire
-0.443167	done by copying the entire
-0.237179	will be loading the entire
-0.237179	loop where almost the entire
-0.293387	You may mirror the entire
-0.237715	the write causes an entire
-0.455591	copies them into the executable
-0.665300	and you want the executable
-0.331593	time. Therefore, both the executable
-0.293871	executable file. Only the executable
-0.237605	is run. Both the executable
-0.331696	process or by an executable
-0.539090	included in a single executable
-0.233802	and distributed as binary executable
-0.160366	function in the main executable
-0.160366	works in the main executable
-0.200231	call from the main executable
-0.462527	be useful if the subexpression
-0.345217	parenthesis around such a subexpression
-0.294092	propagation An expression or subexpression
-0.462146	elimination If the same subexpression
-0.221294	optimizations such as common subexpression
-0.221294	pure. This allows common subexpression
-0.221294	using function inlining, common subexpression
-0.093306	subexpression elimin., integer Common subexpression
-0.093306	a += 2; Common subexpression
-0.093306	XMM (vector) reductions: Common subexpression
-0.093306	propagation Pointer elimination Common subexpression
-0.500384	first way is to insert
-1.572135	It is possible to insert
-0.632814	is often possible to insert
-0.293353	and down. Remember to insert
-0.237149	you are risking to insert
-0.342673	at compile time and insert
-0.624626	in static memory and insert
-0.529892	this instruction set and insert
-0.237109	values by hand and insert
-0.353765	The Intel compiler can insert
-0.354562	error. // You may insert
-0.355812	chapter 9.10, then the nontemporal
-0.331815	level-2 cache. Using the nontemporal
-0.325397	then the effect of nontemporal
-0.237867	same memory area. The nontemporal
-0.313254	when the #pragma vector nontemporal
-0.236667	Noncached write #pragma vector nontemporal
-0.236667	vector nontemporal #pragma vector nontemporal
-0.354478	be ameliorated by using nontemporal
-0.307544	written back. The so-called nontemporal
-0.309559	be evicted. Don't mix nontemporal
-0.230803	Intel compiler can insert nontemporal
-0.345258	calculations go outside the bounds
-0.237922	I have added a bounds
-0.435050	count. The method of bounds
-0.312672	like an array with bounds
-0.312672	examples of arrays with bounds
-0.292249	Example 7.15a. Array with bounds
-0.237755	(see page 134 on bounds
-0.318405	scanf. Violation of array bounds
-0.155222	automatically check for array bounds
-0.155222	no checking for array bounds
-0.155222	no checks for array bounds
-0.237880	may be inlined for improved
-0.699385	things that can be improved
-0.488372	thing that can be improved
-0.534663	this code can be improved
-0.929998	} This can be improved
-0.521094	x; This can be improved
-0.341005	The performance can be improved
-0.341005	non-Intel processors can be improved
-0.341005	C++ projects can be improved
-0.351376	the speed will be improved
-0.233959	code can probably be improved
-0.357241	fast math and the SSE
-0.356697	control Microprocessors with the SSE
-0.350124	the microprocessor has the SSE
-0.314470	Does not support the SSE
-0.379827	float 32 4 128 SSE
-0.417661	80386 32 bit mode SSE
-0.367353	sections /Gy -ffunction- sections SSE
-0.165159	set Prefetch PREFETCH _mm_prefetch SSE
-0.165159	without cache MOVNTPS _mm_stream_ps SSE
-0.165159	Header file MMX mmintrin.h SSE
-0.165159	without cache MOVNTQ _mm_stream_pi SSE
-0.462175	etc. And it is discussed
-0.353796	particular part. It is discussed
-0.353796	these considerations. It is discussed
-0.293469	use of threads is discussed
-0.237251	different type conversions is discussed
-0.236498	obstacles to optimization are discussed
-0.535506	common function libraries are discussed
-0.330126	memory. These methods are discussed
-0.236498	most common time-consumers are discussed
-0.237747	such small devices, as discussed
-0.495835	where it is also discussed
-0.325413	may not need the updates
-0.172221	needs. The search for updates
-0.172221	Some programs search for updates
-0.293323	use time searching for updates
-0.237529	installation of downloaded program updates
-0.234664	hours to install automatic updates
-0.089882	.................................................................................................. 18 3.4 Automatic updates
-0.089882	standardized manner. 3.4 Automatic updates
-0.228142	or remotely. If frequent updates
-0.315154	because these time consuming updates
-0.165159	software programs automatically download updates
-1.342829	it is important to consider
-0.343521	program then you may consider
-0.343521	pointer then you may consider
-0.343521	structure then you may consider
-0.282999	to code, you may consider
-0.282999	composite object, you may consider
-0.456934	application-specific code. If you consider
-0.324976	result then we will consider
-0.292445	so complicated that I consider
-0.323475	specific purpose, you must consider
-0.229433	models. However, we must consider
-0.237859	because it requires the loading
-0.237859	it may involve the loading
-0.355645	program, you will be loading
-0.349693	programs use more time loading
-0.293814	the level-2 cache from loading
-0.237529	Basic, etc. But program loading
-0.231173	directly to memory without loading
-0.231173	stores a double without loading
-0.281582	times because of lazy loading
-0.068035	.................................................................................................... 19 3.5 Program loading
-0.068035	update process. 3.5 Program loading
-0.237846	d would all be below
-0.293544	code, as the example below
-0.527135	loaded at an address below
-0.291853	bounds checking is explained below
-0.044418	{ // loop columns below
-0.044418	rows // loop columns below
-0.228165	the function ReadTSC listed below
-0.276559	elements from row 28 below
-0.122653	if the elements matrix[r][c] below
-0.165170	diagonal. Each element matrix[r][c] below
-0.200015	parameters, as example 7.15b below
-0.325308	the latter case the reading
-0.779737	to turn off the reading
-0.441997	manner. This applies to reading
-0.538298	on the stack and reading
-0.237623	want to optimize, and reading
-0.451044	9.5 because we are reading
-0.538634	for user input or reading
-0.237759	done in connection with reading
-0.314546	time is spent on reading
-0.575222	blocks is faster than reading
-0.355771	one operation rather than reading
-0.314773	.NET framework. Obviously, the directly
-0.461704	than calling the function directly
-0.535585	framework as well as directly
-0.236187	described below. Make calls directly
-0.235209	problem. These instructions write directly
-0.226766	the floating point representation directly
-0.364800	different implementations of C++, directly
-0.265151	to put measurement instruments directly
-0.200004	it can call C1::f directly
-0.200004	which can be fed directly
-0.165143	} }; // Called directly
-0.572459	declaration. This is the simplest
-0.501437	registers only in the simplest
-0.460380	register except in the simplest
-0.357389	registers. Except for the simplest
-0.347770	it understands only the simplest
-0.193125	method that gives the simplest
-0.193125	option that gives the simplest
-0.313409	elements per vector. The simplest
-0.236797	advanced development tools. The simplest
-0.236797	when compiling module2.cpp. The simplest
-0.236797	code is repetitive. The simplest
-0.356644	Fine-grained parallelism is the situation
-0.357909	parallelism refers to the situation
-0.358422	be useful in the situation
-0.294170	class or structure. The situation
-0.237621	absent in a use situation
-0.314343	of cache space. A situation
-0.324847	that's about the only situation
-0.293238	be used in any situation
-0.036371	cover the worst case situation
-0.236056	0x20; 46 A common situation
-0.500132	is called from the message
-0.358078	level, typically in a message
-0.237899	as semaphores, mutexes and message
-0.188157	linker makes an error message
-0.188157	don't need an error message
-0.188157	will generate an error message
-0.188157	to issue an error message
-0.188157	will provoke an error message
-0.193435	bounds checking). An error message
-0.454069	make your own error message
-0.243931	prints an appropriate error message
-0.342835	takes extra time. The delay
-0.293346	a matrix line. The delay
-0.293346	the dynamic linker. The delay
-0.237728	of 250 ms. This delay
-0.355942	to calculate the time delay
-0.324970	of i which will delay
-0.341545	can incur a large delay
-0.236152	a store operation doesn't delay
-0.222365	are called. A considerable delay
-0.017522	of a store forwarding delay
-0.017522	generate a store forwarding delay
-0.358423	one statement in the condition
-0.202605	be eliminated if the condition
-0.237920	faster because testing a condition
-0.314594	so that the if condition
-0.460421	$B1$2 is the loop condition
-0.347800	mode, and an error condition
-0.303128	Testing for the overflow condition
-0.228163	result. An uncaught overflow condition
-0.293029	7.30b. The loop control condition
-0.293029	most efficient loop control condition
-0.280998	for using the performance monitor
-0.280998	set up the performance monitor
-0.280998	to read the performance monitor
-0.284820	performance problems. The performance monitor
-0.020479	one or more performance monitor
-0.199616	monitor counters. A performance monitor
-0.199616	test feature called performance monitor
-0.199616	A particularly useful performance monitor
-0.041972	153 16.1 Using performance monitor
-0.041972	below) 16.1 Using performance monitor
-0.851887	important to economize the resource
-0.575586	are frequent sources of resource
-0.237784	loading of modules or resource
-0.565693	access to the same resource
-0.326925	memory swapping and other resource
-0.326925	cache evictions and other resource
-0.279460	of DLLs, configuration files, resource
-0.886647	is important to economize resource
-0.534503	Registers are a scarce resource
-0.165148	Time is a precious resource
-0.165148	libraries or shared objects), resource
-0.791798	that the number of cores
-0.791798	than the number of cores
-0.342251	a reduced number of cores
-0.448559	synchronization between the different cores
-0.351722	in all the CPU cores
-0.157607	Library. The multiple CPU cores
-0.157607	systems with multiple CPU cores
-0.157607	to use multiple CPU cores
-0.338529	competition. Processors with multiple cores
-0.291760	i7 processor with four cores
-0.218581	processors and FPGA soft cores
-0.520394	all code has a parallel
-1.378499	for the sake of parallel
-0.595146	doing multiple calculations in parallel
-0.538772	the OpenMP directives for parallel
-0.311912	are available for doing parallel
-0.532869	the program logic allows parallel
-0.213858	good optimization options. Supports parallel
-0.213858	Linux and Mac. Supports parallel
-0.224929	a standard for specifying parallel
-0.200009	definition language is inherently parallel
-0.165148	Big supercomputers with massively parallel
-0.352671	the same code in either
-0.313956	to 5 times faster either
-0.167888	libraries can be implemented either
-0.417131	library can be linked either
-0.506422	below. You may choose either
-0.229201	integers. It can contain either
-0.301463	bit can be saved either
-0.218564	50-50 chance of going either
-0.284318	keep multiple memory blocks, either
-0.200009	a graphics processing unit, either
-0.267514	performance of two different implementations
-0.267514	can make two different implementations
-0.229778	of the loop. Some implementations
-0.229778	it can run. Some implementations
-0.487971	element. The most common implementations
-0.228608	page 93). All common implementations
-0.234916	binary executable code. Most implementations
-0.234899	programming languages and their implementations
-0.288286	CPUs have particularly slow implementations
-0.230092	the case if alternative implementations
-0.200021	just-in-time compilation. Some early implementations
-0.352658	lookup table instead of calculating
-0.339724	are often used for calculating
-0.339724	method currently used for calculating
-0.469741	use induction variables for calculating
-0.235649	dedicated physics processor for calculating
-0.799855	has hardware support for calculating
-0.329074	extra work needed for calculating
-0.338958	processing unit intended for calculating
-0.576417	This is faster than calculating
-0.293883	are done implicitly when calculating
-0.218578	allows it to begin calculating
-0.584366	restores the value of ebx
-0.237898	; compute i/2 in ebx
-0.330191	is ecx+eax*4. The result ebx
-0.291748	the label. It uses ebx
-0.232287	unused label ; save ebx
-0.226747	r points to. Now ebx
-0.224914	in assembly code. Register ebx
-0.284311	?Func@@YAXQAHAAH@Z ENDP + esp ebx
-0.212275	i < 100. pop ebx
-0.212275	1 eax, 100 $B1$2 ebx
-0.165143	unused label ; restore ebx
-0.584574	classes in the same generation
-0.299915	a vector). The first generation
-0.299915	as follows. The first generation
-0.292740	CPU development, each new generation
-0.429113	case that the next generation
-0.303386	μs on the next generation
-0.407733	only in the second generation
-0.083833	member functions. The second generation
-0.083833	polymorphic functions. The second generation
-0.226761	time loops or compile-time generation
-0.222371	information about the third generation
-0.459419	to do is to enable
-0.591803	const in order to enable
-0.925995	It is recommended to enable
-0.293349	appropriate compiler options to enable
-0.314716	can set up and enable
-0.502361	Use 64-bit mode or enable
-0.346276	function inline. This may enable
-0.344749	or inline. This will enable
-0.323145	program optimization, which will enable
-0.346339	The newer instruction sets enable
-0.952551	types of floating point instructions.
-0.645646	before any floating point instructions.
-0.236596	code to access these instructions.
-0.302904	as to the AVX instructions.
-0.302904	to use the AVX instructions.
-0.403113	common memory and string instructions.
-0.233998	Table 9.2. Cache control instructions.
-0.321107	doesn't delay the subsequent instructions.
-0.226773	or a few machine instructions.
-0.417423	with slow bit scan instructions.
-0.218554	compiler documentation for detailed instructions.
-0.712732	when the object is copied
-0.413721	whenever an object is copied
-0.335654	of the parameter is copied
-0.237417	in example 14.1c is copied
-0.497444	the object can be copied
-0.519319	many objects can be copied
-0.353504	data members can be copied
-0.525315	function code is not copied
-0.349552	the file has been copied
-0.212316	and the entire contents copied
-0.165169	pointer is created, deleted, copied
-0.556005	into a vector of e.g.
-0.641666	the response time to e.g.
-0.356091	with suffixes such as e.g.
-1.376327	to tell the compiler e.g.
-0.337186	a specific instruction set, e.g.
-0.311756	vector registers can hold e.g.
-0.226737	Newest instruction set available, e.g.
-0.212275	a software programming language, e.g.
-0.165143	libraries with internal multi-threading, e.g.
-0.165143	compiling the module with, e.g.
-0.165143	to generate an interrupt, e.g.
-0.458773	An alternative is to keep
-0.350484	The program has to keep
-0.352078	very useful way to keep
-1.015653	that you want to keep
-0.935389	can be advantageous to keep
-0.334593	companies often fail to keep
-0.236570	more powerful computers to keep
-0.313138	may be preferable to keep
-0.325357	all allocated objects and keep
-0.292661	sure that they always keep
-0.165179	instruction sets Microprocessor producers keep
-0.027576	next instruction mov DWORD PTR
-0.027576	PTR [eax], ecx DWORD PTR
-0.013569	ebx, 1 ebx, DWORD PTR
-0.013569	instruction add ebx, DWORD PTR
-0.013569	eax, eax edx, DWORD PTR
-0.013569	edx, ecx, edx, DWORD PTR
-0.027576	esp ebx ecx, DWORD PTR
-0.187477	DWORD PTR [edx] DWORD PTR
-0.027576	DWORD PTR [esp+8] DWORD PTR
-0.027576	DWORD PTR [eax+400] DWORD PTR
-0.027576	DWORD PTR [esp+4] DWORD PTR
-0.314072	set AVX instr. set Automatic
-0.234343	variables for float expressions Automatic
-0.340392	set Automatic CPU dispatch Automatic
-0.325389	CPU dispatch Automatic vectorization Automatic
-0.276559	by individual installation tools. Automatic
-0.074773	installation .................................................................................................. 18 3.4 Automatic
-0.074773	a standardized manner. 3.4 Automatic
-0.200015	Example: // Example 12.1a. Automatic
-0.074773	more distant future. 12.3 Automatic
-0.074773	registers .......................................................... 107 12.3 Automatic
-0.165153	installation tools. Automatic updates. Automatic
-0.236189	now discontinued Object Windows Library
-0.052068	is the Windows Template Library
-0.052068	(ATL) and Windows Template Library
-0.111240	is the Standard Template Library
-0.111240	in the Active Template Library
-0.226783	_mm_exp_pd AMD Core Math Library
-0.667308	addresses divisible by 16. Library
-0.410281	of Intel's Math Kernel Library
-0.218560	not always fully optimized. Library
-0.265171	__vrs4_expf __vrd2_exp AMD LIBM Library
-0.165159	the library functions directly: Library
-0.342229	|| (!a&&c) = a ?
-0.342229	|| (b&&c) = a ?
-1.300343	b; a = b ?
-0.169162	return a > b ?
-0.233855	MAX(a,b) (a > b ?
-0.036371	= b > 0 ?
-0.258241	= (bb[i] > 0) ?
-0.409539	= (b == 0) ?
-0.200032	__except (GetExceptionCode() == EXCEPTION_FLT_OVERFLOW ?
-0.165169	implement OneOrTwo5[b!=0] as OneOrTwo5[(b!=0) ?
-1.004696	if the function is defined
-0.348133	The sin function is defined
-0.293849	if its body is defined
-0.293631	Sunday, Monday, etc. are defined
-0.331238	Whether the constants are defined
-0.237531	if the programmer has defined
-0.237190	to a static object defined
-0.237005	global variables (i.e. variables defined
-0.345955	N1 could have been defined
-0.099171	Table 12.5. Vector classes defined
-0.099171	Table 12.1. Vector classes defined
-0.237925	version of Basic is Visual
-0.261700	Integrates into the Microsoft Visual
-0.234667	development tool is Microsoft Visual
-0.234667	a plug-in to Microsoft Visual
-0.185178	are mentioned below. Microsoft Visual
-0.185178	up to date): Microsoft Visual
-0.218572	directives for multi-core processing. Visual
-0.122663	languages such as C#, Visual
-0.122663	written in Java, C#, Visual
-0.212304	is available for free. Visual
-0.165169	13 objects, respectively (MS Visual
-0.591950	8 in order to align
-0.521942	120 for how to align
-1.074279	example shows how to align
-0.407296	You may choose to align
-0.560702	12.1a, the compiler can align
-0.574121	for small x // align
-0.237238	// swap elements // align
-0.471758	line. Some compilers will align
-0.483993	variable. Most compilers will align
-0.229429	stack ; return ; align
-0.229429	8 + esp ; align
-0.294009	seven memory allocations of sizes
-0.237726	data cache. Bit-fields of sizes
-0.099729	when objects of different sizes
-0.289675	different alignments and different sizes
-0.347705	work efficiently on all sizes
-0.331312	is intended for array sizes
-0.338017	extension of vector register sizes
-0.291954	cell for different matrix sizes
-0.291054	variables and operators Integer sizes
-0.233562	size. Integers of smaller sizes
-0.498537	i++) { a[i] = temp;
-0.023077	int r, c; double temp;
-0.232101	r2, c1, c2; double temp;
-0.047719	b = temp * temp;
-0.047719	c[i] = temp * temp;
-0.236788	b[size], c[size]; float register temp;
-0.233816	int i, a[100], b, temp;
-0.550244	int a, b, c, temp;
-0.403853	8.14b int i, a[100], temp;
-0.339376	have certain instructions that allow
-0.497864	Microsoft compiler does not allow
-0.346141	never changed. This will allow
-0.237394	version of C++ should allow
-0.349967	Furthermore, most C++ compilers allow
-0.333594	integers. Many 32-bit systems allow
-0.314476	is important. Some systems allow
-0.226024	registers. 64-bit Unix systems allow
-0.960625	Linux, BSD and Mac allow
-0.233019	DOS and 16-bit Windows, allow
-0.232319	for high precision math allow
-0.657063	directives work on the PathScale
-0.278575	by the Intel and PathScale
-0.278575	The Gnu, Intel and PathScale
-0.136905	Microsoft, Intel, Gnu and PathScale
-0.292997	Clang, Intel, Microsoft and PathScale
-0.714549	Gnu, Clang, Intel or PathScale
-0.323146	supported by Microsoft, Intel, PathScale
-0.233015	choice for all platforms. PathScale
-0.218577	Watcom Digital Mars PGI PathScale
-0.165164	4.1.0, 2006 (Red Hat). PathScale
-0.481279	Linux also applies to BSD
-0.490265	data in Linux and BSD
-0.345260	BSD Shared objects in BSD
-0.345260	references. Shared objects in BSD
-0.145040	most distributions of Linux, BSD
-0.189925	Shared objects in Linux, BSD
-0.205981	registers, whereas 64-bit Linux, BSD
-0.144635	and a Windows, Linux, BSD
-0.144635	platforms with Windows, Linux, BSD
-0.226772	conventions. FreeBSD and Open BSD
-0.165169	Mac Windows, Linux, Mac, BSD
-0.237265	d + e + f;
-0.009860	14.28 union { float f;
-0.009860	14.26 union { float f;
-0.009860	14.27 union { float f;
-0.009860	14.23 union { float f;
-0.009860	7.39 union { float f;
-0.009860	14.29 union { float f;
-0.009860	14.24 union { float f;
-0.009860	14.25 union { float f;
-0.489412	7.19 int i; float f;
-0.237010	f *= i; return f;
-1.257254	the result of the previous
-0.656849	a constant to the previous
-0.460154	the value in the previous
-1.034595	as explained in the previous
-0.836367	calculation depends on the previous
-0.193989	is calculated from the previous
-0.327266	more efficiently from the previous
-0.353071	is finished using the previous
-0.342876	be loaded until the previous
-0.350324	is known from a previous
-0.480802	= 0; i < size;
-0.357670	classes Fortunately, it is rarely
-0.353724	function pointers It is rarely
-0.353724	instruction set. It is rarely
-0.314102	function dispatch mechanism is rarely
-0.274179	but this feature is rarely
-0.182508	so this feature is rarely
-0.629878	of the time and rarely
-0.898964	so high that it rarely
-0.237440	and Mac programs but rarely
-0.230100	the program. Application programmers rarely
-0.230096	wealth of advanced features rarely
-0.529391	integer in a different way.
-0.350456	it goes the other way.
-0.326801	works in the following way.
-0.326801	interpreted in the following way.
-0.326801	evaluated in the following way.
-0.212315	allocation in an inefficient way.
-0.382896	in a very inefficient way.
-0.230802	chance of going either way.
-0.109376	so in a suboptimal way.
-0.185510	CPUs in a suboptimal way.
-0.165164	conditions in a graceful way.
-0.357981	first object to the vector.
-0.352866	data fit into the vector.
-0.240115	of elements in a vector.
-0.346937	consecutive terms in one vector.
-0.237303	handling a full size vector.
-0.290671	used as a Boolean vector.
-0.335779	value in the last vector.
-0.130066	number of elements per vector.
-0.299298	size of the largest vector.
-0.356125	well-structured program that is easier
-0.357867	quite convenient. It is easier
-0.538221	in example 15.1b is easier
-0.237890	is more manageable and easier
-0.443919	local. This makes it easier
-0.293731	static declaration makes it easier
-0.394836	read. It is often easier
-0.394836	130. It is often easier
-0.234188	28. The calculation becomes easier
-0.320071	functions. It is just easier
-0.212298	can make this reordering easier
-0.458477	variable pointed to is identical
-0.336124	to by p is identical
-0.331238	the two constants are identical
-0.237393	and Open BSD are identical
-0.235392	to be stored. All identical
-0.228153	operating systems give almost identical
-0.241269	the code is exactly identical
-0.191065	compilers will make exactly identical
-0.284338	more compact by joining identical
-0.074777	Live range analysis Join identical
-0.074777	same memory area. Join identical
-0.294185	typically between 5 and 20
-0.102439	penalty of 10 - 20
-0.102439	detected until 10 - 20
-0.237549	has been reduced from 20
-0.272261	} This loop repeats 20
-0.218549	3.8 System database ...................................................................................................... 20
-0.200009	and position-independent code ....................................................... 20
-0.200009	19 Literature ..................................................................................................................... 163 20
-0.165148	comp.lang.asm.x86 for some links. 20
-0.165148	20 3.7 File access................................................................................................................ 20
-0.165148	configuration files (*.ini files). 20
-0.236865	may be smaller as well.
-0.236865	compiled programming languages as well.
-0.236963	is executed. Optimizes very well.
-0.379093	IDE. Does not optimize well.
-0.267402	innermost loop is predicted well.
-0.297594	otherwise would be predicted well.
-0.197908	branches is not predicted well.
-0.356064	a code version performs well.
-0.057014	automatic vectorization. Optimizes reasonably well.
-0.057014	calling conventions. Optimizes reasonably well.
-0.165164	automatic vectorization. Optimizes moderately well.
-0.590287	particular part of the program,
-0.459619	during start of the program,
-0.355767	complete redesign of the program,
-0.352414	be modified by the program,
-0.244730	never modified by the program,
-0.293772	takes to execute the program,
-1.338114	critical part of a program,
-0.313840	processors. In a C++ program,
-0.289788	decimal point in your program,
-0.501150	inserted in the final program,
-0.212304	threads in a multithreaded program,
-0.456374	address. The address of list[i]
-0.499279	134 } else { list[i]
-0.499279	range"; } else { list[i]
-0.338509	operands because the expression list[i]
-0.234501	(i < ARRAYSIZE && list[i]
-0.342064	through array cout << list[i]
-0.008673	i; for(i=0; i<300; i++){ list[i]
-0.074775	i; for(i=0; i<300; i+=3){ list[i]
-0.074775	i; for(i=0; i<301; i+=3){ list[i]
-0.165159	i_div_3; for(i=i_div_3=0; i<300; i+=3,i_div_3++){ list[i]
-0.516351	test the response time under
-0.561570	modification of the program under
-0.534238	spent in the program under
-0.500736	long. If the program under
-0.352427	to test the performance under
-0.236104	version that performs best under
-0.334215	performance tests are done under
-0.327313	should also be tested under
-0.296240	the program that runs under
-0.212286	performance for background services under
-0.200015	be found in Wikipedia under
-0.538837	bit scan instruction and expect
-0.342978	which optimizations you can expect
-0.628536	In general, you can expect
-0.457411	and we do not expect
-0.353728	lookup table if you expect
-0.340757	to & unless you expect
-0.347598	cache line that we expect
-0.428638	library and you cannot expect
-0.237244	this case. You cannot expect
-0.237244	test examples. You cannot expect
-0.237244	are compiler-specific. You cannot expect
-0.869901	stored in a register except
-0.100196	zero if all bits except
-0.100196	by testing all bits except
-0.231883	at the same time, except
-0.226761	to the same object, except
-0.222366	than a static library, except
-0.356040	about overflow and underflow except
-0.212281	to make 16-bit programs, except
-0.630042	stored on the stack, except
-0.165148	-fpic is much faster, except
-0.165148	included in the representation, except
-0.356941	can happen with the loops
-0.293896	but no compile- time loops
-0.117578	order of the two loops
-0.237145	will automatically replace such loops
-0.292002	better on very small loops
-0.235214	loop counter outside both loops
-0.232327	Some compilers will unroll loops
-0.265164	In C++ template metaprogramming, loops
-0.008672	on the processor. Nested loops
-0.001965	This is the reason why
-0.221185	and 1. The reason why
-0.221185	to do. The reason why
-0.221185	not occur. The reason why
-0.148059	running. The main reason why
-0.230111	of the main reasons why
-0.229251	I have no explanation why
-0.200043	The following example explains why
-0.517460	care of the CPU dispatching.
-0.220458	flawed approach to CPU dispatching.
-0.274405	function library with CPU dispatching.
-0.220458	This is called CPU dispatching.
-0.220458	library functions without CPU dispatching.
-0.154812	features for automatic CPU dispatching.
-0.212826	code with automatic CPU dispatching.
-0.132181	(12.4e) with automatic CPU dispatching.
-0.220458	examples of poor CPU dispatching.
-0.220458	examples of bad CPU dispatching.
-0.229191	i >= size) { cout
-0.005601	virtual void Disp() { cout
-0.005601	public: void Disp() { cout
-0.046973	public: void Hello() { cout
-0.046973	Disp(); void Hello() { cout
-0.229191	>= (unsigned int)size) { cout
-0.237118	// Loop through array cout
-0.523429	sign bit of f cout
-0.287318	well as pointers and references.
-0.287318	of using pointers and references.
-0.325235	than by pointers or references.
-0.237764	that are impossible with references.
-0.237325	is simpler when using references.
-0.156759	local name for local references.
-0.156759	PLT lookups for local references.
-0.009648	not used for internal references.
-0.039948	and PLT for internal references.
-0.554170	was it possible to come
-0.237047	user interface elements that come
-0.314077	system or libraries that come
-0.237047	header file mathimf.h that come
-0.237791	they are uninitialized or come
-0.237684	time consuming updates may come
-0.293284	and other big objects come
-0.445467	values or if they come
-0.234904	Documentation License shall automatically come
-0.345068	often used data members come
-0.343723	or the series of statements
-0.237791	language allows compile-time if statements
-0.237190	by semicolons, while multiple statements
-0.328863	no reason to add statements
-0.131767	of branches and switch statements
-0.284372	7.12 Branches and switch statements
-0.216489	as replacements for switch statements
-0.216489	predicted well. A switch statements
-0.168922	switch statements because switch statements
-0.200032	than two ways. Switch statements
-0.881649	double d; d = u;
-0.311390	Example 7.25 unsigned int u;
-0.311390	Example 14.22b unsigned int u;
-0.311390	Example 14.22a unsigned int u;
-0.087681	f; int i; } u;
-0.230008	d; int i[2]; } u;
-0.357984	blend instruction if the SSE4.1
-0.349699	(16 bits), unless the SSE4.1
-0.237914	integer multiplication prior to SSE4.1
-0.339399	instruction set, one for SSE4.1
-0.237812	#endif // SSE2 // SSE4.1
-0.294044	Same example, vectorized with SSE4.1
-0.314072	Suppl. SSE3 instr. set SSE4.1
-0.329995	more integer vector instructions SSE4.1
-0.222370	the vector class library, SSE4.1
-0.165153	pmmintrin.h Suppl. SSE3 tmmintrin.h SSE4.1
-1.041364	as explained in the chapter
-0.538519	ways, as explained in chapter
-0.213032	CPUs, as described in chapter
-0.313768	vector operations mentioned in chapter
-0.237729	a polymorphous class? This chapter
-0.236856	use static variables. See chapter
-0.335780	32-bit mode. The next chapter
-0.438237	explained in the previous chapter
-0.165164	in the "Macro loops" chapter
-0.355130	new compiler which is similar
-0.314549	Templates A template is similar
-0.352031	use objconv or a similar
-0.237623	time. Text strings and similar
-0.237623	effort. Square blocking and similar
-0.237863	crystal ball reveals that similar
-0.237644	published by Intel have similar
-0.237580	i_div_3; } 138 A similar
-0.330699	of microprocessors are very similar
-0.418774	math core library contains similar
-0.023423	} This is of course
-0.023423	objects. This is of course
-0.048181	already works is of course
-0.048181	and animations is of course
-0.236600	calls. These are of course
-0.236600	then you may of course
-0.236600	NULL. There should of course
-0.236600	to 15.1c would of course
-0.165184	implementations reveal a zigzag course
-0.165184	at compile time. (Of course
-0.313770	different code address and back
-0.044053	to protected mode and back
-0.293296	changed to truncation and back
-0.351776	then convert the result back
-0.233308	loop counter and go back
-0.231392	and setting the priority back
-0.224926	that lies r places back
-0.222371	with 100 and jumps back
-0.165164	of software that dates back
-0.346302	be changed without the risk
-0.212198	because it involves the risk
-0.212198	method also involves the risk
-0.353737	Complicated code is a risk
-0.988424	then there is a risk
-0.237442	int)u; // Faster, but risk
-0.666683	and there is no risk
-0.419843	if there is no risk
-0.332215	There is a higher risk
-0.352661	heap manager has a garbage
-0.237100	as task switches and garbage
-0.237100	The allocation, deallocation and garbage
-0.102566	to memory management and garbage
-0.102566	of heap management and garbage
-0.314686	is no need for garbage
-0.294013	become too fragmented. This garbage
-0.548485	spaces. This is called garbage
-0.289286	heap manager will start garbage
-0.304390	activating the very time-consuming garbage
-0.352426	space. Excessive use of templates
-0.851594	in the form of templates
-0.325346	containing container classes and templates
-0.236968	desired polymorphism effect with templates
-0.293147	7.43b. Compile-time polymorphism with templates
-0.333207	Ready made container class templates
-0.234963	of suitable containers class templates
-0.443513	no cost to using templates
-0.234796	at page 150. Using templates
-0.212298	to well-tested functions, classes, templates
-0.350305	A missing check for buffer
-0.487191	file in a memory buffer
-0.356400	fills up the loop buffer
-0.472377	data in a static buffer
-0.054093	in the branch target buffer
-0.129350	critical. The branch target buffer
-0.002552	implemented as a circular buffer
-0.005119	queue as a circular buffer
-0.358561	The names of the header
-0.357330	library libmmt.lib and the header
-0.815224	you can use the header
-0.237920	you are including a header
-0.237890	tested library modules and header
-0.237816	9.6b. #include "xmmintrin.h" // header
-0.335650	classes defined in Intel header
-0.341655	table. If the standard header
-0.288402	to include the appropriate header
-0.288402	classes. Including the appropriate header
-0.358483	go away in the future
-0.348309	vendor string. In the future
-0.350838	the future. If a future
-0.331854	the optimal choice for future
-0.538780	can only hope that future
-0.329602	to work best on future
-0.236075	the fastest solution on future
-0.236075	or 256 bytes) on future
-0.356619	present processors rather than future
-0.231400	call to Object1.Hello(), though future
-0.437803	constructor may be called whenever
-0.347301	This principle is useful whenever
-0.535417	for floating point calculations whenever
-0.352061	wastes several clock cycles whenever
-0.342232	setting pointers to zero whenever
-0.310874	is an extra cost whenever
-0.415413	which they are declared whenever
-0.416065	certain to be mispredicted whenever
-0.222366	linkage table (PLT). And whenever
-0.165148	at their own initiative whenever
-0.406860	nothing to gain by unrolling
-0.237046	the performance dramatically by unrolling
-0.444185	2; } The loop unrolling
-0.233537	available use excessive loop unrolling
-0.233537	too much. Excessive loop unrolling
-0.161025	1; } } Loop unrolling
-0.161025	total execution time. Loop unrolling
-0.161025	in this case. Loop unrolling
-0.161025	the unroll factor. Loop unrolling
-0.161025	branch is eliminated. Loop unrolling
-0.808379	The different versions of CriticalFunction
-0.493735	removing the call to CriticalFunction
-0.237696	point extern "C" int CriticalFunction
-0.237098	function version CriticalFunctionType * CriticalFunction
-0.237099	{ // Generic version CriticalFunction
-0.591594	the number of times CriticalFunction
-0.516277	{ // SSE2 supported CriticalFunction
-0.516277	{ // AVX supported CriticalFunction
-0.235632	truth depends on whether CriticalFunction
-0.608311	it takes to execute CriticalFunction
-0.713767	the operating system to swap
-0.102726	define a macro to swap
-0.102726	// Define macro to swap
-0.237896	below the diagonal and swap
-0.237238	columns below diagonal // swap
-0.237238	diagonal swapd(a[r][c], a[c][r]); // swap
-0.347380	+ column; Do not swap
-0.294113	... Here, you cannot swap
-0.294113	... Here you cannot swap
-0.335379	Boolean operands. You cannot swap
-0.339266	application that uses a newer
-0.294025	You may choose a newer
-0.487638	and is available in newer
-0.745124	available instruction set. The newer
-0.314546	are particularly fast on newer
-0.314343	precision is used. A newer
-0.345029	advanced version on all newer
-0.235350	Pentium 4, while all newer
-0.324667	less important on most newer
-0.235387	of operating system All newer
-0.357420	binary integer, and the fraction
-0.421290	2.0) by setting the fraction
-0.081233	Sdouble { unsigned int fraction
-0.081233	Slongdouble { unsigned int fraction
-0.081233	Sfloat { unsigned int fraction
-0.237420	1)sign 2exponent 16383 one fraction
-0.341556	But if a large fraction
-0.339688	use only a small fraction
-0.225745	1)sign 2exponent 127 1 fraction
-0.225745	1)sign 2exponent 1023 1 fraction
-0.711819	may be necessary to modify
-0.820398	is not recommended to modify
-0.268833	// this function can modify
-0.268833	that one function can modify
-0.293373	own container classes or modify
-0.237167	can add, remove or modify
-0.525057	object. Make the function modify
-0.324333	loaded anyway. If we modify
-0.419791	const member function cannot modify
-0.234794	zero flag and don't modify
-0.868399	read the value of seconds
-0.350789	compiler would assume that seconds
-0.356614	clock cycles rather than seconds
-0.237656	thread void DelayFiveSeconds() { seconds
-0.293703	by another thread. If seconds
-0.335705	it attempts to set seconds
-0.236220	// do nothing while seconds
-0.322657	is delayed for several seconds
-0.394445	It can take several seconds
-0.307578	function will wait until seconds
-0.237872	precautions to account for unaligned
-0.097909	// Function to store unaligned
-0.234802	sake of efficiency. Using unaligned
-0.092390	// Function to load unaligned
-0.167996	0.18 0.11 memcpy 16kB unaligned
-0.167996	0.28 0.22 memcpy 16kB unaligned
-0.237633	data object through this address.
-0.346111	that holds a memory address.
-0.494112	loader to a different address.
-0.357747	library requiring the same address.
-0.236290	compiler must calculate its address.
-0.219481	to a specific load address.
-0.219481	fit the actual load address.
-0.224924	GOT through a self-relative address.
-0.276559	point to a valid address.
-0.165153	with a 32-bit (signed) address.
-0.043959	b * c); // Store
-0.291473	= _mm_or_si128(c2, bc); // Store
-0.235497	_mm_blendv_epi8(bc, c2, mask); // Store
-0.235497	//=2*A //=A*x*x+B*x+C //=DeltaY // Store
-0.230827	cache MOVNTI _mm_stream_si32 SSE2 Store
-0.230827	cache MOVNTPD _mm_stream_pd SSE2 Store
-0.177547	Prefetch PREFETCH _mm_prefetch SSE Store
-0.177547	cache MOVNTPS _mm_stream_ps SSE Store
-0.177547	cache MOVNTQ _mm_stream_pi SSE Store
-0.358507	each step of the sequence
-0.460602	subsequent elements in the sequence
-0.356541	next step in the sequence
-0.357852	a disadvantage if the sequence
-0.354261	is performed on a sequence
-0.237567	you are doing a sequence
-0.237567	case labels follow a sequence
-0.237905	that are allocated in sequence
-0.331835	with earlier CPUs. The sequence
-0.347610	situation where a long sequence
-0.760948	code generated by the compiler,
-0.356863	that comes with the compiler,
-0.614240	are using an Intel compiler,
-0.266185	Included with Intel C++ compiler,
-0.266185	v. 2.00. Intel C++ compiler,
-0.447086	case of the Gnu compiler,
-0.229787	X #else // Gnu compiler,
-0.235930	work on a Linux compiler,
-0.324948	standardized details in both compiler,
-0.224501	requires support from both compiler,
-0.408159	time. The delay is significant
-0.357667	each iteration is a significant
-0.495742	bytes). This has a significant
-0.314488	and functions consume a significant
-0.629816	opens the possibility for significant
-0.357586	the table is not significant
-0.499746	exponent, and the most significant
-0.154159	i into the least significant
-0.154159	operation isolates the least significant
-0.222380	precision of approximately seven significant
-0.339074	more complex cases it might
-0.237148	few parameters. Or it might
-0.512022	pointers). An optimizing compiler might
-0.314406	array elements then this might
-0.237580	a better solution. It might
-0.293182	is that the variables might
-0.236858	because a fixed address might
-0.351427	priorities then the user might
-0.234527	in this example. We might
-0.200015	games. Such a coprocessor might
-0.356539	vector registers in the CPU.
-0.356539	execution units in the CPU.
-0.503825	reordering easier for the CPU.
-0.573587	four, depending on the CPU.
-0.237917	on any brand of CPU.
-0.614234	running on an Intel CPU.
-0.235484	to any known hardware CPU.
-0.233024	addition on a modern CPU.
-0.231899	satisfactorily on a non-Intel CPU.
-0.251342	on a 2 GHz CPU.
-0.237294	bits Vector class, Intel Vector
-0.536009	size of vector, bits Vector
-0.308485	floating point register variables. Vector
-0.744984	and later instruction sets. Vector
-0.200009	F32vec8 F64vec4 Table 12.5. Vector
-0.165148	commercial license Table 12.4. Vector
-0.165148	not been updated lately. Vector
-0.165148	512 AVX512 Table 12.1. Vector
-0.165148	details. // Example 12.7. Vector
-0.165148	also 512 bits (ZMM). Vector
-0.462076	certain limit to the length
-0.357716	an integer if the length
-0.355477	2 GHz then the length
-0.356260	|= 0x20; If the length
-0.349379	not safe unless the length
-0.293775	row by adding the length
-0.382287	frequency is doubled. The length
-0.237515	CPU was started. The length
-0.290371	strlen function. The string length
-0.230103	efficient when the row length
-0.237899	organized into lines and sets.
-0.267646	optimized for large data sets.
-0.267646	applications with large data sets.
-0.227945	different processors and instruction sets.
-0.046768	any of these instruction sets.
-0.046768	availability of these instruction sets.
-0.065059	SSE2 and later instruction sets.
-0.065059	AVX and later instruction sets.
-0.065059	SSE and later instruction sets.
-0.236411	of the different instructions sets.
-0.499432	expression that is a linear
-0.570704	i*sizeof(S1). This is a linear
-0.454425	less efficient than a linear
-0.435247	you can use a linear
-0.676082	basis then use a linear
-0.324265	the container, then a linear
-0.406322	search, or even a linear
-0.351170	than looping through a linear
-0.237907	Some applications (e.g. in linear
-0.232716	common mathematical calculations including linear
-0.570759	consider if there is something
-0.048320	if I write that something
-0.048320	If I write that something
-0.314461	latter function also has something
-0.138965	is important to do something
-0.138965	very important to do something
-0.235550	goes to actually doing something
-0.290673	the opposite: Don't put something
-0.212292	faster vectorized code. Storing something
-0.370744	But it is certainly something
-0.752436	the sign bit of f
-0.395294	set sign bit of f
-0.826285	} else { // f
-0.237231	0 - 30 // f
-0.293845	the first sum, then f
-0.291636	i <= n; i++) f
-0.345090	// n! int i, f
-0.165159	f=i; f = (float)i; f
-0.165159	(float)i; f = float(i); f
-0.165159	i; float f; f=i; f
-0.576067	but there is a penalty
-0.325324	the alternative version. The penalty
-0.237731	YMM register state. This penalty
-0.585591	double There is no penalty
-0.280037	There is a performance penalty
-0.514261	There is no performance penalty
-0.225428	is hardly any performance penalty
-0.225428	is no 51 performance penalty
-0.241258	thousand so the misprediction penalty
-0.241258	may get a misprediction penalty
-0.352060	are breaking out of F1
-0.237914	empty throw() specification to F1
-0.453305	compiler to assume that F1
-0.345061	} } The function F1
-0.574420	is only possible if F1
-0.237075	function F1. However, if F1
-0.408035	all functions called by F1
-0.237580	throw an exception then F1
-0.237457	may be necessary. If F1
-0.165153	of F1 without returning. F1
-0.294236	the expression list[i] is invalid
-0.293899	violation, integer overflow, and invalid
-0.237629	array bounds violations and invalid
-0.357867	address. Pointers can be invalid
-0.313709	because this would be invalid
-0.313709	particular reduction would be invalid
-0.237764	security matters. Problems with invalid
-0.273302	static then it becomes invalid
-0.219484	time stamp counter becomes invalid
-0.165169	for array bounds violations, invalid
-0.237874	only called once. The reasons
-0.345890	than frame functions for reasons
-0.236019	loss of precision for reasons
-0.330018	for this manual for reasons
-0.236019	in 32-bit mode, for reasons
-0.023378	are not permissible for reasons
-0.338209	one of the main reasons
-0.232325	unless you have special reasons
-0.224938	the computer for security reasons
-0.348350	a common way of setting
-0.237893	before the test and setting
-0.429435	time is needed for setting
-0.429379	copying an array or setting
-0.234828	the absolute value by setting
-0.339766	CPU brand simply by setting
-0.234828	specific CPU core by setting
-0.234828	pointers to zero, by setting
-0.234828	interval [1.0, 2.0) by setting
-0.428952	registers can benefit from setting
-0.294260	module by compiling the module
-0.462551	suitable functions in a module
-0.529391	defined in a different module
-0.575346	were in the same module
-0.452207	only within the same module
-0.023070	called only from same module
-0.452004	referenced from any other module
-0.342915	to test a software module
-0.347799	library or a separate module
-0.780416	the address of the beginning
-0.178644	function relative to the beginning
-0.080200	member relative to the beginning
-0.178644	offset relative to the beginning
-1.090284	makes sure that the beginning
-0.356218	array coincides with the beginning
-0.354733	size right from the beginning
-0.352136	memory block into the beginning
-0.429510	whether an integer is within
-0.324929	variable is accessed from within
-0.314286	Internal references to data within
-0.429135	that is used only within
-0.345060	contain all data members within
-0.234640	first byte of zero within
-0.230064	semicolons, while multiple statements within
-0.200009	likely to be irrelevant within
-0.165148	indices or by keys within
-0.165148	certain to become obsolete within
-0.890952	the Intel compiler is used,
-0.357727	C-style type-casting. It is used,
-0.717618	dynamic memory allocation is used,
-0.313950	of the expression is used,
-0.331241	if dynamic linking is used,
-0.462629	32 sets can be used,
-0.355178	in main will be used,
-0.308087	floating point registers are used,
-0.891561	the XMM registers are used,
-0.222389	model is hardly ever used,
-0.237887	is OS independent and checks
-0.237863	the so-called CPU-dispatcher that checks
-0.343194	an Intel before it checks
-0.237149	a derived class, it checks
-0.443575	security. There are no checks
-0.313826	The absence of such checks
-0.022771	and to make overflow checks
-0.527468	processor. The CPU dispatcher checks
-0.218560	programmer to make explicit checks
-0.237798	for storing text or input
-0.294042	for buffer overflow on input
-0.077570	with Boolean variables as input
-0.077570	have Boolean variables as input
-0.331626	command line or an input
-0.319136	data instead of user input
-0.220845	response time to user input
-0.047277	time waiting for user input
-0.236513	time used for file input
-0.237854	optimized well, others are not.
-0.314705	while other functions can not.
-0.310134	use vectorized code or not.
-0.310134	vectorize a loop or not.
-0.234050	power of 2 or not.
-0.327092	increase the speed or not.
-0.234050	will be advantageous or not.
-0.170454	arrays are aligned or not.
-0.170454	are properly aligned or not.
-0.324815	be allowed and which not.
-0.325311	I don't think that programmers
-0.340361	to use for many programmers
-0.233704	code. For example, many programmers
-0.237024	enough. For example, some programmers
-0.340610	the attention of software programmers
-0.312741	optimization guide for assembly programmers
-0.234913	different C++ constructs Most programmers
-0.234173	start to program. Many programmers
-0.233545	manual is for advanced programmers
-0.165153	of the program. Application programmers
-0.356255	memory footprint than the alternative
-0.237864	of variable size. The alternative
-0.237787	is the case if alternative
-0.331607	it off and use alternative
-0.329988	a profiler. A simple alternative
-0.229034	function is inlined. An alternative
-0.229034	to become fragmented. An alternative
-0.231383	in a DLL. Another alternative
-0.165153	each object. A little-known alternative
-0.165153	to load. A light-weight alternative
-0.230784	slow bit scan instructions. My
-0.224909	following example illustrates this. My
-0.291726	this with an example. My
-0.218549	other less well-known languages. My
-0.200009	Windows and Linux. Asmlib My
-0.330794	the performance monitor counters. My
-0.200009	a 512 512 matrix. My
-0.200009	and one from me. My
-0.165148	the object file level. My
-0.165148	spots have been identified. My
-0.356286	of memory that is organized
-0.343563	how a cache is organized
-0.461954	XMM register can be organized
-0.636438	multidimensional array should be organized
-0.347055	and resources should be organized
-0.313115	that can easily be organized
-0.237399	128. These lines are organized
-0.237399	guidelines. Most caches are organized
-0.314597	sequentially in memory if organized
-0.680245	eight floating point registers organized
-0.049185	multiple of the critical stride
-0.258099	avoid that the critical stride
-0.258099	is because the critical stride
-0.258099	manner where the critical stride
-0.056655	8 ways. The critical stride
-0.056655	cache lines. The critical stride
-0.056655	bytes each. The critical stride
-0.622682	for the SSE2 instruction set,
-0.439607	only the SSE2 instruction set,
-0.439607	without the SSE2 instruction set,
-0.285974	SSE or SSE2 instruction set,
-0.274407	for a specific instruction set,
-0.774617	of the AVX instruction set,
-0.051227	// Get supported instruction set,
-0.358665	supports a particular instruction set,
-0.318846	or any higher instruction set,
-1.478098	the address of the current
-0.549933	are relative to the current
-0.357782	the updates if the current
-0.356619	be vectorized with the current
-0.237603	same module (i.e. the current
-0.294199	better backup features, and current
-0.456841	shows a code that current
-0.237755	for certain tasks on current
-0.237112	like example 12.4a where current
-0.231390	dispatcher that doesn't handle current
-0.325413	it doesn't need the 'this'
-0.456361	member functions have a 'this'
-0.421105	it doesn't need a 'this'
-0.485856	non-static member functions. The 'this'
-0.237865	of parameter transfer for 'this'
-0.380585	class by type-casting its 'this'
-0.218560	32-bit Windows by transferring 'this'
-0.148734	by __fastcall. The implicit 'this'
-0.194024	Sum1 has an implicit 'this'
-0.212292	function parameters, pointers, references, 'this'
-0.358672	whole structure of the problem.
-0.294232	that caching becomes a problem.
-0.446613	a discussion of this problem.
-0.288184	do not have this problem.
-0.310537	lengths to reduce this problem.
-0.288184	various ways around this problem.
-0.375449	designed to solve this problem.
-0.312642	structure has one big problem.
-0.235755	mode, we encounter another problem.
-0.224932	overflow is another security problem.
-0.314705	Pentium 4 processors, and 3
-0.355545	five manuals. See page 3
-0.381676	Floating point addition takes 3
-0.433533	example, it may take 3
-0.542340	rest of the program. 3
-0.416604	compilers and operating systems. 3
-0.085102	break at the interrupt 3
-0.085102	to remove the interrupt 3
-0.224924	the C++ language...................................................... 14 3
-0.165153	Contents 1 Introduction ....................................................................................................................... 3
-0.643774	pointer in member functions counts
-0.237448	in the CPU, which counts
-0.299194	experience. Occasionally, the clock counts
-0.299194	any event, the clock counts
-0.320735	normal afterwards. The clock counts
-0.331178	every millisecond. The profiler counts
-0.280443	count and the subsequent counts
-0.265182	not cached. The subsequent counts
-0.222380	results in meaningless event counts
-0.272264	is the "best case" counts
-0.010279	is a lot to gain
-0.020807	often a lot to gain
-0.382044	There is nothing to gain
-0.237512	has high priority. The gain
-0.293766	contains natural parallelism. The gain
-0.237733	be quite substantial. This gain
-0.236523	1. How much you gain
-0.236523	it. The insight you gain
-0.235979	justifies the relatively small gain
-0.237453	loop predictor. On other processors,
-0.237088	some processors. On many processors,
-0.338070	cycles on Pentium 4 processors,
-0.334722	integer size on AMD processors,
-1.493775	Intel, AMD and VIA processors,
-0.436008	works well on non-Intel processors,
-0.305404	work best on future processors,
-0.222390	between RISC and CISC processors,
-0.218549	branch tree. On older processors,
-0.165148	and on Intel Atom processors,
-0.353623	= (b*c)/d, it can happen
-0.380432	cache. The same can happen
-0.102231	the same errors can happen
-0.102231	because serious errors can happen
-0.325097	is moved, which may happen
-0.346146	page 87. This will happen
-1.424730	part of the program happen
-0.237005	risk that several variables happen
-0.335329	Therefore, it can often happen
-0.235924	in a big matrix happen
-0.357398	few times may be enough
-0.355429	If there are not enough
-0.293787	that the integer has enough
-0.232976	noticeable but not long enough
-0.232976	delay is just long enough
-0.026328	size that is big enough
-0.116501	integer size is big enough
-0.334679	addition. This is small enough
-0.327361	dispatch mechanism is rarely enough
-0.237919	you want them to apply
-0.610736	of 2 does not apply
-0.333664	same argument does not apply
-0.236574	advice given here may apply
-0.236574	of the advices may apply
-0.490332	exception. Therefore, you should apply
-0.309412	dispatching. Obviously, you should apply
-0.347293	2 does not always apply
-0.184761	unsigned The same rules apply
-0.184761	The same coding rules apply
-0.569011	... } } } Obviously,
-0.232665	values of all variables. Obviously,
-0.317999	the same shared object. Obviously,
-0.342363	approximately two clock cycles. Obviously,
-0.398492	it is not needed. Obviously,
-0.336183	of bad CPU dispatching. Obviously,
-0.648436	mode and back again. Obviously,
-0.306419	of A is finished. Obviously,
-0.361271	different for each process. Obviously,
-0.200009	of the .NET framework. Obviously,
-0.382605	using a particular code version.
-0.352771	some changes for each version.
-0.457395	to the best possible version.
-0.313580	better than the 32-bit version.
-0.291453	zip file of every version.
-0.235318	copy of every intermediate version.
-0.496664	present in the old version.
-0.520552	go to the desired version.
-0.230083	footprint than the alternative version.
-0.265158	will have an up-to-date version.
-0.357117	less efficient when the row
-0.915125	the length of a row
-0.351273	first eight elements in row
-0.237799	NUMCOLUMNS; column++) matrix[row][column] = row
-0.339293	take the elements from row
-0.638748	the address of each row
-0.350115	one container for each row
-0.356469	for (row = 0; row
-0.829002	number of elements per row
-0.334340	work needed for calculating row
-0.342841	See the Intel C++ Compiler
-0.261332	5, 2009). Intel C++ Compiler
-0.282589	were tested: Microsoft C++ Compiler
-0.227677	regularly. Intel: "Intel® C++ Compiler
-0.312676	7.1-4, 2008. Digital Mars Compiler
-0.222377	Table 18.3. Predefined macros Compiler
-0.074779	with long latencies. 8.5 Compiler
-0.074779	optimization by CPU.............................................................................81 8.5 Compiler
-0.165169	/Qopt-report -opt-report Table 18.2. Compiler
-0.165169	expressions when not selected. Compiler
-0.352900	you use is a matter
-0.352900	you prefer is a matter
-0.003665	It is simply a matter
-0.007361	structure is simply a matter
-0.003665	difference is simply a matter
-0.313273	classes is just a matter
-0.339212	this case it doesn't matter
-0.229059	where the size doesn't matter
-1.166475	make sure that the declaration
-0.564659	hidden by using the declaration
-0.353355	range printf(Greek[n]); } The declaration
-0.354222	However, the const int declaration
-0.911728	a structure or class declaration
-0.331168	other module. The static declaration
-0.494222	added to a variable declaration
-0.323773	by making the full declaration
-0.229226	different integer types available. declaration
-0.251329	must have extern "C" declaration
-0.355271	and delete is to allocate
-0.346109	(memory pooling) than to allocate
-0.564976	often more efficient to allocate
-1.238693	it is necessary to allocate
-0.236763	new and delete to allocate
-0.313368	it is preferable to allocate
-0.236763	processor core. Try to allocate
-0.325226	some cases. Does not allocate
-0.236579	are prone to even allocate
-0.291967	applications. The string classes allocate
-0.345245	the loop or the series
-0.357246	dependency chain is a series
-0.353823	number one in a series
-0.353823	the first in a series
-0.351422	be propagated through a series
-0.293221	I have made a series
-0.237034	Intel mechanism executes a series
-0.237733	20 Copyright notice This series
-0.354010	other volumes in this series
-0.218578	// Example 12.9a. Taylor series
-0.763891	has many of the features
-0.420955	and function libraries have features
-0.575755	many of the same features
-0.352374	instruction sets and other features
-0.331731	on using the optimization features
-0.231799	from its many optimization features
-0.236604	software to add new features
-0.309532	a wealth of advanced features
-0.281544	avoid the time- consuming features
-0.165153	follows: Instruction set Important features
-0.906302	the value that is added
-0.428470	When an integer is added
-0.237249	d+e, then c is added
-0.237249	class data members is added
-0.324563	sum, then f is added
-0.579436	with a lot of added
-0.522900	new objects can be added
-0.355945	register keyword can be added
-0.352925	is called. I have added
-0.345968	all elements have been added
-0.457411	is annoying to the user.
-0.354028	error messages to the user.
-0.141545	save time for the user.
-0.141545	saves time for the user.
-0.356775	when activated by the user.
-0.044399	important to the end user.
-0.021632	distributed to the end user.
-0.044399	inconvenient to the end user.
-0.286067	poorly for the end user.
-0.357781	This reduces the code to:
-0.036605	compiler may reduce this to:
-0.232608	compiler may change this to:
-0.232608	multiplication by changing this to:
-0.232608	/ 1.2345; Change this to:
-0.455362	!a; can be optimized to:
-0.309978	} Can be reduced to:
-0.234132	This can be changed to:
-0.234132	... can be changed to:
-0.456330	the stack is a waste
-0.353176	case situation is a waste
-0.351254	the user and a waste
-0.062839	not only be a waste
-0.457280	members may cause a waste
-0.237896	sources of frustration and waste
-0.330622	compatibility problems and they waste
-0.337226	itself is a big waste
-0.287385	which is a total waste
-0.488008	of example 15.1b to metaprogramming
-0.335946	from string functions. A metaprogramming
-0.236728	following examples explain how metaprogramming
-0.221908	sub-expressions. Why is template metaprogramming
-0.221908	Integer power using template metaprogramming
-0.221908	cases, however, where template metaprogramming
-0.221908	tortuous and convoluted template metaprogramming
-0.234510	are waiting for better metaprogramming
-0.231381	assembly language has full metaprogramming
-0.265193	techniques can be considered metaprogramming
-0.237920	memory by requesting a map
-0.339444	as list, set and map
-0.721063	of the program. The map
-0.293761	from the linker. The map
-0.233805	looking at a link map
-0.415321	tree or a hash map
-0.172229	finding elements. A hash map
-0.172229	specific interval. A hash map
-0.212298	masm=intel /FA -S Generate map
-0.165164	listing. Use the "generate map
-0.428615	systems allow you to define
-1.349307	is more efficient to define
-0.838626	will be able to define
-0.382046	including the ability to define
-0.348979	of constants we can define
-0.236076	cache line size // define
-0.323709	to transpose matrix // define
-0.236076	library #include <stdio.h> // define
-0.236076	// define fprintf // define
-0.520274	stack. Alternatively, you may define
-0.350235	object x when it returns.
-0.309093	soon as the function returns.
-0.201575	automatically when the function returns.
-0.089196	deallocated when the function returns.
-0.201575	freed when the function returns.
-0.059135	called before the function returns.
-0.059135	stack before the function returns.
-0.059135	freed before the function returns.
-0.059135	restored before the function returns.
-0.294214	the system database in Windows.
-0.382790	64-bit device drivers for Windows.
-0.162233	Supports 32-bit and 64-bit Windows.
-0.194627	Linux than in 64-bit Windows.
-0.209304	source compiler for 32-bit Windows.
-0.209304	commercial compiler for 32-bit Windows.
-0.046275	compiler. Supports only 32-bit Windows.
-0.046275	sets. Supports only 32-bit Windows.
-0.299038	The object oriented programming style
-0.299038	an object oriented programming style
-0.225187	a relatively primitive programming style
-0.197918	Note that the C style
-0.020338	the old fashioned C style
-0.231906	shift in software writing style
-0.165174	be mixed with x87 style
-0.165174	the same as C- style
-0.014463	+= 8) { // Load
-0.055825	LoadVector(bb + i); // Load
-0.233762	consecutive elements b.load(bb+i); // Load
-0.389103	for the function call. Load
-0.458567	a[100], temp; temp = 3;
-0.237704	code is __asm int 3;
-0.237254	i * 9 + 3;
-0.311259	a = a * 3;
-0.235810	list[i] += i / 3;
-0.231393	list[i] = i % 3;
-0.357265	of truncation. This is approximately
-0.407658	integer register variables is approximately
-0.237585	a branch misprediction is approximately
-0.237917	holds a precision of approximately
-0.237868	availability of x for approximately
-0.355992	very limited. There are approximately
-0.237795	in an array, or approximately
-0.456858	this loop will take approximately
-0.512210	to can be accessed approximately
-0.325319	simultaneously or out of order.
-0.421299	executing instructions out of order.
-0.344800	functions in the optimal order.
-0.097012	accessed in a non-sequential order.
-0.019765	and deallocated in random order.
-0.218578	accessed in non- sequential order.
-0.312991	2: printf("Gamma"); break; case 3:
-0.237258	preceding paragraph and manual 3:
-0.362923	in detail in manual 3:
-0.277911	are covered in manual 3:
-0.009622	the CPU (See manual 3:
-0.039839	AMD CPUs (See manual 3:
-0.039839	be mispredicted (See manual 3:
-0.200049	number of branches. Manual 3:
-0.314703	x86 platforms. 3. The microarchitecture
-0.000307	and manual 3: "The microarchitecture
-0.000153	in manual 3: "The microarchitecture
-0.000077	(See manual 3: "The microarchitecture
-0.002152	branches. Manual 3: "The microarchitecture
-0.353937	registers are: It is easy
-0.353937	to develop. It is easy
-0.324977	and this error is easy
-0.293903	and for fast and easy
-0.237632	efficiency, platform independence, and easy
-0.237802	functions, inline assembly or easy
-0.525387	x4∙xn-4. There is no easy
-0.525387	.so). There is no easy
-0.200037	IDE, for debugging facilities, easy
-0.458752	therefore be aware of situations
-0.237294	This is useful in situations
-0.668668	can be useful in situations
-0.237294	is also useful in situations
-0.237099	better than RISC in situations
-0.539388	user. There may be situations
-0.355993	support it. There are situations
-0.527594	post-increment. There are also situations
-0.236636	be useful in test situations
-1.345732	is more efficient to implement
-1.234528	it is possible to implement
-1.349655	It is possible to implement
-0.591516	(GOT) in order to implement
-1.145541	example shows how to implement
-0.483226	which is difficult to implement
-0.318843	is quite difficult to implement
-0.235936	// The child classes implement
-0.890670	compilers I have tested implement
-0.237692	small loops (less than 65
-0.196016	64 32 16.4 65 65
-0.196016	64 14.0 80.8 65 65
-0.218560	7.32 Preprocessing directives ......................................................................................... 65
-0.200021	speed to using namespaces. 65
-0.165159	......................................................................................... 65 7.33 Namespaces........................................................................................................... 65
-0.165159	64 64 32 16.4 65
-0.165159	64 64 14.0 80.8 65
-0.165159	of stack unwinding .............................................................................. 65
-0.556372	In cases where the chosen
-0.347328	// Now call the chosen
-0.336149	these two gives the chosen
-1.123999	of the code is chosen
-0.341420	disadvantages when C++ is chosen
-0.625535	The C++ language is chosen
-0.462966	optimal branch can be chosen
-0.451424	discovers that it has chosen
-0.506175	that the compiler has chosen
-0.237916	library can emulate a 256-bit
-0.237913	this problem. Vectors of 256-bit
-0.314741	registers are extended to 256-bit
-0.237883	to 128-bit XMM and 256-bit
-0.237864	16 (see below). The 256-bit
-0.293651	library will use one 256-bit
-0.534339	first processors that supported 256-bit
-0.234902	instruction set also allows 256-bit
-0.218568	256-bit instructions were splitting 256-bit
-0.556226	number. Therefore, it is slightly
-0.237588	than 127 bytes is slightly
-0.407662	mode. The latter is slightly
-0.321520	short int) are only slightly
-0.209305	language. C++ takes only slightly
-0.209305	double precision takes only slightly
-0.311941	function calls may run slightly
-0.200037	Some compilers make Sum1 slightly
-0.200037	also work, 133 although slightly
-0.314720	code is fragmented and scattered
-0.574275	are likely to be scattered
-0.357028	memory. They may be scattered
-0.355147	and objects that are scattered
-0.489396	and the data are scattered
-0.310822	poor if data are scattered
-0.292609	the dispatch branches are scattered
-0.325159	there are many functions scattered
-0.236025	files, help files etc. scattered
-0.876271	is not possible to contain
-0.237861	eliminate common subexpressions that contain
-0.345168	containing integers. It can contain
-0.237682	the data section may contain
-0.237394	The test data should contain
-0.292751	type of objects they contain
-0.235909	59 third generations classes contain
-0.212286	1996. These two books contain
-0.165153	internet forums and newsgroups contain
-0.237890	Using unaligned reads and writes
-0.102591	function that reads or writes
-0.102591	program afterwards reads or writes
-0.575332	9.5 so that it writes
-0.336200	} } This function writes
-0.293674	critical stride causes all writes
-0.203358	evicted. Don't mix nontemporal writes
-0.203358	compiler can insert nontemporal writes
-0.224926	nontemporal writes with normal writes
-0.357524	or interpretation on the device
-0.294228	which then calls a device
-0.237887	Interrupt service routines and device
-0.294209	point capabilities (except in device
-0.335817	a printer or other device
-0.353493	be used in 64-bit device
-0.087042	in a programmable logic device
-0.087042	devices A programmable logic device
-0.222365	C or C++. Critical device
-0.588089	loop if it is independent
-0.237756	in example 11.3 is independent
-0.237848	all the additions are independent
-0.231406	This function is OS independent
-0.316437	measure that is almost independent
-0.224932	be used on completely independent
-0.048396	the needs of position- independent
-0.048396	data. This makes position- independent
-0.048396	use the so-called position- independent
-0.427110	costs of dynamic memory allocation.
-0.184562	associated with dynamic memory allocation.
-0.451799	to use dynamic memory allocation.
-0.451799	classes use dynamic memory allocation.
-0.184562	of using dynamic memory allocation.
-0.184562	container without dynamic memory allocation.
-0.068874	to avoid dynamic memory allocation.
-0.236510	is reserved for dynamic allocation.
-0.234300	is faster than a non-static
-0.234300	called faster than a non-static
-0.538656	non-static data members or non-static
-0.345042	is incurred on all non-static
-0.235361	stack pointer. Likewise, all non-static
-0.285147	they don't need any non-static
-0.047094	it cannot access any non-static
-0.047094	function cannot access any non-static
-0.235403	microprocessors are constructed. All non-static
-0.461389	first count and the subsequent
-0.526374	as described in the subsequent
-0.345147	usually higher than the subsequent
-0.345147	be slower than the subsequent
-0.293875	operation doesn't delay the subsequent
-0.630954	the code cache. The subsequent
-0.237156	this first manual. The subsequent
-0.237156	are not cached. The subsequent
-0.293690	the list causes all subsequent
-0.313398	other member functions. This applies
-0.236788	a random manner. This applies
-0.297612	non-sequential order. The same applies
-0.297612	four floats. The same applies
-0.286152	page 137). This also applies
-0.230816	here about Linux also applies
-0.230816	about increment operators also applies
-0.897216	using powers of 2 applies
-0.222389	row. The same advice applies
-0.557398	same code can be applied
-0.552181	attribute which can be applied
-0.102576	which can only be applied
-0.102576	otherwise can only be applied
-0.232028	produce 32 results when applied
-0.002818	The keyword static, when applied
-0.167321	The copy constructors and destructors
-0.167321	no copy constructors and destructors
-0.048264	}; 7.23 Constructors and destructors
-0.048264	54 7.23 Constructors and destructors
-0.237769	are wrapper classes with destructors
-0.108710	be sure that all destructors
-0.108710	makes sure that all destructors
-0.254056	no guarantee that all destructors
-0.236128	and calling any necessary destructors
-0.112641	only be applied to integers.
-0.764429	exactly as efficient as integers.
-0.586444	take advantage of 64-bit integers.
-0.320660	inherent support for 64-bit integers.
-0.519954	using signed and unsigned integers.
-0.290679	less efficient than signed integers.
-0.373771	vectors of eight 16-bit integers.
-0.231388	128 bit vector containing integers.
-0.779114	of the function in terms
-0.236033	quite expensive - in terms
-0.036967	is no cost in terms
-0.102175	against the costs in terms
-0.102175	STL also costs in terms
-0.236033	#) are costless in terms
-0.236033	time lag. Thinking in terms
-0.233050	loop calculates four consecutive terms
-0.460068	of this is to help
-0.592239	elements in order to help
-0.512699	is that we can help
-0.313688	may be of some help
-0.236590	to reorder instructions without help
-0.348222	deciding whether to store help
-0.224934	configuration files, resource files, help
-0.224934	resource files, configuration files, help
-0.222371	for automatic updates, remote help
-0.343534	is usually faster to transfer
-0.237729	a "move constructor" to transfer
-0.325330	structure or class. The transfer
-0.017083	the overhead of parameter transfer
-0.008458	The overhead of parameter transfer
-0.082796	and return and parameter transfer
-0.082796	register allocation and parameter transfer
-0.165179	Use 64-bit mode Parameter transfer
-0.290134	even allocate more memory blocks
-0.290134	method with multiple memory blocks
-0.290134	deallocation of big memory blocks
-0.344677	the data into multiple blocks
-0.229040	be done in big blocks
-0.229040	Reading or writing big blocks
-0.287376	different ways of copying blocks
-0.165169	consisting of digital building blocks
-0.165169	language", section 17.9: "Moving blocks
-0.235878	pointer is simply optimized away
-0.225366	very good at optimizing away
-0.225366	Serialize // Prevent optimizing away
-0.270886	the compiler to optimize away
-0.335949	good compiler can optimize away
-0.203818	compiler can easily optimize away
-0.290684	therefore preferably be put away
-0.319761	is likely to go away
-0.296576	the transformation of example 15.1b
-0.222642	implementation analogous to example 15.1b
-0.326750	method used in example 15.1b
-0.326750	if branch in example 15.1b
-0.242601	The conversion from example 15.1b
-0.242601	to come from example 15.1b
-0.543020	compiler will convert example 15.1b
-0.287881	15.1a to an inlined 15.1b
-0.287397	the Gnu compiler reduced 15.1b
-0.357508	software development and the low
-0.538902	the work load is low
-0.357812	should run in a low
-0.354310	code branch for a low
-0.407629	unsigned variable produces a low
-0.330713	Small lightweight processors with low
-0.335305	into separate threads with low
-0.452796	Loops with a very low
-0.251348	code size have got low
-0.237919	Now, the factor to multiply
-0.562351	knows that it can multiply
-0.349095	zero } We can multiply
-0.237818	// loop for // multiply
-0.025252	Divide by constant = multiply
-0.348629	for example, you should multiply
-0.236880	The vector instructions cannot multiply
-0.641682	the same time to share
-0.292879	and that threads can share
-0.236733	b and c can share
-0.236733	applications running simultaneously can share
-0.293287	to make different objects share
-0.693604	contentions if the threads share
-0.345072	structure where data members share
-0.233786	or logical processors usually share
-0.276572	elements in row 28 share
-0.516214	SSE2 instruction set is enabled.
-0.341964	later instruction set is enabled.
-0.501642	SSE4.1 instruction set is enabled.
-0.356513	later) instruction set is enabled.
-0.330574	when interprocedural optimization is enabled.
-0.236579	set (or higher) is enabled.
-0.175231	page 32 for an explanation
-0.175231	page 130 for an explanation
-0.175231	page 43 for an explanation
-0.175231	page 81 for an explanation
-0.037632	VIA CPUs" for an explanation
-0.450548	arrays. I have no explanation
-0.352374	Please skip the following explanation
-0.218584	sets A more detailed explanation
-0.531517	typical repeat count is near
-1.028622	functions that are used near
-0.344168	used together are stored near
-0.078396	program are also stored near
-0.037431	other are also stored near
-0.334953	functions which are called near
-0.234012	of the code together near
-0.322208	two integers are equally near
-0.341418	in assembly language is provided
-0.382389	the instruction sets is provided
-0.237588	objects they contain is provided
-0.236947	logic. Some guidelines are provided
-0.293122	explained above. Examples are provided
-0.236947	searching and parsing are provided
-0.352925	new one. I have provided
-0.233575	a non-virtual member function, provided
-0.200037	do not use branches, provided
-0.358344	dispatch branch of the latter
-0.142612	*(++p) because in the latter
-0.142612	array[++i] because in the latter
-0.460143	error code. If the latter
-0.347931	be obtained. In the latter
-0.407439	avoided by inlining the latter
-0.487154	b)) even though the latter
-0.438362	in 64-bit systems. The latter
-0.324894	64 bit mode. The latter
-0.355153	ArrayOfStructures[100]; Here, there are 6
-0.463633	2 bytes. first // 6
-0.314516	addition takes 3 - 6
-0.235309	original, poorly designed program. 6
-0.224914	float or double plus 6
-0.451978	dominate in the future. 6
-0.212286	Choice of microprocessor ........................................................................................... 6
-0.200015	optimal algorithm ....................................................................................... 24 6
-0.165153	Choice of operating system......................................................................................... 6
-0.339146	function ten times and stores
-0.237629	transposes a matrix and stores
-0.461113	function, and the function stores
-0.434181	functions // This function stores
-0.293784	four objects. STL vector stores
-0.225363	extra time. It simply stores
-0.225363	data member pointer simply stores
-0.234520	while the Gnu mechanism stores
-0.165169	mov DWORD PTR [ecx+eax*4],ebx stores
-0.294100	what you want it to.
-0.271510	change what it points to.
-0.233586	a function pointer points to.
-0.064828	variable that r points to.
-0.064828	0 that r points to.
-0.230108	want them to apply to.
-0.307001	of the object pointed to.
-0.222377	code that it jumps to.
-0.200032	the member pointer refers to.
-0.358617	to integers of the default
-1.130896	recommended to use the default
-0.654969	a class with a default
-0.442003	functions. This applies to default
-0.237818	// x,y coordinates // default
-0.341642	are often used by default
-0.292387	transferred in registers by default
-0.236300	optional and off by default
-0.237588	need a constructor. A default
-0.536023	size of vector, bits Instruction
-0.236111	of each table element Instruction
-0.327999	name Intrinsic function name Instruction
-0.230798	Windows, Linux, Mac, BSD Instruction
-0.219993	sets is as follows: Instruction
-0.219993	files are as follows: Instruction
-0.224929	and compiler makers. 4. Instruction
-0.165159	not supported fprintf(stderr, "\nError: Instruction
-0.165159	point multiply-and-add Table 13.1. Instruction
-0.325389	for the purpose of finding
-0.565785	table is used for finding
-0.829252	can be useful for finding
-0.472379	which are useful for finding
-0.303026	is most useful for finding
-0.338964	is not intended for finding
-0.312968	use binary search for finding
-0.404805	math is required for finding
-0.237705	optimize anything else than finding
-0.538896	floating point numbers is inefficient.
-0.237843	and far procedures are inefficient.
-0.492292	but it is very inefficient.
-0.292891	more complex and often inefficient.
-0.338518	kbytes. This is quite inefficient.
-0.609767	This makes data caching inefficient.
-0.289979	fragmented and caching becomes inefficient.
-0.701725	This is of course inefficient.
-0.424498	8.6b int a, b, c,
-0.141122	a.y);} vector a, b, c,
-0.031217	{ float a, b, c,
-0.031217	11.1a float a, b, c,
-0.031217	11.1b float a, b, c,
-0.031217	8.16 float a, b, c,
-0.064854	way: bool a, b, c,
-0.064854	7.9a bool a, b, c,
-0.338480	0, b = 0, c,
-0.237887	First-In-Last-Out access, sort and search
-0.237867	the user's needs. The search
-0.501540	order and there are search
-0.023334	have been added? If search
-0.235118	time intervals. Some programs search
-0.234512	vector instructions SSE4.2 string search
-0.335941	hash table can improve search
-0.233788	and then use binary search
-0.293721	to optimization by CPU Modern
-0.624280	point variables and operators Modern
-0.533500	8.1 How compilers optimize Modern
-0.232303	cache are critical resources. Modern
-0.370744	cores. 3.15 Dependency chains Modern
-0.318996	used is branch prediction. Modern
-0.301813	multiple calculations in parallel. Modern
-0.165153	and advanced prediction mechanisms. Modern
-0.165153	a temp1 and temp2. Modern
-0.934374	ownership of the memory block.
-0.325434	its own allocated memory block.
-0.459815	the new bigger memory block.
-0.232712	in one contiguous memory block.
-0.440190	copied to the new block.
-0.235983	manager for each allocated block.
-0.448863	pointer to the next block.
-0.228162	there is a try block.
-0.200032	in a thread environment block.
-0.438647	avoided when speed is critical.
-0.109266	when code caching is critical.
-0.109266	where code caching is critical.
-0.577672	cache that can be critical.
-0.355945	cache use can be critical.
-0.779558	functions that are not critical.
-0.314312	which resources are most critical.
-0.292122	where speed is particularly critical.
-0.387661	functions that are particularly critical.
-0.167784	The effect of dependency chains
-0.167784	gain if such dependency chains
-0.285282	misprediction, or long dependency chains
-0.167784	dependency chain. Such dependency chains
-0.167784	to break down dependency chains
-0.278622	chains, especially loop-carried dependency chains
-0.167784	of order. Long dependency chains
-0.035786	multiple cores. 3.15 Dependency chains
-0.035786	switches..................................................................................................... 22 3.15 Dependency chains
-0.314782	threads have finished the time-consuming
-0.237614	check makes dynamic_cast more time-consuming
-0.340058	methods if the most time-consuming
-0.340058	mispredictions. When the most time-consuming
-0.229133	of activating the very time-consuming
-0.229133	same thread as very time-consuming
-0.229133	particularly critical. A very time-consuming
-0.483387	which can be quite time-consuming
-0.443263	often useful to put time-consuming
-0.336769	disk. Test with different brands
-0.377283	experiments on seven different brands
-0.233923	mechanism that treats different brands
-0.311926	without discriminating between CPU brands
-0.435792	fine-tuned for specific CPU brands
-0.343277	CPUs, not for other brands
-0.347714	works well on all brands
-0.231400	only known processors. Other brands
-0.212304	benchmark performance of competing brands
-0.843176	SSE2 instruction set is available.
-0.497060	AVX512 instruction set is available.
-0.237790	"function level linking" if available.
-0.347678	special purposes are also available.
-0.352216	optimized math function libraries available.
-0.235976	the strongest optimization option available.
-0.402906	the different integer types available.
-0.200026	a genuine compiler became available.
-0.436498	optimal in most cases. Don't
-0.331203	of longjmp if possible. Don't
-0.226769	starting and stopping threads. Don't
-0.212292	implemented with template metaprogramming. Don't
-0.074775	b1 * reciprocal_divisor; 14.7 Don't
-0.074775	division ........................................................................................... 139 14.7 Don't
-0.165159	must warn against overkill. Don't
-0.165159	line would be evicted. Don't
-0.165159	recommendation was the opposite: Don't
-0.314755	fast that what is brand
-0.339221	VIA processors because this brand
-0.353166	set. If the CPU brand
-0.289357	the check for CPU brand
-0.233636	not look at CPU brand
-0.233477	run optimally on any brand
-0.289176	processors can have any brand
-0.350146	CPU of a particular brand
-0.231406	a CPU of unknown brand
-0.549509	compiled when it is executed.
-1.121543	of the code is executed.
-0.855501	of the program is executed.
-0.335236	function and branch is executed.
-0.048260	first time Func is executed.
-0.048260	every time Func is executed.
-0.354682	stages before they are executed.
-0.462127	last time it was executed.
-0.224510	time the statement was executed.
-0.355307	by x<<3, which is faster.
-0.236959	to make their software faster.
-0.292434	is approximately three times faster.
-0.109548	operation which is much faster.
-0.109548	operation, which is much faster.
-0.221303	it is accessed much faster.
-0.590437	make the address calculation faster.
-0.234663	to make the division faster.
-0.286802	kinds of code execute faster.
-0.458843	The elements at the diagonal
-0.212085	we used above the diagonal
-0.212085	elements matrix[c][r] above the diagonal
-0.122475	row 28 below the diagonal
-0.056934	elements matrix[r][c] below the diagonal
-0.056934	element matrix[r][c] below the diagonal
-0.237437	} // At the diagonal
-0.020789	// loop columns below diagonal
-0.233588	to nearest integer int n;
-0.023192	i; } u; int n;
-0.233588	// Example 14.3a int n;
-0.233588	// Example 14.3b int n;
-2.093563	= 0; i < n;
-0.184761	= 2.0; x <= n;
-0.234199	= 2; i <= n;
-0.200043	__asm fistp dword ptr n;
-1.191012	i++) { a[i] = *p
-0.044048	p) { *p = *p
-0.237780	Accessing an object by *p
-0.146499	(int * p) { *p
-0.294280	is necessary to reload *p
-0.035783	Example 7.31b char string[100], *p
-0.035783	Example 7.31a char string[100], *p
-0.498509	the situation where the logic
-0.237863	expression better explains the logic
-0.314694	three times faster. The logic
-0.449235	elements and the program logic
-0.510655	containers. If the program logic
-0.344613	been deallocated. The program logic
-0.074781	faster in a programmable logic
-0.074781	logic devices A programmable logic
-0.165174	graphics processors. 5 Programmable logic
-0.525551	64-bit code for the Microsoft,
-0.645295	as good as the Microsoft,
-0.237867	I have tried. The Microsoft,
-0.495128	directives are supported by Microsoft,
-0.237759	and 32-bit Linux with Microsoft,
-0.237554	the latest compilers from Microsoft,
-0.234207	compilers Intel, Microsoft Intel, Microsoft,
-0.450619	Supports all x86 platforms. Microsoft,
-0.224920	for intrinsic functions (i.e. Microsoft,
-0.357984	be swapped to the hard
-0.357448	scattered around on the hard
-0.357373	The change of a hard
-0.337161	a file on a hard
-0.337161	sufficiently fast on a hard
-0.349669	for response from a hard
-0.349046	need better support for hard
-0.293305	time consumer to many hard
-0.228170	a slow and fragmented hard
-0.656246	an increasing number of purposes
-0.352068	different algorithms for different purposes
-0.285535	processing unit for other purposes
-0.285535	accelerator card for other purposes
-0.237334	computing, but for most purposes
-0.292109	libraries for many common purposes
-0.287848	Many libraries for special purposes
-0.283109	registers available for general purposes
-0.165159	limited audience for educational purposes
-0.762331	is advantageous if the typical
-0.821488	the code in a typical
-0.457479	is called in a typical
-0.354078	works best on a typical
-0.237388	data should contain a typical
-0.237872	one is fastest. The typical
-0.237588	inputs give infinity. A typical
-0.237035	list points out some typical
-0.165169	at compile time. Four typical
-0.356947	binding leads to a usability
-0.785467	costs in terms of usability
-0.048347	time. 4 Performance and usability
-0.048347	22 4 Performance and usability
-0.336218	graphical interface calls. The usability
-0.325385	standardized as possible for usability
-0.237843	All these problems are usability
-0.236308	as well as important usability
-0.165159	about bugs, compatibility problems, usability
-0.811422	that a function is pure
-0.348254	ivdep Assume function is pure
-0.357809	that this is a pure
-0.356783	Multiple calls to a pure
-0.237846	pow and log are pure
-0.293848	them. Pure functions A pure
-0.231388	out loop-invariant code containing pure
-0.229219	common subexpressions that contain pure
-0.304407	manually when it involves pure
-0.568950	Here you have to vectorize
-1.665855	It is possible to vectorize
-0.354273	compilers. We want to vectorize
-0.593876	compiler is unable to vectorize
-0.237896	to 12.8b automatically and vectorize
-0.351633	current compilers may not vectorize
-0.448632	whether the compiler will vectorize
-0.494553	vectorization. The compiler will vectorize
-0.234801	where current compilers don't vectorize
-0.293627	there are no cache problems.
-0.233092	technical problems or performance problems.
-0.233092	useful for investigating performance problems.
-0.292728	ways to avoid these problems.
-0.231883	parameters because of alignment problems.
-0.231907	find and resolve compatibility problems.
-0.212292	completely because of technical problems.
-0.165159	the end user. Installation problems.
-0.165159	to the user. Compatibility problems.
-0.314678	to a variable that could
-0.382688	returns even though it could
-0.357410	reference, or the function could
-1.466500	parts of the code could
-0.237677	In example 8.21, you could
-0.290657	of the following methods could
-0.224924	believe that the portability could
-0.218554	0. The constant N1 could
-0.165153	improved is that r+i/2 could
-0.314602	the variable as function parameter.
-0.237429	functions counts a one parameter.
-0.361515	value of the template parameter.
-0.276753	name and the template parameter.
-0.192976	instead of a template parameter.
-0.261619	provided as a template parameter.
-0.040810	class through a template parameter.
-0.278815	class name as template parameter.
-0.241209	an object of the derived
-0.357248	source file and the derived
-0.352785	making objects inside the derived
-0.498016	an object of a derived
-0.498016	An object of a derived
-1.076540	a pointer to a derived
-0.351700	parent class and a derived
-0.408127	of parent class and derived
-0.358392	memory allocation can be mentioned
-0.348319	Some common compilers are mentioned
-0.237733	and garbage collection, as mentioned
-0.449785	is the vector operations mentioned
-0.425173	of all the problems mentioned
-0.234780	of the storage methods mentioned
-0.281553	not have the disadvantages mentioned
-0.200015	none of the time-consumers mentioned
-0.251329	instructions than the ones mentioned
-0.419445	times to test // Time
-0.236651	Repeat NumberOfTests times // Time
-0.236651	to prevent optimizing // Time
-0.293531	as follows: Matrix size Time
-0.589253	time for the user. Time
-0.200026	Matrix size Total kilobytes Time
-0.251342	per element Example 9.6a Time
-0.165164	38.1 97 Table 9.1. Time
-0.165164	58.7 168.3 Table 9.3. Time
-0.281561	according to the table. Optimization
-0.068034	testing ................................................................................................ 157 17 Optimization
-0.068034	than normal. 157 17 Optimization
-0.074779	in the book "Performance Optimization
-0.074779	and Adolfy Hoisie: "Performance Optimization
-0.074779	on exception handling. 8.6 Optimization
-0.074779	options ................................................................................... 81 8.6 Optimization
-0.165169	produced regularly. AMD: "Software Optimization
-0.165169	64 and IA-32 Architectures Optimization
-0.537630	use of floating point expressions.
-0.526941	apply to floating point expressions.
-0.240132	than on floating point expressions.
-0.346531	reductions on floating point expressions.
-0.237391	or more complex integer expressions.
-0.236404	select between two simple expressions.
-0.222397	expressions rather than Boolean expressions.
-0.222397	programs with many Boolean expressions.
-0.231412	to reduce complicated algebraic expressions.
-0.237722	solution would be to include
-0.545633	} You have to include
-0.331704	performance measurement should not include
-0.314123	realistic performance test should include
-0.346139	data compression Most compilers include
-0.346332	The newest instruction sets include
-0.233020	compiled code. Compiled languages include
-0.212292	maintain. Most compiler packages include
-0.212308	it is run. Examples include
-0.324259	>>= 1; } return y;
-0.253079	Example 8.8b double x, y;
-0.206143	56 public: float x, y;
-0.159638	Func() { S1 x, y;
-0.159638	d, e, f, x, y;
-0.124075	a, b, c, d, y;
-0.301841	100, c = 100, y;
-0.165174	1.0E8, c = 1.23456, y;
-0.294262	time but avoids the overflow.
-0.836026	exception in case of overflow.
-0.325150	very obscure possibility of overflow.
-0.123053	is no check for overflow.
-0.324733	automatic check for integer overflow.
-0.336208	intermediate calculations can cause overflow.
-0.282858	signed integer doesn't cause overflow.
-0.200032	in question without generating overflow.
-0.728701	address of an array element.
-0.230360	clock cycles per array element.
-0.230360	of the current array element.
-0.454059	contains only a single element.
-0.478244	address of the matrix element.
-0.347276	we need the next element.
-0.103479	matrices, clock cycles per element.
-0.212304	finding a suitable pivot element.
-0.176844	the advantages of object oriented
-0.037925	negative effects of object oriented
-0.292441	and classes. The object oriented
-0.434947	may use an object oriented
-0.219153	main reasons why object oriented
-0.219153	programming textbooks recommend object oriented
-0.212321	and delete). 88 Object oriented
-0.165184	less efficient than non-object oriented
-0.463146	a breakpoint in the fully
-0.341627	14 Portability C++ is fully
-0.458475	that the syntax is fully
-0.356469	doubt obtained with a fully
-0.774203	The best way to fully
-0.237843	compilers for Windows are fully
-0.500123	function libraries are not fully
-0.314312	and prevent it from fully
-0.510229	libraries are not always fully
-0.347485	than other kinds of storage.
-0.307893	a function for register storage.
-0.232169	could benefit from register storage.
-0.234793	different threads need separate storage.
-0.228148	CPU used for temporary storage.
-0.003833	to systems with big-endian storage.
-0.015540	other platforms with big-endian storage.
-0.284345	that use big endian storage.
-0.511914	while the speed of addition,
-0.237903	numbers. You may, in addition,
-0.356105	integer operations such as addition,
-0.244683	much longer time than addition,
-0.576202	addition, a floating point addition,
-0.350288	to do an integer addition,
-0.235114	Most reductions involving integer addition,
-0.165164	operands: minimum, maximum, saturated addition,
-0.037146	block the execution of everything
-1.405750	to make sure that everything
-0.420332	double a, b; // everything
-0.537519	b * 1.2; // everything
-0.237110	in interpreted languages where everything
-0.581699	destructor to make sure everything
-0.312911	program must clean up everything
-0.281556	compile time to eliminate everything
-0.354677	time consumer if it involves
-0.349066	motion manually when it involves
-0.354138	particularly risky because it involves
-0.549619	especially if the code involves
-0.325043	actual processor. However, this involves
-0.237157	allocation. This method also involves
-0.114136	doing floating point operations involves
-0.165169	call to a driver involves
-0.459056	b[1000]; F2(b); } } Here
-0.336162	WriteFile(handle, ...)) { ... Here
-0.524746	in a suboptimal way. Here
-0.318064	with in assembly language. Here
-0.218554	2.5, which is double. Here
-0.466135	// Writes "Hello 2" Here
-0.200015	expansions and Newton-Raphson iterations. Here
-0.165153	out-of- order calculation capabilities. Here
-0.165153	= a1/b1 + a2/b2; Here
-0.526911	typical implementation of the factorial
-0.236670	// Example 14.1b int factorial
-0.236670	// Example 14.1a int factorial
-0.345825	Let's take the integer factorial
-0.234517	to x^0/0! // n factorial
-0.032686	7.32a double x, n, factorial
-0.032686	7.32b double x, n, factorial
-0.148750	x <= n; x++) factorial
-0.148750	>= 0; i--, x++) factorial
-0.358616	www.openmp.org. Documentation of the OpenMP
-0.237860	or PSDK). Supports the OpenMP
-0.237776	multiple threads Parallelization by OpenMP
-0.292214	of the data. Use OpenMP
-0.233024	32-bit and 64-bit. Supports OpenMP
-0.037860	options. Supports parallel processing, OpenMP
-0.037860	Mac. Supports parallel processing, OpenMP
-0.212298	data. Use OpenMP directives. OpenMP
-0.165164	vectorization (see page 107), OpenMP
-0.237285	compares the array pointer eax
-0.236229	the variable 85 ; eax
-0.283128	beginning of the array. eax
-0.191486	two instructions add ebx, eax
-0.146447	edx = r ebx, eax
-0.146447	eax ebx, 31 ebx, eax
-0.226767	DWORD PTR [esp+8] eax, eax
-0.224934	1 eax, 8 edx, eax
-0.200026	< 100. It compares eax
-0.002389	int aa[], short int bb[],
-0.614525	loop control branch is mispredicted
-0.053257	the other way is mispredicted
-0.530710	high repeat count is mispredicted
-0.492110	therefore certain to be mispredicted
-0.349668	return addresses to be mispredicted
-0.355160	function calls can be mispredicted
-0.355160	that branches can be mispredicted
-0.353751	a branch will be mispredicted
-0.294231	floating point format is standardized
-0.358071	also proceed in a standardized
-0.237883	access. Available protocols and standardized
-0.356751	of programs should be standardized
-0.294017	etc. should be as standardized
-0.357581	integer size is not standardized
-0.314389	process should always use standardized
-0.284332	the syntax is fully standardized
-0.200015	mechanism relies on non- standardized
-0.352650	monitor counters instead of (or
-0.314279	produces another C++ program (or
-1.628943	the SSE2 instruction set (or
-0.237211	makes the entire library (or
-0.347863	first element is stored (or
-0.520404	systems unless the SSE2 (or
-0.329179	into groups of four (or
-0.234515	I have used char (or
-0.495278	with new and delete (or
-0.325057	function and to optimize across
-0.214035	function or otherwise optimize across
-0.214035	discussed below. Cannot optimize across
-0.357334	it from making optimizations across
-0.219491	which will enable optimizations across
-0.460279	functions are not compatible across
-0.418753	allocation and parameter transfer across
-0.229229	size is not standardized across
-0.165169	data member is unchanged across
-0.011996	length of a clock cycle
-0.122169	comparable to a clock cycle
-0.210449	0.5ns. 2GHz A clock cycle
-0.426539	take only one clock cycle
-0.226756	typically takes one clock cycle
-0.221252	use the core clock cycle
-0.294928	frequency. The core clock cycle
-0.226607	optimize("a",on). Specifies that pointer aliasing
-0.220797	to assume no pointer aliasing
-0.294389	-fno-rtti Assume no pointer aliasing
-0.220797	for assuming no pointer aliasing
-0.226607	obstacle of possible pointer aliasing
-0.472911	it cannot rule out aliasing
-0.224953	page 81). 77 Pointer aliasing
-0.074785	rely on the strict aliasing
-0.074785	trick violates the strict aliasing
-0.004981	} void SelectAddMul(short int aa[],
-0.004981	branch void SelectAddMul(short int aa[],
-0.004981	classes void SelectAddMul(short int aa[],
-0.004981	inline void SelectAddMul(short int aa[],
-0.004981	x);} void SelectAddMul(short int aa[],
-0.004981	vectorized: void SelectAddMul(short int aa[],
-0.229514	Dispatcher void SelectAddMul_dispatch(short int aa[],
-0.229514	version void FUNCNAME(short int aa[],
-0.229514	typedef void FuncType(short int aa[],
-0.236792	Microsoft Visual Studio. This tool
-0.236792	available from www.agner.org/optimize/testp.zip. This tool
-0.298816	have developed a test tool
-0.288805	stamp counter. The test tool
-0.269450	manual for my test tool
-0.094737	monitor counters. My test tool
-0.094737	been identified. My test tool
-0.296289	programming language and development tool
-0.222400	tools. One popular development tool
-0.846594	The size of the parent
-0.352917	member functions of a parent
-0.757032	Data members of a parent
-0.226207	the member functions of parent
-0.143265	The member functions of parent
-0.237540	child class. Members of parent
-0.237818	goes in the // parent
-0.237190	details. Inheritance from multiple parent
-0.235227	the members of both parent
-0.928737	we don't have to care
-0.249115	container class that takes care
-0.249115	function library that takes care
-0.523425	that the compiler takes care
-0.289748	with destructors to take care
-0.289748	as coprocessors to take care
-0.304081	another thread can take care
-0.304081	one tread can take care
-0.341081	memory. If you don't care
-0.341611	by step. In most systems,
-0.252546	less efficient. In 64-bit systems,
-0.252546	clock cycle. In 64-bit systems,
-0.602309	the stack in 32-bit systems,
-0.329191	64-bit integers in 32-bit systems,
-0.527666	Mac OS X operating systems,
-0.232270	of programming languages, operating systems,
-0.476878	In Linux and Mac systems,
-0.008673	Lowest version int CriticalFunction_386(int parm1,
-0.008673	SSE2 version int CriticalFunction_SSE2(int parm1,
-0.035783	AVX version int CriticalFunction_AVX(int parm1,
-0.035783	version 127 int CriticalFunction_AVX(int parm1,
-0.165174	parameters typedef int CriticalFunctionType(int parm1,
-0.165174	first time int CriticalFunction_Dispatch(int parm1,
-0.580597	All the code is included
-0.348730	resources. This time is included
-0.293850	fragmentation. Bounds checking is included
-0.352561	most important functions are included
-0.357588	This '1' is not included
-0.236720	date. Mac The libraries included
-0.309826	Integer constants are usually included
-0.200032	no yes License license included
-0.353202	may even have a false
-0.294027	may be given a false
-0.294171	the value 0 for false
-0.954549	is known to be false
-0.237805	called with IsPowerOf2 = false
-0.237795	result, true (1) or false
-0.335765	= a a && false
-0.327470	= a, a || false
-0.478036	loop can change the value.
-0.357753	to have the same value.
-0.313641	as a function return value.
-0.292385	integer constant with its value.
-0.291837	it with the calculated value.
-0.318026	be below the maximum value.
-0.266481	constant to the previous value.
-0.266481	finished using the previous value.
-0.347553	name in the object file.
-0.022999	into a single object file.
-0.451577	in the same source file.
-0.221760	class in another source file.
-0.232330	or to an output file.
-0.326618	them into the executable file.
-0.230098	line or an input file.
-0.237703	y *= x; x *=
-0.234810	(n & 1) y *=
-0.230109	<= n; i++) f *=
-0.087047	<= n; x++) factorial *=
-0.087047	0; i--, x++) factorial *=
-0.226767	xn / nfac; xn *=
-0.212298	s += x^n/n! xxn *=
-0.200026	xn *= x; nfac *=
-0.357373	The creation of a temporary
-0.354083	store it in a temporary
-0.354083	the sequence in a temporary
-0.355929	uses ebx as a temporary
-0.294231	cause the creation of temporary
-0.355509	the CPU used for temporary
-0.341813	for register variables are temporary
-0.212309	Debugging. The profiler inserts temporary
-0.538896	size of abc is 12
-0.236144	is not optimal. Use 12
-0.306959	calculations with memory access. 12
-0.304390	branch misprediction is approximately 12
-0.226769	a ; parameter 2: 12
-0.218560	order execution ................................................................................................. 103 12
-0.200021	long double 8, 10, 12
-0.165159	Choice of function libraries........................................................................................ 12
-0.526828	good implementation of the memcpy
-0.582562	likely to use the memcpy
-0.643840	a single call to memcpy
-0.314712	use of memset and memcpy
-0.265177	0.18 0.18 0.18 0.11 memcpy
-0.212298	1.21 0.57 0.44 0.12 memcpy
-0.165164	function libraries Test Processor memcpy
-0.165164	1.00 0.25 0.28 0.22 memcpy
-0.358543	function address in the procedure
-0.354212	desired version in a procedure
-0.354212	a lookup in a procedure
-0.438528	The program uses a procedure
-0.341858	; mark end of procedure
-0.237872	64 bit Linux The procedure
-0.236858	to its functions, called procedure
-0.165169	feature uses an ordinary procedure
-0.337513	it is on a PC
-0.337513	cross- compiled on a PC
-0.531512	day be implemented in PC
-0.341670	been tested only on PC
-0.174440	x86) of the standard PC
-0.174440	based on the standard PC
-0.174440	blurred as the standard PC
-0.174440	most purposes the standard PC
-0.571888	pointers. This is a frequent
-0.353735	Memory swapping is a frequent
-0.382785	memory in advance. The frequent
-0.324920	standards. Such schemes are frequent
-0.314123	running. Such frameworks are frequent
-0.335978	Context switches are more frequent
-0.237461	locally or remotely. If frequent
-0.355156	are among the most frequent
-0.165159	double 2 AVX2 _mm256_i64gather_pd unlimited
-0.165159	float 8 AVX2 _mm_i64gather_pd unlimited
-0.165159	int 4 AVX2 _mm256_i32gather_epi32 unlimited
-0.165159	int64_t 4 AVX2 _mm_i32gather_ps unlimited
-0.165159	int64_t 2 AVX2 _mm256_i64gather_epi32 unlimited
-0.165159	int 8 AVX2 _mm_i32gather_epi32 unlimited
-0.165159	int 8 AVX2 _mm_i64gather_epi32 unlimited
-0.165159	float 4 AVX2 _mm256_i32gather_ps unlimited
-0.655331	in cases where the parallelism
-0.411384	simple cases where the parallelism
-0.122666	coarse-grained parallelism and fine-grained parallelism
-0.122666	parallelism than with fine-grained parallelism
-0.074781	more efficiently with coarse-grained parallelism
-0.074781	to distinguish between coarse-grained parallelism
-0.165174	running in parallel. Fine-grained parallelism
-0.165174	things in parallel. Coarse-grained parallelism
-0.439719	versions of the CPU detection
-0.458233	flaws in the CPU detection
-0.289618	or use the CPU detection
-0.056343	to replace the CPU detection
-0.289618	dispatching. Unfortunately, the CPU detection
-0.289618	or bypass the CPU detection
-0.363796	Overriding the Intel CPU detection
-0.237890	// Loop r2 and c2
-0.944056	each element in vector c2
-0.236954	mask to choose between c2
-0.042891	in vector c __m128i c2
-0.218577	c2 with the bit-mask: c2
-0.265177	for (c2 = r1; c2
-0.200026	for (c2 = c1; c2
-0.005119	paragraph and manual 3: "The
-0.002552	detail in manual 3: "The
-0.002552	covered in manual 3: "The
-0.000637	CPU (See manual 3: "The
-0.001274	CPUs (See manual 3: "The
-0.001274	mispredicted (See manual 3: "The
-0.037171	of branches. Manual 3: "The
-0.332950	single function by adding throw()
-0.182273	exceptions throw() throw() throw() throw()
-0.124363	throw exceptions throw() throw() throw()
-0.156889	not throw exceptions throw() throw()
-0.226779	does not throw exceptions throw()
-0.088604	should apply the empty throw()
-0.042046	also have an empty throw()
-0.042046	functions. While an empty throw()
-0.549407	predicted or if the prediction
-0.356475	calling vector::reserve with a prediction
-0.237872	cycles. The rules for prediction
-0.332681	predicted by the branch prediction
-0.279502	algorithms used for branch prediction
-0.224956	microcontrollers have no branch prediction
-0.224956	need to take branch prediction
-0.233568	out-of-order execution and advanced prediction
-1.097573	different versions of the polymorphic
-0.347420	It can call the polymorphic
-0.817288	which version of a polymorphic
-0.352646	Each instance of a polymorphic
-0.259242	need to call a polymorphic
-0.259242	needs to call a polymorphic
-0.292996	"Hello "; // call polymorphic
-0.347519	are used for implementing polymorphic
-0.314420	measurement code should have #if
-0.237621	directives. For example use #if
-0.459057	c); a.store(aa+i); } } #if
-0.237508	efficient than if because #if
-0.356998	depending on instruction set #if
-0.572978	the same source code. #if
-0.321114	nearest integer int n; #if
-0.251335	the program is compiled. #if
-0.237922	order of inheritance is now
-0.237843	Such hybrid solutions are now
-0.451128	know). The code can now
-0.232334	are accessed column-wise. Assume now
-0.228148	on the stack). ecx now
-0.391751	better. The loop body now
-0.883717	because their live ranges now
-0.165159	competing product is Borland's now
-0.348181	addition unit, but this unit
-0.350650	given below. The time unit
-0.574167	Subtractions use the same unit
-0.237420	it should save one unit
-0.279851	of the graphics processing unit
-0.208493	have a physics processing unit
-0.057015	dependency chain. 3.16 Execution unit
-0.057015	................................................................................................ 22 3.16 Execution unit
-0.392233	because the function calling conventions
-0.218596	obey any specific calling conventions
-0.001101	and manual 5: "Calling conventions
-0.000367	in manual 5: "Calling conventions
-0.001101	See manual 5: "Calling conventions
-0.218594	VIA CPUs. 5. Calling conventions
-0.498612	flag or in a register.
-0.776659	rather than in a register.
-0.513298	packed into a vector register.
-0.291808	same size as vector register.
-0.574167	cannot use the same register.
-0.237420	pointer takes up one register.
-0.423461	into a 128-bit XMM register.
-0.231400	of the same logical register.
-0.357812	switch statements is a kind
-0.494006	while-loop is also a kind
-0.321960	do not make this kind
-0.291043	The CPU supports this kind
-0.235119	rebooted. To prevent this kind
-0.453734	variables use a different kind
-0.234672	to tell explicitly what kind
-0.165174	leak. An even worse kind
-0.237947	obtained by dropping the graphical
-0.499321	message loop of a graphical
-0.352780	self-explaining menus of a graphical
-0.352283	the application has a graphical
-0.237588	access. 3.10 Graphics A graphical
-0.289770	rarely program their own graphical
-0.224932	Windows Library (OWL). Several graphical
-0.165169	doesn't depend on system-specific graphical
-0.348277	then use only the lower
-0.429424	It simply stores the lower
-0.325392	the residual error is lower
-0.354472	when compiling for a lower
-0.351593	responded to at a lower
-0.237920	zero-terminated ASCII string to lower
-0.335308	in other threads with lower
-0.236973	a separate thread with lower
-0.355525	body begins at the label
-0.331187	a sequence where each label
-0.038124	of loop ; unused label
-0.038124	; r ; unused label
-0.038124	if true ; unused label
-0.038124	;a ;r ; unused label
-0.442322	equal to the preceding label
-0.212309	back to the $B1$2 label
-0.314780	capabilities can overlap the iterations
-1.172394	if the number of iterations
-0.549735	is two or more iterations
-0.325142	the calculations of loop iterations
-0.293390	it is doing two iterations
-0.322665	be prepared for several iterations
-0.228618	loop control statement several iterations
-0.234799	loop is in mathematical iterations
-0.325315	in thousand so the misprediction
-0.237864	it may detect the misprediction
-0.582032	sure to make a misprediction
-0.625991	you may get a misprediction
-0.237899	rules for prediction and misprediction
-0.434745	is called the branch misprediction
-0.338049	recover from a branch misprediction
-0.228920	and resolve any branch misprediction
-0.321718	loop counter is an integer,
-0.321718	that 10 is an integer,
-0.300622	a pointer to an integer,
-0.300622	be converted to an integer,
-0.344553	of bits in an integer,
-0.237183	unsigned 4 4 64-bit integer,
-0.233807	as a biased binary integer,
-0.229243	or double plus 6 integer,
-0.083955	The principle of lazy binding
-0.005763	position-independent code and lazy binding
-0.048402	The delay on lazy binding
-0.048402	single session. But lazy binding
-0.048402	Some systems allow lazy binding
-0.074788	function is called. Lazy binding
-0.074788	sometimes unacceptably long. Lazy binding
-0.237924	the other hand, a just-in-time
-0.133779	on intermediate code and just-in-time
-0.133779	an intermediate code and just-in-time
-0.550553	framework are based on just-in-time
-0.236277	loop. Some implementations use just-in-time
-0.236277	best Java machines use just-in-time
-0.074781	frameworks, intermediate code, interpreters, just-in-time
-0.074781	large graphics frameworks, interpreters, just-in-time
-0.852565	if there is a try
-1.625308	It is recommended to try
-0.352778	An optimizing compiler may try
-0.237658	} void F0() { try
-0.237582	that the producer will try
-0.237582	microprocessor has hyperthreading, then try
-0.584923	because there is no try
-0.237059	GHz CPU. Should we try
-0.358484	that run in the background
-0.357921	it requires that the background
-0.237922	program should leave a background
-0.579433	installed, a lot of background
-0.331368	selecting optimize performance for background
-0.293751	and 10 ms for background
-0.283128	by doing the heavy background
-0.200032	efficient alternative. The theoretical background
-0.331483	s; An integer is converted
-0.344836	a base class is converted
-0.237591	in example 14.7b is converted
-0.553877	files need to be converted
-0.456420	an integer can be converted
-0.246873	a pointer can be converted
-0.246873	A pointer can be converted
-0.538290	large positive number when converted
-0.489181	size of the object pointed
-0.330598	the type of object pointed
-0.306619	about division). The object pointed
-0.791545	is that the value pointed
-0.342291	times because the value pointed
-0.420121	possibility that the variable pointed
-0.324374	the time the variable pointed
-0.413284	eliminated if the target pointed
-0.473659	with different brands of CPUs,
-0.331337	clock frequency than other CPUs,
-0.290540	works only for Intel CPUs,
-0.234677	flags on certain Intel CPUs,
-0.223358	vector operations of modern CPUs,
-0.223358	out-of-order capabilities of modern CPUs,
-0.122666	multiple CPUs or multi-core CPUs,
-0.122666	processor core on multi-core CPUs,
-0.237926	take extra precautions to account
-0.096623	have to take into account
-0.096623	if you take into account
-0.221083	and compatibility problems into account
-0.221083	take branch prediction into account
-0.006465	should be taken into account
-0.267318	(int a[], int * p)
-0.008964	__m128i LoadVector(void const * p)
-0.088234	__m128i LoadVectorA(void const * p)
-0.094023	void Plus2 (int * p)
-0.094023	void FuncA (int * p)
-0.214193	}; int Sum2(S3 * p)
-0.799374	all information about the chain
-0.167784	and splitting the dependency chain
-0.215219	This is a dependency chain
-0.167784	dependency chains. A dependency chain
-0.285282	sub-vector. A long dependency chain
-0.287789	makes a critical dependency chain
-0.167784	and Z. Each dependency chain
-0.167784	longer loop- carried dependency chain
-0.294196	structure, data flow and algorithms
-0.294173	other nearby branches. The algorithms
-0.237753	the general literature on algorithms
-0.349717	A discussion of different algorithms
-0.341208	to test several different algorithms
-0.236063	type of microprocessor. These algorithms
-0.234905	Another disadvantage of complicated algorithms
-0.289266	microprocessors are using advanced algorithms
-0.336215	variables go through the PLT
-0.237864	function and replaces the PLT
-0.331898	section position-independent, makes a PLT
-0.048264	of the GOT and PLT
-0.048264	not use GOT and PLT
-0.048264	no effect. GOT and PLT
-0.048264	-read_only_relocs suppress. GOT and PLT
-0.348294	the desired function. The PLT
-0.557306	of some of the heavy
-0.339417	accomplished by doing the heavy
-0.552986	question. For example, a heavy
-0.325398	very similar thanks to heavy
-0.237761	on a network with heavy
-0.356105	long time, such as heavy
-0.585589	systems. There is no heavy
-0.237031	by giving it some heavy
-1.213496	a piece of code once
-0.352751	subexpression occurs more than once
-0.538148	checking multiple values at once
-0.237481	instruction is executed only once
-0.456191	whether CriticalFunction is called once
-0.236348	quite dramatic consequences. I once
-0.200021	of the function. Compile once
-0.165159	DLL is relocated (rebased) once
-0.497655	But if all the additions
-0.237730	a balanced mix of additions
-0.314521	with a combination of additions
-0.237196	you get four float additions
-0.377564	takes to do two additions
-0.289913	done with just two additions
-0.235750	we can do four additions
-0.290358	be calculated by n additions
-0.061341	binary tree or a hash
-0.354265	some programmers use a hash
-0.319646	list of data. A hash
-0.233223	for finding elements. A hash
-0.233223	is fast enough. A hash
-0.233223	a specific interval. A hash
-0.165184	search facilities, binary trees, hash
-0.324597	esp+12 and loaded into ecx
-0.284568	ebx on stack ; ecx
-0.229421	label ;eax=addressofa ;edx=addressinr ; ecx
-0.226767	Only the registers eax, ecx
-0.226767	the array address is. ecx
-0.165164	than on the stack). ecx
-0.165164	ecx DWORD PTR [eax+4], ecx
-0.165164	[eax+400] DWORD PTR [eax], ecx
-0.526901	processors available in the system.
-0.367728	updates to the operating system.
-0.790775	CPU and the operating system.
-0.347963	processor and the operating system.
-0.442884	supported by the operating system.
-0.329945	hardware platform and operating system.
-0.267674	even have an operating system.
-0.312707	database in the Windows system.
-0.912513	integers and floating point variables,
-0.237185	feature. This includes static variables,
-0.763015	make floating point register variables,
-0.316022	the same for simple variables,
-0.230252	overflow, such as simple variables,
-0.232325	from), function parameters, local variables,
-0.176487	Register variables, integer Register variables,
-0.176487	subexpression elimin., float Register variables,
-0.357373	or structure. This is equally
-0.237761	*p or p->member is equally
-0.312517	to evaluate and are equally
-0.455787	thing and they are equally
-0.236050	If two integers are equally
-0.236050	Pointers and references are equally
-0.236050	ABC = 123; are equally
-0.454486	child class are accessed equally
-0.213212	performance. There are cases, however,
-0.348733	priority. In many cases, however,
-0.213212	are a few cases, however,
-0.296261	This method is inefficient, however,
-0.218572	allocation may be needed, however,
-0.200032	macro expansions. Programmers do, however,
-0.165169	are not always accurate, however,
-0.165169	references. It is OK, however,
-0.294043	4 (NetBurst) CPU is designed
-0.382624	However, the STL is designed
-0.555267	processors have to be designed
-0.237396	Many CPU dispatchers are designed
-0.237396	write instructions (MOVNT) are designed
-0.234791	template feature was never designed
-0.226778	of the original, poorly designed
-0.165169	instruction set was originally designed
-0.350837	set extensions. If a profiling
-0.314709	turn off debugging and profiling
-0.294047	compile the program with profiling
-1.465395	in order to make profiling
-0.790416	There are several different profiling
-0.328001	code. Inserting your own profiling
-0.306433	supports multiple programming languages, profiling
-0.165159	CPU vendors are offering profiling
-1.019027	if the code is fragmented
-0.237893	and a slow and fragmented
-0.357382	the memory to be fragmented
-0.314384	memory space becomes more fragmented
-0.273302	The heap space becomes fragmented
-0.219484	memory space never becomes fragmented
-0.296144	the memory to become fragmented
-0.213869	heap can easily become fragmented
-0.462439	input check if the inputs
-0.497532	runtime if all the inputs
-0.341776	console mode program. The inputs
-0.466250	the number of possible inputs
-0.288984	give overflow and negative inputs
-0.226773	example. The only allowed inputs
-0.218566	to keyboard and mouse inputs
-0.165164	0 to 12. Higher inputs
-0.355137	integer comparison, which is fast.
-0.237759	lrint(d); // Rounding is fast.
-0.475081	operation, which is very fast.
-0.316247	these operations are very fast.
-0.225276	registers are accessed very fast.
-0.225276	operations are generally very fast.
-0.235323	it is accessed quite fast.
-0.228166	class are accessed equally fast.
-0.331558	newer Intel CPUs have family
-0.354617	relies on the CPU family
-0.345723	necessarily newer. The CPU family
-0.312825	CPU based on its family
-0.065556	microprocessors in the x86 family
-0.171709	based on the x86 family
-0.165174	rather than its brand, family
-0.293417	vector. If n = 4,
-0.237205	= 2, Tuesday = 4,
-0.233802	16is calculated asa << 4,
-0.375081	on the old Pentium 4,
-0.206133	of 2 (i.e. 2, 4,
-0.206133	logical processors (0, 2, 4,
-0.200032	sizes 1, 2, 3, 4,
-0.165169	Journal Vol. 11, Iss. 4,
-0.237235	functions go here // Virtual
-0.237235	& obj1; p->f(); // Virtual
-0.708072	7.20 Virtual member functions Virtual
-0.234796	resource in 32-bit systems. Virtual
-0.122663	any non-static access. 7.20 Virtual
-0.122663	functions (methods)......................................................................... 53 7.20 Virtual
-0.200032	user-defined function is pure. Virtual
-0.165169	necessary (see page 96). Virtual
-0.352654	has i instead of j
-0.237761	j * 32 with j
-1.026218	< size; i++) { j
-0.328696	< rows; i++) { j
-0.237251	internally as (int)&matrix[0][0] + j
-0.356470	for (j = 0; j
-0.675796	the compiler can replace j
-0.229229	the factor to multiply j
-0.355462	will break at the interrupt
-0.408067	Remember to remove the interrupt
-0.294173	inline assembly instruction for interrupt
-0.330673	command received by an interrupt
-0.236679	how many times an interrupt
-0.236158	the application code. An interrupt
-0.321560	updating mechanism should never interrupt
-0.200032	System programming Device drivers, interrupt
-0.048272	a | -1 = -1
-0.313728	a ^ ~a = -1
-0.742720	n.a. - a & -1
-0.293317	= 0 a & -1
-0.256220	= a a | -1
-0.256220	n.a. - a | -1
-0.309565	= a a ^ -1
-0.462960	Integer variables can be 8,
-0.237805	= 4, Wednesday = 8,
-0.237795	1, 2, 4 or 8,
-0.325107	of sizes other than 8,
-0.960332	bytes. first byte at 8,
-0.315429	at 1 byte at 8,
-0.338789	8 8 long double 8,
-0.283115	2 (i.e. 2, 4, 8,
-0.246865	as cache and execution units.
-0.246865	operations use different execution units.
-0.196046	native floating point execution units.
-0.196046	CPUs. Half size execution units.
-0.196046	fact only 64-bit execution units.
-0.196046	split between several execution units.
-0.196046	CPUs with full-size execution units.
-0.378840	two floating point multiplication units.
-0.237887	bigger software packages and who
-0.237436	Intel function libraries, but who
-0.476447	for the end user who
-0.281550	programmers and software developers who
-0.222375	as single precision. And who
-0.165159	manuals are for those who
-0.165159	page 164 below. Those who
-0.165159	thank the many people who
-0.421032	this to be the fastest
-0.293971	that is calculated the fastest
-0.615279	In most cases, the fastest
-0.382535	it is still the fastest
-0.523113	piece of code is fastest
-0.453717	discussed which method is fastest
-1.378620	for the sake of fastest
-0.408090	speed is critical. The fastest
-0.237795	pointer aliasing. __declspec(noalias) or __restrict
-0.289492	__restrict aa, int * __restrict
-0.233755	problem void AddTwo(int * __restrict
-0.761635	by using the keyword __restrict
-0.228153	#pragma optimize("a", on) __restrict __restrict
-0.330816	noalias) __restrict #pragma ivdep __restrict
-0.165164	__restrict __restrict __declspec( noalias) __restrict
-0.165164	aliased #pragma optimize("a", on) __restrict
-0.351848	the source is an arithmetic
-0.237385	as fast as integer arithmetic
-0.541502	and you can do arithmetic
-0.237285	are uninitialized, if pointer arithmetic
-0.235554	other resources than doing arithmetic
-0.176489	a memory address. Pointer arithmetic
-0.176489	can be accessed. Pointer arithmetic
-0.165164	as gates, flip-flops, multiplexers, arithmetic
-0.355818	not vacant then the DLL
-0.331851	to data within the DLL
-1.189335	a function in a DLL
-0.457648	A variable in a DLL
-0.354197	format. Alternatively, make a DLL
-0.574174	applications use the same DLL
-0.275096	either as a runtime DLL
-0.221068	static library. A runtime DLL
-0.497534	loop if all the factors
-0.346377	by summing up the factors
-0.339013	called with many different factors
-0.236067	static link libraries. These factors
-0.464717	compiler There are several factors
-0.329762	*.so). There are several factors
-0.231406	be so many unknown factors
-0.165169	shared resources are limiting factors
-0.357159	Unix applications and the Gnu,
-0.822677	are supported by the Gnu,
-0.501793	is obtained with the Gnu,
-0.494312	compilers such as the Gnu,
-0.864222	is to use the Gnu,
-0.237877	Use automatic parallelization. The Gnu,
-0.356115	automatic vectorization, such as Gnu,
-0.229254	Intel, Microsoft Intel, Microsoft, Gnu,
-0.065598	for overflow of the arrays.
-0.323853	all data in large arrays.
-0.230810	string constants, and initialized arrays.
-0.410568	shows how to align arrays.
-0.230111	to account for unaligned arrays.
-0.122663	C style with character arrays.
-0.122663	C style as character arrays.
-0.656249	an increasing number of devices
-0.313828	to realize that such devices
-0.236507	back again. Accessing system devices
-0.228155	particularly important on small devices
-0.369279	fast on such small devices
-0.229229	processors. 5 Programmable logic devices
-0.224948	VHDL or Verilog. Common devices
-0.165164	be controlled. Small hand-held devices
-0.659351	get rid of the branch.
-0.462394	operator here is a branch.
-0.442001	also a kind of branch.
-0.237864	is compatible with that branch.
-0.404444	by the loop control branch.
-0.404444	predict the loop control branch.
-0.230802	known from a previous branch.
-0.265177	has chosen the wrong branch.
-0.462435	upper limit to the required
-0.237861	called, it allocates the required
-0.237756	A little math is required
-0.237756	of data manipulation is required
-0.237791	removed after debugging if required
-0.859670	the amount of memory required
-0.224932	#include <pmmintrin.h> // SSE3 required
-0.200032	web browsing that previously required
-0.426770	/ 10; a = (unsigned
-0.426770	% 10; a = (unsigned
-0.134491	/ 16; a = (unsigned
-0.134491	% 16; a = (unsigned
-0.449633	double c; b = (unsigned
-0.232705	if ((unsigned int)i >= (unsigned
-0.226782	int)(i - min) <= (unsigned
-0.165179	} T & operator[] (unsigned
-0.355971	a measure that is almost
-0.585448	arrays, then it is almost
-0.357798	with double's. It is almost
-0.331271	Such a list is almost
-0.552417	standard is used in almost
-0.237638	identical to Linux in almost
-0.537287	in a loop where almost
-0.234204	Linux operating systems give almost
-0.659242	get rid of the GOT
-0.341766	call. (2) find the GOT
-0.352134	all functions and a GOT
-0.237744	where it expects a GOT
-0.447483	it will not use GOT
-0.232702	and BSD, the slow GOT
-0.200032	apparently has no effect. GOT
-0.165169	the option -read_only_relocs suppress. GOT
-1.229339	the beginning of the array.
-0.463071	any elements in the array.
-0.349482	each run in an array.
-0.529391	result in a different array.
-0.312161	thousand results in another array.
-0.335644	looping through a linear array.
-0.224926	just as a normal array.
-0.200026	bigger than the destination array.
-0.452716	_mm. These functions are listed
-0.418525	compilers. The results are listed
-0.329567	www.agner.org/optimize. Copyright conditions are listed
-0.236047	AVX. These suffixes are listed
-0.236047	about instruction latencies are listed
-0.237747	SSE2 instruction set, as listed
-0.236411	on using the instructions listed
-0.212315	with the function ReadTSC listed
-0.357913	be extended to the general
-0.358425	uses logarithms in the general
-0.237777	have to consult the general
-0.336298	the future due to general
-0.344732	integer registers available for general
-0.237501	can be justified for general
-0.335904	and _mm_free. A more general
-0.232696	(see page 53). No general
-0.237863	C++ is definitely the preferred
-0.237863	For these reasons, the preferred
-0.585574	made) then it is preferred
-0.357869	are aligned. It is preferred
-0.331576	directly compiled version is preferred
-0.538800	the function returns. The preferred
-0.583238	arrays, it may be preferred
-0.339363	standard PC processors are preferred
-0.213694	4 - 16 clock cycles,
-0.535939	just a few clock cycles,
-0.295386	(3 - 10 clock cycles,
-0.072106	addition takes 5 clock cycles,
-0.072106	operation takes 5 clock cycles,
-0.213694	3 - 6 clock cycles,
-0.349391	40 - 80 clock cycles,
-0.213694	12 - 25 clock cycles,
-0.237746	possible to vectorize code explicitly
-0.566102	then tell the compiler explicitly
-0.314292	have to prefetch data explicitly
-0.235123	to deallocate the space explicitly
-0.451908	do the CPU dispatching explicitly
-0.331473	do the algebraic reductions explicitly
-0.343869	like throw(A,B,C) to tell explicitly
-0.307554	well specify the alignment explicitly
-0.787410	share the same memory space.
-0.291977	directive never takes memory space.
-0.537895	a lot of cache space.
-0.537895	a waste of cache space.
-0.230671	and uses more cache space.
-0.524515	that take up cache space.
-0.289551	8 bytes of storage space.
-0.286817	time, RAM and disk space.
-0.357673	optimal solution is a fixed
-0.324957	in advance, because a fixed
-0.407638	possible to insert a fixed
-0.325039	to declare objects and fixed
-0.314413	with a small and fixed
-0.023391	a circular buffer with fixed
-0.236186	in regular patterns with fixed
-0.331435	Aligning dynamically allocated memory Memory
-0.231885	Graphics and sound processing Memory
-0.391749	swap memory to disk. Memory
-0.035781	access ...................................................................................................... 21 3.13 Memory
-0.035781	heavily loaded. 21 3.13 Memory
-0.251342	for high precision math. Memory
-0.165164	be cleaned up include: Memory
-0.165164	operating systems, and API's. Memory
-0.575602	the first byte of zero.
-0.407580	set an array to zero.
-0.293789	sets all elements to zero.
-0.237532	all other bits to zero.
-0.237848	the sign bit are zero.
-0.236118	make this extra element zero.
-0.235409	become imprecise or simply zero.
-0.233035	with all 0's gives zero.
-0.478370	small bits in a non-sequential
-0.129429	are accessed in a non-sequential
-0.214975	or accessed in a non-sequential
-0.339726	are indexed in a non-sequential
-0.382645	big data structures with non-sequential
-0.292676	would make the access non-sequential
-0.341858	have fast ways of multiplying
-0.806871	has hardware support for multiplying
-0.444202	should be done by multiplying
-0.576414	2 is faster than multiplying
-0.237613	powers of 2 when multiplying
-0.228172	a to double before multiplying
-0.228172	are too big before multiplying
-0.228172	to double precision before multiplying
-0.549000	integer to floating point Conversion
-0.535018	point numbers and integers Conversion
-0.235315	type of registers used. Conversion
-0.222389	Float to integer conversion Conversion
-0.222389	Integer to float conversion Conversion
-1.108792	instruction set is enabled. Conversion
-0.296254	integer to floating point. Conversion
-0.165164	the modulo operator %. Conversion
-0.341900	make a loop count down
-0.235219	the time consumption was down
-0.085805	Function calls may slow down
-0.085805	and writes may slow down
-0.192843	table lookup operations slow down
-0.287867	in ebx ; shift down
-0.279482	no need to break down
-0.165169	the program is shut down
-0.358616	logical architecture of the software.
-0.358483	into account in the software.
-0.352650	to efficient use of software.
-0.344371	API and the application software.
-0.234012	the lifetime of your software.
-0.224926	than third party security software.
-0.212298	versions of their 23 software.
-0.200026	compatibility with some legacy software.
-0.356630	became available because the interpreted
-0.348732	The measured time is interpreted
-0.339031	of a loop is interpreted
-0.420873	number when i is interpreted
-0.237896	as it is and interpreted
-0.508230	code. For example, in interpreted
-0.325037	no such advantage in interpreted
-0.355653	negative integer will be interpreted
-0.861626	that the code is exactly
-0.477816	an overloaded operator is exactly
-0.808073	the template parameters are exactly
-0.237396	in disguise. Enums are exactly
-0.237649	These different methods have exactly
-0.489684	some compilers will make exactly
-0.322496	and Sum3 are doing exactly
-0.311787	be difficult to measure exactly
-0.520399	This code has a jump
-0.525661	as a table of jump
-0.294165	counts for threads that jump
-0.237636	compiler can eliminate this jump
-0.343310	it needs an extra jump
-0.292301	end of array ; jump
-0.347987	call makes the microprocessor jump
-0.327032	initializer lists, switch statement jump
-0.348735	overall computation time is determined
-0.314356	type of storage is determined
-0.237591	the time slices is determined
-0.355421	processors available can be determined
-0.459180	instruction sets can be determined
-0.352115	is loaded cannot be determined
-0.572640	in some cases be determined
-0.350993	given task is often determined
-0.024594	int bb[], short int cc[])
-0.237756	function or every code line.
-0.240222	beginning of a cache line.
-0.448068	without loading a cache line.
-0.240222	of occupying a cache line.
-0.436031	to the same cache line.
-0.308555	share the same cache line.
-0.226245	into an arbitrary cache line.
-0.633559	size of a matrix line.
-0.356763	solutions. Patches should be easily
-0.353301	parallel structure that can easily
-0.540665	object. The compiler can easily
-0.292259	gain in performance can easily
-0.236188	allocation. The heap can easily
-0.325150	human readable and not easily
-0.232576	of the problem cannot easily
-0.232576	many encryption algorithms, cannot easily
-0.042292	support for runtime type identification
-0.042292	not use runtime type identification
-0.042292	or require runtime type identification
-0.042292	pointer No runtime type identification
-0.057357	identification (RTTI) Runtime type identification
-0.027732	53 7.21 Runtime type identification
-0.027732	effort. 7.21 Runtime type identification
-0.230117	18.3. Predefined macros Compiler identification
-0.358541	right positions in the vectors.
-0.357115	an integral number of vectors.
-0.237901	trigonometric functions, etc. in vectors.
-0.357548	allows larger floating point vectors.
-0.237382	also allows 256-bit integer vectors.
-0.293497	easily be organized into vectors.
-0.231883	for things like adding vectors.
-0.286816	vector as two 128-bit vectors.
-0.237258	0) ? (cc[i] + 2)
-0.237115	2 > v.i * 2)
-0.431875	< 100; i += 2)
-0.230117	< size; i += 2)
-0.230117	< 20; i += 2)
-0.308526	= &SelectAddMul_SSE41; (iset >= 2)
-0.017523	n.a. x*x*x*x*x*x*x*x = ((x2) 2)
-0.017523	((a*x+b)*x+c)*x+d x*x*x*x*x*x*x*x = ((x2) 2)
-0.237629	interfere with real time applications.
-0.339005	users in many different applications.
-0.349149	is best for all applications.
-0.313826	is supported in such applications.
-0.323845	quite inefficient in large applications.
-0.427934	Microsoft compiler for Windows applications.
-0.212292	sufficient for less intensive applications.
-0.165159	in special mathe- matical applications.
-0.237507	register is volatile. The volatile
-0.237507	is enabled. Volatile The volatile
-0.345224	optimized away. Note that volatile
-0.332740	effect of the keyword volatile
-0.234010	seconds was not declared volatile
-0.228153	Example 7.3. Explain volatile volatile
-0.165164	// Example 7.3. Explain volatile
-0.165164	ReadTSC() { int dummy[4]; volatile
-0.291797	realistic number of cache misses
-0.291797	The penalty of cache misses
-0.228454	of the code, cache misses
-0.228454	time a thousand cache misses
-0.228454	to disk. Provoke cache misses
-0.043506	64 matrix size causes misses
-0.043506	512 matrix size causes misses
-0.226793	be stored together Cache misses
-0.208014	Do not use lookup tables
-0.043421	132 14.1 Use lookup tables
-0.043421	topics 14.1 Use lookup tables
-0.308536	for example to produce tables
-0.184886	effect. GOT and PLT tables
-0.184886	suppress. GOT and PLT tables
-0.074783	Vectorized table lookup Lookup tables
-0.074783	the table lookup. Lookup tables
-1.118238	are accessed in a random
-0.037136	allocated and deallocated in random
-0.788427	method is useful for random
-0.237776	may be caused by random
-0.576411	file is faster than random
-0.293880	making the data more random
-0.237550	collection can occur at random
-0.230699	BSD and Mac OS X
-0.093317	compiler for Mac OS X
-0.093317	references. 64-bit Mac OS X
-0.165200	for 32-bit Mac OS X
-0.092268	Linux. 32-bit Mac OS X
-0.146974	The Intel-based Mac OS X
-0.218590	compiler #define Alignd(X) __declspec(align(16)) X
-0.251367	compiler, etc. #define Alignd(X) X
-0.324370	undetected. Converting class objects Conversions
-0.336170	< 10) { ... Conversions
-0.747022	stack registers are used. Conversions
-0.290673	Floating point precision conversion Conversions
-0.317408	with long double precision. Conversions
-1.108792	instruction set is enabled. Conversions
-0.148738	to another platform. 14.8 Conversions
-0.148738	and double..................................................................................... 140 14.8 Conversions
-0.655341	precision variables in the YMM
-0.356660	a change in the YMM
-0.132699	AVX instruction set and YMM
-0.237874	requires only SSE). The YMM
-0.196029	128-bit XMM and 256-bit YMM
-0.196029	(see below). The 256-bit YMM
-0.200037	to 256-bit registers named YMM
-0.237590	time while if is resolved
-0.615076	a dynamic library is resolved
-0.293854	if because #if is resolved
-0.458237	performance because they are resolved
-0.525322	linked function is not resolved
-0.332988	template parameter is always resolved
-0.323238	template parameters are always resolved
-0.357827	sufficiently accurate for the purpose
-0.292946	both loops // The purpose
-0.236792	thread is terminated. The purpose
-0.236792	stored in y. The purpose
-0.236792	memory, using new. The purpose
-0.418633	enough for the specific purpose
-0.232331	function libraries. Several special purpose
-0.346484	shared object without the -fpic
-0.339335	than when compiled with -fpic
-0.358802	the shared object without -fpic
-0.223988	process when compiled without -fpic
-0.223988	shared object compiled without -fpic
-0.220559	disadvantage of compiling without -fpic
-0.534639	compiled with the option -fpic
-0.356787	worth considering is the D
-0.237872	vector class library). The D
-0.237870	Compilers and IDE's for D
-0.234966	class B2; 54 class D
-0.234966	B1; class B2; class D
-0.226767	is the D language. D
-0.165169	drawbacks of C++. Yet, D
-0.355105	object as if it had
-0.354486	same as if you had
-0.548747	But if the program had
-0.340210	supported 128-bit vector registers had
-0.233032	<< 5. If columns had
-0.231392	vector size. Later models had
-0.212298	example, the first PC's had
-0.237385	of functions with integer parameters.
-0.236788	up to fourteen register parameters.
-0.236700	each set of template parameters.
-0.292385	function type and its parameters.
-0.235744	have more than four parameters.
-0.235619	from the same few parameters.
-0.218566	left for transferring additional parameters.
-0.235486	ebx, eax ebx, 1 ebx,
-0.365950	2. The instruction add ebx,
-0.225747	next two instructions add ebx,
-0.311499	; edx = r ebx,
-0.148153	= r ebx, eax ebx,
-0.148153	ebx, 31 ebx, eax ebx,
-0.218578	ebx, eax ebx, 31 ebx,
-0.237926	clock. This gives a measure
-0.344712	use this function to measure
-0.457793	of the program to measure
-0.520448	function we want to measure
-0.707035	may be difficult to measure
-0.325357	of test data and measure
-0.455123	clock counts that you measure
-0.503705	branch. If it is poorly
-0.477510	that the value is poorly
-0.335869	if the branch is poorly
-0.615718	possible to replace a poorly
-0.294152	if the branches are poorly
-0.165174	problems of the original, poorly
-0.165174	resource-hungry applications to perform poorly
-0.460025	various ways to do this:
-0.046601	implementation may look like this:
-0.046601	setup may look like this:
-0.098811	may typically look like this:
-0.051697	factorial function looks like this:
-0.051697	optimized code looks like this:
-0.051697	vector classes looks like this:
-0.526721	cases described in the sections
-0.237543	section and read-only data sections
-0.353316	not do. The following sections
-0.527311	mentioned in the above sections
-0.304396	code cache. The subsequent sections
-0.035782	-ffunction- sections /Gy -ffunction- sections
-0.035782	ced functions) /Gy -ffunction- sections
-0.229233	and resolve compatibility problems. Software
-0.301476	even swapped to disk. Software
-0.281555	computer is restarted anyway. Software
-0.013568	Intel: "IA-32 Intel Architecture Software
-0.165169	different user access rights. Software
-0.165169	and API's. Memory swapping. Software
-0.231902	precise floating point calculations. Even
-0.398517	CPUs is not needed. Even
-0.309099	would be predicted well. Even
-0.226767	in a pre-calculated table. Even
-0.304407	an Intel Pentium 4. Even
-0.165164	is unfortunately very common. Even
-0.165164	many years to come. Even
-0.548690	16, last byte at 19
-0.680009	are listed in table 19
-0.212298	3.5 Program loading ....................................................................................................... 19
-0.200026	3.4 Automatic updates .................................................................................................... 19
-0.200026	of compiler options....................................................................................... 160 19
-0.165164	_WIN64 _M_X64 _M_X64 162 19
-0.165164	move or key press. 19
-0.208313	version if speed is important.
-0.279639	preferred when speed is important.
-0.279639	code where speed is important.
-0.313952	question when efficiency is important.
-0.237253	recommended if portability is important.
-0.341517	becoming more and more important.
-0.251367	Virtualization is becoming increasingly important.
-0.460455	saved either in the carry
-0.356426	is kept in the carry
-0.356263	a register. If the carry
-0.352505	flags register into the carry
-0.353898	carry) instructions where the carry
-0.314275	and don't modify the carry
-0.237882	to the next. The carry
-0.349622	subsequent times because of lazy
-0.382590	calls. The principle of lazy
-0.233575	make position-independent code and lazy
-0.233575	uses position-independent code and lazy
-0.237757	linker. The delay on lazy
-0.236124	a single session. But lazy
-0.306274	important. Some systems allow lazy
-0.538529	the previous value as xn
-0.346146	Exp(float x) { float xn
-0.335434	} Here, each value xn
-0.323615	n++) { sum += xn
-0.291546	Thus, we will calculate xn
-0.165164	+= xn / nfac; xn
-0.165164	by the series: ex xn
-0.163207	value of the time stamp
-0.413162	obtained with the time stamp
-0.318783	addition to) the time stamp
-0.345208	Windows: __rdtsc()). The time stamp
-0.229732	using the so-called time stamp
-0.229732	etc. // Returns time stamp
-0.356629	optimized version because the debugging
-0.569255	frame is used for debugging
-0.237499	for the IDE, for debugging
-0.234374	may be removed after debugging
-0.337985	versions and turn off debugging
-0.231387	debug version with full debugging
-0.165169	the cost of verifying, debugging
-0.443224	to int x = 10;
-0.237214	const int NumberOfTests = 10;
-0.787339	a = b / 10;
-0.502048	= (unsigned int)b / 10;
-0.219105	= (unsigned int)a / 10;
-0.625845	a = b % 10;
-0.477516	= (unsigned int)b % 10;
-0.525896	behave according to the table.
-0.463072	as listed in the table.
-0.561322	given in the following table.
-0.340107	without using the virtual table.
-0.527311	shown in the above table.
-0.683956	in a procedure linkage table.
-0.165169	values in a pre-calculated table.
-0.237919	by a factor of 1,
-0.237808	Weekdays { Sunday = 1,
-0.881292	value than 0 or 1,
-0.286137	memory allocations of sizes 1,
-0.521674	Software Developer’s Manual", Volume 1,
-0.008673	int FactorialTable[13] = {1, 1,
-0.462189	the elements of a vector,
-0.063326	elements Total size of vector,
-0.345956	the appropriate type of vector,
-0.301814	short int in one vector,
-0.301814	R value in one vector,
-0.488813	values in the next vector,
-0.010348	bool b) { if (b)
-0.309524	* 3; } if (b)
-0.047684	y; bool b; if (b)
-0.047684	z; bool b; if (b)
-1.229646	the beginning of the object,
-0.833194	point to the same object,
-0.341556	when copying a large object,
-0.312437	that owns the allocated object,
-0.322760	variable in the shared object,
-0.731213	function in a shared object,
-0.318577	of returning a composite object,
-0.237927	partial template specialization is allowed
-0.356761	which imprecisions should be allowed
-0.237399	an STL container are allowed
-0.237399	'@' and '$' are allowed
-0.747036	a function is not allowed
-0.351961	function name is not allowed
-0.314234	an example. The only allowed
-0.347252	a container than to delete
-0.382591	if you forget to delete
-0.303998	memory with new and delete
-0.152555	to using new and delete
-0.152555	CString uses new and delete
-0.152555	the operators new and delete
-0.152555	alloca over new and delete
-0.233314	to the stack pointer. Likewise,
-0.581583	from the shared object. Likewise,
-0.326483	processors and instruction sets. Likewise,
-0.229224	question without generating overflow. Likewise,
-0.218566	of a different type. Likewise,
-0.200026	if a is false. Likewise,
-0.165164	of the second operand. Likewise,
-0.327904	instruction sets is as follows:
-0.288107	header files are as follows:
-0.638376	can be calculated as follows:
-0.232537	measured results were as follows:
-0.320460	memory if organized as follows:
-0.794050	can be expressed as follows:
-0.232537	of data elements, as follows:
-0.350578	elements of a vector simultaneously.
-0.237094	remove or modify objects simultaneously.
-0.228841	multiple processes or threads simultaneously.
-0.228841	can run eight threads simultaneously.
-0.222377	that run many processes simultaneously.
-0.222377	to do two jobs simultaneously.
-0.165169	jobs simultaneously or seemingly simultaneously.
-0.583816	instruments in the code itself
-0.357691	good as the compiler itself
-0.355723	resources than the program itself
-0.419017	in the test program itself
-0.344376	bigger than the application itself
-0.234010	powN template is calling itself
-0.229238	interpretation on the device itself
-0.140798	will be an efficient solution.
-0.140798	also be an efficient solution.
-1.043817	is the most efficient solution.
-0.424239	might be a better solution.
-0.413453	certainly a very inefficient solution.
-0.226773	and the most reliable solution.
-0.212309	best and most up-to-date solution.
-0.237812	though the rules of algebra
-0.172569	the many rules of algebra
-0.237528	XOR b Bit vector algebra
-0.351668	and shift Floating point algebra
-0.235136	Whole program optimization Integer algebra
-0.234799	by xx-xx--x- reciprocal Boolean algebra
-0.230100	mathematical calculations including linear algebra
-0.357374	small pieces of a suitable
-0.346087	many times with a suitable
-0.346087	>> n with a suitable
-0.237391	purpose of finding a suitable
-0.341861	provided several examples of suitable
-0.652915	write instructions are not suitable
-0.349166	be made for all suitable
-0.237818	using template metaprogramming // Template
-0.304408	alternative is the Windows Template
-0.323283	Library (ATL) and Windows Template
-0.231390	happened to be possible. Template
-0.200032	containers is the Standard Template
-0.165169	in the STL (Standard Template
-0.165169	used in the Active Template
-0.237832	The heap manager can spend
-0.353812	that it does not spend
-0.314461	maintain. The time you spend
-0.235405	intensive may very well spend
-0.235123	user input. Many programs spend
-0.311012	so you will never spend
-0.310680	function libraries Some applications spend
-0.232338	Such events as task switches
-0.016334	the number of context switches
-0.016334	The number of context switches
-0.069380	background jobs. The context switches
-0.069380	big program. Frequent context switches
-0.165190	memory caching. 3.14 Context switches
-0.122672	to be renewed. Context switches
-0.279213	swapping of memory to disk.
-0.279213	to swap memory to disk.
-0.293793	or even swapped to disk.
-0.293826	or resource files from disk.
-0.246863	around on the hard disk.
-0.196044	slow and fragmented hard disk.
-0.200043	file to a floppy disk.
-0.102803	unequally can become a serious
-0.102803	way has become a serious
-0.458848	code, but there are serious
-0.237614	range is possibly more serious
-0.237510	memcpy is unsafe because serious
-0.350021	classes. Security The most serious
-0.231397	the execution considerably. Another serious
-0.055078	+ 2, b * c);
-0.281033	+ two, b * c);
-0.008673	bc = _mm_mullo_epi16 (b, c);
-0.165179	function a = CriticalFunction(b, c);
-0.165179	pointer a = (*CriticalFunction)(b, c);
-0.090744	into the Microsoft Visual Studio
-0.090744	plug-in to Microsoft Visual Studio
-0.090744	mentioned below. Microsoft Visual Studio
-0.108193	for multi-core processing. Visual Studio
-0.108193	available for free. Visual Studio
-0.108193	objects, respectively (MS Visual Studio
-0.165190	80x86 / x64 (Visual Studio
-0.354881	Example 7.22 short int a[100];
-0.442361	S2 { public: int a[100];
-0.228175	out by 4 float a[100];
-0.228175	of a list float a[100];
-0.228175	// Example 7.26b float a[100];
-0.228175	// Example 7.26a float a[100];
-0.345109	Example 8.14a int i, a[100];
-0.237952	I have used the trick
-0.292538	a is true. The trick
-0.236433	a2*b1) / (b1*b2); The trick
-0.236433	when type-casting pointers: The trick
-0.236433	14.23 page 143. The trick
-0.236433	value of sum. The trick
-0.308093	done with a special trick
-0.446156	will not have the disadvantages
-0.294164	the advantages over the disadvantages
-0.382785	given in advance. The disadvantages
-0.521746	are. However, there are disadvantages
-0.237035	it does have some disadvantages
-0.236609	how to overcome these disadvantages
-0.352368	systems have the following disadvantages
-0.236640	ebx. Only the registers eax,
-0.225745	DWORD PTR[ecx+eax*4],ebx eax, 1 eax,
-0.225745	?Func2@@YAXQAHAAH@Z ENDP ecx, 1 eax,
-0.212304	loop increment i++. cmp eax,
-0.330824	ecx, DWORD PTR [esp+8] eax,
-0.165169	PTR [edx] DWORD PTR[ecx+eax*4],ebx eax,
-0.165169	$B2$2: mov mov 2:8+esp eax,
-1.061483	when the code is distributed
-0.102569	The program code is distributed
-0.237899	code is compiled and distributed
-0.546958	libraries need to be distributed
-0.529140	file needs to be distributed
-0.352224	used for function libraries distributed
-0.237759	from different compilers is generally
-0.294047	A WTL application is generally
-0.237899	is so important and generally
-0.445413	operators Integer operations are generally
-0.314130	threads? Container classes are generally
-0.552937	integer, then you can generally
-0.352108	by 16. You can generally
-0.234062	of portability to 64-bit mode,
-0.234062	edx, respectively. (In 64-bit mode,
-0.329193	mode than in 32-bit mode,
-0.477247	inefficient, especially in 32-bit mode,
-0.528216	-fpic in 64 bit mode,
-0.308003	useful in 32- bit mode,
-0.200043	a file in exclusive mode,
-0.548146	and 64-bit Windows and Linux.
-0.548146	in both Windows and Linux.
-0.339340	same way as in Linux.
-0.237638	programs but rarely in Linux.
-0.538059	and Intel compilers for Linux.
-0.515245	Eclipse when compiling for Linux.
-0.704099	for 32- and 64-bit Linux.
-0.750487	void test () { C1
-0.235230	}; void F1() { C1
-0.235230	}; void g() { C1
-0.230237	object belongs to class C1
-0.305594	void f(); }; class C1
-0.230237	"Hello "; Disp(); class C1
-0.230237	// Example 7.44 class C1
-0.349044	The so-called objects are instances
-0.333073	be applied to all instances
-0.235357	}; // Make all instances
-0.237190	that CParent::Hello() has multiple instances
-0.345492	A template with many instances
-0.236703	Two or more template instances
-0.165169	can hold many renamed instances
-1.267050	time the function is called,
-0.507447	When the function is called,
-0.162267	time a function is called,
-0.642415	a shared object is called,
-0.355183	virtual function will be called,
-0.237428	will most likely be called,
-0.325312	the computer during the update
-0.237861	are often abusing the update
-0.820170	without the need to update
-0.294175	reason for updating. The update
-0.237798	of an update, or update
-0.407837	you can make an update
-0.236614	install this important new update
-0.237703	(x = 2.0; x <=
-0.234212	>= min && i <=
-0.234212	(i = 2; i <=
-0.236440	in the interval 0 <=
-0.234517	n = 1; n <=
-0.165169	((unsigned int)(i - min) <=
-0.165169	0x3F800000; // Now 1.0 <=
-0.885288	from floating point to integer.
-0.342286	should preferably be an integer.
-0.279963	access x as an integer.
-0.279963	is treated as an integer.
-0.279963	be represented as an integer.
-0.313596	bits of the 32-bit integer.
-0.265197	number to the nearest integer.
-0.357009	function call by the body
-0.460598	very inefficient because the body
-0.357416	by declaring the function body
-0.552267	advantageous if the loop body
-0.303978	clearly better. The loop body
-0.303978	mov eax,0. The loop body
-0.292400	used or if its body
-0.276046	cache for the hardware definition
-0.063785	C++, and a hardware definition
-0.010001	coded in a hardware definition
-0.020238	programmed in a hardware definition
-0.063785	instructions, where a hardware definition
-0.180108	connect them. The hardware definition
-0.357509	.NET framework and the Java
-0.341640	run. Some implementations of Java
-0.314524	of the features of Java
-0.569959	that is used for Java
-0.342984	.NET and the best Java
-0.337085	virtual machine. The best Java
-0.331851	by emulating the so-called Java
-0.237306	internal multi-threading, e.g. Intel Math
-0.236036	x86-64 platforms. AMD AMD Math
-0.184760	old version of Intel's Math
-0.184760	are supplied in Intel's Math
-0.224949	_mm_exp_ps _mm_exp_pd AMD Core Math
-0.194036	such as the "Intel Math
-0.148745	purposes (www.boost.org). The "Intel Math
-0.559048	code that the compiler generates
-0.341700	code that a compiler generates
-0.783815	} } The compiler generates
-0.552127	} The Intel compiler generates
-0.416635	while the type conversion generates
-0.265197	subtracting 1 from -128 generates
-0.200043	time then the sampling generates
-0.237872	than other CPUs for executing
-0.331767	lot of optimization by executing
-0.330910	their execution time on executing
-0.313552	clock cycles spent on executing
-0.200863	counter before and after executing
-0.200863	count before and after executing
-0.165174	to x 43 speculatively executing
-0.237890	calling conventions. FreeBSD and Open
-0.231888	Agner's vector class library. Open
-0.230802	Does not optimize well. Open
-0.212298	Compiler v. 8.42n, 2004. Open
-0.200026	mutexes. Open database connections. Open
-0.165164	and image processing. Yeppp. Open
-0.165164	brushes, etc. Locked mutexes. Open
-1.531749	const int size = 256;
-0.771366	= 0; i < 256;
-0.523925	do different kinds of optimizations.
-0.352389	constant propagation and other optimizations.
-0.235405	unfortunately it prevents certain optimizations.
-0.334420	the possibility for further optimizations.
-0.068034	efficient and enables interprocedural optimizations.
-0.068034	modules. This enables interprocedural optimizations.
-0.200032	giving access to low-level optimizations.
-0.501276	== 0) { // Cache
-0.760232	should be stored together Cache
-0.074781	and more important. 9.2 Cache
-0.074781	data ......................................................................................... 87 9.2 Cache
-0.074781	sequentially .......................................................................................... 96 9.10 Cache
-0.074781	order is opposite). 9.10 Cache
-0.200037	_mm_stream_si128 SSE2 Table 9.2. Cache
-0.350947	the software to be slower
-1.268541	is likely to be slower
-0.313817	make dynamic link libraries slower
-0.337808	soft processor is much slower
-0.235563	thread will always run slower
-0.332398	is likely to execute slower
-0.165169	but neither faster nor slower
-0.358010	user friendly. It is free
-0.421367	delete or malloc and free
-0.695304	compiler is available for free
-0.457408	most compilers do not free
-0.454224	may be only one free
-0.233306	this topic, see my free
-0.229234	even though it could free
-0.353877	efforts on the time consuming
-0.233665	graphics function is time consuming
-0.233665	system can be time consuming
-0.233665	problematic because these time consuming
-0.122672	to avoid the time- consuming
-0.122672	idea to put time- consuming
-0.165184	executing library functions. Time- consuming
-0.355957	the container is to hold
-0.314294	an extra register to hold
-0.335991	is big enough to hold
-0.380440	then each vector can hold
-0.292262	renaming. The CPU can hold
-0.292262	the vector registers can hold
-0.236191	Each cache line can hold
-1.032307	different parts of the memory,
-0.488348	from a variable in memory,
-0.520343	it is stored in memory,
-0.806725	to be stored in memory,
-0.272750	stored in dynamically allocated memory,
-0.272750	such as dynamically allocated memory,
-0.165179	Far Systems with segmented memory,
-0.261490	have no cache (see p.
-0.209031	to using templates (see p.
-0.209031	down dependency chains (see p.
-0.209031	no branch prediction (see p.
-0.209031	than the throughput (see p.
-0.289557	variables. (See thread-local storage p.
-0.279495	the stack (see above, p.
-0.327981	= 0; c < SIZE;
-0.010663	= 0; r < SIZE;
-0.010663	= 1; r < SIZE;
-0.213295	= 0; r1 < SIZE;
-0.298100	the loop in this case.
-0.298100	is needed in this case.
-0.298100	error message in this case.
-0.298100	the reduction in this case.
-0.335577	right formula in each case.
-0.313596	above for the 32-bit case.
-0.230816	same code in either case.
-0.237872	loop for calculations: for (
-0.048397	// Array size Alignd (
-0.048397	three aligned arrays Alignd (
-0.048397	int bb[size] ); Alignd (
-0.165174	2 52 , longdoublevalue (
-0.165174	2 23 , doublevalue (
-0.165174	calculated as follows: floatvalue (
-0.358397	exception handling can be expensive
-0.354562	uncached write is more expensive
-0.236222	makes program development more expensive
-0.324797	of the time, but expensive
-0.330778	level-2 cache are so expensive
-0.236967	you can get very expensive
-0.437842	strategies It is quite expensive
-0.237893	corrections for sign and rounding
-0.943622	much longer time than rounding
-0.575457	so the floating point rounding
-0.354482	improve efficiency by using rounding
-0.700255	difference in speed between rounding
-0.525793	for the difference between rounding
-0.236153	that supports this). Use rounding
-0.336792	VIA processors. See page 130
-0.336792	Intel CPU. See page 130
-0.346286	non-Intel processors (see page 130
-0.326653	different CPUs. (See page 130
-0.212315	128 17.4 129 129 130
-0.251360	in Intel compiler ......................................................................... 130
-0.165179	problem are the following: 130
-0.314753	or the user is far
-1.104119	be stored in a far
-0.237890	storage, far pointers, and far
-0.235407	labels that have values far
-0.761635	by using the keyword far
-0.701739	This is of course far
-0.200026	be huge). Far storage, far
-0.340864	stored sequentially in memory. They
-0.288274	are called global variables. They
-0.224934	implemented as three branches. They
-0.218566	on publicly available information. They
-0.330816	Linux, 32-bit and 64-bit. They
-0.165164	capabilities are very smart. They
-0.165164	profilers are often unreliable. They
-0.341836	explicitly what kind of exceptions
-0.514638	has to check for exceptions
-0.237790	be left out if exceptions
-0.324852	unrecoverable error without using exceptions
-0.212298	function does not throw exceptions
-0.200026	to check that thrown exceptions
-0.251342	arraysize) { // Catch exceptions
-0.502953	methods depend on the system,
-0.314686	systems. The smaller the system,
-0.524626	microprocessor and the operating system,
-0.490538	determined by the operating system,
-0.218882	there is no operating system,
-0.218882	In the Windows operating system,
-0.501620	in a protected operating system,
-0.504507	increasing function of the absolute
-0.336147	We can take the absolute
-0.979515	takes to calculate the absolute
-0.237628	within the DLL use absolute
-0.293609	code section contains no absolute
-0.236951	compiler sometimes uses 32-bit absolute
-0.296268	sign bit to compare absolute
-0.753575	d, y; y = (a
-0.237214	!(a < b) = (a
-0.312053	= 0; } if (a
-0.235660	b, c, d; if (a
-0.235660	// Example 14.15b if (a
-0.235660	// Example 14.15a if (a
-0.165184	inline function #define MAX(a,b) (a
-0.358542	that appears in the machine
-0.591225	as the number of machine
-0.314532	and then transferred as machine
-0.237277	code is translated into machine
-0.292136	and the Java virtual machine
-0.810624	one or a few machine
-0.200026	so that the resulting machine
-0.382693	;edx=addressinr ; ecx = Induction
-0.237701	{ int i; int Induction
-0.237583	temp += 9; } Induction
-0.324125	variables for array elements Induction
-0.328363	for other integer expressions Induction
-0.521663	Loop invariant code motion Induction
-0.165164	= temp; } 70 Induction
-0.237917	the time slices to 120
-0.237890	See page 95 and 120
-0.355556	optimally aligned. See page 120
-0.349061	or later instruction set. 120
-0.212298	120 12.10 Conclusion .......................................................................................................... 120
-0.200026	or 3-dimensional vectors ....................................................... 120
-0.165164	Aligning dynamically allocated memory................................................................. 120
-0.357990	do. Hence, it is hardly
-0.537333	assume that there is hardly
-0.324977	large memory model is hardly
-0.294152	instructions, but these are hardly
-0.314565	use 64-bit integers with hardly
-0.445986	1 cache. This has hardly
-0.235224	so that there was hardly
-0.357074	same thing and the CPUID
-0.524644	chosen based on the CPUID
-0.489323	The time when the CPUID
-0.342086	this task when the CPUID
-0.342086	than 33% when the CPUID
-0.347054	you may call the CPUID
-0.314242	model number. The only CPUID
-0.325419	functions that access the saved
-0.357869	carry bit can be saved
-0.355915	intermediate results should be saved
-0.351194	carry bit must be saved
-0.237851	and function calls are saved
-0.293792	possible if F1 has saved
-0.311533	of ebx that was saved
-0.358054	the target if the changes
-0.314770	is almost independent of changes
-0.313774	function. If the version changes
-0.313688	through 14, with some changes
-0.290359	a dispatcher. The dispatcher changes
-0.376028	that the last index changes
-0.222371	functions The keyword __fastcall changes
-0.237856	b and c are integers,
-0.346100	32-bit integers and 64-bit integers,
-0.234065	you to define 64-bit integers,
-0.224963	when applied to 32-bit integers,
-0.333600	clock cycles for 32-bit integers,
-0.224963	and b are 32-bit integers,
-0.279509	represented as two 32-bit integers,
-0.382861	I have implemented a collection
-0.237595	and time consuming. A collection
-0.145012	task switches and garbage collection
-0.145012	allocation, deallocation and garbage collection
-0.145543	too fragmented. This garbage collection
-0.145543	manager will start garbage collection
-0.165179	For example, the Boost collection
-0.292863	note that my optimization manuals
-0.344101	latest versions of these manuals
-0.231287	code examples in these manuals
-0.292637	or in the programming manuals
-0.231400	162 19 Literature Other manuals
-0.304396	first manual. The subsequent manuals
-0.423653	This series of five manuals
-0.551432	predicted depends on the processor.
-0.918628	cycles, depending on the processor.
-0.470751	64, depending on the processor.
-0.614245	running on an Intel processor.
-0.338093	on the Pentium 4 processor.
-0.306454	optimal on the actual processor.
-0.218586	as a so-called soft processor.
-0.314637	147 14.12 Position-independent code Shared
-0.381152	relocation at load time. Shared
-0.291965	in 32 bit Linux Shared
-0.201259	mode, as explained below. Shared
-0.201259	system, as explained below. Shared
-0.524782	Shared objects in BSD Shared
-0.371957	name for local references. Shared
-0.255783	shows, the method of storing
-0.255783	old C-style method of storing
-0.568569	It is used for storing
-0.237135	use a database for storing
-0.237135	used as buffers for storing
-0.346742	the innermost loop by storing
-0.342592	is implemented simply by storing
-0.237874	end users have. The developers
-0.341483	some disadvantages that make developers
-0.288671	advanced programmers and software developers
-0.308922	usability problems that software developers
-0.236314	details. Development time Some developers
-0.184760	resolve compatibility problems. Software developers
-0.184760	API's. Memory swapping. Software developers
-0.013328	int CriticalFunction_386(int parm1, int parm2)
-0.013328	int CriticalFunction_SSE2(int parm1, int parm2)
-0.013328	int CriticalFunction_AVX(int parm1, int parm2)
-0.055958	int CriticalFunction_Dispatch(int parm1, int parm2)
-0.293903	to sum1 from time T
-0.237585	{ return N; } T
-0.340420	or if the type T
-0.305796	N elements of type T
-0.236087	max(T const & a, T
-0.348564	<typename T> static inline T
-0.165169	class SafeArray { protected: T
-0.349515	at compile time to eliminate
-0.336047	because they fail to eliminate
-0.510331	a+1;. The compiler can eliminate
-0.332506	result. A compiler can eliminate
-0.312684	loop if this can eliminate
-0.346854	a2/b2; Here we can eliminate
-0.495662	list[i].b. It can also eliminate
-0.589346	not a power of 2:
-0.339391	defined as powers of 2:
-0.312988	1: printf("Beta"); break; case 2:
-0.250348	are discussed in manual 2:
-0.250348	loops" chapter in manual 2:
-0.329624	more detail in manual 2:
-0.322424	= a ; parameter 2:
-0.352646	Returning objects of a composite
-0.352646	inefficient. Objects of a composite
-0.352097	the parameter has a composite
-0.293628	Instead of returning a composite
-0.237923	of a parameter of composite
-0.378983	for the simplest cases, composite
-0.272284	preferred method for transferring composite
-0.626218	of a program. The profilers
-0.237761	Some common problems with profilers
-0.236308	of the code. Some profilers
-0.236063	is called CodeAnalyst. These profilers
-0.553348	not. There are various profilers
-0.232701	use AMD CodeAnalyst. Unfortunately, profilers
-0.200026	There are also third-party profilers
-0.314755	function names. But a highly
-0.237899	in many respects and highly
-0.453425	commpage. These functions are highly
-0.344419	best function libraries are highly
-0.484840	These function libraries are highly
-0.236498	However, such applications are highly
-0.236352	If you consider making highly
-0.294196	is interpreted again and again
-0.236862	or a nearby address again
-0.231892	stack and reading them again
-0.303110	a loop is interpreted again
-0.224948	read from 0x4700. Reading again
-0.218577	way three times. Then again
-0.200026	memory addresses is reused again
-0.237917	around. Adding 1 to 127
-0.345100	total offset bigger than 127
-0.537259	{...} // AVX version 127
-0.226761	65 33 11.8 127 127
-0.212298	stdint.h char 8 -128 127
-0.403853	floatvalue ( 1)sign 2exponent 127
-0.165164	65 65 33 11.8 127
-0.310520	dealt with in assembly language.
-0.343853	C, C++ or assembly language.
-0.343853	necessary to use assembly language.
-0.292117	round function using assembly language.
-0.262171	drivers may need assembly language.
-0.226787	considering is the D language.
-0.859938	in a hardware definition language.
-0.349239	the programmer to be aware
-0.349239	of dangers to be aware
-0.437140	but you should be aware
-0.337973	software. You should be aware
-0.337973	software developers should be aware
-0.426759	You should therefore be aware
-0.165190	5 / 2 (be aware
-0.312173	that support intrinsic functions. Alternatively,
-0.584102	than the cache size. Alternatively,
-0.231392	has its own stack. Alternatively,
-0.976343	when the function returns. Alternatively,
-0.228147	supported in such applications. Alternatively,
-0.218577	converted to OMF format. Alternatively,
-0.265177	(e.g. IsProcessorFeaturePresent in Windows). Alternatively,
-0.357555	that have floating point capabilities
-0.337943	observed between the optimization capabilities
-0.262714	chapter. Using the out-of-order capabilities
-0.011884	A microprocessor with out-of-order capabilities
-0.049623	} Microprocessors with out-of-order capabilities
-0.287385	massively parallel vector processing capabilities
-0.536956	if ((unsigned int)n < 4)
-0.627904	< 100; i += 4)
-0.201965	((B & 3) << 4)
-0.201965	17is calculated as(a << 4)
-0.201965	A | (B << 4)
-0.235777	} if (level >= 4)
-0.235777	else if (level >= 4)
-0.587736	linking is that the linker
-0.491502	is done by the linker
-0.349230	are relocated by the linker
-0.356258	option -fpie because the linker
-0.336048	-ffunction-sections) which allows the linker
-0.294184	32-bit (signed) address. The linker
-0.285346	support from both compiler, linker
-0.350875	unlimited 8 bytes = int64_t
-0.382683	int32_t long long or int64_t
-0.237704	256 unsigned 256 int int64_t
-0.330978	Iu32vec4 Vec4ui 64 2 int64_t
-0.291455	64 Iu32vec2 64 1 int64_t
-0.165169	__int64 64 -263 263-1 int64_t
-0.349622	very large number of bits.
-0.349622	an extended number of bits.
-0.232090	is extended to 64 bits.
-0.232090	a double uses 64 bits.
-0.236525	bits rather than 32 bits.
-0.426961	use of the extra bits.
-0.234014	by ignoring the higher bits.
-1.223007	order to make the measurements
-0.331824	vary dynamically and that measurements
-0.349737	a program. The time measurements
-0.236304	CPU core during time measurements
-0.382384	microseconds to execute then measurements
-0.347104	hot spot and make measurements
-0.237035	decide to do some measurements
-0.956986	The fact that the representation
-0.294178	old DOS compilers). The representation
-0.851425	of the floating point representation
-0.324737	example 8.15b. The integer representation
-0.281789	stored in a binary representation
-0.253517	right-most 1-bit in binary representation
-0.201958	bits of its binary representation
-0.233591	// Example 8.7 int SomeFunction
-0.233591	// Example 8.9b int SomeFunction
-0.233591	// Example 8.9a int SomeFunction
-0.233591	// Example 8.11b int SomeFunction
-0.233591	// Example 8.11a int SomeFunction
-0.237206	// Example 7.1 float SomeFunction
-0.236553	9.3 #include <malloc.h> void SomeFunction
-0.237532	execution speed or program size,
-0.237393	supported instruction sets, cache size,
-0.765824	by the cache line size,
-0.231892	of 64 bits total size,
-0.230091	integer types available. declaration size,
-0.554624	circular buffer with fixed size,
-0.224934	the following table. Type size,
-0.355105	error message if it is.
-0.800084	branch inside the loop is.
-0.236862	of the array address is.
-0.234179	bigger than it actually is.
-0.232687	decide how advantageous vectorization is.
-0.315816	and convoluted template metaprogramming is.
-0.226773	as the compiler itself is.
-0.111255	b Bit vector algebra reductions:
-0.111255	shift Floating point algebra reductions:
-0.111255	program optimization Integer algebra reductions:
-0.111255	xx-xx--x- reciprocal Boolean algebra reductions:
-0.015542	Floating point XMM (vector) reductions:
-0.015542	- Integer XMM (vector) reductions:
-0.015542	76 Boolean XMM (vector) reductions:
-0.314555	when a user is waiting
-0.314555	while another thread is waiting
-0.349011	language". While we are waiting
-0.710103	most of its time waiting
-0.236306	most of their time waiting
-0.345230	two threads are often waiting
-0.236236	processor can do while waiting
-1.610959	SSE2 instruction set is available,
-0.357379	metaprogramming tools to be available,
-0.354670	calculations whenever they are available,
-0.356999	difference. Newest instruction set available,
-0.314056	the best optimizing compilers available,
-0.347678	purpose libraries are also available,
-0.222371	vector classes are currently available,
-0.539849	don't vectorize the code automatically.
-0.236194	the trivial programming work automatically.
-0.291900	out-of-order execution mechanism works automatically.
-0.232698	register. This advantage comes automatically.
-0.422590	cases cannot be vectorized automatically.
-0.231888	care of this alignment automatically.
-0.229243	compilers may not vectorize automatically.
-0.507130	method works only for powers
-0.314666	all the numbers are powers
-0.237744	etc. are defined as powers
-0.985753	The advantage of using powers
-0.324257	The advise of using powers
-0.232294	this by preferably using powers
-0.236205	by all means avoid powers
-0.341616	facilities for making a debug
-0.237744	a program executable: a debug
-0.350837	and more difficult to debug
-0.235630	you are testing contains debug
-0.228148	The profiler inserts temporary debug
-0.218581	code lines. The 17 debug
-0.165169	take most time. Uses debug
-0.294173	150. Using templates for polymorphism
-0.236593	the desired functionality without polymorphism
-0.321227	efficient than the runtime polymorphism
-0.626416	to obtain the desired polymorphism
-0.165189	// Example 7.43a. Runtime polymorphism
-0.165189	See page 73. Runtime polymorphism
-0.165169	// Example 7.43b. Compile-time polymorphism
-0.836050	Microsoft, Intel, Gnu and Clang
-0.382287	and Windows platforms. The Clang
-0.237515	Unix-like platforms. Clang The Clang
-0.233031	for all Unix-like platforms. Clang
-0.203709	obtained with the Gnu, Clang
-0.203709	such as the Gnu, Clang
-0.156898	Microsoft Intel, Microsoft, Gnu, Clang
-0.356129	The time that is measured
-0.348733	measurement. If time is measured
-0.357869	the counts. It is measured
-0.314265	Pentium 4 computer. The measured
-0.237515	in example 16.2. The measured
-0.356763	and output should be measured
-0.233810	different matrix sizes were measured
-0.352677	the above code in details.
-0.236416	the compiler manual for details.
-0.236416	the vectorclass manual for details.
-0.236026	See page 88 for details.
-0.236026	See page 29 for details.
-0.236026	C++ Compiler Documentation for details.
-0.236026	See my blog for details.
-0.357066	be faster when the factor
-0.237863	* sizeof(float)). Now, the factor
-0.340869	be improved by a factor
-0.340869	index multiplied by a factor
-0.353365	= 2.0; } The factor
-0.350703	the logarithm of each factor
-0.371997	code is a risk factor
-0.097190	x) { _mm_storeu_si128((__m128i *)d, x);
-0.088604	x) { _mm_store_si128((__m128i *)d, x);
-0.035784	UnusedFiller; }; int order(int x);
-0.035784	i, j; int order(int x);
-0.165179	s; s = _mm_hadd_ps(x, x);
-0.165179	F32vec4 xxn(x4, x2*x, x2, x);
-0.358544	runs alone in the core.
-0.868807	running in the same core.
-0.237482	on its own CPU core.
-0.335573	two threads in each core.
-0.067913	in the same processor core.
-0.525896	Now, according to the rules
-0.487909	expressions, even though the rules
-0.444281	20 clock cycles. The rules
-0.346005	if unsigned The same rules
-0.293300	to implement the many rules
-0.235405	has to obey certain rules
-0.165169	2016. The same coding rules
-0.539282	function in terms of speed.
-0.294171	size and optimizing for speed.
-0.407820	are more important than speed.
-0.234006	units and hence higher speed.
-0.231381	half speed or full speed.
-0.165164	with the expected real-time speed.
-0.165164	than half the single-thread speed.
-0.237924	often an obstacle to vectorization.
-0.293819	better and better at vectorization.
-0.091792	processing, OpenMP and automatic vectorization.
-0.292416	and invoked with automatic vectorization.
-0.310884	to rely on automatic vectorization.
-0.185189	2004. Can do automatic vectorization.
-1.165533	are transferred in registers anyway.
-0.235699	possible exception handling support anyway.
-0.234525	feature is rarely needed anyway.
-0.336736	line will be loaded anyway.
-0.231393	known to be true anyway.
-0.074779	the computer is restarted anyway.
-0.074779	shut down and restarted anyway.
-0.314047	advantageous to use the smallest
-0.452406	preferred to use the smallest
-0.564176	necessary, by using the smallest
-0.293682	table for even the smallest
-0.341657	few resources. On the smallest
-0.293682	2 by putting the smallest
-0.433270	Now it is the responsibility
-0.038282	arrays. It is the responsibility
-0.038282	core. It is the responsibility
-0.038282	course. It is the responsibility
-0.038282	diagnose. It is the responsibility
-0.038282	response. It is the responsibility
-0.038282	leaks. It is the responsibility
-0.234004	available, e.g. AVX, AVX2 Mathematical
-0.212309	Memory and string manipulation Mathematical
-0.122666	variables ......................... 142 14.10 Mathematical
-0.122666	u[1] by u[0]. 14.10 Mathematical
-0.330831	time-consuming (see page 140). Mathematical
-0.074781	for vectorization............................................................. 117 12.7 Mathematical
-0.074781	is called. 118 12.7 Mathematical
-0.237183	been increased from 64-bit MMX
-0.222859	int 32 2 64 MMX
-0.296833	int 16 4 64 MMX
-0.302746	char 8 8 64 MMX
-0.222859	long 64 1 64 MMX
-0.313659	Instruction set Header file MMX
-0.218584	is available. The older MMX
-0.350103	or comes from a reliable
-0.582032	order to make a reliable
-0.378556	system is often more reliable
-0.170907	because it gives more reliable
-0.170907	it often gives more reliable
-0.499751	easiest and the most reliable
-0.907279	in order to get reliable
-0.460910	Embarcadero Comes with the Borland
-0.345076	as intended, while the Borland
-0.237777	also included. Combining the Borland
-0.237306	PGI PathScale Gnu Intel Borland
-0.581960	32-bit and 64-bit Windows. Borland
-0.224938	2008, v. 9.0 CodeGear Borland
-0.165174	x64 (Visual Studio 2005). Borland
-0.349207	primitive operations in the sense
-0.349207	a macro in the sense
-0.140411	is portable in the sense
-0.140411	fully portable in the sense
-0.349207	is serial in the sense
-0.349207	are overdetermined in the sense
-0.338149	cases where it makes sense
-0.358249	is supported in the latest
-0.455869	branches: one for the latest
-0.352812	page 122) for the latest
-0.506565	dispatching. For example, the latest
-0.331512	PathScale. 2. Use the latest
-0.293778	end user gets the latest
-0.237882	operating systems. 3 The latest
-0.350037	= &CriticalFunction_386; } // Now
-0.237235	0x7FFFFF) | 0x3F800000; // Now
-0.580505	that r points to. Now
-0.299292	the problems mentioned above. Now
-0.165169	+ (c + d); Now
-0.165169	has been brutally interrupted. Now
-0.165169	} sum = (s0+s1)+(s2+s3); Now
-0.314284	throughput of the execution units
-0.043594	Use CPUs with execution units
-0.043594	Older CPUs with execution units
-0.261481	between the different execution units
-0.209024	the full 128-bit execution units
-0.236081	the largest vector. These units
-0.288311	the CPU chip. Such units
-0.339043	we want it to do.
-0.293791	an obvious thing to do.
-0.237534	necessary cleanup jobs to do.
-0.294010	what it can not do.
-0.330875	which reductions they cannot do.
-0.235132	matters, which few programs do.
-0.165174	multi-core CPUs, but event-counters do.
-0.064380	clock cycle is the reciprocal
-0.343417	Even better: store the reciprocal
-0.407817	time and insert the reciprocal
-0.237730	= multiply by - reciprocal
-0.265197	approximate reciprocal, fast approximate reciprocal
-0.165179	= multiply by xx-xx--x- reciprocal
-0.006601	inline void StoreVector(void * d,
-0.227148	inline void StoreVectorA(void * d,
-0.151781	float a, b, c, d,
-0.163034	of code into multiple threads.
-0.035385	the work into multiple threads.
-0.163034	the tasks into multiple threads.
-0.163034	the job into multiple threads.
-0.236966	synchronizing and communicating between threads.
-0.165184	of starting and stopping threads.
-0.615784	In some cases, the log
-0.237890	like sqrt, pow and log
-0.237869	with a password. The log
-0.348988	multiplication here: a[i] = log
-0.294088	to turn off or log
-0.234192	remote databases usually requires log
-0.598825	the critical innermost loop. log
-0.331929	the function stores the thousand
-0.354081	same function on a thousand
-0.525374	example every time a thousand
-0.407371	hundred or even a thousand
-0.237391	a loop repeats a thousand
-0.467629	feeding an array of thousand
-0.336267	only one time in thousand
-0.294230	used for implementing a compile-time
-0.237890	compile- time if and compile-time
-0.237795	compile- time loops or compile-time
-0.314560	without polymorphism or with compile-time
-0.546619	it is known at compile-time
-0.313713	that works for any compile-time
-0.234911	The D language allows compile-time
-0.355783	operator here is to remove
-0.518743	then you need to remove
-0.382046	allows the linker to remove
-0.293573	variable names. Remember to remove
-0.237805	that doesn't add or remove
-0.354562	to zero. You may remove
-0.200043	multiple threads can add, remove
-0.237925	logical processors. Hyperthreading is Intel's
-0.355144	years old version of Intel's
-0.294212	functions are supplied in Intel's
-0.237764	This is supplied with Intel's
-0.233573	that fit their CPUs. Intel's
-0.013568	unless you are overriding Intel's
-0.531883	8 rather than by 16.
-0.793116	that is divisible by 16.
-0.855610	an address divisible by 16.
-0.391674	to addresses divisible by 16.
-0.275058	have addresses divisible by 16.
-1.439195	as explained on page 16.
-0.218590	same as i modulo 16.
-0.859437	to be transferred in registers,
-0.646911	parameters are transferred in registers,
-0.313772	should be saved in registers,
-0.341670	mechanism works only on registers,
-0.226779	available. The older MMX registers,
-0.165179	frame, saving and restoring registers,
-0.068032	{ // function to transpose
-0.032685	matrix // function to transpose
-0.462159	40% more time to transpose
-0.442923	as long time to transpose
-1.725743	time it takes to transpose
-0.293002	define matrix // call transpose
-0.926735	we don't have to wait
-0.296974	array element has to wait
-0.296974	each addition has to wait
-0.296974	user actually has to wait
-0.237899	value of seconds and wait
-0.324982	the DelayFiveSeconds function will wait
-0.236247	reading of x must wait
-0.452017	it to any other number.
-0.851425	of the floating point number.
-0.419903	offset as a 32-bit number.
-0.532685	as an 8-bit signed number.
-0.146651	brand name and model number.
-0.126999	its family and model number.
-0.126999	brand, family and model number.
-0.357924	90% chance that the break
-0.237864	hot spot. Repeating the break
-0.925813	examples of how to break
-1.167391	is no need to break
-0.529002	debugger then it will break
-0.200043	a debugger and press break
-0.356473	when multiplying with a constant.
-0.237848	they point to are constant.
-0.294092	count is large or constant.
-0.357756	loop by the same constant.
-0.237388	is a positive integer constant.
-0.349007	is a double precision constant.
-0.068042	address in the procedure linkage
-0.010625	version in a procedure linkage
-0.010625	lookup in a procedure linkage
-0.021517	program uses a procedure linkage
-0.068042	its functions, called procedure linkage
-0.068042	uses an ordinary procedure linkage
-0.311209	an error code if possible,
-0.234952	and static variables if possible,
-0.290853	simplest possible implementation if possible,
-0.234952	should be avoided, if possible,
-0.234952	is always normalized, if possible,
-0.237753	as few branches as possible,
-0.095793	consider that the bit scan
-0.095793	to use the bit scan
-0.218877	implementations of this bit scan
-0.095793	with a slow bit scan
-0.095793	CPUs with slow bit scan
-0.165190	afterwards a BSF (bit scan
-0.282682	compared to 32 bit systems:
-0.282682	advantages over 32 bit systems:
-0.016731	unsigned int in 16-bit systems:
-0.008285	short int in 16-bit systems:
-0.016731	int16_t int in 16-bit systems:
-0.354565	one operand is more predictable
-0.334527	point comparisons are more predictable
-0.652360	then put the most predictable
-0.419571	12.4a, depending on how predictable
-0.252017	If it is poorly predictable
-0.184764	to replace a poorly predictable
-0.151603	Hello() { cout << "Hello
-0.016334	Called directly // Writes "Hello
-0.004026	"Hello 1" // Writes "Hello
-0.016334	&Object2; p2->Hello(); // Writes "Hello
-0.339244	in this way is equal
-0.293854	address of list[i] is equal
-0.237590	where each label is equal
-0.503624	matrix happen to be equal
-0.237711	threads and put an equal
-0.347090	to p is therefore equal
-0.237874	the register keyword. The CodeGear
-0.293089	stack (three parameters on CodeGear
-0.236917	first two (three on CodeGear
-0.237306	Gnu 32-bit Mac Intel CodeGear
-0.235815	64-bit Windows. Borland / CodeGear
-0.165174	studio 2008, v. 9.0 CodeGear
-0.501275	functions. The code is compact
-0.338838	numerical data is more compact
-0.338838	data member is more compact
-0.335362	generally faster and more compact
-0.489802	makes the code more compact
-0.232079	can be made more compact
-0.351349	The calculation of this polynomial
-0.237586	B*x + C; } polynomial
-0.165185	Loop counter // Calculate polynomial
-0.165185	// Example 8.23b. Calculate polynomial
-0.212320	is an n'th degree polynomial
-0.200037	with vector parameters Vec4f polynomial
-0.346481	code. (Compile without the Common
-0.237388	Common subexpression elimin., integer Common
-0.233584	8.5b a += 2; Common
-0.426555	Integer XMM (vector) reductions: Common
-0.276579	Constant propagation Pointer elimination Common
-0.165169	as VHDL or Verilog. Common
-0.494856	destructor. A function that reads
-0.237798	with normal writes or reads
-0.353835	Assume that a program reads
-0.335943	address 0x2710 and later reads
-0.230111	of efficiency. Using unaligned reads
-0.200032	If the program afterwards reads
-0.324937	the value from memory plus
-0.532623	(8 float or double plus
-0.236865	as a base address plus
-0.349539	address plus a constant plus
-0.235633	the beginning of list plus
-0.228157	to the preceding label plus
-0.245899	page 49 and manual 5:
-0.203799	are explained in manual 5:
-0.203799	are given in manual 5:
-0.203799	table 19 in manual 5:
-0.203799	kernel code" in manual 5:
-0.195186	simplest cases. See manual 5:
-0.592241	package in order to increase
-0.456392	factors. The way to increase
-0.355762	(In Windows you can increase
-0.346250	most systems, you cannot increase
-0.234191	if your modifications actually increase
-0.200037	justify a possible minor increase
-0.237168	type casting // C++ casting
-0.218599	unions rather than type casting
-0.218599	an over- loaded type casting
-0.218599	conversion // C-style type casting
-0.218599	casting // Constructor-style type casting
-0.224953	method is safer. Type casting
-0.237169	takes extra time, of course,
-0.237169	This is inefficient, of course,
-0.237169	safe programming practice, of course,
-0.237169	can be omitted, of course,
-0.237169	program. This requires, of course,
-0.165190	the division faster. Of course,
-0.358309	not visible in the scope
-0.861515	possible to make the scope
-0.020825	purposes is beyond the scope
-0.020825	queries is beyond the scope
-0.020825	coprocessors is beyond the scope
-0.336315	same name, regardless of scope
-0.538950	following example shows the principle
-0.493586	of function calls. The principle
-0.237512	may go undetected. The principle
-0.237733	do this manually. This principle
-0.341544	not normally use this principle
-1.198838	can use the same principle
-0.357250	the latency and the throughput
-0.460999	is limited by the throughput
-0.558285	xxn rather than the throughput
-0.429163	way to increase the throughput
-0.040475	chain. 3.16 Execution unit throughput
-0.040475	22 3.16 Execution unit throughput
-0.059487	of the time is spent
-0.293928	time you would have spent
-0.343873	not only the time spent
-0.343873	for reducing the time spent
-0.352082	that the clock cycles spent
-1.531741	const int size = 16;
-0.818779	a = b / 16;
-0.518063	= (unsigned int)b / 16;
-0.625832	a = b % 16;
-0.477507	= (unsigned int)b % 16;
-0.226782	= 1; n <= 16;
-0.294018	is the name of Func
-0.314623	name ; start of Func
-0.500526	calculated the first time Func
-0.348949	be re-calculated every time Func
-0.314325	label ; return from Func
-0.236549	// Example 8.25 void Func
-0.568533	anything, you have to identify
-0.591664	information in order to identify
-0.894529	the best way to identify
-0.649529	is discussed how to identify
-0.335507	may be enough to identify
-0.236961	in the debugger to identify
-0.355309	family number, which is 15
-0.294199	takes between 2 and 15
-0.802202	8, last byte at 15
-0.307628	in a memory pool. 15
-0.218572	System programming .......................................................................................... 150 15
-0.165169	containers. See page 90. 15
-0.237084	clock cycles. Division takes 14
-0.234796	tested in Mac systems. 14
-0.231891	the sake of optimization. 14
-0.226783	Intel compiler ......................................................................... 130 14
-0.165169	of user interface framework........................................................................... 14
-0.165169	of the C++ language...................................................... 14
-0.460017	of how to do this.
-0.323575	same type to avoid this.
-0.323575	time measurements to avoid this.
-0.346594	not able to see this.
-0.224944	are not always avoiding this.
-0.767151	The following example illustrates this.
-0.237388	float Register variables, integer Register
-0.237196	Common subexpression elimin., float Register
-0.292863	pointer in assembly code. Register
-0.306447	the compilation is finished. Register
-0.218581	problems for integer constants. Register
-0.251348	= temp / 4; Register
-0.354629	software package on a complex
-0.339653	composite type is more complex
-0.339653	The situation is more complex
-0.309427	b. But in more complex
-0.344747	point expressions or more complex
-0.237599	sequences of operations. A complex
-0.236718	leads to suboptimal code. Intrinsic
-0.234791	instructions. Function Assembly name Intrinsic
-0.234517	instructions are summarized below. Intrinsic
-0.230803	a few machine instructions. Intrinsic
-0.226772	code in either case. Intrinsic
-0.165169	4 AVX2 Table 12.3. Intrinsic
-0.294233	constructors and destructors to call.
-0.350609	consistent for the function call.
-0.350609	optimize across the function call.
-0.502352	address through a function call.
-0.313881	time. Dispatch on first call.
-0.312843	times: Dispatch on every call.
-0.286025	the compiler you will notice
-0.286025	the compiler, you will notice
-0.101240	The first thing we notice
-0.101240	The second thing we notice
-0.068039	..................................................................................................................... 163 20 Copyright notice
-0.068039	some links. 20 Copyright notice
-0.251765	LoadVector(cc + i); // Add
-0.440191	market the application program. Add
-0.233030	a function local: 1. Add
-0.697110	version of the library. Add
-0.336214	library with CPU dispatching. Add
-0.221039	is used is branch prediction.
-0.035363	an explanation of branch prediction.
-0.221039	page 43 about branch prediction.
-0.221039	suffer from poor branch prediction.
-0.340145	has made the right prediction.
-0.722174	keep up with the expected
-0.358012	with Gnu. It is expected
-0.842893	and it can be expected
-0.353507	The same can be expected
-0.353507	WTL applications can be expected
-0.237854	AVX-512 instruction set are expected
-0.355954	thread-specific data is to declare
-0.353582	is also recommended to declare
-0.407803	It is preferred to declare
-0.294206	of #include directives and declare
-0.354562	is called. You may declare
-0.354488	integer size if you declare
-0.657374	is good for the application.
-0.356786	each compiler with the application.
-0.382657	size that fits the application.
-0.224039	overflow in the particular application.
-0.476159	not in a particular application.
-0.165184	compact than an MFC application.
-0.559547	calculated at compile time here.
-0.237438	the memory model used here.
-0.233577	is simply not appropriate here.
-0.222392	are a few pitfalls here.
-0.212304	seem a little odd here.
-0.200032	of b is 400 here.
-1.363500	the size of the largest
-0.356075	is larger than the largest
-0.347310	to know whether the largest
-0.023527	for finding the numerically largest
-0.023527	14.30 finds the numerically largest
-0.048401	2; // Find numerically largest
-0.549407	called, even if the dispatched
-0.350841	linkage table. If a dispatched
-0.294011	{ // go to dispatched
-0.237727	} // Entry to dispatched
-0.237905	} // continue in dispatched
-0.323419	dispatched function calls another dispatched
-0.512133	instance of the data members.
-0.333702	by reordering the data members.
-0.498684	7.2. Alignment of data members.
-0.318960	by copying all data members.
-0.230904	function cannot modify data members.
-0.438263	of the child class members.
-0.331578	smallest data size that fits
-0.331321	call the version that fits
-0.356366	so small that it fits
-0.200814	use depends on what fits
-0.200814	solutions, depending on what fits
-0.251360	structure of four float's fits
-0.237728	x-xxx - xx(-)x- - x-xxxx--x
-0.319404	Optimization method Function inlining x-xxxx--x
-0.224932	- xx(-)x- - x-xxxx--x x-xxxx--x
-0.330824	x-xxxxx-- (a&b)|(a&c) = a&(b|c) x-xxxx--x
-0.165169	x-xxxx--x (a|b)&(a|c) = a|(b&c) x-xxxx--x
-0.165169	is always true/false Loopunrolling x-xxxx--x
-0.547704	INSTRSET is used for giving
-0.481213	which are used for giving
-0.237780	up the CPU by giving
-0.421446	understand it. I am giving
-0.200037	{ DoThisThreeTimesAWeek(); } By giving
-0.165174	language as a subset, giving
-0.349139	assume that floating point comparisons
-0.711721	set makes floating point comparisons
-0.320393	clock cycles. Floating point comparisons
-0.320393	is organized. Floating point comparisons
-0.289930	1.0f; } The two comparisons
-0.234140	error condition. Replacing two comparisons
-0.237306	See page 131. Intel Performance
-0.381795	Technical Report on C++ Performance
-0.287969	total computation time. 4 Performance
-0.232416	throughput ....................................................................................... 22 4 Performance
-0.272286	statistics, and the "Intel Performance
-0.165174	Kernel Library" and "Integrated Performance
-0.229260	on the stack (see above,
-0.229260	the critical stride (see above,
-0.350133	is pipelined, as explained above,
-0.227396	predicted perfectly. As explained above,
-0.361326	as in example 16.2 above,
-0.165179	at the function bodies above,
-0.350133	memory pool, as explained above.
-0.227398	table lookup mechanisms explained above.
-0.375085	follow the advice given above.
-0.165192	garbage collection, as mentioned above.
-0.165192	all the problems mentioned above.
-0.165192	the storage methods mentioned above.
-0.230092	holds a memory address. Pointer
-0.222385	Borland Microsoft Constant propagation Pointer
-0.200032	(See page 81). 77 Pointer
-0.200032	for details about rounding. Pointer
-0.165169	complicated functions like sin. Pointer
-0.165169	to can be accessed. Pointer
-0.356295	The purpose is to detect
-0.314704	very smart. They can detect
-0.498094	so that it may detect
-0.237590	The [] operator will detect
-0.296976	PathScale compilers can automatically detect
-0.222980	The program should automatically detect
-0.457187	exception without using the normal
-0.356437	index, just as a normal
-0.325382	the priority back to normal
-0.237893	1 if nonzero and normal
-0.237764	mix nontemporal writes with normal
-0.809536	take more time than normal
-0.234360	are based on compilers. Several
-0.317414	includes standard function libraries. Several
-0.212304	www.amd.com. 163 Internet forums Several
-0.165169	do not support SSE. Several
-0.165169	Object Windows Library (OWL). Several
-0.165169	An example is Perl. Several
-0.585832	module then it is convenient
-0.357955	linked list is a convenient
-0.582777	example, it may be convenient
-0.562555	classes can also be convenient
-0.673447	It may be more convenient
-0.236226	it is certainly more convenient
-0.294231	this example only to show
-0.790702	at a time and show
-0.237632	interrupt 3 breakpoint and show
-0.331356	the user but only show
-0.212309	page 134 and 135 show
-0.212320	results in table 9.1 show
-0.237636	element number 16 in column
-0.467144	different cache lines in column
-0.294055	swap these elements with column
-0.356471	for (column = 0; column
-0.222383	time we are swapping column
-0.165174	goes from the leftmost column
-0.009025	CriticalFunction_386(int parm1, int parm2) {...}
-0.009025	CriticalFunction_SSE2(int parm1, int parm2) {...}
-0.009025	CriticalFunction_AVX(int parm1, int parm2) {...}
-0.455120	Comparison of function libraries Test
-0.336214	functions without CPU dispatching. Test
-0.281567	and fragmented hard disk. Test
-0.560598	misses and branch mispredictions. Test
-0.074781	cases........................................................................................................ 124 2 13.4 Test
-0.074781	a reliable decision. 13.4 Test
-0.294231	the full declaration of c1
-0.237896	// Loop r1 and c1
-0.629261	information about the class c1
-0.234970	// Example 7.28 class c1
-0.356471	for (c1 = 0; c1
-0.212309	0; c1 < r1; c1
-0.632674	c*x + d = x-
-1.907517	- x x x x-
-1.004427	x- x x x x-
-0.575686	74 x x x x-
-0.234475	-- - xx x x-
-0.224963	+ d = x- x-
-0.237821	want to measure // Number
-0.022967	of each element, bits Number
-0.233030	will evict number 1. Number
-0.200037	0 in this column. Number
-0.200037	Vector class libraries 113 Number
-0.357993	I believe that the portability
-0.569258	block. There is a portability
-1.378580	for the sake of portability
-0.237791	therefore not recommended if portability
-0.237613	a viable compromise when portability
-0.212304	a compromise between efficiency, portability
-0.237821	(Intel) #include <pmmintrin.h> // SSE3
-0.377402	integer and double vectors SSE3
-0.251354	/arch:SSE2 -msse2 /arch:SSE2 -msse2 SSE3
-0.074781	SSE3 instruction set Suppl. SSE3
-0.074781	emmintrin.h SSE3 pmmintrin.h Suppl. SSE3
-0.165174	SSE xmmintrin.h SSE2 emmintrin.h SSE3
-1.263715	for the compiler to evaluate
-0.639877	the same time to evaluate
-0.838206	will be able to evaluate
-0.311197	&& a needs to evaluate
-0.311197	&& b needs to evaluate
-0.292674	integers, and they always evaluate
-0.023502	157 17 Optimization in embedded
-0.237769	devices and machines with embedded
-0.351550	now used in some embedded
-0.228167	Microcontrollers used in small embedded
-0.306994	programs, except for small embedded
-0.237041	Literature Other manuals by Agner
-0.237041	manuals is copyrighted by Agner
-0.382026	12.4e. Same example, using Agner
-0.237306	Vector class library Intel Agner
-0.272277	class, Intel Vector class, Agner
-0.200037	and Mac platforms By Agner
-0.462169	efficient thanks to the availability
-0.357162	second source, and the availability
-0.357536	to test for the availability
-0.293876	which will delay the availability
-0.237609	discrete icon signaling the availability
-0.314703	with the application. The availability
-0.357724	using InstructionSet(): // Example 13.1
-0.490515	explicitly as in example 13.1
-0.420834	of CriticalFunction in example 13.1
-0.594380	is illustrated in example 13.1
-0.222394	different instruction sets........................... 122 13.1
-0.200043	the critical innermost loops. 13.1
-0.237927	type, a pointer, a reference,
-0.404978	through a pointer or reference,
-0.282877	4 4 pointer or reference,
-0.282877	8 8 pointer or reference,
-0.165184	pointer or a non-const reference,
-0.358619	language runtime of the .NET
-0.525558	build code for the .NET
-0.237515	Big runtime frameworks. The .NET
-0.237515	moving the mouse. The .NET
-0.276592	as C#, Visual Basic .NET
-0.165179	other languages in Microsoft's .NET
-0.311787	c, d; if (a !=
-0.218591	> y && z !=
-0.218581	= 1.0; while (n !=
-0.501044	0) { if (b !=
-0.165169	Example 7.8 if (handle !=
-0.165169	= string; while (*p !=
-0.291628	hard disk. A few files,
-0.203362	DLLs, configuration files, resource files,
-0.203362	or shared objects), resource files,
-0.229248	automatic updates, remote help files,
-0.148755	objects), resource files, configuration files,
-0.148755	number of DLLs, configuration files,
-0.595677	7.6 Pointers and references Pointers
-0.215261	references Pointers versus references Pointers
-0.325435	access by each thread. Pointers
-0.230103	to a valid address. Pointers
-0.074783	Boolean vector operations. 7.6 Pointers
-0.074783	7.5 Booleans................................................................................................................... 33 7.6 Pointers
-0.351866	run at more than half
-0.347816	run at less than half
-0.237559	size is handled at half
-0.485935	diagonal there is only half
-0.233696	ports, etc. of only half
-0.233696	double by modifying only half
-0.987389	operator is used for converting
-0.438332	of extra instructions for converting
-0.349011	example 7.4 we are converting
-0.293883	// Use signed when converting
-0.236867	it to signed before converting
-0.212320	last line is implicitly converting
-0.237488	cache. The problem only occurs
-0.111434	and if an exception occurs
-0.111434	what if an exception occurs
-0.233581	when a task switch occurs
-0.230814	If the same subexpression occurs
-0.283138	many times an interrupt occurs
-0.405431	*(int*)&x |= 0x80000000; // Set
-0.023383	level = InstructionSet(); // Set
-0.236078	= instrset_detect(); 116 // Set
-0.165184	available: // Example 7.6. Set
-0.165184	underflow: // Example 7.5. Set
-0.382626	anyway. The exception is costly
-0.382626	example, the conversion is costly
-0.237851	advanced programming constructs are costly
-0.235318	time slice are quite costly
-0.276585	comparisons, which are relatively costly
-0.284352	other abuse is extremely costly
-0.345661	speed-critical program on the newest
-0.345661	works best on the newest
-0.446822	when running on the newest
-1.040321	advantageous to use the newest
-0.519324	disadvantage of using the newest
-0.314703	obstacle to vectorization. The newest
-0.237872	is a standard for specifying
-0.237780	class is declared by specifying
-0.236596	declare an int, without specifying
-0.334758	is a copy constructor specifying
-0.016036	rule of standard C, specifying
-0.510509	changes. A branch that follows
-0.355106	switch statement if it follows
-0.766038	can be implemented as follows
-0.293035	now be vectorized as follows
-0.885884	of the function pointer follows
-0.165174	its arguments. This closely follows
-0.449153	<. The result of comparing
-0.342589	point numbers simply by comparing
-0.237043	comparison of doubles by comparing
-0.835580	sometimes more efficient than comparing
-0.236648	array element. Rather than comparing
-0.251360	incrementing a loop counter, comparing
-0.357475	the function. This is efficient,
-0.294199	arrays is fast and efficient,
-0.821849	make the code more efficient,
-0.237440	A more primitive, but efficient,
-0.236642	as to make pointers efficient,
-0.235313	Fortran is also quite efficient,
-0.615757	the next generation of computers
-0.331826	software development, and that computers
-0.542162	measure. This is because computers
-0.308921	reason why all modern computers
-0.136384	typically have more powerful computers
-0.136384	in ever more powerful computers
-0.353656	and last all the B
-0.572965	begin the calculation of B
-0.337117	vector, and the four B
-0.212320	the values of A, B
-0.008673	double A = 1.1, B
-0.313324	intended for system code. System
-0.074783	File access................................................................................................................ 20 3.8 System
-0.074783	operations to finish. 3.8 System
-0.074783	Position-independent code.................................................................................. 148 14.13 System
-0.074783	Mac OS X. 14.13 System
-0.466187	execution of everything else. System
-0.118098	in a series of five
-0.237640	notice This series of five
-0.347488	that are up to five
-0.293686	above example, then all five
-0.286824	the value has changed five
-0.350703	intermediate result of each step
-0.539090	functions in a single step
-0.448871	directly to the next step
-0.425121	together in the second step
-0.291772	go through a second step
-0.165174	allocating more space 91 step
-0.336303	87). Data caching is poor
-0.341858	seen many examples of poor
-0.336296	be higher due to poor
-0.357400	The usability may be poor
-0.293820	statements often suffer from poor
-0.165169	Optimizes reasonably well. Very poor
-0.928725	you don't have to prefetch
-0.237872	optimization. Prefetching data The prefetch
-0.314703	out-of-order execution mechanism can prefetch
-0.236880	the level-2 cache cannot prefetch
-0.236412	is that modern processors prefetch
-0.234913	are able to automatically prefetch
-0.293987	of code gives an 9
-0.237111	a[i] = i * 9
-0.236957	predicted perfectly varies between 9
-0.218581	push and pop ebx. 9
-0.212316	2, 3, 4, 6, 9
-0.200032	compiler does ............................................................................. 84 9
-0.331750	not spend time on deciding
-0.232028	and fine-grained parallelism when deciding
-0.129452	problems into account when deciding
-0.127942	taken into account when deciding
-0.287529	over the disadvantages when deciding
-0.352059	the GOT through a self-relative
-0.572965	is the calculation of self-relative
-0.294176	has no instruction for self-relative
-0.350580	The procedure to calculate self-relative
-0.044651	bit instruction set supports self-relative
-0.044651	x86-64 instruction set supports self-relative
-0.237814	float * DynamicArray = (float
-0.231406	Example 8.1a float square (float
-0.027574	Example 8.3a float parabola (float
-0.027574	* a;} float parabola (float
-0.027574	Example 8.1b float parabola (float
-0.200043	static inline int lrintf (float
-0.553001	core. For example, a Core
-0.347673	optimized for the Intel Core
-0.229471	strlen 128 bytes Intel Core
-0.229471	16kB aligned operands Intel Core
-0.229471	16kB unaligned op. Intel Core
-0.236041	ia32intrin.h _mm_exp_ps _mm_exp_pd AMD Core
-0.354785	call stack in the debugger
-0.354785	you see in the debugger
-0.354785	(release version) in the debugger
-0.358078	the program in a debugger
-0.237877	options prevent optimization. The debugger
-0.293861	incompatible with debugging. A debugger
-0.655916	multiple bits with the ^
-0.457877	^0 = a a ^
-0.489324	b; b = a ^
-0.342087	^ ~b = a ^
-0.382117	^a = 0 a ^
-0.222403	-1 (a&~b)|(~a&b)=a^b --------- ~a ^
-0.732137	take the same time regardless
-0.357756	is still the same regardless
-0.500490	fast in most cases, regardless
-0.228152	known to be false regardless
-0.623847	are transferred in registers, regardless
-0.165169	having the same name, regardless
-0.455671	Use rounding instead of truncation
-0.421377	to be changed to truncation
-0.336064	faster nor slower than truncation
-0.237625	numbers to integers use truncation
-0.237510	This is unfortunate because truncation
-0.361313	the C/C++ standard specifies truncation
-0.846597	to one of the base
-1.078271	a pointer to a base
-0.654563	be expressed as a base
-0.742162	when deciding whether to base
-0.237545	remote help files, data base
-0.218578	mode if the image base
-0.763898	16 bits of the result.
-0.462146	to produce the same result.
-0.541032	formula into a single result.
-0.291841	replaced by the calculated result.
-0.322596	variable produces a negative result.
-0.233313	produces a low positive result.
-0.233035	uses CPU dispatching: 1. How
-0.148749	in the compiler 8.1 How
-0.148749	compiler .......................................................................................... 66 8.1 How
-0.212315	with only four multiplications. How
-0.074783	consumers ................................................................................ 16 3.1 How
-0.074783	biggest time consumers 3.1 How
-0.226944	to break a dependency chain.
-0.102749	has a long dependency chain.
-0.102749	forms a long dependency chain.
-0.129749	called a loop-carried dependency chain.
-0.129749	is no loop-carried dependency chain.
-0.129749	are: No loop-carried dependency chain.
-0.236545	below. 3.7 File access Reading
-0.446318	time-consumer in the program. Reading
-0.231393	faster than random access. Reading
-0.554633	14.1 Use lookup tables Reading
-0.212304	% 0x20 = 0x1C. Reading
-0.200032	we read from 0x4700. Reading
-0.354363	second step where the compilation
-0.237805	step of interpretation or compilation
-0.290002	Any language that requires compilation
-0.389625	intermediate code and just-in-time compilation
-0.156895	are based on just-in-time compilation
-0.203091	Java machines use just-in-time compilation
-0.140246	for finding the hot spots
-0.140246	code once the hot spots
-0.139192	profiler identifies any hot spots
-0.015147	profiler to find hot spots
-0.139192	useful for identifying hot spots
-0.357852	standard says that the behavior
-0.477744	library can change the behavior
-0.237778	intended to mimic the behavior
-0.294181	blog for details. The behavior
-0.312444	(5) make the overflow behavior
-0.212315	2008 version). This wasteful behavior
-0.237731	higher than normal. This happens
-0.237486	b[i]*c[i], though this only happens
-0.293696	is filled up, which happens
-0.290934	compiled C++. This typically happens
-0.234669	Let's look at what happens
-0.229242	interpreted languages where everything happens
-0.549405	blocks, or if the 7
-0.935464	0, last byte at 7
-0.292269	is supported in Windows 7
-0.292281	on Intel compiler versions 7
-0.212304	6 Development process...................................................................................................... 25 7
-0.165169	a version control tool. 7
-0.234961	data storage and page 87
-0.352799	cache contentions. See page 87
-0.224944	; return from Func 87
-0.218578	Optimizing memory access ............................................................................................. 87
-0.218578	code and data ......................................................................................... 87
-0.165174	9.2 Cache organization ................................................................................................... 87
-0.331519	of elements in vector Type
-0.226777	in the following table. Type
-0.327407	data elements, as follows: Type
-0.122666	of this problem. 7.11 Type
-0.122666	Arrays ..................................................................................................................... 38 7.11 Type
-0.200037	this method is safer. Type
-0.339040	be implemented in different places
-0.570819	scattered around at different places
-0.236113	optimization instructions at specific places
-0.235755	value that is four places
-0.234522	value that is n places
-0.233811	value that lies r places
-0.315361	function because the stack unwinding
-0.020891	Other cases of stack unwinding
-0.196300	other situations: The stack unwinding
-0.196300	implementation dependent. The stack unwinding
-0.256520	a mechanism called stack unwinding
-0.571586	list should preferably be static,
-0.082506	calculated value. The keyword static,
-0.082506	interprocedural optimizations. The keyword static,
-0.082506	the context. The keyword static,
-0.082506	page 80. The keyword static,
-0.200049	Func is executed. Without static,
-0.090623	don't understand it. I am
-0.090623	and recompile it. I am
-0.257261	my optimization manuals. I am
-0.205282	principles to use. I am
-0.205282	In this manual, I am
-0.205282	with embedded microcontrollers. I am
-0.331761	frame function into a leaf
-0.331761	be turned into a leaf
-0.443797	function is called a leaf
-0.048106	one other function. A leaf
-0.048106	any other function. A leaf
-0.293145	makes a distinction between leaf
-0.962620	the second operand is evaluated
-0.061735	that should not be evaluated
-0.349465	that macro parameters are evaluated
-0.237401	&& and || are evaluated
-0.657199	second operand is not evaluated
-1.085795	not be able to completely
-0.967918	the code can be completely
-0.357028	repeat count may be completely
-0.237802	setup but slow or completely
-0.294045	even be used on completely
-0.231415	misleading results or fail completely
-0.294210	is reused again and again.
-0.032072	code address and back again.
-0.007799	protected mode and back again.
-0.032072	to truncation and back again.
-0.501069	the interrupt 3 breakpoint again.
-0.507224	to the availability of powerful
-0.237654	some development tools have powerful
-0.311073	developers typically have more powerful
-0.290723	cases. An even more powerful
-0.234838	invest in ever more powerful
-0.291275	consumption are actually quite powerful
-0.354785	make it in the form
-0.717176	container classes in the form
-0.458372	blocks, either in the form
-0.496514	line by any other form
-0.289790	that processor model numbers form
-0.289551	are stored in binary form
-0.556561	alloca, because it is deallocated
-0.037136	sizes are allocated and deallocated
-0.354676	so that they are deallocated
-0.347684	allocated objects are also deallocated
-0.234918	The space is automatically deallocated
-0.234517	the other way three times.
-0.232713	long and irregular response times.
-0.224938	value has changed five times.
-0.306462	from memory a hundred times.
-0.119308	garbage collector at inconvenient times.
-0.119308	come unpredictably at inconvenient times.
-0.343709	is less useful in 32-
-0.742014	in 32-bit mode. The 32-
-0.127253	PathScale C++ compiler for 32-
-0.127253	PGI C++ compiler for 32-
-0.237592	in two versions. A 32-
-0.233032	the Intel libraries. Supports 32-
-0.351619	the array a and edx
-0.293906	registers eax, ecx and edx
-0.435017	use the value in edx
-0.237729	PTR [edx] adds, not edx
-0.229433	ecx = Induction ; edx
-0.229433	DWORD PTR [esp+12] ; edx
-0.322361	pointers because it cannot rule
-0.385678	that the compiler cannot rule
-0.232152	72). The compiler cannot rule
-0.041348	on the strict aliasing rule
-0.041348	violates the strict aliasing rule
-0.224950	be able to completely rule
-0.237906	doing two iterations in one.
-0.327738	than making a new one.
-0.327738	and create a new one.
-0.296500	result of the preceding one.
-0.338857	depends on the previous one.
-0.543063	course, but this is permissible
-0.587528	important. This can be permissible
-0.237854	subtraction and multiplication are permissible
-0.571865	loop. It is not permissible
-0.340461	algebraic reductions are not permissible
-0.340461	(e.g. '>') are not permissible
-0.237695	has to assume the worst
-0.483684	one that gives the worst
-0.048359	order to cover the worst
-0.048359	big to cover the worst
-0.324894	other way, etc. The worst
-0.407562	caching is critical. The worst
-0.501735	system this is the job
-0.349446	algorithm can do the job
-0.237694	others have done the job
-0.892259	is to divide the job
-0.349963	will do the best job
-0.283143	requires that the background job
-0.237342	in many commercial compilers due
-0.234010	expected to be higher due
-0.230810	incur a large delay due
-0.285334	away in the future due
-0.200032	that measurements are unstable due
-0.200032	there are some differences due
-0.348538	{ double y = 1.0;
-0.023332	x, n, factorial = 1.0;
-0.235413	i++) { list[i].a = 1.0;
-0.235413	temp++) { temp->a = 1.0;
-0.990869	p(double x) { return 1.0;
-0.237866	doubled. Thin clients that depend
-0.237399	of one iteration should depend
-0.511984	platforms because it doesn't depend
-0.234797	because the factorials don't depend
-0.234793	code. These workaround methods depend
-0.231400	and other hardware-related details depend
-0.356580	memory access is the biggest
-0.535506	16 to fit the biggest
-0.048358	program. 3 Finding the biggest
-0.048358	14 3 Finding the biggest
-0.237879	platform-independent and compact. The biggest
-0.342981	b, c; // Define biggest
-0.234800	The funny looking name ?Func@@YAXQAHAAH@Z
-0.018519	ALIGN 4 PUBLIC ?Func@@YAXQAHAAH@Z ?Func@@YAXQAHAAH@Z
-0.272291	$B1$3: pop ret ALIGN ?Func@@YAXQAHAAH@Z
-0.008673	assembly: ALIGN 4 PUBLIC ?Func@@YAXQAHAAH@Z
-0.355043	inherently parallel because it defines
-0.339033	the software programming language defines
-0.786507	a hardware definition language defines
-0.234805	is. The type __m128i defines
-0.200037	each. The type __m128 defines
-0.165174	float. The type __m128d defines
-0.024667	live ranges do not overlap.
-0.298369	(live ranges) do not overlap.
-0.990901	of a and b overlap.
-0.228168	their live ranges now overlap.
-0.089881	optimization options. Supports parallel processing,
-0.089881	and Mac. Supports parallel processing,
-0.218586	for audio and video processing,
-0.218578	calculations. Examples are image processing,
-0.212309	and video processing, signal processing,
-0.200037	are image processing, sound processing,
-0.290548	*)d, x); } void SelectAddMul(short
-0.209234	Loop with branch void SelectAddMul(short
-0.209234	Define vector classes void SelectAddMul(short
-0.312195	function call inline void SelectAddMul(short
-0.209234	_mm_storeu_si128((__m128i *)d, x);} void SelectAddMul(short
-0.209234	Branch/loop function vectorized: void SelectAddMul(short
-0.237949	mechanisms often disturb the users
-0.289451	in mind, that many users
-0.233719	are used by many users
-0.313612	of time for software users
-0.234534	more RAM than end users
-0.233320	resource for many computer users
-0.237896	256 bits (YMM), and soon
-0.237744	it becomes invalid as soon
-0.481377	models then you will soon
-0.236092	and my manual will soon
-0.222383	A compiler for Basic soon
-0.218578	b * 5). As soon
-0.350313	15.0) is using a six
-0.314234	body now contains only six
-0.498471	show that it takes six
-0.353735	64-bit Linux, the first six
-0.265198	register variables is approximately six
-0.196029	limited. There are approximately six
-0.230815	Metaprogramming ....................................................................................................... 150 16 Testing
-0.230815	15.1a to 15.1c). 16 Testing
-0.236124	15.1c). 16 Testing speed Testing
-0.165174	2: // Example 14.7b. Testing
-0.165174	Example: // Example 14.7a. Testing
-0.165174	the occurrence is rare. Testing
-0.274799	the compiled code. In general,
-0.220806	outside the loop. In general,
-0.220806	accessed equally fast. In general,
-0.220806	about this condition. In general,
-0.220806	their 32-bit counterparts. In general,
-0.220806	random number generators. In general,
-0.355616	The trick is to roll
-0.647410	no easy way to roll
-1.070777	may be useful to roll
-0.520159	If we want to roll
-1.203829	it is advantageous to roll
-0.293266	to understand when we roll
-0.488440	support for intrinsic functions (i.e.
-0.496573	optimal to do so (i.e.
-0.313660	variables. All global variables (i.e.
-0.532581	for powers of 2 (i.e.
-0.291281	storing function return addresses (i.e.
-0.409499	within the same module (i.e.
-0.294199	loaded into ecx and edx,
-0.314729	whose address is in edx,
-0.236826	ecx, 1 eax, 8 edx,
-0.229229	PTR [esp+8] eax, eax edx,
-0.226772	mov mov 2:8+esp eax, edx,
-0.212304	2:8+esp eax, edx, ecx, edx,
-0.512963	two different implementations of C++,
-0.279280	code. Most implementations of C++,
-0.237136	14.2 Bounds checking In C++,
-0.230805	software programming language, e.g. C++,
-0.218586	Compiled languages include C, C++,
-0.165174	the code. C#, managed C++,
-0.006519	c = LoadVector(cc + i);
-0.006519	b = LoadVector(bb + i);
-0.343749	class with members of mixed
-0.456806	because they cannot be mixed
-0.428132	key. Do objects have mixed
-0.312937	expressions where operands have mixed
-0.237592	needs careful optimization. A mixed
-0.200037	integration, web application integration, mixed
-0.237795	<< 6); Or, if protection
-0.352404	virus scanners and other protection
-0.258020	benefits of a copy protection
-0.205955	Copy protection. Some copy protection
-0.205955	is updated. Most copy protection
-0.205955	system breakdown. Many copy protection
-0.521857	independent of the loop counter.
-0.346106	than from the loop counter.
-0.311417	is a simple integer counter.
-0.235127	make an additional integer counter.
-0.446285	to) the time stamp counter.
-0.252221	the so-called time stamp counter.
-0.358057	more integer to the structure.
-0.723807	a member of a structure.
-0.877900	or reference to a structure.
-0.467220	into a class or structure.
-0.279187	the object's class or structure.
-0.237540	for the desired program structure.
-0.408166	of an int is 4.
-0.479638	perfectly on a Pentium 4.
-0.189807	fake an Intel Pentium 4.
-0.317062	to the old Pentium 4.
-0.165179	-mavx, etc. for Linux) 4.
-0.165179	programmers and compiler makers. 4.
-0.237870	off the computer for security
-0.237112	errors in programs where security
-0.235755	Integer overflow is another security
-0.449078	deviate from the above security
-0.165169	there is a compelling security
-0.165169	reliable than third party security
-0.591227	reducing the number of branches.
-0.325376	has many calls and branches.
-0.341429	few or no other branches.
-0.237103	improvements. Making too many branches.
-0.234512	are implemented as three branches.
-0.265184	branch and other nearby branches.
-0.418085	8 short int 128 Is16vec8
-0.532673	: b * c; Is16vec8
-0.890855	cc into vector c: Is16vec8
-0.890855	bb into vector b: Is16vec8
-0.806151	a vector of (0,0,0,0,0,0,0,0) Is16vec8
-0.806151	a vector of (2,2,2,2,2,2,2,2) Is16vec8
-0.324225	the number of CPU cores.
-0.231735	jumps between different CPU cores.
-0.423776	jump between multiple CPU cores.
-0.307377	optimized. Jumps between CPU cores.
-0.338541	a CPU with multiple cores.
-0.236170	utilize the multiple processor cores.
-0.716656	can take care of communication
-0.324871	are various methods for communication
-0.331371	may be needed for communication
-0.356198	same cache is that communication
-0.237511	with fine-grained parallelism because communication
-0.236125	the amount of necessary communication
-0.485198	is used only for avoiding
-0.324877	and suggests methods for avoiding
-0.323412	become invalid, and by avoiding
-0.236307	avoid this error by avoiding
-0.236307	we may save by avoiding
-0.510283	systems are not always avoiding
-1.296759	pointer or reference to anything
-0.491124	function cannot rely on anything
-0.552769	consume more time than anything
-0.346968	be necessary to optimize anything
-0.234678	loop does not cost anything
-0.489092	pointer does not alias anything
-0.294120	cc); } #endif // INSTRSET
-0.231902	set. The preprocessing macro INSTRSET
-0.191070	a.store(aa+i); } } #if INSTRSET
-0.191070	on instruction set #if INSTRSET
-0.074783	#define FUNCNAME SelectAddMul_SSE41 #elif INSTRSET
-0.074783	#define FUNCNAME SelectAddMul_SSE2 #elif INSTRSET
-0.236545	21 3.13 Memory access Accessing
-0.436979	through a smart pointer. Accessing
-0.648489	mode and back again. Accessing
-0.218572	into classes or structures. Accessing
-0.212304	making data more compact. Accessing
-0.165169	require a variable. Efficiency Accessing
-0.346467	access internal variables and internal
-0.063532	are not used for internal
-0.293334	GOT and PLT for internal
-0.237769	Use function libraries with internal
-0.236551	because we can access internal
-0.339790	unsigned int or by type-casting
-0.236305	its child class by type-casting
-0.236305	a different type by type-casting
-0.293885	be aware of when type-casting
-0.229237	same as C- style type-casting
-0.272284	safe than the C-style type-casting
-0.491716	be determined by the requirements
-0.349384	obviously influenced by the requirements
-0.356784	often conflicting with the requirements
-0.236076	software development process. These requirements
-0.337989	calculations or turn off requirements
-0.307578	64, but the alignment requirements
-0.358057	at all to the profiler.
-0.530662	tool is not a profiler.
-0.350085	alternatives to using a profiler.
-0.312599	requires a CPU- specific profiler.
-0.165179	than using a ready-made profiler.
-0.336327	64-bit mode. Therefore, the __fastcall
-0.237793	#pragma optimize(...) Fastcall function __fastcall
-0.330245	fastcall functions The keyword __fastcall
-0.165174	function __fastcall __attribute(( fastcall)) __fastcall
-0.165174	keywords Fast function calling. __fastcall
-0.458759	it may cause a loss
-0.441983	problems of overflow and loss
-0.314608	would cause overflow or loss
-0.324331	integers with hardly any loss
-0.236308	have to worry about loss
-0.349770	call or any other cleanup
-0.331293	take care of all cleanup
-0.337184	for all the necessary cleanup
-0.290814	C++ way of handling cleanup
-0.233582	from functions that require cleanup
-0.148753	organization ................................................................................................... 87 9.3 Functions
-0.148753	and VIA CPUs". 9.3 Functions
-0.122672	7.13 Loops...................................................................................................................... 45 7.14 Functions
-0.122672	is too big. 7.14 Functions
-0.165184	C++ compiler, v. 10.1.020. Functions
-0.236355	thought-through approach to error handling.
-0.212958	for debugging and exception handling.
-0.212958	program relies on exception handling.
-0.265923	there is no exception handling.
-0.265923	compatible with structured exception handling.
-0.851468	optimization of C++ and Fortran
-0.237635	of C++, Pascal and Fortran
-0.294214	two loops (except in Fortran
-0.165179	not quite as versatile. Fortran
-0.165179	C, C++, D, Pascal, Fortran
-0.504663	more discussion of the increment
-0.478387	allows the CPU to increment
-0.237729	When used simply to increment
-0.460427	eax,1 is the loop increment
-0.535683	is said here about increment
-0.429489	graphics function libraries and drivers
-0.136391	service routines and device drivers
-0.136391	capabilities (except in device drivers
-0.136391	used in 64-bit device drivers
-0.136391	or C++. Critical device drivers
-0.677749	it is important to economize
-0.443459	It is important to economize
-0.279646	even more important to economize
-0.314731	most efficient library and economize
-0.568667	resolved at compile time. Templates
-0.483165	of a template parameter. Templates
-0.281578	int x = 10; Templates
-0.212309	in simple cases. 7.28 Templates
-0.165174	the same template. 57 Templates
-0.200050	eight elements in row 28
-0.200050	the elements from row 28
-0.224953	cache lines in column 28
-0.176501	these elements with column 28
-0.200049	(32-bit or 64-bit systems). 28
-0.237922	12.4b executes three to seven
-0.236920	various algebraic expressions on seven
-0.236920	series of experiments on seven
-0.312382	turned up to cause seven
-0.229243	a precision of approximately seven
-0.504281	frame function can be turned
-0.293792	into an STL vector turned
-0.034686	all relevant optimization options turned
-0.446301	parameter. The order of inheritance
-0.286474	7.38b. Alternative to multiple inheritance
-0.231100	cases such as multiple inheritance
-0.286474	You may avoid multiple inheritance
-0.222395	// Example 7.38a. Multiple inheritance
-0.647788	The easiest way to overcome
-0.711316	130 for how to overcome
-0.582763	is discussed how to overcome
-0.413009	section discusses how to overcome
-0.462969	safety problem can be overcome
-0.378052	long and difficult to maintain.
-0.378052	and are difficult to maintain.
-0.290311	and therefore difficult to maintain.
-0.472835	that is easier to maintain.
-0.237905	difficult to debug and maintain.
-0.112549	systems allow up to fourteen
-0.112549	Mac allow up to fourteen
-0.264874	registers, totaling up to fourteen
-0.726192	in 32-bit systems and fourteen
-0.575161	32-bit operating systems and fourteen
-0.314774	application program. Add to 122
-0.626927	operating system. See page 122
-0.342144	will crash. See page 122
-0.165179	13.1 CPU dispatch strategies........................................................................................ 122
-0.165179	for different instruction sets........................... 122
-0.233667	more complicated and time consuming.
-0.233667	these methods are time consuming.
-0.233667	which is very time consuming.
-0.233667	can be particularly time consuming.
-0.212331	long and very time- consuming.
-0.237949	enough to justify the method.
-0.454015	further discussion of this method.
-0.381336	dispatch on every call method.
-0.236706	use this complicated template method.
-0.236404	cache contentions. Use simple method.
-1.376564	for the sake of backwards
-0.336153	CPUs. The sequence of backwards
-0.652915	operating systems are not backwards
-0.710145	when data are accessed backwards
-0.212323	to follow the track backwards
-0.294263	optimal to mirror the remote
-0.356475	of communication with a remote
-0.294231	data locally. Access to remote
-0.237757	disk cache. Files on remote
-0.165174	intranet for automatic updates, remote
-0.356115	simple types such as int,
-0.237711	if you declare an int,
-0.293177	or unsigned 2 2 int,
-0.230423	unsigned 1 1 short int,
-0.230423	data types: char, short int,
-0.237899	choose between c2 and bc
-0.944073	each element in vector bc
-0.042891	b and c __m128i bc
-0.218590	with the inverted bit-mask: bc
-0.282032	with compilers and development tools.
-0.210343	lack of advanced development tools.
-0.210343	availability of powerful development tools.
-0.213890	always use standardized installation tools.
-0.213890	than by individual installation tools.
-0.301817	an integer in one operation.
-0.301817	four additions in one operation.
-0.539103	conditions in a single operation.
-0.192771	done as a shift operation.
-0.192771	by using a shift operation.
-0.352919	register size in the future.
-0.520686	become available in the future.
-0.352919	will dominate in the future.
-0.352919	will grow in the future.
-0.165190	in a more distant future.
-0.294263	order to force the swapping
-0.349011	the time we are swapping
-0.237615	must be careful when swapping
-0.237558	see the excessive memory swapping
-0.228158	memory to disk. Memory swapping
-0.357124	512 bits when the AVX512
-0.039732	double 64 8 512 AVX512
-0.039732	long 64 8 512 AVX512
-0.039732	int 32 16 512 AVX512
-0.039732	float 32 16 512 AVX512
-0.568713	process There is a considerable
-0.331384	garbage collection takes a considerable
-0.293633	can still give a considerable
-0.237395	is of course a considerable
-0.237603	that are called. A considerable
-0.314783	You may remove the memset
-0.352655	The explicit use of memset
-0.341850	loops by calls to memset
-0.237867	compiler may report that memset
-0.348175	to use the functions memset
-0.657562	well-defined interface to the rest
-0.357163	assembly language and the rest
-0.358310	the advice in the rest
-0.577222	start so that the rest
-0.460901	is called by the rest
-0.328763	most advanced code version on,
-0.376954	running the advanced version on,
-0.424484	CPU it is running on,
-0.118945	relevant optimization options turned on,
-0.237328	The same example using Agner's
-0.346516	dispatching with vector classes Agner's
-0.330831	system (see page 107). Agner's
-0.165174	with the option -mveclibabi=acml. Agner's
-0.165174	LIBM Library amd_vrs4_expf amd_vrd2_exp Agner's
-0.357509	file level, and the Digital
-0.237896	while the Borland and Digital
-0.218578	Constantfolding xxxxxxxxx Codeplay Watcom Digital
-0.165174	C++ v. 7.1-4, 2008. Digital
-0.165174	performance for vector intrinsics. Digital
-0.547106	gets information about the third
-0.352357	remote database, and a third
-0.345149	improve the code. The third
-0.237700	often more reliable than third
-0.165174	function. The } 59 third
-0.235502	possible vector objects // Roll
-0.534069	a, b, c; // Roll
-0.023339	two = _mm_set1_epi16(2); // Roll
-0.235502	(2,2,2,2,2,2,2,2) Is16vec8 two(2,2,2,2,2,2,2,2); // Roll
-0.324551	Time before test // Critical
-0.420342	i, a, b; // Critical
-0.212315	time consuming parts only. Critical
-0.265197	limits the CPU brand. Critical
-0.212315	be C or C++. Critical
-0.028934	49 and manual 5: "Calling
-0.035788	explained in manual 5: "Calling
-0.035788	given in manual 5: "Calling
-0.035788	code" in manual 5: "Calling
-0.028934	cases. See manual 5: "Calling
-0.341879	scarce resources. However, the CISC
-0.237899	distinctions between RISC and CISC
-0.293769	a limited resource. The CISC
-0.237515	superior performance/price ratio. The CISC
-0.331704	standard PC processors with CISC
-0.237896	point addition units, and 22
-0.218578	Execution unit throughput ....................................................................................... 22
-0.200037	3.15 Dependency chains ................................................................................................ 22
-0.165174	22 3.14 Context switches..................................................................................................... 22
-0.165174	21 3.13 Memory access....................................................................................................... 22
-0.294181	0 or 1. The AND
-0.537533	= _mm_cmpgt_epi16(b, zero); // AND
-0.237240	_mm_and_si128(c2, mask); 110 // AND
-0.515505	results of the two AND
-0.321431	2n -1. The bitwise AND
-0.212288	is rarely worth the effort
-0.212288	is hardly worth the effort
-0.325660	and concentrate the optimization effort
-0.098823	hand, if your optimization effort
-0.098823	CriticalFunction. If your optimization effort
-0.892104	integers and floating point numbers.
-0.351768	integers or floating point numbers.
-0.233322	and 1 for negative numbers.
-0.315173	function on a thousand numbers.
-0.200043	reasons to use denormal numbers.
-0.293885	devices are becoming more popular
-0.237592	and development tools. A popular
-0.237342	Today, the 8 most popular
-0.236834	this brand was less popular
-0.212309	powerful development tools. One popular
-0.294122	TILESIZE = 8; // SIZE
-0.317728	EMMS } const int SIZE
-0.317728	Example 9.5a const int SIZE
-0.317728	Example 9.6a const int SIZE
-0.234528	(SIZE > 256 && SIZE
-0.631836	Runtime type identification (RTTI) Runtime
-0.122669	functions ........................................................................................ 53 7.21 Runtime
-0.122669	worth the effort. 7.21 Runtime
-0.165179	polymorphism: // Example 7.43a. Runtime
-0.165179	so. See page 73. Runtime
-0.236523	adhere to certain programming principles
-0.289546	are stored. The storage principles
-0.321407	compromise on the advanced principles
-0.233318	There are two main principles
-0.165174	process and software engineering principles
-0.582901	reduce the number of context
-0.545172	program. The number of context
-0.237877	for background jobs. The context
-0.237595	3.14 Context switches A context
-0.165179	a big program. Frequent context
-0.237793	these addresses to function names.
-0.537173	function names and variable names.
-0.339989	are allowed in assembly names.
-0.236070	with short or common names.
-0.165174	compile-time generation of identifier names.
-0.341859	show various ways of reducing
-0.999887	may be used for reducing
-0.293814	compilers are better at reducing
-0.236596	floating point operations without reducing
-0.320826	Gnu compilers are actually reducing
-0.457340	for code that can benefit
-0.293510	in XMM registers can benefit
-0.293860	amounts of memory will benefit
-0.196040	a variable that could benefit
-0.196040	of the code could benefit
-0.667252	it may not be worth
-0.468325	cache may not be worth
-0.423876	set. It is rarely worth
-0.230108	a DLL. Another alternative worth
-0.301497	Hence, it is hardly worth
-0.820274	to the Gnu compiler manual.
-0.192267	the scope of this manual.
-0.236801	only read this first manual.
-0.276598	techniques in the present manual.
-0.294172	type casting operator that specifies
-0.340635	a piece of software specifies
-0.098370	that the C/C++ standard specifies
-0.098370	bad The C/C++ standard specifies
-0.378125	Volatile The volatile keyword specifies
-0.314724	no longer used and searching
-0.237637	Other programs use time searching
-0.234529	efficient functions for string searching
-0.284370	an efficient solution. Is searching
-0.165188	a binary tree. Is searching
-0.330312	explanation of register stack versus
-0.279498	Pointers and references Pointers versus
-0.074787	....................................................................................... 145 14.11 Static versus
-0.074787	this limitation). 14.11 Static versus
-0.212315	for integer overflow. Signed versus
-0.061220	function inlining and constant propagation
-0.014515	Constant folding and constant propagation
-0.217844	order to enable constant propagation
-0.218602	Intel Borland Microsoft Constant propagation
-0.814529	able to do the reduction
-0.354271	obscure examples where the reduction
-0.514432	possibility that a particular reduction
-0.122672	the function call. Algebraic reduction
-0.122672	more complicated reductions. Algebraic reduction
-0.237401	unit-test without taking cache effects
-0.215260	positive or the negative effects
-0.215260	these variables. The negative effects
-0.233322	program performance. The positive effects
-0.165179	the operands has side effects
-0.011647	z = y + 1.;
-0.226214	Func1(x) * Func1(x) + 1.;
-0.226214	x.a = y.a + 1.;
-0.237882	Live range analysis The live
-0.200056	objects even when their live
-0.057021	and b because their live
-0.013569	same register because their live
-0.487709	possible to access a multidimensional
-0.237749	efficient solution. Is a multidimensional
-0.048276	78). A matrix or multidimensional
-0.048276	needed? A matrix or multidimensional
-0.237599	memset(list, 0, sizeof(list)); A multidimensional
-1.033020	time it takes to install
-0.237537	it takes hours to install
-0.236250	that the user must install
-0.200049	pop-up messages saying please install
-0.539287	- in terms of development,
-0.293795	is used during program development,
-0.331348	the history of CPU development,
-0.340630	the costs of software development,
-0.165174	debugging facilities, easy GUI development,
-0.761790	that rely on the strict
-0.237864	The trick violates the strict
-0.353413	development models have a strict
-0.237875	turn off requirements for strict
-0.340826	alignment requirements are less strict
-0.019920	SIZE; r++) { for (c
-0.236767	loop through rows for (c
-0.237269	(a + b) + (c
-0.532132	on page 146 below. Position-independent
-0.232709	and the loader. 2. Position-independent
-0.251360	code everywhere by default. Position-independent
-0.074783	dynamic libraries............................................................................ 146 14.12 Position-independent
-0.074783	position-independent code. 147 14.12 Position-independent
-0.575623	where the parallelism is obvious
-0.564797	pointer. It may be obvious
-0.451223	when it would be obvious
-0.485702	appears to be an obvious
-0.313835	will not do such obvious
-0.237759	below the diagonal is swapped
-0.237759	each element matrix[r][c] is swapped
-0.357404	or they may be swapped
-0.714726	a and b are swapped
-0.324171	are uncached or even swapped
-0.222383	Other system resources .......................................................................................... 21
-0.218578	3.12 Network access ...................................................................................................... 21
-0.212320	database is heavily loaded. 21
-0.212309	3.9 Other databases ....................................................................................................... 21
-0.165174	21 3.10 Graphics ................................................................................................................. 21
-0.313496	to fit the biggest vectors:
-0.000537	to fit the eight-element vectors:
-0.629832	one clock cycle. The OR
-0.294117	= _mm_andnot_si128(mask, bc); // OR
-0.236162	0's gives zero. An OR
-0.309590	by using the bitwise OR
-0.165174	(&) and the EXCLUSIVE OR
-0.355187	>= N) { // Array
-0.236660	size = 100; // Array
-0.236660	size = 256; // Array
-0.228166	data in large arrays. Array
-0.165184	38 // Example 7.15a. Array
-0.293726	used by all other processes
-0.516206	not shared between multiple processes
-0.234136	on access. Run multiple processes
-0.237110	servers that run many processes
-0.228160	a lot of background processes
-0.626359	The C++ language is portable
-0.354632	use that for a portable
-0.496205	code will not be portable
-1.027836	that it is not portable
-0.284356	Portability C++ is fully portable
-0.579390	which is likely to consume
-0.237731	for virus scanners to consume
-0.237287	an extra framework can consume
-0.237287	data. A database can consume
-0.237499	These operators and functions consume
-0.212332	operating system standards. Such schemes
-0.212332	on hardware identification. Such schemes
-0.079348	protection. Some copy protection schemes
-0.079348	updated. Most copy protection schemes
-0.079348	breakdown. Many copy protection schemes
-0.236760	division takes 40 - 80
-0.236760	and multiplication (27 - 80
-0.355573	integer expressions. See page 80
-0.235770	all local non-member functions. 80
-0.290688	functions and simply put 80
-0.371974	as pointers and references. Arrays
-0.074783	pointers .......................................................................................................... 38 7.10 Arrays
-0.074783	on page 93. 7.10 Arrays
-0.466187	can be allocated dynamically. Arrays
-0.165179	strange and unexpected behaviors. Arrays
-0.382786	to use it for lists
-0.237802	on complicated criteria or lists
-0.381545	table The following table lists
-0.234203	are faster than linked lists
-0.165174	pointer type casting. Linked lists
-0.354786	may fail in the event
-0.354786	to recover in the event
-0.354786	be resized in the event
-0.323186	input/output than the specific event
-0.165184	This results in meaningless event
-0.357946	main memory in a computer.
-0.354445	anything else on a computer.
-0.776170	on a Pentium 4 computer.
-0.235764	clock cycle on another computer.
-0.251360	in a big mainframe computer.
-0.294042	code 64 bit code Static
-0.376443	than dynamic linking are: Static
-0.230108	other functions can not. Static
-0.122669	functions ....................................................................................... 145 14.11 Static
-0.122669	overcome this limitation). 14.11 Static
-0.237930	is over. Virtualization is becoming
-0.449400	compiler. The compilers are becoming
-0.338228	and vector processors are becoming
-0.293128	Small hand-held devices are becoming
-0.347094	Efficient caching is therefore becoming
-0.358545	an advantage in the select
-0.547386	should be possible to select
-0.324822	size. Unpredictable branches that select
-0.237461	has preprocessing directives that select
-0.313103	The compiler will always select
-0.462188	12.8b. Sum of a list,
-0.348364	access an element in list,
-0.356112	STL templates, such as list,
-0.288995	your software. A negative list,
-0.165174	sharing the same queue, list,
-0.237593	bit scan instruction is executed
-0.614929	loop control branch is executed
-0.314358	of the latter is executed
-0.353078	intermediate code cannot be executed
-0.457935	operation can often be executed
-0.357292	is optimal on the actual
-0.653989	more time than the actual
-0.355342	clock cycles at the actual
-0.535507	necessary, to fit the actual
-0.416825	parameters replaced by their actual
-0.348251	dependency chains. In this case,
-0.432456	because in the latter case,
-0.230482	obtained. In the latter case,
-0.303125	logarithms in the general case,
-0.200043	integer constant. // General case,
-0.428373	The gain in performance over
-0.306731	to weigh the advantages over
-0.217607	systems have several advantages over
-0.218584	The advantages of alloca over
-0.165179	compilers due to controversies over
-0.346285	be tested with a realistic
-0.346285	be performed with a realistic
-0.441528	order to get a realistic
-0.335910	to date. A more realistic
-0.237599	also be considered. A realistic
-0.459966	because the size of abc
-0.459966	If the size of abc
-0.459966	example, the size of abc
-0.231914	// Example 7.13 struct abc
-0.165184	int b; int c;}; abc
-0.628722	calculation of A is finished.
-0.237424	the preceding addition is finished.
-0.314156	the preceding iteration is finished.
-0.237424	where the compilation is finished.
-0.237859	inside the loop are finished.
-0.232693	list, on the other hand,
-0.047546	CPUs. On the other hand,
-0.047546	compiler. On the other hand,
-0.047546	difficult. On the other hand,
-0.047546	profitable. On the other hand,
-0.237899	x86-64 platform _M_IX86 and _WIN64
-0.236791	32 bit platform not _WIN64
-0.236791	platform not _WIN64 not _WIN64
-0.310089	_WIN64 64 bit platform _WIN64
-0.251360	bit platform _WIN64 _LP64 _WIN64
-1.727045	time it takes to recover
-0.649778	to know how to recover
-0.316793	to be able to recover
-0.381784	attempt is made to recover
-0.462531	output goes to the console
-0.354475	The inputs for a console
-0.354447	interface and use a console
-0.236136	an output file. A console
-0.236136	graphical user interface. A console
-0.502668	resources. Most of the advice
-0.357248	mode. Much of the advice
-0.325212	vectorization then follow the advice
-0.575516	a non-sequential order. The advice
-0.346017	preceding row. The same advice
-0.341323	same data in different ways.
-0.429073	be used in two ways.
-0.289921	go more than two ways.
-0.236848	as 32 sets 4 ways.
-0.236833	is 512 kb, 8 ways.
-1.129444	like this: // Example 16.2
-0.871501	The code in example 16.2
-0.512370	principle as in example 16.2
-0.345268	will crash the program. 16.2
-0.200043	monitor counters .................................................................... 155 16.2
-0.352973	is used inside the pow
-0.237896	such as sqrt and pow
-0.353365	return pow(x,10); } The pow
-0.354431	// ipow faster than pow
-0.165174	library functions like sqrt, pow
-0.575623	of modern microprocessors is split
-0.456691	case we need to split
-0.891572	is not advantageous to split
-0.356763	few lines should be split
-0.235230	Each 128-bit operation was split
-0.294155	all the factors are generated
-0.065030	look at the code generated
-0.235330	different compiler. Object files generated
-0.200043	Most of the comments generated
-0.044132	time a string is created
-0.346386	a Windows program that created
-0.237805	class is declared or created
-0.233041	object must be dynamically created
-0.543045	frequency may be a hundred
-0.646907	take more than a hundred
-0.293633	re-loaded from memory a hundred
-0.237395	and calculate *p+2 a hundred
-0.236086	is cached, but several hundred
-0.325402	estimated calculation time of 250
-0.237811	* 0.5 ns = 250
-0.434821	call the library function 250
-0.352758	is actually more than 250
-0.165174	this loop? Certainly not! 250
-0.579438	use a lot of computing
-0.341826	much less memory and computing
-0.324874	a temporary register for computing
-0.429160	various function libraries for computing
-0.236836	embedded applications have less computing
-0.352658	using references instead of pointers,
-0.936032	arrays are accessed through pointers,
-0.230101	array bounds violations, invalid pointers,
-0.226782	huge). Far storage, far pointers,
-0.265190	loop counters, function parameters, pointers,
-0.353420	even faster way to limit
-0.235417	may be no certain limit
-0.053553	or a reasonable upper limit
-0.053553	when no reasonable upper limit
-0.114650	or a not-too-big upper limit
-0.237896	way. See page and 90
-0.355568	such errors. See page 90
-0.233032	__attribute__((aligned(64))); // Linux syntax 90
-0.218578	Alignment of data ...................................................................................................... 90
-0.212309	Dynamic memory allocation ...................................................................................... 90
-0.704226	and it needs to follow
-0.354487	than C if you follow
-0.237586	you want vectorization then follow
-0.437863	because the cache lines follow
-0.212309	if the case labels follow
-0.444364	one is called a loop-carried
-1.154638	if there is no loop-carried
-0.293399	example 8.23b has two loop-carried
-0.232696	loop iterations are: No loop-carried
-0.230820	long dependency chains, especially loop-carried
-0.356864	must use a function library,
-0.418144	With a long vector library,
-0.329797	With a short vector library,
-0.447537	for the vector class library,
-0.335370	resources than a static library,
-0.237952	few decades ago, the recommendation
-0.394791	I have no specific recommendation
-0.303940	not making any specific recommendation
-0.200054	bit scan instructions. My recommendation
-0.200054	object file level. My recommendation
-0.812163	9.6 Dynamic memory allocation Objects
-0.906674	a power of 2. Objects
-0.323806	memory re-allocation is needed. Objects
-0.229231	complex and often inefficient. Objects
-0.284352	is called garbage collection. Objects
-0.357813	programming language is a compromise
-0.354308	framework must be a compromise
-1.248313	it is necessary to compromise
-0.337562	efficient solution that doesn't compromise
-0.330839	may be a viable compromise
-0.028390	level, and the Digital Mars
-0.028390	the Borland and Digital Mars
-0.028390	xxxxxxxxx Codeplay Watcom Digital Mars
-0.028390	v. 7.1-4, 2008. Digital Mars
-0.028390	for vector intrinsics. Digital Mars
-0.477515	then the value is already
-0.325100	of the string is already
-0.237591	though the CPU-type is already
-0.346388	of a program that already
-0.335899	memory block that has already
-0.570706	or if there is nothing
-0.355326	control branch. There is nothing
-0.443742	(the instruction set has nothing
-0.237513	the while loop because nothing
-0.237344	5) { // do nothing
-0.535232	? b : c (a&&b)
-0.200037	a*b=b*a a+b+c=a+(b+c) (a+b)+c=a+(b+c) --xx----- (a&&b)
-0.200037	---x----- x--xx---- (a&&b)||(a&&!b)=a x--xx---- (a&&b)
-0.165174	: c x-xx----- 75 (a&&b)
-0.165174	|| (a&&b&&c) = a&&b (a&&b)
-0.498944	processor for calculating the physical
-0.591229	between the number of physical
-0.408045	CPU is limited by physical
-0.352566	by assigning a new physical
-0.235755	This processor has four physical
-0.234953	7.20 int i; if ((unsigned
-0.290855	"Gamma", "Delta" }; if ((unsigned
-0.234953	3628800, 39916800, 479001600}; if ((unsigned
-0.234953	// Example 14.5b if ((unsigned
-0.234953	// Example 14.4b if ((unsigned
-1.758762	x - - - xxxxxxxxx
-0.200037	(-a)*(-b)=a*b a/a=1 ----x---x a/1=a xxxxxxxxx
-0.200037	x-xx----x x-xxxxxx- x-xxxx-x- x-xxxxxxx xxxxxxxxx
-0.165174	Function inlining x-xxxx--x Constantfolding xxxxxxxxx
-0.165174	x-xxxx-x- x-xxxxxxx xxxxxxxxx xxxxxxx-x xxxxxxxxx
-0.344953	are allowed to have constructors
-0.420425	running and before any constructors
-0.288188	and destructors. The copy constructors
-0.215557	are useful for copy constructors
-0.268860	there are no copy constructors
-0.478735	The clock frequency is increased
-0.355947	Intel CPUs can be increased
-0.355947	of abc can be increased
-0.323381	new generation of CPUs increased
-0.451757	vector registers has been increased
-0.237162	tips on advanced C++ programming,
-0.292632	the area of system programming,
-0.344342	instruction timing, assembly language programming,
-0.222389	integration, mixed language 11 programming,
-0.165174	of structured and object-oriented programming,
-0.349786	but not any other factor.
-0.005242	divisible by the unroll factor.
-0.290724	or the loop unroll factor.
-0.226779	whenever they are available, i.e.
-0.035785	be aligned by 16, i.e.
-0.035785	are aligned by 16, i.e.
-0.165179	its address is taken, i.e.
-0.165179	transposes a quadratic matrix, i.e.
-0.421392	30 // f is nonzero
-0.237926	We can multiply a nonzero
-0.336300	}; The values of nonzero
-0.340917	{ // check if nonzero
-0.237081	// always 1 if nonzero
-0.314750	a frequent cause of unacceptably
-0.237780	is still frustrated by unacceptably
-0.237652	a framework sometimes have unacceptably
-0.287875	becomes inconsistent and sometimes unacceptably
-0.212309	the user might experience unacceptably
-0.425624	be different for each process.
-0.999173	one instance for each process.
-0.430121	of the software development process.
-0.377412	inefficient virtual function dispatch process.
-0.281569	computer during the update process.
-0.294117	// Loop counter // Calculate
-0.200037	reorganize: // Example 15.1c. Calculate
-0.200037	integer: // Example 15.1b. Calculate
-0.165174	variables: // Example 8.23b. Calculate
-0.165174	time. // Example 15.1a. Calculate
-0.237821	// Example 14.21. // Only
-0.231899	131. AMD LIBM library. Only
-0.228158	into the executable file. Only
-0.324076	an arbitrary cache line. Only
-0.272286	the value of ebx. Only
-0.842381	code is that it adds
-0.434918	required // This function adds
-0.234191	100*16, and temp++ actually adds
-0.480270	(RTTI) Runtime type identification adds
-0.165174	eax / sar ebx,1 adds
-0.014949	} }; void test ()
-0.063158	swapd(a[r][c], a[c][r]); void test ()
-0.224953	Example 8.25 void Func ()
-0.200049	Example 14.1c void CriticalInnerFunction ()
-0.538693	This is slow // Division
-0.231909	functions for these calculations. Division
-0.342386	- 8 clock cycles. Division
-0.556936	which is much faster. Division
-0.165174	cases where it matters: Division
-0.358674	But beware of the pitfalls
-0.102719	the program. 16.2 The pitfalls
-0.102719	.................................................................... 155 16.2 The pitfalls
-0.501635	methods. The most common pitfalls
-0.707056	there are a few pitfalls
-0.353845	to install a program package
-0.435449	parts of the software package
-0.229472	to base a software package
-0.229472	to install a software package
-0.229472	to reinstall a software package
-0.558733	!b) rather than the equivalent
-0.478028	An overloaded operator is equivalent
-0.325327	of the cases. The equivalent
-0.325299	runtime. #define directives are equivalent
-0.235562	are best at doing equivalent
-0.351960	is therefore important to understand
-0.518036	that is difficult to understand
-0.473109	15.1b is easier to understand
-0.408127	used to read and understand
-0.441053	panic if you don't understand
-0.447902	Use predefined vector classes Fortunately,
-0.222389	cases, but not all. Fortunately,
-0.212309	sets and cache sizes. Fortunately,
-0.165174	has one operator less. Fortunately,
-0.165174	+ c.y + d.y; Fortunately,
-0.355433	the compiler from the command
-0.356623	never respond to a command
-0.354265	typically specified on a command
-0.349889	when called from a command
-0.237599	it is servicing. A command
-0.294113	size; i++) b[i] = a[i];
-0.237010	// No error return a[i];
-0.160169	100; i++) sum += a[i];
-0.160169	i<100; i++) sum += a[i];
-0.220495	4) { s0 += a[i];
-0.237950	it rarely justifies the relatively
-0.314759	testing a condition is relatively
-0.357790	the dangers of a relatively
-0.338795	point comparisons, which are relatively
-0.293640	misprediction penalty. Branches are relatively
-0.323927	must have a high priority.
-0.221764	of performance has high priority.
-0.246856	separate threads with low priority.
-0.196038	size have got low priority.
-0.283145	other threads with lower priority.
-0.347856	the size of data files.
-0.293436	as object or library files.
-0.429915	everywhere in the source files.
-0.231396	input or reading disk files.
-0.230114	library modules and header files.
-0.459832	default. Position-independent code is inefficient,
-0.357269	memory block. This is inefficient,
-0.561459	efficiently. This method is inefficient,
-0.338550	everything, which is quite inefficient,
-0.284365	but this is extremely inefficient,
-0.325419	if you follow the guidelines
-0.349859	take advantage of these guidelines
-0.456510	is unsigned. The following guidelines
-0.236314	the program logic. Some guidelines
-0.165174	screen resolutions, etc. Accessibility guidelines
-0.078694	multi-threading, e.g. Intel Math Kernel
-0.037567	version of Intel's Math Kernel
-0.037567	supplied in Intel's Math Kernel
-0.037567	as the "Intel Math Kernel
-0.037567	(www.boost.org). The "Intel Math Kernel
-0.237932	new or malloc) is necessarily
-0.453060	new object is not necessarily
-0.350595	higher number is not necessarily
-0.352376	in sequence are not necessarily
-0.351073	or thread does not necessarily
-0.294203	function to use and returns
-0.357415	variable until the function returns
-0.335699	a member function which returns
-0.233040	counter // For unused returns
-0.212320	in the beginning. ret returns
-0.549746	doing two or more jobs
-0.381838	possible to do two jobs
-0.165184	all the necessary cleanup jobs
-0.165184	way of handling cleanup jobs
-0.165179	30 ms for foreground jobs
-0.449505	object of the class. Data
-0.218594	do not always work. Data
-0.200037	of keeping data together. Data
-0.200037	cache (see page 87). Data
-0.165174	the same memory areas. Data
-0.294203	extra software layers and frameworks
-0.237851	Java virtual machine are frameworks
-0.234523	reason why such runtime frameworks
-0.289784	(OWL). Several graphical interface frameworks
-0.232709	they are running. Such frameworks
-0.458806	fail to see the excessive
-0.329089	of software into an excessive
-0.235661	as you avoid an excessive
-0.235661	press. 19 Avoid an excessive
-0.237632	function libraries available use excessive
-0.579638	argue that it is safer
-0.462382	it to. It is safer
-0.294155	using references. References are safer
-0.237595	thread increments seconds. A safer
-0.559235	73). It is therefore safer
-0.349077	the supported instruction set. Aligning
-0.074786	library exp exp 12.8 Aligning
-0.074786	for vectors........................................................................ 119 12.8 Aligning
-0.074786	with vector access. 12.9 Aligning
-0.074786	allocated memory................................................................. 120 12.9 Aligning
-0.290726	maximum advantage of out-of-order execution.
-0.170010	microcontrollers have no out-of-order execution.
-0.170010	microprocessors can do out-of-order execution.
-0.170010	chain which prevents out-of-order execution.
-0.230822	the sake of parallel execution.
-0.023429	size = 1024; int a[size],
-0.292215	100; int i; float a[size],
-0.292215	1000; int i; float a[size],
-0.231168	size = 1000; float a[size],
-0.807632	rather than by the latency
-0.451432	be limited by the latency
-0.644877	the same as the latency
-0.348134	important distinction between the latency
-0.352670	dependency chain has a latency
-0.294013	Remember, therefore, always to specify
-1.355513	it is recommended to specify
-0.341679	do so unless you specify
-0.331060	the code if we specify
-0.633057	you may as well specify
-0.525257	float a[100]; int i; for(i=0;
-0.023950	int list[300]; int i; for(i=0;
-0.286415	int list[301]; int i; for(i=0;
-0.355354	will benefit from the larger
-0.353767	advantage in using the larger
-0.899088	integer size that is larger
-0.497318	unit-test may have a larger
-0.378674	is that it allows larger
-0.356112	algebraic reductions such as -(-a)
-0.538510	a*b+a*c = a*(b+c) - -(-a)
-0.356612	= a*4 - n.a. -(-a)
-0.338520	may change the expression -(-a)
-0.234920	programmers write expressions like -(-a)
-0.436519	performance in some cases. Multiple
-0.218586	dynamic linking are: 146 Multiple
-0.165174	notion of a "function". Multiple
-0.165174	the multiplication is exact. Multiple
-0.165174	class: // Example 7.38a. Multiple
-0.294244	performance by unit-testing is unfortunately
-0.309420	an optimized function, but unfortunately
-0.317184	are pure functions, but unfortunately
-0.231205	output more readable but unfortunately
-0.286594	error doesn't occur, but unfortunately
-0.459061	and each value of n!
-0.064058	(int n) { // n!
-0.538546	the previous value as n!
-0.236445	ex xn n 0 n!
-0.651887	works most efficiently if pieces
-0.228167	be divided into small pieces
-0.228167	functions are typically small pieces
-0.230816	compact by joining identical pieces
-0.222389	consuming parts only. Critical pieces
-0.339986	the interpreted version of Basic
-0.339986	most popular version of Basic
-0.348324	Basic. A compiler for Basic
-0.203372	of Basic is Visual Basic
-0.255109	such as C#, Visual Basic
-0.226777	the most reliable solution. (In
-0.364834	measurements to avoid this. (In
-0.413654	times for user input. (In
-0.265190	function can be inlined. (In
-0.200037	ecx and edx, respectively. (In
-0.347043	predictions in the different microprocessors.
-0.438704	be used with other microprocessors.
-0.291535	cycles on most other microprocessors.
-0.324679	clock cycle on most microprocessors.
-0.165179	work only on Intel/x86-compatible microprocessors.
-0.441360	typically stored in different modules.
-0.309643	not accessible from other modules.
-0.098153	accessed by any other modules.
-0.323677	configuration files and system modules.
-0.237823	xxn * _mm_load_ps(coef+i); // s
-0.226786	s = _mm_hadd_ps(x, x); s
-0.375795	i; short int s; s
-0.148749	x) { __m128 s; s
-0.165179	initialize sum for(inti=0;i<16;i+=4){ //Loopby4 s
-0.504401	modules appear in the project
-0.357755	best suited for the project
-0.350322	call it from a project
-0.376056	put the whole software project
-0.233041	in a typical software project
-0.294240	where a task is divided
-0.577674	applications that can be divided
-0.355947	background job can be divided
-0.294013	different tasks were not divided
-0.341644	data area is usually divided
-0.292003	a library function from www.agner.org/optimize/asmlib.zip.
-0.292003	demonstration purposes. Available from www.agner.org/optimize/asmlib.zip.
-0.282664	the function library at www.agner.org/optimize/asmlib.zip.
-0.210879	the asmlib library at www.agner.org/optimize/asmlib.zip.
-0.350143	supplied in the library www.agner.org/optimize/asmlib.zip.
-0.096616	(Day & (Tuesday | Wednesday
-0.096616	The expression (Tuesday | Wednesday
-0.375083	Tuesday || Day == Wednesday
-0.283136	2, Tuesday = 4, Wednesday
-0.200043	the bits for Tuesday, Wednesday
-0.293832	can therefore suffer from mispredictions.
-0.035788	cache misses and branch mispredictions.
-0.279511	reliable results for branch mispredictions.
-0.224964	to generate many branch mispredictions.
-0.237867	to disk. Software that relies
-0.357781	handling unless the code relies
-0.324915	frame unless your program relies
-0.290367	hardware exceptions. The mechanism relies
-0.165174	dispatcher in the MKL relies
-0.236025	violations, invalid pointers, etc. And
-0.317420	fast as single precision. And
-0.433216	and difficult to maintain. And
-0.466176	procedure linkage table (PLT). And
-0.165174	various programming languages. www.yeppp.info And
-0.312122	be tested on different platforms,
-0.235718	in different browsers, different platforms,
-0.445083	is available for many platforms,
-0.236963	to facilitate porting between platforms,
-0.235936	generally possible on Linux platforms,
-0.629002	the sign bit to compare
-1.019444	If you want to compare
-0.382318	biased allows us to compare
-0.237902	the residual error and compare
-0.236245	point to a[i+2] ; compare
-0.294240	pointer or reference is valid
-0.724184	if it is a valid
-0.356788	to point to a valid
-0.237923	outside the bounds of valid
-0.314751	have been initialized to valid
-0.563912	optimize a piece of CPU-intensive
-0.444058	increase the throughput of CPU-intensive
-0.237922	problems that relate to CPU-intensive
-0.237875	of assembly language for CPU-intensive
-0.324310	by 5-10% for some CPU-intensive
-0.436244	big for the stack. Is
-0.298114	be an efficient solution. Is
-0.141391	the most efficient solution. Is
-0.200043	or a binary tree. Is
-0.165179	pointer (see page 38). Is
-0.757844	be able to do so.
-0.333324	be obvious to do so.
-0.333324	tested seem to do so.
-0.229247	an array, or approximately so.
-0.165184	delete, and often excessively so.
-0.294238	This extra cost is seen
-0.356761	software performance should be seen
-0.357591	cache misses is not seen
-0.352925	model number. I have seen
-0.222383	compiler I have ever seen
-0.232325	lot of computing resources. Typically,
-1.108844	instruction set is enabled. Typically,
-0.328899	between several execution units. Typically,
-0.452018	size in the future. Typically,
-0.165174	computers have memory caches. Typically,
-1.082928	is divisible by the 107
-0.355568	or not. See page 107
-0.218578	12.3 Automatic vectorization ......................................................................................... 107
-0.165174	and YMM registers ................................................................. 107
-0.165174	and ZMM registers .......................................................... 107
-0.325187	The allocated memory is contiguous
-0.355137	stack memory which is contiguous
-0.237769	single container, preferably with contiguous
-0.346950	be stored in one contiguous
-0.233575	keep the two modules contiguous
-0.237807	with the pointer it gets
-0.237453	a template class which gets
-0.593921	The second generation class gets
-0.476462	before the end user gets
-0.234529	before the application programmer gets
-0.343725	in this series of manuals.
-0.237896	the relevant books and manuals.
-0.292867	suggestions for my optimization manuals.
-0.321133	described in the subsequent manuals.
-0.598605	a series of five manuals.
-0.294019	the function definition. This tells
-0.419248	linker. The map file tells
-0.290371	purposes. The const keyword tells
-0.101775	Event-based sampling: The profiler tells
-0.101775	Time-based sampling: The profiler tells
-0.355956	general method is to wrap
-1.355025	it is recommended to wrap
-0.325185	they are guaranteed to wrap
-0.354035	point variables do not wrap
-0.355621	will make the value wrap
-0.509649	each function or class separately
-0.324497	than moving each object separately
-0.235928	each pixel or line separately
-0.378823	call all code branches separately
-0.231901	code and compile them separately
-0.370793	Assume function is pure __attribute((
-0.222389	optimize(...) Fastcall function __fastcall __attribute((
-0.035784	by 16 __declspec( align(16)) __attribute((
-0.035784	__attribute(( aligned(16))) __declspec( align(16)) __attribute((
-0.251360	is pure __attribute(( const)) __attribute((
-0.237927	with other subtasks is necessary.
-0.525636	cleanup that may be necessary.
-1.027836	if it is not necessary.
-0.546987	input less efficient than necessary.
-0.537282	make overflow checks where necessary.
-0.294240	speed of CPUs is increasing
-0.314585	reduce the problem by increasing
-0.347643	and used for an increasing
-0.236684	we are seeing an increasing
-0.165179	bits represent a monotonically increasing
-0.365802	must be aligned by 16,
-0.255142	arrays are aligned by 16,
-0.349942	at 15 byte at 16,
-0.191071	sizes other than 8, 16,
-0.191071	(i.e. 2, 4, 8, 16,
-0.339024	for communication between different threads,
-0.339989	divide it into multiple threads,
-0.289586	be shared between multiple threads,
-0.289586	are shared between multiple threads,
-0.536987	communication and synchronization between threads,
-0.196033	poorly designed program. 6 Development
-0.196033	algorithm ....................................................................................... 24 6 Development
-0.322361	page 29 for details. Development
-0.165179	syntax is very old-fashioned. Development
-0.165179	on. Most IDE's (Integrated Development
-0.305960	the expression that is AND'ed
-0.305960	The expression that is AND'ed
-0.237424	value of cc[i]+2 is AND'ed
-0.237424	mask, and bb[i]*cc[i] is AND'ed
-0.456021	1]; Here, I have AND'ed
-0.266072	This allows common subexpression elimination
-0.182809	+= 2; Common subexpression elimination
-0.182809	(vector) reductions: Common subexpression elimination
-0.176507	Microsoft Constant propagation Pointer elimination
-0.176507	functions like sin. Pointer elimination
-0.348937	some cases, but not all.
-0.377769	no extra code at all.
-0.234273	are not supported at all.
-0.234273	or no offset at all.
-0.231910	seen can reduce them all.
-0.723902	Optimizations in the compiler ..........................................................................................
-0.380904	148 14.13 System programming ..........................................................................................
-0.532420	3.11 Other system resources ..........................................................................................
-0.552350	13.4 Test and maintenance ..........................................................................................
-0.501052	9.9 Access data sequentially ..........................................................................................
-0.720718	You may use the upper
-0.194041	time or a reasonable upper
-0.148749	Useful when no reasonable upper
-0.382917	i++) { // Get upper
-0.165179	time or a not-too-big upper
-0.294039	function names and code addresses.
-0.314317	around at different memory addresses.
-0.288677	but only self- relative addresses.
-0.226782	sometimes uses 32-bit absolute addresses.
-0.272277	are aligned at round addresses.
-0.065141	that *p+2 is a loop-invariant
-0.237638	common subexpression elimination and loop-invariant
-0.237638	elimination, constant propagation, and loop-invariant
-0.236533	it can move out loop-invariant
-0.325402	doing an addition to sum1
-0.741341	i += 2) { sum1
-0.237010	The two summation variables sum1
-0.466176	= 100; float list[size], sum1
-0.165174	list[i]; sum2 += list[i+1];} sum1
-0.331842	a ^ -1 = ~a
-0.293317	a= a a & ~a
-0.742720	n.a. - a & ~a
-0.309565	= 0 a ^ ~a
-0.165179	= -1 (a&~b)|(~a&b)=a^b --------- ~a
-0.443241	Floating point induction variables Compilers
-0.236722	C or C++ code. Compilers
-0.290683	of the micro-op cache. Compilers
-0.732915	32-bit Mac OS X Compilers
-0.224944	live ranges now overlap. Compilers
-0.165174	: "m"(x) : "memory" );
-0.165174	((C & 3) <<6 );
-0.165174	( short int bb[size] );
-0.165174	( short int cc[size] );
-0.165174	( short int aa[size] );
-0.237921	if the goal of 18
-0.224954	evict number 1. Number 18
-0.200037	embedded systems ............................................................................. 158 18
-0.200037	3.3 Program installation .................................................................................................. 18
-0.165174	(see p. 22). 159 18
-0.535675	to do something about them.
-0.494917	know how to avoid them.
-0.419047	the function that needs them.
-0.306996	too big before multiplying them.
-0.165174	the wires that connect them.
-0.232568	when b is floating point.
-0.621519	from integer to floating point.
-0.232568	intermediate results as floating point.
-0.231914	have three values per point.
-0.218590	pointer serves as entry point.
-0.442876	contentions and the time consumption
-0.342532	and stores the time consumption
-0.347923	programming style. The time consumption
-0.233667	get the exact time consumption
-0.236000	processors with low power consumption
-0.235573	its b member by 8.
-0.373435	an address divisible by 8.
-0.235573	multiplying the index by 8.
-0.200054	not alias, if appropriate. 8.
-0.356676	a key? If the key
-0.331679	simple actions like a key
-0.294036	response to pressing a key
-0.237181	by their index or key
-0.237181	a mouse move or key
-0.058860	page 78 for an explanation.
-0.235661	this works, here's an explanation.
-0.334432	C++ Performance for further explanation.
-0.325438	This needs a little explanation.
-0.237780	therefore not advantageous by itself.
-0.357781	resources than the code itself.
-0.356943	instruments into the program itself.
-0.233575	done by the constructor itself.
-0.288668	computer, including the profiler itself.
-0.538218	list needs to be updated
-0.882393	dynamic library can be updated
-0.305209	library has not been updated
-0.229913	IDE. Has not been updated
-0.165184	2004 - 2014. Last updated
-0.237592	value of i will appear
-1.214292	parts of the program appear
-0.261586	order in which they appear
-0.566204	in which the modules appear
-0.314694	very inefficient way. The Codeplay
-0.372960	conventions. Optimizes reasonably well. Codeplay
-0.222383	inlining x-xxxx--x Constantfolding xxxxxxxxx Codeplay
-0.165174	C/C++ v. 1.4, 2005. Codeplay
-0.165174	with these. The CodeGear, Codeplay
-0.237195	to the same object (except
-0.425112	manipulations on integer expressions (except
-0.231401	of the previous iteration (except
-0.524790	of the two loops (except
-0.226773	have floating point capabilities (except
-0.460667	level-3 cache. If the combined
-0.354269	memory model where the combined
-0.358400	and s3 can be combined
-0.325302	if the results are combined
-0.382546	Clang The Clang compiler combined
-0.341853	page 15. C++ is definitely
-0.232873	These complicated cases should definitely
-0.232873	The .NET framework should definitely
-0.288489	so. These containers should definitely
-0.326110	session. But lazy binding definitely
-0.237899	eax with 100 and jumps
-0.356366	the code that it jumps
-0.333976	invalid if a thread jumps
-0.122669	Join identical branches Eliminate jumps
-0.122669	y + 1.; Eliminate jumps
-0.237533	into the right vector elements.
-0.784663	of structure or class elements.
-0.597633	the addresses of array elements.
-0.233727	fast access to array elements.
-0.333204	binary search for finding elements.
-0.237445	parameter transfer across all .cpp
-0.306621	to combine the multiple .cpp
-0.231100	possibility of compiling multiple .cpp
-0.231100	or for combining multiple .cpp
-0.322214	module (i.e. the current .cpp
-0.341551	friendly compiler with many features,
-0.334918	class library has many features,
-0.235415	has many advanced optimizing features,
-0.230100	language has full metaprogramming features,
-0.212315	applications need better backup features,
-0.346280	off the position-independent code flag
-0.234673	that use the zero flag
-0.066451	either in the carry flag
-0.066451	kept in the carry flag
-0.144939	don't modify the carry flag
-0.055548	< 256; i += 8)
-0.308544	else { (iset >= 8)
-0.722171	keep up with the ever
-0.237905	has to invest in ever
-0.352925	no compiler I have ever
-0.292033	even if no exception ever
-0.301493	memory model is hardly ever
-0.236080	// Called directly // Writes
-0.023383	Writes "Hello 1" // Writes
-0.236080	= &Object2; p2->Hello(); // Writes
-0.532440	3.11 Other system resources Writes
-0.294203	4, 6, 9 and 13
-0.548697	12, last byte at 13
-0.226777	12.10 Conclusion .......................................................................................................... 120 13
-0.165174	and header files. 121 13
-0.165174	2.23 0.95 0.6 1.19 13
-0.408131	that the numbers in b[i]
-1.197445	i++) { a[i] = b[i]
-0.237793	overflow by checking if b[i]
-1.162203	< size; i++) { b[i]
-1.456562	i < size; i++) b[i]
-0.639717	the calculation time is doubled.
-0.606617	number of registers is doubled.
-0.478148	CPU clock frequency is doubled.
-0.357595	the performance is not doubled.
-0.451766	of registers has been doubled.
-0.314720	to be read and written
-0.356761	System code should be written
-0.237098	the floating point value written
-0.235132	the case with programs written
-0.165174	typo in a hand- written
-0.335651	better standardization of programming languages,
-0.308600	than in other programming languages,
-0.219637	(IDE) supports multiple programming languages,
-0.273475	memory allocation. Some programming languages,
-0.165190	example, in interpreted script languages,
-0.572637	allocated with new or malloc
-0.536164	new and delete or malloc
-0.236554	new and delete, or malloc
-0.450001	or with the functions malloc
-0.229243	new and delete (or malloc
-0.731006	of the program that runs
-0.324607	to make software that runs
-0.293245	than a thread that runs
-0.548755	Test if the program runs
-0.236975	again, that most software runs
-0.324776	b when a is true,
-0.513051	only when b is true,
-0.237424	bb[i] > 0 is true,
-0.237424	operand of || is true,
-0.237753	and therefore count as true,
-0.339498	spend time doing the division.
-0.237802	code involves multiplication or division.
-0.345053	functions for integer vector division.
-0.564928	cases of floating point division.
-0.293629	to SSE4.1 and integer division.
-0.237214	A; double Y = C;
-0.293427	= B; x.c = C;
-0.237265	A*x*x + B*x + C;
-0.013568	x; int A, B, C;
-0.212325	0.12 0.11 0.18 0.18 0.18
-0.165188	0.18 0.12 0.11 0.18 0.18
-0.212315	0.12 0.18 0.12 0.11 0.18
-0.212315	Intel Core 2 0.12 0.18
-0.165179	Core 2 0.63 0.75 0.18
-0.237266	not not _WIN32 n.a. MS
-0.116475	options relevant to optimization MS
-0.116475	keywords relevant to optimization MS
-0.226779	long long or int64_t MS
-0.218584	long long or uint64_t MS
-0.538225	(*SelectAddMul_pointer)(aa, bb, cc); } #endif
-0.229234	fistp dword ptr n; #endif
-0.265190	__attribute__((const)) #else #define pure_function #endif
-0.165174	#define Alignd(X) X __attribute__((aligned(16))) #endif
-0.165174	8 #define FUNCNAME SelectAddMul_AVX2 #endif
-1.055980	the rest of the present
-0.358486	the techniques in the present
-0.237877	by Agner Fog The present
-0.314695	dispatching are: Optimizing for present
-0.294013	problem that were not present
-0.697696	from example 15.1b to 15.1c
-0.420796	reduces example 15.1a to 15.1c
-0.237537	reducing example 15.1d to 15.1c
-0.926047	The code in example 15.1c
-0.165184	example 15.1a to 151 15.1c
-0.832444	const int size = 1000;
-0.236014	const int ArraySize = 1000;
-0.236014	const int arraysize = 1000;
-2.093573	= 0; i < 1000;
-0.572874	special versions of the strlen
-0.382775	that have tested the strlen
-0.200043	1.00 0.35 0.29 0.28 strlen
-0.165179	Here are some examples: strlen
-0.165179	4.5 0.82 0.59 0.27 strlen
-0.501271	3. The code is __asm
-0.237805	__asm int 3; or __asm
-0.234200	fld qword ptr x; __asm
-0.074783	32-bit Windows, Intel/MASM syntax: __asm
-0.074783	32-bit Linux, Gnu/AT&T syntax: __asm
-0.155882	zero or one clock cycle.
-0.319056	take only one clock cycle.
-0.155882	in just one clock cycle.
-0.046045	point addition every clock cycle.
-0.046045	one addition every clock cycle.
-0.802209	8, last byte at 11
-0.381689	time. Integer multiplication takes 11
-0.236118	application integration, mixed language 11
-0.231396	feature is rarely needed. 11
-0.218578	10.1 Hyperthreading ..................................................................................................... 103 11
-0.237867	(*.dll or *.so) that belong
-0.237438	0x4700. These addresses all belong
-0.236756	consuming library functions often belong
-0.236535	on the stack always belong
-0.338548	of these cache lines belong
-0.237136	original is destroyed. In 50
-0.324364	Typically, the conversion takes 50
-0.218578	Function return types .............................................................................................. 50
-0.212309	7.15 Function parameters ............................................................................................... 50
-0.212309	} This code took 50
-0.237654	(Integrated Development Environments) have facilities
-0.289286	devices and using advanced facilities
-0.020182	been added? If search facilities
-0.224948	development tools have powerful facilities
-0.237729	Manual", Volume 1 - 5.
-0.339880	a to the constant 5.
-0.233806	32 with j << 5.
-0.233577	AMD and VIA CPUs. 5.
-0.251354	preferably 32 for AVX. 5.
-0.325254	by Intel but is currently
-0.331746	The Windows version is currently
-0.314670	predefined vector classes are currently
-0.340558	not used. The method currently
-0.235833	in the Gnu manual currently
-0.235027	may occur in multiplication here:
-0.229245	allocation can be mentioned here:
-0.222383	are two main principles here:
-0.222396	beware of the pitfalls here:
-0.165174	any other error reporting here:
-0.436527	structure in some cases. Does
-0.745090	and later instruction sets. Does
-0.362407	compiler for 32-bit Windows. Does
-0.362407	Supports only 32-bit Windows. Does
-0.272284	Windows, including an IDE. Does
-0.331701	} A problem with macros
-0.343550	directives when used as macros
-0.736708	Therefore, you should avoid macros
-0.236157	it calls. 48 Use macros
-0.165174	nontemporal Table 18.3. Predefined macros
-0.353590	bad dilemma. You may prefer
-0.236577	example, a programmer may prefer
-0.237683	class. Which solution you prefer
-0.237592	that many users will prefer
-0.293261	is double. Here we prefer
-0.819065	the value of the divisor
-0.023460	unsigned // Faster if divisor
-0.349549	by using a constant divisor
-0.122675	updates .................................................................................................... 19 3.5 Program
-0.122675	the update process. 3.5 Program
-0.074788	spots .................................................................................. 16 3.3 Program
-0.074788	the following sections. 3.3 Program
-0.314761	generation of processors is better.
-0.284375	processor model will work better.
-0.229251	the next model work better.
-0.165184	This solution is clearly better.
-0.450215	In 32-bit Linux and BSD,
-0.453720	system is based on BSD,
-0.279876	32-bit and 64-bit Linux, BSD,
-0.208515	x86 platforms (Windows, Linux, BSD,
-0.356949	vector c2 with the bit-mask:
-0.048375	0 and generate a bit-mask:
-0.330846	bc with the inverted bit-mask:
-1.757478	is a power of two.
-0.237805	within a year or two.
-0.356625	one array rather than two.
-0.489691	other compilers will make two.
-0.330854	several minutes to start up,
-0.501061	need to be cleaned up,
-0.212315	time the computer starts up,
-0.212315	time it is filled up,
-0.218590	called and resources cleaned up.
-0.347508	when the program starts up.
-0.265197	heap to be filled up.
-0.165179	chains can be broken up.
-0.236989	if required for performance reasons.
-0.330076	for C++ for several reasons.
-0.229243	as possible for usability reasons.
-0.200043	new version for marketing reasons.
-0.355573	of order. See page 103
-0.212315	101 10.1 Hyperthreading ..................................................................................................... 103
-0.200043	of order execution ................................................................................................. 103
-0.251360	improve this by writing: 103
-0.101119	execution time. 4 2 Choosing
-0.101119	optimizing ............................................................................................... 4 2 Choosing
-0.215269	usability ............................................................................................... 23 5 Choosing
-0.215269	from a website. 5 Choosing
-0.539396	lengths of the time slices
-0.331475	frequent if the time slices
-0.331475	can increase the time slices
-0.233669	thread will get time slices
-0.992241	in case of an exception.
-0.326457	the event of an exception.
-0.235664	F2 actually throws an exception.
-0.235773	the destructor causes another exception.
-0.236409	multiple conditions using & enum
-0.236166	functions. 7.4 Enums An enum
-0.399291	14.7a. Testing multiple conditions enum
-0.165179	int, float, double, bool, enum
-0.457176	loop in a program repeats
-0.351523	example, if a loop repeats
-0.379480	FuncC(i); } This loop repeats
-0.237167	another loop that also repeats
-0.356869	a CPU with the highest
-0.357069	language modules when the highest
-0.443715	every clock cycle. The highest
-0.382294	language is implemented. The highest
-0.313192	of rows/columns in matrix 96
-0.222389	Access data sequentially .......................................................................................... 96
-0.200043	93 9.8 Strings ...................................................................................................................... 96
-0.165179	large data structures ............................................................. 96
-0.294017	am not going to recommend
-0.237732	for software teachers to recommend
-0.035786	functions Some programming textbooks recommend
-0.035786	classes Nowadays, programming textbooks recommend
-0.353649	is also likely to lead
-0.457790	the code. This can lead
-0.236739	This new insight can lead
-0.236739	studying the bottlenecks can lead
-0.313276	counter then make an additional
-0.323874	avoided by making an additional
-0.357694	to give the compiler additional
-0.272290	register left for transferring additional
-0.585596	created. There is no 51
-0.653182	each other. See page 51
-0.165179	7.17 Structures and classes............................................................................................ 51
-0.165179	data members (properties) ............................................................................ 51
-0.237533	{ // 2-dimensional vector 56
-0.218584	7.26 Overloaded functions .............................................................................................. 56
-0.218584	7.27 Overloaded operators ............................................................................................. 56
-0.165179	55 7.25 Bitfields ................................................................................................................... 56
-0.457996	can also be a type.
-0.351132	pointer of a different type.
-0.237395	size of each integer type.
-0.212315	type-casted to a wrong type.
-0.353447	copying them into a place
-1.356009	it is recommended to place
-0.328618	called only from one place
-0.235282	i and shifts one place
-0.585705	execution then it is preferable
-0.497378	that static linking is preferable
-1.133300	but it may be preferable
-0.551942	style. It is often preferable
-0.478390	for the CPU to overlap
-0.498383	microprocessor is able to overlap
-0.237837	with out-of-order capabilities can overlap
-0.354035	their live-ranges do not overlap
-0.023772	eight to fit the eight-element
-0.309587	without SSE2 typically takes 40
-0.403911	microprocessors. Integer division takes 40
-0.501069	i; short int s; 40
-0.165184	38 7.11 Type conversions.................................................................................................... 40
-0.237705	add 2 to x 43
-0.342150	prediction mechanism. See page 43
-0.342150	long delay. See page 43
-0.165184	Branches and switch statements............................................................................. 43
-0.726184	in 32-bit systems and sixteen
-0.575156	32-bit operating systems and sixteen
-0.237809	operations involves eight or sixteen
-0.230820	It can contain either sixteen
-0.999921	also be used for turning
-0.339796	frame function or by turning
-0.292398	the whole program by turning
-0.236310	up significantly just by turning
-0.235913	126 Make pointer at initialization.
-0.323559	call. Load library at initialization.
-0.330647	the object doesn't need initialization.
-0.337191	that does the necessary initialization.
-0.233037	only on PC platforms. Graphics
-0.212321	these categories: File input/output Graphics
-0.074786	optimizing database access. 3.10 Graphics
-0.074786	databases ....................................................................................................... 21 3.10 Graphics
-0.354363	can predict where the obstacles
-0.349864	be aware of these obstacles
-0.236319	avoid them. Some important obstacles
-0.743128	of the most common obstacles
-0.358488	are provided in the asmlib
-0.351215	in vectors, but the asmlib
-0.023286	supported instruction set, using asmlib
-0.419559	a piece of code. Furthermore,
-0.325422	the program is executed. Furthermore,
-0.200043	problems and system crash. Furthermore,
-0.165179	and stored in edx. Furthermore,
-1.469453	it is possible to obtain
-0.345621	is sometimes possible to obtain
-0.536264	functions then you can obtain
-0.443443	such cases, you can obtain
-0.584269	change the value of ebx.
-0.345095	least significant bit of ebx.
-0.237924	is in edx, to ebx.
-0.212321	to push and pop ebx.
-0.237805	with a prediction or estimate
-0.272291	(or if a reasonable estimate
-0.165179	then we can roughly estimate
-0.165179	to see if our estimate
-1.341714	SSE2 instruction set is enabled
-1.041009	later instruction set is enabled
-0.340096	possible. SSE2 is always enabled
-0.231910	cores and leave them enabled
-0.294213	inlining more efficient and enables
-0.023366	single object file. This enables
-0.571114	any other modules. This enables
-0.122675	on this option. 8.4 Obstacles
-0.122675	compiler ....................................................................... 77 8.4 Obstacles
-0.074788	and PathScale compilers. 8.3 Obstacles
-0.074788	different compilers............................................................................. 74 8.3 Obstacles
-0.021990	Func(int a[], int & r)
-0.218232	void FuncB (int & r)
-0.218232	p->b;} int Sum3(S3 & r)
-0.237906	can be arranged in regular
-0.325328	automatically prefetch data for regular
-0.237556	through the Internet at regular
-0.495986	pointer follows a simple regular
-0.584374	know the value of m
-0.685811	lies in the way m
-0.216495	in the template function, m
-0.353225	In the simple function, m
-0.236730	that string as code. Metaprogramming
-0.176501	programming .......................................................................................... 150 15 Metaprogramming
-0.176501	See page 90. 15 Metaprogramming
-0.272290	page 90. 15 Metaprogramming Metaprogramming
-0.234536	shortly. The following examples explain
-0.068039	assembly code. Let me explain
-0.068039	and sets. Let me explain
-0.212321	short vector libraries. To explain
-0.236708	the branching takes time. Dispatch
-0.234526	See page 128 below. Dispatch
-0.310515	compiled with different compilers. Dispatch
-0.165179	decision at different times: Dispatch
-0.294033	to other platforms as well,
-0.291910	Some functions are optimized well,
-0.309269	same way is predicted well,
-0.212323	Visual Studio optimizes reasonably well,
-0.543065	2-20, but this is sufficiently
-0.144336	sure the arrays are sufficiently
-0.200049	line written. This worked sufficiently
-0.234526	in example 13.1 below. 126
-0.226779	33 11.8 127 127 126
-0.222389	Test and maintenance .......................................................................................... 126
-0.212315	126 13.5 Implementation ..................................................................................................... 126
-0.314759	float and double is bad
-0.358078	the programmer in a bad
-0.341861	find more examples of bad
-0.232333	code implementation works particularly bad
-0.002795	{ public: static double p(double
-0.139515	optimization. Everything that is said
-0.139515	register. Everything that is said
-0.578439	least, it can be said
-0.524822	It is often easier said
-0.358057	also applies to the modulo
-0.336299	same rules apply to modulo
-0.237218	the same as i modulo
-0.351690	be used to avoid modulo
-0.336262	large data files and databases
-0.043101	...................................................................................................... 20 3.9 Other databases
-0.043101	files). 20 3.9 Other databases
-0.222395	locally. Access to remote databases
-0.095313	point overflow: _controlfp_s(&dummy, 0, _EM_OVERFLOW);
-0.095313	status: _fpreset(); _controlfp_s(&dummy, 0, _EM_OVERFLOW);
-0.008674	0, _EM_OVERFLOW); // _controlfp(0, _EM_OVERFLOW);
-0.224953	6); Or, if protection against
-0.165179	said, I must warn against
-0.165179	scheme should be weighed against
-0.165179	number of possible remedies against
-0.288667	the vector register size. Vectorized
-0.356088	classes and overloaded operators. Vectorized
-0.165179	set: // Example 12.4b. Vectorized
-0.165179	*)d, x); } 112 Vectorized
-0.165179	break; case 1: printf("Beta"); break;
-0.165179	break; case 2: printf("Gamma"); break;
-0.165179	break; case 3: printf("Delta"); break;
-0.165179	{ case 0: printf("Alpha"); break;
-0.587954	Linux is that the loader
-0.356933	once more by the loader
-0.237781	program is loaded, the loader
-0.294213	both compiler, linker and loader
-0.602989	family and model number. Failure
-0.122672	allocated is also deallocated. Failure
-0.332676	it has been deallocated. Failure
-0.251367	cases of program flow. Failure
-0.344645	of the class is declared.
-0.759045	which the variable is declared.
-0.102686	the time MemberPointer is declared.
-0.102686	c1 before MemberPointer is declared.
-0.579438	consumes a lot of resources,
-0.237619	slower or require more resources,
-0.549274	compete for the same resources,
-0.232711	and interfaces to network resources,
-0.595214	evaluated if a is true.
-0.408087	false and 1 for true.
-0.356763	where it should be true.
-0.532820	this is not always true.
-0.336258	simple variables, arrays and objects.
-0.349166	memory allocation for all objects.
-0.784663	of structure or class objects.
-0.235760	block for every four objects.
-0.593340	doing multiple calculations in parallel.
-0.428246	threads that run in parallel.
-0.438099	that are running in parallel.
-0.537263	to do things in parallel.
-0.314663	were inserted, one by one,
-0.331359	always one, and only one,
-0.340089	that there is always one,
-0.224944	the preceding label plus one,
-0.234622	// Example 14.12b int list[300];
-0.234622	// Example 14.13b int list[300];
-0.234622	// Example 14.13a int list[300];
-0.234622	// Example 14.12a int list[300];
-0.011965	0; r < SIZE; r++)
-0.011965	1; r < SIZE; r++)
-1.385181	a, b; a = parabola
-0.231168	// Example 8.3a float parabola
-0.231168	a * a;} float parabola
-0.231168	// Example 8.1b float parabola
-0.236662	} // x^2 // x^4
-0.236662	x^4 F32vec4 xx4(x4); // x^4
-0.236662	x2 * x2; // x^4
-0.165190	// x^1, x^2, x^3, x^4
-0.331901	simple things like a mouse
-0.237902	times to keyboard and mouse
-0.237181	a key press or mouse
-0.237181	quickly to keyboard or mouse
-0.221926	template because partial template specialization
-0.022282	}; // Full template specialization
-0.221926	}; // Partial template specialization
-0.537383	function of the loop index.
-0.706601	used as an array index.
-0.349210	list with a simple index.
-0.165179	array with a top-of-stack index.
-0.236989	under advanced system performance options.
-0.236726	Has many good optimization options.
-0.322607	turn on all relevant options.
-0.165179	set of performance monitoring options.
-0.049363	0; c < SIZE; c++)
-0.013569	0; c < r; c++)
-0.293070	smaller the data elements are.
-0.441099	the obstacles to optimization are.
-0.595111	make sure that they are.
-0.231309	and that's what they are.
-0.357409	memory allocation may be needed,
-0.519630	only when they are needed,
-0.023450	If search facilities are needed,
-0.450230	Unfortunately, the way of declaring
-0.688552	This is done by declaring
-0.292398	variable even smaller by declaring
-0.236310	to be inlined by declaring
-0.358547	function names in the SVML
-0.232071	of 2 double Intel SVML
-0.232071	earlier vmlsExp4 vmldExp2 Intel SVML
-0.232071	later __svml_expf4 __svml_exp2 Intel SVML
-0.037089	link library (*.dll or *.so).
-0.008674	called shared objects (*.dll, *.so).
-0.379705	i; } u; if (u.i
-0.023351	} u, v; if (u.i
-0.235663	int n; 143 if (u.i
-0.236128	usability problems and necessary support.
-0.571299	with and without AVX support.
-0.228160	the program with profiling support.
-0.165179	a compiler with C++0x support.
-0.294213	time than addition and subtraction
-0.017525	longer time than addition, subtraction
-0.212331	reductions involving integer addition, subtraction
-0.023472	= _mm_add_epi16(c, two); // Multiply
-0.237711	// Example 7.42 int Multiply
-0.165184	= (a<b && b<c) Multiply
-0.122672	(*p != 0) *(p++) |=
-0.122672	> 0; i--) *(p++) |=
-0.165184	7.27 float x; *(int*)&x |=
-0.165184	x.f = 2.0f; x.i |=
-0.687397	strings in a memory pool.
-0.232725	a container or memory pool.
-0.780995	in the same memory pool.
-0.288320	strings in one memory pool.
-0.429436	takes. The version that performs
-0.235398	which a code version performs
-0.235398	which this code version performs
-0.236170	cache. The Core2 processor performs
-0.357424	and statistics, and the "Intel
-0.494781	available, such as the "Intel
-0.237879	common purposes (www.boost.org). The "Intel
-0.218594	on code optimization Intel: "Intel
-1.254204	known at compile time. Are
-0.218590	with a top-of-stack index. Are
-0.200043	to be too small. Are
-0.165179	a linked list. 94 Are
-0.237877	and decrement operators The pre-increment
-0.293900	difference whether you use pre-increment
-0.338747	are also situations where pre-increment
-0.233327	adjusted if you change pre-increment
-0.294206	the allocated object, and ownership
-0.284365	"move constructor" to transfer ownership
-0.165179	or operator that transfers ownership
-0.165179	The object that looses ownership
-0.653182	each other. See page 88
-0.165179	allocation (new and delete). 88
-0.165179	be stored together ...................................... 88
-0.165179	should be stored together...................................... 88
-0.148760	float x; *(int*)&x |= 0x80000000;
-0.148760	= 2.0f; x.i |= 0x80000000;
-0.074788	} u; u.i ^= 0x80000000;
-0.074788	example with u.i[1] ^= 0x80000000;
-0.504773	calls and it can move
-0.545211	expression that it can move
-0.237693	because the container may move
-0.218590	things like a mouse move
-0.293852	a = c; } Can
-0.302373	not supported at all. Can
-0.212315	been updated since 2004. Can
-0.251360	access to the container. Can
-0.348356	a portable way of defining
-0.355514	definitions when used for defining
-0.406860	solved this problem by defining
-0.237046	can be overcome by defining
-0.713523	piece of code that produces
-0.345913	a C++ program that produces
-0.233491	of an unsigned variable produces
-0.233491	of a signed variable produces
-0.237785	cause a loss of precision,
-0.237785	overflow or loss of precision,
-0.277266	with single or double precision,
-0.277266	single precision or double precision,
-0.456118	Inlined functions have a non-inlined
-0.581710	has to make a non-inlined
-0.341399	compiler from making a non-inlined
-0.237738	from another module. This non-inlined
-0.526754	avoids many of the drawbacks
-0.048373	applications. 2.8 Overcoming the drawbacks
-0.048373	14 2.8 Overcoming the drawbacks
-0.294213	of the advantages and drawbacks
-0.649563	float Exp(float x) { __declspec(align(16))
-0.218590	Example: // Example 12.2 __declspec(align(16))
-0.251360	Microsoft compiler #define Alignd(X) __declspec(align(16))
-0.200043	always work. Data alignment. __declspec(align(16))
-0.552081	flip sign bit of u.f
-0.336244	that we know that u.f
-0.356253	> v.i) { // u.f
-0.226782	// Now 1.0 <= u.f
-0.357827	are available for the commercial
-0.237595	well. Codeplay VectorC A commercial
-0.331102	are missing in many commercial
-0.200043	General Public License, optional commercial
-0.291178	to read and write configuration
-0.279493	shared objects), resource files, configuration
-0.200043	loading of several drivers, configuration
-0.165179	excessive number of DLLs, configuration
-0.350857	The examples on page 134
-0.351579	of range (see page 134
-0.466197	Index out of range"; 134
-0.200049	14.2 Bounds checking .................................................................................................. 134
-0.237756	individual functions or code lines.
-0.128995	for the same cache lines.
-0.707067	longer than a few lines.
-0.348324	for your compiler for restrictions
-0.235638	instructions have very few restrictions
-0.074887	optimal. There are certain restrictions
-0.074887	tables". There are certain restrictions
-0.331433	subexpression elimination x n.a. Constant
-0.234674	Gnu Intel Borland Microsoft Constant
-0.165179	5.0f; b = 6.0f; Constant
-0.165179	or a few places. Constant
-0.254802	stored by the heap manager
-0.119653	random order. The heap manager
-0.119653	memory heap. The heap manager
-0.119653	become invalid. The heap manager
-0.236957	branch target buffer, branch pattern
-0.003834	follows a simple periodic pattern
-0.015542	branches. A simple periodic pattern
-0.829017	64-bit mode because the x86-64
-0.037136	Supports all x86 and x86-64
-0.212321	x86 platform _M_IX86 _M_IX86 x86-64
-0.452884	permissible to assume that *p+2
-0.293711	prevented from assuming that *p+2
-0.280832	reload *p and calculate *p+2
-0.226129	8.21, you could calculate *p+2
-0.237902	The CodeGear, Codeplay and Watcom
-0.184765	not optimize well. Open Watcom
-0.184765	v. 8.42n, 2004. Open Watcom
-0.222399	x-xxxx--x Constantfolding xxxxxxxxx Codeplay Watcom
-0.458164	truncation and make a round
-0.237924	align data members to round
-0.235913	data are aligned at round
-0.323559	libraries are loaded at round
-0.804091	have two or more cores,
-0.237485	multiple CPUs or CPU cores,
-0.237196	vector processing instructions, multiple cores,
-0.212315	sets have got RISC cores,
-0.211204	has a branch that chooses
-0.211204	example, a branch that chooses
-0.649754	situation where a program chooses
-0.236547	If the cache always chooses
-0.180893	while the program is running.
-0.354682	the programs they are running.
-0.237619	the program itself when running.
-0.507429	lot of code is serial
-0.345352	The above code is serial
-0.035786	== 2 12.6 Transforming serial
-0.035786	............................................................................................. 113 12.6 Transforming serial
-0.310539	eight consecutive elements from cc
-0.232777	into vector b: from cc
-0.237242	on first call // Header
-0.237242	2.20 or later // Header
-0.620422	as follows: Instruction set Header
-0.165184	x86intrin.h (Gnu) Table 12.2. Header
-0.350829	of mathematical functions that 150
-0.355573	main program. See page 150
-0.222389	14.13 System programming .......................................................................................... 150
-0.212315	150 15 Metaprogramming ....................................................................................................... 150
-0.237154	C++ is quite efficient thanks
-0.234922	processors prefetch data automatically thanks
-0.230103	microprocessors are very similar thanks
-0.283151	space never becomes fragmented thanks
-0.341281	- 2, x = 2.0;
-0.462290	1.0; for (x = 2.0;
-0.236017	= 1.0; list[i].b = 2.0;
-0.236017	= 1.0; temp->b = 2.0;
-0.329487	a branch into the pipeline
-0.329487	is fed into the pipeline
-0.341664	are executed. However, the pipeline
-0.549302	obtained by using a pipeline
-0.353460	(double x, unsigned int n)
-0.044449	14.1b int factorial (int n)
-0.044449	14.1a int factorial (int n)
-0.326408	<malloc.h> void SomeFunction (int n)
-0.156727	response times for user input.
-0.156727	that waits for user input.
-0.156727	handle. Waiting for user input.
-0.272297	to keyboard or mouse input.
-0.723904	Optimizations in the compiler 8.1
-0.680037	are listed in table 8.1
-0.234524	possibility of overflow. Table 8.1
-0.212315	the compiler .......................................................................................... 66 8.1
-0.236444	under the worst- case conditions.
-0.379470	CPU or other hardware conditions.
-0.272284	response time under worst-case conditions.
-0.165179	done under the best-case conditions.
-0.293235	obtain much more by choosing
-0.457300	performance is obtained by choosing
-0.921913	taken into account when choosing
-0.349097	to help the programmer choosing
-1.006710	reasons explained on page 146
-0.332965	in detail on page 146
-0.376450	of dynamic linking are: 146
-0.165184	Static versus dynamic libraries............................................................................ 146
-0.382262	56 7.26 Overloaded functions ..............................................................................................
-0.531836	7.16 Function return types ..............................................................................................
-0.377672	81 8.6 Optimization directives ..............................................................................................
-0.567156	9.11 Explicit cache control ..............................................................................................
-0.294092	calling the intrinsic function _mm256_zeroupper()
-0.118792	AVX support then call _mm256_zeroupper()
-0.118792	CPU dispatching then call _mm256_zeroupper()
-0.118792	AVX support, then call _mm256_zeroupper()
-0.417426	messages to the user. Making
-0.165188	Conclusion .......................................................................................................... 120 13 Making
-0.165188	header files. 121 13 Making
-0.165184	possibility for significant improvements. Making
-0.408076	recommended to set the flush-to-zero
-0.325319	benefit from setting the flush-to-zero
-0.176507	// Example 7.6. Set flush-to-zero
-0.176507	// Example 7.5. Set flush-to-zero
-0.357790	the example of a Taylor
-0.356115	mathematical iterations such as Taylor
-0.165179	classes): // Example 12.9b. Taylor
-0.165179	this: // Example 12.9a. Taylor
-0.237119	selected version FuncType * SelectAddMul_pointer
-0.228173	&SelectAddMul_SSE41; (iset >= 2) SelectAddMul_pointer
-0.222404	{ (iset >= 8) SelectAddMul_pointer
-0.212315	&SelectAddMul_AVX2; (iset >= 5) SelectAddMul_pointer
-0.724753	initially points to the dispatcher.
-0.356956	initially points to a dispatcher.
-0.023343	are overriding Intel's CPU dispatcher.
-0.119654	applications and the Gnu, Clang,
-0.119654	supported by the Gnu, Clang,
-0.119654	to use the Gnu, Clang,
-0.125219	vectorization, such as Gnu, Clang,
-0.237899	in example 14.8 and 14.9
-0.593093	order. Example: // Example 14.9
-0.212315	and integers ................................... 141 14.9
-0.466187	d = (double)(signed int)u; 14.9
-0.441794	that depends only on n,
-0.236384	for any compile-time constant n,
-0.189598	Example 7.32a double x, n,
-0.189598	Example 7.32b double x, n,
-0.885337	2. Example: // Example 14.8
-0.926031	the code in example 14.8
-0.212315	ported to another platform. 14.8
-0.212315	float and double..................................................................................... 140 14.8
-0.779901	is no risk of overflow,
-0.382790	is no checking for overflow,
-0.237395	array bounds violation, integer overflow,
-0.312382	too small to cause overflow,
-0.058281	0; x < 100; x++)
-0.284368	2.0; x <= n; x++)
-0.165184	i >= 0; i--, x++)
-0.331812	the caching conditions are optimal.
-0.524302	built-in code is not optimal.
-0.650889	write instructions are not optimal.
-0.314328	of course far from optimal.
-0.001700	& x) { _mm_storeu_si128((__m128i *)d,
-0.165190	& x) { _mm_store_si128((__m128i *)d,
-0.786500	An object of a class,
-0.200060	Vector class, Intel Vector class,
-0.200060	of vector, bits Vector class,
-0.320098	pointer to a derived class,
-0.237588	y = cos(x); } z
-0.234523	x > y && z
-0.466187	{ y = cos(x); z
-0.466187	{ y = sin(x); z
-0.172210	b) is calculated in advance
-0.172210	it has calculated in advance
-0.331096	of memory needed in advance
-0.237105	microprocessor doesn't know in advance
-0.009109	from cc into vector c:
-0.447642	value of b is guaranteed
-0.237761	value of i&15 is guaranteed
-0.354682	integers - they are guaranteed
-0.357595	image base is not guaranteed
-0.354562	graceful way. You may think
-0.346057	CPU model and then think
-0.236356	on usability, but I think
-0.290685	to a. I don't think
-0.636401	page 89 for an example.
-0.338976	explain this with an example.
-0.347817	function (n!) as an example.
-0.354016	exception handling in this example.
-0.742022	set is available. The older
-0.336095	that the compatibility with older
-0.325253	a significant effect on older
-0.231914	a branch tree. On older
-0.294242	compilers, etc., as is commonly
-0.315813	Register variables The most commonly
-0.315813	different purposes. The most commonly
-0.510771	address. There are two commonly
-0.346481	may fill up the queue
-0.444367	efficient to implement a queue
-0.237595	at inconvenient times. A queue
-0.251360	For example, a FIFO queue
-0.237950	memory when exiting the {}
-0.235891	by declaring it inside {}
-0.265197	while (0 < 5) {}
-0.165179	add elements }; vector() {}
-0.802118	b = a + 1.0f;
-0.042023	else { list[i] += 1.0f;
-0.220499	list[i & 15] += 1.0f;
-0.122675	jl $B1$3: pop ret ALIGN
-0.122675	cmp ja $B2$3: ret ALIGN
-0.017524	8.26a compiled to assembly: ALIGN
-0.017524	8.26b compiled to assembly: ALIGN
-0.237381	program. This requires no modification
-0.333600	the program may need modification
-0.231418	14.30 will therefore need modification
-0.311763	see if a certain modification
-0.230108	ball reveals that similar solutions
-0.074786	something about it. Possible solutions
-0.074786	on non-Intel machines? Possible solutions
-0.165184	same chip. Such hybrid solutions
-0.045811	in C++ An optimization guide
-0.045811	VIA CPUs: An optimization guide
-0.045811	in C++: An optimization guide
-0.045811	assembly language: An optimization guide
-0.358546	the examples in the appendix
-0.348254	are provided in an appendix
-0.348699	are available as an appendix
-0.236169	efficient container classes. An appendix
-0.294184	or code lines. The 17
-0.224960	in this column. Number 17
-0.074786	Worst-case testing ................................................................................................ 157 17
-0.074786	random than normal. 157 17
-0.237952	you should apply the empty
-0.237879	empty throw() specification. The empty
-0.345307	F1 also have an empty
-0.236686	frame functions. While an empty
-0.172218	also makes testing and maintenance
-0.172218	of development, testing and maintenance
-0.048265	2 13.4 Test and maintenance
-0.048265	decision. 13.4 Test and maintenance
-0.325202	by XOR'ing it with 1:
-0.312991	0: printf("Alpha"); break; case 1:
-0.200817	?Func@@YAXQAHAAH@Z PROCNEAR ; parameter 1:
-0.200817	PROC NEAR ; parameter 1:
-0.165188	is rarely needed. 11 Out
-0.165188	Hyperthreading ..................................................................................................... 103 11 Out
-0.165184	accessed on a First-In-Last- Out
-0.165184	accessed on a First-In-First- Out
-0.457816	avoid this in a protected
-0.354347	error message in a protected
-0.237732	needs to switch to protected
-0.237732	overhead of switching to protected
-0.518899	with dynamic memory allocation. Container
-0.074786	allocation ...................................................................................... 90 9.7 Container
-0.074786	on using alloca. 9.7 Container
-0.165184	some cases. Multiple threads? Container
-0.756487	may be used as alternatives
-0.459230	there are more efficient alternatives
-0.237117	look at the possible alternatives
-0.553359	normally. There are various alternatives
-1.014571	do a lot of modifications
-1.179284	can be improved by modifications
-0.289795	to check if your modifications
-0.233586	is likely to require modifications
-0.309393	i<300; i+=3,i_div_3++){ list[i] += i_div_3;
-0.220499	+= i_div_3; list[i+1] += i_div_3;
-0.220499	+= i_div_3; list[i+2] += i_div_3;
-0.345112	int list[300]; int i, i_div_3;
-0.350838	s; 40 i = s;
-0.063560	int i; short int s;
-0.200049	& x) { __m128 s;
-0.035786	look at the "worst case"
-0.035786	counts represent the "worst case"
-0.074788	it is the "best case"
-0.074788	"worst case" and "best case"
-0.545097	handling. You have to distinguish
-0.338727	executables. Make sure to distinguish
-1.245567	It is important to distinguish
-0.335567	It may fail to distinguish
-0.237879	rounding and truncation. The missing
-0.459764	aliasing. Operations that are missing
-0.352009	Unfortunately, these functions are missing
-0.324991	or input data. A missing
-0.231416	Mac platforms. 2. Optimizing subroutines
-0.001700	in manual 2: "Optimizing subroutines
-0.222400	true that some development tools
-0.222400	or network. Various development tools
-0.230106	waiting for better metaprogramming tools
-0.228166	vendors are offering profiling tools
-0.356774	we have a = 0x2710
-0.118900	a variable from address 0x2710
-0.118900	Reading again from address 0x2710
-0.118900	program reads from address 0x2710
-0.266631	to isolate the hot spot
-0.135006	there is a hot spot
-0.135006	profiling. When a hot spot
-0.174227	single function or hot spot
-0.237517	as recursive templates. The powN
-0.237517	inside the template. The powN
-0.341711	avoiding infinite loop if powN
-0.407336	IsPowerOf2, int N> class powN
-0.645313	the same as the C-style
-0.356169	more safe than the C-style
-0.314696	Implicit type conversion // C-style
-0.233812	a string. The old C-style
-0.471081	of the C++ language While
-0.291783	calls to frame functions. While
-0.466187	subroutines in assembly language". While
-0.165179	to use than others. While
-0.517523	b:2; int c:2; }; Bitfield
-0.227103	}; char abc; }; Bitfield
-0.233822	// Example 7.40b union Bitfield
-0.231914	// Example 7.40a struct Bitfield
-0.237731	also has something to clean
-0.382588	there is nothing to clean
-0.293578	The simplest and most clean
-0.236250	words, the program must clean
-0.226782	with the option -fpic according
-0.301497	in a binary representation according
-0.212323	compiler to always behave according
-0.200043	-100+100+100 = 100. Now, according
-0.356255	< 13) { // Bounds
-0.074786	to are constant. 14.2 Bounds
-0.074786	tables ................................................................................................. 132 14.2 Bounds
-0.165184	for minimizing memory fragmentation. Bounds
-0.382809	int i; } u; u.i
-0.587341	} u; int n; u.i
-0.276603	// check if nonzero u.i
-0.878453	can lead to a dramatic
-0.445938	effect is much more dramatic
-0.350393	This has a very dramatic
-0.235323	This can have quite dramatic
-0.292826	command-line versions without an IDE.
-0.236686	32-bit Windows, including an IDE.
-0.340216	not have its own IDE.
-0.454653	the Microsoft Visual Studio IDE.
-0.237877	thread are smaller. The lengths
-0.351323	or strings of different lengths
-0.237063	strings typically have variable lengths
-0.165179	have gone to great lengths
-0.357595	member functions is not expensive.
-0.325844	Cache misses are very expensive.
-0.233043	features, but also very expensive.
-0.340829	level-1 cache are less expensive.
-1.376584	for the sake of efficiency.
-0.331722	hardly any loss of efficiency.
-0.339473	be a difference in efficiency.
-0.327060	these methods to improve efficiency.
-0.203368	Literature ..................................................................................................................... 163 20 Copyright
-0.203368	for some links. 20 Copyright
-0.165184	always available from www.agner.org/optimize. Copyright
-0.165184	Technical University of Denmark. Copyright
-0.331902	the integer registers is extended
-0.763996	This method can be extended
-0.539835	128-bit XMM registers are extended
-0.341574	is done with an extended
-0.237404	size) = (total cache size)
-0.308538	0 || i >= size)
-0.074786	(memory address) / (line size)
-0.074786	(number of sets) (line size)
-0.200043	SSE4.2 nmmintrin.h (MS) smmintrin.h (Gnu)
-0.165179	(Gnu) AMD FMA4 fma4intrin.h (Gnu)
-0.165179	XOP ammintrin.h (MS) xopintrin.h (Gnu)
-0.165179	all intrin.h (MS) x86intrin.h (Gnu)
-0.294240	known. This information is contained
-1.079158	a pointer to a contained
-0.237922	each thread. Pointers to contained
-0.279488	code can be completely contained
-0.507257	avoid the overhead of transferring
-0.293759	The preferred method for transferring
-0.237506	free register left for transferring
-0.237785	in 32-bit Windows by transferring
-1.014098	data caching less efficient. Access
-0.074786	Strings ...................................................................................................................... 96 9.9 Access
-0.074786	manual at www.agner.org/optimize/cppexamples.zip. 9.9 Access
-0.165184	the remote data locally. Access
-0.335304	local variables, and for saving
-1.294934	can be used for saving
-0.237137	have a strategy for saving
-0.165190	up a stack frame, saving
-0.344282	the market for many years
-0.312551	will often take several years
-0.224948	is using a six years
-0.200043	to five or ten years
-0.234709	// Example 14.16a double y,
-0.234709	// Example 14.16b double y,
-0.296173	Example 8.8a double x, y,
-0.258243	bool a; float x, y,
-0.294020	The high priority of structured
-0.237735	stress the importance of structured
-0.702083	not be compatible with structured
-0.331694	the code relies on structured
-0.141406	platforms. See the compiler documentation
-0.141406	obvious. See the compiler documentation
-0.224950	higher due to poor documentation
-0.212321	1997. Mostly obsolete. Microprocessor documentation
-0.758520	void test () { CChild1
-0.234976	that the declaration class CChild1
-0.234976	multiple // versions: class CChild1
-0.200049	CChild1 Object1; CChild2 Object2; CChild1
-0.312690	Codeplay Watcom Digital Mars PGI
-0.200043	and PathScale compilers. (The PGI
-0.165179	CPUs cannot be tolerated. PGI
-0.165179	C++ v. 3.1, 2007. PGI
-0.231900	cycles per element. 100 As
-0.165179	return y = pow(x,n) As
-0.165179	can be predicted perfectly. As
-0.165179	expression b * 5). As
-0.122801	use runtime type identification (RTTI)
-0.122801	No runtime type identification (RTTI)
-0.045299	7.21 Runtime type identification (RTTI)
-0.235575	are double precision by default,
-0.023344	and lazy binding by default,
-0.235575	Windows. Does not, by default,
-0.229544	return y; } double xpow10(double
-0.022879	power of 10 double xpow10(double
-0.229544	power, loop unrolled double xpow10(double
-0.017525	Example 14.17b double a1, a2,
-0.017525	Example 14.17a double a1, a2,
-0.008674	14.16a double y, a1, a2,
-0.008674	14.16b double y, a1, a2,
-0.234275	the program flow at inconvenient
-0.234275	time-consuming garbage collector at inconvenient
-0.234275	may come unpredictably at inconvenient
-0.341016	programmers' time, but also inconvenient
-0.536903	offset has to be expressed
-0.456424	the address can be expressed
-0.353250	the offset can be expressed
-0.353250	The formats can be expressed
-0.357991	reduce speed if the bottleneck
-0.356593	function calls. If the bottleneck
-0.457998	likely to be a bottleneck
-0.236123	to maintain. Any specific bottleneck
-0.564767	functions by using the directive
-0.851865	You cannot expect a directive
-0.233581	space where a #define directive
-0.165179	aligned or the __assume_aligned directive
-0.291409	a non-Intel CPU. If not,
-0.291409	the unroll factor. If not,
-0.329869	in fact it does not,
-0.361344	for 32-bit Windows. Does not,
-0.357675	because registers is a scarce
-0.117572	register. Registers are a scarce
-0.117572	reference. Registers are a scarce
-0.233818	and disk space were scarce
-0.380158	inside the loop. Example 12.4b
-0.235988	the chosen expression. Example 12.4b
-0.311100	the way of example 12.4b
-0.352365	AND-OR construction in example 12.4b
-0.526835	An implementation of the lrint
-0.864953	how to use the lrint
-0.142957	14.19 static inline int lrint
-0.142957	_mm_cvtss_si32(_mm_load_ss(&x));} static inline int lrint
-0.537446	functions that have multiple versions.
-0.335373	memory-hungry software in two versions.
-0.338520	VIA including the 64-bit versions.
-0.536071	both static and dynamic versions.
-0.613015	9 Optimizing memory access .............................................................................................
-0.635401	12.5 Using vector classes .............................................................................................
-0.379666	56 7.27 Overloaded operators .............................................................................................
-0.418098	135 14.4 Integer multiplication .............................................................................................
-0.667379	addresses divisible by 16. Alignment
-0.148753	of register variables. 9.5 Alignment
-0.148753	together ...................................... 88 9.5 Alignment
-0.165184	or 16 Table 7.2. Alignment
-0.325394	each processor model is going
-0.237923	a 50-50 chance of going
-0.382587	use. I am not going
-0.237617	a performance penalty when going
-0.279190	because the overflow and underflow
-0.279190	much about overflow and underflow
-0.467273	which will generate an underflow
-0.357557	that generate floating point underflow
-0.016334	even when their live ranges
-0.005377	b because their live ranges
-0.002680	register because their live ranges
-0.467594	unrolling the loop and splitting
-0.314268	only for classes. The splitting
-0.237517	such a formalism. The splitting
-0.233814	supported 256-bit instructions were splitting
-0.463176	total waste of the user's
-0.237781	current version satisfies the user's
-0.237781	that can steal the user's
-0.234541	the majority of end user's
-0.325154	n.a. _MSC_VER and not __INTEL_COMPILER
-0.235940	Intel compiler Windows Linux __INTEL_COMPILER
-0.148753	_MSC_VER and not __INTEL_COMPILER __INTEL_COMPILER
-0.148753	compiler Windows Linux __INTEL_COMPILER __INTEL_COMPILER
-0.575184	that need to be cleaned
-0.575184	may need to be cleaned
-0.331812	sure allocated resources are cleaned
-0.290542	are called and resources cleaned
-0.467642	if the table is cached.
-0.571809	sticks may not be cached.
-1.025414	if it is not cached.
-0.354417	and data are not cached.
-0.237902	functions for audio and video
-0.237809	produce streaming audio or video
-0.032688	access. 12.9 Aligning RGB video
-0.032688	120 12.9 Aligning RGB video
-0.012675	eight consecutive elements in aa:
-0.237196	have the line number information.
-0.236386	not on publicly available information.
-0.347392	possibly save exception handling information.
-0.233332	to receive new relevant information.
-0.418349	alloca. 9.7 Container classes Whenever
-0.379238	it is never used. Whenever
-0.230108	has one big problem. Whenever
-0.218584	of processors is better. Whenever
-0.358058	normally belongs to the area
-0.144289	use the same memory area
-0.314308	tables. The static data area
-0.356634	n here because the consequence
-0.493595	or function calls. The consequence
-0.237517	has been wasted. The consequence
-0.212321	occur has the unfortunate consequence
-0.234709	// Example 14.17b double a1,
-0.234709	// Example 14.17a double a1,
-0.068041	Example 14.16a double y, a1,
-0.068041	Example 14.16b double y, a1,
-0.538909	if the dividend is unsigned.
-0.339484	number when converted to unsigned.
-0.331878	can be signed or unsigned.
-0.237794	extending with zero-bits if unsigned.
-0.314571	do arithmetic operations with pointers.
-0.290379	object, except for char pointers.
-0.088608	integer overflow, and invalid pointers.
-0.088608	bounds violations and invalid pointers.
-0.355573	not cached. See page 26
-0.331663	static keyword, for floating 26
-0.165179	of different C++ constructs........................................................................ 26
-0.165179	kinds of variable storage............................................................................. 26
-0.331231	virtual functions if possible. Smaller
-0.283136	account in the software. Smaller
-0.212315	data to optimize caching. Smaller
-0.165179	design of small microcontrollers: Smaller
-0.653182	operating system. See page 29
-0.226779	64 -263 263-1 int64_t 29
-0.224948	we are swapping column 29
-0.165179	Integers variables and operators............................................................................... 29
-0.325403	do another addition to sum2
-0.237899	summation variables sum1 and sum2
-0.338469	list[size], sum1 = 0, sum2
-0.251360	{ sum1 += list[i]; sum2
-0.237817	int n; u.i = (n
-0.629352	!= 0) { if (n
-0.293786	(int n) { if (n
-0.236245	y = 1.0; while (n
-0.294182	makes intermediate object for (b
-0.568819	int b; a = (b
-0.214832	== 0) { if (b
-0.314462	!= 0) { if (b
-0.237046	floating point number by 2n
-0.293235	You can divide by 2n
-0.325202	by AND'ing it with 2n
-0.529054	value is less than 2n
-0.420562	have little or no idea
-0.265070	can be a good idea
-0.265070	is not a good idea
-0.265070	is therefore a good idea
-0.736919	9.5 Alignment of data ......................................................................................................
-0.381077	36 7.7 Function pointers ......................................................................................................
-0.380943	21 3.12 Network access ......................................................................................................
-0.375069	20 3.8 System database ......................................................................................................
-0.294216	should be written in C,
-0.022584	aliasing rule of standard C,
-0.229247	code. Compiled languages include C,
-0.237823	in Gnu compiler // Same
-0.165179	available: // Example 12.4c. Same
-0.165179	this: // Example 12.4e. Same
-0.165179	classes: // Example 12.4d. Same
-0.327821	more efficient. You can disable
-0.327821	the compiler. You can disable
-0.450578	application then you should disable
-0.200049	the code to test. disable
-0.761792	that rely on the assumption
-0.237866	the Gnu compiler, the assumption
-0.314502	doesn't make such an assumption
-0.293259	you cannot make any assumption
-0.294049	is that x is treated
-0.544704	if the object is treated
-0.352351	member functions is also treated
-0.291384	overloaded function are simply treated
-0.234370	not compatible across compilers. Fastcall
-0.165179	Optimize function #pragma optimize(...) Fastcall
-0.165179	Simple member pointers /vms Fastcall
-0.165179	parameters on CodeGear compiler). Fastcall
-0.531124	video or 3-dimensional vectors RGB
-0.074788	vector access. 12.9 Aligning RGB
-0.074788	memory................................................................. 120 12.9 Aligning RGB
-0.165184	approximate reciprocal square root, RGB
-0.237899	Java and C# and avoids
-0.781176	in a way that avoids
-0.350246	a time and it avoids
-0.237444	takes more time but avoids
-0.543525	Therefore the compiler is prevented
-0.294051	without returning. F1 is prevented
-0.355950	cache contentions can be prevented
-0.355950	wasteful behavior can be prevented
-0.434481	142 14.10 Mathematical functions .......................................................................................
-0.687603	Choice of hardware platform .......................................................................................
-0.616294	Choosing the optimal algorithm .......................................................................................
-0.513337	3.16 Execution unit throughput .......................................................................................
-0.685532	but this feature is seldom
-0.294172	for detecting errors that seldom
-0.237565	used functions separate from seldom
-0.311036	used functions, and put seldom
-0.212006	is a penalty for mixing
-0.212006	is no penalty for mixing
-0.594956	are certain restrictions on mixing
-0.293888	is a problem when mixing
-0.122672	a member function. 7.12 Branches
-0.122672	Type conversions.................................................................................................... 40 7.12 Branches
-0.200049	loop in example 15.1b. Branches
-0.165184	the branch misprediction penalty. Branches
-0.237950	to alias upon the double.
-0.355312	constant 2.5, which is double.
-0.324494	bit vector of two double.
-0.165179	data types: long long, double.
-0.357724	from www.agner.org/optimize/asmlib.zip. // Example 16.1
-0.335688	// or from example 16.1
-0.212315	16 Testing speed.............................................................................................................. 153 16.1
-0.251360	cycle counter (see below) 16.1
-0.023436	int r, c; for (r
-0.037045	c; double temp; for (r
-0.652664	is true, which is 50%
-0.237491	size grows by only 50%
-0.306982	that a is true 50%
-0.321147	branch will be mispredicted 50%
-0.350646	does so in a suboptimal
-0.064742	non-Intel CPUs in a suboptimal
-0.314756	automatic vectorization leads to suboptimal
-0.125218	0.18 0.18 0.11 memcpy 16kB
-0.125218	0.57 0.44 0.12 memcpy 16kB
-0.125218	libraries Test Processor memcpy 16kB
-0.125218	0.25 0.28 0.22 memcpy 16kB
-0.314263	different priorities to different tasks.
-0.323535	times, even for simple tasks.
-0.234808	for more complicated mathematical tasks.
-0.165179	doing multiple logically distinct tasks.
-0.358058	bit mode if the image
-0.237899	statistics, signal processing and image
-0.294155	parallel calculations. Examples are image
-0.218590	or 3-dimensional vectors RGB image
-0.237952	efficient to determine the worst-case
-0.287393	be relevant when testing worst-case
-0.203368	the response time under worst-case
-0.203368	also be tested under worst-case
-0.231404	a single result, true (1)
-0.251360	remedies against this problem: (1)
-0.165179	a public data object: (1)
-0.165179	through this address. Step (1)
-0.657920	when b is a float,
-0.325387	compilers). The representation of float,
-0.335297	precision conversion Conversions between float,
-0.222394	types such as int, float,
-0.340684	not suitable for example 9.5
-0.234857	If we modify example 9.5
-0.308535	use of register variables. 9.5
-0.218590	stored together ...................................... 88 9.5
-0.495962	2) { a[i] = Induction;
-0.309999	loop ; a[i] = Induction;
-0.102169	Induction; ; a[i+1] = Induction;
-0.102169	= Induction; a[i+1] = Induction;
-0.325403	we are writing to uncached
-0.325302	the program code are uncached
-0.314499	more expensive than an uncached
-0.236166	Uncached memory store An uncached
-0.352965	of N into the individual
-0.346461	makes the access to individual
-0.534912	system rather than by individual
-0.322748	in order to identify individual
-0.339463	This allows it to begin
-0.237869	functions have names that begin
-0.444222	then the microprocessor can begin
-0.236247	in the array must begin
-0.442054	goes to the user interface.
-0.313038	priority than the user interface.
-0.281189	of a graphical user interface.
-0.188175	has a graphical user interface.
-0.357724	with alloca: // Example 9.3
-0.236986	element. 100 As table 9.3
-0.224944	Cache organization ................................................................................................... 87 9.3
-0.466187	AMD and VIA CPUs". 9.3
-0.348424	an explanation of this option.
-0.043815	to turn on this option.
-0.226793	object without the -fpic option.
-0.357846	leftmost column to the diagonal.
-0.355343	reflecting it at the diagonal.
-0.212208	column 28 above the diagonal.
-0.212208	mirror position above the diagonal.
-0.294210	of user interfaces and interfaces
-0.326983	easy development of user interfaces
-0.332161	their own graphical user interfaces
-0.235498	direct access to hardware interfaces
-0.100845	exp function of 4 floats
-0.100845	// Structure of 4 floats
-0.329218	Define vectors of four floats
-0.307580	Make array of 100 floats
-0.345060	from one function to another.
-0.314523	from one object to another.
-0.407990	in one way or another.
-0.293986	to one thread than another.
-0.352577	<typename T, unsigned int N>
-0.236680	template <bool IsPowerOf2, int N>
-0.102884	of 2 template <int N>
-0.102884	of N template <int N>
-0.122675	first 128 bytes. 7.19 Class
-0.122675	(properties) ............................................................................ 51 7.19 Class
-0.122675	effect on performance. 7.18 Class
-0.122675	and classes............................................................................................ 51 7.18 Class
-0.446326	place in the program. Small
-0.301848	that run in parallel. Small
-0.330839	resources cannot be controlled. Small
-0.200043	that make vectorization favorable: Small
-0.237869	used the trick that N1
-0.292478	is 0. The constant N1
-0.233581	representation of N: #define N1
-0.165179	powN<(N1&(N1-1))==0,N1>::p(x) * powN<true,N-N1>::p(x); #undef N1
-0.820848	used. The advantages of alloca
-0.237617	the space explicitly when alloca
-0.985403	the function in which alloca
-0.976415	when the function returns. alloca
-0.294213	about pointer alignment and aliasing.
-0.222756	code has no pointer aliasing.
-0.222756	hint about no pointer aliasing.
-0.296710	78. Assume no pointer aliasing.
-0.459184	A branch can be eliminated
-0.355424	or reference can be eliminated
-0.561643	branch can also be eliminated
-0.476355	Divisions can sometimes be eliminated
-0.353831	planning stage that a detailed
-0.538801	the compiler documentation for detailed
-0.335907	Instruction sets A more detailed
-0.324119	of abstraction which makes detailed
-0.234805	float 256 double 256 F32vec4
-0.242395	F32vec4 xx4(x4); // x^4 F32vec4
-0.148757	x^1, x^2, x^3, x^4 F32vec4
-0.218590	vectors of four floats F32vec4
-0.237809	you can clear or mask
-0.323239	_mm_cmpgt_epi16(b, zero); // Use mask
-0.022320	generate a bit-mask: __m128i mask
-1.093001	makes sure that the original
-0.461203	is called when the original
-0.488840	it checks whether the original
-0.237882	advantages and disadvantages. The original
-0.314391	quite costly because all caches
-0.313333	More details about how caches
-0.234928	more heuristic guidelines. Most caches
-0.165179	will invalidate each other's caches
-0.336292	software products fail to recognize
-0.346361	this case it will recognize
-0.488808	++b; the compiler will recognize
-0.525687	memory. Most compilers will recognize
-0.148753	512 378.7 168.5 513 513
-0.148753	512 2048 230.7 513 513
-0.165184	512 512 378.7 168.5 513
-0.165184	512 512 2048 230.7 513
-0.272296	template method. 7.29 Threads Threads
-0.074786	complicated template method. 7.29 Threads
-0.074786	56 7.28 Templates...............................................................................................................57 7.29 Threads
-0.165184	is possible in Linux). Threads
-0.122675	using overloaded functions. 7.27 Overloaded
-0.122675	functions .............................................................................................. 56 7.27 Overloaded
-0.074788	3) <<6 ); 7.26 Overloaded
-0.074788	Bitfields ................................................................................................................... 56 7.26 Overloaded
-0.730592	high power of 2. Contentions
-0.200043	the branch target buffer. Contentions
-0.165179	branch target buffer (BTB). Contentions
-0.165179	matrix in my experiments. Contentions
-0.561653	called. This method is illustrated
-0.237761	tiling. This technique is illustrated
-0.358401	This effect can be illustrated
-0.237750	with bounds checking, as illustrated
-0.127045	register size. In other words,
-0.127045	template parameter. In other words,
-0.127045	intended for. In other words,
-0.127045	exception safe. In other words,
-0.237879	efficient, but risky. The returned
-0.522904	when objects can be returned
-0.355948	composite type can be returned
-0.349051	cases, composite objects are returned
-0.294184	a new one. The existing
-0.675279	sake of compatibility with existing
-0.346295	add functionality to an existing
-0.236686	the function modify an existing
-0.528477	modifications in the code. Let's
-0.231404	about loss of precision. Let's
-0.165179	bytes = 4 rows. Let's
-0.165179	number of possible inputs. Let's
-0.211953	and stored as it is,
-0.211953	be executed as it is,
-0.293403	more RAM than there is,
-0.344906	floating point. The reason is,
-0.372495	function. The following example illustrates
-0.372495	2. The following example illustrates
-0.372495	errors. The following example illustrates
-0.372495	compilation. The following example illustrates
-0.037147	16.2 The pitfalls of unit-testing
-0.314588	of measuring performance by unit-testing
-0.237736	in software development. This unit-testing
-0.028767	list[300]; int i; for(i=0; i<300;
-0.165190	int i, i_div_3; for(i=i_div_3=0; i<300;
-0.338778	int Sum2(S3 * p) {return
-0.313499	int Sum3(S3 & r) {return
-0.165179	at 403 int ReadB() {return
-0.165179	int b; int Sum1() {return
-0.235825	y2 = a2 / b2;
-0.095493	y, a1, a2, b1, b2;
-0.200049	B1 { public: B2 b2;
-0.228163	with real time applications. Remember
-0.222389	names and variable names. Remember
-0.272284	next model work better. Remember
-0.165179	goes up and down. Remember
-0.325334	in simple cases. The explicit
-0.323874	two and making an explicit
-0.236686	set specified. Insert an explicit
-1.323330	the programmer to make explicit
-0.355903	accessed row-wise, then the mirror
-0.429533	may be optimal to mirror
-0.354562	a time. You may mirror
-0.236312	element matrix[c][r] at its mirror
-0.357378	the combination of a dedicated
-0.505231	use rather than a dedicated
-0.426053	much slower than a dedicated
-0.647708	systems also have a dedicated
-0.329274	{ public: virtual void Disp()
-0.275158	CParent<CChild1> { public: void Disp()
-0.275158	CParent<CChild2> { public: void Disp()
-0.059563	double b[SIZE][SIZE]) { int r,
-0.234622	{temp=x; x=y; y=temp;} int r,
-0.234622	example 9.5a: 98 int r,
-0.355657	the code. // Example 8.26a
-0.292032	(32-bit mode): ; Example 8.26a
-0.668758	assembly code from example 8.26a
-0.234861	the compiler optimize example 8.26a
-0.314778	debugger cannot set a breakpoint
-0.042049	at the interrupt 3 breakpoint
-0.042049	remove the interrupt 3 breakpoint
-0.303138	to insert a fixed breakpoint
-0.004315	14.17b double a1, a2, b1,
-0.004315	14.17a double a1, a2, b1,
-0.002152	double y, a1, a2, b1,
-0.237869	the logical register that appears
-0.237809	11.1b automatically, although it appears
-0.331636	loop automatically if this appears
-0.236166	new and better processor appears
-0.237950	necessary for verifying the functionality
-0.235489	making plug-ins that add functionality
-0.626437	to obtain the desired functionality
-0.389105	library with a well-defined functionality
-0.338897	rarely found in other languages.
-0.318887	developers choose other programming languages.
-0.230824	platforms and various programming languages.
-0.165184	several other less well-known languages.
-0.237923	defines an algorithm of sequential
-0.546505	addresses are accessed in sequential
-0.382642	a switch statement with sequential
-0.200043	are accessed in non- sequential
-0.136300	to this manual at www.agner.org/optimize/cppexamples.zip
-0.234275	in the appendix at www.agner.org/optimize/cppexamples.zip
-0.236861	of the alignment. See www.agner.org/optimize/cppexamples.zip
-0.010931	a matter of programming style.
-0.023493	in example 9.6b. The MOVNTQ
-0.237825	{ _mm_stream_pi((__m64*)dest, *(__m64*)&source); // MOVNTQ
-0.499777	8 bytes without cache MOVNTQ
-0.237929	which we assume is optimized.
-0.525322	the code is not optimized.
-0.236836	similar functions, but less optimized.
-0.229240	are not always fully optimized.
-0.983327	of code and data .........................................................................................
-0.402664	65 7.32 Preprocessing directives .........................................................................................
-0.421430	107 12.3 Automatic vectorization .........................................................................................
-0.466187	14 Specific optimization topics .........................................................................................
-0.237367	with virtual functions class CHello
-0.499734	class C1 : public CHello
-0.271507	class C2 : public CHello
-0.200049	C1 Object1; C2 Object2; CHello
-0.355686	caches work can be found
-0.355686	time. (Examples can be found
-0.351203	for correctness must be found
-0.230820	of advanced features rarely found
-0.343590	15.1c was done by me
-0.074786	compiler-generated assembly code. Let me
-0.074786	lines and sets. Let me
-0.165184	people who have sent me
-0.459193	this value from the counts.
-0.313786	between the two clock counts.
-0.416093	higher than the subsequent counts.
-0.356088	at the "worst case" counts.
-0.330736	branch mispredictions. The performance measurement
-0.342844	bottlenecks is to put measurement
-0.341889	may put the desired measurement
-0.165179	of this method. Your measurement
-0.293407	may go through multiple layers
-0.236971	and other extra software layers
-0.236078	formalism that requires several layers
-0.234801	excessive number of separate layers
-0.347816	up. If an error handler
-0.180254	responsibility of the exception handler
-0.180254	it to the exception handler
-0.180254	function, then the exception handler
-0.356289	an offset that is coded
-0.357373	point instructions. This is coded
-0.581258	instructions that can be coded
-0.356247	specific instructions that are coded
-0.575565	count is small and changing
-0.324314	if possible and by changing
-0.324314	avoid the multiplication by changing
-0.376053	with the last index changing
-0.358487	longer time in the unit-test
-0.237866	function, but unfortunately the unit-test
-0.354629	cannot rely on a unit-test
-0.237651	performs best under this unit-test
-0.336325	are accessed through the implicit
-0.237879	affected by __fastcall. The implicit
-0.348699	is transferred as an implicit
-0.335277	functions. Sum1 has an implicit
-0.325357	software packages faster and smaller.
-0.237854	task or thread are smaller.
-0.236166	the array 800 bytes smaller.
-0.235330	space or make files smaller.
-0.717180	to be in the interval
-0.458374	an integer in the interval
-0.354787	point number in the interval
-0.341901	length of the desired interval
-0.356632	at all because the 33
-0.284365	32 16.4 65 65 33
-0.200043	32 7.4 Enums ...................................................................................................................... 33
-0.165179	...................................................................................................................... 33 7.5 Booleans................................................................................................................... 33
-0.355573	is incremented. See page 31
-0.232702	operators on integer variables. 31
-0.367391	r ebx, eax ebx, 31
-0.218584	per element 63 63 31
-0.007022	10; a = (unsigned int)b
-0.007022	16; a = (unsigned int)b
-0.319411	guide for x86 platforms. 3.
-0.228166	assembly instruction for interrupt 3.
-0.226790	and better at vectorization. 3.
-0.165179	into an anonymous namespace. 3.
-0.530715	that something takes 10 μs
-0.233327	may take only 5 μs
-0.165191	0.5 ns = 250 μs
-0.165191	loop? Certainly not! 250 μs
-0.452036	called from any other module.
-0.291786	is system-independent, in another module.
-0.045214	be called from another module.
-0.045214	also called from another module.
-0.288317	any extra code. Dynamic cast
-0.222399	functions can not. Static cast
-0.165179	float to int. Reinterpret cast
-0.165179	and VIA CPUs"). Const cast
-0.336167	variables are stored as 8-bit
-0.313790	be expressed as an 8-bit
-0.313790	is coded as an 8-bit
-0.448326	example, we are using 8-bit
-0.288674	without specifying the size. Integers
-0.230818	and operators Integer sizes Integers
-0.122672	of using classes. 7.2 Integers
-0.122672	variable storage............................................................................. 26 7.2 Integers
-0.381077	operators. 7.7 Function pointers Calling
-0.449512	instances of the class. Calling
-0.222389	and VIA CPUs. 5. Calling
-0.165179	and then calls exit. Calling
-0.016037	static inline int lrint (double
-0.200049	using loop double ipow (double
-0.165184	static inline double IntegerPower (double
-0.227105	Thursday, Friday, Saturday }; Weekdays
-0.227105	Saturday = 0x40 }; Weekdays
-0.148757	conditions using & enum Weekdays
-0.148757	Testing multiple conditions enum Weekdays
-0.237875	but not always for application-specific
-0.638737	more efficient to store application-specific
-0.291380	functions than in optimizing application-specific
-0.315845	be able to define application-specific
-0.506383	add b and c first.
-0.231916	the most predictable operand first.
-0.230111	used data members come first.
-0.316442	is calculated the fastest first.
-0.557385	described some of the considerations
-0.331818	is often determined by considerations
-0.353319	not satisfactory. The following considerations
-0.200043	that reflects the conflicting considerations
-0.313644	16383 one fraction 2 63
-0.472146	kilobytes Time per element 63
-0.276592	it is a valid 63
-0.218584	Time per element 63 63
-0.314761	complications. A double is represented
-1.103593	that it can be represented
-0.355948	zero. Zero can be represented
-0.732414	they are in fact represented
-0.592388	is, in order to force
-0.352596	member functions. You can force
-0.237282	shall automatically come into force
-0.234524	from disk. Memory-hungry applications force
-0.473268	necessary to do this manually.
-0.909353	you have to do manually.
-0.289000	to do the reductions manually.
-0.200043	to set the parentheses manually.
-0.356766	inside containers should be identified
-0.237404	specific order but are identified
-0.495447	consecutively? If objects are identified
-0.324649	top-of-stack index. Are objects identified
-0.429075	classes are given in www.agner.org/optimize/cppexamples.zip.
-0.237639	containers class templates in www.agner.org/optimize/cppexamples.zip.
-0.835650	to this manual at www.agner.org/optimize/cppexamples.zip.
-0.381374	same memory pool. See www.agner.org/optimize/cppexamples.zip.
-0.352666	the user has a virus
-0.346461	with network access to virus
-0.538801	is not uncommon for virus
-0.165179	to many users. Firewalls, virus
-0.336262	alignment of arrays and structures.
-0.294104	data into classes or structures.
-0.236577	you have big data structures.
-0.171506	to very big data structures.
-0.916416	Agner's vector class library exp
-0.230808	library functions directly: Library exp
-0.356088	function of 4 floats exp
-0.218590	vector class library exp exp
-0.717658	multiplied by the size (in
-0.343604	class objects. The size (in
-0.291973	of a matrix line (in
-0.626462	the CPU clock frequency (in
-0.462221	that p is a pointer,
-0.237749	a simple type, a pointer,
-0.230117	parameters, pointers, references, 'this' pointer,
-0.165184	main through an imported pointer,
-0.237930	the carry bit is kept
-0.467441	loop should preferably be kept
-0.467441	statements should preferably be kept
-0.352571	function. Sometimes, functions are kept
-0.237309	A + A; double Y
-0.630725	// Update induction variable Y
-0.342824	the two induction variables Y
-0.165179	{ Table[x] = Y; Y
-0.237619	do cross-module optimizations when interprocedural
-0.987292	the compiler to do interprocedural
-0.148757	more efficient and enables interprocedural
-0.242395	other modules. This enables interprocedural
-0.293643	size, because these are incompatible
-0.293643	Many optimization options are incompatible
-0.555900	this makes the code incompatible
-0.200049	slow, difficult to use, incompatible
-0.234810	more (128 or 256 bytes)
-0.042048	by the size (in bytes)
-0.042048	objects. The size (in bytes)
-0.088608	a matrix line (in bytes)
-0.503701	it points to the selected
-0.446163	do not have the selected
-1.125658	of the code is selected
-0.357406	the program may be selected
-0.339472	then its value is multiplied
-0.356343	clock counts should be multiplied
-0.351740	This index must be multiplied
-0.288678	constant plus an index multiplied
-0.237638	gives more reliable and reproducible
-0.237638	measurements as accurate and reproducible
-0.237621	ways to get more reproducible
-0.344938	be difficult to get reproducible
-0.349049	Linux Shared objects are normally
-0.354033	overlap. Compilers do not normally
-0.237734	of everything else. This normally
-0.292752	BSD and Mac systems normally
-0.355511	cache space used for constants.
-0.549746	between two or more constants.
-0.324741	caching problems for integer constants.
-0.218584	when used for defining constants.
-0.237756	of data cache, code cache,
-0.345709	the contents of data cache,
-0.472536	in the level-1 data cache,
-0.565493	usually share the same cache,
-0.237750	Function pointer serves as entry
-0.380286	Prototype for the common entry
-0.241279	and replaces the PLT entry
-0.191074	desired function. The PLT entry
-0.341867	processors. The performance is inferior
-0.348328	Some 64-bit compilers are inferior
-0.237711	it will run an inferior
-0.236166	set it supports. An inferior
-1.320939	is likely to be obsolete.
-0.237428	manual will soon be obsolete.
-0.074788	group books 1994. Mostly obsolete.
-0.074788	Addison- Wesley 1997. Mostly obsolete.
-0.535540	can do calculations while simultaneously
-0.462430	from doing multiple calculations simultaneously
-0.234675	146 Multiple applications running simultaneously
-0.222389	two or more jobs simultaneously
-0.265210	code. An interrupt service routine
-0.048401	critical function. The initialization routine
-0.011604	program has an initialization routine
-0.011604	library has an initialization routine
-0.325302	of smart pointers are auto_ptr
-0.428745	is transferred from one auto_ptr
-0.218584	one, and only one, auto_ptr
-0.165179	are auto_ptr and shared_ptr. auto_ptr
-0.341265	every call. A branch tree
-0.281805	may be a binary tree
-0.089349	this case. A binary tree
-0.089349	be moved. A binary tree
-0.145748	where the compiler is unable
-0.482190	the program will be unable
-0.342499	the user will be unable
-0.325429	the code is executed. Optimizes
-0.392990	OpenMP and automatic vectorization. Optimizes
-0.205278	Can do automatic vectorization. Optimizes
-0.284365	the standard calling conventions. Optimizes
-0.356223	static variables, floating point constants,
-0.235518	for floating 26 point constants,
-0.045629	floating point constants, string constants,
-0.045629	26 point constants, string constants,
-0.726061	further discussion of the techniques
-0.237866	experience before trying the techniques
-0.353321	at runtime). The following techniques
-0.234925	less expensive. Using complicated techniques
-0.491189	inline the function or otherwise
-0.237456	the | operator which otherwise
-0.445472	mispredicted even if they otherwise
-0.235570	programming errors that would otherwise
-0.074786	same member pointer. 7.9 Smart
-0.074786	7.8 Member pointers.......................................................................................................37 7.9 Smart
-0.200049	the pointer is deleted. Smart
-0.165184	shared_ptr than for auto_ptr. Smart
-0.935901	small or if it opens
-0.503136	Assume that a function opens
-0.237456	by calling WritePrivateProfileString, which opens
-0.357004	a new instruction set opens
-0.127708	variables that may be modified
-0.460229	while data that are modified
-0.417839	constants that are never modified
-0.311100	tested can convert example 15.1a
-0.234861	that automatically reduces example 15.1a
-0.260910	the Intel compiler reduced 15.1a
-0.208518	of the compilers reduced 15.1a
-0.529180	x86 and x86-64 platforms. Comparison
-0.200049	n.a. - Table 8.1. Comparison
-0.074786	do this optimization. 8.2 Comparison
-0.074786	optimize ............................................................................................ 66 8.2 Comparison
-0.526221	temp before it is finished
-0.314435	when all threads have finished
-0.131165	(c+d) before it has finished
-0.131165	sub-vector before it has finished
-0.549840	line when it is run.
-0.496456	time the program is run.
-1.023642	when the program is run.
-0.354833	compilation before it can run.
-0.048065	96 9.9 Access data sequentially
-0.048065	www.agner.org/optimize/cppexamples.zip. 9.9 Access data sequentially
-0.236875	are not necessarily stored sequentially
-0.512221	example can be accessed sequentially
-0.237565	the programming manuals from Intel:
-0.236726	Literature on code optimization Intel:
-0.218590	Mostly obsolete. Microprocessor documentation Intel:
-0.466187	versions are produced regularly. Intel:
-0.354012	function libraries in this format.
-0.338800	in the long double format.
-0.330156	the usual object file format.
-0.165179	be converted to OMF format.
-0.619928	of errors in C++ programs.
-0.235881	are obscured in optimized programs.
-0.231408	backwards compatible with 16-bit programs.
-0.229254	efficient than non-object oriented programs.
-1.051230	accessed in a non-sequential manner
-0.251360	handled in a systematic manner
-0.165179	in a rather unconventional manner
-0.165179	a in a column-wise manner
-0.382051	understanding of how compilers work.
-0.448903	directives do not always work.
-0.236319	busy concentrating on important work.
-0.234207	how compilers and microprocessors work.
-0.382693	unsigned long long or uint64_t
-0.330987	I64vec2 Vec2q 64 2 uint64_t
-0.234800	256 int int64_t 256 uint64_t
-0.165179	int 64 0 264-1 uint64_t
-0.312056	return &CriticalFunction_AVX; } if (level
-0.312975	&CriticalFunction_AVX; } else if (level
-0.023351	are many branches): if (level
-0.237906	have worked well in tests
-0.382792	different C++ compilers The tests
-0.237042	you may make some tests
-0.236989	Worst-case testing Most performance tests
-0.224948	other way three times. Then
-0.218584	program with profiling support. Then
-0.218584	without the -fpic option. Then
-0.165179	occurs somewhere in F1? Then
-0.346246	a solution where a soft
-0.331679	soft processor. Such a soft
-0.231906	FPGA as a so-called soft
-0.212321	emulated processors and FPGA soft
-0.348478	= -100, b = 100,
-0.348795	= 100, c = 100,
-0.236017	const int min = 100,
-0.236017	const int NUMROWS = 100,
-0.391774	often gives more reliable results.
-0.194045	more reliable and reproducible results.
-0.148753	difficult to get reproducible results.
-0.200049	and may produce undesired results.
-0.237926	way to tell a hyperthreading
-1.091105	is advantageous to use hyperthreading
-0.237464	a particular application. If hyperthreading
-0.500411	then you can avoid hyperthreading
-0.294210	the simplest expressions and operators.
-0.090956	vector classes and overloaded operators.
-0.090956	copy constructors and overloaded operators.
-0.265211	the increment and decrement operators.
-0.655463	A leaf function is simpler
-0.325189	are: The syntax is simpler
-0.337816	self-relative addresses is much simpler
-0.496392	contrary, the code becomes simpler
-0.575465	that the floating point format
-0.100253	step. The intermediate file format
-0.100253	to an intermediate file format
-0.506604	data to the right format
-0.645814	compile time or a reasonable
-0.352506	stored (or if a reasonable
-0.324861	copy that the only reasonable
-0.293617	requirement. Useful when no reasonable
-0.722174	be done with the resolution
-0.234671	require a very high resolution
-0.289790	resolution. A much higher resolution
-0.200043	is measured with millisecond resolution
-0.314122	two or more integer units,
-0.236208	size often have execution units,
-0.525206	two floating point addition units,
-0.228166	gates, flip-flops, multiplexers, arithmetic units,
-0.593093	variables. Example: // Example 12.2
-0.236280	calling the library function. 12.2
-0.222389	YMM registers ................................................................. 107 12.2
-0.218584	11.8 127 127 126 12.2
-0.009109	from bb into vector b:
-0.237548	finished the time-consuming data processing.
-0.230811	OpenMP directives for parallel processing.
-0.218584	signal processing and image processing.
-0.212323	OpenMP directives for multi-core processing.
-0.351923	well-defined functionality and a well-defined
-0.634946	function library with a well-defined
-0.447614	or modules with a well-defined
-0.224963	make the overflow behavior well-defined
-0.036973	power of 2 // Still
-0.023383	constant is faster // Still
-0.236761	Division takes 14 - 45
-0.236761	and multiplication (20 - 45
-0.354395	Example 7.30b int i; 45
-0.165184	statements............................................................................. 43 7.13 Loops...................................................................................................................... 45
-0.310539	eight consecutive elements from bb
-0.232777	2.0f; } 115 from bb
-0.444150	factors are explained in detail
-0.343542	algorithms are described in detail
-0.314395	is described in more detail
-0.358675	microprocessors. Many of the advices
-0.234204	Reference Manual". developer.intel.com. Many advices
-0.224950	from the above security advices
-0.237161	(See page 71). The conclusion
-0.237161	a make utility. The conclusion
-0.237161	rather than 1.23456. The conclusion
-0.325191	it points to is deleted
-0.544707	sure the object is deleted
-0.335956	one function and later deleted
-0.357514	integer parameters and the 49
-0.352816	than functions. See page 49
-0.332595	64-bit Windows (See page 49
-0.237098	intermediates, loop counters, function parameters,
-0.237098	was called from), function parameters,
-0.313311	and size as template parameters,
-0.236730	a faster vectorized code. Storing
-0.449518	instance of the class. Storing
-0.512433	stack in 32-bit mode. Storing
-0.294210	a = 0x2710 and (set)
-0.429103	10000, then we have (set)
-0.165184	address by the formula: (set)
-0.237902	between coarse-grained parallelism and fine-grained
-0.294061	coarse-grained parallelism than with fine-grained
-0.165184	useful methods for exploiting fine-grained
-0.228172	imprecise or simply zero. Execution
-0.074788	a dependency chain. 3.16 Execution
-0.074788	chains ................................................................................................ 22 3.16 Execution
-0.055797	c: __m128i c = LoadVector(cc
-0.285892	c: Is16vec8 c = LoadVector(cc
-0.614007	be loaded at an arbitrary
-0.329095	be loaded into an arbitrary
-0.235666	This is just an arbitrary
-0.318571	variable inside the class. Which
-0.200049	and "best case" values. Which
-0.200049	exactly the same effect. Which
-0.314478	that the compilers may behave
-0.237348	a number). Different compilers behave
-0.313110	a compiler to always behave
-0.236788	is represented with 64 bits,
-0.292649	an int is 32 bits,
-0.235765	of i to four bits,
-0.237902	to be platform-independent and compact.
-0.293890	for making data more compact.
-0.313459	bytes is slightly less compact.
-0.594157	a container class that behaves
-0.314205	is an object that behaves
-0.200054	page 142). 30 Overflow behaves
-0.336266	see emulated processors and FPGA
-0.313279	microprocessor core and an FPGA
-0.348257	a microprocessor in an FPGA
-0.336262	processors. AMD processors and earlier
-0.294061	but possibly not with earlier
-0.236413	Intel SVML v.10.2 & earlier
-0.232922	0; while (seconds < 5)
-0.232922	be while (0 < 5)
-0.308544	= &SelectAddMul_AVX2; (iset >= 5)
-0.271047	clock cycle. The operators &,
-0.395384	once The bitwise operators &,
-0.277892	the corresponding bitwise operators &,
-0.237362	See chapter 10 page 101
-0.222395	other subtasks is necessary. 101
-0.165184	.............................................................................................. 99 10 Multithreading.............................................................................................................. 101
-0.079506	program for the following reasons:
-0.079506	array for the following reasons:
-0.079506	caching for the following reasons:
-0.342865	by storing the elements consecutively
-0.350778	or structure are stored consecutively
-0.645570	the rows are accessed consecutively
-1.014098	data caching less efficient. Extra
-0.234538	long, double. Misaligned data. Extra
-0.226787	the Pentium 4 processor. Extra
-0.567387	way in case of error.
-0.237713	next line provokes an error.
-0.406098	also a common programming error.
-0.462968	of operations can be carried
-0.233814	compilers The tests were carried
-0.165184	method. A longer loop- carried
-0.294077	be made smaller by reordering
-0.237736	bytes S1 ArrayOfStructures[100]; This reordering
-0.325055	compiler can make this reordering
-0.312182	later ported to another platform.
-0.235041	only run on Mac platform.
-0.369301	is on a PC platform.
-0.562086	section if you are satisfied
-0.237406	below. Those who are satisfied
-0.522167	If you are not satisfied
-0.351171	also safer. It may catch
-0.445849	The above code will catch
-0.237590	try { F1(); } catch
-0.200049	/arch:SSSE2 -msse4.1 /arch:SSE4.1 -mAVX /arch:AVX
-0.165184	code Static linking (multithreaded) /arch:AVX
-0.165184	desired instruction set (/arch:SSE2, /arch:AVX
-0.355579	memory allocation. See page 93
-0.231412	objects stored are containers 93
-0.212321	9.7 Container classes ..................................................................................................... 93
-0.824844	version of the virtual 53
-0.251367	Virtual member functions ........................................................................................ 53
-0.165184	Class member functions (methods)......................................................................... 53
-0.228168	#define Alignd(X) __declspec(align(16)) X #else
-0.222399	"m"(x) : "memory" ); #else
-0.200049	__GNUC__ #define pure_function __attribute__((const)) #else
-0.313279	a multiplication and an addition.
-0.313279	multiplication but only an addition.
-1.028935	of a floating point addition.
-1.254227	known at compile time. Text
-0.411484	various efficient container classes. Text
-0.265203	specific needs. 9.8 Strings Text
-0.092839	portable to systems with big-endian
-0.092839	ported to systems with big-endian
-0.292263	on other platforms with big-endian
-0.466197	class B1; class B2; 54
-0.165184	type identification (RTTI) ........................................................................... 54
-0.165184	54 7.22 Inheritance .............................................................................................................. 54
-0.237902	See page 145 and 119
-0.231909	than any non-vector library. 119
-0.165184	Mathematical functions for vectors........................................................................ 119
-0.335789	|| b) a && !a
-0.599131	= false, a || !a
-0.251367	|| (a&&c) = a&&(b||c) !a
-0.314477	an extra level of abstraction
-0.102731	requires several layers of abstraction
-0.102731	of separate layers of abstraction
-0.224965	integers of 8 bits each,
-0.313502	integers of 16 bits each,
-0.335251	integers of 32 bits each,
-0.102082	branches): if (level >= 11)
-0.200054	threads. Out-of-order execution (chapter 11)
-0.314754	than 65 bytes of code).
-0.233812	CLR, to produce binary code).
-0.165184	an intermediate code (byte code).
-0.812181	9.6 Dynamic memory allocation ......................................................................................
-0.501069	The pitfalls of unit-testing ......................................................................................
-0.466197	is a clock cycle? ......................................................................................
-0.356595	the pipeline. If the wrong
-0.237867	it has chosen the wrong
-0.461132	are type-casted to a wrong
-0.055769	b: __m128i b = LoadVector(bb
-0.285702	b: Is16vec8 b = LoadVector(bb
-0.325406	two 32-bit integers to alias
-0.236551	the pointer does not alias
-0.236551	specific pointer does not alias
-0.347253	wasteful copying of memory blocks,
-0.290149	to keep multiple memory blocks,
-0.311915	allocations of large memory blocks,
-0.236628	new features. Take user feedback
-0.338217	process where the main feedback
-0.212321	desired new features. User feedback
-0.216500	pure_function __attribute__((const)) #else #define pure_function
-0.216500	8.22 #ifdef __GNUC__ #define pure_function
-0.165190	pure_function #endif double Func1(double) pure_function
-0.824803	-(-a) = a - a-a
-1.158649	= a - n.a. a-a
-0.200049	xxxxxxx-x xxxxxxxxx x-xxx---- a-(-b)=a+b a-a
-0.208644	are no long dependency chains.
-0.043529	to avoid long dependency chains.
-0.569968	thread is used for prefetching
-0.378335	can rely on automatic prefetching
-0.218594	do calculations while simultaneously prefetching
-0.458806	possible to see the compiler-generated
-0.429484	Intel function libraries and compiler-generated
-0.222395	to read and understand compiler-generated
-0.237603	a good investment. A redesign
-0.122675	lead to a complete redesign
-0.122675	the data. A complete redesign
-0.122675	the compilers may behave differently
-0.122675	number). Different compilers behave differently
-0.212331	142). 30 Overflow behaves differently
-0.346061	of A and then B,
-0.013569	Bitfield x; int A, B,
-0.262877	and automatic vectorization. Optimizes reasonably
-0.148757	standard calling conventions. Optimizes reasonably
-0.212327	processing. Visual Studio optimizes reasonably
-0.682357	the same processor core. Two
-0.347516	speed to using templates. Two
-0.165184	than to write _mm_add_epi16(a,b). Two
-0.200049	Constructors and destructors .................................................................................. 55
-0.165184	// will give -2.0 55
-0.165184	55 7.24 Unions .................................................................................................................... 55
-0.165998	kinds of vector math libraries:
-0.165998	some long vector math libraries:
-0.230186	of short vector math libraries:
-0.381965	cases be linked into projects
-0.341142	and maintainability of C++ projects
-0.313622	style are that software projects
-0.146465	52 , longdoublevalue ( 1)sign
-0.146465	23 , doublevalue ( 1)sign
-0.146465	as follows: floatvalue ( 1)sign
-0.885363	2. Example: // Example 14.6
-0.356473	} list[300] = 0; 14.6
-0.212321	14.5 Integer division...................................................................................................... 137 14.6
-0.356790	powerful solution is the combination
-0.460523	a constant with a combination
-0.222399	gives zero. An OR combination
-0.325290	well optimized Intel function libraries,
-0.049566	*.a) or dynamic link libraries,
-0.236170	Note that volatile doesn't mean
-0.234021	of data (low numbers mean
-0.265203	ebx. The square brackets mean
-0.355582	methods: Instrumentation: The compiler inserts
-0.236765	The Gnu compiler often inserts
-0.331197	takes. Debugging. The profiler inserts
-0.107208	p) { return _mm_loadu_si128((__m128i const*)p);
-0.165190	p) { return _mm_load_si128((__m128i const*)p);
-0.352061	the caller through a hidden
-0.356766	any function) should be hidden
-0.327280	= 16 is actually hidden
-0.356613	= a*(b+c) - n.a. x*x*x*x*x*x*x*x
-0.074788	+ d = ((a*x+b)*x+c)*x+d x*x*x*x*x*x*x*x
-0.074788	x x x ((a*x+b)*x+c)*x+d x*x*x*x*x*x*x*x
-0.307399	able to recover from errors.
-0.209207	made to recover from errors.
-0.463144	overhead to prevent such errors.
-0.311764	consecutive bytes of memory. One
-0.296282	of powerful development tools. One
-0.265203	is done only once. One
-0.206150	This is called square blocking
-0.206150	complicated techniques like square blocking
-0.165190	worth the effort. Square blocking
-0.023472	faster if unsigned // Faster
-0.212327	difficult to find elsewhere. Faster
-0.229249	points out some typical sources
-0.085112	Such schemes are frequent sources
-0.085112	Such frameworks are frequent sources
-0.237924	code is limited to well-tested
-0.294077	to replace arrays by well-tested
-0.235642	the Boost collection contains well-tested
-0.228176	also relevant to small devices,
-0.369308	even on such small devices,
-0.329419	resources. On the smallest devices,
-0.503100	addition with floating point multiplication,
-0.251367	speed of addition, subtraction, multiplication,
-0.165184	floating point operations (addition, multiplication,
-0.058261	before leaving the AVX part.
-0.235146	effort on that particular part.
-0.294104	a graphics library or API
-0.573299	between the operating system API
-0.291471	Software should use standard API
-0.561749	program when the program starts
-0.490867	called before the program starts
-0.337486	every time the computer starts
-1.512297	part of the code only.
-0.234390	the time consuming parts only.
-0.200049	be calculated using multiplications only.
-0.235504	as simple variables, loop counters,
-0.235504	are temporary intermediates, loop counters,
-0.338544	a loop with multiple counters,
-0.441579	throughout the whole program execution,
-0.345115	take advantage of out-of-order execution,
-0.210561	CPU from doing out-of-order execution,
-0.356473	i+=3){ list[i] = 0; list[i+1]
-0.291775	i+=3,i_div_3++){ list[i] += i_div_3; list[i+1]
-0.165184	i<300; i+=3){ list[i] =0; list[i+1]
-0.358059	severe delays if the distance
-0.237651	I will call this distance
-0.200049	critical stride. Variables whose distance
-0.357739	absolute values: // Example 14.28
-0.439390	by 2 in example 14.28
-0.339762	The method in example 14.28
-0.346463	by initializing pointers to zero,
-0.165184	integers use truncation towards zero,
-0.165184	Is16vec8 a = select_gt(b, zero,
-0.356473	for (r1 = 0; r1
-0.334183	by TILESIZE // Loop r1
-0.333434	0; r1 < SIZE; r1
-0.470761	TILESIZE) { // Loop r2
-0.265203	for (r2 = r1; r2
-0.165184	for (r2 = r1+1; r2
-0.200049	ammintrin.h AMD XOP ammintrin.h (MS)
-0.165184	fma4intrin.h (Gnu) all intrin.h (MS)
-0.165184	SSE4.1 smmintrin.h SSE4.2 nmmintrin.h (MS)
-0.447836	120 for discussion of aligning
-0.237877	// Define macro for aligning
-1.158528	prevents the compiler from aligning
-0.869569	have an option for assuming
-0.349017	problem since we are assuming
-0.382360	compiler is prevented from assuming
-0.237820	i; int Induction = r;
-0.331745	= 0; c < r;
-0.094898	variables, float Live range analysis
-0.094898	register storage. Live range analysis
-0.165190	to do a thorough analysis
-0.350528	the branch. It may seem
-0.236581	x The syntax may seem
-0.890689	compilers I have tested seem
-0.348114	versions of Linux and perhaps
-0.237641	by fetching, decoding and perhaps
-0.230828	is much faster, except perhaps
-0.191075	application code. An interrupt service
-0.191075	programming Device drivers, interrupt service
-0.165190	each integer type. Interrupt service
-0.236322	only 32-bit Windows. Gnu Comes
-0.234678	are also available. Microsoft Comes
-0.165184	/ CodeGear / Embarcadero Comes
-0.231708	parameter 1: 4 + esp
-0.231708	parameter 1: 8 + esp
-0.231708	ALIGN ?Func@@YAXQAHAAH@Z ENDP + esp
-0.334504	never uses the new features.
-0.231307	problems and desired new features.
-0.236173	models rather than processor features.
-0.237285	= a ^ b ---xx----
-0.074788	xxxxxxxxx 0/a=0 ---x---xx (-a==-b)=(a==b) ---xx----
-0.074788	x-xxx-x-- 0/a=0 ---xx--xx (-a==-b)=(a==b) ---xx----
-0.437088	48 7.15 Function parameters ...............................................................................................
-0.570179	The costs of optimizing ...............................................................................................
-0.521718	4 Performance and usability ...............................................................................................
-0.875736	reason is that the C/C++
-0.237879	double is bad The C/C++
-0.272290	8.42n, 2004. Open Watcom C/C++
-0.237820	the value -100+100+100 = 100.
-0.340982	label if i < 100.
-0.340982	loop condition i < 100.
-0.434973	Sab {int a; int b;};
-0.043896	S1 {double a; double b;};
-0.294174	a single task that consumes
-0.237458	9 extra overhead which consumes
-0.232340	high level framework still consumes
-0.291778	can hold e.g. four numbers,
-0.329598	brand names and model numbers,
-0.200049	of 2. Using hexadecimal numbers,
-0.284370	that are particularly critical. 129
-0.212321	128 128 17.4 129 129
-0.165184	128 128 128 17.4 129
-1.033025	time it takes to reload
-1.245126	it is necessary to reload
-0.339496	instruction doesn't give the 124
-0.165184	124 13.3 Difficult cases........................................................................................................ 124
-0.165184	13.2 Model-specific dispatching .................................................................................... 124
-0.418518	propagation, and loop-invariant code motion
-0.048091	x Loop invariant code motion
-0.048091	compiler. Loop invariant code motion
-0.382866	prefer to run a speed-critical
-0.344681	the dispatching only for speed-critical
-0.237508	linking is preferable for speed-critical
-0.346463	a long list of numbers:
-0.503100	example with floating point numbers:
-0.307580	the sum of 100 numbers:
-0.233324	VIA. The next section (page
-0.230108	in the previous chapter (page
-0.218594	of overflow. Table 8.1 (page
-0.382861	integers from 0 to 12.
-0.362415	as described in chapter 12.
-0.185190	operations mentioned in chapter 12.
-0.456878	the loop will take 1000
-0.148757	in a program repeats 1000
-0.148757	loop that also repeats 1000
-0.521045	stronger when they are long.
-0.234530	too small or too long.
-0.222407	inconsistent and sometimes unacceptably long.
-0.345979	The details of cache organization
-0.082633	more important. 9.2 Cache organization
-0.082633	......................................................................................... 87 9.2 Cache organization
-0.460085	on software that is slow,
-0.629143	calculation of A is slow,
-0.237595	the divisions (Division is slow,
-0.023512	the same operation is performed
-0.356768	The test should be performed
-0.358079	making software in a high-level
-0.236245	or program size, while high-level
-0.289292	C++ is an advanced high-level
-0.293607	for reserving memory in advance.
-0.314336	can be calculated in advance.
-0.331212	can be given in advance.
-0.311556	the code is fast anyway
-0.232337	information in the database anyway
-0.304418	often used by default anyway
-0.075164	separate dynamic link library (*.dll
-0.623825	all the dynamic libraries (*.dll
-0.339399	to BSD systems. The Intel-based
-0.535630	Linux as well as Intel-based
-0.272290	platforms (Windows, Linux, BSD, Intel-based
-0.352658	the startup code and main()
-0.171966	(*CriticalFunction)(parm1, parm2); } int main()
-0.171966	return &CriticalFunction_386; } int main()
-0.294113	x^2 float x4 = x2
-0.324641	xpow10(double x) { double x2
-0.237206	1./8.71782E10, 1./1.30767E12, 1./2.09227E13}; float x2
-0.236525	interpreters, just-in-time compilers, system database,
-0.222395	communication with a remote database,
-0.222395	the same queue, list, database,
-0.639649	with all x86 platforms. Works
-0.165184	math library (VML, MKL). Works
-0.165184	Intel Performance Primitives (IPP). Works
-0.531184	through a series of calculations:
-0.314698	// Main loop for calculations:
-0.218590	rules apply to modulo calculations:
-0.351702	is chosen as the basis
-0.165184	a First-In-First- Out (FIFO) basis
-0.165184	a First-In-Last- Out (FILO) basis
-0.314700	on important work. The updating
-0.465475	hand, does not need updating
-0.230824	tools. Automatic updates. Automatic updating
-0.638013	a lot of data manipulation
-0.236821	2003. Contains many bit manipulation
-0.310709	processing Memory and string manipulation
-0.237217	64) % 32 = 28.
-0.293430	example when r = 28.
-0.293410	lines in set number 28.
-0.237367	Example 8.19. Devirtualization class C0
-0.622882	class C1 : public C0
-0.200049	g() { C1 obj1; C0
-0.346981	in order to optimize access,
-0.224950	help files, data base access,
-0.165184	with First-In-First-Out or First-In-Last-Out access,
-0.531184	is a series of calculations,
-0.291556	default size when doing calculations,
-0.234811	such as heavy mathematical calculations,
-0.538806	the OpenMP directives for multi-core
-0.444180	Using multiple CPUs or multi-core
-0.237761	same processor core on multi-core
-0.237404	at the last cache level,
-0.427347	on the object file level,
-0.286827	at a lower priority level,
-0.237561	A Pragmatic Look at Exception
-0.561928	Exceptions and error handling Exception
-0.333781	cost of exception handling Exception
-0.294235	when it comes to optimization,
-0.341499	feature called whole program optimization,
-0.200049	C1::f } 73 Without optimization,
-0.236034	elimination, constant propagation, etc. Whether
-0.284368	with many Boolean expressions. Whether
-0.165184	in Sum2 and Sum3. Whether
-0.655104	the performance because the contents
-0.314686	block and copy the contents
-0.337046	allocated and the entire contents
-0.293407	Addison-Wesley, 1996. These two books
-0.289013	answers in the relevant books
-0.165184	code optimization", Coriolis group books
-0.237930	the word static is removed
-0.357406	security, but may be removed
-0.237736	8 columns unused. This removed
-0.353668	are listed on page 164
-0.212321	20 Copyright notice .......................................................................................................... 164
-0.165184	I die. See www.gnu.org/copyleft/fdl.html. 164
-0.231171	columns = 8; float matrix[rows][columns];
-0.231171	columns = 32; float matrix[rows][columns];
-0.231171	columns = 50; float matrix[rows][columns];
-0.060039	not as a linked list.
-0.060039	than as a linked list.
-0.060039	implemented as a linked list.
-0.237882	a Taylor series. The exponential
-0.008674	functions such as logarithms, exponential
-0.237879	as mentioned above. The generality
-0.408090	STL is designed for generality
-0.323817	(STL) if the full generality
-0.336904	how long time it takes.
-0.616889	how much time it takes.
-0.313651	much time each part takes.
-0.358079	different threads in a multithreaded
-0.237139	modify objects simultaneously. In multithreaded
-0.235419	into account when optimizing multithreaded
-0.116722	0; list[i+1] = 1; list[i+2]
-0.116722	=0; list[i+1] = 1; list[i+2]
-0.291782	i_div_3; list[i+1] += i_div_3; list[i+2]
-0.293545	array element. Matrix size Total
-0.584709	bits Number of elements Total
-0.414324	vector Type of elements Total
-0.237810	is to do it explicitly.
-0.786483	to vectorize the code explicitly.
-0.381192	to do this optimization explicitly.
-0.336046	mathematical calculations. In other programs,
-0.348905	the program. In some programs,
-0.231412	recommended to make 16-bit programs,
-0.237810	see how well it optimizes
-0.357694	how well the compiler optimizes
-0.322364	multi-core processing. Visual Studio optimizes
-0.228172	Inserting your own profiling instruments
-0.148757	is to put measurement instruments
-0.148757	put the desired measurement instruments
-0.237244	as entry point. // After
-0.237244	to the dispatcher. // After
-0.228172	a kind of branch. After
-0.215269	cases, while many reductions involving
-0.215269	point expressions. Most reductions involving
-0.228180	Converting class objects Conversions involving
-0.203710	n.a. Floating point XMM (vector)
-0.203710	- - Integer XMM (vector)
-0.203710	- 76 Boolean XMM (vector)
-0.350360	doesn't occur has the unfortunate
-0.357478	than rounding. This is unfortunate
-0.293995	Mac code uses an unfortunate
-0.014038	discussed in manual 2: "Optimizing
-0.014038	chapter in manual 2: "Optimizing
-0.014038	detail in manual 2: "Optimizing
-0.503801	is copied to the parameter,
-0.331901	also treated like a parameter,
-0.538084	assignment, as a function parameter,
-0.702130	able to recover from exceptions.
-0.324864	handling errors without using exceptions.
-0.235498	issue to catching hardware exceptions.
-0.520572	ways to avoid the time-
-0.293157	very long and very time-
-0.342846	good idea to put time-
-0.015542	128 bytes AMD Opteron K8
-0.015542	aligned operands AMD Opteron K8
-0.015542	unaligned op. AMD Opteron K8
-1.323397	when the program is loaded.
-0.451766	link pointer has been loaded.
-0.200049	or database is heavily loaded.
-0.537333	the same as for (i=0;
-0.628023	sum = 0; for (i=0;
-0.237140	identical. For example, for (i=0;
-0.237905	storage. Example 14.23b and 14.30
-0.355657	pivot search: // Example 14.30
-0.312444	i; } } Example 14.30
-0.350740	= a; y = b;}
-0.348835	Sum1() {return a + b;}
-0.218590	403 int ReadB() {return b;}
-0.237736	Studio 2008 version). This wasteful
-0.434597	ways to avoid this wasteful
-0.165184	memory allocation is unnecessarily wasteful
-0.236664	to provoke error // Return
-0.236664	Initialize to zero // Return
-0.236664	error return a[i]; // Return
-0.098039	array static inline void StoreVector(void
-0.088610	in the software. Smaller microcontrollers
-0.088610	to optimize caching. Smaller microcontrollers
-0.088610	of small microcontrollers: Smaller microcontrollers
-0.336284	of storing strings in character
-0.237771	fashioned C style with character
-0.237750	fashioned C style as character
-0.481005	the programming language is implemented.
-0.538577	in example 15.1b is implemented.
-0.325309	way member pointers are implemented.
-0.285832	of code size. In fact,
-0.230534	use assembly language. In fact,
-0.230534	function can throw. In fact,
-0.234277	virtual 53 function at runtime.
-0.531638	time rather than at runtime.
-0.418642	if is resolved at runtime.
-0.714334	to unroll a loop manually
-0.346799	do must be done manually
-0.284365	and loop-invariant code motion manually
-0.237925	of the multiplication of xxn
-0.292032	for(inti=0;i<16;i+=4){ //Loopby4 s += xxn
-0.165184	// s += x^n/n! xxn
-0.015542	cycle. The operators &, |,
-0.007702	The bitwise operators &, |,
-0.007702	corresponding bitwise operators &, |,
-0.885363	memory. Example: // Example 7.2
-0.231409	cons of using classes. 7.2
-0.218590	of variable storage............................................................................. 26 7.2
-0.652333	instance for each thread. Thread-local
-0.229247	a thread environment block. Thread-local
-0.224953	has changed five times. Thread-local
-0.341499	of doing whole program 81
-0.355579	if available. See page 81
-0.165184	Compiler optimization options ................................................................................... 81
-0.593106	static. Example: // Example 7.1
-0.222399	this series of manuals. 7.1
-0.218590	different C++ constructs........................................................................ 26 7.1
-0.337035	processor makes the dispatcher signal
-0.224953	audio and video processing, signal
-0.200049	many functions for statistics, signal
-0.274714	be implemented as a circular
-0.336391	a queue as a circular
-0.331173	to double In example 7.4
-0.236531	and operators ...................................................................... 32 7.4
-0.312187	information about mathematical functions. 7.4
-1.018767	allows the compiler to ignore
-0.354563	processor model. You may ignore
-1.061648	may in some cases ignore
-0.294210	18.2. Compiler directives and keywords
-0.313787	Some compilers have many keywords
-0.165184	static where appropriate. Compiler-specific keywords
-0.578723	Another example: // Example 7.8
-0.200049	Function pointers ...................................................................................................... 37 7.8
-0.200049	function pointer has changed. 7.8
-0.235593	may calculate it only once.
-0.291582	it is done only once.
-0.236871	it is only called once.
-0.345163	y) { union { 89
-0.342155	of memory. See page 89
-0.342155	not overlap. See page 89
-0.451780	void Func2() { int list[100];
-0.237206	// Example 7.16 float list[100];
-0.527809	a; double b;}; S1 list[100];
-0.358138	following techniques can be considered
-0.357034	smart pointer may be considered
-0.165190	package is not traditionally considered
-0.237640	call (e.g. GetProcessAffinityMask in Windows).
-0.237640	calls (e.g. IsProcessorFeaturePresent in Windows).
-0.324489	device drivers for 64-bit Windows).
-0.330045	you want to compile for.
-0.369319	than it is intended for.
-0.283163	of code is intended for.
-0.814854	have to do the divisions
-0.237902	speed up multiplications and divisions
-0.222395	multiplication is exact. Multiple divisions
-0.313910	= 10, columns = 8;
-0.237217	const int TILESIZE = 8;
-0.707706	unsigned int exponent : 8;
-0.237871	a zigzag course that reflects
-0.538524	the innermost loop. This reflects
-0.620515	double and long double reflects
-0.418356	90 9.7 Container classes .....................................................................................................
-0.265203	Multithreading.............................................................................................................. 101 10.1 Hyperthreading .....................................................................................................
-0.251367	.......................................................................................... 126 13.5 Implementation .....................................................................................................
-0.736086	from the value that lies
-0.310719	= 80. The difference lies
-0.232347	reason for this efficiency lies
-0.237905	such as logarithms and trigonometric
-0.021517	as logarithms, exponential functions, trigonometric
-0.429221	that allow you to manipulate
-0.382591	standardized allows us to manipulate
-0.237813	>> can test or manipulate
-0.292801	fraction : 23; // fractional
-0.236664	fraction : 52; // fractional
-0.236664	fraction : 63; // fractional
-0.235966	and subtracting 1 from -128
-0.313216	integers which range from -128
-0.330550	in stdint.h char 8 -128
-0.245677	program happen to be spaced
-0.245677	variables happen to be spaced
-0.421286	because the addresses are spaced
-0.407851	We can make an approximate
-0.224521	maximum, saturated addition, fast approximate
-0.224521	fast approximate reciprocal, fast approximate
-0.325368	and unsigned integers in comparisons,
-0.629788	if the operands are comparisons,
-0.503100	require two floating point comparisons,
-0.265203	and desired new features. User
-0.200049	need to be deleted. User
-0.165184	Take user feedback seriously. User
-0.340969	is faster if the dividend
-0.237783	and by changing the dividend
-0.235915	to consume time at unpredictable
-0.291948	collection may start at unpredictable
-0.346536	array overflow can cause unpredictable
-0.066874	array static inline __m128i LoadVector(void
-0.237785	space 91 step by step.
-0.347297	function for the next step.
-0.342817	done at the second step.
-0.237313	Y = C; double Z
-0.630734	// Update induction variable Z
-0.165184	Y; Y += Z; Z
-0.237406	The three clauses are separated
-0.237406	within each clause are separated
-0.525329	system code is not separated
-0.294210	varies between 9 and 64,
-0.294077	512-bit ZMM registers by 64,
-0.165184	4, 8, 16, 32, 64,
-0.331869	the library file and copies
-0.456847	insert a code that copies
-0.165184	ebx,eax / shr ebx,31 copies
-0.575786	models of the same brand.
-0.354622	that limits the CPU brand.
-0.291549	to dispatch by CPU brand.
-0.357478	23 software. This is annoying
-0.851770	copy protection schemes are annoying
-0.485705	known to be an annoying
-0.456218	AMD's profiler is called CodeAnalyst.
-0.283427	Intel VTune and AMD CodeAnalyst.
-0.228416	AMD CPUs use AMD CodeAnalyst.
-0.184770	compiler options....................................................................................... 160 19 Literature
-0.184770	_M_X64 _M_X64 162 19 Literature
-0.165190	a list of titles. Literature
-0.345030	be very useful to study
-1.247887	It is important to study
-0.289009	based mainly on my study
-0.345818	other objects on the stack,
-0.384047	be stored on the stack,
-0.716694	are stored on the stack,
-0.346690	heap management and garbage collection.
-0.171952	no need for garbage collection.
-0.171952	This is called garbage collection.
-0.349653	is costly when it occurs,
-0.343200	for overflow before it occurs,
-0.234810	sure that overflow never occurs,
-0.730140	compile with the option -fno-pic
-0.302233	specify the compiler option -fno-pic
-0.227409	here. The compiler option -fno-pic
-0.218597	__unix__ __linux__ x86 platform _M_IX86
-0.218597	_M_IX86 _M_IX86 x86-64 platform _M_IX86
-0.265210	__linux__ x86 platform _M_IX86 _M_IX86
-0.382867	If the bottleneck is elsewhere
-0.235424	advised to seek information elsewhere
-0.233043	can cause unpredictable errors elsewhere
-0.314616	a different way or bypassing
-0.407954	avoid this problem by bypassing
-0.657400	rely on the compiler bypassing
-0.015521	range from 0x2700 to 0x273F
-0.065719	from address 0x2700 to 0x273F
-0.237902	on page 134 and 135
-0.382392	Friday) { DoThisThreeTimesAWeek(); } 135
-0.165184	multiple values at once................................... 135
-0.294090	of the factorial function looks
-0.331688	output. The optimized code looks
-0.346528	using Agner's vector classes looks
-0.233824	of 100 doubles: union {double
-0.246227	Example 8.15a struct S1 {double
-0.246227	Example 8.15b struct S1 {double
-1.212676	can be used for implementing
-0.481224	functions are used for implementing
-0.236133	containers 93 themselves. But implementing
-0.352662	short int instead of int.
-0.237924	to convert float to int.
-0.405975	four numbers of type int.
-0.293824	constant always takes memory space,
-0.634342	a waste of cache space,
-0.231415	You may save RAM space,
-0.354307	Predictable branches that can skip
-0.354563	cache contention. You may skip
-0.200049	here's an explanation. Please skip
-0.434142	of 2 (See page 137
-0.226790	for sign and rounding 137
-0.165184	136 14.5 Integer division...................................................................................................... 137
-0.456994	occupying a cache line. 132
-0.218590	Specific optimization topics ......................................................................................... 132
-0.200049	Use lookup tables ................................................................................................. 132
-0.237925	of the needs of position-
-0.351749	of data. This makes position-
-0.331866	normally use the so-called position-
-0.539110	operator } }; // Index
-0.008674	{ cout << "Error: Index
-0.165184	__restrict or #pragma optimize("a",on). Specifies
-0.165184	function. __attribute__((const)) (Linux only). Specifies
-0.165184	alignment. __declspec(align(16)) or __attribute__((aligned(16))). Specifies
-0.860282	absolute value of the residual
-1.049201	the calculation of the residual
-0.343560	is repeated until the residual
-0.329262	take advantage of vector operations,
-0.312221	structures. Useful for vector operations,
-0.237401	integer with vector integer operations,
-0.473695	of the drawbacks of C++.
-0.325351	often be C or C++.
-0.405570	programs implemented in compiled C++.
-0.237481	command or do other input/output
-0.236526	that have many file input/output
-0.212321	of these categories: File input/output
-0.237704	and maintain. Most compiler packages
-0.376061	order to make software packages
-0.233045	the ever bigger software packages
-0.237952	avoided by joining the operations:
-0.222399	of the two AND operations:
-0.218590	used to avoid modulo operations:
-0.703892	AMD and VIA processors. Explicit
-0.074788	structures ............................................................. 96 9.11 Explicit
-0.074788	Hoisie, SIAM 2001. 9.11 Explicit
-0.344863	never designed for this purpose.
-0.511125	wired for a specific purpose.
-0.527950	function for a particular purpose.
-0.233775	a1 * b2 * reciprocal_divisor;
-0.233775	a2 * b1 * reciprocal_divisor;
-0.165190	b1, b2, y1, y2, reciprocal_divisor;
-0.313501	by their values before compilation.
-0.449784	intermediate code and just-in-time compilation.
-0.241285	Some implementations use just-in-time compilation.
-0.237817	as (critical stride) = (number
-0.235825	(total cache size) / (number
-0.231409	/ (line size) % (number
-0.310831	systems may have big endian
-0.222062	platforms that use big endian
-0.222062	point comparison. On big endian
-1.038312	is a function that allocates
-0.629674	function is called, it allocates
-0.165184	deque (doubly ended queue) allocates
-0.353668	are given on page 136
-0.367392	j; int order(int x); 136
-0.218590	14.4 Integer multiplication ............................................................................................. 136
-0.293848	not appropriate here. It reveals
-0.265203	function. The assembly listing reveals
-0.165184	in my crystal ball reveals
-0.462718	every time it is filled
-0.356956	the heap to be filled
-0.355183	the cache will be filled
-0.294104	CPU only) -O3 or (requires
-1.628965	the SSE2 instruction set (requires
-0.218590	compiler, linker and loader (requires
-0.347663	option available. Some compilers offer
-0.343247	the executable. Most compilers offer
-0.287948	this format. Other compilers offer
-0.265210	to integers. 7.25 Bitfields Bitfields
-0.122675	applied to integers. 7.25 Bitfields
-0.122675	Unions .................................................................................................................... 55 7.25 Bitfields
-0.453208	} } } // At
-0.200049	of end user's computers. At
-0.165184	oriented programming are dominating. At
-0.446378	user will have an up-to-date
-0.236688	you should choose an up-to-date
-0.293580	the best and most up-to-date
-0.228183	for security reasons before leaving
-0.150261	then call _mm256_zeroupper() before leaving
-0.322374	page 88 for details. Inheritance
-0.122675	(RTTI) ........................................................................... 54 7.22 Inheritance
-0.122675	use alternative implementations. 7.22 Inheritance
-0.566179	want when the program 153
-0.355579	part takes. See page 153
-0.165184	150 16 Testing speed.............................................................................................................. 153
-0.338244	loops if a high degree
-0.314794	should contain a typical degree
-0.165184	that is an n'th degree
-0.233679	const & x) { _mm_storeu_si128((__m128i
-0.290009	will do such optimizations automatically,
-0.222395	15.1a to 151 15.1c automatically,
-0.200049	example 11.1a to 11.1b automatically,
-0.324668	access a multidimensional array sequentially.
-0.456473	the data are accessed sequentially.
-0.896408	the elements are accessed sequentially.
-0.122675	operators ...................................................................... 32 7.4 Enums
-0.122675	about mathematical functions. 7.4 Enums
-0.165190	an integer in disguise. Enums
-0.315850	easier for the CPU. Algebraic
-0.389094	across the function call. Algebraic
-0.165184	do more complicated reductions. Algebraic
-0.741612	that the values of A,
-0.023429	}; Bitfield x; int A,
-0.351702	same precision as the operands.
-0.235240	they always evaluate both operands.
-0.403558	the order of Boolean operands.
-0.505881	a[100]; int i; for(i=0; i<100;
-0.294286	= 0; for (i=0; i<100;
-0.165184	i; float i2; for(i=0,i2=0; i<100;
-0.212332	0.11 0.18 0.18 0.18 0.11
-0.165195	2 0.63 0.75 0.18 0.11
-0.212327	2 0.12 0.18 0.12 0.11
-0.442878	operands Intel Core 2 0.12
-0.222403	Core 2 0.12 0.18 0.12
-0.200049	0.11 1.21 0.57 0.44 0.12
-0.356721	kind: "what is the nearest
-0.357988	point number to the nearest
-0.237926	{ // Round to nearest
-0.231409	and short vector libraries. To
-0.452038	grow in the future. To
-0.165184	the computer is rebooted. To
-0.342632	x-- x x-- x x--
-0.171936	algebra reductions: x-- x x--
-0.311809	Bit vector algebra reductions: x--
-0.342978	standards for the C++ language,
-0.292648	between a software programming language,
-0.859938	in a hardware definition language,
-0.357123	most cases when the 145
-0.355579	| 0x8040); See page 145
-0.218590	14.10 Mathematical functions ....................................................................................... 145
-0.355579	avoid this. See page 140
-0.237206	// everything is float 140
-0.165184	mix float and double..................................................................................... 140
-0.355579	assembly language. See page 141
-0.200049	for SSE2 or x64 141
-0.165184	numbers and integers ................................... 141
-0.314492	actually be better than RISC
-0.236966	be. The distinctions between RISC
-0.251367	instruction sets have got RISC
-0.290004	rather than future processors. Consider
-0.375077	a waste of resources. Consider
-0.200049	the case in loops. Consider
-0.294236	time. The storage of text
-0.286823	avoid these and handle text
-0.301491	as buffers for storing text
-0.233582	with a different compiler. Object
-0.218590	(new and delete). 88 Object
-0.165184	is Borland's now discontinued Object
-0.357732	unsigned Examples: // Example 14.10
-0.200049	point variables ......................... 142 14.10
-0.165184	replace u[1] by u[0]. 14.10
-0.462116	modulo calculations: // Example 14.11
-0.212321	Mathematical functions ....................................................................................... 145 14.11
-0.165184	to overcome this limitation). 14.11
-0.226801	power of 2 template <int
-0.226801	power of N template <int
-0.226801	x * m;} template <int
-0.228166	two or more iterations back.
-0.224950	that is four places back.
-0.222399	be read and written back.
-1.460486	For example: // Example 8.4
-0.534588	turn on this option. 8.4
-0.200049	by compiler ....................................................................... 77 8.4
-0.593106	limited. Example: // Example 8.7
-0.251367	operations, see page 105. 8.7
-0.200049	Optimization directives .............................................................................................. 82 8.7
-0.284595	the function. The assembly listing
-0.229445	-openmp -static Generate assembly listing
-0.485980	at the assembly output listing
-0.353391	These units are used twice
-0.236904	the table has const twice
-0.345469	or g(x) is calculated twice
-0.341854	Some early implementations of Pascal
-0.376050	for all major platforms. Pascal
-0.364855	Most implementations of C++, Pascal
-0.358401	cache miss can be expected.
-0.490242	always as good as expected.
-0.311116	{ // Cache contentions expected.
-0.231909	difference for the performance. 14.4
-0.226793	17.4 129 129 130 14.4
-0.212321	values at once................................... 135 14.4
-1.371275	short int cc[]) { Vec16s
-0.345872	register for the class Vec16s
-0.165184	16 16 256 Vec32uc Vec16s
-0.290376	This is equally efficient. Simple
-0.313485	are generally very fast. Simple
-0.165184	fast, -fp- model fast=2 Simple
-0.008674	Intel Architecture Software Developer’s Manual",
-0.165190	AMD: "AMD64 Architecture Programmer’s Manual",
-0.237641	the CPU cores and leave
-0.237641	matrix 512 520 and leave
-0.314134	used. No program should leave
-0.459851	This problem can be solved
-0.355950	This dilemma can be solved
-0.447383	The Intel compiler has solved
-0.357478	library (SVML). This is supplied
-0.455563	advanced mathematical functions are supplied
-0.352930	function that I have supplied
-0.231412	made for demonstration purposes. Available
-0.231409	and direct hardware access. Available
-0.224953	class library Intel Agner Available
-0.502416	of program code is translated
-0.237762	intrinsic function call is translated
-0.349579	initialisation i=0; has been translated
-0.234067	263-1 int64_t 29 64-bit Linux:
-0.234067	compiler: unsigned __int64 64-bit Linux:
-0.165190	an option (Windows: /Gy, Linux:
-0.589308	time for the user. With
-0.222395	on a thousand numbers. With
-0.212321	for the next step. With
-0.226785	32- and 64-bit Linux. Has
-0.218590	Microsoft Visual Studio IDE. Has
-0.165184	optimizer. Borland/CodeGear/Embarcadero C++ builder Has
-0.102960	CPUs unless you are overriding
-0.333174	interposition feature that allows overriding
-0.220922	strlen 128 bytes AMD Opteron
-0.220922	16kB aligned operands AMD Opteron
-0.220922	16kB unaligned op. AMD Opteron
-0.209735	C++ compilers and operating systems".
-0.502138	end up with the correct
-0.452668	if it has the correct
-0.237932	if our estimate is correct
-0.237756	and less efficient code caching.
-0.237563	page 87 about memory caching.
-0.346981	organize data to optimize caching.
-0.779908	is no risk of overflow:
-0.525269	exception for floating point overflow:
-0.218594	a way that avoids overflow:
-0.345913	an antivirus program that scans
-0.237464	a virus scanner that scans
-0.237799	The string length function scans
-0.326810	compiler in the following way:
-0.326810	improved in the following way:
-0.326810	(PLT) in the following way:
-0.348414	it optimizes the code. Sometimes
-0.311142	be particularly time consuming. Sometimes
-0.218590	even for simple tasks. Sometimes
-0.381507	bit -fno-builtin Gnu 32-bit -fno-builtin
-0.346301	Asmlib Gnu 64 bit -fno-builtin
-0.235990	optimal. Use 12 option -fno-builtin
-0.255781	is small enough to justify
-0.255781	is rarely enough to justify
-0.313494	in performance can easily justify
-0.224899	data structures. On the contrary,
-0.224899	using hyperthreading. On the contrary,
-0.224899	example 9.1b. On the contrary,
-0.283893	the same function calling conventions.
-0.203708	to the standard calling conventions.
-0.203708	in manual 5: calling conventions.
-0.638866	the critical function. The initialization
-0.255055	The program has an initialization
-0.255055	or library has an initialization
-0.357452	discussion forums on the Internet
-0.336219	download updates through the Internet
-0.200054	- 5. www.amd.com. 163 Internet
-0.592245	big in order to cover
-0.237732	arrays very big to cover
-0.353827	This manual does not cover
-0.222401	by the constructor itself. Constructors
-0.122675	int c; }; 7.23 Constructors
-0.122675	Inheritance .............................................................................................................. 54 7.23 Constructors
-0.236966	and CISC processors, between PC's
-0.353741	For example, the first PC's
-0.235495	power. Connecting several standard PC's
-0.462116	size conversion // Example 7.21
-0.212321	member functions ........................................................................................ 53 7.21
-0.466197	be worth the effort. 7.21
-0.314690	an unfortunate method that delays
-0.322962	unpredictable times and cause delays
-0.165184	This can cause severe delays
-0.005978	aa: StoreVector(aa + i, a);
-0.102765	numbers in b[i] and c[i]
-0.102765	checking if b[i] and c[i]
-0.165190	= a[i] + b[i]; c[i]
-0.237877	by exception handlers for cleaning
-0.615038	a lot of time cleaning
-0.382360	F1 is prevented from cleaning
-0.357764	speed. In the same way,
-0.452902	three times the other way,
-0.237429	goes many times one way,
-0.404456	use of RAM memory. Big
-0.222403	a big mainframe computer. Big
-0.165184	should be aware of. Big
-0.132701	AVX-512 instruction set and ZMM
-0.165190	32 and the 512-bit ZMM
-0.341865	104). The table of coefficients
-0.035786	} polynomial // Polynomial coefficients
-0.035786	= 3.3; // Polynomial coefficients
-0.356119	segmented memory, such as DOS
-0.453099	the old operating systems DOS
-0.233812	in some very old DOS
-0.294187	the 32-bit case. The -fpie
-0.753095	compile with the option -fpie
-0.303146	object made with option -fpie
-0.345505	switch statement with many labels
-0.346064	efficient if the case labels
-0.218590	switch statement with sequential labels
-0.040213	= {1, 1, 2, 6,
-0.228172	1, 2, 3, 4, 6,
-0.212321	cmp jl $B1$3: pop ret
-0.165184	add cmp ja $B2$3: ret
-0.165184	saved in the beginning. ret
-0.310707	conversions is discussed below. Signed
-0.229245	check for integer overflow. Signed
-0.165184	unsigned. // Example 7.4. Signed
-0.237902	calls to log, and logarithms
-1.086166	mathematical functions such as logarithms
-0.291783	The pow function uses logarithms
-0.687945	of the array is stored.
-0.538881	constant needs to be stored.
-0.341817	understand how variables are stored.
-0.229245	proceed in a standardized manner.
-0.518925	bits in a non-sequential manner.
-0.228168	accessed in a random manner.
-0.233035	for speed or size. Today,
-0.212321	Basic was too slow. Today,
-0.200049	yesterday's big mainframe computers. Today,
-0.325424	of course be the easiest
-0.237520	pointer aliasing (/Oa). The easiest
-0.237520	cause large delays. The easiest
-0.237902	have to push and pop
-0.347516	if i < 100. pop
-0.165184	add cmp jl $B1$3: pop
-0.339889	3.5; Here, the constant 3.5
-0.226787	Automatic updates .................................................................................................... 19 3.5
-0.222395	during the update process. 3.5
-0.289563	options and the options -S
-0.074788	Generate assembly listing /FA -S
-0.074788	-S - masm=intel /FA -S
-1.030330	if the function is inlined.
-0.502260	is certain to be inlined.
-0.503913	the function can be inlined.
-0.225761	ecx 86 add add cmp
-0.225761	sar add mov add cmp
-0.165190	the loop increment i++. cmp
-0.237553	the data structure, data flow
-0.546856	delays in the program flow
-0.348772	which determines the program flow
-0.532425	by means of #include directives.
-0.229254	the data. Use OpenMP directives.
-0.165184	metaprogramming in C++: Preprocessor directives.
-0.455288	been allocated is also deallocated.
-0.119075	after it has been deallocated.
-0.325129	may possibly be more (128
-0.578720	-msse SSE2 instruction set (128
-0.349052	sections SSE instruction set (128
-0.272290	likely to be obsolete. Programmers
-0.200049	this bit scan instruction. Programmers
-0.165184	result of macro expansions. Programmers
-1.342876	it is important to focus
-0.459170	that there is more focus
-0.309272	Fortran code. The main focus
-0.784844	static to the function definition.
-0.343364	body inside the class definition.
-0.490820	defined inside a class definition.
-0.325424	needs to follow the track
-0.303610	useful way to keep track
-0.203371	allocated objects and keep track
-0.237654	too worried about this condition.
-0.322029	will trigger the error condition.
-0.285179	exception or other error condition.
-0.237902	s0, s1, s2 and s3
-0.338475	0, s2 = 0, s3
-0.165184	a[i+1]; s2 += a[i+2]; s3
-0.338475	0, s1 = 0, s2
-0.165184	a[i]; s1 += a[i+1]; s2
-0.165184	(s0+s1)+(s2+s3); Now s0, s1, s2
-0.324487	use vector operations on contemporary
-0.236924	typically 64 bytes on contemporary
-0.232720	same processor core. Unfortunately, contemporary
-0.222395	in the compiler .......................................................................................... 66
-0.538990	* x + 1.0f;} 66
-0.200049	How compilers optimize ............................................................................................ 66
-0.237930	of array bounds is probably
-0.806759	then the code can probably
-0.165184	and show a disassembly, probably
-0.565300	Avoid the use of longjmp
-0.556407	used when the function longjmp
-0.348963	possible. Don't rely on longjmp
-0.015542	, longdoublevalue ( 1)sign 2exponent
-0.015542	, doublevalue ( 1)sign 2exponent
-0.015542	follows: floatvalue ( 1)sign 2exponent
-0.327056	tree or switch statement leads
-0.320619	situations where automatic vectorization leads
-0.326110	delay on lazy binding leads
-0.293542	256; // Array size Alignd
-0.292285	Make three aligned arrays Alignd
-0.222399	short int bb[size] ); Alignd
-0.382278	can use it for improving
-1.297904	can be used for improving
-0.294052	and some tips on improving
-0.293642	instruction sets and cache sizes.
-0.291976	4 with different matrix sizes.
-0.224950	with members of mixed sizes.
-0.372972	19 3.5 Program loading .......................................................................................................
-0.272290	.......................................................................................... 150 15 Metaprogramming .......................................................................................................
-0.501060	20 3.9 Other databases .......................................................................................................
-0.237871	fact an integer that holds
-0.236447	100000001.23456. The float type holds
-0.229245	of the array. eax holds
-0.341863	The benchmark performance of competing
-0.331812	if the threads are competing
-0.237599	Foundation Classes (MFC). A competing
-0.100255	answers to your programming questions
-0.100255	don't send your programming questions
-0.165190	the time to answer questions
-0.566115	never stored in a register,
-0.513307	y into a vector register,
-0.571007	into a 128-bit vector register,
-0.346398	code, including user interface etc.,
-0.233584	turn calls another function, etc.,
-0.330846	code, interpreters, just-in-time compilers, etc.,
-0.544885	takes to call the ReadTSC
-0.357417	obtained with the function ReadTSC
-0.235770	from www.agner.org/optimize/testp.zip or get ReadTSC
-0.144302	This can be replaced with:
-0.212327	int c; }; Replace with:
-0.237907	chapter "Register usage in kernel
-0.573299	in the operating system kernel
-0.324207	well as in Linux kernel
-0.539642	Intel, AMD and VIA CPUs").
-0.049930	float matrix[rows][columns]; int i, j;
-0.247510	S1 list[size]; int i, j;
-0.331716	Do objects have a natural
-0.331716	if elements have a natural
-0.379680	if the code contains natural
-0.231915	logic allows parallel calculations. Examples
-0.279495	pool, as explained above. Examples
-0.291775	when it is run. Examples
-0.352501	else if else { (iset
-0.165184	8) SelectAddMul_pointer = &SelectAddMul_AVX2; (iset
-0.165184	5) SelectAddMul_pointer = &SelectAddMul_SSE41; (iset
-0.607098	F1 calls another function F2
-0.237785	for exceptions thrown by F2
-0.452838	std::unexpected() function in case F2
-0.237184	pressing a key or moving
-0.237184	pressing a button or moving
-0.356630	to memcpy rather than moving
-0.357739	matrix a: // Example 9.6b.
-0.339762	not work in example 9.6b.
-0.339762	as shown in example 9.6b.
-0.330846	only) (Intel CPU only) -O3
-0.165184	-O3 or -Ofast /O3 -O3
-0.165184	speed /O2 or /Ox -O3
-0.237811	hour. Neither is it unusual
-0.577168	software, it is not unusual
-0.566079	lost. It is not unusual
-0.232911	time goes to cache misses,
-0.232911	program has most cache misses,
-0.232911	machine instructions executed, cache misses,
-0.552967	0/a = 0 - Divide
-0.291469	= shift and add Divide
-0.165184	(-a==-b)=(a==b) ---xx---- (-a>-b)=(a<b) ---xx---x Divide
-0.354269	you may use a sorted
-0.325084	is small then a sorted
-0.314337	its simplicity. But a sorted
-0.237925	the conflicting considerations of efficiency,
-0.237404	order to improve cache efficiency,
-0.381522	is a compromise between efficiency,
-0.356653	machine code is the same.
-0.237783	small and always the same.
-0.314663	parameters are exactly the same.
-0.047806	the standard template library (STL)
-0.047806	The standard template library (STL)
-0.316715	the Standard Template Library (STL)
-0.420622	still want to get rid
-0.272307	option. Then we get rid
-0.218604	so we don't get rid
-0.233812	foreground jobs and 10 ms
-0.226787	time slices to 120 ms
-0.212321	slices of typically 30 ms
-0.293407	a program has two arrays,
-0.236245	counters, etc. In large arrays,
-0.292235	there are no big arrays,
-0.443299	that if the elements matrix[r][c]
-0.337543	matrix, i.e. each element matrix[r][c]
-0.283924	the diagonal. Each element matrix[r][c]
-0.458760	want the program to issue
-0.478359	this is not an issue
-0.224953	There is a portability issue
-0.353230	The simplest way to solve
-0.382591	(MOVNT) are designed to solve
-0.457156	profiler. This does not solve
-0.322682	is not a problem since
-0.361327	Has not been updated since
-0.165184	number of clock pulses since
-0.293860	for different purposes is beyond
-0.237595	Optimizing database queries is beyond
-0.237595	use of coprocessors is beyond
-0.571937	The code becomes more readable
-0.236231	the assembly output more readable
-0.165190	it is not human readable
-0.339472	when an operand is infinity
-0.500449	final result will be infinity
-0.294104	a was zero or infinity
-1.014579	do a lot of bookkeeping
-0.351356	The costs of this bookkeeping
-0.230816	following example explains why bookkeeping
-0.293235	are combined by some formula
-0.232344	always use the safe formula
-0.623075	for finding the right formula
-0.352966	go deeper into the technical
-0.349772	fail completely because of technical
-0.231910	if the inlining causes technical
-0.235940	SSE4.1 instr. set AVX instr.
-0.230113	SSE3 instr. set SSE4.1 instr.
-0.279495	instruction set Suppl. SSE3 instr.
-0.358621	are indeed of the specified
-0.573792	move, depending on the specified
-0.311317	mode program are typically specified
-0.626386	are smarter ways of organizing
-0.421353	51 performance penalty for organizing
-0.575311	improve the performance by organizing
-0.357739	element matrix[c][r]. // Example 9.5a
-0.540597	c loop in example 9.5a
-0.234861	a matrix using example 9.5a
-0.237932	operand of && is false,
-0.293430	a && false = false,
-0.293430	a && !a = false,
-0.237902	It is free and open
-0.237837	function. Function inlining can open
-0.231410	well. Open Watcom Another open
-0.344807	to find the optimal decomposition
-0.165184	main principles here: functional decomposition
-0.165184	and data decomposition. Functional decomposition
-0.237925	separately. The fallacy of measuring
-0.237902	the hot spots and measuring
-0.347376	have confirmed this by measuring
-0.074788	page 146 below. 3.7 File
-0.074788	code ....................................................... 20 3.7 File
-0.165190	one of these categories: File
-0.719166	dynamic memory allocation is negligible
-0.421221	the exception handling is negligible
-0.481308	penalty is only a negligible
-0.353134	disk caching, but it took
-0.421130	} } This code took
-0.234540	15.1b to 15.1c? We took
-0.237029	own caller, and so on.
-0.424484	microprocessor it is running on.
-0.817900	relevant optimization options turned on.
-0.290009	but eight logical processors. Hyperthreading
-0.074788	10 Multithreading.............................................................................................................. 101 10.1 Hyperthreading
-0.074788	4, 2007 (www.intel.com/technology/itj/). 10.1 Hyperthreading
-0.346341	test bits 0 - 30
-0.235038	time slices of typically 30
-0.165184	integers (see page 142). 30
-0.314198	a function pointer which initially
-0.381979	function // Function pointer initially
-0.272290	function. The PLT entry initially
-0.353276	when contentions do not occur.
-0.352907	pointer aliasing does not occur.
-0.511996	assume that it doesn't occur.
-0.283149	style with character arrays. Strings
-0.074788	classes ..................................................................................................... 93 9.8 Strings
-0.074788	fit specific needs. 9.8 Strings
-0.402677	code. 7.32 Preprocessing directives Preprocessing
-0.074788	in time-critical code. 7.32 Preprocessing
-0.074788	unwinding .............................................................................. 65 7.32 Preprocessing
-1.567980	it is possible to utilize
-0.592245	blocks in order to utilize
-0.229250	best way to fully utilize
-0.338306	Make a vector of (0,0,0,0,0,0,0,0)
-0.233339	illustrated in this example: 38
-0.212321	7.9 Smart pointers .......................................................................................................... 38
-0.200049	38 7.10 Arrays ..................................................................................................................... 38
-0.352663	of the pointer or reference.
-0.324469	is by a const reference.
-0.251367	by returning a null reference.
-0.200062	INSTRSET == 2 #define FUNCNAME
-0.200062	INSTRSET == 8 #define FUNCNAME
-0.200062	INSTRSET == 5 #define FUNCNAME
-0.237952	whole program. During the history
-0.345155	in precompiled code. The history
-0.200049	based on the past history
-0.574402	1; } }; class CChild2
-0.200049	() { CChild1 Object1; CChild2
-0.165184	p1 = &Object1; p1->Hello(); CChild2
-0.565008	bits except the sign bit:
-0.308905	by inverting the sign bit:
-0.215562	and shift out sign bit:
-0.231921	There are various discussion forums
-0.212321	5. www.amd.com. 163 Internet forums
-0.200049	Internet forums Several internet forums
-0.233047	has support for relative addressing
-0.176505	no instruction for self-relative addressing
-0.423694	instruction set supports self-relative addressing
-0.577516	const int size = 1024;
-0.460066	includes languages such as C#,
-0.793018	parts of the code. C#,
-0.200049	programs written in Java, C#,
-0.347210	the beginning rather than allocating
-0.347210	in advance rather than allocating
-0.200054	allocating piecewise or re- allocating
-0.355596	this example so that a+b
-0.356612	Constant folding - n.a. a+b
-0.311802	optimization Integer algebra reductions: a+b
-0.339193	calculations. This should be taken
-0.339193	These problems should be taken
-0.339193	following considerations should be taken
-0.498936	cycles, depending on the microprocessor.
-0.346447	for each type of microprocessor.
-0.357417	reference allows the function argument
-0.345073	The conclusion to this argument
-0.346017	>= operators). The same argument
-0.237466	Func1(2); ... } If Func1
-0.236553	// Example 8.21 void Func1
-0.452328	the necessary information about Func1
-0.399955	in shared objects in Unix-like
-0.486008	time. Shared objects in Unix-like
-0.640576	good choice for all Unix-like
-0.237731	x-- x --- - -----
-0.507175	xx x x- x -----
-0.200049	- ----- x---- x---- -----
-1.222186	a = b * 2.5
-0.236836	programming language ............................................................................... 8 2.5
-0.310521	developed as C++ compilers. 2.5
-0.352424	the same code and read-only
-0.237641	the code section and read-only
-0.356249	areas. Data that are read-only
-0.526037	most variables in a well-structured
-0.325361	for making clear and well-structured
-0.331609	lead to a more well-structured
-0.236501	that the remaining bits represent
-0.285348	and the subsequent counts represent
-0.165184	was certain to truly represent
-0.445628	are difficult to find elsewhere.
-0.299329	correctness must be found elsewhere.
-0.200049	c+b can be reused elsewhere.
-0.504664	the use of the micro-op
-0.356479	on processors with a micro-op
-0.237809	the code cache or micro-op
-0.421139	of which one is best.
-0.331749	about which implementation is best.
-0.235890	see which one works best.
-0.237925	on it. Instead of returning
-0.237785	rather unconventional manner by returning
-0.314533	is automatically deallocated when returning
-0.233324	YMM registers. Disadvantages are: Long
-0.317432	single and double precision. Long
-0.370792	instructions out of order. Long
-0.537488	{ for (c2 = r1;
-0.381872	sqaure: for (r2 = r1;
-0.236957	= 0; c1 < r1;
-0.325392	Event-based sampling requires a CPU-
-0.446201	the library functions. The CPU-
-0.357385	more complicated to make CPU-
-0.294237	array ; jump to top
-0.304648	end of a ; top
-0.284590	100 $B1$2 ebx ; top
-1.342876	it is important to decide
-0.314733	up the factors that decide
-0.237693	be annoying. We may decide
-0.216744	also stored near each other.
-0.232523	and underflow neutralize each other.
-0.206150	accessed with a square brackets
-0.206150	to ebx. The square brackets
-0.218595	when exiting the {} brackets
-0.212321	not been updated since 2004.
-0.165184	Mars Compiler v. 8.42n, 2004.
-0.165184	Codeplay VectorC v. 2.1.7, 2004.
-0.972298	the repeat count is odd
-0.237713	example 11.2b was an odd
-0.325438	may seem a little odd
-1.460486	For example: // Example 7.7
-0.218590	increment and decrement operators. 7.7
-0.200049	and references ............................................................................................ 36 7.7
-0.412479	the Intel C++ Compiler Documentation
-0.200049	restrictions. A GNU Free Documentation
-0.165184	www.open- std.org/jtc1/sc22/wg21/docs/TR18015.pdf. OpenMP. www.openmp.org. Documentation
-0.223659	use, incompatible or error prone.
-0.097588	expensive and more error prone.
-0.097588	and therefore more error prone.
-0.237559	sqrt and pow at compile-
-0.314105	static if), but no compile-
-0.230818	of C++ should allow compile-
-0.236284	accessed from any function. Global
-0.231909	you can avoid it. Global
-0.976438	before the function returns. Global
-0.236990	table (GOT). These table lookups
-0.480608	the GOT and PLT lookups
-0.165184	113 Number of simultaneous lookups
-0.236730	Loopunrolling x-xxxx--x Profile-guided optimization Whole
-0.329335	make up a program. Whole
-0.165184	-O3 Interprocedural optimization /Og Whole
-0.237817	- n.a. (-a)*(-b) = a*b
-0.165184	reductions: a+b = b+a a*b
-0.165184	n.a. a+b = b+a, a*b
-0.357756	file" option for the linker.
-0.355356	map file from the linker.
-0.313650	to load the dynamic linker.
-1.378660	for the sake of security.
-0.102798	of C++ relates to security.
-0.102798	C++ language relates to security.
-0.342238	time than the table lookup.
-0.733479	branch by a table lookup.
-0.370807	used for vectorized table lookup.
-0.342155	pointer aliasing. See page 78
-0.342155	not occur. See page 78
-0.307590	or a function library. 78
-0.346287	vector register size is handled
-0.237762	// This triangle is handled
-0.356768	User feedback should be handled
-0.235250	8.21 void Func1 (int a[],
-0.035786	Example 8.26a void Func(int a[],
-0.035786	Example 8.26b void Func(int a[],
-0.237930	The last line is implicitly
-0.237797	use the memcpy function implicitly
-0.334231	9. Multiplications are done implicitly
-0.053293	the strings including the terminating
-0.236874	that require cleanup before terminating
-0.237733	__GNUC__ and not not _WIN32
-0.234017	_WIN64 _LP64 Windows platform _WIN32
-0.212321	_LP64 Windows platform _WIN32 _WIN32
-0.294236	the necessary calculations of (2n
-0.346014	b as a * (2n
-0.292482	of n. The constant (2n
-0.237464	a performance test that measures
-0.237464	is a counter that measures
-0.331202	other processes. The profiler measures
-0.408127	mix of additions and multiplications.
-0.324724	n additions and no multiplications.
-0.312178	pow(x,10) with only four multiplications.
-0.236842	be sufficient for less intensive
-0.074788	obtained in a computationally intensive
-0.074788	that are not computationally intensive
-0.580873	code that can be moved
-0.357034	A calculation may be moved
-0.294108	can be copied or moved
-0.237817	ReadTSC(); CriticalFunction(); timediff[i] = ReadTSC()
-0.344507	#include <intrin.h> long long ReadTSC()
-0.323239	CriticalFunction(); ... // Use ReadTSC()
-0.477615	sure the result is valid.
-0.382399	that the conversion is valid.
-0.961766	the second operand is valid.
-0.319745	= 1024; int a[size], b[size];
-0.236804	int i; float a[size], b[size];
-0.097757	= 1000; float a[size], b[size];
-0.233582	with the Gnu compiler. Not
-0.232709	serial code for vectorization Not
-0.165184	the Borland C++ builder. Not
-0.237621	program is achieved when none
-0.235358	reduce any expression, but none
-0.235358	15.1b to 15.1c, but none
-0.356627	processor X?" rather than "what
-0.165184	The programmer typically thinks "what
-0.165184	searches of the kind: "what
-0.265203	but only an addition. Comparing
-0.165184	in the condition clause. Comparing
-0.165184	CodeGear Microsoft Table 2.1. Comparing
-0.287392	RISC cores, vector processing instructions,
-0.218590	an algorithm of sequential instructions,
-0.165184	as flush and fence instructions,
-0.635066	for different instruction sets Microprocessor
-0.228166	chosen the wrong branch. Microprocessor
-0.272290	Wesley 1997. Mostly obsolete. Microprocessor
-0.292859	calculation implemented with template metaprogramming.
-0.333602	case we may need metaprogramming.
-0.319435	n, then we need metaprogramming.
-0.294090	summarized below. Intrinsic function Size
-0.453407	follows: Type of elements Size
-0.231409	all of these classes. Size
-0.999921	may be used for metaprogramming,
-0.287188	any algorithm with template metaprogramming,
-0.287188	it. In C++ template metaprogramming,
-0.750405	An optimizing compiler can bypass
-0.352110	variable __intel_cpu_feature_indicator_x. You can bypass
-0.237813	running on. Replace or bypass
-0.312758	or /Fa for assembly output.
-0.344352	option for assembly language output.
-0.234809	operators that produce Boolean output.
-0.533152	2.2 Choice of microprocessor ...........................................................................................
-0.689497	14.6 Floating point division ...........................................................................................
-0.531122	Choosing the optimal platform ...........................................................................................
-0.714313	useful for finding the numerically
-0.237867	Example 14.30 finds the numerically
-0.165190	* 2; // Find numerically
-0.234375	first in an || expression.
-0.304415	two gives the chosen expression.
-0.228171	source is an arithmetic expression.
-0.419443	pointers.......................................................................................................37 7.9 Smart pointers ..........................................................................................................
-0.364859	163 20 Copyright notice ..........................................................................................................
-0.251367	....................................................... 120 12.10 Conclusion ..........................................................................................................
-0.648827	return 0; } The InstructionSet()
-0.048344	// Header file for InstructionSet()
-0.378835	analysis Join identical branches Eliminate
-0.844271	= y + 1.; Eliminate
-0.276599	identical branches Eliminate jumps Eliminate
-0.237927	strategy for saving a backup
-0.290373	software applications need better backup
-0.165184	user and prevent legitimate backup
-0.349077	the specific instruction set. 13.6
-0.284370	14.0 80.8 65 65 13.6
-0.218590	13.5 Implementation ..................................................................................................... 126 13.6
-0.511813	size; i++) { // Get
-0.344871	int parm2) { // Get
-0.236664	desired function version // Get
-0.353824	Assume function does not throw
-0.311034	that F1 will never throw
-0.421449	F2 that can possibly throw
-0.451139	or higher instruction set. More
-0.233593	Return reference to a[i] More
-0.229247	to avoid these problems. More
-0.212321	expressions Automatic vectorization Devirtualization ---x-----
-0.200049	algebra reductions: !(!a)=a x-xxxxxxx ---x-----
-0.200049	x-xxxx--x x-xxxx--x x-xx----- x--x----- ---x-----
-0.103701	* p) { return _mm_loadu_si128((__m128i
-0.729946	Choice of programming language Before
-0.590312	to find hot spots Before
-0.218590	more complicated mathematical tasks. Before
-0.623166	32-bit and 64-bit systems. Applications
-0.392222	to the user interface. Applications
-0.165184	possible. See page 141. Applications
-0.237731	is approximately 12 - 25
-0.231909	cause of reduced performance. 25
-0.165184	24 6 Development process...................................................................................................... 25
-0.356950	first processors with the AVX-512
-0.148757	the library function. 12.2 AVX-512
-0.148757	registers ................................................................. 107 12.2 AVX-512
-0.406791	127 1 fraction 2 23
-0.321722	new versions of their 23
-0.212321	Performance and usability ............................................................................................... 23
-0.378236	28, the cache will evict
-0.234609	1. Number 18 will evict
-0.234609	column. Number 17 will evict
-0.524984	start of the function. Copying
-0.235418	memory to stack memory. Copying
-0.165184	arrays forwards, not backwards. Copying
-0.314151	Table[100]; int x; for (x
-0.237140	factorial = 1.0; for (x
-0.237140	A + B; for (x
-0.311450	rely on anything else being
-0.290374	the consequence of n being
-0.165184	of test data. That being
-0.237825	x // x^n // sum,
-0.353741	added to the first sum,
-0.342817	added to the second sum,
-0.237879	also has disadvantages: The unrolled
-0.237476	Calculate integer power, loop unrolled
-0.279495	count may be completely unrolled
-0.314647	the different cores is slow.
-0.237762	(int)d; // Truncation is slow.
-0.234535	of Basic was too slow.
-0.054432	consecutive elements in aa: StoreVector(aa
-1.460486	For example: // Example 7.11
-0.322219	discussion of this problem. 7.11
-0.212321	7.10 Arrays ..................................................................................................................... 38 7.11
-0.237952	new processor enters the market
-0.538856	takes to develop and market
-0.347173	years old. The CPU market
-0.815420	as a vector of vectors,
-0.237907	for integer division in vectors,
-0.234809	using integers as Boolean vectors,
-0.235992	saved. Any other allocated resource.
-0.318568	size is a limited resource.
-0.291775	registers is a scarce resource.
-0.047871	from Intel: "IA-32 Intel Architecture
-0.047871	documentation Intel: "IA-32 Intel Architecture
-0.165190	3B. developer.intel.com. AMD: "AMD64 Architecture
-0.593106	thing. Example: // Example 7.12
-0.603566	as a member function. 7.12
-0.218590	7.11 Type conversions.................................................................................................... 40 7.12
-0.331696	of available registers is limited.
-0.237762	number of allocations is limited.
-0.349822	of registers is very limited.
-0.593119	finished. Example: // Example 11.3
-0.754269	the loop in example 11.3
-0.339762	register variable in example 11.3
-0.237809	a #define, const or typedef
-0.312991	// define function type typedef
-0.235774	type with desired parameters typedef
-0.036960	the address range from 0x2700
-0.324779	0x40 bytes from address 0x2700
-0.379860	public: int c; }; Replace
-0.212321	it is running on. Replace
-0.165184	template: // Example 7.34b. Replace
-0.200049	recommend any specific model. Instead,
-0.165184	v. 4.5.2, July 2011). Instead,
-0.165184	use ~ for NOT. Instead,
-0.331869	all runtime libraries and frameworks,
-0.234532	based on big runtime frameworks,
-0.234207	avoid the large graphics frameworks,
-0.629692	For example, x = *(p++)
-0.321432	while (*p != 0) *(p++)
-0.165184	i > 0; i--) *(p++)
-0.236285	some small low-power CPUs (Intel
-0.236034	-msse4.1 -mAVX -axSSE3, etc. (Intel
-0.330846	etc. (Intel CPU only) (Intel
-0.352044	the same or a nearby
-0.350118	that branch and other nearby
-0.235560	a disadvantage if other nearby
-0.234535	space has become too fragmented.
-0.296166	heap space to become fragmented.
-0.296166	the heap has become fragmented.
-0.455682	using rounding instead of truncation.
-0.048350	speed between rounding and truncation.
-0.048350	difference between rounding and truncation.
-0.122675	series of manuals. 7.1 Different
-0.122675	C++ constructs........................................................................ 26 7.1 Different
-0.165190	NAN (not a number). Different
-0.339312	faster than calculating the logarithm
-0.237783	executed. Without static, the logarithm
-0.237783	the overflow. Taking the logarithm
-0.408137	using each bit in Day
-0.220323	Day == Wednesday || Day
-0.220323	(Day == Tuesday || Day
-0.537449	for code that is ported
-0.234022	the code is later ported
-0.228171	readable and not easily ported
-0.325303	all functions static or inline.
-0.351076	may declare the function inline.
-0.351076	are. Declare the function inline.
-0.585157	and the function is big.
-0.236977	code can become very big.
-0.332038	loop count is too big.
-0.237709	integers. The allocation and deallocation
-0.237709	of dynamic allocation and deallocation
-0.165190	allocated separately. The allocation, deallocation
-0.247471	a procedure linkage table (PLT)
-0.160757	called procedure linkage table (PLT)
-0.160757	ordinary procedure linkage table (PLT)
-0.357732	example 7.22. // Example 7.22
-0.212321	identification (RTTI) ........................................................................... 54 7.22
-0.200049	and use alternative implementations. 7.22
-0.074788	Memory access....................................................................................................... 22 3.14 Context
-0.074788	about memory caching. 3.14 Context
-0.165190	have to be renewed. Context
-0.885363	overflow. Example: // Example 7.23
-0.379860	b2; int c; }; 7.23
-0.212321	7.22 Inheritance .............................................................................................................. 54 7.23
-0.294267	resources. Consider running the services
-0.234204	software. Background services. Many services
-0.283143	optimize performance for background services
-0.462116	unsigned conversion // Example 7.20
-0.231409	need any non-static access. 7.20
-0.212321	member functions (methods)......................................................................... 53 7.20
-0.542745	-ftrapv, but this is extremely
-0.493888	but this method is extremely
-0.237595	and other abuse is extremely
-0.232347	data cache is 8 kb
-0.412906	a cache of 8 kb
-0.530732	level-2 cache is 512 kb
-0.292400	of cache space by joining
-1.174047	can be avoided by joining
-0.236312	made more compact by joining
-0.481291	operators also applies to decrement
-0.237641	of the increment and decrement
-0.237641	137, respectively. Increment and decrement
-0.237817	0x40) % 0x20 = 0x1C.
-0.237369	cache lines from set 0x1C.
-0.293408	belong to set number 0x1C.
-0.284107	delete, or malloc and free.
-0.212102	the functions malloc and free.
-0.485877	edition is available for free.
-0.502070	< 4) { // Check
-0.237244	{temp=x; x=y; y=temp;} // Check
-0.232718	particular code version. 2. Check
-0.352662	were float instead of double,
-0.614273	bits of a 64-bit double,
-0.218594	such as int, float, double,
-0.221634	that follows a simple periodic
-0.221634	it follows a simple periodic
-0.314935	other branches. A simple periodic
-0.357732	its address: // Example 7.27
-0.235773	for using overloaded functions. 7.27
-0.218590	Overloaded functions .............................................................................................. 56 7.27
-0.593106	registers. Example: // Example 7.24
-0.212321	and destructors .................................................................................. 55 7.24
-0.165184	function. See page 53. 7.24
-0.558738	factor rather than the product
-0.236975	A better performing software product
-0.212321	Classes (MFC). A competing product
-0.357732	of overflow: // Example 7.25
-0.521723	be applied to integers. 7.25
-0.212321	7.24 Unions .................................................................................................................... 55 7.25
-1.169361	code. Example: // Example 7.28
-0.436534	automatically in simple cases. 7.28
-0.218590	Overloaded operators ............................................................................................. 56 7.28
-0.343695	This includes pointers and references,
-0.222395	counters, function parameters, pointers, references,
-0.165184	32 bit systems: Pointers, references,
-0.539642	Intel, AMD and VIA CPUs"
-0.237046	green. It takes some experience
-0.346609	good deal of programming experience
-0.230110	then the user might experience
-1.350542	is more efficient to determine
-0.592101	experiments in order to determine
-0.237541	GetLogicalProcessorInformation in Windows) to determine
-0.226801	macro by template template <typename
-0.226801	with bounds checking template <typename
-0.226801	a template parameter: template <typename
-0.265203	- masm=intel /FA -S Generate
-0.200049	/Qparallel -parallel -openmp -static Generate
-0.165184	Generate map file /Fm Generate
-0.581799	automatically then it is certainly
-0.499331	issue. But it is certainly
-0.237595	is often seen, is certainly
-0.234531	in table 8.1 below. Devirtualization
-0.325431	float expressions Automatic vectorization Devirtualization
-0.165184	Example: // Example 8.19. Devirtualization
-0.462558	like this in a pivot
-0.237750	matrix for use as pivot
-0.311802	of finding a suitable pivot
-0.334539	Linux Align by 16 __declspec(
-0.228171	optimize("a", on) __restrict __restrict __declspec(
-0.466197	__declspec( align(16)) __attribute(( aligned(16))) __declspec(
-0.567388	time in case of mispredictions
-0.232923	BTB can cause branch mispredictions
-0.232923	intrinsic function. Provoke branch mispredictions
-0.247510	Example 8.13a int i, a[100],
-0.247510	Example 8.13b int i, a[100],
-0.247510	Example 8.14b int i, a[100],
-0.881697	when the number of allocations
-0.237563	to cause seven memory allocations
-1.039698	if there are many allocations
-0.236373	in separate modules if necessary,
-0.236373	save RAM space, if necessary,
-0.236373	code are modified, if necessary,
-0.885363	memory. Example: // Example 9.4
-0.323431	dynamically linked library functions. 9.4
-0.218590	be stored together...................................... 88 9.4
-0.352943	more bits than a float.
-0.329218	bit vector of four float.
-0.276603	types: char, short int, float.
-0.064874	x * x + 1.0f;}
-0.231708	{ return square(x) + 1.0f;}
-1.018406	if the code is indeed
-0.237762	loop. Example 8.21 is indeed
-0.237859	that thrown exceptions are indeed
-0.338275	experimental results in table 9.1
-0.613021	9 Optimizing memory access 9.1
-0.224950	memory access ............................................................................................. 87 9.1
-0.341598	unrelated to each other (not
-0.233326	16. Library versions tested (not
-0.330846	or infinity or NAN (not
-0.353415	Many CPUs have a built-in
-0.314700	and string instructions. The built-in
-0.212321	Gnu compiler often inserts built-in
-0.345227	a suitable choice of n.
-0.289005	works only for positive n.
-0.165184	to some positive value, n.
-0.878458	can lead to a complete
-0.324991	organizing the data. A complete
-0.235642	The file http://www.agner.org/optimize/asmlib.zip contains complete
-2.503909	x x x x (x)
-0.330154	= x- x- x (x)
-0.379548	x- x (x) x (x)
-0.230818	ENDP + esp ebx ecx,
-0.224953	mov 2:8+esp eax, edx, ecx,
-0.200049	recently 4 ?Func2@@YAXQAHAAH@Z ENDP ecx,
-0.023468	string is created or modified.
-0.461945	original object is not modified.
-0.088610	elimination x n.a. Constant folding
-0.088610	b = 6.0f; Constant folding
-0.088610	a few places. Constant folding
-0.345154	public variable where it expects
-0.346402	word processor the user expects
-0.306869	is insufficient. The user expects
-0.292801	// Virtual function // Call
-0.037034	b, c; ... // Call
-0.462298	branches. They can be joined
-0.481554	entire program will be joined
-0.342038	template instances will be joined
-0.493893	easier to use vector classes,
-0.234533	is to use string classes,
-0.232716	limited to well-tested functions, classes,
-0.346381	because optimizing compilers can compute
-0.236250	Here, the code must compute
-0.709922	top of loop ; compute
-0.460011	simply a matter of interpreting
-0.237880	large runtime framework for interpreting
-0.237905	in the SVML and LIBM
-0.228416	See page 131. AMD LIBM
-0.228416	Library __vrs4_expf __vrd2_exp AMD LIBM
-0.332506	members. The code that accesses
-0.332506	here. Any code that accesses
-0.235418	this "override" feature. All accesses
-0.358058	jumps back to the $B1$2
-0.287390	eax, 1 eax, 100 $B1$2
-0.200049	eax, 100 / jl $B1$2
-0.329492	support for hard disk copying.
-0.228168	high precision math. Memory copying.
-0.165184	without effectively preventing illegitimate copying.
-0.005763	Architecture Software Developer’s Manual", Volume
-0.048403	"AMD64 Architecture Programmer’s Manual", Volume
-0.357875	example 13.1 can be placed
-0.293177	code can then be placed
-0.351207	The pragmas must be placed
-0.283896	circumstances around the hot spot.
-0.278548	to identify a hot spot.
-0.192864	occur in this hot spot.
-0.571989	access part of a variable,
-0.237751	additional information about a variable,
-0.352648	to increment an integer variable,
-0.579441	cause a lot of jumping
-0.569968	longjmp is used for jumping
-0.234382	any necessary destructors after jumping
-0.501976	quite a long time compared
-0.226790	have the following disadvantages compared
-0.165184	is short in duration compared
-0.626356	The same applies to 3-dimensional
-0.023468	Aligning RGB video or 3-dimensional
-0.237925	call the destructor of x.
-0.479778	store the result in x.
-0.237761	the const restriction on x.
-0.237924	the expression -(-a) to a.
-0.331876	the four results in a.
-0.237265	as(a << 4) + a.
-0.237924	you change pre-increment to post-increment.
-0.237809	you use pre-increment or post-increment.
-1.312802	is more efficient than post-increment.
-0.354962	two. Often, it is sufficient
-0.458597	one-man projects, it is sufficient
-0.357409	powerful and may be sufficient
-0.556566	times because it is evicted
-0.356956	the table to be evicted
-0.355183	to 0x273F will be evicted
-0.237952	has problems separating the flags
-0.234673	the carry and zero flags
-0.200049	to the so-called partial flags
-0.237907	p and r in Sum2
-0.568796	slightly more efficient than Sum2
-0.165184	The three functions Sum1, Sum2
-0.338306	Make a vector of (2,2,2,2,2,2,2,2)
-0.092771	1 ebx, DWORD PTR [edx]
-0.092771	add ebx, DWORD PTR [edx]
-0.210898	PTR [esp+8] DWORD PTR [edx]
-1.460486	For example: // Example 7.14
-0.218594	43 7.13 Loops...................................................................................................................... 45 7.14
-0.212321	count is too big. 7.14
-0.357732	using memset: // Example 7.16
-0.222395	Function parameters ............................................................................................... 50 7.16
-0.642271	compilers and operating systems". 7.16
-0.357732	changes fastest: // Example 7.17
-0.318042	to delete the object. 7.17
-0.222395	return types .............................................................................................. 50 7.17
-0.521962	execution speed to using templates.
-0.370964	performance cost to using templates.
-0.165190	are implemented as recursive templates.
-1.460486	For example: // Example 7.13
-0.222395	in the different microprocessors. 7.13
-0.218590	and switch statements............................................................................. 43 7.13
-0.357732	type conversions: // Example 7.19
-0.231917	the first 128 bytes. 7.19
-0.218590	members (properties) ............................................................................ 51 7.19
-0.275812	strlen, sprintf, etc. But beware
-0.221700	c > b) But beware
-0.221700	to be inlined. But beware
-0.357732	more efficient: // Example 7.18
-0.231909	negative effect on performance. 7.18
-0.218590	Structures and classes............................................................................................ 51 7.18
-0.433968	either on a graphics card
-0.251373	on a graphics accelerator card
-0.222401	this is extremely inefficient, (4)
-0.165190	the GOT, and finally (4)
-0.537455	macro to swap two elements:
-0.237121	to swap two array elements:
-0.522380	implementation can be a viable
-0.516893	compilation may be a viable
-0.212327	Smart pointers .......................................................................................................... 38 7.10
-0.330854	classes on page 93. 7.10
-0.023470	(2,2,2,2,2,2,2,2) __m128i two = _mm_set1_epi16(2);
-0.335723	more efficient container class templates,
-0.233043	the STL. Some STL templates,
-0.237905	for code bloat and complexity
-0.322427	user. With the high complexity
-0.212331	129 129 130 14.4 511
-0.200054	129 130 14.4 511 511
-0.122678	pointers ...................................................................................................... 37 7.8 Member
-0.122678	pointer has changed. 7.8 Member
-0.237880	a similar utility for modifying
-0.237788	modify a double by modifying
-0.343601	point expressions may have undesired
-0.232718	ambiguous and may produce undesired
-0.200054	7.14 Functions ................................................................................................................ 48 7.15
-0.165190	Windows in this respect. 7.15
-0.237882	PLT and GOT. The symbol
-0.287397	shared object. This so-called symbol
-0.349822	system. This is very problematic
-0.324978	Text strings are particularly problematic
-0.351519	user who has to invest
-0.294019	it is worthwhile to invest
-0.314731	calls to memset and memcpy,
-0.559088	memory-intensive functions such as memcpy,
-0.172520	efficient than Sum2 and Sum3
-0.172520	functions Sum1, Sum2 and Sum3
-0.226791	transferred in registers anyway. Pure
-0.222401	function that needs them. Pure
-0.556750	subtasks, but it is impossible
-0.356249	with pointers that are impossible
-0.101789	to multiple inheritance class B1;
-0.101789	7.38a. Multiple inheritance class B1;
-0.165665	because of a store forwarding
-0.165665	to generate a store forwarding
-0.200054	also be huge). Far storage,
-0.165190	OS, etc.) have little-endian storage,
-0.335738	operating system (see page 107).
-0.614670	automatic vectorization (see page 107).
-0.235830	(set) = (10000 / 64)
-0.165190	cache line size (typically 64)
-0.349910	to declare the table static.
-0.165190	to make a lookup-table static.
-0.222403	platform _M_IX86 and _WIN64 _M_X64
-0.200054	_M_IX86 and _WIN64 _M_X64 _M_X64
-0.236556	<stdio.h> #include <asmlib.h> void CriticalFunction();
-0.165190	{ time1 = ReadTSC(); CriticalFunction();
-0.200054	x-xx--xx- x--x----- --xx----- x-xxx---x x-xxx---x
-0.200054	(x) x-xx--xx- x--x----- --xx----- x-xxx---x
-0.293048	for different platforms as shown
-0.236881	coded as _mm_empty() as shown
-0.237905	to poor documentation and lack
-0.350629	etc.). Older operating systems lack
-0.102611	* reciprocal_divisor; y2 = a2
-0.102611	/ b1; y2 = a2
-0.102611	* b2); y1 = a1
-0.102611	y1, y2; y1 = a1
-0.354253	a profiling (see page 16)
-1.169486	< 256; i += 16)
-0.357655	disassembly window of a debugger.
-0.356283	to trace with a debugger.
-0.352056	Digital Mars compiler is mostly
-0.325387	32 bit mode and mostly
-0.344282	+ (vector const & a)
-0.224961	8.1a float square (float a)
-0.165190	syntax: __asm fld qword ptr
-0.165190	x; __asm fistp dword ptr
-0.237882	__fastcall or __attribute__((fastcall)). The fastcall
-0.236171	or common names. Use fastcall
-0.357124	The optimal number of accumulators
-0.313893	loop and use multiple accumulators
-0.117168	for "assume no pointer aliasing"
-0.117168	option "assume no pointer aliasing"
-0.074790	maintenance .......................................................................................... 126 13.5 Implementation
-0.074790	be found elsewhere. 13.5 Implementation
-0.542397	to improve the performance significantly
-0.236391	can be speeded up significantly
-0.212331	methods for exploiting fine-grained parallelism.
-0.212327	the code contains natural parallelism.
-0.008674	40320, 362880, 3628800, 39916800, 479001600};
-0.200054	described in the book "Performance
-0.165190	Goedecker and Adolfy Hoisie: "Performance
-0.294244	address of x is type-casted
-0.325309	or if pointers are type-casted
-0.237316	= x *x; double x4
-0.237209	x; // x^2 float x4
-0.551775	reciprocal of the clock frequency.
-0.301518	changes in the clock frequency.
-0.294238	a second step of interpretation
-0.237813	that requires compilation or interpretation
-0.348036	11) and vector operations (chapter
-0.236214	multiple threads. Out-of-order execution (chapter
-0.095690	based on my own research,
-0.095690	code. For my own research,
-0.466207	of Numerically Intensive Codes", SIAM
-0.165190	Goedecker and A. Hoisie, SIAM
-0.380522	a[i] = Induction; ; a[i+1]
-0.399362	{ a[i] = Induction; a[i+1]
-0.222401	x-xxxxxxx xxxxxxxxx xxxxxxx-x xxxxxxxxx x-xxx----
-0.165190	x-xx----- x--x----- ---x----- x---x---x x-xxx----
-0.237813	a static buffer or send
-0.234812	everybody. So please don't send
-0.221083	// Example 7.31b char string[100],
-0.221083	// Example 7.31a char string[100],
-0.536119	byte = char 16 SSSE3
-0.236039	SSE3 horizontal add, etc. SSSE3
-0.343731	reduce other types of expressions,
-0.461896	expressions than floating point expressions,
-0.293312	the appropriate function version CriticalFunctionType
-0.200054	parm2); // Function prototype CriticalFunctionType
-0.351590	preceding one (see page 71).
-0.332602	by 2. (See page 71).
-0.832675	data are stored in ASCII
-0.165190	example converts a zero-terminated ASCII
-0.500152	allocated objects are not overlapping
-0.659572	prevents the CPU from overlapping
-0.358080	be obtained in a computationally
-0.535932	Applications that are not computationally
-0.234537	cases, the Intel mechanism executes
-0.272297	the loop. Example 12.4b executes
-0.276605	appear in the project window
-0.165190	code in the disassembly window
-0.237813	up to five or ten
-0.752690	calls the critical function ten
-0.237827	of S1 aligned // Structure
-0.218595	array 800 bytes smaller. Structure
-0.523936	doing different kinds of jobs.
-0.283149	10 ms for background jobs.
-0.200054	questions from everybody. So please
-0.200054	nagging pop-up messages saying please
-0.122678	destructors .................................................................................. 55 7.24 Unions
-0.122678	See page 53. 7.24 Unions
-0.200054	3A and 3B. developer.intel.com. AMD:
-0.466207	versions are produced regularly. AMD:
-0.632680	c*x + d = ((a*x+b)*x+c)*x+d
-2.503956	x x x x ((a*x+b)*x+c)*x+d
-0.226795	more and more important. 9.2
-0.224956	and data ......................................................................................... 87 9.2
-0.237932	memory. One kilobyte is 1024
-0.237926	vector register sizes to 1024
-0.073456	the time it was programmed.
-0.200054	arrays. // Example 12.5. Aligned
-0.165190	numbers mean good performance). Aligned
-0.525232	go based on the past
-0.200054	in C++ programs. Writing past
-0.379369	aligning dynamically allocated memory. 9.6
-0.222403	of data ...................................................................................................... 90 9.6
-0.265712	a member of the object's
-0.095494	double a1, a2, b1, b2,
-0.237515	a function template because partial
-0.331874	due to the so-called partial
-0.102611	x-xxx---- a*b*c=a*(b*c) a+b+c+d = (a+b)+(c+d)
-0.102611	(a+b)+c=a+(b+c) a+b+c=c+b+a a+b+c+d = (a+b)+(c+d)
-0.355478	other than short int (16
-0.740241	by the vector size (16
-0.480852	translated to the instruction xor
-0.231414	12 $B1$1: push mov xor
-0.218599	model work better. Remember again,
-0.284372	than calculating the logarithm again,
-0.218595	9.8 Strings ...................................................................................................................... 96 9.9
-0.218595	this manual at www.agner.org/optimize/cppexamples.zip. 9.9
-0.343356	ahead of time and resolve
-0.237645	order to find and resolve
-0.573891	meanings depending on the context.
-0.440210	adapt to the new context.
-0.389724	to the appropriate version (May
-0.063635	CPU dispatcher. See page 131.
-0.358060	be ignored if the goal
-0.222403	date. A more realistic goal
-0.444334	on CPU dispatching and discovered
-0.293933	program. Many programmers have discovered
-0.234181	table by 16 float Exp(float
-0.234181	12.9a. Taylor series float Exp(float
-0.212327	Container classes ..................................................................................................... 93 9.8
-0.200054	to fit specific needs. 9.8
-0.355084	__INTEL_COMPILER __INTEL_COMPILER n.a. n.a. _MSC_VER
-0.200054	for aligning data #ifdef _MSC_VER
-0.074790	of unit-testing ...................................................................................... 156 16.3
-0.074790	is unreasonably large. 156 16.3
-0.200054	there is a 90% chance
-0.165190	other with a 50-50 chance
-0.358403	arrays. Strings can be manipulated
-0.235241	when the CPUID was manipulated
-0.516085	127. The calculation of c+b
-0.230826	useful if the subexpression c+b
-0.665646	that allows you to override
-0.382593	loose the ability to override
-0.447494	they do not use branches,
-0.311812	if this can eliminate branches,
-0.339104	are used in multiple applications,
-0.313839	most efficient for such applications,
-0.352932	own research, I have developed
-0.345318	not yet as well developed
-0.222401	this complicated template method. 7.29
-0.165190	............................................................................................. 56 7.28 Templates...............................................................................................................57 7.29
-0.314777	occurs during execution of CriticalFunction.
-0.442025	between the calls to CriticalFunction.
-0.330368	and smaller. This manual discusses
-0.233330	programming languages. This section discusses
-0.165190	5 #define FUNCNAME SelectAddMul_SSE41 #elif
-0.165190	2 #define FUNCNAME SelectAddMul_SSE2 #elif
-0.222403	& 3) <<6 ); 7.26
-0.218595	7.25 Bitfields ................................................................................................................... 56 7.26
-0.017525	provided in manual 4: "Instruction
-0.017525	listed in manual 4: "Instruction
-0.447844	offset of b is 400
-0.237827	public: int a[100]; // 400
-0.213895	- x x Loop invariant
-0.213895	in the compiler. Loop invariant
-0.023260	a*x*x*x + b*x*x + c*x
-0.583821	references in the code carefully
-0.237362	in multiple versions, each carefully
-0.237621	to the stack when CriticalInnerFunction
-0.236556	// Example 14.1c void CriticalInnerFunction
-0.165190	(-a)*(-b)=a*b ---xxx--- a/a=1 --------x a/1=a
-0.165190	a*1=a (-a)*(-b)=a*b a/a=1 ----x---x a/1=a
-1.187916	const & x) { __m128
-0.323586	bits each. The type __m128
-0.324022	bits with the & operator;
-0.378134	operation using the | operator;
-0.356444	recognizes it as a subexpression.
-0.339894	parenthesis around the constant subexpression.
-0.458773	This memory space is freed
-1.344849	then it may be freed
-0.222403	using the bitwise OR operator,
-0.165190	constructor, an overloaded assignment operator,
-0.251373	p->Hello(); p = &Object2; p->Hello();
-0.165190	p = &Object1; p->NotPolymorphic(); p->Hello();
-0.335663	BIOS setup. on Intel CPUs:
-1.493861	Intel, AMD and VIA CPUs:
-0.323066	is true, and all 0's
-0.631256	is AND'ed with all 0's
-0.357958	logic device is a chip
-0.584591	language in the same chip
-0.224956	bits with the ^ operator.
-0.165190	tests with the sizeof operator.
-0.294137	the installation process can proceed
-0.314175	unattended. Uninstallation should also proceed
-0.048279	// Lowest version int CriticalFunction_386(int
-0.237905	Covers PC's, workstations and scientific
-0.237908	have a niche in scientific
-0.467645	of the exponent is biased
-0.828456	is stored as a biased
-0.357958	FDIV bug is a minor
-0.293321	easily justify a possible minor
-0.357453	rendering graphics on the screen.
-0.237869	takes to refresh the screen.
-0.351636	model comes on the market.
-0.351636	processor appears on the market.
-0.035787	16 __declspec( align(16)) __attribute(( aligned(16)))
-0.035787	aligned(16))) __declspec( align(16)) __attribute(( aligned(16)))
-0.358139	These costs can be justified
-0.357037	extra time may be justified
-0.294237	to repeat or to exit
-0.218599	then calls exit. Calling exit
-0.123375	else { y = cos(x);
-0.381667	any function or variable having
-0.200054	in p1 and p2 having
-0.339584	of the time. A for-loop
-0.236143	in example 7.32b. A for-loop
-0.291475	bytes bool 1 1 char,
-0.251373	favorable: Small data types: char,
-0.200054	a+0=a a*0=0 a*1=a (-a)*(-b)=a*b a/a=1
-0.165190	a*1=a x-xxxxx-x (-a)*(-b)=a*b ---xxx--- a/a=1
-0.039355	can become a serious legal
-0.039355	has become a serious legal
-0.349786	used for any other resource,
-0.534600	Registers are a scarce resource,
-0.483875	Supports OpenMP and automatic parallelization.
-0.221774	all compilers. Use automatic parallelization.
-0.449971	an efficient way of keeping
-0.531132	Underestimating the cost of keeping
-0.200054	is less reliable. Event-based sampling:
-0.165190	every code line. Time-based sampling:
-0.357739	align arrays. // Example 12.5.
-0.234535	F64vec2 F32vec8 F64vec4 Table 12.5.
-0.237738	calculated in advance. This reduces
-0.234931	A compiler that automatically reduces
-0.991041	when applied to a non-member
-0.232344	keyword to all local non-member
-0.967943	the code can be vectorized,
-0.407429	loop can still be vectorized,
-0.008674	#define swapd(x,y) {temp=x; x=y; y=temp;}
-0.074790	to remove all disturbing influences
-0.074790	best-case conditions. All disturbing influences
-0.579391	explanation. The following example explains
-0.234535	the non-reduced expression better explains
-0.592391	itself in order to emulate
-0.336270	vector class library can emulate
-0.237813	may be three or four,
-0.960545	out the loop by four,
-0.229965	usability issues, and I believe
-0.229965	good as expected. I believe
-0.336285	value maximum value in stdint.h
-0.334844	the standard header file stdint.h
-0.195687	elimin., integer Common subexpression elimin.,
-0.195687	Pointer elimination Common subexpression elimin.,
-0.322822	one and only one instance.
-0.455274	function has only one instance.
-0.022988	and copy matrix void TransposeCopy(double
-0.237715	a slow CPU, an insufficient
-0.237541	not selected. Compiler has insufficient
-0.382899	how to overcome the dangers
-0.759972	are a number of dangers
-0.451109	that the objects are aligned.
-0.200054	may not be optimally aligned.
-0.558740	at, rather than the external
-0.294064	need to link with external
-0.386504	size) { cout << "Error:
-0.386504	int)size) { cout << "Error:
-0.230117	Suppl. SSE3 tmmintrin.h SSE4.1 smmintrin.h
-0.212327	smmintrin.h SSE4.2 nmmintrin.h (MS) smmintrin.h
-0.234536	aware of. Big runtime frameworks.
-0.234025	programming language and interface frameworks.
-0.074790	enum Weekdays { Sunday, Monday,
-0.074790	if the constants Sunday, Monday,
-0.513367	BSD and Mac OS X,
-0.608897	for 32-bit Mac OS X,
-0.237603	used without restrictions. A GNU
-0.165190	included in compiler price GNU
-0.237367	in example 13.1 page 127.
-0.226793	1 from -128 generates 127.
-0.401909	prototypes for each version FuncType
-0.233691	to the selected version FuncType
-0.350847	// Virtual call to C1::f
-0.313462	so it can call C1::f
-0.343853	the program less efficient. Splitting
-0.165190	disagree with this rule. Splitting
-0.318049	suited for vector operations. Algorithms
-0.165190	on vectors and matrixes. Algorithms
-0.200054	etc. -msse3 -mssse3 -msse4.1 -mAVX
-0.165190	-mssse3 /arch:SSSE2 -msse4.1 /arch:SSE4.1 -mAVX
-0.443747	do not have to worry
-0.343222	we do have to worry
-0.539110	done in a single instruction.
-0.322229	of this bit scan instruction.
-0.350050	return x10; } // x^2
-0.237246	x * x; // x^2
-1.116495	that it can be disabled
-0.521049	counters when they are disabled
-0.462532	calls directly to the CPU-specific
-0.237859	performance monitor counters are CPU-specific
-0.355664	suggested improvements). // Example 8.26b
-0.292038	example 8.26b: ; Example 8.26b
-0.515962	lower instruction set. The preprocessing
-0.343348	code. The library has preprocessing
-0.341332	multiple streams with different strides.
-0.303143	regular patterns with fixed strides.
-0.382861	interval from 0 to 15.
-0.353674	provided below, on page 15.
-1.033286	time it takes to develop
-0.351690	x; } }; // Full
-0.351690	powN<true,N/2>::p(x); } }; // Full
-0.523359	(double x) { // (N
-0.218595	of N: #define N1 (N
-0.237827	a[arraysize], b[arraysize], c[arraysize]; // Enable
-0.200054	example 12.1b to 12.1a. Enable
-0.348873	quite simple in most cases:
-0.561332	necessary in the following cases:
-0.059480	from AVX code to non-AVX
-0.237220	n.a. - a+b+c = a+(b+c)
-0.237220	- n.a. (a+b)+c = a+(b+c)
-0.048387	key or moving the mouse.
-0.048387	button or moving the mouse.
-0.222401	Volume 1 - 5. www.amd.com.
-0.165190	AMD Family 15h Processors". www.amd.com.
-0.237926	are adding -100 to -56
-0.645692	and give the result -56
-0.354569	function libraries is more difficult.
-0.236233	makes detailed optimization more difficult.
-0.236039	-mAVX /arch:AVX /QaxSSE3, etc. -msse3
-0.200054	(multithreaded) /arch:AVX /openmp /MT -msse3
-0.272297	code from example 8.26a (32-bit
-0.165190	that allows bigger segments (32-bit
-0.224958	last all the B values.
-0.272297	case" and "best case" values.
-0.251373	or double) /arch:SSE2 -msse2 /arch:SSE2
-0.165190	inte- ger or double) /arch:SSE2
-0.313774	// Define vector objects Vec8s
-0.224956	short int 128 Is16vec8 Vec8s
-0.237371	template <typename MyChild> class CParent
-0.229252	Writes "Hello 2" Here CParent
-0.466207	aliasing (see page 78). Adding
-0.165190	the value wrap around. Adding
-0.165190	and 3A and 3B. developer.intel.com.
-0.165190	Architectures Optimization Reference Manual". developer.intel.com.
-0.074790	-fomit- frame- pointer -fomit- frame-
-0.074790	stack frame /Oy -fomit- frame-
-0.221772	vector class library #include <stdio.h>
-0.221772	// Example 16.2 #include <stdio.h>
-0.218595	data structures ............................................................. 96 9.11
-0.251373	A. Hoisie, SIAM 2001. 9.11
-0.356635	without -fpic because the relocations
-0.437113	and it will generate relocations
-0.172258	will be infinity or NAN
-0.172258	zero or infinity or NAN
-0.218595	data sequentially .......................................................................................... 96 9.10
-0.165190	storage order is opposite). 9.10
-0.098476	1" // Writes "Hello 2"
-0.324276	n factorial } return sum;
-0.338480	0, s3 = 0, sum;
-0.228173	integral number of vectors. 12.10
-0.226793	3-dimensional vectors ....................................................... 120 12.10
-0.512214	threads. The overhead of semaphores,
-0.356122	between threads, such as semaphores,
-0.222401	cycle on most microprocessors. Multiplication
-0.489136	depending on the microprocessor. Multiplication
-0.237820	trick that N1 = N&(N-1)
-0.835688	power of 2 then N&(N-1)
-0.212168	Example 8.26a compiled to assembly:
-0.212168	Example 8.26b compiled to assembly:
-0.008674	New versions are produced regularly.
-0.294184	used and searching for vacant
-0.357598	this address is not vacant
-0.358547	unknown factors in the early
-0.236324	and just-in-time compilation. Some early
-0.293263	class (CGrandParent) contains any non-polymorphic
-0.165190	with templates // Place non-polymorphic
-0.023470	= 2.2, C = 3.3;
-0.293318	extremely costly to many users.
-0.236979	many hard working software users.
-0.314564	The functions must have extern
-0.237480	the common entry point extern
-0.558740	stack rather than the heap.
-0.346128	of managing a memory heap.
-0.294187	long double format. The formats
-0.236530	protocols and standardized file formats
-0.358139	if exceptions can be ruled
-0.456210	where they cannot be ruled
-0.294244	of memory addresses is reused
-0.358403	subexpression c+b can be reused
-0.230823	in the last vector. Organize
-0.165190	access is a bottleneck. Organize
-0.053651	class CChild1 : public CParent<CChild1>
-0.724012	replaced with: // Example 14.12b
-0.355093	loop unrolling in example 14.12b
-0.235884	vectorization favorable: Small data types:
-0.235884	less favorable: Larger data types:
-0.357829	division. Correction for the FDIV
-0.237882	the "FDIV bug". The FDIV
-0.354044	the value before the decimal
-0.460526	single constant with a decimal
-0.237209	sum = 1.f; float nfac
-0.327290	nfac; xn *= x; nfac
-0.288314	Open files and network connections.
-0.232342	Locked mutexes. Open database connections.
-0.539733	less than in a PC.
-0.237753	that previously required a PC.
-0.805561	schemes are based on hacks
-0.165190	calls rather than self-styled hacks
-0.236357	can improve search times 24
-0.218595	the optimal algorithm ....................................................................................... 24
-0.236769	because switch statements often suffer
-0.323587	the code can therefore suffer
-0.237905	use try, catch, and throw.
-0.339423	exceptions a function can throw.
-0.536067	interpreting the same bits differently.
-0.235890	user. Dynamic linking works differently.
-0.122678	Align by 16 __declspec( align(16))
-0.122678	align(16)) __attribute(( aligned(16))) __declspec( align(16))
-0.228671	elements Size of each element,
-0.228671	classes. Size of each element,
-0.372978	process. 3.5 Program loading Often,
-0.218599	a year or two. Often,
-0.265210	trigger the error condition. Replacing
-0.347530	declare the function inline. Replacing
-0.023260	(a+b)+(c+d) a*b+a*c=a*(b+c) a*x*x*x + b*x*x
-0.206294	"Optimizing subroutines in assembly language".
-0.294213	make pointers efficient, and that's
-0.314186	between different threads, but that's
-0.212327	Model-specific dispatching .................................................................................... 124 13.3
-0.466207	the time of programming. 13.3
-0.545918	aware that there are inherent
-0.515136	systems do not have inherent
-0.165190	160 /Qparallel -parallel -openmp -static
-0.165190	-fopenmp /Qopenmp -m32 -m64 -static
-0.210561	2 unused bytes S1 ArrayOfStructures[100];
-0.210561	at 19 }; S1 ArrayOfStructures[100];
-0.222403	CPU dispatch strategies........................................................................................ 122 13.2
-0.222401	in the source files. 13.2
-0.324749	pivot element. The integer comparison
-0.212327	can make an approximate comparison
-0.237953	the compiler takes the hint
-0.481308	keyword is only a hint
-0.218595	and maintenance .......................................................................................... 126 13.5
-0.212327	must be found elsewhere. 13.5
-0.237003	Difficult cases........................................................................................................ 124 2 13.4
-0.165190	make a reliable decision. 13.4
-0.235774	Gnu compiler ......................................................................... 128 13.7
-0.212327	are particularly critical. 129 13.7
-0.212327	"Register usage in kernel code"
-0.165190	code. The name "position-independent code"
-0.237739	is predicted well, of course.
-0.237739	are not safe, of course.
-0.221772	Taylor series, vectorized #include <dvec.h>
-0.221772	vector classes 114 #include <dvec.h>
-0.352976	what happens inside the loop,
-0.324396	n, including the while loop,
-0.532702	as an 8-bit signed number,
-0.283153	on the CPU family number,
-0.345226	example illustrates such a case:
-0.228173	ASCII string to lower case:
-1.178397	can be avoided by rolling
-0.237050	optimize example 8.26a by rolling
-0.345263	extra operations outside the loop:
-0.233595	b; // Critical innermost loop:
-0.237820	} x; x.f = 2.0f;
-0.463227	8.0f) * x + 2.0f;
-0.352822	less compact. See page 52.
-0.234981	in example 7.35 page 52.
-0.547117	directives are useful for supporting
-0.237511	Various development tools for supporting
-0.294176	code to check that thrown
-0.226791	to check for exceptions thrown
-0.347378	can do this by invoking
-0.236606	an application program without invoking
-0.023470	const int FactorialTable[13] = {1,
-1.569502	it is possible to construct
-0.525066	returns. Make the function construct
-0.526068	values before it is compiled.
-0.578589	before the program is compiled.
-0.230956	in matrix 96 void transpose(double
-0.230956	// Example 9.5b void transpose(double
-0.122678	see page 105. 8.7 Checking
-0.122678	directives .............................................................................................. 82 8.7 Checking
-0.152526	such as common subexpression elimination,
-0.152526	function inlining, common subexpression elimination,
-0.589833	StringLength; for (i = StringLength;
-0.345112	= string; int i, StringLength;
-0.235043	database integration, web application integration,
-0.232342	easy GUI development, database integration,
-0.493088	less than a few kilobytes
-0.212331	element. Matrix size Total kilobytes
-0.236412	and code size have got
-0.380749	CISC instruction sets have got
-0.237811	is to declare it locally
-0.378338	files and other resources locally
-0.237926	modifying only half of it,
-0.532936	the program logic allows it,
-0.294117	8 * 4 = 32.
-0.233815	in an integer, usually 32.
-0.356791	executable. SSE2 is the minimum
-0.236505	available. declaration size, bits minimum
-0.503623	easiest way to make thread-specific
-0.231414	or class for containing thread-specific
-0.237714	// Example 8.12b int a[2];
-0.345112	Example 8.12a int i, a[2];
-0.293708	a higher address which can't
-0.237000	cannot be shared. You can't
-0.235777	Function with vector parameters Vec4f
-0.165190	Vec8i Vec8ui Vec4q Vec4uq Vec4f
-0.299320	through a function call. (2)
-0.265210	overflow before it occurs, (2)
-0.321049	branch. See the preceding paragraph
-0.210560	stack unwinding The preceding paragraph
-0.293863	closed. The file will remain
-0.330080	elements at the diagonal remain
-0.148760	not uncommon for virus scanners
-0.148760	many users. Firewalls, virus scanners
-0.382236	sums } This loop calculates
-0.237461	the following example, which calculates
-0.231916	loading files or accessing databases,
-0.218595	interfaces to network resources, databases,
-0.165190	processing. Scott Meyers: "Effective C++".
-0.165190	2005; and "More Effective C++".
-0.322371	respectively (MS Visual Studio 2008
-0.165190	7 and Windows Server 2008
-0.367395	functions) /Gy -ffunction- sections /Gy
-0.200054	(remove unreferen- ced functions) /Gy
-0.276605	B; x.c = C; Assuming
-0.165190	other features it has. Assuming
-0.320370	small that a binary search,
-0.335660	or even a linear search,
-0.647957	dispatching in Intel compiler .........................................................................
-0.812764	dispatching in Gnu compiler .........................................................................
-0.798580	efficiency of different C++ constructs
-0.236533	of the advanced programming constructs
-0.048040	on different platforms, different screen
-0.048040	browsers, different platforms, different screen
-0.226793	possibility for further optimizations. Loops
-0.212327	the different microprocessors. 7.13 Loops
-0.236556	// Example 8.5a void Plus2
-0.327351	+ 2;} int a; Plus2
-0.824804	a+0 = a - a*0
-1.158651	= a - n.a. a*0
-0.809896	a*0 = 0 - a*1
-1.117487	= 0 - n.a. a*1
-0.221772	Agner vector classes #include "vectorclass.h"
-0.221772	automatic CPU dispatching #include "vectorclass.h"
-0.403914	= 1024; int a[size], b[size],
-0.473738	int i; float a[size], b[size],
-0.234712	// Polynomial coefficients double Table[100];
-0.234712	C = 3.3; double Table[100];
-0.184775	do. The following sections describe
-0.184775	cache. The subsequent sections describe
-0.339471	it still uses a GOT.
-0.237905	through the PLT and GOT.
-0.325337	code. Dynamic cast The dynamic_cast
-0.236890	class. This check makes dynamic_cast
-0.200063	of the program. 3 Finding
-0.200063	C++ language...................................................... 14 3 Finding
-0.008674	1, 2, 6, 24, 120,
-0.438956	program, and one for uninitialized
-0.551020	values if they are uninitialized
-0.165190	modules (See page 81). 77
-0.165190	optimization by compiler ....................................................................... 77
-0.915147	the length of a string.
-0.165190	have a false vendor string.
-1.712262	- - - x 74
-0.165190	Comparison of different compilers............................................................................. 74
-0.237592	call to C1::f } 73
-0.355585	is required. See page 73
-0.212327	{ CChild1 Object1; CChild2 Object2;
-0.200054	{ C1 Object1; C2 Object2;
-0.932169	is bigger than the destination
-0.237905	means that source and destination
-0.277719	it by a table lookup:
-0.395157	branch by a table lookup:
-0.237905	See page 73 and 72
-0.233335	+ b + a; 72
-0.236417	Day; if (Day & (Tuesday
-0.311187	of efficiency. The expression (Tuesday
-0.023429	{ int a:4; int b:2;
-0.044098	char string[100], *p = string;
-0.408094	manipulation is required for putting
-0.294079	reduced to 2 by putting
-0.237774	in example 14.14a with 14.14b
-1.154630	this to: // Example 14.14b
-0.930153	elements are accessed in non-
-0.331697	The mechanism relies on non-
-0.488023	convert example 15.1b to 15.1c.
-0.357739	and reorganize: // Example 15.1c.
-0.335738	points to (see page 73).
-0.335738	point precision (see page 73).
-0.634936	linking and position-independent code .......................................................
-0.531134	video or 3-dimensional vectors .......................................................
-0.200054	The overhead of semaphores, mutexes,
-0.200054	dynamically allocated memory, windows, mutexes,
-0.314763	opposite of register is volatile.
-0.328039	threads must be declared volatile.
-0.237634	position. Windows DLLs use relocation.
-0.428368	or addresses that need relocation.
-0.096622	(b&c) = (a&b) | (~a&c)
-0.096622	= 0, (a&b) | (~a&c)
-1.005475	then there is a 90%
-0.329228	hot spot that uses 90%
-0.237714	template <int m> int MultiplyBy
-0.237467	the template parameter. If MultiplyBy
-0.355446	sorting algorithms, are not suited
-0.312614	programming language is best suited
-0.019232	"IA-32 Intel Architecture Software Developer’s
-0.008674	a1, a2, b1, b2, y1,
-0.357739	the reciprocal: // Example 14.14a
-0.926063	the code in example 14.14a
-0.236631	be reinstalled and user settings
-0.200054	resolutions, different system color settings
-1.004758	version of the function. Compile
-0.226797	are the following: 130 Compile
-0.236287	the _mm_clflush intrinsic function. Provoke
-0.391786	of memory to disk. Provoke
-0.348260	a pointer in an import
-0.292831	DLL goes through an import
-0.447554	when a memory block turns
-0.228172	or if the prediction turns
-0.237905	0x2F00, 0x3700, 0x3F00 and 0x4700.
-0.331457	when we read from 0x4700.
-0.212327	x --- - ----- x----
-0.200054	--- - ----- x---- x----
-0.074790	for all applications. 2.8 Overcoming
-0.074790	interface framework........................................................................... 14 2.8 Overcoming
-0.200054	= 0 a+0=a a*0=0 a*1=a
-0.165190	a+0=a x-xxxxxx- a*0=0 --xxxx-xx a*1=a
-0.222401	less efficient than necessary. Take
-0.265210	uses the new features. Take
-0.638020	a lot of data shuffling,
-0.165190	data. Extra data conversion, shuffling,
-0.339058	into threads with different priorities
-0.235728	useful for assigning different priorities
-0.100718	"Error: Index out of range";
-0.234680	more complicated because various corrections
-0.218599	who have sent me corrections
-0.236842	languages, but also less safe.
-0.235996	make your program exception safe.
-0.589057	higher instruction set is supported.
-0.357598	double precision is not supported.
-0.330366	an expression or an anonymous
-0.330366	or class into an anonymous
-0.356846	a user-defined function is pure.
-0.461684	a function to be pure.
-0.330028	some more vector instructions SSE4.2
-0.200054	SSE3 tmmintrin.h SSE4.1 smmintrin.h SSE4.2
-0.237014	clock = __rdtsc(); return clock;
-0.344508	int DontSkip; long long clock;
-0.787805	the loop in example 12.4a
-0.234865	in situations like example 12.4a
-0.101690	transposition of different size matrices,
-0.101690	and copying different size matrices,
-0.234212	operator (^) may give inconsistent
-0.234211	a menu click becomes inconsistent
-0.165190	__m128i a = _mm_or_si128(c2, bc);
-0.165190	bit-mask: bc = _mm_andnot_si128(mask, bc);
-0.460080	81 optimization is to join
-0.538520	may be better to join
-0.851741	index is out of range.
-0.473618	// Index out of range.
-0.023429	a:4; int b:2; int c:2;
-0.283151	// Rounding is fast. Value
-0.265210	// Truncation is slow. Value
-0.237371	the grandparent class: class CGrandParent
-0.340044	class CParent : public CGrandParent
-0.008674	__m128i c2 = _mm_add_epi16(c, two);
-1.070213	x- x x x --
-0.222401	- - - xxxxxxxxx --
-0.237932	CPU brand check is bypassed
-0.462969	dispatching mechanism can be bypassed
-0.236086	the loading of several drivers,
-0.165190	14.13 System programming Device drivers,
-0.294244	and the other is -0
-0.294108	u.d is negative or -0
-0.414122	processors on a graphics accelerator
-0.219507	graphics coprocessor or graphics accelerator
-0.231414	by optimizing database access. 3.10
-0.222403	Other databases ....................................................................................................... 21 3.10
-0.222403	3.10 Graphics ................................................................................................................. 21 3.11
-0.265210	which one is best. 3.11
-0.353135	simple solution, but it increases
-0.406783	enough. A hash table increases
-0.165196	Network access ...................................................................................................... 21 3.13
-0.165196	is heavily loaded. 21 3.13
-0.222401	3.13 Memory access....................................................................................................... 22 3.14
-0.212327	87 about memory caching. 3.14
-0.224958	CPU with multiple cores. 3.15
-0.222401	3.14 Context switches..................................................................................................... 22 3.15
-0.320086	break a dependency chain. 3.16
-0.222401	Dependency chains ................................................................................................ 22 3.16
-0.237533	style. Some compilers make Sum1
-0.235776	for the three functions. Sum1
-0.008674	a+b+c+d = (a+b)+(c+d) a*b+a*c=a*(b+c) a*x*x*x
-0.068042	!= 0) *(p++) |= 0x20;
-0.068042	0; i--) *(p++) |= 0x20;
-0.454854	must be divisible by TILESIZE
-0.354232	of squares: const int TILESIZE
-0.237070	that can reduce any expression,
-0.234534	last in an && expression,
-0.023401	Finding the biggest time consumers
-0.358547	stamp counter in the CPU,
-0.400478	computer with a slow CPU,
-0.324528	* p; p = &Object1;
-0.237220	* p1; p1 = &Object1;
-0.536314	Optimization in embedded systems .............................................................................
-0.718648	what the compiler does .............................................................................
-0.035787	_mm_cvtss_f32(s); } // Approximate exp(x)
-0.035787	*= n+1; // Approximate exp(x)
-0.152902	at the time of programming.
-0.237733	timediff[i] = ReadTSC() - time1;
-0.344508	int i; long long time1;
-0.225378	generate interrupts at certain events,
-0.225378	up to count certain events,
-0.354185	computationally intensive program is achieved
-0.314681	the portability could be achieved
-0.221772	Vectorized with SSE2 #include <emmintrin.h>
-0.221772	or x64 141 #include <emmintrin.h>
-0.228173	best for all applications. 2.8
-0.224958	user interface framework........................................................................... 14 2.8
-0.341882	you cannot find the answers
-0.440911	where you can get answers
-0.509019	applications: The cost of starting
-0.212331	of programming language Before starting
-0.171470	register stack also has disadvantages:
-0.171470	Loop unrolling also has disadvantages:
-0.229252	of microprocessor ........................................................................................... 6 2.3
-0.508392	scope of this manual. 2.3
-0.984437	the loop control branch ahead
-0.525310	increment the loop counter ahead
-0.655711	the critical function is inserted
-0.331568	} Here, we have inserted
-0.510158	and the data cache. 2.2
-0.233332	hardware platform ....................................................................................... 5 2.2
-0.251373	float vectors) /arch:SSE -msse /arch:SSE
-0.165190	(128 bit float vectors) /arch:SSE
-0.531132	Choosing the optimal platform 2.1
-0.233332	optimal platform ........................................................................................... 5 2.1
-0.875811	a = b + 2.0
-0.236957	1.0 <= u.f < 2.0
-0.008674	24, 120, 720, 5040, 40320,
-0.236682	// Example 9.1a int Func(int);
-0.236682	// Example 9.1b int Func(int);
-0.237597	because the threads will invalidate
-0.165190	Alternatively, you may actively invalidate
-0.237522	are accessed sequentially. The opposite
-0.237522	be used most. The opposite
-0.237908	a risk factor in itself,
-0.290008	installation of the framework itself,
-0.228173	of function libraries........................................................................................ 12 2.7
-0.165190	Function names are undocumented. 2.7
-0.348696	if (y) { int a[1000];
-0.236682	union { 89 int a[1000];
-0.233817	of compiler .................................................................................................... 10 2.6
-0.233587	built with another compiler. 2.6
-0.237788	Numerically Intensive Codes", by S.
-0.165190	scientific vector processors. Henry S.
-0.333986	stored in a thread environment
-0.234810	Windows. The integrated development environment
-0.859017	F1(a); } else { F2(b);
-0.251373	else { float b[1000]; F2(b);
-0.355046	very efficient because it handles
-0.348007	and how the microprocessor handles
-0.374489	do whole program optimization. 2.4
-0.229252	of operating system......................................................................................... 6 2.4
-1.249057	It is important to note
-0.200054	the subsequent manuals. Please note
-0.534362	you may consider whether others
-0.218599	functions are optimized well, others
-0.236128	available to fit specific needs.
-0.299328	version satisfies the user's needs.
-0.235830	add ebx, eax / sar
-0.235498	$B1$2: mov shr add sar
-0.035787	same module __attribute__ ((visibility ("internal")))
-0.035787	((visibility ("internal"))) __attribute__ ((visibility ("internal")))
-0.593119	elements. Example: // Example 8.15a
-0.355093	and b in example 8.15a
-0.449882	so that the memory footprint
-0.235948	have a larger memory footprint
-0.237905	in example 14.12b and 14.13b
-0.724012	replaced with: // Example 14.13b
-0.237813	regardless of scope or namespaces.
-0.628685	execution speed to using namespaces.
-0.788464	method is useful for preventing
-0.165190	backup copying without effectively preventing
-0.537546	InstructionSet() #include "asmlib.h" // Lowest
-0.237246	CriticalFunction = &CriticalFunction_Dispatch; // Lowest
-0.165190	Tuesday, Wednesday, Thursday, Friday, Saturday
-0.165190	0x10, Friday = 0x20, Saturday
-0.017525	a*b*c=a*(b*c) a+b+c+d = (a+b)+(c+d) a*b+a*c=a*(b+c)
-0.017525	a+b+c=c+b+a a+b+c+d = (a+b)+(c+d) a*b+a*c=a*(b+c)
-0.008674	different platforms, different screen resolutions,
-0.224958	an int is 4. So
-0.165190	answer questions from everybody. So
-0.355664	different array. // Example 9.6a
-0.535049	Time per element Example 9.6a
-0.102611	a+(b+c) - a*b+a*c = a*(b+c)
-0.102611	- n.a. a*b+a*c = a*(b+c)
-0.232720	are not reproducible. Such events
-0.228173	be caused by random events
-0.237788	are not affected by __fastcall.
-0.797746	when you are using __fastcall.
-0.809896	a-a = 0 - a+0
-1.117487	= 0 - n.a. a+0
-0.226793	the expected real-time speed. Delays
-0.165190	affinity mask. Poor reproducibility. Delays
-0.200054	See ISO/IEC TR18015 Technical Report
-0.165190	ISO/IEC TR 18015, "Technical Report
-0.538692	i++) { aa[i] = (bb[i]
-0.235500	(cc[i] + 2) : (bb[i]
-1.166225	later instruction set is specified.
-0.357007	the appropriate instruction set specified.
-0.538739	applied to a function prototype
-0.346569	int parm2); // Function prototype
-0.353674	The example on page 39
-0.165190	j < columns; j++) 39
-0.355501	vectorized automatically. For example, let's
-0.165190	To explain the difference, let's
-0.023460	}; Weekdays Day; if (Day
-0.571812	functions may not be visible
-0.357598	the alignment is not visible
-0.236791	of 8 - 64 Kbytes
-0.234810	level-2 cache of 256 Kbytes
-0.357958	A cache is a proxy
-0.314703	in a computer. The proxy
-0.237592	* temp; 104 } Microprocessors
-0.567173	9.11 Explicit cache control Microprocessors
-1.408039	as explained on page 105.
-0.290886	vector operations, see page 105.
-0.236289	that has been accessed recently
-0.309589	always chooses the least recently
-0.331912	a large cost to creating
-0.237880	class is responsible for creating
-0.023470	i++) { j = order(i);
-0.293520	that the member pointer refers
-0.228175	in parallel. Coarse-grained parallelism refers
-0.356958	the file to a floppy
-0.356122	removable media such as floppy
-0.779914	is no risk of underflow.
-0.341833	around on overflow and underflow.
-0.218595	7.7 Function pointers ...................................................................................................... 37
-0.200054	should definitely be avoided. 37
-0.237908	implicit pointer known in 36
-0.200054	Pointers and references ............................................................................................ 36
-0.230273	4) | ((C & 3)
-0.230273	0x0F) | ((B & 3)
-0.313839	theoretical possibility that such contrived
-0.350400	is indeed a very contrived
-0.224958	the number of branches. Manual
-0.330854	is available from www.intel.com. Manual
-0.313496	uses more cache space. Excessive
-0.165190	compilers unroll too much. Excessive
-0.236556	// Example 7.12 void FuncA
-0.165190	times and calls alternately FuncA
-0.237905	such as email and web
-0.200054	GUI development, database integration, web
-0.314651	that no overflow can occur,
-0.236173	as the error doesn't occur,
-0.354348	are actually able to reorder
-0.352786	Scheduling A compiler may reorder
-0.237813	rather than seconds or microseconds
-0.313402	the critical functions take microseconds
-0.356791	of containers is the Standard
-0.347523	language relates to security. Standard
-0.659360	The effect of the const_cast
-0.325337	CPUs"). Const cast The const_cast
-0.230115	is hardly ever used, though.
-0.165190	is not always optimal, though.
-0.165199	long or int64_t MS compiler:
-0.165199	long or uint64_t MS compiler:
-0.035787	F2(float x[]); void F3(bool y)
-0.035787	Example 9.2b void F3(bool y)
-0.200054	/Qopenmp -m32 -m64 -static /MT
-0.165190	linking (multithreaded) /arch:AVX /openmp /MT
-0.350838	the local object is overwritten,
-0.461684	other variables to be overwritten,
-0.270599	ebx ecx, DWORD PTR [esp+8]
-0.270599	PTR [esp+4] DWORD PTR [esp+8]
-0.358403	load time can be annoyingly
-0.234212	because this would give annoyingly
-0.520723	16; int i; float list[size];
-0.232346	size = 100; S1 list[size];
-0.218595	Public License, optional commercial license
-0.200054	control no yes License license
-0.165190	/ (b1 * b2); y1
-0.165190	b1, b2, y1, y2; y1
-0.265210	* b2 * reciprocal_divisor; y2
-0.165190	= a1 / b1; y2
-0.094898	swap two elements: #define swapd(x,y)
-0.094898	two array elements: #define swapd(x,y)
-0.192864	int i; if ((unsigned int)i
-0.192864	Example 14.4b if ((unsigned int)i
-0.048279	// SSE2 version int CriticalFunction_SSE2(int
-0.233168	clock frequency is 2 GHz
-0.233168	μs on a 2 GHz
-0.595221	evaluated if a is false.
-0.237621	and all 0's when false.
-0.233817	drivers for Windows. 10 Multithreading
-0.212327	subtasks is necessary. 101 Multithreading
-0.269931	code for Intel CPUs. New
-0.269931	code for AMD CPUs. New
-0.101345	dispatcher function. typeof(CriticalFunction) * CriticalFunctionDispatch(void)
-0.101345	__asm__ ("CriticalFunction"); typeof(CriticalFunction) * CriticalFunctionDispatch(void)
-0.313206	code examples for these methods.
-0.340417	use inappropriate CPU dispatch methods.
-0.237704	when a genuine compiler became
-0.224958	compiler for Basic soon became
-0.331364	advantageous if, and only if,
-0.347322	without caching is advantageous if,
-0.218599	majority of end user's computers.
-0.251373	of yesterday's big mainframe computers.
-0.508392	int A, B, C; x.abc
-0.165190	needed: // Example 7.40c x.abc
-0.035787	unit-testing ...................................................................................... 156 16.3 Worst-case
-0.035787	unreasonably large. 156 16.3 Worst-case
-0.229250	between two simple expressions. Operations
-0.218595	pointer alignment and aliasing. Operations
-0.236734	This includes the libraries named
-0.236652	extended to 256-bit registers named
-0.200054	a-(-b)=a+b ---xxx-x- a+0=a x-xxxxxx- a*0=0
-0.200054	a-a = 0 a+0=a a*0=0
-0.237905	contained in p1 and p2
-0.165190	p1->Hello(); CChild2 * p2; p2
-0.294217	information is contained in p1
-0.165190	Object2; CChild1 * p1; p1
-0.346673	compilers exist for all major
-0.446065	and supported on all major
-0.325034	Some application programs use internet
-0.224956	163 Internet forums Several internet
-0.233779	int c;}; abc * p;
-0.233779	C2 Object2; CHello * p;
-0.463559	<emmintrin.h> static inline int lrintf
-0.450002	done with the functions lrintf
-0.577697	inlined so that the resulting
-0.827091	a); } } The resulting
-0.074790	call transpose function swapd(a[r][c], a[c][r]);
-0.074790	columns below diagonal swapd(a[r][c], a[c][r]);
-0.368971	instructions for high precision math.
-0.227932	instruction set. High precision math.
-0.294117	8192 / 4 = 2048
-0.327834	2040 38.7 512 512 2048
-0.381944	c = d + 3.5;
-1.222204	a = b * 3.5;
-0.237882	DLLs use relocation. The DLLs
-0.236212	the current position. Windows DLLs
-0.348326	or PathScale compiler for Unix
-0.237188	transferred in registers. 64-bit Unix
-0.342499	112 Vectorized table lookup Lookup
-0.284372	than the table lookup. Lookup
-0.506097	If the template parameters differ
-0.222403	function libraries and drivers differ
-0.023470	library int level = InstructionSet();
-0.324988	... ~C1(); }; void F1()
-0.230956	the function prototype: void F1()
-0.237738	also less safe. This safety
-0.222403	solution that doesn't compromise safety
-0.237926	_mm_add_epi16(a,b). Two libraries of predefined
-0.236171	Use intrinsic functions Use predefined
-0.864558	how to make a variable-size
-0.332426	delete is to allocate variable-size
-0.236417	* p = & obj1;
-0.301501	void g() { C1 obj1;
-0.008674	Optimization of Numerically Intensive Codes",
-0.438315	lookup. These instructions are summarized
-0.420600	optimizations. The results are summarized
-0.237932	of different targets is small.
-0.568272	out to be too small.
-0.588419	Exceptions and error handling ................................................................................
-0.466207	the biggest time consumers ................................................................................
-0.350324	send data from a buffer.
-0.474838	called the branch target buffer.
-0.065248	size = 100; float list[size],
-0.022270	file for InstructionSet() #include "asmlib.h"
-0.237926	replace all occurrences of ArraySize
-0.354232	Integer constant const int ArraySize
-0.237209	integer Register variables, float Live
-0.284372	benefit from register storage. Live
-0.165190	a = _mm_blendv_epi8(bc, c2, mask);
-0.165190	bit-mask: c2 = _mm_and_si128(c2, mask);
-0.237774	functions have names with suffixes
-0.236085	.R. for AVX. These suffixes
-0.357091	done manually by the programmer.
-0.445225	job of the application programmer.
-0.200054	--xxxxxx- a-(-b)=a+b ---xxx-x- a+0=a x-xxxxxx-
-0.165190	- - - x-xx----x x-xxxxxx-
-0.294241	not been given a name.
-0.825228	function with the same name.
-0.357792	extra layer of a third-party
-0.527609	uses. There are also third-party
-0.330854	b+a a*b = b*a (a+b)+c=a+(b+c)
-0.165190	reductions: a+b=b+a a*b=b*a a+b+c=a+(b+c) (a+b)+c=a+(b+c)
-0.702352	contains many functions for audio
-0.165190	Programs that produce streaming audio
-0.237753	references accept expressions as arguments
-0.825228	function with the same arguments
-0.345047	which would be an infinite
-0.279504	used only for avoiding infinite
-0.557891	later in the program flow.
-0.329323	possible cases of program flow.
-0.236584	be overwritten, and even worse,
-0.226791	Intel Pentium 4. Even worse,
-0.452171	time because the cache miss
-0.445375	if, a level-2 cache miss
-0.358060	be permissible if the unsafe
-0.237932	memset and memcpy is unsafe
-0.302544	some expression is optimized away.
-0.227671	reordered, inlined, or optimized away.
-0.498951	intended for calculating the movements
-0.222403	for calculating the physical movements
-0.172277	- n.a. x*x*x*x*x*x*x*x = ((x2)
-0.237473	= ((a*x+b)*x+c)*x+d x*x*x*x*x*x*x*x = ((x2)
-0.237926	or malloc. Handles to windows,
-0.367392	as dynamically allocated memory, windows,
-0.237926	an immediate response to pressing
-0.234932	for simple tasks like pressing
-0.283151	size as vector register. Factors
-0.226793	how advantageous vectorization is. Factors
-0.356122	by considerations such as price,
-0.338248	comes at a high price,
-0.276605	+ 1.; Eliminate jumps Jumps
-0.218595	code is not optimized. Jumps
-0.314420	of verifying, debugging and maintaining
-0.237645	fine-tuning, testing, verifying and maintaining
-0.212327	always evaluate both operands. Nevertheless,
-0.251373	than in a PC. Nevertheless,
-0.237905	File input/output Graphics and sound
-0.224958	Examples are image processing, sound
-0.237905	on network resources and servers
-0.237763	This is useful on servers
-0.102082	line or a make utility.
-0.102082	such as a make utility.
-0.589415	optimized version of the executable.
-0.357767	linked into the same executable.
-0.332397	access times cannot be controlled.
-0.332397	network resources cannot be controlled.
-0.323192	searching, or the specific literature
-0.303139	to consult the general literature
-0.136995	const int SIZE = 512;
-0.236287	has to take extra precautions
-0.232342	need to take special precautions
-0.374195	discovered that there are smarter
-0.374195	discover that there are smarter
-0.074790	Numerically Intensive Codes", SIAM 2001.
-0.074790	and A. Hoisie, SIAM 2001.
-0.330854	to (see page 73). Current
-0.165190	sum2 are called accumulators. Current
-0.023512	your optimization effort is concentrated
-1.026251	< size; i++) { aa[i]
-0.328703	< 256; i++) { aa[i]
-0.237753	a[i]; // Return a null
-0.294040	manner by returning a null
-0.770640	The Intel compiler is capable
-0.444260	temp2. Modern CPUs are capable
-0.237592	else { FuncB(i); } FuncC(i);
-0.251373	+= 2) { FuncA(i); FuncC(i);
-0.325364	compelling security reason for updating.
-0.165190	and necessary support. Hardware updating.
-0.237905	instructions MOVNTPS, MOVNTPD and MOVNTDQ
-0.824719	16 bytes without cache MOVNTDQ
-0.314703	not in memory. The renaming
-0.330509	are capable of register renaming
-0.231917	than 2 GB. When considering
-0.222409	DLL. Another alternative worth considering
-0.556566	applications. Therefore, it is worthwhile
-0.565272	vector. It may be worthwhile
-0.200054	xxxxxxxxx xxxxxxx-x xxxxxxxxx x-xxx---- a-(-b)=a+b
-0.165190	2 a+a+a+a=a*4 -(-a)=a --xxxxxx- a-(-b)=a+b
-0.331374	{ public: virtual void f();
-0.323030	Each object is allocated separately.
-0.226791	output should be measured separately.
-0.236558	data for regular access patterns
-0.218595	be arranged in regular patterns
-0.332976	container classes on page 93.
-1.245736	as explained on page 93.
-0.348385	dispatcher in only the lowest
-0.165190	= &SelectAddMul_SSE2; // Error: lowest
-0.233589	<float.h> #include <math.h> #define EXCEPTION_FLT_OVERFLOW
-0.232347	exceptions: __except (GetExceptionCode() == EXCEPTION_FLT_OVERFLOW
-0.237929	programmer has defined a constructor,
-0.311772	hidden pointer. The copy constructor,
-0.165190	// 32-bit Windows, Intel/MASM syntax:
-0.165190	// 32-bit Linux, Gnu/AT&T syntax:
-0.523272	a = b * 1.2;
-0.571465	are explained on page 26.
-0.352822	of storage. See page 26.
-0.408203	have to set the parentheses
-0.348231	d); Now the two parentheses
-0.312452	optimize away an overflow check.
-0.233045	a little more syntax check.
-0.531187	made a series of experiments
-0.523107	often necessary to do experiments
-0.535487	Out of order execution .................................................................................................
-0.554671	14.1 Use lookup tables .................................................................................................
-0.008674	in manual 4: "Instruction tables".
-0.235830	cmp eax, 100 / jl
-0.265210	add mov add cmp jl
-0.476847	add to the total computation
-0.251373	assume that the overall computation
-0.347017	Most compilers can make thread-local
-0.231921	or global variables. (See thread-local
-0.304223	currently not up to date.
-0.304223	CPU dispatchers up to date.
-0.648914	systems also have a physics
-0.307644	also have a dedicated physics
-0.345472	that a+b is calculated first,
-0.235422	all the R values first,
-0.229262	in a register (see below)
-0.229262	clock cycle counter (see below)
-0.336300	The if branch is eliminated.
-0.237859	and parameter transfer are eliminated.
-1.413031	Load eight consecutive elements c.load(cc+i);
-0.200054	+= 16) { b.load(bb+i); c.load(cc+i);
-0.165190	Boolean algebra reductions: !(!a)=a x-xxxxxxx
-0.165190	- x-xx----x x-xxxxxx- x-xxxx-x- x-xxxxxxx
-0.315856	depending on the CPU. Unrolling
-0.165190	and FuncB, then FuncC. Unrolling
-0.314296	if the processor has hyperthreading.
-0.443536	no advantage to using hyperthreading.
-0.200054	One kilobyte is 1024 bytes,
-0.200054	8 kb = 8192 bytes,
-0.036522	as static link libraries (*.lib,
-0.579444	RAM, a lot of irrelevant
-1.324973	is likely to be irrelevant
-0.352282	However, you must be careful
-0.234815	the code still needs careful
-0.235884	processing, signal processing, data compression
-0.235884	functions Encryption, decryption, data compression
-0.237641	high resolution if time intervals
-0.265210	consume time at unpredictable intervals
-0.054814	r1+TILESIZE; r2++) { for (c2
-0.237715	the user expects an immediate
-0.265210	insufficient. The user expects immediate
-0.301501	test () { C1 Object1;
-0.218599	test () { CChild1 Object1;
-0.212327	point operations (addition, multiplication, etc.)
-0.165190	BSD, Intel-based Mac OS, etc.)
-0.308340	and Linux, 32-bit and 64-bit.
-0.308340	OS X, 32-bit and 64-bit.
-0.648827	return 0; } The indirect
-0.165190	A feature called "Gnu indirect
-0.165190	a/1=a xxxxxxxxx 0/a=0 ---x---xx (-a==-b)=(a==b)
-0.165190	a/1=a x-xxx-x-- 0/a=0 ---xx--xx (-a==-b)=(a==b)
-0.478060	If the microprocessor has hyperthreading,
-0.443536	an advantage to using hyperthreading,
-0.356723	sign, eee is the exponent,
-0.237869	the sign bit, the exponent,
-0.503578	2 == 0) { FuncA(i);
-0.737481	i += 2) { FuncA(i);
-0.339497	cross-platform portability. Unfortunately, the cross-platform
-1.378660	for the sake of cross-platform
-0.017525	two elements: #define swapd(x,y) {temp=x;
-0.017525	array elements: #define swapd(x,y) {temp=x;
-0.235884	data. This is data decomposition.
-0.345296	functional decomposition and data decomposition.
-0.290730	parameter and a template parameter:
-0.378563	given as a template parameter:
-0.237461	with a profiler which determines
-0.652945	if the first operand determines
-0.056907	7.18 Class data members (properties)
-0.457670	and static const int ABC
-0.233589	constants. For example, #define ABC
-0.504666	4 Most of the comments
-0.350372	will make a few comments
-0.276609	.......................................................................................................... 38 7.10 Arrays .....................................................................................................................
-0.265210	options....................................................................................... 160 19 Literature .....................................................................................................................
-0.804656	deciding whether it is profitable
-0.461684	this appears to be profitable
-0.284376	better explains the logic behind
-0.212327	16 is actually hidden behind
-0.251373	platforms By Agner Fog. Technical
-0.165190	code. See ISO/IEC TR18015 Technical
-0.349081	the same instruction set. Neither
-0.165190	more than an hour. Neither
-0.352798	is optimal for each calculation.
-0.347304	to start the next calculation.
-0.165198	function is pure __attribute(( const))
-0.165198	pure __attribute(( const)) __attribute(( const))
-0.336303	manageable and easier to test,
-0.458400	of the program under test,
-0.074790	/Gy -ffunction- sections /Gy -ffunction-
-0.074790	unreferen- ced functions) /Gy -ffunction-
-0.074790	vectors) /arch:SSE -msse /arch:SSE -msse
-0.074790	bit float vectors) /arch:SSE -msse
-0.293463	i = 0; // Initialize
-0.237246	T // Constructor // Initialize
-0.237121	array sizes and array indices
-0.233050	are identified by consecutive indices
-0.325265	i += 4) { s0
-0.381861	4 float a[100]; float s0
-0.265210	-static Generate assembly listing /FA
-0.165190	/FA -S - masm=intel /FA
-0.331841	each new version for marketing
-0.228173	There is no heavy marketing
-0.341830	int CriticalFunctionType(int parm1, int parm2);
-0.165190	chosen version return (*CriticalFunction)(parm1, parm2);
-0.200054	a+b=b+a a*b=b*a a+b+c=a+(b+c) (a+b)+c=a+(b+c) --xx-----
-0.200054	x (x) x-xx--xx- x--x----- --xx-----
-0.136995	const int rows = 20,
-0.314788	course that reflects the conflicting
-0.345243	These requirements are often conflicting
-1.394615	= 0; i < 20;
-0.331933	track backwards though the 61
-0.200054	and error handling ................................................................................ 61
-0.237788	a function uses by looking
-0.200054	in details. The funny looking
-0.237774	works more efficiently with coarse-grained
-0.381526	important to distinguish between coarse-grained
-0.236903	then the mirror elements matrix[c][r]
-0.535323	is swapped with element matrix[c][r]
-0.200054	/arch:AVX /QaxSSE3, etc. -msse3 -mssse3
-0.165190	/openmp /MT -msse3 /arch:SSE3 -mssse3
-1.073594	may be useful to isolate
-0.237905	how to identify and isolate
-0.292879	understand compiler-generated assembly code. Let
-0.230114	into lines and sets. Let
-0.237641	of the task in question.
-0.538334	into the algorithm in question.
-0.575402	for small x // x^n
-0.235771	xx4; // next four x^n
-0.237872	CPU dispatch mechanism that treats
-0.343853	the Intel CPU dispatcher treats
-0.047405	systems. 14 Specific optimization topics
-0.047405	130 14 Specific optimization topics
-0.230114	through a self-relative address. (3)
-0.165190	guaranteed to wrap around, (3)
-0.459163	to use a lookup table:
-0.247064	by using a lookup table:
-0.382869	if the network is unstable
-0.237859	and that measurements are unstable
-0.313496	number of CPU cores. 60
-0.165190	Templates...............................................................................................................57 7.29 Threads .................................................................................................................. 60
-0.357124	use this number of iterations.
-0.165190	Taylor expansions and Newton-Raphson iterations.
-0.347523	float Live range analysis Join
-0.466207	the same memory area. Join
-0.538692	i++) { aa[i] = bb[i]
-0.237621	is all 1's when bb[i]
-0.355906	short time then the sampling
-0.200054	point exceptions, etc. Event-based sampling
-0.314630	the formula: (set) = (memory
-0.480042	for all the objects (memory
-0.294184	unit-testing is necessary for verifying
-0.165190	cost of fine-tuning, testing, verifying
-0.226793	Program loading ....................................................................................................... 19 3.6
-0.165190	.exe file, is acceptable. 3.6
-0.222403	Program installation .................................................................................................. 18 3.4
-0.212327	in a standardized manner. 3.4
-0.311121	avoids overflow: a[i] = log(b[i])
-0.311121	safe formula a[i] = log(b[i])
-0.212327	value -100+100+100 = 100. Now,
-0.165190	* (columns * sizeof(float)). Now,
-0.537060	= ((x2) 2) 2 a+a+a+a=a*4
-0.165190	((a*x+b)*x+c)*x+d x*x*x*x*x*x*x*x = ((x2)2)2 a+a+a+a=a*4
-0.336329	If you are in doubt
-0.355895	fastest execution is no doubt
-0.136203	pointer aliasing (see page 78).
-0.136203	out aliasing (see page 78).
-0.237905	know that u.f and v.f
-0.234381	{ // u.f > v.f
-0.251373	accessed in a FIFO manner?
-0.165190	accessed in a FILO manner?
-0.356630	flush-to-zero mode rather than generating
-0.236606	numbers in question without generating
-0.276605	have got low priority. Especially
-0.222401	aligned at round addresses. Especially
-0.606882	2.5 Choice of compiler ....................................................................................................
-0.372974	18 3.4 Automatic updates ....................................................................................................
-0.236531	clock cycle? ...................................................................................... 16 3.2
-0.165190	that can be improved. 3.2
-0.382501	}; if (y) { F1(a);
-0.251373	(y) { int a[1000]; F1(a);
-0.236287	isolating a single function. Switch
-0.276609	more than two ways. Switch
-0.023470	(0,0,0,0,0,0,0,0) __m128i zero = _mm_set1_epi16(0);
-0.447607	often use position-independent code everywhere
-0.317449	dispatch branches are scattered everywhere
-0.237220	(a&&b) || (a&&c) = a&&(b||c)
-0.381876	(a&&c) || (a&&b&&c) = a&&(b||c)
-0.237820	(~a&c) | (b&c) = (a&b)
-0.338480	& 0 = 0, (a&b)
-0.236531	hot spots .................................................................................. 16 3.3
-0.165190	in the following sections. 3.3
-0.236531	time consumers ................................................................................ 16 3.1
-0.466207	the biggest time consumers 3.1
-0.437779	a branch that goes randomly
-0.411504	if data are scattered randomly
-0.218595	of arrays and structures. Useful
-0.165190	maximum possible memory requirement. Useful
-0.230826	3.7 File access................................................................................................................ 20 3.8
-0.165190	disk operations to finish. 3.8
-0.203375	System database ...................................................................................................... 20 3.9
-0.203375	files (*.ini files). 20 3.9
-0.533558	8.1 How compilers optimize ............................................................................................
-0.640404	7.6 Pointers and references ............................................................................................
-0.425006	investing in a big mainframe
-0.229061	that of yesterday's big mainframe
-0.421238	function to test // (time
-0.237733	// (time after) - (time
-0.231913	relevant to software optimization. Everything
-0.228173	use the same register. Everything
-0.294053	alignment by 16 is required.
-0.237764	unless the strictness is required.
-0.707671	cannot rule out the theoretical
-0.237882	most efficient alternative. The theoretical
-0.237926	reduce example 12.1b to 12.1a.
-0.593119	instructions. Example: // Example 12.1a.
-0.463153	the data in the file,
-0.165190	directory as the .exe file,
-0.229258	consumer to many hard working
-0.165190	optimized by using indexes, working
-0.237882	not to vectorize. The pragmas
-0.237753	insert optimization hints as pragmas
-0.237926	software engineering principles to use.
-0.696940	it is not in use.
-0.350843	is slow, difficult to use,
-0.236806	certain rules about register use,
-0.236842	that make vectorization less favorable:
-0.528544	Factors that make vectorization favorable:
-0.236529	screen resolutions, different system color
-0.218599	reciprocal square root, RGB color
-0.463723	The critical stride is 8192
-0.237820	is 8 kb = 8192
-0.023470	bit-mask: __m128i mask = _mm_cmpgt_epi16(b,
-0.314763	with older microprocessors is lost.
-0.237859	and user settings are lost.
-0.074790	Threads .................................................................................................................. 60 7.30 Exceptions
-0.074790	techniques of multithreading. 7.30 Exceptions
-0.504666	are out of the question
-0.408137	hold the numbers in question
-0.791677	at a time and afterwards
-0.524374	0x1C. If the program afterwards
-0.976462	before the function returns. Every
-0.165190	rows, not the columns. Every
-0.314788	in addition, set the denormals-are-zero
-0.237905	7.6. Set flush-to-zero and denormals-are-zero
-0.784844	static to the function declaration.
-0.345875	appear in the class declaration.
-0.341445	overflow but no other exceptions:
-0.234385	is to resume after exceptions:
-0.414178	bulky and difficult to read.
-0.319600	the code difficult to read.
-0.233330	object oriented programming are: Non-static
-0.330854	and only one instance. Non-static
-0.657596	the form of a re-
-0.237813	than allocating piecewise or re-
-0.341439	with another dynamic library requiring
-0.234209	on a complex framework requiring
-0.222403	Example 7.13 struct abc {int
-0.200054	= 1024; struct Sab {int
-0.901498	the disadvantage that the branching
-0.638866	the critical function. The branching
-0.606704	the function pointer has changed.
-0.610817	a variable is never changed.
-0.353170	compile-time whether the object belongs
-0.218599	everything else. This normally belongs
-0.227302	(a+b)+c=a+(b+c) --xx----- (a&&b) || (a&&c)
-0.227302	= a&&b (a&&b) || (a&&c)
-0.354232	example 16.1 const int NumberOfTests
-0.165190	each test // Repeat NumberOfTests
-0.294244	choice of platform is obviously
-0.355508	is large then it obviously
-0.268539	written table may go undetected.
-0.215273	that would otherwise go undetected.
-0.296289	a thread that runs alone
-0.165190	used as a stand alone
-0.357013	place indicated by the caller
-0.355358	at runtime from the caller
-0.327693	lead to a better understanding
-0.165190	language and a basic understanding
-0.294137	the development process can influence
-0.345002	C++ program. This has influence
-0.535243	? b : c x-xx-----
-0.224956	xx(-)x- - x-xxxx--x x-xxxx--x x-xx-----
-1.136651	the function is called. Lazy
-0.212327	and sometimes unacceptably long. Lazy
-0.237827	For unused returns // Volatile
-0.333897	(or higher) is enabled. Volatile
-0.500123	You may need to lock
-0.165190	thread than to temporarily lock
-0.294244	for educational purposes is allowed.
-0.357598	and mirroring is not allowed.
-0.459232	makes caching more efficient today
-0.236629	what is brand new today
-0.182693	double d; d = (double)(signed
-0.358080	executed faster in a programmable
-0.237603	Programmable logic devices A programmable
-0.293363	that do have such checks.
-0.233045	way or bypassing syntax checks.
-0.442835	calculation and table lookup mechanisms
-0.165190	the old version. Updating mechanisms
-0.338280	are summarized in table 8.1.
-0.234535	n.a. n.a. - Table 8.1.
-0.353661	first, then all the G
-0.337134	one vector, the four G
-0.604666	Windows Intel compiler Linux Align
-0.224958	etc. for Linux) 4. Align
-0.237269	object for (b + c)
-0.329298	(a > b / c)
-0.325261	Generic version CriticalFunction = &CriticalFunction_386;
-0.324570	// Default version return &CriticalFunction_386;
-0.407090	that the first two (three
-0.562775	than on the stack (three
-0.499285	0; } else { goto
-0.893359	1; } else { goto
-0.522167	when you are not testing.
-0.293832	main feedback comes from testing.
-0.074790	don't need the "override" feature.
-0.074790	to implement this "override" feature.
-0.074790	and GOT. The symbol interposition
-0.074790	object. This so-called symbol interposition
-0.538796	an initialization routine that loads
-0.407600	library. The application program loads
-0.008674	static link libraries (*.lib, *.a)
-0.235566	to keep their CPU dispatchers
-0.235566	processors properly. Many CPU dispatchers
-0.228173	takes up one register. Registers
-0.212327	the pointer or reference. Registers
-0.236039	floating point exceptions, etc. Event-based
-0.165190	but is less reliable. Event-based
-0.425213	solve all the problems associated
-0.288686	other common programming errors associated
-0.545045	loading can be a time-consumer
-0.313496	access is the biggest time-consumer
-0.536705	in the CPU detection mechanism.
-0.313490	by the branch prediction mechanism.
-0.100819	using namespaces. 65 8 Optimizations
-0.100819	7.33 Namespaces........................................................................................................... 65 8 Optimizations
-0.294213	number of devices and machines
-0.281585	machine. The best Java machines
-0.329995	for dealing with this problem:
-0.236392	possible remedies against this problem:
-0.235424	to default constructors, copy constructors,
-0.229252	This applies to default constructors,
-0.230112	simpler when using references. References
-0.218595	to a wrong type. References
-0.048312	if instruction sets are mutually
-0.048312	when instruction sets are mutually
-0.382619	This is coded as _mm_empty()
-0.332419	you have to execute _mm_empty()
-0.569738	initialization. The compiler may report
-0.236733	file /Fm Generate optimization report
-0.237449	need to remove all disturbing
-0.235418	the best-case conditions. All disturbing
-0.237908	possible minor increase in develop-
-0.340646	advanced principles of software develop-
-0.495679	it will not be negative.
-0.314164	a will never be negative.
-0.229254	access, sort and search facilities,
-0.281587	the IDE, for debugging facilities,
-0.473699	operators will cause the creation
-0.237882	(b + c) The creation
-0.345002	(4) get a compiler warning
-0.237384	you will get no warning
-0.354232	Example 14.5a const int min
-0.375604	... if (i >= min
-0.224956	point to are constant. 14.2
-0.212327	lookup tables ................................................................................................. 132 14.2
-0.303139	other bits to zero. 14.3
-0.218595	Bounds checking .................................................................................................. 134 14.3
-0.212327	optimization topics ......................................................................................... 132 14.1
-0.466207	14 Specific optimization topics 14.1
-0.345257	possible version. See the vectorclass
-0.237733	features, see http://www.agner.org/optimize/ - vectorclass
-0.265210	* b1 * reciprocal_divisor; 14.7
-0.200054	point division ........................................................................................... 139 14.7
-0.237905	programming languages, profiling and debugging.
-0.407937	options are incompatible with debugging.
-0.230956	// Example 8.26a void Func(int
-0.230956	// Example 8.26b void Func(int
-0.237932	multiply j by is (columns
-0.293327	(int)&matrix[0][0] + j * (columns
-0.212327	Integer multiplication ............................................................................................. 136 14.5
-0.165190	explained on page 96. 14.5
-0.481302	which the array is defined.
-0.658815	upper limit can be defined.
-0.200054	/QaxSSE3, etc. -msse3 -mssse3 -msse4.1
-0.165190	-msse3 /arch:SSE3 -mssse3 /arch:SSSE2 -msse4.1
-0.237827	x^4 // x^8 // x^10
-0.313668	// x^10 // return x^10
-0.340769	if there are many branches):
-0.236449	Day == Friday) { DoThisThreeTimesAWeek();
-0.236449	Wednesday | Friday)) { DoThisThreeTimesAWeek();
-0.074790	double) /arch:SSE2 -msse2 /arch:SSE2 -msse2
-0.074790	ger or double) /arch:SSE2 -msse2
-0.387233	mathematical functions such as logarithms,
-0.313711	use position-independent code by default.
-0.237050	position-independent code everywhere by default.
-0.325364	The development time for WTL
-0.237603	Template Library (WTL). A WTL
-0.044100	_controlfp_s(&dummy, 0, _EM_OVERFLOW); // _controlfp(0,
-0.741566	take more time to load.
-0.337930	depending on the work load.
-0.534570	* c; a = select(b
-0.347918	b.load(bb+i); c.load(cc+i); a = select(b
-0.401764	in the high level framework.
-0.279510	runtime of the .NET framework.
-0.306476	relies on exception handling. 8.6
-0.212327	optimization options ................................................................................... 81 8.6
-0.102766	methods for communication and synchronization
-0.102766	parallelism because communication and synchronization
-0.595293	time Func is executed. Without
-0.200054	to C1::f } 73 Without
-0.348287	or QueryPerformanceCounter functions for millisecond
-0.237774	time is measured with millisecond
-0.293855	instead of double, then sizeof(S1)
-0.226791	2.0; } The factor sizeof(S1)
-0.503832	code or in a high-priority
-0.294213	routines, system core and high-priority
-0.293160	class separately in software development.
-0.284378	platform independence, and easy development.
-0.719546	it doesn't have to push
-0.200054	parameter 2: 12 $B1$1: push
-0.048366	book "Performance Optimization of Numerically
-0.048366	Hoisie: "Performance Optimization of Numerically
-0.237926	many different CPUs to verify
-0.237905	to test, maintain and verify
-0.222997	format is standardized allows us
-0.222997	exponent is biased allows us
-0.237905	such as sorting and searching,
-0.165190	tasks such as sorting, searching,
-0.458494	target pointed to is known.
-0.544709	of the object is known.
-0.200054	14.12 Position-independent code.................................................................................. 148 14.13
-0.165190	in Mac OS X. 14.13
-0.218599	versus dynamic libraries............................................................................ 146 14.12
-0.165190	of position-independent code. 147 14.12
-0.509158	to the same memory area.
-0.509158	share the same memory area.
-0.357739	Gnu compilers. // Example 14.19
-0.355093	is given in example 14.19
-0.356630	towards zero, rather than rounding.
-0.312841	141 for details about rounding.
-0.237269	matrix[row][column] = row + column;
-0.165190	int matrix[NUMROWS][NUMCOLUMNS]; int row, column;
-0.790980	can improve the performance dramatically
-0.200054	improve search times 24 dramatically
-0.234541	public and static data. 148
-0.165190	146 14.12 Position-independent code.................................................................................. 148
-0.165190	chains with long latencies. 8.5
-0.165190	to optimization by CPU.............................................................................81 8.5
-0.237246	C; } polynomial // Polynomial
-0.237246	C = 3.3; // Polynomial
-0.292710	a register, not even temporarily.
-0.338194	in memory, at least temporarily.
-0.452813	because of a very obscure
-0.200054	is possible to construct obscure
-0.357739	static keyword: // Example 14.1c
-0.355093	The FactorialTable in example 14.1c
-0.462661	63; // fractional part 142
-0.165190	floating point variables ......................... 142
-0.446185	Use the option for "assume
-0.427752	Adding the compiler option "assume
-0.790757	whether the arrays are properly
-0.265210	the object is deleted properly
-0.292880	considered a software optimization issue.
-0.466207	become a serious legal issue.
-0.458773	when the computer is restarted
-0.237905	is shut down and restarted
-0.237714	1; } module2.cpp int Func2()
-0.323716	2; } } void Func2()
-0.200054	- x-xxxx--x x-xxxx--x x-xx----- x--x-----
-0.165190	(x) x (x) x-xx--xx- x--x-----
-0.237708	and computing power than PCs.
-0.235499	computing resources than standard PCs.
-0.231913	to do this optimization. 8.2
-0.212327	compilers optimize ............................................................................................ 66 8.2
-0.231914	do something about it. Possible
-0.165190	well on non-Intel machines? Possible
-0.237905	C++ code. Compilers and IDE's
-0.234934	options turned on. Most IDE's
-0.621266	Gnu and PathScale compilers. 8.3
-0.200054	of different compilers............................................................................. 74 8.3
-0.580875	advantages that can be obtained.
-0.407429	problem cannot easily be obtained.
-0.594559	1. Optimizing software in C++:
-0.237641	be considered metaprogramming in C++:
-0.123375	(b) { y = sin(x);
-0.325424	user can see the delay.
-0.347615	which causes a long delay.
-0.428896	x; float sum = 1.f;
-0.237220	1.f; float nfac = 1.f;
-0.212327	do this optimization explicitly. Divisions
-0.251373	same unit as additions. Divisions
-0.236733	longjmp in time-critical code. 7.32
-0.229252	stack unwinding .............................................................................. 65 7.32
-0.345112	= 0; int i, largest_index
-0.165190	{ largest_abs = absvalue; largest_index
-0.330347	which are 64 bits wide,
-0.318772	int is 16 bits wide,
-1.357693	< 100; i++) { list[i].a
-0.324450	induction variable for accessing list[i].a
-0.236173	on a particular processor model.
-0.312611	to recommend any specific model.
-0.229252	Preprocessing directives ......................................................................................... 65 7.33
-0.200054	49 for a discussion. 7.33
-0.065747	Using integer operations for manipulating
-0.074790	with multiple cores. 3.15 Dependency
-0.074790	Context switches..................................................................................................... 22 3.15 Dependency
-0.514067	are as fast as additions.
-0.236881	the same unit as additions.
-0.824719	16 bytes without cache MOVNTPD
-0.165190	The 16-byte instructions MOVNTPS, MOVNTPD
-0.440408	represented as an integer. 158
-0.200054	in embedded systems ............................................................................. 158
-0.294117	B, C; x.a = A;
-0.381944	A2 = A + A;
-0.051025	much is a clock cycle?
-0.355435	is far from the server.
-0.236651	than a dedicated test server.
-0.212327	pitfalls of unit-testing ...................................................................................... 156
-0.165190	footprint is unreasonably large. 156
-0.237905	each carefully optimized and fine-tuned
-0.356249	making branches that are fine-tuned
-0.200054	16.3 Worst-case testing ................................................................................................ 157
-0.251373	more random than normal. 157
-0.347254	or bitmap than to draw
-0.237734	is necessary here to draw
-0.231420	differently on different test examples.
-0.286839	reductions in my test examples.
-0.317447	objects inside the derived class:
-0.165190	functions in the grandparent class:
-0.558068	cache. The advantage of sharing
-0.331815	if multiple threads are sharing
-0.581778	code you want to 155
-0.165190	performance monitor counters .................................................................... 155
-0.200054	7.29 Threads .................................................................................................................. 60 7.30
-0.165190	the techniques of multithreading. 7.30
-0.338902	be prevented in other ways,
-0.236855	= 8192 bytes, 4 ways,
-0.462634	target address can be predicted.
-0.356347	loop branch should be predicted.
-0.102886	same as for (i=0; i<n;
-0.102886	For example, for (i=0; i<n;
-0.578829	but the program is dividing
-0.236874	to unsigned int before dividing
-0.352408	virtual functions, and other complications
-0.346536	same generation can cause complications
-0.330601	with code compiled without AVX,
-0.230822	instruction set available, e.g. AVX,
-0.212327	caller, and so on. 7.31
-0.200054	error handling ................................................................................ 61 7.31
-0.441631	case of overflow and redo
-0.237645	(e.g. with _finite()) and redo
-0.237880	automatically detect opportunities for parallelization
-0.498659	107), OpenMP and automatic parallelization
-0.304425	cycles per array element. Matrix
-0.327424	results were as follows: Matrix
-0.575706	7.23 Constructors and destructors ..................................................................................
-0.590321	to find hot spots ..................................................................................
-0.646950	2, b * c); a.store(aa+i);
-1.008669	consecutive elements in aa: a.store(aa+i);
-0.047150	u; if (u.i & 0x7FFFFFFF)
-0.047150	143 if (u.i & 0x7FFFFFFF)
-0.294137	if multiple threads can add,
-0.165190	double vectors SSE3 horizontal add,
-0.336300	the wrong branch is fed
-0.556878	register which can be fed
-0.659360	the end of the array,
-0.349497	largest element in an array,
-0.293764	SSE2, preferably 32 for AVX.
-0.237511	as e.g. .R. for AVX.
-0.216504	If Microsoft compiler #define Alignd(X)
-0.216504	Gnu compiler, etc. #define Alignd(X)
-0.037092	c __m128i c2 = _mm_add_epi16(c,
-0.733328	of the program that waits
-0.237449	program is loaded, but waits
-0.356630	with sets rather than loops,
-0.236249	if and compile-time while loops,
-0.294184	of the bits for Tuesday,
-0.251373	Weekdays { Sunday, Monday, Tuesday,
-0.237908	often the case in loops.
-0.518096	for the critical innermost loops.
-0.237827	log(b[i]) + log(c[i]); // Increment
-0.200054	136 and 137, respectively. Increment
-0.575007	therefore likely to be cached
-0.640382	code and data are cached
-0.022924	common subexpression elimination, constant propagation,
-0.293813	of program or data exceeds
-0.234810	to user input never exceeds
-0.575579	can be found in Wikipedia
-0.310527	of Intel C++ compilers. Wikipedia
-0.526963	156 16.3 Worst-case testing ................................................................................................
-0.370796	22 3.15 Dependency chains ................................................................................................
-0.330854	AND'ed with the inverted mask.
-0.165190	setting a thread affinity mask.
-0.314363	or not. I will conclude
-0.323587	'this'. We can therefore conclude
-0.358139	data sections can be shared.
-0.456210	that it cannot be shared.
-0.237932	then the DLL is relocated
-0.237859	relocation. The DLLs are relocated
-0.020183	the execution of everything else.
-0.817006	objects accessed in a FIFO
-0.552641	resources. For example, a FIFO
-0.080266	the "Intel Math Kernel Library"
-0.080266	The "Intel Math Kernel Library"
-0.325334	commonly used methods for dealing
-0.354467	instruction that you are dealing
-0.738101	CGrandParent { public: void Hello()
-0.230956	virtual void Disp(); void Hello()
-0.200054	no graphics processing unit. Various
-0.165190	hard disk or network. Various
-0.324489	heavy marketing of 64-bit software,
-0.328283	high complexity of modern software,
-0.524372	arraysize; i++) { // Overflow
-0.212327	(see page 142). 30 Overflow
-0.237337	can be calculated using multiplications
-0.380720	how to speed up multiplications
-0.293239	but there are some differences
-0.232714	v. 2.1.7, 2004. No differences
-0.023470	= 1.1, B = 2.2,
-0.462161	processors on the same machine.
-0.292148	the so-called Java virtual machine.
-0.233043	Do not use STL containers.
-0.330854	to creating and deleting containers.
-0.345560	it to a branch tree.
-0.320370	list or a binary tree.
-0.165190	a funda- mentally flawed approach
-0.165190	systematic and well thought-through approach
-0.312460	size can be allocated dynamically.
-0.312460	stack can be allocated dynamically.
-0.330854	function. typeof(CriticalFunction) * CriticalFunctionDispatch(void) __asm__
-0.165190	"C" int CriticalFunction (); __asm__
-0.153122	systems. Some compilers have difficulties
-0.102766	cost to creating and deleting
-0.102766	responsible for creating and deleting
-0.354637	page 49 for a discussion.
-0.334437	page 140 for further discussion.
-0.265210	...................................................................... 32 7.4 Enums ......................................................................................................................
-0.265210	..................................................................................................... 93 9.8 Strings ......................................................................................................................
-0.236455	smaller sizes (char, short int)
-0.229249	have used char (or int)
-0.344710	F3(bool y) { if (y)
-0.293280	float b[1000]; }; if (y)
-0.344867	a function for this purpose,
-0.511132	enough for a specific purpose,
-0.200054	such as most sorting algorithms,
-0.165190	such as many encryption algorithms,
-0.421227	SSE2 supported CriticalFunction = &CriticalFunction_SSE2;
-0.293199	// SSE2 supported return &CriticalFunction_SSE2;
-0.330904	void NotPolymorphic(); virtual void Disp();
-0.466207	cout << "Hello "; Disp();
-0.640454	the response time is consistent
-1.179300	can be improved by consistent
-0.237820	and then 0+1.23456 = 1.23456.
-0.356630	get 0 rather than 1.23456.
-0.008674	int i; } u, v;
-0.008674	__m128i bc = _mm_mullo_epi16 (b,
-0.335347	compiler output can often reveal
-0.230829	languages and their implementations reveal
-0.714961	class or structure is created.
-0.999451	in which they are created.
-0.356916	special reasons to use denormal
-0.200054	mode rather than generating denormal
-0.460891	for updates should be optional
-0.165190	GNU General Public License, optional
-0.042050	0.11 memcpy 16kB unaligned op.
-0.042050	0.22 memcpy 16kB unaligned op.
-0.236173	less each time. An experiment
-0.233331	The results of my experiment
-0.236250	edx, eax $B2$2 ; Induction++;
-0.399362	Induction; a[i+1] = Induction; Induction++;
-0.339028	precision. Conversions between different precisions
-0.357559	mixing different floating point precisions
-0.074790	dispatching .................................................................................... 124 13.3 Difficult
-0.074790	time of programming. 13.3 Difficult
-0.324273	are based on an interpreter
-0.236691	first PC's had an interpreter
-0.305577	the procedure linkage table (PLT).
-0.432042	a procedure linkage table (PLT).
-0.292821	int UnusedFiller; }; int order(int
-0.236682	int i, j; int order(int
-0.237926	hardware circuits consisting of digital
-0.224956	of operations. A complex digital
-0.114508	int i; for(i=0; i<300; i++){
-0.200054	Opteron K8 0.38 0.44 0.40
-0.165190	Core 2 0.77 0.89 0.40
-0.068042	8.8a double x, y, z;
-0.068042	a; float x, y, z;
-0.212327	Floating point division ........................................................................................... 139
-0.200054	> b / c) 139
-0.165190	0.18 0.11 1.21 0.57 0.44
-0.165190	AMD Opteron K8 0.38 0.44
-0.008674	version return (*SelectAddMul_pointer)(aa, bb, cc);
-0.310101	identification 16 bit platform __GNUC__
-0.200054	// Example 8.22 #ifdef __GNUC__
-0.314692	then the line that covered
-0.339373	for different processors are covered
-0.262638	strings is the old fashioned
-0.391914	strings in the old fashioned
-0.314574	allocate variable-size arrays with alloca.
-0.407608	for restrictions on using alloca.
-0.404461	speed of RAM memory. Efficient
-0.489136	between rounding and truncation. Efficient
-0.407630	will use different memory spaces
-0.312925	of time cleaning up spaces
-0.322430	= Induction; ; parameter $B1$1:
-0.228173	; parameter 2: 12 $B1$1:
-0.008674	120, 720, 5040, 40320, 362880,
-0.235830	= (memory address) / (line
-0.165190	= (number of sets) (line
-0.237813	for each row or column.
-0.354016	element 0 in this column.
-0.339975	become bigger and more complex,
-0.337635	the source code more complex,
-0.532150	on page 146 below. 3.7
-0.230826	position-independent code ....................................................... 20 3.7
-0.577882	Mars This is a cheap
-0.276605	penalty. Branches are relatively cheap
-0.035512	for reasons of mathematical purity.
-0.200054	x^4 F32vec4 s(0.f, 0.f, 0.f,
-0.165190	// x^4 F32vec4 s(0.f, 0.f,
-0.355664	a double: // Example 14.23b
-0.235994	with big-endian storage. Example 14.23b
-0.229262	and induction variables (see below).
-0.229262	divisible by 16 (see below).
-0.325395	two pointers requires a division,
-0.291979	more time. Single precision division,
-0.237926	save one unit of received
-0.222406	is servicing. A command received
-0.218595	lead to a dramatic degradation
-0.200054	may be a slight degradation
-0.325424	that don't need the "override"
-0.237654	order to implement this "override"
-0.357739	in two: // Example 11.2b
-0.355093	in list in example 11.2b
-0.583074	Here, the address of matrix[j][0]
-0.466207	{ j = order(i); matrix[j][0]
-0.212327	reductions: !(!a)=a x-xxxxxxx ---x----- x--xx----
-0.165190	x-xxxxxxx ---x----- x--xx---- (a&&b)||(a&&!b)=a x--xx----
-0.236287	in a separate function. Sometimes,
-0.284372	around the hot spot. Sometimes,
-0.928087	you don't have to distribute
-0.523216	therefore not possible to distribute
-0.553047	also look at the "worst
-0.237869	subsequent counts represent the "worst
-0.206622	true. Boolean variables are overdetermined
-0.206622	invalid. Boolean variables are overdetermined
-0.165198	calculation time of 250 ms.
-0.165198	actually more than 250 ms.
-0.533817	that does the same thing.
-0.499354	fact doing the same thing.
-0.355133	// Define size of squares:
-0.349175	and c1 for all squares:
-0.564538	name at the time MemberPointer
-0.236874	declaration of c1 before MemberPointer
-0.362859	which is available from www.intel.com.
-0.277858	Kernel Library, available from www.intel.com.
-0.228180	< r1; c1 += TILESIZE)
-0.228180	< SIZE; r1 += TILESIZE)
-0.524632	from a number of sources.
-0.231418	or come from unknown sources.
-0.462532	fragmented thanks to the first-in-last-out
-0.358080	is organized in a first-in-last-out
-0.008674	720, 5040, 40320, 362880, 3628800,
-0.519639	Furthermore, it is not uncommon
-0.519639	Today, it is not uncommon
-0.331903	computer games. Such a coprocessor
-0.335446	platform with a graphics coprocessor
-0.707706	unsigned int fraction : 23;
-0.233818	u.i += n << 23;
-0.281582	Intel compilers for Linux. 82
-0.218595	8.6 Optimization directives .............................................................................................. 82
-0.341145	important disadvantage of C++ relates
-0.471093	with the C++ language relates
-0.289012	the same member pointer. 7.9
-0.165190	37 7.8 Member pointers.......................................................................................................37 7.9
-0.726192	takes care of the alignment.
-0.222403	not always work. Data alignment.
-0.229254	as efficient as integers. 7.5
-0.218595	7.4 Enums ...................................................................................................................... 33 7.5
-0.304341	have done a good deal
-0.395286	and get a good deal
-0.212327	linker and loader (requires binutils
-0.165190	as example 13.1, Requires binutils
-0.318049	for Boolean vector operations. 7.6
-0.218595	33 7.5 Booleans................................................................................................................... 33 7.6
-0.176509	in Mac systems. 14 Specific
-0.176509	compiler ......................................................................... 130 14 Specific
-0.236287	registers anyway. Pure function. __attribute__((const))
-0.265210	#ifdef __GNUC__ #define pure_function __attribute__((const))
-0.237859	in the background are unnecessary
-0.231914	of a program. Avoid unnecessary
-0.344736	} else { float b[1000];
-0.234181	89 int a[1000]; float b[1000];
-0.218595	variables and operators............................................................................... 29 7.3
-0.218595	on integer variables. 31 7.3
-0.343539	an error simply by performing
-0.234535	and usability A better performing
-0.293747	calculations are done only once,
-0.379943	it is only calculated once,
-0.338480	float s0 = 0, s1
-0.296289	{ s0 += a[i]; s1
-0.722311	The CISC instruction set (called
-0.230112	allows compile-time if statements (called
-0.716280	r) { int i; 84
-0.200054	the compiler does ............................................................................. 84
-0.074790	functions must have extern "C"
-0.074790	common entry point extern "C"
-0.382195	Linux in almost all respects
-0.331109	Gnu compiler in many respects
-0.212327	A GNU Free Documentation License
-0.165190	set control no yes License
-0.236861	of the code. See ISO/IEC
-0.165190	on compiler optimization. en.wikipedia.org/wiki/Compiler_optimization. ISO/IEC
-0.756499	can be used as command-line
-0.293869	profiling and debugging. A command-line
-0.292329	result in array ; i++
-0.235044	and the post-increment operator i++
-0.351590	a constant (see page 137).
-0.430402	by 2 (See page 137).
-0.520749	a branch inside the template.
-0.561386	types with the same template.
-0.237827	positive integer constant. // General
-0.200054	in compiler price GNU General
-0.504707	been stored in the container,
-0.541054	together into a single container,
-0.382799	and 64-bit Windows. The integrated
-0.237813	a graphics card or integrated
-1.323408	when the program is started.
-0.235241	since the CPU was started.
-0.434298	is a function which transposes
-0.579391	noticeable. The following example transposes
-0.525905	have access to the container.
-0.463079	are objects in the container.
-0.092037	to dispatched version return (*SelectAddMul_pointer)(aa,
-0.092037	in dispatched version return (*SelectAddMul_pointer)(aa,
-0.323160	operating system. It will crash
-0.236101	they are disabled will crash
-0.023470	const double A = 1.1,
-0.665646	that allows you to reserve
-0.592247	cores in order to reserve
-0.222401	SSE4.1 and integer division. Older
-0.165190	(0, 2, 4, etc.). Older
-0.489086	when the pointer is deleted.
-0.814753	that need to be deleted.
-0.367392	64-bit Windows and Linux. Asmlib
-0.222403	0.95 0.6 1.19 13 Asmlib
-0.534600	the program is run. Both
-0.265215	option for the linker. Both
-0.324528	p->NotPolymorphic(); p->Hello(); p = &Object2;
-0.237220	* p2; p2 = &Object2;
-0.313787	{ struct { int a:4;
-0.313787	struct Bitfield { int a:4;
-0.200054	0 a+0=a a*0=0 a*1=a (-a)*(-b)=a*b
-0.165190	a*0=0 --xxxx-xx a*1=a x-xxxxx-x (-a)*(-b)=a*b
-0.358403	unwinding information can be left
-0.236806	only one free register left
-0.165190	c1; c2 < c1+TILESIZE; c2++)
-0.165190	r1; c2 < r2; c2++)
-0.123646	depends on the processor. Nested
-0.202569	depending on the processor. Nested
-0.350714	for multiplication } // ipow
-0.237316	power using loop double ipow
-0.352648	requires only an integer comparison,
-0.251373	such as addition, subtraction, comparison,
-0.037112	CPUs. New versions are produced
-0.279619	CHello { public: void NotPolymorphic();
-0.397649	CGrandParent { public: void NotPolymorphic();
-0.382591	a+b+c = a+(b+c) - a*b+a*c
-0.356613	= a+(b+c) - n.a. a*b+a*c
-0.357739	this capability: // Example 11.1a
-0.926063	the code in example 11.1a
-0.236173	that the CPU doesn't support,
-0.235943	library has no AVX support,
-0.353732	memory leaks if you forget
-0.352760	different thread. If you forget
-0.348775	vector bc with the inverted
-0.348775	is AND'ed with the inverted
-0.237926	in example 11.1a to 11.1b
-0.357739	writing: 103 // Example 11.1b
-0.294117	10 * 8 = 80.
-0.355585	interprocedural optimizations. See page 80.
-0.293410	address of element number i.
-0.200054	eax holds the index, i.
-0.237738	each line written. This worked
-0.293933	available information. They have worked
-0.314563	protection against overflow is needed:
-0.237764	explains why bookkeeping is needed:
-0.225766	notice .......................................................................................................... 164 1 Introduction
-0.225766	updated 2014-08-07. Contents 1 Introduction
-0.813467	int i; for(i=0; i<300; i+=3){
-0.165190	int i; for(i=0; i<301; i+=3){
-0.023102	to assembly: ALIGN 4 PUBLIC
-0.074790	((x2) 2) 2 a+a+a+a=a*4 -(-a)=a
-0.074790	x*x*x*x*x*x*x*x = ((x2)2)2 a+a+a+a=a*4 -(-a)=a
-0.148760	programming manuals from Intel: "IA-32
-0.148760	obsolete. Microprocessor documentation Intel: "IA-32
-1.024228	when the program is loaded,
-0.496659	When the program is loaded,
-0.318747	: c (a&&b) || (a&&b&&c)
-0.220328	(a&&b) || (a&&c) || (a&&b&&c)
-0.495151	a-(-b)=a+b a-a = 0 a+0=a
-0.165190	-(-a)=a --xxxxxx- a-(-b)=a+b ---xxx-x- a+0=a
-0.357739	use SafeArray: // Example 7.15b
-0.314063	template parameters, as example 7.15b
-0.237318	because the block size grows
-0.350167	elements in an array grows
-0.237932	2 then N&(N-1) is 0.
-0.331203	reversed if c < 0.
-0.017525	r1; r2 < r1+TILESIZE; r2++)
-0.017525	r1+1; r2 < r1+TILESIZE; r2++)
-0.237953	array. eax holds the index,
-0.265210	with a square brackets index,
-0.504827	when none of the time-consumers
-0.501653	operations. The most common time-consumers
-0.298807	linear search, is fast enough.
-0.224524	do the job fast enough.
-0.008674	2, 6, 24, 120, 720,
-0.325446	and 12.4c is quite tedious
-0.465229	functions can be quite tedious
-0.236212	all code versions work correctly.
-0.235890	all code branches works correctly.
-0.237905	This is safe and flexible,
-0.165190	the STL are universal, flexible,
-0.354232	int i; const int ARRAYSIZE
-0.406729	list[ARRAYSIZE]; if (i < ARRAYSIZE
-0.316128	not use the source annotation
-0.221773	an option for source annotation
-0.224958	into ecx and edx, respectively.
-0.165190	page 136 and 137, respectively.
-0.230118	1 Introduction ....................................................................................................................... 3 1.1
-0.218595	receive new relevant information. 1.1
-0.354253	of mispredictions (see page 43).
-0.318116	branch prediction (see p. 43).
-0.501071	i; } u; u.i ^=
-0.165190	above example with u.i[1] ^=
-0.291334	bit-mask which is all 1's
-0.631256	is AND'ed with all 1's
-0.237634	easier if we use hexadecimal
-0.234812	powers of 2. Using hexadecimal
-0.404467	unsigned int i; } u,
-1.011251	f; int i; } u,
-0.340736	is signed, or by extending
-0.237050	a longer size by extending
-0.098608	The operators &, |, ^,
-0.172709	bitwise operators &, |, ^,
-0.357950	be handled in a systematic
-1.099470	recommended to use a systematic
-0.403904	But the & operator forces
-0.233824	register variable. The union forces
-0.498905	unless the loop is rolled
-0.222401	Sum of a list, rolled
-0.577626	only from same module __attribute__
-0.330854	module __attribute__ ((visibility ("internal"))) __attribute__
-0.294217	with programs written in Java,
-0.356122	programming languages, such as Java,
-0.093319	#pragma ivdep __restrict #pragma ivdep
-0.093319	__declspec( noalias) __restrict #pragma ivdep
-0.508479	from Microsoft, Intel and Gnu.
-0.348171	and highly compatible with Gnu.
-0.237953	1. This ends the recursion
-0.382799	15.1b is implemented. The recursion
-1.199362	n.a. n.a. - a ^a
-0.237753	-1 = ~a a ^a
-0.408189	to the rules of algebra,
-0.311042	the case of Boolean algebra,
-0.293827	overhead cost to memory management
-0.231417	overhead cost of heap management
-0.355999	Boolean expressions. There are lots
-0.237774	files and databases with lots
-0.200054	1 - 5. www.amd.com. 163
-0.200054	160 19 Literature ..................................................................................................................... 163
-0.165190	every version. For team projects,
-0.165190	intermediate version. For one-man projects,
-0.200054	-m32 -m64 -static /MT 160
-0.165190	Overview of compiler options....................................................................................... 160
-0.237827	vector, uses SSE3. // (This
-0.287402	an explicit induction variable. (This
-0.346163	Linux, BSD, Windows and Mac.
-0.494963	64-bit Windows, Linux and Mac.
-0.584591	FPGA in the same chip.
-0.537778	integrated in the CPU chip.
-0.358403	a file can be wrapped
-0.354684	pointers unless they are wrapped
-0.351001	that can be predicted perfectly
-0.243626	it can be predicted perfectly
-0.008674	__m128i mask = _mm_cmpgt_epi16(b, zero);
-0.184031	all objects have been added?
-0.200054	and "More Effective C++". Addison-Wesley,
-0.165190	Warren, Jr.: "Hacker's Delight". Addison-Wesley,
-0.355238	(except for the loop counter,
-0.351528	for incrementing a loop counter,
-0.434086	variable with the static modifier
-0.200054	or __attribute__((fastcall)). The fastcall modifier
-0.294108	or /Ox -O3 or -Ofast
-0.200054	no specific option) better: -Ofast
-0.535571	before the code to test.
-0.200054	you want to 155 test.
-0.595207	will be unable to respond
-0.321582	buffer. It should never respond
-0.008674	"Performance Optimization of Numerically Intensive
-0.358547	and algorithms in the planning
-0.200054	factors in the early planning
-0.574409	1; } }; class C2
-0.200054	() { C1 Object1; C2
-0.456945	data with all the R
-0.337134	points with the four R
-0.326732	debug version and a release
-0.326732	program development, and a release
-0.591373	significant part of the fraction.
-0.357305	binary decimals of the fraction.
-0.237813	for Tuesday, Wednesday or Friday
-0.165190	8, Thursday = 0x10, Friday
-0.286168	unnecessary functions Some programming textbooks
-0.230830	and classes Nowadays, programming textbooks
-0.357392	point division to be slower.
-0.502250	will make the program slower.
-0.222403	memory allocation ...................................................................................... 90 9.7
-0.200054	restrictions on using alloca. 9.7
-0.538692	{ for (c2 = c1;
-0.237371	// Example 7.14 class c1;
-0.085115	intermediate code, interpreters, just-in-time compilers,
-0.085115	graphics frameworks, interpreters, just-in-time compilers,
-0.237905	will generate -128, and subtracting
-0.382668	divide by 2n by subtracting
-0.355725	* CriticalFunctionDispatch(void) { // Returns
-0.237246	#include <ia32intrin.h> etc. // Returns
-0.294187	and tested it. The insight
-0.236629	the problem. This new insight
-0.165190	point algebra reductions: a+b=b+a a*b=b*a
-0.165190	XMM (vector) reductions: a+b=b+a, a*b=b*a
-0.101031	= r1; r2 < r1+TILESIZE;
-0.101031	= r1+1; r2 < r1+TILESIZE;
-0.237753	such obvious reductions as 0/a
-0.561155	a/1 = a - 0/a
-0.102364	names. We can only hope
-0.102364	15.1c. We can only hope
-0.237827	// Example 7.45 // Portability
-0.224958	sake of optimization. 14 Portability
-0.350474	compatible with the other compilers).
-0.212331	some very old DOS compilers).
-0.355725	< arraysize) { // Catch
-0.452370	c[i]); } } // Catch
-0.352765	some programs, more than 99%
-0.212327	calculations. In other programs, 99%
-0.018520	{ cout << "Hello ";
-0.231917	size = 1024; struct Sab
-0.212331	{int a; int b;}; Sab
-0.305446	iteration is a significant contribution
-0.212327	is only a negligible contribution
-0.331903	The compiler makes a distinction
-0.380627	There is an important distinction
-0.237246	// Store result // Update
-0.237246	induction variable Y // Update
-0.305896	Intel, Microsoft, Gnu, Clang Supported
-0.165190	Include file dvec.h vectorclass.h Supported
-0.074790	minor increase in develop- ment
-0.074790	principles of software develop- ment
-0.441735	Make the dispatcher function. typeof(CriticalFunction)
-0.165190	* CriticalFunctionDispatch(void) __asm__ ("CriticalFunction"); typeof(CriticalFunction)
-0.356172	control instructions than the ones
-0.314687	classes or modify the ones
-1.322265	when the program is busy
-0.237764	he or she is busy
-0.571812	memory may not be optimally
-0.322518	critical part can run optimally
-0.222401	overflow checks where necessary. Fast
-0.212327	where appropriate. Compiler-specific keywords Fast
-0.294187	code in details. The funny
-0.293239	that it does some funny
-0.232342	loops, etc. Optimizing database queries
-0.165190	in simple cases. Database queries
-0.353845	tag on a program saying
-0.200054	with nagging pop-up messages saying
-0.292792	are much higher than normal.
-0.236656	data more random than normal.
-0.226038	directly // Writes "Hello 1"
-0.226038	p2->Hello(); // Writes "Hello 1"
-0.237932	when the hardware is updated.
-0.484059	keeping a CPU dispatcher updated.
-0.237882	a particular purpose. The clumsy
-0.234218	case. Intrinsic functions look clumsy
-0.008674	d; d = (double)(signed int)u;
-0.358676	do much of the trivial
-0.314701	use a loop for trivial
-0.035787	matrix 96 void transpose(double a[SIZE][SIZE])
-0.035787	Example 9.5b void transpose(double a[SIZE][SIZE])
-0.336254	Only for SSE2 or x64
-0.235830	14.00 for 80x86 / x64
-0.313876	segments (32-bit or 64-bit systems).
-0.233821	32 bits in x86 systems).
-0.311609	and much time is wasted
-0.311609	parameter. No time is wasted
-0.200054	compiler price GNU General Public
-0.251373	copyrighted by Agner Fog. Public
-0.343334	or add an extra dummy
-0.235498	You may even add dummy
-0.336327	the library through the symbolic
-0.331903	installation program makes a symbolic
-0.504283	A variable can be fetched
-0.339395	pipeline where instructions are fetched
-0.237820	= 0x20, Saturday = 0x40
-0.237813	the entire 64 or 0x40
-2.034309	- n.a. n.a. - (a&b)|(a&c)
-0.165190	x---- ----- ~(~a)=a x-xxxxx-- (a&b)|(a&c)
-1.076803	is less efficient than relocation,
-0.428368	absolute addresses that need relocation,
-0.438678	Microsoft and PathScale compilers. (The
-0.542520	78 for an explanation. (The
-0.538706	b * 1.2; // Mixing
-0.234380	for the commercial compilers. Mixing
-0.237811	After each iteration it decides
-0.325235	conditions. A dispatcher function decides
-0.221772	denormals-are-zero mode (SSE2): #include <xmmintrin.h>
-0.221772	flush-to-zero mode (SSE): #include <xmmintrin.h>
-0.237127	{ return Func1(x) * Func1(x)
-0.570585	Func2(double x) { return Func1(x)
-0.008674	6, 24, 120, 720, 5040,
-0.339405	metaprogramming features, including the ability
-0.237869	Obviously, we loose the ability
-0.294079	ways of multiplying by 3,
-0.319430	of sizes 1, 2, 3,
-0.222403	system resources .......................................................................................... 21 3.12
-0.222406	files and system modules. 3.12
-0.521921	cores that do not 123
-0.200054	For example, #define ABC 123
-0.347503	a null reference to provoke
-0.346157	null reference. This will provoke
-0.165196	Optimizes reasonably well. Codeplay VectorC
-0.165196	v. 1.4, 2005. Codeplay VectorC
-0.314230	feasible. Interference from other processes.
-0.519730	and shared between multiple processes.
-0.074790	from same module __attribute__ ((visibility
-0.074790	__attribute__ ((visibility ("internal"))) __attribute__ ((visibility
-0.237932	of dependency chains is stronger
-0.236089	effect is so much stronger
-0.438928	cache. These instructions are accessible
-0.779585	functions that are not accessible
-0.237905	by consistent modularity and reusable
-0.294217	be put away in reusable
-0.294187	user. Installation problems. The procedures
-0.226795	far pointers, and far procedures
-0.237820	!a && !b = !(a
-0.356613	= a*b - n.a. !(a
-0.165196	systems ............................................................................. 158 18 Overview
-0.165196	p. 22). 159 18 Overview
-0.165190	AVX immintrin.h AMD SSE4A ammintrin.h
-0.165190	SSE4A ammintrin.h AMD XOP ammintrin.h
-0.356630	electrical connections rather than sequences
-0.235992	assembly instructions or small sequences
-0.237645	takes to start and stop
-0.382468	an error message and stop
-0.334437	be expected for further expansions
-0.218595	iterations such as Taylor expansions
-0.312683	or without the sign bit.
-0.312683	care about the sign bit.
-0.074790	number of vectors. 12.10 Conclusion
-0.074790	vectors ....................................................... 120 12.10 Conclusion
-0.237196	do not support static linking.
-0.536077	both static and dynamic linking.
-0.272297	versions without an IDE. Free
-0.200054	without restrictions. A GNU Free
-0.237953	performance and studying the bottlenecks
-0.236995	way to identify performance bottlenecks
-0.336303	are unstable due to interrupts
-0.332878	the CPU to generate interrupts
-0.473390	requirements of compatibility with legacy
-0.313710	of compatibility with some legacy
-0.237553	in a far data segment
-0.293675	much data for one segment
-0.237269	_WIN32 Linux platform n.a. __unix__
-0.251373	platform n.a. __unix__ __linux__ __unix__
-0.314731	by their address and attempts
-0.654759	the event that it attempts
-0.228173	allows 256-bit integer vectors. Code
-0.200054	parameter transfer are eliminated. Code
-0.068042	Friday, Saturday }; Weekdays Day;
-0.068042	= 0x40 }; Weekdays Day;
-0.102328	< c1+TILESIZE; c2++) { swapd(a[r2][c2],a[c2][r2]);
-0.102328	< r2; c2++) { swapd(a[r2][c2],a[c2][r2]);
-0.008674	dispatched version return (*SelectAddMul_pointer)(aa, bb,
-0.324976	in order to save power.
-0.231913	than by the processing power.
-0.340328	can be a time consumer
-0.236314	be an annoying time consumer
-0.483643	recommended to put a parenthesis
-0.237798	unless you put a parenthesis
-0.231414	modularity and reusable classes. Security
-0.276612	else on a computer. Security
-0.292410	loop counter with its limit,
-0.165190	never exceeds an acceptable limit,
-0.293905	here. You cannot use ~
-0.466207	operators &, |, ^, ~
-0.237799	// call transpose function swapd(a[r][c],
-0.521724	loop columns below diagonal swapd(a[r][c],
-0.236714	only slightly more time. Single
-0.510158	into the data cache. Single
-0.200054	0.24 n.a. 1.00 0.25 0.28
-0.165190	n.a. 1.00 0.35 0.29 0.28
-0.448723	but operators that have Booleans
-0.200054	efficient as integers. 7.5 Booleans
-0.403897	operands AMD Opteron K8 0.24
-0.200054	Opteron K8 0.24 0.25 0.24
-0.122678	Optimizing memory access 9.1 Caching
-0.122678	access ............................................................................................. 87 9.1 Caching
-0.325108	unpredictable intervals which may interfere
-0.237597	scope. A macro will interfere
-0.381982	/Oy -fomit- frame- pointer -fomit-
-0.165190	No stack frame /Oy -fomit-
-0.200054	AMD Opteron K8 0.24 0.25
-0.251373	0.25 0.24 n.a. 1.00 0.25
-0.037092	c __m128i bc = _mm_mullo_epi16
-0.346487	on CPUs without the FMA4
-0.236044	(MS) xopintrin.h (Gnu) AMD FMA4
-0.200054	by Agner Fog. Public distribution
-0.165190	is not allowed. Non-public distribution
-0.436020	* (2n / b) >>
-0.165190	|, ^, ~, <<, >>
-1.189432	compilers are able to do,
-0.212327	of macro expansions. Programmers do,
-0.237797	do not alias, if appropriate.
-0.237122	member functions static where appropriate.
-0.222403	in the subsequent manuals. Please
-0.296292	works, here's an explanation. Please
-0.074790	resources .......................................................................................... 21 3.12 Network
-0.074790	and system modules. 3.12 Network
-0.236827	uint64_t MS compiler: unsigned __int64
-0.251373	or int64_t MS compiler: __int64
-0.141609	most cache misses, branch mispredictions,
-0.141609	executed, cache misses, branch mispredictions,
-1.394615	= 0; i < rows;
-0.237553	way of keeping data together.
-0.234213	the modules are linked together.
-0.547202	less be possible to organize
-1.167413	is no need to organize
-0.723817	a member of a bitfield
-0.237753	faster to compose a bitfield
-0.357739	positive integer: // Example 15.1b.
-0.544811	while loop in example 15.1b.
-0.350235	error message when it sees
-0.561285	so that the compiler sees
-0.236714	portability and development time. Interpreted
-0.165190	and UNIX shell script. Interpreted
-0.357695	to make the compiler treat
-0.237170	some of these also treat
-0.008674	copy matrix void TransposeCopy(double a[SIZE][SIZE],
-0.237820	b<c && a<c) = (a<b
-0.165190	---xx---- (a+c==b+c)=(a==b) ----x---- !(a<b)=(a>=b) (a<b
-0.931805	cycles, depending on the processor).
-0.476575	integers, depending on the processor).
-0.237659	thousand cache misses have occurred.
-0.237541	Floating point overflow has occurred.
-0.357427	||, ! and the corresponding
-0.237869	the CPU supports the corresponding
-0.237668	a[N]; public: SafeArray() { memset(a,
-0.627191	set a to zero memset(a,
-0.331906	{ return x * m;}
-0.237905	that seldom occur and recovering
-0.325334	function can use for recovering
-0.382501	conditions enum Weekdays { Sunday,
-0.287889	branch if the constants Sunday,
-0.008674	instruction sets are mutually incompatible.
-0.659219	*p = *p + 2;}
-0.235993	modify x *const_cast<int*>(&x) += 2;}
-0.650431	console mode program is fast,
-0.165190	-ffast-math /fp:fast /fp:fast=2 -fp-model fast,
-0.570187	The costs of optimizing University
-0.200054	By Agner Fog. Technical University
-0.341856	number of calls to log,
-0.165190	functions such as pow, log,
-0.237872	Preprocessing directives (everything that begins
-0.391789	eax,0. The loop body begins
-0.355664	the exponent: // Example 14.26
-0.312451	to exponent } Example 14.26
-0.355664	as integers: // Example 14.27
-0.312451	both positive } Example 14.27
-0.453938	bits. The method is somewhat
-0.235890	accessed sequentially. It works somewhat
-0.252031	the branch is poorly predictable.
-0.184776	the branches are poorly predictable.
-0.196047	the speed of addition, subtraction,
-0.196047	operations such as addition, subtraction,
-0.657491	sign bit: // Example 14.23
-0.535397	union, as in example 14.23
-0.543882	There may be a slight
-0.334655	CPUs which may cause slight
-0.236214	throughput of an execution unit.
-0.307589	is no graphics processing unit.
-0.237905	than self-styled hacks and direct
-0.333174	a language that allows direct
-0.331939	will typically get the generic
-0.352364	instruction set, and a generic
-0.525220	one floating point addition unit,
-0.307589	have a graphics processing unit,
-0.237932	if the handle is invalid.
-0.233043	old block then become invalid.
-0.237932	network or database is heavily
-0.310634	operations. Algorithms that rely heavily
-0.331364	need relocation, but only self-
-0.432595	currently used for calculating self-
-0.357386	even faster to make log2
-0.324847	{ static const double log2
-0.439101	using the performance monitor counters.
-0.288618	feature called performance monitor counters.
-0.229259	be used where execution speed,
-0.229259	and flexibility, while execution speed,
-0.227302	(a&&b)||(a&&!b)=a x--xx---- (a&&b) || (!a&&c)
-0.227302	x-xx----- 75 (a&&b) || (!a&&c)
-0.237882	array to zero. The []
-0.165190	i) { // Safe []
-0.571465	is explained on page 122.
-0.290886	XMM registers; see page 122.
-0.292450	and make appropriate error messages
-0.165190	users with nagging pop-up messages
-0.165190	F1(int x[]); void F2(float x[]);
-0.165190	Example 9.2a void F1(int x[]);
-0.102003	-axSSE3, etc. (Intel CPU only)
-0.102003	CPU only) (Intel CPU only)
-0.212327	11.1a to 11.1b automatically, although
-0.165190	will also work, 133 although
-0.455568	code. Intrinsic functions are primitive
-0.222401	dangers of a relatively primitive
-0.200054	Intensive Codes", by S. Goedecker
-0.165190	on improving performance. Stefan Goedecker
-0.074790	dispatch strategies........................................................................................ 122 13.2 Model-specific
-0.074790	the source files. 13.2 Model-specific
-0.165190	type identification (RTTI) /GR– -fno-rtti
-0.165190	(RTTI) /GR– -fno-rtti /GR- -fno-rtti
-0.293928	you don't want this initialization,
-0.165190	for-loop has three clauses: initialization,
-0.237668	(absvalue > largest_abs) { largest_abs
-0.165190	a[size]; unsigned int absvalue, largest_abs
-0.230117	off and use alternative implementations.
-0.281585	and the best Java implementations.
-0.013569	{1, 1, 2, 6, 24,
-0.294213	analyzing program performance and studying
-0.294184	hot spots, but for studying
-0.354253	data cache (see page 87).
-0.318116	no cache (see p. 87).
-0.291083	it may cause cache contentions.
-0.235154	{ // No cache contentions.
-0.407341	unsigned int N> class SafeArray
-0.200054	SafeArray: // Example 7.15b SafeArray
-0.353674	Example 7.43 on page 58
-0.419659	able to do so. 58
-0.165198	over. Virtualization is becoming increasingly
-0.229258	vector processors are becoming increasingly
-0.237753	make the measurements as accurate
-0.218599	but this is sufficiently accurate
-0.237624	worthwhile to invest more efforts
-0.337952	to focus the optimization efforts
-0.450756	each time the program starts.
-0.490870	values before the program starts.
-0.235645	to. Now ebx contains i/2+r.
-0.276605	temporary register for computing i/2+r.
-0.357989	more clear to the reader
-0.357927	is assumed that the reader
-0.237763	not a manual on usability,
-0.231916	compromise between development time, usability,
-0.218599	it should be true. template<>
-0.200054	This ends the recursion template<>
-0.331933	language also includes the low-level
-0.346464	subset, giving access to low-level
-0.877280	SSE4.1 instruction set is available:
-0.314563	mode if SSE2 is available:
-0.212327	for floating point overflow: _controlfp_s(&dummy,
-0.165190	floating point status: _fpreset(); _controlfp_s(&dummy,
-0.229445	align by 4 ; mangled
-0.229445	+ esp ;alignby4 ; mangled
-0.074790	INSTRSET == 2 12.6 Transforming
-0.074790	classes ............................................................................................. 113 12.6 Transforming
-0.237597	the critical stride will contend
-0.340531	make all dynamic libraries contend
-0.200064	manager has a garbage collector
-0.200064	the very time-consuming garbage collector
-0.192864	"Delta" }; if ((unsigned int)n
-0.192864	39916800, 479001600}; if ((unsigned int)n
-0.200054	which they are created. Far
-0.165190	can also be huge). Far
-0.048366	{ // Table of factorials:
-0.048366	n! // Table of factorials:
-0.703484	7.20 Virtual member functions ........................................................................................
-0.694997	12.4 Using intrinsic functions ........................................................................................
-0.237811	the loop control it compares
-0.237585	i < 100. It compares
-0.203376	as the example below shows.
-0.203376	as example 7.15b below shows.
-0.382394	Friday)) { DoThisThreeTimesAWeek(); } By
-0.232346	Linux and Mac platforms By
-0.236983	it is known with certainty
-0.236983	compiler to predict with certainty
-0.165190	N with the rightmost 1-bit
-0.165190	{ // Remove right-most 1-bit
-0.350714	return clock; } // Or
-0.226791	the same few parameters. Or
-0.218595	program itself when running. Programs
-0.218595	time under worst-case conditions. Programs
-0.236417	by a single & operation,
-0.320283	done with a shift operation,
-0.077777	before the file is closed.
-0.077777	sure the file is closed.
-0.226793	vector class library. Open source.
-0.212327	is free and open source.
-0.138877	0x3FF unsigned int sign :1;//signbit
-0.138877	0x7F unsigned int sign :1;//signbit
-0.356347	Thread-local storage should be avoided,
-0.353081	dynamic linking cannot be avoided,
-0.526727	further described in the book
-0.165190	Codes", SIAM 2001. Advanced book
-0.575630	of parameter transfer is avoided.
-0.463671	cases should definitely be avoided.
-0.237827	// MOVNTQ _mm_empty(); // EMMS
-0.331703	be followed by an EMMS
-0.356086	heavy work to do immediately
-0.294289	pragmas must be placed immediately
-0.095315	SafeArray() { memset(a, 0, sizeof(a));
-0.095315	to zero memset(a, 0, sizeof(a));
-0.434335	XMM registers (see page 105).
-0.335738	vector instructions (see page 105).
-0.330799	is because the register usage
-0.165190	in the chapter "Register usage
-0.522445	You may have to fix
-0.325156	producer will try to fix
-0.023278	void TransposeCopy(double a[SIZE][SIZE], double b[SIZE][SIZE])
-0.237905	in a debugger and press
-0.276605	actions like a key press
-0.523160	standard tasks such as sorting
-0.237349	serial, such as most sorting
-0.050786	also called shared objects (*.dll,
-0.234454	0.38 0.44 0.40 n.a. 1.00
-0.234454	0.24 0.25 0.24 n.a. 1.00
-0.212327	1 fraction 2 23 ,
-0.200054	1 fraction 2 52 ,
-0.237624	cache is used more efficiently.
-0.313462	133 although slightly less efficiently.
-0.356678	into memory. If the word
-0.358080	For example, in a word
-0.234211	: public B1, public B2
-0.349703	public B1 { public: B2
-0.212327	99 10 Multithreading.............................................................................................................. 101 10.1
-0.165190	Iss. 4, 2007 (www.intel.com/technology/itj/). 10.1
-0.542515	i++) sum += a[i]; Converting
-0.251373	would otherwise go undetected. Converting
-0.356958	unused columns to a matrix.
-0.424438	in a 512 512 matrix.
-0.294117	= A; x.b = B;
-0.381944	Z = A + B;
-0.235574	you are not doing divisions.
-0.229252	used on completely independent divisions.
-0.329980	22. Avoid long dependency chains,
-0.320757	has two loop-carried dependency chains,
-0.518085	purposes. The use of coprocessors
-0.756499	can be used as coprocessors
-0.235338	S1 x, y; ... x.a
-0.508392	int A, B, C; x.a
-0.462161	have exactly the same effect.
-0.344522	-fno-pic apparently has no effect.
-0.575200	long response times to keyboard
-0.237734	to respond quickly to keyboard
-0.008674	5040, 40320, 362880, 3628800, 39916800,
-0.350050	return _mm_cvtss_f32(s); } // Approximate
-0.237246	nfac *= n+1; // Approximate
-0.023411	< 100; x++) { Table[x]
-0.208570	to remove the const restriction
-0.208570	for relieving the const restriction
-0.200054	A; x.b = B; x.c
-0.165190	= y.b + 2.; x.c
-0.500452	The result will be misleading
-0.234212	unreliable. They sometimes give misleading
-0.237553	macro for aligning data #ifdef
-0.165190	Example: // Example 8.22 #ifdef
-0.376062	16 3.3 Program installation ..................................................................................................
-0.413467	132 14.2 Bounds checking ..................................................................................................
-0.165190	numbers: // Example 12.8a. Sum
-0.165190	have: // Example 12.8b. Sum
-0.320310	= y.a + 1.; x.b
-0.200054	C; x.a = A; x.b
-0.794913	Intel, AMD and VIA CPUs".
-0.235830	instructions mov ebx,eax / shr
-0.231414	xor mov $B1$2: mov shr
-0.436752	integers of 64 bits each.
-0.342246	double's of 8 bytes each.
-0.222401	a/a=1 ----x---x a/1=a xxxxxxxxx 0/a=0
-0.165190	a/a=1 --------x a/1=a x-xxx-x-- 0/a=0
-0.544091	sometimes be avoided by replacing
-0.237050	of this fact by replacing
-0.356774	The solution a = 1.0f
-0.286161	(b == 0) ? 1.0f
-0.888971	the scope of this manual,
-0.347627	clock cycle? In this manual,
-0.314731	to be fragmented and involve
-0.715306	consuming because it may involve
-0.442160	always #pragma vector always Optimize
-0.604666	Windows Intel compiler Linux Optimize
-0.465298	size; i++) sum += list[i];
-0.283159	2) { sum1 += list[i];
-0.237827	0.f, 0.f, 1.f); // initialize
-0.212331	// x^n // sum, initialize
-0.222401	xn n 0 n! 117
-0.165190	serial code for vectorization............................................................. 117
-0.237872	and web browsing that previously
-0.165190	value it was assigned previously
-0.292880	12.4. Vector class libraries 113
-0.218595	Using vector classes ............................................................................................. 113
-0.206155	Single precision division, square root
-0.206155	is inefficient. Division, square root
-0.237905	including linear algebra and statistics,
-0.490195	Includes many functions for statistics,
-0.234534	should be compiled three times,
-0.500020	by unacceptably long response times,
-0.382899	way to overcome the obstacle
-0.237715	lookup is often an obstacle
-0.224963	is copyrighted by Agner Fog.
-0.176509	Mac platforms By Agner Fog.
-0.218599	class library exp exp 12.8
-0.212327	functions for vectors........................................................................ 119 12.8
-0.355435	also available from the IDE
-0.237715	C++ builder Has an IDE
-0.231414	arrays with vector access. 12.9
-0.226793	dynamically allocated memory................................................................. 120 12.9
-0.450230	very efficient way of removing
-0.343539	is measured simply by removing
-0.231420	critical code. A test setup
-0.231420	in a simple test setup
-0.165196	64 bit platform _WIN64 _LP64
-0.165196	platform _WIN64 _LP64 _WIN64 _LP64
-0.349220	preferably be a simple type,
-0.235715	an object of known type,
-0.237668	i += 16) { b.load(bb+i);
-1.413031	Load eight consecutive elements b.load(bb+i);
-0.356635	not necessary because the factorials
-0.315183	better: store the reciprocal factorials
-1.252521	appropriate version of the subroutine
-0.556115	something in a separate subroutine
-0.237905	kernel version 2.6.30 and later.
-0.336254	instruction set SSE2 or later.
-0.222401	Automatic vectorization ......................................................................................... 107 12.4
-0.222401	for integer vector division. 12.4
-0.200054	intrinsic functions ........................................................................................ 109 12.5
-0.165190	in the next section. 12.5
-0.206157	safe to use algebraic manipulations
-0.206157	This is because algebraic manipulations
-0.349238	the risk of memory leaks
-0.235948	way to prevent memory leaks
-0.235499	The official C standard says
-0.165190	the register usage convention says
-0.442884	// INSTRSET == 2 12.6
-0.200054	vector classes ............................................................................................. 113 12.6
-0.335738	and double (see page 140).
-0.335738	quite time-consuming (see page 140).
-0.200054	code for vectorization............................................................. 117 12.7
-0.165190	function is called. 118 12.7
-0.406797	1023 1 fraction 2 52
-0.235777	of the structure }; 52
-0.500416	compiler doesn't have to obey
-0.351519	System code has to obey
-1.006760	as explained on page 72.
-0.853493	reasons explained on page 72.
-0.102611	= b+a a*b = b*a
-0.102611	= b+a, a*b = b*a
-0.314763	the STL containers is 95
-0.355585	syntax or See page 95
-0.222403	the right vector elements. 12.1
-0.200054	Using vector operations............................................................................................... 105 12.1
-0.462533	in performance if the time-critical
-0.237908	rely on longjmp in time-critical
-0.222401	a more distant future. 12.3
-0.222401	ZMM registers .......................................................... 107 12.3
-0.629968	possible to implement a universal
-0.288308	and execution time. No universal
-0.357007	-msse2 SSE3 instruction set Suppl.
-0.165190	SSE2 emmintrin.h SSE3 pmmintrin.h Suppl.
-0.429247	and the program will crash.
-0.592034	compatibility problems and system crash.
-0.165190	---x----- x---x---x x-xxx---- a*b*c=a*(b*c) a+b+c+d
-0.165190	= b*a (a+b)+c=a+(b+c) a+b+c=c+b+a a+b+c+d
-0.294237	do not expect to 99
-0.218595	Explicit cache control .............................................................................................. 99
-0.523359	p(double x) { // Remove
-0.235043	Eliminate jumps Eliminate branches Remove
-0.288313	runtime frameworks, intermediate code, interpreters,
-0.212327	the large graphics frameworks, interpreters,
-0.294213	by 3, 5 and 9.
-0.525981	the vector element level 9.
-0.237085	The copy constructor, if any,
-0.237085	and the destructor, if any,
-0.293905	Multithreaded programs must use thread-safe
-0.335965	use thread-safe functions. A thread-safe
-0.286311	void F2(float x[]); void F3(bool
-0.230956	// Example 9.2b void F3(bool
-0.314735	opens a file in exclusive
-0.408094	lock a container for exclusive
-0.476381	instructions listed in table 9.2.
-0.234535	MOVNTDQ _mm_stream_si128 SSE2 Table 9.2.
-1.394615	= 0; i < NumberOfTests;
-0.237597	core). The counters will stay
-0.306476	thread does not necessarily stay
-0.338526	alleviated in the 64-bit extension
-0.232344	AVX instructions. A further extension
-0.502031	then it is the "best
-0.237905	the "worst case" and "best
-0.236961	of structures (without member functions)
-0.165190	linking (remove unreferen- ced functions)
-0.314763	Here the iteration is repeated
-0.294166	operation will then be repeated
-0.434159	factorials: static const int FactorialTable[13]
-0.335599	of factorials: const int FactorialTable[13]
-0.532149	(Tuesday | Wednesday | Friday)
-0.375092	Wednesday || Day == Friday)
-0.231420	// call polymorphic child function:
-0.272301	to use the lrint function:
-1.156134	code. Example: // Example 8.21
-0.380166	of the loop. Example 8.21
-0.421227	AVX supported CriticalFunction = &CriticalFunction_AVX;
-0.293199	// AVX supported return &CriticalFunction_AVX;
-0.318049	implemented as vector operations. 105
-0.165190	12 Using vector operations............................................................................................... 105
-0.357695	simply makes the compiler interpret
-0.346061	a string and then interpret
-0.470265	be 0 or 1. Writing
-0.218595	errors in C++ programs. Writing
-0.074790	Correction for the FDIV bug
-0.074790	"FDIV bug". The FDIV bug
-0.044100	_mm_mullo_epi16 (b, c); // Compare
-0.008674	elements: #define swapd(x,y) {temp=x; x=y;
-0.237813	out of range"); or better,
-0.226791	is not needed. Even better,
-1.207943	if you want to flip
-0.314638	u.i ^= 0x80000000; // flip
-0.318503	A structure of four float's
-0.281938	16-bit integers or four float's
-0.394492	installed can take several minutes
-0.228642	but it took several minutes
-0.237592	(bb[i] * cc[i]); } 109
-0.251373	Using intrinsic functions ........................................................................................ 109
-0.451861	b, c; b = (a+1)
-0.349770	* (a+1); c = (a+1)
-0.343293	is needed for other reasons,
-0.236627	of resources. For these reasons,
-0.226791	a pre-calculated table. Even better:
-0.165190	(requires no specific option) better:
-0.350713	above advantages of each method,
-0.331781	This is the simplest method,
-0.351863	edx but the variable whose
-0.231921	the critical stride. Variables whose
-0.265107	allocation with new and delete,
-0.265107	dynamically with new and delete,
-0.643483	a shared object is accessed,
-0.314563	the array element is accessed,
-0.102611	n.a. - (a&b)|(a&c) = a&(b|c)
-0.102611	~(~a)=a x-xxxxx-- (a&b)|(a&c) = a&(b|c)
-0.547767	Threads are useful for assigning
-0.347378	It does this by assigning
-0.348718	true, which is only 10%
-0.306994	and b is true 10%
-0.165190	Coriolis group books 1994. Mostly
-0.165190	development", Addison- Wesley 1997. Mostly
-0.295331	is provided in manual 4:
-0.295331	are listed in manual 4:
-0.227395	c = temp / 4;
-0.227395	c = (a+1) / 4;
-0.290013	buffer that some microprocessors have.
-0.224956	RAM than end users have.
-0.348358	is a way of relieving
-0.989261	operator is used for relieving
-0.835958	const int rows = 10,
-0.228175	8 long double 8, 10,
-0.224958	pop ret ALIGN ?Func@@YAXQAHAAH@Z ENDP
-0.165190	least recently 4 ?Func2@@YAXQAHAAH@Z ENDP
-0.237827	volatile int seconds; // incremented
-0.349579	until seconds has been incremented
-0.527817	functions that it calls. 48
-0.165190	45 7.14 Functions ................................................................................................................ 48
-0.232347	Day; if (Day == Tuesday
-0.231415	1, Monday = 2, Tuesday
-0.345472	of matrix[j][0] is calculated internally
-0.340167	A constructor is implemented internally
-0.288688	Put in an unused fourth
-0.200054	not the columns. Every fourth
-0.237118	two books contain many tips
-0.324319	hard-to-find errors, and some tips
-0.294184	costs are higher for shared_ptr
-0.165190	to another by assignment. shared_ptr
-0.435459	overridden in Linux and BSD.
-0.399960	and 64-bit Linux and BSD.
-0.044149	not be worth the effort.
-0.236391	and operating systems available today.
-0.226793	is used for Java today.
-0.232718	a different meaning. 2. Put
-0.165190	the algorithm in question: Put
-0.330998	// AVX version int CriticalFunction_AVX(int
-0.236682	AVX version 127 int CriticalFunction_AVX(int
-0.237511	elements inside sqaure: for (r2
-0.237511	is handled separately: for (r2
-0.346746	can improve this by writing:
-0.293240	the alignment explicitly by writing:
-0.023299	inheritance class B1; class B2;
-0.786206	generally assume that the overall
-0.294172	spots and measuring the overall
-0.122678	delete the object. 7.17 Structures
-0.122678	types .............................................................................................. 50 7.17 Structures
-0.336266	of object files and executables.
-0.237520	symbolic link. Use different executables.
-0.626371	hardware definition language is inherently
-0.356249	matrixes. Algorithms that are inherently
-0.237926	your programming questions to me.
-0.293832	Intel and one from me.
-0.732730	takes to call a non-virtual
-0.525479	take more resources than non-virtual
-0.494361	because this method is safer.
-0.341016	type casting, but also safer.
-0.584375	so the value of b+c
-0.218599	b and c first. b+c
-0.074790	Linux platform n.a. __unix__ __linux__
-0.074790	n.a. __unix__ __linux__ __unix__ __linux__
-0.330172	16-bit systems: int 16 -32768
-0.294120	algebra reductions: a+b = b+a
-1.056270	the calculation of the factorials,
-0.549377	just a matter of convenience
-0.711341	to be less than 231.
-0.990884	xpow10(double x) { return pow(x,10);
-0.236983	come. Even big software companies
-0.237600	the 64-bit systems will dominate
-0.236133	with legacy code, specific preferences
-0.288321	or __restrict or #pragma optimize("a",on).
-0.232725	always Optimize function #pragma optimize(...)
-0.212332	x-xxxx--x x-xx----- x--x----- ---x----- x---x---x
-0.403906	longdoublevalue ( 1)sign 2exponent 16383
-0.656034	the latest instruction set extensions.
-0.218603	something takes 10 μs today,
-0.838524	changed to: // Example 14.5b
-0.357747	certain interval: // Example 14.5a
-0.224965	a copy constructor specifying otherwise.
-0.165195	// initialize sum for(inti=0;i<16;i+=4){ //Loopby4
-0.237131	1. / (b1 * b2);
-0.165195	1./720., 1./5040., 1./40320., 1./362880., 1./3628800.,
-0.353420	capabilities still have a niche
-0.236960	if ((unsigned int)i < 10)
-0.222406	qword ptr x; __asm fistp
-0.233049	avoid this. (In Windows, SetThreadAffinityMask,
-0.508499	64-bit operating systems are common,
-0.347870	per byte of data (low
-0.236984	unit-testing is unfortunately very common.
-0.234031	systems that allows bigger segments
-0.165195	bcc, v. 5.5 Mac: Darwin8
-0.525989	the vector element level 108
-0.200060	mode program is fast, compact,
-0.358014	user interface. It is 102
-0.251379	class into an anonymous namespace.
-0.352491	the availability of an update,
-0.481219	gigabytes of data. The similarity
-0.265216	vector operations on contemporary 106
-0.237074	variables defined outside any function)
-0.331917	copying it Use a "move
-0.200060	feature called "Gnu indirect function"
-0.165195	Monday, Tuesday, Wednesday, Thursday, Friday,
-0.200060	than short int (16 bits),
-0.222406	of floating point division. Correction
-0.237884	disk copying. Security. The vulnerability
-0.657672	not unusual for the reinstallation
-0.343542	is done simply by ignoring
-0.337140	// add the four sums
-0.292525	{ // (N & N-1)==0
-0.165195	of 1/n! 1., 1./2., 1./6.,
-0.232722	vector(float a, float b) {x
-0.304430	the current array element. Rather
-0.232723	other compilers have inefficient code-based
-0.378523	cannot assume that model N+1
-0.357515	by 32 and the 512-bit
-0.657506	is available: // Example 7.6.
-0.237816	hacks that violate or circumvent
-0.200060	----x---x a/1=a xxxxxxxxx 0/a=0 ---x---xx
-0.326207	read about in my blog.
-0.165195	C++ Performance". www.open- std.org/jtc1/sc22/wg21/docs/TR18015.pdf. OpenMP.
-0.233593	well with non-Intel CPUs. Includes
-0.229256	eax, 8 edx, eax $B2$2
-0.200060	expressions may have undesired effects.
-0.165195	S. Warren, Jr.: "Hacker's Delight".
-0.236536	b; will make 32 AND-operations
-0.314736	CPU- specific optimizations in precompiled
-0.325397	the values because a typo
-0.354258	derived class (see page 51).
-0.165195	"Intel® C++ Compiler Documentation". Included
-0.281589	down and restarted anyway. Updates
-0.165195	mangling. The characters '?', '@'
-0.643804	7.19 Class member functions (methods)
-0.353554	n is a loop count.
-0.237734	listing /FA -S - masm=intel
-0.236255	being said, I must warn
-0.165195	more reproducible time measurements: warm
-0.314628	The compiler has not noticed
-0.798584	efficiency of different C++ constructs........................................................................
-2.034309	- n.a. n.a. - a<<b<<c
-0.538189	error known as memory leak.
-0.230119	objconv or a similar utility
-0.314735	is extremely complicated and clumsy,
-0.237884	in the table. The 16-byte
-0.335699	an array to all zeroes.
-0.324323	this section for some caveats.
-0.228180	begins at the label $B1$2:.
-0.237908	such as spell-checking and repagination
-0.347499	Induction++; ; point to a[i+2]
-0.357696	make sure the compiler recognizes
-0.357600	__fastcall keyword is not recognized
-0.165195	a call to _endthread() cleans
-0.403910	then the & operator (bitwise
-0.200060	the sake of cross-platform portability.
-0.563853	parentheses can be calculated independently.
-0.357747	example 9.5b. // Example 9.5b
-0.349753	have the time to answer
-0.432389	classes in the STL (Standard
-0.357747	be used: // Example 13.2.
-0.630083	ebx, DWORD PTR [edx] adds,
-0.293908	assembly code or use objconv
-0.511140	container for a specific purpose:
-0.226798	but there are serious limitations
-0.237823	x4*x4; double x10 = x8*x2;
-0.237657	how to overcome this limitation).
-0.218601	2 to x 43 speculatively
-0.463154	compiler-generated code in the disassembly
-0.237409	Wikipedia under CPU cache (en.wikipedia.org/wiki/L2_cache).
-0.293624	not necessary when no attempt
-0.346062	gives a+b=0, and then 0+1.23456
-0.200060	(time after) - (time before)
-0.351232	an index of memory blocks.
-0.433219	use the header file timingtest.h
-0.294239	2" The dispatching to C1::Disp()
-0.237480	have a special loop predictor.
-0.237823	lines is 8*1024/64 = 128.
-0.510304	profilers are not always accurate,
-0.218601	types: long long, double. Misaligned
-0.357515	See www.agner.org/optimize and the FAQ
-0.236875	it has been called before.
-0.592341	are part of the Xnu
-0.212332	on anything else being initialized.
-0.811557	be able to do so).
-0.470495	flag in the compiler. Remember,
-0.165195	floating point -ffast-math /fp:fast /fp:fast=2
-0.235997	static linking (e.g. option /MT).
-0.165195	vector math library (VML, MKL).
-0.622899	class D : public B1
-0.539643	A disadvantage is that CParent::Hello()
-0.353837	these compilers that a user-defined
-0.232725	intended for finding hot spots,
-0.572969	for the calculation of B.
-0.325337	more useful methods for exploiting
-0.314759	size) % (number of sets).
-0.345994	the diagonal have been lost
-0.451050	2; i++) a[i] = i+1;
-0.314765	when a thread is terminated.
-0.226798	execution considerably. Another serious burden
-0.251379	|| (a&&b&&c) = a&&(b||c) (a&&!b)
-0.525100	illustrates how to use SafeArray:
-0.291792	8 16 char 128 Is8vec16
-0.452526	processor has a particular weakness
-0.237174	to security. Standard C++ imple-
-0.218601	processing and image processing. Yeppp.
-0.231919	or performance problems. Avoid nested
-0.344285	dest, double const & source)
-0.212332	optimization", Coriolis group books 1994.
-0.237543	of the operands has side
-0.453795	avoided unless you have ample
-0.337668	can be 64 bits (MMX),
-0.529061	μs is less than 1/50
-0.212332	Static linking (multithreaded) /arch:AVX /openmp
-0.324626	This is because we forgot
-0.290384	and increment. The three clauses
-0.165195	thread affinity mask. Poor reproducibility.
-0.483688	faster than x = -abs(x);.
-0.237734	x- x ----- - x-xxx
-0.200060	time1 = ReadTSC(); CriticalFunction(); timediff[i]
-0.165195	a to be signed. Be
-0.232352	1.f); // initialize sum for(inti=0;i<16;i+=4){
-0.347819	for issuing an error message.
-0.236876	a particular subtask before coordination
-0.467245	likely in a more distant
-0.382504	& enum Weekdays { Sunday
-0.709946	; unused label ; restore
-0.165195	b memcpy(b, a, sizeof(b)); 47
-0.200060	"More Effective C++". Addison-Wesley, 1996.
-0.341528	are always available from www.agner.org/optimize.
-0.165195	0; row < NUMROWS; row++)
-0.356301	while loop is to resume
-0.165195	x---- x---- ----- ~(~a)=a x-xxxxx--
-0.790678	to the same memory areas.
-0.165195	Intel/MASM syntax: __asm fld qword
-0.325272	for each test // Repeat
-0.347407	-ipo No exception handling /EHs-
-0.358082	and b in a union:
-0.237349	make the SelectAddMul example (12.4e)
-0.236090	that a low-priority thread steals
-0.165195	a loop of ADC (add
-0.453370	time an object is moved,
-0.357395	the sequence to be moved.
-0.407634	threads use different memory areas,
-0.200060	fraction 2 52 , longdoublevalue
-0.236761	here in a rather unconventional
-0.237707	function can modify x *const_cast<int*>(&x)
-0.346284	not using position-independent code (option
-0.200060	x-xxxxxx- a*0=0 --xxxx-xx a*1=a x-xxxxx-x
-0.165195	with new or malloc. Handles
-0.347494	cause all kinds of strange
-0.319431	they point to become invalid,
-0.200060	bytes without cache MOVNTDQ _mm_stream_si128
-0.165195	have an option (Windows: /Gy,
-0.218601	and leave them enabled (there
-0.559092	math functions such as sqrt
-0.237823	+ d.x; a.y = b.y
-0.570822	that do not support SSE.
-0.285364	by setting the fraction bits:
-0.236454	long int 64 0 264-1
-0.237760	higher priority than code generality.
-0.356776	8.4 double a = sin(0.8);
-0.237908	at address esp+8 and esp+12
-0.231919	be convenient for adding bounds-checking
-0.237874	only one, auto_ptr that owns
-0.236846	2.0; i >= 0; i--,
-0.165195	Linux compiler, or vice versa.
-0.357420	replaced by the function body.
-0.237816	like string, wstring or CString
-0.272306	378.7 168.5 513 513 58.7
-0.237829	Time // Serialize // Prevent
-0.287894	very expensive. A limited "express"
-0.200060	code bloat and complexity (en.wikipedia.org/wiki/Standard_Template_Library).
-0.629963	the sign bit to zero:
-0.311821	Example 7.27 float x; *(int*)&x
-0.165195	using position-independent code (option -fno-pic).
-0.237435	jobs. For example, one tread
-0.236859	2048 bytes = 4 rows.
-0.237934	the memory bus is saturated.
-0.325426	cache lines follow the rows,
-0.292246	a single result. An uncaught
-0.355330	ways than by a macro,
-0.165195	mainstream next year. Ignoring virtualization.
-0.237927	Beginners are advised to seek
-0.462195	function instead of a macro.
-0.336236	the way microprocessors are constructed.
-0.237198	used for all static data,
-0.323721	+ 2; } void FuncB
-0.314694	without any option that limits
-0.272304	x^2, x^3, x^4 F32vec4 xx4(x4);
-0.357959	user. Time is a precious
-0.165195	Guide for AMD Family 15h
-0.236983	lot of irrelevant software installed,
-0.357395	many files to be installed.
-0.341626	before converting to floating point:
-0.237882	Compiler v. 11.1 for IA-32/Intel64,
-0.200060	= char 16 SSSE3 _mm_perm_epi8
-0.237734	if ((unsigned int)(i - min)
-0.356952	compiler combined with the LLVM
-0.593132	individually. Example: // Example 7.40a
-0.357747	following way: // Example 7.40b
-0.657506	is needed: // Example 7.40c
-0.212332	the code must compute (FuncRow(i)*columns
-0.508718	for Mac OS X (Darwin)
-0.348984	pow(x,n) As we can see,
-0.237955	programming nowadays stress the importance
-0.532853	computer is not always comparable
-0.325208	languages are implemented with interpretation.
-0.294120	value as xn = x∙xn-1,
-0.224962	though this only happens rarely.
-0.232725	pointer not aliased #pragma optimize("a",
-0.237451	methods of rounding, but neither
-0.476267	always able to predict correctly
-0.314759	stride) = (number of sets)
-0.632729	2.6 Choice of function libraries........................................................................................
-0.294239	algorithm that comes to mind.
-0.339087	information about supported instruction sets,
-0.237884	5: calling conventions. The dot
-0.237273	y = (a1*b2 + a2*b1)
-0.345248	if they are often mispredicted.
-0.351866	set in the variable Day.
-1.324997	is likely to be mispredicted,
-0.421456	future. 12.3 Automatic vectorization Good
-0.200060	to test // (time after)
-0.443221	b; static const float OneOrTwo5[2]
-0.228178	register variables are temporary intermediates,
-0.336311	read before p is incremented.
-0.349586	pointer p has been incremented,
-0.165195	know how this works, here's
-0.346466	the RAM size is insufficient.
-0.237816	computer while he or she
-0.646210	compile time or a not-too-big
-0.345982	speed because of cache evictions
-0.349349	access to the sign bit,
-0.338762	function name Instruction set Prefetch
-0.293743	counters in each CPU core).
-0.212335	on) __restrict __restrict __declspec( noalias)
-0.294187	Table 9.1. Time for transposition
-0.237365	threads will invalidate each other's
-0.307866	Some guidelines are provided below,
-0.200060	copying without effectively preventing illegitimate
-0.287894	the user and prevent legitimate
-0.222406	modules and header files. 121
-0.307002	abstraction in the logical architecture
-0.237122	CPU can hold many renamed
-0.358014	mouse move. It is unacceptable
-0.352413	of integers and other hardware-related
-1.065943	bytes. first byte at 12,
-0.200060	and vector operations (chapter 12)
-0.237908	Third Edition, 2005; and "More
-0.165195	parallel processing. Scott Meyers: "Effective
-0.237829	SelectAddMul_pointer = &SelectAddMul_dispatch; // Dispatcher
-0.236862	when I die. See www.gnu.org/copyleft/fdl.html.
-0.507288	sources. For example, the Boost
-0.200060	mov ebx,eax / shr ebx,31
-0.224962	in programs where security matters.
-0.234683	} // Or #include <ia32intrin.h>
-0.237927	for disk operations to finish.
-0.325037	many common programs use inappropriate
-0.236453	switch (n) { case 0:
-0.237908	Intel: "Intel 64 and IA-32
-0.237776	cannot be mixed with x87
-0.200060	algebra reductions: a+b=b+a a*b=b*a a+b+c=a+(b+c)
-0.350233	= 5.0f; b = 6.0f;
-0.218601	declaring it inside {} brackets.
-0.294239	its pointer set to NULL.
-0.294043	(b*2.0)/3.0 rather than as b*(2.0/3.0)
-0.237816	static libraries (.lib or .a),
-0.462534	compare it to the tolerance
-0.224963	of function libraries Test Processor
-0.234816	to the disk cache. Files
-0.237549	development time, usability, program compactness,
-0.294082	Branches are implemented by (partial)
-0.165195	bit manipulation tricks Michael Abrash:
-0.293915	to sum2 from time T+1
-0.235504	OneOrTwo5[(b!=0) ? 1 : 0]
-0.165195	"Effective C++". Addison-Wesley. Third Edition,
-0.355097	| Friday) in example 14.7b
-0.552401	a graphical user interface. Otherwise
-0.407095	for the first two suggested
-0.593132	prediction. Example: // Example 14.3a
-0.657506	lookup table: // Example 14.3b
-0.473101	a dynamic link library (DLL)
-0.284379	== 5 #define FUNCNAME SelectAddMul_SSE41
-0.324564	= a + 2 thenaandbcannot
-0.165195	v. 4.1.0, 2006 (Red Hat).
-0.165195	int 32 -231 231-1 int32_t
-0.314703	for recovering or for issuing
-0.382872	class data member is unchanged
-0.337273	nature of the stack. Deallocation
-0.289801	reductions at their own initiative
-0.314645	been criticized for code bloat
-0.229255	point numbers is inefficient. Division,
-0.222408	// Linux syntax 90 Gives
-0.809137	does the same as C-
-0.165195	// Example 7.29b floata; boolb=0;
-0.237908	features of Java and C#
-0.228179	true (1) or false (0);
-0.237718	input never exceeds an acceptable
-0.237908	calls alternately FuncA and FuncB,
-0.200060	with the rightmost 1-bit removed.
-0.407680	integer and this will trigger
-0.352367	programming language and a basic
-0.165195	0; j < columns; j++)
-0.538178	checking multiple values at once...................................
-0.289309	same as for switch statements,
-0.358206	the one it is compiling.
-0.314759	some typical sources of frustration
-0.325014	in memory takes only 2-3
-0.237861	set and map are prone
-0.265219	access....................................................................................................... 22 3.14 Context switches.....................................................................................................
-0.319783	who want to go deeper
-0.234385	the equivalent if(!(a || b))
-0.235048	latter has one operator less.
-0.357794	mathematical notion of a "function".
-0.212332	Number of simultaneous lookups Max.
-0.236422	N; } T & operator[]
-1.611000	SSE2 instruction set is enabled:
-0.456326	block that the object owns.
-0.829743	safe if there are wrapper
-0.272306	and not __INTEL_COMPILER __INTEL_COMPILER 161
-0.200060	and _WIN64 _M_X64 _M_X64 162
-0.538730	simple periodic pattern can be,
-0.230121	applications (e.g. in linear algebra)
-0.325392	than it used to be.
-0.237829	above doesn't work // Re-do
-0.237928	the fundamental laws of algebra.
-0.341701	by making longer time slices.
-0.355908	while loops, then the transformation
-0.382726	x^2 // x^4 // x^8
-0.237115	version 2.20, glibc version 2.11
-0.218603	code to test. disable power-save
-0.237490	used in two other situations:
-0.165195	to make a sensible balance
-0.251379	0.44 0.40 n.a. 1.00 0.35
-0.421400	known CPU model is over.
-0.331640	assignment operator, or an over-
-0.234539	(a<b && b<c && a<c)
-0.643653	b1, b2; y = a1/b1
-0.235504	"=m"(n) : "m"(x) : "memory"
-0.165195	v. 11.1 for IA-32/Intel64, 2009.
-0.351556	at least in some situations,
-0.283157	even have a false vendor
-0.165195	7.15b SafeArray <float, 100> list;
-0.350717	&Object2; p->Hello(); } // Non-polymorphic
-0.347518	be quite a good investment.
-0.236422	table. The 16-byte instructions MOVNTPS,
-0.451075	have long double precision (80
-0.382568	NUMCOLUMNS = 100; int matrix[NUMROWS][NUMCOLUMNS];
-0.165195	explanation of return prediction). 149
-0.813475	x) { _mm_storeu_si128((__m128i *)d, x);}
-0.499992	false. The value of cc[i]+2
-0.236793	64 Is32vec2 32 64 Iu32vec2
-0.322764	4 unsigned int 128 Iu32vec4
-0.647666	< NumberOfTests; i++) { time1
-0.611996	be useful for making plug-ins
-0.234814	8 32 char 256 Vec32c
-0.237657	i is outside this interval,
-0.578657	size known at compile time?
-0.325265	>= 8) SelectAddMul_pointer = &SelectAddMul_AVX2;
-0.224965	For example, a Core i7
-0.332394	Microsoft platform software development kit
-0.796749	size of the array i)
-0.224962	vector parameters Vec4f polynomial (Vec4f
-0.357515	of resources, and the transitions
-0.218601	the table is cached. Usually
-0.165195	d; unsigned int u[2]} a[size];
-0.347531	87 9.2 Cache organization ...................................................................................................
-0.294239	the nearest element to x?"
-0.235153	the CPU has problems separating
-0.237200	as a 32-bit number (the
-0.330555	void f(); }; void g()
-0.237706	assembly language programming, compiler technology,
-0.349974	smaller and the array 800
-0.353562	Intel CPUs cannot be tolerated.
-0.237007	data can exceed 2 Gbytes.
-0.237131	CChild2 Object2; CChild1 * p1;
-0.237716	desired parameters typedef int CriticalFunctionType(int
-0.165195	smart pointer is created, deleted,
-0.824805	a*1 = a - a/1
-0.212332	the drawbacks of C++. Yet,
-0.236793	MS compiler: __int64 64 -263
-0.322948	the value it was assigned
-0.222406	two main principles here: functional
-0.237756	point value written as 2eee
-1.027869	that it is not human
-0.356446	is intended as a plug-in
-0.251379	mode (SSE2): #include <xmmintrin.h> _mm_setcsr(_mm_getcsr()
-0.348892	relative difference less than 2-20,
-0.355449	for D are not yet
-0.237908	as Taylor expansions and Newton-Raphson
-0.356770	User complaints should be regarded
-0.237365	than to draw each pixel
-0.333991	by setting a thread affinity
-0.513357	4 PUBLIC ?Func@@YAXQAHAAH@Z ?Func@@YAXQAHAAH@Z PROCNEAR
-0.237372	www.intel.com. (See also page 119).
-1.158948	rather than on the stack).
-0.165195	lines. A few decades ago,
-0.520281	returns. Alternatively, you may reuse
-0.165195	not get any answer. Beginners
-0.237829	SelectAddMul_pointer = &SelectAddMul_SSE2; // Error:
-0.697494	strings in a memory pool,
-0.345994	the code have been reordered,
-0.200060	fraction 2 23 , doublevalue
-0.165195	131. Intel Performance Primitives (IPP).
-0.237563	for the project at hand.
-0.358082	a typo in a hand-
-0.237908	bitwise operators (& and |)
-0.234540	more efficiently by better standardization
-0.165195	problems and planned solutions. Patches
-0.237908	importance of structured and object-oriented
-0.236844	you should not call WriteFile
-0.200060	library through the symbolic link.
-0.440218	122 13.1 CPU dispatch strategies........................................................................................
-0.313648	own set of performance monitoring
-0.218601	printf("Alpha"); break; case 1: printf("Beta");
-0.834676	be more efficient to re-use
-0.237934	This non-inlined copy is dead
-0.345162	or micro-op cache. The Core2
-0.453757	static has a different meaning.
-0.452526	integer has a particular meaning,
-0.222406	- 2014. Last updated 2014-08-07.
-0.294043	is calculated internally as (int)&matrix[0][0]
-0.356260	array i) { // Safe
-0.237451	is not i but i*12,
-0.761706	by using the keyword __thread
-0.610955	the branch target buffer (BTB).
-0.323147	static keyword has several meanings
-0.235839	its out-of- order calculation capabilities.
-0.700148	explained in the next paragraph.
-0.293998	+ b;} }; int Sum2(S3
-0.165195	optimization. en.wikipedia.org/wiki/Compiler_optimization. ISO/IEC TR 18015,
-0.237523	for different microprocessors, different alignments
-0.357561	// Reset floating point status:
-0.231419	the available vector classes. Including
-0.235505	an interrupt, e.g. every millisecond.
-0.527824	a; double b;}; S1 list[100],
-0.237884	Choice of microprocessor The benchmark
-0.165195	each bit in nn ifbit=1
-0.526445	g(x)); In this example, f(x)
-0.327429	be calculated as follows: floatvalue
-0.265219	inside a class definition. Inlining
-0.230119	would assume that seconds remains
-0.314790	the performance under the worst-
-0.487673	for a list of titles.
-0.503366	100 numbers: // Example 11.2a
-0.631881	runtime type identification (RTTI) /GR–
-0.235341	C1 { public: ... ~C1();
-0.237273	b;} vector operator + (vector
-0.314790	clock cycles. Obviously, the initial
-0.165195	Microsoft C++ compilers www.agner.org/ optimize/#vectorclass
-0.237131	{ return powN<true,N/2>::p(x) * powN<true,N/2>::p(x);
-0.334974	and sum2 are called accumulators.
-1.439329	as explained on page 22.
-0.508408	and are in fact addressed
-0.323721	x; ... } void F0()
-0.312485	Linux, BSD, Intel-based Mac OS,
-0.212332	SVML v.10.2 & earlier vmlsExp4
-0.575438	is to compile with -mcmodel=large,
-0.237927	range from -128 to +127.
-0.237319	a and b double precision:
-0.236960	0 <= n < 223
-0.538713	= a + 1; x[1]
-0.294217	performance. Stefan Goedecker and Adolfy
-0.835961	const int SIZE = 64;
-0.236983	Even worse, many software products
-0.236534	www.agner.org/ optimize/#vectorclass Include file dvec.h
-0.340535	but not dynamic libraries (.dll
-0.236090	and decoded in several stages
-0.356847	of this function is InstructionSet().The
-0.652311	a line size of 64.
-0.165195	floppy disks and USB sticks
-0.443370	zation by multiple threads Parallelization
-0.235945	64 bytes. Each line covers
-0.236362	into force when I die.
-0.353680	methods described on page 153.
-0.236454	unsigned int 16 0 65535
-0.200060	/Fm Generate optimization report /Qopt-report
-0.353837	often happen that a low-priority
-0.165195	e.g.: // Example 12.1b. Vectorization
-0.224962	always true/false Loopunrolling x-xxxx--x Profile-guided
-0.325390	to have constructors and destructors.
-0.236044	windows, graphic brushes, etc. Locked
-0.234028	considerations of efficiency, platform independence,
-0.536080	14.11 Static versus dynamic libraries............................................................................
-0.531504	But the cost of fine-tuning,
-0.810302	mispredicted only when it exits.
-0.460432	predict that the loop exits,
-0.234814	The details about name mangling
-0.349879	supported by the Gnu utilities
-0.251379	.................................................................................... 124 13.3 Difficult cases........................................................................................................
-0.349586	This problem has been alleviated
-0.293416	be calculated with two decimals,
-0.235946	execution times per matrix cell
-0.294178	function or operator that transfers
-0.237823	= order(i); list[j].a = list[j].b
-0.466218	{ j = order(i); list[j].a
-0.357747	the arrays: // Example 12.4a.
-0.237349	times faster than example 12.4a,
-0.293008	a shift operation. For example,a
-0.356770	Accessibility guidelines should be obeyed.
-0.331819	removed, all resources are sufficient,
-0.480188	performance of the final product.
-0.230828	< 100. pop ebx restores
-0.165195	(gcc v. 4.5.2, July 2011).
-0.200060	Algorithms that are inherently serial,
-0.555287	that have to be restored
-0.212335	of the code. C#, managed
-0.287405	in the YMM registers. Disadvantages
-0.224965	up with the expected real-time
-0.165195	1./24., 1./120., 1./720., 1./5040., 1./40320.,
-0.236136	and a processing speed exceeding
-0.265216	0/a=0 ---x---xx (-a==-b)=(a==b) ---xx---- (a+c==b+c)=(a==b)
-0.234540	report /Qopt-report -opt-report Table 18.2.
-0.165195	to e.g. a menu click
-0.350258	= 1.0E8, c = 1.23456,
-0.234935	Many software programs automatically download
-0.165195	better: -Ofast -mveclibabi -fopenmp /Qopenmp
-0.237739	blocking or tiling. This technique
-0.331894	The frequent allocation and de-allocation
-0.491193	x*8 is replaced by x<<3,
-0.349971	is not the best optimizer.
-0.236325	(See page 137 about division).
-0.294243	remaining bits represent a monotonically
-0.356483	linear array with a top-of-stack
-0.313896	multiple platforms or multiple configurations
-0.237908	can turn on and off.
-0.165195	Vec32uc Vec16s Vec16us Vec8i Vec8ui
-0.293314	13.1, Requires binutils version 2.20
-0.237270	1.09 1.25 1.61 n.a. 2.23
-0.165195	on C++ Performance". www.open- std.org/jtc1/sc22/wg21/docs/TR18015.pdf.
-0.294239	have more references to relocate,
-0.237861	any answer. Beginners are advised
-0.294243	tasks like pressing a button
-0.439878	them into the right positions
-0.325062	program flow. However, this did
-0.165195	short int 128 Iu16vec8 Vec8us
-0.234683	in other compilers. #include <excpt.h>
-0.481311	there is only a minimal
-0.410322	in Intel's Math Kernel Library,
-0.200060	they are created. Far Systems
-0.237816	a whole workday or more.
-0.324178	new update or even telling
-0.237908	kinds of strange and unexpected
-0.228178	order to make profiling feasible.
-0.294120	*x; double x4 = x2*x2;
-0.897077	the latest version of Mathcad
-0.777355	library with the option -mveclibabi=acml.
-0.236633	is a very user friendly
-0.222406	lot of background processes running,
-0.351332	the number of different targets
-0.292292	message and then calls exit.
-0.232348	The cost of task switching.
-0.352382	may consider the following alternatives:
-0.420967	103 12 Using vector operations...............................................................................................
-0.457703	in registers (see page 27).
-0.618889	8.5 Compiler optimization options ...................................................................................
-0.537598	a[i] = log(b[i]) + log(c[i]);
-0.237790	object is copied by assignment,
-0.237790	auto_ptr to another by assignment.
-0.350845	are very difficult to diagnose.
-0.357747	return statement: // Example 8.9b
-0.421391	compilers reduced 15.1a to 15.1c).
-0.218601	elements }; vector() {} vector(float
-0.741509	a; int b; int c;};
-0.593132	to. Example: // Example 8.9a
-0.659483	versions for different instruction sets...........................
-0.581992	compiler for 32-bit Windows. Integrates
-0.236560	with alignment problem void AddTwo(int
-0.165195	Avoid the function scanf. Violation
-0.870714	to divide the work evenly
-0.349586	hot spot has been identified,
-0.325409	libraries 113 Number of simultaneous
-0.463685	use integer operations for incrementing
-0.308940	then measurements can become imprecise
-0.237315	from Intel. See Intel Technology
-0.503054	8.17 char a = -100,
-0.335409	see whether the call p->f()
-0.165195	long clock; __cpuid(dummy, 0); DontSkip
-0.165195	table of 1/n! 1., 1./2.,
-0.946498	be replaced by a blend
-0.381949	y = d + e
-0.294217	the full generality and flexibility
-0.165195	each version FuncType SelectAddMul, SelectAddMul_SSE2,
-0.236907	char const * const Greek[4]
-0.218605	CPU clock frequency (in Windows:
-0.356483	high-level language with a wealth
-1.051938	or if it is correlated
-0.292657	can be carried out independently
-0.237829	list[i] << endl; // Output
-0.284379	bounds checking template <typename T,
-0.284379	by template template <typename T>
-0.251379	for each version FuncType SelectAddMul,
-0.218605	throw() specification. The empty throw()specification
-0.235837	example,a * 16is calculated asa
-0.212332	stored are containers 93 themselves.
-0.348892	is not less than ARRAYSIZE.
-0.224965	parallel because it defines electrical
-0.324850	int x; const double A2
-0.235504	fistpl %0 " : "=m"(n)
-0.294217	by S. Goedecker and A.
-0.200060	The integrated development environment (IDE)
-0.200060	compiler optimization. en.wikipedia.org/wiki/Compiler_optimization. ISO/IEC TR
-0.317445	of different function libraries. Numbers
-0.236253	Applications that use large amounts
-0.276612	Windows, Intel/MASM syntax: __asm fld
-0.283157	comparison, which is fast. Calculating
-0.165195	latest version of Mathcad (v.
-0.288320	that draws a whole polygon
-0.212332	----- x---- x---- ----- ~(~a)=a
-0.330322	fast=2 Simple member pointers /vms
-0.237697	then the profiler may sample
-0.237573	to answer questions from everybody.
-0.237813	that I consider it unwise
-1.163975	C++ compilers and operating systems").
-0.314694	another. The object that looses
-0.251379	p2; p2 = &Object2; p2->Hello();
-0.357747	induction variables: // Example 8.23b.
-0.237273	b) + (c + d);
-0.218603	except for char pointers. 144
-0.235505	definition language defines hardware circuits
-0.657506	lookup table: // Example 14.1b
-0.587365	} u; int n; 143
-1.129580	like this: // Example 14.1a
-0.237934	array element a[i] is ecx+eax*4.
-0.212332	convert float to int. Reinterpret
-0.301869	int min = 100, max
-0.212332	overflow. Table 8.1 (page 77)
-0.236655	a textbook on test theory.
-0.165195	it Use a "move constructor"
-0.236094	compiler versions 7 through 14,
-0.237606	or NAN (Not A Number)
-0.237600	number of cores will grow
-0.364868	to 32 bit systems: Pointers,
-0.308939	divisible by vector size. Unpredictable
-0.237816	use the GetTickCount or QueryPerformanceCounter
-0.530682	ways to divide the workload
-0.237790	to replace u[1] by u[0].
-0.852049	member of the same class).
-0.349024	PC. Similarly, we are seeing
-0.226799	is restarted anyway. Software distributors
-0.165195	Manual", Volume 1, 2A, 2B,
-0.165195	C++ v. 4.1.0, 2006 (Red
-0.236291	doesn't handle current CPUs optimally.
-0.354610	from aligning the data optimally,
-0.294240	a technological point of view.
-0.228179	similar thanks to heavy competition.
-0.308549	Digital Mars Compiler v. 8.42n,
-0.165195	of code optimization", Coriolis group
-0.354883	manuals. I want to thank
-0.165195	Performance". www.open- std.org/jtc1/sc22/wg21/docs/TR18015.pdf. OpenMP. www.openmp.org.
-0.318578	instruction set is particularly interesting
-0.224962	(a|b)&(a|c) = a|(b&c) x-xxxx--x ~a&~b=~(a|b)
-0.497913	;r ; unused label ;eax=addressofa
-0.285362	by using the declaration "static"
-0.561760	spaced a multiple of 0x800
-0.352599	Print heading You can subtract
-0.314435	to know how this works,
-0.165195	(); __asm__ (".type CriticalFunction, @gnu_indirect_function");
-0.343297	the possibility for other optimizations,
-0.237908	operations on vectors and matrixes.
-0.237523	computers have very different speeds.
-0.279508	SSE3 pmmintrin.h Suppl. SSE3 tmmintrin.h
-0.165195	public: c1() : x(0) {};
-0.451170	risk that the programmer forgets
-0.224962	the advice given above. 7.
-0.425271	exponent is a positive integer:
-0.337146	about Func1 when compiling module2.cpp.
-0.234936	putting the smallest members last:
-0.324368	Example 9.6b 64 64 14.0
-0.165195	FuncType SelectAddMul, SelectAddMul_SSE2, SelectAddMul_SSE41, SelectAddMul_AVX2,
-0.165195	are critical time consumers. Choose
-0.237273	list[j].a = list[j].b + list[j].c;
-0.218601	Sum3(S3 & r) {return r.a
-0.339032	called from many different places).
-0.294187	-msse2, -mavx, etc. for Linux)
-0.237816	the structure. Incrementing or decrementing
-0.236044	different screen resolutions, etc. Accessibility
-0.349791	constructors, and any other constructors.
-0.237319	x8 = x4*x4; double x10
-0.827908	} else { // Generic
-0.165195	256 F32vec4 F64vec2 F32vec8 F64vec4
-0.237934	code, which supposedly is system-independent,
-0.165195	n; i++) { 92 DynamicArray[i]
-0.987309	the compiler to do cross-module
-0.165195	in the compiler. Remember, therefore,
-0.237273	must compute (FuncRow(i)*columns + FuncCol(i))
-0.237790	functions in memory by requesting
-0.237556	mirror the remote data locally.
-0.462135	of calculations: // Example 8.3a
-0.657506	is available: // Example 12.4c.
-0.233050	The calculation here gives a+b=0,
-0.200060	are produced regularly. AMD: "Software
-0.312763	map or an assembly listing.
-0.637259	a register variable in eax.
-0.376467	objects in a computer game
-0.523362	& x) { // polynomial(x)
-0.237131	2) : (bb[i] * cc[i]);
-0.228179	or every code line. Time-based
-0.237908	procedures for installation and uninstallation
-0.311817	Floating point algebra reductions: a+b=b+a
-0.577699	bit so that the remaining
-0.237829	test sign bit // u.d
-0.165195	glibc version 2.11 ifunc branch).
-0.279510	..................................................................................................................... 38 7.11 Type conversions....................................................................................................
-0.330529	workplace and the system forbids
-1.439329	as explained on page 107.
-0.229257	for each allocated block. Walking
-0.165195	the IEEE standard 754 (1985).
-0.455292	is allocated is also de-allocated.
-0.272306	7.28 Templates...............................................................................................................57 7.29 Threads ..................................................................................................................
-0.200060	/Ox -O3 or -Ofast /O3
-0.237492	when software uses CPU dispatching:
-0.293743	optimized code with CPU dispatching,
-0.325208	requires a compiler with C++0x
-0.235835	(a1*b2 + a2*b1) / (b1*b2);
-0.165195	1./39916800., 1./4.790016E8, 1./6.22702E9, 1./8.71782E10, 1./1.30767E12,
-0.165195	double b;}; S1 list[100], *temp;
-0.237776	created. Far Systems with segmented
-0.237718	order to get an integral
-0.279508	(three parameters on CodeGear compiler).
-0.235340	to an existing program. Weighing
-0.236090	the position-independent code. These workaround
-0.165195	information is utilized appropriately. Users
-0.165195	programs where security matters. Problems
-0.165195	-Ofast -mveclibabi -fopenmp /Qopenmp -m32
-0.212335	as int, float, double, bool,
-0.200060	typically get the generic branch,
-0.200060	/arch:SSE3 -mssse3 /arch:SSSE2 -msse4.1 /arch:SSE4.1
-0.218601	relevant when testing worst-case performance:
-0.165195	the Professional and Enterprise editions).
-0.293042	then stored at address [ecx+eax*4].
-0.165195	not the best optimizer. Borland/CodeGear/Embarcadero
-0.392240	makes testing and maintenance easier.
-0.355623	ArraySize by the value 1000.
-1.073595	The advantage of using ready
-0.552354	The compilers I have studied
-0.875820	a = b + 0.666666666666666666667;
-0.235997	+= Z; Z += A2;
-0.331881	kernel in the so-called commpage.
-0.651069	how much time it uses.
-0.293609	be true. template<> class powN<true,0>
-0.165195	reads from addresses 0x2F00, 0x3700,
-0.165195	column < NUMCOLUMNS; column++) matrix[row][column]
-0.235998	(low numbers mean good performance).
-0.230119	mode. The next chapter describes
-0.237823	const float lookup[2] = {2.6f,
-0.236292	data cache and accessed non-sequentially
-0.348892	files while less than 1%
-0.165195	between recoverable and non-recoverable errors;
-0.292525	{ if (n & 1)
-0.736742	the cache line size (typically
-0.237874	a common error that hackers
-0.165195	how to avoid hard-to-find errors,
-0.357092	memory address by the formula:
-0.165195	to optimize this loop? Certainly
-0.376065	Each function call statement occupies
-0.224965	function libraries with internal multi-threading,
-0.234683	// Example 9.6b. #include "xmmintrin.h"
-0.458776	the memory space is occupied
-0.230121	512 512 matrix. My experimental
-0.813014	in the loop control condition:
-0.237563	where everything happens at runtime).
-0.165195	blocking: int r1, r2, c1,
-0.165195	Look at Exception Specifications, Dr
-0.507262	involves the overhead of switching
-0.348236	may interleave the two formulas
-0.444370	error is easy to trace
-0.420205	This is also called Single-Instruction-Multiple-Data
-0.200060	has three clauses: initialization, condition,
-0.237670	N> class SafeArray { protected:
-0.237931	their implementations reveal a zigzag
-0.325160	For more on this topic,
-0.212332	user feedback seriously. User complaints
-0.292624	bits (MMX), 128 bits (XMM),
-0.356867	to make a function local:
-0.356632	10 times rather than 20.
-0.224963	languages include C, C++, D,
-0.234936	to add statements like throw(A,B,C)
-0.350250	code cache and it fills
-0.334280	file) should be made local.
-0.236844	option that allows less precise
-0.165195	vector nontemporal Table 18.3. Predefined
-0.339034	one global and one local,
-0.645576	the diagonal are accessed column-wise.
-0.237800	Intel CPU’s. Another function __intel_cpu_features_init_x()
-0.212332	the so-called partial flags stall
-0.286168	for(i=0; i<300; i+=3){ list[i] =0;
-0.218601	on PC platforms. Graphics accelerators
-0.200060	Scott Meyers: "Effective C++". Addison-Wesley.
-0.294292	shift out sign bit: absvalue
-0.231419	more efficient than mov eax,0.
-0.165195	ptr x; __asm fistp dword
-0.165195	(128 vectors of inte- ger
-0.314706	and microprocessors work. The recommendations
-0.347870	the case of data decomposition,
-0.228179	a table of jump targets.
-0.357411	results, which may be undesired.
-0.292655	size (16 or 32 bytes).
-0.165195	and 72 for discussions. Turn
-0.357747	reference instead: // Example 12.6.
-0.535404	counters, as in example 7.32b.
-0.237315	Intel CPUs use Intel VTune,
-0.413690	number in the interval [1.0,
-0.237198	if statements (called static if),
-0.237934	because the macro is referencing
-0.456223	Intel's profiler is called VTune;
-0.165195	sometimes it does incredibly stupid
-0.236610	use double precision without worrying
-0.236876	string is checked before storing.
-0.624364	point variables and operators ......................................................................
-0.657506	table lookup: // Example 7.29b
-0.165195	Henry S. Warren, Jr.: "Hacker's
-0.218601	variables, arrays and objects. Storage
-0.165195	Division by a constant: Unsigned
-0.251379	processors are becoming increasingly blurred
-1.460680	For example: // Example 7.29a
-0.353462	= 1000; unsigned int dummy;
-0.356126	to obtain, such as eliminating
-0.291191	Windows, you may write FatalAppExitA(0,"Array
-0.314593	the intermediate code by emulating
-0.237115	if the current version satisfies
-1.356093	i; for (i = (int)n
-0.230120	by compiling the module with,
-0.165195	the information is utilized appropriately.
-0.165195	80.9 512 512 378.7 168.5
-0.237019	x10 = x8*x2; return x10;
-0.165195	168.5 513 513 58.7 168.3
-0.288318	conditions. Programs that produce streaming
-0.293204	8*x + 2 return (2.5f
-0.769511	Note the difference between commas
-0.212332	b2, y1, y2, reciprocal_divisor; reciprocal_divisor
-0.504667	format instead of the usual
-0.236859	the least recently 4 ?Func2@@YAXQAHAAH@Z
-0.234683	compilers. #include <excpt.h> #include <float.h>
-0.237908	(int)(&list[0]) + 100*16, and temp++
-0.354468	know what you are doing.
-0.237934	the memory footprint is unreasonably
-0.237910	language gained remarkably in popularity
-0.200060	the DLL is relocated (rebased)
-0.265216	86 add add cmp ja
-0.292624	bits (XMM), 256 bits (YMM),
-0.165195	in the interval [1.0, 2.0)
-0.325359	lineage of software that dates
-0.539114	function } }; // Called
-1.158653	= a - n.a. (-a)*(-b)
-0.237716	100; int matrix[NUMROWS][NUMCOLUMNS]; int row,
-0.218603	matrix[c][r] at its mirror position
-0.421396	class doesn't need a constructor.
-0.237908	Manual", Volume 2A and 2B.
-0.237931	or NAN (not a number).
-0.324029	the even-numbered logical processors (0,
-0.355591	code motion. See page 78.
-0.309862	fffff is the binary decimals
-0.234815	You may make separate executables
-0.237861	other protection means are among
-0.234541	independent code, see below. Installing
-0.234386	(approximately): if (absvalue > largest_abs)
-0.458768	C++ syntax in example 8.15b.
-0.224963	to integers use truncation towards
-0.336351	occur in the multiplication b[i]*c[i],
-0.200060	functions static where appropriate. Compiler-specific
-0.877464	This instruction set is maintained
-0.237626	the address calculation more efficient:
-1.021114	array static inline __m128i LoadVectorA(void
-0.165195	dummy; double a[arraysize], b[arraysize], c[arraysize];
-0.346497	not only improve the performance,
-0.864564	how to make a sensible
-0.212335	or -Ofast /O3 -O3 Interprocedural
-0.235717	access these instructions. Function Assembly
-0.314097	template <int N> class powN<true,N>
-2.034309	- n.a. n.a. - a+b+c
-1.005481	unless there is a compelling
-0.504487	is included in the profile.
-0.339440	accessing container elements are cumbersome
-0.548711	400, last byte at 403
-0.224962	xmmintrin.h SSE2 emmintrin.h SSE3 pmmintrin.h
-0.579537	user is likely to experience.
-0.563925	versions of a program executable:
-0.341854	(Scalar means not a vector).
-0.222406	Most IDE's (Integrated Development Environments)
-0.436041	work well on non-Intel machines?
-0.237882	subsequent manuals are for those
-0.237823	sign bit: absvalue = a[i].u[1]
-0.317450	of a copy protection scheme
-0.357830	specific preferences for the IDE,
-0.292525	#define N1 (N & (N-1))
-0.455619	be very useful for investigating
-0.288321	to vectorize, or #pragma novector
-0.237823	= 100, max = 110;
-0.200060	and searching for vacant spaces.
-0.165195	show a discrete icon signaling
-0.352623	class need not be passed
-0.234683	vector classes (Intel) #include <pmmintrin.h>
-0.353747	calculations on the first sub-vector.
-0.526010	representation according to the IEEE
-0.290962	cycle. The OR operator (|)
-0.222406	a condition is relatively expensive,
-0.378523	you assume that model N-1
-0.237861	the above sections are dominating
-0.222410	Fastcall function __fastcall __attribute(( fastcall))
-0.165195	it has been brutally interrupted.
-0.291792	12.2 128 128 128 17.4
-0.283157	For example, in interpreted script
-0.356776	{2.6f, 1.5f}; a = lookup[b];
-0.806174	p) { return _mm_loadu_si128((__m128i const*)p);}
-0.165195	unsigned char 128 Iu8vec16 Vec16uc
-0.218601	of four floats F32vec4 xxn(x4,
-0.381949	a[i] = r + i/2;
-0.606180	floating point to integer According
-0.235048	compiler technology, and microprocessor microarchitecture.
-0.293008	of every version. For team
-0.234386	{ // abs(u.f) > abs(v.f)
-0.200060	int CriticalFunction (); __asm__ (".type
-0.293008	every intermediate version. For one-man
-0.637207	Transforming serial code for vectorization.............................................................
-0.265216	then we need metaprogramming. None
-0.233593	as inline function #define MAX(a,b)
-0.326627	code cache, branch target buffer,
-0.451021	cycles, then we can roughly
-0.350745	// return y = pow(x,n)
-0.251379	| ((C & 3) <<6
-0.463154	every element in the arrays:
-0.165195	for AMD Family 15h Processors".
-0.165195	several different profiling methods: Instrumentation:
-0.499992	explanation. The value of i&15
-0.165195	declaration "static" or "__attribute__((visibility ("hidden")))".
-0.343561	are often used as buffers
-0.289801	= double 2 AVX2 _mm256_i64gather_pd
-0.237813	macro is referencing it twice.
-0.165195	small embedded systems. Today (2013)
-0.382901	I have tested the capability
-0.453381	that a call to _endthread()
-0.237823	{ // polynomial(x) = 2.5*x^2
-0.237816	__asm ("int 3"); or __debugbreak();.
-0.352413	user interface and other system-
-0.314606	there are many function calls,
-0.237927	It is tempting to fine-
-0.640696	in the function library asmlib,
-0.237934	class? This chapter is aiming
-0.345994	the program have been found,
-0.452526	long on a particular subtask
-0.237908	the functions lrintf and lrint.
-0.237131	= log (b[i] * c[i]);
-0.165195	long int 32 -231 231-1
-0.381985	CriticalFunction_Dispatch; // Function pointer serves
-0.237494	processors. Details about instruction latencies
-0.318579	distribution to a limited audience
-0.165195	level linking (remove unreferen- ced
-0.290015	event that it becomes full.
-0.352666	use #if instead of if.
-0.165195	by the user. Feature bloat.
-0.294190	separate function library. The radical
-0.406607	lot of cache space. Putting
-0.294190	and invalid pointers. The absence
-0.579466	FuncA(i); } else { FuncB(i);
-0.348360	A simple way of solving
-0.463318	a waste of the programmers'
-1.254274	known at compile time. Four
-0.572969	finished the calculation of (a+b).
-0.218603	thread. Pointers to contained objects?
-0.795026	result is stored in y.
-0.447024	check for array bounds violations,
-0.357533	cycles (depending on the processor)
-0.237606	Booth: "Inner Loops: A sourcebook
-0.224962	and double vectors SSE3 horizontal
-0.576213	than a floating point comparison.
-0.658818	dependency chains can be broken
-0.224965	* DynamicArray = (float *)alloca(n
-0.523165	Other tasks such as spell-checking
-0.228181	((unsigned int)i >= (unsigned int)size)
-0.593132	used. Example: // Example 7.34a.
-0.293750	(single precision requires only SSE).
-0.488075	executed even though the CPU-type
-0.237908	cost of synchronizing and communicating
-0.348386	by using only the even-numbered
-0.453757	keyword has a different meaning
-0.237823	n.a. - a<<b<<c = a<<(b+c)
-0.540609	especially when the code mixes
-0.237122	shuffling, such as many encryption
-0.236362	matrix line size. I tried
-0.443221	__declspec(align(16)) static const float coef[16]
-0.294190	be measured separately. The fallacy
-0.761177	a function is not referenced
-0.200060	#include <math.h> #define EXCEPTION_FLT_OVERFLOW 0xC0000091L
-0.345228	costs to such a formalism.
-0.567390	numbers in case of underflow:
-0.236254	return ; align ; mark
-0.237288	the data set into sub-vectors
-0.226797	polymorphism or with compile-time polymorphism.
-0.345252	vector aligned or the __assume_aligned
-0.226797	for implementing a compile-time polymorphism,
-0.321254	shows first the runtime polymorphism:
-0.165195	it lacks the self-explaining menus
-0.335573	two. Some other compilers (Microsoft,
-0.165195	b, c, d, e, f,
-0.347493	tested (not up to date):
-0.236829	the dividend is unsigned Examples:
-0.824730	16 bytes without cache MOVNTPS
-0.355047	be poor because it lacks
-0.358404	data access can be arranged
-0.350586	will start to calculate (c+d)
-0.294246	code. Register ebx is pushed
-0.237908	as floppy disks and USB
-0.312846	supports, rather than its brand,
-0.380478	for the intermediate result (b+c)
-0.237816	"standard stack frame" or "frame
-0.526977	sign :1;//signbit }; struct Sdouble
-0.235425	are universal, flexible, well tested,
-0.237716	p->a + p->b;} int Sum3(S3
-0.311817	pieces of a suitable duration.
-0.331940	become obsolete within the lifetime
-0.165195	&list[0]; temp < &list[100]; temp++)
-0.357747	the loop: // Example 14.13c
-0.165195	small. Are objects numbered consecutively?
-0.462135	modulo operations: // Example 14.13a
-0.237697	long dependency chain may fill
-0.568578	code to: // Example 8.15b
-0.310106	= float 8 AVX2 _mm_i64gather_pd
-0.237908	in the GOT, and finally
-0.288319	the same chip. Such hybrid
-0.212332	Func2() { int list[100]; Func1(list,
-0.288320	to take a whole workday
-0.294164	= 10; Templates are instantiated
-0.352413	this limitation and other flaws
-0.290387	object (except for char pointers).
-0.230121	Use the "generate map file"
-0.990884	xpow10(double x) { return IntegerPower<10>(x);
-0.233822	following compiler versions were tested:
-0.314762	gain by testing and analyzing
-0.237374	// Example 7.36 class S2
-0.237374	// Example 7.37 class S3
-0.356613	= b*a - n.a. (a+b)+c
-0.237823	static float list[] = {1.1,
-0.347870	and type of data elements,
-0.200060	call polymorphic child function: (static_cast<MyChild*>(this))->Disp();
-0.581262	language that can be cross-
-0.231417	operating system functions (e.g. GetLogicalProcessorInformation
-0.356176	program (or part of it).
-0.483432	languages can be quite substantial.
-0.538698	x++) { Table[x] = A*x*x
-0.218601	nmmintrin.h (MS) smmintrin.h (Gnu) AES,
-0.331860	you are certain that u
-0.229258	then calls a device driver.
-0.358404	Multiple divisions can be combined.
-0.294217	type has advantages and disadvantages.
-0.237075	each process. Obviously, we loose
-0.165195	thanks to heavy competition. Processors
-0.568806	and more efficient than investing
-0.340339	the values of its arguments.
-0.420993	The function library at www.agner.org/optimize/asmlib.zip
-0.523362	const x) { // Round
-0.342963	to do more complicated reductions.
-0.236984	7.43b is admittedly very kludgy.
-0.287407	Windows, SetThreadAffinityMask, in Linux, sched_setaffinity).
-0.237074	will not get any answer.
-0.236325	from fully utilizing its out-of-
-0.237716	Use square blocking: int r1,
-0.229258	for debugging facilities, easy GUI
-0.237908	such as flush and fence
-0.165195	where is the sign, eee
-0.165195	fast as a scalar (Scalar
-0.306483	debugging and exception handling. Omitting
-0.356632	six instructions rather than nine,
-0.165195	mov mov mov lea $B2$2:
-0.568578	optimized to: // Example 7.10b
-0.237273	b.y + c.y + d.y;
-0.357747	with 1: // Example 7.10a
-0.294240	a typical degree of randomness
-0.286837	in a low priority thread,
-0.293164	for fast 32-bit software development",
-0.237739	large expressions when not selected.
-0.237716	memory allocation are: int BigArray[1024]
-0.290836	as you will see shortly.
-0.293745	tables: Lists of instruction latencies,
-0.165195	seen in the broader perspective
-0.293153	dependency chains with long latencies.
-0.165195	&, |, ^, ~, <<,
-0.237716	// Example 7.18 int FuncRow(int);
-0.237829	that have multiple // versions:
-0.350233	= -1.0E8, b = 1.0E8,
-0.237908	of example 12.4b and 12.4c
-0.218603	are produced regularly. Intel: "Intel®
-0.265216	branch. It may seem illogical
-0.212335	developer.intel.com. AMD: "AMD64 Architecture Programmer’s
-0.200060	particular purpose. The clumsy AND-OR
-0.408011	int s; s = (short
-0.234539	to write if(!a && !b)
-0.512663	of code in multiple versions,
-0.224965	and machines with embedded microcontrollers.
-0.234386	integer expression -a > -b
-0.234937	reduce the integer expression -a
-0.165195	Instruction set Prefetch PREFETCH _mm_prefetch
-0.234540	optional commercial license Table 12.4.
-0.284379	template parameter: template <typename MyChild>
-0.825601	Alignd ( short int bb[size]
-0.200060	circuits consisting of digital building
-0.237861	in the STL are universal,
-0.834676	be more efficient to pool
-0.325394	The integer representation of &list[100]
-0.350717	bb, cc); } // Entry
-0.237212	add_horizontal) static inline float add_elements(__m128
-0.165195	a reference, or void. Returning
-0.330862	as for (i=0; i<n; ++i).
-0.354234	Example 9.4 const int NUMROWS
-0.331848	a[i+3]; } sum = (s0+s1)+(s2+s3);
-0.224963	page 131. Intel Performance Primitives
-0.356960	are confined to a narrow
-1.129580	like this: // Example 12.4e.
-0.234541	a lot of runtime DLL's
-0.165195	assembly language", section 17.9: "Moving
-0.235896	c2 for elements inside sqaure:
-0.325265	FuncType * SelectAddMul_pointer = &SelectAddMul_dispatch;
-0.165195	graphic brushes, etc. Locked mutexes.
-0.272304	key press or mouse move.
-0.235835	sum += xn / nfac;
-0.515407	handling is intended for detecting
-0.165195	branch by a conditional move,
-0.165195	Mostly obsolete. Rick Booth: "Inner
-0.165195	is doing multiple logically distinct
-0.237823	| (~a&c) a&b&c&d = (a&b)&(c&d)
-0.200060	price GNU General Public License,
-0.382255	with only one CPU core,
-0.234936	strings in classes like string,
-0.165195	0.89 0.40 0.30 4.5 0.82
-0.165195	function. 154 // Print heading
-0.456970	is discussed on page 60.
-0.231918	hardly relevant to optimization. Prefetching
-0.234685	compiler that supports automatic vectorization,
-0.322227	relative to the current position.
-0.165195	Intel Core 2 0.77 0.89
-0.283156	control statement several iterations ahead.
-0.236422	i; ... list[i & 15]
-0.218603	user has a virus scanner
-0.356951	out by the program logic.
-0.165195	2.20, glibc version 2.11 ifunc
-0.293756	pointers /vms Fastcall functions /Gr
-0.237823	value as n! = n∙(n-1)!.
-0.986766	} } } } Transposing
-0.444216	can be done by controlling
-0.237908	pointers are auto_ptr and shared_ptr.
-0.435002	files, help files and databases.
-0.236876	of programming experience before trying
-0.358082	different tasks in a multitasking
-0.733051	addition, subtraction and multiplication (27
-0.733051	addition, subtraction and multiplication (20
-0.237823	sets) (line size) = (total
-1.083356	this by // Example 8.5b
-0.465619	/Og Whole program optimization /GL
-0.593132	known. Example: // Example 8.5a
-0.356176	a large part of it)
-0.440218	loops. 13.1 CPU dispatch strategies
-0.429547	degree of optimization is requested.
-0.237934	such a response is delayed
-0.235652	i < ArraySize; i++) List[i]++;
-0.236876	done the job before you.
-0.251379	positive } Example 14.27 assumes
-0.357561	an additional floating point variable:
-0.200060	Jr.: "Hacker's Delight". Addison-Wesley, 2003.
-0.358014	software faster. It is assumed
-0.293382	with the Borland C++ builder.
-0.165195	Boolean operators &&, ||, !
-0.236536	University courses in programming nowadays
-0.212332	operators ............................................................................................. 56 7.28 Templates...............................................................................................................57
-0.229258	I have tested implement OneOrTwo5[b!=0]
-0.228182	facilities, binary trees, hash maps
-0.318240	as explained in chapter 9.10,
-0.335388	by compiling in two steps.
-0.230122	in the old version. Updating
-0.237816	the option /QaxAVX or -axAVX.
-0.343542	point number simply by inverting
-0.237934	representation of &list[100] is (int)(&list[100])
-0.460901	operations: __m128i a = _mm_or_si128(c2,
-0.231419	i/2+r. The instructions mov ebx,eax
-0.236718	implemented by (partial) template specialization.
-0.236718	with a non-recursing template specialization,
-0.581262	something that can be improved.
-0.325390	processing in C++ and Fortran.
-0.165195	for parallel processing. Scott Meyers:
-0.643653	b1, b2; y = (a1*b2
-0.657212	second operand is not evaluated,
-0.356776	1. Writing a = OneOrTwo5[b!=0];
-0.235504	x; public: c1() : x(0)
-0.165195	v.10.2 & earlier vmlsExp4 vmldExp2
-0.355664	of b+c will be rounded
-0.237829	a = (int)d; // Truncation
-0.852757	a = b / 1.2345;
-0.237910	it is short in duration
-0.224963	aware of when type-casting pointers:
-0.237776	names that begin with _mm.
-0.237563	Usability for Nerds at Wikibooks.
-0.232348	to interrupts and task switches;
-0.569739	f; The compiler may interleave
-0.456405	ranges) do not overlap. 27
-0.165195	0.75 0.18 0.11 1.21 0.57
-0.165195	long long clock; __cpuid(dummy, 0);
-0.339469	for Windows and to Eclipse
-0.294067	which may interfere with real
-0.237273	Table[x] = A*x*x + B*x
-0.165195	Specifications, Dr Dobbs Journal, 2002).
-0.331940	assembly listing. Use the "generate
-0.340110	branch that is always true/false
-0.357600	the error is not detected
-0.212332	big mainframe computer. Big supercomputers
-0.237790	instead of pointers, by initializing
-0.325400	of the memory is mirrored
-1.436835	i < 100; i++) matrix[FuncRow(i)][FuncCol(i)]
-0.237829	f = static_cast<float>(i); // Implicit
-0.236773	obsolete. Programmers very often underestimate
-0.331565	I disagree with this rule.
-0.486539	for the end user. Installation
-0.294164	libircmt.lib. Function names are undocumented.
-0.237816	a whole polygon or bitmap
-0.292525	u.i = (n & 0x7FFFFF)
-0.521733	Software Developer’s Manual", Volume 2A
-0.356446	be regarded as a valuable
-0.272304	same as the C-style type-casting.
-0.339143	for very large data bases,
-0.237464	the operating system which redirects
-0.291481	flush-to-zero and denormals-are-zero mode (SSE2):
-0.509156	patterns. This can cause severe
-0.335805	or after the last member.
-0.313904	set (128 bit float vectors)
-0.165195	It reveals a funda- mentally
-0.237874	and the wires that connect
-0.165195	it is the responsi- bility
-0.228182	*= x; nfac *= n+1;
-0.231923	the specified types (See Sutter:
-0.335916	is enabled. A more primitive,
-0.294187	Table 9.3. Time for transposing
-0.462165	running on the same computer,
-0.237934	these two values is closest
-0.294239	{ // Loop to print
-0.358061	the operands if the evaluation
-0.294187	typically 30 ms for foreground
-0.165195	Pure function. __attribute__((const)) (Linux only).
-0.325408	a specific advantage to obtain,
-0.237908	be further tested and investigated
-0.293643	Example 15.1c. Calculate integer power,
-0.237798	// Example 7.8 if (handle
-0.400480	Intel C++ Compiler v. 11.1
-0.237829	of type T // Constructor
-0.218601	element 63 63 31 11.6
-0.165195	on remote or removable media
-0.237884	preventing illegitimate copying. The benefits
-0.218601	16.4 65 65 33 11.8
-0.293204	power of 2 return powN<(N
-0.358061	Linux platforms if the bias
-0.294246	sure the information is utilized
-0.358548	CPU dispatcher in the MKL
-0.165195	compiler: __int64 64 -263 263-1
-0.218601	256 double 256 F32vec4 F64vec2
-0.905258	"Optimizing subroutines in assembly language",
-0.341912	objects in Mac OS X.
-0.234217	/Gr Function level linking (remove
-0.236177	"we don't support processor X"
-0.237934	The CPU market is developing
-0.790761	that the arrays are aligned,
-0.439369	is easier to write 2.0/3.0
-0.357747	loop counter: // Example 7.31b
-0.462135	lower case: // Example 7.31a
-0.237131	(N-1)) return powN<(N1&(N1-1))==0,N1>::p(x) * powN<true,N-N1>::p(x);
-0.224966	(Compile without the Common Language
-0.234215	nontemporal write instructions becomes noticeable.
-0.293191	uses more than 2 gigabytes
-0.234541	x *= x; n >>=
-0.354258	out-of-order capabilities (see page 103)
-0.356446	development work as a learning
-0.357747	Library (WTL): // Example 7.43b.
-0.324323	newsgroup comp.lang.asm.x86 for some links.
-0.237716	b; int c; int UnusedFiller;
-0.356483	the other with a 50-50
-0.237686	(Division is slow, you know).
-0.218601	stage that a detailed overview
-0.294246	The function F1 is supposed
-0.357747	single comparison: // Example 14.4b
-0.357747	compile time. // Example 15.1a.
-0.293191	256 Kbytes to 2 Mbytes.
-0.237910	a usability problem in interactive
-0.200060	(MS Visual Studio 2008 version).
-0.165195	bit in nn ifbit=1 bitofn
-0.222406	if no exception ever happens.
-0.408136	code more complicated and error-prone.
-0.200060	Intel C++ compilers. Wikipedia article
-0.293416	a table with two entries.
-0.314189	This is efficient, but risky.
-0.222406	it from a project built
-0.287895	parent and child class. Members
-0.504708	is running in the majority
-0.237839	free. Visual Studio can build
-0.290017	bit code Static linking (multithreaded)
-0.320313	Example 14.5b if ((unsigned int)(i
-0.230826	high that it rarely justifies
-0.885415	counter. Example: // Example 8.13a
-1.154678	this to: // Example 8.13b
-0.466218	operators &, |, ^, ~,
-0.237955	have to reinvent the wheel.
-0.237927	for many years to come.
-0.235893	manual currently doesn't works (gcc
-0.237955	because it lacks the self-explaining
-0.520281	size. Alternatively, you may actively
-0.165195	{1.1, 0.3, -2.0, 4.4, 2.5};
-0.237273	{ return vector(x + a.x,
-1.249068	It is important to weigh
-0.336328	problems that cause the resource-hungry
-0.237776	or by extending with zero-bits
-0.351001	optimized program is often reorganized
-0.303151	^ ~a = -1 (a&~b)|(~a&b)=a^b
-0.165195	in this block: 62 __try
-0.336237	execution speed and for minimizing
-0.237910	which are cheap, in relation
-0.237823	c++) { a[c][r] = b[r][c];
-0.237776	constants are defined with enum,
-0.331183	it explicitly. In example 8.21,
-0.548926	replaced by // Example 14.15b
-0.332050	the granularity is too fine
-0.165195	see my free E-book Usability
-0.332917	assembly language and automatic CPU-dispatching
-0.237816	of inte- ger or double)
-0.487213	only when called from main,
-0.336316	instruction was certain to truly
-0.294190	is allocated separately. The allocation,
-0.351001	vectors, as is often seen,
-0.165195	is Microsoft Foundation Classes (MFC).
-0.165195	as strcpy, strcat, strlen, sprintf,
-0.657599	the sign of a double:
-0.165195	CriticalFunction (); __asm__ (".type CriticalFunction,
-0.467604	out the loop and reorganize:
-0.237908	show how tortuous and convoluted
-0.237273	temp = a[i] + b[i];
-0.230827	suffixes such as e.g. .R.
-0.236610	can be used without restrictions.
-0.237934	of five manuals is copyrighted
-0.538684	a hard disk or network.
-0.355352	( ; i < arraysize;
-1.670180	It is possible to express
-0.235249	network may be both cheaper
-0.237569	object in case memory re-allocation
-0.293857	this pointer is then de-referenced
-0.311671	formats should be used. Web
-0.382863	telling the user to restart
-0.165195	lot of runtime DLL's (dynamically
-0.237882	< rows; i++) for (j
-0.237816	not backwards. Copying or clearing
-0.314703	for shared_ptr than for auto_ptr.
-0.165195	pointer aliasing /Oa -fno-alias Non-strict
-0.382568	ArraySize = 1000; int List[ArraySize];
-0.326207	512 matrix in my experiments.
-0.237934	the .exe file, is acceptable.
-0.691796	efficient than x = *(++p)
-0.165195	Vec16us Vec8i Vec8ui Vec4q Vec4uq
-0.538710	This is slow // Modulo
-0.165195	the first two suggested improvements).
-0.861900	tell the compiler to vectorize,
-0.237928	an approximate comparison of doubles
-0.237469	processor has hyperthreading. If so,
-0.165195	(WTL): // Example 7.43b. Compile-time
-0.572901	signifying one of the weekdays.
-0.237706	license included in compiler price
-0.237816	development kit (SDK or PSDK).
-0.222406	__fastcall __attribute(( fastcall)) __fastcall Noncached
-0.492833	not out of range printf(Greek[n]);
-0.212335	object is not modified. Unlike
-0.351449	care of the user interface,
-0.230826	Intel's Math Kernel Library (MKL
-0.438959	user is waiting for response.
-0.314507	more compact than an MFC
-0.237739	Instruction set SSE2 not supported");
-0.462996	to cache misses, branch misprediction,
-0.357515	the linker and the loader.
-0.291567	The compiler will calculate (1./1.2345)
-0.691796	efficient than x = array[++i]
-0.237716	Example 8.20 module1.cpp int Func1(int
-0.200060	The user expects immediate responses
-0.237861	options. CPU vendors are offering
-0.635435	12.5 Using vector classes Programming
-0.651318	are accessed on a First-In-Last-
-0.923054	version of the code. Inserting
-0.314633	we have (set) = (10000
-0.284381	// Prevent optimizing away cpuid
-0.458768	to CriticalFunction in example 16.2.
-0.165195	textbook on test theory. Advice
-0.356613	2) 2 - n.a. a+a+a+a
-0.525997	DWORD PTR [edx] DWORD PTR[ecx+eax*4],ebx
-0.200060	from example 8.26a (32-bit mode):
-0.237823	const float OneOrTwo5[2] = {1.0f,
-0.344193	the syntax is so kludgy
-0.310717	A for-loop has three clauses:
-0.165195	memory space is occupied throughout
-0.237319	x4 = x2*x2; double x8
-0.200060	explicit induction variable. (This eliminates
-0.349678	to metaprogramming would be straightforward.
-0.237908	to delete it and create
-0.710892	a pointer or a non-const
-0.458537	interface is obtained by dropping
-0.326628	tool is Microsoft Visual Studio.
-0.236046	wmmintrin.h AVX immintrin.h AMD SSE4A
-0.349014	a member function or friend
-0.237800	further by using function inlining,
-0.293409	is to use static linking,
-0.331183	XMM register. In example 12.2,
-0.500747	the memory allocation is unnecessarily
-0.382872	overflow. The exception is caught
-0.165195	branch (e.g. an if-else structure),
-0.325414	of each string is checked
-0.235997	+= a[i]; s1 += a[i+1];
-0.237798	// Example 8.10a if (true)
-0.650576	automatic vectorization (see page 107),
-0.354258	automatic CPU-dispatching (see page 122)
-1.107735	reasons explained on page 62.
-0.234029	as price, compatibility, second source,
-1.439329	as explained on page 96.
-0.533563	time the software was coded.
-0.455388	that can be optimized further.
-0.350717	*)d, x); } // Branch/loop
-0.459064	objects identified by a key?
-0.237131	= (float *)alloca(n * sizeof(float));
-0.200060	---xxx--- a/a=1 --------x a/1=a x-xxx-x--
-0.294018	- xxxxxxxxx -- - xx
-0.165195	identified by a unique key.
-0.857976	to the end user. Menus,
-0.301869	Does not, by default, conform
-0.293332	j * (columns * sizeof(float)).
-0.356483	log on with a password.
-0.165195	and pointer type casting. Linked
-0.335900	follows (using Intel vector classes):
-0.644806	8.2 Comparison of different compilers.............................................................................
-0.324625	function _mm256_zeroupper() before any transition
-0.218605	than addition and subtraction (3
-0.165195	guidelines should be obeyed. Copy
-0.232720	Intel C++ compiler, v. 10.1.020.
-0.200060	the code. See ISO/IEC TR18015
-0.314765	the reader what is happening.
-0.341675	consecutive indices or by keys
-0.307003	a constructor, an overloaded assignment
-0.200060	directives ......................................................................................... 65 7.33 Namespaces...........................................................................................................
-0.491469	test the different versions alternatingly
-0.823044	a, b, c, d, e,
-0.165195	Watcom C/C++ v. 1.4, 2005.
-0.272304	this problem by defining _mm_malloc
-0.236217	an error handler calls exit(),
-0.233822	float list[100]; memset(list, 0, sizeof(list));
-0.218601	cannot be controlled. Small hand-held
-0.346019	{ return a * a;}
-0.293382	Studio 2005). Borland C++ 5.82
-0.325397	be irrelevant within a year
-0.165195	calculated by the series: ex
-0.165195	1./2., 1./6., 1./24., 1./120., 1./720.,
-0.165195	Lists of instruction latencies, throughputs
-0.218601	list[300]; int i, i_div_3; for(i=i_div_3=0;
-0.226800	to be possible. Template meta-
-0.466218	} else { goto CFALSE;
-0.352504	} 34 else { CFALSE:
-0.287405	be returned in registers. Except
-0.165195	C++ compilers www.agner.org/ optimize/#vectorclass Include
-0.501086	should preferably be kept entirely
-0.237813	the address of it (&ArraySize)
-0.200060	are stored in ASCII form.
-0.237600	operator (bitwise and) will cut
-0.533563	time the software was developed.
-0.636127	reciprocal of the clock frequency,
-0.356483	backwards compatibility with a lineage
-0.236907	rounding, but neither faster nor
-0.570590	Func1(int x) { return x*x
-0.328039	to define your own error-handling
-0.231920	there is no clear correspondence
-0.331819	and high-priority threads are areas
-0.237697	then the CPU may occasionally
-0.625754	the sequence of calculations forms
-0.165195	and object-oriented programming, modularity, reusability
-0.355591	if possible. See page 141.
-0.382652	advanced data structures with First-In-First-Out
-0.349826	the syntax is very old-fashioned.
-0.237019	ptr n; #endif return n;}
-0.340052	you have to replace u[1]
-0.237053	table can give some indication
-0.361346	using a shift operation. x*8
-0.355416	the solution is more complicated.
-0.790678	in the same memory block,
-0.165195	container. STL deque (doubly ended
-0.320931	/arch:AVX etc. for Windows, -msse2,
-0.237955	time consumers. Choose the strongest
-1.342908	it is important to remember
-0.283157	the same source file. Keep
-0.226798	resource files from disk. Memory-hungry
-0.356952	some tests with the sizeof
-0.352367	heavy traffic and a server
-0.237464	type identification (RTTI), which affects
-0.332231	places making the dispatch decision
-0.165195	Intel Core 2 0.63 0.75
-0.442891	bytes Intel Core 2 0.77
-0.451578	No loop-carried dependency chain. Nothing
-0.236718	power of 2: template <bool
-0.503663	because it is a staircase
-0.237884	1.5f : 2.6f; The ?:
-0.382362	where execution speed, memory economy
-0.165195	1./3628800., 1./39916800., 1./4.790016E8, 1./6.22702E9, 1./8.71782E10,
-0.990884	xpow10(double x) { return ipow(x,10);
-0.200060	Extra data conversion, shuffling, packing,
-0.165195	"Hacker's Delight". Addison-Wesley, 2003. Contains
-0.346270	for Linux have an attribute
-0.226798	topic, see my free E-book
-1.385186	a, b; a = (int)d;
-0.234540	8 or 16 Table 7.2.
-0.165195	= (s0+s1)+(s2+s3); Now s0, s1,
-0.222408	compiler, v. 10.1.020. Functions _intel_fast_memcpy
-0.200060	malloc. Handles to windows, graphic
-0.638848	12.7 Mathematical functions for vectors........................................................................
-0.357747	of structures: // Example 9.1a
-0.357747	as follows: // Example 9.1b
-0.353462	u[2]} a[size]; unsigned int absvalue,
-0.433219	and the header file mathimf.h
-0.815438	you can use the GetTickCount
-0.234031	has support for XMM registers;
-0.236737	linked from static libraries (.lib
-0.354639	the principle for a 2'nd
-0.531504	and the cost of verifying,
-0.294190	accessed quite fast. The lesson
-0.165195	files on access. Sequential forward
-0.358677	the problems of the original,
-0.294239	address and attempts to translate
-0.165195	is likely to experience. Occasionally,
-0.230118	the possibility for significant improvements.
-0.294120	largest_abs) { largest_abs = absvalue;
-0.506832	makes the code section position-independent,
-0.501703	a branch by a conditional
-0.234540	2056 38.1 97 Table 9.1.
-0.292416	setting up a stack frame,
-0.236326	option for "standard stack frame"
-0.226797	char 8 -128 127 int8_t
-0.318123	the throughput (see p. 104).
-0.165195	Vec4q Vec4uq Vec4f Vec2d Vec8f
-0.165195	256 Vec32uc Vec16s Vec16us Vec8i
-0.165195	x); // x^1, x^2, x^3,
-0.378994	122 13.2 Model-specific dispatching ....................................................................................
-0.309120	a = b ? 1.5f
-0.165195	Whole program optimization /GL --combine
-0.320626	= int 4 AVX2 _mm256_i32gather_epi32
-0.413690	Assume no pointer aliasing. __declspec(noalias)
-0.418897	different brands of CPUs unequally
-0.234540	(a&b) | (~a&c) | (b&c)
-0.351706	same directory as the .exe
-0.226799	printf("Beta"); break; case 2: printf("Gamma");
-0.236975	for a 2'nd order polynomial:
-0.382671	clause are separated by commas.
-0.575578	it is used and popped
-0.293164	development process and software engineering
-0.294243	variables for calculating a polynomial.
-0.520578	way to avoid the burdensome
-0.200060	without an IDE. Free trial
-0.496407	because the code becomes contiguous.
-0.572857	set and YMM registers .................................................................
-0.235047	features. The programmer typically thinks
-0.237131	s += xxn * _mm_load_ps(coef+i);
-0.234030	very helpful for later maintenance.
-0.314779	use. The installation of downloaded
-0.324574	the chosen version return (*CriticalFunction)(parm1,
-0.235835	y1 = a1 / b1;
-0.237563	chapter is aiming at explaining
-0.530867	of the loop count (ArraySize)
-0.460894	Here, you should be prepared
-0.307594	most cases. The so-called iterators
-0.234214	name "position-independent code" actually implies
-0.356632	result -56 rather than 200.
-0.165195	improve speed without jeopardizing safety,
-0.224962	(a&b)|(a&c) = a&(b|c) x-xxxx--x (a|b)&(a|c)
-0.165195	Gnu C++ v. 4.1.0, 2006
-0.228178	Vol. 11, Iss. 4, 2007
-0.165195	of Denmark. Copyright © 2004
-0.354258	less efficient (see page 53).
-0.350323	rather than using a ready-made
-1.195583	int cc[]) { // Detect
-0.559092	using functions such as GetPrivateProfileString
-0.310109	be prevented by calling vector::reserve
-0.340048	class CChild2 : public CParent<CChild2>
-0.684255	Different kinds of variable storage.............................................................................
-1.248356	it is necessary to query
-0.357747	and memcpy: // Example 7.33b
-0.237908	memory allocation (new and delete).
-0.749147	It is faster to compose
-0.538927	throw() to the function prototype:
-0.237569	and for minimizing memory fragmentation.
-0.358014	or references. It is OK,
-0.523165	for tasks such as sorting,
-0.466218	a2, b1, b2, y1, y2;
-0.322971	be invalid and cause fatal
-0.655605	instruction set is the scarcity
-0.272304	books 1994. Mostly obsolete. Rick
-0.499786	4 bytes without cache MOVNTI
-0.459436	will get the value -100+100+100
-0.382836	50 7.17 Structures and classes............................................................................................
-0.237123	constants, string constants, array initializer
-0.165195	data compression and cryptography (www.intel.com).
-0.165195	1./120., 1./720., 1./5040., 1./40320., 1./362880.,
-0.355664	& operation will be non-zero,
-0.289307	constant data // constructor initializes
-0.237816	this example, f(x) or g(x)
-0.341682	for example when you discover
-0.290016	x.f; // will give -2.0
-0.354258	linked list (see page 93).
-0.429264	run the optimized code (release
-0.325194	own research, not on publicly
-0.576408	function libraries are highly optimized,
-0.294243	but only show a discrete
-0.165195	Generate optimization report /Qopt-report -opt-report
-0.345994	in isolation have been unsatisfied
-0.355449	These conversions are not safe,
-0.236326	Pointers, references, and stack entries
-0.356267	more predictable than the other,
-0.294112	more jobs simultaneously or seemingly
-0.294000	from main through an imported
-0.357600	file format is not standardized.
-0.237927	may need modification to compensate
-0.200060	and easier to test, maintain
-0.290835	more complicated functions like sin.
-0.165195	as pow, log, exp, sin,
-0.356952	of N with the rightmost
-0.729963	Choice of programming language ...............................................................................
-0.356637	is large because the insertion
-0.293816	access a public data object:
-0.236218	to get library versions instead.
-0.784779	that needs to be saved.
-0.314695	inverted bit-mask: bc = _mm_andnot_si128(mask,
-1.154678	this to: // Example 8.11b
-0.593132	branch. Example: // Example 8.11a
-0.200060	turned on. Most IDE's (Integrated
-0.294246	the storage order is opposite).
-0.200060	and 3B. developer.intel.com. AMD: "AMD64
-0.237823	= 0x10, Friday = 0x20,
-0.859019	} } else { DTRUE:
-0.466218	} else { goto DTRUE;
-0.314778	is more common to exchange
-0.293711	of the code, which supposedly
-0.309862	stored as the binary digits.
-1.439329	as explained on page 44.
-0.230118	of approximately seven significant digits,
-0.231419	link with external libraries. www.agner.org/optimize/#vectorclass
-0.237798	largest element (approximately): if (absvalue
-0.795278	The loop in example 8.23b
-0.234385	n; #if defined(__unix__) || defined(__GNUC__)
-0.236089	that branch. The common excuse
-0.284379	== 2 #define FUNCNAME SelectAddMul_SSE2
-0.333187	These are of course system-specific.
-0.231418	shuffling, packing, unpacking needed. Predictable
-0.308939	the larger vector size. Later
-0.349978	and Microsoft C++ compilers www.agner.org/
-0.222408	is limited by physical factors.
-0.232721	<=, > and >= operators).
-0.234385	(a&&b) || (!a&&c) || (b&&c)
-0.236846	&& z != 0; 35
-0.355331	goto CFALSE; } } 34
-0.165195	name mangling. The characters '?',
-0.165195	list[] = {1.1, 0.3, -2.0,
-0.327070	c; b = (unsigned int)a
-0.294187	has enough bits for holding
-0.556120	code in a separate module,
-0.218601	the time-consuming data processing. Running
-0.336333	may not take the hint,
-0.336180	it doesn't depend on system-specific
-0.237823	set int iset = instrset_detect();
-0.237816	other resources locally or remotely.
-0.382059	It comes with most distributions
-0.165195	obsolete. Rick Booth: "Inner Loops:
-0.237874	The common excuse that "we
-0.237606	for each object. A little-known
-0.165195	Stefan Goedecker and Adolfy Hoisie:
-0.237340	first call method using InstructionSet():
-0.165195	manipulation Mathematical functions Encryption, decryption,
-0.326207	A look in my crystal
-0.200060	QueryPerformanceCounter functions for millisecond resolution.
-0.224962	but slow or completely absent
-0.251379	previously required a PC. Similarly,
-0.357770	variable having the same name,
-0.462379	Find numerically largest element (approximately):
-0.652249	until the computer is reset
-0.294243	breakpoint and show a disassembly,
-0.347531	objects have a natural ordering?
-0.538582	effort is concentrated on arranging
-0.236737	possible to insert optimization hints
-0.234935	separately and test their functionality.
-0.237908	instructions are fetched and decoded
-0.212332	a ^ b ---xx---- a<<b<<c=a<<(b+c)
-0.454670	__intel_cpu_features_init() sets the variable __intel_cpu_feature_indicator
-0.165195	See Intel Technology Journal Vol.
-0.200060	at the diagonal remain unchanged.
-0.237934	with all 1's is unchanged,
-0.467660	like to put a tag
-0.165195	| (C << 6); Or,
-0.165195	files are also included. Combining
-0.444370	hundred clock cycles to fetch
-0.341853	+= TILESIZE) { for (c1
-0.237934	called the heap is reserved
-0.353420	therefore preferably have a balanced
-0.291292	This is actually quite convenient.
-0.236424	precision on most processors (when
-0.236610	legitimate backup copying without effectively
-0.393815	have an empty throw() specification.
-0.332050	this solution is too high.
-0.348618	Smaller microprocessors have no native
-0.538473	for other purposes than rendering
-0.651318	are accessed on a First-In-First-
-0.632005	without loading a cache line:
-0.237739	microprocessors is lost. This dilemma
-0.655572	the expression a = (b*c)/d,
-0.237600	and this value will propagate
-0.466218	can be predicted perfectly varies
-0.763937	to the same cache line,
-0.707529	Choice of user interface framework...........................................................................
-0.290385	dynamic array of n floats:
-0.344510	long time1; long long timediff[NumberOfTests];
-0.291790	vector of e.g. four floats.
-0.318123	dependency chains (see p. 22).
-0.237829	polymorphism with templates // Place
-0.234542	int c:2; }; char abc;
-0.382872	with unsigned integers is ambiguous
-0.438428	version for specific CPU models.
-0.200060	that do not 123 correspond
-0.237498	16 XOP, AMD only _mm_permutevar_ps
-0.503366	Replace with: // Example 7.38b.
-0.538698	x++) { Table[x] = Y;
-0.237716	by writing: __declspec(align(64)) int BigArray[1024];
-0.237908	2B, and 3A and 3B.
-0.352382	goes through the following steps
-0.809574	takes more time than looping
-0.356776	{ int a = Func1(2);
-0.165195	reveals a funda- mentally flawed
-0.616159	the programmer to know about.
-0.165195	the users with nagging pop-up
-0.237882	bit in Day for signifying
-0.236810	This is called register renaming.
-0.218603	was done by me manually,
-0.234683	// Example 9.3 #include <malloc.h>
-0.339503	is busy doing the spell
-0.510304	numbers are not always sequential,
-0.237823	*temp; for (temp = &list[0];
-0.330862	be infinity or NAN (Not
-0.237053	similar solutions may some day
-0.236291	rise to some extra complications.
-0.212332	programming are dominating. At least,
-0.237315	such as AQtime, Intel VTune
-0.230826	AMD Core Math Library __vrs4_expf
-2.245428	- - - - x-xx----x
-0.331206	optimized programs. The profiler identifies
-0.200060	x-xxxxx-x (-a)*(-b)=a*b ---xxx--- a/a=1 --------x
-0.237823	a&(b|c) x-xxxx--x (a|b)&(a|c) = a|(b&c)
-0.236843	line can hold 8 double's
-0.234217	Many containers use linked lists.
-0.165195	string constants, array initializer lists,
-0.581262	chip that can be programmed
-0.355126	variables will be used most.
-0.237829	ReadTSC function. 154 // Print
-0.212332	the wrong branch. Microprocessor designers
-0.682386	the same processor core. Try
-0.236877	objects are not stored contiguously
-0.357747	inside square: // Example 8.1b
-0.237829	float x, y; // x,y
-0.885415	function. Example: // Example 8.1a
-0.237816	with option -fwrapv or -fno-strict-overflow.
-0.200060	form of a re- usable
-0.237829	overflow has occurred. // Reset
-0.429566	order to increase the likelihood
-0.347481	efficient than when a fixed-size
-0.234937	This behaviour is implementation dependent.
-0.200060	x) { // Remove right-most
-0.165195	~a = -1 (a&~b)|(~a&b)=a^b ---------
-0.237372	in example 14.23 page 143.
-0.237908	Boolean operators (&& and ||).
-0.501205	system-specific. In order to facilitate
-0.165195	PathScale C++ v. 3.1, 2007.
-0.357747	vector classes): // Example 12.9b.
-0.293042	the stack at address esp+8
-0.760273	should be stored together ......................................
-0.382362	while execution speed, memory economy,
-0.361344	has not been updated lately.
-0.467639	as an array of structures:
-0.645576	the diagonal are accessed row-wise,
-0.582370	preferable to make a lookup-table
-0.237866	address which can't be reached
-0.593132	execution. Example: // Example 8.16
-0.165195	clock frequency (in Windows: __rdtsc()).
-0.381564	the global offset table (GOT).
-0.578748	following example: // Example 8.17
-0.503366	point numbers: // Example 8.18
-0.525986	all files on access. Sequential
-0.237000	use denormal numbers. You may,
-0.237928	of the techniques of multithreading.
-0.237574	than at runtime. Example 7.43
-0.357747	template parameter: // Example 7.42
-0.555287	may have to be renewed.
-0.462135	a case: // Example 7.45
-0.657506	is needed: // Example 7.44
-0.357747	to unsigned. // Example 7.4.
-0.224962	vector of (2,2,2,2,2,2,2,2) Is16vec8 two(2,2,2,2,2,2,2,2);
-0.237816	time. A for-loop or while-loop
-0.313337	a fully compiled code. Compiled
-0.341866	{ // table of 1/n!
-0.327837	13.6 80.9 512 512 378.7
-0.331908	clock cycles counter is counting
-0.165195	set not supported fprintf(stderr, "\nError:
-0.356112	the overflow and underflow neutralize
-0.200060	x; x.f = 2.0f; x.i
-0.165195	230.7 513 513 2056 38.1
-0.165195	14.4 511 511 2040 38.7
-0.290835	The Gnu compiler allows "__attribute__((visibility("hidden")))".
-0.237908	in computer games and animations
-0.165195	IA-32 Architectures Optimization Reference Manual".
-0.294187	function library made for demonstration
-0.290016	different types of graphics cards,
-0.165195	1./362880., 1./3628800., 1./39916800., 1./4.790016E8, 1./6.22702E9,
-0.237790	the table values by hand
-0.340230	return to its own caller,
-0.276614	2, 4, 8, 16, 32,
-0.235945	range. The next line provokes
-0.342825	value of the second operand.
-0.357387	may choose to make memory-hungry
-0.237934	An error message is provoked
-0.575394	= 20, columns = 32;
-0.237823	{ 92 DynamicArray[i] = WhateverFunction(i);
-0.354687	situation where they are unavoidable.
-0.370802	The compiler option -fno-pic apparently
-0.358082	completely contained in a DLL.
-0.165195	bytes without cache MOVNTPS _mm_stream_ps
-0.237927	the resource-hungry applications to perform
-0.236254	$B2$3: ret ALIGN ; mark_end;
-0.354258	where necessary (see page 96).
-0.237829	x2*x, x2, x); // x^1,
-0.534658	overflow with the option -ftrapv,
-0.313931	and __intel_new_strlen in library libircmt.lib.
-0.235835	b * (1. / 1.2345);
-1.019059	if the code is repetitive.
-0.233594	14.3a int n; switch (n)
-0.237643	access are critical time consumers.
-0.237931	some cases ignore a request
-0.325394	in binary representation of N:
-0.293946	const Greek[4] = { "Alpha",
-0.237007	* 5 / 2 (be
-0.237934	when the CPUID is artificially
-0.237816	a computer game or animation.
-0.237908	for the pros and cons
-0.311772	lower than a certain tolerance.
-0.165195	from addresses 0x2F00, 0x3700, 0x3F00
-0.237131	<< 4, anda * 17is
-0.312846	is better than its reputation.
-0.292948	8 unsigned char 64 Iu8vec8
-0.453023	assuming no pointer aliasing (/Oa).
-0.165195	vector operands: minimum, maximum, saturated
-0.342640	bit and 32 bit offsets).
-0.231419	push mov xor mov $B1$2:
-0.449169	reader has a good knowledge
-0.315860	{ list[i].a = 1.0; list[i].b
-0.330529	by an executable file stub.
-0.237816	algorithm (e.g. Quine–McCluskey or Espresso)
-0.229258	regularly. AMD: "Software Optimization Guide
-0.237908	Math Kernel Library" and "Integrated
-0.237927	from time T+1 to T+6,
-0.293243	language. Here are some examples:
-0.236255	possible performance. We must bear
-0.332017	log(2.0); ... } Here, log(2.0)
-2.034309	- n.a. n.a. - andnot(a,a)
-0.503366	of numbers: // Example 12.8a.
-0.165195	} u; u.i &= 0x7FFFFFFF;
-0.165195	etc. for Windows, -msse2, -mavx,
-0.237816	the declaration "static" or "__attribute__((visibility
-0.307000	lea $B2$2: mov mov 2:8+esp
-0.165195	__m128i a = _mm_blendv_epi8(bc, c2,
-0.346990	you start to optimize anything,
-0.237116	the number of clock pulses
-0.350632	between the operating systems disappears
-0.229258	and there are search requests
-0.452663	test, but is less reliable.
-0.307000	little work as possible. Typically
-0.237790	and the destructor by constructing
-0.200060	mirroring is not allowed. Non-public
-0.237007	an address below 2 GB,
-0.231417	can be shared. Any writable
-0.165195	int r1, r2, c1, c2;
-0.237816	header file stdint.h or inttypes.h
-0.237573	prevent two threads from attempting
-0.347531	much time it takes. Debugging.
-0.354493	be optimized by using indexes,
-0.251379	2) 2 a+a+a+a=a*4 -(-a)=a --xxxxxx-
-0.331946	limitations to what the preprocessor
-0.236177	time a new processor enters
-0.165195	works (gcc v. 4.5.2, July
-0.443221	boolb=0; static const float lookup[2]
-0.165195	100 rather than -156. Surprisingly,
-0.381791	123; are equally efficient because,
-0.236610	to improve speed without jeopardizing
-0.351651	a graphics function that draws
-0.237908	JavaScript, PHP, ASP and UNIX
-0.235947	or int 4 AVX _mm256_permutevar_ps
-0.237908	characters '?', '@' and '$'
-0.234542	that such contrived examples exist.
-0.234540	513 58.7 168.3 Table 9.3.
-0.581419	conversions can be used freely
-0.236610	on a unit-test without taking
-0.226799	bit to compare absolute values:
-0.165195	Automatic vectorization Automatic paralleli- zation
-0.237320	the memory page size (4096).
-0.276612	....................................................................................... 24 6 Development process......................................................................................................
-0.200060	then all the G values,
-0.165195	with vector operands: minimum, maximum,
-0.236588	and newsgroups contain useful discussions
-0.354633	as well use a #define,
-0.165195	Reset floating point status: _fpreset();
-0.233822	2004. No differences were observed
-0.237349	are actually reducing example 15.1d
-0.381546	Test the whole software package,
-0.347531	advance rather than allocating piecewise
-0.237404	code that contains integer division:
-0.233050	the last 8 columns unused.
-0.200060	a; int b;}; Sab ab[size];
-0.457767	certainly something that can steal
-0.629706	For example, x = array[i++]
-0.237908	devices are CPLDs and FPGAs.
-0.292041	_mm_load_ps(coef+i); // s += x^n/n!
-0.265216	....................................................... 20 3.7 File access................................................................................................................
-0.505886	to be converted to OMF
-0.373807	if the data fit nicely
-0.536365	the program under test finishes
-0.335543	and without the static keyword:
-0.434089	declared with the static keyword,
-0.165195	cycle is 1 0.5ns. 2GHz
-0.237908	processing, data compression and cryptography
-0.504487	option only in the Professional
-0.330803	even without the register keyword.
-0.559092	C functions such as strcpy,
-0.357747	members last: // Example 7.35b
-1.460680	For example: // Example 7.35a
-0.355330	a database by a plain
-0.251379	mode (SSE): #include <xmmintrin.h> _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
-0.165195	bytes without cache MOVNTI _mm_stream_si32
-0.293191	arrays bigger than 2 GB.
-0.585835	dispatching, then it is advisable
-0.165195	security. Standard C++ imple- mentations
-0.320626	= int64_t 4 AVX2 _mm_i32gather_ps
-0.237823	c first. b+c = 100000001.23456.
-0.538201	Time per element Example 9.6b
-0.337098	this work on non-Intel processors).
-0.356770	protection scheme should be weighed
-0.230827	specific instruction set, e.g. /arch:SSE2.
-0.165195	(Embarcadero/CodeGear/Borland C++ Builder 5, 2009).
-0.200060	Here, we have inserted UnusedFiller
-0.323147	Intel compilers has several flaws:
-0.200060	Enums ...................................................................................................................... 33 7.5 Booleans...................................................................................................................
-0.234817	closely follows the mathematical notion
-0.212335	small low-power CPUs (Intel Atom).
-0.236793	or data exceeds 64 kbytes.
-0.165195	Some other compilers (Microsoft, Intel)
-0.454086	replaced by a single comparison:
-0.441611	software are available from Intel.
-0.284379	same function calling conventions. FreeBSD
-0.920287	may improve the performance somewhat.
-0.863018	for the SSE2 instruction set:
-0.236560	&SelectAddMul_dispatch; // Dispatcher void SelectAddMul_dispatch(short
-0.234683	// Example 16.1 #include <intrin.h>
-0.581262	expressions that can be reduced.
-0.165195	is infinity or NAN. Avoiding
-0.292454	insert any other error reporting
-0.294067	the problems associated with profiling,
-0.635294	for a discussion of profiling.
-0.324654	too small. Are objects numbered
-0.200060	algorithms in the planning phase
-0.235835	reciprocal_divisor = 1. / (b1
-0.237174	best optimizer. Borland/CodeGear/Embarcadero C++ builder
-0.200060	patterns with fixed strides. Uncached
-0.502032	But it is the responsi-
-0.165195	because the programmer hasn't thought
-0.381896	short vector math library (SVML).
-0.237734	----- - x-xxx - xx(-)x-
-0.357747	order polynomial: // Example 8.23a.
-0.659915	if the rows are indexed
-0.314189	on multi-core CPUs, but event-counters
-0.235338	which may happen quite often.
-0.536321	Optimization in embedded systems Microcontrollers
-0.347481	in popularity when a genuine
-0.492228	a long time to calculate.
-0.222406	structured and object-oriented programming, modularity,
-1.378681	for the sake of modularity.
-0.237927	scan forward) instruction to localize
-0.581780	program you want to optimize,
-0.231919	don't count on it. Instead
-0.284379	in this hot spot. Repeating
-0.301869	are running in parallel. Fine-grained
-0.165195	ASP and UNIX shell script.
-0.226797	(See thread-local storage p. 28)
-0.237174	= OneOrTwo5[b!=0]; will also work,
-0.279510	number 16 in column 28,
-0.451783	the pointer has been calculated.
-0.357830	on correction for the "FDIV
-0.308549	Hat). PathScale C++ v. 3.1,
-0.683356	makes floating point code slower,
-0.502254	by multiplying with the reciprocal:
-0.237816	function calling. __fastcall or __attribute__((fastcall)).
-0.825601	Alignd ( short int cc[size]
-0.165195	__asm ("fldl %1 \n fistpl
-0.165195	the libraries named MKL, VML
-0.165195	micro-op cache (e.g. Sandy Bridge)
-0.226797	set Header file MMX mmintrin.h
-0.236973	Larger data types: long long,
-0.744001	is that the microprocessor wastes
-0.294246	0. The division is inexact
-0.352413	3-dimensional geometry and other odd-sized
-0.165195	1./40320., 1./362880., 1./3628800., 1./39916800., 1./4.790016E8,
-0.864564	possible to make a thread-like
-0.237927	from time T to T+5,
-0.237908	Intel-based Mac OS and Itanium
-0.233048	with this problem: 1. Relocation.
-0.314238	that discriminates between CPU brands,
-0.272306	2048 230.7 513 513 2056
-0.236960	= 0; row < NUMROWS;
-0.165195	makers. 4. Instruction tables: Lists
-0.536862	x*x + 1; } module2.cpp
-0.357747	we have: // Example 12.8b.
-0.165195	0.40 n.a. 1.00 0.35 0.29
-0.357747	* 1.2f; // Example 14.18c
-0.230829	file MMX mmintrin.h SSE xmmintrin.h
-0.231419	contentions expected. Use square blocking:
-0.165195	0.30 4.5 0.82 0.59 0.27
-0.200060	n.a. 1.00 0.25 0.28 0.22
-0.378141	3) << 4) | ((C
-0.234540	(A & 0x0F) | ((B
-0.294120	i++) { b[i] = Func(a[i]);
-0.500319	inefficient if a program creates
-0.352382	2011). Instead, the following work-around
-0.235342	temporary objects for intermediate results,
-0.237710	function call (other than log)
-0.200060	x--x----- ---x----- x---x---x x-xxx---- a*b*c=a*(b*c)
-0.237074	times faster than any non-vector
-0.237866	Can the container be recycled?
-0.237776	of ADC (add with carry)
-0.235048	Linux and perhaps Mac OS.
-0.827098	... } } The FactorialTable
-0.233593	results printf("\n%2i %10I64i", i, timediff[i]);
-0.349102	situation, but the programmer can.
-0.165195	carry flag (e.g. DEC, JNZ).
-0.378697	of a critical dependency chain,
-0.290386	of (or in addition to)
-0.165195	(-a==-b)=(a==b) ---xx---- (a+c==b+c)=(a==b) ----x---- !(a<b)=(a>=b)
-0.353680	explained below on page 134.
-0.218603	100 As table 9.3 shows,
-0.314621	clock cycles whenever it feeds
-0.234386	ARRAYSIZE && list[i] > 1.0)
-0.228178	future due to general improvements
-0.349586	indirect function" has been introduced
-0.226799	which few programs do. Hence,
-0.200060	CISC instruction set (called x86)
-0.593132	result. Example: // Example 8.2a
-1.083356	this by // Example 8.2b
-0.236217	20 times and calls alternately
-0.212332	integer type. Interrupt service routines
-0.357420	to call the function billions
-0.403906	bytes AMD Opteron K8 1.09
-0.237756	will calculate xn as x4∙xn-4.
-0.354258	the compiler (see page 103),
-0.233338	dates back to around 1980
-0.458768	of 2 in example 14.7b,
-0.357747	of 2: // Example 14.7b.
-0.165195	be mainstream next year. Ignoring
-0.355449	point parameters are not affected
-0.165195	= y.c + 3.; x.d
-0.234215	int i; } x; x.f
-1.129580	like this: // Example 7.9b
-0.578748	the example: // Example 7.9a
-0.345351	The effect is simply identical.
-0.513357	if an exception occurs somewhere
-0.234539	= a&&(b||c) !a && !b
-0.349919	time because the memory bus
-0.218603	on a First-In-First- Out (FIFO)
-0.236536	not a safe programming practice,
-0.795278	The loop in example 8.24
-0.593132	called. Example: // Example 8.25
-0.330172	use an object file disassembler.
-0.483981	between the Boolean operators &&,
-0.593132	calls. Example: // Example 8.20
-0.651787	can be calculated as (critical
-0.885415	function. Example: // Example 8.22
-0.330730	out-of-order capabilities are very smart.
-1.129580	like this: // Example 12.9a.
-0.294187	structures by 16 for SSE2,
-0.237273	r) {return r.a + r.b;}
-0.231919	additions and shift operations. Multiplying
-0.357515	operator ++i and the post-increment
-0.350023	source of information about bugs,
-0.237816	goes to C0::f or C1::f.
-0.165195	double 256 F32vec4 F64vec2 F32vec8
-0.165195	parm2) {...} // Dispatcher. Will
-0.218601	I must warn against overkill.
-1.083356	this by // Example 8.3b
-0.294000	This feature uses an ordinary
-0.236610	object oriented programming without paying
-0.165195	Intel. See Intel Technology Journal
-0.356776	8.18 float a = -1.0E8,
-0.200060	which may cause slight imprecision
-0.342504	the chosen compiler doesn't provide
-0.340514	to do integer operations in-between
-0.454670	similarly sets the variable __intel_cpu_feature_indicator_x.
-0.420880	doesn't have to save recovery
-0.410603	the Windows Template Library (WTL).
-0.165195	Gnu: Glibc v. 2.7, 2.8.
-0.324399	__intel_cpu_feature_indicator where each bit indicates
-0.330862	i--) *(p++) |= 0x20; 46
-0.410603	and Windows Template Library (WTL):
-0.222406	therefore suffer from mispredictions. 44
-0.281589	to make a reliable decision.
-0.200060	bits in x86 systems). 42
-0.218601	them into a place indicated
-0.165195	data sets. Covers PC's, workstations
-0.165195	i2; for(i=0,i2=0; i<100; i++,i2+=2.0f)a[i]=i2; 41
-0.200060	38.7 512 512 2048 230.7
-0.337821	without -fpic is much faster,
-0.286166	(bb[i] > 0) ? (cc[i]
-0.351866	This is the variable 85
-0.237549	/GL --combine -fwhole- program /Qipo
-0.251379	12.4c is quite tedious indeed.
-0.341290	a[2]; a[0] = 1; a[1]
-0.233822	4) | (C << 6);
-0.237882	point of attack for hackers.
-0.382872	while the multiplication is exact.
-0.237882	int row, column; for (row
-0.236633	available, though less user friendly.
-0.301508	the value is poorly predictable,
-0.487588	a graphical user interface (OnIdle
-0.345248	Software distributors are often abusing
-0.236291	called a leaf function. Leaf
-0.165195	{ "Alpha", "Beta", "Gamma", "Delta"
-0.237823	= 8, Thursday = 0x10,
-0.276614	Loops...................................................................................................................... 45 7.14 Functions ................................................................................................................
-0.237816	dispatching to C1::Disp() or C2::Disp()
-0.237288	reset or goes into sleep
-0.165195	is also called Single-Instruction-Multiple-Data (SIMD)
-0.237718	a branch (e.g. an if-else
-0.230123	extern "C" int CriticalFunction ();
-0.354468	vector library, you are feeding
-0.200060	a*0=0 a*1=a (-a)*(-b)=a*b a/a=1 ----x---x
-0.237874	based on hacks that violate
-0.355591	or 1. See page 34.
-0.165195	n.a. 2.23 0.95 0.6 1.19
-0.200060	!(!a)=a x-xxxxxxx ---x----- x--xx---- (a&&b)||(a&&!b)=a
-0.287892	underflow except in special mathe-
-0.356126	definition language, such as VHDL
-0.460894	program updates should be postponed
-0.233050	in 32-bit systems gives rise
-0.353462	{double d; unsigned int u[2]}
-0.200060	as email and web browsing
-0.649183	to use a loop counter:
-0.165195	float list[] = {1.1, 0.3,
-0.234215	and scientific vector processors. Henry
-0.237075	64 bit mode, we encounter
-0.510164	of the data cache. Bit-fields
-0.237861	in the output are unacceptable.
-0.234935	one variable if their live-ranges
-0.200060	2 0.77 0.89 0.40 0.30
-0.237829	optimizing // Time // Serialize
-0.229258	and IA-32 Architectures Optimization Reference
-0.237643	of course also time consuming,
-0.231923	information about bugs, compatibility problems,
-0.381564	called global offset table (GOT)
-0.403906	op. AMD Opteron K8 0.38
-0.165195	---x---xx (-a==-b)=(a==b) ---xx---- (a+c==b+c)=(a==b) ----x----
-0.449132	a smart pointer is created,
-0.165195	{ __declspec(__align(64)) double matrix[SIZE][SIZE]; transpose(matrix);
-0.237409	want to prevent cache contention.
-0.237019	(N & (N-1)) return powN<(N1&(N1-1))==0,N1>::p(x)
-0.237882	for all squares: for (r1
-0.237790	and intelligible way by wrapping
-0.358404	on b can be omitted,
-0.218601	Sum2(S3 * p) {return p->a
-0.397928	number is not necessarily newer.
-0.165195	language defines hardware circuits consisting
-0.346270	then we have an estimated
-0.336239	approach to CPU dispatching. Underestimating
-0.355005	advance and stored in edx.
-0.237908	induction variables Y and Z.
-0.165195	"Intel 64 and IA-32 Architectures
-0.237816	avoid global variables or hide
-0.303148	apply the empty throw() specification
-0.165195	has done by fetching, decoding
-0.165195	error handler calls exit(), abort(),
-0.382560	faster to use than others.
-0.235997	= Y; Y += Z;
-0.623460	12.8 Aligning dynamically allocated memory.................................................................
-0.452666	function should also be considered.
-0.712338	piece of code in general.
-0.357747	derived class: // Example 7.38a.
-0.356000	and 2B. There are hundreds
-0.237319	Func1(double) pure_function ; double Func2(double
-0.326628	to date): Microsoft Visual studio
-0.928755	you don't have to reinvent
-0.237928	speed exceeding that of yesterday's
-0.232720	doesn't works (gcc v. 4.5.2,
-0.165195	PUBLIC ?Func@@YAXQAHAAH@Z ?Func@@YAXQAHAAH@Z PROC NEAR
-0.237884	compatible with these. The CodeGear,
-0.292653	dynamic linking. The file http://www.agner.org/optimize/asmlib.zip
-0.279508	two (three on CodeGear compiler)
-0.200060	130 14.4 511 511 2040
-0.357747	runtime polymorphism: // Example 7.43a.
-0.446165	loops are implemented as recursive
-0.200060	you are not testing. Trying
-0.235945	column 29 with line 29.
-0.236610	out of F1 without returning.
-0.236094	to b memcpy(b, a, sizeof(b));
-0.236090	the operating system thread scheduler.
-0.381564	unsigned. The following table summarizes
-0.294178	it can happen that (b*c)
-0.235425	error-handling function that simply prints
-0.353303	Abrash: "Zen of code optimization",
-0.218605	b; a = parabola (2.0f);
-0.455563	list[16]; int i; ... list[i
-0.237882	metaprogramming // Template for pow(x,N)
-0.339834	PTR [edx] DWORD PTR [eax+400]
-0.356632	result 100 rather than -156.
-0.504286	shared object can be speeded
-0.237823	+ 3.; x.d = y.d
-0.358404	following work-around can be used:
-0.351866	rather than the variable m.
-0.237716	Multiply (int x, int m)
-0.294120	y; ... x.a = y.a
-0.294120	+ 1.; x.b = y.b
-0.294120	+ 2.; x.c = y.c
-0.165195	"Zen of code optimization", Coriolis
-0.212332	8, 16, 32, 64, ...).
-0.294292	* m;} template <int m>
-0.165195	!= INVALID_HANDLE_VALUE && WriteFile(handle, ...))
-0.358548	minor error in the oldest
-0.237908	floating point registers and correspondingly
-0.314299	non-Intel processors). It has excellent
-0.325265	f; f=i; f = (float)i;
-0.463154	non-polymorphic functions in the grandparent
-0.165195	Vec16s Vec16us Vec8i Vec8ui Vec4q
-0.200060	ebx, eax / sar ebx,1
-0.382682	i[2]; } u; if (u.i[1]
-0.340339	solution because of its simplicity.
-0.226799	run many processes simultaneously. Actually,
-0.212335	same applies to 3-dimensional geometry
-0.165195	4 int 128 Is32vec4 Vec4i
-0.343678	product is one that saves
-0.237739	aligned Assume pointer not aliased
-0.165195	Vec4uq Vec4f Vec2d Vec8f Vec4d
-0.165195	and 13 objects, respectively (MS
-0.165195	to make a thread-like scheduling
-0.231417	big endian storage (e.g. PowerPC).
-1.221489	then there is no guarantee
-0.165195	CPU model is over. Virtualization
-0.349971	to find the best algorithm.
-0.313124	same core will always compete
-0.356088	you need to do searches
-0.595108	the same register for both,
-0.230830	ameliorated by using nontemporal writes.
-0.237816	object by *p or p->member
-0.352935	next element. I have confirmed
-0.237908	128-bit execution units and hence
-0.235504	EXCEPTION_FLT_OVERFLOW ? EXCEPTION_EXECUTE_HANDLER : EXCEPTION_CONTINUE_SEARCH)
-0.357747	is enabled: // Example 14.21.
-0.237756	is not quite as versatile.
-0.237908	programming, modularity, reusability and systematization
-0.237569	has a smaller memory footprint.
-0.314735	the functions memset and memcpy:
-0.237790	conclude this section by summing
-0.237569	have execution units, memory ports,
-0.212332	in an || expression. Assume,
-0.237884	microprocessor hardware design. The ultimate
-0.429547	the register stack is organized.
-0.237908	for string searching and parsing
-0.218603	University of Denmark. Copyright ©
-0.314706	another security problem. The official
-0.497355	call the function. This fragmentation
-0.165195	purpose. The clumsy AND-OR construction
-0.595061	in the code are modified,
-0.498475	shows that it takes 40%
-0.232720	Visual studio 2008, v. 9.0
-0.165195	with the option -read_only_relocs suppress.
-0.314765	of program efficiency is reflected,
-0.237908	on page 136 and 137,
-0.230829	and an error condition terminates
-0.226797	stack (see above, p. 26).
-0.502883	inside can be predicted perfectly.
-0.236536	11.6 64 64 32 16.4
-0.165195	program optimization /GL --combine -fwhole-
-0.578831	after the program is terminated
-0.218601	one fraction 2 63 .
-0.237739	accessing arrays forwards, not backwards.
-0.165195	= { "Alpha", "Beta", "Gamma",
-0.236984	expressions like -(-a) very often,
-0.218603	target buffer, branch pattern history,
-0.234218	const int x; public: c1()
-0.356637	an integer because the integer-to-float
-0.457672	MathLoop() { const int arraysize
-0.212332	16 256 Vec32uc Vec16s Vec16us
-0.200060	a+a+a+a=a*4 -(-a)=a --xxxxxx- a-(-b)=a+b ---xxx-x-
-0.236089	references, 'this' pointer, common subexpressions,
-0.315188	the linker to remove unreferenced
-0.356483	always end with a non-recursing
-0.982129	of the Intel compiler puts
-0.165195	version of Mathcad (v. 15.0)
-0.336316	or compile-time generation of identifier
-0.236134	today. But this language gained
-0.200060	in the early planning stage
-0.212332	float i2; for(i=0,i2=0; i<100; i++,i2+=2.0f)a[i]=i2;
-0.525966	use an intermediate code (byte
-0.237234	Use ReadTSC() from library asmlib..
-0.314703	program optimization or for combining
-0.237910	overloaded or limited in scope.
-0.237790	to 120 ms by selecting
-0.612561	in large data structures .............................................................
-0.232348	memory, windows, mutexes, database connections,
-0.237882	S1 list[100], *temp; for (temp
-0.352286	137 errors must be added.
-0.200060	for a discussion. 7.33 Namespaces
-0.346062	data structure and then merge
-0.354258	unsigned integers (see page 142).
-0.356126	table 9.2, such as flush
-0.165195	Examples include JavaScript, PHP, ASP
-0.235425	libraries are not well documented.
-0.352485	or circumvent operating system standards.
-0.251379	...................................................................................................... 37 7.8 Member pointers.......................................................................................................37
-0.237882	a software module for correctness
-0.324387	Conversions involving class objects (rather
-0.200060	the other is -0 (zero
-0.237716	only first time int CriticalFunction_Dispatch(int
-0.526728	(BTB). Contentions in the BTB
-0.294292	available. Some compilers offer profile-guided
-0.513357	4 PUBLIC ?Func@@YAXQAHAAH@Z ?Func@@YAXQAHAAH@Z PROC
-0.443318	11; // exponent + 0x3FF
-0.885415	counter. Example: // Example 7.32a
-0.236253	*p = string; while (*p
-0.229257	these problems are usability issues,
-0.346024	or 2016. The same coding
-0.292683	void F1(int x[]); void F2(float
-0.606886	18 Overview of compiler options.......................................................................................
-0.443318	15; // exponent + 0x3FFF
-0.200060	mov add cmp jl $B1$3:
-0.451050	size; i++) a[i] = 0.0;
-0.165195	code as example 12.4b, rewritten
-0.200060	threads, such as semaphores, mutexes
-0.289801	= int64_t 2 AVX2 _mm256_i64gather_epi32
-0.294240	a possible point of attack
-0.200060	rather than the external clock.
-0.200060	such as pow, log, exp,
-0.236561	and different user access rights.
-0.345248	clock counts are often fluctuating
-0.935698	this problem is to combine
-0.290384	----x---- !(a<b)=(a>=b) (a<b && b<c
-0.294138	JNZ). This solution can incur
-0.347696	library files are also included.
-0.325312	of these directives are compiler-specific.
-1.378681	for the sake of security,
-0.325340	int. Reinterpret cast The reinterpret_cast
-0.331881	by bypassing the so-called CPU-dispatcher
-0.349629	1.0f + b * 1.5f;
-0.237504	r.b;} The three functions Sum1,
-0.291295	and write configuration files (*.ini
-0.331908	the even integer is returned.
-0.165195	AMD Opteron K8 1.09 1.25
-0.265216	0.63 0.75 0.18 0.11 1.21
-0.915277	members of a class (also
-0.228178	execution and advanced prediction mechanisms.
-0.200060	vector processors. Henry S. Warren,
-0.200060	< columns; j++) 39 matrix[i][j]
-0.237927	is OK, however, to pass
-0.340420	have similar CPU dispatch mechanisms,
-0.224962	method Function inlining x-xxxx--x Constantfolding
-0.325265	CriticalFunctionType * CriticalFunction = &CriticalFunction_Dispatch;
-0.357600	program package is not traditionally
-0.436548	file in simple cases. Database
-0.212332	32-bit integers to alias upon
-0.235051	the program - preferably isolated
-0.467675	good to do a thorough
-0.230829	(GetExceptionCode() == EXCEPTION_FLT_OVERFLOW ? EXCEPTION_EXECUTE_HANDLER
-0.232721	The bitwise AND operation isolates
-0.165195	look in my crystal ball
-0.237452	FMA4 fma4intrin.h (Gnu) all intrin.h
-0.237829	int)a / 10; // Convert
-1.086190	mathematical functions such as pow,
-0.339815	of using a common denominator
-0.228181	- min) <= (unsigned int)(max
-0.218601	follows a simple regular pattern,
-0.477362	developers should be aware of.
-0.165195	for(i=0,i2=0; i<100; i++,i2+=2.0f)a[i]=i2; 41 Float
-0.559601	be an efficient solution. Sort
-0.292925	and delete, and often excessively
-0.165195	code have been reordered, inlined,
-0.200060	will then be repeated 1024/4
-0.281590	min && i <= max)
-0.344202	with, e.g. the option /QaxAVX
-0.293931	no explanation why this delaying
-0.237816	for speed /O2 or /Ox
-0.318581	/EHs- No stack frame /Oy
-0.237908	loop by n and reorganize
-0.237816	programs use internet or intranet
-0.234386	(u.i * 2 > v.i
-0.231419	another vector register containing (2,2,2,2),
-0.343177	call the library functions directly:
-0.224966	7.8 if (handle != INVALID_HANDLE_VALUE
-0.165195	2005). Borland C++ 5.82 (Embarcadero/CodeGear/Borland
-0.453023	Assume no pointer aliasing /Oa
-0.236737	/O3 -O3 Interprocedural optimization /Og
-0.355591	identification (RTTI). See page 54.
-0.237908	instruction latencies, throughputs and micro-operation
-0.165195	the Boolean operators &&, ||,
-0.237287	= a XOR b Bit
-0.466274	limited number of possible inputs.
-0.355623	will generate the value infinity,
-0.290016	and negative inputs give infinity.
-0.229256	prevent it from fully utilizing
-0.491495	This is a simple solution,
-0.200060	make vectorization less favorable: Larger
-0.349586	the STL has been criticized
-0.714613	Gnu, Clang, Intel or PathScale.
-0.231422	the possibility of algebraic reduction.
-0.462241	has chosen for the label.
-0.420243	This may be faster despite
-0.312620	Linux Optimize for speed /O2
-0.454449	a user has to reinstall
-0.314790	that runs under the framework,
-0.235778	of a vector, uses SSE3.
-0.492811	invalid in a particular situation,
-0.234544	truncation, and % means modulo.
-0.356776	Example 8.3b a = 5.0f;
-0.200060	Example 8.12b int a[2]; a[0]
-0.494929	programming, how to avoid hard-to-find
-0.237340	dynamically allocated memory, using new.
-0.165195	Non-strict floating point -ffast-math /fp:fast
-0.352286	of i must be adjusted
-0.237861	object oriented programming are dominating.
-0.236610	does the same without discriminating
-0.294239	consumption was down to 36.
-0.165195	2014. Last updated 2014-08-07. Contents
-0.165195	Glibc v. 2.7, 2.8. Asmlib:
-0.237931	following example converts a zero-terminated
-0.294246	but this unit is pipelined,
-0.456040	more efficient than a polymorphous
-0.237908	by defining _mm_malloc and _mm_free.
-0.335959	the pipeline and later discovers
-0.200060	bytes without cache MOVNTPD _mm_stream_pd
-0.237206	access patterns containing multiple streams
-0.218603	bytes without cache MOVNTQ _mm_stream_pi
-0.293314	loader (requires binutils version 2.20,
-0.237643	Internet at regular time intervals.
-0.335597	across modules (See page 81).
-0.381731	(20 - 45 clock cycles).
-0.237131	operation. For example,a * 16is
-0.343702	arguments while pointers and non-constant
-0.237934	of it (&ArraySize) is taken.
-0.314735	give annoyingly long and irregular
-0.320858	is that the linker extracts
-0.331908	if its address is taken,
-0.376467	graphics objects in computer games
-0.350233	be 1 b = lrint(d);
-0.322764	32 4 int 128 Is32vec4
-0.324006	int unsigned int 64 Is32vec2
-0.235996	the design of small microcontrollers:
-0.346326	block or function call (other
-0.165195	--combine -fwhole- program /Qipo -ipo
-0.237716	SomeFunction (int a, int x[])
-0.237319	unsigned int dummy; double a[arraysize],
-0.229261	printf("Gamma"); break; case 3: printf("Delta");
-0.237033	precision by default, so 1.2
-0.314533	N = 1. This ends
-0.200060	be unable to respond quickly
-0.233049	of relieving a syntax restriction,
-0.356126	of purposes such as email
-0.232350	storage. All x86 platforms (Windows,
-0.224962	beginning of list plus i*sizeof(S1).
-0.358548	unused bytes in the end.
-0.292884	costs of position-independent code. 147
-0.532848	numbers at a time packed
-0.235342	later reads from addresses 0x2F00,
-1.039800	compilers have an option (Windows:
-0.234385	= a&&(b||c) (a&&!b) || (!a&&b)
-0.237908	have a temp1 and temp2.
-0.165195	expensive. A limited "express" edition
-0.458282	equal to the critical stride,
-0.354714	this distance the critical stride.
-0.165195	be calculated as (critical stride)
-0.347482	each thread than to temporarily
-0.313631	not uncommon for software teachers
-0.237908	cost of starting and stopping
-0.355623	compiler to the value 0x2C
-0.324715	The first generation class (CGrandParent)
-0.237569	modern computers have memory caches.
-0.235249	and v.f are both positive.
-0.165195	1994. Mostly obsolete. Rick Booth:
-0.251379	set. High precision math. Libraries
-0.165195	template metaprogramming so complicated? Because
-0.237829	a[i].u[1] * 2; // Find
-0.237816	available in 2015 or 2016.
-0.311817	reciprocal Boolean algebra reductions: !(!a)=a
-0.314689	operand is infinity or NAN.
-0.218601	of each integer type. Interrupt
-0.357395	is intended to be platform-independent
-0.165195	functions such as strcpy, strcat,
-0.165195	and afterwards a BSF (bit
-0.237756	align the arrays as required,
-0.231419	A big file containing numerical
-0.237884	called name mangling. The characters
-0.358404	reasonable estimate can be made)
-0.294240	on all sizes of matrices.
-0.165195	Sunday, Monday, Tuesday, Wednesday, Thursday,
-0.236961	inferior to their 32-bit counterparts.
-0.355591	STL containers. See page 90.
-0.607457	Don't mix float and double.....................................................................................
-0.568686	available at compile time. (Of
-0.725073	drawbacks of the C++ language......................................................
-0.165195	or 32 bits (rarely 64).
-0.358548	unnecessarily wasteful in the STL.
-0.738687	few cases where it matters:
-0.907740	n.a. - a & 0=
-0.237910	cannot be determined in advance,
-0.635305	7.2 Integers variables and operators...............................................................................
-0.354258	as intended (see page 84).
-0.200060	for 80x86 / x64 (Visual
-0.234540	floating point multiply-and-add Table 13.1.
-0.237340	dynamic memory allocation using new/delete
-0.357747	floating point: // Example 14.22b
-0.885415	overflow. Example: // Example 14.22a
-0.350327	not optimal from a technological
-0.236587	before the performance even matters,
-0.237464	on an interpreter which interprets
-0.525986	all files on access. Run
-0.547771	libraries are useful for vectorizing
-1.065943	bytes. first byte at 400,
-0.165195	Report on C++ Performance". www.open-
-0.357747	metaprogramming is. // Example 15.1d.
-0.229256	but avoids the overflow. Taking
-0.538888	line has to be reloaded
-0.352382	position-independent has the following features:
-0.237882	free E-book Usability for Nerds
-0.846704	multiple versions of the user-written
-0.237593	polymorphic function. The } 59
-0.165195	AMD LIBM Library amd_vrs4_expf amd_vrd2_exp
-0.237131	x = 2 * 5;
-0.349127	mark_end; This solution is clearly
-0.165195	no pointer aliasing /Oa -fno-alias
-0.235338	the compiler does quite ingenious
-0.200060	with the same template. 57
-0.339815	by making a common denominator:
-0.237319	#define pure_function #endif double Func1(double)
-0.229255	unless the SSE2 (or later)
-0.237928	Fog. Technical University of Denmark.
-0.519535	line: static inline void StoreNTD(double
-0.212332	bits than a float. (Both
-0.165195	5.82 (Embarcadero/CodeGear/Borland C++ Builder 5,
-0.237910	the dependency chain in two:
-0.593132	later. Example: // Example 14.18a
-0.357747	double precision: // Example 14.18b
-0.355591	member function. See page 53.
-0.341354	select_gt(b, zero, c + two,
-0.526977	sign :1;//signbit }; struct Slongdouble
-0.357747	a union: // Example 9.2b
-0.593132	time. Example: // Example 9.2a
-0.324454	32-bit or 64-bit mode. Much
-0.349678	cache line would be evicted.
-0.237927	the update mechanism to advertise
-0.265216	0/a=0 ---xx--xx (-a==-b)=(a==b) ---xx---- (-a>-b)=(a<b)
-0.237861	if time intervals are short.
-0.355591	is requested. See page 45.
-0.165195	1./5040., 1./40320., 1./362880., 1./3628800., 1./39916800.,
-0.237452	compiler can replace all occurrences
-0.200060	by the processing power. Connecting
-0.354258	Bounds checking (see page 134)
-0.331183	by 16. In example 12.1a,
-0.165195	such as memcpy, memmove, memset,
-0.237934	when the original is destroyed.
-0.237075	n = 4, we have:
-0.354493	to zero by using memset:
-0.283156	processors (0, 2, 4, etc.).
-0.338284	ones mentioned in table 9.2,
-0.165195	it does incredibly stupid things.
-0.293931	to work around this limitation
-1.460680	For example: // Example 8.24.
-0.218603	network access to virus attacks
-0.357747	control condition: // Example 7.32b
-0.237884	one that doesn’t. The undocumented
-0.512713	information that we can surely
-0.218603	AND'ing it with 2n -1.
-0.607249	the following conditions are met:
-0.578831	until the program is shut
-0.561334	discussed in the following sections.
-0.165195	of strange and unexpected behaviors.
-0.372984	vectorization. Optimizes reasonably well. Very
-0.230828	most C++ compilers allow assembly-like
-0.165195	fast 32-bit software development", Addison-
-0.331946	this problem are the following:
-0.294270	libraries. To explain the difference,
-0.357959	memory access is a bottleneck.
-0.231420	numbers form a logical sequence.
-0.758531	void test () { __declspec(__align(64))
-0.345077	14.19 below. The function rounds
-0.231918	a result of macro expansions.
-0.353420	loop and have a temp1
-0.235835	(set) = (0x2710 / 0x40)
-0.355331	the integer value of temp.
-0.237593	- (time before) } printf("\nResults:");
-0.165195	the project at hand. Low-level
-0.236422	x.abc = (A & 0x0F)
-0.631881	Runtime type identification (RTTI) ...........................................................................
-0.990884	p(double x) { return powN<true,N/2>::p(x)
-0.165195	the optimized code (release version)
-0.537627	copy a to b memcpy(b,
-0.165195	int dummy; double a[arraysize], b[arraysize],
-0.358060	is closest to the truth
-0.314790	processors will support the ADX
-0.325424	in a loop of ADC
-1.342908	it is important to realize
-0.165195	for a specific purpose: Contain
-0.222408	6, 9 and 13 objects,
-0.235778	64 2 int64_t 128 I64vec2
-0.357411	64-bit Windows may be mitigated
-0.200060	-msse3 -mssse3 -msse4.1 -mAVX -axSSE3,
-0.346466	(rather than pointers to objects)
-0.346449	to be available in 2015
-0.356201	be improved is that r+i/2
-0.293915	often underestimate this time lag.
-0.237908	functions look clumsy and tedious.
-0.291482	improvements in microprocessor hardware design.
-0.236983	a well optimized software design,
-0.237573	graphical user interfaces from scratch.
-0.442891	op. Intel Core 2 0.63
-0.200060	--xxxx-xx a*1=a x-xxxxx-x (-a)*(-b)=a*b ---xxx---
-0.233336	cover graphics processors. 5 Programmable
-0.357999	is likely that the producer
-0.438865	implies more than it says.
-0.212332	option (Windows: /Gy, Linux: -ffunction-sections)
-0.382872	this memory block is re-allocated
-0.408138	next each bit in nn
-0.231417	the carry flag (e.g. DEC,
-0.440413	be transferred in registers, whereas
-0.237829	= (double)(signed int)u; // Faster,
-0.200060	anyway. Pure function. __attribute__((const)) (Linux
-0.165195	* 5 * 0.5 ns
-0.165195	Edition, 2005; and "More Effective
-0.236046	AMD SSE4A ammintrin.h AMD XOP
-0.355611	|| (!a&&b) = a XOR
-0.654916	be used as a stand
-0.165195	point -ffast-math /fp:fast /fp:fast=2 -fp-model
-0.165195	efficient than a polymorphous class?
-0.325390	delete (or malloc and free)
-0.165195	1./4.790016E8, 1./6.22702E9, 1./8.71782E10, 1./1.30767E12, 1./2.09227E13};
-0.354258	or 1 (see page 135).
-0.231418	disk because of disk caching,
-0.315862	#include <stdio.h> // define fprintf
-0.314762	r in Sum2 and Sum3.
-0.354018	Catch exceptions in this block:
-0.165195	has been doubled. Thin clients
-0.232720	Open Watcom C/C++ v. 1.4,
-0.165195	0; column < NUMCOLUMNS; column++)
-0.237543	or another error has occurred
-0.234029	use a version control tool.
-0.463318	extra overhead of the iterator
-0.237718	simply by performing an illegal
-0.593132	once. Example: // Example 8.6a
-1.083356	this by // Example 8.6b
-0.582370	sufficient to make a zip
-0.357747	example: 38 // Example 7.15a.
-0.352767	increased by more than 33%
-0.357395	prefer a to be signed.
-0.331908	if the integer is signed,
-0.357747	of underflow: // Example 7.5.
-0.226798	sum = (s0+s1)+(s2+s3); Now s0,
-0.165195	But this language gained remarkably
-0.290697	for small embedded systems. Today
-0.237927	designers have gone to great
-0.165195	write configuration files (*.ini files).
-0.212332	-msse4.1 /arch:SSE4.1 -mAVX /arch:AVX /QaxSSE3,
-0.307002	test tool for details (www.agner.org/optimize/testp.zip).
-0.294187	full 64-bit addresses for everything,
-0.212332	for hard disk copying. Security.
-0.165195	controversies over the C99 standard.
-0.228178	are several different profiling methods:
-0.165195	9 and 13 objects, respectively
-0.291789	vmlsExp4 vmldExp2 Intel SVML v.10.3
-0.291789	2 double Intel SVML v.10.2
-0.237816	called square blocking or tiling.
-0.307594	// Array of 100 doubles:
-0.165195	-56 rather than 200. Next,
-0.549474	choosing the most efficient alternative.
-0.226800	the STL (Standard Template Library)
-0.307000	$B1$1: mov mov mov lea
-1.052275	array static inline void StoreVectorA(void
-0.352286	manually. It must be emphasized
-0.314689	dynamic libraries (*.dll or *.so)
-0.231918	article on compiler optimization. en.wikipedia.org/wiki/Compiler_optimization.
-0.165195	smmintrin.h (Gnu) AES, PCLMUL wmmintrin.h
-0.165195	blocks such as gates, flip-flops,
-0.237910	and other languages in Microsoft's
-0.485987	in the assembly output (/FAs
-0.358060	integer According to the standards
-0.653216	are used. See page 140.
-0.165195	int i; float i2; for(i=0,i2=0;
-0.212332	to do the divisions (Division
-0.505886	list[301]; int i; for(i=0; i<301;
-0.165195	no pointer aliasing" (if valid)
-0.335665	CPU feature on Intel CPU’s.
-1.162233	< size; i++) { ab[i].b
-0.236217	optimized for accessing arrays forwards,
-0.165195	pivot in a Gauss elimination.
-0.345248	Unfortunately, profilers are often unreliable.
-0.629963	It is easy to port
-0.276614	purposes. Available from www.agner.org/optimize/asmlib.zip. Currently
-0.339377	costly and which are cheap,
-0.316496	e.g. Intel Math Kernel Library.
-0.165195	11, Iss. 4, 2007 (www.intel.com/technology/itj/).
-0.293745	technical details of instruction timing,
-0.233824	make the matrix 512 520
-0.236875	a class (also called properties)
-0.351405	operators produce a single result,
-0.341862	Users should get a reply
-0.838524	changed to: // Example 14.17b
-0.707718	unsigned int fraction : 52;
-0.237861	begins with #) are costless
-0.303147	called the branch misprediction penalty.
-0.343595	it has done by fetching,
-0.224962	priority back to normal afterwards.
-0.237823	const int ABC = 123;
-0.428530	Organize the data into groups
-1.057937	* p) { return _mm_load_si128((__m128i
-0.237662	many people who have sent
-0.230829	on very small loops (less
-0.165195	Intel SVML + ia32intrin.h _mm_exp_ps
-0.656803	long enough to be noticeable
-0.165195	SVML + ia32intrin.h _mm_exp_ps _mm_exp_pd
-0.230119	object through this address. Step
-0.218601	by using the directive __declspec(cpu_dispatch(...)).
-0.251379	i; float a[size], b[size], c[size];
-0.237908	with this mask, and bb[i]*cc[i]
-0.212332	Example 7.16 float list[100]; memset(list,
-0.463154	be seen in the broader
-0.236960	= 0; column < NUMCOLUMNS;
-0.237908	difference between commas and semicolons
-0.500598	operator; and you can toggle
-0.593132	operation. Example: // Example 14.7a.
-0.348040	12 Using vector operations Today's
-0.346541	This alignment can cause holes
-0.234540	8 512 AVX512 Table 12.1.
-0.165195	such contrived examples exist. Therefore
-0.344361	generate an assembly language output,
-0.165195	eax. The loop initialisation i=0;
-0.539117	called in a single session.
-0.575790	implementations of the same algorithm,
-0.237734	Copyright © 2004 - 2014.
-0.235997	+= a[i+1]; s2 += a[i+2];
-0.228178	calculated asa << 4, anda
-0.354567	or -fno-strict-overflow. You may deviate
-0.228178	third party security software. Background
-0.234683	the compiled versions #include "instrset_detect.cpp"
-0.237349	compiler to reduce example 12.1b
-0.439369	b than to write _mm_add_epi16(a,b).
-0.357515	operator (&) and the EXCLUSIVE
-0.674411	assembly code from example 8.26b:
-0.237212	// Example 14.6 float list[16];
-0.222408	models have a strict formalism
-0.434847	only on CPUs with full-size
-0.462561	error conditions in a graceful
-0.810302	mispredicted only when it changes.
-0.316723	the Active Template Library (ATL)
-0.165195	-static /MT 160 /Qparallel -parallel
-0.224965	1; n <= 16; n++)
-0.568686	done at compile time. (Examples
-0.532159	(Tuesday | Wednesday | Friday))
-0.232723	options turned on, including relaxed
-0.165195	the & operator (bitwise and)
-0.312514	Optimization Guide for AMD Family
-0.291649	instruction set not supported fprintf(stderr,
-0.355097	listed below in example 16.1.
-0.354834	data than it can handle.
-0.236718	functions take most time. Uses
-0.237816	a program creates or modifies
-0.379128	just-in-time compiler can optimize specifically
-0.336305	commercial compilers due to controversies
-0.236560	// Example 9.2a void F1(int
-0.504487	not included in the representation,
-0.212335	is four places back. Thus,
-0.234937	has many features, see http://www.agner.org/optimize/
-0.212335	80.8 65 65 13.6 80.9
-0.165195	9.6b 64 64 14.0 80.8
-0.421361	a more clear and intelligible
-0.382901	possible to utilize the computational
-0.237816	Data alignment. __declspec(align(16)) or __attribute__((aligned(16))).
-0.237643	profilers are: Coarse time measurement.
-0.325312	itself. Function addresses are obscured
-0.236631	sensible balance between these considerations.
-0.563925	structure of a program dictates
-0.237874	an illegal operation that crashes
-0.800118	static member function is 83
-0.165195	(See Sutter: A Pragmatic Look
-0.233335	in assembly language", section 17.9:
-0.165195	such as gates, flip-flops, multiplexers,
-0.358060	one iteration to the next.
-0.293643	floating point and integer representations
-1.168227	is no need to deallocate
-1.187919	const & x) { _mm_store_si128((__m128i
-0.313504	can sometimes be eliminated completely.
-0.237861	function calling conventions are different.
-0.324689	whether the different compilers succeeded
-0.237662	Mac OS, etc.) have little-endian
-0.354018	as described in this chapter.
-0.165195	16 char 128 Is8vec16 Vec16c
-0.349708	libraries have CPU dispatching 125
-1.460680	For example: // Example 14.16a
-0.237955	such as eliminating the if-branch
-0.569721	This manual is based mainly
-1.439329	as explained on page 132.
-0.236453	The unsigned integer type size_t
-0.817222	objects accessed in a FILO
-0.212332	from 0 to 12. Higher
-0.228180	DWORD PTR [eax+4], ecx 86
-0.165195	x, y; // x,y coordinates
-0.237131	y1 = a1 * b2
-0.237131	y2 = a2 * b1
-0.358548	an example in the "Macro
-0.351712	80 into a and b.
-0.350871	such as AMD and VIA.
-0.233594	this purpose. It just happened
-0.284379	53 function at runtime. Polymorphism
-0.236509	doubles by comparing bits 32-62.
-0.421136	elimination and loop-invariant code motion.
-0.229257	for many common purposes (www.boost.org).
-0.236253	seconds = 0; while (seconds
-0.350717	supported"); return; } // continue
-0.341677	will work only on Intel/x86-compatible
-0.357747	point variable: // Example 7.26b
-0.165195	Exception Specifications, Dr Dobbs Journal,
-0.331761	compilers. Intel C++ compiler (parallel
-0.593132	variable. Example: // Example 7.26a
-0.518163	this may improve the possibilities
-0.341371	before coordination with other subtasks
-0.237816	a particular weakness or bottleneck,
-0.325337	too little data for analysis.
-0.279510	....................................................................................................... 150 16 Testing speed..............................................................................................................
-0.200060	square root, RGB color difference.
-0.200060	tmmintrin.h SSE4.1 smmintrin.h SSE4.2 nmmintrin.h
-0.165195	are defined with enum, const,
-0.237908	named MKL, VML and SVML.
-0.547006	be less efficient than non-object
-0.237927	a portability issue to catching
-0.237670	F0() { try { F1();
-0.355449	vector operations are not used).
-0.237115	in Linux kernel version 2.6.30
-0.237409	get very expensive cache contentions,
-0.237790	that delays execution by causing
-0.594800	< SIZE; c++) { StoreNTD(&a[c][r],
-0.294094	recovery information for function F1.
-0.276612	Linux, Gnu/AT&T syntax: __asm ("fldl
-0.593132	needed. Example: // Example 8.19.
-0.290966	way or another. Therefore, micro-
-0.576429	15.1c is faster than 15.1b,
-0.165195	Example: // Example 8.20 module1.cpp
-0.237816	automatic CPU dispatching or memory-intensive
-0.237910	exception occurs somewhere in F1?
-0.237288	taking cache effects into account.
-0.165195	part of the Xnu project.
-0.236983	starting a new software project,
-0.381530	have to distinguish between recoverable
-0.330213	systems DOS and Windows 3.x.
-0.309119	an array with bounds checking,
-0.165195	busy doing the spell checking.
-0.568578	reduced to: // Example 8.10b
-0.357747	always false: // Example 8.10a
-0.291923	are not fully optimized yet.
-0.165195	0.40 0.30 4.5 0.82 0.59
-0.293998	int dummy[4]; volatile int DontSkip;
-0.200060	2;} int a; Plus2 (&a);
-0.235340	of the whole program. During
-0.165195	---xx--xx (-a==-b)=(a==b) ---xx---- (-a>-b)=(a<b) ---xx---x
-0.354160	latter case, you may view
-0.228179	product is Borland's now discontinued
-0.281589	but rarely in Linux. Address
-0.356477	reason why there is virtually
-0.357600	the STL is not satisfactory.
-1.439329	as explained on page 87.
-0.200060	/fp:fast /fp:fast=2 -fp-model fast, -fp-
-0.743106	Using performance monitor counters ....................................................................
-1.300914	can be used for fetching
-0.520727	a[100]; int i; float i2;
-0.224962	vector of (0,0,0,0,0,0,0,0) Is16vec8 zero(0,0,0,0,0,0,0,0);
-0.251379	coprocessor or graphics accelerator card.
-0.714350	only one call to Func1,
-0.200060	because the register usage convention
-0.230826	discontinued Object Windows Library (OWL).
-0.462241	expressions (except for the <,
-0.356126	in comparisons, such as <.
-0.165195	run. Examples include JavaScript, PHP,
-0.356637	may be because the non-reduced
-0.427360	at the object file level.
-0.349919	properly and the memory released
-0.380627	type by type-casting its address:
-0.237908	in the Professional and Enterprise
-0.235778	64 2 uint64_t 128 Vec2uq
-0.569384	7.12 Branches and switch statements.............................................................................
-0.319895	Linux, Mac Windows, Linux, Mac,
-0.441611	it is available from www.agner.org/optimize/testp.zip.
-0.237816	such as string or CString.
-0.237927	am always happy to receive
-0.165195	Borland bcc, v. 5.5 Mac:
-0.356847	An inline function is expanded
-0.165195	the problems and planned solutions.
-0.495882	one of the following solutions,
-0.237131	b = (a+1) * (a+1);
-0.357747	two gives: // Example 7.30b
-0.593132	loop. Example: // Example 7.30a
-0.165195	set SSE2 not supported"); return;
-0.352233	SSE. Several function libraries published
-0.231417	a system call (e.g. GetProcessAffinityMask
-0.237866	that all software be reinstalled
-0.234937	we may also see emulated
-0.554685	tree or a hash map.
-0.200060	-m64 -static /MT 160 /Qparallel
-0.382836	is the exponent, and fffff
-0.286169	in Java, C#, Visual Basic,
-0.356776	{1.0f, 2.5f}; a = OneOrTwo5[b
-0.237882	had an interpreter for Basic.
-0.421400	out which one is fastest.
-1.018369	(float x) { return square(x)
-0.226798	the last index changes fastest:
-0.226797	T> static inline T max(T
-0.332050	before it is too late.
-0.200060	// Example 7.15b SafeArray <float,
-0.314735	proxy is smaller and closer
-0.325340	not. Static cast The static_cast
-0.165195	because of their superior performance/price
-0.306483	still give a considerable improvement
-0.237563	calculate the table at runtime,
-0.165195	x2, x); // x^1, x^2,
-0.350184	(zero with sign bit set).
-0.237739	= 64 kb. This corresponds
-0.165195	only show a discrete icon
-0.165195	3; or __asm ("int 3");
-0.165195	Opteron K8 1.09 1.25 1.61
-0.354258	smart pointer (see page 38).
-0.228179	etc. #define Alignd(X) X __attribute__((aligned(16)))
-0.318241	Intel: "Intel® C++ Compiler Documentation".
-0.325392	it is done in connection
-0.222406	if the program runs satisfactorily
-0.165195	platform software development kit (SDK
-0.314577	a variable-size array with alloca:
-0.229255	it is very inefficient. Linear
-0.237716	// Example 14.13c int list[301];
-0.276616	libraries............................................................................ 146 14.12 Position-independent code..................................................................................
-0.330213	Windows 7 and Windows Server
-0.165195	different object file formats. Comments
-0.509023	60 The cost of synchronizing
-0.237765	time you spend on redesigning
-0.314658	it has allocated with alloca,
-0.165195	Handles to windows, graphic brushes,
-0.842278	const x) { return _mm_cvtss_si32(_mm_load_ss(&x));}
-0.237910	We must bear in mind,
-0.513357	instruction set supports self-relative addressing.
-0.165195	the job before you. Optimized
-0.251379	development tools for supporting multi-threaded
-1.169461	code. Example: // Example 7.3.
-0.279510	Other manuals by Agner Fog
-0.165195	; unused label ;eax=addressofa ;edx=addressinr
-0.165195	Loop counter //=2*A //=A*x*x+B*x+C //=DeltaY
-0.212332	char, short int, float. Similar
-0.307648	container or memory pool. Alignment?
-0.540958	then make sure the startup
-0.165195	float lookup[2] = {2.6f, 1.5f};
-0.444322	brands, and one that doesn’t.
-0.593132	ways. Example: // Example 7.39
-0.574363	compiler will convert example 12.8a
-0.237927	convert example 12.8a to 12.8b
-0.237816	element to x?" or "how
-0.355097	as explained in example 7.35
-1.460680	For example: // Example 7.37
-0.237776	(everything that begins with #)
-0.165195	because it defines electrical connections
-0.593132	offsets). Example: // Example 7.36
-0.350745	: b) y = MAX(f(x),
-0.357420	faster than the function add_horizontal)
-0.320857	type-casting pointers: The trick violates
-0.237734	polynomial(x) = 2.5*x^2 - 8*x
-0.236862	about code optimization. See www.agner.org/optimize
-0.331614	of algebra, we may write:
-0.293936	that hackers often have exploited.
-0.237739	of its arguments. This closely
-0.237908	is reflected, first and foremost,
-0.508931	generality. The most important remedy
-0.236534	applications are highly system dependent
-0.237716	7.14 class c1; int c1::*MemberPointer;
-0.236846	StringLength; i > 0; i--)
-0.237908	for accessing list[i].a and list[i].b.
-0.237816	Files on remote or removable
-0.352371	is too important to ignore,
-0.212332	that volatile doesn't mean atomic.
-0.349629	intermediate expression b * 5).
-0.237372	stride (see above, page 87)
-0.537454	the code that is distributed.
-0.222410	But lazy binding definitely degrades
-0.165195	Example: // Example 7.3. Explain
-0.381896	Intel vector math library (VML,
-0.165195	of 2: template <bool IsPowerOf2,
-0.237823	i++) { ab[i].b = Func(ab[i].a);
-0.338110	computer. The Pentium 4 (NetBurst)
-0.350250	the compiler and it understands
-0.237200	CPUs have family number 6!
-2.093579	= 0; i < ArraySize;
-0.356112	this code version performs poorly.
-1.158979	simply a matter of habit,
-0.237823	const double log2 = log(2.0);
-0.232352	branch prediction. A Pentium M
-0.346895	multiplied by the clock period
-0.294217	designed for generality and flexibility,
-0.200060	organized in a first-in-last-out fashion.
-0.237908	1, 2A, 2B, and 3A
-0.858983	uses a lot of CPU-time
-0.165195	exceptions in this block: 62
-0.165195	the Common Language Runtime, CLR,
-0.165195	1.25 1.61 n.a. 2.23 0.95
-0.358060	adding n to the exponent:
-0.335808	only available with vector operands:
-0.521966	fourteen in 64-bit systems. 67
-0.237593	y = sin(x); } 68
-1.182411	return a + 1; 69
-0.702646	See page 130 for details).
-0.165195	Mac: Darwin8 g++ v 4.0.1.
-0.830478	an excessive number of DLLs,
-0.224963	integer to the structure. Incrementing
-0.555711	how to do the conversion.
-0.341336	be tested in different browsers,
-0.575394	= 20, columns = 50;
-0.859623	high that it is unrealistic
-0.235505	are based on hardware identification.
-0.229257	loop will take approximately 500
-0.320378	C++ compilers to choose between.
-0.545927	manual. You have to consult
-0.438267	value in the previous iteration.
-0.237074	case" counts. In any event,
-0.437962	This will be very helpful
-0.352445	without paying the performance costs.
-0.212335	direct hardware access. Available protocols
-0.234686	use a constant reference instead:
-0.230828	Integers of smaller sizes (char,
-0.356115	do automatic vectorization. Optimizes moderately
-0.283159	void AddTwo(int * __restrict aa,
-0.165195	can happen that (b*c) overflows,
-0.234540	double 4 AVX2 Table 12.3.
-0.517009	because it has been brutally
-0.237273	i ; i + sign(i)
-0.237600	level-2 cache contentions will occur:
-0.226798	v. 9.0 CodeGear Borland bcc,
-0.165195	return powN<(N1&(N1-1))==0,N1>::p(x) * powN<true,N-N1>::p(x); #undef
-0.165195	value written as 2eee 1.fffff,
-0.165195	en.wikipedia.org/wiki/Compiler_optimization. ISO/IEC TR 18015, "Technical
-0.579391	loop. The following example converts
-0.293332	by is (columns * sizeof(float))
-0.231921	expressed as follows: struct Sfloat
-0.237829	if powN is // erroneously
-0.212332	of additions and multiplications. Subtractions
-0.237174	C++ 5.82 (Embarcadero/CodeGear/Borland C++ Builder
-0.235837	anda * 17is calculated as(a
-0.294217	this to i and shifts
-0.237540	automatic parallelization. Supports vector intrinsics
-0.165195	add add cmp ja $B2$3:
-0.200060	Vec8ui Vec4q Vec4uq Vec4f Vec2d
-0.592045	compatibility problems and system breakdown.
-0.292041	divided into many small subtasks,
-0.237908	there between x and y?"
-0.539479	each call to a driver
-0.212332	different instruction sets Microprocessor producers
-0.165195	2 int64_t 128 I64vec2 Vec2q
-0.234539	usually divided into three parts:
-0.292948	8 8 char 64 Is8vec8
-0.325346	int cc[]); // function prototypes
-0.598661	a series of five manuals:
-0.237823	&list[100] is (int)(&list[100]) = (int)(&list[0])
-0.165195	a smaller memory footprint. If,
-0.165195	latencies, throughputs and micro-operation breakdowns
-0.592392	together in order to minimize
-0.165195	int 16 0 65535 uint16_t
-0.165195	systems: int 16 -32768 32767
-0.489070	access a variable in parts,
-0.237884	VML and SVML. The IPP
-0.200060	costs of optimizing University courses
-0.200060	xopintrin.h (Gnu) AMD FMA4 fma4intrin.h
-0.237910	application program. All in all,
-0.357420	look at the function bodies
-0.379482	elements. The instruction add eax,1
-0.237556	instruction set. Aligning data Loading
-0.237798	contentions will occur: if (SIZE
-0.165195	many bit manipulation tricks Michael
-0.292525	immediate responses to simple actions
-0.354468	supports then you are risking
-0.356793	1.fffff, where is the sign,
-0.236560	#define EXCEPTION_FLT_OVERFLOW 0xC0000091L void MathLoop()
-0.293896	are satisfied with more heuristic
-0.294094	the integer factorial function (n!)
-0.231419	fast approximate reciprocal square root,
-0.358548	data object in the GOT,
-0.232720	Kernel Library (MKL v. 7.2).
-1.439329	as explained on page 130.
-0.587533	saturated. This can be ameliorated
-0.235151	with many such programs installed
-0.358082	as pivot in a Gauss
-0.233336	legal issue. See my blog
-0.165195	at Exception Specifications, Dr Dobbs
-0.353858	algebraic expressions using the fundamental
-0.339834	[eax], ecx DWORD PTR [eax+4],
-0.237131	inline void StoreNTD(double * dest,
-0.200060	typeof(CriticalFunction) * CriticalFunctionDispatch(void) __asm__ ("CriticalFunction");
-0.165195	= b / 1.2345; Change
-0.356770	mathematical calculations, should be scheduled
-0.312455	behavior well-defined with option -fwrapv
-0.237299	is used for pointer conversions.
-0.237882	a limited audience for educational
-0.236810	in the YMM register state.
-0.212332	of loop ; compute i/2
-0.165195	Core Math Library __vrs4_expf __vrd2_exp
-0.228179	a network with heavy traffic
-0.200060	formula: (set) = (memory address)
-0.417445	annoying to the user. Compatibility
-0.236453	ways of doing type conversions:
-0.314679	the key values are confined
-0.237816	real-time speed. Delays or glitches
-0.237931	here. It reveals a funda-
-0.237319	() { __declspec(__align(64)) double matrix[SIZE][SIZE];
-0.236560	for each version void FUNCNAME(short
-0.535329	is swapped with element matrix[c][r].
-0.251379	alignment explicitly by writing: __declspec(align(64))
-0.165195	syntax: __asm ("fldl %1 \n
-0.350233	= Multiply(10,8); b = MultiplyBy<8>(10);
-0.233337	declared as constant references accept
-0.237908	sent me corrections and suggestions
-0.212332	of reduced performance. 25 Since
-0.289016	both positive and negative impacts
-0.320932	that is allocated dynamically (with
-0.222406	discussion of this method. Your
-0.348984	The lesson we can learn
-0.200060	considerations such as price, compatibility,
-0.293137	= c1; c2 < c1+TILESIZE;
-0.336148	switches after each time slice
-0.291082	............................................................................................. 136 14.5 Integer division......................................................................................................
-0.234542	element Instruction set needed _mm_shuffle_epi8
-0.234030	SVML v.10.3 & later __svml_expf4
-0.231919	unwise to use it. Complicated
-0.315860	{ temp->a = 1.0; temp->b
-0.237019	-2.0, 4.4, 2.5}; return list[x];
-0.237670	< &list[100]; temp++) { temp->a
-0.292147	compiler can eliminate common subexpressions
-0.512075	100 floating point operations (addition,
-0.236362	The reason is, I guess,
-0.357747	pointers, e.g.: // Example 12.1b.
-0.355097	required, but in example 12.1b,
-0.325265	>= 2) SelectAddMul_pointer = &SelectAddMul_SSE2;
-0.331325	different opinions on which imprecisions
-0.799722	// Define vector classes (Intel)
-0.165195	v.10.3 & later __svml_expf4 __svml_exp2
-0.237492	operating system and CPU hardware.
-0.352286	MOVNTQ instruction must be followed
-0.165195	SelectAddMul, SelectAddMul_SSE2, SelectAddMul_SSE41, SelectAddMul_AVX2, SelectAddMul_dispatch;
-0.293416	to have just two branches:
-0.290961	as 32-bit integer multiplication prior
-0.314054	from a=a*2; to return a+1;.
-0.493448	microprocessors and operating systems (but
-0.336135	undocumented Intel library function __intel_cpu_features_init()
-0.235996	except for some small low-power
-0.235997	+= list[i]; sum2 += list[i+1];}
-0.231417	with system calls (e.g. IsProcessorFeaturePresent
-0.237204	give the variable two names,
-0.331819	if certain conditions are satisfied.
-0.237884	used to be. The distinctions
-0.218603	i, i_div_3; for(i=i_div_3=0; i<300; i+=3,i_div_3++){
-0.330323	references do not need relocation
-0.354244	consuming. Sometimes it takes hours
-0.294120	- n.a. a+b = b+a,
-0.607249	the following conditions are satisfied:
-0.351173	previous one. It may neverthe-
-0.237212	x) { static float list[]
-0.306298	statement in the condition clause.
-0.329157	of making the container expandable,
-0.235504	to the IEEE standard 754
-0.165195	four floats F32vec4 xxn(x4, x2*x,
-0.230120	a 2 GHz CPU. Should
-0.237934	report that memset is deprecated.
-0.165195	In order to facilitate porting
-0.230826	__vrd2_exp AMD LIBM Library amd_vrs4_expf
-0.314507	take more than an hour.
-0.343678	detection function, one that discriminates
-0.237910	different compilers succeeded in applying
-0.237813	and other features it has.
-0.403906	doublevalue ( 1)sign 2exponent 1023
-0.165195	than it can handle. Waiting
-0.335900	use of Intel vector classes:
-0.251379	p1; p1 = &Object1; p1->Hello();
-0.460901	element __m128i a = _mm_blendv_epi8(bc,
-0.593132	overhead. Example: // Example 8.12a
-1.154678	this to: // Example 8.12b
-0.200060	// Function prototype CriticalFunctionType CriticalFunction_Dispatch;
-0.313337	as directly compiled code. (Compile
-0.235340	of a big program. Frequent
-0.356632	and frameworks, rather than isolating
-0.358014	<xmmintrin.h> _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON); It is strongly
-0.355416	calling program is more manageable
-0.292493	to be cleaned up include:
-0.623911	are transferred in registers, totaling
-0.237816	with option -Wstrict-overflow=2, or (5)
-0.488907	so-called nontemporal write instructions (MOVNT)
-0.276612	Basic is Visual Basic .NET,
-0.358082	matrix a in a column-wise
-0.218603	to the user. Making exception-safe
-0.212332	when the program 153 spends
-0.165195	0.77 0.89 0.40 0.30 4.5
-0.314610	0, sizeof(a)); } int Size()
-0.165195	264-1 uint64_t Table 7.1. Sizes
-0.237273	x.d = y.d + 4.;
-0.452720	easily available from a website.
-0.165195	the end user. Menus, buttons,
-0.472840	the code from example 9.5a:
-0.237365	the cache between each call,
-0.538464	to optimization by compiler .......................................................................
-0.165195	1./6., 1./24., 1./120., 1./720., 1./5040.,
-0.237716	byte at 403 int ReadB()
-0.233823	to print out results printf("\n%2i
-0.234540	0 264-1 uint64_t Table 7.1.
-0.313896	Contain one or multiple elements?
-0.466218	"assume no pointer aliasing" (if
-0.357411	program execution may be caused
-0.342086	of f cout << x.f;
-0.200060	--------x a/1=a x-xxx-x-- 0/a=0 ---xx--xx
-0.536126	byte = char 16 XOP,
-0.237816	structures with First-In-First-Out or First-In-Last-Out
-0.237131	(FuncRow(i)*columns + FuncCol(i)) * sizeof(float)
-0.218601	problems and necessary support. Hardware
-0.212332	Contains many bit manipulation tricks
-0.345060	~a&~b=~(a|b) --xxxx--- a & a=
-0.235946	cache lines for matrix a:
-0.584594	preferably in the same directory
-0.165195	Greek[4] = { "Alpha", "Beta",
-0.458480	will be calculated as (b*2.0)/3.0
-0.165195	the container. STL deque (doubly
-0.356776	critical function a = CriticalFunction(b,
-0.165195	Function level linking (remove unreferen-
-0.356446	as fast as a scalar
-0.318123	using templates (see p. 57).
-0.707718	unsigned int fraction : 63;
-0.218601	directives for parallel processing. Scott
-0.236253	caches and cause large delays.
-0.237829	optimizing away cpuid // Read
-0.230830	dispatch Automatic vectorization Automatic paralleli-
-0.165195	char 8 0 255 uint8_t
-0.279508	compilers can automatically detect opportunities
-0.228180	4, Wednesday = 8, Thursday
-0.402405	or int 8 AVX2 _mm_i32gather_epi32
-0.237662	branch. Microprocessor designers have gone
-0.237816	are not overlapping or aliasing,
-0.324281	four x^n } return add_elements(s);
-1.136823	int parm2) {...} // Dispatcher.
-0.294112	pointer, a reference, or void.
-0.292713	memory leak. An even worse
-0.325182	expression is calculated as ((a+b)+c)+d.
-0.165195	C++ is Microsoft Foundation Classes
-0.165195	such as strcpy, strcat, strlen,
-0.382726	C-style type casting // Constructor-style
-0.237882	test all branches for correctness.
-0.344772	will cause a cache miss.
-0.350327	steals resources from a higher-priority
-0.165195	party security software. Background services.
-0.346231	in my vector class library).
-0.439480	ecx, edx, DWORD PTR [esp+4]
-0.352818	int Size() { return N;
-0.237131	n floats: float * DynamicArray
-0.408011	__m128 s; s = _mm_hadd_ps(x,
-0.375611	int if (i >= N)
-0.462534	differences due to the design
-0.352093	and 15 clock cycles (depending
-0.575634	where the parallelism is obvious.
-0.456059	or if this is obvious,
-0.236560	function type typedef void FuncType(short
-1.222222	a = b * (1.
-0.499414	edx can be changed freely.
-0.294292	x (x) x (x) x-xx--xx-
-0.165195	SIZE; c++) { StoreNTD(&a[c][r], b[r][c]);
-0.312617	or (requires no specific option)
-0.237756	are particular advantageous as replacements
-0.381812	"Technical Report on C++ Performance".
-0.433553	information for the exception handler,
-0.237908	clauses: initialization, condition, and increment.
-0.293908	3.x. These systems use segmentation
-0.237756	by comparing them as integers:
-0.165195	/MT 160 /Qparallel -parallel -openmp
-0.458552	up the stack. This behaviour
-0.237790	0 or 1 by XOR'ing
-0.752787	has an option for RTTI
-0.561390	repeatedly with the same divisor.
-0.325023	+ a.x, y + a.y);}
-0.382718	separately: for (r2 = r1+1;
-0.236090	as the other thread increments
-0.235999	is used by exception handlers
-0.594958	accessed through pointers or references:
-0.165195	Intel C++ compiler (parallel composer)
-0.356260	class vector { // 2-dimensional
-0.337954	to 127 will generate -128,
-0.165195	// Loop counter //=2*A //=A*x*x+B*x+C
-0.212332	account when optimizing multithreaded applications:
-0.237816	the options -S or /Fa
-0.237718	way to handle an unrecoverable
-0.265216	or other error condition. Things
-0.323689	-S Generate map file /Fm
-0.357515	the pointers and the texts
-0.237928	for several iterations of redesign.
-0.234540	(n & 0x7FFFFF) | 0x3F800000;
-0.352666	option -fpie instead of -fpic.
-0.538908	a good deal of research
-0.358206	specific event it is servicing.
-0.200060	DontSkip; long long clock; __cpuid(dummy,
-0.503379	64; // number of rows/columns
-0.251379	p; p = &Object1; p->NotPolymorphic();
-0.165195	S. Goedecker and A. Hoisie,
-0.222406	are accessed through pointers, e.g.:
-0.444284	be a destructor that destroys
-0.233048	in the container. STL deque
-0.525473	function because the compiler knows
-0.237910	the Gnu utilities in 2010.
-0.237882	Compiler v. 14.00 for 80x86
-0.237908	stack frame, saving and restoring
-0.237273	y = a1/b1 + a2/b2;
-0.237019	= _mm_hadd_ps(s, s); return _mm_cvtss_f32(s);
-0.534658	link with the option -read_only_relocs
-0.237908	Several internet forums and newsgroups
-0.314068	same code as example 12.4b,
-0.357747	instruction set: // Example 12.4b.
-0.344153	16 or 32 bits (rarely
-0.356776	memory address a = 10000,
-0.165195	in the order a[0], b[0],
-0.354258	point expressions (see page 72).
-0.547115	No information about the dimensions
-0.165195	5.5 Mac: Darwin8 g++ v
-0.466218	a2, b1, b2, y1, y2,
-0.218601	run in parallel. Small lightweight
-0.284379	1: 4 + esp ;alignby4
-0.233335	problems with profilers are: Coarse
-0.237800	} // Branch/loop function vectorized:
-0.200060	of platform is obviously influenced
-0.232720	CodeGear Borland bcc, v. 5.5
-0.228179	systems, and API's. Memory swapping.
-0.349024	F1? Then we are breaking
-0.403162	optimization are discussed below. Cannot
-0.350149	by including the library libmmt.lib
-0.237882	take special precautions for speeding
-0.279510	; start of Func ;a
-0.237776	63 number (e.g. with _finite())
-0.251379	decomposition and data decomposition. Functional
-0.165195	of their superior performance/price ratio.
-0.200060	installation process can proceed unattended.
-0.325392	false model number to reflect
-0.165195	start of Func ;a ;r
-0.165195	float OneOrTwo5[2] = {1.0f, 2.5f};
-0.355591	from exceptions. See page 61.
-0.165195	-mveclibabi -fopenmp /Qopenmp -m32 -m64
-0.251379	or she is busy concentrating
-0.443710	use the bitwise operators (&
-0.230121	less well-known languages. My preference
-0.165195	{ int list[100]; Func1(list, &list[8]);
-0.943720	be transferred in registers (6
-0.236253	loop would be while (0
-0.292624	is available, 256 bits (YMM)
-0.357747	vector classes: // Example 12.4d.
-0.231919	called in the copying process,
-0.314790	are done under the best-case
-0.237273	b.x + c.x + d.x;
-0.426602	point XMM (vector) reductions: a+b=b+a,
-0.943720	be transferred in registers (8
-0.294120	Example 7.40c x.abc = (A
-0.378141	(B << 4) | (C
-0.236983	64-bit systems. A software developer
-0.234540	x.abc = A | (B
-0.165195	name Instruction set Prefetch PREFETCH
-0.234814	This is called name mangling.
-0.340834	Integer expressions are less susceptible
-0.231919	tips on improving performance. Stefan
-0.311867	7.5. Set flush-to-zero mode (SSE):
-0.331902	set (128 vectors of inte-
-0.200060	/arch:AVX /openmp /MT -msse3 /arch:SSE3
-0.355331	the preceding value of sum.
-0.356126	member function such as ReadB
-0.237816	such as VHDL or Verilog.
-0.237934	that model N-1 is inferior.
-0.353747	all but the first dimension
-0.222406	more reliable than third party
-0.200060	Family 15h Processors". www.amd.com. Advices
-0.312887	other form of error reporting.
-0.358404	the hardware can be wired
-0.593132	constant. Example: // Example 14.12a
-1.732243	time it takes to refresh
-0.237884	the pitfalls here: The inequality
-0.347531	cost to using templates. Ready
-0.237523	and p2 having different types.
-0.452602	a, b; b = !a;
-0.228181	integer int n; #if defined(__unix__)
-0.357515	the parameter, and the destructor,
-1.210453	is to make a destructor.
-0.357515	units, etc. and the wires
-0.501376	you may use the _mm_clflush
-0.294112	have mixed types or sizes?
-0.581211	how to make the SelectAddMul
-0.200060	---xxx-x- a+0=a x-xxxxxx- a*0=0 --xxxx-xx
-0.212332	switch statements............................................................................. 43 7.13 Loops......................................................................................................................
-0.237716	supported instruction set int iset
-0.358548	was saved in the beginning.
-0.823338	It is necessary to adhere
-0.237823	= dummy[0]; clock = __rdtsc();
-0.235835	b + 2.0 / 3.0;
-0.342404	only 256 clock cycles. Calculations
-0.234388	the method. A longer loop-
-0.212332	int i; for(i=0; i<100; i++)a[i]=2*i;
-0.237908	distinguish between recoverable and non-recoverable
-0.344526	function that has no side-effects
-0.165195	does incredibly stupid things. Looking
-0.407765	try to optimize this loop?
-0.165195	g++ v 4.0.1. Gnu: Glibc
-0.237374	an object of class C1,
-0.235945	file for each line written.
-0.539114	N1 } }; // Partial
-0.351281	of the elements in a[]
-0.200060	be improved by consistent modularity
-0.292528	to handle unknown processors properly.
-0.234814	-fp-model fast, -fp- model fast=2
-0.825601	Alignd ( short int aa[size]
-0.234214	in case F2 actually throws
-0.237927	the two branches to feed
-0.358548	incremented, while in the former
-0.165195	correction for the "FDIV bug".
-0.623259	slow down the execution considerably.
-0.226798	Development time Some developers feel
-0.325327	for finding problems that relate
-0.655572	the expression a = b++;
-0.236844	and several other less well-known
-0.524823	elements in a vector. 6.
-0.294000	cards, etc. Use an antivirus
-0.293779	Integers can be different sizes,
-0.352969	to feed into the pipeline.
-0.165195	%1 \n fistpl %0 "
-0.356126	building blocks such as gates,
-0.376467	of objects in computer games.
-0.237019	(see page 134) return FactorialTable[n];
-0.236718	way than last time. Newer
-0.237319	N> static inline double IntegerPower
-0.456970	as discussed on page 158.
-0.231417	and relational operators (e.g. '>')
-0.234540	of compiler options Table 18.1.
-0.319431	almost certain to become obsolete
-0.331901	compiler reduced 15.1b to 15.1c,
-0.237019	an explanation of return prediction).
-0.330174	systems: long int 32 -231
-0.289570	on is the feature information,
-0.237927	to reduce (a*b*c)+(c*b*a) to a*b*c*2.
-0.237776	disturb the users with nagging
-0.222406	in some cases. Multiple threads?
-0.535404	memory, as in example 7.22.
-0.272304	and various programming languages. www.yeppp.info
-0.236960	= &list[0]; temp < &list[100];
-0.288689	#else // 32-bit Windows, Intel/MASM
-0.200060	compilers for Linux. 82 Keywords
-0.331667	the final program. This requires,
-0.234935	reasons before leaving their workplace
-0.318578	memory allocation is particularly risky
-0.869589	have an option for "standard
-0.875847	cycles if it is cached,
-0.200060	p = & obj1; p->f();
-0.230828	page 134 on bounds checking).
-0.212332	this in a pivot search:
-0.237829	versions #include "instrset_detect.cpp" // instrset_detect
-0.341594	microseconds as a time measure.
-0.294217	of a program and concentrate
-0.237776	do two additions with double's.
-0.226798	formula in each case. Inlined
-0.237823	- n.a. a+a+a+a = a*4
-0.289015	a microprocessor that supports this).
-0.237131	x4 = x2 * x2;
-0.165195	without the Common Language Runtime,
-0.165195	This calculation requires n-1 multiplications,
-0.532166	set of test data. That
-0.293269	the cache. When we reach
-0.165195	floats F32vec4 xxn(x4, x2*x, x2,
-1.758762	x - - - 76
-0.200060	b : c x-xx----- 75
-0.355480	unsigned char short int 832
-0.306483	of course a considerable job,
-0.337957	part of the optimization job.
-0.311150	the index by 8. 71
-0.237593	a[i] = temp; } 70
-0.236862	and Gnu compilers. See www.openmp.org
-0.314735	frequency goes up and down.
-0.200060	manually by the programmer. 79
-0.294164	Verilog. Common devices are CPLDs
-0.237123	the object or array coincides
-1.154678	this to: // Example 8.14b
-0.276612	mouse move or key press.
-0.593132	value. Example: // Example 8.14a
-1.210453	is to make a bit-mask
-0.310718	therefore not be too worried
-0.235504	" : "=m"(n) : "m"(x)
-0.237955	size by extending the sign-bit
-0.593132	zeroes. Example: // Example 7.33a
-0.293165	and other things very stupid.
-0.237569	important remedy is memory pooling.
-0.200060	all the objects (memory pooling)
-0.294178	doubled. A thread that shares
-0.421361	software more clear and modular.
-0.224965	a BSF (bit scan forward)
-0.310717	} This has three advantages:
-0.237908	such as GetPrivateProfileString and WritePrivateProfileString
-0.226797	/ x64 (Visual Studio 2005).
-0.829858	no reason to use try,
-0.357561	vectors FMA3 floating point multiply-and-add
-0.229257	so that it writes only,
-0.341800	a square. // This triangle
-0.200060	- - x-xx----x x-xxxxxx- x-xxxx-x-
-0.237829	b = lrint(d); // Rounding
-0.231417	a universal algorithm (e.g. Quine–McCluskey
-0.234216	} This calculation requires n-1
-0.165195	expressions using the fundamental laws
-0.237927	// sum, initialize to x^0/0!
-0.382671	clauses are separated by semicolons,
-0.357008	the desired instruction set (/arch:SSE2,
-0.325265	= (float)i; f = float(i);
-0.345514	write it with many decimals.
-0.165195	optimize this loop? Certainly not!
-1.092093	together should be stored together......................................
-0.235778	long as their uses (live
-0.357561	-fno-alias Non-strict floating point -ffast-math
-0.345237	the large overhead of managing
-0.741509	a; int b; int Sum1()
-0.237934	that the occurrence is rare.
-0.571590	interrupt should preferably be responded
-0.231919	Next, we are adding -100
-0.237273	x.b = y.b + 2.;
-0.358082	the values in a pre-calculated
-0.200060	a*b = b*a (a+b)+c=a+(b+c) a+b+c=c+b+a
-0.814865	able to do the devirtualization
-0.347497	with Intel's compilers and invoked
-0.324745	for AVX2, or two 128-
-0.652237	is illustrated in example 9.5b.
-0.356960	resources Writes to a printer
-0.165195	process can proceed unattended. Uninstallation
-0.233593	solution on future CPUs. Half
-0.620435	as follows: Instruction set Important
-0.237908	} If Func1 and Func2
-0.200060	explain the difference, let's say
-0.331183	the application. In example 12.3a,
-0.719947	manipulating floating point variables .........................
-0.286840	operations into two 128-bit reads.
-0.165195	1., 1./2., 1./6., 1./24., 1./120.,
-0.448334	examples we are using unions
-0.165195	include C, C++, D, Pascal,
-0.251379	two loop-carried dependency chains, namely
-0.544064	analysis of the data structure,
-0.234543	well as writing data. Multidimensional
-0.237756	in 36 C++ as 'this'.
-0.515963	Vec16s when compiling for AVX2,
-0.234683	16.2 #include <stdio.h> #include <asmlib.h>
-0.336569	can run in both 16-bit,
-0.538604	the above example with u.i[1]
-0.234540	some compilers unroll too much.
-0.344125	a second induction variable (eax)
-0.165195	not aliased #pragma optimize("a", on)
-0.237955	here to draw the attention
-0.659105	causes misses in the level-
-0.343692	The development time and maintainability
-0.340728	int i; float f; f=i;
-0.443318	8; // exponent + 0x7F
-0.451072	here because we are relying
-0.165195	options in the BIOS setup.
-0.226800	{ Sunday = 1, Monday
-0.351861	expression that is an n'th
-0.230119	explained in the chapter "Register
-0.330862	= (a&b) | (~a&c) a&b&c&d
-0.237365	multiple statements within each clause
-0.353562	it work cannot be ignored
-0.573677	safer to use a union,
-0.237884	has three advantages: The i<20
-0.165195	(except for the <, <=,
-0.237908	reductions involving division and relational
-0.407838	double x2 = x *x;
-0.224963	and the "Intel Performance Primitives"
-0.265216	} } Example 14.30 finds
-0.236558	always true or always false:
-0.235896	by the code inside square:
-0.165195	should be obeyed. Copy protection.
-0.237816	methods are incremental or iterative
-0.235343	linked libraries or shared objects),
-0.318578	binary representation is particularly tricky.
-0.237955	the recommendation was the opposite:
-0.352969	go back into the for-loop:
-0.350363	Function inlining has the complication
-0.356776	the case a = ++b;
-0.237931	is only half a square.
-0.165195	version FuncType SelectAddMul, SelectAddMul_SSE2, SelectAddMul_SSE41,
-0.456640	important to have a strategy
-0.237790	and not negative by AND'ing
-0.236560	: x(0) {}; void xplus2()
-0.236177	works best on processor X?"
-0.212335	Pragmatic Look at Exception Specifications,
-0.354494	may even be a million
-0.561214	avoid dynamic memory allocation (new
-0.344871	be mispredicted for this reason.
-0.237657	operating systems". For this reason,
-0.165195	option) better: -Ofast -mveclibabi -fopenmp
-0.233593	the matrix into smaller squares
-0.218601	caching conditions are optimal. Best-case
-0.236325	get a reply about investigation
-0.467630	simply an integer in disguise.
-0.237019	using the normal return route.
-0.568283	out to be too small,
-0.237884	than its reputation. The compactness
-0.356407	to avoid the loop overhead.
-0.533747	Table // Loop counter //=2*A
-0.235504	b ? 1.5f : 2.6f;
-0.165195	print out results printf("\n%2i %10I64i",
-0.237373	If there are floating point-to-integer
-0.314433	processors with this instruction set?".
-0.429457	exit the loop. The loop-branch
-0.455877	& a) { return vector(x
-1.083356	this by // Example 8.8b
-0.335862	string manipulation Mathematical functions Encryption,
-1.169461	code. Example: // Example 8.8a
-0.224962	(a&~b)|(~a&b)=a^b --------- ~a ^ ~b
-0.200060	costly to many users. Firewalls,
-0.165195	^ b ---xx---- a<<b<<c=a<<(b+c) x-xxx--xx
-0.237273	(int)(&list[100]) = (int)(&list[0]) + 100*16,
-0.165195	root, RGB color difference. Newest
-0.340110	The exponent is always normalized,
-0.292291	interface (OnIdle in Windows MFC).
-0.590331	cases of stack unwinding ..............................................................................
-0.336147	running two threads with widely
-0.349678	the logarithm would be re-calculated
-0.478657	2 or not. The advise
-0.234816	share the same cache. Multithreaded
-0.235046	will be mainstream next year.
-0.200060	appropriate instruction set specified. Insert
-0.165195	such as function inlining. Reducible
-0.235779	// add elements }; vector()
-0.291644	few lines. A few decades
-0.237908	load is high and decreased
-0.944662	Obstacles to optimization by CPU.............................................................................81
-0.356776	function pointer a = (*CriticalFunction)(b,
-0.357747	for details. // Example 12.7.
-0.236561	algebra) require other access patterns.
-0.538868	takes to develop and publish
-0.236907	are equivalent to const definitions
-1.385186	a, b; a = Multiply(10,8);
-0.314765	case of overflow is "undefined".
-0.265216	This triangle is handled separately:
-0.339834	instruction mov DWORD PTR [ecx+eax*4],ebx
-0.347517	F2 and call the std::unexpected()
-0.237823	Example 7.41b a.x = b.x
-0.237928	used by thousands of people.
-0.165195	no other exceptions: __except (GetExceptionCode()
-0.777355	library with the option -mveclibabi=svml.
-0.594800	< SIZE; c++) { a[c][r]
-0.551024	errors if they are uninitialized,
-0.591235	Therefore, the number of jumps,
-0.810708	one or a few places.
-0.200060	a niche in scientific computing,
-0.165195	courses in programming nowadays stress
-0.291792	16 unsigned char 128 Iu8vec16
-0.165195	order a[0], b[0], a[1], b[1],
-0.349792	point calculations unless the strictness
-0.234540	Intel CodeGear Microsoft Table 2.1.
-0.236177	table. Type size, bytes alignment,
-0.232720	2.7, 2.8. Asmlib: v. 2.00.
-0.222408	make the value wrap around.
-0.165195	to eliminate common sub-expressions. Why
-0.369312	...................................................................................................... 21 3.13 Memory access.......................................................................................................
-0.313603	exceptions: while (i < arraysize)
-0.236453	arithmetics and pointer type casting.
-0.236453	than a simple type casting,
-0.447024	checking for array bounds violations
-0.200060	specific option) better: -Ofast -mveclibabi
-0.575378	allocated with new or malloc.
-0.165195	© 2004 - 2014. Last
-0.314625	dynamically (with new or malloc)
-0.165195	& source) { _mm_stream_pi((__m64*)dest, *(__m64*)&source);
-0.236960	certain that u < 231
-0.347904	elements in a specific interval.
-0.237861	All disturbing influences are removed,
-0.330555	c, d; }; void Func()
-0.311772	is within a certain interval:
-0.538871	into the algorithm in question:
-0.293137	= r1; c2 < r2;
-0.447024	checks for array bounds violation,
-0.331819	most development methods are incremental
-0.237934	in example 7.43b is admittedly
-0.236861	soft processor activates critical application-
-0.714947	a good idea to collect
-0.165195	large data sets. Covers PC's,
-0.290696	independent code. The name "position-independent
-0.357092	actually needed by the application,
-0.232725	critical functions and hot spots.
-2.093579	= 0; i < list.Size();
-0.549364	interface than on the essential
-0.165195	square blocking: int r1, r2,
-0.575321	constant = multiply by xx-xx--x-
-0.237908	Fog. Public distribution and mirroring
-0.446189	on the option for "function
-0.345994	hot spots have been identified.
-0.165195	b) y = MAX(f(x), g(x));
-0.323436	on executing library functions. Time-
-0.235247	This instruction set was originally
-0.237934	number of lines is 8*1024/64
-0.165195	= a|(b&c) x-xxxx--x ~a&~b=~(a|b) --xxxx---
-0.279511	be vectorized as follows (using
-0.357092	be calculated by the series:
-0.237273	x.c = y.c + 3.;
-0.234386	v; if (u.i > v.i)
-0.356776	c; Is16vec8 a = select_gt(b,
-1.136823	int parm2) {...} // Prototype
-0.515026	together on the stack. String
-0.294067	// erroneously called with IsPowerOf2
-0.165195	(requires binutils version 2.20, glibc
-0.357830	page 51 for the pros
-0.165195	except in special mathe- matical
-0.466218	Class data members (properties) ............................................................................
-0.575634	the first element is stored?
-0.237174	Dynamic memory allocation also tends
-0.355436	9.5a goes from the leftmost
-0.508483	(i.e. Microsoft, Intel and Gnu).
-0.218601	// Example 12.9b. Taylor series,
-0.218601	example of a Taylor series.
-0.292624	soon also 512 bits (ZMM).
-0.165195	compiler options Table 18.1. Command
-0.232720	2005. Codeplay VectorC v. 2.1.7,
-0.466218	be worth the effort. Square
-0.796086	In this example, the DelayFiveSeconds
-0.236739	only to show how tortuous
-0.572857	set and ZMM registers ..........................................................
-0.165195	user. Menus, buttons, dialog boxes,
-0.165195	x); s = _mm_hadd_ps(s, s);
-0.237387	Instruction set control no yes
-0.165195	end user. Menus, buttons, dialog
-0.351861	if it is an integer).
-0.630083	bitwise operators &, |, ~.
-0.426602	Boolean XMM (vector) reductions: ~(~a)
-0.165195	which transposes a quadratic matrix,
-0.222406	x-xxxxxx- x-xxxx-x- x-xxxxxxx xxxxxxxxx xxxxxxx-x
-0.234541	on page 164 below. Those
-0.222410	(i.e. the current .cpp file)
-0.236960	more details on branch predictions
-0.233336	up to some positive value,
-0.339401	{ int b, c; x[0]
-0.237670	const & source) { _mm_stream_pi((__m64*)dest,
-0.237756	tested implement OneOrTwo5[b!=0] as OneOrTwo5[(b!=0)
-0.237927	CPUID was manipulated to fake
-0.292624	register is 128 bits (XMM)
-0.324982	function is doing multiple logically
-0.237492	performance monitoring options. CPU vendors
-0.352652	allocated for an integer constant,
-0.237716	{ double d; int i[2];
-0.455126	This requires that you analyze
-0.587533	scheduler. This can be accomplished
-0.165195	reason to use try, catch,
-0.165195	C++". Addison-Wesley. Third Edition, 2005;
-0.165195	(MS) smmintrin.h (Gnu) AES, PCLMUL
-0.290386	use the const keyword wherever
-0.234935	dispatcher based on complicated criteria
-0.265216	........................................................................... 54 7.22 Inheritance ..............................................................................................................
-0.235048	operators The pre-increment operator ++i
-0.200060	the program is dividing repeatedly
-0.290835	standard library functions like sqrt,
-0.294112	the keyword __restrict or __restrict__,
-0.237882	the microprocessor hardware for raising
-0.165195	optimize/#vectorclass Include file dvec.h vectorclass.h
-0.700148	explained in the next section.
-1.043441	addresses in the code section,
-0.236562	the induction variable method unfavorable,
-0.346539	using Intel vector classes 114
-0.265216	.................................................................................................................... 55 7.25 Bitfields ...................................................................................................................
-0.349102	or because the programmer hasn't
-0.200060	F32vec4 s(0.f, 0.f, 0.f, 1.f);
-0.236537	mmintrin.h SSE xmmintrin.h SSE2 emmintrin.h
-0.532853	set is not always optimal,
-0.357999	are assuming that the occurrence
-0.165195	the cost of fine-tuning, testing,
-0.520857	address of the preceding row.
-0.643804	7.19 Class member functions (methods).........................................................................
-0.356632	API calls rather than self-styled
-0.237273	__svml_exp2 Intel SVML + ia32intrin.h
-0.236422	2 return powN<(N & N-1)==0,N>::p(x);
-0.647985	take more than a minute
-0.237927	compact, and simple to develop.
-0.272304	sure that they are. Declare
-0.331940	you can get the exact
-0.348370	an insufficient amount of RAM,
-0.382901	debugger to identify the circumstances
-0.293609	the recursion template<> class powN<true,1>
-0.237882	cannot use ~ for NOT.
-0.314633	0x2710 and (set) = (0x2710
-0.402405	= int 8 AVX2 _mm_i64gather_epi32
-0.235835	Borland / CodeGear / Embarcadero
-0.380676	even a thousand times lower;
-0.237160	Some compilers have efficient table-based
-0.331151	were able to reduce (a*b*c)+(c*b*a)
-0.165195	513 513 2056 38.1 97
-0.350586	is possible to calculate pow(x,10)
-0.165195	profiler is called VTune; AMD's
-0.294240	17.9: "Moving blocks of data",
-0.352785	< n; i++) { 92
-0.294082	saving memory space by allowing
-0.466218	manual 4: "Instruction tables". Tips
-0.237816	a Linux compiler, or vice
-0.237200	useful for random number generators.
-0.237823	x2*x2; double x8 = x4*x4;
-0.352935	the examples I have tested.
-0.237273	a.x = b.x + c.x
-0.237273	a.y = b.y + c.y
-0.380421	where a soft processor activates
-0.224962	that is n places back,
-0.348771	the services only when activated
-0.165195	for calculating a polynomial. Scheduling
-0.357747	a template: // Example 7.34b.
-0.330862	("internal"))) __attribute__ ((visibility ("internal"))) Vectorize
-0.233336	been replaced by my comments,
-0.290390	use of two induction variables:
-0.230120	Instruction set Important features 80386
-0.357747	common denominator: // Example 14.16b
-0.200060	Example 7.45 // Portability note:
-0.165195	Same as example 13.1, Requires
-0.452043	are slow unless the Pentium-II
-0.294246	A competing product is Borland's
-0.228182	+= x^n/n! xxn *= xx4;
-0.200060	but no other exceptions: __except
-0.237928	than other methods of rounding,
-0.466218	become a serious legal issue,
-0.200060	efficient way of removing superfluous
-0.355503	a subexpression. For example, b*2.0/3.0
-0.355664	new today will be mainstream
-0.314633	_mm_hadd_ps(x, x); s = _mm_hadd_ps(s,
-0.336321	executed. An example is Perl.
-0.593132	divisions. Example: // Example 14.17a
-0.698823	from example 15.1b to 15.1c?
-0.237882	73 and 72 for discussions.
-0.292416	This is called stack unwinding.
-0.222406	int 3; or __asm ("int
-0.294239	options are set to relax
-0.229255	SSE2 instruction set (or higher)
-0.200060	in array ; i++ ;checkifi<100
-0.309860	These cases are usually dealt
-0.408087	x ((a*x+b)*x+c)*x+d x*x*x*x*x*x*x*x = ((x2)2)2
-0.506450	Optimizing subroutines in assembly language:
-0.349588	long ReadTSC() { int dummy[4];
-0.538036	on the newest CPU model,
-0.349978	sense that C++ compilers exist
-1.460680	For example: // Example 14.15a
-0.286836	The use of structures (without
-0.165195	that is always true/false Loopunrolling
-0.615742	the size parameter is wrong,
-0.237706	I guess, that compiler makers
-0.771018	this is not a textbook
-0.707718	unsigned int exponent : 15;
-0.538189	error known as memory leaks.
-0.165195	memory footprint is unreasonably large.
-0.200060	a thread affinity mask. Poor
-0.165195	chains (see p. 22). 159
-0.236291	call the ReadTSC function. 154
-0.200060	identification (RTTI) /GR– -fno-rtti /GR-
-0.237593	{ return IntegerPower<10>(x); } 152
-0.283160	it expects a GOT entry.
-0.314606	optimizations such as function inlining.
-0.493750	inlining the call to Object1.Hello(),
-0.421391	convert example 15.1a to 151
-0.235425	a systematic and well thought-through
-0.294239	the compiler not to vectorize.
-0.237908	how this works and suggests
-0.529031	take longer time than normally.
-0.320626	= float 4 AVX2 _mm256_i32gather_ps
-0.233822	control .............................................................................................. 99 10 Multithreading..............................................................................................................
-0.165195	handler calls exit(), abort(), _endthread(),
-0.165195	as their uses (live ranges)
-0.234937	functions. A metaprogramming implementation analogous
-0.293416	broken up. The two summation
-0.536715	2.3 Choice of operating system.........................................................................................
-0.293204	by copying the return statement:
-0.236558	manuals. I am always happy
-1.163975	C++ compilers and operating systems"
-0.231921	defined(__GNUC__) // 32-bit Linux, Gnu/AT&T
-0.380921	X. 14.13 System programming Device
-0.237908	processors, between PC's and mainframes,
-0.358060	machine code to the device.
-0.165195	example in the "Macro loops"
-0.200060	see http://www.agner.org/optimize/ - vectorclass www.agner.org/optimize/#vectorclass.
-0.165195	Gnu/AT&T syntax: __asm ("fldl %1
-0.212332	First-In-First-Out or First-In-Last-Out access, sort
-0.165195	lookup: // Example 7.29b floata;
-0.230830	individual installation tools. Automatic updates.
-0.290550	or intranet for automatic updates,
-0.226799	here: a[i] = log (b[i]
-0.458003	may also be a level-3
-0.400480	Microsoft C++ Compiler v. 14.00
-1.108950	instruction set is enabled. Few
-0.351651	exceptions. The function that detects
-0.321588	compilers use the name _alloca)
-0.165195	another error has occurred anywhere
-0.622899	class D : public B1,
-0.234937	pointers and references. Most importantly,
-0.272304	....................................................................................................... 21 3.10 Graphics .................................................................................................................
-0.165195	the order a[0], b[0], a[1],
-0.237131	100 * 5 * 0.5
-0.322226	activated by the user. Feature
-0.165195	1.61 n.a. 2.23 0.95 0.6
-0.331565	is AND'ed with this mask,
-0.463185	ARRAYSIZE = 100; float list[ARRAYSIZE];
-0.419662	4 short int 64 Is16vec4
-0.415739	not on Intel processors. Details
-0.229257	is run. Examples include JavaScript,
-0.301869	int NUMROWS = 100, NUMCOLUMNS
-0.229255	groups of four (or eight)
-0.693992	in the following way. First
-0.478005	without the risk of losing
-0.237823	i++) { time1 = ReadTSC();
-0.429149	workload between multiple CPU cores:
-0.237706	assembly programmers and compiler makers.
-0.229258	with template metaprogramming. Don't panic
-0.237884	a = sin(0.8); The sin
-0.652249	until the computer is rebooted.
-0.310109	a file by calling WritePrivateProfileString,
-0.212332	{ F1(); } catch (...)
-0.352413	virus attacks and other abuse
-0.237657	already known at this place.
-0.237716	7.18 int FuncRow(int); int FuncCol(int);
-0.235048	to the modulo operator %.
-0.806174	AMD and VIA CPUs"). Const
-0.356260	* 2) { // abs(u.f)
-0.229260	compiler makers. 4. Instruction tables:
-0.536973	if ((unsigned int)n < 13)
-0.165195	to make profiling feasible. Interference
-0.314276	dispatch decision at different times:
-0.237734	<= (unsigned int)(max - min))
-0.372984	evictions and other resource conflicts.
-0.165195	OneOrTwo5[b!=0]; will also work, 133
-0.237033	is template metaprogramming so complicated?
-0.165195	not need any patch. 131
-0.234683	<excpt.h> #include <float.h> #include <math.h>
-0.542370	task of the program. Application
-0.293322	to thank the many people
-0.200060	media such as floppy disks
-0.229255	counts a one parameter. Further
-0.294270	more than half the single-thread
-0.412491	memory management and garbage collection,
-0.237593	list[i+2] += i_div_3; } 138
-0.343606	tables, and virtual function tables.
-0.165195	("fldl %1 \n fistpl %0
-0.228179	lists, switch statement jump tables,
-0.321735	preferred because of their superior
-0.237626	computers have become more powerful.
-0.357830	the FAQ for the newsgroup
-0.314470	The first time you activate
-0.237816	to class C1 or C2,
-0.237776	computer. Big supercomputers with massively
-0.459064	are identified by a unique
-0.314758	kinds of costs to multithreading
-0.462719	maintain. And it is unlikely
-0.237928	up the queue of pending
-0.519935	accessed in the order a[0],
-0.236793	/ 8 = 64 kb.
-0.530739	level-2 cache is 512 kb,
-0.488572	unit-testing It is common practice
-0.498126	for runtime type identification (RTTI).
-0.498126	require runtime type identification (RTTI),
-0.237573	header file timingtest.h from www.agner.org/optimize/testp.zip
-0.251379	.................................................................................. 55 7.24 Unions ....................................................................................................................
-0.294217	compromise when portability and ease
-0.814760	may need to be resized
-0.287897	old CPUs. The Pentium Pro
-0.230122	consuming updates may come unpredictably
-0.535056	point numbers and integers ...................................
-0.340235	data through function calls. Internal
-0.237404	more predictable than integer comparisons.
-0.200060	to wrap around, (3) trap
-0.358404	public data can be overridden
-0.165195	unsigned int 128 Iu32vec4 Vec4ui
-0.292147	fail to eliminate common sub-expressions.
-0.222406	the bounds of valid addresses,
-0.407567	because there are different opinions
-0.352087	different cases for different microprocessors,
-0.290835	to write the members individually.
-0.237273	p) {return p->a + p->b;}
-0.347871	in eax. The loop initialisation
-0.812627	of the code that matters
-0.165195	is tempting to fine- tune
-0.234540	#pragma vector nontemporal Table 18.3.
-0.352767	CPUs was more than doubled
-0.349878	to one of these categories:
-0.265216	Device drivers, interrupt service routines,
-0.237928	the broader perspective of usability.
-0.234540	#include <xmmintrin.h> _mm_setcsr(_mm_getcsr() | 0x8040);
-0.489134	However, there are a couple
-0.445710	files. Use 64-bit mode Parameter
-0.563023	(arrays can also be huge).
-0.314068	// Same as example 13.1,
-0.229258	make the division faster. Of
-0.212332	The next section (page 131)
-0.294217	test program itself and recompile
-0.356260	62 __try { // Main
-0.200060	includes the libraries named MKL,
-0.537598	a[i] = log(b[i]) + log(c[i]);.
-0.212332	test data. That being said,
-0.314759	size) / (number of ways).
-0.233336	reliable solution. (In my tests,
-0.233822	and search facilities, binary trees,
-0.294243	then make it a template:
-0.165195	long double precision (80 bits).
-0.356776	{ ... a = FactorialTable[b];
-0.222408	registers has been doubled. Thin
-0.842278	const x) { return _mm_cvtsd_si32(_mm_load_sd(&x));}
-0.165195	s; s = (short int)i;
-1.206728	the compilers I have tried.
-0.237910	functions (e.g. GetLogicalProcessorInformation in Windows)
-0.235580	Is a multidimensional structure needed?
-0.226800	Developer’s Manual", Volume 1, 2A,
-0.237756	inheritance is now as follows.
-0.325265	= float(i); f = static_cast<float>(i);
-0.354018	32-bit Windows in this respect.
-0.496407	and the code becomes bulky
-0.237606	time to load. A light-weight
-0.212332	bytes of memory. One kilobyte
-0.331752	Do not turn on correction
-0.237813	which instruction set it supports.
-0.356078	instruction sets the CPU supports,
-0.524828	= temp * temp; 104
-0.575321	improve the performance by 5-10%
-0.235503	clock cycle is 1 0.5ns.
-0.237934	a certain modification is profitable.
-0.236454	unsigned char 8 0 255
-0.237884	would be straightforward. The MASM
-0.237372	as explained at page 150.
-0.443673	call the CPUID instruction directly,
-0.237813	fast as accessing it directly.
-0.237697	only once. One may argue
-0.339448	of the problems and planned
-0.860146	take advantage of this capability:
-0.231919	object on its final destination,
-0.165195	PHP, ASP and UNIX shell
-0.237816	dynamic libraries (.dll or .so).
-0.352068	FatalAppExitA(0,"Array index out of range");
-0.352286	inequality sign must be reversed
-0.165195	= {1.1, 0.3, -2.0, 4.4,
-1.210481	is the same as reflecting
-0.357420	93. Avoid the function scanf.
-0.350845	the strlen function in isolation
-0.331819	the shared resources are limiting
-0.324210	reached with a 32-bit (signed)
-0.357561	branch mispredictions, floating point exceptions,
-0.237643	get more reproducible time measurements:
-0.281590	| 0x3F800000; // Now 1.0
-0.237816	vector registers (XMM or YMM)
-0.372000	with large data sets. Covers
-0.200060	{ Sunday, Monday, Tuesday, Wednesday,
-0.340234	or sixteen vector registers (XMM
-0.419662	unsigned short int 64 Iu16vec4
-0.293267	does not need any patch.
-0.165195	// table of 1/n! 1.,
-0.357600	these conditions is not met
-0.212332	AMD XOP ammintrin.h (MS) xopintrin.h
-0.165195	{ FuncA(i); FuncC(i); FuncB(i+1); FuncC(i+1);
-0.237931	a&b&c&d = (a&b)&(c&d) a ^0
-0.294270	to controversies over the C99
-0.418115	unsigned short int 128 Iu16vec8
-0.212332	(Gnu) all intrin.h (MS) x86intrin.h
-0.165195	data conversion, shuffling, packing, unpacking
-0.237816	assembly output (/FAs or -fsource-asm).
-0.212332	features. Take user feedback seriously.
-0.165195	allocation are: int BigArray[1024] __attribute__((aligned(64)));
-0.230120	a fixed address might clash
-0.200060	thanks to the first-in-last-out nature
-0.222408	rather than the equivalent if(!(a
-0.237569	the maximum possible memory requirement.
-0.251379	Intensive Codes", SIAM 2001. Advanced
-0.462971	A constant can be propagated
-0.331913	call p->f() goes to C0::f
-0.236793	64 1 int64_t 64 I64vec1
-0.593952	The second generation class (CParent<>)
-0.218601	programmer in a bad dilemma.
-0.349084	the newest instruction set. High
-0.165195	Darwin8 g++ v 4.0.1. Gnu:
-0.237908	languages, operating systems, and API's.
-0.237124	an explanation and possible workaround.
-0.308549	2009. Gnu C++ v. 4.1.0,
-0.218603	_mm_stream_pi((__m64*)dest, *(__m64*)&source); // MOVNTQ _mm_empty();
-0.538871	to do things in parallel:
-0.478507	measurements to see if our
-0.357096	(2.5f * x - 8.0f)
-0.294246	semaphores, mutexes, etc. is considerable.
-0.165195	stack frame" or "frame pointer".
-0.165195	manipulation tricks Michael Abrash: "Zen
-0.566596	integer, pointer or reference parameters).
-0.306483	There is a considerable debate
-0.237606	types (See Sutter: A Pragmatic
-0.200060	3, 5 and 9. Multiplications
-0.237540	add the constant vector (1,2,3,4),
-0.212332	the previous chapter (page 146).
-0.237910	by my comments, in green.
-0.165195	in integer registers. Typical candidates
-0.251379	considered metaprogramming in C++: Preprocessor
-0.495328	tasks into multiple threads. Out-of-order
-0.236253	restart the computer while he
-0.444149	manuals are used by thousands
-0.237931	time to e.g. a menu
-0.348255	system code. In this chapter,
-0.237908	This data conversion and shuffling
-0.382718	(a&&b) || (a&&b&&c) = a&&b
-0.237299	and by avoiding pointer arithmetics
-0.378523	256 16 16 256 Vec32uc
-0.232720	different types of variables. Move
-0.339746	may prefer to write if(!a
-0.237882	< NUMROWS; row++) for (column
-0.574049	the loop by two gives:
-0.237927	will be rounded to 100000000.
-0.330172	three different object file formats.
-0.165195	Technology Journal Vol. 11, Iss.
-0.515549	pointers if it has incomplete
-0.165195	(n) { case 0: printf("Alpha");
-0.222406	The } 59 third generations
-0.382872	and unsigned integers is costless.
-0.165195	satisfied with more heuristic guidelines.
-0.293835	data optimally, or from knowing
-0.458768	The syntax in example 7.43b
-0.350840	for example i = 18,
-0.165195	underestimate this time lag. Thinking
-0.285361	the actual load address. Relocation
-0.231919	stored in integer registers. Typical
-0.165195	int 16 -32768 32767 int16_t
-0.200060	GNU Free Documentation License shall
-0.237131	&Object1; p1->Hello(); CChild2 * p2;
-0.234218	protected: T a[N]; public: SafeArray()
-0.237908	WritePrivateProfileString, which opens and closes
-0.165195	long 32 0 232-1 uint32_t
-0.235504	0) ? 1.0f : 2.5f;
-0.226800	profiler inserts temporary debug breakpoints
-0.314736	the Intel compiler in favor
-0.165195	v. 5.5 Mac: Darwin8 g++
-0.251379	2014-08-07. Contents 1 Introduction .......................................................................................................................
-0.310333	and who is still frustrated
-0.232720	4.0.1. Gnu: Glibc v. 2.7,
-0.235997	+= a[i+2]; s3 += a[i+3];
-0.336328	the rows, not the columns.
-0.226797	SafeArray { protected: T a[N];
-0.236044	log, exp, sin, etc. Overriding
-0.236960	= 0; j < columns;
-0.165195	operations: // Example 7.41b a.x
-0.165195	+ c.x + d.x; a.y
-0.237556	Misaligned data. Extra data conversion,
-0.485963	function or class is responsible
-0.165195	back into the for-loop: i++;
-0.251379	function prototype: void F1() throw();
-0.222406	is the loop increment i++.
-0.234815	availability of good development tools,
-0.324006	in main will take precedence,
-0.324761	zero and then call __intel_cpu_features_init_x().
-0.237829	*)alloca(n * sizeof(float)); // (Some
-0.237908	away in reusable and well-
-0.237573	has a jump from a=a*2;
-0.429198	system to generate an interrupt,
-0.308549	2007. PGI C++ v. 7.1-4,
-0.237776	means integer division with truncation,
-0.218601	five or ten years old.
-0.236454	unsigned long 32 0 232-1
-0.292415	a framework in its API.
-0.323590	four float. The type __m128d
-0.165195	32-bit software development", Addison- Wesley
-0.354258	the devirtualization (see page 73)
-0.355591	do so. See page 73.
-0.272304	xx4(x4); // x^4 F32vec4 s(0.f,
-1.105885	is used in the Active
-0.356446	C language as a subset,
-0.466274	a number of possible remedies
-0.357515	on alignment and the resultant
-0.358548	places back in the sequence,
-0.560707	example, the compiler can safely
-0.236422	a = OneOrTwo5[b & 1];
-0.358677	do searches of the kind:
-0.165195	tasks in a multitasking environment,
-0.237373	integer and 8 floating point).
-0.310331	Preprocessing directives Preprocessing directives (everything
-0.165195	date): Microsoft Visual studio 2008,
-0.283159	aa, int * __restrict bb)
-0.165195	PGI C++ v. 7.1-4, 2008.
-0.237540	Intel compiler supports vector intrinsics,
-0.314295	poor performance for vector intrinsics.
-0.790696	there is an extra layer
-0.218603	on a First-In-Last- Out (FILO)
-0.231420	option for "function level linking"
-1.460680	For example: // Example 14.2a
-0.657506	table lookup: // Example 14.2b
-0.237592	FuncA and FuncB, then FuncC.
-0.237908	between CPU brands and similarly
-0.200060	and Windows Server 2008 R2
-0.165195	a multiple of 0x800 apart.
-0.234028	256 bit integer vectors FMA3
-0.314735	the logical structure and clarity
-0.234936	In difficult cases like these,
-0.348174	is mostly compatible with these.
-0.165195	system and CPU hardware. Porting
-0.349586	this time has been wasted.
-0.200060	functions such as memcpy, memmove,
-0.439480	eax edx, DWORD PTR [esp+12]
-0.382800	such a feature for reserving
-0.358014	dispatcher updated. It is tempting
-0.234539	be two or three levels
-0.236560	by another thread void DelayFiveSeconds()
-0.408165	interposition is intended to mimic
-0.165195	Meyers: "Effective C++". Addison-Wesley. Third
-0.547771	profilers are useful for identifying
-0.236362	into multiple functions. I disagree
-0.356126	third-party profilers such as AQtime,
-0.357747	fraction bits: // Example 14.29
-0.357747	to zero: // Example 14.24
-0.657506	sign bit: // Example 14.25
-0.322229	not have this problem. Vectors
-0.265216	saturated addition, fast approximate reciprocal,
-0.357747	lrint function: // Example 14.20
-0.926080	The code in example 14.21
-0.355814	all caches have to adapt
-0.455572	these two functions are unrelated
-0.463154	not seen in the unit-
-0.284379	== 8 #define FUNCNAME SelectAddMul_AVX2
-0.345123	14.20 double d = 1.6;
-0.224966	you would have spent fighting
-0.237908	10.1.020. Functions _intel_fast_memcpy and __intel_new_strlen
-0.314762	library. Supports x86 and ARM
-0.331895	stores this result in a[i].
-0.327878	CPU core is running at,
-0.354834	even worse, it can overwrite
-0.165195	the other thread increments seconds.
-0.935507	0, last byte at 399
-0.235947	AES, PCLMUL wmmintrin.h AVX immintrin.h
-0.237931	time and afterwards a BSF
-0.293998	Explain volatile volatile int seconds;
-1.222222	a = b * 1.2f;
-0.237756	15.1d to 15.1c as intended,
-0.237816	the project window or makefile.
-0.237122	creates or modifies many strings.
-0.741448	The operating system may supply
-0.716827	basis then use a queue.
-0.462165	are sharing the same queue,
-0.236875	the function was called from),
-0.348290	the necessary functions for distinguishing
-0.165195	STL deque (doubly ended queue)
-0.165195	FAQ for the newsgroup comp.lang.asm.x86
-0.355097	the structure in example 9.1b.
-1.486991	int bb[], short int cc[]);
-0.301869	do things in parallel. Coarse-grained
-0.226799	processors. Hyperthreading is Intel's term
-0.165195	in classes like string, wstring
-0.234216	ability to override public symbols,
-0.218601	can have quite dramatic consequences.
-0.682656	involves the risk of activating
-0.292041	+= list[i+1];} sum1 += sum2;
-0.294246	communication between threads is minimized.
-0.233822	where 10 elements were inserted,
-0.237019	error reporting here: return *(T*)0;
-0.355097	the if-branch in example 7.30b.
-0.382541	the clock frequency may vary
-1.129580	like this: // Example 14.4a
-0.294138	code and data can exceed
-0.234539	(handle != INVALID_HANDLE_VALUE && WriteFile(handle,
-0.325265	>= 5) SelectAddMul_pointer = &SelectAddMul_SSE41;
-0.165195	Intel Technology Journal Vol. 11,
-0.235944	that has already been allocated.
-0.581991	as described in chapter 11.
-0.294246	switching. This cost is minimized
-0.354041	that pointers do not alias,
-0.237593	x + 2.0f; } 115
-0.335665	processors and on Intel Atom
-0.165195	Example 7.15b SafeArray <float, 100>
-0.165195	int iset = instrset_detect(); 116
-0.835691	+ i, a); } 111
-0.200060	c2 = _mm_and_si128(c2, mask); 110
-0.302384	are guaranteed to wrap around,
-0.443833	_mm_store_si128((__m128i *)d, x); } 112
-0.312455	such optimizations with option -Wstrict-overflow=2,
-0.200060	2) { FuncA(i); FuncC(i); FuncB(i+1);
-0.165195	available with vector operands: minimum,
-1.136674	the function is called. 118
-0.707718	unsigned int exponent : 11;
-0.501703	matters: Division by a constant:
-0.165195	processors. Henry S. Warren, Jr.:
-0.354258	be profitable (see page 70).
-0.235048	1. The AND operator (&)
-0.634686	list[i+2] = 2; } list[300]
-0.692269	of the Boolean operators (&&
-0.350717	return &CriticalFunction_SSE2; } // Default
-0.200060	The file will remain locked
-0.165195	1/n! 1., 1./2., 1./6., 1./24.,
-0.236422	number of machine instructions executed,
-0.656803	long enough to be annoying.
-0.642302	as a linked list. 94
-0.311447	than allocating more space 91
-0.235997	= temp; temp += 9;
-0.165195	code from example 9.5a: 98
-0.235048	compiler on the Mac platform,
-0.293893	free the memory when exiting
-0.351556	slight imprecision in some rare
-0.237928	of the program of occupying
-0.314695	the bit-mask: c2 = _mm_and_si128(c2,
-0.237908	the right format and getting
-0.382838	this is possible in Linux).
-0.593132	undesired. Example: // Example 7.41a
-0.462135	the operations: // Example 7.41b
-0.228178	sign bit are zero. Zero
-0.233822	cout << list[i] << endl;
-0.231419	(0x2710 / 0x40) % 0x20
-0.339834	PTR [eax+400] DWORD PTR [eax],
-0.165195	with: // Example 7.38b. Alternative
-0.290698	can make a Boolean NOT
-0.200060	result will be misleading reports
-0.236175	than in the big registration
-1.074470	in case of an error;
-0.237800	Compiler-specific keywords Fast function calling.
-0.226799	using the keyword far (arrays
-0.237816	the keyword __thread or __declspec(thread).
-0.560248	c = a * 2.5;
-0.356679	is slow. If the granularity
-0.501082	i; } u; u.i &=
-0.658818	pointed to can be accessed.
-0.358548	power-save options in the BIOS
-0.294243	function which transposes a quadratic
-0.293857	is calculated first, then d+e,
-0.314765	to by r is re-loaded
-0.339899	multiplying with the constant 2.5,
-0.291646	because it often contains writeable
-0.237816	allocation using new/delete or malloc/free
-0.501082	instruction set is enabled (single
-0.313501	compiler A feature called "Gnu
-0.200060	to the instruction xor eax,eax.
-0.237910	incremental or iterative in nature,
-0.200060	/MT -msse3 /arch:SSE3 -mssse3 /arch:SSSE2
-1.439329	as explained on page 27.
-0.165195	software development", Addison- Wesley 1997.
-0.235945	7.17 Structures and classes Nowadays,
-0.231417	a micro-op cache (e.g. Sandy
-0.224963	Library" and "Integrated Performance Primitives".
-0.600149	; mangled function name ;startofFunc
-0.331878	do. This results in meaningless
-0.575337	x-- x x-- x ---
-0.236773	version. Updating mechanisms often disturb
-0.442070	and to put the task-specific
-0.200060	files and network connections. Temporary
-0.314533	point is 1. This '1'
-0.237823	__cpuid(dummy, 0); DontSkip = dummy[0];
-0.341825	the dispatcher function and replaces
-0.458592	c, temp; temp = a+1;
-0.535937	events that are not reproducible.
-0.329873	and sometimes it does incredibly
-0.287406	references require a variable. Efficiency
-0.294292	the result is valid. Re-interpreting
-0.343059	performance penalty to using inheritance.
-0.237206	is declared. Avoid multiple inheritance,
-0.212332	of the same brand. Future
-0.342825	calculations on the second sub-vector
-0.290962	the EXCLUSIVE OR operator (^)
-0.237934	transfer for 'this' is incurred
-0.290384	a<c) = (a<b && b<c)
-0.290549	and C++ is Microsoft Foundation
-0.350477	explained in the other volumes
-0.234540	(MS) x86intrin.h (Gnu) Table 12.2.

\end\